OPTI Transparencies

The document describes an introduction to linear optimization course. It provides details about the instructor, teaching assistant, class times, grading policy, and course materials. The course focuses on linear optimization theory and algorithms. Key topics include linear optimization modeling and applications, the geometry and properties of optimal solutions, duality theory, and simplex and interior point algorithms. The document also provides examples of linear programs and discusses the properties that define a problem as a linear optimization problem.

Uploaded by

Bassem Khalid
Copyright
© © All Rights Reserved

Course:

Optimization I
Introduction to Linear Optimization
ISyE 6661 Fall 2018

Instructor: Dr. Arkadi Nemirovski


[email protected] 404-385-0769
Office hours: Mon 10:00-12:00 Groseclose 446

Teaching Assistant: Haoming Jiang


[email protected] Office location: ISYE Main Building 224
Office hours: Wednesday 10:00-11:00
Classes: Tuesday & Thursday 3:00 – 4:15 pm, IC 105
Lecture Notes, Transparencies, Assignments:
Course website and
https://www2.isye.gatech.edu/~nemirovs/OPTI_Transparencies.pdf
https://www2.isye.gatech.edu/~nemirovs/OPTI_LectureNotes2016.pdf
Grading Policy:

Assignments 5%
Midterm exam 30%
Final exam 65%
♣ To make decisions optimally is one of the most basic desires of a human
being.
Whenever the candidate decisions, design restrictions and design goals can
be properly quantified, optimal decision-making reduces to solving an
optimization problem, most typically a Mathematical Programming one:
$$
\begin{array}{lll}
\text{minimize} & f(x) & [\text{objective}]\\
\text{subject to} & h_i(x) = 0,\ i = 1, \dots, m & [\text{equality constraints}]\\
 & g_j(x) \le 0,\ j = 1, \dots, k & [\text{inequality constraints}]\\
 & x \in X & [\text{domain}]
\end{array}
\tag{MP}
$$
♣ In (MP),
♦ a solution $x \in \mathbf{R}^n$ represents a candidate decision,
♦ the constraints express restrictions on the meaningful decisions (balance
and state equations, bounds on resources, etc.),
♦ the objective to be minimized represents the losses (minus profit)
associated with a decision.
♣ To solve problem (MP) means to find its optimal solution $x_*$, that is,
a feasible (i.e., satisfying the constraints) solution with the value of the
objective ≤ its value at any other feasible solution:
$$
x_*:\quad \left\{
\begin{array}{l}
h_i(x_*) = 0\ \forall i\ \ \&\ \ g_j(x_*) \le 0\ \forall j\ \ \&\ \ x_* \in X,\\
h_i(x) = 0\ \forall i\ \ \&\ \ g_j(x) \le 0\ \forall j\ \ \&\ \ x \in X\ \Rightarrow\ f(x_*) \le f(x)
\end{array}
\right.
$$

$$
\begin{array}{ll}
\min\limits_x & f(x)\\
\text{s.t.} & h_i(x) = 0,\ i = 1, \dots, m\\
 & g_j(x) \le 0,\ j = 1, \dots, k\\
 & x \in X
\end{array}
\tag{MP}
$$

♣ In Combinatorial (or Discrete) Optimization, the domain $X$ is a discrete
set, like the set of all integral or 0/1 vectors.
In contrast, in Continuous Optimization, which we focus on, $X$ is a
“continuum” set like the entire $\mathbf{R}^n$, a box $\{x : a \le x \le b\}$, or the simplex
$\{x \ge 0 : \sum_j x_j = 1\}$, etc., and the objective and the constraints are (at least)
continuous on $X$.
♣ In Linear Optimization, $X = \mathbf{R}^n$ and the objective and the constraints are
linear functions of $x$.
In contrast, in Nonlinear Continuous Optimization, the objective and
the constraints can be nonlinear functions.
♣ Our course is on Linear Optimization (LO), the simplest part of
Mathematical Programming and the one most frequently used in applications.
Some of the reasons for the popularity of LO are:

• reasonable “expressive power” of LO: while the world we live in is mainly
nonlinear, in many situations linear dependencies are quite satisfactory
approximations of the actual nonlinear dependencies. At the same time, a
linear dependence is easy to specify, which makes it realistic to specify
the data of Linear Optimization models with many variables and constraints;

• existence of an extremely elegant, rich and essentially complete
mathematical theory of LO;

• last, but by no means least, existence of extremely powerful solution
algorithms capable of solving to optimality, in reasonable time, LO problems
with tens and hundreds of thousands of variables and constraints.
♣ In our course, we will focus primarily on “LO machinery” (LO Theory and
Algorithms), leaving out of scope the practical applications of LO, which
are far too numerous and diverse to be even outlined in a single course.
A brief outline of the contents is as follows:

• LO Modeling, including instructive examples of LO models and the “calculus”
of LO models, a collection of tools that allow one to recognize the
possibility of posing an optimization problem as an LO program;

• LO Theory: geometry of LO programs, existence and characterization of
optimal solutions, theory of systems of linear inequalities and duality;

• LO Algorithms, including Simplex-type and Interior Point ones, and the
associated complexity issues.
Linear Optimization Models

♣ An LO program. A Linear Optimization problem, or program (LO),
also called a Linear Programming problem/program, is the problem of
optimizing a linear function $c^T x$ of an $n$-dimensional vector $x$ under finitely many
linear equality and nonstrict inequality constraints.
♣ The Mathematical Programming problem
$$
\min_x \left\{ x_1 :
\begin{array}{l}
x_1 + x_2 \le 20\\
x_1 - x_2 = 5\\
x_1, x_2 \ge 0
\end{array}
\right\}
\tag{1}
$$
is an LO program.
♣ The problem
$$
\min_x \left\{ \exp\{x_1\} :
\begin{array}{l}
x_1 + x_2 \le 20\\
x_1 - x_2 = 5\\
x_1, x_2 \ge 0
\end{array}
\right\}
\tag{$1'$}
$$
is not an LO program, since the objective in (1′) is nonlinear.

♣ The problem
$$
\max_x \left\{ x_1 + x_2 :
\begin{array}{l}
2x_1 \ge 20 - x_2\\
x_1 - x_2 = 5\\
x_1 \ge 0\\
x_2 \le 0
\end{array}
\right\}
\tag{2}
$$
is an LO program.
♣ The problem
$$
\max_x \left\{ x_1 + x_2 :
\begin{array}{l}
i x_1 \ge 20 - x_2\ \ \forall i \ge 2\\
x_1 - x_2 = 5\\
x_1 \ge 0\\
x_2 \le 0
\end{array}
\right\}
\tag{$2'$}
$$
is not an LO program: it has infinitely many linear constraints.

♠ Note: Whether an MP problem is or is not an LO program is a property
of the representation of the problem. We classify optimization
problems according to how they are presented, and not according to what
they are equivalent to or can be reduced to.

Canonical and Standard forms of LO programs

♣ Observe that we can unify the forms in which LO programs are
written. Indeed,
• every linear equality/inequality can be equivalently rewritten in the form
where the left hand side is a weighted sum $\sum_{j=1}^n a_j x_j$ of the variables $x_j$ with
coefficients $a_j$, and the right hand side is a real constant:
$$2x_1 \ge 20 - x_2 \ \Leftrightarrow\ 2x_1 + x_2 \ge 20$$
• the sign of a nonstrict linear inequality can always be made “≤”, since the
inequality $\sum_j a_j x_j \ge b$ is equivalent to $\sum_j [-a_j] x_j \le [-b]$:
$$2x_1 + x_2 \ge 20 \ \Leftrightarrow\ -2x_1 - x_2 \le -20$$
• a linear equality constraint $\sum_j a_j x_j = b$ can be represented equivalently by
the pair of opposite inequalities $\sum_j a_j x_j \le b$, $\sum_j [-a_j] x_j \le [-b]$:
$$
2x_1 - x_2 = 5 \ \Leftrightarrow\ \left\{
\begin{array}{l}
2x_1 - x_2 \le 5\\
-2x_1 + x_2 \le -5
\end{array}
\right.
$$
• to minimize a linear function $\sum_j c_j x_j$ is exactly the same as to maximize the
linear function $\sum_j [-c_j] x_j$.

♣ Every LO program is equivalent to an LO program in the canonical form,
where the objective should be maximized, and all the constraints are
≤-inequalities:
$$
\begin{array}{rcll}
\mathrm{Opt} &=& \max\limits_x \left\{ \sum_{j=1}^n c_j x_j : \sum_{j=1}^n a_{ij} x_j \le b_i,\ 1 \le i \le m \right\} & [\text{“term-wise” notation}]\\
\Leftrightarrow\ \mathrm{Opt} &=& \max\limits_x \left\{ c^T x : a_i^T x \le b_i,\ 1 \le i \le m \right\} & [\text{“constraint-wise” notation}]\\
\Leftrightarrow\ \mathrm{Opt} &=& \max\limits_x \left\{ c^T x : Ax \le b \right\} & [\text{“matrix-vector” notation}]
\end{array}
$$
$$
c = [c_1; \dots; c_n],\quad b = [b_1; \dots; b_m],\quad
a_i = [a_{i1}; \dots; a_{in}],\quad A = [a_1^T; a_2^T; \dots; a_m^T]
$$
♠ A set $X \subset \mathbf{R}^n$ given by $X = \{x : Ax \le b\}$, that is, the solution set of a finite
system of nonstrict linear inequalities $a_i^T x \le b_i$, $1 \le i \le m$, in variables $x \in \mathbf{R}^n$,
is called a polyhedral set, or polyhedron. An LO program in the canonical
form is to maximize a linear objective over a polyhedral set.
♠ Note: The solution set of an arbitrary finite system of linear equalities
and nonstrict inequalities in variables $x \in \mathbf{R}^n$ is a polyhedral set.
$$
\max_x \left\{ x_2 :
\begin{array}{rcl}
-x_1 + x_2 &\le& 6\\
3x_1 + 2x_2 &\le& 7\\
7x_1 - 3x_2 &\le& 1\\
-8x_1 - 5x_2 &\le& 100
\end{array}
\right\}
$$
[Figure: LO program and its feasible domain, a polygon with vertices [−1;5], [1;2], [−5;−12], [−10;−4]]
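
The vertex labels in the figure can be checked by hand or with a few lines of code. The sketch below is only an illustration, not part of the course material. It assumes that the second constraint reads 3x1 + 2x2 ≤ 7 (the relation sign is an assumption), intersects the boundary lines pairwise, keeps the feasible intersection points, and picks the one with the largest x2. Looking only at vertices is legitimate here because the feasible polygon is bounded; the general statement belongs to the LO Theory part of the course.

```python
from fractions import Fraction
from itertools import combinations

# Constraints a1*x1 + a2*x2 <= b of the example above.
# Assumption: the second row reads 3*x1 + 2*x2 <= 7.
A = [(-1, 1), (3, 2), (7, -3), (-8, -5)]
b = [6, 7, 1, 100]

def intersect(i, j):
    """Solve a_i^T x = b_i, a_j^T x = b_j by Cramer's rule (exact arithmetic)."""
    (p, q), (r, s) = A[i], A[j]
    det = p * s - q * r
    if det == 0:
        return None  # parallel boundary lines
    x1 = Fraction(b[i] * s - q * b[j], det)
    x2 = Fraction(p * b[j] - r * b[i], det)
    return (x1, x2)

def feasible(x):
    """Check all four inequalities at the point x."""
    return all(a1 * x[0] + a2 * x[1] <= bi for (a1, a2), bi in zip(A, b))

vertices = []
for i, j in combinations(range(len(A)), 2):
    x = intersect(i, j)
    if x is not None and feasible(x) and x not in vertices:
        vertices.append(x)

best = max(vertices, key=lambda x: x[1])  # objective: maximize x2
print(sorted(vertices))
print(best)
```

With the assumed data, the feasible intersection points coincide with the four labels of the figure, and the objective x2 is largest at [−1;5].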

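♣ Standard form of an LO program is to maximize a linear function over
the intersection of the nonnegative orthant $\mathbf{R}^n_+ = \{x \in \mathbf{R}^n : x \ge 0\}$ and the
feasible plane $\{x : Ax = b\}$:
$$
\begin{array}{rcll}
\mathrm{Opt} &=& \max\limits_x \left\{ \sum_{j=1}^n c_j x_j :
\begin{array}{l} \sum_{j=1}^n a_{ij} x_j = b_i,\ 1 \le i \le m\\ x_j \ge 0,\ j = 1, \dots, n \end{array} \right\} & [\text{“term-wise” notation}]\\
\Leftrightarrow\ \mathrm{Opt} &=& \max\limits_x \left\{ c^T x :
\begin{array}{l} a_i^T x = b_i,\ 1 \le i \le m\\ x_j \ge 0,\ 1 \le j \le n \end{array} \right\} & [\text{“constraint-wise” notation}]\\
\Leftrightarrow\ \mathrm{Opt} &=& \max\limits_x \left\{ c^T x : Ax = b,\ x \ge 0 \right\} & [\text{“matrix-vector” notation}]
\end{array}
$$
$$
c = [c_1; \dots; c_n],\quad b = [b_1; \dots; b_m],\quad
a_i = [a_{i1}; \dots; a_{in}],\quad A = [a_1^T; a_2^T; \dots; a_m^T]
$$
In the standard form LO program
• all variables are restricted to be nonnegative
• all “general-type” linear constraints are equalities.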

♣ Observation: The standard form of LO program is universal: every LO
program is equivalent to an LO program in the standard form.
Indeed, it suffices to convert to the standard form a canonical LO program
$\max_x \{c^T x : Ax \le b\}$. This can be done as follows:
• we introduce slack variables, one per inequality constraint, and rewrite the
problem equivalently as
$$\max_{x,s} \{c^T x : Ax + s = b,\ s \ge 0\}$$
• we further represent $x$ as the difference of two new nonnegative vector
variables, $x = u - v$, thus arriving at the program
$$\max_{u,v,s} \left\{ c^T u - c^T v : Au - Av + s = b,\ [u; v; s] \ge 0 \right\}.$$
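
The two conversion steps above can be sketched in code. The helper names `canonical_to_standard` and `lift` and the small data set are hypothetical, chosen only for illustration; the sketch assumes NumPy is available. It builds the standard form data [A, −A, I] and checks that a feasible x of the canonical program is mapped to a feasible [u; v; s] with the same objective value.

```python
import numpy as np

def canonical_to_standard(c, A, b):
    """Data of max{c_std^T w : A_std w = b_std, w >= 0}, w = [u; v; s],
    built from the canonical program max{c^T x : A x <= b}."""
    m, n = A.shape
    A_std = np.hstack([A, -A, np.eye(m)])          # A u - A v + s = b
    c_std = np.concatenate([c, -c, np.zeros(m)])   # c^T u - c^T v
    return c_std, A_std, b.copy()

def lift(x, A, b):
    """Map a canonical-feasible x to w = [u; v; s] with x = u - v."""
    u = np.maximum(x, 0.0)    # positive part of x
    v = np.maximum(-x, 0.0)   # negative part of x
    s = b - A @ x             # slacks, nonnegative when A x <= b
    return np.concatenate([u, v, s])

# Hypothetical canonical data and a feasible point with a negative entry.
A = np.array([[1.0, 1.0], [-1.0, 2.0]])
b = np.array([4.0, 2.0])
c = np.array([1.0, -1.0])
x = np.array([-1.0, 0.5])

c_std, A_std, b_std = canonical_to_standard(c, A, b)
w = lift(x, A, b)
print(A_std.shape, w, c_std @ w)
```

The split of x into positive and negative parts is one possible choice of u and v; any pair u, v ≥ 0 with u − v = x would do.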

$$\max_x \{-2x_1 + x_3 : -x_1 + x_2 + x_3 = 1,\ x \ge 0\}$$
[Figure: standard form LO program and its feasible domain, in the axes x1, x2, x3]

LO Terminology

$$\mathrm{Opt} = \max_x \{c^T x : Ax \le b\} \qquad [A : m \times n] \tag{LO}$$
• The variable vector $x$ in (LO) is called the decision vector of the program;
its entries $x_j$ are called decision variables.
• The linear function $c^T x$ is called the objective function (or objective) of
the program, and the inequalities $a_i^T x \le b_i$ are called the constraints.
• The structure of (LO) reduces to the sizes $m$ (number of constraints) and
$n$ (number of variables). The data of (LO) is the collection of numerical
values of the coefficients in the cost vector $c$, in the right hand side vector
$b$ and in the constraint matrix $A$.

• A solution to (LO) is an arbitrary value of the decision vector. A solution x
is called feasible if it satisfies the constraints: Ax ≤ b. The set of all feasible
solutions is called the feasible set of the program. The program is called
feasible, if the feasible set is nonempty, and is called infeasible otherwise.

• Given a program (LO), there are three possibilities:
- the program is infeasible. In this case, Opt = −∞ by definition.
- the program is feasible, and the objective is not bounded from above on
the feasible set, i.e., for every $a \in \mathbf{R}$ there exists a feasible solution $x$ such
that $c^T x > a$. In this case, the program is called unbounded, and Opt = +∞
by definition.
The program which is not unbounded is called bounded; a program is
bounded iff its objective is bounded from above on the feasible set (e.g.,
due to the fact that the latter is empty).
- the program is feasible, and the objective is bounded from above on the
feasible set: there exists a real $a$ such that $c^T x \le a$ for all feasible solutions
$x$. In this case, the optimal value Opt is the supremum, over the feasible
solutions, of the values of the objective at a solution.

• A solution to the program is called optimal if it is feasible and the value
of the objective at the solution equals Opt. A program is called solvable
if it admits an optimal solution.
♠ In the case of a minimization problem
$$\mathrm{Opt} = \min_x \{c^T x : Ax \le b\} \qquad [A : m \times n]$$
• the optimal value of an infeasible program is +∞,
• the optimal value of a feasible and unbounded program (unboundedness
now means that the objective to be minimized is not bounded from below
on the feasible set) is −∞,
• the optimal value of a bounded and feasible program is the infimum of
values of the objective at feasible solutions to the program.

♣ The notions of feasibility, boundedness, solvability and optimality can be
straightforwardly extended from LO programs to arbitrary MP ones. With
this extension, a solvable problem is definitely feasible and bounded, while
the converse is not necessarily true, as is illustrated by the program
$$\mathrm{Opt} = \max_x \{-\exp\{-x\} : x \ge 0\}.$$
Here Opt = 0, but the optimal value is not achieved: there is no feasible solution
where the objective is equal to 0! As a result, the program is unsolvable.
⇒ In general, the fact that an optimization program has a “legitimate”
(real, and not ±∞) optimal value is strictly weaker than the fact that the
program is solvable (i.e., has an optimal solution).
♠ In LO the situation is much better: we shall prove that an LO program
is solvable iff it is feasible and bounded.

Examples of LO Models

♣ Diet Problem: There are $n$ types of products and $m$ types of nutrition
elements. A unit of product # $j$ contains $p_{ij}$ grams of nutrition element #
$i$ and costs $c_j$. The daily consumption of nutrition element # $i$ should
be within given bounds $[\underline{b}_i, \overline{b}_i]$. Find the cheapest possible “diet” (mixture
of products) which provides appropriate daily amounts of every one of the
nutrition elements.

Denoting by $x_j$ the amount of the $j$-th product in a diet, the LO model reads
$$
\begin{array}{lll}
\min\limits_x & \sum_{j=1}^n c_j x_j & [\text{cost to be minimized}]\\
\text{subject to} & \left.
\begin{array}{l}
\sum_{j=1}^n p_{ij} x_j \ge \underline{b}_i\\
\sum_{j=1}^n p_{ij} x_j \le \overline{b}_i
\end{array}
\right\}\ 1 \le i \le m & [\text{upper \& lower bounds on the contents of nutrition elements in a diet}]\\
 & x_j \ge 0,\ 1 \le j \le n & [\text{you cannot put into a diet a negative amount of a product}]
\end{array}
$$
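
As an illustration, a tiny instance of the Diet model can be passed to an off-the-shelf LP solver. The sketch below assumes SciPy (`scipy.optimize.linprog` with the HiGHS backend) is available. All numbers are hypothetical and serve only to show how the lower and upper nutrition bounds are turned into the solver's "≤" format.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 3 products (columns), 2 nutrition elements (rows).
p = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])      # p[i, j]: grams of element i per unit of product j
b_lo = np.array([4.0, 3.0])          # lower bounds on daily consumption
b_hi = np.array([10.0, 9.0])         # upper bounds on daily consumption
cost = np.array([3.0, 2.0, 4.0])     # c_j: cost of a unit of product j

# linprog minimizes cost @ x subject to A_ub @ x <= b_ub, so the lower
# bounds p @ x >= b_lo are passed as -p @ x <= -b_lo.
A_ub = np.vstack([-p, p])
b_ub = np.concatenate([-b_lo, b_hi])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
diet, total = res.x, res.fun
print(diet, total)
```

For this particular data the cheapest diet uses only the first two products. The nonnegativity constraints are expressed through the `bounds` argument rather than as rows of `A_ub`.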

• The Diet problem is routinely used in nourishment of poultry, livestock, etc.
As for nourishment of humans, the model is of little use, since it ignores
factors like food’s taste, food diversity requirements, etc.
• Here is the optimal daily human diet as computed by the software at
http://www.neos-guide.org/content/diet-problem-solver
(when solving the problem, I allowed the use of all 68 kinds of food offered by
the code):
Food                  Serving               Cost
Raw Carrots           0.12 cups shredded    0.02
Peanut Butter         7.20 Tbsp             0.25
Popcorn, Air-Popped   4.82 Oz               0.19
Potatoes, Baked       1.77 cups             0.21
Skim Milk             2.17 C                0.28
Daily cost                                  $ 0.96

♣ Production planning: A factory
• consumes $R$ types of resources (electricity, raw materials of various kinds,
various sorts of manpower, processing times at different devices, etc.)
• produces $P$ types of products.
• There are $n$ possible production processes; the $j$-th of them can be used with
“intensity” $x_j$ (intensities are fractions of the planning period during which
a particular production process is used).
• Used at unit intensity, production process # $j$ consumes $A_{rj}$ units of
resource $r$, $1 \le r \le R$, and yields $C_{pj}$ units of product $p$, $1 \le p \le P$.
• The profit of selling a unit of product $p$ is $c_p$.
♠ Given upper bounds $b_1, \dots, b_R$ on the amounts of various resources available
during the planning period, and lower bounds $d_1, \dots, d_P$ on the amounts of
products to be produced, find a production plan which maximizes the profit
under the resource and the demand restrictions.

Denoting by $x_j$ the intensity of production process $j$, the LO model reads:
$$
\begin{array}{lll}
\max\limits_x & \sum_{j=1}^n \left( \sum_{p=1}^P c_p C_{pj} \right) x_j & [\text{profit to be maximized}]\\
\text{subject to} & \sum_{j=1}^n A_{rj} x_j \le b_r,\ 1 \le r \le R & [\text{upper bounds on resources should be met}]\\
 & \sum_{j=1}^n C_{pj} x_j \ge d_p,\ 1 \le p \le P & [\text{lower bounds on products should be met}]\\
 & \sum_{j=1}^n x_j \le 1,\quad x_j \ge 0,\ 1 \le j \le n & [\text{total intensity should be} \le 1 \text{ and intensities must be nonnegative}]
\end{array}
$$

Implicit assumptions:
• all production can be sold
• there are no setup costs when switching between production processes
• the products are infinitely divisible

♣ Inventory: An inventory operates over a time horizon of $T$ days $1, \dots, T$ and
handles $K$ types of products.
• Products share a common warehouse with space $C$. A unit of product $k$ takes
space $c_k \ge 0$ and its day-long storage costs $h_k$.
• Inventory is replenished via ordering from a supplier; a replenishment order
sent at the beginning of day $t$ is executed immediately, and ordering a unit
of product $k$ costs $o_k \ge 0$.
• The inventory is affected by external demand of $d_{tk}$ units of product $k$ in
day $t$. Backlog is allowed, and a day-long delay in supplying a unit of product
$k$ costs $p_k \ge 0$.

♠ Given the initial amounts $s_{0k}$, $k = 1, \dots, K$, of products in the warehouse, all
the (nonnegative) cost coefficients and the demands $d_{tk}$, we want to specify
the replenishment orders $v_{tk}$ ($v_{tk}$ is the amount of product $k$ which is ordered
from the supplier at the beginning of day $t$) in such a way that at the end of
day $T$ there is no backlogged demand, and we want to meet this requirement
at as small a total inventory management cost as possible.

Building the model

1. Let the state variable $s_{tk}$ be the amount of product $k$ stored at the warehouse
at the end of day $t$. $s_{tk}$ can be negative, meaning that at the end of day $t$
the inventory owes the customers $|s_{tk}|$ units of product $k$.
Let also $U$ be an upper bound on the total management cost. The problem
reads:
$$
\begin{array}{lll}
\min\limits_{U,v,s} & U\\
\text{s.t.} & U \ge \sum\limits_{1 \le k \le K,\ 1 \le t \le T} \left[ o_k v_{tk} + \max[h_k s_{tk}, 0] + \max[-p_k s_{tk}, 0] \right] & [\text{cost description}]\\
 & s_{tk} = s_{t-1,k} + v_{tk} - d_{tk},\ 1 \le t \le T,\ 1 \le k \le K & [\text{state equations}]\\
 & \sum_{k=1}^K \max[c_k s_{tk}, 0] \le C,\ 1 \le t \le T & [\text{space restriction should be met}]\\
 & s_{Tk} \ge 0,\ 1 \le k \le K & [\text{no backlogged demand at the end}]\\
 & v_{tk} \ge 0,\ 1 \le k \le K,\ 1 \le t \le T
\end{array}
$$
Implicit assumption: replenishment orders are executed, and the demands
are shipped to customers, at the beginning of day $t$.
♠ Our problem is not an LO program: it includes nonlinear constraints of
the form
$$
\sum_{k,t} \left[ o_k v_{tk} + \max[h_k s_{tk}, 0] + \max[-p_k s_{tk}, 0] \right] \le U,
\qquad
\sum_k \max[c_k s_{tk}, 0] \le C,\ t = 1, \dots, T.
$$
Let us show that these constraints can be represented equivalently by linear
constraints.
♣ Consider an MP problem in variables $x$ with linear objective $c^T x$ and
constraints of the form
$$a_i^T x + \sum_{j=1}^{n_i} \mathrm{Term}_{ij}(x) \le b_i,\ 1 \le i \le m,$$
where $\mathrm{Term}_{ij}(x)$ are convex piecewise linear functions of $x$, that is, maxima
of affine functions of $x$:
$$\mathrm{Term}_{ij}(x) = \max_{1 \le \ell \le L_{ij}} \left[ \alpha_{ij\ell}^T x + \beta_{ij\ell} \right]$$

♣ Observation: MP problem
$$
\max_x \left\{ c^T x : a_i^T x + \sum_{j=1}^{n_i} \mathrm{Term}_{ij}(x) \le b_i,\ i \le m \right\},
\qquad
\mathrm{Term}_{ij}(x) = \max_{1 \le \ell \le L_{ij}} \left[ \alpha_{ij\ell}^T x + \beta_{ij\ell} \right]
\tag{P}
$$
is equivalent to the LO program
$$
\max_{x, \tau_{ij}} \left\{ c^T x :
\begin{array}{l}
a_i^T x + \sum_{j=1}^{n_i} \tau_{ij} \le b_i\\
\tau_{ij} \ge \alpha_{ij\ell}^T x + \beta_{ij\ell},\ 1 \le \ell \le L_{ij},\ 1 \le j \le n_i
\end{array},\ i \le m \right\}
\tag{$P'$}
$$
in the sense that both problems have the same objectives and $x$ is feasible
for (P) iff $x$ can be extended, by properly chosen $\tau_{ij}$, to a feasible solution
to (P′). As a result,
• every feasible solution $(x, \tau)$ to (P′) induces a feasible solution $x$ to (P),
with the same value of the objective;
• vice versa, every feasible solution $x$ to (P) can be obtained in the above
fashion from a feasible solution to (P′).

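♠ Applying the above construction to the Inventory problem, we end up with
the following LO model:
$$
\begin{array}{ll}
\min\limits_{U,v,s,x,y,z} & U\\
\text{s.t.} & U \ge \sum_{k,t} \left[ o_k v_{tk} + x_{tk} + y_{tk} \right]\\
 & s_{tk} = s_{t-1,k} + v_{tk} - d_{tk},\ 1 \le t \le T,\ 1 \le k \le K\\
 & \sum_{k=1}^K z_{tk} \le C,\ 1 \le t \le T\\
 & x_{tk} \ge h_k s_{tk},\ x_{tk} \ge 0,\ 1 \le k \le K,\ 1 \le t \le T\\
 & y_{tk} \ge -p_k s_{tk},\ y_{tk} \ge 0,\ 1 \le k \le K,\ 1 \le t \le T\\
 & z_{tk} \ge c_k s_{tk},\ z_{tk} \ge 0,\ 1 \le k \le K,\ 1 \le t \le T\\
 & s_{Tk} \ge 0,\ 1 \le k \le K\\
 & v_{tk} \ge 0,\ 1 \le k \le K,\ 1 \le t \le T
\end{array}
$$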
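
A small numerical instance may help to see the structure of this LO model. The sketch below is an assumption-laden illustration: it takes a single product (K = 1) and T = 2 days with hypothetical data, uses SciPy's `linprog`, and minimizes the sum of the cost terms directly instead of introducing the bound U, which gives the same optimal value.

```python
import numpy as np
from scipy.optimize import linprog

T = 2                                   # days; one product (K = 1)
d = np.array([3.0, 2.0])                # hypothetical demands d_t
s0 = 0.0                                # initial stock
o, h, p, c, C = 1.0, 0.5, 2.0, 1.0, 10.0  # ordering, holding, backlog costs, unit space, space

I, O = np.eye(T), np.zeros((T, T))
# Variable order: [v, s, x, y, z], each block of length T.
obj = np.concatenate([o * np.ones(T), np.zeros(T), np.ones(T), np.ones(T), np.zeros(T)])

# State equations s_t - s_{t-1} - v_t = -d_t (s_0 is data).
D = I - np.eye(T, k=-1)
A_eq = np.hstack([-I, D, O, O, O])
b_eq = -d.copy()
b_eq[0] += s0

A_ub = np.vstack([
    np.hstack([O,  h * I, -I, O, O]),   # x_t >= h s_t
    np.hstack([O, -p * I, O, -I, O]),   # y_t >= -p s_t
    np.hstack([O,  c * I, O, O, -I]),   # z_t >= c s_t
    np.hstack([O, O, O, O, I]),         # z_t <= C (space restriction, K = 1)
])
b_ub = np.concatenate([np.zeros(3 * T), C * np.ones(T)])
last = np.zeros((1, 5 * T))
last[0, 2 * T - 1] = -1.0               # s_T >= 0: no backlog at the end
A_ub = np.vstack([A_ub, last])
b_ub = np.append(b_ub, 0.0)

# v, x, y, z are nonnegative; the state s is free (negative means backlog).
bounds = [(0, None)] * T + [(None, None)] * T + [(0, None)] * (3 * T)
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
orders, total_cost = res.x[:T], res.fun
print(orders, total_cost)
```

With these numbers, ordering exactly the daily demand avoids both storage and backlog costs, so the solver returns the orders (3, 2).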

♣ Warning: The outlined “eliminating piecewise linear nonlinearities” trick
heavily exploits the fact that after the nonlinearities are moved to the left hand
sides of ≤-constraints, they can be written down as maxima of affine
functions.

Indeed, the attempt to eliminate the nonlinearity
$$\min_\ell \left[ \alpha_{ij\ell}^T x + \beta_{ij\ell} \right]$$
in the constraint
$$\dots + \min_\ell \left[ \alpha_{ij\ell}^T x + \beta_{ij\ell} \right] \le b_i$$
by introducing an upper bound $\tau_{ij}$ on the nonlinearity and representing the
constraint by the pair of constraints
$$\dots + \tau_{ij} \le b_i, \qquad \tau_{ij} \ge \min_\ell \left[ \alpha_{ij\ell}^T x + \beta_{ij\ell} \right]$$
fails, since the second constraint in the pair, in general, is not representable by
a system of linear inequalities.

♣ Transportation problem: There are $I$ warehouses, the $i$-th of them storing
$s_i$ units of product, and $J$ customers, the $j$-th of them demanding $d_j$ units of
product. Shipping a unit of product from warehouse $i$ to customer $j$ costs
$c_{ij}$. Given the supplies $s_i$, the demands $d_j$ and the costs $c_{ij}$, we want to
decide on the amounts of product to be shipped from every warehouse to
every customer. Our restrictions are that we cannot take from a warehouse
more product than it has, and that all the demands should be satisfied;
under these restrictions, we want to minimize the total transportation cost.
Let $x_{ij}$ be the amount of product shipped from warehouse $i$ to customer $j$.
The problem reads
$$
\begin{array}{lll}
\min\limits_x & \sum_{i,j} c_{ij} x_{ij} & [\text{transportation cost to be minimized}]\\
\text{subject to} & \sum_{j=1}^J x_{ij} \le s_i,\ 1 \le i \le I & [\text{bounds on supplies of warehouses}]\\
 & \sum_{i=1}^I x_{ij} = d_j,\ j = 1, \dots, J & [\text{demands should be satisfied}]\\
 & x_{ij} \ge 0,\ 1 \le i \le I,\ 1 \le j \le J & [\text{no negative shipments}]
\end{array}
$$
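
The Transportation model translates almost line by line into solver input. The sketch below assumes SciPy is available and uses hypothetical supplies, demands and costs; the shipment matrix is flattened row by row, so that the Kronecker products pick out the row sums (supplies) and the column sums (demands).

```python
import numpy as np
from scipy.optimize import linprog

supply = np.array([20.0, 30.0])              # hypothetical s_i
demand = np.array([10.0, 25.0, 15.0])        # hypothetical d_j
cost = np.array([[2.0, 4.0, 5.0],
                 [3.0, 1.0, 7.0]])           # hypothetical c_ij
I, J = cost.shape

# x is the matrix [x_ij] flattened row by row.
A_ub = np.kron(np.eye(I), np.ones((1, J)))   # sum_j x_ij <= s_i
A_eq = np.kron(np.ones((1, I)), np.eye(J))   # sum_i x_ij  = d_j

res = linprog(cost.ravel(), A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand,
              bounds=(0, None), method="highs")
plan = res.x.reshape(I, J)
print(plan, res.fun)
```

In this instance total supply equals total demand, so both warehouses are emptied in every feasible plan.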

♣ Multicommodity Flow:
• We are given a network (an oriented graph), that is, a finite set of nodes
$1, 2, \dots, n$ along with a finite set $\Gamma$ of arcs, which are ordered pairs $\gamma = (i, j)$ of distinct
nodes. We say that an arc $\gamma = (i, j) \in \Gamma$ starts at node $i$, ends at node $j$
and links nodes $i$ and $j$.
♠ Example: Road network with road junctions as nodes and one-way road
segments “from a junction to a neighbouring junction” as arcs.
• There are $K$ types of “commodities” moving along the network, and we
are given the “external supply” $s_{ki}$ of the $k$-th commodity at node $i$. When
$s_{ki} \ge 0$, the node $i$ “pumps” into the network $s_{ki}$ units of commodity $k$;
when $s_{ki} \le 0$, the node $i$ “drains” from the network $|s_{ki}|$ units of commodity
$k$.
♠ The $k$-th commodity in a road network with steady-state traffic can be
comprised of all cars leaving within an hour a particular origin (e.g., GaTech
campus) towards a particular destination (e.g., Northside Hospital).

• Propagation of commodity $k$ through the network is represented by a flow
vector $f^k$. The entries in $f^k$ are indexed by arcs, and $f^k_\gamma$ is the flow of
commodity $k$ in an arc $\gamma$.
♠ In a road network with steady-state traffic, an entry $f^k_\gamma$ of the flow vector
$f^k$ is the amount of cars from the $k$-th origin-destination pair which move within
an hour through the road segment $\gamma$.

• A flow vector $f^k$ is called a feasible flow, if it is nonnegative and satisfies
the
Conservation law: for every node $i$, the total amount of commodity $k$
entering the node plus the external supply $s_{ki}$ of the commodity at the node
is equal to the total amount of the commodity leaving the node:
$$
\sum_{p \in P(i)} f^k_{pi} + s_{ki} = \sum_{q \in Q(i)} f^k_{iq},
\qquad
P(i) = \{p : (p, i) \in \Gamma\},\quad Q(i) = \{q : (i, q) \in \Gamma\}
$$
[Figure: a small network with arc flows and external supplies at the nodes, illustrating the Conservation law]

♣ Multicommodity flow problem reads: Given
• a network with $n$ nodes $1, \dots, n$ and a set $\Gamma$ of arcs,
• a number $K$ of commodities along with supplies $s_{ki}$ of nodes $i = 1, \dots, n$
to the flow of commodity $k$, $k = 1, \dots, K$,
• the per unit cost $c_{k\gamma}$ of transporting commodity $k$ through arc $\gamma$,
• the capacities $h_\gamma$ of the arcs,
find the flows $f^1, \dots, f^K$ of the commodities which are nonnegative, respect
the Conservation law and the capacity restrictions (that is, the total, over
the commodities, flow through an arc does not exceed the capacity of the
arc) and minimize, under these restrictions, the total, over the arcs and the
commodities, transportation cost.
♠ In the Road Network illustration, interpreting $c_{k\gamma}$ as the travel time along
road segment $\gamma$, the Multicommodity flow problem becomes the one of
finding the social optimum, where the total travelling time of all cars is as small
as possible.

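♣ We associate with a network with $n$ nodes and $m$ arcs an $n \times m$ incidence
matrix $P$ defined as follows:
• the rows of $P$ are indexed by the nodes $1, \dots, n$
• the columns of $P$ are indexed by arcs $\gamma$
• $P_{i\gamma} = \begin{cases} 1, & \gamma \text{ starts at } i\\ -1, & \gamma \text{ ends at } i\\ 0, & \text{all other cases} \end{cases}$

[Figure: network with nodes #1, #2, #3, #4 and arcs (1,2), (1,3), (1,4), (2,3), (4,3)]

With columns ordered as (1,2), (1,3), (1,4), (2,3), (4,3) and rows as nodes 1, 2, 3, 4:
$$
P = \begin{bmatrix}
1 & 1 & 1 & 0 & 0\\
-1 & 0 & 0 & 1 & 0\\
0 & -1 & 0 & -1 & -1\\
0 & 0 & -1 & 0 & 1
\end{bmatrix}
$$

♠ In terms of the incidence matrix, the Conservation Law linking flow $f$ and
external supply $s$ reads
$$Pf = s$$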
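
The incidence matrix of the five-arc example network can be rebuilt directly from the definition. In the sketch below the helper names and the flow vector `f` are hypothetical; the code uses only plain Python lists and computes the external supply s = Pf that makes this flow obey the Conservation law.

```python
arcs = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 3)]  # arcs of the example network
n = 4                                             # number of nodes

def incidence_matrix(n, arcs):
    """P[i][gamma] = 1 if arc gamma starts at node i+1, -1 if it ends there, else 0."""
    P = [[0] * len(arcs) for _ in range(n)]
    for col, (i, j) in enumerate(arcs):
        P[i - 1][col] = 1
        P[j - 1][col] = -1
    return P

def external_supply(P, f):
    """Return s = P f: outflow minus inflow at every node."""
    return [sum(p * x for p, x in zip(row, f)) for row in P]

P = incidence_matrix(n, arcs)
f = [1, 1, 2, 1, 2]          # hypothetical nonnegative arc flows
s = external_supply(P, f)
print(P)
print(s)
```

Every column of P contains exactly one 1 and one −1, so the entries of s always sum to zero: whatever is pumped into the network must be drained somewhere.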
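♠ Multicommodity flow problem reads
$$
\begin{array}{lll}
\min\limits_{f^1, \dots, f^K} & \sum_{k=1}^K \sum_{\gamma \in \Gamma} c_{k\gamma} f^k_\gamma & [\text{transportation cost to be minimized}]\\
\text{subject to} & P f^k = s_k := [s_{k1}; \dots; s_{kn}],\ k = 1, \dots, K & [\text{flow conservation laws}]\\
 & f^k_\gamma \ge 0,\ 1 \le k \le K,\ \gamma \in \Gamma & [\text{flows must be nonnegative}]\\
 & \sum_{k=1}^K f^k_\gamma \le h_\gamma,\ \gamma \in \Gamma & [\text{bounds on arc capacities}]
\end{array}
$$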

[Figure: Transportation problem as a network flow. Red nodes: warehouses; green nodes: customers. Green arcs: transportation costs $c_{ij}$, capacities +∞; red arcs: transportation costs 0, capacities $s_i$. External supplies: $d_1 + d_2$ at the source node, $-d_1$ and $-d_2$ at the customer nodes]

Note: Transportation problem is a particular case of the Multicommodity
flow one. Here we
• start with $I$ red nodes representing warehouses, $J$ green nodes representing
customers and $IJ$ arcs “warehouse $i$ – customer $j$;” these arcs have infinite
capacities and transportation costs $c_{ij}$;
• augment the network with a source node which is linked to every warehouse
node $i$ by an arc with zero transportation cost and capacity $s_i$;
• consider the single-commodity case where the source node has external
supply $\sum_j d_j$, the customer nodes have external supplies $-d_j$, $1 \le j \le J$, and
the warehouse nodes have zero external supplies.

♣ Maximum Flow problem: Given a network with two selected nodes, a
source and a sink, find the maximal flow from the source to the sink, that is,
find the largest $s$ such that the external supply “$s$ at the source node, $-s$ at the
sink node, 0 at all other nodes” corresponds to a feasible flow respecting
arc capacities.

The problem reads
$$
\begin{array}{lll}
\max\limits_{f,s} & s & [\text{total flow from source to sink to be maximized}]\\
\text{subject to} & \sum_\gamma P_{i\gamma} f_\gamma = \begin{cases} s, & i \text{ is the source node}\\ -s, & i \text{ is the sink node}\\ 0, & \text{for all other nodes} \end{cases} & [\text{flow conservation law}]\\
 & f_\gamma \ge 0,\ \gamma \in \Gamma & [\text{arc flows should be} \ge 0]\\
 & f_\gamma \le h_\gamma,\ \gamma \in \Gamma & [\text{we should respect arc capacities}]
\end{array}
$$
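
As a sketch of how this LO program looks in practice, the code below solves a Maximum Flow instance on a four-node network with arcs (1,2), (1,3), (1,4), (2,3), (4,3). The capacities and the choice of node 1 as the source and node 3 as the sink are assumptions made for the illustration, and SciPy's `linprog` is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

arcs = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 3)]
cap = np.array([2.0, 1.0, 3.0, 1.0, 2.0])     # hypothetical capacities h_gamma
n, source, sink = 4, 1, 3

# Incidence matrix: +1 where an arc starts, -1 where it ends.
P = np.zeros((n, len(arcs)))
for col, (i, j) in enumerate(arcs):
    P[i - 1, col], P[j - 1, col] = 1.0, -1.0

# Conservation law P f = s * e with e = +1 at the source, -1 at the sink.
e = np.zeros(n)
e[source - 1], e[sink - 1] = 1.0, -1.0
A_eq = np.hstack([P, -e[:, None]])            # variables: [f, s]
b_eq = np.zeros(n)

obj = np.zeros(len(arcs) + 1)
obj[-1] = -1.0                                # maximize s == minimize -s
bounds = [(0.0, h) for h in cap] + [(None, None)]

res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
flow, value = res.x[:-1], res.x[-1]
print(flow, value)
```

The arc capacities enter as simple bounds on the variables, so the only general constraints are the n conservation equations.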

LO Models in Signal Processing

♣ Fitting Parameters in Linear Regression: “In nature” there exists
a linear dependence between a variable vector of factors (regressors) $x \in \mathbf{R}^n$
and the “ideal output” $y \in \mathbf{R}$:
$$y = \theta_*^T x, \qquad \theta_* \in \mathbf{R}^n:\ \text{vector of true parameters}.$$
Given a collection $\{x_i, y_i\}_{i=1}^m$ of noisy observations of this dependence:
$$y_i = \theta_*^T x_i + \xi_i \qquad [\xi_i:\ \text{observation noise}]$$
we want to recover the parameter vector $\theta_*$.
♠ When $m \gg n$, a typical way to solve the problem is to choose a “discrepancy
measure” $\phi(u, v)$, a kind of distance between vectors $u, v \in \mathbf{R}^m$, and
to choose an estimate $\hat\theta$ of $\theta_*$ by minimizing in $\theta$ the discrepancy
$$\phi\left( [y_1; \dots; y_m],\ [\theta^T x_1; \dots; \theta^T x_m] \right)$$
between the observed outputs and the outputs of a hypothetical model $y = \theta^T x$
as applied to the observed regressors $x_1, \dots, x_m$.
♠ Setting $X = [x_1^T; x_2^T; \dots; x_m^T]$, the recovering routine becomes
$$\hat\theta = \mathop{\mathrm{argmin}}_\theta\ \phi(y, X\theta) \qquad [y = [y_1; \dots; y_m]]$$
♠ The choice of $\phi(\cdot, \cdot)$ primarily depends on the type of observation noise.
• The most frequently used discrepancy measure is $\phi(u, v) = \|u - v\|_2$,
corresponding to the case of White Gaussian Noise ($\xi_i \sim \mathcal{N}(0, \sigma^2)$ are
independent) or, more generally, to the case when $\xi_i$ are independent identically
distributed with zero mean and finite variance. The corresponding Least
Squares recovery
$$\hat\theta = \mathop{\mathrm{argmin}}_\theta \sum_{i=1}^m [y_i - x_i^T \theta]^2$$
reduces to solving a system of linear equations.
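
For completeness, here is the standard calculation behind the last claim (it is not spelled out on the slide). In the notation $X = [x_1^T; \dots; x_m^T]$, $y = [y_1; \dots; y_m]$, setting the gradient of the convex quadratic objective to zero gives the normal equations:

$$
\nabla_\theta \sum_{i=1}^m [y_i - x_i^T \theta]^2 = -2 X^T (y - X\theta) = 0
\quad\Longleftrightarrow\quad
X^T X\, \theta = X^T y .
$$

When $X$ has full column rank (which one typically expects when $m \gg n$), the matrix $X^T X$ is invertible and $\hat\theta = (X^T X)^{-1} X^T y$.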

♠ There are cases when the recovery reduces to LO:
1. $\ell_1$-fit: $\phi(u, v) = \|u - v\|_1 := \sum_{i=1}^m |u_i - v_i|$. Here the recovery problem
reads
$$
\min_\theta \sum_{i=1}^m |y_i - x_i^T \theta|
\ \Leftrightarrow\
\min_{\theta, \tau} \left\{ \tau : \sum_{i=1}^m |y_i - x_i^T \theta| \le \tau \right\}
\tag{!}
$$
There are two ways to reduce (!) to an LO program:
• Intelligent way: Noting that $|y_i - x_i^T \theta| = \max[y_i - x_i^T \theta,\ x_i^T \theta - y_i]$, we can
use the “eliminating piecewise linear nonlinearities” trick to convert (!) into
the LO program
$$
\min_{\theta, \tau, \tau_i} \left\{ \tau :
\begin{array}{l}
y_i - x_i^T \theta \le \tau_i,\ x_i^T \theta - y_i \le \tau_i\ \ \forall i\\
\sum_{i=1}^m \tau_i \le \tau
\end{array}
\right\}
$$
with $1 + n + m$ variables and $2m + 1$ constraints.
• Stupid way: Noting that
$$\sum_{i=1}^m |u_i| = \max_{\epsilon_1 = \pm 1, \dots, \epsilon_m = \pm 1} \sum_i \epsilon_i u_i,$$
we can convert (!) into the LO program
$$
\min_{\theta, \tau} \left\{ \tau : \sum_{i=1}^m \epsilon_i [y_i - x_i^T \theta] \le \tau\ \ \forall \epsilon_1 = \pm 1, \dots, \epsilon_m = \pm 1 \right\}
$$
with $1 + n$ variables and $2^m$ constraints.
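
The “intelligent way” can be tried on a toy data set. The sketch below assumes SciPy's `linprog` and uses hypothetical data with a single constant regressor (n = 1, m = 3), so that the ℓ1-fit is simply the median of the observations; the third observation is an outlier.

```python
import numpy as np
from scipy.optimize import linprog

X = np.ones((3, 1))                  # hypothetical regressors: x_i = 1
y = np.array([1.0, 2.0, 10.0])       # hypothetical outputs, the last one an outlier
m, n = X.shape
I = np.eye(m)

# Variables: [theta (n entries), tau, tau_1, ..., tau_m]; objective: tau.
obj = np.zeros(n + 1 + m)
obj[n] = 1.0
A_ub = np.vstack([
    np.hstack([-X, np.zeros((m, 1)), -I]),                      # y_i - x_i^T theta <= tau_i
    np.hstack([X, np.zeros((m, 1)), -I]),                       # x_i^T theta - y_i <= tau_i
    np.hstack([np.zeros((1, n)), [[-1.0]], np.ones((1, m))]),   # sum_i tau_i <= tau
])
b_ub = np.concatenate([-y, y, [0.0]])

# All variables are free; nonnegativity of tau_i follows from the constraints.
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
theta_l1, tau = res.x[:n], res.x[n]
print(theta_l1, tau)
```

The program has 1 + n + m = 5 variables and 2m + 1 = 7 constraints, in line with the count above. The fitted value is the median 2, whereas the least squares fit on the same data would be the mean 13/3, pulled towards the outlier.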
2. Uniform fit $\phi(u, v) = \|u - v\|_\infty := \max_i |u_i - v_i|$:
$$
\min_{\theta, \tau} \left\{ \tau : y_i - x_i^T \theta \le \tau,\ x_i^T \theta - y_i \le \tau,\ 1 \le i \le m \right\}
$$
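
A matching sketch for the uniform fit, again with hypothetical data (a single constant regressor and three observations) and SciPy's `linprog`. With a constant regressor the uniform fit is the midpoint between the smallest and the largest observation.

```python
import numpy as np
from scipy.optimize import linprog

X = np.ones((3, 1))                  # hypothetical regressors: x_i = 1
y = np.array([1.0, 2.0, 10.0])       # hypothetical outputs
m, n = X.shape
ones = np.ones((m, 1))

# Variables: [theta (n entries), tau]; objective: tau.
obj = np.zeros(n + 1)
obj[n] = 1.0
A_ub = np.vstack([
    np.hstack([-X, -ones]),          # y_i - x_i^T theta <= tau
    np.hstack([X, -ones]),           # x_i^T theta - y_i <= tau
])
b_ub = np.concatenate([-y, y])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
theta_unif, tau = res.x[:n], res.x[n]
print(theta_unif, tau)
```

Compared with the ℓ1-fit, the uniform fit reacts strongly to the largest observation, which illustrates why the choice of the discrepancy measure should follow the type of noise.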

Sparsity-oriented Signal Processing
and $\ell_1$ minimization

♣ Compressed Sensing: We have an $m$-dimensional observation
$$
y = [y_1; \dots; y_m] = X\theta_* + \xi
\qquad [X \in \mathbf{R}^{m \times n}:\ \text{sensing matrix},\ \xi:\ \text{observation noise}]
$$
of unknown “signal” $\theta_* \in \mathbf{R}^n$ with $m \ll n$ and want to recover $\theta_*$.
♠ Since $m \ll n$, the system $X\theta = y - \xi$ in variables $\theta$, if solvable, has infinitely
many solutions
⇒ Even in the noiseless case, $\theta_*$ cannot be recovered well, unless additional
information on $\theta_*$ is available.
♠ In Compressed Sensing, the additional information on $\theta_*$ is that $\theta_*$ is sparse:
it has at most a given number $s \ll m$ of nonzero entries.

♠ Fact: Many real-life signals $x$, when represented by their coefficients in a
properly selected basis (“dictionary”) $B$:
$$x = Bu$$
• columns of $B$: vectors of basis $B$
• $u$: coefficients of $x$ in basis $B$
become sparse (or nearly so): $u$ has just $s \ll n$ nonzero entries (or can be
well approximated by a vector with $s \ll n$ nonzero entries). We do not assume
the location of “meaningful coefficients” known in advance.

Example I: Typical audio signals become sparse (or nearly so) when
represented “in frequency domain”, as sums of harmonic oscillations of
different frequencies:
[Figure, two panels over the range 0 to 300. Top: signal in time domain. Bottom: decomposition of signal in harmonic oscillations]

Illustration: 25 sec fragment of audio signal “Mail must go through”
(dimension 1,058,400) and its “Fourier coefficients”, the amplitudes of
participating harmonic oscillations:
[Figure, two panels. Left: how mail goes through in time domain. Right: how mail goes through in frequency domain]

% of leading Fourier coefficients kept    energy
100%                                      100%
25%                                       99.8%
15%                                       99.6%
5%                                        98.2%
1%                                        79.0%

Example II: The 256 × 256 image
[Figure: the original 256 × 256 image]
can be thought of as a $256^2 = 65536$-dimensional vector (write down the intensities of pixels
column by column). This image (same as other “non-pathological” images) is nearly sparse
when represented in wavelet basis:
[Figure, four panels: the image restored from its leading wavelet coefficients]

% of leading wavelet coefficients kept    energy
1%                                        99.70%
5%                                        99.93%
10%                                       99.96%
25%                                       99.99%

♠ When recovering a signal $x_*$ admitting a sparse (or nearly so) representation $Bu_*$ in a
known basis $B$ from observations
$$y = Ax_* + \eta,$$
the situation reduces to the one where the signal to be recovered is just sparse.
Indeed, we can first recover sparse $u_*$ from observations
$$y = Ax_* + \eta = [AB]u_* + \eta.$$
After an estimate $\hat u$ of $u_*$ is built, we can estimate $x_*$ by $B\hat u$.
⇒ In fact, sparse recovery is about how to recover a sparse $n$-dimensional signal $x_*$ from
$m \ll n$ observations
$$y = Ax_* + \eta.$$

1.42
♣ Observation: Assume that ξ = 0, and let every m × 2s submatrix of X be of rank
2s (which will typically be the case when m ≥ 2s). Then θ∗ is the optimal solution to the
optimization problem
min_θ {nnz(θ) : Xθ = y}    [nnz(θ) = Card{j : θ_j ≠ 0}]
♠ Bad news: Problem
min_θ {nnz(θ) : Xθ = y}
is heavily computationally intractable in the large-scale case.
Indeed, essentially the only known algorithm for solving the problem is a “brute force” one:
Look one by one at all subsets I of {1, ..., n} of cardinality 0, 1, 2, ..., n, trying each
time to solve the linear system
Xθ = y, θ_i = 0, i ∉ I
in variables θ. When the first solvable system is met, take its solution as θ̃.
• When s = 5, n = 100, the best known upper bound on the number of steps in this
algorithm is ≈ 7.53e7, which perhaps is doable.
• When s = 20, n = 200, the bound blows up to ≈ 1.61e27, which is by many orders of
magnitude beyond our “computational grasp.”
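
The two step counts above can be reproduced with a short sketch. It assumes (our reading, not stated on the slide) that the quoted bound is the number C(n, s) of index sets of cardinality exactly s, which dominates the search; the function name is illustrative.

```python
import math

def supports_of_size(n: int, s: int) -> int:
    """Number of index sets I in {1, ..., n} with exactly s elements."""
    return math.comb(n, s)

# s = 5, n = 100: about 7.53e7 candidate supports
print(f"{supports_of_size(100, 5):.2e}")
# s = 20, n = 200: on the order of 1e27 candidate supports
print(f"{supports_of_size(200, 20):.2e}")
```

Each candidate support costs one linear solve, so the count is only a proxy for the running time.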

1.43
♠ Partial remedy: Replace the difficult-to-minimize objective nnz(θ) with an “easy-to-minimize” objective, specifically, ‖θ‖_1 = ∑_i |θ_i|. As a result, we arrive at ℓ1-recovery
θ̂ = argmin_θ {‖θ‖_1 : Xθ = y}
⇔ min_{θ,z} {∑_j z_j : Xθ = y, −z_j ≤ θ_j ≤ z_j ∀j ≤ n}.
♠ When the observation is noisy: y = Xθ∗ + ξ and we know an upper bound δ on a norm
‖ξ‖ of the noise, the ℓ1-recovery becomes
θ̂ = argmin_θ {‖θ‖_1 : ‖Xθ − y‖ ≤ δ}.
When ‖·‖ is ‖·‖_∞, the latter problem is an LO program:
min_{θ,z} {∑_j z_j : −δ ≤ [Xθ − y]_i ≤ δ ∀i ≤ m, −z_j ≤ θ_j ≤ z_j ∀j ≤ n}.

1.44
♣ Compressed Sensing theory shows that under some difficult to verify assumptions on
X, which are satisfied with overwhelming probability when X is a large randomly selected
matrix, ℓ1-minimization recovers
— exactly in the noiseless case, and
— with error ≤ C(X)δ in the noisy case
all s-sparse signals with s ≤ O(1)m/ ln(n/m).
♠ How it works:
• X: 620×2048 • θ∗ : 10 nonzeros • δ = 0.005
[Figure: plot of the recovery]
ℓ1-recovery, ‖θ̂ − θ∗‖_∞ ≤ 8.9e−4

1.45
♣ Curious (and sad) fact: Theory of Compressed Sensing states that “nearly all” large
randomly generated m × n sensing matrices X are s-good with s as large as O(1) m/ln(n/m),
meaning that for these matrices, ℓ1-minimization in the noiseless case recovers exactly all
s-sparse signals with the indicated value of s.
However: No individual sensing matrices with the outlined property are known. For all
known m × n sensing matrices with 1 ≪ m ≪ n, the provable level of goodness does not
exceed O(1)√m... For example, for the 620 × 2048 matrix X from the above numerical
illustration we have m/ ln(n/m) ≈ 518
⇒ we could expect X to be s-good with s of order of hundreds. In fact we can certify
s-goodness of X with s = 10, and can certify that X is not s-good with s = 59.
Note: The best known verifiable sufficient condition for X to be s-good is
min_Y ‖Col_j(I_n − Y^T X)‖_{s,1} < 1/(2s)
[• Col_j(A): j-th column of A
 • ‖u‖_{s,1}: the sum of s largest magnitudes of entries in u]
This condition reduces to LO.

1.46
What Can Be Reduced to LO?

♣ We have seen numerous examples of optimization programs which can be reduced to
LO, although in its original “maiden” form the program is not an LO one. Typical “maiden
form” of an MP problem is
(MP): max_{x∈X⊂R^n} f(x),
X = {x ∈ R^n : g_i(x) ≤ 0, 1 ≤ i ≤ m}
In LO,
• The objective is linear
• The constraints are affine
♠ Observation: Every MP program is equivalent to a program with linear objective.
Indeed, adding slack variable τ, we can rewrite (MP) equivalently as
max_{y=[x;τ]∈Y} c^T y := τ,
Y = {[x; τ] : g_i(x) ≤ 0, τ − f(x) ≤ 0}
⇒ we lose nothing when assuming from the very beginning that the objective in (MP) is
linear: f(x) = c^T x.

1.47
(MP): max_{x∈X⊂R^n} c^T x,
X = {x ∈ R^n : g_i(x) ≤ 0, 1 ≤ i ≤ m}
♣ Definition: A polyhedral representation of a set X ⊂ R^n is a representation of X of the
form:
X = {x : ∃w : Px + Qw ≤ r},
that is, a representation of X as the projection onto the space of x-variables of a polyhedral
set X⁺ = {[x; w] : Px + Qw ≤ r} in the space of x, w-variables.
♠ Observation: Given a polyhedral representation of the feasible set X of (MP), we can
pose (MP) as the LO program
max_{[x;w]} {c^T x : Px + Qw ≤ r}.

1.48
♠ Examples of polyhedral representations:
• The set X = {x ∈ R^n : ∑_i |x_i| ≤ 1} admits the p.r.
X = {x ∈ R^n : ∃w ∈ R^n : −w_i ≤ x_i ≤ w_i, 1 ≤ i ≤ n, ∑_i w_i ≤ 1}.
• The set
X = {x ∈ R^6 : max[x_1, x_2, x_3] + 2 max[x_4, x_5, x_6] ≤ x_1 − x_6 + 5}
admits the p.r.
X = {x ∈ R^6 : ∃w ∈ R^2 : x_1 ≤ w_1, x_2 ≤ w_1, x_3 ≤ w_1,
                          x_4 ≤ w_2, x_5 ≤ w_2, x_6 ≤ w_2,
                          w_1 + 2w_2 ≤ x_1 − x_6 + 5}.
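
As a sanity check of the second example, here is a minimal Python sketch (function names are ours, purely illustrative). It relies on the observation that the smallest slacks compatible with x_i ≤ w_1 and x_j ≤ w_2 are the two maxima, so a point x belongs to X exactly when this particular w is feasible.

```python
def pr_feasible(x, w):
    """Check all seven inequalities of the polyhedral representation at (x, w)."""
    x1, x2, x3, x4, x5, x6 = x
    w1, w2 = w
    return (max(x1, x2, x3) <= w1 and max(x4, x5, x6) <= w2
            and w1 + 2 * w2 <= x1 - x6 + 5)

def in_X(x):
    """x is in X iff the smallest admissible slacks satisfy the last inequality."""
    w = (max(x[0:3]), max(x[3:6]))
    return pr_feasible(x, w)

print(in_X([0, 0, 0, 0, 0, 0]))  # True:  0 + 2*0 <= 5
print(in_X([0, 0, 0, 3, 0, 0]))  # False: 0 + 2*3 >  5
```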

1.49
Is a Polyhedrally Represented Set
Polyhedral?

♣ Question: Let X be given by a polyhedral representation:
X = {x ∈ R^n : ∃w : Px + Qw ≤ r},
that is, as the projection of the solution set
Y = {[x; w] : Px + Qw ≤ r} (∗)
of a finite system of linear inequalities in variables x, w onto the space of x-variables.
Is it true that X is polyhedral, i.e., X is a solution set of a finite system of linear inequalities
in variables x only?
Theorem. Every polyhedrally representable set is polyhedral.
Proof is given by the Fourier — Motzkin elimination scheme which demonstrates that the
projection of the set (∗) onto the space of x-variables is a polyhedral set.

1.50
Y = {[x; w] : Px + Qw ≤ r}, (∗)
Elimination step: eliminating a single slack variable. Given set (∗), assume that
w = [w_1; ...; w_m] is nonempty, and let Y⁺ be the projection of Y on the space of variables
x, w_1, ..., w_{m−1}:
Y⁺ = {[x; w_1; ...; w_{m−1}] : ∃w_m : Px + Qw ≤ r} (!)
Let us prove that Y⁺ is polyhedral. Indeed, let us split the linear inequalities p_i^T x + q_i^T w ≤ r_i,
1 ≤ i ≤ I, defining Y into three groups:
• black – the coefficient at w_m is 0
• red – the coefficient at w_m is > 0
• green – the coefficient at w_m is < 0
Then
Y = {[x; w] :
  a_i^T x + b_i^T [w_1; ...; w_{m−1}] ≤ c_i, i is black
  w_m ≤ a_i^T x + b_i^T [w_1; ...; w_{m−1}] + c_i, i is red
  w_m ≥ a_i^T x + b_i^T [w_1; ...; w_{m−1}] + c_i, i is green}
⇒
Y⁺ = {[x; w_1; ...; w_{m−1}] :
  a_i^T x + b_i^T [w_1; ...; w_{m−1}] ≤ c_i, i is black
  a_µ^T x + b_µ^T [w_1; ...; w_{m−1}] + c_µ ≥ a_ν^T x + b_ν^T [w_1; ...; w_{m−1}] + c_ν
  whenever µ is red and ν is green}
and thus Y⁺ is polyhedral.
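
The elimination step can be sketched in a few lines of Python. This is an illustration under our own conventions, not code from the course: an inequality is stored as a pair (coeffs, rhs) meaning coeffs·v ≤ rhs, the variable to eliminate is the last coordinate of v, and exact rational arithmetic is used.

```python
from fractions import Fraction

def eliminate_last(rows):
    """One Fourier-Motzkin step: project {v : coeffs.v <= rhs} along the last variable."""
    black, red, green = [], [], []
    for coeffs, rhs in rows:
        coeffs = [Fraction(c) for c in coeffs]
        rhs = Fraction(rhs)
        a = coeffs[-1]                      # coefficient at the eliminated variable
        if a == 0:
            black.append((coeffs[:-1], rhs))
        else:
            # scale so that the eliminated variable enters with coefficient +1 or -1
            scaled = ([c / abs(a) for c in coeffs[:-1]], rhs / abs(a))
            (red if a > 0 else green).append(scaled)
    out = list(black)
    # red:   u.v' + w <= p;  green: g.v' - w <= q;  adding them removes w
    for u, p in red:
        for g, q in green:
            out.append(([x + y for x, y in zip(u, g)], p + q))
    return out

# Variables (x, w): constraints x - w <= 0, -x - w <= 0, w <= 1, i.e. |x| <= w <= 1.
system = [([1, -1], 0), ([-1, -1], 0), ([0, 1], 1)]
projection = eliminate_last(system)   # encodes x <= 1 and -x <= 1
```

When there are no red or no green inequalities, the eliminated variable is unbounded in one direction and only the black inequalities survive, which the code handles without a special case.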

1.51
We have seen that the projection
Y + = {[x; w1 ; ...; wm−1 ] : ∃wm : [x; w1 ; ...; wm ] ∈ Y }
of the polyhedral set Y = {[x, w] : P x + Qw ≤ r} is polyhedral. Iterating the process, we
conclude that the set X = {x : ∃w : [x, w] ∈ Y } is polyhedral, Q.E.D.
♣ Given an LO program
Opt = max_x {c^T x : Ax ≤ b}, (!)
observe that the set of values of the objective at feasible solutions can be represented as
T = {τ ∈ R : ∃x : Ax ≤ b, c^T x − τ = 0}
= {τ ∈ R : ∃x : Ax ≤ b, c^T x ≤ τ, c^T x ≥ τ}
that is, T is polyhedrally representable. By the Theorem, T is polyhedral, that is, T can be
represented by a finite system of nonstrict linear inequalities in variable τ only. It immediately
follows that if T is nonempty and is bounded from above, T has the largest element. Thus,
we have proved
Corollary. A feasible and bounded LO program admits an optimal solution and thus is
solvable.

1.52
T = {τ ∈ R : ∃x : Ax ≤ b, cT x − τ = 0}
= {τ ∈ R : ∃x : Ax ≤ b, cT x ≤ τ, cT x ≥ τ }
♣ Fourier-Motzkin Elimination Scheme suggests a finite algorithm for solving an LO pro-
gram, where we
• first, apply the scheme to get a representation of T by a finite system S of linear inequal-
ities in variable τ ,
• second, analyze S to find out whether the solution set is nonempty and bounded from
above, and when it is the case, to find out the optimal value Opt ∈ T of the program,
• third, use the Fourier-Motzkin elimination scheme in the backward fashion to find x such
that Ax ≤ b and cT x = Opt, thus recovering an optimal solution to the problem of interest.

Bad news: The resulting algorithm is completely impractical, since the number of inequal-
ities we should handle at an elimination step usually rapidly grows with the step number
and can become astronomically large when eliminating just tens of variables.
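
To get a feeling for this growth, here is a small sketch under a worst-case assumption of ours: at every step there are no black inequalities and the remaining ones split evenly into red and green, so m inequalities turn into about (m/2)^2. Real systems may behave better; the function name is illustrative.

```python
def worst_case_counts(m: int, steps: int):
    """Inequality counts after each elimination step under an even red/green split."""
    counts = [m]
    for _ in range(steps):
        red = m // 2
        green = m - red
        m = red * green          # every red/green pair yields one new inequality
        counts.append(m)
    return counts

print(worst_case_counts(8, 3))   # [8, 16, 64, 1024]
```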

1.53
Polyhedrally Representable Functions

♣ Definition: Let f be a real-valued function on a set Dom f ⊂ R^n. The epigraph of f is
the set
Epi{f} = {[x; τ] ∈ R^n × R : x ∈ Dom f, τ ≥ f(x)}.
A polyhedral representation of Epi{f} is called a polyhedral representation of f. Function
f is called polyhedrally representable, if it admits a polyhedral representation.
♠ Observation: A Lebesgue set {x ∈ Dom f : f(x) ≤ a} of a polyhedrally representable
function is polyhedral, with a p.r. readily given by a p.r. of Epi{f}:
Epi{f} = {[x; τ] : ∃w : Px + τp + Qw ≤ r}
⇒ {x : x ∈ Dom f, f(x) ≤ a} = {x : ∃w : Px + ap + Qw ≤ r}.

1.54
Examples: • The function f(x) = max_{1≤i≤I} [α_i^T x + β_i] is polyhedrally representable:
Epi{f} = {[x; τ] : α_i^T x + β_i − τ ≤ 0, 1 ≤ i ≤ I}.
• Extension: Let D = {x : Ax ≤ b} be a polyhedral set in R^n. A function f with the domain
D given in D as f(x) = max_{1≤i≤I} [α_i^T x + β_i] is polyhedrally representable:
Epi{f} = {[x; τ] : x ∈ D, τ ≥ max_{1≤i≤I} [α_i^T x + β_i]}
= {[x; τ] : Ax ≤ b, α_i^T x − τ + β_i ≤ 0, 1 ≤ i ≤ I}.
In fact, every polyhedrally representable function f is of the form stated in Extension.

1.55
Calculus of Polyhedral Representations

♣ In principle, speaking about polyhedral representations of sets and functions, we could
restrict ourselves to representations which do not exploit slack variables, specifically,
• for sets – to representations of the form
X = {x ∈ R^n : Ax ≤ b};
• for functions – to representations of the form
Epi{f} = {[x; τ] : Ax ≤ b, τ ≥ max_{1≤i≤I} [α_i^T x + β_i]}

♠ However, “general” – involving slack variables – polyhedral representations of sets and
functions are much more flexible and can be much more “compact” than the straightforward
– without slack variables – representations.

1.56
Examples:
• The function f(x) = ‖x‖_1 : R^n → R admits the p.r.
Epi{f} = {[x; τ] : ∃w ∈ R^n : −w_i ≤ x_i ≤ w_i, 1 ≤ i ≤ n, ∑_i w_i ≤ τ}
which requires n slack variables and 2n + 1 linear inequality constraints. In contrast to this,
the straightforward – without slack variables – representation of f
Epi{f} = {[x; τ] : ∑_{i=1}^n ε_i x_i ≤ τ ∀(ε_1 = ±1, ..., ε_n = ±1)}
requires 2^n inequality constraints.
• The set X = {x ∈ R^n : ∑_{i=1}^n max[x_i, 0] ≤ 1} admits the p.r.
X = {x ∈ R^n : ∃w : 0 ≤ w, x_i ≤ w_i ∀i, ∑_i w_i ≤ 1}
which requires n slack variables and 2n + 1 inequality constraints. Every straightforward –
without slack variables – p.r. of X requires at least 2^n − 1 constraints
∑_{i∈I} x_i ≤ 1, ∅ ≠ I ⊂ {1, ..., n}
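
The first example can be checked numerically. The sketch below (our own illustration, with a made-up function name) verifies that the 2^n sign-vector inequalities describe the same epigraph as the ℓ1 norm, and compares the two constraint counts.

```python
from itertools import product

def l1_via_sign_vectors(x):
    """max over eps in {-1, +1}^n of sum_i eps_i * x_i; this equals ||x||_1."""
    return max(sum(e * xi for e, xi in zip(eps, x))
               for eps in product((-1, 1), repeat=len(x)))

x = [3, -1, 0, 2]
assert l1_via_sign_vectors(x) == sum(abs(xi) for xi in x)

n = len(x)
with_slacks, without_slacks = 2 * n + 1, 2 ** n
print(with_slacks, without_slacks)   # 9 16
```

Already for n = 30 the slack-free description needs more than a billion inequalities, against 61 with slacks.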

1.57
♣ Polyhedral representations admit a kind of simple and “fully algorithmic” calculus which,
essentially, demonstrates that all convexity-preserving operations with polyhedral sets pro-
duce polyhedral results, and a p.r. of the result is readily given by p.r.’s of the operands.
♠ Role of Convexity: A set X ⊂ R^n is called convex, if whenever two points x, y belong
to X, the entire segment [x, y] linking these points belongs to X:
∀(x, y ∈ X, λ ∈ [0, 1]) : x + λ(y − x) = (1 − λ)x + λy ∈ X.
A function f : Dom f → R is called convex, if its epigraph Epi{f} is a convex set, or, equivalently, if
x, y ∈ Dom f, λ ∈ [0, 1]
⇒ f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y).

1.58
Fact: A polyhedral set X = {x : Ax ≤ b} is convex. In particular, a polyhedrally repre-
sentable function is convex.
Indeed,
Ax ≤ b, Ay ≤ b, λ ≥ 0, 1 − λ ≥ 0
⇒ A(1 − λ)x ≤ (1 − λ)b, Aλy ≤ λb
⇒ A[(1 − λ)x + λy] ≤ b
Consequences:
• lack of convexity makes polyhedral representation of a set/function impossible,
• consequently, operations with functions/sets allowed by the “calculus of polyhedral representability” we intend to develop should be convexity-preserving operations.

1.59
Calculus of Polyhedral Sets

♠ Raw materials: X = {x ∈ R^n : a^T x ≤ b} (when a ≠ 0, or, which is the same, the set is
nonempty and differs from the entire space, such a set is called half-space)
♠ Calculus rules:
S.1. Taking finite intersections: If the sets X_i ⊂ R^n, 1 ≤ i ≤ k, are polyhedral, so is their
intersection, and a p.r. of the intersection is readily given by p.r.’s of the operands.
Indeed, if
X_i = {x ∈ R^n : ∃w^i : P_i x + Q_i w^i ≤ r_i}, i = 1, ..., k,
then
∩_{i=1}^k X_i = {x : ∃w = [w^1; ...; w^k] : P_i x + Q_i w^i ≤ r_i, 1 ≤ i ≤ k},
which is a polyhedral representation of ∩_i X_i.

1.60
S.2. Taking direct products. Given k sets Xi ⊂ Rni , their direct product X1 × ... × Xk is
the set in Rn1 +...+nk comprised of all block-vectors x = [x1 ; ...; xk ] with blocks xi belonging
to Xi , i = 1, ..., k. E.g., the direct product of k segments [−1, 1] on the axis is the unit
k-dimensional box {x ∈ Rk : −1 ≤ xi ≤ 1, i = 1, ..., k}.
If the sets Xi ⊂ Rni , 1 ≤ i ≤ k, are polyhedral, so is their direct product, and a p.r. of the
product is readily given by p.r.’s of the operands.
Indeed, if
X_i = {x^i ∈ R^{n_i} : ∃w^i : P_i x^i + Q_i w^i ≤ r_i}, i = 1, ..., k,
then
X_1 × ... × X_k
= {x = [x^1; ...; x^k] : ∃w = [w^1; ...; w^k] : P_i x^i + Q_i w^i ≤ r_i, 1 ≤ i ≤ k}.

1.61
S.3. Taking affine image. If X ⊂ Rn is a polyhedral set and y = Ax + b : Rn → Rm is an
affine mapping, then the set
Y = AX + b := {y = Ax + b : x ∈ X} ⊂ Rm
is polyhedral, with p.r. readily given by the mapping and a p.r. of X.
Indeed, if X = {x : ∃w : Px + Qw ≤ r}, then
Y = {y : ∃[x; w] : Px + Qw ≤ r, y = Ax + b}
= {y : ∃[x; w] : Px + Qw ≤ r, y − Ax ≤ b, Ax − y ≤ −b}.
Since Y admits a p.r., Y is polyhedral.

1.62
S.4. Taking inverse affine image. If X ⊂ Rn is polyhedral, and x = Ay + b : Rm → Rn is an
affine mapping, then the set
Y = {y ∈ Rm : Ay + b ∈ X} ⊂ Rm
is polyhedral, with p.r. readily given by the mapping and a p.r. of X.
Indeed, if X = {x : ∃w : P x + Qw ≤ r}, then
Y = {y : ∃w : P [Ay + b] + Qw ≤ r}
= {y : ∃w : [P A]y + Qw ≤ r − P b}.
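
Rule S.4 is “fully algorithmic”: the data of the new representation are just PA and r − Pb. A minimal sketch with plain Python lists (the helper name and the tiny example are ours):

```python
def inverse_affine_image(P, Q, r, A, b):
    """From X = {x : exists w, P x + Q w <= r} and the map y -> A y + b,
    return (PA, Q, r - P b), the data of Y = {y : exists w, (PA) y + Q w <= r - P b}."""
    PA = [[sum(P[i][k] * A[k][j] for k in range(len(A)))
           for j in range(len(A[0]))] for i in range(len(P))]
    Pb = [sum(P[i][k] * b[k] for k in range(len(b))) for i in range(len(P))]
    return PA, Q, [ri - pbi for ri, pbi in zip(r, Pb)]

# X = [-1, 1] on the axis (no slack variables), map y -> 2y + 1.
P, Q, r = [[1], [-1]], [[], []], [1, 1]
PA, Q_new, rhs = inverse_affine_image(P, Q, r, A=[[2]], b=[1])
print(PA, rhs)   # [[2], [-2]] [0, 2]
```

The result reads 2y ≤ 0, −2y ≤ 2, that is, Y = [−1, 0], which is indeed the set of y with 2y + 1 ∈ [−1, 1].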

1.63
S.5. Taking arithmetic sum: If the sets Xi ⊂ Rn , 1 ≤ i ≤ k, are polyhedral, so is their
arithmetic sum X1 + ... + Xk := {x = x1 + ... + xk : xi ∈ Xi , 1 ≤ i ≤ k}, and a p.r. of the sum
is readily given by p.r.’s of the operands.
Indeed, the arithmetic sum of X1 , ..., Xk is the image of X1 ×... ×Xk under the linear mapping
[x1 ; ...; xk ] 7→ x1 + ... + xk , and both operations preserve polyhedrality. Here is an explicit p.r.
for the sum:
if X_i = {x : ∃w^i : P_i x + Q_i w^i ≤ r_i}, 1 ≤ i ≤ k, then
X_1 + ... + X_k
= {x : ∃x^1, ..., x^k, w^1, ..., w^k : P_i x^i + Q_i w^i ≤ r_i, 1 ≤ i ≤ k, x = ∑_{i=1}^k x^i},
and it remains to replace the vector equality in the right hand side by a system of two
opposite vector inequalities.

1.64
Calculus of Polyhedrally Representable
Functions

♣ Preliminaries: Arithmetic of partially defined functions.
• a scalar function f of n variables is specified by indicating its domain Dom f – the set
where the function is well defined, and by the description of f as a real-valued function in
the domain.
When speaking about convex functions f , it is very convenient to think of f as of a function
defined everywhere on Rn and taking real values in Domf and the value +∞ outside of
Domf .
With this convention, f becomes an everywhere defined function on Rn taking values in
R ∪ {+∞}, and Domf becomes the set where f takes real values.

1.65
♠ In order to allow for basic operations with partially defined functions, like their addition or
comparison, we augment our convention with the following agreements on the arithmetics
of the “extended real axis” R ∪ {+∞}:
• Addition: for a real a, a + (+∞) = (+∞) + (+∞) = +∞.
• Multiplication by a nonnegative real λ: λ · (+∞) = +∞ when λ > 0, and 0 · (+∞) = 0.
• Comparison: for a real a, a < +∞ (and thus a ≤ +∞ as well), and of course +∞ ≤ +∞.
Note: Our arithmetic is incomplete — operations like (+∞) − (+∞) and (−1) · (+∞)
remain undefined.
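
These conventions differ in one place from IEEE floating-point arithmetic, where 0 · (+∞) is NaN rather than 0. A minimal sketch of the slide's rules in Python (helper names are ours):

```python
import math

INF = math.inf

def ext_add(a, b):
    """Addition on R ∪ {+inf}; ordinary float addition already obeys the convention."""
    return a + b

def ext_mul(lam, a):
    """Multiply a in R ∪ {+inf} by a nonnegative real lam, with 0 * (+inf) = 0."""
    if lam < 0:
        raise ValueError("multiplication by negative reals is left undefined")
    if lam == 0:
        return 0.0               # explicit branch: float 0 * inf would give nan
    return lam * a

print(ext_add(1.0, INF), ext_mul(2.0, INF), ext_mul(0.0, INF))   # inf inf 0.0
```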

1.66
♠ Raw materials: f (x) = aT x + b (affine functions)

Epi{aT x + b} = {[x; τ ] : aT x + b − τ ≤ 0}

♠ Calculus rules:
F.1. Taking linear combinations with positive coefficients. If f_i : R^n → R ∪ {+∞} are p.r.f.’s
and λ_i > 0, 1 ≤ i ≤ k, then f(x) = ∑_{i=1}^k λ_i f_i(x) is a p.r.f., with a p.r. readily given by those
of the operands.
Indeed, if
{[x; τ] : τ ≥ f_i(x)}
= {[x; τ] : ∃w^i : P_i x + τ p_i + Q_i w^i ≤ r_i}, 1 ≤ i ≤ k,
then
{[x; τ] : τ ≥ ∑_{i=1}^k λ_i f_i(x)}
= {[x; τ] : ∃t_1, ..., t_k : t_i ≥ f_i(x), 1 ≤ i ≤ k, ∑_i λ_i t_i ≤ τ}
= {[x; τ] : ∃t_1, ..., t_k, w^1, ..., w^k : P_i x + t_i p_i + Q_i w^i ≤ r_i, 1 ≤ i ≤ k, ∑_i λ_i t_i ≤ τ}.

1.67
F.2. Direct summation. If f_i : R^{n_i} → R ∪ {+∞}, 1 ≤ i ≤ k, are p.r.f.’s, then so is their direct
sum
f([x^1; ...; x^k]) = ∑_{i=1}^k f_i(x^i) : R^{n_1+...+n_k} → R ∪ {+∞}
and a p.r. for this function is readily given by p.r.’s of the operands.
Indeed, if
{[x^i; τ] : τ ≥ f_i(x^i)}
= {[x^i; τ] : ∃w^i : P_i x^i + τ p_i + Q_i w^i ≤ r_i}, 1 ≤ i ≤ k,
then
{[x^1; ...; x^k; τ] : τ ≥ ∑_{i=1}^k f_i(x^i)}
= {[x^1; ...; x^k; τ] : ∃t_1, ..., t_k : t_i ≥ f_i(x^i), 1 ≤ i ≤ k, ∑_i t_i ≤ τ}
= {[x^1; ...; x^k; τ] : ∃t_1, ..., t_k, w^1, ..., w^k : P_i x^i + t_i p_i + Q_i w^i ≤ r_i, 1 ≤ i ≤ k, ∑_i t_i ≤ τ}.

1.68
F.3. Taking maximum. If fi : Rn → R ∪ {+∞} are p.r.f.’s, so is their maximum f (x) =
max[f1 (x), ..., fk (x)], with a p.r. readily given by those of the operands.
Indeed, if
{[x; τ] : τ ≥ f_i(x)}
= {[x; τ] : ∃w^i : P_i x + τ p_i + Q_i w^i ≤ r_i}, 1 ≤ i ≤ k,
then
{[x; τ] : τ ≥ max_i f_i(x)}
= {[x; τ] : ∃w^1, ..., w^k : P_i x + τ p_i + Q_i w^i ≤ r_i, 1 ≤ i ≤ k}.

1.69
F.4. Affine substitution of argument. If a function f (x) : Rn → R ∪ {+∞} is a p.r.f. and x =
Ay + b : Rm → Rn is an affine mapping, then the function g(y) = f (Ay + b) : Rm → R ∪ {+∞}
is a p.r.f., with a p.r. readily given by the mapping and a p.r. of f .
Indeed, if
{[x; τ ] : τ ≥ f (x)}
= {[x; τ ] : ∃w : P x + τ p + Qw ≤ r},
then
{[y; τ ] : τ ≥ f (Ay + b)}
= {[y; τ ] : ∃w : P [Ay + b] + τ p + Qw ≤ r}
= {[y; τ ] : ∃w : [P A]y + τ p + Qw ≤ r − P b}.

1.70
F.5. Theorem on superposition. Let
• fi (x) : Rn → R ∪ {+∞} be p.r.f.’s, and let
• F(y) : R^m → R ∪ {+∞} be a p.r.f. which is nondecreasing w.r.t. every one of the variables
y_1, ..., y_m. Then the superposition
g(x) = F(f_1(x), ..., f_m(x)) when f_i(x) < +∞ ∀i, and g(x) = +∞ otherwise
of F and f_1, ..., f_m is a p.r.f., with a p.r. readily given by those of f_i and F.
Indeed, let
{[x; τ] : τ ≥ f_i(x)} = {[x; τ] : ∃w^i : P_i x + τ p_i + Q_i w^i ≤ r_i},
{[y; τ] : τ ≥ F(y)} = {[y; τ] : ∃w : Py + τp + Qw ≤ r}.
Then
{[x; τ] : τ ≥ g(x)}
= {[x; τ] : ∃y_1, ..., y_m : y_i ≥ f_i(x), 1 ≤ i ≤ m, F(y_1, ..., y_m) ≤ τ}    (∗)
= {[x; τ] : ∃y, w^1, ..., w^m, w : P_i x + y_i p_i + Q_i w^i ≤ r_i, 1 ≤ i ≤ m, Py + τp + Qw ≤ r},
where the equality (∗) is due to the monotonicity of F.

1.71
Note: if some of f_i, say, f_1, ..., f_k, are affine, then the Superposition Theorem remains valid
when we require the monotonicity of F w.r.t. the variables y_{k+1}, ..., y_m only; a p.r. of the
superposition in this case reads
{[x; τ] : τ ≥ g(x)}
= {[x; τ] : ∃y_{k+1}, ..., y_m : y_i ≥ f_i(x), k + 1 ≤ i ≤ m, F(f_1(x), ..., f_k(x), y_{k+1}, ..., y_m) ≤ τ}
= {[x; τ] : ∃y_1, ..., y_m, w^{k+1}, ..., w^m, w : y_i = f_i(x), 1 ≤ i ≤ k,
   P_i x + y_i p_i + Q_i w^i ≤ r_i, k + 1 ≤ i ≤ m, Py + τp + Qw ≤ r},
and the linear equalities y_i = f_i(x), 1 ≤ i ≤ k, can be replaced by pairs of opposite linear
inequalities.

1.72
Fast Polyhedral Approximation
of the Second Order Cone

♠ Fact: The canonical polyhedral representation X = {x ∈ R^n : Ax ≤ b} of the projection
X = {x : ∃w : Px + Qw ≤ r}
of a polyhedral set X⁺ = {[x; w] : Px + Qw ≤ r} given by a moderate number of linear
inequalities in variables x, w can require a huge number of linear inequalities in variables x.
Question: Can we use this phenomenon in order to approximate to high accuracy a non-
polyhedral set X ⊂ Rn by projecting onto Rn a higher-dimensional polyhedral and simple
(given by a moderate number of linear inequalities) set X + ?

1.73
Theorem: For every n and every ε, 0 < ε < 1/2, one can point out a polyhedral set L⁺
given by an explicit system of homogeneous linear inequalities in variables x ∈ R^n, t ∈ R,
w ∈ R^k:
L⁺ = {[x; t; w] : Px + tp + Qw ≤ 0} (!)
such that
• the number of inequalities in the system (≈ 2n ln(1/ε)) and the dimension of the slack
vector w (≈ 0.7n ln(1/ε)) do not exceed O(1)n ln(1/ε)
• the projection
L = {[x; t] : ∃w : Px + tp + Qw ≤ 0}
of L⁺ on the space of x, t-variables is in-between the Second Order Cone and the (1 + ε)-extension of this cone:
L^{n+1} := {[x; t] ∈ R^{n+1} : ‖x‖_2 ≤ t} ⊂ L
⊂ L^{n+1}_ε := {[x; t] ∈ R^{n+1} : ‖x‖_2 ≤ (1 + ε)t}.
In particular, we have
B_n^1 ⊂ {x : ∃w : Px + p + Qw ≤ 0} ⊂ B_n^{1+ε}
[B_n^r = {x ∈ R^n : ‖x‖_2 ≤ r}]

1.74
Note: When ε = 1.e-17, a usual computer does not distinguish between r = 1 and r =
1 + ε. Thus, for all practical purposes, the n-dimensional Euclidean ball admits polyhedral
representation with ≈ 28n slack variables and ≈ 79n linear inequality constraints.
Note: A straightforward representation X = {x : Ax ≤ b} of a polyhedral set X satisfying
B_n^1 ⊂ X ⊂ B_n^{1+ε}
requires at least N = O(1) ε^{−(n−1)/2} linear inequalities. With n = 100, ε = 0.01, we get
N ≥ 3.0e85 ≈ 300,000 × [# of atoms in universe]
With “fast polyhedral approximation” of B_n^1, a 0.01-approximation of B_100^1 requires just 922
linear inequalities on 100 original and 325 slack variables.
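
The sizes quoted in the two notes follow, up to rounding and lower-order terms, from the leading-order estimates 2n ln(1/ε) and 0.7n ln(1/ε) of the Theorem. A rough sketch (the function name is ours; exact counts depend on details of the construction that are not reproduced here):

```python
import math

def fast_approx_size(n: int, eps: float):
    """Leading-order estimates: (number of inequalities, number of slack variables)."""
    log_term = math.log(1.0 / eps)
    return 2 * n * log_term, 0.7 * n * log_term

# per unit of dimension at eps = 1e-17: close to the 79n and 28n of the first note
print(fast_approx_size(1, 1e-17))
# n = 100, eps = 0.01: close to the 922 inequalities and 325 slacks of the second note
print(fast_approx_size(100, 0.01))
```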

1.75
♣ With fast polyhedral approximation of the cone L^{n+1} = {[x; t] ∈ R^{n+1} : ‖x‖_2 ≤ t}, Conic
Quadratic Optimization programs
max_x {c^T x : ‖A_i x − b_i‖_2 ≤ c_i^T x + d_i, 1 ≤ i ≤ m}    (CQI)
“for all practical purposes” become LO programs. Note that numerous highly nonlinear
optimization problems, like
minimize c^T x subject to
Ax = b
x ≥ 0
(∑_{i=1}^8 |x_i|^3)^{1/3} ≤ x_2^{1/7} x_3^{2/7} x_4^{3/7} + 2 x_1^{1/5} x_5^{2/5} x_6^{1/5}
5x_2 ≥ 1/(x_1^{1/2} x_2^2) + 2/(x_2^{1/3} x_3^3 x_4^{5/8})
[ x_2  x_1   0    0  ]
[ x_1  x_4  x_3   0  ]  ⪰ 5I
[  0   x_3  x_6  x_3 ]
[  0    0   x_3  x_8 ]
exp{x_1} + 2 exp{2x_2 − x_3 + 4x_4} + 3 exp{x_5 + x_6 + x_7 + x_8} ≤ 12
can be in a systematic fashion converted to/rapidly approximated by problems of the form
(CQI) and thus “for all practical purposes” are just LO programs.

1.76
Geometry of a Polyhedral Set

♣ An LO program max_{x∈R^n} {c^T x : Ax ≤ b} is the problem of maximizing a linear objective over
a polyhedral set X = {x ∈ R^n : Ax ≤ b} – the solution set of a finite system of nonstrict
linear inequalities
⇒ Understanding geometry of polyhedral sets is the key to LO theory and algorithms.
♣ Our ultimate goal is to establish the following fundamental
Theorem. A nonempty polyhedral set
X = {x ∈ R^n : Ax ≤ b}
admits a representation of the form
X = {x = ∑_{i=1}^M λ_i v_i + ∑_{j=1}^N µ_j r_j : λ_i ≥ 0 ∀i, ∑_{i=1}^M λ_i = 1, µ_j ≥ 0 ∀j}    (!)
where v_i ∈ R^n, 1 ≤ i ≤ M and r_j ∈ R^n, 1 ≤ j ≤ N are properly chosen “generators.”
Vice versa, every set X representable in the form of (!) is polyhedral.

2.1
[Figure with four panels a), b), c), d)]
a): a polyhedral set
b): {∑_{i=1}^3 λ_i v_i : λ_i ≥ 0, ∑_{i=1}^3 λ_i = 1}
c): {∑_{j=1}^2 µ_j r_j : µ_j ≥ 0}
d): The set a) is the sum of sets b) and c)
Note: shown are the boundaries of the sets.

2.2
∅ ≠ X = {x ∈ R^n : Ax ≤ b}
⇕
X = {x = ∑_{i=1}^M λ_i v_i + ∑_{j=1}^N µ_j r_j : λ_i ≥ 0 ∀i, ∑_{i=1}^M λ_i = 1, µ_j ≥ 0 ∀j}    (!)

♠ X = {x ∈ R^n : Ax ≤ b} is an “outer” description of a polyhedral set X: it says what should
be cut off R^n to get X.
♠ (!) is an “inner” description of a polyhedral set X: it explains how we can get all points
of X, starting with two finite sets of vectors in R^n.
♥ Taken together, these two descriptions offer a powerful “toolbox” for investigating polyhedral sets. For example,
• To see that the intersection of two polyhedral subsets X, Y in R^n is polyhedral, we can
use their outer descriptions:
X = {x : Ax ≤ b}, Y = {x : Bx ≤ c}
⇒ X ∩ Y = {x : Ax ≤ b, Bx ≤ c}.

2.3
∅ ≠ X = {x ∈ R^n : Ax ≤ b}
⇕
X = {∑_{i=1}^M λ_i v_i + ∑_{j=1}^N µ_j r_j : λ_i ≥ 0 ∀i, ∑_{i=1}^M λ_i = 1, µ_j ≥ 0 ∀j}    (!)
• To see that the image Y = {y = Px + p : x ∈ X} of a polyhedral set X ⊂ R^n under an
affine mapping x ↦ Px + p : R^n → R^m is polyhedral, we can use the inner descriptions:
X is given by (!)
⇒ Y = {∑_{i=1}^M λ_i (P v_i + p) + ∑_{j=1}^N µ_j P r_j : λ_i ≥ 0 ∀i, ∑_{i=1}^M λ_i = 1, µ_j ≥ 0 ∀j}
2.4
Preliminaries: Linear Subspaces

♣ Definition: A linear subspace in R^n is a nonempty subset L of R^n which is closed w.r.t.
taking linear combinations of its elements:
x_i ∈ L, λ_i ∈ R, 1 ≤ i ≤ I ⇒ ∑_{i=1}^I λ_i x_i ∈ L
♣ Examples:
• L = Rn
• L = {0}
• L = {x ∈ Rn : x1 = 0}
• L = {x ∈ Rn : Ax = 0}
• Given a set X ⊂ R^n, let Lin(X) be the set of all finite linear combinations of vectors from
X. This set – the linear span of X – is a linear subspace which contains X, and it is the
intersection of all linear subspaces containing X.
Convention: A sum of vectors from Rn with empty set of terms is well defined and is the
zero vector. In particular, Lin(∅) = {0}.
♠ Note: The last two examples are “universal:” Every linear subspace L in Rn can be rep-
resented as L = Lin(X) for a properly chosen finite set X ⊂ Rn , same as can be represented
as L = {x : Ax = 0} for a properly chosen matrix A.

2.5
♣ Dimension of a linear subspace. Let L be a linear subspace in Rn .
♠ For properly chosen x_1, ..., x_m, we have
L = Lin({x_1, ..., x_m}) = {∑_{i=1}^m λ_i x_i : λ_i ∈ R};
whenever this is the case, we say that x_1, ..., x_m linearly span L.
♠ Facts:
♥ All minimal w.r.t. inclusion collections x1 , ..., xm linearly spanning L (they are called bases
of L) have the same cardinality m, called the dimension dim L of L.
♥ Vectors x1 , ..., xm forming a basis of L always are linearly independent, that is, every
nontrivial (not all coefficients are zero) linear combination of the vectors is a nonzero
vector.

2.6
♠ Facts:
♥ All collections x1 , ..., xm of linearly independent vectors from L which are maximal w.r.t.
inclusion (i.e., extending the collection by any vector from L, we get a linearly dependent
collection) have the same cardinality, namely, dim L, and are bases of L.
♥ Let x1 , ..., xm be vectors from L. Then the following four properties are equivalent:
• x1 , ..., xm is a basis of L
• m = dim L and x1 , ..., xm linearly span L
• m = dim L and x1 , ..., xm are linearly independent
• x1 , ..., xm are linearly independent and linearly span L

2.7
♠ Examples:
• dim {0} = 0, and the only basis of {0} is the empty collection.
• dim Rn = n. When n > 0, there are infinitely many bases in Rn , e.g., one comprised of
standard basic orths ei = [0; ...; 0; 1; 0; ...; 0] (”1” in i-th position), 1 ≤ i ≤ n.
• L = {x ∈ R^n : x_1 = 0} ⇒ dim L = n − 1. An example of a basis in L is e_2, e_3, ..., e_n.
Facts: ♥ if L ⊂ L′ are linear subspaces in R^n, then dim L ≤ dim L′, with equality taking place
if and only if L = L′.
⇒ Whenever L is a linear subspace in Rn , we have {0} ⊂ L ⊂ Rn , whence 0 ≤ dim L ≤ n
♥ In every representation of a linear subspace as L = {x ∈ Rn : Ax = 0}, the number of
rows in A is at least n − dim L. This number is equal to n − dim L if and only if the rows of
A are linearly independent.

2.8
“Calculus” of linear subspaces
♥ [taking intersection] When L_1, L_2 are linear subspaces in R^n, so is the set L_1 ∩ L_2.
Extension: The intersection ∩_{α∈A} L_α of an arbitrary family {L_α}_{α∈A} of linear subspaces of R^n
is a linear subspace.
♥ [summation] When L1 , L2 are linear subspaces in Rn , so is their arithmetic sum
L1 + L2 = {x = u + v : u ∈ L1 , v ∈ L2 }.
Note “dimension formula:”
dim L1 + dim L2 = dim (L1 + L2 ) + dim (L1 ∩ L2 )
♥ [taking orthogonal complement] When L is a linear subspace in Rn , so is its orthogonal
complement L⊥ = {y ∈ Rn : y T x = 0 ∀x ∈ L}.
Note:
• (L⊥ )⊥ = L
• L + L⊥ = Rn , L ∩ L⊥ = {0}, whence dim L + dim L⊥ = n
• L = {x : Ax = 0} if and only if the (transposes of the) rows in A linearly span L⊥
• x ∈ R^n ⇒ ∃!(x_1 ∈ L, x_2 ∈ L⊥) : x = x_1 + x_2, and for these x_1, x_2 one has x^T x = x_1^T x_1 + x_2^T x_2.
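
The dimension formula can be illustrated on a small example. The sketch below computes dimensions as ranks of generator lists by exact Gaussian elimination (the helper `rank` and the chosen subspaces are ours); dim(L_1 ∩ L_2) is then read off from the formula rather than computed independently.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors via Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

gens1 = [[1, 0, 0], [0, 1, 0]]    # L1 = Lin{e1, e2} in R^3
gens2 = [[0, 1, 0], [0, 0, 1]]    # L2 = Lin{e2, e3} in R^3
dim_sum = rank(gens1 + gens2)                              # dim (L1 + L2)
dim_intersection = rank(gens1) + rank(gens2) - dim_sum     # by the dimension formula
print(dim_sum, dim_intersection)  # 3 1
```

Here L_1 ∩ L_2 = Lin{e_2}, in agreement with the value 1.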

2.9
♥ [taking direct product] When L1 ⊂ Rn1 and L2 ⊂ Rn2 are linear subspaces, the direct
product (or direct sum) of L1 and L2 – the set
L1 × L2 := {[x1 ; x2 ] ∈ Rn1 +n2 : x1 ∈ L1 , x2 ∈ L2 }
is a linear subspace in Rn1 +n2 , and
dim (L1 × L2 ) = dim L1 + dim L2 .
♥ [taking image under linear mapping] When L is a linear subspace in R^n and x ↦ Px :
R^n → R^m is a linear mapping, the image PL = {y = Px : x ∈ L} of L under the mapping is
a linear subspace in R^m.
♥ [taking inverse image under linear mapping] When L is a linear subspace in R^n and
x ↦ Px : R^m → R^n is a linear mapping, the inverse image P^{−1}(L) = {y : Py ∈ L} of L under
the mapping is a linear subspace in R^m.

2.10
Preliminaries: Affine Subspaces

♣ Definition: An affine subspace (or affine plane, or simply plane) in R^n is a nonempty
subset M of R^n which can be obtained from a linear subspace L ⊂ R^n by a shift:
M = a + L = {x = a + y : y ∈ L} (∗)
Note: In a representation (∗),
• L is uniquely defined by M: L = M − M = {x = u − v : u, v ∈ M}. L is called the linear
subspace which is parallel to M;
• a can be chosen as an arbitrary element of M, and only as an element from M.
♠ Equivalently: An affine subspace in R^n is a nonempty subset M of R^n which is closed
with respect to taking affine combinations (linear combinations with coefficients summing
up to 1) of its elements:
x_i ∈ M, λ_i ∈ R, ∑_{i=1}^I λ_i = 1 ⇒ ∑_{i=1}^I λ_i x_i ∈ M

2.11
♣ Examples:
• M = Rn . The parallel linear subspace is Rn
• M = {a} (singleton). The parallel linear subspace is {0}
• M = {a + λ[b − a] : λ ∈ R} = {(1 − λ)a + λb : λ ∈ R}, where b − a ≠ 0 – (straight) line passing through two
distinct points a, b ∈ R^n. The parallel linear subspace is the linear span R[b − a] of b − a.
Fact: A nonempty subset M ⊂ R^n is an affine subspace if and only if with any pair of
distinct points a, b from M, M contains the entire line ℓ = {(1 − λ)a + λb : λ ∈ R} spanned by
a, b.
• ∅ ≠ M = {x ∈ R^n : Ax = b}. The parallel linear subspace is {x : Ax = 0}.
• Given a nonempty set X ⊂ Rn , let Aff(X) be the set of all finite affine combinations of
vectors from X. This set – the affine span (or affine hull) of X – is an affine subspace,
contains X, and is the intersection of all affine subspaces containing X. The parallel linear
subspace is Lin(X − a), where a is an arbitrary point from X.

2.12
♠ Note: The last two examples are “universal:” Every affine subspace M in Rn can be
represented as M = Aff(X) for a properly chosen finite and nonempty set X ⊂ Rn , same as
can be represented as M = {x : Ax = b} for a properly chosen matrix A and vector b such
that the system Ax = b is solvable.

2.13
♣ Affine bases and dimension. Let M be an affine subspace in Rn , and L be the parallel
linear subspace.
♠ By definition, the affine dimension (or simply dimension) dim M of M is the (linear)
dimension dim L of the linear subspace L to which M is parallel.
♠ We say that vectors x_0, x_1, ..., x_m, m ≥ 0,
• are affinely independent, if no nontrivial (not all coefficients are zeros) linear combination
of these vectors with zero sum of coefficients is the zero vector
Equivalently: x_0, ..., x_m are affinely independent if and only if the coefficients in an affine
combination x = ∑_{i=0}^m λ_i x_i are uniquely defined by the value x of this combination.
• affinely span M, if
M = Aff({x_0, ..., x_m}) = {∑_{i=0}^m λ_i x_i : ∑_{i=0}^m λ_i = 1}
• form an affine basis in M, if x_0, ..., x_m are affinely independent and affinely span M.

2.14
♠ Facts: Let M be an affine subspace in Rn , and L be the parallel linear subspace. Then
♥ A collection x0 , x1 , ..., xm of vectors is an affine basis in M if and only if x0 ∈ M and the
vectors x1 − x0 , x2 − x0 , ..., xm − x0 form a (linear) basis in L
♥ The following properties of a collection x0 , ..., xm of vectors from M are equivalent to
each other:
• x0 , ..., xm is an affine basis in M
• x0 , ..., xm affinely span M and m = dim M
• x0 , ..., xm are affinely independent and m = dim M
• x0 , ..., xm affinely span M and is a minimal, w.r.t. inclusion, collection with this property
• x0 , ..., xm form a maximal, w.r.t. inclusion, affinely independent collection of vectors from
M (that is, the vectors x0 , ..., xm are affinely independent, and extending this collection by
any vector from M yields an affinely dependent collection of vectors).
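
The first fact suggests a simple computational test: x_0, ..., x_m are affinely independent exactly when the differences x_1 − x_0, ..., x_m − x_0 are linearly independent, i.e., have rank m. A minimal sketch in exact arithmetic (helper names are ours):

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors via Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def affinely_independent(points):
    """True iff the differences x_i - x_0, i >= 1, are linearly independent."""
    x0, rest = points[0], points[1:]
    diffs = [[a - b for a, b in zip(x, x0)] for x in rest]
    return rank(diffs) == len(rest)

# e1, e1 + e2, e1 + e3 in R^3: affinely independent points of the plane x_1 = 1
print(affinely_independent([[1, 0, 0], [1, 1, 0], [1, 0, 1]]))   # True
# three collinear points in R^2 are affinely dependent
print(affinely_independent([[0, 0], [1, 1], [2, 2]]))            # False
```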

2.15
♠ Facts:
♥ Let L = Lin(X). Then L admits a linear basis comprised of vectors from X.
♥ Let X ≠ ∅ and M = Aff(X). Then M admits an affine basis comprised of vectors from
X.
♥ Let L be a linear subspace. Then every linearly independent collection of vectors from L
can be extended to a linear basis of L.
♥ Let M be an affine subspace. Then every affinely independent collection of vectors from
M can be extended to an affine basis of M .

2.16
Examples:
• dim {a} = 0, and the only affine basis of {a} is x0 = a.
• dim Rn = n. When n > 0, there are infinitely many affine bases in Rn , e.g., one comprised
of the zero vector and the n standard basic orths.
• M = {x ∈ Rn : x1 = 1} ⇒ dim M = n − 1. An example of an affine basis in M is
e1 , e1 + e2 , e1 + e3 , ..., e1 + en .
Extension: M is an affine subspace in Rn of the dimension n − 1 if and only if M can be
represented as M = {x ∈ Rn : eT x = b} with e ≠ 0. Such a set is called a hyperplane.
♠ Note: A hyperplane M = {x : eT x = b} (e ≠ 0) splits Rn into two half-spaces

Π+ = {x : eT x ≥ b}, Π− = {x : eT x ≤ b}
and is the common boundary of these half-spaces.
A polyhedral set is the intersection of a finite (perhaps empty) family of half-spaces.

2.17
♠ Facts:
♥ M ⊂ M′ are affine subspaces in Rn ⇒ dim M ≤ dim M′ , with equality taking place if and only
if M = M′ .
⇒ Whenever M is an affine subspace in Rn , we have 0 ≤ dim M ≤ n
♥ In every representation of an affine subspace as M = {x ∈ Rn : Ax = b}, the number of
rows in A is at least n − dim M . This number is equal to n − dim M if and only if the rows
of A are linearly independent.

2.18
“Calculus” of affine subspaces
♥ [taking intersection] When M1 , M2 are affine subspaces in Rn and M1 ∩ M2 ≠ ∅, so is the
set M1 ∩ M2 .
Extension: If nonempty, the intersection $\bigcap_{\alpha\in A} M_\alpha$ of an arbitrary family {Mα }α∈A of affine
subspaces in Rn is an affine subspace. The parallel linear subspace is $\bigcap_{\alpha\in A} L_\alpha$, where Lα are
the linear subspaces parallel to Mα .
♥ [summation] When M1 , M2 are affine subspaces in Rn , so is their arithmetic sum
M1 + M2 = {x = u + v : u ∈ M1 , v ∈ M2 }.
The linear subspace parallel to M1 + M2 is L1 + L2 , where the linear subspaces Li are parallel
to Mi , i = 1, 2
♥ [taking direct product] When M1 ⊂ Rn1 and M2 ⊂ Rn2 , the direct product (or direct sum)
of M1 and M2 – the set
M1 × M2 := {[x1 ; x2 ] ∈ Rn1 +n2 : x1 ∈ M1 , x2 ∈ M2 }
is an affine subspace in Rn1 +n2 . The parallel linear subspace is L1 ×L2 , where linear subspaces
Li ⊂ Rni are parallel to Mi , i = 1, 2.

2.19
♥ [taking image under affine mapping] When M is an affine subspace in Rn and x ↦ P x + p :
Rn → Rm is an affine mapping, the image P M + p = {y = P x + p : x ∈ M } of M under the
mapping is an affine subspace in Rm . The parallel subspace is P L = {y = P x : x ∈ L}, where
L is the linear subspace parallel to M
♥ [taking inverse image under affine mapping] When M is an affine subspace in Rn , x ↦
P x + p : Rm → Rn is an affine mapping and the inverse image Y = {y : P y + p ∈ M } of M
under the mapping is nonempty, Y is an affine subspace. The parallel linear subspace is
P −1 (L) = {y : P y ∈ L}, where L is the linear subspace parallel to M .

2.20
Convex Sets and Functions

♣ Definitions:
♠ A set X ⊂ Rn is called convex, if along with every two points x, y it contains the entire
segment linking the points:
x, y ∈ X, λ ∈ [0, 1] ⇒ (1 − λ)x + λy ∈ X.
♥ Equivalently: X ⊂ Rn is convex, if X is closed w.r.t. taking all convex combinations of
its elements (i.e., linear combinations with nonnegative coefficients summing up to 1):
$\forall k \ge 1:\ x_1, ..., x_k \in X,\ \lambda_1 \ge 0, ..., \lambda_k \ge 0,\ \sum_{i=1}^k \lambda_i = 1 \ \Rightarrow\ \sum_{i=1}^k \lambda_i x_i \in X$

Example: A polyhedral set X = {x ∈ Rn : Ax ≤ b} is convex. In particular, linear and affine
subspaces are convex sets.

2.21
♠ A function f (x) : Rn → R ∪ {+∞} is called convex, if its epigraph
Epi{f } = {[x; τ ] : τ ≥ f (x)}
is convex.
♥ Equivalently: f is convex, if
x, y ∈ Rn , λ ∈ [0, 1]
⇒ f ((1 − λ)x + λy) ≤ (1 − λ)f (x) + λf (y)
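♥ Equivalently: f is convex, if f satisfies Jensen’s Inequality:
$\forall k \ge 1:\ x_1, ..., x_k \in \mathbb{R}^n,\ \lambda_1 \ge 0, ..., \lambda_k \ge 0,\ \sum_{i=1}^k \lambda_i = 1 \ \Rightarrow\ f\left(\sum_{i=1}^k \lambda_i x_i\right) \le \sum_{i=1}^k \lambda_i f(x_i)$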

Example: A piecewise linear function
$f(x) = \begin{cases} \max_{i \le I} [a_i^T x + b_i], & Px \le p \\ +\infty, & \text{otherwise} \end{cases}$
is convex.
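
As a sanity check of the two-point convexity inequality on a function of this type, the sketch below evaluates f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y) on a finite grid. The specific one-dimensional function (pieces −x and 2x − 1 on the segment [0, 3]) and the names `PIECES`, `f`, `jensen_holds` are assumptions chosen for illustration; a grid check is of course not a proof.

```python
from fractions import Fraction
from itertools import product
import math

PIECES = [(-1, 0), (2, -1)]  # (a_i, b_i): f(x) = max(-x, 2x - 1) on the domain
LO, HI = 0, 3                # the domain {x : P x <= p} is the segment [0, 3]

def f(x):
    """Piecewise linear function, equal to +infinity outside the domain."""
    if not (LO <= x <= HI):
        return math.inf
    return max(a * x + b for a, b in PIECES)

def jensen_holds(xs, lams):
    """Check f((1-l)x + l*y) <= (1-l)f(x) + l*f(y) for all grid triples."""
    for x, y, lam in product(xs, xs, lams):
        z = (1 - lam) * x + lam * y
        if f(z) > (1 - lam) * f(x) + lam * f(y):
            return False
    return True

grid = [Fraction(k, 4) for k in range(13)]  # 0, 1/4, ..., 3 (all inside the domain)
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
print(jensen_holds(grid, weights))  # True
print(f(4))                         # inf: the point 4 lies outside the domain
```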

2.22
♠ Convex hull: For a nonempty set X ⊂ Rn , its convex hull is the set comprised of all
convex combinations of elements of X:
$\mathrm{Conv}(X) = \left\{x = \sum_{i=1}^m \lambda_i x_i :\ m \in \mathbb{N},\ x_i \in X,\ 1 \le i \le m,\ \lambda_i \ge 0\ \forall i,\ \sum_i \lambda_i = 1\right\}$
By definition, Conv(∅) = ∅.
Fact: The convex hull of X is convex, contains X and is the intersection of all convex sets
containing X and thus is the smallest, w.r.t. inclusion, convex set containing X.
Note: a convex combination is an affine one, and an affine combination is a linear one,
whence
X ⊂ Rn ⇒ Conv(X) ⊂ Lin(X)
∅ ≠ X ⊂ Rn ⇒ Conv(X) ⊂ Aff(X) ⊂ Lin(X)

2.23
Example: Convex hulls of a 3-point and an 8-point set (red dots) on the 2D plane [figure not reproduced].

2.24
♣ Dimension of a nonempty set X ⊂ Rn :
♥ When X is a linear subspace, dim X is the linear dimension of X (the cardinality of (any)
linear basis in X)
♥ When X is an affine subspace, dim X is the linear dimension of the linear subspace parallel
to X (that is, the cardinality of (any) affine basis of X minus 1)
♥ When X is an arbitrary nonempty subset of Rn , dim X is the dimension of the affine hull
Aff(X) of X.
Note: Some sets X are in the scope of more than one of these three definitions. For these
sets, all applicable definitions result in the same value of dim X.

2.25
Calculus of Convex Sets

♠ [taking intersection]: if X1 , X2 are convex sets in Rn , so is their intersection X1 ∩ X2 . In
fact, the intersection $\bigcap_{\alpha\in A} X_\alpha$ of any family of convex subsets in Rn is convex.
♠ [taking arithmetic sum]: if X1 , X2 are convex sets in Rn , so is the set X1 + X2 = {x =
x1 + x2 : x1 ∈ X1 , x2 ∈ X2 }.
♠ [taking affine image]: if X is a convex set in Rn , A is an m × n matrix, and b ∈ Rm ,
then the set AX + b := {Ax + b : x ∈ X} ⊂ Rm – the image of X under the affine mapping
x ↦ Ax + b : Rn → Rm – is a convex set in Rm .
♠ [taking inverse affine image]: if X is a convex set in Rn , A is an n × k matrix, and b ∈ Rn ,
then the set {y ∈ Rk : Ay + b ∈ X} – the inverse image of X under the affine mapping
y ↦ Ay + b : Rk → Rn – is a convex set in Rk .
♠ [taking direct product]: if the sets Xi ⊂ Rni , 1 ≤ i ≤ k, are convex, so is their direct
product X1 × ... × Xk ⊂ Rn1 +...+nk .

2.26
Calculus of Convex Functions

♠ [taking linear combinations with positive coefficients] if functions fi : Rn → R ∪ {+∞} are
convex and λi > 0, 1 ≤ i ≤ k, then the function
$f(x) = \sum_{i=1}^k \lambda_i f_i(x)$
is convex.
♠ [direct summation] if functions fi : Rni → R ∪ {+∞}, 1 ≤ i ≤ k, are convex, so is their
direct sum
$f([x^1; ...; x^k]) = \sum_{i=1}^k f_i(x^i) : \mathbb{R}^{n_1+...+n_k} \to \mathbb{R} \cup \{+\infty\}$
♠ [taking supremum] the supremum $f(x) = \sup_{\alpha\in A} f_\alpha(x)$ of any (nonempty) family
{fα }α∈A of convex functions is convex.
♠ [affine substitution of argument] if a function f (x) : Rn → R ∪ {+∞} is convex and x =
Ay + b : Rm → Rn is an affine mapping, then the function g(y) = f (Ay + b) : Rm → R ∪ {+∞}
is convex.

2.27
♠ Theorem on superposition: Let fi (x) : Rn → R ∪ {+∞}, 1 ≤ i ≤ m, be convex functions, and let
F (y) : Rm → R ∪ {+∞} be a convex function which is nondecreasing w.r.t. every one of the
variables y1 , ..., ym . Then the superposition
$g(x) = \begin{cases} F(f_1(x), ..., f_m(x)), & f_i(x) < +\infty,\ 1 \le i \le m \\ +\infty, & \text{otherwise} \end{cases}$
of F and f1 , ..., fm is convex.

Note: if some of fi , say, f1 , ..., fk , are affine, then the Theorem on superposition
remains valid when we require the monotonicity of F w.r.t. yk+1 , ..., ym only.

2.28
Cones

♣ Definition: A set X ⊂ Rn is called a cone, if X is nonempty, convex and is homogeneous,
that is,
x ∈ X, λ ≥ 0 ⇒ λx ∈ X
Equivalently: A set X ⊂ Rn is a cone, if X is nonempty and is closed w.r.t. addition of its
elements and multiplication of its elements by nonnegative reals:
x, y ∈ X, λ, µ ≥ 0 ⇒ λx + µy ∈ X
Equivalently: A set X ⊂ Rn is a cone, if X is nonempty and is closed w.r.t. taking conic
combinations of its elements (that is, linear combinations with nonnegative coefficients):
$\forall m:\ x_i \in X,\ \lambda_i \ge 0,\ 1 \le i \le m \ \Rightarrow\ \sum_{i=1}^m \lambda_i x_i \in X.$
Examples: • Every linear subspace in Rn (i.e., every solution set of a homogeneous system
of linear equations with n variables) is a cone
• The solution set X = {x ∈ Rn : Ax ≤ 0} of a homogeneous system of linear inequalities is
a cone. Such a cone is called polyhedral.

2.29
♣ A cone X is called pointed, if it does not contain lines passing through the origin, or,
equivalently, if the only vector d such that d ∈ X and −d ∈ X is the zero vector: d = 0.
♣ Conic hull: For a nonempty set X ⊂ Rn , its conic hull Cone (X) is defined as the set of
all conic combinations of elements of X:
$X \ne \emptyset \ \Rightarrow\ \mathrm{Cone}(X) = \left\{x = \sum_{i=1}^m \lambda_i x_i :\ m \in \mathbb{N},\ \lambda_i \ge 0,\ x_i \in X,\ 1 \le i \le m\right\}$
By definition, Cone (∅) = {0}.
Fact: Cone (X) is a cone, contains X and is the intersection of all cones containing X,
and thus is the smallest, w.r.t. inclusion, cone containing X.
Example: The conic hull of the set X = {e1 , ..., en } of all basic orths in Rn is the nonnegative
orthant Rn+ = {x ∈ Rn : x ≥ 0}. This cone is pointed.

2.30
Calculus of Cones

♠ [taking intersection] if X1 , X2 are cones in Rn , so is their intersection X1 ∩ X2 . In fact,
the intersection $\bigcap_{\alpha\in A} X_\alpha$ of any family {Xα }α∈A of cones in Rn is a cone.
♠ [taking arithmetic sum] if X1 , X2 are cones in Rn , so is the set X1 + X2 = {x = x1 + x2 :
x1 ∈ X1 , x2 ∈ X2 };
♠ [taking linear image] if X is a cone in Rn and A is an m × n matrix, then the set
AX := {Ax : x ∈ X} ⊂ Rm – the image of X under the linear mapping x ↦ Ax : Rn → Rm –
is a cone in Rm .
♠ [taking inverse linear image] if X is a cone in Rn and A is an n × k matrix, then the set
{y ∈ Rk : Ay ∈ X} – the inverse image of X under the linear mapping y ↦ Ay : Rk → Rn – is a
cone in Rk .
♠ [taking direct products] if Xi ⊂ Rni are cones, 1 ≤ i ≤ k, so is the direct product
X1 × ... × Xk ⊂ Rn1 +...+nk .

2.31
♠ [passing to the dual cone] if X is a cone in Rn , so is its dual cone defined as
X∗ = {y ∈ Rn : y T x ≥ 0 ∀x ∈ X}.
Examples:
• The cone dual to a linear subspace L is the orthogonal complement L⊥ of L
• The cone dual to the nonnegative orthant Rn+ is the nonnegative orthant itself:

(Rn+ )∗ := {y ∈ Rn : y T x ≥ 0 ∀x ≥ 0} = {y ∈ Rn : y ≥ 0}.
• 2D cones bounded by blue rays are dual to cones bounded by red rays [figure not reproduced].

2.32
Preparing Tools: Caratheodory Theorem

Theorem. Let x1 , ..., xN ∈ Rn and m = dim {x1 , ..., xN }. Then every point x which is a
convex combination of x1 , ..., xN can be represented as a convex combination of at most
m + 1 of the points x1 , ..., xN .
Proof.
• Let M = Aff{x1 , ..., xN }, so that dim M = m. By shifting M (which does not affect
the statement we intend to prove) we can make M an m-dimensional linear subspace in
Rn . Representing points from the linear subspace M by their m-dimensional vectors of
coordinates in a basis of M , we can identify M and Rm , and this identification does not
affect the statement we intend to prove. Thus, assume w.l.o.g. that m = n.

2.33
• Let $x = \sum_{i=1}^N \mu_i x_i$ be a representation of x as a convex combination of x1 , ..., xN with as
small number of nonzero coefficients as possible. Reordering x1 , ..., xN and omitting terms
with zero coefficients, assume w.l.o.g. that $x = \sum_{i=1}^M \mu_i x_i$, so that µi > 0, 1 ≤ i ≤ M , and
$\sum_{i=1}^M \mu_i = 1$. It suffices to show that M ≤ n + 1. Let, on the contrary, M > n + 1.
• Consider the system of linear equations in variables δ1 , ..., δM :
$\sum_{i=1}^M \delta_i x_i = 0;\quad \sum_{i=1}^M \delta_i = 0$
This is a homogeneous system of n + 1 linear equations in M > n + 1 variables, and thus it
has a nontrivial solution δ̄1 , ..., δ̄M . Setting µi (t) = µi + tδ̄i , we have
$\forall t:\ x = \sum_{i=1}^M \mu_i(t) x_i,\quad \sum_{i=1}^M \mu_i(t) = 1.$
• Since δ̄ is nontrivial and $\sum_i \bar\delta_i = 0$, the set I = {i : δ̄i < 0} is nonempty. Let $\bar t = \min_{i\in I} \mu_i/|\bar\delta_i|$.
Then all µi (t̄) are ≥ 0, at least one of µi (t̄) is zero, and
$x = \sum_{i=1}^M \mu_i(\bar t) x_i,\quad \sum_{i=1}^M \mu_i(\bar t) = 1.$
We get a representation of x as a convex combination of xi with less than M nonzero
coefficients, which is impossible. □
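
The proof is constructive, and its reduction step can be sketched in code. The sketch below, in Python with exact rational arithmetic, repeatedly finds a nontrivial solution δ̄ of the homogeneous system and shifts the weights until at most n + 1 of them are positive. The function names `null_vector` and `caratheodory` and the sample data are assumptions made for illustration; the sketch reduces to n + 1 points (the ambient dimension plus one) and does not perform the preliminary passage to the affine hull.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """A nonzero solution d of the homogeneous system rows * d = 0, or None."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots = []  # pivot column of each reduced row
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                fac = m[i][c]
                m[i] = [a - fac * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    d = [Fraction(0)] * ncols
    d[free[0]] = Fraction(1)           # set one free variable to 1
    for row, c in zip(m, pivots):
        d[c] = -row[free[0]]           # solve for the pivot variables
    return d

def caratheodory(points, weights):
    """Reduce a convex combination to one with at most n + 1 positive weights."""
    pts = [[Fraction(v) for v in p] for p in points]
    mu = [Fraction(w) for w in weights]
    n = len(pts[0])
    active = [i for i in range(len(pts)) if mu[i] > 0]
    while len(active) > n + 1:
        # n equations sum_i delta_i x_i = 0 plus one equation sum_i delta_i = 0
        rows = [[pts[i][k] for i in active] for k in range(n)]
        rows.append([1] * len(active))
        delta = null_vector(rows, len(active))
        t = min(mu[i] / -d for i, d in zip(active, delta) if d < 0)
        for i, d in zip(active, delta):
            mu[i] += t * d             # mu_i(t) = mu_i + t * delta_i
        active = [i for i in active if mu[i] > 0]
    return {i: mu[i] for i in active}

# Hypothetical data: the point (1, 1) as the average of five points in the plane.
pts = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
reduced = caratheodory(pts, [Fraction(1, 5)] * 5)
print(len(reduced) <= 3)  # True: at most n + 1 = 3 points remain
```

Each pass removes at least one point from the representation, so the loop terminates after at most N − n − 1 passes.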

2.34
Quiz:
• In nature, there are 26 “pure” types of tea, denoted A, B,..., Z; all other types are
mixtures of these “pure” types. In the market, 111 blends of pure types, rather than the
pure types of tea themselves, are sold.
• John prefers a specific blend of tea which is not sold in the market; from experience, he
found that in order to get this blend, he can buy 93 of the 111 market blends and mix them
in certain proportion.
• An OR student pointed out that to get his favorite blend, John could mix appropriately
just 27 properly selected market blends. Another OR student found that just 26 of market
blends are enough.
• John does not believe the students, since neither of them asked what exactly his favorite
blend is. Is John right?

2.35
Quiz:
• In nature, there are 26 “pure” types of tea, denoted A, B,..., Z. In the market, 111
blends of these types are sold.
• John knows that his favorite blend can be obtained by mixing in appropriate proportion
93 of the 111 market blends. Is it true that the same blend can be obtained by mixing
• 27 market blends?
• 26 market blends?
The answer to both questions is yes. Let us speak about unit weight portions of tea blends. Then
• a blend can be identified with 26-dimensional vector
x = [xA ; ...; xZ ]
where x? is the weight of pure tea ? in the unit weight portion of the blend. The 26 entries
in x are nonnegative and sum up to 1;
• denoting the market blends by x1 , ..., x111 and the favorite blend of John by x̄, we know
that
$\bar x = \sum_{i=1}^{111} \lambda_i x^i$
with nonnegative coefficients λi . Comparing the weights of both sides, we conclude that
$\sum_{i=1}^{111} \lambda_i = 1$

2.36
⇒ x̄ is a convex combination of x1 , ..., x111
⇒ [by Caratheodory and due to dim xi = 26] x̄ is a convex combination of just 26 + 1 = 27
of the market blends, thus the first student is right.
• The vectors x1 , ..., x111 have unit sums of entries thus belong to the hyperplane
M = {[xA ; ...; xZ ] : xA + ... + xZ = 1}
which has dimension 25
⇒ The dimension of the set {x1 , x2 , ..., x111 } is at most m = 25
⇒ By Caratheodory, x̄ is a convex combination of just m + 1 = 26 vectors from {x1 , ..., x111 },
thus the second student also is right.

2.37
Preparing Tools: Helly Theorem

Theorem. Let A1 , ..., AN be convex sets in Rn which belong to an affine subspace M of
dimension m. Assume that every m + 1 sets of the collection have a point in common. Then
all N sets have a point in common.
Proof. • Same as in the proof of Caratheodory Theorem, we can assume w.l.o.g. that
m = n.
• We need the following fact:
Theorem [Radon] Let x1 , ..., xN be points in Rn . If N ≥ n + 2, we can split the index set
{1, ..., N } into two nonempty non-overlapping subsets I, J such that
Conv{xi : i ∈ I} ∩ Conv{xi : i ∈ J} ≠ ∅.
From Radon to Helly: Let us prove Helly’s theorem by induction in N . There is nothing
to prove when N ≤ n + 1. Thus, assume that N ≥ n + 2 and that the statement holds
true for all collections of N − 1 sets, and let us prove that the statement holds true for
N -element collections of sets as well.

2.38
• Given A1 , ..., AN , we define the N sets
Bi = A1 ∩ A2 ∩ ... ∩ Ai−1 ∩ Ai+1 ∩ ... ∩ AN .
By inductive hypothesis, all Bi are nonempty. Choosing a point xi ∈ Bi , we get N ≥ n + 2
points xi , 1 ≤ i ≤ N .
• By Radon Theorem, after appropriate reordering of the sets A1 , ..., AN , we can assume that
for certain k, Conv{x1 , ..., xk } ∩ Conv{xk+1 , ..., xN } ≠ ∅. We claim that if b ∈ Conv{x1 , ..., xk } ∩
Conv{xk+1 , ..., xN }, then b belongs to all Ai , which would complete the inductive step.
To support our claim, note that
– when i ≤ k, xi ∈ Bi ⊂ Aj for all j = k + 1, ..., N , that is,
$i \le k \ \Rightarrow\ x_i \in \bigcap_{j=k+1}^N A_j.$
Since the latter set is convex and b is a convex combination of x1 , ..., xk , we get $b \in \bigcap_{j=k+1}^N A_j$.
– when i > k, xi ∈ Bi ⊂ Aj for all 1 ≤ j ≤ k, that is,
$i > k \ \Rightarrow\ x_i \in \bigcap_{j=1}^k A_j.$
Similarly to the above, it follows that $b \in \bigcap_{j=1}^k A_j$.
Thus, our claim is correct.

2.39
Proof of Radon Theorem: Let x1 , ..., xN ∈ Rn and N ≥ n + 2. We want to prove that
we can split the set of indexes {1, ..., N } into non-overlapping nonempty sets I, J such that
Conv{xi : i ∈ I} ∩ Conv{xi : i ∈ J} ≠ ∅.
Indeed, consider the system of n + 1 < N homogeneous linear equations in N variables
δ1 , ..., δN :
$\sum_{i=1}^N \delta_i x_i = 0,\quad \sum_{i=1}^N \delta_i = 0. \quad (*)$
This system has a nontrivial solution δ̄. Let us set I = {i : δ̄i > 0}, J = {i : δ̄i ≤ 0}.
Since δ̄ ≠ 0 and $\sum_{i=1}^N \bar\delta_i = 0$, both I, J are nonempty, do not intersect and
$\mu := \sum_{i\in I} \bar\delta_i = \sum_{i\in J} [-\bar\delta_i] > 0$. (∗) implies that
$\underbrace{\sum_{i\in I} \frac{\bar\delta_i}{\mu} x_i}_{\in \mathrm{Conv}\{x_i : i\in I\}} = \underbrace{\sum_{i\in J} \frac{[-\bar\delta_i]}{\mu} x_i}_{\in \mathrm{Conv}\{x_i : i\in J\}}$ □

2.40
Quiz: The daily functioning of a plant is described by the linear constraints
(a) Ax ≤ f ∈ R^{10}_+
(b) Bx ≥ d ∈ R^{2013} (!)
(c) Cx ≤ c ∈ R^{2000}
• x: decision vector
• f ∈ R^{10}_+ : vector of resources
• d: vector of demands
• There are N demand scenarios di . In the evening of day t − 1, the manager knows that
the demand of day t will be one of the N scenarios, but he does not know which one. The
manager should arrange a vector of resources f for the next day, at a price cℓ ≥ 0 per unit
of resource fℓ , in order to make the next day production problem feasible.
• It is known that every one of the demand scenarios can be “served” by $1 purchase of
resources.
(?) How much should the manager invest in resources to make the next day problem
feasible when
• N = 1 • N = 2 • N = 10 • N = 11
• N = 12 • N = 2013 ?

2.41
(a) : Ax ≤ f ∈ R^{10}_+ ; (b) : Bx ≥ d ∈ R^{2013} ; (c) : Cx ≤ c ∈ R^{2000}

Quiz answer: With N scenarios, $ min[N, 11] is enough!

Indeed, the vector of resources f ∈ R^{10}_+ appears only in the constraints (a)
⇒ surplus of resources does no harm
⇒ with N scenarios di , $ N in resources is enough: every di can be “served” by $ 1 purchase
of appropriate resource vector f i ≥ 0, thus it suffices to buy the vector f 1 + ... + f N which
costs $ N and is ≥ f i for every i = 1, ..., N .
To see that $ 11 is enough, let Fi be the set of all resource vectors f which cost at
most $ 11 and allow one to “serve” demand di ∈ D.
A. Fi ⊂ R^{10} is convex (and even polyhedral): it admits polyhedral representation
Fi = {f ∈ R^{10} : ∃x : Cx ≤ c, Bx ≥ di , Ax ≤ f, f ≥ 0, Σ_{ℓ=1}^{10} cℓ fℓ ≤ 11}
B. Every 11 sets Fi1 , ..., Fi11 of the family F1 , ..., FN have a point in common. Indeed, scenario
dis can be “served” by $ 1 vector f s ≥ 0
⇒ every one of the scenarios di1 , ..., di11 can be served by the $ 11 vector of resources
f = f 1 + ... + f 11
⇒ f belongs to every one of Fi1 , ..., Fi11
• By Helly, A and B imply that all the sets F1 , ..., FN have a point f in common. f costs at
most $ 11 (the description of Fi ) and allows one to “serve” every one of the demands d1 ,...,dN .

2.42
Preparing Tools: Homogeneous Farkas Lemma

♣ Question: When is a homogeneous linear inequality
$a^T x \ge 0 \quad (*)$
a consequence of a system of homogeneous linear inequalities
$a_i^T x \ge 0,\ i = 1, ..., m \quad (!)$
i.e., when is (∗) satisfied at every solution to (!)?
Observation: If a is a conic combination of a1 , ..., am :
$\exists \lambda_i \ge 0:\ a = \sum_i \lambda_i a_i, \quad (+)$
then (∗) is a consequence of (!).
Indeed, (+) implies that
$a^T x = \sum_i \lambda_i a_i^T x \quad \forall x,$
and thus for every x with aTi x ≥ 0 ∀i one has aT x ≥ 0.
♣ Homogeneous Farkas Lemma: (∗) is a consequence of (!) if and only if a is a conic
combination of a1 , ..., am .

2.43
♣ Equivalently: Given vectors a1 , ..., am ∈ Rn , let $K = \mathrm{Cone}\{a_1, ..., a_m\} = \{\sum_i \lambda_i a_i : \lambda \ge 0\}$
be the conic hull of the vectors. Given a vector a,
• it is easy to certify that a ∈ Cone {a1 , ..., am }: a certificate is a collection of weights λi ≥ 0
such that $\sum_i \lambda_i a_i = a$;
• it is easy to certify that a ∉ Cone {a1 , ..., am }: a certificate is a vector d such that aTi d ≥ 0 ∀i
and aT d < 0.
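
Checking either kind of certificate takes only a few inner products. The sketch below verifies a membership certificate λ and a non-membership certificate d for a small cone in the plane; the function names and the sample generators are assumptions made for illustration, and the sketch only verifies given certificates rather than finding them.

```python
def dot(u, v):
    """Inner product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def certifies_membership(a, gens, lam):
    """True iff lam >= 0 and sum_i lam_i * a_i equals a."""
    if any(l < 0 for l in lam):
        return False
    combo = [sum(l * g[k] for l, g in zip(lam, gens)) for k in range(len(a))]
    return combo == list(a)

def certifies_nonmembership(a, gens, d):
    """True iff a_i^T d >= 0 for all i and a^T d < 0."""
    return all(dot(g, d) >= 0 for g in gens) and dot(a, d) < 0

# Hypothetical generators: Cone{(1,0), (1,1)} = {(x, y) : 0 <= y <= x}.
gens = [(1, 0), (1, 1)]
print(certifies_membership((3, 1), gens, (2, 1)))      # True: (3,1) = 2*(1,0) + 1*(1,1)
print(certifies_nonmembership((0, 1), gens, (1, -1)))  # True: d separates (0,1) from the cone
```

By the Homogeneous Farkas Lemma exactly one of the two certificates exists for any given a, so the two checks can never both succeed for the same vector.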

2.44
Proof of HFL: All we need to prove is that if a is not a conic combination of a1 , ..., am ,
then there exists d such that aT d < 0 and aTi d ≥ 0, i = 1, ..., m.
Fact: The set K = Cone {a1 , ..., am } is polyhedrally representable:
$\mathrm{Cone}\{a_1, ..., a_m\} = \{x : \exists \lambda \in \mathbb{R}^m:\ x = \sum_i \lambda_i a_i,\ \lambda \ge 0\}.$
⇒ By Fourier-Motzkin, K is polyhedral:
$K = \{x : d_\ell^T x \ge c_\ell,\ 1 \le \ell \le L\}.$
Observation I: 0 ∈ K ⇒ cℓ ≤ 0 ∀ℓ
Observation II: $\lambda a_i \in \mathrm{Cone}\{a_1, ..., a_m\}\ \forall \lambda \ge 0 \Rightarrow \lambda d_\ell^T a_i \ge c_\ell\ \forall \lambda \ge 0 \Rightarrow d_\ell^T a_i \ge 0\ \forall i, \ell.$
Now, $a \notin \mathrm{Cone}\{a_1, ..., a_m\} \Rightarrow \exists \ell = \ell_*:\ d_{\ell_*}^T a < c_{\ell_*} \le 0 \Rightarrow d_{\ell_*}^T a < 0.$
⇒ d = dℓ∗ satisfies aT d < 0, aTi d ≥ 0, i = 1, ..., m, Q.E.D.

2.45
Corollary: Let a1 , ..., am ∈ Rn and K = Cone {a1 , ..., am }, and let K∗ = {x ∈ Rn : xT u ≥ 0 ∀u ∈
K} be the dual cone. Then K itself is the cone dual to K∗ :
$(K_*)_* := \{u : u^T x \ge 0\ \forall x \in K_*\} = K := \{\sum_i \lambda_i a_i : \lambda_i \ge 0\}.$
Proof. • We clearly have
K∗ = (Cone {a1 , ..., am })∗ = {d : dT ai ≥ 0 ∀i = 1, ..., m}
• If K is a cone, then, by definition of K∗ , every vector from K has nonnegative inner
products with all vectors from K∗ and thus K ⊂ (K∗ )∗ for every cone K.
• To prove the opposite inclusion (K∗ )∗ ⊂ K in the case of K = Cone {a1 , ..., am }, let
a ∈ (K∗ )∗ , and let us verify that a ∈ K. Assuming this is not the case, by HFL there exists
d such that aT d < 0 and aTi d ≥ 0 ∀i ⇒ d ∈ K∗ , that is, a ∉ (K∗ )∗ , which is a contradiction.

2.46
Understanding Structure of a Polyhedral Set

♣ Situation: We consider a polyhedral set
$X = \{x \in \mathbb{R}^n : Ax \le b\},\quad A = \begin{bmatrix} a_1^T \\ \cdots \\ a_m^T \end{bmatrix} \in \mathbb{R}^{m\times n}$
$\Leftrightarrow\ X = \{x \in \mathbb{R}^n : a_i^T x \le b_i,\ i \in \mathcal{I} = \{1, ..., m\}\}. \quad (X)$
Standing assumption: X ≠ ∅.
♠ Faces of X. Let us pick a subset I ⊂ ℐ and replace in (X) the inequality constraints
aTi x ≤ bi , i ∈ I with their equality versions aTi x = bi , i ∈ I. The resulting set
$X_I = \{x \in \mathbb{R}^n : a_i^T x \le b_i,\ i \in \mathcal{I}\setminus I,\ a_i^T x = b_i,\ i \in I\}$
if nonempty, is called a face of X.
Examples: • X is a face of itself: X = X∅ .
• Let $\Delta_n = \mathrm{Conv}\{0, e_1, ..., e_n\} = \{x \in \mathbb{R}^n : x \ge 0,\ \sum_{i=1}^n x_i \le 1\}$
⇒ ℐ = {1, ..., n + 1} with $a_i^T x := -x_i \le 0 =: b_i$, 1 ≤ i ≤ n, and $a_{n+1}^T x := \sum_i x_i \le 1 =: b_{n+1}$.
Every subset I ⊂ ℐ different from ℐ defines a face. For example, I = {1, n + 1} defines the
face $\{x \in \mathbb{R}^n : x_1 = 0,\ x_i \ge 0,\ \sum_i x_i = 1\}$.

2.47
X = {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ = {1, ..., m}}

Facts:
• A face
∅ ≠ XI = {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ\I, aTi x = bi , i ∈ I}
of X is a nonempty polyhedral set
• A face of a face of X can be represented as a face of X
• if XI and XI′ are faces of X and their intersection is nonempty, this intersection is a face
of X:
∅ ≠ XI ∩ XI′ ⇒ XI ∩ XI′ = XI∪I′ .
♣ A face XI is called proper, if XI ≠ X.
Theorem: A face XI of X is proper if and only if dim XI < dim X.

2.48

X = {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ = {1, ..., m}}
XI = {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ\I, aTi x = bi , i ∈ I}
Proof. One direction is evident. Now assume that XI is a proper face, and let us prove
that dim XI < dim X.
• Since XI ≠ X, there exists i∗ ∈ I such that aTi∗ x ≢ bi∗ on X and thus on M = Aff(X).
⇒ The set M+ = {x ∈ M : aTi∗ x = bi∗ } contains XI (and thus is an affine subspace containing
Aff(XI )), and is ⊊ M .
⇒ Aff(XI ) ⊊ M , whence
dim XI = dim Aff(XI ) < dim M = dim X. □

2.49
Extreme Points of a Polyhedral Set

X = {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ = {1, ..., m}}

Definition. A point v ∈ X is called an extreme point, or a vertex of X, if it can be
represented as a face of X:
∃I ⊂ ℐ : XI := {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ\I, aTi x = bi , i ∈ I} = {v}. (∗)
Geometric characterization: A point v ∈ X is a vertex of X iff v is not the midpoint of
a nontrivial segment contained in X:
v ± h ∈ X ⇒ h = 0. (!)
Proof. • Let v be a vertex, so that (∗) takes place for certain I, and let h be such that
v ± h ∈ X; we should prove that h = 0. We have ∀i ∈ I:

bi ≥ aTi (v − h) = bi − aTi h & bi ≥ aTi (v + h) = bi + aTi h
⇒ aTi h = 0 ⇒ aTi [v ± h] = bi .
Thus, v ± h ∈ XI = {v}, whence h = 0. We have proved that (∗) implies (!).

2.50
v ± h ∈ X ⇒ h = 0. (!)
∃I ⊂ ℐ : XI := {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ\I, aTi x = bi , i ∈ I} = {v}. (∗)
• Let us prove that (!) implies (∗). Indeed, let v ∈ X be such that (!) takes place; we
should prove that (∗) holds true for certain I. Let
I = {i ∈ ℐ : aTi v = bi },
so that v ∈ XI . It suffices to prove that XI = {v}. Assume, on the contrary, that ∃e ≠ 0 : v + e ∈ XI .
Then aTi (v + e) = bi = aTi v for all i ∈ I, that is, aTi e = 0 ∀i ∈ I, that is,
aTi (v ± te) = bi ∀(i ∈ I, t > 0).
When i ∈ ℐ\I, we have aTi v < bi and thus aTi (v ± te) ≤ bi provided t > 0 is small enough.
⇒ There exists t̄ > 0: aTi (v ± t̄e) ≤ bi ∀i ∈ ℐ, that is v ± t̄e ∈ X, which is a desired contradiction. □

2.51
Algebraic characterization: A point
v ∈ X = {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ = {1, ..., m}}
is a vertex of X iff among the inequalities aTi x ≤ bi which are active at v (i.e., aTi v = bi ) there
are n with linearly independent ai :
Rank {ai : i ∈ Iv } = n, Iv = {i ∈ ℐ : aTi v = bi } (!)
Proof. • Let v be a vertex of X; we should prove that among the vectors ai , i ∈ Iv there are
n linearly independent. Assuming that this is not the case, the linear system aTi e = 0, i ∈ Iv
in variables e has a nonzero solution e. We have
i ∈ Iv ⇒ aTi [v ± te] = bi ∀t,
i ∈ ℐ\Iv ⇒ aTi v < bi
⇒ aTi [v ± te] ≤ bi for all small enough t > 0
whence ∃t̄ > 0 : aTi [v ± t̄e] ≤ bi ∀i ∈ ℐ, that is v ± t̄e ∈ X, which is impossible due to t̄e ≠ 0.

2.52
v ∈ X = {x ∈ Rn : aTi x ≤ bi , i ∈ ℐ}
Rank {ai : i ∈ Iv } = n, Iv = {i ∈ ℐ : aTi v = bi } (!)
• Now let v ∈ X satisfy (!); we should prove that v is a vertex of X, that is, that the relation
v ± h ∈ X implies h = 0. Indeed, when v ± h ∈ X, we should have
∀i ∈ Iv : bi ≥ aTi [v ± h] = bi ± aTi h
⇒ aTi h = 0 ∀i ∈ Iv .
Thus, h ∈ Rn is orthogonal to n linearly independent vectors from Rn , whence h = 0. □
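
The algebraic characterization translates directly into a rank test on the constraints active at v. The sketch below applies it to a small triangle in the plane; the function names `rank` and `is_vertex` and the sample data are assumptions made for illustration, and exact rational arithmetic is used so that "active" means exact equality.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows), via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_vertex(A, b, v):
    """True iff v satisfies A v <= b and the rows active at v have rank n."""
    values = [sum(Fraction(a) * Fraction(x) for a, x in zip(row, v)) for row in A]
    if any(val > bi for val, bi in zip(values, b)):
        return False  # v does not belong to X
    active = [row for row, val, bi in zip(A, values, b) if val == bi]
    return rank(active) == len(v)

# Hypothetical data: the triangle {x >= 0, x1 + x2 <= 1} in the plane.
A = [(-1, 0), (0, -1), (1, 1)]
b = [0, 0, 1]
half = Fraction(1, 2)
print(is_vertex(A, b, (0, 0)))        # True: two independent active constraints
print(is_vertex(A, b, (1, 0)))        # True
print(is_vertex(A, b, (half, half)))  # False: only one active constraint
```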

2.53
♠ Observation: The set Ext(X) of extreme points of a polyhedral set is finite.
Indeed, there could be no more extreme points than faces.
♠ Observation: If X is a polyhedral set and XI is a face of X, then Ext(XI ) ⊂ Ext(X).
Indeed, extreme points are singleton faces, and a face of a face of X can be represented as
a face of X itself.

2.54
♥ Note: Geometric characterization of extreme points allows one to define this notion for every
convex set X: A point x ∈ X is called extreme, if x ± h ∈ X ⇒ h = 0
♥ Fact: Let X be convex and x ∈ X. Then x ∈ Ext(X) iff in every representation
$x = \sum_{i=1}^m \lambda_i x_i$
of x as a convex combination of points xi ∈ X with positive coefficients one has
x1 = x2 = ... = xm = x
♥ Fact: For a convex set X and x ∈ X, x ∈ Ext(X) iff the set X\{x} is convex.
♥ Fact: For every X ⊂ Rn ,
Ext(Conv(X)) ⊂ X

2.55
Example: Let us describe extreme points of the set
$\Delta_{n,k} = \{x \in \mathbb{R}^n : 0 \le x_i \le 1,\ \sum_i x_i \le k\}$ [k ∈ N, 0 ≤ k ≤ n]
Description: These are exactly 0/1-vectors from ∆n,k .
In particular, the vertices of the box ∆n,n = {x ∈ Rn : 0 ≤ xi ≤ 1 ∀i} are all 2^n 0/1-vectors
from Rn .
Proof. • If x is a 0/1 vector from ∆n,k , then the set of bounds 0 ≤ xi ≤ 1 active at x is of
cardinality n, and the corresponding vectors of coefficients are linearly independent
⇒ x ∈ Ext(∆n,k )
• If x ∈ Ext(∆n,k ), then among the constraints defining ∆n,k which are active at x there should be n linearly
independent. The only options are:
– all n active constraints are among the bounds 0 ≤ xi ≤ 1
⇒ x is a 0/1 vector
– n − 1 of the active constraints are among the bounds, and the remaining active constraint
is $\sum_i x_i \le k$: $\sum_i x_i = k$.
⇒ x has (n − 1) coordinates 0/1 & the sum of all n coordinates is integral
⇒ x is a 0/1 vector.
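
For small n the description can be made concrete by enumeration. The sketch below lists the 0/1 vectors with at most k ones, which by the description above are the extreme points of ∆n,k , and compares their number with the binomial sum; the function name `vertices_delta` and the chosen values of n and k are assumptions made for illustration.

```python
from itertools import product
from math import comb

def vertices_delta(n, k):
    """All 0/1 vectors in R^n with at most k ones."""
    return [x for x in product((0, 1), repeat=n) if sum(x) <= k]

verts = vertices_delta(4, 2)
print(len(verts))                                       # 11
print(len(verts) == sum(comb(4, j) for j in range(3)))  # True: C(4,0) + C(4,1) + C(4,2)
print(len(vertices_delta(3, 3)) == 2 ** 3)              # True: the box has 2^n vertices
```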

2.56
Example: An n × n matrix A is called doubly stochastic, if the entries are nonnegative
and all the row and the column sums are equal to 1. Thus, the set of doubly stochastic
matrices is a polyhedral set in Rn×n :
$\Pi_n = \left\{x = [x_{ij}]_{i,j} \in \mathbb{R}^{n\times n} :\ x_{ij} \ge 0\ \forall i,j,\ \sum_{i=1}^n x_{ij} = 1\ \forall j,\ \sum_{j=1}^n x_{ij} = 1\ \forall i\right\}$
What are the extreme points of Πn ?


Birkhoff’s Theorem. The vertices of Πn are exactly the n×n permutation matrices (exactly
one nonzero entry, equal to 1, in every row and every column).
Proof. • A permutation matrix P can be viewed as a 0/1 vector of dimension n^2 and as such
is an extreme point of the box
{[xij ] : 0 ≤ xij ≤ 1}
which contains Πn . Therefore P is an extreme point of Πn since, by the geometric characterization
of extreme points, an extreme point x of a convex set is an extreme point of every
smaller convex set to which x belongs.

2.57
• Let P be an extreme point of Πn . To prove that P is a permutation matrix, note that Πn
is cut off $\mathbb{R}^{n^2}$ by n^2 inequalities xij ≥ 0 and 2n − 1 linearly independent linear equations (since
if all but one row and column sums are equal to 1, the remaining sum also is equal to 1).
By algebraic characterization of extreme points, at least n^2 − (2n − 1) = (n − 1)^2 > (n − 2)n
entries in P should be zeros.
⇒ there is a column in P with at least n − 1 zero entries
⇒ ∃i∗ , j∗ : Pi∗ j∗ = 1.
⇒ P belongs to the face {P ∈ Πn : Pi∗ j∗ = 1} of Πn and thus is an extreme point of the face.
In other words, the matrix obtained from P by eliminating i∗ -th row and j∗ -th column is an
extreme point in the set Πn−1 . Iterating the reasoning, we conclude that P is a permutation
matrix. □

2.58
Recessive Directions and Recessive Cone

X = {x ∈ Rn : Ax ≤ b} is nonempty
♣ Definition. A vector d ∈ Rn is called a recessive direction of X, if X contains a ray
directed by d:
∃x̄ ∈ X : x̄ + td ∈ X ∀t ≥ 0.
♠ Observation: d is a recessive direction of X iff Ad ≤ 0.
♠ Corollary: Recessive directions of X form a polyhedral cone, namely, the cone Rec(X) =
{d : Ad ≤ 0}, called the recessive cone of X.
Whenever x ∈ X and d ∈ Rec(X), one has x + td ∈ X for all t ≥ 0. In particular,
X + Rec(X) = X.
♠ Observation: The larger is a polyhedral set, the larger is its recessive cone:
X ⊂ Y are polyhedral ⇒ Rec(X) ⊂ Rec(Y ).
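
The criterion Ad ≤ 0 is easy to test numerically. The sketch below checks it for a small unbounded polyhedral set in the plane and confirms on a few sample steps that moving along a recessive direction keeps a point inside X; the function names and the sample data are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def is_recessive(A, d):
    """True iff A d <= 0, i.e., d belongs to Rec(X) for X = {x : A x <= b}."""
    return all(dot(row, d) <= 0 for row in A)

def in_set(A, b, x):
    """True iff A x <= b."""
    return all(dot(row, x) <= bi for row, bi in zip(A, b))

# Hypothetical data: X = {x in R^2 : x1 >= 0, x2 >= 0, x1 - x2 <= 1}.
A = [(-1, 0), (0, -1), (1, -1)]
b = [0, 0, 1]
x = (1, 0)

print(is_recessive(A, (1, 1)))  # True
print(is_recessive(A, (1, 0)))  # False: the third constraint grows along (1, 0)
print(all(in_set(A, b, (x[0] + t, x[1] + t)) for t in range(100)))  # True
print(in_set(A, b, (x[0] + 1, x[1])))  # False: a step along (1, 0) leaves X
```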

2.59
Recessive Subspace of a Polyhedral Set

X = {x ∈ Rn : Ax ≤ b} is nonempty
♠ Observation: Directions d of lines contained in X are exactly the vectors from the
recessive subspace
L = KerA := {d : Ad = 0} = Rec(X) ∩ [−Rec(X)]
of X, and X = X + KerA. In particular,
X = X̄ + KerA,
X̄ = {x ∈ Rn : Ax ≤ b, x ∈ [KerA]⊥ }
Note: X̄ is a polyhedral set which does not contain lines.

2.60
Pointed Polyhedral Cones & Extreme Rays
K = {x : Ax ≤ 0}
♣ Definition. Polyhedral cone K is called pointed, if it does not contain lines. Equivalently:
K is pointed iff K ∩ [−K] = {0}.
Equivalently: K is pointed iff KerA = {0}.
♣ Definition An extreme ray of K is a face of K which is a nontrivial ray (i.e., the set
R+ d = {td : t ≥ 0} associated with a nonzero vector, called a generator of the ray).
♠ Geometric characterization: A vector d ∈ K is a generator of an extreme ray of K
(in short: d is an extreme direction of K) iff it is nonzero and whenever d is a sum of two
vectors from K, both vectors are nonnegative multiples of d:
d = d1 + d2 , d 1 , d 2 ∈ K ⇒
∃t1 ≥ 0, t2 ≥ 0 : d1 = t1 d, d2 = t2 d.
Example: K = Rn+ . This cone is pointed, and its extreme directions are positive multiples
of basic orths. There are n extreme rays Ri = {x ∈ Rn : xi ≥ 0, xj = 0 ∀(j ≠ i)}.
♠ Observation: d is an extreme direction of K iff some (and then – all) positive multiples
of d are extreme directions of K.

2.61
$K = \{x \in \mathbb{R}^n : Ax \le 0\} = \{x \in \mathbb{R}^n : a_i^T x \le 0,\ i \in \mathcal{I} = \{1, ..., m\}\}$

♠ Algebraic characterization of extreme directions: Let K be a pointed cone. Direction
d ∈ K\{0} is an extreme direction of K iff among the homogeneous inequalities aTi x ≤ 0
which are active at d (i.e., are satisfied at d as equations) there are n − 1 inequalities with
linearly independent ai ’s.
Proof. Let 0 ≠ d ∈ K, I = {i ∈ ℐ : aTi d = 0}.
• Let the set {ai : i ∈ I} contain n − 1 linearly independent vectors, say, a1 , ..., an−1 . Let us
prove that then d is an extreme direction of K. Indeed, the set
L = {x : aTi x = 0, 1 ≤ i ≤ n − 1} ⊃ KI
is a one-dimensional linear subspace in Rn . Since 0 ≠ d ∈ L, we have L = Rd. Since d ∈ K,
the ray R+ d is contained in KI : R+ d ⊂ KI . Since KI ⊂ L, all vectors from KI are real
multiples of d. Since K is pointed, no negative multiples of d belong to KI .
⇒ KI = R+ d and d ≠ 0, i.e., KI is an extreme ray of K, and d is a generator of this ray.


2.62
• Let d be an extreme direction of K, that is, R_+ d is a face of K, and let us prove that the
set {a_i : i ∈ I} contains n − 1 linearly independent vectors.
Assuming the opposite, the solution set L of the homogeneous system of linear equations
a_i^T x = 0, i ∈ I
is of dimension ≥ 2 and thus contains a vector h which is not proportional to d. When i ∉ I,
we have a_i^T d < 0 and thus a_i^T (d + th) ≤ 0 when |t| is small enough; when i ∈ I, we have
a_i^T (d + th) = 0 for all t.
⇒ ∃t̄ > 0 : |t| ≤ t̄ ⇒ a_i^T [d + th] ≤ 0 ∀i ∈ {1, ..., m}
⇒ the face K_I of K (which is the smallest face of K containing d) contains two non-proportional
nonzero vectors d, d + t̄h, and thus is strictly larger than R_+ d.
⇒ R+ d is not a face of K, which is a desired contradiction. 
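
The algebraic characterization can be turned into a direct test. The following is a minimal sketch, assuming the cone K = {x : Ax ≤ 0} is pointed (this is not checked) and the data are integers or exact fractions; the helper names rank and is_extreme_direction are illustrative and not part of the course material.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def is_extreme_direction(A, d):
    """Test d against the criterion: d in K, d != 0, and the rows of A
    active at d contain n - 1 linearly independent vectors.
    Assumes K = {x : Ax <= 0} is pointed."""
    n = len(d)
    if all(v == 0 for v in d):
        return False
    products = [sum(a * x for a, x in zip(row, d)) for row in A]
    if any(p > 0 for p in products):
        return False  # d violates some inequality, so d is not in K
    active = [row for row, p in zip(A, products) if p == 0]
    return rank(active) == n - 1

# K = R^3_+ written as {x : -x_i <= 0}
A = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]
print(is_extreme_direction(A, [1, 0, 0]))  # basic orth
print(is_extreme_direction(A, [1, 1, 0]))  # sum of two basic orths
```

For the nonnegative orthant this reproduces the example above: basic orths pass the test, while a sum of two distinct basic orths has only one active inequality and fails it.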

2.63
Base of a Cone
K = {x ∈ Rn : Ax ≤ 0}.

♣ Definition. A set B of the form


B = {x ∈ K : f T x = 1} (∗)
is called a base of K, if it is nonempty and intersects with every (nontrivial) ray in K:
∀(0 ≠ d ∈ K) ∃! t ≥ 0 : td ∈ B.
Example: The set
∆_n = {x ∈ R^n_+ : Σ_{i=1}^n x_i = 1}
is a base of R^n_+.
♠ Observation: Set (∗) is a base of K iff K ≠ {0} and f makes strictly positive inner
products with all nonzero vectors from K:
0 ≠ x ∈ K ⇒ f^T x > 0.

2.64
3D cone K and its base B (pentagon)
Note: extreme rays of K are generated by extreme points of B

♠ Facts:
• K possesses a base iff K ≠ {0} and K is pointed.
• K possesses a base B iff K possesses extreme rays, and there is one-to-one correspondence
between extreme rays of K and extreme points of B: extreme directions of K are exactly
positive multiples of extreme points of B.
• The recessive cone of a base B of K is trivial: Rec(B) = {0}.

2.65
K = {x ∈ Rn : Ax ≤ 0}.
Observations:
• If K possesses extreme rays, then K is nontrivial (K ≠ {0}) and pointed (K ∩ [−K] = {0}).
In fact, the converse is also true.
• The set of extreme rays of K is finite.
Indeed, there are no more extreme rays than faces.

2.66
Towards the Main Theorem:
First Step

Theorem. Let
X = {x ∈ Rn : Ax ≤ b}
be a nonempty polyhedral set which does not contain lines. Then
(i) The set V = Ext{X} of extreme points of X is nonempty and finite:
V = {v1 , ..., vN }
(ii) The set of extreme rays of the recessive cone Rec(X) of X is finite; let
R = {r1 , ..., rM }
be a collection of generators of the extreme rays, one generator per ray.
(iii) One has
X = Conv(V) + Cone(R)
  = {x = Σ_{i=1}^N λ_i v_i + Σ_{j=1}^M µ_j r_j : λ_i ≥ 0 ∀i, Σ_{i=1}^N λ_i = 1, µ_j ≥ 0 ∀j}

2.67
Main Lemma: Let
X = {x ∈ Rn : aTi x ≤ bi , 1 ≤ i ≤ m}
be a nonempty polyhedral set which does not contain lines. Then the set V = Ext{X} of
extreme points of X is nonempty and finite, and
X = Conv(V ) + Rec(X).
Proof: Induction in m = dim X.
Base m = 0 is evident: here
X = {a}, V = {a}, Rec(X) = {0}.
Inductive step m ⇒ m + 1: Let the statement be true for polyhedral sets of dimension
≤ m, and let dim X = m + 1, M = Aff(X), L be the linear subspace parallel to M.
• Take a point x̄ ∈ X and a nonzero direction e ∈ L. Since X does not contain lines, either
e, or −e, or both are not recessive directions of X. Swapping, if necessary, e and −e, assume
that −e ∉ Rec(X).

2.68
X = {x ∈ Rn : aTi x ≤ bi , 1 ≤ i ≤ m}
nonempty and does not contain lines
• Let us look at the ray x̄ − R_+ e = {x(t) = x̄ − te : t ≥ 0}. Since −e ∉ Rec(X), this ray is
not contained in X, that is, all linear inequalities 0 ≤ b_i − a_i^T x(t) ≡ α_i − β_i t are satisfied when
t = 0, and some of them are violated at certain values of t ≥ 0. In other words,
α_i ≥ 0 ∀i & ∃i : β_i > 0.
Let t̄ = min_{i: β_i > 0} α_i/β_i, I = {i : α_i − β_i t̄ = 0}.
Then by construction
x(t̄) = x̄ − t̄e ∈ X_I = {x ∈ X : a_i^T x = b_i, i ∈ I}
so that X_I is a face of X. We claim that this face is proper. Indeed, every constraint b_i − a_i^T x ≥ 0
with β_i > 0 is nonconstant on M = Aff(X) and thus is nonconstant on X. (At least) one
of these constraints b_{i∗} − a_{i∗}^T x ≥ 0 becomes active at x(t̄), that is, i∗ ∈ I. This constraint is
active everywhere on X_I and is nonconstant on X, whence X_I ⊊ X.
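
The "moving along the ray" step is a ratio test. Below is a minimal sketch of it, assuming x̄ is feasible for X = {x : Ax ≤ b} and the data are exact (integers or fractions); the function name step_to_boundary and the 0-based indexing of constraints are illustrative choices, not the author's notation.

```python
from fractions import Fraction

def step_to_boundary(A, b, x_bar, e):
    """Move from x_bar along -e inside X = {x : Ax <= b} until blocked.

    Uses alpha_i = b_i - a_i^T x_bar and beta_i = -a_i^T e, so that
    b_i - a_i^T x(t) = alpha_i - beta_i * t for x(t) = x_bar - t*e.
    Returns (t_bar, x(t_bar), I), where I lists the 0-based indices of
    the constraints with alpha_i - beta_i * t_bar = 0.
    """
    def dot(u, v):
        return sum(Fraction(p) * Fraction(q) for p, q in zip(u, v))

    alpha = [Fraction(bi) - dot(ai, x_bar) for ai, bi in zip(A, b)]
    beta = [-dot(ai, e) for ai in A]
    ratios = [a / bt for a, bt in zip(alpha, beta) if bt > 0]
    if not ratios:
        # No beta_i > 0 means the ray x_bar - R_+ e stays in X forever.
        raise ValueError("-e is a recessive direction of X")
    t_bar = min(ratios)
    x_t = [Fraction(xi) - t_bar * Fraction(ei) for xi, ei in zip(x_bar, e)]
    active = [i for i, (a, bt) in enumerate(zip(alpha, beta)) if a - bt * t_bar == 0]
    return t_bar, x_t, active

# X = nonnegative quadrant {x : -x1 <= 0, -x2 <= 0}, start at (2, 3), e = (1, 1)
t_bar, x_t, I = step_to_boundary([[-1, 0], [0, -1]], [0, 0], [2, 3], [1, 1])
print(t_bar, x_t, I)
```

In this toy instance the first constraint blocks the motion, and the point reached lies on the proper face where x1 = 0.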

2.69
X = {x ∈ Rn : aTi x ≤ bi , 1 ≤ i ≤ m}
nonempty and does not contain lines
Situation: X ∋ x̄ = x(t̄) + t̄e with t̄ ≥ 0 and x(t̄) belonging to a proper face X_I of X.
• X_I is a proper face of X ⇒ dim X_I < dim X ⇒ (by inductive hypothesis) Ext(X_I) ≠ ∅ and
X_I = Conv(Ext(X_I)) + Rec(X_I)
⊂ Conv(Ext(X)) + Rec(X).
In particular, Ext(X) ≠ ∅; as we know, this set is finite.
• It is possible that e ∈ Rec(X). Then
x̄ = x(t̄) + t̄e ∈ [Conv(Ext(X)) + Rec(X)] + t̄e
⊂ Conv(Ext(X)) + Rec(X).
• It is possible that e ∉ Rec(X). In this case, applying the same “moving along the
ray” construction to the ray x̄ + R_+ e, we get, along with x(t̄) := x̄ − t̄e ∈ X_I, a point
x(−t̂) := x̄ + t̂e ∈ X_Î, where t̂ ≥ 0 and X_Î is a proper face of X, whence, same as above,
x(−t̂) ∈ Conv(Ext(X)) + Rec(X).
By construction, x̄ ∈ Conv{x(t̄), x(−t̂)}, whence, the set Conv(Ext(X)) + Rec(X) being convex,
x̄ ∈ Conv(Ext(X)) + Rec(X).

2.70
♣ Reasoning in pictures:
♠ Case A: e is a recessive direction of X.
• Let us move from x̄ along the direction −e.
— since e ∈ L, we all the time will stay in Aff(X)
— since −e is not a recessive direction of X, eventually we will be about to leave X. When
it happens, our position x′ = x̄ − λe, λ ≥ 0, will belong to a proper face X′ of X
⇒ dim(X′) < dim(X)
⇒ [Ind. Hyp.] x′ = v + r with v ∈ Conv(Ext(X′)) ⊂ Conv(Ext(X)), r ∈ Rec(X′) ⊂ Rec(X)
⇒ x̄ = v + [r + λe] ∈ Conv(Ext(X)) + Rec(X), since v ∈ Conv(Ext(X)) and r + λe ∈ Rec(X).

[Figure, Case A: the set X, the point x̄, the direction e, the point x′ on the proper face X′, and the point v.]

2.71
♣ Reasoning in pictures (continued):
♠ Case B: both e and −e are not recessive directions of X.
• As in Case A, we move from x̄ along the direction −e until hitting a proper face X′ of X
at a point x′.
⇒ [Ind. Hyp.] x′ = v + r with
v ∈ Conv(Ext(X′)) ⊂ Conv(Ext(X)), r ∈ Rec(X′) ⊂ Rec(X)
• Since e is not a recessive direction of X, when moving from x̄ along the direction e, we
eventually hit a proper face X″ of X at a point x″
⇒ [Ind. Hyp.] x″ = w + s with
w ∈ Conv(Ext(X″)) ⊂ Conv(Ext(X)), s ∈ Rec(X″) ⊂ Rec(X)
• x̄ is a convex combination of x′ and x″: x̄ = λx′ + (1 − λ)x″ with some λ ∈ [0, 1]
⇒ x̄ = [λv + (1 − λ)w] + [λr + (1 − λ)s] ∈ Conv(Ext(X)) + Rec(X),
since λv + (1 − λ)w ∈ Conv(Ext(X)) and λr + (1 − λ)s ∈ Rec(X).

[Figure, Case B: the set X, the point x̄, the direction e, the points x′ and x″ on the proper faces X′ and X″, and the points v and w.]

2.72
♠ Summary: We have proved that Ext(X) is nonempty and finite, and every point x̄ ∈ X
belongs to
Conv(Ext(X)) + Rec(X),
that is, X ⊂ Conv(Ext(X)) + Rec(X). Since
Conv(Ext(X)) ⊂ X
and X + Rec(X) = X, we have also
X ⊃ Conv(Ext(X)) + Rec(X)
⇒ X = Conv(Ext(X)) + Rec(X). Induction is complete. 

2.73
Corollaries of Main Lemma: Let X be a nonempty polyhedral set which does not contain
lines.
A. If X has a trivial recessive cone, then
X = Conv(Ext(X)).
B. If K is a nontrivial pointed polyhedral cone, the set of extreme rays of K is nonempty
and finite, and if r1 , ..., rM are generators of the extreme rays of K, then
K = Cone {r1 , ..., rM }.
Proof of B: Let B be a base of K, so that B is a nonempty polyhedral set with Rec(B) =
{0}. By A, Ext(B) is nonempty, finite and B = Conv(Ext(B)), whence K = Cone (Ext(B)).
It remains to note that every nontrivial ray in K intersects B, and a ray is extreme iff this
intersection is an extreme point of B.
♠ Augmenting Main Lemma with Corollary B, we get the Theorem.

2.74
♣ We have seen that if X is a nonempty polyhedral set not containing lines, then X admits
a representation
X = Conv(V ) + Cone {R} (∗)
where
• V = V∗ is the nonempty finite set of all extreme points of X;
• R = R∗ is a finite set comprised of generators of the extreme rays of Rec(X) (this set
can be empty).

♠ It is easily seen that this representation is “minimal:” Whenever X is represented in the


form of (∗) with finite sets V , R,
— V contains all vertices of X
— R contains generators of all extreme rays of Rec(X).

2.75
Structure of a Polyhedral Set

Main Theorem (i) Every nonempty polyhedral set X ⊂ Rn can be represented as


X = Conv(V ) + Cone (R) (∗)
where V ⊂ Rn is a nonempty finite set, and R ⊂ Rn is a finite set.
(ii) Vice versa, if a set X is given by representation (∗) with a nonempty finite set V and
a finite set R, then X is a nonempty polyhedral set.
Proof. (i): We know that (i) holds true when X does not contain lines. Now, every
nonempty polyhedral set X can be represented as
X = X̂ + L, L = Lin{f_1, ..., f_k},
where X̂ is a nonempty polyhedral set which does not contain lines. In particular,
X̂ = Conv{v_1, ..., v_N} + Cone{r_1, ..., r_M}
⇒ X = Conv{v_1, ..., v_N} + Cone{r_1, ..., r_M, f_1, −f_1, ..., f_k, −f_k}.
Note: In every representation (∗) of X, Cone (R) = Rec(X).

2.76
(ii): Let
X = Conv{v1 , ..., vN } + Cone {r1 , .., rM } ⊂ Rn [N ≥ 1, M ≥ 0]
We want to prove that X is a polyhedral set.
Indeed, X admits polyhedral representation:
X = {x : ∃λ, µ : x = Σ_{i=1}^N λ_i v_i + Σ_{j=1}^M µ_j r_j, λ ≥ 0, µ ≥ 0, Σ_{i=1}^N λ_i = 1}
and as such is polyhedral.

2.77
Immediate Corollaries

Corollary I. A nonempty polyhedral set X possesses extreme points iff X does not contain
lines. In addition, the set of extreme points of X is finite.
Indeed, if X does not contain lines, X has extreme points and their number is finite by
Main Lemma. When X contains lines, every point of X belongs to a line contained in X,
and thus X has no extreme points.

2.78
Corollary II. (i) A nonempty polyhedral set X is bounded iff its recessive cone is trivial:
Rec(X) = {0}, and in this case X is the convex hull of the (nonempty and finite) set of its
extreme points:
∅ ≠ Ext(X) is finite and X = Conv(Ext(X)).
(ii) The convex hull of a nonempty finite set V is a bounded polyhedral set, and
Ext(Conv(V)) ⊂ V.
Proof of (i): if Rec(X) = {0}, then X does not contain lines and therefore ∅ ≠ Ext(X) is
finite and
X = Conv(Ext(X)) + Rec(X)
= Conv(Ext(X)) + {0} (∗)
= Conv(Ext(X)),
and thus X is bounded as the convex hull of a finite set.
Vice versa, if X is bounded, then X clearly does not contain nontrivial rays and thus
Rec(X) = {0}.
Proof of (ii): By Main Theorem (ii),
X := Conv({v1 , ..., vm })
is a polyhedral set, and this set clearly is bounded. Besides this, X = Conv(V ) always
implies that Ext(X) ⊂ V .

2.79
Application examples:
• Every vector x from the set
{x ∈ R^n : 0 ≤ x_i ≤ 1, 1 ≤ i ≤ n, Σ_{i=1}^n x_i ≤ k}
(k is an integer) is a convex combination of Boolean vectors from this set.

• Every double-stochastic matrix is a convex combination of permutation matrices.

2.80
Application example: Polar of a polyhedral set. Let X ⊂ Rn be a polyhedral set
containing the origin. The polar Polar (X) of X is given by
Polar (X) = {y : y T x ≤ 1 ∀x ∈ X}.
Examples: • Polar(R^n) = {0}
• Polar({0}) = R^n
• If L is a linear subspace, then Polar(L) = L^⊥
• If K ⊂ R^n is a cone, then Polar(K) = −K_∗
• Polar({x : Σ_i |x_i| ≤ 1}) = {x : |x_i| ≤ 1 ∀i}
• Polar({x : |x_i| ≤ 1 ∀i}) = {x : Σ_i |x_i| ≤ 1}

2.81
Theorem. When X is a polyhedral set containing the origin, so is Polar (X), and
Polar (Polar (X)) = X.
Proof. • Representing
X = {x = Σ_i λ_i v_i + Σ_j µ_j r_j : λ ≥ 0, Σ_i λ_i = 1, µ ≥ 0}

we clearly get
Polar (X) = {y : y T vi ≤ 1 ∀i, y T rj ≤ 0 ∀j}
⇒ Polar (X) is a polyhedral set. Inclusion 0 ∈ Polar (X) is evident.
• Let us prove that Polar (Polar (X)) = X. By definition of the polar, we have X ⊂
Polar (Polar (X)). To prove the opposite inclusion, let x̄ ∈ Polar (Polar (X)), and let us
prove that x̄ ∈ X. X is polyhedral:
X = {x : aTi x ≤ bi , 1 ≤ i ≤ K}
and bi ≥ 0 due to 0 ∈ X. By scaling inequalities aTi x ≤ bi , we can assume further that all
nonzero bi ’s are equal to 1. Setting I = {i : bi = 0}, we have
i ∈ I ⇒ a_i^T x ≤ 0 ∀x ∈ X ⇒ λa_i ∈ Polar(X) ∀λ ≥ 0
⇒ x̄^T [λa_i] ≤ 1 ∀λ ≥ 0 ⇒ a_i^T x̄ ≤ 0
i ∉ I ⇒ a_i^T x ≤ b_i = 1 ∀x ∈ X ⇒ a_i ∈ Polar(X)
⇒ x̄^T a_i ≤ 1,
whence x̄ ∈ X. 

2.82
Corollary III. (i) A cone K is polyhedral iff it is the conic hull of a finite set:
K = {x ∈ Rn : Bx ≤ 0}
⇔ ∃R = {r1 , ..., rM } ⊂ Rn : K = Cone (R)
(ii) When K is a nontrivial and pointed polyhedral cone, one can take as R the set of
generators of the extreme rays of K.
Proof of (i): If K is a polyhedral cone, then K = K̂ + L with a linear subspace L and a
pointed cone K̂. By Main Theorem (i), we have
K̂ = Conv(Ext(K̂)) + Cone({r_1, ..., r_M})
  = Cone({r_1, ..., r_M})
since Ext(K̂) = {0}, whence
K = Cone({r_1, ..., r_M, f_1, −f_1, ..., f_s, −f_s})
where {f_1, ..., f_s} is a basis in L.
Vice versa, if K = Cone({r_1, ..., r_M}), then K is a polyhedral set (Main Theorem (ii)), that
is, K = {x : Ax ≤ b} for certain A, b. Since K is a cone, we have
K = Rec(K) = {x : Ax ≤ 0},
that is, K is a polyhedral cone. 

(ii) is Corollary B of Main Lemma.

2.83
Applications in LO
♣ Theorem. Consider a LO program
Opt = max_x {c^T x : Ax ≤ b},
and let the feasible set X = {x : Ax ≤ b} be nonempty and thus representable as
X = Conv({v1 , ..., vN }) + Cone ({r1 , ..., rM })
with properly selected v1 , ..., vN , r1 , ..., rM .
(i) The program is solvable iff c has nonpositive inner products with all rj .
(ii) If X does not contain lines and the program is bounded, then among its optimal
solutions, if any exist, there are vertices of X.
Proof. • We have
Opt := sup_{λ,µ} {Σ_i λ_i c^T v_i + Σ_j µ_j c^T r_j : λ_i ≥ 0 ∀i, Σ_i λ_i = 1, µ_j ≥ 0 ∀j} < +∞
⇔ c^T r ≤ 0 ∀r ∈ Cone({r_1, ..., r_M}) = Rec(X)
⇒ Opt = max_i c^T v_i.
Thus, Opt < +∞ implies that the best of the points v_1, ..., v_N is an optimal solution.
• It remains to note that when X does not contain lines, we can set {v_i}_{i=1}^N = Ext(X).

2.84
Application to Knapsack problem. A knapsack can store k items. You have n ≥ k items,
j-th of value cj ≥ 0. How to select items to be placed into the knapsack in order to get the
most valuable selection?
Solution: Assuming for a moment that we can put to the knapsack fractions of items, let
x_j be the fraction of item j we put to the knapsack. The most valuable selection then is
given by an optimal solution to the LO program
max_x {Σ_j c_j x_j : 0 ≤ x_j ≤ 1 ∀j, Σ_j x_j ≤ k}

The feasible set is nonempty, polyhedral and bounded, and all extreme points are Boolean
vectors from this set
⇒ There is a Boolean optimal solution.
In fact, the optimal solution is evident: we should put to the knapsack k most valuable of
the items.
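
The claim that the k most valuable items are optimal can be checked by brute force on a small instance. The sketch below enumerates all Boolean feasible vectors (the candidates for extreme points) rather than solving the LO program; the item values are made up for illustration.

```python
from itertools import combinations

def best_boolean_selection(values, k):
    """Maximize the total value over all selections of at most k items."""
    best_value, best_set = 0, ()
    for size in range(k + 1):
        for subset in combinations(range(len(values)), size):
            total = sum(values[j] for j in subset)
            if total > best_value:
                best_value, best_set = total, subset
    return best_value, best_set

def greedy_top_k(values, k):
    """Total value of the k most valuable items."""
    return sum(sorted(values, reverse=True)[:k])

values = [4, 1, 7, 3, 5]   # hypothetical item values c_j >= 0
k = 2
print(best_boolean_selection(values, k))
print(greedy_top_k(values, k))
```

Both computations give the same optimal value on this instance, as the slide predicts.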

2.85
Application to Assignment problem. There are n jobs and n workers. Every job takes
one man-hour. The profit of assigning worker i with job j is cij . How to assign workers
with jobs in such a way that every worker gets exactly one job, every job is carried out by
exactly one worker, and the total profit of the assignment is as large as possible?
Solution: Assuming for a moment that a worker can distribute his time between several
jobs and denoting x_ij the fraction of activity of worker i spent on job j, we get a relaxed
problem
max_x {Σ_{i,j} c_ij x_ij : x_ij ≥ 0, Σ_i x_ij = 1 ∀j, Σ_j x_ij = 1 ∀i}

The feasible set is polyhedral, nonempty and bounded


⇒ Program is solvable, and among the optimal solutions there are extreme points of the
set of double stochastic matrices, i.e., permutation matrices
⇒ Relaxation is exact!
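
A small numerical illustration of exactness, under the assumption of a made-up 3×3 profit matrix: the best permutation is found by enumeration, and a fractional doubly stochastic matrix (the average of two permutation matrices) is seen to do no better. This is a sanity check on one instance, not a proof.

```python
from fractions import Fraction
from itertools import permutations

def best_assignment(c):
    """Best total profit over all one-to-one assignments (brute force)."""
    n = len(c)
    best = max(permutations(range(n)),
               key=lambda p: sum(c[i][p[i]] for i in range(n)))
    return sum(c[i][best[i]] for i in range(n)), best

def profit(c, x):
    """Objective value sum_ij c_ij x_ij of a (possibly fractional) assignment."""
    return sum(c[i][j] * x[i][j] for i in range(len(c)) for j in range(len(c)))

c = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]            # hypothetical profits c_ij
half = Fraction(1, 2)
# Average of the identity permutation and the permutation (0, 2, 1).
x = [[1, 0, 0],
     [0, half, half],
     [0, half, half]]
print(best_assignment(c))
print(profit(c, x))
```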

2.86
Theory of Systems of Linear Inequalities
and
Duality

♣ We still do not know how to answer some most basic questions about polyhedral sets,
e.g.:
♠ How to recognize that a polyhedral set X = {x ∈ Rn : Ax ≤ b} is/is not empty?
♠ How to recognize that a polyhedral set X = {x ∈ Rn : Ax ≤ b} is/is not bounded?
♠ How to recognize that two polyhedral sets X = {x ∈ R^n : Ax ≤ b} and X′ = {x : A′x ≤ b′}
are/are not distinct?
♠ How to recognize that a given LO program is feasible/bounded/solvable?
♠ .....
Our current goal is to find answers to these and similar questions, and these answers come
from Linear Programming Duality Theorem which is the second main theoretical result in
LO.

3.1
Theorem on Alternative

♣ Consider a system of m strict and nonstrict linear inequalities in variables x ∈ R^n:
a_i^T x < b_i, i ∈ I
a_i^T x ≤ b_i, i ∈ Ī        (S)
• a_i ∈ R^n, b_i ∈ R, 1 ≤ i ≤ m,
• I ⊂ {1, ..., m}, Ī = {1, ..., m}\I.
Note: (S) is a universal form of a finite system of linear inequalities in n variables.
♣ Main questions on (S) [operational form]:
• How to find a solution to the system if one exists?
• How to find out that (S) is infeasible?
♠ Main questions on (S) [descriptive form]:
• How to certify that (S) is solvable?
• How to certify that (S) is infeasible?

3.2
♠ The simplest certificate for solvability of (S) is a solution: plug a candidate certificate
into the system and check that the inequalities are satisfied.
Example: The vector x̄ = [10; 10; 10] is a solvability certificate for the system
−x1 −x2 −x3 < −29
x1 +x2 ≤ 20
x2 +x3 ≤ 20
x1 +x3 ≤ 20
– when plugging it into the system, we get valid numerical inequalities.
But: How to certify that (S) has no solution? E.g., how to certify that the system
−x1 −x2 −x3 < −30
x1 +x2 ≤ 20
x2 +x3 ≤ 20
x1 +x3 ≤ 20
has no solutions?

3.3
♣ How to certify that (S) has no solutions?
♠ A recipe: Take a weighted sum, with nonnegative weights, of the inequalities from the
system, thus getting a strict or nonstrict scalar linear inequality which, due to its origin, is a
consequence of the system – it must be satisfied at every solution to (S). If the resulting
inequality has no solutions at all, then (S) is unsolvable.
Example: To certify that the system
2× −x1 −x2 −x3 < −30
1× x1 +x2 ≤ 20
1× x2 +x3 ≤ 20
1× x1 +x3 ≤ 20
has no solutions, take the weighted sum of the inequalities with the weights 2, 1, 1, 1 shown on the left,
thus arriving at the inequality
0 · x1 + 0 · x2 + 0 · x3 < 0.
This is a contradictory inequality which is a consequence of the system
⇒ weights λ = [2; 1; 1; 1] certify insolvability.

3.4
A recipe for certifying insolvability:
• assign inequalities of (S) with weights λ_i ≥ 0 and sum them up, thus arriving at the
inequality
[Σ_{i=1}^m λ_i a_i]^T x Ω Σ_{i=1}^m λ_i b_i,        (!)
where Ω = ”<” when Σ_{i∈I} λ_i > 0, and Ω = ”≤” when Σ_{i∈I} λ_i = 0.
• If (!) has no solutions, (S) is insolvable.
♠ Observation: Inequality (!) has no solution iff Σ_{i=1}^m λ_i a_i = 0 and, in addition,
• Σ_{i=1}^m λ_i b_i ≤ 0 when Σ_{i∈I} λ_i > 0
• Σ_{i=1}^m λ_i b_i < 0 when Σ_{i∈I} λ_i = 0
♣ We have arrived at
Proposition: Given system (S), let us associate with it two systems of linear inequalities
in variables λ_1, ..., λ_m:
(I): λ_i ≥ 0 ∀i, Σ_{i=1}^m λ_i a_i = 0, Σ_{i=1}^m λ_i b_i ≤ 0, Σ_{i∈I} λ_i > 0
(II): λ_i ≥ 0 ∀i, Σ_{i=1}^m λ_i a_i = 0, Σ_{i=1}^m λ_i b_i < 0, λ_i = 0 for i ∈ I
If at least one of the systems (I), (II) has a solution, then (S) has no solutions.

3.5
General Theorem on Alternative: Consider, along with system of linear inequalities
a_i^T x < b_i, i ∈ I
a_i^T x ≤ b_i, i ∈ Ī        (S)
in variables x ∈ R^n, two systems of linear inequalities in variables λ ∈ R^m:
(I): λ_i ≥ 0 ∀i, Σ_{i=1}^m λ_i a_i = 0, Σ_{i=1}^m λ_i b_i ≤ 0, Σ_{i∈I} λ_i > 0
(II): λ_i ≥ 0 ∀i, Σ_{i=1}^m λ_i a_i = 0, Σ_{i=1}^m λ_i b_i < 0, λ_i = 0 for i ∈ I
System (S) has no solutions if and only if at least one of the systems (I), (II) has a solution.
Remark: Strict inequalities in (S) in fact do not participate in (II). As a result, (II) has a
solution iff the “nonstrict” subsystem
aTi x ≤ bi , i ∈ I (S 0 )
of (S) has no solutions.
Remark: GTA says that a finite system of linear inequalities has no solutions if and only
if (one of two) other systems of linear inequalities has a solution. Such a solution can be
considered as a certificate of insolvability of (S): (S) is insolvable if and only if such an
insolvability certificate exists.
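
Verifying an insolvability certificate is plain arithmetic. The following minimal sketch checks whether a given λ solves system (I) or system (II); the function name classify_certificate and the encoding of the index set I as a list of Boolean flags are illustrative assumptions.

```python
def classify_certificate(A, b, strict, lam):
    """Return "I" or "II" if lam solves that system for (S), else None.

    A[i], b[i] describe the i-th inequality a_i^T x (< or <=) b_i;
    strict[i] is True exactly when the i-th inequality is strict (i in I).
    """
    m, n = len(A), len(A[0])
    if any(l < 0 for l in lam):
        return None
    # sum_i lam_i a_i must vanish.
    combo = [sum(lam[i] * A[i][j] for i in range(m)) for j in range(n)]
    if any(v != 0 for v in combo):
        return None
    rhs = sum(l * bi for l, bi in zip(lam, b))
    strict_weight = sum(l for l, s in zip(lam, strict) if s)
    if strict_weight > 0 and rhs <= 0:
        return "I"
    if strict_weight == 0 and rhs < 0:
        return "II"
    return None

# The two example systems from slides 3.2-3.3 (right-hand sides -30 and -29).
A = [[-1, -1, -1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
strict = [True, False, False, False]
print(classify_certificate(A, [-30, 20, 20, 20], strict, [2, 1, 1, 1]))
print(classify_certificate(A, [-29, 20, 20, 20], strict, [2, 1, 1, 1]))
```

The weights λ = [2; 1; 1; 1] solve (I) for the system with right-hand side −30, while the same weights certify nothing for the solvable system with right-hand side −29.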

3.6

Proof of GTA: In one direction: “If (I) or (II) has a solution, then (S) has no solutions”
the statement is already proved. Now assume that (S) has no solutions, and let us prove
that one of the systems (I), (II) has a solution. Consider the system of homogeneous linear
inequalities in variables x, t, ε:
a_i^T x − b_i t + ε ≤ 0, i ∈ I
a_i^T x − b_i t ≤ 0, i ∈ Ī
−t + ε ≤ 0
−ε < 0
We claim that this system has no solutions. Indeed, assuming that the system has a solution
x̄, t̄, ε̄, we have ε̄ > 0, whence
t̄ > 0 & a_i^T x̄ < b_i t̄, i ∈ I & a_i^T x̄ ≤ b_i t̄, i ∈ Ī,
⇒ x = x̄/t̄ is well defined and solves unsolvable system (S), which is impossible.

3.7
Situation: System
a_i^T x − b_i t + ε ≤ 0, i ∈ I
a_i^T x − b_i t ≤ 0, i ∈ Ī
−t + ε ≤ 0
−ε < 0
has no solutions, or, equivalently, the homogeneous linear inequality
−ε ≥ 0
is a consequence of the system of homogeneous linear inequalities
−a_i^T x + b_i t − ε ≥ 0, i ∈ I
−a_i^T x + b_i t ≥ 0, i ∈ Ī
t − ε ≥ 0
in variables x, t, ε. By Homogeneous Farkas Lemma, there exist µ_i ≥ 0, 1 ≤ i ≤ m, µ ≥ 0
such that
Σ_{i=1}^m µ_i a_i = 0 & Σ_{i=1}^m µ_i b_i + µ = 0 & Σ_{i∈I} µ_i + µ = 1.
When µ > 0, setting λ_i = µ_i/µ, we get
λ ≥ 0, Σ_{i=1}^m λ_i a_i = 0, Σ_{i=1}^m λ_i b_i = −1,
⇒ when Σ_{i∈I} λ_i > 0, λ solves (I), otherwise λ solves (II).
When µ = 0, setting λ_i = µ_i, we get
λ ≥ 0, Σ_i λ_i a_i = 0, Σ_i λ_i b_i = 0, Σ_{i∈I} λ_i = 1,
and λ solves (I).

3.8
♣ GTA is equivalent to the following
Principle: A finite system of linear inequalities has no solution iff one can get, as a legitimate
(i.e., compatible with the common rules of operating with inequalities) weighted sum of
inequalities from the system, a contradictory inequality, i.e., either inequality 0T x ≤ −1, or
the inequality 0T x < 0.
The advantage of this Principle is that it does not require converting the system into a
standard form. For example, to see that the system of linear constraints
x1 +2x2 < 5
2x1 +3x2 ≥ 3
3x1 +4x2 = 1
has no solutions, it suffices to take the weighted sum of these constraints with the weights
−1, 2, −1, thus arriving at the contradictory inequality
0 · x1 + 0 · x2 > 0
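
The bookkeeping behind a "legitimate" weighted sum (a negative weight flips the sense of an inequality, an equation accepts a weight of any sign, and any strict summand makes the sum strict) can be sketched as follows. The tuple encoding of constraints and the function names are illustrative assumptions.

```python
FLIP = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "="}

def aggregate(constraints, weights):
    """Weighted sum of constraints given as (coeffs, relation, rhs).

    Raises ValueError if the weighted constraints point in different
    directions, in which case the sum is not a legitimate consequence.
    """
    n = len(constraints[0][0])
    coeffs, rhs = [0] * n, 0
    directions, strict = set(), False
    for (a, rel, beta), w in zip(constraints, weights):
        if w == 0:
            continue
        if w < 0:
            rel = FLIP[rel]          # multiplying by w < 0 flips the sense
        coeffs = [c + w * ai for c, ai in zip(coeffs, a)]
        rhs += w * beta
        if rel != "=":
            directions.add(rel[0])   # '<' or '>'
            strict = strict or rel in ("<", ">")
    if len(directions) > 1:
        raise ValueError("weights mix '<' and '>' directions")
    direction = directions.pop() if directions else "="
    rel = direction if (strict or direction == "=") else direction + "="
    return coeffs, rel, rhs

def is_contradictory(coeffs, rel, rhs):
    """True if the inequality is 0^T x (rel) rhs with a false numerical claim."""
    if any(c != 0 for c in coeffs):
        return False
    holds = {"<": 0 < rhs, "<=": 0 <= rhs, ">": 0 > rhs,
             ">=": 0 >= rhs, "=": 0 == rhs}[rel]
    return not holds

system = [([1, 2], "<", 5), ([2, 3], ">=", 3), ([3, 4], "=", 1)]
result = aggregate(system, [-1, 2, -1])
print(result, is_contradictory(*result))
```

With the weights −1, 2, −1 from the slide, the aggregate is 0 · x1 + 0 · x2 > 0, which is contradictory.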

3.9
♣ Specifying the system in question and applying GTA, we can obtain various particular
cases of GTA, e.g., as follows:
Inhomogeneous Farkas Lemma: A nonstrict linear inequality
a^T x ≤ α        (!)
is a consequence of a solvable system of nonstrict linear inequalities
a_i^T x ≤ b_i, 1 ≤ i ≤ m        (S)
if and only if (!) can be obtained by taking weighted sum, with nonnegative coefficients, of
the inequalities from the system and the identically true inequality 0^T x ≤ 1: (S) implies (!)
if and only if there exist nonnegative weights λ_0, λ_1, ..., λ_m such that
λ_0 · [1 − 0^T x] + Σ_{i=1}^m λ_i [b_i − a_i^T x] ≡ α − a^T x,
or, which is the same, iff there exist nonnegative λ_0, λ_1, ..., λ_m such that
Σ_{i=1}^m λ_i a_i = a, Σ_{i=1}^m λ_i b_i + λ_0 = α,
or, which again is the same, iff there exist nonnegative λ_1, ..., λ_m such that
Σ_{i=1}^m λ_i a_i = a, Σ_{i=1}^m λ_i b_i ≤ α.

3.10
aTi x ≤ bi , 1 ≤ i ≤ m (S)
aT x ≤ α (!)

Proof. If (!) can be obtained as a weighted sum, with nonnegative coefficients, of


the inequalities from (S) and the inequality 0T x ≤ 1, then (!) clearly is a corollary of (S)
independently of whether (S) is or is not solvable.
Now let (S) be solvable and (!) be a consequence of (S); we want to prove that (!) is
a combination, with nonnegative weights, of the constraints from (S) and the constraint
0T x ≤ 1. Since (!) is a consequence of (S), the system
−a^T x < −α, a_i^T x ≤ b_i, 1 ≤ i ≤ m        (M)
has no solutions, whence, by GTA, a legitimate weighted sum of the inequalities from the
system is contradictory, that is, there exist µ ≥ 0, λ_i ≥ 0 such that
−µa + Σ_{i=1}^m λ_i a_i = 0, and
0 ≥ −µα + Σ_{i=1}^m λ_i b_i when µ > 0,
0 > −µα + Σ_{i=1}^m λ_i b_i when µ = 0.        (!!)

3.11
Situation: the system
−a^T x < −α, a_i^T x ≤ b_i, 1 ≤ i ≤ m        (M)
has no solutions, whence there exist µ ≥ 0, λ ≥ 0 such that
−µa + Σ_{i=1}^m λ_i a_i = 0, and
0 ≥ −µα + Σ_{i=1}^m λ_i b_i when µ > 0,
0 > −µα + Σ_{i=1}^m λ_i b_i when µ = 0.        (!!)
Claim: µ > 0. Indeed, otherwise the inequality −a^T x < −α does not participate in the
weighted sum of the constraints from (M) which is a contradictory inequality
⇒ (S) can be led to a contradiction by taking weighted sum of the constraints
⇒ (S) is infeasible, which is a contradiction.
• Since µ > 0, replacing λ_i with λ_i/µ, we get from (!!)
Σ_{i=1}^m λ_i a_i = a & Σ_{i=1}^m λ_i b_i ≤ α.

3.12
Why is GTA a deep fact?

♣ Consider the system of four linear inequalities in variables u, v:


−1 ≤ u ≤ 1, −1 ≤ v ≤ 1
and let us derive its consequence as follows:
−1 ≤ u ≤ 1, −1 ≤ v ≤ 1
⇒ u² ≤ 1, v² ≤ 1
⇒ u² + v² ≤ 2
⇒ u + v = 1·u + 1·v ≤ √(1² + 1²) · √(u² + v²)
⇒ u + v ≤ √2 · √2 = 2
We have derived from solvable system of nonstrict linear inequalities a consequence which
is a nonstrict linear inequality, and the derivation was “highly nonlinear.”
A statement which says that every derivation of this type can be replaced by just taking
weighted sum of the original inequalities and the trivial inequality 0T x ≤ 1 is a deep statement
indeed!

3.13
♣ For every system S of inequalities, linear or nonlinear alike, taking weighted sums of
inequalities of the system and trivial – identically true – inequalities always results in a
consequence of S. However, GTA heavily exploits the fact that the inequalities of the
original system and a consequence we are looking for are linear. Already for quadratic
inequalities, the statement similar to GTA fails to be true. For example, the quadratic
inequality
x² ≤ 1 (!)
is a consequence of the system of linear (and thus quadratic) inequalities
−1 ≤ x ≤ 1 (∗)
Nevertheless, (!) cannot be represented as a weighted sum of the inequalities from (∗) and
identically true linear and quadratic inequalities, like
0 · x ≤ 1, x² ≥ 0, x² − 2x + 1 ≥ 0, ...

3.14
Answering Questions

♣ How to certify that a polyhedral set X = {x ∈ Rn : Ax ≤ b} is empty/nonempty?


♠ A certificate for X to be nonempty is a solution x̄ to the system Ax ≤ b.
♠ A certificate for X to be empty is a solution λ̄ to the system λ ≥ 0, AT λ = 0, bT λ < 0.
In both cases, X possesses the property in question iff it can be certified as explained above
(“the certification schemes are complete”).
Note: All certification schemes to follow are complete!

3.15
Examples: • The vector x = [1; ...; 1] ∈ Rn certifies that the polyhedral set
X = {x ∈ Rn : x1 + x2 ≤ 2, x2 + x3 ≤ 2, ..., xn + x1 ≤ 2,
−x1 − ... − xn ≤ −n}
is nonempty.
• The vector λ = [1; 1; ...; 1; 2] ∈ Rn+1 ≥ 0 certifies that the polyhedral set
X = {x ∈ Rn : x1 + x2 ≤ 2, x2 + x3 ≤ 2, ..., xn + x1 ≤ 2,
−x1 − ... − xn ≤ −n − 0.01}
is empty. Indeed, summing up the n + 1 constraints defining X = {x : Ax ≤ b} with weights
λi , we get the contradictory inequality
0 ≡ 2(x_1 + ... + x_n) − 2[x_1 + ... + x_n] = [A^T λ]^T x ≤ b^T λ = 2n − 2(n + 0.01) = −0.02 < 0.
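
The emptiness certificate from this example can be verified mechanically for any particular n. The sketch below builds the n + 1 constraints and checks λ ≥ 0, A^T λ = 0, b^T λ < 0 in exact arithmetic; the helper names ring_system and is_emptiness_certificate, and the choice n = 5, are illustrative.

```python
from fractions import Fraction

def ring_system(n, last_rhs):
    """Rows x_i + x_{i+1} <= 2 (indices cyclic), then -x_1 - ... - x_n <= last_rhs."""
    A, b = [], []
    for i in range(n):
        row = [0] * n
        row[i] += 1
        row[(i + 1) % n] += 1
        A.append(row)
        b.append(2)
    A.append([-1] * n)
    b.append(last_rhs)
    return A, b

def is_emptiness_certificate(A, b, lam):
    """True if lam >= 0, A^T lam = 0 and b^T lam < 0."""
    if any(l < 0 for l in lam):
        return False
    n = len(A[0])
    At_lam = [sum(l * row[j] for l, row in zip(lam, A)) for j in range(n)]
    bt_lam = sum(l * bi for l, bi in zip(lam, b))
    return all(v == 0 for v in At_lam) and bt_lam < 0

n = 5
lam = [1] * n + [2]
A, b = ring_system(n, -n - Fraction(1, 100))
print(is_emptiness_certificate(A, b, lam))   # empty set: certificate is valid
A, b = ring_system(n, -n)
print(is_emptiness_certificate(A, b, lam))   # nonempty set: b^T lam = 0, no certificate
```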

3.16
♣ How to certify that a linear inequality cT x ≤ d is violated somewhere on a polyhedral set
X = {x ∈ Rn : Ax ≤ b},
that is, the inequality is not a consequence of the system Ax ≤ b?
A certificate is x̄ such that Ax̄ ≤ b and cT x̄ > d.

3.17
♣ How to certify that a linear inequality cT x ≤ d is satisfied everywhere on a polyhedral set
X = {x ∈ Rn : Ax ≤ b},
that is, the inequality is a consequence of the system Ax ≤ b?
♠ The situation in question arises in two cases:
A. X is empty, the target inequality is an arbitrary one
B. X is nonempty, the target inequality is a consequence of the system Ax ≤ b
Consequently, to certify the fact in question means
— either to certify that X is empty, the certificate being λ such that λ ≥ 0, AT λ = 0, bT λ < 0,
— or to certify that X is nonempty by pointing out a solution x̄ to the system Ax ≤ b and
to certify the fact that cT x ≤ d is a consequence of the solvable system Ax ≤ b by pointing
out a λ which satisfies the system λ ≥ 0, AT λ = c, bT λ ≤ d (we have used Inhomogeneous
Farkas Lemma).

3.18
Note: In the second case, we can omit the necessity to certify that X ≠ ∅, since the
existence of λ satisfying λ ≥ 0, A^T λ = c, b^T λ ≤ d always is sufficient for c^T x ≤ d to be a
consequence of Ax ≤ b.

3.19
Example:• To certify that the linear inequality
cT x := x1 + ... + xn ≤ d := n − 0.01
is violated somewhere on the polyhedral set
X = {x ∈ Rn : x1 + x2 ≤ 2, x2 + x3 ≤ 2, ..., xn + x1 ≤ 2}
= {x : Ax ≤ b}
it suffices to note that x = [1; ...; 1] ∈ X and n = cT x > d = n − 0.01
• To certify that the linear inequality
cT x := x1 + ... + xn ≤ d := n
is satisfied everywhere on the above X, it suffices to note that when taking weighted sum
of inequalities defining X, the weights being 1/2, we get the target inequality.
Equivalently: for λ = [1/2; ...; 1/2] ∈ Rn it holds λ ≥ 0, AT λ = [1; ...; 1] = c, bT λ = n ≤ d

3.20
♣ How to certify that a polyhedral set X = {x ∈ Rn : Ax ≤ b} does not contain a polyhedral
set Y = {x ∈ Rn : Cx ≤ d}?
A certificate is a point x̄ such that Cx̄ ≤ d (i.e., x̄ ∈ Y) and x̄ does not solve the system
Ax ≤ b (i.e., x̄ ∉ X).
♣ How to certify that a polyhedral set
X = {x ∈ Rn : Ax ≤ b}
contains a polyhedral set
Y = {x ∈ Rn : Cx ≤ d}?
This situation arises in two cases:
— Y = ∅, X is arbitrary. To certify that this is the case, it suffices to point out λ such that
λ ≥ 0, C T λ = 0, dT λ < 0
— Y is nonempty and every one of the m linear inequalities aTi x ≤ bi defining X is satisfied
everywhere on Y . To certify that this is the case, it suffices to point out x̄, λ1 , ..., λm such
that
Cx̄ ≤ d & λ^i ≥ 0, C^T λ^i = a_i, d^T λ^i ≤ b_i, 1 ≤ i ≤ m.
Note: Same as above, we can omit the necessity to point out x̄.

3.21
Examples. • To certify that the set
Y = {x ∈ R3 : −2 ≤ x1 + x2 ≤ 2, −2 ≤ x2 + x3 ≤ 2,
−2 ≤ x3 + x1 ≤ 2}
is not contained in the box
X = {x ∈ R3 : |xi | ≤ 2, 1 ≤ i ≤ 3},
it suffices to note that the vector x̄ = [3; −1; −1] belongs to Y and does not belong to X.
• To certify that the above Y is contained in
X′ = {x ∈ R^3 : |x_i| ≤ 3, 1 ≤ i ≤ 3}
note that summing up the 6 inequalities
x1 + x2 ≤ 2, −x1 − x2 ≤ 2, x2 + x3 ≤ 2, −x2 − x3 ≤ 2,
x3 + x1 ≤ 2, −x3 − x1 ≤ 2
defining Y with the nonnegative weights
λ1 = 1, λ2 = 0, λ3 = 0, λ4 = 1, λ5 = 1, λ6 = 0
we get
[x1 + x2 ] + [−x2 − x3 ] + [x3 + x1 ] ≤ 6 ⇒ x1 ≤ 3
— with the nonnegative weights
λ1 = 0, λ2 = 1, λ3 = 1, λ4 = 0, λ5 = 0, λ6 = 1
we get
[−x1 − x2 ] + [x2 + x3 ] + [−x3 − x1 ] ≤ 6 ⇒ −x1 ≤ 3
The inequalities −3 ≤ x_2, x_3 ≤ 3 can be obtained similarly ⇒ Y ⊂ X′.
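
The weights in this example can be checked in the form used by the certification scheme, C^T λ = a, d^T λ ≤ β. Note that the weights on the slide produce 2x_1 ≤ 6, so the sketch below halves them to obtain C^T λ = e_1 exactly. The function name certifies_consequence is an illustrative choice.

```python
from fractions import Fraction

def certifies_consequence(C, d, a, beta, lam):
    """True if lam >= 0, C^T lam = a and d^T lam <= beta,
    which guarantees a^T x <= beta everywhere on {x : Cx <= d}."""
    if any(l < 0 for l in lam):
        return False
    n = len(a)
    Ct_lam = [sum(l * row[j] for l, row in zip(lam, C)) for j in range(n)]
    dt_lam = sum(l * di for l, di in zip(lam, d))
    return Ct_lam == list(a) and dt_lam <= beta

# The six inequalities defining Y, in the order listed on the slide.
C = [[1, 1, 0], [-1, -1, 0], [0, 1, 1], [0, -1, -1], [1, 0, 1], [-1, 0, -1]]
d = [2] * 6
h = Fraction(1, 2)
lam_plus = [h, 0, 0, h, h, 0]    # half of the weights giving x_1 <= 3
lam_minus = [0, h, h, 0, 0, h]   # half of the weights giving -x_1 <= 3
print(certifies_consequence(C, d, [1, 0, 0], 3, lam_plus))
print(certifies_consequence(C, d, [-1, 0, 0], 3, lam_minus))
print(certifies_consequence(C, d, [1, 0, 0], 2, lam_plus))  # does not certify x_1 <= 2
```

The last call returns False, in agreement with the earlier observation that Y is not contained in the box |x_i| ≤ 2.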

3.22
♣ How to certify that a polyhedral set Y = {x ∈ Rn : Ax ≤ b} is bounded/unbounded?
• Y is bounded iff for properly chosen R it holds
Y ⊂ XR = {x : |xi | ≤ R, 1 ≤ i ≤ n}
To certify this means
— either to certify that Y is empty, the certificate being λ: λ ≥ 0, AT λ = 0, bT λ < 0,
— or to point out a real R and vectors λ^{i±} such that λ^{i±} ≥ 0, A^T λ^{i±} = ±e_i, b^T λ^{i±} ≤ R for
all i. Since R can be chosen arbitrarily large, the latter amounts to pointing out vectors λ^{i±}
such that λ^{i±} ≥ 0, A^T λ^{i±} = ±e_i, i = 1, ..., n.
• Y is unbounded iff Y is nonempty and the recessive cone Rec(Y) = {x : Ax ≤ 0} is
nontrivial. To certify that this is the case, it suffices to point out x̄ satisfying Ax̄ ≤ b and d̄
satisfying d̄ ≠ 0, Ad̄ ≤ 0.
Note: When Y = {x ∈ Rn : Ax ≤ b} is known to be nonempty, its
boundedness/unboundedness is independent of the particular value of b!

3.23
Examples: • To certify that the set
X = {x ∈ R3 : −2 ≤ x1 + x2 ≤ 2, −2 ≤ x2 + x3 ≤ 2,
−2 ≤ x3 + x1 ≤ 2}
is bounded, it suffices to certify that it belongs to the box {x ∈ R3 : |xi | ≤ 3, 1 ≤ i ≤ 3},
which was already done.
• To certify that the set
X = {x ∈ R4 : −2 ≤ x1 + x2 ≤ 2, −2 ≤ x2 + x3 ≤ 2,
−2 ≤ x3 + x4 ≤ 2, −2 ≤ x4 + x1 ≤ 2}
is unbounded, it suffices to note that the vector x̄ = [0; 0; 0; 0] belongs to X, and the
vector d̄ = [1; −1; 1; −1], when plugged into the inequalities defining X, makes the bodies of
the inequalities zero and thus is a recessive direction of X.
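
The unboundedness certificate (x̄, d̄) can be checked directly. The sketch below writes the two-sided constraints as Ax ≤ b and tests Ax̄ ≤ b, d̄ ≠ 0, Ad̄ ≤ 0; the helper names two_sided_ring and certifies_unbounded are illustrative.

```python
def two_sided_ring(n, bound):
    """Constraints -bound <= x_i + x_{i+1} <= bound (indices cyclic) as Ax <= b."""
    A, b = [], []
    for i in range(n):
        row = [0] * n
        row[i] += 1
        row[(i + 1) % n] += 1
        A.append(row)
        b.append(bound)
        A.append([-v for v in row])
        b.append(bound)
    return A, b

def certifies_unbounded(A, b, x_bar, d_bar):
    """True if x_bar is feasible and d_bar is a nonzero recessive direction."""
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))
    feasible = all(dot(a, x_bar) <= bi for a, bi in zip(A, b))
    recessive = any(v != 0 for v in d_bar) and all(dot(a, d_bar) <= 0 for a in A)
    return feasible and recessive

A4, b4 = two_sided_ring(4, 2)
print(certifies_unbounded(A4, b4, [0, 0, 0, 0], [1, -1, 1, -1]))
A3, b3 = two_sided_ring(3, 2)
print(certifies_unbounded(A3, b3, [0, 0, 0], [1, -1, 1]))
```

The alternating direction works in R^4 but not in R^3, where x_3 + x_1 fails to vanish on it; this is consistent with the three-dimensional set being bounded.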

3.24
Certificates in LO

♣ Consider LO program in the form
Opt = max_x {c^T x : Px ≤ p (ℓ), Qx ≥ q (g), Rx = r (e)}        (P)

♠ How to certify that (P ) is feasible/infeasible?


• To certify that (P ) is feasible, it suffices to point out a feasible solution x̄ to the program.
• To certify that (P) is infeasible, it suffices to point out aggregation weights λ_ℓ, λ_g, λ_e such
that
λ_ℓ ≥ 0, λ_g ≤ 0
P^T λ_ℓ + Q^T λ_g + R^T λ_e = 0
p^T λ_ℓ + q^T λ_g + r^T λ_e < 0

3.25
  
  P x ≤ p (`) 
T
Opt = max c x : Qx ≥ q (g) (P )
x   Rx = r (e) 

♠ How to certify that (P ) is bounded/unbounded?


• (P) is bounded iff either (P) is infeasible, or (P) is feasible and there exists a real a such that
the inequality c^T x ≤ a is a consequence of the system of constraints. Consequently, to certify
that (P) is bounded, we should
— either point out an infeasibility certificate λ_ℓ ≥ 0, λ_g ≤ 0, λ_e: P^T λ_ℓ + Q^T λ_g + R^T λ_e =
0, p^T λ_ℓ + q^T λ_g + r^T λ_e < 0 for (P),
— or point out a feasible solution x̄ and a, λ_ℓ, λ_g, λ_e such that
λ_ℓ ≥ 0, λ_g ≤ 0 & P^T λ_ℓ + Q^T λ_g + R^T λ_e = c
& p^T λ_ℓ + q^T λ_g + r^T λ_e ≤ a
which, since a can be arbitrary, amounts to
λ_ℓ ≥ 0, λ_g ≤ 0, P^T λ_ℓ + Q^T λ_g + R^T λ_e = c
Note: We can skip the necessity to certify that (P ) is feasible.

3.26

• (P ) is unbounded iff (P ) is feasible and there is a recessive direction d such that cT d > 0
⇒ to certify that (P ) is unbounded, we should point out a feasible solution x̄ to (P ) and a
vector d such that
P d ≤ 0, Qd ≥ 0, Rd = 0, cT d > 0.
Note: If (P ) is known to be feasible, its boundedness/unboundedness is independent of a
particular value of [p; q; r].

3.27

♠ How to certify that Opt ≥ a for a given a ∈ R?


A certificate is a feasible solution x̄ with $c^T\bar{x} \ge a$.
♠ How to certify that Opt ≤ a for a given a ∈ R?
Opt ≤ a iff the linear inequality cT x ≤ a is a consequence of the system of constraints. To
certify this, we should
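— either point out an infeasibility certificate $\lambda_\ell, \lambda_g, \lambda_e$:
\[
\lambda_\ell \ge 0,\ \lambda_g \le 0, \qquad P^T\lambda_\ell + Q^T\lambda_g + R^T\lambda_e = 0, \qquad p^T\lambda_\ell + q^T\lambda_g + r^T\lambda_e < 0
\]
for (P),
— or point out $\lambda_\ell, \lambda_g, \lambda_e$ such that
\[
\begin{array}{ll}
\lambda_\ell \ge 0,\ \lambda_g \le 0 & (a) \\
P^T\lambda_\ell + Q^T\lambda_g + R^T\lambda_e = c & (b) \\
p^T\lambda_\ell + q^T\lambda_g + r^T\lambda_e \le a & (c)
\end{array}
\]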
Note: Whenever λ` , λg , λe satisfy (a) and (b), the left hand side in (c) upper-bounds cT x
for every x feasible for (P )
⇒ If cT x ≤ a for every feasible x and cT x̄ = a for some feasible x̄, then ”≤” in (c) is in fact
”=”.

3.28

♣ How to certify that x̄ is an optimal solution to (P )?


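• x̄ is an optimal solution iff it is feasible and $\mathrm{Opt} \le c^T\bar{x}$. The latter amounts to the existence of $\lambda_\ell, \lambda_g, \lambda_e$ such that
\[
\begin{array}{ll}
\lambda_\ell \ge 0,\ \lambda_g \le 0 & (a) \\
P^T\lambda_\ell + Q^T\lambda_g + R^T\lambda_e = c & (b) \\
p^T\lambda_\ell + q^T\lambda_g + r^T\lambda_e = c^T\bar{x} & (c)
\end{array}
\]
Multiplying both sides in (b) by $\bar{x}^T$ and subtracting the resulting equality from (c) results in
\[
0 = \underbrace{\lambda_\ell^T}_{\ge 0}\underbrace{[p - P\bar{x}]}_{\ge 0} + \underbrace{\lambda_g^T}_{\le 0}\underbrace{[q - Q\bar{x}]}_{\le 0} + \lambda_e^T\underbrace{[r - R\bar{x}]}_{=0}
\]
which is possible iff $(\lambda_\ell)_i[p_i - (P\bar{x})_i] = 0$ for all $i$ and $(\lambda_g)_j[q_j - (Q\bar{x})_j] = 0$ for all $j$.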

3.29

♠ We have arrived at the Karush-Kuhn-Tucker Optimality Conditions in LO:

A feasible solution x̄ to (P) is optimal iff the constraints of (P) can be assigned vectors of Lagrange multipliers $\lambda_\ell, \lambda_g, \lambda_e$ in such a way that
• [signs of multipliers] Lagrange multipliers associated with ≤-constraints are nonnegative, and Lagrange multipliers associated with ≥-constraints are nonpositive,
• [complementary slackness] Lagrange multipliers associated with constraints that are non-active at x̄ are zero, and
• [KKT equation] One has
\[
P^T\lambda_\ell + Q^T\lambda_g + R^T\lambda_e = c.
\]

3.30
Example. To certify that the feasible solution x̄ = [1; ...; 1] ∈ Rⁿ to the LO program
\[
\max_x \left\{ x_1 + ... + x_n : \begin{array}{l} x_1 + x_2 \le 2,\ x_2 + x_3 \le 2,\ ...,\ x_n + x_1 \le 2 \\ x_1 + x_2 \ge -2,\ x_2 + x_3 \ge -2,\ ...,\ x_n + x_1 \ge -2 \end{array} \right\}
\]
is optimal, it suffices to assign the constraints the Lagrange multipliers $\lambda_\ell = [1/2; 1/2; ...; 1/2]$, $\lambda_g = [0; ...; 0]$ and to note that $\lambda_\ell \ge 0$, $\lambda_g \le 0$,
\[
P^T\lambda_\ell + Q^T\lambda_g = \begin{bmatrix} 1 & & & & 1 \\ 1 & 1 & & & \\ & 1 & 1 & & \\ & & \ddots & \ddots & \\ & & & 1 & 1 \end{bmatrix} \lambda_\ell = c := [1; ...; 1]
\]
and complementary slackness takes place.
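
The three KKT conditions of this example can be verified numerically for a concrete dimension. The following sketch assumes n = 5 (any n ≥ 3 works the same way) and uses exact rational arithmetic so that the equalities are checked without rounding.

```python
from fractions import Fraction

n = 5
# Row i of P encodes x_i + x_{i+1} <= 2 (cyclic); Q = P encodes x_i + x_{i+1} >= -2.
P = [[1 if j in (i, (i + 1) % n) else 0 for j in range(n)] for i in range(n)]
p = [2] * n
Q, q = P, [-2] * n
c = [1] * n

x_bar = [1] * n
lam_l = [Fraction(1, 2)] * n
lam_g = [0] * n

Px = [sum(a * x for a, x in zip(row, x_bar)) for row in P]
Qx = [sum(a * x for a, x in zip(row, x_bar)) for row in Q]

feasible = all(v <= pi for v, pi in zip(Px, p)) and all(v >= qi for v, qi in zip(Qx, q))
signs = all(l >= 0 for l in lam_l) and all(l <= 0 for l in lam_g)
# KKT equation: P^T lam_l + Q^T lam_g = c
kkt = [sum(P[i][j] * lam_l[i] + Q[i][j] * lam_g[i] for i in range(n)) for j in range(n)]
# Complementary slackness: multiplier times residual vanishes for every constraint.
slack = all(l * (pi - v) == 0 for l, pi, v in zip(lam_l, p, Px)) and \
        all(l * (qi - v) == 0 for l, qi, v in zip(lam_g, q, Qx))
print(feasible, signs, kkt == c, slack)
```

Every ≤-constraint is active at x̄ (1 + 1 = 2), so its multiplier may be positive; every ≥-constraint is inactive and carries a zero multiplier.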

3.31
♣ Application: Faces of polyhedral set revisited. Recall that a face of a nonempty polyhedral set
\[
X = \{x \in R^n : a_i^T x \le b_i,\ 1 \le i \le m\}
\]
is a nonempty set of the form
\[
X_I = \{x \in R^n : a_i^T x = b_i,\ i \in I,\ a_i^T x \le b_i,\ i \notin I\}
\]
This definition is not geometric.
Geometric characterization of faces:
(i) Let $c^T x$ be a linear function bounded from above on X. Then the set
\[
\mathrm{Argmax}_X\, c^T x := \{x \in X : c^T x = \max_{x' \in X} c^T x'\}
\]
is a face of X. In particular, if the maximizer of $c^T x$ over X exists and is unique, it is an extreme point of X.
(ii) Vice versa, every face of X admits a representation as $\mathrm{Argmax}_{x \in X}\, c^T x$ for properly chosen c. In particular, every vertex of X is the unique maximizer, over X, of some linear function.

3.32
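Proof, (i): Let $c^T x$ be bounded from above on X. Then the set $X_* = \mathrm{Argmax}_{x \in X}\, c^T x$ is nonempty. Let $x_* \in X_*$. By the KKT Optimality Conditions, there exists $\lambda \ge 0$ such that
\[
\sum_i \lambda_i a_i = c \quad \& \quad \lambda_i > 0 \Rightarrow a_i^T x_* = b_i.
\]
Let $I_* = \{i : \lambda_i > 0\}$, so that
\[
(a):\ \sum_{i \in I_*} \lambda_i a_i = c \quad \text{and} \quad (b):\ a_i^T x_* = b_i,\ i \in I_*.
\]
We claim that
\[
X_* = X_{I_*} := \{x \in X : a_i^T x = b_i,\ i \in I_*\}.
\]
Indeed,
— $x \in X_{I_*}$ ⇒
\[
\begin{array}{rcll}
c^T x &=& \left[\sum_{i \in I_*} \lambda_i a_i\right]^T x & \text{[by (a)]} \\
&=& \sum_{i \in I_*} \lambda_i a_i^T x & \\
&=& \sum_{i \in I_*} \lambda_i b_i & \text{[since } x \in X_{I_*}\text{]} \\
&=& \sum_{i \in I_*} \lambda_i a_i^T x_* = c^T x_* & \text{[by (b) and (a)]}
\end{array}
\]
⇒ $x \in X_* := \mathrm{Argmax}_{y \in X}\, c^T y$, and
— $x \in X_*$ ⇒ $c^T(x_* - x) = 0$
⇒ $\sum_{i \in I_*} \lambda_i (a_i^T x_* - a_i^T x) = 0$ [by (a)]
⇒ $\sum_{i \in I_*} \lambda_i (b_i - a_i^T x) = 0$ [by (b)], where every $\lambda_i > 0$ and every $a_i^T x \le b_i$ due to $x \in X$
⇒ $a_i^T x = b_i$ ∀ $i \in I_*$ ⇒ $x \in X_{I_*}$.
Proof, (ii): Let $X_I = \{x \in X : a_i^T x = b_i,\ i \in I\}$ be a face of X, and let us set $c = \sum_{i \in I} a_i$. Same as above, it is immediately seen that $X_I = \mathrm{Argmax}_{x \in X}\, c^T x$.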

3.33
LO Duality

♣ Consider an LO program
\[
\mathrm{Opt}(P) = \max_x \left\{ c^T x : Ax \le b \right\} \qquad (P)
\]
The dual problem stems from the desire to bound from above the optimal value of the primal problem (P). To this end, we use our aggregation technique, specifically,
• assign the constraints $a_i^T x \le b_i$ nonnegative aggregation weights $\lambda_i$ (“Lagrange multipliers”) and sum them up with these weights, thus getting the inequality
\[
[A^T\lambda]^T x \le b^T\lambda \qquad (!)
\]
Note: by construction, this inequality is a consequence of the system of constraints in (P) and thus is satisfied at every feasible solution to (P).
• We may be lucky to get in the left hand side of (!) exactly the objective $c^T x$:
\[
A^T\lambda = c.
\]
In this case, (!) says that $b^T\lambda$ is an upper bound on $c^T x$ everywhere in the feasible domain of (P), and thus $b^T\lambda \ge \mathrm{Opt}(P)$.

3.34

♠ We arrive at the problem of finding the best (the smallest) upper bound on Opt(P) achievable with our bounding scheme. This new problem is
\[
\mathrm{Opt}(D) = \min_\lambda \left\{ b^T\lambda : A^T\lambda = c,\ \lambda \ge 0 \right\}. \qquad (D)
\]
It is called the problem dual to (P ).
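
The bounding scheme is easy to try on a small instance. The data below (three constraints, two variables) are made up for illustration; the sketch only checks that every dual feasible λ gives an upper bound bᵀλ on the value of every primal feasible x.

```python
A = [[1, 0], [0, 1], [1, 1]]
b = [1.0, 1.0, 1.5]
c = [1, 1]

def dual_bound(lam):
    """Return b^T lam if lam is dual feasible (lam >= 0, A^T lam = c), else None."""
    if any(l < 0 for l in lam):
        return None
    At_lam = [sum(A[i][j] * lam[i] for i in range(len(A))) for j in range(len(c))]
    if At_lam != c:
        return None
    return sum(bi * li for bi, li in zip(b, lam))

def primal_value(x):
    """Return c^T x if A x <= b, else None."""
    if any(sum(a * v for a, v in zip(row, x)) > bi for row, bi in zip(A, b)):
        return None
    return sum(ci * xi for ci, xi in zip(c, x))

x = [1, 0.5]
for lam in ([1, 1, 0], [0, 0, 1]):
    print(primal_value(x), "<=", dual_bound(lam))
```

The second choice of weights gives a bound that coincides with the primal value, so for this toy instance both solutions are optimal.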
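♣ Note: Our “bounding principle” can be applied to every LO program, independently of its format. For example, as applied to the primal LO program in the form
\[
\mathrm{Opt}(P) = \max_x \left\{ c^T x : \begin{array}{ll} Px \le p & (\ell) \\ Qx \ge q & (g) \\ Rx = r & (e) \end{array} \right\} \qquad (P)
\]
it leads to the dual problem in the form of
\[
\mathrm{Opt}(D) = \min_{[\lambda_\ell;\lambda_g;\lambda_e]} \left\{ p^T\lambda_\ell + q^T\lambda_g + r^T\lambda_e : \begin{array}{l} \lambda_\ell \ge 0,\ \lambda_g \le 0 \\ P^T\lambda_\ell + Q^T\lambda_g + R^T\lambda_e = c \end{array} \right\} \qquad (D)
\]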

3.35
LO Duality Theorem: Consider a primal LO program (P) along with its dual program (D). Then
(i) [Primal-dual symmetry] The duality is symmetric: (D) is an LO program, and the program dual to (D) is (equivalent to) the primal problem (P).
(ii) [Weak duality] We always have Opt(D) ≥ Opt(P).
(iii) [Strong duality] The following 3 properties are equivalent to each other:
• one of the problems is feasible and bounded
• both problems are solvable
• both problems are feasible
and whenever these mutually equivalent properties hold, we have
Opt(P) = Opt(D).

3.36
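Proof of Primal-Dual Symmetry: We rewrite (D) in exactly the same form as (P), that is, as
\[
-\mathrm{Opt}(D) = \max_{[\lambda_\ell;\lambda_g;\lambda_e]} \left\{ -p^T\lambda_\ell - q^T\lambda_g - r^T\lambda_e : \begin{array}{l} \lambda_g \le 0,\ \lambda_\ell \ge 0 \\ P^T\lambda_\ell + Q^T\lambda_g + R^T\lambda_e = c \end{array} \right\}
\]
and apply the recipe for building the dual, resulting in
\[
\min_{[x_\ell;x_g;x_e]} \left\{ c^T x_e : \begin{array}{l} x_\ell \ge 0,\ x_g \le 0 \\ Px_e + x_g = -p \\ Qx_e + x_\ell = -q \\ Rx_e = -r \end{array} \right\}
\]
whence, setting $x_e = -x$ and eliminating $x_g$ and $x_\ell$, the problem dual to the dual becomes
\[
\min_x \left\{ -c^T x : Px \le p,\ Qx \ge q,\ Rx = r \right\}
\]
which is equivalent to (P).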

3.37

Proof of Weak Duality Opt(D) ≥ Opt(P ): by construction of the dual.

3.38
Proof of Strong Duality:
Main Lemma: Let one of the problems (P ), (D) be feasible and bounded. Then both
problems are solvable with equal optimal values.
Proof of Main Lemma: By Primal-Dual Symmetry, we can assume w.l.o.g. that the
feasible and bounded problem is (P ). By what we already know, (P ) is solvable. Let us
prove that (D) is solvable, and the optimal values are equal to each other.
• Observe that the linear inequality cT x ≤ Opt(P ) is a consequence of the (solvable!) system
of constraints of (P ). By Inhomogeneous Farkas Lemma
∃λ` ≥ 0, λg ≤ 0, λe :
P T λ` + QT λg + RT λe = c & pT λ` + q T λg + rT λe ≤ Opt(P ).
⇒ $\lambda = [\lambda_\ell; \lambda_g; \lambda_e]$ is feasible for (D) with the value of the dual objective ≤ Opt(P). By Weak Duality, this value should be ≥ Opt(P)
⇒ the dual objective at λ equals Opt(P)
⇒ λ is dual optimal and Opt(D) = Opt(P).

3.39
Main Lemma ⇒ Strong Duality:
• By Main Lemma, if one of the problems (P ), (D) is feasible and bounded, then both
problems are solvable with equal optimal values
• If both problems are solvable, then both are feasible
• If both problems are feasible, then both are bounded by Weak Duality, and thus one of
them (in fact, both of them) is feasible and bounded.

3.40
Immediate Consequences

♣ Optimality Conditions in LO: Let x and λ = [λ` ; λg ; λe ] be a pair of feasible solutions
to (P ) and (D). This pair is comprised of optimal solutions to the respective problems
• [zero duality gap] if and only if the duality gap, as evaluated at this pair, vanishes:
DualityGap(x, λ) := [pT λ` + q T λg + rT λe ] − cT x
= 0
• [complementary slackness] if and only if the products of all Lagrange multipliers $\lambda_i$ and the residuals in the corresponding primal constraints are zero:
\[
\forall i : [\lambda_\ell]_i [p - Px]_i = 0 \quad \& \quad \forall j : [\lambda_g]_j [q - Qx]_j = 0.
\]

3.41
Proof: We are in the situation when both problems are feasible and thus both are solvable with equal optimal values. Therefore
\[
\mathrm{DualityGap}(x, \lambda) = \Big[ [p^T\lambda_\ell + q^T\lambda_g + r^T\lambda_e] - \mathrm{Opt}(D) \Big] + \Big[ \mathrm{Opt}(P) - c^T x \Big]
\]
For a primal-dual pair of feasible solutions, both bracketed expressions on the right are nonnegative
⇒ Duality Gap, as evaluated at a primal-dual feasible pair, is nonnegative and can vanish iff both bracketed expressions vanish, that is, iff x is primal optimal and λ is dual optimal.
• Observe that
\[
\begin{array}{rcl}
\mathrm{DualityGap}(x, \lambda) &=& [p^T\lambda_\ell + q^T\lambda_g + r^T\lambda_e] - c^T x \\
&=& [p^T\lambda_\ell + q^T\lambda_g + r^T\lambda_e] - [P^T\lambda_\ell + Q^T\lambda_g + R^T\lambda_e]^T x \\
&=& \lambda_\ell^T[p - Px] + \lambda_g^T[q - Qx] + \lambda_e^T[r - Rx] \\
&=& \sum_i [\lambda_\ell]_i [p - Px]_i + \sum_j [\lambda_g]_j [q - Qx]_j
\end{array}
\]
(the term with $\lambda_e$ vanishes since $Rx = r$). All terms in the resulting sums are nonnegative
⇒ Duality Gap vanishes iff the complementary slackness holds true.
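
As an illustration of the identity between the duality gap and the sum of complementary slackness products, the sketch below uses a made-up instance with ≤-constraints only (so P = A, p = b, and there are no (g) or (e) blocks).

```python
A = [[1, 0], [0, 1], [1, 1]]
b = [1.0, 1.0, 1.5]
c = [1.0, 1.0]

def duality_gap(x, lam):
    """b^T lam - c^T x for a primal feasible x and a dual feasible lam."""
    return sum(bi * li for bi, li in zip(b, lam)) - sum(ci * xi for ci, xi in zip(c, x))

def slackness_products(x, lam):
    """The products lam_i * (b - A x)_i, one per constraint."""
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    return [li * (bi - axi) for li, bi, axi in zip(lam, b, Ax)]

x = [1.0, 0.5]                                   # primal feasible
for lam in ([1.0, 1.0, 0.0], [0.0, 0.0, 1.0]):   # both satisfy lam >= 0, A^T lam = c
    print(duality_gap(x, lam), sum(slackness_products(x, lam)))
```

For the first multiplier vector the gap is positive and one product is nonzero; for the second, all products vanish and so does the gap.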

3.42
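Geometry of a Primal-Dual Pair of LO Programs

\[
\mathrm{Opt}(P) = \max_x \left\{ c^T x : \begin{array}{ll} Px \le p & (\ell) \\ Rx = r & (e) \end{array} \right\} \qquad (P)
\]
\[
\mathrm{Opt}(D) = \min_{[\lambda_\ell;\lambda_e]} \left\{ p^T\lambda_\ell + r^T\lambda_e : \begin{array}{l} \lambda_\ell \ge 0 \\ P^T\lambda_\ell + R^T\lambda_e = c \end{array} \right\} \qquad (D)
\]
Standing Assumption: The systems of linear equations in (P), (D) are solvable:
\[
\exists \bar{x},\ \bar\lambda = [\bar\lambda_\ell; \bar\lambda_e] : \ R\bar{x} = r,\ \ P^T\bar\lambda_\ell + R^T\bar\lambda_e = -c
\]
♣ Observation: Whenever $Rx = r$, we have
\[
c^T x = -[P^T\bar\lambda_\ell + R^T\bar\lambda_e]^T x = -\bar\lambda_\ell^T[Px] - \bar\lambda_e^T[Rx] = \bar\lambda_\ell^T[p - Px] + \left[-\bar\lambda_\ell^T p - \bar\lambda_e^T r\right]
\]
⇒ (P) is equivalent to the problem
\[
\max_x \left\{ \bar\lambda_\ell^T[p - Px] : p - Px \ge 0,\ Rx = r \right\}.
\]
♠ Passing to the new variable (“primal slack”) $\xi = p - Px$, the problem becomes
\[
\max_\xi \left\{ \bar\lambda_\ell^T \xi : \xi \ge 0,\ \xi \in M_P = \bar\xi + L_P \right\}, \qquad L_P = \{\xi = Px : Rx = 0\}, \quad \bar\xi = p - P\bar{x}
\]
$M_P$: primal feasible affine plane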

3.43
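♣ Let us express (D) in terms of the dual slack $\lambda_\ell$. If $[\lambda_\ell; \lambda_e]$ satisfies the equality constraints in (D), then
\[
\begin{array}{rcl}
p^T\lambda_\ell + r^T\lambda_e &=& p^T\lambda_\ell + [R\bar{x}]^T\lambda_e = p^T\lambda_\ell + \bar{x}^T[R^T\lambda_e] \\
&=& p^T\lambda_\ell + \bar{x}^T[c - P^T\lambda_\ell] = [p - P\bar{x}]^T\lambda_\ell + \bar{x}^T c \\
&=& \bar\xi^T\lambda_\ell + \bar{x}^T c
\end{array}
\]
Besides this,
\[
\begin{array}{l}
P^T\lambda_\ell + R^T\lambda_e = c \\
\Leftrightarrow\ P^T\lambda_\ell + R^T\lambda_e = -P^T\bar\lambda_\ell - R^T\bar\lambda_e \\
\Leftrightarrow\ P^T[\lambda_\ell + \bar\lambda_\ell] + R^T[\lambda_e + \bar\lambda_e] = 0 \\
\Leftrightarrow\ \lambda_\ell \in M_D := L_D - \bar\lambda_\ell, \quad L_D = \{\mu_\ell : \exists \mu_e : P^T\mu_\ell + R^T\mu_e = 0\}.
\end{array}
\]
⇒ (D) is equivalent to the problem
\[
\min_{\lambda_\ell} \left\{ \bar\xi^T\lambda_\ell : \lambda_\ell \ge 0,\ \lambda_\ell \in M_D = L_D - \bar\lambda_\ell \right\}, \qquad L_D = \{\mu_\ell : \exists \mu_e : P^T\mu_\ell + R^T\mu_e = 0\}
\]
$M_D$: dual feasible affine plane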

3.44
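Bottom line: Problems (P), (D) are equivalent to problems
\[
\max_\xi \left\{ \bar\lambda_\ell^T \xi : \xi \ge 0,\ \xi \in M_P = L_P + \bar\xi \right\} \qquad (\mathcal{P})
\]
\[
\min_{\lambda_\ell} \left\{ \bar\xi^T \lambda_\ell : \lambda_\ell \ge 0,\ \lambda_\ell \in M_D = L_D - \bar\lambda_\ell \right\} \qquad (\mathcal{D})
\]
where
\[
L_P = \{\xi : \exists x : \xi = Px,\ Rx = 0\}, \qquad L_D = \{\lambda_\ell : \exists \lambda_e : P^T\lambda_\ell + R^T\lambda_e = 0\}
\]
Note: • Linear subspaces $L_P$ and $L_D$ are orthogonal complements of each other
• The minus primal objective $-\bar\lambda_\ell$ belongs to the dual feasible plane $M_D$, and the dual objective $\bar\xi$ belongs to the primal feasible plane $M_P$. Moreover, when $\bar\lambda_\ell$, $\bar\xi$ are replaced with any other pair of points from $-M_D$ and $M_P$, the problems remain essentially intact: on the respective feasible sets, the objectives get constant shifts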

3.45

• A primal-dual feasible pair $(x, [\lambda_\ell; \lambda_e])$ of solutions to (P), (D) induces a pair of feasible solutions $(\xi = p - Px,\ \lambda_\ell)$ to $(\mathcal{P})$, $(\mathcal{D})$, and
\[
\mathrm{DualityGap}(x, [\lambda_\ell; \lambda_e]) = \lambda_\ell^T \xi.
\]
Thus, to solve (P), (D) to optimality is the same as to pick a pair of mutually orthogonal feasible solutions to $(\mathcal{P})$, $(\mathcal{D})$.

3.46
♣ We arrive at a wonderful perfectly symmetric and transparent geometric picture:

Geometrically, a primal-dual pair of LO programs is given by a pair of affine planes $M_P$ and $M_D$ in certain $R^N$; these planes are shifts of linear subspaces $L_P$ and $L_D$ which are orthogonal complements of each other.
We intersect $M_P$ and $M_D$ with the nonnegative orthant $R^N_+$, and our goal is to find in these intersections two vectors orthogonal to each other.
♠ Duality Theorem says that this task is feasible if and only if both MP and MD intersect
the nonnegative orthant.

3.47
[Figure] Geometry of primal-dual pair of LO programs:


Blue area: feasible set of (P) — intersection of the 2D primal feasible plane MP with the
nonnegative orthant R3+ .
Red segment: feasible set of (D) — intersection of the 1D dual feasible plane MD with the
nonnegative orthant R3+ .
Blue dot: primal optimal solution ξ ∗ .
Red dot: dual optimal solution λ∗` .
Pay attention to orthogonality of the primal solution (which is on the z-axis) and the dual
solution (which is in the xy-plane).

3.48
The Cost Function of an LO program, I

♣ Consider an LO program
\[
\mathrm{Opt}(b) = \max_x \left\{ c^T x : Ax \le b \right\}. \qquad (P[b])
\]

Note: we treat the data A, c as fixed, and b as varying, and are interested in the properties
of the optimal value Opt(b) as a function of b.
♠ Fact: When b is such that (P [b]) is feasible, the property of problem to be/not to be
bounded is independent of the value of b.
Indeed, a feasible problem (P [b]) is unbounded iff there exists d: Ad ≤ 0, cT d > 0, and this
fact is independent of the particular value of b.
Standing Assumption: There exists b such that (P[b]) is feasible and bounded
⇒ (P[b]) is bounded whenever it is feasible.
Theorem: Under the Standing Assumption, −Opt(b) is a polyhedrally representable function with the polyhedral representation
\[
\{[b; \tau] : -\mathrm{Opt}(b) \le \tau\} = \{[b; \tau] : \tau + \mathrm{Opt}(b) \ge 0\} = \{[b; \tau] : \exists x : Ax \le b,\ \tau + c^T x \ge 0\}.
\]
The function Opt(b) is monotone in b:
\[
b' \le b'' \Rightarrow \mathrm{Opt}(b') \le \mathrm{Opt}(b'').
\]

3.49
♠ Additional information can be obtained from Duality. The problem dual to (P[b]) is
\[
\min_\lambda \left\{ b^T\lambda : \lambda \ge 0,\ A^T\lambda = c \right\}. \qquad (D[b])
\]
By LO Duality Theorem, under our Standing Assumption (D[b]) is feasible for every b for which (P[b]) is feasible, and
\[
\mathrm{Opt}(b) = \min_\lambda \left\{ b^T\lambda : \lambda \ge 0,\ A^T\lambda = c \right\}. \qquad (*)
\]

Observation: Let b̄ be such that Opt(b̄) > −∞, so that (D[b̄]) is solvable, and let λ̄ be an
optimal solution to (D[b̄]). Then λ̄ is a supergradient of Opt(b) at b = b̄, meaning that
∀b : Opt(b) ≤ Opt(b̄) + λ̄T [b − b̄]. (!)
Indeed, by (∗) we have Opt(b̄) = λ̄T b̄ and Opt(b) ≤ λ̄T b, that is, Opt(b) ≤ λ̄T b̄ + λ̄T [b − b̄] =
Opt(b̄) + λ̄T [b − b̄]. 
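
The supergradient inequality (!) can be checked on a toy instance where Opt(b) has a closed form. The instance below is an assumption made for illustration: for max{x : x ≤ b₁, x ≤ b₂} one has Opt(b) = min(b₁, b₂), and the dual feasible set is {λ ≥ 0 : λ₁ + λ₂ = 1}.

```python
def opt(b):
    # max{x : x <= b1, x <= b2} is attained at x = min(b1, b2)
    return min(b)

b_bar = (1.0, 3.0)
lam_bar = (1.0, 0.0)   # minimizes b_bar^T lam over lam >= 0, lam1 + lam2 = 1

# Check Opt(b) <= Opt(b_bar) + lam_bar^T (b - b_bar) on a grid of right hand sides.
grid = [(-2 + 0.5 * i, -2 + 0.5 * j) for i in range(13) for j in range(13)]
holds = all(
    opt(b) <= opt(b_bar) + sum(l * (bi - bbi) for l, bi, bbi in zip(lam_bar, b, b_bar)) + 1e-12
    for b in grid
)
print(holds)
```

Here the right hand side of (!) reduces to b₁, and min(b₁, b₂) ≤ b₁ holds for every b, as the inequality predicts.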

3.50
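Opt(b) is a polyhedrally representable and thus piecewise-linear function with the full-dimensional domain:
\[
\mathrm{Dom}\, \mathrm{Opt}(\cdot) = \{b : \exists x : Ax \le b\}.
\]
Representing the feasible set $\Lambda = \{\lambda : \lambda \ge 0,\ A^T\lambda = c\}$ of (D[b]) as
\[
\Lambda = \mathrm{Conv}(\{\lambda_1, ..., \lambda_N\}) + \mathrm{Cone}(\{r_1, ..., r_M\})
\]
we get
\[
\mathrm{Dom}\, \mathrm{Opt}(\cdot) = \{b : b^T r_j \ge 0,\ 1 \le j \le M\}, \qquad b \in \mathrm{Dom}\, \mathrm{Opt}(\cdot) \Rightarrow \mathrm{Opt}(b) = \min_{1 \le i \le N} \lambda_i^T b
\]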

3.51
⇒ Under our Standing Assumption,
• Dom Opt(·) is a full-dimensional polyhedral cone,
• Assuming w.l.o.g. that $\lambda_i \ne \lambda_j$ when $i \ne j$, the finitely many hyperplanes $\{b : \lambda_i^T b = \lambda_j^T b\}$, $1 \le i < j \le N$, split this cone into finitely many cells, and in the interior of every cell Opt(b) is a linear function of b.
• By (!), when b is in the interior of a cell, the optimal solution λ(b) to (D[b]) is unique,
and λ(b) = ∇Opt(b).

3.52
Law of Diminishing Marginal Returns

♠ Consider a function of the form
\[
\mathrm{Opt}(\beta) = \max_x \left\{ c^T x : Px \le p,\ q^T x \le \beta \right\} \qquad (P_\beta)
\]
Interpretation: x is a production plan, $q^T x$ is the price of resources required by x, β is our investment in the resources, Opt(β) is the maximal return for an investment β. Assume that $(P_\beta)$ is feasible for some β.
♠ As above, for β such that $(P_\beta)$ is feasible, the problem is either always bounded, or is always unbounded. Assume that the first is the case. Then
• The domain Dom Opt(·) of Opt(·) is a nonempty ray $\underline{\beta} \le \beta < \infty$ with $\underline{\beta} \ge -\infty$, and
• Opt(β) is nondecreasing and concave.
Monotonicity and concavity imply that if
\[
\underline{\beta} \le \beta_1 < \beta_2 < \beta_3,
\]
then
\[
\frac{\mathrm{Opt}(\beta_2) - \mathrm{Opt}(\beta_1)}{\beta_2 - \beta_1} \ge \frac{\mathrm{Opt}(\beta_3) - \mathrm{Opt}(\beta_2)}{\beta_3 - \beta_2},
\]
that is, the reward for an extra $1 in the investment can only decrease (or remain the same) as the investment grows.
In Economics, this is called the law of diminishing marginal returns.
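
A small numerical illustration, under assumptions chosen for convenience: the constraints Px ≤ p are taken to be the box 0 ≤ x ≤ 1 and c, q are positive, so that (P_β) is a fractional knapsack problem and can be solved exactly by the greedy rule (no LP solver is needed). The data are made up.

```python
def opt(beta, c, q):
    """max c^T x s.t. 0 <= x <= 1, q^T x <= beta, for c, q > 0 and beta >= 0.

    Greedy by return-per-unit-cost c_j / q_j is exact for this special LP.
    """
    value, budget = 0.0, beta
    for cj, qj in sorted(zip(c, q), key=lambda t: t[0] / t[1], reverse=True):
        take = min(1.0, budget / qj)
        value += cj * take
        budget -= qj * take
    return value

c = [6.0, 5.0, 4.0]
q = [2.0, 2.0, 4.0]
betas = [0, 1, 2, 3, 4, 5, 6, 7, 8]
values = [opt(bt, c, q) for bt in betas]
gains = [v2 - v1 for v1, v2 in zip(values, values[1:])]   # return per extra unit of budget
print(values)
print(gains)
```

The list `gains` is nonincreasing: the most profitable activities are funded first, so each additional unit of investment buys at most as much return as the previous one.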

3.53
The Cost Function of an LO program, II

♣ Consider an LO program
\[
\mathrm{Opt}(c) = \max_x \left\{ c^T x : Ax \le b \right\}. \qquad (P[c])
\]

Note: we treat the data A, b as fixed, and c as varying, and are interested in the properties
of Opt(c) as a function of c.
Standing Assumption: (P [·]) is feasible (this fact is independent of the value of c).
Theorem: Under the Standing Assumption, Opt(c) is a polyhedrally representable function with the polyhedral representation
\[
\{[c; \tau] : \mathrm{Opt}(c) \le \tau\} = \{[c; \tau] : \exists \lambda : \lambda \ge 0,\ A^T\lambda = c,\ b^T\lambda \le \tau\}.
\]
Proof. Since (P [c]) is feasible, by LO Duality Theorem the program is solvable if and only
if the dual program
min{bT λ : λ ≥ 0, AT λ = c} (D[c])
λ
is feasible, and in this case the optimal values of the problems are equal
⇒ τ ≥ Opt(c) iff (D[c]) has a feasible solution with the value of the objective ≤ τ . 

3.54
Theorem Let c̄ be such that Opt(c̄) < ∞, and x̄ be an optimal solution to (P [c̄]). Then x̄
is a subgradient of Opt(·) at the point c̄:
∀c : Opt(c) ≥ Opt(c̄) + x̄T [c − c̄]. (!)
Proof: We have $\mathrm{Opt}(c) \ge c^T\bar{x} = \bar{c}^T\bar{x} + [c - \bar{c}]^T\bar{x} = \mathrm{Opt}(\bar{c}) + \bar{x}^T[c - \bar{c}]$.

♠ Representing
\[
\{x : Ax \le b\} = \mathrm{Conv}(\{v_1, ..., v_N\}) + \mathrm{Cone}(\{r_1, ..., r_M\}),
\]
we see that
• $\mathrm{Dom}\, \mathrm{Opt}(\cdot) = \{c : r_j^T c \le 0,\ 1 \le j \le M\}$ is a polyhedral cone, and
• $c \in \mathrm{Dom}\, \mathrm{Opt}(\cdot) \Rightarrow \mathrm{Opt}(c) = \max_{1 \le i \le N} v_i^T c$.
In particular, if Dom Opt(·) is full-dimensional and the $v_i$ are distinct from each other, then everywhere in Dom Opt(·) outside finitely many hyperplanes $\{c : v_i^T c = v_j^T c\}$, $1 \le i < j \le N$, the optimal solution $x = x(c)$ to (P[c]) is unique and $x(c) = \nabla \mathrm{Opt}(c)$.

3.55
♣ Let $X = \{x \in R^n : Ax \le b\}$ be a nonempty polyhedral set. The function
\[
\mathrm{Opt}(c) = \max_{x \in X} c^T x : R^n \to R \cup \{+\infty\}
\]
has a name: it is called the support function of X. Along with already investigated properties of the support function, an important one is as follows:
♠ The support function of a nonempty polyhedral set X “remembers” X: if
\[
\mathrm{Opt}(c) = \max_{x \in X} c^T x,
\]
then
\[
X = \{x \in R^n : c^T x \le \mathrm{Opt}(c)\ \forall c\}.
\]
Proof. Let $X^+ = \{x \in R^n : c^T x \le \mathrm{Opt}(c)\ \forall c\}$. We clearly have $X \subset X^+$. To prove the inverse inclusion, let $\bar{x} \in X^+$; we want to prove that $\bar{x} \in X$. To this end let us represent $X = \{x \in R^n : a_i^T x \le b_i,\ 1 \le i \le m\}$. For every i, we have
\[
a_i^T \bar{x} \le \mathrm{Opt}(a_i) \le b_i,
\]
and thus $\bar{x} \in X$.
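
The proof uses only the directions c = a_i, which suggests a simple membership test. The sketch below assumes X is the square [−1, 1]², given both by its vertices and by its four facet inequalities; the names `support` and `in_X_via_support` are illustrative.

```python
from itertools import product

vertices = list(product([-1.0, 1.0], repeat=2))                   # X = Conv(vertices)
facets = [((1, 0), 1), ((-1, 0), 1), ((0, 1), 1), ((0, -1), 1)]   # a_i^T x <= b_i

def support(c):
    """Opt(c) = max of c^T x over X; for a polytope the max is attained at a vertex."""
    return max(c[0] * v[0] + c[1] * v[1] for v in vertices)

def in_X_via_support(x):
    # As in the proof, testing c = a_i suffices: a_i^T x <= Opt(a_i) <= b_i.
    return all(a[0] * x[0] + a[1] * x[1] <= support(a) for a, _ in facets)

print(in_X_via_support((0.5, -1.0)), in_X_via_support((1.2, 0.0)))
```

For this set Opt(a_i) = b_i for every facet, so the support-function test coincides with the original description of X.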

3.56
Applications of Duality in Robust LO

♣ Data uncertainty: Sources


Typically, the data of real world LOs
\[
\max_x \left\{ c^T x : Ax \le b \right\} \qquad [A = [a_{ij}] : m \times n] \qquad (LO)
\]
is not known exactly when the problem is being solved. The most common reasons for data uncertainty are:
• Some of data entries (future demands, returns, etc.) do not exist when the problem
is solved and hence are replaced with their forecasts. These data entries are subject to
prediction errors
• Some of the data (parameters of technological devices/processes, contents associated
with raw materials, etc.) cannot be measured exactly, and their true values drift around
the measured “nominal” values. These data are subject to measurement errors

3.57
• Some of the decision variables (intensities with which we intend to use various technologi-
cal processes, parameters of physical devices we are designing, etc.) cannot be implemented
exactly as computed. The resulting implementation errors are equivalent to appropriate ar-
tificial data uncertainties.

A typical implementation error can be modeled as $x_j \mapsto (1 + \xi_j)x_j + \eta_j$, and the effect of these errors on a linear constraint
\[
\sum_{j=1}^n a_{ij} x_j \le b_i
\]
is as if there were no implementation errors, but the data $a_{ij}$ got the multiplicative perturbations:
\[
a_{ij} \mapsto a_{ij}(1 + \xi_j),
\]
and the data $b_i$ got the perturbation
\[
b_i \mapsto b_i - \sum_j \eta_j a_{ij}.
\]
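
The equivalence between implementation errors and data perturbations is a one-line algebraic identity; the following sketch checks it on made-up numbers for a single constraint row.

```python
a = [2.0, -1.0, 0.5]          # row a_i of the constraint matrix (made-up numbers)
b = 3.0
x = [1.0, 2.0, 4.0]           # computed solution
xi = [0.01, -0.02, 0.005]     # multiplicative implementation errors
eta = [0.1, 0.0, -0.05]       # additive implementation errors

# Left hand side at the actually implemented solution, with the nominal data.
x_impl = [(1 + s) * v + e for s, v, e in zip(xi, x, eta)]
lhs_actual = sum(aj * v for aj, v in zip(a, x_impl))

# Left hand side at the computed solution, with the perturbed data.
a_pert = [aj * (1 + s) for aj, s in zip(a, xi)]
b_pert = b - sum(e * aj for e, aj in zip(eta, a))
lhs_pert = sum(aj * v for aj, v in zip(a_pert, x))

# a^T x_impl <= b holds exactly when a_pert^T x <= b_pert: the two slacks coincide.
same_slack = abs((lhs_actual - b) - (lhs_pert - b_pert)) < 1e-9
print(same_slack)
```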

3.58
Data uncertainty: Dangers.
In the traditional LO methodology, a small data uncertainty (say, 0.1% or less) is just ignored: the problem is solved as if the given (“nominal”) data were exact, and the resulting nominal optimal solution is what is recommended for use.
Rationale: we hope that small data uncertainties will not affect too badly the feasibility/optimality properties of the nominal solution when plugged into the “true” problem.
Fact: The above hope can be far too optimistic, and the nominal solution can be practically meaningless.

3.59
♣ Example: Antenna Design
♠ [Physics:] Directional density of energy transmitted by a monochromatic antenna placed
at the origin is proportional to |D(δ)|2 , where the antenna’s diagram D(δ) is a complex-
valued function of 3-D direction (unit 3-D vector) δ.
♠ [Physics:] For an antenna array (a complex antenna comprised of a number of antenna elements), the diagram is
\[
D(\delta) = \sum_j x_j D_j(\delta) \qquad (*)
\]
• Dj (·): diagrams of elements
• xj : complex weights – design parameters responsible for how the elements in the array
are invoked.
♠ Antenna Design problem: Given diagrams
$D_1(\cdot), ..., D_n(\cdot)$
and a target diagram $D_*(\cdot)$, find complex weights $x_j$ which make the synthesized diagram (∗) as close as possible to the target diagram $D_*(\cdot)$.
♥ When $D_j(\cdot)$, $D_*(\cdot)$ and the weights are real and the “closeness” is quantified by the maximal deviation along a finite grid Γ of directions, Antenna Design becomes the LO problem
\[
\min_{x \in R^n, \tau} \left\{ \tau : -\tau \le D_*(\delta) - \sum_j x_j D_j(\delta) \le \tau \ \ \forall \delta \in \Gamma \right\}.
\]

3.60
♠ Example: Consider a planar antenna array comprised of 10 elements (circle surrounded by 9 rings of equal areas) in the plane XY (“Earth’s surface”), and our goal is to send most of the energy “up,” along the 12° cone around the Z-axis.

3.61
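• Diagram of a ring $\{z = 0,\ a \le \sqrt{x^2 + y^2} \le b\}$:
\[
D_{a,b}(\theta) = \frac{1}{2} \int_a^b \int_0^{2\pi} \left[ r \cos\left( 2\pi r \lambda^{-1} \cos(\theta) \cos(\phi) \right) \right] d\phi\, dr,
\]
• θ: altitude angle • λ: wavelength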
[Figure: diagrams of the elements vs the altitude angle θ, λ = 50 cm; 10 elements, equal areas, outer radius 1 m]
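• Nominal design problem:
\[
\tau_* = \min_{x \in R^{10}, \tau} \left\{ \tau : -\tau \le D_*(\theta_i) - \sum_{j=1}^{10} x_j D_j(\theta_i) \le \tau,\ 1 \le i \le 240 \right\}, \qquad \theta_i = \frac{i\pi}{480}
\]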
[Figure: target (blue) and nominal optimal (magenta) diagrams, τ∗ = 0.0589]
3.62
But: The design variables are characteristics of physical devices and as such they cannot be implemented exactly as computed. What happens when there are implementation errors:
\[
x_j^{\mathrm{fact}} = (1 + \epsilon_j)\, x_j^{\mathrm{comp}}, \qquad \epsilon_j \sim \mathrm{Uniform}[-\rho, \rho]
\]
with small ρ?
[Figure: four panels, ρ = 0, ρ = 0.0001, ρ = 0.001, ρ = 0.01; horizontal axis from 0 to 90]
“Dream and reality,” nominal optimal design: samples of 100 actual diagrams (red) for different uncertainty levels. Blue: the target diagram
                            Dream       Reality
                            ρ = 0       ρ = 0.0001   ρ = 0.001   ρ = 0.01
                            value       mean         mean        mean
‖·‖∞-distance to target     0.059       5.671        56.84       506.5
energy concentration        85.1%       16.4%        16.5%       14.9%

Quality of nominal antenna design: dream and reality. Data over 100 samples of actuation errors per each uncertainty level.

♠ Conclusion: Nominal optimal design is completely meaningless...

3.63
NETLIB Case Study: Diagnosis

♣ NETLIB is a collection of about 100 not very large LPs, mostly of real-world origin. To
motivate the methodology of our “case study”, here is constraint # 372 of the NETLIB
problem PILOT4:
aT x ≡ −15.79081x826 − 8.598819x827 − 1.88789x828 − 1.362417x829
−1.526049x830 − 0.031883x849 − 28.725555x850 − 10.792065x851
−0.19004x852 − 2.757176x853 − 12.290832x854 + 717.562256x855
−0.057865x856 − 3.785417x857 − 78.30661x858 − 122.163055x859
−6.46609x860 − 0.48371x861 − 0.615264x862 − 1.353783x863
−84.644257x864 − 122.459045x865 − 43.15593x866 − 1.712592x870
−0.401597x871 + x880 − 0.946049x898 − 0.946049x916
≥ b ≡ 23.387405
The related nonzero coordinates in the optimal solution x∗ of the problem, as reported by
CPLEX, are:
x∗826 = 255.6112787181108 x∗827 = 6240.488912232100
x∗828 = 3624.613324098961 x∗829 = 18.20205065283259
x∗849 = 174397.0389573037 x∗870 = 14250.00176680900
x∗871 = 25910.00731692178 x∗880 = 104958.3199274139
This solution makes the constraint an equality within machine precision.
♣ Most of the coefficients in the constraint are “ugly reals” like -15.79081 or -84.644257.
We can be sure that these coefficients characterize technological devices/processes, and as
such hardly are known to high accuracy.
⇒ “ugly coefficients” can be assumed uncertain and coinciding with the “true” data within
accuracy of 3-4 digits.
The only exception is the coefficient 1 of x880 , which perhaps reflects the structure of
the problem and is exact.

3.64

♣ Assume that the uncertain entries of a are 0.1%-accurate approximations of unknown entries in the “true” data ã. How does data uncertainty affect the validity of the constraint as evaluated at the nominal solution x∗?
• The worst case, over all 0.1%-perturbations of uncertain data, violation of the constraint
is as large as 450% of the right hand side!
• With random and independent of each other 0.1% perturbations of the uncertain coefficients, the statistics of the “relative constraint violation”
\[
V = \frac{\max[b - \tilde{a}^T x^*,\ 0]}{b} \times 100\%
\]
also is disastrous:
Prob{V > 0}    Prob{V > 150%}    Mean(V)
0.50           0.18              125%
Relative violation of constraint # 372 in PILOT4
(1,000-element sample of 0.1% perturbations)
♣ We see that quite small (just 0.1%) perturbations of “obviously uncertain” data coeffi-
cients can make the “nominal” optimal solution x∗ heavily infeasible and thus – practically
meaningless.
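
The worst-case figure can be reproduced from the numbers printed above. The sketch below uses only the nonzero coordinates of x∗ and the corresponding coefficients of constraint # 372; treating the coefficient 1 of x880 as certain follows the discussion above, and the variable names are illustrative.

```python
# (coefficient a_j, nominal value x*_j) for the nonzero coordinates listed above.
uncertain = [
    (-15.79081, 255.6112787181108),    # x826
    (-8.598819, 6240.488912232100),    # x827
    (-1.88789, 3624.613324098961),     # x828
    (-1.362417, 18.20205065283259),    # x829
    (-0.031883, 174397.0389573037),    # x849
    (-1.712592, 14250.00176680900),    # x870
    (-0.401597, 25910.00731692178),    # x871
]
certain = [(1.0, 104958.3199274139)]   # x880, coefficient treated as exact
b = 23.387405
rho = 0.001                            # 0.1% relative perturbations

nominal = sum(a * x for a, x in uncertain + certain)
# Worst case for a ">=" constraint: every uncertain term a_j x_j drops by rho * |a_j x_j|.
worst_lhs = nominal - rho * sum(abs(a * x) for a, x in uncertain)
violation = max(b - worst_lhs, 0.0) / b
print(f"nominal a^T x* = {nominal:.4f}, worst-case violation = {100 * violation:.0f}% of b")
```

The individual terms a_j x∗_j are of order 10⁴ to 10⁵ and nearly cancel, which is why a 0.1% perturbation of the coefficients is enough to swamp the right hand side of about 23.4.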

3.65
♣ In Case Study, we choose a “perturbation level” ρ ∈ {1%, 0.1%, 0.01%}, and, for every
one of the NETLIB problems, measure the “reliability index” of the nominal solution at this
perturbation level:
• We compute the optimal solution x∗ of the program
• For every one of the inequality constraints
aT x ≤ b
— we split the left hand side coefficients aj into “certain” (rational fractions p/q with
|q| ≤ 100) and “uncertain” (all the rest). Let J be the set of all uncertain coefficients of
the constraint.
— we compute the reliability index of the constraint
\[
\frac{\max\left[ a^T x^* + \rho \sqrt{\sum_{j \in J} a_j^2 (x_j^*)^2} - b,\ 0 \right]}{\max[1, |b|]} \times 100\%
\]
Note: the reliability index is of the order of the typical violation (measured in percent of the right hand side) of the constraint, as evaluated at x∗, under independent random perturbations, of relative magnitude ρ, of the uncertain coefficients.
• We treat the nominal solution as unreliable, and the problem - as bad, the level of
perturbations being ρ, if the worst, over the inequality constraints, reliability index is worse
than 5%.
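
A minimal sketch of the reliability index and of the "bad problem" test described above. The function names, the argument layout (each constraint as a triple of coefficients, right hand side, and index set J of uncertain coefficients) and the sample numbers are assumptions made for illustration.

```python
from math import sqrt

def reliability_index(a, x_star, b, uncertain, rho):
    """Reliability index (in %) of the constraint a^T x <= b at x_star.

    `uncertain` is the set J of indices of the uncertain coefficients.
    """
    nominal = sum(aj * xj for aj, xj in zip(a, x_star))
    spread = sqrt(sum((a[j] * x_star[j]) ** 2 for j in uncertain))
    return max(nominal + rho * spread - b, 0.0) / max(1.0, abs(b)) * 100.0

def is_bad(constraints, x_star, rho, threshold=5.0):
    """`constraints` is a list of (a, b, J) triples; bad means worst index > threshold %."""
    return max(reliability_index(a, x_star, b, J, rho) for a, b, J in constraints) > threshold

# Made-up constraint that is active at x_star, with both coefficients uncertain.
idx = reliability_index([300.0, -400.0], [1.0, 1.0], -100.0, {0, 1}, 0.01)
print(idx)
```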

3.66
♣ The results of the Diagnosis phase of Case Study are as follows. From the 90 NETLIB
problems,
— in 27 problems the nominal solution turned out to be unreliable when ρ = 1%;
— 19 of these 27 problems were already bad at the ρ = 0.01%-level of uncertainty
— in 13 problems, 0.01% perturbations of the uncertain data can make the nominal solution
more than 50%-infeasible for some of the constraints.
Problem     Size a)         ρ = 0.01%              ρ = 0.1%
                            #bad b)   Index c)     #bad    Index
80BAU3B     2263 × 9799     37        84           177     842
25FV47      822 × 1571      14        16           28      162
ADLITTLE    57 × 97                                2       6
AFIRO       28 × 32                                1       5
CAPRI       272 × 353                              10      39
CYCLE       1904 × 2857     2         110          5       1,100
D2Q06C      2172 × 5167     107       1,150        134     11,500
FINNIS      498 × 614       12        10           63      104
GREENBEA    2393 × 5405     13        116          30      1,160
KB2         44 × 41         5         27           6       268
MAROS       847 × 1443      3         6            38      57
PEROLD      626 × 1376      6         34           26      339
PILOT       1442 × 3652     16        50           185     498
PILOT4      411 × 1000      42        210,000      63      2,100,000
PILOT87     2031 × 4883     86        130          433     1,300
PILOTJA     941 × 1988      4         46           20      463
PILOTNOV    976 × 2172      4         69           13      694
PILOTWE     723 × 2789      61        12,200       69      122,000
SCFXM1      331 × 457       1         95           3       946
SCFXM2      661 × 914       2         95           6       946
SCFXM3      991 × 1371      3         95           9       946
SHARE1B     118 × 225       1         257          1       2,570
a) # of linear constraints (excluding the box ones) plus 1 and # of variables
b) # of constraints with index > 5%
c) The worst, over the constraints, reliability index, in %

3.67
♣ Conclusions:
♦ In real-world applications of Linear Programming one cannot ignore the possibility that a
small uncertainty in the data (intrinsic for the majority of real-world LP programs) can make
the usual optimal solution of the problem completely meaningless from practical viewpoint.
Consequently,
♦ In applications of LP, there exists a real need of a technique capable of detecting cases
when data uncertainty can heavily affect the quality of the nominal solution, and in these
cases to generate a “reliable” solution, one which is immune against uncertainty.
Robust LO is aimed at meeting this need.

3.68
Robust LO: Paradigm

♣ In Robust LO, one considers an uncertain LO problem
\[
\mathcal{P} = \left\{ \max_x \left\{ c^T x : Ax \le b \right\} : (c, A, b) \in \mathcal{U} \right\},
\]
that is, a family of all usual LO instances of common sizes m (number of constraints) and n (number of variables) with the data (c, A, b) running through a given uncertainty set $\mathcal{U} \subset R^n_c \times R^{m \times n}_A \times R^m_b$.
♠ We consider the situation where
• The solution should be built before the “true” data reveals itself and thus cannot depend
on the true data. All we know when building the solution is the uncertainty set U to which
the true data belongs.
• The constraints are hard: we cannot tolerate their violation.

3.69
♠ In the outlined “decision environment,” the only meaningful candidate solutions x are
the robust feasible ones – those which remain feasible whatever be a realization of the data
from the uncertainty set:
x ∈ Rn is robust feasible for P ⇔ Ax ≤ b ∀(c, A, b) ∈ U
♥ We characterize the objective at a candidate solution x by the guaranteed value
t(x) = min{cT x : (c, A, b) ∈ U}
of the objective.

3.70
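$$\mathcal{P} = \left\{ \max_x \left\{ c^T x : Ax \le b \right\} : (c, A, b) \in \mathcal{U} \right\}$$
♥ Finally, we associate with the uncertain problem P its Robust Counterpart
$$\mathrm{ROpt}(\mathcal{P}) = \max_{t,x} \left\{ t : t \le c^T x,\ Ax \le b \ \ \forall (c, A, b) \in \mathcal{U} \right\} \qquad \text{(RC)}$$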

where one seeks the best robust feasible solution to P, that is, the one with the largest guaranteed value of the objective.
The optimal solution to the RC is treated as the best among the “immunized against uncertainty” solutions and is recommended for actual use.
Basic question: Unless the uncertainty set U is finite, the RC is not an LO program, since
it has infinitely many linear constraints. Can we convert (RC) into an explicit LO program?

3.71
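$$\mathcal{P} = \left\{ \max_x \left\{ c^T x : Ax \le b \right\} : (c, A, b) \in \mathcal{U} \right\}$$
$$\Rightarrow\ \max_{t,x} \left\{ t : t \le c^T x,\ Ax \le b \ \ \forall (c, A, b) \in \mathcal{U} \right\} \qquad \text{(RC)}$$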
Observation: The RC remains intact when the uncertainty set U is replaced with its convex
hull.
Theorem: The RC of an uncertain LO program with nonempty polyhedrally representable
uncertainty set is equivalent to an LO program. Given a polyhedral representation of U ,
the LO reformulation of the RC is easy to get.

3.72
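$$\mathcal{P} = \left\{ \max_x \left\{ c^T x : Ax \le b \right\} : (c, A, b) \in \mathcal{U} \right\}$$
$$\Rightarrow\ \max_{t,x} \left\{ t : t \le c^T x,\ Ax \le b \ \ \forall (c, A, b) \in \mathcal{U} \right\} \qquad \text{(RC)}$$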
Proof of Theorem. Let
U = {ζ = (c, A, b) ∈ RN : ∃w : P ζ + Qw ≤ r}
be a polyhedral representation of the uncertainty set. Setting y = [x; t], the constraints of
(RC) become
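$$q_i(\zeta) - p_i^T(\zeta)\, y \le 0 \quad \forall \zeta \in \mathcal{U}, \quad 0 \le i \le m \qquad (C_i)$$
with $p_i(\cdot)$, $q_i(\cdot)$ affine in $\zeta$. We have
$$q_i(\zeta) - p_i^T(\zeta)\, y \equiv \pi_i^T(y)\, \zeta - \theta_i(y),$$
with $\theta_i(y)$, $\pi_i(y)$ affine in $y$. Thus, the $i$-th constraint in (RC) reads
$$\max_{\zeta, w_i} \left\{ \pi_i^T(y)\zeta : P\zeta + Qw_i \le r \right\} = \max_{\zeta \in \mathcal{U}} \pi_i^T(y)\zeta \le \theta_i(y).$$
Since $\mathcal{U} \ne \emptyset$, by the LO Duality we have
$$\max_{\zeta, w_i} \left\{ \pi_i^T(y)\zeta : P\zeta + Qw_i \le r \right\} = \min_{\eta_i} \left\{ r^T \eta_i : \eta_i \ge 0,\ P^T \eta_i = \pi_i(y),\ Q^T \eta_i = 0 \right\}$$
⇒ $y$ satisfies $(C_i)$ if and only if there exists $\eta_i$ such that
$$\eta_i \ge 0, \quad P^T \eta_i = \pi_i(y), \quad Q^T \eta_i = 0, \quad r^T \eta_i \le \theta_i(y) \qquad (R_i)$$
⇒ (RC) is equivalent to the LO program of maximizing $e^T y \equiv t$ in variables $y, \eta_0, \eta_1, ..., \eta_m$ under the linear constraints $(R_i)$, $0 \le i \le m$.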

3.73
♠ Example: The Robust Counterpart of uncertain LO with interval uncertainty:
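$$\mathcal{U}_{\mathrm{obj}} = \left\{ c : |c_j - c_j^0| \le \delta c_j,\ j = 1, ..., n \right\}$$
$$\mathcal{U}_i = \left\{ (a_{i1}, ..., a_{in}, b_i) : |a_{ij} - a_{ij}^0| \le \delta a_{ij},\ |b_i - b_i^0| \le \delta b_i \right\}$$
is the program
$$\max_{x,t} \left\{ t : \ t \le \sum_j c_j^0 x_j - \sum_j \delta c_j |x_j|, \quad \sum_j a_{ij}^0 x_j + \sum_j \delta a_{ij} |x_j| \le b_i^0 - \delta b_i \ \ \forall i \right\}$$
which is equivalent to the LO program
$$\max_{x,y,t} \left\{ t : \ t \le \sum_j c_j^0 x_j - \sum_j \delta c_j y_j, \quad \sum_j a_{ij}^0 x_j + \sum_j \delta a_{ij} y_j \le b_i^0 - \delta b_i \ \ \forall i, \quad -y_j \le x_j \le y_j \ \ \forall j \right\}$$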
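The term Σ_j δa_ij |x_j| in the interval-uncertainty Robust Counterpart is the worst-case increase of the left-hand side over the box of coefficients. The following is a minimal sketch that checks this closed form against brute-force enumeration of the box vertices; the function names and the numerical data are illustrative assumptions, not part of the course material.

```python
from itertools import product

def worst_case_lhs(a0, delta, x):
    """Closed form: max of sum_j a_j*x_j over the box |a_j - a0_j| <= delta_j."""
    nominal = sum(a * xj for a, xj in zip(a0, x))
    penalty = sum(d * abs(xj) for d, xj in zip(delta, x))
    return nominal + penalty

def worst_case_by_vertices(a0, delta, x):
    """Brute force: a function linear in a attains its max over a box at a vertex."""
    best = float("-inf")
    for signs in product((-1, 1), repeat=len(a0)):
        a = [a0j + s * dj for a0j, s, dj in zip(a0, signs, delta)]
        best = max(best, sum(aj * xj for aj, xj in zip(a, x)))
    return best

# Hypothetical data for one constraint row and one candidate solution x.
a0 = [2.0, -1.0, 0.5]
delta = [0.1, 0.2, 0.05]
x = [1.0, -3.0, 4.0]

closed = worst_case_lhs(a0, delta, x)
brute = worst_case_by_vertices(a0, delta, x)
print(closed, brute)
```

The enumeration visits 2^n vertices, so it serves only as a sanity check; the closed form is what makes the Robust Counterpart an explicit LO program after the substitution |x_j| ≤ y_j.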
 

3.74
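How it works? Antenna Example
$$\min_{x,\tau} \left\{ \tau : -\tau \le D_*(\theta_\ell) - \sum_{j=1}^{10} x_j D_j(\theta_\ell) \le \tau, \ \ell = 1, ..., L \right\}$$
⇕
$$\min_{x,\tau} \left\{ \tau : Ax + \tau a + b \ge 0 \right\} \qquad \text{(LO)}$$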
x,τ

• The influence of “implementation errors”
$$x_j \mapsto (1 + \epsilon_j) x_j$$
with $|\epsilon_j| \le \rho \in [0, 1]$ is as if there were no implementation errors, but the part A of the constraint matrix was uncertain and known “up to multiplication by a diagonal matrix with diagonal entries from [1 − ρ, 1 + ρ]”:
$$\mathcal{U} = \left\{ A = A^{\mathrm{nom}}\, \mathrm{Diag}\{1 + \epsilon_1, ..., 1 + \epsilon_{10}\} : |\epsilon_j| \le \rho \right\} \qquad (\mathcal{U})$$
Note that as far as a particular constraint is concerned, the uncertainty is an interval one
with δAij = ρ|Aij |. The remaining coefficients (and the objective) are certain.
♣ To improve reliability of our design, we replace the uncertain LO program (LO), (U)
with its robust counterpart, which is nothing but an explicit LO program.

3.75
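How it Works: Antenna Design (continued)
$$\min_{\tau,x} \left\{ \tau : -\tau \le D_*(\theta_i) - \sum_{j=1}^{10} x_j D_j(\theta_i) \le \tau, \ 1 \le i \le I \right\}, \quad x_j \mapsto (1 + \epsilon_j) x_j, \ -\rho \le \epsilon_j \le \rho$$
$$\Rightarrow\ \min_{\tau,x} \left\{ \tau : \begin{array}{l} D_*(\theta_i) - \sum_j x_j D_j(\theta_i) - \rho \sum_j |x_j| |D_j(\theta_i)| \ge -\tau \\ D_*(\theta_i) - \sum_j x_j D_j(\theta_i) + \rho \sum_j |x_j| |D_j(\theta_i)| \le \tau \end{array}, \ 1 \le i \le I \right\}$$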

♠ Solving the Robust Counterpart at uncertainty level ρ = 0.01, we arrive at robust design.
The robust optimal value is 0.0815 (39% more than the nominal optimal value 0.0589).
[Figure: Robust optimal design: samples of 100 actual diagrams (red). Panels, left to right: ρ = 0.01, ρ = 0.05, ρ = 0.1.]
Reality                        ρ = 0.01         ρ = 0.1
‖·‖∞ distance to target        max = 0.081      max = 0.216
                               mean = 0.077     mean = 0.113
energy concentration           min = 70.3%      min = 52.2%
                               mean = 72.3%     mean = 70.8%
Robust optimal design, data over 100 samples of actuation errors.
• For nominal design with ρ = 0.001, the average ‖·‖∞-distance to target is 56.8, and average energy concentration is 16.5%.

3.76
♣ Why is the “nominal design” so unreliable?
• The basic diagrams Dj (·) are “nearly linearly dependent”. As a result, the nominal problem is “ill-posed”: it possesses a huge domain comprised of “nearly optimal” solutions.
Indeed, look at the optimal values in the nominal Antenna Design LO with added box constraints |xj | ≤ L on the variables:
L         1       10      10^2    10^3    10^4    10^5    10^6
Opt Val   0.0945  0.0800  0.0736  0.0696  0.0649  0.0607  0.0589

The “exactly optimal” solution to the nominal problem is very large, and therefore even
small relative implementation errors may completely destroy the design.
• In the robust counterpart, magnitudes of candidate solutions are penalized, and RC
implements a smart trade-off between the optimality and the magnitude (i.e., the stability)
of the solution.
j         1     2     3     4     5     6     7     8     9     10
x_j^nom   2e3   -1e4  6e4   -1e5  1e5   2e4   -1e5  1e6   -7e4  1e4
x_j^rob   -0.3  5.0   -3.4  -5.1  6.9   5.5   5.3   -7.5  -8.9  13

3.77
How it works? NETLIB Case Study

♣ When applying the RO methodology to the bad NETLIB problems, assuming interval un-
certainty of (relative) magnitude ρ ∈ {1%, 0.1%, 0.01%} in “ugly coefficients” of inequality
constraints (no uncertainty in equations!), it turns out that
• Reliable solutions do exist, except for 4 cases corresponding to the highest (ρ = 1%)
perturbation level.
• The “price of immunization” in terms of the objective value is surprisingly low: when
ρ ≤ 0.1%, it never exceeds 1% and it is less than 0.1% in 13 of 23 cases. Thus, passing to
the robust solutions, we gain a lot in the ability of the solution to withstand data uncer-
tainty, while losing nearly nothing in optimality.

3.78
Problem      Nominal optimal value    Objective at robust solution
                                      ρ = 0.1%             ρ = 1%
80BAU3B 987224.2 1009229 (2.2%)
25FV47 5501.846 5502.191 (0.0%) 5505.653 (0.1%)
ADLITTLE 225495.0 228061.3 (1.1%)
AFIRO -464.7531 -464.7500 (0.0%) -464.2613 (0.1%)
BNL2 1811.237 1811.237 (0.0%) 1811.338 (0.0%)
BRANDY 1518.511 1518.581 (0.0%)
CAPRI 1912.621 1912.738 (0.0%) 1913.958 (0.1%)
CYCLE 1913.958 1913.958 (0.0%) 1913.958 (0.0%)
D2Q06C 122784.2 122893.8 (0.1%) Infeasible
E226 -18.75193 -18.75173 (0.0%)
FFFFF800 555679.6 555715.2 (0.0%)
FINNIS 172791.1 173269.4 (0.3%) 178448.7 (3.3%)
GREENBEA -72555250 -72192920 (0.5%) -68869430 (5.1%)
KB2 -1749.900 -1749.638 (0.0%) -1746.613 (0.2%)
MAROS -58063.74 -58011.14 (0.1%) -57312.23 (1.3%)
NESM 14076040 14172030 (0.7%)
PEROLD -9380.755 -9362.653 (0.2%) Infeasible
PILOT -557.4875 -555.3021 (0.4%) Infeasible
PILOT4 -64195.51 -63584.16 (1.0%) -58113.67 (9.5%)
PILOT87 301.7109 302.2191 (0.2%) Infeasible
PILOTJA -6113.136 -6104.153 (0.2%) -5943.937 (2.8%)
PILOTNOV -4497.276 -4488.072 (0.2%) -4405.665 (2.0%)
PILOTWE -2720108 -2713356 (0.3%) -2651786 (2.5%)
SCFXM1 18416.76 18420.66 (0.0%) 18470.51 (0.3%)
SCFXM2 36660.26 36666.86 (0.0%) 36764.43 (0.3%)
SCFXM3 54901.25 54910.49 (0.0%) 55055.51 (0.3%)
SHARE1B -76589.32 -76589.32 (0.0%) -76589.29 (0.0%)
Objective values at nominal and robust solutions to bad NETLIB problems.
Percent in (·): excess of the robust optimal value over the nominal optimal value

3.79
Affinely Adjustable Robust Counterpart

♣ The rationale behind the Robust Optimization paradigm as applied to LO is based on


two assumptions:
A. Constraints of an uncertain LO program are a “must”: a meaningful solution should
satisfy all realizations of the constraints allowed by the uncertainty set.
B. All decision variables should be defined before the true data become known and thus
should be independent of the true data.
♣ In many cases, Assumption B is too conservative:
• In dynamical decision-making, only part of decision variables correspond to “here and
now” decisions, while the remaining variables represent “wait and see” decisions to be
made when part of the true data will be already revealed.
(!) “Wait and see” decision variables may – and should! – depend on the corresponding
part of the true data.
• Some of decision variables do not represent actual decisions at all; they are artificial
“analysis variables” introduced to convert the problem into the LO form.
(!) Analysis variables may – and should! – depend on the entire true data.

3.80
Example: Consider the problem of the best ‖·‖1-approximation
$$\min_{x,t} \left\{ t : \sum_i \Big| b_i - \sum_j a_{ij} x_j \Big| \le t \right\}. \qquad \text{(P)}$$
When the data are certain, this problem is equivalent to the LP program
$$\min_{x,y,t} \left\{ t : \sum_i y_i \le t, \ -y_i \le b_i - \sum_j a_{ij} x_j \le y_i \ \ \forall i \right\}. \qquad \text{(LP)}$$
With uncertain data, the Robust Counterpart of (P) becomes the semi-infinite problem
$$\min_{x,t} \left\{ t : \sum_i \Big| b_i - \sum_j a_{ij} x_j \Big| \le t \ \ \forall (b_i, a_{ij}) \in \mathcal{U} \right\},$$
or, which is the same, the problem
$$\min_{x,t} \left\{ t : \forall (b_i, a_{ij}) \in \mathcal{U} : \exists y : \ \sum_i y_i \le t, \ -y_i \le b_i - \sum_j a_{ij} x_j \le y_i \right\},$$
while the RC of (LP) is the much more conservative problem
$$\min_{x,t} \left\{ t : \exists y : \forall (b_i, a_{ij}) \in \mathcal{U} : \ \sum_i y_i \le t, \ -y_i \le b_i - \sum_j a_{ij} x_j \le y_i \right\}.$$
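The gap between the two quantifier orders can be seen on a toy instance. The sketch below fixes x and lists two hypothetical scenarios for the residual vector r_i = b_i − Σ_j a_ij x_j; the scenario data and the function names are assumptions chosen for illustration only.

```python
# Two hypothetical scenarios for the residuals (r_1, r_2) at a fixed x:
# the uncertainty moves a unit error from the first row to the second.
scenarios = [(1.0, 0.0), (0.0, 1.0)]

def value_rc_of_P(scens):
    """'for all data, exists y': y may adapt to the scenario, so the best t
    is the largest l1-norm of the residual vector over the scenarios."""
    return max(sum(abs(r) for r in s) for s in scens)

def value_rc_of_LP(scens):
    """'exists y, for all data': a single y must dominate |r_i| in every
    scenario, so y_i = max over scenarios of |r_i| and t = sum_i y_i."""
    m = len(scens[0])
    y = [max(abs(s[i]) for s in scens) for i in range(m)]
    return sum(y)

adaptive = value_rc_of_P(scenarios)
fixed = value_rc_of_LP(scenarios)
print(adaptive, fixed)
```

In this instance the analysis variables y cost a factor of two when they are forced to be independent of the data, which is the motivation for letting them adjust.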

3.81
Adjustable Robust Counterpart of an Uncertain LO

♣ Consider an uncertain LO. Assume w.l.o.g. that the data of LO are affinely parameterized
by a “perturbation vector” ζ running through a given perturbation set Z:
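$$\mathcal{LP} = \left\{ \max_x \left\{ c^T[\zeta] x : A[\zeta] x - b[\zeta] \le 0 \right\} : \zeta \in \mathcal{Z} \right\}$$
[$c_j[\zeta]$, $A_{ij}[\zeta]$, $b_i[\zeta]$ : affine in $\zeta$]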
♠ Assume that every decision variable may depend on a given “portion” of the true data.
Since the latter is affine in ζ, this assumption says that xj may depend on Pj ζ, where Pj
are given matrices.
• Pj = 0 ⇒ xj is non-adjustable: xj represents a “here and now” decision, independent of the true data;
• Pj ≠ 0 ⇒ xj is adjustable: xj represents a “wait and see” decision or an analysis variable which may adjust itself – fully or partially, depending on Pj – to the true data.

3.82
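$$\mathcal{LP} = \left\{ \max_x \left\{ c^T[\zeta] x : A[\zeta] x - b[\zeta] \le 0 \right\} : \zeta \in \mathcal{Z} \right\}$$
[$c_j[\zeta]$, $A_{ij}[\zeta]$, $b_i[\zeta]$ : affine in $\zeta$]
♣ Under the circumstances, a natural Robust Counterpart of LP is the problem

Find t and functions φj (·) such that the decision rules xj = φj (Pj ζ) make all the constraints feasible for all perturbations ζ ∈ Z, while maximizing the guaranteed value t of the objective:
$$\max_{t, \{\phi_j(\cdot)\}} \left\{ t : \begin{array}{l} \sum_j c_j[\zeta]\, \phi_j(P_j \zeta) \ge t \ \ \forall \zeta \in \mathcal{Z} \\ \sum_j \phi_j(P_j \zeta)\, A_j[\zeta] - b[\zeta] \le 0 \ \ \forall \zeta \in \mathcal{Z} \end{array} \right\} \qquad \text{(ARC)}$$
Here $A_j[\zeta]$ denotes the j-th column of $A[\zeta]$.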

3.83
♣ Bad news: The Adjustable Robust Counterpart
$$\max_{t, \{\phi_j(\cdot)\}} \left\{ t : \begin{array}{l} \sum_j c_j[\zeta]\, \phi_j(P_j \zeta) \ge t \ \ \forall \zeta \in \mathcal{Z} \\ \sum_j \phi_j(P_j \zeta)\, A_j[\zeta] - b[\zeta] \le 0 \ \ \forall \zeta \in \mathcal{Z} \end{array} \right\} \qquad \text{(ARC)}$$
of uncertain LP is an infinite-dimensional optimization program and as such typically is absolutely intractable: How could we efficiently represent general-type functions of many variables, let alone optimize with respect to these functions?
♠ Partial Remedy (???): Let us restrict the decision rules xj = φj (Pj ζ) to be easily representable – specifically, affine – functions:
$$\phi_j(P_j \zeta) \equiv \mu_j + \nu_j^T P_j \zeta.$$
With this dramatic simplification, (ARC) becomes a finite-dimensional (still semi-infinite) optimization problem in new non-adjustable variables $\mu_j$, $\nu_j$:
$$\max_{t, \{\mu_j, \nu_j\}} \left\{ t : \begin{array}{l} \sum_j c_j[\zeta] (\mu_j + \nu_j^T P_j \zeta) \ge t \ \ \forall \zeta \in \mathcal{Z} \\ \sum_j (\mu_j + \nu_j^T P_j \zeta)\, A_j[\zeta] - b[\zeta] \le 0 \ \ \forall \zeta \in \mathcal{Z} \end{array} \right\} \qquad \text{(AARC)}$$

3.84
♣ We have associated with uncertain LO
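$$\mathcal{LP} = \left\{ \max_x \left\{ c^T[\zeta] x : A[\zeta] x - b[\zeta] \le 0 \right\} : \zeta \in \mathcal{Z} \right\}$$
[$c_j[\zeta]$, $A_{ij}[\zeta]$, $b_i[\zeta]$ : affine in $\zeta$]
and the “information matrices” P1 , ..., Pn the Affinely Adjustable Robust Counterpart
$$\max_{t, \{\mu_j, \nu_j\}} \left\{ t : \begin{array}{l} \sum_j c_j[\zeta] (\mu_j + \nu_j^T P_j \zeta) \ge t \ \ \forall \zeta \in \mathcal{Z} \\ \sum_j (\mu_j + \nu_j^T P_j \zeta)\, A_j[\zeta] - b[\zeta] \le 0 \ \ \forall \zeta \in \mathcal{Z} \end{array} \right\} \qquad \text{(AARC)}$$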

♠ Relatively good news:


• AARC is by far more flexible than the usual (non-adjustable) RC of LP.
• As compared to ARC, AARC has a much better chance of being computationally tractable:
— In the case of simple recourse, where the coefficients of adjustable variables are certain,
AARC has the same tractability properties as RC:
If the perturbation set Z is given by polyhedral representation, (AARC) can be straightfor-
wardly converted into an explicit LO program.
— In the general case, (AARC) may be computationally intractable; however, under mild
assumptions on the perturbation set, (AARC) admits “tight” computationally tractable
approximation.

3.85
♣ Example: simple Inventory model. There is a single-product inventory system with
• a single warehouse which should at any time store at least Vmin and at most Vmax units of
the product;
• uncertain demands dt of periods t = 1, ..., T known to vary within given bounds:
dt ∈ [d∗t (1 − θ), d∗t (1 + θ)], t = 1, ..., T
• θ ∈ [0, 1]: uncertainty level
No backlogged demand is allowed!
• I factories from which the warehouse can be replenished:
— at the beginning of period t, you may order pt,i units of product from factory i. Your
orders should satisfy the constraints
$$0 \le p_{t,i} \le P_i(t) \qquad \text{[bounds on capacities per period]}$$
$$\sum_t p_{t,i} \le Q_i \qquad \text{[bounds on cumulative capacities]}$$

— an order is executed with no delay


— order pt,i costs you ci (t)pt,i .
• The goal: to minimize the total cost of the orders.

3.86
♠ With certain demand, the problem can be modeled as the LO program
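$$\min_{p_{t,i},\, i \le I,\, t \le T;\ v_t,\, 2 \le t \le T+1} \ \sum_{t,i} c_i(t)\, p_{t,i} \qquad \text{[total cost]}$$
s.t.
$$v_{t+1} = v_t + \sum_i p_{t,i} - d_t, \quad t = 1, ..., T \qquad \text{[state equations ($v_1$ is given)]}$$
$$V_{\min} \le v_t \le V_{\max}, \quad 2 \le t \le T+1 \qquad \text{[bounds on states]}$$
$$0 \le p_{t,i} \le P_i(t), \quad i \le I,\ t \le T \qquad \text{[bounds on orders]}$$
$$\sum_t p_{t,i} \le Q_i, \quad i \le I \qquad \text{[cumulative bounds on orders]}$$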
♠ With uncertain demand, it is natural to assume that the orders pt,i may depend on the
demands of the preceding periods 1, ..., t − 1. The analysis variables vt are allowed to depend
on the entire actual data. In fact, it suffices to allow for vt to depend on d1 , ..., dt−1 .
♠ Applying the AARC methodology, we make pt,i and vt affine functions of past demands:
$$p_{t,i} = \phi_{t,i}^0 + \sum_{1 \le \tau < t} \phi_{t,i}^\tau d_\tau$$
$$v_t = \psi_t^0 + \sum_{1 \le \tau < t} \psi_t^\tau d_\tau$$

• φ’s and ψ’s are our new decision variables...

3.87
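$$\min_{\{p_{t,i}, v_t\}} \ \sum_{t,i} c_i(t)\, p_{t,i} \quad \text{s.t.}$$
$$v_{t+1} = v_t + \sum_i p_{t,i} - d_t, \quad t = 1, ..., T$$
$$V_{\min} \le v_t \le V_{\max}, \quad 2 \le t \le T+1$$
$$0 \le p_{t,i} \le P_i(t), \quad i \le I,\ t \le T$$
$$\sum_t p_{t,i} \le Q_i, \quad i \le I$$
$$p_{t,i} = \phi_{t,i}^0 + \sum_{1 \le \tau < t} \phi_{t,i}^\tau d_\tau, \qquad v_t = \psi_t^0 + \sum_{1 \le \tau < t} \psi_t^\tau d_\tau$$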

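♠ The AARC is the semi-infinite LO in non-adjustable variables φ’s and ψ’s:
$$\min_{C, \{\phi_{t,i}^\tau, \psi_t^\tau\}} C \quad \text{s.t.}$$
$$\sum_{t,i} c_i(t) \Big[ \phi_{t,i}^0 + \sum_{1 \le \tau < t} \phi_{t,i}^\tau d_\tau \Big] \le C$$
$$\psi_{t+1}^0 + \sum_{\tau=1}^{t} \psi_{t+1}^\tau d_\tau = \psi_t^0 + \sum_{\tau=1}^{t-1} \psi_t^\tau d_\tau + \sum_i \Big[ \phi_{t,i}^0 + \sum_{\tau=1}^{t-1} \phi_{t,i}^\tau d_\tau \Big] - d_t$$
$$V_{\min} \le \psi_t^0 + \sum_{\tau=1}^{t-1} \psi_t^\tau d_\tau \le V_{\max}$$
$$0 \le \phi_{t,i}^0 + \sum_{\tau=1}^{t-1} \phi_{t,i}^\tau d_\tau \le P_i(t), \qquad \sum_t \Big[ \phi_{t,i}^0 + \sum_{\tau=1}^{t-1} \phi_{t,i}^\tau d_\tau \Big] \le Q_i$$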
• The constraints should be valid for all values of “free” indexes and all demand trajectories
d = {dt }Tt=1 from the “demand uncertainty box”
D = {d : d∗t (1 − θ) ≤ dt ≤ d∗t (1 + θ), 1 ≤ t ≤ T }.
♠ The AARC can be straightforwardly converted to a usual LP and easily solved.
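One way to see why the conversion is straightforward is sketched below for a generic inequality of the AARC, written as α_0 + Σ_τ α_τ d_τ ≤ 0 with coefficients α affine in the φ’s and ψ’s. The symbols α and β are introduced here for illustration only, and the derivation assumes nonnegative nominal demands.

```latex
% Worst case of an affine function of d over the box
%   D = { d : d*_t (1 - theta) <= d_t <= d*_t (1 + theta) },  with d*_t >= 0:
\max_{d \in \mathcal{D}} \Big[ \alpha_0 + \sum_{\tau} \alpha_\tau d_\tau \Big]
  = \alpha_0 + \sum_{\tau} \alpha_\tau d^*_\tau + \theta \sum_{\tau} d^*_\tau |\alpha_\tau| .

% Hence the semi-infinite constraint "alpha_0 + sum_tau alpha_tau d_tau <= 0 for all d in D"
% is equivalent to the finite system in the extra variables beta_tau:
-\beta_\tau \le \alpha_\tau \le \beta_\tau \quad \forall \tau, \qquad
\alpha_0 + \sum_{\tau} \alpha_\tau d^*_\tau + \theta \sum_{\tau} d^*_\tau \beta_\tau \le 0 .

% An equality that must hold for all d in D (the state equations) reduces,
% when theta > 0 and all d*_t > 0, to matching the coefficients of 1, d_1, ..., d_T.
```

Since α is affine in the decision variables, each resulting constraint is linear, which yields a usual LP.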

3.88
♣ In the numerical illustration to follow:
• the planning horizon is T = 24
• there are I = 3 factories with per period capacities Pi (t) = 567 and cumulative capacities
Qi = 13600
• the nominal demand d∗t is seasonal:
[Figure: plot of the nominal demand against the period t.]
$$d^*_t = 1000 \left( 1 + 0.5 \sin\left( \frac{\pi (t-1)}{12} \right) \right)$$

• the ordering costs also are seasonal:


[Figure: plot of the ordering costs against the period t.]
$$c_i(t) = c_i \left( 1 + 0.5 \sin\left( \frac{\pi (t-1)}{12} \right) \right), \quad c_1 = 1,\ c_2 = 1.5,\ c_3 = 2$$

• v1 = Vmin = 500, Vmax = 2000


• demand uncertainty θ = 20%
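The data of the illustration can be tabulated with a few lines of code. The sketch below assumes the seasonal formulas d*_t = 1000(1 + 0.5 sin(π(t−1)/12)) and c_i(t) = c_i(1 + 0.5 sin(π(t−1)/12)); the variable names are illustrative.

```python
import math

T, I = 24, 3                 # planning horizon and number of factories
theta = 0.20                 # demand uncertainty level
P_cap, Q_cap = 567, 13600    # per-period and cumulative capacity of each factory
base_cost = (1.0, 1.5, 2.0)  # c_1, c_2, c_3

def nominal_demand(t):
    """Seasonal nominal demand d*_t for period t = 1, ..., T."""
    return 1000.0 * (1.0 + 0.5 * math.sin(math.pi * (t - 1) / 12.0))

def ordering_cost(i, t):
    """Seasonal unit ordering cost c_i(t) for factory i = 1, ..., I."""
    return base_cost[i - 1] * (1.0 + 0.5 * math.sin(math.pi * (t - 1) / 12.0))

demand = [nominal_demand(t) for t in range(1, T + 1)]
total_nominal = sum(demand)          # total nominal demand over the horizon
peak_nominal = max(demand)           # largest nominal demand in a single period
per_period_supply = I * P_cap        # most that can be ordered in one period
cumulative_supply = I * Q_cap        # most that can be ordered over the horizon

print(round(total_nominal), round(peak_nominal), per_period_supply, cumulative_supply)
```

With these numbers the peak nominal demand exceeds what all factories can deliver in a single period, so the warehouse has to be filled ahead of the peak; this is where the timing of orders, and hence the adjustability of decisions, matters.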

3.89
♣ Results:
• Opt(AARC) = 35542.
Note: The non-adjustable RC is infeasible already at 5% uncertainty level!
• With demand perturbations uniformly distributed in the range ±20%, the average AARC management cost over 100 simulations is 35121.
Note: Over the same 100 simulations, the average “utopian” management cost (optimal for a priori known demand trajectories) is 33958, i.e., it is just 3.5% (!) less than the average AARC management cost.

3.90
♣ Comparison with Dynamic Programming. When applicable, DP is the technique for dynamical decision-making under uncertainty: in (worst-case-oriented) DP, one solves the Adjustable Robust Counterpart of uncertain LO, with no ad hoc simplifications like “let us restrict ourselves to affine decision rules.”
♠ Unfortunately, DP suffers from “curse of dimensionality” – with DP, the computational
effort blows up rapidly as the state dimension of the dynamical process grows. Usually state
dimension 4 is already “too big”.
Note: There is no “curse of dimensionality” in AARC!
However: Reducing the number of factories to 1, increasing the per period capacity of the
remaining factory to 1800 and making its cumulative capacity +∞, we reduce the state
dimension to 1 and make DP easily implementable. With this setup,
• the DP (that is, the “absolutely best”) optimal value is 31270
• the AARC optimal value is 31514 – just 0.8% worse!

3.91
INTERMEDIATE SUMMARY
I. WHAT IS LO

♣ An LO program is an optimization problem of the form

Optimize a linear objective cT x over x ∈ Rn satisfying a system (S) of finitely many


linear equality and (nonstrict) inequality constraints.

♠ There are several “universal” forms of an LO program, universality meaning that every
LO program can be straightforwardly converted to an equivalent problem in the desired
form.
Examples of universal forms are:
• Canonical form:
$$\mathrm{Opt} = \max_x \left\{ c^T x : Ax \le b \right\} \qquad [A : m \times n]$$
• Standard form:
$$\mathrm{Opt} = \max_x \left\{ c^T x : Ax = b,\ x \ge 0 \right\} \qquad [A : m \times n]$$
II. WHAT CAN BE REDUCED TO LO

♣ Every optimization problem $\max_{y \in Y} f(y)$ can be straightforwardly converted to an equivalent problem with linear objective, specifically, to the problem
$$\max_{x \in X} c^T x \qquad (P)$$
$$\left[ X = \{ x = [y; t] : y \in Y,\ t \le f(y) \}, \quad c^T x := t \right]$$
♠ The possibility to convert (P ) into an LO program depends on the geometry of the
feasible set X. When X is polyhedrally representable:
X = {x ∈ Rn : ∃w ∈ Rk : P x + Qw ≤ r}
for properly chosen P, Q, r, (P ) reduces to the LO program
$$\max_{x,w} \left\{ c^T x : Px + Qw \le r \right\}.$$
♣ By Fourier-Motzkin elimination scheme, every polyhedrally representable set X is in fact
polyhedral – it can be represented as the solution set of a system of linear inequalities
without slack variables w:
X = {x : ∃w : P x + Qw ≤ r}
⇔ ∃A, b : X = {x : Ax ≤ b}
This does not make the notion of a polyhedral representation “void” — a polyhedral set
with a “short” polyhedral representation involving slack variables can require astronomically
many inequalities in a slack-variable-free representation.
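A single Fourier-Motzkin step can be sketched as follows. The function name `fm_eliminate` and the row encoding are assumptions made for this illustration; the example eliminates the slack variable w from the polyhedral representation {x : ∃w : |x| ≤ w, w ≤ 1} of the segment [−1, 1].

```python
from fractions import Fraction

def fm_eliminate(rows, k):
    """Eliminate variable k from a system of nonstrict inequalities.

    Each row is (coeffs, rhs), meaning sum_j coeffs[j] * z[j] <= rhs.
    The returned rows involve the same variables, with zero coefficient at k,
    and describe the projection of the solution set along z[k].
    """
    pos = [r for r in rows if r[0][k] > 0]
    neg = [r for r in rows if r[0][k] < 0]
    out = [r for r in rows if r[0][k] == 0]
    for ap, bp in pos:
        for an, bn in neg:
            # Scale both rows so that z[k] has coefficients +1 and -1, then add.
            sp = Fraction(1) / ap[k]
            sn = Fraction(1) / (-an[k])
            coeffs = [sp * u + sn * v for u, v in zip(ap, an)]
            out.append((coeffs, sp * bp + sn * bn))
    return out

# Variables z = (x, w):  x - w <= 0,  -x - w <= 0,  w <= 1.
system = [([1, -1], 0), ([-1, -1], 0), ([0, 1], 1)]
projected = fm_eliminate(system, 1)
for coeffs, rhs in projected:
    print(coeffs, "<=", rhs)
```

Every pair of one "positive" and one "negative" row produces a new row, so repeated elimination can multiply the number of inequalities at each step; this is the source of the blow-up mentioned above.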
♠ Every polyhedral (≡ polyhedrally representable) set is convex (but not vice versa), and all
basic convexity-preserving operations with sets (taking finite intersections, affine images,
inverse affine images, arithmetic sums, direct products) as applied to polyhedral operands
yield polyhedral results.
• Moreover, for all basic convexity-preserving operations, a polyhedral representation of the
result is readily given by polyhedral representations of the operands.
♣ A “counterpart” of the notion of polyhedral (≡ polyhedrally representable) set is the
notion of a polyhedrally representable function. A function f : Rn → R ∪ {+∞} is called
polyhedrally representable, if its epigraph is a polyhedrally representable set:
Epi{f } := {[x; τ ] : τ ≥ f (x)}
= {[x; τ ] : ∃w : P x + τ p + Qw ≤ r}
♠ A polyhedrally representable function always is convex (but not vice versa);
♠ A function f is polyhedrally representable if and only if its domain is a polyhedral set,
and in the domain f is the maximum of finitely many affine functions:
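$$f(x) = \begin{cases} \max\limits_{1 \le \ell \le L} \left[ a_\ell^T x + b_\ell \right], & x \in \mathrm{Dom} f := \{ x : Cx \le d \} \\ +\infty, & \text{otherwise} \end{cases}$$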
♠ A level set {x : f (x) ≤ a} of a polyhedrally representable function is polyhedral.
♠ All basic convexity-preserving operations with functions (taking finite maxima, linear
combinations with nonnegative coefficients, affine substitution of argument) as applied to
polyhedrally representable operands yield polyhedrally representable results.
• For all basic convexity-preserving operations with functions, a polyhedral representation
of the result is readily given by polyhedral representations of the operands.
III. STRUCTURE OF A POLYHEDRAL SET
III.A: Extreme Points

♣ Extreme points: A point v ∈ X = {x ∈ Rn : Ax ≤ b} is called an extreme point of X, if v


is not a midpoint of a nontrivial segment belonging to X:
v ∈ Ext(X) ⇔ v ∈ X & v ± h ∈ X ⇒ h = 0.
♠ Facts: Let X = {x ∈ Rn : aTi x ≤ bi , 1 ≤ i ≤ m} be a nonempty polyhedral set.
• X has extreme points if and only if it does not contain lines;
• The set Ext(X) of the extreme points of X is finite;
• X is bounded iff X = Conv(Ext(X));
• A point v ∈ X is an extreme point of X if and only if among the inequalities aTi x ≤ bi
which are active at v: aTi v = bi , there are n = dim x inequalities with linearly independent
vectors of coefficients:
v ∈ Ext(X)
⇔ v ∈ X & Rank {ai : aTi v = bi } = n.
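The algebraic characterization can be turned into a direct test. The sketch below uses exact rational arithmetic; the helper names and the unit-square example are illustrative assumptions.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        piv = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def is_extreme_point(A, b, v):
    """v is extreme in {x : Ax <= b} iff v is feasible and the rows active at v have rank n."""
    vals = [sum(Fraction(a) * Fraction(x) for a, x in zip(row, v)) for row in A]
    if any(val > bi for val, bi in zip(vals, b)):
        return False
    active = [row for row, val, bi in zip(A, vals, b) if val == bi]
    return rank(active) == len(v)

# Unit square 0 <= x1 <= 1, 0 <= x2 <= 1 written as Ax <= b.
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b = [1, 1, 0, 0]
corner = is_extreme_point(A, b, [1, 1])
edge_point = is_extreme_point(A, b, [1, Fraction(1, 2)])
outside = is_extreme_point(A, b, [2, 0])
print(corner, edge_point, outside)
```

At the corner two linearly independent constraints are active, at the midpoint of an edge only one, so only the corner passes the rank test.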
III.B. Recessive Directions

♣ Let X = {x ∈ Rn : Ax ≤ b} be a nonempty polyhedral set. A vector d ∈ Rn is called a


recessive direction of X, if there exists x̄ ∈ X such that the ray x̄ + R+ · d := {x̄ + td : t ≥ 0} is
contained in X.
♠ Facts: • The set of all recessive directions of X forms a cone, called the recessive cone
Rec(X) of X.
• The recessive cone of X is the polyhedral set given by Rec(X) = {d : Ad ≤ 0}
• Adding to an x ∈ X a recessive direction, we get a point in X:
X + Rec(X) = X.
• The recessive cone of X is trivial (i.e., Rec(X) = {0}) if and only if X is bounded.
• X contains a line if and only if Rec(X) is not pointed, that is, Rec(X) contains a line,
or, equivalently, there exists d ≠ 0 such that ±d ∈ Rec(X), or, equivalently, the null space
of A is nontrivial: KerA = {d : Ad = 0} ≠ {0}.
• A line directed by a vector d belongs to X if and only if it crosses X and d ∈ KerA =
Rec(X) ∩ [−Rec(X)].
• One has
X = X + KerA.
• X can be represented as
X = X̂ + KerA
where X̂ is a nonempty polyhedral set which does not contain lines. One can take
X̂ = X ∩ [KerA]⊥.
III.C. Extreme Rays of a Cone

♣ Let K be a polyhedral cone, that is,


K = {x ∈ Rn : Ax ≤ 0}
= {x ∈ Rn : aTi x ≤ 0, 1 ≤ i ≤ m}
♠ A direction d ∈ K is called an extreme direction of K, if d ≠ 0 and in any representation
d = d1 + d2 with d1 , d2 ∈ K both d1 and d2 are nonnegative multiples of d.
• Given d ∈ K, the ray R = R+ · d ⊂ K is called the ray generated by d, and d is called a
generator of the ray. When d ≠ 0, all generators of the ray R = R+ · d are positive multiples
of d, and vice versa – every positive multiple of d is a generator of R.
• Rays generated by extreme directions of K are called extreme rays of K.
♠ Facts:
• A direction d ≠ 0 is an extreme direction of a pointed polyhedral cone K = {x ∈ Rn :
aTi x ≤ 0, 1 ≤ i ≤ m} if and only if d ∈ K and among the inequalities aTi x ≤ 0 defining K there
are n − 1 which are active at d (aTi d = 0) and have linearly independent vectors of coefficients:
d is an extreme direction of K
⇔ d ∈ K\{0} & Rank {ai : aTi d = 0} = n − 1
• K has extreme rays if and only if K is nontrivial (K ≠ {0}) and pointed (K ∩ [−K] = {0}),
and in this case
• the number of extreme rays of K is finite, and
• if r1 , ..., rM are generators of extreme rays of K, then
K = Cone ({r1 , ..., rM }).
III.D. Structure of a Polyhedral Set

♣ Theorem: Every nonempty polyhedral set X can be represented in the form


X = Conv({v1 , ..., vN }) + Cone ({r1 , ..., rM }) (!)
where {v1 , ..., vN } is a nonempty finite set, and {r1 , ..., rM } is a finite (possibly empty) set.
Vice versa, every set of the form (!) is a nonempty polyhedral set. In addition,
♠ In a representation (!), one always has
Cone ({r1 , ..., rM }) = Rec(X)
♠ Let X not contain lines. Then one can take in (!) {v1 , ..., vN } = Ext(X) and choose,
as r1 , ..., rM , the generators of the extreme rays of Rec(X). The resulting representation is
“minimal”: for every representation of X in the form of (!), it holds Ext(X) ⊂ {v1 , ..., vN }
and every extreme ray of Rec(X), if any, has a generator from the set {r1 , ..., rM }.
IV. FUNDAMENTAL PROPERTIES
OF LO PROGRAMS

♣ Consider a feasible LO program
$$\mathrm{Opt} = \max_x \left\{ c^T x : Ax \le b \right\} \qquad (P)$$

♠ Facts:
• (P ) is solvable (i.e., admits an optimal solution) if and only if (P ) is bounded (i.e., the
objective is bounded from above on the feasible set)
• (P ) is bounded if and only if cT d ≤ 0 for every d ∈ Rec(X), where X is the feasible set of (P )
• Let the feasible set of (P ) not contain lines. Then (P ) is bounded if and only if
cT d ≤ 0 for the generator of every extreme ray of Rec(X), and in this case there exists an
optimal solution which is an extreme point of the feasible set.
V. GTA AND DUALITY

♣ GTA: Consider a finite system (S) of nonstrict and strict linear inequalities and linear
equations in variables x ∈ Rn :
$$a_i^T x \ \Omega_i \ b_i, \quad 1 \le i \le m, \qquad \Omega_i \in \{ \le, <, \ge, >, = \} \qquad (S)$$
(S) has no solutions if and only if a legitimate weighted sum of the inequalities of the
system is a contradictory inequality, that is,

One can assign the constraints of the system with weights λi so that
• the weights of the ”<” and ”≤” inequalities are nonnegative, while the weights
of the ”>” and ”≥” inequalities are nonpositive,
• $\sum_{i=1}^m \lambda_i a_i = 0$, and
• either $\sum_{i=1}^m \lambda_i b_i < 0$,
or $\sum_{i=1}^m \lambda_i b_i = 0$ and at least one of the strict inequalities gets a nonzero weight.
♠ Particular case: Homogeneous Farkas Lemma. A homogeneous linear inequality
aT x ≤ 0 is a consequence of a system of homogeneous linear inequalities aTi x ≤ 0, 1 ≤ i ≤ m,
if and only if the inequality is a combination, with nonnegative weights, of the inequalities
from the system, or, which is the same, if and only if a ∈ Cone ({a1 , ..., am }).
♠ Particular case: Inhomogeneous Farkas Lemma. A linear inequality aT x ≤ b is a
consequence of a solvable system of inequalities aTi x ≤ bi , 1 ≤ i ≤ m, if and only if the
inequality is a combination, with nonnegative weights, of the inequalities from the system
and the trivial identically true inequality $0^T x \le 1$, or, which is the same, if and only if there
exist nonnegative $\lambda_i$, $1 \le i \le m$, such that $\sum_{i=1}^m \lambda_i a_i = a$ and $\sum_{i=1}^m \lambda_i b_i \le b$.
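An infeasibility certificate in the sense of the GTA is easy to verify mechanically. The sketch below covers only the special case of a system of nonstrict "≤" inequalities; the function name and the example system are assumptions made for illustration.

```python
def certifies_infeasibility(A, b, lam):
    """Check a GTA certificate for the system a_i^T x <= b_i, 1 <= i <= m.

    The weights lam certify that the system has no solutions when they are
    nonnegative, sum_i lam_i * a_i = 0, and sum_i lam_i * b_i < 0.
    """
    n = len(A[0])
    if any(l < 0 for l in lam):
        return False
    combo = [sum(l * row[j] for l, row in zip(lam, A)) for j in range(n)]
    rhs = sum(l * bi for l, bi in zip(lam, b))
    return all(c == 0 for c in combo) and rhs < 0

# x1 + x2 <= 1 together with x1 >= 1 and x2 >= 1 (written as -x1 <= -1, -x2 <= -1).
A = [[1, 1], [-1, 0], [0, -1]]
b = [1, -1, -1]

good = certifies_infeasibility(A, b, [1, 1, 1])   # sums to 0^T x <= -1
bad = certifies_infeasibility(A, b, [1, 1, 0])    # leaves a nonzero coefficient at x2
print(good, bad)
```

Summing the three inequalities with unit weights gives the contradictory inequality 0 ≤ −1, so the first set of weights is a valid certificate, while the second is not.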
LO DUALITY
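♣ Given an LO program in the form
$$\mathrm{Opt}(P) = \max_x \left\{ c^T x : \begin{array}{ll} Px \le p & (\ell) \\ Qx \ge q & (g) \\ Rx = r & (e) \end{array} \right\} \qquad (P)$$
we associate with it the dual problem
$$\mathrm{Opt}(D) = \min_{\lambda = [\lambda_\ell; \lambda_g; \lambda_e]} \left\{ p^T \lambda_\ell + q^T \lambda_g + r^T \lambda_e : \begin{array}{ll} \lambda_\ell \ge 0 & (\ell) \\ \lambda_g \le 0 & (g) \\ P^T \lambda_\ell + Q^T \lambda_g + R^T \lambda_e = c & (e) \end{array} \right\} \qquad (D)$$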
LO Duality Theorem:
(i) [symmetry] Duality is symmetric: the problem dual to (D) is (equivalent to) the primal
problem (P )
(ii) [weak duality] Opt(D) ≥ Opt(P )
(iii) [strong duality] The following 3 properties are equivalent to each other:
• one of the problems (P ), (D) is feasible and bounded
• both (P ) and (D) are solvable
• both (P ) and (D) are feasible
Whenever one of (and then – all) these properties takes place, we have
Opt(D) = Opt(P ).
  

♠ LO Optimality Conditions: Let $x$, $\lambda = [\lambda_\ell; \lambda_g; \lambda_e]$ be feasible solutions to the respective problems (P ), (D). Then x, λ are optimal solutions
to the respective problems
• if and only if the associated duality gap is zero:
$$\mathrm{DualityGap}(x, \lambda) := \left[ p^T \lambda_\ell + q^T \lambda_g + r^T \lambda_e \right] - c^T x = 0$$
• if and only if the complementary slackness condition holds: nonzero Lagrange multipliers
are associated only with the primal constraints which are active at x:
$$\forall i : [\lambda_\ell]_i \left[ p_i - (Px)_i \right] = 0, \qquad \forall j : [\lambda_g]_j \left[ q_j - (Qx)_j \right] = 0$$
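Both optimality tests can be checked numerically on a small instance. The sketch below uses a hypothetical problem with three "≤" constraints and the sign constraints x ≥ 0 written as Qx ≥ q; there are no equality constraints, and all data are chosen for illustration only.

```python
def matvec(M, v):
    return [sum(a * c for a, c in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def transpose_matvec(M, v):
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

# max 3*x1 + 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 9,  x1 <= 3,  x >= 0.
c = [3, 2]
P, p = [[1, 1], [1, 3], [1, 0]], [4, 9, 3]
Q, q = [[1, 0], [0, 1]], [0, 0]

x = [3, 1]               # candidate primal solution
lam_l = [2, 0, 1]        # multipliers for Px <= p, must be >= 0
lam_g = [0, 0]           # multipliers for Qx >= q, must be <= 0

Px, Qx = matvec(P, x), matvec(Q, x)
primal_feasible = all(l <= r for l, r in zip(Px, p)) and all(l >= r for l, r in zip(Qx, q))
dual_lhs = [u + v for u, v in zip(transpose_matvec(P, lam_l), transpose_matvec(Q, lam_g))]
dual_feasible = all(l >= 0 for l in lam_l) and all(l <= 0 for l in lam_g) and dual_lhs == c

gap = dot(p, lam_l) + dot(q, lam_g) - dot(c, x)
comp_slack = (all(l * (pi - v) == 0 for l, pi, v in zip(lam_l, p, Px))
              and all(l * (qj - v) == 0 for l, qj, v in zip(lam_g, q, Qx)))
print(primal_feasible, dual_feasible, gap, comp_slack)
```

The only nonzero multipliers sit on the first and third "≤" constraints, which are exactly the ones active at x, and the duality gap vanishes, so both x and λ are optimal for their problems.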
ALGORITHMS OF LINEAR OPTIMIZATION

♣ The existing algorithmic “workhorses” of LO fall into two major categories:
♠ Pivoting methods, primarily the Simplex-type algorithms which heavily exploit the
polyhedral structure of LO programs, in particular, move along the vertices of the feasible
set.
♠ Interior Point algorithms, primarily the Primal-Dual Path-Following Methods, much
less “polyhedrally oriented” than the pivoting algorithms and, in particular, traveling along
interior points of the feasible set of LO rather than along its vertices. In fact, IPM’s have
a much wider scope of applications than LO.

4.1
♠ Theoretically speaking (and modulo rounding errors), pivoting algorithms solve LO pro-
grams exactly in finitely many arithmetic operations. The operation count, however, can
be astronomically large already for small LO’s.
In contrast to the disastrously bad theoretical worst-case-oriented performance estimates,
Simplex-type algorithms seem to be extremely efficient in practice. From the 1940’s until the early 1990’s
these algorithms were, essentially, the only LO solution techniques.
♠ Interior Point algorithms, discovered in 1980’s, entered LO practice in 1990’s. These
methods combine high practical performance (quite competitive with the one of pivoting
algorithms) with nice theoretical worst-case-oriented efficiency guarantees.

4.2
The Primal and the Dual Simplex Algorithms

♣ Recall that a primal-dual pair of LO problems geometrically is as follows:


Given are:
• A pair of linear subspaces LP , LD in Rn which are orthogonal complements to each other:
LP ⊥ = LD
• Shifts of these linear subspaces – the primal feasible plane MP = LP − b and the dual
feasible plane MD = LD + c
The goal:
• We want to find a pair of orthogonal to each other vectors, one from the primal feasible
set MP ∩ Rn+ , and another one from the dual feasible set MD ∩ Rn+ .
This goal is achievable iff the primal and the dual feasible sets are nonempty, and in this
case the members of the desired pair can be chosen among extreme points of the respective
feasible sets.

4.3
♣ In the Primal Simplex method, the goal is achieved via generating a sequence x1 , x2 , ...
of vertices of the primal feasible set accompanied by a sequence c1 , c2 , ... of solutions to the
dual problem which belong to the dual feasible plane, satisfy the orthogonality requirement
[xt ]T ct = 0, but do not belong to Rn+ (and thus are not dual feasible).
The process lasts until
— either a feasible dual solution ct is generated, in which case we end up with a pair of
primal-dual optimal solutions (xt , ct ),
— or a certificate of primal unboundedness (and thus - dual infeasibility) is found.

4.4
♣ In the Dual Simplex method, the goal is achieved via generating a sequence c1 , c2 , ...
of vertices of the dual feasible set accompanied by a sequence x1 , x2 , ... of solutions to
the primal problem which belong to the primal feasible plane, satisfy the orthogonality
requirement [xt ]T ct = 0, but do not belong to Rn+ (and thus are not primal feasible).
The process lasts until
— either a feasible primal solution xt is generated, in which case we end up with a pair of
primal-dual optimal solutions (xt , ct ),
— or a certificate of dual unboundedness (and thus – primal infeasibility) is found.
♣ Both methods work with the primal LO program in the standard form. As a result, the
dual problem is not in the standard form, which makes the implementations of the algorithms
different from each other, in spite of the “geometrical symmetry” of the algorithms.

4.5
Primal Simplex Method

♣ PSM works with an LO program in the standard form
$$\mathrm{Opt}(P) = \max_x \left\{ c^T x : Ax = b,\ x \ge 0 \right\} \quad [A : m \times n] \qquad (P)$$
Geometrically, PSM moves from a vertex xt of the feasible set of (P ) to a neighboring
vertex xt+1 , improving the value of the objective, until either an optimal solution, or an
unboundedness certificate are built. This process is “guided” by the dual solutions ct .
[Figure: Geometry of PSM. The objective is the ordinate (“height”).]
Left: the method starts from vertex S and ascends to vertex F, where an improving ray is discovered, meaning that the problem is unbounded.
Right: the method starts from vertex S and ascends to the optimal vertex F.

4.6
PSM: Preliminaries
$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

♣ Standing assumption: The m rows of A are linearly independent.


Note: When the rows of A are linearly dependent, the system of linear equations in (P) is
either infeasible (this is so when Rank A < Rank [A, b]), or, after eliminating “redundant” linear
equations, the system Ax = b can be reduced to an equivalent system A′x = b′ with linearly
independent rows in A′. When possible, this transformation is an easy Linear Algebra task
⇒ the assumption Rank A = m is w.l.o.g.
♣ Bases and basic solutions. A set I of m distinct indexes from {1, ..., n}
is called a base (or basis) of A if the columns of A with indexes from I are linearly
independent, or, equivalently, if the m × m submatrix $A_I$ of A comprised of the columns
with indexes from I is nonsingular.

4.7

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

Observation I: Let I be a basis. Then the system Ax = b, $x_i = 0$, $i \notin I$, has a unique solution
$x^I = (A_I^{-1} b,\ 0_{n-m})$;
the m entries of $x^I$ with indexes from I form the vector $A_I^{-1} b$, and the remaining n − m
entries are zero. $x^I$ is called the basic solution associated with basis I.
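
As a hedged illustration of Observation I, the following NumPy sketch computes the basic solution for a given basis. The function name `basic_solution` and the 0-based indexing are assumptions of this sketch; the sample data are those of the tableau example worked out in these notes (three inequality constraints with slack variables x4, x5, x6).

```python
import numpy as np

def basic_solution(A, b, basis):
    """Basic solution x^I: solve A_I x_I = b, set all other entries to zero.

    `basis` holds 0-based column indexes (an assumption of this sketch).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    idx = list(basis)
    x = np.zeros(A.shape[1])
    # np.linalg.solve raises LinAlgError when A_I is singular, i.e. when
    # the chosen index set is not a basis.
    x[idx] = np.linalg.solve(A[:, idx], b)
    return x

A = [[1, 2, 2, 1, 0, 0],
     [2, 1, 2, 0, 1, 0],
     [2, 2, 1, 0, 0, 1]]
b = [20, 20, 20]

x_slack = basic_solution(A, b, [3, 4, 5])  # basis {4, 5, 6} in 1-based notation
x_other = basic_solution(A, b, [0, 1, 2])  # basis {1, 2, 3} in 1-based notation
```

Both basic solutions happen to be nonnegative here, so both are feasible for the example problem.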

4.8

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

Theorem: Extreme points of the feasible set X = {x : Ax = b, x ≥ 0} of (P) are exactly
the basic solutions which are feasible for (P).
Proof. A. Let $x^I$ be a basic solution which is feasible. The following n constraints specifying
X are active at $x^I$:
— m equality constraints Ax = b;
— n − m bounds $x_j = 0$, $j \notin I$.
Let us prove that the vectors of coefficients of these n constraints are linearly independent;
by the algebraic characterization of extreme points, this would imply that $x^I$ is an extreme point
of X.
Assuming the vectors of coefficients of the constraints to be linearly dependent, there exists
a nontrivial solution h to the homogeneous system of linear equations
$Ah = 0,\ h_j = 0,\ j \notin I,$
implying that the m × m basic submatrix $A_I$ of A corresponding to I is singular, which is
impossible.

4.9
B. Let x̄ be an extreme point of X, and let J be the set of indexes of positive entries in x̄.
We claim that the columns of A with indexes from J are linearly independent.
Indeed, otherwise we could find a nonzero vector h with entries $h_j = 0$ for $j \notin J$ such that
Ah = 0. For small positive t, the vectors x̄ + th and x̄ − th are nonnegative (the entries of x̄
with indexes in J are positive, and h vanishes outside of J), and we always have
A[x̄ + th] = A[x̄ − th] = b, so that x̄ is the midpoint of a nontrivial segment in X, which is
impossible.
• Since the columns of A with indexes in J are linearly independent and the linear span of
the system of all columns of A is Rm (the m rows of A are linearly independent!), we can
extend J to an m-element set I of indexes such that the columns of A with indexes from
I are linearly independent. Clearly, x̄ is the basic solution associated with the basis I, that
is, every extreme point of X is a basic feasible solution.

4.10

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

Note: The Theorem says that extreme points of X = {x : Ax = b, x ≥ 0} can be “parameterized”
by bases I of (P): every feasible basic solution $x^I$ is an extreme point, and vice versa. Note
that this parameterization in general is not a one-to-one correspondence: while there exists
at most one extreme point with entries vanishing outside of a given basis, and every extreme
point can be obtained from an appropriate basis, there could be several bases defining the same
extreme point. The latter takes place when the extreme point v in question is degenerate,
that is, has fewer than m positive entries. Whenever this is the case, all bases containing all the
indexes of the nonzero entries of v specify the same vertex v.

4.11

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

♣ The problem dual to (P) reads
$\mathrm{Opt}(D) = \min_{\lambda=[\lambda_e;\lambda_g]} \{ b^T \lambda_e : \lambda_g \le 0,\ \lambda_g = c - A^T \lambda_e \}$   (D)
Observation II: Given a basis I for (P), we can uniquely define a solution $\lambda^I = [\lambda_e^I; c^I]$
to (D) which satisfies the equality constraints in (D) and is such that $c^I$ vanishes on I.
Specifically, setting $I = \{i_1 < i_2 < ... < i_m\}$ and $c_I = [c_{i_1}; c_{i_2}; ...; c_{i_m}]$, one has
$\lambda_e^I = A_I^{-T} c_I, \qquad c^I = c - A^T A_I^{-T} c_I.$
The vector $c^I$ is called the vector of reduced costs associated with basis I.
Observation III: Let I be a basis of A. Then for every x satisfying the equality constraints
of (P) one has
$c^T x = [c^I]^T x + \mathrm{const}(I)$
Indeed, $[c^I]^T x = [c - A^T \lambda_e^I]^T x = c^T x - [\lambda_e^I]^T A x = c^T x - [\lambda_e^I]^T b$, so that $\mathrm{const}(I) = [\lambda_e^I]^T b$. □
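
A minimal NumPy sketch of Observations II and III, under the assumption of 0-based column indexes; the function name `reduced_costs` is hypothetical, and the data are those of the tableau example from these notes. It computes $\lambda_e^I$ and $c^I$ for the basis {1, 2, 3} and checks numerically that $c^T x - [c^I]^T x$ takes the same value at two different solutions of Ax = b.

```python
import numpy as np

def reduced_costs(A, c, basis):
    """Return (lambda_e^I, c^I) for the 0-based column indexes in `basis`."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    idx = list(basis)
    lam_e = np.linalg.solve(A[:, idx].T, c[idx])  # lambda_e^I = A_I^{-T} c_I
    return lam_e, c - A.T @ lam_e                 # c^I = c - A^T lambda_e^I

A = np.array([[1, 2, 2, 1, 0, 0],
              [2, 1, 2, 0, 1, 0],
              [2, 2, 1, 0, 0, 1]], dtype=float)
b = np.array([20.0, 20.0, 20.0])
c = np.array([10.0, 12.0, 12.0, 0.0, 0.0, 0.0])

lam_e, c_red = reduced_costs(A, c, [0, 1, 2])

# Two different solutions of Ax = b; by Observation III the difference
# c^T x - [c^I]^T x equals the same constant [lambda_e^I]^T b at both.
x_a = np.array([0.0, 0.0, 0.0, 20.0, 20.0, 20.0])
x_b = np.array([4.0, 4.0, 4.0, 0.0, 0.0, 0.0])
const = lam_e @ b
```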

4.12
Observation IV: Let I be a basis of A which corresponds to a feasible basic solution $x^I$
and a nonpositive vector of reduced costs $c^I$. Then $x^I$ is an optimal solution to (P), and $\lambda^I$
is an optimal solution to (D).
Indeed, in the case in question $x^I$ and $\lambda^I$ are feasible solutions to (P) and (D) satisfying
the complementary slackness condition: $c^I$ vanishes on I, and $x^I$ vanishes outside of I. □

4.13
Step of the PSM
$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

At the beginning of a step of PSM we have at our disposal
• the current basis I along with the corresponding feasible basic solution $x^I$, and
• the vector $c^I$ of reduced costs associated with I.
We call variables $x_i$ basic if $i \in I$, and non-basic otherwise. Note that all non-basic variables
in $x^I$ are zero, while basic ones are nonnegative.
At a step we act as follows:
♠ We check whether $c^I \le 0$. If it is the case, we terminate with the optimal primal solution $x^I$
and the “optimality certificate”, the optimal dual solution $[\lambda_e^I; c^I]$.

4.14

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

When $c^I$ is not nonpositive, we proceed as follows:
♠ We select an index j such that $c^I_j > 0$. Since $c^I_i = 0$ for $i \in I$, we have $j \notin I$; our intention
is to update the current basis I into a new basis $I^+$ (which, same as I, is associated with a
feasible basic solution) by adding to the basis the index j (“non-basic variable $x_j$ enters the
basis”) and discarding from the basis an appropriately chosen index $i_* \in I$ (“basic variable
$x_{i_*}$ leaves the basis”). Specifically,
• We look at the solutions x(t), where t ≥ 0 is a parameter, defined by
$Ax(t) = b \ \&\ x_j(t) = t \ \&\ x_\ell(t) = 0 \ \ \forall \ell \notin [I \cup \{j\}]$
$\Leftrightarrow\ x_i(t) = \begin{cases} x^I_i - t[A_I^{-1} A_j]_i, & i \in I \\ t, & i = j \\ 0, & \text{all other cases} \end{cases}$
Observation VI: (a): $x(0) = x^I$ and (b): $c^T x(t) - c^T x^I = c^I_j t$.
(a) is evident. By Observation III,
$c^T[x(t) - x(0)] = [c^I]^T[x(t) - x(0)] = \sum_{i \in I} \underbrace{c^I_i}_{=0} [x_i(t) - x_i(0)] + c^I_j [x_j(t) - x_j(0)] = c^I_j t.$

4.15

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

Situation:
• I, $x^I$, $c^I$: a basis of A and the associated basic feasible solution and reduced costs.
• $j \notin I$ : $c^I_j > 0$
• $Ax(t) = b \ \&\ x_j(t) = t \ \&\ x_\ell(t) = 0 \ \ \forall \ell \notin [I \cup \{j\}]$
$\Leftrightarrow\ x_i(t) = \begin{cases} x^I_i - t[A_I^{-1} A_j]_i, & i \in I \\ t, & i = j \\ 0, & \text{all other cases} \end{cases}$
• $c^T[x(t) - x^I] = c^I_j t$
There are two options:
A. All quantities $[A_I^{-1} A_j]_i$, $i \in I$, are ≤ 0. Here x(t) is feasible for all t ≥ 0 and
$c^T[x(t) - x(0)] = c^I_j t \to \infty$ as $t \to \infty$.
We claim (P) unbounded and terminate.
B. $I_* := \{ i \in I : [A_I^{-1} A_j]_i > 0 \} \ne \emptyset$. We set
$\bar t = \min \left\{ \frac{x^I_i}{[A_I^{-1} A_j]_i} : i \in I_* \right\} = \frac{x^I_{i_*}}{[A_I^{-1} A_j]_{i_*}}$ with $i_* \in I_*$.
Observe that $x(\bar t) \ge 0$ and $x_{i_*}(\bar t) = 0$. We set $I^+ = [I \cup \{j\}] \setminus \{i_*\}$, $x^{I^+} = x(\bar t)$, compute the
vector of reduced costs $c^{I^+}$ and pass to the next step of the PSM, with $I^+$, $x^{I^+}$, $c^{I^+}$ in the
roles of I, $x^I$, $c^I$.

4.16
Summary

♠ The PSM works with a standard form LO
$\max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)
♠ At the beginning of a step, we have
• current basis I - a set of m indexes such that the columns of A with these indexes are
linearly independent
• current basic feasible solution xI such that AxI = b, xI ≥ 0 and all nonbasic – with indexes
not in I – entries of xI are zeros

4.17
♠ At a step, we
• A. Compute the vector of reduced costs cI = c − AT λIe such that the basic entries in cI
are zeros.
• If cI ≤ 0, we terminate — xI is an optimal solution.
• Otherwise we pick j with $c^I_j > 0$ and build a “ray of solutions” $x(t) = x^I + th$ such that
Ax(t) ≡ b, and $x_j(t) ≡ t$ is the only nonbasic entry in x(t) which can be nonzero.
• We have x(0) = xI , cT x(t) − cT xI = cIj t.
• If no basic entry in x(t) decreases as t grows, we terminate: (P ) is unbounded.
• If some basic entries in x(t) decrease as t grows, we choose the largest t = t̄ such that
x(t) ≥ 0. When t = t̄, at least one of the basic entries of x(t) is about to become negative,
and we eliminate its index i∗ from I, adding to I the index j instead (“variable xj enters
the basis, variable xi∗ leaves it”).
♠ $I^+ = [I \cup \{j\}] \setminus \{i_*\}$ is our new basis, $x^{I^+} = x(\bar t)$ is our new basic feasible solution, and we
pass to the next step.
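
The summary above can be sketched in matrix form as follows. This is a minimal illustration, not the notes' implementation: the function name `primal_simplex`, the 0-based indexes, the tolerance, the iteration cap, and the choice of the smallest eligible index for entering and leaving variables are all assumptions of the sketch. The system $A_I$ is re-solved from scratch at every step for clarity rather than efficiency.

```python
import numpy as np

def primal_simplex(A, b, c, basis, tol=1e-9, max_iter=1000):
    """Maximize c^T x s.t. Ax = b, x >= 0, starting from a feasible basis.

    `basis` is a list of 0-based column indexes of a feasible basis.
    Returns (status, x, lambda_e, basis).
    """
    A, b, c = (np.asarray(v, dtype=float) for v in (A, b, c))
    basis = list(basis)
    for _ in range(max_iter):
        A_I = A[:, basis]
        x_I = np.linalg.solve(A_I, b)            # basic entries of x^I
        lam = np.linalg.solve(A_I.T, c[basis])   # lambda_e^I
        red = c - A.T @ lam                      # reduced costs c^I
        red[basis] = 0.0                         # basic entries are zero by construction
        entering = np.flatnonzero(red > tol)
        if entering.size == 0:                   # c^I <= 0: x^I is optimal
            x = np.zeros(A.shape[1])
            x[basis] = x_I
            return "optimal", x, lam, basis
        j = int(entering[0])                     # smallest index with positive reduced cost
        u = np.linalg.solve(A_I, A[:, j])        # A_I^{-1} A_j
        rows = np.flatnonzero(u > tol)
        if rows.size == 0:                       # no basic entry decreases along the ray
            return "unbounded", None, None, basis
        ratios = x_I[rows] / u[rows]
        t_bar = ratios.min()                     # largest t keeping x(t) >= 0
        ties = rows[ratios <= t_bar + tol]
        # Among tied rows, the basic variable with the smallest index leaves.
        r = min((int(k) for k in ties), key=lambda k: basis[k])
        basis[r] = j
    raise RuntimeError("iteration limit reached")

A = [[1, 2, 2, 1, 0, 0],
     [2, 1, 2, 0, 1, 0],
     [2, 2, 1, 0, 0, 1]]
b = [20, 20, 20]
c = [10, 12, 12, 0, 0, 0]
status, x, lam, final_basis = primal_simplex(A, b, c, [3, 4, 5])
```

On the example data the sketch may visit bases in a different order than a hand computation that picks other entering variables, but it ends at the same optimal vertex.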

4.18
Remarks:
• I + , when defined, is a basis of A, and in this case x(t̄) is the corresponding basic feasible
solution. ⇒ The PSM is well defined
• Upon termination (if any), the PSM correctly solves the problem and, moreover,
— either returns extreme point optimal solutions to the primal and the dual problems,
— or returns a certificate of unboundedness: a (vertex) feasible solution $x^I$ along with a
direction $d = \frac{d}{dt} x(t) \in \mathrm{Rec}(\{x : Ax = b, x \ge 0\})$ which is an improving direction of the
objective: $c^T d > 0$. On closer inspection, d is an extreme direction of $\mathrm{Rec}(\{x : Ax = b, x \ge 0\})$.
• The method is monotone: $c^T x^{I^+} \ge c^T x^I$. The latter inequality is strict unless $x^{I^+} = x^I$,
which may happen only when $x^I$ is a degenerate basic feasible solution.
• If all basic feasible solutions to (P) are nondegenerate, the PSM terminates in finite time.
Indeed, in this case the objective strictly grows from step to step, meaning that the same
basis cannot be visited twice. Since there are finitely many bases, the method must terminate
in finite time.

4.19
Tableau Implementation of the PSM
$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

♣ In computations by hand it is convenient to implement the PSM in the tableau form. The
tableau summarizes the information we have at the beginning of the step, and is updated
from step to step by easy to memorize rules.
♠ The structure of a tableau is as follows:
                       x_1                  x_2                  ···   x_n
−c^T x^I               c^I_1                c^I_2                ···   c^I_n
x_{i_1} = <...>        [A_I^{-1}A]_{1,1}    [A_I^{-1}A]_{1,2}    ···   [A_I^{-1}A]_{1,n}
...                    ...                  ...                  ...   ...
x_{i_m} = <...>        [A_I^{-1}A]_{m,1}    [A_I^{-1}A]_{m,2}    ···   [A_I^{-1}A]_{m,n}

• Zeroth row: minus the current value of the objective and the current reduced costs
• Rest of Zeroth column: the names and the values of current basic variables
• Rest of the tableau: the m × n matrix $A_I^{-1} A$
Note: In the Tableau, all columns but the Zeroth one are labeled by decision variables, and
all rows but the Zeroth one are labeled by the current basic variables.

4.20
♣ To illustrate the Tableau implementation, consider the LO problem
max 10x1 + 12x2 + 12x3
s.t.
x1 + 2x2 + 2x3 ≤ 20
2x1 + x2 + 2x3 ≤ 20
2x1 + 2x2 + x3 ≤ 20
x1 , x2 , x3 ≥ 0
♠ Adding slack variables, we convert the problem into the standard form
max 10x1 + 12x2 + 12x3
s.t.
x1 + 2x2 + 2x3 + x4 = 20
2x1 + x2 + 2x3 + x5 = 20
2x1 + 2x2 + x3 + x6 = 20
x1 , ..., x6 ≥ 0

• We can take I = {4, 5, 6} as the initial basis, xI = [0; 0; 0; 20; 20; 20] as the corresponding
basic feasible solution, and c as cI . The first tableau is
x1 x2 x3 x4 x5 x6
0 10 12 12 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = 20 2 1 2 0 1 0
x6 = 20 2 2 1 0 0 1

4.21
x1 x2 x3 x4 x5 x6
0 10 12 12 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = 20 2 1 2 0 1 0
x6 = 20 2 2 1 0 0 1
• Not all reduced costs (Zeroth row of the Tableau) are nonpositive.
A: Detecting variable to enter the basis:
We choose a variable with positive reduced cost, specifically, x1 , which is about to enter
the basis, and call the corresponding column in the Tableau the pivoting column.

4.22
x1 x2 x3 x4 x5 x6
0 10 12 12 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = 20 2 1 2 0 1 0
x6 = 20 2 2 1 0 0 1
B: Detecting variable to leave the basis:
B.1. We select the positive entries in the pivoting column (not looking at the Zeroth row).
If there is nothing to select, (P ) is unbounded, and we terminate.
B.2. We divide by the selected entries of the pivoting column the corresponding entries of
the Zeroth column, thus getting ratios 20 = 20/1, 10 = 20/2, 10 = 20/2.
B.3. We select the smallest among the ratios in B.2 (this quantity is the above t̄) and
call the corresponding row the pivoting one. The basic variable labeling this row leaves the
basis. In our case, we choose as pivoting row the one labeled by x5 ; this variable leaves the
basis.

4.23
x1 x2 x3 x4 x5 x6
0 10 12 12 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = 20 2 1 2 0 1 0
x6 = 20 2 2 1 0 0 1
C. Updating the tableau:
C.1. We divide all entries in the pivoting row by the pivoting element (the one which is
in the pivoting row and pivoting column) and change the label of the pivoting row to the
name of the variable entering the basis:
x1 x2 x3 x4 x5 x6
0 10 12 12 0 0 0
x4 = 20 1 2 2 1 0 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 20 2 2 1 0 0 1
C.2. We subtract from all non-pivoting rows (including the Zeroth one) multiples of the
(updated) pivoting row to zero out the entries in the pivoting column:
x1 x2 x3 x4 x5 x6
−100 0 7 2 0 −5 0
x4 = 10 0 1.5 1 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 0 0 1 −1 0 −1 1
The step is over.
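
The update rules C.1 and C.2 amount to one Gauss-Jordan pivot on the tableau. The following sketch, assuming the tableau is stored as a NumPy array with row 0 as the Zeroth row and column 0 as the Zeroth column (the function name `pivot` is hypothetical), reproduces the step just carried out.

```python
import numpy as np

def pivot(T, row, col):
    """One tableau update: normalize the pivoting row (C.1), then zero out
    the pivoting column in all other rows, including the Zeroth one (C.2)."""
    T = np.array(T, dtype=float)      # work on a copy
    T[row] /= T[row, col]
    for r in range(T.shape[0]):
        if r != row:
            T[r] -= T[r, col] * T[row]
    return T

# First tableau of the example: rows are [Zeroth row, x4, x5, x6].
T0 = [[ 0, 10, 12, 12, 0, 0, 0],
      [20,  1,  2,  2, 1, 0, 0],
      [20,  2,  1,  2, 0, 1, 0],
      [20,  2,  2,  1, 0, 0, 1]]

# x1 enters (column 1), x5 leaves (row 2).
T1 = pivot(T0, row=2, col=1)
```

Row labels are not tracked by this sketch; after the call, row 2 should be read as the row of x1.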
4.24
x1 x2 x3 x4 x5 x6
−100 0 7 2 0 −5 0
x4 = 10 0 1.5 1 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 0 0 1 −1 0 −1 1
A: There still are positive reduced costs in the Zeroth row. We choose x3 to enter the
basis; the column of x3 is the pivoting column.
B: We select positive entries in the pivoting column (except for the Zeroth row), divide by
the selected entries the corresponding entries in the Zeroth column, thus getting the ratios
10/1, 10/1, and select the minimal ratio, breaking ties arbitrarily. Let us select the first
ratio as the minimal one. The corresponding (first) row becomes the pivoting one,
and the variable x4 labeling this row leaves the basis.

x1 x2 x3 x4 x5 x6
−100 0 7 2 0 −5 0
x4 = 10 0 1.5 1 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 0 0 1 −1 0 −1 1

4.25
x1 x2 x3 x4 x5 x6
−100 0 7 2 0 −5 0
x4 = 10 0 1.5 1 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 0 0 1 −1 0 −1 1
C: We identify the pivoting element (the entry 1 in the row of x4 and the column of x3), divide
the pivoting row by it and update the label of this row:
x1 x2 x3 x4 x5 x6
−100 0 7 2 0 −5 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 0 0 1 −1 0 −1 1
We then subtract from non-pivoting rows the multiples of the pivoting row to zero the
entries in the pivoting column:
x1 x2 x3 x4 x5 x6
−120 0 4 0 −2 −4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x6 = 10 0 2.5 0 1 −1.5 1
The step is over.

4.26
x1 x2 x3 x4 x5 x6
−120 0 4 0 −2 −4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x6 = 10 0 2.5 0 1 −1.5 1
A: There still is a positive reduced cost. Variable x2 enters the basis, the corresponding
column becomes the pivoting one:
x1 x2 x3 x4 x5 x6
−120 0 4 0 −2 −4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x6 = 10 0 2.5 0 1 −1.5 1
B: We select the positive entries in the pivoting column (aside of the Zeroth row) and
divide by them the corresponding entries in the Zeroth column, thus getting ratios 10/1.5,
10/2.5. The minimal ratio corresponds to the row labeled by x6 . This row becomes the
pivoting one, and x6 leaves the basis:
x1 x2 x3 x4 x5 x6
−120 0 4 0 −2 −4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x6 = 10 0 2.5 0 1 −1.5 1

4.27
x1 x2 x3 x4 x5 x6
−120 0 4 0 −2 −4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x6 = 10 0 2.5 0 1 −1.5 1
C: We divide the pivoting row by the pivoting element (the entry 2.5 in the row of x6 and the
column of x2), change the label of this row:
x1 x2 x3 x4 x5 x6
−120 0 4 0 −2 −4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x2 = 4 0 1 0 0.4 −0.6 0.4
and then subtract from all non-pivoting rows multiples of the pivoting row to zero the
entries in the pivoting column:
x1 x2 x3 x4 x5 x6
−136 0 0 0 −3.6 −1.6 −1.6
x3 = 4 0 0 1 0.4 0.4 −0.6
x1 = 4 1 0 0 −0.6 0.4 0.4
x2 = 4 0 1 0 0.4 −0.6 0.4
The step is over.

4.28
x1 x2 x3 x4 x5 x6
−136 0 0 0 −3.6 −1.6 −1.6
x3 = 4 0 0 1 0.4 0.4 −0.6
x1 = 4 1 0 0 −0.6 0.4 0.4
x2 = 4 0 1 0 0.4 −0.6 0.4
• The new reduced costs are nonpositive ⇒ the solution x∗ = [4; 4; 4; 0; 0; 0] is optimal.
The dual optimal solution is
λe = [3.6; 1.6; 1.6], cI = [0; 0; 0; −3.6; −1.6; −1.6],
the optimal value is 136.

4.29

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

♣ Getting started.
In order to run the PSM, we should initialize it with some feasible basic solution. The
standard way to do it is as follows:
♠ Multiplying appropriate equality constraints in (P) by −1, we ensure that b ≥ 0.
♠ We then build an auxiliary LO program
$\mathrm{Opt}(P') = \min_{x,s} \{ \sum_{i=1}^m s_i : Ax + s = b,\ x \ge 0,\ s \ge 0 \}$   (P′)
Observations:
• Opt(P′) ≥ 0, and Opt(P′) = 0 iff (P) is feasible;
• (P′) admits an evident basic feasible solution x = 0, s = b, the basis being I = {n + 1, ..., n + m};
• When Opt(P′) = 0, the x-part $x^*$ of every vertex optimal solution ($x^*$, $s^* = 0$) of (P′) is
a feasible basic solution to (P).
Indeed, the columns $A_i$ with indexes i such that $x^*_i > 0$ are linearly independent ⇒ the set
$I' = \{i : x^*_i > 0\}$ can be extended to a basis I of A, $x^*$ being the associated basic feasible
solution $x^I$.

4.30

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b \ge 0,\ x \ge 0 \}$   (P)
$\mathrm{Opt}(P') = \min_{x,s} \{ \sum_{i=1}^m s_i : Ax + s = b,\ x \ge 0,\ s \ge 0 \}$   (P′)

♠ In order to initialize solving (P) by the PSM, we start with Phase I, where we solve (P′)
by the PSM, the initial feasible basic solution being x = 0, s = b.
Upon termination of Phase I, we have at our disposal
— either an optimal solution $x^*$, $s^* \ne 0$ to (P′); it happens when Opt(P′) > 0
⇒ (P) is infeasible,
— or an optimal solution $x^*$, $s^* = 0$; in this case $x^*$ is a feasible basic solution to (P), and
we pass to Phase II, where we solve (P) by the PSM initialized by $x^*$ and the corresponding
basis I.
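
The construction of the Phase I problem can be sketched as below. The function name `phase_one_problem` and the return layout are assumptions of this sketch. Since the PSM in these notes maximizes, the auxiliary cost is returned as $-\sum_i s_i$, whose maximization is equivalent to minimizing $\sum_i s_i$. The sample data are made up for illustration.

```python
import numpy as np

def phase_one_problem(A, b):
    """Build data of the auxiliary problem (P') from Ax = b, x >= 0.

    Returns (A_aux, b_aux, c_aux, basis, start) where
    A_aux = [A, I_m] after sign flips making b_aux >= 0,
    c_aux is the cost to MAXIMIZE (minus the sum of the s-variables),
    basis is the 0-based index set {n, ..., n+m-1} of the s-columns,
    start = [0; b_aux] is the evident basic feasible solution.
    """
    A = np.array(A, dtype=float)   # copies, so the caller's data is untouched
    b = np.array(b, dtype=float)
    negative = b < 0
    A[negative] *= -1.0            # multiply the offending equations by -1
    b[negative] *= -1.0
    m, n = A.shape
    A_aux = np.hstack([A, np.eye(m)])
    c_aux = np.concatenate([np.zeros(n), -np.ones(m)])
    basis = list(range(n, n + m))
    start = np.concatenate([np.zeros(n), b])
    return A_aux, b, c_aux, basis, start

# Illustrative data (hypothetical): the second equation has a negative right hand side.
A_aux, b_aux, c_aux, basis, start = phase_one_problem([[1, 1], [1, -1]], [2, -1])
```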

4.31

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b \ge 0,\ x \ge 0 \}$   (P)
$\mathrm{Opt}(P') = \min_{x,s} \{ \sum_{i=1}^m s_i : Ax + s = b,\ x \ge 0,\ s \ge 0 \}$   (P′)

Note: Phase I yields, along with $x^*$, $s^*$, an optimal solution $[\lambda^*; \tilde c]$ to the problem dual to (P′):
$0 \le \tilde c := [0_{n \times 1}; 1; ...; 1] - [A, I_m]^T \lambda^* = [-A^T \lambda^*;\ [1; ...; 1] - \lambda^*], \qquad 0 = \tilde c^T [x^*; s^*]$
$\Rightarrow\ 0 = -[x^*]^T A^T \lambda^* + \mathrm{Opt}(P') - [s^*]^T \lambda^* \ \ \&\ \ A^T \lambda^* \le 0$
$\Rightarrow\ \mathrm{Opt}(P') = b^T \lambda^* \ \ \&\ \ A^T \lambda^* \le 0$
Recall that the standard certificate of infeasibility for (P) is a pair $\lambda_g \le 0$, $\lambda_e$ such that
$\lambda_g \le 0 \ \&\ A^T \lambda_e + \lambda_g = 0 \ \&\ b^T \lambda_e < 0.$
We see that when Opt(P′) > 0, the vectors $\lambda_e = -\lambda^*$, $\lambda_g = A^T \lambda^*$ form an infeasibility
certificate for (P).

4.32
Preventing Cycling
$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)

♣ Finite termination of the PSM is fully guaranteed only when (P) is nondegenerate,
that is, every basic feasible solution has exactly m nonzero entries, or, equivalently, in every
representation of b as a conic combination of linearly independent columns of A, the number
of columns which get positive weights is m.
♠ When (P) is degenerate, b belongs to the union of finitely many proper linear subspaces
$E_J = \mathrm{Lin}(\{A_i : i \in J\}) \subset \mathbf{R}^m$, where J runs through the family of all (m − 1)-element subsets
of {1, ..., n}
⇒ Degeneracy is a rare phenomenon: given A, the set of right hand side vectors b for which
(P) is degenerate is of measure 0.
However: There are specific problems with integer data where degeneracy can be typical.

4.33
♣ It turns out that cycling can be prevented by properly chosen pivoting rules which “break
ties” in the cases when there are several candidates to entering the basis and/or several
candidates to leave it.
Example 1: Bland’s “smallest subscript” pivoting rule.
• Given a current tableau containing positive reduced costs, find the smallest index j such
that j-th reduced cost is positive; take xj as the variable to enter the basis.
• Assuming that the above choice of the pivoting column does not lead to immediate
termination due to problem’s unboundedness, choose among all legitimate (i.e., compatible
with the description of the PSM) candidates to the role of the pivoting row the one with
the smallest index and use it as the pivoting row.
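
A minimal sketch of this rule on a tableau stored as a list of rows (row 0 is the Zeroth row, column 0 the Zeroth column). The function name `bland_pivot`, the tolerance, and the argument `labels` are assumptions of this sketch. It reads "smallest index" for the pivoting row as the smallest index of the basic variable labeling a candidate row, which is the usual statement of Bland's rule.

```python
def bland_pivot(T, labels, tol=1e-9):
    """Choose (col, row) by the smallest-subscript rule.

    T      : tableau as a list of rows
    labels : labels[k] is the 1-based index of the basic variable
             labeling tableau row k + 1
    Returns None if no reduced cost is positive (current solution optimal),
    (col, None) if the pivoting column has no positive entry (unbounded),
    otherwise (col, row).
    """
    n = len(T[0]) - 1
    col = next((j for j in range(1, n + 1) if T[0][j] > tol), None)
    if col is None:
        return None
    rows = [r for r in range(1, len(T)) if T[r][col] > tol]
    if not rows:
        return (col, None)
    t_bar = min(T[r][0] / T[r][col] for r in rows)
    # Legitimate candidates are the rows attaining the minimal ratio.
    ties = [r for r in rows if T[r][0] / T[r][col] <= t_bar + tol]
    row = min(ties, key=lambda r: labels[r - 1])
    return (col, row)

T0 = [[ 0, 10, 12, 12, 0, 0, 0],
      [20,  1,  2,  2, 1, 0, 0],
      [20,  2,  1,  2, 0, 1, 0],
      [20,  2,  2,  1, 0, 0, 1]]
choice = bland_pivot(T0, labels=[4, 5, 6])
```

On this tableau the rows of x5 and x6 tie in the ratio test, and the rule resolves the tie in favor of x5.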

4.34
Example 2: Lexicographic pivoting rules.
• R^N can be equipped with lexicographic order: x >L y if and only if x ≠ y and the first
nonzero entry in x − y is positive, and x ≥L y if and only if x = y or x >L y.
Examples: [1; −1; −2] >L [0; 100; 10], [1; −1; −2] >L [1; −2; 1000].
Note: ≥L and >L share the usual properties of the arithmetic ≥ and >, e.g.
• x ≥L x
• x ≥L y & y ≥L x ⇒ y = x
• x ≥L y ≥L z ⇒ x ≥L z
• x ≥L y & u ≥L v ⇒ x + u ≥L y + v
• x ≥L y, λ ≥ 0 ⇒ λx ≥L λy
In addition, the order ≥L is complete: every two vectors x, y from R^N are lexicographically
comparable, i.e., either x >L y, or x = y, or y >L x.

4.35
Lexicographic pivoting rule (L)
• Given the current tableau which contains positive reduced costs cIj , choose as the index
of the pivoting column a j such that cIj > 0.
• Let ui be the entry of the pivoting column in row i = 1, ..., m, and let some of ui
be positive (that is, the PSM does not detect problem’s unboundedness at the step in
question). Normalize every row i with ui > 0 by dividing all its entries, including those in
the zeroth column, by ui , and choose among the resulting (n + 1)-dimensional row vectors
the smallest w.r.t the lexicographic order, let its index be i∗ (it is easily seen that all rows
to be compared to each other are distinct, so that the lexicographically smallest of them
is uniquely defined). Choose the row i∗ as the pivoting one (i.e., the basic variable labeling
the row i∗ leaves the basis).
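
The row selection in rule (L) can be sketched with Python tuples, which compare lexicographically in exactly the sense defined above. The function name `lex_pivot_row` is hypothetical, and the sketch compares floating-point entries exactly, which is an assumption suitable only for small hand-sized examples.

```python
def lex_pivot_row(T, col):
    """Return the index of the pivoting row under rule (L), or None.

    T is a tableau given as a list of rows (row 0 is the zeroth row,
    column 0 the zeroth column); `col` is the pivoting column.
    """
    normalized = {}
    for r in range(1, len(T)):
        u = T[r][col]
        if u > 0:
            # Divide the whole row, zeroth column included, by u.
            normalized[r] = tuple(v / u for v in T[r])
    if not normalized:
        return None          # no positive entry: the problem is unbounded
    # Tuples are ordered lexicographically, so min() picks the
    # lexicographically smallest normalized row.
    return min(normalized, key=normalized.get)

# Tableau of the example after the first step; x3 (column 3) enters.
T1 = [[-100, 0, 7,   2, 0, -5,   0],
      [  10, 0, 1.5, 1, 1, -0.5, 0],
      [  10, 1, 0.5, 1, 0,  0.5, 0],
      [   0, 0, 1,  -1, 0, -1,   1]]
row = lex_pivot_row(T1, col=3)
```

Here the usual ratio test ties (10/1 for both candidate rows), and the lexicographic comparison of the normalized rows breaks the tie.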

4.36
Theorem: Let the PSM be initialized by a basic feasible solution and use pivoting rule (L).
Assume also that all rows in the initial tableau, except for the zeroth row, are lexicographically
positive. Then
• the rows in the tableau, except for the zeroth one, remain lexicographically positive at all
non-terminal steps,
• the vector of reduced costs strictly lexicographically decreases when passing from a tableau
to the next one, whence no basis is visited twice,
• the PSM terminates in finite time.
Note: Lexicographical positivity of the rows in the initial tableau is easy to achieve: it suffices
to reorder variables to make the initial basis I = {1, 2, ..., m}. In this case the
non-zeroth rows in the initial tableau look as follows:
$[A_I^{-1} b,\ A_I^{-1} A] = \begin{bmatrix} x^I_1 \ge 0 & 1 & 0 & 0 & \cdots & 0 & \star & \star & \cdots \\ x^I_2 \ge 0 & 0 & 1 & 0 & \cdots & 0 & \star & \star & \cdots \\ x^I_3 \ge 0 & 0 & 0 & 1 & \cdots & 0 & \star & \star & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \\ x^I_m \ge 0 & 0 & 0 & 0 & \cdots & 1 & \star & \star & \cdots \end{bmatrix}$
and we see that these rows are >L 0.

4.37
Dual Simplex Method

♣ DSM is aimed at solving an LO program in the standard form:
$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)
under the same standing assumption that the rows of A are linearly independent.
♠ The problem dual to (P) reads
$\mathrm{Opt}(D) = \min_{[\lambda_e;\lambda_g]} \{ b^T \lambda_e : \lambda_g = c - A^T \lambda_e,\ \lambda_g \le 0 \}$   (D)
♠ Complementary slackness reads
$[\lambda_g]_i x_i = 0 \ \ \forall i.$
♣ The PSM generates a sequence of neighboring basic feasible solutions to (P) (aka vertices
of the primal feasible set) $x^1, x^2, ...$ augmented by complementary infeasible solutions
$[\lambda_e^1; \lambda_g^1], [\lambda_e^2; \lambda_g^2], ...$ to (D) (these dual solutions satisfy, however, the equality constraints of
(D)) until the infeasibility of the dual problem is discovered or until a feasible dual solution
is built. In the latter case, we terminate with optimal solutions to both (P) and (D).

4.38

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n, Rank A = m]   (P)

♠ The problem dual to (P) reads
$\mathrm{Opt}(D) = \min_{[\lambda_e;\lambda_g]} \{ b^T \lambda_e : \lambda_g = c - A^T \lambda_e,\ \lambda_g \le 0 \}$   (D)
♠ Complementary slackness reads
$[\lambda_g]_i x_i = 0 \ \ \forall i.$
♠ The DSM generates a sequence of neighboring basic feasible solutions to (D) (aka vertices
of the dual feasible set) $[\lambda_e^1; \lambda_g^1], [\lambda_e^2; \lambda_g^2], ...$ augmented by complementary infeasible solutions
$x^1, x^2, ...$ to (P) (these primal solutions satisfy, however, the equality constraints of (P))
until the infeasibility of (P) is discovered or a feasible primal solution is built. In the latter
case, we terminate with optimal solutions to both (P) and (D).

4.39
DSM: Preliminaries
$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   (P)
$\mathrm{Opt}(D) = \min_{[\lambda_e;\lambda_g]} \{ b^T \lambda_e : \lambda_g + A^T \lambda_e = c,\ \lambda_g \le 0 \}$   (D)

♠ Fact: Every basis I of matrix A induces
— a primal basic solution $x^I$, uniquely defined by the basis, which solves the primal equations
and has all nonbasic entries equal to zero;
— a dual basic solution $[\lambda_e^I; c^I]$, uniquely defined by the basis, which solves the dual equations
and has all basic entries of $c^I$ equal to zero.
♠ Fact:
• Extreme points of the primal feasible set are primal basic solutions $x^I$ which are primal
feasible: $x^I \ge 0$.
• Extreme points of the dual feasible set are dual basic solutions $[\lambda_e^I; c^I]$ which are dual
feasible: $c^I \le 0$.

4.40

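$\mathrm{Opt}(D) = \min_{[\lambda_e;\lambda_g]} \{ b^T \lambda_e : \lambda_g + A^T \lambda_e = c,\ \lambda_g \le 0 \}$   (D)

Observation: Let x satisfy Ax = b. Replacing the dual objective $b^T \lambda_e$ with $-x^T \lambda_g$, we shift
the dual objective on the dual feasible plane by a constant. Equivalently:
$[\lambda_e; \lambda_g]$, $[\lambda'_e; \lambda'_g]$ satisfy the equality constraints in (D)
$\Rightarrow\ b^T[\lambda_e - \lambda'_e] = -x^T[\lambda_g - \lambda'_g]$
Indeed,
$\lambda_g + A^T \lambda_e = c = \lambda'_g + A^T \lambda'_e$
$\Rightarrow\ \lambda_g - \lambda'_g = A^T[\lambda'_e - \lambda_e]$
$\Rightarrow\ -x^T[\lambda_g - \lambda'_g] = -x^T A^T[\lambda'_e - \lambda_e] = b^T[\lambda_e - \lambda'_e].$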

4.41

$\mathrm{Opt}(P) = \max_x \{ c^T x : Ax = b,\ x \ge 0 \}$   [A : m × n]   (P)
$\mathrm{Opt}(D) = \min_{[\lambda_e;\lambda_g]} \{ b^T \lambda_e : \lambda_g + A^T \lambda_e = c,\ \lambda_g \le 0 \}$   (D)

♣ A step of the DSM:
♠ At the beginning of a step, we have at our disposal a basis I of A along with the corresponding
nonpositive vector $c^I = c - A^T \underbrace{A_I^{-T} c_I}_{\lambda_e^I}$ of reduced costs and the (not necessarily feasible) basic
primal solution $x^I$. At a step we act as follows:
• We check whether $x^I \ge 0$. If it is the case, we terminate: $x^I$ is an optimal primal solution,
and $[\lambda_e^I = A_I^{-T} c_I;\ c^I]$ is the complementary optimal dual solution.

4.42

$\mathrm{Opt}(D) = \min_{[\lambda_e;\lambda_g]} \{ b^T \lambda_e : \lambda_g + A^T \lambda_e = c,\ \lambda_g \le 0 \}$   (D)

• If $x^I$ has negative entries, we pick one of them, let it be $x^I_{i_*}$: $x^I_{i_*} < 0$; note that $i_* \in I$.
Variable $x_{i_*}$ will leave the basis.
• We build a “ray of dual solutions”
$[\lambda_e^I(t); c^I(t)] = [\lambda_e^I; c^I] - t[d_e; d_g],\ t \ge 0$
from the requirements
$c^I(t) + A^T \lambda_e^I(t) = c,$
$c^I_{i_*}(t) = -t,$
$c^I_i(t) = c^I_i = 0$ for all basic indexes $i \ne i_*$.
This results in
$c^I(t) = c - A^T \underbrace{A_I^{-T}[c + t e_{i_*}]_I}_{\lambda_e^I(t)} = c^I - t A^T A_I^{-T} [e_{i_*}]_I.$
Observe that along the ray $c^I(t)$, the dual objective strictly improves:
$b^T[\lambda_e^I(t) - \lambda_e^I] = -[x^I]^T[c^I(t) - c^I] = -x^I_{i_*}[-t] = \underbrace{x^I_{i_*}}_{<0} \cdot t.$
♥ It may happen that $c^I(t)$ remains nonpositive for all t ≥ 0. Then we have discovered a
dual feasible ray along which the dual objective tends to −∞. We terminate claiming that
the dual problem is unbounded ⇒ the primal problem is infeasible.

4.43
$c^I(t) = c - A^T \underbrace{A_I^{-T}[c + t e_{i_*}]_I}_{\lambda_e^I(t)} = c^I - t A^T A_I^{-T} [e_{i_*}]_I.$

♥ Alternatively, as t grows, some of the reduced costs $c^I_j(t)$ eventually become positive.
We identify the largest $t = \bar t$ such that $c^I(\bar t) \le 0$ and an index j such that $c^I_j(t)$ is about to
become positive when $t = \bar t$; note that $j \notin I$. Variable $x_j$ enters the basis. The new basis is
$I^+ = [I \cup \{j\}] \setminus \{i_*\}$, the new vector of reduced costs is $c^{I^+} = c^I(\bar t) = c - A^T \lambda_e^I(\bar t)$. We compute the
new basic primal solution $x^{I^+}$ and pass to the next step.

4.44
Note: As a result of a step, we
— either terminate with optimal primal and dual solutions,
— or terminate with correct claim that (P ) is infeasible, (D) is unbounded,
— or pass to the new basis and new feasible basic dual solution and carry out a new step.
Along the basic dual solutions generated by the method, the values of dual objective never
increase and strictly decrease at every step which does update the current dual solution
(i.e., results in t̄ > 0). The latter definitely is the case when the current dual solution is
nondegenerate, that is, all non-basic reduced costs are strictly negative.
In particular, when all basic dual solutions are nondegenerate, the DSM terminates in finite
time.

4.45
♣ Same as PSM, the DSM possesses a Tableau implementation. The structure of the tableau
is exactly the same as in the case of PSM:
                       x_1                  x_2                  ···   x_n
−c^T x^I               c^I_1                c^I_2                ···   c^I_n
x_{i_1} = <...>        [A_I^{-1}A]_{1,1}    [A_I^{-1}A]_{1,2}    ···   [A_I^{-1}A]_{1,n}
...                    ...                  ...                  ...   ...
x_{i_m} = <...>        [A_I^{-1}A]_{m,1}    [A_I^{-1}A]_{m,2}    ···   [A_I^{-1}A]_{m,n}
We illustrate the DSM rules by example. The current tableau is
x1 x2 x3 x4 x5 x6
8 −1 −1 −2 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = −20 −2 −1 2 0 1 0
x6 = −1 2 2 −1 0 0 1
which corresponds to I = {4, 5, 6}. The vector of reduced costs is nonpositive, and its basic
entries are zero (a must for DSM).
A. Some of the entries in the basic primal solution (zeroth column) are negative. We select
one of them, say, x5 , and call the corresponding row the pivoting row:
x1 x2 x3 x4 x5 x6
8 −1 −1 −2 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = −20 −2 −1 2 0 1 0
x6 = −1 2 2 −1 0 0 1
Variable x5 will leave the basis.
4.46
x1 x2 x3 x4 x5 x6
8 −1 −1 −2 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = −20 −2 −1 2 0 1 0
x6 = −1 2 2 −1 0 0 1
B.1. If all entries in the pivoting row, except for the one in zeroth column, are nonnegative,
we terminate – the dual problem is unbounded, the primal is infeasible.
B.2. Alternatively, we
— select the negative entries in the pivoting row outside of the zeroth column (all of them
are nonbasic!) and divide by them the corresponding entries in the zeroth row, thus getting
nonnegative ratios, in our example the ratios 1/2 = (−1)/(−2) and 1 = (−1)/(−1);
— pick the smallest of the computed ratios and call the corresponding column the pivoting
column. Variable which marks this column will enter the basis.
x1 x2 x3 x4 x5 x6
8 −1 −1 −2 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = −20 −2 −1 2 0 1 0
x6 = −1 2 2 −1 0 0 1

The pivoting column is the column of x1 . Variable x5 leaves the basis, variable x1 enters
the basis.

4.47
x1 x2 x3 x4 x5 x6
8 −1 −1 −2 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = −20 −2 −1 2 0 1 0
x6 = −1 2 2 −1 0 0 1
C. It remains to update the tableau, which is done exactly as in the PSM:
— we normalize the pivoting row by dividing its entries by the pivoting element (one in the
intersection of the pivoting row and pivoting column) and change the label of this row (the
new label is the variable which enters the basis):
x1 x2 x3 x4 x5 x6
8 −1 −1 −2 0 0 0
x4 = 20 1 2 2 1 0 0
x1 = 10 1 0.5 −1 0 −0.5 0
x6 = −1 2 2 −1 0 0 1
— we subtract from all non-pivoting rows multiples of the (normalized) pivoting row to
zero the non-pivoting entries in the pivoting column:
x1 x2 x3 x4 x5 x6
18 0 −0.5 −3 0 −0.5 0
x4 = 10 0 1.5 3 1 0.5 0
x1 = 10 1 0.5 −1 0 −0.5 0
x6 = −21 0 1 1 0 1 1
The step of DSM is over.
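
The DSM step just carried out can be reproduced with the following sketch. The function names `dual_ratio_test` and `pivot`, the tolerance, and the NumPy array layout (row 0 is the zeroth row, column 0 the zeroth column) are assumptions of this sketch; the tableau data are those of the example.

```python
import numpy as np

def dual_ratio_test(T, row, tol=1e-9):
    """Rule B: pick the pivoting column for a given pivoting row.

    Returns None when the pivoting row has no negative entry outside the
    zeroth column (dual unbounded, primal infeasible); otherwise the column
    with the smallest ratio (zeroth-row entry) / (pivoting-row entry).
    """
    T = np.asarray(T, dtype=float)
    cols = [j for j in range(1, T.shape[1]) if T[row, j] < -tol]
    if not cols:
        return None
    return min(cols, key=lambda j: T[0, j] / T[row, j])

def pivot(T, row, col):
    """Rule C: the same tableau update as in the PSM."""
    T = np.array(T, dtype=float)
    T[row] /= T[row, col]
    for r in range(T.shape[0]):
        if r != row:
            T[r] -= T[r, col] * T[row]
    return T

T0 = [[  8, -1, -1, -2, 0, 0, 0],
      [ 20,  1,  2,  2, 1, 0, 0],
      [-20, -2, -1,  2, 0, 1, 0],
      [ -1,  2,  2, -1, 0, 0, 1]]

col = dual_ratio_test(T0, row=2)   # row of x5, which leaves the basis
T1 = pivot(T0, row=2, col=col)
```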
4.48
x1 x2 x3 x4 x5 x6
18 0 −0.5 −3 0 −0.5 0
x4 = 10 0 1.5 3 1 0.5 0
x1 = 10 1 0.5 −1 0 −0.5 0
x6 = −21 0 1 1 0 1 1
The basis is I = {1, 4, 6}, and there still is a negative basic variable x6 . Its row is the
pivoting one:
x1 x2 x3 x4 x5 x6
18 0 −0.5 −3 0 −0.5 0
x4 = 10 0 1.5 3 1 0.5 0
x1 = 10 1 0.5 −1 0 −0.5 0
x6 = −21 0 1 1 0 1 1
However: all entries in the pivoting row, except for the one in the zeroth column, are
nonnegative
⇒ Dual problem is unbounded, primal problem is infeasible.

4.49
x1 x2 x3 x4 x5 x6
18 0 −0.5 −3 0 −0.5 0
x4 = 10 0 1.5 3 1 0.5 0
x1 = 10 1 0.5 −1 0 −0.5 0
x6 = −21 0 1 1 0 1 1
Explanation. In terms of the tableau, the vector dg participating in the description of the
“ray of dual solutions”
[λIe (t); cI (t)] = [λIe ; cI ] − t[de ; dg ]
is just the (transpose of) the pivoting row (with entry in the zeroth column excluded).
When this vector is nonnegative, we have cI (t) ≤ 0 for all t ≥ 0, that is, the entire ray of
dual solutions is feasible. It remains to recall that along this ray the dual objective tends
to −∞.

4.50
Warm Start

♣ Fact: In many applications one needs to solve a sequence of LO programs that are “close”
to each other.
Example: In branch-and-bound methods for solving Mixed Integer Linear Optimization
problems, one solves a sequence of LOs obtained from each other by adding/deleting
a variable/constraint, one at a time.
♣ Question: Assume we have solved an LO
$\max_x \{ \bar c^T x : \bar A x = \bar b,\ x \ge 0 \}$   (P̄)
to optimality by PSM (or DSM) and have at our disposal the optimal basis I giving rise to
the optimal primal solution $\bar x^I$ and the optimal dual solution $[\bar\lambda_e^I; \bar c^I]$:
$[\bar x^I]_I = \bar A_I^{-1} \bar b \ge 0$   [feasibility]
$\bar c^I = \bar c - \bar A^T \underbrace{[\bar A_I]^{-T} \bar c_I}_{\bar\lambda_e^I} \le 0$   [optimality]

How could we utilize I when solving an LO problem (P) “close” to (P̄)?

4.51
$\max_x \{\bar c^T x : \bar A x = \bar b,\ x \ge 0\} \qquad (\bar P)$
Cost vector is updated: c̄ ↦ c
♠ For the new problem, I remains a basis, and $\bar x^I$ remains a basic feasible solution. We can
easily compute the basic dual solution $[\lambda_e; c^I]$:
$c^I = c - \bar A^T \underbrace{[\bar A_I]^{-T} c_I}_{\lambda_e}$
• If the nonbasic entries in $c^I$ are nonpositive, $\bar x^I$ and $[\lambda_e; c^I]$ are optimal primal and dual
solutions to the new problem.
Note: This is definitely the case when the nonbasic entries in $\bar c^I$ are negative (i.e., $[\bar\lambda_e; \bar c^I]$ is
a nondegenerate dual basic solution to $(\bar P)$) and c is close enough to c̄. In this case,
$\bar x^I = \nabla_c\big|_{c=\bar c}\,\mathrm{Opt}(c).$

• If some of the nonbasic entries in $c^I$ are positive, I is non-optimal for (P), but I is still a
basis producing a basic feasible primal solution for the new problem. We can solve the new
problem by PSM starting from this basis, which usually solves (P) much faster
than “from scratch.”
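A minimal sketch of the cost-update check, assuming the constraint matrix is a dense NumPy array and the basis is a list of 0-based column indexes; the function names are hypothetical and the tolerance is an arbitrary choice.

```python
import numpy as np

def reduced_costs(A, c, basis):
    """c^I = c - A^T lambda_e with lambda_e = A_I^{-T} c_I."""
    lam = np.linalg.solve(A[:, basis].T, c[basis])
    return c - A.T @ lam

def basis_stays_optimal(A, c_new, basis, tol=1e-9):
    """True if the old basis is still optimal for the max problem with cost c_new."""
    return bool(np.all(reduced_costs(A, c_new, basis) <= tol))
```

When the check fails, the old basis is still primal feasible, so it can be passed to PSM as the starting basis.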

4.52
$\max_x \{\bar c^T x : \bar A x = \bar b,\ x \ge 0\} \qquad (\bar P)$
Right hand side vector is updated: b̄ ↦ b
♠ For the new problem, I remains a basis, and $[\bar\lambda_e; \bar c^I]$ remains a basic dual feasible solution.
We can easily compute the new primal basic solution:
$[x^I]_I = [\bar A_I]^{-1} b$, $[x^I]_i = 0$ when $i \notin I$
• If the basic entries in $x^I$ are nonnegative, $x^I$ and $[\bar\lambda_e; \bar c^I]$ are optimal primal and dual solutions
to the new problem.
Note: This is definitely the case when the basic entries in $\bar x^I$ are positive (i.e., $\bar x^I$ is a
nondegenerate primal basic solution to $(\bar P)$) and b is close enough to b̄. In this case,
$\bar\lambda_e = \nabla_b\big|_{b=\bar b}\,\mathrm{Opt}(b).$

• If some of the basic entries in $x^I$ are negative, I is non-optimal for (P), but I is still a
basis producing a dual feasible solution to the new problem. We can solve the new problem
by DSM starting from this basis, which usually solves (P) much faster than “from
scratch.”
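The right-hand-side update admits the same kind of check. The sketch below assumes a dense NumPy matrix and 0-based basis indexes; the function names are hypothetical.

```python
import numpy as np

def primal_basic_solution(A, b, basis):
    """Basic solution for the given basis: x_I = A_I^{-1} b, other entries zero."""
    x = np.zeros(A.shape[1])
    x[basis] = np.linalg.solve(A[:, basis], b)
    return x

def basis_survives_rhs_change(A, b_new, basis, tol=1e-9):
    """True if the old (dual feasible) basis is also primal feasible for b_new."""
    return bool(np.all(primal_basic_solution(A, b_new, basis) >= -tol))
```

If the check fails, the basis is still dual feasible and can serve as the starting basis for DSM.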

4.53
$\max_x \{\bar c^T x : \bar A x = \bar b,\ x \ge 0\} \qquad (\bar P)$
Some nonbasic columns in Ā are updated
and/or
new variables are added
♣ Assume that we update some nonbasic columns in Ā and extend the list of primal variables,
adding several columns to Ā and extending c accordingly.
♠ For the new problem, I remains a basis, and $\bar x^I$, extended by the appropriate number of zero
entries, remains a basic primal feasible solution. We can solve the new problem by PSM,
starting with this basis and this basic feasible solution.

4.54
$\max_x \{\bar c^T x : \bar A x = \bar b,\ x \ge 0\} \qquad (\bar P)$
A new equality constraint is added to $(\bar P)$
♣ Assume that the m × n matrix Ā with linearly independent rows is augmented by a new row
$a^T$ (linearly independent of the old ones), and b̄ is augmented by a new entry.
♠ Augmenting $\bar\lambda_e$ by a zero entry, we get a feasible dual solution $[\lambda_e; \bar c^I]$ to the new problem.
This solution, however, is not necessarily basic, since I is not a basis for the constraint
matrix A of the new problem.
♠ Fact: $[\lambda_e; \bar c^I]$ can be easily converted into a basic dual feasible solution, and we can use
this solution in DSM.
♥ Let $\Delta = a - \bar A^T[\bar A_I]^{-T} a_I = A^T\delta$, where $\delta = [-[\bar A_I]^{-T} a_I;\ 1]$.
Note: δ ≠ 0 and the entries $\Delta_i$, i ∈ I, are zero.
• Since the rows in A are independent and δ ≠ 0, we have ∆ ≠ 0; replacing, if necessary, δ
with −δ, we can assume that one of the entries in ∆ is negative.
Thus, $\Delta = A^T\delta \ne 0$, $\Delta_i = 0$ for i ∈ I and $\Delta_j < 0$ for some j ∉ I.

4.55
Situation: $\Delta = A^T\delta \ne 0$, $\Delta_i = 0$ for i ∈ I and $\Delta_j < 0$ for some j ∉ I.
Now let us look at the ray
$\{[\lambda_e(t) := \lambda_e + t\delta,\ c^I(t) := c - A^T\lambda_e(t) = \bar c^I - t\Delta] : t \ge 0\}$
so that
• $A^T\lambda_e(t) + c^I(t) \equiv c$
• all $[c^I(t)]_i$, i ∈ I, are identically zero
• $c^I(0) = \bar c^I \le 0$
• the entry $[c^I(t)]_j$ is positive when t is large
Let t̄ be the largest t ≥ 0 such that $c^I(t) \le 0$. When t = t̄, for some j̄ the entry $[c^I(\bar t)]_{\bar j}$
is about to become positive. Note that j̄ ∉ I since $[c^I(t)]_i \equiv 0$ for i ∈ I. It is immediately
seen that $I^+ = I \cup \{\bar j\}$ and $[\lambda_e(\bar t); c^I(\bar t)]$ are a basis and the corresponding basic feasible dual
solution for the new problem. We can now solve the new problem by DSM starting with
this solution.

4.56
$\max_x \{\bar c^T x : \bar A x = \bar b,\ x \ge 0\} \qquad (\bar P)$
An inequality constraint $a^T x \le \alpha$ is added to $(\bar P)$
♣ The new problem in the standard form reads
$\max_{x,s} \{[\bar c; 0]^T[x; s] : \bar A x = \bar b,\ a^T x + s = \alpha,\ x \ge 0,\ s \ge 0\} \qquad (P)$
♠ Transition from $(\bar P)$ to (P) can be done in two steps:
• We augment x by a new variable s, Ā by a zero column, and c̄ by a zero entry, thus arriving
at the problem
$\max_{x,s} \{[\bar c; 0]^T[x; s] : \bar A x + 0\cdot s = \bar b,\ x \ge 0,\ s \ge 0\} \qquad (\widetilde P)$
For this problem, I is a basis, $\widetilde x^I = [\bar x^I; 0]$ is the associated primal optimal basic feasible
solution, and $[\bar\lambda_e; [\bar c^I; 0]]$ is the associated dual optimal basic solution.
• (P) is obtained from $(\widetilde P)$ by adding a new equality constraint. We already know how to
warm-start solving (P) by DSM.

4.57
Network Simplex Method

♣ Recall the (single-product) Network Flow problem:

Given
• an oriented graph with arcs assigned transportation costs and capacities,
• a vector of external supplies at the nodes of the graph,
find the cheapest flow fitting the supply and respecting arc capacities.

♣ Network Flow problems form an important special case in LO.


♠ As applied to Network Flow problems, the Simplex Method admits a dedicated imple-
mentation which has many advantages as compared to the general-purpose algorithm.

5.1
Preliminaries: Undirected Graphs

♣ An undirected graph is a pair G = (N , E) comprised of
• a finite nonempty set of nodes N
• a set E of arcs, an arc being a 2-element subset of N .
Standard notation: we identify N with the set {1, 2, ..., m}, m being the number of nodes in
the graph in question. Then every arc γ becomes an unordered pair {i, j} of two distinct
(i ≠ j) nodes.
We say that nodes i, j are incident to arc γ = {i, j}, and that this arc links nodes i and j.

5.2
Undirected graph.
Nodes: N = {1, 2, 3, 4}.
Arcs: E = {{1, 2}, {2, 3}, {1, 3}, {3, 4}}

♣ Walk: an ordered sequence of nodes i1 , ..., it with every two consecutive nodes linked by
an arc: {is , is+1 } ∈ E, 1 ≤ s < t.
• 1, 2, 3, 1, 3, 4, 3 is a walk.
• 1, 2, 4, 1 is not a walk.
♠ Connected graph: there exists a walk which passes through all nodes.
• Graph on the picture is connected.

5.3
♣ Path: a walk with all nodes distinct from each other
• 1, 2, 3, 4 is a path • 1, 2, 3, 1 is not a path
♣ Cycle: a walk i1 , i2 , i3 , ..., it = i1 with the same first and last nodes, all nodes i1 , ..., it−1
distinct from each other and t ≥ 4 (i.e., at least 3 distinct nodes).
• 1, 2, 3, 1 is a cycle • 1, 2, 1 is not a cycle
♣ Leaf: a node which is incident to exactly one arc
• Node 4 is a leaf, all other nodes are not leaves
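The definitions of walk, path, cycle and leaf can be checked mechanically on the 4-node example. The sketch below assumes arcs are stored as a set of `frozenset` pairs; the function names are illustrative only.

```python
def is_walk(seq, arcs):
    """Every two consecutive nodes are linked by an arc."""
    return all(frozenset(pair) in arcs for pair in zip(seq, seq[1:]))

def is_path(seq, arcs):
    """A walk with all nodes distinct."""
    return is_walk(seq, arcs) and len(set(seq)) == len(seq)

def is_cycle(seq, arcs):
    """A closed walk i1,...,it = i1 with t >= 4 and i1,...,i(t-1) distinct."""
    return (len(seq) >= 4 and seq[0] == seq[-1]
            and len(set(seq[:-1])) == len(seq) - 1
            and is_walk(seq, arcs))

def leaves(arcs):
    """Nodes incident to exactly one arc."""
    nodes = set().union(*arcs)
    return sorted(v for v in nodes if sum(v in a for a in arcs) == 1)

# The example graph: E = {{1,2}, {2,3}, {1,3}, {3,4}}
E = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (3, 4)]}
```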

5.4
[Figure: five example graphs, panels A, B, C, D, E]

♣ Tree: a connected graph without cycles


• A - non-tree (there is a cycle)
• B – non-tree (not connected)
• C, D, E: trees

5.5
♣ Tree: a connected graph without cycles
♠ Observation I: Every nontrivial (with more than one node) tree has a leaf
• Let a graph with m > 1 nodes be connected and without leaves. Then every node is
incident to at least two arcs. Let us walk along the graph in such a way that when leaving
a node, we do not use an arc which was used when entering the node (such a choice of the
“exit” arc is always possible, since every node is incident to at least two arcs). Eventually
a certain node i will be visited for the second time. When this happens for the first time, the
segment of our walk “from the first visit to i till the second visit to i” will be a cycle. Thus,
every connected graph with ≥ 2 nodes and without leaves has a cycle and thus cannot be
a tree.

5.6
♠ Observation II: A connected m-node graph is a tree iff it has m − 1 arcs.
Proof, I: Let G be an m-node tree, and let us prove that G has exactly m − 1 arcs.
Verification is by induction on m. The case of m = 1 is trivial. Assume that every m-node
tree has exactly m − 1 arcs, and let G be a tree with m + 1 nodes. By Observation I, G
has a leaf node; eliminating from G this node along with the unique arc incident to this
node, we get an m-node graph G⁻ which is connected and has no cycles (since G is so).
By the inductive hypothesis, G⁻ has m − 1 arcs, whence G has m = (m + 1) − 1 arcs.

Proof, II: Let G be a connected m-node graph with m − 1 arcs, and let us prove that G is
a tree, that is, that G does not have cycles.
Assume that this is not the case. Take a cycle in G and eliminate from G one of the arcs
on the cycle; note that the new graph G′ is connected along with G. If G′ still has a
cycle, eliminate from G′ an arc on a cycle, thus getting a connected graph G″, and so on.
Eventually a connected cycle-free m-node graph (and thus a tree) will be obtained; by part
I, this graph has m − 1 arcs, same as G. But our process reduces the number of arcs by
1 at every step, and we arrive at the conclusion that no steps of the process were in fact
carried out, that is, G has no cycles.

5.7
♠ Observation III: Let G be a graph with m nodes and m − 1 arcs and without cycles.
Then G is a tree.
Indeed, we should prove that G is connected. Assuming that this is not the case, we can
split G into k ≥ 2 connected components, that is, split the set N of nodes into k nonempty
and non-overlapping subsets N1 , ..., Nk such that no arc in G links two nodes belonging to
different subsets. The graphs Gi obtained from G by reducing the nodal set to Ni , and the
set of arcs – to the set of those of the original arcs which link nodes from Ni , are connected
graphs without cycles and thus are trees. It follows that Gi has Card(Ni ) − 1 arcs, and the
total number of arcs in all Gi is m − k < m − 1. But this total is exactly the number m − 1
of arcs in G, and we get a contradiction. 

♠ Bottom Line: Let G be an m-node graph. Consider the following properties of G:


• G is connected
• G has no cycles
• G has exactly m − 1 arcs.
Every two of these properties imply the third one, and all these properties together mean
that G is a tree.
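The three properties in the Bottom Line are easy to test numerically. The following sketch assumes nodes are numbered 1, ..., m and arcs are given as a list of distinct unordered pairs; it uses a depth-first search for connectivity and a union-find structure for cycle detection, and all names are hypothetical.

```python
def is_connected(m, arcs):
    """Depth-first search from node 1 reaches all m nodes."""
    nbrs = {v: set() for v in range(1, m + 1)}
    for i, j in arcs:
        nbrs[i].add(j)
        nbrs[j].add(i)
    seen, stack = {1}, [1]
    while stack:
        v = stack.pop()
        for w in nbrs[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == m

def has_cycle(m, arcs):
    """Union-find: an arc joining two already connected nodes closes a cycle."""
    parent = list(range(m + 1))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for i, j in arcs:
        ri, rj = find(i), find(j)
        if ri == rj:
            return True
        parent[ri] = rj
    return False

def tree_properties(m, arcs):
    """(connected, no cycles, exactly m - 1 arcs)."""
    return is_connected(m, arcs), not has_cycle(m, arcs), len(arcs) == m - 1
```

On any input, the three flags are never "exactly two true", which is the content of the Bottom Line.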

5.8
♠ Observation IV: Let G be a tree, and i, j be two distinct nodes in G. Then there exists
exactly one path which starts at i and ends at j.
Indeed, since G is connected, there exists a walk which starts at i and ends at j; the walk
of this type with minimum possible number of arcs clearly is a path.
Now let us prove that the path π which links i and j is unique. Indeed, assuming that there
exists another path π′ which starts at i and ends at j, we get a “loop”: a walk which
starts and ends at i and is obtained by moving first from i to j along π and then moving
backward along π′. If π′ ≠ π, this loop clearly contains a cycle. Thus, G has a cycle, which
is impossible since G is a tree.

5.9
♠ Observation V: Let G be a tree, and let us add to G a new arc γ∗ = {i∗ , j∗ }. In the
resulting graph, there will be exactly one cycle, and the added arc will be on this cycle.
Note: When counting the number of different cycles in a graph, we do not distinguish
between cycles obtained from each other by shifting the starting node along the cycle
and/or reversing the order of nodes. Thus, the cycles a, b, c, d, a; b, c, d, a, b; and a, d, c, b, a are
counted as a single cycle.
Proof: When adding a new arc to a tree with m nodes, we get a connected graph with m
nodes and m arcs. Such a graph cannot be a tree and therefore it has a cycle C. One of
the arcs in C should be γ∗ (otherwise C would be a cycle in G itself, while G is a tree and
does not have cycles). The “outer” (aside of γ∗ ) part of C should be a path in G which
links the nodes j∗ and i∗ , and by Observation IV such a path is unique. Thus, C is unique
as well. 

5.10
Oriented Graphs

♣ Oriented graph G = (N , E) is a pair comprised of


• finite nonempty set of nodes N
• set E of arcs – ordered pairs of distinct nodes
Thus, an arc γ of G is an ordered pair (i, j) comprised of two distinct (i ≠ j) nodes. We
say that arc γ = (i, j)
• starts at node i,
• ends at node j, and
• links nodes i and j.

Directed graph.
Nodes: N = {1, 2, 3, 4}.
Arcs: E = {(1, 2), (2, 3), (1, 3), (3, 4)}

5.11
♠ Given a directed graph G and ignoring the directions of arcs (i.e., converting a directed
arc (i, j) into the undirected arc {i, j}), we get an undirected graph Ĝ.
Note: If G contains “inverse to each other” directed arcs (i, j), (j, i), these arcs yield a
single arc {i, j} in Ĝ.
♠ A directed graph G is called connected if its undirected counterpart Ĝ is connected.

[Figure: directed graph G and its undirected counterpart Ĝ]
5.12
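♠ Incidence matrix P of a directed graph G = (N , E):
• the rows are indexed by nodes 1, ..., m
• the columns are indexed by arcs γ
• $P_{i\gamma} = \begin{cases} 1, & \gamma \text{ starts at } i\\ -1, & \gamma \text{ ends at } i\\ 0, & \text{all other cases}\end{cases}$

For the directed graph above, with columns indexed from left to right by the arcs (1,2), (2,3), (1,3), (3,4):
$P = \begin{bmatrix} 1 & 0 & 1 & 0\\ -1 & 1 & 0 & 0\\ 0 & -1 & -1 & 1\\ 0 & 0 & 0 & -1 \end{bmatrix}$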
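As a quick illustration, the incidence matrix of the example graph can be assembled as follows. This is a sketch assuming 1-based node labels and arcs given as ordered pairs; the function name is hypothetical.

```python
import numpy as np

def incidence_matrix(m, arcs):
    """Rows = nodes 1..m, columns = arcs; +1 where the arc starts, -1 where it ends."""
    P = np.zeros((m, len(arcs)), dtype=int)
    for col, (i, j) in enumerate(arcs):
        P[i - 1, col] = 1
        P[j - 1, col] = -1
    return P

P = incidence_matrix(4, [(1, 2), (2, 3), (1, 3), (3, 4)])
```

Every column contains exactly one +1 and one −1, so all column sums vanish.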

5.13
Network Flow Problem

♣ Network Flow Problem: Given


• an oriented graph G = (N , E)
• costs cγ of transporting a unit flow along arcs γ ∈ E,
• capacities uγ > 0 of arcs γ ∈ E – upper bounds on flows in the arcs,
• a vector s of external supplies at the nodes of G,
find a flow $f = \{f_\gamma\}_{\gamma\in E}$ with as small a cost $\sum_\gamma c_\gamma f_\gamma$ as possible under the restrictions that
• the flow f respects arc directions and capacities: $0 \le f_\gamma \le u_\gamma$ for all γ ∈ E
• the flow f obeys the conservation law w.r.t. the vector of supplies s:
at every node i, the total incoming flow $\sum_{j:(j,i)\in E} f_{ji}$ plus the external supply $s_i$ at
the node is equal to the total outgoing flow $\sum_{j:(i,j)\in E} f_{ij}$.

♠ Observation: The flow conservation law can be written as
Pf = s
where P is the incidence matrix of G.
♠ Notation: From now on, m stands for the number of nodes, and n stands for the number
of arcs in G.

5.14
♠ The Network Flow problem reads:
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Pf = s,\ 0 \le f \le u\}. \qquad \text{(NWIni)}$

The Network Simplex Method is a specialization of the Primal Simplex Method aimed at
solving the Network Flow Problem.
♣ Observation: The column sums in P are equal to 0, that is, $[1; 1; ...; 1]^T P = 0$, whence
$[1; ...; 1]^T P f = 0$ for every f
⇒ The rows in P are linearly dependent, and (NWIni) is infeasible unless $\sum_{i=1}^m s_i = 0$.
♠ Observation: When $\sum_{i=1}^m s_i = 0$ (which is necessary for (NWIni) to be solvable), the
last equality constraint in (NWIni) is minus the sum of the first m − 1 equality constraints
⇒ (NWIni) is equivalent to the LO program
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ 0 \le f \le u\}, \qquad \text{(NW)}$

where A is the (m − 1) × n matrix comprised of the first m − 1 rows of P , and
$b = [s_1; s_2; ...; s_{m-1}]$.

5.15
♣ Standing assumptions:
♥ The total external supply is 0: $\sum_{i=1}^m s_i = 0$
♥ The graph G is connected
Under these assumptions,
• The Network Flow problem reduces to
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ 0 \le f \le u\}. \qquad \text{(NW)}$

• The rows of A are linearly independent (to be shown later), which makes (NW) well
suited for Simplex-type algorithms.
♣ We start with the uncapacitated case where all capacities are +∞, that is, the problem
of interest is
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ f \ge 0\}. \qquad \text{(NWU)}$

5.16
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ f \ge 0\}. \qquad \text{(NWU)}$

♣ Our first step is to understand what are bases of A.


♣ Spanning trees. Let I be a set of exactly m − 1 arcs of G. Consider the undirected
graph GI with the same m nodes as G, and with arcs obtained from arcs γ ∈ I by ignoring
their directions. Thus, every arc in GI is of the form {i, j}, where (i, j) or (j, i) is an arc
from I.
Note that the inverse to each other arcs (i, j), (j, i), if present in I, induce a single arc
{i, j} = {j, i} in GI .
♠ It may happen that GI is a tree. In this case I is called a spanning tree of G; we shall
express this situation by saying that the m − 1 arcs from I form a tree when their directions are
ignored.

5.17
• I = {(1, 2), (2, 3), (3, 4)} is a spanning tree
• I = {(2, 3), (1, 3), (3, 4)} is a spanning tree
• I = {(1, 2), (2, 3), (1, 3)} is not a spanning tree (GI has cycles and is not connected)
• I = {(1, 2), (2, 3)} is not a spanning tree (fewer than 4 − 1 = 3 arcs)

5.18
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ f \ge 0\}. \qquad \text{(NWU)}$

♣ We are about to prove that the bases of A are exactly the spanning trees of G.
Observation 0: If I is a spanning tree, then I does not include pairs of opposite to each
other arcs, and thus there is a one-to-one correspondence between arcs in I and induced
arcs in GI .
Indeed, I has m − 1 arcs, and GI is a tree and thus has m − 1 arcs as well; the latter would
be impossible if two arcs in I were opposite to each other and thus would induce a single
arc in GI . 

5.19
Observation I: Every set I′ of arcs of G which does not contain inverse to each other
arcs and is such that the arcs of I′ do not form cycles after their directions are ignored can be
extended to a spanning tree I. In particular, spanning trees do exist.
Proof. It may happen (case A) that the set E of all arcs of G does not contain in-
verse arcs and arcs from E do not form cycles when their directions are ignored. Since G
is connected, the undirected counterpart of G is a connected graph with no cycles, i.e.,
it has exactly m − 1 arcs. But then E has m − 1 arcs and is a spanning tree, and we are done.

5.20
♥ If A is not the case, then either E has a pair of inverse arcs (i, j), (j, i) (case B.1), or E
does not contain inverse arcs and has a cycle (case B.2).
• In the case of B.1, one of the arcs (i, j), (j, i) is not in I′ (since I′ does not contain inverse
arcs). Eliminating this arc from G, we get a connected graph G1 , and the arcs from I′ are
among the arcs of G1 .
• In the case of B.2, G has a cycle; this cycle includes an arc γ ∉ I′ (since the arcs from I′ make
no cycles). Eliminating this arc from G, we get a connected graph G1 , and the arcs from I′ are
among the arcs of G1 .
♥ Assuming that A is not the case, we build G1 and apply to this connected graph the
same construction as to G. As a result, we either conclude that G1 is the desired spanning
tree, or eliminate from G1 an arc, thus getting a connected graph G2 such that all arcs of I′
are arcs of G2 .
♥ Proceeding in this way, we must eventually terminate; upon termination, the set of arcs
of the current graph is the desired spanning tree containing I′.

5.21
Note: Columns of A are indexed by arcs
⇒ Bases of A are some collections I of m − 1 arcs of G.
♣ Facts:
A. Bases of A are exactly the collections I of m − 1 arcs from E which become spanning
trees after their directions are ignored.
B. If I is a basis of A, then the (m − 1) × (m − 1) submatrix $A_I$ of A, after permuting rows
and columns, becomes lower triangular with all diagonal entries equal to ±1.
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ f \ge 0\}. \qquad \text{(NWU)}$
♣ Corollary [Integrality of Network Flow Polytope]: Let b be integral. Then all basic
solutions to (NWU), feasible or not, are integral vectors.
Indeed, the basic entries in a basic solution solve the system Bu = b with integral right hand
side and lower triangular nonsingular matrix B with integral entries and diagonal entries ±1
⇒ all entries in u are integral. 

5.22
Proof of Facts

Claim I: Let I be a spanning tree. Then the columns of A with indexes from I are linearly
independent, that is, I is a basis of A.
Proof. • Consider the graph $G_I$. This is a tree; therefore, for every node i there is a unique
path in $G_I$ leading from i to the root node m. We can renumber the nodes j in such a way
that the new indexes j′ of nodes strictly increase along such a path. Note that the new
number of the root node still is m: m′ = m.
Indeed, we can associate with a node i its “distance” from the root node, defined as the
number of arcs in the unique path from i to m, and then reorder the nodes, making the
distances nonincreasing in the new serial numbers of the nodes.
♥ For example, for the tree $G_I$ with arcs {2, 3}, {1, 3}, {3, 4}, we could set
2′ = 1, 1′ = 2, 3′ = 3, 4′ = 4

5.23
Proofs of Facts, continued
• Now let us associate with an arc γ = (i, j) ∈ I its serial number p(γ) = min[i′, j′]. Observe
that different arcs from I get different serial numbers. Indeed, assuming w.l.o.g. that i′ < j′,
the walk in $G_I$
“from i to j along γ and then from j to the root node m
along the unique path π in $G_I$ leading from j to m”
is a path in $G_I$, since the ′-indexes of the nodes in π are ≥ j′ and thus differ from i′. If now two
distinct arcs from I, γ = (i, j) and γ̄ = (ī, j̄), have equal serial numbers:
min[i′, j′] = min[ī′, j̄′] = s′,
then there is a path in $G_I$ from s to the root node which starts with the arc γ, and likewise
there is a path in $G_I$ from s to the root node which starts with γ̄ ≠ γ, which is impossible,
since $G_I$ is a tree.

Since the serial numbers take values 1, ..., m − 1 and the serial numbers of different arcs
from I are different, serial numbers indeed form an ordering of the m − 1 arcs from I.
♥ In our example, where $G_I$ is the tree with arcs {2, 3}, {1, 3}, {3, 4}
and 2′ = 1, 1′ = 2, 3′ = 3, 4′ = 4, we get
p(2, 3) = min[2′, 3′] = 1,
p(1, 3) = min[1′, 3′] = 2,
p(3, 4) = min[3′, 4′] = 3

5.24
Proofs of Facts, continued
• Now let $A_I$ be the (m − 1) × (m − 1) submatrix of A comprised of columns with indexes
from I. Let us reorder the columns according to the serial numbers of the arcs γ ∈ I, and
the rows according to the new indexes i′ of the nodes. The resulting matrix B is singular
if and only if $A_I$ is.
Observe that in B, same as in $A_I$, every column
— either has two nonzero entries (case A), one equal to 1 and one equal to -1 [this is the
case when the arc γ indexing the column is not incident to the root node m],
— or has exactly one nonzero entry (case B), equal to +1 or to -1 [this is the case when
the root node m is incident to the arc γ indexing the column]

• in case A, the index ν of column is the minimum of row indexes of the two nonzero entries
in the column.

• in case B, the index ν of the column is the row index of the only nonzero entry in the
column.

⇒ In both cases, ν is the minimum of row indexes of the nonzero entries in the column.
In other words, B is a lower triangular matrix with diagonal entries ±1 and therefore it is
nonsingular. 

5.25
Proofs of Facts, continued
♥ In our example I = {(2, 3), (1, 3), (3, 4)}, $G_I$ is the tree with arcs {2, 3}, {1, 3}, {3, 4},
we have 1′ = 2, 2′ = 1, 3′ = 3, 4′ = 4, p(2, 3) = 1, p(1, 3) = 2, p(3, 4) = 3 and
$A = \begin{bmatrix} 1 & 0 & 1 & 0\\ -1 & 1 & 0 & 0\\ 0 & -1 & -1 & 1 \end{bmatrix}$
(arcs indexing columns, from left to right: (1,2), (2,3), (1,3), (3,4))

so that
$A_I = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ -1 & -1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -1 & -1 & 1 \end{bmatrix}$
As it should be, B is lower triangular with diagonal entries ±1.
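The reordering used in the proof can be sketched as follows. The code assumes node m is the root, nodes are numbered 1, ..., m, and the tree is given as a list of directed arcs; distances to the root are found by breadth-first search, and ties between equally distant nodes are broken by the old node number (an arbitrary choice, so the renumbering may differ from the one in the example while the resulting B is the same). The function name is hypothetical.

```python
import numpy as np
from collections import deque

def triangularize(m, tree_arcs):
    """Return (A_I, B): B is A_I with rows/columns permuted as in the proof."""
    A_I = np.zeros((m - 1, m - 1), dtype=int)
    for col, (i, j) in enumerate(tree_arcs):
        if i < m:
            A_I[i - 1, col] = 1      # arc starts at i
        if j < m:
            A_I[j - 1, col] = -1     # arc ends at j
    nbrs = {v: [] for v in range(1, m + 1)}
    for i, j in tree_arcs:
        nbrs[i].append(j)
        nbrs[j].append(i)
    dist, queue = {m: 0}, deque([m])
    while queue:                      # BFS distances to the root m
        v = queue.popleft()
        for w in nbrs[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    order = sorted(range(1, m + 1), key=lambda v: (-dist[v], v))
    new_index = {v: k + 1 for k, v in enumerate(order)}
    rows = [v - 1 for v in order if v != m]
    cols = sorted(range(m - 1),
                  key=lambda k: min(new_index[tree_arcs[k][0]],
                                    new_index[tree_arcs[k][1]]))
    return A_I, A_I[np.ix_(rows, cols)]

A_I, B = triangularize(4, [(2, 3), (1, 3), (3, 4)])
```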

5.26
Proofs of Facts, continued
Corollary: The m − 1 rows of A are linearly independent.
Indeed, by Observation I, G has a spanning tree I, and by Claim I, the corresponding
(m − 1) × (m − 1) submatrix AI of A is nonsingular.
♣ We have seen that spanning trees are bases of A. The converse is also true:
Claim II: Let I be a basis of A. Then I is a spanning tree.
Proof. A basis should be a set of m − 1 indexes of columns in A — i.e., a set I of m − 1
arcs — such that the columns Aγ , γ ∈ I, of A are linearly independent.
• Observe that I cannot contain inverse arcs (i, j), (j, i), since the sum of the corresponding
columns in P (and thus in A) is zero.

5.27
Proofs of Facts, continued
• Consequently $G_I$ has exactly m − 1 arcs. We want to prove that the m-node graph $G_I$
is a tree, and to this end it suffices to prove that $G_I$ has no cycles. Let, on the contrary,
$i_1, i_2, ..., i_t = i_1$ be a cycle in $G_I$ (t > 3, $i_1, ..., i_{t-1}$ are distinct from each other). Consequently,
I contains t − 1 distinct arcs $\gamma_1, ..., \gamma_{t-1}$ such that for every ℓ either
$\gamma_\ell = (i_\ell, i_{\ell+1})$ (“forward arc”) or $\gamma_\ell = (i_{\ell+1}, i_\ell)$ (“backward arc”).
Setting $\epsilon_s = 1$ or $\epsilon_s = -1$ depending on whether $\gamma_s$ is a forward or a backward arc and
denoting by $A^\gamma$ the column of A indexed by γ, we get $\sum_{s=1}^{t-1} \epsilon_s A^{\gamma_s} = 0$, which is impossible,
since the columns $A^\gamma$, γ ∈ I, are linearly independent.

5.28
Network Simplex Algorithm
Building Block I: Computing Basic Solution
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ f \ge 0\}. \qquad \text{(NWU)}$

♣ As a specialization of the Primal Simplex Method, the Network Simplex Algorithm works
with basic feasible solutions.
There is a simple algorithm for computing the basic feasible solution $f^I$ associated with
a given basis (i.e., a given spanning tree) I.
Note: $f^I$ should be a flow (i.e., $Af^I = b$) which vanishes outside of I (i.e., $f^I_\gamma = 0$ whenever
γ ∉ I).

5.29
♠ The algorithm for specifying $f^I$ is as follows:
• $G_I$ is a tree and thus has a leaf i∗ ; let γ ∈ I be the arc which is incident to node i∗ , and
let j∗ be the other endpoint of γ. The flow conservation law specifies $f^I_\gamma$:
$f^I_\gamma = \begin{cases} -s_{i_*}, & \gamma = (j_*, i_*)\\ s_{i_*}, & \gamma = (i_*, j_*) \end{cases}$
We specify $f^I_\gamma$, eliminate from $G_I$ the node i∗ and the arc γ, and update $s_{j_*}$ to account for the
flow in the arc γ:
$s^+_{j_*} = s_{j_*} + s_{i_*}$
We end up with an (m − 1)-node graph $G^1_I$ equipped with the updated (m − 1)-dimensional vector
of external supplies $s^1$ obtained from s by eliminating $s_{i_*}$ and replacing $s_{j_*}$ with $s_{j_*} + s_{i_*}$. Note
that the total of updated supplies is 0.
• We apply to $G^1_I$ the same procedure as to $G_I$, thus getting one more entry in $f^I$, reducing
the number of nodes by one and updating the vector of external supplies, and proceed in
this fashion until all entries $f^I_\gamma$, γ ∈ I, are specified.

5.30
s = [1; 2; 3; −6]

♠ Illustration: Let I = {(1, 3), (2, 3), (3, 4)}. [Figure: graph $G_I$]
• We choose a leaf in $G_I$, specifically, node 1, and set $f^I_{1,3} = s_1 = 1$. We then eliminate
from $G_I$ the node 1 and the incident arc, thus getting the graph $G^1_I$, and convert s into $s^1$:
$s^1_2 = s_2 = 2;\quad s^1_3 = s_3 + s_1 = 4;\quad s^1_4 = s_4 = -6$

5.31
$s^1_2 = s_2 = 2;\quad s^1_3 = s_3 + s_1 = 4;\quad s^1_4 = s_4 = -6$

• We choose a leaf in the new graph, say, the node 4, and set $f^I_{3,4} = -s^1_4 = 6$. We then
eliminate from $G^1_I$ the node 4 and the incident arc, thus getting the graph $G^2_I$, and convert
$s^1$ into $s^2$:
$s^2_2 = s^1_2 = 2;\quad s^2_3 = s^1_3 + s^1_4 = -2$

• We choose a leaf, say, node 3, in the new graph and set $f^I_{2,3} = -s^2_3 = 2$. The algorithm
is completed. The resulting basic flow is
$f^I_{1,2} = 0,\ f^I_{2,3} = 2,\ f^I_{1,3} = 1,\ f^I_{3,4} = 6.$
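The leaf-elimination procedure can be sketched directly. The code below assumes the supplies sum to zero, the tree is given as a list of directed arcs over nodes 1, ..., m, and the smallest-numbered leaf is always removed first (an arbitrary rule, so the order of elimination may differ from the illustration while the resulting flow is the same). The function name is hypothetical.

```python
def basic_flow(tree_arcs, supplies):
    """Flow on the tree arcs satisfying conservation, found by removing leaves."""
    s = dict(enumerate(supplies, start=1))   # current supplies, keyed by node
    remaining = list(tree_arcs)
    flow = {}
    while remaining:
        degree = {}
        for i, j in remaining:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        leaf = min(v for v, d in degree.items() if d == 1)
        arc = next(a for a in remaining if leaf in a)
        i, j = arc
        if leaf == i:                # arc (i*, j*) leaves the leaf
            flow[arc] = s[leaf]
            other = j
        else:                        # arc (j*, i*) enters the leaf
            flow[arc] = -s[leaf]
            other = i
        s[other] += s[leaf]          # updated supply at the neighbor
        del s[leaf]
        remaining.remove(arc)
    return flow

f = basic_flow([(1, 3), (2, 3), (3, 4)], [1, 2, 3, -6])
```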

5.32
Building Block II: Computing Reduced Costs

There is a simple algorithm for computing the reduced costs associated with a given
basis (i.e., a given spanning tree) I.
Note: The reduced costs should form a vector $c^I = c - A^T\lambda^I$ and should satisfy $c^I_\gamma = 0$
whenever γ ∈ I. Observe that the span of the columns of $A^T$ is the same as the span of the
columns of $P^T$; thus, we lose nothing when setting $c^I = c - P^T\mu^I$. The requirement $c^I_\gamma = 0$
for γ ∈ I reads
$c_{ij} = \mu_i - \mu_j \quad \forall \gamma = (i, j) \in I \qquad (*)$,
while
$c^I_{ij} = c_{ij} - \mu_i + \mu_j \quad \forall \gamma = (i, j) \in E \qquad (!)$
♠ To achieve (∗), we act as follows. When building $f^I$, we build a sequence of trees
$G^0_I := G_I, G^1_I, ..., G^{m-2}_I$ in such a way that $G^{s+1}_I$ is obtained from $G^s_I$ by eliminating a leaf and
the arc incident to this leaf.
To build µ, we look through these graphs in the backward order.

5.33
• $G^{m-2}_I$ has two nodes, say, i∗ and j∗ , and a single arc which corresponds to an arc
$\gamma = (\gamma^s, \gamma^f) \in I$, where either $\gamma^s = j_*, \gamma^f = i_*$, or $\gamma^s = i_*, \gamma^f = j_*$. We choose $\mu_{i_*}, \mu_{j_*}$ such that
$c_\gamma = \mu_{\gamma^s} - \mu_{\gamma^f}$
• Assume we have already assigned to all nodes i of $G^s_I$ “potentials” $\mu_i$ in such a way that
for every arc γ ∈ I linking nodes from $G^s_I$ it holds that
$c_\gamma = \mu_{\gamma^s} - \mu_{\gamma^f} \qquad (*_s)$
The graph $G^{s-1}_I$ has exactly one node, let it be i∗ , which is not in $G^s_I$, and exactly one arc
{j∗ , i∗ }, obtained from an oriented arc γ̄ ∈ I, which is incident to i∗ . Note that $\mu_{j_*}$ is already
defined. We specify $\mu_{i_*}$ from the requirement
$c_{\bar\gamma} = \mu_{\bar\gamma^s} - \mu_{\bar\gamma^f}$,
thus ensuring the validity of $(*_{s-1})$.
• After $G^0_I$ is processed, we get the potentials µ satisfying the target relation
$c_{ij} = \mu_i - \mu_j \quad \forall \gamma = (i, j) \in I \qquad (*)$,
and then define the reduced costs according to
$c^I_\gamma = c_\gamma + \mu_{\gamma^f} - \mu_{\gamma^s}, \quad \gamma \in E.$

5.34
Illustration: Let G and I be as in the previous illustrations:

I = {(1, 3), (2, 3), (3, 4)}

and let
$c_{1,2} = 1,\ c_{2,3} = 4,\ c_{1,3} = 6,\ c_{3,4} = 8$
We have already found the corresponding graphs $G^s_I$:

[Figure: graphs $G^0_I$, $G^1_I$, $G^2_I$]

• We look at graph $G^2_I$ and set $\mu_2 = 0$, $\mu_3 = -4$, thus ensuring $c_{2,3} = \mu_2 - \mu_3$
• We look at graph $G^1_I$ and set $\mu_4 = \mu_3 - c_{3,4} = -12$, thus ensuring $c_{3,4} = \mu_3 - \mu_4$
• We look at graph $G^0_I$ and set $\mu_1 = \mu_3 + c_{1,3} = 2$, thus ensuring $c_{1,3} = \mu_1 - \mu_3$
The reduced costs are
$c^I_{1,2} = c_{1,2} + \mu_2 - \mu_1 = -1,\quad c^I_{1,3} = c^I_{2,3} = c^I_{3,4} = 0.$
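A sketch of the potential computation follows. Instead of replaying the graphs $G^s_I$ backward, it propagates potentials outward from one starting node whose potential is fixed to 0; since potentials are defined up to a common additive constant, the reduced costs are the same. The function names and the dictionary-based data layout are assumptions made for illustration.

```python
def node_potentials(tree_arcs, cost, start):
    """Potentials mu with cost[(i, j)] = mu[i] - mu[j] on every tree arc."""
    mu = {start: 0}
    stack = [start]
    while stack:
        v = stack.pop()
        for (i, j) in tree_arcs:
            if i == v and j not in mu:
                mu[j] = mu[i] - cost[(i, j)]
                stack.append(j)
            elif j == v and i not in mu:
                mu[i] = mu[j] + cost[(i, j)]
                stack.append(i)
    return mu

def reduced_costs(arcs, cost, mu):
    """c^I_(i,j) = c_(i,j) - mu_i + mu_j for every arc of the graph."""
    return {(i, j): cost[(i, j)] - mu[i] + mu[j] for (i, j) in arcs}

cost = {(1, 2): 1, (2, 3): 4, (1, 3): 6, (3, 4): 8}
mu = node_potentials([(1, 3), (2, 3), (3, 4)], cost, start=2)
rc = reduced_costs(list(cost), cost, mu)
```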

5.35
Iteration of the Network Simplex Algorithm
$\min_f \{\sum_{\gamma\in E} c_\gamma f_\gamma : Af = b,\ f \ge 0\}. \qquad \text{(NWU)}$

♠ At the beginning of an iteration, we have at our disposal a basis (a spanning tree) I
along with the associated feasible basic solution $f^I$ and the vector of reduced costs $c^I$.
• It is possible that $c^I \ge 0$. In this case we terminate, $f^I$ being an optimal solution.
Note: We are solving a minimization problem, so a sufficient condition for optimality is
that the reduced costs are nonnegative.
• If $c^I$ is not nonnegative, we identify an arc γ̂ in G such that $c^I_{\hat\gamma} < 0$. Note: γ̂ ∉ I, since
$c^I_\gamma = 0$ for γ ∈ I. $f_{\hat\gamma}$ enters the basis.
• It may happen that γ̂ = (i1, i2) is inverse to an arc γ̃ = (i2, i1) ∈ I. Setting $h_{\hat\gamma} = h_{\tilde\gamma} = 1$
and $h_\gamma = 0$ when γ is different from γ̃ and γ̂, we have Ah = 0 and $c^T h = [c^I]^T h = c^I_{\hat\gamma} < 0$
⇒ $f^I + th$ is feasible for all t > 0 and $c^T(f^I + th) \to -\infty$ as t → ∞
⇒ The problem is unbounded, and we terminate.

5.36
• Alternatively, γ̂ = (i1, i2) is not inverse to an arc in I. In this case, adding to $G_I$ the arc
{i1, i2}, we get a graph with a single cycle C, that is, we can point out a sequence of nodes
$i_1, i_2, ..., i_t = i_1$ such that t > 3, $i_1, ..., i_{t-1}$ are distinct, and for every s, 2 ≤ s ≤ t − 1,
either the arc $\gamma_s = (i_s, i_{s+1})$ is in I (“forward arc $\gamma_s$”),
or the arc $\gamma_s = (i_{s+1}, i_s)$ is in I (“backward arc $\gamma_s$”).
We set $\gamma_1 = \hat\gamma = (i_1, i_2)$ and treat $\gamma_1$ as a forward arc of C.
We define the flow h according to
$h_\gamma = \begin{cases} 1, & \gamma = \gamma_s \text{ is forward}\\ -1, & \gamma = \gamma_s \text{ is backward}\\ 0, & \gamma \text{ is distinct from all } \gamma_s \end{cases}$
By construction, Ah = 0, $h_{\hat\gamma} = 1$ and $h_\gamma = 0$ whenever γ ≠ γ̂ and γ ∉ I.
Setting $f^I(t) = f^I + th$, we have $Af^I(t) \equiv b$ and
$c^T f^I(t) = c^T f^I + t[c^I]^T h = c^T f^I + t c^I_{\hat\gamma} \to -\infty$, t → ∞.
• If h ≥ 0, $f^I(t)$ is feasible for all t ≥ 0 and $c^T f^I(t) \to -\infty$, t → ∞
⇒ the problem is unbounded, and we terminate

5.37
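• If $h_\gamma = -1$ for some γ (all these γ are in I), we set
$\tilde\gamma = \mathrm{argmin}_{\gamma\in I}\{f^I_\gamma : h_\gamma = -1\}$ [$f_{\tilde\gamma}$ leaves the basis]
$\bar t = f^I_{\tilde\gamma}$
$I^+ = (I \cup \{\hat\gamma\}) \setminus \{\tilde\gamma\}$
$f^{I^+} = f^I + \bar t h$
We have defined the new basis $I^+$ along with the corresponding basic feasible flow $f^{I^+}$ and
pass to the next iteration.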

5.38
Illustration:

I = {(1, 3), (2, 3), (3, 4)}; $f^I_{1,2} = 0,\ f^I_{2,3} = 2,\ f^I_{1,3} = 1,\ f^I_{3,4} = 6$; $c^I_{1,2} = -1,\ c^I_{1,3} = c^I_{2,3} = c^I_{3,4} = 0$
• $c^I_{1,2} < 0$ ⇒ $f_{1,2}$ enters the basis
• Adding γ̂ = (1, 2) to I, we get the cycle
$i_1 = 1$, (1, 2), $i_2 = 2$, (2, 3), $i_3 = 3$, (1, 3), $i_4 = 1$
(forward arcs: (1, 2), (2, 3); backward arc: (1, 3))
⇒ $h_{1,2} = h_{2,3} = 1,\ h_{1,3} = -1,\ h_{3,4} = 0$
⇒ $f^I_{1,2}(t) = t,\ f^I_{2,3}(t) = 2 + t,\ f^I_{1,3}(t) = 1 - t,\ f^I_{3,4}(t) = 6$
⇒ the largest t for which $f^I(t)$ is feasible is t = 1, variable $f_{1,3}$ leaves the basis
⇒ $I^+ = \{(1, 2), (2, 3), (3, 4)\}$, $f^{I^+}_{1,2} = 1,\ f^{I^+}_{2,3} = 3,\ f^{I^+}_{3,4} = 6,\ f^{I^+}_{1,3} = 0$, cost change is $c^I_{1,2} f^{I^+}_{1,2} = -1$.
The iteration is over.
• New potentials are given by
$1 = c_{1,2} = \mu_1 - \mu_2,\quad 4 = c_{2,3} = \mu_2 - \mu_3,\quad 8 = c_{3,4} = \mu_3 - \mu_4$
⇒ $\mu_1 = 1,\ \mu_2 = 0,\ \mu_3 = -4,\ \mu_4 = -12$
$c^{I^+}_{1,2} = c^{I^+}_{2,3} = c^{I^+}_{3,4} = 0,\quad c^{I^+}_{1,3} = c_{1,3} - \mu_1 + \mu_3 = 6 - 1 - 4 = 1$
⇒ New reduced costs are nonnegative
⇒ $f^{I^+}$ is optimal.
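One pivot of the uncapacitated algorithm can be sketched as below. The code assumes the tree is a list of directed arcs forming a spanning tree, the flow is a dictionary over arcs, and the entering arc has already been chosen; the function names and the return convention (`None` for an unbounded problem) are choices made for this illustration.

```python
def tree_path(tree_arcs, start, goal):
    """Nodes on the unique path from start to goal in the undirected tree."""
    nbrs = {}
    for i, j in tree_arcs:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
    parent, stack = {start: None}, [start]
    while stack:
        v = stack.pop()
        for w in nbrs.get(v, []):
            if w not in parent:
                parent[w] = v
                stack.append(w)
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

def pivot(tree_arcs, flow, entering):
    """Push flow around the cycle closed by the entering arc; None if unbounded."""
    i1, i2 = entering
    nodes = [i1] + tree_path(tree_arcs, i2, i1)    # i1, i2, ..., it = i1
    h = {entering: 1}
    for a, b in zip(nodes[1:], nodes[2:]):
        if (a, b) in tree_arcs:
            h[(a, b)] = 1                          # forward arc
        else:
            h[(b, a)] = -1                         # backward arc
    backward = [g for g, v in h.items() if v == -1]
    if not backward:
        return None                                # h >= 0: problem unbounded
    leaving = min(backward, key=lambda g: flow[g])
    t = flow[leaving]
    new_flow = {g: flow.get(g, 0) + t * h.get(g, 0) for g in set(flow) | {entering}}
    new_tree = [g for g in tree_arcs if g != leaving] + [entering]
    return new_tree, new_flow, leaving, t

tree = [(1, 3), (2, 3), (3, 4)]
flow = {(1, 2): 0, (2, 3): 2, (1, 3): 1, (3, 4): 6}
result = pivot(tree, flow, (1, 2))
```

The case where the entering arc is inverse to a tree arc is covered as well: the cycle then consists of two forward arcs, no backward arc exists, and the sketch reports unboundedness.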

5.39
Capacitated Network Flow Problem

♣ Capacitated Network Flow problem on a graph G = (N , E) is to find a minimum cost flow
on G which fits given external supplies and obeys arc capacity bounds:
$\min_f \{c^T f : Af = b,\ 0 \le f \le u\}, \qquad \text{(NW)}$
where
• A is the (m − 1) × n matrix comprised of the first m − 1 rows of the incidence matrix P
of G,
• b = [s1 ; s2 ; ...; sm−1 ] is the vector comprised of the first m − 1 entries of the vector s of
external supplies,
• u is the vector of arc capacities.
♣ We are about to present a modification of the Network Simplex algorithm for solving
(NW).

5.40
Primal Simplex Method
for
Problems with Bounds on Variables

♣ Consider an LO program
$$\min_x \{c^T x : Ax = b,\ \ell_j \le x_j \le u_j,\ 1 \le j \le n\} \qquad (P)$$
Standing assumptions:
• for every $j$, $\ell_j < u_j$ ("no fixed variables");
• for every $j$, either $\ell_j > -\infty$, or $u_j < +\infty$, or both ("no free variables");
• the rows of $A$ are linearly independent.
Note: $(P)$ is not in the standard form unless $\ell_j = 0$, $u_j = \infty$ for all $j$.
However: The PSM can be adjusted to handle problems $(P)$ in nearly the same way as it handles the standard form problems.

5.41


♣ Observation I: if $x$ is a vertex of the feasible set $X$ of $(P)$, then there exists a basis $I$ of $A$ such that all nonbasic entries in $x$ are on the bounds:
$$\forall j \notin I:\ x_j = \ell_j \text{ or } x_j = u_j \qquad (*)$$
Vice versa, every feasible solution $x$ which, for some basis $I$, has all nonbasic entries on the bounds, is a vertex of $X$.
Proof: identical to the one for the standard case.

5.42


Definition: A vector $x$ is called a basic solution associated with a basis $I$ of $A$ if $Ax = b$ and all nonbasic entries of $x$ (those with indexes not in $I$) are on their bounds.
Observation I says that the vertices of the feasible set of $(P)$ are exactly the basic solutions which are feasible.
Difference with the standard case: In the standard case, a basis uniquely defines the associated basic solution. In the case of $(P)$, a given basis $I$ has as many associated basic solutions as there are ways to set the nonbasic variables to their bounds. Once the nonbasic variables are set to their bounds, the basic variables are uniquely defined:
$$x_I = A_I^{-1}\Big[b - \sum_{j \notin I} x_j A^j\Big]$$
where $A^j$ are the columns of $A$ and $A_I$ is the submatrix of $A$ comprised of the columns with indexes in $I$.
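To see that one basis can carry several basic solutions, the formula for the basic variables can be evaluated on a toy instance. The sketch below is an illustration under stated assumptions: the $2 \times 4$ matrix, the bounds $0 \le x_j \le 3$, the 0-based indexing, and the helper names `solve` and `basic_solution` are all invented for this example. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square nonsingular system M z = rhs by Gauss-Jordan elimination (exact)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)  # fails if M is singular
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def basic_solution(A, b, basis, nonbasic_values):
    """x with x_j = nonbasic_values[j] off the basis and x_I = A_I^{-1}[b - sum_j x_j A^j]."""
    m, n = len(A), len(A[0])
    rhs = [b[i] - sum(A[i][j] * v for j, v in nonbasic_values.items()) for i in range(m)]
    A_I = [[A[i][j] for j in basis] for i in range(m)]
    x = [Fraction(0)] * n
    for j, v in nonbasic_values.items():
        x[j] = Fraction(v)
    for j, v in zip(basis, solve(A_I, rhs)):
        x[j] = v
    return x

# Hypothetical instance with bounds 0 <= x_j <= 3 and basis I = {0, 1}.
A = [[1, 1, 1, 0],
     [1, -1, 0, 1]]
b = [4, 1]
x_low = basic_solution(A, b, [0, 1], {2: 0, 3: 0})   # both nonbasic variables at lower bounds
x_up = basic_solution(A, b, [0, 1], {2: 3, 3: 0})    # x_2 moved to its upper bound
print(x_low, x_up)
```

Both vectors satisfy $Ax = b$ and the bounds, so the same basis yields two different feasible basic solutions.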

5.43


♣ Reduced costs. Given a basis $I$ for $A$, the corresponding reduced costs are defined exactly as in the standard case: as the vector $c^I$ uniquely defined by the requirements that it is of the form $c - A^T\lambda$ and all its basic entries are zero: $c^I_j = 0$, $j \in I$. The formula for $c^I$ is
$$c^I = c - A^T \lambda_I, \qquad \lambda_I = A_I^{-T} c_I,$$
where $c_I$ is the vector comprised of the basic entries in $c$.
Note: As in the standard case, reduced costs associated with a basis are uniquely defined by this basis.
♣ Observation II: On the feasible affine plane $M = \{x : Ax = b\}$ of $(P)$ one has
$$c^T x - [c^I]^T x = \text{const}$$
In particular,
$$x', x'' \in M \Rightarrow c^T x' - c^T x'' = [c^I]^T x' - [c^I]^T x''$$
Proof. Indeed, if $x \in M$, then
$$[c^I]^T x - c^T x = [c - A^T\lambda_I]^T x - c^T x = -\lambda_I^T A x = -\lambda_I^T b$$
and the result is independent of $x \in M$.

5.44


♣ Corollary: Let $I$ be a basis of $A$, $x^*$ be an associated feasible basic solution, and $c^I$ be the vector of reduced costs associated with $I$. Assume that the nonbasic entries in $c^I$ and the nonbasic entries in $x^*$ satisfy the relation
$$\forall j \notin I:\ \begin{cases} x^*_j = \ell_j \Rightarrow c^I_j \ge 0 \\ x^*_j = u_j \Rightarrow c^I_j \le 0 \end{cases}$$
Then $x^*$ is an optimal solution to $(P)$.
Proof. By Observation II, the problem
$$\min_x \{[c^I]^T x : Ax = b,\ \ell_j \le x_j \le u_j,\ 1 \le j \le n\} \qquad (P')$$
is equivalent to $(P)$ ⇒ it suffices to prove that $x^*$ is optimal for $(P')$. This is evident: if $x$ is feasible, then
$$[c^I]^T[x - x^*] = \sum_{j \in I} \underbrace{c^I_j}_{=0}[x_j - x^*_j] + \sum_{j \notin I,\ x^*_j = \ell_j} \underbrace{c^I_j}_{\ge 0}\underbrace{[x_j - \ell_j]}_{\ge 0} + \sum_{j \notin I,\ x^*_j = u_j} \underbrace{c^I_j}_{\le 0}\underbrace{[x_j - u_j]}_{\le 0} \ge 0$$
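The sign test of the Corollary is easy to state in code. The following is a minimal sketch, assuming the reduced costs are stored in a dictionary keyed by variable index and the nonbasic indices are split into the sets of variables sitting at their lower and upper bounds; the function name and the sample numbers are hypothetical.

```python
def is_optimal(reduced_cost, at_lower, at_upper):
    """Corollary test: c^I_j >= 0 for nonbasic j at lower bound, c^I_j <= 0 at upper bound."""
    return (all(reduced_cost[j] >= 0 for j in at_lower)
            and all(reduced_cost[j] <= 0 for j in at_upper))

# Hypothetical nonbasic reduced costs for variables 2 and 3.
reduced_cost = {2: 3, 3: -1}
print(is_optimal(reduced_cost, at_lower={2}, at_upper={3}))   # signs match the bounds
print(is_optimal(reduced_cost, at_lower={3}, at_upper={2}))   # signs are the wrong way round
```

Note that the same reduced costs can certify optimality for one assignment of nonbasic variables to bounds and fail for another, which is why the test needs both the basis and the basic solution.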

5.45


♣ Iteration of the PSM as applied to $(P)$ is as follows:
♠ At the beginning of the iteration, we have at our disposal the current basis $I$, an associated feasible basic solution $x^I$, the associated reduced costs $c^I$, and the sets
$$J_\ell = \{j \notin I : x^I_j = \ell_j\}, \qquad J_u = \{j \notin I : x^I_j = u_j\}.$$
♠ At the iteration, we
A. Check whether $c^I_j \ge 0\ \forall j \in J_\ell$ and $c^I_j \le 0\ \forall j \in J_u$. If it is the case, we terminate: $x^I$ is an optimal solution to $(P)$ (by the Corollary).
B. If we do not terminate according to A, we pick $j_*$ such that $j_* \in J_\ell$ & $c^I_{j_*} < 0$, or $j_* \in J_u$ & $c^I_{j_*} > 0$. Further, we build the ray $x(t)$ of solutions such that
$$Ax(t) = b\ \ \forall t \ge 0, \qquad x_j(t) = x^I_j\ \ \forall j \notin [I \cup \{j_*\}], \qquad x_{j_*}(t) = \begin{cases} x^I_{j_*} + t = \ell_{j_*} + t, & j_* \in J_\ell \\ x^I_{j_*} - t = u_{j_*} - t, & j_* \in J_u \end{cases}$$
Note that the basic part of the ray is
$$x_I(t) = [x^I]_I - \epsilon t A_I^{-1} A^{j_*}, \qquad \epsilon = \begin{cases} 1, & j_* \in J_\ell \\ -1, & j_* \in J_u \end{cases}$$

5.46


Situation: We have built a ray of solutions $x(t)$ with basic part $x_I(t) = [x^I]_I - \epsilon t A_I^{-1} A^{j_*}$, where $\epsilon = 1$ when $j_* \in J_\ell$ and $\epsilon = -1$ when $j_* \in J_u$, such that
• $Ax(t) = b$ for all $t \ge 0$
• the only nonzero entries in $x(t) - x^I$ are the basic entries and the $j_*$-th entry, the latter being equal to $\epsilon t$.
Note: By construction, the objective improves along the ray $x(t)$:
$$c^T[x(t) - x^I] = [c^I]^T[x(t) - x^I] = c^I_{j_*}[x_{j_*}(t) - x^I_{j_*}] = \epsilon c^I_{j_*} t = -|c^I_{j_*}| t$$
♥ $x(0) = x^I$ is feasible. It may happen that as $t \ge 0$ grows, all entries $x_j(t)$ of $x(t)$ stay in their feasible ranges $[\ell_j, u_j]$ all the time. If it is the case, we have discovered that the problem is unbounded, and we terminate.
♥ Alternatively, when $t$ grows, some of the entries in $x(t)$ eventually leave their feasible ranges. In this case, we specify the largest $t = \bar t \ge 0$ such that $x(\bar t)$ still is feasible. At $t = \bar t$, one of the entries $x_{i_*}(t)$ depending on $t$ is about to become infeasible: it is in its feasible range when $t = \bar t$ and leaves this range when $t > \bar t$. We set $I^+ = (I \cup \{j_*\}) \setminus \{i_*\}$, $x^{I^+} = x(\bar t)$, update $c^I$ into $c^{I^+}$ and pass to the next iteration.
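The choice of $\bar t$ is a ratio test over all entries that move along the ray. Below is a minimal sketch, assuming the ray is given as $x(t) = x + td$ with an explicit direction vector `d` (so `d[j_*]` equals $\epsilon$ and the basic part of `d` equals $-\epsilon A_I^{-1}A^{j_*}$). The function name, the 0-based indices and the sample data are illustrative assumptions.

```python
import math

def ratio_test(x, d, lower, upper):
    """Largest t >= 0 keeping lower <= x + t*d <= upper, plus a blocking index.

    Returns (math.inf, None) when no entry ever leaves its range (unbounded ray).
    """
    t_bar, blocking = math.inf, None
    for j, (x_j, d_j) in enumerate(zip(x, d)):
        if d_j > 0:
            step = (upper[j] - x_j) / d_j      # entry moves up towards u_j
        elif d_j < 0:
            step = (lower[j] - x_j) / d_j      # entry moves down towards l_j
        else:
            continue                           # entry does not depend on t
        if step < t_bar:
            t_bar, blocking = step, j
    return t_bar, blocking

# Hypothetical data: basis {0, 1}, entering index 2 at its lower bound (eps = 1).
x = [2.5, 1.5, 0.0, 0.0]
d = [-0.5, -0.5, 1.0, 0.0]
lower = [0.0] * 4
upper = [3.0] * 4
t_bar, i_star = ratio_test(x, d, lower, upper)
print(t_bar, i_star)
```

In this example the entering variable reaches its upper bound at the same $t$ at which a basic variable reaches its lower bound; the sketch reports the first blocking index it meets, and a real implementation would need a tie-breaking rule here.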

5.47
♠ As a result of an iteration, the following can occur:
♥ We terminate with an optimal solution $x^I$ and the "optimality certificate" $c^I$
♥ We terminate with the correct conclusion that the problem is unbounded
♥ We pass to the new basis $I^+$ and the new feasible basic solution $x^{I^+}$. If it happens,
• the objective either improves, or remains the same. The latter can happen only when $x^{I^+} = x^I$ (i.e., the above $\bar t = 0$). Note that this option is possible only when some of the basic entries in $x^I$ are on their bounds ("degenerate basic feasible solution"), and in this case the basis definitely changes;
• the basis can change and can remain the same (the latter happens if $i_* = j_*$, i.e., as a result of the iteration, a certain nonbasic variable jumps from one endpoint of its feasible range to the other endpoint of this range).
♠ In the nondegenerate case (i.e., there are no degenerate basic feasible solutions), the PSM terminates in finite time.
♠ Same as the basic PSM, the method can be initialized by solving an auxiliary Phase I program with a readily available basic feasible solution.

5.48
♣ The Network Simplex Method for the capacitated Network Flow problem
$$\min_f \{c^T f : Af = b,\ \ell \le f \le u\} \qquad \text{(NW)}$$
associated with a graph $G = (N = \{1, ..., m\}, E)$ is the network flow adaptation of the above general construction. We assume that $G$ is connected. Then
♠ Bases of $A$ are spanning trees of $G$: collections of $m-1$ arcs from $E$ which form a tree after their orientations are ignored;
♠ A basic solution $f^I$ associated with a basis $I$ is a (not necessarily feasible) flow such that $Af^I = b$ and, for every $\gamma \notin I$, $f^I_\gamma$ is either $\ell_\gamma$ or $u_\gamma$.

5.49


♠ At the beginning of an iteration, we have at our disposal a basis $I$ along with an associated feasible basic solution $f^I$.
♥ In the course of an iteration,
• we build the reduced costs $c^I_{ij} = c_{ij} + \mu_j - \mu_i$, $(i,j) \in E$, where the "potentials" $\mu_1, ..., \mu_m$ are given by the requirement
$$c_{ij} = \mu_i - \mu_j \quad \forall (i,j) \in I$$
The algorithm for building the potentials is exactly the same as in the uncapacitated case.
• we check the signs of the nonbasic reduced costs to find a "promising" arc, that is, an arc $\gamma_* = (i_*, j_*) \notin I$ such that either $f^I_{\gamma_*} = \ell_{\gamma_*}$ and $c^I_{\gamma_*} < 0$, or $f^I_{\gamma_*} = u_{\gamma_*}$ and $c^I_{\gamma_*} > 0$.
♥ If no promising arc exists, we terminate: $f^I$ is an optimal solution to (NW).
Alternatively, we find a promising arc $\gamma_* = (i_*, j_*)$. Note that $\gamma_* \notin I$.
Case A: $\gamma_*$ is not inverse to an arc in $I$. We add arc $\gamma_*$ to $I$. Since $I$ is a spanning tree, there exists a cycle
$$C = i_1 = i_*\ \underbrace{(i_*, j_*)}_{\equiv \gamma_1}\ i_2 = j_*\ \gamma_2\ i_3\ \gamma_3\ i_4\ ...\ \gamma_{t-1}\ i_t = i_1$$
where
• $i_1, ..., i_{t-1}$ are pairwise distinct nodes,
• $\gamma_2, ..., \gamma_{t-1}$ are pairwise distinct arcs from $I$, and
• for every $s$, $1 \le s \le t-1$,
• either $\gamma_s = (i_s, i_{s+1})$ ("forward arc"),
• or $\gamma_s = (i_{s+1}, i_s)$ ("backward arc").
Note: $\gamma_1 = (i_*, j_*)$ is a forward arc.

5.50

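♥ We identify the cycle $C$ and define the flow $h$ as follows:
$$h_\gamma = \begin{cases} 0, & \gamma \text{ does not enter } C \\ \epsilon, & \gamma \text{ is a forward arc in } C \\ -\epsilon, & \gamma \text{ is a backward arc in } C \end{cases} \qquad \epsilon = \begin{cases} 1, & f^I_{\gamma_*} = \ell_{\gamma_*}\ \&\ c^I_{\gamma_*} < 0 \\ -1, & f^I_{\gamma_*} = u_{\gamma_*}\ \&\ c^I_{\gamma_*} > 0 \end{cases}$$
Setting $f(t) = f^I + th$, we ensure that
• $Af(t) \equiv b$
• $c^T f(t)$ decreases as $t$ grows,
• $f(0) = f^I$ is feasible,
• $f_{\gamma_*}(t) = f^I_{\gamma_*} + \epsilon t$ is in the feasible range for all small enough $t \ge 0$,
• $f_\gamma(t)$ are affine functions of $t$ and are constant when $\gamma$ is not in $C$.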

5.51
♥ It is easy to check whether $f(t)$ remains feasible for all $t \ge 0$. If it is the case, the problem is unbounded, and we terminate.
Alternatively, it is easy to find the largest $t = \bar t \ge 0$ such that $f(\bar t)$ still is feasible. When $t = \bar t$, at least one of the flows $f_\gamma(t)$ which do depend on $t$ (⇒ $\gamma$ belongs to $C$) is about to leave its feasible range. We specify the corresponding arc $\hat\gamma$ and define
• the new basis as $I^+ = (I \cup \{\gamma_*\}) \setminus \{\hat\gamma\}$
• the new basic feasible solution as $f^{I^+} = f(\bar t)$
and proceed to the next iteration.
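Case A can be sketched in Python as follows: the cycle is found by searching the spanning tree for the path from $j_*$ back to $i_*$, the arcs are labelled forward or backward, and the step $\bar t$ is the smallest remaining room on the cycle. This is a sketch under assumptions rather than a reference implementation: the names `cycle_direction` and `max_step` are invented, arcs are stored as tuples, and the sample flows and capacities are illustrative (with $u_{2,3} = 2$ chosen so that the step is degenerate).

```python
from collections import deque

def cycle_direction(basis, entering, eps):
    """Flow direction h on the cycle closed by entering = (i*, j*) in the tree `basis`."""
    i_star, j_star = entering
    adj = {}
    for (i, j) in basis:
        adj.setdefault(i, []).append((j, (i, j), +1))   # walking i -> j: forward arc
        adj.setdefault(j, []).append((i, (i, j), -1))   # walking j -> i: backward arc
    # Breadth-first search from j*; prev[v] = (predecessor, arc used, orientation).
    prev = {j_star: None}
    queue = deque([j_star])
    while queue:
        v = queue.popleft()
        for w, arc, sign in adj.get(v, []):
            if w not in prev:
                prev[w] = (v, arc, sign)
                queue.append(w)
    h = {entering: eps}                 # gamma_1 = (i*, j*) is a forward arc
    v = i_star
    while prev[v] is not None:          # walk the tree path back from i* to j*
        v, arc, sign = prev[v]
        h[arc] = eps * sign
    return h                            # arcs outside the cycle are absent (h = 0)

def max_step(flow, h, lower, upper):
    """Largest t with lower <= flow + t*h <= upper on the cycle, and a blocking arc."""
    t_bar, blocking = float("inf"), None
    for arc, h_arc in h.items():        # h_arc is +1 or -1
        room = upper[arc] - flow[arc] if h_arc > 0 else flow[arc] - lower[arc]
        if room < t_bar:
            t_bar, blocking = room, arc
    return t_bar, blocking

basis = [(1, 3), (2, 3), (3, 4)]
flow = {(1, 2): 0, (2, 3): 2, (1, 3): 1, (3, 4): 6}
lower = {arc: 0 for arc in flow}
inf = float("inf")
h = cycle_direction(basis, entering=(1, 2), eps=1)
print(h)
print(max_step(flow, h, lower, {arc: inf for arc in flow}))             # no capacities
print(max_step(flow, h, lower, {(1, 2): inf, (1, 3): inf, (2, 3): 2, (3, 4): 7}))
```

Without capacities the backward arc $(1,3)$ blocks at $\bar t = 1$; with $u_{2,3} = 2$ the forward arc $(2,3)$ is already at its capacity, so $\bar t = 0$ and only the basis changes.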

5.52
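Case B: $\gamma_* = (i_*, j_*)$ is inverse to an arc $\gamma_*' = (j_*, i_*) \in I$. Here we act exactly as in Case A, with the two-arc cycle $i_*\ \gamma_*\ j_*\ \gamma_*'\ i_*$ in the role of $C$. Both arcs of this cycle are forward arcs, so that flow conservation $Af(t) \equiv b$ requires changing both flows by the same amount:
$$f_\gamma(t) = \begin{cases} f^I_\gamma, & \gamma \ne \gamma_*,\ \gamma \ne \gamma_*' \\ f^I_\gamma + \epsilon t, & \gamma = \gamma_* \\ f^I_\gamma + \epsilon t, & \gamma = \gamma_*' \end{cases} \qquad \epsilon = \begin{cases} 1, & f^I_{\gamma_*} = \ell_{\gamma_*}\ \&\ c^I_{\gamma_*} < 0 \\ -1, & f^I_{\gamma_*} = u_{\gamma_*}\ \&\ c^I_{\gamma_*} > 0 \end{cases}$$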

5.53
♣ Illustration:

$c_{1,2} = 1$, $c_{2,3} = 4$, $c_{1,3} = 6$, $c_{3,4} = 8$, $s = [1; 2; 3; -6]$
$\ell_{ij} = 0\ \forall i,j$; $u_{1,2} = \infty$, $u_{1,3} = \infty$, $u_{2,3} = 2$, $u_{3,4} = 7$
♥ Let $I = \{(1,3), (2,3), (3,4)\}$
⇒ $f^I_{1,2} = 0$, $f^I_{2,3} = 2$, $f^I_{1,3} = 1$, $f^I_{3,4} = 6$ (feasible!)
⇒ $c^I_{1,2} = -1$, $c^I_{1,3} = c^I_{2,3} = c^I_{3,4} = 0$
♥ $c^I_{1,2} = -1 < 0$ and $f^I_{1,2} = 0 = \ell_{1,2}$
⇒ $\gamma_* = (1,2)$ is a promising arc, and it is profitable to increase its flow ($\epsilon = 1$)
♥ Adding $(1,2)$ to $I$, we get the cycle
$1\ (1,2)\ 2\ (2,3)\ 3\ (1,3)\ 1$
⇒ $h_{1,2} = 1$, $h_{2,3} = 1$, $h_{1,3} = -1$, $h_{3,4} = 0$
⇒ $f_{1,2}(t) = t$, $f_{2,3}(t) = 2 + t$, $f_{1,3}(t) = 1 - t$, $f_{3,4}(t) = 6$
♥ The largest $\bar t$ for which all $f_\gamma(t)$ are in their feasible ranges is $\bar t = 0$ (since $f_{2,3}(t) = 2 + t$ must not exceed $u_{2,3} = 2$), the corresponding arc being $\hat\gamma = (2,3)$
⇒ The new basis is $I^+ = \{(1,2), (1,3), (3,4)\}$, the new basic feasible flow is the old one:
$f^{I^+}_{1,2} = 0$, $f^{I^+}_{2,3} = 2$, $f^{I^+}_{1,3} = 1$, $f^{I^+}_{3,4} = 6$

5.54
♥ Computing the new reduced costs:
$1 = c_{1,2} = \mu_1 - \mu_2$, $6 = c_{1,3} = \mu_1 - \mu_3$, $8 = c_{3,4} = \mu_3 - \mu_4$
⇒ $\mu_1 = 1$, $\mu_2 = 0$, $\mu_3 = -5$, $\mu_4 = -13$
⇒ $c^{I^+}_{1,2} = c^{I^+}_{1,3} = c^{I^+}_{3,4} = 0$, $c^{I^+}_{2,3} = c_{2,3} + \mu_3 - \mu_2 = 4 - 5 - 0 = -1$
All positive reduced costs (in fact, there are none) correspond to flows on their lower bounds, and all negative reduced costs (namely, $c^{I^+}_{2,3} = -1$) correspond to flows on their upper bounds
⇒ $f^{I^+}$ is optimal!

5.55
Complexity of LO

♣ A generic computational problem $\mathcal{P}$ is a family of instances: particular problems of certain structure, identified within $\mathcal{P}$ by a (real or binary) data vector. For example, LO is a generic problem $\mathcal{LO}$ with instances of the form
$$\min_x \{c^T x : Ax \le b\} \qquad [A = [A_1; ...; A_n] : m \times n]$$
An LO instance, that is, a particular LO program, is specified within this family by the data $(c, A, b)$, which can be arranged in a single vector
$$[m; n; c; A_1; A_2; ...; A_n; b].$$
♣ Investigating the complexity of a generic computational problem $\mathcal{P}$ is aimed at answering the question

How does the computational effort of solving an instance $P$ of $\mathcal{P}$ depend on the "size" $\mathrm{Size}(P)$ of the instance?

♠ Precise meaning of the above question depends on how we measure the computational
effort and how we measure the sizes of instances.

6.1
♣ Let the generic problem in question be LO – Linear Optimization. There are two natural
ways to treat the complexity of LO:
♠ Real Arithmetic Complexity Model:
• We allow the instances to have real data and measure the size of an instance $P$ by the sizes $m(P)$, $n(P)$ of the constraint matrix of the instance; say, we set $\mathrm{Size}(P) = m(P)n(P)$
• We take, as a model of computations, the Real Arithmetic computer: an idealized computer capable of storing reals and carrying out operations of precise Real Arithmetic (four arithmetic operations and comparison), each operation taking unit time.
• The computational effort in a computation is defined as the running time of the computation. Since all operations take unit time, this effort is just the total number of arithmetic operations in the computation.

6.2

$$\min_x \{c^T x : Ax \le b\} \qquad (P)$$
♠ Rational Arithmetic ("Fully finite") Complexity Model:
• We allow the instances to have rational data and measure the size of an instance $P$ by its binary length $L(P)$, the total number of binary digits in the data $c, A, b$ of the instance;
• We take, as a model of computations, the Rational Arithmetic computer: a computer which is capable of storing rational numbers (represented by pairs of integers) and carrying out operations of precise Rational Arithmetic (four arithmetic operations and comparison), the time taken by an operation being a polynomial $\pi(\ell)$ in the total binary length $\ell$ of the operands.
• The computational effort in a computation is again defined as the running time of the computation. However, now this effort is not just the total number of operations in the computation, since the time taken by an operation depends on the binary length of the operands.
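A small experiment shows why the two models count effort differently. The sketch below uses Python's `fractions.Fraction` as a stand-in for the Rational Arithmetic computer; the helper `bit_length` (binary length of numerator plus denominator) is an assumed way of measuring operand size, chosen here only for illustration. Each squaring is one unit-cost operation in the Real Arithmetic model, yet the operand roughly doubles in binary length every time.

```python
from fractions import Fraction

def bit_length(q):
    """Binary length of a rational number stored as a pair of integers."""
    return q.numerator.bit_length() + q.denominator.bit_length()

x = Fraction(3, 2)
lengths = [bit_length(x)]
for _ in range(5):
    x = x * x                      # a single arithmetic operation
    lengths.append(bit_length(x))
print(lengths)                     # operand sizes after 0, 1, ..., 5 squarings
```

After $k$ squarings the operand has binary length of order $2^k$, so a computation with few arithmetic operations may still be expensive in the Rational Arithmetic model unless the sizes of intermediate results are kept under control.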

6.3
♠ In both the Real and the Rational Arithmetic complexity models, a solution algorithm $\mathcal{B}$ for a generic problem $\mathcal{P}$ is a code for the respective computer such that, when the code is executed on the data of any instance $P$ of $\mathcal{P}$, the running time of the resulting computation is finite, and upon termination the computer outputs an exact solution to the instance $P$. In the LO case, the latter means that upon termination, the computer outputs
• either an optimal solution to $P$,
• or the correct claim "$P$ is infeasible,"
• or the correct claim "$P$ is unbounded."
For example, the Primal Simplex Method with anti-cycling pivoting rules is a solution algorithm for LO. Without these rules, the PSM is not a solution algorithm for LO.
♠ A solution algorithm $\mathcal{B}$ for $\mathcal{P}$ is called polynomial time ("computationally efficient"), if its running time on every instance $P$ of $\mathcal{P}$ is bounded by a polynomial in the size of the input:
$$\exists a, b:\ \forall P \in \mathcal{P}:\ \langle \text{running time of } \mathcal{B} \text{ on } P \rangle \le a \cdot \mathrm{Size}^b(P).$$
♠ We say that a generic problem $\mathcal{P}$ is polynomially solvable ("computationally tractable"), if it admits a polynomial time solution algorithm.

6.4
Comments:
♠ Polynomial solvability is a worst-case-oriented concept: we want the computational effort to be bounded by a polynomial of $\mathrm{Size}(P)$ for every $P \in \mathcal{P}$. In real life, we would be completely satisfied by a good bound on the effort of solving most, but not necessarily all, of the instances. Why not pass from the worst-case to the average running time?
Answer: For a generic problem $\mathcal{P}$ like LO, with instances coming from a wide spectrum of applications, there is no hope to specify in a meaningful way the distribution according to which instances arise, and thus there is no meaningful way to say w.r.t. which distribution the probabilities and averages should be taken.
♠ Why is computational tractability interpreted as polynomial solvability? A high degree polynomial bound on running time is, for all practical purposes, not better than an exponential bound!

6.5
Answer:
• At the theoretical level, it is good to distinguish first of all between “definitely bad” and
“possibly good.” Typical non-polynomial complexity bounds are exponential, and these
bounds definitely are bad from the worst-case viewpoint. High degree polynomial bounds
also are bad, but as a matter of fact, they do not arise.
• The property of P to be/not to be polynomially solvable remains intact when changing
“small details” like how exactly we encode the data of an instance to measure its size, what
exactly is the polynomial dependence of the time taken by an arithmetic operation on the
bit length of the operands, etc., etc. As a result, the notion of “computational tractability”
becomes independent of technical “second order” details.

6.6
• After polynomial time solvability is established, we can investigate the complexity in more detail: what are the degrees of the corresponding polynomials, what are the best data structures and polynomial time solution algorithms under the circumstances, etc. But first things first...
• The proof of the pudding is in the eating: the notion works! As applied in the framework of the Rational Arithmetic Complexity Model, it results in a classification of Discrete Optimization problems into "difficult" and "easy" ones which is fully compatible with common sense.

6.7
Classes P and NP

♣ Consider a generic discrete problem $\mathcal{P}$, that is, a generic problem with rational data and instances of the form "given rational data $d$ of an instance, find a solution to the instance, that is, a rational vector $x$ such that $\mathcal{A}(d, x) = 1$." Here $\mathcal{A}$ is the predicate associated with $\mathcal{P}$: a function taking values 1 ("true") and 0 ("false").
Examples:
A. Finding rational solution to a system of linear inequalities with rational data.
Note: From our general theory, a solvable system of linear inequalities with rational data
always has a rational solution.
B. Finding optimal rational primal and dual solutions to an LO program with rational data.
Note: Writing down the primal and the dual constraints and the constraint “the duality gap
is zero,” problem B reduces to problem A.
♣ Given a generic discrete problem P, one can associate with it its feasibility version Pf
“given rational data d of an instance, check whether the instance has a solution.”

6.8
♣ Examples of generic feasibility problems:
• Checking feasibility in LO – generic problem with instances “given a finite system Ax ≤ b
with rational data, check whether the system is solvable”
• Existence of a Hamiltonian cycle in a graph – generic problem with instances “given an
undirected graph, check whether it admits a Hamiltonian cycle – a cycle which visits all
nodes of the graph”
• Stones problem – generic problem with instances "given finitely many stones with integer weights $a_1, ..., a_n$, check whether it is possible to split them into two groups of equal weights," or, equivalently, check whether the single homogeneous linear equation $\sum_{i=1}^n a_i x_i = 0$ has a solution in variables $x_i = \pm 1$.

6.9
♣ A generic discrete problem P with instances “given rational vector of data d, find a
rational solution vector x such that A(d, x) = 1” is said to be in class NP, if
• The predicate A is easy to compute: in the Rational Arithmetic Complexity Model, it can
be computed in time polynomial in the total bit length `(d) + `(x) of d and x.
In other words, given the data d of an instance and a candidate solution x, it is easy to
check whether x indeed is a solution.
• If an instance with data d is solvable, then there exists a “short solution” x to the instance
– a solution with the binary length `(x) ≤ π(`(d)), where π(·) is a polynomial which, same
as the predicate A, is specific for P.
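The two requirements can be made concrete for the Stones and Hamiltonian cycle problems. The sketch below writes down the predicates $\mathcal{A}(d, x)$ as plain Python functions; the function names, the 0-based node numbering and the sample data are assumptions made for illustration. Checking a candidate takes time linear in its length, while the naive search in `find_split` enumerates up to $2^n$ sign patterns.

```python
from itertools import product

def stones_predicate(weights, x):
    """A(d, x) for Stones: x is a +-1 vector with sum_i a_i x_i = 0."""
    return (len(x) == len(weights)
            and all(v in (1, -1) for v in x)
            and sum(a * v for a, v in zip(weights, x)) == 0)

def hamiltonian_predicate(n, edges, cycle):
    """A(d, x) for Hamiltonian cycle on nodes 0..n-1 (n >= 3), undirected edges."""
    edge_set = {frozenset(e) for e in edges}
    return (sorted(cycle) == list(range(n))
            and all(frozenset((cycle[k], cycle[(k + 1) % n])) in edge_set
                    for k in range(n)))

def find_split(weights):
    """Brute-force search over all 2^n candidates; exponential in n."""
    for x in product((1, -1), repeat=len(weights)):
        if stones_predicate(weights, x):
            return x
    return None

print(stones_predicate([3, 1, 1, 2, 2, 1], [1, -1, -1, 1, -1, -1]))
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(hamiltonian_predicate(4, square, [0, 1, 2, 3]))
print(hamiltonian_predicate(4, square, [0, 2, 1, 3]))
print(find_split([1, 1, 3]))
```

In both problems a candidate solution is no longer than the data itself, which is the "short solution" requirement; what membership in NP does not provide is a fast way to find such a candidate.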

6.10
♠ In our examples:
• Checking existence of a Hamiltonian cycle and the Stones problem clearly are in NP: in both cases, the corresponding predicates are easy to compute, and the binary length of a candidate solution is not larger than the binary length of the data of an instance.
• Checking feasibility in LO also is in NP. Clearly, the associated predicate is easy to compute: given the rational data of a system of linear inequalities and a rational candidate solution, it takes time polynomial in the total bit length of the data and the solution to verify on a Rational Arithmetic computer whether the candidate solution is indeed a solution. What is unclear in advance is whether a feasible system of linear inequalities with rational data admits a "short" rational solution, that is, one of binary length polynomial in the total binary length of the system's data. As we shall see in due course, this indeed is the case.
Note: Class NP is extremely wide and covers the vast majority of, if not all, interesting discrete problems.

6.11
♣ We say that a generic discrete problem $\mathcal{P}$ is in class P, if it is in NP and its feasibility version is polynomially solvable in the Rational Arithmetic complexity model.
♠ Note: By definition of P, P is contained in NP. If these classes were equal, there would be, for all practical purposes, no difficult problems at all, since, as a matter of fact, finding a solution to a solvable instance of $\mathcal{P} \in$ NP reduces to solving a "short" series of feasibility problems related to $\mathcal{P}$.
♣ While we do not know whether P=NP (this is definitely the major open problem in Computer Science and one of the major open problems in Mathematics), we do know something extremely important: there exist NP-complete problems.

6.12
♠ Let $\mathcal{P}$, $\mathcal{P}^+$ be two generic problems from NP. We say that $\mathcal{P}$ is polynomially reducible to $\mathcal{P}^+$, if there exists an algorithm, polynomial time in the Rational Arithmetic complexity model, which, given on input the data $d$ of an instance $P \in \mathcal{P}$, converts $d$ into the data $d^+$ of an instance $P^+ \in \mathcal{P}^+$ such that $P^+$ is solvable if and only if $P$ is solvable.
Note: If $\mathcal{P}$ is polynomially reducible to $\mathcal{P}^+$ and the feasibility version $\mathcal{P}^+_f$ of $\mathcal{P}^+$ is polynomially solvable, so is the feasibility version $\mathcal{P}_f$ of $\mathcal{P}$.
♠ A problem $\mathcal{P} \in$ NP is called NP-complete, if every problem from NP is polynomially reducible to $\mathcal{P}$. In particular:
• all NP-complete problems are polynomially reducible to each other;
• if the feasibility version of one of the NP-complete problems is polynomially solvable, then the feasibility versions of all problems from NP are polynomially solvable.

6.13
Facts:
♠ NP-complete problems do exist. For example, both checking the existence of a Hamiltonian cycle and Stones are NP-complete.
Moreover, modulo a handful of exceptions, all generic problems from NP for which no polynomial time solution algorithms are known turn out to be NP-complete.
Practical aspects:
♥ Many discrete problems are of high practical interest, and basically all of them are in NP; as a result, a huge total effort was invested over the years in the search for efficient algorithms for discrete problems. More often than not this effort did not yield efficient algorithms, and the corresponding problems finally turned out to be NP-complete and thus polynomially reducible to each other. Thus, the huge total effort invested into various difficult combinatorial problems was in fact invested into a single problem and did not yield a polynomial time solution algorithm.
⇒ At the present level of our knowledge, it is highly unlikely that NP-complete problems admit efficient solution algorithms, that is, we believe that P≠NP

6.14
♥ Taking for granted that P≠NP, the interpretation of the real-life notion of computational tractability as the formal notion of polynomial time solvability works perfectly well at least in discrete optimization: as a matter of fact, polynomial solvability/insolvability classifies the vast majority of discrete problems into "difficult" and "easy" in exactly the same fashion as computational practice.
♥ However: For a long time, there was an important exception to the otherwise solid rule "tractability ≡ polynomial solvability": Linear Programming with rational data. On one hand, LO admits an extremely efficient practical computational tool, the Simplex method, and thus, practically speaking, is computationally tractable. On the other hand, it was unknown for a long time (and partially remains unknown) whether LO is polynomially solvable.

6.15
Complexity Status of LO

♣ The complexity of LO can be studied both in the Real and the Rational Arithmetic complexity models, and the related questions of polynomial solvability are as follows:
Real Arithmetic complexity model: Does there exist a Real Arithmetic solution algorithm for LO which solves every LO program with real data in a number of operations of Real Arithmetic polynomial in the sizes $m$ (number of constraints) and $n$ (number of variables) of the program?
This is one of the major open questions in Optimization.
Rational Arithmetic complexity model: Does there exist a Rational Arithmetic solution algorithm for LO which solves every LO program with rational data in a number of "bit-wise" operations polynomial in the total binary length of the data of the program?
This question remained open for nearly two decades until it was answered affirmatively by Leonid Khachiyan in 1978.

6.16
Complexity Status of the Simplex Method

♣ Equipped with appropriate anti-cycling rules, the Simplex method becomes a legitimate
solution algorithm in both Real and Rational Arithmetic complexity models.
♠ However, the only known complexity bounds for the Simplex method in both models are
exponential. Moreover, starting with mid-1960’s, it is known that these “bad bounds” are
not an artifact coming from a poor complexity analysis – they do reflect the bad worst-case
properties of the algorithm.

6.17
♠ Specifically, Klee and Minty presented a series $P_n$, $n = 1, 2, ...$ of explicit LO programs as follows:
• $P_n$ has $n$ variables and $2n$ inequality constraints with rational data of total bit length $O(n^2)$
• the feasible set $X_n$ of $P_n$ is a polytope (bounded nonempty polyhedral set) with $2^n$ vertices
• the $2^n$ vertices of $X_n$ can be arranged into a sequence such that every two neighboring vertices are linked by an edge (belong to the same one-dimensional face of $X_n$), and the objective strictly increases when passing from a vertex to the next one.
⇒ Unless special care is taken of the pivoting rules and the starting vertex, the Simplex method, as applied to $P_n$, can visit all $2^n$ vertices of $X_n$, and thus its running time on $P_n$ in both complexity models will be exponential in $n$, while the size of $P_n$ in both models is merely polynomial in $n$.

6.18
♠ Later, similar "exponential" examples were built for all standard pivoting rules. However, the family of all possible pivoting rules is too diffuse for analysis, and, strictly speaking, we do not know whether the Simplex method can be "cured", that is, converted to a polynomial time algorithm, by properly chosen pivoting rules.
♣ The question of "whether the Simplex method can be cured" is closely related to the famous Hirsch Conjecture as follows. Let $X_{m,n}$ be a polytope in $R^n$ given by $m$ linear inequalities. An edge path on $X_{m,n}$ is a sequence of pairwise distinct edges (one-dimensional faces of $X_{m,n}$) in which every two subsequent edges have a point in common (this point is a vertex of $X_{m,n}$).
Note: The trajectory of the PSM as applied to an LO with the feasible set $X_{m,n}$ is an edge path on $X_{m,n}$, whatever the pivoting rules.

6.19
♠ The Hirsch Conjecture suggests that every two vertices in $X_{m,n}$ can be linked by an edge path with at most $m - n$ edges, or, equivalently, that the "edge diameter" of $X_{m,n}$:
$$d(X_{m,n}) = \max_{u, v \in \mathrm{Ext}(X_{m,n})}\ \min_{\text{path}}\ \langle \#\text{ of edges in a path from } u \text{ to } v \rangle$$
is at most $m - n$.
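For a concrete feeling of the quantity $d(X_{m,n})$, one can compute it by brute force for the unit cube $[0,1]^n$, which is given by $m = 2n$ inequalities $0 \le x_k \le 1$. The sketch below assumes the standard description of the cube's graph (vertices are 0/1 vectors, edges join vertices differing in one coordinate); the function name is made up for this example.

```python
from collections import deque
from itertools import product

def cube_edge_diameter(n):
    """Edge diameter of the unit cube [0, 1]^n, computed by breadth-first search."""
    diameter = 0
    for start in product((0, 1), repeat=n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for k in range(n):
                w = v[:k] + (1 - v[k],) + v[k + 1:]   # neighbour: flip coordinate k
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        diameter = max(diameter, max(dist.values()))
    return diameter

for n in range(1, 6):
    m = 2 * n
    print(n, m, cube_edge_diameter(n), m - n)   # diameter versus the bound m - n
```

For the cube the edge diameter equals $n = m - n$, so the bound in the conjecture is attained on this family.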
After several decades of intensive research, this conjecture in its "literal" form was disproved in 2010 by F. Santos, who pointed out a 43-dimensional polytope given by 86 inequalities with edge diameter > 43.
Fact: No polynomial in $m, n$ upper bounds on $d(X_{m,n})$ are known. If the Hirsch Conjecture is "heavily wrong" and no polynomial in $m, n$ upper bound on the edge diameter exists, this would be an "ultimate" demonstration of the inability to "cure" the Simplex method.
♣ Surprisingly, polynomial solvability of LO with rational data in the Rational Arithmetic complexity model turned out to be a byproduct of a Real Arithmetic algorithm completely unrelated to LO, designed for finding approximate solutions to general-type convex optimization programs: the Ellipsoid algorithm.

6.20
Solving Convex Problems: Ellipsoid Algorithm

♣ There is a wide spectrum of algorithms capable of approximating global solutions of convex problems to high accuracy in "reasonable" time.
We will start with one of the "universal" algorithms of this type, the Ellipsoid method, which imposes only minimal requirements on the problem in addition to convexity.

6.21
♣ Consider an optimization program
$$f_* = \min_{x \in X} f(x) \qquad \text{(P)}$$
• $X \subset R^n$ is a closed and bounded convex set with a nonempty interior;
• $f$ is a continuous convex function on $R^n$.
♠ Assume that our "environment" when solving (P) is as follows:
A. We have access to a Separation Oracle $\mathrm{Sep}(X)$ for $X$: a routine which, given on input a point $x \in R^n$, reports whether $x \in X$, and in the case of $x \notin X$, returns a separator, that is, a vector $e \ne 0$ such that
$$e^T x \ge \max_{y \in X} e^T y$$
B. We have access to a First Order Oracle which, given on input a point $x \in X$, returns the value $f(x)$ and a subgradient $f'(x)$ of $f$ at $x$:
$$\forall y:\ f(y) \ge f(x) + (y - x)^T f'(x).$$
Note: When $f$ is differentiable, one can set $f'(x) = \nabla f(x)$.
C. We are given positive reals $R, r, V$ such that for some (unknown) $c$ one has
$$\{x : \|x - c\|_2 \le r\} \subset X \subset \{x : \|x\|_2 \le R\}$$
and
$$\max_{x \in X} f(x) - \min_{x \in X} f(x) \le V.$$

6.22
♠ Example: Consider an optimization program
$$\min_x \Big\{ f(x) \equiv \max_{1 \le \ell \le L} [p_\ell + q_\ell^T x] : x \in X = \{x : a_i^T x \le b_i,\ 1 \le i \le m\} \Big\}$$
W.l.o.g. we assume that $a_i \ne 0$ for all $i$.
♠ A Separation Oracle can be as follows: given $x$, the oracle checks whether $a_i^T x \le b_i$ for all $i$. If it is the case, the oracle reports that $x \in X$, otherwise it finds $i = i_x$ such that $a_{i_x}^T x > b_{i_x}$, reports that $x \notin X$ and returns $a_{i_x}$ as a separator. This indeed is a separator:
$$y \in X \Rightarrow a_{i_x}^T y \le b_{i_x} < a_{i_x}^T x$$
♠ A First Order Oracle can be as follows: given $x$, the oracle computes the quantities $p_\ell + q_\ell^T x$ for $\ell = 1, ..., L$ and identifies the largest of these quantities, which is exactly $f(x)$, along with the corresponding index $\ell$, let it be $\ell_x$: $f(x) = p_{\ell_x} + q_{\ell_x}^T x$. The oracle returns the computed $f(x)$ and, as a subgradient $f'(x)$, the vector $q_{\ell_x}$. This indeed is a subgradient:
$$f(y) \ge p_{\ell_x} + q_{\ell_x}^T y = [p_{\ell_x} + q_{\ell_x}^T x] + (y - x)^T q_{\ell_x} = f(x) + (y - x)^T f'(x).$$
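Both oracles of this example translate directly into code. The sketch below is illustrative only: data are passed as plain Python lists, the separation oracle returns `None` to signal $x \in X$ (a convention assumed here), and the sample polyhedron and affine pieces are invented.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def separation_oracle(a, b, x):
    """Return None if a_i^T x <= b_i for all i, otherwise a violated row a_i (a separator)."""
    for a_i, b_i in zip(a, b):
        if dot(a_i, x) > b_i:
            return a_i
    return None

def first_order_oracle(p, q, x):
    """Return f(x) = max_l [p_l + q_l^T x] and the subgradient q_l of a maximizing piece."""
    values = [p_l + dot(q_l, x) for p_l, q_l in zip(p, q)]
    l_x = max(range(len(values)), key=values.__getitem__)
    return values[l_x], q[l_x]

# Hypothetical data: X is the box [-1, 1]^2, f is the maximum of three affine pieces.
a = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 1, 1, 1]
p = [0, 0, 1]
q = [[1, 1], [-1, 0], [0, -1]]

print(separation_oracle(a, b, [0, 0]))    # point inside X
print(separation_oracle(a, b, [2, 0]))    # point outside X: a violated row is returned
print(first_order_oracle(p, q, [0, 0]))   # value and subgradient at the origin
```

Each call costs $O(mn)$ or $O(Ln)$ arithmetic operations, so both oracles are cheap relative to solving the problem itself.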

6.23
♠ How to build a good solution method for (P)?

To get an idea, let us start with univariate case.

6.24
Univariate Case: Bisection

♣ When solving a problem $\min_x \{f(x) : x \in X = [a, b] \subset [-R, R]\}$ by bisection, we recursively update localizers: segments $\Delta_t = [a_t, b_t]$ containing the optimal set $X_{opt}$.
• Initialization: Set $\Delta_1 = [-R, R]$ [$\supset X_{opt}$]

6.25
• Step $t$: Given $\Delta_t \supset X_{opt}$, let $c_t$ be the midpoint of $\Delta_t$. Calling the Separation and First Order oracles at $c_t$, we replace $\Delta_t$ by a twice smaller localizer $\Delta_{t+1}$.
[Figure: panels 1.a), 1.b), 2.a), 2.b), 2.c) illustrating the five cases below; each panel shows $f$, the feasible segment $[a, b]$, the current localizer and its midpoint $c_t$]
1) $\mathrm{Sep}_X$ says that $c_t \notin X$ and reports, via the separator $e$, on which side of $c_t$ the set $X$ is.
1.a): $\Delta_{t+1} = [a_t, c_t]$; 1.b): $\Delta_{t+1} = [c_t, b_t]$
2) $\mathrm{Sep}_X$ says that $c_t \in X$, and $O_f$ reports, via $\mathrm{sign} f'(c_t)$, on which side of $c_t$ the set $X_{opt}$ is.
2.a): $\Delta_{t+1} = [a_t, c_t]$; 2.b): $\Delta_{t+1} = [c_t, b_t]$; 2.c): $c_t \in X_{opt}$

♠ Since the localizers rapidly shrink and X is of positive length, eventually some of the search points will become feasible, and the nonoptimality of the best feasible search point found so far will rapidly converge to 0 as the process goes on.
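The bisection scheme can be sketched as follows. This is a minimal illustration, assuming the segment X = [a, b] is given explicitly (so the Separation oracle reduces to two comparisons); the names `bisection` and `abs_oracle` and the sample problem are hypothetical.

```python
from typing import Callable, Optional, Tuple

def bisection(f_oracle: Callable[[float], Tuple[float, float]],
              a: float, b: float, R: float, steps: int) -> Optional[float]:
    """Bisection for min f over X = [a, b] contained in [-R, R].

    f_oracle(x) returns (f(x), f'(x)), f'(x) being a subgradient.
    Returns the best feasible search point found, or None if there is none.
    """
    lo, hi = -R, R                    # localizer Delta_1 = [-R, R]
    best_x, best_f = None, float("inf")
    for _ in range(steps):
        c = 0.5 * (lo + hi)           # midpoint c_t of the current localizer
        if c < a:                     # case 1.b: c_t is infeasible, X is to the right
            lo = c
        elif c > b:                   # case 1.a: c_t is infeasible, X is to the left
            hi = c
        else:                         # case 2: c_t is feasible
            fc, gc = f_oracle(c)
            if fc < best_f:
                best_x, best_f = c, fc
            if gc > 0:                # case 2.a: X_opt is to the left of c_t
                hi = c
            elif gc < 0:              # case 2.b: X_opt is to the right of c_t
                lo = c
            else:                     # case 2.c: zero subgradient, c_t is optimal
                break
    return best_x

def abs_oracle(x: float) -> Tuple[float, float]:
    """f(x) = |x - 1| with a subgradient."""
    return abs(x - 1.0), (1.0 if x > 1.0 else -1.0 if x < 1.0 else 0.0)

# Minimize |x - 1| over X = [2, 3] inside [-10, 10]; the optimal solution is x = 2.
x_best = bisection(abs_oracle, a=2.0, b=3.0, R=10.0, steps=40)
```

After t steps the localizer has length 2R/2^t, which is why a few dozen steps already give high accuracy here.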

6.26
$\mathrm{Opt}(P) = \min_{x \in X \subset \mathbf{R}^n} f(x)$ (P)

♠ Bisection admits multidimensional extension, called Generic Cutting Plane Algorithm, where one builds a sequence of “shrinking” localizers $G_t$ – closed and bounded convex domains containing the optimal set $X_{opt}$ of (P).
Generic Cutting Plane Algorithm is as follows:
♠ Initialization: Select as $G_1$ a closed and bounded convex set containing X and thus being a localizer.

6.27

♠ Step t = 1, 2, ...: Given current localizer $G_t$,
• Select current search point $c_t \in G_t$ and call Separation and First Order oracles to form a cut – to find $e_t \neq 0$ s.t. $X_{opt} \subset \widehat{G}_t := \{x \in G_t : e_t^T x \le e_t^T c_t\}$
[Figure: two panels, A: $c_t \notin X$ and B: $c_t \in X$. Black: X; Blue: $G_t$; Magenta: cutting hyperplane]
To this end
- call $\mathrm{Sep}_X$, $c_t$ being the input. If $\mathrm{Sep}_X$ says that $c_t \notin X$ and returns a separator, take it as $e_t$ (case A on the picture).
Note: $c_t \notin X$ ⇒ all points from $G_t \setminus \widehat{G}_t$ are infeasible
- if $c_t \in X$, call $O_f$ to compute $f(c_t)$, $f'(c_t)$. If $f'(c_t) = 0$, terminate, otherwise set $e_t = f'(c_t)$ (case B on the picture).
Note: When $f'(c_t) = 0$, $c_t$ is optimal for (P), otherwise $f(x) > f(c_t)$ at all feasible $x \in G_t \setminus \widehat{G}_t$
• By the two “Note” above, $\widehat{G}_t$ is a localizer along with $G_t$. Select a closed and bounded convex set $G_{t+1} \supset \widehat{G}_t$ (it also will be a localizer) and pass to step t + 1.

6.28

♣ Summary: Given current localizer $G_t$, selecting a point $c_t \in G_t$ and calling the Separation and the First Order oracles, we can
♠ in the productive case $c_t \in X$, find $e_t$ such that
$e_t^T (x - c_t) > 0 \Rightarrow f(x) > f(c_t)$
♠ in the non-productive case $c_t \notin X$, find $e_t$ such that
$e_t^T (x - c_t) > 0 \Rightarrow x \notin X$
⇒ the set $\widehat{G}_t = \{x \in G_t : e_t^T (x - c_t) \le 0\}$ is a localizer
♣ We can select as the next localizer $G_{t+1}$ any set containing $\widehat{G}_t$.
♠ We define approximate solution $x^t$ built in course of t = 1, 2, ... steps as the best – with the smallest value of f – of the feasible search points $c_1, ..., c_t$ built so far.
If in course of the first t steps no feasible search points were built, $x^t$ is undefined.

6.29

♣ Analysing Cutting Plane algorithm
• Let Vol(G) be the n-dimensional volume of a closed and bounded convex set $G \subset \mathbf{R}^n$.
Note: For convenience, we use, as the unit of volume, the volume of n-dimensional unit ball $\{x \in \mathbf{R}^n : \|x\|_2 \le 1\}$, and not the volume of n-dimensional unit box.
• Let us call the quantity $\rho(G) = [\mathrm{Vol}(G)]^{1/n}$ the radius of G. ρ(G) is the radius of n-dimensional ball with the same volume as G, and this quantity can be thought of as the average linear size of G.
Theorem. Let convex problem (P) satisfying our standing assumptions be solved by Generic Cutting Plane Algorithm generating localizers $G_1, G_2, ...$ and ensuring that $\rho(G_t) \to 0$ as $t \to \infty$. Let $\bar{t}$ be the first step t where $\rho(G_{t+1}) < \rho(X)$. Starting with this step, approximate solution $x^t$ is well defined and obeys the “error bound”
$f(x^t) - \mathrm{Opt}(P) \le \left[\min_{\tau \le t} \frac{\rho(G_{\tau+1})}{\rho(X)}\right] \left[\max_X f - \min_X f\right]$

6.30

Explanation: Since $\mathrm{int}\, X \neq \emptyset$, ρ(X) is positive, and since X is closed and bounded, (P) is solvable. Let $x_*$ be an optimal solution to (P).
• Let us fix $\epsilon \in (0, 1)$ and set $X_\epsilon = x_* + \epsilon (X - x_*)$.
$X_\epsilon$ is obtained from X by similarity transformation which keeps $x_*$ intact and “shrinks” X towards $x_*$ by factor $\epsilon$. This transformation multiplies volumes by $\epsilon^n$ ⇒ $\rho(X_\epsilon) = \epsilon \rho(X)$.
• Let t be such that $\rho(G_{t+1}) < \epsilon \rho(X) = \rho(X_\epsilon)$. Then $\mathrm{Vol}(G_{t+1}) < \mathrm{Vol}(X_\epsilon)$ ⇒ the set $X_\epsilon \setminus G_{t+1}$ is nonempty ⇒ for some $z \in X$, the point
$y = x_* + \epsilon (z - x_*) = (1 - \epsilon) x_* + \epsilon z$
does not belong to $G_{t+1}$.
• $G_1$ contains X and thus y, and $G_{t+1}$ does not contain y, implying that for some $\tau \le t$, it holds
$e_\tau^T y > e_\tau^T c_\tau$ (!)
• We definitely have $c_\tau \in X$ – otherwise $e_\tau$ separates $c_\tau$ and $X \ni y$, and (!) witnesses otherwise.
⇒ $c_\tau \in X$ ⇒ $e_\tau = f'(c_\tau)$ ⇒ $f(c_\tau) + e_\tau^T (y - c_\tau) \le f(y)$
⇒ [by (!)]
$f(c_\tau) \le f(y) = f((1 - \epsilon) x_* + \epsilon z) \le (1 - \epsilon) f(x_*) + \epsilon f(z)$
⇒ $f(c_\tau) - f(x_*) \le \epsilon [f(z) - f(x_*)] \le \epsilon \left[\max_X f - \min_X f\right]$.
Bottom line: If $0 < \epsilon < 1$ and $\rho(G_{t+1}) < \epsilon \rho(X)$, then $x^t$ is well defined (since $\tau \le t$ and $c_\tau$ is feasible) and $f(x^t) - \mathrm{Opt}(P) \le \epsilon \left[\max_X f - \min_X f\right]$.

6.31

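“Starting with the first step $\bar{t}$ where $\rho(G_{t+1}) < \rho(X)$, $x^t$ is well defined, and
$f(x^t) - \mathrm{Opt} \le \underbrace{\left[\min_{\tau \le t} \frac{\rho(G_{\tau+1})}{\rho(X)}\right]}_{\epsilon_t} \underbrace{\left[\max_X f - \min_X f\right]}_{V}$”
♣ We are done. Let $t \ge \bar{t}$, so that $\epsilon_t < 1$, and let $\epsilon \in (\epsilon_t, 1)$. Then for some $t' \le t$ we have
$\rho(G_{t'+1}) < \epsilon \rho(X)$
⇒ [by bottom line] $x^{t'}$ is well defined and $f(x^{t'}) - \mathrm{Opt}(P) \le \epsilon V$
⇒ [since $f(x^t) \le f(x^{t'})$ due to $t \ge t'$] $x^t$ is well defined and $f(x^t) - \mathrm{Opt}(P) \le \epsilon V$
⇒ [passing to limit as $\epsilon \to \epsilon_t + 0$] $x^t$ is well defined and $f(x^t) - \mathrm{Opt}(P) \le \epsilon_t V$ □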

6.32

♠ Corollary: Let (P) be solved by Cutting Plane Algorithm which ensures, for some $\vartheta \in (0, 1)$, that
$\rho(G_{t+1}) \le \vartheta \rho(G_t)$
Then, for every desired accuracy $\epsilon > 0$, finding feasible $\epsilon$-optimal solution $x_\epsilon$ to (P) (i.e., a feasible solution $x_\epsilon$ satisfying $f(x_\epsilon) - \mathrm{Opt} \le \epsilon$) takes at most
$N = \frac{1}{\ln(1/\vartheta)} \ln\left(\mathcal{R}\left[1 + \frac{V}{\epsilon}\right]\right) + 1$
steps of the algorithm. Here
$\mathcal{R} = \frac{\rho(G_1)}{\rho(X)}$
says how well, in terms of volume, the initial localizer $G_1$ approximates X, and
$V = \max_X f - \min_X f$
is the variation of f on X.
Note: $\mathcal{R}$ and $V/\epsilon$ are under log, implying that high accuracy and poor approximation of X by $G_1$ cost “nearly nothing.”
What matters is the factor at the log, which is the larger the closer $\vartheta < 1$ is to 1.
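To get a feeling for the bound, here is a small numerical sketch. It assumes the bound is read as N = ln(R[1 + V/ε]) / ln(1/ϑ) + 1 with R = ρ(G_1)/ρ(X); the function name `cutting_plane_steps` and the sample values of ϑ, R, V, ε are hypothetical.

```python
import math

def cutting_plane_steps(theta: float, R_ratio: float, V: float, eps: float) -> float:
    """Upper bound N = ln(R_ratio * (1 + V/eps)) / ln(1/theta) + 1 on the number of steps."""
    return math.log(R_ratio * (1.0 + V / eps)) / math.log(1.0 / theta) + 1.0

# Same data, two contraction rates: the factor 1/ln(1/theta) dominates.
N_fast = cutting_plane_steps(theta=0.5, R_ratio=100.0, V=1.0, eps=1e-6)
N_slow = cutting_plane_steps(theta=0.99, R_ratio=100.0, V=1.0, eps=1e-6)

# Squaring the accuracy (1e-6 -> 1e-12) less than doubles the bound.
N_fast_precise = cutting_plane_steps(theta=0.5, R_ratio=100.0, V=1.0, eps=1e-12)
```

With these sample values the step count is a few dozen for ϑ = 0.5 and close to two thousand for ϑ = 0.99, while the required accuracy enters only through the logarithm.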

6.33
“Academic” Implementation: Centers of Gravity

♠ In high dimensions, to ensure progress in volumes of subsequent localizers in a Cutting Plane algorithm is not an easy task: we do not know how the cut through $c_t$ will pass, and thus should select $c_t$ in $G_t$ in such a way that whatever be the cut, it cuts off the current localizer $G_t$ a “meaningful” part of its volume.
♠ The most natural choice of $c_t$ in $G_t$ is the center of gravity:
$c_t = \left[\int_{G_t} x\,dx\right] / \left[\int_{G_t} 1\,dx\right]$,
the expectation of the random vector uniformly distributed on $G_t$.
Good news: The Center of Gravity policy with $G_{t+1} = \widehat{G}_t$ results in
$\vartheta = \left[1 - \left(\frac{n}{n+1}\right)^n\right]^{1/n} \le [0.632...]^{1/n}$ (∗)
This results in the complexity bound (# of steps needed to build $\epsilon$-solution)
$N = 2.2 n \ln\left(\mathcal{R}\left[1 + \frac{V}{\epsilon}\right]\right) + 1$
Note: It can be proved that within absolute constant factor, like 4, this is the best complexity bound achievable by whatever algorithm for convex minimization which can “learn” the objective via First Order oracle only.

6.34
♣ Reason for (*): Brunn-Minkowski Symmetrization Principle:
Let X be a convex compact set in $\mathbf{R}^n$, e be a unit direction and Z be “equi-cross-sectional” to X body symmetric w.r.t. e, so that
• Z is rotationally symmetric w.r.t. the axis e
• for every hyperplane $H = \{x : e^T x = \mathrm{const}\}$, one has
$\mathrm{Vol}_{n-1}(X \cap H) = \mathrm{Vol}_{n-1}(Z \cap H)$

Then Z is a convex compact set.

Equivalently: Let U, V be convex compact nonempty sets in $\mathbf{R}^n$. Then
$\mathrm{Vol}^{1/n}(U + V) \ge \mathrm{Vol}^{1/n}(U) + \mathrm{Vol}^{1/n}(V)$.
In fact, convexity of U, V is redundant!

6.35
Disastrously bad news: Centers of Gravity are not implementable, unless the dimension n of the problem is like 2 or 3.
Reason: We have no control on the shape of localizers. When started with a polytope $G_1$ given by M linear inequalities (e.g., a box), $G_t$ for $t \gg n$ can be a more or less arbitrary polytope given by M + t − 1 linear inequalities. Computing center of gravity of a general-type high-dimensional polytope is a computationally intractable task – it requires astronomically many computations already in the dimensions like 5 – 10.
Remedy: Maintain the shape of $G_t$ simple and convenient for computing centers of gravity, sacrificing, if necessary, the value of $\vartheta$.
The most natural implementation of this remedy is enforcing $G_t$ to be ellipsoids. As a result,
• $c_t$ becomes computable in $O(n^2)$ operations (nice!)
• $\vartheta = [0.632...]^{1/n} \approx \exp\{-0.459/n\}$ increases to $\vartheta \approx \exp\{-0.5/n^2\}$, spoiling the complexity bound
$N = 2.2 n \ln\left(\mathcal{R}\left[1 + \frac{V}{\epsilon}\right]\right) + 1$
to
$N = 4 n^2 \ln\left(\mathcal{R}\left[1 + \frac{V}{\epsilon}\right]\right) + 1$
(unpleasant, but survivable...)

6.36
Practical Implementation - Ellipsoid Method

♠ Ellipsoid in $\mathbf{R}^n$ is the image of the unit n-dimensional ball under one-to-one affine mapping:
$E = E(B, c) = \{x = Bu + c : u^T u \le 1\}$
where B is n × n nonsingular matrix, and $c \in \mathbf{R}^n$.
• c is the center of ellipsoid E = E(B, c): when $c + h \in E$, $c - h \in E$ as well
• When multiplying by n × n matrix B, n-dimensional volumes are multiplied by |Det(B)|
⇒ $\mathrm{Vol}(E(B, c)) = |\mathrm{Det}(B)|$, $\rho(E(B, c)) = |\mathrm{Det}(B)|^{1/n}$.

6.37
E = E(B, c) = {x = Bu + c : uT u ≤ 1}

Simple fact: Let E(B, c) be ellipsoid in $\mathbf{R}^n$ and $e \in \mathbf{R}^n$ be a nonzero vector. The “half-ellipsoid”
$\widehat{E} = \{x \in E(B, c) : e^T x \le e^T c\}$
is covered by the ellipsoid $E^+ = E(B^+, c^+)$ given by
$c^+ = c - \frac{1}{n+1} B p$, $p = B^T e / \sqrt{e^T B B^T e}$
$B^+ = \frac{n}{\sqrt{n^2 - 1}} B + \left(\frac{n}{n+1} - \frac{n}{\sqrt{n^2 - 1}}\right) (Bp) p^T$,
• $E(B^+, c^+)$ is the ellipsoid of the smallest volume containing the half-ellipsoid $\widehat{E}$, and the volume of $E(B^+, c^+)$ is strictly smaller than the one of E(B, c):
$\vartheta := \frac{\rho(E(B^+, c^+))}{\rho(E(B, c))} \le \exp\left\{-\frac{1}{2n^2}\right\}$.
• Given B, c, e, computing $B^+, c^+$ costs $O(n^2)$ arithmetic operations.
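The update formulas of the Simple fact translate directly into code. The sketch below is an illustration in plain Python (lists instead of a linear algebra library), assuming n ≥ 2; the function name `ellipsoid_update` and the test data are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]

def ellipsoid_update(B: Matrix, c: Vector, e: Vector) -> Tuple[Matrix, Vector]:
    """Return (B_plus, c_plus) covering the half-ellipsoid {x in E(B, c): e^T x <= e^T c}."""
    n = len(c)                        # the formulas require n >= 2
    # p = B^T e / ||B^T e||_2
    Bte = [sum(B[i][j] * e[i] for i in range(n)) for j in range(n)]
    norm = math.sqrt(sum(v * v for v in Bte))
    p = [v / norm for v in Bte]
    Bp = [sum(B[i][j] * p[j] for j in range(n)) for i in range(n)]
    # c_plus = c - Bp / (n + 1)
    c_plus = [c[i] - Bp[i] / (n + 1) for i in range(n)]
    # B_plus = alpha * B + (beta - alpha) * (Bp) p^T
    alpha = n / math.sqrt(n * n - 1.0)
    beta = n / (n + 1.0)
    B_plus = [[alpha * B[i][j] + (beta - alpha) * Bp[i] * p[j] for j in range(n)]
              for i in range(n)]
    return B_plus, c_plus

# Unit disk in R^2 cut by e = (1, 0): the half with x1 <= 0 is kept.
B_plus, c_plus = ellipsoid_update([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 0.0])
det_plus = B_plus[0][0] * B_plus[1][1] - B_plus[0][1] * B_plus[1][0]
```

In this example the new center is (−1/3, 0) and the volume ratio is 4/(3√3) ≈ 0.77, below the bound exp{−1/(2n)} ≈ 0.78 for n = 2. Each call performs a fixed number of matrix-vector products and one rank-one update, in line with the O(n²) operation count.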

6.38

♣ Ellipsoid method is the Cutting Plane Algorithm where
• all localizers $G_t$ are ellipsoids: $G_t = E(B_t, c_t)$,
• the search point at step t is $c_t$, and
• $G_{t+1}$ is the smallest volume ellipsoid containing the half-ellipsoid
$\widehat{G}_t = \{x \in G_t : e_t^T x \le e_t^T c_t\}$
Computationally, at every step of the algorithm we once call the Separation oracle $\mathrm{Sep}_X$, (at most) once call the First Order oracle $O_f$ and spend $O(n^2)$ operations to update $(B_t, c_t)$ into $(B_{t+1}, c_{t+1})$ by explicit formulas.
♠ Complexity bound of the Ellipsoid algorithm is
$N = 4 n^2 \ln\left(\mathcal{R}\left[1 + \frac{V}{\epsilon}\right]\right) + 1$, where $\mathcal{R} = \frac{\rho(G_1)}{\rho(X)} \le \frac{R}{r}$, $V = \max_{x \in X} f(x) - \min_{x \in X} f(x)$
Pay attention:
• $\mathcal{R}, V, \epsilon$ are under log ⇒ large magnitudes in data entries and high accuracy are not issues
• the factor at the log depends only on the structural parameter of the problem (its design dimension n) and is independent of the remaining data.
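Putting the pieces together, a bare-bones version of the method might look as follows. This is a sketch under stated assumptions, not a production implementation: it works in plain floating point, takes G_1 to be the ball of radius R centered at the origin, requires n ≥ 2, and the names `ellipsoid_method`, `sep_box` and `oracle`, as well as the test problem, are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]

def ellipsoid_method(sep: Callable[[Vector], Optional[Vector]],
                     oracle: Callable[[Vector], Tuple[float, Vector]],
                     n: int, R: float, steps: int) -> Tuple[Optional[Vector], float]:
    """Minimize f over X contained in {||x||_2 <= R}; return the best feasible search point.

    sep(x) returns None if x is in X, otherwise a separator e != 0.
    oracle(x) returns (f(x), a subgradient of f at x).
    """
    B = [[R if i == j else 0.0 for j in range(n)] for i in range(n)]  # G_1 = E(R*I, 0)
    c = [0.0] * n
    best_x, best_f = None, float("inf")
    alpha, beta = n / math.sqrt(n * n - 1.0), n / (n + 1.0)
    for _ in range(steps):
        e = sep(c)
        if e is None:                        # productive step: c_t is feasible
            fc, e = oracle(c)
            if fc < best_f:
                best_x, best_f = list(c), fc
            if all(v == 0.0 for v in e):     # zero subgradient: c_t is optimal
                break
        # Update (B_t, c_t) -> (B_{t+1}, c_{t+1}) by the half-ellipsoid formulas.
        Bte = [sum(B[i][j] * e[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(v * v for v in Bte))
        p = [v / norm for v in Bte]
        Bp = [sum(B[i][j] * p[j] for j in range(n)) for i in range(n)]
        c = [c[i] - Bp[i] / (n + 1) for i in range(n)]
        B = [[alpha * B[i][j] + (beta - alpha) * Bp[i] * p[j] for j in range(n)]
             for i in range(n)]
    return best_x, best_f

def sep_box(x: Vector) -> Optional[Vector]:
    """Separation oracle for the box X = [-1, 1]^n."""
    for i, xi in enumerate(x):
        if abs(xi) > 1.0:
            e = [0.0] * len(x)
            e[i] = 1.0 if xi > 0 else -1.0
            return e
    return None

def oracle(x: Vector) -> Tuple[float, Vector]:
    """f(x) = (x1 - 2)^2 + (x2 - 0.5)^2 and its gradient."""
    return ((x[0] - 2.0) ** 2 + (x[1] - 0.5) ** 2,
            [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 0.5)])

# Optimal solution is (1, 0.5) with Opt = 1; G_1 is the disk of radius 2.
x_best, f_best = ellipsoid_method(sep_box, oracle, n=2, R=2.0, steps=200)
```

Note that the feasible set enters only through `sep_box`: the method never needs an explicit list of constraints, only a way to separate an infeasible point from X.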

6.39
What is Inside Simple Fact

♠ Messy formulas describing the updating
$(B_t, c_t) \to (B_{t+1}, c_{t+1})$
in fact are easy to get.
• Ellipsoid E is the image of the unit ball B under affine transformation. Affine transformation preserves ratio of volumes
⇒ Finding the smallest volume ellipsoid containing a given half-ellipsoid $\widehat{E}$ reduces to finding the smallest volume ellipsoid $B^+$ containing half-ball $\widehat{B}$:
[Figure: the affine mapping x = c + Bu takes B, $\widehat{B}$ and $B^+$ onto E, $\widehat{E}$ and $E^+$]
• The “ball” problem is highly symmetric, and solving it reduces to a simple exercise in elementary Calculus.

6.40
Why Ellipsoids?

(?) When enforcing the localizers to be of “simple and stable” shape, why do we make them ellipsoids (i.e., affine images of the unit Euclidean ball), and not something else, say parallelotopes (affine images of the unit box)?
Answer: In a “simple stable shape” version of Cutting Plane Scheme all localizers are affine images of some fixed n-dimensional solid C (closed and bounded convex set in $\mathbf{R}^n$ with a nonempty interior). To allow for reducing step by step volumes of localizers, C cannot be arbitrary. What we need is the following property of C:
One can fix a point c in C in such a way that whatever be a cut
$\widehat{C} = \{x \in C : e^T x \le e^T c\}$ [$e \neq 0$]
this cut can be covered by the affine image of C with the volume less than the one of C:
$\exists B, b: \widehat{C} \subset BC + b$ & $|\mathrm{Det}(B)| < 1$ (!)
Note: The Ellipsoid method corresponds to unit Euclidean ball in the role of C and to c = 0, which allows to satisfy (!) with $|\mathrm{Det}(B)| \le \exp\{-\frac{1}{2n}\}$, finally yielding $\vartheta \le \exp\{-\frac{1}{2n^2}\}$.

6.41
• Solids C with the above property are “rare commodity.” For example, n-dimensional box does not possess it.
• Another “good” solid is n-dimensional simplex (this is not that easy to see!). Here (!) can be satisfied with $|\mathrm{Det}(B)| \le \exp\{-O(1/n^2)\}$, finally yielding $\vartheta = 1 - O(1/n^3)$.
⇒ From the complexity viewpoint, “simplex” Cutting Plane algorithm is worse than the Ellipsoid method.
The same is true for the handful of other (quite exotic) “good solids” known so far.

6.42
Ellipsoid Method: pro’s & con’s

♣ Academically speaking, Ellipsoid method is an indispensable tool underlying basically all results on efficient solvability of generic convex problems, most notably, the famous theorem of L. Khachiyan (1978) on polynomial time solvability of Linear Programming with rational data in Rational Arithmetic Complexity model.
♠ What matters from theoretical perspective is “universality” of the algorithm (nearly no assumptions on the problem except for convexity) and complexity bound of the form “structural parameter outside of log, all else, including required accuracy, under the log.”
♠ Another theoretical (and to some extent, also practical) advantage of the Ellipsoid algorithm is that as far as the representation of the feasible set X is concerned, all we need is a Separation oracle, and not the list of constraints describing X. The number of these constraints can be astronomically large, making it impossible to check feasibility by looking at the constraints one by one; however, in many important situations the constraints are “well organized,” allowing to implement Separation oracle efficiently.

6.43
♠ Theoretically, the only (and minor!) drawbacks of the algorithm are the necessity for the feasible set X to be bounded, with known “upper bound,” and to possess nonempty interior.
As of now, there is no way to cure the first drawback without sacrificing universality. The second “drawback” is an artifact: given nonempty set
$X = \{x : g_i(x) \le 0, 1 \le i \le m\}$,
we can extend it to
$X_\epsilon = \{x : g_i(x) \le \epsilon, 1 \le i \le m\}$,
thus making the interior nonempty, and minimize the objective within accuracy $\epsilon$ on this larger set, seeking for $\epsilon$-optimal $\epsilon$-feasible solution instead of $\epsilon$-optimal and exactly feasible one.
This is quite natural: to find a feasible solution is, in general, not easier than to find an optimal one. Thus, either ask for exactly feasible and exactly optimal solution (which beyond LO is unrealistic), or allow for controlled violation in both feasibility and optimality!

6.44
♠ From practical perspective, theoretical drawbacks of the Ellipsoid method become irrelevant: for all practical purposes, bounds on the magnitude of variables like $10^{100}$ are the same as no bounds at all, and infeasibility like $10^{-10}$ is the same as feasibility. And since the bounds on the variables and the infeasibility are under log in the complexity estimate, $10^{100}$ and $10^{-10}$ are not a disaster.
♠ Practical limitations (rather severe!) of Ellipsoid algorithm stem from method's sensitivity to problem's design dimension n. Theoretically, with $\epsilon, V, \mathcal{R}$ fixed, the number of steps grows with n as $n^2$, and the effort per step is at least $O(n^2)$ arithmetic operations.
⇒ Theoretically, computational effort grows with n at least as $O(n^4)$,
⇒ n like 1000 and more is beyond the “practical grasp” of the algorithm.
Note: Nearly all modern applications of Convex Optimization deal with n in the range of tens and hundreds of thousands!

6.45
♠ By itself, growth of theoretical complexity with n as $n^4$ is not a big deal: for Simplex
method, this growth is exponential rather than polynomial, and nobody dies – in reality,
Simplex does not work according to its disastrous theoretical complexity bound.
Ellipsoid algorithm, unfortunately, works more or less according to its complexity bound.
⇒ Practical scope of Ellipsoid algorithm is restricted to convex problems with few tens of
variables.
However: Low-dimensional convex problems from time to time do arise in applications.
More importantly, these problems arise “on a permanent basis” as auxiliary problems within
some modern algorithms aimed at solving extremely large-scale convex problems.
⇒ The scope of practical applications of Ellipsoid algorithm is nonempty, and within this
scope, the algorithm, due to its ability to produce high-accuracy solutions (and surprising
stability to rounding errors) can be considered as the method of choice.

6.46
How It Works
$\mathrm{Opt} = \min_{x \in X} f(x)$, $X = \{x \in \mathbf{R}^n : a_i^T x - b_i \le 0, 1 \le i \le m\}$

♠ Real-life problem with n = 10 variables and m = 81,963,927 “well-organized” linear constraints:

CPU, sec      t       f(x^t)     f(x^t) − Opt ≤    ρ(G_t)/ρ(G_1)
  0.01        1      0.000000        6.7e4            1.0e0
  0.53       63      0.000000        6.7e3            4.2e-1
  0.60      176      0.000000        6.7e2            8.9e-2
  0.61      280      0.000000        6.6e1            1.5e-2
  0.63      436      0.000000        6.6e0            2.5e-3
  1.17      895     -1.615642        6.3e-1           4.2e-5
  1.45     1250     -1.983631        6.1e-2           4.7e-6
  1.68     1628     -2.020759        5.9e-3           4.5e-7
  1.88     1992     -2.024579        5.9e-4           4.5e-8
  2.08     2364     -2.024957        5.9e-5           4.5e-9
  2.42     2755     -2.024996        5.7e-6           4.1e-10
  2.66     3033     -2.024999        9.4e-7           7.6e-11

6.47
♠ Similar problem with n = 30 variables and m = 1,462,753,730 “well-organized” linear constraints:

CPU, sec      t        f(x^t)     f(x^t) − Opt ≤    ρ(G_t)/ρ(G_1)
  0.02        1       0.000000        5.9e5            1.0e0
  1.56      649       0.000000        5.9e4            5.0e-1
  1.95     2258       0.000000        5.9e3            8.1e-2
  2.23     4130       0.000000        5.9e2            8.5e-3
  5.28     7080     -19.044887        5.9e1            8.6e-4
 10.13    10100     -46.339639        5.7e0            1.1e-4
 15.42    13308     -49.683777        5.6e-1           1.1e-5
 19.65    16627     -50.034527        5.5e-2           1.0e-6
 25.12    19817     -50.071008        5.4e-3           1.1e-7
 31.03    23040     -50.074601        5.4e-4           1.1e-8
 37.84    26434     -50.074959        5.4e-5           1.0e-9
 45.61    29447     -50.074996        5.3e-6           1.2e-10
 52.35    31983     -50.074999        1.0e-6           2.0e-11

6.48
From Ellipsoid Method to Polynomial Time Solvability of Linear Programming

♣ Theorem [L. Khachiyan, 1978] A Linear Programming problem
$\min_x \{c^T x : Ax \le b\}$
with rational data admits polynomial time solution algorithm: an optimal solution to the problem (or a correct conclusion that no solution exists) can be found in polynomial time, that is, in number of bitwise arithmetic operations polynomial in the bitlength L (total number of bits) of the data.

6.49
♠ Main Lemma: Given a system Ax ≤ b of linear inequalities with rational data, one can
decide whether or not the system is solvable in polynomial time.
♠ Proof of Main Lemma. Eliminating from A columns which are linear combinations of
the remaining columns does not affect solvability of the system Ax ≤ b, and selecting the
maximal linearly independent set of columns in A is a simple Linear Algebra problem which
can be solved in polynomial time.
⇒ We may assume without loss of generality that the columns of A are linearly independent,
or, which is the same, that the solution set does not contain lines. By similar argument,
we may assume without loss of generality that the data are integer.

6.50
Step 1: Reformulating the problem. Assuming that A is m × n, observe that system Ax ≤ b is feasible if and only if the optimal value Opt in the optimization problem
$\mathrm{Opt} = \min_{x \in \mathbf{R}^n} \left\{ f(x) = \max_{1 \le i \le m} [Ax - b]_i \right\}$ (∗)
is nonpositive.
Strategy: (∗) is a convex minimization problem with easy to compute objective
⇒ We could try to check whether or not Opt ≤ 0 by solving the problem by the Ellipsoid
method.
Immediate obstacles:
• The domain of (∗) is the entire space, while the Ellipsoid method requires the domain to
be bounded.
• The Ellipsoid method allows to find high accuracy approximate solutions, while we need
to distinguish between the cases Opt ≤ 0 and Opt > 0, which seems to require finding
precise solution.

6.51
Removing the obstacles, I
Ax ≤ b is feasible
⇔ $\mathrm{Opt} = \inf_{x \in \mathbf{R}^n} \left\{ f(x) = \max_{1 \le i \le m} [Ax - b]_i \right\} \le 0$ (∗)

♠ Fact I: Opt ≤ 0 if and only if
$\mathrm{Opt}_* = \min_x \left\{ f(x) = \max_{1 \le i \le m} [Ax - b]_i : \|x\|_\infty \le 2^L \right\} \le 0$ (!)
Indeed, the polyhedral set {x : Ax ≤ b} does not contain lines and therefore is nonempty if and only if the set possesses an extreme point $\bar{x}$.
A. By characterization of extreme points of polyhedral sets, we should have $\bar{A} \bar{x} = \bar{b}$ where $\bar{A}$ is a nonsingular n × n submatrix of the m × n matrix A, and $\bar{b}$ is the respective subvector of b.
⇒ by Cramer's rule, $\bar{x}_j = \Delta_j / \Delta$, where $\Delta \neq 0$ is the determinant of $\bar{A}$ and $\Delta_j$ is the determinant of the matrix $\bar{A}_j$ obtained from $\bar{A}$ when replacing j-th column with $\bar{b}$.
B. Since $\bar{A}$ is with integer entries, its determinant is integer; since it is nonzero, we have $|\Delta| \ge 1$.
Since $\bar{A}_j$ is with integer entries of total bit length ≤ L, the magnitude of its determinant is at most $2^L$ (an immediate corollary of the definition of the total bit length and the Hadamard's Inequality stating that the magnitude of a determinant does not exceed the product of Euclidean lengths of its rows).
Combining A and B, we get $|\bar{x}_j| \le 2^L$ for all j. □

6.52
Removing the obstacles, II
Ax ≤ b is feasible
⇔ $\mathrm{Opt} = \inf_{x \in \mathbf{R}^n} \max_{1 \le i \le m} [Ax - b]_i \le 0$ (∗)
⇔ $\mathrm{Opt}_* := \min_x \left\{ \max_{1 \le i \le m} [Ax - b]_i : \|x\|_\infty \le 2^L \right\} \le 0$ (!)

♠ Fact II: Let A, b be with integer entries of the total bit length L and the columns of A be linearly independent. If Opt is positive, $\mathrm{Opt}_*$ is not too small, specifically, $\mathrm{Opt}_* \ge 2^{-2L}$.
Indeed, assume that Opt > 0. Note that $\mathrm{Opt}_* \ge \mathrm{Opt}$ and that
$\mathrm{Opt} = \min_{t, x} \{t : Ax - t\mathbf{1} \le b\} > 0$ [$\mathbf{1} = [1; ...; 1]$]
It is immediately seen that when Opt is positive, the feasible domain of this (clearly feasible) LP does not contain lines, and thus the problem has an extreme point solution.
⇒ Opt is just a coordinate in an extreme point $\bar{y}$ of the polyhedral set $\{y = [x; t] : A^+ y \le b\}$, $A^+ = [A, -\mathbf{1}]$. Note that the bit length of $(A^+, b)$ is at most 2L.
⇒ [the same argument as in Fact I] $\mathrm{Opt} = \Delta' / \Delta''$ with integer $\Delta'' \neq 0$, $\Delta'$ of magnitudes not exceeding $2^{2L}$. Since in addition Opt > 0, we conclude that $\mathrm{Opt} \ge 2^{-2L}$, whence $\mathrm{Opt}_* \ge \mathrm{Opt} \ge 2^{-2L}$, as claimed.

6.53
♠ Bottom line: When A, b are with integer data of total bit length L and the columns in A are linearly independent, checking whether the system Ax ≤ b is solvable reduces to deciding on two hypotheses about
$\mathrm{Opt}_* := \min_x \left\{ \max_{1 \le i \le m} [Ax - b]_i : \|x\|_\infty \le 2^L \right\}$ (!)
the first stating that $\mathrm{Opt}_* \le 0$, and the second stating that $\mathrm{Opt}_* \ge 2^{-2L}$.
• To decide correctly on the hypotheses, it clearly suffices to approximate $\mathrm{Opt}_*$ within accuracy $\epsilon = \frac{1}{3} 2^{-2L}$.
Invoking the efficiency estimate of the Ellipsoid method and taking into account that by evident reasons n ≤ L, it is immediately seen that resolving the resulting task requires polynomial in L number of arithmetic operations, including those to mimic Separation and First Order oracles.
However: The Ellipsoid algorithm uses precise real arithmetic, while we want to check feasibility in polynomial in L number of bitwise operations. What to do?
• Straightforward (albeit tedious) analysis shows that we lose nothing when replacing the precise real arithmetic with imprecise one, where one keeps O(nL) digits in the results before and after the dot. With this implementation, the procedure becomes “fully finite” and requires polynomial in L number of bitwise operations. □

6.54
From Checking Feasibility to Finding Solution

♠ Note: Solving LP with rational data of bitlength L reduces to solving system of linear
inequalities with rational data of bitlength O(L) (write down the primal and the dual con-
straints and add the inequality “duality gap is ≤ 0”)
⇒ The only thing which is still missing is how to reduce, in a polynomial time fashion,
finding a solution, if any, to a system of linear inequalities with rational data to checking
feasibility of systems of linear inequalities with rational data.

6.55
♣ How to reduce in a polynomial time fashion finding a solution to checking feasibility?
♠ Reduction: To find a solution, if any, to a system S of m linear inequalities and equalities
with rational data, we
• Check in polynomial time whether S is solvable. If not, we are done, otherwise we proceed
as follows.
• We convert the first inequality in S, if any, into equality and check in polynomial time
whether the resulting system is solvable. If yes, this is our new system S1 , otherwise S1 is
obtained from S by eliminating the first inequality.
Note: As it is immediately seen, S1 is solvable, and every feasible solution to S1 is feasible for S. Thus,
•Given a solvable system of m linear inequalities and equalities, we can in polynomial time
replace it with another solvable system of (at most) m linear inequalities and equalities,
strictly reducing the number of inequalities, provided it was positive, and ensuring that
every feasible solution to S1 is feasible for S. Besides this, the bitlength of the data of S1
is (at most) the total bitlength L of the data of S.

6.56
• Iterating the above construction, we in at most m steps end up with a solvable system S ∗
of linear equations such that every feasible solution to S ∗ is feasible for the original system
S.
⇒ Finding a solution to a system S of linear inequalities and equations with rational data
indeed reduces in a polynomial time fashion to polynomially solvable, via elementary Linear
Algebra, problem of solving a system of linear equations with rational data.
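The reduction can be illustrated on a toy scale. The sketch below is an assumption-laden stand-in, not the construction from the theorem: systems have a single variable x, so that feasibility can be checked exactly by interval arithmetic in place of the polynomial time feasibility checker; the names `is_solvable` and `find_solution` and the sample system are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

# (a, b, kind) encodes a*x <= b when kind == "<=" and a*x == b when kind == "==".
Constraint = Tuple[Fraction, Fraction, str]

def is_solvable(system: List[Constraint]) -> bool:
    """Toy feasibility check for a system in ONE variable (stand-in for the real checker)."""
    lo: Optional[Fraction] = None    # None means "no lower bound yet"
    hi: Optional[Fraction] = None    # None means "no upper bound yet"
    for a, b, kind in system:
        if a == 0:
            if (kind == "<=" and b < 0) or (kind == "==" and b != 0):
                return False
            continue
        v = b / a
        if kind == "==" or a < 0:    # x == v or x >= v
            lo = v if lo is None else max(lo, v)
        if kind == "==" or a > 0:    # x == v or x <= v
            hi = v if hi is None else min(hi, v)
    return lo is None or hi is None or lo <= hi

def find_solution(system: List[Constraint]) -> Optional[Fraction]:
    """Reduce finding a solution to repeated feasibility checks."""
    if not is_solvable(system):
        return None
    current = list(system)
    i = 0
    while i < len(current):
        a, b, kind = current[i]
        if kind == "<=":
            trial = current[:i] + [(a, b, "==")] + current[i + 1:]
            if is_solvable(trial):
                current = trial      # keep the inequality as an equality
                i += 1
            else:
                del current[i]       # otherwise eliminate the inequality
        else:
            i += 1
    # Only equalities remain; any one with a != 0 determines x.
    for a, b, _ in current:
        if a != 0:
            return b / a
    return Fraction(0)               # only trivial equalities 0 == 0 remain

F = Fraction
system = [(F(1), F(3), "<="), (F(-1), F(-1), "<="), (F(2), F(10), "<=")]
x = find_solution(system)
no_solution = find_solution([(F(1), F(0), "<="), (F(-1), F(-1), "<=")])
```

For the sample system {x ≤ 3, x ≥ 1, 2x ≤ 10} the first inequality is turned into x = 3 and the other two are eliminated, so the final system of equations is just x = 3. The number of feasibility checks is at most one plus the number of inequalities.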

6.57
What is Ahead

♣ The theorem on polynomial time solvability of Linear Optimization is “constructive” – we can explicitly point out the underlying polynomial time solution algorithm (e.g., the Ellipsoid method). However, from the practical viewpoint this is a kind of “existence theorem” – the resulting complexity bounds, although polynomial, are “too large” for practical large-scale computations.
The intrinsic drawback of the Ellipsoid method (and all other “universal” polynomial time methods in Convex Programming) is that the method utilizes just the convex structure of instances and is unable to exploit our a priori knowledge of the particular analytic structure of these instances.
• In the late 1980's, a new family of polynomial time methods for “well-structured” generic convex programs was found – the Interior Point methods, which indeed are able to exploit our knowledge of the analytical structure of instances.

6.58
• LO and its extensions – Conic Quadratic Optimization CQO and Semidefinite Optimization
SDO – are especially well-suited for processing by the IP methods, and these methods yield
the best known so far theoretical complexity bounds for the indicated generic problems.

♣ As far as practical computations are concerned, the IP methods

• in the case of Linear Optimization, are competitive with the Simplex method

• in the case of Conic Quadratic and Semidefinite Optimization are the best known so far
numerical techniques.

6.59
From Linear to Conic Optimization
Conic Optimization: Why?

♠ “Universal” Convex Optimization algorithms, like the Ellipsoid method, are blind (scientifically: “black box oriented”) – they do not utilize problem's structure, aside of convexity, and “learn” the problem via local information (values and (sub)gradients of objective and constraints along search points).
At present level of our knowledge, this implies severe limitations on the sizes of convex
problems amenable to “universal” algorithms.
Note: A convex program always has a lot of structure – otherwise how could we know that
the problem is convex?
A good algorithm should utilize a priori knowledge of problem’s structure in order to accel-
erate the solution process.

7.1

Example: The LP Simplex Method is fully adjusted to the particular structure of an LO problem. Although by far inferior to the Ellipsoid method in the worst case, Simplex Method in reality is capable of solving LO's with tens and hundreds of thousands of variables and constraints – a task which is by far out of reach of the theoretically efficient “universal” black box oriented algorithms.

7.2
♠ Before utilizing structure of a convex program, one should “reveal” it.
Revealing structure is a highly challenging task: it is unclear what we are looking for until
we find it!
♠ The most useful, as of now, “structure revealing” form of convex program – Conic
Optimization – was found in early 1990’s. The idea behind looks really striking (if not
crazy):
• Traditionally, when passing from a LO problem
$\min_x \{c^T x : Ax - b \le 0\}$ (P)
to a convex one,
• linear objective $c^T x$ is replaced with convex objective, and
• affine in x left hand side Ax − b in the vector inequality constraint Ax − b ≤ 0 is replaced with entrywise convex vector-valued function A(x), yielding the vector inequality constraint A(x) ≤ 0.
In Conic Optimization, we keep the objective and the left hand side in the vector inequality Ax − b ≤ 0 linear/affine, and “introduce nonlinearity” in what “≤ 0” means!
Note: This is not as crazy as it looks. When comparing numbers, there is only one meaningful notion of ≤. Inequality ≤ in (P) is something different: it is specific “entrywise” inequality between vectors, with “a ≤ 0” meaning “all entries in vector a are nonpositive.” On closer inspection, the entrywise vector inequality “≤” is neither the only possible, nor the only useful way to compare vectors, so why stick to the entrywise ≤?

7.3
♣ A Conic Programming optimization program is
$\mathrm{Opt} = \min_x \{c^T x : Ax - b \in K\}$, (C)
where $K \subset \mathbf{R}^m$ is a regular cone.
♠ Regularity of K means that
• K is convex cone: $(x_i \in K, \lambda_i \ge 0, 1 \le i \le p) \Rightarrow \sum_i \lambda_i x_i \in K$
• K is pointed: $\pm a \in K \Leftrightarrow a = 0$
• K is closed: $x_i \in K, \lim_{i \to \infty} x_i = x \Rightarrow x \in K$
• K has a nonempty interior int K: $\exists (\bar{x} \in K, r > 0): \{x : \|x - \bar{x}\|_2 \le r\} \subset K$
Example: The nonnegative orthant
$\mathbf{R}^m_+ = \{x \in \mathbf{R}^m : x_i \ge 0, 1 \le i \le m\}$
is a regular cone, and the associated conic problem (C) is just the usual LO program.
Fact: When passing from LO programs (i.e., conic programs associated with nonnegative
orthants) to conic programs associated with properly chosen wider families of cones, we
extend dramatically the scope of applications we can process, while preserving the major
part of LO theory and preserving our abilities to solve problems efficiently.

7.4
• Let $K \subset \mathbf{R}^m$ be a regular cone. We can associate with K two relations between vectors of $\mathbf{R}^m$:
• “nonstrict K-inequality” $\ge_K$:
$a \ge_K b \Leftrightarrow a - b \in K$
• “strict K-inequality” $>_K$:
$a >_K b \Leftrightarrow a - b \in \mathrm{int}\, K$
Example: when $K = \mathbf{R}^m_+$, $\ge_K$ is the usual “coordinate-wise” nonstrict inequality “≥” between vectors $a, b \in \mathbf{R}^m$:
$a \ge b \Leftrightarrow a_i \ge b_i, 1 \le i \le m$
while $>_K$ is the usual “coordinate-wise” strict inequality “>” between vectors $a, b \in \mathbf{R}^m$:
$a > b \Leftrightarrow a_i > b_i, 1 \le i \le m$
♣ K-inequalities share the basic algebraic and topological properties of the usual coordinate-
wise ≥ and >, for example:
♠ ≥K is a partial order:
• a ≥K a (reflexivity),
• a ≥K b and b ≥K a ⇒ a = b (anti-symmetry)
• a ≥K b and b ≥K c ⇒ a ≥K c (transitivity)

7.5
♠ ≥K is compatible with linear operations:
• a ≥K b and c ≥K d ⇒ a + c ≥K b + d,
• a ≥K b and λ ≥ 0 ⇒ λa ≥K λb
♠ ≥K is stable w.r.t. passing to limits:
ai ≥K bi , ai → a, bi → b as i → ∞ ⇒ a ≥K b
♠ >K satisfies the usual arithmetic properties, like
• a >K b and c ≥K d ⇒ a + c >K b + d
• a >K b and λ > 0 ⇒ λa >K λb
and is stable w.r.t. perturbations: if $a >_K b$, then $a' >_K b'$ whenever $a'$ is close enough to a and $b'$ is close enough to b.
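The definition a ≥_K b ⇔ a − b ∈ K is easy to experiment with once a membership test for K is available. A minimal sketch for the nonnegative orthant follows; the names `in_orthant` and `geq_K` and the sample vectors are hypothetical.

```python
from typing import Callable, Sequence

Cone = Callable[[Sequence[float]], bool]

def in_orthant(x: Sequence[float]) -> bool:
    """Membership in the nonnegative orthant R^m_+."""
    return all(v >= 0 for v in x)

def geq_K(a: Sequence[float], b: Sequence[float], in_K: Cone) -> bool:
    """a >=_K b  <=>  a - b belongs to K."""
    return in_K([ai - bi for ai, bi in zip(a, b)])

a, b, c = [3.0, 2.0], [1.0, 2.0], [0.0, 5.0]
a_geq_b = geq_K(a, b, in_orthant)            # a - b = (2, 0) is in K
b_geq_a = geq_K(b, a, in_orthant)            # b - a = (-2, 0) is not in K
# The order is only partial: a and c are incomparable.
incomparable = not geq_K(a, c, in_orthant) and not geq_K(c, a, in_orthant)
```

Passing the membership test as an argument keeps `geq_K` usable with any other cone for which such a test is available.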
♣ Note: Conic program associated with a regular cone K can be written down as
$\min_x \{c^T x : Ax - b \ge_K 0\}$
Note: Every convex program can be equivalently reformulated as a conic one.

7.6
Basic Operations with Cones

♣ Given regular cones $K_\ell \subset \mathbf{R}^{m_\ell}$, $1 \le \ell \le L$, we can form their direct product
$K = K_1 \times ... \times K_L = \{x = [x_1; ...; x_L] : x_\ell \in K_\ell\ \forall \ell\} \subset \mathbf{R}^{m_1 + ... + m_L} = \mathbf{R}^{m_1} \times ... \times \mathbf{R}^{m_L}$,
and this direct product is a regular cone.
Example: $\mathbf{R}^m_+$ is the direct product of m nonnegative rays $\mathbf{R}_+ = \mathbf{R}^1_+$.

♣ Given a regular cone $K \subset \mathbf{R}^m$, we can build its dual cone
$K_* = \{x \in \mathbf{R}^m : x^T y \ge 0\ \forall y \in K\}$
The cone $K_*$ is regular, and $(K_*)_* = K$.
Example: $\mathbf{R}^m_+$ is self-dual: $(\mathbf{R}^m_+)_* = \mathbf{R}^m_+$.
♣ Fact: The cone dual to a direct product of regular cones is the direct product of the dual cones of the factors:
$(K_1 \times ... \times K_L)_* = (K_1)_* \times ... \times (K_L)_*$

7.7
Data and Structure of Conic Program
$\min_{x \in \mathbf{R}^n} \{c^T x : Ax - b \ge_K 0\}$ (CP)

♠ When asked “what is the data, and what is the structure in (CP)”, everybody will give the same answer:
The structure “sits” in the cone K (and in n), and the data are the entries in c, A, b.
But: General type convex cone is as “unstructured” as a general type convex function. Why not say that in a convex program in the MP form
$\min_{x \in \mathbf{R}^n} \{f(x) : g_i(x) \le 0, 1 \le i \le m\}$
the structure “sits” in the convex functions $f, g_1, ..., g_m$ (and m, n) – definitely true and absolutely useless!
♠ Fact: Conic problems associated with just three specific families of cones cover nearly
all (for all practical purposes – just all) applications of Convex Optimization.
Cones from the three “magic” families possess transparent structure fully utilized by theo-
retically (and practically!) efficient Interior Point methods “tailored” to these cones.
⇒ Reformulating a convex program as a conic program from a “magic family” allows us to
process the problem by highly efficient dedicated algorithms.

7.8
Linear/Conic Quadratic/Semidefinite Optimization

♣ The three magic families of cones are


• Direct products of nonnegative rays – nonnegative orthants giving rise to Linear Opti-
mization,
• Direct products of Lorentz cones giving rise to Conic Quadratic Optimization, a.k.a.
Second Order Cone Optimization,
• Direct products of Semidefinite cones giving rise to Semidefinite Optimization.

7.9
Linear Optimization

♣ Let $\mathcal{K}$ = LO be the family of all nonnegative orthants, i.e., all direct products of nonnegative rays. Conic programs associated with cones from $\mathcal{K}$ are exactly the LO programs
$$\min_x \Big\{ c^T x : \underbrace{a_i^T x - b_i \ge 0,\ 1 \le i \le m}_{\Leftrightarrow\ Ax - b\ \ge_{\mathbf{R}^m_+}\ 0} \Big\}$$

7.10
Conic Quadratic Optimization

♣ The Lorentz cone $\mathbf{L}^m$ of dimension m is the regular cone in $\mathbf{R}^m$ given by
$$\mathbf{L}^m = \Big\{x \in \mathbf{R}^m : x_m \ge \sqrt{x_1^2 + \dots + x_{m-1}^2}\Big\}$$
This cone is self-dual.
♠ Let $\mathcal{K}$ = CQP be the family of all direct products of Lorentz cones. Conic programs
associated with cones from $\mathcal{K}$ are called conic quadratic programs.
The “Mathematical Programming” form of a conic quadratic program is
$$\min_x \Big\{ c^T x : \underbrace{\|P_i x - p_i\|_2 \le q_i^T x + r_i}_{\Leftrightarrow\ [P_i x - p_i;\ q_i^T x + r_i]\ \in\ \mathbf{L}^{m_i}},\ 1 \le i \le m \Big\}$$
Note: According to our convention “the sum over an empty set is 0”, $\mathbf{L}^1 = \mathbf{R}_+$ is the nonnegative
ray
⇒ All LO programs are Conic Quadratic ones.
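The equivalence between the norm constraint and Lorentz cone membership can be checked numerically. The following is a minimal sketch in plain Python; the helper names `in_lorentz_cone` and `cq_constraint_vector`, the tolerance and the sample data are illustrative assumptions, not part of the course material.

```python
import math

def in_lorentz_cone(x, tol=1e-12):
    """True if x is in the Lorentz cone: last entry >= Euclidean norm of the rest."""
    *head, last = x
    return last >= math.sqrt(sum(v * v for v in head)) - tol

def cq_constraint_vector(P, p, q, r, x):
    """Build [P x - p; q^T x + r]; its cone membership encodes ||P x - p||_2 <= q^T x + r."""
    top = [sum(Pij * xj for Pij, xj in zip(row, x)) - pi for row, pi in zip(P, p)]
    bottom = sum(qj * xj for qj, xj in zip(q, x)) + r
    return top + [bottom]

# Hypothetical data: the constraint ||x||_2 <= 5 in R^2.
P = [[1.0, 0.0], [0.0, 1.0]]
p = [0.0, 0.0]
q = [0.0, 0.0]
r = 5.0

inside = in_lorentz_cone(cq_constraint_vector(P, p, q, r, [3.0, 4.0]))   # norm is exactly 5
outside = in_lorentz_cone(cq_constraint_vector(P, p, q, r, [3.0, 4.5]))  # norm exceeds 5
ray = in_lorentz_cone([2.0])  # L^1 is the nonnegative ray (empty sum is 0)
```

With one-dimensional input the "head" is empty, so the test reduces to nonnegativity, matching the convention used in the note above.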

7.11
Semidefinite Optimization

♣ The semidefinite cone $\mathbf{S}^m_+$ of order m “lives” in the space $\mathbf{S}^m$ of real symmetric m × m matrices
and is comprised of positive semidefinite m × m matrices, i.e., symmetric m × m matrices A
such that $d^T A d \ge 0$ for all d.
♥ Equivalent descriptions of positive semidefiniteness: A symmetric m × m matrix A is
positive semidefinite (notation: $A \succeq 0$) if and only if it possesses any one of the following
properties:
• All eigenvalues of A are nonnegative, that is,
$$A = U\,\mathrm{Diag}\{\lambda\}\,U^T$$
with orthogonal U and nonnegative λ.
Note: In the representation $A = U\,\mathrm{Diag}\{\lambda\}\,U^T$ with orthogonal U, λ = λ(A) is the vector of
eigenvalues of A taken with their multiplicities
• $A = D D^T$ for a rectangular matrix D, or, equivalently, A is the sum of dyadic matrices:
$$A = \sum_\ell d_\ell d_\ell^T$$
• All principal minors of A are nonnegative.
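To see why a matrix of the form D Dᵀ is positive semidefinite, note that zᵀ(D Dᵀ)z = ‖Dᵀz‖₂² ≥ 0. The sketch below checks this identity on a small integer example; the matrix `D`, the test vectors and the function names are assumptions chosen only for illustration.

```python
def gram(D):
    """Return A = D D^T for a rectangular matrix D given as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in D] for ri in D]

def quad_form(A, z):
    """Return z^T A z."""
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

def norm_sq_Dt_z(D, z):
    """Return ||D^T z||_2^2, computed column by column."""
    cols = range(len(D[0]))
    return sum(sum(D[i][k] * z[i] for i in range(len(D))) ** 2 for k in cols)

D = [[1, 2, 0],
     [0, 1, -1]]            # 2 x 3 rectangular matrix, so A is 2 x 2
A = gram(D)

test_vectors = [[1, 0], [0, 1], [1, -1], [-2, 5]]
forms = [quad_form(A, z) for z in test_vectors]
norms = [norm_sq_Dt_z(D, z) for z in test_vectors]
```

Here `forms` and `norms` coincide entry by entry, and every entry is nonnegative, which is the quadratic-form definition of positive semidefiniteness restricted to the sampled vectors.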

7.12
♥ The semidefinite cone $\mathbf{S}^m_+$ is regular and self-dual, provided that the inner product on the
space $\mathbf{S}^m$ where the cone lives is inherited from the natural embedding of $\mathbf{S}^m$ into $\mathbf{R}^{m \times m}$:
$$\forall A, B \in \mathbf{S}^m:\ \langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij} = \mathrm{Tr}(AB)$$
♠ Let $\mathcal{K}$ = SDP be the family of all direct products of Semidefinite cones. Conic programs
associated with cones from $\mathcal{K}$ are called semidefinite programs. Thus, a semidefinite program is an optimization program of the form
$$\min_x \Big\{ c^T x : \mathcal{A}_i x - B_i := \sum_{j=1}^n x_j A_{ij} - B_i \succeq 0,\ 1 \le i \le m \Big\}$$
$A_{ij}, B_i$: symmetric $k_i \times k_i$ matrices
Note: A collection of symmetric matrices A1, ..., Am is comprised of positive semidefinite
matrices iff the block-diagonal matrix Diag{A1, ..., Am} is $\succeq 0$
⇒ an SDO program can be written down as a problem with a single $\succeq$ constraint (also called
a Linear Matrix Inequality (LMI)):
$$\min_x \left\{ c^T x : \mathcal{A} x - B := \mathrm{Diag}\{\mathcal{A}_i x - B_i,\ 1 \le i \le m\} \succeq 0 \right\}.$$

7.13
♣ Three generic conic problems (Linear, Conic Quadratic and Semidefinite Optimization)
possess intrinsic mathematical similarity allowing for deep unified theoretical and algorithmic
developments, including the design of theoretically and practically efficient polynomial time
solution algorithms: Interior Point Methods.
♠ At the same time, “expressive abilities” of Conic Quadratic and especially Semidefinite
Optimization are incomparably stronger than those of Linear Optimization. For all practical
purposes, the entire Convex Programming is within the grasp of Semidefinite Optimization.

7.14
LO/CQO/SDO Hierarchy

♠ $\mathbf{L}^1 = \mathbf{R}_+$ ⇒ LO ⊂ CQO ⇒ Linear Optimization is a particular case of Conic Quadratic Optimization.
♠ Fact: Conic Quadratic Optimization is a particular case of Semidefinite Optimization.
♥ Explanation: The relation x ≥Lk 0 is equivalent to the relation
xk x1 x2 · · · xk−1

 x1 xk 
 x xk
Arrow(x) =    0.

2
 ... ... 
xk−1 xk

As a result, a system of conic quadratic constraints


Ai x − bi ≥Lki 0, 1 ≤ i ≤ m
is equivalent to the system of LMIs
Arrow(Ai x − bi )  0, 1 ≤ i ≤ m.

7.15
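Why
$$x \ge_{\mathbf{L}^k} 0 \Leftrightarrow \mathrm{Arrow}(x) \succeq 0 \qquad (!)$$
Schur Complement Lemma: A symmetric block matrix $\begin{bmatrix} P & Q \\ Q^T & R \end{bmatrix}$ with positive definite
R is $\succeq 0$ if and only if the matrix $P - Q R^{-1} Q^T$ is $\succeq 0$.
Proof. We have
$$\begin{aligned}
\begin{bmatrix} P & Q \\ Q^T & R \end{bmatrix} \succeq 0
&\Leftrightarrow [u; v]^T \begin{bmatrix} P & Q \\ Q^T & R \end{bmatrix} [u; v] \ge 0\ \ \forall [u; v] \\
&\Leftrightarrow u^T P u + 2 u^T Q v + v^T R v \ge 0\ \ \forall [u; v] \\
&\Leftrightarrow \forall u:\ u^T P u + \min_v \left[ 2 u^T Q v + v^T R v \right] \ge 0 \\
&\Leftrightarrow \forall u:\ u^T P u - u^T Q R^{-1} Q^T u \ge 0 \\
&\Leftrightarrow P - Q R^{-1} Q^T \succeq 0 \qquad \square
\end{aligned}$$
♠ Schur Complement Lemma ⇒ (!):
• In one direction: Let $x \in \mathbf{L}^k$. Then either $x_k = 0$, whence x = 0 and $\mathrm{Arrow}(x) \succeq 0$,
or $x_k > 0$ and $\sum_{i=1}^{k-1} \frac{x_i^2}{x_k} \le x_k$, meaning that the matrix
$$\begin{bmatrix} x_k & x_1 & \cdots & x_{k-1} \\ x_1 & x_k & & \\ \vdots & & \ddots & \\ x_{k-1} & & & x_k \end{bmatrix}$$
satisfies the premise of the SCL (with $P = x_k$, $Q = [x_1, \dots, x_{k-1}]$, $R = x_k I_{k-1} \succ 0$) and thus is $\succeq 0$.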

7.16
 
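• In the other direction: let
$$\begin{bmatrix} x_k & x_1 & \cdots & x_{k-1} \\ x_1 & x_k & & \\ \vdots & & \ddots & \\ x_{k-1} & & & x_k \end{bmatrix} \succeq 0.$$
Then either $x_k = 0$, and then $x = 0 \in \mathbf{L}^k$, or $x_k > 0$ and $\sum_{i=1}^{k-1} \frac{x_i^2}{x_k} \le x_k$ by the SCL, whence $x \in \mathbf{L}^k$. $\square$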
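The equivalence (!) can also be sanity-checked by brute force on a small integer grid. The sketch below is only an illustration under stated assumptions: it tests positive semidefiniteness via the "all principal minors are nonnegative" criterion with exact rational arithmetic (practical only for tiny matrices), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def arrow(x):
    """Arrow(x): x_k on the diagonal, x_1..x_{k-1} in the first row and first column."""
    k = len(x)
    M = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        M[i][i] = Fraction(x[-1])
    for i in range(1, k):
        M[0][i] = M[i][0] = Fraction(x[i - 1])
    return M

def det(M):
    """Exact determinant by Gaussian elimination over Fractions."""
    M = [row[:] for row in M]
    n, sign, d = len(M), 1, Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            sign = -sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= f * M[c][j]
    return sign * d

def is_psd(M):
    """PSD test: every principal minor (all index subsets) is nonnegative."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) >= 0
               for size in range(1, n + 1)
               for idx in combinations(range(n), size))

def in_lorentz_cone(x):
    """Exact test of x_k >= sqrt(x_1^2 + ... + x_{k-1}^2) for integer or rational x."""
    return x[-1] >= 0 and x[-1] ** 2 >= sum(v * v for v in x[:-1])

# Compare both sides of (!) on all integer points of {-2,...,2}^3.
agreement = all(is_psd(arrow(x)) == in_lorentz_cone(x)
                for x in product(range(-2, 3), repeat=3))
```

The grid check is of course not a proof; it only confirms that the two membership tests coincide on the sampled points, including boundary points such as (3, 4, 5).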

7.17
♣ Example of CQO program: Control of Linear Dynamical system. Consider a
discrete time linear dynamical system given by
x(0) = 0;
x(t + 1) = Ax(t) + Bu(t) + f (t), 0 ≤ t ≤ T − 1
• x(t): state at time t
• u(t): control at time t
• f (t): given external input
Goal: Given time horizon T, bounds on control $\|u(t)\|_2 \le 1$ for all t and desired destination
$x_*$, find a control which makes x(T) as close as possible to $x_*$.
The model: From the state equations,
$$x(T) = \sum_{t=0}^{T-1} A^{T-t-1} [B u(t) + f(t)],$$
so that the problem in question is
$$\min_{\tau, u(0), \dots, u(T-1)} \Big\{ \tau : \Big\| x_* - \sum_{t=0}^{T-1} A^{T-t-1} [B u(t) + f(t)] \Big\|_2 \le \tau,\ \ \|u(t)\|_2 \le 1,\ 0 \le t \le T-1 \Big\}$$
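The closed-form expression for x(T) used in the model can be checked against a direct simulation of the state equations. This is a small sketch with made-up integer data (the matrices `A`, `B`, the controls `u`, the inputs `f` and the target `x_star` are assumptions for illustration only).

```python
import math

def matvec(M, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def add(*vs):
    """Entrywise sum of several vectors."""
    return [sum(c) for c in zip(*vs)]

def simulate(A, B, u, f):
    """Run x(t+1) = A x(t) + B u(t) + f(t) from x(0) = 0 and return x(T)."""
    x = [0] * len(A)
    for ut, ft in zip(u, f):
        x = add(matvec(A, x), matvec(B, ut), ft)
    return x

def closed_form(A, B, u, f):
    """Evaluate x(T) = sum_{t=0}^{T-1} A^(T-t-1) [B u(t) + f(t)]."""
    T = len(u)
    total = [0] * len(A)
    for t in range(T):
        term = add(matvec(B, u[t]), f[t])
        for _ in range(T - t - 1):      # apply A exactly T-t-1 times
            term = matvec(A, term)
        total = add(total, term)
    return total

A = [[1, 1], [0, 1]]
B = [[0], [1]]
u = [[1], [0], [-1]]                    # each control satisfies ||u(t)||_2 <= 1
f = [[0, 0], [0, 0], [0, 0]]
x_star = [2, 3]

xT = simulate(A, B, u, f)
xT_closed = closed_form(A, B, u, f)
tau = math.dist(x_star, xT)             # objective value of this particular control
```

Since x(T) is affine in the controls, the constraint on τ is a norm of an affine function bounded by a linear one, which is exactly the conic quadratic form introduced earlier.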

7.18
♣ Example of SDO program: Relaxation of a Combinatorial Problem.
♠ Numerous NP-hard combinatorial problems can be posed as problems of quadratic mini-
mization under quadratic constraints:
$$\mathrm{Opt}(P) = \min_x \{f_0(x) : f_i(x) \le 0,\ 1 \le i \le m\}, \qquad f_i(x) = x^T Q_i x + 2 b_i^T x + c_i,\ 0 \le i \le m \qquad (P)$$
Example: One can model Boolean constraints $x_i \in \{0; 1\}$ as quadratic equality constraints $x_i^2 = x_i$ and then represent them by pairs of quadratic inequalities $x_i^2 - x_i \le 0$ and
$-x_i^2 + x_i \le 0$
⇒ Boolean Programming problems reduce to (P ).

7.19

♠ In branch-and-bound algorithms, an important role is played by efficient bounding of
Opt(P) from below. To this end one can use Semidefinite relaxation as follows:
• We set $F_i = \begin{bmatrix} Q_i & b_i \\ b_i^T & c_i \end{bmatrix}$, 0 ≤ i ≤ m, and $X[x] = \begin{bmatrix} x x^T & x \\ x^T & 1 \end{bmatrix}$, so that
$$f_i(x) = \mathrm{Tr}(F_i X[x]).$$
⇒ (P) is equivalent to the problem
$$\min_x \{\mathrm{Tr}(F_0 X[x]) : \mathrm{Tr}(F_i X[x]) \le 0,\ 1 \le i \le m\} \qquad (P')$$

7.20

• The objective and the constraints in (P′) are linear in X[x], and the only difficulty is that
as x runs through $\mathbf{R}^n$, X[x] runs through a manifold $\mathcal{X} \subset \mathbf{S}^{n+1}$ that is difficult to minimize over,
given by the following restrictions:
A. $X \succeq 0$
B. $X_{n+1,n+1} = 1$
C. Rank X = 1
• Restrictions A, B are simple constraints specifying a nice convex domain
• Restriction C is the “troublemaker”: it makes the feasible set of (P) difficult
♠ In SDO relaxation, we just eliminate the rank constraint C, thus ending up with the SDO
program
$$\mathrm{Opt(SDO)} = \min_{X \in \mathbf{S}^{n+1}} \left\{ \mathrm{Tr}(F_0 X) : \mathrm{Tr}(F_i X) \le 0,\ 1 \le i \le m,\ \ X \succeq 0,\ X_{n+1,n+1} = 1 \right\}.$$
♠ When passing from (P) ≡ (P′) to the SDO relaxation, we extend the domain over which
we minimize
⇒ Opt(SDO) ≤ Opt(P).
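The identity f_i(x) = Tr(F_i X[x]) behind the relaxation is easy to verify on a concrete instance. The sketch below uses the Boolean-type constraint x₁² − x₁ ≤ 0 in two variables as hypothetical data; function names are illustrative assumptions.

```python
from fractions import Fraction

def quad_value(Q, b, c, x):
    """f(x) = x^T Q x + 2 b^T x + c."""
    n = len(x)
    return (sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
            + 2 * sum(bi * xi for bi, xi in zip(b, x)) + c)

def F_matrix(Q, b, c):
    """F = [[Q, b], [b^T, c]] as a list of rows."""
    return [list(row) + [bi] for row, bi in zip(Q, b)] + [list(b) + [c]]

def X_of(x):
    """X[x] = [x; 1][x; 1]^T, a rank-one PSD matrix with bottom-right entry 1."""
    z = list(x) + [1]
    return [[zi * zj for zj in z] for zi in z]

def trace_prod(F, X):
    """Tr(F X) for symmetric F, X equals the sum of entrywise products."""
    n = len(F)
    return sum(F[i][j] * X[i][j] for i in range(n) for j in range(n))

# Hypothetical data: f(x) = x_1^2 - x_1, i.e. Q = diag(1, 0), b = (-1/2, 0), c = 0.
Q = [[1, 0], [0, 0]]
b = [Fraction(-1, 2), 0]
c = 0
F = F_matrix(Q, b, c)

points = [[3, -2], [0, 0], [1, 7], [-4, 2]]
direct = [quad_value(Q, b, c, x) for x in points]
lifted = [trace_prod(F, X_of(x)) for x in points]
```

Both lists agree, which reflects that every quadratic f_i becomes linear in the lifted variable X; only the rank-one requirement on X is lost in the relaxation.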

7.21
What Can Be Expressed via LO/CQO/SDO ?

♣ Consider a family K of regular cones such that


• K is closed w.r.t. taking direct products of cones: K1 , ..., Km ∈ K ⇒ K1 × ... × Km ∈ K
• K is closed w.r.t. passing from a cone to its dual: K ∈ K ⇒ K∗ ∈ K
Examples: LO, CQO, SDO.
Question: When can an optimization program
$$\min_{x \in X} f(x) \qquad (P)$$
be posed as a conic problem associated with a cone from $\mathcal{K}$?
Answer: This is the case when the set X and the function f are $\mathcal{K}$-representable, i.e.,
admit representations of the form
$$\begin{aligned}
X &= \{x : \exists u : Ax + Bu + c \in K_X\} \\
\mathrm{Epi}\{f\} &:= \{[x; \tau] : \tau \ge f(x)\} = \{[x; \tau] : \exists w : Px + \tau p + Qw + d \in K_f\}
\end{aligned}$$
where $K_X \in \mathcal{K}$, $K_f \in \mathcal{K}$.

7.22
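Indeed, if
$$\begin{aligned}
X &= \{x : \exists u : Ax + Bu + c \in K_X\} \\
\mathrm{Epi}\{f\} &:= \{[x; \tau] : \tau \ge f(x)\} = \{[x; \tau] : \exists w : Px + \tau p + Qw + d \in K_f\}
\end{aligned}$$
then the problem
$$\min_{x \in X} f(x) \qquad (P)$$
is equivalent to
$$\min_{x, \tau, u, w} \Big\{ \tau : \underbrace{Ax + Bu + c \in K_X}_{\text{says that } x \in X},\ \ \underbrace{Px + \tau p + Qw + d \in K_f}_{\text{says that } \tau \ge f(x)} \Big\}$$
and the constraints read
$$[Ax + Bu + c;\ Px + \tau p + Qw + d] \in K := K_X \times K_f \in \mathcal{K}.$$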
♣ K-representable sets/functions always are convex.
♣ K-representable sets/functions admit fully algorithmic calculus completely similar to the
one we have developed in the particular case K = LO.

7.23
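♠ Example of CQO-representable function: convex quadratic form
$$f(x) = x^T A^T A x + 2 b^T x + c$$
Indeed,
$$\begin{aligned}
\tau \ge f(x) &\Leftrightarrow [\tau - c - 2 b^T x] \ge \|Ax\|_2^2 \\
&\Leftrightarrow \left[ \frac{1 + [\tau - c - 2 b^T x]}{2} \right]^2 - \left[ \frac{1 - [\tau - c - 2 b^T x]}{2} \right]^2 \ge \|Ax\|_2^2 \\
&\Leftrightarrow \left[ Ax;\ \frac{1 - [\tau - c - 2 b^T x]}{2};\ \frac{1 + [\tau - c - 2 b^T x]}{2} \right] \in \mathbf{L}^{\dim b + 2}
\end{aligned}$$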
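The chain of equivalences above can be checked on a concrete instance by comparing the scalar test τ ≥ f(x) with the Lorentz cone membership of the constructed vector. The data below (`A`, `b`, `c`, `x`) and the helper names are assumptions made for illustration; exact rational arithmetic avoids rounding issues at the boundary.

```python
from fractions import Fraction

def f_val(A, b, c, x):
    """f(x) = ||A x||_2^2 + 2 b^T x + c."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(v * v for v in Ax) + 2 * sum(bi * xi for bi, xi in zip(b, x)) + c

def cone_vector(A, b, c, x, tau):
    """Build [A x; (1 - s)/2; (1 + s)/2] with s = tau - c - 2 b^T x."""
    Ax = [Fraction(sum(a * xi for a, xi in zip(row, x))) for row in A]
    s = Fraction(tau) - c - 2 * sum(bi * xi for bi, xi in zip(b, x))
    return Ax + [(1 - s) / 2, (1 + s) / 2]

def in_lorentz_cone(v):
    """Exact membership test: last entry >= Euclidean norm of the others."""
    return v[-1] >= 0 and v[-1] ** 2 >= sum(t * t for t in v[:-1])

A = [[1, 2], [0, 1]]
b = [1, -1]
c = 3
x = [1, 1]

fx = f_val(A, b, c, x)   # ||(3, 1)||^2 + 0 + 3 = 13
matches = all(in_lorentz_cone(cone_vector(A, b, c, x, tau)) == (tau >= fx)
              for tau in range(0, 30))
```

The key step is the elementary identity ((1+s)/2)² − ((1−s)/2)² = s, which turns the quadratic inequality s ≥ ‖Ax‖₂² into a norm inequality.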
♠ Examples of SDO-representable functions/sets:
• the maximal eigenvalue $\lambda_{\max}(X)$ of a symmetric m × m matrix X:
$$\tau \ge \lambda_{\max}(X) \Leftrightarrow \underbrace{\tau I_m - X \succeq 0}_{\text{LMI}}$$
• the sum of the k largest eigenvalues of a symmetric m × m matrix X
• $-\mathrm{Det}^{1/m}(X)$, $X \in \mathbf{S}^m_+$
• the set $P_d$ of (vectors of coefficients of) algebraic polynomials
$p(x) = p_d x^d + p_{d-1} x^{d-1} + \dots + p_1 x + p_0$ of degree ≤ d that are nonnegative on a given segment ∆.

7.24
Conic Duality Theorem

♣ Consider a conic program
$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : Ax \ge_K b \right\} \qquad (P)$$
As in the LO case, the concept of the dual problem stems from the desire to find a
systematic way to bound from below the optimal value Opt(P ).
♠ In the LO case $K = \mathbf{R}^m_+$ this mechanism was built as follows:
• We observe that for properly chosen vectors of “aggregation weights” λ (specifically, for
$\lambda \in \mathbf{R}^m_+$) the aggregated constraint $\lambda^T A x \ge \lambda^T b$ is a consequence of the vector inequality
$Ax \ge_K b$ and thus $\lambda^T A x \ge \lambda^T b$ for all feasible solutions x to (P)
• In particular, when an admissible vector of aggregation weights λ is such that $A^T \lambda = c$,
the aggregated constraint reads “$c^T x \ge b^T \lambda$ for all feasible x” and thus $b^T \lambda$ is a lower bound
on Opt(P). The dual problem is the problem of maximizing this bound:
$$\mathrm{Opt}(D) = \max_\lambda \left\{ b^T \lambda : A^T \lambda = c,\ \lambda \text{ is admissible for aggregation} \right\}$$

7.25


♠ The same approach works in the case of a general cone K. The only issue to be resolved
is:
What are admissible weight vectors λ for (P)? That is, for which λ does a valid vector inequality a ≥K b always
imply the inequality λT a ≥ λT b?
The answer is immediate: the required λ’s are exactly the vectors from the cone K∗ dual to
K.
Indeed,
• If λ ∈ K∗, then
a ≥K b ⇒ a − b ∈ K ⇒ λT (a − b) ≥ 0 ⇒ λT a ≥ λT b,
that is, λ is an admissible weight vector.
• If λ is an admissible weight vector and a ∈ K, that is, a ≥K 0, we should have λT a ≥ λT 0 = 0,
so that λT a ≥ 0 for all a ∈ K, i.e., λ ∈ K∗.

7.26


♠ We arrive at the following construction:


• Whenever λ ∈ K∗ , the scalar inequality λT Ax ≥ λT b is a consequence of the constraint in
(P ) and thus is valid everywhere on the feasible set of (P ).
• In particular, when $\lambda \in K_*$ is such that $A^T \lambda = c$, the quantity $b^T \lambda$ is a lower bound on
Opt(P), and the dual problem is to maximize this bound:
$$\mathrm{Opt}(D) = \max_\lambda \left\{ b^T \lambda : A^T \lambda = c,\ \lambda \ge_{K_*} 0 \right\} \qquad (D)$$
As it should be, in the LO case, where $K = \mathbf{R}^m_+ = (\mathbf{R}^m_+)_* = K_*$, (D) is nothing but the LP
dual of (P).
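In the LO case the lower-bound mechanism can be illustrated on a toy instance. The sketch below uses a made-up problem (minimize x₁ + x₂ subject to x₁ ≥ 1, x₂ ≥ 2, x₁ + x₂ ≥ 2); the data and helper names are assumptions for illustration, and no solver is involved: we only check that every sampled dual feasible λ gives a bound bᵀλ below every sampled primal feasible value cᵀx.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_feasible(A, b, x):
    """Check A x >= b coordinate-wise (K is the nonnegative orthant)."""
    return all(dot(row, x) >= bi for row, bi in zip(A, b))

def dual_feasible(A, c, lam):
    """Check lam >= 0 and A^T lam = c."""
    cols = list(zip(*A))
    return all(l >= 0 for l in lam) and all(dot(col, lam) == cj for col, cj in zip(cols, c))

A = [[1, 0], [0, 1], [1, 1]]
b = [1, 2, 2]
c = [1, 1]

xs = [[1, 2], [2, 2], [3, 5]]
half = Fraction(1, 2)
lams = [[1, 1, 0], [0, 0, 1], [half, half, half]]

all_primal_ok = all(primal_feasible(A, b, x) for x in xs)
all_dual_ok = all(dual_feasible(A, c, lam) for lam in lams)
values = [dot(c, x) for x in xs]        # primal objective values
bounds = [dot(b, lam) for lam in lams]  # dual lower bounds
weak_duality = all(v >= w for v in values for w in bounds)
```

In this instance the best sampled bound (3, from λ = (1, 1, 0)) coincides with the best sampled primal value (3, at x = (1, 2)), so both are optimal.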

7.27
♣ Our “aggregation mechanism” can be applied to conic problems in a slightly more general
format:
$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : A_\ell x \ge_{K_\ell} b_\ell,\ 1 \le \ell \le L,\ \ Px = p \right\} \qquad (P)$$
Here the dual problem is built as follows:
• We associate with every vector inequality constraint
$$A_\ell x \ge_{K_\ell} b_\ell$$
a dual variable (“Lagrange multiplier”) $\lambda_\ell \ge_{(K_\ell)_*} 0$, so that the scalar inequality constraint
$\lambda_\ell^T A_\ell x \ge \lambda_\ell^T b_\ell$ is a consequence of $A_\ell x \ge_{K_\ell} b_\ell$ and $\lambda_\ell \ge_{(K_\ell)_*} 0$;
• We associate with the system Px = p a “free” vector µ of Lagrange multipliers of the
same dimension as p, so that the scalar inequality $\mu^T P x \ge \mu^T p$ is a consequence of the
vector equation Px = p;
• We sum up all the scalar inequalities we got, thus arriving at the scalar inequality
$$\Big[ \sum_{\ell=1}^L A_\ell^T \lambda_\ell + P^T \mu \Big]^T x \ \ge\ \sum_{\ell=1}^L b_\ell^T \lambda_\ell + p^T \mu \qquad (*)$$

7.28
 

Whenever x is feasible for (P) and $\lambda_\ell \ge_{(K_\ell)_*} 0$, $1 \le \ell \le L$, we have
$$\Big[ \sum_{\ell=1}^L A_\ell^T \lambda_\ell + P^T \mu \Big]^T x \ \ge\ \sum_{\ell=1}^L b_\ell^T \lambda_\ell + p^T \mu \qquad (*)$$
• If we are lucky to get in the left hand side of (∗) the expression $c^T x$, that is, if
$\sum_{\ell=1}^L A_\ell^T \lambda_\ell + P^T \mu = c$, then the right hand side of (∗) is a lower bound on the objective of (P) everywhere in the feasible domain of (P) and thus is a lower bound on Opt(P).
The dual problem is to maximize this bound:
$$\mathrm{Opt}(D) = \max_{\lambda, \mu} \Big\{ \sum_{\ell=1}^L b_\ell^T \lambda_\ell + p^T \mu : \lambda_\ell \ge_{(K_\ell)_*} 0,\ 1 \le \ell \le L,\ \ \sum_{\ell=1}^L A_\ell^T \lambda_\ell + P^T \mu = c \Big\} \qquad (D)$$
Note: When all cones K` are self-dual (as it is the case in Linear/Conic Quadratic/Semi-
definite Optimization), the dual problem (D) involves exactly the same cones K` as the
primal problem.

7.29
Example: Dual of a Semidefinite program.
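Consider a Semidefinite program
$$\min_x \Big\{ c^T x : \sum_{j=1}^n A_\ell^j x_j \succeq B_\ell,\ 1 \le \ell \le L,\ \ Px = p \Big\}$$
The cones $\mathbf{S}^k_+$ are self-dual, so that the Lagrange multipliers for the $\succeq$-constraints are
matrices $\Lambda_\ell \succeq 0$ of the same size as the symmetric data matrices $A_\ell^j$, $B_\ell$. Aggregating the
constraints of our SDO program and recalling that the inner product $\langle A, B \rangle$ in $\mathbf{S}^k$ is Tr(AB),
the aggregated linear inequality reads
$$\sum_{j=1}^n x_j \Big[ \sum_{\ell=1}^L \mathrm{Tr}(A_\ell^j \Lambda_\ell) + (P^T \mu)_j \Big] \ \ge\ \sum_{\ell=1}^L \mathrm{Tr}(B_\ell \Lambda_\ell) + p^T \mu$$
The equality constraints of the dual should say that the left hand side expression, identically
in $x \in \mathbf{R}^n$, is $c^T x$, that is, the dual problem reads
$$\max_{\{\Lambda_\ell\}, \mu} \Big\{ \sum_{\ell=1}^L \mathrm{Tr}(B_\ell \Lambda_\ell) + p^T \mu : \sum_{\ell=1}^L \mathrm{Tr}(A_\ell^j \Lambda_\ell) + (P^T \mu)_j = c_j,\ 1 \le j \le n,\ \ \Lambda_\ell \succeq 0,\ 1 \le \ell \le L \Big\}$$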

7.30
Symmetry of Conic Duality

 
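$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : A_\ell x \ge_{K_\ell} b_\ell,\ 1 \le \ell \le L,\ \ Px = p \right\} \qquad (P)$$
$$\mathrm{Opt}(D) = \max_{\lambda, \mu} \Big\{ \sum_{\ell=1}^L b_\ell^T \lambda_\ell + p^T \mu : \lambda_\ell \ge_{(K_\ell)_*} 0,\ 1 \le \ell \le L,\ \ \sum_{\ell=1}^L A_\ell^T \lambda_\ell + P^T \mu = c \Big\} \qquad (D)$$
♠ Observe that (D) is, essentially, in the same form as (P), and thus we can build the dual
of (D). To this end, we rewrite (D) as
$$-\mathrm{Opt}(D) = \min_{\lambda_\ell, \mu} \Big\{ -\sum_{\ell=1}^L b_\ell^T \lambda_\ell - p^T \mu : \lambda_\ell \ge_{(K_\ell)_*} 0,\ 1 \le \ell \le L,\ \ \sum_{\ell=1}^L A_\ell^T \lambda_\ell + P^T \mu = c \Big\} \qquad (D')$$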

7.31
 

Denoting by −x the vector of Lagrange multipliers for the equality constraints in (D′), and
by $\xi_\ell \ge_{[(K_\ell)_*]_*} 0$ (i.e., $\xi_\ell \ge_{K_\ell} 0$) the vectors of Lagrange multipliers for the $\ge_{(K_\ell)_*}$-constraints in
(D′), and aggregating the constraints of (D′) with these weights, we see that everywhere
on the feasible domain of (D′) it holds:
$$\sum_\ell [\xi_\ell - A_\ell x]^T \lambda_\ell + [-Px]^T \mu \ \ge\ -c^T x$$
• When the left hand side in this inequality as a function of $\{\lambda_\ell\}$, µ is identically equal to
the objective of (D′), i.e., when
$$\xi_\ell - A_\ell x = -b_\ell,\ 1 \le \ell \le L, \qquad -Px = -p,$$
the quantity $-c^T x$ is a lower bound on Opt(D′) = −Opt(D), and the problem dual to (D)
thus is
$$\max_{x, \xi_\ell} \left\{ -c^T x : A_\ell x = b_\ell + \xi_\ell,\ 1 \le \ell \le L,\ \ Px = p,\ \ \xi_\ell \ge_{K_\ell} 0,\ 1 \le \ell \le L \right\}$$
which is equivalent to (P).
⇒ Conic duality is symmetric!

7.32
Conic Duality Theorem

♠ A conic program in the form
$$\min_y \Big\{ c^T y : Ry = r,\ \underbrace{Py \ge_K p,\ Sy \ge s}_{Qy\ \ge_M\ q} \Big\}$$
is called strictly feasible, if there exists a strictly feasible solution ȳ, that is, a feasible solution
where the vector inequality constraint is satisfied as strict: $Q\bar{y} >_M q$. The program is called
essentially strictly feasible, if there exists a feasible solution ŷ such that $P\hat{y} >_K p$.

7.33

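$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : Ax \ge_K b,\ Px = p \right\} \qquad (P)$$
$$\mathrm{Opt}(D) = \max_{\lambda, \mu} \left\{ b^T \lambda + p^T \mu : \lambda \ge_{K_*} 0,\ A^T \lambda + P^T \mu = c \right\} \qquad (D)$$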

Conic Duality Theorem


♠ [Weak Duality] One has Opt(D) ≤ Opt(P ).
♠ [Symmetry] duality is symmetric: (D) is a conic program, and the program dual to (D)
is (equivalent to) (P ).
♠ [Strong Duality] Let one of the problems (P ),(D) be essentially strictly feasible and
bounded. Then the other problem is solvable, and Opt(D) = Opt(P ).
In particular, if both (P ) and (D) are strictly feasible, then both the problems are solvable
with equal optimal values.

7.34
Example: Dual of the SDO relaxation. Recall that given a (difficult to solve!) quadratic
quadratically constrained problem
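$$\mathrm{Opt}_* = \min_x \{f_0(x) : f_i(x) \ge 0,\ 1 \le i \le m\}, \qquad f_i(x) = x^T Q_i x + 2 b_i^T x + c_i$$
we can bound its optimal value from below by passing to the semidefinite relaxation of the
problem:
$$\mathrm{Opt}_* \ge \mathrm{Opt}(P) := \min_X \left\{ \mathrm{Tr}(F_0 X) : \mathrm{Tr}(F_i X) \ge 0,\ 1 \le i \le m,\ \ X \succeq 0,\ X_{n+1,n+1} \equiv \mathrm{Tr}(GX) = 1 \right\} \qquad (P)$$
$$G = \begin{bmatrix} & \\ & 1 \end{bmatrix}, \quad F_i = \begin{bmatrix} Q_i & b_i \\ b_i^T & c_i \end{bmatrix},\ 0 \le i \le m.$$
Let us build the dual to (P). Denoting by $\lambda_i \ge 0$ the Lagrange multipliers for the scalar
inequality constraints, by $\Lambda \succeq 0$ the Lagrange multiplier for the LMI $X \succeq 0$, and by µ
the Lagrange multiplier for the equality constraint $X_{n+1,n+1} = 1$, and aggregating the
constraints, we get the aggregated inequality
$$\mathrm{Tr}\Big(\Big[\sum_{i=1}^m \lambda_i F_i\Big] X\Big) + \mathrm{Tr}(\Lambda X) + \mu \mathrm{Tr}(GX) \ge \mu$$
Specializing the Lagrange multipliers to make the left hand side identically equal to
$\mathrm{Tr}(F_0 X)$, the dual problem reads
$$\mathrm{Opt}(D) = \max_{\Lambda, \{\lambda_i\}, \mu} \Big\{ \mu : F_0 = \sum_{i=1}^m \lambda_i F_i + \mu G + \Lambda,\ \lambda \ge 0,\ \Lambda \succeq 0 \Big\}$$
We can easily eliminate Λ, thus arriving at
$$\mathrm{Opt}(D) = \max_{\{\lambda_i\}, \mu} \Big\{ \mu : \sum_{i=1}^m \lambda_i F_i + \mu G \preceq F_0,\ \lambda \ge 0 \Big\} \qquad (D)$$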

7.35
Geometry of Primal-Dual Pair of Conic Problems

♣ Consider a primal-dual pair of conic problems in the form


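$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : Ax \ge_K b,\ Px = p \right\} \qquad (P)$$
$$\mathrm{Opt}(D) = \max_{\lambda, \mu} \left\{ b^T \lambda + p^T \mu : \lambda \ge_{K_*} 0,\ A^T \lambda + P^T \mu = c \right\} \qquad (D)$$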
♠ Assumption: Linear equality constraints in (P ) and (D) are feasible:
∃x̄, λ̄, µ̄ : P x̄ = p & AT λ̄ + P T µ̄ = c
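♠ Let us pass in (P) from variable x to the slack variable ξ = Ax − b. For x satisfying the
equality constraints Px = p of (P) we have
$$c^T x = [A^T \bar\lambda + P^T \bar\mu]^T x = \bar\lambda^T A x + \bar\mu^T P x = \bar\lambda^T \xi + \bar\mu^T p + \bar\lambda^T b$$
⇒ (P) is equivalent to
$$\mathrm{Opt}(\mathcal{P}) = \min_\xi \left\{ \bar\lambda^T \xi : \xi \in \mathcal{M}_P \cap K \right\} = \mathrm{Opt}(P) - \left[ b^T \bar\lambda + p^T \bar\mu \right] \qquad (\mathcal{P})$$
$$\mathcal{M}_P = \mathcal{L}_P - \bar\xi, \quad \bar\xi = b - A\bar{x}, \quad \mathcal{L}_P = \{\xi : \exists x : \xi = Ax,\ Px = 0\}$$
♠ Let us eliminate from (D) the variable µ. For [λ; µ] satisfying the equality constraint
$A^T \lambda + P^T \mu = c$ of (D) we have
$$b^T \lambda + p^T \mu = b^T \lambda + \bar{x}^T P^T \mu = b^T \lambda + \bar{x}^T [c - A^T \lambda] = \underbrace{[b - A\bar{x}]^T}_{\bar\xi^T} \lambda + c^T \bar{x}$$
⇒ (D) is equivalent to
$$\mathrm{Opt}(\mathcal{D}) = \max_\lambda \left\{ \bar\xi^T \lambda : \lambda \in \mathcal{M}_D \cap K_* \right\} = \mathrm{Opt}(D) - c^T \bar{x} \qquad (\mathcal{D})$$
$$\mathcal{M}_D = \mathcal{L}_D + \bar\lambda, \quad \mathcal{L}_D = \{\lambda : \exists \mu : A^T \lambda + P^T \mu = 0\}$$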

7.36


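♣ Intermediate Conclusion: The primal-dual pair (P), (D) of conic problems with feasible
equality constraints is equivalent to the pair
$$\mathrm{Opt}(\mathcal{P}) = \min_\xi \left\{ \bar\lambda^T \xi : \xi \in \mathcal{M}_P \cap K \right\} = \mathrm{Opt}(P) - \left[ b^T \bar\lambda + p^T \bar\mu \right] \qquad (\mathcal{P})$$
$$\mathcal{M}_P = \mathcal{L}_P - \bar\xi, \quad \mathcal{L}_P = \{\xi : \exists x : \xi = Ax,\ Px = 0\}$$
$$\mathrm{Opt}(\mathcal{D}) = \max_\lambda \left\{ \bar\xi^T \lambda : \lambda \in \mathcal{M}_D \cap K_* \right\} = \mathrm{Opt}(D) - c^T \bar{x} \qquad (\mathcal{D})$$
$$\mathcal{M}_D = \mathcal{L}_D + \bar\lambda, \quad \mathcal{L}_D = \{\lambda : \exists \mu : A^T \lambda + P^T \mu = 0\}$$
Observation: The linear subspaces $\mathcal{L}_P$ and $\mathcal{L}_D$ are orthogonal complements of each other.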
Observation: Let x be feasible for (P ) and [λ, µ] be feasible for (D), and let ξ = Ax − b be
the primal slack associated with x. Then
DualityGap(x, λ, µ) = cT x − [bT λ + pT µ]
= [AT λ + P T µ]T x − [bT λ + pT µ]
= λT [Ax − b] + µT [P x − p] = λT [Ax − b] = λT ξ.
Note: To solve (P), (D) ⇔ to minimize the duality gap over primal feasible x and dual
feasible λ, µ
⇔ to minimize the inner product $\xi^T \lambda$ over ξ feasible for $(\mathcal{P})$ and λ feasible for $(\mathcal{D})$.
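The identity DualityGap(x, λ, µ) = λᵀξ can be verified on a tiny instance with an equality constraint. The sketch below takes K to be the nonnegative orthant in R² and uses made-up data (A = I, b = 0, P = [1 1], p = 1, c = (1, 2)); these values and the helper names are assumptions for illustration.

```python
from fractions import Fraction as Fr

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 0], [0, 1]]
b = [0, 0]
P = [[1, 1]]
p = [1]
c = [1, 2]

def gap_and_slack_product(x, lam, mu):
    """Return (c^T x - [b^T lam + p^T mu], lam^T (A x - b)) for a primal-dual feasible pair."""
    # Feasibility checks: A x >= b, P x = p, lam >= 0, A^T lam + P^T mu = c.
    xi = [v - bi for v, bi in zip(matvec(A, x), b)]
    assert all(v >= 0 for v in xi) and matvec(P, x) == p
    dual_lhs = [s + t for s, t in zip(matvec(transpose(A), lam), matvec(transpose(P), mu))]
    assert all(l >= 0 for l in lam) and dual_lhs == c
    return dot(c, x) - (dot(b, lam) + dot(p, mu)), dot(lam, xi)

# A feasible but suboptimal pair, and an optimal pair.
gap1, prod1 = gap_and_slack_product([Fr(1, 4), Fr(3, 4)], [Fr(1, 2), Fr(3, 2)], [Fr(1, 2)])
gap2, prod2 = gap_and_slack_product([1, 0], [0, 1], [1])
```

For the second pair both the gap and the inner product vanish, which is the complementary slackness picture: the primal slack and the dual multiplier are orthogonal at optimality.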

7.37
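♣ Conclusion: A primal-dual pair of conic problems
$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : Ax \ge_K b,\ Px = p \right\} \qquad (P)$$
$$\mathrm{Opt}(D) = \max_{\lambda, \mu} \left\{ b^T \lambda + p^T \mu : \lambda \ge_{K_*} 0,\ A^T \lambda + P^T \mu = c \right\} \qquad (D)$$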
with feasible equality constraints is, geometrically, the problem as follows:
♠ We are given
• a regular cone K in certain RN along with its dual cone K∗
• a linear subspace LP ⊂ RN along with its orthogonal complement LD ⊂ RN
• a pair of vectors ξ̄, λ̄ ∈ RN .
These data define
• Primal feasible set Ξ = [LP − ξ̄] ∩ K ⊂ RN
• Dual feasible set Λ = [LD + λ̄] ∩ K∗ ⊂ RN
♠ We want to find a pair ξ ∈ Ξ and λ ∈ Λ with as small as possible inner product. Whenever
Ξ intersects int K and Λ intersects int K∗ , this geometric problem is solvable, and its optimal
value is 0 (Conic Duality Theorem).

7.38


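♣ The data $\mathcal{L}_P$, $\bar\xi$, $\mathcal{L}_D$, $\bar\lambda$ of the geometric problem associated with (P), (D) are as follows:
$\mathcal{L}_P = \{\xi = Ax : Px = 0\}$
$\bar\xi$: any vector of the form b − Ax with Px = p
$\mathcal{L}_D = \mathcal{L}_P^\perp = \{\lambda : \exists \mu : A^T \lambda + P^T \mu = 0\}$
$\bar\lambda$: any vector λ such that $A^T \lambda + P^T \mu = c$ for some µ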
• Vectors ξ ∈ Ξ are exactly vectors of the form Ax − b coming from feasible solutions x to
(P ), and vectors λ from Λ are exactly the λ-components of the feasible solutions [λ; µ] to
(D).
• ξ∗ , λ∗ form an optimal solution to the geometric problem if and only if ξ∗ = Ax∗ − b with
P x∗ = p, λ∗ can be augmented by some µ∗ to satisfy AT λ∗ + P T µ∗ = c and, in addition, x∗
is optimal for (P ), and [λ∗ ; µ∗ ] is optimal for (D).

7.39


♣ Conic Programming Optimality Conditions: Let both (P) and (D) be essentially
strictly feasible. Then a pair (x, [λ; µ]) of primal and dual feasible solutions is comprised of
optimal solutions to the respective problems if and only if
• [Zero Duality Gap]
$$\mathrm{DualityGap}(x, [\lambda; \mu]) := c^T x - [b^T \lambda + p^T \mu] = 0$$
Indeed, since Opt(P) = Opt(D) by the Conic Duality Theorem,
$$\mathrm{DualityGap}(x, [\lambda; \mu]) = \underbrace{[c^T x - \mathrm{Opt}(P)]}_{\ge 0} + \underbrace{[\mathrm{Opt}(D) - [b^T \lambda + p^T \mu]]}_{\ge 0}$$
and if and only if
• [Complementary Slackness]
$$[Ax - b]^T \lambda = 0$$
Indeed,
$$\begin{aligned}
[Ax - b]^T \lambda &= (A^T \lambda)^T x - b^T \lambda = [c - P^T \mu]^T x - b^T \lambda \\
&= c^T x - [b^T \lambda + \mu^T P x] \\
&= c^T x - [b^T \lambda + p^T \mu] \\
&= \mathrm{DualityGap}(x, [\lambda; \mu])
\end{aligned}$$

7.40
♣ Conic Duality, same as the LP one, is
• fully algorithmic: to write down the dual, given the primal, is a purely mechanical
process
• fully symmetric: the dual problem “remembers” the primal one

♥ Cf. Lagrange Duality:


$$\min_x \{f(x) : g_i(x) \le 0,\ i = 1, \dots, m\} \qquad (P)$$
⇓
$$\max_{y \ge 0} \Big[ \underline{L}(y) := \min_x \Big\{ f(x) + \sum_i y_i g_i(x) \Big\} \Big] \qquad (D)$$
• Dual “exists in the nature”, but is given implicitly; its objective, typically, is not available
in a closed form
• Duality is asymmetric: given L(·), we, typically, cannot recover f and gi ...

7.41
♣ Conic Duality in the case of Magic cones:
• is a powerful tool to process a problem, to some extent, “on paper”, which in many cases
provides extremely valuable insight and/or allows one to end up with a problem much better
suited for numerical processing
• is heavily exploited by efficient polynomial time algorithms for Magic conic problems

7.42
Illustration: Semidefinite Relaxation

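♣ Recall that a quadratically constrained quadratic program
$$\mathrm{Opt} = \min_{x \in \mathbf{R}^n} \{f_0(x) : f_i(x) \le 0,\ 1 \le i \le m\}, \qquad f_i(x) = x^T Q_i x + 2 b_i^T x + c_i,\ 0 \le i \le m \qquad (QP)$$
admits semidefinite relaxation: We associate with x the symmetric matrix
$$X[x] = [x; 1][x; 1]^T = \begin{bmatrix} x x^T & x \\ x^T & 1 \end{bmatrix},$$
rewrite (QP) equivalently as
$$\mathrm{Opt} = \min_X \Big\{ \mathrm{Tr}(F_0 X) : \mathrm{Tr}(F_i X) \le 0,\ 1 \le i \le m,\ \ \underbrace{X = X[x] \text{ for some } x}_{\Leftrightarrow\ X \succeq 0,\ X_{n+1,n+1} = 1,\ \mathrm{Rank}(X) = 1} \Big\} \qquad (QP')$$
$$F_i = \begin{bmatrix} Q_i & b_i \\ b_i^T & c_i \end{bmatrix}$$
and remove the “troublemaking” rank restriction, arriving at the semidefinite relaxation of
(QP), the problem
$$\mathrm{Opt(SDO)} = \min_X \left\{ \mathrm{Tr}(F_0 X) : \mathrm{Tr}(F_i X) \le 0,\ 1 \le i \le m,\ \ X \succeq 0,\ X_{n+1,n+1} = 1 \right\}$$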

7.43

♠ Probabilistic Interpretation of (SDO):


Assume that instead of solving (QP) in deterministic variables x, we are solving the problem
in random vectors ξ and want to minimize the expected value of the objective under the
restriction that the constraints are satisfied on average.
Since the $f_i$ are quadratic, the expectations of the objective and the constraints are affine
functions of the moment matrix
$$X = \mathbf{E}\left\{ \begin{bmatrix} \xi \xi^T & \xi \\ \xi^T & 1 \end{bmatrix} \right\}$$
which can be an arbitrary symmetric positive semidefinite matrix X with $X_{n+1,n+1} = 1$. The
“randomized” version of (QP) is exactly (SDO) (check it!)

7.44
♣ With the outlined interpretation, an optimal solution to (SDO) gives rise to (various) randomized solutions to the problem of interest.
In good cases, we can extract from these randomized solutions feasible solutions to the
problem of interest with reasonable approximation guarantees in terms of optimality.
We can, e.g.,
• use $X_*$ to generate a sample $\xi^1, \dots, \xi^N$ of, say, N = 100 random solutions to (QP),
• “correct” $\xi^t$ to get feasible solutions $x^t$ to (QP).
The approach works when the correction is easy, e.g.,
when at some known point x̄ the constraints of (QP) are
satisfied strictly. Here we can take as $x^t$ the feasible solution
from the segment $[\bar{x}, \xi^t]$ closest to $\xi^t$.
• select from the resulting N feasible solutions $x^t$ to (QP) the best in terms of the objective.
♥ When applicable, the outlined approach can be combined with local improvement: N
runs of any traditional algorithm for nonlinear optimization as applied to (QP), $x^1, \dots, x^N$
being the starting points of the runs.

7.45
Lagrangian Relaxation

♣ Recall that for every MP problem
$$\mathrm{Opt}(P) = \min_{x \in X} \{f(x) : g_i(x) \le 0,\ 1 \le i \le m\} \qquad (P)$$
its Lagrange function
$$L(x, \lambda) = f(x) + \sum_i \lambda_i g_i(x)$$
underestimates f(x) on the feasible set of (P), provided λ ≥ 0, whence
$$\mathrm{Opt}(D) = \max_{\lambda \ge 0} \Big[ \underline{L}(\lambda) := \inf_{x \in X} L(x, \lambda) \Big] \le \mathrm{Opt}(P)$$
(“Weak Lagrange duality”).


♠ Whenever $\underline{L}(\cdot)$ is efficiently computable, Opt(D) is an efficiently computable lower bound
on Opt(P).

7.46
♣ Example:
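$$\mathrm{Opt} = \min_{x \in \mathbf{R}^n} \{f_0(x) : f_i(x) \le 0,\ 1 \le i \le m\}, \qquad f_i(x) = x^T Q_i x + 2 b_i^T x + c_i,\ 0 \le i \le m \qquad (QP)$$
• Applying Lagrange Relaxation Scheme, we get
$$\begin{aligned}
\underline{L}(\lambda) &= \inf_x \Big[ f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) \Big] \\
&= \inf_x \Big[ x^T \Big( Q_0 + \sum_{i=1}^m \lambda_i Q_i \Big) x + 2 \Big( b_0 + \sum_{i=1}^m \lambda_i b_i \Big)^T x + \Big( c_0 + \sum_{i=1}^m \lambda_i c_i \Big) \Big]
\end{aligned}$$
Simple Fact: $x^T P x + 2 q^T x + r \ge \tau$ for all $x \in \mathbf{R}^n$ iff
$$\begin{bmatrix} P & q \\ q^T & r - \tau \end{bmatrix} \succeq 0$$
• Using Simple Fact, the Lagrange dual of (QP) becomes
$$\mathrm{Opt}(D) = \max_{\lambda, \tau} \left\{ \tau : \lambda \ge 0,\ \begin{bmatrix} Q_0 + \sum_{i=1}^m \lambda_i Q_i & b_0 + \sum_{i=1}^m \lambda_i b_i \\ b_0^T + \sum_{i=1}^m \lambda_i b_i^T & c_0 + \sum_{i=1}^m \lambda_i c_i - \tau \end{bmatrix} \succeq 0 \right\}$$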
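To see where the Simple Fact comes from, it may help to work out the case of positive definite P; the general case P ⪰ 0 needs an extra limiting argument that is not reproduced here. This is a sketch under that assumption, using the Schur Complement Lemma with the roles of the two diagonal blocks swapped (which is legitimate by symmetry):

```latex
% Assumption for this sketch: P \succ 0.
\begin{aligned}
\min_x \left[ x^T P x + 2 q^T x + r \right]
  &= r - q^T P^{-1} q
  && \text{(attained at } x = -P^{-1} q\text{)} \\
x^T P x + 2 q^T x + r \ge \tau \ \ \forall x
  &\Leftrightarrow (r - \tau) - q^T P^{-1} q \ge 0 \\
  &\Leftrightarrow \begin{bmatrix} P & q \\ q^T & r - \tau \end{bmatrix} \succeq 0
  && \text{(Schur complement of the block } P\text{)}
\end{aligned}
```

In the scalar case n = 1 with P = p > 0 this reads p(r − τ) ≥ q², i.e., the 2 × 2 determinant is nonnegative.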

7.47

♠ Note: The SDO relaxations of (QP) resulting from our two relaxation schemes read
$$\mathrm{Opt(SDO)} = \min_X \left\{ \mathrm{Tr}(F_0 X) : \mathrm{Tr}(F_i X) \le 0,\ 1 \le i \le m,\ \ X \succeq 0,\ X_{n+1,n+1} = 1 \right\} \qquad (P)$$
$$\mathrm{Opt}(D) = \max_{\lambda, \tau} \left\{ \tau : \begin{bmatrix} Q_0 + \sum_{i=1}^m \lambda_i Q_i & b_0 + \sum_{i=1}^m \lambda_i b_i \\ b_0^T + \sum_{i=1}^m \lambda_i b_i^T & c_0 + \sum_{i=1}^m c_i \lambda_i - \tau \end{bmatrix} \succeq 0,\ \ \lambda \ge 0 \right\} \qquad (D)$$
On closer inspection, they are just semidefinite duals of each other!

7.48
♣ Example: Quadratic Maximization over the box
$$\mathrm{Opt} = \max_x \{x^T L x : x_i^2 \le 1,\ 1 \le i \le n\} \qquad (QP)$$
$$\Rightarrow \mathrm{Opt(SDO)} = \max_X \{\mathrm{Tr}(XL) : X \succeq 0,\ X_{ii} \le 1\ \forall i\} \qquad (SDO)$$
Note: When $L \succeq 0$ or L has zero diagonal, Opt and Opt(SDO) remain intact when the
inequality constraints are replaced with their equality versions.
♠ MAXCUT: The combinatorial problem “given n-node graph with arcs assigned nonnegative weights $a_{ij} = a_{ji}$, 1 ≤ i, j ≤ n, split the nodes into two non-overlapping subsets to
maximize the total weight of the arcs linking nodes from different subsets” is equivalent to
(QP) with
$$L_{ij} = \begin{cases} \sum_k a_{ik}, & j = i \\ -a_{ij}, & j \ne i \end{cases}$$
♠ Theorem of Goemans and Williamson ’94:
Opt ≤ Opt(SDO) ≤ 1.1383 · Opt (!)
Note: To approximate Opt within 4% is NP-hard...
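For a tiny graph, the link between MAXCUT and (QP) can be checked by enumeration: with L as above and x ∈ {−1, 1}ⁿ, one has xᵀLx = Σ_{i<j} a_ij (x_i − x_j)², which is 4 times the weight of the cut defined by the signs of x. The sketch below uses a made-up 4-node graph; the weights and function names are illustrative assumptions, and brute force is only viable for very small n.

```python
from itertools import product

def laplacian(a):
    """L_ii = sum_k a_ik, L_ij = -a_ij for i != j (a has zero diagonal)."""
    n = len(a)
    return [[sum(a[i]) if i == j else -a[i][j] for j in range(n)] for i in range(n)]

def quad(L, x):
    """x^T L x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

def cut_weight(a, x):
    """Total weight of arcs whose endpoints receive different signs."""
    n = len(a)
    return sum(a[i][j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

# Hypothetical weighted graph: a triangle 0-1-2 with unit weights plus arc 1-3 of weight 2.
a = [[0, 1, 1, 0],
     [1, 0, 1, 2],
     [1, 1, 0, 0],
     [0, 2, 0, 0]]
L = laplacian(a)

signs = list(product([-1, 1], repeat=len(a)))
identity_holds = all(quad(L, x) == 4 * cut_weight(a, x) for x in signs)
best_cut = max(cut_weight(a, x) for x in signs)
best_quad = max(quad(L, x) for x in signs)
```

Because this L is positive semidefinite, the maximum of the convex function xᵀLx over the box is attained at a vertex, so restricting the enumeration to sign vectors loses nothing.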

7.49
Sketch of the proof of (!): treat an optimal solution X∗ of (SDO) as the covariance
matrix of zero mean Gaussian random vector ξ and look at
E{sign[ξ]T Lsign[ξ]}.
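The slide only names the quantity to look at; the following is a hedged reconstruction of the standard computation, not the author's own derivation. It assumes the diagonal entries of $X_*$ are positive and relies on the classical arcsine formula for the signs of jointly Gaussian variables:

```latex
% Assumption: \xi \sim N(0, X_*) with (X_*)_{ii} > 0 for all i.
\begin{aligned}
\mathbf{E}\{\mathrm{sign}(\xi_i)\,\mathrm{sign}(\xi_j)\}
  &= \frac{2}{\pi} \arcsin\!\left( \frac{(X_*)_{ij}}{\sqrt{(X_*)_{ii} (X_*)_{jj}}} \right), \\
\mathbf{E}\{\mathrm{sign}[\xi]^T L\, \mathrm{sign}[\xi]\}
  &= \frac{2}{\pi} \sum_{i,j} L_{ij} \arcsin\!\left( \frac{(X_*)_{ij}}{\sqrt{(X_*)_{ii} (X_*)_{jj}}} \right).
\end{aligned}
```

Since sign[ξ] is always feasible for (QP), Opt is at least this expectation; the remaining work in the proof is to compare the right hand side with Tr($X_* L$) = Opt(SDO) using the specific structure of L.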

7.50
Illustration: MAXCUT, 1024 nodes, 2614 arcs.

Suboptimal cut, weight ≥ 0.9196 · Opt(SDO) ≥ 0.9196 · Opt

Slightly better than Goemans-Williamson guarantee:
weight ≥ 0.8785 · Opt(SDO) ≥ 0.8785 · Opt

7.51
♠ Nesterov’s π/2 Theorem. Matrix L arising in MAXCUT is $\succeq 0$ (and possesses additional properties). What can be said about (SDO) under the only restriction $L \succeq 0$?
Answer [Nesterov ’98]: Opt ≤ Opt(SDO) ≤ $\frac{\pi}{2}$ · Opt.
Illustration: L: randomly built positive semidefinite 1024 × 1024 matrix. Relaxation com-
bined with local improvement yields a feasible solution x̄ with
x̄T Lx̄ ≥ 0.7867 · Opt(SDO) ≥ 0.7867 · Opt

7.52

♠ The case of indefinite L: When L is an arbitrary symmetric matrix, one has
Opt ≤ Opt(SDO) ≤ O(1) ln(n) Opt.
This is a particular case of the following result: The SDO relaxation
$$\mathrm{Opt(SDO)} = \max_X \{\mathrm{Tr}(XL) : X \succeq 0,\ \mathrm{Tr}(XQ_i) \le 1,\ i \le m\}$$
of the problem
$$\mathrm{Opt} = \max_x \left\{ x^T L x : x^T Q_i x \le 1,\ i \le m \right\}, \qquad Q_i \succeq 0\ \forall i,\ \ \sum_i Q_i \succ 0 \qquad (P)$$
satisfies Opt ≤ Opt(SDO) ≤ O(1) ln(m) Opt.
Illustration, A: Problem (QP ) with randomly selected indefinite 1024 × 1024 matrix L.
Relaxation combined with local improvement yields a feasible solution x̄ with
x̄T Lx̄ ≥ 0.7649 · Opt(SDO) ≥ 0.7649 · Opt
Illustration, B: Problem (P ) with randomly selected indefinite 1024×1024 matrix L and 64
randomly selected positive semidefinite matrices Qi of rank 64. Relaxation yields a feasible
solution x̄ with
x̄T Lx̄ ≥ 0.9969 · Opt(SDO) ≥ 0.9969 · Opt

7.53
Illustration: Lyapunov Stability Analysis

♣ Consider an uncertain time varying linear dynamical system
$$\frac{d}{dt} x(t) = A(t) x(t) \qquad \text{(ULS)}$$
• $x(t) \in \mathbf{R}^n$: state at time t,
• $A(t) \in \mathbf{R}^{n \times n}$: known to take all values in a given uncertainty set $\mathcal{U} \subset \mathbf{R}^{n \times n}$.
♠ (ULS) is called stable, if all trajectories of the system go to 0 as t → ∞:
$$A(t) \in \mathcal{U}\ \forall t \ge 0,\ \ \frac{d}{dt} x(t) = A(t) x(t)\ \Rightarrow\ \lim_{t \to \infty} x(t) = 0.$$
♣ Question: How to certify stability?
♠ Standard sufficient stability condition is the existence of a Lyapunov Stability Certificate,
that is, a matrix $X \succ 0$ such that the function $L(x) = x^T X x$ for some α > 0 satisfies
$\frac{d}{dt} L(x(t)) \le -\alpha L(x(t))$ for all trajectories and thus goes to 0 exponentially fast along the trajectories:
$$\begin{aligned}
\frac{d}{dt} L(x(t)) \le -\alpha L(x(t)) &\Rightarrow \frac{d}{dt} \left[ \exp\{\alpha t\} L(x(t)) \right] \le 0 \\
&\Rightarrow \exp\{\alpha t\} L(x(t)) \le L(x(0)),\ t \ge 0 \\
&\Rightarrow L(x(t)) \le \exp\{-\alpha t\} L(x(0)) \\
&\Rightarrow \|x(t)\|_2^2 \le \frac{\lambda_{\max}(X)}{\lambda_{\min}(X)} \exp\{-\alpha t\} \|x(0)\|_2^2
\end{aligned}$$
• For a time-invariant system, this condition is necessary and sufficient for stability.

7.54
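♠ Question: When is α > 0 such that
$\frac{d}{dt} L(x(t)) \le -\alpha L(x(t))$ for all trajectories x(t) satisfying
$\frac{d}{dt} x(t) = A(t) x(t)$ with $A(t) \in \mathcal{U}$ for all t?
♥ Answer: We should have
$$\begin{aligned}
\frac{d}{dt} \left[ x^T(t) X x(t) \right] &= \Big( \frac{d}{dt} x(t) \Big)^T X x(t) + x^T(t) X \frac{d}{dt} x(t) \\
&= x^T(t) A^T(t) X x(t) + x^T(t) X A(t) x(t) \\
&= x^T(t) \left[ A^T(t) X + X A(t) \right] x(t) \\
&\le -\alpha x^T(t) X x(t)
\end{aligned}$$
Thus,
$$\begin{aligned}
&\frac{d}{dt} L(x(t)) \le -\alpha L(x(t)) \text{ for all trajectories} \\
&\Leftrightarrow x^T(t) \left[ A^T(t) X + X A(t) \right] x(t) \le -\alpha x^T(t) X x(t) \text{ for all trajectories} \\
&\Leftrightarrow x^T(t) \left[ A^T(t) X + X A(t) + \alpha X \right] x(t) \le 0 \text{ for all trajectories} \\
&\Leftrightarrow A^T X + X A \preceq -\alpha X\ \ \forall A \in \mathcal{U}
\end{aligned}$$
⇒ $X \succ 0$ is an LSC for a given α > 0 iff X solves the semi-infinite LMI
$$A^T X + X A \preceq -\alpha X\ \ \forall A \in \mathcal{U}$$
⇒ Uncertain linear dynamical system
$$\frac{d}{dt} x(t) = A(t) x(t),\ A(t) \in \mathcal{U}$$
admits an LSC iff the semi-infinite system of LMI’s
$$X \succeq I,\ \ A^T X + X A \preceq -I\ \ \forall A \in \mathcal{U}$$
in matrix variable X is solvable.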
♠ But: SDP is about finite, and not semi-infinite, systems of LMI’s. Semi-infinite systems
of LMI’s typically are heavily computationally intractable...

7.55
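$$X \succeq I,\ \ A^T X + X A \preceq -I\ \ \forall A \in \mathcal{U} \qquad (!)$$

♠ Solvable case I: Scenario (a.k.a. polytopic) uncertainty $\mathcal{U} = \mathrm{Conv}\{A_1, \dots, A_N\}$. Here (!)
is equivalent to the finite system of LMI’s
$$X \succeq I,\ \ A_k^T X + X A_k \preceq -I,\ 1 \le k \le N$$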
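The reduction to the vertices works because AᵀX + XA is affine in A, so the LMI at a convex combination is the same convex combination of the vertex LMIs. The sketch below illustrates this on made-up 2 × 2 data (the vertices `A1`, `A2` and the candidate certificate `X` are assumptions); for 2 × 2 symmetric matrices, positive semidefiniteness is tested via nonnegative diagonal and determinant.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def lmi_residual(A, X):
    """Return -I - (A^T X + X A); the LMI A^T X + X A <= -I holds iff this matrix is PSD."""
    S = matmul(transpose(A), X)
    T = matmul(X, A)
    n = len(A)
    return [[-(1 if i == j else 0) - (S[i][j] + T[i][j]) for j in range(n)] for i in range(n)]

def is_psd_2x2(M):
    """A symmetric 2x2 matrix is PSD iff its diagonal entries and determinant are nonnegative."""
    return M[0][0] >= 0 and M[1][1] >= 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] >= 0

A1 = [[-1, 1], [0, -1]]
A2 = [[-2, 0], [0, -1]]
X = [[2, 0], [0, 2]]                     # candidate certificate; X - I = I is PSD

vertices_ok = all(is_psd_2x2(lmi_residual(A, X)) for A in (A1, A2))

def combo(theta):
    """Convex combination theta * A1 + (1 - theta) * A2."""
    return [[theta * A1[i][j] + (1 - theta) * A2[i][j] for j in range(2)] for i in range(2)]

thetas = [Fraction(k, 10) for k in range(11)]
segment_ok = all(is_psd_2x2(lmi_residual(combo(t), X)) for t in thetas)
```

Checking a grid of convex combinations is only a sanity check; the actual guarantee for the whole polytope comes from the affine dependence on A together with the convexity of the semidefinite cone.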
♠ Solvable case II: Unstructured Norm-Bounded uncertainty
$$\mathcal{U} = \{A = \bar{A} + B \Delta C : \|\Delta\|_{2,2} \le \rho\},$$
• $\|\cdot\|_{2,2}$: spectral norm of a matrix.
♥ Example: We close an open loop time invariant system
$$\frac{d}{dt} x(t) = P x(t) + B u(t) \ \ \text{[state equations]}, \qquad y(t) = C x(t) \ \ \text{[observed output]}$$
with linear feedback
$$u(t) = K y(t),$$
thus arriving at the closed loop system
$$\frac{d}{dt} x(t) = [P + BKC] x(t)$$
and want to certify stability of the closed loop system when the feedback matrix K is subject
to time-varying norm-bounded perturbations:
$$K = K(t) \in \mathcal{V} = \{\bar{K} + \Delta : \|\Delta\|_{2,2} \le \rho\}.$$
This is exactly the same as to certify stability of the system
$$\frac{d}{dt} x(t) = A(t) x(t),\ \ A(t) \in \mathcal{U} = \{\underbrace{P + B \bar{K} C}_{\bar{A}} + B \Delta C : \|\Delta\|_{2,2} \le \rho\}$$
with unstructured norm-bounded uncertainty.

7.56
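• Observation: The semi-infinite system of LMI’s
$$X \succeq I\ \ \&\ \ A^TX + XA \preceq -I\ \ \forall (A = \bar A + B\Delta C : \|\Delta\|_{2,2} \le \rho)$$
is of the generic form
$$\begin{array}{ll}
(A): & \text{finite system of LMI's in variables } x\\
(!): & \text{semi-infinite LMI } \mathcal{A}(x) + L^T(x)\Delta R + R^T\Delta^TL(x) \succeq 0\ \ \forall(\Delta : \|\Delta\|_{2,2} \le \rho)
\end{array}$$
[$\mathcal{A}(x)$, $L(x)$: affine in $x$]
♠ Fact: [S. Boyd et al, early 90’s] Assuming w.l.o.g. that $R \ne 0$, the semi-infinite LMI (!) can be equivalently represented by the usual LMI
$$\begin{bmatrix} \mathcal{A}(x) - \lambda R^TR & \rho L^T(x)\\ \rho L(x) & \lambda I \end{bmatrix} \succeq 0 \qquad (!!)$$
in variables $x, \lambda$, meaning that $x$ satisfies (!) if and only if $x$ can be augmented by properly selected $\lambda$ to satisfy (!!).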

7.57
♣ Key argument when proving Fact:
S-Lemma: A homogeneous quadratic inequality
$$x^TBx \ge 0 \qquad (B)$$
is a consequence of strictly feasible homogeneous quadratic inequality
$$x^TAx \ge 0 \qquad (A)$$
if and only if (B) can be obtained by taking weighted sum, with nonnegative weights, of (A) and identically true homogeneous quadratic inequality:
$$\exists(\lambda \ge 0\ \&\ C : \underbrace{x^TCx \ge 0\ \forall x}_{\Leftrightarrow\, C \succeq 0}) :\ x^TBx \equiv \lambda x^TAx + x^TCx$$
or, which is the same, if and only if
$$\exists \lambda \ge 0 : B \succeq \lambda A.$$
Immediate corollary: A quadratic inequality
$$x^TBx + 2b^Tx + \beta \ge 0$$
is a consequence of strictly feasible quadratic inequality
$$x^TAx + 2a^Tx + \alpha \ge 0$$
iff
$$\exists \lambda \ge 0 : \begin{bmatrix} B - \lambda A & b - \lambda a\\ b^T - \lambda a^T & \beta - \lambda\alpha \end{bmatrix} \succeq 0$$
⇒ We can efficiently optimize a quadratic function over the set given by a single strictly feasible quadratic constraint.

7.58
♠ Note: The “if” part of the S-Lemma is evident and remains true when we replace (A) with a finite system of quadratic inequalities: Let a system of homogeneous quadratic inequalities
$$x^TA_ix \ge 0,\ 1 \le i \le m,$$
and a “target” inequality $x^TBx \ge 0$ be given. If the target inequality can be obtained by taking weighted sum, with nonnegative coefficients, of the inequalities of the system and an identically true homogeneous quadratic inequality, or, equivalently, if there exist $\lambda_i \ge 0$ such that
$$B \succeq \sum_i \lambda_iA_i,$$
then the target inequality is a consequence of the system.

7.59
$$\exists \lambda_i \ge 0 : B \succeq \sum_{i=1}^m \lambda_iA_i \qquad (!)$$
⇒ $x^TBx \ge 0$ is a consequence of $x^TA_ix \ge 0$, $1 \le i \le m$
• If instead of homogeneous quadratic inequalities we were speaking about homogeneous
linear ones, similar sufficient condition for the target inequality to be a consequence of the
system would be also necessary (Homogeneous Farkas Lemma).
• The power of S-Lemma is in the claim that when m = 1, the sufficient condition (!) for
the target inequality xT Bx ≥ 0 to be a consequence of the system xT Ai x ≥ 0, 1 ≤ i ≤ m, is
also necessary, provided the “system” xT A1 x ≥ 0 is strictly feasible.
The “necessity” part of S-Lemma fails to be true when m > 1.

7.60
From S-Lemma to Fact

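$$\begin{array}{c}
A + L^T\Delta R + R^T\Delta^TL \succeq 0\ \ \forall(\Delta : \|\Delta\|_{2,2} \le \rho)\\
\Updownarrow\\
A + [\rho L]^T\Delta R + R^T\Delta^T[\rho L] \succeq 0\ \ \forall(\Delta : \|\Delta\|_{2,2} \le 1)\\
\Updownarrow\\
\xi^TA\xi + 2[\underbrace{\Delta R\xi}_{\eta}]^T[\rho L\xi] \ge 0\ \ \forall(\Delta, \xi : \|\Delta\|_{2,2} \le 1)\\
\Updownarrow\\
\xi^TA\xi + 2\eta^T[\rho L\xi] \ge 0\ \ \forall(\xi, \eta : \|\eta\|_2 \le \|R\xi\|_2)\\
\Updownarrow\\
\xi^TR^TR\xi - \eta^T\eta \ge 0 \ \Rightarrow\ \xi^TA\xi + 2\eta^T\rho L\xi \ge 0\\
\Updownarrow\\
\exists \lambda \ge 0 :\ \xi^TA\xi + 2\eta^T\rho L\xi - \lambda[\xi^TR^TR\xi - \eta^T\eta] \ge 0\ \ \forall(\xi, \eta)\\
\Updownarrow\\
\exists \lambda \ge 0 :\ \begin{bmatrix} A - \lambda R^TR & \rho L^T\\ \rho L & \lambda I \end{bmatrix} \succeq 0.
\end{array}$$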

7.61
Proof of the “only if” part of S-Lemma

• Situation: We are given two symmetric matrices $A$, $B$ such that
(I): $\exists \bar x : \bar x^TA\bar x > 0$
and
(II): $x^TAx \ge 0$ implies $x^TBx \ge 0$
or, equivalently,
(I-II): $\mathrm{Opt} := \min_x\{x^TBx : x^TAx \ge 0\} \ge 0$ and the constraint $x^TAx \ge 0$ is strictly feasible
• Goal: To prove that
(III): $\exists \lambda \ge 0 : B \succeq \lambda A$
or, equivalently, that
(III′): $\mathrm{SDO} := \min_X\{\mathrm{Tr}(BX) : \mathrm{Tr}(AX) \ge 0,\ X \succeq 0\} \ge 0$.
Equivalence of (III) and (III′): By (I), the semidefinite program in (III′) is strictly feasible. Since the program is homogeneous, its optimal value is either 0, or $-\infty$. By Conic Duality, the optimal value is finite (i.e., 0) if and only if the dual problem
$$\max_{\lambda, Y}\{0 : B = \lambda A + Y,\ \lambda \ge 0,\ Y \succeq 0\}$$
is solvable, which is exactly (III).

7.62
• Given that $x^TAx \ge 0$ implies $x^TBx \ge 0$, we should prove that
$\mathrm{Tr}(BX) \ge 0$ whenever $\mathrm{Tr}(AX) \ge 0$ and $X \succeq 0$
• Let $X \succeq 0$ be such that $\mathrm{Tr}(AX) \ge 0$, and let us prove that $\mathrm{Tr}(BX) \ge 0$.
There exists orthogonal $U$ such that $U^TX^{1/2}AX^{1/2}U$ is diagonal
⇒ For every vector $\xi$ with $\pm 1$ entries:
$$\begin{aligned}
[X^{1/2}U\xi]^TA[X^{1/2}U\xi] &= \xi^T\underbrace{[U^TX^{1/2}AX^{1/2}U]}_{\text{diagonal}}\xi\\
&= \mathrm{Tr}(U^TX^{1/2}AX^{1/2}U)\\
&= \mathrm{Tr}(AX) \ge 0
\end{aligned}$$
⇒ For every vector $\xi$ with $\pm 1$ entries:
$$0 \le [X^{1/2}U\xi]^TB[X^{1/2}U\xi] = \xi^T[U^TX^{1/2}BX^{1/2}U]\xi$$
⇒ [Taking average over $\pm 1$ vectors $\xi$]
$$0 \le \mathrm{Tr}(U^TX^{1/2}BX^{1/2}U) = \mathrm{Tr}(BX)$$
Thus, $\mathrm{Tr}(BX) \ge 0$, as claimed.

7.63
Interior Point Methods for LO and SDO

♣ Interior Point Methods (IPM’s) are state-of-the-art theoretically and practically efficient polynomial time algorithms for solving well-structured convex optimization programs, primarily Linear, Conic Quadratic and Semidefinite ones.
Modern IPMs were first developed for LO, and the words “Interior Point” are aimed at
stressing the fact that instead of traveling along the vertices of the feasible set, as in the
Simplex algorithm, the new methods work in the interior of the feasible domain.
♠ Basic theory of IPMs remains the same when passing from LO to SDO
⇒ It makes sense to study this theory in the more general SDO case.

8.1
Primal-Dual Pair of SDO Programs

♣ Consider an SDO program in the form
$$\mathrm{Opt}(P) = \min_x\Big\{c^Tx : \mathcal{A}x := \sum_{j=1}^n x_jA_j \succeq B\Big\} \qquad (P)$$
where $A_j$, $B$ are $m \times m$ block diagonal symmetric matrices of a given block-diagonal structure $\nu$ (i.e., with a given number and given sizes of diagonal blocks). $(P)$ can be thought of as a conic problem on the self-dual and regular positive semidefinite cone $\mathbf{S}^\nu_+$ in the space $\mathbf{S}^\nu$ of symmetric block diagonal $m \times m$ matrices with block-diagonal structure $\nu$.
Note: In the diagonal case (with the block-diagonal structure in question, all diagonal blocks are of size 1), $(P)$ becomes an LO program with $m$ linear inequality constraints and $n$ variables.

8.2

♠ Standing Assumption A: The mapping x 7→ Ax has trivial kernel, or, equivalently, the
matrices A1 , ..., An are linearly independent.
♠ The problem dual to $(P)$ is
$$\mathrm{Opt}(D) = \max_{S \in \mathbf{S}^\nu}\{\mathrm{Tr}(BS) : S \succeq 0,\ \mathrm{Tr}(A_jS) = c_j\ \forall j\} \qquad (D)$$
♠ Standing Assumption B: Both (P ) and (D) are strictly feasible (⇒ both problems are
solvable with equal optimal values).
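♠ Let $C \in \mathbf{S}^\nu$ satisfy the equality constraint in $(D)$. Passing in $(P)$ from $x$ to the primal slack $X = \mathcal{A}x - B$, $(P)$ becomes the problem
$$\mathrm{Opt}(\mathcal{P}) = \min_{X \in \mathbf{S}^\nu}\{\mathrm{Tr}(CX) : X \in [L_P - B] \cap \mathbf{S}^\nu_+\} \qquad (\mathcal{P})$$
$$[L_P = \{X = \mathcal{A}x\} = \mathrm{Lin}\{A_1, ..., A_n\}]$$
while $(D)$ is the problem
$$\mathrm{Opt}(\mathcal{D}) = \max_{S \in \mathbf{S}^\nu}\{\mathrm{Tr}(BS) : S \in [L_D + C] \cap \mathbf{S}^\nu_+\} \qquad (\mathcal{D})$$
$$[L_D = L_P^\perp = \{S : \mathrm{Tr}(A_jS) = 0,\ 1 \le j \le n\}]$$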

8.3
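$$\begin{array}{rll}
\mathrm{Opt}(P) &= \min_x\big\{c^Tx : \mathcal{A}x := \sum_{j=1}^n x_jA_j \succeq B\big\} & (P)\\
\Leftrightarrow\ \mathrm{Opt}(\mathcal{P}) &= \min_X\{\mathrm{Tr}(CX) : X \in [L_P - B] \cap \mathbf{S}^\nu_+\} & (\mathcal{P})\\
\mathrm{Opt}(\mathcal{D}) &= \max_S\{\mathrm{Tr}(BS) : S \in [L_D + C] \cap \mathbf{S}^\nu_+\} & (\mathcal{D})
\end{array}$$
$$[L_P = \mathrm{Im}\,\mathcal{A},\ L_D = L_P^\perp]$$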

Since (P ) and (D) are strictly feasible, both problems are solvable with equal optimal values,
and a pair of feasible solutions X to (P) and S to (D) is comprised of optimal solutions to
the respective problems iff Tr(XS) = 0.
Fact: For positive semidefinite X, S, Tr(XS) = 0 if and only if XS = SX = 0.

8.4
Proof: • Standard Fact of Linear Algebra: For every matrix $A \succeq 0$ there exists exactly one matrix $B \succeq 0$ such that $A = B^2$; $B$ is denoted $A^{1/2}$.
• Standard Fact of Linear Algebra: Whenever $A$, $B$ are matrices such that the product $AB$ makes sense and is a square matrix, $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$.
• Standard Fact of Linear Algebra: Whenever $A \succeq 0$ and $QAQ^T$ makes sense, we have $QAQ^T \succeq 0$.
• Standard Facts of LA ⇒ Claim:
$0 = \mathrm{Tr}(XS) = \mathrm{Tr}(X^{1/2}X^{1/2}S) = \mathrm{Tr}(X^{1/2}SX^{1/2})$ ⇒ All diagonal entries in the positive semidefinite matrix $X^{1/2}SX^{1/2}$ are zeros ⇒ $X^{1/2}SX^{1/2} = 0$
⇒ $(S^{1/2}X^{1/2})^T(S^{1/2}X^{1/2}) = 0$ ⇒ $S^{1/2}X^{1/2} = 0$
⇒ $SX = S^{1/2}[S^{1/2}X^{1/2}]X^{1/2} = 0$ ⇒ $XS = (SX)^T = 0$. ∎
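A small numerical illustration of this fact, as a sketch with hypothetical 2×2 data: two rank-one positive semidefinite matrices with orthogonal ranges have zero trace product and zero matrix product, while a pair with overlapping ranges has positive trace product.

```python
# Sketch: Tr(XS) = 0 forces XS = SX = 0 for positive semidefinite X, S (2x2 example).
# The matrices are illustrative assumptions: X = v v^T, S = w w^T, S2 = e1 e1^T.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(P):
    return P[0][0] + P[1][1]

def is_zero(P, tol=1e-12):
    return all(abs(P[i][j]) <= tol for i in range(2) for j in range(2))

X = [[1.0, 1.0], [1.0, 1.0]]      # v v^T with v = (1, 1)
S = [[1.0, -1.0], [-1.0, 1.0]]    # w w^T with w = (1, -1), orthogonal to v
S2 = [[1.0, 0.0], [0.0, 0.0]]     # e1 e1^T, not orthogonal to v

XS, SX = matmul(X, S), matmul(S, X)
XS2 = matmul(X, S2)

orthogonal_case = (trace(XS), is_zero(XS), is_zero(SX))
overlapping_case = (trace(XS2), is_zero(XS2))
```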

8.5

Theorem: Assuming (P ), (D) strictly feasible, feasible solutions X for (P) and S for (D)
are optimal for the respective problems if and only if
XS = SX = 0
(“SDO Complementary Slackness”).

8.6
Logarithmic Barrier for the Semidefinite Cone Sν+

♣ A crucial role in building IPMs for $(P)$, $(D)$ is played by the logarithmic barrier for the positive semidefinite cone:
$$K(X) = -\ln\mathrm{Det}(X) : \mathrm{int}\,\mathbf{S}^\nu_+ \to \mathbf{R}$$

8.7
Back to Basic Analysis: Gradient and Hessian

♣ Consider a smooth (3 times continuously differentiable) function $f(x) : D \to \mathbf{R}$ defined on an open subset $D$ of Euclidean space $E$.
♠ The first order directional derivative of $f$ taken at a point $x \in D$ along a direction $h \in E$ is the quantity
$$Df(x)[h] := \frac{d}{dt}\Big|_{t=0} f(x + th)$$
Fact: For a smooth $f$, $Df(x)[h]$ is linear in $h$ and thus
$$Df(x)[h] = \langle \nabla f(x), h\rangle\ \ \forall h$$
for a uniquely defined vector $\nabla f(x)$ called the gradient of $f$ at $x$.
If $E$ is $\mathbf{R}^n$ with the standard Euclidean structure, then
$$[\nabla f(x)]_i = \frac{\partial}{\partial x_i}f(x),\ 1 \le i \le n$$

8.8
♠ The second order directional derivative of $f$ taken at a point $x \in D$ along a pair of directions $g$, $h$ is defined as
$$D^2f(x)[g, h] = \frac{d}{dt}\Big|_{t=0}\big[Df(x + tg)[h]\big]$$
Fact: For a smooth $f$, $D^2f(x)[g, h]$ is bilinear and symmetric in $g$, $h$, and therefore
$$D^2f(x)[g, h] = \langle g, \nabla^2f(x)h\rangle = \langle \nabla^2f(x)g, h\rangle\ \ \forall g, h \in E$$
for a uniquely defined linear mapping $h \mapsto \nabla^2f(x)h : E \to E$, called the Hessian of $f$ at $x$.
If $E$ is $\mathbf{R}^n$ with the standard Euclidean structure, then
$$[\nabla^2f(x)]_{ij} = \frac{\partial^2}{\partial x_i\partial x_j}f(x)$$
Fact: Hessian is the derivative of the gradient:
$$\nabla f(x + h) = \nabla f(x) + [\nabla^2f(x)]h + R_x(h),\quad \|R_x(h)\| \le C_x\|h\|^2\ \ \forall(h : \|h\| \le \rho_x),\ \rho_x > 0$$
Fact: Gradient and Hessian define the second order Taylor expansion
$$\widehat f(y) = f(x) + \langle y - x, \nabla f(x)\rangle + \tfrac{1}{2}\langle y - x, \nabla^2f(x)[y - x]\rangle$$
of $f$ at $x$ which is a quadratic function of $y$ with the same gradient and Hessian at $x$ as those of $f$. This expansion approximates $f$ around $x$, specifically,
$$|f(y) - \widehat f(y)| \le C_x\|y - x\|^3\ \ \forall(y : \|y - x\| \le \rho_x),\ \rho_x > 0$$

8.9
Back to SDO
$$K(X) = -\ln\mathrm{Det}X : \mathbf{S}^\nu_{++} := \{X \in \mathbf{S}^\nu : X \succ 0\} \to \mathbf{R}$$
Facts: $K(X)$ is a smooth function on its domain $\mathbf{S}^\nu_{++} = \{X \in \mathbf{S}^\nu : X \succ 0\}$. The first and the second order directional derivatives of this function taken at a point $X \in \mathrm{Dom}\,K$ along a direction $H \in \mathbf{S}^\nu$ are given by
$$\frac{d}{dt}\Big|_{t=0}K(X + tH) = -\mathrm{Tr}(X^{-1}H)\ \ [\Leftrightarrow \nabla K(X) = -X^{-1}]$$
$$\frac{d^2}{dt^2}\Big|_{t=0}K(X + tH) = \mathrm{Tr}(H[X^{-1}HX^{-1}]) = \mathrm{Tr}([X^{-1/2}HX^{-1/2}]^2)$$
In particular, $K$ is strongly convex:
$$X \in \mathrm{Dom}\,K,\ 0 \ne H \in \mathbf{S}^\nu \ \Rightarrow\ \frac{d^2}{dt^2}\Big|_{t=0}K(X + tH) > 0$$
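These derivative formulas can be sanity-checked against finite differences. The following is a minimal sketch for 2×2 matrices; the point `X` and the direction `H` are arbitrary illustrative choices, and the step sizes are assumptions tuned for double precision.

```python
import math

# Sketch: finite-difference check of the directional derivatives of K(X) = -ln Det(X).
# X (positive definite) and H (symmetric direction) are hypothetical example data.

def det(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def inv(P):
    d = det(P)
    return [[P[1][1] / d, -P[0][1] / d], [-P[1][0] / d, P[0][0] / d]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(P):
    return P[0][0] + P[1][1]

def K(P):
    return -math.log(det(P))

def shifted(P, H, t):
    return [[P[i][j] + t * H[i][j] for j in range(2)] for i in range(2)]

X = [[2.0, 1.0], [1.0, 3.0]]
H = [[1.0, 0.5], [0.5, 1.0]]

XinvH = matmul(inv(X), H)
first_exact = -trace(XinvH)                  # -Tr(X^{-1} H)
second_exact = trace(matmul(XinvH, XinvH))   # Tr(X^{-1} H X^{-1} H)

t1 = 1e-5   # step for the central first difference
first_fd = (K(shifted(X, H, t1)) - K(shifted(X, H, -t1))) / (2 * t1)

t2 = 1e-4   # step for the central second difference
second_fd = (K(shifted(X, H, t2)) - 2 * K(X) + K(shifted(X, H, -t2))) / t2 ** 2
```

In this example the second derivative is positive, in line with the strong convexity of $K$.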

8.10
Proof:

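$$\begin{aligned}
\frac{d}{dt}\Big|_{t=0}[-\ln\mathrm{Det}(X + tH)] &= \frac{d}{dt}\Big|_{t=0}[-\ln\mathrm{Det}(X[I + tX^{-1}H])]\\
&= \frac{d}{dt}\Big|_{t=0}[-\ln\mathrm{Det}(X) - \ln\mathrm{Det}(I + tX^{-1}H)]\\
&= \frac{d}{dt}\Big|_{t=0}[-\ln\mathrm{Det}(I + tX^{-1}H)]\\
&= -\frac{d}{dt}\Big|_{t=0}[\mathrm{Det}(I + tX^{-1}H)] \quad \text{[chain rule]}\\
&= -\mathrm{Tr}(X^{-1}H)
\end{aligned}$$
$$\begin{aligned}
\frac{d}{dt}\Big|_{t=0}\big[-\mathrm{Tr}([X + tG]^{-1}H)\big] &= \frac{d}{dt}\Big|_{t=0}\big[-\mathrm{Tr}([X[I + tX^{-1}G]]^{-1}H)\big]\\
&= \frac{d}{dt}\Big|_{t=0}\big[-\mathrm{Tr}([I + tX^{-1}G]^{-1}X^{-1}H)\big]\\
&= -\mathrm{Tr}\Big(\Big[\frac{d}{dt}\Big|_{t=0}[I + tX^{-1}G]^{-1}\Big]X^{-1}H\Big)\\
&= \mathrm{Tr}(X^{-1}GX^{-1}H)
\end{aligned}$$
In particular, when $X \succ 0$ and $H \in \mathbf{S}^\nu$, $H \ne 0$, we have
$$\begin{aligned}
\frac{d^2}{dt^2}\Big|_{t=0}K(X + tH) &= \mathrm{Tr}(X^{-1}HX^{-1}H)\\
&= \mathrm{Tr}(X^{-1/2}[X^{-1/2}HX^{-1/2}]X^{-1/2}H)\\
&= \mathrm{Tr}([X^{-1/2}HX^{-1/2}]X^{-1/2}HX^{-1/2})\\
&= \langle X^{-1/2}HX^{-1/2}, X^{-1/2}HX^{-1/2}\rangle > 0.
\end{aligned}$$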

8.11
Additional properties of K(·):
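• $\nabla K(tX) = -[tX]^{-1} = -t^{-1}X^{-1} = t^{-1}\nabla K(X)$
• The mapping $X \mapsto -\nabla K(X) = X^{-1}$ maps the domain $\mathbf{S}^\nu_{++}$ of $K$ onto itself and is self-inverse:
$$S = -\nabla K(X) \Leftrightarrow X = -\nabla K(S) \Leftrightarrow XS = SX = I$$
• The function $K(X)$ is an interior penalty for the positive semidefinite cone $\mathbf{S}^\nu_+$: whenever points $X_i \in \mathrm{Dom}\,K = \mathbf{S}^\nu_{++}$ converge to a boundary point of $\mathbf{S}^\nu_+$, one has $K(X_i) \to \infty$ as $i \to \infty$.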

8.12
Primal-Dual Central Path

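Let
$$\mathcal{X} = \{X \in L_P - B : X \succ 0\},\qquad \mathcal{S} = \{S \in L_D + C : S \succ 0\}$$
be the (nonempty!) sets of strictly feasible solutions to $(\mathcal{P})$ and $(\mathcal{D})$, respectively. Given path parameter $\mu > 0$, consider the functions
$$\begin{aligned}
P_\mu(X) &= \mathrm{Tr}(CX) + \mu K(X) : \mathcal{X} \to \mathbf{R}\\
D_\mu(S) &= -\mathrm{Tr}(BS) + \mu K(S) : \mathcal{S} \to \mathbf{R}
\end{aligned}$$
Fact: For every $\mu > 0$, the function $P_\mu(X)$ achieves its minimum on $\mathcal{X}$ at a unique point $X_*(\mu)$, and the function $D_\mu(S)$ achieves its minimum on $\mathcal{S}$ at a unique point $S_*(\mu)$. These points are related to each other:
$$X_*(\mu) = \mu S_*^{-1}(\mu) \Leftrightarrow S_*(\mu) = \mu X_*^{-1}(\mu) \Leftrightarrow X_*(\mu)S_*(\mu) = S_*(\mu)X_*(\mu) = \mu I$$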
We associate with (P), (D) the primal-dual central path – the curve {X∗ (µ), S∗ (µ)}µ>0 ; for
every µ > 0, X∗ (µ) is a strictly feasible solution to (P), and S∗ (µ) is a strictly feasible
solution to (D).

8.13

Proof of the fact: A. Let us prove that the primal-dual central path is well defined. Let $\bar S$ be a strictly feasible solution to $(\mathcal{D})$. For every pair of feasible solutions $X$, $X'$ to $(\mathcal{P})$ we have
$$\langle X - X', \bar S\rangle = \langle X - X', C\rangle + \langle \underbrace{X - X'}_{\in L_P}, \underbrace{\bar S - C}_{\in L_D = L_P^\perp}\rangle = \langle X - X', C\rangle$$
⇒ On the feasible plane of $(\mathcal{P})$, the linear functions $\mathrm{Tr}(CX)$ and $\mathrm{Tr}(\bar SX)$ of $X$ differ by a constant
⇒ To prove the existence of $X_*(\mu)$ is the same as to prove that the feasible problem
$$\mathrm{Opt} = \min_{X \in \mathcal{X}}\big[\mathrm{Tr}(\bar SX) + \mu K(X)\big] \qquad (R)$$
is solvable.

8.14
 
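$$\mathrm{Opt} = \min_{X \in \mathcal{X}}\big[\mathrm{Tr}(\bar SX) + \mu K(X)\big] \qquad (R)$$
Let $X_i \in \mathcal{X}$ be such that
$$\mathrm{Tr}(\bar SX_i) + \mu K(X_i) \to \mathrm{Opt} \text{ as } i \to \infty.$$
We claim that a properly selected subsequence $\{X_{i_j}\}_{j=1}^\infty$ of the sequence $\{X_i\}$ has a limit $\bar X \succ 0$.
Claim ⇒ Solvability of (R): Since $X_{i_j} \to \bar X \succ 0$ as $j \to \infty$, we have $\bar X \in \mathcal{X}$ and
$$\mathrm{Opt} = \lim_{j\to\infty}\big[\mathrm{Tr}(\bar SX_{i_j}) + \mu K(X_{i_j})\big] = \mathrm{Tr}(\bar S\bar X) + \mu K(\bar X)$$
⇒ $\bar X$ is an optimal solution to (R).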

8.15
Proof of Claim “Let $X_i \succ 0$ be such that $\lim_{i\to\infty}\big[\mathrm{Tr}(\bar SX_i) + \mu K(X_i)\big] < +\infty$. Then a properly selected subsequence of $\{X_i\}_{i=1}^\infty$ has a limit which is $\succ 0$”:
First step: Let us prove that $X_i$ form a bounded sequence.
Lemma: Let $\bar S \succ 0$. Then there exists $c = c(\bar S) > 0$ such that $\mathrm{Tr}(X\bar S) \ge c\|X\|$ for all $X \succeq 0$.
Indeed, there exists $\rho > 0$ such that $\bar S - \rho U \succeq 0$ for all $U \in \mathbf{S}^\nu$, $\|U\| \le 1$. Therefore for every $X \succeq 0$ we have
$$\forall(U, \|U\| \le 1) : \mathrm{Tr}([\bar S - \rho U]X) \ge 0 \ \Rightarrow\ \mathrm{Tr}(\bar SX) \ge \rho\max_{U : \|U\| \le 1}\mathrm{Tr}(UX) = \rho\|X\|.$$
Now let $X_i$ satisfy the premise of our claim. Then
$$\mathrm{Tr}(\bar SX_i) + \mu K(X_i) \ge c(\bar S)\|X_i\| - \mu\ln(\|X_i\|^m).$$
Since the left hand side sequence is bounded from above and $cr - \mu\ln(r^m) \to \infty$ as $r \to +\infty$, the sequence $\|X_i\|$ is indeed bounded.

8.16
 
“Let $X_i \succ 0$ be such that $\lim_{i\to\infty}\big[\mathrm{Tr}(\bar SX_i) + \mu K(X_i)\big] < +\infty$. Then a properly selected subsequence of $\{X_i\}_{i=1}^\infty$ has a limit which is $\succ 0$”
Second step: Let us complete the proof of the claim. We have seen that the sequence $\{X_i\}_{i=1}^\infty$ is bounded, and thus we can select from it a converging subsequence $X_{i_j}$. Let $\bar X = \lim_{j\to\infty}X_{i_j}$. If $\bar X$ were a boundary point of $\mathbf{S}^\nu_+$, we would have
$$\mathrm{Tr}(\bar SX_{i_j}) + \mu K(X_{i_j}) \to +\infty,\ j \to \infty$$
which is not the case. Thus, $\bar X$ is an interior point of $\mathbf{S}^\nu_+$, that is, $\bar X \succ 0$.
The existence of S∗ (µ) is proved similarly, with (D) in the role of (P).
The uniqueness of X∗ (µ) and S∗ (µ) follows from the fact that these points are minimizers
of strongly convex functions.

8.17
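B. Let us prove that $S_*(\mu) = \mu X_*^{-1}(\mu)$. Indeed, since $X_*(\mu) \succ 0$ is the minimizer of $P_\mu(X) = \mathrm{Tr}(CX) + \mu K(X)$ on $\mathcal{X} = [L_P - B] \cap \mathbf{S}^\nu_{++}$, the first order directional derivatives of $P_\mu(X)$ taken at $X_*(\mu)$ along directions from $L_P$ should be zero, that is, $\nabla P_\mu(X_*(\mu))$ should belong to $L_D = L_P^\perp$. Thus,
$$C - \mu X_*^{-1}(\mu) \in L_D \ \Rightarrow\ S := \mu X_*^{-1}(\mu) \in C + L_D\ \&\ S \succ 0$$
⇒ $S \in \mathcal{S}$. Besides this,
$$\nabla K(S) = -S^{-1} = -\mu^{-1}X_*(\mu) \ \Rightarrow\ \mu\nabla K(S) = -X_*(\mu)$$
⇒ $\mu\nabla K(S) \in -[L_P - B]$ ⇒ $\mu\nabla K(S) - B \in L_P = L_D^\perp$
⇒ $\nabla D_\mu(S)$ is orthogonal to $L_D$
⇒ $S$ is the minimizer of $D_\mu(\cdot)$ on $\mathcal{S} = [L_D + C] \cap \mathbf{S}^\nu_{++}$
⇒ $\mu X_*^{-1}(\mu) =: S = S_*(\mu)$. ∎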

8.18
Duality Gap on the Central Path
$$\left.\begin{array}{l} X_*(\mu) \in [L_P - B] \cap \mathbf{S}^\nu_{++}\\ S_*(\mu) \in [L_D + C] \cap \mathbf{S}^\nu_{++}\end{array}\right\}:\quad X_*(\mu)S_*(\mu) = \mu I$$

Observation: On the primal-dual central path, the duality gap is
$$\mathrm{Tr}(X_*(\mu)S_*(\mu)) = \mathrm{Tr}(\mu I) = \mu m.$$
Therefore the sum of non-optimalities of the strictly feasible solution $X_*(\mu)$ to $(\mathcal{P})$ and the strictly feasible solution $S_*(\mu)$ to $(\mathcal{D})$ in terms of the respective objectives is equal to $\mu m$ and goes to 0 as $\mu \to +0$.
⇒ Our ideal goal would be to move along the primal-dual central path, pushing the path
parameter µ to 0 and thus approaching primal-dual optimality, while maintaining primal-dual
feasibility.
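To make the observation concrete, here is a sketch on a toy LO program chosen for illustration (it does not appear in the lecture): minimize $x$ subject to $0 \le x \le 1$. In the notation above this is the diagonal case with $m = 2$, $c = 1$, $A_1 = \mathrm{Diag}(1, -1)$, $B = \mathrm{Diag}(0, -1)$, so that the primal slack is $X = \mathrm{Diag}(x, 1 - x)$. The central path has a closed form obtained by solving $\mu/x - \mu/(1 - x) = 1$ for the root in $(0, 1)$.

```python
import math

# Sketch: central path of the toy LO  min { x : 0 <= x <= 1 }  (an assumed example).
# Primal slack X = diag(x, 1 - x); dual slack S = mu * X^{-1} on the path.

def central_point(mu):
    """Root in (0, 1) of x^2 - (1 + 2 mu) x + mu = 0, i.e. of mu/x - mu/(1-x) = 1."""
    x = ((1.0 + 2.0 * mu) - math.sqrt(1.0 + 4.0 * mu * mu)) / 2.0
    slack = (x, 1.0 - x)
    s = (mu / slack[0], mu / slack[1])
    return x, slack, s

results = []
for mu in (1.0, 0.1, 0.001):
    x, slack, s = central_point(mu)
    gap = slack[0] * s[0] + slack[1] * s[1]   # Tr(XS) in the diagonal case
    dual_residual = s[0] - s[1] - 1.0         # Tr(A_1 S) - c, zero when S is dual feasible
    objective_gap = x - (-s[1])               # c^T x - Tr(BS)
    results.append((mu, x, gap, dual_residual, objective_gap))
```

For each `mu`, both `gap` and `objective_gap` come out equal to `2 * mu`, which is $\mu m$ with $m = 2$.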

8.19
♠ Our ideal goal is not achievable – how could we move along a curve? A realistic goal
could be to move in a neighborhood of the primal-dual central path, staying close to it.
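A good notion of “closeness to the path” is given by the proximity measure of a triple $\mu > 0$, $X \in \mathcal{X}$, $S \in \mathcal{S}$ to the point $(X_*(\mu), S_*(\mu))$ on the path:
$$\begin{aligned}
\mathrm{dist}(X, S, \mu) &= \sqrt{\mathrm{Tr}(X[X^{-1} - \mu^{-1}S]X[X^{-1} - \mu^{-1}S])}\\
&= \sqrt{\mathrm{Tr}\big(X^{1/2}\big[X^{1/2}[X^{-1} - \mu^{-1}S]X^{1/2}\big]X^{1/2}[X^{-1} - \mu^{-1}S]\big)}\\
&= \sqrt{\mathrm{Tr}\big(\big[X^{1/2}[X^{-1} - \mu^{-1}S]X^{1/2}\big]\big[X^{1/2}[X^{-1} - \mu^{-1}S]X^{1/2}\big]\big)}\\
&= \sqrt{\mathrm{Tr}\big(\big[X^{1/2}[X^{-1} - \mu^{-1}S]X^{1/2}\big]^2\big)}\\
&= \sqrt{\mathrm{Tr}\big([I - \mu^{-1}X^{1/2}SX^{1/2}]^2\big)}.
\end{aligned}$$
Note: We see that $\mathrm{dist}(X, S, \mu)$ is well defined and $\mathrm{dist}(X, S, \mu) = 0$ iff $X^{1/2}SX^{1/2} = \mu I$, or, which is the same,
$$SX = X^{-1/2}[X^{1/2}SX^{1/2}]X^{1/2} = \mu X^{-1/2}X^{1/2} = \mu I,$$
i.e., iff $X = X_*(\mu)$ and $S = S_*(\mu)$.
Note: We have
$$\begin{aligned}
\mathrm{dist}(X, S, \mu) &= \sqrt{\mathrm{Tr}(X[X^{-1} - \mu^{-1}S]X[X^{-1} - \mu^{-1}S])}\\
&= \sqrt{\mathrm{Tr}([I - \mu^{-1}XS][I - \mu^{-1}XS])}\\
&= \sqrt{\mathrm{Tr}\big(\big[[I - \mu^{-1}XS][I - \mu^{-1}XS]\big]^T\big)}\\
&= \sqrt{\mathrm{Tr}([I - \mu^{-1}SX][I - \mu^{-1}SX])}\\
&= \sqrt{\mathrm{Tr}(S[S^{-1} - \mu^{-1}X]S[S^{-1} - \mu^{-1}X])},
\end{aligned}$$
⇒ The proximity is defined in a symmetric w.r.t. $X$, $S$ fashion.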

8.20
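Fact: Whenever $X \in \mathcal{X}$, $S \in \mathcal{S}$ and $\mu > 0$, one has
$$\mathrm{Tr}(XS) \le \mu[m + \sqrt{m}\,\mathrm{dist}(X, S, \mu)]$$
Indeed, we have seen that
$$d := \mathrm{dist}(X, S, \mu) = \sqrt{\mathrm{Tr}([I - \mu^{-1}X^{1/2}SX^{1/2}]^2)}.$$
Denoting by $\lambda_i$ the eigenvalues of $X^{1/2}SX^{1/2}$, we have
$$\begin{aligned}
& d^2 = \mathrm{Tr}([I - \mu^{-1}X^{1/2}SX^{1/2}]^2) = \sum_i[1 - \mu^{-1}\lambda_i]^2\\
\Rightarrow\ & \sum_i|1 - \mu^{-1}\lambda_i| \le \sqrt{m}\sqrt{\sum_i[1 - \mu^{-1}\lambda_i]^2} = \sqrt{m}\,d\\
\Rightarrow\ & \sum_i\lambda_i \le \mu[m + \sqrt{m}\,d]\\
\Rightarrow\ & \mathrm{Tr}(XS) = \mathrm{Tr}(X^{1/2}SX^{1/2}) = \sum_i\lambda_i \le \mu[m + \sqrt{m}\,d]
\end{aligned}$$
Corollary. Let us say that a triple $(X, S, \mu)$ is close to the path, if $X \in \mathcal{X}$, $S \in \mathcal{S}$, $\mu > 0$ and $\mathrm{dist}(X, S, \mu) \le 0.1$. Whenever $(X, S, \mu)$ is close to the path, one has
$$\mathrm{Tr}(XS) \le 2\mu m,$$
that is, if $(X, S, \mu)$ is close to the path, then $X$ is at most $2\mu m$-nonoptimal strictly feasible solution to $(\mathcal{P})$, and $S$ is at most $2\mu m$-nonoptimal strictly feasible solution to $(\mathcal{D})$.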
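In the LO (diagonal) case the eigenvalues of $X^{1/2}SX^{1/2}$ are just the products $x_is_i$, so the proximity measure and the bound are easy to evaluate. The sketch below uses made-up positive vectors `x`, `s` and a made-up `mu`. It checks only the algebraic inequality and does not verify membership in $\mathcal{X}$ and $\mathcal{S}$.

```python
import math

# Sketch: proximity measure and duality-gap bound in the diagonal (LO) case.
# x, s are hypothetical positive diagonals of X and S; mu is a hypothetical path parameter.

def dist(x, s, mu):
    """sqrt(Tr([I - X^{1/2} S X^{1/2} / mu]^2)) for diagonal X, S."""
    return math.sqrt(sum((1.0 - xi * si / mu) ** 2 for xi, si in zip(x, s)))

def gap_bound(x, s, mu):
    """Right hand side mu * (m + sqrt(m) * dist) of the Fact."""
    m = len(x)
    return mu * (m + math.sqrt(m) * dist(x, s, mu))

x = [1.0, 2.0, 0.5]
s = [0.52, 0.24, 1.02]
mu = 0.5

d = dist(x, s, mu)
gap = sum(xi * si for xi, si in zip(x, s))   # Tr(XS)
bound = gap_bound(x, s, mu)
close_to_path = d <= 0.1
```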

8.21
How to Trace the Central Path?

♣ The goal: To follow the central path, staying close to it and pushing µ to 0 as fast as
possible.
♣ Question. Assume we are given a triple $(\bar X, \bar S, \bar\mu)$ close to the path. How to update it into a triple $(X_+, S_+, \mu_+)$, also close to the path, with $\mu_+ < \bar\mu$?
♠ Conceptual answer: Let us choose $\mu_+$, $0 < \mu_+ < \bar\mu$, and try to update $\bar X$, $\bar S$ into $X_+ = \bar X + \Delta X$, $S_+ = \bar S + \Delta S$ in order to make the triple $(X_+, S_+, \mu_+)$ close to the path. Our goal is to ensure that
$$\begin{array}{lr}
X_+ = \bar X + \Delta X \in L_P - B\ \ \&\ \ X_+ \succ 0 & (a)\\
S_+ = \bar S + \Delta S \in L_D + C\ \ \&\ \ S_+ \succ 0 & (b)\\
G_{\mu_+}(X_+, S_+) \approx 0 & (c)
\end{array}$$
where $G_\mu(X, S) = 0$ expresses equivalently the augmented slackness condition $XS = \mu I$. For example, we can take
$G_\mu(X, S) = S - \mu X^{-1}$, or
$G_\mu(X, S) = X - \mu S^{-1}$, or
$G_\mu(X, S) = XS + SX - 2\mu I$, or...

8.22
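$$\begin{array}{lr}
X_+ = \bar X + \Delta X \in L_P - B\ \ \&\ \ X_+ \succ 0 & (a)\\
S_+ = \bar S + \Delta S \in L_D + C\ \ \&\ \ S_+ \succ 0 & (b)\\
G_{\mu_+}(X_+, S_+) \approx 0 & (c)
\end{array}$$
♠ Since $\bar X \in L_P - B$ and $\bar X \succ 0$, (a) amounts to $\Delta X \in L_P$, which is a system of linear equations on $\Delta X$, and to $\bar X + \Delta X \succ 0$. Similarly, (b) amounts to the system $\Delta S \in L_D$ of linear equations on $\Delta S$, and to $\bar S + \Delta S \succ 0$. To handle the troublemaking nonlinear in $\Delta X$, $\Delta S$ condition (c), we linearize $G_{\mu_+}$ in $\Delta X$ and $\Delta S$:
$$G_{\mu_+}(X_+, S_+) \approx G_{\mu_+}(\bar X, \bar S) + \frac{\partial G_{\mu_+}(X, S)}{\partial X}\Big|_{(X,S)=(\bar X,\bar S)}\Delta X + \frac{\partial G_{\mu_+}(X, S)}{\partial S}\Big|_{(X,S)=(\bar X,\bar S)}\Delta S$$
and enforce the linearization, as evaluated at $\Delta X$, $\Delta S$, to be zero. We arrive at the Newton system
$$\left\{\begin{array}{l}
\Delta X \in L_P,\ \Delta S \in L_D\\
\dfrac{\partial G_{\mu_+}}{\partial X}\Delta X + \dfrac{\partial G_{\mu_+}}{\partial S}\Delta S = -G_{\mu_+}
\end{array}\right. \qquad (N)$$
(the value and the partial derivatives of $G_{\mu_+}(X, S)$ are taken at the point $(\bar X, \bar S)$).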

8.23
♠ We arrive at the conceptual primal-dual path-following method where one iterates the updatings
$$(X_i, S_i, \mu_i) \mapsto (X_{i+1} = X_i + \Delta X_i,\ S_{i+1} = S_i + \Delta S_i,\ \mu_{i+1})$$
where $\mu_{i+1} \in (0, \mu_i)$ and $\Delta X_i$, $\Delta S_i$ are the solution to the Newton system
$$\left\{\begin{array}{l}
\Delta X_i \in L_P,\ \Delta S_i \in L_D\\
\dfrac{\partial G^{(i)}_{\mu_{i+1}}}{\partial X}\Delta X_i + \dfrac{\partial G^{(i)}_{\mu_{i+1}}}{\partial S}\Delta S_i = -G^{(i)}_{\mu_{i+1}}
\end{array}\right. \qquad (N_i)$$
and $G^{(i)}_\mu(X, S) = 0$ represents equivalently the augmented complementary slackness condition $XS = \mu I$, and the value and the partial derivatives of $G^{(i)}_{\mu_{i+1}}$ are evaluated at $(X_i, S_i)$.
♠ Initialized by a close to the path triple $(X_0, S_0, \mu_0)$, this conceptual algorithm should
• be well-defined: $(N_i)$ should remain solvable, $X_i$ should remain strictly feasible for $(\mathcal{P})$, $S_i$ should remain strictly feasible for $(\mathcal{D})$, and
• maintain closeness to the path: for every $i$, $(X_i, S_i, \mu_i)$ should remain close to the path.
Under these limitations, we want to push $\mu_i$ to 0 as fast as possible.

8.24
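Example: Primal Path-Following Method

♣ Let us choose
$$G_\mu(X, S) = S + \mu\nabla K(X) = S - \mu X^{-1}$$
Then the Newton system becomes
$$\begin{array}{ll}
 & \Delta X_i \in L_P \Leftrightarrow \Delta X_i = \mathcal{A}\Delta x_i\\
 & \Delta S_i \in L_D \Leftrightarrow \mathcal{A}^*\Delta S_i = 0 \quad [\mathcal{A}^*U = [\mathrm{Tr}(A_1U); ...; \mathrm{Tr}(A_nU)]]\\
(!) & \Delta S_i + \mu_{i+1}\nabla^2K(X_i)\Delta X_i = -[S_i + \mu_{i+1}\nabla K(X_i)]
\end{array} \qquad (N_i)$$
♠ Substituting $\Delta X_i = \mathcal{A}\Delta x_i$ and applying $\mathcal{A}^*$ to both sides in (!), we get
$$\begin{aligned}
(*)\quad & \mu_{i+1}\underbrace{[\mathcal{A}^*\nabla^2K(X_i)\mathcal{A}]}_{\mathcal{H}}\Delta x_i = -\big[\underbrace{\mathcal{A}^*S_i}_{=c} + \mu_{i+1}\mathcal{A}^*\nabla K(X_i)\big]\\
& \Delta X_i = \mathcal{A}\Delta x_i\\
& S_{i+1} = -\mu_{i+1}\big[\nabla K(X_i) + \nabla^2K(X_i)\mathcal{A}\Delta x_i\big]
\end{aligned}$$
The mappings $h \mapsto \mathcal{A}h$, $H \mapsto \nabla^2K(X_i)H$ have trivial kernels ⇒ $\mathcal{H}$ is nonsingular ⇒ $(N_i)$ has a unique solution given by
$$\begin{aligned}
\Delta x_i &= -\mathcal{H}^{-1}\big[\mu_{i+1}^{-1}c + \mathcal{A}^*\nabla K(X_i)\big]\\
\Delta X_i &= \mathcal{A}\Delta x_i\\
S_{i+1} &= S_i + \Delta S_i = -\mu_{i+1}\big[\nabla K(X_i) + \nabla^2K(X_i)\mathcal{A}\Delta x_i\big]
\end{aligned}$$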

8.25
$$\Rightarrow\ \left\{\begin{aligned}
\Delta x_i &= -\mathcal{H}^{-1}\big[\mu_{i+1}^{-1}c + \mathcal{A}^*\nabla K(X_i)\big]\\
\Delta X_i &= \mathcal{A}\Delta x_i\\
S_{i+1} &= S_i + \Delta S_i = -\mu_{i+1}\big[\nabla K(X_i) + \nabla^2K(X_i)\mathcal{A}\Delta x_i\big]
\end{aligned}\right.$$
♠ $X_i = \mathcal{A}x_i - B$ for a (uniquely defined by $X_i$) strictly feasible solution $x_i$ to $(P)$. Setting
$$F(x) = K(\mathcal{A}x - B),$$
we have $\mathcal{A}^*\nabla K(X_i) = \nabla F(x_i)$, $\mathcal{H} = \nabla^2F(x_i)$
⇒ The above recurrence can be written solely in terms of $x_i$ and $F$:
$$(\#)\quad \left\{\begin{aligned}
\mu_i &\mapsto \mu_{i+1} < \mu_i\\
x_{i+1} &= x_i - [\nabla^2F(x_i)]^{-1}\big[\mu_{i+1}^{-1}c + \nabla F(x_i)\big]\\
X_{i+1} &= \mathcal{A}x_{i+1} - B\\
S_{i+1} &= -\mu_{i+1}\big[\nabla K(X_i) + \nabla^2K(X_i)\mathcal{A}[x_{i+1} - x_i]\big]
\end{aligned}\right.$$
Recurrence (#) is called the primal path-following method.
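The $x$-part of the recurrence can be tried on a one-dimensional toy LO, minimize $x$ subject to $0 \le x \le 1$. This is an illustrative assumption, not an example from the lecture. Here $m = 2$, $c = 1$ and $F(x) = -\ln x - \ln(1 - x)$. The sketch starts exactly on the central path and shrinks $\mu$ by the factor $1 - 0.1/\sqrt{m}$ at every step; that update rule is assumed here as one admissible choice of $\mu_{i+1} < \mu_i$.

```python
import math

# Sketch: primal path-following on the toy LO  min { x : 0 <= x <= 1 }.
# Barrier F(x) = -ln(x) - ln(1 - x); c = 1, m = 2 (all assumptions of this example).

def grad_F(x):
    return -1.0 / x + 1.0 / (1.0 - x)

def hess_F(x):
    return 1.0 / x ** 2 + 1.0 / (1.0 - x) ** 2

def primal_path_following(x0, mu0, mu_target, c=1.0, m=2):
    """Iterate x <- x - [F''(x)]^{-1} [c / mu_next + F'(x)] while shrinking mu."""
    shrink = 1.0 - 0.1 / math.sqrt(m)   # assumed path-parameter update
    x, mu = x0, mu0
    history = [(mu, x)]
    while mu > mu_target:
        mu *= shrink
        x = x - (c / mu + grad_F(x)) / hess_F(x)
        if not 0.0 < x < 1.0:
            raise ValueError("iterate left the interior of the feasible set")
        history.append((mu, x))
    return history

mu0 = 1.0
# Exact central-path point for mu0: root in (0, 1) of mu/x - mu/(1 - x) = 1.
x0 = ((1.0 + 2.0 * mu0) - math.sqrt(1.0 + 4.0 * mu0 ** 2)) / 2.0

history = primal_path_following(x0, mu0, mu_target=1e-6)
mu_final, x_final = history[-1]
```

Since the optimal value of the toy problem is 0, `x_final` is itself the primal non-optimality of the last iterate, and it can be compared with `2 * m * mu_final`.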

8.26
♠ The primal path-following method can be explained as follows:
• The barrier $K(X) = -\ln\mathrm{Det}X$ induces the barrier $F(x) = K(\mathcal{A}x - B)$ for the interior $P^o$ of the feasible domain of $(P)$.
• The primal central path
$$X_*(\mu) = \mathrm{argmin}_{X = \mathcal{A}x - B \succ 0}\,[\mathrm{Tr}(CX) + \mu K(X)]$$
induces the path
$$x_*(\mu) \in P^o :\ X_*(\mu) = \mathcal{A}x_*(\mu) - B.$$
Observing that
$$\mathrm{Tr}(C[\mathcal{A}x - B]) + \mu K(\mathcal{A}x - B) = c^Tx + \mu F(x) + \mathrm{const},$$
we have
$$x_*(\mu) = \mathrm{argmin}_{x \in P^o}F_\mu(x),\quad F_\mu(x) = c^Tx + \mu F(x).$$
• The method works as follows: given $x_i \in P^o$, $\mu_i > 0$, we
– replace $\mu_i$ with $\mu_{i+1} < \mu_i$
– convert $x_i$ into $x_{i+1}$ by applying to the function $F_{\mu_{i+1}}(\cdot)$ a single step of the Newton minimization method
$$x_i \mapsto x_{i+1} = x_i - [\nabla^2F_{\mu_{i+1}}(x_i)]^{-1}\nabla F_{\mu_{i+1}}(x_i)$$

8.27
Theorem. Let $(X_0 = \mathcal{A}x_0 - B, S_0, \mu_0)$ be close to the primal-dual central path, and let $(P)$ be solved by the Primal path-following method where the path parameter $\mu$ is updated according to
$$\mu_{i+1} = \Big(1 - \frac{0.1}{\sqrt{m}}\Big)\mu_i. \qquad (*)$$
Then the method is well defined and all triples $(X_i = \mathcal{A}x_i - B, S_i, \mu_i)$ are close to the path.
♠ With the rule $(*)$ it takes $O(\sqrt{m})$ steps to reduce the path parameter $\mu$ by an absolute constant factor. Since the method stays close to the path, the duality gap $\mathrm{Tr}(X_iS_i)$ of $i$-th iterate does not exceed $2m\mu_i$.
⇒ The number of steps to make the duality gap $\le \epsilon$ does not exceed $O(1)\sqrt{m}\ln\big(1 + \frac{2m\mu_0}{\epsilon}\big)$.
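The step count implied by the rule $(*)$ can be computed directly. The sketch below counts how many updates are needed before the guaranteed gap bound $2m\mu_i$ drops below $\epsilon$. The function name and the sample values of `m`, `mu0`, `eps` are illustrative assumptions. The count is compared with $10\sqrt{m}\ln(1 + 2m\mu_0/\epsilon)$, which dominates it because $-\ln(1 - y) \ge y$.

```python
import math

# Sketch: number of path-parameter updates until 2 * m * mu_i <= eps under rule (*).
# steps_to_gap and the sample inputs are hypothetical, for illustration only.

def steps_to_gap(m, mu0, eps):
    shrink = 1.0 - 0.1 / math.sqrt(m)
    mu, steps = mu0, 0
    while 2.0 * m * mu > eps:
        mu *= shrink
        steps += 1
    return steps

m, mu0, eps = 2, 1.0, 1e-6
k = steps_to_gap(m, mu0, eps)
log_estimate = math.ceil(10.0 * math.sqrt(m) * math.log(1.0 + 2.0 * m * mu0 / eps))
```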

8.28
[Figure: 2D feasible set of a toy SDO ($\mathbf{K} = \mathbf{S}^3_+$). The “continuous curve” is the primal central path; dots are iterates $x_i$ of the Primal Path-Following method.]

Itr#   Objective    Gap      |  Itr#   Objective    Gap
1      -0.100000    2.96     |  7      -1.359870    8.4e-4
2      -0.906963    0.51     |  8      -1.360259    2.1e-4
3      -1.212689    0.19     |  9      -1.360374    5.3e-5
4      -1.301082    6.9e-2   |  10     -1.360397    1.4e-5
5      -1.349584    2.1e-2   |  11     -1.360404    3.8e-6
6      -1.356463    4.7e-3   |  12     -1.360406    9.5e-7
Duality gap along the iterations

8.29
♣ The Primal path-following method is yielded by the Conceptual Path-Following Scheme when the Augmented Complementary Slackness condition is represented as
$$G_\mu(X, S) := S + \mu\nabla K(X) = 0.$$
Passing to the representation
$$G_\mu(X, S) := X + \mu\nabla K(S) = 0,$$
we arrive at the Dual path-following method with the same theoretical properties as those of the primal method. The Primal and the Dual path-following methods imply the best complexity bounds known so far for LO and SDO.
♠ In spite of being “theoretically perfect”, Primal and Dual path-following methods in
practice are inferior as compared with the methods based on less straightforward and more
symmetric forms of the Augmented Complementary Slackness condition.

8.30
♠ The Augmented Complementary Slackness condition is
$$XS = SX = \mu I \qquad (*)$$
Fact: For $X, S \in \mathbf{S}^\nu_{++}$, $(*)$ is equivalent to
$$XS + SX = 2\mu I$$
Indeed, if $XS = SX = \mu I$, then clearly $XS + SX = 2\mu I$. On the other hand,
$$\begin{aligned}
& X, S \succ 0,\ XS + SX = 2\mu I\\
\Rightarrow\ & S + X^{-1}SX = 2\mu X^{-1}\\
\Rightarrow\ & X^{-1}SX = 2\mu X^{-1} - S\\
\Rightarrow\ & X^{-1}SX = [X^{-1}SX]^T = XSX^{-1}\\
\Rightarrow\ & X^2S = SX^2
\end{aligned}$$
We see that $X^2S = SX^2$. Since $X \succ 0$, $X$ is a polynomial of $X^2$, whence $X$ and $S$ commute, whence $XS = SX = \mu I$. ∎

Fact: Let $Q \in \mathbf{S}^\nu$ be nonsingular, and let $X, S \succ 0$. Then $XS = \mu I$ if and only if
$$QXSQ^{-1} + Q^{-1}SXQ = 2\mu I$$
Indeed, it suffices to apply the previous fact to the matrices $\widehat X = QXQ \succ 0$, $\widetilde S = Q^{-1}SQ^{-1} \succ 0$. ∎

8.31
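♠ In practical path-following methods, at step $i$ the Augmented Complementary Slackness condition is written down as
$$G_{\mu_{i+1}}(X, S) := Q_iXSQ_i^{-1} + Q_i^{-1}SXQ_i - 2\mu_{i+1}I = 0$$
with properly chosen varying from step to step nonsingular matrices $Q_i \in \mathbf{S}^\nu$.
Explanation: Let $Q \in \mathbf{S}^\nu$ be nonsingular. The $Q$-scaling $X \mapsto QXQ$ is a one-to-one linear mapping of $\mathbf{S}^\nu$ onto itself, the inverse being the mapping $X \mapsto Q^{-1}XQ^{-1}$. $Q$-scaling is a symmetry of the positive semidefinite cone – it maps the cone onto itself.
⇒ Given a primal-dual pair of semidefinite programs
$$\begin{array}{rll}
\mathrm{Opt}(\mathcal{P}) &= \min_X\{\mathrm{Tr}(CX) : X \in [L_P - B] \cap \mathbf{S}^\nu_+\} & (\mathcal{P})\\
\mathrm{Opt}(\mathcal{D}) &= \max_S\{\mathrm{Tr}(BS) : S \in [L_D + C] \cap \mathbf{S}^\nu_+\} & (\mathcal{D})
\end{array}$$
and a nonsingular matrix $Q \in \mathbf{S}^\nu$, one can pass in $(\mathcal{P})$ from variable $X$ to variable $\widehat X = QXQ$, while passing in $(\mathcal{D})$ from variable $S$ to variable $\widetilde S = Q^{-1}SQ^{-1}$. The resulting problems are
$$\begin{array}{rll}
\mathrm{Opt}(\widehat{\mathcal{P}}) &= \min_{\widehat X}\{\mathrm{Tr}(\widetilde C\widehat X) : \widehat X \in [\widehat L_P - \widehat B] \cap \mathbf{S}^\nu_+\} & (\widehat{\mathcal{P}})\\
\mathrm{Opt}(\widetilde{\mathcal{D}}) &= \max_{\widetilde S}\{\mathrm{Tr}(\widehat B\widetilde S) : \widetilde S \in [\widetilde L_D + \widetilde C] \cap \mathbf{S}^\nu_+\} & (\widetilde{\mathcal{D}})
\end{array}$$
$$\big[\widehat B = QBQ,\ \widehat L_P = \{QXQ : X \in L_P\},\ \widetilde C = Q^{-1}CQ^{-1},\ \widetilde L_D = \{Q^{-1}SQ^{-1} : S \in L_D\}\big]$$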

8.32

$(\widehat{\mathcal{P}})$ and $(\widetilde{\mathcal{D}})$ are dual to each other, the primal-dual central path of this pair is the image of the primal-dual path of $(\mathcal{P})$, $(\mathcal{D})$ under the primal-dual $Q$-scaling
$$(X, S) \mapsto (\widehat X = QXQ,\ \widetilde S = Q^{-1}SQ^{-1}),$$
$Q$-scaling preserves closeness to the path, etc.
Writing down the Augmented Complementary Slackness condition as
$$QXSQ^{-1} + Q^{-1}SXQ = 2\mu I \qquad (!)$$
we in fact
• pass from $(\mathcal{P})$, $(\mathcal{D})$ to the equivalent primal-dual pair of problems $(\widehat{\mathcal{P}})$, $(\widetilde{\mathcal{D}})$
• write down the Augmented Complementary Slackness condition for the latter pair in the simplest primal-dual symmetric form
$$\widehat X\widetilde S + \widetilde S\widehat X = 2\mu I,$$
• “scale back” to the original primal-dual variables $X$, $S$, thus arriving at (!).
Note: In the LO case $\mathbf{S}^\nu$ is comprised of diagonal matrices, so that (!) is exactly the same as the “unscaled” condition $XS = \mu I$.

8.33
$$G_{\mu_{i+1}}(X, S) := Q_iXSQ_i^{-1} + Q_i^{-1}SXQ_i - 2\mu_{i+1}I = 0 \qquad (!)$$
With (!), the Newton system becomes
$$\begin{aligned}
& \Delta X \in L_P,\ \Delta S \in L_D\\
& Q_i\Delta XS_iQ_i^{-1} + Q_i^{-1}S_i\Delta XQ_i + Q_iX_i\Delta SQ_i^{-1} + Q_i^{-1}\Delta SX_iQ_i = 2\mu_{i+1}I - Q_iX_iS_iQ_i^{-1} - Q_i^{-1}S_iX_iQ_i
\end{aligned}$$
♣ Theoretical analysis of path-following methods simplifies a lot when the scaling (!) is commutative, meaning that the matrices $\widehat X_i = Q_iX_iQ_i$ and $\widetilde S_i = Q_i^{-1}S_iQ_i^{-1}$ commute.
Popular choices of commuting scalings are:
• $Q_i = S_i^{1/2}$ (“XS-method,” $\widetilde S = I$)
• $Q_i = X_i^{-1/2}$ (“SX-method,” $\widehat X = I$)
• $Q_i = \big(X^{-1/2}(X^{1/2}SX^{1/2})^{-1/2}X^{1/2}S\big)^{1/2}$ (famous Nesterov-Todd method, $\widehat X = \widetilde S$).

8.34
Theorem: Let a strictly feasible primal-dual pair $(P)$, $(D)$ of semidefinite programs be solved by a primal-dual path-following method based on commutative scalings. Assume that the method is initialized by a close to the path triple $(X_0, S_0, \mu_0 = \mathrm{Tr}(X_0S_0)/m)$ and let the policy for updating $\mu$ be
$$\mu_{i+1} = \Big(1 - \frac{0.1}{\sqrt{m}}\Big)\mu_i.$$
Then the trajectory is well defined and stays close to the path.
As a result, every $O(\sqrt{m})$ steps of the method reduce the duality gap by an absolute constant factor, and it takes $O(1)\sqrt{m}\ln\big(1 + \frac{m\mu_0}{\epsilon}\big)$ steps to make the duality gap $\le \epsilon$.

8.35
♠ To improve the practical performance of primal-dual path-following methods, in actual computations
• the path parameter is updated in a more aggressive fashion than $\mu \mapsto \big(1 - \frac{0.1}{\sqrt{m}}\big)\mu$;
• the method is allowed to travel in a wider neighborhood of the primal-dual central path
than the neighborhood given by our “close to the path” restriction dist(X, S, µ) ≤ 0.1;
• instead of updating Xi+1 = Xi +∆Xi , Si+1 = Si +∆Si , one uses the more flexible updating
Xi+1 = Xi + αi ∆Xi , Si+1 = Si + αi ∆Si
with αi given by appropriate line search.
♣ The constructions and the complexity results we have presented are incomplete — they
do not take into account the necessity to come close to the central path before starting
path-tracing and do not take care of the case when the pair (P), (D) is not strictly feasible.
All these “gaps” can be easily closed via the same path-following technique as applied to
appropriate augmented versions of the problem of interest.

8.36
SUMMARY
I. WHAT IS LO

♣ An LO program is an optimization problem of the form

Optimize a linear objective cT x over x ∈ Rn satisfying a system (S) of finitely many


linear equality and (nonstrict) inequality constraints.

♠ There are several “universal” forms of an LO program, universality meaning that every
LO program can be straightforwardly converted to an equivalent problem in the desired
form.
Examples of universal forms are:
• Canonical form:
$$\mathrm{Opt} = \max_x\{c^Tx : Ax - b \ge 0\} \quad [A : m \times n]$$
• Standard form:
$$\mathrm{Opt} = \max_x\{c^Tx : Ax = b,\ x \ge 0\} \quad [A : m \times n]$$
EXTENSION: CONIC PROBLEMS

♣ An LO program is
$$\mathrm{Opt} = \max_x\{c^Tx : Ax - b \in \mathbf{R}^m_+\},\qquad \mathbf{R}^m_+ = \{x \in \mathbf{R}^m : x_i \ge 0,\ 1 \le i \le m\}$$
Note: The nonnegative orthant $\mathbf{R}^m_+$ is a regular cone – closed convex pointed cone with a nonempty interior.
♠ Replacing in the definition of an LO the nonnegative orthant with another regular cone $\mathbf{K}$, we arrive at a conic problem on a cone $\mathbf{K}$:
$$\mathrm{Opt} = \max_x\{c^Tx : Ax - b \in \mathbf{K}\}$$

In this problem,
• c, A, b form problem’s data
• K “summarizes” problem’s structure.
♠ Essentially, the entire Convex Programming is covered by just 3 generic conic problems:
• LO – cones K are nonnegative orthants – direct products
of nonnegative rays,
• CQO – cones K are direct products of Lorentz cones,
• SDO – cones K are direct products of cones of symmetric
positive semidefinite matrices
• Note:
LO ⊂ CQO ⊂ SDO
and CQO can be approximated in a polynomial time fashion by LO.
II. WHAT CAN BE REDUCED TO LO

♣ Every optimization problem $\max_{y \in Y} f(y)$ can be straightforwardly converted to an equivalent problem with linear objective, specifically, to the problem
$$\max_{x \in X} c^T x \qquad (P)$$
$$\left[ X = \{ x = [y; t] : y \in Y,\ t \le f(y) \}, \quad c^T x := t \right]$$
♠ The possibility to convert (P ) into an LO program depends on the geometry of the
feasible set X. When X is polyhedrally representable:
X = {x ∈ Rn : ∃w ∈ Rk : P x + Qw − r ≥ 0}
for properly chosen P, Q, r, (P ) reduces to the LO program
$$\max_{x,w} \left\{ c^T x : Px + Qw - r \ge 0 \right\}.$$

♣ Similarly, given a family $\mathcal{K}$ of regular cones closed w.r.t. taking direct products and passing from a cone to its dual, we can define the notion of $\mathcal{K}$-representation of a set X:
$$X = \{ x \in \mathbf{R}^n : \exists w \in \mathbf{R}^k : Px + Qw - r \in K \}$$
with $K \in \mathcal{K}$. Given such a representation, (P ) reduces to the conic problem
$$\max_{x,w} \left\{ c^T x : Px + Qw - r \in K \right\}$$
on a cone from the family $\mathcal{K}$.


♣ By Fourier-Motzkin elimination scheme, every polyhedrally representable set X is in fact
polyhedral – it can be represented as the solution set of a system of linear inequalities
without slack variables w:
X = {x : ∃w : P x + Qw − r ≥ 0}
⇔ ∃A, b : X = {x : Ax − b ≥ 0}
This does not make the notion of a polyhedral representation “void” — a polyhedral set
with a “short” polyhedral representation involving slack variables can require astronomically
many inequalities in a slack-variable-free representation.
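A minimal sketch of a single Fourier-Motzkin step may help to see both why the projection is again polyhedral and where the blow-up in the number of inequalities comes from. The sketch assumes inequalities written as a·z ≥ h (matching the form Px + Qw − r ≥ 0) and eliminates the last variable; the function name and the example are hypothetical.

```python
from fractions import Fraction


def fm_eliminate_last(rows):
    """Eliminate the last variable from a system of inequalities a.z >= h.

    Each row is (a, h) with a a list of coefficients. The returned rows, over
    the remaining variables, describe the projection of the solution set.
    """
    pos, neg, zero = [], [], []
    for a, h in rows:
        a = [Fraction(v) for v in a]
        h = Fraction(h)
        k = a[-1]
        if k > 0:      # normalized row: a'.x + w >= h'  (lower bound on w)
            pos.append(([v / k for v in a[:-1]], h / k))
        elif k < 0:    # normalized row: a''.x - w >= h'' (upper bound on w)
            neg.append(([v / -k for v in a[:-1]], h / -k))
        else:          # row does not involve w at all
            zero.append((a[:-1], h))
    out = list(zero)
    # Every (lower bound, upper bound) pair yields one inequality without w:
    # adding the two normalized rows cancels w.
    for ap, hp in pos:
        for an, hn in neg:
            out.append(([p + q for p, q in zip(ap, an)], hp + hn))
    return out


# Hypothetical example: X = {x : exists w with |x| <= w <= 1}, variables [x, w].
system = [
    ([1, 1], 0),     #  x + w >= 0
    ([-1, 1], 0),    # -x + w >= 0
    ([0, -1], -1),   # -w >= -1
]
projected = fm_eliminate_last(system)
```

Here the result is the pair of inequalities x ≥ −1 and −x ≥ −1, that is, X = [−1, 1]. Since one step replaces p lower bounds and q upper bounds with p·q inequalities, repeated elimination of many slack variables can produce a very long slack-free description.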
♠ There is no “Fourier-Motzkin elimination” for general K-representability: it may happen
that a set which admits conic quadratic (or semidefinite) representation utilizing slack
variables does not admit such a representation without slack variables.
♠ Every polyhedral (≡ polyhedrally representable) set is convex (but not vice versa), and all
basic convexity-preserving operations with sets (taking finite intersections, affine images,
inverse affine images, arithmetic sums, direct products) as applied to polyhedral operands
yield polyhedral results.
• Moreover, for all basic convexity-preserving operations, a polyhedral representation of the
result is readily given by polyhedral representations of the operands.
♠ All this holds true for K-representations. “The calculus” of representations via a particular
family of regular cones K (closed w.r.t. taking direct products and duality) is completely
independent of K; what does depend on K, are the “raw materials.”
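• Example: The problem
$$
\begin{array}{ll}
\text{minimize} & \sum\limits_{\ell=1}^{n} x_\ell \\
(a) & x \ge 0; \\
(b) & a_\ell^T x \le b_\ell,\ \ell = 1, ..., n; \\
(c) & \| Px - p \|_2 \le c^T x + d; \\
(d) & x_\ell^{\frac{\ell+1}{\ell}} \le e_\ell^T x + f_\ell,\ \ell = 1, ..., n; \\
(e) & x_\ell^{\frac{\ell}{\ell+3}} x_{\ell+1}^{\frac{1}{\ell+3}} \ge g_\ell^T x + h_\ell,\ \ell = 1, ..., n-1; \\
(f) & \mathrm{Det} \begin{bmatrix}
x_1 & x_2 & x_3 & \cdots & x_n \\
x_2 & x_1 & x_2 & \cdots & x_{n-1} \\
x_3 & x_2 & x_1 & \cdots & x_{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_n & x_{n-1} & x_{n-2} & \cdots & x_1
\end{bmatrix} \ge 1; \\
(g) & 1 \le \sum\limits_{\ell=1}^{n} x_\ell \cos(\ell\omega) \le 1 + \sin^2(5\omega) \quad \forall \omega \in \left[ -\frac{\pi}{7}, 1.3 \right]
\end{array}
$$
can be converted, in a systematic way, into an equivalent SDO problem
$$\min_x \left\{ c^T x : Ax - b \succeq 0 \right\}.$$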

Dropping constraints (f), (g) (shown in red on the original slide), the remaining problem can be converted into an equivalent CQO problem. Further dropping constraints (c), (d), (e) (shown in magenta on the original slide), we arrive at an LO problem.
♣ A “counterpart” of the notion of polyhedral (≡ polyhedrally representable) set is the
notion of a polyhedrally representable function. A function f : Rn → R ∪ {+∞} is called
polyhedrally representable, if its epigraph is a polyhedrally representable set:
$$\mathrm{Epi}\{f\} := \{ [x; \tau] : \tau \ge f(x) \} = \{ [x; \tau] : \exists w : Px + \tau p + Qw - r \ge 0 \}$$
♠ A polyhedrally representable function always is convex (but not vice versa);
♠ A function f is polyhedrally representable if and only if its domain is a polyhedral set,
and in the domain f is the maximum of finitely many affine functions:
$$f(x) = \begin{cases} \max\limits_{1 \le \ell \le L} [a_\ell^T x + b_\ell], & x \in \mathrm{Dom} f := \{ x : Cx \le d \} \\ +\infty, & \text{otherwise} \end{cases}$$
♠ A level set {x : f (x) ≤ a} of a polyhedrally representable function is polyhedral.
♠ All basic convexity-preserving operations with functions (taking finite maxima, linear
combinations with nonnegative coefficients, affine substitution of argument) as applied to
polyhedrally representable operands yield polyhedrally representable results.
• For all basic convexity-preserving operations with functions, a polyhedral representation
of the result is readily given by polyhedral representations of the operands.
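As a worked illustration of this calculus (a standard example, not taken from the slides): each $|x_i| = \max[x_i, -x_i]$ is a maximum of two affine functions, and a sum with nonnegative coefficients preserves polyhedral representability, so the $\ell_1$-norm admits the polyhedral representation below with slack variables $w \in \mathbf{R}^n$. The matrix $A$ and vector $b$ in the last line are hypothetical problem data.

```latex
\mathrm{Epi}\{\|\cdot\|_1\}
  = \{[x;\tau] : \tau \ge \textstyle\sum_{i=1}^n |x_i|\}
  = \{[x;\tau] : \exists w \in \mathbf{R}^n :
      \ -w_i \le x_i \le w_i,\ 1 \le i \le n,\ \ \textstyle\sum_{i=1}^n w_i \le \tau\}

% Consequently, the problem of minimizing \|Ax - b\|_1 reduces to the LO program
\min_{x,w} \Big\{ \textstyle\sum_{i} w_i : \ -w \le Ax - b \le w \Big\}
```

Note that this representation uses 2n + 1 linear inequalities, while a slack-free description of the same epigraph, $\tau \ge \sum_i \epsilon_i x_i$ for all sign vectors $\epsilon \in \{-1, 1\}^n$, needs $2^n$ of them.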
♣ Given a family $\mathcal{K}$ of regular cones closed w.r.t. taking direct products and passing from a cone to its dual, we can define the notion of $\mathcal{K}$-representation of a function $f : \mathbf{R}^n \to \mathbf{R} \cup \{+\infty\}$ as a $\mathcal{K}$-representation of its epigraph:
$$\mathrm{Epi}\{f\} := \{ [x; \tau] : \tau \ge f(x) \} = \{ [x; \tau] : \exists w : Px + \tau p + Qw - r \in K \}$$
with $K \in \mathcal{K}$. Everything said above about polyhedral representability of functions, except for piecewise linearity (which is specific to polyhedral representability), holds true for $\mathcal{K}$-representability.
• For “rich” K, like SDO, the family of K-representable sets/functions is incomparably wider
than the family of polyhedral sets/functions.
III. STRUCTURE OF A POLYHEDRAL SET
III.A: Extreme Points

♣ Extreme points: A point $v \in X = \{x \in \mathbf{R}^n : Ax \le b\}$ is called an extreme point of X, if v is not a midpoint of a nontrivial segment belonging to X:
$$v \in \mathrm{Ext}(X) \Leftrightarrow v \in X\ \&\ [\, v \pm h \in X \Rightarrow h = 0 \,].$$
♠ Facts: Let X = {x ∈ Rn : aTi x ≤ bi , 1 ≤ i ≤ m} be a nonempty polyhedral set.
• X has extreme points if and only if it does not contain lines;
• The set Ext(X) of the extreme points of X is finite;
• X is bounded iff X = Conv(Ext(X));
• A point v ∈ X is an extreme point of X if and only if among the inequalities aTi x ≤ bi
which are active at v: aTi v = bi , there are n = dim x inequalities with linearly independent
vectors of coefficients:
v ∈ Ext(X)
⇔ v ∈ X & Rank {ai : aTi v = bi } = n.
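The algebraic characterization above translates directly into a test. The following is a minimal sketch using exact rational arithmetic; the function names and the unit-square example are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors via exact Gaussian elimination."""
    M = [[Fraction(v) for v in r] for r in rows]
    ncols = len(M[0]) if M else 0
    rk = 0
    for col in range(ncols):
        piv = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][col] / M[rk][col]
            M[i] = [x - f * y for x, y in zip(M[i], M[rk])]
        rk += 1
    return rk


def is_extreme_point(A, b, v):
    """Check v in Ext(X) for X = {x : A x <= b} via the rank criterion."""
    lhs = [sum(a * x for a, x in zip(row, v)) for row in A]
    if any(l > bi for l, bi in zip(lhs, b)):
        return False  # v does not belong to X
    active = [row for row, l, bi in zip(A, lhs, b) if l == bi]
    return rank(active) == len(v)


# Hypothetical example: the unit square 0 <= x1 <= 1, 0 <= x2 <= 1.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]

corner = is_extreme_point(A, b, [1, 1])                   # True
edge_mid = is_extreme_point(A, b, [1, Fraction(1, 2)])    # False
outside = is_extreme_point(A, b, [2, 0])                  # False
```

At the corner two constraints with linearly independent coefficient vectors are active, at the midpoint of an edge only one is, and the third point is infeasible.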
III.B. Recessive Directions

♣ Let $X = \{x \in \mathbf{R}^n : Ax \le b\}$ be a nonempty polyhedral set. A vector $d \in \mathbf{R}^n$ is called a recessive direction of X, if there exists $\bar{x} \in X$ such that the ray $\bar{x} + \mathbf{R}_+ \cdot d := \{\bar{x} + td : t \ge 0\}$ is contained in X.
♠ Facts: • The set of all recessive directions of X forms a cone, called the recessive cone Rec(X) of X.
• The recessive cone of X is the polyhedral set given by Rec(X) = {d : Ad ≤ 0}
• Adding to an x ∈ X a recessive direction, we get a point in X:
X + Rec(X) = X.
• The recessive cone of X is trivial (i.e., Rec(X) = {0}) if and only if X is bounded.
• X contains a line if and only if Rec(X) is not pointed, that is, Rec(X) contains a line, or, equivalently, there exists d ≠ 0 such that ±d ∈ Rec(X), or, equivalently, the null space of A is nontrivial: KerA = {d : Ad = 0} ≠ {0}.
• A line directed by a vector d belongs to X if and only if it crosses X and d ∈ KerA =
Rec(X) ∩ [−Rec(X)].
• One has
X = X + KerA.
• X can be represented as
$$X = \widehat{X} + \mathrm{Ker} A$$
where $\widehat{X}$ is a nonempty polyhedral set which does not contain lines. One can take $\widehat{X} = X \cap [\mathrm{Ker} A]^\perp$.
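The descriptions Rec(X) = {d : Ad ≤ 0} and KerA = Rec(X) ∩ [−Rec(X)] can be checked mechanically. A minimal sketch, with hypothetical function names and data:

```python
def is_recessive(A, d):
    """d is a recessive direction of a nonempty X = {x : A x <= b} iff A d <= 0."""
    return all(sum(a * dj for a, dj in zip(row, d)) <= 0 for row in A)


def directs_a_line(A, d):
    """A nonzero d directs a line contained in X iff A d = 0, i.e. d is in Ker A."""
    nonzero = any(dj != 0 for dj in d)
    return nonzero and all(sum(a * dj for a, dj in zip(row, d)) == 0 for row in A)


# Hypothetical wedge X = {x : -x1 <= 0, x1 - x2 <= 1}: unbounded, but contains no lines.
A_wedge = [[-1, 0], [1, -1]]
wedge_checks = (
    is_recessive(A_wedge, [1, 2]),     # True:  A d = (-1, -1)
    is_recessive(A_wedge, [1, 0]),     # False: second row gives 1 > 0
    directs_a_line(A_wedge, [1, 2]),   # False: A d is not zero
)

# Hypothetical strip X = {x : 0 <= x1 <= 1}: contains the vertical lines.
A_strip = [[1, 0], [-1, 0]]
strip_checks = (
    is_recessive(A_strip, [0, 1]),     # True
    is_recessive(A_strip, [0, -1]),    # True: both d and -d are recessive
    directs_a_line(A_strip, [0, 1]),   # True: d lies in Ker A
)
```

Note that the right hand side b plays no role in these tests: as stated above, the recessive cone depends only on A, provided that X is nonempty.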
III.C. Extreme Rays of a Cone

♣ Let K be a polyhedral cone, that is,
$$K = \{x \in \mathbf{R}^n : Ax \le 0\} = \{x \in \mathbf{R}^n : a_i^T x \le 0,\ 1 \le i \le m\}$$
♠ A direction d ∈ K is called an extreme direction of K, if d ≠ 0 and in any representation
d = d1 + d2 with d1 , d2 ∈ K both d1 and d2 are nonnegative multiples of d.
• Given d ∈ K, the ray R = R+ · d ⊂ K is called the ray generated by d, and d is called a
generator of the ray. When d ≠ 0, all generators of the ray R = R+ · d are positive multiples
of d, and vice versa – every positive multiple of d is a generator of R.
• Rays generated by extreme directions of K are called extreme rays of K.
♠ Facts:
• A direction d ≠ 0 is an extreme direction of a pointed polyhedral cone $K = \{x \in \mathbf{R}^n : a_i^T x \le 0,\ 1 \le i \le m\}$ if and only if d ∈ K and among the inequalities $a_i^T x \le 0$ defining K there are n − 1 which are active at d (that is, $a_i^T d = 0$) and have linearly independent vectors of coefficients:
$$d \text{ is an extreme direction of } K \Leftrightarrow d \in K \setminus \{0\}\ \&\ \mathrm{Rank}\{a_i : a_i^T d = 0\} = n - 1$$
• K has extreme rays if and only if K is nontrivial (K ≠ {0}) and pointed (K ∩ [−K] = {0}),
and in this case
• the number of extreme rays of K is finite, and
• if r1 , ..., rM are generators of extreme rays of K, then
K = Cone ({r1 , ..., rM }).
III.D. Structure of a Polyhedral Set

♣ Theorem: Every nonempty polyhedral set X can be represented in the form
$$X = \mathrm{Conv}(\{v_1, ..., v_N\}) + \mathrm{Cone}(\{r_1, ..., r_M\}) \qquad (!)$$
where {v1 , ..., vN } is a nonempty finite set, and {r1 , ..., rM } is a finite (possibly empty) set.
Vice versa, every set of the form (!) is a nonempty polyhedral set. In addition,
♠ In a representation (!), one always has
Cone ({r1 , ..., rM }) = Rec(X)
♠ Let X not contain lines. Then one can take in (!) {v1 , ..., vN } = Ext(X) and choose, as r1 , ..., rM , the generators of the extreme rays of Rec(X). The resulting representation is “minimal”: for every representation of X in the form of (!), one has Ext(X) ⊂ {v1 , ..., vN }, and every extreme ray of Rec(X), if any, has a generator from the set {r1 , ..., rM }.
IV. FUNDAMENTAL PROPERTIES
OF LO PROGRAMS

♣ Consider a feasible LO program
$$\mathrm{Opt} = \max_x \left\{ c^T x : Ax \le b \right\} \qquad (P)$$

♠ Facts:
• (P ) is solvable (i.e., admits an optimal solution) if and only if (P ) is bounded (i.e., the
objective is bounded from above on the feasible set)
• (P ) is bounded if and only if $c^T d \le 0$ for every d ∈ Rec(X), where X = {x : Ax ≤ b} is the feasible set of (P )
• Let the feasible set X of (P ) not contain lines. Then (P ) is bounded if and only if $c^T d \le 0$ for the generator d of every extreme ray of Rec(X), and in this case there exists an optimal solution which is an extreme point of the feasible set.
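The last fact suggests a conceptually simple (and computationally naive) way to solve a bounded LO program whose feasible set does not contain lines: enumerate the extreme points and pick the best one. The sketch below does this in the plane, where an extreme point is a feasible intersection of two constraint lines with linearly independent normals. It is an illustration under the stated assumptions (n = 2, integer data, bounded program), not a practical algorithm; the function name and the data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_lo_2d_by_vertices(A, b, c):
    """max c.x over X = {x in R^2 : A x <= b}, assuming the program is bounded
    and X is nonempty and contains no lines. Returns (optimal value, vertex)."""
    best = None
    for i, j in combinations(range(len(A)), 2):
        det = A[i][0] * A[j][1] - A[i][1] * A[j][0]
        if det == 0:
            continue  # the two constraint normals are linearly dependent
        # Cramer's rule for the 2x2 system a_i.x = b_i, a_j.x = b_j
        x = (Fraction(b[i] * A[j][1] - A[i][1] * b[j], det),
             Fraction(A[i][0] * b[j] - b[i] * A[j][0], det))
        if all(r[0] * x[0] + r[1] * x[1] <= bi for r, bi in zip(A, b)):
            val = c[0] * x[0] + c[1] * x[1]
            if best is None or val > best[0]:
                best = (val, x)
    return best


# Hypothetical instance: max x1 + 2 x2 s.t. x1 + x2 <= 4, x1 <= 3, x2 <= 2, x >= 0.
A = [[1, 1], [1, 0], [0, 1], [-1, 0], [0, -1]]
b = [4, 3, 2, 0, 0]
c = [1, 2]
best = solve_lo_2d_by_vertices(A, b, c)
```

For this instance the extreme points are (0,0), (3,0), (3,1), (2,2), (0,2), with objective values 0, 3, 5, 6, 4, so the routine returns the value 6 attained at (2,2). The number of candidate pairs grows combinatorially with the number of constraints, which is why this is only an illustration.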
V. GTA AND DUALITY

♣ GTA: Consider a finite system (S) of nonstrict and strict linear inequalities and linear equations in variables $x \in \mathbf{R}^n$:
$$a_i^T x\ \Omega_i\ b_i,\ 1 \le i \le m \qquad (S)$$
$$\left[ \Omega_i \in \{ \text{“}\le\text{”},\ \text{“}<\text{”},\ \text{“}\ge\text{”},\ \text{“}>\text{”},\ \text{“}=\text{”} \} \right]$$
(S) has no solutions if and only if a legitimate weighted sum of the inequalities of the system is a contradictory inequality, that is,

One can assign to the constraints of the system weights $\lambda_i$ so that
• the weights of the ”<” and ”≤” inequalities are nonnegative, while the weights of the ”>” and ”≥” inequalities are nonpositive,
• $\sum_{i=1}^m \lambda_i a_i = 0$, and
• either $\sum_{i=1}^m \lambda_i b_i < 0$, or $\sum_{i=1}^m \lambda_i b_i = 0$ and at least one of the strict inequalities gets a nonzero weight.
♠ Particular case: Homogeneous Farkas Lemma. A homogeneous linear inequality
aT x ≤ 0 is a consequence of a system of homogeneous linear inequalities aTi x ≤ 0, 1 ≤ i ≤ m,
if and only if the inequality is a combination, with nonnegative weights, of the inequalities
from the system, or, which is the same, if and only if a ∈ Cone ({a1 , ..., am }).
♠ Particular case: Inhomogeneous Farkas Lemma. A linear inequality $a^T x \le b$ is a consequence of a solvable system of inequalities $a_i^T x \le b_i$, 1 ≤ i ≤ m, if and only if the inequality is a combination, with nonnegative weights, of the inequalities from the system and the trivial identically true inequality $0^T x \le 1$, or, which is the same, if and only if there exist nonnegative $\lambda_i$, 1 ≤ i ≤ m, such that $\sum_{i=1}^m \lambda_i a_i = a$ and $\sum_{i=1}^m \lambda_i b_i \le b$.
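The practical value of GTA is that infeasibility of a system can be certified by a vector of weights which is easy to verify. The following minimal sketch checks the three conditions from the GTA statement above for given weights; the encoding of constraints as triples (a, op, b) and the function name are assumptions of this sketch.

```python
def is_infeasibility_certificate(system, lam):
    """Check the GTA conditions for weights lam.

    system: list of constraints (a, op, b) meaning a.x op b,
            with op in {"<=", "<", ">=", ">", "="}.
    """
    n = len(system[0][0])
    # Sign restrictions: nonnegative weights for "<", "<=", nonpositive for ">", ">=".
    for (_, op, _), l in zip(system, lam):
        if op in ("<=", "<") and l < 0:
            return False
        if op in (">=", ">") and l > 0:
            return False
    # The weighted sum of the left hand sides must vanish identically.
    agg_a = [sum(l * a[j] for (a, _, _), l in zip(system, lam)) for j in range(n)]
    if any(v != 0 for v in agg_a):
        return False
    agg_b = sum(l * b for (_, _, b), l in zip(system, lam))
    strict_used = any(op in ("<", ">") and l != 0
                      for (_, op, _), l in zip(system, lam))
    # Contradiction: either 0 <= (negative), or 0 < 0.
    return agg_b < 0 or (agg_b == 0 and strict_used)


# Hypothetical infeasible system: x1 + x2 <= 1, x1 >= 1, x2 > 0.
system = [([1, 1], "<=", 1), ([1, 0], ">=", 1), ([0, 1], ">", 0)]
valid = is_infeasibility_certificate(system, [1, -1, -1])    # True
invalid = is_infeasibility_certificate(system, [1, -1, 0])   # False
```

With weights (1, −1, −1) the left hand sides cancel, the weighted right hand side is 0, and a strict inequality participates with a nonzero weight, so the aggregated inequality is the contradictory 0 < 0. With weights (1, −1, 0) the aggregated left hand side is x2, which does not vanish, so these weights certify nothing.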
LO DUALITY
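♣ Given an LO program in the form
$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : \begin{array}{l} Px - p \ge 0 \\ Rx = r \end{array} \right\} \qquad (P)$$
we associate with it the dual problem
$$\mathrm{Opt}(D) = \max_{\lambda,\mu} \left\{ p^T \lambda + r^T \mu : \begin{array}{l} \lambda \ge 0 \\ P^T \lambda + R^T \mu = c \end{array} \right\}. \qquad (D)$$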
LO Duality Theorem:
(i) [symmetry] Duality is symmetric: the problem dual to (D) is (equivalent to) the primal
problem (P )
(ii) [weak duality] Opt(D) ≤ Opt(P )
(iii) [strong duality] The following 3 properties are equivalent to each other:
• one of the problems (P ), (D) is feasible and bounded
• both (P ) and (D) are solvable
• both (P ) and (D) are feasible
Whenever one of (and then – all) these properties takes place, we have
Opt(D) = Opt(P ).
  
♠ LO Optimality Conditions: Let
x, (λ, µ)
be feasible solutions to the respective problems (P ), (D). Then x, (λ, µ) are optimal solutions
to the respective problems
• if and only if the associated duality gap is zero:
DualityGap(x, (λ, µ)) := cT x − [pT λ + rT µ]
= 0
• if and only if the complementary slackness condition holds:
λT [P x − p] = 0,
or, equivalently, nonzero Lagrange multipliers $\lambda_i$ are associated only with the primal constraints which are active at x:
$$\forall i : \lambda_i [Px - p]_i = 0$$
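The optimality conditions are easy to check numerically for a given primal-dual feasible pair. The sketch below computes the duality gap and the complementary slackness term $\lambda^T[Px - p]$ for the pair (P ), (D) as written above; the helper names and the small instance (which has no equality constraints, so R, r, µ are empty) are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    return [dot(row, x) for row in M]


def rmatvec(M, y, n):
    """M^T y for an (m x n) matrix M given as a list of rows."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(n)]


def duality_report(c, P, p, R, r, x, lam, mu):
    """Return (primal feasible?, dual feasible?, duality gap, lam^T [P x - p])."""
    n = len(c)
    slack = [v - pi for v, pi in zip(matvec(P, x), p)]
    primal_ok = all(s >= 0 for s in slack) and matvec(R, x) == list(r)
    dual_lhs = [u + v for u, v in zip(rmatvec(P, lam, n), rmatvec(R, mu, n))]
    dual_ok = all(l >= 0 for l in lam) and dual_lhs == list(c)
    gap = dot(c, x) - (dot(p, lam) + dot(r, mu))
    return primal_ok, dual_ok, gap, dot(lam, slack)


# Hypothetical instance: min x1 + x2 s.t. x1 >= 1, x2 >= 2, x1 + x2 >= 2.
c = [1, 1]
P = [[1, 0], [0, 1], [1, 1]]
p = [1, 2, 2]
R, r = [], []

optimal = duality_report(c, P, p, R, r, x=[1, 2], lam=[1, 1, 0], mu=[])
suboptimal = duality_report(c, P, p, R, r, x=[2, 2], lam=[0, 0, 1], mu=[])
```

For the first pair both the gap and the complementary slackness term are 0, so the pair is optimal. For the second pair both quantities equal 2: for any primal-dual feasible pair the gap coincides with $\lambda^T[Px - p]$, which is why the two optimality conditions are equivalent.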
♣ As a corollary of Duality Theorem, the optimal value in a feasible maximization LO program is a polyhedral function of the objective admitting an explicit polyhedral representation:
$$\mathrm{Opt}(x) := \max_u \left\{ x^T u : Pu - p \ge 0,\ Ru = r \right\} \le \tau$$
$$\Updownarrow$$
$$\exists \lambda, \mu : \lambda \ge 0,\ -P^T \lambda + R^T \mu = x,\ -p^T \lambda + r^T \mu \le \tau$$
♠ As a result, the solution set of a semi-infinite scalar linear inequality
$$a^T x \le b \quad \forall (a, b) \in U$$
with polyhedral uncertainty set U is polyhedrally representable, with polyhedral representation readily given by the one of U
⇒ The robust counterpart
min{cT x : aTi x ≤ bi ∀(ai , bi ) ∈ Ui , 1 ≤ i ≤ m}
with polyhedral uncertainty sets Ui is equivalent to an LO problem readily given by polyhedral
representations of the uncertainty sets U1 , ..., Um .
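A minimal sketch for the simplest polyhedral uncertainty set may make this concrete. Assume (this special case is not from the slides) that b is certain and a runs through the box $\{a : |a_j - \bar{a}_j| \le \delta_j\}$. Then the semi-infinite inequality is equivalent to the single constraint $\bar{a}^T x + \sum_j \delta_j |x_j| \le b$, which is polyhedrally representable. The code compares this closed form with a brute-force check over the vertices of the box, where the worst case of a linear function of a is attained; all names and data are hypothetical.

```python
from itertools import product


def robust_feasible_closed_form(a_bar, delta, b, x):
    """x satisfies a.x <= b for all a in the box iff
    a_bar.x + sum_j delta_j |x_j| <= b."""
    nominal = sum(a * xj for a, xj in zip(a_bar, x))
    margin = sum(d * abs(xj) for d, xj in zip(delta, x))
    return nominal + margin <= b


def robust_feasible_by_vertices(a_bar, delta, b, x):
    """Brute force: check a.x <= b at every vertex of the box."""
    for signs in product((-1, 1), repeat=len(x)):
        a = [ab + s * d for ab, s, d in zip(a_bar, signs, delta)]
        if sum(aj * xj for aj, xj in zip(a, x)) > b:
            return False
    return True


# Hypothetical data: nominal a = (1, 2), perturbations of size 1 in each entry.
a_bar, delta, b = [1, 2], [1, 1], 4

checks = [
    robust_feasible_closed_form(a_bar, delta, b, [1, 1]),   # False: 3 + 2 > 4
    robust_feasible_closed_form(a_bar, delta, b, [2, 0]),   # True:  2 + 2 <= 4
    robust_feasible_closed_form(a_bar, delta, b, [1, -1]),  # True: -1 + 2 <= 4
]
```

The closed form replaces $2^n$ vertex constraints by one constraint involving $|x_j|$, which in turn becomes linear after introducing slack variables $w_j \ge |x_j|$. This is an instance of a short polyhedral representation with slack variables.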
CONIC DUALITY
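♣ Given a conic program in the form
$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : \begin{array}{l} Px - p \ge 0 \\ Qx - q \in K \\ Rx = r \end{array} \right\} \qquad (P)$$
we associate with it the dual problem
$$\mathrm{Opt}(D) = \max_{\lambda,\omega,\mu} \left\{ p^T \lambda + q^T \omega + r^T \mu : \begin{array}{l} \lambda \ge 0 \\ \omega \in K_* \\ P^T \lambda + Q^T \omega + R^T \mu = c \end{array} \right\}. \qquad (D)$$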
Conic Duality Theorem:
(i) [symmetry] Duality is symmetric: the problem dual to (D) is (equivalent to) the primal
problem (P )
(ii) [weak duality] Opt(D) ≤ Opt(P )
(iii) [strong duality] Let one of the problems (P ), (D) be nearly strictly feasible, meaning that it has a feasible solution where the nonpolyhedral conic constraint (the one involving K in (P ), or K∗ in (D)) is satisfied strictly: ... ∈ int .... Then the other problem is solvable, and Opt(P ) = Opt(D). In particular, if both problems are nearly strictly feasible, both are solvable with equal optimal values.

♠ Conic Programming Optimality Conditions: Let


x, (λ, ω, µ)
be feasible solutions to the respective problems (P ), (D), and let one of the problems be
nearly strictly feasible. Then x, (λ, ω, µ) are optimal solutions to the respective problems
• if and only if the associated duality gap is zero:
DualityGap(x, (λ, ω, µ)) := cT x − [pT λ + q T ω + rT µ]
= 0
• if and only if the complementary slackness condition holds:
$$\underbrace{\lambda^T [Px - p] = 0}_{\Leftrightarrow\ \lambda_i [Px - p]_i = 0\ \forall i} \quad \& \quad \omega^T [Qx - q] = 0.$$
♣ Conic Duality Theorem underlies several important “nonlinear versions” of the basic
polyhedral results, like
“Conic” Farkas Lemma: A scalar linear inequality $a^T x \ge b$ is a consequence of a strictly feasible system
$$A_i x - b_i \in K_i,\ i = 1, ..., m, \qquad (S)$$
of conic inequalities if and only if it can be obtained by taking “legitimate weighted sum” of inequalities from the system and the trivial identically true inequality 0 ≤ 1:
Scalar linear inequality $a^T x \ge b$ is a consequence of strictly feasible system (S) iff
$$\exists \lambda_i \in (K_i)_* : \ \sum_i A_i^T \lambda_i = a \ \ \& \ \ \sum_i \lambda_i^T b_i \ge b$$
GEOMETRY OF PRIMAL-DUAL PAIR
OF CONIC PROBLEMS

♣ Geometrically, a primal-dual pair of conic problems is specified by
• a regular cone K in some $E = \mathbf{R}^N$ and its dual cone $K_*$
• a pair of linear subspaces $L_P$, $L_D$ in E which are orthogonal complements to each other
• a pair of shift vectors b ∈ E, c ∈ E
and requires to find in the primal feasible set
$$[L_P - b] \cap K$$
and the dual feasible set
$$[L_D + c] \cap K_*$$
a pair of orthogonal to each other vectors $x_*$, $\lambda_*$; whenever this is possible, $x_*$, $\lambda_*$ are optimal solutions to the problems
$$\mathrm{Opt}(P) = \min_x \left\{ c^T x : x \in [L_P - b] \cap K \right\} \qquad (P)$$
$$\mathrm{Opt}(D) = \max_\lambda \left\{ b^T \lambda : \lambda \in [L_D + c] \cap K_* \right\} \qquad (D)$$
and $\mathrm{Opt}(P) - \mathrm{Opt}(D) + b^T c = 0$ (strong duality). Besides this,
• one always has Opt(P ) − Opt(D) + bT c ≥ 0 (weak duality);
• under strong duality, [cT x − Opt(P )] + [Opt(D) − bT λ] = λT x for all primal-dual feasible
(x, λ),
• under strong duality, a primal-dual feasible pair (x, λ) is comprised of optimal solutions
iff the solutions in the pair are orthogonal;
• if (P ), (D) are strictly feasible, both problems are solvable and strong duality takes place. When $K = \mathbf{R}^N_+$, the conclusion remains true when (P ), (D) are just feasible.
Tractability of LO, CQO, SDO

♠ Convex Programming in general, and the generic conic problems LO, CQO, SDO in particular, form a “solvable case” in Optimization: in theory, and to some extent also in practice, globally optimal solutions to these problems can be approximated to any desired accuracy in reasonable time, namely, in time polynomial in the sizes of the instances and in the desired number of accuracy digits.
