MATHEMATICAL PROGRAMMING
École Polytechnique
Copyright © 2018 Leo Liberti
3 Reformulations 51
3.1 Elementary reformulations 52
3.1.1 Objective function direction and constraint sense 52
3.1.2 Liftings, restrictions and projections 52
3.1.3 Equations to inequalities 52
3.1.4 Inequalities to equations 52
3.1.5 Absolute value terms 52
II In-depth topics 65
4 Constraint programming 67
4.1 The dual CP 67
4.2 CP and MP 68
4.3 The CP solution method 69
4.3.1 Domain reduction and consistency 69
4.3.2 Pruning and solving 70
4.4 Objective functions in CP 71
4.5 Some surprising constraints in CP 71
4.6 Sudoku 72
4.6.1 AMPL code 72
4.7 Summary 73
5 Maximum cut 75
5.1 Approximation algorithms 76
5.1.1 Approximation schemes 76
5.3 MP formulations 78
5.3.1 A natural formulation 78
5.3.2 A BQP formulation 79
5.3.3 Another quadratic formulation 79
5.3.4 An SDP relaxation 79
5.6 Summary 91
6 Distance Geometry 93
6.1 The fundamental problem of DG 93
6.2 Some applications of the DGP 94
6.2.1 Clock synchronization 94
6.2.2 Sensor network localization 94
6.2.3 Protein conformation 94
6.2.4 Unmanned underwater vehicles 95
Bibliography 107
Index 115
Introduction

I  An overview of Mathematical Programming

1  Imperative and declarative programming
1.3 Universality
Figure 1.2: A program in Minsky's RM (from [Liberti and Marinelli, 2014]).
TMs and RMs are flexible enough so that they can "simulate" other TMs and RMs. Specifically, we are able to design a TM U such that

  U(⟨T, x⟩) = T(x),          (1.1)
• primitive recursion:⁸ if f : N^k → N and g : N^{k+2} → N are p.r., then the function φ_{f,g} : N^{k+1} → N defined by:

  φ(s, 0) = f(s)          (1.2)
  φ(s, n + 1) = g(s, n, φ(s, n))          (1.3)

is p.r.

Sidenote 8: For example, the factorial function n! is defined by primitive recursion by setting k = 0, f = const_1 = 1, g(n) = n(n − 1)!.
Figure 1.3: Alonzo Church, grey eminence of recursive functions.
• minimalization: if f : N^{k+1} → N, then the function ψ : N^k → N which maps s to the least⁹ natural number y such that, for each z ≤ y, f(s, z) is defined and f(s, y) = 0, is p.r. (both schemes are illustrated in a short code sketch after this list).

Sidenote 9: This is a formalization of an optimization problem in number theory: finding a minimum natural number which satisfies a given p.r. property f is p.r.

Notationally, we denote minimalization by means of the search quantifier µ:¹⁰

  ψ(s) ≡ µy (∀z ≤ y (f(s, z) is defined) ∧ f(s, y) = 0).          (1.4)

Sidenote 10: The µ quantifier is typical of computability. Eq. (1.4) is equivalent to ψ(s) = min{y ∈ N | ∀z ≤ y (f(s, z) is defined) ∧ f(s, y) = 0}.
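To make the two schemes above concrete, here is a small Python sketch (my own illustration, not from the original text): primitive recursion builds φ from f and g as in Eq. (1.2)-(1.3), and minimalization searches for the least y with f(s, y) = 0 as in Eq. (1.4) (and need not terminate if no such y exists).

def primitive_recursion(f, g):
    """Return phi with phi(s, 0) = f(s) and phi(s, n + 1) = g(s, n, phi(s, n))."""
    def phi(s, n):
        acc = f(s)
        for i in range(n):
            acc = g(s, i, acc)
        return acc
    return phi

def mu(f, s):
    """Least y such that f(s, z) is defined for all z <= y and f(s, y) = 0."""
    y = 0
    while f(s, y) != 0:
        y += 1
    return y

# factorial via primitive recursion (Sidenote 8): k = 0, f = const 1, g(s, n, m) = (n + 1) * m
factorial = primitive_recursion(lambda s: 1, lambda s, n, m: (n + 1) * m)
print(factorial((), 5))                                   # 120
# minimalization: least y with y * y >= s (here s = 17), i.e. the ceiling of sqrt(s)
print(mu(lambda s, y: 0 if y * y >= s else 1, 17))        # 5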
The main result in computability states that TMs (or RMs) and p.r. functions are both universal models of computation. More precisely, let T be the description of a TM taking an input s ∈ N^k and producing an output in N, and let Φ_T : N^k → N be the associated function s ↦ T(s). Then Φ_T is p.r.; and, conversely, for any p.r. function f there is a TM T such that Φ_T = f.
Here is the sketch of a proof. (⇒) Given a p.r. function, we must show there is a TM which computes it. It is possible, and not too difficult,¹¹ to show that there are TMs which compute basic p.r. functions and basic p.r. operations. Since we can combine TMs in the appropriate ways, for any p.r. function we can construct a TM which computes it.

Figure 1.4: The Halting Problem (vinodwadhawan.blogspot.com).
Sidenote 11: See Ch. 3, Lemma 1.1, Thm. 3.1, 4.4, 5.2 in [Cutland, 1980].

(⇐) Conversely, given a TM T with input/output function τ : N^n → N, we want to show that τ(x) is a p.r. function.¹² We assume T's alphabet is the set N of natural numbers, and that the output of T (if the TM terminates) is going to be stored in a given

Sidenote 12: This is the most interesting direction for the purposes of these lecture notes, since it involves "modelling" a TM by means of p.r. functions.
where we assumed that s_halt = 0. Now there are two cases: either T terminates on x, or it does not. If it terminates, there is a number t* of steps after which T terminates, and

  t* = µt(σ(x, t) = 0)          (1.5)
  τ(x) = ω(x, t*).          (1.6)

Note that Eq. (1.5) says that t* is the least t such that σ(x, t) = 0 and σ(x, u) is defined for each u ≤ t.

If T does not terminate on x, then the two functions µt(σ(x, t) = 0) and τ(x) are undefined on x. In either case, we have

  τ(x) ≡ ω(x, µt(σ(x, t) = 0)).
  ∀ i ∈ P   ∑_{j∈Q} x_ij ≤ a_i.
What else? Well, suppose you have two facilities with capacities
a1 = a2 = 2 and one customer with demand b1 = 1. Suppose that
the transportation costs are c11 = 1 and c21 = 2 (see Fig. 1.10). Then
  ∀ (u, v) ∈ A   x_uv ≥ 0,

and if the positive flow rates along the arcs define a set of paths from s to t.

How do we express this constraint? First, since the flow originates from the source node s, we require that no flow actually arrives at s:

  ∑_{(u,s)∈A} x_us = 0.          (1.7)

Figure 1.13: A Max Flow instance.

Next, no benefit²⁹ would derive from flow leaving the target node t:

  ∑_{(t,v)∈A} x_tv = 0.          (1.8)

Sidenote 29: Eq. (1.8) follows from (1.7) and (1.9) (prove it), and so it is redundant. If you use these constraints within a MILP, nonconvex NLP or MINLP, however, they might help certain solvers perform better.

Lastly, the total flow entering intermediate nodes on paths between s and t must be equal to the total flow exiting them:³⁰

  ∀ v ∈ V ∖ {s, t}   ∑_{(u,v)∈A} x_uv = ∑_{(v,u)∈A} x_vu.          (1.9)

Sidenote 30: These constraints are called flow conservation equations or material balance constraints.
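To see how Eq. (1.7)-(1.9) fit together with the capacity and nonnegativity constraints, here is a hedged Pyomo sketch of the Max Flow LP (Python and its MP libraries only appear later, in Sect. 1.8.3; the instance data and all identifiers below are invented for the example).

from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, maximize, SolverFactory, value)

A = {('s', 'a'): 2.0, ('s', 'b'): 1.0, ('a', 'b'): 1.0, ('a', 't'): 1.0,
     ('b', 't'): 2.0, ('b', 's'): 0.5, ('t', 'a'): 0.5}   # arc -> capacity
V = ['s', 'a', 'b', 't']
s, t = 's', 't'
arcs = list(A.keys())

m = ConcreteModel()
m.x = Var(arcs, within=NonNegativeReals)                   # flow on each arc
m.capacity = Constraint(arcs, rule=lambda m, u, v: m.x[u, v] <= A[u, v])
m.flow = Objective(expr=sum(m.x[u, v] for (u, v) in arcs if u == s), sense=maximize)
m.no_flow_into_s = Constraint(expr=sum(m.x[u, v] for (u, v) in arcs if v == s) == 0)    # Eq. (1.7)
m.no_flow_out_of_t = Constraint(expr=sum(m.x[u, v] for (u, v) in arcs if u == t) == 0)  # Eq. (1.8)

def conservation(m, w):                                    # Eq. (1.9)
    if w in (s, t):
        return Constraint.Skip
    return (sum(m.x[u, v] for (u, v) in arcs if v == w) ==
            sum(m.x[u, v] for (u, v) in arcs if u == w))
m.conservation = Constraint(V, rule=conservation)

SolverFactory('glpk').solve(m)                             # any LP solver will do
print(value(m.flow))                                       # 3.0 on this instance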
given a number F, is it the case that there exists a solution x such that
f ( x ) ≤ F and g( x ) ≤ 0?
1.7.4 Definitions
• The parameters encode the input (or instance): their values are
fixed before the solver is called. The decision variables encode
the output (or solution): their values are fixed by the solver, at
termination.
  min f(p, x)   or   max f(p, x),

subject to constraints of the form

  g(p, x) ≤ 0,   g(p, x) = 0,   or   g(p, x) ≥ 0,

and integrality constraints

  ∀ j ∈ Z   x_j ∈ Z,
If the solvers are the interpreters for MP, the modelling software provides a mapping between optimization problems, formulated in abstract terms, and their instances, which are the actual input to the solver.

Figure 1.19: Gradient-based minimality conditions (from www.sce.carleton.ca).
  min ∑_{i∈P, j∈Q} c_ij x_ij
  ∀ i ∈ P   ∑_{j∈Q} x_ij ≤ a_i
  ∀ j ∈ Q   ∑_{i∈P} x_ij ≥ b_j          (1.12)
  x ≥ 0,
which refers to a given instance of the problem (see Sect. 1.7.3). The two formulations are different: the former is the generalization which describes all possible instances of the transportation problem,³⁶ and the latter corresponds to the specific instance where P = {1, 2}, Q = {1}, a = (2, 2), b = (1), c = (1, 2). The former cannot be solved by any software, unless P, Q, a, b, c are initialized to values; the latter, on the other hand, can be solved numerically by an appropriate solver.

Sidenote 36: An instance of the transportation problem is an assignment of values to the parameter symbols a, b, c indexed over sets P, Q of varying size.

Formulation (1.12) is a structured formulation, whereas (1.13) is a flat formulation (see Fig. 1.20).
set P;                        # these first three declarations are not in the
set Q;                        # original excerpt; they are inferred from their
param a{P};                   # use in the rest of the model
param b{Q};
param c{P,Q};
var x{P,Q} >= 0;
minimize cost: sum{i in P, j in Q} c[i,j]*x[i,j];
subject to production{i in P}: sum{j in Q} x[i,j] <= a[i];
subject to demand{j in Q}: sum{i in P} x[i,j] >= b[j];

Margin note: The point is that AMPL can automatically transform a structured form to a flat form.
# transportation.dat
param Pmax := 2;
param Qmax := 1;
param a :=
  1 2.0
  2 2.0

Margin note: AMPL .dat files have a slightly different syntax with respect to .mod or .run files. For example, you would define a set using set := { 1, 2 };
1.8.3 Python
Python is currently very popular with programmers because of the large number of available libraries: even complex tasks require very little coding.
timelimit=60
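The stray option line above (timelimit=60) suggests that the original listing passed a time limit to the solver. The following Pyomo sketch (mine, not the book's listing) does the same for the flat transportation instance of Sect. 1.7.3, where P = {1, 2}, Q = {1}, a = (2, 2), b = (1), c = (1, 2); note that option names such as timelimit vary from solver to solver.

from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, minimize, SolverFactory, value)

P, Q = [1, 2], [1]
a = {1: 2.0, 2: 2.0}
b = {1: 1.0}
c = {(1, 1): 1.0, (2, 1): 2.0}

m = ConcreteModel()
m.x = Var(P, Q, within=NonNegativeReals)
m.cost = Objective(expr=sum(c[i, j] * m.x[i, j] for i in P for j in Q), sense=minimize)
m.production = Constraint(P, rule=lambda m, i: sum(m.x[i, j] for j in Q) <= a[i])
m.demand = Constraint(Q, rule=lambda m, j: sum(m.x[i, j] for i in P) >= b[j])

solver = SolverFactory('cplex')          # or 'glpk', 'cbc', ... whatever is installed
solver.options['timelimit'] = 60         # solver-specific option name
solver.solve(m)
print({(i, j): value(m.x[i, j]) for i in P for j in Q})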
The objective function Eq. (2.3) aims at minimizing the total cost of the selected legs of the traveling salesman tour. By Eq. (2.4)-(2.5),⁹ we know that the feasible region consists of permutations of {0, . . . , n}: if, by contradiction, there were two integers j, ℓ such that x_ij = x_iℓ = 1, then this would violate Eq. (2.4); and, conversely, if there were two integers i, ℓ such that x_ij = x_ℓj = 1, then this would violate Eq. (2.5). Therefore all ordered pairs (i, j) with x_ij = 1 define a bijection {0, . . . , n} → {0, . . . , n}, in other words a permutation.

Figure 2.2: Euler, and TSP art.
Sidenote 9: Constraints (2.4)-(2.5) define the assignment constraints, an incredibly important substructure in MP formulations; these crop up in scheduling, logistics, resource allocation, and many other types of problems. Assignment constraints define a bijection on the set of their indices.
Permutations can be decomposed into products of disjoint cycles. This, however, would not yield a tour but many subtours. We have to show that having more than one tour would violate Eq. (2.6). Suppose, to get a contradiction, that there are at least two tours. Then at least one of them cannot contain city 0 (since the tours have to be disjoint by definition of bijection): suppose this is the tour i_1, . . . , i_h. Then from Eq. (2.6), by setting the x variables to one along the relevant legs, for each ℓ < h we obtain u_{iℓ} − u_{iℓ+1} + n ≤ n − 1, as well as u_{ih} − u_{i1} + n ≤ n − 1. Now we sum all these inequalities and observe that all of the u variables cancel out, since each occurs with opposite signs in exactly two inequalities. Thus we obtain n ≤ n − 1, a contradiction.¹⁰ Therefore the above is a valid MILP formulation for the TSP.

Sidenote 10: At least one tour (namely the one containing city 0) is safe, since we quantify Eq. (2.6) over i, j both nonzero.
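For concreteness, here is a hedged Pyomo sketch of the assignment constraints Eq. (2.4)-(2.5) together with subtour elimination constraints in the Miller-Tucker-Zemlin form u_i − u_j + n x_ij ≤ n − 1 used in the argument above; the cost matrix and all identifiers are invented.

from pyomo.environ import (ConcreteModel, Var, Objective, Constraint, Binary,
                           NonNegativeReals, minimize, SolverFactory, value)

n = 3                                                   # cities are 0, 1, ..., n
cities = range(n + 1)
legs = [(i, j) for i in cities for j in cities if i != j]
c = {(i, j): abs(i - j) + 1 for (i, j) in legs}         # invented leg costs

m = ConcreteModel()
m.x = Var(legs, within=Binary)                          # x[i,j] = 1 iff leg (i,j) is used
m.u = Var(range(1, n + 1), within=NonNegativeReals)     # ordering variables
m.cost = Objective(expr=sum(c[i, j] * m.x[i, j] for (i, j) in legs), sense=minimize)
# assignment constraints Eq. (2.4)-(2.5)
m.leave = Constraint(cities, rule=lambda m, i: sum(m.x[i, j] for j in cities if j != i) == 1)
m.enter = Constraint(cities, rule=lambda m, j: sum(m.x[i, j] for i in cities if i != j) == 1)
# subtour elimination, quantified over i, j both nonzero
m.mtz = Constraint([(i, j) for (i, j) in legs if i >= 1 and j >= 1],
                   rule=lambda m, i, j: m.u[i] - m.u[j] + n * m.x[i, j] <= n - 1)

SolverFactory('glpk').solve(m)                          # any MILP solver
print([(i, j) for (i, j) in legs if value(m.x[i, j]) > 0.5])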
Eq. (2.6) is not the only possible way to eliminate the subtours. zero.
Consider for example the exponentially large family of inequalities
where c, b, A are as in the LP case. ILP is just like MILP but without
continuous variables. Consider the following problem.
(2.5):

  min ∑_{i,j≤n} c_ij x_ij
  ∀ i ≤ n   ∑_{j≤n} x_ij = 1
  ∀ j ≤ n   ∑_{i≤n} x_ij = 1          (2.11)
  ∀ i, j ≤ n   x_ij ∈ {0, 1}.
Since ℓ norms are convex functions and the constraints are convex, Eq. (2.18) is a cNLP. It is routinely used with the Euclidean norm (ℓ = 2) to project points onto convex sets, such as cones and subspaces.
Surprisingly, the problem of interpolating an arbitrary set of points in the plane²⁶ by means of a given class of functions can be cast as a cNLP. Let P be a finite set of k points (p_i, q_i) in R², and consider the parametrized function:

  f_u(x) = u_0 + u_1 x + u_2 x² + · · · + u_m x^m.

  min_u ‖F(u)‖_ℓ

for some positive integer ℓ. The point is that f_u is nonlinear in x but linear in u, which are the actual decision variables.

Sidenote 26: Generalize this to R^n.
Figure 2.13: Nonlinear function interpolation [Boyd and Vandenberghe, 2004].
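A minimal numpy illustration of this point (invented data, ℓ = 2): the residual map has components F(u)_i = f_u(p_i) − q_i, which is linear in u, so minimizing ‖F(u)‖₂ is an ordinary linear least-squares problem over the Vandermonde matrix of the p_i.

import numpy as np

p = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
q = np.array([1.0, 0.6, 0.9, 1.8, 3.1, 5.2])      # invented sample points (p_i, q_i)
m = 3                                             # degree of the fitting polynomial
A = np.vander(p, m + 1, increasing=True)          # columns 1, p, p^2, p^3, so F(u) = A u - q
u, *_ = np.linalg.lstsq(A, q, rcond=None)         # min_u ||A u - q||_2
print(u)                                          # coefficients u_0, ..., u_m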
Eq. (2.21).
If K = 2 the EDGP essentially asks to draw a given graph on the
plane such that edge segments have lengths consistent with the
edge function d. The EDGP has many applications in engineering
and other fields [Liberti et al., 2014].
Nonconvex NLPs may have many local optima which are not
global. There are therefore two categories of solvers which one
can use to solve NLPs: local solvers (for finding local optima) and
global solvers (for finding global optima).
We mentioned local NLP solvers (such as IPOPT or SNOPT) in connection with solving cNLP globally in Sect. 2.5 above. Unfortunately, any guarantee of global optimality disappears when deploying such solvers on nonconvex NLPs. The actual situation is even worse: most local NLP solvers are designed to simply find the closest local optimum from a given feasible point.

Figure 2.16: Two applications of EDGP: localization of sensor networks, and protein conformation from Nuclear Magnetic Resonance (NMR) data.

In general, local NLP algorithms are iterative: they identify a sequence x^k ∈ R^n (for k ∈ N) which starts from a given feasible point x^0; at each iteration they find a feasible improving direction vector d^k by solving an auxiliary (but easier) optimization subproblem, and then set

  x^{k+1} = x^k + d^k.
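A toy unconstrained illustration (mine, not the book's) of this iterative scheme: steepest descent with a backtracking (Armijo) line search, where the "auxiliary subproblem" degenerates into choosing a step length along −∇f. Real local NLP solvers such as IPOPT or SNOPT compute far better directions (Newton-type steps on the optimality conditions) and handle constraints.

import numpy as np

def f(x):
    return (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2

def grad_f(x):
    return np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])

x = np.array([5.0, 5.0])                               # starting point x^0
for k in range(200):
    d = -grad_f(x)                                     # improving direction d^k
    if np.linalg.norm(d) < 1e-8:
        break
    t = 1.0
    while f(x + t * d) > f(x) - 1e-4 * t * np.dot(d, d):
        t *= 0.5                                       # backtrack until sufficient decrease
    x = x + t * d                                      # x^{k+1} = x^k + d^k (scaled by t)
print(x)                                               # approximately (1, -2)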
standard form is

  min x^T Q x + c^T x
  Ax ≤ b,          (2.24)
  min x^T Q x
  ∑_{i=1}^n ρ_i x_i ≥ µ
  ∑_{i=1}^n x_i = 1          (2.25)
  x ∈ [0, 1]^n.
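A hedged Pyomo sketch of Eq. (2.25) with invented data: the matrix Q below is symmetric and diagonally dominant, hence PSD, so the program is a cQP and a local solver such as IPOPT finds its global optimum.

from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           minimize, SolverFactory, value)

n = 3
rho = [0.05, 0.10, 0.08]                 # expected returns
mu = 0.07                                # target return
Q = [[0.10, 0.02, 0.01],                 # symmetric, diagonally dominant => PSD
     [0.02, 0.08, 0.03],
     [0.01, 0.03, 0.09]]

m = ConcreteModel()
m.x = Var(range(n), bounds=(0, 1))
m.risk = Objective(expr=sum(Q[i][j] * m.x[i] * m.x[j]
                            for i in range(n) for j in range(n)), sense=minimize)
m.ret = Constraint(expr=sum(rho[i] * m.x[i] for i in range(n)) >= mu)
m.budget = Constraint(expr=sum(m.x[i] for i in range(n)) == 1)

SolverFactory('ipopt').solve(m)          # convexity makes the local optimum global
print([value(m.x[i]) for i in range(n)])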
Like all convex programs, the cQP may also arise as a convex
relaxation of a nonconvex NLP or QP, used at every node of a sBB
algorithm.
Sect. 5.3):

  max ½ ∑_{i<j≤n} (1 − x_i x_j)          (2.30)
  x ∈ {−1, 1}^n,
where n = |V |. We assign a decision variable xi to each vertex
i ∈ V, and we let xi = 1 if i ∈ S, and xi = −1 if i 6∈ S. If {i, j} is an
edge such that i, j ∈ S, then xi x j = 1 and the corresponding term on
the objective function is zero. Likewise, if i, j 6∈ S, then xi x j = 1 and,
again, the objective term is zero. The only positive contribution to
the objective function arises if xi x j = −1, which happens when
i ∈ S, j 6∈ S or vice versa, in which case the contribution of the
term is 2 (hence we divide the objective by 2 to count the number
of edges in the cut). Formulation (2.30) is not quite a BQP yet, since
the BQP prescribes x ∈ {0, 1}n . However, an affine transformation
takes care of this detail.51
BQPs are usually linearized⁵² before solving them. Specifically, they are transformed to (larger) BLPs, and a MILP solver (such as CPLEX) is then employed [Liberti, 2007].

Figure 2.22: D-Wave's claimed "quantum computer" (from dwavesys.com), above. I sat at a few meetings where IBM researchers de-bunked D-Wave's claims, by solving Ising Model BQPs (see Ch. 5) defined on chimera graphs (below) on standard laptops faster than on D-Wave's massive hardware. Read Scott Aaronson's blog entry scottaaronson.com/blog/?p=1400 (it is written in inverse chronological order, so start at the bottom). By the way, I find Prof. Aaronson's insight, writing and teaching style absolutely wonderful and highly entertaining, so I encourage you to read anything he writes.

Sidenote 51: Write down the affine transformation and the transformed formulation.
Sidenote 52: This means that each term x_i x_j is replaced by an additional variable y_ij; usually, additional linear constraints are also added to the formulation to link the y and x variables in a meaningful way.

2.9.4 Quadratically constrained quadratic programming

Quadratically Constrained Quadratic Programming (QCQP) consists in the minimization of any quadratic form subject to a set of quadratic constraints. The standard form is

  min x^T Q_0 x + c_0 x
  ∀ i ≤ m   x^T Q_i x + c_i x ≤ b_i,          (2.31)

where c_0, . . . , c_m are row vectors in R^n, Q_0, . . . , Q_m are n × n matrices, and b ∈ R^m.

The EDGP (see Sect. 2.6) can be written as a feasibility QCQP by simply squaring both sides of Eq. (2.21).⁵³ Solving this QCQP, however, is empirically very hard:⁵⁴ there are better formulations for the EDGP.⁵⁵

Sidenote 53: The standard form can be achieved by replacing each quadratic equation by two inequalities of opposite sign.
Sidenote 54: Try solving some small EDGP instances using Couenne. What kind of size can you achieve?
Sidenote 55: Write and test some formulations for the EDGP.

Convex QCQPs can be solved to global optimality by local NLP solvers such as IPOPT and SNOPT. For nonconvex QCQPs, global solutions can be approximated to desired accuracy using the sBB algorithm (e.g. BARON or Couenne).

2.10 Semidefinite programming

Semidefinite Programming (SDP) consists in the minimization of a linear form subject to a semidefinite cone. The standard form is

  min C • X
  ∀ i ≤ m   A_i • X = b_i          (2.32)
  X ⪰ 0,
  min ∑_{j,h≤n} c_jh x_jh
  ∀ i ≤ m   ∑_{j,h≤n} a_ijh x_jh = b_i.
  f(y) = ∑_{j,h≤n} q_jh y_j y_h,
  f(X) = ∑_{j,h≤n} q_jh x_jh = Q • X,

in X.

Sidenote 57: Equivalently: any PSD matrix is the Gram matrix of a k × n matrix Y (the converse also holds).
We would now like to get as close as possible to stating that there exists a K × n matrix Y such that X − Y^T Y = 0. One possible way to get "as close as possible" is to replace X − Y^T Y = 0 by X − Y^T Y ⪰ 0, which can also be trivially re-written as X − Y^T I_K^{−1} Y ⪰ 0. The left hand side in this equation is known as the Schur complement of X in the matrix

  S = ( I_K   Y
        Y^T   X ),

and X − Y^T I_K^{−1} Y ⪰ 0 is equivalent to setting S ⪰ 0. We thus derived the SDP relaxation of the EDGP system Eq. (2.33):

  ∀ {u, v} ∈ E   X_uu − 2X_uv + X_vv = d²_uv
  ( I_K   Y                                              (2.35)
    Y^T   X ) ⪰ 0.

Figure 2.26: A solution to the instance in Fig. 2.25 obtained by the SDP solver MOSEK [mosek7], next to the actual shape: as you can see, they are very close, which means that the relaxation is almost precise.
Many variants of this SDP relaxation for the EDGP were proposed
in the literature, see e.g. [Man-Cho So and Ye, 2007, Alfakih et al.,
1999].
SDPs are solved in polynomial time (to desired accuracy) by
means of interior point methods (IPM). There are both commercial
and free SDP solvers, see e.g. MOSEK [mosek7] and SeDuMi.
Natively, they require a lot of work to interface to. Their most
convenient interfaces are Python and the YALMIP [Löfberg, 2004]
MATLAB [matlab] package. The observed complexity of most
SDP solvers appears to be around O(n³), so SDP solver technology
cannot yet scale to very large problem sizes. For many problems,
however, SDP relaxations seem to be very tight, and help find
approximate solutions which are “close to being feasible” in the
original problem.
(why?).
  min f_1(x)
   ⋮          (2.37)
  min f_q(x)
  g(x) ≤ 0
  x ∈ X,

Sidenote 59: A Pareto optimum is a solution x* to Eq. (2.37) such that no other solution x′ exists which can improve one objective without worsening another. The Pareto set (or Pareto region) is the set of Pareto optima.
  min { f_i(x) | ∀ p ≠ i  f_p(x) ≤ λ_p ∧ g(x) ≤ 0 ∧ x ∈ X }
We suppose that arc capacities have a unit cost, i.e. the total cost of
installing arc capacities is
  ∑_{(i,j)∈A} C_ij.          (2.38)
If we had not been told to minimize the total installed capacity, this
would be a standard Max Flow problem (see Sect. 1.7.2):
  max_{x≥0} ∑_{(s,j)∈A} x_sj
  ∑_{(i,s)∈A} x_is = 0
  ∀ i ∈ V ∖ {s, t}   ∑_{(i,j)∈A} x_ij = ∑_{(j,i)∈A} x_ji
  ∀ (i, j) ∈ A   x_ij ≤ C_ij.
  min_{x,y} f(x, y)
  g(x, y) ≤ 0          (2.40)
  y ∈ arg min{ φ_x(y) | γ_x(y) ≤ 0 ∧ y ∈ ζ }
  x ∈ X,
  min_y c(x) · y ≥ α,

  max_λ λ · b(x) ≥ α.          (2.43)
Now stop and consider Eq. (2.43) within the context of the
upper-level problem: it is telling us that the maximum of a certain
function must be at least α. So if we find any λ which satisfies
λ · b( x ) ≥ α, the maximum will certainly also be ≥ α. This means we
can forget the maxλ . The problem can therefore be written as:
  min_{x,λ} f(x)
  g(x) ≤ 0
  λ · b(x) ≥ α          (2.44)
  λ · A(x) ≤ c(x)
  x ∈ X,
  ∀ p ∈ [p^L, p^U]   g(p, x) ≤ 0,
Consider the MP

  sup_x { c^T x | ∀ a_j ∈ K_j   ∑_{j≤n} a_j x_j ≤ b },          (2.46)
2.14 Summary
3 Reformulations

Reformulations are symbolic transformations in MP. We usually assume these transformations can be carried out in polynomial time. In Ch. 2 we listed many different types of MP. Some are easier to solve than others; specifically, some are motivated by a real-life application, and others by the existence of an efficient solver.

Figure 3.1: The preface of the special issue 157 of Discrete Applied Mathematics dedicated to Reformulations in Mathematical Programming [Liberti and Maculan, 2009].
Usually, real-life problems naturally yield MP classes which are
NP-hard, while we are interested in defining the largest MP class
solvable by each efficient (i.e. polytime) solver we have. Ideally,
we would like to reformulate¹ a given formulation belonging to a "hard" class to an equivalent one in an "easy" class.

Sidenote 1: I do not mean that any NP-hard problem can just be reformulated in polytime to a problem in P; no-one knows yet if this is possible or not: the P ≠ NP question is the most important open problem in computer science.

Hard problems will remain hard,² but certain formulations belong to a hard class simply because we did not formulate them in the best possible way.³ This chapter is about transforming MPs into other MPs with desirable properties, or features. Such transformations are called reformulations. In this chapter, we list a few useful and elementary reformulations.

Sidenote 2: This is because if P is NP-hard and Q is obtained from P by means of a polynomial time reformulation, then by definition Q is also NP-hard.
Sidenote 3: More precisely, NP-hard problems contain infinite subclasses of instances which constitute a problem in P, e.g. the class of all complete graphs is clearly a trivially easy subclass of Clique.

Informally speaking, a reformulation of a MP P is another MP Q, such that Q can be obtained from P in a shorter time than is needed to solve P, and such that the solution of Q is in some way useful to solving P. We list four main types of reformulation: exact, narrowing, relaxation, approximation. More information can be found in [Liberti, 2009, Liberti et al., 2009a].
3.0.1 Definition

• A reformulation is exact if there is an efficiently computable surjective⁴ map φ : G(Q) → G(P). Sometimes we require that φ also extends to local optima (L(Q) → L(P)), or feasible solutions (F(Q) → F(P)), or both.

• A reformulation is a narrowing if φ maps G(Q) to a nontrivial⁵ subset of G(P). Again, we might want to extend φ to local optima or feasible solutions.

• A reformulation is a relaxation if the objective function value at the global optima of Q provides a bound to the objective function value at the global optima of P in the optimization direction.

Figure 3.2: Exact reformulation.
Sidenote 4: We require φ to be surjective so that we can retrieve all optima of P from the optima of Q.
Sidenote 5: In narrowings, we are allowed to lose some optima of the original problem P; narrowings are mostly used in symmetric MPs, where we are only interested in one representative optimum per symmetric orbit.
3.1 Elementary reformulations

3.1.1 Objective function direction and constraint sense

  max f ≡ − min − f
  ∑_{h≤k} p_h y_h = x
  P ≡ { x ∈ R^n | ∀ i ≤ m   g_i(x) ≤ 0 }
  y_ij ≤ x_j          (3.3)
  y_ij ≥ x_i + x_j − 1.          (3.4)

Figure 3.5: The reason why Fortet's reformulation works: the constraints form the convex envelope of the set {(x_i, x_j, y_ij) | y_ij = x_i x_j ∧ x_i, x_j ∈ {0, 1}}, i.e. the smallest convex set containing the given set.

3.2.1 Proposition
For any i, j with

  ∀ i ∈ I   y_I ≤ x_i          (3.5)
  y_I ≥ ∑_{i∈I} x_i − |I| + 1.          (3.6)

Sidenote 23: Formulate and prove the generalization of Prop. 3.2.1 in the case of |I| binary variables.

3.2.2 Product of a binary variable by a bounded continuous variable

We tackle terms of the form xy where x ∈ {0, 1}, y ∈ [L, U], and L ≤ U are given scalar parameters. It suffices to replace the term xy by an additional variable z, and then add the following linearization constraints:²⁴

Sidenote 24: Prove their correctness.

  z ≥ Lx          (3.7)
  z ≤ Ux          (3.8)
  z ≥ y − max(|L|, |U|)(1 − x)          (3.9)
  z ≤ y + max(|L|, |U|)(1 − x).          (3.10)
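A tiny brute-force check (mine, not from the text) that Eq. (3.7)-(3.10) pin z to the product xy: for x = 0 they force z = 0, and for x = 1 they force L ≤ z ≤ U and y ≤ z ≤ y, i.e. z = y.

L, U = -3.0, 5.0
M = max(abs(L), abs(U))
for x in (0, 1):
    for y in (L, -1.0, 0.0, 2.5, U):
        lo = max(L * x, y - M * (1 - x))    # tightest lower bound from (3.7) and (3.9)
        hi = min(U * x, y + M * (1 - x))    # tightest upper bound from (3.8) and (3.10)
        assert abs(lo - x * y) < 1e-9 and abs(hi - x * y) < 1e-9
print("the four constraints force z = x*y for every binary x and y in [L, U]")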
  x_j ≤ M z_j          (3.11)
  y_j ≤ M(1 − z_j).          (3.12)
f ( x ) = · · · + |h( x )| + · · ·
  f(x) = · · · + (t⁺ + t⁻) + · · ·

  f(x) = · · · + t⁺ + t⁻ + · · ·

subject to

  t⁻ ≤ h(x) ≤ t⁺.

Finally, since −|h(x)| ≤ h(x) ≤ |h(x)|, we can simply replace t⁻ by −t⁺ in the constraints (and rename t⁺ as t for simplicity), to obtain
  ∀ i ≤ n   a_ii ≥ ∑_{j≠i} |a_ij|.
  τ = (a^T x + b) / (c^T x + d),
Consider the
  ∀ i ≤ m   ∑_{j≤n} x_ij = 1,

  ∀ j ≤ n   ∑_{k≤d} |w_jk| = 1

  ∀ i ≤ m   ∑_{j≤n} x_ij = 1
  ∀ j ≤ n   ‖w_j‖₁ = 1
  ∀ i ≤ m, j ≤ n   t⁺_ij − t⁻_ij = w_j p_i − w⁰_j,

  ∀ i ≤ m   ∑_{j≤n} x_ij = 1
  ∀ j   ‖w_j‖₁ = 1
  ∀ i ≤ m, j ≤ n   t⁺_ij − t⁻_ij = w_j p_i − w⁰_j
  ∀ i ≤ m, j ≤ n   y⁺_ij ≤ min(M x_ij, t⁺_ij)
  ∀ i ≤ m, j ≤ n   y⁺_ij ≥ M x_ij + t⁺_ij − M
  ∀ i ≤ m, j ≤ n   y⁻_ij ≤ min(M x_ij, t⁻_ij)
  ∀ i ≤ m, j ≤ n   y⁻_ij ≥ M x_ij + t⁻_ij − M,

  ∀ i ≤ m   ∑_{j≤n} x_ij = 1

where u⁺_jk, u⁻_jk ∈ [0, M] are continuous variables for all j and k ≤ d.³²

Sidenote 32: Again, the upper bound M does not lose generality.

4. The last constraints above are complementarity constraints,³³ which we can reformulate according to Sect. 3.2.3:

Sidenote 33: They can be collectively written as ∑_{j≤n, k≤d} u⁺_jk u⁻_jk = 0. Why?

  ∀ i ≤ m   ∑_{j≤n} x_ij = 1
## parameters
param m integer, > 0;
param n integer, > 0;        # the declarations after "param m" are not in the
param d integer, > 0;        # original excerpt; they are inferred from the
set M := 1..m;               # identifiers used below
set N := 1..n;
set D := 1..d;
param p{M,D};
param w0L; param w0U;
param wL; param wU;
param big > 0;               # the "big M" constant
## variables
var w0{N} <= w0U, >= w0L;
var w{N,D} <= wU, >= wL;
var x{M,N} binary;
var tplus{M,N} >= 0, <= big;
var tminus{M,N} >= 0, <= big;
var yplus{M,N} >= 0, <= big;
var yminus{M,N} >= 0, <= big;
var uplus{N,D} >= 0, <= big;
var uminus{N,D} >= 0, <= big;
var z{N,D} binary;
## objective function
minimize fitting_error: sum{i in M, j in N} (yplus[i,j] + yminus[i,j]);
## constraints
subject to assignment {i in M} : sum{j in N} x[i,j] = 1;
subject to minabs_ref {i in M, j in N} :
tplus[i,j] - tminus[i,j] = sum{k in D} w[j,k] * p[i,k] - w0[j];
subject to yplus_lin1 {i in M, j in N} : yplus[i,j] <= big * x[i,j];
subject to yplus_lin2 {i in M, j in N} : yplus[i,j] <= tplus[i,j];
subject to yplus_lin3 {i in M, j in N} :
yplus[i,j] >= big * x[i,j] + tplus[i,j] - big;
subject to yminus_lin1 {i in M, j in N} : yminus[i,j] <= big * x[i,j];
subject to yminus_lin2 {i in M, j in N} : yminus[i,j] <= tminus[i,j];
subject to yminus_lin3 {i in M, j in N} :
yminus[i,j] >= big * x[i,j] + tminus[i,j] - big;
subject to one_norm1_ref{j in N}: sum{k in D} (uplus[j,k] + uminus[j,k]) = 1;
subject to one_norm2_ref{j in N, k in D}: uplus[j,k] - uminus[j,k] = w[j,k];
subject to one_norm3_ref{j in N, k in D}: uplus[j,k] <= big*z[j,k];
subject to one_norm4_ref{j in N, k in D}: uminus[j,k] <= big*(1 - z[j,k]);
• Parameters:

1. Let N ∈ N;
2. for all i ≤ n, s_i is the expected number of calls to the i-th component;
3. for all i ≤ n, j ∈ I_i, c_ij is the cost, d_ij is the delivery time, and µ_ij the probability of failure on demand of the j-th off-the-shelf component for slot i;
4. for all i ≤ n, j ∈ J_i, c̄_ij is the cost, t_ij is the estimated development time, [. . .] case, p_ij is the probability that the instance is faulty, and b_ij the testability of the j-th purpose-built component for slot i.

• Constraints:

  ∀ i ≤ n   ∑_{j ∈ I_i ∪ J_i} x_ij = 1;
whence the constraint is
  ∀ i ≤ n, j ∈ J_i   ∑_{k≤N} ν_ijk = 1          (3.19)
the constraints in Eq. (3.20) are used to replace N_ij, and those in Eq. (3.21) to replace ϑ_ij.³⁴ We obtain:

Sidenote 34: Constraints (3.19) are simply added to the formulation.
  min ∑_{i≤n} ( ∑_{j∈I_i} c_ij x_ij + ∑_{j∈J_i} c̄_ij (t_ij + τ_ij ∑_{k≤N} k ν_ijk) x_ij )
  ∀ i ≤ n   ∑_{j ∈ I_i ∪ J_i} x_ij = 1
  ∀ i ≤ n   ∑_{j∈I_i} d_ij x_ij + ∑_{j∈J_i} (t_ij + τ_ij ∑_{k≤N} k ν_ijk) x_ij ≤ T
  ∑_{i≤n} s_i ( ∑_{j∈I_i} µ_ij x_ij + ∑_{j∈J_i} x_ij ∑_{k≤N} ϑ_k ν_ijk ) ≥ ln(R)
  ∀ i ≤ n, j ∈ J_i   ∑_{k≤N} ν_ijk = 1.
3. We distribute products over sums in the formulation to obtain products of binary variables; we get a MILP reformulation Q of P where all the variables are binary.

Figure 3.13: After interfacing, we can solve a graph partitioning problem on G = (V, A) so as to identify semantically related clusters, for which there might conceivably exist an off-the-shelf solution.

We remark that Q derived above has many more variables and constraints than P. More compact reformulations are applicable in Step 3 because of the presence of the assignment constraints [Liberti, 2007].

Reformulation Q essentially rests on linearization variables w_ijk which replace the quadratic terms x_ij ν_ijk throughout the formulation.
3.4 Summary

II  In-depth topics

4  Constraint programming
We remark that C_i^p need not be explicitly cast in function of all the decision variables. In fact, it turns out that every CP formulation C(p) can be reduced to one which has at most unary and binary constraints, by reduction to the dual CP, denoted C*(p). For each i ≤ m, consider the subset S_i ⊆ {1, . . . , n} of variable indices that C_i^p depends on.

Figure 4.2: The constraint/variable incidence structure of the CP formulation can be represented as a hypergraph.

The dual CP has m dual variables y_1, . . . , y_m with domains R_1, . . . , R_m and the dual constraints:
  ∀ i < j ≤ m   y_i[S_i ∩ S_j] = y_j[S_i ∩ S_j].          (4.1)
Note that each dual variable y_i takes values from the relation R_i, which is itself a subset of a cartesian product of the D_i's. Hence you should think of each dual variable as a vector or a list, which means that y = (y_1, . . . , y_m) is a jagged array.³ The meaning of Eq. (4.1) is that, for each pair of constraints C_i^p, C_j^p, we want the corresponding dual variables y_i, y_j to take on values from R_i and R_j which agree on S_i ∩ S_j. Note that formally each dual constraint is binary, since it only involves two dual variables. Note also that when S_i ∩ S_j = ∅, the constraint is empty, which means it is always satisfied and we can formally remove it from the dual CP. Thus, Eq. (4.1) contains O(m²) constraints in the worst case, but in practice there could be fewer than that.

Sidenote 3: A jagged array is a basic data structure: it consists of a list of lists of different sizes.
4.2 CP and MP
other constraint C_ℓ^p involving x_j, D_j becomes node-inconsistent w.r.t. C_i^p. Since constraints can never add values to domains, the effect of C_ℓ^p on D_j must yield a domain D′_j ⊆ D_j. Of course, if D′_j is node-inconsistent w.r.t. C_i^p, any superset of D′_j must also be node-inconsistent, which means that D_j is, against the initial assumption.

Consistency with respect to binary constraints of the form C_i^p(x_j, x_h) is also known as 2-consistency or arc-consistency. The domains D_j, D_h are 2-consistent with respect to C_i^p if for each a_j ∈ D_j there is a_h ∈ D_h such that C_i^p(a_j, a_h) evaluates to YES. Consider for example the domain D_2 = {1, 2, 3, 4} for x_2 and the constraint x_1 < x_2: then D_1, D_2 are arc-consistent with respect to x_1 < x_2 if D_1 is reduced to {2, 3} and D_2 to {3, 4}.

Domain reduction through consistency is usually performed before choosing the branching variable index in the search algorithm (Fig. 4.6).
AllDifferent( x1 , . . . , xn )
CP formulations do not natively have objective functions f(x) to optimize, but there are at least two approaches in order to handle them.

Figure 4.9: An unforeseen glitch in Hall's stable marriage theorem (from snipview.com).
A simple approach consists in enforcing a new constraint f(x) ≤ d, and then updating the value of d with the objective function value of the best solution found so far during the solution process
(Sect. 4.3). Since a complete search tree exploration lists all of the
feasible solutions, the best optimum at the end of the exploration is
the global optimum.
In the case of discrete bounded variable domains another approach consists in using bisection, which we explain in more detail. Let C(p) be a CP formulation, with p a parameter vector representing the instance; and let x′ = C(p), with x′ = ∅ if p is an infeasible instance.

Given some lower bound f_L and upper bound f_U such that f_L ≤ min_x f(x) ≤ f_U, we select the midpoint d = (f_L + f_U)/2 and consider the formulation C̄(p, d) consisting of all the constraints of C(p) together with the additional constraint f(x) ≤ d. We compute x̄ = C̄(p, d): if x̄ = ∅ we update f_L ← d, otherwise f_U ← d; finally we update the midpoint and repeat. This process terminates whenever f_L = f_U, in which case x̄ is a feasible point in C(p) which minimizes f(x).

Figure 4.10: The bisection algorithm (see Fig. 1.16).
  while L < U do
    d ← (L + U)/2
    x̄ ← C̄(p, d)
    if x̄ = ∅ then
      L ← d
    else
      U ← d
    end if
  end while
  return x̄
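A generic Python sketch of the bisection scheme in Fig. 4.10 (illustration only): solve_cp(d) stands for the call x̄ ← C̄(p, d), returning a feasible point of C(p) with f(x) ≤ d, or None if there is none. To guarantee termination for integer-valued objectives, the sketch tightens the infeasible-side update to L ← d + 1 (Fig. 4.10 uses L ← d).

def bisection(solve_cp, L, U):
    """Minimize an integer-valued objective over a CP by bisection on [L, U]."""
    x_best = None
    while L < U:
        d = (L + U) // 2                 # midpoint
        x = solve_cp(d)                  # x-bar <- C-bar(p, d)
        if x is None:                    # C(p) plus f(x) <= d is infeasible: min f > d
            L = d + 1
        else:                            # feasible: the optimum is at most d
            x_best, U = x, d
    return x_best

# toy usage: minimize f(x) = x^2 over the integer feasible set {3, 4, 5}
feasible = [3, 4, 5]
f = lambda x: x * x
print(bisection(lambda d: next((x for x in feasible if f(x) <= d), None), 0, 100))   # 3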
Bisection may fail to terminate unless f_L, f_U can only take finitely many values (e.g. f_L, f_U ∈ N).⁸ Since CP formulations are mostly defined over integer or discrete decision variables, this requirement is often satisfied. The complexity of bisection is O(t log₂(|U − L|)), where t is the

Sidenote 8: We remark that f_L can only increase and f_U can only decrease during the bisection algorithm, hence if both are integer they can only range between their initial values.
4.6 Sudoku
model sudoku_cp.mod;
data sudoku_cp.dat;
option solver ilogcp;
option ilogcp_options "logverbosity=terse";
solve;
printf "x:\n";
for{i in N} {
for{j in N} {
printf "%2d ", x[i,j];
}
printf "\n";
}
printf "\n";
4.7 Summary
5 Maximum cut

The problem which names this chapter is one of the best known problems in combinatorial optimization: finding the cutset of maximum weight in a weighted undirected graph.¹ It has two main motivating applications: finding the minimum energy magnetic

Sidenote 1: We already mentioned the unweighted variant in Sect. 2.9.3.
as a cut, whereas the set of edges with one incident vertex in S and the other in V ∖ S is a cutset. So the Max Cut problem calls for the

  p̄ < 1/(2m),
5.3 MP formulations

Let us see a few MP formulations for the Max Cut problem.

Figure 5.3: You should all be familiar with the Max Flow = Min Cut theorem, which states that the maximum flow (see Sect. 1.7.2) in a directed graph with given arc capacities is the same as the capacity of a minimum cut. Explain why we cannot solve Max Cut on an undirected graph by simply replacing each edge by two antiparallel arcs with capacities equal to the negative weights, and then solving a Max Flow LP to find the minimum cut. (Picture from faculty.ycp.edu.)

5.3.1 A natural formulation

We start with a "natural" formulation, based on two sets of decision variables, both for cut and cutset.

• Let V be the vertex set and E the edge set.
• Let w : E → R₊ be the edge weight parameters.
• For each {i, j} ∈ E let z_ij = 1 if {i, j} is in the maximum cutset, and 0 otherwise (decision variables).
  max_{y,z ∈ {0,1}} ∑_{{i,j}∈E} w_ij z_ij,

  ∀ {i, j} ∈ E   z_ij = y_i(1 − y_j) + (1 − y_i) y_j.
Note that the RHS can only be zero or one. If i, j are both in
S or not in S, we have yi = y j = 1 or yi = y j = 0, so the
RHS evaluates to zero, which implies that the edge {i, j} is not
counted in the maximum cutset. If i ∈ S and j 6∈ S, then the first
term of the RHS is equal to 1, and the second is equal to 0, so the
RHS is equal to 1, which implies that the edge {i, j} is counted.
Similarly for the remaining case i 6∈ S and j ∈ S.
yi − yi y j + y j − yi y j = yi + y j − 2yi y j ,
  max_{x ∈ {−1,1}^n}  ½ ∑_{{i,j}∈E} w_ij (1 − x_i x_j).          (5.2)
Note that Eq. (5.2) is also a natural formulation (see Fig. 5.4): for
each i ∈ V, let xi = 1 if i ∈ S and xi = −1 if i ∈ V r S. Then, for each
{i, j} ∈ E, xi x j = −1 if and only if {i, j} is in the maximum cutset,
and the weight of the corresponding edge is ½ w_ij (1 − (−1)) = w_ij.
  max_{v_i ∈ S^{n−1}}  ¼ ∑_{i,j ∈ V} w_ij (1 − v_i · v_j).          (5.4)
5.3.1 Proposition
For any n × n real symmetric matrix V, rank V = 1 if and only if there exists a vector u ∈ R^n such that u u^T = V V^T.

5.3.2 Proposition
For any real n × n symmetric matrix V, we have: (i) Gram(V) ⪰ 0; (ii) rank Gram(V) = rank V.

5.3.3 Corollary
If X = (X_ij) = Gram(V) and rank X = 1, then there exists a vector u such that X_ij = u_i u_j.
  max  ¼ ∑_{i,j∈V} w_ij (1 − X_ij)
  Diag(X) = I          (5.5)
  X ⪰ 0
  rank X = 1,
  ∑_{i,j∈V} w_ij (1 − X_ij)
since r and all of the v_i's are unit vectors. Thus, i ∈ S′ if and only if r · v_i ≥ 0, which implies that we can define

  x′ = sgn(r^T V),
  E( ∑_{i∈S′, j∉S′} w_ij ) = ∑_{i<j} w_ij Prob(x′_i ≠ x′_j).          (5.8)

Figure 5.7: For every pair of vectors, we can look at the situation in 2D (from some lecture notes of R. O'Donnell).
So we need to evaluate the probability that, given a vector r picked
uniformly at random on Sn−1 , its orthogonal hyperplane H forms
angles as specified above with two given vectors vi and v j .
Since we are only talking about two vectors, we can focus on the
plane η spanned by those two vectors. Obviously, the projection of
Sn−1 on η is a circle C centered at the origin. We look at the line λ
on η (through the origin) spanned by r. The question, much simpler
to behold now that it is in 2D, becomes: what is the probability that a
line crossing a circle separates two given radii vi , v j ?
This question really belongs to elementary geometry (see
Fig. 5.7). In order to separate vi , v j , the radius defined by λ has to
belong to the smaller slice of circle delimited by vi and v j . The prob-
ability of this happening is the same as the extent of the (smaller)
angle between vi and v j divided by 2π. Only, λ really identifies a di-
ameter, so we could turn λ around by π and get “the other side” of
λ to define another radius. So we have to multiply this probability
by a factor of two:
  ∀ i < j   Prob(x′_i ≠ x′_j) = arccos(v_i · v_j) / π.          (5.9)
Since we do not know Opt(I), in order to evaluate the approximation ratio of the GW we can only compare the objective function value of the rounding x′ with the SDP relaxation value, which, as an upper bound to Opt(I), is such that

  ½ ∑_{i<j} w_ij (1 − x′_i x′_j) ≤ Opt(I) ≤ ½ ∑_{i<j} w_ij (1 − v_i · v_j).          (5.10)

Figure 5.8: When rounding is worse than SDP: how much worse? (From some lecture notes of R. O'Donnell.)

  (1/π) arccos ρ   and   ½ (1 − ρ),
  α = min_{ρ∈[−1,1]} ((arccos ρ)/π) / (½ (1 − ρ)) ≈ 0.87854,
as claimed.
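A quick numerical check of this constant (not in the text): evaluate the ratio on a fine grid of ρ values.

import numpy as np

rho = np.linspace(-1.0, 0.999, 200001)                    # rho = 1 excluded (0/0)
ratio = (np.arccos(rho) / np.pi) / ((1.0 - rho) / 2.0)
print(ratio.min())                                        # approximately 0.8786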
5.4.1 Lemma
The vector r is uniformly distributed on S^{n−1}.

Figure 5.9: Most "wild guess methods" for uniformly sampling on a sphere turn out to be incorrect (above) or, if correct (below), inefficient (from www.bogotobogo.com).

Proof. Since r′_1, . . . , r′_n are independently sampled, their joint probability distribution function is simply the product of each individual distribution function:

  ∏_{i≤n} (1/√(2π)) e^{−(r′_i)²/2} = (1/(2π)^{n/2}) e^{−‖r′‖²₂/2}.

  p_A = (1/(2π)^{n/2}) ∫_A e^{−‖r′‖²₂/2} dr′.

  p_{UA} = (1/(2π)^{n/2}) ∫_{U^T A} e^{−‖r′‖²₂/2} dr′
         = (1/(2π)^{n/2}) ∫_A e^{−‖U^T r′‖²₂/2} dr′
         = (1/(2π)^{n/2}) ∫_A e^{−‖U^T U‖₂ ‖r′‖²₂/2} dr′
         = (1/(2π)^{n/2}) ∫_A e^{−‖I‖₂ ‖r′‖²₂/2} dr′
         = (1/(2π)^{n/2}) ∫_A e^{−‖r′‖²₂/2} dr′ = p_A.
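In code, the sampling procedure the lemma justifies takes a couple of numpy lines: draw i.i.d. standard Gaussians and normalize.

import numpy as np

def random_unit_vector(n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.standard_normal(n)           # r'_1, ..., r'_n i.i.d. N(0, 1)
    return r1 / np.linalg.norm(r1)        # r = r' / ||r'||_2, uniform on S^{n-1}

print(random_unit_vector(5))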
import time
import sys
import math
import numpy as np
from pyomo.environ import *
from pyomo.opt import SolverStatus, TerminationCondition
import pyomo.opt
import cvxopt as cvx
import cvxopt.lapack
import picos as pic
import networkx as nx
GWitn = 10000
MyEpsilon = 1e-6
cplexTimeLimit = 100
def create_maxcut_sdp(Laplacian):
n = Laplacian.shape[0]
# create max cut problem SDP with picos
maxcut = pic.Problem()
# SDP variable: n x n matrix
X = maxcut.add_variable('X', (n,n), 'symmetric')
# constraint: ones on the diagonal
maxcut.add_constraint(pic.tools.diag_vect(X) == 1)
# constraint: X positive semidefinite
maxcut.add_constraint(X >> 0)
# objective function
L = pic.new_param('L', Laplacian)
# note that A|B = tr(AB)
maxcut.set_objective('max', L|X)
return maxcut
Note that we try and bound the “negative zero” eigenvalues due to
floating point errors.
5.5.3 Main
We now step through the instructions of the “main” part of our
Python program. First, we read the input file having name given as
the first argument on the command line.
1 2 1
1 3 1
2 3 1
milp = create_maxcut_milp(G)
t0 = time.time()
results = solver.solve(milp, keepfiles = False, tee = True)
t1 = time.time()
milpcpu = t1 - t0
milp.solutions.load_from(results)
ymilp = {i:milp.y[i].value for i in G.nodes()}
milpcut = [i for i,y in ymilp.iteritems() if y == 1]
milpobj = milp.maxcutobj()
t0 = time.time()
N = len(G.nodes())
V = rank_factor(Xsdp, N)
(gwcut,gwobj) = gw_randomized_rounding(V, sdp.obj_value(), L, GWitn)
t1 = time.time()
gwcpu = t1 - t0
This code resides in a text file called maxcut.py, and can be run
using the command line
where file.edg is the instance file with the edge list, as detailed
above (two integers and a floating point number per line, where the
integers are the indices of the vertices adjacent to each edge and the
floating point is its weight).
6 Distance Geometry

are satisfied?²

Sidenote 2: What if the only way to write a certain proof formally takes longer than the maximum extent of the human life? Well, use computers! What about reading a proof? Today we have formal proofs, partly or wholly written out by computers, which no-one can possibly hope to read in a lifetime, much less understand. Can they still be called proofs? This is similar to the old "does the universe still exist if no-one's around to notice?"

When the norm in Eq. (6.1) is the 2-norm, the DGP is also called Euclidean DGP (EDGP), which was introduced in Sect. 2.6.

Because it consists of a set of constraints for which we need to find a solution, the DGP is a CP problem (see Ch. 4). On the other hand, CP problems usually have integer, bounded variables, whereas the DGP has continuous unbounded ones. So it makes more sense to see it as a feasibility NLP, i.e. an NLP with zero objective function.

Sidenote 3: A realization of a graph is a function, mapping each vertex to a vector of a Euclidean space, which satisfies Eq. (6.1).

As mentioned in Sect. 2.6, it is difficult to solve DGPs as systems of nonlinear constraints, as given in Eq. (6.1). Usually, MP solvers are better at improving optimality than ensuring feasibility. A nice feature of DGPs is that we can write them as a minimization of
constraint errors:

  min_x ∑_{{i,j}∈E} | ‖x_i − x_j‖ − d_ij |.          (6.2)
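A hedged sketch of attacking a DGP directly with a local NLP method, here scipy's L-BFGS-B applied to the smoother squared-error variant of Eq. (6.2); the graph, its distances and all identifiers are invented, and a local optimum with nonzero error is of course possible.

import numpy as np
from scipy.optimize import minimize

K = 2                                                     # target dimension
n = 4                                                     # number of vertices
E = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.5, (2, 3): 1.0, (0, 3): 2.0}   # edge -> d_ij

def error(xflat):
    x = xflat.reshape(n, K)
    return sum((np.linalg.norm(x[i] - x[j]) - d) ** 2 for (i, j), d in E.items())

x0 = np.random.default_rng(0).standard_normal(n * K)      # random starting realization
res = minimize(error, x0, method='L-BFGS-B')
print(res.fun)                                            # near 0 if a realization was found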
  min_{x,s} ∑_{{i,j}∈E} (s⁺_ij + s⁻_ij)
  ∀ {i, j} ∈ E   ∑_{k≤K} (t⁺_ijk + t⁻_ijk) − d_ij = s⁺_ij − s⁻_ij
  ∀ k ≤ K, {i, j} ∈ E   t⁺_ijk − t⁻_ijk = x_ik − x_jk
  ∀ k ≤ K, {i, j} ∈ E   t⁺_ijk t⁻_ijk = 0
  ∀ {i, j} ∈ E   s⁺_ij, s⁻_ij ≥ 0
  ∀ k ≤ K, {i, j} ∈ E   t⁺_ijk, t⁻_ijk ≥ 0,

Figure 6.6: The 1-norm between these two points is 12 (from Wikipedia).

where the only nonlinear constraints are the products t⁺_ijk t⁻_ijk = 0. In theory, we cannot apply the reformulation in Sect. 3.2.3 since the t variables are unbounded. On the other hand, their unboundedness stems from the x variables, which are themselves unbounded, but simply because we did not bother looking at the problem closely enough. Suppose that the weighted diameter⁶ of the input graph G is γ. Then no realization can have ‖x_i − x_j‖ ≥ γ, which also implies⁷

Sidenote 6: The weighted diameter of a graph is the maximum, over all pairs of vertices u, v in the graph, of the weight of the shortest path connecting u and v.
Sidenote 7: This implication holds since all translations are congruences, but it requires us to add another family of (linear) constraints to the formulation. Which ones?
For the rest of this chapter we shall focus on the 2-norm, i.e. we
shall mean EDGP whenever we write DGP.
  ‖x_i − x_j‖²₂ = (x_i − x_j) · (x_i − x_j)
               = x_i · x_i + x_j · x_j − 2 x_i · x_j
               = ‖x_i‖²₂ + ‖x_j‖²₂ − 2 x_i · x_j
               = ∑_{k≤K} x²_ik + ∑_{k≤K} x²_jk − 2 ∑_{k≤K} x_ik x_jk,
By Eq. (6.7), the rightmost term in the right hand side of Eq. (6.9) is zero. On dividing through by n, we have

  (1/n) ∑_{i≤n} d²_ij = (1/n) ∑_{i≤n} (x_i · x_i) + x_j · x_j.          (6.10)

  (1/n) ∑_{j≤n} d²_ij = x_i · x_i + (1/n) ∑_{j≤n} (x_j · x_j).          (6.11)

  (1/n) ∑_{i≤n, j≤n} d²_ij = ∑_{i≤n} (x_i · x_i) + ∑_{j≤n} (x_j · x_j) = 2 ∑_{i≤n} (x_i · x_i)          (6.12)

(the last equality in Eq. (6.12) holds because the same quantity f(k) = x_k · x_k is being summed over the same range {1, . . . , n}, with the symbol k replaced by the symbol i first and j next). We then divide through by n to get:

  (1/n²) ∑_{i≤n, j≤n} d²_ij = (2/n) ∑_{i≤n} (x_i · x_i).          (6.13)

and replace the left hand side terms of Eq. (6.15)-(6.16) into Eq. (6.14) to obtain:

  2 x_i · x_j = (1/n) ∑_{k≤n} d²_ik + (1/n) ∑_{k≤n} d²_kj − d²_ij − (2/n) ∑_{k≤n} (x_k · x_k),          (6.17)

  B = −½ J D J,          (6.19)

where J = I_n − (1/n) 1·1^T and 1 = (1, . . . , 1) (with n components).
We note that all of the steps can be carried out in polynomial time, and remark that the above algorithm can also be used to ascertain whether D is a (non-square) EDM or not.¹⁴

Sidenote 14: Why?
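The whole procedure fits in a few lines of numpy (a sketch, under the convention used in Eq. (6.19)-(6.20) that D contains squared distances): form B = −½ J D J, take its spectral decomposition, clip tiny negative eigenvalues due to floating point, and keep the K leading components.

import numpy as np

def realize(D, K):
    """D: n x n matrix of squared distances; returns an n x K realization."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D @ J                          # Eq. (6.19)
    lam, P = np.linalg.eigh(B)                    # ascending eigenvalues
    lam = np.clip(lam[-K:], 0.0, None)            # guard against "negative zeros"
    return P[:, -K:] * np.sqrt(lam)

D = np.array([[0, 1, 2, 1],                       # squared EDM of the unit square
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [1, 2, 1, 0]], dtype=float)
x = realize(D, 2)
print(np.round([[np.linalg.norm(x[i] - x[j]) ** 2 for j in range(4)]
                for i in range(4)], 6))           # recovers D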
can write the EDM D of the realization x ∈ R^{nK} as:

  D = 1 diag(xx^T)^T − 2 xx^T + diag(xx^T) 1^T,

where 1 is the (column) vector of all ones, and diag(v) is the (column) vector of the diagonal of v. This implies¹⁶

  D = 1 diag(B)^T − 2B + diag(B) 1^T,          (6.20)

Sidenote (fragment): . . . EDM of x has rank in {K, K + 1, K + 2}.
Sidenote 16: Note that Eq. (6.20) is the matrix equation inverse of Eq. (6.19).
weight d_ij, whereas "?" components correspond to edges not in E. Can this matrix be completed to an EDM?
So is the EDMCP just another name for the EDGP? The difference
between EDGP and EDMCP is in the integer K: in the former,
K > 0 is given as part of the input, while in the latter it is part of
the output. Specifically, while the graph information is the same,
the EDGP is also given K, and then asks if, for that K, the given
partial matrix can be completed to a distance matrix of a realization
in RK . The EDMCP, on the other hand, asks if there is any K such
that the given partial matrix can be completed to a distance matrix
of a realization in RK .
This difference is subtle, but has a very deep implication: the
EDGP is known to be NP-hard,19 whereas we do not know the
complexity status of the EDMCP: no-one has found a polytime
algorithm, nor a proof of NP-hardness.
Figure 6.8: An EDMCP instance (above) and the equivalent EDGP instance (below).
Sidenote 19: By reduction from the Partition problem, see [Saxe, 1979].

6.6.1 An SDP formulation for the EDMCP

By Prop. 5.3.2, every Gram matrix is PSD, and, by Sidenotes 13 in Ch. 5 and 13 in this chapter, every PSD matrix is the Gram matrix of some other matrix (or realization). If we were given a partially specified Gram matrix B = (b_ij), we could complete²⁰ it by solving the following SDP:

Sidenote 20: This is called the PSD Completion Problem (PSDCP).

  ∀ i, j ≤ n : b_ij ≠ ?   X_ij = b_ij          (6.21)
  X ⪰ 0.
Note the similarity of the first constraint of Eq. (6.22) with Eq. (2.34),
written in terms of weighted graphs instead of partial matrices.
Any solution X ∗ to Eq. (6.22) is a PSD matrix, and hence a Gram
matrix, such that Xii + X jj − 2Xij = d2ij in the components specified
by D. In other words, if we apply Eq. (6.20) with B replaced by X ∗ ,
we get an EDM with the specified entries.
Compare the two SDP formulations for the EDGP (Eq. (2.35)) and for the EDMCP (Eq. (6.22)). The former is different, in that we use the Schur complement:²¹ while in the EDGP we require a PSD solution of a given rank K, in the EDMCP we only want a PSD solution of any rank (or, in other words, any PSD solution). This is why Eq. (2.35) is only a relaxation of the EDGP, whereas Eq. (6.22) is an exact formulation of the EDMCP.

Sidenote 21: The Schur complement being PSD states X ⪰ xx^T, which is a relaxation of X = xx^T. The fact that x (and hence its rank) is explicitly mentioned in the SDP makes the relaxation tighter with respect to the rank constraint.
is an exact formulation of the EDMCP.
Since SDPs can be solved in polytime and the SDP in Eq. (6.22) is
an exact formulation for the EDMCP, why am I saying that no-one
found a polytime algorithm for solving the EDMCP yet? Essentially,
because of floating point errors: the IPMs that solve SDPs are
approximate methods: they find a matrix which is approximately
feasible with respect to the constraints and approximately PSD (see
Sect. 5.4.3). Worst-case complexity classes such as P or NP, however,
are defined for precise solutions only. As discussed in Sidenote 15
in Ch. 5, approximate feasibility disqualifies the algorithm from
being defined as an “approximation algorithm”.
and remark that any matrix X which is DD is also PSD; this follows directly from Gershgorin's theorem.²³ Lastly, we reformulate the absolute values in Eq. (6.23) using Eq. (3.16)-(3.18) in Sect. 3.2.4, obtaining the following LP, which will yield solutions of the SDP relaxation in Eq. (2.35):

Sidenote 23: The proof on Wikipedia is very clear, search for "Gershgorin circle theorem".

  ∀ {i, j} ∈ E   X_ii + X_jj − 2X_ij = d²_ij
  ∀ i ∈ V   ∑_{j≠i} t_ij ≤ X_ii          (6.24)
  ∀ i ≠ j ∈ V   −t_ij ≤ X_ij ≤ t_ij.
Note that this is a feasibility LP. We can solve it “as such”, or try to
make the entries of X as small as possible (which might reduce its
rank if many of them become zero) by using min ∑ij tij .
as the Gram matrix of VU must be PSD. This means that the set
6.6.1 Proposition
The approximation X^ℓ to the solution of Eq. (2.32) is no worse than X^{ℓ−1}.

Proof. It suffices to show that X^{ℓ−1} is feasible for the ℓ-th approximating LP in the sequence, since this shows that the feasible sets F_ℓ of the approximating LP sequence form a chain

  F_0 ⊆ · · · ⊆ F_ℓ ⊆ · · ·

Since all are inner approximations of the feasible region of Eq. (2.32), the larger the feasible region, the better the approximation. Now, by Eq. (6.27) we have that X^{ℓ−1} = (U^ℓ)^T U^ℓ = (U^ℓ)^T I U^ℓ, and since the identity matrix I is trivially DD, we have that X^{ℓ−1} ∈ D(U^ℓ). Moreover, since X^{ℓ−1} solves the (ℓ − 1)-th approximating LP in the sequence Eq. (6.28), we have ∀ i ≤ m  A_i • X^{ℓ−1} = b_i, which proves the claim.  □
of the SDP feasible region will not help, since only an optimum
of the SDP (or a bound to this optimum) will provide the bound
guarantee for P.
In such cases, we leverage the fact that both LPs and SDPs have a
strong duality theory. In particular:
So all we need to do is to compute the dual of the original SDP, and then apply the DD inner approximation to the dual. This will yield a feasible solution in the dual SDP, which provides a valid guaranteed bound for the original SDP (and hence, in turn, for P).

Figure 6.13: A geometric view of a minimization SDP dual: maximizing over the direction vectors perpendicular to the outer approximation (from [Dattorro, 2015]).
  ∀ i < j ≤ n   (1 − ε)‖x_i − x_j‖₂ ≤ ‖T x_i − T x_j‖₂ ≤ (1 + ε)‖x_i − x_j‖₂          (6.30)

Figure 6.16: The JLL won't work in low dimensions.
6.10 Summary