Mixed Integer Nonlinear Programming
The IMA Volumes in Mathematics
and its Applications
Volume 154
Institute for Mathematics and
its Applications (IMA)
The Institute for Mathematics and its Applications was established by a grant from the National Science Foundation to the University of Minnesota in 1982. The primary mission of the IMA is to foster research of a truly interdisciplinary nature, establishing links between mathematics of the highest caliber and important scientific and technological problems from other disciplines and industries. To this end, the IMA organizes a wide variety of programs, ranging from short intense workshops in areas of exceptional interest and opportunity to extensive thematic programs lasting a year. IMA Volumes are used to communicate results of these programs that we believe are of particular value to the broader scientific community.
The full list of IMA books can be found at the Web site of the Institute
for Mathematics and its Applications:
http://www.ima.umn.edu/springer/volumes.html.
Presentation materials from the IMA talks are available at
http://www.ima.umn.edu/talks/.
The video library is at
http://www.ima.umn.edu/videos/.
Jon Lee • Sven Leyffer
Editors
Editors

Jon Lee
Industrial and Operations Engineering
University of Michigan
1205 Beal Avenue
Ann Arbor, Michigan 48109
USA

Sven Leyffer
Mathematics and Computer Science
Argonne National Laboratory
Argonne, Illinois 60439
USA
ISSN 0940-6573
ISBN 978-1-4614-1926-6 e-ISBN 978-1-4614-1927-3
DOI 10.1007/978-1-4614-1927-3
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011942482
Mathematics Subject Classification (2010): 05C25, 20B25, 49J15, 49M15, 49M37, 49N90, 65K05,
90C10, 90C11, 90C22, 90C25, 90C26, 90C27, 90C30, 90C35, 90C51, 90C55, 90C57, 90C60,
90C90, 93C95
FOREWORD
Series Editors
Fadil Santosa, Director of the IMA
Markus Keel, Deputy Director of the IMA
PREFACE
The MILP road to MIQCP (S. Burer and A. Saxena) surveys results in mixed-integer quadratically constrained programming. Strong convex relaxations and valid inequalities are the basis of efficient, practical techniques for global optimization. Some of the relaxations and inequalities are derived from the algebraic formulation, while others are based on disjunctive programming. Much of the inspiration derives from MILP methodology.
Linear programming relaxations of quadratically-constrained quadratic
programs (A. Qualizza, P. Belotti, and F. Margot) investigates the use
of LP tools for approximately solving semidefinite programming (SDP)
relaxations of quadratically-constrained quadratic programs. The authors
present classes of valid linear inequalities based on spectral decomposition,
together with computational results.
Extending a CIP framework to solve MIQCPs (T. Berthold, S. Heinz, and S. Vigerske) discusses how to build a solver for MIQCPs by extending a framework for constraint integer programming (CIP). The advantage of this approach is that we can utilize the full power of advanced MILP and constraint programming technologies. For relaxation, the approach employs an outer approximation generated by linearization of convex constraints and linear underestimation of nonconvex constraints. Reformulation, separation, and propagation techniques are used to handle the quadratic constraints efficiently. The authors implemented these methods in the branch-cut-and-price framework SCIP.
thanks are due to Fadil Santosa, Chun Liu, Patricia Brick, Dzung Nguyen, Holly Pinkerton, and Eve Marofsky from the IMA, who made the organization of the workshop and the publication of this special volume such an easy and enjoyable affair.
Jon Lee
University of Michigan
Sven Leyffer
Argonne National Laboratory
CONTENTS
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
PART I:
Convex MINLP
ALGORITHMS AND SOFTWARE FOR
CONVEX MIXED INTEGER NONLINEAR PROGRAMS
PIERRE BONAMI∗, MUSTAFA KILINÇ†, AND JEFF LINDEROTH‡
Abstract. This paper provides a survey of recent progress and software for solving convex Mixed Integer Nonlinear Programs (MINLPs), where the objective and constraints are defined by convex functions and integrality restrictions are imposed on a subset of the decision variables. Convex MINLPs have received sustained attention in recent years. By exploiting analogies to well-known techniques for solving Mixed Integer Linear Programs and incorporating these techniques into software, significant improvements have been made in the ability to solve these problems.
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 1
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_1,
© Springer Science+Business Media, LLC 2012
such as CPLEX [66], XPRESS-MP [47], and Gurobi [63]. Linderoth and
Ralphs [82] give a survey of noncommercial software for MILP.
There has also been steady progress over the past 30 years in the development and successful implementation of algorithms for NLPs. We refer the reader to [12] and [94] for a detailed treatment of nonlinear programming techniques. Theoretical developments have led to successful implementations in software such as SNOPT [57], filterSQP [52], CONOPT [42], IPOPT [107], LOQO [103], and KNITRO [32]. Waltz [108] states that the size of instances solvable by NLP software is growing by nearly an order of magnitude per decade.
Of course, solution algorithms for convex MINLP have benefited from the technological progress made in solving MILP and NLP. However, in the realm of MINLP, progress has been far more modest, and the dimension of convex MINLPs solvable by current solvers is small when compared to MILPs and NLPs. In this work, our goal is to give a brief introduction to the techniques used in state-of-the-art solvers for convex MINLPs. We survey basic theory as well as recent advances that have made their way into software. We also attempt to make a fair comparison of all algorithmic approaches and their implementations.
The remainder of the paper can be outlined as follows. A precise description of a MINLP and algorithmic building blocks for solving MINLPs are given in Section 2. Section 3 outlines five different solution techniques. In Section 4, we describe in more detail some advanced techniques implemented in the latest generation of solvers. Section 5 contains descriptions of several state-of-the-art solvers that implement the different solution techniques presented. Finally, in Section 6 we present a short computational comparison of those software packages.
2. MINLP. The focus of this section is to mathematically define a MINLP and to describe important special cases. Basic elements of algorithms and subproblems related to MINLP are also introduced.
2.1. MINLP problem classes. A Mixed Integer Nonlinear Program may be expressed in algebraic form as follows:

zminlp = minimize f (x)
         subject to gj (x) ≤ 0   ∀j ∈ J,   (MINLP)
                    x ∈ X, xI ∈ Z|I| .
zminlp = minimize η
         subject to f (x) ≤ η,
                    gj (x) ≤ 0   ∀j ∈ J,   (MINLP-1)
                    x ∈ X, xI ∈ Z|I| .

By convexity of f and gj , the linearizations about any point x̂,

η ≥ f (x̂) + ∇f (x̂)T (x − x̂),   (2.1)
gj (x̂) + ∇gj (x̂)T (x − x̂) ≤ 0,   (2.2)

are valid for all j ∈ J and x̂ ∈ Rn . Since f (x) ≤ η and gj (x) ≤ 0 at every feasible point of (MINLP-1), the linear inequalities (2.1) and (2.2) do not cut off any feasible solution.
The value zNLPR(lI ,uI ) is a lower bound on the value of zminlp that can be obtained in the subset of the feasible region of (MINLP) where the bounds lI ≤ xI ≤ uI are imposed. Specifically, if (lI , uI ) are the lower and upper bounds (LI , UI ) of the original instance, then zNLPR(LI ,UI ) provides a lower bound on zminlp .
In the special case that all of the integer variables are fixed (lI = uI = x̂I ), the fixed NLP subproblem is formed:

zNLP(x̂I ) = minimize f (x)
            subject to gj (x) ≤ 0   ∀j ∈ J,   (NLP(x̂I ))
                       x ∈ X, xI = x̂I .

If x̂I ∈ Z|I| and (NLP(x̂I )) has a feasible solution, the value zNLP(x̂I ) provides an upper bound for the problem (MINLP). If (NLP(x̂I )) is infeasible, a feasibility subproblem is solved instead.
A common form of the feasibility subproblem is

zNLPF(x̂I ) = minimize Σj∈J wj gj (x)+
             subject to x ∈ X, xI = x̂I ,   (NLPF(x̂I ))

where gj (x)+ = max{0, gj (x)} measures the violation of the nonlinear constraints and wj ≥ 0. Since NLP solvers return the solution of NLPF(x̂I ) when NLP(x̂I ) is infeasible, we will often say, by abuse of terminology, that NLP(x̂I ) is solved and its solution x is optimal or minimally infeasible, meaning that it is the optimal solution of NLPF(x̂I ).
3. Algorithms for convex MINLP. With elements of algorithms
defined, attention can be turned to describing common algorithms for
solving MINLPs. The algorithms share many general characteristics with
the well-known branch-and-bound or branch-and-cut methods for solving
MILPs.
3.1. NLP-Based Branch and Bound. Branch and bound is a divide-and-conquer method. The dividing (branching) is done by partitioning the set of feasible solutions into smaller and smaller subsets. The conquering (fathoming) is done by bounding the value of the best feasible solution in the subset and discarding the subset if its bound indicates that it cannot contain an optimal solution.
Branch and bound was first applied to MILP by Land and Doig [74]. The method (and its enhancements, such as branch and cut) remains the workhorse of all of the most successful MILP software. Dakin [38] realized that this method does not require linearity of the problem. Gupta and Ravindran [62] suggested an implementation of the branch-and-bound method for convex MINLPs and investigated different search strategies. Other early works related to NLP-Based Branch and Bound (NLP-BB for short) for convex MINLP include [91], [28], and [78].
In NLP-BB, the lower bounds come from solving the subproblems
(NLPR(lI , uI )). Initially, the bounds (LI , UI ) (the lower and upper bounds
on the integer variables in (MINLP)) are used, so the algorithm is initialized
with a continuous relaxation whose solution value provides a lower bound
on zminlp . The variable bounds are successively refined until the subregion
can be fathomed. Continuing in this manner yields a tree L of subproblems.
A node N of the search tree is characterized by the bounds enforced on its integer variables: N := (lI , uI ). Lower and upper bounds on the optimal
solution value zL ≤ zminlp ≤ zU are updated through the course of the
algorithm. Algorithm 1 gives pseudocode for the NLP-BB algorithm for
solving (MINLP).
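As a companion to the description of NLP-BB, here is a minimal sketch, not the paper's Algorithm 1: it solves each node's NLP relaxation with SciPy's SLSQP on a toy instance where every variable is integer-constrained; all names and the test problem are hypothetical.

```python
import heapq

import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

def nlp_bb(f, g, l, u, tol=1e-5):
    """NLP-based branch and bound for min f(x) s.t. g(x) <= 0,
    l <= x <= u, with every variable integer-constrained."""
    con = NonlinearConstraint(g, -np.inf, 0.0)
    z_upper, incumbent = np.inf, None
    heap = [(-np.inf, 0, tuple(l), tuple(u))]   # (parent bound, id, lI, uI)
    next_id = 1
    while heap:
        bound, _, li, ui = heapq.heappop(heap)
        if bound >= z_upper - tol:
            continue                            # fathom by bound
        x0 = (np.asarray(li) + np.asarray(ui)) / 2.0
        res = minimize(f, x0, bounds=list(zip(li, ui)), constraints=[con])
        if not res.success or res.fun >= z_upper - tol:
            continue                            # infeasible or dominated node
        frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > tol]
        if not frac:                            # integer feasible: new incumbent
            z_upper, incumbent = res.fun, np.round(res.x)
            continue
        i = frac[0]                             # branch on a fractional variable
        for lo, hi in ((li[i], np.floor(res.x[i])),
                       (np.ceil(res.x[i]), ui[i])):
            if lo > hi:
                continue
            nl, nu = list(li), list(ui)
            nl[i], nu[i] = lo, hi
            heapq.heappush(heap, (res.fun, next_id, tuple(nl), tuple(nu)))
            next_id += 1
    return z_upper, incumbent

# toy instance: min (x0-1.3)^2 + (x1-2.7)^2  s.t.  x0 + x1 <= 4, x integer
f = lambda x: (x[0] - 1.3) ** 2 + (x[1] - 2.7) ** 2
g = lambda x: np.array([x[0] + x[1] - 4.0])
z, x = nlp_bb(f, g, l=[0, 0], u=[4, 4])   # optimum x = (1, 3), z = 0.18
```

The heap ordered by the parent's relaxation value gives a best-bound-flavored node selection; production solvers add much more (warm starts, branching rules, heuristics).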
zoa = min η
      s.t. η ≥ f (x̄) + ∇f (x̄)T (x − x̄)       ∀x̄ ∈ K̄,   (MILP-OA)
           gj (x̄) + ∇gj (x̄)T (x − x̄) ≤ 0     ∀j ∈ J, ∀x̄ ∈ K̄,
           x ∈ X, xI ∈ Z|I| .
zmp(K) = min η
         s.t. η ≥ f (x̄) + ∇f (x̄)T (x − x̄)       ∀x̄ ∈ K,   (MP(K))
              gj (x̄) + ∇gj (x̄)T (x − x̄) ≤ 0     ∀j ∈ J, ∀x̄ ∈ K,
              x ∈ X, xI ∈ Z|I| .

We call this problem the OA-based reduced master problem. The solution value zmp(K) of the reduced master problem (MP(K)) gives a lower bound on zminlp , since (MP(K)) contains only a subset of the constraints of (MILP-OA). The OA method proceeds by iteratively solving NLP subproblems and enlarging K with their solutions.
Note that xI = x̂I , since the integer variables are fixed. In (BC(x̂)), ∇I refers to the gradients of the functions f (or g) with respect to the discrete variables.
zgbd(KFS,KIS) = min η
    s.t. η ≥ f (x̄) + (∇I f (x̄) + ∇I g(x̄)μ̄)T (xI − x̄I )   ∀x̄ ∈ KFS,
         λ̄T [g(x̄) + ∇I g(x̄)T (xI − x̄I )] ≤ 0             ∀x̄ ∈ KIS,   (RM-GBD)
         x ∈ X, xI ∈ Z|I| ,
zecp(K) = min η
    s.t. η ≥ f (x̄) + ∇f (x̄)T (x − x̄)       ∀x̄ ∈ K,   (RM-ECP(K))
         gj (x̄) + ∇gj (x̄)T (x − x̄) ≤ 0     ∀x̄ ∈ K, ∀j ∈ J(x̄),
         x ∈ X, xI ∈ Z|I| ,

where J(x̄) := {j ∈ arg maxj∈J gj (x̄)} is the index set of the most violated constraints for each solution x̄ ∈ K, the set of solutions to (RM-ECP(K)). It is also possible to add linearizations of all violated constraints to (RM-ECP(K)); in that case, J(x̄) = {j | gj (x̄) > 0}. Algorithm 3 gives the pseudocode for the ECP algorithm.
The optimal values zecp(K) of (RM-ECP(K)) generate a nondecreasing sequence of lower bounds. Finite convergence of the algorithm is achieved when the maximum constraint violation is smaller than a specified tolerance ε. Theorem 3.3 states that the sequence of objective values obtained from the solutions to (RM-ECP(K)) converges to the optimal solution value.
Theorem 3.3. [111] If X ≠ ∅ is compact and f and g are convex and continuously differentiable, then zecp(K) converges to zminlp .
The ECP method may require a large number of iterations, since the linearizations added at Step 3 do not come from solutions to NLP subproblems. Convergence can often be accelerated by solving NLP subproblems (NLP(x̂I )) and adding the corresponding linearizations, as in the OA method. The Extended Cutting Plane algorithm is implemented in the α-ECP software [110].
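The ECP iteration described above can be sketched compactly. The toy below is not α-ECP: it handles a single smooth convex constraint and uses SciPy's `linprog` with the `integrality` option (HiGHS) as the MILP oracle; all names and the test problem are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def ecp(c, g, grad_g, bounds, eps=1e-6, max_iter=50):
    """Extended cutting-plane sketch: minimize c^T x over integer x in
    `bounds` subject to one smooth convex constraint g(x) <= 0.  Each
    iteration solves a MILP relaxation and adds the linearization
    g(xbar) + grad_g(xbar)^T (x - xbar) <= 0 at its solution xbar."""
    A, b = [], []
    x = None
    for _ in range(max_iter):
        res = linprog(c,
                      A_ub=np.array(A) if A else None,
                      b_ub=np.array(b) if b else None,
                      bounds=bounds,
                      integrality=np.ones(len(c)),
                      method="highs")
        x = res.x
        viol = g(x)
        if viol <= eps:                 # MILP optimum is feasible: done
            return x, res.fun
        grad = grad_g(x)
        A.append(grad)                  # cut: grad^T x <= grad^T xbar - g(xbar)
        b.append(grad @ x - viol)
    return x, res.fun                   # tolerance not reached within max_iter

# toy instance: max x0 + x1  s.t.  x0^2 + x1^2 <= 8, 0 <= x <= 3, x integer
g = lambda x: x[0] ** 2 + x[1] ** 2 - 8.0
grad_g = lambda x: 2.0 * np.asarray(x)
x, z = ecp(c=[-1.0, -1.0], g=g, grad_g=grad_g, bounds=[(0, 3), (0, 3)])
# converges to x = (2, 2) with objective value x0 + x1 = 4
```

Each MILP solution that violates the nonlinear constraint contributes one more linearization, so the relaxation tightens monotonically, mirroring the nondecreasing lower bounds of Theorem 3.3.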
idea could potentially be applied to both the GBD and ECP methods. The LP/NLP-BB method typically reduces significantly the total number of nodes to be enumerated when compared to the OA method. However, the trade-off is that the number of NLP subproblems might increase. As part of his Ph.D. thesis, Leyffer implemented the LP/NLP-BB method and reported substantial computational savings [76]. The LP/NLP-Based Branch-and-Bound algorithm is implemented in the solvers Bonmin [24] and FilMINT [2].
sible solutions was a major reason for the vast improvement in MILP solution technology [22]. To our knowledge, very few, if any, MINLP solvers add inequalities that are specific to the nonlinear structure of the problem. Nevertheless, a number of cutting plane techniques that could be implemented have been developed in the literature. Here we outline a few of these techniques. Most of them have been adapted from known methods for the MILP case. We refer the reader to [36] for a recent survey on cutting planes for MILP.
4.4.1. Gomory cuts. The earliest cutting planes for Mixed Integer Linear Programs were Gomory cuts [58, 59]. For simplicity of exposition, we assume a pure Integer Linear Program (ILP): I = {1, . . . , n}, with linear constraints given in matrix form as Ax ≤ b and x ≥ 0. The idea underlying the inequalities is to choose a set of non-negative multipliers u ∈ Rm+ and form the surrogate constraint uT Ax ≤ uT b. Since x ≥ 0, the inequality Σj∈N ⌊uT aj ⌋xj ≤ uT b is valid, and since Σj∈N ⌊uT aj ⌋xj is an integer, the right-hand side may also be rounded down to form the Gomory cut Σj∈N ⌊uT aj ⌋xj ≤ ⌊uT b⌋. This simple procedure suffices to generate all valid inequalities for an ILP [35]. Gomory cuts can be generalized to Mixed
Integer Gomory (MIG) cuts which are valid for MILPs. After a period of
not being used in practice to solve MILPs, Gomory cuts made a resurgence
following the work of Balas et al. [10], which demonstrated that when used
in combination with branch and bound, MIG cuts were quite effective in
practice.
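The rounding argument above is purely arithmetic, so it is easy to illustrate. A minimal sketch (the matrix, right-hand side, and multipliers are invented for illustration):

```python
import numpy as np

def gomory_cut(A, b, u):
    """Gomory cut from nonnegative multipliers u for the pure ILP
    {x in Z^n, x >= 0 : Ax <= b}: round down each aggregated
    coefficient u^T a_j and then the aggregated right-hand side u^T b."""
    coeffs = np.floor(u @ A)      # valid to round down since x >= 0
    rhs = np.floor(u @ b)         # valid since the left-hand side is integral
    return coeffs, rhs

# constraints 2*x1 + x2 <= 5 and x1 + 3*x2 <= 7, multipliers u = (0.5, 0.5)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 7.0])
coeffs, rhs = gomory_cut(A, b, np.array([0.5, 0.5]))
# surrogate: 1.5*x1 + 2*x2 <= 6  ->  Gomory cut: x1 + 2*x2 <= 6
```

Different multiplier vectors u give different cuts; MILP solvers derive u from rows of the optimal simplex tableau rather than choosing it by hand.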
For MINLP, Cezik and Iyengar [34] demonstrate that if the nonlinear constraint set gj (x) ≤ 0 ∀j ∈ J can be described using conic constraints Ax ⪰K b, then the Gomory procedure is still applicable. Here K is a homogeneous, self-dual, proper, convex cone, and the notation x ⪰K y denotes that (x − y) ∈ K. Each cone K has a dual cone K∗ with the property that K∗ := {u | uT z ≥ 0 ∀z ∈ K}. The extension of the Gomory procedure to the case of conic integer programming is clear from the following equivalence:

Ax ⪰K b ⇔ uT Ax ≥ uT b ∀u ⪰K∗ 0.
Specifically, elements from the dual cone u ∈ K∗ can be used to perform the aggregation, and the regular Gomory procedure applied. To the authors' knowledge, no current MINLP software employs conic Gomory cuts. However, most solvers generate Gomory cuts from the existing linear inequalities in the model. Further, as pointed out by Akrotirianakis, Maros, and Rustem [5], Gomory cuts may be generated from the linearizations (2.1) and (2.2) used in the OA, ECP, or LP/NLP-BB methods. Most linearization-based software will by default generate Gomory cuts on these linearizations.
4.4.2. Mixed integer rounding. Consider the simple two-variable set X = {(x1 , x2 ) ∈ Z × R+ | x1 ≤ b + x2 }. It is easy to see that the mixed integer rounding inequality x1 ≤ ⌊b⌋ + x2 /(1 − f ), where f = b − ⌊b⌋ represents the fractional part of b, is a valid inequality for X. Studying the convex hull of this simple set and related counterparts has generated rich classes of inequalities that may significantly improve the ability to solve MILPs [85]. The key to generating useful inequalities for computation is to combine rows of the problem in a clever manner and to use variable substitution techniques.
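The validity of the MIR inequality can be spot-checked numerically. A minimal sketch, with an arbitrary choice of b for illustration:

```python
import math
import random

def mir_rhs(x2, b):
    """Right-hand side of the MIR inequality x1 <= floor(b) + x2/(1-f),
    where f = b - floor(b), valid for
    X = {(x1, x2) in Z x R+ : x1 <= b + x2}."""
    f = b - math.floor(b)
    return math.floor(b) + x2 / (1.0 - f)

# spot-check: the largest feasible integer x1 never violates the cut
b = 3.4
random.seed(0)
for _ in range(1000):
    x2 = random.uniform(0.0, 5.0)
    x1 = math.floor(b + x2)        # largest x1 in Z with x1 <= b + x2
    assert x1 <= mir_rhs(x2, b) + 1e-9
```

At x2 = 0 the cut tightens x1 ≤ b to x1 ≤ ⌊b⌋, which is exactly the rounding step that the continuous relaxation misses.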
Atamtürk and Narayanan [7] have extended the concept of mixed integer rounding to the case of Mixed Integer Second-Order Cone Programming (MISOCP). For the conic mixed integer set

T = {(x1 , x2 , x3 ) ∈ Z × R2 | √((x1 − b)2 + x22 ) ≤ x3 },
(where d(x, η) is the distance to the point (x̄, η̄) in any norm). The lift-and-project inequality is valid for MINLP and is a special case of (4.2). Note that if strong branching is used to determine the branching variable, then the values η̂i− and η̂i+ are produced as a byproduct.
4.5. Heuristics. Here we discuss heuristic methods that are aimed at
finding integer feasible solutions to MINLP with no guarantee of optimality
or success. Heuristics are usually fast algorithms. In a branch-and-bound
algorithm they are typically run right after the Evaluate step. Depending
on the actual running time of the heuristic, it may be called at every node,
every nth node, or only at the root node. In linearization-based methods
like OA, GBD, ECP, or LP/NLP-BB, heuristics may be run in the Upper
Bound and Refine step, especially in the case when NLP(x̂I ) is infeasible.
Heuristics are very important because by improving the upper bound zU ,
they help in the Prune step of the branch-and-bound algorithm or in the
convergence criterion of the other algorithms. From a practical point of
The two sequences have the property that at each iteration the distance between xi and x̂i+1 is non-increasing. The procedure stops whenever an integer feasible solution is found (or x̂k = xk ). This basic procedure may cycle or stall without finding an integer feasible solution, and randomization has been suggested to restart the procedure [48]. Several variants of this basic procedure have been proposed in the context of MILP [18, 3, 50]. The authors of [1, 26] have shown that the basic principle of the Feasibility Pump can also find good solutions in short computing times in the context of MINLP.
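The basic alternation between the two sequences can be sketched in a few lines. This is a toy, not any solver's implementation: it abstracts the constraint-feasibility step as a generic projection, and the ball-shaped feasible region and all names are invented for illustration.

```python
import numpy as np

def feasibility_pump(project, x0, max_iter=50, tol=1e-6):
    """Basic FP alternation: x_i satisfies the continuous constraints,
    x_hat_{i+1} = round(x_i) satisfies integrality; stop when the two
    sequences meet.  `project` maps a point to a nearest point that
    satisfies the continuous constraints."""
    x = project(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        x_hat = np.round(x)                  # nearest integer point
        if np.linalg.norm(x - x_hat) <= tol:
            return x_hat                     # integer and constraint feasible
        x = project(x_hat)                   # pump back toward feasibility
    return None                              # stalled; restarts and randomization help

# toy feasible region: Euclidean ball of radius 1 centered at (2.3, 1.6)
center, radius = np.array([2.3, 1.6]), 1.0
def project(p):
    d = np.linalg.norm(p - center)
    return p if d <= radius else center + radius * (p - center) / d

sol = feasibility_pump(project, x0=[4.0, 3.0])   # finds the integer point (3, 2)
```

The stall behavior mentioned in the text shows up immediately in such a sketch: if rounding the projected point leads back to the same integer point, the loop cycles, which is exactly why randomized restarts were proposed.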
Another variant of the Feasibility Pump for convex MINLPs was proposed by Bonami et al. [25]. As in the basic FP scheme, two sequences are constructed with the same properties: x0 , . . . , xk are points in X that satisfy g(xi ) ≤ 0 but not xiI ∈ Z|I| , and x̂1 , . . . , x̂k+1 are points that do not necessarily satisfy g(x̂i ) ≤ 0 but satisfy x̂iI ∈ Z|I| . The sequence xi is generated in the same way as before, but the sequence x̂i is now generated by solving MILPs. The MILP to be solved for finding x̂i+1 is constructed by building an outer approximation of the constraints of the problem with linearizations taken at all the points of the sequence x0 , . . . , xi . Then, x̂i+1 is found as the point in the current outer approximation of the constraints that is closest to xi in the ℓ1-norm in the space of integer-constrained variables:

ziFP-M = minimize Σj∈I |xj − xij |
5.1. α-ECP. α-ECP [110] is a solver based on the ECP method described in Section 3.4. Problems to be solved may be specified in a text-based format, as user-supplied subroutines, or via the GAMS algebraic modeling language. The software is designed to solve convex MINLPs, but problems with a pseudo-convex objective function and pseudo-convex constraints can also be solved to global optimality with α-ECP. A significant feature of the software is that no nonlinear subproblems need to be solved. (Recent versions of the code include an option to occasionally solve NLP subproblems, which may improve performance, especially on pseudo-convex instances.) Recent versions also include enhancements so that each MILP subproblem need not be solved to global optimality. α-ECP requires a (commercial) MILP solver for the reduced master problem (RM-ECP(K)); CPLEX, XPRESS-MP, or Mosek may be used for this purpose.
In the computational experiment of Section 6, α-ECP (v1.75.03) is used with CPLEX (v12.1) as the MILP solver and CONOPT (v3.24T) as the NLP solver, and α-ECP is run via GAMS. Since all instances are convex, setting the ECPstrategy option to 1 instructs α-ECP not to perform algorithmic steps relating to the solution of pseudo-convex instances.
and may use Cbc or CPLEX to solve MILP subproblems arising in its various algorithms.
The Bonmin NLP-BB algorithm features a range of different heuristics, advanced branching techniques such as strong branching or pseudocost branching, and five different choices of node selection strategy. The Bonmin LP/NLP-BB methods use row management, cutting planes, and branching strategies from Cbc. A distinguishing feature of Bonmin is that one may instruct it to use a (time-limited) OA or feasibility pump heuristic at the beginning of the optimization.
In the computational experiments, Bonmin (v1.1) is used with Cbc (v2.3) as the MILP solver, Ipopt (v2.7) as the NLP solver, and Clp (v1.10) as the LP solver. For Bonmin, the algorithms NLP-BB (denoted B-BB) and LP/NLP-BB (denoted B-Hyb) are tested. The default search strategies of dynamic node selection (a mixture of depth-first search and best-bound) and strong branching were employed.
5.3. DICOPT. DICOPT is a software implementation of the OA method described in Section 3.2. DICOPT may be used as a solver from the GAMS modeling language. Although OA has been designed to solve convex MINLPs, DICOPT may often be used successfully as a heuristic approach for nonconvex MINLPs, as it contains features such as equality relaxation [72] and augmented penalty methods [105] for dealing with nonconvexities. DICOPT requires solvers for both NLP and MILP subproblems, and it uses available software as a "black box" in each case. For NLP subproblems, possible solvers include CONOPT [42], MINOS [89], and SNOPT [57]. For MILP subproblems, possible solvers include CPLEX [66] and XPRESS [47]. DICOPT contains a number of heuristic (inexact) stopping rules for the OA method that may be especially effective for nonconvex instances.
In our computational experiment, the DICOPT that comes with GAMS v23.2.1 is used with CONOPT (v3.24T) as the NLP solver and CPLEX (v12.1) as the MILP solver. In order to ensure that instances are solved to provable optimality, the GAMS/DICOPT option stop was set to the value 1.
5.4. FilMINT. FilMINT [2] is a noncommercial solver for convex MINLPs based on the LP/NLP-BB algorithm. FilMINT may be used through the AMPL language.
FilMINT uses MINTO [93], a branch-and-cut framework for MILP, to solve the reduced master problem (MP(K)), and filterSQP [52] to solve nonlinear subproblems. FilMINT uses the COIN-OR LP solver Clp or CPLEX to solve linear programs.
FilMINT by default employs nearly all of MINTO's enhanced MILP features, such as cutting planes, primal heuristics, row management, and enhanced branching and node selection rules. By default, pseudocost branching is used as the branching strategy and best estimate as the node selection strategy.
or not the instance has a nonlinear objective function, the total number of
variables, the number of integer variables, the number of constraints, and
how many of the constraints are nonlinear.
BatchS: The BatchS problems [97, 104] are multi-product batch plant
design problems where the objective is to determine the volume of the
equipment, the number of units to operate in parallel, and the locations of
intermediate storage tanks.
CLay: The CLay problems [98] are constrained layout problems where
non-overlapping rectangular units must be placed within the confines of
certain designated areas such that the cost of connecting these units is
minimized.
FLay: The FLay problems [98] are farmland layout problems where
the objective is to determine the optimal length and width of a number of
rectangular patches of land with fixed area, such that the perimeter of the
set of patches is minimized.
fo-m-o: These are block layout design problems [33], where an orthogonal arrangement of rectangular departments within a given rectangular facility is required. A distance-based objective function is to be minimized, and the length and width of each department must satisfy given size and area requirements.
RSyn: The RSyn problems [98] concern retrofit planning, where one
would like to redesign existing plants to increase throughput, reduce energy
consumption, improve yields, and reduce waste generation. Given limited
capital investments to make process improvements and cost estimations
over a given time horizon, the problem is to identify the modifications that
yield the highest income from product sales minus the cost of raw materials,
energy, and process modifications.
SLay: The SLay problems [98] are safety layout problems where the optimal placement of a set of units with fixed width and length is determined such that the Euclidean distance between their center points and a predefined "safety point" is minimized.
sssd: The sssd instances [45] are stochastic service system design problems. Servers are modeled as M/M/1 queues, and a set of customers must be assigned to the servers, which can be operated at different service levels. The objective is to minimize assignment and operating costs.
Syn: The Syn instances [43, 102] are synthesis design problems dealing
with the selection of optimal configuration and parameters for a processing
system selected from a superstructure containing alternative processing
units and interconnections.
trimloss: The trimloss (tls) problems [64] are cutting stock problems
where one would like to determine how to cut out a set of product paper
rolls from raw paper rolls such that the trim loss as well as the overall
production is minimized.
uflquad: The uflquad problems [61] are (separable) quadratic uncapacitated facility location problems where a set of customer demands must be satisfied by open facilities. The objective is to minimize the sum of the fixed cost for operating the facilities and the shipping cost, which is proportional to the square of the quantity delivered to each customer.

Table 1
Test set statistics.
All test problems are available in AMPL and GAMS formats from the authors upon request. In our experiments, α-ECP, DICOPT, and SBB are tested through the GAMS interface, while Bonmin, FilMINT, and MINLP_BB are tested through AMPL.
Table 2
Subjective Rating of Best Solver on Specific Instance Families.
Table 3
Solver statistics on the test set.
Table 4
Comparison of running times (in seconds) for the solvers α-ECP (αECP), Bonmin-BB (B-BB), Bonmin-LP/NLP-BB (B-Hyb), DICOPT, FilMINT (Fil), FilMINT with strong-branching cuts (Fil-SBC), MINLP_BB (M-BB), and SBB (boldface for the best running time). If a solver could not provide the optimal solution, we state the reason with the following letters: "t" means that the 3-hour time limit was hit, "m" means that the 3 GB memory limit was exceeded, and "f" means that the solver failed to find the optimal solution without hitting the time or memory limit.
[Figure: performance profile of the proportion of problems solved, comparing alphaecp, bonmin-bb, bonmin-hyb, dicopt, filmint, filmint-sbc, and minlpbb.]
REFERENCES
[1] K. Abhishek, S. Leyffer, and J.T. Linderoth, Feasibility pump heuristics for
Mixed Integer Nonlinear Programs. Unpublished working paper, 2008.
[2] , FilMINT: An outer-approximation-based solver for convex Mixed-Integer
Nonlinear Programs, INFORMS Journal on Computing, 22, No. 4 (2010),
pp. 555–567.
[3] T. Achterberg and T. Berthold, Improving the feasibility pump, Technical
Report ZIB-Report 05-42, Zuse Institute Berlin, September 2005.
[4] T. Achterberg, T. Koch, and A. Martin, Branching rules revisited, Opera-
tions Research Letters, 33 (2004), pp. 42–54.
[5] I. Akrotirianakis, I. Maros, and B. Rustem, An outer approximation based
branch-and-cut algorithm for convex 0-1 MINLP problems, Optimization
Methods and Software, 16 (2001), pp. 21–47.
[6] D. Applegate, R. Bixby, V. Chvátal, and W. Cook, On the solution of trav-
eling salesman problems, in Documenta Mathematica Journal der Deutschen
Mathematiker-Vereinigung, International Congress of Mathematicians, 1998,
pp. 645–656.
[7] A. Atamtürk and V. Narayanan, Conic mixed integer rounding cuts, Mathe-
matical Programming, 122 (2010), pp. 1–20.
[8] E. Balas, Disjunctive programming, in Annals of Discrete Mathematics 5: Dis-
crete Optimization, North Holland, 1979, pp. 3–51.
[9] E. Balas, S. Ceria, and G. Cornuéjols, A lift-and-project cutting plane algorithm for mixed 0-1 programs, Mathematical Programming, 58 (1993), pp. 295–324.
[10] E. Balas, S. Ceria, G. Cornuéjols, and N.R. Natraj, Gomory cuts revisited, Operations Research Letters, 19 (1996), pp. 1–9.
[11] E. Balas and M. Perregaard, Lift-and-project for mixed 0-1 programming:
recent progress, Discrete Applied Mathematics, 123 (2002), pp. 129–154.
[12] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty, Nonlinear Programming:
Theory and Algorithms, John Wiley and Sons, New York, second ed., 1993.
[13] E.M.L. Beale, Branch and bound methods for mathematical programming sys-
tems, in Discrete Optimization II, P.L. Hammer, E.L. Johnson, and B.H.
Korte, eds., North Holland Publishing Co., 1979, pp. 201–219.
[14] E.M.L. Beale and J.A. Tomlin, Special facilities in a general mathematical
programming system for non-convex problems using ordered sets of variables,
in Proceedings of the 5th International Conference on Operations Research,
J. Lawrence, ed., 1969, pp. 447–454.
[15] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization,
SIAM, 2001. MPS/SIAM Series on Optimization.
[16] J.F. Benders, Partitioning procedures for solving mixed variable programming
problems, Numerische Mathematik, 4 (1962), pp. 238–252.
[17] M. Bénichou, J.M. Gauthier, P. Girodet, G. Hentges, G. Ribière, and
O. Vincent, Experiments in Mixed-Integer Linear Programming, Mathe-
matical Programming, 1 (1971), pp. 76–94.
[18] L. Bertacco, M. Fischetti, and A. Lodi, A feasibility pump heuristic for gen-
eral mixed-integer problems, Discrete Optimization, 4 (2007), pp. 63–76.
[19] T. Berthold, Primal Heuristics for Mixed Integer Programs, Master’s thesis,
Technische Universität Berlin, 2006.
[20] T. Berthold and A. Gleixner, Undercover - a primal heuristic for MINLP
based on sub-mips generated by set covering, Tech. Rep. ZIB-Report 09-40,
Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), 2009.
[21] D. Bienstock, Computational study of a family of mixed-integer quadratic pro-
gramming problems, Mathematical Programming, 74 (1996), pp. 121–140.
www.it-ebooks.info
ALGORITHMS AND SOFTWARE FOR CONVEX MINLP 35
36 PIERRE BONAMI, MUSTAFA KILINÇ, AND JEFF LINDEROTH
SUBGRADIENT BASED OUTER APPROXIMATION FOR
MIXED INTEGER SECOND ORDER
CONE PROGRAMMING∗
SARAH DREWES† AND STEFAN ULBRICH‡
Abstract. This paper deals with outer approximation based approaches for solving
mixed integer second order cone programs, where the outer approximation is based
on subgradients of the second order cone constraints. Using strong duality of the sub-
problems that are solved during the algorithm, we are able to determine subgradients
satisfying the KKT optimality conditions. This enables us to extend convergence results
valid for continuously differentiable mixed integer nonlinear problems to subdifferen-
tiable constraint functions. Furthermore, we present a version of the branch-and-bound
based outer approximation that converges even when the assumption that every SOCP
satisfies the Slater constraint qualification is relaxed. We give numerical results for some
application problems showing the performance of our approach.
Key words. Mixed Integer Nonlinear Programming, Second Order Cone Program-
ming, Outer Approximation.
min c^T x
s.t. Ax = b,
     x ∈ K,   K := K_1 × · · · × K_{noc},          (1.1)
     x_j ∈ [l_j, u_j]   (j ∈ J),
     x_j ∈ Z   (j ∈ J),
where

K_i := {x = (x_0, x_1) ∈ R × R^{k_i − 1} : ‖x_1‖ ≤ x_0}
∗ Supported by the SFB 805 and by the state of Hesse within the LOEWE-Center AdRIA.
† Research Group Nonlinear Optimization, Department of Mathematics, Technische Universität Darmstadt.
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 41
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_2,
© Springer Science+Business Media, LLC 2012
is the second order cone of dimension ki . Mixed integer second order cone
problems have various applications in finance or engineering, for example
turbine balancing problems, cardinality-constrained portfolio optimization
(cf. Bertsimas and Shioda in [17] or Vielma et al. in [10]) or the problem of
finding a minimum length connection network, also known as the Euclidean
Steiner Tree Problem (ESTP) (cf. Fampa and Maculan in [15]).
Available convex MINLP solvers like BONMIN [22] by Bonami et al. or
FilMINT [25] by Abhishek et al. are in general not applicable for (1.1),
since the occurring second order cone constraints are not continuously dif-
ferentiable.
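The nondifferentiability can be seen directly: writing each cone constraint as g(x) = ‖x_1‖ − x_0 ≤ 0, g is smooth wherever x_1 ≠ 0 but only subdifferentiable at x_1 = 0. A minimal numeric sketch (our own illustration, not code from the paper):

```python
import numpy as np

def g(x):
    # SOC constraint residual: g(x) = ||x_1|| - x_0 <= 0 iff x lies in the cone
    return np.linalg.norm(x[1:]) - x[0]

def subgrad(x):
    # A valid subgradient of g at x: (-1, x_1/||x_1||) where g is smooth,
    # and any (-1, v) with ||v|| <= 1 at the kink x_1 = 0; we pick v = 0.
    x1 = x[1:]
    n1 = np.linalg.norm(x1)
    v = x1 / n1 if n1 > 0 else np.zeros_like(x1)
    return np.concatenate(([-1.0], v))

# Subgradient inequality g(y) >= g(x) + xi^T (y - x) holds even at the kink:
x = np.array([1.0, 0.0, 0.0])          # x_1 = 0: g is not differentiable here
xi = subgrad(x)
rng = np.random.default_rng(1)
for _ in range(50):
    y = rng.normal(size=3)
    assert g(y) >= g(x) + xi @ (y - x) - 1e-9
```

Because g is convex, every such subgradient yields a valid linearization, which is exactly what the outer approximation below exploits.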
Branch-and-cut methods for convex mixed 0-1 problems, which can be applied
to solve (1.1) if the integer variables are binary, have been discussed by Stubbs
and Mehrotra in [2] and [9]. In [5] Çezik and Iyengar discuss
cuts for general self-dual conic programming problems and investigate their
applications on the maxcut and the traveling salesman problem. Atamtürk
and Narayanan present in [12] integer rounding cuts for conic mixed-integer
programming by investigating polyhedral decompositions of the second or-
der cone conditions and in [11] the authors discuss lifting for mixed integer
conic programming, where valid inequalities for mixed-integer feasible sets
are derived from suitable subsets.
One article dealing with non-differentiable functions in the context of
outer approximation approaches for MINLP is [1] by Fletcher and Leyffer,
where the authors prove convergence of outer approximation algorithms for
non-smooth penalty functions. The only article dealing with outer approx-
imation techniques for MISOCPs is [10] by Vielma et al., which is based on
Ben-Tal and Nemirovski's polyhedral approximation of the second order
cone constraints [13]. There, the size of the outer approximation grows as
the precision of the approximation increases; this precision, and thus the
entire outer approximation, is chosen in advance, whereas the approximation
presented here is strengthened iteratively in order to guarantee convergence
of the algorithm.
In this paper we present a hybrid branch-and-bound based outer ap-
proximation approach for MISOCPs. The approach is based on the branch-
and-bound based outer approximation approach for continuously differen-
tiable constraints – as proposed by Bonami et al. in [8] on the basis of
Fletcher and Leyffer [1] and Quesada and Grossmann [3]. The idea is to
iteratively compute integer feasible solutions of a (sub)gradient based lin-
ear outer approximation of (1.1) and to tighten this outer approximation
by solving nonlinear continuous problems.
Linear outer approximations based on subgradients satisfying the
Karush-Kuhn-Tucker (KKT) optimality conditions of the occurring SOCP
problems enable us to extend the convergence result for continuously
differentiable constraints to subdifferentiable second order cone constraints.
Thus, in contrast to [10], the subgradient based approximation induces
convergence of any classical outer approximation based approach under the
min c^T x
s.t. Ax = b,
     x ∈ K,               (NLP(x_J^k))
     x_J = x_J^k.
The dual of (NLP(x_J^k)), in the sense of Nesterov and Nemirovskii [18] or
Alizadeh and Goldfarb [7], is given by

max b^T y_A + (x_J^k)^T y_J
s.t. c_i − (A_i^T, (I_J)_i^T)(y_A, y_J) = s_i,   s_i ∈ K_i,   i = 1, . . . , noc,
where I_J = ((I_J)_1, . . . , (I_J)_{noc}) denotes the matrix mapping x to the integer
variables x_J, where (I_J)_i ∈ R^{|J| × k_i} is the block of columns of I_J associated
with the i-th cone of dimension k_i. We define

I_a(x̄) := {i : g_i(x̄_i) = 0, x̄_{i1} ≠ 0},   I_0(x̄) := {i : g_i(x̄_i) = 0, x̄_{i1} = 0},

where I_a(x̄) is the index set of active conic constraints that are differentiable
in x̄ and I_0(x̄) is the index set of active constraints that are subdifferen-
tiable in x̄. The crucial point in an outer approximation approach is to
tighten the outer approximation problem such that the integer assignment
of the last solution is cut off. Assume x_J^k is this last solution. Then we will
show later that those subgradients in ∂g_i(x̄) that satisfy the KKT condi-
tions in the solution x̄ of (NLP(x_J^k)) give rise to linearizations with this
tightening property. Hence, we now show how to choose elements ξ̄_i in the
subdifferentials ∂g_i(x̄) for i ∈ {1, . . . , noc} that satisfy the KKT conditions

c_i + (A_i^T, (I_J)_i^T) μ̄ + λ̄_i ξ̄_i = 0,          i ∈ I_0(x̄),
c_i + (A_i^T, (I_J)_i^T) μ̄ + λ̄_i ∇g_i(x̄_i) = 0,    i ∈ I_a(x̄),          (3.2)
c_i + (A_i^T, (I_J)_i^T) μ̄ = 0,                     i ∉ I_0(x̄) ∪ I_a(x̄),

with elements ξ̄_i ∈ ∂g_i(x̄). We now compare both optimality systems to
each other.
First, we consider i ∉ I_0 ∪ I_a, and thus x̄_i ∈ int(K_i). Lemma 2.2, part
1 induces s̄_i = (0, . . . , 0)^T. Conditions (3.3) for i ∉ I_0 ∪ I_a are thus equal to
c_i − (A_i^T, (I_J)_i^T) ȳ = 0, and hence μ̄ = −ȳ satisfies the KKT condition (3.2)
for i ∉ I_0 ∪ I_a.
Next we consider i ∈ I_a(x̄), where x̄_i ∈ bd(K_i) \ {0}. Lemma 2.2, part
2 yields

s̄_i = (γ‖x̄_{i1}‖, −γ x̄_{i1}^T)^T = γ (x̄_{i0}, −x̄_{i1}^T)^T          (3.7)

for i ∈ I_a(x̄). Inserting ∇g_i(x̄) = (−1, x̄_{i1}^T/‖x̄_{i1}‖)^T for i ∈ I_a into (3.2) yields
the existence of λ_i ≥ 0 such that

c_i + (A_i^T, (I_J)_i^T) μ = λ_i (1, −x̄_{i1}^T/‖x̄_{i1}‖)^T,   i ∈ I_a(x̄).          (3.8)
Insertion of (3.7) into (3.3) and comparison with (3.8) yields the exis-
tence of γ ≥ 0 such that μ̄ = −ȳ and λ̄_i = γ x̄_{i0} = γ‖x̄_{i1}‖ ≥ 0 satisfy the
KKT conditions (3.2) for i ∈ I_a(x̄).
For i ∈ I_0(x̄), condition (3.2) is satisfied by μ ∈ R^m, λ̄_i ≥ 0 and
subgradients ξ̄_i of the form ξ̄_i = (−1, v^T)^T, ‖v‖ ≤ 1. Since μ̄ = −ȳ
satisfies (3.2) for i ∈ I_0, we look for a suitable v and λ̄_i ≥ 0 satisfying
c_i − (A_i^T, (I_J)_i^T) ȳ = λ̄_i (1, −v^T)^T for i ∈ I_0(x̄). Comparing the last con-
dition with (3.3) yields that if ‖s̄_{i1}‖ > 0, then λ̄_i = s̄_{i0}, −v = s̄_{i1}/s̄_{i0} satisfy
condition (3.2) for i ∈ I_0(x̄). Since s̄_{i0} ≥ ‖s̄_{i1}‖ we obviously have λ̄_i ≥ 0
and ‖v‖ = ‖s̄_{i1}‖/s̄_{i0} ≤ 1. If ‖s̄_{i1}‖ = 0, the required condition (3.2)
is satisfied by λ̄_i = s̄_{i0}, −v = (0, . . . , 0)^T.
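The case distinction above can be condensed into a few lines (our own illustrative sketch, not the authors' code; it assumes a dual-feasible slack with s̄_{i0} ≥ ‖s̄_{i1}‖):

```python
import numpy as np

def kkt_subgradient(s_bar):
    # Choose xi = (-1, v^T)^T from the dual slack s_bar = (s0, s1) as in the
    # text: if ||s1|| > 0 take -v = s1/s0, otherwise v = 0. Dual feasibility
    # gives s0 >= ||s1||, so ||v|| <= 1 and xi lies in the subdifferential.
    s0, s1 = s_bar[0], s_bar[1:]
    v = -s1 / s0 if np.linalg.norm(s1) > 0 else np.zeros_like(s1)
    assert np.linalg.norm(v) <= 1 + 1e-12
    return np.concatenate(([-1.0], v))

xi = kkt_subgradient(np.array([2.0, 1.0, -1.0]))
assert np.allclose(xi, [-1.0, -0.5, 0.5])
```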
min u
s.t. Ax = b,
     −x_{i0} + ‖x_{i1}‖ ≤ u,   i = 1, . . . , noc,          (F(x_J^k))
     u ≥ 0,
     x_J = x_J^k.

It has the property that the optimal solution (x̄, ū) minimizes
the maximal violation of the conic constraints. The dual program of
(F(x_J^k)) is
Lemma 4.1. Assume A1 and A2 hold. Let (x̄, ū) solve (F(x_J^k)) with
ū > 0 and let (s̄, ȳ) be the solution of its dual program (F(x_J^k)-D). Then
there exist Lagrange multipliers μ̄ = −ȳ and λ̄_i ≥ 0 (i ∈ I_F) that solve the
KKT conditions in (x̄, ū) with subgradients

ξ_i = (−1, −s̄_{i1}^T/s̄_{i0})^T  if s̄_{i0} > 0,     ξ_i = (−1, 0)^T  if s̄_{i0} = 0          (4.6)

for i ∈ I_{F0}(x̄).
Proof. Since (F(x_J^k)) has interior points, there exist Lagrange multi-
pliers μ ∈ R^m, λ ≥ 0, such that the optimal solution (x̄, ū) of (F(x_J^k))
satisfies the KKT conditions (4.2)–(4.5) with ξ_i ∈ ∂g_i(x̄_i), plus the feasi-
bility conditions. We already used the complementarity conditions for ū > 0
and the inactive constraints. Due to the nonempty interior of (F(x_J^k)), (x̄, ū)
also satisfies the primal-dual optimality system

Ax = b,   u ≥ 0,
−A_i^T y_A − (I_J^T)_i y_J = s_i,   i = 1, . . . , noc,          (4.7)
x_{i0} + u ≥ ‖x_{i1}‖,   Σ_{i=1}^{noc} s_{i0} = 1,          (4.8)
s_{i0} ≥ ‖s_{i1}‖,   i = 1, . . . , noc,          (4.9)
s_{i0}(x_{i0} + u) + s_{i1}^T x_{i1} = 0,   i = 1, . . . , noc,          (4.10)

where we again used complementarity for ū > 0.
First we investigate i ∉ I_F. In this case x̄_{i0} + ū > ‖x̄_{i1}‖ induces
s_i = (0, . . . , 0)^T (cf. Lemma 2.2, part 1). Thus, the KKT conditions (4.2)
are satisfied by μ_A = −y_A and μ_J = −y_J.
Next, we consider i ∈ I_{F1}, for which by definition x̄_{i0} + ū = ‖x̄_{i1}‖ > 0
holds. Lemma 2.2, part 2 states that there exists γ ≥ 0 with s_{i0} = γ(x̄_{i0} +
ū) = γ‖x̄_{i1}‖ and s_{i1} = −γ x̄_{i1}. Insertion into (4.7) yields

−A_i^T y_A − (I_J^T)_i y_J + γ‖x̄_{i1}‖ (−1, x̄_{i1}^T/‖x̄_{i1}‖)^T = 0,   i ∈ I_{F1}.

Since ∇g_i(x̄_i) = (−1, x̄_{i1}^T/‖x̄_{i1}‖)^T, we obtain that the KKT condition (4.3) is
satisfied by μ_A = −y_A, μ_J = −y_J and λ_{g_i} = s_{i0} = γ‖x̄_{i1}‖ ≥ 0.
min c^T x
s.t. Ax = b,
     c^T x < c^T x̄,   x̄ ∈ T, x̄_J ∈ Z^{|J|},
     −‖x̄_{i1}‖ x_{i0} + x̄_{i1}^T x_{i1} ≤ 0,   i ∈ I_a(x̄), x̄ ∈ T,
     −‖x̄_{i1}‖ x_{i0} + x̄_{i1}^T x_{i1} ≤ 0,   i ∈ I_{F1}(x̄), x̄ ∈ S,
     −x_{i0} ≤ 0,   i ∈ I_0(x̄), s̄_{i0} = 0, x̄ ∈ T,          (MIP(T,S))
     −x_{i0} − (1/s̄_{i0}) s̄_{i1}^T x_{i1} ≤ 0,   i ∈ I_0(x̄), s̄_{i0} > 0, x̄ ∈ T,
     −x_{i0} − (1/s̄_{i0}) s̄_{i1}^T x_{i1} ≤ 0,   i ∈ I_{F0}(x̄), s̄_{i0} > 0, x̄ ∈ S,
     −x_{i0} ≤ 0,   i ∈ I_{F0}(x̄), s̄_{i0} = 0, x̄ ∈ S,
     x_j ∈ [l_j, u_j]   (j ∈ J),
     x_j ∈ Z   (j ∈ J).
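Each conic linearization in (MIP(T,S)) is a supporting-hyperplane cut. As a sanity check (our own sketch, not from the paper), the cut −‖x̄_1‖ x_0 + x̄_1^T x_1 ≤ 0 generated at a boundary point is valid for every point of the cone by the Cauchy-Schwarz inequality:

```python
import numpy as np

def oa_cut(xbar1):
    # Coefficients (a0, a1) of the linearization -||xbar1|| x0 + xbar1^T x1 <= 0
    # taken at a boundary cone point whose x1-part is xbar1 != 0
    return -np.linalg.norm(xbar1), xbar1

a0, a1 = oa_cut(np.array([3.0, 4.0]))
rng = np.random.default_rng(0)
for _ in range(200):
    x1 = rng.normal(size=2)
    x0 = np.linalg.norm(x1) + rng.random()  # strictly feasible cone point
    # Cauchy-Schwarz: a1 @ x1 <= ||a1|| ||x1|| = -a0 * ||x1|| <= -a0 * x0
    assert a0 * x0 + a1 @ x1 <= 1e-9
```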
set i = i + 2, ll = ll + 1.
endif
endwhile
Note that if L = 1, x^k is set to x̄ and Step 2b is omitted, so Step 2
performs a nonlinear branch-and-bound search. If L = ∞, Algorithm 1
resembles an LP/NLP-based branch-and-bound algorithm. Convergence
of the outer approximation approach in the case of continuously differentiable
constraint functions was shown in [1], Theorem 2. We now state conver-
gence of Algorithm 1 for subdifferentiable SOCP constraints.
For this purpose, we first prove that the last integer assignment x_J^k is
infeasible in the outer approximation conditions induced by the solution of
a feasible subproblem (NLP(x_J^k)).
Lemma 5.1. Assume A1 and A2 hold. Let (NLP(x_J^k)) be feasible with
optimal solution x̄ and dual solution (s̄, ȳ). Then every x with x_J = x_J^k
satisfying the constraints Ax = b and

−‖x̄_{i1}‖ x_{i0} + x̄_{i1}^T x_{i1} ≤ 0,   i ∈ I_a(x̄),
−x_{i0} ≤ 0,   i ∈ I_0(x̄), s̄_{i0} = 0,          (5.1)
−x_{i0} − (1/s̄_{i0}) s̄_{i1}^T x_{i1} ≤ 0,   i ∈ I_0(x̄), s̄_{i0} > 0,

satisfies c^T x ≥ c^T x̄.
with ξ̄_i from Lemma 3.1, and where the last equation follows from Ax̄ = b.
Due to A2 we know that there exist μ ∈ R^m and λ ∈ R_+^{|I_0 ∪ I_a|} satisfying
the KKT conditions (3.2) of (NLP(x_J^k)) in x̄, that is,
with the subgradients ξ̄_i chosen from Lemma 3.1. Farkas' Lemma (cf. [20])
states that (5.5) is equivalent to the fact that as long as (x − x̄) satisfies
(5.2)–(5.4), then c_{J̄}^T (x_{J̄} − x̄_{J̄}) ≥ 0 ⇔ c_{J̄}^T x_{J̄} ≥ c_{J̄}^T x̄_{J̄} must hold.
In the case that (NLP(x_J^k)) is infeasible, we can show that the subgra-
dients (4.6) of Lemma 4.1, together with the gradients of the differentiable
functions g_i in the solution of (F(x_J^k)), provide inequalities that separate
the last integer solution.
Lemma 5.2. Assume A1 and A2 hold. If (NLP(x_J^k)) is infeasible and
thus (x̄, ū) solves (F(x_J^k)) with positive optimal value ū > 0, then every x
satisfying the linear equalities Ax = b with x_J = x_J^k is infeasible in the
constraints
−x_{i0} + (x̄_{i1}^T/‖x̄_{i1}‖) x_{i1} ≤ 0,   i ∈ I_{F1}(x̄),
−x_{i0} − (s̄_{i1}^T/s̄_{i0}) x_{i1} ≤ 0,   i ∈ I_{F0}, s̄_{i0} ≠ 0,          (5.6)
−x_{i0} ≤ 0,   i ∈ I_{F0}, s̄_{i0} = 0,

where I_{F1} and I_{F0} are defined by (4.1) and (s̄, ȳ) is the solution of the dual
program (F(x_J^k)-D) of (F(x_J^k)).
Proof. The proof is done in analogy to Lemma 1 in [1]. Due to assump-
tions A1 and A2, the optimal solution of (F(x_J^k)) is attained. We further
know from Lemma 4.1 that there exist λ_{g_i} ≥ 0 with Σ_{i∈I_F} λ_{g_i} = 1, and μ_A,
μ_J satisfying the KKT conditions

Σ_{i∈I_{F1}} λ_{g_i} ∇g_i(x̄) + Σ_{i∈I_{F0}} λ_{g_i} ξ_i + A^T μ_A + I_J^T μ_J = 0.          (5.7)
the solutions of (NLP(x_J^k)) or (F(x_J^k)). Finiteness then follows from
the boundedness of the feasible set. A1 and A2 guarantee the solvability,
the validity of the KKT conditions, and primal-dual optimality of the nonlinear
subproblems (NLP(x_J^k)) and (F(x_J^k)). In the case when (NLP(x_J^k)) is
feasible with solution x̄, Lemma 5.1 states that every x̃ with x̃_J = x̂_J
must satisfy c^T x̃ ≥ c^T x̄ and is thus infeasible in the constraint c^T x̃ <
c^T x̄ included in (LP^k(T, S)). In the case when (NLP(x_J^k)) is infeasible,
Lemma 5.2 yields the result for (F(x_J^k)).
Modified algorithm avoiding A2. We now present an adaptation of
Algorithm 1 which remains convergent even if the convergence assumption A2
does not hold for every subproblem. Assume N^k is a node such that A2 is
violated by (NLP(x_J^k)), and assume x with integer assignment x_J = x_J^k is
feasible for the updated outer approximation. Then the inner while-loop
in step 2b becomes infinite and Algorithm 1 does not converge. In that
case we solve the SOCP relaxation (SOC^k) in node N^k. If that problem
is feasible but has no integer feasible solution, we branch on the
solution of this SOCP relaxation to explore the subtree of N^k. Hence, we
substitute step 2b by the following step.
2b'. solve (LP^k(T, S)) with solution x^k, set repeat = true.
while (((LP^k(T, S)) feasible) & (x_J^k integer) & (c^T x^k < CUB) & repeat)
    save x_J^old = x_J^k
    if (NLP(x_J^k) is feasible with solution x̄)
        T := T ∪ {x̄}
        if (c^T x̄ < CUB) CUB = c^T x̄, x* = x̄ endif
    else compute solution x̄ of F(x_J^k), S := S ∪ {x̄}
    endif
    compute solution x^k of updated (LP^k(T, S))
    if (x_J^old == x_J^k) set repeat = false endif
endwhile
if (!repeat)
    solve nonlinear relaxation (SOC^k) at the node N^k with solution x̄
    T := T ∪ {x̄}
    if (x̄_J integer): if c^T x̄ < CUB: CUB = c^T x̄, x* = x̄ endif
        go to 2.
    else set x^k = x̄.
    endif
endif
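The termination safeguard of step 2b' can be traced with a toy stand-in for the master problem; everything below is a hypothetical stub for illustration, not the authors' implementation. The loop sets repeat = false as soon as the integer assignment returned by the updated master repeats:

```python
def inner_loop(master_solutions):
    # Toy trace of the inner while-loop of step 2b': stop when the master's
    # integer assignment repeats (repeat = False), mirroring the safeguard
    # that triggers solving the SOCP relaxation (SOC^k).
    # 'master_solutions' stands in for successive integer assignments
    # returned by the updated LP master (hypothetical stub).
    x_old = None
    rounds = 0
    for xJ in master_solutions:
        rounds += 1
        if xJ == x_old:           # assignment repeated: A2 presumably violated
            return rounds, False  # repeat = False, fall back to (SOC^k)
        x_old = xJ
    return rounds, True

rounds, repeat = inner_loop([(0, 1), (1, 1), (1, 1)])
assert (rounds, repeat) == (3, False)
```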
Table 1
Problem sizes (m, n, noc, |J|) and maximal constraints of LP approximation (m_oa).
Table 2
Number of solved SOCP/LP problems.
Table 3
Run times in seconds.
Table 1). Hence, with regard to running times, the version with L = 10000
outperforms L = 10 for almost all test instances, compare Table 3. Thus,
for the problems considered, Algorithm 2 with L = 10000, i.e., without
solving additional SOCPs, achieves the best performance in comparison to
nonlinear branch-and-bound as well as Algorithm 1 with L = 10.
In addition to the above considered instances, we tested some of the
classical portfolio optimization instances provided by Vielma et al. [28]
using Algorithm 2 with L = 10000. For each problem, we report in Table 4
the dimension of the MISOCP formulation, the dimension of the largest
relaxation solved by our algorithm and the dimension of the a priori LP
relaxation with accuracy 0.01 that was presented in [10]. For a better
comparison, we report the number of columns plus the number of linear
constraints as it is done in [10]. The dimensions of the largest LP relax-
ations solved by our approach are significantly smaller than the dimensions
of the LP approximations solved by [10]. Furthermore, in the lifted linear
programming approach in [10], every LP relaxation solved during the algo-
rithm is of the specified dimension. In our approach most of the solved LPs
are much smaller than the reported maximal dimension. In Table 5 we re-
port the run times and the number of solved node problems for our algorithm
(Alg. 2). For the sake of completeness we added the average and maximal
run times reported in [10], although this is not an entirely appropriate
comparison since the algorithms have not been tested on similar machines.
Since our implementation of an interior-point SOCP solver is not as efficient
as a commercial solver like CPLEX, which is used in [10], a comparison of
times is also difficult. But the authors of [10] report that solving their LP relaxations
usually takes longer than solving the associated SOCP relaxation. Thus,
we can assume that due to the low dimensions of the LPs solved in our
approach and the moderate number of SOCPs, our approach is likely to be
faster when using a more efficient SOCP solver.
Table 4
Dimension (m+n) and maximal LP approximation (m_oa + n) (portfolio instances).
Table 5
Run times and node problems (portfolio instances).
REFERENCES
[3] I. Quesada and I.E. Grossmann, An LP/NLP based Branch and Bound Algorithm
for Convex MINLP Optimization Problems, in Computers and Chemical En-
gineering, 1992, 16(10, 11): 937–947.
[4] A.M. Geoffrion, Generalized Benders Decomposition, in Journal of Optimization
Theory and Applications, 1972, 10(4): 237–260.
[5] M.T. Çezik and G. Iyengar, Cuts for Mixed 0-1 Conic Programming, in Math-
ematical Programming, Ser. A, 2005, 104: 179–200.
[6] M.A. Duran and I.E. Grossmann, An Outer-Approximation Algorithm for a
Class of Mixed-Integer Nonlinear Programs, in Mathematical Programming,
1986, 36: 307–339.
[7] F. Alizadeh and D. Goldfarb, Second-Order Cone Programming, RUTCOR,
Rutgers Center for Operations Research, Rutgers University, New Jersey, 2001.
[8] P. Bonami, L.T. Biegler, A.R. Conn, G. Cornuejols, I.E. Grossmann, C.D.
Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An
Algorithmic Framework for Convex Mixed Integer Nonlinear Programs, IBM
Research Division, New York, 2005.
[9] R.A. Stubbs and S. Mehrotra, Generating Convex Polynomial Inequalities for
Mixed 0-1 Programs, Journal of global optimization, 2002, 24: 311–332.
[10] J.P. Vielma, S. Ahmed, and G.L. Nemhauser, A Lifted Linear Programming
Branch-and-Bound Algorithm for Mixed Integer Conic Quadratic Programs,
INFORMS Journal on Computing, 2008, 20(3): 438–450.
[11] A. Atamtürk and V. Narayanan, Lifting for Conic Mixed-Integer Programming,
BCOL Research report 07.04, 2007.
[12] A. Atamtürk and V. Narayanan, Cuts for Conic Mixed-Integer Programming,
Mathematical Programming, Ser. A, DOI 10.1007/s10107-008-0239-4, 2007.
[13] A. Ben-Tal and A. Nemirovski, On Polyhedral Approximations of the Second-
Order Cone, in Mathematics of Operations Research, 2001, 26(2): 193–205.
[14] E. Balas, S. Ceria, and G. Cornuéjols, A lift-and-project cutting plane al-
gorithm for mixed 0-1 programs, in Mathematical Programming, 1993, 58:
295–324.
[15] M. Fampa and N. Maculan, A new relaxation in conic form for the Euclidean
Steiner Tree Problem in R^n, in RAIRO Operations Research, 2001, 35:
383–394.
[16] J. Soukup and W.F. Chow, Set of test problems for the minimum length connec-
tion networks, in ACM SIGMAP Bulletin, 1973, 15: 48–51.
[17] D. Bertsimas and R. Shioda, Algorithm for cardinality-constrained quadratic
optimization, in Computational Optimization and Applications, 2007, 91:
239–269.
[18] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Con-
vex Programming, SIAM Studies in Applied Mathematics, 2001.
[19] R.T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[20] C. Geiger and C. Kanzow, Theorie und Numerik restringierter Optimierungsauf-
gaben, Springer Verlag Berlin Heidelberg New York, 2002.
[21] J.E. Beasley, OR Library: Collection of test data for Euclidean Steiner Tree Prob-
lems, http://people.brunel.ac.uk/∼mastjjb/jeb/orlib/esteininfo.html.
[22] P. Belotti, P. Bonami, J.J. Forrest, L. Ladanyi, C. Laird, J. Lee, F. Mar-
got, and A. Wächter, BonMin, http://www.coin-or.org/Bonmin/ .
[23] R. Fletcher and S. Leyffer, User Manual of filterSQP, http://
www.mcs.anl.gov/∼leyffer/papers/SQP manual.pdf.
[24] C. Laird and A. Wächter, IPOPT, https://projects.coin-or.org/Ipopt.
[25] K. Abhishek, S. Leyffer, and J.T. Linderoth, FilMINT: An Outer
Approximation-Based Solver for Nonlinear Mixed Integer Programs, Argonne
National Laboratory, Mathematics and Computer Science Division, 2008.
[26] S. Drewes, Mixed Integer Second Order Cone Programming, PhD Thesis, June,
2009.
[27] P. Bonami, M. Kilinc, and J. Linderoth, Algorithms and Software for Convex
Mixed Integer Nonlinear Programs, 2009.
[28] J.P. Vielma, Portfolio Optimization Instances, http://www2.isye.gatech.edu/~jvielma/portfolio/.
[29] S. Drewes, MISOCP Test Instances, https://www3.mathematik.tu-darmstadt.de/index.php?id=491.
PERSPECTIVE REFORMULATION AND APPLICATIONS
OKTAY GÜNLÜK∗ AND JEFF LINDEROTH†
where F = R ∩ (R_+^{n−p} × B^p), B denotes {0, 1}, and

R := {(x, z) ∈ R_+^{n−p} × [0, 1]^p | f_j(x, z) ≤ 0  ∀ j = 1, . . . , m}.
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 61
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_3,
© Springer Science+Business Media, LLC 2012
or in “disaggregated” form

x_{ij} ≤ z_i   ∀ i ∈ I, j ∈ J.          (1.3)
is a bounded convex set. (Note that Γ can be convex even when some of
the functions fj defining it are non-convex.)
In this paper we study the convex hull description of sets closely related
to S. We present a number of examples where these simple sets appear as
substructures, and we demonstrate that utilizing the convex hull descrip-
tion of these sets helps solve the optimization problem efficiently. Closely
related to this work is the effort of Frangioni and Gentile [11, 13], who de-
rive a class of cutting planes that significantly strengthen the formulation
for MINLPs containing “on-off” type decisions with convex, separable, ob-
jective functions, and demonstrate that these inequalities are quite useful
in practice. The connection is detailed more in Section 5.3.
K^t = {x ∈ R^n : G^t(x) ≤ 0}
W^0 = {(x, z) ∈ R^{n+1} : x = 0, z = 0}

and

W^1 = {(x, z) ∈ R^{n+1} : f_i(x) ≤ 0 for i ∈ I, u ≥ x ≥ l, z = 1},

where u, l ∈ R_+^n, and I is the index set for the constraints. Clearly, both
W^0 and W^1 are bounded, and W^0 is a convex set. Furthermore, if W^1 is
also convex then we may write an extended formulation as

conv(W) = {(x, z) ∈ R^{n+1} : 1 ≥ λ ≥ 0,
               x = λ x^1 + (1 − λ) x^0,
               z = λ z^1 + (1 − λ) z^0,
               x^0 = 0, z^0 = 0, z^1 = 1,
               f_i(x^1) ≤ 0 for i ∈ I, u ≥ x^1 ≥ l}.
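For a single continuous variable the extended formulation can be checked mechanically. The sketch below is our own illustration (the function f and the bounds are made up): membership in conv(W) is tested by recovering the W^1 point x^1 = x/λ with λ = z.

```python
def in_conv_W(x, z, f, l, u, tol=1e-9):
    # Membership test for conv(W) via the extended formulation:
    # (x, z) = lambda*(x1, 1) + (1 - lambda)*(0, 0) with x1 feasible for W^1.
    # Here lambda = z, and x1 = x/z when z > 0 (single-variable sketch).
    if z < -tol or z > 1 + tol:
        return False
    if abs(z) <= tol:
        return abs(x) <= tol          # only the origin has z = 0
    x1 = x / z                        # the W^1 point the combination came from
    return l - tol <= x1 <= u + tol and f(x1) <= tol

f = lambda x: x * x - 4               # W^1 = {x in [0, 3] : x^2 <= 4} = [0, 2]
assert in_conv_W(1.0, 0.5, f, 0.0, 3.0)      # x/z = 2 is feasible for W^1
assert not in_conv_W(2.0, 0.5, f, 0.0, 3.0)  # x/z = 4 is infeasible
assert in_conv_W(0.0, 0.0, f, 0.0, 3.0)      # the origin (z = 0)
```

The division x/z is exactly the perspective operation that reappears in Lemma 3.2.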
When all fi (x) that define W 1 are polynomial functions, the convex
hull of W can be described in closed form in the original space of variables.
and
T 1 = (x, y, z) ∈ Rn+1+1 : fi (x) ≤ 0 i ∈ I, g(x) ≤ y, u ≥ x ≥ l, z = 1
where u, l ∈ Rn+ , and I = {1, . . . , t}.
Lemma 3.2. If T^1 is convex, then conv(T) = T^− ∪ T^0, where
T^− = {(x, y, z) ∈ R^{n+1+1} : f_i(x/z) ≤ 0 for i ∈ I, g(x/z) ≤ y/z,
                                uz ≥ x ≥ lz, 1 ≥ z > 0}.
Applying Lemma 3.2 gives the convex hull of S as the perspective of the
quadratic function defining the set. Note that when z > 0 the constraint
yz ≥ x^2 is the same as y/z ≥ (x/z)^2, and when z = 0 it implies that x = 0.
Lemma 3.3. conv(S) = S^c where
S^c = {(x, y, z) ∈ R^3 : yz ≥ x^2, uz ≥ x ≥ lz, 1 ≥ z ≥ 0, x, y ≥ 0}.
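The closure described by Lemma 3.3 can be spot-checked numerically. The sketch below, in plain Python, defines the extended-value perspective of f(x) = x^2 (the function that S^c encodes via yz ≥ x^2) and verifies midpoint convexity on random samples; the helper name `perspective` is ours, not the chapter's.

```python
import random

def perspective(x, z):
    """Closure of the perspective of f(x) = x**2: z*f(x/z) for z > 0,
    extended by 0 at (x, z) = (0, 0). A hypothetical helper for illustration;
    the set S^c above encodes the same function via y*z >= x**2."""
    if z > 0:
        return x * x / z
    if z == 0 and x == 0:
        return 0.0
    return float("inf")   # outside the closure

# midpoint-convexity spot check on the z > 0 region
random.seed(0)
for _ in range(1000):
    x1, z1 = random.uniform(-5, 5), random.uniform(0.01, 1.0)
    x2, z2 = random.uniform(-5, 5), random.uniform(0.01, 1.0)
    mid = perspective(0.5 * (x1 + x2), 0.5 * (z1 + z2))
    avg = 0.5 * perspective(x1, z1) + 0.5 * perspective(x2, z2)
    assert mid <= avg + 1e-6
```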
where S_i has the same form as the set S discussed in the previous sec-
tion except that the bounds u and l are replaced with u_i and l_i. Note that
if (w, x, y, z) ∈ Q̄ then (w, x, z) ∈ Q, and therefore proj_(w,x,z)(Q̄) ⊆ Q.
On the other hand, for any (w, x, z) ∈ Q, letting y_i = x_i^2 gives a point
(w, x, y, z) ∈ Q̄. Therefore, Q̄ is indeed an extended formulation of Q, or,
in other words, Q = proj_(w,x,z)(Q̄).
Before we present a convex hull description of Q̄ we first define some
basic properties of mixed-integer sets. First, recall that given a closed
set P ⊂ R^n, a point p ∈ P is called an extreme point of P if it cannot be
represented as p = (1/2)p^1 + (1/2)p^2 for p^1, p^2 ∈ P, p^1 ≠ p^2. The set P is
called pointed if it has extreme points. A pointed set P is called integral
with respect to (w.r.t.) a subset of the indices J if for any extreme point
p ∈ P, p_i ∈ Z for all i ∈ J.
Lemma 3.4 ([17]). For i = 1, 2 let P_i ⊂ R^{n_i} be a closed and pointed
set which is integral w.r.t. indices I_i. Let
then,
(i) P is integral with respect to I1 ∪ I2 .
(ii) conv(P) = {(x, y) ∈ R^{n_1+n_2} : x ∈ conv(P_1), y ∈ conv(P_2)}.
Lemma 3.5 ([17]). Let P ⊂ R^n be a given closed, pointed set and let
where a ∈ R^n.
(i) If P is integral w.r.t. J, then P is also integral w.r.t. J.
(ii) conv(P ) = P where
so that
Q̄ = {(w, x, y, z) ∈ R^{3n+1} : w ≥ Σ_{i=1}^{n} q_i y_i, (x, y, z) ∈ D}.
Notice that a given point p̄ = (w̄, x̄, z̄) satisfies the nonlinear inequalities
in the description of Q^c for a particular S ⊆ I if and only if one of the
following conditions holds: (i) z̄_i = 0 for some i ∈ S, or (ii) if all z̄_i > 0,
then w̄ ≥ Σ_{i∈S} q_i x̄_i^2 / z̄_i. Based on this observation it is possible to show
that these (exponentially many) inequalities are sufficient to describe the
convex hull of Q in the space of the original variables.
Lemma 3.7 ([17]). Q^c = proj_(w,x,z)(Q̄^c). Note that all of the expo-
nentially many inequalities that are used in the description of Q^c are indeed
necessary. To see this, consider a simple instance with u_i = l_i = q_i = 1
for all i ∈ I = {1, 2, . . . , n}. For a given S̄ ⊆ I, let p_S̄ = (w̄, x̄, z̄) where
w̄ = |S̄| − 1, z̄_i = 1 if i ∈ S̄, z̄_i = 0 otherwise, and x̄ = z̄. Note that
p_S̄ ∉ Q^c. As z̄_i = q_i x̄_i^2, inequality (Π) is satisfied by p_S̄ for S ⊆ I if and
only if
(|S̄| − 1) Π_{i∈S} z̄_i ≥ |S| Π_{i∈S} z̄_i.
Note that unless S ⊆ S̄, the term Π_{i∈S} z̄_i becomes zero and therefore
inequality (Π) is satisfied. In addition, inequality (Π) is satisfied whenever
|S̄| > |S|. Combining these two observations, we can conclude that the
only inequality violated by p_S̄ is the one with S = S̄. Due to its size, the
projected set is not practical for computational purposes, and we conclude
that it is more advantageous to work in the extended space, keeping the
variables y_i.
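The uniqueness claim above is easy to verify by brute force on a small instance. The sketch below enumerates all nonempty S ⊆ I for n = 4 and S̄ = {1, 2, 3} (arbitrary choices) and confirms that the product inequality fails only at S = S̄.

```python
from itertools import chain, combinations

n = 4
I = list(range(1, n + 1))
S_bar = {1, 2, 3}                                   # the chosen subset S̄
z_bar = {i: 1 if i in S_bar else 0 for i in I}      # z̄ = x̄, and u = l = q = 1
w_bar = len(S_bar) - 1                              # w̄ = |S̄| − 1

def prod(vals):
    p = 1
    for v in vals:
        p *= v
    return p

violated = []
for S in chain.from_iterable(combinations(I, r) for r in range(1, n + 1)):
    lhs = w_bar * prod(z_bar[i] for i in S)          # (|S̄| − 1) Π_{i∈S} z̄_i
    rhs = len(S) * prod(z_bar[i] for i in S)         # |S| Π_{i∈S} z̄_i
    if lhs < rhs:
        violated.append(set(S))

assert violated == [S_bar]   # only S = S̄ is violated
```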
3.5. A simple non-quadratic set. The simple 3-variable mixed-
integer set S introduced in Section 3.3 can be generalized to the following
set, studied by Aktürk, Atamtürk, and Gürel [1]:
C = {(x, y, z) ∈ R^2 × B : y ≥ x^{a/b}, uz ≥ x ≥ lz, x ≥ 0},
C^0 = {(0, y, 0) ∈ R^3 : y ≥ 0},
and
By applying Lemma 3.2, the convex hull of C is given by using the perspec-
tive of the function f(y, x) = y^b − x^a and scaling the resulting inequality
by z^a.
Lemma 3.8 (Aktürk, Atamtürk, Gürel [1]). The convex hull of C is
given by
C^c = {(x, y, z) ∈ R^3 : y^b z^{a−b} ≥ x^a, uz ≥ x ≥ lz, 1 ≥ z ≥ 0, x, y ≥ 0}.
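The equivalence between the perspective inequality z(x/z)^{a/b} ≤ y and its scaled form y^b z^{a−b} ≥ x^a can be checked numerically; the exponents a = 5, b = 2 below are hypothetical values, not taken from the text.

```python
import math
import random

a, b = 5, 2   # hypothetical exponents with a > b >= 1 (not from the text)

def persp(x, z):
    # perspective of f(x) = x**(a/b), defined for z > 0
    return z * (x / z) ** (a / b)

random.seed(1)
for _ in range(1000):
    x = random.uniform(0.1, 2.0)
    z = random.uniform(0.1, 1.0)
    # raising the perspective to the b-th power and multiplying by z**(a-b)
    # recovers x**a, so persp(x, z) <= y is equivalent to y**b * z**(a-b) >= x**a
    assert math.isclose(persp(x, z) ** b * z ** (a - b), x ** a, rel_tol=1e-9)
```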
ρ(f) := Σ_{(i,j)∈A} r_ij f_ij / (1 − f_ij/u_ij),
y_ij ≥ r_ij f_ij / (1 − f_ij/u_ij)   ∀(i, j) ∈ A,   (4.4)
Σ_{(i,j)∈A} y_ij ≤ β,
x ∈ R_+^{|A|×|K|}, y ∈ R_+^{|A|}, f ∈ R_+^{|A|}, z ∈ {0, 1}^{|A|}.
The inequality (4.8) together with inequalities (4.6) and (4.7) defines the set C
studied in Section 3.5, and therefore inequality (4.8) can be replaced with
its perspective counterpart
z_ij (x_ij/z_ij)^{a_ij/b_ij} ≤ y_ij   (4.9)
z∈P (4.12)
for all i ∈ I and t ∈ T so that the new variable y_it can replace the term
f_it(x_it) in the objective function. Using inequalities (4.11) and (4.13), we
can now replace inequality (4.14) with its perspective counterpart
a_it x_it^2 + b_it x_it z_it ≤ y_it z_it   (4.15)
to obtain a stronger formulation.
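A quick numeric check of why (4.15) is stronger: for z = 1 it coincides with y ≥ a x^2 + b x, while for fractional z it forces a larger y. The coefficients and the bound u below are made-up values for illustration.

```python
a_it, b_it = 2.0, 1.0   # hypothetical cost coefficients a_it, b_it
u = 0.3                 # hypothetical variable upper bound (x_it <= u * z_it)

def y_original(x, z):
    # smallest y permitted by the unstrengthened constraint y >= a*x**2 + b*x
    return a_it * x * x + b_it * x

def y_perspective(x, z):
    # smallest y permitted by the perspective form a*x**2 + b*x*z <= y*z, z > 0
    return (a_it * x * x + b_it * x * z) / z

# for z = 1 the two coincide; for fractional z the perspective bound is tighter
for z in (0.25, 0.5, 0.75, 1.0):
    x = u * z
    assert y_perspective(x, z) >= y_original(x, z) - 1e-12
assert abs(y_perspective(0.3, 1.0) - y_original(0.3, 1.0)) < 1e-12
```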
4.5. Stochastic service system design. Elhedhli [10] describes a
stochastic service system design problem (SSSD) modeled as a network of
M/M/1 queues. An instance is characterized by sets of customers M,
facilities N, and service levels K. There are binary decision variables x_ij
to denote if customer i’s demand is met by facility j and y_jk to denote if
facility j is operating at service level k. Customer i has a mean demand
rate of λ_i, and facility j has a mean service rate of μ_jk when operated at
service level k. There is a fixed cost c_ij of assigning customer i to facility
j, and a fixed cost f_jk of operating facility j at level k.
A straightforward formulation of the problem is not convex; however, by
introducing auxiliary variables v_j and z_jk, Elhedhli provides the following
convex MINLP formulation:
min Σ_{i∈M} Σ_{j∈N} c_ij x_ij + t Σ_{j∈N} v_j + Σ_{j∈N} Σ_{k∈K} f_jk y_jk
s.t. Σ_{i∈M} λ_i x_ij − Σ_{k∈K} μ_jk z_jk = 0   ∀j ∈ N
     Σ_{j∈N} x_ij = 1   ∀i ∈ M
     Σ_{k∈K} y_jk ≤ 1   ∀j ∈ N
     z_jk − y_jk ≤ 0   ∀j ∈ N, ∀k ∈ K   (4.16)
     z_jk − v_j/(1 + v_j) ≤ 0   ∀j ∈ N, ∀k ∈ K   (4.17)
     z_jk, v_j ≥ 0, x_ij, y_jk ∈ {0, 1}   ∀i ∈ M, j ∈ N, ∀k ∈ K   (4.18)
Instead of directly including the nonlinear constraints (4.17) in the for-
mulation, Elhedhli proposes linearizing the constraints at points (v_j, z_jk) =
(v_j^b, 1), b ∈ B, yielding
z_jk − v_j/(1 + v_j^b)^2 ≤ (v_j^b)^2/(1 + v_j^b)^2.   (4.19)
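Inequality (4.19) is just the tangent line of g(v) = v/(1+v) at v = v_j^b. The sketch below rederives the slope and intercept and checks that concavity of g makes the tangent a valid over-estimator; the point vb = 0.7 is an arbitrary choice.

```python
vb = 0.7   # a hypothetical linearization point v_j^b

def g(v):
    # right-hand side of (4.17): z_jk <= g(v_j) = v_j / (1 + v_j)
    return v / (1.0 + v)

slope = 1.0 / (1.0 + vb) ** 2           # g'(vb)
intercept = g(vb) - slope * vb          # tangent value at v = 0

# the tangent cut z <= slope*v + intercept rearranges to (4.19):
#   z - v/(1+vb)**2 <= vb**2/(1+vb)**2
assert abs(intercept - vb ** 2 / (1.0 + vb) ** 2) < 1e-12

# g is concave, so the tangent lies above g everywhere
for v in [0.0, 0.3, 0.7, 1.5, 10.0]:
    assert g(v) <= slope * v + intercept + 1e-12
```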
Elhedhli uses a dynamic cutting plane approach to add inequalities (4.19).
Notice that if y_jk = 0, then z_jk = 0 and v_j ≥ 0, and therefore inequal-
ity (4.17) can be replaced by its perspective counterpart
z_jk ≤ v_j / (1 + v_j/y_jk)   (4.20)
z_jk − v_j/(1 + v_j^b)^2 ≤ y_jk (v_j^b)^2/(1 + v_j^b)^2   (4.21)
which dominate the inequalities used in Elhedhli [10]. Note that the in-
equalities (4.21) could also be derived by applying a logical integer strength-
ening argument to the inequalities (4.19). The linearized perspective in-
equalities are called perspective cuts [11], which are discussed in greater
detail in Section 5.3. Computational results demonstrating the effect of
the perspective reformulation on this application are given in Section 6.2.
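The domination claim can be spot-checked: since y_jk ∈ [0, 1], the right-hand side of (4.21) never exceeds that of (4.19), so (4.21) implies (4.19) but not conversely. A sketch with an arbitrary linearization point:

```python
import random

vb = 0.7                                # hypothetical linearization point v_j^b
slope = 1.0 / (1.0 + vb) ** 2
C = vb ** 2 / (1.0 + vb) ** 2           # right-hand side of (4.19)

def satisfies_419(z, v):
    return z - slope * v <= C + 1e-12

def satisfies_421(z, v, y):
    return z - slope * v <= C * y + 1e-12

# with y in [0, 1], any point satisfying (4.21) also satisfies (4.19) ...
random.seed(2)
for _ in range(1000):
    z, v, y = random.random(), random.uniform(0.0, 5.0), random.random()
    if satisfies_421(z, v, y):
        assert satisfies_419(z, v)

# ... but not conversely: at y = 0 the perspective cut forces z <= slope * v
assert satisfies_419(0.15, 0.0) and not satisfies_421(0.15, 0.0, 0.0)
```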
4.6. Portfolio selection. A canonical optimization problem in finan-
cial engineering is to find a minimum variance portfolio that meets a given
minimum expected return requirement of ρ > 0, see [22]. In the prob-
lem, there is a set N of assets available for purchase. The expected return
of asset i ∈ N is given by αi , and the covariance of the returns between
pairs of assets is given in the form of a positive-definite matrix Q ∈ R^{n×n}.
There can be at most K different assets in the portfolio, and there are min-
imum and maximum buy-in thresholds for the assets chosen. A MINLP
formulation of the problem is
min{x^T Qx | e^T x = 1, α^T x ≥ ρ, e^T z ≤ K; l_i z_i ≤ x_i ≤ u_i z_i, z_i ∈ B ∀i ∈ N},
x^2/z − y ≤ 0,   (5.1)
x^2 − yz ≤ 0.   (5.2)
and the constraint function in (5.3) is convex. This, however, may intro-
duce yet another obstacle for NLP software, as the constraint function in
(5.3) is not differentiable at (x, y, z) = (0, 0, 0). In Section 6 we will show
some computational experiments aimed at demonstrating the effectiveness
of NLP software at handling perspective constraints in their various equiv-
alent forms.
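A small numeric illustration of the contrast between the equivalent forms: along a path with x fixed at a small nonzero value and z shrinking to zero, the left-hand side of (5.1) blows up while that of (5.2) stays bounded. The specific values are arbitrary.

```python
# Fix a point with x small but nonzero and y = 0, and let z shrink to zero.
x, y = 1e-3, 0.0
vals_51, vals_52 = [], []
for z in [1e-2, 1e-4, 1e-6, 1e-8]:
    vals_51.append(x * x / z - y)   # form (5.1): value and gradient blow up
    vals_52.append(x * x - y * z)   # form (5.2): stays bounded near z = 0
assert vals_51[-1] > 10.0           # (5.1) already exceeds 10 at z = 1e-8
assert all(abs(v - x * x) < 1e-15 for v in vals_52)
```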
The inequalities (4.20) from the SSSD problem from Section 4.5 are repre-
sentable as rotated SOC constraints using the relation
Inequality (5.8) is called the perspective cut and has been introduced by
Frangioni and Gentile [11]. In their paper, Frangioni and Gentile use these
cuts dynamically to build a tight formulation. It is possible to show [18]
that perspective cuts are indeed outer approximation cuts for the perspec-
tive reformulation for this MINLP and therefore adding all (infinitely many)
perspective cuts has the same strength as the perspective reformulation.
Furthermore, an interesting observation is that the perspective cuts
can also be obtained by first building a linear outer approximation of the
original nonlinear inequality v ≥ f (x) + cz, and then strengthening it using
a logical deductive argument. For example, in the SQUFL problem de-
scribed in Section 4.1, the outer approximation of the inequality y_ij ≥ x_ij^2
at a given point (x̄, ȳ, z̄) is
Using the observation that if z_i = 0, then x_ij = y_ij = 0, this inequality can
be strengthened to
The inequality (5.10) is precisely the perspective cut (5.8) for this instance.
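The validity and strength of such a strengthened cut can be spot-checked; x̄ = 0.4 below is an arbitrary linearization point, and the cut is written as y ≥ 2x̄x − x̄^2 z, the z-strengthened outer approximation of y ≥ x^2 from the SQUFL discussion.

```python
import random

xbar = 0.4   # hypothetical linearization point x̄_ij

def strengthened_cut(x, y, z):
    # y >= 2*xbar*x - xbar**2 * z, the z-strengthened outer-approximation cut
    return y >= 2 * xbar * x - xbar ** 2 * z - 1e-12

# z = 1 branch: any point with y >= x**2 satisfies the cut (standard OA)
random.seed(3)
for _ in range(1000):
    x = random.uniform(-2.0, 2.0)
    y = x * x + random.uniform(0.0, 1.0)
    assert strengthened_cut(x, y, 1)

# z = 0 branch: the "off" point x = y = 0 satisfies it with equality
assert strengthened_cut(0.0, 0.0, 0)

# for z in [0, 1] the strengthened right-hand side dominates the plain OA cut
for z in (0.0, 0.5, 1.0):
    assert 2 * xbar * 1.0 - xbar ** 2 * z >= 2 * xbar * 1.0 - xbar ** 2
```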
Following their work on perspective cuts, Frangioni and Gentile com-
putationally compare using an LP solver (where perspective cuts are added
dynamically) and using a second-order cone solver [13]. Based on their
experiments on instances of the unit commitment problem and the portfo-
lio optimization problem discussed earlier, they conclude that the dynamic
(linear) approximation approach is significantly better than an SOC ap-
proach. The LP approach offers significant advantages, such as fast resolves
in branch-and-bound, and the extensive array of cutting planes, branching
rules, and heuristics that are available in powerful commercial MILP soft-
ware. However, a dynamic cutting plane approach requires the use of the
callable library of the solver software to add the cuts. For practitioners,
an advantage of nonlinear automatic reformulation techniques is that they
may be directly implemented in a modeling language.
Here, we offer a simple heuristic to obtain some of the strength of the
perspective reformulation, while retaining the advantages of MILP software
to solve the subproblem and an algebraic modeling language to formulate
the instance. The heuristic works by choosing a set of points in advance
and writing the perspective cuts using these points. This essentially gives
a linear relaxation of the perspective formulation that uses piecewise linear
under-approximations of the nonlinear functions. Solving this underap-
proximating MILP provides a lower bound on the optimal solution value of
the MINLP. To obtain an upper bound, the integer variables may be fixed
at the values found by the solution to the MILP, and a continuous NLP
solved. We will demonstrate the effectiveness of this approach in the next
section.
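A minimal sketch of the cut-generation part of this heuristic (no MILP solver involved): pre-select breakpoints, write one perspective cut per breakpoint, and confirm that their maximum under-approximates the perspective function x^2/z. The breakpoint grid and bound are hypothetical choices.

```python
import random

# Hypothetical data: bound u and 11 evenly spaced breakpoints in [0, u].
u = 1.0
breakpoints = [u * k / 10 for k in range(11)]

def underestimate(x, z):
    """Best bound on y implied by the pre-written perspective cuts
    y >= 2*xb*x - xb**2 * z, one cut per breakpoint xb."""
    return max(2 * xb * x - xb * xb * z for xb in breakpoints)

# the cuts under-approximate the perspective function x**2/z on 0 <= x <= u*z
random.seed(4)
for _ in range(1000):
    z = random.uniform(0.05, 1.0)
    x = random.uniform(0.0, u) * z
    assert underestimate(x, z) <= x * x / z + 1e-9
```

Increasing the number of breakpoints tightens the under-approximation, mirroring the |B| experiment reported in Table 3.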
6. Computational results. The improvement in computational
performance that can be obtained by using the perspective reformulation
is exhibited on two families of instances, SQUFL (described in Section 4.1)
and SSSD (described in Section 4.5). We also demonstrate the behavior
of the various methodologies available (NLP, SOCP, LP) for solving the
relaxations. When demonstrating the behavior of LP to solve the relax-
ations, we use the heuristic approach described at the end of Section 5.3.
The reader interested in comparisons to a dynamic outer-approximation
approach is referred to the work of Frangioni and Gentile [13].
6.1. Separable quadratic uncapacitated facility location. Ran-
dom instances of SQUFL were generated similarly to the instances of Günlük,
Lee, and Weismantel [16]. For each facility i ∈ M, a location p_i is gen-
erated uniformly in [0, 1]^2, and the variable cost parameter was calculated
Table 1
Computational behavior of nonlinear formulations on SQUFL.
                  Original              Perspective
 m    n    z̄*    #sol   T̄      N̄       #sol   T̄    N̄
20 100 408.31 10 307 6,165 10 18 37
20 150 508.38 10 807 7,409 10 33 29
30 100 375.86 10 4,704 67,808 10 33 53
30 150 462.69 7 16,607 96,591 10 56 40
Table 2
Computational behavior of linear formulations on SQUFL.
                                  Original      Perspective
 m    n    z̄*    z̄_ub     z̄_lb    T̄     N̄        T̄     N̄
20 100 408.31 410.88 373.79 247 491 28 4
20 150 508.38 510.58 449.42 658 510 183 3
30 100 375.86 378.45 335.58 346 510 171 3
30 150 462.69 466.76 389.30 948 475 582 4
true optimal solution. In order to reduce the gap between lower and upper
bounds in this heuristic approach, a finer approximation of the nonlinear
function can be created by increasing |B|. Table 3 shows the results of
an experiment where the instances were approximated with more linear
inequalities, and the strengthened (perspective) reformulation was solved.
In the table, Ḡ(%) = 100(z̄ub − z̄lb )/z̄ub denotes the average gap between
lower and upper bounds in the heuristic approach. For these instances,
|B| = 50 breakpoints is typically sufficient to prove that the solution ob-
tained is within 1% of optimality. Note, however, the increase in CPU time
required to solve the linear relaxations.
Table 3
Impact of the number of piecewise-linear approximation points on gap and solution time.
m n |B| Ḡ(%) T̄ N̄
20 100 10 9.12% 28 4
20 100 25 1.23% 122 2
20 100 50 0.31% 367 3
20 150 10 11.98% 183 3
20 150 25 1.45% 841 6
20 150 50 0.41% 2338 6
30 100 10 11.32% 171 3
30 100 25 1.35% 1000 9
30 100 50 0.39% 1877 5
30 150 10 16.6% 582 4
30 150 25 2.09% 1433 6
30 150 50 0.48% 3419 6
Table 4
Bonmin performance on SSSD instances.
Each instance was solved six times with the following combination of
formulation and software:
1. The original MINLP was solved with Bonmin (v0.9);
2. The perspective strengthened MINLP was solved with Bonmin;
3. The original instance was formulated using CQI and solved with
Mosek (v5.0);
4. The perspective strengthened instance was formulated using CQI
and solved with Mosek;
5. The linear under-approximation, not strengthened with perspec-
tive cuts, was solved with the CPLEX (v11.1) MILP solver. After
fixing the integer variables to the solution of this problem, the
continuous NLP problem was solved with IPOPT (v3.4);
6. The linear under-approximation, strengthened with perspective
cuts, was solved with the CPLEX MILP solver. After fixing the
integer variables to the solution of this problem, the continuous
NLP problem was solved with IPOPT (v3.4).
NLP Solvers: Table 4 shows the results of solving each instance, with and
without the perspective strengthening, using the MINLP solver Bonmin.
Bonmin uses the interior-point-based solver IPOPT to solve nonlinear re-
laxations. The table lists the optimal solution value (z ∗ ) (or bounds on
the best optimal solution), the CPU time required (T ) in seconds, and the
number of nodes evaluated (#N ). A time limit of 4 hours was imposed.
In all cases, the NLP solver IPOPT failed at a node of the branch
and bound tree with the message “Error: Ipopt exited with error
Restoration failed.” For this instance, the NLP relaxation of SSSD
(especially the perspective-enhanced NLP relaxation) appears difficult to
solve. The fundamental issue is reliability, not time: when successful, all
NLP relaxations solved in less than one second. We performed a small ex-
periment designed to test the impact of the formulation and NLP software.
In this experiment, four different nonlinear formulations of the perspective
constraints were used, and the root node NLP relaxation was solved by
Table 5
Number of successful SSSD relaxation solutions (out of 10).
Formulation
Solver F1 F2 F3 F4
Ipopt 0 10 10 10
Conopt 0 0 10 10
SNOPT 0 3 0 7
three different NLP packages: Ipopt (v3.4), Conopt (v3.14S), and SNOPT
(v7.2-4). The root relaxation was also solved by Mosek (using the conic
formulation) to obtain the true optimal solution value. The four different
formulations of the perspective strengthening of the nonlinear constraints
(4.17) were the following:
while Mosek does not solve this instance in more than 1.9 million nodes.
For these SSSD instances, Bonmin is able to add strong valid inequalities
to improve performance, while Mosek does not add these inequalities.
Table 6
Mosek performance on SSSD instances.
Table 7
Linear/CPLEX performance on SSSD instances.
Table 8
Initial and final root lower bounds for SSSD instances.
REFERENCES
[25] R. Stubbs and S. Mehrotra, A branch-and-cut method for 0-1 mixed convex
programming, Mathematical Programming, 86 (1999), pp. 515–532.
[26] R.A. Stubbs, Branch-and-Cut Methods for Mixed 0-1 Convex Programming, PhD
thesis, Northwestern University, December 1996.
[27] J.F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over sym-
metric cones, Optimization Methods and Software, 11–12 (1999), pp. 625–653.
[28] M. Tawarmalani and N.V. Sahinidis, Global optimization of mixed integer non-
linear programs: A theoretical and computational study, Mathematical Pro-
gramming, 99 (2004), pp. 563–591.
PART II:
Disjunctive Programming
GENERALIZED DISJUNCTIVE PROGRAMMING:
A FRAMEWORK FOR FORMULATION AND
ALTERNATIVE ALGORITHMS FOR
MINLP OPTIMIZATION
IGNACIO E. GROSSMANN∗ AND JUAN P. RUIZ
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 93
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_4,
© Springer Science+Business Media, LLC 2012
94 IGNACIO E. GROSSMANN AND JUAN P. RUIZ
FORMULATION AND ALGORITHMS FOR MINLP OPTIMIZATION 95
formulation process intuitive, but it also keeps in the model the underlying
logic structure of the problem that can be exploited to find the solution
more efficiently. A particular case of these models is generalized disjunc-
tive programming (GDP) [32] the main focus of this paper, and which can
be regarded as a generalization of disjunctive programming [3]. Process
Design [15] and Planning and Scheduling [27] are some of the areas where
GDP formulations have been shown to be successful.
2.1. Formulation. The general structure of a GDP can be repre-
sented as follows [32]:
min Z = f(x) + Σ_{k∈K} c_k
s.t. g(x) ≤ 0
     ∨_{i∈D_k} [ Y_ik ;  r_ik(x) ≤ 0 ;  c_k = γ_ik ]   k ∈ K   (GDP)
     Ω(Y) = True
     x^lo ≤ x ≤ x^up
     x ∈ R^n, c_k ∈ R^1, Y_ik ∈ {True, False}, i ∈ D_k, k ∈ K
of unit operations to use (i.e. HX1, R1, R2, DC1) with a cost ck = γk ,
k ∈ {HX1, R1, R2, DC1} , in order to maximize the profit.
The generalized disjunctive program that represents the problem can
be formulated as follows:
max Z = P1 F8 − P2 F1 − Σ_{k∈K} c_k   (1)
s.t. F1 = F3 + F2   (2)
     F8 = F7 + F5   (3)
     [ Y_HX1 ;  F4 = F3 ;  c_HX1 = γ_HX1 ] ∨ [ ¬Y_HX1 ;  F4 = F3 = 0 ;  c_HX1 = 0 ]   (4)
     [ Y_R2 ;  F5 = β1 F4 ;  c_R2 = γ_R2 ] ∨ [ ¬Y_R2 ;  F5 = F4 = 0 ;  c_R2 = 0 ]   (5)
     [ Y_R1 ;  F6 = β2 F2 ;  c_R1 = γ_R1 ] ∨ [ ¬Y_R1 ;  F6 = F2 = 0 ;  c_R1 = 0 ]   (6)
     [ Y_DC1 ;  F7 = β3 F6 ;  c_DC1 = γ_DC1 ] ∨ [ ¬Y_DC1 ;  F7 = F6 = 0 ;  c_DC1 = 0 ]   (7)
where (1) represents the objective function; (2) and (3) are the global
constraints representing the mass balances around the splitter and mixer,
respectively; the disjunctions (4)-(7) represent the existence or non-
existence of the unit operation k, k ∈ {HX1, R1, R2, DC1}, with their
respective characteristic equations, where β is the ratio between the inlet
and outlet flows; and (8) and (9) are the logic propositions that enforce the
selection of DC1 if and only if R1 is chosen, and of HX1 if and only if
R2 is chosen. For the sake of simplicity we have presented here a simple
linear model. In the actual application to a process problem there would
be hundreds or thousands of nonlinear equations.
Ay ≥ a
x ∈ R^n, ν^ik ∈ R^n, c_k ∈ R^1, y_ik ∈ {0, 1}, i ∈ D_k, k ∈ K.
spatial branch and bound method include BARON [34], LINDOGlobal [26],
and Couenne [5].
2.3.2. Logic-Based Methods. In order to fully exploit the logic
structure of GDP problems, two other solution methods have been proposed
for the case of convex nonlinear GDP, namely, the Disjunctive Branch
and Bound method [22], which builds on the concept of Branch and Bound
method by Beaumont [4] and the Logic-Based Outer-Approximation
method [40].
The basic idea in the disjunctive Branch and Bound method is
to directly branch on the constraints corresponding to particular terms
in the disjunctions, while considering the hull relaxation of the remaining
disjunctions. Although the tightness of the relaxation at each node is com-
parable with the one obtained when solving the HR reformulation with a
MINLP solver, the size of the problems solved are smaller and the numerical
robustness is improved.
For the case of Logic-Based Outer-Approximation methods, sim-
ilar to the case of OA for MINLP, the main idea is to solve iteratively a
master problem given by a linear GDP, which will give a lower bound of
the solution and an NLP subproblem that will give an upper bound. As
described in Turkay and Grossmann [40], for fixed values of the Boolean
variables, Y_îk = True and Y_ik = False with i ≠ î, the corresponding NLP
subproblem (SNLP) is as follows:
min Z = f(x) + Σ_{k∈K} c_k
s.t. g(x) ≤ 0
     r_ik(x) ≤ 0, c_k = γ_ik   for Y_ik = True, i ∈ D_k, k ∈ K   (SNLP)
     x^lo ≤ x ≤ x^up
     x ∈ R^n, c_k ∈ R^1.
∨_{i∈D_k} [ Y_ik ;  r_ik(x^l) + ∇r_ik(x^l)(x − x^l) ≤ 0, l ∈ L_ik ;  c_k = γ_ik ]   k ∈ K   (MLGDP)
Ω(Y) = True
x^lo ≤ x ≤ x^up
α ∈ R^1, x ∈ R^n, c_k ∈ R^1, Y_ik ∈ {True, False}, i ∈ D_k, k ∈ K.
It should be noted that before applying the above master problem it is
necessary to solve various subproblems (SNLP) for different values of the
Boolean variables Yik so as to produce at least one linear approximation
of each of the terms i ∈ Dk in the disjunctions k ∈ K. As shown by
Turkay and Grossmann [40] selecting the smallest number of subproblems
amounts to solving a set covering problem, which is of small size and easy
to solve. It is important to note that the number of subproblems solved
in the initialization is often small since the combinatorial explosion that
one might expect is in general limited by the propositional logic. This
property frequently arises in Process Networks since they are often modeled
by using two-term disjunctions where one of the terms is always linear (see
the remark below). Moreover, terms in the disjunctions that contain only linear
functions need not be considered for generating the subproblems. Also, it
should be noted that the master problem can be reformulated as an MILP
by using the big-M or Hull reformulation, or else solved directly with a
disjunctive branch and bound method.
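The initialization step can be sketched with a greedy stand-in for the set-covering problem (the exact minimum-cover problem is itself a small MILP); the disjunct terms and candidate Boolean assignments below are made-up placeholders, not data from the text.

```python
# Terms needing a linearization point and the logically feasible Boolean
# assignments below are made-up placeholders, not data from the chapter.
terms = {"R1", "R2", "DC1", "HX1"}
candidates = [
    {"R1", "DC1"},
    {"R2", "HX1"},
    {"R1", "R2", "DC1", "HX1"},
]

def greedy_cover(universe, collection):
    """Greedy set-covering heuristic: repeatedly pick the candidate covering
    the most still-uncovered terms."""
    chosen, uncovered = [], set(universe)
    while uncovered:
        best = max(collection, key=lambda s: len(s & uncovered))
        chosen.append(best)
        uncovered -= best
    return chosen

cover = greedy_cover(terms, candidates)
assert set().union(*cover) == terms
```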
Remark. In the context of process networks the disjunctions in the
(GDP) formulation typically arise for each unit i in the following form:
[ Y_i ;  r_i(x) ≤ 0 ;  c_i = γ_i ] ∨ [ ¬Y_i ;  B^i x = 0 ;  c_i = 0 ]   i ∈ I
in which the inequalities ri apply and a fixed cost γi is incurred if the unit
is selected (Yi ); otherwise (¬Yi ) there is no fixed cost and a subset of the
x variables is set to zero.
2.3.3. Example. We present here numerical results on an example
problem dealing with the synthesis of a process network that was originally
formulated by Duran and Grossmann [10] as an MINLP problem, and
later by Turkay and Grossmann [40] as a GDP problem. Figure 2 shows
the superstructure that involves the possible selection of 8 processes. The
Boolean variables Yj denote the existence or non-existence of processes 1-8.
The global optimal solution is Z ∗ =68.01, and consists of the selection of
processes 2, 4, 6, and 8.
The model in the form of the GDP problem involves disjunctions for
the selection of units, and propositional logic for the relationship of these
units. Each disjunction contains the equation for each unit (these relax as
convex inequalities). The model is as follows:
Objective function:
min Z = c1 + c2 + c3 + c4 + c5 + c6 + c7 + c8 + x2 − 10x3 + x4
        − 15x5 − 40x9 + 15x10 + 15x14 + 80x17 − 65x18 + 25x19 − 60x20
        + 35x21 − 80x22 − 35x25 + 122
Material balances at mixing/splitting points:
x3 + x5 − x6 − x11 = 0
x13 − x19 − x21 = 0
x17 − x9 − x16 − x25 = 0
x11 − x12 − x15 = 0
x6 − x7 − x8 = 0
x23 − x20 − x22 = 0
x23 − x14 − x24 = 0
Specifications on the flows:
x10 − 0.8x17 ≤ 0
x10 − 0.4x17 ≥ 0
x12 − 5x14 ≤ 0
x12 − 2x14 ≥ 0
Disjunctions:
Unit 1:  [ Y1 ;  e^{x3} − 1 − x2 ≤ 0 ;  c1 = 5 ] ∨ [ ¬Y1 ;  x2 = x3 = 0 ;  c1 = 0 ]
Unit 2:  [ Y2 ;  e^{x5/1.2} − 1 − x4 ≤ 0 ;  c2 = 8 ] ∨ [ ¬Y2 ;  x4 = x5 = 0 ;  c2 = 0 ]
Unit 3:  [ Y3 ;  1.5x9 − x8 + x10 ≤ 0 ;  c3 = 6 ] ∨ [ ¬Y3 ;  x8 = x9 = x10 = 0 ;  c3 = 0 ]
Unit 4:  [ Y4 ;  1.5(x12 + x14) − x13 = 0 ;  c4 = 10 ] ∨ [ ¬Y4 ;  x12 = x13 = x14 = 0 ;  c4 = 0 ]
Unit 5:  [ Y5 ;  x15 − 2x16 = 0 ;  c5 = 6 ] ∨ [ ¬Y5 ;  x15 = x16 = 0 ;  c5 = 0 ]
Unit 6:  [ Y6 ;  e^{x20/1.5} − 1 − x19 ≤ 0 ;  c6 = 7 ] ∨ [ ¬Y6 ;  x19 = x20 = 0 ;  c6 = 0 ]
Unit 7:  [ Y7 ;  e^{x22} − 1 − x21 ≤ 0 ;  c7 = 4 ] ∨ [ ¬Y7 ;  x21 = x22 = 0 ;  c7 = 0 ]
Unit 8:  [ Y8 ;  e^{x18} − 1 − x10 − x17 ≤ 0 ;  c8 = 5 ] ∨ [ ¬Y8 ;  x10 = x17 = x18 = 0 ;  c8 = 0 ]
Propositional Logic:
Y1 ⇒ Y3 ∨ Y4 ∨ Y5 ; Y2 ⇒ Y3 ∨ Y4 ∨ Y5 ; Y3 ⇒ Y1 ∨ Y2 ; Y3 ⇒ Y8
Y4 ⇒ Y1 ∨ Y2 ; Y4 ⇒ Y6 ∨ Y7 ; Y5 ⇒ Y1 ∨ Y2 ; Y5 ⇒ Y8
Y6 ⇒ Y4 ; Y7 ⇒ Y4
Y8 ⇒ Y3 ∨ Y5 ∨ (¬Y3 ∧ ¬Y5 )
Specifications:
Y1 ∨ Y2 ; Y4 ∨ Y5 ; Y6 ∨ Y7
Variables:
Table 1
Results using different GDP solution methods.
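As a consistency check, the reported optimal selection (processes 2, 4, 6, and 8) can be tested against the propositional logic and specifications listed above; the sketch reads the specifications as plain (inclusive) disjunctions.

```python
# Selection reported as globally optimal: processes 2, 4, 6, and 8.
Y = {j: j in {2, 4, 6, 8} for j in range(1, 9)}

implications = [
    (Y[1], Y[3] or Y[4] or Y[5]), (Y[2], Y[3] or Y[4] or Y[5]),
    (Y[3], Y[1] or Y[2]),         (Y[3], Y[8]),
    (Y[4], Y[1] or Y[2]),         (Y[4], Y[6] or Y[7]),
    (Y[5], Y[1] or Y[2]),         (Y[5], Y[8]),
    (Y[6], Y[4]),                 (Y[7], Y[4]),
    (Y[8], Y[3] or Y[5] or (not Y[3] and not Y[5])),
]
assert all((not antecedent) or consequent
           for antecedent, consequent in implications)

# specifications, read here as plain (inclusive) disjunctions
assert (Y[1] or Y[2]) and (Y[4] or Y[5]) and (Y[6] or Y[7])
```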
min Z = d^T x + Σ_{k∈K} c_k
s.t. Bx ≤ b
     ∨_{i∈D_k} [ Y_ik ;  A^ik x ≤ a^ik ;  c_k = γ_ik ]   k ∈ K   (LGDP)
     Ω(Y) = True
     x^lo ≤ x ≤ x^up
     x ∈ R^n, c_k ∈ R^1, Y_ik ∈ {True, False}, i ∈ D_k, k ∈ K.
work of Sawaya and Grossmann [35] two issues may arise. Firstly, the con-
tinuous relaxation of LBM is often weak, leading to a large number of nodes
enumerated in the branch and bound procedure. Secondly, the strength-
ening obtained in the relaxation of LHR may not compensate for the in-
crease in its size due to the disaggregated variables and new constraints, resulting
in a high computational effort. In order to overcome these issues, Sawaya
and Grossmann [35] proposed a cutting-plane methodology that consists
of generating cutting planes from the LHR and using them to
strengthen the relaxation of LBM. It is important to note, however, that in
the last few years MIP solvers have improved significantly in using the
problem structure to reduce the size of the formulation automatically. As
a result, the emphasis should be placed on the strength of the relaxations
rather than on the size of the formulations. With this in mind, we present next
the latest developments in linear GDPs.
Sawaya [36] proved that any Linear Generalized Disjunctive Program
(LGDP) that involves Boolean and continuous variables can be equivalently
formulated as a Disjunctive Program (DP), that only involves continuous
variables. This means that we are able to exploit the wealth of theory
behind DP from Balas [2, 3] in order to solve LGDP more efficiently.
One of the properties of disjunctive sets is that they can be expressed
in many different equivalent forms. Among these forms, two extreme ones
are the Conjunctive Normal Form (CNF), which is expressed as the inter-
section of elementary sets (i.e. sets that are the union of half spaces), and
the Disjunctive Normal Form (DNF), which is expressed as the union of
polyhedra. One important result in Disjunctive Programming Theory, as
presented in the work of Balas [3], is that we can systematically generate
a set of equivalent DP formulations going from the CNF to the DNF by
using an operation called basic step (Theorem 2.1 [3]), which preserves
regularity. A basic step is defined as follows. Let F be the disjunctive set
in regular form (RF) given by F = ∩_{j∈T} S_j, where S_j = ∪_{i∈Q_j} P_i, with
P_i a polyhedron, i ∈ Q_j. For k, l ∈ T, k ≠ l, a basic step consists in
replacing S_k ∩ S_l with S_kl = ∪_{i∈Q_k, j∈Q_l} (P_i ∩ P_j). Note that a basic
step involves intersecting a
min Z = x2
s.t. 0.5x1 + x2 ≤ 1
     [ Y1 ;  x1 = 0 ;  x2 = 0 ] ∨ [ ¬Y1 ;  x1 = 1 ;  0 ≤ x2 ≤ 1 ]   (LGDP1)
     0 ≤ x1, x2 ≤ 1
     x1, x2 ∈ R, Y1 ∈ {True, False}.
min Z = x2
s.t. [ Y1 ;  x1 = 0 ;  x2 = 0 ;  0.5x1 + x2 ≤ 1 ]
       ∨ [ ¬Y1 ;  x1 = 1 ;  0 ≤ x2 ≤ 1 ;  0.5x1 + x2 ≤ 1 ]   (LGDP2)
     0 ≤ x1, x2 ≤ 1
     x1, x2 ∈ R, Y1 ∈ {True, False}.
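The tightening produced by the basic step can be seen numerically. In the sketch below the hull relaxations of (LGDP1) and (LGDP2) were derived by hand (a sketch, not taken verbatim from the text): the point (0.5, 0.5) is feasible for the former but cut off by the latter.

```python
import random

def in_relax_lgdp1(x1, x2, tol=1e-9):
    # conv({(0,0)} ∪ {1}×[0,1]) is the triangle 0 <= x2 <= x1 <= 1;
    # the global constraint 0.5*x1 + x2 <= 1 is intersected afterwards
    return (-tol <= x2 <= x1 + tol <= 1 + tol) and 0.5 * x1 + x2 <= 1 + tol

def in_relax_lgdp2(x1, x2, tol=1e-9):
    # after the basic step the second disjunct shrinks to {1}×[0, 0.5],
    # so the hull is the smaller triangle 0 <= x2 <= 0.5*x1, x1 <= 1
    return -tol <= x2 <= 0.5 * x1 + tol and x1 <= 1 + tol

# (0.5, 0.5) survives the weaker relaxation but not the stronger one
assert in_relax_lgdp1(0.5, 0.5) and not in_relax_lgdp2(0.5, 0.5)

# and the stronger relaxation is contained in the weaker one
random.seed(5)
for _ in range(1000):
    x1, x2 = random.random(), random.random()
    if in_relax_lgdp2(x1, x2):
        assert in_relax_lgdp1(x1, x2)
```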
min Z = lt
s.t. lt ≥ x_i + L_i   ∀i ∈ N
     [ Y_ij^1 ;  x_i + L_i ≤ x_j ] ∨ [ Y_ij^2 ;  x_j + L_j ≤ x_i ]
       ∨ [ Y_ij^3 ;  y_i − H_i ≥ y_j ] ∨ [ Y_ij^4 ;  y_j − H_j ≥ y_i ]   ∀i, j ∈ N, i < j
     x_i ≤ UB_i − L_i   ∀i ∈ N
     H_i ≤ y_i ≤ W   ∀i ∈ N
     lt, x_i, y_i ∈ R_+^1, Y_ij^1, Y_ij^2, Y_ij^3, Y_ij^4 ∈ {True, False}   ∀i, j ∈ N, i < j.
In Table 2, the approach using basic steps to obtain stronger relax-
ations is compared with the original formulation.
Table 2
Comparison of sizes and lower bounds between original and new MIP reformulations.
Ω(Y) = True
x^lo ≤ x ≤ x^up
x ∈ R^n, c_k ∈ R^1, Y_ik ∈ {True, False}, i ∈ D_k, k ∈ K
where f̄, r̄_ik, ḡ are convex and the following inequalities are satisfied:
f̄(x) ≤ f(x), r̄_ik(x) ≤ r_ik(x), ḡ(x) ≤ g(x). Note that suitable convex un-
derestimators for these functions can be found in Tawarmalani and Sahini-
dis [39].
The feasible region of (RGDPNC) can be relaxed by replacing each
disjunction by its convex hull. This relaxation yields the following convex
NLP:
min Z = f̄(x) + Σ_{k∈K} Σ_{i∈D_k} γ_ik y_ik
s.t. x = Σ_{i∈D_k} ν^ik   k ∈ K
     ḡ(x) ≤ 0
     y_ik r̄_ik(ν^ik/y_ik) ≤ 0   i ∈ D_k, k ∈ K   (RGDPRNC)
     0 ≤ ν^ik ≤ y_ik x^up   i ∈ D_k, k ∈ K
     Σ_{i∈D_k} y_ik = 1   k ∈ K
     Ay ≥ a
     x ∈ R^n, ν^ik ∈ R^n, c_k ∈ R^1, y_ik ∈ [0, 1], i ∈ D_k, k ∈ K.
As proved in Lee and Grossmann [23] the solution of this NLP formulation
leads to a lower bound of the global optimum.
The second step consists in using the above relaxation to predict lower
bounds within a spatial branch and bound framework. The main steps in
this implementation are described in Figure 4. The algorithm starts by
obtaining a local solution of the nonconvex GDP problem by solving the
MINLP reformulation with a local optimizer (e.g. DICOPT), which pro-
vides an upper bound of the solution (Z^U). Then, a bound contraction pro-
cedure is performed as described by Zamora and Grossmann [48]. Finally,
a partial branch and bound method is used on RGDPNC as described in
Lee and Grossmann [23], which consists in branching only on the Boolean
variables until a node with all the Boolean variables fixed is reached. At
this point a spatial branch and bound procedure is performed as described
in Quesada and Grossmann [30].
While the method proved to be effective in solving several problems,
a major question is whether one might be able to obtain stronger lower
bounds to improve the computational efficiency.
Recently, Ruiz and Grossmann [33] proposed an enhanced methodology that builds on the work of Sawaya [36] to obtain stronger relaxations. The basic idea consists of relaxing the nonconvex terms in the GDP using valid linear over- and underestimators prior to the application of basic steps. This leads to a new linear GDP whose continuous relaxation is tighter and valid for the original nonconvex GDP problem. The implementation of basic steps is not trivial, so Ruiz and Grossmann [33] proposed a set of rules that aims at keeping the formulation small while improving the relaxation. Among other results, it was shown that intersecting the global constraints with the disjunctions leads to a linear GDP with the same number of disjuncts but a stronger relaxation.
Table 3
Reactor characteristics.
The bilinear GDP model, which maximizes the profit, can be stated
as follows:
max Z = θFX − γF − CP
s.t. FX ≤ d
     [Y_11; F = α_1 X + β_1; X_1^lo ≤ X ≤ X_1^up; CP = Cp_1]
       ∨ [Y_21; F = α_2 X + β_2; X_2^lo ≤ X ≤ X_2^up; CP = Cp_2]      (GDP1^NC)
     Y_11 ∨ Y_21 = True
     X, F, CP ∈ R^1, F^lo ≤ F ≤ F^up, Y_11, Y_21 ∈ {True, False}
max Z = θP − γF − CP
s.t. P ≤ d
     P ≤ F X^lo + F^up X − F^up X^lo
     P ≤ F X^up + F^lo X − F^lo X^up      (GDP1^RLP0)
     P ≥ F X^lo + F^lo X − F^lo X^lo
     P ≥ F X^up + F^up X − F^up X^up
     [Y_11; F = α_1 X + β_1; X_1^lo ≤ X ≤ X_1^up; CP = Cp_1]
       ∨ [Y_21; F = α_2 X + β_2; X_2^lo ≤ X ≤ X_2^up; CP = Cp_2]
     Y_11 ∨ Y_21 = True
     X, F, CP ∈ R^1, F^lo ≤ F ≤ F^up, Y_11, Y_21 ∈ {True, False}.
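The four linear inequalities bounding P above are the McCormick envelopes of the bilinear term FX. A small numerical sketch of how they bracket the product (the bounds and test points below are illustrative assumptions, not data from the example):

```python
def mccormick_bounds(F, X, Flo, Fup, Xlo, Xup):
    """McCormick under/over-estimates of the bilinear term F*X at a point (F, X)."""
    lower = max(F * Xlo + Flo * X - Flo * Xlo,   # P >= F*Xlo + Flo*X - Flo*Xlo
                F * Xup + Fup * X - Fup * Xup)   # P >= F*Xup + Fup*X - Fup*Xup
    upper = min(F * Xlo + Fup * X - Fup * Xlo,   # P <= F*Xlo + Fup*X - Fup*Xlo
                F * Xup + Flo * X - Flo * Xup)   # P <= F*Xup + Flo*X - Flo*Xup
    return lower, upper

# The envelopes bracket the true product everywhere in the box.
Flo, Fup, Xlo, Xup = 0.0, 10.0, 0.2, 0.9
for F in (0.0, 2.5, 10.0):
    for X in (0.2, 0.55, 0.9):
        lo, up = mccormick_bounds(F, X, Flo, Fup, Xlo, Xup)
        assert lo <= F * X <= up
```

The envelopes are exact at the corners of the [F^lo, F^up] × [X^lo, X^up] box, which is why the relaxation degrades only in the interior.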
Figure 6 shows the actual feasible region of (GDP1^NC) and the projection onto the F–X space of the hull relaxations of (GDP1^RLP0) and (GDP1^RLP1); clearly, the feasible space of (GDP1^RLP1) is tighter than that of (GDP1^RLP0). Notice that in this case the choice of reactor II is infeasible.
Example: water treatment network. This example corresponds to the synthesis problem of a distributed wastewater multicomponent network (see Figure 7), taken from Galan and Grossmann [12]. Given a set of process liquid streams with known composition, a set of technologies for the removal of pollutants, and a set of mixers and splitters, the objective is to find the interconnections of the technologies and their flowrates that meet the specified discharge composition of pollutant at minimum total
cost. Discrete choices involve deciding what equipment to use for each
treatment unit.
Lee and Grossmann [23] formulated this problem as the following noncon-
vex GDP problem:
min Z = Σ_{k∈PU} CP_k
s.t. f_k^j = Σ_{i∈M_k} f_i^j   ∀j, k ∈ MU
     Σ_{i∈S_k} f_i^j = f_k^j   ∀j, k ∈ SU
     Σ_{i∈S_k} ζ_i^k = 1   k ∈ SU
     f_i^j = ζ_i^k f_k^j   ∀j, i ∈ S_k, k ∈ SU
     ∨_{h∈D_k} [ YP_k^h;  f_i^j = β_k^{jh} f_{i'}^j, i ∈ OPU_k, i' ∈ IPU_k, ∀j;
                 F_k = Σ_j f_i^j, i ∈ OPU_k;  CP_k = ∂_k^h F_k ]   k ∈ PU
0 ≤ ζ_i^k ≤ 1   ∀i, k
0 ≤ f_i^j, f_k^j   ∀i, j, k
0 ≤ CP_k   ∀k
YP_k^h ∈ {True, False}   ∀h ∈ D_k, ∀k ∈ PU.
The problem involves 9 discrete variables and 114 continuous variables
with 36 bilinear terms.
As can be seen in Table 4, an improved lower bound was obtained (i.e., 431.9 vs. 400.66), which is a direct indication of the reduction of the relaxed feasible region. The column "Best Lower Bound" can be used as an indicator of the performance of the proposed set of rules for applying basic steps. Note that the lower bound obtained with this new approach is the same as the one obtained by solving the relaxed DNF, which is quite remarkable. A further indication of tightening is shown in Table 5, where numerical results of the branch-and-bound algorithm proposed in Section 6 are presented. As can be seen, the number of nodes that the spatial branch-and-bound algorithm requires before finding the global solution is significantly reduced.
Table 4
Comparison of lower bounds obtained using different relaxations.
Table 5
Performance using different relaxations within a spatial B&B.
Table 6 shows the size of the LP relaxation obtained in each case. Note that although the proposed methodology leads to a significant increase in the size of the formulation, this does not translate proportionally into the solution time of the resulting LP. This behavior can be understood by considering that, in general, the LP presolver will take advantage of the particular structures of these LPs.
3. Conclusions. In this paper we have provided an overview of the
Generalized Disjunctive Programming Framework. We presented different
solution strategies that exploit the underlying logic structure of the formu-
lations with particular focus on how to develop formulations that lead to
Table 6
Size of the LP relaxation for example problems.
REFERENCES
[1] Abhishek K., Leyffer S., and Linderoth J.T., FilMINT: An Outer-
Approximation-Based Solver for Nonlinear Mixed Integer Programs,
ANL/MCS-P1374-0906, Argonne National Laboratory, 2006.
[2] Balas E., Disjunctive Programming, Annals of Discrete Mathematics, 5, 3–51, 1979.
[3] Balas E., Disjunctive Programming and a hierarchy of relaxations for discrete
optimization problems, SIAM J. Alg. Disc. Meth., 6, 466–486, 1985.
[4] Beaumont N., An Algorithm for Disjunctive Programs, European Journal of Op-
erations Research, 48, 362–371, 1991.
[5] Belotti P., Lee J., Liberti L., Margot F., and Wächter A., Branching and
bounds tightening techniques for non-convex MINLP, Optimization Methods
and Software, 24:4, 597–634, 2009.
[6] Benders J.F., Partitioning procedures for solving mixed-variables programming
problems, Numer.Math., 4, 238–252, 1962.
[7] Borchers B. and Mitchell J.E., An Improved Branch and Bound Algorithm for
Mixed Integer Nonlinear Programming, Computers and Operations Research,
21, 359–367, 1994.
[8] Bonami P., Biegler L.T., Conn A.R., Cornuejols G., Grossmann I.E., Laird
C.D., Lee J. , Lodi A. , Margot F., Sawaya N., and Wächter A. , An
algorithmic framework for convex mixed integer nonlinear programs, Discrete
Optimization, 5, 186–204, 2008.
[9] Brooke A., Kendrick D., Meeraus A., and Raman R., GAMS, a User’s Guide,
GAMS Development Corporation, Washington, 1998.
[10] Duran M.A. and Grossmann I.E., An Outer-Approximation Algorithm for a
Class of Mixed-integer Nonlinear Programs, Math Programming, 36, p. 307,
1986.
[11] Fletcher R. and Leyffer S., Solving Mixed Integer Nonlinear Programs by
Outer-Approximation, Math Programming, 66, p. 327, 1994.
[12] Galan B. and Grossmann I.E., Optimal Design of Distributed Wastewater Treat-
ment Networks, Ind. Eng. Chem. Res., 37, 4036–4048, 1998.
[13] Geoffrion A.M., Generalized Benders decomposition, JOTA, 10, 237–260, 1972.
[14] Grossmann I.E., Review of Non-Linear Mixed Integer and Disjunctive Programming Techniques for Process Systems Engineering, Optimization and Engineering, 3, 227–252, 2002.
[15] Grossmann I.E., Caballero J.A., and Yeomans H., Advances in Mathematical
Programming for Automated Design, Integration and Operation of Chemical
Processes, Korean J. Chem. Eng., 16, 407–426, 1999.
[16] Grossmann I.E. and Lee S., Generalized Convex Disjunctive Programming: Non-
linear Convex Hull Relaxation, Computational Optimization and Applica-
tions, 26, 83–100, 2003.
[17] Gupta O.K. and Ravindran V., Branch and Bound Experiments in Convex Non-
linear Integer Programming, Management Science, 31:12, 1533–1546, 1985.
[18] Hooker J.N. and Osorio M.A., Mixed logical-linear programming, Discrete Ap-
plied Mathematics, 96–97, 395–442, 1999.
[19] Hooker J.N., Logic-Based Methods for Optimization: Combining Optimization
and Constraint Satisfaction, Wiley, 2000.
[20] Horst R. and Tuy H., Global Optimization deterministic approaches, 3rd Ed,
Springer-Verlag, 1996.
[21] Kallrath J., Mixed Integer Optimization in the Chemical Process Industry: Ex-
perience, Potential and Future, Trans. I .Chem E., 78, 809–822, 2000.
[22] Lee S. and Grossmann I.E., New Algorithms for Nonlinear Generalized Dis-
junctive Programming, Computers and Chemical Engineering, 24, 2125–2141,
2000.
[23] Lee S. and Grossmann I.E., Global optimization of nonlinear generalized disjunc-
tive programming with bilinear inequality constraints: application to process
networks, Computers and Chemical Engineering, 27, 1557–1575, 2003.
[24] Leyffer S., Integrating SQP and Branch and Bound for Mixed Integer Nonlinear
Programming, Computational Optimization and Applications, 18, 295–309,
2001.
[25] Liberti L., Mladenovic N., and Nannicini G., A good recipe for solving MINLPs, in Matheuristics: Hybridizing Metaheuristics and Mathematical Programming, Annals of Information Systems 10, Springer, 2009.
[26] Lindo Systems Inc., LindoGLOBAL Solver.
[27] Mendez C.A., Cerda J., Grossmann I.E., Harjunkoski I., and Fahl M.,
State-of-the-art Review of Optimization Methods for Short-Term Scheduling
of Batch Processes, Comput. Chem. Eng., 30, p. 913, 2006.
[28] Nemhauser G.L. and Wolsey L.A., Integer and Combinatorial Optimization,
Wiley-Interscience, 1988.
[29] Quesada I. and Grossmann I.E., An LP/NLP Based Branch and Bound Algo-
rithm for Convex MINLP Optimization Problems, Computers and Chemical
Engineering, 16, 937–947, 1992.
[30] Quesada I. and Grossmann I.E., Global optimization of bilinear process networks
with multicomponent flows, Computers and Chemical Engineering, 19:12,
1219–1242, 1995.
[31] Raman R. and Grossmann I.E., Relation Between MILP Modelling and Logical
Inference for Chemical Process Synthesis, Computers and Chemical Engineer-
ing, 15, 73, 1991.
[32] Raman R. and Grossmann I.E., Modelling and Computational Techniques for
Logic-Based Integer Programming, Computers and Chemical Engineering, 18,
p. 563, 1994.
[33] Ruiz J.P. and Grossmann I.E., Strengthening the lower bounds for bilinear and
concave GDP problems, Computers and Chemical Engineering, 34:3, 914–930,
2010.
[34] Sahinidis N.V., BARON: A General Purpose Global Optimization Software Pack-
age, Journal of Global Optimization, 8:2, 201–205, 1996.
[35] Sawaya N. and Grossmann I.E., A cutting plane method for solving linear generalized disjunctive programming problems, Computers and Chemical Engineering, 29:9, 1891–1913, 2005.
[36] Sawaya N., Thesis: Reformulations, relaxations and cutting planes for generalized
disjunctive programming, Carnegie Mellon University, 2006.
[37] Schweiger C.A. and Floudas C.A., Process Synthesis, Design and Control: A
Mixed Integer Optimal Control Framework, Proceedings of DYCOPS-5 on
Dynamics and Control of Process Systems, 189–194, 1998.
[38] Stubbs R. and Mehrotra S. , A Branch-and-Cut Method for 0–1 Mixed Convex
Programming, Math Programming, 86:3, 515–532, 1999.
[39] Tawarmalani M. and Sahinidis N., Convexification and Global Optimization
in Continuous and Mixed-Integer Nonlinear Programming, Kluwer Academic
Publishers, 2002.
[40] Turkay M. and Grossmann I.E., A Logic-Based Outer-Approximation Algorithm for MINLP Optimization of Process Flowsheets, Computers and Chemical Engineering, 20, 959–978, 1996.
[41] Vecchietti A., Lee S., and Grossmann, I.E., Modeling of discrete/continuous
optimization problems: characterization and formulation of disjunctions and
their relaxations, Computers and Chemical Engineering, 27,433–448, 2003.
[42] Vecchietti A. and Grossmann I.E., LOGMIP: A Discrete Continuous Nonlinear Optimizer, Computers and Chemical Engineering, 23, 555–565, 1999.
[43] Viswanathan J. and Grossmann I.E., A combined penalty function and outer-approximation method for MINLP optimization, Computers and Chemical Engineering, 14, 769–782, 1990.
[44] Westerlund T. and Pettersson F., A Cutting Plane Method for Solving Con-
vex MINLP Problems, Computers and Chemical Engineering, 19, S131–S136,
1995.
[45] Westerlund T. and Pörn R., Solving Pseudo-Convex Mixed Integer Optimiza-
tion Problems by Cutting Plane Techniques, Optimization and Engineering,
3, 253–280, 2002.
[46] Williams H.P., Model Building in Mathematical Programming, John Wiley, 1985.
[47] Yuan X., Zhang S., Pibouleau L., and Domenech S., Une méthode d'optimisation non linéaire en variables mixtes pour la conception de procédés, RAIRO, 22, 331, 1988.
[48] Zamora J.M. and Grossmann I.E., A branch and bound algorithm for problems with concave univariate, bilinear and linear fractional terms, Journal of Global Optimization, 14:3, 217–249, 1999.
DISJUNCTIVE CUTS FOR NONCONVEX MINLP
PIETRO BELOTTI∗
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 117
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_5,
© Springer Science+Business Media, LLC 2012
⋃_{h∈Q} {x ∈ R^n : A^h x ≤ b^h, x ∈ S},      (1.1)
in general easier and has been extensively studied [46, 59, 61, 41]. Several exact MINLP solvers [54, 12, 42] seek an LP relaxation of Θk defined by the set LPk = {x ∈ [ℓ, u] : B^k x ≤ c^k}, with B^k ∈ Q^{mk×(n+q)} and c^k ∈ Q^{mk}, where mk is the number of inequalities of the relaxation. In general, mk depends on ϑk and, for certain constraints, a larger mk yields a better lower bound. However, in order to keep the size of the linearization limited, a relatively small value is used; for the linearization procedure we have used, mk ≤ 4 for all k. Let us denote LP = ∩_{k=n+1}^{n+q} LPk. Because LPk ⊇ Θk for all k = n + 1, n + 2, . . . , n + q, we have LP ⊇ Θ. Hence, minimizing x_{n+q} over the convex set LP gives a lower bound on the optimal solution value of P0.
The aforementioned MINLP solvers are branch-and-bound procedures that obtain, at every BB node, a linearization of P. It is crucial, as pointed out by McCormick [46], that for each k = n + 1, n + 2, . . . , n + q, the linearization of Θk be exact at the lower and upper bounds on the variables appearing as arguments of ϑk; i.e., the linearization must be such that if a solution x̄ is feasible for the LP relaxation but not for P, then there exists an i such that ℓi < x̄i < ui, so that a spatial disjunction xi ≤ x̄i ∨ xi ≥ x̄i can be used.
In general, for all k = n + 1, n + 2, . . . , n + q, both B^k and c^k depend on the variable bounds ℓ and u: the tighter the variable bounds, the stronger the lower bound obtained by minimizing x_{n+q} over LP. For such an approach to work effectively, several bound reduction techniques have been developed [30, 48, 52, 51, 58, 38]. Most of these techniques use the nonconvex constraints of the reformulation or the linear constraints of the linearization, or a combination of both. An experimental comparison of bound reduction techniques used in a MINLP solver is given in [12].
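Most of these reduction schemes amount to propagating intervals through the defining constraints xk = ϑk(x) in both directions. A hedged sketch for a cube constraint (the function below is ours; for the monotone cube this forward/backward pass is exact, which is not true of general ϑk):

```python
import math

def tighten_cube(xi_lo, xi_hi, xk_lo, xk_hi):
    """Propagate the constraint xk = xi**3 in both directions.

    The cube is monotone, so the image and preimage of an interval
    are themselves intervals and no relaxation is needed.
    """
    cbrt = lambda t: math.copysign(abs(t) ** (1.0 / 3.0), t)
    # forward: bounds on xk implied by the bounds on xi
    xk_lo = max(xk_lo, xi_lo ** 3)
    xk_hi = min(xk_hi, xi_hi ** 3)
    # backward: bounds on xi implied by the (possibly external) bounds on xk
    xi_lo = max(xi_lo, cbrt(xk_lo))
    xi_hi = min(xi_hi, cbrt(xk_hi))
    return xi_lo, xi_hi, xk_lo, xk_hi

# xi in [-2, 2] combined with xk in [1, 8] tightens xi to (roughly) [1, 2].
xi_lo, xi_hi, xk_lo, xk_hi = tighten_cube(-2.0, 2.0, 1.0, 8.0)
assert abs(xi_lo - 1.0) < 1e-9 and abs(xi_hi - 2.0) < 1e-9
```

Tighter bounds obtained this way feed directly into the next linearization, which is the effect illustrated in Figure 1(c) below.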
Let us consider, as an example, a constraint xk = ϑk(x) = (xi)³, and the nonconvex set Θk = {x ∈ R^{n+q} ∩ [ℓ, u] : xk = (xi)³}, whose projection on the (xi, xk) space is the bold curve in Figure 1(a). A linearization of Θk is in Figure 1(b), and is obtained through a procedure based on the function ϑk(x) = (xi)³ and the bounds on xi. As shown in Figure 1(c), when tighter bounds are known for xi a better relaxation can be obtained: the set Θ′k = {x ∈ R^{n+q} ∩ [ℓ, u′] : xk = (xi)³}, where u′j = uj for j ≠ i and u′i < ui, admits a tighter linearization LP′k, which is a proper subset of LPk since the linearization is exact at u′i.
Fig. 1. Linearization of the constraint xk = (xi)³: the bold line in (a) represents the nonconvex set {(xi, xk) : ℓi ≤ xi ≤ ui, xk = (xi)³}, while the polyhedra in (b) and (c) are its linearizations for different bounds on xi.
xi ≤ β ∨ xi ≥ β, (3.1)
(a) Infeasible solution of a linearization (b) Linearized subproblems
Fig. 2. MINLP disjunctions and nonconvex constraints. In (a), the shaded area is the linearization of the constraint xk = ϑk(xi) with xi ∈ [ℓi, ui], whereas (x̄i, x̄k) is the value of xi and xk in the optimum of the LP relaxation. In (b), the spatial disjunction xi ≤ β ∨ xi ≥ β generates two sets Θ′k = {x ∈ X ∩ [ℓ, u] : xk = x_i², xi ≤ β} and Θ′′k = {x ∈ X ∩ [ℓ, u] : xk = x_i², xi ≥ β}. The corresponding linearizations, LP′k and LP′′k, are the smaller shaded areas.
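On the overestimating side of xk = x_i², the linearization is the secant through the bound points, and it is exactly this part that the spatial disjunction tightens. A small numeric sketch (bounds, branch point β, and test point are illustrative assumptions):

```python
def secant_gap(l, u, x):
    """Gap between the secant overestimator of x**2 on [l, u] and x**2 itself."""
    secant = l * l + (u * u - l * l) / (u - l) * (x - l)  # chord through (l, l^2), (u, u^2)
    return secant - x * x

l, u, beta = 0.0, 4.0, 2.0
x = 1.5
gap_parent = secant_gap(l, u, x)    # relaxation gap before branching
gap_child = secant_gap(l, beta, x)  # child node with xi <= beta
assert 0 <= gap_child < gap_parent  # the disjunction shrinks the shaded area at x
```

The same shrinking happens in both children, which is why the union of the two child linearizations is strictly contained in the parent's.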
and P′′, and denote the tightened problems as SLP′ and SLP′′ (see Figure 2(b)). The two sets can be described as
SLP′ = {x ∈ LP : B′x ≤ c′},
SLP′′ = {x ∈ LP : B′′x ≤ c′′},
where B′ ∈ Q^{H′×(n+q)}, c′ ∈ Q^{H′}, B′′ ∈ Q^{H′′×(n+q)}, and c′′ ∈ Q^{H′′} are the coefficient matrices and the right-hand-side vectors of the inequalities added to the linearizations, which contain the new bound on variable xi and, possibly, new bounds on other variables.
We re-write these two sets in a more convenient form:
SLP′ = {x ∈ R^{n+q} : A′x ≤ a′},
SLP′′ = {x ∈ R^{n+q} : A′′x ≤ a′′},
where the rows of A′, a′ and A′′, a′′ stack the original LP constraints, the added inequalities, and the variable bounds:
A′ = [A; B′; −I; I],  a′ = [a; c′; −ℓ; u];   A′′ = [A; B′′; −I; I],  a′′ = [a; c′′; −ℓ; u].
A disjunctive cut α⊤x ≤ α0 is valid for SLP′ ∪ SLP′′ if
α⊤ ≤ u⊤A′,  α0 = u⊤a′,
α⊤ ≤ v⊤A′′,  α0 = v⊤a′′,
where u ∈ R^{K′}_+ and v ∈ R^{K′′}_+. Given an LP relaxation and its optimal solution x̄, an automatic procedure for generating a cut of this type consists of finding vectors u and v such that the corresponding cut is maximally violated [7]. This requires solving the Cut Generating Linear Programming (CGLP) problem
max  α⊤x̄ − α0
s.t. α − A′⊤u ≤ 0
     α − A′′⊤v ≤ 0
     α0 − a′⊤u = 0          (4.1)
     α0 − a′′⊤v = 0
     e⊤u + e⊤v = 1
     u, v ≥ 0,
where e is the vector with all components equal to one. An optimal solution (more precisely, any solution with positive objective value) provides a
valid disjunctive cut that is violated by the current solution x̄. Its main disadvantage is its size: the CGLP has (n + q + 1) + K′ + K′′ variables and 2(n + q) + 3 constraints, and is hence at least twice as large as the LP used to compute a lower bound. Given that the optimal solution of the CGLP is used, in our implementation, to produce just one disjunctive cut, solving one problem (4.1) for each violated disjunction at every branch-and-bound node might prove ineffective. To this end, Balas et al. [6, 9] present a method that implicitly solves the CGLP for binary disjunctions by applying pivot operations to the original linear relaxation, only with a different choice of variables. It is worth noting that, unlike the MILP case, here A′ and A′′ differ in much more than a single column. As shown in [2], this implies that the result by Balas et al. does not hold in this case.
An example. Consider the continuous nonconvex nonlinear program P0: min{x² : x⁴ ≥ 1}. It is easy to check that its feasible region is the nonconvex union of intervals (−∞, −1] ∪ [+1, +∞), and that its two global minima are at x = −1 and x = +1. Its reformulation is as follows:
(P)  min  w
     s.t. w = x²
          y = x⁴
          y ≥ 1.
It is crucial to note here that, although the problem is trivial and can be solved by inspection, state-of-the-art MINLP solvers that use reformulation ignore the relationship between the objective function and the constraint, i.e., y = w². The tightest convex relaxations of the two nonlinear constraints are obtained by simply replacing each equality with an inequality; therefore any LP relaxation generated by a MINLP solver is a relaxation of
(CR)  min  w
      s.t. w ≥ x²
           y ≥ x⁴
           y ≥ 1.
Branching on x at 0 yields the two subproblems
(CR′)  min{w : y ≥ 1, w ≥ x², y ≥ x⁴, x ≤ 0}
(CR′′) min{w : y ≥ 1, w ≥ x², y ≥ x⁴, x ≥ 0},
both with the same optimal solution, (x, w, y) = (0, 0, 1). Bound reduction is crucial here for both subproblems, as it strengthens the bounds on x using the lower bound on y. Indeed, x ≤ 0 and 1 ≤ y = x⁴ imply x ≤ −1,
and similarly x ≥ 0 implies x ≥ 1; the optimal solutions of the two tightened subproblems are feasible for P0 and correspond to the two global optima {−1, +1}. Hence, the problem is solved after branching on the disjunction x ≤ 0 ∨ x ≥ 0. However, the nonlinear inequality x² ≥ 1 is valid for both tightened subproblems, as it is implied by x ≤ −1 in the former and by x ≥ 1 in the latter. Since w ≥ x², the (linear) disjunctive cut w ≥ 1 is valid for both (SCR′) and (SCR′′). If added to (CR), a lower bound of 1 is obtained, which allows the problem to be solved without branching. It is easy to prove that even using a linear relaxation and applying the CGLP procedure yields the same disjunctive cut. This simple example can be complicated by considering n variables, each subject to a nonconvex constraint:
min  Σ_{i=1}^{n} x_i²
s.t. x_i⁴ ≥ 1,  ∀i = 1, 2, . . . , n.
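For the single-variable case, the validity of the disjunctive cut w ≥ 1 over the union of the two bound-reduced branches can be spot-checked by sampling (a brute-force sketch, not the CGLP procedure; the set-membership helper is ours):

```python
import random

def in_tightened_branches(x, w, y):
    """Membership in SCR' ∪ SCR'': the bound-reduced branches have |x| >= 1."""
    return abs(x) >= 1.0 and w >= x * x and y >= x ** 4 and y >= 1.0

random.seed(0)
for _ in range(10000):
    x = random.uniform(-3.0, 3.0)
    w = random.uniform(0.0, 20.0)
    y = random.uniform(0.0, 100.0)
    if in_tightened_branches(x, w, y):
        # every point of either branch satisfies the disjunctive cut w >= 1
        assert w >= 1.0

# the cut removes the relaxation optimum (x, w, y) = (0, 0, 1) of (CR)
assert not in_tightened_branches(0.0, 0.0, 1.0)
```

The check mirrors the argument in the text: |x| ≥ 1 forces x² ≥ 1, and w ≥ x² then forces w ≥ 1 on both branches.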
(a) Separable solution (b) Non-separable solution
Fig. 3. Separable and non-separable points. In (a), although the point is infeasi-
ble for P, another round of linearization cuts is preferred to a disjunction (either by
branching or through a disjunctive cut), as it is much quicker to separate. In (b), no
refinement of the linear relaxation is possible, and a disjunction must be applied.
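The distinction in Figure 3 can be checked directly for ϑ(x) = e^x: a point below the curve violates the convex inequality xk ≥ e^{xi}, and the tangent of e^x at the point's own abscissa already cuts it off, while a point above the curve satisfies every such tangent and cannot be separated without a disjunction. A small sketch (the test points are illustrative assumptions):

```python
import math

def below_curve_violation(xi, xk):
    """Violation of xk >= exp(xi) at (xi, xk).

    This equals the violation of the tangent cut of exp at xi itself,
    so a positive value means one more linearization cut separates
    the point (the separable case of Fig. 3(a)).
    """
    return math.exp(xi) - xk

# below the curve: separable by another round of linearization cuts
assert below_curve_violation(1.0, 1.0) > 0
# above the curve: satisfies xk >= exp(xi), hence every tangent cut too;
# only a spatial disjunction can exclude it (Fig. 3(b))
assert below_curve_violation(0.0, 2.0) < 0
```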
first, xk < e^{xi}, while in the second xk > e^{xi}. In both cases, a disjunction
Table 1
Procedure for generating a disjunctive cut for problem P.
2 See https://projects.coin-or.org/Couenne/browser/problems/nConv.
gap = (zbest − zlower) / (zbest − z0)      (6.1)
is given, where zbest is the objective value of the best solution found by the
four algorithms, zlower is the lower bound obtained by this algorithm, and
z0 is the initial lower bound found by couenne, which is the same for all
variants. If no feasible solution was found by any variant, the lower bound
is reported in brackets. The best performances are highlighted in bold.
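For concreteness, (6.1) can be evaluated directly (the function name is ours):

```python
def remaining_gap(z_best, z_lower, z0):
    """Remaining gap (6.1): 0 means the bound closed completely,
    1 means no progress beyond the initial lower bound z0."""
    return (z_best - z_lower) / (z_best - z0)

assert remaining_gap(100.0, 100.0, 40.0) == 0.0  # instance solved
assert remaining_gap(100.0, 40.0, 40.0) == 1.0   # bound stuck at the root
```

Normalizing by zbest − z0 makes gaps comparable across the four variants, since z0 is the same for all of them.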
Table 2
Summary of the comparison between the four variants. The first three columns report, for those instances that could be solved within the time limit of two hours, the number of instances solved ("solved"), the number of instances for which the variant obtained the best CPU time or came within 10% of the best ("best time"), and analogously for the number of BB nodes ("best nodes"). The last column, "best gap," reports the number of instances, among those that could not be solved within two hours by any of the variants, for which a variant obtained the smallest remaining gap or came within 10% of the smallest.
<2h Unsolved
Variant solved best time best nodes best gap
v0 26 12 11 31
rb 26 13 11 35
dc 35 8 13 29
dc+rb 39 20 29 37
in the generation. This holds especially for the boxQP instances: although
a large amount of time is spent in generating disjunctive cuts, this results
in a better lower bound or a lower CPU time. Again, the fact that the
current separation algorithm is rather simple suggests that a more efficient
implementation would obtain the same benefit in shorter time.
We also graphically represent the performance of the four variants
using performance profiles [24]. Figure 4(a) depicts a comparison on the
CPU time. This performance profile only considers instances that could be
solved in less than two hours by at least one of the variants. Hence, it also
compares the quality of a variant in terms of number of instances solved.
Figure 4(b) is a performance profile on the number of nodes.
Figure 4(c) is a comparison on the remaining gap, and reports on all
instances for which none of the variants could obtain an optimal solution
in two hours or less. Note that this is not a performance profile: rather
than the ratio between gaps, this graph shows, for each algorithm, the
number of instances (plotted on the y axis) with remaining gap below the
corresponding entry on the x axis.
The three graphs show once again that, for the set of instances we have considered, using both reliability branching and disjunctive cuts pays off for both easy and difficult MINLP instances. The former are solved in a shorter time, while for the latter we obtain a better lower bound.
7. Concluding remarks. Disjunctive cuts are as effective in MINLP
solvers as they are in MILP. Although they are generated from an LP
relaxation of a nonconvex MINLP, they can dramatically improve the lower
bound and hence the performance of a branch-and-bound method.
One disadvantage of the CGLP procedure, namely having to solve a
large LP in order to obtain a single cut, carries over to the MINLP case.
Some algorithms have been developed, for the MILP case, to overcome this
issue [6, 9]. Unfortunately, as shown in [2], their extension to the MINLP
case is not as straightforward.
Acknowledgments. The author warmly thanks François Margot for
all the useful discussions that led to the development of this work, and an
anonymous referee whose feedback helped improve this article. Part of this
research was conducted while the author was a Postdoctoral Fellow at the
Tepper School of Business, Carnegie Mellon University, Pittsburgh PA.
Fig. 4. Performance profiles of the four variants (v0, rb, dc, dc+rb): (a) CPU time, (b) number of BB nodes; (c) comparison of the remaining gap.
Table 3
Instances used in our tests. For each instance, “var” is the number of variables, “ivar” the number of integer variables, “con” the number
of constraints, and “aux” the number of auxiliary variables generated at the reformulation step. Instances in the boxQP group are continuous
and only constrained by a bounding box, hence columns “ivar” and “con” are omitted.
Name var aux Name var ivar con aux Name var ivar con aux
boxQP globallib minlplib
sp020-100-1 20 206 catmix100 301 0 200 800 waterz 195 126 137 146
sp030-060-1 30 265 lnts100 499 0 400 1004 ravem 111 53 186 189
sp030-070-1 30 311 camsh200 399 0 400 600 ravempb 111 53 186 189
sp030-080-1 30 368 qp2 50 0 2 1277 enpro56 128 73 192 188
sp030-090-1 30 402 qp3 100 0 52 52 enpro56pb 128 73 192 188
sp030-100-1 30 457 catmix200 601 0 400 1600 csched2 401 308 138 217
sp040-030-1 40 234 turkey 512 0 278 187 water4 195 126 137 146
sp040-040-1 40 319 qp1 50 0 2 1277 enpro48 154 92 215 206
sp040-050-1 40 399 elec50 150 0 50 11226 enpro48pb 154 92 215 206
sp040-060-1 40 478 camsh400 799 0 800 1200 space25a 383 240 201 119
sp040-070-1 40 560 arki0002 2456 0 1976 4827 contvar 279 87 279 747
sp040-080-1 40 648 polygon50 98 0 1273 6074 space25 893 750 235 136
sp040-090-1 40 715 arki0019 510 0 1 4488 lop97icx 986 899 87 407
sp040-100-1 40 806 arki0015 1892 0 1408 2659 du-opt5 18 11 6 221
sp050-030-1 50 366 infeas1 272 0 342 2866 du-opt 20 13 8 222
sp050-040-1 50 498 lnts400 1999 0 1600 4004 waste 1425 400 1882 2298
sp050-050-1 50 636 camsh800 1599 0 1600 2400 lop97ic 1626 1539 87 4241
sp060-020-1 60 354 arki0010 3115 0 2890 1976 qapw 450 225 255 227
sp070-025-1 70 618 nConv MacMINLP
sp070-050-1 70 1227 c-sched47 233 140 138 217 trimlon4 24 24 24 41
sp070-075-1 70 1838 synheatmod 53 12 61 148 trimlon5 35 35 30 56
sp080-025-1 80 789 JoseSEN5c 987 38 1215 1845 trimlon6 168 168 72 217
sp080-050-1 80 1625 MIQQP trimlon7 63 63 42 92
sp080-075-1 80 2388 ivalues 404 202 1 3802 space-25-r 843 750 160 111
sp090-025-1 90 1012 imisc07 519 259 212 696 space-25 893 750 235 136
sp090-050-1 90 2021 ibc1 2003 252 1913 1630 trimlon12 168 168 72 217
sp090-075-1 90 3033 iswath2 8617 2213 483 4807 space-960-i 5537 960 6497 3614
sp100-025-1 100 1251 imas284 301 150 68 366 misc.
sp100-050-1 100 2520 ieilD76 3796 1898 75 3794 airCond 102 80 156 157
sp100-075-1 100 3728
Table 4
Comparison between the four methods. Each entry is either the CPU time, in seconds, taken to solve the instance or, if greater than two
hours, the remaining gap (6.1) after two hours. If the remaining gap cannot be computed due to the lack of a feasible solution, the lower bound,
in brackets, is shown instead.
Name v0 rb dc dc+rb Name v0 rb dc dc+rb
sp050-040-1 29.6% 28.5% 1669 632 lnts400 5.7% 5.7% 10.8% 10.8%
sp050-050-1 57.5% 57.2% 22.8% 17.0% camsh800 87.3% 95.9% 97.4% 95.0%
sp060-020-1 1241 953 36 28 arki0010 1622 1538 2126 2079
sp070-025-1 40.1% 40.8% 2292 824 nConv
sp070-050-1 69.3% 70.1% 36.4% 45.3% c-sched47 4.2% 0.8% 4.0% 3.9%
sp070-075-1 81.4% 79.5% 76.7% 76.8% synheatmod 1.2% 0.0% 14.8% 141
sp080-025-1 53.4% 49.5% 4715 1241 JoseSEN5c 56.4% 100.0% 100.0% 99.8%
sp080-050-1 81.6% 81.2% 74.4% 74.4% MIQQP
sp080-075-1 86.9% 88.7% 82.0% 81.2% ivalues 12.1% 23.8% 25.1% 23.2%
sp090-025-1 68.6% 69.1% 38.6% 24.5% imisc07 88.7% 68.8% 99.6% 76.6%
sp090-050-1 83.9% 83.4% 77.4% 77.6% ibc1 (0.787) (0.787) (0.796) (0.813)
sp090-075-1 94.8% 94.9% 87.4% 91.7% iswath2 99.4% 100.0% 99.7% 98.8%
sp100-025-1 75.6% 75.0% 48.6% 40.9% imas284 5007 2273 54.7% 6928
sp100-050-1 89.8% 90.9% 83.7% 82.0% ieilD76 35.2% 99.0% 99.8% 99.6%
sp100-075-1 97.3% 96.6% 91.4% 91.4%
Table 5
Comparison between the four methods. Each entry is either the CPU time, in
seconds, taken to solve the instance or, if greater than two hours, the remaining gap
(6.1) after two hours. If the remaining gap cannot be computed due to the lack of a
feasible solution, the lower bound, in brackets, is shown instead.
Name v0 rb dc dc+rb
minlplib
waterz 75.1% 58.7% 73.2% 85.9%
ravem 16 23 74 66
ravempb 18 24 55 42
enpro56 39 26 156 76
enpro56pb 55 24 301 81
csched2 2.2% 3.9% 1.2% 3.6%
water4 55.2% 52.0% 44.2% 51.3%
enpro48 58 37 204 114
enpro48pb 49 41 201 126
space25a (116.4) (107.2) (107.3) (100.1)
contvar 91.1% 90.2% 91.1% 89.9%
space25 (98.0) (96.7) (97.1) (177.6)
lop97icx 93.0% 79.6% 93.9% 39.3%
du-opt5 73 170 251 159
du-opt 87 270 136 262
waste (255.4) (439.3) (273.7) (281.6)
lop97ic (2543.5) (2532.8) (2539.4) (2556.7)
qapw (0) (20482) (0) (20482)
MacMINLP
trimlon4 23 4 133 24
trimlon5 48.1% 46 35.6% 142
trimlon6 (16.17) (18.69) (16.18) (18.52)
trimlon7 88.1% 60.5% 89.8% 74.3%
space-25-r (74.2) (68.6) (75.0) (69.6)
space-25 (98.4) (89.6) (96.3) (91.9)
trimlon12 (16.1) (18.6) (16.1) (18.5)
space-960-i (6.5e+6) (6.5e+6) (6.5e+6) (6.5e+6)
misc.
airCond 187 0.9% 876 1471
Table 6
Comparison of time spent in the separation of disjunctive cuts (tsep ) and in relia-
bility branching (tbr ). Also reported is the number of nodes (“nodes”) and of disjunctive
cuts generated (“cuts”).
rb dc dc+rb
Name tbr nodes tsep cuts tsep cuts tbr nodes
boxQP
sp040-060-1 8 340k 3104 17939 2286 11781 56 3k
sp040-070-1 10 277k 1510 5494 466 2042 56 277k
sp040-080-1 14 174k 4030 15289 3813 14831 95 4k
sp040-090-1 58 136k 4142 13733 3936 14604 115 3k
sp040-100-1 17 178k 4317 11415 3882 11959 126 2k
sp050-050-1 12 289k 4603 10842 4412 11114 127 6k
sp070-025-1 38 308k 1324 2215 383 921 28 308k
sp070-050-1 38 80k 5027 3823 4616 3622 477 80k
sp070-075-1 81 26k 6125 1220 5951 1089 46 26k
sp080-025-1 27 222k 2873 2818 716 1154 28 222k
sp080-050-1 66 35k 5888 1864 5561 1740 85 35k
sp080-075-1 119 4k 6449 720 6464 615 32 4k
sp090-025-1 32 170k 4672 3540 4386 3044 500 170k
sp090-050-1 86 26k 6064 1023 6335 838 37 26k
sp090-075-1 194 4k 6468 467 6862 250 8 4k
sp100-025-1 54 80k 5286 2990 4704 2372 659 80k
sp100-050-1 140 4k 6376 693 6363 733 34 4k
sp100-075-1 272 35k 6852 312 6835 302 3 35k
globallib
lnts100 6084 4k 3234 16 1195 6 4647 3k
camsh200 6242 10k 1772 2303 2123 2645 4192 9k
qp2 214 62k 2662 3505 1351 1662 444 41k
qp3 1298 803k 43 737 47 644 3272 484k
qp1 189 63k 3160 3525 1609 1736 240 45k
elec50 5677 45k 5494 45 5159 68 768 45k
camsh400 3494 33k 3433 818 5242 1263 560 53k
arki0002 6834 53k 3571 237 1641 181 5277 53k
arki0019 5761 53k 1846 39 1543 36 3812 53k
arki0015 2442 2k 2141 215 1442 106 2412 2k
infeas1 1306 2k 1495 45 1273 131 1397 2k
lnts400 457 2k 4767 1 4724 0 373 2k
camsh800 5001 23k 3671 188 2208 100 3172 23k
DISJUNCTIVE CUTS FOR NONCONVEX MINLP 141
Table 7
(Continued) Comparison of time spent in the separation of disjunctive cuts (tsep ) and in reliability branching (tbr ). Also reported are the number of nodes ("nodes") and of disjunctive cuts generated ("cuts").
rb dc dc+rb
Name tbr nodes tsep cuts tsep cuts tbr nodes
nConv
JoseSEN5c 5166 1061k 4280 39 341 5 5315 1061k
MIQQP
ivalues 2816 10k 4223 700 4589 733 1571 10k
imisc07 6005 2k 6835 2621 5475 2131 1404 2k
ibc1 310 2k 6166 38 6368 167 46 2k
iswath2 827 2k 6567 31 6526 52 5 2k
imas284 443 62k 6769 1408 4474 700 381 72k
ieilD76 2934 9k 6994 75 6869 56 804 9k
minlplib
contvar 4284 2k 11 19 13 21 4706 3k
space25 272 1051k 2133 706 178 309 311 1100k
lop97icx 6540 14k 1142 353 4226 2603 2649 3k
waste 5204 13k 7090 715 6934 253 82 13k
lop97ic 3178 11k 4556 25 6734 147 3547 11k
qapw 6400 61k 104 0 34 0 6367 60k
MacMINLP
trimlon6 8500 62k 58 50 558 2649 5944 63k
trimlon12 8432 62k 58 50 562 2649 5941 62k
space-960-i 5859 62k 2127 20 1624 0 4971 62k
misc.
airCond 7123 112k 293 1736 103 1526 39 541k
PART III:
Nonlinear Programming
SEQUENTIAL QUADRATIC PROGRAMMING METHODS
PHILIP E. GILL∗ AND ELIZABETH WONG∗
Abstract. In his 1963 PhD thesis, Wilson proposed the first sequential quadratic
programming (SQP) method for the solution of constrained nonlinear optimization prob-
lems. In the intervening 48 years, SQP methods have evolved into a powerful and effec-
tive class of methods for a wide range of optimization problems. We review some of the
most prominent developments in SQP methods since 1963 and discuss the relationship
of SQP methods to other popular methods, including augmented Lagrangian methods
and interior methods.
Given the scope and utility of nonlinear optimization, it is not surprising that SQP
methods are still a subject of active research. Recent developments in methods for mixed-integer nonlinear programming (MINLP) and the minimization of functions subject to differential equation constraints have led to a heightened interest in methods that may be
“warm started” from a good approximate solution. We discuss the role of SQP methods
in these contexts.
    minimize   f (x)
     x∈Rⁿ
                     ⎧  x   ⎫
    subject to   ℓ ≤ ⎨  Ax  ⎬ ≤ u,                (1.1)
                     ⎩ c(x) ⎭
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 147
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_6,
© Springer Science+Business Media, LLC 2012
148 PHILIP E. GILL AND ELIZABETH WONG
1.1. Notation. Given vectors a and b with the same dimension, the vector with ith component ai bi is denoted by a · b. The vectors e and ej denote, respectively, the column vector of ones and the jth column of the identity matrix I. The dimensions of e, ej and I are defined by the context. Given vectors x and y of dimension nx and ny , the (nx + ny )-vector of elements of x augmented by elements of y is denoted by (x, y). The ith component of a vector labeled with a subscript will be denoted by ( · )i , e.g., (vN )i is the ith component of the vector vN . Similarly, the subvector of components with indices in the index set S is denoted by ( · )S , e.g., (vN )S is the vector with components (vN )i for i ∈ S. The vector with components max{−xi , 0} (i.e., the magnitude of the negative part of x) is denoted by [ x ]− . The vector p-norm and its subordinate matrix norm are denoted by ‖ · ‖p .
    minimize_{x∈Rⁿ}  f (x) + ρ Σ_{i=1}^{m} |ci (x)|   subject to  x ≥ 0,      (1.5)
i.e., the elastic problem implicitly enforces a penalty on the sum of the in-
feasibilities of the constraints c(x) = 0. If the original problem is infeasible,
then, for large values of ρ, there is a minimizer of the elastic problem that
is an O(1/ρ) approximation to a minimizer of the sum of the constraint
infeasibilities. This minimizer can be useful in identifying which of the
constraints are causing the infeasibility (see Chinneck [34, 35]).
The elastic problem is called an exact regularization of (1.2) because
if ρ is sufficiently large and (x∗ , π ∗ , z ∗ ) is optimal for (1.2), then it is also
optimal for the elastic problem (1.4) with u = v = 0. See Fletcher [64,
Section 12.3] for a discussion of these issues. The first-order necessary
conditions for (x∗ , u∗ , v ∗ , π ∗ , z ∗ ) to be an optimal solution for the elastic
problem (1.4) are
    c(x∗ ) − u∗ + v ∗ = 0,   u∗ ≥ 0,   v ∗ ≥ 0,                        (1.6a)
    g(x∗ ) − J(x∗ )Tπ ∗ − z ∗ = 0,                                     (1.6b)
    x∗ · z ∗ = 0,   z ∗ ≥ 0,   x∗ ≥ 0,                                 (1.6c)
    u∗ · (ρe + π ∗ ) = 0,   v ∗ · (ρe − π ∗ ) = 0,   −ρe ≤ π ∗ ≤ ρe.   (1.6d)
To see that the elastic problem (1.4) defines an exact regularization, note that if ‖π ∗ ‖∞ < ρ, then a solution (x∗ , π∗ , z ∗ ) of (1.3) is also a solution of (1.6) with u∗ = v ∗ = 0. Conditions (1.6) are always necessary for a point
(x∗ , u∗ , v ∗ ) to be an optimal solution for (1.4) because the Mangasarian-
Fromovitz constraint qualification is always satisfied.
There are two caveats associated with solving the regularized problem.
First, if a solution of the original problem exists, it is generally only a local
solution of the elastic problem. The elastic problem may be unbounded
below, or may have local solutions that are not solutions of the original
problem. For example, consider the one-dimensional problem

    minimize_{x∈R}  x + 1   subject to  x³/3 − (3/2)x² + 2x = 0,  x ≥ 0,      (1.7)

which has a unique solution (x∗ , π∗ ) = (0, ½). For all ρ > ½, the penalty function (1.5) has a local minimizer x̄ = 2 − O(1/ρ) such that c(x̄) ≠ 0.
This example shows that regularization can introduce “phantom” solutions
that do not appear in the original problem.
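The phantom-minimizer claim is easy to check numerically. The sketch below (illustrative, not part of the chapter) evaluates the penalty function for example (1.7) with ρ = 10 and locates the spurious local minimizer near x = 2 − O(1/ρ), at which the constraint is violated.

```python
import numpy as np

# Example (1.7): f(x) = x + 1 and c(x) = x^3/3 - (3/2)x^2 + 2x.
# For rho = 10 > 1/2 the penalty P(x) = f(x) + rho*|c(x)| has, besides the
# constrained minimizer x* = 0, a "phantom" local minimizer near
# x = 2 - O(1/rho) at which c does not vanish.
f = lambda x: x + 1.0
c = lambda x: x**3 / 3.0 - 1.5 * x**2 + 2.0 * x
P = lambda x, rho: f(x) + rho * np.abs(c(x))

rho = 10.0
xs = np.linspace(1.5, 2.5, 2001)
x_bar = xs[np.argmin(P(xs, rho))]   # phantom local minimizer on [1.5, 2.5]
```

For this ρ the interior minimizer of the penalty solves (x − 1)(x − 2) = −1/ρ, so x̄ = (3 + √0.6)/2 ≈ 1.89, while c(x̄) stays well away from zero.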
The second caveat is that, in general, the precise definition of the elastic problem is not known in advance because an appropriate value of the parameter ρ depends on the optimal multipliers π ∗ . This implies that, in practice, any estimate of ρ may need to be increased if the minimization appears to be converging to a regularized solution with u∗ + v ∗ ≠ 0. If the
Fig. 1. This figure depicts the objective function and penalty function (1.5) for the one-dimensional problem (1.7). The constrained problem has a unique solution (x∗ , π ∗ ) = (0, ½). However, for all ρ > ½, the penalty function has a local minimizer x̄ = 2 − O(1/ρ) with c(x̄) ≠ 0.
where pk and qk denote the Newton steps for the primal and dual variables.
If the second block of equations is scaled by −1 we obtain the system
    ⎛ Hk  −JkT ⎞ ⎛ pk ⎞      ⎛ gk − JkTπk ⎞
    ⎝ Jk    0  ⎠ ⎝ qk ⎠ = −  ⎝ ck         ⎠ ,      (2.6)
    minimize_{p∈Rⁿ}  (gk − JkTπk )Tp + ½ pTHk p   subject to  ck + Jk p = 0,
which, under certain conditions on the curvature of the Lagrangian dis-
cussed below, defines the step from xk to the point that minimizes the
local quadratic model of the objective function subject to the linearized
constraints. It is now a simple matter to include the constant objective
term fk (which does not affect the optimal solution) and write the dual
variables in terms of πk+1 = πk + qk instead of qk . The equations analo-
gous to (2.7) are then
    ⎛ Hk  JkT ⎞ ⎛   pk   ⎞      ⎛ gk ⎞
    ⎝ Jk   0  ⎠ ⎝ −πk+1  ⎠ = −  ⎝ ck ⎠ ,      (2.8)
which are the first-order optimality conditions for the quadratic program
    minimize_{p∈Rⁿ}  fk + gkTp + ½ pTHk p   subject to  ck + Jk p = 0.
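The KKT system (2.8) is small enough to assemble and solve directly. The sketch below (with invented data H, J, g, c; not from the chapter) recovers the step pk and multipliers πk+1 for a dense example; the same solve with right-hand side (gk − JkTπk , ck ) and unknown qk reproduces (2.6).

```python
import numpy as np

# Assemble and solve the KKT system (2.8):
#   [ H  J^T ] [  p  ]      [ g ]
#   [ J   0  ] [ -pi ]  = - [ c ]
# for the primal step p and the new multiplier estimate pi.
def sqp_eq_step(H, J, g, c):
    n, m = H.shape[0], J.shape[0]
    K = np.block([[H, J.T], [J, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([g, c]))
    return sol[:n], -sol[n:]          # (p, pi)

# Illustrative data: H positive definite, J of full row rank.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
J = np.array([[1.0, 1.0]])
g = np.array([1.0, -2.0])
c = np.array([0.5])
p, pi = sqp_eq_step(H, J, g, c)
# p satisfies the linearized constraint c + J p = 0 and the
# stationarity condition g + H p - J^T pi = 0.
```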
Let Qk be defined so that Jk Qk = ( 0  Uk ), where Uk is m × m. The assumption that Jk has rank m implies that Uk is nonsingular. If the n columns of Qk are partitioned into blocks Zk and Yk of dimension n × (n − m) and n × m, then

    Jk Qk = Jk ( Zk  Yk ) = ( 0  Uk ),      (2.11)
Using block substitution on the system (2.12) we obtain the following equa-
tions for pk and πk+1 :
    Uk pY = −ck ,                        pN = Yk pY ,
    ZkTHk Zk pZ = −ZkT(gk + Hk pN ),     pT = Zk pZ ,      (2.13)
    pk = pN + pT ,                       UkTπk+1 = YkT(gk + Hk pk ).
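The block substitution (2.13) can be transcribed directly. In the sketch below (illustrative data, not from the chapter) Qk is taken orthogonal, built from a complete QR factorization of JkT, so the column ordering is Q = [Yk Zk ] rather than [Zk Yk ] as in (2.11); the computed step is the same.

```python
import numpy as np

# Null-space computation (2.13): split the SQP step into a normal step pN
# onto the linearized constraints and a tangential step pT on the
# constraint surface, then recover the multipliers.
def nullspace_step(H, J, g, c):
    m = J.shape[0]
    Q, _ = np.linalg.qr(J.T, mode='complete')     # n x n orthogonal
    Y, Z = Q[:, :m], Q[:, m:]                     # range/null-space bases
    U = J @ Y                                     # m x m, nonsingular if rank(J) = m
    pY = np.linalg.solve(U, -c)                   # U pY = -c
    pN = Y @ pY                                   # normal step
    pZ = np.linalg.solve(Z.T @ H @ Z, -Z.T @ (g + H @ pN))
    pT = Z @ pZ                                   # tangential step
    p = pN + pT
    pi = np.linalg.solve(U.T, Y.T @ (g + H @ p))  # U^T pi = Y^T (g + H p)
    return p, pi

# Illustrative data: H positive definite, J of full row rank.
H = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
J = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
g = np.array([1.0, -2.0, 0.5])
c = np.array([0.5, -0.25])
p, pi = nullspace_step(H, J, g, c)   # satisfies J p = -c
```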
    Uk pY = −ck ,        pN = Yk pY ,
    xF = xk + pN ,       ZkTHk Zk pZ = −ZkTĝk (xF ),   pT = Zk pZ ,      (2.14)
    pk = pN + pT ,       xk+1 = xF + pT ,
    UkTπk+1 = YkTĝk (xk+1 ).
which implies that the normal component pN satisfies Jk pN = −ck and constitutes the Newton step from xk to the point xF satisfying the linearized constraints ck + Jk (x − xk ) = 0. On the other hand, the tangential step pT satisfies pT = Zk pZ , where ZkTHk Zk pZ = −ZkTĝk (xF ). If the reduced Hessian ZkTHk Zk is positive definite, which will be the case if xk is sufficiently close to a locally unique (i.e., isolated) minimizer of (2.4), then pT defines the Newton step from xF to the minimizer of the quadratic model f̂k (x) in the subspace orthogonal to the constraint normals (i.e., on the surface of the linearized constraint ĉk (x) = 0). It follows that the Newton direction is the sum of two steps: a normal step to the linearized constraint and the tangential step on the constraint surface that minimizes the quadratic model. This property reflects the two (usually conflicting) underlying processes present in all algorithms for optimization—the minimization of the objective and the satisfaction of the constraints.
In the discussion above, the normal step pN is interpreted as a Newton direction for the equations ĉk (x) = 0 at x = xk . However, in some situations, pN may also be interpreted as the solution of a minimization problem. The Newton direction pk is unique, but the decomposition pk = pT + pN depends on the choice of the matrix Qk associated with the Jacobian factorization (2.11). If Qk is orthogonal, i.e., if QkTQk = I, then ZkTYk = 0 and the columns of Yk form a basis for the range space of JkT. In this case, pN and pT define the unique range-space and null-space decomposition of pk , and pN is the unique solution with least two-norm of the least-squares problem

    min_p ‖ĉk (xk + p)‖₂ ,   or, equivalently,   min_p ‖ck + Jk p‖₂ .
    Jk (x̂k − xk ) + ck = 0,
                                              (2.15)
    gk + Hk (x̂k − xk ) − JkTπ̂k = 0.
P KP T = LDLT , (2.16)
(In practice, the columns of B may occur anywhere.) When Jk has this form, a basis for the null space of Jk is given by the columns of the (non-orthogonal) matrix Qk defined as

    Qk = ⎛ −B⁻¹S   Im ⎞ ,   with   Zk = ⎛ −B⁻¹S ⎞   and   Yk = ⎛ Im ⎞ .
         ⎝ In−m     0 ⎠                 ⎝ In−m  ⎠              ⎝ 0  ⎠
see Gill, Murray, Saunders and Wright [90]), and Zk need not be stored
explicitly.
For large sparse problems, the reduced Hessian ZkT Hk Zk associated
with the solution of (2.14) will generally be much more dense than Hk and
B. However, in many cases, n − m is small enough to allow the storage of
a dense Cholesky factor of ZkT Hk Zk .
2.2. Inequality constraints. Given an approximate primal-dual solution (xk , πk ) with xk ≥ 0, an outer iteration of a typical SQP method involves solving the QP subproblem (2.1), repeated here for convenience:

    minimize_{x∈Rⁿ}  fk + gkT(x − xk ) + ½ (x − xk )THk (x − xk )
    subject to  ck + Jk (x − xk ) = 0,   x ≥ 0.                        (2.17)

Assume for the moment that this subproblem is feasible, with primal-dual solution (x̂k , π̂k , ẑk ). The next plain SQP iterate is xk+1 = x̂k , πk+1 = π̂k and zk+1 = ẑk . The QP first-order optimality conditions are
    Jk (x̂k − xk ) + ck = 0,   x̂k ≥ 0;
    gk + Hk (x̂k − xk ) − JkTπ̂k − ẑk = 0,      (2.18)
    x̂k · ẑk = 0,   ẑk ≥ 0.
Let pk = x̂k − xk and let p̄k denote the vector of free components of pk , i.e., the components with indices in I(x̂k ). Similarly, let z̄k denote the free components of ẑk . The complementarity conditions imply that z̄k = 0 and we may combine the first two sets of equalities in (2.18) to give
    ⎛ H̄k  J̄kT ⎞ ⎛  p̄k  ⎞      ⎛ (gk + Hk ηk )I ⎞
    ⎝ J̄k   0  ⎠ ⎝ −π̂k ⎠ = −  ⎝ ck + Jk ηk     ⎠ ,      (2.19)
If the active sets at x̂k and xk are the same, i.e., A(x̂k ) = A(xk ), then ηk = 0. If x̂k lies in a sufficiently small neighborhood of a nondegenerate solution x∗ , then A(x̂k ) = A(x∗ ) and hence J̄k has full row rank (see Robinson [157]). In this case we say that the QP identifies the correct active set at x∗ . If, in addition, (x∗ , π∗ ) satisfies the second-order sufficient conditions for optimality, then the KKT system (2.19) is nonsingular and the plain SQP method is equivalent to Newton's method applied to the equality-constrained subproblem defined by fixing the variables in the active set at their bounds.
However, at a degenerate QP solution, the rows of J̄k are linearly dependent and the KKT equations (2.19) are compatible but singular.
Broadly speaking, there are two approaches to dealing with the degenerate case, each linked to the method used to solve the QP subproblem. The first approach employs a QP method that not only finds the QP solution x̂k , but also identifies a "basic set" of variables that defines a matrix J̃k with linearly independent rows. The second approach solves a regularized or perturbed QP subproblem that provides a perturbed version of the KKT system (2.19) that is nonsingular for any J̄k .
Identifying independent constraints. The first approach is based on
using a QP algorithm that provides a primal-dual QP solution that satisfies
a nonsingular KKT system analogous to (2.19). A class of quadratic pro-
gramming methods with this property are primal-feasible active-set meth-
ods, which form the basis of the software packages NPSOL and SNOPT.
Primal-feasible QP methods have two phases: in phase 1, a feasible point
is found by minimizing the sum of infeasibilities; in phase 2, the quadratic
objective function is minimized while feasibility is maintained. In each it-
eration, the variables are labeled as being “basic” or “nonbasic”, where the
nonbasic variables are temporarily fixed at their current value. The indices
of the basic and nonbasic variables are denoted by B and N respectively.
A defining property of the B–N partition is that the rows of the Jacobian
appearing in the KKT matrix are always linearly independent. Once an
initial basic set is identified, all subsequent KKT equations have a con-
straint block with independent rows. (For more details of primal-feasible
active-set methods, see Section A.1 of the Appendix.)
Let pk = x̂k − xk , where (x̂k , π̂k ) is the QP solution found by a primal-feasible active-set method. Let p̃k denote the vector of components of pk in the final basic set B, with J̃k the corresponding columns of Jk . The vector (p̃k , π̂k ) satisfies the nonsingular KKT equations

    ⎛ H̃k  J̃kT ⎞ ⎛  p̃k  ⎞      ⎛ (gk + Hk ηk )B ⎞
    ⎝ J̃k   0  ⎠ ⎝ −π̂k ⎠ = −  ⎝ ck + Jk ηk     ⎠ ,      (2.20)
where ηk is now defined in terms of the final QP nonbasic set, i.e.,

    (ηk )i = ⎧ (x̂k − xk )i   if i ∈ N ;
             ⎩ 0             if i ∉ N .          (2.21)
As in (2.19), if the basic-nonbasic partition is not changed during the solu-
tion of the subproblem, then ηk = 0. If this final QP nonbasic set is used to
define the initial nonbasic set for the next QP subproblem, it is typical for
the later QP subproblems to reach optimality in a single iteration because
the solution of the first QP KKT system satisfies the QP optimality condi-
tions immediately. In this case, the phase-1 procedure simply performs a
feasibility check that would be required in any case.
Constraint regularization. One of the purposes of regularization is to define KKT equations that are nonsingular regardless of the rank of J̄k . Consider the perturbed version of equations (2.19) such that

    ⎛ H̄k   J̄kT ⎞ ⎛  p̄k  ⎞      ⎛ (gk + Hk ηk )I ⎞
    ⎝ J̄k  −μI  ⎠ ⎝ −π̂k ⎠ = −  ⎝ ck + Jk ηk     ⎠ ,      (2.22)
    J̄kTπ̂k = (gk+1 )I    and    π̂k ∈ range(J̄k ).
These are the necessary and sufficient conditions for π̂k to be the unique least-length solution of the compatible equations J̄kTπ = (gk+1 )I . This implies that the regularization gives a unique vector of multipliers.
Wright [173, 174, 175] and Hager [117] show that an SQP method using
the regularized equations (2.22) will converge at a superlinear rate, even
in the degenerate case. In Section A.3 of the Appendix, QP methods are
discussed that give equations of the form (2.22) at every outer iteration,
not just in the neighborhood of the solution. These methods implicitly
shift the constraints by an amount of order μ and give QP multipliers that
converge to an O(μ) estimate of the least-length multipliers.
A related regularization scheme has been proposed and analyzed by
Fischer [58], who solves a second QP to obtain the multiplier estimates.
Anitescu [3] regularizes the problem by imposing a trust-region constraint
on the plain SQP subproblem (2.1) and solving the resulting subproblem
by a semidefinite programming method.
3. The formulation of modern SQP methods. SQP methods
have evolved considerably since Wilson’s thesis appeared in 1963. Current
    f (xk + dk (α)) ≤ max_{0≤j≤r} [ f (xk−j ) ] − η Δmk (dk (α)),
Han and Powell proposed the use of the ℓ1 penalty function (1.8) as a merit function, i.e., M(x) = M(x; ρ) = P1 (x; ρ). Moreover, they suggested that dk (α) = αpk = α(x̂k − xk ), where x̂k is the solution of the convex subproblem
subproblem
where B0k and J0k denote the matrices of basic components of Bk and Jk
and ηk is defined as in (2.21).
3.2.1. Quasi-Newton approximations. Many methods for uncon-
strained minimization use a quasi-Newton approximation of the Hessian
when second derivatives are either unavailable or too expensive to evaluate.
Arguably, the most commonly used quasi-Newton approximation is defined using the BFGS method (see Broyden [14], Fletcher [59], Goldfarb [98], and Shanno [164]). Given iterates xk and xk+1 , and a symmetric approximate Hessian Bk , the BFGS approximation for the next iteration has the form

    Bk+1 = Bk − (Bk dk dkTBk )/(dkTBk dk ) + (yk ykT)/(ykTdk ),      (3.8)
(As the Hessian of the Lagrangian does not include the linear constraints,
we have omitted z from the Lagrangian term.) This implies that the gra-
dient difference yk in (3.8) involves the gradient of the augmented La-
grangian, with
where πk+1 are estimates of the optimal dual variables. This proposal is
motivated by the fact that if ρ is sufficiently large, the Hessian of L(x, π; ρ)
is positive definite for all (x, π) close to an isolated solution (x∗ , π∗ ) (see
also, Tapia [165], and Byrd, Tapia and Zhang [29]).
The use of an augmented Lagrangian Hessian for the QP subproblem changes the properties of the QP dual variables. In particular, if (x̂k , π̂k , ẑk ) is the solution of the QP (3.6) with Bk defined as Hk + ρJkTJk , then (x̂k , π̂k + ρck , ẑk ) is the solution of the QP (3.6) with Bk replaced by Hk (assuming that the same local solution is found when Hk is not positive definite). In other words, if the augmented Lagrangian Hessian is used instead of the Lagrangian Hessian, the x and z variables do not change, but the π-values are shifted by ρck . An appropriate value for πk+1 in the definition of yk is then πk+1 = π̂k + ρck , giving, after some simplification,

    yk = gk+1 − gk − (Jk+1 − Jk )Tπ̂k + ρJk+1T(ck+1 − ck ).
    dk = xk+1 − xk ,   yk = ∇x L(xk+1 , πk+1 , zk+1 ) − ∇x L(xk , πk+1 , zk+1 ).      (3.10)

If the QP multipliers are used for πk+1 , the difference in Lagrangian gradients is given by yk = gk+1 − gk − (Jk+1 − Jk )Tπ̂k .
A positive-definite BFGS approximation may appear to be a surprising
choice for Bk , given that the Hessian of the Lagrangian is generally indefi-
nite at the solution. However, Powell’s proposal is based on the observation
that the approximate curvature is likely to be positive in the neighborhood
of an isolated solution, even when the Hessian of the Lagrangian is indefi-
nite. The reason for this is that the iterates of a quasi-Newton SQP converge
to the solution along a path that lies in the null space of the “free” columns
of the Jacobian. As the Lagrangian Hessian is generally positive definite
along this path, the approximate curvature ykT dk is positive as the iterates
converge and an R-superlinear convergence rate is obtained. Powell's proposal may be justified by considering the properties of (x̂k , π̂k ), the solution of the QP subproblem. Let pk = x̂k − xk and ĝ(x) = gk + Bk (x − xk ). It is shown in Section A.4.1 of the Appendix that (x̂k , π̂k ) satisfies the equations

    Uk pY = −ck ,       pN = Yk pY ,
    xF = xk + pN ,      ZkTBk Zk pZ = −ZkTĝ(xF ),   pT = Zk pZ ,      (3.11)
    pk = pN + pT ,      UkTπ̂k = YkTĝ(xk + pk ),
ykTdk > 0 can be used. Given the definition (3.9) of the least permissible approximate curvature, Powell [152] redefines yk as yk + Δyk , where Δyk is chosen so that (yk + Δyk )Tdk = σk , i.e.,

    Δyk = ((σk − ykTdk )/(dkT(yk − Bk dk ))) (yk − Bk dk ).
The Powell modification is always well defined, which implies that it is always applied—even when it might be unwarranted because of negative curvature of the Lagrangian in the null space of J̃k (cf. (3.7)).
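A sketch of the modification follows; the target curvature σk is treated here as a given positive number, since its defining rule (3.9) is not reproduced in this excerpt. The data are invented for illustration.

```python
import numpy as np

# Powell's modified difference vector: Delta y_k is taken along
# y_k - B_k d_k so that the modified curvature (y_k + Delta y_k)^T d_k
# equals a prescribed target sigma_k > 0.
# Requires d_k^T (y_k - B_k d_k) != 0.
def powell_modify(B, d, y, sigma):
    r = y - B @ d
    return y + ((sigma - y @ d) / (d @ r)) * r

B = np.eye(2)
d = np.array([1.0, 0.0])
y = np.array([-0.5, 0.3])        # y^T d < 0: curvature must be corrected
sigma = 0.1                      # target curvature (assumed given)
y_mod = powell_modify(B, d, y, sigma)   # now y_mod^T d == sigma
```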
3.2.2. Properties of the merit function. The Han-Powell merit function M(x; ρ) = P1 (x; ρ) has the appealing property that x∗ is an unconstrained minimizer of P1 (x; ρ) for ρ > ‖π ∗ ‖∞ (see, e.g., Zangwill [179], and Han and Mangasarian [120]). A potential line-search model for P1 (x; ρ) is

    mk (x; ρ) = fk + gkT(x − xk ) + ½ (x − xk )TBk (x − xk ) + ρ‖ck + Jk (x − xk )‖1 ,

which is the ℓ1 penalty function defined with local affine and quadratic approximations for c and f . However, because Bk is positive definite, a stronger condition on α is defined by omitting the quadratic term and using the line-search model

    mk (x; ρ) = fk + gkT(x − xk ) + ρ‖ck + Jk (x − xk )‖1 .      (3.12)
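Evaluating the affine model (3.12) is inexpensive; a direct transcription, with invented data, is:

```python
import numpy as np

# Affine line-search model (3.12) for the l1 merit function:
#   m_k(x; rho) = f_k + g_k^T (x - x_k) + rho * ||c_k + J_k (x - x_k)||_1.
def merit_model(x, xk, fk, gk, ck, Jk, rho):
    s = x - xk
    return fk + gk @ s + rho * np.linalg.norm(ck + Jk @ s, 1)

xk = np.zeros(2)
fk, gk = 1.0, np.array([1.0, -2.0])
ck, Jk = np.array([0.5]), np.array([[1.0, 1.0]])
m0 = merit_model(xk, xk, fk, gk, ck, Jk, rho=10.0)  # at x = xk: fk + rho*||ck||_1
```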
To obtain a smaller value of P1 (x; ρ) at each iteration, the line-search model must satisfy Δmk (dk ; ρ) > 0, where Δmk (d; ρ) is the predicted reduction in M analogous to (3.2). The optimality conditions (2.18) for the QP subproblem together with the affine model (3.12) defined with x = xk + αpk allow us to write the predicted reduction as

    Δmk (αpk ; ρ) = α( ρ‖ck ‖1 + ckTπ̂k − pkTẑk + pkTBk pk )
                  = α( ρ‖ck ‖1 + ckTπ̂k − (x̂k − xk )Tẑk + pkTBk pk ).      (3.13)

The QP optimality conditions give x̂k · ẑk = 0, yielding

    Δmk (αpk ; ρ) = α( ρ‖ck ‖1 + ckTπ̂k + xkTẑk + pkTBk pk )
                  ≥ α( Σ_{i=1}^{m} |ci (xk )| ( ρ − |(π̂k )i | ) + ‖xk · ẑk ‖1 + pkTBk pk ),
    ⎛ B̃k  J̃kT ⎞ ⎛  sk  ⎞      ⎛ (gk + Bk (pk + ηk ))B ⎞
    ⎝ J̃k   0  ⎠ ⎝ −π̄k ⎠ = −  ⎝ c(xk + pk ) + Jk ηk   ⎠ ,      (3.15)
constraints, which can be kept feasible at every iterate (see, e.g., Gill, Murray and Wright [94]).
subject to −u ≤ ck + Jk (x − xk ) ≤ u, x + v ≥ 0, v ≥ 0, (3.20)
−σk e ≤ x − xk ≤ σk e.
This problem gives vectors u and v of least one-norm for which the constraints ck + Jk (x − xk ) = u, x + v ≥ 0 and ‖x − xk ‖∞ ≤ σk are feasible. If
the original linearized constraints are feasible, then the work necessary to
solve problem (3.20) is comparable to that of the feasibility phase of a two-
phase active-set method for the plain QP subproblem (see Section A.1).
The difference is the extra expense of locating a bounded feasible point
with least-length distance from xk . Let xF denote the computed solution
of the phase-1 problem (3.20). The computation for phase 2 involves the
solution of the QP:
    minimize_{x∈Rⁿ}  fk + gkT(x − xk ) + ½ (x − xk )TBk (x − xk )
    subject to  ck + Jk (x − xk ) − u + v = 0,                       (3.22)
                x ≥ 0,  u ≥ 0,  v ≥ 0,
where the {(ui , vi )} are vector pairs with each (ui , vi ) defined in terms of (dk−ℓ , yk−ℓ ), . . . , (di−1 , yi−1 ) (cf. (3.10)) via

    ui = Bk(i) di /(diTBk(i) di )^{1/2} ,   and   vi = yi /(yiTdi )^{1/2} .
Gourlay and Greenstadt [13], Dennis and Schnabel [48], and Gill, Murray
and Saunders [83]). The sign of vj may be chosen to minimize the rounding
error in computing wj . The quantities (dj , wj ) are stored for each j. During
outer iteration k, the QP solver accesses Bk by requesting products of the
form Bk z. These are computed with work proportional to ℓ using the recurrence relations:

    z ← z + (wjTz) dj ,  j = k − 1 : k − ℓ;      z ← Gk(0) z;
    t ← Gk(0)T z;        t ← t + (djTt) wj ,  j = k − ℓ : k − 1.

Products of the form uTBk u are easily and safely computed as ‖z‖₂² with z = Gk u.
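The recurrence can be transcribed directly. The sketch below (illustrative data; the construction of the stored pairs from (dj , yj ) is omitted) applies Bk to a vector using only the pairs and the initial factor G0 , exploiting the product form Bk = GkTGk .

```python
import numpy as np

# Limited-memory product B_k z from the stored pairs (d_j, w_j) and the
# initial factor G0.  The factor G_k is never formed; work grows linearly
# with the number of stored pairs, and B_k = G_k^T G_k by construction.
def apply_Bk(z, G0, pairs):
    """pairs = [(d_{k-l}, w_{k-l}), ..., (d_{k-1}, w_{k-1})]."""
    z = z.copy()
    for d, w in reversed(pairs):          # j = k-1 : k-l
        z = z + (w @ z) * d
    t = G0.T @ (G0 @ z)
    for d, w in pairs:                    # j = k-l : k-1
        t = t + (d @ t) * w
    return t

rng = np.random.default_rng(0)
G0 = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
pairs = [(rng.standard_normal(3), rng.standard_normal(3)) for _ in range(2)]
z = rng.standard_normal(3)
Bz = apply_Bk(z, G0, pairs)   # equals G_k^T G_k z for the implicit G_k
```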
In a QP solver that updates the Schur complement matrix rather than maintaining an explicit sparse Hessian, the system (3.7) with Bk = GkTGk is equivalent to

    ⎛ B̃k(0)  J̃kT  ũk−ℓ  w̃k−ℓ  ···  ũk−1  w̃k−1 ⎞ ⎛  pk  ⎞      ⎛ (gk + Bk ηk )B ⎞
    ⎜ J̃k                                       ⎟ ⎜ −π̂k ⎟      ⎜ ck + Jk ηk     ⎟
    ⎜ ũk−ℓT         γk−ℓ  −1                    ⎟ ⎜ rk−ℓ ⎟      ⎜ 0              ⎟
    ⎜ w̃k−ℓT         −1                          ⎟ ⎜ sk−ℓ ⎟ = −  ⎜ 0              ⎟ ,
    ⎜   ···                    ···              ⎟ ⎜  ··· ⎟      ⎜ ···            ⎟
    ⎜ ũk−1T                         γk−1  −1    ⎟ ⎜ rk−1 ⎟      ⎜ 0              ⎟
    ⎝ w̃k−1T                         −1          ⎠ ⎝ sk−1 ⎠      ⎝ 0              ⎠
where ĉk (x) denotes the linearized constraint functions ĉk (x) = ck + Jk (x − xk ), and [ v ]− = max{−vi , 0}. In this case the bound constraints are not imposed explicitly. Fletcher proposed minimizing this function using a trust-region method, although a line-search method would also be appropriate, particularly if Hk were positive definite. The trust-region subproblem has the form
see, e.g., Yuan [178], and Exler and Schittkowski [57]. A QP problem for the
second-order correction may be defined analogously to (3.27). For a general
discussion of the convergence properties of nondifferentiable exact penalty
functions in the SQP context, see Fletcher [64], Burke [22], and Yuan [176].
3.4. Filter methods. The definition of the merit function in the
Han-Powell method or the nonsmooth objective function in the sequential
unconstrained optimization method requires the specification of a penalty
parameter that weights the effect of the constraint violations against the
value of the objective. Another way of forcing convergence is to use a filter,
which is a two-dimensional measure of quality based on f(x) and ‖c(x)‖, where we assume that x ≥ 0 is satisfied throughout. A filter method requires that progress be made with respect to the two-dimensional function (‖c(x)‖, f(x)). Using the conventional notation of filter methods, we define h(x) = ‖c(x)‖ as the measure of infeasibility of the equality constraints, and use (h_j, f_j) to denote the pair (h(x_j), f(x_j)).
The two-dimensional measure provides the conditions for a point x̄ to be "better" than a point x̂. Given two points x̄ and x̂, the pair (h(x̄), f(x̄)) is said to dominate the pair (h(x̂), f(x̂)) if
\[
h(\bar x) \le \beta\, h(\hat x) \quad\text{and}\quad f(\bar x) \le f(\hat x) - \gamma\, h(\bar x),
\]
where β, γ ∈ (0, 1) are constants with 1 − β and γ small (e.g., β = 1 − γ with γ = 10⁻³). (For brevity, we say that x̄ dominates x̂, although it must be emphasized that only the objective value and constraint norm are stored.) A filter F consists of a list of entries (h_j, f_j) such that no entry dominates another. (This filter is the so-called sloping filter proposed by Chin [31] and Chin and Fletcher [32]. The original filter proposed by Fletcher and Leyffer [66, 67] uses γ = 0 and β = 1.)
A pair (h(x_k), f(x_k)) is said to be "acceptable to the filter" F if and only if it is not dominated by any entry in the filter.
for the new filter always includes the points that are unacceptable for the
old filter.
As in the Burke-Han approach of Section 3.2.3, the principal goal of
a filter method is the attainment of feasibility. An important property of
the filter defined above is that if there is an infinite sequence of iterations
in which (h(xk ), f (xk )) is entered into the filter, and {f (xk )} is bounded
below, then h(xk ) → 0 (see Fletcher, Leyffer and Toint [68]).
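The dominance and acceptability rules translate directly into code. The sketch below is our own illustration (the function names and the list representation of F are not from the text); it uses the constants suggested above, β = 1 − γ with γ = 10⁻³.

```python
def dominates(entry, cand, beta=1 - 1e-3, gamma=1e-3):
    """Sloping-filter rule: (h1, f1) dominates (h2, f2) if
    h1 <= beta*h2 and f1 <= f2 - gamma*h1."""
    h1, f1 = entry
    h2, f2 = cand
    return h1 <= beta * h2 and f1 <= f2 - gamma * h1

def acceptable(cand, filter_entries, **kw):
    """A pair is acceptable to the filter iff no stored entry dominates it."""
    return not any(dominates(e, cand, **kw) for e in filter_entries)

def add_to_filter(cand, filter_entries, **kw):
    """Add a pair, discarding stored entries that it dominates, so the
    invariant that no filter entry dominates another is preserved."""
    kept = [e for e in filter_entries if not dominates(cand, e, **kw)]
    kept.append(cand)
    return kept
```

Note that `add_to_filter` also prunes entries dominated by the new pair, which is what keeps the stored list an antichain in the (h, f) ordering.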
3.4.1. Trust-region filter methods. Fletcher and Leyffer [66, 67]
propose a trust-region filter method in which a filter is used to accept
or reject points generated by a plain SQP subproblem with a trust-region
constraint. Below we give a brief description of the variant of the Fletcher-
Leyffer method proposed by Fletcher, Leyffer and Toint [68]. The filter is
defined in terms of the one-norm of the constraint violations, i.e., h(x) = ‖c(x)‖₁, and the trust-region subproblem is given by
as xk+1 = x̂k and the trust-region radius δk+1 for the next iteration is
initialized at some value greater than some preassigned minimum value
δmin . This reinitialization provides the opportunity to increase the trust-
region radius based on the change in f . For example, the trust-region
radius can be increased if the predicted reduction in f is greater than some
positive factor of h.
As mentioned above, although (hk , fk ) is acceptable to Fk−1 , it is
not necessarily added to the filter. The point xk is added if and only if
Δmk (dk ) ≤ 0, in which case the QP solution predicts an increase in f , and
the primary aim of the iteration changes to that of reducing h (by allowing
f to increase if necessary). The requirement that Δmk (dk ) ≤ 0 for adding
to the filter ensures that all the filter entries have hj > 0. This is because
if hk = 0, then the QP must be compatible (even without this being an
assumption), and hence, if xk is not a KKT point, then Δmk (dk ) > 0 and
xk is not added to the filter.
Now we drop our assumption that the QP problem (3.31) is always
feasible. If a new entry is never added to the filter during the backtracking
procedure, then δk → 0 and there are two situations that can occur. If
c(xk ) = 0, then the problem looks like an unconstrained problem. If f is
reduced then we must make progress and conventional trust-region theory
applies. On the other hand, if c(xk) ≠ 0, then reducing the trust-region radius will eventually give an infeasible QP. In this case, the method switches
to a restoration phase that focuses on minimizing h(x) subject to x ≥ 0.
In this case a restoration filter may be defined that allows nonmonotone
progress on h(x). Note that it is possible for the QP to be infeasible for
any infeasible xk . In this situation the filter method will converge to a
nonoptimal local minimizer of h(x) (just as the Han-Powell method may
converge to a nonoptimal local minimizer of the merit function).
The convergence properties of filter-SQP methods are similar to those
of methods that use a merit function. In particular, it is possible to estab-
lish convergence to either a point that satisfies the first-order necessary con-
ditions for optimality, or a point that minimizes h(x) locally (see Fletcher,
Leyffer and Toint [68] for details). It is not necessary that Hk be positive
definite, although x̂k must be a global solution of the QP subproblem (3.31)
(see the cautionary opening remarks of the Appendix concerning the solu-
tion of indefinite QPs). Standard examples that exhibit the Maratos effect
for an SQP method with a merit function cause no difficulties for the filter
method. Although the unit step causes an increase in the constraint viola-
tion, and hence an increase in a penalty function, it also causes a decrease
in the objective and so it is acceptable to the filter. However, Fletcher and
Leyffer [67] give a simple example for which the QP solution increases both
the objective and the constraint violations, resulting in a reduction in the
trust-region radius and the rejection of the Newton step. Fletcher and Leyf-
fer propose the use of a second-order correction step analogous to (3.27).
Ulbrich [168] defines a filter that uses the Lagrangian function instead of
If the trial step length is reduced below a minimum value α_k^min, the line
search is abandoned and the algorithm switches to the restoration phase.
For more details, the reader is referred to the two papers of Wächter and
Biegler [171, 170]. The caveats of the previous section concerning the def-
inition of Hk also apply to the line-search filter method. In addition, the
absence of an explicit bound on ‖x − xk‖ provided by the trust-region
constraint adds the possibility of unboundedness of the QP subproblem.
Chin, Rashid and Nor [33] consider a line-search filter method that
includes a second-order correction step during the backtracking procedure.
If xk + αpk is not acceptable to the filter, a second-order correction sk is
defined by solving the equality-constrained QP:
where ν ∈ (2, 3) and A(x̂k) is the active set predicted by the QP subproblem
(for a similar scheme, see Herskovits [121], and Panier and Tits [147, 148]).
Given an optimal solution sk , Chin, Rashid and Nor [33] show that under
certain assumptions, the sufficient decrease criteria
in Sections 3.2–3.4 are completely suitable for use with second derivatives.
The main difficulty stems from the possibility that the Hessian of the La-
grangian is indefinite, in which case the inequality constrained QP subprob-
lem is nonconvex. A nonconvex QP is likely to have many local solutions,
and may be unbounded. Some SQP methods are only well-defined if the
subproblem is convex—e.g., methods that rely on the use of a positive-
definite quasi-Newton approximate Hessian. Other methods require the
calculation of a global solution of the QP subproblem, which has the bene-
fit of ensuring that the “same” local solution is found for the final sequence
of related QPs. Unfortunately, nonconvex quadratic programming is NP-
hard, and even the seemingly simple task of checking for local optimality
is intractable when there are zero Lagrange multipliers (see the opening
remarks of the Appendix).
One approach to resolving this difficulty is to estimate the active set
using a convex programming approximation of the plain QP subproblem
(2.1). This active set is then used to define an equality-constrained QP
(EQP) subproblem whose solution may be used in conjunction with a merit
function or filter to obtain the next iterate. One of the first methods to
use a convex program to estimate the active set was proposed by Fletcher
and Sainz de la Maza [69], who proposed estimating the active set by
solving a linear program with a trust-region constraint. (Their method
was formulated first as a sequential unconstrained method for minimizing
a nonsmooth composite function. Here we describe the particular form of
the method in terms of minimizing the ℓ₁ penalty function P₁(x, ρ) defined
in (1.8).) The convex subproblem has the form
\[
\begin{array}{ll}
\underset{x,\,u,\,v,\,w}{\text{minimize}} & f_k + g_k^T(x - x_k) + \rho\, e^T(u + v) + \rho\, e^T w\\[4pt]
\text{subject to} & c_k + J_k(x - x_k) - u + v = 0, \quad u \ge 0,\; v \ge 0,\\
& x_k - \delta_k e \le x \le x_k + \delta_k e, \quad x + w \ge 0,\; w \ge 0.
\end{array}
\tag{3.35}
\]
This equivalence was the motivation for the method to be called the suc-
cessive linear programming (SLP) method. Fletcher and Sainz de la Maza
use the reduction in P1 (x, ρ) predicted by the first-order subproblem (3.34)
to assess the quality of the reduction P1 (xk , ρ) − P1 (xk + dk , ρ) defined by
a second-order method (to be defined below).
Let d_k^LP = x̂_k^LP − x_k, where x̂_k^LP is a solution of the LP (3.35), and define Δl_k = l_k(x_k) − l_k(x_k + d_k^LP) ≥ 0. A step d is judged acceptable if
\[
P_1(x_k, \rho) - P_1(x_k + d, \rho) \ge \eta\, \Delta l_k,
\]
where η is some preassigned scalar such that 0 < η < ½. This criterion is
used to determine if the new iterate xk+1 should be set to (i) the current
iterate xk (which always triggers a reduction in the trust-region radius);
(ii) the second-order step xk + dk ; or (iii) the first-order step xk + dkLP .
The test for accepting the second-order step is done first. If the second-order step fails, then the penalty function is recomputed at x_k + d_k^LP and
the test is repeated to determine if xk+1 should be xk + dkLP . Finally, the
trust-region radius is updated based on a conventional trust-region strategy
that compares the reduction in the penalty function with the reduction
predicted by the LP subproblem (the reader is referred to the original paper
for details).
Next, we consider how to define a second-order step. Let B and N denote the final LP basic and nonbasic sets for the LP (3.35). To simplify the description, assume that the optimal u, v and w are zero. A second-order iterate x̂_k can be defined as the solution of an equality-constrained quadratic program (EQP) defined by minimizing the quadratic model f̂_k(x) = f_k + g_k^T(x − x_k) + ½(x − x_k)^T B_k (x − x_k) subject to c_k + J_k(x − x_k) = 0, with the nonbasic variables fixed at their current values. Let p_k = x̂_k − x_k, where (x̂_k, π̂_k) is the primal-dual EQP solution. Let p̄_k denote the vector of components of p_k in the final LP basic set B, with J̄_k the corresponding columns of J_k. The vector (p̄_k, π̂_k) satisfies the KKT equations
\[
\begin{pmatrix} \bar B_k & \bar J_k^T\\ \bar J_k & 0 \end{pmatrix}
\begin{pmatrix} \bar p_k\\ -\hat\pi_k \end{pmatrix}
= -\begin{pmatrix} (g_k + B_k \eta_k)_B\\ c_k + J_k \eta_k \end{pmatrix},
\tag{3.36}
\]
There are many ways of solving these KKT equations. The most appropri-
ate method will depend on certain basic properties of the problem being
solved, which include the size of the problem (i.e., the number of variables
and constraints); whether the Jacobian is dense or sparse; and how
the approximate Hessian is stored (see, e.g., Section 3.2.1). Fletcher and
Sainz de la Maza suggest finding an approximate solution of the EQP using
a quasi-Newton approximation of the reduced Hessian matrix (see Coleman
and Conn [36]).
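For a small dense problem, a KKT system of the form (3.36) can simply be assembled and solved; production codes would instead use a symmetric indefinite or reduced-Hessian factorization. A minimal sketch with hypothetical names, where `B`, `J`, `g`, `c` stand for the basic Hessian, the basic Jacobian columns, and the two right-hand-side blocks:

```python
import numpy as np

def solve_eqp_kkt(B, J, g, c):
    """Solve the EQP KKT system
        [ B  J^T ] [ p   ]     [ g ]
        [ J  0   ] [ -pi ] = - [ c ],
    returning the primal step p and the multipliers pi."""
    n, m = B.shape[0], J.shape[0]
    K = np.block([[B, J.T], [J, np.zeros((m, m))]])
    rhs = -np.concatenate([g, c])
    sol = np.linalg.solve(K, rhs)
    p, neg_pi = sol[:n], sol[n:]
    return p, -neg_pi
```

The identities B p + g = J^T π and J p = −c give a cheap consistency check on any computed solution.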
The results of Fletcher and Sainz de la Maza may be used to show that,
under reasonable nondegeneracy and second-order conditions, the active set
of the LP subproblem (3.35) ultimately predicts that of the smooth variant
of the penalty function at limit points of {xk }. This implies fast asymptotic
convergence. Fletcher and Sainz de la Maza did not consider the use of
exact second derivatives in their original paper, and it took more than 12
years before the advent of reliable second-derivative trust-region and filter
methods for the EQP subproblem allowed the potential of SLP methods
to be realized. Chin and Fletcher [32] proposed the use of a trust-region
filter method that does not require the use of the ℓ₁ penalty function. For
a similar approach that uses a filter, see Fletcher et al. [65]. In a series of
papers, Byrd, Gould, Nocedal and Waltz [25, 26] proposed a method that
employs an additional trust region to safeguard the EQP direction. They
also define an appropriate method for adjusting the penalty parameter.
Recently, Morales, Nocedal and Wu [137], and Gould and Robinson [105,
106, 107] have proposed SQP methods that identify the active set using a
convex QP based on a positive-definite BFGS approximation of the Hessian.
the problem is the elastic problem (1.4) with ρ = ρk−1 , whereas at iteration
k, the problem is the elastic problem with ρ = ρk . We may write
\[
f(x) + \rho_k e^T u + \rho_k e^T v \;=\; \frac{\rho_k}{\rho_{k-1}} \Big( \frac{\rho_{k-1}}{\rho_k}\, f(x) + \rho_{k-1} e^T u + \rho_{k-1} e^T v \Big).
\tag{4.2}
\]
If the NLP is infeasible, it must hold that u + v > 0. If ρk−1 is large,
with ρk > ρk−1 and u + v > 0, then the term f (x) is negligible in (4.2),
i.e., f(x) ≪ ρk−1 eTu + ρk−1 eTv, so that
\[
\bar A_B = \begin{pmatrix} A_B & 0\\ a^T & 1 \end{pmatrix},
\]
so the new basic solution x̄B is the old solution xB , augmented by the new
slack, which is infeasible. This means that if we solve the primal QP then it
would be necessary to go into phase 1 to get started. However, by solving
the dual QP, we have an initial feasible subspace minimizer for the
dual based on a ȳ B (= x̄B ) such that ĀB ȳ B = b̄ and
z̄ = ḡ + H̄ ȳ − ĀTπ̄.
APPENDIX
A. Methods for quadratic programming. We consider methods
for the quadratic program
\[
\begin{array}{ll}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & g^T(x - x_I) + \tfrac12 (x - x_I)^T H (x - x_I)\\[4pt]
\text{subject to} & Ax = Ax_I - b, \quad x \ge 0,
\end{array}
\tag{A.1}
\]
where g, H, b, A and xI are given constant quantities, with H symmetric.
The QP objective is denoted by f̂(x), with gradient ĝ(x) = g + H(x − xI). In some situations, the general constraints will be written as ĉ(x) = 0, with ĉ(x) = A(x − xI) + b. The QP active set is denoted by A(x). A primal-dual QP solution is denoted by (x∗, π∗, z∗). In terms of the QP defined at
the kth outer iteration of an SQP method, we have xI = xk , b = c(xk ),
g = g(xk ), A = J(xk ) and H = H(xk , πk ). It is assumed that A has rank
m. No assumptions are made about H other than symmetry. Conditions
that must hold at an optimal solution of (A.1) are provided by the following
result (see, e.g., Borwein [11], Contesse [43] and Majthay [132]).
Result A.1 (QP optimality conditions). The point x∗ is a local
minimizer of the quadratic program (A.1) if and only if
(a) ĉ(x∗) = 0, x∗ ≥ 0, and there exists at least one pair of vectors
π∗ and z∗ such that ĝ(x∗) − A^T π∗ − z∗ = 0, with z∗ ≥ 0, and
z∗ · x∗ = 0;
(b) p^T Hp ≥ 0 for all nonzero p satisfying ĝ(x∗)^T p = 0, Ap = 0, and
p_i ≥ 0 for every i ∈ A(x∗).
Part (a) gives the first-order KKT conditions (2.18) for the QP (A.1). If
H is positive semidefinite, the first-order KKT conditions are both necessary
and sufficient for (x∗ , π ∗ , z ∗ ) to be a local primal-dual solution of (A.1).
Suppose that (x∗ , π∗ , z ∗ ) satisfies condition (a) with zi∗ = 0 and x∗i = 0
for some i. If H is positive semidefinite, then x∗ is a weak minimizer of
(A.1). In this case, x∗ is a global minimizer with a unique global minimum
f̂(x∗). If H has at least one negative eigenvalue, then x∗ is known as a dead
point. Verifying condition (b) at a dead point requires finding the global
minimizer of an indefinite quadratic form over a cone, which is an NP-hard
problem (see, e.g., Cottle, Habetler and Lemke [44], Pardalos and Schnit-
ger [149], and Pardalos and Vavasis [150]). This implies that the optimality
of a candidate solution of a general quadratic program can be verified only
if more restrictive (but computationally tractable) sufficient conditions are
satisfied. A dead point is a point at which the sufficient conditions are not
satisfied, but certain necessary conditions hold. Computationally tractable
necessary conditions are based on the following result.
Result A.2 (Necessary conditions for optimality). The point x∗ is a
local minimizer of the QP (A.1) only if
(a) ĉ(x∗) = 0, x∗ ≥ 0, and there exists at least one pair of vectors
π∗ and z∗ such that ĝ(x∗) − A^T π∗ − z∗ = 0, with z∗ ≥ 0, and
z∗ · x∗ = 0;
frain from referring to the nonbasic and basic sets as the “fixed” and “free”
variables because some active-set methods allow some nonbasic variables
to move (the simplex method for linear programming being one prominent
example). An important attribute of the nonbasic set is that AB has rank
m, i.e., the rows of AB are linearly independent. This implies that the
cardinality of the nonbasic set must satisfy 0 ≤ nN ≤ n − m. It must be
emphasized that our definition of N does not require a nonbasic variable to
be active (i.e., at its lower bound). Also, whereas the active set is defined
uniquely at each point, there are many choices for N (including the empty
set). Given any n-vector y, the vector of basic components of y, denoted
by yB , is the nB -vector whose jth component is component βj of y. Simi-
larly, yN, the vector of nonbasic components of y, is the nN-vector whose jth
component is component νj of y.
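The indexing convention for y_B and y_N can be made concrete with a toy example (the function name and data are illustrative only):

```python
def partition(y, basic, nonbasic):
    """Return (y_B, y_N): the jth component of y_B is component basic[j]
    of y, and similarly for y_N with nonbasic[j]."""
    return [y[i] for i in basic], [y[i] for i in nonbasic]
```

For instance, with basic = [2, 0] the first basic component of y is y₂, mirroring the definition "the jth component of y_B is component β_j of y".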
Given a basic-nonbasic partition of the variables, we introduce the
definitions of stationarity and optimality with respect to a basic set.
Definition A.1 (Subspace stationary point). Let B be a basic set defined at a point x̂ such that ĉ(x̂) = 0. Then x̂ is a subspace stationary point with respect to B (or, equivalently, with respect to A_B) if there exists a vector π such that ĝ_B(x̂) = A_B^T π. Equivalently, x̂ is a subspace stationary point with respect to B if the reduced gradient Z_B^T ĝ_B(x̂) is zero, where the columns of Z_B form a basis for the null-space of A_B.
If x̂ is a subspace stationary point, f̂ is stationary on the subspace {x : A(x − x̂) = 0, x_N = x̂_N}. At a subspace stationary point, it holds that ĝ(x̂) = A^T π + z, where z_i = 0 for i ∈ B, i.e., z_B = 0. Subspace stationary points may be classified based on the curvature of f̂ on the nonbasic set.
Definition A.2 (Subspace minimizer). Let x̂ be a subspace stationary point with respect to B. Let the columns of Z_B form a basis for the null-space of A_B. Then x̂ is a subspace minimizer with respect to B if the reduced Hessian Z_B^T H Z_B is positive definite.
If the nonbasic variables are active at x̂, then x̂ is called a standard subspace minimizer. At a standard subspace minimizer, if z_N ≥ 0 then x̂ satisfies the necessary conditions for optimality. Otherwise, there exists an index ν_s ∈ N such that z_{ν_s} < 0. If some nonbasic variables are not active at x̂, then x̂ is called a nonstandard subspace minimizer.
It is sometimes convenient to be able to characterize the curvature of f̂ in a form that does not require the matrix Z_B explicitly. The inertia of
a symmetric matrix X, denoted by In(X), is the integer triple (i+ , i− , i0 ),
where i+ , i− and i0 denote the number of positive, negative and zero eigen-
values of X. Gould [101] shows that if AB has rank m and AB ZB = 0, then
Z_B^T H_B Z_B is positive definite if and only if
\[
\operatorname{In}(K_B) = (n_B, m, 0), \quad\text{where}\quad K_B = \begin{pmatrix} H_B & A_B^T\\ A_B & 0 \end{pmatrix}
\tag{A.2}
\]
(see Forsgren [70] for a more general discussion, including the case where AB
does not have rank m). Many algorithms for solving symmetric equations
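For small dense examples, the inertia test (A.2) can be checked numerically. The sketch below (our own illustration) obtains the inertia from an eigendecomposition; in practice the inertia is a by-product of a symmetric indefinite (e.g. Bunch-Kaufman) factorization rather than an eigenvalue computation.

```python
import numpy as np

def inertia(X, tol=1e-10):
    """Return (n+, n-, n0): the numbers of positive, negative and
    (numerically) zero eigenvalues of the symmetric matrix X."""
    eigs = np.linalg.eigvalsh(X)
    return (int((eigs > tol).sum()), int((eigs < -tol).sum()),
            int((np.abs(eigs) <= tol).sum()))

def reduced_hessian_posdef(HB, AB):
    """Gould's characterization: Z_B^T HB Z_B is positive definite iff
    In(KB) = (nB, m, 0) for KB = [[HB, AB^T], [AB, 0]]."""
    nB, m = HB.shape[0], AB.shape[0]
    KB = np.block([[HB, AB.T], [AB, np.zeros((m, m))]])
    return inertia(KB) == (nB, m, 0)
```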
where, as above, ĝ_B(x) are the basic components of ĝ(x), and H_B and H_D are the basic rows of the basic and nonbasic columns of H. If x is a subspace minimizer, then ĝ_B(x) − A_B^T π = 0, so that this system simplifies to
\[
\begin{pmatrix} H_B & -A_B^T & H_D\\ A_B & 0 & A_N\\ 0 & 0 & I_N \end{pmatrix}
\begin{pmatrix} p_B\\ q_\pi\\ p_N \end{pmatrix}
= \begin{pmatrix} 0\\ 0\\ e_s \end{pmatrix},
\tag{A.4}
\]
yielding p_B and q_π as the solution of the smaller system
\[
\begin{pmatrix} H_B & -A_B^T\\ A_B & 0 \end{pmatrix}
\begin{pmatrix} p_B\\ q_\pi \end{pmatrix}
= -\begin{pmatrix} (h_{\nu_s})_B\\ a_{\nu_s} \end{pmatrix}.
\tag{A.5}
\]
The increment q_N for the multipliers z_N is computed from p_B, p_N and q_π as q_N = (Hp − A^T q_π)_N. Once p_B and q_π are known, a nonnegative step α is computed so that x + αp is feasible and f̂(x + αp) ≤ f̂(x). The step that minimizes f̂ as a function of α is given by
\[
\alpha^* = \begin{cases} -\hat g(x)^T p / p^T H p & \text{if } p^T H p > 0,\\ +\infty & \text{otherwise.} \end{cases}
\tag{A.6}
\]
The best feasible step is then α = min{α∗, α_M}, where α_M is the maximum feasible step:
\[
\alpha_M = \min_{1 \le i \le n_B} \gamma_i, \quad\text{where}\quad
\gamma_i = \begin{cases} \dfrac{(x_B)_i}{-(p_B)_i} & \text{if } (p_B)_i < 0,\\[6pt] +\infty & \text{otherwise.} \end{cases}
\tag{A.7}
\]
(As pN = es and the problem contains only lower bounds, x + tp remains
feasible with respect to the nonbasic variables for all t ≥ 0.) If α = +∞
then f̂ decreases without limit along p and the problem is unbounded.
Otherwise, the new iterate is (x̄, π̄) = (x + αp, π + αqπ ).
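Combining (A.6) and (A.7) gives a short step computation. The sketch below is a hypothetical helper, assuming dense numpy data, with `g` the gradient ĝ(x) and `p` the full search direction:

```python
import numpy as np

def qp_step(x_B, p_B, g, p, H):
    """Best feasible step alpha = min(alpha_star, alpha_M):
    alpha_star minimizes the quadratic along p (A.6), and alpha_M is the
    largest step keeping the basic variables nonnegative (A.7)."""
    curv = p @ H @ p
    alpha_star = -(g @ p) / curv if curv > 0 else np.inf
    # ratio test over basic components moving toward their bounds
    ratios = [xi / -pi for xi, pi in zip(x_B, p_B) if pi < 0]
    alpha_M = min(ratios) if ratios else np.inf
    return min(alpha_star, alpha_M)
```

A return value of +inf signals that the objective decreases without limit along p, i.e., an unbounded problem.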
It is instructive to define the step α∗ of (A.6) in terms of the identities
Let z(t) denote the vector of reduced costs at any point on the ray (x +
tp, π + tqπ), i.e., z(t) = ĝ(x + tp) − A^T(π + tqπ). It follows from the definition
of p and qπ of (A.4) that zB (t) = 0 for all t, which implies that x + tp is a
subspace stationary point for any step t. (Moreover, x + tp is a subspace
minimizer because the KKT matrix KB is independent of t.) This property,
known as the parallel subspace property of quadratic programming, implies
that x + tp is the solution of an equality-constraint QP in which the bound
on the sth nonbasic is shifted to pass through x + tp. The component
zνs (t) is the reduced cost associated with the shifted version of the bound
xνs ≥ 0. By definition, the sth nonbasic reduced cost is negative at x,
i.e., zνs (0) < 0. Moreover, a simple calculation shows that zνs (t) is an
increasing linear function of t with zνs (α∗ ) = 0 if α∗ is bounded. A zero
reduced cost at t = α∗ means that the shifted bound can be removed from
the equality-constraint problem (A.3) (defined at x = x̄) without changing
its minimizer. Hence, if x̄ = x + α∗ p, the index νs is moved to the basic set,
which adds column aνs to AB for the next iteration. The shifted variable
has been removed from the nonbasic set, which implies that (x̄, π̄) is a
standard subspace minimizer.
If we take a shorter step to the boundary of the feasible region, i.e.,
αM < α∗ , then at least one basic variable lies on its bound at x̄ = x + αp,
and one of these, xβr say, is made nonbasic. If ĀB denotes the matrix AB
with column r deleted, then ĀB is not guaranteed to have full row rank
(for example, if x is a vertex, AB is square and ĀB has more rows than
columns). The linear independence of the rows of ĀB is characterized by
the so-called “singularity vector” uB given by the solution of the equations
\[
\begin{pmatrix} H_B & -A_B^T\\ A_B & 0 \end{pmatrix}
\begin{pmatrix} u_B\\ v_\pi \end{pmatrix}
= \begin{pmatrix} e_r\\ 0 \end{pmatrix}.
\tag{A.9}
\]
The matrix ĀB has full rank if and only if uB = 0. If ĀB is rank deficient,
x̄ is a subspace minimizer with respect to the basis defined by removing
xνs , i.e., xνs is effectively replaced by xβr in the nonbasic set. In this case,
it is necessary to update the dual variables again to reflect the change of
basis (see Gill and Wong [96] for more details). The new multipliers are
π̄ + σvπ, where σ = ĝ(x̄)^T p/(pB)r.
As defined above, this method requires the solution of two KKT sys-
tems at each step (i.e., equations (A.5) and (A.9)). However, if the solution
of (A.9) is such that uB = 0, then the vectors pB and qπ needed at x̄ can be
updated in O(n) operations using the vectors uB and vπ . Hence, it is un-
necessary to solve (A.5) when a basic variable is removed from B following
a restricted step.
Given an initial standard subspace minimizer x0 and basic set B0 , this
procedure generates a sequence of primal-dual iterates {(xj , πj )} and an
associated sequence of basic sets {Bj }. The iterates occur in groups of
consecutive iterates that start and end at a standard subspace minimizer.
The optimality conditions imply that pB and qπ satisfy the KKT system
\[
\begin{pmatrix} H_B & -A_B^T\\ A_B & 0 \end{pmatrix}
\begin{pmatrix} p_B\\ q_\pi \end{pmatrix}
= -\begin{pmatrix} \hat g_B(x) - A_B^T \pi\\ 0 \end{pmatrix}.
\tag{A.11}
\]
The scalar σ must be nonzero or else ATB v̄ = 0, which would contradict the
assumption that AB has rank m. Then
\[
0 = \bar v^T (A_B\, p_B) = (\bar v^T A_B)\, p_B = \sigma\, e_r^T p_B = \sigma (p_B)_r,
\]
which implies that (p_B)_r = 0. This is a contradiction because the ratio test
(A.7) will choose βr as the outgoing basic variable only if (pB )r < 0. It
follows that v̄ = 0, and hence ĀB must have rank m.
1 Some were first proposed for the all-inequality constraint case, but they are easily
\[
\begin{pmatrix} \bar H_B & -\bar A_B^T\\ \bar A_B & 0 \end{pmatrix}
\begin{pmatrix} p_B\\ q_\pi \end{pmatrix}
= -\begin{pmatrix} (h_{\beta_s})_{\bar B}\\ a_{\beta_s} \end{pmatrix}
\quad\text{and}\quad \mu = p_B^T H_B\, p_B,
\tag{A.13}
\]
where B̄ is the basic set with index βs omitted. A comparison of (A.13) with
(A.5) shows that their respective values of (pB, qπ) are the same, which implies that the Fletcher-Gould binding direction is identical to the nonbinding
direction of Section A.1.1. In fact, all binding and nonbinding direction
inertia-controlling methods generate the same sequence of iterates when
started at the same subspace minimizer. The only difference is in the or-
der in which the computations are performed—binding-direction methods
make the targeted nonbasic variable basic at the start of the sequence of
consecutive iterates, whereas nonbinding-direction methods make the vari-
able basic at the end of the sequence when the associated shifted bound
constraint has a zero multiplier. However, it must be emphasized that not
all QP methods are inertia controlling. Some methods allow any number of
zero eigenvalues in the KKT matrix—see, for example, the Bunch-Kaufman
method mentioned above, and the QP methods in the GALAHAD software
package of Gould, Orban, and Toint [109, 110, 104].
(b) A dual subspace stationary point at which the reduced KKT matrix
(A.16) is nonsingular is a dual subspace minimizer.
(c) If (w, π, z) is a standard subspace minimizer, then zB = 0 and
zN ≥ 0.
This result implies that x = w at a dual subspace minimizer for the
special case of H positive definite. However, it is helpful to distinguish
between w and x to emphasize that x is the vector of dual variables for
the dual problem. At a subspace stationary point, x is a basic solution of
the primal equality constraints. Moreover, z = H(w − xI ) − ATπ + g =
ĝ(w) − A^Tπ = ĝ(x) − A^Tπ, which are the primal reduced costs.
Let (w, π) be a nonoptimal dual subspace minimizer for the dual QP
(A.14). (It will be shown below that the vector w need not be com-
puted explicitly.) As (w, π) is not optimal, there is at least one nega-
tive component of the dual multiplier vector xB , say xβr . If we apply the
nonbinding-direction method of Section A.1.1, we define a dual search di-
rection (Δw, qπ , Δz) that is feasible for the dual equality constraints and
increases a nonbasic variable with a negative multiplier. As (w, π, z) is
assumed to be dual feasible, this gives the constraints for the equality-
constraint QP subproblem in the form
The equations analogous to (A.4) for the dual direction (p, qπ , Δz) are
\[
\begin{pmatrix}
H_B & H_D & 0 & 0 & -H_B & -H_D & 0\\
H_D^T & H_N & 0 & 0 & -H_D^T & -H_N & 0\\
0 & 0 & 0 & 0 & A_B & A_N & 0\\
0 & 0 & 0 & 0 & 0 & I_N & 0\\
H_B & H_D & -A_B^T & 0 & 0 & 0 & I_B\\
H_D^T & H_N & -A_N^T & -I_N & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & I_B
\end{pmatrix}
\begin{pmatrix}
\Delta w_B\\ \Delta w_N\\ q_\pi\\ \Delta z_N\\ p_B\\ p_N\\ \Delta z_B
\end{pmatrix}
=
\begin{pmatrix}
0\\ 0\\ 0\\ 0\\ 0\\ 0\\ e_r
\end{pmatrix},
\]
where p_B and p_N denote the changes in the multipliers x of the dual. Block elimination gives HΔw = Hp, where p_B, p_N and q_π are determined by the equations
\[
p_N = 0, \quad\text{and}\quad
\begin{pmatrix} H_B & -A_B^T\\ A_B & 0 \end{pmatrix}
\begin{pmatrix} p_B\\ q_\pi \end{pmatrix}
= \begin{pmatrix} e_r\\ 0 \end{pmatrix}.
\tag{A.17}
\]
If the curvature is nonzero, the step α∗ = −(xB )r /(pB )r minimizes the dual
objective f̂_D(w + αΔw, π + αqπ, z + αΔz) with respect to α, and the rth
element of xB + α∗ pB is zero. If the xB are interpreted as estimates of the
primal variables, the step from xB to xB + α∗ pB increases the negative (and
hence infeasible) primal variable (xB )r until it reaches its bound of zero.
If α = α∗ gives a feasible point for the dual inequalities, i.e., if z + α∗Δz is
nonnegative, then the new iterate is (w + α∗ Δw, π + α∗ qπ , z + α∗ Δz). In
this case, the nonbinding variable is removed from the dual nonbasic set,
which means that the index βr is moved to N and the associated entries
of H and A are removed from HB and AB .
If α = α∗ is unbounded, or (w + α∗Δw, π + α∗qπ, z + α∗Δz) is not feasible, the step is the largest α such that ĝ(w + αΔw) − A^T(π + αqπ) is nonnegative. The required value is
\[
\alpha_F = \min_{1 \le i \le n_N} \gamma_i, \quad\text{where}\quad
\gamma_i = \begin{cases} \dfrac{(z_N)_i}{-(\Delta z_N)_i} & \text{if } (\Delta z_N)_i < 0,\\[6pt] +\infty & \text{otherwise.} \end{cases}
\tag{A.18}
\]
then the dual constraint gradients, the rows of (H  −A^T), are linearly independent, and the dual feasible region has no degenerate points. In this
situation, an active-set dual method cannot cycle, and will either terminate
with an optimal solution or declare the dual problem to be unbounded.
This nondegeneracy property does not hold for a dual linear program, but
it does hold for strictly convex problems and any QP with H and A of
the form
\[
H = \begin{pmatrix} \bar H & 0\\ 0 & 0 \end{pmatrix} \quad\text{and}\quad A = \begin{pmatrix} \bar A & -I_m \end{pmatrix},
\]
Some references include: Goldfarb and Idnani [100], Powell [154]. A variant
of the Goldfarb and Idnani method for dense convex QP has been proposed
by Boland [10].
A.3. QP regularization. The methods considered above rely on the
assumption that each basis matrix AB has rank m. In an active-set method
this condition is guaranteed (at least in exact arithmetic) by the active-set
strategy if the initial basis has rank m. For methods that solve the KKT
system by factoring a subset of m columns of AB (see Section A.4.1), special
techniques can be used to select a linearly independent set of m columns
from A. These procedures depend on the method used to factor the basis—
for example, the SQP code SNOPT employs a combination of LU factor-
ization and basis repair to determine a full-rank basis. If a factorization
(see, e.g., Gill and Robinson [95], Gill and Wong [96]). This result implies
that if πE is an approximate multiplier vector (e.g., from the previous QP
subproblem in the SQP context), then the minimizer of M(x, π; πE , μ, ν)
will approximate the minimizer of (A.19). In order to distinguish between
a solution of (A.19) and a minimizer of (A.23) for an arbitrary πE , we
use (x∗ , π∗ ) to denote a minimizer of M(x, π; πE , μ, ν). Observe that sta-
tionarity of ∇M at (x∗, π∗) implies that π∗ = π̄(x∗) = πE − ĉ(x∗)/μ. The
components of π̄(x∗ ) are the so-called first-order multipliers associated with
a minimizer of (A.23).
Particular values of the parameter ν give some well-known functions
(although, as noted above, each function defines a problem with the com-
mon solution (x∗ , π∗ )). If ν = 0, then M is independent of π, with
\[
M(x; \pi_E, \mu) \equiv M(x; \pi_E, \mu, 0) = \hat f(x) - \hat c(x)^T \pi_E + \frac{1}{2\mu} \|\hat c(x)\|_2^2.
\tag{A.24}
\]
This is the conventional Hestenes-Powell augmented Lagrangian (1.11) ap-
plied to (A.19). If ν = 1 in (A.21), M is the primal-dual augmented
Lagrangian
\[
\bar f(x) - \bar c(x)^T \pi_E + \frac{1}{2\mu}\|\bar c(x)\|_2^2
+ \frac{1}{2\mu}\|\bar c(x) + \mu(\pi - \pi_E)\|_2^2
\tag{A.25}
\]
considered by Robinson [155] and Gill and Robinson [95]. If ν = −1, then
M is the proximal-point Lagrangian
\[
\bar f(x) - \bar c(x)^T \pi - \frac{\mu}{2}\|\pi - \pi_E\|_2^2.
\]
As ν is negative in this case, ∇2M is indefinite and M has an unbounded
minimizer. Nevertheless, a unique minimizer of M for ν > 0 is a saddle-
point for an M defined with a negative ν. Moreover, for ν = −1, (x∗ , π∗ )
solves the min-max problem
\[
\min_x \; \max_\pi \;\; \bar f(x) - \bar c(x)^T \pi - \frac{\mu}{2}\|\pi - \pi_E\|_2^2.
\]
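As a quick numerical illustration of the first-order multiplier relation (a sketch with made-up data, not from the text; the bar modifications on f̄ and c̄ are dropped, and SciPy's BFGS stands in for a proper solver), minimizing the ν = 0 function (A.24) for a small equality-constrained problem yields πE − c(x∗)/μ close to the true multiplier:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize x1^2 + 2 x2^2 subject to x1 + x2 = 1,
# whose unique Lagrange multiplier is 4/3.
def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def c(x):
    return np.array([x[0] + x[1] - 1.0])

pi_E, mu = np.array([0.2]), 1.0e-3      # crude multiplier estimate, small mu

def M(x):
    # Hestenes-Powell augmented Lagrangian (A.24) with nu = 0.
    return f(x) - c(x) @ pi_E + (c(x) @ c(x)) / (2.0 * mu)

def dM(x):
    # Gradient of M; the constraint Jacobian here is J = [1, 1].
    t = -pi_E[0] + c(x)[0] / mu
    return np.array([2.0 * x[0] + t, 4.0 * x[1] + t])

res = minimize(M, x0=np.zeros(2), jac=dM, method="BFGS")
x_star = res.x
pi_star = float((pi_E - c(x_star) / mu)[0])   # first-order multiplier
```

For this quadratic, the minimizer of M gives π̄(x∗) ≈ 4/3 up to an O(μ) error, consistent with the stationarity relation quoted above.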
In what follows, we use M(v) to denote M as a function of the primal-
dual variables v = (x, π) for given values of πE , μ and ν. Given the initial
point vI = (xI , πI ), the stationary point of M(v) is v∗ = vI + Δv, where
Δv = (p, q) with ∇2M(vI )Δv = −∇M(vI ). It can be shown that Δv
satisfies the equivalent system
\[
\begin{pmatrix} H & -A^T \\ A & \mu I \end{pmatrix}
\begin{pmatrix} p \\ q \end{pmatrix}
= -\begin{pmatrix} \bar g(x_I) - A^T \pi_I \\ \bar c(x_I) + \mu(\pi_I - \pi_E) \end{pmatrix},
\tag{A.26}
\]
This technique has been proposed for general nonlinear programming (see,
e.g., Conn, Gould and Toint [39, 40, 41], Friedlander [73], and Friedlander
and Saunders [75]), and for quadratic programming (see, e.g., Dostál, Fried-
lander and Santos [51, 52, 53], Delbos and Gilbert [46], Friedlander and
Leyffer [74], and Maes [131]). Subproblem (A.30) may be solved using one
of the active-set methods of Sections A.1.2 and A.1.1, although no explicit
phase-one procedure is needed because there are no general constraints.
In the special case of problem (A.30) a primal active-set method defines a
sequence of nonnegative iterates {xj } such that xj+1 = xj + αj pj ≥ 0. At
the jth iteration of the binding-direction method of Section A.1.2, variables
in the nonbasic set N remain unchanged for any value of the step length,
i.e., pN = (pj )N = 0. This implies that the elements pB of the direction pj
must solve the unconstrained QP subproblem:
has some advantages when solving convex and general QP problems. For
more details, see Gill and Wong [96].
If the QP is a “one-off” problem, then established techniques associated
with the bound-constrained augmented Lagrangian method can be used to
update πE and μ (see, e.g., Conn, Gould and Toint [40], Dostál, Friedlander
and Santos [51, 52, 53], Delbos and Gilbert [46], and Friedlander and Leyffer
[74]). These rules are designed to update πE and μ without the need to
find the exact solution of (A.30). In the SQP context, it may be more
appropriate to find an approximate solution of (A.30) for a fixed value of
πE , which is then updated in the outer iteration. Moreover, as μ is being
used principally for regularization, it is given a smaller value than is typical
in a conventional augmented Lagrangian method.
A.4. Solving the KKT system. The principal work associated with
a QP iteration is the cost of solving one or two saddle-point systems of
the form
\[
\begin{pmatrix} H_B & -A_B^T & H_D \\ A_B & \mu I & A_N \\ 0 & 0 & I_N \end{pmatrix}
\begin{pmatrix} y_B \\ w \\ y_N \end{pmatrix}
= \begin{pmatrix} g_B \\ f_1 \\ f_2 \end{pmatrix},
\tag{A.33}
\]
where μ is a nonnegative scalar. We focus on two approaches appropriate
for large-scale quadratic programming.
A.4.1. Variable-reduction methods. These methods are appropri-
ate for the case μ = 0. As AB has rank m, there exists a nonsingular QB
such that
\[
A_B Q_B = \begin{pmatrix} 0 & B \end{pmatrix},
\tag{A.34}
\]
with B an m × m nonsingular matrix. If μ = 0, the matrix Q_B is used to
transform the generic system (A.33) to block-triangular form. The columns
of Q_B are partitioned so that Q_B = ( Z_B  Y_B ), with Z_B an n_B × (n_B − m)
matrix; then A_B Z_B = 0 and the columns of Z_B span the null space of A_B.
Analogous to (2.11) we obtain the permuted block-triangular system:
\[
\begin{pmatrix}
Z_B^T H_B Z_B & Z_B^T H_B Y_B & 0 & Z_B^T H_D \\
Y_B^T H_B Z_B & Y_B^T H_B Y_B & -B^T & Y_B^T H_D \\
0 & B & 0 & A_N \\
0 & 0 & 0 & I_N
\end{pmatrix}
\begin{pmatrix} y_Z \\ y_Y \\ w \\ y_N \end{pmatrix}
= \begin{pmatrix} g_Z \\ g_Y \\ f_1 \\ f_2 \end{pmatrix},
\tag{A.35}
\]
with g_Z = Z_B^T g_B and g_Y = Y_B^T g_B. We formulate the result of the block
substitution in a form that uses matrix-vector products involving the full
matrix H rather than the submatrix HB . This is done to emphasize the
practical utility of accessing the QP Hessian as an operator that defines
the product Hx for a given x. This reformulation requires the definition of
the explicit column permutation P that identifies the basic and nonbasic
columns AB and AN of A, i.e.,
\[
A P = \begin{pmatrix} A_B & A_N \end{pmatrix}.
\tag{A.36}
\]
Given the permutation P, we define matrices Q, W, Y and Z that act
on vectors of length n, i.e., Q = ( Z  Y  W ), where
\[
Z = P \begin{pmatrix} Z_B \\ 0 \end{pmatrix}, \qquad
Y = P \begin{pmatrix} Y_B \\ 0 \end{pmatrix}, \qquad\text{and}\qquad
W = P \begin{pmatrix} 0 \\ I_N \end{pmatrix}.
\]
The block substitution for the solution of (A.33) may then be written as
\[
\begin{aligned}
y_W &= f_2, & y_0 &= W y_W, \\
B y_Y &= f_1 - A_N f_2, & y_1 &= Y y_Y + y_0, \\
Z^T H Z\, y_Z &= Z^T (g - H y_1), & y &= Z y_Z + y_1, \\
B^T w &= -Y^T (g - H y).
\end{aligned}
\tag{A.37}
\]
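A dense toy realization of this substitution (all data made up; P is taken to be the identity so the basic variables are the leading components, and the null-space basis Z_B comes from an SVD rather than from a sparse factorization):

```python
import numpy as np

# Made-up QP data; P is the identity, so the basic variables are the
# first nB components and the single nonbasic variable is the last one.
rng = np.random.default_rng(0)
n, m, nB = 4, 2, 3
G = rng.standard_normal((n, n))
H = G @ G.T + n * np.eye(n)             # SPD Hessian
A = rng.standard_normal((m, n))
g = rng.standard_normal(n)
f1 = rng.standard_normal(m)
f2 = rng.standard_normal(1)             # prescribed nonbasic value(s)

AB, AN = A[:, :nB], A[:, nB:]

# Q_B = (Z_B  Y_B) with A_B Z_B = 0: take Z_B from the SVD null space
# and Y_B as the leading right singular vectors, so B = A_B Y_B is
# nonsingular whenever A_B has full row rank.
_, _, Vt = np.linalg.svd(AB)
ZB, YB = Vt[m:].T, Vt[:m].T
B = AB @ YB

# Extend to n-vectors: Z, Y act on the basics, W on the nonbasics.
Z = np.vstack([ZB, np.zeros((n - nB, nB - m))])
Y = np.vstack([YB, np.zeros((n - nB, m))])
W = np.vstack([np.zeros((nB, n - nB)), np.eye(n - nB)])

# The four solves of (A.37).
yW = f2
y0 = W @ yW
yY = np.linalg.solve(B, f1 - AN @ f2)
y1 = Y @ yY + y0
yZ = np.linalg.solve(Z.T @ H @ Z, Z.T @ (g - H @ y1))
y = Z @ yZ + y1
w = np.linalg.solve(B.T, -Y.T @ (g - H @ y))

residual = H @ y - A.T @ w - g          # basic rows should vanish
```

The basic rows of H y − Aᵀw − g vanish, while the nonbasic rows carry the reduced costs.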
There are many practical choices for the matrix QB . For small-to-
medium scale problems with dense A and H, the matrix QB can be cal-
culated as the orthogonal factor associated with the QR factorization of
a row and column-permuted ATB (see, e.g., Gill et al. [87]). The method
of variable reduction is appropriate when A is sparse. In this case the
permutation P of (A.36) is specialized further to give
\[
A P = \begin{pmatrix} A_B & A_N \end{pmatrix}, \quad\text{with}\quad
A_B = \begin{pmatrix} B & S \end{pmatrix},
\]
where K0 is the KKT matrix at the initial point. For simplicity, we assume
that the second block of the variables is scaled by −1 so that the (1, 2)
block of K0 is ATB , not −ATB . The Schur complement method is based
on the assumption that factorizations for K0 and the Schur complement
\[
K_0 t = b, \qquad C w = f - V^T t, \qquad K_0 z = b - V w.
\]
The work required is dominated by two solves with the fixed matrix K0
and one solve with the Schur complement C. If the number of changes to
the basic set is small enough, dense factors of C may be maintained.
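A dense toy version of this three-solve scheme (made-up data; in a real solver K0 is held in factored form and only the small Schur complement C is kept dense):

```python
import numpy as np

# Made-up bordered KKT data: K0 stands in for the initial KKT matrix,
# and the border (V, D) records subsequent basis changes.
rng = np.random.default_rng(1)
nK, k = 6, 2
G = rng.standard_normal((nK, nK))
K0 = G @ G.T + np.eye(nK)               # SPD stand-in for the KKT matrix
V = rng.standard_normal((nK, k))
D = np.eye(k)
b = rng.standard_normal(nK)
f = rng.standard_normal(k)

# Schur complement C = D - V^T K0^{-1} V (small and dense).
C = D - V.T @ np.linalg.solve(K0, V)

t = np.linalg.solve(K0, b)              # first solve with K0
w = np.linalg.solve(C, f - V.T @ t)     # solve with the Schur complement
z = np.linalg.solve(K0, b - V @ w)      # second solve with K0

# Reference: solve the full bordered system directly.
Kfull = np.block([[K0, V], [V.T, D]])
ref = np.linalg.solve(Kfull, np.concatenate([b, f]))
```

The recovered (z, w) matches the direct solve of the bordered system, using only solves with the fixed K0 and the small C.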
We illustrate the definition of (A.38) immediately after the matrix K0
is factorized. (For more details, see, e.g., Bisschop and Meeraus [8], Gill
et al. [91].) Suppose that variable s enters the basic set. The next KKT
matrix can be written as
\[
\begin{pmatrix}
H_B & A_B^T & (h_s)_B \\
A_B & \mu I & a_s \\
(h_s)_B^T & a_s^T & h_{ss}
\end{pmatrix},
\]
where a_s and h_s are the sth columns of A and H. This is a matrix of the
form (A.38) with D = (h_{ss}) and V^T = \bigl( (h_s)_B^T \;\; a_s^T \bigr).
Now consider the case where the rth basic variable is deleted from the
basic set, so that the rth column is removed from AB . The correspond-
ing changes can be enforced in the solution of the KKT system using the
bordered matrix:
\[
\begin{pmatrix}
H_B & A_B^T & (h_s)_B & e_r \\
A_B & \mu I & a_s & 0 \\
(h_s)_B^T & a_s^T & h_{ss} & 0 \\
e_r^T & 0 & 0 & 0
\end{pmatrix}.
\]
Bordering with the unit row and column has the effect of zeroing out the
components of the solution corresponding to the deleted basic variable.
The Schur complement method can be extended to a block LU method
by storing the bordered matrix in block-factored form
\[
\begin{pmatrix} K_0 & V \\ V^T & D \end{pmatrix}
= \begin{pmatrix} L & 0 \\ Z^T & I \end{pmatrix}
\begin{pmatrix} U & Y \\ 0 & C \end{pmatrix},
\tag{A.39}
\]
where K_0 = LU, LY = V, U^T Z = V, and C = D − Z^T Y. The solution of
the bordered system is then given by
\[
L t = b, \qquad C w = f - Z^T t, \qquad U z = t - Y w.
\]
This method requires one solve each with L and U, one multiply each with
Y and Z^T, and one solve with the Schur complement C.
\[
B p_Y = -\bar c(x_I), \qquad p_F = Y p_Y, \qquad x_0 = x_I + p_F,
\]
where e is the vector of ones. This approach has been used by Gould [103],
and Huynh [123]. An alternative is to define a phase-one subproblem that
minimizes the two-norm of the constraint violations, i.e.,
\[
\underset{x,\,v}{\text{minimize}} \;\; \tfrac{1}{2}\|v\|_2^2
\quad\text{subject to}\quad A x + v = A x_I - b, \;\; x \ge 0.
\tag{A.41}
\]
This problem is a convex QP. Given an initial point x0 and nonbasic set
N0 for the phase-two problem, the basic variables for phase one consist of
\[
\underset{x \in \mathbb{R}^n}{\text{minimize}} \;\;
\tfrac{1}{2}\|Ax - b\|_2^2 + \tfrac{1}{2}\sigma \|x - x_I\|_2^2
\quad\text{subject to}\quad x \ge 0.
\tag{A.42}
\]
If a feasible point exists, then the problem is feasible with the objective
bounded below by zero. The only constraints of this problem are bounds
on x. Applying the nonbinding direction method of Section A.1.1 gives
\[
\begin{pmatrix} \sigma I_B & A_B^T \\ A_B & -I \end{pmatrix}
\begin{pmatrix} p_B \\ -q_\pi \end{pmatrix}
= \begin{pmatrix} 0 \\ -a_{\nu_s} \end{pmatrix},
\]
REFERENCES
[1] P.R. Amestoy, I.S. Duff, J.-Y. L’Excellent, and J. Koster, A fully asyn-
chronous multifrontal solver using distributed dynamic scheduling, SIAM J.
Matrix Anal. Appl., 23 (2001), pp. 15–41 (electronic).
[2] M. Anitescu, On the rate of convergence of sequential quadratic programming
with nondifferentiable exact penalty function in the presence of constraint
degeneracy, Math. Program., 92 (2002), pp. 359–386.
[3] , A superlinearly convergent sequential quadratically constrained quadratic
programming algorithm for degenerate nonlinear programming, SIAM J. Op-
tim., 12 (2002), pp. 949–978.
[4] C. Ashcraft and R. Grimes, SPOOLES: an object-oriented sparse matrix li-
brary, in Proceedings of the Ninth SIAM Conference on Parallel Processing
for Scientific Computing 1999 (San Antonio, TX), Philadelphia, PA, 1999,
SIAM, p. 10.
[5] R.A. Bartlett and L.T. Biegler, QPSchur: a dual, active-set, Schur-
complement method for large-scale and structured convex quadratic program-
ming, Optim. Eng., 7 (2006), pp. 5–32.
[6] D.P. Bertsekas, Constrained optimization and Lagrange multiplier methods,
Athena Scientific, Belmont, Massachusetts, 1996.
[7] M.C. Biggs, Constrained minimization using recursive equality quadratic pro-
gramming, in Numerical Methods for Nonlinear Optimization, F.A. Lootsma,
ed., Academic Press, London and New York, 1972, pp. 411–428.
[8] J. Bisschop and A. Meeraus, Matrix augmentation and partitioning in the
updating of the basis inverse, Math. Program., 13 (1977), pp. 241–254.
[9] P.T. Boggs and J.W. Tolle, Sequential quadratic programming, in Acta Nu-
merica, 1995, Vol. 4 of Acta Numer., Cambridge Univ. Press, Cambridge,
1995, pp. 1–51.
[33] C.M. Chin, A.H.A. Rashid, and K.M. Nor, A combined filter line search and
trust region method for nonlinear programming, WSEAS Trans. Math., 5
(2006), pp. 656–662.
[34] J.W. Chinneck, Analyzing infeasible nonlinear programs, Comput. Optim.
Appl., 4 (1995), pp. 167–179.
[35] , Feasibility and infeasibility in optimization: algorithms and computa-
tional methods, International Series in Operations Research & Management
Science, 118, Springer, New York, 2008.
[36] T.F. Coleman and A.R. Conn, On the local convergence of a quasi-Newton
method for the nonlinear programming problem, SIAM J. Numer. Anal., 21
(1984), pp. 755–769.
[37] T.F. Coleman and A. Pothen, The null space problem I. Complexity, SIAM J.
on Algebraic and Discrete Methods, 7 (1986), pp. 527–537.
[38] T.F. Coleman and D.C. Sorensen, A note on the computation of an orthogonal
basis for the null space of a matrix, Math. Program., 29 (1984), pp. 234–242.
[39] A.R. Conn, N.I.M. Gould, and Ph. L. Toint, Global convergence of a class of
trust region algorithms for optimization with simple bounds, SIAM J. Numer.
Anal., 25 (1988), pp. 433–460.
[40] , A globally convergent augmented Lagrangian algorithm for optimization
with general constraints and simple bounds, SIAM J. Numer. Anal., 28
(1991), pp. 545–572.
[41] , LANCELOT: a Fortran package for large-scale nonlinear optimization (Release A), Springer Series in Computational Mathematics 17, Springer Verlag,
Berlin, Heidelberg, New York, London, Paris and Tokyo, 1992.
[42] , Trust-Region Methods, Society for Industrial and Applied Mathematics
(SIAM), Philadelphia, PA, 2000.
[43] L.B. Contesse, Une caractérisation complète des minima locaux en programma-
tion quadratique, Numer. Math., 34 (1980), pp. 315–332.
[44] R.W. Cottle, G.J. Habetler, and C.E. Lemke, On classes of copositive ma-
trices, Linear Algebra Appl., 3 (1970), pp. 295–310.
[45] Y.-H. Dai and K. Schittkowski, A sequential quadratic programming algorithm
with non-monotone line search, Pac. J. Optim., 4 (2008), pp. 335–351.
[46] F. Delbos and J.C. Gilbert, Global linear convergence of an augmented La-
grangian algorithm to solve convex quadratic optimization problems, J. Con-
vex Anal., 12 (2005), pp. 45–69.
[47] R.S. Dembo and U. Tulowitzki, Sequential truncated quadratic programming
methods, in Numerical optimization, 1984 (Boulder, Colo., 1984), SIAM,
Philadelphia, PA, 1985, pp. 83–101.
[48] J.E. Dennis, Jr. and R.B. Schnabel, A new derivation of symmetric positive
definite secant updates, in Nonlinear Programming, 4 (Proc. Sympos., Special
Interest Group on Math. Programming, Univ. Wisconsin, Madison, Wis.,
1980), Academic Press, New York, 1981, pp. 167–199.
[49] G. DiPillo and L. Grippo, A new class of augmented Lagrangians in nonlinear
programming, SIAM J. Control Optim., 17 (1979), pp. 618–628.
[50] W.S. Dorn, Duality in quadratic programming, Quart. Appl. Math., 18
(1960/1961), pp. 155–162.
[51] Z. Dostál, A. Friedlander, and S. A. Santos, Adaptive precision control in
quadratic programming with simple bounds and/or equalities, in High per-
formance algorithms and software in nonlinear optimization (Ischia, 1997),
Vol. 24 of Appl. Optim., Kluwer Acad. Publ., Dordrecht, 1998, pp. 161–173.
[52] , Augmented Lagrangians with adaptive precision control for quadratic
programming with equality constraints, Comput. Optim. Appl., 14 (1999),
pp. 37–53.
[53] , Augmented Lagrangians with adaptive precision control for quadratic pro-
gramming with simple bounds and equality constraints, SIAM J. Optim., 13
(2003), pp. 1120–1140 (electronic).
[54] I.S. Duff, MA57—a code for the solution of sparse symmetric definite and in-
definite systems, ACM Trans. Math. Software, 30 (2004), pp. 118–144.
[55] I.S. Duff and J.K. Reid, MA27: a set of Fortran subroutines for solving sparse
symmetric sets of linear equations, Tech. Rep. R-10533, Computer Science
and Systems Division, AERE Harwell, Oxford, England, 1982.
[56] S.K. Eldersveld and M.A. Saunders, A block-LU update for large-scale linear
programming, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 191–201.
[57] O. Exler and K. Schittkowski, A trust region SQP algorithm for mixed-integer
nonlinear programming, Optim. Lett., 1 (2007), pp. 269–280.
[58] A. Fischer, Modified Wilson’s method for nonlinear programs with nonunique
multipliers, Math. Oper. Res., 24 (1999), pp. 699–727.
[59] R. Fletcher, A new approach to variable metric algorithms, Computer Journal,
13 (1970), pp. 317–322.
[60] , A general quadratic programming algorithm, J. Inst. Math. Applics., 7
(1971), pp. 76–91.
[61] , A model algorithm for composite nondifferentiable optimization problems,
Math. Programming Stud. (1982), pp. 67–76. Nondifferential and variational
techniques in optimization (Lexington, Ky., 1980).
[62] , Second order corrections for nondifferentiable optimization, in Numeri-
cal analysis (Dundee, 1981), Vol. 912 of Lecture Notes in Math., Springer,
Berlin, 1982, pp. 85–114.
[63] , An ℓ1 penalty method for nonlinear constraints, in Numerical Optimization 1984, P.T. Boggs, R.H. Byrd, and R.B. Schnabel, eds., Philadelphia,
1985, pp. 26–40.
[64] , Practical methods of optimization, Wiley-Interscience [John Wiley &
Sons], New York, 2001.
[65] R. Fletcher, N.I.M. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter,
Global convergence of a trust-region SQP-filter algorithm for general non-
linear programming, SIAM J. Optim., 13 (2002), pp. 635–659 (electronic)
(2003).
[66] R. Fletcher and S. Leyffer, User manual for filterSQP, Tech. Rep. NA/181,
Dept. of Mathematics, University of Dundee, Scotland, 1998.
[67] , Nonlinear programming without a penalty function, Math. Program., 91
(2002), pp. 239–269.
[68] R. Fletcher, S. Leyffer, and Ph. L. Toint, On the global convergence of a
filter-SQP algorithm, SIAM J. Optim., 13 (2002), pp. 44–59 (electronic).
[69] R. Fletcher and E. Sainz de la Maza, Nonlinear programming and nonsmooth
optimization by successive linear programming, Math. Program., 43 (1989),
pp. 235–256.
[70] A. Forsgren, Inertia-controlling factorizations for optimization algorithms,
Appl. Num. Math., 43 (2002), pp. 91–107.
[71] A. Forsgren and P.E. Gill, Primal-dual interior methods for nonconvex non-
linear programming, SIAM J. Optim., 8 (1998), pp. 1132–1152.
[72] A. Forsgren, P.E. Gill, and W. Murray, On the identification of local min-
imizers in inertia-controlling methods for quadratic programming, SIAM J.
Matrix Anal. Appl., 12 (1991), pp. 730–746.
[73] M.P. Friedlander, A Globally Convergent Linearly Constrained Lagrangian
Method for Nonlinear Optimization, PhD thesis, Department of Operations
Research, Stanford University, Stanford, CA, 2002.
[74] M.P. Friedlander and S. Leyffer, Global and finite termination of a two-
phase augmented Lagrangian filter method for general quadratic programs,
SIAM J. Sci. Comput., 30 (2008), pp. 1706–1729.
[75] M.P. Friedlander and M.A. Saunders, A globally convergent linearly con-
strained Lagrangian method for nonlinear optimization, SIAM J. Optim., 15
(2005), pp. 863–897.
[120] S.P. Han and O.L. Mangasarian, Exact penalty functions in nonlinear pro-
gramming, Math. Programming, 17 (1979), pp. 251–269.
[121] J. Herskovits, A two-stage feasible directions algorithm for nonlinear con-
strained optimization, Math. Programming, 36 (1986), pp. 19–38.
[122] M.R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl., 4
(1969), pp. 303–320.
[123] H.M. Huynh, A Large-Scale Quadratic Programming Solver Based on Block-LU
Updates of the KKT System, PhD thesis, Program in Scientific Computing
and Computational Mathematics, Stanford University, Stanford, CA, 2008.
[124] M.M. Kostreva and X. Chen, A superlinearly convergent method of feasible
directions, Appl. Math. Comput., 116 (2000), pp. 231–244.
[125] , Asymptotic rates of convergence of SQP-type methods of feasible direc-
tions, in Optimization methods and applications, Vol. 52 of Appl. Optim.,
Kluwer Acad. Publ., Dordrecht, 2001, pp. 247–265.
[126] J. Kroyan, Trust-Search Algorithms for Unconstrained Optimization, PhD the-
sis, Department of Mathematics, University of California, San Diego, Febru-
ary 2004.
[127] C.T. Lawrence and A.L. Tits, A computationally efficient feasible sequential
quadratic programming algorithm, SIAM J. Optim., 11 (2001), pp. 1092–1118
(electronic).
[128] S. Leyffer, Integrating SQP and branch-and-bound for mixed integer nonlinear
programming, Comput. Optim. Appl., 18 (2001), pp. 295–309.
[129] D.C. Liu and J. Nocedal, On the limited memory BFGS method for large scale
optimization, Math. Program., 45 (1989), pp. 503–528.
[130] X.-W. Liu and Y.-X. Yuan, A robust algorithm for optimization with gen-
eral equality and inequality constraints, SIAM J. Sci. Comput., 22 (2000),
pp. 517–534 (electronic).
[131] C.M. Maes, A Regularized Active-Set Method for Sparse Convex Quadratic Pro-
gramming, PhD thesis, Institute for Computational and Mathematical Engi-
neering, Stanford University, Stanford, CA, August 2010.
[132] A. Majthay, Optimality conditions for quadratic programming, Math. Program-
ming, 1 (1971), pp. 359–365.
[133] O.L. Mangasarian and S. Fromovitz, The Fritz John necessary optimality
conditions in the presence of equality and inequality constraints, J. Math.
Anal. Appl., 17 (1967), pp. 37–47.
[134] N. Maratos, Exact Penalty Function Algorithms for Finite-Dimensional and
Control Optimization Problems, PhD thesis, Department of Computing and
Control, University of London, 1978.
[135] J. Mo, K. Zhang, and Z. Wei, A variant of SQP method for inequality con-
strained optimization and its global convergence, J. Comput. Appl. Math.,
197 (2006), pp. 270–281.
[136] J.L. Morales, A numerical study of limited memory BFGS methods, Appl.
Math. Lett., 15 (2002), pp. 481–487.
[137] J.L. Morales, J. Nocedal, and Y. Wu, A sequential quadratic programming
algorithm with an additional equality constrained phase, Tech. Rep. OTC-05,
Northwestern University, 2008.
[138] J.J. Moré and D.C. Sorensen, On the use of directions of negative curvature
in a modified Newton method, Math. Program., 16 (1979), pp. 1–20.
[139] , Newton’s method, in Studies in Mathematics, Volume 24. MAA Studies
in Numerical Analysis, G.H. Golub, ed., Math. Assoc. America, Washington,
DC, 1984, pp. 29–82.
[140] J.J. Moré and D. J. Thuente, Line search algorithms with guaranteed sufficient
decrease, ACM Trans. Math. Software, 20 (1994), pp. 286–307.
[141] W. Murray, An algorithm for constrained minimization, in Optimization (Sym-
pos., Univ. Keele, Keele, 1968), Academic Press, London, 1969, pp. 247–258.
USING INTERIOR-POINT METHODS WITHIN
AN OUTER APPROXIMATION FRAMEWORK FOR
MIXED INTEGER NONLINEAR PROGRAMMING
HANDE Y. BENSON∗
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 225
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_7,
© Springer Science+Business Media, LLC 2012
[9], [8], and [29] demonstrate that interior-point solvers such as ipopt [37],
loqo [34], and knitro [30] are highly efficient and are the only solvers
capable of handling very large scale NLPs. Therefore, it is important to
resolve difficulties associated with warmstarting and infeasibility detection
to implement an efficient and robust MINLP solver using interior-point
methods.
In [5], we analyzed the use of an interior-point method within a branch-
and-bound framework. We showed that the changing bounds would guar-
antee that the algorithm would stall when warmstarting, and that even with
a coldstart, fixed variables and infeasible problems would cause the algo-
rithm to fail. As a remedy, we proposed a primal-dual penalty approach,
which was able to greatly improve efficiency, handle fixed variables, and
correctly identify all infeasible subproblems in numerical testing.
In this paper, we turn our attention to interior-point methods within
the Outer Approximation framework. Similar challenges arise in this frame-
work, as well. One key difference is that we will limit ourselves to MINLPs
with convex continuous relaxations, that is, cases where f is convex and h
are concave for (1.1). This is required for the underlying theory of the Outer
Approximation framework, and, while it is a limitation, it will also give us
the chance to explore certain special classes of convex problems, such as
second-order cone programming problems (SOCPs) and semidefinite pro-
gramming problems (SDPs), that arise in the continuous relaxations.
The outline of the paper is as follows: We start with a brief description
of the Outer Approximation framework in Section 2. In Section 3, we intro-
duce an infeasible interior-point method and analyze its challenges within
a MINLP algorithm. To address these challenges, we propose the exact
primal-dual penalty method. In Section 4, we turn our attention to the
performance of our algorithm on certain special classes of problems, such
as SOCPs and SDPs. We present implementation details of our approach
and favorable numerical results on problems from literature in Section 5.
2. Outer approximation. The Outer Approximation (OA) algo-
rithm solves an alternating sequence of NLPs and mixed-integer linear
programming problems (MILPs) to solve (1.1). For each y k ∈ Y ∩ Z p ,
the NLP to be solved is obtained from (1.1) by fixing y = y k :
\[
\begin{aligned}
\min_{x} \;\; & f(x, y^k) \\
\text{s.t.} \;\; & h(x, y^k) \ge 0 \\
& A_x x \le b_x.
\end{aligned}
\tag{2.1}
\]
(2.1) may or may not have a feasible solution. As such, we let xk denote
the solution if one exists and the minimizer of infeasibility otherwise. We
define F(Ŷ) as the set of all pairs of (xk , y k ) where xk is an optimal solution
of (2.1) and I(Ŷ) as the set of all pairs of (xk , y k ) where (2.1) is infeasible
for y k ∈ Ŷ. We also define the following MILP:
\[
\begin{aligned}
\min_{x,y,z} \;\; & z \\
\text{s.t.} \;\; & f(x^k, y^k) + \nabla f(x^k, y^k)^T
  \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \le z,
  \quad \forall (x^k, y^k) \in \mathcal{F}(\hat{\mathcal{Y}}) \\
& h(x^k, y^k) + \nabla h(x^k, y^k)^T
  \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \ge 0,
  \quad \forall (x^k, y^k) \in \mathcal{F}(\hat{\mathcal{Y}}) \\
& h(x^k, y^k) + \nabla h(x^k, y^k)^T
  \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \ge 0,
  \quad \forall (x^k, y^k) \in \mathcal{I}(\hat{\mathcal{Y}}) \\
& A_x x \le b_x \\
& A_y y \le b_y \\
& y \in \mathbb{Z}^p,
\end{aligned}
\tag{2.2}
\]
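The correctness of the master problem rests on convexity: every linearization in (2.2) underestimates f, so the MILP is a relaxation of the MINLP. The sketch below checks this with a made-up convex f and arbitrary cut points (a real implementation would take the (x^k, y^k) from the NLP solves and pass the cuts to a MILP solver):

```python
import numpy as np

# A made-up convex objective f(x, y) with its gradient; the cut points
# (x^k, y^k) stand in for solutions of the NLP subproblems (2.1).
def f(x, y):
    return (x - y) ** 2 + 0.5 * x ** 2 + y

def grad_f(x, y):
    return np.array([2.0 * (x - y) + x, -2.0 * (x - y) + 1.0])

points = [(0.0, 0), (0.8, 1), (1.6, 2)]

def master_objective(x, y):
    """Tightest OA cut at (x, y): a lower bound on z in (2.2)."""
    return max(f(xk, yk) + grad_f(xk, yk) @ np.array([x - xk, y - yk])
               for xk, yk in points)

# By convexity the cuts underestimate f everywhere and are tight at
# their own cut points.
grid = [(x, y) for x in np.linspace(-2.0, 3.0, 21) for y in (0, 1, 2)]
gap = [f(x, y) - master_objective(x, y) for x, y in grid]
```

The nonnegative gap confirms that the master problem never cuts off the true optimum, which is what makes the alternating NLP/MILP scheme valid.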
\[
\begin{aligned}
\min_{x} \;\; & f(x, y^k) \\
\text{s.t.} \;\; & g(x, y^k) \ge 0,
\end{aligned}
\tag{3.1}
\]
where
\[
g(x, y^k) = \begin{pmatrix} h(x, y^k) \\ b_x - A_x x \end{pmatrix}.
\]
\[
\begin{aligned}
\min_{x,w} \;\; & f(x, y^k) \\
\text{s.t.} \;\; & g(x, y^k) - w = 0 \\
& w \ge 0.
\end{aligned}
\tag{3.2}
\]
\[
\begin{aligned}
\nabla f(x, y^k) - A(x, y^k)^T \lambda &= 0, \\
-\mu e + W \Lambda e &= 0, \\
g(x, y^k) - w &= 0,
\end{aligned}
\tag{3.4}
\]
the directions given by Newton’s method are found by solving the KKT
system:
\[
\begin{pmatrix}
-W^{-1}\Lambda & 0 & -I \\
0 & -H & A^T \\
-I & A & 0
\end{pmatrix}
\begin{pmatrix} \Delta w \\ \Delta x \\ \Delta \lambda \end{pmatrix}
= \begin{pmatrix} -\gamma \\ \sigma \\ \rho \end{pmatrix}.
\tag{3.6}
\]
Note that we have omitted the use of function arguments for ease of display.
Letting
\[
E = W \Lambda^{-1}
\]
and eliminating Δw, we obtain the reduced KKT system
\[
\begin{pmatrix} -H & A^T \\ A & E \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta \lambda \end{pmatrix}
= \begin{pmatrix} \sigma \\ \rho + E\gamma \end{pmatrix}.
\tag{3.7}
\]
The reduced KKT system is solved by using the LDL^T form of Cholesky
factorization, including exploitation of sparsity by reordering the columns
in a symbolic Cholesky routine. As stated before, for each y^k, the sparsity
structure of the matrix in (3.7) may change. Such changes are quite com-
mon, especially when y are binary. Fixing yjk to 0 may cause terms in the
objective or the constraint functions to drop. A careful implementation of
the underlying algorithm can take advantage of such changes if they bring
about substantial reduction in size or complexity for certain subproblems,
or use a general enough sparsity structure so that each subsequent nonlin-
ear subproblem can be solved without additional sparsity structure setups
or calls to the symbolic Cholesky routine.
Once the step directions Δx and Δλ are obtained from (3.7), we can
obtain the step directions for the slack variables from the following formula:
\[
\Delta w = E(\gamma - \Delta \lambda).
\]
The iterates are then updated as
\[
(x^{(l+1)}, w^{(l+1)}, \lambda^{(l+1)})
= (x^{(l)}, w^{(l)}, \lambda^{(l)}) + \alpha^{(l)} (\Delta x, \Delta w, \Delta \lambda),
\]
where the superscripts denote the iteration number, and α^{(l)} is chosen to ensure
that the slacks w(l+1) and the dual variables λ(l+1) remain strictly positive
and sufficient progress toward optimality and feasibility is attained. At
each iteration, the value of the barrier parameter may also be updated as
a function of W (l+1) λ(l+1) . Both the notion of sufficient progress and the
exact formula for the barrier parameter update vary from one solver to
another, but the general principle remains the same.
The algorithm concludes that it has reached an optimal solution when
the primal infeasibility, the dual infeasibility, and the average complemen-
tarity are all less than a given tolerance level. For (3.1), we have that
\[
\begin{aligned}
\min_{x,w,\xi} \;\; & f(x, y^k) + c^T \xi \\
\text{s.t.} \;\; & g(x, y^k) - w = 0 \\
& -\xi \le w \le u \\
& \xi \ge 0,
\end{aligned}
\tag{4.1}
\]
\[
\begin{aligned}
\min_{x,w,\xi} \;\; & f(x, y^k) + c^T \xi
- \mu \sum_{i=1}^{m} \log(\xi_i)
- \mu \sum_{i=1}^{m} \log(w_i + \xi_i)
- \mu \sum_{i=1}^{m} \log(u_i - w_i) \\
\text{s.t.} \;\; & g(x, y^k) - w = 0,
\end{aligned}
\tag{4.3}
\]
where μ > 0 is the barrier parameter. Letting λ once again denote the
dual variables associated with the remaining constraints, the first-order
conditions for the Lagrangian of (4.3) can be written as
\[
\begin{aligned}
g(x, y^k) - w &= 0 \\
\nabla f(x, y^k) - A(x, y^k)^T \lambda &= 0 \\
\lambda - \mu (W + \Xi)^{-1} e + \mu (U - W)^{-1} e &= 0 \\
c - \mu \Xi^{-1} e - \mu (W + \Xi)^{-1} e &= 0,
\end{aligned}
\]
where Ξ and U are the diagonal matrices with the entries of ξ and u,
respectively. Making the substitution
\[
\psi = \mu (U - W)^{-1} e,
\]
we can rewrite the first-order conditions as
\[
\begin{aligned}
g(x, y^k) - w &= 0 \\
\nabla f(x, y^k) - A(x, y^k)^T \lambda &= 0 \\
(W + \Xi)(\Lambda + \Psi) e &= \mu e \\
\Xi (C - \Lambda - \Psi) e &= \mu e \\
\Psi (U - W) e &= \mu e,
\end{aligned}
\tag{4.4}
\]
where Ψ and C are the diagonal matrices with the entries of ψ and c,
respectively. Note that the new variables ψ serve to relax the nonnegativity
requirements on the dual variables λ, so we refer to them as the dual
relaxation variables.
Applying Newton’s Method to (4.4), and eliminating the step direc-
tions for w, ξ, and ψ, the reduced KKT system arising in the solution of
the penalty problem (4.1) has the same form as (3.7) with
\[
\begin{aligned}
E &= \Bigl[ \bigl( (\Lambda + \Psi)^{-1} (W + \Xi)
  + \Xi (C - \Lambda - \Psi)^{-1} \bigr)^{-1}
  + \Psi (U - W)^{-1} \Bigr]^{-1}, \\
\gamma &= \bigl( (\Lambda + \Psi)^{-1} (W + \Xi)
  + \Xi (C - \Lambda - \Psi)^{-1} \bigr)^{-1}
  \bigl( \mu (\Lambda + \Psi)^{-1} e - \mu (C - \Lambda - \Psi)^{-1} e - w \bigr)
  - \bigl( \mu (U - W)^{-1} e - \psi \bigr).
\end{aligned}
\tag{4.5}
\]
Since the initial values of the penalty parameters, u and c, may not
be large enough to admit the optimal solution, we also need an updating
scheme for these parameters. Given the relaxation, an optimal solution
can always be found for (4.1), and one possible “static” updating scheme
is to solve a problem to optimality and to increase the penalty parame-
ters if their corresponding relaxation variables are not sufficiently close to
zero. However, this may require multiple solves of a problem and sub-
stantially increase the number of iterations necessary to find the optimal
solution. Instead, we can use a “dynamic” updating scheme, where the
penalty parameters are checked at the end of each iteration and updated.
For i = 1, \ldots, m + m_x, if w_i^{(k+1)} > 0.9\, u_i^{(k)}, then u_i^{(k+1)} = 10\, u_i^{(k)}. Similarly,
if \lambda_i^{(k+1)} + \psi_i^{(k+1)} > 0.9\, c_i^{(k)}, then c_i^{(k+1)} = 10\, c_i^{(k)}.
4.2. Infeasibility detection. In the preceding discussion, we estab-
lished what can go wrong when warmstarting an interior-point method and
proposed the exact primal-dual penalty approach as a remedy. Another
concern for improving the inner level algorithm within our framework was
the efficient identification of infeasible NLP subproblems. The primal-dual
penalty method described as a remedy for warmstarting can also aid in
infeasibility identification. Since all of the slack variables are relaxed, the
penalty problem (4.1) always possesses a feasible solution. In addition, the
upper bounds on the slack variables guarantee that an optimal solution
to (4.1) always exists. Therefore, a provably convergent NLP algorithm is
guaranteed to find an optimal solution to (4.1). If this solution has the
property that ξi → a for at least one i = 1, . . . , m + mx for some scalar
a > 0 as ci → ∞, then the original problem is infeasible.
It is impractical to allow a penalty parameter to become infinite. How-
ever, a practical implementation can be easily devised by simply dropping
the original objective function and minimizing only the penalty term, which
is equivalent to letting all the penalty parameters become infinite. There-
fore, a feasibility restoration phase similar to the “elastic mode” of snopt
[23] can be used, in that the problem
\[
\begin{aligned}
\min_{x,w,\xi} \;\; & c^T \xi \\
\text{s.t.} \;\; & g(x, y^k) - w = 0 \\
& -\xi \le w \le u \\
& \xi \ge 0,
\end{aligned}
\tag{4.7}
\]
is solved in order to minimize infeasibility. It differs from snopt’s version
in that the slack variables are still bounded above by the dual penalty
parameters. Since these parameters get updated whenever necessary, we
can always find a feasible solution to (4.7). If the optimal objective function
value is nonzero (numerically, greater than the infeasibility tolerance), a
certificate of infeasibility can be issued.
While a feasibility problem can be defined for the original NLP sub-
problem (2.1) as well, a trigger condition for switching into the “elastic
mode” for solving it is not easy to define within the context of the interior-
point method of Section 3. However, the exact primal-dual penalty ap-
proach can simply track the number of dynamic updates made to the
penalty parameters and switch over to solving (4.7) after a finite num-
ber of such updates are performed. In our numerical testing, we have set
this trigger to occur after three updates to any single penalty parameter.
Note that other infeasibility detection schemes based on penalty meth-
ods are available (see [16]) which would not require the solution of a sep-
arate feasibility problem. As their warmstarting capabilities are yet un-
known, we will investigate such approaches in future work.
5. Special forms of convex NLPs. One class of problems that fits
well into the OA framework is conic programming, specifically second-order
cone programming and semidefinite programming. This class of problems
is especially important in a variety of engineering applications and as re-
laxations of some NP-hard combinatorial problems. Much of the research
has focused on problems that are otherwise linear, due in part to the abun-
dance of strong theoretical results and the ease of extending established
and implemented linear programming algorithms. However, as the models
in each of these areas become more realistic and more complicated, many
of the problems are expressed with nonlinearities in the objective func-
tion and/or the constraints. To handle such nonlinearities efficiently, one
approach is to fit the problem into the NLP framework through reformu-
lation or separation into a series of NLP subproblems. In addition, these
problems can also have some discrete variables, and fitting them into an
NLP framework allows for the use of the efficient mixed-integer nonlinear
programming techniques for their solution.
In standard form, a mixed-integer nonlinear cone programming prob-
lem is given by
min_{x,y}  f (x, y)
s.t.  h(x, y) ≥ 0,        (5.1)
      x ∈ K,
      y ∈ Y,
238 HANDE Y. BENSON
x+ξ ∈ K
u−x ∈ K (5.4)
ξ ∈ K.
For a second order cone, it is sufficient to pick ξ = (ξ0 , 0), and for a semidef-
inite cone, we only need mat(ξ) to be a diagonal matrix. As before, the
objective function is also converted to
f (x, y) + cT ξ.
Since both the second-order cone and the cone of positive semidefinite
matrices are self-dual, the dual problem also involves a cone constraint,
which is similarly relaxed and bounded.
For both second-order and semidefinite cones, the reformulation of
the cone constraints to fit into the NLP framework has been discussed
extensively in [10]. For second-order cones, an additional challenge is the
nondifferentiability of the Euclidean norm in (5.2). In fact, if the optimal
solution includes x∗1 = 0, it can cause numerical problems for convergence
of the NLP algorithm and theoretical complications for the formulation of
the subsequent MILP even if numerical convergence can be attained for
the NLP. There are several ways around this issue: if a preprocessor is
used and a nonzero lower bound for x0 is available, then the so-called ratio
reformulation (see [10]) can be used to rewrite the cone constraint of (5.2)
as
(x1^T x1)/x0 ≤ x0 ,   x0 ≥ 0.
Both of these formulations yield convex NLPs, but they are not general
enough. In our implementation, we have used the constraint as given in
(5.2), but a more thorough treatment using a subgradient approach is dis-
cussed in [17].
6. Numerical results. We implemented an OA framework and the
interior-point method using the primal-dual penalty approach in the solver
milano [4]. For comparison purposes, we also implemented the interior-
point method outlined at the beginning of Section 2. The MILPs that arise
The initial primal and dual solutions used when warmstarting are the
optimal primal and dual solutions of the previous NLP subproblem. For
coldstarts, we used any user-provided initial solutions, and where none were
available, the primal variable was initialized to 0 and all nonnegative slack
and dual variables were initialized to 1. Numerical experience in Table
1 indicates that using this solution can improve the performance of the
algorithm. However, a better primal initial solution can be the optimal
x values from the current MILP. In this case, we would need to use an
approximation to the Lagrange multipliers, for example by approximately
solving a QP model of the NLP subproblem. This will be part of our future
work.
Table 1
Comparison of the warmstarting primal-dual penalty approach to coldstarts on
small problems from the MINLPLib test suite and two mixed-integer second-order cone
programming problems from [17] (indicated by “*” in the table). “#” indicates the NLP
subproblem being solved, WarmIters and ColdIters are the numbers of warmstart and
coldstart iterations, respectively, and %Impr is the percentage of improvement in the
number of iterations. (INF) indicates that a certificate of infeasibility was issued, and
(IL) denotes that the algorithm reached its iteration limit.
REFERENCES
[29] J. Nocedal, J. Morales, R. Waltz, G. Liu, and J. Goux, Assessing the po-
tential of interior-point methods for nonlinear optimization, in Large-Scale
PDE-Constrained Optimization, Lecture Notes in Computational Science and
Engineering, Vol. 30, 2003, pp. 167–183.
[30] J. Nocedal and R.A. Waltz, Knitro 2.0 user’s manual, Tech. Rep. OTC 02-2002,
Optimization Technology Center, Northwestern University, January 2002.
[31] I. Quesada and I. Grossmann, An LP/NLP based branch and bound algorithm for
convex MINLP optimization problems, Computers and Chemical Engineering,
16 (1992), pp. 937–947.
[32] N. Sahinidis, Baron: A general purpose global optimization software package,
Journal of Global Optimization, 8(2) (1996), pp. 201–205.
[33] A. Tits, A. Wächter, S. Bakhtiari, T. Urban, and C. Lawrence, A primal-
dual interior-point method for nonlinear programming with strong global and
local convergence properties, SIAM Journal on Optimization, 14 (2003),
pp. 173–199.
[34] R. Vanderbei, LOQO user’s manual—version 3.10, Optimization Methods and
Software, 12 (1999), pp. 485–514.
[35] R. Vanderbei and D. Shanno, An interior-point algorithm for nonconvex nonlin-
ear programming, Computational Optimization and Applications, 13 (1999),
pp. 231–252.
[36] J. Viswanathan and I. Grossmann, A combined penalty function and outer ap-
proximation method for MINLP optimization, Computers and Chemical En-
gineering, 14 (1990), pp. 769–782.
[37] A. Wächter and L.T. Biegler, On the implementation of an interior-point filter
line-search algorithm for large-scale nonlinear programming, Tech. Rep. RC
23149, IBM T.J. Watson Research Center, Yorktown, USA, March 2004.
[38] T. Westerlund and K. Lundqvist, Alpha-ecp version 5.01: An interactive
minlp-solver based on the extended cutting plane method, Tech. Rep. 01-178-A,
Process Design Laboratory, Åbo Akademi University, 2001.
[39] S. Zhang, A new self-dual embedding method for convex programming, Journal of
Global Optimization, 29 (2004), pp. 479–496.
PART IV:
Expression Graphs
USING EXPRESSION GRAPHS IN
OPTIMIZATION ALGORITHMS
DAVID M. GAY∗
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 247
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_8,
© Springer Science+Business Media, LLC 2012
Fig. 1. Expression graph for f (x, y) = (x − 3)2 + (y + 4)2 .
most such representations is that they are turned into expression graphs
behind the scenes: directed graphs where each node represents an oper-
ation, incoming edges represent operands to the operation, and outgoing
edges represent uses of the result of the operation. This is illustrated in
Figure 1, which shows an expression graph for computing the function
f : R2 → R defined by f (x, y) = (x − 3)2 + (y + 4)2 , which involves
operators for addition (+), subtraction (−) and squaring (()ˆ2).
It can be convenient to have an explicit expression graph and to com-
pute with it or manipulate it in various ways. For example, for smooth
optimization problems, we can turn expression graphs for objective and
constraint-body evaluations into reasonably efficient ways to compute both
these functions and their gradients. When solving mixed-integer nonlinear
programming (MINLP) problems, computing bounds and convex underes-
timates (or concave overestimates) can be useful and can be done with ex-
plicit expression graphs. Problem simplifications by “presolve” algorithms
and (similarly) domain reductions in constraint programming are readily
carried out on expression graphs.
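As a small concrete illustration (a sketch in Python, independent of any particular modeling system; the Node class and operator names here are invented for the example), an explicit expression graph for the function of Figure 1 can be built and evaluated by walking it:

```python
class Node:
    """A node of a directed expression graph: an operation plus edges to
    its operands (shared subnodes represent common subexpressions)."""
    def __init__(self, op, *operands):
        self.op, self.operands = op, operands

    def eval(self, env):
        if self.op == "var":
            return env[self.operands[0]]
        if self.op == "const":
            return self.operands[0]
        vals = [o.eval(env) for o in self.operands]
        if self.op == "+":
            return vals[0] + vals[1]
        if self.op == "-":
            return vals[0] - vals[1]
        if self.op == "sqr":
            return vals[0] ** 2
        raise ValueError("unknown operator: " + self.op)

# f(x, y) = (x - 3)^2 + (y + 4)^2, the graph of Figure 1
x, y = Node("var", "x"), Node("var", "y")
f = Node("+",
         Node("sqr", Node("-", x, Node("const", 3.0))),
         Node("sqr", Node("+", y, Node("const", 4.0))))
```

Evaluating `f.eval({"x": 5.0, "y": 1.0})` walks the graph bottom-up; the same walk, with different actions at each node, underlies the derivative and bound computations discussed below.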
This paper is concerned with computations related to solving a math-
ematical programming problem: given D ⊆ Rn , f : D → R, c : D → Rm ,
and ℓ, u ∈ D ∪ {−∞, ∞}n with ℓi ≤ ui ∀ i, find x∗ such that x = x∗ solves
Minimize f (x)
subject to ℓ ≤ c(x) ≤ u        (1.1)
and x ∈ D.
Of particular interest is the case

D = Rp × Zq ,        (1.2)

with n = p + q.
One of my reasons for interest in the AMPL [7, 8] modeling language
for mathematical programming is that AMPL makes explicit expression
graphs available to separate solvers. Mostly these graphs are only seen and
manipulated by the AMPL/solver interface library [13], but one could also
use them directly in the computations described below.
There are various ways to represent expression graphs. For exam-
ple, AMPL uses a Polish prefix notation (see, e.g., [31]) for the nonlinear
parts of problems conveyed to solvers via a “.nl” file. Kearfott [21] uses
a representation via 4-tuples (operation, result, left, and right operands).
Representations in XML have also been advocated ([9]). For all the specific
representations I have seen, converting from one form to another takes time
linear in the length (nodes + arcs) of the expression graph.
The rest of this paper is organized as follows. The next several sec-
tions discuss derivative computations (§2), bound computations (§3), pre-
solve and constraint propagation (§4), convexity detection (§5), and outer
approximations (§6). Concluding remarks appear in the final section (§7).
2. Derivative computations. When f and c in (1.1) are continu-
ously differentiable in their continuous variables (i.e., the first p variables
when (1.2) holds), use of their derivatives is important for some algorithms;
when integrality is relaxed, partials with respect to nominally integer vari-
ables may also be useful (as pointed out by a referee). Similarly, when
f and c are twice differentiable, some algorithms (variants of Newton’s
method) can make good use of their first and second derivatives. In the
early days of computing, the only known way to compute these derivatives
without the truncation errors of finite differences was to compute them by
the rules of calculus: deriving from, e.g., an expression for f (x) expres-
sions for the components of ∇f (x), then evaluating the derived formulae
as needed. Hand computation of derivatives is an error-prone process, and
many people independently discovered [18] a class of techniques called Au-
tomatic Differentiation (or Algorithmic Differentiation), called AD below.
The idea is to modify a computation so it computes both function and
desired partial derivatives as it proceeds — an easy thing to do with an
expression graph. Forward AD is easiest to understand and implement:
one simply applies the rules of calculus to recur desired partials for the
result of an operation from the partials of the operands. When there is
struct expr {
efunc *op; /* function for this operation */
int a; /* adjoint index */
real dL, dR; /* left and right partials */
expr *L, *R; /* left and right operands */
};
Fig. 2. ASL structure for binary operations with only f and ∇f available.
struct expr {
efunc *op; /* function for this operation */
int a; /* adjoint index (for gradient comp.) */
expr *fwd, *bak; /* used in forward and reverse sweeps */
double dO; /* deriv of op w.r.t. t in x + t*p */
double aO; /* adjoint (in Hv computation) of op */
double adO; /* adjoint (in Hv computation) of dO */
double dL; /* deriv of op w.r.t. left operand */
expr *L, *R; /* left and right operands */
double dR; /* deriv of op w.r.t. right operand */
double dL2; /* second partial w.r.t. L, L */
double dLR; /* second partial w.r.t. L, R */
double dR2; /* second partial w.r.t. R, R */
};
the “atan2” function, when Hessian computations are allowed, the function
also computes and stores some second partial derivatives.
Once a function evaluation has stored the partials of each operation,
the “reverse sweep” for gradient computations by reverse AD takes on a
very simple form in the ASL:
do *d->a.rp += *d->b.rp * *d->c.rp;        (2.1)
while(d = d->next);
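The effect of this loop can be sketched in Python (a hand-built tape for f (x, y) = (x − 3)2 + (y + 4)2 from Figure 1; the variable names are ours, not the ASL's): each tape record corresponds to one adjoint update of the do-while loop.

```python
def grad_reverse(x, y):
    """Forward sweep stores values and local partials; the reverse sweep
    is the analogue of the ASL do-while loop (2.1)."""
    v = {"x": x, "y": y}
    v["t1"] = v["x"] - 3.0           # dt1/dx = 1
    v["s1"] = v["t1"] ** 2           # ds1/dt1 = 2*t1
    v["t2"] = v["y"] + 4.0           # dt2/dy = 1
    v["s2"] = v["t2"] ** 2           # ds2/dt2 = 2*t2
    v["f"] = v["s1"] + v["s2"]
    # tape records: (result, operand, partial of result w.r.t. operand)
    tape = [("t1", "x", 1.0), ("s1", "t1", 2.0 * v["t1"]),
            ("t2", "y", 1.0), ("s2", "t2", 2.0 * v["t2"]),
            ("f", "s1", 1.0), ("f", "s2", 1.0)]
    adj = {k: 0.0 for k in v}
    adj["f"] = 1.0
    for res, opnd, partial in reversed(tape):   # the reverse sweep
        adj[opnd] += adj[res] * partial
    return v["f"], (adj["x"], adj["y"])
```

One reverse pass yields the whole gradient, regardless of the number of variables, which is the point of reverse AD.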
double
f_OPMULT(expr *e A_ASL)
{
expr *eL = e->L.e;
expr *eR = e->R.e;
return (e->dR = (*eL->op)(eL))
* (e->dL = (*eR->op)(eR));
}
Fig. 4. ASL function for multiplication.
f (x) = Σ_i fi (Ai x),
in which the Ai are matrices having only a few rows (varying with i) and
n columns, so that fi is a function of only a few variables. A nice feature
of this structure is that f 's Hessian ∇2 f (x) has the form

∇2 f (x) = Σ_i Ai^T ∇2 fi (x) Ai ,

i.e., ∇2 f (x) is a sum of outer products involving the little matrices Ai and
the Hessians ∇2 fi (x) of the fi . Knowing this structure, we can compute
each ∇2 fi (x) separately with a few Hessian-vector products, then assemble
the full ∇2 f (x) — e.g., if it is to be used by a solver that wants to see
explicit Hessian matrices.
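The assembly just described can be sketched with plain nested lists (an illustrative example, not the ASL's code): take f (x) = (x1 − x2)2 + (x2 + x3)2, i.e. fi (u) = u2 with A1 = (1, −1, 0) and A2 = (0, 1, 1), so each ∇2 fi is the 1×1 matrix (2).

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matadd(A, B):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]

A1, A2 = [[1.0, -1.0, 0.0]], [[0.0, 1.0, 1.0]]   # the "little" matrices
H1, H2 = [[2.0]], [[2.0]]                        # Hessians of f_i(u) = u^2
# assemble the full Hessian as sum_i A_i^T H_i A_i
H = matadd(matmul(transpose(A1), matmul(H1, A1)),
           matmul(transpose(A2), matmul(H2, A2)))
```

A term-by-term check confirms that H equals the Hessian of f computed directly.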
Many mathematical programming problems involve functions having
a more elaborate structure called partially-separable structure:
f (x) = Σ_i θi ( Σ_{j=1}^{ri} fij (Aij x) ),        (2.2)
For much more about AD in general, see Griewank’s book [19] and the
“autodiff” web site [1], which has pointers to many papers and packages
for AD.
3. Bound computations. Computing bounds on a given expression
can be helpful in various contexts. For nonlinear programming in general
and mixed-integer nonlinear programming in particular, it is sometimes
useful to “branch”, i.e., divide a compact domain into the disjoint union of
two or more compact subdomains that are then considered separately. If
we find a feasible point in one domain and can compute bounds showing
that any feasible points in another subdomain must have a worse objective
value, then we can discard that other subdomain.
Various kinds of bound computations can be done by suitable expres-
sion graph walks. Perhaps easiest to describe and implement are bound
computations based on interval arithmetic [24]: given interval bounds on
the operands of an operation, we compute an interval that contains the
results of the operation. For example, for any a ∈ [a, a] and b ∈ [b, b], the
product ab = a · b satisfies
(It is only necessary to compute all four products when max(a, b) < 0
and min(a, b) > 0, in which case ab ∈ [min(ab, ab), max(ab, ab)].) When
computing with the usual finite-precision floating-point arithmetic, we can
use directed roundings to obtain rigorous enclosures.
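The product rule above is easy to state in code (a sketch using ordinary floating point, i.e. without the directed roundings needed for a rigorous enclosure):

```python
def interval_mul(a, b):
    """Smallest interval containing {s * t : s in a, t in b} for
    intervals a = (lo, hi) and b = (lo, hi), up to rounding."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))
```

When the signs of the endpoints are known, most of the four products can be skipped, as noted above.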
Unfortunately, when the same variable appears several times in an
expression, interval arithmetic treats each appearance as though it could
have any value in its domain, which can lead to very pessimistic bounds.
More elaborate interval analysis (see, e.g., [25, 26]) can give much tighter
bounds. For instance, mean-value forms [25, 28] have an excellent outer
approximation property that will be explained shortly. Suppose domain
X ⊂ Rn is the Cartesian product of compact intervals, henceforth called
an interval vector, i.e.,
X = [x̲1 , x̄1 ] × · · · × [x̲n , x̄n ].
{f (c) + sT (x − c) : x ∈ X, s ∈ S} (3.2)
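A univariate instance of (3.2) makes the construction concrete (a sketch; the derivative enclosure is supplied by hand here rather than computed by a graph walk): for f (x) = x2 on X = [1, 3] with center c = 2, the slope set can be taken as the interval derivative bound f ′(X) = 2X = [2, 6].

```python
def imul(a, b):
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(ps), max(ps))

def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def mean_value_form(f, dfX, X):
    """Enclosure f(c) + dfX * (X - c), with c the midpoint of X."""
    c = 0.5 * (X[0] + X[1])
    fc = f(c)
    return iadd((fc, fc), imul(dfX, (X[0] - c, X[1] - c)))

# f(x) = x^2 on [1, 3]; f'(X) = [2, 6]; the true range is [1, 9]
enclosure = mean_value_form(lambda t: t * t, (2.0, 6.0), (1.0, 3.0))
```

The result, [−2, 10], contains the true range [1, 9]; tighter slope sets S shrink the enclosure.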
[Figure 5: a univariate function enclosed by a narrower cone (slope range) and a wider cone (derivative range).]
e.g., for ci ≈ (x̲i + x̄i )/2. With this scheme, permuting the variables may
result in different bounds; deciding which of the n! permutations is best
might not be easy.
Figure 5 indicates why slopes can give sharper bounds than we get from
a first-order Taylor expansion with an interval bound on the derivative.
Bounds on φ′(X) give S = [−1, 3], whereas slope bounds give S = [0, 2].
Sometimes we can obtain still tighter bounds by using second-order
slopes [38, 35, 36], i.e., slopes of slopes. The general idea is to compute a
slope matrix H such that an enclosure of
Table 4
Bound widths.
REFERENCES
[9] Robert Fourer, Jun Ma, and Kipp Martin, An Open Interface for Hooking
Solvers to Modeling Systems, slides for DIMACS Workshop on COIN-OR,
2006, http://dimacs.rutgers.edu/Workshops/COIN/slides/osil.pdf .
[10] R. Fourer, C. Maheshwari, A. Neumaier, D. Orban, and H. Schichl, Con-
vexity and Concavity Detection in Computational Graphs, manuscript, 2008,
to appear in INFORMS J. Computing.
[11] Edward P. Gatzke, John E. Tolsma, and Paul I. Barton, Construction of
Convex Function Relaxations Using Automated Code Generation Techniques,
Optimization and Engineering 3, 2002, pp. 305–326.
[12] David M. Gay, Automatic Differentiation of Nonlinear AMPL Models. In Auto-
matic Differentiation of Algorithms: Theory, Implementation, and Applica-
tion, A. Griewank and G. Corliss (eds.), SIAM, 1991, pp. 61–73.
[13] David M. Gay, Hooking Your Solver to AMPL, AT&T Bell Laboratories, Numer-
ical Analysis Manuscript 93-10, 1993 (revised 1997). http://www.ampl.com/
REFS/hooking2.pdf .
[14] David M. Gay, More AD of Nonlinear AMPL Models: Computing Hessian Infor-
mation and Exploiting Partial Separability. In Computational Differentiation
: Techniques, Applications, and Tools, Martin Berz, Christian Bischof, George
Corliss and Andreas Griewank (eds.), SIAM, 1996, pp. 173–184.
[15] David M. Gay, Semiautomatic Differentiation for Efficient Gradient Computa-
tions. In Automatic Differentiation: Applications, Theory, and Implementa-
tions, H. Martin Bücker, George F. Corliss, Paul Hovland and Uwe Naumann
and Boyana Norris (eds.), Springer, 2005, pp. 147–158.
[16] David M. Gay, Bounds from Slopes, report SAND-1010xxxx, to be available as
http://www.sandia.gov/~dmgay/bounds10.pdf.
[17] D.M. Gay, T. Head-Gordon, F.H. Stillinger, and M.H. Wright, An Appli-
cation of Constrained Optimization in Protein Folding: The Poly-L-Alanine
Hypothesis, Forefronts 8(2) (1992), pp. 4–6.
[18] Andreas Griewank, On Automatic Differentiation. In Mathematical Program-
ming: Recent Developments and Applications, M. Iri and K. Tanabe (eds.),
Kluwer, 1989, pp. 83–108.
[19] Andreas Griewank, Evaluating Derivatives, SIAM, 2000.
[20] A. Griewank, D. Juedes, and J. Utke, Algorithm 755: ADOL-C: A package for
the automatic differentiation of algorithms written in C/C++, ACM Trans.
Math Software 22(2) (1996), pp. 131–167.
[21] R. Baker Kearfott, An Overview of the GlobSol Package for Verified Global
Optimization, talk slides, 2002, http://www.mat.univie.ac.at/~neum/glopt/
mss/Kea02.pdf .
[22] Padmanaban Kesavan, Russell J. Allgor, Edward P. Gatzke, and Paul I.
Barton, Outer Approximation Algorithms for Separable Nonconvex Mixed-
Integer Nonlinear Programs, Mathematical Programming 100(3), 2004, pp.
517–535.
[23] R. Krawczyk and A. Neumaier, Interval Slopes for Rational Functions and As-
sociated Centered Forms, SIAM J. Numer. Anal. 22(3) (1985), pp. 604–616.
[24] R.E. Moore, Interval Arithmetic and Automatic Error Analysis in Digital Com-
puting, Ph.D. dissertation, Stanford University, 1962.
[25] Ramon E. Moore, Methods and Applications of Interval Analysis, SIAM, 1979.
[26] Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud, Introduction
to Interval Analysis, SIAM, 2009.
[27] Ivo P. Nenov, Daniel H. Fylstra, and Lubomir V. Kolev, Convexity Determi-
nation in the Microsoft Excel Solver Using Automatic Differentiation Tech-
niques, extended abstract, 2004, http://www.autodiff.org/ad04/abstracts/
Nenov.pdf.
[28] Arnold Neumaier, Interval Methods for Systems of Equations, Cambridge Uni-
versity Press, 1990.
SYMMETRY IN MATHEMATICAL PROGRAMMING
LEO LIBERTI∗
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 263
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_9,
© Springer Science+Business Media, LLC 2012
Margot [39, 43] defines the relaxation group GLP (P ) of a BLP P as

GLP (P ) = {π ∈ Sn | cπ = c ∧ ∃σ ∈ Sm (σAπ = A ∧ σb = b)},        (2.1)

or, in other words, all relabellings of problem variables for which the objective function and constraints are the same. The relaxation group (2.1)
is used to derive effective BB pruning strategies by means of isomorphism
pruning and isomorphism cuts local to some selected BB tree nodes (Margot
extended his work to general integer variables in [42]). Further results along
the same lines (named orbital branching) are obtained for covering and
packing problems in [50, 51]: if O is an orbit of some subgroup of the relaxation
group, at each BB node the disjunction (Σ_{i∈O} xi ≥ 1) ∨ (Σ_{i∈O} xi = 0)
induces a feasible division of the search space; orbital branching restricts
this disjunction to xh = 1 ∨ Σ_{i∈O} xi = 0, where h is an arbitrary index in O.
The second was established by Kaibel et al. in 2007 [26, 15], with the
introduction of the packing and partitioning orbitopes, i.e. convex hulls
min_y   Σ_{j≤k} (C • B^j ) yj
s.t.  ∀i ≤ m   Σ_{j≤k} (Ai • B^j ) yj = bi        (2.3)
      Σ_{j≤k} yj B^j ⪰ 0.
for some integers d and mt (t ≤ d). This allows a size reduction of the SDP
being solved, as the search only needs to be conducted on the smaller-
dimensional space spanned by φ(B).
A different line of research is pursued in [27]: motivated by an appli-
cation (truss optimization), it is shown that the barrier subproblem of a
typical interior point method for SDP “inherits” the same symmetries as
the original SDP.
2.4. Automatic symmetry detection. Automatic symmetry de-
tection does not appear prominently in the mathematical programming
literature. A method for finding the MILP relaxation group (2.1), based
on solving an auxiliary MILP encoding the condition σAπ = A, was pro-
posed and tested in [33] (to the best of our knowledge, the only approach for
symmetry detection that does not reduce the problem to a graph). A more
practically efficient method consists in finding the automorphism group of
a vertex-colored bipartite graph encoding the incidence of variables in constraints.
If the symmetry π is orbitopal and the system Ax ≤ b contains at
least a leading constraint, i.e. a π-invariant constraint that has exactly one
nonzero column in each Cp (for p ≤ q) then a set of generators for GLP (P )
can be found in linear time in the number of nonzeroes of A [7].
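For very small instances, the relaxation group (2.1) can be computed by plain enumeration (a brute-force sketch for illustration only; real detection tools work on the graph encoding above, and the greedy row matching used here is adequate for tiny examples but is not a full bipartite matching):

```python
from itertools import permutations

def relaxation_group(c, A, b):
    """All variable permutations pi leaving c invariant for which some row
    permutation sigma maps the column-permuted system back onto (A, b)."""
    n, m = len(c), len(A)
    group = []
    for pi in permutations(range(n)):
        if any(c[pi[j]] != c[j] for j in range(n)):
            continue
        permuted = [tuple(row[pi[j]] for j in range(n)) for row in A]
        used, ok = set(), True
        for i, r in enumerate(permuted):
            # find an unused original row equal to the permuted row,
            # with matching right-hand side (greedy matching)
            t = next((s for s in range(m) if s not in used
                      and tuple(A[s]) == r and b[s] == b[i]), None)
            if t is None:
                ok = False
                break
            used.add(t)
        if ok:
            group.append(pi)
    return group
```

For min x1 + x2 s.t. x1 + x2 ≥ 1 the two variables are interchangeable, so the group has order 2; perturbing one objective coefficient destroys the symmetry.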
The Constraint Programming (CP) literature contains many papers on
symmetries. Whereas most of them discuss symmetry breaking techniques,
a few of them deal with automatic symmetry detection and are relevant to
the material presented in the rest of the paper; all of them rely on reducing
the problem to a graph and solving the associated Graph Isomorphism
(GI) problem. In CP, symmetries are called local if they hold at a specific
search tree node, and global otherwise. Solution symmetries are also called
semantic symmetries, and formulation symmetries are also called syntactic
or constraint symmetries. A Constraint Satisfaction Problem (CSP) can be
represented by its microstructure complement, i.e. a graph whose vertices
are assignments x = a (where x ranges over all CSP variables and a over all
values in the domain of x), and whose edges (xi = a, xj = b) indicate that
the two assignments xi = a and xj = b are incompatible either because of a
constraint in the CSP or because i = j and a ≠ b. Constraint symmetries
Fig. 1. Expression tree for 2x1 + x2 x3 + x3 (left). Equal variable vertices can be
contracted to obtain a DAG (right).
Because for any function h, h(xπ) ≡ h(x) implies h(xπ) = h(x) for all
x ∈ dom(h), it is clear that GP ≤ ḠP . Thus, it also follows that GP ≤
G∗ (P ). Although ḠP is defined for any MINLP (1.1), if P is a BLP, then
ḠP = GLP (P ) [36]. We remark that if f1 , f2 are linear forms, then f1 = f2
implies f1 ≡ f2 . In other words, for linear forms, ≡ and = are the same
relation [36]. As a corollary, if P is a BLP, then GP = GLP (P ).
If a set of mathematical functions share the same arguments, as for the
objective function f and constraints g of (1.1), the corresponding DAGs for
f, g1 , . . . , gm can share the same variable leaf vertices. This yields a DAG
DP = (VP , AP ) (formed by the union of all the DAGs of functions in P
followed by the contraction of leaf vertices with same variable index label)
which represents the mathematical structure P [49, 58].
4. Automatic computation of the formulation group. The me-
thod proposed in this section also appears (with more details) in [36]. As
mentioned in the literature review, similar techniques are available in CP
[53].
We first define an equivalence relation on VP which determines the
interchangeability of two vertices of DP . Let SF be the singleton set con-
taining the root vertex of the objective function, SC of all constraint root
vertices, SO of all vertices representing operators, SK of all constant ver-
tices and SV of all variable vertices. For v ∈ SF , we denote optimization
direction of the corresponding objective function by d(v); for v ∈ SC , we
denote the constraint sense by s(v). For v ∈ SO , we let ℓ(v) be the level
of v in DP , i.e. the length of the path from the root to v (ℓ is well defined
as the only vertices with more than one incoming arc are the leaf vertices),
λ(v) be its operator label and o(v) be the order of v as an argument of
its parent vertex if the latter represents a noncommutative operator, or 1
otherwise. For v ∈ SK , we let μ(v) be the value of v. For v ∈ SV we let
r(v) be the 2-vector of lower and upper variable bounds for v and ζ(v) be
1 if v represents an integral variable or 0 otherwise. We now define the
relation ∼ on VP as follows
because Σ_{k≤D} x_{ik}^2 = ‖xi ‖^2 = 4 for all i ≤ N . Let Q be (6.1) reformulated
according to (6.2): automatic detection of GQ yields an indication that
GQ ≅ SD × SN , which is a considerably larger group. The difference lies
in the fact that the binary minus is in general not commutative; however,
it is commutative whenever it appears in terms like ‖xi − xj ‖ (by definition
of the Euclidean norm). Since automatic symmetry detection is based
on expression trees, commutativity of an operator is decided at the vertex
representing the operator, rather than at the parent vertex. Thus, on (6.1),
our automatic system fails to detect the larger group. Reformulation (6.2)
prevents this from happening, thereby allowing the automatic detection of
the larger group.
Example 2. Consider the KNP instance defined by N = 6, D = 2,
whose variable mapping
x11 x12 x21 x22 x31 x32 x41 x42 x51 x52 x61 x62 α
y1  y2  y3  y4  y5  y6  y7  y8  y9  y10 y11 y12 y13

⟨(x11 , x12 )(x21 , x22 )(x31 , x32 )(x41 , x42 )(x51 , x52 )(x61 , x62 ),
(x11 , x21 )(x12 , x22 ), (x21 , x31 )(x22 , x32 ), (x31 , x41 )(x32 , x42 ),
(x41 , x51 )(x42 , x52 ), (x51 , x61 )(x52 , x62 )⟩,
Table 1
Computational results for the Kissing Number Problem.
completely for knp-25 4) because we are keeping the CPU time fixed at
10h. We remark that the effectiveness of the NarrowingKNP reformulation
in low-dimensional spaces can be partly explained by the fact that it is de-
signed to break sphere-related symmetries rather than dimension-related
ones (naturally, the instance size also counts: the largest 2D instance,
knp-10 2, has 21 variables, whereas the smallest 3D one, knp-12 3, has 37
variables).
REFERENCES
PART V:
Convexification and
Linearization
USING PIECEWISE LINEAR FUNCTIONS FOR
SOLVING MINLPs
BJÖRN GEIßLER∗ , ALEXANDER MARTIN∗ , ANTONIO MORSI∗ , AND
LARS SCHEWE∗
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 287
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_10,
© Springer Science+Business Media, LLC 2012
types of variables, see Section 6 for details. The same type of nonlinearity
occurs when one wants to model water or power networks.
Besides being helpful for the solution of MINLPs and of interest in
their own right, piecewise linear functions also show up directly in practical
applications. One such example is the optimization of the social welfare in
power spot markets. One of the products that is traded, for instance at the
European Energy Exchange (EEX) or the Nord Pool Spot (NPS), are hourly
bid curves. For each trading hour of the day the customers give their hourly
bids by a set of points of the form power per price. These sets of points are
linearly interpolated, resulting in a piecewise linear function. All other
conditions in the welfare optimization
problem are of linear or combinatorial nature resulting in a huge mixed
integer linear program containing piecewise linear functions, see [16] for
details.
[Fig. 1: a piecewise linear function with breakpoints x0 , . . . , xn . Fig. 2: a point between xi and xi+1 expressed via the convex multipliers λi , λi+1 .]
We start with the convex-combination method [7] (see Fig. 2). It uses
the following observation: When φ is a piecewise linear function, we can
compute the function value at point x if we can express x as the con-
vex combination of the neighboring nodes. We know from Carathéodory’s
theorem that we only need these two neighboring points. This condition
can be expressed using n binary variables z1 , . . . , zn . Thus, we use the
following model
x = Σ_{i=0}^{n} λi xi ,   Σ_{i=0}^{n} λi = 1,   λ ≥ 0,        (2.1)

y = Σ_{i=0}^{n} λi yi ,        (2.2)

λ0 ≤ z1 ,   λi ≤ zi + zi+1 ∀i ∈ {1, . . . , n − 1},   λn ≤ zn ,   Σ_{i=1}^{n} zi ≤ 1.        (2.3)
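For a fixed x the model admits an obvious feasible point, which a sketch can construct directly (illustrative code, not a MILP solver; the names are ours):

```python
def cc_point(breaks_x, breaks_y, x):
    """A (lambda, z, y) satisfying (2.1)-(2.3) for a given x:
    z selects the segment containing x, lambda the two active weights."""
    n = len(breaks_x) - 1
    lam = [0.0] * (n + 1)
    z = [0] * (n + 1)      # z[i], i = 1..n, marks segment [x_{i-1}, x_i]
    for i in range(1, n + 1):
        if breaks_x[i - 1] <= x <= breaks_x[i]:
            t = (x - breaks_x[i - 1]) / (breaks_x[i] - breaks_x[i - 1])
            lam[i - 1], lam[i] = 1.0 - t, t
            z[i] = 1
            break
    y = sum(l * yy for l, yy in zip(lam, breaks_y))   # (2.2)
    return lam, z, y
```

With breakpoints of x² at 0, 1, 2, 3 and x = 1.5, only the two neighboring weights are nonzero and y interpolates linearly between 1 and 4.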
[Figure: the incremental method on segment [xi , xi+1 ]: the increment δi moves along the slope (yi+1 − yi )/(xi+1 − xi ).]
y = y0 + Σ_{i=1}^{n} ((yi − yi−1 )/(xi − xi−1 )) δi        (2.5)
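Filling the increments from left to right gives the representation directly (a sketch; in the MILP model this filling order is enforced by binary variables instead):

```python
def incremental_point(breaks_x, breaks_y, x):
    """delta_i in [0, x_i - x_{i-1}], filled left to right, and the
    function value recovered via (2.5)."""
    n = len(breaks_x) - 1
    delta = [min(breaks_x[i] - breaks_x[i - 1],
                 max(0.0, x - breaks_x[i - 1])) for i in range(1, n + 1)]
    y = breaks_y[0] + sum((breaks_y[i] - breaks_y[i - 1])
                          / (breaks_x[i] - breaks_x[i - 1]) * delta[i - 1]
                          for i in range(1, n + 1))
    return delta, y
```

For the x² breakpoints 0, 1, 2, 3 and x = 1.5 the first segment is full, the second half full, and the rest empty.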
[Figure: branching on the SOS condition over breakpoints x0 , . . . , x5 : one branch enforces Σ_{i=0}^{3} λi = 1, the other Σ_{i=3}^{5} λi = 1.]
www.it-ebooks.info
USING PIECEWISE LINEAR FUNCTIONS FOR SOLVING MINLPs 293
∀k ∈ {1, . . . , n}:   Σ_{i=0}^{k−2} λi + Σ_{i=k+1}^{n} λi ≤ Σ_{{l | (c(k))l =1}} (1 − zl ) + Σ_{{l | (c(k))l =0}} zl .        (2.7)
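The forcing role of the encoding in (2.7) is easy to check numerically (a sketch; we take c(k) to be the reflected Gray code of k − 1, which is our assumption here, although any injective code yields the property tested below): with z fixed to c(k∗), the right-hand side of (2.7) vanishes for k = k∗ and is at least 1 for every other k, so all λi outside segment k∗ are forced to zero.

```python
def gray_code(k, nbits):
    """Bits of the reflected Gray code of k (assumed choice for c(k+1))."""
    g = k ^ (k >> 1)
    return [(g >> l) & 1 for l in range(nbits)]

def rhs_27(c_k, z):
    """Right-hand side of (2.7) for one k: the Hamming distance of c(k) to z."""
    return sum((1 - z[l]) if c_k[l] == 1 else z[l] for l in range(len(z)))

codes = [gray_code(k - 1, 3) for k in range(1, 9)]   # n = 8 segments
z_star = codes[4]                                    # fix z to segment k* = 5
```

Because the right-hand side is the Hamming distance between c(k) and z, injectivity of the encoding is exactly what makes the disjunction exclusive.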
y = Σ_{j=1}^{m} λj ySj ,        (3.3)

λj ≤ Σ_{{i | xSj ∈ V(Si )}} zi   for j = 1, . . . , m,        (3.4)

Σ_{i=1}^{n} zi ≤ 1.        (3.5)
[Figure: a simplex with vertices x0 , x1 , x2 ; a point inside is reached from x0 via the increments δ1 (x1 − x0 ) and δ2 (x2 − x0 ).]
[Figure: a triangulation into simplices S1 , . . . , S11 , with each vertex labeled 0, 1, or 2.]
x = x_0^{S1} + Σ_{i=1}^{n} Σ_{j=1}^{d} (x_j^{Si} − x_0^{Si}) δ_j^{Si} ,        (3.6)

y = y_0^{S1} + Σ_{i=1}^{n} Σ_{j=1}^{d} (y_j^{Si} − y_0^{Si}) δ_j^{Si} ,        (3.7)

Σ_{j=1}^{d} δ_j^{Si} ≤ 1   for i = 1, . . . , n,        (3.8)
x = Σ_{i=1}^{n} Σ_{j=0}^{d} λ_j^{Si} x_j^{Si} ,   Σ_{i=1}^{n} Σ_{j=0}^{d} λ_j^{Si} = 1,   λ ≥ 0,        (3.12)

y = Σ_{i=1}^{n} Σ_{j=0}^{d} λ_j^{Si} y_j^{Si} ,        (3.13)

Σ_{i=1}^{n} Σ_{j=0}^{d} c(Si )l λ_j^{Si} ≤ zl   for l = 1, . . . , ⌈log2 n⌉,        (3.14)

Σ_{i=1}^{n} Σ_{j=0}^{d} (1 − c(Si )l ) λ_j^{Si} ≤ 1 − zl   for l = 1, . . . , ⌈log2 n⌉,        (3.15)
Constraints (3.12) and (3.13) represent the point (x, y) as a convex combination over the given simplices and remain unchanged from the disaggregated convex combination model.
Fig. 8. Example for a binary encoding and the branching induced by the leftmost
and rightmost bit using the logarithmic disaggregated convex combination model.
ensure that the error bounds are satisfied. Hence, we are interested in error estimators that are as strong as possible.
As a starting point for our considerations, we assume the following situation: Let D ⊆ R^d and f: D → R be some continuous function. Further, let φ: P → R be a piecewise linear approximation of f over some convex polytope P ⊆ D. We assume φ to interpolate f on the vertices of some triangulation of P. Thus, for a triangulation with simplex set S, we can define affine functions φ_i: x ↦ a_i^T x + b_i for each simplex S_i ∈ S with φ_i(x) = f(x) for every vertex x of S_i, such that we can write φ(x) = φ_i(x) for x ∈ S_i.
If we are able to control the linearization error within a simplex, we are obviously able to control the overall linearization error, since we can simply add a point of maximum error to the vertex set of our triangulation and retriangulate the affected region locally. Repeating this process for all simplices not yet checked leads to a piecewise linearization that satisfies the error bound everywhere.
For this reason, we restrict our further considerations to the situation where φ(x) = a^T x + b is the linear interpolation of f over a simplex S with vertices x_0, …, x_d, i.e., φ(x_i) = f(x_i) for i = 0, …, d. We define the maximum linearization error in terms of the maximum under- and overestimation of a function (cf. Fig. 9):

Definition 4.1. We call ε_u(f, S) := max_{x∈S} (f(x) − φ(x)) the maximum underestimation, ε_o(f, S) := max_{x∈S} (φ(x) − f(x)) the maximum overestimation, and ε(f, S) := max{ε_u(f, S), ε_o(f, S)} the maximum linearization error of f by φ over S.
[Fig. 9: f(x) and φ(x) over a simplex S, with the maximum underestimation ε_u and the maximum overestimation ε_o.]
f(x) = f(Σ_{i=0}^{d} λ_i x_i) ≥ Σ_{i=0}^{d} λ_i f(x_i).
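The quantities of Definition 4.1 can be estimated numerically by sampling. The sketch below does this for a one-dimensional simplex [a, b] (names and sample count are illustrative); for the convex function x², the chord never underestimates, and the maximum overestimation is attained at the midpoint.

```python
def linearization_errors(f, a, b, samples=10001):
    """Sample-based estimate of the maximum under-/overestimation of f
    (Definition 4.1) by its linear interpolant over the 1-simplex [a, b]."""
    fa, fb = f(a), f(b)
    phi = lambda x: fa + (fb - fa) * (x - a) / (b - a)
    eps_u = eps_o = 0.0
    for i in range(samples):
        x = a + (b - a) * i / (samples - 1)
        eps_u = max(eps_u, f(x) - phi(x))
        eps_o = max(eps_o, phi(x) - f(x))
    return eps_u, eps_o

# convex f(x) = x**2 on [0, 2]: the chord is 2x, and max(2x - x**2) = 1 at x = 1
eps_u, eps_o = linearization_errors(lambda x: x * x, 0.0, 2.0)
```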
The identity N_o = conv(M_o) again follows from Lemma 4.2, which completes the proof.
In order to calculate a point where the maximum overestimation of f
by φ over S is attained, it suffices to solve the convex optimization problems
min  c^T x
s.t. g_i(x) = b_i   for i = 1, …, k_1,
     h_i(x) ≥ b_i   for i = 1, …, k_2,
     l ≤ x ≤ u,   (5.1)
     x ∈ R^{d−p} × Z^p,
Fig. 10. Piecewise polyhedral envelopes of sin(x) on [0, 2π] with breakpoints 0, π/2, 3π/2, 2π.
y = Σ_{i=1}^{n} Σ_{j=0}^{d} λ_j^{S_i} y_j^{S_i} + e.   (5.3)
y = y_0^{S_1} + Σ_{i=1}^{n} Σ_{j=1}^{d} (y_j^{S_i} − y_0^{S_i}) δ_j^{S_i} + e   (5.6)
The feasible region of the MIP described by the modified incremental model together with constraints (5.7) and (5.8) is again the union of the boxes depicted in Fig. 10. In contrast to the modified disaggregated logarithmic method, we can only model the piecewise polyhedral envelopes appropriately if we add inequalities containing binary variables. To verify the correctness of (5.7) and (5.8), recall that in every feasible solution of the described MIP there is some index j with z_i = 1 for all i < j and z_i = 0 for all i ≥ j. This means that all terms ε_u(f, S_i) on the left-hand side of (5.7) and all terms ε_o(f, S_i) on the left-hand side of (5.8) with i ≠ j either cancel out or are multiplied by 0. Therefore, we get −ε_o(f, S_j) ≤ e ≤ ε_u(f, S_j) as desired.
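The following sketch checks numerically, for the sin example of Fig. 10, that the box relaxation φ(x) − ε_o ≤ y ≤ φ(x) + ε_u encloses the graph of the function on every interval; the error constants are themselves estimated by sampling, and all names are illustrative.

```python
import math

def interval_envelopes(f, xs, grid=2000):
    """For each interval, the interpolant phi plus sampled error bounds
    eps_u, eps_o giving the box relaxation phi - eps_o <= y <= phi + eps_u."""
    out = []
    for a, b in zip(xs, xs[1:]):
        fa, fb = f(a), f(b)
        phi = lambda x, a=a, b=b, fa=fa, fb=fb: fa + (fb - fa) * (x - a) / (b - a)
        pts = [a + (b - a) * i / grid for i in range(grid + 1)]
        eps_u = max(f(x) - phi(x) for x in pts)
        eps_o = max(phi(x) - f(x) for x in pts)
        out.append((a, b, phi, eps_u, eps_o))
    return out

xs = [0.0, math.pi / 2, 3 * math.pi / 2, 2 * math.pi]  # breakpoints of Fig. 10
inside = all(
    phi(x) - eps_o - 1e-9 <= math.sin(x) <= phi(x) + eps_u + 1e-9
    for a, b, phi, eps_u, eps_o in interval_envelopes(math.sin, xs)
    for x in [a + (b - a) * i / 500 for i in range(501)]
)
```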
As we have seen, it suffices to introduce k_1 + k_2 additional real-valued variables to model a mixed integer piecewise polyhedral relaxation instead of a mixed integer piecewise linear approximation of (5.1). This marginal overhead makes every feasible point of (5.1) also feasible for its MIP relaxation. Thus, we can use any NLP solver to produce feasible solutions for the MIP once the integer variables x_{d−p+1}, …, x_d are fixed. If we only accept solutions that are feasible for the MINLP as incumbent solutions of the MIP relaxation, it is straightforward to implement an algorithm to solve (5.1) that can prove optimality and prune infeasible or suboptimal parts of the branch-and-bound tree using the full range of well-developed techniques integrated in modern solvers for mixed integer linear problems. Even if we think about such an MIP relaxation without having any global solver for the underlying MINLP in mind, there are some obvious advantages compared to a (piecewise linear) approximation. First, any solution of an MIP relaxation (and even any solution to a corresponding LP relaxation) yields a valid lower bound for the MINLP. On the other hand, if we consider just an approximation, no such statement can be made without making further assumptions concerning the convexity of the involved
Fig. 11. Piecewise linear approximation of the solution set for equation (6.1). Fig. 12. Piecewise linear approximation of the solution set for equation (6.3).

y_1 = λ x_1 |x_1| .   (6.1)
y2 = dxe2 , (6.3)
Table 1
Demonstration of univariate models for a water supply network example.
For all numerical results we use ILOG CPLEX 11.2 with default parameters, and we do not interfere with the solution process. All computations are done on one core of an Intel Core2 Duo 2 GHz machine with 2 GB RAM running a Linux operating system. The running time is restricted to at most one hour. We omit the time to compute the piecewise linear interpolations, because in our examples it is insignificant compared to CPLEX's running time. In Table 1 we compare all mixed integer formulations from Section 2 on a water network optimization problem. The table shows the total number of variables, integer variables, auxiliary binary variables, and constraints of the resulting model, as well as the number of branch-and-bound nodes and the solution time in seconds CPLEX needed to prove optimality. An asterisk indicates that the time limit was reached; in this case the solver's gap at that time is printed in parentheses. The model abbreviations stand for the convex combination (cc), incremental (inc), special ordered sets (sos), logarithmic convex combination (log), and logarithmic disaggregated convex combination (dlog) models. We see that even though sos, log, and dlog yield smaller models, it is not always a good idea to use just these models. In our example inc is by far the best choice, a result that agrees with further experiments on water supply network instances. In addition to the results computed with piecewise linear approximations, we also compare them to the piecewise linear relaxations introduced in Section 5. For our example the running times of both approaches do not differ very much. In general, it is slightly more difficult to find a feasible solution using the approximation techniques, whereas the optimality proof is typically somewhat faster. More interesting, however, is an investigation of the optimal solutions found by the two approaches (see Table 3). The relative difference of the objective values obtained by piecewise linear approximation and piecewise linear relaxation is 10^{-3}, which is not greater than our error tolerance for the nonlinearities. Of course we cannot guarantee this for arbitrary instances or models, but as our examples and experience show, it is commonly true for our problems. The exact objective value of the global optimum lies between those achieved by interpolation and relaxation.
where a, b, and c are constants. Each of the ten arising equations of the above type is piecewise linearized with approximately 700 tetrahedra. In Table 2, computational results for the three formulations from Section 3, namely the convex combination (cc), incremental (inc), and logarithmic disaggregated convex combination (dlog) models, are listed for a gas network optimization problem. We see for our representative example that dlog benefits from its drastically reduced number of binary variables and constraints. Again, the inc models seem to have the best structure, but their LPs are far too large compared to the others. A comparison with the results obtained by piecewise linear relaxations shows almost identical running times to those achieved by interpolation techniques. As shown in Table 3, the relative difference between the two objective values is approximately 2.7%, which is of the same order of magnitude as our specified maximum approximation error for the nonlinearities. Likewise, we see that the solution found by the piecewise linearization techniques lies within 1.3% to 1.4% of the exact optimum. In addition, we would like to mention that all crucial decisions, i.e., problem-specific integer variables that are not introduced to model the piecewise linearizations, have identical assignments in all presented cases.
7. Future directions. We have seen that using piecewise linear approximations can be an attractive way of tackling mixed integer nonlinear programs. We get a globally optimal solution within a priori determined tolerances and are able to use the well-developed software tools of mixed integer linear programming. These techniques are best suited to problems with only a few distinct nonlinear functions, each of which depends on only a few variables. Then the combinatorics of the problem dominate the complexity of the approximation, and we can expect a huge gain from being able to use MIP techniques.
This also opens up a variety of different directions for further devel-
oping these methods. One important topic is to fuse the techniques shown
Table 2
Demonstration of multivariate models for a gas network example.

                  cc          inc      dlog
No. of vars       10302       30885    29397
No. of ints.      7072        7682     362
No. of aux.       7036        7646     326
No. of cons       3114        23592    747
No. of nodes      620187      18787    75003
Running time      * (0.81%)   3204     1386
Table 3
Comparison between optimal objective values obtained by piecewise approximation,
piecewise relaxation and exact optimum.
REFERENCES
[27] W.D. Smith, A lower bound for the simplexity of the n-cube via hyperbolic volumes, European Journal of Combinatorics, 21 (2000), pp. 131–137.
[28] F. Tardella, On the existence of polyhedral convex envelopes, in Frontiers in Global Optimization, C. Floudas and P.M. Pardalos, eds., Vol. 74 of Nonconvex Optimization and its Applications, Springer, 2004, pp. 563–573.
[29] M.J. Todd, Hamiltonian triangulations of R^n, in Functional Differential Equations and Approximation of Fixed Points, A. Dold and B. Eckmann, eds., Vol. 730/1979 of Lecture Notes in Mathematics, Springer, 1979, pp. 470–483.
[30] J.P. Vielma, S. Ahmed, and G. Nemhauser, Mixed-integer models for nonseparable piecewise-linear optimization: Unifying framework and extensions, Operations Research, 58 (2009), pp. 303–315.
[31] J.P. Vielma, A.B. Keha, and G.L. Nemhauser, Nonconvex, lower semicontinuous piecewise linear optimization, Discrete Optimization, 5 (2008), pp. 467–488.
[32] D. Wilson, Polyhedral Methods for Piecewise-Linear Functions, Ph.D. thesis in Discrete Mathematics, University of Kentucky, 1998.
AN ALGORITHMIC FRAMEWORK FOR
MINLP WITH SEPARABLE NON-CONVEXITY
CLAUDIA D’AMBROSIO∗ , JON LEE† , AND ANDREAS WÄCHTER‡
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 315
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_11,
© Springer Science+Business Media, LLC 2012
316 CLAUDIA D’AMBROSIO, JON LEE, AND ANDREAS WÄCHTER
AN ALGORITHMIC FRAMEWORK FOR MINLP 317
min  Σ_{j∈N} C_j x_j
subject to
     f(x) ≤ 0;
     r_i(x) + Σ_{k∈H(i)} g_{ik}(x_k) ≤ 0,   ∀i ∈ M;   (P)
     L_j ≤ x_j ≤ U_j,   ∀j ∈ N;
     x_j integer,   ∀j ∈ I,
www.it-ebooks.info
320 CLAUDIA D’AMBROSIO, JON LEE, AND ANDREAS WÄCHTER
depicted in Fig. 2.
We can obtain a better lower bound by refining the piecewise-linear
lower approximation on the concave pieces. We let
depicted in Fig. 3.
Next, we define further variables to manage our convexification of g
on its domain:
Constraints (2.6)–(2.8) ensure that, for each concave interval, the convex combination of the breakpoints is correctly computed. Finally, (2.2) approximates the original non-convex univariate function g(x_k). Using the definition of δ_p, each term of the first summation reduces to

g(P_{p−1} + δ_p) = g(P_p)       if p ∈ {1, …, p∗ − 1};
                 = g(x∗_k)      if p = p∗;
                 = g(P_{p−1})   if p ∈ {p∗ + 1, …, p̄},

and each term of the second summation reduces to

Σ_{b∈B_p} g(X_{p,b}) α_{p,b} = g(P_p)                              if p ∈ {1, …, p∗ − 1};
                             = Σ_{b∈B_{p∗}} g(X_{p∗,b}) α_{p∗,b}   if p = p∗;
                             = g(P_{p−1})                          if p ∈ {p∗ + 1, …, p̄},
where

γ = g(x∗_k)                             if p∗ ∈ Ȟ;
  = Σ_{b∈B_{p∗}} g(X_{p∗,b}) α_{p∗,b}   if p∗ ∈ Ĥ.
because the "g_{ik}(x_k)" terms are replaced by a piecewise convex relaxation; see (2.2). We denote the values of this approximation by g̃^{l̃}_{ik}(x_k). Then the solutions of Q^{l̃} satisfy

r_i(x^{l̃}) + Σ_{k∈H(i)} g̃^{l̃}_{ik}(x^{l̃}_k) ≤ 0,   ∀i ∈ M.   (2.10)

Now choose l̃_1, l̃_2 with l̃_2 > l̃_1. Because the approximation g̃^{l̃_2}_{ik}(x_k) is defined to coincide with the convex parts of g_{ik}(x_k) and is otherwise a linear interpolation between breakpoints (note that x^{l̃_1}_k is a breakpoint for Q^{l̃_2} if x^{l̃_1}_k lies in an interval where g_{ik}(x_k) is concave), the Lipschitz continuity of the g_{ik}(x_k) gives us

g̃^{l̃_2}_{ik}(x^{l̃_2}_k) ≥ g_{ik}(x^{l̃_1}_k) − L_g |x^{l̃_2}_k − x^{l̃_1}_k|

for all i ∈ M. Because l̃_1, l̃_2 with l̃_2 > l̃_1 have been chosen arbitrarily, taking the limit as l̃_1, l̃_2 → ∞ and using the continuity of r shows that x∗ satisfies (2.9). The continuity of the remaining constraints in P ensures that x∗ is feasible for P. Because val(Q^{l̃}) ≤ val(P) for all l̃, we finally obtain val(Q∗) ≤ val(P), so that x∗ must be a global solution of P.
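The refinement mechanism behind this argument can be illustrated numerically: on a concave piece, the piecewise convex relaxation is the secant between breakpoints, and adding breakpoints drives the gap to zero. The sketch below refines by interval midpoints purely for illustration (the algorithm instead adds the solution point of the current relaxation); function and names are illustrative.

```python
import math

def secant_gap(g, pts, samples=100):
    """Largest gap between a concave g and its piecewise secant
    underestimator on the sorted breakpoints pts (sampled estimate)."""
    gap = 0.0
    for a, b in zip(pts, pts[1:]):
        for i in range(samples + 1):
            x = a + (b - a) * i / samples
            secant = g(a) + (g(b) - g(a)) * (x - a) / (b - a)
            gap = max(gap, g(x) - secant)
    return gap

g = math.sqrt               # concave on [0, 4]
pts, gaps = [0.0, 4.0], []
for _ in range(4):          # refine by interval midpoints (for illustration)
    gaps.append(secant_gap(g, pts))
    pts = sorted(set(pts) | {(a + b) / 2 for a, b in zip(pts, pts[1:])})
```

The recorded gaps decrease monotonically, mirroring the convergence argument above.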
Figure 5 depicts the three different nonlinear functions gkt (wkt ) that
were used for the computational results presented in Tables 1 and 2. The
dashed line depicts the non-convex function, while the solid line indicates
the initial piecewise-convex underestimator. Note that the third function
was intentionally designed to be pathological and challenging for SC-MINLP.
The remaining problem data was randomly generated.
In Table 1 the performance of SC-MINLP is shown. For the first instance, the global optimum is found at the first iteration, but four more iterations are needed to prove global optimality. In the second instance, only one iteration is needed. In the third instance, the first feasible solution found is not the global optimum, which is found at the third (and last) iteration. Table 2 demonstrates the good performance of SC-MINLP. In particular, instance ufl 1 is solved in about 117 seconds compared to the 530 seconds needed by COUENNE, and instance ufl 2 in less than 18 seconds compared to 233 seconds. On instance ufl 3, COUENNE performs better than SC-MINLP, but this instance is really quite easy for both algorithms. BONMIN 1 finds solutions to all three instances very quickly, and these solutions turn out to be globally optimal (note, however, that BONMIN 1 is a heuristic algorithm and no guarantee of global optimality is given). BONMIN 50 also finds the three global optima, but in non-negligible time (longer than that needed by SC-MINLP in 2 of the 3 instances).
[Fig. 5: the three nonlinear functions g_kt(w_kt) on [0, 1] used in Tables 1 and 2; dashed: the non-convex function, solid: the initial piecewise-convex underestimator.]
www.it-ebooks.info
Table 1
Results for Uncapacitated Facility Location problem.

ufl 3   79/21/101   1   1,947.883   2,756.890   -    2.25   -
...                 2   2,064.267   2,756.890   no   2.75   2
        87/25/105   3   2,292.743   2,292.777   no   3.06   2
Table 2
Results for Uncapacitated Facility Location problem.

ufl 3   32/2/36   8.44   2,292.777   0.73   2,292.775   3.08   2,292.777   3.13   2,292.775
Σ_{j∈J} u_{jt} ≤ n − 1,   ∀t ∈ T;

p_{jt} − ϕ(q_{jt}) = 0,   ∀j ∈ J, t ∈ T.
Figure 6 shows the non-convex functions ϕ(q_jt) used for the three instances. The remaining problem data was chosen according to [5]. Our computational results are reported in Tables 3 and 4. We observe good performance of SC-MINLP: it is able to find the global optimum of all three instances within the time limit, whereas COUENNE does not solve any of the instances to global optimality. BONMIN 1 and BONMIN 50 also show good performance; in particular, a good solution is often found in a few seconds, and BONMIN 1 finds the global optimum in one case.
Table 3
Results for Hydro Unit Commitment and Scheduling problem.

hydro 3   324/142/445   1   -4,753.849   -4,634.409   -    59.33    -
...                     2   -4,719.927   -4,660.189   no   96.93    4
          336/148/451   3   -4,710.734   -4,710.734   yes  101.57   2
Table 4
Results for Hydro Unit Commitment and Scheduling problem.

hydro 3   124/62/165   337.77   -4,710.734   (-12,104.40)   -3,703.070   5.12   -4,131.095   13.76   -3,951.199
[Fig. 6: the three non-convex functions ϕ(q_jt) on [0, 40] used for the hydro instances.]
Table 5
Results for Nonlinear Continuous Knapsack problem.

...                        2   -518.057   -516.947   -   14.94   2
...                        3   -517.837   -516.947   -   23.75   2
...                        4   -517.054   -516.947   -   25.07   2
             372/86/515    5   -516.947   -516.947   -   31.73   2
nck 100 35   734/167/1035  1   -83.580    -79.060    -   3.72    -
Table 6
Results for Nonlinear Continuous Knapsack problem.

nck 100 35   200/0/101   110.25   -81.638    90.32        -81.638    0.04   -79.060    16.37   -79.060
nck 100 80   200/0/101   109.22   -172.632   (-450.779)   -172.632   0.04   -159.462   15.97   -171.024
Table 7
Results for GLOBALLib and MINLPLib.

ex2 1 3      36/4/41       1   -15.000      -15.000      -   0.00      -
ex2 1 4      15/1/16       1   -11.000      -11.000      -   0.00      -
ex2 1 5      50/7/72       1   -269.453     -268.015     -   0.01      -
             54/9/74       2   -268.015     -268.015     -   0.15      2
ex2 1 6      56/10/81      1   -44.400      -29.400      -   0.01      -
...                        3   95.063       100.000      -   5.77      19
...                        4   98.742       100.000      -   18.51     8
...                        5   99.684       100.000      -   37.14     11
             184/72/155    6   99.960       100.000      -   56.00     10
ex9 2 3      90/18/125     1   -30.000      0.000        -   0.00      -
...                        2   -30.000      0.000        -   2.29      12
...                        3   -30.000      0.000        -   6.30      12
...                        10  -1.366       -1.000       -   127.60    12
...                        11  -1.253       -1.000       -   1,106.78  22
...                        12  -1.116       -1.000       -   3,577.48  13
...                        13  -1.003       -1.000       -   587.11    15
             557/248/375   14  -1.003       -1.000       -   1,181.12  14
o7 2         366/42/268    1   79.365       124.324      -   7,200     -
stockcycle   626/432/290   1   119,948.675  119,948.676  -   244.67    -
Table 8
Results for GLOBALLib and MINLPLib.

ex5 3 3     563/0/550    (-16,872.900)  3.056    (-16,895.400)  3.056    20.30   -        22.92   -
du-opt      242/18/230   13.52          3.556    38.04          3.556    41.89   3.556    289.88  3.556
du-opt5     239/15/227   17.56          8.073    37.96          8.073    72.32   8.118    350.06  8.115
fo7         338/42/437   (8.759)        22.518   (1.95)         22.833   -       22.518   -       24.380
m6          254/30/327   185.13         82.256   54.13          82.256   154.42  82.256   211.49  82.256
no7 ar2 1   394/41/551   (90.583)       127.774  (73.78)        111.141  -       122.313  -       107.871

* This time and correct solution were obtained with non-default options of COUENNE (which failed with default settings).
REFERENCES
[9] R. Fletcher and S. Leyffer, Solving mixed integer nonlinear programs by outer
approximation, Mathematical Programming, 66 (1994), pp. 327–349.
[10] R. Fourer, D. Gay, and B. Kernighan, AMPL: A Modeling Language for
Mathematical Programming, Duxbury Press/Brooks/Cole Publishing Co., sec-
ond ed., 2003.
[11] GLOBALLib. www.gamsworld.org/global/globallib.htm.
[12] O. Günlük, J. Lee, and R. Weismantel, MINLP strengthening for separable con-
vex quadratic transportation-cost UFL, 2007. IBM Research Report RC24213.
[13] L. Liberti, Writing global optimization software, in Global Optimization: From
Theory to Implementation, L. Liberti and N. Maculan, eds., Springer, Berlin,
2006, pp. 211–262.
[14] MATLAB. www.mathworks.com/products/matlab/, R2007a.
[15] MINLPLib. www.gamsworld.org/minlp/minlplib.htm.
[16] I. Nowak, H. Alperin, and S. Vigerske, LaGO – an object oriented library for
solving MINLPs, in Global Optimization and Constraint Satisfaction, vol. 2861
of Lecture Notes in Computer Science, Springer, Berlin Heidelberg, 2003,
pp. 32–42.
[17] I. Quesada and I. Grossmann, An LP/NLP based branch and bound algorithm for convex MINLP optimization problems, Computers & Chemical Engineering, 16 (1992), pp. 937–947.
[18] N. Sahinidis, BARON: A general purpose global optimization software package,
J. Global Opt., 8 (1996), pp. 201–205.
[19] A. Wächter and L.T. Biegler, On the implementation of a primal-dual inte-
rior point filter line search algorithm for large-scale nonlinear programming,
Mathematical Programming, 106 (2006), pp. 25–57.
GLOBAL OPTIMIZATION OF MIXED-INTEGER
SIGNOMIAL PROGRAMMING PROBLEMS
ANDREAS LUNDELL∗ AND TAPIO WESTERLUND∗
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 349
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_12,
© Springer Science+Business Media, LLC 2012
350 ANDREAS LUNDELL AND TAPIO WESTERLUND
minimize    f(x),
subject to  Ax = a,   Bx ≤ b,
            g(x) ≤ 0,   (2.1)
            q(x) + σ(x) ≤ 0.

σ(x) = Σ_{j=1}^{J} c_j Π_{i=1}^{N} x_i^{p_ji},   (2.2)
where the coefficients cj and the powers pji are real-valued. A posynomial
term is a positive signomial term, so signomial functions are generalizations
of posynomial functions, since the terms are allowed to be both positive
and negative. Note that, if a certain variable xi does not exist in the j-th
term, then pji = 0.
The variables xi are allowed to be reals or integers. Since imaginary
solutions to the problems are not allowed, negative values on a variable
appearing in a signomial term with noninteger power must be excluded.
Also, zero has to be excluded in case a power is negative. Thus, all variables
occurring in the signomial terms are assumed to have a positive fixed lower
bound. For variables having a lower bound of zero, this bound may be approximated with a small positive lower bound ε > 0. Furthermore, translations of the form x̃_i = x_i + τ_i, where τ_i > |min x_i|, may be used for variables with a nonpositive lower bound. Note, however, that using translations may introduce additional variables and signomial terms into the problem.
Signomials are often highly nonlinear and nonconvex. Unfortunately,
convex envelopes are only known for some special cases, for instance, so-
called McCormick envelopes for bilinear terms [22]. Therefore, other tech-
niques for dealing with optimization problems containing signomial func-
tions are needed, including the αBB underestimator [1, 8] or the methods
GLOBAL OPTIMIZATION OF MISP PROBLEMS 351
used in BARON [27, 29]. These techniques are not confined to convexifying signomial functions only; rather, they can be applied to larger classes of nonconvex functions.
In the transformation method presented here, based on single-variable power and exponential transformations, the convexity of signomial functions is guaranteed termwise. This is, however, only a sufficient condition for convexity; for instance, the signomial function f(x_1, x_2) = x_1^2 + 2 x_1 x_2 + x_2^2 is convex although the middle term is nonconvex. In the next theorem, convexity conditions for signomial terms, first derived in [21], are given.
Theorem 2.1. The positive signomial term s(x) = c · x_1^{p_1} ⋯ x_N^{p_N}, where c > 0, is convex if one of the following two conditions is fulfilled: (i) all powers p_i are negative, or (ii) one power p_k is positive, the rest of the powers p_i, i ≠ k, are negative, and the sum of the powers is greater than or equal to one, i.e.,

Σ_{i=1}^{N} p_i ≥ 1.   (2.3)

The negative signomial term s(x) = c · x_1^{p_1} ⋯ x_N^{p_N}, where c < 0, is convex if all powers p_i are positive and the sum of the powers is between zero and one, i.e.,

0 ≤ Σ_{i=1}^{N} p_i ≤ 1.   (2.4)
For example, the terms

1/(x_1 x_2) = x_1^{−1} x_2^{−1},   x_1^2/x_2 = x_1^2 x_2^{−1}   and   −√x_1 = −x_1^{0.5}   (2.5)

are convex, while the following terms are nonconvex:

x_1/x_2 = x_1 x_2^{−1},   √x_1 = x_1^{0.5}   and   −x_1 x_2.   (2.6)
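Theorem 2.1 is easy to encode as a test. The sketch below applies it to the terms of (2.5) and (2.6); the function name is an illustrative choice, and the test covers only the sufficient conditions stated in the theorem.

```python
def term_is_convex(c, powers):
    """Sufficient convexity conditions of Theorem 2.1 for a signomial term
    c * x_1**p_1 * ... * x_N**p_N on the positive orthant."""
    pos = [p for p in powers if p > 0]
    neg = [p for p in powers if p < 0]
    if c > 0:
        if not pos:                                # (i) all powers negative
            return True
        return len(pos) == 1 and sum(powers) >= 1  # (ii) one positive, sum >= 1
    # negative term: all powers positive with sum between zero and one
    return not neg and 0 <= sum(powers) <= 1

checks = [
    term_is_convex(1, [-1, -1]),   # 1/(x1*x2)   convex
    term_is_convex(1, [2, -1]),    # x1**2/x2    convex
    term_is_convex(-1, [0.5]),     # -sqrt(x1)   convex
    term_is_convex(1, [1, -1]),    # x1/x2       nonconvex
    term_is_convex(1, [0.5]),      # sqrt(x1)    nonconvex
    term_is_convex(-1, [1, 1]),    # -x1*x2      nonconvex
]
```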
[Fig. 1: a piecewise linear approximation of f(x) with breakpoints x_1, x_2, x_3 and function values X_1, X_2, X_3; the midpoint of [x_1, x_2] has weights w_1 = 0.5, w_2 = 0.5, and a point of [x_2, x_3] has weights w_2 = 0.25, w_3 = 0.75.]
given, but other variants, including methods using binary variables, can be found in [9]. However, SOS formulations are often computationally more efficient in optimization problems than those using binary variables. A special ordered set of type 2 is defined as a set of integer and/or continuous variables in which at most two variables are nonzero, and if two variables are nonzero, they must be adjacent in the set. For example, the sets {1, 0, …, 0} and {0, a, b, 0, …, 0}, a, b ∈ R, are SOS type 2 sets.
In the PLF formulation from [3] presented here, one SOS of type 2, {w_k}_{k=1}^{K}, is used, with the additional conditions that all variables w_k assume real values between zero and one and that the variables in the set sum to one, i.e.,

∀k = 1, …, K:   0 ≤ w_k ≤ 1   and   Σ_{k=1}^{K} w_k = 1.   (2.7)

The variable x is then expressed as

x = Σ_{k=1}^{K} x_k w_k,   (2.8)

and the PLF approximating the function f in the interval [x̲, x̄] becomes

f̂(x) = Σ_{k=1}^{K} X_k w_k,   (2.9)
In step (i), the nonconvex signomial terms in the signomial function q_m(x) are convexified using single-variable transformations x_i = T_{ji}(X_{ji}), where X_{ji} is the new transformation variable. In this step, the problem is reformulated so that the generalized signomial constraints are convex, but the problem itself is still nonconvex, since the nonlinear equality constraints representing the relations between the transformation variables and the original variables, i.e., X_{ji} = T_{ji}^{-1}(x_i), must be included. In step (ii),
however, these relations are approximated with PLFs, in such a way that
the relaxed and convex feasible region of the transformed problem will
overestimate that of the original problem. Thus, the solution to the trans-
formed problem will be a lower bound of the original problem. The lower
bound can then be improved by iteratively adding more breakpoints to the
PLFs and, if the breakpoints are chosen in a certain way, the solution of
the approximated problems will form a converging sequence to the global
optimal solution of the original nonconvex problem.
The transformation technique using single-variable transformations has been studied previously in many papers, e.g., [4, 15, 25, 31].
3.1. Transformations for positive terms. A positive signomial term is convexified using single-variable power transformations (PTs), as well as underestimated by expressing the relation between the original and transformation variables with PLFs, according to

s(x) = c Π_i x_i^{p_i} = c Π_{i: p_i<0} x_i^{p_i} · Π_{i: p_i>0} X_i^{p_i Q_i} = s^C(x, X) ≥ s^C(x, X̂),   (3.2)
in Theorem 2.1. Note that, for a positive term, only the variables x_i having a positive power p_i must be transformed.

Definition 3.1. The NPT convex underestimator for a positive signomial term is obtained by applying the transformation

x_i = X_i^{Q_i},   Q_i < 0,   (3.3)

to all variables x_i with positive powers (p_i > 0), as long as the inverse transformation X_i = x_i^{1/Q_i} is approximated by a PLF X̂_i.
The transformation technique in [14] and [30] is similar to the NPT in the respect that it employs single-variable PTs with negative powers approximated by linear functions; it is, however, designed to be used in a branch-and-bound type framework.
Definition 3.2. The PPT convex underestimator for a positive signomial term is obtained by applying the transformation

x_i = X_i^{Q_i},   (3.4)

to all variables with positive powers, where the transformation powers Q_i < 0 for all indices i except one (i = k), for which Q_k ≥ 1. Furthermore, the condition

Σ_{i: p_i>0} p_i Q_i + Σ_{i: p_i<0} p_i ≥ 1   (3.5)

must be fulfilled, and the inverse transformation X_i = x_i^{1/Q_i} must be approximated by a PLF X̂_i.
There is also another transformation scheme available for convexifying positive signomial terms, namely the exponential transformation (ET). The single-variable ET has long been used for reformulating nonconvex geometric programming problems. This transformation is based on the fact that the function

f(x) = c · e^{p_1 x_1 + p_2 x_2 + … + p_i x_i} · x_{i+1}^{p_{i+1}} x_{i+2}^{p_{i+2}} ⋯ x_I^{p_I},   (3.6)

where c > 0, p_1, …, p_i > 0 and p_{i+1}, …, p_I < 0, is convex on R_+^n. Using the ET, a nonconvex positive signomial term is convexified and underestimated according to

s(x) = c Π_i x_i^{p_i} = c Π_{i: p_i<0} x_i^{p_i} · Π_{i: p_i>0} e^{p_i X_i} = s^C(x, X) ≥ s^C(x, X̂).   (3.7)
As in the NPT and PPT, only the variables xi having a positive power pi
must be transformed.
Definition 3.3. The ET convex underestimator for a positive signomial term is obtained by applying the transformation

x_i = e^{X_i}   (3.8)

to the individual variables with positive powers, as long as the inverse transformation X_i = ln x_i is approximated by a PLF X̂_i.
where the inverse transformations X_{1,P} = x_1^{1/Q_1} and X_{2,P} = x_2^{1/Q_2} have been replaced with the PLF approximations X̂_{1,P} and X̂_{2,P}, respectively. If the transformation used is the NPT, both Q_1 and Q_2 must be negative, and if the PPT is used, one of them must be positive, the other negative, and Q_1 + Q_2 ≥ 1. For example, Q_1 = Q_2 = −1, or Q_1 = 2 and Q_2 = −1, give convex underestimators for f(x_1, x_2).
Example 3. The convex underestimator for the function $f(x_1, x_2) = x_1/x_2 = x_1 x_2^{-1}$ obtained when applying the ET to the function is given by
$$\hat{f}_E(\hat{X}_{1,E}, x_2) = e^{\hat{X}_{1,E}} x_2^{-1},$$
where the inverse transformation $X_{1,E} = \ln x_1$ has been replaced with the PLF approximation $\hat{X}_{1,E}$, i.e., only one variable is transformed. When applying either of the PTs to the function, the convex underestimator becomes
$$\hat{f}_P(\hat{X}_{1,P}, x_2) = \hat{X}_{1,P}^{Q_1} x_2^{-1}, \qquad (3.12)$$
where the inverse transformation $X_{1,P} = x_1^{1/Q_1}$ has been replaced with the PLF approximation $\hat{X}_{1,P}$. If the transformation used is the NPT, $Q_1$ must be negative, and if the PPT has been used, $Q_1$ must be positive and $Q_1 - 1 \ge 1$, i.e., $Q_1 \ge 2$. For example, $Q_1 = 2$ can be used. Also in these cases, only one transformation is needed.
3.2. Transformations for negative terms. In a similar way as for positive terms, a negative signomial term is convexified and underestimated using single-variable PTs according to
$$s(x) = c \prod_i x_i^{p_i} = c \prod_{i: p_i < 0} X_i^{p_i Q_i} \cdot \prod_{i: p_i > 0} X_i^{p_i Q_i} = s^C(x, X) \ge s^C(x, \hat{X}). \qquad (3.13)$$
356 ANDREAS LUNDELL AND TAPIO WESTERLUND
Fig. 2: Overviews of how the bilinear terms $x_1 x_2$ and $-x_1 x_2$ are transformed using the PTs into $X_1^{Q_1} X_2^{Q_2}$. The colored regions indicate convex transformations.
However, as the convexity conditions for negative terms are different ac-
cording to Theorem 2.1, the requirements on the powers Qi in the trans-
formations are also different. For example, variables with negative powers
also require transformations.
Definition 3.4. A convex underestimator for a negative signomial term is obtained by applying the transformation
$$x_i = X_i^{Q_i}, \qquad (3.14)$$
where $0 < Q_i \le 1$ for all variables with positive powers and $Q_i < 0$ for all variables with negative powers, to the individual variables in the term. Furthermore, the condition
$$0 < \sum_i p_i Q_i \le 1 \qquad (3.15)$$
must be fulfilled and the inverse transformation $X_i = x_i^{1/Q_i}$ approximated by a PLF $\hat{X}_i$.
3.2.1. Example of the transformation technique. In the follow-
ing example, a convex underestimator for a negative signomial term is
obtained using the previous definition.
Example 4. The convex underestimator for the function $f(x_1, x_2) = -x_1 x_2^{-1}$, obtained by applying the PTs for negative terms, is given by
$$\hat{f}_P(\hat{X}_{1,P}, \hat{X}_{2,P}) = -\hat{X}_{1,P}^{1 \cdot Q_1} \hat{X}_{2,P}^{-1 \cdot Q_2}, \qquad (3.16)$$
where the inverse transformations $X_{1,P} = x_1^{1/Q_1}$ and $X_{2,P} = x_2^{1/Q_2}$ have been replaced with the PLF approximations $\hat{X}_{1,P}$ and $\hat{X}_{2,P}$ respectively.
In this case, Q1 must be positive and Q2 negative for the powers in the
convexified term 1 · Q1 and −1 · Q2 to both be positive. Furthermore, the
sum of the powers in the transformed term needs to be less than or equal to
one, so for example Q1 = 1/2 and Q2 = −1/2 give a valid transformation.
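The validity of the transformation in Example 4 can be checked numerically; the sketch below uses $Q_1 = 1/2$ and $Q_2 = -1/2$ on the box $[1,7] \times [1,5]$ (the box and helper names are illustrative, not from the chapter).

```python
import numpy as np

# PTs for the negative term f(x) = -x1 * x2^(-1) with Q1 = 1/2, Q2 = -1/2:
# the inverse transformations X1 = x1^2 and X2 = x2^(-2) are both convex, so
# their PLFs overestimate them; since -X1^(1/2) * X2^(1/2) is decreasing in
# each variable, replacing X_i by a PLF value >= X_i underestimates f.
b1 = np.array([1.0, 7.0])
b2 = np.array([1.0, 5.0])

def neg_underestimator(x1, x2):
    X1_hat = np.interp(x1, b1, b1 ** 2.0)     # >= x1^2
    X2_hat = np.interp(x2, b2, b2 ** -2.0)    # >= x2^(-2)
    return -(X1_hat ** 0.5) * (X2_hat ** 0.5)

pts = [(u, v) for u in np.linspace(1, 7, 25) for v in np.linspace(1, 5, 25)]
ok_neg = all(neg_underestimator(u, v) <= -u / v + 1e-9 for u, v in pts)
```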
3.3. Relationships between the transformations. Applying the single-variable transformations together with the PLF approximations results in convex underestimators of the original nonconvex term.
Depending on what transformation is used, and in the case of the PTs,
what the value of the transformation power Q is, different underestimation
errors occur. In this section, some results regarding the relationships, con-
nected to the underestimation properties, between the transformations are
summarized. More detail on this subject, as well as proofs of the results,
can be found in [15] and [18].
The first result concerns the underestimation error resulting from transforming an individual nonconvex power function $x^p$.
Theorem 3.1. Assume that a single-variable ET and single-variable PTs with positive and negative powers, i.e., the transformations $x = e^{X_E}$, $x = X_P^{Q_P}$ with $Q_P > 0$, and $x = X_N^{Q_N}$ with $Q_N < 0$, are applied to the power function $x^p$, and that the inverse transformations have been replaced by the PLFs $\hat{X}_E$, $\hat{X}_P$ and $\hat{X}_N$ respectively. Then the resulting convex underestimators fulfill $\hat{f}_N(x) \le \hat{f}_E(x) \le \hat{f}_P(x)$.
Although this theorem states that any PT with positive transformation power always gives a tighter convex underestimator than the ET, and the ET a tighter convex underestimator than any PT with negative transformation power, the limit of the single-variable PTs as the transformation power $Q$ tends to plus or minus infinity is in fact the single-variable ET:
Theorem 3.2. For the piecewise linear approximations $\hat{X}_P$, $\hat{X}_N$ and $\hat{X}_E$ of the single-variable PTs with positive and negative powers and the ET respectively, the following statement is true:
$$\lim_{Q \to +\infty} \hat{X}_P^{\,pQ} = \lim_{Q \to -\infty} \hat{X}_N^{\,pQ} = e^{p \hat{X}_E}.$$
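This limit behavior can be observed numerically for a power function $x^p$ with two breakpoints: for large $|Q|$ the PT underestimator is close to the ET one, and the ordering of the underestimators is visible at the interval midpoint. The interval, $p$, and function names below are illustrative, not from the chapter.

```python
import numpy as np

a, b, p = 1.0, 4.0, 2.0
xs = np.linspace(a, b, 50)

def pt_under(x, Q):
    # PT underestimator of x**p: PLF of x**(1/Q) at breakpoints {a, b},
    # raised to the power p*Q.
    X_hat = np.interp(x, [a, b], [a ** (1 / Q), b ** (1 / Q)])
    return X_hat ** (p * Q)

def et_under(x):
    # ET underestimator of x**p: exp(p * PLF of ln x).
    return np.exp(p * np.interp(x, [a, b], [np.log(a), np.log(b)]))

# For |Q| large, the PT underestimators converge to the ET underestimator.
gap_pos = max(abs(pt_under(x, 1000.0) - et_under(x)) for x in xs)
gap_neg = max(abs(pt_under(x, -1000.0) - et_under(x)) for x in xs)
```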
Using the results from the previous theorems, the following theorem
regarding the underestimation properties of a general positive signomial
term can be obtained:
Theorem 3.3. For a general nonconvex signomial term transformed
using the ET, NPT or PPT, the following statements are true:
(i) The ET always gives a tighter underestimator than the NPT.
(ii) The PPT gives a tighter underestimator than the NPT as long as the
transformation powers Qi,N and Qi,P in the NPT and PPT respectively,
fulfill the condition
where $\{\breve{x}_{i,k}\}_{k=1}^{K_i}$ is the set of breakpoints for the variable $x_i$ and $x_i^*$ is the optimal value for the variable $x_i$ in the current iteration. This criterion
is based on the fact that a solution of the transformed problem is exactly
equal to the solution given by the original problem at the breakpoints of
the PLFs.
4.4. Updating the PLFs. If neither of the termination criteria is met, additional breakpoints must be added to the PLFs. Two things must then be considered: which transformation variable approximations to improve, and which points to add to the corresponding PLFs. This subject is explained in detail in [31]; here, only a brief summary is given.
The simplest strategy for selecting the variables is to add breakpoints to the PLFs of all transformed variables. However, for large problems with many different transformations, this may make the transformed problem unnecessarily complex. Instead, breakpoints could be added to as few PLFs
as possible. For example, it is not necessary to update the PLFs cor-
responding to transformation variables in nonconvex constraints already
fulfilled. A further restriction is to only add breakpoints to the variables
in the constraints violated the most in each iteration.
The breakpoints to be added must also be determined. Several strate-
gies exist, for example the solution point in the previous iteration can be
added. However, this strategy may, unfortunately, lead to subproblems
[Figure: strategies for adding a breakpoint to the PLF of an inverse transformation $X = x^{1/Q}$ on an interval $[\underline{x}, \overline{x}]$ — (a) the original PLF; (b) solution point $x^*$ added; (c) midpoint of the interval added; (d) most deviating point $\tilde{x}$ added.]

The most deviating point $\tilde{x}$ on the breakpoint interval $[\underline{x}, \overline{x}]$ is given by
$$\tilde{x} = \left( Q \cdot \frac{\overline{x}^{1/Q} - \underline{x}^{1/Q}}{\overline{x} - \underline{x}} \right)^{Q/(1-Q)}. \qquad (4.4)$$
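For the PT, the point where the PLF deviates most from the inverse transformation $X = x^{1/Q}$ can be obtained from the tangency condition: the derivative of $x^{1/Q}$ equals the PLF slope there. The closed form used below is derived from that condition rather than quoted from the chapter, and is checked against a brute-force grid search; the interval and names are illustrative.

```python
import numpy as np

def most_deviating_point(lo, hi, Q):
    # Interior point where the PLF of X = x^(1/Q) deviates most from X,
    # from the tangency condition (1/Q) * x^(1/Q - 1) = PLF slope.
    slope = (hi ** (1 / Q) - lo ** (1 / Q)) / (hi - lo)
    return (Q * slope) ** (Q / (1 - Q))

lo, hi, Q = 1.0, 4.0, -1.0
xt = most_deviating_point(lo, hi, Q)

# Verify against a dense grid search over the deviation |PLF(x) - x^(1/Q)|.
grid = np.linspace(lo, hi, 100001)
plf = np.interp(grid, [lo, hi], [lo ** (1 / Q), hi ** (1 / Q)])
dev = np.abs(plf - grid ** (1 / Q))
x_best = grid[dev.argmax()]
```

For $Q = -1$ on $[1, 4]$ the tangency point is $x = 2$, which the grid search confirms.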
minimize $-x_1 - 3x_2$,
subject to $-0.5 x_1 + x_2 \le 3$,
$0.5 x_1 - x_2 \le 1$,
$\sigma(x_1, x_2) = \sum_{j=1}^{8} s_j(x_1, x_2) + 1.5 \le 0$,     (5.1)
$1 \le x_1 \le 7$, $1 \le x_2 \le 5$,
$x_1 \in \mathbb{Z}_+$, $x_2 \in \mathbb{R}_+$.
The signomial terms sj (x1 , x2 ) in the signomial constraint, and the trans-
formations convexifying them, are listed in Table 1. The signomial function
σ(x1 , x2 ) is a modification of the Taylor series expansion of the function
sin(x1 +x2 ) at the point (x1 , x2 ) = (1, 1). The feasible region of the original
problem is shown in Figure 5.
As can be seen from Table 1, a total of nine transformations is needed to convexify the nonconvex signomial terms. However, some of these transformations are identical in different terms, so the number of distinct transformations is five: three for the variable $x_1$ and two for $x_2$. Furthermore, only two special ordered sets are needed in the PLF
Table 1
The signomial terms in the signomial constraint σ(x1 , x2 ) in the example
in Section 5.
formulation, one for each variable. Thus, the reformulated problem will become
minimize $-x_1 - 3x_2$,
subject to $-0.5 x_1 + x_2 \le 3$,
$0.5 x_1 - x_2 \le 1$,
$\sum_{j=1}^{8} \hat{s}_j(x_1, x_2, \hat{X}_{1,\circ}, \hat{X}_{2,\circ}) + 1.5 \le 0$,     (5.2)
$1 \le x_1 \le 7$, $1 \le x_2 \le 5$,
$x_1 \in \mathbb{Z}_+$, $x_2 \in \mathbb{R}_+$,
where the transformation variables X1,◦ and X2,◦ have been replaced with
the PLFs X̂1,◦ and X̂2,◦ , which are formulated using the expressions from
Section 2.1. The interval endpoints for the variables x1 and x2 are used as
initial breakpoints for the PLFs. The overestimated feasible region of the
problem in the first iteration is illustrated in Figure 7a.
This problem is then solved using a MINLP solver able to solve convex MINLP problems to global optimality; in this case GAMS/αECP [32] has been used. The optimal solution in the first iteration is $(x_1, x_2) = (6, 5.00)$, which gives an objective function value of $-21.00$. The original signomial constraint $\sigma(x_1, x_2)$ has the value 90.48 at this point, so the solution is not yet optimal since the value is positive. Therefore, more SGO iterations are needed and additional breakpoints must be added in the next iteration. In this example, two different strategies are used: the first is to add the solution point of the transformed variables, and the second is to add the midpoint of the current breakpoint interval to which the solution belongs.
Table 2
The solution of the convex MINLP problem in each SGO iteration in the
example in Section 5 when using the midpoint strategy.
Table 3
The results of solving some trim-loss problems available in the MINLP Library using an implementation of the SGO algorithm. The variable counts indicated are those in the original nonconvex problem. All transformations are power transformations of the type $x_i = X_i^{0.5}$.
Problem #bin. vars #int. vars #transf. SGO sol. Glob. sol.
ex1263a 4 20 20 9.3 9.3
ex1264a 4 20 20 8.6 8.6
ex1265a 5 30 30 10.3 10.3
ex1266a 6 42 42 16.3 16.3
As can be seen from Table 3, the globally optimal solution was found in all cases (and for all choices of breakpoint and variable selection strategies).
7. Conclusions. In this chapter, the SGO algorithm, a global optimization algorithm for solving nonconvex MISP problems to global optimality as a sequence of convex, overestimated subproblems, was presented. It was shown how the nonconvex problem is reformulated using single-variable transformations applied to the nonconvex signomial terms, after which the relationship describing the inverse transformation between the transformation variables and the original variables is approximated using PLFs. This results in a relaxed convex problem whose feasible region is an overestimation of that of the original problem. The solution of this transformed problem provides a lower bound on the solution of the original problem, and by iteratively improving the PLFs through additional breakpoints, the globally optimal solution can be found. It was also illustrated, through an example, how the strategy for selecting the breakpoints impacts the number of iterations required to solve the problem.
Fig. 7: The feasible region of the convex overestimated problem when using
the midpoint strategy. The dark gray region is the integer-relaxed feasible
region of the nonconvex problem and the lighter parts correspond to the
piecewise convex overestimation of the signomial constraint σ(x1 , x2 ). The
dark points correspond to the optimal solution of the current iteration.
The dashed lines indicate the location of the breakpoints in the PLFs.
Fig. 8: The feasible region of the convex overestimated problem when using
the solution point strategy. The dark gray region is the integer-relaxed
feasible region of the nonconvex problem and the lighter parts correspond to
the piecewise convex overestimation of the signomial constraint σ(x1 , x2 ).
The dark points correspond to the optimal solution of the current iteration.
The dashed lines indicate the location of the breakpoints in the PLFs.
REFERENCES
[1] C.S. Adjiman, S. Dallwig, C.A. Floudas, and A. Neumaier, A global opti-
mization method, αBB, for general twice-differentiable constrained NLPs –
I. Theoretical advances, Computers and Chemical Engineering, 22 (1998),
pp. 1137–1158.
[2] M. Avriel and D.J. Wilde, Optimal condenser design by geometric program-
ming, Industrial & Engineering Chemistry Process Design and Development,
6 (1967), pp. 256–263.
[3] E.M.L. Beale and J.J.H. Forrest, Global optimization using special ordered
sets, Mathematical Programming, 10 (1976), pp. 52–69.
[4] K.-M. Björk, A Global Optimization Method with Some Heat Exchanger Network
Applications, PhD thesis, Åbo Akademi University, 2002.
[5] K.-M. Björk, I. Grossmann, and T. Westerlund, Solving heat exchanger net-
work synthesis problems with non-constant heat capacity flowrates and heat
transfer coefficients, AIDIC Conference Series, 5 (2002), pp. 41–48.
[6] G.E. Blau and D.J. Wilde, A lagrangian algorithm for equality constrained gen-
eralized polynomial optimization, AIChE Journal, 17 (1971), pp. 235–240.
[7] R.J. Duffin and E.L. Peterson, Duality theory for geometric programming,
SIAM Journal on Applied Mathematics, 14 (1966), pp. 1307–1349.
[8] C.A. Floudas, Deterministic Global Optimization. Theory, Methods and Appli-
cations, no. 37 in Nonconvex Optimization and Its Applications, Kluwer Aca-
demic Publishers, 1999.
[9] C.A. Floudas and P.M. Pardalos, eds., Encyclopedia of Optimization, Kluwer
Academic Publishers, 2001.
[10] I. Harjunkoski, T. Westerlund, R. Pörn, and H. Skrifvars, Different trans-
formations for solving non-convex trim-loss problems by MINLP, European
Journal of Operational Research, 105 (1998), pp. 594–603.
[11] Y.H.A. Ho, H.-K. Kwan, N. Wong, and K.-L. Ho, Designing globally optimal
delta-sigma modulator topologies via signomial programming, International
Journal of Circuit Theory and Applications, 37 (2009), pp. 453–472.
[12] R. Jabr, Inductor design using signomial programming, The International Journal
for Computation and Mathematics in Electrical and Electronic Engineering,
26 (2007), pp. 461–475.
[13] T.R. Jefferson and C.H. Scott, Generalized geometric programming applied to
problems of optimal control: I. Theory, Journal of Optimization Theory and
Applications, 26 (1978), pp. 117–129.
[14] H.-C. Lu, H.-L. Li, C.E. Gounaris, and C.A. Floudas, Convex relaxation for
solving posynomial programs, Journal of Global Optimization, 46 (2010),
pp. 147–154.
[15] A. Lundell, Transformation Techniques for Signomial Functions in Global Opti-
mization, PhD thesis, Åbo Akademi University, 2009.
[16] A. Lundell, J. Westerlund, and T. Westerlund, Some transformation tech-
niques with applications in global optimization, Journal of Global Optimiza-
tion, 43 (2009), pp. 391–405.
[17] A. Lundell and T. Westerlund, Exponential and power transformations for
convexifying signomial terms in MINLP problems, in Proceedings of the 27th
IASTED International Conference on Modelling, Identification and Control,
L. Bruzzone, ed., ACTA Press, 2008, pp. 154–159.
[18] A. Lundell and T. Westerlund, Convex underestimation strategies for signomial
functions, Optimization Methods and Software, 24 (2009), pp. 505–522.
[19] A. Lundell and T. Westerlund, Implementation of a convexification technique for
signomial functions, in 19th European Symposium on Computer Aided Process
Engineering, J. Jezowski and J. Thullie, eds., Vol. 26 of Computer Aided
Chemical Engineering, Elsevier, 2009, pp. 579–583.
PART VI:
Mixed-Integer Quadratically
Constrained Optimization
THE MILP ROAD TO MIQCP
SAMUEL BURER∗ AND ANUREET SAXENA†
1. Introduction. More than fifty years have passed since Dantzig et al. [25] solved the 50-city travelling salesman problem. An achievement in itself at the time, their seminal paper gave birth to one of the most successful disciplines in computational optimization, Mixed Integer Linear Programming (MILP). Five decades of wonderful research, both theoretical and computational, have brought mixed integer programming to a stage where it can solve many if not all MILPs arising in practice (see [43]). The ideas discovered during the course of this development have naturally influenced other disciplines. Constraint programming, for instance, has adopted and refined many of the ideas from MILP to solve more general classes of problems [2].
Our focus in this paper is to track the influence of MILP in solving mixed integer quadratically constrained problems (MIQCP). In particular, we survey some of the recent research on MIQCP and establish its connections to well-known ideas in MILP. The purpose of this is two-fold. First, it helps to catalog some of the recent results in a form that is accessible to a researcher with a reasonable background in MILP. Second, it defines a roadmap for further research in MIQCP; although significant progress has been made in the field of MIQCP, the "breakthrough" results are yet to come, and we believe that the past of MILP holds the clues to the future of MIQCP.
Specifically, we focus on the following mixed integer quadratically constrained problem
$$\min \ x^T C x + c^T x \quad \text{s.t.} \quad x \in F \qquad \text{(MIQCP)}$$
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 373
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_13,
© Springer Science+Business Media, LLC 2012
374 SAMUEL BURER AND ANUREET SAXENA
where
$$F := \left\{ x \in \mathbb{R}^n : \begin{array}{l} x^T A_k x + a_k^T x \le b_k \ \forall\, k = 1, \ldots, m \\ l \le x \le u \\ x_i \in \mathbb{Z} \ \forall\, i \in I \end{array} \right\}.$$
OLD WINE IN A NEW BOTTLE: THE MILP ROAD TO MIQCP 375
In this sense, they generalize the basic LP relaxation often used in MILP.
We also catalog several known and new results establishing the strength
of these inequalities for certain specifications of F . Then, in Section 3,
we describe several related approaches that shed further light on convex
relaxations of (MIQCP).
In Section 4, we discuss methods for dynamically generating valid in-
equalities, which can further improve the relaxations. One of the funda-
mental tools is that of disjunctive programming, which has been used in
the MILP community for five decades. However, the disjunctions employed
herein are new in the sense that they truly exploit the quadratic form
of (MIQCP). Recently, Belotti [11] studies disjunctive cuts for general
MINLP.
Finally, in Section 5, we consider a short computational study to give
some sense of the computational effort and effect of the methods surveyed
in this paper.
1.1. Notation and terminology. Most of the notation used in this paper is standard. We define here just a few perhaps atypical notations. For symmetric matrices $A$ and $B$ of conformable dimensions, we define $\langle A, B \rangle = \mathrm{tr}(AB)$; a standard fact is that the quadratic form $x^T A x$ can be represented as $\langle A, xx^T \rangle$. For a set $P$ in the space of variables $(x, y)$, $\mathrm{proj}_x(P)$ denotes the coordinate projection of $P$ onto the space $x$. $\mathrm{clconv}\, P$ is the closed convex hull of $P$. For a square matrix $A$, $\mathrm{diag}(A)$ denotes the vector of diagonal entries of $A$. The vector $e$ is the all-ones vector, and $e_i$ is the vector having all zeros except a 1 in position $i$. The notation $X \succeq 0$ means that $X$ is symmetric positive semidefinite; $X \preceq 0$ means symmetric negative semidefinite.
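The stated fact $x^T A x = \langle A, xx^T \rangle = \mathrm{tr}(A\, xx^T)$ is what makes the lifting $X = xx^T$ turn quadratics into linear functions of $(x, X)$. A one-line numerical confirmation (the code and names are illustrative):

```python
import numpy as np

# Check the standard fact: x^T A x = <A, x x^T> = tr(A @ (x x^T)).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                      # symmetric
x = rng.standard_normal(4)
quad_form = x @ A @ x
lifted_form = np.trace(A @ np.outer(x, x))
```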
2. Convex relaxations and valid inequalities. In this section, we describe strong convex relaxations of (MIQCP) and $F$, which arise from the algebraic description of $F$. For the purposes of presentation, we partition the indices $[m]$ of the quadratic constraints into three groups:
"linear" $LQ := \{k : A_k = 0\}$
"convex quadratic" $CQ := \{k : A_k \ne 0,\ A_k \succeq 0\}$
"nonconvex quadratic" $NQ := \{k : A_k \ne 0,\ A_k \not\succeq 0\}$.
So $F \subseteq \mathbb{R}^n$ may be rewritten as
$$F = \left\{ x : \begin{array}{ll} a_k^T x \le b_k & \forall\, k \in LQ \\[2pt] \left\| \begin{pmatrix} B_k^T x \\ \frac{1}{2}(a_k^T x - b_k + 1) \end{pmatrix} \right\| \le \frac{1}{2}(1 - a_k^T x + b_k) & \forall\, k \in CQ \\[2pt] x^T A_k x + a_k^T x \le b_k & \forall\, k \in NQ \\ l \le x \le u & \\ x_i \in \mathbb{Z} & \forall\, i \in I \end{array} \right\},$$
where, for each $k \in CQ$, $B_k$ is any matrix satisfying $A_k = B_k B_k^T$.
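The equivalence between a convex quadratic constraint and its second-order cone form can be verified directly: squaring both sides of the SOC inequality and cancelling terms leaves exactly $x^T A_k x \le b_k - a_k^T x$. A sketch with random data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 2))
A = B @ B.T                      # psd by construction, A = B B^T
a = rng.standard_normal(3)
b = 5.0

def quad_ok(x):
    # Original convex quadratic constraint x^T A x + a^T x <= b.
    return x @ A @ x + a @ x <= b + 1e-9

def soc_ok(x):
    # SOC form: ||(B^T x, (a^T x - b + 1)/2)|| <= (1 - a^T x + b)/2.
    s = a @ x - b
    lhs = np.sqrt(np.sum((B.T @ x) ** 2) + ((s + 1) / 2) ** 2)
    return lhs <= (1 - s) / 2 + 1e-9

pts = [rng.standard_normal(3) for _ in range(200)]
agree = all(quad_ok(x) == soc_ok(x) for x in pts)
```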
Under the substitution $X = xx^T$, the objective becomes linear:
$$x^T C x + c^T x = \langle C, X \rangle + c^T x.$$
So (MIQCP) becomes
$$\min \left\{ \langle C, X \rangle + c^T x : (x, X) \in \hat{F} \right\},$$
where
$$\hat{F} := \left\{ (x, X) \in \mathbb{R}^n \times \mathcal{S}^n : \begin{array}{l} \langle A_k, X \rangle + a_k^T x \le b_k \ \forall\, k = 1, \ldots, m \\ l \le x \le u \\ x_i \in \mathbb{Z} \ \forall\, i \in I \\ X = xx^T \end{array} \right\}.$$
$$0 \le (\alpha_0 - \alpha^T x)(\beta_0 - \beta^T x) = \alpha_0 \beta_0 - \beta_0 \alpha^T x - \alpha_0 \beta^T x + x^T \alpha \beta^T x$$
The linearizations of (2.2) were considered in [37] and are sometimes referred to as rank-2 linear inequalities [33]. We denote by R2 the collection of all $(x, X)$ which satisfy these linearizations.
In particular, the linearized versions of (2.2a) are called the RLT in-
equalities after the “reformulation-linearization technique” of [55], though
they first appeared in [3, 4, 40]. These inequalities have been studied exten-
sively because of their wide applicability and simple structure. Specifically,
which is linearized as
$$\left\| \begin{pmatrix} x_1 - X_{11} - X_{12} \\ x_2 - X_{21} - X_{22} \end{pmatrix} \right\| \le \frac{2}{3}(1 - x_1 - x_2).$$
is valid for $\mathrm{clconv}\, \hat{F}$. We define
$$\mathrm{PSD} := \left\{ (x, X) : Y := \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq 0 \right\}. \qquad (2.6)$$
Instead of enforcing $(x, X) \in \mathrm{PSD}$, i.e., the full PSD condition (2.6), one can enforce relaxations of it. For example, since all principal submatrices of $Y \succeq 0$ are positive semidefinite, one could enforce just that all or some of the $2 \times 2$ principal submatrices of $Y$ are positive semidefinite. This has been done in [32], for example.
2.5. The strength of valid inequalities. From the preceding subsections, we have the following result by construction:
Proposition 2.3. $\mathrm{clconv}\, \hat{F} \subseteq \hat{L} \cap \mathrm{RLT} \cap \mathrm{R2} \cap \mathrm{S2} \cap \mathrm{PSD}$.
Even though $\mathrm{R2} \subseteq \mathrm{RLT}$, we retain RLT in the above expression for emphasis. We next catalog and discuss various special cases in which equality is known to hold in Proposition 2.3.
2.5.1. Simple bounds. We first consider the case when $F$ is defined by simple, finite bounds, i.e., $F = \{x \in \mathbb{R}^n : l \le x \le u\}$ with $(l, u)$ finite in all components. In this case, $\mathrm{R2} = \mathrm{RLT} \subseteq \hat{L}$ and S2 is vacuous. So Proposition 2.3 can be stated more simply as $\mathrm{clconv}\, \hat{F} \subseteq \mathrm{RLT} \cap \mathrm{PSD}$. Equality holds if and only if $n \le 2$:
Theorem 2.1 (Anstreicher and Burer [6]). Let $F = \{x \in \mathbb{R}^n : l \le x \le u\}$ with $(l, u)$ finite in all components. Then $\mathrm{clconv}\, \hat{F} \subseteq \mathrm{RLT} \cap \mathrm{PSD}$ with equality if and only if $n \le 2$.
For $n > 2$, [6] and [19] derive additional valid inequalities but are still unable to determine an exact representation by valid inequalities even for $n = 3$. ([6] does give an exact disjunctive representation for $n = 3$.)
We also mention a classical result, which is in some sense subsumed by Theorem 2.1. Even still, this result indicates the strength of the RLT inequalities and can be useful when one-variable quadratics $X_{ii} = x_i^2$ are not of interest. The result does not fully classify $\mathrm{clconv}\, \hat{F}$ but rather coordinate projections of it.
Theorem 2.2 (Al-Khayyal and Falk [4]). Let $F = \{x \in \mathbb{R}^n : l \le x \le u\}$ with $(l, u)$ finite in all components. Then, for all $1 \le i < j \le n$,
$\mathrm{proj}_{(x_i, x_j, X_{ij})}(\mathrm{clconv}\, \hat{F}) = \mathrm{RLT}_{ij}$, where $\mathrm{RLT}_{ij} := \{(x_i, x_j, X_{ij}) \in \mathbb{R}^3 : \text{(2.3) holds}\}$.
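The four RLT (McCormick) inequalities behind Theorem 2.2 come from multiplying pairs of nonnegative bound factors $(x_i - l_i)$, $(u_i - x_i)$, $(x_j - l_j)$, $(u_j - x_j)$ and substituting $X_{ij}$ for $x_i x_j$. A small validity check (the bounds and names are illustrative):

```python
import numpy as np

def mccormick_valid(xi, xj, Xij, li, ui, lj, uj, tol=1e-9):
    # (xi-li)(xj-lj) >= 0 and (ui-xi)(uj-xj) >= 0  -> lower bounds on Xij;
    # (ui-xi)(xj-lj) >= 0 and (xi-li)(uj-xj) >= 0  -> upper bounds on Xij.
    return (Xij >= li * xj + lj * xi - li * lj - tol and
            Xij >= ui * xj + uj * xi - ui * uj - tol and
            Xij <= ui * xj + lj * xi - ui * lj + tol and
            Xij <= li * xj + uj * xi - li * uj + tol)

li, ui, lj, uj = -1.0, 2.0, 0.5, 3.0
# Valid at every point of the lifted set {(xi, xj, xi*xj)} over the box:
ok_rlt = all(
    mccormick_valid(xi, xj, xi * xj, li, ui, lj, uj)
    for xi in np.linspace(li, ui, 20)
    for xj in np.linspace(lj, uj, 20)
)
```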
2.5.2. Binary integer grid. We next consider the case when $F$ is a binary integer grid: that is, $F = \{x \in \mathbb{Z}^n : l \le x \le u\}$ with $u = l + e$ and $l$ finite in all components. Note that this is simply a shift of the standard 0-1 binary grid and that $\mathrm{clconv}\, \hat{F}$ is a polytope. In this case, $\mathrm{R2} = \mathrm{RLT} \subseteq \hat{L}$ and S2 is vacuous. So Proposition 2.3 states that $\mathrm{clconv}\, \hat{F} \subseteq \mathrm{RLT} \cap \mathrm{PSD}$. Also, some additional, simple linear equations are valid for $\mathrm{clconv}\, \hat{F}$.
Proposition 2.4. Suppose that $i \in I$ has $u_i = l_i + 1$ with $l_i$ finite. Then the equation $X_{ii} = (1 + 2l_i)x_i - l_i - l_i^2$ is valid for $\mathrm{clconv}\, \hat{F}$.
Proof. The shift $x_i - l_i$ is either 0 or 1. Hence, $(x_i - l_i)^2 = x_i - l_i$. After linearization with $X_{ii} = x_i^2$, this quadratic equation becomes the claimed linear one.
When $I = [n]$, the individual equations $X_{ii} = (1 + 2l_i)x_i - l_i - l_i^2$ can be collected as $\mathrm{diag}(X) = (e + 2l) \circ x - l - l^2$, with $l^2$ taken componentwise. We remark that the next result does not make use of PSD.
Theorem 2.3 (Padberg [44]). Let $F = \{x \in \mathbb{Z}^n : l \le x \le u\}$ with $u = l + e$ and $l$ finite in all components. Then
$$\mathrm{clconv}\, \hat{F} \subseteq \mathrm{RLT} \cap \left\{ (x, X) : \mathrm{diag}(X) = (e + 2l) \circ x - l - l^2 \right\}$$
we can write
$$Y = \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} = \sum_j \begin{pmatrix} \zeta_j \\ z_j \end{pmatrix} \begin{pmatrix} \zeta_j \\ z_j \end{pmatrix}^T = \sum_{j: \zeta_j > 0} \zeta_j^2 \begin{pmatrix} 1 \\ \zeta_j^{-1} z_j \end{pmatrix} \begin{pmatrix} 1 \\ \zeta_j^{-1} z_j \end{pmatrix}^T + \sum_{j: \zeta_j = 0} \begin{pmatrix} 0 \\ z_j \end{pmatrix} \begin{pmatrix} 0 \\ z_j \end{pmatrix}^T$$
$$= \sum_{j: \zeta_j > 0} \zeta_j^2 \begin{pmatrix} 1 & \zeta_j^{-1} z_j^T \\ \zeta_j^{-1} z_j & \zeta_j^{-2} z_j z_j^T \end{pmatrix} + \sum_{j: \zeta_j = 0} \begin{pmatrix} 0 & 0^T \\ 0 & z_j z_j^T \end{pmatrix}.$$
This shows that $\sum_{j: \zeta_j > 0} \zeta_j^2 = 1$ and, from (2.8), that $(x, X)$ is expressed as the convex combination of points in $\hat{F}$ plus the sum of points in $\mathrm{rcone}(\hat{L} \cap \mathrm{RLT} \cap \mathrm{PSD})$, as desired.
Using the lemma, we can prove equality in Proposition 2.3 for $n \le 3$.
Theorem 2.4. Let $F = \{x \in \mathbb{R}^n : x \ge 0\}$. Then $\mathrm{clconv}\, \hat{F} \subseteq \hat{L} \cap \mathrm{RLT} \cap \mathrm{PSD}$ with equality if $n \le 3$.
Proof. The first statement of the theorem is just Proposition 2.3. Next, for contradiction, suppose there exists $(\bar{x}, \bar{X}) \in \hat{L} \cap \mathrm{RLT} \cap \mathrm{PSD} \setminus \mathrm{clconv}\, \hat{F}$. By the separating hyperplane theorem, there exists $(c, C)$ such that
$$\langle C, \bar{X} \rangle + c^T \bar{x} < \langle C, Z \rangle + c^T z \quad \forall\, (z, Z) \in \mathrm{clconv}\, \hat{F}.$$
Since $(\bar{x}, \bar{X}) \in \hat{L} \cap \mathrm{RLT} \cap \mathrm{PSD}$, by the lemma there exists $(z, Z) \in \mathrm{conv}\, \hat{F}$ and $(0, D) \in \mathrm{rcone}(\hat{L} \cap \mathrm{RLT} \cap \mathrm{PSD})$ such that $(\bar{x}, \bar{X}) = (z, Z + D)$. Thus, $\langle C, D \rangle < 0$.
Since $D \ge 0$, $D \succeq 0$, and $n \le 3$, $D$ is completely positive, i.e., there exists rectangular $N \ge 0$ such that $D = N N^T$. We have $\langle C, N N^T \rangle < 0$, which implies $d^T C d < 0$ for some nonzero column $d \ge 0$ of $N$. It follows that $d$ is a negative direction of recession for the function $x^T C x + c^T x$. In other words, the objective is unbounded below over $F$, a contradiction.
A related result occurs for a bounded slice of the nonnegative orthant,
e.g., the standard simplex {x ≥ 0 : eT x = 1}. In this case, however, the
boundedness, the linear constraint, and R2 ensure that equality holds in
Proposition 2.3 for n ≤ 4.
Theorem 2.5 (Anstreicher and Burer [6]). Let $F := \{x \ge 0 : e^T x = 1\}$. Then $\mathrm{clconv}\, \hat{F} \subseteq \hat{L} \cap \mathrm{RLT} \cap \mathrm{R2} \cap \mathrm{PSD}$ with equality if and only if $n \le 4$.
[36] and [6] also give related results where F is an affine transformation
of the standard simplex.
2.5.4. Half ellipsoid. Let F be a half ellipsoid, that is, the intersec-
tion of a linear half-space and a possibly degenerate ellipsoid. In contrast
to the previous cases considered, [57] proved that this case achieves equality
in Proposition 2.3 regardless of the dimension n. On the other hand, the
number of constraints is fixed. In particular, all simple bounds are infinite,
$|LQ| = 1$, $|CQ| = 1$, and $NQ = \emptyset$, in which case Proposition 2.3 states simply $\mathrm{clconv}\, \hat{F} \subseteq \hat{L} \cap \mathrm{S2} \cap \mathrm{PSD}$.
Theorem 2.6 (Sturm and Zhang [57]). Suppose
$$F = \left\{ x \in \mathbb{R}^n : \begin{array}{l} a_1^T x \le b_1 \\ x^T A_2 x + a_2^T x \le b_2 \end{array} \right\}$$
with $A_2 \succeq 0$. Then $\mathrm{clconv}\, \hat{F} = \hat{L} \cap \mathrm{S2} \cap \mathrm{PSD}$.
In general, it appears that equality in Proposition 2.3 does not hold in this case. However, we can still characterize the difference between $\mathrm{clconv}\, \hat{F}$ and $\hat{L} \cap \mathrm{PSD}$. As it turns out in the theorem below, this difference is precisely the recession cone of $\hat{L} \cap \mathrm{PSD}$, which equals
$$\mathrm{rcone}(\hat{L} \cap \mathrm{PSD}) = \left\{ (0, D) : \begin{array}{ll} 0 \le \langle A, D \rangle & \text{if } -\infty < b_l \\ \langle A, D \rangle \le 0 & \text{if } b_u < +\infty \\ D \succeq 0 & \end{array} \right\}.$$
$$Y := \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix}$$
$$\bar{r}(\bar{r} + 1)/2 \le p + q - \bar{s} \le 2,$$
$$X_{ij} \ge \max\{l_i x_j + x_i l_j - l_i l_j,\ u_i x_j + x_i u_j - u_i u_j\}$$
$$X_{ij} \le \min\{x_i u_j + l_i x_j - l_i u_j,\ x_i l_j + u_i x_j - u_i l_j\}.$$
$$x^T A_k^+ x + a_k^T x \le b_k + x^T A_k^- x.$$
$$x^T A_k^+ x + a_k^T x \le b_k + z_k$$
$$x^T A_k^- x \le z_k.$$
$$\left\{ x \in \mathbb{R}^n : \begin{array}{ll} a_k^T x \le b_k & \forall\, k \in LQ \\ x^T A_k x + a_k^T x \le b_k & \forall\, k \in CQ \\ x^T A_k^+ x + a_k^T x \le b_k + z_k & \forall\, k \in NQ \\ x^T A_k^- x \le z_k & \forall\, k \in NQ \\ l \le x \le u,\ z \le \mu & \end{array} \right\}.$$
This SOCP model is shown to be dominated by the SDP relaxation $\hat{L} \cap \mathrm{PSD}$, while it is not directly comparable to the basic LP relaxation $\hat{L}$.
The above relaxation was recently revisited in [53]. The authors studied the relaxation obtained by the following splitting of the $A_k$ matrices,
$$A_k = \sum_{\lambda_{kj} > 0} \lambda_{kj} v_{kj} v_{kj}^T - \sum_{\lambda_{kj} < 0} |\lambda_{kj}| v_{kj} v_{kj}^T,$$
where $\{\lambda_{k1}, \ldots, \lambda_{kn}\}$ and $\{v_{k1}, \ldots, v_{kn}\}$ are the sets of eigenvalues and eigenvectors of $A_k$, respectively. The constraint $x^T A_k x + a_k^T x \le b_k$ can thus be reformulated as
$$\sum_{\lambda_{kj} > 0} \lambda_{kj} \left( v_{kj}^T x \right)^2 + a_k^T x \le b_k + \sum_{\lambda_{kj} < 0} |\lambda_{kj}| \left( v_{kj}^T x \right)^2.$$
The nonconvex terms $\left( v_{kj}^T x \right)^2$ ($\lambda_{kj} < 0$) can be relaxed by using their secant approximation to derive a convex relaxation of the above constraint. Instances of (MIQCP) tend to have geometric correlations along those $v_{kj}$ with $\lambda_{kj} < 0$, which can be captured by projection techniques and embedded within the polarity framework to derive strong cutting planes. We refer the reader to [53] for further details.
3.3. Results relating simple bounds and the binary integer
grid. Motivated by Theorems such as 2.1 and 2.3 and the prior work of
Padberg [44] and Yajima and Fujie [60], Burer and Letchford [19] studied
the relationship between the two convex hulls
$$\mathrm{clconv} \left\{ (x, xx^T) : x \in [0, 1]^n \right\} \qquad (3.1a)$$
$$\mathrm{clconv} \left\{ (x, xx^T) : x \in \{0, 1\}^n \right\} \qquad (3.1b)$$
The convex hull (3.1a) has been named $QPB_n$ by the authors because of its relationship to "quadratic programming over the box." The convex hull (3.1b) is essentially the well-known boolean quadric polytope $BQP_n$ [44]. In fact, the authors show that $BQP_n$ is simply the coordinate projection of (3.1b) onto the variables $x_i$ ($1 \le i \le n$) and $X_{ij}$ ($1 \le i < j \le n$). Note that nothing is lost in the projection because $X_{ii} = x_i$ and $X_{ji} = X_{ij}$ are valid for (3.1b).
We let $\pi$ represent the coordinate projection just mentioned, i.e., onto the variables $x_i$ and $X_{ij}$ ($i < j$). The authors' result can be stated as $\pi(QPB_n) = BQP_n$, which immediately implies the following:
Theorem 3.2 (Burer and Letchford [19]). Any inequality in the variables $x_i$ ($1 \le i \le n$) and $X_{ij}$ ($1 \le i < j \le n$), which is valid for $BQP_n$, is also valid for $QPB_n$.
$$\mathrm{clconv}\, \hat{F} \subseteq \hat{H}_{\mathrm{PSD}} := \hat{L} \cap \mathrm{RLT} \cap \mathrm{R2} \cap \mathrm{PSD} \cap \{(x, X) : X_{ii} = x_i \ \forall\, i \in I\}.$$
and define $\hat{H}_{\mathrm{CPP}} := \hat{H}_{\mathrm{PSD}} \cap \mathrm{CPP}$. The result below establishes that $\mathrm{clconv}\, \hat{F} = \hat{H}_{\mathrm{CPP}}$.
Theorem 3.3 (Burer [18], Bomze and Jarre [15]). Let $F$ be defined as in (3.2). Define $J := \{j : \exists\, k \text{ s.t. } (j, k) \in E \text{ or } (k, j) \in E\}$, and suppose $x_i$ is bounded in $\{x \ge 0 : Ax = b\}$ for all $i \in I \cup J$. Then $\mathrm{clconv}\, \hat{F} = \hat{H}_{\mathrm{CPP}}$.
We emphasize that the result holds regardless of the boundedness of $F$ as a whole; it is only important that certain variables are bounded. Completely positive representations of $\mathrm{clconv}\, \hat{F}$ for different $F$, which are not already covered by the above theorem, can also be found in [49, 50].
Starting from the above theorem, Burer [17] has implemented a specialized algorithm for optimizing over the relaxation $\hat{H}_{\mathrm{PSD}}$. We briefly discuss this implementation in Section 5.
3.5. Higher-order liftings and projections. Whenever it is not
possible to capture clconv F- exactly in the lifted space (x, X), it is still
possible to lift into ever higher dimensional spaces and to linearize, say,
cubic, quartic, or higher-degree valid inequalities there. This is quite a
deep and powerful technique for capturing clconv F-. We refer the reader
to the following papers: [9, 14, 33, 34, 35, 37, 54].
One of the most famous results in this area is the sequential convexi-
fication result for mixed 0-1 linear programs (M01LPs). Balas [8] showed
that M01LPs are special cases of facial disjunctive programs which possess
the sequential convexifiability property. Simply put, this means that the
closed convex hull of all feasible solutions to a M01LP can be obtained by
imposing the 0-1 condition on the binary variables sequentially, i.e., by im-
posing the 0-1 condition on the first binary variable and convexifying the
resulting set, followed by imposing the 0-1 condition on the second variable,
and so on. This is stated as the following theorem.
Theorem 3.4 (Balas [8]). Let F be the feasible set of a M01LP, i.e.,

F = {x ∈ {0, 1}^n : a_k^T x ≤ b_k ∀ k = 1, . . . , m},

and define L to be its basic linear relaxation in x. For each i = 1, . . . , n,
define T_i := {x : x_i ∈ {0, 1}} and

S_0 := L
S_i := clconv(S_{i−1} ∩ T_i)   ∀ i = 1, . . . , n.

Then S_n = clconv F.
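As a small worked instance of this recursion (our own example, not one from [8]), take n = 2 and

L = {x ∈ [0, 1]^2 : x_1 + x_2 ≤ 3/2},    F = L ∩ {0, 1}^2 = {(0, 0), (1, 0), (0, 1)}.

Imposing x_1 ∈ {0, 1} and convexifying gives

S_1 = clconv(L ∩ T_1) = conv({0} × [0, 1] ∪ {1} × [0, 1/2]) = {x ∈ [0, 1]^2 : x_1 + 2x_2 ≤ 2},

and then imposing x_2 ∈ {0, 1} gives

S_2 = clconv(S_1 ∩ T_2) = conv([0, 1] × {0} ∪ {(0, 1)}) = {x ≥ 0 : x_1 + x_2 ≤ 1},

the triangle with vertices (0, 0), (1, 0), (0, 1), which is exactly clconv F.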
There exists an analogous sequential convexification result for the con-
tinuous case of (MIQCP) with general quadratic constraints.
Theorem 3.5 (Saxena et al. [52]). Suppose that the feasible region F
of (MIQCP) is bounded with I = ∅, i.e., no integer variables. For each
i = 1, . . . , n, define T̃_i := {(x, X) : X_ii ≤ x_i^2}. Also define

S̃_0 := L̃ ∩ PSD
S̃_i := clconv(S̃_{i−1} ∩ T̃_i)   ∀ i = 1, . . . , n.
Then S̃_n = clconv F̃.
Part of the motivation for this theorem comes from the fact that

PSD ∩ ⋂_{i=1}^n T̃_i = {(x, X) : X = xx^T},

i.e., enforcing all T̃_i along with positive semidefiniteness recovers the non-
convex condition X = xx^T. This is analogous to the fact that ⋂_{i=1}^n T_i
recovers the integer condition in Theorem 3.4.
There is one crucial difference between Theorems 3.4 and 3.5. Note
that a M01LP with a single binary variable is polynomial-time solvable;
Balas [8] gave a polynomial-sized LP for this problem. On the other hand,
the analogous problem in the context of (MIQCP) involves minimizing a
linear function over a nonconvex set of the form
Given an arbitrary incumbent solution (x̂, X̂), say, from optimizing over
L̃ or L̃ ∩ PSD, we would like to choose a basis {v_1, . . . , v_n} whose
corresponding reformulation most effectively elucidates the infeasibility of (x̂, X̂)
with respect to (4.1). The problem of choosing such a basis can be formu-
lated as the following optimization problem that focuses on maximizing
the violation of (x̂, X̂) with respect to the set of nonconvex constraints
⟨X, v_i v_i^T⟩ ≤ (v_i^T x)^2:

max   max_{i=1,...,n}  ⟨X̂, v_i v_i^T⟩ − (v_i^T x̂)^2
s.t.  {v_1, . . . , v_n} is an orthonormal basis.
namely, the integrality conditions on the variables x_i for i ∈ I and the non-
convex constraints ⟨X, v_i v_i^T⟩ ≤ (v_i^T x)^2. Integrality constraints have been
used to derive disjunctions in MILP for the past five decades; examples
of such disjunctions include elementary 0-1 disjunctions, split disjunctions,
GUB disjunctions, etc. We do not detail these disjunctions here. For con-
straints of the type ⟨X, vv^T⟩ ≤ (v^T x)^2 for fixed v ∈ R^n, Saxena et al. [52]
proposed a technique to derive a valid disjunction, which we detail next.
Following [52], we refer to ⟨X, vv^T⟩ ≤ (v^T x)^2 as a univariate expression.
Let

η_L(v) := min { v^T x : (x, X) ∈ P̃ }
η_U(v) := max { v^T x : (x, X) ∈ P̃ }

and let θ ∈ (η_L(v), η_U(v)). Consider the disjunction

[ η_L(v) ≤ v^T x ≤ θ,   −(v^T x)(η_L(v) + θ) + θη_L(v) ≤ −⟨X, vv^T⟩ ]
    ∨                                                                  (4.3)
[ θ ≤ v^T x ≤ η_U(v),   −(v^T x)(η_U(v) + θ) + θη_U(v) ≤ −⟨X, vv^T⟩ ].
This disjunction can be derived by splitting the range [η_L(v), η_U(v)] of the
function v^T x over P̃ into the two intervals [η_L(v), θ] and [θ, η_U(v)] and
constructing a secant approximation of the function −(v^T x)^2 in each of
the intervals, respectively (see Figure 1 for an illustration).
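The secant construction reduces to elementary arithmetic: in the plane of t = v^T x and s = ⟨X, vv^T⟩, each term of (4.3) is the secant of −t^2 over one subinterval. A minimal sketch (function names are ours, not from [52]):

```python
def secant_cut(t_lo, t_hi):
    """Secant of f(t) = -t^2 through (t_lo, -t_lo**2) and (t_hi, -t_hi**2).

    Returns (slope, intercept); the corresponding inequality in (4.3) reads
        slope * (v^T x) + intercept <= -<X, v v^T>,
    i.e. -(v^T x)(t_lo + t_hi) + t_lo*t_hi <= -<X, v v^T>."""
    return -(t_lo + t_hi), t_lo * t_hi

def secant_disjunction(eta_L, eta_U, theta):
    """The two terms of (4.3), each as (t_lo, t_hi, slope, intercept),
    meaning  t_lo <= v^T x <= t_hi  and  slope*(v^T x) + intercept <= -<X, vv^T>."""
    s_lo, c_lo = secant_cut(eta_L, theta)
    s_hi, c_hi = secant_cut(theta, eta_U)
    return [(eta_L, theta, s_lo, c_lo), (theta, eta_U, s_hi, c_hi)]

# With eta_L = 0, eta_U = 4, theta = 2 (the worked example later in this
# section), the terms read 2(v^T x) - <X,vv^T> >= 0 and 6(v^T x) - <X,vv^T> >= 8.
terms = secant_disjunction(0.0, 4.0, 2.0)
```

Each secant touches the parabola at its interval endpoints, so no feasible point of the original constraint is cut off by either term.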
The disjunction (4.3) can then be embedded within the framework of
Cut Generation Linear Programs (CGLPs) to derive disjunctive cuts as
discussed in the following theorem.
Theorem 4.1 ([8]).^1 Let a polyhedral set P = {x : Ax ≥ b}, a
disjunction D = ⋁_{k=1}^q (D_k x ≥ d_k), and a point x̂ ∈ P be given. Then
x̂ ∈ Q := clconv ∪_{k=1}^q {x ∈ P | D_k x ≥ d_k} if and only if the optimal value
of the following Cut Generation Linear Program (CGLP) is non-negative:
1 We caution the reader that the notation used in this theorem is not specifically tied
Fig. 1. The constraint −(v^T x)^2 ≤ −⟨X, vv^T⟩ and the disjunction (4.3) represented
in the space spanned by v^T x (horizontal axis) and −⟨X, vv^T⟩ (vertical axis). The fea-
sible region is the grey area above the parabola between η_L(v) and η_U(v). Disjunction
(4.3) is obtained by taking the piecewise-linear approximation of the parabola, using a
breakpoint at θ, and given by the two lines L1 and L2. Clearly, if η_L(v) ≤ v^T x ≤ θ
then (x, X) must be above L1 to be in the grey area; if θ ≤ v^T x ≤ η_U(v) then (x, X)
must be above L2.
min   α^T x̂ − β                                          (CGLP)
s.t.  A^T u^k + D_k^T v^k = α              k = 1, . . . , q
      b^T u^k + d_k^T v^k ≥ β              k = 1, . . . , q
      u^k, v^k ≥ 0                         k = 1, . . . , q
      Σ_{k=1}^q (ξ^T u^k + ξ_k^T v^k) = 1
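As an illustration of Theorem 4.1, the CGLP can be assembled and solved with an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog and the all-ones choice for the normalization multipliers ξ, ξ_k (the theorem leaves this choice open); the triangle instance at the bottom is our own toy example, not one from [8] or [52]:

```python
import numpy as np
from scipy.optimize import linprog

def solve_cglp(A, b, terms, xhat):
    """Assemble and solve the CGLP for P = {x : A x >= b} and the
    disjunction OR_k (D_k x >= d_k).  Uses the all-ones normalization
    sum_k (1^T u^k + 1^T v^k) = 1.  Returns (optimal value, alpha, beta);
    a negative value certifies that alpha^T x >= beta cuts off xhat."""
    m, n = A.shape
    sizes = [len(dk) for _, dk in terms]
    nvar = n + 1 + sum(m + p for p in sizes)   # alpha | beta | (u^k, v^k)_k
    c = np.zeros(nvar)
    c[:n], c[n] = xhat, -1.0                   # min alpha^T xhat - beta

    A_eq, b_eq, A_ub, b_ub = [], [], [], []
    norm = np.zeros(nvar)
    off = n + 1
    for (Dk, dk), p in zip(terms, sizes):
        for i in range(n):                     # A^T u^k + D_k^T v^k = alpha
            row = np.zeros(nvar)
            row[i] = -1.0
            row[off:off + m] = A[:, i]
            row[off + m:off + m + p] = Dk[:, i]
            A_eq.append(row); b_eq.append(0.0)
        row = np.zeros(nvar)                   # b^T u^k + d_k^T v^k >= beta
        row[n] = 1.0
        row[off:off + m] = -b
        row[off + m:off + m + p] = -np.asarray(dk)
        A_ub.append(row); b_ub.append(0.0)
        norm[off:off + m + p] = 1.0
        off += m + p
    A_eq.append(norm); b_eq.append(1.0)        # all-ones normalization

    bounds = [(None, None)] * (n + 1) + [(0, None)] * (nvar - n - 1)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
    return res.fun, res.x[:n], res.x[n]

# Toy instance: P is the triangle {2x - y >= 0, -2x - y >= -2, y >= 0};
# the disjunction (x <= 0) or (x >= 1) shrinks it to the segment from
# (0,0) to (1,0), so the apex xhat = (0.5, 1) must be cut off (value < 0).
A = np.array([[2.0, -1.0], [-2.0, -1.0], [0.0, 1.0]])
b = np.array([0.0, -2.0, 0.0])
terms = [(np.array([[-1.0, 0.0]]), np.array([0.0])),
         (np.array([[1.0, 0.0]]), np.array([1.0]))]
val, alpha, beta = solve_cglp(A, b, terms, np.array([0.5, 1.0]))
```

The resulting cut α^T x ≥ β is, by construction, valid for both disjunctive terms, hence for Q.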
is

x̂ = (4, 4, 3.75)^T,      X̂ = [16 16 15; 16 16 15; 15 15 15],

and so

X̂ − x̂x̂^T = [0 0 0; 0 0 0; 0 0 0.9375]

has exactly one non-zero eigenvalue. The associated eigenvector and uni-
variate expression are given by c^T = (0, 0, 1) and X_33 ≤ x_3^2, respectively.
Note that (x̂, X̂) satisfies the secant approximation X_33 ≤ 4x_3 of X_33 ≤ x_3^2
at equality; hence the secant inequality does not cut off this point. Choosing
θ = 2 in (4.3), we get the following disjunction, which is satisfied by every
feasible solution (x, X) ∈ F̃ for this example:

[ 0 ≤ x_3 ≤ 2,   2x_3 − X_33 ≥ 0 ]   ∨   [ 2 ≤ x_3 ≤ 4,   6x_3 − X_33 ≥ 8 ].
percent duality gap closed := (v_RLT − v) / (v_RLT − v*) × 100

was recorded on each instance using the optimal value v for that relaxation.
Only instances having v_RLT > v* were selected for testing (129 instances).
We remark that, when present, constraint PSD was enforced with a cutting-
plane approach based on convex quadratic cuts rather than a black-box
SDP solver.
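The gap statistic itself is a one-liner in code; the sample values below are hypothetical (an upper-bounding relaxation tightened from 100 toward an optimum of 50):

```python
def pct_gap_closed(v_rlt, v, v_star):
    """Percent duality gap closed by a relaxation of value v, relative to the
    RLT bound v_rlt and the optimal value v_star.  Requires v_rlt != v_star,
    which the selection criterion v_rlt > v_star guarantees."""
    return 100.0 * (v_rlt - v) / (v_rlt - v_star)

gap = pct_gap_closed(100.0, 60.0, 50.0)   # hypothetical values: 80% closed
```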
Table 1
Summary of computational results.
                        V1       V2     V2-SI    V2-Dsj
  >99.99 %              16       23       24        1
  98-99.99 %             1       44        4       29
  75-98 %               10       23       17       10
  25-75 %               11       22       26       29
  0-25 %                91       17       58       60
  Average Gap Closed  24.80%   76.49%   44.40%   41.54%
First, the variant V2 of the code that uses disjunctive cuts closes, on
average, over 50 percentage points more of the duality gap than the SDP
relaxation V1 (76.49% vs. 24.80% in Table 1). In fact, relaxations obtained
by adding disjunctive cuts close more than 98% of the duality gap on 67
out of 129 instances; the same figure for SDP relaxations is 17 out of 129
instances. Second, the authors were able to close 99% of the duality gap
on some of the instances such as st qpc-m3a, st ph13, st ph11, ex3 1 4,
st jcbpaf2, ex2 1 9, etc., on which the SDP relaxation closes 0% of the
duality gap.
Third, the variant V2-SI of the code that uses the secant inequal-
ity instead of disjunctive cuts does close a significant proportion (44.40%)
of the duality gap. However, using disjunctive cuts improves this statis-
tic to 76.49% thereby demonstrating the marginal benefits of disjunc-
tive programming. Fourth, it is worth observing that both variants V2
and V2-SI have access to the same kinds of nonconvexities, namely, uni-
variate expressions ⟨X, vv^T⟩ ≤ (v^T x)^2 derived from eigenvectors v of
X̂ − x̂x̂^T. Despite this commonality, why does V2, which has access to
the CGLP apparatus, outperform V2-SI? The answer to this question lies
in the way the individual frameworks process the nonconvex expression
⟨X, vv^T⟩ ≤ (v^T x)^2. While V2-SI takes a local view of the problem and
convexifies ⟨X, vv^T⟩ ≤ (v^T x)^2 in the 2-dimensional space spanned by v^T x
and ⟨X, vv^T⟩, V2 takes a global view of the problem and combines disjunc-
tive terms with other problem constraints. It is precisely this ability to
derive stronger inferences by combining disjunctive information with other
problem constraints that allows V2 to outperform its local counterpart
V2-SI.
Fifth, it is worth observing that removing PSD has a debilitating effect
on the cutting plane algorithm presented in [52] as demonstrated by the
performance of V2-Dsj relative to V2. While the CGLP apparatus allows
us to take a global view of the problem, its ability to derive strong disjunc-
tive cuts is limited by the strength of the initial relaxation. By removing
PSD, the relaxation is significantly weakened, and this subsequently has a
deteriorating effect on the strength of disjunctive cuts later derived.
Table 2
Selection criteria.
The basic premise of the work in [52] lies in generating valid cutting
planes for clconv F̃ from the spectrum of X̂ − x̂x̂^T, where (x̂, X̂) is the
incumbent solution. In order to highlight the impact of these cuts on
Fig. 2. Plot of the sum of positive and negative eigenvalues for st jcbpaf2
with V1–V2.
Fig. 3. Plot of the sum of positive and negative eigenvalues for ex 9 2 7 with V1–V2.
the spectrum itself, the authors presented details on three instances listed
in Table 2 which we reproduce here for the sake of illustration. Figures
2–4 report the key results. The horizontal axis represents the number of
iterations while the vertical axis reports the sum of the positive eigenvalues
of X̂ − x̂x̂^T (broken line) and the sum of the negative eigenvalues of X̂ − x̂x̂^T
(solid line). Some remarks are in order.
First, the graph of the sum of negative eigenvalues converges to zero
much faster than the corresponding graph for positive eigenvalues. This is
not surprising because the problem of eliminating the negative eigenvalues
is a convex programming problem, namely an SDP; the approach of adding
convex-quadratic cuts is just an iterative cutting-plane based technique to
impose the X − xx^T ⪰ 0 condition. Second, V1 has a widely varying effect
on the sum of positive eigenvalues of X − xxT . This is to be expected
because the X − xx^T ⪰ 0 condition imposes no constraint on the positive
eigenvalues of X − xxT . Furthermore, the sum of positive eigenvalues rep-
resents the part of the nonconvexity of F̃ that is not captured by PSD.
Third, it is interesting to note that the variant that uses disjunctive cuts,
Fig. 4. Plot of the sum of the positive and negative eigenvalues for ex 7 3 1
with V1–V2.
namely V2, is able to force both positive and negative eigenvalues to
converge to 0 for st jcbpaf2, thereby generating an almost feasible solution
to this problem.
4.3. Working with only the original variables. Finally, we would
like to emphasize that all of the relaxations of clconv F̃ discussed until now
are defined in the lifted space of (x, X). While the additional variable
X enhances the expressive power of the formulation, it also increases the
size of the formulation drastically, resulting in an enormous computational
overhead which would be, for example, incurred at every node of a branch-
and-bound tree. Ideally, we would like to extract the strength of these
extended reformulations in the form of cutting planes that are defined only
in the space of the original x variable. Systematic approaches for construct-
ing such convex relaxations of (MIQCP) are described in a recent paper
by Saxena et al. [53]. We briefly reproduce some of these results to expose
the reader to this line of research.
Consider the relaxation L̃ ∩ RLT ∩ PSD of clconv F̃, and define Q :=
proj_x(L̃ ∩ RLT ∩ PSD), which is a relaxation of clconv F (not clconv F̃!)
in the space of the original variable x—but one that retains the power of
L̃ ∩ RLT ∩ PSD. Can we separate from Q, hence enabling us to work solely
in the x-space? Specifically, given a point x̂ that satisfies at least the simple
bounds l ≤ x ≤ u, we desire an algorithmic framework that either shows
x̂ ∈ Q or finds an inequality valid for Q which cuts off x̂. Note that x̂ ∈ Q
if and only if the following system is feasible in X with x̂ fixed:

⟨A_k, X⟩ + a_k^T x̂ ≤ b_k                       ∀ k = 1, . . . , m
l x̂^T + x̂ l^T − l l^T ≤ X,   u x̂^T + x̂ u^T − u u^T ≤ X,   X ≤ x̂ u^T + l x̂^T − l u^T
[ 1    x̂^T ]
[ x̂    X   ]  ⪰  0.
As is typical, if this system is infeasible, then duality theory provides a cut
(in this case, a convex quadratic cut) cutting off x̂ from Q. Further, one
can optimize to obtain a deep cut. We refer the reader to [53] for further
details where the authors report computational results to demonstrate the
computational dividends of working in the space of original variables possi-
bly augmented by a few additional variables. We reproduce a slice of their
computational results in Section 5.
5. Computational case study. To give the reader an impression of
the computational requirements of the relaxations and techniques proposed
in this paper, we compare three implementations for solving the relaxation
Nine instances from [20] are tested. Their relevant characteristics un-
der relaxations (5.1) and (5.4) are given in Table 3, and the timings (in
seconds) are given in Table 4. Also, in Table 5, we give the percentage gaps
closed by the three methods relative to the pure linear relaxation L̃ ∩ RLT
(see Section 4.2 for a careful definition of the percentage gap closed). A
few comments are in order.
Table 3
Sizes of tested instances.
                    # Variables       # Constraints (Linear)    # Constraints (Convex)
Instance           ipm/cp    proj       ipm/cp     proj          ipm/cp (SDP)   proj (quad)
spar100-025-1 5151 203 20201 156 1 119
spar100-025-2 5151 201 20201 151 1 95
spar100-025-3 5151 201 20201 150 1 114
spar100-050-1 5151 201 20201 150 1 98
spar100-050-2 5151 201 20201 150 1 113
spar100-050-3 5151 201 20201 150 1 97
spar100-075-1 5151 201 20201 150 1 131
spar100-075-2 5151 201 20201 150 1 109
spar100-075-3 5151 199 20201 147 1 90
Table 4
Computational utility of projected relaxations.
                          Time (sec)
Instance             ipm        cp      proj (pre-process + solve)
spar100-025-1 5719.42 59 670.15 + 1.14
spar100-025-2 10185.65 54 538.03 + 1.52
spar100-025-3 5407.09 58 656.59 + 1.24
spar100-050-1 10139.57 76 757.14 + 1.07
spar100-050-2 5355.20 92 929.91 + 1.26
spar100-050-3 7281.26 76 747.46 + 0.82
spar100-075-1 9660.79 101 1509.96 + 2.00
spar100-075-2 6576.10 100 936.61 + 1.23
spar100-075-3 10295.88 81 657.84 + 0.87
Table 5
Computational utility of projected relaxations.
% Gap Closed
Instances ipm/cp proj
spar100-025-1 98.93% 92.36%
spar100-025-2 99.09% 92.16%
spar100-025-3 99.33% 93.26%
spar100-050-1 98.17% 93.62%
spar100-050-2 98.57% 94.13%
spar100-050-3 99.39% 95.81%
spar100-075-1 99.19% 95.84%
spar100-075-2 99.18% 96.47%
spar100-075-3 99.19% 96.06%
Table 6
M01LP vs. MIQCP.
has been rather slow, and exact descriptions are unknown for most classes
of problems except for some very small problem instances.
Third, there is an interesting connection between cuts derived from the
univariate expression ⟨X, vv^T⟩ ≤ (v^T x)^2 for MIQCP and split cuts de-
rived from split disjunctions (πx ≤ π_0) ∨ (πx ≥ π_0 + 1) (π ∈ Z^n) in M01LP.
To see this, note that ⟨X, vv^T⟩ ≤ (v^T x)^2 can be obtained from the el-
ementary non-convex constraint X_ii ≤ x_i^2 by the linear transformation
(x, X) −→ (v^T x, ⟨X, vv^T⟩), where the linear transformation is chosen de-
pending on the incumbent solution; for example, Saxena et al. [52] derive
the v vector from the spectral decomposition of X̂−x̂x̂T . Similarly, the split
disjunction (πx ≤ π0 ) ∨ (πx ≥ π0 + 1) can be obtained from elementary 0-1
disjunction (xj ≤ 0)∨(xj ≥ 1) by the linear transformation x −→ πx where
the linear transformation is chosen depending on the incumbent solution;
for instance, the well known mixed integer Gomory cuts can be obtained
from split disjunctions derived by monoidal strengthening of elementary
0-1 disjunctions, wherein the monoid that is chosen to strengthen the cut
depends on the incumbent solution (see [9]).
REFERENCES
[2] T. Achterberg, T. Berthold, and K. Wolter, Constraint integer program-
ming: A new approach to integrate CP and MIP, Lecture Notes in Computer
Science, 5015 (2008), pp. 6–20.
[3] F.A. Al-Khayyal, Generalized bilinear programming, Part i: Models, applica-
tions, and linear programming relaxations, European Journal of Operational
Research, 60 (1992), pp. 306–314.
[4] F.A. Al-Khayyal and J.E. Falk, Jointly constrained biconvex programming,
Math. Oper. Res., 8 (1983), pp. 273–286.
[5] F. Alizadeh and D. Goldfarb, Second-order cone programming, Math. Pro-
gram., 95 (2003), pp. 3–51. ISMP 2000, Part 3 (Atlanta, GA).
[6] K.M. Anstreicher and S. Burer, Computable representations for convex hulls
of low-dimensional quadratic forms, Mathematical Programming (Series B),
124(1-2) (2010), pp. 33–43.
[7] A. Atamtürk and V. Narayanan, Conic mixed-integer rounding cuts, Math.
Program., 122 (2010), pp. 1–20.
[8] E. Balas, Disjunctive programming: properties of the convex hull of feasible
points, Discrete Appl. Math., 89 (1998), pp. 3–44.
[9] E. Balas, S. Ceria, and G. Cornuéjols, A lift-and-project cutting plane al-
gorithm for mixed 0-1 programs, Mathematical Programming, 58 (1993),
pp. 295–324.
[10] A. Beck, Quadratic matrix programming, SIAM J. Optim., 17 (2006), pp. 1224–
1238 (electronic).
[11] P. Belotti, Disjunctive cuts for non-convex MINLP, IMA Volume Series, Springer
2010, accepted.
http://myweb.clemson.edu/∼pbelott/papers/belotti-disj-MINLP.pdf.
[12] A. Ben-Tal and A. Nemirovski, On polyhedral approximations of the second-
order cone, Math. Oper. Res., 26 (2001), pp. 193–205.
[13] A. Berman and N. Shaked-Monderer, Completely Positive Matrices, World
Scientific, 2003.
[14] D. Bienstock and M. Zuckerberg, Subset algebra lift operators for 0-1 integer
programming, SIAM J. Optim., 15 (2004), pp. 63–95 (electronic).
[15] I. Bomze and F. Jarre, A note on Burer’s copositive representation of mixed-
binary QPs, Optimization Letters, 4 (2010), pp. 465–472.
[16] P. Bonami, L. Biegler, A. Conn, G. Cornuéjols, I. Grossmann, C. Laird,
J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An algorithmic
framework for convex mixed-integer nonlinear programs, Discrete Optimiza-
tion, 5 (2008), pp. 186–204.
[17] S. Burer, Optimizing a polyhedral-semidefinite relaxation of completely positive
programs, Mathematical Programming Computation, 2(1), pp 1–19 (2010).
[18] S. Burer, On the copositive representation of binary and continuous nonconvex
quadratic programs, Mathematical Programming, 120 (2009), pp. 479–495.
[19] S. Burer and A.N. Letchford, On nonconvex quadratic programming with box
constraints, SIAM J. Optim., 20 (2009), pp. 1073–1089.
[20] S. Burer and D. Vandenbussche, Globally solving box-constrained nonconvex
quadratic programs with semidefinite-based finite branch-and-bound, Comput.
Optim. Appl., 43 (2009), pp. 181–195.
[21] D. Coppersmith, O. Günlük, J. Lee, and J. Leung, A polytope for a product of
real linear functions in 0/1 variables, manuscript, IBM, Yorktown Heights,
NY, December 2003.
[22] G. Cornuéjols, Combinatorial optimization: packing and covering, Society for
Industrial and Applied Mathematics, Philadelphia, PA, USA, 2001.
[23] R.W. Cottle, G.J. Habetler, and C.E. Lemke, Quadratic forms semi-definite
over convex cones, in Proceedings of the Princeton Symposium on Mathemat-
ical Programming (Princeton Univ., 1967), Princeton, N.J., 1970, Princeton
Univ. Press, pp. 551–565.
www.it-ebooks.info
404 SAMUEL BURER AND ANUREET SAXENA
[24] G. Danninger and I.M. Bomze, Using copositivity for global optimality criteria
in concave quadratic programming problems, Math. Programming, 62 (1993),
pp. 575–580.
[25] G. Dantzig, R. Fulkerson, and S. Johnson, Solution of a large-scale traveling-
salesman problem, J. Operations Res. Soc. Amer., 2 (1954), pp. 393–410.
[26] E. de Klerk and D.V. Pasechnik, Approximation of the stability number of a
graph via copositive programming, SIAM J. Optim., 12 (2002), pp. 875–892.
[27] See the website: www.gamsworld.org/global/globallib/globalstat.htm.
[28] R. Horst and N.V. Thoai, DC programming: overview, J. Optim. Theory Appl.,
103 (1999), pp. 1–43.
[29] M. Jach, D. Michaels, and R. Weismantel, The convex envelope of (n − 1)-
convex functions, SIAM J. Optim., 19 (2008), pp. 1451–1466.
[30] R. Jeroslow, There cannot be any algorithm for integer programming with
quadratic constraints, Operations Research, 21 (1973), pp. 221–224.
[31] S. Kim and M. Kojima, Second order cone programming relaxation of non-
convex quadratic optimization problems, Optim. Methods Softw., 15 (2001),
pp. 201–224.
[32] S. Kim and M. Kojima, Exact solutions of some nonconvex quadratic optimization
problems via SDP and SOCP relaxations, Comput. Optim. Appl., 26 (2003), pp. 143–154.
[33] M. Kojima and L. Tunçel, Cones of matrices and successive convex relaxations
of nonconvex sets, SIAM J. Optim., 10 (2000), pp. 750–778.
[34] J.B. Lasserre, Global optimization with polynomials and the problem of moments,
SIAM J. Optim., 11 (2001), pp. 796–817.
[35] M. Laurent, A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre
relaxations for 0-1 programming, Math. Oper. Res., 28 (2003), pp. 470–496.
[36] J. Linderoth, A simplicial branch-and-bound algorithm for solving quadratically
constrained quadratic programs, Math. Program., 103 (2005), pp. 251–282.
[37] L. Lovász and A. Schrijver, Cones of matrices and set-functions and 0-1 opti-
mization, SIAM Journal on Optimization, 1 (1991), pp. 166–190.
[38] T. Matsui, NP-hardness of linear multiplicative programming and related prob-
lems, J. Global Optim., 9 (1996), pp. 113–119.
[39] J.E. Maxfield and H. Minc, On the matrix equation X′X = A, Proc. Edinburgh
Math. Soc. (2), 13 (1962/1963), pp. 125–129.
[40] G.P. McCormick, Computability of global solutions to factorable nonconvex pro-
grams. I. Convex underestimating problems, Math. Programming, 10 (1976),
pp. 147–175.
[41] See the website: http://www.gamsworld.org/minlp/.
[42] K.G. Murty and S.N. Kabadi, Some NP-complete problems in quadratic and
nonlinear programming, Math. Programming, 39 (1987), pp. 117–129.
[43] G. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization, Wiley-
Interscience, 1999.
[44] M. Padberg, The Boolean quadric polytope: some characteristics, facets and rel-
atives, Math. Programming, 45 (1989), pp. 139–172.
[45] P. Pardalos, Global optimization algorithms for linearly constrained indefi-
nite quadratic problems, Computers and Mathematics with Applications, 21
(1991), pp. 87–97.
[46] P.M. Pardalos and S.A. Vavasis, Quadratic programming with one negative
eigenvalue is NP-hard, J. Global Optim., 1 (1991), pp. 15–22.
[47] P. Parrilo, Structured Semidefinite Programs and Semi-algebraic Geometry
Methods in Robustness and Optimization, PhD thesis, California Institute
of Technology, 2000.
[48] G. Pataki, On the rank of extreme matrices in semidefinite programs and the
multiplicity of optimal eigenvalues, Mathematics of Operations Research, 23
(1998), pp. 339–358.
[49] J. Povh and F. Rendl, A copositive programming approach to graph partitioning,
SIAM J. Optim., 18 (2007), pp. 223–241.
LINEAR PROGRAMMING RELAXATIONS
OF QUADRATICALLY CONSTRAINED
QUADRATIC PROGRAMS
ANDREA QUALIZZA∗ , PIETRO BELOTTI† , AND FRANÇOIS MARGOT∗‡
Abstract. We investigate the use of linear programming tools for solving semidefi-
nite programming relaxations of quadratically constrained quadratic problems. Classes
of valid linear inequalities are presented, including sparse PSD cuts and principal
minor PSD cuts. Computational results based on instances from the literature are
presented.
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 407
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_14,
© Springer Science+Business Media, LLC 2012
408 ANDREA QUALIZZA ET AL.
LP RELAXATIONS OF QUADRATIC PROGRAMS 409
EXT is a Linear Program with n(n + 3)/2 + m variables and the same
number of constraints as QCQP. Note that the optimal value of EXT is
usually a weak upper bound for QCQP, as no constraint links the values
of the x and X variables. Two main approaches for doing that have been
proposed and are based on relaxations of the last constraint of LIFT,
namely
X − xx^T = 0.                                          (2.1)

and defining

Q̃_k = [ −c_k     a_k^T/2 ]          X̃ = [ 1    x^T ]
      [ a_k/2    Q_k     ] ,             [ x    X   ] ,
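A quick sanity check of this lifting: with X = xx^T, the Frobenius inner product ⟨Q̃_k, X̃⟩ expands to ⟨Q_k, X⟩ + a_k^T x − c_k, i.e. the original quadratic form. The sketch below verifies the identity on hypothetical data (the orientation ≤ vs. ≥ of the resulting constraint follows whichever sign convention the formulation uses):

```python
def frob(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(A[i][j] * B[i][j]
               for i in range(len(A)) for j in range(len(A[0])))

def lifted_matrices(Qk, ak, ck, x, X):
    """Build Qtilde_k and Xtilde with the 2x2 block structure of the text."""
    n = len(x)
    Qt = [[-ck] + [ak[j] / 2.0 for j in range(n)]]
    for i in range(n):
        Qt.append([ak[i] / 2.0] + [Qk[i][j] for j in range(n)])
    Xt = [[1.0] + list(x)]
    for i in range(n):
        Xt.append([x[i]] + [X[i][j] for j in range(n)])
    return Qt, Xt

# Hypothetical data for illustration; X = x x^T so the lifted product
# must equal  x^T Qk x + ak^T x - ck.
x = [1.0, 2.0]
X = [[x[i] * x[j] for j in range(2)] for i in range(2)]
Qk = [[1.0, 0.5], [0.5, 2.0]]
ak = [3.0, -1.0]
ck = 4.0
Qt, Xt = lifted_matrices(Qk, ak, ck, x, X)
lhs = frob(Qt, Xt)
quad = sum(x[i] * Qk[i][j] * x[j] for i in range(2) for j in range(2)) \
       + sum(ak[i] * x[i] for i in range(2)) - ck
```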
Anstreicher [2] observes that, for Quadratic Programs with box con-
straints, the PSD and RLT constraints together yield much better bounds
than those obtained from the PSD or RLT relaxations. In this work,
we want to capture the strength of both techniques and generate a Linear
Programming relaxation of QCQP.
Notice that the four inequalities above, introduced by McCormick [12],
constitute the convex envelope of the set {(x_i, x_j, X_ij) ∈ R^3 : l_{x_i} ≤ x_i ≤
u_{x_i}, l_{x_j} ≤ x_j ≤ u_{x_j}, X_ij = x_i x_j}, as proven by Al-Khayyal and Falk [1],
i.e., they are the tightest relaxation for the single term X_ij.
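The McCormick envelope is easy to state in code. A minimal sketch (our helper names), evaluating the lower and upper envelope values for the term X_ij at a point of the box:

```python
def mccormick(xi, xj, li, ui, lj, uj):
    """Lower/upper McCormick envelope values for Xij ~ xi*xj over the box
    [li, ui] x [lj, uj]; the four inequalities are the two lower and two
    upper secant planes of the bilinear term."""
    lower = max(li * xj + lj * xi - li * lj,
                ui * xj + uj * xi - ui * uj)
    upper = min(ui * xj + lj * xi - ui * lj,
                li * xj + uj * xi - li * uj)
    return lower, upper

# The envelope sandwiches the bilinear term in the interior and is tight
# (lower == upper == xi*xj) at the corners of the box.
lo, up = mccormick(0.3, 0.7, 0.0, 1.0, 0.0, 1.0)
```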
3. Our framework. While the RLT constraints are linear in the
variables in the EXT formulation and therefore can be added directly
to EXT, this is not the case for the PSD constraint. We use a linear
outer-approximation of the PSD relaxation and a cutting plane framework,
adding a linear inequality separating the current solution from the PSD
cone.
The initial relaxation we use and the various cuts generated by our
separation procedure are described in more detail in the next sections.
www.it-ebooks.info
LP RELAXATIONS OF QUADRATIC PROGRAMS 411
In addition, equality (2.1) implies X_ii ≥ x_i^2. We therefore also make sure
that Lii ≥ 0. In the remainder of the paper, this initial relaxation is
identified as EXT+RLT.
3.2. PSD cuts. We use the equivalence that a matrix is positive
semidefinite if and only if
m_j = w_j Σ_{i=1}^{n+1} w_i X̃_ij    for j = 1, . . . , n + 1.

Its initial computation takes O(n^2) and its update, after a single entry of w
is set to 0, takes O(n). The vector m can be used to compute the left hand
side of the test in step 8 in constant time given the value of the violation
d for the inequality generated by the current vector w: Setting the entry
ℓ = perm[j] of w to zero reduces the violation by 2m_ℓ − w_ℓ^2 X̃_ℓℓ and thus
the violation of the resulting vector is d − 2m_ℓ + w_ℓ^2 X̃_ℓℓ.
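The constant-time update can be checked directly. The sketch below (our notation, with ℓ written as l, on hypothetical data) recomputes the violation from scratch after zeroing one entry and compares it with d − 2m_ℓ + w_ℓ^2 X̃_ℓℓ:

```python
def violation(Xt, w):
    """d = w^T Xt w, the violation of the cut generated by w (negative when
    w is, e.g., an eigenvector of Xt for a negative eigenvalue)."""
    n = len(w)
    return sum(w[i] * Xt[i][j] * w[j] for i in range(n) for j in range(n))

def m_vector(Xt, w):
    """m_j = w_j * sum_i w_i Xt_ij, as in the sparsification procedure."""
    n = len(w)
    return [w[j] * sum(w[i] * Xt[i][j] for i in range(n)) for j in range(n)]

# Hypothetical symmetric matrix and dense vector:
Xt = [[ 2.0, -1.0,  0.5],
      [-1.0,  1.0, -2.0],
      [ 0.5, -2.0,  3.0]]
w = [1.0, 2.0, -1.0]
d = violation(Xt, w)
m = m_vector(Xt, w)

l = 1                          # entry to zero out (hypothetical choice)
w2 = list(w); w2[l] = 0.0
d_new = violation(Xt, w2)      # O(n^2) recomputation from scratch
predicted = d - 2.0 * m[l] + w[l] ** 2 * Xt[l][l]   # O(1) update
```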
100 · (RLT − BND) / (RLT − OPT),                       (4.1)
where BND and RLT are the optimal values of the given relaxation and
the EXT+RLT relaxation, respectively, and OPT is either the optimal
value of I or the best known value for a feasible solution. The OPT values
are taken from [14].
4.1. Instances. Tests are performed on a subset of instances from
GLOBALLib [10] and on Box Constrained Quadratic Programs (BoxQPs)
[24]. GLOBALLib contains 413 continuous global optimization problems
of various sizes and types, such as BoxQPs, problems with complemen-
tarity constraints, and general QCQPs. Following [14], we select 160 in-
stances from GLOBALLib having at most 50 variables and that can easily
be formulated as QCQP. The conversion of a non-linear expression into a
quadratic expression, when possible, is performed by adding new variables
and constraints to the problem. Additionally, bounds on the variables are
derived using linear programming techniques and these bounds are included
in the formulation. From these 160 instances in AMPL format, we substi-
tute each bilinear term xi xj by the new variable Xij as described for the
LIFT formulation. We build two collections of linearized instances in MPS
format, one with the original precision on the coefficients and right hand
side, and the second with 8-digit precision. In our experiments we used the
latter.
As observed in [14], using together the SDP and RLT relaxations
yields stronger bounds than those given by the RLT relaxation only for 38
Table 1
Comparison of S1M with S2M at several iterations.
g=1 g=5
Iteration S1M S2M Tie S1M S2M Tie inc. impr.
1 7.87 39.33 52.80 1.12 19.1 79.78 0.00 3.21
2 17.98 28.09 53.93 0.00 10.11 89.89 0.00 2.05
3 17.98 19.10 62.92 1.12 7.87 91.01 0.00 1.50
5 12.36 14.61 73.03 3.37 5.62 91.01 0.00 1.77
10 10.11 13.48 76.41 0.00 5.62 94.38 0.00 1.42
15 4.49 13.48 82.03 1.12 6.74 92.14 0.00 1.12
20 1.12 10.11 78.66 1.12 6.74 82.02 10.11 1.02
30 1.12 8.99 79.78 1.12 5.62 83.15 10.11 0.79
50 2.25 6.74 80.90 1.12 4.49 84.28 10.11 0.47
100 0.00 4.49 28.09 0.00 2.25 30.33 67.42 1.88
200 0.00 3.37 15.73 0.00 2.25 16.85 80.90 2.51
300 0.00 2.25 12.36 0.00 2.25 12.36 85.39 3.30
500 0.00 2.25 7.87 0.00 2.25 7.87 89.88 3.85
1000 0.00 2.25 3.37 0.00 2.25 3.37 94.38 7.43
ent iterations. S2M clearly dominates S1M in the very first iteration and
after 200 iterations, while after the first few iterations S1M also manages
to obtain good bounds. Table 2 gives the comparison between these two
algorithms at different times. For comparisons with g = 1, S1M is better
than S2M only in at most 2.25% of the problems, while the converse varies
between roughly 50% (at early times) and 8% (for late times). For g = 5,
S2M still dominates S1M in most cases.
Sparse cuts yield better bounds than using solely the standard P SD
cuts. The observed improvement is around 3% and 5% respectively for
SPARSE1 and SPARSE2. When the MINOR cuts are also used, this value
rises to 6% and 8% respectively for each type of sparsification algorithm
used. Table 3 compares PSDCUT (abbreviated by S) with S2M. The ta-
ble shows that the sparse cuts generated by the sparsification procedures
and minor P SD cuts yield better bounds than the standard cutting plane
algorithm at fixed iterations. Comparisons performed at fixed times, on
the other hand, show that considering the whole set of instances we do not
get any improvement in the first 60 to 120 seconds of computation (see
Table 4). Indeed S2M initially performs worse than the standard cutting
plane algorithm, but after 60 to 120 seconds, it produces better bounds on
average.
In Section 6 detailed computational results are given in Tables 5 and
6 where for each instance we compare the duality gap closed by S and S2M
at several iterations and times. The initial duality gap is obtained as in
(4.1) as RLT − OP T . We then let S2M run with no time limit until the
Table 2
Comparison of S1M with S2M at several times.
g=1 g=5
Time S1M S2M Tie S1M S2M Tie inc. impr.
0.5 3.37 52.81 12.36 0.00 43.82 24.72 31.46 2.77
1 0.00 51.68 14.61 0.00 40.45 25.84 33.71 4.35
2 0.00 47.19 15.73 0.00 39.33 23.59 37.08 5.89
3 1.12 44.94 14.61 0.00 34.83 25.84 39.33 5.11
5 1.12 43.82 15.73 0.00 38.20 22.47 39.33 6.07
10 1.12 41.58 16.85 0.00 24.72 34.83 40.45 4.97
15 2.25 37.08 16.85 1.12 21.35 33.71 43.82 3.64
20 1.12 35.96 16.85 1.12 17.98 34.83 46.07 3.49
30 1.12 28.09 22.48 1.12 16.86 33.71 48.31 2.99
60 1.12 20.23 28.09 0.00 12.36 37.08 50.56 2.62
120 0.00 15.73 32.58 0.00 10.11 38.20 51.69 1.73
180 0.00 13.49 32.58 0.00 5.62 40.45 53.93 1.19
300 0.00 11.24 31.46 0.00 3.37 39.33 57.30 0.92
600 0.00 7.86 24.72 0.00 0.00 32.58 67.42 0.72
Table 3
Comparison of S with S2M at several iterations.
g=1 g=5
Iteration S S2M Tie S S2M Tie inc. impr.
1 0.00 76.40 23.60 0.00 61.80 38.20 0.00 10.47
2 0.00 84.27 15.73 0.00 55.06 44.94 0.00 10.26
3 0.00 83.15 16.85 0.00 48.31 51.69 0.00 10.38
5 0.00 80.90 19.10 0.00 40.45 59.55 0.00 10.09
10 1.12 71.91 26.97 0.00 41.57 58.43 0.00 8.87
15 1.12 60.67 38.21 1.12 35.96 62.92 0.00 7.49
20 1.12 53.93 40.45 1.12 29.21 65.17 4.50 6.22
30 1.12 34.83 53.93 0.00 16.85 73.03 10.12 5.04
50 1.12 25.84 62.92 0.00 13.48 76.40 10.12 3.75
100 1.12 8.99 21.35 0.00 5.62 25.84 68.54 5.57
200 0.00 5.62 8.99 0.00 3.37 11.24 85.39 7.66
300 0.00 3.37 7.87 0.00 3.37 7.87 88.76 8.86
500 0.00 3.37 5.62 0.00 3.37 5.62 91.01 8.72
1000 0.00 2.25 0.00 0.00 2.25 0.00 97.75 26.00
value s obtained does not improve by at least 0.01% over ten consecutive
iterations. This value s is an upper bound on the value of the PSD+RLT
relaxation. The column “bound” in the tables gives the value of RLT − s
as a percentage of the gap RLT − OPT, i.e., an approximation of the
418 ANDREA QUALIZZA ET AL.
Table 4
Comparison of S with S2M at several times.
g=1 g=5
Time S S2M Tie S S2M Tie inc. impr.
0.5 41.57 17.98 5.62 41.57 17.98 5.62 34.83 -9.42
1 41.57 14.61 5.62 39.33 13.48 8.99 38.20 -8.66
2 42.70 10.11 6.74 29.21 8.99 21.35 40.45 -8.73
3 41.57 8.99 8.99 31.46 6.74 21.35 40.45 -8.78
5 35.96 7.87 15.72 33.71 5.62 20.22 40.45 -7.87
10 34.84 7.87 13.48 30.34 4.50 21.35 43.81 -5.95
15 37.07 5.62 11.24 22.47 2.25 29.21 46.07 -5.48
20 37.07 5.62 8.99 17.98 1.12 32.58 48.32 -4.99
30 30.34 5.62 15.72 11.24 1.12 39.32 48.32 -3.9
60 11.24 12.36 25.84 11.24 2.25 35.95 50.56 -1.15
120 8.99 12.36 24.72 2.25 2.25 41.57 53.93 0.48
180 2.25 14.61 29.21 0.00 4.50 41.57 53.93 1.09
300 0.00 15.73 26.97 0.00 6.74 35.96 57.30 1.60
600 0.00 14.61 13.48 0.00 5.62 22.47 71.91 2.73
the exact bound, SeDuMi requires 408 seconds while S2M requires 2,442
seconds to reach the same precision. Nevertheless, for our purposes, most
of the benefits of the PSD constraints are captured in the early iterations.
Two additional improvements are possible. The first one is to use a
cut separation procedure for the RLT inequalities, avoiding their inclusion
in the initial LP and managing them as other cutting planes. This could
potentially speed up the reoptimization of the LP. Another possibility is
to use a mix of the S and S2M algorithms, using the former in the early
iterations and then switching to the latter.
[Figure: lower bound versus time in seconds for SeDuMi and S2M on two instances; each panel also shows the exact bound as a reference line.]
Table 5
Duality gap closed at several iterations for each instance.
ex9_2_6 16 0 99.88 3.50 99.42 23.09 99.86 62.32 99.88
ex9_2_7 10 0 42.30 0.00 4.59 0.00 27.34 3.14 34.91
spar030-090-3 30 0 100.00 90.65 91.56 99.97 100.00 100.00 100.00
spar030-100-1 30 0 100.00 77.28 83.25 95.20 98.30 99.85 100.00
spar030-100-2 30 0 99.96 76.78 81.65 93.44 96.84 98.70 99.72
spar030-100-3 30 0 99.85 86.82 88.74 97.45 98.75 99.75 99.83
spar040-030-1 40 0 100.00 25.60 41.96 73.59 84.72 99.13 100.00
spar040-030-2 40 0 100.00 30.93 53.39 79.34 95.62 99.46 100.00
spar040-030-3 40 0 100.00 9.21 31.38 66.46 86.62 98.53 100.00
spar050-050-2 50 0 98.74 32.10 41.26 77.48 83.48 - -
spar050-050-3 50 0 98.84 38.57 44.67 80.97 85.36 - -
spar030-070-1 97.99 69.81 60.36 94.50 96.38 97.29 97.73 97.70 97.91 - 97.98
spar030-070-2 100.00 96.05 87.93 - - - - - - - -
spar030-070-3 99.98 96.26 90.42 99.98 99.98 99.98 99.98 - 99.98 - -
spar030-080-1 98.99 83.36 74.42 97.80 98.11 98.74 98.88 98.89 98.96 - 98.99
spar030-080-2 100.00 99.83 96.70 - - - - - - - -
spar030-080-3 100.00 99.88 95.87 - - - - - - - -
spar030-090-1 100.00 92.86 87.69 - - - - - - - -
spar030-090-2 100.00 93.80 88.46 - 100.00 - - - - - -
spar040-100-2 99.87 65.27 26.07 97.34 96.87 98.60 98.98 99.02 99.39 99.44 99.69
spar040-100-3 98.70 61.40 29.61 93.01 93.17 94.91 96.02 95.81 97.00 96.84 97.77
spar050-030-1 100.00 0.37 3.63 54.46 37.52 70.10 73.34 76.87 84.75 86.23 96.33
spar050-030-2 99.27 0.08 2.79 44.68 38.62 59.58 64.94 67.79 74.98 77.02 86.58
spar050-030-3 99.29 0.00 2.75 44.32 32.31 57.13 59.07 62.54 68.99 71.18 82.86
spar050-040-1 100.00 3.76 1.77 69.97 56.87 77.15 78.30 80.31 84.30 84.90 91.79
spar050-040-2 99.39 2.08 2.84 68.64 58.47 77.72 77.61 81.54 83.63 86.40 90.94
spar050-040-3 100.00 1.76 2.31 79.44 65.71 89.73 87.74 92.67 93.00 95.99 97.69
spar050-050-1 93.02 4.91 1.84 60.64 53.28 65.52 66.42 66.81 70.38 68.45 74.76
spar050-050-2 98.74 6.18 3.39 76.56 68.33 82.34 82.21 84.94 86.52 - 91.34
spar050-050-3 98.84 6.12 2.82 79.38 69.23 84.95 83.23 86.99 86.98 89.77 91.57
Average - 51.45 42.96 87.50 86.38 92.14 93.22 93.18 94.77 93.16 95.86
REFERENCES
[1] F.A. Al-Khayyal and J.E. Falk, Jointly constrained biconvex programming.
Math. Oper. Res. 8, pp. 273–286, 1983.
[2] K.M. Anstreicher, Semidefinite Programming versus the Reformulation-
Linearization Technique for Nonconvex Quadratically Constrained Quadratic
Programming, Journal of Global Optimization, 43, pp. 471–484, 2009.
[3] E. Balas, Disjunctive programming: properties of the convex hull of feasible
points. Disc. Appl. Math. 89, 1998.
[4] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty, Nonlinear Programming: The-
ory and Algorithms. Wiley, 2006.
[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University
Press, 2004.
[6] S.A. Burer and A.N. Letchford, On Non-Convex Quadratic Programming with
Box Constraints, SIAM Journal on Optimization, 20, pp. 1073–1089, 2009.
[7] B. Borchers, CSDP, A C Library for Semidefinite Programming, Optimization
Methods and Software 11(1), pp. 613–623, 1999.
[8] COmputational INfrastructure for Operations Research (COIN-OR).
http://www.coin-or.org.
[9] S.J. Benson and Y. Ye, DSDP5: Software For Semidefinite Programming. Avail-
able at http://www-unix.mcs.anl.gov/DSDP.
[10] GamsWorld Global Optimization library,
http://www.gamsworld.org/global/globallib/globalstat.htm.
[11] L. Lovász and A. Schrijver, Cones of matrices and set-functions and 0-1 opti-
mization. SIAM Journal on Optimization, May 1991.
[12] G.P. McCormick, Nonlinear programming: theory, algorithms and applications.
John Wiley & sons, 1983.
[13] http://www.andrew.cmu.edu/user/aqualizz/research/MIQCP.
[14] A. Saxena, P. Bonami, and J. Lee, Convex Relaxations of Non-Convex Mixed
Integer Quadratically Constrained Programs: Extended Formulations, Math-
ematical Programming, Series B, 124(1–2), pp. 383–411, 2010.
[15] , Convex Relaxations of Non-Convex Mixed Integer Quadratically Con-
strained Programs: Projected Formulations, Optimization Online, November
2008. Available at
http://www.optimization-online.org/DB_HTML/2008/11/2145.html
[16] J.F. Sturm, SeDuMi: An Optimization Package over Symmetric Cones. Available
at http://sedumi.mcmaster.ca.
[17] H.D. Sherali and W.P. Adams, A reformulation-linearization technique for solv-
ing discrete and continuous nonconvex problems. Kluwer, Dordrecht 1998.
[18] H.D. Sherali and B.M.P. Fraticelli, Enhancing RLT relaxations via a new class
of semidefinite cuts. J. Global Optim. 22, pp. 233–261, 2002.
[19] N.Z. Shor, Quadratic optimization problems. Tekhnicheskaya Kibernetika, 1,
1987.
[20] K. Sivaramakrishnan and J. Mitchell, Semi-infinite linear programming ap-
proaches to semidefinite programming (SDP) problems. Novel Approaches to
Hard Discrete Optimization, edited by P.M. Pardalos and H. Wolkowicz, Fields
Institute Communications Series, American Math. Society, 2003.
[21] , Properties of a cutting plane method for semidefinite programming, Tech-
nical Report, Department of Mathematics, North Carolina State University,
September 2007.
[22] K.C. Toh, M.J. Todd, and R.H.Tütüncü, SDPT3: A MATLAB software for
semidefinite-quadratic-linear programming. Available at
http://www.math.nus.edu.sg/~mattohkc/sdpt3.html.
EXTENDING A CIP FRAMEWORK TO SOLVE MIQCPS∗
TIMO BERTHOLD† , STEFAN HEINZ† , AND STEFAN VIGERSKE‡
Abstract. This paper discusses how to build a solver for mixed integer quadrati-
cally constrained programs (MIQCPs) by extending a framework for constraint integer
programming (CIP). The advantage of this approach is that we can utilize the full power
of advanced MILP and CP technologies, in particular for the linear relaxation and the
discrete components of the problem. We use an outer approximation generated by lin-
earization of convex constraints and linear underestimation of nonconvex constraints to
relax the problem. Further, we give an overview of the reformulation, separation, and
propagation techniques that are used to handle the quadratic constraints efficiently.
We implemented these methods in the branch-cut-and-price framework SCIP. Com-
putational experiments indicating the potential of the approach and evaluating the im-
pact of the algorithmic components are provided.
∗ Supported by the DFG Research Center Matheon Mathematics for key technologies
in Berlin, http://www.matheon.de.
† Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany ({berthold,
heinz}@zib.de).
‡ Humboldt University Berlin, Unter den Linden 6, 10099 Berlin, Germany
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 427
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_15,
© Springer Science+Business Media, LLC 2012
[Figure: SCIP's main solving loop, with boxes Solve Subproblem, Solve Relax., Cuts, Enforce Constraints, Primal Heuristics, and Branching, edges labeled solved, infeasible, and relax. feasible, and a terminal Stop node.]
CP technologies for handling the linear and the discrete parts of the prob-
lem. The integration of MIQCP is a first step towards the incorporation
of MINLP into the concept of constraint integer programming.
We extended the branch-cut-and-price framework SCIP (Solving Con-
straint Integer Programs) [2, 3] by adding methods for MIQCP. SCIP incor-
porates the idea of CIP and implements several state-of-the-art techniques
for solving MILPs. Due to its plugin-based design, it can be easily cus-
tomized, e.g., by adding problem specific separation, presolving, or domain
propagation algorithms.
The framework SCIP solves CIPs by a branch-and-bound algorithm.
The problem is recursively split into smaller subproblems, thereby creating
a search tree and implicitly enumerating all potential solutions. At each
subproblem, domain propagation is performed to exclude further values
from the variables’ domains, and a relaxation may be solved to achieve
a local lower bound – assuming the problem is to minimize the objective
function. The relaxation may be strengthened by adding further valid
constraints (e.g., linear inequalities), which cut off the optimal solution
of the relaxation. In case a subproblem is found to be infeasible, conflict
analysis is performed to learn additional valid constraints. Primal heuristics
are used as supplementary methods to improve the upper bound. Figure 1
illustrates the main algorithmic components of SCIP. In the context of this
article, the relaxation employed in SCIP is a linear program (LP).
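The solving loop just described can be condensed into a short sketch. This is a deliberate simplification for a minimization problem; the real SCIP loop has many more components (conflict analysis, pseudo solutions, node selection), and every name below is illustrative rather than SCIP's API:

```python
def branch_and_bound(root, propagate, solve_relaxation, is_feasible, branch):
    incumbent, best = None, float('inf')
    stack = [root]                           # open subproblems (the search tree)
    while stack:
        node = propagate(stack.pop())        # domain propagation
        if node is None:
            continue                         # propagation proved infeasibility
        bound, sol = solve_relaxation(node)  # local lower bound from the relaxation
        if bound >= best:
            continue                         # pruned by bound
        if is_feasible(sol):                 # all constraints hold: new incumbent
            incumbent, best = sol, bound
        else:
            stack.extend(branch(node, sol))  # split into smaller subproblems
    return incumbent, best

# Toy use: minimize x^2 over the integers in [-2, 3].
import math
relax = lambda n: ((min(max(0.0, n[0]), n[1])) ** 2, min(max(0.0, n[0]), n[1]))
split = lambda n, s: [(n[0], math.floor(s)), (math.floor(s) + 1, n[1])]
print(branch_and_bound((-2.0, 3.0), lambda n: n, relax,
                       lambda s: float(s).is_integer(), split))  # -> (0.0, 0.0)
```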
The remainder of this article is organized as follows. In Section 2, we
formally define MIQCP and CIP, in Sections 3, 4, and 5, we show how
to handle quadratic constraints inside SCIP, and in Section 6, we present
computational results.
tation, the so-called pseudo solution, see [2, 3] for details, will be used in the case of
unbounded LP relaxations.
isfies all of its constraints. If the solution violates one or more constraints,
the handler may resolve the infeasibility by adding another constraint, per-
forming a domain reduction, or a branching.
For speeding up computation, a constraint handler may further im-
plement additional features like presolving, cutting plane separation, and
domain propagation for its particular class of constraints. Besides that,
a constraint handler can add valid linear inequalities to the initial LP
relaxation. For example, all constraint handlers for (general or specialized)
linear constraints add their constraints to the initial LP relaxation. The
constraint handler for quadratic constraints adds one linear inequality that
is obtained by the method given in Section 3.2 below.
In the following, we discuss the presolving, separation, propagation,
and enforcement algorithms that are used to handle quadratic constraints.
3.1. Presolving. During the presolving phase, a set of reformula-
tions and simplifications are tried. If SCIP fixes or aggregates variables,
e.g., using global presolving methods like dual bound reduction [2], then
the corresponding reformulations will also be realized in the quadratic con-
straints. Bounds on the variables are tightened using the domain prop-
agation method described in Section 3.3. If, due to reformulations, the
quadratic part of a constraint vanishes, it is replaced by the corresponding
linear constraint. Furthermore, the following reformulations are performed.
Binary Variables. A square of a binary variable is replaced by the
binary variable itself. Further, if a constraint contains a product of a binary
variable with a linear term, i.e., x ∑_{i=1}^k a_i y_i, where x is a binary variable,
the y_i are variables with finite bounds, and a_i ∈ Q, i = 1, . . . , k, then this
product will be replaced by a new variable z ∈ R and the linear constraints

y^L x ≤ z ≤ y^U x,
∑_{i=1}^k a_i y_i − y^U (1 − x) ≤ z ≤ ∑_{i=1}^k a_i y_i − y^L (1 − x),

where

y^L := ∑_{i: a_i>0} a_i y_i^L + ∑_{i: a_i<0} a_i y_i^U,  and    (3.1)
y^U := ∑_{i: a_i>0} a_i y_i^U + ∑_{i: a_i<0} a_i y_i^L.

In the case that k = 1 and y_1 is also a binary variable, the product x y_1 can
also be handled by SCIP’s handler for AND constraints [11].
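As a concrete illustration of (3.1), a small helper (hypothetical, not SCIP code) can compute y^L and y^U from the coefficients a_i and the finite bounds of the y_i:

```python
# Compute yL and yU of reformulation (3.1); illustrative helper, not SCIP code.
def linearize_binary_product(a, y_lb, y_ub):
    """a: coefficients a_i; y_lb, y_ub: finite bounds of the y_i.
    With the returned (yL, yU), the product z = x * sum_i a_i*y_i is modeled by
        yL*x <= z <= yU*x   and
        sum_i a_i*y_i - yU*(1-x) <= z <= sum_i a_i*y_i - yL*(1-x)."""
    yL = sum(ai * (lb if ai > 0 else ub) for ai, lb, ub in zip(a, y_lb, y_ub))
    yU = sum(ai * (ub if ai > 0 else lb) for ai, lb, ub in zip(a, y_lb, y_ub))
    return yL, yU

# Example: x * (2*y1 - y2) with y1, y2 in [0, 3] gives yL = -3, yU = 6.
print(linearize_binary_product([2, -1], [0, 0], [3, 3]))  # -> (-3, 6)
```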
Second-Order Cone (SOC) constraints. Constraints of the form

γ + ∑_{i=1}^k (α_i (x_i + β_i))^2 ≤ (α_0 (y + β_0))^2,    (3.2)
to separate x̃. In the important special case that x^T A_i x ≡ a x_j^2 for some
a > 0 and j ∈ I with x̃_j ∉ Z, we generate the cut. For a bilinear term
a x_j x_k with a > 0, the McCormick underestimators are

a x_j x_k ≥ a x_j^L x_k + a x_k^L x_j − a x_j^L x_k^L,
a x_j x_k ≥ a x_j^U x_k + a x_k^U x_j − a x_j^U x_k^U.

If (x_j^U − x_j^L) x̃_k + (x_k^U − x_k^L) x̃_j ≤ x_j^U x_k^U − x_j^L x_k^L and the bounds x_j^L and
x_k^L are finite, the former is used for cut generation; otherwise the latter is
used. If both x_j^L or x_k^L and x_j^U or x_k^U are infinite, we skip separation for
constraint i. Similarly, for a bilinear term a x_j x_k with a < 0, the McCormick
underestimators are

a x_j x_k ≥ a x_j^U x_k + a x_k^L x_j − a x_j^U x_k^L,
a x_j x_k ≥ a x_j^L x_k + a x_k^U x_j − a x_j^L x_k^U.

If (x_j^U − x_j^L) x̃_k − (x_k^U − x_k^L) x̃_j ≤ x_j^U x_k^L − x_j^L x_k^U and the bounds x_j^U and x_k^L
are finite, the former is used for cut generation; otherwise the latter is used.
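The selection rule for a > 0 can be written down directly. The sketch below (illustrative, not SCIP's implementation) returns the chosen underestimator as coefficients (c_j, c_k, c_0) of a cut a·x_j·x_k ≥ c_j x_j + c_k x_k + c_0, assuming all four bounds are finite:

```python
def mccormick_underestimator(a, xj_t, xk_t, xjL, xjU, xkL, xkU):
    """Cut selection for a bilinear term a*x_j*x_k with a > 0 at the point
    (xj_t, xk_t); all bounds are assumed finite here."""
    assert a > 0
    if (xjU - xjL) * xk_t + (xkU - xkL) * xj_t <= xjU * xkU - xjL * xkL:
        # underestimator built from the lower bounds
        return (a * xkL, a * xjL, -a * xjL * xkL)
    # underestimator built from the upper bounds
    return (a * xkU, a * xjU, -a * xjU * xkU)

# On [0,2] x [0,2] at the point (2, 2), the upper-bound cut is selected:
print(mccormick_underestimator(1, 2, 2, 0, 2, 0, 2))  # -> (2, 2, -4)
```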
∑_{j∈J} d_j x_j + ∑_{k∈K} (e_k + p_{k,k} x_k + ∑_{r∈K} p_{k,r} x_r) x_k ∈ [ℓ, u],    (3.5)

where [f_k^L, f_k^U] := [e_k, e_k] + ∑_{r∈K} p_{k,r} [x_r^L, x_r^U]. Computing [h^L, h^U] :=
∑_{j∈J} d_j [x_j^L, x_j^U] + ∑_{k∈K} q(p_{k,k}, [f_k^L, f_k^U], x_k) yields an interval that con-
tains all values that the left hand side of (3.5) can take w.r.t. the current
variables’ domains. If [h^L, h^U] ∩ [ℓ, u] = ∅, then (3.5) cannot be satisfied for
any x ∈ [x^L, x^U] and the current branch-and-bound node can be pruned.
Otherwise, the interval [ℓ, u] can be tightened to [ℓ, u] ∩ [h^L, h^U].
The backward propagation step aims at inferring domain deductions
on the variables in (3.5) using the interval [ℓ, u]. For a “linear” variable x_j,
j ∈ J, we can easily infer the bounds

(1/d_j) ( [ℓ, u] − ∑_{j′∈J, j′≠j} d_{j′} [x_{j′}^L, x_{j′}^U] − ∑_{k∈K} q(p_{k,k}, [f_k^L, f_k^U], x_k) ).
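The forward step can be sketched with plain interval arithmetic. Note that the naive interval product below ignores the dependency between x_k and x_k^2 that the function q(·) handles exactly, so it produces a valid but possibly weaker enclosure; all names are illustrative:

```python
def ivadd(A, B): return (A[0] + B[0], A[1] + B[1])
def ivmul(A, B):
    p = [A[0] * B[0], A[0] * B[1], A[1] * B[0], A[1] * B[1]]
    return (min(p), max(p))
def ivscale(c, A): return (c * A[0], c * A[1]) if c >= 0 else (c * A[1], c * A[0])

def forward_bounds(d, e, p, xbox):
    """d: {j: d_j}; e: {k: e_k}; p: {(k, r): p_kr}; xbox: {var: (lo, hi)}.
    Returns (hL, hU) enclosing the left-hand side of (3.5)."""
    h = (0.0, 0.0)
    for j, dj in d.items():                 # linear part
        h = ivadd(h, ivscale(dj, xbox[j]))
    for k, ek in e.items():                 # quadratic part, term by term
        fk = (ek, ek)
        for (kk, r), pkr in p.items():
            if kk == k:
                fk = ivadd(fk, ivscale(pkr, xbox[r]))
        h = ivadd(h, ivmul(fk, xbox[k]))
    return h

# x1 + x2^2 with x1 in [0,1], x2 in [-1,1]: the naive enclosure is [-1, 2]
# (exact treatment of the square term would give [0, 2]).
print(forward_bounds({'x1': 1.0}, {'x2': 0.0}, {('x2', 'x2'): 1.0},
                     {'x1': (0.0, 1.0), 'x2': (-1.0, 1.0)}))  # -> (-1.0, 2.0)
```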
∑_{j∈J} d_j [x_j^L, x_j^U] + ∑_{k′∈K, k′≠k} q(p_{k′,k′}, [e_{k′}, e_{k′}] + ∑_{r∈K, r≠k} p_{k′,r} [x_r^L, x_r^U], x_{k′})
 + ([e_k, e_k] + ∑_{r∈K} (p_{k,r} + p_{r,k}) [x_r^L, x_r^U]) x_k + p_{k,k} x_k^2 ∈ [ℓ, u].

However, since evaluating the argument of q(·) for each k ∈ K may pro-
duce a huge computational overhead, especially for constraints with many
bilinear terms, we compute the solution set of

∑_{j∈J} d_j [x_j^L, x_j^U] + ∑_{k′∈K, k′≠k} q(p_{k′,k′}, [e_{k′}, e_{k′}], x_{k′}) + ∑_{k′∈K, k′≠k} ∑_{r∈K, r≠k} p_{k′,r} [x_{k′}^L, x_{k′}^U] [x_r^L, x_r^U]
 + ([e_k, e_k] + ∑_{r∈K} (p_{k,r} + p_{r,k}) [x_r^L, x_r^U]) x_k + p_{k,k} x_k^2 ∈ [ℓ, u],    (3.6)
η + (1/η) ∑_{i=1}^k α_i^2 (x̃_i + β_i)(x_i − x̃_i) ≤ α_0 (y + β_0),

where η := (γ + ∑_{i=1}^k (α_i (x̃_i + β_i))^2)^{1/2}. Note that since (x̃, ỹ) violates (4.1),
one has η > α_0 (ỹ + β_0) ≥ 0. For the initial linear relaxation, no inequalities
are added.
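Computing the cut coefficients at a violated point x̃ is cheap. A minimal sketch (illustrative names; only the left-hand side of the cut is assembled, while α_0, β_0 and y are left untouched):

```python
import math

def soc_gradient_cut(gamma, alpha, beta, x_t):
    """Linearize sqrt(gamma + sum_i (alpha_i*(x_i+beta_i))^2) at x_t.
    Returns (coeffs, const): the cut reads
        sum_i coeffs[i]*x_i + const <= alpha0*(y + beta0)."""
    eta = math.sqrt(gamma + sum((a * (xt + b)) ** 2
                                for a, xt, b in zip(alpha, x_t, beta)))
    coeffs = [a * a * (xt + b) / eta for a, xt, b in zip(alpha, x_t, beta)]
    const = eta - sum(c * xt for c, xt in zip(coeffs, x_t))
    return coeffs, const

# One-dimensional check: linearizing |x| at x = 2 yields the cut x <= rhs.
print(soc_gradient_cut(0.0, [1.0], [0.0], [2.0]))  # -> ([1.0], 0.0)
```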
We also experimented with adding a linear outer-approximation as
suggested in [7] a priori, but did not observe computational benefits. Thus,
this option has been disabled for the experiments in Section 6.
each instance, the fastest solution time or – in case all solvers hit the time
limit – the best bounds, are depicted in bold face. Further, for each solver
we calculated the geometric mean of the solution time (in which unsolved
instances are accounted for with the time limit), and collected statistics on
how often a solver solved a problem, computed the best dual bound, found
the best primal solution value, or was the fastest among all solvers.
For our benchmark, we ran SCIP 1.2.0.7 using CPLEX 11.2.1 [19] as
LP solver, Ipopt 3.8 [28] as QCP solver for the heuristics (cf. Section 5),
and LAPACK 3.1.0 to compute eigenvalues. For comparison, we ran BA-
RON 9.0.2 [26], Couenne 0.3 [6], CPLEX 12.1, LindoGlobal 6.0.1 [20], and
MOSEK 6.0.0.55 [23]. Note that BARON, Couenne, and LindoGlobal can
also be applied to general MINLPs. All solvers were run with a time limit
of one hour, a final gap tolerance of 10^{-4}, and a feasibility tolerance of 10^{-6}
on a 2.5 GHz Intel Core2 Duo CPU with 4 GB RAM and 6 MB Cache.
Mixed Integer Quadratic Programs. Table 6 presents the 25 in-
stances from the Miqp test set [22]. We observe that due to the refor-
mulation (3.1), 15 instances could be reformulated as mixed integer linear
programs in the presolving stage.
Table 1 compares the performance of SCIP, BARON, Couenne, and
CPLEX on the Miqp test set. We did not run LindoGlobal since many
of the Miqp instances exceed limitations of our LindoGlobal license. Note
that some of the instances are nonconvex before applying the reformulation
described in Section 3.1, so that we did not apply solvers which have only
been designed for convex problems. Instance ivalues is the only instance
that cannot be handled by CPLEX due to nonconvexity. Altogether, SCIP
performs much better than BARON and Couenne and slightly better than
CPLEX w.r.t. the mean computation time.
Mixed Integer Conic Programs. The Micp test set consists of
three types of optimization problems, see Table 5. The classical_XXX_YY
instances contain one convex quadratic constraint of the form ∑_{j=1}^k x_j^2 ≤ u
for some u ∈ Q, where XXX stands for the dimension k and YY is a problem
index. The robust_XXX_YY instances contain one convex quadratic and one
SOC constraint of dimension k. The shortfall_XXX_YY instances contain
two SOC constraints of dimension k.
Table 2 compares the performance of BARON, Couenne, CPLEX, MO-
SEK, LindoGlobal, and SCIP on the Micp test set. We observe that on
this specific test set SCIP outperforms BARON, Couenne, and LindoGlobal.
It solves one instance more but is about 20% slower than the commercial
solvers CPLEX and MOSEK.
Mixed Integer Quadratically Constrained Programs. The in-
stances lop97ic, lop97icx, pb302035, pb351535, qap, and qapw were
transformed into MILPs by presolving – which is due to the reformula-
tion (3.1). The instances nuclear*, space25, space25a, and waste are
Table 1
Results on Miqp instances. Each entry shows the number of seconds to solve a
problem, or the bounds obtained after the one hour limit.
Table 2
Results on Micp instances. Each entry shows the number of seconds to solve a problem, or the bounds obtained after the one hour limit.
shortfall_50_0 [−1.098, −1.095] [−1.102, −1.095] 1913.00 [−1.104, −1.095] 405.93 1602.81
shortfall_50_1 91.84 [−1.103, −1.099] 13.13 [−1.104, −1.102] 21.73 15.44
shortfall_100_0 [−1.12, −1.114] [−1.126, −1.102] [−1.132, −1.112] [−1.125, −1.114] [−1.116, −1.114] [−1.121, −1.114]
shortfall_100_1 [−1.109, −1.106] [−1.113, −1.091] 3301.75 [−1.113, −1.105] [−1.111, −1.106] 2152.56
shortfall_200_0 [−1.149, −1.12] [−1.15, −1.094] [−1.146, −1.125] [−1.479, −1.08] [−1.146, −1.126] [−1.149, −1.119]
shortfall_200_1 [−1.15, −1.131] [−1.152, −1.101] [−1.15, −1.133] [−1.361, −1.089] [−1.151, −1.135] [−1.148, −1.134]
mean time 1202.41 2092.28 226.932 1552.67 228.392 288.956
#solved 8 4 14 8 14 15
#best dual bound 8 4 19 8 16 16
#best primal sol. 13 9 17 12 19 17
#fastest 0 0 8 0 4 3
Table 3
Results on Minlp instances. Each entry shows the number of seconds to solve a
problem, or the bounds obtained after the one hour limit.
after 4000 seconds for pb351535 and for pb302035, pb351535, respectively.
Further, no bounds were reported in the log file.
CPLEX can be applied to 11 instances of this test set. The clay* and
du-opt* instances were solved within seconds; 4 times CPLEX was fastest,
4 times SCIP was fastest. For the instances pb302035, pb351535, and qap,
CPLEX found good primal solutions, but very weak lower bounds.
variate quadratic functions, cf. Section 3.1, the QCP local search heuristic,
cf. Section 5, and the extended RENS heuristic, cf. Section 5.
For each method, we evaluate the performance only on those instances
from the test sets Miqp, Micp, and Minlp where the method to evaluate
may have an effect (e.g., disabling the reformulation (3.1) is only evaluated
on instances where this reformulation can be applied). The results are
summarized in Table 4. For a number of performance measures we report
the relative change caused by disabling a particular method.
We observe that deactivating one of the methods always leads to more
deteriorations than improvements for both the dual and primal bounds at
termination. Except for one instance in the case of switching off binary
reformulations, the number of solved instances remains equal or decreases.
Recognizing SOC constraints and convexity allows us to solve instances
of those special types much faster. Disabling domain propagation or one
of the primal heuristics yields a small improvement w.r.t. computation
time for easy instances, but results in weaker bounds for those instances
which could not be solved within the time limit. We further observed that
switching off the QCP local search heuristic increases the time until the first
feasible solution is found by 93% and the time until the optimal solution
is found by 26%. For RENS, the numbers are 12% and 43%, respectively.
Therefore, we still consider applying these techniques to be worthwhile.
7. Conclusions. In this paper, we have shown how a framework for
constraint integer programming can be extended towards a solver for gen-
eral MIQCPs. We implemented methods to correctly handle the quadratic
Table 4
Relative impact of implemented MIQCP methods. Percentages in columns 3–9 are
relative to the size of the test set. Percentage in mean time column is relative to the
mean time of SCIP with default settings.
REFERENCES
[27] J.P. Vielma, S. Ahmed, and G.L. Nemhauser, A lifted linear programming
branch-and-bound algorithm for mixed integer conic quadratic programs, IN-
FORMS J. Comput., 20 (2008), pp. 438–450.
[28] A. Wächter and L.T. Biegler, On the implementation of a primal-dual interior
point filter line search algorithm for large-scale nonlinear programming, Math.
Program., 106 (2006), pp. 25–57.
APPENDIX
In this section, detailed problem statistics are presented for the three
test sets, Micp (Table 5), Miqp, and Minlp (both in Table 6). The
columns belonging to “original problem” state the structure of the origi-
nal problem. The “presolved problem” columns show statistics about the
MIQCP after SCIP has applied its presolving routines, including the ones
described in Section 3.1. The columns “vars”, “int”, and “bin” show the
number of all variables, the number of integer variables, and the num-
ber of binary variables, respectively. The columns “linear”, “quad”, and
“soc” show the number of linear, quadratic, and second-order cone con-
straints, respectively. The column “conv” indicates whether all quadratic
constraints of the presolved MIQCP are convex or whether at least one of
them is nonconvex.
Table 5
Statistics of instances in Micp test set.
Table 6
Statistics of instances in Miqp (first part) and Minlp (second part) test set.
PART VII:
Combinatorial Optimization
COMPUTATION WITH POLYNOMIAL EQUATIONS AND
INEQUALITIES ARISING IN
COMBINATORIAL OPTIMIZATION
JESUS A. DE LOERA∗ , PETER N. MALKIN† , AND PABLO A. PARRILO‡
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 447
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_16,
© Springer Science+Business Media, LLC 2012
where g^α := ∏_{i=1}^m g_i^{α_i}, and a polynomial s(x) ∈ R[x] is SOS if it can be
written as a sum of squares of other polynomials, that is, s(x) = ∑_i q_i^2(x)
for some q_i(x) ∈ R[x]. We note that the cone of G is also called a preordering
generated by G in [36]. If s(x) is SOS, then clearly s(x) ≥ 0 for all x ∈ R^n.
The sum in the definition of cone(G) is finite, with a total of 2^m terms,
corresponding to the subsets of {g_1, . . . , g_m}.
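For small m the 2^m products g^α can simply be enumerated. A minimal sketch that evaluates them at a point, given the numeric values g_i(x):

```python
from itertools import product

def cone_term_values(g_values):
    """g_values: the numbers g_1(x), ..., g_m(x) at a fixed point x.
    Returns {alpha: g^alpha} with one product per subset of G (2^m terms)."""
    out = {}
    for alpha in product((0, 1), repeat=len(g_values)):
        val = 1
        for ai, gi in zip(alpha, g_values):
            if ai:
                val *= gi
        out[alpha] = val
    return out

# Two inequalities with values g1(x) = 2, g2(x) = -3 give the 4 products:
print(cone_term_values([2, -3]))
# -> {(0, 0): 1, (0, 1): -3, (1, 0): 2, (1, 1): -6}
```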
The notions of ideal and cone are standard in algebraic geometry,
but they also have inherent convex geometry: Ideals are affine sets and
cones are closed under convex combinations and non-negative scalings, i.e.,
they are actually cones in the convex geometry sense. Ideals and cones
are used for deriving new valid constraints, which are logical consequences
of the given constraints. For example, notice that by construction, every
polynomial in ideal({f1 , . . . , fm }) vanishes in the solution set of the system
f1 (x) = 0, . . . , fm (x) = 0 over the algebraic closure of K. Similarly, every
element of cone({g1 , ..., gm }) is clearly non-negative on the feasible set of
g1 (x) ≥ 0, . . . , gm (x) ≥ 0 over R.
It is well-known that optimization algorithms are intimately tied to the
development of infeasibility certificates. For example, the simplex method
is closely related to Farkas’ lemma. Our starting point is a generalization of
this famous principle. We start with a description of two powerful infeasi-
bility certificates for polynomial systems which generalize the classical ones
for linear optimization. First, as motivation, recall from elementary linear
algebra the “Fredholm alternative theorem” (e.g., see page 28 Corollary
3.1.b in [46]):
Theorem 1.1 (Fredholm’s alternative). Given a matrix A ∈ K^{m×n}
and a vector b ∈ K^m,

∄ x ∈ K^n s.t. Ax + b = 0  ⇔  ∃ μ ∈ K^m s.t. μ^T A = 0, μ^T b = 1.
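A tiny numeric illustration (my example, not from the text): the system x + 1 = 0, x − 1 = 0, i.e., A = (1, 1)^T and b = (1, −1), is infeasible, and μ = (1/2, −1/2) is a certificate:

```python
from fractions import Fraction

# Infeasible system x + 1 = 0, x - 1 = 0 and its Fredholm certificate mu.
A = [[Fraction(1)], [Fraction(1)]]
b = [Fraction(1), Fraction(-1)]
mu = [Fraction(1, 2), Fraction(-1, 2)]

muTA = [sum(mu[i] * A[i][j] for i in range(2)) for j in range(1)]
muTb = sum(mu[i] * b[i] for i in range(2))
print(muTA, muTb)  # mu^T A = [0] and mu^T b = 1, as the theorem requires
```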
It turns out that there are much stronger versions for general polynomials,
which unfortunately do not seem to be widely known among optimizers
(for more details see e.g., [5]).
Theorem 1.2 (Hilbert’s Nullstellensatz). Let F := {f_1, . . . , f_m} ⊆
K[x]. Then,

∄ x ∈ K̄^n s.t. f_1(x) = 0, . . . , f_m(x) = 0  ⇔  1 ∈ ideal(F).
∄ x ∈ R^n s.t. Ax + b = 0, Cx + d ≥ 0
⇕
∃ λ ∈ R^m_+, ∃ μ ∈ R^k s.t. μ^T A + λ^T C = 0, μ^T b + λ^T d = −1.
Again, although not widely known in optimization, it turns out that similar
certificates do exist for non-linear systems of polynomial equations and
inequalities over the reals. The result essentially appears in this form in [2]
and is due to Stengle [49].
Theorem 1.4 (Positivstellensatz). Let F := {f1 , . . . , fm } ⊂ R[x] and
G := {g1 , . . . , gk } ⊂ R[x].
The theorem states that for every infeasible system of polynomial equa-
tions and inequalities, there exists a simple polynomial identity of the form
∑_{i=1}^m β_i f_i + ∑_{α∈{0,1}^k} s_α g^α = −1 for some β_i, s_α ∈ R[x] where the s_α are SOS.
f := x_2 + x_1^2 + 2 = 0,    g := x_1 − x_2^2 + 3 ≥ 0.
μ_1 f_1 + μ_2 f_2 + μ_3 f_3 + μ_4 f_4 = 1
⇔ μ_1 (x_1^2 − 1) + μ_2 (2x_1x_2 + x_3) + μ_3 (x_1 + x_2) + μ_4 (x_1 + x_3) = 1
⇔ μ_1 x_1^2 + 2μ_2 x_1x_2 + (μ_2 + μ_4) x_3 + μ_3 x_2 + (μ_3 + μ_4) x_1 − μ_1 = 1.

Then, equating coefficients on the left and right hand sides of the equation
above gives the following linear system of equations:
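Setting up and checking that system is mechanical. The sketch below (illustrative; exact arithmetic over Q) encodes one row per monomial x_1^2, x_1x_2, x_3, x_2, x_1, 1 and shows the system is inconsistent, so no certificate with constant multipliers μ_i exists and the multiplier degree must be increased:

```python
from fractions import Fraction as F

def solvable(M, rhs):
    """Gaussian elimination over Q; False iff M*mu = rhs is inconsistent."""
    rows = [[F(x) for x in r] + [F(c)] for r, c in zip(M, rhs)]
    r = 0
    for col in range(len(rows[0]) - 1):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return all(any(v != 0 for v in row[:-1]) or row[-1] == 0 for row in rows)

M = [[ 1, 0, 0, 0],   # x1^2 :  mu1        = 0
     [ 0, 2, 0, 0],   # x1x2 :  2*mu2      = 0
     [ 0, 1, 0, 1],   # x3   :  mu2 + mu4  = 0
     [ 0, 0, 1, 0],   # x2   :  mu3        = 0
     [ 0, 0, 1, 1],   # x1   :  mu3 + mu4  = 0
     [-1, 0, 0, 0]]   # 1    : -mu1        = 1
rhs = [0, 0, 0, 0, 0, 1]
print(solvable(M, rhs))  # -> False
```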
This system has two solutions, (x_1, x_2, x_3) = (1, −1, 2) and (x_1, x_2, x_3) =
(−1, 1, 2). Let F = {f_1, f_2, f_3}. So, we abbreviate the above system as
F(x) = 0. We can replace the monomials 1, x_1, x_2, x_3, x_1^2, x_1x_2 with the
variables λ_1, λ_{x_1}, λ_{x_2}, λ_{x_3}, λ_{x_1^2}, λ_{x_1x_2} respectively. The system F(x) = 0
thus gives rise to the following set of linear equations:
[37, 44, 25]. The basic idea behind the FPNulLA algorithm is that, if
1 ∉ F, then instead of replacing F with F⁺ and thereby increasing deg(F),
we check to see whether there are any new polynomials in F⁺ with degree
at most deg(F) that were not in F and add them to F, and then check
again whether 1 ∈ F. More formally, if 1 ∉ F, then we replace F with
F⁺ ∩ K[x]_d where K[x]_d is the set of all polynomials with degree at most
d = deg(F). We keep replacing F with F⁺ ∩ K[x]_d until either 1 ∈ F or
we reach a fixed point, F = F⁺ ∩ K[x]_d. This process must terminate.
Note that if we find that 1 ∈ F at some stage of FPNulLA, this implies
that there exists an infeasibility certificate of the form 1 = ∑_{i=1}^s β_i f_i where
β_1, ..., β_s ∈ K[x] and the polynomials f_1, ..., f_s ∈ K[x] are a vector space
basis of the original set F.
Moreover, we can also improve NulLA by proving that the system
F (x) = 0 is feasible well before reaching the Nullstellensatz bound as fol-
lows. When 1 ∈ F and F = F + ∩ K[x]d , then we could set F ← F + and
d ← d + 1 and repeat the above process. However, when we reach the
fixed point F = F + ∩ K[x]d , we can use the following theorem to determine
if the system is feasible and if so how many solutions it has. First, we
introduce some notation. Let πd : K[[x]] → K[[x]]d be the truncation or
projection of a power series onto a polynomial of degree at most d with coef-
ficients in K. Below, we abbreviate dim(πd (F ◦ )) as dimd (F ◦ ) and similarly
dim(πd−1 (F ◦ )) as dimd−1 (F ◦ ).
Theorem 2.2. Let F ⊂ K[x] be a finite dimensional vector space and
let d = deg(F ). If F = F + ∩ K[x]d and dimd (F ◦ ) = dimd−1 (F ◦ ), then
dim(I ◦ ) = dimd (F ◦ ) where I = ideal(F ).
See the Appendix for a proof of Theorem 2.2, or see the original proof in
[37]. There are many equivalent forms of the above theorem that appear
in the literature (see e.g., [37, 44, 25]).
Recall from Theorem 2.1, that there are dim(I ◦ ) solutions of F (x) = 0
over K including multiplicities where I = ideal(F ) and exactly dim(I ◦ ) so-
lutions when I is radical. Checking the fixed point condition in FPNulLA
whether F = F + ∩ K[x]d is equivalent to checking whether dim(F ) =
dim(F + ∩ K[x]d ). Furthermore, to check the condition that dimd (F ◦ ) =
dimd−1 (F ◦ ), we need to compute dim(F + ∩ K[x]d ) and dim(F ∩ K[x]d−1 )
1 + x + x² = 0,   1 + y + y² = 0,   x² + xy + y² = 0.

F⁺ = F + xF + yF
   = F + span({x + x² + x³, x + xy + xy², x³ + x²y + xy²})
       + span({y + xy + x²y, y + y² + y³, x²y + xy² + y³}).
and
Fig. 1. Non-3-colorable graphs with NulLA rank 1 and 2 and FPNulLA rank 1.
(The plot shows the percentage of infeasible instances against the edge probability
p ∈ [0, 0.1], with curves for the Exact, NulLA1, NulLA2, and FPNulLA1 approaches.)
G(100, p) above. The NulLA rank one and FPNulLA rank one approaches
ran on average in less than a second for all p values. However, the exact
approach using zchaff ran in split second times for all p values, but pre-
liminary computational experiments indicate that the gap in running times
between the exact approach and the FPNulLA rank one approach closes
for larger graphs. The NulLA rank two approach ran on average in less
than a second for p ≤ 0.04 and p ≥ 0.08, but the average running times
peaked at about 24 seconds at p = 0.065. Interestingly, for each approach,
the average running time peaked at the transition from feasible to infeasi-
ble at the p value where about half of the graphs were proven infeasible by
the approach.
In order to better understand the practical implications of the NulLA
and FPNulLA approaches, more detailed computational studies need to be
performed comparing them with the exact method using satisfiability and
with other exact approaches such as traditional integer programming
techniques. See [8] for some additional experimental data.
2.5. Application: The structure of non-3-colorable graphs. In
this section, we state a combinatorial characterization of those graphs
whose combinatorial system of equations encoding 3-colorability has a
NulLA rank of one, thus giving a class of polynomially solvable graphs by
Lemma 2.1. We also recall bounds for the NulLA rank (see [35]):
Theorem 2.3. The NulLA rank for a polynomial encoding over F2 of
the 3-colorability of a graph with n vertices with no 3-coloring is at least
one and at most 2n. Moreover, in the case of a non-3-colorable graph
containing an odd-wheel (e.g. a 4-clique) as a subgraph, the NulLA rank is
exactly one.
Now we look at those non-3-colorable graphs that have a NulLA rank of
one. Let A denote the set of all possible directed edges or arcs in the graph
G. We are interested in two types of substructures of the graph G: oriented
partial-3-cycles and oriented chordless 4-cycles (see Figure 2). An oriented
partial-3-cycle is a set of two arcs of a 3-cycle, that is, a set {(i, j), (j, k)}
also denoted (i, j, k), where (i, j), (j, k), (k, i) ∈ A. An oriented chordless
4-cycle is a set of four arcs {(i, j), (j, l), (l, k), (k, i)}, also denoted (i, j, k, l),
where (i, j), (j, l), (l, k), (k, i) ∈ A and (j, k), (i, l) ∉ A.
Fig. 2. (i) oriented partial 3-cycle and (ii) an oriented chordless 4-cycle.
C := {(1, 2, 3, 7), (2, 3, 4, 8), (3, 4, 5, 9), (4, 5, 1, 10), (1, 10, 11, 7),
(2, 6, 11, 8), (3, 7, 11, 9), (4, 8, 11, 10), (5, 9, 11, 6)}.
Figure 4 illustrates the edge directions for the 4-cycles of C. Each undi-
rected edge of the graph is contained in exactly two 4-cycles, so C satisfies
Condition 1 of Theorem 2.4. Now,
and |C_{(i,j)}| ≡ 0 (mod 2) for all other arcs (i, j) ∈ A where i < j. Thus,

    Σ_{(i,j)∈A, i<j} |C_{(i,j)}| ≡ 1 (mod 2),
all the constraint polynomials have degree one. As we have seen earlier
in the Positivstellensatz (Theorem 1.4 above), the emptiness of a basic
semialgebraic set can be certified through an algebraic identity involving
sums of squares of polynomials.
The connection between sum of squares decompositions of polynomi-
als and convex optimization can be traced back to the work of N. Z. Shor
[48]. His work went relatively unnoticed for several years, until several au-
thors, including Lasserre, Nesterov, and Parrilo, observed, around the year
2000, that the existence of sum of squares decompositions and the search
for infeasibility certificates for a semialgebraic set can be addressed via a
sequence of semidefinite programming relaxations [23, 40, 41, 39]. The first
part of this section will be a short description of the connections between
sums of squares and semidefinite programming, and how the Positivstellensatz
allows, in an analogous way to what was presented in Section 2 for
the Nullstellensatz, for a systematic way to formulate these semidefinite
relaxations.
A very central preoccupation of combinatorial optimizers has been the
understanding of the facets that describe the integer hull (normally binary)
of a combinatorial problem. As we will see later on, one can recover quite
a bit of information about the integer hull of combinatorial problems from
a sequence of combinatorially controlled SDPs. This kind of approach was
pioneered in the lift-and-project method of Balas, Ceria and Cornuéjols
[1], the matrix-cut method of Lovász and Schrijver [34] and the lineariza-
tion technique of Sherali-Adams [47]. Here we try to present more recent
developments (see [30] and references therein for a very extensive survey).
Notice that the membership test in the main loop of the algorithm
is, by the results described at the beginning of this section, equivalent to
a finite-sized semidefinite program. Similarly to the Nullstellensatz case,
plus the condition λ1 > 0 (without loss of generality, we can take λ1 = 1).
The first semidefinite constraint arises from linearizing the square of an
arbitrary degree one polynomial, while the other two constraints are the
direct linearization of the original equality and inequality constraints. The
resulting problem is a semidefinite program, and in this case, its infeasibility
directly shows that the original system of polynomial inequalities does not
have a solution.
An appealing geometric interpretation follows from considering the
projection of the feasible set of these relaxations in the space of original
variables (i.e., λxi ). For the linear algebra relaxations of Section 2.2, we ob-
tain outer approximations to the affine hull of the solution set (an algebraic
variety), while the SDP relaxation described here constructs outer approx-
imations to the convex hull of the corresponding semialgebraic set. This
latter viewpoint will be discussed in Section 3.3, for the case of equations
arising from combinatorial problems.
2. The ideal I is THk -exact if the k-th theta body THk (I) coincides
with the closure of conv(VR (I)).
3. The theta-rank of I is the smallest k such that THk (I) coincides
with the closure of conv(VR (I)).
Example 13. Consider the ideal I = ⟨x²y − 1⟩ ⊂ R[x, y]. Then
conv(VR(I)) = {(p1, p2) ∈ R² : p2 > 0}, and any linear polynomial that
is non-negative over VR(I) is of the form α + βy, where α, β ≥ 0. Since
αy + β ≡ (√α·xy)² + (√β)² mod I, I is (1, 2)-sos and TH2-exact.
Example 14. For the case of the stable sets of a graph G, one can
see that
TH1(IG) = { y ∈ Rⁿ : ∃ M ⪰ 0, M ∈ R^{(n+1)×(n+1)} such that
M00 = 1, M0i = Mi0 = Mii = yi ∀ i ∈ V, and Mij = 0 ∀ {i, j} ∈ E }.
It is known that TH1 (IG ) is precisely Lovász’s theta body of G. The ideal
IG is TH1 -exact precisely when the graph G is perfect.
By definition, TH1 (I) ⊇ TH2 (I) ⊇ · · · ⊇ conv(VR (I)). As seen in
Example 13, conv(VR (I)) may not always be closed and so the theta-body
sequence of I can converge, if at all, only to the closure of conv(VR (I)).
But the good news for combinatorial optimization is that there is plenty of
good behavior for problems arising from a finite set of possible solutions.
3.4. Application: cuts and exact finite sets. We discuss now a
few important combinatorial examples. As we have seen in Section 2.5 for
3-colorability, and in the preceding section for stable sets, in some special
cases it is possible to give nice combinatorial characterizations of when
low-degree certificates can exactly recognize infeasibility. Here are a few
additional results for the real case:
Example 15. For the max-cut problem we saw earlier, the defining
vanishing ideal is I(SG) = ⟨x_e² − x_e ∀ e ∈ E, x_T ∀ T an odd cycle in G⟩.
In this case one can prove that the ideal I(SG) is TH1 -exact if and only
if G is a bipartite graph. In general the theta-rank of I(SG) is bounded
above by the size of the max-cut in G. There is no constant k such that
THk (I(SG)) = conv(SG), for all graphs G. Other formulations of max-cut
are studied in [14].
Recall that when S ⊂ Rn is a finite set, its vanishing ideal I(S) is
zero-dimensional and real radical (see [36] Section 12.5 for a definition of
the real radical). In what follows, we say that a finite set S ⊂ Rn is exact
if its vanishing ideal I(S) ⊆ R[x] is TH1 -exact.
Theorem 3.2 ([15]). For a finite set S ⊂ Rn , the following are
equivalent.
1. S is exact.
2. There is a finite linear inequality description of conv(S) in which
for every inequality g(x) ≥ 0, g is 1-sos mod I(S).
3. There is a finite linear inequality description of conv(S) such that
for every inequality g(x) ≥ 0, every point in S lies either on the
hyperplane g(x) = 0 or on a unique parallel translate of it.
4. The polytope conv(S) is affinely equivalent to a compressed lattice
polytope (every reverse lexicographic triangulation of the polytope
is unimodular with respect to the defining lattice).
Example 16. The vertices of the following 0/1-polytopes in Rn are
exact for every n: (1) hypercubes, (2) (regular) cross polytopes, (3) hyper-
simplices (includes simplices), (4) joins of 2-level polytopes, and (5) stable
set polytopes of perfect graphs on n vertices.
More strongly one can say the following.
Proposition 3.1. Suppose S ⊆ Rn is a finite point set such that for
each facet F of conv(S) there is a hyperplane HF such that HF ∩conv(S) =
F and S is contained in at most t + 1 parallel translates of HF . Then I(S)
is THt -exact.
In [15] the authors show that theta bodies can be computed explicitly
as projections to the feasible set of a semidefinite program. These SDPs are
constructed using the combinatorial moment matrices introduced by [29].
APPENDIX
A. Proofs. This appendix contains proofs of some of the results used in
the main body of the paper that are either hard to find or whose original
proof, available elsewhere, is not written in the language of this survey.
In order to formally prove Theorem 2.2, we need to further formalize
some of the notions of Section 2. The space K[[x]] is isomorphic to
the dual vector space of K[x] consisting of all linear functionals on K[x],
that is, K[[x]] ≅ Hom(K[x], K). We choose to use K[[x]] instead of
Hom(K[x], K) as the dual vector space of K[x] (see e.g., [24]) because
using K[[x]] makes clearer the linearization of a system of polynomial
equations. The map τ : K[[x]] → Hom(K[x], K) where (τ(λ))(g) =
Σ_{α∈N^n} λ_α g_α = λ ∗ g for all λ ∈ K[[x]] and g ∈ K[x] is an isomorphism,
and the inverse map is τ⁻¹(ψ) = Σ_{α∈N^n} ψ(x^α) x^α for all
ψ ∈ Hom(K[x], K). For a given set F ⊆ K[x], there is an analogue of the
annihilator F◦ in the context of the dual vector space Hom(K[x], K) as
follows: Ann(K[x], F) := {ψ ∈ Hom(K[x], K) : ψ(f) = 0, ∀f ∈ F}. Note
that F◦ ≅ Ann(K[x], F) since τ(F◦) = Ann(K[x], F).
Lemma A.1. Let F ⊆ K[x] be a vector subspace and k ∈ N. Then,
dim(K[x]k /(F ∩ K[x]k )) = dimk (F ◦ ).
Proof. We know from Theorem 3.14 in [45] that Ann(K[x], F ∩ K[x]k) =
Ann(K[x], F) + Ann(K[x], K[x]k); thus, (F ∩ K[x]k)◦ = F◦ + (K[x]k)◦, and
(B⁺)^{[∗,d]} ⊆ (F ⊕ B)^{[∗,d]} = F ⊕ B.

But 1 ∈ B, so K[x]d ⊆ (B⁺)^{[∗,d]}, which then implies K[x]d = F ⊕ B.
Then, since B ⊆ K[x]d−1 , we also have K[x]d−1 = (F ∩ K[x]d−1 ) ⊕ B, and
thus, dim(K[x]d /F ) = dim(B) = dim(K[x]d−1 /F ), which is the stopping
criterion of the outer loop.
Now, we show that the NulLA and FPNulLA algorithms run in polynomial
time in the bit-size of the input data when the Nullstellensatz degree is
assumed to be fixed. To begin, note that the number of monomials x^α
with deg(x^α) ≤ k is the binomial coefficient C(n+k, k), which is O(n^k).
Proof. (of Lemma 2.1). Let d = deg(F ). First note that by definition
(see section 2.1 in [46]) the input size of the defining basis {f1 , f2 , . . . , fm }
of F equals O(cmn^d) where c is the average bit-size of the coefficients in
the basis.
For the proof of (1), observe that in the k th iteration of Algorithm 1
(when the F + operation has increased the degree of F by k), we solve a
system of linear equations Ak x = bk to find coefficients of the Nullstellen-
satz certificate in Step 2 of Algorithm 1. The rows of Ak consist of vectors
of coefficients of all polynomials of the form xα fi where i = 1, . . . m and
deg(x^α) ≤ k. Therefore, Ak has O(mn^k) rows and each row has input size
O(cn^{d+k}). Hence, the input size of Ak is O(cmn^{d+2k}). The input size
of bk, which is a vector of zeros and ones, is O(mn^k). Thus, the input size
of the linear system Ak x = bk is O(cmn^{d+2k}), which is polynomial in the
input size of the basis of F and n, and thus, the system can be solved in
polynomial time (see e.g. Theorem 3.3 of [46]). The complexity of the first
L iterations is thus bounded by L times the complexity of the Lth iteration,
which is polynomial in L, n and the input size of the defining basis of F .
This completes the proof of the first part.
We now prove part (2). Denote by Fk the vector space computed at
the start of the kth outer loop iteration. Let {g1 , . . . gmk } be a basis of Fk
which was given to us either as an input or from the previous iteration.
Observe that deg(Fk) = d + k, so each basis polynomial of Fk has bit
size O(log₂(|K|) n^{d+k}). Note that dim(Fk) = mk ≤ O(n^{d+k}); therefore the
bit size of the entire basis {g1, . . . , gmk} is M = O(log₂(|K|) n^{2(d+k)}). Note
that M is polynomial in the input size of the initial basis f1, . . . , fm.
Now we proceed to analyze the cost of the kth iteration, meaning steps
3 to 10 in the pseudocode. As in part (1), Step 3 involves solving a linear
system of size M; thus it can be done in polynomial time. In Step 4 we
check whether dim(Fk ) = dim(Fk+ ∩ K[x]d+k ), which involves computing
a basis of Fk+ ∩ K[x]d+k . Note that Fk+ has bit size (n + 1)M , and to
compute the desired basis we perform Gaussian elimination on a matrix of
size (n + 1)M , which is polynomial time. If dim(Fk ) = dim(Fk+ ∩ K[x]d+k ),
REFERENCES
[15] J. Gouveia, P.A. Parrilo, and R.R. Thomas, Theta bodies for polynomial ideals,
SIAM Journal on Optimization, 20 (2010), pp. 2097–2118.
[16] D. Grigoriev and N. Vorobjov, Complexity of Nullstellensatz and Positivstel-
lensatz proofs, Annals of Pure and Applied Logic, 113 (2002), pp. 153–160.
[17] D. Henrion and J.-B. Lasserre, GloptiPoly: Global optimization over polyno-
mials with MATLAB and SeDuMi, ACM Trans. Math. Softw., 29 (2003),
pp. 165–194.
[18] , Detecting global optimality and extracting solutions in GloptiPoly, in Posi-
tive polynomials in control, Vol. 312 of Lecture Notes in Control and Inform.
Sci., Springer, Berlin, 2005, pp. 293–310.
[19] T. Hogg and C. Williams, The hardest constraint problems: a double phase
transition, Artif. Intell., 69 (1994), pp. 359–377.
[20] A. Kehrein and M. Kreuzer, Characterizations of border bases, Journal of Pure
and Applied Algebra, 196 (2005), pp. 251 – 270.
[21] A. Kehrein, M. Kreuzer, and L. Robbiano, An algebraist’s view on border bases,
in Solving Polynomial Equations: Foundations, Algorithms, and Applications,
A. Dickenstein and I. Emiris, eds., Vol. 14 of Algorithms and Computation in
Mathematics, Springer Verlag, Heidelberg, 2005, ch. 4, pp. 160–202.
[22] J. Kollár, Sharp effective Nullstellensatz, Journal of the AMS, 1 (1988),
pp. 963–975.
[23] J. Lasserre, Global optimization with polynomials and the problem of moments,
SIAM J. on Optimization, 11 (2001), pp. 796–817.
[24] J. Lasserre, M. Laurent, and P. Rostalski, Semidefinite characterization and
computation of zero-dimensional real radical ideals, Found. Comput. Math.,
8 (2008), pp. 607–647.
[25] , A unified approach to computing real and complex zeros of zero-
dimensional ideals, in Emerging Applications of Algebraic Geometry, M. Puti-
nar and S. Sullivant, eds., vol. 149 of IMA Volumes in Mathematics and its
Applications, Springer, 2009, pp. 125–155.
[26] J.B. Lasserre, An explicit equivalent positive semidefinite program for nonlinear
0-1 programs, SIAM J. on Optimization, 12 (2002), pp. 756–769.
[27] M. Laurent, A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre
relaxations for 0–1 programming, Math. Oper. Res., 28 (2003), pp. 470–496.
[28] , Semidefinite relaxations for max-cut, in The Sharpest Cut: The Impact
of Manfred Padberg and His Work, M. Grötschel, ed., Vol. 4 of MPS-SIAM
Series in Optimization, SIAM, 2004, pp. 257–290.
[29] , Semidefinite representations for finite varieties, Mathematical Program-
ming, 109 (2007), pp. 1–26.
[30] , Sums of squares, moment matrices and optimization over polynomials, in
Emerging Applications of Algebraic Geometry, M. Putinar and S. Sullivant,
eds., Vol. 149 of IMA Volumes in Mathematics and its Applications, Springer,
2009, pp. 157–270.
[31] J. Löfberg, YALMIP: A toolbox for modeling and optimization in MATLAB, in
Proceedings of the CACSD Conference, Taipei, Taiwan, 2004.
[32] L. Lovász, Stable sets and polynomials, Discrete Math., 124 (1994), pp. 137–153.
[33] , Semidefinite programs and combinatorial optimization, in Recent advances
in algorithms and combinatorics, B. Reed and C. Sales, eds., Vol. 11 of CMS
Books in Mathematics, Springer, New York, 2003, pp. 137–194.
[34] L. Lovász and A. Schrijver, Cones of matrices and set-functions and 0-1 opti-
mization, SIAM J. Optim., 1 (1991), pp. 166–190.
[35] S. Margulies, Computer Algebra, Combinatorics, and Complexity: Hilbert’s Null-
stellensatz and NP-Complete Problems, PhD thesis, UC Davis, 2008.
[36] M. Marshall, Positive polynomials and sums of squares, Vol. 146 of Mathematical
Surveys and Monographs, American Mathematical Society, Providence, RI,
2008.
[37] B. Mourrain, A new criterion for normal form algorithms, in Proc. AAECC,
Vol. 1719 of LNCS, Springer, 1999, pp. 430–443.
[38] B. Mourrain and P. Trébuchet, Stable normal forms for polynomial system
solving, Theoretical Computer Science, 409 (2008), pp. 229 – 240. Symbolic-
Numerical Computations.
[39] Y. Nesterov, Squared functional systems and optimization problems, in High
Performance Optimization, J.F. et al., eds., Kluwer Academic, 2000,
pp. 405–440.
[40] P.A. Parrilo, Structured semidefinite programs and semialgebraic geometry meth-
ods in robustness and optimization, PhD thesis, California Institute of Tech-
nology, May 2000.
[41] , Semidefinite programming relaxations for semialgebraic problems, Mathe-
matical Programming, 96 (2003), pp. 293–320.
[42] P.A. Parrilo and B. Sturmfels, Minimizing polynomial functions, in Proceed-
ings of the DIMACS Workshop on Algorithmic and Quantitative Aspects
of Real Algebraic Geometry in Mathematics and Computer Science (March
2001), S. Basu and L. Gonzalez-Vega, eds., American Mathematical Society,
Providence RI, 2003, pp. 83–100.
[43] S. Prajna, A. Papachristodoulou, P. Seiler, and P.A. Parrilo, SOSTOOLS:
Sum of squares optimization toolbox for MATLAB, 2004.
[44] G. Reid and L. Zhi, Solving polynomial systems via symbolic-numeric reduction
to geometric involutive form, Journal of Symbolic Computation, 44 (2009),
pp. 280–291.
[45] S. Roman, Advanced Linear Algebra, Vol. 135 of Graduate Texts in Mathematics,
Springer New York, third ed., 2008.
[46] A. Schrijver, Theory of linear and integer programming, Wiley, 1986.
[47] H. Sherali and W. Adams, A hierarchy of relaxations between the continuous
and convex hull representations for zero-one programming problems, SIAM
Journal on Discrete Mathematics, 3 (1990), pp. 411–430.
[48] N.Z. Shor, Class of global minimum bounds of polynomial functions, Cybernetics,
23 (1987), pp. 731–734.
[49] G. Stengle, A Nullstellensatz and a Positivstellensatz in semialgebraic geometry,
Mathematische Annalen, 207 (1973), pp. 87–97.
[50] H. Stetter, Numerical Polynomial Algebra, SIAM, 2004.
[51] L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Review, 38
(1996), pp. 49–95.
[52] L. Zhang, zchaff v2007.3.12. Available at http://www.princeton.edu/~chaff/
zchaff.html, 2007.
MATRIX RELAXATIONS
IN COMBINATORIAL OPTIMIZATION
FRANZ RENDL∗
(COP ) z ∗ = min{c(F ) : F ∈ F }.
P := conv{xF : F ∈ F }
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 483
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_17,
© Springer Science+Business Media, LLC 2012
484 FRANZ RENDL
z ∗ = min{cT xF : F ∈ F } = min{cT x : x ∈ P }.
The first minimization is over the finite set F , the second one is a linear
program. This is the basic principle underlying the polyhedral approach to
solve combinatorial optimization problems. The practical difficulty lies in
the fact that in general the polyhedron P is not easily available. We recall
two classical examples to illustrate this point.
As a nice example, we consider first the linear assignment problem.
For a given n × n matrix C = (cij), it consists of finding a permutation φ
of N = {1, . . . , n} such that Σ_{i∈N} c_{iφ(i)} is minimized. The set of all such
permutations is denoted by Π. In our general setting, we define the ground
set to be E = N ×N , the set of all ordered pairs (i, j). Feasible solutions are
now given through permutations φ as Fφ ⊂ E such that Fφ = {(i, φ(i)) :
i ∈ N }. In this case the characteristic vector of Fφ is the permutation
matrix Xφ given by (Xφ )ij = 1 if and only if j = φ(i). Birkhoff’s theorem
tells us that the convex hull of the set of permutation matrices Π is the set
of doubly stochastic matrices Ω = {X : Xe = Xᵀe = e, X ≥ 0}.
Theorem 1.1. conv{Xφ : φ ∈ Π} = Ω.
Hence we have a simple polyhedral description of P in this case. There-
fore
min{ Σ_{i} c_{iφ(i)} : φ ∈ Π } = min{ ⟨C, X⟩ : X ∈ Ω },
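This identity is easy to check numerically. The sketch below is our illustration (assuming SciPy is available; the instance data is made up): it solves the linear program over the Birkhoff polytope Ω with `scipy.optimize.linprog` and compares the optimum with brute-force enumeration of Π.

```python
import numpy as np
from itertools import permutations
from scipy.optimize import linprog

C = np.array([[4.0, 1.0, 3.0],    # a small made-up cost matrix
              [2.0, 0.0, 5.0],
              [3.0, 2.0, 2.0]])
n = C.shape[0]

# LP over the Birkhoff polytope: min <C, X> s.t. Xe = e, X^T e = e, X >= 0,
# with X flattened row-major into n^2 variables.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # i-th row sum = 1
    A_eq[n + i, i::n] = 1.0            # i-th column sum = 1
lp = linprog(C.ravel(), A_eq=A_eq, b_eq=np.ones(2 * n), bounds=(0, None))

# Brute force over all permutations phi.
best = min(sum(C[i, p[i]] for i in range(n)) for p in permutations(range(n)))
print(lp.fun, best)   # both equal 5.0: the LP optimum is attained at a vertex of Omega
```

The two optima coincide precisely because, by Birkhoff's theorem, the vertices of Ω are the permutation matrices.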
MATRIX RELAXATIONS 485
max{eᵀx : x ∈ FSTAB(G)}.
K∗ := {y ∈ R^d : ⟨x, y⟩ ≥ 0 ∀x ∈ K}.
It is well known, and can easily be shown, that both N and S⁺ are self-dual.
The dual cone of C is the cone

C∗ = conv{aaᵀ : a ≥ 0} = {Y : ∃X ≥ 0, Y = XXᵀ}

of completely positive matrices. Membership in S⁺ can be checked in
polynomial time, for instance through the existence (or non-existence) of
the Cholesky decomposition. In contrast, it is NP-complete to decide if a
matrix does not belong to C, see [49]. While positive semidefinite matrices
are covered in any reasonable textbook on advanced linear algebra, we refer
the reader to [4] for a thorough treatment of completely positive matrices
and to [33] for a recent survey on copositive matrices.
Graphs: A graph G = (V, E) is given through the set of vertices V and
the set of edges E. We sometimes write E(G) to indicate the dependence
on G. If S ⊂ V , we denote by δ(S) := {uv ∈ E : u ∈ S, v ∉ S} the set of
edges joining S and V \ S. We also say that the edges in δ(S) are cut by S.
2. Matrix relaxations: basic ideas. The classical polyhedral ap-
proach is formulated as a relaxation in Rn , the natural space to embed F .
Here n denotes the cardinality of E, |E| = n. Matrix-based relaxations of
(COP) are easiest explained as follows. To an element xF ∈ F we associate
the matrix xF xFᵀ and consider the set M := conv{xF xFᵀ : F ∈ F}. Note that

diag(xF xFᵀ) = xF,

because xF ∈ {0, 1}ⁿ. This property immediately shows that the original
linear relaxation, obtained through a partial description of P can also be
modeled in this setting. The full power of matrix lifting is based on the
possibility to constrain M to matrix cones other than polyhedral ones.
Moreover, quadratic constraints on xF will turn into linear constraints on
matrices in M.
If K is some matrix cone and matrices C, A1, . . . , Am and b ∈ Rᵐ are
given, the problem

inf{ ⟨C, X⟩ : ⟨Ai, X⟩ = bi (i = 1, . . . , m), X ∈ K }        (2.2)

is called a linear program over K. Linear programs over S⁺ are also called
semidefinite programs (SDP) and those over C or C∗ are called copositive
programs (CP) for short. In this paper we will mostly concentrate on SDP
and CP relaxations of combinatorial optimization problems.
The duality theory of linear programming generalizes easily to conic
linear programs. The (Lagrangian) dual associated to (2.2) is given as
sup{ bᵀy : C − Σᵢ yᵢAᵢ ∈ K∗ }.        (2.3)
Weak duality (sup ≤ inf) holds by construction of the dual. Strong du-
ality (sup = inf), as well as attainment of the respective optima requires
some sort of regularity of the feasible regions. We refer to Duffin [17] for
the original paper, and to the handbook [70] on semidefinite programming
for a detailed discussion of SDP. The existence of feasible points in the
interior of the primal and dual cone insures the following characterization
of optimality. For ease of notation we write A(X) = b for the equations
in (2.2). The linear operator A has an adjoint AT , defined through the
adjoint identity
We should point out that the inner product on the left is in Rm and on the
right it is in the space of n × n matrices. In this paper the inner products
will always be canonical, so we do not bother to overload the notation to
distinguish them. The adjoint can be expressed as Aᵀ(y) = Σᵢ yᵢAᵢ.
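The adjoint identity is a one-line computation to verify numerically (a throwaway check with random symmetric data, not tied to any particular relaxation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A_mats = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
y = rng.standard_normal(m)

A_of_X = np.array([np.sum(Ai * X) for Ai in A_mats])   # A(X)_i = <A_i, X>
AT_of_y = sum(yi * Ai for yi, Ai in zip(y, A_mats))    # A^T(y) = sum_i y_i A_i

# <A(X), y> in R^m equals <X, A^T(y)> in the space of n x n matrices.
assert np.isclose(A_of_X @ y, np.sum(X * AT_of_y))
```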
Theorem 2.1. [17, 70] Suppose there exists X0 ∈ int(K) such that
A(X0 ) = b and there is y0 such that C − AT (y0 ) ∈ int(K ∗ ). Then the
optima in (2.2) and (2.3) are attained. Moreover, X and y are optimal if
and only if A(X) = b, X ∈ K, Z := C − Aᵀ(y) ∈ K∗ and the optimal
objective values coincide, i.e., ⟨X, Z⟩ = 0.
Matrix relaxations can be used in several ways to better understand
(COP). The seminal work of Goemans and Williamson [23] opened the way
to new approximation techniques for some COPs. We will briefly explain
them as we develop the various relaxations. From a computational point of
view, SDP based relaxations pose a serious challenge to existing algorithms
for SDP. We will describe the currently most efficient ways to solve these
relaxations (at least approximately).
There exist several recent survey papers devoted to the connection be-
tween semidefinite optimization and integer programming. The interested
reader is referred to [39] for an extensive summary on the topic covering
the development until 2003. The surveys by Lovász [43], Goemans [22] and
Helmberg [29] all focus on the same topic, but also reflect the scientific
interests and preferences of the respective authors. The present paper is
no exception to this principle. The material selected, and also omitted,
reflects the author’s subjective view on the subject. It is a continuation
and an extension of [57].
L := diag(Ae) − A. (3.1)
The following simple properties of the Laplacian L will be used later on.
Proposition 3.1. The Laplacian L of the matrix A satisfies Le = 0
and A ≥ 0 implies that L ⪰ 0.
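Both properties are immediate to verify numerically (a throwaway check on a made-up weighted graph):

```python
import numpy as np

A = np.array([[0.0, 2.0, 1.0],   # symmetric nonnegative weights, zero diagonal
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
e = np.ones(3)
L = np.diag(A @ e) - A           # L = diag(Ae) - A, as in (3.1)

print(L @ e)                          # the zero vector: Le = 0
print(np.linalg.eigvalsh(L).min())    # smallest eigenvalue is 0 up to roundoff: L is psd
```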
Graph partition problems ask to separate the vertices of a graph into a
specified number of partition blocks so that the total weight of edges joining
different blocks is minimized or maximized. Partition problems lead rather
naturally to matrix based relaxations because encoding whether or not
vertices i and j ∈ V are separated has a natural matrix representation,
as we will see briefly. We recall the definition of a cut given by S ⊂ V :
δ(S) = {uv ∈ E(G) : u ∈ S, v ∉ S}.
3.1. Max-k-Cut. For k ≥ 2, Max-k-Cut asks to partition V (G) into
k subsets (S1 , . . . , Sk ) such that the total weight of edges joining distinct
subsets is maximized. We introduce characteristic vectors si ∈ {0, 1}n for
each Si . The n × k matrix S = (s1 , . . . , sk ) is called the k-partition matrix.
Since ∪ᵢ Sᵢ = V , we have

Σ_{i=1}^{k} sᵢ = Se = e.
We note that diag(M) = Σᵢ λᵢ diag(sᵢsᵢᵀ) = Σᵢ λᵢsᵢ = e. The Schur-
complement lemma shows that M − (1/t) eeᵀ ⪰ 0.
Proposition 3.2 clearly follows with λi = 1. We recall that δ(Si ) de-
notes the set of edges joining Si to V \ Si . A simple calculation using basic
properties of the Laplacian L shows that
sᵢᵀLsᵢ = Σ_{uv∈δ(Sᵢ)} a_{uv}        (3.2)
gives the weight of all edges cut by Si . Therefore the total weight of all
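Identity (3.2) can be checked directly (a throwaway example on a made-up weighted 4-vertex graph):

```python
import numpy as np

A = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 0.0, 4.0],
              [1.0, 0.0, 0.0, 1.0],
              [0.0, 4.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A       # Laplacian of A
s = np.array([1.0, 0.0, 1.0, 0.0])   # characteristic vector of S = {0, 2}

# Weight of delta(S), summed directly over ordered pairs u in S, v not in S:
cut = sum(A[u, v] for u in range(4) for v in range(4) if s[u] == 1 and s[v] == 0)
print(cut, s @ L @ s)   # both equal 3.0, as identity (3.2) predicts
```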
edges joining distinct subsets is given by
(1/2) Σᵢ sᵢᵀLsᵢ = (1/2) ⟨S, LS⟩.

The factor 1/2 comes from the fact that an edge uv ∈ E(G) with u ∈ Sᵢ
and v ∈ Sⱼ appears in both δ(Sᵢ) and δ(Sⱼ). Thus Max-k-Cut can be
modeled as
max (1/2) ⟨S, LS⟩
such that the n × k matrix S has entries 0 or 1 and Se = e. After replacing
SSᵀ by Y , we get the following SDP relaxation

z_{GP−k} := max{ (1/2) ⟨L, Y⟩ : diag(Y) = e, kY − J ∈ S⁺, Y ∈ N }.        (3.3)
The conditions diag(Y) = e and kY − J ∈ S⁺ are derived from Proposition
3.2. Note in particular that Y ⪰ 0 is implied by kY − J ⪰ 0. The
standard SDP formulation, see [19, 16], is obtained through the variable
transformation
X = (1/(k−1)) [kY − J]

and yields, with ⟨J, L⟩ = 0,

max{ ((k−1)/(2k)) ⟨L, X⟩ : diag(X) = e, X ∈ S⁺, x_{ij} ≥ −1/(k−1) }.        (3.4)
3.2. Max-Cut. The special case of Max-k-Cut with k = 2 is usually
simply called Max-Cut, as the task is to separate V into S and V \ S so as
to maximize the weight of edges in δ(S). In view of (3.2) we clearly have
Note the use of (3.6). Later on, Nesterov [51] generalizes this result to
the more general case where only L ⪰ 0 is assumed. The analysis in this
case shows that the expected value of the cut y obtained from hyperplane
rounding is at least

(1/4) yᵀLy ≥ (2/π) z_MC ≈ 0.636 z_MC.
Frieze and Jerrum [19] generalize the hyperplane rounding idea to Max-
k-Cut. Starting again from the Gram representation X = V T V with
unit vectors vi forming V , we now take k independent random vectors
r1 , . . . , rk ∈ Rn for rounding. The idea is that partition block Sh contains
those vertices i which have vi most parallel to rh ,
i ∈ Sh ⇐⇒ viT rh = max{viT rl : 1 ≤ l ≤ k}.
Ties are broken arbitrarily. For the computation of the probability that
two vertices are in the same partition block, it is useful to assume that
the entries of the r_i are drawn independently from the standard normal
distribution:
Pr(v_s, v_t ∈ S_1) = Pr(v_s^T r_1 = max_i v_s^T r_i, v_t^T r_1 = max_i v_t^T r_i).
The symmetry properties of the normal distribution imply that this prob-
ability depends only on ρ = v_s^T v_t. We denote the resulting probability
by I(ρ). Therefore
Pr(v_s and v_t not separated) = k·I(ρ).
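The rounding step itself takes only a few lines. The sketch below uses random unit vectors as stand-ins for an actual SDP solution X = V^T V; the graph size, random seed, and use of numpy are illustrative assumptions, not part of the text.

```python
import numpy as np

# Sketch of Frieze-Jerrum rounding for Max-k-Cut: given unit vectors
# v_1,...,v_n (columns of V from X = V^T V), draw k independent
# standard normal vectors r_1,...,r_k and put vertex i into the block
# h maximizing v_i^T r_h.  The vectors below are random placeholders
# standing in for an actual SDP solution.
rng = np.random.default_rng(0)
n, k = 8, 3
V = rng.normal(size=(n, n))
V /= np.linalg.norm(V, axis=0)        # unit columns v_1,...,v_n

R = rng.normal(size=(n, k))           # columns r_1,...,r_k
scores = V.T @ R                      # scores[i, h] = v_i^T r_h
block = scores.argmax(axis=1)         # ties broken arbitrarily by argmax

partition = [np.flatnonzero(block == h).tolist() for h in range(k)]
print(partition)                      # a valid k-partition of the vertices
```

For k = 2 and the sign of v_i^T r_1 this reduces to the familiar hyperplane rounding for Max-Cut.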
x_i x_j = 0 ∀ij ∈ E(G).
... = e^T x. We collect the equations x_ij =
0 ∀ij ∈ E(G) in the operator equation A_G(X) = 0. Therefore we get the
semidefinite programming upper bound α(G) ≤ ϑ(G),
problem has been the starting point for many quite far reaching theoretical
investigations. It is beyond the scope of this paper to explain them in
detail, but here are some key results.
Grötschel, Lovász and Schrijver [25] show that α(G) can be computed
in polynomial time for perfect graphs. This is essentially a consequence
of the tractability of computing ϑ(G), and of the fact that α(G) = ϑ(G) holds
for perfect graphs G. We do not explain the concept of perfect graphs here,
but refer for instance to [25, 60]. It is, however, a prominent open problem to
provide a polynomial-time algorithm for computing α(G) on perfect graphs
that is purely combinatorial (i.e., not making use of ϑ(G)).
The Stable-Set problem provides a good example for approximations
based on other matrix relaxations. Looking at (4.1), we can additionally
ask that X ∈ N . In this case the individual equations x_ij = 0 ∀ij ∈ E(G)
can be added into a single equation Σ_{ij∈E(G)} x_ij = 0. If we use A_G for the
adjacency matrix of G, this means
"AG , X# = 0.
We will see shortly that the optimal value of the copositive program on the
right hand side is in fact equal to α(G). This result is implicitly contained
in Bomze et al. [8] and was stated explicitly by de Klerk and Pasechnik
[15]. A simple derivation can be obtained from the following theorem of
Motzkin and Straus [48]. We recall that Δ := {x ∈ Rn : x ≥ 0, eT x = 1}.
Theorem 4.1. [48] Let AG be the adjacency matrix of a graph G.
Then
1/α(G) = min{ x^T (A_G + I) x : x ∈ Δ }. (4.4)
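As a small sanity check of the Motzkin–Straus identity, one can brute-force α(G) for a tiny graph and evaluate the quadratic form at the uniform vector over a maximum stable set, where the minimum 1/α(G) is attained. The 5-cycle below is an illustrative choice.

```python
from itertools import combinations

# Motzkin-Straus check on the 5-cycle: the minimum over the simplex of
# x^T (A_G + I) x equals 1/alpha(G), attained at the uniform vector
# over a maximum stable set.
n = 5
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}

def is_stable(S):
    return all(tuple(sorted(p)) not in edges for p in combinations(S, 2))

alpha = max(len(S) for r in range(n + 1)
            for S in combinations(range(n), r) if is_stable(S))

S = next(S for S in combinations(range(n), alpha) if is_stable(S))
x = [1.0 / alpha if i in S else 0.0 for i in range(n)]

# q(x) = x^T (A_G + I) x = sum_i x_i^2 + 2 * sum_{ij in E} x_i x_j
q = sum(xi * xi for xi in x) + 2 * sum(x[i] * x[j] for (i, j) in edges)
print(alpha, q)   # alpha = 2 and q = 0.5 = 1/alpha
```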
But weak duality for conic linear programs also shows that
Finally, any matrix X of the form (4.1) is feasible for the sup-problem,
hence
Combining the last 3 inequalities, we see that equality must hold through-
out, the infimum is attained at λ = α(G) and the supremum is attained
at (4.1) with x being a characteristic vector of a stable set of size α(G).
Hence we have shown the following result.
Theorem 4.2. [15] Let G be a graph. Then
optimal solution of (4.5), so χ_f(G) = Σ_i λ_i. Since Σ_i λ_i x_i = e, we can
apply Lemma 3.1. The matrix
M = Σ_i λ_i x_i x_i^T (4.6)
therefore satisfies
χf (G)M − J ∈ S + , diag(M ) = e.
Strictly speaking, this is not a linear SDP, because both t and M are vari-
ables, but it can easily be linearized by introducing a new matrix variable
Y for tM and asking that diag(Y ) = te. The resulting problem is the
dual of
which is equal to ϑ(G). Thus we have shown the Lovász ’sandwich theo-
rem’. In [42], the weaker upper bound χ(G) is shown for ϑ(G), but it is
quite clear that the argument goes through also with χf (G).
Theorem 4.3. [42] Let G be a graph. Then α(G) ≤ ϑ(G) ≤ χf (G).
Let us now imitate the steps leading from ϑ(G) to the copositive
strengthening of the stability number from the previous section. The cru-
cial observation is that M from (4.6) lies in C^∗, and therefore tM ∈ C^∗.
This leads to the following conic problem, involving matrices both in
S + and C ∗
t∗ := min{t : diag(M ) = e, AG (M ) = 0, tM − J ∈ S + , M ∈ C ∗ }.
Using again Lemma 3.1 we see that M from (4.6) is feasible for this problem,
therefore
t∗ ≤ χf (G).
is clearly equivalent to finding bw(G). Blum et al. relax the difficult part
of bijectively mapping V to {P_1, . . . , P_n}. Let us define the constraints
Simply solving
into the SDP. Blum et al. [7] consider the following strengthened problem,
which is equivalent to an SDP, once we make the change of variables X =
V^T V
how these equations are derived, we start out with the quadratic equation
S T S = M . At this point the Kronecker product of two matrices P and Q,
given by P ⊗ Q := (p_ij Q), is extremely useful. If P, Q and S have suitable
size, and s = vec(S), the following identity is easy to verify:
(P ⊗ Q) vec(S) = vec(Q S P^T).
Using it, we see that (S^T S)_ij = e_i^T S^T S e_j = ⟨e_j e_i^T ⊗ I, ss^T⟩. The symmetry
of ss^T allows us to replace e_j e_i^T by B_ij := (1/2)(e_j e_i^T + e_i e_j^T). Therefore
(S^T S)_ij = M_ij becomes
⟨B_ij ⊗ J_n, Y⟩ = m_i m_j , 1 ≤ i ≤ j ≤ 3. (5.11)
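The step ⟨B_ij ⊗ I, ss^T⟩ = (S^T S)_ij can be tested numerically. The numpy sketch below assumes column-wise vec (consistent with the Kronecker identity above); the dimensions and random data are arbitrary.

```python
import numpy as np

# Check <B_ij (kron) I, s s^T> = (S^T S)_ij with s = vec(S) (columns
# stacked), for a random S; B_ij symmetrizes e_j e_i^T as in the text.
rng = np.random.default_rng(1)
k, n = 3, 4
S = rng.normal(size=(n, k))
s = S.flatten(order="F")              # column-wise vec(S)

for i in range(k):
    for j in range(k):
        ei, ej = np.eye(k)[i], np.eye(k)[j]
        B = 0.5 * (np.outer(ej, ei) + np.outer(ei, ej))
        lhs = s @ np.kron(B, np.eye(n)) @ s   # <B (kron) I, s s^T>
        assert abs(lhs - (S.T @ S)[i, j]) < 1e-9
print("identity verified")
```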
We will now see that M_QAP has a 'simple' description by linear equations
intersected with the cone C^∗. Matrices Y ∈ M_QAP are of order n² × n².
It will be useful to consider the following partitioning of Y into n × n block
matrices Y^{i,j}:
      ⎛ Y^{1,1} · · · Y^{1,n} ⎞
Y  =  ⎜    ...   ...    ...   ⎟ .
      ⎝ Y^{n,1} · · · Y^{n,n} ⎠
Similarly, (X^T X)_ij = x_i^T x_j = tr(x_i x_j^T) = tr(Y^{i,j}) = δ_ij. Finally, we have
(Σ_ij x_ij)² = n² for any permutation matrix X. We get the following set
of constraints for Y:
Σ_i Y^{i,i} = I ,  tr(Y^{i,j}) = δ_ij ,  ⟨J, Y⟩ = n² . (6.1)
Povh and Rendl [55] show the following characterization of M_QAP, which
can be viewed as a lifted version of Birkhoff's Theorem 1.1.
Theorem 6.1. M_QAP = {Y : Y ∈ C^∗, Y satisfies (6.1)}.
It is not hard to verify that the above result would be wrong without
the seemingly redundant equation ⟨J, Y⟩ = n². We can therefore formulate
the quadratic problem QAP as a (linear but intractable) copositive program
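The constraints (6.1) are easy to verify on lifted permutation matrices. The numpy sketch below (column-wise stacking of x = vec(X) is an assumption matching the block structure above) builds Y = xx^T for a random permutation matrix and checks (6.1).

```python
import numpy as np

# For a permutation matrix X, lift x = vec(X) (columns stacked) and set
# Y = x x^T; then block Y^{i,j} = x_i x_j^T and the constraints (6.1)
# hold: sum_i Y^{i,i} = I, tr(Y^{i,j}) = delta_ij, <J, Y> = n^2.
n = 4
perm = np.random.default_rng(2).permutation(n)
X = np.eye(n)[perm].T                 # a random n x n permutation matrix
x = X.flatten(order="F")              # stack the columns x_1,...,x_n
Y = np.outer(x, x)

blocks = {(i, j): Y[i*n:(i+1)*n, j*n:(j+1)*n]
          for i in range(n) for j in range(n)}
sum_diag = sum(blocks[i, i] for i in range(n))

assert np.allclose(sum_diag, np.eye(n))            # sum_i Y^{i,i} = I
assert all(abs(np.trace(blocks[i, j]) - (i == j)) < 1e-12
           for i in range(n) for j in range(n))    # tr Y^{i,j} = delta_ij
assert abs(Y.sum() - n ** 2) < 1e-9                # <J, Y> = n^2
print("constraints (6.1) hold")
```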
X_ii = x_i ∀i ∈ I,  [ 1  x^T ; x  X ] ∈ C^∗ }.
P ∈ P_G := { P : P = P^T, P ∈ Ω, p_ij = 0 ∀ ij ∉ E(G) }.
What can be said about the speed of convergence? There are several ways
to measure the distance of π(t) from the equilibrium distribution π(∞) =
π = (1/n)e. One such measure is the maximum relative error at time t,
r(t) := max_{ij} |(P^t)_ij − π_j| / π_j ,
see for instance [3]. Let
μP := max{λ2 (P ), −λn (P )} = max{|λi (P )| : i > 1}
denote the second largest eigenvalue of P in modulus (SLEM). It is well
known that μP is closely related to how fast π(t) converges to the equilib-
rium distribution (1/n)e.
Theorem 7.2. [3] Let P be a symmetric irreducible transition matrix.
Then
r(t) ≤ n (μ_P)^t .
Moreover r(t) ≥ (μ_P)^t if t is even.
Given the graph G, we can ask how to select the transition
probabilities p_ij > 0 for ij ∈ E(G) in such a way that the mixing rate of
the Markov chain is as fast as possible. In view of the bounds from the
previous theorem, it makes sense to consider the following optimization
problem (in the matrix variable P), see Boyd et al. [9]:
min{ μ_P : P ∈ P_G }.
They show that μP is in fact a convex function. It follows from the Perron-
Frobenius theorem and the spectral decomposition theorem for symmetric
matrices that μ_P is either the smallest or the largest eigenvalue of P − (1/n)J
in absolute value. Hence we can determine μ_P as the solution of the following
SDP, see [9]:
min{ s : sI ⪰ P − (1/n)J ⪰ −sI, P ∈ P_G }. (7.1)
The variables are s and the matrix P . In [9], it is shown that an optimal
choice of P may significantly increase the mixing rate of the resulting chain.
Some further extensions of this idea are discussed in [64].
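For a concrete instance, the sketch below computes μ_P for the simple random walk on the 5-cycle (an illustrative choice of graph and transition probabilities) and checks the bounds of Theorem 7.2 at an even time t.

```python
import numpy as np

# SLEM of the simple random walk on the 5-cycle and the mixing bounds
# of Theorem 7.2: (mu_P)^t <= r(t) <= n * (mu_P)^t for even t.
n = 5
P = np.zeros((n, n))
for i in range(n):                    # symmetric transition matrix
    P[i, (i + 1) % n] = P[i, (i - 1) % n] = 0.5

eigs = np.sort(np.linalg.eigvalsh(P))[::-1]       # 1 = lambda_1 >= ...
mu = max(eigs[1], -eigs[-1])                      # SLEM

# mu is also the largest absolute eigenvalue of P - (1/n) J
assert abs(mu - np.abs(np.linalg.eigvalsh(P - np.ones((n, n)) / n)).max()) < 1e-9

t = 10                                            # even time step
Pt = np.linalg.matrix_power(P, t)
r_t = np.abs(Pt - 1.0 / n).max() / (1.0 / n)      # maximum relative error
assert mu ** t - 1e-9 <= r_t <= n * mu ** t + 1e-9
print(round(mu, 3), round(r_t, 4))
```

Solving (7.1) would replace the uniform probabilities 1/2 above by optimized ones; the check here only illustrates the definitions and bounds.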
8. Computational progress. Up to now we have mostly concen-
trated on the modeling power of SDP and CP. From a practical point of
view, it is also important to investigate the algorithmic possibilities to ac-
tually solve the relaxations. Solving copositive programs is at least as hard
as general integer programming, hence we only consider solving SDPs, which
are tractable. Before we describe various algorithms to solve SDPs, we recall
the basic assumptions from Theorem 2.1,
∃ X_0 ≻ 0, y_0 such that A(X_0) = b, C − A^T(y_0) ≻ 0. (8.1)
if and only if
Table 1
Interior-point computation times to solve (3.7) with relative accuracy 10^−6. Here
m = n.
    n     time (secs.)
  1000         12
  2000        102
  3000        340
  4000        782
  5000       1570
Table 2
Interior-point computation times to solve (4.2) with relative accuracy 10^−6, m =
(1/2)·(n choose 2) and m = 5n.
    n    m = (1/2)·(n choose 2)   time (secs.)        n    m = 5n   time (secs.)
   100           2488                  12            500    2500         14
   150           5542                 125           1000    5000        120
   200           9912                 600           1500    7500        410
The convergence analysis shows that under suitable parameter settings
it takes O(√n) Newton iterations to reach a solution with the required
accuracy. Typically, the number of such iterations is not too large, often
only a few dozen, but both the memory requirements (a dense m×m matrix
has to be handled) and the computation times grow rapidly with n and m.
To give some impression, we provide in Table 1 some sample timings to
solve the basic relaxation for Max-Cut, see (3.7). It has m = n rather
simple constraints x_ii = 1. We also consider computing the theta number
ϑ(G) (4.2), see Table 2. Here the computational effort is also influenced
by the cardinality |E(G)|. We consider dense graphs (m = (1/2)·(n choose 2)) and
sparse graphs (m = 5n). In the first case, the number n of vertices cannot
be much larger than about 200; in the second case we can go to much
larger graphs. Looking at these timings, it is quite clear that interior-point
methods will become impractical once n ≈ 3000 or m ≈ 5000.
There have been attempts to overcome working explicitly with the
dense system matrix of order m. Toh [66] for instance reports quite en-
couraging results for larger problems by iteratively solving the linear system
for the search direction. A principal drawback of this approach lies in the
fact that the system matrix gets ill-conditioned, as one gets close to the
optimum. This implies that high accuracy is not easily reachable. We also
mention the approach from Kocvara and Stingl [38], which uses a modified
’barrier function’ and also handles large-scale problems. Another line of
research to overcome some of these limitations consists in exploiting spar-
sity in the data. We refer to [20, 50] for some first fundamental steps in
this direction.
onto L_D. Thus Π_D(Z) = C + A^T(y) with y = (AA^T)^{−1} A(Z − C). Finally,
the projection onto the hyperplane LC is trivial. Thus one can use alter-
nate projections to solve SDP. Take a starting point (X, y, Z), and project
it onto the affine constraints. This involves solving two linear equations
with system matrix AA^T, which remains unchanged throughout. Then
project the result onto the SDP cone and iterate. This requires the spec-
tral decomposition of both X and Z.
This simple iterative scheme is known to converge slowly. In [34] some
acceleration strategies are discussed and computational results with m ≈
100, 000 are reported.
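A minimal version of this alternating-projection scheme, for the illustrative constraint set {X : diag(X) = e} and a random symmetric starting point, can be sketched as follows (plain iteration only, not the accelerated strategies of [34]).

```python
import numpy as np

# Alternating projections onto the affine set {X : diag(X) = e} and the
# PSD cone S^+.  The PSD projection clips negative eigenvalues in a
# spectral decomposition; diag(X) = e is an illustrative constraint set.
def proj_psd(M):
    lam, U = np.linalg.eigh((M + M.T) / 2)
    return (U * np.maximum(lam, 0)) @ U.T

def proj_diag_e(M):
    M = M.copy()
    np.fill_diagonal(M, 1.0)          # Euclidean projection onto diag = e
    return M

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
X = (A + A.T) / 2                     # symmetric, generally infeasible
for _ in range(2000):
    X = proj_psd(proj_diag_e(X))

print("diag error:", np.abs(np.diag(X) - 1).max())
print("min eigenvalue:", np.linalg.eigvalsh(X).min())
```

As the text notes, the plain scheme converges slowly; the point of the sketch is only that each iteration needs one spectral decomposition and one trivial affine projection.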
Another solution approach for SDP using only SDP projection and
solving a linear equation with system matrix AA^T is proposed by Povh et
al [56] and Malick et al [46]. The approach from [34] can be viewed as
maintaining A(X) = b, Z + A^T(y) = C and the zero duality gap condition
b^T y = ⟨C, X⟩, and trying to get X and Z into S^+. In contrast, the approach
from [56, 46] maintains X ⪰ 0, Z ⪰ 0, ZX = 0 and tries to reach feasibility
with respect to the linear equations. The starting point of this approach
and iterate until dual feasibility is reached. The special structure of the
subproblem given by (8.4) allows us to interpret the update (8.5) differently.
After introducing the Lagrangian L = f_{σ,X}(y, Z) + ⟨V, Z⟩ with respect
to the constraint Z ⪰ 0, we get the following optimality conditions for
maximizing (8.4):
V = σ(Z + (1/σ)X + A^T(y) − C) = σ(W^+ − W) = −σW^− ,
where W^− is the projection of W onto −(S^+). This leads to the boundary
point method from [56]. Given X, Z, solve ∇_y L = 0 for y:
AA^T(y) = (1/σ)(b − A(X)) − A(Z − C).
Then compute the spectral decomposition of W = C − A^T(y) − (1/σ)X and
get a new iterate X = −σW^−, Z = W^+, and iterate.
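For the simplest constraint operator A(X) = diag(X), we have A^T(y) = Diag(y) and AA^T = I, so one iteration of this scheme takes only a few lines. The 2×2 instance below (min ⟨C, X⟩ subject to diag(X) = e, X ⪰ 0, with optimal value −2) is an illustrative example and a simplified sketch, not the implementation from [56].

```python
import numpy as np

# Boundary point sketch for min <C, X> s.t. diag(X) = e, X psd,
# with A(X) = diag(X), A^T(y) = Diag(y), hence A A^T = I.
# Illustrative 2x2 instance: C = [[0,1],[1,0]], optimal value -2.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.ones(2)
sigma = 1.0
X = np.zeros((2, 2))
Z = np.zeros((2, 2))

for _ in range(300):
    # solve A A^T y = (1/sigma)(b - A(X)) - A(Z - C); here A A^T = I
    y = (b - np.diag(X)) / sigma - np.diag(Z - C)
    W = C - np.diag(y) - X / sigma
    lam, U = np.linalg.eigh(W)
    Wplus = (U * np.maximum(lam, 0)) @ U.T
    Wminus = (U * np.minimum(lam, 0)) @ U.T
    X, Z = -sigma * Wminus, Wplus     # new iterates X = -sigma W^-, Z = W^+

print(X.round(3))                     # approaches [[1, -1], [-1, 1]]
```

By construction every iterate satisfies X ⪰ 0, Z ⪰ 0 and ZX = 0; the iteration drives the linear (in)feasibilities to zero, matching the description above.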
The computational effort of one iteration is essentially solving the lin-
ear system with matrix AA^T and computing the factorization of W. Fur-
ther details, like convergence analysis and parameter updates, are described
in [56, 46]. To show the potential of this approach, we compute ϑ(G) for
Table 3
Boundary-point computation times to solve (4.2) with relative accuracy 10^−6.
    n    m = (1/4)n²   time (secs.)
   400      40000           40
   600      90000          100
   800     160000          235
  1000     250000          530
  1200     360000         1140
for instances having several hundred vertices, see also the thesis [69]. A
combination of polyhedral and SDP relaxations for the bisection problem
is studied in [1]. Exact solutions of rather large sparse instances (n ≈ 1000)
are obtained for the first time. Finally, exact solutions for Max-k-Cut are
given in [21].
REFERENCES
[38] M. Kocvara and M. Stingl. On the solution of large-scale SDP problems by the
modified barrier method using iterative solvers. Mathematical Programming,
95:413–444, 2007.
[39] M. Laurent and F. Rendl. Semidefinite programming and integer program-
ming. In K. Aardal, G.L. Nemhauser, and R. Weismantel, editors, Discrete
Optimization, pages 393–514. Elsevier, 2005.
[40] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (eds.).
The traveling salesman problem, a guided tour of combinatorial optimization.
Wiley, Chicester, 1985.
[41] A. Lisser and F. Rendl. Graph partitioning using Linear and Semidefinite Pro-
gramming. Mathematical Programming (B), 95:91–101, 2002.
[42] L. Lovász. On the Shannon capacity of a graph. IEEE Trans. Inform. Theory,
25:1–7, 1979.
[43] L. Lovász. Semidefinite programs and combinatorial optimization. In B. A. Reed
and C.L. Sales, editors, Recent advances in algorithms and combinatorics,
pages 137–194. CMS books in Mathematics, Springer, 2003.
[44] L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 opti-
mization. SIAM Journal on Optimization, 1:166–190, 1991.
[45] M. Garey, R. Graham, D. Johnson, and D.E. Knuth. Complexity results for
bandwidth minimization. SIAM Journal on Applied Mathematics, 34:477–495,
1978.
[46] J. Malick, J. Povh, F. Rendl, and A. Wiegele. Regularization methods for
semidefinite programming. SIAM Journal on Optimization, 20:336–356, 2009.
[47] R.J. McEliece, E.R. Rodemich, and H.C. Rumsey Jr. The Lovász bound and
some generalizations. Journal of Combinatorics, Information & System Sciences,
3:134–152, 1978.
[48] T.S. Motzkin and E.G. Straus. Maxima for graphs and a new proof of a theorem
of Turán. Canadian Journal of Mathematics, 17:533–540, 1965.
[49] K.G. Murty and S.N. Kabadi. Some NP-complete problems in quadratic and
nonlinear programming. Mathematical Programming, 39:117–129, 1987.
[50] K. Nakata, K. Fujisawa, M. Fukuda, M. Kojima, and K. Murota. Exploiting
sparsity in semidefinite programming via matrix completion. II: Implementa-
tion and numerical results. Mathematical Programming, 95:303–327, 2003.
[51] Y. Nesterov. Quality of semidefinite relaxation for nonconvex quadratic opti-
mization. Technical report, CORE, 1997.
[52] C.H. Papadimitriou. The NP-completeness of the bandwidth minimization prob-
lem. Computing, 16:263–270, 1976.
[53] J. Povh. Applications of semidefinite and copositive programming in combinato-
rial optimization. PhD thesis, University of Ljubljana, Slovenia, 2006.
[54] J. Povh and F. Rendl. A copositive programming approach to graph partitioning.
SIAM Journal on Optimization, 18(1):223–241, 2007.
[55] J. Povh and F. Rendl. Copositive and semidefinite relaxations of the quadratic
assignment problem. Discrete Optimization, 6(3):231–241, 2009.
[56] J. Povh, F. Rendl, and A. Wiegele. A boundary point method to solve semidef-
inite programs. Computing, 78:277–286, 2006.
[57] F. Rendl. Semidefinite relaxations for integer programming. In M. Jünger,
Th.M. Liebling, D. Naddef, G.L. Nemhauser, W.R. Pulleyblank, G. Reinelt,
G. Rinaldi, and L.A. Wolsey, editors, 50 years of integer programming 1958–2008,
pages 687–726. Springer, 2009.
[58] F. Rendl, G. Rinaldi, and A. Wiegele. Solving max-cut to optimality by inter-
secting semidefinite and polyhedral relaxations. Mathematical Programming,
121:307–335, 2010.
[59] A. Schrijver. A comparison of the Delsarte and Lovász bounds. IEEE Transac-
tions on Information Theory, IT-25:425–429, 1979.
[60] A. Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. B,
volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.
[61] H.D. Sherali and W.P. Adams. A hierarchy of relaxations between the continuous
and convex hull representations for zero-one programming problems. SIAM
Journal on Discrete Mathematics, 3(3):411–430, 1990.
[62] H.D. Sherali and W.P. Adams. A hierarchy of relaxations and convex hull
characterizations for mixed-integer zero-one programming problems. Discrete
Appl. Math., 52(1):83–106, 1994.
[63] J. Sturm. Theory and algorithms of semidefinite programming. In H. Frenk,
K. Roos, T. Terlaky, and S. Zhang, editors, High performance optimization, pages
1–194. Springer Series on Applied Optimization, 2000.
[64] J. Sun, S. Boyd, L. Xiao, and P. Diaconis. The fastest mixing Markov process
on a graph and a connection to a maximum variance unfolding problem. SIAM
Review, 48(4):681–699, 2006.
[65] M.J. Todd. A study of search directions in primal-dual interior-point methods
for semidefinite programming. Optimization Methods and Software, 11:1–46,
1999.
[66] K.-C. Toh. Solving large scale semidefinite programs via an iterative solver on the
augmented systems. SIAM Journal on Optimization, 14:670 – 698, 2003.
[67] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review,
38:49–95, 1996.
[68] A. Wigderson. Improving the performance guarantee for approximate graph
colouring. Journal of the ACM, 30:729–735, 1983.
[69] A. Wiegele. Nonlinear optimization techniques applied to combinatorial opti-
mization problems. PhD thesis, Alpen-Adria-Universität Klagenfurt, Austria,
2006.
[70] H. Wolkowicz, R. Saigal, and L. Vandenberghe (eds.). Handbook of semidef-
inite programming. Kluwer, 2000.
[71] X. Zhao, D. Sun, and K. Toh. A Newton CG augmented Lagrangian method
for semidefinite programming. SIAM Journal on Optimization, 20:1737–1765,
2010.
A POLYTOPE FOR A PRODUCT OF
REAL LINEAR FUNCTIONS IN 0/1 VARIABLES
OKTAY GÜNLÜK∗ , JON LEE† , AND JANNY LEUNG‡
y ≥ 0 ; (1.1)
y ≥ x^1 + x^2 − 1 ; (1.2)
y ≤ x^1 ; (1.3)
y ≤ x^2 . (1.4)
([email protected]).
† Department of Industrial and Operations Engineering, University of Michigan,
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 513
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_18,
© Springer Science+Business Media, LLC 2012
y = ( Σ_{i∈K_1} a_i^1 x_i^1 ) ( Σ_{j∈K_2} a_j^2 x_j^2 ) = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 x_i^1 x_j^2 ; (1.5)
We note that P ((1), (1)) is just the solution set of (1.1–1.4). Our goal
is to investigate the polytope P (a1 , a2 ) generally.
In Section 2, we describe an application to modeling a product of a
pair of nonnegative integer variables using binary expansion. In Section
3, we describe a linear integer formulation of P (a1 , a2 ). In Section 4, we
investigate which of our inequalities are facet describing. In Section 5,
we determine a complete polyhedral characterization of P (a1 , a2 ). In es-
tablishing this characterization, we also find an inequality characterization
of a natural extended-variable formulation. In Section 6, we demonstrate
how to solve the separation problem for the facet describing inequalities of
P (a1 , a2 ). In Section 7, we investigate some topological properties of real
points in the P (a1 , a2 ) that satisfy the product equation (1.5). In Section
8, we briefly describe a generalization of our results.
Our results were first reported in complete form in [CGLL03], which
in turn was developed from [CLL99]. Other related work, much of which
was undertaken or announced later, includes [BB98, AFLS01, HHPR02,
JKKW05, Kub05, CK05, ÇKM06, FFL05, Lee07].
y = x^1 x^2 ;
0 ≤ x^1 ≤ 2 ;
0 ≤ x^2 ≤ 2
y ≥ 0 ;
y ≥ 2x^1 + 2x^2 − 4 ;
y ≤ 2x^1 ;
y ≤ 2x^2 .
If we use these latter linear inequalities to model the product y, and then
seek to maximize y subject to these constraints and a side constraint x^1 +
x^2 ≤ 2, we find the optimal solution x^1 = 1, x^2 = 1, y = 2, which does
not satisfy y = x^1 x^2. Therefore, this naïve approach is inadequate in the
context of linear integer programming.
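The failure is easy to confirm directly (a plain check of the numbers above, written in Python):

```python
# The point (x1, x2, y) = (1, 1, 2) satisfies the four scaled
# inequalities and the side constraint x1 + x2 <= 2, yet violates
# the product equation y = x1 * x2.
x1, x2, y = 1, 1, 2

assert y >= 0
assert y >= 2 * x1 + 2 * x2 - 4
assert y <= 2 * x1 and y <= 2 * x2
assert x1 + x2 <= 2
assert y != x1 * x2        # the naive model admits an infeasible point
print("naive model admits (1, 1, 2)")
```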
We adopt an approach that avoids the specific problem above where
an integer infeasible point is caught in the convex hull of integer feasible
solutions. Specifically, we assume that, for practical purposes, x^1 and x^2
can be bounded above. So we can write the x^l in binary expansion: x^l =
Σ_{i∈K_l} 2^{i−1} x_i^l , for l = 1, 2. That is, we let a_i^l = 2^{i−1}. The only integer
points in P(a^1, a^2) are the solutions of (1.5–1.6). Therefore, we avoid the
problem that we encountered when we did not use the binary expansions
of x^1 and x^2.
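The binary expansion itself is easily checked: with a_i^l = 2^{i−1}, the product of the two expansions recovers x^1·x^2 exactly, as in (1.5). The exhaustive check below uses 0-based bit indices, so the weights appear as 2^i rather than 2^{i−1}.

```python
# With weights 2^i on 0-based bits, the product of two binary
# expansions equals the double sum of bit products, as in (1.5).
# Checked exhaustively for all pairs of 3-bit numbers.
k = 3
for x1 in range(2 ** k):
    for x2 in range(2 ** k):
        bits1 = [(x1 >> i) & 1 for i in range(k)]   # x1 = sum 2^i bits1[i]
        bits2 = [(x2 >> i) & 1 for i in range(k)]
        y = sum((2 ** i) * (2 ** j) * bits1[i] * bits2[j]
                for i in range(k) for j in range(k))
        assert y == x1 * x2
print("binary-expansion product verified for all 3-bit pairs")
```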
3. Linear integer formulation. Obviously, for l = 1, 2, the simple
bound inequalities are valid for P (a1 , a2 ):
Solving for y, and rewriting this in terms of xl and xl̄ , we obtain the upper
bound inequalities (3.4).
Note that the sets of inequalities (3.4) with l = 1 and l = 2 are
equivalent — this follows by checking that changing l is equivalent to com-
plementing H.
The transformation φl corresponds to the “switching” operation used
in the analysis of the cut polytope (see [DL97], for example). Specifically,
(1.2) and (1.3) are switches of each other under φ1 , and (1.2) and (1.4) are
switches of each other under φ2 .
Proposition 3.4. The points satisfying (1.6) and (3.3–3.4) for all
cross products H are precisely the points satisfying (1.5–1.6).
Proof. By Propositions 3.1 and 3.3, we need only show that every
point satisfying (1.6) and (3.3–3.4) for all cross products H also satisfies
(1.5). Let (x1 , x2 , y) be a point satisfying (1.6) and (3.3–3.4). Letting
H = {i ∈ K_1 : x_i^1 = 1} × {j ∈ K_2 : x_j^2 = 1} ,
H = {i ∈ K_1 : x_i^1 = 0} × {j ∈ K_2 : x_j^2 = 1} ,
www.it-ebooks.info
518 OKTAY GÜNLÜK, JON LEE, AND JANNY LEUNG
www.it-ebooks.info
A POLYTOPE FOR PRODUCTS 519
In the first case, we can expand the determinant along the last column and
we obtain a caterpillar matrix of order m − 1 with the same determinant
as the original matrix, up to the sign. In the second case, we subtract the
first row from the second row (which does not affect the determinant), and
then we expand along the second row of the resulting matrix. Again, we
obtain a caterpillar matrix of order m − 1 with the same determinant as
the original matrix, up to the sign. In either case, the result follows by
induction.
Proposition 4.5. An inequality of the form (3.3) describes a facet
of P (a1 , a2 ) when H ∈ H(k1 , k2 ).
Proof. By Proposition 3.1, these inequalities are valid for P (a1 , a2 ).
It suffices to exhibit k1 + k2 + 1 affinely independent points of P (a1 , a2 )
that are tight for (3.3). Let Φ be the permutation that gives rise to H. It
is easy to check that (x1 , x2 , y) = (0, 1, 0) is a point of P (a1 , a2 ) that is
tight for (3.3). We generate the remaining k1 + k2 points by successively
flipping bits in the order of the permutation Φ. We simply need to check
that each bit flip preserves equality in (3.3). If a variable x_i^1 is flipped
from 0 to 1, the increase in y (i.e., the left-hand side of (3.3)) and in
Σ_{(i,j)∈H} a_i^1 a_j^2 (x_i^1 + x_j^2 − 1) (i.e., the right-hand side of (3.3)) is precisely
Σ_{j : x_i^1 ≺ x_j^2} a_i^1 a_j^2 . Similarly, if a variable x_j^2 is flipped from 1 to 0, the
decrease in both of these quantities is precisely Σ_{i : x_i^1 ≺ x_j^2} a_i^1 a_j^2 .
Next, we arrange these k1 + k2 + 1 points, in the order generated, as
the rows of a caterpillar matrix of order k1 + k2 + 1. A point (x1 , x2 , y)
yields the row (x_Φ^2, 1, x_Φ^1), where x_Φ^l is just x^l permuted according to the
order of the x_i^l in Φ. Clearly this defines a caterpillar matrix, which is
nonsingular by Proposition 4.4. Hence, the generated points are affinely
independent, so (3.3) describes a facet when H ∈ H(k1 , k2 ).
Corollary 4.1. Each inequality (3.3) with H ∈ H(k1 , k2 ) admits a
set of tight points in P (a1 , a2 ) that correspond to the rows of a caterpillar
matrix.
Proposition 4.6. An inequality of the form (3.4) describes a facet
of P (a1 , a2 ) when H ∈ H(k1 , k2 ).
Proof. Using the transformation φl , this follows from Proposi-
tion 4.5.
Conversely, every caterpillar matrix of order k1 + k2 + 1 corresponds
to a facet of the form (3.3). More precisely, we have the following result.
Proposition 4.7. Let C be a caterpillar matrix of order k1 + k2 + 1
such that its first k2 columns correspond to a specific permutation of {x2j :
j ∈ K2 } and its last k1 columns correspond to a specific permutation of
{x1i : i ∈ K1 }. Then there exists a facet of P (a1 , a2 ) of the form (3.3)
such that the points corresponding to the rows of C are tight for it.
Proof. It is easy to determine the permutation Ψ that corresponds
to C, by interleaving the given permutations of {x2j : j ∈ K2 } and of
{x1i : i ∈ K1 }, according to the head and tail moves of the caterpillar.
Then, as before, we form the H of (3.3) by putting (i, j) in H if x1i ≺ x2j
in the final permutation.
It is easy to see that each row of C corresponds to a point of P (a1 , a2 )
that is tight for the resulting inequality (3.3).
and let
Q_δ(a^1, a^2) = conv{ x^1 ∈ R^{k_1}, x^2 ∈ R^{k_2}, y ∈ R, δ ∈ R^{k_1×k_2} : (3.1–3.2, 5.1–5.5) } ,
y_down = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 (x_i^1 + x_j^2 − 1)^+ ,
then the points p_up = (x^1, x^2, y_up) and p_down = (x^1, x^2, y_down) are in
Q(a^1, a^2), and p_up ≥ p ≥ p_down. Furthermore, if p is an extreme point, it
has to be one of p_up and p_down.
Let K̄_1 ⊆ K_1 and K̄_2 ⊆ K_2 be the sets of indices corresponding to
fractional components of x^1 and x^2, respectively. Clearly, K̄_1 ∪ K̄_2 ≠ ∅. Let
ε > 0 be a small number so that 1 > x_i^l + ε > x_i^l − ε > 0 for all i ∈ K̄_l ,
l = 1, 2. Define x^{l+} by x_i^{l+} := x_i^l + ε if i ∈ K̄_l and x_i^{l+} := x_i^l otherwise.
Define x^{l−} similarly. We consider the following two cases and show that if
p is fractional, then it can be represented as a convex combination of two
distinct points in Q(a^1, a^2).
Case 1. Assume p = p_up. Let p^a = (x^{1+}, x^{2+}, y^a) and p^b = (x^{1−}, x^{2−}, y^b),
where
y^a = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 min{ x_i^{1+}, x_j^{2+} } ,
y^b = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 min{ x_i^{1−}, x_j^{2−} } ,
y^d = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 (x_i^{1−} + x_j^{2+} − 1)^+ ,
argue that Q(a1 , a2 ) ⊆ P (a1 , a2 ), and then we will show that R(a1 , a2 ) ⊆
Q(a1 , a2 ).
Proposition 5.2. Q(a1 , a2 ) ⊆ P (a1 , a2 ).
Proof. As P (a1 , a2 ) is a bounded convex set, it is sufficient to show
that all of the extreme points of Q(a1 , a2 ) are contained in P (a1 , a2 ). Using
Notice that for any u, v ∈ {0, 1}, min{u, v} = (u + v − 1)^+ = uv. Therefore,
y = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 x_i^1 x_j^2 , and p ∈ P(a^1, a^2).
Proposition 5.3. R(a^1, a^2) ⊆ Q(a^1, a^2).
Proof. Assume not, and let p = (x^1, x^2, y) ∈ R(a^1, a^2) \ Q(a^1, a^2).
As in the proof of Proposition 5.1, let p_up = (x^1, x^2, y_up) and p_down =
(x^1, x^2, y_down), where y_up = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 min{x_i^1, x_j^2} and
y_down = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 (x_i^1 + x_j^2 − 1)^+. Note that p_up, p_down ∈
Q(a^1, a^2). We next show that y_up ≥ y ≥ y_down, and therefore p ∈
conv{p_up, p_down} ⊆ Q(a^1, a^2).
Let H_1 = {(i, j) ∈ K_1 × K_2 : x_i^1 > x_j^2} and H_2 = {(i, j) ∈ K_1 × K_2 :
x_i^1 + x_j^2 > 1}. Applying (3.4) with H = H_1 gives
y ≤ Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 x_i^1 + Σ_{(i,j)∈H_1} a_i^1 a_j^2 (x_j^2 − x_i^1)
  = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 x_i^1 + Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 min{0, x_j^2 − x_i^1}
  = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 ( x_i^1 + min{0, x_j^2 − x_i^1} )
  = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 min{x_i^1, x_j^2} = y_up .
Similarly, applying (3.3) with H = H_2 gives
y ≥ Σ_{(i,j)∈H_2} a_i^1 a_j^2 (x_i^1 + x_j^2 − 1)
  = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 max{0, x_i^1 + x_j^2 − 1} = y_down .
i∈K1 j∈K2
and then we just check whether (x^1, x^2, y) violates the lower bound in-
equality (3.3) for the choice of H = H_0. Similarly, for the upper bound
inequalities (3.4), we let
H_0 = { (i, j) ∈ K_l × K_l̄ : x_i^l − x_j^l̄ < 0 } ,
and then we just check whether (x^1, x^2, y) violates the upper bound in-
equality (3.4) for the choice of H = H_0. Note that for any H ⊆ K_1 × K_2,
Σ_{(i,j)∈H} a_i^1 a_j^2 (x_i^1 + x_j^2 − 1) ≤ Σ_{(i,j)∈H_0} a_i^1 a_j^2 (x_i^1 + x_j^2 − 1).
Therefore, (x^1, x^2, y) satisfies the lower bounds (3.3) for all sets H ⊆ K_1 ×
K_2 if and only if it satisfies (3.3) for H = H_0.
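This separation rule can be sketched directly in Python; the routine below implements the H_0 choice for the lower-bound inequalities (3.3), and the fractional point at the end is an illustrative assumption.

```python
# Separation for the lower-bound inequalities (3.3):
#   y >= sum_{(i,j) in H} a1[i] * a2[j] * (x1[i] + x2[j] - 1).
# The most violated choice is H0 = {(i,j) : x1[i] + x2[j] - 1 > 0}.
def separate_lower(a1, a2, x1, x2, y):
    H0 = [(i, j) for i in range(len(a1)) for j in range(len(a2))
          if x1[i] + x2[j] - 1 > 0]
    rhs = sum(a1[i] * a2[j] * (x1[i] + x2[j] - 1) for (i, j) in H0)
    return H0, max(0.0, rhs - y)      # violation (0 means no cut found)

# Sample fractional point with a1 = a2 = (1, 2):
a1, a2 = [1, 2], [1, 2]
x1, x2 = [0.8, 0.0], [0.8, 0.0]
y = 0.3
H0, viol = separate_lower(a1, a2, x1, x2, y)
print(H0, viol)   # H0 = [(0, 0)], violation 0.6 - 0.3 = 0.3
```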
Using Propositions 4.5 and 4.6, we can see how this separation method
yields facet describing violated inequalities. We develop a permutation of
the variables
{x1i : i ∈ K1 } ∪ {x2j : j ∈ K2 } ,
according to their values. Let δ_1 > δ_2 > · · · > δ_p denote the distinct
values in {x_i^1 : i ∈ K_1}. For convenience, let δ_0 = 1 and δ_{p+1} = 0. We
define the partition via 2p + 1 blocks, some of which may be empty. For
t = 1, 2, . . . , p, block 2t consists of
{x_i^1 : i ∈ K_1 , x_i^1 = δ_t} .
{x_i^l : i ∈ K_l , x_i^l = δ_t} .
= λ y^{11} + (1 − λ) y^{22}
= y .
y^{12} = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 x_i^{11} x_j^{22} .
Then we connect this third point, in the manner above, to each of the
points of the original pair.
The curve of ideal points given to us by Proposition 7.1 is entirely
contained in a 2-dimensional polytope, but it is not smooth in general. By
allowing the curve to be contained in a 3-dimensional polytope, we can
construct a smooth curve of ideal points connecting each pair of extreme
points of P (a1 , a2 ).
Proposition 7.2. Every pair of extreme points of P (a1 , a2 ) is con-
nected by a smooth curve of ideal points of P (a1 , a2 ).
Proof. Let (x11 , x12 , y 11 ) and (x21 , x22 , y 22 ) be a pair of distinct ex-
treme points of P (a1 , a2 ). Our goal is to connect these points with a smooth
curve of ideal points of P (a1 , a2 ). Toward this end, we consider two other
points of P (a1 , a2 ), (x11 , x22 , y 12 ) and (x21 , x12 , y 21 ), which we obtain from
the original pair by letting
y^{12} = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 x_i^{11} x_j^{22}
and
y^{21} = Σ_{i∈K_1} Σ_{j∈K_2} a_i^1 a_j^2 x_i^{21} x_j^{12} .
The reader can easily check that everything that we have done applies to
P(A) by making the substitution of a_i^1 a_j^2 by a_ij throughout.
REFERENCES
PART VIII:
Complexity
ON THE COMPLEXITY OF NONLINEAR
MIXED-INTEGER OPTIMIZATION
MATTHIAS KÖPPE∗
max/min f(x_1, . . . , x_n)
s.t. g_1(x_1, . . . , x_n) ≤ 0
        ⋮                    (1.1)
g_m(x_1, . . . , x_n) ≤ 0
x ∈ R^{n_1} × Z^{n_2} ,
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 533
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_19,
© Springer Science+Business Media, LLC 2012
2. Preliminaries.
2.1. Presentation of the problem. We restrict ourselves to a model
where the problem is presented explicitly. In most of this survey, the func-
tions f and gi will be polynomial functions presented in a sparse encoding,
where all coefficients are rational (or integer) and encoded in the binary
scheme. It is useful to assume that the exponents of monomials are given
in the unary encoding scheme; otherwise already in very simple cases the
results of function evaluations will have an encoding length that is expo-
nential in the input size.
In an alternative model, the functions are presented by oracles, such
as comparison oracles or evaluation oracles. This model permits handling
more general functions (not just polynomials); on the other hand, it is
very useful for obtaining hardness results.
2.2. Encoding issues for solutions. When we want to study the
computational complexity of these optimization problems, we first need to
discuss how to encode the input (the data of the optimization problem) and
the output (an optimal solution if it exists). In the context of linear mixed-integer optimization, this is straightforward: seldom are we concerned with irrational objective functions or constraints, and when we restrict the input to be rational, as is usual, optimal solutions will be rational as well.
This is no longer true even in the easiest cases of nonlinear optimization, as can be seen from the following quadratically constrained problem in one continuous variable:
f_A ≥ (1 − ε) · f_max.   (2.1)

(Here f_min denotes the minimal value of the function on the feasible region.) It enables us to study objective functions that are not restricted to be non-negative on the feasible region.
mation is invariant under shifting of the objective function by a constant,
and under exchanging minimization and maximization. On the other hand,
it is not useful for optimization problems that have an infinite range. We
remark that, when the objective function can take negative values on the
feasible region, (2.2) is weaker than (2.1). We will call approximation al-
gorithms and schemes with respect to this notion of approximation weak.
This terminology, however, is not consistent in the literature; [16], for in-
stance, uses the notion (2.2) without an additional attribute and instead
reserves the word weak for approximation algorithms and schemes that give
a guarantee on the absolute error:
|f_A − f_max| ≤ ε.   (2.3)
p(x1 , . . . , xn ) = 0, x1 , . . . , xn ∈ Z (3.1)
ral numbers that are not recursive, such as the halting problem of universal
Turing machines.
Theorem 3.2. For the following universal pairs (ν, δ):

(58, 4), ..., (38, 8), ..., (21, 96), ..., (14, 2.0 × 10^5), ..., (9, 1.638 × 10^45),
the case of the standard simplex, it follows via the celebrated Motzkin–
Straus theorem [51] from the inapproximability of the maximum stable set
problem. These are results by Håstad [28]; see also [16].
5. Approximation schemes. For important classes of optimization
problems, while exact optimization is hard, good approximations can still
be obtained efficiently.
Many such examples are known in combinatorial settings. As an ex-
ample in continuous optimization, we refer to the problem of maximizing
homogeneous polynomial functions of fixed degree over simplices. Here de
Klerk et al. [17] proved a weak PTAS.
Below we present a general result for mixed-integer polynomial opti-
mization over polytopes.
5.1. Mixed-integer polynomial optimization in fixed dimen-
sion over linear constraints: FPTAS and weak FPTAS. Here we
consider the problem
max/min f (x1 , . . . , xn )
subject to Ax ≤ b (5.1)
x ∈ Rn1 × Zn2 ,
which holds for any finite set S = {s_1, ..., s_N} of non-negative real numbers. This relation can be viewed as an approximation result for ℓ_k-norms. Now if P is a polytope and f is an objective function non-negative on P ∩ Z^d, let x_1, ..., x_N denote all the feasible integer solutions in P ∩ Z^d and collect their objective function values s_i = f(x_i) in a vector s ∈ Q^N.
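As a sketch (my own illustration, not from the survey) of how these bounds behave, assume the standard ℓ_k-norm sandwich ‖s‖_k / N^{1/k} ≤ max_i s_i ≤ ‖s‖_k for non-negative s; the toy objective values below are hypothetical:

```python
def norm_bounds(s, k):
    """(L_k, U_k) bracketing max(s) via the l_k-norm sandwich for nonnegative s."""
    norm_k = sum(v**k for v in s) ** (1.0 / k)
    return norm_k / len(s) ** (1.0 / k), norm_k

s = [3, 1, 4, 1, 5]                      # hypothetical objective values; max is 5
for k in (1, 2, 8, 32):
    L, U = norm_bounds(s, k)
    print(k, round(L, 3), round(U, 3))   # the bracket around 5 tightens as k grows
```

Raising k tightens both bounds toward the maximum, which is exactly how Algorithm 1's bounds L_k, U_k converge to f^*.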
Fig. 1. Approximation properties of ℓ_k-norms (unit balls for k = 1 and k = 2).

Then, comparing the unit balls of the ℓ_k-norm and the ℓ_∞-norm (Figure 1), we get the relation
g(P; z) = z^0 + z^1 + ⋯ + z^{n−1} + z^n = (1 − z^{n+1})/(1 − z).   (5.3)
The “long” polynomial has a “short” representation as a rational function.
The encoding length of this new formula is linear in the encoding length
of n. On the basis of this idea, we can solve the summation problem.
Consider the generating function of the interval P = [0, 4],
g(P; z) = z^0 + z^1 + z^2 + z^3 + z^4 = 1/(1 − z) − z^5/(1 − z).
We now apply the differential operator z(d/dz) and obtain

z (d/dz) g(P; z) = 1z^1 + 2z^2 + 3z^3 + 4z^4 = z/(1 − z)^2 − (5z^5 − 4z^6)/(1 − z)^2.
The idea now is to evaluate this sum instead by computing the limit of the rational function for z → 1. Applying the operator z(d/dz) once more (so that the coefficient of z^x becomes x^2) and passing to the limit gives

α = lim_{z→1} [ (z + z^2)/(1 − z)^3 − (25z^5 − 39z^6 + 16z^7)/(1 − z)^3 ] = 1 + 4 + 9 + 16 = 30.
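This computation can be checked mechanically; the following sketch (my own, not from the survey) verifies with exact rational arithmetic that the short rational form agrees with the "long" polynomial:

```python
# Check that the short rational-function form of (z d/dz)^2 g(P; z) for
# P = [0, 4] agrees with the long polynomial 1z + 4z^2 + 9z^3 + 16z^4.
from fractions import Fraction

def rational_form(z):
    # (z + z^2)/(1 - z)^3 - (25 z^5 - 39 z^6 + 16 z^7)/(1 - z)^3
    return ((z + z**2) - (25*z**5 - 39*z**6 + 16*z**7)) / (1 - z)**3

def long_polynomial(z):
    # coefficients k^2 for k = 0, ..., 4
    return sum(k**2 * z**k for k in range(5))

z = Fraction(1, 2)
print(rational_form(z) == long_polynomial(z))   # True: both equal 29/8
print(long_polynomial(1))                        # 1 + 4 + 9 + 16 = 30
```

Since the two forms agree away from z = 1, the limit of the rational form at the removable singularity z = 1 is the desired sum 30.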
(b) Let the dimension d be fixed. Let g(P; z) be the Barvinok representation of the generating function Σ_{α∈P∩Z^d} z^α of P ∩ Z^d. Let h ∈ Q[x_1, ..., x_d] be a polynomial, given as a list of monomials with rational coefficients c_β encoded in binary and exponents β encoded in unary. We can compute in polynomial time a Barvinok representation g(P, h; z) for the weighted generating function Σ_{α∈P∩Z^d} h(α) z^α.
Thus, we can implement the following algorithm in polynomial time
(in fixed dimension).
Algorithm 1 (Computation of bounds for the optimal value).
Input: A rational convex polytope P ⊂ Rd ; a polynomial objective func-
tion f ∈ Q[x1 , . . . , xd ] that is non-negative over P ∩ Zd , given as a list of
monomials with rational coefficients cβ encoded in binary and exponents β
encoded in unary; an index k, encoded in unary.
Output: A lower bound Lk and an upper bound Uk for the maximal function
value f ∗ of f over P ∩Zd . The bounds Lk form a nondecreasing, the bounds
Uk a nonincreasing sequence of bounds that both reach f ∗ in a finite number
of steps.
1. Compute a short rational function expression for the generating function g(P; z) = Σ_{α∈P∩Z^d} z^α. Using residue techniques, compute |P ∩ Z^d| = g(P; 1) from g(P; z).
From the discussion of the convergence of the bounds, one then obtains
the following result.
Theorem 5.3 (Fully polynomial-time approximation scheme). Let
the dimension d be fixed. Let P ⊂ Rd be a rational convex polytope. Let f
be a polynomial with rational coefficients that is non-negative on P ∩ Zd ,
given as a list of monomials with rational coefficients cβ encoded in binary
and exponents β encoded in unary.
(i) Algorithm 1 computes the bounds L_k, U_k in time polynomial in k, the input size of P and f, and the total degree D. The bounds satisfy

U_k − L_k ≤ f^* · ( |P ∩ Z^d|^{1/k} − 1 ).

(ii) For k = (1 + 1/ε) log(|P ∩ Z^d|) (a number bounded by a polynomial in the input size), L_k is a (1 − ε)-approximation to the optimal value f^* and it can be computed in time polynomial in the input size, the total degree D, and 1/ε. Similarly, U_k gives a (1 + ε)-approximation to f^*.
(iii) With the same complexity, by iterated bisection of P, we can also find a feasible solution x_ε ∈ P ∩ Z^d with |f(x_ε) − f^*| ≤ ε f^*.
5.1.4. Extension to the mixed-integer case by discretization.
The mixed-integer case can be handled by discretization of the continuous
variables. We illustrate on an example that one needs to be careful to pick
a sequence of discretizations that actually converges. Consider the mixed-
integer linear optimization problem depicted in Figure 2, whose feasible
region consists of the point (1/2, 1) and the segment {(x, 0) : x ∈ [0, 1]}. The unique optimal solution is x = 1/2, z = 1. Now consider the sequence of grid approximations where x ∈ (1/m) Z_{≥0}. For even m, the unique optimal solution to the grid approximation is x = 1/2, z = 1. However, for odd m,
optimal solutions to the grid approximations does not converge because it
has two limit points; see Figure 2.
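A toy sketch of this discretization pitfall (my own illustration; the objective "maximize z − x" is a hypothetical choice consistent with the stated optima):

```python
from fractions import Fraction

def grid_optimum(m):
    # feasible grid points: (i/m, 0) on the segment, plus (1/2, 1) iff m is even
    candidates = [(Fraction(i, m), 0) for i in range(m + 1)]
    if m % 2 == 0:
        candidates.append((Fraction(1, 2), 1))
    # assumed objective consistent with the stated optima: maximize z - x
    return max(candidates, key=lambda p: p[1] - p[0])

print(grid_optimum(4))   # even m: recovers the true optimum (1/2, 1)
print(grid_optimum(5))   # odd m: collapses to (0, 0)
```

Alternating m between even and odd values makes the sequence of grid optima oscillate between the two limit points, exactly as described above.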
To handle polynomial objective functions that take arbitrary (posi-
tive and negative) values, one can shift the objective function by a large
constant. Then, to obtain a strong approximation result, one iteratively
reduces the constant by a factor. Altogether we have the following result.
Theorem 5.4 (Fully polynomial-time approximation schemes). Let
the dimension n = n1 + n2 be fixed. Let an optimization problem (5.1) of
a polynomial function f over the mixed-integer points of a polytope P and
an error bound ε be given, where
Fig. 2. The feasible region and its grid approximations: for even m the optimum is f(1/2, 1) = 1, while for odd m it is f(0, 0) = 0.
gi (y, x1 , . . . , xω ) ≤ 0, i = 1, . . . , m,
or using ≥, <, >, or = as the relation. Here y ∈ Rn0 is a free (i.e., not
quantified) variable. Let d ≥ 2 be an upper bound on the degrees of the
polynomials gi . A vector ȳ ∈ Rn0 is called a solution of this formula if
the formula (6.2) becomes a true logic sentence if we set y = ȳ. Let Y
denote the set of all solutions. An ε-approximate solution is a vector y_ε with ‖ȳ − y_ε‖ < ε for some solution ȳ ∈ Y.
The following bound can be proved. When the number ω of “blocks”
of quantifiers (i.e., the number of alternations of the quantifiers ∃ and ∀)
is fixed, then the bound is singly exponential in the dimension.
Theorem 6.1. If the formula (6.2) has only integer coefficients of binary encoding size at most ℓ, then every connected component of Y intersects with the ball {‖y‖ ≤ r}, where
(md)^{2^{O(ω)} n_0 n_1 ⋯ n_k}

distinct ε-approximate solutions of the formula, with the property that for each connected component of Y ∩ {‖y‖ ≤ r} one of the y_i is within distance ε. The algorithm runs in time

( (md)^{2^{O(ω)} n_0 n_1 ⋯ n_k} + md + log(1/ε) + log r )^{O(1)}.
This describes that y is an optimal solution (all other solutions x are either
infeasible or have a higher objective value). Thus optimal solutions can be
efficiently approximated using the algorithm of Theorem 6.2.
6.2. Fixed dimension: Convex and quasi-convex integer poly-
nomial minimization. In this section we consider the case of the mini-
mization of convex and quasi-convex polynomials f over the mixed-integer
points in convex regions given by convex and quasi-convex polynomial func-
tions g1 , . . . , gm :
min  f(x_1, ..., x_n)
s.t. g_1(x_1, ..., x_n) ≤ 0
     ⋮                                    (6.4)
     g_m(x_1, ..., x_n) ≤ 0
     x ∈ R^{n_1} × Z^{n_2}.
gi (y, x1 , . . . , xω ) ≤ 0, i = 1, . . . , m
Fig. 4. The implementation of the shallow separation oracle. (a) Test points x^{ij} in the circumscribed ball E(1, 0). (b) Case I: All test points x^{i1} are (continuously) feasible; so their convex hull (a cross-polytope) and its inscribed ball E((n + 1)^{−3}, 0) are contained in the (continuous) feasible region F.
Fig. 5. The implementation of the shallow separation oracle. (a) Case II: The center 0 violates a polynomial inequality g_0(x) < 0 (say). Due to convexity, for all i = 1, ..., n, one set of each pair B_i ∩ F and B_{n+i} ∩ F must be empty. (b) Case III: A test point x^{k1} is infeasible, as it violates an inequality g_0(x) < 0 (say). However, the center 0 is feasible at least for this inequality.
Here max and min denote the componentwise maximum and minimum of
the vectors, respectively.
The fastest algorithm known for submodular function minimization seems to be by Orlin [54], who gave a strongly polynomial-time algorithm of running time O(n^5 T_eval + n^6), where T_eval denotes the running time of the evaluation oracle. The algorithm is "combinatorial", i.e., it does not use the ellipsoid method. This complexity bound simultaneously improved that of the fastest strongly polynomial-time algorithm using the ellipsoid method, of running time Õ(n^5 T_eval + n^7) (see [50]), and the fastest "combinatorial" strongly polynomial-time algorithm by Iwata [35], of running time O((n^6 T_eval + n^7) log n). We remark that the fastest polynomial-time algorithm, by Iwata [35], runs in time O((n^4 T_eval + n^5) log M), where M is the largest function value. We refer to the recent survey by Iwata [36], who reports on the developments that preceded Orlin's algorithm [54].
For the special case of symmetric submodular function minimization, i.e., f(x) = f(1 − x), Queyranne [56] presented an algorithm of running time O(n^3 T_eval).
Acknowledgments. The author wishes to thank the referees, in par-
ticular for their comments on the presentation of the Lenstra-type algo-
rithm, and his student Robert Hildebrand for a subsequent discussion about
this topic.
THEORY AND APPLICATIONS
OF N-FOLD INTEGER PROGRAMMING
SHMUEL ONN∗
AMS(MOS) subject classifications. 05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q,
68R, 68U, 68W, 90B, 90C.
min {wx : x ∈ Zn , Ax = b , l ≤ x ≤ u} ,
The following result of [4] asserts that n-fold integer programs are efficiently
solvable.
∗ Technion - Israel Institute of Technology, 32000 Haifa, Israel
([email protected]). Supported in part by a grant from ISF - the Israel
Science Foundation.
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 559
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_20,
© Springer Science+Business Media, LLC 2012
Theorem 1.1. [4] For each fixed integer (r, s) × t bimatrix A, there is an algorithm that, given a positive integer n, bounds l, u ∈ Z^{nt}_∞, b ∈ Z^{r+ns}, and w ∈ Z^{nt}, solves in time which is polynomial in n and in the binary-encoding length ⟨l, u, b, w⟩ of the rest of the data, the following so-termed linear n-fold integer programming problem,

min { wx : x ∈ Z^{nt} , A^{(n)} x = b , l ≤ x ≤ u } .
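For concreteness, here is a small sketch of the n-fold product A^{(n)} as it is standardly defined in the n-fold integer programming literature (one horizontal row of copies of the first block A_1, followed by n diagonal copies of the second block A_2); the tiny bimatrix below is a hypothetical example:

```python
def n_fold(A1, A2, n):
    """Standard n-fold product: one row [A1 A1 ... A1], then diag(A2, ..., A2)."""
    t = len(A1[0])
    top = [row * n for row in A1]                          # [A1 A1 ... A1]
    diag = [[0] * (t * k) + row + [0] * (t * (n - 1 - k))
            for k in range(n) for row in A2]               # block-diagonal A2's
    return top + diag

A1 = [[1, 1]]   # hypothetical r = 1, t = 2 block
A2 = [[1, 2]]   # hypothetical s = 1 block
for row in n_fold(A1, A2, 3):
    print(row)
```

The resulting matrix has r + ns rows and nt columns, matching the shapes of b ∈ Z^{r+ns} and x ∈ Z^{nt} in the theorem.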
Multiway Tables. Consider m_1 × ⋯ × m_d × n tables with given margins such as line-sums. (The accompanying figure shows an example table with its line-sums and the corresponding n-fold matrix A^{(n)}.)
((v_{∗,i_2,...,i_{d+1}}), ..., (v_{i_1,...,i_d,∗})), solves in time polynomial in n and ⟨w, v⟩, the (d + 1)-index transportation problem

min { wx : x ∈ Z^{m_1 × ⋯ × m_d × n}_+ , Σ_{i_1} x_{i_1,...,i_{d+1}} = v_{∗,i_2,...,i_{d+1}} , ... , Σ_{i_{d+1}} x_{i_1,...,i_{d+1}} = v_{i_1,...,i_d,∗} } .
Using the algorithm of Theorem 1.1, this n-fold integer program, and hence
the given multi-index transportation problem, can be solved in polynomial
time.
This proof extends immediately to multi-index transportation prob-
lems with nonlinear objective functions of the forms in Theorems 1.2–1.5.
Moreover, as mentioned before, a similar proof shows that multi-index
transportation problems with k-margin constraints, and more generally,
hierarchical margin constraints, can be encoded as n-fold integer program-
ming problems as well. We state this as a corollary.
Corollary 2.1. [5] For every fixed d and m_1, ..., m_d, the nonlinear multi-index transportation problem, with any hierarchical margin constraints, over (d + 1)-way tables of format m_1 × ⋯ × m_d × n with variable n layers, is polynomial time solvable.
2.1.2. Privacy in statistical databases. A common practice in the
disclosure of sensitive data contained in a multiway table is to release some
of the table margins rather than the entries of the table. Once the margins
are released, the security of any specific entry of the table is related to the
set of possible values that can occur in that entry in all tables having the
same margins as those of the source table in the database. In particular, if
this set consists of a unique value, that of the source table, then this entry
can be exposed and privacy can be violated. This raises the following
fundamental problem.
Entry uniqueness problem: Given a hierarchical margin family and an entry index, is the value that can occur in that entry in all tables with these margins unique?
The complexity of this problem turns out to behave in analogy to the
complexity of the multi-index transportation problem discussed in §2.1.1.
Consider the problem for d = 3 over l × m × n tables. It is polynomial
time decidable when l, m, n are all fixed, and coNP-complete when l, m, n
are all variable [17]. We discuss next in more detail the in-between cases
which are more delicate and were settled only recently.
If two sides are variable and one is fixed then the problem is still
coNP-complete, even over l × m × 3 tables with fixed n = 3 [20]. Moreover,
Theorem 2.1 implies that any set of nonnegative integers is the set of values
of an entry of some l × m × 3 tables with some specified line-sums. Figure
2 gives an example of line-sums for 6 × 4 × 3 tables where one entry attains
the set of values {0, 2} which has a gap.
Fig. 2. Line-sums for 6 × 4 × 3 tables where one entry attains the value set {0, 2}.
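For intuition only, the entry-value-set question can be brute-forced on tiny tables (real instances require the n-fold IP machinery above; the 2 × 2 × 2 margins below are a hypothetical example and, unlike the 6 × 4 × 3 instance of Figure 2, exhibit no gap):

```python
from itertools import product

def entry_value_set(vx, vy, vz, entry, bound):
    """All values a given entry attains over 2x2x2 tables with these line-sums."""
    vals = set()
    for flat in product(range(bound + 1), repeat=8):
        T = {(i, j, k): flat[4*i + 2*j + k]
             for i in range(2) for j in range(2) for k in range(2)}
        ok = (all(T[0, j, k] + T[1, j, k] == vx[j][k] for j in range(2) for k in range(2))
              and all(T[i, 0, k] + T[i, 1, k] == vy[i][k] for i in range(2) for k in range(2))
              and all(T[i, j, 0] + T[i, j, 1] == vz[i][j] for i in range(2) for j in range(2)))
        if ok:
            vals.add(T[entry])
    return vals

# hypothetical margins: every line sums to 2 (e.g. the all-ones table)
v = [[2, 2], [2, 2]]
print(entry_value_set(v, v, v, (0, 0, 0), bound=2))   # {0, 1, 2}: no gap here
```

Enumeration is exponential in the table size, which is precisely why the polynomial-time procedure via n-fold integer programming matters.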
P := { y ∈ R^{h+1}_+ : y_0 − Σ_{j=1}^h s_j y_j = 0 , Σ_{j=1}^h y_j = 1 } .
such that the integer points in T , which are precisely the l × m × 3 tables
with these line-sums, are in bijection with the integer points in P . Moreover
(see [7]), this bijection is obtained by a simple projection from Rl×m×3 to
Rh+1 that erases all but some h+1 coordinates. Let xi,j,k be the coordinate
that is mapped to y0 . Then the set of values that this entry attains in all
tables with these line-sums is, as desired,
{ x_{i,j,k} : x ∈ T ∩ Z^{l×m×3} } = { y_0 : y ∈ P ∩ Z^{h+1} } = S.
Finally, if two sides are fixed and one is variable, then entry uniqueness
can be decided in polynomial time by n-fold integer programming. Note
that even over 3 × 3 × n tables, the only solution of the problem available to date is the one below.
The polynomial time decidability of the problem when one side is
variable and the others are fixed extends to any dimension d. It also extends
to any hierarchical family of margins, but for simplicity we state it only for
line-sums, as follows.
Theorem 2.4. [20] For every fixed d, m_1, ..., m_d, there is an algorithm that, given n, integer line-sums v = ((v_{∗,i_2,...,i_{d+1}}), ..., (v_{i_1,...,i_d,∗})), and an entry index (k_1, ..., k_{d+1}), solves in time which is polynomial in n and ⟨v⟩, the corresponding entry uniqueness problem, of deciding if the entry x_{k_1,...,k_{d+1}} is the same in all (d + 1)-way tables in the set

S := { x ∈ Z^{m_1 × ⋯ × m_d × n}_+ : Σ_{i_1} x_{i_1,...,i_{d+1}} = v_{∗,i_2,...,i_{d+1}} , ... , Σ_{i_{d+1}} x_{i_1,...,i_{d+1}} = v_{i_1,...,i_d,∗} } .
u := max { x_{k_1,...,k_{d+1}} : x ∈ S } .
Clearly, entry xk1 ,...,kd+1 has the same value in all tables with the given
line-sums if and only if l = u, which can therefore be tested in polynomial
time.
The algorithm of Theorem 2.4 and its extension to any family of hi-
erarchical margins allow statistical agencies to efficiently check possible
margins before disclosure: if an entry value is not unique then disclosure
may be assumed secure, whereas if the value is unique then disclosure may
be risky and fewer margins should be released.
We note that long tables, with one side much larger than the others, of-
ten arise in practical applications. For instance, in health statistical tables,
the long factor may be the age of an individual, whereas other factors may
be binary (yes-no) or ternary (subnormal, normal, and supernormal). Moreover, it is always possible to merge categories of factors, with the resulting
coarser tables approximating the original ones, making the algorithm of
Theorem 2.4 applicable.
Finally, we describe a procedure based on a suitable adaptation of the
algorithm of Theorem 2.4, that constructs the entire set of values that can
occur in a specified entry, rather than just decides its uniqueness. Here
S is the set of tables satisfying the given (hierarchical) margins, and the
running time is output-sensitive, that is, polynomial in the input encoding
plus the number of elements in the output set.
Procedure for constructing the set of values in an entry:
1. Initialize l := −∞, u := ∞, and E := ∅.
2. Solve in polynomial time the following linear n-fold integer pro-
grams:
ℓ̂ := min { x_{k_1,...,k_{d+1}} : l ≤ x_{k_1,...,k_{d+1}} ≤ u , x ∈ S } ,
û := max { x_{k_1,...,k_{d+1}} : l ≤ x_{k_1,...,k_{d+1}} ≤ u , x ∈ S } .
(Figure: a flow example with cost (3+0)^2 + (2+2)^2 + (0+3)^2 = 34.)
Proof. Assume G has s vertices and t edges and let D be its s × t vertex-edge incidence matrix. Let f : Z^t → Z and g : Z^{lt} → Z be the separable convex functions defined by f(y) := Σ_{e=1}^t f_e(y_e) with y_e := Σ_{k=1}^l x^k_e, and g(x) := Σ_{e=1}^t Σ_{k=1}^l g^k_e(x^k_e). Let x = (x^1, ..., x^l) be the vector of variables with x^k ∈ Z^t the flow of commodity k for each k. Then the problem can be rewritten in vector form as

min { f( Σ_{k=1}^l x^k ) + g(x) : x ∈ Z^{lt} , Dx^k = d^k , Σ_{k=1}^l x^k ≤ u , x ≥ 0 } .
l
constraints become k=0 xk = u and the cost function becomes f (u−x0 )+
g(x1 , . . . , xl ) which is also separable convex. Now let A be the (t, s) × t
bimatrix with first block A1 := It the t×t identity matrix and second block
A2 := D. Let d0 := Du − k=1 dk and let b := (u, d0 , d1 , . . . , dl ). Then
l
(Figure: the complete bipartite digraph K_{m,n} with m suppliers s and n consumers c.) For a suitable (ml, l) × ml bimatrix A and a suitable (0, m) × ml bimatrix W derived from the v_k, the problem becomes the n-fold integer program

min { f(W^{(n)}x) + g(x) : x ∈ Z^{nml} , A^{(n)}x = (s^i, c^j) , W^{(n)}x ≤ u , x ≥ 0 } .
We assume below that the underlying digraph is Km,n (with edges ori-
ented from suppliers to consumers), since the problem over any subdigraph
G of Km,n reduces to that over Km,n by simply forcing 0 capacity on all
edges not present in G.
Theorem 2.6. [13] For any fixed l commodities, m suppliers, and volumes v_k, there is an algorithm that, given n, supplies and demands s^i, c^j ∈ Z^l_+, capacities u_{i,j} ∈ Z_+, and convex costs f_{i,j}, g^j_{i,k} : Z → Z presented by evaluation oracles, solves in time polynomial in n and ⟨s^i, c^j, u, f̂, ĝ⟩, the multicommodity transportation problem,
min Σ_{i,j} [ f_{i,j}( Σ_k v_k x^j_{i,k} ) + Σ_{k=1}^l g^j_{i,k}( x^j_{i,k} ) ]

s.t. x^j_{i,k} ∈ Z , Σ_j x^j_{i,k} = s^i_k , Σ_i x^j_{i,k} = c^j_k , Σ_{k=1}^l v_k x^j_{i,k} ≤ u_{i,j} , x^j_{i,k} ≥ 0 .
can answer whether or not f (x) ≤ f (y), or by an evaluation oracle that for
any vector x can return f (x).
3.1. Graver bases and nonlinear integer programming. The
Graver basis is a fundamental object in the theory of integer programming
which was introduced by J. Graver already back in 1975 [11]. However,
only very recently, in the series of papers [4, 5, 12], it was established that
the Graver basis can be used to solve linear (as well as nonlinear) integer
programming problems in polynomial time. In this subsection we describe
these important new developments.
3.1.1. Graver bases. We begin with the definition of the Graver
basis and some of its basic properties. Throughout this subsection let A
be an integer m × n matrix. The lattice of A is the set L(A) := {x ∈ Z^n : Ax = 0} of integer vectors in its kernel. We use L^∗(A) := {x ∈ Z^n : Ax = 0, x ≠ 0} to denote the set of nonzero elements in L(A). We use a partial order ⊑ on R^n which extends the usual coordinate-wise partial order ≤ on the nonnegative orthant R^n_+ and is defined as follows. For two vectors x, y ∈ R^n we write x ⊑ y and say that x is conformal to y if x_i y_i ≥ 0 and |x_i| ≤ |y_i| for i = 1, ..., n, that is, x and y lie in the same orthant of R^n and each component of x is bounded by the corresponding component of y in absolute value. A suitable extension of the classical lemma of Gordan [10] implies that every subset of Z^n has finitely many ⊑-minimal elements.
We have the following fundamental definition.
Definition 3.1. [11] The Graver basis of an integer matrix A is defined to be the finite set G(A) ⊂ Z^n of ⊑-minimal elements in L^∗(A) = {x ∈ Z^n : Ax = 0, x ≠ 0}. Note that G(A) is centrally symmetric, that is,
g ∈ G(A) if and only if −g ∈ G(A). For instance, the Graver basis of the
1 × 3 matrix A := [1 2 1] consists of 8 elements,
G(A) = ± {(2, −1, 0), (0, −1, 2), (1, 0, −1), (1, −1, 1)} .
Note also that the Graver basis may contain elements, such as (1, −1, 1)
in the above small example, whose support involves linearly dependent
columns of A. So the cardinality of the Graver basis cannot be bounded in
terms of m and n only and depends on the entries of A as well. Indeed, the
Graver basis is typically exponential and cannot be written down, let alone
computed, in polynomial time. But, as we will show in the next section,
for n-fold products it can be computed efficiently.
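For the small example above, the stated Graver basis can be confirmed by brute force over a box of candidate vectors (feasible only for tiny instances; the box bound 2 is an assumption that happens to suffice here, since componentwise domination never leaves the box):

```python
from itertools import product

def conformal_leq(x, y):
    # x ⊑ y: same orthant and componentwise |x_i| <= |y_i|
    return all(a * b >= 0 and abs(a) <= abs(b) for a, b in zip(x, y))

# nonzero integer kernel elements of A = [1 2 1] within {-2, ..., 2}^3
kernel = [v for v in product(range(-2, 3), repeat=3)
          if v != (0, 0, 0) and v[0] + 2*v[1] + v[2] == 0]
# Graver basis: the ⊑-minimal elements among them
graver = {v for v in kernel
          if not any(u != v and conformal_leq(u, v) for u in kernel)}
print(sorted(graver))   # the 8 elements ±{(2,-1,0), (0,-1,2), (1,0,-1), (1,-1,1)}
```

The computation recovers exactly the 8 elements listed after Definition 3.1, including (1, −1, 1), whose support involves linearly dependent columns of A.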
A finite sum u := Σ_i v_i of vectors in R^n is called conformal if v_i ⊑ u for all i and hence all summands lie in the same orthant. We start with a simple lemma.
Lemma 3.1. Any x ∈ L^∗(A) is a conformal sum x = Σ_i g_i of Graver basis elements g_i ∈ G(A), with some elements possibly appearing more than once in the sum.
Proof. We use induction on the well partial order ⊑. Consider any x ∈ L^∗(A). If it is ⊑-minimal in L^∗(A) then x ∈ G(A) by definition of the
The second lemma shows that univariate convex functions can be min-
imized efficiently over an interval of integers using repeated bisections.
Lemma 3.5. There is an algorithm that, given any two integer numbers r ≤ s and any univariate convex function f : Z → R given by a comparison oracle, solves in time polynomial in ⟨r, s⟩ the following univariate integer minimization problem,

min { f(λ) : λ ∈ Z , r ≤ λ ≤ s } .
Proof. If r = s then λ := r is optimal. Assume then r ≤ s − 1. Consider the integers

r ≤ ⌊(r + s)/2⌋ < ⌊(r + s)/2⌋ + 1 ≤ s .

Use the oracle of f to compare f(⌊(r + s)/2⌋) and f(⌊(r + s)/2⌋ + 1). By the convexity of f:

f(⌊(r + s)/2⌋) = f(⌊(r + s)/2⌋ + 1)  ⇒  λ := ⌊(r + s)/2⌋ is a minimum of f;
f(⌊(r + s)/2⌋) < f(⌊(r + s)/2⌋ + 1)  ⇒  the minimum of f is in [r, ⌊(r + s)/2⌋];
f(⌊(r + s)/2⌋) > f(⌊(r + s)/2⌋ + 1)  ⇒  the minimum of f is in [⌊(r + s)/2⌋ + 1, s].

Thus, we either obtain the optimal point, or bisect the interval [r, s] and repeat. So in O(log(s − r)) = O(⟨r, s⟩) bisections we find an optimal solution λ ∈ Z ∩ [r, s].
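A minimal sketch of this bisection (the comparison oracle is modeled here as an ordinary Python callable):

```python
def argmin_convex(f, r, s):
    """Minimize a univariate convex f over the integers of [r, s] by bisection."""
    while r < s:
        m = (r + s) // 2
        if f(m) <= f(m + 1):   # a minimum lies in [r, m]
            s = m
        else:                  # f(m) > f(m + 1): the minimum lies in [m + 1, s]
            r = m + 1
    return r

print(argmin_convex(lambda x: (x - 7)**2, -100, 100))   # 7
```

Each iteration halves the interval, so the number of oracle calls is O(log(s − r)), matching the lemma's bound.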
The next two lemmas extend Lemmas 3.4 and 3.5. The first lemma
shows the supermodularity of separable convex functions with respect to
conformal sums.
Lemma 3.6. Let f : R^n → R be any separable convex function, let x ∈ R^n be any point, and let Σ_i g_i be any conformal sum in R^n. Then the following inequality holds,

f( x + Σ_i g_i ) − f(x) ≥ Σ_i ( f(x + g_i) − f(x) ) .
sum, we have g_{i,j} g_{k,j} ≥ 0 for all i, k and so, setting r := x_j and s_i := g_{i,j} for all i, Lemma 3.4 applied to f_j implies

f_j( x_j + Σ_i g_{i,j} ) − f_j(x_j) ≥ Σ_i ( f_j(x_j + g_{i,j}) − f_j(x_j) ) .   (3.3)
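A quick numeric sanity check of Lemma 3.6 on a hypothetical separable convex function and a hypothetical conformal sum:

```python
def f(x):
    # a separable convex function: sum of squares
    return sum(v * v for v in x)

x = (1, -2, 3)
# each g_i is conformal to the total step (3, 0, -4): same orthant, smaller entries
gs = [(2, 0, -1), (1, 0, -2), (0, 0, -1)]
total = tuple(a + sum(g[i] for g in gs) for i, a in enumerate(x))
lhs = f(total) - f(x)
rhs = sum(f(tuple(a + g[i] for i, a in enumerate(x))) - f(x) for g in gs)
print(lhs, rhs, lhs >= rhs)   # 7 -7 True
```

The aggregate step improves (or worsens) at least as much as the individual steps combined, which is what drives the convergence analysis of the augmentation algorithm below.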
by applying the algorithm of Lemma 3.7 for each g ∈ G(A). If the mini-
mal value in (3.6) satisfies f (xk + λg) < f (xk ) then set xk+1 := xk + λg
(Figure: the algorithm iteratively and greedily augments an initial point to an optimal one using elements from G(A).)
and repeat, else stop and output the last point xs in the sequence. Now,
Axk+1 = A(xk + λg) = Axk = b by induction on k, so each xk is feasible.
Since the feasible set is finite and the xk have decreasing objective values
and hence distinct, the algorithm terminates.
We now show that the point xs output by the algorithm is optimal.
Let x∗ be any optimal solution to (3.5). Consider any point xk in the
sequence and suppose it is not optimal. We claim that a new point xk+1
will be produced and will satisfy
f(x_{k+1}) − f(x^∗) ≤ ((2n − 3)/(2n − 2)) (f(x_k) − f(x^∗)) .   (3.7)
By Lemma 3.2, we can write the difference x^∗ − x_k = Σ_{i=1}^t λ_i g_i as a conformal sum involving 1 ≤ t ≤ 2n − 2 elements g_i ∈ G(A) with all λ_i ∈ Z_+. By Lemma 3.6,

f(x^∗) − f(x_k) = f( x_k + Σ_{i=1}^t λ_i g_i ) − f(x_k) ≥ Σ_{i=1}^t ( f(x_k + λ_i g_i) − f(x_k) ) .
Σ_{i=1}^t ( f(x_k + λ_i g_i) − f(x^∗) ) ≤ (t − 1) ( f(x_k) − f(x^∗) ) .
f(x_k + λg) − f(x^∗) ≤ min_i ( f(x_k + λ_i g_i) − f(x^∗) ) ≤ ((2n − 3)/(2n − 2)) (f(x_k) − f(x^∗))
and so indeed xk+1 := xk + λg will be produced and will satisfy (3.7).
This shows that the last point xs produced and output by the algorithm is
indeed optimal.
We proceed to bound the number s of points. Consider any i < s and
the intermediate non optimal point xi in the sequence produced by the
algorithm. Then f (xi ) > f (x∗ ) with both values integer, and so repeated
use of (3.7) gives
1 ≤ f(x_i) − f(x^∗) = [ ∏_{k=0}^{i−1} (f(x_{k+1}) − f(x^∗)) / (f(x_k) − f(x^∗)) ] · (f(x) − f(x^∗))
  ≤ ((2n − 3)/(2n − 2))^i (f(x) − f(x^∗)) ,

and therefore

i ≤ ( log((2n − 2)/(2n − 3)) )^{−1} · log( f(x) − f(x^∗) ) .

Thus, the number of points produced and the total running time are polynomial.
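The augmentation scheme of Lemma 3.8 can be sketched as follows (a toy version: step lengths are found by scanning rather than by the bisection of Lemma 3.7, and the instance — the matrix [1 2 1] with a hypothetical separable convex objective — is for illustration only):

```python
def augment(x, f, graver, lower, upper, max_step=100):
    """Greedy Graver-basis augmentation toward a minimizer (toy version)."""
    def feasible(y):
        return all(lo <= yi <= hi for yi, lo, hi in zip(y, lower, upper))
    while True:
        best, best_val = None, f(x)
        for g in graver:
            for lam in range(1, max_step + 1):
                y = tuple(xi + lam * gi for xi, gi in zip(x, g))
                if not feasible(y):
                    break                      # larger steps leave the box
                if f(y) < best_val:
                    best, best_val = y, f(y)
        if best is None:
            return x                           # no augmenting step: x is optimal
        x = best                               # augment and repeat

# Graver basis of A = [1 2 1]; every g lies in ker A, so Ax = 8 is preserved
graver = [(2, -1, 0), (0, -1, 2), (1, 0, -1), (1, -1, 1),
          (-2, 1, 0), (0, 1, -2), (-1, 0, 1), (-1, 1, -1)]
f = lambda x: sum((xi - 1)**2 for xi in x)     # separable convex objective
x0 = (0, 4, 0)                                 # feasible start: 0 + 2*4 + 0 = 8
print(augment(x0, f, graver, (0, 0, 0), (4, 4, 4)))   # (2, 2, 2)
```

From the start (0, 4, 0) the best augmenting step is 2 · (1, −1, 1), which reaches the optimum (2, 2, 2) in one iteration; no Graver direction improves it further, so the loop stops.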
Next we show that Lemma 3.8 can also be used to find an initial
feasible point for the given integer program or assert that none exists in
polynomial time.
Lemma 3.9. There is an algorithm that, given integer m × n matrix
A, its Graver basis G(A), l, u ∈ Zn∞ , and b ∈ Zm , either finds an x ∈ Zn
First note that ℓ̂_j > −∞ if and only if l_j > −∞, and û_j < ∞ if and only if u_j < ∞. Therefore there is no g ∈ G(A) satisfying g_i ≤ 0 whenever û_i < ∞ and g_i ≥ 0 whenever ℓ̂_i > −∞, and hence the feasible set of (3.8) is finite by Lemma 3.3. Next note that x̂ is feasible in (3.8). Now apply the algorithm of Lemma 3.8 to (3.8) and obtain an optimal solution x. Note that this can be done in polynomial time since the binary length of x̂, and therefore also of ℓ̂, û, and of the maximum value f̂ of |f(x)| over the feasible set of (3.8), are polynomial in the length of the data.
Now note that every point z ∈ S is feasible in (3.8), and every point z
feasible in (3.8) satisfies f (z) ≥ 0 with equality if and only if z ∈ S. So, if
f (x) > 0 then the original set S is empty, whereas if f (x) = 0 then x ∈ S
is a feasible point.
We are finally in position, using Lemmas 3.8 and 3.9, to show that the Graver basis allows us to solve the nonlinear integer program (3.2) in polynomial time. As usual, f̂ is the maximum of |f(x)| over the feasible
set and need not be part of the input.
Theorem 3.1. [12] There is an algorithm that, given integer m × n
matrix A, its Graver basis G(A), l, u ∈ Z^n_∞, b ∈ Z^m, and separable convex f : Z^n → Z presented by a comparison oracle, solves in time polynomial in ⟨A, G(A), l, u, b, f̂⟩ the problem

min{ f(x) : x ∈ Z^n , Ax = b , l ≤ x ≤ u } .
Proof. First, apply the polynomial time algorithm of Lemma 3.9 and
either conclude that the feasible set is infinite or empty and stop, or obtain
an initial feasible point and continue. Next, apply the polynomial time
algorithm of Lemma 3.8 and either conclude that the feasible set is infinite
or obtain an optimal solution.
3.1.3. Specializations and extensions.
Now apply the algorithm of the first paragraph above for the lq distance.
Assuming the feasible set is nonempty and finite (else the algorithm stops)
let x∗ be the feasible point which minimizes the lq distance to x̂ obtained
by the algorithm. We claim that it also minimizes the l∞ distance to x̂
and hence is the desired optimal solution. Consider any feasible point x.
By standard inequalities between the l∞ and lq norms,

‖x∗ − x̂‖∞ ≤ ‖x∗ − x̂‖q ≤ ‖x − x̂‖q.

Therefore

‖x∗ − x̂‖∞ ≤ ‖x − x̂‖q ≤ n^{1/q} ‖x − x̂‖∞ < ‖x − x̂‖∞ + 1,

where the last inequality holds by the choice of q. Since ‖x∗ − x̂‖∞ and
‖x − x̂‖∞ are integers we find that ‖x∗ − x̂‖∞ ≤ ‖x − x̂‖∞. This establishes
the claim.
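The norm comparison used in this argument can be checked numerically. In this hedged sketch (helper names are ours), R plays the role of a bound on the l∞ distance, and q is chosen so that n^{1/q}·R < R + 1, which is what forces the lq minimizer to also minimize the l∞ distance over integer points:

```python
import math

def norm_q(z, q):
    return sum(abs(zi) ** q for zi in z) ** (1.0 / q)

def norm_inf(z):
    return max(abs(zi) for zi in z)

# Standard sandwich inequality for a vector of length n:
# ||z||_inf <= ||z||_q <= n**(1/q) * ||z||_inf.
z, n = (3, -1, 2), 3
assert norm_inf(z) <= norm_q(z, 20) <= n ** (1.0 / 20) * norm_inf(z)

# Choose q large enough that n**(1/q) * R < R + 1:
R = 100
q = 1 + math.ceil(math.log(n) / math.log(1 + 1.0 / R))
assert n ** (1.0 / q) * R < R + 1
```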
In particular, for all positive p ∈ Z∞, using the Graver basis we can
solve

min{‖x‖p : x ∈ Zn, Ax = b, l ≤ x ≤ u},

min{max{|xi| : i = 1, . . . , n} : x ∈ Zn, Ax = b, l ≤ x ≤ u}.
max {f (W x) : x ∈ S} .
Proof. Consider any edge e of PI and pick two distinct integer points
x, y ∈ e. Then g := y − x is in L∗(A) and hence Lemma 3.1 implies that
g = Σi hi is a conformal sum for suitable hi ∈ G(A). We claim that
x + hi ∈ PI for all i. Indeed, hi ∈ G(A) implies A(x + hi) = Ax = b, and
l ≤ x, x + g ≤ u and hi ⊑ g imply l ≤ x + hi ≤ u.

Now let w ∈ Zn be uniquely maximized over PI at the edge e. Then
whi = w(x + hi) − wx ≤ 0 for all i. But Σi whi = wg = wy − wx = 0,
implying that in fact whi = 0 and hence x + hi ∈ e for all i. This implies
that each hi is a direction of e (in fact, all hi are the same and g is a multiple
of some Graver basis element).
Using Theorems 3.2 and 3.4 and Lemma 3.10 we obtain the following
theorem.
Theorem 3.5. [5] For every fixed d there is an algorithm that, given an
integer m × n matrix A, its Graver basis G(A), l, u ∈ Zn∞, b ∈ Zm, an integer
d × n matrix W, and a convex function f : Zd → R presented by a comparison
oracle, solves in time which is polynomial in ⟨A, W, G(A), l, u, b⟩, the convex
integer maximization problem

max{f(W x) : x ∈ Zn, Ax = b, l ≤ x ≤ u}.
We proceed to establish a result of [24] and its extension in [16] which show
that, in fact, the Graver complexity of every integer bimatrix A is finite.
Consider n-fold products A(n) of A. By definition of the n-fold prod-
uct, A(n)x = 0 if and only if A1(x1 + · · · + xn) = 0 and A2 xk = 0 for all k. In
particular, a necessary condition for x to lie in L(A(n) ), and in particular
in G(A(n) ), is that xk ∈ L(A2 ) for all k. Call a vector x = (x1 , . . . , xn ) full
if, in fact, xk ∈ L∗ (A2 ) for all k, in which case type(x) = n, and pure if,
moreover, xk ∈ G(A2 ) for all k. Full vectors, and in particular pure vectors,
are natural candidates for lying in the Graver basis G(A(n) ) of A(n) , and
will indeed play an important role in its construction.
Consider any full vector y = (y1, . . . , ym). By definition, each brick of
y satisfies yi ∈ L∗(A2) and is therefore a conformal sum yi = Σ_{j=1}^{ki} xi,j of
some elements xi,j ∈ G(A2). Let n := k1 + · · · + km ≥ m and let
x be the pure vector

x := (x1,1, . . . , x1,k1, . . . , xm,1, . . . , xm,km).
We call the pure vector x an expansion of the full vector y, and we call
the full vector y a compression of the pure vector x. Note that A1 Σi yi =
A1 Σi,j xi,j and therefore y ∈ L(A(m)) if and only if x ∈ L(A(n)). Note also
that each full y may have many different expansions and each pure x may
have many different compressions.
Lemma 3.11. Consider any full y = (y 1 , . . . , y m ) and any expansion
x = (x1 , . . . , xn ) of y. If y is in the Graver basis G(A(m) ) then x is in the
Graver basis G(A(n) ).
Proof. Let x = (x1,1, . . . , xm,km) = (x1, . . . , xn) be an expansion of
y = (y1, . . . , ym) with yi = Σ_{j=1}^{ki} xi,j for each i. Suppose indirectly that
y ∈ G(A(m)) but x ∉ G(A(n)). Since y ∈ L∗(A(m)) we have x ∈ L∗(A(n)).
Then some nonzero element of L∗(A(n)) is properly conformally contained
in x, and its compression is a nonzero element of L∗(A(m)) properly con-
formally contained in y, contradicting y ∈ G(A(m)). Hence
x ∈ G(A(n)).
So the type of any pure vector, and hence the Graver complexity of
A, is at most the largest value Σ_{i=1}^{p} vi of any nonnegative vector v in the
Graver basis G(A1G2).
We proceed to establish the following theorem from [4] which asserts
that Graver bases of n-fold products can be computed in polynomial time.
An n-lifting of a vector y = (y 1 , . . . , y m ) consisting of m bricks is any vector
z = (z 1 , . . . , z n ) consisting of n bricks such that for some 1 ≤ k1 < · · · <
km ≤ n we have z ki = y i for i = 1, . . . , m, and all other bricks of z are
zero; in particular, n ≥ m and type(z) = type(y).
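The definition of an n-lifting is directly enumerable: choose which n positions receive the m bricks of y and pad the rest with zero bricks. A small stdlib-only sketch (the helper name is ours):

```python
from itertools import combinations

def n_liftings(y_bricks, n):
    """All n-liftings of y = (y^1, ..., y^m): choose indices
    1 <= k_1 < ... < k_m <= n for the m bricks of y and fill the
    remaining bricks of z with zero bricks."""
    m = len(y_bricks)
    zero = (0,) * len(y_bricks[0])
    lifts = []
    for positions in combinations(range(n), m):
        z = [zero] * n
        for pos, brick in zip(positions, y_bricks):
            z[pos] = brick
        lifts.append(tuple(z))
    return lifts

# A vector with m = 2 bricks has C(3, 2) = 3 liftings into n = 3 bricks.
lifts = n_liftings([(1, 0), (0, -1)], n=3)
```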
Theorem 3.6. [4] For every fixed integer bimatrix A there is an
algorithm that, given positive integer n, computes the Graver basis G(A(n) )
of the n-fold product of A, in time which is polynomial in n. In particular,
So the set of all n-liftings of vectors in G(A(g) ) and hence the Graver basis
G(A(n) ) of the n-fold product can be computed in time polynomial in n as
claimed.
3.2.2. N-fold integer programming in polynomial time. Com-
bining Theorem 3.6 and the results of §3.1 we now obtain Theorems 1.1–1.4.
Theorem 1.1 [4] For each fixed integer (r, s) × t bimatrix A, there is an
algorithm that, given positive integer n, l, u ∈ Znt∞, b ∈ Zr+ns, and w ∈ Znt,
solves in time which is polynomial in n and ⟨l, u, b, w⟩, the following linear
n-fold integer program,

min{wx : x ∈ Znt, A(n)x = b, l ≤ x ≤ u}.
Proof. Compute the Graver basis G(A(n) ) using the algorithm of The-
orem 3.6. Now apply the algorithm of Theorem 3.2 with this Graver basis
and solve the problem.
Theorem 1.2 [12] For each fixed integer (r, s) × t bimatrix A, there is
an algorithm that, given n, l, u ∈ Znt∞, b ∈ Zr+ns, and separable convex
f : Znt → Z presented by a comparison oracle, solves in time polynomial
in n and ⟨l, u, b, f̂⟩, the following nonlinear n-fold integer program,

min{f(x) : x ∈ Znt, A(n)x = b, l ≤ x ≤ u}.
Proof. Compute the Graver basis G(A(n) ) using the algorithm of The-
orem 3.6. Now apply the algorithm of Theorem 3.1 with this Graver basis
and solve the problem.
Theorem 1.3 [12] For each fixed integer (r, s) × t bimatrix A, there is an
algorithm that, given positive integers n and p, l, u ∈ Znt∞, b ∈ Zr+ns, and
x̂ ∈ Znt, solves in time polynomial in n, p, and ⟨l, u, b, x̂⟩, the following
distance minimization n-fold integer program,

min{‖x − x̂‖p : x ∈ Znt, A(n)x = b, l ≤ x ≤ u}.
Proof. Compute the Graver basis G(A(n) ) using the algorithm of The-
orem 3.6. Now apply the algorithm of Theorem 3.5 with this Graver basis
and solve the problem.
3.2.3. Weighted separable convex integer minimization. We
proceed to establish Theorem 1.5 which is a broad extension of Theorem
1.2 that allows the objective function to include a composite term of the
form f (W x), where f : Zd → Z is a separable convex function and W is
an integer matrix with d rows, and to incorporate inequalities on W x. We
begin with two lemmas. As before, f̂, ĝ denote the maximum values of
|f(W x)|, |g(x)| over the feasible set.
Lemma 3.13. There is an algorithm that, given an integer m × n
matrix A, an integer d × n matrix W, l, u ∈ Zn∞, l̂, û ∈ Zd∞, b ∈ Zm, the
Graver basis G(B) of

B := ( A  0
       W  I ),
B := ( A(n)  0
       W(n)  I ).
Apply the algorithm of Theorem 3.6 and compute in polynomial time the
Graver basis G(D (n) ) of the n-fold product of D, which is the following
matrix:
D(n) = ( A1  0   0    A1  0   0    · · ·  A1  0   0
         W1  Ip  0    W1  Ip  0    · · ·  W1  Ip  0
         A2  0   0    0   0   0    · · ·  0   0   0
         W2  0   Iq   0   0   0    · · ·  0   0   0
         0   0   0    A2  0   0    · · ·  0   0   0
         0   0   0    W2  0   Iq   · · ·  0   0   0
         ..  ..  ..   ..  ..  ..          ..  ..  ..
         0   0   0    0   0   0    · · ·  A2  0   0
         0   0   0    0   0   0    · · ·  W2  0   Iq )
Suitable row and column permutations applied to D(n) give the following
matrix:
C := ( A1  A1  · · ·  A1   0   0   · · ·  0    0   0   · · ·  0
       A2  0   · · ·  0    0   0   · · ·  0    0   0   · · ·  0
       0   A2  · · ·  0    0   0   · · ·  0    0   0   · · ·  0
       ..  ..         ..   ..  ..         ..   ..  ..         ..
       0   0   · · ·  A2   0   0   · · ·  0    0   0   · · ·  0
       W1  W1  · · ·  W1   Ip  Ip  · · ·  Ip   0   0   · · ·  0
       W2  0   · · ·  0    0   0   · · ·  0    Iq  0   · · ·  0
       0   W2  · · ·  0    0   0   · · ·  0    0   Iq  · · ·  0
       ..  ..         ..   ..  ..         ..   ..  ..         ..
       0   0   · · ·  W2   0   0   · · ·  0    0   0   · · ·  Iq )
Obtain the Graver basis G(C) in polynomial time from G(D(n)) by per-
muting the entries of each element of the latter by the permutation of the
columns of D(n) that is used to get C (the permutation of the rows does
not affect the Graver basis).
Now, note that the matrix B can be obtained from C by dropping all
but the first p columns in the second block. Consider any element in G(C),
indexed, according to the block structure, as

(x1, x2, . . . , xn, y1, y2, . . . , yn, z1, z2, . . . , zn).

The Graver basis of B is then obtained from G(C) as

G(B) := {(x1, . . . , xn, y1, z1, . . . , zn) :
(x1, . . . , xn, y1, 0, . . . , 0, z1, . . . , zn) ∈ G(C)}.
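This projection rule for G(B) can be made concrete. A sketch, assuming Graver basis elements are stored as three tuples of bricks (the data layout and function name are our choices, not the text's):

```python
def project_graver(g_C):
    """Build G(B) from G(C): keep elements whose bricks y^2, ..., y^n
    all vanish, then drop those zero bricks, retaining
    (x^1, ..., x^n, y^1, z^1, ..., z^n)."""
    g_B = []
    for xs, ys, zs in g_C:
        if all(all(c == 0 for c in brick) for brick in ys[1:]):
            g_B.append((xs, ys[0], zs))
    return g_B

# Toy data with n = 2 and one-dimensional bricks:
g_C = [
    (((1,), (-1,)), ((2,), (0,)), ((0,), (0,))),  # y^2 = 0: kept
    (((1,), (-1,)), ((1,), (1,)), ((0,), (0,))),  # y^2 != 0: dropped
]
g_B = project_graver(g_C)
```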
Proof. Use the algorithm of Lemma 3.14 to compute the Graver basis
G(B) of

B := ( A(n)  0
       W(n)  I ).
Now apply the algorithm of Lemma 3.13 and solve the nonlinear integer
program.
4. Discussion. We conclude with a short discussion of the universal-
ity of n-fold integer programming and the Graver complexity of (directed)
graphs, an important new invariant which controls the complexity of our
multiway table and multicommodity flow applications.
4.1. Universality of n-fold integer programming. Let us intro-
duce the following notation. For an integer s × t matrix D, let D̄ denote
the (t, s) × t bimatrix whose first block is the t × t identity matrix and
whose second block is D. Consider the following special form of the n-fold
product, defined for a matrix D by D[n] := (D̄)(n). We consider such m-
fold products of the 1 × 3 matrix 13 := [1, 1, 1]. Note that 13[m] is precisely
the (3 + m) × 3m incidence matrix of the complete bipartite graph K3,m.
For instance, for m = 3, it is the matrix

          ( 1 0 0 1 0 0 1 0 0 )
          ( 0 1 0 0 1 0 0 1 0 )
13[3]  =  ( 0 0 1 0 0 1 0 0 1 )
          ( 1 1 1 0 0 0 0 0 0 )
          ( 0 0 0 1 1 1 0 0 0 )
          ( 0 0 0 0 0 0 1 1 1 )
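This matrix is easy to generate for any m; a small stdlib-only sketch (the helper name is ours):

```python
def incidence_k3m(m):
    """The (3 + m) x 3m incidence matrix of K_{3,m}, i.e. the matrix
    1_3^[m]: the first 3 rows stack m copies of the 3 x 3 identity,
    the last m rows place the row [1, 1, 1] block-diagonally."""
    rows = [[1 if j % 3 == i else 0 for j in range(3 * m)] for i in range(3)]
    for k in range(m):
        rows.append([1 if 3 * k <= j < 3 * (k + 1) else 0
                     for j in range(3 * m)])
    return rows

# For m = 3 this reproduces the 6 x 9 matrix displayed above.
M = incidence_k3m(3)
```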
We can now rewrite Theorem 2.1 in the following compact and elegant
form.
The Universality Theorem [7] Every rational polytope {y ∈ Rd+ : Ay =
b} stands in polynomial time computable integer preserving bijection with
some polytope

{x ∈ R3mn+ : 13[m][n] x = a}.    (4.1)
This also shows the universality of n-fold integer programming: every lin-
ear or nonlinear integer program is equivalent to an n-fold integer program
over some bimatrix 13[m], which is completely determined by a single pa-
rameter m.
Moreover, for every fixed m, program (4.2) can be solved in polynomial
time for linear forms and broad classes of convex and concave functions by
Theorems 1.1–1.5.
4.2. Graver complexity of graphs and digraphs. The signifi-
cance of the following new (di)-graph invariant will be explained below.
Definition 4.1. [1] The Graver complexity of a graph or a digraph
G is the Graver complexity g(G) := g(D̄) of the bimatrix D̄, with D the
incidence matrix of G.
One major task done by our algorithms for linear and nonlinear n-fold
integer programming over a bimatrix A is the construction of the Graver
basis G(A(n)) in time O(n^g(A)), with g(A) the Graver complexity of A (see
the proof of Theorem 3.6).
Since the bimatrix underlying the universal n-fold integer program
(4.2) is precisely D̄ with D = 13[m] the incidence matrix of K3,m, it
follows that the complexity of computing the relevant Graver bases for this
program for fixed m and variable n is O(n^g(K3,m)), where g(K3,m) is the
Graver complexity of K3,m as just defined.
Turning to the many-commodity transshipment problem over a di-
graph G discussed in §2.2.1, the bimatrix underlying the n-fold integer
program (2.1) in the proof of Theorem 2.5 is precisely D̄ with D the
incidence matrix of G, and so it follows that the complexity of computing
the relevant Graver bases for this program is O(n^g(G)), where g(G) is the
Graver complexity of the digraph G as just defined.
So the Graver complexity of a (di)-graph controls the complexity of
computing the Graver bases of the relevant n-fold integer programs;
hence its significance.
Unfortunately, our present understanding of the Graver complexity of
(di)-graphs is very limited and much more study is required. Very little
is known even for the complete bipartite graphs K3,m : while g(K3,3 ) = 9,
already g(K3,4 ) is unknown. See [1] for more details and a lower bound on
g(K3,m ) which is exponential in m.
Acknowledgements. I thank Jon Lee and Sven Leyffer for inviting
me to write this article. I am indebted to Jesus De Loera, Raymond Hem-
mecke, Uriel Rothblum and Robert Weismantel for their collaboration in
REFERENCES
[1] Berstein Y. and Onn S., The Graver complexity of integer programming, Annals
Combin. 13 (2009) 289–296.
[2] Cook W., Fonlupt J., and Schrijver A., An integer analogue of Carathéodory’s
theorem, J. Comb. Theory Ser. B 40 (1986) 63–70.
[3] Cox L.H., On properties of multi-dimensional statistical tables, J. Stat. Plan.
Infer. 117 (2003) 251–273.
[4] De Loera J., Hemmecke R., Onn S., and Weismantel R., N-fold integer pro-
gramming, Disc. Optim. 5 (Volume in memory of George B. Dantzig) (2008)
231–241.
[5] De Loera J., Hemmecke R., Onn S., Rothblum U.G., and Weismantel R.,
Convex integer maximization via Graver bases, J. Pure App. Algeb. 213 (2009)
1569–1577.
[6] De Loera J. and Onn S., The complexity of three-way statistical tables, SIAM
J. Comp. 33 (2004) 819–836.
[7] De Loera J. and Onn S., All rational polytopes are transportation polytopes
and all polytopal integer sets are contingency tables, In: Proc. IPCO 10 –
Symp. on Integer Programming and Combinatorial Optimization (Columbia
University, New York), Lec. Not. Comp. Sci., Springer 3064 (2004) 338–351.
[8] De Loera J. and Onn S., Markov bases of three-way tables are arbitrarily com-
plicated, J. Symb. Comp. 41 (2006) 173–181.
[9] Fienberg S.E. and Rinaldo A., Three centuries of categorical data analysis: Log-
linear models and maximum likelihood estimation, J. Stat. Plan. Infer. 137
(2007) 3430–3445.
[10] Gordan P., Über die Auflösung linearer Gleichungen mit reellen Coefficienten,
Math. Annalen 6 (1873) 23–28.
[11] Graver J.E., On the foundations of linear and linear integer programming I, Math.
Prog. 9 (1975) 207–226.
[12] Hemmecke R., Onn S., and Weismantel R., A polynomial oracle-time algorithm
for convex integer minimization, Math. Prog. 126 (2011) 97–117.
[13] Hemmecke R., Onn S., and Weismantel R., N-fold integer programming and
nonlinear multi-transshipment, Optimization Letters 5 (2011) 13–25.
[14] Hochbaum D.S. and Shanthikumar J.G., Convex separable optimization is not
much harder than linear optimization, J. Assoc. Comp. Mach. 37 (1990)
843–862.
[15] Hoffman A.J. and Kruskal J.B., Integral boundary points of convex polyhedra,
In: Linear inequalities and Related Systems, Ann. Math. Stud. 38, 223–246,
Princeton University Press, Princeton, NJ (1956).
[16] Hoşten S. and Sullivant S., Finiteness theorems for Markov bases of hierarchical
models, J. Comb. Theory Ser. A 114 (2007) 311–321.
[17] Irving R. and Jerrum M.R., Three-dimensional statistical data security problems,
SIAM J. Comp. 23 (1994) 170–184.
[18] Lenstra Jr. H.W., Integer programming with a fixed number of variables, Math.
Oper. Res. 8 (1983) 538–548.
[19] Motzkin T.S., The multi-index transportation problem, Bull. Amer. Math. Soc.
58 (1952) 494.
[20] Onn S., Entry uniqueness in margined tables, In: Proc. PSD 2006 – Symp. on
Privacy in Statistical Databases (Rome, Italy), Lec. Not. Comp. Sci., Springer
4302 (2006) 94–101.
[21] Onn S., Convex discrete optimization, In: Encyclopedia of Optimization, Springer
(2009) 513–550.
[22] Onn S., Nonlinear discrete optimization, Zurich Lectures in Advanced Mathemat-
ics, European Mathematical Society, 2010.
[23] Onn S. and Rothblum U.G., Convex combinatorial optimization, Disc. Comp.
Geom. 32 (2004) 549–566.
[24] Santos F. and Sturmfels B., Higher Lawrence configurations, J. Comb. Theory
Ser. A 103 (2003) 151–164.
[25] Schrijver A., Theory of Linear and Integer Programming, Wiley, New York
(1986).
[26] Sebő A., Hilbert bases, Carathéodory’s theorem and combinatorial optimization,
In: Proc. IPCO 1 - 1st Conference on Integer Programming and Combinatorial
Optimization (R. Kannan and W.R. Pulleyblank Eds.) (1990) 431–455.
[27] Vlach M., Conditions for the existence of solutions of the three-dimensional planar
transportation problem, Disc. App. Math. 13 (1986) 61–78.
[28] Yemelichev V.A., Kovalev M.M., and Kravtsov M.K., Polytopes, Graphs and
Optimisation, Cambridge University Press, Cambridge (1984).
PART IX:
Applications
MINLP APPLICATION FOR
ACH INTERIORS RESTRUCTURING
ERICA KLAMPFL∗ AND YAKOV FRADKIN∗
Abstract. In 2006, Ford Motor Company committed to restructure the $1.5 billion
ACH interiors business. This extensive undertaking required a complete re-engineering
of the supply footprint of 42 high-volume product lines over 26 major manufacturing pro-
cesses and more than 50 potential supplier sites. To enable data-driven decision making,
we developed a decision support system (DSS) that could quickly yield a variety of so-
lutions for different business scenarios. To drive this DSS, we developed a non-standard
mathematical model for the assignment problem and a novel practical approach to solve
the resulting large-scale mixed-integer nonlinear program (MINLP). In this paper, we
present the MINLP and describe how we reformulated it to remove the nonlinearity in
the objective function, while still capturing the supplier facility cost as a function of the
supplier’s utilization. We also describe our algorithm and decoupling method that scale
well with the problem size and avoid the nonlinearity in the constraints. Finally, we
provide a computational example to demonstrate the algorithm’s functionality.
∗ Ford Research & Advanced Engineering, Systems Analytics & Environmental Sci-
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 597
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_21,
© Springer Science+Business Media, LLC 2012
3.1. Input information. The list below describes the inputs and
calculated coefficients that we use in the model.
Note that only one product program of any given product program group
can get a freight discount. So, if product programs i1 and i2 are in the same
freight family and both product programs i1 and i2 are produced at facility
j, then wi1 j = 1 and wi2 j = 0 ∀ j ∈ S and ∀ i1 , i2 ∈ C p . For example, an
instrument panel could receive reduced transportation costs by assigning
cockpit assembly work to the same location as the base instrument panel
and shipping a single finished assembly to the final assembly plant, as
opposed to separately shipping the cockpit and instrument panel.
The fourth variable, mpj , specifies how many units of capacity of type
p ∈ P to move to supplier j ∈ S. This is an integer variable that is greater
than or equal to zero. Note that this variable is integer-constrained (see
constraint (3.18)) by M P Mpj , which is introduced in Section 3.1. An
example unit of capacity that could be moved to a facility is a 2,500 ton
press.
The variable upj is a binary variable that keeps track of whether or
not any additional units of capacity of process p were added to a facility j
that did not already have any initial installed capacity of process type p.
upj =  1  if any units of process p ∈ P are added to facility j ∈ S,
          such that IPMCpj = 0;
       0  otherwise.
xi1j = xi2j ∀ i1, i2 ∈ C, j ∈ S :
i2 < i1 ∧ PFi1 ≠ 0 ∧ PFi2 = PFi1.    (3.3)
We guarantee that manufacturing requirements belonging to the
same product program are sourced to the same facility by grouping
them in the same “product-program family” and constraining those
in the same product-program family to be sourced to the same
facility.
• If a requirement i is produced at facility j, then the total fractional
amount produced at j over all processes p should equal 1. In
addition, if a requirement i is not produced at facility j, then no
fractional amount of requirement i may be produced on a process
p belonging to that facility j (nm = 13,053 constraints).

Σ_{p∈P : RLip>0} vipj = xij ∀ i ∈ C, j ∈ S.    (3.4)
4.1. Input Information. The list below describes the additional in-
puts that were necessary for the MINLP reformulation.
Q = {1, . . . , ℓ} set of utilization ranges for facilities.
QCqj = production cost multiplier for facility j ∈ S when the facility
is in utilization range q ∈ Q.
UBq = upper bound on utilization range q ∈ Q; UB0 = 0.
UBCi = base unit cost of product program i ∈ C p .
UCi = calculated unit cost of product program i ∈ C p .
PCij = calculated production cost of product program i ∈ C p at facility j ∈ S.
Note that UCi and PCij were previously the variables uci and pcij in
Section 3 but are now parameters in the reformulated MINLP.
Fig. 1. We discretized each facility’s utilization curve into ranges; range lengths
could be unequal to better approximate the curve. In this diagram, the utilization
range qj is defined by the interval from UBqj−1 to UBqj and has a unit cost fac-
tor QCqj j ∀ j ∈ S.
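Looking up which utilization range a given utilization value falls into is a simple search over the upper bounds. A sketch under the convention that range q covers (UB_{q−1}, UB_q] (the function name and the half-open convention are our assumptions):

```python
import bisect

def utilization_range(ub, u):
    """Return the 1-based index q of the utilization range
    (UB_{q-1}, UB_q] containing u, given increasing upper bounds
    ub = [UB_1, ..., UB_l] with UB_0 = 0 implied."""
    q = bisect.bisect_left(ub, u) + 1
    if q > len(ub):
        raise ValueError("utilization exceeds the last range")
    return q

# Ranges of unequal length, as in Figure 1:
ub = [0.5, 0.8, 0.9, 1.0]
```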
The following constraints are new constraints that were not in the
original MINLP described in Section 3.4. They have been added as a result
of introducing the utilization range approximation to the facility cost curve.
• A facility j can use at most one utilization level q. If a facility is
not chosen for Ford’s business (even though the facility may carry
third-party business), then we specify that no utilization range at
that facility is chosen (m = 57 constraints).
Σ_q yqj ≤ 1 ∀ j ∈ S    (4.1)
• The next two constraints are linking constraints. The first guar-
antees that if no product program i is sourced to a facility j in
This reformulated MINLP has a linear objective function and several non-
linear constraints, (3.15) through (3.17) from the original MINLP plus (4.2)
and (4.3). In total, there are mn(n−1)/2 + λ(λ−1)(2m+1)/2 + m(3λ + 2n +
4r + 2nℓ + 3ℓ + 2) + 2n + 1 = 1,889,616 constraints in this discrete MINLP refor-
mulation. Recall that this is an upper bound on the number of constraints
since not all facilities are considered in each scenario.
5. Solution technique. Although we were able to remove the nonlin-
earity in the objective function to create the reformulated MINLP described
in the previous section, we are still left with a large-scale MINLP where
the discrete and continuous variables are non-separable. In this section,
we describe how we obviate the nonlinearity in the constraints of the refor-
mulated MINLP by iteratively fixing different variables in the reformulated
MINLP, allowing us to solve a MIP at each iteration. By carefully selecting
which variable to fix at each iteration, we are not only able to solve MIPs
at each iteration but also to reduce the number of constraints
and variables in each formulation. We first introduce the two MIPs that
we iteratively solve, each of which captures a subset of the decisions to be
made. Next, we describe the algorithm that provides us with a solution
to the discrete MINLP. Finally, we discuss convergence properties of the
algorithm.
5.1. Decoupling the MINLP. We introduce a decoupling of the
discrete MINLP model into two MIPs that can then be solved iteratively
until convergence to a solution of the discrete MINLP reformulation (see
Figure 2). Each MIP is actually a subset of the reformulated MINLP
with a different subset of variables fixed in the different MIPs. We chose
this approach because solution techniques for MIPs are well defined for
Fig. 2. Flowchart of the solution algorithm: after initialization, the algorithm
alternates between solving the Facility Capacity Model and the Facility Utilization
Model, recalculating parameters and testing for convergence after each solve, and
stops once a convergence test succeeds.
The first MIP model we refer to as the Facility Capacity Model (FCM);
it assumes that the utilization range for a supplier is fixed, and it solves to
determine to which facilities and processes to source and whether or not
facilities that do not currently have capacity should have capacity added.
This solution will then determine upper bounds on a facility’s process ca-
pacity. The second model we refer to as the Facility Utilization Model
(FUM). This model accepts as inputs a facility’s upper bound on capacity
for each process, the number of unique processes at each facility, and the
initial facility percent utilization and determines which utilization range a
facility falls within, as well as the same sourcing decisions. We conclude
this section with a detailed discussion of the algorithm in Figure 2.
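The control flow of Figure 2 can be summarized in a few lines. The sketch below uses stub solver callbacks in place of the actual MIP solves; all names and the callback signatures are ours, not the paper's:

```python
def alternate(solve_fcm, solve_fum, init_ranges, max_iter=100):
    """Alternate the two MIPs until neither strictly improves the
    incumbent objective.  solve_fcm(ranges) -> (obj, capacity);
    solve_fum(capacity) -> (obj, ranges)."""
    best, ranges = float("inf"), init_ranges
    for _ in range(max_iter):
        obj_c, capacity = solve_fcm(ranges)   # utilization ranges fixed
        if obj_c >= best:
            break                             # converged after the FCM
        best = obj_c
        obj_u, ranges = solve_fum(capacity)   # capacity moves fixed
        if obj_u >= best:
            break                             # converged after the FUM
        best = obj_u
    return best

# Stub solvers that improve twice and then stall:
vals = iter([10.0, 8.0, 8.0])
best = alternate(lambda r: (next(vals, 8.0), None),
                 lambda c: (next(vals, 8.0), None), init_ranges=None)
```

The monotonicity argument of Section 5.3 is what guarantees this loop terminates finitely for the actual models.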
5.1.1. Facility capacity problem formulation. The FCM MIP as-
sumes that the utilization range for each supplier is fixed; hence, we remove
the subscript q from all variables. The FCM answers the following two
questions: to which facilities and processes to source, and whether or not
facilities that do not currently have capacity should have capacity added.
As stated in Section 2, most categories of ACH-owned equipment could be
moved from Saline or Utica to a new supplier’s site, at a certain expense.
This FCM solution will then determine upper bounds on a facility’s process
capacity. The FCM model differs from the reformulated MINLP in that
it does not include the variable yqj and so does not need the associated
bounds or the constraints associated with this variable in (4.1)–(4.6). Ad-
ditionally, constraints (3.13) through (3.15) are not needed because nppj ,
pmcpj , and ipuj are calculated after the FCM is solved. Constraints (3.16)
and (3.17) are not needed because PCiqj and UCiqj are not decision vari-
ables and are set according to the fixed utilization range from the previous
solve of the FUM. Constraints (3.18) through (3.20) are no longer needed
Min^k_C = min_{xij, wij, vipj, mpj} Σ_{i∈C, j∈S} (FCij + TCij + PC^{k−1}_ij) xij
− Σ_{i∈C\{i : CPMi=0}, j∈S} FCij wij + Σ_{p∈P, j∈S} MCpj mpj + Σ_{j∈S^c} CCj

s.t.

Σ_{j∈S} xij = 1 ∀ i ∈ C    (5.1)

xi1j = xi2j ∀ i1, i2 ∈ C, j ∈ S :
i2 < i1 ∧ PFi1 ≠ 0 ∧ PFi2 = PFi1    (5.2)

Σ_{p∈P : RLip>0} vipj = xij ∀ i ∈ C, j ∈ S    (5.3)

Σ_{i∈C : RLip>0} RLip vipj ≤ COF(SACpj + CPUpj mpj) ∀ j ∈ S, p ∈ P    (5.4)

Σ_{p∈P : RLip>0, j∈S} vipj = 1 ∀ i ∈ C    (5.5)

Σ_{i∈C, j∈S : Uj=1} EHi xij ≥ UMIN    (5.6)
Σ_{j∈S} SUj xi1j = Σ_{j∈S} SUj xi2j ∀ i1 ∈ C^p : IFi1 > 0,

xij ∈ B ∀ i ∈ C, j ∈ S    (5.12)

0 ≤ vipj ≤ 1 ∀ i ∈ C, p ∈ P, j ∈ S    (5.13)

wij ∈ B ∀ i ∈ C^p, j ∈ S    (5.14)
Min^k_U = min_{xiqj, yqj, vipqj, wij} Σ_{i∈C, q∈Q, j∈S} (FCij + TCij + PCiqj) xiqj
− Σ_{i∈C\{i : CPMi=0}, j∈S} FCij wij + Σ_{p∈P, j∈S} MCpj m^k_pj + Σ_{j∈S^c} CCj

s.t.

Σ_{q∈Q, j∈S} xiqj = 1 ∀ i ∈ C    (5.16)

Σ_{q∈Q} xi1qj = Σ_{q∈Q} xi2qj ∀ i1, i2 ∈ C, j ∈ S : i2 < i1 ∧ PFi1 ≠ 0
∧ PFi2 = PFi1    (5.17)

Σ_{p∈P : RLip>0} vipqj = xiqj ∀ i ∈ C, j ∈ S, q ∈ Q    (5.18)

Σ_{i∈C : RLip>0, q∈Q} RLip vipqj ≤ COF(SACpj + CPUpj m^k_pj) Σ_{q∈Q} yqj
∀ j ∈ S, p ∈ P    (5.19)

Σ_{p∈P : RLip>0, q∈Q, j∈S : SACpj>0} vipqj = 1 ∀ i ∈ C    (5.20)

Σ_{i∈C, q∈Q, j∈S : Uj=1} EHi xiqj ≥ UMIN    (5.21)

Σ_{q∈Q, j∈S} SUj xi1qj = Σ_{q∈Q, j∈S} SUj xi2qj ∀ i1 ∈ C^p : IFi1 > 0,

wij ≤ Σ_{q∈Q} xiqj ∀ i ∈ C^p, j ∈ S    (5.23)
Σ_{q∈Q} xi1qj + Σ_{q∈Q} xi2qj ≥ 2wi1j ∀ j ∈ S, i1 ∈ C^p : CPMi1 > 0,

Σ_{p∈P : RLip>0, q∈Q} RLip vipqj = PRi Σ_{q∈Q} xiqj ∀ i ∈ C, j ∈ S    (5.26)

Σ_q yqj ≤ 1 ∀ j ∈ S    (5.27)

Σ_{i∈C, p∈P : pmc^k_pj>0 ∧ RLip>0} (RLip / (npp^k_j pmc^k_pj)) vipqj
≤ (UBq − ipu^k_j) yqj ∀ j ∈ S, q ∈ Q    (5.28)

Σ_{i∈C, p∈P : pmc^k_pj>0 ∧ RLip>0} (RLip / (npp^k_j pmc^k_pj)) vipqj ≥

yqj ≤ Σ_{i∈C} xiqj ∀ j ∈ S, q ∈ Q    (5.30)

xiqj ∈ B ∀ i ∈ C, q ∈ Q, j ∈ S    (5.32)

yqj ∈ B ∀ q ∈ Q, j ∈ S    (5.33)

0 ≤ vipqj ≤ 1 ∀ i ∈ C, p ∈ P, q ∈ Q, j ∈ S    (5.34)

wij ∈ B ∀ i ∈ C^p, j ∈ S    (5.35)
Once this is done, we can determine the cost multiplier for each facility: if
facility j is in utilization level qj (that is, if y^k_{qj j} = 1), then we use cost
multiplier QC_{qj j} for that facility. The value of QC_{qj j} is used consequently
to determine the unit cost for each product program:

UC^k_ij = UBCi · QC_{qj j} ∀ i ∈ C^p, j ∈ S;

PC^k_ij = NPV · UC^k_ij · VVi / 1,000 ∀ i ∈ C^p, j ∈ S.
Recall that we approximate UCkij as described in Section 4.3 and calculate
PCkij as defined in (3.17). Finally, we update the iteration counter (i.e.,
k = k + 1), and return to the step in Section 5.2.3.
5.2.3. Solving the FCM. The third step in the algorithm in Figure
2 is to solve the FCM, which will tell us how many units of process capacity
are moved to each facility.
This MIP typically took around 10 minutes to solve. The output from
this model is Min^k_C, m^k_pj, wij, xij, and vipj: note that we only pass Min^k_C
and m^k_pj, the objective function value and the process capacity moves to
each facility, respectively, to the next step in the algorithm. The value of
m^k_pj will be used to determine the upper bound on process capacity at the
kth iteration in the FUM.
5.2.4. Convergence check after FCM. The fourth step in the al-
gorithm in Figure 2 is to check for convergence. If Min^k_C < Min^{k−1}_U (i.e.,
we obtained a smaller objective function value for the FCM than we did
we obtained a smaller objective function value for the FCM than we did
for the FUM in the previous iteration), then the algorithm continues. If
not, then the algorithm has converged.
5.2.5. Calculate input parameters for FUM. The fifth step in
the algorithm in Figure 2 is to calculate the input parameters necessary for
the FUM that are functions of or dependent upon mkpj . First, we determine
if any new processes were added to a facility at which there were no prior
processes of that type. In the reformulated MINLP, the variable upj was a
binary variable that established this. However, we can now make the same
determination with the following check: if IPMCpj = 0 ∧ m^k_pj > 0, then no
prior process of that type existed, and now one has been added.
Next, we need to calculate nppkj : constraint (3.13) in the reformulated
MINLP was previously used to determine this. We can now update nppkj
after solving the FCM by iterating over all plants j and all processes p,
updating nppkj only if for any mkpj > 0, IPMCpj = 0. Subsequently, we can
update pmcpj and ipuj . That is, we update these parameters as follows:
Let npp^k_j = IPPj
For j = 1, . . . , m
    For p = 1, . . . , r
        pmcpj = IPMCpj + CPUpj mpj
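The update loop above is truncated in the excerpt. The Python sketch below completes it using the prose description (incrementing npp only when capacity of a previously uninstalled process type is moved in); this is our reading, not verified code from the paper, and the recomputation of ipu is omitted because it is not shown here.

```python
def update_parameters(m_k, IPMC, CPU, IPP, n_fac, n_proc):
    """Recompute npp (unique process count) and pmc (process capacity)
    at each facility after an FCM solve."""
    npp = list(IPP)                 # start from initial process counts
    pmc = [[0.0] * n_fac for _ in range(n_proc)]
    for j in range(n_fac):
        for p in range(n_proc):
            pmc[p][j] = IPMC[p][j] + CPU[p][j] * m_k[p][j]
            # process p is new at facility j if capacity is moved there
            # although none was initially installed
            if m_k[p][j] > 0 and IPMC[p][j] == 0:
                npp[j] += 1
    return npp, pmc

# One facility, two processes; the second process is new (no initial
# installed capacity, two units moved in):
npp, pmc = update_parameters(m_k=[[0], [2]], IPMC=[[5.0], [0.0]],
                             CPU=[[1.0], [3.0]], IPP=[1],
                             n_fac=1, n_proc=2)
```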
That is, we only accept an updated solution for F ∗ if the solution to the
FUM is strictly less than the solution to the FCM at the k th iteration.
Similarly, at the (k + 1)st iteration, the solution for FUM^k is also feasible
for FCM^{k+1} because z^k_U satisfies the constraints of FCM^{k+1} by the na-
ture of the problem formulation. Because z^k_U is feasible for FCM^{k+1}, it
follows that Min^{k+1}_C ≤ Min^k_U, since z^{k+1}_C is the global optimal solution to
FCM^{k+1}. If Min^{k+1}_C < Min^k_U, then F∗ = Min^{k+1}_C. Therefore, our algo-
rithm produces a sequence of solutions that is strictly monotone decreasing. In
addition, this sequence is bounded below because both Min^k_C and Min^k_U
are bounded below. Therefore, our algorithm converges, and convergence
occurs when Min^k_U = Min^k_C = F∗ or Min^{k+1}_C = Min^k_U = F∗. More strongly,
we know that the algorithm converges finitely because the MIP problems
being alternatively solved have discrete and finite feasible regions: the strict
monotonicity implies the algorithm must eventually stop. [9] provides de-
tails about for convergence theory on monotone decreasing sequences.
6. Computational example. Throughout this paper, we have de-
scribed the size of the problem that we actually solved for the ACH interiors
restructuring. In this section, we provide a small computational example
to demonstrate the algorithm and the core decision making capabilities of
the model. We will first introduce the inputs for this example and follow
with the model runs and solution.
6.1. Inputs. First, we describe the input parameters that affect the
dimension of the problem. We have four product programs (λ = 4) having
a total of ten manufacturing requirements (n = 10) and four third-party
suppliers having seven manufacturing facilities (m = 7) to which the prod-
uct programs need to be sourced. There are thirteen potential processes
(r = 13), although not all processes are available at every facility. We
divide a facility’s utilization into ten levels (q = 1, . . . , 10).
Table 1 lists the ten manufacturing requirements C, each characterized by a certain process requirement PR_i. The column PF_i defines the product-program family numbers; there are four unique product programs. For
example, the first two manufacturing requirements correspond to the same product program, so they must be assigned to the same manufacturing facility. The column with C^p ⊂ C provides the indices of the corresponding manufacturing requirements that represent the four unique product programs. Each product program has the following values associated with it: NPV, VV_i, EH_i, UBC_i, IF_i, and CPM_i. Note that NPV = 3.79 for every product program. VV_i is the annual production volume, EH_i is the number of required employees, and UBC_i is the unit production cost. IF_i, when non-zero, is the product program’s “intellectual property family”; in our example, product programs 1 and 4 fall under the same intellectual property agreement and must be assigned to the same supplier, but possibly to different manufacturing facilities of that supplier. Finally, CPM_i,
when non-zero, is the product program’s “freight-family” identifier. In our
Table 1
This table lists the characteristics of the four product programs C^p having ten manufacturing requirements C.
Table 2
This table lists the characteristics of seven manufacturing facilities S.
Table 3
The load RL_{ip} caused by manufacturing requirement i ∈ C if and when fully assigned to process p ∈ P.
p∈P
i∈C 1 2 3 4 5 6 7 8 9 10 11 12 13
1 - 149 149 149 - - - - - - - - -
2 - - - 13.1 13.1 13.1 - - - - - - -
3 - - - - - - - - - - - - 72000
4 52.1 52.1 52.1 - - - - - - - - - -
5 - - 230.2 230.2 230.2 - - - - - - - -
6 - - - - - 456 456 456 456 - - - -
7 - - - - - - - - - 125 - - -
8 - - - - - - - - - - 5300 - -
9 - - - - - - - - - - - 6300 -
10 - - - - - - - - - - - - 8174.8
Table 4
The total initial installed capacity IPMC_{pj} for process p ∈ P at facility j ∈ S.
j∈S
p∈P 1 2 3 4 5 6 7
1 2,400 1,680 - - 1,440 360 480
2 120 600 - - - 480 240
3 - 960 - 120 1,320 120 240
4 1,200 - 120 - 120 240 -
5 150 - - - - 240 -
6 300 - 480 240 - 240 -
7 - - - 480 - - -
8–9 - - - - - - -
10 150 - 240 120 - - -
11 6,500 - 18,252 25,272 - - -
12 7,800 480 9,600 12,000 - - -
13 40,000 874,000 160,000 178,000 288,000 55,000 14,000
Table 5
The available capacity SAC_{pj} of process p ∈ P at facility j ∈ S.
j∈S
p∈P 1 2 3 4 5 6 7
1 1,200 359 - - 1,071 62 -
2 60 241 - - - 111 16
3 - 98 - 120 1,047 17 16
4 330 - - - 102 168 -
5 50 - - - - 46 -
6 90 - 130 25 - 69 -
7 - - - 20 - - -
8-9 - - - - - - -
10 80 - 240 - - - -
11 1,000 - 17,002 21,626 - - -
12 1,000 400 6,384 8,400 - - -
13 10,000 406,010 50,000 75,000 - 16,000 5,000
The overload flex factor is COF = 1.0; i.e., no overload on any of the processes is allowed.
We have the option of moving, if needed, the process capacities from
the closed company-owned facilities to a different location, within certain
limits and at a certain cost. In this example, we assume that the maximum
number of units of capacity of process that can be moved to a facility
Table 6
The cost MC_{pj} of moving a unit of capacity of process p ∈ P to facility j ∈ S.
j∈S
p∈P 1 2 3 4 5 6 7
1 $0.08 $0.08 $0.08 $0.08 $0.08 $0.08 $0.08
2 $0.09 $0.09 $0.09 $0.09 $0.09 $0.09 $0.09
3 $0.10 $0.10 $0.10 $0.10 $0.10 $0.10 $0.10
4 $0.18 $0.18 $0.18 $0.18 $0.18 $0.18 $0.18
5, 7, 12, 13 - - - - - - -
6 $0.33 $0.33 $0.33 $0.33 $0.33 $0.33 $0.33
8 $0.35 $0.35 $0.35 $0.35 $0.35 $0.35 $0.35
9 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38
10 $0.13 $0.13 $0.13 $0.13 $0.13 $0.13 $0.13
11 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38
Table 8 displays the values of the utilization-dependent cost multipliers QC_{qj} for each facility j ∈ S in all of the ranges q ∈ Q.
Table 7
Freight cost FC_{ij}, in $mln, to ship product program i ∈ C^p from facility j ∈ S to the final vehicle assembly destination. This is the net present value cost over the life of the project, calculated as the annual freight cost multiplied by the NPV factor.
j∈S
i ∈ C^p
1 2 3 4 5 6 7
1 0.190 0.476 1.198 0.941 0.654 0.667 0.689
3 648.772 1,031.588 886.521 1,571.560 224.048 1,161.343 224.048
4 0.324 0.322 0.627 0.067 0.453 0.443 0.464
6 96.819 193.072 259.753 322.666 95.312 224.905 106.048
Table 8
This table shows the utilization-dependent cost multiplier QC_{qj} for each facility j ∈ S.
j∈S
q∈Q UBq 1 2 3 4 5 6 7
1 0.1 1.85 2.57 1.85 1.85 2.57 2.57 1.85
2 0.2 1.4 1.76 1.4 1.4 1.76 1.76 1.4
3 0.3 1.17 1.35 1.17 1.17 1.35 1.35 1.17
4 0.4 1.1 1.22 1.1 1.1 1.22 1.22 1.1
5 0.5 1.061 1.15 1.061 1.061 1.15 1.15 1.061
6 0.6 1.04 1.111 1.04 1.04 1.111 1.111 1.04
7 0.7 1.0201 1.08 1.0201 1.0201 1.08 1.08 1.0201
8 0.8 1.01 1.05 1.01 1.01 1.05 1.05 1.01
9 0.9 1 1.04 1 1 1.04 1.04 1
10 1.0 1.00001 1.04001 1.00001 1.00001 1.04001 1.04001 1.00001
This FCM problem had 124 variables and 71 constraints after the
pre-solve. After running the FCM, the manufacturing requirement facility
sourcing decisions are in Table 9, the manufacturing process sourcing decisions are in Table 10, and the objective function value is Min_C^1 = 1362.615864. Also after running the FCM, we get the units of capacity to move: m_{6,3} = 3 (increasing the capacity of process 6 at facility 3 from 130 to 490) and m_{pj} = 0 for all other p ∈ P and j ∈ S. Note from Table 3 that
manufacturing requirement 6 can be run on process 6, 7, 8, or 9 and has a
required load of 456. However, we can see in Table 5 that no process of type 6, 7, 8, or 9 at any of the facilities has enough available capacity to
meet the requirement for manufacturing requirement 6. So, the problem
would be infeasible if we were not allowed to move units of capacity into
a facility. Although the moving costs for the different processes 6, 8, and
9 are the same for facilities 1, 3, 4, and 6 according to Table 6 (process 7
cannot be moved), adding process capacity of type 6 at facilities 1, 4, and
6 and process capacity of type 8 and 9 at all facilities would require moving
4 units of process capacity: moving process capacity of type 6 to facility 3
only requires moving 3 units of process capacity.
Table 9
This table shows the output value xij specifying whether or not manufacturing
requirement i ∈ C is produced by facility j ∈ S. These output values are equivalent for
the first and the second executions of both the FCM and the FUM.
j∈S
i∈C 1 2 3 4 5 6 7
1–2 1 - - - - - -
3 - 1 - - - - -
4–5 1 - - - - - -
6–10 - - 1 - - - -
Since we are solving the FCM in the first iteration, and Min_U^0 is set to some arbitrarily large number, we do not need to check for convergence. We proceed to the next step, where we calculate the values for npp_j as defined in the algorithm section, pmc_{pj}^k as defined in (3.14), and ipu_j^k as defined in (3.18). Note that although we move m_{6,3} = 3 units of capacity of process p = 6 to facility j = 3, some units of capacity of process six were already available at facility three, so all the values of npp_j as defined in Table 2 are the same. We update the other two values for the affected facility j = 3: pmc_{6,3}^1 = 480 + 3 · 120 = 840, and

ipu_3^1 = ((120−0)/120 + (840−130)/840 + (240−240)/240 + (18252−17002)/18252 + (9600−6384)/9600 + (160000−50000)/160000) / 6 = 0.7680378.
Table 10
This table shows the output from the first and the second executions of the FCM: the actual load Σ_{j∈S,q∈Q} RL_{ip} v_{ipqj} of manufacturing requirement i ∈ C on process p ∈ P.
p∈P
i∈C 1 2 3 4 5 6 7 8 9 10 11 12 13
1 - - - 149 - - - - - - - - -
2 - - - - - 13.1 - - - - - - -
3 - - - - - - - - - - - - 72,000.0
4 52.1 - - - - - - - - - - - -
5 - - - 181 49.2 - - - - - - - -
6 - - - - - 456 - - - - - - -
7 - - - - - - - - - 125 - - -
8 - - - - - - - - - - 5300 - -
9 - - - - - - - - - - - 6300 -
10 - - - - - - - - - - - - 8,174.8
Table 11
This table shows the output from the first execution of the FUM: the actual load Σ_{j∈S,q∈Q} RL_{ip} v_{ipqj} of manufacturing requirement i ∈ C on process p ∈ P.
p∈P
i∈C 1 2 3 4 5 6 7 8 9 10 11 12 13
1 - 17.9 - 131.1 - - - - - - - - -
2 - - - - - 13.1 - - - - - - -
3 - - - - - - - - - - - - 72000
4 52.1 - - - - - - - - - - - -
5 - - - 180.2 50 - - - - - - - -
6 - - - - - 456 - - - - - - -
7 - - - - - - - - - 125 - - -
8 - - - - - - - - - - 5300 - -
9 - - - - - - - - - - - 6300 -
10 - - - - - - - - - - - - 8174.8
Table 12
This table shows the utilization ranges q̄_j for all facilities, as calculated by the first execution of the FUM (for Ford-utilized facilities 1 . . . 3) and enhanced by the post-processing logic of Section 5.2.2 (for non-Ford-utilized facilities 4 . . . 7). The corresponding cost multipliers QC_{q̄_j j}^k are then passed to the second execution of the FCM.

j ∈ S:          1     2     3      4     5    6    7
q̄_j:            8     7     7      5     5    8    9
QC_{q̄_j j}^k:   1.01  1.08  1.0201 1.061 1.15 1.05 1
A BENCHMARK LIBRARY OF MIXED-INTEGER
OPTIMAL CONTROL PROBLEMS
SEBASTIAN SAGER∗
Heidelberg, Germany.
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 631
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_22,
© Springer Science+Business Media, LLC 2012
been set up. The homepage of PROPT (a Matlab toolkit for dynamic optimization using collocation) lists over 100 test cases from different applications, with their results and computation times [32]. The software package dsoa [20] currently comes with 77 test problems. The ESA provides a test set of global optimization spacecraft trajectory problems and their best putative solutions [3].
This is a good starting point. However, no standard has evolved yet, as in the case of finite-dimensional optimization. Specific formats for which only a few optimization / optimal control codes have an interface, insufficient information on the modeling assumptions, and missing initial values, parameters, or a concise definition of all constraints make a transfer to different solvers and environments very cumbersome. The same is true for hybrid systems, which incorporate MIOCPs as defined in this paper as a special case. Two benchmark problems have been defined in [19].
Although a general open library would be highly desirable for optimal control problems, we restrict ourselves here to the case of MIOCPs, in which some or all of the control values and functions need to take values from a finite set. MIOCPs include OCPs as a special case and are hence more general; however, the focus of this library is on integer aspects. We want to be general in our formulation without becoming too abstract. It allows us to incorporate ordinary and partial differential equations, as well as algebraic constraints. Most hybrid systems can be formulated by means of state-dependent switches. Closed-loop control problems are on a different level, because a unique and comparable scenario would have to include well-defined external disturbances. We try to leave our approach open to future extensions to nonlinear model predictive control (NMPC) problems, but do not incorporate them yet. The formulation allows for different kinds of objective functions, e.g., time-minimal or of tracking type, and of boundary constraints, e.g., periodicity constraints. Abstract problem formulations, together with a proposed categorization of problems according to model, objective, and solution characteristics, are given in Section 2.
MIOCPs include features related to different mathematical disciplines. Hence, it is not surprising that very different approaches have been proposed to analyze and solve them. There are three generic approaches to solve model-based optimal control problems, compare [8]: first, solution of the Hamilton-Jacobi-Bellman equation and, in a discrete setting, Dynamic Programming; second, indirect methods, also known as the first optimize, then discretize approach; and third, direct methods (first discretize, then optimize), in particular all-at-once approaches that solve the simulation and the optimization task simultaneously. The combination with the additional combinatorial restrictions on control functions comes in at different levels: for free in dynamic programming, as the control space is evaluated anyhow, by means of an enumeration in the inner optimization problem of
and u as controls, not the PDE formulation with x as independent variable and u as
differential states.
for t ∈ [t0, tf] almost everywhere. We will often drop the argument (t) for notational convenience.
2.1.3. PDE model. If d > 1 the model equation (2.2) becomes a par-
tial differential equation (PDE). Depending on whether convection or dif-
fusion prevails, a further classification into hyperbolic, elliptic, or parabolic
equations is necessary. A more elaborate classification will evolve as more
PDE constrained MIOCPs are described on http://mintoc.de. In this
work one PDE-based instance is presented in Section 11.
2.1.4. Outer convexification. For time-dependent and space-independent integer controls, another formulation is often beneficial, e.g., [37]. For every element v^i of Ω a binary control function ω_i(·) is introduced. Equation (2.2) can then be written as

0 = Σ_{i=1}^{n_ω} F[x, u, v^i] ω_i(t),   t ∈ [0, tf].   (2.4)

Σ_{i=1}^{n_ω} ω_i(t) = 1,   t ∈ [0, tf],   (2.5)
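A small sketch of how the outer convexification (2.4)-(2.5) is evaluated numerically; the toy dynamics F and all names below are illustrative, not part of the library:

```python
# Outer convexification sketch: replace an integer control v(t) in Omega
# by binary weights omega_i(t), one per admissible value v^i.
# The toy dynamics F(x, u, v) = -v * x + u are illustrative only.

def F(x, u, v):
    return -v * x + u

def convexified_rhs(x, u, omega, Omega):
    """Evaluate sum_i F(x, u, v_i) * omega_i, i.e., equation (2.4)."""
    assert abs(sum(omega) - 1.0) < 1e-12  # SOS1 condition (2.5)
    return sum(F(x, u, v_i) * w_i for v_i, w_i in zip(Omega, omega))
```

For a binary weight vector (an integer-feasible control), the convexified right-hand side coincides with the original one; relaxed weights yield a convex combination of the model evaluations.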
with well-defined switching functions σ_i(·) for t ∈ [0, tf]. This characteristic applies to the control problems in Sections 6 and 8.
2.1.6. Boolean variables. Discrete switching events can also be expressed by means of Boolean variables and logical implications. E.g., by introducing logical functions δ_i : [0, tf] → {true, false} that indicate whether a model formulation F_i[x, u, v] is active at time t, both state-dependent switches and outer convexification formulations may be written as disjunctive programs, i.e., optimization problems involving Boolean variables and logical conditions. Using disjunctive programs can be seen as a more natural way of modeling discrete events, and it has the main advantage of resulting in tighter relaxations of the discrete decisions when compared to integer programming techniques. More details can be found in [29, 46, 47].
on a time grid {t_i}_i. With smooth transfer functions, changes in the dimension of the optimization variables can also be incorporated [43].
min_{y1, y2} Φ(y1, y2)
s.t.  0 = F(y1, y2),
      0 ≤ C(y1, y2),                                            (2.10)
      0 ≤ (μ − y2)^T φ(y1, y2),  y2 ∈ Y(y1),  ∀ μ ∈ Y(y1),

where Y(y1) is the feasible region for the variational inequality and φ(·) a given function. Variational inequalities arise in many domains and are generally referred to as equilibrium constraints. The variables y1 and y2 may be controls or states.
2.2.6. Complementarity constraints. This category contains optimization problems with complementarity constraints (MPCCs), for generic variables / functions y1, y2, y3, in the form of

min_{y1, y2, y3} Φ(y1, y2, y3)

y_{1,i} = 0  OR  y_{2,i} = 0,  ∀ i = 1 . . . n_y.
min_y Φ(y)
control u(·) can be determined to compensate for the changes in x(·); naturally, α(·) needs to do so by taking values in the interior of its feasible domain. An illustrating example has been given in [58], where velocity limitations for the energy-optimal operation of New York subway trains are taken into account; see Section 6. The optimal integer solution only exists in the limit case of infinite switching (Zeno behavior), or when a tolerance is given. Another example is compressor control in supermarket refrigeration systems; see Section 8. Note that all applications may comprise path-constrained arcs once path constraints need to be added.
2.3.3. Sensitivity–seeking arcs. We define sensitivity–seeking (also
compromise–seeking) arcs in the sense of Srinivasan and Bonvin, [61], as
arcs which are neither bang–bang nor path–constrained and for which the
optimal control can be determined by time derivatives of the Hamiltonian.
For control–affine systems this implies so-called singular arcs.
A classical small-sized benchmark problem for a sensitivity-seeking
(singular) arc is the Lotka-Volterra Fishing problem; see Section 4. The treatment of sensitivity-seeking arcs is very similar to that of path-constrained arcs. As above, an approximation up to any a priori specified tolerance is possible, probably at the price of frequent switching.
2.3.4. Chattering arcs. Chattering controls are bang–bang controls
that switch infinitely often in a finite time interval [0, tf ]. An extensive an-
alytical investigation of this phenomenon can be found in [63]. An example
for a chattering arc solution is the famous example of Fuller, see Section 5.
2.3.5. Sliding mode. Solutions of model equations with state-dependent switches as in (2.6) may show a sliding mode behavior in the sense of Filippov systems [21]. This means that at least one of the functions σ_i(·) has infinitely many zeros on the finite time interval [0, tf]. In other words, the right hand side switches infinitely often in a finite time horizon.
The two examples with state-dependent switches in this paper in Sec-
tions 6 and 8 do not show sliding mode behavior.
3. F-8 flight control. The F-8 aircraft control problem is based on
a very simple aircraft model. The control problem was introduced by Kaya
and Noakes [36] and aims at controlling an aircraft in a time-optimal way
from an initial state to a terminal state. The mathematical equations form
a small-scale ODE model. The interior point equality conditions fix both
initial and terminal values of the differential states. The optimal relaxed control function shows bang-bang behavior. The problem is furthermore interesting, as it should be reformulated equivalently; despite the reformulation, the problem is nonconvex and exhibits multiple local minima.
3.1. Model and optimal control problem. The F-8 aircraft con-
trol problem is based on a very simple aircraft model in ordinary differential
equations, introduced by Garrard [24]. The differential states consist of x0
as the angle of attack in radians, x1 as the pitch angle, and x2 as the pitch
rate in rad/s. The only control function w = w(t) is the tail deflection angle
in radians. The control objective is to control the airplane from one point
in space to another in minimum time. For t ∈ [0, T ] almost everywhere the
mixed-integer optimal control problem is given by
min_{x,w,T} T
Table 1
Results for the F-8 flight control problem. The solution in the second-to-last column is a personal communication by Martin Schlüter and Matthias Gerdts.
The best known optimal objective value of this problem is given by T = 3.78086. The corresponding solution is shown in Figure 1 (right); another local minimum is plotted in Figure 1 (left). The solutions are of bang-bang type and switch three resp. five times, starting with w(t) = 1.
Fig. 1. Trajectories for the F-8 flight control problem. Left: corresponding to the
Sager[53] column in Table 1. Right: corresponding to the rightmost column in Table 1.
Fig. 2. Trajectories for the Lotka Volterra Fishing problem. Top left: optimal
relaxed solution on grid with 52 intervals. Top right: feasible integer solution. Bottom:
corresponding differential states, biomass of prey and of predator fish.
5.2. Results. The optimal trajectories for the relaxed control problem on an equidistant grid G^0 with n_ms = 20, 30, 60 are shown in the top
row of Figure 3. Note that this solution is not bang–bang due to the dis-
cretization of the control space. Even if this discretization is made very
fine, a trajectory with w(·) = 0.5 on an interval in the middle of [0, 1] will
be found as a minimum.
The application of MS MINTOC [54] yields an objective value of Φ = 1.52845 · 10^{−5}, which is better than the limit of the relaxed problems, Φ^{20} = 1.53203 · 10^{−5}, Φ^{30} = 1.53086 · 10^{−5}, and Φ^{60} = 1.52958 · 10^{−5}.
Fig. 3. Trajectories for Fuller’s problem. Top row and bottom left: relaxed optima
for 20, 30, and 60 equidistant control intervals. Bottom right: feasible integer solution.
Table 2
Parameters used for the subway MIOCP and its variants.
The braking deceleration u(·) can be varied between 0 and a given umax . It
can be shown that for problem (6.1) only maximal braking can be optimal,
hence we fixed u(·) to umax without loss of generality. Occurring forces are
R(x1) = ca γ^2 x1^2 + bW γ x1 + (1.3/2000) W + 116,        (6.9)

T(x1, 1) = Σ_{i=0}^{5} b_i(1) ((1/10) γ x1 − 0.3)^{−i},     (6.10)

T(x1, 2) = Σ_{i=0}^{5} b_i(2) ((1/10) γ x1 − 1)^{−i}.       (6.11)
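These force laws can be evaluated directly; in the sketch below, all numeric parameter values are placeholders, since Table 2 is not reproduced here:

```python
# Sketch: evaluating the train resistance (6.9) and tractive force (6.10)-(6.11).
# All numeric parameter values passed in are placeholders, NOT the values of Table 2.

def resistance(x1, ca, b, W, gamma):
    """R(x1) = ca*gamma^2*x1^2 + b*W*gamma*x1 + (1.3/2000)*W + 116."""
    return ca * gamma**2 * x1**2 + b * W * gamma * x1 + 1.3 / 2000 * W + 116

def tractive_force(x1, w, b_coef, gamma):
    """T(x1, w) = sum_{i=0}^{5} b_i(w) * (gamma*x1/10 - shift)^(-i),
    with shift = 0.3 for w = 1 and shift = 1 for w = 2."""
    shift = 0.3 if w == 1 else 1.0
    base = gamma * x1 / 10 - shift
    return sum(b_coef[i] * base ** (-i) for i in range(6))
```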
Parameters are listed in Table 2, while b_i(w) and c_i(w) are given by
Details about the derivation of this model and the assumptions made
can be found in [9] or in [38].
6.2. Results. The optimal trajectory for this problem has been cal-
culated by means of an indirect approach in [9, 38], and based on the
direct multiple shooting method in [58]. The resulting trajectory is listed
in Table 3.
Table 3
Optimal trajectory for the subway MIOCP as calculated in [9, 38, 58].
x1 ≤ v4 if x0 = S4 (6.12)
for a given distance 0 < S4 < S and velocity v4 > v3 . Note that the state
x0 (·) is strictly monotonically increasing with time, as ẋ0 = x1 > 0 for all
t ∈ (0, T ).
The optimal order of gears for S4 = 1200 and v4 = 22/γ with the ad-
ditional interior point constraints (6.12) is 1, 2, 1, 3, 4, 2, 1, 3, 4. The stage
lengths between switches are 2.86362, 10.722, 15.3108, 5.81821, 1.18383,
2.72451, 12.917, 5.47402, and 7.98594 with Φ = 1.3978. For different pa-
rameters S4 = 700 and v4 = 22/γ we obtain the gear choice 1, 2, 1, 3, 2, 1,
3, 4 and stage lengths 2.98084, 6.28428, 11.0714, 4.77575, 6.0483, 18.6081,
6.4893, and 8.74202 with Φ = 1.32518.
A more practical restriction is given by path constraints on subsets of the track. We will consider a problem with the additional path constraint

x1 ≤ v5 if x0 ≥ S5.        (6.13)
Fig. 4. The differential state velocity of a subway train over time. The dotted ver-
tical line indicates the beginning of the path constraint, the horizontal line the maximum
velocity. Left: one switch leading to one touch point. Right: optimal solution for three
switches. The energy-optimal solution needs to stay as close as possible to the maximum
velocity on this time interval to avoid even higher energy-intensive accelerations in the
start-up phase to match the terminal time constraint tf ≤ 65 to reach the next station.
The additional path constraint changes the qualitative behavior of the relaxed solution. While all solutions considered so far were bang-bang and the main work consisted in finding the switching points, we now have a path-constrained arc. The optimal solutions for refined grids yield a series of monotonically decreasing objective function values, where the limit is the best value that can be approximated by an integer feasible solution. In our case we obtain
Fig. 5. Trajectories for the calcium problem. Top left: optimal integer solution.
Top right: corresponding differential states with phase resetting. Bottom left: slightly
perturbed control: stimulus 0.001 too early. Bottom right: long time behavior of optimal
solution: numerical rounding errors lead to transition back from unstable steady-state
to stable limit-cycle.
Table 4
Parameters used for the supermarket refrigeration problem.
ẋ8 = ((Mrm − x8)/τfill) w1 − (UA_wrm (1 − w1))/(Mrm · Δh_lg(x0)) · x8 (x6 − Te(x0)),

x(0) = x(tf),
650 ≤ tf ≤ 750,
x0 ≤ 1.7,  2 ≤ x3 ≤ 5,  2 ≤ x7 ≤ 5,
w(t) ∈ {0, 1}^4,  t ∈ [0, tf].
The differential state x0 describes the suction pressure in the suction manifold (in bar). The next three states model temperatures in the first display case (in °C): x1 is the goods’ temperature, x2 that of the evaporator wall, and x3 the air temperature surrounding the goods. x4 then models the mass of the liquefied refrigerant in the evaporator (in kg). x5 to x8 describe the corresponding states in the second display case. w0 and w1 describe the inlet valves of the first two display cases, respectively. w2 and w3 denote the activity of a single compressor each.
The model uses the parameter values listed in Table 4 and the poly-
nomial functions obtained from interpolations:
Fig. 6. Periodic trajectories for optimal relaxed (left) and integer feasible controls
(right), with the controls w(·) in the first row and the differential states in the three
bottom rows.
8.3. Variants. Since the compressors are connected in parallel, one can introduce a single control w2 ∈ {0, 1, 2} instead of two equivalent controls. The same holds for scenarios with n compressors connected in parallel.
In [40], the problem was stated slightly differently:
• The temperature constraints were not hard bounds; instead, a penalization term was added to the objective function to minimize the violation of these constraints.
• The differential equation for the mass of the refrigerant had another switch for the case that the valve (e.g., w0) is closed. It was formulated as

ẋ4 = (Mrm − x4)/τfill                                if w0 = 1,
ẋ4 = −(UA_wrm/(Mrm · Δh_lg(x0))) x4 (x2 − Te(x0))    if w0 = 0 and x4 > 0,
ẋ4 = 0                                               if w0 = 0 and x4 = 0.

This additional switch is redundant, because the mass itself is a factor on the right hand side, and so the complete right hand side is 0 if x4 = 0.
• A night scenario with two different parameters was given. At night, the following parameters change their value to Q̇_airload = 1800.00 J/s
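The redundancy of the additional switch noted above can be checked directly, because the factor x4 already forces the closed-valve right-hand side to zero at x4 = 0. A sketch (all parameter values and the functions Te, Δh_lg are placeholders, not the values of Table 4):

```python
# Sketch: switched right-hand side for the refrigerant mass x4.
# M_rm, tau_fill, UA_wrm and the functions Te, dh_lg are placeholders,
# NOT the values of Table 4.

def mass_rhs(x4, x0, x2, w0, M_rm=1.0, tau_fill=40.0, UA_wrm=0.5,
             Te=lambda p: -10.0, dh_lg=lambda p: 200.0):
    if w0 == 1:                      # valve open: the evaporator fills
        return (M_rm - x4) / tau_fill
    # valve closed: refrigerant evaporates; the factor x4 makes the
    # extra "x4 = 0" branch of the variant formulation redundant
    return -UA_wrm / (M_rm * dh_lg(x0)) * x4 * (x2 - Te(x0))
```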
w_mot := (i_g^μ i_t / R) v(t),

F_Bf := (2/3) F_B,        F_Br := (1/3) F_B,

F_Rf(v) := f_R(v) (m l_r g)/(l_f + l_r),        F_Rr(v) := f_R(v) (m l_f g)/(l_f + l_r),

f_R(v) := 9 · 10^{−3} + 7.2 · 10^{−5} v + 5.038848 · 10^{−10} v^4,

F_Ax := (1/2) c_w ρ A v^2(t),        F_Ay := 0.
Table 5
Parameters used in the car model.
Pl(x) :=
    0                            if x ≤ 44,
    4 h2 (x − 44)^3              if 44 < x ≤ 44.5,
    4 h2 (x − 45)^3 + h2         if 44.5 < x ≤ 45,
    h2                           if 45 < x ≤ 70,         (9.2)
    4 h2 (70 − x)^3 + h2         if 70 < x ≤ 70.5,
    4 h2 (71 − x)^3              if 70.5 < x ≤ 71,
    0                            if 71 < x.

Pu(x) :=
    h1                               if x ≤ 15,
    4 (h3 − h1) (x − 15)^3 + h1      if 15 < x ≤ 15.5,
    4 (h3 − h1) (x − 16)^3 + h3      if 15.5 < x ≤ 16,
    h3                               if 16 < x ≤ 94,     (9.3)
    4 (h3 − h4) (94 − x)^3 + h3      if 94 < x ≤ 94.5,
    4 (h3 − h4) (95 − x)^3 + h4      if 94.5 < x ≤ 95,
    h4                               if 95 < x.
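The cubic segments blend the constant pieces together continuously. A sketch of Pl(x) in Python, with a placeholder value for h2 (the actual height is a model parameter not reproduced here):

```python
# Sketch: the lower track boundary Pl(x) of (9.2) as a piecewise cubic.
# h2 = 3.0 is a placeholder value, NOT the chapter's parameter.

def P_l(x, h2=3.0):
    if x <= 44:
        return 0.0
    if x <= 44.5:
        return 4 * h2 * (x - 44) ** 3
    if x <= 45:
        return 4 * h2 * (x - 45) ** 3 + h2
    if x <= 70:
        return h2
    if x <= 70.5:
        return 4 * h2 * (70 - x) ** 3 + h2
    if x <= 71:
        return 4 * h2 * (71 - x) ** 3
    return 0.0
```

Both branches meeting at x = 44.5 evaluate to h2/2, so the boundary rises smoothly from 0 to h2 over the interval [44, 45].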
9.2. Results. In [25, 26, 37], numerical results for the benchmark problem have been obtained. In [37] one can also find an explanation of why a bang-bang solution for the relaxed and convexified gear choices has to
be optimal. Table 6 gives the optimal gear choice and the resulting ob-
jective function value (the end time) for different numbers N of control
discretization intervals, which were also used for a discretization of the
path constraints.
Table 6
Gear choice depending on the discretization in time N, and the times at which each gear becomes active.
min_{tf, x(·), u(·)} tf

with η = arctan(c_y/c_x). Note that the special case c_x = 0, leading to η = ±π/2, requires separate handling.
The model in Section 9 has a shortcoming, as switching to a low gear is possible also at high velocities, although this would lead to an unphysically
(π n_eng^MIN R)/(30 i_t i_g^μ) ≤ v ≤ (π n_eng^MAX R)/(30 i_t i_g^μ)        (10.3)

for all t ∈ [0, tf] and the active gear μ. We write this as r_eng(v, μ) ≥ 0.
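A sketch of checking this engine-speed window for a candidate gear; all gear ratios and rpm limits below are assumed placeholder values, not those of the chapter:

```python
# Sketch: engine-speed feasibility r_eng(v, mu) >= 0 from (10.3).
# Gear ratios and rpm limits are assumed placeholders, NOT the values of Table 5.
import math

N_ENG_MIN, N_ENG_MAX = 1000.0, 6000.0           # engine speed limits in rpm (assumed)
I_T, R_WHEEL = 3.9, 0.30                         # final drive ratio, wheel radius (assumed)
GEAR_RATIOS = {1: 3.5, 2: 2.1, 3: 1.4, 4: 1.0}   # i_g^mu per gear (assumed)

def gear_feasible(v, mu):
    """True iff velocity v lies inside the admissible band of gear mu."""
    i_g = GEAR_RATIOS[mu]
    v_min = math.pi * N_ENG_MIN * R_WHEEL / (30.0 * I_T * i_g)
    v_max = math.pi * N_ENG_MAX * R_WHEEL / (30.0 * I_T * i_g)
    return v_min <= v <= v_max
```

Lower gears admit lower velocities, which is why switching into first gear at high speed violates the constraint.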
10.2. Results. Parts of the optimal trajectory from [57] are shown in
Figures 7 and 8. The order of gears is (2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2). The
gear switches take place after 1.87, 5.96, 10.11, 11.59, 12.21, 12.88, 15.82,
19.84, 23.99, 24.96, 26.10, and 26.76 seconds, respectively. The final time
is tf = 27.7372 s.
Fig. 7. The steering angle velocity (control), and some differential states of the
optimal solution: directional velocity, side slip angle β, and velocity of yaw angle wz
plotted over time. The vertical lines indicate gear shifts.
As can be seen in Fig. 8, the car uses the track width to its full extent,
leading to active path constraints. As was expected, the optimal gear
increases in an acceleration phase. When the velocity has to be reduced, a
combination of braking, no acceleration, and engine brake is used.
The result depends on the engine speed constraint reng (v, μ) that be-
comes active in the braking phase. If the constraint is omitted, the optimal
Fig. 8. Elliptic race track seen from above with optimal position and gear choices
of the car. Note the exploitation of the slip (sliding) to change the car’s orientation as
fast as possible, when in first gear. The gear order changes when a different maximum
engine speed is imposed.
solution switches directly from the fourth gear into the first one to maximize the effect of the engine brake. For n_eng^MAX = 15000, braking occurs in the gear order 4, 2, 1.
Although this was left as a degree of freedom, the optimizer yields a
symmetric solution with respect to the upper and lower parts of the track
for all scenarios we considered.
10.3. Variants. A more flexible use of Bézier patches allows more general
track constraints to be specified, e.g., those of Formula 1 race courses.
11. Simulated moving bed. We consider a simplified model of a
Simulated Moving Bed (SMB) chromatographic separation process that
contains time–dependent discrete decisions. SMB processes have been gaining
increased attention lately; see [17, 34, 56] for further references. The
related optimization problems are challenging from a mathematical point of
view, as they combine periodic nonlinear optimal control problems in par-
tial differential equations (PDE) with time–dependent discrete decisions.
11.1. Model and optimal control problem. SMB chromatography
finds industrial application in, for example, the sugar, food, petrochemical,
and pharmaceutical industries. An SMB unit consists of multiple columns
filled with solid adsorbent. The columns are connected in a continuous
cycle. There are two inlet streams, desorbent (De) and feed (Fe), and two
outlet streams, raffinate (Ra) and extract (Ex). The continuous counter-
current operation is simulated by switching the four streams periodically
in the direction of the liquid flow in the columns, thereby leading to better
separation. This is visualized in Figure 9.
[Fig. 9: Schematic of the SMB unit: columns 1–6 connected in a cycle, with inlet streams (feed, desorbent) and outlet streams (extract, raffinate); the port positions are shifted periodically in the direction of the liquid flow.]
The flow rates Q_1, Q_{De}, Q_{Ex}, and Q_{Fe} enter the optimization problem
as control functions u(·) or as time-invariant parameters p, depending on
the operating scheme to be optimized. The remaining flow rates are derived
by mass balance as
    Q_{Ra} = Q_{De} − Q_{Ex} + Q_{Fe},                                        (11.2)

    Q_i = Q_{i−1} − ∑_{α∈{Ra,Ex}} w_{iα} Q_α + ∑_{α∈{De,Fe}} w_{iα} Q_α.      (11.3)
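A minimal sketch of this mass balance, using a hypothetical 0/1 port assignment w_{iα} (not one of the benchmark scenarios) for a six-column unit:

```python
def internal_flow_rates(Q1, Q_De, Q_Ex, Q_Fe, w, n_cols=6):
    """Propagate Q_i = Q_{i-1} - sum_{Ra,Ex} w*Q + sum_{De,Fe} w*Q (11.3),
    with Q_Ra fixed by the overall balance (11.2)."""
    Q_Ra = Q_De - Q_Ex + Q_Fe                      # (11.2)
    Q_port = {"De": Q_De, "Ex": Q_Ex, "Fe": Q_Fe, "Ra": Q_Ra}
    Q = [Q1]                                       # Q_0 given (after the De port)
    for i in range(1, n_cols):
        q = (Q[i - 1]
             - sum(w[i].get(a, 0) * Q_port[a] for a in ("Ra", "Ex"))
             + sum(w[i].get(a, 0) * Q_port[a] for a in ("De", "Fe")))
        Q.append(q)
    return Q, Q_Ra

# Hypothetical assignment: Ex after column 0, Fe before column 3,
# Ra after column 3; De sits at the inlet of column 0 (implicit in Q1).
w = {1: {"Ex": 1}, 2: {}, 3: {"Fe": 1}, 4: {"Ra": 1}, 5: {}}
Q, Q_Ra = internal_flow_rates(2.0, 1.0, 0.6, 0.4, w)
```

Because of (11.2), the net in- and outflows cancel over one full cycle, so adding the desorbent stream to the last internal flow rate recovers Q_1 again.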
The equilibrium between the liquid and solid phases is given by a linear
isotherm,

    q_i^K(x, t) = C_K c_i^K(x, t),                                            (11.5)

and the convective transport by

    ∂c_i^K(x, t)/∂t = −(u_i(t)/K̄_K) ∂c_i^K(x, t)/∂x,                         (11.6)

where K̄_K = ε_b + (1 − ε_b) C_K. Dividing the column into N_FEX
compartments and applying a simple backward difference with Δx = L/N_FEX
leads to

    dc_{i,j}^K/dt = (u_i(t) N_FEX)/(K̄_K L) [c_{i,j−1}^K(t) − c_{i,j}^K(t)]
                  = k^K [c_{i,j−1}^K(t) − c_{i,j}^K(t)].                      (11.7)
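The backward-difference compartment model (11.7) can be sketched with an explicit Euler time stepper for a single column and component; the column length, porosity term K̄_K, and inlet concentration below are illustrative values only:

```python
def column_step(c, u, Nfex, Kbar, L, c_in, dt):
    """One explicit Euler step of (11.7) for one column and one component:
    dc_j/dt = (u*Nfex/(Kbar*L)) * (c_{j-1} - c_j), with c_0 := inlet conc."""
    k = u * Nfex / (Kbar * L)
    new = list(c)
    for j in range(Nfex):
        upstream = c_in if j == 0 else c[j - 1]
        new[j] = c[j] + dt * k * (upstream - c[j])
    return new

# Illustrative numbers: an initially empty column fed at concentration 1.
c = [0.0] * 10
for _ in range(2000):
    c = column_step(c, u=1.0, Nfex=10, Kbar=2.0, L=1.0, c_in=1.0, dt=0.01)
```

With dt·k < 1 each compartment value is a convex combination of itself and its upstream neighbor, so the profile fills monotonically toward the inlet concentration without overshoot.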
    ċ_j^K / k^K = Q_{i−} c_{j−}^K − Q_i c_j^K
                  − ∑_{α∈{Ra,Ex}} w_{iα} Q_α c_{j−}^K + ∑_{α∈{De,Fe}} w_{iα} Q_α C_α^K,   (11.8)

    ċ_j^K / k^K = Q_{i−} c_{j−}^K − Q_i c_j^K.                                (11.9)
x ↦ P x := (P_A x_A, P_B x_B, P_M x_M) with

    P_A x_A := (c_{N_dis+1}^A, …, c_N^A, c_1^A, …, c_{N_dis}^A),
    P_B x_B := (c_{N_dis+1}^B, …, c_N^B, c_1^B, …, c_{N_dis}^B),
    P_M x_M := (0, 0, 0, 0, 0).
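The permutations P_A and P_B are plain cyclic shifts of the concentration profile by N_dis compartments; a minimal sketch:

```python
def shift_profile(c, n_shift):
    """Cyclic shift of a concentration profile by n_shift compartments,
    mirroring P_A / P_B: (c_{Ndis+1}, ..., c_N, c_1, ..., c_{Ndis})."""
    return c[n_shift:] + c[:n_shift]

profile = [1, 2, 3, 4, 5, 6]      # toy profile over N = 6 compartments
shifted = shift_profile(profile, 2)   # N_dis = 2
```

Applying the shift N/N_dis times (here three times) returns the original profile, which is the cyclic structure exploited by the periodic port switching.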
    M_{Ex}^A(T) ≤ ((1 − p_{Ex}) / p_{Ex}) M_{Ex}^B(T),                        (11.13)

    M_{Ra}^B(T) ≤ ((1 − p_{Ra}) / p_{Ra}) M_{Ra}^A(T).                        (11.14)
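Constraints (11.13) and (11.14) are linearized purity requirements: assuming, as the inequalities suggest, that B is the component to be recovered at the extract port, requiring the purity M_{Ex}^B/(M_{Ex}^A + M_{Ex}^B) ≥ p_{Ex} is algebraically equivalent to (11.13). A small sketch checks this equivalence numerically:

```python
def extract_purity_ok(M_A_Ex, M_B_Ex, p_Ex):
    """Constraint (11.13): M_A_Ex <= (1 - p_Ex)/p_Ex * M_B_Ex."""
    return M_A_Ex <= (1.0 - p_Ex) / p_Ex * M_B_Ex

def purity(M_undesired, M_desired):
    """Fraction of the desired component in the collected product."""
    return M_desired / (M_desired + M_undesired)
```

The linearized form avoids the fraction M_B/(M_A + M_B) in the constraint, which keeps the feasible set description free of a potentially ill-conditioned denominator.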
We impose lower and upper bounds on all external and internal flow rates;
these bounds have to hold for all i ∈ I. The objective is to maximize the
feed throughput M_{Fe}(T)/T. Summarizing, we obtain the following MIOCP
Table 7
Fixed or optimized port assignment wiα and switching times of the process strategies.
Process Time 1 2 3 4 5 6
SMB fix 0.00 – 0.63 De Ex Fe Ra
SMB relaxed 0.00 – 0.50 De,Ex Ex Fe Ra
PowerFeed 0.00 – 0.56 De Ex Fe Ra
VARICOL 0.00 – 0.18 De Ex Fe Ra
0.18 – 0.36 De Ex Fe Ra
0.36 – 0.46 De,Ra Ex Fe
0.46 – 0.53 De,Ra Ex Fe
Superstruct 0.00 – 0.10 Ex De
0.10 – 0.18 De,Ex
0.18 – 0.24 De Ra
0.24 – 0.49 De Ex Fe Ra
0.49 – 0.49 De,Ex
Listing 2
Generic settings AMPL data file to be included
if (fix_w > 0) then { for {i in U} { fix w[i]; } }
if (fix_dt > 0) then { for {i in U} { fix dt[i]; } }

# Set indices of controls corresponding to time points
for {i in 0..nu-1} {
    for {j in 0..ntperu-1} { let uidx[i*ntperu+j] := i; }
}
let uidx[nt] := nu-1;

minimize Deviation:
    0.5*(dt[0]/ntperu) * ((x[0,1] - ref1)^2 + (x[0,2] - ref2)^2)
  + 0.5*(dt[nu-1]/ntperu) * ((x[nt,1] - ref1)^2 + (x[nt,2] - ref2)^2)
  + sum {i in I diff {0,nt}} ( (dt[uidx[i]]/ntperu) *
        ((x[i,1] - ref1)^2 + (x[i,2] - ref2)^2) );

subj to overall_stage_length:
    sum {i in U} dt[i] = T;
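The Deviation objective above is a rectangle-rule quadrature of the tracking integral, with half weights on the two endpoint grid points; for uniform stage lengths dt[i] = T/nu the weights sum exactly to T. A Python sketch mirroring (not reproducing) the AMPL listing:

```python
def deviation_objective(x, dt, ntperu, ref):
    """Discretized Lagrange term as in Listing 2: half-weighted endpoints,
    weight dt[uidx[i]]/ntperu on interior grid points."""
    nt = len(x) - 1
    nu = len(dt)
    # uidx maps grid point i to its control stage, with uidx[nt] = nu - 1.
    uidx = [min(i // ntperu, nu - 1) for i in range(nt + 1)]

    def dev(xi):
        return (xi[0] - ref[0])**2 + (xi[1] - ref[1])**2

    obj = 0.5 * (dt[0] / ntperu) * dev(x[0])
    obj += 0.5 * (dt[nu - 1] / ntperu) * dev(x[nt])
    obj += sum((dt[uidx[i]] / ntperu) * dev(x[i]) for i in range(1, nt))
    return obj

# A constant deviation of 1 per component integrates to 2*T over [0, T].
T, nu, ntperu = 12.0, 4, 5
nt = nu * ntperu
x = [(2.0, 0.0)] * (nt + 1)        # (x1-1)^2 + (x2-1)^2 = 2 everywhere
dt = [T / nu] * nu
val = deviation_objective(x, dt, ntperu, ref=(1.0, 1.0))
```

The endpoint half-weights make the nt + 1 quadrature weights total exactly T, so a constant integrand is integrated without discretization error.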
Listing 4
AMPL dat file for Lotka Volterra Fishing Problem
# Algorithmic parameters
param ntperu := 100;  param nu := 100;  param nt := 10000;
param nx := 2;  param fix_w := 0;  param fix_dt := 1;

# Problem parameters
param T := 12.0;  param c1 := 0.4;  param c2 := 0.2;
param ref1 := 1.0;  param ref2 := 1.0;

# Initial values of the differential states
let x[0,1] := 0.5;  let x[0,2] := 0.7;
fix x[0,1];  fix x[0,2];
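The corresponding model file is not reproduced in this excerpt; in the usual formulation of this benchmark, the Lotka Volterra fishing dynamics are ẋ1 = x1 − x1 x2 − c1 x1 w and ẋ2 = −x2 + x1 x2 − c2 x2 w. A forward-Euler sketch using the parameters of Listing 4 (assumed dynamics, constant fishing w ≡ 1):

```python
def lv_step(x1, x2, w, c1, c2, dt):
    """One explicit Euler step of the Lotka-Volterra fishing dynamics
    (standard formulation of this benchmark; assumed, not shown above)."""
    dx1 = x1 - x1 * x2 - c1 * x1 * w
    dx2 = -x2 + x1 * x2 - c2 * x2 * w
    return x1 + dt * dx1, x2 + dt * dx2

x1, x2 = 0.5, 0.7                  # initial values from Listing 4
c1, c2, T, n = 0.4, 0.2, 12.0, 120000
dt = T / n
for _ in range(n):
    x1, x2 = lv_step(x1, x2, 1.0, c1, c2, dt)   # constant fishing w = 1
```

The multiplicative form of the fishing terms keeps both populations positive for any admissible control, which the simulation reflects.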
minimize Deviation: sum {i in 1..3} x[nt,i]*x[nt,i];
    - 4.208*x[i,1] - 0.396*x[i,3] - 0.47*x[i,1]*x[i,1]
    - 3.564*x[i,1]*x[i,1]*x[i,1]
    + 20.967*xi - 6.265*x[i,1]*x[i,1]*xi + 46*x[i,1]*xi^2 - 61.4*xi^3
    - 2*w[uidx[i]]*(20.967*xi - 6.265*x[i,1]*x[i,1]*xi - 61.4*xi^3) );
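The term -2*w[uidx[i]]*(...) implements the binary switch between the two bang-bang control values u = ±ξ: for w = 0 the expression equals the control-dependent terms evaluated at u = +ξ, and for w = 1 at u = −ξ (only the odd-in-u terms appear inside the bracket). A small sketch verifies this algebra, with coefficients taken from the fragment above:

```python
def h(x1, u):
    """Control-dependent part of the third F-8 state equation
    (coefficients as in the listing fragment above)."""
    return 20.967*u - 6.265*x1*x1*u + 46*x1*u*u - 61.4*u**3

def convexified(x1, xi, w):
    """Expression from the listing: terms at u = +xi, minus
    2*w times the odd-in-u terms, for binary w."""
    return (20.967*xi - 6.265*x1*x1*xi + 46*x1*xi**2 - 61.4*xi**3
            - 2*w*(20.967*xi - 6.265*x1*x1*xi - 61.4*xi**3))

xi = 0.05236   # param xi from Listing 6
```

Flipping w from 0 to 1 negates exactly the odd terms, so the even term 46·x1·ξ² is shared by both control values and stays outside the bracket.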
Listing 6
AMPL dat file for F-8 Flight Control Problem
# Parameters
param ntperu := 500;  param nu := 60;  param nt := 30000;
param nx := 3;  param fix_w := 0;  param fix_dt := 1;
param xi := 0.05236;  param T := 8;
Fig. 10. Trajectories for the discretized F-8 flight control problem. Left: optimal
integer control. Right: corresponding differential states.
REFERENCES
[13] M.R. Bussieck, A.S. Drud, and A. Meeraus, MINLPLib: A collection of test
models for mixed-integer nonlinear programming, INFORMS Journal on Computing,
15 (2003), pp. 114–119.
[14] B. Chachuat, A. Singer, and P. Barton, Global methods for dynamic opti-
mization and mixed-integer dynamic optimization, Industrial and Engineering
Chemistry Research, 45 (2006), pp. 8373–8392.
[15] CMU-IBM, Cyber-infrastructure for MINLP collaborative site. http://minlp.org.
[16] M. Diehl and A. Walther, A test problem for periodic optimal control algorithms,
tech. rep., ESAT/SISTA, K.U. Leuven, 2006.
[17] S. Engell and A. Toumi, Optimisation and control of chromatography, Comput-
ers and Chemical Engineering, 29 (2005), pp. 1243–1252.
[18] W. Esposito and C. Floudas, Deterministic global optimization in optimal con-
trol problems, Journal of Global Optimization, 17 (2000), pp. 97–126.
[19] European Network of Excellence Hybrid Control, Website.
http://www.ist-hycon.org/.
[20] B.C. Fabien, dsoa: Dynamic system optimization.
http://abs-5.me.washington.edu/noc/dsoa.html.
[21] A. Filippov, Differential equations with discontinuous right hand side, AMS
Transl., 42 (1964), pp. 199–231.
[22] A. Fügenschuh, M. Herty, A. Klar, and A. Martin, Combinatorial and contin-
uous models for the optimization of traffic flows on networks, SIAM Journal
on Optimization, 16 (2006), pp. 1155–1176.
[23] A. Fuller, Study of an optimum nonlinear control system, Journal of Electronics
and Control, 15 (1963), pp. 63–71.
[24] W. Garrard and J. Jordan, Design of nonlinear automatic control systems,
Automatica, 13 (1977), pp. 497–505.
[25] M. Gerdts, Solving mixed-integer optimal control problems by Branch&Bound:
A case study from automobile test-driving with gear shift, Optimal Control
Applications and Methods, 26 (2005), pp. 1–18.
[26] M. Gerdts, A variable time transformation method for mixed-integer optimal control
problems, Optimal Control Applications and Methods, 27 (2006), pp. 169–182.
[27] S. Göttlich, M. Herty, C. Kirchner, and A. Klar, Optimal control for con-
tinuous supply network models, Networks and Heterogeneous Media, 1 (2007),
pp. 675–688.
[28] N. Gould, D. Orban, and P. Toint, CUTEr testing environment for optimiza-
tion and linear algebra solvers. http://cuter.rl.ac.uk/cuter-www/.
[29] I. Grossmann, Review of nonlinear mixed-integer and disjunctive programming
techniques, Optimization and Engineering, 3 (2002), pp. 227–252.
[30] I. Grossmann, P. Aguirre, and M. Barttfeld, Optimal synthesis of complex
distillation columns using rigorous models, Computers and Chemical Engi-
neering, 29 (2005), pp. 1203–1215.
[31] M. Gugat, M. Herty, A. Klar, and G. Leugering, Optimal control for traffic
flow networks, Journal of Optimization Theory and Applications, 126 (2005),
pp. 589–616.
[32] Tomlab Optimization Inc., PROPT: Matlab optimal control software (DAE, ODE).
http://tomdyn.com/.
[33] A. Izmailov and M. Solodov, Mathematical programs with vanishing constraints:
Optimality conditions, sensitivity, and a relaxation method, Journal of Opti-
mization Theory and Applications, 142 (2009), pp. 501–532.
[34] Y. Kawajiri and L. Biegler, A nonlinear programming superstructure for opti-
mal dynamic operations of simulated moving bed processes, I&EC Research,
45 (2006), pp. 8503–8513.
[35] Y. Kawajiri and L. Biegler, Optimization strategies for Simulated Moving Bed
and PowerFeed processes, AIChE Journal, 52 (2006), pp. 1343–1350.
[36] C. Kaya and J. Noakes, A computational method for time-optimal control, Jour-
nal of Optimization Theory and Applications, 117 (2003), pp. 69–92.
IMA HOT TOPICS WORKSHOP PARTICIPANTS
J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3,
© Springer Science+Business Media, LLC 2012
IMA SUMMER PROGRAMS 675
1987 Robotics
1988 Signal Processing
1989 Robust Statistics and Diagnostics
1990 Radar and Sonar (June 18–29)
New Directions in Time Series Analysis (July 2–27)
1991 Semiconductors
1992 Environmental Studies: Mathematical, Computational, and
Statistical Analysis
1993 Modeling, Mesh Generation, and Adaptive Numerical Methods
for Partial Differential Equations
1994 Molecular Biology
676 IMA "HOT TOPICS/SPECIAL" WORKSHOPS
678 SPRINGER LECTURE NOTES FROM THE IMA
IMA VOLUMES 679
Volume 20: Coding Theory and Design Theory Part I: Coding Theory
Editor: Dijen Ray-Chaudhuri
Volume 21: Coding Theory and Design Theory Part II: Design Theory
Editor: Dijen Ray-Chaudhuri