
The IMA Volumes in Mathematics
and its Applications
Volume 154

For further volumes:
http://www.springer.com/series/811

Institute for Mathematics and
its Applications (IMA)
The Institute for Mathematics and its Applications was estab-
lished by a grant from the National Science Foundation to the University
of Minnesota in 1982. The primary mission of the IMA is to foster research
of a truly interdisciplinary nature, establishing links between mathematics
of the highest caliber and important scientific and technological problems
from other disciplines and industries. To this end, the IMA organizes a wide
variety of programs, ranging from short intense workshops in areas of ex-
ceptional interest and opportunity to extensive thematic programs lasting
a year. IMA Volumes are used to communicate results of these programs
that we believe are of particular value to the broader scientific community.
The full list of IMA books can be found at the Web site of the Institute
for Mathematics and its Applications:
http://www.ima.umn.edu/springer/volumes.html.
Presentation materials from the IMA talks are available at
http://www.ima.umn.edu/talks/.
The video library is at
http://www.ima.umn.edu/videos/.

Fadil Santosa, Director of the IMA

**********

IMA ANNUAL PROGRAMS

1982–1983 Statistical and Continuum Approaches to Phase Transition
1983–1984 Mathematical Models for the Economics of Decentralized
Resource Allocation
1984–1985 Continuum Physics and Partial Differential Equations
1985–1986 Stochastic Differential Equations and Their Applications
1986–1987 Scientific Computation
1987–1988 Applied Combinatorics
1988–1989 Nonlinear Waves
1989–1990 Dynamical Systems and Their Applications
1990–1991 Phase Transitions and Free Boundaries
1991–1992 Applied Linear Algebra

Continued at the back

Jon Lee • Sven Leyffer
Editors

Mixed Integer Nonlinear Programming

Editors

Jon Lee
Industrial and Operations Engineering
University of Michigan
1205 Beal Avenue
Ann Arbor, Michigan 48109
USA

Sven Leyffer
Mathematics and Computer Science
Argonne National Laboratory
Argonne, Illinois 60439
USA

ISSN 0940-6573
ISBN 978-1-4614-1926-6 e-ISBN 978-1-4614-1927-3
DOI 10.1007/978-1-4614-1927-3
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011942482
Mathematics Subject Classification (2010): 05C25, 20B25, 49J15, 49M15, 49M37, 49N90, 65K05,
90C10, 90C11, 90C22, 90C25, 90C26, 90C27, 90C30, 90C35, 90C51, 90C55, 90C57, 90C60,
90C90, 93C95

© Springer Science+Business Media, LLC 2012


All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

FOREWORD

This IMA Volume in Mathematics and its Applications

MIXED INTEGER NONLINEAR PROGRAMMING

contains expository and research papers based on a highly successful IMA
Hot Topics Workshop “Mixed-Integer Nonlinear Optimization: Algorith-
mic Advances and Applications”. We are grateful to all the participants
for making this occasion a very productive and stimulating one.
We would like to thank Jon Lee (Industrial and Operations Engineer-
ing, University of Michigan) and Sven Leyffer (Mathematics and Computer
Science, Argonne National Laboratory) for their superb role as program or-
ganizers and editors of this volume.
We take this opportunity to thank the National Science Foundation
for its support of the IMA.

Series Editors
Fadil Santosa, Director of the IMA
Markus Keel, Deputy Director of the IMA

PREFACE

Many engineering, operations, and scientific applications include a mixture
of discrete and continuous decision variables and nonlinear relationships
involving the decision variables that have a pronounced effect on the set
of feasible and optimal solutions. Mixed-integer nonlinear programming
(MINLP) problems combine the numerical difficulties of handling nonlin-
ear functions with the challenge of optimizing in the context of nonconvex
functions and discrete variables. MINLP is one of the most flexible model-
ing paradigms available for optimization; but because its scope is so broad,
in the most general cases it is hopelessly intractable. Nonetheless, an ex-
panding body of researchers and practitioners — including chemical en-
gineers, operations researchers, industrial engineers, mechanical engineers,
economists, statisticians, computer scientists, operations managers, and
mathematical programmers — are interested in solving large-scale MINLP
instances.
Of course, the wealth of applications that can be accurately mod-
eled by using MINLP is not yet matched by the capability of available
optimization solvers. Yet, the two key components of MINLP — mixed-
integer linear programming (MILP) and nonlinear programming (NLP) —
have experienced tremendous progress over the past 15 years. By cleverly
incorporating many theoretical advances in MILP research, powerful aca-
demic, open-source, and commercial solvers have paved the way for MILP
to emerge as a viable, widely used decision-making tool. Similarly, new
paradigms and better theoretical understanding have created faster and
more reliable NLP solvers that work well, even under adverse conditions
such as failure of constraint qualifications.
In the fall of 2008, a Hot-Topics Workshop on MINLP was organized
at the IMA, with the goal of synthesizing these advances and inspiring new
ideas in order to transform MINLP. The workshop attracted more than 75
attendees and featured over 20 talks and over 20 posters. The present volume collects
22 invited articles, organized into nine sections on the diverse aspects of
MINLP. The volume includes survey articles, new research material, and
novel applications of MINLP.
In its most general and abstract form, a MINLP can be expressed as

minimize_x f (x)  subject to  x ∈ F,                        (1)

where f : Rn → R is a function and the feasible set F contains both non-
linear and discrete structure. We note that we do not generally assume
smoothness of f or convexity of the functions involved. Different realiza-
tions of the objective function f and the feasible set F give rise to key
classes of MINLPs addressed by papers in this collection.


Part I. Convex MINLP. Even though mixed-integer optimization prob-
lems are nonconvex as a result of the presence of discrete variables, the
term convex MINLP is commonly used to refer to a class of MINLPs for
which a convex program results when any explicit restrictions of discrete-
ness on variables are relaxed (i.e., removed). In its simplest definition, for
a convex MINLP, we may assume that the objective function f in (1) is a
convex function and that the feasible set F is described by a set of convex
nonlinear functions, c : Rn → Rm , and a set of indices, I ⊂ {1, . . . , n}, of
integer variables:

F = {x ∈ Rn | c(x) ≤ 0, and xi ∈ Z, ∀i ∈ I}. (2)

Typically, we also demand some smoothness of the functions involved.
Sometimes it is useful to expand the definition of convex MINLP to sim-
ply require that the functions be convex on the feasible region. Besides
problems that can be directly modeled as convex MINLPs, the subject has
relevance to methods that create convex MINLP subproblems.
Algorithms and software for convex mixed-integer nonlinear programs
(P. Bonami, M. Kilinç, and J. Linderoth) discusses the state of the art for
algorithms and software aimed at convex MINLPs. Important elements of
successful methods include a tree search (to handle the discrete variables),
NLP subproblems to tighten linearizations, and MILP master problems to
collect and exploit the linearizations.
A special type of convex constraint is a second-order cone constraint:
‖y‖2 ≤ z, where y is a vector variable and z is a scalar variable. Subgradient-
based outer approximation for mixed-integer second-order cone program-
ming (S. Drewes and S. Ulbrich) demonstrates how such constraints can be
handled by using outer-approximation techniques. A main difficulty, which
the authors address using subgradients, is that at the point (y, z) = (0, 0),
the function ‖y‖2 is not differentiable.
Many convex MINLPs have “off/on” decisions that force a continuous
variable either to be 0 or to be in a convex set. Perspective reformula-
tion and applications (O. Günlük and J. Linderoth) describes an effective
reformulation technique that is applicable to such situations. The perspec-
tive g(x, t) = tc(x/t) of a convex function c(x) is itself convex, and this
property can be used to construct tight reformulations. The perspective
reformulation is closely related to the subject of the next section: disjunc-
tive programming.
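As a concrete instance (ours, not drawn from the paper itself): the perspective
of c(x) = x² is g(x, t) = x²/t, so with a binary on/off indicator t the
constraint x² ≤ s becomes x² ≤ st, which forces x = 0 when t = 0 and gives
a strictly tighter continuous relaxation than x² ≤ s alone.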
Part II. Disjunctive programming. Disjunctive programs involve con-
tinuous variables together with Boolean variables, which model logical propo-
sitions directly rather than by means of an algebraic formulation.
Generalized disjunctive programming: A framework for formulation
and alternative algorithms for MINLP optimization (I.E. Grossmann and
J.P. Ruiz) addresses generalized disjunctive programs (GDPs), which are
MINLPs that involve general disjunctions and nonlinear terms. GDPs can

www.it-ebooks.info
PREFACE ix

be formulated as MINLPs either through the “big-M” formulation, or by
using the perspective of the nonlinear functions. The authors describe
two approaches: disjunctive branch-and-bound, which branches on the dis-
junctions, and logic-based outer approximation, which constructs a
disjunctive MILP master problem.
Under the assumption that the problem functions are factorable (i.e.,
the functions can be computed in a finite number of simple steps by us-
ing unary and binary operators), a MINLP can be reformulated as an
equivalent MINLP where the only nonlinear constraints are equations in-
volving two or three variables. The paper Disjunctive cuts for nonconvex
MINLP (P. Belotti) describes a procedure for generating disjunctive cuts.
First, spatial branching is performed on an original problem variable. Next,
bound reduction is applied to the two resulting relaxations, and linear
relaxations are created from a small number of outer approximations of
each nonlinear expression. Then a cut-generation LP is used to produce a
new cut.
Part III. Nonlinear programming. For several important and practical
approaches to solving MINLPs, the most important part is the fast and
accurate solution of NLP subproblems. NLPs arise both as nodes in branch-
and-bound trees and as subproblems for fixed integer or Boolean variables.
The papers in this section discuss two complementary techniques for solving
NLPs: active-set methods in the form of sequential quadratic programming
(SQP) methods and interior-point methods (IPMs).
Sequential quadratic programming methods (P.E. Gill and E. Wong)
is a survey of a key NLP approach, sequential quadratic programming
(SQP), that is especially relevant to MINLP. SQP methods solve NLPs by
a sequence of quadratic programming approximations and are particularly
well-suited to warm starts and re-solves that occur in MINLP.
IPMs are an alternative to SQP methods. However, standard IPMs
can stall if started near a solution, or even fail on infeasible NLPs, mak-
ing them less suitable for MINLP. Using interior-point methods within an
outer approximation framework for mixed-integer nonlinear programming
(H.Y. Benson) suggests a primal-dual regularization that penalizes the con-
straints and bounds the slack variables to overcome the difficulties caused
by warm starts and infeasible subproblems.
Part IV. Expression graphs. Expression graphs are a convenient way
to represent functions. An expression graph is a directed graph in which
each node represents an arithmetic operation, incoming edges represent
operands, and outgoing edges represent the result of the operation. Expres-
sion graphs can be manipulated to obtain derivative information, perform
problem simplifications through presolve operations, or obtain relaxations
of nonconvex constraints.
Using expression graphs in optimization algorithms (D.M. Gay) dis-
cusses how expression graphs allow gradients and Hessians to be computed


efficiently by exploiting group partial separability. In addition, the author
describes how expression graphs can be used to tighten bounds on variables
to provide tighter outer approximations of nonconvex expressions, detect
convexity (e.g., for quadratic constraints), and propagate constraints.
Symmetry arises in many MINLP formulations and can mean that
a problem or subproblem may have many symmetric optima or near op-
tima, resulting in large search trees and inefficient pruning. Symmetry in
mathematical programming (L. Liberti) describes how the symmetry group
of a MINLP can be detected by parsing the expression graph. Once the
symmetry group is known, we can add symmetry-breaking constraints or
employ special branching schemes such as orbital branching that mitigate
the adverse effects of symmetry.
Part V. Convexification and linearization. A popular and classical
approach for handling nonconvex functions is to approximate them by using
piecewise-linear functions. This approach requires the addition of binary
variables that model the piecewise approximation. The advantage of such
an approach is that advanced MILP techniques can be applied. The disad-
vantage of the approach is that the approximations are not exact and that
it suffers from the curse of dimensionality.
Using piecewise linear functions for solving MINLPs (B. Geißler, A.
Martin, A. Morsi, and L. Schewe) details how to carry out piecewise-linear
approximation for MINLP. The authors review two formulations of piece-
wise linearization: the convex combination technique and the incremental
technique. They introduce a piecewise-polyhedral outer-approximation al-
gorithm based on rigorous error estimates, and they demonstrate compu-
tational success on water network and gas network problems.
A global-optimization algorithm for mixed-integer nonlinear programs
having separable nonconvexity (C. D’Ambrosio, J. Lee, and A. Wächter)
introduces a method for MINLPs that have all of their nonconvexity in
separable form. The approach aims to retain and exploit existing convexity
in the formulation.
Global optimization of mixed-integer signomial programming problems
(A. Lundell and T. Westerlund) describes a global optimization algorithm
for MINLPs containing signomial functions. The method obtains a convex
relaxation through reformulations, by using single-variable transformations
in concert with piecewise-linear approximations of the inverse transforma-
tions.
Part VI. Mixed-integer quadratically-constrained optimization.
In seeking a more structured setting than general MINLP, but with consid-
erably more modeling power than is afforded by MILP, one naturally con-
siders mixed-integer models with quadratic functions, namely, MIQCPs.
Such models are NP-hard, but they have enough structure that can be
exploited in order to gain computational advantages over treating such
problems as general MINLPs.


The MILP road to MIQCP (S. Burer and A. Saxena) surveys re-
sults in mixed-integer quadratically constrained programming. Strong con-
vex relaxations and valid inequalities are the basis of efficient, practical
techniques for global optimization. Some of the relaxations and inequal-
ities are derived from the algebraic formulation, while others are based
on disjunctive programming. Much of the inspiration derives from MILP
methodology.
Linear programming relaxations of quadratically-constrained quadratic
programs (A. Qualizza, P. Belotti, and F. Margot) investigates the use
of LP tools for approximately solving semidefinite programming (SDP)
relaxations of quadratically-constrained quadratic programs. The authors
present classes of valid linear inequalities based on spectral decomposition,
together with computational results.
Extending a CIP framework to solve MIQCPs (T. Berthold, S. Heinz,
and S. Vigerske) discusses how to build a solver for MIQCPs by extending a
framework for constraint integer programming (CIP). The advantage of this
approach is that we can utilize the full power of advanced MILP and con-
straint programming technologies. For relaxation, the approach employs
an outer approximation generated by linearization of convex constraints
and linear underestimation of nonconvex constraints. Reformulation, sep-
aration, and propagation techniques are used to handle the quadratic con-
straints efficiently. The authors implemented these methods in the branch-
cut-and-price framework SCIP.

Part VII. Combinatorial optimization. Because of the success of
MILP methods and because of beautiful and algorithmically important re-
sults from polyhedral combinatorics, nonlinear functions and formulations
have not been heavily investigated for combinatorial optimization prob-
lems. With improvements in software for general NLP, SDP, and MINLP,
however, researchers are now investing considerable effort in trying to ex-
ploit these gains for combinatorial-optimization problems.
Computation with polynomial equations and inequalities arising in
combinatorial optimization (J.A. De Loera, P.N. Malkin, and P.A. Par-
rilo) discusses how the algebra of multivariate polynomials can be used to
create large-scale linear algebra or semidefinite-programming relaxations of
many kinds of combinatorial feasibility and optimization problems.
Matrix relaxations in combinatorial optimization (F. Rendl) discusses
the use of SDP as a modeling tool in combinatorial optimization. The
main techniques to get matrix relaxations of combinatorial-optimization
problems are presented. Semidefiniteness constraints lead to tractable re-
laxations, while constraints that matrices be completely positive or copos-
itive do not. This survey illustrates the enormous power and potential of
matrix relaxations.
A polytope for a product of real linear functions in 0/1 variables (O.
Günlük, J. Lee, and J. Leung) uses polyhedral methods to give a tight


formulation for the convex hull of a product of two linear functions in
0/1 variables. As an example, by writing a pair of general integer vari-
ables in binary expansion, the authors have a technique for linearizing their
product.
Part VIII. Complexity. General MINLP is incomputable, independent
of conjectures such as P=NP. From the point of view of complexity the-
ory, however, considerable room exists for negative results (e.g., incom-
putability, intractability, and inapproximability results) and positive re-
sults (e.g., polynomial-time algorithms and approximation schemes) for
restricted classes of MINLPs.
On the complexity of nonlinear mixed-integer optimization (M. Köppe)
is a survey on the computational complexity of MINLP. It includes incom-
putability results that arise from number theory and logic, fully polynomial-
time approximation schemes in fixed dimension, and polynomial-time al-
gorithms for special cases.
Theory and applications of n-fold integer programming (S. Onn) is
an overview of the theory of n-fold integer programming, which enables
the polynomial-time solution of fundamental linear and nonlinear inte-
ger programming problems in variable dimension. This framework yields
polynomial-time algorithms in several application areas, including multi-
commodity flows and privacy in statistical databases.
Part IX. Applications. A wide range of applications of MINLP exist.
This section focuses on two new application domains.
MINLP application for ACH interiors restructuring (E. Klampfl and
Y. Fradkin) describes a very large-scale application of MINLP developed
by the Ford Motor Company. The MINLP models the re-engineering of
42 product lines over 26 manufacturing processes and 50 potential supplier
sites. The resulting MINLP model has 350,000 variables (17,000 binary)
and 1.6 million constraints and is well beyond the size that state-of-the-art
MINLP solvers can handle. The authors develop a piecewise-linearization
scheme for the objective and a decomposition technique that decouples the
problem into two coupled MILPs that are solved iteratively.
A benchmark library of mixed-integer optimal control problems (S.
Sager) describes a challenging new class of MINLPs. These are optimal
control problems, involving differential-algebraic equation constraints and
integrality restrictions on the controls, such as gear ratios. The author de-
scribes 12 models from a range of applications, including biology, industrial
engineering, trajectory optimization, and process control.
Acknowledgments. We gratefully acknowledge the generous financial
support from the IMA that made this workshop possible, as well as fi-
nancial support from IBM. This work was supported in part by the Of-
fice of Advanced Scientific Computing Research, Office of Science, U.S.
Department of Energy, under Contract DE-AC02-06CH11357. Special


thanks are due to Fadil Santosa, Chun Liu, Patricia Brick, Dzung Nguyen,
Holly Pinkerton, and Eve Marofsky from the IMA, who made the organi-
zation of the workshop and the publication of this special volume such an
easy and enjoyable affair.

Jon Lee
University of Michigan

Sven Leyffer
Argonne National Laboratory

CONTENTS

Foreword
Preface

Part I: Convex MINLP

Algorithms and software for convex mixed integer nonlinear programs
    Pierre Bonami, Mustafa Kilinç, and Jeff Linderoth

Subgradient based outer approximation for mixed integer second
order cone programming
    Sarah Drewes and Stefan Ulbrich

Perspective reformulation and applications
    Oktay Günlük and Jeff Linderoth

Part II: Disjunctive Programming

Generalized disjunctive programming: A framework for formulation
and alternative algorithms for MINLP optimization
    Ignacio E. Grossmann and Juan P. Ruiz

Disjunctive cuts for nonconvex MINLP
    Pietro Belotti

Part III: Nonlinear Programming

Sequential quadratic programming methods
    Philip E. Gill and Elizabeth Wong

Using interior-point methods within an outer approximation
framework for mixed integer nonlinear programming
    Hande Y. Benson

Part IV: Expression Graphs

Using expression graphs in optimization algorithms
    David M. Gay

Symmetry in mathematical programming
    Leo Liberti

Part V: Convexification and Linearization

Using piecewise linear functions for solving MINLPs
    Björn Geißler, Alexander Martin, Antonio Morsi, and Lars Schewe

An algorithmic framework for MINLP with separable non-convexity
    Claudia D’Ambrosio, Jon Lee, and Andreas Wächter

Global optimization of mixed-integer signomial programming problems
    Andreas Lundell and Tapio Westerlund

Part VI: Mixed-Integer Quadratically Constrained Optimization

The MILP road to MIQCP
    Samuel Burer and Anureet Saxena

Linear programming relaxations of quadratically constrained
quadratic programs
    Andrea Qualizza, Pietro Belotti, and François Margot

Extending a CIP framework to solve MIQCPs
    Timo Berthold, Stefan Heinz, and Stefan Vigerske

Part VII: Combinatorial Optimization

Computation with polynomial equations and inequalities arising
in combinatorial optimization
    Jesus A. De Loera, Peter N. Malkin, and Pablo A. Parrilo

Matrix relaxations in combinatorial optimization
    Franz Rendl

A polytope for a product of real linear functions in 0/1 variables
    Oktay Günlük, Jon Lee, and Janny Leung

Part VIII: Complexity

On the complexity of nonlinear mixed-integer optimization
    Matthias Köppe

Theory and applications of n-fold integer programming
    Shmuel Onn

Part IX: Applications

MINLP Application for ACH interiors restructuring
    Erica Klampfl and Yakov Fradkin

A benchmark library of mixed-integer optimal control problems
    Sebastian Sager

List of Hot Topics participants

PART I:
Convex MINLP

ALGORITHMS AND SOFTWARE FOR
CONVEX MIXED INTEGER NONLINEAR PROGRAMS
PIERRE BONAMI∗, MUSTAFA KILINÇ†, AND JEFF LINDEROTH‡

Abstract. This paper provides a survey of recent progress and software for solving
convex Mixed Integer Nonlinear Programs (MINLP)s, where the objective and con-
straints are defined by convex functions and integrality restrictions are imposed on a
subset of the decision variables. Convex MINLPs have received sustained attention in
recent years. By exploiting analogies to well-known techniques for solving Mixed Integer
Linear Programs and incorporating these techniques into software, significant improve-
ments have been made in the ability to solve these problems.

Key words. Mixed Integer Nonlinear Programming; Branch and Bound.

1. Introduction. Mixed-Integer Nonlinear Programs (MINLP)s are
optimization problems where some of the variables are constrained to take
integer values and the objective function and feasible region of the problem
are described by nonlinear functions. Such optimization problems arise in
many real world applications. Integer variables are often required to model
logical relationships, fixed charges, piecewise linear functions, disjunctive
constraints and the non-divisibility of resources. Nonlinear functions are
required to accurately reflect physical properties, covariance, and economies
of scale.
In full generality, MINLPs form a particularly broad class of challeng-
ing optimization problems, as they combine the difficulty of optimizing
over integer variables with the handling of nonlinear functions. Even if we
restrict our model to contain only linear functions, MINLP reduces to a
Mixed-Integer Linear Program (MILP), which is an NP-Hard problem [55].
On the other hand, if we restrict our model to have no integer variable but
allow for general nonlinear functions in the objective or the constraints,
then MINLP reduces to a Nonlinear Program (NLP) which is also known
to be NP-Hard [90]. Combining both integrality and nonlinearity can lead
to examples of MINLP that are undecidable [67].

∗ Laboratoire d’Informatique Fondamentale de Marseille, CNRS, Aix-Marseille
Universités, Parc Scientifique et Technologique de Luminy, 163 avenue de Luminy -
Case 901, F-13288 Marseille Cedex 9, France ([email protected]). Supported by
ANR grant BLAN06-1-138894.
† Department of Industrial and Systems Engineering, University of Wisconsin-
Madison, 1513 University Ave., Madison, WI 53706 ([email protected]).
‡ Department of Industrial and Systems Engineering, University of Wisconsin-
Madison, 1513 University Ave., Madison, WI 53706 ([email protected]). The work of
the second and third authors is supported by the US Department of Energy under grants
DE-FG02-08ER25861 and DE-FG02-09ER25869, and the National Science Foundation
under grant CCF-0830153.

J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_1,
© Springer Science+Business Media, LLC 2012

In this paper, we restrict ourselves to the subclass of MINLP where
the objective function to minimize is convex, and the constraint functions
are all convex and upper bounded. In these instances, when integrality is
relaxed, the feasible set is convex. Convex MINLP is still NP-hard since it
contains MILP as a special case. Nevertheless, it can be solved much more
efficiently than general MINLP since the problem obtained by dropping
the integrality requirements is a convex NLP for which there exist efficient
algorithms. Further, the convexity of the objective function and feasible
region can be used to design specialized algorithms.
There are many diverse and important applications of MINLPs. A
small subset of these applications includes portfolio optimization [21, 68],
block layout design in the manufacturing and service sectors [33, 98], net-
work design with queuing delay constraints [27], integrated design and con-
trol of chemical processes [53], drinking water distribution systems security
[73], minimizing the environmental impact of utility plants [46], and multi-
period supply chain problems subject to probabilistic constraints [75].
Even though convex MINLP is NP-Hard, there are exact methods for
its solution—methods that terminate with a guaranteed optimal solution
or prove that no such solution exists. In this survey, our main focus is on
such exact methods and their implementation.
In the last 40 years, at least five different algorithms have been pro-
posed for solving convex MINLP to optimality. In 1965, Dakin remarked
that the branch-and-bound method did not require linearity and could be
applied to convex MINLP. In the early 70’s, Geoffrion [56] generalized Ben-
ders decomposition to make an exact algorithm for convex MINLP. In the
80’s, Gupta and Ravindran studied the application of branch and bound
[62]. At the same time, Duran and Grossmann [43] introduced the Outer
Approximation decomposition algorithm. This latter algorithm was subse-
quently improved in the 90’s by Fletcher and Leyffer [51] and also adapted
to the branch-and-cut framework by Quesada and Grossmann [96]. In the
same period, a related method called the Extended Cutting Plane method
was proposed by Westerlund and Pettersson [111]. Section 3 of this paper
will be devoted to reviewing in more detail all of these methods.
Two main ingredients of the above mentioned algorithms are solving
MILP and solving NLP. In the last decades, there have been enormous
advances in our ability to solve these two important subproblems of convex
MINLP.
We refer the reader to [100, 92] and [113] for in-depth analysis of the
theory of MILP. The advances in the theory of solving MILP have led to
the implementation of solvers both commercial and open-source which are
now routinely used to solve many industrial problems of large size. Bixby
and Rothberg [22] demonstrate that advances in algorithmic technology
alone have resulted in MILP instances solving more than 300 times faster
than a decade ago. There are effective, robust commercial MILP solvers


such as CPLEX [66], XPRESS-MP [47], and Gurobi [63]. Linderoth and
Ralphs [82] give a survey of noncommercial software for MILP.
There has also been steady progress over the past 30 years in the de-
velopment and successful implementation of algorithms for NLPs. We refer
the reader to [12] and [94] for a detailed recital of nonlinear programming
techniques. Theoretical developments have led to successful implemen-
tations in software such as SNOPT [57], filterSQP [52], CONOPT [42],
IPOPT [107], LOQO [103], and KNITRO [32]. Waltz [108] states that the
size of instance solvable by NLP is growing by nearly an order of magnitude
a decade.
Of course, solution algorithms for convex MINLP have benefited from
the technological progress made in solving MILP and NLP. However, in the
realm of MINLP, the progress has been far more modest, and the dimension
of solvable convex MINLP by current solvers is small when compared to
MILPs and NLPs. In this work, our goal is to give a brief introduction to
the techniques which are in state-of-the-art solvers for convex MINLPs. We
survey basic theory as well as recent advances that have made their way
into software. We also attempt to make a fair comparison of all algorithmic
approaches and their implementations.
The remainder of the paper can be outlined as follows. A precise de-
scription of a MINLP and algorithmic building blocks for solving MINLPs
are given in Section 2. Section 3 outlines five different solution techniques.
In Section 4, we describe in more detail some advanced techniques imple-
mented in the latest generation of solvers. Section 5 contains descriptions of
several state-of-the-art solvers that implement the different solution tech-
niques presented. Finally, in Section 6 we present a short computational
comparison of those software packages.
2. MINLP. The focus of this section is to mathematically define a
MINLP and to describe important special cases. Basic elements of algo-
rithms and subproblems related to MINLP are also introduced.
2.1. MINLP problem classes. A Mixed Integer Nonlinear Program
may be expressed in algebraic form as follows:

zminlp = minimize  f (x)
         subject to gj (x) ≤ 0    ∀j ∈ J,                  (MINLP)
                    x ∈ X, xI ∈ Z|I| ,

where X is a polyhedral subset of Rn (e.g. X = {x | x ∈ Rn+ , Ax ≤ b}).
The functions f : X → R and gj : X → R are sufficiently smooth functions.
The algorithms presented here only require continuously differentiable func-
tions, but in general algorithms for solving continuous relaxations converge
much faster if functions are twice-continuously differentiable. The set J is
the index set of nonlinear constraints, I is the index set of discrete variables
and C is the index set of continuous variables, so I ∪ C = {1, . . . , n}.


For convenience, we assume that the set X is bounded; in particular


some finite lower bounds LI and upper bounds UI on the values of the
integer variables are known. In most applications, discrete variables are
restricted to 0-1 values, i.e., xi ∈ {0, 1} ∀i ∈ I. In this survey, we focus on
the case where the functions f and gj are convex. Thus, by relaxing the
integrality constraint on x, a convex program, minimization of a convex
function over a convex set, is formed. We will call such problems convex
MINLPs. From now on, unless otherwise stated, we will refer to convex
MINLPs simply as MINLPs.
There are a number of important special cases of MINLP. If f (x) =
xT Qx + dT x + h is a (convex) quadratic function of x, and there are only
linear constraints on the problem (J = ∅), the problem is known as a mixed
integer quadratic program (MIQP). If both f (x) and gj (x) are quadratic
functions of x for each j ∈ J, the problem is known as a mixed integer
quadratically constrained program (MIQCP). Significant work has been
devoted to these important special cases [87, 29, 21].
If the objective function is linear, and all nonlinear constraints have
the form gj (x) = ‖Ax + b‖2 − cT x − d, then the problem is a Mixed Integer
Second-Order Cone Program (MISOCP). Through a well-known transfor-
mation, MIQCP can be transformed into a MISOCP. In fact, many different
types of sets defined by nonlinear constraints are representable via second-
order cone inequalities. Discussion of these transformations is out of the
scope of this work, but the interested reader may consult [15]. Relatively
recently, commercial software packages such as CPLEX [66], XPRESS-MP
[47], and Mosek [88] have all been augmented to include specialized al-
gorithms for solving these important special cases of convex MINLPs. In
what follows, we focus on general convex MINLP and software available
for its solution.
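As a small, self-contained illustration of the MIQP case above (our sketch,
not taken from any of the cited solvers), convexity of a quadratic objective
f (x) = xT Qx + dT x + h can be checked numerically by testing whether the
symmetric part of Q is positive semidefinite:

    import numpy as np

    def is_convex_quadratic(Q, tol=1e-9):
        # f(x) = x^T Q x + d^T x + h is convex iff the symmetric part
        # of Q is positive semidefinite; d and h play no role.
        S = 0.5 * (Q + Q.T)
        return np.linalg.eigvalsh(S).min() >= -tol

    Q = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    print(is_convex_quadratic(Q))    # True: a convex MIQP objective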

2.2. Basic elements of MINLP methods. The basic concept un-
derlying algorithms for solving (MINLP) is to generate and refine bounds
on its optimal solution value. Lower bounds are generated by solving a
relaxation of (MINLP), and upper bounds are provided by the value of
a feasible solution to (MINLP). Algorithms differ in the manner in which
these bounds are generated and the sequence of subproblems that are solved
to generate these bounds. However, algorithms share many basic common
elements, which are described next.

Linearizations: Since the objective function of (MINLP) may be non-
linear, its optimal solution may occur at a point that is interior to the
convex hull of its set of feasible solutions. It is simple to transform the
instance to have a linear objective function by introducing an auxiliary
variable η and moving the original objective function into the constraints.
Specifically, (MINLP) may be equivalently stated as


zminlp = minimize η
subject to f (x) ≤ η
gj (x) ≤ 0 ∀j ∈ J, (MINLP-1)
x ∈ X, xI ∈ Z|I| .

Many algorithms rely on linear relaxations of (MINLP), obtained by
linearizing the objective and constraint functions at a given point x̂. Since
f and gj are convex and differentiable, the inequalities

f (x̂) + ∇f (x̂)T (x − x̂) ≤ f (x),
gj (x̂) + ∇gj (x̂)T (x − x̂) ≤ gj (x),

are valid for all j ∈ J and x̂ ∈ Rn . Since f (x) ≤ η and gj (x) ≤ 0, then the
linear inequalities

f (x̂) + ∇f (x̂)T (x − x̂) ≤ η,                              (2.1)
gj (x̂) + ∇gj (x̂)T (x − x̂) ≤ 0                             (2.2)

are valid for (MINLP-1). Linearizations of gj (x) outer approximate the
feasible region, and linearizations of f (x) underestimate the objective func-
tion. We often refer to (2.1)–(2.2) as outer approximation constraints.
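To make the construction concrete, the small Python sketch below (ours;
the helper name and example data are illustrative assumptions, not part of
any solver interface) computes the coefficients of an outer approximation
constraint of the form (2.2) for a convex constraint gj (x) ≤ 0:

    import numpy as np

    def oa_cut(g, grad_g, x_hat):
        # Linearize the convex constraint g(x) <= 0 at x_hat, returning
        # (a, b) with a @ x <= b equivalent to
        #     g(x_hat) + grad_g(x_hat)^T (x - x_hat) <= 0,
        # which is valid for the feasible region by convexity of g.
        a = grad_g(x_hat)
        b = a @ x_hat - g(x_hat)
        return a, b

    # Example: g(x) = x1^2 + x2^2 - 4 <= 0, linearized at (1, 1).
    g = lambda x: x[0]**2 + x[1]**2 - 4.0
    grad_g = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
    a, b = oa_cut(g, grad_g, np.array([1.0, 1.0]))
    print(a, b)    # a = [2. 2.], b = 6.0: the cut 2 x1 + 2 x2 <= 6

The same routine applied to the objective gives cuts of the form (2.1) once
the auxiliary variable η is appended to the variable vector.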
Subproblems: One important subproblem used by a variety of algo-
rithms for (MINLP) is formed by relaxing the integrity requirements and
restricting the bounds on the integer variables. Given bounds (lI , uI ) =
{(i , ui ) | ∀i ∈ I}, the NLP relaxation of (MINLP) is

znlpr(l,u) = minimize f (x)
subject to gj (x) ≤ 0 ∀j ∈ J, (NLPR(lI , uI ))
x ∈ X; lI ≤ xI ≤ uI .

The value znlpr(l,u) is a lower bound on the value of zminlp that can be
obtained in the subset of the feasible region of (MINLP) where the bounds
lI ≤ xI ≤ uI are imposed. Specifically, if (lI , uI ) are the lower and upper
bounds (LI , UI ) for the original instance, then zNLPR(LI ,UI ) provides a
lower bound on zminlp .
In the special case that all of the integer variables are fixed (lI = uI =
x̂I ), the fixed NLP subproblem is formed:

zNLP(x̂I ) = minimize f (x)
subject to gj (x) ≤ 0, ∀j ∈ J (NLP(x̂I ))
x ∈ X; xI = x̂I .

If x̂I ∈ Z|I| and (NLP(x̂I )) has a feasible solution, the value zNLP(x̂I ) pro-
vides an upper bound to the problem (MINLP). If (NLP(x̂I )) is infeasible,


NLP software typically will deduce infeasibility by solving an associated
feasibility subproblem. One choice of feasibility subproblem employed by
NLP solvers is

zNLPF(x̂I ) = minimize ∑j=1,...,m wj gj (x)+
             s.t. x ∈ X, xI = x̂I ,                        (NLPF(x̂I ))

where gj (x)+ = max{0, gj (x)} measures the violation of the nonlinear con-
straints and wj ≥ 0. Since when NLP(x̂I ) is infeasible NLP solvers will
return the solution to NLPF(x̂I ), we will often say, by abuse of terminology,
that NLP(x̂I ) is solved and its solution x is optimal or minimally infeasible,
meaning that it is the optimal solution to NLPF(x̂I ).
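As an illustration of the subproblem (NLP(x̂I )), the following sketch (ours,
using scipy.optimize.minimize with the SLSQP method; the function name
and toy data are assumptions made for exposition) fixes the integer
variables through their bounds and solves the remaining continuous problem:

    import numpy as np
    from scipy.optimize import minimize

    def solve_fixed_nlp(f, g_list, I, x_hat_I, lb, ub):
        # NLP(x_hat_I): minimize f(x) s.t. g_j(x) <= 0, lb <= x <= ub,
        # with x_i fixed to the given integer values for each i in I.
        lb, ub = np.array(lb, float), np.array(ub, float)
        for i, v in zip(I, x_hat_I):
            lb[i] = ub[i] = float(v)       # fix integers via tight bounds
        cons = [{"type": "ineq", "fun": (lambda x, g=g: -g(x))}
                for g in g_list]           # SLSQP wants fun(x) >= 0
        x0 = 0.5 * (lb + ub)               # start point within the bounds
        return minimize(f, x0, method="SLSQP",
                        bounds=list(zip(lb, ub)), constraints=cons)

    # Toy instance: min (x0-0.3)^2 + (x1-1.7)^2 with x1 fixed to 2.
    res = solve_fixed_nlp(lambda x: (x[0]-0.3)**2 + (x[1]-1.7)**2,
                          [], [1], [2], [0.0, 0.0], [4.0, 4.0])
    print(res.x, res.fun)                  # about [0.3, 2.0], 0.09

When the fixed problem is infeasible, the same pattern applies to the
weighted feasibility problem (NLPF(x̂I )).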
3. Algorithms for convex MINLP. With elements of algorithms
defined, attention can be turned to describing common algorithms for
solving MINLPs. The algorithms share many general characteristics with
the well-known branch-and-bound or branch-and-cut methods for solving
MILPs.
3.1. NLP-Based Branch and Bound. Branch and bound is a
divide-and-conquer method. The dividing (branching) is done by parti-
tioning the set of feasible solutions into smaller and smaller subsets. The
conquering (fathoming) is done by bounding the value of the best feasible
solution in the subset and discarding the subset if its bound indicates that
it cannot contain an optimal solution.
Branch and bound was first applied to MILP by Land and Doig [74].
The method (and its enhancements such as branch and cut) remain the
workhorse for all of the most successful MILP software. Dakin [38] real-
ized that this method does not require linearity of the problem. Gupta
and Ravindran [62] suggested an implementation of the branch-and-bound
method for convex MINLPs and investigated different search strategies.
Other early works related to NLP-Based Branch and Bound (NLP-BB for
short) for convex MINLP include [91], [28], and [78].
In NLP-BB, the lower bounds come from solving the subproblems
(NLPR(lI , uI )). Initially, the bounds (LI , UI ) (the lower and upper bounds
on the integer variables in (MINLP)) are used, so the algorithm is initialized
with a continuous relaxation whose solution value provides a lower bound
on zminlp . The variable bounds are successively refined until the subregion
can be fathomed. Continuing in this manner yields a tree L of subproblems.
A node N of the search tree is characterized by the bounds enforced on its
integer variables: N := (lI , uI ). Lower and upper bounds on the optimal
solution value zL ≤ zminlp ≤ zU are updated through the course of the
algorithm. Algorithm 1 gives pseudocode for the NLP-BB algorithm for
solving (MINLP).


Algorithm 1 The NLP-Based Branch-and-Bound algorithm
0. Initialize.
L ← {(LI , UI )}. zU = ∞. x∗ ← NONE.
1. Terminate?
Is L = ∅? If so, the solution x∗ is optimal.
2. Select.
Choose and delete a problem N i = (lIi , uiI ) from L.
3. Evaluate.
Solve NLPR(lIi , uiI ). If the problem is infeasible, go to step 1, else
let znlpr(lIi ,uiI ) be its optimal objective function value and x̂i be its
optimal solution.
4. Prune.
If znlpr(lIi ,uiI ) ≥ zU , go to step 1. If x̂i is fractional, go to step 5,
else let zU ← znlpr(lIi ,uiI ) , x∗ ← x̂i , and delete from L all problems
with zLj ≥ zU . Go to step 1.
5. Divide.
Divide the feasible region of N i into a number of smaller feasi-
ble subregions, creating nodes N i1 , N i2 , . . . , N ik . For each j =
1, 2, . . . , k, let zLij ← znlpr(lIi ,uiI ) and add the problem N ij to L. Go
to 1.

As described in step 4 of Algorithm 1, if NLPR(lIi , uiI ) yields an in-
tegral solution (a solution where all discrete variables take integer values),
then znlpr(lIi ,uiI ) gives an upper bound for MINLP. Fathoming of nodes oc-
curs when the lower bound for a subregion obtained by solving NLPR(lIi ,
uiI ) exceeds the current upper bound zU , when the subproblem is infeasi-
ble, or when the subproblem provides a feasible integral solution. If none
of these conditions is met, the node cannot be pruned and the subregion is
divided to create new nodes. This Divide step of Algorithm 1 may be per-
formed in many ways. In most successful implementations, the subregion
is divided by dichotomy branching. Specifically, the feasible region of N i is
divided into subsets by changing bounds on one integer variable based on
the solution x̂i to NLPR(lIi , uiI ). An index j ∈ I such that x̂ij ∉ Z is chosen
and two new child nodes are created by adding the bound xj ≤ ⌊x̂ij ⌋ to
one child and xj ≥ ⌈x̂ij ⌉ to the other child. The tree search continues until
all nodes are fathomed, at which point x∗ is the optimal solution.
The description makes it clear that there are various choices to be
made during the course of the algorithm. Namely, how do we select which
subproblem to evaluate, and how do we divide the feasible region? A partial
answer to these two questions will be provided in Sections 4.2 and 4.3.
The NLP-Based Branch-and-Bound algorithm is implemented in solvers
MINLP-BB [77], SBB [30], and Bonmin [24].
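To fix ideas, here is a self-contained Python sketch of Algorithm 1 (ours,
not the implementation of any of these solvers; node relaxations are solved
with scipy.optimize.minimize and nodes are selected depth-first):

    import math
    import numpy as np
    from scipy.optimize import minimize

    def nlp_bb(f, g_list, I, lb, ub, eps=1e-6):
        # min f(x) s.t. g_j(x) <= 0, lb <= x <= ub, x_i integer (i in I),
        # assuming f and every g_j are convex and differentiable.
        def nlpr(l, u):                    # relaxation NLPR(l, u)
            cons = [{"type": "ineq", "fun": (lambda x, g=g: -g(x))}
                    for g in g_list]
            x0 = 0.5 * (np.array(l) + np.array(u))
            return minimize(f, x0, method="SLSQP",
                            bounds=list(zip(l, u)), constraints=cons)

        z_U, x_best = math.inf, None
        L = [(list(lb), list(ub))]         # node list of bound pairs
        while L:
            l, u = L.pop()                 # Select
            res = nlpr(l, u)               # Evaluate
            if not res.success or res.fun >= z_U - eps:
                continue                   # Prune
            frac = [i for i in I
                    if abs(res.x[i] - round(res.x[i])) > eps]
            if not frac:                   # integral: new incumbent
                z_U, x_best = res.fun, res.x.copy()
                continue
            j = frac[0]                    # Divide: dichotomy branching
            l1, u1 = list(l), list(u); u1[j] = math.floor(res.x[j])
            l2, u2 = list(l), list(u); l2[j] = math.ceil(res.x[j])
            L += [(l1, u1), (l2, u2)]
        return z_U, x_best

    # Toy instance: min (x0-0.5)^2 + (x1-1.6)^2, x1 integer in [0, 3].
    print(nlp_bb(lambda x: (x[0]-0.5)**2 + (x[1]-1.6)**2,
                 [], [1], [0.0, 0.0], [4.0, 3.0]))   # (0.16, [0.5, 2.0])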


3.2. Outer Approximation. The Outer Approximation (OA)
method for solving (MINLP) was first proposed by Duran and Grossmann
[43]. The fundamental insight behind the algorithm is that (MINLP)
is equivalent to a Mixed Integer Linear Program (MILP) of finite size.
The MILP is constructed by taking linearizations of the objective and
constraint functions about the solution to the subproblem NLP(x̂I ) or
NLPF(x̂I ) for various choices of x̂I . Specifically, for each integer assign-
ment x̂I ∈ ProjxI (X) ∩ Z|I| (where ProjxI (X) denotes the projection of X
onto the space of integer constrained variables), let x̄ ∈ arg min NLP(x̂I )
be an optimal solution to the NLP subproblem with integer variables fixed
according to x̂I . If NLP(x̂I ) is not feasible, then let x̄ ∈ arg min NLPF(x̂I )
be an optimal solution to its corresponding feasibility problem. Since
ProjxI (X) is bounded by assumption, there are a finite number of sub-
problems NLP(x̂I ). For each of these subproblems, we choose one optimal
solution, and let 𝒦 be the (finite) set of these optimal solutions. Using
these definitions, an outer-approximating MILP can be specified as

zoa = min η
      s.t. η ≥ f (x̄) + ∇f (x̄)T (x − x̄)           ∀x̄ ∈ 𝒦,      (MILP-OA)
           gj (x̄) + ∇gj (x̄)T (x − x̄) ≤ 0         ∀j ∈ J, ∀x̄ ∈ 𝒦,
           x ∈ X, xI ∈ Z|I| .

The equivalence between (MINLP) and (MILP-OA) is specified in the
following theorem:
Theorem 3.1. [43, 51, 24] If X ≠ ∅, f and g are convex, continuously
differentiable, and a constraint qualification holds for each x̄ ∈ 𝒦, then
zminlp = zoa . All optimal solutions of (MINLP) are optimal solutions of
(MILP-OA).
From a practical point of view, it is not sensible to formulate (MILP-OA)
explicitly in order to solve (MINLP)—to build it, one would
have first to enumerate all feasible assignments for the integer variables
in X and solve the corresponding nonlinear programs NLP(x̂I ). The OA
method uses an MILP relaxation (MP(K)) of (MINLP) that is built in a
manner similar to (MILP-OA) but where linearizations are only taken at
a subset K of 𝒦:

zmp(K) = min η
s.t. η ≥ f (x̄) + ∇f (x̄)T (x − x̄) x̄ ∈ K, (MP(K))
gj (x̄) + ∇gj (x̄)T (x − x̄) ≤ 0 j ∈ J, x̄ ∈ K,
x ∈ X, xI ∈ Z|I| .

We call this problem the OA-based reduced master problem. The solu-
tion value of the reduced master problem (MP(K)), zmp(K) , gives a lower
bound to (MINLP), since K ⊆ K. The OA method proceeds by iteratively


adding points to the set K. Since function linearizations are accumulated
as iterations proceed, the reduced master problem (MP(K)) yields a non-
decreasing sequence of lower bounds.
OA typically starts by solving (NLPR(LI ,UI )). Linearizations about
the optimal solution to (NLPR(lI , uI )) are used to construct the first re-
duced master problem (MP(K)). Then, (MP(K)) is solved to optimality to
give an integer solution, x̂. This integer solution is then used to construct
the NLP subproblem (NLP(x̂I )). If (NLP(x̂I )) is feasible, linearizations
about the optimal solution of (NLP(x̂I )) are added to the reduced master
problem. These linearizations eliminate the current solution x̂ from the fea-
sible region of (MP(K)) unless x̂ is optimal for (MINLP). Also, the optimal
solution value zNLP(x̂I ) yields an upper bound to MINLP. If (NLP(x̂I )) is
infeasible, the feasibility subproblem (NLPF(x̂I )) is solved and lineariza-
tions about the optimal solution of (NLPF(x̂I )) are added to the reduced
master problem (MP(K)). The algorithm iterates until the lower and upper
bounds are within a specified tolerance ε. Algorithm 2 gives pseudocode
for the method. Theorem 3.1 guarantees that this algorithm cannot cycle
and terminates in a finite number of steps.
Note that the reduced master problem need not be solved to optimal-
ity. In fact, given the upper bound UB and a tolerance ε, it is sufficient
to generate any new (η̂, x̂) that is feasible to (MP(K)), satisfies the in-
tegrality requirements, and for which η̂ ≤ UB − ε. This can usually be
achieved by setting a cutoff value in the MILP software to enforce the con-
straint η̂ ≤ UB − ε. If a cutoff value is not used, then the infeasibility of
(MP(K)) implies the infeasibility of (MINLP). If a cutoff value is used, the
OA iterations are terminated (Step 1 of Algorithm 2) when the OA mas-
ter problem has no feasible solution. OA is implemented in the software
packages DICOPT [60] and Bonmin [24].
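The loop below is a compact Python rendering of Algorithm 2 (our sketch,
not the DICOPT or Bonmin implementation). The toy instance has a convex
objective and no nonlinear constraints, so only objective linearizations of
type (2.1) are generated; the reduced master problem is solved with
scipy.optimize.milp (SciPy 1.9+), and the NLP subproblem has the
closed-form solution x0 = 0.3 for any fixed x1:

    import numpy as np
    from scipy.optimize import milp, Bounds, LinearConstraint

    # Toy convex MINLP: min (x0-0.3)^2 + (x1-1.7)^2,
    # x0 in [0,4] continuous, x1 in [0,4] integer.
    f = lambda x: (x[0] - 0.3)**2 + (x[1] - 1.7)**2
    grad = lambda x: np.array([2*(x[0] - 0.3), 2*(x[1] - 1.7)])

    A, b = [], []     # cuts: grad(xb).x - eta <= grad(xb).xb - f(xb)
    def add_cut(xb):
        d = grad(xb)
        A.append([d[0], d[1], -1.0])
        b.append(d @ xb - f(xb))

    add_cut(np.array([0.3, 1.7]))       # cut at the NLPR(L,U) optimum
    z_U, z_L, x_best, eps = np.inf, -np.inf, None, 1e-6
    while z_U - z_L > eps:
        # Reduced master MP(K) over (x0, x1, eta): minimize eta.
        res = milp(c=[0.0, 0.0, 1.0], integrality=[0, 1, 0],
                   bounds=Bounds([0, 0, -1e9], [4, 4, 1e9]),
                   constraints=LinearConstraint(np.array(A),
                                                ub=np.array(b)))
        z_L = res.fun                   # nondecreasing lower bound
        x1 = round(res.x[1])            # integer part of master solution
        x_sub = np.array([0.3, x1])     # NLP(x_hat_I), closed form here
        if f(x_sub) < z_U:              # upper bound update
            z_U, x_best = f(x_sub), x_sub
        add_cut(x_sub)                  # refine the master and repeat
    print(z_U, x_best)                  # 0.09 at x = [0.3, 2.0]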
3.3. Generalized Benders Decomposition. Benders Decomposi-
tion was introduced by Benders [16] for the problems that are linear in the
“easy” variables, and nonlinear in the “complicating“ variables. Geoffrion
[56] introduced the Generalized Benders Decomposition (GBD) method for
MINLP. The GBD method is very similar to the OA method, differing only
in the definition of the MILP master problem. Specifically, instead of us-
ing linearizations for each nonlinear constraint, GBD uses duality theory to
derive one single constraint that combines the linearizations derived from
all the original problem constraints.
In particular, let x̄ be the optimal solution to (NLP(x̂I )) for a given
integer assignment x̂I and μ ≥ 0 be the corresponding optimal Lagrange
multipliers. The following generalized Benders cut is valid for (MINLP):

η ≥ f (x̄) + (∇I f (x̄) + ∇I g(x̄)μ)T (xI − x̂I ).            (BC(x̂))

Note that x̄I = x̂I , since the integer variables are fixed. In (BC(x̂)), ∇I
refers to the gradients of functions f (or g) with respect to discrete vari-


Algorithm 2 The Outer Approximation algorithm.
0. Initialize.
zU ← +∞. zL ← −∞. x∗ ← NONE. Let x0 be the optimal solution
of (NLPR(LI ,UI )).
K ← {x0 }. Choose a convergence tolerance ε.
1. Terminate?
Is zU − zL < ε or (MP(K)) infeasible? If so, x∗ is ε−optimal.
2. Lower Bound
Let zMP(K) be the optimal value of MP(K) and (η̂, x̂) its optimal
solution.
zL ← zMP(K)
3. NLP Solve
Solve (NLP(x̂I )).
Let xi be the optimal (or minimally infeasible) solution.
4. Upper Bound?
Is xi feasible for (MINLP) and f (xi ) < zU ? If so, x∗ ← xi and
zU ← f (xi ).
5. Refine
K ← K ∪ {xi } and i ← i + 1.
Go to 1.

ables. The inequality (BC(x̂)) is derived by building a surrogate of the


OA constraints using the multipliers μ and simplifying the result using the
Karush-Kuhn-Tucker conditions satisfied by x (which in particular elimi-
nates the continuous variables from the inequality).
If there is no feasible solution to (NLP(x̂I )), a feasibility cut can be
obtained similarly by using the solution x̄ to (NLPF(x̂I )) and corresponding
multipliers λ ≥ 0:

λT [g(x̄) + ∇I g(x̄)T (xI − x̂I )] ≤ 0.                      (FCY(x̂))

In this way, a relaxed master problem similar to (MILP-OA) can be
defined as:

zgbd(KFS,KIS) = min η
    s.t. η ≥ f (x̄) + (∇I f (x̄) + ∇I g(x̄)μ)T (xI − x̄I )    ∀x̄ ∈ KFS,
         λT [g(x̄) + ∇I g(x̄)T (xI − x̄I )] ≤ 0              ∀x̄ ∈ KIS,
                                                            (RM-GBD)
         x ∈ X, xI ∈ Z|I| ,

where KFS is the set of solutions to feasible subproblems (NLP(x̂I )) and
KIS is the set of solutions to infeasible subproblems (NLPF(x̂I )). Conver-
gence results for the GBD method are similar to those for OA.


Theorem 3.2. [56] If X ≠ ∅, f and g are convex, and a constraint
qualification holds for each xk ∈ K, then zminlp = zgbd(KFS,KIS) . The
algorithm terminates in a finite number of steps.
The inequalities used to create the master problem (RM-GBD) are
aggregations of the inequalities used for (MILP-OA). As such, the lower
bound obtained by solving a reduced version of (RM-GBD) (where only
a subset of the constraints is considered) can be significantly weaker than
for (MP(K)). This may explain why there is no available solver that uses
solely the GBD method for solving convex MINLP. Abhishek, Leyffer and
Linderoth [2] suggest using Benders cuts to aggregate inequalities in
an LP/NLP-BB algorithm (see Section 3.5).
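To make the aggregation concrete, the sketch below (ours; the numbers
stand in for one hypothetical solved subproblem) assembles the coefficients
of a generalized Benders cut (BC(x̂)) from the subproblem optimum and
its multipliers. Complementarity, μT g(x̄) = 0, is the reason no constraint-
value term appears:

    import numpy as np

    def benders_cut(grad_f_I, jac_g_I, mu, x_hat_I, f_val):
        # Cut (BC(xhat)): eta >= f(xbar)
        #                     + (grad_I f + grad_I g mu)^T (x_I - xhat_I),
        # returned as (w, c) with eta >= c + w @ x_I.
        w = grad_f_I + jac_g_I @ mu     # multiplier-aggregated slope
        c = f_val - w @ x_hat_I
        return w, c

    # Hypothetical data: one integer variable, one constraint.
    w, c = benders_cut(grad_f_I=np.array([1.0]),
                       jac_g_I=np.array([[2.0]]),
                       mu=np.array([0.5]),
                       x_hat_I=np.array([3.0]), f_val=7.0)
    print(w, c)        # w = [2.], c = 1.0: the cut eta >= 1 + 2 x_I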
3.4. Extended Cutting Plane. Westerlund and Pettersson [111]
proposed the Extended Cutting Plane (ECP) method for convex MINLPs,
which is an extension of Kelley’s cutting plane method [70] for solving
convex NLPs. The ECP method was further extended to handle pseudo-
convex function in the constraints [109] and in the objective [112] in the
α-ECP method. Since this is beyond our definition of (MINLP), we give
only here a description of the ECP method when all functions are convex.
The reader is invited to refer to [110] for an up-to-date description of this
enhanced method. The main feature of the ECP method is that it does not
require the use of an NLP solver. The algorithm is based on the iterative
solution of a reduced master problem (RM-ECP(K)). Linearizations of the
most violated constraint at the optimal solution of (RM-ECP(K)) are added
at every iteration. The MILP reduced master problem (RM-ECP(K)) is
defined as:

zecp(K) = min η
    s.t. η ≥ f (x̄) + ∇f (x̄)T (x − x̄)          x̄ ∈ K        (RM-ECP(K))
         gj (x̄) + ∇gj (x̄)T (x − x̄) ≤ 0        x̄ ∈ K, j ∈ J(x̄)
         x ∈ X, xI ∈ Z|I|

where J(x̄) := {j ∈ arg maxj∈J gj (x̄)} is the index set of most violated
constraints for each solution x̄ ∈ K, the set of solutions to (RM-ECP(K)).
It is also possible to add linearizations of all violated constraints to (RM-
ECP(K)). In that case, J(x̄) = {j | gj (x̄) > 0}. Algorithm 3 gives the
pseudocode for the ECP algorithm.
The optimal values zecp(K) of (RM-ECP(K)) generate a non-
decreasing sequence of lower bounds. Finite convergence of the algorithm
is achieved when the maximum constraint violation is smaller than a spec-
ified tolerance ε. Theorem 3.3 states that the sequence of objective values
obtained from the solutions to (RM-ECP(K)) converge to the optimal so-
lution value.
Theorem 3.3. [111] If X ≠ ∅ is compact and f and g are convex and
continuously differentiable, then zecp(K) converges to zminlp .


The ECP method may require a large number of iterations, since the
linearizations added at Step 3 are not coming from solutions to NLP sub-
problems. Convergence can often be accelerated by solving NLP subprob-
lems (NLP(x̂I )) and adding the corresponding linearizations, as in the OA
method. The Extended Cutting Plane algorithm is implemented in the
α-ECP software [110].

Algorithm 3 The Extended Cutting Plane algorithm
0. Initialize.
Choose a convergence tolerance ε. K ← ∅.
1. Lower Bound
Let (η̄ i , x̄i ) be the optimal solution to (RM-ECP(K)).
2. Terminate?
Is gj (x̄i ) < ε ∀j ∈ J and f (x̄i ) − η̄ i < ε? If so, x̄i is optimal with
ε−feasibility.
3. Refine
K ← K ∪ {x̄i }, t ∈ arg maxj gj (x̄i ), and J(x̄i ) = {t}.
i ← i + 1. Go to 1.
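A minimal Python rendering of the ECP loop (our sketch, not α-ECP; the
master problems are solved with scipy.optimize.milp, and the toy instance
has a single nonlinear constraint, so the most violated constraint is the
only one) looks as follows:

    import numpy as np
    from scipy.optimize import milp, Bounds, LinearConstraint

    # Toy: min -x0 - x1 s.t. g(x) = x0^2 + x1^2 - 4 <= 0,
    # x in [0,4]^2, x1 integer; the objective is already linear.
    g = lambda x: x[0]**2 + x[1]**2 - 4.0
    grad_g = lambda x: np.array([2*x[0], 2*x[1]])

    A, b, eps = [], [], 1e-6
    x = np.array([4.0, 4.0])            # any initial master solution
    while g(x) > eps:                   # not yet eps-feasible
        d = grad_g(x)                   # linearize g at the current point
        A.append(d); b.append(d @ x - g(x))
        res = milp(c=[-1.0, -1.0], integrality=[0, 1],
                   bounds=Bounds([0, 0], [4, 4]),
                   constraints=LinearConstraint(np.array(A),
                                                ub=np.array(b)))
        x = res.x                       # solve RM-ECP(K) and repeat
    print(x, -res.fun)                  # tends to x = (sqrt(3), 1)

Note that no NLP solver is ever called, in line with the main feature of
the method; the price is the potentially large number of MILP iterations
discussed above.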

3.5. LP/NLP-Based Branch-and-Bound. The LP/NLP-Based
Branch-and-Bound algorithm (LP/NLP-BB) was first proposed by Que-
sada and Grossmann [96]. The method is an extension of the OA method
outlined in Section 3.2, but instead of solving a sequence of master prob-
lems (MP(K)), the master problem is dynamically updated in a single
branch-and-bound tree that closely resembles the branch-and-cut method
for MILP.
We denote by LP(K, lIi , uiI ) the LP relaxation of (MP(K)) obtained
by dropping the integrality requirements and setting the lower and upper
bounds on the xI variables to lI and uI respectively. The LP/NLP-BB
method starts by solving the NLP relaxation (NLPR(LI ,UI )), and sets up
the reduced master problem (MP(K)). A branch-and-bound enumeration
is then started for (MP(K)) using its LP relaxation. The branch-and-
bound enumeration generates linear programs LP(K, l^i_I, u^i_I) at each node
N^i = (l^i_I, u^i_I) of the tree. Whenever an integer solution is found at a
node, the standard branch and bound is interrupted and (NLP(x̂^i_I)) (and
(NLPF(x̂^i_I)) if NLP(x̂^i_I) is infeasible) is solved by fixing the integer variables
to their solution values at that node. The linearizations from the solution of
(NLP(x̂^i_I)) are then used to update the reduced master problem (MP(K)).
The branch-and-bound tree is then continued with the updated reduced
master problem. The main advantage of LP/NLP-BB over OA is that
the need to restart the tree search is avoided and only a single tree is
required. Algorithm 4 gives the pseudo-code for LP/NLP-BB.
Adding linearizations dynamically to the reduced master problem
(MP(K)) is a key feature of LP/NLP-BB. Note, however, that the same
idea could potentially be applied to both the GBD and ECP methods. The
LP/NLP-BB method commonly significantly reduces the total number of
nodes to be enumerated when compared to the OA method. However,
the trade-off is that the number of NLP subproblems might increase. As
part of his Ph.D. thesis, Leyffer implemented the LP/NLP-BB method
and reported substantial computational savings [76]. The LP/NLP-Based
Branch-and-Bound algorithm is implemented in solvers Bonmin [24] and
FilMINT [2].

Algorithm 4 The LP/NLP-Based Branch-and-Bound algorithm.


0. Initialize.
L ← {(LI , UI )}. zU ← +∞. x∗ ← NONE.
Let x be the optimal solution of (NLPR(L_I, U_I)).
K ← {x}.
1. Terminate?
Is L = ∅? If so, the solution x∗ is optimal.
2. Select.
Choose and delete a problem N^i = (l^i_I, u^i_I) from L.
3. Evaluate.
Solve LP(K, l^i_I, u^i_I). If the problem is infeasible, go to step 1, else
let z_MPR(K,l^i_I,u^i_I) be its optimal objective function value and (η̂^i, x̂^i)
be its optimal solution.
4. Prune.
If z_MPR(K,l^i_I,u^i_I) ≥ z_U, go to step 1.
5. NLP Solve?
Is x̂^i_I integer? If so, solve (NLP(x̂^i_I)); otherwise go to step 8.
Let x^i be the optimal (or minimally infeasible) solution.
6. Upper bound?
Is x^i feasible for (MINLP) and f(x^i) < z_U? If so, x^* ← x^i, z_U ←
f(x^i).
7. Refine.
Let K ← K ∪ {x^i}. Go to step 3.
8. Divide.
Divide the feasible region of N^i into a number of smaller feasible
subregions, creating nodes N^{i1}, N^{i2}, . . . , N^{ik}. For each j =
1, 2, . . . , k, let z^{ij}_L ← z_MPR(K,l^i_I,u^i_I) and add the problem N^{ij} to L.
Go to step 1.
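
In modern MILP frameworks, the interruption at integral nodes in Algorithm 4
is usually realized through a callback mechanism, with the linearizations
added as so-called lazy constraints. The following Python-style skeleton is
only a sketch of that pattern; model, solve_nlp_fixed, oa_cuts, and the
callback signature are hypothetical names, not any specific solver's API.

    def lazy_callback(model, where):
        if where == "integer_solution":        # x̂_I is integral at this node
            x_hat = model.get_solution()
            x_nlp, feasible = solve_nlp_fixed(x_hat)   # solve NLP(x̂_I)
            if feasible and model.objective(x_nlp) < model.incumbent_value:
                model.set_incumbent(x_nlp)             # update upper bound z_U
            for cut in oa_cuts(x_nlp):                 # linearizations at x_nlp
                model.add_lazy_constraint(cut)         # refine (MP(K)) in-tree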

4. Implementation techniques for convex MINLP. Seasoned al-
gorithmic developers know that proper engineering and implementation can
have a large positive impact on the final performance of software. In this
section, we present techniques that have proven useful in efficiently imple-
menting the convex MINLP algorithms of Section 3.


The algorithms for solving MINLP we presented share a great deal in
common with algorithms for solving MILP. NLP-BB is similar to a branch
and bound for MILP, simply solving a different relaxation at each node.
The LP/NLP-BB algorithm can be viewed as a branch-and-cut algorithm,
similar to those employed to solve MILP, where the refining linearizations
are an additional class of cuts used to approximate the feasible region. An
MILP solver is used as a subproblem solver in the iterative algorithms (OA,
GBD, ECP). In practice, all the methods spend most of their computing
time doing variants of the branch-and-bound algorithm. As such, it stands
to reason that advances in techniques for the implementation of branch
and bound for MILP should be applicable and have a positive impact for
solving MINLP. The reader is referred to the recent survey paper [84] for
details about modern enhancements in MILP software.
First we discuss improvements to the Refine step of LP/NLP-BB,
which may also be applicable to the GBD or ECP methods. We then pro-
ceed to the discussion of the Select and Divide steps which are important
in any branch-and-bound implementation. The section contains an intro-
duction to classes of cutting planes that may be useful for MINLP and
reviews recent developments in heuristics for MINLP.
We note that in the case of iterative methods OA, GBD and ECP,
some of these aspects are automatically taken care of by using a “black-
box” MILP solver to solve (MP(K)) as a component of the algorithm. In
the case of NLP-BB and LP/NLP-BB, one has to more carefully take these
aspects into account, in particular if one wants to be competitive in practice
with methods employing MILP solvers as components.

4.1. Linearization generation. In the OA Algorithm 2, the ECP
Algorithm 3, or the LP/NLP-BB Algorithm 4, a key step is to Refine the
approximation of the nonlinear feasible region by adding linearizations of
the objective and constraint functions (2.1) and (2.2). For convex MINLPs,
linearizations may be generated at any point and still give a valid outer
approximation of the feasible region, so for all of these algorithms, there is
a mechanism for enhancing them by adding many linear inequalities. The
situation is similar to the case of a branch-and-cut solver for MILP, where
cutting planes such as Gomory cuts [59], mixed-integer-rounding cuts [85],
and disjunctive (lift and project) cuts [9] can be added to approximate the
convex hull of integer solutions, but care must be taken in a proper imple-
mentation to not overwhelm the software used for solving the relaxations
by adding too many cuts. Thus, key to an effective refinement strategy in
many algorithms for convex MINLP is a policy for deciding when inequal-
ities should be added and removed from the master problem and at which
points the linearizations should be taken.
Cut Addition: In the branch-and-cut algorithm for solving MILP,
there is a fundamental implementation choice that must be made when
confronted with an infeasible (fractional) solution: should the solution be
eliminated by cutting or branching? Based on standard ideas employed
for answering this question in the context of MILP, we offer three rules-of-
thumb that are likely to be effective in the context of linearization-based
algorithms for solving MINLP. First, linearizations should be generated
early in the procedure, especially at the very top of the branch-and-bound
tree. Second, the incremental effectiveness of adding additional lineariza-
tions should be measured in terms of the improvement in the lower bound
obtained. When the rate of lower bound change becomes too low, the
refinement process should be stopped and the feasible region divided in-
stead. Finally, care must be taken to not overwhelm the solver used for the
relaxations of the master problem with too many linearizations.
Cut Removal: One simple strategy for limiting the number of linear
inequalities in the continuous relaxation of the master problem (MP(K)) is
to only add inequalities that are violated by the current solution to the lin-
ear program. Another simple strategy for controlling the size of (MP(K))
is to remove inactive constraints from the formulation. One technique is
to monitor the dual variable for the row associated with the linearization.
If the value of the dual variable is zero, implying that removal of the in-
equality would not change the optimal solution value, for many consecutive
solutions, then the linearization is a good candidate to be removed from
the master problem. To avoid cycling, the removed cuts are usually stored
in a pool. Whenever a cut of the pool is found to be violated by the current
solution it is put back into the formulation.
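
A minimal Python sketch of such a removal policy is given below; the row
identifiers, the duals mapping, and the lp and cut interfaces are assumptions
used only for illustration.

    class CutPool:
        def __init__(self, max_idle=10):
            self.active, self.pool = {}, []
            self.idle = {}                 # consecutive zero-dual counts
            self.max_idle = max_idle

        def after_lp_solve(self, duals, lp):
            for cid, cut in list(self.active.items()):
                self.idle[cid] = self.idle.get(cid, 0) + 1 if duals[cid] == 0 else 0
                if self.idle[cid] >= self.max_idle:
                    lp.remove_row(cid)     # inactive for many solves: drop it
                    self.pool.append(cut)  # keep in the pool to avoid cycling
                    del self.active[cid]

        def separate(self, x, lp):
            for cut in list(self.pool):
                if cut.violation(x) > 0:   # violated pool cut: put it back
                    self.pool.remove(cut)
                    self.active[lp.add_row(cut)] = cut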
Linearization Point Selection. A fundamental question in any
linearization-based algorithm (like OA, ECP, or LP/NLP-BB) is at which
points should the linearizations be taken. Each algorithm specifies a mini-
mal set of points at which linearizations must be taken in order to ensure
convergence to the optimal solution. However, the algorithm performance
may be improved by additional linearizations. Abhishek, Leyffer, and Lin-
deroth [2] offer three suggestions for choosing points about which to take
linearizations.
The first method simply linearizes the functions f and g about the
fractional point x̂ obtained as a solution to an LP relaxation of the master
problem. This method does not require the solution of an additional (non-
linear) subproblem, merely the evaluation of the gradients of objective and
constraint functions at the (already specified) point. (The reader will note
the similarity to the ECP method).
A second alternative is to obtain linearizations about a point that is
feasible with respect to the nonlinear constraints. Specifically, given a (pos-
sibly fractional) solution x̂, the nonlinear program (NLP(x̂I )) is solved to
obtain the point about which to linearize. This method has the advan-
tage of generating linearizations about points that are closer to the feasible
region than the previous method, at the expense of solving the nonlinear
program (NLP(x̂I )).


In the third point-selection method, no variables are fixed (save those
that are fixed by the nodal subproblem), and the NLP relaxation
(NLPR(l_I, u_I)) is solved to obtain a point about which to generate
linearizations. These linearizations are likely to improve the lower bound
by the largest amount when added to the master problem, since the bound
obtained after adding the inequalities is equal to z_NLPR(l_I,u_I), but it can
be time-consuming to compute the linearizations.
These three classes of linearizations span the trade-off spectrum of
time required to generate the linearization versus the quality/strength of
the resulting linearization. There are obviously additional methodologies
that may be employed, giving the algorithm developer significant freedom
to engineer linearization-based methods.
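
Whichever of the three point-selection rules is used, the linearization itself
is the same computation; a small sketch follows, with the callables g_j and
grad_g_j assumed given (e.g., by automatic differentiation).

    import numpy as np

    def oa_cut(g_j, grad_g_j, x_bar):
        # g_j(x_bar) + grad^T (x - x_bar) <= 0, rewritten as a^T x <= b
        a = np.asarray(grad_g_j(x_bar), dtype=float)
        b = float(a @ np.asarray(x_bar, dtype=float)) - float(g_j(x_bar))
        return a, b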
4.2. Branching rules. We now turn to the discussion of how to split
a subproblem in the Divide step of the algorithms. As explained in Section
2.1, we consider here only branching by dichotomy on the variables. Sup-
pose that we are at node N i of the branch-and-bound tree with current
solution x̂^i. The goal is to select an integer-constrained variable x_j, j ∈ I,
that is not currently integer feasible (x̂^i_j ∉ Z) and to create two subproblems
by imposing the constraints x_j ≤ ⌊x̂^i_j⌋ (branching down) and x_j ≥ ⌈x̂^i_j⌉
(branching up), respectively. Ideally, one would want to select the variable
that leads to the smallest enumeration tree. This of course cannot be
done exactly, since the variable that leads to the smallest subtree
cannot be known a priori.
A common heuristic for choosing the branching variable is to
try to estimate how much one can improve the lower bound by branching
on each variable. Because a node of the branch-and-bound tree is fathomed
whenever the lower bound for the node is above the current upper bound,
one should want to increase the lower bound as much as possible. Suppose
that, for each variable x_j, we have estimates D^i_{j−} and D^i_{j+} of the increase
in the lower bound value obtained by branching down and up, respectively.
A reasonable choice would be to select the variable for which both D^i_{j−} and
D^i_{j+} are large. Usually, D^i_{j−} and D^i_{j+} are combined in order to compute
a score for each variable, and the variable with the highest score is selected. A
common formula for computing this score is:

    μ min(D^i_{j−}, D^i_{j+}) + (1 − μ) max(D^i_{j−}, D^i_{j+})

(where μ ∈ [0, 1] is a prescribed parameter, typically larger than 1/2).


As for the evaluation or estimation of D^i_{j−} and D^i_{j+}, two main methods
have been proposed: pseudo-costs [17] and strong-branching [66, 6]. Next,
we will present these two methods and how they can be combined.
4.2.1. Strong-branching. Strong-branching consists in computing
the values D^i_{j−} and D^i_{j+} by performing the branching on variable x_j and
solving the two associated subproblems. For each variable x_j currently
fractional in x̂^i, we solve the two subproblems N^i_{j−} and N^i_{j+} obtained by
branching down and up, respectively, on variable j. Because N^i_{j−} and/or
N^i_{j+} may be proven infeasible, different decisions may be taken depending
on their status.
• If both subproblems are infeasible: the node N^i is infeasible and
is fathomed.
• If one of the subproblems is infeasible: the bound on variable x_j can
be strengthened. Usually after the bound is modified, the node is
reprocessed from the beginning (going back to the Evaluate step).
• If both subproblems are feasible, their values are used to compute
D^i_{j−} and D^i_{j+}.
Strong-branching can very significantly reduce the number of nodes
in a branch-and-bound tree, but is often slow overall due to the added
computing cost of solving two subproblems for each fractional variable.
To reduce the computational cost of strong-branching, it is often efficient
to solve the subproblems only approximately. If the relaxation at hand
is an LP (for instance in LP/NLP-BB) it can be done by limiting the
number of dual simplex iterations when solving the subproblems. If the
relaxation at hand is an NLP, it can be done by solving an approximation
of the subproblem. Two possible approximations that have been recently
suggested [23, 106, 80] are the LP relaxation obtained by constructing an
Outer Approximation or the Quadratic Programming approximation given
by the last Quadratic Programming sub-problem in a Sequential Quadratic
Programming (SQP) solver for nonlinear programming (for background on
SQP solvers see [94]).
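
The following Python sketch summarizes the three outcomes above; node.lb
(the node lower bound z^i_L) and the helper solve_child, which may solve the
child relaxations only approximately as just discussed, are assumptions of
the sketch.

    def strong_branching(node, frac_vars, solve_child):
        D = {}
        for j in frac_vars:
            z_dn, ok_dn = solve_child(node, j, "down")  # N^i_{j-}
            z_up, ok_up = solve_child(node, j, "up")    # N^i_{j+}
            if not ok_dn and not ok_up:
                return "fathom", None     # both infeasible: N^i is fathomed
            if ok_dn != ok_up:
                return "tighten", j       # one infeasible: strengthen bound
            D[j] = (z_dn - node.lb, z_up - node.lb)   # (D^i_{j-}, D^i_{j+})
        return "branch", D                # pick the j maximizing the score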
4.2.2. Pseudo-costs. The pseudo-costs method consists in keeping
the history of the effect of branching on each variable and utilizing this
historical information to select good branching variables. For each variable
xj , we keep track of the number of times the variable has been branched on
(τj ) and the total per-unit degradation of the objective value by branching
down and up, respectively, Pj− and Pj+ . Each time variable j is branched
on, Pj− and Pj+ are updated by taking into account the change of bound
at that node:
    P_{j−} = (z^{i−}_L − z^i_L)/f^i_j + P_{j−},   and   P_{j+} = (z^{i+}_L − z^i_L)/(1 − f^i_j) + P_{j+},

where x_j is the branching variable, N^i_− and N^i_+ denote the nodes from
the down and up branch, z^i_L (resp. z^{i−}_L and z^{i+}_L) denote the lower bounds
computed at node N^i (resp. N^i_− and N^i_+), and f^i_j = x̂^i_j − ⌊x̂^i_j⌋ denotes
the fractional part of x̂^i_j. Whenever a branching decision has to be made,
estimates of D^i_{j−}, D^i_{j+} are computed by multiplying the average of observed
degradations with the current fractionality:

    D^i_{j−} = f^i_j P_{j−}/τ_j,   and   D^i_{j+} = (1 − f^i_j) P_{j+}/τ_j.
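
In code, the bookkeeping amounts to a few counters per variable; a sketch
follows (the default mu = 0.75 is only an illustrative choice consistent with
μ > 1/2).

    class PseudoCosts:
        def __init__(self, n, mu=0.75):
            self.P_minus = [0.0] * n      # summed per-unit down degradations
            self.P_plus = [0.0] * n       # summed per-unit up degradations
            self.tau = [0] * n            # times variable j was branched on
            self.mu = mu

        def update(self, j, f_j, z, z_down, z_up):
            # f_j: fractional part of x̂_j at the node with lower bound z;
            # skip the update if a child was infeasible (a common convention,
            # discussed below).
            self.P_minus[j] += (z_down - z) / f_j
            self.P_plus[j] += (z_up - z) / (1.0 - f_j)
            self.tau[j] += 1

        def score(self, j, f_j):
            D_minus = f_j * self.P_minus[j] / max(self.tau[j], 1)
            D_plus = (1.0 - f_j) * self.P_plus[j] / max(self.tau[j], 1)
            return (self.mu * min(D_minus, D_plus)
                    + (1.0 - self.mu) * max(D_minus, D_plus))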


Note that contrary to strong-branching, pseudo-costs require very little
computation since the two values P_{j−} and P_{j+} are only updated once
the values z^{i−}_L and z^{i+}_L have been computed (by the normal process of
branch and bound). Thus pseudo-costs have a negligible computational
cost. Furthermore, statistical experiments have shown that pseudo-costs
often provide reasonable estimates of the objective degradations caused by
branching [83] when solving MILPs.
Two difficulties arise with pseudo-costs. The first one is how to update
the historical data when a node is infeasible. This matter is not settled.
Typically, the pseudo-costs update is simply ignored if a node is infeasible.
The second question is how the estimates should be initialized. For
this, it seems that the agreed-upon state of the art is to combine pseudo-
costs with strong-branching, a method that may address each of the two
methods' drawbacks: strong-branching is too slow to be performed at
every node of the tree, and pseudo-costs need to be initialized. The idea
is to use strong-branching at the beginning of the tree search, and once all
pseudo-costs have been initialized, to revert to using pseudo-costs. Several
variants of this scheme have been proposed. A popular one is reliability
branching [4]. This rule depends on a reliability parameter κ (usually a
natural number between 1 and 8): pseudo-costs are trusted for a particular
variable only after strong-branching has been performed κ times on this
variable.
Finally, we note that while we have restricted ourselves in this dis-
cussion to dichotomy branching, one can branch in many different ways.
Most state-of-the-art solvers allow branching on SOS constraints [14]. More
generally, one could branch on split disjunctions of the form (π^T x_I ≤
π_0) ∨ (π^T x_I ≥ π_0 + 1) (where (π, π_0) ∈ Z^{n+1}). Although promising results
have been obtained in the context of MILP [69, 37], as far as we know, these
methods have not been used yet in the context of MINLPs. Finally, meth-
ods have been proposed to branch efficiently in the presence of symmetries
[86, 95]. Again, although they would certainly be useful, these methods
have not yet been adapted into software for solving MINLPs, though some
preliminary work is being done in this direction [81].

4.3. Node selection rules. The other important strategic decision
left unspecified in Algorithms 1 and 4 is which node to choose in the Select
step. Here two goals need to be considered: decreasing the global upper
bound z_U by finding good feasible solutions, and proving the optimality of
the current incumbent x∗ by increasing the lower bound as fast as possible.
Two classical node selection strategies are depth-first-search and best-first
(or best-bound). As its name suggests, depth-first search selects at each
iteration the deepest node of the enumeration tree (or the last node put
in L). Best-first follows the opposite strategy of picking the open node N^i
with the smallest z^i_L (the best lower bound).


Both these strategies have their inherent strengths and weaknesses.
Depth-first has the advantage of keeping the size of the list of open nodes
as small as possible. Furthermore, the changes from one subproblem to
the next are minimal, which can be very advantageous for subproblem
solvers that can effectively exploit “warm-start” information. Also, depth-
first search is usually able to find feasible solutions early in the tree search.
On the other hand, depth-first can exhibit extremely poor performance
if no good upper bound is known or found: it may explore many nodes
with lower bound higher than the actual optimal solution. Best-bound has
the opposite strengths and weaknesses. Its strength is that, for a fixed
branching, it minimizes the number of nodes explored (because all nodes
explored by it would be explored independently of the upper bound). Its
weaknesses are that it may require significant memory to store the list L
of active nodes, and that it usually does not find integer feasible solutions
before the end of the search. This last property may not be a shortcoming
if the goal is to prove optimality but, as many applications are too large
to be solved to optimality, it is particularly undesirable that a solver based
only on best-first aborts after several hours of computing time without
producing one feasible solution.
It seems natural, then, that good strategies combine both
best-first and depth-first search. Two main approaches are two-phase methods
[54, 13, 44, 83] and diving methods [83, 22].
Two-phase methods start by doing depth-first to find one (or a small
number of) feasible solution. The algorithm then switches to best-first in
order to try to prove optimality (if the tree grows very large, the method
may switch back to depth-first to try to keep the size of the list of active
nodes under control).
Diving methods are also two-phase methods in a sense. The first phase,
called diving, performs depth-first search until a leaf of the tree (either an
integer feasible or an infeasible one) is found. When a leaf is found, the next
node is selected by backtracking in the tree, for example to the node with the
best lower bound, and another dive is performed from that node. The search
continues by alternating diving and backtracking.
Many variants of these two methods have been proposed in the context
of solving MILP. Sometimes, they are combined with estimations of the
quality of integer feasible solutions that may be found in a subtree com-
puted using pseudo-costs (see for example [83]). Computationally, it is not
clear which of these variants performs better. A variant of diving called
probed diving that performs reasonably well was described by Bixby and
Rothberg [22]. Instead of conducting a pure depth-first search in the diving
phase, the probed diving method explores both children of the last node,
continuing the dive from the best one of the two (in terms of bounds).
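
The strategies above differ only in how the next node is drawn from the list
L of open nodes; a compact Python sketch follows, where node.lb (the node
lower bound) is the only assumed attribute.

    class OpenNodes:
        def __init__(self):
            self.nodes = []               # the list L of open nodes

        def push(self, node):
            self.nodes.append(node)

        def select(self, diving):
            # depth-first while diving, best-bound when backtracking
            node = (self.nodes[-1] if diving
                    else min(self.nodes, key=lambda n: n.lb))
            self.nodes.remove(node)
            return node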

4.4. Cutting planes. Adding inequalities to the formulation so that
its relaxation will more closely approximate the convex hull of integer fea-
sible solutions was a major reason for the vast improvement in MILP so-
lution technology [22]. To our knowledge, very few, if any, MINLP solvers
add inequalities that are specific to the nonlinear structure of the problem.
Nevertheless, a number of cutting plane techniques that could be imple-
mented have been developed in the literature. Here we outline a few of
these techniques. Most of them have been adapted from known methods in
the MILP case. We refer the reader to [36] for a recent survey on cutting
planes for MILP.
4.4.1. Gomory cuts. The earliest cutting planes for Mixed Integer
Linear Programs were Gomory Cuts [58, 59]. For simplicity of exposition,
we assume a pure Integer Linear Program (ILP): I = {1, . . . , n}, with
linear constraints given in matrix form as Ax ≤ b and x ≥ 0. The idea
underlying the inequalities is to choose a set of non-negative multipliers
u ∈ R^m_+ and form the surrogate constraint u^T Ax ≤ u^T b. Since x ≥ 0, the
inequality Σ_{j∈N} ⌊u^T a_j⌋ x_j ≤ u^T b is valid, and since each ⌊u^T a_j⌋ x_j is an
integer, the right-hand side may also be rounded down to form the Gomory cut
Σ_{j∈N} ⌊u^T a_j⌋ x_j ≤ ⌊u^T b⌋. This simple procedure suffices to generate all
valid inequalities for an ILP [35]. Gomory cuts can be generalized to Mixed
Integer Gomory (MIG) cuts which are valid for MILPs. After a period of
not being used in practice to solve MILPs, Gomory cuts made a resurgence
following the work of Balas et al. [10], which demonstrated that when used
in combination with branch and bound, MIG cuts were quite effective in
practice.
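
A tiny worked example of the rounding step, on made-up data (the set
{x ∈ Z^2_+ : 3x_1 + x_2 ≤ 4, x_1 + 2x_2 ≤ 3}):

    from math import floor

    A = [[3.0, 1.0], [1.0, 2.0]]
    b = [4.0, 3.0]
    u = [0.5, 0.5]                               # nonnegative multipliers

    agg = [sum(u[i] * A[i][j] for i in range(2)) for j in range(2)]  # u^T A = (2, 1.5)
    rhs = sum(u[i] * b[i] for i in range(2))                         # u^T b = 3.5
    print([floor(c) for c in agg], floor(rhs))   # [2, 1] 3: the cut 2x1 + x2 <= 3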
For MINLP, Cezik and Iyengar [34] demonstrate that if the nonlinear
constraint set gj (x) ≤ 0 ∀j ∈ J can be described using conic constraints
Ax ⪰_K b, then the Gomory procedure is still applicable. Here K is a
homogeneous, self-dual, proper, convex cone, and the notation x ⪰_K y de-
notes that (x − y) ∈ K. Each cone K has a dual cone K* with the property
that K* def= {u | u^T z ≥ 0 ∀z ∈ K}. The extension of the Gomory proce-
dure to the case of conic integer programming is clear from the following
equivalence:

    Ax ⪰_K b  ⇔  u^T Ax ≥ u^T b  ∀u ⪰_{K*} 0.
Specifically, elements from the dual cone u ∈ K∗ can be used to perform
the aggregation, and the regular Gomory procedure applied. To the au-
thors’ knowledge, no current MINLP software employs conic Gomory cuts.
However, most solvers generate Gomory cuts from the existing linear in-
equalities in the model. Further, as pointed out by Akrotirianakis, Maros,
and Rustem [5], Gomory cuts may be generated from the linearizations
(2.1) and (2.2) used in the OA, ECP, or LP/NLP-BB methods. Most
linearization-based software will by default generate Gomory cuts on these
linearizations.
4.4.2. Mixed integer rounding. Consider the simple two-variable
set X = {(x1 , x2 ) ∈ Z × R+ | x1 ≤ b + x2 }. It is easy to see that the
mixed integer rounding inequality x1 ≤ ⌊b⌋ + x2/(1 − f), where f = b − ⌊b⌋
represents the fractional part of b, is a valid inequality for X. Studying the
convex hull of this simple set and some related counterparts has generated
rich classes of inequalities that may significantly improve the ability to
solve MILPs [85]. Key to generating useful inequalities for computation
is to combine rows of the problem in a clever manner and to use variable
substitution techniques.
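
The inequality is easy to verify numerically; a short check on the made-up
value b = 2.3, testing for each x1 the smallest feasible x2 (the binding case,
since the right-hand side increases with x2):

    from math import floor

    b = 2.3
    f = b - floor(b)                        # f = 0.3
    for x1 in range(0, 6):
        x2 = max(0.0, x1 - b)               # smallest x2 with x1 <= b + x2
        assert x1 <= floor(b) + x2 / (1 - f) + 1e-9   # MIR inequality holds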
Atamtürk and Narayanan [7] have extended the concept of mixed integer
rounding to the case of Mixed Integer Second-Order Cone Programming
(MISOCP). For the conic mixed integer set
  
T = (x1 , x2 , x3 ) ∈ Z × R2 | (x1 − b)2 + x22 ≤ x3

the following simple conic mixed integer rounding inequality



[(1 − 2f )(x1 − b) + f ]2 + x22 ≤ x3

helps to describe the convex hull of T . They go on to show that employing


these inequalities in a cut-and-branch procedure for solving MISOCPs is
significantly beneficial. To the authors’ knowledge, no available software
employs this technology, so this may be a fruitful line of computational
research.

4.4.3. Disjunctive inequalities. Stubbs and Mehrotra [101], build-
ing on the earlier seminal work of Balas [8] on disjunctive programming
and its application to MILP (via lift and project cuts) of Balas, Ceria and
Cornuéjols [9], derive a lift and project cutting plane framework for convex
(0-1) MINLPs. Consider the feasible region of the continuous relaxation
of (MINLP-1) R = {(x, η) | f (x) ≤ η, gj (x) ≤ 0 ∀j ∈ J, x ∈ X}. The
procedure begins by choosing a (branching) dichotomy xi = 0 ∨ xi = 1
for some i ∈ I. The convex hull of the union of the two (convex) sets
R_i^− def= {(x, η) ∈ R | x_i = 0} and R_i^+ def= {(x, η) ∈ R | x_i = 1} can be repre-
sented in a space of dimension 3n + 5 as
    M_i(R) = { (x, η, x^−, η^−, x^+, η^+, λ^−, λ^+) |
               x = λ^− x^− + λ^+ x^+,
               η = λ^− η^− + λ^+ η^+,
               λ^− + λ^+ = 1,  λ^− ≥ 0,  λ^+ ≥ 0,
               (x^−, η^−) ∈ R_i^−,  (x^+, η^+) ∈ R_i^+ }.

One possible complication with the convex hull description Mi (R) is


caused by the nonlinear, nonconvex relationships x = λ^− x^− + λ^+ x^+ and
η = λ^− η^− + λ^+ η^+. However, this description can be transformed to an
equivalent description M̃i (R) with only convex functional relationships be-
tween variables using the perspective function [101, 65].


Given some solution (x̄, η̄) ∉ conv(R_i^− ∪ R_i^+), the lift-and-project
procedure operates by solving the convex separation problem

    min { d(x, η) | (x, η, x̃^−, η̃^−, x̃^+, η̃^+, λ^−, λ^+) ∈ M̃_i(R) }        (4.1)

(where d(x, η) is the distance to the point (x̄, η̄) in any norm). The lift-
and-project inequality

    ξ_x^T (x − x̄) + ξ_η (η − η̄) ≥ 0                                         (4.2)

separates (x̄, η̄) from conv(R_i^− ∪ R_i^+), where ξ = (ξ_x, ξ_η) is a subgradient
of d(x, η) at the optimal solution of (4.1).
An implementation and evaluation of some of these ideas in the context
of MISOCP has been done by Drewes [41]. Cezik and Iyengar [34] had
also stated that the application of disjunctive cuts to conic-IP should be
possible.
A limitation of disjunctive inequalities is that in order to generate a
valid cut, one must solve an auxiliary (separation) problem that is three
times the size of the original relaxation. In the case of MILP, clever intu-
ition of Balas and Perregaard [11] allows one to solve this separation problem
while staying in the original space. No such extension is known in the case
of MINLP. Zhu and Kuno [114] have suggested replacing the true nonlin-
ear convex hull by a linear approximation taken about the solution to a
linearized master problem like MP(K).
Kılınç et al. [71] have recently made the observation that a weaker
form of the lift and project inequality (4.2) can be obtained from branching
dichotomy information. Specifically, given the values η̂_i^− = min{η | (x, η) ∈ R_i^−}
and η̂_i^+ = min{η | (x, η) ∈ R_i^+}, the strong-branching cut

    η ≥ η̂_i^− + (η̂_i^+ − η̂_i^−) x_i

is valid for MINLP, and is a special case of (4.2). Note that if strong-
branching is used to determine the branching variable, then the values η̂_i^−
and η̂_i^+ are produced as a byproduct.
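To see why the cut is valid, evaluate it on the two sides of the dichotomy:
at x_i = 0 it reads η ≥ η̂_i^−, and at x_i = 1 it reads η ≥ η̂_i^− + (η̂_i^+ − η̂_i^−) = η̂_i^+.
By the definition of η̂_i^− and η̂_i^+, every point of R_i^− ∪ R_i^+ satisfies the
corresponding bound, and validity on conv(R_i^− ∪ R_i^+) follows because the
cut is linear.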
4.5. Heuristics. Here we discuss heuristic methods that are aimed at
finding integer feasible solutions to MINLP with no guarantee of optimality
or success. Heuristics are usually fast algorithms. In a branch-and-bound
algorithm they are typically run right after the Evaluate step. Depending
on the actual running time of the heuristic, it may be called at every node,
every nth node, or only at the root node. In linearization-based methods
like OA, GBD, ECP, or LP/NLP-BB, heuristics may be run in the Upper
Bound and Refine step, especially in the case when NLP(x̂I ) is infeasible.
Heuristics are very important because, by improving the upper bound z_U,
they help in the Prune step of the branch-and-bound algorithm or in the
convergence criterion of the other algorithms. From a practical point of
view, heuristics are extremely important when the algorithm cannot be


carried out to completion, so that a feasible solution may be returned to
the user.
Many heuristic methods have been devised for MILP; we refer the
reader to [19] for a recent and fairly complete review. For convex MINLP,
two heuristic principles that have been used are diving heuristics and the
feasibility pump.
We note that several other heuristic principles could be used, such as
RINS [39] or Local Branching [49], but as far as we know these have not yet
been applied to (convex) MINLPs, and we will not cover them here.
4.5.1. Diving heuristics. Diving heuristics are closely related to the
diving strategies for node selection presented in Section 4.3. The basic
principle is to simulate a dive from the current node to a leaf of the tree
by fixing variables (either one at a time or several variables at a time).
The most basic scheme is, after the NLP relaxation has been solved, to
fix the variable which is the least integer infeasible in the current solution
to the closest integer and resolve. This process is iterated until either the
current solution is integer feasible or the NLP relaxation becomes infeasible.
Many variants of this scheme have been proposed for MILP (see [19] for
a good review). These differ mainly in the number of variables fixed,
the way to select variables to fix, and in the possibility of doing a certain
amount of backtracking (unfixing previously fixed variables). The main
difficulty when one tries to adapt these schemes to MINLP is that instead
of having to resolve an LP with a modified bound at each iteration (an
operation which is typically done extremely efficiently by state-of-the-art
LP solvers) one has to solve an NLP (where warm-starting methods are
usually much less efficient).
Bonami and Gonçalves [26] have adapted the basic scheme to MINLPs
in two different manners. First in a straightforward way, but trying to limit
the number of NLPs solved by fixing many variables at each iteration and
backtracking if the fixings induce infeasibility. The second adaptation tries
to reduce the problem to a MILP by fixing all the variables that appear in
a nonlinear term in the objective or the constraints. This MILP problem
may be given to a MILP solver in order to find a feasible solution. A similar
MINLP heuristic idea of fixing variables to create MILPs can be found
in [20].
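
A Python sketch of the most basic diving scheme follows; solve_nlp, which
solves the NLP relaxation with the given integer fixings and reports
feasibility, is an assumed helper.

    def simple_dive(solve_nlp, x0, int_idx, tol=1e-6, max_depth=100):
        x, fixings = x0, {}
        for _ in range(max_depth):
            frac = {j: abs(x[j] - round(x[j]))
                    for j in int_idx if j not in fixings}
            cand = {j: v for j, v in frac.items() if v > tol}
            if not cand:
                return x                   # all integer variables integral
            j = min(cand, key=cand.get)    # least integer-infeasible variable
            fixings[j] = round(x[j])       # fix it to the closest integer
            x, feasible = solve_nlp(fixings)
            if not feasible:
                return None                # this sketch does not backtrack
        return None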
4.5.2. Feasibility pumps. The feasibility pump is another heuristic
principle for quickly finding feasible solutions. It was initially proposed by
Fischetti, Glover and Lodi [48] for MILP, and can be extended to convex
MINLP in several manners.
First we present the feasibility pump in its most trivial extension to
MINLP. The basic principle of the Feasibility Pump consists of generating
a sequence of points x^0, . . . , x^k that satisfy the constraints of the continuous
relaxation NLPR(L_I, U_I). Associated with the sequence x^0, . . . , x^k
of integer infeasible points is a sequence x̂^1, . . . , x̂^{k+1} of points that are
integer feasible but do not necessarily satisfy the other constraints of the
problem. Specifically, x^0 is the optimal solution of NLPR(L_I, U_I). Each
x̂^{i+1} is obtained by rounding x^i_j to the nearest integer for each j ∈ I and
keeping the other components equal to x^i_j.
by solving a nonlinear program whose objective function is to minimize the
distance of x to x̂^i on the integer variables according to the ℓ1-norm:

    z_FP-NLP(x̂_I) = minimize   Σ_{j∈I} |x_j − x̂^i_j|
                    subject to  g_j(x) ≤ 0  ∀j ∈ J,          (FP-NLP(x̂_I))
                                x ∈ X;  L_I ≤ x_I ≤ U_I.

The two sequences have the property that at each iteration the distance
between x^i and x̂^{i+1} is non-increasing. The procedure stops whenever an
integer feasible solution is found (or x̂^k = x^k). This basic procedure may
cycle or stall without finding an integer feasible solution and randomization
has been suggested to restart the procedure [48]. Several variants of this
basic procedure have been proposed in the context of MILP [18, 3, 50].
The authors of [1, 26] have shown that the basic principle of the Feasibility
Pump can find good solutions in short computing time also in the
context of MINLP.
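
A sketch of this basic loop in Python, with solve_fp_nlp (solving
(FP-NLP(x̂_I))) assumed given:

    import numpy as np

    def feasibility_pump(solve_fp_nlp, x0, int_idx, max_iter=50):
        x = np.asarray(x0, dtype=float)        # solution of NLPR(L_I, U_I)
        for _ in range(max_iter):
            x_hat = x.copy()
            x_hat[int_idx] = np.round(x[int_idx])   # integral, maybe g-infeasible
            if np.allclose(x[int_idx], x_hat[int_idx]):
                return x                       # integral and g-feasible: done
            x = solve_fp_nlp(x_hat)            # g-feasible, maybe fractional
        return None                            # stalled; restart randomly [48]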
Another variant of the Feasibility Pump for convex MINLPs was pro-
posed by Bonami et al. [25]. As in the basic FP scheme, two sequences
are constructed with the same properties: x^0, . . . , x^k are points in X that
satisfy g(x^i) ≤ 0 but not x^i_I ∈ Z^|I|, and x̂^1, . . . , x̂^{k+1} are points that do
not necessarily satisfy g(x̂^i) ≤ 0 but satisfy x̂^i_I ∈ Z^|I|. The sequence x^i is
generated in the same way as before, but the sequence x̂^i is now generated
by solving MILPs. The MILP solved to find x̂^{i+1} is constructed by
building an outer approximation of the constraints of the problem with
linearizations taken at all the points of the sequence x^0, . . . , x^i. Then, x̂^{i+1}
is found as the point in the current outer approximation of the constraints
that is closest to x^i in ℓ1-norm in the space of integer-constrained variables:


    z_FP-M^i = minimize  Σ_{j∈I} |x_j − x^i_j|
               s.t.  g(x^l) + ∇g(x^l)^T (x − x^l) ≤ 0   l = 1, . . . , i,     (M-FP^i)
                     x ∈ X,  x_I ∈ Z^I.

Unlike the procedure of Fischetti, Glover and Lodi, the Feasibility
Pump for MINLP cannot cycle and it is therefore an exact algorithm:
either it finds a feasible solution or it proves that none exists. This variant
of the FP principle can also be seen as a variant of the Outer Approximation
decomposition scheme presented in Section 3.2. In [25], it was also proposed
to iterate the FP scheme by integrating the linearization of the objective


function in the constraint system of (M-FPi ) turning the feasibility pump
into an exact iterative algorithm which finds solutions of increasingly better
cost until eventually proving optimality. Abhishek et al. [1] have also
proposed to integrate this Feasibility Pump into a single tree search (in the
same way as Outer Approximation decomposition can be integrated in a
single tree search when doing the LP/NLP-BB).

5. Software. There are many modern software packages implement-
ing the algorithms of Section 3 that employ the modern enhancements
described in Section 4. In this section, we describe the features of six dif-
ferent packages. The focus is on solvers for general convex MINLPs, not
only special cases such as MIQP, MIQCP, or MISOCP. All of these pack-
ages may be freely used via a web interface on NEOS
(http://www-neos.mcs.anl.gov).

5.1. α-ECP. α-ECP [110] is a solver based on the ECP method de-
scribed in Section 3.4. Problems to be solved may be specified in a text-
based format, as user-supplied subroutines, or via the GAMS algebraic
modeling language. The software is designed to solve convex MINLP, but
problems with a pseudo-convex objective function and pseudo-convex con-
straints can also be solved to global optimality with α-ECP. A significant
feature of the software is that no nonlinear subproblems are required to
be solved. (Though recent versions of the code have included an option to
occasionally solve NLP subproblems, which may improve performance, es-
pecially on pseudo-convex instances.) Recent versions of the software also
include enhancements so that each MILP subproblem need not be solved to
global optimality. α-ECP requires a (commercial) MILP solver to solve
the reduced master problem (RM-ECP(K)), and CPLEX, XPRESS-MP,
or Mosek may be used for this purpose.
In the computational experiment of Section 6, α-ECP (v1.75.03) is
used with CPLEX (v12.1) as MILP solver, CONOPT (v3.24T) as NLP
solver and α-ECP is run via GAMS. Since all instances are convex, setting
the ECPstrategy option to 1 instructed α-ECP to not perform algorithmic
steps relating to the solution of pseudo-convex instances.

5.2. Bonmin. Bonmin is an open-source MINLP solver and frame-
work with implementations of algorithms NLP-BB, OA, and two different
LP/NLP-BB algorithms with different default parameters. Source code and
binaries of Bonmin are available from COIN-OR (http://www.coin-or.org).
Bonmin may be called as a solver from both the AMPL and GAMS
modeling languages.
Bonmin interacts with the COIN-OR software Cbc to manage the
branch-and-bound trees of its various algorithms. To solve NLP subprob-
lems, Bonmin may be instrumented to use either Ipopt [107] or FilterSQP
[52]. Bonmin uses the COIN-OR software Clp to solve linear programs,
and may use Cbc or Cplex to solve MILP subproblems arising in its vari-
ous algorithms.
The Bonmin NLP-BB algorithm features a range of different heuris-
tics, advanced branching techniques such as strong-branching or pseudo-
costs branching, and five different choices for node selection strategy. The
Bonmin LP/NLP-BB methods use row management, cutting planes, and
branching strategies from Cbc. A distinguishing feature of Bonmin is that
one may instruct Bonmin to use a (time-limited) OA or feasibility pump
heuristic at the beginning of the optimization.
In the computational experiments, Bonmin (v1.1) is used with Cbc
(v2.3) as the MILP solver, Ipopt (v2.7) as NLP solver, and Clp (v1.10)
is used as LP solver. For Bonmin, the algorithms NLP-BB (denoted
as B-BB) and LP/NLP-BB (denoted as B-Hyb) are tested. The default
search strategies of dynamic node selection (mixture of depth-first-search
and best-bound) and strong-branching were employed.
5.3. DICOPT. DICOPT is a software implementation of the OA
method described in Section 3.2. DICOPT may be used as a solver from the
GAMS modeling language. Although OA has been designed to solve convex
MINLP, DICOPT may often be used successfully as a heuristic approach
for nonconvex MINLP, as it contains features such as equality relaxation
[72] and augmented penalty methods [105] for dealing with nonconvexities.
DICOPT requires solvers for both NLP subproblems and MILP subprob-
lems, and it uses available software as a “black-box” in each case. For NLP
subproblems, possible NLP solvers include CONOPT [42], MINOS [89]
and SNOPT [57]. For MILP subproblems, possible MILP solvers include
CPLEX [66] and XPRESS [47]. DICOPT contains a number of heuristic
(inexact) stopping rules for the OA method that may be especially effective
for nonconvex instances.
In our computational experiment, the version of DICOPT that comes with
GAMS v23.2.1 is used with CONOPT (v3.24T) as the NLP solver and
Cplex (v12.1) as the MILP solver. In order to ensure that instances are
solved to provable optimality, the GAMS/DICOPT option stop was set to
value 1.
5.4. FilMINT. FilMINT [2] is a non-commercial solver for convex
MINLPs based on the LP/NLP-BB algorithm. FilMINT may be used
through the AMPL language.
FilMINT uses MINTO [93], a branch-and-cut framework for MILP, to
solve the reduced master problem (MP(K)) and filterSQP [52] to solve
nonlinear subproblems. FilMINT uses the COIN-OR LP solver Clp or
CPLEX to solve linear programs.
FilMINT by default employs nearly all of MINTO’s enhanced MILP
features, such as cutting planes, primal heuristics, row management, and
enhanced branching and node selection rules. By default, pseudo-costs
branching is used as branching strategy and best estimate is used as node
selection strategy. An NLP-based Feasibility Pump can be run at the be-
ginning of the optimization as a heuristic procedure. The newest version of
FilMINT has been augmented with the simple strong-branching disjunctive
cuts described in Section 4.4.3.
In the computational experiments of Section 6, FilMINT v0.1 is used
with Clp as LP solver. Two versions of FilMINT are tested—the default
version and a version including the strong-branching cuts (Filmint-SBC).
5.5. MINLP BB. MINLP BB [77] is an implementation of the NLP-
BB algorithm equipped with different node selection and variable selection
rules. Instances can be specified to MINLP BB through an AMPL inter-
face.
MINLP BB contains its own tree-manager implementation, and NLP
subproblems are solved by FilterSQP [52]. Node selection strategies avail-
able in MINLP BB include depth-first-search, depth-first-search with back-
track to best-bound, best-bound, and best-estimated solution. For branch-
ing strategies, MINLP BB contains implementations of most fractional
branching, strong-branching, approximate strong-branching using second-
order information, pseudo-costs branching and reliability branching.
MINLP BB is written in FORTRAN. Thus, there is no dynamic memory
allocation, and the user must specify a maximum memory (stack) size at
the beginning of the algorithm to store the list of open nodes.
For the computational experiments with MINLP BB, different levels
of stack size were tried in an attempt to use the entire available mem-
ory for each instance. The default search strategies of depth-first-search
with backtrack to best-bound and pseudo-costs branching were employed
in MINLP BB (v20090811).
5.6. SBB. SBB [30] is an NLP-based branch-and-bound solver that is
available through the GAMS modeling language. The NLP subproblems
can be solved by CONOPT [42], MINOS [89] and SNOPT [57]. Pseudo-
costs branching is an option as a branching rule. As a node selection
strategy, depth-first-search, best-bound, best-estimate, or a combination of
these three can be employed. Communication of subproblems between the
NLP solver and tree manager is done via files, so SBB may incur some
extra overhead when compared to other solvers.
In our computational experiments, we use the version of SBB shipped
with GAMS v23.2.1. CONOPT is used as NLP solver, and the SBB default
branching variable and node selection strategies are used.
6. Computational study.
6.1. Problems. The test instances used in the computational exper-
iments were gathered from the MacMINLP collection of test problems [79],
the GAMS collection of MINLP problems [31], the collection on the web-
site of the IBM-CMU research group [99], and instances created by the authors.
Characteristics of the instances are given in Table 1, which lists whether
or not the instance has a nonlinear objective function, the total number of
variables, the number of integer variables, the number of constraints, and
how many of the constraints are nonlinear.
BatchS: The BatchS problems [97, 104] are multi-product batch plant
design problems where the objective is to determine the volume of the
equipment, the number of units to operate in parallel, and the locations of
intermediate storage tanks.
CLay: The CLay problems [98] are constrained layout problems where
non-overlapping rectangular units must be placed within the confines of
certain designated areas such that the cost of connecting these units is
minimized.
FLay: The FLay problems [98] are farmland layout problems where
the objective is to determine the optimal length and width of a number of
rectangular patches of land with fixed area, such that the perimeter of the
set of patches is minimized.
fo-m-o: These are block layout design problems [33], where an orthog-
onal arrangement of rectangular departments within a given rectangular
facility is required. A distance-based objective function is to be minimized,
and the length and width of each department should satisfy given size and
area requirements.
RSyn: The RSyn problems [98] concern retrofit planning, where one
would like to redesign existing plants to increase throughput, reduce energy
consumption, improve yields, and reduce waste generation. Given limited
capital investments to make process improvements and cost estimations
over a given time horizon, the problem is to identify the modifications that
yield the highest income from product sales minus the cost of raw materials,
energy, and process modifications.
SLay: The SLay problems [98] are safety layout problems where opti-
mal placement of a set of units with fixed width and length is determined
such that the Euclidean distance between their center point and a prede-
fined “safety point” is minimized.
sssd: The sssd instances [45] are stochastic service system design prob-
lems. Servers are modeled as M/M/1 queues, and a set of customers must
be assigned to the servers which can be operated at different service levels.
The objective is to minimize assignment and operating costs.
Syn: The Syn instances [43, 102] are synthesis design problems dealing
with the selection of optimal configuration and parameters for a processing
system selected from a superstructure containing alternative processing
units and interconnections.
trimloss: The trimloss (tls) problems [64] are cutting stock problems
where one would like to determine how to cut out a set of product paper
rolls from raw paper rolls such that the trim loss as well as the overall
production is minimized.
uflquad: The uflquad problems [61] are (separable) quadratic uncapac-
itated facility location problems where a set of customer demands must be


Table 1
Test set statistics.

Problem NonL Obj Vars Ints Cons NonL Cons



BatchS121208M √ 407 203 1510 1
BatchS151208M √ 446 203 1780 1
BatchS201210M 559 251 2326 1
CLay0303H 100 21 114 36
CLay0304H 177 36 210 48
CLay0304M 57 36 58 48
CLay0305H 276 55 335 60
CLay0305M 86 55 95 60
FLay04H 235 24 278 4
FLay05H 383 40 460 5
FLay05M 63 40 60 5
FLay06M 87 60 87 6
fo7 2 115 42 197 14
fo7 115 42 197 14
fo8 147 56 257 16
m6 87 30 145 12
m7 115 42 197 14
o7 2 115 42 197 14
RSyn0805H 309 37 426 3
RSyn0805M02M 361 148 763 6
RSyn0805M03M 541 222 1275 9
RSyn0805M04M 721 296 1874 12
RSyn0810M02M 411 168 854 12
RSyn0810M03M 616 252 1434 18
RSyn0820M 216 84 357 14
RSyn0830M04H 2345 496 4156 80
RSyn0830M 251 94 405 20
RSyn0840M √ 281 104 456 28
SLay06H √ 343 60 435 0
SLay07H √ 477 84 609 0
SLay08H √ 633 112 812 0
SLay09H √ 811 144 1044 0
SLay09M √ 235 144 324 0
SLay10M 291 180 405 0
sssd-10-4-3 69 52 30 12
sssd-12-5-3 96 75 37 15
sssd-15-6-3 133 108 45 18
Syn15M04M 341 120 762 44
Syn20M03M 316 120 657 42
Syn20M04M 421 160 996 56
Syn30M02M 321 120 564 40
Syn40M03H 1147 240 1914 84
Syn40M 131 40 198 28
tls4 106 89 60 4
tls5 √ 162 136 85 5
uflquad-20-150 √ 3021 20 3150 0
uflquad-30-100 √ 3031 30 3100 0
uflquad-40-80 3241 40 3280 0


satisfied by open facilities. The objective is to minimize the sum of the fixed
cost for operating facilities and the shipping cost, which is proportional to
the square of the quantity delivered to each customer.
All test problems are available in AMPL and GAMS formats and are
available from the authors upon request. In our experiments, α-ECP,
DICOPT, and SBB are tested through the GAMS interface, while Bonmin,
FilMINT and MINLP BB are tested through AMPL.

6.2. Computational results. The computational experiments have
been run on a cluster of identical 64-bit Intel Xeon microprocessors clocked
at 2.67 GHz, each with 3 GB RAM. All machines run the Red Hat En-
terprise Linux Server 5.3 operating system. A three hour time limit is
enforced. The computing times used in our comparisons are the wall-clock
times (including system time). All runs were made on processors dedicated
to the computation. Wall-clock times were used to accurately account for
system overhead incurred by file I/O operations required by the SBB solver.
For example, on the problem FLay05M, SBB reports a solution time of 0.0
seconds for 92241 nodes, but the wall-clock time spent is more than 17
minutes.
Table 3 summarizes the performance of the solvers on the 48 problems
of the test set. The table lists for each solver the number of times the opti-
mal solution was found, the number of times the time limit was exceeded,
the number of times the memory limit was exceeded, the number of times an error
occurred (other than time limit or memory limit), the number of times
the solver is fastest, and the arithmetic and geometric means of solution
times in seconds. When reporting aggregated solution times, unsolved or
failed instances are accounted for with the time limit of three hours. A
performance profile [40] of solution time is given in Figure 1. The detailed
performance of each solver on each test instance is listed in Table 4.
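
For reference, the profile values can be computed from the raw times of
Table 4 in a few lines (a sketch; times is assumed to be an array with one
row per instance and one column per solver, with failed runs set to infinity):

    import numpy as np

    def performance_profile(times, tau_grid):
        best = times.min(axis=1, keepdims=True)    # best time per instance
        ratios = times / best                      # r_{p,s} of Dolan and More [40]
        return np.array([[np.mean(ratios[:, s] <= tau) for tau in tau_grid]
                         for s in range(times.shape[1])])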
There are a number of interesting observations that can be made from
this experiment. First, for the instances that they can solve, the solvers
DICOPT and α-ECP tend to be very fast. Also, loosely speaking, for each
class of instances, there seems to be one or two solvers whose performance
dominates the others, and we have listed these in Table 2.

Table 2
Subjective Rating of Best Solver on Specific Instance Families.

Instance Family Best Solvers


Batch DICOPT
CLay, FLay, sssd FilMINT, MINLP BB
Fo, RSyn, Syn DICOPT, α-ECP
SLay MINLP BB
uflquad Bonmin (B-BB)


Table 3
Solver statistics on the test set.

Solver        Opt.   Time Limit   Mem. Limit   Error   Fastest   Arith. Mean   Geom. Mean
α-ECP 37 9 0 2 4 2891.06 105.15
Bonmin-BB 35 5 8 0 4 4139.60 602.80
Bonmin-Hyb 32 0 15 1 1 3869.08 244.41
Dicopt 30 16 0 2 21 4282.77 90.79
Filmint 41 7 0 0 4 2588.79 343.47
Filmint-SBC 43 5 0 0 3 2230.11 274.61
MinlpBB 35 3 7 3 12 3605.45 310.09
Sbb 18 23 6 1 0 7097.49 2883.75

In general, the variation in solver performance on different instance
families indicates that a “portfolio” approach to solving convex MINLPs
is still required. Specifically, if the performance of a specific solver is not
satisfactory, one should try other software on the instance as well.
7. Conclusions. Convex Mixed-Integer Nonlinear Programs
(MINLPs) can be used to model many decision problems involving both
nonlinear and discrete components. Given their generality and flexibility,
MINLPs have been proposed for many diverse and important scientific
applications. Algorithms and software are evolving so that instances of
these important models can often be solved in practice. The main advances
are being made along two fronts. First, new theory is being developed.
Second, theory and implementation techniques are being translated
from the more-developed arena of Mixed Integer Linear Programming
into MINLP. We hope this survey has provided readers the necessary
background to delve deeper into this rapidly evolving field.
Acknowledgments. The authors would like to thank Jon Lee
and Sven Leyffer for inviting them to the stimulating IMA workshop on
Mixed Integer Nonlinear Programming. The comments of two anonymous
referees greatly improved the presentation.


Table 4
Comparison of running times (in seconds) for the solvers α-ECP(αECP), Bonmin-
BB(B-BB), Bonmin-LP/NLP-BB(B-Hyb), DICOPT, FilMINT(Fil), FilMINT with
strong-branching cuts(Fil-SBC), MINLP BB(M-BB) and SBB (bold face for best run-
ning time). If the solver could not provide the optimal solution, we state the reason
with the following letters: “t” indicates that the 3 hour time limit was hit, “m” indicates
that the 3 GB memory limit was exceeded, and “f” indicates that the solver failed to find
the optimal solution without hitting the time or memory limit.

Problem αECP B-BB B-Hyb Dicopt Fil Fil-SBC M-BB Sbb


BatchS121208M 384.8 47.8 43.9 6.9 31.1 14.7 128.7 690.1
BatchS151208M 297.4 139.2 71.7 9.3 124.5 42.8 1433.5 138.3
BatchS201210M 137.3 188.1 148.8 9.5 192.4 101.9 751.2 463.8
CLay0303H 7.0 21.1 13.0 33.2 2.0 1.4 0.5 14.0
CLay0304H 22.1 76.0 68.9 t 21.7 8.7 7.3 456.6
CLay0304M 16.2 54.1 50.4 t 5.0 5.0 6.2 192.4
CLay0305H 88.8 2605.5 125.2 1024.3 162.0 58.4 87.6 1285.6
CLay0305M 22.7 775.0 46.1 2211.0 10.3 32.9 31.9 582.6
FLay04H 106.0 48.1 42.2 452.3 8.3 8.6 12.5 41.9
FLay05H t 2714.0 m t 1301.2 1363.4 1237.8 4437.0
FLay05M 5522.3 596.8 m t 698.4 775.4 57.5 1049.2
FLay06M t m m t t t 2933.6 t
fo7 2 16.5 m 104.5 t 271.9 200.0 964.1 t
fo7 98.9 t 285.2 4179.6 1280.1 1487.9 f t
fo8 447.4 t m t 3882.7 7455.5 m t
m6 0.9 79.4 31.1 0.2 89.0 29.8 7.1 2589.9
m7 4.6 3198.3 75.4 0.5 498.6 620.0 215.9 t
o7 2 728.4 m m t 3781.3 7283.8 m t
RSyn0805H 0.6 4.0 0.1 0.1 0.5 3.4 2.0 4.3
RSyn0805M02M 9.8 2608.2 m 3.4 485.0 123.8 806.1 t
RSyn0805M03M 16.6 6684.6 10629.9 7.2 828.2 668.3 4791.9 t
RSyn0805M04M 10.5 10680.0 m 8.1 1179.2 983.6 4878.8 m
RSyn0810M02M 10.1 t m 4.6 7782.2 3078.1 m m
RSyn0810M03M 43.3 m m 10.8 t 8969.5 m m
RSyn0820M 1.4 5351.4 11.8 0.3 232.2 231.6 1005.1 t
RSyn0830M04H 19.9 320.4 30.5 5.6 78.7 193.7 f f
RSyn0830M 2.2 m 7.4 0.5 375.7 301.1 1062.3 m
RSyn0840M 1.6 m 13.7 0.4 3426.2 2814.7 m m
SLay06H 316.8 3.4 34.4 7.5 186.6 5.0 1.3 731.9
SLay07H 2936.0 7.6 46.6 39.8 385.0 81.5 5.6 t
SLay08H 8298.0 12.5 178.1 t 1015.0 156.0 9.7 t
SLay09H t 25.6 344.0 3152.5 7461.5 1491.9 55.0 t
SLay09M t 8.6 63.0 t 991.7 41.8 2.1 1660.7
SLay10M t 108.4 353.7 t t 3289.6 43.3 2043.7
sssd-10-4-3 2.4 16.8 31.2 2.8 3.7 3.0 1.0 t
sssd-12-5-3 f 56.1 m f 9.6 13.3 5.7 t
sssd-15-6-3 f 163.4 m f 36.9 1086.4 41.6 t
Syn15M04M 2.9 379.5 8.5 0.2 39.5 47.5 60.4 278.0
Syn20M03M 2.9 9441.4 16.2 0.1 1010.3 901.9 1735.0 t
Syn20M04M 3.9 m 22.6 0.2 7340.9 5160.4 m t
Syn30M02M 3.0 t 13.6 0.4 2935.5 3373.8 8896.1 t
Syn40M03H 8.3 18.9 3.9 1.1 6.5 87.9 f 19.7
Syn40M 1.8 877.7 0.6 0.1 108.1 110.7 101.4 t
tls4 377.7 t f t 383.0 336.7 1281.8 t
tls5 t m m t t t m m
uflquad-20-150 t 422.2 m t t t t t
uflquad-30-100 t 614.5 m t t t t t
uflquad-40-80 t 9952.3 m t t t t t

[Figure 1 is not reproduced in this text version. It plots, for each solver
(alphaecp, bonmin-bb, bonmin-hyb, dicopt, filmint, filmint-sbc, minlpbb,
sbb), the proportion of problems solved against the factor, on a logarithmic
axis from 1 to 10000, by which the solver's time may exceed that of the best
solver ("not more than x times worse than best solver").]

Fig. 1. Performance Profile Comparing Convex MINLP Solvers.

REFERENCES

[1] K. Abhishek, S. Leyffer, and J.T. Linderoth, Feasibility pump heuristics for
Mixed Integer Nonlinear Programs. Unpublished working paper, 2008.
[2] ———, FilMINT: An outer-approximation-based solver for convex Mixed-Integer
Nonlinear Programs, INFORMS Journal on Computing, 22, No. 4 (2010),
pp. 555–567.
[3] T. Achterberg and T. Berthold, Improving the feasibility pump, Technical
Report ZIB-Report 05-42, Zuse Institute Berlin, September 2005.
[4] T. Achterberg, T. Koch, and A. Martin, Branching rules revisited, Opera-
tions Research Letters, 33 (2004), pp. 42–54.
[5] I. Akrotirianakis, I. Maros, and B. Rustem, An outer approximation based
branch-and-cut algorithm for convex 0-1 MINLP problems, Optimization
Methods and Software, 16 (2001), pp. 21–47.
[6] D. Applegate, R. Bixby, V. Chvátal, and W. Cook, On the solution of trav-
eling salesman problems, in Documenta Mathematica Journal der Deutschen
Mathematiker-Vereinigung, International Congress of Mathematicians, 1998,
pp. 645–656.
[7] A. Atamtürk and V. Narayanan, Conic mixed integer rounding cuts, Mathe-
matical Programming, 122 (2010), pp. 1–20.
[8] E. Balas, Disjunctive programming, in Annals of Discrete Mathematics 5: Dis-
crete Optimization, North Holland, 1979, pp. 3–51.
[9] E. Balas, S. Ceria, and G. Cornuéjols, A lift-and-project cutting plane al-
gorithm for mixed 0-1 programs, Mathematical Programming, 58 (1993),
pp. 295–324.
[10] E. Balas, S. Ceria, G. Cornuéjols, and N. R. Natraj, Gomory cuts revisited,
Operations Research Letters, 19 (1996), pp. 1–9.
[11] E. Balas and M. Perregaard, Lift-and-project for mixed 0-1 programming:
recent progress, Discrete Applied Mathematics, 123 (2002), pp. 129–154.
[12] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty, Nonlinear Programming:
Theory and Algorithms, John Wiley and Sons, New York, second ed., 1993.
[13] E.M.L. Beale, Branch and bound methods for mathematical programming sys-
tems, in Discrete Optimization II, P.L. Hammer, E.L. Johnson, and B.H.
Korte, eds., North Holland Publishing Co., 1979, pp. 201–219.
[14] E.W.L. Beale and J.A. Tomlin, Special facilities in a general mathematical
programming system for non-convex problems using ordered sets of variables,
in Proceedings of the 5th International Conference on Operations Research,
J. Lawrence, ed., 1969, pp. 447–454.
[15] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization,
SIAM, 2001. MPS/SIAM Series on Optimization.
[16] J.F. Benders, Partitioning procedures for solving mixed variable programming
problems, Numerische Mathematik, 4 (1962), pp. 238–252.
[17] M. Bénichou, J.M. Gauthier, P. Girodet, G. Hentges, G. Ribière, and
O. Vincent, Experiments in Mixed-Integer Linear Programming, Mathe-
matical Programming, 1 (1971), pp. 76–94.
[18] L. Bertacco, M. Fischetti, and A. Lodi, A feasibility pump heuristic for gen-
eral mixed-integer problems, Discrete Optimization, 4 (2007), pp. 63–76.
[19] T. Berthold, Primal Heuristics for Mixed Integer Programs, Master’s thesis,
Technische Universität Berlin, 2006.
[20] T. Berthold and A. Gleixner, Undercover - a primal heuristic for MINLP
based on sub-mips generated by set covering, Tech. Rep. ZIB-Report 09-40,
Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), 2009.
[21] D. Bienstock, Computational study of a family of mixed-integer quadratic pro-
gramming problems, Mathematical Programming, 74 (1996), pp. 121–140.

www.it-ebooks.info
ALGORITHMS AND SOFTWARE FOR CONVEX MINLP 35

[22] R. Bixby and E. Rothberg, Progress in computational mixed integer program-


ming. A look back from the other side of the tipping point, Annals of Oper-
ations Research, 149 (2007), pp. 37–41.
[23] P. Bonami, Branching strategies and heuristics in branch-and-bound for con-
vex MINLPs, November 2008. Presentation at IMA Hot Topics Workshop:
Mixed-Integer Nonlinear Optimization: Algorithmic Advances and Applica-
tions.
[24] P. Bonami, L.T. Biegler, A.R. Conn, G. Cornuéjols, I.E. Grossmann, C.D.
Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An al-
gorithmic framework for convex Mixed Integer Nonlinear Programs, Discrete
Optimization, 5 (2008), pp. 186–204.
[25] P. Bonami, G. Cornuéjols, A. Lodi, and F. Margot, A feasibility pump for
Mixed Integer Nonlinear Programs, Mathematical Programming, 119 (2009),
pp. 331–352.
[26] P. Bonami and J. Gonçalves, Heuristics for convex mixed integer nonlinear
programs, Computational Optimization and Applications. To appear DOI:
10.1007/s10589-010-9350-6.
[27] R. Boorstyn and H. Frank, Large-scale network topological optimization, IEEE
Transactions on Communications, 25 (1977), pp. 29–47.
[28] B. Borchers and J.E. Mitchell, An improved branch and bound algorithm for
Mixed Integer Nonlinear Programs, Computers & Operations Research, 21
(1994), pp. 359–368.
[29] , A computational comparison of branch and bound and outer approxima-
tion algorithms for 0-1 Mixed Integer Nonlinear Programs, Computers &
Operations Research, 24 (1997), pp. 699–701.
[30] M.R. Bussieck and A. Drud, Sbb: A new solver for Mixed Integer Nonlinear
Programming, talk, OR 2001, Section ”Continuous Optimization”, 2001.
[31] M.R. Bussieck, A.S. Drud, and A. Meeraus, MINLPLib – a collection of test
models for Mixed-Integer Nonlinear Programming, INFORMS Journal on
Computing, 15 (2003).
[32] R.H. Byrd, J. Nocedal, and R.A. Waltz, KNITRO: An integrated package
for nonlinear optimization, in Large Scale Nonlinear Optimization, Springer
Verlag, 2006, pp. 35–59.
[33] I. Castillo, J. Westerlund, S. Emet, and T. Westerlund, Optimization
of block layout deisgn problems with unequal areas: A comparison of milp
and minlp optimization methods, Computers and Chemical Engineering, 30
(2005), pp. 54–69.
[34] M.T. Cezik and G. Iyengar, Cuts for mixed 0-1 conic programming, Mathe-
matical Programming, 104 (2005), pp. 179–202.
[35] V. Chvátal, Edmonds polytopes and a heirarchy of combinatorial problems, Dis-
crete Mathematics, 4 (1973), pp. 305–337.
[36] M. Conforti, G. Cornuéjols, and G. Zambelli, Polyhedral approaches to
Mixed Integer Linear Programming, in 50 Years of Integer Programming
1958—2008, M. Jünger, T. Liebling, D. Naddef, W. Pulleyblank, G. Reinelt,
G. Rinaldi, and L. Wolsey, eds., Springer, 2009.
[37] G. Cornuéjols, L. Liberti, and G. Nannicini, Improved strategies for branch-
ing on general disjunctions. To appear in Mathematical Programming A.
DOI: 10.1007/s10107-009-0333-2.
[38] R.J. Dakin, A tree search algorithm for mixed programming problems, Computer
Journal, 8 (1965), pp. 250–255.
[39] E. Danna, E. Rothberg, and C. LePape, Exploring relaxation induced neigh-
borhoods to improve MIP solutions, Mathematical Programming, 102 (2005),
pp. 71–90.
[40] E. Dolan and J. Moré, Benchmarking optimization software with performance
profiles, Mathematical Programming, 91 (2002), pp. 201–213.

www.it-ebooks.info
36 PIERRE BONAMI, MUSTAFA KILINÇ, AND JEFF LINDEROTH

[41] S. Drewes, Mixed Integer Second Order Cone Programming, PhD thesis, Tech-
nische Universität Darmstadt, 2009.
[42] A. S. Drud, CONOPT – a large-scale GRG code, ORSA Journal on Computing,
6 (1994), pp. 207–216.
[43] M.A. Duran and I. Grossmann, An outer-approximation algorithm for a
class of Mixed-Integer Nonlinear Programs, Mathematical Programming, 36
(1986), pp. 307–339.
[44] J. Eckstein, Parallel branch-and-bound algorithms for general mixed integer pro-
gramming on the CM-5, SIAM Journal on Optimization, 4 (1994), pp. 794–
814.
[45] S. Elhedhli, Service System Design with Immobile Servers, Stochastic Demand,
and Congestion, Manufacturing & Service Operations Management, 8 (2006),
pp. 92–97.
[46] A. M. Eliceche, S. M. Corvalán, and P. Martı́nez, Environmental life cycle
impact as a tool for process optimisation of a utility plant, Computers and
Chemical Engineering, 31 (2007), pp. 648–656.
[47] Fair Isaac Corporation, XPRESS-MP Reference Manual, 2009. Release 2009.
[48] M. Fischetti, F. Glover, and A. Lodi, The feasibility pump, Mathematical
Programming, 104 (2005), pp. 91–104.
[49] M. Fischetti and A. Lodi, Local branching, Mathematical Programming, 98
(2003), pp. 23–47.
[50] M. Fischetti and D. Salvagnin, Feasibility pump 2.0, Tech. Rep., University of
Padova, 2008.
[51] R. Fletcher and S. Leyffer, Solving Mixed Integer Nonlinear Programs by
outer approximation, Mathematical Programming, 66 (1994), pp. 327–349.
[52] , User manual for filterSQP, 1998. University of Dundee Numerical Anal-
ysis Report NA-181.
[53] A. Flores-Tlacuahuac and L.T. Biegler, Simultaneous mixed-integer dy-
namic optimization for integrated design and control, Computers and Chem-
ical Engineering, 31 (2007), pp. 648–656.
[54] J.J.H. Forrest, J.P.H. Hirst, and J.A. Tomlin, Practical solution of large
scale mixed integer programming problems with UMPIRE, Management Sci-
ence, 20 (1974), pp. 736–773.
[55] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the
Theory of NP-Completeness, W.H. Freeman and Company, New York, 1979.
[56] A. Geoffrion, Generalized Benders decomposition, Journal of Optimization
Theory and Applications, 10 (1972), pp. 237–260.
[57] P.E. Gill, W. Murray, and M.A. Saunders, SNOPT: An SQP algorithm
for large–scale constrained optimization, SIAM Journal on Optimization, 12
(2002), pp. 979–1006.
[58] R.E. Gomory, Outline of an algorithm for integer solutions to linear programs,
Bulletin of the American Mathematical Monthly, 64 (1958), pp. 275–278.
[59] , An algorithm for the mixed integer problem, Tech. Rep. RM-2597, The
RAND Corporation, 1960.
[60] I. Grossmann, J. Viswanathan, A.V.R. Raman, and E. Kalvelagen,
GAMS/DICOPT: A discrete continuous optimization package, Math. Meth-
ods Appl. Sci, 11 (2001), pp. 649–664.
[61] O. Günlük, J. Lee, and R. Weismantel, MINLP strengthening for separaable
convex quadratic transportation-cost ufl, Tech. Rep. RC24213 (W0703-042),
IBM Research Division, March 2007.
[62] O.K. Gupta and A. Ravindran, Branch and bound experiments in convex non-
linear integer programming, Management Science, 31 (1985), pp. 1533–1546.
[63] Gurobi Optimization, Gurobi Optimizer Reference Manual, 2009. Version 2.
[64] I. Harjunkoski, R.Pörn, and T. Westerlund, MINLP: Trim-loss problem,
in Encyclopedia of Optimization, C.A. Floudas and P.M. Pardalos, eds.,
Springer, 2009, pp. 2190–2198.

www.it-ebooks.info
ALGORITHMS AND SOFTWARE FOR CONVEX MINLP 37

[65] J.-B. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimiza-


tion Algorithms I: Fundamentals (Grundlehren Der Mathematischen Wis-
senschaften), Springer, October 1993.
[66] IBM, Using the CPLEX Callable Library, Version 12, 2009.
[67] R. Jeroslow, There cannot be any algorithm for integer programming with
quadratic constraints, Operations Research, 21 (1973), pp. 221–224.
[68] N.J. Jobst, M.D. Horniman, C.A. Lucas, and G. Mitra, Computational as-
pects of alternative portfolio selection models in the presence of discrete asset
choice constraints, Quantitative Finance, 1 (2001), pp. 489–501.
[69] M. Karamanov and G. Cornuéjols, Branching on general disjunctions, tech.
rep., Carnegie Mellon University, July 2005. Revised August 2009. Available
at http://integer.tepper.cmu.edu.
[70] J.E. Kelley, The cutting plane method for solving convex programs, Journal of
SIAM, 8 (1960), pp. 703–712.
[71] M. Kılınç, J. Linderoth, J. Luedtke, and A. Miller, Disjunctive strong
branching inequalities for Mixed Integer Nonlinear Programs.
[72] G.R. Kocis and I.E. Grossmann, Relaxation strategy for the structural opti-
mization of process flowheets, Industrial Engineering Chemical Research, 26
(1987), pp. 1869–1880.
[73] C.D. Laird, L.T. Biegler, and B. van Bloemen Waanders, A mixed integer
approach for obtaining unique solutions in source inversion of drinking water
networks, Journal of Water Resources Planning and Management, Special
Issue on Drinking Water Distribution Systems Security, 132 (2006), pp. 242–
251.
[74] A.H. Land and A.G. Doig, An automatic method for solving discrete program-
ming problems, Econometrica, 28 (1960), pp. 497–520.
[75] M. Lejeune, A unified approach for cycle service levels, Tech. Rep.,
George Washington University, 2009. Available on Optimization Online
http://www.optimization-online.org/DB HTML/2008/11/2144.html.
[76] S. Leyffer, Deterministic Methods for Mixed Integer Nonlinear Programming,
PhD thesis, University of Dundee, Dundee, Scotland, UK, 1993.
[77] , User manual for MINLP-BB, 1998. University of Dundee.
[78] , Integrating SQP and branch-and-bound for Mixed Integer Nonlinear Pro-
gramming, Computational Optimization & Applications, 18 (2001), pp. 295–
309.
[79] , MacMINLP: Test problems for Mixed Integer Nonlinear Programming,
2003. http://www.mcs.anl.gov/~leyffer/macminlp.
[80] , Nonlinear branch-and-bound revisited, August 2009. Presentation at 20th
International Symposium on Mathematical Programming.
[81] L. Liberti, Reformulations in mathematical programming: Symmetry, Mathe-
matical Programming (2010). To appear.
[82] J. Linderoth and T. Ralphs, Noncommercial software for Mixed-Integer Linear
Programming, in Integer Programming: Theory and Practice, CRC Press
Operations Research Series, 2005, pp. 253–303.
[83] J.T. Linderoth and M.W.P. Savelsbergh, A computational study of search
strategies in mixed integer programming, INFORMS Journal on Computing,
11 (1999), pp. 173–187.
[84] A. Lodi, MIP computation and beyond, in 50 Years of Integer Programming
1958—2008, M. Jünger, T. Liebling, D. Naddef, W. Pulleyblank, G. Reinelt,
G. Rinaldi, and L. Wolsey, eds., Springer, 2009.
[85] H. Marchand and L.A. Wolsey, Aggregation and mixed integer rounding to
solve MIPs, Operations Research, 49 (2001), pp. 363–371.
[86] F. Margot, Exploiting orbits in symmetric ILP, Mathematical Programming,
Series B, 98 (2003), pp. 3–21.

www.it-ebooks.info
38 PIERRE BONAMI, MUSTAFA KILINÇ, AND JEFF LINDEROTH

[87] R.D. McBride and J.S. Yormark, An implicit enumeration algorithm for
quadratic integer programming, Management Science, 26 (1980), pp. 282–
296.
[88] Mosek ApS, 2009. www.mosek.com.
[89] B. Murtagh and M. Saunders, MINOS 5.4 user’s guide, Report SOL 83-20R,
Department of Operations Research, Stanford University, 1993.
[90] K.G. Murty and S.N. Kabadi, Some NP-complete problems in quadratic and
nonlinear programming, Mathematical Programming, 39 (1987), pp. 117–
129.
[91] S. Nabal and L. Schrage, Modeling and solving nonlinear integer programming
problems. Presented at Annual AIChE Meeting, Chicago, 1990.
[92] G. Nemhauser and L.A. Wolsey, Integer and Combinatorial Optimization,
John Wiley and Sons, New York, 1988.
[93] G.L. Nemhauser, M.W.P. Savelsbergh, and G.C. Sigismondi, MINTO, a
Mixed INTeger Optimizer, Operations Research Letters, 15 (1994), pp. 47–
58.
[94] J. Nocedal and S.J. Wright, Numerical Optimization, Springer-Verlag, New
York, second ed., 2006.
[95] J. Ostrowski, J. Linderoth, F. Rossi, and S. Smriglio, Orbital branching,
Mathematical Programming, 126 (2011), pp. 147–178.
[96] I. Quesada and I.E. Grossmann, An LP/NLP based branch–and–bound algo-
rithm for convex MINLP optimization problems, Computers and Chemical
Engineering, 16 (1992), pp. 937–947.
[97] D.E. Ravemark and D.W.T. Rippin, Optimal design of a multi-product batch
plant, Computers & Chemical Engineering, 22 (1998), pp. 177 – 183.
[98] N. Sawaya, Reformulations, relaxations and cutting planes for generalized
disjunctive programming, PhD thesis, Chemical Engineering Department,
Carnegie Mellon University, 2006.
[99] N. Sawaya, C.D. Laird, L.T. Biegler, P. Bonami, A.R. Conn,
G. Cornuéjols, I. E. Grossmann, J. Lee, A. Lodi, F. Margot, and
A. Wächter, CMU-IBM open source MINLP project test set, 2006. http:
//egon.cheme.cmu.edu/ibm/page.htm.
[100] A. Schrijver, Theory of Linear and Integer Programming, Wiley, Chichester,
1986.
[101] R. Stubbs and S. Mehrotra, A branch-and-cut method for 0-1 mixed convex
programming, Mathematical Programming, 86 (1999), pp. 515–532.
[102] M. Türkay and I.E. Grossmann, Logic-based minlp algorithms for the opti-
mal synthesis of process networks, Computers & Chemical Engineering, 20
(1996), pp. 959 – 978.
[103] R.J. Vanderbei, LOQO: An interior point code for quadratic programming, Op-
timization Methods and Software (1998).
[104] A. Vecchietti and I.E. Grossmann, LOGMIP: a disjunctive 0-1 non-linear
optimizer for process system models, Computers and Chemical Engineering,
23 (1999), pp. 555 – 565.
[105] J. Viswanathan and I.E. Grossmann, A combined penalty function and outer–
approximation method for MINLP optimization, Computers and Chemical
Engineering, 14 (1990), pp. 769–782.
[106] A. Wächter, Some recent advanced in Mixed-Integer Nonlinear Programming,
May 2008. Presentation at the SIAM Conference on Optimization.
[107] A. Wächter and L.T. Biegler, On the implementation of a primal-dual inte-
rior point filter line search algorithm for large-scale nonlinear programming,
Mathematical Programming, 106 (2006), pp. 25–57.
[108] R. Waltz, Current challenges in nonlinear optimization, 2007. Presen-
tation at San Diego Supercomputer Center: CIEG Spring Orientation
Workshop, available at www.sdsc.edu/us/training/workshops/2007sac_
studentworkshop/docs/SDSC07.ppt.

www.it-ebooks.info
ALGORITHMS AND SOFTWARE FOR CONVEX MINLP 39

[109] T. Westerlund, H.I. Harjunkoski, and R. Pörn, An extended cutting plane


method for solving a class of non-convex minlp problems, Computers and
Chemical Engineering, 22 (1998), pp. 357–365.
[110] T. Westerlund and K. Lundqvist, Alpha-ECP, version 5.101. an interactive
minlp-solver based on the extended cutting plane method, in Updated version
of Report 01-178-A, Process Design Laboratory, Abo Akademi Univeristy,
2005.
[111] T. Westerlund and F. Pettersson, A cutting plane method for solving con-
vex MINLP problems, Computers and Chemical Engineering, 19 (1995),
pp. s131–s136.
[112] T. Westerlund and R. Pörn, . a cutting plane method for minimizing pseudo-
convex functions in the mixed integer case, Computers and Chemical Engi-
neering, 24 (2000), pp. 2655–2665.
[113] L.A. Wolsey, Integer Programming, John Wiley and Sons, New York, 1998.
[114] Y. Zhu and T. Kuno, A disjunctive cutting-plane-based branch-and-cut algo-
rithm for 0-1 mixed-integer convex nonlinear programs, Industrial and En-
gineering Chemistry Research, 45 (2006), pp. 187–196.

www.it-ebooks.info
www.it-ebooks.info
SUBGRADIENT BASED OUTER APPROXIMATION FOR
MIXED INTEGER SECOND ORDER
CONE PROGRAMMING∗
SARAH DREWES† AND STEFAN ULBRICH‡

Abstract. This paper deals with outer approximation based approaches for solving mixed integer second order cone programs, where the outer approximation is built from subgradients of the second order cone constraints. Using strong duality of the subproblems solved during the algorithm, we are able to determine subgradients that satisfy the KKT optimality conditions. This enables us to extend convergence results valid for continuously differentiable mixed integer nonlinear problems to subdifferentiable constraint functions. Furthermore, we present a version of the branch-and-bound based outer approximation that converges even without the assumption that every occurring SOCP satisfies the Slater constraint qualification. We give numerical results for some application problems showing the performance of our approach.

Key words. Mixed Integer Nonlinear Programming, Second Order Cone Program-
ming, Outer Approximation.

AMS(MOS) subject classifications. 90C11.

1. Introduction. Mixed Integer Second Order Cone Programs (MISOCP) can be formulated as

    min   c^T x
    s.t.  Ax = b,
          x ⪰ 0,                                        (1.1)
          (x)_j ∈ [l_j, u_j]   (j ∈ J),
          (x)_j ∈ Z            (j ∈ J),

where c = (c_1^T, ..., c_{noc}^T)^T ∈ R^n and A = (A_1, ..., A_{noc}) ∈ R^{m,n}, with c_i ∈ R^{k_i} and A_i ∈ R^{m,k_i} for i ∈ {1, ..., noc} and Σ_{i=1}^{noc} k_i = n. Furthermore, b ∈ R^m, (x)_j denotes the j-th component of x, l_j, u_j ∈ R for j ∈ J, and J ⊂ {1, ..., n} denotes the integer index set. Here, x ⪰ 0 for x = (x_1^T, ..., x_{noc}^T)^T with x_i ∈ R^{k_i} for i ∈ {1, ..., noc} denotes that

    x ∈ K,    K := K_1 × ··· × K_{noc},

where

    K_i := {x_i = (x_{i0}, x_{i1}^T)^T ∈ R × R^{k_i−1} : ‖x_{i1}‖_2 ≤ x_{i0}}

is the second order cone of dimension k_i. Mixed integer second order cone problems have various applications in finance and engineering, for example turbine balancing problems, cardinality-constrained portfolio optimization (cf. Bertsimas and Shioda [17] or Vielma et al. [10]), or the problem of finding a minimum length connection network, also known as the Euclidean Steiner Tree Problem (ESTP) (cf. Fampa and Maculan [15]).
Available convex MINLP solvers like BONMIN [22] by Bonami et al. or FilMINT [25] by Abhishek et al. are in general not applicable to (1.1), since the occurring second order cone constraints are not continuously differentiable.

∗ Research partially supported by the German Research Foundation (DFG) within the SFB 805 and by the state of Hesse within the LOEWE-Center AdRIA.
† Research Group Nonlinear Optimization, Department of Mathematics, Technische Universität Darmstadt, Germany.
‡ Research Group Nonlinear Optimization, Department of Mathematics, Technische Universität Darmstadt, Germany.
Branch-and-cut methods for convex mixed 0-1 problems have been discussed by Stubbs and Mehrotra in [2] and [9]; these can be applied to solve (1.1) if the integer variables are binary. In [5], Çezik and Iyengar discuss cuts for general self-dual conic programming problems and investigate their applications to the maxcut and traveling salesman problems. Atamtürk and Narayanan present in [12] integer rounding cuts for conic mixed-integer programming, obtained by investigating polyhedral decompositions of the second order cone conditions, and in [11] the same authors discuss lifting for mixed integer conic programming, where valid inequalities for mixed-integer feasible sets are derived from suitable subsets.
One article dealing with non-differentiable functions in the context of outer approximation approaches for MINLP is [1] by Fletcher and Leyffer, where the authors prove convergence of outer approximation algorithms for non-smooth penalty functions. The only article dealing with outer approximation techniques for MISOCPs is [10] by Vielma et al., which is based on Ben-Tal and Nemirovski's polyhedral approximation of the second order cone constraints [13]. There, the size of the outer approximation grows as the precision of the approximation is increased; this precision, and thus the entire outer approximation, is chosen in advance, whereas the approximation presented here is strengthened iteratively in order to guarantee convergence of the algorithm.
In this paper we present a hybrid branch-and-bound based outer ap-
proximation approach for MISOCPs. The approach is based on the branch-
and-bound based outer approximation approach for continuously differen-
tiable constraints, as proposed by Bonami et al. in [8] on the basis of
Fletcher and Leyffer [1] and Quesada and Grossmann [3]. The idea is to
iteratively compute integer feasible solutions of a (sub)gradient based lin-
ear outer approximation of (1.1) and to tighten this outer approximation
by solving nonlinear continuous problems.
Linear outer approximations based on subgradients satisfying
the Karush-Kuhn-Tucker (KKT) optimality conditions of the occurring
SOCP problems enable us to extend the convergence result for continuously
differentiable constraints to subdifferentiable second order cone constraints.
Thus, in contrast to [10], the subgradient based approximation induces
convergence of any classical outer approximation based approach under the specified assumptions. We also present an adaptation of the algorithm that converges even if one of these convergence assumptions is violated.

converges even if one of these convergence assumptions is violated.
In numerical experiments we show the applicability of the algorithm
and compare it to a nonlinear branch-and-bound approach.
2. Preliminaries. In the following, int(K_i) denotes the interior of the cone K_i, i.e. those vectors x_i satisfying x_{i0} > ‖x_{i1}‖, and bd(K_i) denotes the boundary of K_i, i.e. those vectors x_i satisfying x_{i0} = ‖x_{i1}‖. By ‖·‖ we denote the Euclidean norm.
Assume g : R^n → R is a convex and subdifferentiable function on R^n. Then, due to the convexity of g, the inequality g(x) ≥ g(x̄) + ξ^T (x − x̄) holds for all x̄, x ∈ R^n and every subgradient ξ ∈ ∂g(x̄), see for example [19]. Thus, we obtain a linear outer approximation of the region {x : g(x) ≤ 0} applying constraints of the form

    g(x̄) + ξ^T (x − x̄) ≤ 0.                            (2.1)

In the case of (1.1), the feasible region is described by constraints

    g_i(x) := −x_{i0} + ‖x_{i1}‖ ≤ 0,   i = 1, ..., noc,    (2.2)

where g_i(x) is differentiable on R^n \ {x : ‖x_{i1}‖ = 0} with ∇g_i(x_i) = (−1, x_{i1}^T/‖x_{i1}‖)^T, and subdifferentiable if ‖x_{i1}‖ = 0. The following lemma gives a detailed description of the subgradients of (2.2).
Lemma 2.1. The convex function g_i(x_i) := −x_{i0} + ‖x_{i1}‖ is subdifferentiable in x_i = (x_{i0}, x_{i1}^T)^T = (a, 0^T)^T, a ∈ R, with ∂g_i((a, 0^T)^T) = {ξ = (ξ_0, ξ_1^T)^T, ξ_0 ∈ R, ξ_1 ∈ R^{k_i−1} : ξ_0 = −1, ‖ξ_1‖ ≤ 1}.
Proof. Follows from the subgradient inequality in (a, 0^T)^T.
The following lemma investigates a complementarity condition on two elements of the second order cone that is used in the subsequent sections.
Lemma 2.2. Assume K is the second order cone of dimension k, and x = (x_0, x_1^T)^T ∈ K, s = (s_0, s_1^T)^T ∈ K satisfy the condition x^T s = 0. Then
1. x ∈ int(K) ⇒ s = (0, ..., 0)^T,
2. x ∈ bd(K) \ {0} ⇒ s ∈ bd(K) and ∃ γ ≥ 0 : s = γ (x_0, −x_1^T)^T.
Proof. 1.: Assume ‖x_1‖ > 0 and s_0 > 0. Due to x_0 > ‖x_1‖ it holds that s^T x = s_0 x_0 + s_1^T x_1 > s_0 ‖x_1‖ + s_1^T x_1 ≥ s_0 ‖x_1‖ − ‖s_1‖ ‖x_1‖. Then x^T s = 0 can only be true if s_0 ‖x_1‖ − ‖s_1‖ ‖x_1‖ < 0 ⇔ s_0 < ‖s_1‖, which contradicts s ∈ K. Thus s_0 = 0, and s_0 = 0 ⇒ s = (0, ..., 0)^T. If ‖x_1‖ = 0, then s_0 = 0 follows directly from x_0 > 0.
2.: Due to x_0 = ‖x_1‖, we have s^T x = 0 ⇔ −s_1^T x_1 = s_0 ‖x_1‖. Since s_0 ≥ ‖s_1‖ ≥ 0 we have −s_1^T x_1 = s_0 ‖x_1‖ ≥ ‖x_1‖ ‖s_1‖. The Cauchy-Schwarz inequality yields −s_1^T x_1 = ‖x_1‖ ‖s_1‖, which implies both s_1 = −γ x_1 for some γ ∈ R and s_0 = ‖s_1‖. It follows that −x_1^T s_1 = γ x_1^T x_1 ≥ 0, hence γ ≥ 0. Together with s_0 = ‖s_1‖ = γ ‖x_1‖ and ‖x_1‖ = x_0 we get s = (γ x_0, −γ x_1^T)^T = γ (x_0, −x_1^T)^T.
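As a small numerical illustration of Lemma 2.1 (our own sketch, not part of the paper), one can check the subgradient inequality g(x) ≥ g(x̄) + ξ^T (x − x̄) empirically at a kink point x̄ = (a, 0^T)^T for a subgradient ξ = (−1, ξ_1^T)^T with ‖ξ_1‖ ≤ 1:

    import numpy as np

    g = lambda x: -x[0] + np.linalg.norm(x[1:])   # g_i(x_i) = -x_i0 + ||x_i1||

    x_bar = np.array([2.0, 0.0, 0.0])             # kink point: x_i1 = 0
    xi = np.array([-1.0, 0.6, -0.8])              # ||(0.6, -0.8)|| = 1 <= 1

    rng = np.random.default_rng(0)
    for _ in range(1000):                         # subgradient inequality holds
        x = rng.normal(size=3)
        assert g(x) >= g(x_bar) + xi @ (x - x_bar) - 1e-12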


We make the following assumptions:


A1. The set {x : Ax = b, xJ ∈ [l, u]} is bounded.
A2. Every nonlinear subproblem (N LP (xkJ )) that is obtained from
(1.1) by fixing the integer variables to the value xkJ has nonempty
interior (Slater constraint qualification).
These assumptions comply with the assumptions made by Fletcher and
Leyffer in [1] as follows. We drop the assumption of continuous differen-
tiability of the constraint functions, but we assume a constraint qualifica-
tion inducing strong duality instead of an arbitrary constraint qualification
which suffices in the differentiable case.
Remark. A2 may appear to be a strong assumption, since it is violated as soon as a leading cone variable x_{i0} is fixed to zero. In that case, however, all variables belonging to that cone can be eliminated, and the Slater condition may then hold for the reduced problem. Moreover, at the end of Section 5, we present an enhancement of the algorithm that converges even if assumption A2 is violated.
3. Feasible nonlinear subproblems. For a given integer configuration x_J^k, we define the SOCP subproblem

    min   c^T x
    s.t.  Ax = b,
          x ⪰ 0,                                        (NLP(x_J^k))
          x_J = x_J^k.

The dual of (NLP(x_J^k)), in the sense of Nesterov and Nemirovskii [18] or Alizadeh and Goldfarb [7], is given by

    max   (b^T, x_J^{k,T}) y
    s.t.  (A^T, I_J^T) y + s = c,                       (NLP(x_J^k)-D)
          s ⪰ 0,

where I_J = ((I_J)_1, ..., (I_J)_{noc}) denotes the matrix mapping x to the integer variables x_J, and (I_J)_i ∈ R^{|J|,k_i} is the block of columns of I_J associated with the i-th cone of dimension k_i. We define

    I_0(x̄) := {i : x̄_i = (0, ..., 0)^T},
    I_a(x̄) := {i : g_i(x̄) = 0, x̄_i ≠ (0, ..., 0)^T},       (3.1)

where Ia (x̄) is the index set of active conic constraints that are differentiable
in x̄ and I0 (x̄) is the index set of active constraints that are subdifferen-
tiable in x̄. The crucial point in an outer approximation approach is to
tighten the outer approximation problem such that the integer assignment
of the last solution is cut off. Assume xkJ is this last solution. Then we will
show later that those subgradients in ∂gi (x̄) that satisfy the KKT condi-
tions in the solution x̄ of (N LP (xkJ )) give rise to linearizations with this

www.it-ebooks.info
SUBGRADIENT BASED OUTER APPROXIMATION FOR MISOCP 45

tightening property. Hence, we show now, how to choose elements ξ¯i in the
subdifferentials ∂gi (x̄) for i ∈ {1, . . . noc} that satisfy the KKT conditions
ci + (ATi , (IJ )Ti )μ̄ + λ̄i ξ¯i = 0, i ∈ I0 (x̄),
ci + (ATi , (IJ )Ti )μ̄ + λ̄i ∇gi (x̄i ) = 0, i ∈ Ia (x̄), (3.2)
ci + (ATi , (IJ )Ti )μ̄ = 0, i ∈ I0 (x̄) ∪ Ia (x̄)

in the solution x̄ of (N LP (xkJ )) with appropriate Lagrange multipliers μ̄


and λ̄ ≥ 0. This step is not necessary if the constraint functions are
continuously differentiable, since ∂gi (x̄) then contains only one element:
the gradient ∇gi (x̄).
Lemma 3.1. Assume A1 and A2. Let x̄ solve (N LP (xkJ )) and let
(s̄, ȳ) be the corresponding dual solution of (N LP (xkJ )-D). Then there exist
Lagrange multipliers μ̄ = −ȳ and λ̄i ≥ 0 (i ∈ I0 ∪ Ia ) that solve the KKT
conditions (3.2) in x̄ with subgradients
   
−1 ¯ −1
ξ̄i = , if s̄i0 > 0, ξi = , if s̄i0 = 0 (i ∈ I0 (x̄)).
− s̄s̄i1
i0
0

Proof. A1 and A2 guarantee the existence of a primal-dual solution (x̄, s̄, ȳ) satisfying the primal-dual optimality system (cf. Alizadeh and Goldfarb [7])

    c_i − (A_i^T, (I_J)_i^T) ȳ = s̄_i,                   i = 1, ..., noc,     (3.3)
    A x̄ = b,   I_J x̄ = x_J^k,                                               (3.4)
    x̄_{i0} ≥ ‖x̄_{i1}‖,   s̄_{i0} ≥ ‖s̄_{i1}‖,            i = 1, ..., noc,     (3.5)
    s̄_i^T x̄_i = 0,                                      i = 1, ..., noc.     (3.6)

Since (NLP(x_J^k)) is convex and due to A2, there also exist Lagrange multipliers μ̄, λ̄ ∈ R^{noc}, such that x̄ satisfies the KKT conditions (3.2) with elements ξ̄_i ∈ ∂g_i(x̄). We now compare both optimality systems to each other.
First, we consider i ∉ I_0 ∪ I_a and thus x̄_i ∈ int(K_i). Lemma 2.2, part 1 induces s̄_i = (0, ..., 0)^T. Conditions (3.3) for i ∉ I_0 ∪ I_a are thus equal to c_i − (A_i^T, (I_J)_i^T) ȳ = 0, and hence μ̄ = −ȳ satisfies the KKT condition (3.2) for i ∉ I_0 ∪ I_a.
Next we consider i ∈ I_a(x̄), where x̄_i ∈ bd(K_i) \ {0}. Lemma 2.2, part 2 yields

    s̄_i = (γ ‖x̄_{i1}‖, −γ x̄_{i1}^T)^T = γ (x̄_{i0}, −x̄_{i1}^T)^T              (3.7)

for i ∈ I_a(x̄). Inserting ∇g_i(x̄) = (−1, x̄_{i1}^T/‖x̄_{i1}‖)^T for i ∈ I_a into (3.2) yields the existence of λ_i ≥ 0 such that

    c_i + (A_i^T, (I_J)_i^T) μ̄ = λ_i (1, −x̄_{i1}^T/‖x̄_{i1}‖)^T,   i ∈ I_a(x̄).   (3.8)

Insertion of (3.7) into (3.3) and comparison with (3.8) yields the existence of γ ≥ 0 such that μ̄ = −ȳ and λ̄_i = γ x̄_{i0} = γ ‖x̄_{i1}‖ ≥ 0 satisfy the KKT conditions (3.2) for i ∈ I_a(x̄).
For i ∈ I_0(x̄), condition (3.2) is satisfied by μ̄, λ̄_i ≥ 0 and subgradients ξ̄_i of the form ξ̄_i = (−1, v^T)^T, ‖v‖ ≤ 1. Since we have already fixed μ̄ = −ȳ, we look for a suitable v and λ̄_i ≥ 0 satisfying c_i − (A_i^T, (I_J)_i^T) ȳ = λ̄_i (1, −v^T)^T for i ∈ I_0(x̄). Comparing this condition with (3.3) yields that if ‖s̄_{i1}‖ > 0, then λ̄_i = s̄_{i0} and −v = s̄_{i1}/s̄_{i0} satisfy condition (3.2) for i ∈ I_0(x̄). Since s̄_{i0} ≥ ‖s̄_{i1}‖ we obviously have λ̄_i ≥ 0 and ‖v‖ = ‖s̄_{i1}/s̄_{i0}‖ = (1/s̄_{i0}) ‖s̄_{i1}‖ ≤ 1. If ‖s̄_{i1}‖ = 0, the required condition (3.2) is satisfied by λ̄_i = s̄_{i0} and −v = (0, ..., 0)^T.
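In an implementation, Lemma 3.1 translates into a simple recipe: read the dual solution s̄ of (NLP(x_J^k)) off the conic solver and assemble one linearization per active cone. The helper below is a hypothetical sketch of ours (names and data layout are assumptions); called for an active cone i, it returns coefficients (c_0, c_1) of a cut c_0 x_{i0} + c_1^T x_{i1} ≤ 0 matching the corresponding rows of the master problem in Section 5.

    import numpy as np

    def oa_cut(x_bar_i, s_bar_i, tol=1e-8):
        # x_bar_i, s_bar_i: primal and dual blocks of an *active* cone i
        xb1 = x_bar_i[1:]
        sb0, sb1 = s_bar_i[0], s_bar_i[1:]
        if np.linalg.norm(xb1) > tol:             # i in I_a: gradient-based cut
            return -np.linalg.norm(xb1), xb1
        if sb0 > tol:                             # i in I_0 with s_bar_i0 > 0
            return -1.0, -sb1 / sb0
        return -1.0, np.zeros_like(xb1)           # i in I_0 with s_bar_i0 = 0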

4. Infeasible nonlinear subproblems. If the nonlinear program (NLP(x_J^k)) is infeasible for x_J^k, the algorithm solves a feasibility problem of the form

    min   u
    s.t.  Ax = b,
          −x_{i0} + ‖x_{i1}‖ ≤ u,   i = 1, ..., noc,      (F(x_J^k))
          u ≥ 0,
          x_J = x_J^k.

It has the property that its optimal solution (x̄, ū) minimizes the maximal violation of the conic constraints. The dual program of (F(x_J^k)) is

    max   (b^T, x_J^{k,T}) y
    s.t.  −(A^T, I_J^T) y + s = 0,
          s_u + Σ_{i=1}^{noc} s_{i0} = 1,                 (F(x_J^k)-D)
          ‖s_{i1}‖ ≤ s_{i0},   i = 1, ..., noc,
          s_u ≥ 0.
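For concreteness, (F(x_J^k)) is an ordinary SOCP and can be stated directly in a conic modeling tool. The sketch below uses CVXPY purely as an illustration (an assumption of ours; the paper solves its subproblems with the authors' own interior-point code). Here dims holds the cone dimensions k_i, and J, xJ_fix encode the integer fixing:

    import cvxpy as cp

    def solve_feasibility(A, b, dims, J, xJ_fix):
        # models and solves (F(x_J^k)); returns (x_bar, u_bar)
        n = A.shape[1]
        x = cp.Variable(n)
        u = cp.Variable(nonneg=True)
        cons = [A @ x == b, x[J] == xJ_fix]
        off = 0
        for k in dims:    # -x_i0 + ||x_i1|| <= u  <=>  ||x_i1|| <= x_i0 + u
            cons.append(cp.SOC(x[off] + u, x[off + 1:off + k]))
            off += k
        cp.Problem(cp.Minimize(u), cons).solve()
        return x.value, u.value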

We define the index sets of active constraints in a solution (x̄, ū) of (F(x_J^k)):

    I_F  := I_F(x̄)  := {i ∈ {1, ..., noc} : −x̄_{i0} + ‖x̄_{i1}‖ = ū},
    I_F0 := I_F0(x̄) := {i ∈ I_F : ‖x̄_{i1}‖ = 0},                        (4.1)
    I_F1 := I_F1(x̄) := {i ∈ I_F : ‖x̄_{i1}‖ ≠ 0}.

One necessity for convergence of the outer approximation approach is the following. Analogously to the feasible case, the solution of the feasibility problem (F(x_J^k)) must tighten the outer approximation such that the current integer assignment x_J^k is no longer feasible for the linear outer approximation. For this purpose, we identify subgradients ξ_i ∈ ∂g_i(x̄) at the solution (ū, x̄) of (F(x_J^k)) that satisfy the KKT conditions of (F(x_J^k))

    A_i^T μ_A + (I_J)_i^T μ_J = 0,                        i ∉ I_F,     (4.2)
    ∇g_i(x̄_i) λ_{g_i} + A_i^T μ_A + (I_J)_i^T μ_J = 0,   i ∈ I_F1,    (4.3)
    ξ_i λ_{g_i} + A_i^T μ_A + (I_J)_i^T μ_J = 0,          i ∈ I_F0,    (4.4)
    Σ_{i∈I_F} λ_{g_i} = 1.                                             (4.5)

Lemma 4.1. Assume A1 and A2 hold. Let (x̄, ū) solve (F(x_J^k)) with ū > 0 and let (s̄, ȳ) be the solution of its dual program (F(x_J^k)-D). Then there exist Lagrange multipliers μ̄ = −ȳ and λ̄_i ≥ 0 (i ∈ I_F) that solve the KKT conditions in (x̄, ū) with subgradients

    ξ_i = (−1, −s̄_{i1}^T/s̄_{i0})^T  if s̄_{i0} > 0,     ξ_i = (−1, 0^T)^T  if s̄_{i0} = 0     (4.6)

for i ∈ I_F0(x̄).

Proof. Since (F(x_J^k)) has interior points, there exist Lagrange multipliers μ = (μ_A, μ_J), λ ≥ 0, such that the optimal solution (x̄, ū) of (F(x_J^k)) satisfies the KKT conditions (4.2)-(4.5) with ξ_i ∈ ∂g_i(x̄_i), plus the feasibility conditions; we have already used the complementarity conditions for ū > 0 and for the inactive constraints. Due to the nonempty interior of (F(x_J^k)), (x̄, ū) also satisfies the primal-dual optimality system

    A x̄ = b,   ū ≥ 0,
    −A_i^T ȳ_A − (I_J)_i^T ȳ_J = s̄_i,                  i = 1, ..., noc,     (4.7)
    x̄_{i0} + ū ≥ ‖x̄_{i1}‖,   Σ_{i=1}^{noc} s̄_{i0} = 1,                     (4.8)
    s̄_{i0} ≥ ‖s̄_{i1}‖,                                 i = 1, ..., noc,     (4.9)
    s̄_{i0}(x̄_{i0} + ū) + s̄_{i1}^T x̄_{i1} = 0,         i = 1, ..., noc,     (4.10)

where we again used complementarity for ū > 0.
First we investigate i ∉ I_F. In this case x̄_{i0} + ū > ‖x̄_{i1}‖ induces s̄_i = (0, ..., 0)^T (cf. Lemma 2.2, part 1). Thus, the KKT conditions (4.2) are satisfied by μ_A = −ȳ_A and μ_J = −ȳ_J.
Next, we consider i ∈ I_F1, for which by definition x̄_{i0} + ū = ‖x̄_{i1}‖ > 0 holds. Lemma 2.2, part 2 states that there exists γ ≥ 0 with s̄_{i0} = γ (x̄_{i0} + ū) = γ ‖x̄_{i1}‖ and s̄_{i1} = −γ x̄_{i1}. Insertion into (4.7) yields

    −A_i^T ȳ_A − (I_J)_i^T ȳ_J + γ ‖x̄_{i1}‖ (−1, x̄_{i1}^T/‖x̄_{i1}‖)^T = 0,   i ∈ I_F1.

Since ∇g_i(x̄_i) = (−1, x̄_{i1}^T/‖x̄_{i1}‖)^T, we obtain that the KKT condition (4.3) is satisfied by μ_A = −ȳ_A, μ_J = −ȳ_J and λ_{g_i} = s̄_{i0} = γ ‖x̄_{i1}‖ ≥ 0.
Finally, we investigate i ∈ I_F0, where x̄_{i0} + ū = ‖x̄_{i1}‖ = 0. Since we have fixed μ_A = −ȳ_A, μ_J = −ȳ_J, we derive a subgradient ξ_i that satisfies (4.4) with that choice. In analogy to Lemma 3.1 from Section 3 we derive that ξ_i = (−1, ξ_{i1}^T)^T with ξ_{i1} = −s̄_{i1}/s̄_{i0} if s̄_{i0} > 0 and ξ_{i1} = 0 otherwise is suitable, together with λ_{g_i} = s̄_{i0} ≥ 0.
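The bookkeeping is the same as after Lemma 3.1: for i ∈ I_F1 the gradient cut −‖x̄_{i1}‖ x_{i0} + x̄_{i1}^T x_{i1} ≤ 0 is a positive multiple of the corresponding constraint used later in (5.6), and for i ∈ I_F0 the subgradient cuts coincide in form with those of the feasible case. A hypothetical wrapper around the oa_cut helper sketched in Section 3 (our own naming, not the paper's code):

    def feasibility_cuts(x_bar_blocks, s_bar_blocks, I_F):
        # one cut per active cone i in I_F = I_F0 u I_F1 of (F(x_J^k))
        return {i: oa_cut(x_bar_blocks[i], s_bar_blocks[i]) for i in I_F}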
5. The algorithm. Let T ⊂ R^n contain solutions of nonlinear subproblems (NLP(x_J^k)) and let S ⊂ R^n contain solutions of feasibility problems (F(x_J^k)). We build a linear outer approximation of (1.1) based on subgradient based linearizations of the form (2.1), using the subgradients specified in Lemmas 3.1 and 4.1. This gives rise to the mixed integer linear outer approximation problem

    min  c^T x
    s.t. Ax = b,
         c^T x < c^T x̄,                                x̄ ∈ T, x̄_J ∈ Z^{|J|},
         −‖x̄_{i1}‖ x_{i0} + x̄_{i1}^T x_{i1} ≤ 0,       i ∈ I_a(x̄), x̄ ∈ T,
         −‖x̄_{i1}‖ x_{i0} + x̄_{i1}^T x_{i1} ≤ 0,       i ∈ I_F1(x̄), x̄ ∈ S,
         −x_{i0} ≤ 0,                                  i ∈ I_0(x̄), s̄_{i0} = 0, x̄ ∈ T,      (MIP(T,S))
         −x_{i0} − (1/s̄_{i0}) s̄_{i1}^T x_{i1} ≤ 0,     i ∈ I_0(x̄), s̄_{i0} > 0, x̄ ∈ T,
         −x_{i0} − (1/s̄_{i0}) s̄_{i1}^T x_{i1} ≤ 0,     i ∈ I_F0(x̄), s̄_{i0} > 0, x̄ ∈ S,
         −x_{i0} ≤ 0,                                  i ∈ I_F0(x̄), s̄_{i0} = 0, x̄ ∈ S,
         x_j ∈ [l_j, u_j],   (j ∈ J),
         x_j ∈ Z,            (j ∈ J).

The idea of outer approximation based algorithms is to use such a linear outer approximation (MIP(T,S)) of the original problem (1.1) to produce integer assignments. For each integer assignment, the nonlinear subproblem (NLP(x_J^k)) is solved, generating feasible solutions for (1.1) as long as (NLP(x_J^k)) is feasible. We define nodes N^k consisting of lower and upper bounds on the integer variables, which can be interpreted as branch-and-bound nodes for (1.1) as well as for (MIP(T,S)). We associate the following problems with N^k:

    (MISOC^k)       mixed integer SOCP with the bounds of N^k,
    (SOC^k)         continuous relaxation of (MISOC^k),
    (MIP^k(T,S))    MIP outer approximation of (MISOC^k),
    (LP^k(T,S))     continuous relaxation of (MIP^k(T,S)).

Thus, if (LP^k(T,S)) is infeasible, then (SOC^k), (MISOC^k) and (MIP^k(T,S)) are also infeasible. The optimal objective function value of (LP^k(T,S)) is less than or equal to the optimal objective function values of (MIP^k(T,S)) and (SOC^k), respectively, and these are less than or equal to the optimal objective function value of (MISOC^k). Thus, the algorithm stops searching the subtree rooted at N^k either if the problem itself or its outer approximation becomes infeasible, or if the objective function value of (MIP^k(T,S)) exceeds the optimal function value of a known feasible solution of (1.1). The latter case is expressed by the condition c^T x < c^T x̄, ∀ x̄ ∈ T, in (MIP^k(T,S)). The following hybrid algorithm integrates branch-and-bound and the outer approximation approach, as proposed by Bonami et al. in [8] for convex differentiable MINLPs.
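The pruning test just described can be written as a one-liner (our own sketch; the names are assumptions):

    def prune(node_lp_feasible, node_lp_value, cub):
        # fathom N^k if its LP outer approximation is infeasible
        # or its bound is dominated by the current upper bound
        return (not node_lp_feasible) or (node_lp_value >= cub)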
Algorithm 1: Hybrid OA/B&B for (1.1)

Input: Problem (1.1)
Output: Optimal solution x* or indication of infeasibility.
Initialization: CUB := ∞; solve (SOC^0) with solution x^0.
    if ((SOC^0) infeasible) STOP, problem infeasible
    else set S := ∅, T := {x^0} and solve (MIP(T,S))
    endif
1.  if ((MIP(T,S)) infeasible) STOP, problem infeasible
    else solution x^(1) found:
        if ((NLP(x_J^(1))) feasible)
            compute solution x̄ of (NLP(x_J^(1))), T := T ∪ {x̄},
            if (c^T x̄ < CUB) CUB := c^T x̄, x* := x̄ endif
        else compute solution x̄ of (F(x_J^(1))), S := S ∪ {x̄}
        endif
    endif
    Nodes := {N^0 = (lb^0 = l, ub^0 = u)}, ll := 0, L := 10, i := 0
2.  while Nodes ≠ ∅ do: select N^k from Nodes, Nodes := Nodes \ {N^k}
    2a. if (ll ≡ 0 mod L) solve (SOC^k)
            if ((SOC^k) feasible): solution x̄, T := T ∪ {x̄}
                if (x̄_J integer):
                    if (c^T x̄ < CUB) CUB := c^T x̄, x* := x̄ endif
                    go to 2.
                endif
            else go to 2.
            endif
        endif
    2b. solve (LP^k(T,S)) with solution x^k
        while ((LP^k(T,S)) feasible) & (x_J^k integer) & (c^T x^k < CUB)
            if ((NLP(x_J^k)) feasible with solution x̄) T := T ∪ {x̄},
                if (c^T x̄ < CUB) CUB := c^T x̄, x* := x̄ endif
            else solve (F(x_J^k)) with solution x̄, S := S ∪ {x̄}
            endif
            compute solution x^k of updated (LP^k(T,S))
        endwhile
    2c. if (c^T x^k < CUB) branch on a fractional variable x_j^k (j ∈ J):
            create N^(i+1) = N^k with ub_j^(i+1) = ⌊x_j^k⌋,
            create N^(i+2) = N^k with lb_j^(i+2) = ⌈x_j^k⌉,
            set i := i + 2, ll := ll + 1.
        endif
    endwhile
Note that if L = 1, then x^k is set to x̄, Step 2b is omitted, and Step 2 performs a nonlinear branch-and-bound search; if L = ∞, Algorithm 1 resembles an LP/NLP-based branch-and-bound algorithm. Convergence of the outer approximation approach in the case of continuously differentiable constraint functions was shown in [1], Theorem 2. We now state convergence of Algorithm 1 for subdifferentiable SOCP constraints.
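To summarize the control flow compactly, the following skeleton (our own sketch; the four solver callbacks are placeholders, and it shows the pure outer approximation variant corresponding to L = ∞ rather than the full node management of Algorithm 1) iterates between the master problem and the nonlinear subproblems. Under A1 and A2, Lemmas 5.1 and 5.2 below guarantee that no integer assignment is generated twice, so the loop is finite.

    import math

    def oa_loop(solve_soc_relax, solve_master, solve_nlp_fixed, solve_feas):
        # each callback returns a pair (x, value) or None if infeasible;
        # solve_master receives the cut pools T, S and the current bound
        T, S, cub, x_best = [], [], math.inf, None
        first = solve_soc_relax()                 # continuous relaxation (SOC^0)
        if first is None:
            return None                           # (1.1) is infeasible
        T.append(first)
        while True:
            master = solve_master(T, S, cub)      # integer point of (MIP(T,S))
            if master is None:
                return x_best                     # master infeasible: done
            xk, _ = master
            sub = solve_nlp_fixed(xk)             # (NLP(x_J^k)), integers fixed
            if sub is not None:
                T.append(sub)                     # cuts via I_0, I_a linearizations
                if sub[1] < cub:
                    cub, x_best = sub[1], sub[0]
            else:
                S.append(solve_feas(xk))          # (F(x_J^k)) cuts separate x_J^k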
For this purpose, we first prove that the last integer assignment xkJ is
infeasible in the outer approximation conditions induced by the solution of
a feasible subproblem (N LP (xkJ )).
Lemma 5.1. Assume A1 and A2 hold, and let (NLP(x_J^k)) be feasible with optimal solution x̄ and dual solution (s̄, ȳ). Then every x with x_J = x_J^k satisfying the constraints Ax = b and

    −‖x̄_{i1}‖ x_{i0} + x̄_{i1}^T x_{i1} ≤ 0,     i ∈ I_a(x̄),
    −x_{i0} ≤ 0,                                 i ∈ I_0(x̄), s̄_{i0} = 0,       (5.1)
    −x_{i0} − (1/s̄_{i0}) s̄_{i1}^T x_{i1} ≤ 0,   i ∈ I_0(x̄), s̄_{i0} > 0,

where I_a and I_0 are defined by (3.1), satisfies c^T x ≥ c^T x̄.

Proof. Assume x with x_J = x̄_J satisfies Ax = b and (5.1). Writing J̄ := {1, ..., n} \ J for the continuous index set, this means

    (∇g_i(x̄))_{J̄}^T (x_{J̄} − x̄_{J̄}) ≤ 0,    i ∈ I_a(x̄),     (5.2)
    (ξ̄_i)_{J̄}^T (x_{J̄} − x̄_{J̄}) ≤ 0,        i ∈ I_0(x̄),     (5.3)
    A(x − x̄) = 0,                                              (5.4)

with ξ̄_i from Lemma 3.1, where the last equation follows from Ax̄ = b. Due to A2 we know that there exist μ and λ ∈ R_+^{|I_0 ∪ I_a|} satisfying the KKT conditions (3.2) of (NLP(x_J^k)) in x̄, that is,

    −c_i = (A_i^T, (I_J)_i^T) μ + λ_i ξ̄_i,        i ∈ I_0(x̄),
    −c_i = (A_i^T, (I_J)_i^T) μ + λ_i ∇g_i(x̄),    i ∈ I_a(x̄),              (5.5)
    −c_i = (A_i^T, (I_J)_i^T) μ,                   i ∉ I_0(x̄) ∪ I_a(x̄),

with the subgradients ξ̄_i chosen from Lemma 3.1. Farkas' Lemma (cf. [20]) states that (5.5) is equivalent to the fact that as long as x − x̄ satisfies (5.2)-(5.4), then c_{J̄}^T (x_{J̄} − x̄_{J̄}) ≥ 0 ⇔ c_{J̄}^T x_{J̄} ≥ c_{J̄}^T x̄_{J̄} must hold; together with x_J = x̄_J this yields c^T x ≥ c^T x̄.
In the case that (NLP(x_J^k)) is infeasible, we can show that the subgradients (4.6) of Lemma 4.1, together with the gradients of the differentiable functions g_i in the solution of (F(x_J^k)), provide inequalities that separate the last integer solution.
Lemma 5.2. Assume A1 and A2 hold. If (NLP(x_J^k)) is infeasible and thus (x̄, ū) solves (F(x_J^k)) with positive optimal value ū > 0, then every x satisfying the linear equalities Ax = b with x_J = x_J^k is infeasible in the constraints

    −x_{i0} + x̄_{i1}^T x_{i1}/‖x̄_{i1}‖ ≤ 0,    i ∈ I_F1(x̄),
    −x_{i0} − s̄_{i1}^T x_{i1}/s̄_{i0} ≤ 0,      i ∈ I_F0, s̄_{i0} ≠ 0,     (5.6)
    −x_{i0} ≤ 0,                                i ∈ I_F0, s̄_{i0} = 0,

where I_F1 and I_F0 are defined by (4.1) and (s̄, ȳ) is the solution of the dual program (F(x_J^k)-D) of (F(x_J^k)).

Proof. The proof is done in analogy to Lemma 1 in [1]. Due to assumptions A1 and A2, the optimal solution of (F(x_J^k)) is attained. We further know from Lemma 4.1 that there exist λ_{g_i} ≥ 0 with Σ_{i∈I_F} λ_{g_i} = 1, μ_A and μ_J satisfying the KKT conditions

    Σ_{i∈I_F1} ∇g_i(x̄) λ_{g_i} + Σ_{i∈I_F0} ξ_i^n λ_{g_i} + A^T μ_A + I_J^T μ_J = 0     (5.7)

in x̄, with subgradients ξ_i^n as in (4.6). To show the result of the lemma, we assume now that x with x_J = x_J^k satisfies Ax = b and conditions (5.6), which are equivalent to

    g_i(x̄) + ∇g_i(x̄)^T (x − x̄) ≤ 0,    i ∈ I_F1(x̄),
    g_i(x̄) + ξ_i^{n,T} (x − x̄) ≤ 0,     i ∈ I_F0(x̄).

We multiply these inequalities by λ_{g_i} ≥ 0 and add them. Since g_i(x̄) = ū for i ∈ I_F and Σ_{i∈I_F} λ_{g_i} = 1, we get

    Σ_{i∈I_F1} (λ_{g_i} ū + λ_{g_i} ∇g_i(x̄)^T (x − x̄)) + Σ_{i∈I_F0} (λ_{g_i} ū + λ_{g_i} ξ_i^{n,T} (x − x̄)) ≤ 0
    ⇔ ū + (Σ_{i∈I_F1} λ_{g_i} ∇g_i(x̄) + Σ_{i∈I_F0} λ_{g_i} ξ_i^n)^T (x − x̄) ≤ 0.

Insertion of (5.7) yields

    ū + (−A^T μ_A − I_J^T μ_J)^T (x − x̄) ≤ 0
    ⇔ ū − μ_J^T (x_J − x̄_J) ≤ 0       (using Ax = Ax̄ = b)
    ⇔ ū ≤ 0                           (using x_J = x_J^k = x̄_J).

This is a contradiction to the assumption ū > 0.


Thus, the solution x̄ of (F (xkJ )) produces new constraints (5.6) that
strengthen the outer approximation such that the integer solution xkJ is
no longer feasible. If (N LP (xkJ )) is infeasible, the active set IF (x̄) is not
empty and thus, at least one constraint (5.6) can be added.
Theorem 5.1. Assume A1 and A2. Then Algorithm 1 terminates in a finite number of steps at an optimal solution of (1.1) or with the indication that it is infeasible.
Proof. We show that no integer assignment x_J^k is generated twice, by showing that x_J = x_J^k is infeasible in the linearized constraints created in the solutions of (NLP(x_J^k)) or (F(x_J^k)). Finiteness then follows from the boundedness of the feasible set. A1 and A2 guarantee the solvability, the validity of KKT conditions and the primal-dual optimality of the nonlinear subproblems (NLP(x_J^k)) and (F(x_J^k)). In the case when (NLP(x_J^k)) is feasible with solution x̄, Lemma 5.1 states that every x̃ with x̃_J = x_J^k must satisfy c^T x̃ ≥ c^T x̄ and is thus infeasible in the constraint c^T x̃ < c^T x̄ included in (LP^k(T,S)). In the case when (NLP(x_J^k)) is infeasible, Lemma 5.2 yields the result for (F(x_J^k)).
Modified algorithm avoiding A2. We now present an adaptation of Algorithm 1 that is still convergent if the convergence assumption A2 is not valid for every subproblem. Assume N^k is a node such that A2 is violated by (NLP(x_J^k)), and assume x with integer assignment x_J = x_J^k is feasible for the updated outer approximation. Then the inner while-loop in Step 2b becomes infinite and Algorithm 1 does not converge. In that case we solve the SOCP relaxation (SOC^k) in node N^k. If that problem is feasible but has no integer feasible solution, we branch on the solution of this SOCP relaxation to explore the subtree of N^k. Hence, we substitute Step 2b by the following step.

2b'. solve (LP^k(T,S)) with solution x^k, set repeat := true.
     while (((LP^k(T,S)) feasible) & (x_J^k integer) & (c^T x^k < CUB) & repeat)
         save x_J^old := x_J^k
         if ((NLP(x_J^k)) feasible with solution x̄)
             T := T ∪ {x̄},
             if (c^T x̄ < CUB) CUB := c^T x̄, x* := x̄ endif
         else compute solution x̄ of (F(x_J^k)), S := S ∪ {x̄}
         endif
         compute solution x^k of updated (LP^k(T,S))
         if (x_J^old == x_J^k) set repeat := false endif
     endwhile
     if (not repeat)
         solve nonlinear relaxation (SOC^k) at node N^k with solution x̄,
         T := T ∪ {x̄}
         if (x̄_J integer): if (c^T x̄ < CUB) CUB := c^T x̄, x* := x̄ endif
             go to 2.
         else set x^k := x̄.
         endif
     endif

Note that every subgradient of a conic constraint provides a valid linear


outer approximation of the form (2.1). Thus, in the case that we cannot
identify the subgradients satisfying the KKT system of (N LP (xkJ )), we
take an arbitrary subgradient to update the linear outer approximation
(LP k (T, S)).


Lemma 5.3. Assume A1 holds. Then Algorithm 2 (Algorithm 1 with Step 2b replaced by Step 2b') terminates in a finite number of steps at an optimal solution of (1.1) or with the indication that it is infeasible.
1, where Step 2b is replaced by 2b’, terminates in a finite number of steps
at an optimal solution of (1.1) or with the indication that it is infeasible.
Proof. If A2 is not satisfied, we have no guarantee that the lineariza-
tion in the solution x̄ of (N LP (xkJ )) separates the current integer solution.
Hence, assume the solution of (LP k (T, S)) is integer feasible with solu-
tion xk and the same integer assignment xkJ is optimal for the updated
outer approximation (LP k (T ∪ {x̄}, S)) or (LP k (T, S ∪ {x̄})). Then the
nonlinear relaxation (SOC k ) is solved at the current node N k . If the prob-
lem (SOC k ) is not infeasible and its solution is not integer, the algorithm
branches on its solution producing new nodes N i+1 and N i+2 . These nodes
are again searched using Algorithm 1 as long as the situation of repeated
integer solutions does not occur. Otherwise it is again branched on the
solution of the continuous relaxation. If this is done for a whole subtree,
the algorithm coincides with a nonlinear branch-and-bound search for this
subtree which is finite due to the boundedness of the integer variables.
Remarks. The convergence result of Theorem 5.1 can be directly ex-
tended to any outer approximation approach for (1.1) which is based on the
properties of (MIP(T,S)) (and thus (LP^k(T,S))) proved in Lemma 5.1 and
Lemma 5.2. In particular, convergence of the classical outer approximation
approach as well as the Generalized Benders Decomposition approach (cf.
[6], [4] or [27]) is naturally implied.
Furthermore, our convergence result can be generalized to arbitrary
mixed integer programming problems with subdifferentiable convex con-
straint functions, if it is possible to identify the subgradients that satisfy
the KKT system in the solutions of the associated nonlinear subproblems.

6. Numerical experiments. We implemented the modified version of the outer approximation approach, Algorithm 2 ('B&B-OA'), as well as a nonlinear branch-and-bound approach ('B&B'). The SOCP problems are solved with our own implementation of an infeasible primal-dual interior point approach (cf. [26], Chapter 1); the linear programs are solved with CPLEX 10.0.1.
We report results for mixed 0-1 formulations of different ESTP test problems (instances t 4*, t 5*) from Beasley's website [21] (cf. [16]) and for some problems arising in the context of turbine balancing (instances Test*). The data sets are available on the web [29]. Each instance was solved using the nonlinear branch-and-bound algorithm as well as Algorithm 2, once with L = 10 and once with L = 10000.
We used best-first node selection and pseudocost branching in the nonlinear branch-and-bound approach, and depth-first search together with most-fractional branching in Algorithm 2, since these performed best in earlier tests.
Table 1 gives an overview of the problem dimensions according to the notation in this paper. We also list for each problem the number of


Table 1
Problem sizes (m, n, noc, |J|) and maximal constraints of LP approximation (m oa).

Problem        n    m    noc   |J|   m oa (L=10)   m oa (L=10000)
t4 nr22 67 50 49 9 122 122
t4 nrA 67 50 49 9 213 231
t4 nrB 67 50 49 9 222 240
t4 nrC 67 50 49 9 281 272
t5 nr1 132 97 96 18 1620 1032
t5 nr21 132 97 96 18 2273 1677
t5 nrA 132 97 96 18 1698 998
t5 nrB 132 97 96 18 1717 1243
t5 nrC 132 97 96 18 1471 1104
Test07 84 64 26 11 243 243
Test07 an 84 63 33 11 170 170
Test54 366 346 120 11 785 785
Test07GF 87 75 37 12 126 110
Test54GF 369 357 131 12 1362 1362
Test07 lowb 212 145 153 56 7005 2331
Test07 lowb an 210 145 160 56 1730 308

Table 2
Number of solved SOCP/LP problems.

Problem        B&B (SOCP)   B&B-OA, L=10 (SOCP/LP)   B&B-OA, L=10000 (SOCP/LP)
t4 nr22 31 9/15 9/15
t4 nrA 31 19/39 20/40
t4 nrB 31 20/40 21/41
t4 nrC 31 26/43 25/43
t5 nr1 465 120/745 52/720
t5 nr21 613 170/957 88/1010
t5 nrA 565 140/941 50/995
t5 nrB 395 105/519 64/552
t5 nrC 625 115/761 56/755
Test07 13 8/20 8/20
Test07 an 7 5/9 5/9
Test54 7 5/9 5/9
Test07GF 41 5/39 2/35
Test54GF 37 11/68 9/63
Test07 lowb 383 392/3065 115/2599
Test07 lowb an 1127 128/1505 9/1572


Table 3
Run times in seconds.

Problem        B&B      B&B-OA (L=10)   B&B-OA (L=10000)
t4 nr22 2.83 0.57 0.63
t4 nrA 2.86 1.33 1.44
t4 nrB 2.46 1.72 1.71
t4 nrC 2.97 1.54 2.03
t5 nr1 128.96 42.22 29.58
t5 nr21 139.86 61.88 20.07
t5 nrA 128.37 53.44 17.04
t5 nrB 77.03 36.55 18.94
t5 nrC 150.79 44.57 16.11
Test07 0.42 0.28 0.26
Test07 an 0.11 0.16 0.14
Test54 4.4 1.33 1.40
Test07GF 2.26 0.41 0.24
Test54GF 32.37 6.95 5.53
Test07 lowb 244.52 499.47 134.13
Test07 lowb an 893.7 128.44 14.79

constraints of the largest LP relaxation solved during Algorithm 2 ('m oa'). As described in Algorithm 2, every time a nonlinear subproblem is solved, the number of constraints of the outer approximation problem grows by the number of conic constraints active in the solution of that nonlinear subproblem. Thus, most of the linear programs solved during the algorithm have significantly fewer constraints than 'm oa'.
For each algorithm, Table 2 displays the number of solved SOCP and LP node problems, whereas Table 3 displays the run times. A comparison of the branch-and-bound approach and Algorithm 2 on the basis of Table 2 shows that the latter solves remarkably fewer SOCP problems. Table 3 shows that for almost all test instances the branch-and-bound based outer approximation approach is preferable with regard to running times, since the LP problems stay moderate in size.
For L = 10, an additional SOCP relaxation is solved at every 10th node, whereas for L = 10000 no additional SOCP relaxations are solved on our test set. In comparison with L = 10000, the choice L = 10 solves more SOCP problems on 11 of the 16 instances and equally many on 3 of them, whereas the number of solved LP problems decreases only on 6 of the 16 instances. Moreover, the number of LPs spared by the additional SOCP solves for L = 10 is not significant in comparison with L = 10000 (compare Table 2), and the LPs for L = 10000 stay smaller in most cases, since fewer linearizations are added (compare Table 1). Hence, with regard to running times, the version with L = 10000 outperforms L = 10 on almost all test instances (compare Table 3). Thus, for the problems considered, Algorithm 2 with L = 10000, i.e., without solving additional SOCP relaxations, achieves the best performance in comparison to nonlinear branch-and-bound as well as to the variant with L = 10.
In addition to the instances considered above, we tested some of the classical portfolio optimization instances provided by Vielma et al. [28], using Algorithm 2 with L = 10000. For each problem, we report in Table 4 the dimension of the MISOCP formulation, the dimension of the largest relaxation solved by our algorithm, and the dimension of the a priori LP relaxation with accuracy 0.01 that was presented in [10]. For a better comparison, we report the number of columns plus the number of linear constraints, as is done in [10]. The dimensions of the largest LP relaxations solved by our approach are significantly smaller than the dimensions of the LP approximations solved in [10]. Furthermore, in the lifted linear programming approach of [10], every LP relaxation solved during the algorithm is of the specified dimension, whereas in our approach most of the solved LPs are much smaller than the reported maximal dimension. In Table 5 we report the run times and the number of node problems solved by our algorithm (Alg. 2). For the sake of completeness we added the average and maximal run times reported in [10], although this is not a fully appropriate comparison, since the algorithms have not been tested on similar machines. Moreover, since our implementation of an interior-point SOCP solver is not as efficient as a commercial solver like CPLEX, which is used in [10], a comparison of times is also difficult. However, the authors of [10] report that solving their LP relaxations usually takes longer than solving the associated SOCP relaxation. Thus, we can assume that, due to the low dimensions of the LPs solved in our approach and the moderate number of SOCPs, our approach is likely to be faster when using a more efficient SOCP solver.

7. Summary. We presented a branch-and-bound based outer approx-


imation approach using subgradient based linearizations. We proved con-
vergence under a constraint qualification that guarantees strong duality of
the occurring subproblems and extended the algorithm such that this as-
sumption can be relaxed. We presented numerical experiments for some ap-
plication problems investigating the performance of the approach in terms
of solved linear and second order cone subproblems as well as run times.
We also investigated the sizes of the linear approximation problems.
Comparison to a nonlinear branch-and-bound algorithm showed that
the outer approximation approach solves almost all problems in signifi-
cantly shorter running times and that its performance is best when not
solving additional SOCP relaxations. In comparison to the outer approximation based approach by Vielma et al. in [10], we observed that the dimensions of our LP relaxations are significantly smaller, which makes our approach competitive.


Table 4
Dimension (m+n) and maximal LP approximation (m oa + n) (portfolio instances).

Problem          n+m   |J|   n + m oa [Alg.2]   rows+cols [10]
classical 20 0 105 20 137 769
classical 20 3 105 20 129 769
classical 20 4 105 20 184 769
classical 30 0 155 30 306 1169
classical 30 1 155 30 216 1169
classical 30 3 155 30 207 1169
classical 30 4 155 30 155 1169
classical 40 0 205 40 298 1574
classical 40 1 205 40 539 1574
classical 40 3 205 40 418 1574
classical 50 2 255 50 803 1979
classical 50 3 255 50 867 1979

Table 5
Run times and node problems (portfolio instances).

Problem          Sec. [Alg.2]   Nodes [Alg.2] (SOCP/LP)   Sec. [10] (average)   Sec. [10] (max)
classical 20 0 2.62 10/75 0.29 1.06
classical 20 3 0.62 2/9 0.29 1.06
classical 20 4 5.70 32/229 0.29 1.06
classical 30 0 52.70 119/2834 1.65 27.00
classical 30 1 16.13 30/688 1.65 27.00
classical 30 3 8.61 20/247 1.65 27.00
classical 30 4 0.41 1/0 1.65 27.00
classical 40 0 46.07 51/1631 14.84 554.52
classical 40 1 361.62 292/15451 14.84 554.52
classical 40 3 138.28 171/5222 14.84 554.52
classical 50 2 779.74 496/19285 102.88 1950.81
classical 50 3 1279.61 561/36784 102.88 1950.81

Acknowledgements. We would like to thank the referees for their


constructive comments that were very helpful to improve this paper.

REFERENCES

[1] R. Fletcher and S. Leyffer, Solving Mixed Integer Nonlinear Programs by


Outer Approximation, in Mathematical Programming, 1994, 66: 327–349.
[2] R.A. Stubbs and S. Mehrotra, A branch-and-cut method for 0-1 mixed convex
programming in Mathematical Programming, 1999, 86: 515–532.


[3] I. Quesada and I.E. Grossmann, An LP/NLP based Branch and Bound Algorithm
for Convex MINLP Optimization Problems, in Computers and Chemical En-
gineering, 1992, 16(10, 11): 937–947.
[4] A.M. Geoffrion, Generalized Benders Decomposition, in Journal of Optimization
Theory and Applications, 1972, 10(4): 237–260.
[5] M.T. Çezik and G. Iyengar, Cuts for Mixed 0-1 Conic Programming, in Math-
ematical Programming, Ser. A, 2005, 104: 179–200.
[6] M.A. Duran and I.E. Grossmann, An Outer-Approximation Algorithm for a
Class of Mixed-Integer Nonlinear Programs, in Mathematical Programming,
1986, 36: 307–339.
[7] F. Alizadeh and D. Goldfarb, Second-Order Cone Programming, RUTCOR,
Rutgers Center for Operations Research, Rutgers University, New Jersey, 2001.
[8] P. Bonami, L.T. Biegler, A.R. Conn, G. Cornuejols, I.E. Grossmann, C.D.
Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An
Algorithmic Framework for Convex Mixed Integer Nonlinear Programs, IBM
Research Division, New York, 2005.
[9] R.A. Stubbs and S. Mehrotra, Generating Convex Polynomial Inequalities for
Mixed 0-1 Programs, Journal of global optimization, 2002, 24: 311–332.
[10] J.P. Vielma, S. Ahmed, and G.L. Nemhauser, A Lifted Linear Programming
Branch-and-Bound Algorithm for Mixed Integer Conic Quadratic Programs,
INFORMS Journal on Computing, 2008, 20(3): 438–450.
[11] A. Atamtürk and V. Narayanan, Lifting for Conic Mixed-Integer Programming,
BCOL Research report 07.04, 2007.
[12] A. Atamtürk and V. Narayanan, Cuts for Conic Mixed-Integer Programming,
Mathematical Programming, Ser. A, DOI 10.1007/s10107-008-0239-4, 2007.
[13] A. Ben-Tal and A. Nemirovski, On Polyhedral Approximations of the Second-
Order Cone, in Mathematics of Operations Research, 2001, 26(2): 193–205.
[14] E. Balas, S. Ceria, and G. Cornuéjols, A lift-and-project cutting plane al-
gorithm for mixed 0-1 programs, in Mathematical Programming, 1993, 58:
295–324.
[15] M. Fampa and N. Maculan, A new relaxation in conic form for the Euclidean
Steiner Tree Problem in Rn , in RAIRO Operations Research, 2001, 35:
383–394.
[16] J. Soukup and W.F. Chow, Set of test problems for the minimum length connec-
tion networks, in ACM SIGMAP Bulletin, 1973, 15: 48–51.
[17] D. Bertsimas and R. Shioda, Algorithm for cardinality-constrained quadratic
optimization, in Computational Optimization and Applications, 2007, 91:
239–269.
[18] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Con-
vex Programming, SIAM Studies in Applied Mathematics, 2001.
[19] R.T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[20] C. Geiger and C. Kanzow, Theorie und Numerik restringierter Optimierungsauf-
gaben, Springer Verlag Berlin Heidelberg New York, 2002.
[21] J.E. Beasley, OR Library: Collection of test data for Euclidean Steiner Tree Prob-
lems, http://people.brunel.ac.uk/~mastjjb/jeb/orlib/esteininfo.html.
[22] P. Belotti, P. Bonami, J.J. Forrest, L. Ladanyi, C. Laird, J. Lee, F. Mar-
got, and A. Wächter, BonMin, http://www.coin-or.org/Bonmin/ .
[23] R. Fletcher and S. Leyffer, User Manual of filterSQP,
http://www.mcs.anl.gov/~leyffer/papers/SQP_manual.pdf.
[24] C. Laird and A. Wächter, IPOPT, https://projects.coin-or.org/Ipopt.
[25] K. Abhishek, S. Leyffer, and J.T. Linderoth, FilMINT: An Outer
Approximation-Based Solver for Nonlinear Mixed Integer Programs, Argonne
National Laboratory, Mathematics and Computer Science Division, 2008.
[26] S. Drewes, Mixed Integer Second Order Cone Programming, PhD Thesis, June,
2009.


[27] P. Bonami, M. Kilinc, and J. Linderoth, Algorithms and Software for Convex
Mixed Integer Nonlinear Programs, 2009.
[28] J.P. Vielma, Portfolio Optimization Instances,
http://www2.isye.gatech.edu/~jvielma/portfolio/.
[29] S. Drewes, MISOCP Test Instances,
https://www3.mathematik.tu-darmstadt.de/index.php?id=491.

PERSPECTIVE REFORMULATION AND APPLICATIONS
OKTAY GÜNLÜK∗ AND JEFF LINDEROTH†

∗Mathematical Sciences Department, IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA ([email protected]).
†Department of Industrial and Systems Engineering, University of Wisconsin-Madison, 1513 University Avenue, Madison, WI 53706, USA ([email protected]). The second author was supported by the US Department of Energy under grants DE-FE02-08ER25861 and DE-FG02-09ER25869, and the National Science Foundation under grant CCF-0830153.

Abstract. In this paper we survey recent work on the perspective reformulation


approach that generates tight, tractable relaxations for convex mixed integer nonlin-
ear programs (MINLP)s. This preprocessing technique is applicable to cases where the
MINLP contains binary indicator variables that force continuous decision variables to
take the value 0, or to belong to a convex set. We derive from first principles the perspec-
tive reformulation, and we discuss a variety of practical MINLPs whose relaxation can
be strengthened via the perspective reformulation. The survey concludes with comments
and computations comparing various algorithmic techniques for solving perspective re-
formulations.

Key words. Mixed-integer nonlinear programming, perspective functions.

AMS(MOS) subject classifications. 90C11, 90C30.

1. Introduction. Over the past two decades, tremendous advances


have been made in the ability to solve mixed integer linear programs
(MILP)s. A fundamental reason for the vast improvement is the ability to
build tight, tractable relaxations of MILPs. The relaxations are built either
via problem reformulation (automatically during a preprocessing phase), or
dynamically through the addition of cutting planes. In this paper we sur-
vey a collection of techniques for obtaining tight relaxations to (convex)
Mixed Integer Nonlinear Programs (MINLP)s. We call these preprocessing
techniques the perspective reformulation, since they rely on replacing the
original convex function in the formulation with its so-called perspective.
1.1. Motivation. Consider a 0-1 MINLP of the form

$$\min_{(x,z) \in F} c(x,z) \qquad (1.1)$$

where $F = R \cap (\mathbb{R}^{n-p}_+ \times \mathbb{B}^p)$, $\mathbb{B}$ denotes $\{0,1\}$, and

$$R \overset{\text{def}}{=} \big\{ (x,z) \in \mathbb{R}^{n-p}_+ \times [0,1]^p \;\big|\; f_j(x,z) \le 0 \;\; \forall j = 1,\ldots,m \big\}.$$

We call the set R a continuous relaxation of F , and we emphasize that R is


not unique in the sense that F can have many different continuous relax-
ations. Throughout, we will be interested in sets R that are convex. Even
under the convexity requirement, the set R is not unique, and any convex
set $R$ with the property that $R \cap (\mathbb{R}^{n-p}_+ \times \mathbb{B}^p) = F$ is a valid continuous
relaxation of $F$. If we let $\operatorname{conv}(F)$ denote the convex hull of $F$, then clearly
all continuous relaxations of $F$ that are convex must contain $\operatorname{conv}(F)$,
and therefore $\operatorname{conv}(F)$ is the smallest convex continuous relaxation of $F$.
In the field of MILP, significant effort is spent on obtaining tight re-
laxations of the feasible set of solutions. This effort is justified by the fact
that the optimization problem simply becomes a linear programming (LP)
problem if conv(F) can be explicitly described with linear inequalities. No-
tice that as the objective function is linear, it is easy to find an optimal
solution that is an extreme point of conv(F), which is guaranteed to be in
F as all extreme points of conv(F) are in F.
In MINLP, on the other hand, the optimal solution to the relaxation
may occur at a point interior to conv(F) and as such, it is not guaranteed
to be integral. It is, however, easy to transform any problem to one with
a linear objective function by moving the nonlinear objective function into
the constraints. Specifically, the problem (1.1) can be equivalently stated as
min{η | η ∈ R, (x, z) ∈ F, η ≥ c(x, z)}. (1.2)
We can, therefore, without loss of generality assume that the objective
function of (1.1) is linear. Notice that, under this assumption, it is possible
to solve the MINLP as a convex nonlinear programming (NLP) problem
if conv(F) can be explicitly described using convex functions. In general,
an explicit description of conv(F) is hard to produce and unlike the linear
case, is not necessarily unique.
In this paper, we review the perspective reformulation approach that,
given a MINLP with an associated continuous relaxation R (perhaps after
applying the transformation (1.2)), produces a smaller (tighter) continu-
ous relaxation R that contains conv(F). The advantage of having tight
relaxations is that as they approximate F better, they give better lower
bounds, and they are more effective in obtaining optimal integral solutions
via an enumeration algorithm.
1.2. The importance of formulation. To emphasize the impor-
tance of a tight relaxation, consider the well-known uncapacitated facility
location problem (UFLP). Modeling the UFLP as a MILP is a canonical
example taught to nearly all integer programming students to demonstrate
the impact of a “good” versus “poor” formulation. In the UFLP, each cus-
tomer in a set J must have his demand met from some facilities in a set I.
A binary variable zi indicates if the facility i is open, and the continuous
variable xij represents the percentage of customer j’s demands met from
facility i.
The logical relationships that a customer may only be served from an
open facility may be written algebraically in “aggregated” form

$$\sum_{j \in J} x_{ij} \le |J|\, z_i \quad \forall i \in I$$


or in “disaggregated” form

xij ≤ zi ∀i ∈ I, j ∈ J. (1.3)

Writing the constraints in disaggregated form (1.3) makes a significant dif-


ference in the computational performance of MILP solvers. For example,
in 2005, Leyffer and Linderoth [21] experimented with a simple branch-and-bound
based MILP solver and reported that, on average, the solver took
10,000 times longer when the aggregated formulation was used. For modern
commercial MILP solvers, however, the two formulations solve in nearly the
same time. This is because modern MILP software automatically reformulates
the aggregated (weak) formulation into the disaggregated (strong) one. We
strongly believe that similar performance improvements can be obtained by
MINLP solvers by performing automatic reformulation techniques specially
developed for MINLP problems, and we hope this survey work will spur
this line of research.
A common reformulation technique used by MILP solvers is to rec-
ognize simple structures that appear in the formulation and replace them
with tight relaxations for these structures. The tightening of these struc-
tures leads to the tightening of the overall relaxation. We will follow this
same approach here to derive tighter (perspective) relaxations through the
study and characterization of convex hulls of simple sets.
1.3. The perspective reformulation. We are particularly inter-
ested in simple sets related to “on-off” type decisions. To that end, let z
be a binary indicator variable that controls continuous variables x. The
perspective reformulation is based on strengthening the natural continuous
relaxation of the following “on-off” set:

$$S \overset{\text{def}}{=} \left\{ (x,z) \in \mathbb{R}^n \times \mathbb{B} \;:\; \begin{array}{ll} x = \hat{x} & \text{if } z = 0 \\ x \in \Gamma & \text{if } z = 1 \end{array} \right\}$$

where $\hat{x}$ is a given point (e.g., $\hat{x} = 0$), and

$$\Gamma \overset{\text{def}}{=} \{ x \in \mathbb{R}^n \mid f_j(x) \le 0, \; j = 1,\ldots,m, \;\; u \ge x \ge \ell \}$$

is a bounded convex set. (Note that Γ can be convex even when some of
the functions fj defining it are non-convex.)
In this paper we study the convex hull description of sets closely related
to S. We present a number of examples where these simple sets appear as
substructures, and we demonstrate that utilizing the convex hull descrip-
tion of these sets helps solve the optimization problem efficiently. Closely
related to this work is the effort of Frangioni and Gentile [11, 13], who de-
rive a class of cutting planes that significantly strengthen the formulation
for MINLPs containing “on-off” type decisions with convex, separable, ob-
jective functions, and demonstrate that these inequalities are quite useful
in practice. The connection is detailed more in Section 5.3.


The remainder of the paper is divided into 6 sections. Section 2 gives a


review of perspective functions and how they can be used to obtain strong
MINLP formulations. Section 3 first derives the convex hull description of
two simple sets and then uses them as building blocks to describe the convex
hull of more complicated sets. Section 4 describes a number of applications
where the perspective reformulation technique can be successfully applied.
Section 5 contains some discussion about computational approaches for
solving relaxations arising from the perspective reformulation. Section 6
demonstrates numerically the impact of perspective reformulation approach
on two applications. Concluding remarks are offered in Section 7.
2. Perspective functions and convex hulls. The perspective of a
given function $f : \mathbb{R}^n \to \mathbb{R}$ is the function $\tilde{f} : \mathbb{R}^{n+1} \to \mathbb{R}$ defined as
follows:

$$\tilde{f}(\lambda, x) = \begin{cases} \lambda f(x/\lambda) & \text{if } \lambda > 0, \\ 0 & \text{if } \lambda = 0, \\ +\infty & \text{otherwise.} \end{cases} \qquad (2.1)$$

An important property of perspective functions is that $\tilde{f}$ is convex provided
that $f$ is convex. A starting point for the use of the perspective function
for strong formulations of MINLPs is the work of Ceria and Soares [8].
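For example, the perspective of the convex function $f(x) = x^2$ is

$$\tilde{f}(\lambda, x) = \lambda \left( \frac{x}{\lambda} \right)^{2} = \frac{x^2}{\lambda} \qquad \text{for } \lambda > 0,$$

which is jointly convex in $(\lambda, x)$; this is exactly the function that reappears in the perspective reformulations of Sections 3.3 and 4.1.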
2.1. Using perspective functions to obtain convex hulls. Ceria
and Soares characterize the closure of the convex hull of the union of convex
sets using the perspective transformation. The main result of Ceria and
Soares is stated (in a simplified form) in Theorem 2.1.
Theorem 2.1 (Ceria and Soares [8]). For $t \in T$, let $G^t : \mathbb{R}^n \to \mathbb{R}^{m_t}$
be a vector-valued function with the property that the corresponding sets

$$K^t = \{ x \in \mathbb{R}^n : G^t(x) \le 0 \}$$

are convex and bounded. Let $K = \operatorname{conv}(\cup_{t \in T} K^t)$. Then $x \in K$ if and only
if the following (nonlinear) system is feasible:

$$x = \sum_{t \in T} x^t; \qquad \sum_{t \in T} \lambda_t = 1; \qquad \tilde{G}^t(\lambda_t, x^t) \le 0, \;\; \lambda_t \ge 0, \;\; \forall t \in T. \qquad (2.2)$$

Theorem 2.1 provides an extended formulation that describes the set K


in a higher dimensional space. The work extends a well-known result from
Balas in the case that all the K t are polyhedral [2]. Also note that Gt being
a convex function is sufficient, but not necessary for K t to be convex.
A similar argument using the perspective functions was used by Stubbs
and Mehrotra to formulate a convex programming problem to generate
disjunctive cutting planes [25]. Later, Grossmann and Lee apply these same
concepts to more general logic-constrained optimization problems known as
generalized disjunctive programs [15].


2.2. Computational challenges. There are a number of challenges


in using the convex hull characterization of Theorem 2.1 in computation.
One challenge is determining an appropriate disjunction such that F ⊆
∪t∈T K t . For example, using the disjunction associated with requiring a
collection of the variables to be in {0, 1} requires |T | to be exponential in
the number of variables chosen. In this case, a cutting plane algorithm,
like the one suggested by Stubbs and Mehrotra may be appropriate [25].
A second challenge occurs when K t is not bounded. In this case,
the convex hull characterization is more complicated, and a closed form
solution may not be known. Ceria and Soares address these complications
and suggest a log-barrier approach to its solution [8].
A third challenge arises from the form of the perspective function. By
definition, there is a point of nondifferentiability at λt = 0. This may
cause difficulty for solvers used to solve the relaxation. Grossmann and
Lee suggest using the perturbed perspective inequality

$$(\lambda + \epsilon)\, f\big(x/(\lambda + \epsilon)\big) \le 0$$

for a small constant $\epsilon > 0$, which is valid if $f(0) \le 0$. An improved
perturbed expression is suggested by Furman, Sawaya, and Grossmann [14]:

$$\big((1-\epsilon)\lambda + \epsilon\big)\, f\Big(x \big/ \big((1-\epsilon)\lambda + \epsilon\big)\Big) \le \epsilon f(0)(1-\lambda). \qquad (2.3)$$

Notice that both expressions give an exact approximation of the perspective
inequality as $\epsilon \to 0$. In addition, inequality (2.3) has the very useful
property that it gives the perspective inequality at $\lambda = 0$ and $\lambda = 1$,
for any value of $0 < \epsilon < 1$. Furthermore, it preserves convexity, as the
left-hand side of the inequality is convex if $f$ is convex, and the inequality
always forms a relaxation for any choice of $0 < \epsilon < 1$.
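To spell out the endpoint claim: at $\lambda = 1$ the left-hand side of (2.3) reduces to $f(x)$ and the right-hand side to 0, recovering the original constraint, while at $\lambda = 0$ the inequality reads

$$\epsilon f(x/\epsilon) \le \epsilon f(0),$$

which holds with equality at $x = 0$, matching the perspective inequality there.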
When the sets K t are defined by conic quadratic inequalities (CQI),
it is easier to deal with the nondifferentiability issue, as the perspectives of
the associated functions are known to be representable by CQI [3]. We
discuss this further in Section 5.2 and give CQI representation of different
perspective functions that arise in the MINLP applications considered in
Section 4.
3. Simple sets. We next apply Theorem 2.1 to the special case where
$|T| = 2$, and the sets $K^0$ and $K^1$ have a specific, simple structure. More
precisely, we consider the cases when K 0 is either a single point or a ray
and K 1 is defined by convex functions. We then use these sets as building
blocks to describe the convex hull of more complicated sets which appear as
sub-structures in some MINLP models. We also note that there is ongoing
work on other special cases by Bonami, Cornuéjols, and Hijazi [5].
3.1. The convex hull of a point and a convex set. Consider the
set $W = W^0 \cup W^1$, which is defined using the indicator variable $z \in \{0,1\}$
as follows:

$$W^0 = \big\{ (x,z) \in \mathbb{R}^{n+1} : x = 0,\; z = 0 \big\}$$

and

$$W^1 = \big\{ (x,z) \in \mathbb{R}^{n+1} : f_i(x) \le 0 \text{ for } i \in I,\;\; u \ge x \ge l,\;\; z = 1 \big\}$$

where $u, l \in \mathbb{R}^n_+$ and $I$ is the index set for the constraints. Clearly, both
$W^0$ and $W^1$ are bounded, and $W^0$ is a convex set. Furthermore, if $W^1$ is
also convex, then we may write an extended formulation as

$$\begin{aligned}
\operatorname{conv}(W) = \big\{ (x,z) \in \mathbb{R}^{n+1} :\; & 1 \ge \lambda \ge 0,\\
& x = \lambda x^1 + (1-\lambda)x^0,\\
& z = \lambda z^1 + (1-\lambda)z^0,\\
& x^0 = 0,\; z^0 = 0,\; z^1 = 1,\\
& f_i(x^1) \le 0 \text{ for } i \in I,\;\; u \ge x^1 \ge l \big\}.
\end{aligned}$$

We next give a description of $\operatorname{conv}(W)$ without the additional variables.

Lemma 3.1. If $W^1$ is convex, then $\operatorname{conv}(W) = W^- \cup W^0$, where

$$W^- = \big\{ (x,z) \in \mathbb{R}^{n+1} : f_i(x/z) \le 0 \;\; i \in I, \;\; uz \ge x \ge lz, \;\; 1 \ge z > 0 \big\}$$

(notice that $z$ is strictly positive).


Proof. As x0 , z 0 and z 1 are fixed in the extended formulation above,
it is possible to substitute out these variables. In addition, as z = λ
after these substitutions, we can eliminate λ. Furthermore, as x = λx1 =
zx1 , we can eliminate x1 by replacing it with x/z provided that z > 0.
If, on the other hand, z = 0, clearly (x, 0) ∈ conv(W ) if and only if
(x, 0) ∈ W 0 .
It is also possible to show that W 0 is contained in the closure of W −
(see [17]) which leads to the following observation.
Corollary 3.1. conv(W ) = closure(W − ).
We would like to emphasize that even when $f_i(x)$ is a convex function,
$f_i(x/z)$ may not be convex. However, for $z > 0$ we have

$$f_i(x/z) \le 0 \;\Leftrightarrow\; z^q f_i(x/z) \le 0 \qquad (3.1)$$

for any $q \in \mathbb{R}$. In particular, taking $q = 1$ gives the perspective function,
which is known to be convex provided that $f_i(x)$ is convex. Consequently,
the set $W^-$ described in Lemma 3.1 can also be written as follows:

$$W^- = \big\{ (x,z) \in \mathbb{R}^{n+1} : z f_i(x/z) \le 0 \;\; i \in I, \;\; uz \ge x \ge lz, \;\; 1 \ge z > 0 \big\}.$$

When all fi (x) that define W 1 are polynomial functions, the convex
hull of W can be described in closed form in the original space of variables.


More precisely, let

$$f_i(x) = \sum_{t=1}^{p_i} c_{it} \prod_{j=1}^{n} x_j^{q_{itj}}$$

for all $i \in I$, and define $q_{it} = \sum_{j=1}^{n} q_{itj}$, $q_i = \max_t \{q_{it}\}$, and $\bar{q}_{it} = q_i - q_{it}$.
If all $f_i(x)$ are convex and bounded in $[l,u]$, then (see [17])

$$\operatorname{conv}(W) = \Big\{ (x,z) \in \mathbb{R}^{n+1} : \sum_{t=1}^{p_i} c_{it}\, z^{\bar{q}_{it}} \prod_{j=1}^{n} x_j^{q_{itj}} \le 0 \;\text{ for } i \in I, \;\; zu \ge x \ge lz, \;\; 1 \ge z \ge 0 \Big\}.$$

3.2. The convex hull of a ray and a convex set. It is possible


to extend Lemma 3.1 to obtain the convex hull description of a ray and a
convex set that contains the ray as an unbounded direction. More precisely,
consider the set $T = T^0 \cup T^1$, where

$$T^0 = \big\{ (x,y,z) \in \mathbb{R}^{n+1+1} : x = 0,\; y \ge 0,\; z = 0 \big\},$$

and

$$T^1 = \big\{ (x,y,z) \in \mathbb{R}^{n+1+1} : f_i(x) \le 0 \;\, i \in I, \;\, g(x) \le y, \;\, u \ge x \ge l, \;\, z = 1 \big\}$$

where $u, l \in \mathbb{R}^n_+$ and $I = \{1,\ldots,t\}$.

Lemma 3.2. If $T^1$ is convex, then $\operatorname{conv}(T) = T^- \cup T^0$, where

$$T^- = \big\{ (x,y,z) \in \mathbb{R}^{n+1+1} : f_i(x/z) \le 0 \;\, i \in I, \;\, g(x/z) \le y/z, \;\, uz \ge x \ge lz, \;\, 1 \ge z > 0 \big\}.$$

Proof. Using the same arguments as in the proof of Lemma 3.1, it is
easy to show that $\operatorname{conv}(T) = P \cup T^0$, where the following gives an extended
formulation for the set $P$:

$$P = \Big\{ (x,y,z) \in \mathbb{R}^{n+1+1} : f_i(x/z) \le 0 \;\, i \in I, \;\, uz \ge x \ge lz, \;\, 1 \ge z > 0, \;\, g(x/z) \le \frac{y}{z} - \frac{1-z}{z}\, y^0, \;\, y^0 \ge 0 \Big\}.$$

As $(1-z)/z \ge 0$ and $y^0 \ge 0$ for all feasible points, $y^0$ can easily be
projected out to show that $P = T^-$.
Similar to the proof of Corollary 3.1, it is possible to show that T 0 is
contained in the closure of T − .
Corollary 3.2. conv(T ) = closure(T − ).
In addition, we note that when all fi (x) for i ∈ I and g(x) are polynomial
functions, the convex hull of T can be described in closed form by simply
multiplying each inequality in the description of T − with z raised to an
appropriate power. We do not present this description to avoid repetition.


3.3. A simple quadratic set. Consider the following mixed-integer
set with 3 variables:

$$S = \big\{ (x,y,z) \in \mathbb{R}^2 \times \mathbb{B} : y \ge x^2, \;\; uz \ge x \ge lz, \;\; x \ge 0 \big\}.$$

Notice that $S = S^0 \cup S^1$, where $S^0 = \{ (0,y,0) \in \mathbb{R}^3 : y \ge 0 \}$ and

$$S^1 = \big\{ (x,y,1) \in \mathbb{R}^3 : y \ge x^2, \;\; u \ge x \ge l, \;\; x \ge 0 \big\}.$$

Applying Lemma 3.2 gives the convex hull of $S$ as the perspective of the
quadratic function defining the set. Note that when $z > 0$ the constraint
$yz \ge x^2$ is the same as $y/z \ge (x/z)^2$, and when $z = 0$, it implies that $x = 0$.

Lemma 3.3. $\operatorname{conv}(S) = S^c$ where

$$S^c = \big\{ (x,y,z) \in \mathbb{R}^3 : yz \ge x^2, \;\; uz \ge x \ge lz, \;\; 1 \ge z \ge 0, \;\; x, y \ge 0 \big\}.$$

Notice that $x^2 - yz$ is not a convex function, and yet the set $T^c = \{(x,y,z) \in \mathbb{R}^3 : yz \ge x^2, \; x, y, z \ge 0\}$ is a convex set. This explains
why the set $S^c$, obtained by intersecting $T^c$ with half-spaces, is convex.
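A minimal sketch of Lemma 3.3 in code (our illustration; it assumes the Python package cvxpy with a conic solver is installed, and the bounds and objective are made-up values): cvxpy's quad_over_lin atom states the rotated-cone constraint $yz \ge x^2$ in the perspective form $x^2/z \le y$.

    import cvxpy as cp

    u, l = 1.0, 0.1                    # hypothetical bounds with u >= l > 0
    x, y, z = cp.Variable(), cp.Variable(), cp.Variable()

    constraints = [
        cp.quad_over_lin(x, z) <= y,   # x^2/z <= y, i.e., yz >= x^2
        x >= l * z, x <= u * z,        # bounds tied to the (relaxed) indicator
        z >= 0, z <= 1, y >= 0,
        x == 0.5,                      # fix x so the trade-off is nontrivial
    ]
    # Minimizing y + z trades the term y = x^2/z against the indicator z.
    prob = cp.Problem(cp.Minimize(y + z), constraints)
    prob.solve()
    print(round(z.value, 3), round(y.value, 3))   # expected: 0.5, 0.5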
3.4. A larger quadratic set. Using the convex hull description of
the set $S$, it is possible to produce a convex hull description of the following
set

$$Q = \Big\{ (w,x,z) \in \mathbb{R}^{n+1} \times \mathbb{B}^n : w \ge \sum_{i=1}^{n} q_i x_i^2, \;\; u_i z_i \ge x_i \ge l_i z_i, \; i \in I \Big\}, \qquad (3.2)$$

where $I = \{1,\ldots,n\}$ and $q, u, l \in \mathbb{R}^n_+$. The convex hull description of $Q$
is closely related to the convex envelope of the function $\sum_{i=1}^{n} q_i x_i^2$ over
a mixed-integer set. This set was first considered in the Ph.D. thesis of
Stubbs [26].
Now consider the following extended formulation of $Q$:

$$\bar{Q} \overset{\text{def}}{=} \Big\{ (w,x,y,z) \in \mathbb{R}^{3n+1} : w \ge \sum_i q_i y_i, \;\; (x_i, y_i, z_i) \in S_i, \; i \in I \Big\}$$

where $S_i$ has the same form as the set $S$ discussed in the previous section,
except that the bounds $u$ and $l$ are replaced with $u_i$ and $l_i$. Note that
if $(w,x,y,z) \in \bar{Q}$ then $(w,x,z) \in Q$, and therefore $\operatorname{proj}_{(w,x,z)}(\bar{Q}) \subseteq Q$.
On the other hand, for any $(w,x,z) \in Q$, letting $y_i' = x_i^2$ gives a point
$(w,x,y',z) \in \bar{Q}$. Therefore, $\bar{Q}$ is indeed an extended formulation of $Q$, or,
in other words, $Q = \operatorname{proj}_{(w,x,z)}(\bar{Q})$.
Before we present a convex hull description of $\bar{Q}$, we first recall some
basic properties of mixed-integer sets. First, remember that given a closed
set $P \subset \mathbb{R}^n$, a point $p \in P$ is called an extreme point of $P$ if it cannot be
represented as $p = \frac{1}{2}p^1 + \frac{1}{2}p^2$ for $p^1, p^2 \in P$, $p^1 \ne p^2$. The set $P$ is
called pointed if it has extreme points. A pointed set $P$ is called integral


with respect to (w.r.t.) a subset of the indices J if for any extreme point
p ∈ P , pi ∈ Z for all i ∈ J.
Lemma 3.4 ([17]). For $i = 1, 2$ let $P_i \subset \mathbb{R}^{n_i}$ be a closed and pointed
set which is integral w.r.t. indices $I_i$. Let

$$P' = \{ (x,y) \in \mathbb{R}^{n_1+n_2} : x \in P_1, \; y \in P_2 \};$$

then,
(i) $P'$ is integral with respect to $I_1 \cup I_2$;
(ii) $\operatorname{conv}(P') = \{ (x,y) \in \mathbb{R}^{n_1+n_2} : x \in \operatorname{conv}(P_1), \; y \in \operatorname{conv}(P_2) \}$.

Lemma 3.5 ([17]). Let $P \subset \mathbb{R}^n$ be a given closed, pointed set, and let

$$P' = \{ (w,x) \in \mathbb{R}^{n+1} : w \ge ax, \; x \in P \}$$

where $a \in \mathbb{R}^n$.
(i) If $P$ is integral w.r.t. $J$, then $P'$ is also integral w.r.t. $J$.
(ii) $\operatorname{conv}(P') = P''$ where

$$P'' = \{ (w,x) \in \mathbb{R}^{n+1} : w \ge ax, \; x \in \operatorname{conv}(P) \}.$$

We are now ready to present the convex hull of Q̄.


Lemma 3.6. The set

$$\bar{Q}^c = \Big\{ (w,x,y,z) \in \mathbb{R}^{3n+1} : w \ge \sum_i q_i y_i, \;\; (x_i, y_i, z_i) \in S_i^c, \; i \in I \Big\}$$

is integral w.r.t. the indices of the $z$ variables. Furthermore, $\operatorname{conv}(\bar{Q}) = \bar{Q}^c$.


Proof. Let

$$D = \{ (x,y,z) \in \mathbb{R}^{3n} : (x_i, y_i, z_i) \in S_i, \; i \in I \}$$

so that

$$\bar{Q} = \Big\{ (w,x,y,z) \in \mathbb{R}^{3n+1} : w \ge \sum_{i=1}^{n} q_i y_i, \;\; (x,y,z) \in D \Big\}.$$

By Lemma 3.5, the convex hull of Q̄ can be obtained by replacing D with


its convex hull in this description. By Lemma 3.4, this can simply be done
by taking convex hulls of Si ’s, that is, by replacing Si with conv(Si ) in the
description of D. Finally, by Lemma 3.5, Q̄c is integral.
A natural next step is to study the projection of the set $\bar{Q}^c$ into the
space of $(w,x,z)$. One possibility is to substitute the term $x_i^2/z_i$ for each
variable $y_i$, resulting in the inequality $w \ge \sum_i q_i x_i^2 / z_i$. This formula,
however, may not be suitable for computation, as it is not defined for $z_i = 0$,
and $z_i = 0$ is one of the two feasible values for $z_i$. We next present an


explicit description of the projection that uses an exponential number of
inequalities. Let

$$Q^c = \Big\{ (w,x,z) \in \mathbb{R}^{2n+1} : \; w \prod_{i \in S} z_i \ge \sum_{i \in S} q_i x_i^2 \prod_{l \in S \setminus \{i\}} z_l \;\;\; \forall S \subseteq I, \qquad (\Pi)$$
$$\qquad\qquad u_i z_i \ge x_i \ge l_i z_i, \;\; x_i \ge 0, \;\; i \in I \Big\}.$$

Notice that a given point $\bar{p} = (\bar{w}, \bar{x}, \bar{z})$ satisfies the nonlinear inequalities
in the description of $Q^c$ for a particular $S \subseteq I$ if and only if one of the
following conditions holds: (i) $\bar{z}_i = 0$ for some $i \in S$, or (ii) if all $\bar{z}_i > 0$,
then $\bar{w} \ge \sum_{i \in S} q_i \bar{x}_i^2 / \bar{z}_i$. Based on this observation it is possible to show
that these (exponentially many) inequalities are sufficient to describe the
convex hull of $Q$ in the space of the original variables.
Lemma 3.7 ([17]). $Q^c = \operatorname{proj}_{(w,x,z)}(\bar{Q}^c)$.

Note that all of the exponentially many inequalities that are used in the
description of $Q^c$ are indeed necessary. To see this, consider a simple
instance with $u_i = l_i = q_i = 1$ for all $i \in I = \{1, 2, \ldots, n\}$. For a given
$\bar{S} \subseteq I$, let $p^{\bar{S}} = (\bar{w}, \bar{x}, \bar{z})$ where $\bar{w} = |\bar{S}| - 1$, $\bar{z}_i = 1$ if $i \in \bar{S}$, $\bar{z}_i = 0$
otherwise, and $\bar{x} = \bar{z}$. Note that $p^{\bar{S}} \notin Q^c$. As $\bar{z}_i = q_i \bar{x}_i^2$, inequality $(\Pi)$
is satisfied by $p^{\bar{S}}$ for $S \subseteq I$ if and only if

$$(|\bar{S}| - 1) \prod_{i \in S} \bar{z}_i \ge |S| \prod_{i \in S} \bar{z}_i.$$

Note that unless $S \subseteq \bar{S}$, the term $\prod_{i \in S} \bar{z}_i$ becomes zero and therefore
inequality $(\Pi)$ is satisfied. In addition, inequality $(\Pi)$ is satisfied whenever
$|\bar{S}| > |S|$. Combining these two observations, we can conclude that the
only inequality violated by $p^{\bar{S}}$ is the one with $S = \bar{S}$. Due to its size, the
projected set is not practical for computational purposes, and we conclude
that it is more advantageous to work in the extended space, keeping the
variables $y_i$.
3.5. A simple non-quadratic set. The simple 3-variable mixed-integer
set $S$ introduced in Section 3.3 can be generalized to the following
set, studied by Aktürk, Atamtürk, and Gürel [1]:

$$C = \big\{ (x,y,z) \in \mathbb{R}^2 \times \mathbb{B} : y \ge x^{a/b}, \;\; uz \ge x \ge lz, \;\; x \ge 0 \big\}$$

where $a, b \in \mathbb{Z}_+$ and $a \ge b > 0$. Clearly $C = C^0 \cup C^1$, with

$$C^0 = \{ (0,y,0) \in \mathbb{R}^3 : y \ge 0 \},$$

and

$$C^1 = \{ (x,y,1) \in \mathbb{R}^3 : y \ge x^{a/b}, \;\; u \ge x \ge l, \;\; x \ge 0 \}.$$


By applying Lemma 3.2, the convex hull of $C$ is given by using the perspective
of the function $f(y,x) = y^b - x^a$ and scaling the resulting inequality
by $z^a$.

Lemma 3.8 (Aktürk, Atamtürk, Gürel [1]). The convex hull of $C$ is
given by

$$C^c = \big\{ (x,y,z) \in \mathbb{R}^3 : y^b z^{a-b} \ge x^a, \;\; uz \ge x \ge lz, \;\; 1 \ge z \ge 0, \;\; x, y \ge 0 \big\}.$$

In addition, it is possible to construct the convex hull of larger sets


using the set C as a building block. See [1] for more details.

4. Applications. In this section, we present six applications to which


the perspective reformulation has been applied.

4.1. Separable Quadratic UFL. The Separable Quadratic Unca-


pacitated Facility Location Problem (SQUFL) was introduced by Günlük,
Lee, and Weismantel [16]. In the SQUFL, there is a set of customers J and
a set of potential facilities I, with a fixed cost ci incurred for opening
facility i ∈ I. All customers have unit demand that can be
satisfied using open facilities only. The shipping cost is proportional to
the square of the quantity delivered. Letting zi indicate if facility i ∈ I is
open, and xij denote the fraction of customer j’s demand met from facility
i, SQUFL can be formulated as follows:

$$\begin{aligned}
\min \;\; & \sum_{i \in I} c_i z_i + \sum_{i \in I} \sum_{j \in J} q_{ij} x_{ij}^2 \\
\text{subject to} \;\; & x_{ij} \le z_i \quad \forall i \in I, \; \forall j \in J, \\
& \sum_{i \in I} x_{ij} = 1 \quad \forall j \in J, \\
& z_i \in \{0,1\}, \;\; x_{ij} \ge 0 \quad \forall i \in I, \; \forall j \in J.
\end{aligned}$$

To apply the perspective formulation, auxiliary variables yij are used to


replace the terms $x_{ij}^2$ in the objective function, and the constraints

$$x_{ij}^2 - y_{ij} \le 0 \quad \forall i \in I, \; j \in J \qquad (4.1)$$

are added. In this reformulation, if zi = 0, then xij = 0 and yij ≥ 0 ∀j ∈ J,


while if zi = 1, the convex nonlinear constraints (4.1) should also hold.
Therefore, we can strengthen the formulation of SQUFL by replacing (4.1)
by its perspective reformulation

$$x_{ij}^2 / z_i - y_{ij} \le 0 \quad \forall i \in I, \; \forall j \in J. \qquad (4.2)$$

We demonstrate the impact of this reformulation on solvability of the


MINLP in Section 6.1.
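As a concrete illustration of how the strengthened model can be stated in a modeling layer, here is a minimal sketch (ours, not from the paper), assuming cvxpy and a mixed-integer SOCP-capable solver such as MOSEK are available; the random data are illustrative only.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    m, n = 3, 5                          # |I| facilities, |J| customers
    c = rng.uniform(1, 100, size=m)      # fixed opening costs c_i
    q = rng.uniform(0, 50, size=(m, n))  # shipping cost coefficients q_ij

    x = cp.Variable((m, n), nonneg=True)  # fraction of demand of j met by i
    y = cp.Variable((m, n), nonneg=True)  # epigraph variables for x_ij^2
    z = cp.Variable(m, boolean=True)      # facility open indicators

    cons = [cp.sum(x, axis=0) == 1]       # each customer is fully served
    for i in range(m):
        for j in range(n):
            cons.append(x[i, j] <= z[i])
            # Perspective constraint (4.2): x_ij^2 / z_i <= y_ij.
            cons.append(cp.quad_over_lin(x[i, j], z[i]) <= y[i, j])

    prob = cp.Problem(cp.Minimize(c @ z + cp.sum(cp.multiply(q, y))), cons)
    prob.solve(solver=cp.MOSEK)           # or another MISOCP-capable solver
    print(prob.value, z.value)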


4.2. Network design with congestion constraints. The next ap-


plication is a network design problem with requirements on queuing delay.
Similar models appear in the papers [6], [4], and [7]. In the problem, there
is a set of commodities K to be shipped over a capacitated directed net-
work G = (N, A). The capacity of arc (i, j) ∈ A is uij , and each node
i ∈ N has a net supply bki of commodity k ∈ K. There is a fixed cost cij of
opening each arc (i, j) ∈ A, and we introduce {0,1}-variables zij to indicate
whether arc $(i,j) \in A$ is opened. The quantity of commodity $k$ routed on
arc $(i,j)$ is measured by the variable $x_{ij}^k$, and $f_{ij} = \sum_{k \in K} x_{ij}^k$ denotes the total
flow on the arc. A typical measure of the total weighted congestion (or
queuing delay) is

$$\rho(f) \overset{\text{def}}{=} \sum_{(i,j) \in A} r_{ij}\, \frac{f_{ij}}{1 - f_{ij}/u_{ij}},$$

where rij ≥ 0 is a user-defined weighting parameter for each arc. We use


decision variables yij to measure the contribution of the congestion on arc
(i, j) to the total congestion ρ(f ). The network should be designed so as
to keep the total queuing delay less than a given value β, and this is to be
accomplished at minimum cost. The resulting optimization model (NDCC)
can be written as

$$\begin{aligned}
\min \;\; & \sum_{(i,j) \in A} c_{ij} z_{ij} \\
\text{subject to} \;\; & \sum_{(i,j) \in A} x_{ij}^k - \sum_{(j,i) \in A} x_{ji}^k = b_i^k \quad \forall i \in N, \; \forall k \in K, \\
& \sum_{k \in K} x_{ij}^k - f_{ij} = 0 \quad \forall (i,j) \in A, \\
& f_{ij} \le u_{ij} z_{ij} \quad \forall (i,j) \in A, \qquad (4.3) \\
& y_{ij} \ge \frac{r_{ij} f_{ij}}{1 - f_{ij}/u_{ij}} \quad \forall (i,j) \in A, \qquad (4.4) \\
& \sum_{(i,j) \in A} y_{ij} \le \beta, \\
& x \in \mathbb{R}_+^{|A| \times |K|}, \;\; y \in \mathbb{R}_+^{|A|}, \;\; f \in \mathbb{R}_+^{|A|}, \;\; z \in \{0,1\}^{|A|}.
\end{aligned}$$

In this formulation of NDCC, note that if zij = 0, then fij = 0 and


yij ≥ 0. On the other hand, if zij = 1, then fij and yij must satisfy
fij ≤ uij and constraint (4.4). Therefore, each constraint (4.4) can be
replaced by its perspective counterpart:

$$z_{ij} \left( \frac{r_{ij}\, f_{ij}/z_{ij}}{1 - f_{ij}/(u_{ij} z_{ij})} - \frac{y_{ij}}{z_{ij}} \right) \le 0. \qquad (4.5)$$


4.3. Scheduling with controllable processing times. Consider a


scheduling problem where jobs are assigned to non-identical parallel ma-
chines with finite capacity. Let J denote the set of jobs and I denote the
set of machines. In this problem, not all jobs have to be processed but if
job j ∈ J is assigned to a machine i ∈ I, a reward of hij is collected. The
regular processing time of job $j$ on machine $i$ is $p_{ij}$; however, by paying a
certain cost, it can be reduced to $(p_{ij} - x_{ij})$, where $x_{ij} \in [0, u_{ij}]$. The cost
of reducing the processing time of job $j$ on machine $i$ by $x_{ij}$ units is given
by the expression

$$f_{ij}(x_{ij}) = k_{ij}\, x_{ij}^{a_{ij}/b_{ij}}.$$
This problem is called the machine-job assignment problem with control-
lable times and has been recently studied by Aktürk, Atamtürk and Gürel
[1]. A MINLP formulation for this problem is:

$$\begin{aligned}
\max \;\; & \sum_{i \in I} \sum_{j \in J} \big( h_{ij} z_{ij} - f_{ij}(x_{ij}) \big) \\
\text{subject to} \;\; & \sum_{j \in J} (p_{ij} z_{ij} - x_{ij}) \le c_i \quad \forall i \in I \\
& x_{ij} \le u_{ij} z_{ij} \quad \forall i \in I, \; \forall j \in J \qquad (4.6) \\
& \sum_{i \in I} z_{ij} \le 1 \quad \forall j \in J \\
& z_{ij} \in \{0,1\}, \;\; x_{ij} \ge 0 \quad \forall i \in I, \; \forall j \in J \qquad (4.7)
\end{aligned}$$
where the variable zij denotes if job j is assigned to machine i and xij is
the reduction on the associated processing time. The total processing time
available on machine i ∈ I is denoted by ci . The objective is to maximize
the sum of the rewards minus the cost of reducing the processing times.
As in the case of the SQUFL in Section 4.1, after adding a new variable
$y_{ij}$ and a new constraint

$$x_{ij}^{a_{ij}/b_{ij}} \le y_{ij} \qquad (4.8)$$

for all $i \in I$, $j \in J$, it is possible to replace the objective function with the
following linear expression:

$$\sum_{i \in I} \sum_{j \in J} (h_{ij} z_{ij} - k_{ij} y_{ij}).$$

The inequality (4.8), together with inequalities (4.6) and (4.7), defines the set $C$
studied in Section 3.5; therefore, inequality (4.8) can be replaced with
its perspective counterpart

$$z_{ij} \left( \frac{x_{ij}}{z_{ij}} \right)^{a_{ij}/b_{ij}} \le y_{ij} \qquad (4.9)$$


to obtain a stronger formulation. The authors of [1] raise both sides of
inequality (4.9) to the $b_{ij}$-th power and multiply both sides by $z_{ij}^{a_{ij}-b_{ij}}$ to
obtain the equivalent inequalities

$$x_{ij}^{a_{ij}} \le y_{ij}^{b_{ij}} z_{ij}^{a_{ij}-b_{ij}}. \qquad (4.10)$$

4.4. The unit commitment problem. One of the essential opti-


mization problems in power generation is the so-called unit commitment
problem which involves deciding the power output levels of a collection of
power generators over a period of time. In this setting, a generator (also
called a unit) is either turned off and generates no power, or it is turned on
and generates power in a given range [l, u]. It is important to point out that
l > 0, and therefore production levels are “semi-continuous”. In most models,
the time horizon is divided into a small number of discrete intervals (e.g.,
48 half-hour intervals for a daily problem), and the generators are required
to collectively satisfy a given demand level in each interval. The operating
cost of a generator is typically modeled by a convex quadratic function.
There are also additional constraints including the most commonly used
min-up, min-down constraints that require that a generator must stay on
for a certain number of time periods after it is turned on, and similarly, it
must stay down for a number of time periods after it is turned off. Letting
I denote the set of generators and T denote the set of time periods under
consideration, a MINLP formulation for this problem is the following:

$$\begin{aligned}
\min \;\; & \sum_{i \in I} \sum_{t \in T} h_{it} z_{it} + \sum_{i \in I} \sum_{t \in T} f_{it}(x_{it}) \\
\text{subject to} \;\; & \sum_{i \in I} x_{it} = d_t \quad \forall t \in T \\
& l_i z_{it} \le x_{it} \le u_i z_{it} \quad \forall i \in I, \; \forall t \in T \qquad (4.11) \\
& z \in P \qquad (4.12) \\
& z_{it} \in \{0,1\} \quad \forall i \in I, \; \forall t \in T \qquad (4.13)
\end{aligned}$$

where variable zit denotes if generator i ∈ I is turned on in period t ∈ T


and variable xit gives the production level when the generator is on. There
is a fixed cost hit of operating a unit i in period t as well as a variable cost
given by the convex quadratic function $f_{it}(x) = a_{it} x^2 + b_{it} x$ for some given
ait , bit ∈ R+ . The demand requirement in period t is given by dt . Finally,
the constraint (4.12) above expresses some other constraints involving how
generators can be turned on and off.
To obtain a stronger formulation, it is again possible to introduce new
variables $y_{it}$ and constraints

$$a_{it} x_{it}^2 + b_{it} x_{it} \le y_{it} \qquad (4.14)$$

for all $i \in I$ and $t \in T$, so that the new variable $y_{it}$ can replace the term
$f_{it}(x_{it})$ in the objective function. Using inequalities (4.11) and (4.13), we
can now replace inequality (4.14) with its perspective counterpart

$$a_{it} x_{it}^2 + b_{it} x_{it} z_{it} \le y_{it} z_{it} \qquad (4.15)$$

to obtain a stronger formulation.
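For concreteness, (4.15) follows from the definition (2.1) in one line:

$$z_{it}\, f_{it}\!\left(\frac{x_{it}}{z_{it}}\right) = a_{it}\frac{x_{it}^2}{z_{it}} + b_{it} x_{it} \le y_{it},$$

and multiplying through by $z_{it}$ gives (4.15); at $z_{it} = 1$ this is (4.14), and at $z_{it} = 0$ it forces $a_{it} x_{it}^2 \le 0$, that is, $x_{it} = 0$, consistent with (4.11).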
4.5. Stochastic service system design. Elhedhli [10] describes a
stochastic service system design problem (SSSD) modeled as a network of
M/M/1 queues. The instance is characterized by sets of customers M,
facilities N, and service levels K. There are binary decision variables xij
to denote if customer i’s demand is met by facility j and yjk to denote if
facility j is operating at service level k. Customer i has a mean demand
rate of λi and facility j has a mean service rate of μjk when operated at
service level k. There is a fixed cost cij of assigning customer i to facility
j, and a fixed cost fjk of operating facility j at level k.
A straightforward formulation of the problem is not convex; however, by
introducing auxiliary variables $v_j$ and $z_{jk}$, Elhedhli provides the following
convex MINLP formulation:

$$\begin{aligned}
\min \;\; & \sum_{i \in M} \sum_{j \in N} c_{ij} x_{ij} + t \sum_{j \in N} v_j + \sum_{j \in N} \sum_{k \in K} f_{jk} y_{jk} \\
\text{subject to} \;\; & \sum_{i \in M} \lambda_i x_{ij} - \sum_{k \in K} \mu_{jk} z_{jk} = 0 \quad \forall j \in N \\
& \sum_{j \in N} x_{ij} = 1 \quad \forall i \in M \\
& \sum_{k \in K} y_{jk} \le 1 \quad \forall j \in N \\
& z_{jk} - y_{jk} \le 0 \quad \forall j \in N, \; \forall k \in K \qquad (4.16) \\
& z_{jk} - v_j/(1 + v_j) \le 0 \quad \forall j \in N, \; \forall k \in K \qquad (4.17) \\
& z_{jk}, v_j \ge 0, \;\; x_{ij}, y_{jk} \in \{0,1\} \quad \forall i \in M, \; j \in N, \; \forall k \in K \qquad (4.18)
\end{aligned}$$
Instead of directly including the nonlinear constraints (4.17) in the formulation,
Elhedhli proposes linearizing the constraints at points $(v_j, z_{jk}) = (v_j^b, 1)$, $b \in B$, yielding

$$z_{jk} - \frac{1}{(1 + v_j^b)^2}\, v_j \le \frac{(v_j^b)^2}{(1 + v_j^b)^2}. \qquad (4.19)$$

Elhedhli uses a dynamic cutting plane approach to add inequalities (4.19).
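The coefficients in (4.19) are simply the tangent of $g(v) = v/(1+v)$ at $v = v_j^b$; spelling out this step,

$$z_{jk} \le g(v_j^b) + g'(v_j^b)\,(v_j - v_j^b) = \frac{v_j^b}{1+v_j^b} + \frac{v_j - v_j^b}{(1+v_j^b)^2},$$

and collecting terms gives exactly the right-hand side $(v_j^b)^2/(1+v_j^b)^2$ of (4.19).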
Notice that if $y_{jk} = 0$, then $z_{jk} = 0$ and $v_j \ge 0$, and therefore inequality
(4.17) can be replaced by its perspective counterpart

$$z_{jk} \le \frac{v_j}{1 + v_j/y_{jk}} \qquad (4.20)$$


yielding a tighter (nonlinear) formulation. Furthermore, linearizing these
inequalities at points $(v_j, y_{jk}, z_{jk}) = (v_j^b, 1, 1)$, $b \in B$, gives

$$z_{jk} - \frac{1}{(1 + v_j^b)^2}\, v_j \le \frac{(v_j^b)^2}{(1 + v_j^b)^2}\, y_{jk} \qquad (4.21)$$

which dominates the inequalities used in Elhedhli [10]. Note that the in-
equalities (4.21) could also be derived by applying a logical integer strength-
ening argument to the inequalities (4.19). The linearized perspective in-
equalities are called perspective cuts [11], which are discussed in greater
detail in Section 5.3. Computational results demonstrating the effect of
the perspective reformulation on this application are given in Section 6.2.
4.6. Portfolio selection. A canonical optimization problem in finan-
cial engineering is to find a minimum variance portfolio that meets a given
minimum expected return requirement of ρ > 0, see [22]. In the prob-
lem, there is a set N of assets available for purchase. The expected return
of asset i ∈ N is given by αi , and the covariance of the returns between
pairs of assets is given in the form of a positive-definite matrix Q ∈ Rn×n .
There can be at most K different assets in the portfolio, and there are
minimum and maximum buy-in thresholds for the assets chosen. A MINLP
formulation of the problem is

min{xT Qx | eT x = 1, αT x ≥ ρ, eT z ≤ K; i zi ≤ xi ≤ ui zi , zi ∈ B ∀i ∈ N },

where the decision variable xi is the percentage of the portfolio invested in


asset i and zi is a binary variable indicating the purchase of asset i. Unfor-
tunately, direct application of the perspective reformulation is not possible,
as the objective is not a separable function of the decision variables.
However, in many practical applications, the covariance matrix is ob-
tained from a factor model and has the form $Q = B\Omega B^T + \Delta^2$, for a
given exposure matrix $B \in \mathbb{R}^{n \times f}$, positive-definite factor-covariance
matrix $\Omega \in \mathbb{R}^{f \times f}$, and positive-definite, diagonal specific-variance matrix
$\Delta \in \mathbb{R}^{n \times n}$ [24]. If a factor model is given, a separable portion of the
objective function is easily extracted by introducing variables yi , changing
the objective to

$$\min \;\; x^T (B \Omega B^T) x + \sum_{i \in N} \Delta_{ii}^2\, y_i,$$

and enforcing the constraints $y_i \ge x_i^2 \;\; \forall i \in N$. These constraints can then
be replaced by their perspective counterparts $y_i \ge x_i^2/z_i$ to obtain a tighter
formulation.
Even if the covariance matrix Q does not directly have an embedded
diagonal structure from a factor model, it may still be possible to find a
diagonal matrix D such that R = Q − D is positive-definite. For example,


Frangioni and Gentile [11] suggest using D = λn I, where λn > 0 is the


smallest eigenvalue of Q. Frangioni and Gentile [12] subsequently gave a
semidefinite programming approach to obtain a diagonal matrix D that
may have desirable computational properties.
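A small numerical sketch of this diagonal extraction (our illustration, assuming only numpy; the matrix below is made up):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    Q = A @ A.T + 0.5 * np.eye(5)          # illustrative positive-definite Q

    lam_min = np.linalg.eigvalsh(Q).min()  # smallest eigenvalue of Q
    D = lam_min * np.eye(5)                # separable (diagonal) part
    R = Q - D                              # remainder
    assert np.linalg.eigvalsh(R).min() >= -1e-9   # R is positive semidefinite
    # x'Qx = x'Rx + sum_i D_ii x_i^2: the separable second term can now be
    # strengthened with the perspective reformulation, as described above.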
5. Computational approaches. Algorithms to solve MINLP are
based on solving a sequence of continuous relaxations of the formulation. In
this section, we discuss approaches and software for solving the perspective
reformulation. The approaches for solving the perspective reformulation
fall into three general categories, with tradeoffs in speed and generality of
the approaches. The first approach, for use in the most general cases, is to
simply give the reformulated problem to a general purpose NLP solver. The
second approach is to use a solver that is specialized for second-order cone
programming problems. A final approach is to linearize the nonlinear func-
tions of the perspective reformulation and use an LP solver. This approach
is most effective if the linearizations are added in a dynamic manner.
5.1. NLP solvers. Care must be taken when using a traditional
solver for nonlinear programs to solve the perspective reformulation. Ap-
plying the perspective transformation to a constraint f (x) ≤ 0 leads to
zf (x/z) ≤ 0, which is not defined when z = 0. Often, the constraint
zf (x/z) ≤ 0 can be manipulated to remove z from the denominator, but
this may result in new difficulties for the NLP solver. To illustrate this
point, consider the inequalities (4.1) in the description of the SQUFL introduced
in Section 4.1. Applying the perspective transformation to $x^2 - y \le 0$
gives

$$x^2/z - y \le 0, \qquad (5.1)$$

which is not defined at z = 0. Multiplying both sides of the inequality by


z gives

$$x^2 - yz \le 0. \qquad (5.2)$$

However, since the function $x^2 - yz$ is not convex, traditional NLP solvers


cannot guarantee convergence to a globally optimal solution to the relax-
ation. A reformulation trick can be used to turn (5.1) into an equivalent
inequality with a convex constraint function. Specifically, since y, z ≥ 0,
(5.1) is equivalent to

$$\sqrt{(2x)^2 + (y-z)^2} - y - z \le 0, \qquad (5.3)$$

and the constraint function in (5.3) is convex. This, however, may intro-
duce yet a different obstacle to NLP software, as the constraint function in
(5.3) is not differentiable at (x, y, z) = (0, 0, 0). In Section 6 we will show
some computational experiments aimed at demonstrating the effectiveness
of NLP software at handling perspective constraints in their various equiv-
alent forms.
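For completeness, the equivalence of (5.3) with (5.2) (and hence with (5.1)) is a one-line calculation: for $y, z \ge 0$,

$$\sqrt{(2x)^2 + (y-z)^2} \le y + z \;\Longleftrightarrow\; 4x^2 + (y-z)^2 \le (y+z)^2 \;\Longleftrightarrow\; x^2 \le yz,$$

where squaring is valid because both sides are nonnegative.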


5.2. SOCP solvers. A second-order cone program (SOCP) is a
mathematical program with (conic) quadratic inequalities of the form

$$\| Ax + b \|_2 \le c^T x + d. \qquad (5.4)$$

A rotated second-order cone constraint is of the form

$$x^2 \le yz \;\text{ with }\; y \ge 0, \; z \ge 0. \qquad (5.5)$$

As noted in Section 5.1, rotated second-order cone constraints (5.5) are
equivalent to second-order cone constraints (5.4), since

$$\| (2x,\, y-z)^T \| \le y + z \;\Leftrightarrow\; x^2 \le yz, \;\; y \ge 0, \;\; z \ge 0. \qquad (5.6)$$
The set of points that satisfy (5.4) or (5.5) forms a convex set, and
efficient and robust algorithms exist for solving optimization problems con-
taining second-order cone constraints [27, 23]. An interesting and impor-
tant observation from a computational standpoint is that the nonlinear
inequalities present in all of the applications described in Section 4 can be
described with second order cone constraints. Further, if a set of points
is representable using second order cone constraints, then the perspective
mapping of the set is also SOC-representable [3]. Therefore, quite often
software designed to solve SOCPs can be used to solve the perspective
relaxations arising from real applications.
To demonstrate the variety of reformulation techniques required to
express nonlinear constraints in their SOC representation, we give the SOC
representations for all of the nonlinear perspective constraints appearing
in the applications in Section 4. The book of Ben-Tal and Nemirovski
[3] contains a wealth of knowledge about the types of inequalities whose
feasible regions are representable with CQI.
For the SQUFL described in Section 4.1, the nonlinear inequalities in
the perspective reformulation (4.2) can be multiplied by zi and then are ev-
idently in rotated second order cone form (5.5). The nonlinear inequalities
(4.5) in the perspective reformulation of the NDCC described in Section 4.2
can be put in the (rotated) second order cone form
$$(y_{ij} - r_{ij} f_{ij})(u_{ij} z_{ij} - f_{ij}) \ge r_{ij} f_{ij}^2, \qquad (5.7)$$
which is a rotated SOC constraint as yij ≥ rij fij and uij zij ≥ fij for
any feasible solution. Note that we multiply the inequality by zij to ob-
tain the form (5.7) above. For the scheduling application of Section 4.3,
the nonlinear inequalities (4.10) can be represented using SOC constraints.
In this case, the transformation is more complicated, requiring O(log2 aij )
additional variables and O(log2 aij ) constraints. The details of the repre-
sentation are provided in [1]. The nonlinear inequalities (4.15) arising from
the perspective reformulation of the Unit Commitment Problem are also
representable with CQI as

$$a_{it} x_{it}^2 \le z_{it}\, (y_{it} - b_{it} x_{it}).$$


The inequalities (4.20) from the SSSD problem from Section 4.5 are repre-
sentable as rotated SOC constraints using the relation

$$z_{jk} y_{jk} + z_{jk} v_j - y_{jk} v_j \le 0 \;\Leftrightarrow\; v_j^2 \le (v_j - z_{jk})(y_{jk} + v_j).$$

For the mean-variance problem of Section 4.6, the constraints $y_i \ge x_i^2 \;\; \forall i \in N$
and the perspective version $y_i \ge x_i^2/z_i \;\; \forall i \in N$ can be placed in SOC
format in the same fashion as for the SQUFL problem in Section 4.1.
Using SOC software is generally preferable to using general NLP soft-
ware for MINLP instances whose only nonlinearities can be put in SOC
form. We will provide computational evidence for the improved perfor-
mance of SOCP solvers over general NLP software in Section 6.
5.3. LP solvers. We next discuss how to use outer approximation
cuts [9] to solve the perspective formulation via linear programming solvers.
As current LP solvers are significantly faster than both NLP and SOCP
solvers, this approach may offer significant advantages. We will discuss
this idea using a simple MINLP where the nonlinearity is restricted to the
objective function. Consider

$$\min_{(x,z) \in \mathbb{R}^n \times \mathbb{B}} \; \big\{ f(x) + cz \;\big|\; Ax \le bz \big\},$$

where (i) X = {x | Ax ≤ b} is bounded (also implying {x | Ax ≤ 0} = {0}),


(ii) f (x) is a convex function that is finite on X, and (iii) f (0) = 0. Under
these assumptions, for any x̄ ∈ X and subgradient s ∈ ∂f (x̄), the following
inequality

$$v \ge f(\bar{x}) + c + s^T(x - \bar{x}) + \big(c + f(\bar{x}) - s^T \bar{x}\big)(z - 1) \qquad (5.8)$$

is valid for the equivalent mixed-integer program

$$\min_{(x,z,v) \in \mathbb{R}^n \times \mathbb{B} \times \mathbb{R}} \; \big\{ v \;\big|\; v \ge f(x) + cz, \;\; Ax \le bz \big\}.$$

Inequality (5.8) is called the perspective cut and has been introduced by
Frangioni and Gentile [11]. In their paper, Frangioni and Gentile use these
cuts dynamically to build a tight formulation. It is possible to show [18]
that perspective cuts are indeed outer approximation cuts for the perspec-
tive reformulation for this MINLP and therefore adding all (infinitely many)
perspective cuts has the same strength as the perspective reformulation.
Furthermore, an interesting observation is that the perspective cuts
can also be obtained by first building a linear outer approximation of the
original nonlinear inequality v ≥ f (x) + cz, and then strengthening it using
a logical deductive argument. For example, in the SQUFL problem de-
scribed in Section 4.1, the outer approximation of the inequality $y_{ij} \ge x_{ij}^2$
at a given point $(\bar{x}, \bar{y}, \bar{z})$ is

$$y_{ij} \ge 2\bar{x}_{ij} x_{ij} - \bar{x}_{ij}^2. \qquad (5.9)$$


Using the observation that if zi = 0, then xij = yij = 0, this inequality can
be strengthened to

$$y_{ij} \ge 2\bar{x}_{ij} x_{ij} - \bar{x}_{ij}^2 z_i. \qquad (5.10)$$

The inequality (5.10) is precisely the perspective cut (5.8) for this instance.
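As a small sketch of generating these cuts (our illustration; the function name is hypothetical), the coefficients of (5.10) at a linearization point x̄ take a few lines:

    def squfl_perspective_cut(xbar: float):
        """Coefficients (a, b) of the perspective cut (5.10),
        y_ij >= a * x_ij + b * z_i, obtained by linearizing y >= x^2 at
        xbar and attaching the constant term to the indicator z_i."""
        return 2.0 * xbar, -xbar ** 2

    a, b = squfl_perspective_cut(0.4)
    # At (x, z) = (0.4, 1) the cut reads y >= 0.16 = 0.4**2, so it is tight;
    # at z = 0 it reduces to y >= 0.8 * x, which holds since x = y = 0 there.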
Following their work on perspective cuts, Frangioni and Gentile com-
putationally compare using a LP solver (where perspective cuts are added
dynamically) and using a second-order cone solver [13]. Based on their
experiments on instances of the unit commitment problem and the portfo-
lio optimization problem discussed earlier, they conclude that the dynamic
(linear) approximation approach is significantly better than an SOC ap-
proach. The LP approach offers significant advantages, such as fast resolves
in branch-and-bound, and the extensive array of cutting planes, branching
rules, and heuristics that are available in powerful commercial MILP soft-
ware. However, a dynamic cutting plane approach requires the use of the
callable library of the solver software to add the cuts. For practitioners,
an advantage of nonlinear automatic reformulation techniques is that they
may be directly implemented in a modeling language.
Here, we offer a simple heuristic to obtain some of the strength of the
perspective reformulation, while retaining the advantages of MILP software
to solve the subproblem and an algebraic modeling language to formulate
the instance. The heuristic works by choosing a set of points in advance
and writing the perspective cuts using these points. This essentially gives
a linear relaxation of the perspective formulation that uses piecewise linear
under-approximations of the nonlinear functions. Solving this underap-
proximating MILP provides a lower bound on the optimal solution value of
the MINLP. To obtain an upper bound, the integer variables may be fixed
at the values found by the solution to the MILP, and a continuous NLP
solved. We will demonstrate the effectiveness of this approach in the next
section.
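A minimal sketch of this static heuristic (our illustration, assuming numpy; the names are hypothetical): choose |B| breakpoints in [0, u] and emit the perspective cuts (5.10) at each, which together give the piecewise-linear under-approximation used in the experiments below.

    import numpy as np

    def perspective_cut_table(u, num_breakpoints):
        """Rows (a, b) such that y >= a*x + b*z is valid for y >= x^2/z
        with 0 <= x <= u*z and z binary; one row per breakpoint."""
        return [(2.0 * xb, -xb ** 2)
                for xb in np.linspace(0.0, u, num_breakpoints)]

    for a, b in perspective_cut_table(1.0, 5):
        print(f"y >= {a:.2f}*x {b:+.2f}*z")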
6. Computational results. The improvement in computational
performance that can be obtained by using the perspective reformulation
is exhibited on two families of instances, SQUFL (described in Section 4.1)
and SSSD (described in Section 4.5). We also demonstrate the behavior
of the various methodologies available (NLP, SOCP, LP) for solving the
relaxations. When demonstrating the behavior of LP to solve the relax-
ations, we use the heuristic approach described at the end of Section 5.3.
The reader interested in comparisons to a dynamic outer-approximation
approach is referred to the work of Frangioni and Gentile [13].
6.1. Separable quadratic uncapacitated facility location. Ran-
dom instances of SQUFL were generated similarly to the instances of Günlük,
Lee, and Weismantel [16]. For each facility $i \in M$, a location $p_i$ is generated
uniformly in $[0,1]^2$, and the variable cost parameter was calculated


as $q_{ij} = 50\, \| p_i - p_j \|_2$. The fixed cost $c_i$ of opening a facility is gener-


ated uniformly in [1, 100]. Ten random instances were created for values of
m ∈ {20, 30} and n ∈ {100, 150}. Thus, the instances solved had between
2100 and 4650 constraints, and between 2020 and 4530 variables, of which
20 or 30 (m) were binary variables. All instances were solved on an Intel
Core 2 2.4GHz CPU, with a CPU time limit of 8 hours. Each instance was
solved with four different methods.
1. The original formulation was solved with the CPLEX (v11.0) software
for Mixed-Integer Quadratic Programming (MIQP);
2. The perspective reformulation, with the perspective inequalities
put in rotated SOC form, was solved with the Mosek (v5.0) soft-
ware for Mixed-Integer Second-Order Cone Programming (MIS-
OCP);
3. The linear under-approximation using unstrengthened inequalities
(5.9) was solved with the CPLEX (v11.0) software for Mixed-
Integer Linear Programming (MILP); and
4. The linear under-approximation using the perspective cuts (5.10)
was solved with the CPLEX (v11.0) software for MILP.
Optimization problems like SQUFL, which have a convex quadratic objective
function and linear constraints, can be solved with a simplex-type pivot-
ing method [20]. The pivoting method has significant advantages when
used for branch-and-bound, as the solution procedure for child subprob-
lems may be very effectively warmstarted using solution information from
the parent node. CPLEX (v11.0) can use this pivoting-based method in its
MIQP solver, which is why CPLEX was chosen as the solver for original
formulation in method (1). The (rotated) SOC constraints (5.5) in the
perspective reformulation can be directly specified to the solver Mosek via
the GAMS modeling language. For this reason, we chose to use Mosek for
solving the perspective reformulation in method (2). In methods (3) and
(4), |B| = 10 breakpoints equally distributed between 0 and 1 were used
to underestimate each nonlinear function. After solving the approximating
MILP, the value of the integer variables was fixed and the resulting con-
tinuous relaxation (a convex quadratic program) was solved with CPLEX.
Note that methods (3) and (4) are both implementations of the heuristic
method described at the end of Section 5.3. Methods (1) and (2) are exact
methods for solving the same problems. In order to make method (3) or
(4) exact, the linearizations would have to be added dynamically as cutting
planes.
Nonlinear and SOC Solvers: Table 1 shows the average performance of
the two nonlinear formulations, methods (1) and (2), run on an Intel Core
2 2.4GHz CPU, with a CPU time limit of 8 hours. In the heading of the
table, z̄ ∗ is the average optimal solution value, “# sol” denotes the number
of instances out of 10 that were solved to optimality within the time limit,
T̄ is the average CPU time required by the solver, and N̄ is the average
number of nodes in the branch and bound search tree.


Table 1
Computational behavior of nonlinear formulations on SQUFL.

                             Original                 Perspective
 m    n     z̄∗          #sol     T̄        N̄        #sol    T̄     N̄
 20   100   408.31       10      307     6,165       10     18    37
 20   150   508.38       10      807     7,409       10     33    29
 30   100   375.86       10    4,704    67,808       10     33    53
 30   150   462.69        7   16,607    96,591       10     56    40

From this experiment, the strong positive influence of the perspective


reformulation is apparent—instances that cannot be solved in 8 hours with
the original formulation are solved in around a minute using the strong
perspective reformulation. In [18], the commercial solvers DICOPT [19]
and BARON [28] were unable to solve small instances without the refor-
mulation technique. This demonstrates that commercial MINLP solvers
have yet to implement the perspective reformulation technique.
LP Solvers: Table 2 shows the results of solving the SQUFL instances
using the two different linear approximations to the problem. In the ta-
ble, the average of all lower bounds (obtained by solving the MILP outer-
approximation to optimality) is given in the column z̄lb of the table. The av-
erage upper bound on z ∗ is given in the column z̄ub . The average CPU time
(T̄ ) and average number of nodes (N̄ ) is also given in the table. The results
demonstrate that strengthening the linear approximation of the nonlinear
functions (the perspective cuts) significantly strengthens the formulation,
as indicated by the significantly reduced number of nodes. Solving the root
node linear relaxation with the piecewise linear approximation requires a
significant time for both the strengthened and unstrengthened formulations,
so the time improvements are not as great as in the nonlinear case.

Table 2
Computational behavior of linear formulations on SQUFL.

                                             Original        Perspective
 m    n     z̄∗       z̄ub      z̄lb         T̄     N̄        T̄     N̄
 20   100   408.31   410.88   373.79       247   491       28     4
 20   150   508.38   510.58   449.42       658   510      183     3
 30   100   375.86   378.45   335.58       346   510      171     3
 30   150   462.69   466.76   389.30       948   475      582     4

The lower bound provided by the solution to the approximating MILP


is on average about 10% below the optimal solution value, and the upper
bound found by fixing the integer variables is typically quite close to the


true optimal solution. In order to reduce the gap between lower and upper
bounds in this heuristic approach, a finer approximation of the nonlinear
function can be created by increasing |B|. Table 3 shows the results of
an experiment where the instances were approximated with more linear
inequalities, and the strengthened (perspective) reformulation was solved.
In the table, Ḡ(%) = 100(z̄ub − z̄lb )/z̄ub denotes the average gap between
lower and upper bounds in the heuristic approach. For these instances,
|B| = 50 breakpoints is typically sufficient to prove that the solution ob-
tained is within 1% of optimality. Note, however, the increase in CPU time
required to solve the linear relaxations.

Table 3
Impact of the number of piecewise-linear breakpoints on gap and solution time.

 m    n     |B|    Ḡ(%)      T̄      N̄
 20   100   10      9.12%     28     4
 20   100   25      1.23%    122     2
 20   100   50      0.31%    367     3
 20   150   10     11.98%    183     3
 20   150   25      1.45%    841     6
 20   150   50      0.41%   2338     6
 30   100   10     11.32%    171     3
 30   100   25      1.35%   1000     9
 30   100   50      0.39%   1877     5
 30   150   10     16.6%     582     4
 30   150   25      2.09%   1433     6
 30   150   50      0.48%   3419     6

6.2. Stochastic service design. Instances of the SSSD were ran-


domly created following the suggestions of Elhedhli [10]. The demand
rate of each customer λi was uniformly distributed between 0.5 and 1.5.
The lowest service rate μj1 was uniformly distributed between 5|M |/8|N |
and 7|M |/8|N |. For |K| = 3, the remaining service levels were set to
μj2 = 2μj1 , and μj3 = 3μj1 . The fixed cost for the lowest service level was
uniformly distributed between 250 and 500, and to induce economies of
scale, if $u_j = f_{j1}/\mu_{j1}$, then fixed costs for the higher service levels were set
to $f_{j2} = u_j \mu_{j2}^{2/3}$ and $f_{j3} = u_j \mu_{j3}^{1/2}$. We use t = 100 for the queuing
delay weight parameter, as these instances appear to be the most difficult
computationally in the work of Elhedhli [10]. Instances with |M| = 15, 20, 25
and |N | = 4, 8 were created. The MINLP formulations contained between
48 and 90 rows and between 89 and 257 columns, where the majority of the
variables (between 72 and 224) were binary. The number of breakpoints
used in the linearization for each instance was |B| = 10.


Table 4
Bonmin performance on SSSD instances.

                     Without Perspective                    With Perspective
 |M|   |N|   z∗                T (sec.)   #N
 15    4     5.76              1161       21357             Failed after 236 nodes
 15    8     (9.37, 9.41)      14400      224500            Failed after 91 nodes
 20    4     3.71              282        6342              Failed after 786 nodes
 20    8     (4.391, 4.393)    14400      188987            Failed after 144 nodes
 25    4     2.98              238        3914              Failed after 235 nodes
 25    8     Failed after 2468 nodes                        Failed after 85 nodes

Each instance was solved six times with the following combination of
formulation and software:
1. The original MINLP was solved with Bonmin (v0.9);
2. The perspective strengthened MINLP was solved with Bonmin;
3. The original instance was formulated using CQI and solved with
Mosek (v5.0);
4. The perspective strengthened instance was formulated using CQI
and solved with Mosek;
5. The linear under-approximation, not strengthened with perspec-
tive cuts, was solved with the CPLEX (v11.1) MILP solver. After
fixing the integer variables to the solution of this problem, the
continuous NLP problem was solved with IPOPT (v3.4);
6. The linear under-approximation, strengthened with perspective
cuts, was solved with the CPLEX MILP solver. After fixing the
integer variables to the solution of this problem, the continuous
NLP problem was solved with IPOPT (v3.4).
NLP Solvers: Table 4 shows the results of solving each instance, with and
without the perspective strengthening, using the MINLP solver Bonmin.
Bonmin uses the interior-point-based solver IPOPT to solve nonlinear re-
laxations. The table lists the optimal solution value (z ∗ ) (or bounds on
the best optimal solution), the CPU time required (T ) in seconds, and the
number of nodes evaluated (#N ). A time limit of 4 hours was imposed.
In all cases, the NLP solver IPOPT failed at a node of the branch
and bound tree with the message “Error: Ipopt exited with error
Restoration failed.” For these instances, the NLP relaxation of SSSD
(especially the perspective-enhanced NLP relaxation) appears difficult to
solve. The fundamental issue is reliability, not time: when successful, all
NLP relaxations solved in less than one second. We performed a small
experiment designed to test the impact of the formulation and NLP software.
In this experiment, four different nonlinear formulations of the perspective
constraints were used, and the root node NLP relaxation was solved by


Table 5
Number of successful SSSD relaxation solutions (out of 10).

Formulation
Solver F1 F2 F3 F4
Ipopt 0 10 10 10
Conopt 0 0 10 10
SNOPT 0 3 0 7

three different NLP packages: Ipopt (v3.4), Conopt (v3.14S), and SNOPT
(v7.2-4). The root relaxation was also solved by Mosek (using the conic
formulation) to obtain the true optimal solution value. The four different
formulations of the perspective strengthening of the nonlinear constraints
(4.17) were the following:

zjk yjk + zjk vj − vj yjk ≤ 0,      (F1)

zjk − vj / (1 + vj/yjk) ≤ 0,      (F2)

vj² − (vj − zjk)(vj + yjk) ≤ 0,      (F3)

(4vj² + (yjk + zjk)²)^{1/2} − 2vj − yjk + zjk ≤ 0.      (F4)

In all cases, an initial iterate of



yjk = 1/|K|,   xij = 1/|J|,   vj = Σi λi xij / (Σk μjk yjk − Σi λi xij),   zjk = vj yjk / (1 + vj)

was used. Ten random instances of size |M | = 100, |N | = 40 were solved,


and Table 5 shows the number of instances for which the root node was
correctly solved to (global) optimality for each formulation and software
package. The results of this experiment indicate that modeling conic (per-
spective) inequalities in their convex form (F4) has an appreciably positive
impact on the solution quality. The perspective results for Bonmin in Ta-
ble 4 were obtained with the formulation (F4), so even using the “best”
formulation for the NLP solver was not sufficient to ensure the correct
solution to the instance.
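
As a sanity check on the formulations above, the following sketch (illustrative only, not part of the authors' experiment) verifies numerically that (F2), (F3), and the conic form (F4) delimit the same set of points (vj, yjk, zjk) when yjk > 0:

    import math, random

    def viol_f2(v, y, z):   # z - v/(1 + v/y) <= 0
        return z - v / (1.0 + v / y)

    def viol_f3(v, y, z):   # v^2 - (v - z)(v + y) <= 0
        return v * v - (v - z) * (v + y)

    def viol_f4(v, y, z):   # second-order cone form: ||(2v, y+z)|| <= 2v + y - z
        return math.hypot(2.0 * v, y + z) - (2.0 * v + y - z)

    rng, bad, tol = random.Random(1), 0, 1e-7
    for _ in range(100000):
        v, z = rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)
        y = rng.uniform(0.01, 1.0)
        s = [viol_f2(v, y, z), viol_f3(v, y, z), viol_f4(v, y, z)]
        # consistent: either all (weakly) satisfied or all (weakly) violated
        if not (all(t <= tol for t in s) or all(t >= -tol for t in s)):
            bad += 1
    print("inconsistent sample points:", bad)   # expected: 0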
SOCP Solvers: Table 6 shows the results of solving the SSSD instances
with the Mosek software (methods (3) and (4)). The table lists the time
(T ) in seconds, and the number of nodes (#N ). A time limit of 4 hours
was imposed, and if the time limit was reached, indicated by T ∗ in the
table, bounds on the optimal solution value are listed. In some cases, using
the perspective reformulation has a very positive impact, while in other
cases, the difference is minor. It is interesting to note the difference in
behavior between Bonmin and Mosek without perspective strengthening.
For example, Bonmin solved the |M | = 15, |N | = 4 instance in 21357 nodes,


while Mosek does not solve this instance in more than 1.9 million nodes.
For these SSSD instances, Bonmin is able to add strong valid inequalities
to improve performance, while Mosek does not add these inequalities.

Table 6
Mosek performance on SSSD instances.

Without Perspective With Perspective



|M | |N | z∗ T #N z∗ T #N

15 4 (4.88,5.76) T∗ 1.95M 5.76 250 26454
15 8 (8.53,9.41) T∗ 2.54M (9.38,9.41) T∗ 2.06M
20 4 (2.77,3.71) T∗ 3.44M 3.71 52 12806
20 8 (4.31,4.39) T∗ 2.13M (4.391,4.393) T∗ 1.50M
25 4 2.98 46 10128 2.98 19 3045
25 8 (6.143,6.146) T∗ 1.21M (6.143,6.146) T∗ 1.18M

LP Solvers: Table 7 shows the results on the SSSD obtained when
solving the linearized instances with CPLEX (methods (5) and (6)). In the
table, zlb is the lower bound obtained by solving the MILP to optimality, and
the time (T) and number of nodes (#N) are also given for the search. The
value zub is obtained by fixing the (optimal) integer solution and solving
the NLP with Ipopt. The time required to solve the NLP is negligible.
The results for the linear case are interesting. Specifically, strengthening
the root relaxation by adding perspective cuts does not appear to help
the computations in this case. For these instances, CPLEX is able to
significantly improve the gap at the root node by adding its own cutting
planes. Table 8 demonstrates the improvement in root lower bound value
between the initial solve and final processing for both the strengthened
and unstrengthened linear approximation, as well as the root lower bounds
obtained from the nonlinear formulations. Note also the case |M | = 25,
|N | = 4, where the solution obtained after fixing integer variables is far
from optimal, demonstrating that one should not always expect to obtain
a good solution from the linearization-based heuristic method.

7. Conclusions. The perspective reformulation is a tool to create


strong relaxations of convex mixed integer nonlinear programs that have
0-1 variables to indicate special on-off logical relationships. The perspec-
tive reformulation can be derived as a special case of theorems of convex
analysis or via techniques more familiar to MILP researchers: extended
formulations, projection, and convex hulls of “simple” sets.
Many applications of MINLP could take advantage of this reformula-
tion technique. In the applications described in this survey, the reformu-
lated inequalities can be cast as second-order cone constraints, a transfor-
mation that can improve an instance’s solvability.


Table 7
Linear/CPLEX performance on SSSD instances.

Without Perspective With Perspective


|M | |N | zlb zub T #N zlb zub T #N
15 4 5.75 5.76 1.1 7586 5.75 5.76 1.1 7755
15 8 9.11 9.71 242 1.32M 9.11 9.71 818 4.46M
20 4 3.40 7.07 0.7 908 3.40 7.07 0.3 1517
20 8 4.37 4.41 810 5.59M 4.37 4.41 933 5.93M
25 4 2.64 17.2 0.7 486 2.64 17.2 0.8 541
25 8 6.13 6.16 374 1.75M 6.13 6.16 599 2.93M

Table 8
Initial and final root lower bounds for SSSD instances.

                  Without Perspective                   With Perspective
|M | |N |  Nonlinear  Linear(Init)  Linear(Final)  Nonlinear  Linear(Init)  Linear(Final)
15   4     1.76       1.74          3.66           4.42       4.02          4.21
15   8     3.16       2.06          5.68           6.75       6.11          6.41
20   4     1.34       1.33          1.87           2.56       1.89          2.13
20   8     1.64       1.55          2.41           2.93       2.86          2.99
25   4     1.05       1.05          1.45           2.13       1.45          1.54
25   8     2.18       2.15          3.53           4.27       3.97          4.20

We hope this survey has achieved its goals of introducing a wider


audience to the perspective reformulation technique, motivating software
developers to consider automatic recognition of the structures required for
the perspective reformulation, and spurring the research community to
investigate additional simple sets occurring in practical MINLPs in the
hope of deriving strong relaxations.
Acknowledgments. The authors would like to thank Jon Lee and
Sven Leyffer for organizing the very fruitful meeting on MINLP at the
Institute for Mathematics and its Applications (IMA) in November, 2008.
The comments of Kevin Furman and Nick Sawaya were also helpful in
preparing Section 2.2. The comments of two anonymous referees helped
clarify the presentation and contribution.

REFERENCES

[1] S. Aktürk, A. Atamtürk, and S. Gürel, A strong conic quadratic reformulation


for machine-job assignment with controllable processing times, Operations Re-
search Letters, 37 (2009), pp. 187–191.


[2] E. Balas, Disjunctive programming and a hierarchy of relaxations for discrete


optimization problems, SIAM Journal on Algebraic and Discrete Methods, 6
(1985), pp. 466–486.
[3] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization,
SIAM, 2001. MPS/SIAM Series on Optimization.
[4] D. Bertsekas and R. Gallager, Data Networks, Prentice-Hall, Englewood Cliffs,
NJ, 1987.
[5] P. Bonami, G. Cornuéjols, and H. Hijazi, Mixed integer non-linear programs
with on/off constraints: Convex analysis and applications, 2009. Poster Pre-
sentation at MIP 2009 Conference.
[6] R. Boorstyn and H. Frank, Large-scale network topological optimization, IEEE
Transactions on Communications, 25 (1977), pp. 29–47.
[7] B. Borchers and J.E. Mitchell, An improved branch and bound algorithm for
mixed integer nonlinear programs, Computers & Operations Research, 21
(1994), pp. 359–368.
[8] S. Ceria and J. Soares, Convex programming for disjunctive optimization, Math-
ematical Programming, 86 (1999), pp. 595–614.
[9] M.A. Duran and I. Grossmann, An outer-approximation algorithm for a class
of mixed-integer nonlinear programs, Mathematical Programming, 36 (1986),
pp. 307–339.
[10] S. Elhedhli, Service system design with immobile servers, stochastic demand,
and congestion, Manufacturing & Service Operations Management, 8 (2006),
pp. 92–97.
[11] A. Frangioni and C. Gentile, Perspective cuts for a class of convex 0–1 mixed
integer programs, Mathematical Programming, 106 (2006), pp. 225–236.
[12] A. Frangioni and C. Gentile, SDP diagonalizations and perspective cuts for a class of nonseparable
MIQP, Operations Research Letters, 35 (2007), pp. 181–185.
[13] A. Frangioni and C. Gentile, A computational comparison of reformulations of the perspective relax-
ation: SOCP vs. cutting planes, Operations Research Letters, 24 (2009),
pp. 105–113.
[14] K. Furman, I. Grossmann, and N. Sawaya, An exact MINLP formulation for
nonlinear disjunctive programs based on the convex hull, 2009. Presentation
at 20th International Symposium on Mathematical Programming.
[15] I. Grossmann and S. Lee, Generalized convex disjunctive programming: Non-
linear convex hull relaxation, Computational Optimization and Applications
(2003), pp. 83–100.
[16] O. Günlük, J. Lee, and R. Weismantel, MINLP strengthening for separable
convex quadratic transportation-cost UFL, Tech. Rep. RC24213 (W0703-042),
IBM Research Division, March 2007.
[17] O. Günlük and J. Linderoth, Perspective relaxation of mixed integer nonlinear
programs with indicator variables, Tech. Rep. RC24694 (W0811-076), IBM
Research Division, November 2008.
[18] O. Günlük and J. Linderoth, Perspective relaxation of mixed integer nonlinear programs with indicator
variables, Mathematical Programming, Series B, 104 (2010), pp. 183–206.
[19] G.R. Kocis and I.E. Grossmann, Computational experience with DICOPT solv-
ing MINLP problems in process systems engineering, Computers and Chemi-
cal Engineering, 13 (1989), pp. 307–315.
[20] C.E. Lemke, Bimatrix equilibrium points and mathematical programming, Man-
agement Science, 11 (1965), pp. 681–689.
[21] S. Leyffer and J. Linderoth, A practical guide to mixed integer nonlinear pro-
gramming, 2005. Short Course offered at SIAM Optimization Conference.
[22] H.M. Markowitz, Portfolio selection, Journal of Finance, 7 (1952), pp. 77–91.
[23] Mosek ApS, The MOSEK optimization tools manual, version 5.0 (revision 84), 2008.
(www.mosek.com)
[24] A.F. Perold, Large-scale portfolio optimization, Management Science, 30 (1984),
pp. 1143–1160.


[25] R. Stubbs and S. Mehrotra, A branch-and-cut method for 0-1 mixed convex
programming, Mathematical Programming, 86 (1999), pp. 515–532.
[26] R.A. Stubbs, Branch-and-Cut Methods for Mixed 0-1 Convex Programming, PhD
thesis, Northwestern University, December 1996.
[27] J.F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over sym-
metric cones, Optimization Methods and Software, 11–12 (1999), pp. 625–653.
[28] M. Tawarmalani and N.V. Sahinidis, Global optimization of mixed integer non-
linear programs: A theoretical and computational study, Mathematical Pro-
gramming, 99 (2004), pp. 563–591.

PART II:
Disjunctive Programming

GENERALIZED DISJUNCTIVE PROGRAMMING:
A FRAMEWORK FOR FORMULATION AND
ALTERNATIVE ALGORITHMS FOR
MINLP OPTIMIZATION
IGNACIO E. GROSSMANN∗ AND JUAN P. RUIZ

Abstract. Generalized disjunctive programming (GDP) is an extension of the dis-


junctive programming paradigm developed by Balas. The GDP formulation involves
Boolean and continuous variables that are specified in algebraic constraints, disjunc-
tions and logic propositions, which is an alternative representation to the traditional
algebraic mixed-integer programming formulation. After providing a brief review of
MINLP optimization, we present an overview of GDP for the case of convex functions
emphasizing the quality of continuous relaxations of alternative reformulations that in-
clude the big-M and the hull relaxation. We then review disjunctive branch and bound
as well as logic-based decomposition methods that circumvent some of the limitations
in traditional MINLP optimization. We next consider the case of linear GDP problems
to show how a hierarchy of relaxations can be developed by performing sequential inter-
section of disjunctions. Finally, for the case when the GDP problem involves nonconvex
functions, we propose a scheme for tightening the lower bounds for obtaining the global
optimum using a combined disjunctive and spatial branch and bound search. We illus-
trate the application of the theoretical concepts and algorithms on several engineering
and OR problems.

Key words. Disjunctive programming, Mixed-integer nonlinear programming,


global optimization.

AMS(MOS) subject classifications.

1. Introduction. Mixed-integer optimization provides a framework


for mathematically modeling many optimization problems that involve dis-
crete and continuous variables. Over the last few years there has been a
pronounced increase in the development of these models, particularly in
process systems engineering [15, 21, 27].
Mixed-integer linear programming (MILP) methods and codes such
as CPLEX, XPRESS and GUROBI have made great advances and are
currently applied to increasingly larger problems. Mixed-integer nonlinear
programming (MINLP) has also made significant progress as a number of
codes have been developed over the last decade (e.g. DICOPT, SBB, α-
ECP, Bonmin, FilMINT, BARON, etc.). Despite these advances, three
basic questions still remain in this area: a) How to develop the “best”
model?, b) How to improve the relaxation in these models?, c) How to
solve nonconvex GDP problems to global optimality?
Motivated by the above questions, one of the trends has been to rep-
resent discrete and continuous optimization problems by models consisting
of algebraic constraints, logic disjunctions and logic relations [32, 18]. The

∗ Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213
([email protected]). NSF OCI-0750826.


basic motivation in using these representations is: a) to facilitate the mod-


eling of discrete and continuous optimization problems, b) to retain and
exploit the inherent logic structure of problems to reduce the combinatorics
and to improve the relaxations, and c) to improve the bounds of the global
optimum in nonconvex problems.
In this paper we provide an overview of Generalized Disjunctive Pro-
gramming [32], which can be regarded as a generalization of disjunctive
programming [3]. In contrast to the traditional algebraic mixed-integer
programming formulations, the GDP formulation involves Boolean and
continuous variables that are specified in algebraic constraints, disjunctions
and logic propositions. We first address the solution of GDP problems for
the case of convex functions for which we consider the big-M and the hull
relaxation MINLP reformulations. We then review disjunctive branch and
bound as well as logic-based decomposition methods that circumvent some
of the MINLP reformulations. We next consider the case of linear GDP
problems to show how a hierarchy of relaxations can be developed by per-
forming sequential intersection of disjunctions. Finally, for the case when
the GDP problem involves nonconvex functions, we describe a scheme for
tightening the lower bounds for obtaining the global optimum using a com-
bined disjunctive and spatial branch and bound search. We illustrate the
application of the theoretical concepts and algorithms on several engineer-
ing and OR problems.
2. Generalized disjunctive programming. The most basic form
of an MINLP problem is as follows:
min Z = f(x, y)
s.t.  gj(x, y) ≤ 0,   j ∈ J      (MINLP)
      x ∈ X, y ∈ Y
where f : Rn → R1 , g : Rn → Rm are differentiable functions, J is the index
set of constraints, and x and y are the continuous and discrete variables,
respectively. In the general case the MINLP problem will also involve
nonlinear equations, which we omit here for convenience in the presen-
tation. The set X commonly corresponds to a convex compact set, e.g.
X = {x|x ∈ Rn , Dx ≤ d, xlo ≤ x ≤ xup }; the discrete set Y corresponds to
a polyhedral set of integer points, Y = {y|y ∈ Z m , Ay ≤ a}, which in most
applications is restricted to 0-1 values, y ∈ {0, 1}m . In most applications
of interest the objective and constraint functions f , g are linear in y (e.g.
fixed cost charges and mixed-logic constraints): f (x, y) = cT y + r(x),
g(x, y) = By + h(x). The derivation of most methods for MINLP assumes
that the functions f and g are convex [14].
An alternative approach for representing discrete and continuous op-
timization problems is by using models consisting of algebraic constraints,
logic disjunctions and logic propositions [4, 32, 40, 18, 19, 22]. This ap-
proach not only facilitates the development of the models by making the


formulation process intuitive, but it also keeps in the model the underlying
logic structure of the problem that can be exploited to find the solution
more efficiently. A particular case of these models is generalized disjunc-
tive programming (GDP) [32] the main focus of this paper, and which can
be regarded as a generalization of disjunctive programming [3]. Process
Design [15] and Planning and Scheduling [27] are some of the areas where
GDP formulations have shown to be successful.
2.1. Formulation. The general structure of a GDP can be repre-
sented as follows [32]:

min Z = f(x) + Σ_{k∈K} ck
s.t.  g(x) ≤ 0
      ∨_{i∈Dk} [ Yik ;  rik(x) ≤ 0 ;  ck = γik ],   k ∈ K      (GDP)
      Ω(Y) = True
      x^lo ≤ x ≤ x^up
      x ∈ Rn,  ck ∈ R1,  Yik ∈ {True, False},  i ∈ Dk, k ∈ K

where f : Rn → R1 is a function of the continuous variables x in


the objective function, g : Rn → Rl belongs to the set of global con-
straints, and the disjunctions k ∈ K are composed of a number of terms
i ∈ Dk that are connected by the OR operator. In each term there is
a Boolean variable Yik , a set of inequalities rik (x) ≤ 0, rik : Rn → Rm ,
and a cost variable ck . If Yik is True, then rik (x) ≤ 0 and ck = γik
are enforced; otherwise they are ignored. Also, Ω(Y) = True are logic
propositions for the Boolean variables expressed in the conjunctive normal
form Ω(Y) = ∧_{t=1,...,T} [ (∨_{Yik∈Rt} Yik) ∨ (∨_{Yik∈Qt} ¬Yik) ], where
for each clause t, t = 1, 2, ..., T, Rt is the subset of Boolean variables that
are non-negated, and Qt is the subset of Boolean variables that are negated.
As indicated in [36], we assume that the logic constraints ∨_{i∈Dk} Yik,
k ∈ K, are contained in Ω(Y) = True.
There are three major cases that arise in problem (GDP): a) linear
functions f , g and r; b) convex nonlinear functions f , g and r ; c) non-
convex functions f , g and r. Each of these cases requires different solution
methods.
2.2. Illustrative example. The following example illustrates how
the GDP framework can be used to model the optimization of a simple
process network shown in Figure 1 that produces a product B by consuming
a raw material A. The variables F represent material flows. The problem
is to determine the amount of product to produce (F8 ) with a selling price
P1 , the amount of raw material to buy (F1 ) with a cost P2 and the set


Fig. 1. Process network example.

of unit operations to use (i.e. HX1, R1, R2, DC1) with a cost ck = γk ,
k ∈ {HX1, R1, R2, DC1} , in order to maximize the profit.
The generalized disjunctive program that represents the problem can
be formulated as follows:

max Z = P1 F8 − P2 F1 − Σ_{k∈K} ck      (1)

s.t. F1 = F3 + F2 (2)
F8 = F7 + F5 (3)
[ YHX1 ;  F4 = F3 ;  cHX1 = γHX1 ]  ∨  [ ¬YHX1 ;  F4 = F3 = 0 ;  cHX1 = 0 ]      (4)

[ YR2 ;  F5 = β1 F4 ;  cR2 = γR2 ]  ∨  [ ¬YR2 ;  F5 = F4 = 0 ;  cR2 = 0 ]      (5)

[ YR1 ;  F6 = β2 F2 ;  cR1 = γR1 ]  ∨  [ ¬YR1 ;  F6 = F2 = 0 ;  cR1 = 0 ]      (6)

[ YDC1 ;  F7 = β3 F6 ;  cDC1 = γDC1 ]  ∨  [ ¬YDC1 ;  F7 = F6 = 0 ;  cDC1 = 0 ]      (7)

YR2 ⇔ YHX1 (8)


YR1 ⇔ YDC1 (9)


Fi ∈ R,  ck ∈ R1,  Yk ∈ {True, False},   i ∈ {1, ..., 8},  k ∈ {HX1, R1, R2, DC1}

where (1) is the objective function, (2) and (3) are the global
constraints representing the mass balances around the splitter and the
mixer, respectively, the disjunctions (4)-(7) represent the existence or
non-existence of the unit operations k ∈ {HX1, R1, R2, DC1} together with
their characteristic equations, where β is the ratio between the inlet and
outlet flows, and (8) and (9) are the logic propositions enforcing the
selection of DC1 if and only if R1 is chosen, and of HX1 if and only if
R2 is chosen. For the sake of simplicity we have presented here a simple
linear model; in an actual process application there would
be hundreds or thousands of nonlinear equations.

2.3. Solution methods.

2.3.1. MINLP reformulation. In order to take advantage of
existing MINLP solvers, GDPs are often reformulated as MINLPs by
using either the big-M (BM) [28] or the hull relaxation (HR) [22] refor-
mulation. The former yields:
 
min Z = f(x) + Σ_{k∈K} Σ_{i∈Dk} γik yik
s.t.  g(x) ≤ 0
      rik(x) ≤ M(1 − yik),   i ∈ Dk, k ∈ K      (BM)
      Σ_{i∈Dk} yik = 1,   k ∈ K
      Ay ≥ a
      x ∈ Rn,  yik ∈ {0, 1},  i ∈ Dk, k ∈ K
where the variable yik has a one-to-one correspondence with the Boolean
variable Yik. Note that when yik = 0 and the parameter M is sufficiently
large, the associated constraint becomes redundant; when yik = 1, it is en-
forced. Also, Ay ≥ a is the reformulation of the logic constraints in the
discrete space, which can be easily accomplished as described by
Williams [46] and discussed by Raman and Grossmann [32].
The hull reformulation yields,
 
min Z = f(x) + Σ_{k∈K} Σ_{i∈Dk} γik yik
s.t.  x = Σ_{i∈Dk} νik,   k ∈ K
      g(x) ≤ 0
      yik rik(νik/yik) ≤ 0,   i ∈ Dk, k ∈ K      (HR)
      0 ≤ νik ≤ yik x^up,   i ∈ Dk, k ∈ K
      Σ_{i∈Dk} yik = 1,   k ∈ K
      Ay ≥ a
      x ∈ Rn,  νik ∈ Rn,  ck ∈ R1,  yik ∈ {0, 1},  i ∈ Dk, k ∈ K.

As can be seen, the HR reformulation is less intuitive than the BM.
However, there is also a one-to-one correspondence between (GDP) and
(HR). Note that the size of the problem is increased by introducing a new
set of disaggregated variables νik and new constraints. On the other hand,
as proved in Grossmann and Lee [16] and discussed by Vecchietti, Lee and
Grossmann [41], the HR formulation is at least as tight and generally tighter
than the BM when the discrete domain is relaxed (i.e. 0 ≤ yik ≤ 1, k ∈
K, i ∈ Dk ). This is of great importance considering that the efficiency of
MINLP solvers relies heavily on the quality of these relaxations.
It is important to note that, on the one hand, the term yik rik(νik/yik) is
convex if rik(x) is a convex function; on the other hand, the term requires
a suitable approximation to avoid singularities at yik = 0. Sawaya [36]
proposed the following reformulation, which yields an exact approximation
at yik = 0 and yik = 1 for any value of ε in the interval (0,1), while the
feasibility and convexity of the approximating problem are maintained:
yik rik(νik/yik) ≈ ((1 − ε)yik + ε) rik(νik/((1 − ε)yik + ε)) − ε rik(0)(1 − yik)
Note that this approximation assumes that rik (x) is defined at x = 0
and that the inequality 0 ≤ ν ik ≤ yik xup is enforced.
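
A quick numeric check of this ε-reformulation is straightforward; the sketch below (with r(x) = x² − 1 as an arbitrary stand-in for a convex constraint function) confirms exactness at yik = 1 and, with νik = 0 as forced by the bound above, at yik = 0:

    def persp(r, nu, y):
        # exact perspective term y * r(nu / y), defined for y > 0
        return y * r(nu / y)

    def persp_eps(r, nu, y, eps=1e-4):
        # Sawaya's approximation: exact at y = 0 and y = 1 for any eps in (0,1)
        t = (1.0 - eps) * y + eps
        return t * r(nu / t) - eps * r(0.0) * (1.0 - y)

    r = lambda x: x * x - 1.0
    print(persp_eps(r, 0.7, 1.0), persp(r, 0.7, 1.0))  # identical at y = 1
    print(persp_eps(r, 0.0, 0.0))                      # 0 at y = 0, nu = 0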
Methods that have addressed the solution of problem (MINLP) in-
clude the branch and bound method (BB) [17, 7, 38, 24], Generalized Ben-
ders Decomposition (GBD) [13], Outer-Approximation (OA) [10, 47, 11],
LP/NLP based branch and bound [29, 8], and Extended Cutting Plane
Method (ECP) [44, 45]. An extensive description of these methods is dis-
cussed in the review paper by Grossmann [14] while some implementation
issues are discussed in Liberti et al. [25].
The number of computer codes for solving MINLP problems has in-
creased in the last decade. The program DICOPT [43] is an MINLP solver
that is available in the modeling system GAMS [9], and is based on the
outer-approximation method with heuristics for handling nonconvexities.
A similar code to DICOPT, AAOA, is available in AIMMS. Codes that im-
plement the branch and bound method include the code MINLP-BB that
is based on an SQP algorithm [24] and is available in AMPL, and the code
SBB which is available in GAMS [9]. Both codes assume that the bounds
are valid even though the original problem may be nonconvex. The code
α–ECP that is available in GAMS implements the extended cutting plane
method by Westerlund and Pettersson [44], including the extension by
Westerlund and Pörn [45]. The open source code Bonmin [8] implements
the branch and bound method, the outer-approximation and an extension
of the LP/NLP based branch and bound method in one single framework.
FilMINT [1] also implements a variant of the LP/NLP based branch and
bound method. Codes for the global optimization that implement the


spatial branch and bound method include BARON [34], LINDOGlobal [26],
and Couenne [5].
2.3.2. Logic-Based Methods. In order to fully exploit the logic
structure of GDP problems, two other solution methods have been proposed
for the case of convex nonlinear GDP: the disjunctive branch and bound
method [22], which builds on the branch and bound method of
Beaumont [4], and the Logic-Based Outer-Approximation method [40].
The basic idea in the disjunctive branch and bound method is
to directly branch on the constraints corresponding to particular terms
in the disjunctions, while considering the hull relaxation of the remaining
disjunctions. Although the tightness of the relaxation at each node is com-
parable with the one obtained when solving the HR reformulation with an
MINLP solver, the problems solved are smaller and the numerical
robustness is improved.
For the case of Logic-Based Outer-Approximation methods, sim-
ilar to the case of OA for MINLP, the main idea is to solve iteratively a
master problem given by a linear GDP, which will give a lower bound on
the solution, and an NLP subproblem, which will give an upper bound. As
described in Turkay and Grossmann [40], for fixed values of the Boolean
variables, Yîk = True and Yik = False for i ≠ î, the corresponding NLP
subproblem (SNLP) is as follows:

min Z = f(x) + Σ_{k∈K} ck
s.t.  g(x) ≤ 0
      rik(x) ≤ 0,  ck = γik   for Yik = True,  i ∈ Dk, k ∈ K      (SNLP)
      x^lo ≤ x ≤ x^up
      x ∈ Rn,  ck ∈ R1.

It is important to note that only the constraints that belong to the
active terms in the disjunctions (i.e., those with associated Boolean variable
Yik = True) are imposed. This leads to a substantial reduction in the size of
the problem compared to the direct application of the traditional OA method
on the MINLP reformulation. Assuming that L subproblems are solved,
in which sets of linearizations ℓ = 1, 2, ..., L are generated for subsets of
disjunction terms Lik = {ℓ | Yik^ℓ = True}, one can define the following
disjunctive OA master problem (MLGDP):

min Z = α + Σ_{k∈K} ck
s.t.  α ≥ f(x^ℓ) + ∇f(x^ℓ)^T (x − x^ℓ)
      g(x^ℓ) + ∇g(x^ℓ)^T (x − x^ℓ) ≤ 0        ℓ = 1, 2, ..., L
      ∨_{i∈Dk} [ Yik ;  rik(x^ℓ) + ∇rik(x^ℓ)^T (x − x^ℓ) ≤ 0, ℓ ∈ Lik ;  ck = γik ],   k ∈ K      (MLGDP)
      Ω(Y) = True
      x^lo ≤ x ≤ x^up
      α ∈ R1,  x ∈ Rn,  ck ∈ R1,  Yik ∈ {True, False},  i ∈ Dk, k ∈ K.
It should be noted that before applying the above master problem it is
necessary to solve various subproblems (SNLP) for different values of the
Boolean variables Yik so as to produce at least one linear approximation
of each of the terms i ∈ Dk in the disjunctions k ∈ K. As shown by
Turkay and Grossmann [40], selecting the smallest number of subproblems
amounts to solving a set covering problem, which is of small size and easy
to solve. It is important to note that the number of subproblems solved
in the initialization is often small, since the combinatorial explosion that
one might expect is in general limited by the propositional logic. This
property frequently arises in process networks, since they are often modeled
by using two-term disjunctions where one of the terms is always linear (see
the remark below). Moreover, terms in the disjunctions that contain only linear
functions need not be considered for generating the subproblems. Also, it
should be noted that the master problem can be reformulated as an MILP
by using the big-M or Hull reformulation, or else solved directly with a
disjunctive branch and bound method.
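
The set covering flavor of this initialization can be illustrated with a generic greedy sketch (the candidate Boolean assignments and term labels below are hypothetical; a small set covering MILP could equally be solved exactly):

    def greedy_cover(terms_by_assignment):
        # Pick candidate Boolean assignments (NLP subproblems) until every
        # disjunct term (k, i) has been active, hence linearized, at least once.
        uncovered = set().union(*terms_by_assignment.values())
        chosen = []
        while uncovered:
            best = max(terms_by_assignment,
                       key=lambda a: len(terms_by_assignment[a] & uncovered))
            chosen.append(best)
            uncovered -= terms_by_assignment[best]
        return chosen

    # three two-term disjunctions; each assignment fixes one term per disjunction
    cands = {
        "A": {(1, 1), (2, 1), (3, 1)},
        "B": {(1, 2), (2, 2), (3, 2)},
        "C": {(1, 1), (2, 2), (3, 1)},
    }
    print(greedy_cover(cands))   # ['A', 'B'] covers all six terms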
Remark. In the context of process networks the disjunctions in the
(GDP) formulation typically arise for each unit i in the following form:
[ Yi ;  ri(x) ≤ 0 ;  ci = γi ]  ∨  [ ¬Yi ;  B^i x = 0 ;  ci = 0 ],   i ∈ I
in which the inequalities ri apply and a fixed cost γi is incurred if the unit
is selected (Yi ); otherwise (¬Yi ) there is no fixed cost and a subset of the
x variables is set to zero.
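
This on/off pattern can be written almost verbatim in a modeling system with native GDP support. The following sketch uses Pyomo's GDP extension (assuming Pyomo and the GLPK solver are installed; the bounds and cost data are made up, and this is not the authors' implementation). Note how either reformulation of Section 2.3.1 can be applied automatically:

    from pyomo.environ import (ConcreteModel, Var, Constraint, Objective,
                               NonNegativeReals, SolverFactory,
                               TransformationFactory, minimize, value)
    from pyomo.gdp import Disjunct, Disjunction

    m = ConcreteModel()
    m.x = Var(bounds=(0, 10))
    m.c = Var(within=NonNegativeReals)

    m.on = Disjunct()                        # Y_i: unit selected
    m.on.r = Constraint(expr=m.x >= 2)       # r_i(x) <= 0, here 2 - x <= 0
    m.on.cost = Constraint(expr=m.c == 5)    # c_i = gamma_i

    m.off = Disjunct()                       # not Y_i: unit absent
    m.off.zero = Constraint(expr=m.x == 0)   # B^i x = 0
    m.off.cost = Constraint(expr=m.c == 0)   # c_i = 0

    m.use = Disjunction(expr=[m.on, m.off])
    m.obj = Objective(expr=m.c - m.x, sense=minimize)

    TransformationFactory('gdp.bigm').apply_to(m)    # big-M (BM) reformulation
    # TransformationFactory('gdp.hull').apply_to(m)  # hull relaxation (HR) instead
    SolverFactory('glpk').solve(m)
    print(value(m.x), value(m.c))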
2.3.3. Example. We present here numerical results on an example
problem dealing with the synthesis of a process network that was originally
formulated by Duran and Grossmann [10] as an MINLP problem, and
later by Turkay and Grossmann [40] as a GDP problem. Figure 2 shows
the superstructure that involves the possible selection of 8 processes. The
Boolean variables Yj denote the existence or non-existence of processes 1-8.
The global optimal solution is Z∗ = 68.01, and consists of the selection of
processes 2, 4, 6, and 8.
The model in the form of the GDP problem involves disjunctions for
the selection of units, and propositional logic for the relationship of these
units. Each disjunction contains the equation for each unit (these relax as
convex inequalities). The model is as follows:


Fig. 2. Superstructure for process network.

Objective function:
min Z = c1 + c2 + c3 + c4 + c5 + c6 + c7 + c8 + x2 − 10x3 + x4
−15x5 − 40x9 + 15x10 + 15x14 + 80x17 − 65x18 + 25x19 − 60x20
+35x21 − 80x22 − 35x25 + 122
Material balances at mixing/splitting points:
x3 + x5 − x6 − x11 = 0
x13 − x19 − x21 = 0
x17 − x9 − x16 − x25 = 0
x11 − x12 − x15 = 0
x6 − x7 − x8 = 0
x23 − x20 − x22 = 0
x23 − x14 − x24 = 0
Specifications on the flows:
x10 − 0.8x17 ≤ 0
x10 − 0.4x17 ≥ 0
x12 − 5x14 ≤ 0
x12 − 2x14 ≥ 0
Disjunctions:
Unit 1:  [ Y1 ;  e^{x3} − 1 − x2 ≤ 0 ;  c1 = 5 ]  ∨  [ ¬Y1 ;  x2 = x3 = 0 ;  c1 = 0 ]

Unit 2:  [ Y2 ;  e^{x5/1.2} − 1 − x4 ≤ 0 ;  c2 = 8 ]  ∨  [ ¬Y2 ;  x4 = x5 = 0 ;  c2 = 0 ]

Unit 3:  [ Y3 ;  1.5x9 − x8 + x10 ≤ 0 ;  c3 = 6 ]  ∨  [ ¬Y3 ;  x8 = x9 = x10 = 0 ;  c3 = 0 ]

Unit 4:  [ Y4 ;  1.5(x12 + x14) − x13 = 0 ;  c4 = 10 ]  ∨  [ ¬Y4 ;  x12 = x13 = x14 = 0 ;  c4 = 0 ]

Unit 5:  [ Y5 ;  x15 − 2x16 = 0 ;  c5 = 6 ]  ∨  [ ¬Y5 ;  x15 = x16 = 0 ;  c5 = 0 ]

Unit 6:  [ Y6 ;  e^{x20/1.5} − 1 − x19 ≤ 0 ;  c6 = 7 ]  ∨  [ ¬Y6 ;  x19 = x20 = 0 ;  c6 = 0 ]

Unit 7:  [ Y7 ;  e^{x22} − 1 − x21 ≤ 0 ;  c7 = 4 ]  ∨  [ ¬Y7 ;  x21 = x22 = 0 ;  c7 = 0 ]

Unit 8:  [ Y8 ;  e^{x18} − 1 − x10 − x17 ≤ 0 ;  c8 = 5 ]  ∨  [ ¬Y8 ;  x10 = x17 = x18 = 0 ;  c8 = 0 ]

Propositional Logic:

Y1 ⇒ Y3 ∨ Y4 ∨ Y5 ; Y2 ⇒ Y3 ∨ Y4 ∨ Y5 ; Y3 ⇒ Y1 ∨ Y2 ; Y3 ⇒ Y8
Y4 ⇒ Y1 ∨ Y2 ; Y4 ⇒ Y6 ∨ Y7 ; Y5 ⇒ Y1 ∨ Y2 ; Y5 ⇒ Y8
Y6 ⇒ Y4 ; Y7 ⇒ Y4
Y8 ⇒ Y3 ∨ Y5 ∨ (¬Y3 ∧ ¬Y5 )

Specifications:

Y1 ∨ Y2 ; Y4 ∨ Y5 ; Y6 ∨ Y7

Variables:

xj, ci ≥ 0,  Yi ∈ {True, False},   i = 1, ..., 8,  j = 1, ..., 25.

Table 1 shows a comparison between the three solution approaches


presented before. Master and NLP represent the number of MILP master
problems and NLP subproblems solved to find the solution. It should be
noted that the Logic-Based Outer-Approximation method required solving
only three NLP subproblems to initialize the master problem (MLGDP),
which was reformulated as an MILP using the hull relaxation reformulation.
2.4. Special cases.
2.4.1. Linear generalized disjunctive programming. A particu-
lar class of GDP problems arises when the functions in the objective and
constraints are linear. The general formulation of a linear GDP, as described
by Raman and Grossmann [32], is as follows:


Table 1
Results using different GDP solution methods.

Outer-Approximation* Disjunctive B&B Logic-Based OA**


NLP 2 5 4
Master 2 0 1
* Solved with DICOPT through EMP (GAMS).
** Solved with LOGMIP (GAMS).


min Z = d^T x + Σ_{k∈K} ck
s.t.  Bx ≤ b
      ∨_{i∈Dk} [ Yik ;  Aik x ≤ aik ;  ck = γik ],   k ∈ K      (LGDP)
      Ω(Y) = True
      x^lo ≤ x ≤ x^up
      x ∈ Rn,  ck ∈ R1,  Yik ∈ {True, False},  i ∈ Dk, k ∈ K.

The big-M formulation reads:


 
min Z = d^T x + Σ_{k∈K} Σ_{i∈Dk} γik yik
s.t.  Bx ≤ b
      Aik x ≤ aik + M(1 − yik),   i ∈ Dk, k ∈ K      (LBM)
      Σ_{i∈Dk} yik = 1,   k ∈ K
      Ay ≥ a
      x ∈ Rn,  yik ∈ {0, 1},  i ∈ Dk, k ∈ K

while the HR formulation reads:


 
min Z = d^T x + Σ_{k∈K} Σ_{i∈Dk} γik yik
s.t.  x = Σ_{i∈Dk} νik,   k ∈ K
      Bx ≤ b
      Aik νik ≤ aik yik,   i ∈ Dk, k ∈ K      (LHR)
      0 ≤ νik ≤ yik U^ν,   i ∈ Dk, k ∈ K
      Σ_{i∈Dk} yik = 1,   k ∈ K
      Ay ≥ a
      x ∈ Rn,  νik ∈ Rn,  ck ∈ R1,  yik ∈ {0, 1},  i ∈ Dk, k ∈ K.

As a particular case of a GDP, LGDPs can be solved using MIP solvers


applied to the LBM or LHR reformulations. However, as described in the


work of Sawaya and Grossmann [35], two issues may arise. Firstly, the con-
tinuous relaxation of LBM is often weak, leading to a large number of nodes
enumerated in the branch and bound procedure. Secondly, the increase in
the size of LHR due to the disaggregated variables and new constraints may
not be compensated by the strengthening obtained in the relaxation, resulting
in high computational effort. In order to overcome these issues, Sawaya
and Grossmann [35] proposed a cutting plane methodology that consists
in the generation of cutting planes obtained from the LHR and used to
strengthen the relaxation of LBM. It is important to note, however, that in
the last few years, MIP solvers have improved significantly in the use of the
problem structure to reduce automatically the size of the formulation. As
a result the emphasis should be placed on the strength of the relaxations
rather than on the size of formulations. With this in mind, we present next
the latest developments in linear GDPs.
Sawaya [36] proved that any Linear Generalized Disjunctive Program
(LGDP) that involves Boolean and continuous variables can be equivalently
formulated as a Disjunctive Program (DP) that involves only continuous
variables. This means that we are able to exploit the wealth of theory
behind DP from Balas [2, 3] in order to solve LGDP more efficiently.
One of the properties of disjunctive sets is that they can be expressed
in many different equivalent forms. Among these forms, two extreme ones
are the Conjunctive Normal Form (CNF), which is expressed as the inter-
section of elementary sets (i.e. sets that are the union of half spaces), and
the Disjunctive Normal Form (DNF), which is expressed as the union of
polyhedra. One important result in Disjunctive Programming Theory, as
presented in the work of Balas [3], is that we can systematically generate
a set of equivalent DP formulations going from the CNF to the DNF by
using an operation called basic step (Theorem 2.1 [3]), which preserves
regularity. A basic step is defined as follows. Let F be the disjunctive set
in regular form (RF) given by F = ∩_{j∈T} Sj, where Sj = ∪_{i∈Qj} Pi
and each Pi, i ∈ Qj, is a polyhedron. For k, l ∈ T, k ≠ l, a basic step
consists in replacing Sk ∩ Sl with Skl = ∪_{i∈Qk, j∈Ql} (Pi ∩ Pj). Note
that a basic step involves intersecting a given pair of disjunctions Sk and Sl.
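
In code, a basic step on two disjunctions, each represented as a list of disjuncts, is simply a Cartesian product of pairwise intersections. The toy sketch below (constraints kept as strings purely for illustration) makes the multiplicative growth in the number of disjuncts explicit:

    from itertools import product

    def basic_step(Sk, Sl):
        # Replace S_k ∩ S_l by one disjunction whose disjuncts are all
        # pairwise intersections P_i ∩ P_j (here: concatenated constraint lists).
        return [Pi + Pj for Pi, Pj in product(Sk, Sl)]

    S1 = [["x1 = 0", "x2 = 0"], ["x1 = 1", "0 <= x2 <= 1"]]  # two-term disjunction
    S2 = [["0.5*x1 + x2 <= 1"]]                              # global (one-term) constraint
    print(basic_step(S1, S2))  # two disjuncts, each augmented with the global row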


Although the formulations obtained after the application of basic steps
on the disjunctive sets are equivalent, their continuous relaxations are not.
We denote the continuous relaxation of a disjunctive set F = ∩_{j∈T} Sj
in regular form, where each Sj is a union of polyhedra, as the hull-relaxation
of F (or h-rel F). Here h-rel F := ∩_{j∈T} clconv Sj, where clconv Sj de-
notes the closure of the convex hull of Sj. That is, if Sj = ∪_{i∈Qj} Pi with
Pi = {x ∈ Rn : Ai x ≤ bi}, then clconv Sj is given by x = Σ_{i∈Qj} νi,
λi ≥ 0, Σ_{i∈Qj} λi = 1, Ai νi ≤ bi λi, i ∈ Qj. Note that the convex hull of F is
in general different from its hull-relaxation.


As described by Balas (Theorem 4.3 in [3]), the application of a basic
step on a disjunctive set leads to a new disjunctive set whose relaxation is
at least as tight as, if not tighter than, that of the former. That is, for
i = 0, 1, ..., t, let Fi = ∩_{j∈Ti} Sj be a sequence of regular forms of a
disjunctive set such that: i) F0 is in CNF, with P0 = ∩_{j∈T0} Sj; ii) Ft is
in DNF; iii) for i = 1, ..., t, Fi is obtained from Fi−1 by a basic step. Then
h-rel F0 ⊇ h-rel F1 ⊇ ... ⊇ h-rel Ft. As shown by Sawaya [36], this
leads to a procedure for finding MIP reformulations that are often tighter
than the traditional LHR.
To illustrate, let us consider the following example:

min Z = x2
s.t.  0.5 x1 + x2 ≤ 1
      [ Y1 ;  x1 = 0 ;  x2 = 0 ]  ∨  [ ¬Y1 ;  x1 = 1 ;  0 ≤ x2 ≤ 1 ]      (LGDP1)
      0 ≤ x1, x2 ≤ 1
      x1, x2 ∈ R,  Y1 ∈ {True, False}.

An equivalent formulation can be obtained by the application of
a basic step between the global constraint (a one-term disjunction)
0.5x1 + x2 ≤ 1 and the two-term disjunction:

min Z = x2
s.t.  [ Y1 ;  x1 = 0 ;  x2 = 0 ;  0.5 x1 + x2 ≤ 1 ]  ∨  [ ¬Y1 ;  x1 = 1 ;  0 ≤ x2 ≤ 1 ;  0.5 x1 + x2 ≤ 1 ]      (LGDP2)
      0 ≤ x1, x2 ≤ 1
      x1, x2 ∈ R,  Y1 ∈ {True, False}.

As can be seen in Figure 3, the hull relaxation of the latter formulation
is tighter than that of the original, leading to a stronger lower bound.
Example. Strip Packing Problem (Hifi, 1998). We apply the
new approach to obtain stronger relaxations on a set of instances of the
strip packing problem. Given a set of small rectangles with width Hi and
length Li and a large rectangular strip of fixed width W and unknown
length L, the problem is to fit the small rectangles on the strip (without
rotation and overlap) so as to minimize the length L of the strip.


Fig. 3. a-Projected feasible region of LGDP1 , b-Projected feasible region of relaxed


LGDP1 , c-Projected feasible region of relaxed LGDP2.

The LGDP formulation of this problem is presented below [36]:

min Z = lt
s.t.  lt ≥ xi + Li,   ∀i ∈ N
      [ Yij^1 ;  xi + Li ≤ xj ]  ∨  [ Yij^2 ;  xj + Lj ≤ xi ]  ∨
      [ Yij^3 ;  yi − Hi ≥ yj ]  ∨  [ Yij^4 ;  yj − Hj ≥ yi ],   ∀i, j ∈ N, i < j
      xi ≤ UBi − Li,   ∀i ∈ N
      Hi ≤ yi ≤ W,   ∀i ∈ N
      lt, xi, yi ∈ R+^1,  Yij^1, Yij^2, Yij^3, Yij^4 ∈ {True, False},   ∀i, j ∈ N, i < j.
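
For concreteness, the sketch below states this model with Pyomo's GDP extension for a small made-up instance (the rectangle data are hypothetical, not Hifi's instances; Pyomo and GLPK are assumed to be installed):

    from pyomo.environ import (ConcreteModel, Var, Constraint, Objective,
                               SolverFactory, TransformationFactory,
                               minimize, value)
    from pyomo.gdp import Disjunction

    L = {1: 3, 2: 2, 3: 2, 4: 1}        # rectangle lengths L_i
    H = {1: 2, 2: 2, 3: 1, 4: 2}        # rectangle widths  H_i
    W, UB = 3, sum(L.values())          # strip width; crude upper bound on positions

    m = ConcreteModel()
    m.lt = Var(bounds=(0, UB))
    m.x = Var(L, bounds=(0, UB))
    m.y = Var(L)
    for i in L:
        m.x[i].setub(UB - L[i])
        m.y[i].setlb(H[i]); m.y[i].setub(W)
    m.length = Constraint(L, rule=lambda m, i: m.lt >= m.x[i] + L[i])

    def no_overlap(m, i, j):            # the four-term disjunction for each pair
        return [m.x[i] + L[i] <= m.x[j], m.x[j] + L[j] <= m.x[i],
                m.y[i] - H[i] >= m.y[j], m.y[j] - H[j] >= m.y[i]]
    pairs = [(i, j) for i in L for j in L if i < j]
    m.disj = Disjunction(pairs, rule=no_overlap)

    m.obj = Objective(expr=m.lt, sense=minimize)
    TransformationFactory('gdp.bigm').apply_to(m)
    SolverFactory('glpk').solve(m)
    print("strip length:", value(m.lt))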
In Table 2, the approach using basic steps to obtain stronger relax-
ations is compared with the original formulation.

Table 2
Comparison of sizes and lower bounds between original and new MIP reformulations.

Formulation without Basic Steps Formulation with Basic Steps


Instance Vars 0-1 Constr. LB Vars 0-1 Constr. LB
4 Rectang. 102 24 143 4 170 24 347 8
25 Rectang. 4940 1112 7526 9 5783 1112 8232 27
31 Rectang. 9716 2256 14911 10.64 11452 2256 15624 33

It is important to note that although the size of the reformulated


MIP is significantly increased when applying basic steps, the LB is greatly
improved.


2.4.2. Nonconvex generalized disjunctive programs. In gen-


eral, some of the functions f , rik or g might be nonconvex, giving rise to a
nonconvex GDP problem. The direct application of traditional algorithms
to solve the reformulated MINLP in this case, such as Generalized Benders
Decomposition (GBD) [6, 13] or Outer-Approximation (OA) [43] may fail
to find the global optimum since the solution of the NLP subproblem may
correspond to a local optimum and the cuts in the master problem may not
be valid. Therefore, specialized algorithms should be used in order to find
the global optimum [20, 39]. With this aim in mind, Lee and Grossmann
[23] proposed the following two-level branch and bound algorithm.
The first step in this approach is to introduce convex underestimators
of the nonconvex functions in the original nonconvex GDP. This leads to:
 
min Z = f̄(x) + Σ_{k∈K} ck
s.t.  ḡ(x) ≤ 0
      ∨_{i∈Dk} [ Yik ;  r̄ik(x) ≤ 0 ;  ck = γik ],   k ∈ K      (RGDPNC)
      Ω(Y) = True
      x^lo ≤ x ≤ x^up
      x ∈ Rn,  ck ∈ R1,  Yik ∈ {True, False},  i ∈ Dk, k ∈ K
where f¯, r̄ik , ḡ are convex and the following inequalities are satisfied
f¯(x) ≤ f (x), r̄ik (x) ≤ rik (x), ḡ(x) ≤ g(x). Note that suitable convex un-
derestimators for these functions can be found in Tawarmalani and Sahini-
dis [39].
The feasible region of (RGDPNC) can be relaxed by replacing each
disjunction by its convex hull. This relaxation yields the following convex
NLP:
 
min Z = f̄(x) + Σ_{k∈K} Σ_{i∈Dk} γik yik
s.t.  x = Σ_{i∈Dk} νik,   k ∈ K
      ḡ(x) ≤ 0
      yik r̄ik(νik/yik) ≤ 0,   i ∈ Dk, k ∈ K      (RGDPRNC)
      0 ≤ νik ≤ yik x^up,   i ∈ Dk, k ∈ K
      Σ_{i∈Dk} yik = 1,   k ∈ K
      Ay ≥ a
      x ∈ Rn,  νik ∈ Rn,  ck ∈ R1,  yik ∈ [0, 1],  i ∈ Dk, k ∈ K.
As proved in Lee and Grossmann [23], the solution of this NLP formulation
leads to a lower bound on the global optimum.


Fig. 4. Steps in global optimization algorithm.

The second step consists in using the above relaxation to predict lower
bounds within a spatial branch and bound framework. The main steps in
this implementation are described in Figure 4. The algorithm starts by
obtaining a local solution of the nonconvex GDP problem by solving the
MINLP reformulation with a local optimizer (e.g. DICOPT), which pro-
vides an upper bound on the solution (Z^U). Then, a bound contraction pro-
cedure is performed as described by Zamora and Grossmann [48]. Finally,
a partial branch and bound method is used on (RGDPNC) as described in
Lee and Grossmann [23], which consists in branching only on the Boolean
variables until a node with all the Boolean variables fixed is reached. At
this point a spatial branch and bound procedure is performed as described
in Quesada and Grossmann [30].
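
The overall bound-branch-prune loop of Figure 4 can be illustrated on a one-dimensional toy. The sketch below is a generic didactic example, not the Lee and Grossmann algorithm: the objective, its interval underestimator, and the tolerance are all made up.

    import heapq, math

    f = lambda x: math.sin(5 * x) + 0.1 * x * x     # nonconvex objective on [-10, 10]

    def sin_min(lo, hi):
        # exact minimum of sin(t) over [lo, hi]
        k = math.ceil((lo + math.pi / 2) / (2 * math.pi))
        if -math.pi / 2 + 2 * math.pi * k <= hi:
            return -1.0                             # interval contains a minimizer
        return min(math.sin(lo), math.sin(hi))

    def lower_bound(lo, hi):
        # valid underestimation of f on [lo, hi]: interval minimum of the sine
        # term plus the exact minimum of the convex quadratic over the interval
        xq = min(max(0.0, lo), hi)
        return sin_min(5 * lo, 5 * hi) + 0.1 * xq * xq

    best_x, best = 0.0, f(0.0)                      # incumbent = upper bound Z^U
    heap = [(lower_bound(-10.0, 10.0), -10.0, 10.0)]
    while heap:
        lb, lo, hi = heapq.heappop(heap)
        if lb >= best - 1e-6:                       # prune: cannot improve Z^U
            continue
        mid = 0.5 * (lo + hi)
        if f(mid) < best:
            best_x, best = mid, f(mid)              # update incumbent
        for a, b in ((lo, mid), (mid, hi)):         # branch: split the region
            if lower_bound(a, b) < best - 1e-6:
                heapq.heappush(heap, (lower_bound(a, b), a, b))
    print(best_x, best)

In the GDP setting the same loop operates first on the Boolean variables (bounding with the hull relaxation) and only then on the continuous regions, as described above.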
While the method proved to be effective in solving several problems,
a major question is whether one might be able to obtain stronger lower
bounds to improve the computational efficiency.
Recently, Ruiz and Grossmann [33] proposed an enhanced methodol-
ogy that builds on the work of Sawaya [36] to obtain stronger relaxations.
The basic idea consists in relaxing the nonconvex terms in the GDP us-
ing valid linear over- and underestimators prior to the application of
basic steps. This leads to a new linear GDP whose continuous relaxation
is tighter and valid for the original nonconvex GDP problem. The imple-
mentation of basic steps is not trivial. Therefore, Ruiz and Grossmann [33]
proposed a set of rules that aims at keeping the formulation small while
improving the relaxation. Among others, it was shown that intersecting
the global constraints with the disjunctions leads to a linear GDP with the
same number of disjuncts but a stronger relaxation.


Fig. 5. Two reactor network.

The following example illustrates the idea behind this approach to


obtain a stronger relaxation in a simple nonconvex GDP. Figure 5 shows
a small superstructure consisting of two reactors, each characterized by a
flow-conversion curve, a conversion range for which it can be designed, and
its corresponding cost as can be seen in Table 3. The problem consists in
choosing the reactor and conversion that maximize the profit from sales of
the product considering that there is a limit on the demand. The char-
acteristic curve of the reactors is defined as F = aX + b in the range of
conversions [X^lo, X^up], where F and X are the flow of raw material and
the conversion, respectively. The price of the product is given by θ = 2, the
cost of the raw material by γ = 0.2, and the limit on the demand
by d = 2.

Table 3
Reactor characteristics.

Reactor Curve Range Cost


a    b    X^lo    X^up    Cp
I -8 9 0.2 0.95 2.5
II -10 15 0.7 0.99 1.5

The bilinear GDP model, which maximizes the profit, can be stated
as follows:

max Z = θFX − γF − CP
s.t.  FX ≤ d
      [ Y11 ;  F = α1 X + β1 ;  X1^lo ≤ X ≤ X1^up ;  CP = Cp1 ]  ∨
      [ Y21 ;  F = α2 X + β2 ;  X2^lo ≤ X ≤ X2^up ;  CP = Cp2 ]      (GDP1NC)
      Y11 ∨ Y21 = True
      X, F, CP ∈ R1,  F^lo ≤ F ≤ F^up,  Y11, Y21 ∈ {True, False}

The associated linear GDP relaxation is obtained by replacing the


bilinear term, FX, using the McCormick convex envelopes:


max Z = θP − γF − CP
s.t.  P ≤ d
      P ≤ F X^lo + F^up X − F^up X^lo
      P ≤ F X^up + F^lo X − F^lo X^up      (GDP1RLP0)
      P ≥ F X^lo + F^lo X − F^lo X^lo
      P ≥ F X^up + F^up X − F^up X^up
      [ Y11 ;  F = α1 X + β1 ;  X1^lo ≤ X ≤ X1^up ;  CP = Cp1 ]  ∨
      [ Y21 ;  F = α2 X + β2 ;  X2^lo ≤ X ≤ X2^up ;  CP = Cp2 ]
      Y11 ∨ Y21 = True
      X, F, CP ∈ R1,  F^lo ≤ F ≤ F^up,  Y11, Y21 ∈ {True, False}.
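
The behavior of these envelopes is easy to probe numerically. The sketch below (with hypothetical bound values) evaluates the interval they imply for P = FX at a given point, showing that the envelope is loose in the interior of the box and exact at its vertices:

    def mccormick(F, X, Flo, Fup, Xlo, Xup):
        # interval for the bilinear product P = F*X implied by the four
        # McCormick inequalities above, given box bounds on F and X
        P_upper = min(F * Xlo + Fup * X - Fup * Xlo,
                      F * Xup + Flo * X - Flo * Xup)
        P_lower = max(F * Xlo + Flo * X - Flo * Xlo,
                      F * Xup + Fup * X - Fup * Xup)
        return P_lower, P_upper

    print(mccormick(F=5.0, X=0.5, Flo=0.0, Fup=10.0, Xlo=0.2, Xup=0.95))
    # (1.0, 4.0): the true product 2.5 lies strictly inside -> relaxation gap
    print(mccormick(F=10.0, X=0.95, Flo=0.0, Fup=10.0, Xlo=0.2, Xup=0.95))
    # (9.5, 9.5): exact at a vertex of the (F, X) box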

Intersecting the improper disjunctions given by the inequalities of the


relaxed bilinear term with the only proper disjunction (i.e. by applying
five basic steps), we obtain the following GDP formulation,

max Z = θP − γF − CP      (GDP1RLP1)
s.t.
[ Y11 ;  P ≤ d ;
  P ≤ F X^up + F^lo X − F^lo X^up ;  P ≤ F X^lo + F^up X − F^up X^lo ;
  P ≥ F X^lo + F^lo X − F^lo X^lo ;  P ≥ F X^up + F^up X − F^up X^up ;
  F = α1 X + β1 ;  X1^lo ≤ X ≤ X1^up ;  CP = Cp1 ]
∨
[ Y21 ;  P ≤ d ;
  P ≤ F X^up + F^lo X − F^lo X^up ;  P ≤ F X^lo + F^up X − F^up X^lo ;
  P ≥ F X^lo + F^lo X − F^lo X^lo ;  P ≥ F X^up + F^up X − F^up X^up ;
  F = α2 X + β2 ;  X2^lo ≤ X ≤ X2^up ;  CP = Cp2 ]
Y11 ∨ Y21 = True
X, F, CP ∈ R1,  F^lo ≤ F ≤ F^up,  Y11, Y21 ∈ {True, False}.

Figure 6 shows the actual feasible region of (GDP 1N C ) and the projec-
tion on the F − X space of the hull relaxations of (GDP 1RLP 0 ) and
(GDP 1RLP 1 ), where clearly the feasible space in (GDP 1RLP 1 ) is tighter
than in (GDP 1RLP 0 ). Notice that in this case the choice of reactor II is
infeasible.
Example. Water treatment network . This example corresponds
to a synthesis problem of a distributed wastewater multicomponent network
(See Figure 7), which is taken from Galan and Grossmann [12]. Given a set
of process liquid streams with known composition, a set of technologies for
the removal of pollutants, and a set of mixers and splitters, the objective
is to find the interconnections of the technologies and their flowrates to
meet the specified discharge composition of pollutant at minimum total


Fig. 6. a-Projected feasible region of GDP 1NC , b-Projected feasible region of


relaxed GDP 1RLP 0 , c-Projected feasible region of relaxed GDP 1RLP 1 .

Fig. 7. Water treatment superstructure.

cost. Discrete choices involve deciding what equipment to use for each
treatment unit.

Lee and Grossmann [23] formulated this problem as the following noncon-
vex GDP problem:

min Z = Σ_{k∈PU} CPk
s.t.  fk^j = Σ_{i∈Mk} fi^j,   ∀j, k ∈ MU
      Σ_{i∈Sk} fi^j = fk^j,   ∀j, k ∈ SU
      Σ_{i∈Sk} ζi^k = 1,   k ∈ SU
      fi^j = ζi^k fk^j,   ∀j, i ∈ Sk, k ∈ SU
      ∨_{h∈Dk} [ YPk^h ;  fi^j = βk^{jh} fi'^j, i ∈ OPUk, i' ∈ IPUk, ∀j ;
                 Fk = Σj fi^j, i ∈ OPUk ;  CPk = ∂ik Fk ],   k ∈ PU
      0 ≤ ζi^k ≤ 1   ∀i, k
      0 ≤ fi^j, fk^j   ∀i, j, k
      0 ≤ CPk   ∀k
      YPk^h ∈ {True, False}   ∀h ∈ Dk, ∀k ∈ PU.
The problem involves 9 discrete variables and 114 continuous variables
with 36 bilinear terms.
As can be seen in Table 4, an improved lower bound was obtained
(i.e., 431.9 vs. 400.66), which is a direct indication of the reduction of the
relaxed feasible region. The column “Best Lower Bound” can be used
as an indicator of the performance of the proposed set of rules for applying
basic steps. Note that the lower bound obtained with this new approach is
the same as the one obtained by solving the relaxed DNF, which is quite
remarkable. A further indication of tightening is shown in Table 5, where
numerical results of the branch and bound algorithm described above are
presented. As can be seen, the number of nodes that the spatial
branch and bound algorithm requires before finding the global solution is
significantly reduced.

Table 4
Comparison of lower bounds obtained using different relaxations.

Global Lower Bound Lower Bound Best


Optimum (Lee and Grossmann) (Ruiz and Grossmann) Lower Bound
1214.87 400.66 431.9 431.9

Table 5
Performance using different relaxations within a spatial B&B.

Lee and Grossmann Ruiz and Grossmann


Nodes Bounding % Time(sec) Nodes Bounding % Time(sec)
408 8 176 130 16 115

Table 6 shows the size of the LP relaxation obtained in each case. Note
that although the proposed methodology leads to a significant increase in
the size of the formulation, this does not translate proportionally into the
solution time of the resulting LP. This behavior can be understood by
considering that, in general, the LP presolver will take advantage of the
particular structures of these LPs.
3. Conclusions. In this paper we have provided an overview of the
Generalized Disjunctive Programming Framework. We presented different
solution strategies that exploit the underlying logic structure of the formu-
lations with particular focus on how to develop formulations that lead to


Table 6
Size of the LP relaxation for example problems.

Lee and Grossmann Ruiz and Grossmann


Constraints Variables Constraints Variables
544 346 3424 1210

stronger relaxations. In particular, for the case of linear GDP we showed


how a hierarchy of relaxations can be developed by performing sequential
intersection of disjunctions. Finally, for the case when the GDP prob-
lem involves nonconvex functions, we proposed a scheme for tightening the
lower bounds for obtaining the global optimum using a combined disjunc-
tive and spatial branch and bound search. We illustrated the application
of the theoretical concepts and algorithms on several engineering and OR
problems.

Acknowledgments. The authors would like to acknowledge financial
support from the National Science Foundation under Grant OCI-0750826.

REFERENCES

[1] Abhishek K., Leyffer S., and Linderoth J.T., FilMINT: An Outer-
Approximation-Based Solver for Nonlinear Mixed Integer Programs,
ANL/MCS-P1374-0906, Argonne National Laboratory, 2006.
[2] Balas E., Disjunctive Programming, Annals of Discrete Mathematics, 5, 3–51, 1979.
[3] Balas E., Disjunctive Programming and a hierarchy of relaxations for discrete
optimization problems, SIAM J. Alg. Disc. Meth., 6, 466–486, 1985.
[4] Beaumont N., An Algorithm for Disjunctive Programs, European Journal of Op-
erations Research, 48, 362–371, 1991.
[5] Belotti P., Lee J., Liberti L., Margot F., and Wächter A., Branching and
bounds tightening techniques for non-convex MINLP, Optimization Methods
and Software, 24:4, 597–634, 2009.
[6] Benders J.F., Partitioning procedures for solving mixed-variables programming
problems, Numer.Math., 4, 238–252, 1962.
[7] Borchers B. and Mitchell J.E., An Improved Branch and Bound Algorithm for
Mixed Integer Nonlinear Programming, Computers and Operations Research,
21, 359–367, 1994.
[8] Bonami P., Biegler L.T., Conn A.R., Cornuejols G., Grossmann I.E., Laird
C.D., Lee J. , Lodi A. , Margot F., Sawaya N., and Wächter A. , An
algorithmic framework for convex mixed integer nonlinear programs, Discrete
Optimization, 5, 186–204, 2008.
[9] Brooke A., Kendrick D., Meeraus A., and Raman R., GAMS, a User’s Guide,
GAMS Development Corporation, Washington, 1998.
[10] Duran M.A. and Grossmann I.E., An Outer-Approximation Algorithm for a
Class of Mixed-integer Nonlinear Programs, Math Programming, 36, p. 307,
1986.
[11] Fletcher R. and Leyffer S., Solving Mixed Integer Nonlinear Programs by
Outer-Approximation, Math Programming, 66, p. 327, 1994.
[12] Galan B. and Grossmann I.E., Optimal Design of Distributed Wastewater Treat-
ment Networks, Ind. Eng. Chem. Res., 37, 4036–4048, 1998.


[13] Geoffrion A.M., Generalized Benders decomposition, JOTA, 10, 237–260, 1972.
[14] Grossmann I.E., Review of Non-Linear Mixed Integer and Disjunctive Program-
ming Techiques for Process Systems Engineering, Optimization and Engineer-
ing, 3, 227–252, 2002.
[15] Grossmann I.E., Caballero J.A., and Yeomans H., Advances in Mathematical
Programming for Automated Design, Integration and Operation of Chemical
Processes, Korean J. Chem. Eng., 16, 407–426, 1999.
[16] Grossmann I.E. and Lee S., Generalized Convex Disjunctive Programming: Non-
linear Convex Hull Relaxation, Computational Optimization and Applica-
tions, 26, 83–100, 2003.
[17] Gupta O.K. and Ravindran V., Branch and Bound Experiments in Convex Non-
linear Integer Programming, Management Science, 31:12, 1533–1546, 1985.
[18] Hooker J.N. and Osorio M.A., Mixed logical-linear programming, Discrete Ap-
plied Mathematics, 96–97, 395–442, 1999.
[19] Hooker J.N., Logic-Based Methods for Optimization: Combining Optimization
and Constraint Satisfaction, Wiley, 2000.
[20] Horst R. and Tuy H., Global Optimization deterministic approaches, 3rd Ed,
Springer-Verlag, 1996.
[21] Kallrath J., Mixed Integer Optimization in the Chemical Process Industry: Ex-
perience, Potential and Future, Trans. I .Chem E., 78, 809–822, 2000.
[22] Lee S. and Grossmann I.E., New Algorithms for Nonlinear Generalized Dis-
junctive Programming, Computers and Chemical Engineering, 24, 2125–2141,
2000.
[23] Lee S. and Grossmann I.E., Global optimization of nonlinear generalized disjunc-
tive programming with bilinear inequality constraints: application to process
networks, Computers and Chemical Engineering, 27, 1557–1575, 2003.
[24] Leyffer S., Integrating SQP and Branch and Bound for Mixed Integer Nonlinear
Programming, Computational Optimization and Applications, 18, 295–309,
2001.
[25] Liberti L., Mladenovic M., and Nannicini G., A good recipe for solv-
ing MINLPs, Hybridizing metaheuristics and mathematical programming,
Springer, 10, 2009.
[26] Lindo Systems Inc., LINDOGlobal Solver.
[27] Mendez C.A., Cerda J., Grossmann I.E., Harjunkoski I., and Fahl M.,
State-of-the-art Review of Optimization Methods for Short-Term Scheduling
of Batch Processes, Comput. Chem. Eng., 30, p. 913, 2006.
[28] Nemhauser G.L. and Wolsey L.A., Integer and Combinatorial Optimization,
Wiley-Interscience, 1988.
[29] Quesada I. and Grossmann I.E., An LP/NLP Based Branch and Bound Algo-
rithm for Convex MINLP Optimization Problems, Computers and Chemical
Engineering, 16, 937–947, 1992.
[30] Quesada I. and Grossmann I.E., Global optimization of bilinear process networks
with multicomponent flows, Computers and Chemical Engineering, 19:12,
1219–1242, 1995.
[31] Raman R. and Grossmann I.E., Relation Between MILP Modelling and Logical
Inference for Chemical Process Synthesis, Computers and Chemical Engineer-
ing, 15, 73, 1991.
[32] Raman R. and Grossmann I.E., Modelling and Computational Techniques for
Logic-Based Integer Programming, Computers and Chemical Engineering, 18,
p. 563, 1994.
[33] Ruiz J.P. and Grossmann I.E., Strengthening the lower bounds for bilinear and
concave GDP problems, Computers and Chemical Engineering, 34:3, 914–930,
2010.
[34] Sahinidis N.V., BARON: A General Purpose Global Optimization Software Pack-
age, Journal of Global Optimization, 8:2, 201–205, 1996.

www.it-ebooks.info
FORMULATION AND ALGORITHMS FOR MINLP OPTIMIZATION 115

[35] Sawaya N. and Grossmann I.E., A cutting plane method for solving linear gener-
alized disjunctive programming problems, Computers and Chemical Engineer-
ing, 20:9, 1891–1913, 2005.
[36] Sawaya N., Thesis: Reformulations, relaxations and cutting planes for generalized
disjunctive programming, Carnegie Mellon University, 2006.
[37] Schweiger C.A. and Floudas C.A., Process Synthesis, Design and Control: A
Mixed Integer Optimal Control Framework, Proceedings of DYCOPS-5 on
Dynamics and Control of Process Systems, 189–194, 1998.
[38] Stubbs R. and Mehrotra S. , A Branch-and-Cut Method for 0–1 Mixed Convex
Programming, Math Programming, 86:3, 515–532, 1999.
[39] Tawarmalani M. and Sahinidis N., Convexification and Global Optimization
in Continuous and Mixed-Integer Nonlinear Programming, Kluwer Academic
Publishers, 2002.
[40] Turkay M. and Grossmann I.E., A Logic-Based Outer-Approximation Algorithm
for MINLP Optimization of Process Flowsheets, Computers and Chemical
Enginering, 20, 959–978, 1996.
[41] Vecchietti A., Lee S., and Grossmann, I.E., Modeling of discrete/continuous
optimization problems: characterization and formulation of disjunctions and
their relaxations, Computers and Chemical Engineering, 27,433–448, 2003.
[42] Vecchietti A. and Grossmann I.E., LOGMIP: A Discrete Continuous Nonlinear
Optimizer, Computers and Chemical Engineering, 23, 555–565, 2003.
[43] Viswanathan and Grossmann I.E., A combined penalty function and outer-
approximation method for MINLP optimization, Computers and Chemical
Engineering, 14, 769–782, 1990.
[44] Westerlund T. and Pettersson F., A Cutting Plane Method for Solving Con-
vex MINLP Problems, Computers and Chemical Engineering, 19, S131–S136,
1995.
[45] Westerlund T. and Pörn R., Solving Pseudo-Convex Mixed Integer Optimiza-
tion Problems by Cutting Plane Techniques, Optimization and Engineering,
3, 253–280, 2002.
[46] Williams H.P., Mathematical Building in Mathematical Programming, John Wi-
ley, 1985.
[47] Yuan, X., Zhang S., Piboleau L., and Domenech S., Une Methode
d’optimisation Nonlineare en Variables Mixtes pour la Conception de Pro-
cedes, RAIRO, 22, 331, 1988.
[48] Zamora J.M. and Grossmann I.E., A branch and bound algorithm for problems
with concave univariate , bilinear and linear fractional terms, 14:3, 217–249,
1999.

www.it-ebooks.info
www.it-ebooks.info
DISJUNCTIVE CUTS FOR NONCONVEX MINLP
PIETRO BELOTTI∗

Abstract. Mixed Integer Nonlinear Programming (MINLP) problems present two main challenges: the integrality of a subset of variables and nonconvex (nonlinear) objective function and constraints. Many exact solvers for MINLP are branch-and-bound
jective function and constraints. Many exact solvers for MINLP are branch-and-bound
algorithms that compute a lower bound on the optimal solution using a linear program-
ming relaxation of the original problem.
In order to solve these problems to optimality, disjunctions can be used to partition
the solution set or to obtain strong lower bounds on the optimal solution of the problem.
In the MINLP context, the use of disjunctions for branching has been subject to intense
research, while the practical utility of disjunctions as a means of generating valid linear
inequalities has attracted some attention only recently.
We describe an application to MINLP of a well-known separation method for dis-
junctive cuts that has shown to be very effective in Mixed Integer Linear Programming
(MILP). As the experimental results show, this application obtains encouraging results
in the MINLP case even when a simple separation method is used.

Key words. Disjunctions, MINLP, Couenne, Disjunctive cuts.

AMS(MOS) subject classifications. 90C57.

1. Motivation: nonconvex MINLP. Mixed integer nonlinear programming is a powerful modeling tool for very generally defined problems
in optimization [32]. A MINLP problem is defined as follows:

(P0)   min f(x)
       s.t. gj(x) ≤ 0        ∀j = 1, 2, . . . , m
            ℓi ≤ xi ≤ ui     ∀i = 1, 2, . . . , n
            x ∈ Zr × Rn−r,

where f : Rn → R and gj : Rn → R, for all j = 1, 2, . . . , m, are in general multivariate, nonconvex functions; n is the number of variables, r is the number of integer variables, and x is the n-vector of variables, whose i-th component is denoted by xi; and ℓi and ui are lower and upper bounds on variable xi. We assume that all these bounds are finite, as otherwise the problem is undecidable [33].
We assume that f and all gj ’s are factorable, i.e., they can be computed
in a finite number of simple steps, starting with model variables and real
constants, using unary (e.g., log, exp, sin, cos, tan) and binary operators
(e.g., +, −, ∗, /, ^). Note that this framework, although very general,
excludes problems whose constraints or objective function are, for example,
black-box functions or indefinite integrals such as the error function.
There are numerous applications of P0 in Chemical Engineering [14, 26, 35] and Computational Biology [39, 40, 50], among others.

∗ Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 ([email protected]).

Special subclasses of MINLP, such as Mixed Integer Linear Programming (MILP), where f is linear and all gj ’s are affine, and convex MINLPs (i.e., MINLPs
whose continuous relaxation is a convex nonlinear program), admit special
(and more efficient) solvers, therefore the only reason to use a general-
purpose nonconvex MINLP solver is that the problem cannot be classified
as any of those special cases.
Effective algorithms for nonconvex optimization aim at finding a relax-
ation and obtaining a good lower bound on the optimal solution value. In
the MILP case, a lower bound can be found by solving the LP relaxation
obtained by relaxing integrality on the variables. In the convex MINLP
case, relaxing integrality yields a convex nonlinear problem and hence a
lower bound. In the general case, finding a relaxation and a lower bound
on the global optimum for P0 can be hard, since relaxing integrality yields
a nonconvex NLP.

Disjunctions. When the relaxation does not obtain a strong lower bound, an approach to strengthening the relaxation is to use logical disjunctions that are satisfied by all solutions of P0. In their most general
form, disjunctions are logical operators that return true whenever one or
more of their operands are true [5]. In this work, we consider disjunctions
involving linear inequalities, although more general disjunctions are possi-
ble. Let us denote as S the feasible set of P0, i.e., S = {x ∈ Zr × Rn−r : ℓi ≤ xi ≤ ui ∀i = 1, 2, . . . , n, gj(x) ≤ 0 ∀j = 1, 2, . . . , m}. A disjunction is an
operator on a set of systems of inequalities over the set S, written as

⋁h∈Q {x ∈ Rn : Ah x ≤ bh , x ∈ S},                        (1.1)

where Ah ∈ Qmh×n, bh ∈ Qmh, mh is the number of inequalities in the h-th system, and Q is an index set. We say that the disjunction is true if
there exist x ∈ S and h ∈ Q such that Ah x ≤ bh .
The general definition (1.1) comprises several classes of disjunctions.
Two important ones are the integer disjunction xi ≤ α ∨ xi ≥ α + 1, which
is valid for any α ∈ Z if xi is an integer variable; and the spatial disjunction
xi ≤ β ∨ xi ≥ β, with β ∈ R and xi a continuous variable.
If disjunctions are used in the branching step or to generate disjunctive
cuts, two main problems need to be addressed: (i) how to describe the set of
disjunctions and (ii) what disjunction should be used among the (possibly
infinite) set of valid disjunctions.

Branch-and-bound algorithms. The branch operation partitions the feasible set of an optimization problem by imposing a disjunction. Each
term of a disjunction (1.1) is associated with a subproblem Ph of the form

(Ph)   min f(x)
       s.t. gj(x) ≤ 0        ∀j = 1, 2, . . . , m
            Ah x ≤ bh                                      (1.2)
            ℓi ≤ xi ≤ ui     ∀i = 1, 2, . . . , n
            x ∈ Zr × Rn−r,
in which the constraints (1.2) are those imposed by branching. We denote
the feasible region of Ph by Sh .
A branch-and-bound (BB) algorithm applies this branching method
recursively. Any MINLP with a bounded feasible region can be solved
by a BB in a finite number of steps [32]. Note that an optimal solution
of a subproblem Ph is not necessarily optimal for P0 , since the feasible
regions of individual subproblems are subsets of the original feasible region
and may not include an optimal solution of P0 . Nevertheless, solutions
obtained in this way are still feasible and hence yield valid upper bounds
on the optimal value of f(x). An advantage of applying a branching method is that it makes it possible to discard any subproblem whose associated lower bound exceeds the best upper bound found so far.
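To make the bounding logic concrete, here is a minimal sketch of such a branch-and-bound loop in Python. The oracles solve_relaxation and select_disjunction are hypothetical stand-ins, not part of any solver named in this chapter: the first returns a lower bound and a relaxation optimum for a subproblem; the second returns the two subproblems induced by a violated disjunction, or None when the relaxation optimum is feasible for the subproblem.

import heapq

def branch_and_bound(root, solve_relaxation, select_disjunction, eps=1e-6):
    """Best-bound BB: prune by bound, branch on violated disjunctions."""
    incumbent, upper = None, float("inf")
    heap, tick = [(float("-inf"), 0, root)], 0
    while heap:
        lower, _, sub = heapq.heappop(heap)
        if lower >= upper - eps:           # bound: node is dominated, discard
            continue
        bound, x = solve_relaxation(sub)   # lower bound for this subtree
        if x is None or bound >= upper - eps:
            continue                       # relaxation infeasible or dominated
        children = select_disjunction(sub, x)
        if children is None:               # x feasible for sub: new incumbent?
            if bound < upper:
                incumbent, upper = x, bound
            continue
        for child in children:             # branch: impose each disjunct
            tick += 1
            heapq.heappush(heap, (bound, tick, child))
    return incumbent, upper

Feasible subproblem solutions feed the upper bound, and any node whose lower bound exceeds it is discarded, exactly as in the paragraph above.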
Disjunctive cuts. Disjunctions can also be imposed indirectly by de-
riving one or more implied constraints from them [5]. A valid inequality
with respect to S consists of a pair (α, α0) ∈ Qn+1 such that αᵀx ≤ α0 for all x ∈ S. An inequality that is valid for S and violated by at least one
member of the feasible region of the relaxation of P0 is called a cut.
A disjunctive cut with respect to a disjunction of the form (1.1) is
an inequality that is valid for Sh for each h ∈ Q, or equivalently, for the
convex hull of the union of these sets. Amending disjunctive cuts to the
feasible region of a relaxation of P0 is often an effective way of tightening
a relaxation of S.
Disjunctive cuts have long been studied in the MILP context [7, 9, 49].
They have also been generated from disjunctions arising from MINLPs
with a certain structure. For MINLP with binary variables and whose
continuous relaxation is convex, Stubbs and Mehrotra [60] generalized the
procedure proposed by Balas, Ceria and Cornuéjols [7] and described a
separation procedure based on a convex optimization problem.
A specialized procedure has been successfully applied to Mixed Integer
Conic Programming (MICP) problems, which are MILPs amended by a
set of constraints whose feasible region is a cone in Rn . The second order
cone and the cone of symmetric semidefinite matrices are among the most
important classes of conic constraints in this class. Çezik and Iyengar [19]
and lately Drewes [25] proposed, for MICP problems where all disjunctions
are generated from binary variables, an application of the procedure to the
conic case, where disjunctive inequalities are obtained by solving a contin-
uous conic optimization problem. Analogously, Frangioni and Gentile [27]

describe a class of disjunctive cuts that can be used in convex MINLPs
where binary variables are used to model semi-continuous variables, i.e.,
variables that take values in the set {0} ∪ [l, u] with 0 ∉ [l, u].
Another interesting class of problems is that of Mathematical Pro-
grams with Complementarity Constraints (MPCC) [57], i.e., MINLPs that contain nonconvex constraints xᵀy = 0, with x ∈ Rk+, y ∈ Rk+. These can
be more easily stated as xi yi = 0 for all i = 1, 2 . . . , k, and give rise to
simple disjunctions xi = 0 ∨ yi = 0. Júdice et al. [34] study an MPCC
where the only nonlinear constraints are the complementarity constraints,
and therefore relaxing the latter yields an LP. Disjunctive cuts are gener-
ated from solutions of the LP that violate a complementarity constraint
(i.e., solutions where xi > 0 and yi > 0 for some i ∈ {1, 2 . . . , k}) through
the observation that both variables are basic and by applying standard
disjunctive arguments to the corresponding tableau rows.

Scope of the paper and outline. While the application to MILP of cuts derived from disjunctions has been studied for decades and efficient
of cuts derived from disjunctions has been studied for decades and efficient
implementations are available, their practical application to MINLP has
recently attracted a lot of attention: a notable example is the work by
Saxena et al. [55, 56], where disjunctive cuts are used to solve MINLPs
with quadratic nonconvex objective and constraints.
In particular, [55] point out that “. . . even though the results presented
in this paper focussed on MIQCPs, they are equally applicable to a much
wider class of nonconvex MINLPs. All we need is an automatic system
that can take a nonconvex MINLP and extract a corresponding MIQCP
relaxation. Development of software such as Couenne [11, 12] is a step in
this direction.”
Our contribution is an attempt to follow that indication: we present
an application of disjunctive programming to the context of nonconvex
MINLP problems. We provide an Open Source implementation of a pro-
cedure that generates disjunctive cuts for MINLP problems, and present a
set of computational results that substantiate the practical utility of this
application.
Several exact solution methods for nonconvex MINLP rely, in order to
obtain valid lower bounds, on reformulation and linearization techniques
[46, 59, 61], which we briefly introduce in the next section as their definition
is essential to describe the type of disjunctions we use in this context.
Section 3 describes the general disjunctions arising from reformulating an
MINLP, and their use in separating disjunctive cuts is shown in Section 4.
A simple separation procedure based on the cut generating LP (CGLP)
is discussed in Section 5. This procedure has been added to couenne, a
general-purpose, Open Source solver for MINLP [11], and tested on a set of
publicly available nonconvex MINLP instances. These tests are presented
in Section 6.

2. Lower bounds of an MINLP. Several MINLP solvers are BB algorithms whose lower bounding technique uses a Linear Programming
algorithms whose lower bounding technique uses a Linear Programming
(LP) relaxation constructed in two steps:
• reformulation: P0 is transformed into an equivalent MINLP with
a set of new variables, a linear objective function, and a set of
nonlinear equality constraints;
• linearization: each nonlinear equality constraint is relaxed by re-
placing it with a set of linear inequalities.
Here we provide a brief description whose purpose is to introduce the
derivation of disjunctions in Section 3.
Reformulation. If f and all gi ’s are factorable, they can be rep-
resented by expression trees, i.e., n-ary trees where every node is either
a constant, a variable, or an n-ary operator whose arguments are rep-
resented as children of the node, and which are in turn expression trees
[21, Chapter 3]. While all leaves of an expression tree are variables
or constants, each non-leaf node, which is an operator of a finite set
O = {+, ∗, ˆ , /, sin, cos, exp, log}, is associated with a new variable, de-
noted as auxiliary variable. As a result, we obtain a reformulation of P0
[46, 59, 61]:

(P)   min xn+q
      s.t. xk = ϑk(x)       k = n + 1, n + 2, . . . , n + q
           ℓi ≤ xi ≤ ui     i = 1, 2, . . . , n + q
           x ∈ X

where X = Zr × Rn−r × Zs × Rq−s, that is, q auxiliary variables are introduced, s of which are integral and q − s continuous1. The auxiliary xn+q is associated with the operator that defines the objective function of P0. Each auxiliary xk is associated with a function ϑk(x) from O and depends, in general, on one or more of the variables x1, x2, . . . , xk−1. Let us define the bounding box [ℓ, u] of P as the set {x ∈ Rn+q : ℓi ≤ xi ≤ ui, i = 1, 2, . . . , n + q}.
In the reformulation, all nonlinear constraints are of the form xk =
ϑk (x) for all k = n + 1, n + 2 . . . , n + q, where ϑk is an operator from O. It
is then possible to obtain an LP relaxation of P, or linearization for short,
by applying operator-specific algorithms to each such constraint [59].
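As an illustration of this reformulation, the following Python sketch introduces one auxiliary variable per operator node of an expression tree; the nested-tuple encoding and the variable naming are assumptions made for this example, not the data structures of an actual MINLP solver.

def reformulate(expr, aux, constraints):
    """Return the variable standing for expr, appending auxiliaries."""
    if isinstance(expr, (int, float, str)):
        return expr                              # leaf: constant or variable
    op, *args = expr                             # operator node from O
    arg_vars = [reformulate(a, aux, constraints) for a in args]
    aux.append("w%d" % (len(aux) + 1))           # fresh auxiliary variable
    constraints.append((aux[-1], op, arg_vars))  # w_k = op(arguments)
    return aux[-1]

aux, cons = [], []
top = reformulate(('+', ('^', 'x1', 2), ('exp', ('*', 'x1', 'x2'))), aux, cons)
# cons now holds w1 = x1^2, w2 = x1*x2, w3 = exp(w2), w4 = w1 + w3, and the
# reformulated problem reads: min w4 subject to these defining equations.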
Linearization. Consider a nonlinear constraint xk = ϑk (x) in the
reformulation P. The set Θk = {x ∈ X ∩ [ℓ, u] : xk = ϑk(x)} is, in general, nonconvex. The nonconvex set of feasible solutions of the reformulation is therefore Θ = Θn+1 ∩ Θn+2 ∩ · · · ∩ Θn+q.
k=n+1 Θk .
While finding a convex relaxation of Θ might prove very difficult, ob-
taining a convex relaxation of each Θk , for all k = n + 1, n + 2 . . . , n + q, is
1 The integrality and the bounds on xk, for k = n + 1, n + 2, . . . , n + q, are inferred from the associated function ϑk and the bounds and the integrality of its arguments.

in general easier and has been extensively studied [46, 59, 61, 41]. Several exact MINLP solvers [54, 12, 42] seek an LP relaxation of Θk defined by the set LPk = {x ∈ [ℓ, u] : Bk x ≤ ck}, with Bk ∈ Qmk×(n+q), ck ∈ Qmk, and mk the number of inequalities of the relaxation. In general, mk depends on ϑk and, for certain constraints, a larger mk yields a better lower bound. However, in order to keep the size of the linearization limited, a relatively small value is used. For the linearization procedure we have used, mk ≤ 4 for all k. Let us denote LP = LPn+1 ∩ LPn+2 ∩ · · · ∩ LPn+q. Because LPk ⊇ Θk for all k = n + 1, n + 2, . . . , n + q, we have LP ⊇ Θ. Hence, minimizing xn+q over the convex set LP gives a lower bound on the optimal solution value of P0.
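As a concrete instance of such an operator-specific linearization, the sketch below generates the four McCormick inequalities [46] for a bilinear constraint w = x1 x2 over a box, a case with mk = 4; the coefficient layout is an assumption made for this example.

def mccormick_bilinear(l1, u1, l2, u2):
    """Cuts (a1, a2, aw, rhs) meaning a1*x1 + a2*x2 + aw*w <= rhs, valid
    for {(x1, x2, w) : w = x1*x2, l1 <= x1 <= u1, l2 <= x2 <= u2}."""
    return [
        ( l2,  l1, -1.0,  l1 * l2),  # w >= l2*x1 + l1*x2 - l1*l2
        ( u2,  u1, -1.0,  u1 * u2),  # w >= u2*x1 + u1*x2 - u1*u2
        (-u2, -l1,  1.0, -l1 * u2),  # w <= u2*x1 + l1*x2 - l1*u2
        (-l2, -u1,  1.0, -u1 * l2),  # w <= l2*x1 + u1*x2 - u1*l2
    ]

These inequalities are exact at the bounds on x1 and x2, in line with the requirement recalled below, and they tighten as the box shrinks, which is what makes bound reduction effective.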
The aforementioned MINLP solvers are branch-and-bound procedures
that obtain, at every BB node, a linearization of P. It is crucial, as pointed
out by McCormick [46], that for each k = n + 1, n + 2 . . . , n + q, the
linearization of Θk be exact at the lower and upper bounds on the variables appearing as arguments of ϑk, i.e., the linearization must be such that if a solution x̄ is feasible for the LP relaxation but not for P, then there exists an i such that ℓi < x̄i < ui, so that a spatial disjunction xi ≤ x̄i ∨ xi ≥ x̄i can be used.
In general, for all k = n + 1, n + 2, . . . , n + q, both Bk and ck depend on the variable bounds ℓ and u: the tighter the variable bounds, the stronger the lower bound obtained by minimizing xn+q over LP. For such
an approach to work effectively, several bound reduction techniques have
been developed [30, 48, 52, 51, 58, 38]. Most of these techniques use the
nonconvex constraints of the reformulation or the linear constraints of the
linearization, or a combination of both. An experimental comparison of
bound reduction techniques used in a MINLP solver is given in [12].
Let us consider, as an example, a constraint xk = ϑk(x) = (xi)3, and the nonconvex set Θk = {x ∈ Rn+q ∩ [ℓ, u] : xk = (xi)3}, whose projection on the (xi, xk) space is the bold curve in Figure 1(a). A linearization of Θk is in Figure 1(b), and is obtained through a procedure based on the function ϑk(x) = (xi)3 and the bounds on xi. As shown in Fig. 1(c), when tighter bounds are known for xi a better relaxation can be obtained: the set Θ′k = {x ∈ Rn+q ∩ [ℓ, u′] : xk = (xi)3}, where u′j = uj for j ≠ i and u′i < ui, admits a tighter linearization LP′, which is a proper subset of LP since the linearization is exact at u′i.
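For intuition, here is a sketch of such a linearization routine for xk = (xi)3 in the special case ℓi ≥ 0, where the function is convex on the interval; the mixed-sign case needs the convex and concave envelopes and is not handled by this sketch.

def cube_linearization(l, u, tangent_points=None):
    """Cuts (aw, ax, rhs) meaning aw*w + ax*x <= rhs for w = x**3,
    assuming 0 <= l < u so that x**3 is convex on [l, u]."""
    assert 0 <= l < u
    cuts = []
    for t in tangent_points or (l, u):
        # tangent at t: w >= 3*t^2*x - 2*t^3 (an underestimator)
        cuts.append((-1.0, 3 * t * t, 2 * t ** 3))
    slope = (u ** 3 - l ** 3) / (u - l)
    # secant through (l, l^3) and (u, u^3): w <= slope*x + l^3 - slope*l
    cuts.append((1.0, -slope, l ** 3 - slope * l))
    return cuts

Since the secant touches the curve at both endpoints, shrinking ui to u′i immediately tightens the overestimating side, which is exactly the refinement shown in Figure 1(c).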

3. Disjunctions in MINLP. In order to apply the paradigm of disjunctive programming to nonconvex MINLP, we need (i) a description of the set of valid disjunctions in P and (ii) a procedure for selecting, from said set, a disjunction violated by an optimal solution x∗ to the LP relaxation.

Deriving disjunctions. Nonconvex constraints in the reformulation P above are of two types:
• integrality of a subset of variables: xi for i ∈ {1, 2, . . . , r, n + 1, n + 2, . . . , n + s};

(a) Reformulated constraint   (b) A linearization   (c) A refined linearization

Fig. 1. Linearization of constraint xk = (xi)3: the bold line in (a) represents the nonconvex set {(xi, xk) : ℓi ≤ xi ≤ ui, xk = (xi)3}, while the polyhedra in (b) and (c) are its linearizations for different bounds on xi.

• nonconvex constraints xk = ϑk(x) for k = n + 1, n + 2, . . . , n + q.

MILP problems and convex MINLPs contain nonconvex constraints of the
first type only. When these constraints are relaxed, one obtains a contin-
uous relaxation which yields a lower bound on the optimal solution value and a solution vector x∗ that satisfies the convex constraints, but may violate the integrality requirements. If x∗i ∉ Z for one of the integer variables, then x∗ violates the variable disjunction xi ≤ ⌊x∗i⌋ ∨ xi ≥ ⌈x∗i⌉.
There has been extensive research on how to use this type of disjunc-
tion, how to derive disjunctive cuts from them, and how to efficiently add
such inequalities, using a BB method coupled with a cut generating proce-
dure [7, 8, 49]. Several generalizations can be introduced at the MILP level:
a well-known example is the split disjunction πᵀx ≤ π0 ∨ πᵀx ≥ π0 + 1, where (π, π0) ∈ Zp+1 and x ∈ Zp is a vector of p integer variables [10, 17, 22, 36].
The reformulation of an MINLP problem is subject to both types of
nonconvex constraints. The optimal solution to the LP relaxation, x∗, may be infeasible for the integrality constraints or for one or more of the constraints of the reformulation, xk = ϑk(x). In this work, we will primarily consider disjunctions arising from constraints of the second type, and will focus on the problem of finding a valid disjunction that is violated by a solution to the LP relaxation of P0.
For the sake of simplicity, consider a nonconvex constraint xk = ϑk(xi) with ϑk univariate and xi continuous (disjunctions can be derived with a similar procedure when ϑk is multivariate and/or xi is integer), and assume this constraint is violated by a solution x∗ of the LP relaxation, i.e., x∗k ≠ ϑk(x∗i). The spatial disjunction

xi ≤ β ∨ xi ≥ β, (3.1)

although valid for any β ∈ [ℓi, ui], is not violated by x∗. This is a point of departure with MILP, for which disjunctions over integer variables are
valid and, obviously, violated by fractional solutions.
A violated disjunction for MINLP, however, can be derived by applying bound reduction and the linearization procedure. Suppose that the linearization for the set Θ′k = {x ∈ X ∩ [ℓ, u] : xi ≤ β, xk = ϑk(xi)} is LP′k = {x ∈ [ℓ, u] : B′x ≤ c′}, and analogously, the linearization for Θ′′k = {x ∈ X ∩ [ℓ, u] : xi ≥ β, xk = ϑk(xi)} is LP′′k = {x ∈ [ℓ, u] : B′′x ≤ c′′}; we assume that the linear constraints include the new upper (resp. lower) bound β on xi. Assuming that the two linearizations LP′k and LP′′k are exact at the bounds on xi, we have x∗ ∉ LP′k ∪ LP′′k. Hence, the disjunction
x ∈ LP′k ∨ x ∈ LP′′k                                      (3.2)

is valid and violated by x∗, and can be used to generate a disjunctive cut or a branching rule.
Although it might seem intuitive to set β = x∗i, in practice spatial disjunctions (3.1) are often selected such that β ≠ x∗i, for instance because x∗i is very close to one of the bounds on xi. Even so, there are procedures to derive a valid disjunction (3.2) violated by x∗. The rules for selecting a good value of β have been discussed, among others, in [12, 62] and will not be presented here.
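Purely as an illustration (this is a generic safeguard, not necessarily the rule of [12, 62]), one such procedure keeps β a fixed fraction of the box width away from the bounds:

def pick_beta(x_star, l, u, alpha=0.2):
    """Clamp the LP value x_star into [l + alpha*(u-l), u - alpha*(u-l)],
    so that both terms of the spatial disjunction shrink the box."""
    width = u - l
    return min(max(x_star, l + alpha * width), u - alpha * width)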
As an example, consider the constraint xk = ϑk(xi) = (xi)2 and the components (x∗i, x∗k) of the LP solution x∗ as in Figure 2(a). Since x∗ lies in the convex hull of Θk = {x ∈ Rn+q : x ∈ [ℓ, u], xk = (xi)2}, refining the linearization, by adding linearization inequalities for other auxiliary variables, may not be sufficient to separate x∗. In this case, a valid disjunction that is violated by x∗ can be used either to create two subproblems or to derive a disjunctive cut. We observe that (x∗i, x∗k) is not necessarily a vertex of the linearization of xk = (xi)2, since it is simply a projection of x∗ (which is a vertex of the LP relaxation) onto the plane (xi, xk). Starting from the valid disjunction xi ≤ β ∨ xi ≥ β, one can create two linearizations LP′ and LP′′, shown as shaded areas in Figure 2(b). The corresponding disjunction x ∈ LP′ ∨ x ∈ LP′′ is valid and violated by x∗.

Selecting a disjunction. In its simplest form, the problem of finding the most effective disjunction considers variable disjunctions only and can be stated as follows: given an optimal solution x∗ of an LP relaxation of P, an index set I of integer variables with fractional value in x∗, and an index set N of violated nonconvex constraints of P, i.e., N = {k ∈ {n + 1, n + 2, . . . , n + q} : x∗k ≠ ϑk(x∗)}, choose either (i) an element i ∈ I defining the integer disjunction xi ≤ ⌊x∗i⌋ ∨ xi ≥ ⌈x∗i⌉ or (ii) an element k of N, a variable xi that is an argument of ϑk(x), and a scalar β defining the spatial disjunction xi ≤ β ∨ xi ≥ β.

In order to select a disjunction, an infeasibility parameter, which depends on the solution x∗ of the LP relaxation, is associated with each variable xi of the reformulation. A null infeasibility corresponds to a disjunction satisfied by x∗, and a nonzero infeasibility corresponds to a disjunction violated by x∗. Ideally, a large infeasibility should indicate a better disjunction, for example in terms of improvement in the lower bound of the two problems generated by branching on that disjunction. The infeasibility Ωint(xi) of an integer variable xi is computed as min{x∗i − ⌊x∗i⌋, ⌈x∗i⌉ − x∗i}. For a variable xj appearing in one or more nonlinear expressions, consider the set Dj of indices k of nonconvex constraints xk = ϑk(x) where ϑk contains xj as an argument, and define γk = |x∗k − ϑk(x∗)| for all k in Dj. We define the infeasibility Ωnl(xj) of xj as a convex combination of mink∈Dj γk, Σk∈Dj γk, and maxk∈Dj γk. If xj is also integer, the maximum between Ωnl(xj) and Ωint(xj) is used. We refer the reader to [12, §5] for a more detailed description, which is omitted here for the sake of conciseness.
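A small sketch of these two measures follows; the convex-combination weights and the input encoding are illustrative assumptions, not the values used in [12, §5].

import math

def infeas_integer(xi_star):
    """Omega_int: distance of the LP value from the nearest integer."""
    return min(xi_star - math.floor(xi_star), math.ceil(xi_star) - xi_star)

def infeas_nonlinear(x_star, D_j, theta, weights=(0.2, 0.3, 0.5)):
    """Omega_nl: convex combination of min, sum, and max of the violations
    gamma_k = |x_k - theta_k(x)| over the constraints indexed by D_j."""
    gammas = [abs(x_star[k] - theta[k](x_star)) for k in D_j]
    w_min, w_sum, w_max = weights
    return w_min * min(gammas) + w_sum * sum(gammas) + w_max * max(gammas)

For an integer variable that also appears in nonlinear terms, the larger of the two values would be used, as stated above.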
A standard procedure for selecting disjunctions sorts them in non-
increasing order of infeasibility. When selecting a disjunction for branching,
the one with maximum infeasibility is chosen. If a set of disjunctive cuts
is desired, the first p disjunctions in the sorted list are chosen, for a given
p ≥ 1, and p disjunctive cuts are generated.
As discussed in [1, 12], this infeasibility measure may not be the best
way to rank disjunctions. More sophisticated techniques have been pro-
posed for disjunction selection in the branching process, for example strong
branching [4], pseudocost branching [13], reliability branching [1], and Violation Transfer [43, 62]. A generalization of reliability branching [1] to the
MINLP case has been recently presented [12].

Disjunctions in special cases of MINLP. Some classes of MINLP problems exhibit nonconvex constraints that give rise to special disjunc-
tions, and are used to generate disjunctive cuts. One such example has
been studied for quadratically constrained quadratic programs (QCQPs)
by Saxena et al. [55, 56]. These problems contain nonlinear terms of the
form xi xj . Applying a reformulation such as that described in Section 2
would obtain auxiliary variables yij = xi xj for 1 ≤ i ≤ j ≤ n. In matricial
form, this corresponds to the nonconvex constraint Y = xx , where Y
is a symmetric, n × n matrix of auxiliary variables and x is the n-vector
of variables (notice that there are n(n + 1)/2 new variables instead of n2 ,
since yij = yji ). Rather than applying a linearization to each nonconvex
constraint yij = xi xj , such a constraint can be relaxed as Y − xx  0,
thus reducing a QCQP to a (convex) Semidefinite program, which yields
comparably good lower bounds [31, 53, 3]. However, these SDP models are
obtained from the original problem by relaxing the nonconvex constraint
xx − Y  0. In [55], this constraint is used to generate disjunctions: given
a vector v ∈ Rn , the non-convex constraint (v  x)2 ≥ v  Y v, which can be
rewritten as w2 ≥ z after a change of variables, is obtained from the neg-

(a) Infeasible solution of a linearization   (b) Linearized subproblems

Fig. 2. MINLP Disjunctions and nonconvex constraints. In (a), the shaded area is the linearization of the constraint xk = ϑk(xi) with xi ∈ [ℓi, ui], whereas (x∗i, x∗k) is the value of xi and xk in the optimum of the LP relaxation. In (b), the spatial disjunction xi ≤ β ∨ xi ≥ β generates two sets Θ′k = {x ∈ X ∩ [ℓ, u] : xk = x2i, xi ≤ β} and Θ′′k = {x ∈ X ∩ [ℓ, u] : xk = x2i, xi ≥ β}. The corresponding linearizations, LP′k and LP′′k, are the smaller shaded areas.

ative eigenvalues of the matrix x̄x̄ᵀ − Ȳ, where (x̄, Ȳ) is a solution to the relaxation. This is then used to generate a disjunction and disjunctive cuts
in x and Y . In a more recent development [56], this procedure is modified
to generate disjunctive cuts in the original variables x only.
4. Disjunctive cuts in nonconvex MINLP. Assume that the linearization step produces an LP relaxation that we denote min{xn+q : Ax ≤ a, ℓ ≤ x ≤ u}, where A ∈ QK×(n+q), a ∈ QK, and K is the total number of linear inequalities generated for all of the nonlinear constraints xk = ϑk(x) of the reformulation (we recall that K ≤ 4(n + q)), while ℓ and u are the vectors of the lower and upper bounds on both original and auxiliary variables. Let us denote the feasible set of the linear relaxation as LP = {x ∈ Rn+q : Ax ≤ a, ℓ ≤ x ≤ u}.
Consider a disjunction xi ≤ β ∨ xi ≥ β. The two LP relaxations

LP′ = {x ∈ LP : xi ≤ β},
LP′′ = {x ∈ LP : xi ≥ β},

are created by amending to LP one constraint of the disjunction. As pointed out in the previous section, the disjunction alone does not eliminate any solution from LP′ ∪ LP′′, because LP = LP′ ∪ LP′′, while this does not hold for disjunctions on integer variables, where LP strictly contains the union of the two subproblems. Consider two problems P′ and P′′, obtained by intersecting P with the constraints xi ≤ β and xi ≥ β, respectively. Apply bound reduction and the linearization procedure to P′
and P′′, and denote the tightened problems as SLP′ and SLP′′ (see Figure 2(b)). The two sets can be described as

SLP′ = {x ∈ LP′ : B′x ≤ c′},
SLP′′ = {x ∈ LP′′ : B′′x ≤ c′′},

where B′ ∈ QH′×(n+q), c′ ∈ QH′, B′′ ∈ QH′′×(n+q), and c′′ ∈ QH′′ are the coefficient matrices and the right-hand side vectors of the inequalities added to the linearizations, which contain the new bound on variable xi and, possibly, new bounds on other variables.
We rewrite these two sets in a more convenient form:

SLP′ = {x ∈ Rn+q : A′x ≤ a′},
SLP′′ = {x ∈ Rn+q : A′′x ≤ a′′},

where

A′ = [A; B′; −I; I],   a′ = [a; c′; −ℓ′; u′];      A′′ = [A; B′′; −I; I],   a′′ = [a; c′′; −ℓ′′; u′′]

(rows stacked from top to bottom), so as to include the initial linear constraints, the new linearization inequalities, and the variable bounds in a more compact notation. We denote as K′ (resp. K′′) the number of rows of A′ (resp. A′′) and the number of elements of a′ (resp. a′′).
As described by Balas [5], an inequality αᵀx ≤ α0, where α ∈ Qn+q and α0 ∈ Q, is valid for the convex hull of SLP′ ∪ SLP′′ if α and α0 satisfy:

αᵀ ≤ uᵀA′,   α0 = uᵀa′,
αᵀ ≤ vᵀA′′,   α0 = vᵀa′′,

where u ∈ RK′+ and v ∈ RK′′+. Given an LP relaxation and its optimal solution x∗, an automatic procedure for generating a cut of this type consists of finding vectors u and v such that the corresponding cut is maximally violated [7]. This requires solving the Cut Generating Linear Programming
(CGLP) problem

max   αᵀx∗ − α0
s.t.  α − A′ᵀu ≤ 0
      α − A′′ᵀv ≤ 0
      α0 − uᵀa′ = 0                                       (4.1)
      α0 − vᵀa′′ = 0
      uᵀe + vᵀe = 1
      u, v ≥ 0

where e is the vector with all components equal to one. An optimal solu-
tion (more precisely, any solution with positive objective value) provides a

valid disjunctive cut that is violated by the current solution x∗. Its main disadvantage is its size: the CGLP has (n + q + 1) + K′ + K′′ variables and 2(n + q) + 3 constraints, and is hence at least twice as large as the LP
used to compute a lower bound. Given that the optimal solution of the
CGLP is used, in our implementation, to produce just one disjunctive cut,
solving one problem (4.1) for each disjunction of a set of violated ones, at
every branch-and-bound node, might prove ineffective. To this purpose,
Balas et al. [6, 9] present a method to implicitly solve the CGLP for binary
disjunctions by applying pivot operations to the original linear relaxation,
only with a different choice of variables. It is worth noting that, unlike the
MILP case, here A and A differ for much more than a single column. As
shown in [2], this implies that the result by Balas et al. does not hold in
this case.
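To make the separation step concrete, the CGLP (4.1) can be assembled for an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog with the variable layout z = (α, α0, u, v); the function name and the layout are choices made for this example, not couenne's implementation.

import numpy as np
from scipy.optimize import linprog

def solve_cglp(A1, a1, A2, a2, xstar, tol=1e-9):
    """Solve (4.1) for A1 x <= a1 (SLP') and A2 x <= a2 (SLP'');
    return a cut (alpha, alpha0) violated by xstar, or None."""
    K1, n = A1.shape
    K2, _ = A2.shape
    I, z = np.eye(n), np.zeros((n, 1))
    # alpha - A1^T u <= 0  and  alpha - A2^T v <= 0
    A_ub = np.block([[I, z, -A1.T, np.zeros((n, K2))],
                     [I, z, np.zeros((n, K1)), -A2.T]])
    b_ub = np.zeros(2 * n)
    # alpha0 = u^T a1, alpha0 = v^T a2, normalization e^T u + e^T v = 1
    A_eq = np.array([np.r_[np.zeros(n), 1.0, -a1, np.zeros(K2)],
                     np.r_[np.zeros(n), 1.0, np.zeros(K1), -a2],
                     np.r_[np.zeros(n), 0.0, np.ones(K1 + K2)]])
    b_eq = np.array([0.0, 0.0, 1.0])
    c = np.r_[-xstar, 1.0, np.zeros(K1 + K2)]  # minimize -(alpha^T x* - alpha0)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (K1 + K2)
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    if res.success and -res.fun > tol:          # positive violation: a cut
        return res.x[:n], res.x[n]
    return None

The normalization row keeps the multipliers bounded, and a strictly positive objective value certifies that the returned inequality cuts off x∗.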
An example. Consider the continuous nonconvex nonlinear program
P0 : min{x2 : x4 ≥ 1}. It is immediate to check that its feasible region is
the nonconvex union of intervals (−∞, −1] ∪ [+1, +∞), and that its two
global minima are −1 and +1. Its reformulation is as follows:

(P) min w
s.t. w = x2
y = x4
y ≥ 1.

It is crucial to note here that, although the problem is trivial and can be
solved by inspection, state-of-the-art MINLP solvers that use reformula-
tion ignore the relationship between the objective function and the con-
straint, i.e., y = w2 . The tightest convex relaxations of the two nonlinear
constraints are obtained by simply replacing the equality with inequality,
therefore any LP relaxation generated by a MINLP solver is a relaxation
of
(CR) min w
s.t. w ≥ x2
y ≥ x4
y ≥ 1,

whose optimal solution is (x, w, y) = (0, 0, 1), with value w = 0, and which is infeasible for P and hence for P0. Imposing the disjunction x ≤ 0 ∨ x ≥ 0 on P yields two subproblems with the following convex relaxations:

(CR′)   min{w : y ≥ 1, w ≥ x2, y ≥ x4, x ≤ 0}
(CR′′)  min{w : y ≥ 1, w ≥ x2, y ≥ x4, x ≥ 0},

both with the same optimal solution, (x, w, y) = (0, 0, 1). Bound reduction
is crucial here for both subproblems, as it strengthens the bounds on x
using the lower bound on y. Indeed, x ≤ 0 and 1 ≤ y = x4 imply x ≤ −1,

and analogously x ≥ 0 and 1 ≤ y = x4 imply x ≥ 1. Hence, the relaxations of the two tightened subproblems are

(SCR′)   min{w : y ≥ 1, w ≥ x2, y ≥ x4, x ≤ −1}
(SCR′′)  min{w : y ≥ 1, w ≥ x2, y ≥ x4, x ≥ +1}

and their optimal solutions are feasible for P0 and correspond to the two global optima {−1, +1}. Hence, the problem is solved after branching on the disjunction x ≤ 0 ∨ x ≥ 0. However, the nonlinear inequality x2 ≥ 1 is valid for both CR′ and CR′′, as it is implied by x ≤ −1 in the former and by x ≥ 1 in the latter. Since w ≥ x2, the (linear) disjunctive cut w ≥ 1 is valid for both (SCR′) and (SCR′′). If added to (CR), a lower bound of 1 is obtained, which makes it possible to solve the problem without branching. It is easy to prove that even using a linear relaxation and applying the CGLP procedure
yields the same disjunctive cut. This simple example can be complicated
by considering n variables, each subject to a nonconvex constraint:

min   x21 + x22 + · · · + x2n
s.t.  x4i ≥ 1   ∀i = 1, 2, . . . , n.

A standard BB implementation needs to branch on all variables before closing the gap between lower and upper bound, requiring an exponential number of subproblems as one needs to branch on all disjunctions. How-
number of subproblems as one needs to branch on all disjunctions. How-
ever, the disjunctive cuts wi ≥ 1∀i = 1, 2 . . . , n, where wi is the variable
associated with the expression x2i , allow to find an optimal solution imme-
diately. Although this example uses a nonlinear convex hull for the problem
and the subproblems, it can be shown that the same disjunctive cut can
be generated within a framework where linear relaxations are used and a
CGLP-based method for generating disjunctive cuts is used.
A test with n = 100 was conducted using the two variants of couenne
dc and v0 described in Section 6, respectively with and without disjunctive
cuts. With disjunctive cuts generated at all nodes (at most 20 per node),
the problem was solved to global optimality (the optimal solution has a
value of 100) in 57 seconds and 18 BB nodes on a Pentium M 2.0GHz laptop,
41 seconds of which were spent in the generation of such cuts. Without
disjunctive cuts, the solver was stopped after two hours of computation,
more than 319,000 active BB nodes, and a lower bound of 14.
As a side note, one might expect a BB procedure to enforce all disjunc-
tions as branching rules. Hence, at any node of depth d in the BB tree the
d disjunctions applied down to that level determine the bound to be equal
to d. Because all optima have objective value equal to n, an exponential
number of nodes will have to be explored, so that the global lower bound
of the BB will increase as the logarithm of the number of nodes.
This example can be interpreted as follows: the decomposition of the
expression trees at the reformulation phase is followed by a linearization
that, for each nonlinear constraint yi = x4i, only takes into account the variables xi and yi; this causes a bad lower bound, as there is no link between the lower bound on yi and that on wi. The disjunctive cuts,
while still based on a poor-quality LP relaxation, have a more “global”
perspective of the problem, where the bound on yi implies a bound on wi .

5. A procedure to generate disjunctive cuts. The procedure is sketched in Table 1, and its details are discussed below. It requires a
description of P, the feasible region of its linear relaxation LP = {x ∈ Rn+q : Ax ≤ a}, lower and upper bounds ℓ, u, an optimal solution x∗ of the relaxation, and a disjunction xi ≤ β ∨ xi ≥ β. Branch-and-bound procedures for MINLP recursively reduce the bounding box [ℓ, u], therefore the procedure sketched can be applied at every branch-and-bound node.
Implementation details. We have used couenne [11, 23], an Open
Source software package included in the Coin-OR infrastructure [44], for
all experiments. It implements reformulation, linearization, and bound
reduction methods, as well as a reliability branching scheme [12]. It also
recognizes complementarity constraints, i.e., constraints of the form xi xj = 0, with i ≠ j, as the disjunction xi = 0 ∨ xj = 0.
couenne is a BB whose lower bounding method is based on the re-
formulation and linearization steps as discussed in Section 2. At the be-
ginning, the linearization procedure is run in order to obtain an initial LP
relaxation. At each node of the BB, two cut generation procedures are
used: up to four rounds of cuts to refine the linear relaxation and at most
one round of disjunctive cuts.
Calls to the disjunctive cut generator. While the procedure that gener-
ates a linear relaxation of the MINLP is relatively fast and hence is carried
out multiple times at every node of the BB tree, separating disjunctive
cuts using a CGLP is CPU-intensive as it requires solving a relatively large
LP for each disjunction. Therefore, in our experiments, disjunctive cuts are only generated at BB nodes of depth less than 10, and at most 20 violated disjunctions per node are used.
For each node where disjunctive cuts are separated, the first 20 disjunctions of a list provided by the branch-and-bound algorithm (which, in couenne’s case, is cbc [18]) are selected. This list is sorted in non-increasing order of the infeasibility of the variables xi of the reformulation with respect to the solution x∗ of the LP relaxation.
Disjunctions and convex/concave functions. Disjunctions can help cut
an LP solution through branching or disjunctive cuts, but they are not
the only way to do so. In order to generate disjunctive cuts only when
necessary, in our experiments a disjunction from a nonlinear constraint
xk = ϑk(x) is not used, for branching or for separating a disjunctive cut, if it is possible to add a linearization inequality for ϑk that cuts x∗, i.e., if x∗k > ϑk(x∗) and ϑk is concave or x∗k < ϑk(x∗) and ϑk is convex. Consider Figure 3, where the components (x∗i, x∗k) of the LP solution associated with the nonlinear constraint xk = exi are shown in two cases: in the

(a) Separable solution   (b) Non-separable solution

Fig. 3. Separable and non-separable points. In (a), although the point is infeasible for P, another round of linearization cuts is preferred to a disjunction (either by branching or through a disjunctive cut), as it is much quicker to separate. In (b), no refinement of the linear relaxation is possible, and a disjunction must be applied.

first, x∗k < ex∗i, while in the second x∗k > ex∗i. In both cases, a disjunction xi ≤ β ∨ xi ≥ β could be considered by the MINLP solver as a means to create two new subproblems and the two LP relaxations, both excluding x∗, or to create a disjunctive cut. However, the point x∗ in Figure 3(a) can be cut simply by refining the LP relaxation, thus avoiding the evaluation of a disjunction on xi.
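A sketch of this test, with the convexity status of ϑk assumed to be known (as it is once the problem has been reformulated into elementary operators):

def needs_disjunction(xk_star, xi_star, theta_k, kind):
    """Return False when a fresh linearization cut already separates x*,
    so no disjunction on x_i is needed; kind is 'convex' or 'concave'."""
    if kind == 'convex' and xk_star < theta_k(xi_star):
        return False  # point below a convex curve: a tangent cut separates it
    if kind == 'concave' and xk_star > theta_k(xi_star):
        return False  # point above a concave curve: a tangent cut separates it
    return True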
Disjunctions on integer variables. Disjunctions of the form xi ≤ ⌊x∗i⌋ ∨ xi ≥ ⌈x∗i⌉ are also used in this framework, as they may also lead to two
subproblems with refined relaxation and tightened bounds. A more effi-
cient separation technique for generating disjunctive cuts based on integer
disjunctions, CglLandP [20], is available in the COIN-OR cut generation
library [45]. couenne has an option for separating cuts using CglLandP,
but can also use the procedure described above on integer disjunctions. The
two procedures are different: while the first uses a variant of the method
proposed by Balas et al. [6, 9] and is hence faster, the CGLP-based proce-
dure in couenne takes advantage of the reduced bounds and the refined
linearization, which are not available to CglLandP. In our experiments,
we have turned off CglLandP and separated disjunctive cuts on integer
variables with the method described in this paper, with the only difference
that the disjunction used is of the form xi ≤ β ∨ xi ≥ β + 1, with β ∈ Z.
Branching priorities. In the software framework we chose for our imple-
mentation, integer and nonlinear disjunctions have a priority, i.e., a scalar
that determines precedences in choosing disjunctions: a disjunction can
be selected only if there are no violated disjunctions with smaller priority.


Table 1
Procedure for generating a disjunctive cut for problem P.

Input: A problem P and a linear relaxation LP: (A, a, ℓ, u)
       An optimal solution x∗ of LP
       A disjunction xi ≤ β ∨ xi ≥ β
1  Create P′ from P by imposing xi ≤ β
2  Create P′′ from P by imposing xi ≥ β
3  Apply bound reduction to P′, obtain [ℓ′, u′]
4  Apply bound reduction to P′′, obtain [ℓ′′, u′′]
5  Generate the linear relaxation SLP′ of P′, defined by A′, a′
6  Generate the linear relaxation SLP′′ of P′′, defined by A′′, a′′
7  Construct CGLP (4.1) and solve it
8  Return (α, α0)

In our experiments, we have assigned the same priority to all disjunctions; therefore, disjunctions are chosen based on their infeasibility only.
Rank of the generated inequalities. Suppose an LP relaxation LP and K disjunctions are available at a branch-and-bound node. In our implementation, after generating the disjunctive cut αᵀx ≤ α0 for the j-th disjunction, the cut for the (j + 1)-st disjunction is generated using a CGLP constructed from the same LP relaxation LP, i.e., without the new cut αᵀx ≤ α0 (couenne has an option to amend the CGLP with the cuts generated in the same round, which was not tested). This is done to limit
the rank of the new cut, given that high-rank cuts may introduce numerical
errors. In fact, the K-th generated cut would otherwise be of rank K even if
LP contained no disjunctive cuts, and, in the worst case, its rank would be
Kd if generated in a branch-and-bound node at depth d. Therefore, given
the maximum node depth of 10 mentioned above, the maximum rank of the
cuts separated in the tests below is 10. If, in the future, disjunctive cuts
are generated at deeper nodes of the BB tree, mechanisms for controlling
the rank of each cut, such as that used by [55], might be necessary.
6. Experimental results. In order to assess the utility and effec-
tiveness of the procedure to generate disjunctive cuts for MINLP described
in this paper, we have performed a battery of tests on a set of 84 publicly
available MINLP instances from the following repositories:
• MacMINLP [37] and minlplib [16]: a collection of MINLP in-
stances, both convex and nonconvex;
• nConv: a collection of nonconvex MINLPs2 ;
• MIQQP: Mixed Integer quadratically constrained quadratic pro-
grams [47]; model qpsi.mod was used;
• globallib: a collection of continuous NLP problems [29];

2 See https://projects.coin-or.org/Couenne/browser/problems/nConv.

• boxQP: continuous, nonconvex, box-constrained quadratic problems; the smaller instances are from [63] and those with more than 60 variables are from [15];
• airCond, a 2D bin-packing problem for air conduct design [28].
While most of these instances are nonconvex MINLP problems, the
boxQP instances belong to a specific class of problems for which much more
efficient methods exist. In particular, the one presented by Burer et al. [15]
solves all these instances more quickly than couenne. It should also be
noted that, for this type of problems, there are much stronger relaxations
than that obtained through reformulation, see for example Anstreicher [3].
Table 3 describes the parameters of each instance: number of variables
(var), of integer variables (ivar), of constraints (con), and of auxiliary vari-
ables (aux), or, in the notation used above: n, r, m, and q. The latter
parameter is a good indicator of the size of the LP relaxation, as it is
proportional to the number of linearization inequalities added.
We have conducted tests to compare two distinct branching techniques
with disjunctive cuts, in order to understand what combination is most
effective. The following four variants of couenne have been tested:
• v0: no disjunctive cuts, and the basic branching scheme br-plain
described in [12], Section 5.1;
• rb: no disjunctive cuts, and an extension of reliability branching
[1] to MINLP denoted int-br-rev in [12];
• dc: disjunctive cuts separated until depth 10 of the BB tree and
the br-plain branching scheme;
• dc+rb: disjunctive cuts separated until depth 10 of the BB tree
and reliability branching.
The latter variant, apart from being a combination of more sophisticated
methods, has a further advantage: since reliability branching is a method
to rank branching rules and therefore disjunctions, disjunctive cuts are
separated only using the most promising disjunctions.
All tests were performed on a computer with a 2.66GHz processor,
64GB of RAM, and Linux kernel 2.6.29. couenne version 0.2, compiled
with gcc 4.4.0, was used. A time limit of two hours was set for all variants.
Tables 4 and 5 report in detail the comparison between the four vari-
ants of couenne. If an algorithm solved an instance in less than two hours,
its CPU time in seconds is reported. Otherwise, the remaining gap

zbest − zlower
gap = (6.1)
zbest − z0

is given, where zbest is the objective value of the best solution found by the
four algorithms, zlower is the lower bound obtained by this algorithm, and
z0 is the initial lower bound found by couenne, which is the same for all
variants. If no feasible solution was found by any variant, the lower bound
is reported in brackets. The best performances are highlighted in bold.

Table 2
Summary of the comparison between the four variants. The first three columns report, for those instances that could be solved within the time limit of two hours, the number of instances solved (“solved”), the number of instances for which the variant obtained the best CPU time or came within 10% of the best (“best time”), and analogously for the number of BB nodes (“best nodes”). The last column, “best gap,” reports the number of instances, among those that could not be solved within two hours by any of the variants, for which the variant obtained the smallest remaining gap, or came within 10% of the smallest.

                        <2h                    Unsolved
Variant    solved   best time   best nodes    best gap
v0            26        12          11            31
rb            26        13          11            35
dc            35         8          13            29
dc+rb         39        20          29            37

Table 2 summarizes our results by pointing out which variants perform better overall. For each variant, the column “solved” reports the number
better overall. For each variant, the column “solved” reports the number
of instances solved by that variant before the time limit, while column
“best time” reports the number of solved instances whose CPU time is
best among the four variants, or at most 10% greater than the best time.
An analogous measure is given in the third column, “best nodes,” for the
BB nodes. The last column, “best gap,” refers to instances that were not
solved before the time limit by any of the variants, and reports for how
many of these the variant had the best gap, or within 10% of the best.
Although this gives a somewhat limited perspective, it shows that the
variants with disjunctive cuts, especially the one coupled with reliability
branching, have an edge over the remaining variants. They in fact make it possible to solve more problems within the time limit, on average, and, even when
a problem is too difficult to solve, the remaining gap is smaller more often
when using disjunctive cuts.
Variants with disjunctive cuts seem to outperform the others, espe-
cially for the boxQP instances (we recall that more efficient solvers are
available for this type of problem [15]). For some instances, however, dis-
junctive cuts only seem to slow down the BB as the CPU time spent in
separation does not produce any effective cut. The tradeoff between the
effectiveness of the disjunctive cuts and the time spent generating them
suggests that a faster cut generation would increase the advantage.
We further emphasize this fact by showing the amount of time spent
by reliability branching and disjunctive cuts, which is included in the total
CPU time reported. Tables 6 and 7 show, for a subset of instances for
which branching time or separation time were relatively large (at least
500 seconds), the CPU time spent in both processes and the number of
resulting cuts or BB nodes. This selection of instances shows that, in
certain cases, the benefit of disjunctive cuts is worth the CPU time spent

in the generation. This holds especially for the boxQP instances: although
a large amount of time is spent in generating disjunctive cuts, this results
in a better lower bound or a lower CPU time. Again, the fact that the
current separation algorithm is rather simple suggests that a more efficient
implementation would obtain the same benefit in shorter time.
We also graphically represent the performance of the four variants
using performance profiles [24]. Figure 4(a) depicts a comparison on the
CPU time. This performance profile only considers instances that could be
solved in less than two hours by at least one of the variants. Hence, it also
compares the quality of a variant in terms of number of instances solved.
Figure 4(b) is a performance profile on the number of nodes.
Figure 4(c) is a comparison on the remaining gap, and reports on all
instances for which none of the variants could obtain an optimal solution
in two hours or less. Note that this is not a performance profile: rather
than the ratio between gaps, this graph shows, for each algorithm, the
number of instances (plotted on the y axis) with remaining gap below the
corresponding entry on the x axis.
The three graphs show once again that, for the set of instances we have considered, using both reliability branching and disjunctive cuts pays off for both easy and difficult MINLP instances. The former are solved in shorter time, while for the latter we obtain a better lower bound.
7. Concluding remarks. Disjunctive cuts are as effective in MINLP
solvers as they are in MILP. Although they are generated from an LP
relaxation of a nonconvex MINLP, they can dramatically improve the lower
bound and hence the performance of a branch-and-bound method.
One disadvantage of the CGLP procedure, namely having to solve a
large LP in order to obtain a single cut, carries over to the MINLP case.
Some algorithms have been developed, for the MILP case, to overcome this
issue [6, 9]. Unfortunately, as shown in [2], their extension to the MINLP
case is not as straightforward.
Acknowledgments. The author warmly thanks François Margot for
all the useful discussions that led to the development of this work, and an
anonymous referee whose feedback helped improve this article. Part of this
research was conducted while the author was a Postdoctoral Fellow at the
Tepper School of Business, Carnegie Mellon University, Pittsburgh PA.

(a) CPU time   (b) BB nodes   (c) Remaining gap

Fig. 4. Performance profiles for the four variants of couenne.

Table 3
Instances used in our tests. For each instance, “var” is the number of variables, “ivar” the number of integer variables, “con” the number
of constraints, and “aux” the number of auxiliary variables generated at the reformulation step. Instances in the boxQP group are continuous
and only constrained by a bounding box, hence columns “ivar” and “con” are omitted.

Name var aux Name var ivar con aux Name var ivar con aux
boxQP globallib minlplib
sp020-100-1 20 206 catmix100 301 0 200 800 waterz 195 126 137 146
sp030-060-1 30 265 lnts100 499 0 400 1004 ravem 111 53 186 189
sp030-070-1 30 311 camsh200 399 0 400 600 ravempb 111 53 186 189
sp030-080-1 30 368 qp2 50 0 2 1277 enpro56 128 73 192 188
sp030-090-1 30 402 qp3 100 0 52 52 enpro56pb 128 73 192 188
sp030-100-1 30 457 catmix200 601 0 400 1600 csched2 401 308 138 217
sp040-030-1 40 234 turkey 512 0 278 187 water4 195 126 137 146
sp040-040-1 40 319 qp1 50 0 2 1277 enpro48 154 92 215 206
sp040-050-1 40 399 elec50 150 0 50 11226 enpro48pb 154 92 215 206
sp040-060-1 40 478 camsh400 799 0 800 1200 space25a 383 240 201 119
sp040-070-1 40 560 arki0002 2456 0 1976 4827 contvar 279 87 279 747
sp040-080-1 40 648 polygon50 98 0 1273 6074 space25 893 750 235 136
sp040-090-1 40 715 arki0019 510 0 1 4488 lop97icx 986 899 87 407
sp040-100-1 40 806 arki0015 1892 0 1408 2659 du-opt5 18 11 6 221
sp050-030-1 50 366 infeas1 272 0 342 2866 du-opt 20 13 8 222
sp050-040-1 50 498 lnts400 1999 0 1600 4004 waste 1425 400 1882 2298
sp050-050-1 50 636 camsh800 1599 0 1600 2400 lop97ic 1626 1539 87 4241

sp060-020-1 60 354 arki0010 3115 0 2890 1976 qapw 450 225 255 227
sp070-025-1 70 618 nConv MacMINLP
sp070-050-1 70 1227 c-sched47 233 140 138 217 trimlon4 24 24 24 41
sp070-075-1 70 1838 synheatmod 53 12 61 148 trimlon5 35 35 30 56
sp080-025-1 80 789 JoseSEN5c 987 38 1215 1845 trimlon6 168 168 72 217
sp080-050-1 80 1625 MIQQP trimlon7 63 63 42 92
sp080-075-1 80 2388 ivalues 404 202 1 3802 space-25-r 843 750 160 111
sp090-025-1 90 1012 imisc07 519 259 212 696 space-25 893 750 235 136
sp090-050-1 90 2021 ibc1 2003 252 1913 1630 trimlon12 168 168 72 217
sp090-075-1 90 3033 iswath2 8617 2213 483 4807 space-960-i 5537 960 6497 3614
sp100-025-1 100 1251 imas284 301 150 68 366 misc.
sp100-050-1 100 2520 ieilD76 3796 1898 75 3794 airCond 102 80 156 157
sp100-075-1 100 3728

Table 4
Comparison between the four methods. Each entry is either the CPU time, in seconds, taken to solve the instance or, if greater than two
hours, the remaining gap (6.1) after two hours. If the remaining gap cannot be computed due to the lack of a feasible solution, the lower bound,
in brackets, is shown instead.

Name v0 rb dc dc+rb Name v0 rb dc dc+rb


boxQP globallib
sp020-100-1 57 70 21 9 catmix100 9 10 14 13
sp030-060-1 1925 1864 503 280 lnts100 30.0% 29.6% 15.4% 25.4%
sp030-070-1 3.4% 2.2% 1129 718 camsh200 79.7% 61.1% 78.5% 64.7%
sp030-080-1 13.7% 10.7% 1406 1342 qp2 12.5% 11.3% 13.8% 10.5%
sp030-090-1 2591 1662 695 316 qp3 6.6% 5.5% 6.5% 5.9%
sp030-100-1 12.9% 11.3% 2169 1654 catmix200 53 57 57 53
sp040-030-1 74 60 9 8 turkey 114 127 110 127
sp040-040-1 3.1% 5.2% 393 308 qp1 12.4% 12.2% 14.4% 11.8%
sp040-050-1 4.5% 8.2% 703 370 elec50 (353.6) (353.6) (353.6) (353.6)
sp040-060-1 39.4% 36.3% 6467 4145 camsh400 84.7% 72.9% 84.1% 80.8%
sp040-070-1 27.5% 24.5% 2746 1090 arki0002 (0) (0) (0) (0)
sp040-080-1 39.6% 40.4% 6.4% 6389 polygon50 (-20.2) (-20.2) (-20.2) (-20.2)
sp040-090-1 41.2% 41.4% 7.9% 1.7% arki0019 13.7% 13.7% 13.7% 13.7%
sp040-100-1 44.3% 40.4% 10.4% 7.6% arki0015 83.6% 83.6% 83.6% 83.6%
sp050-030-1 354 361 31 29 infeas1 62.6% 62.6% 62.5% 54.9%
sp050-040-1 29.6% 28.5% 1669 632 lnts400 5.7% 5.7% 10.8% 10.8%

sp050-050-1 57.5% 57.2% 22.8% 17.0% camsh800 87.3% 95.9% 97.4% 95.0%
sp060-020-1 1241 953 36 28 arki0010 1622 1538 2126 2079
sp070-025-1 40.1% 40.8% 2292 824 nConv
sp070-050-1 69.3% 70.1% 36.4% 45.3% c-sched47 4.2% 0.8% 4.0% 3.9%
sp070-075-1 81.4% 79.5% 76.7% 76.8% synheatmod 1.2% 0.0% 14.8% 141
sp080-025-1 53.4% 49.5% 4715 1241 JoseSEN5c 56.4% 100.0% 100.0% 99.8%
sp080-050-1 81.6% 81.2% 74.4% 74.4% MIQQP
sp080-075-1 86.9% 88.7% 82.0% 81.2% ivalues 12.1% 23.8% 25.1% 23.2%
sp090-025-1 68.6% 69.1% 38.6% 24.5% imisc07 88.7% 68.8% 99.6% 76.6%
sp090-050-1 83.9% 83.4% 77.4% 77.6% ibc1 (0.787) (0.787) (0.796) (0.813)
sp090-075-1 94.8% 94.9% 87.4% 91.7% iswath2 99.4% 100.0% 99.7% 98.8%
sp100-025-1 75.6% 75.0% 48.6% 40.9% imas284 5007 2273 54.7% 6928
sp100-050-1 89.8% 90.9% 83.7% 82.0% ieilD76 35.2% 99.0% 99.8% 99.6%
sp100-075-1 97.3% 96.6% 91.4% 91.4%

Table 5
Comparison between the four methods. Each entry is either the CPU time, in
seconds, taken to solve the instance or, if greater than two hours, the remaining gap
(6.1) after two hours. If the remaining gap cannot be computed due to the lack of a
feasible solution, the lower bound, in brackets, is shown instead.

Name v0 rb dc dc+rb
minlplib
waterz 75.1% 58.7% 73.2% 85.9%
ravem 16 23 74 66
ravempb 18 24 55 42
enpro56 39 26 156 76
enpro56pb 55 24 301 81
csched2 2.2% 3.9% 1.2% 3.6%
water4 55.2% 52.0% 44.2% 51.3%
enpro48 58 37 204 114
enpro48pb 49 41 201 126
space25a (116.4) (107.2) (107.3) (100.1)
contvar 91.1% 90.2% 91.1% 89.9%
space25 (98.0) (96.7) (97.1) (177.6)
lop97icx 93.0% 79.6% 93.9% 39.3%
du-opt5 73 170 251 159
du-opt 87 270 136 262
waste (255.4) (439.3) (273.7) (281.6)
lop97ic (2543.5) (2532.8) (2539.4) (2556.7)
qapw (0) (20482) (0) (20482)
MacMINLP
trimlon4 23 4 133 24
trimlon5 48.1% 46 35.6% 142
trimlon6 (16.17) (18.69) (16.18) (18.52)
trimlon7 88.1% 60.5% 89.8% 74.3%
space-25-r (74.2) (68.6) (75.0) (69.6)
space-25 (98.4) (89.6) (96.3) (91.9)
trimlon12 (16.1) (18.6) (16.1) (18.5)
space-960-i (6.5e+6) (6.5e+6) (6.5e+6) (6.5e+6)
misc.
airCond 187 0.9% 876 1471


Table 6
Comparison of time spent in the separation of disjunctive cuts (tsep ) and in relia-
bility branching (tbr ). Also reported is the number of nodes (“nodes”) and of disjunctive
cuts generated (“cuts”).

rb dc dc+rb
Name tbr nodes tsep cuts tsep cuts tbr nodes
boxQP
sp040-060-1 8 340k 3104 17939 2286 11781 56 3k
sp040-070-1 10 277k 1510 5494 466 2042 56 277k
sp040-080-1 14 174k 4030 15289 3813 14831 95 4k
sp040-090-1 58 136k 4142 13733 3936 14604 115 3k
sp040-100-1 17 178k 4317 11415 3882 11959 126 2k
sp050-050-1 12 289k 4603 10842 4412 11114 127 6k
sp070-025-1 38 308k 1324 2215 383 921 28 308k
sp070-050-1 38 80k 5027 3823 4616 3622 477 80k
sp070-075-1 81 26k 6125 1220 5951 1089 46 26k
sp080-025-1 27 222k 2873 2818 716 1154 28 222k
sp080-050-1 66 35k 5888 1864 5561 1740 85 35k
sp080-075-1 119 4k 6449 720 6464 615 32 4k
sp090-025-1 32 170k 4672 3540 4386 3044 500 170k
sp090-050-1 86 26k 6064 1023 6335 838 37 26k
sp090-075-1 194 4k 6468 467 6862 250 8 4k
sp100-025-1 54 80k 5286 2990 4704 2372 659 80k
sp100-050-1 140 4k 6376 693 6363 733 34 4k
sp100-075-1 272 35k 6852 312 6835 302 3 35k
globallib
lnts100 6084 4k 3234 16 1195 6 4647 3k
camsh200 6242 10k 1772 2303 2123 2645 4192 9k
qp2 214 62k 2662 3505 1351 1662 444 41k
qp3 1298 803k 43 737 47 644 3272 484k
qp1 189 63k 3160 3525 1609 1736 240 45k
elec50 5677 45k 5494 45 5159 68 768 45k
camsh400 3494 33k 3433 818 5242 1263 560 53k
arki0002 6834 53k 3571 237 1641 181 5277 53k
arki0019 5761 53k 1846 39 1543 36 3812 53k
arki0015 2442 2k 2141 215 1442 106 2412 2k
infeas1 1306 2k 1495 45 1273 131 1397 2k
lnts400 457 2k 4767 1 4724 0 373 2k
camsh800 5001 23k 3671 188 2208 100 3172 23k


Table 7
(Continued) Comparison of time spent in the separation of disjunctive cuts (tsep )
and in reliability branching (tbr ). Also reported is the number of nodes (“nodes”) and
of disjunctive cuts generated (“cuts”).

rb dc dc+rb
Name tbr nodes tsep cuts tsep cuts tbr nodes
nConv
JoseSEN5c 5166 1061k 4280 39 341 5 5315 1061k
MIQQP
ivalues 2816 10k 4223 700 4589 733 1571 10k
imisc07 6005 2k 6835 2621 5475 2131 1404 2k
ibc1 310 2k 6166 38 6368 167 46 2k
iswath2 827 2k 6567 31 6526 52 5 2k
imas284 443 62k 6769 1408 4474 700 381 72k
ieilD76 2934 9k 6994 75 6869 56 804 9k
minlplib
contvar 4284 2k 11 19 13 21 4706 3k
space25 272 1051k 2133 706 178 309 311 1100k
lop97icx 6540 14k 1142 353 4226 2603 2649 3k
waste 5204 13k 7090 715 6934 253 82 13k
lop97ic 3178 11k 4556 25 6734 147 3547 11k
qapw 6400 61k 104 0 34 0 6367 60k
MacMINLP
trimlon6 8500 62k 58 50 558 2649 5944 63k
trimlon12 8432 62k 58 50 562 2649 5941 62k
space-960-i 5859 62k 2127 20 1624 0 4971 62k
misc.
airCond 7123 112k 293 1736 103 1526 39 541k



PART III:
Nonlinear Programming

SEQUENTIAL QUADRATIC PROGRAMMING METHODS
PHILIP E. GILL∗ AND ELIZABETH WONG∗

Abstract. In his 1963 PhD thesis, Wilson proposed the first sequential quadratic
programming (SQP) method for the solution of constrained nonlinear optimization prob-
lems. In the intervening 48 years, SQP methods have evolved into a powerful and effec-
tive class of methods for a wide range of optimization problems. We review some of the
most prominent developments in SQP methods since 1963 and discuss the relationship
of SQP methods to other popular methods, including augmented Lagrangian methods
and interior methods.
Given the scope and utility of nonlinear optimization, it is not surprising that SQP
methods are still a subject of active research. Recent developments in methods for mixed-
integer nonlinear programming (MINLP) and the minimization of functions subject to
differential equation constraints have led to a heightened interest in methods that may be
“warm started” from a good approximate solution. We discuss the role of SQP methods
in these contexts.

Key words. Large-scale nonlinear programming, SQP methods, nonconvex pro-


gramming, quadratic programming, KKT systems.

AMS(MOS) subject classifications. 49J20, 49J15, 49M37, 49D37, 65F05,


65K05, 90C30.

1. Introduction. This paper concerns the formulation of methods for


solving the smooth nonlinear programs that arise as subproblems within
a method for mixed-integer nonlinear programming (MINLP). In general,
this subproblem has both linear and nonlinear constraints, and may be
written in the form

\[
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\; f(x)
  \quad \text{subject to} \quad
  \ell \;\le\; \begin{pmatrix} x \\ Ax \\ c(x) \end{pmatrix} \;\le\; u, \tag{1.1}
\]

where f(x) is a linear or nonlinear objective function, c(x) is a vector of m
nonlinear constraint functions c_i(x), A is a matrix, and ℓ and u are vectors
of lower and upper bounds. Throughout, we assume that the number of
variables is large, and that A and the derivatives of f and c are sparse.
The constraints involving the matrix A and functions ci (x) will be called
the general constraints; the remaining constraints will be called bounds.
We assume that the nonlinear functions are smooth and that their first
and second derivatives are available. An equality constraint corresponds to

∗ Department of Mathematics, University of California, San Diego, La Jolla, CA

92093-0112 ([email protected], [email protected]). This material is based upon work


supported by the National Science Foundation under Grant No. DMS-0915220, and the
Department of Energy under grant DE-SC0002349.


setting ℓ_i = u_i. Similarly, a special “infinite” value for ℓ_i or u_i is used to
indicate the absence of one of the bounds.
The nonlinear programs that arise in the context of MINLP have sev-
eral important features. First, the problem is one of a sequence of many
related NLP problems with the same objective but with additional linear
or nonlinear constraints. For efficiency, it is important that information
from the solution of one problem be used to “warm start” the solution
of the next. Second, many MINLP methods generate infeasible subprob-
lems as part of the solution process (e.g., in branch and bound methods).
This implies that infeasible constraints will occur for a significant subset of
problems to be solved, which is in contrast to the situation in conventional
optimization, where constraint infeasibility is considered to be a relatively
unusual event, caused by an error in the constraint formulation. In mixed-
integer linear programming, the phase 1 simplex method provides a reliable
“certificate of infeasibility”, i.e., a definitive statement on whether or not
a feasible point exists. However, the question of the solvability of a set
of nonconvex inequalities is NP-hard. Many optimization methods replace
the solvability problem by a minimization problem in which the norm of
the constraint residual is minimized. If the constraints are not convex,
then this minimization problem is likely to have infeasible local minimizers
regardless of the existence of a feasible point. It follows that the minimiza-
tion method may terminate at a “phantom” infeasible point even though
the constraints have a feasible point (see Section 1.3).
Sequential quadratic programming methods and interior methods are
two alternative approaches to handling the inequality constraints in (1.1).
Sequential quadratic programming (SQP) methods find an approximate so-
lution of a sequence of quadratic programming (QP) subproblems in which
a quadratic model of the objective function is minimized subject to the
linearized constraints. Interior methods approximate a continuous path
that passes through a solution of (1.1). In the simplest case, the path is
parameterized by a positive scalar parameter μ that may be interpreted as
a perturbation for the optimality conditions for the problem (1.1). Both
interior methods and SQP methods have an inner/outer iteration structure,
with the work for an inner iteration being dominated by the cost of solv-
ing a large sparse system of symmetric indefinite linear equations. In the
case of SQP methods, these equations involve a subset of the variables and
constraints; for interior methods, the equations involve all the constraints
and variables.
SQP methods provide a relatively reliable “certificate of infeasibility”
and they have the potential of being able to capitalize on a good initial
starting point. Sophisticated matrix factorization updating techniques are
used to exploit the fact that the linear equations change by only a single
row and column at each inner iteration. These updating techniques are
often customized for the particular QP method being used and have the
benefit of providing a uniform treatment of ill-conditioning and singularity.


On the negative side, it is difficult to implement SQP methods so that


exact second derivatives can be used efficiently and reliably. Some of these
difficulties stem from the theoretical properties of the quadratic program-
ming subproblem, which can be nonconvex when second derivatives are
used. Nonconvex quadratic programming is NP-hard—even for the calcu-
lation of a local minimizer [43, 72]. The complexity of the QP subproblem
has been a major impediment to the formulation of second-derivative SQP
methods (although methods based on indefinite QP have been proposed
[63, 66]). Over the years, algorithm developers have avoided this difficulty
by eschewing second derivatives and by solving a convex QP subproblem de-
fined with a positive semidefinite quasi-Newton approximate Hessian (see,
e.g., [83]). There are other difficulties associated with conventional SQP
methods that are not specifically related to the use of second derivatives.
An SQP algorithm is often tailored to a particular updating technique, e.g.,
the matrix factors of the Jacobian in the outer iteration can be chosen to
match those of the method for the QP subproblem. Any reliance on cus-
tomized linear algebra software makes it hard to “modernize” a method
to reflect new developments in software technology (e.g., in languages that
exploit new advances in computer hardware such as multicore processors
or GPU-based architectures). Another difficulty is that active-set methods
may require a substantial number of QP iterations when the outer iterates
are far from the solution. The use of a QP subproblem is motivated by the
assumption that the QP objective and constraints provide good “models” of
the objective and constraints of the NLP (see Section 2). This should make
it unnecessary (and inefficient) to solve the QP to high accuracy during the
preliminary iterations. Unfortunately, the simple expedient of limiting the
number of inner iterations may have a detrimental effect upon reliability.
An approximate QP solution may not predict a sufficient improvement in
a merit function (see Section 3.2). Moreover, some of the QP multipli-
ers will have the wrong sign if an active-set method is terminated before
a solution is found. This may cause difficulties if the QP multipliers are
used to estimate the multipliers for the nonlinear problem. These issues
would largely disappear if a primal-dual interior method were to be used
to solve the QP subproblem. These methods have the benefit of providing
a sequence of feasible (i.e., correctly signed) dual iterates. Nevertheless,
QP solvers based on conventional interior methods have had limited suc-
cess within SQP methods because they are difficult to “warm start” from
a near-optimal point (see the discussion below). This makes it difficult to
capitalize on the property that, as the outer iterates converge, the solution
of one QP subproblem is a very good estimate of the solution of the next.
Broadly speaking, the advantages and disadvantages of SQP methods
and interior methods complement each other. Interior methods are most
efficient when implemented with exact second derivatives. Moreover, they
can converge in few inner iterations—even for very large problems. The in-
ner iterates are the iterates of Newton’s method for finding an approximate


solution of the perturbed optimality conditions for a given μ. As the di-


mension and zero/nonzero structure of the Newton equations remains fixed,
these Newton equations may be solved efficiently using either iterative or
direct methods available in the form of advanced “off-the-shelf” linear al-
gebra software. In particular, any new software for multicore and parallel
architectures is immediately applicable. Moreover, the perturbation pa-
rameter μ plays an auxiliary role as an implicit regularization parameter of
the linear equations. This implicit regularization plays a crucial role in the
robustness of interior methods on ill-conditioned and ill-posed problems.
On the negative side, although interior methods are very effective for
solving “one-off” problems, they are difficult to adapt to solving a sequence
of related NLP problems. This difficulty may be explained in terms of the
“path-following” interpretation of interior methods. In the neighborhood
of an optimal solution, a step along the path x(μ) of perturbed solutions
is well-defined, whereas a step onto the path from a neighboring point
will be extremely sensitive to perturbations in the problem functions (and
hence difficult to compute). Another difficulty with conventional interior
methods is that a substantial number of iterations may be needed when
the constraints are infeasible.

1.1. Notation. Given vectors a and b with the same dimension, the
vector with ith component ai bi is denoted by a · b. The vectors e and
ej denote, respectively, the column vector of ones and the jth column of
the identity matrix I. The dimensions of e, e_j and I are defined by the
context. Given vectors x and y of dimension nx and ny , the (nx + ny )-
vector of elements of x augmented by elements of y is denoted by (x, y).
The ith component of a vector labeled with a subscript will be denoted
by ( · )i , e.g., (vN )i is the ith component of the vector vN . Similarly, the
subvector of components with indices in the index set S is denoted by ( · )S ,
e.g., (vN )S is the vector with components (vN )i for i ∈ S. The vector with
components max{−xi , 0} (i.e., the magnitude of the negative part of x) is
denoted by [x]⁻. The vector p-norm and its subordinate matrix norm are
denoted by ‖ · ‖_p.

1.2. Background. To simplify the notation, the problem format of


(1.1) is modified by introducing slack variables and replacing each general
constraint of the form ℓ_i ≤ φ_i(x) ≤ u_i by the equality constraint φ_i(x) −
s_i = 0 and the range constraint ℓ_i ≤ s_i ≤ u_i. Without loss of generality, we
assume only nonnegativity constraints and use c(x) to denote the vector
of combined linear and nonlinear equality constraint functions. (However,
we emphasize that the exploitation of the properties of linear constraints
is an important issue in the solution of MINLP problems.) The problem to
be solved is then

\[
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\; f(x)
  \quad \text{subject to} \quad c(x) = 0, \;\; x \ge 0, \tag{1.2}
\]


where f and the m components of the constraint vector c are assumed to


be twice continuously differentiable for all x ∈ Rn . Any slack variables are
included in the definition of x and c.
Let g(x) denote ∇f (x), the gradient of f evaluated at x. Similarly,
let J(x) denote the m × n constraint Jacobian with rows formed from the
constraint gradients ∇ci (x). It is assumed that J(x) has rank m for all x
(see the discussion of the use of slack variables in Section 1). Through-
out the discussion, the component πi of the m-vector π will denote the
dual variable associated with the constraint ci (x) = 0 or its linearization.
Similarly, zj denotes the dual variable associated with the bound xj ≥ 0.
A constraint is active at x if it is satisfied with equality. For any feasible
x, i.e., for any x such that c(x) = 0 and x ≥ 0, all m equality constraints
ci (x) = 0 are necessarily active. The indices associated with the active
nonnegativity constraints comprise the active set, denoted by A(x), i.e.,
A(x) = { i : xi = 0 }. A nonnegativity constraint that is not in the active
set is said to be inactive. The inactive set contains the indices of the
inactive constraints, i.e., the so-called “free” variables I(x) = { i : xi > 0 }.
Under certain constraint regularity assumptions, an optimal solution of
(1.2) must satisfy conditions that may be written in terms of the derivatives
of the Lagrangian function L(x, π, z) = f (x) − π Tc(x) − z Tx. The triple
(x∗ , π ∗ , z ∗ ) is said to be a first-order KKT point for problem (1.2) if it
satisfies the KKT conditions
\[
\begin{aligned}
  &c(x^*) = 0, \qquad x^* \ge 0,\\
  &g(x^*) - J(x^*)^T\pi^* - z^* = 0,\\
  &x^* \cdot z^* = 0, \qquad z^* \ge 0.
\end{aligned} \tag{1.3}
\]

The property of strict complementarity holds if the vectors x∗ and z ∗ sat-


isfy x∗ · z ∗ = 0 with x∗ + z ∗ > 0. In keeping with linear programming
terminology, we refer to the dual variables π and z as the π-values and
reduced costs, respectively. The vector-triple (x, π, z) is said to constitute
a primal-dual estimate of the quantities (x∗ , π∗ , z ∗ ) satisfying (1.3).
The purpose of the constraint regularity assumption is to guarantee
that a linearization of the constraints describes the nonlinear constraints
with sufficient accuracy that the KKT conditions of (1.3) are necessary
for local optimality. One such regularity assumption is the Mangasarian-
Fromovitz constraint qualification [133, 143], which requires that J(x∗ ) has
rank m, and that there exists a vector p such that J(x∗ )p = 0 with pi > 0 for
all i ∈ A(x∗ ). Another common, but slightly more restrictive, assumption
is the linear independence constraint qualification, which requires that the
matrix of free columns of J(x∗ ) has full row rank.
Let H(x, π) denote the Hessian of L(x, π, z) with respect to x, i.e.,
\[
  H(x,\pi) = \nabla^2_{xx} L(x,\pi,z) = \nabla^2 f(x) - \sum_{i=1}^{m} \pi_i \nabla^2 c_i(x).
\]


Under the linear independence constraint qualification, the second-order


necessary optimality conditions require that the first-order conditions (1.3)
hold with the additional condition that pTH(x∗ , π∗ )p ≥ 0 for all p such
that J(x∗ )p = 0, and pi = 0 for every i ∈ A(x∗ ). See, e.g., Nocedal and
Wright [143, Chapter 12] for more discussion of constraint assumptions and
optimality conditions.
For a feasible point x, we will denote by J̄(x) the matrix comprising
columns of J(x) corresponding to indices in I(x). A point x at which
(g(x))_I ∈ range(J̄(x)^T) and the linear independence constraint qualifica-
tion does not hold is said to be degenerate. For example, if x is a degenerate
vertex, then more than n − m bounds must be active and J̄(x) has more
rows than columns. The Mangasarian-Fromovitz constraint qualification
may or may not hold at a degenerate point. Practical NLP problems with
degenerate points are very common and it is crucial that an algorithm
be able to handle J̄(x) with dependent rows. Throughout our discussion
of the effects of degeneracy in SQP methods, it will be assumed that the
Mangasarian-Fromovitz regularity assumption holds.

1.3. Infeasible problems. In the normal situation, when solving a


“one-off” nonlinear program of the form (1.2), one may expect that the
problem is feasible—i.e., that there exist points that satisfy the constraints.
This is because an infeasible problem is generally the result of a unintended
formulation or coding error. However, there are situations when the detec-
tion of infeasibility is of particular interest. An example is mixed integer
nonlinear programming, where the occurrence of infeasibility is likely as
part of a branch and bound fathoming criteria. In this situation, the rapid
and reliable detection of infeasibility is a crucial requirement of an algo-
rithm. One way of handling this situation is to define a related regularized
problem that always has feasible points. This is done by formulating an
alternative problem that is always well posed, yet has (x∗ , π ∗ , z ∗ ) as a so-
lution when (x∗ , π ∗ , z ∗ ) exists.
As the question of the existence of a solution revolves around whether
or not the constraints admit a feasible point, we can always relax the
constraints sufficiently to allow the constraints to be feasible. It is then just
a question of solving the relaxed problem while simultaneously reducing the
amount of relaxation. This process can be automated by introducing elastic
variables u and v in (1.2), and formulating the elastic problem

\[
  \underset{x\in\mathbb{R}^n;\; u,v\in\mathbb{R}^m}{\text{minimize}}\;\;
  f(x) + \rho e^T u + \rho e^T v
  \quad \text{subject to} \quad
  c(x) - u + v = 0, \;\; x \ge 0, \;\; u \ge 0, \;\; v \ge 0, \tag{1.4}
\]

where ρ is a “penalty” on the elasticity, often referred to as the elastic


weight, and e is the vector of ones. The smooth elastic problem is equivalent
to the nonsmooth bound-constrained problem


\[
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\;
  f(x) + \rho \sum_{i=1}^{m} |c_i(x)|
  \quad \text{subject to} \quad x \ge 0, \tag{1.5}
\]

i.e., the elastic problem implicitly enforces a penalty on the sum of the in-
feasibilities of the constraints c(x) = 0. If the original problem is infeasible,
then, for large values of ρ, there is a minimizer of the elastic problem that
is an O(1/ρ) approximation to a minimizer of the sum of the constraint
infeasibilities. This minimizer can be useful in identifying which of the
constraints are causing the infeasibility (see Chinneck [34, 35]).
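As a hedged illustration of the elastic reformulation (1.4) (our own sketch, not part of the text; the tiny instance and all names are invented, and SciPy's SLSQP merely stands in for a production NLP solver):

```python
import numpy as np
from scipy.optimize import minimize

n, m, rho = 2, 1, 10.0
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2   # objective of (1.2)
c = lambda x: np.array([x[0] + x[1] - 1.0])       # equality constraints

def obj(w):                        # w packs (x, u, v)
    x, u, v = w[:n], w[n:n+m], w[n+m:]
    return f(x) + rho * np.sum(u) + rho * np.sum(v)

cons = {'type': 'eq',
        'fun': lambda w: c(w[:n]) - w[n:n+m] + w[n+m:]}  # c(x) - u + v = 0
bnds = [(0, None)] * (n + 2*m)     # x >= 0, u >= 0, v >= 0
res = minimize(obj, np.zeros(n + 2*m), bounds=bnds,
               constraints=cons, method='SLSQP')
print(res.x[:n], res.x[n:])        # here u = v = 0, so x also solves (1.2)
```

Because the multipliers of this instance are small relative to ρ, the elastic variables vanish at the solution, which is exactly the exactness property discussed next.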
The elastic problem is called an exact regularization of (1.2) because
if ρ is sufficiently large and (x∗ , π ∗ , z ∗ ) is optimal for (1.2), then it is also
optimal for the elastic problem (1.4) with u = v = 0. See Fletcher [64,
Section 12.3] for a discussion of these issues. The first-order necessary
conditions for (x∗ , u∗ , v ∗ , π ∗ , z ∗ ) to be an optimal solution for the elastic
problem (1.4) are

\[
\begin{alignedat}{2}
  &c(x^*) - u^* + v^* = 0, \quad u^* \ge 0, \quad v^* \ge 0, &\qquad&(1.6a)\\
  &g(x^*) - J(x^*)^T\pi^* - z^* = 0, &&(1.6b)\\
  &x^* \cdot z^* = 0, \quad z^* \ge 0, \quad x^* \ge 0, &&(1.6c)\\
  &u^* \cdot (\rho e + \pi^*) = 0, \quad v^* \cdot (\rho e - \pi^*) = 0, \quad -\rho e \le \pi^* \le \rho e. &&(1.6d)
\end{alignedat}
\]

To see that the elastic problem (1.4) defines an exact regularization, note
that if ‖π*‖_∞ < ρ, then a solution (x*, π*, z*) of (1.3) is also a solution of
(1.6) with u∗ = v ∗ = 0. Conditions (1.6) are always necessary for a point
(x∗ , u∗ , v ∗ ) to be an optimal solution for (1.4) because the Mangasarian-
Fromovitz constraint qualification is always satisfied.
There are two caveats associated with solving the regularized problem.
First, if a solution of the original problem exists, it is generally only a local
solution of the elastic problem. The elastic problem may be unbounded
below, or may have local solutions that are not solutions of the original
problem. For example, consider the one-dimensional problem
1 3
minimize x + 1 subject to 3x − 32 x2 + 2x = 0, x ≥ 0, (1.7)
x∈R

which has a unique solution (x∗ , π∗ ) = (0, 12 ). For all ρ > 12 , the penalty
function (1.5) has a local minimizer x̄ = 2 − O(1/ρ) such that c(x̄) = 0.
This example shows that regularization can introduce “phantom” solutions
that do not appear in the original problem.
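A quick numerical check of this phantom minimizer (our own sketch; the choice ρ = 10 and the search window are arbitrary):

```python
import numpy as np

c = lambda x: x**3 / 3.0 - 1.5 * x**2 + 2.0 * x    # constraint of (1.7)
P = lambda x, rho=10.0: x + 1.0 + rho * abs(c(x))  # penalty function (1.5)

xs = np.linspace(1.0, 3.0, 200001)                 # bracket the phantom point
i = np.argmin(P(xs))
print(xs[i], c(xs[i]))   # approx. 1.89 with c != 0: a phantom local minimizer
```

The global minimizer of P is still the solution x* = 0, but a descent method started near x = 2 will converge to the phantom point.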
The second caveat is that, in general, the precise definition of the
elastic problem is not known in advance because an appropriate value of
the parameter ρ depends on the optimal multipliers π ∗ . This implies that,
in practice, any estimate of ρ may need to be increased if the minimization
appears to be converging to a regularized solution with u* + v* ≠ 0. If the



Fig. 1. This figure depicts the objective function and penalty function (1.5)
for the one-dimensional problem (1.7). The constrained problem has a unique
solution (x*, π*) = (0, ½). However, for all ρ > ½, the penalty function has a
local minimizer x̄ = 2 − O(1/ρ) with c(x̄) ≠ 0.

original problem is infeasible, then u∗ + v ∗ is nonzero for all solutions and


ρ must go to infinity if the elastic problem is to provide an estimate of the
minimum sum of infeasibilities. Gill, Murray, and Saunders [83] apply an
SQP method to a sequence of regularized problems in which ρ is increased
geometrically. The sequence is terminated when a solution is found with
u∗ + v ∗ = 0, or ρ reaches a preassigned upper limit. However, in the
context of MINLP, where constraint infeasibility is typical, it is crucial that
infeasibility be detected rapidly, which implies that the value of ρ may need
to be increased before an accurate solution of the regularized problem is
found. (For further information on choosing the value of ρ in this context,
see, e.g., Exler and Schittkowski [57], and Byrd, Nocedal, and Waltz [28].)
The objective function in the elastic problem (1.5) is the ℓ₁ penalty
function
\[
  P_1(x;\rho) = f(x) + \rho\|c(x)\|_1 = f(x) + \rho \sum_{i=1}^{m} |c_i(x)|. \tag{1.8}
\]

Regularization using an ℓ₁ penalty function is (by far) the most common
form of constraint regularization for problems with inequality constraints.
However, other exact regularizations can be defined based on using alterna-
tive norms to measure the constraint violations. If the ℓ∞ penalty function
\[
  P_\infty(x;\rho) = f(x) + \rho\|c(x)\|_\infty
  = f(x) + \rho \max_{1\le i\le m} |c_i(x)| \tag{1.9}
\]


is minimized subject to the constraints x ≥ 0, an equivalent smooth con-


strained form of the regularized problem is

\[
  \underset{x\in\mathbb{R}^n;\;\theta\in\mathbb{R}}{\text{minimize}}\;\;
  f(x) + \rho\theta
  \quad \text{subject to} \quad
  -\theta e \le c(x) \le \theta e, \;\; x \ge 0, \;\; \theta \ge 0, \tag{1.10}
\]

where θ is a temporary nonnegative auxiliary variable. This regularization


is exact if ρ > ‖π*‖₁. The ℓ₂ penalty function f(x) + ρ‖c(x)‖₂ also defines
an exact regularization, although the use of the two-norm in this form
is less common because there is no equivalent smooth constrained form of
the problem. (For more on the properties of exact regularization for convex
optimization, see Friedlander and Tseng [76].)
The ℓ₂ penalty function is one exception to the rule that a constraint
regularization for (1.2) can be written as either a perturbed nonlinearly
constrained problem or an equivalent bound-constrained problem, where
both formulations depend on the optimal multipliers π∗ . However, for
some forms of regularization, the dependence on π ∗ can be explicit (and
hence harder to apply). Consider the bound-constrained problem

\[
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\;
  f(x) - c(x)^T\pi_E + \tfrac{1}{2}\rho\|c(x)\|_2^2
  \quad \text{subject to} \quad x \ge 0, \tag{1.11}
\]

where πE is an m-vector, and ρ is a nonnegative scalar penalty parameter.


Problem (1.11) is used in a number of methods for general nonlinear pro-
gramming problems based on sequential bound-constrained minimization,
see, e.g., [39, 40, 41, 73, 75]. The objective function is the well-known
Hestenes-Powell augmented Lagrangian, which was first proposed for se-
quential unconstrained optimization (see, e.g., Hestenes [122], Powell [151],
Rockafellar [158], Tapia [166], and Bertsekas [6]). The regularization is ex-
act for πE = π ∗ and all ρ > ρ̄, where ρ̄ depends on the spectral radius of
the Hessian of the Lagrangian (and hence, implicitly, on the magnitude of
π∗ ). Clearly, this function has a more explicit dependence on π ∗ . If x∗ is
a solution of (1.11) for πE ≈ π ∗ , then x∗ satisfies the perturbed nonlinearly
constrained problem

\[
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\; f(x)
  \quad \text{subject to} \quad
  c(x) = \mu(\pi_E - \pi^*), \;\; x \ge 0, \tag{1.12}
\]

where π∗ = πE − ρc(x∗ ) and μ is the inverse penalty parameter 1/ρ.
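As a direct transcription of the objective in (1.11) (a minimal sketch of ours; the function and argument names are assumptions):

```python
import numpy as np

def augmented_lagrangian(x, piE, rho, f, c):
    """Hestenes-Powell augmented Lagrangian of (1.11); f and c are callables."""
    cx = c(x)
    return f(x) - cx @ piE + 0.5 * rho * (cx @ cx)  # -c^T piE + (rho/2)||c||_2^2
```

Minimizing this function subject only to x ≥ 0 for a sequence of multiplier estimates πE is the basic step of the bound-constrained methods cited above.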


In later sections we consider the close connection between problem
regularization and the formulation of SQP methods. In particular, we show
that many SQP formulations can be considered in terms of a “plain” SQP
method being applied to a related regularized nonlinear program.
Regularization is a very broad idea that can take many forms. For
example, a problem formulation may be regularized to ensure that a so-
lution exists or, if there are many solutions, to ensure that a particular
favorable solution is found (such as a least-norm solution). Other forms
of regularization are specifically designed to make sure that an algorithm


may be applied with minimal risk of numerical breakdown. For example,


adding slack variables to all constraints, including equalities, guarantees
that the Jacobian has full row rank. Such regularization schemes have a
beneficial effect on whatever method is used. Some forms of regularization
are associated with a specific technique (e.g., trust-region methods impose
an implicit regularization on a given subproblem—see Section 3.1.2).
However, although regularization is useful (and sometimes vital) there
is usually some price to be paid for its use. In many cases, regularization
leads to additional computational overhead or algorithmic complexity. In
some cases, regularization will give an approximate rather than exact so-
lution of the problem. More seriously, some forms of regularization lead to
the possibility of “phantom” solutions that are not solutions of the original
problem.
2. Local properties of SQP methods. In many introductory texts,
“the” SQP method is defined as one in which the quadratic programming
subproblem involves the minimization of a quadratic model of the objec-
tive function subject to a linearization of the constraints. This description,
which broadly defines the original SQP method of Wilson [172] for con-
vex programming, is somewhat over-simplistic for modern SQP methods.
Nevertheless, we start by defining a “vanilla” or “plain” SQP method in
these terms.
The basic structure of an SQP method involves inner and outer iter-
ations. Associated with the kth outer iteration is an approximate solution
xk , together with dual variables πk and zk for the nonlinear constraints
and bounds. Given (xk , πk , zk ), new primal-dual estimates are found by
solving the quadratic programming subproblem
\[
\begin{aligned}
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\;
  & f(x_k) + g(x_k)^T(x - x_k) + \tfrac{1}{2}(x - x_k)^T H(x_k,\pi_k)(x - x_k)\\
  \text{subject to}\;\; & c(x_k) + J(x_k)(x - x_k) = 0, \quad x \ge 0.
\end{aligned} \tag{2.1}
\]
In our plain SQP method, this subproblem is solved by iteration using
a quadratic programming method. New estimates πk+1 and zk+1 of the
Lagrange multipliers are the optimal multipliers for the subproblem (2.1).
The iterations of the QP method constitute the SQP inner iterations.
The form of the plain QP subproblem (2.1) is motivated by a certain
fixed-point property that requires the SQP method to terminate in only
one (outer) iteration when started at an optimal solution. In particular,
the plain QP subproblem is defined in such a way that if (xk , πk , zk ) =
(x∗ , π ∗ , z ∗ ), then the NLP primal-dual solution (x∗ , π∗ , z ∗ ) satisfies the QP
optimality conditions for (2.1) and thereby constitutes a solution of the
subproblem (see Section 2.2 below for a statement of the QP optimality
conditions). Under certain assumptions on the problem derivatives, this
fixed-point property implies that (xk , πk , zk ) → (x∗ , π ∗ , z ∗ ) when the initial
point (x0 , π0 , z0 ) is sufficiently close to (x∗ , π∗ , z ∗ ). These assumptions are
discussed further below.


Given our earlier statement that SQP methods “minimize a quadratic


model of the objective function”, readers unfamiliar with SQP methods
might wonder why the quadratic term of the quadratic objective of (2.1)
involves the Hessian of the Lagrangian function and not the Hessian of
the objective function. However, at (xk , πk , zk ) = (x∗ , π ∗ , z ∗ ), the objec-
tive of the subproblem defines the second-order local variation of f on the
constraint surface c(x) = 0. Suppose that x(α) is a twice-differentiable
feasible path starting at xk , parameterized by a nonnegative scalar α; i.e.,
x(0) = x_k and c(x(α)) = 0. An inspection of the derivatives f′(x(α)) and
f″(x(α)) at α = 0 indicates that the function
\[
  \hat{f}_k(x) = f(x_k) + g(x_k)^T(x - x_k)
  + \tfrac{1}{2}(x - x_k)^T H(x_k,\pi_k)(x - x_k) \tag{2.2}
\]

defines a second-order approximation of f for all x lying on x(α), i.e.,


f̂_k(x) may be regarded as a local quadratic model of f that incorporates the
curvature of the constraints c(x) = 0.
This constrained variation of the objective is equivalent to the uncon-
strained variation of a function known as the modified Lagrangian, which
is given by

\[
  L(x; x_k, \pi_k) = f(x) - \pi_k^T\big(c(x) - \hat{c}_k(x)\big), \tag{2.3}
\]

where ĉ_k(x) denotes the vector of linearized constraint functions ĉ_k(x) =
c(x_k) + J(x_k)(x − x_k), and c(x) − ĉ_k(x) is known as the departure from
linearity (see Robinson [156] and Van der Hoek [169]). The first and second
derivatives of the modified Lagrangian are given by
\[
\begin{aligned}
  \nabla L(x; x_k, \pi_k) &= g(x) - \big(J(x) - J(x_k)\big)^T \pi_k,\\
  \nabla^2 L(x; x_k, \pi_k) &= \nabla^2 f(x) - \sum_{i=1}^{m} (\pi_k)_i \nabla^2 c_i(x).
\end{aligned}
\]

The Hessian of the modified Lagrangian is independent of xk and coincides


with the Hessian (with respect to x) of the conventional Lagrangian. Also,
L(x; xk , πk )|x=xk = f (xk ), and ∇L(x; xk , πk )|x=xk = g(xk ), which implies
that f̂_k(x) defines a local quadratic model of L(x; x_k, π_k) at x = x_k.
Throughout the remaining discussion, gk , ck , Jk and Hk denote g(x),
c(x), J(x) and H(x, π) evaluated at xk and πk . With this notation, the
quadratic objective is f̂_k(x) = f_k + g_k^T(x − x_k) + ½(x − x_k)^T H_k(x − x_k),
with gradient ĝ_k(x) = g_k + H_k(x − x_k). A “hat” will be used to denote
quantities associated with the QP subproblem.
2.1. Equality constraints. We motivate some of the later discussion
by reviewing the connection between SQP methods and Newton’s method
for solving a system of nonlinear equations. We begin by omitting the


nonnegativity constraints and considering the equality constrained problem

\[
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\; f(x)
  \quad \text{subject to} \quad c(x) = 0. \tag{2.4}
\]

In the case of unconstrained optimization, a standard approach to the


formulation of algorithms is to use the first-order optimality conditions to
define a system of nonlinear equations ∇f (x) = 0 whose solution is a first-
order optimal point x∗ . In the constrained case, the relevant nonlinear
equations involve the gradient of the Lagrangian function L(x, π), which
incorporates the first-order feasibility and optimality conditions satisfied
by x∗ and π∗ . If the rows of the constraint Jacobian J at x∗ are linearly
independent, a primal-dual solution represented by the n+m vector (x∗ , π∗ )
must satisfy the n + m nonlinear equations F(x, π) = 0, where
\[
  F(x,\pi) \equiv \nabla L(x,\pi)
  = \begin{pmatrix} g(x) - J(x)^T\pi \\ -c(x) \end{pmatrix}. \tag{2.5}
\]

These equations may be solved efficiently using Newton’s method.


2.1.1. Newton’s method and SQP. Consider one iteration of New-
ton’s method, starting at estimates xk and πk of the primal and dual vari-
ables. If vk denotes the iterate defined by (n + m)-vector (xk , πk ), then the
next iterate vk+1 is given by

\[
  v_{k+1} = v_k + \Delta v_k, \quad \text{where} \quad F'(v_k)\,\Delta v_k = -F(v_k).
\]

Differentiating (2.5) with respect to x and π gives F′(v) ≡ F′(x, π) as
\[
  F'(x,\pi) = \begin{pmatrix} H(x,\pi) & -J(x)^T \\ -J(x) & 0 \end{pmatrix},
\]

which implies that the Newton equations may be written as
\[
  \begin{pmatrix} H_k & -J_k^T \\ -J_k & 0 \end{pmatrix}
  \begin{pmatrix} p_k \\ q_k \end{pmatrix}
  = -\begin{pmatrix} g_k - J_k^T\pi_k \\ -c_k \end{pmatrix},
\]

where pk and qk denote the Newton steps for the primal and dual variables.
If the second block of equations is scaled by −1 we obtain the system
\[
  \begin{pmatrix} H_k & -J_k^T \\ J_k & 0 \end{pmatrix}
  \begin{pmatrix} p_k \\ q_k \end{pmatrix}
  = -\begin{pmatrix} g_k - J_k^T\pi_k \\ c_k \end{pmatrix}, \tag{2.6}
\]

which is an example of a saddle-point system. Finally, if the second block


of variables is scaled by −1, we obtain the equivalent symmetric system
\[
  \begin{pmatrix} H_k & J_k^T \\ J_k & 0 \end{pmatrix}
  \begin{pmatrix} p_k \\ -q_k \end{pmatrix}
  = -\begin{pmatrix} g_k - J_k^T\pi_k \\ c_k \end{pmatrix}, \tag{2.7}
\]


which is often referred to as the KKT system.


It may not be clear immediately how this method is related to an
SQP method. The crucial link follows from the observation that the KKT
equations (2.7) represent the first-order optimality conditions for the primal
and dual solution (pk , qk ) of the quadratic program
\[
  \underset{p\in\mathbb{R}^n}{\text{minimize}}\;\;
  (g_k - J_k^T\pi_k)^T p + \tfrac{1}{2} p^T H_k p
  \quad \text{subject to} \quad c_k + J_k p = 0,
\]
which, under certain conditions on the curvature of the Lagrangian dis-
cussed below, defines the step from xk to the point that minimizes the
local quadratic model of the objective function subject to the linearized
constraints. It is now a simple matter to include the constant objective
term fk (which does not affect the optimal solution) and write the dual
variables in terms of πk+1 = πk + qk instead of qk . The equations analo-
gous to (2.7) are then
\[
  \begin{pmatrix} H_k & J_k^T \\ J_k & 0 \end{pmatrix}
  \begin{pmatrix} p_k \\ -\pi_{k+1} \end{pmatrix}
  = -\begin{pmatrix} g_k \\ c_k \end{pmatrix}, \tag{2.8}
\]
which are the first-order optimality conditions for the quadratic program
\[
  \underset{p\in\mathbb{R}^n}{\text{minimize}}\;\;
  f_k + g_k^T p + \tfrac{1}{2} p^T H_k p
  \quad \text{subject to} \quad c_k + J_k p = 0.
\]

When written in terms of the x variables, this quadratic program is


\[
\begin{aligned}
  \underset{x\in\mathbb{R}^n}{\text{minimize}}\;\;
  & f_k + g_k^T(x - x_k) + \tfrac{1}{2}(x - x_k)^T H_k (x - x_k)\\
  \text{subject to}\;\; & c_k + J_k(x - x_k) = 0.
\end{aligned} \tag{2.9}
\]
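For the equality-constrained case, one outer iteration therefore reduces to a single linear solve. The sketch below is ours (dense linear algebra only, assuming a nonsingular KKT matrix); it forms and solves (2.8):

```python
import numpy as np

def plain_sqp_step(Hk, Jk, gk, ck):
    """Solve the KKT system (2.8) for the step p_k and multipliers pi_{k+1}."""
    n, m = Hk.shape[0], Jk.shape[0]
    K = np.block([[Hk, Jk.T],
                  [Jk, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([gk, ck]))
    pk, pi_next = sol[:n], -sol[n:]   # unknowns in (2.8) are (p_k, -pi_{k+1})
    return pk, pi_next
```

Iterating x_{k+1} = x_k + p_k with these multipliers is precisely the Newton iteration of Section 2.1.1.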
2.1.2. Local convergence. A standard analysis of Newton’s method
(see, e.g., Moré and Sorensen [139, Theorem 2.8]) shows that if the KKT
matrix is nonsingular at a solution (x∗ , π ∗ ), and (x0 , π0 ) lies in a sufficiently
small neighborhood of (x∗ , π ∗ ) in which f and c are twice-continuously
differentiable, then the SQP iterates (xk , πk ) will converge to (x∗ , π ∗ ) at a
Q-superlinear rate. If, in addition, H(x, π) is locally Lipschitz continuous,
then the SQP iterates (xk , πk ) are Q-quadratically convergent. As x is only
a subvector of v, with v = (x, π), the convergence rate of xk does not follow
immediately. However, as ‖x_k − x*‖ ≤ ‖v_k − v*‖, a Q-quadratic rate of
convergence of (xk , πk ) implies an R-quadratic rate of convergence of xk .
For more on the rate of convergence of {xk } relative to {xk , πk }, see Ortega
and Rheinboldt [146, Chapter 9].
Conditions for the nonsingularity of the KKT matrix may be deter-
mined by transforming the KKT system into an equivalent system that
reveals the rank. If Qk is an n × n nonsingular matrix, then (2.8) is equiv-
alent to the system
\[
  \begin{pmatrix} Q_k^T H_k Q_k & (J_k Q_k)^T \\ J_k Q_k & 0 \end{pmatrix}
  \begin{pmatrix} p_Q \\ -\pi_{k+1} \end{pmatrix}
  = -\begin{pmatrix} Q_k^T g_k \\ c_k \end{pmatrix},
  \quad \text{with} \quad p_k = Q_k p_Q. \tag{2.10}
\]


Let Q_k be defined so that J_k Q_k = ( 0  U_k ), where U_k is m × m. The
assumption that Jk has rank m implies that Uk is nonsingular. If the n
columns of Qk are partitioned into blocks Zk and Yk of dimension n×(n−m)
and n × m, then
\[
  J_k Q_k = J_k \begin{pmatrix} Z_k & Y_k \end{pmatrix}
  = \begin{pmatrix} 0 & U_k \end{pmatrix}, \tag{2.11}
\]

which shows that Jk Zk = 0 and Jk Yk = Uk . Since Zk and Yk are sections


of the nonsingular matrix Qk , they must have independent columns, and,
in particular, the columns of Zk must form a basis for the null-space of Jk .
If QTk Hk Qk and Jk Qk are partitioned to conform to the Z–Y partition of
Qk , we obtain the block lower-triangular system
\[
  \begin{pmatrix}
    U_k & 0 & 0\\
    Z_k^T H_k Y_k & Z_k^T H_k Z_k & 0\\
    Y_k^T H_k Y_k & Y_k^T H_k Z_k & U_k^T
  \end{pmatrix}
  \begin{pmatrix} p_Y \\ p_Z \\ -\pi_{k+1} \end{pmatrix}
  = -\begin{pmatrix} c_k \\ Z_k^T g_k \\ Y_k^T g_k \end{pmatrix}, \tag{2.12}
\]

where the (n − m)-vector pZ and m-vector pY are the parts of pQ that


conform to the columns of Zk and Yk . It follows immediately from (2.12)
that the Jacobian F  (xk , πk ) is nonsingular if Jk has independent rows and
ZkTHk Zk is nonsingular. In what follows, we use standard terminology and
refer to the vector ZkTgk as the reduced gradient and the matrix ZkTHk Zk
as the reduced Hessian. If J(x∗ ) has rank m and the columns of the ma-
trix Z ∗ form a basis for the null-space of J(x∗ ), then the conditions: (i)
∇L(x∗ , π ∗ ) = 0; and (ii) Z ∗TH(x∗ , π∗ )Z ∗ positive definite, are sufficient
for x∗ to be an isolated minimizer of the equality constraint problem (2.4).
2.1.3. Properties of the Newton step. The equations (2.12) have
a geometrical interpretation that provides some insight into the properties
of the Newton direction. From (2.10), the vectors p_Z and p_Y must satisfy
\[
  p_k = Q_k p_Q
  = \begin{pmatrix} Z_k & Y_k \end{pmatrix}
    \begin{pmatrix} p_Z \\ p_Y \end{pmatrix}
  = Z_k p_Z + Y_k p_Y.
\]

Using block substitution on the system (2.12) we obtain the following equa-
tions for pk and πk+1 :

\[
\begin{aligned}
  U_k p_Y &= -c_k, & p_N &= Y_k p_Y,\\
  Z_k^T H_k Z_k\, p_Z &= -Z_k^T(g_k + H_k p_N), & p_T &= Z_k p_Z,\\
  p_k &= p_N + p_T, & U_k^T \pi_{k+1} &= Y_k^T(g_k + H_k p_k).
\end{aligned} \tag{2.13}
\]

These equations involve the auxiliary vectors pN and pT such that pk =


pN + pT and Jk pT = 0. We call pN and pT the normal and tangential
steps associated with pk . Equations (2.13) may be simplified further by
introducing the intermediate vector xF such that xF = xk + pN . The
definition of the gradient of f-k implies that gk + Hk pN = ∇f-k (xk + pN ) =
g-k (xF ), which allows us to rewrite (2.13) in the form

www.it-ebooks.info
SQP METHODS 161

Uk pY = −ck , pN = Yk pY ,
xF = xk + pN , ZkT Hk Zk pZ = −ZkT g-k (xF ), pT = Zk pZ ,
(2.14)
p k = p N + pT , xk+1 = xF + pT ,
T
Uk πk+1 = YkT g-k (xk+1 ).

The definition of xF implies that

c-k (xF ) = ck + Jk (xF − xk ) = ck + Jk pN = ck + Jk Yk pY = ck + Uk pY = 0,

which implies that the normal component pN satisfies Jk pN = −ck and con-
stitutes the Newton step from xk to the point xF satisfying the linearized
constraints ck + Jk (x − xk ) = 0. On the other hand, the tangential step pT
satisfies pT = Zk pZ , where ZkTHk Zk pZ = −ZkT g-k (xF ). If the reduced Hes-
sian ZkTHk Zk is positive definite, which will be the case if xk is sufficiently
close to a locally unique (i.e., isolated) minimizer of (2.4), then pT defines
the Newton step from xF to the minimizer of the quadratic model f-k (x) in
the subspace orthogonal to the constraint normals (i.e., on the surface of
the linearized constraint c-k (x) = 0). It follows that the Newton direction
is the sum of two steps: a normal step to the linearized constraint and
the tangential step on the constraint surface that minimizes the quadratic
model. This property reflects the two (usually conflicting) underlying pro-
cesses present in all algorithms for optimization—the minimization of the
objective and the satisfaction of the constraints.
In the discussion above, the normal step pN is interpreted as a Newton
direction for the equations ĉ_k(x) = 0 at x = x_k. However, in some situa-
tions, pN may also be interpreted as the solution of a minimization problem.
The Newton direction pk is unique, but the decomposition pk = pT + pN
depends on the choice of the matrix Qk associated with the Jacobian fac-
torization (2.11). If Qk is orthogonal, i.e., if QTk Qk = I, then ZkT Yk = 0
and the columns of Yk form a basis for the range space of JkT . In this case,
pN and pT define the unique range-space and null-space decomposition of
pk , and pN is the unique solution with least two-norm of the least-squares
problem

\[
  \min_{p}\; \|\hat{c}_k(x_k + p)\|_2,
  \quad \text{or, equivalently,} \quad
  \min_{p}\; \|c_k + J_k p\|_2.
\]

This interpretation is useful in the formulation of variants of Newton’s


method that do not require (xk , πk ) to lie in a small neighborhood of
(x∗ , π ∗ ). In particular, it suggests a way of computing the normal step
when the equations Jk p = −ck are not compatible.
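Numerically, this least-norm normal step is what numpy's lstsq returns; a small sketch with invented data (ours, not the authors'), where J_k has dependent rows and J_k p = −c_k is incompatible:

```python
import numpy as np

Jk = np.array([[1.0, 2.0],
               [2.0, 4.0]])       # rank deficient: row 2 = 2 * row 1
ck = np.array([1.0, 1.0])         # makes J_k p = -c_k incompatible

# Minimum-norm solution of min_p ||c_k + J_k p||_2:
pN = np.linalg.lstsq(Jk, -ck, rcond=None)[0]
print(pN, Jk @ pN + ck)           # residual is nonzero but minimal
```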
For consistency with the inequality constrained case below, the primal-
dual solution of the kth QP subproblem is denoted by (x̂_k, π̂_k). With this
notation, the first-order optimality conditions for the QP subproblem (2.9)
are given by


\[
\begin{aligned}
  J_k(\hat{x}_k - x_k) + c_k &= 0,\\
  g_k + H_k(\hat{x}_k - x_k) - J_k^T\hat{\pi}_k &= 0.
\end{aligned} \tag{2.15}
\]

Similarly, the Newton iterates are given by x_{k+1} = x̂_k = x_k + p_k and
π_{k+1} = π̂_k = π_k + q_k.
2.1.4. Calculation of the Newton step. There are two broad ap-
proaches for solving the Newton equations (either in saddle-point form
(2.6) or symmetric form (2.7)). The first involves solving the full set of n + m
KKT equations; the second decomposes the KKT equations into the three
systems associated with the block lower-triangular equations (2.12).
In the full-matrix approach, the matrix K may be represented by its
symmetric indefinite factorization (see, e.g., Bunch and Parlett [20], and
Bunch and Kaufman [18]):

\[
  P K P^T = L D L^T, \tag{2.16}
\]

where P is a permutation matrix, L is lower triangular and D is block


diagonal, with 1 × 1 or 2 × 2 blocks. (The latter are required to retain nu-
merical stability.) Some prominent software packages include MA27 (Duff
and Reid [55]), MA57 (Duff [54]), MUMPS (Amestoy et al. [1]), PARDISO
(Schenk and Gärtner [159]), and SPOOLES (Ashcraft and Grimes [4]).
The decomposition approach is based on using an explicit or implicit
representation of the null-space basis matrix Zk . When Jk is dense, Zk is
usually computed directly from a QR factorization of Jk (see, e.g., Coleman
and Sorensen [38], and Gill et al. [85]). When Jk is sparse, however, known
techniques for obtaining an orthogonal and sparse Z may be expensive in
time and storage, although some effective algorithms have been proposed
(see, e.g., Coleman and Pothen [37]; Gilbert and Heath [78]).
The representation of Zk most commonly used in sparse problems
is called the variable-reduction form of Zk , and is obtained as follows.
The columns of Jk are partitioned so as to identify explicitly an m × m
nonsingular matrix B (the basis matrix ). Assuming that B is at the “left”
of Jk , we have
$$J_k = \begin{pmatrix} B & S \end{pmatrix}.$$

(In practice, the columns of B may occur anywhere.) When Jk has this form, a basis for the null space of Jk is given by the columns of the (nonorthogonal) matrix Qk defined as

$$Q_k = \begin{pmatrix} -B^{-1}S & I_m \\ I_{n-m} & 0 \end{pmatrix}, \quad\text{with}\quad Z_k = \begin{pmatrix} -B^{-1}S \\ I_{n-m} \end{pmatrix} \quad\text{and}\quad Y_k = \begin{pmatrix} I_m \\ 0 \end{pmatrix}.$$

This definition of Qk means that matrix-vector products ZkT v or Zk v can be computed using a factorization of B (typically, a sparse LU factorization; see Gill, Murray, Saunders and Wright [90]), and Zk need not be stored
explicitly.
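A minimal sketch of this implicit representation, assuming the basic columns B come first in Jk and using a dense LU factorization in place of the sparse one a production code would employ:

```python
# Sketch (assumed, not the paper's code) of the variable-reduction form:
# with J_k = ( B  S ), products Z_k v and Z_k^T u need only a factorization
# of the basis B, so Z_k is never formed explicitly.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def null_space_ops(J_k, m):
    B, S = J_k[:, :m], J_k[:, m:]        # basic columns assumed to come first
    Bfact = lu_factor(B)                 # a sparse LU in production codes
    def Zv(v):                           # Z_k v = ( -B^{-1} S v ; v )
        return np.concatenate([-lu_solve(Bfact, S @ v), v])
    def ZTu(u):                          # Z_k^T u = u_N - S^T B^{-T} u_B
        return u[m:] - S.T @ lu_solve(Bfact, u[:m], trans=1)
    return Zv, ZTu

J_k = np.array([[1.0, 0.0, 2.0],
                [0.0, 1.0, 1.0]])
Zv, ZTu = null_space_ops(J_k, m=2)
print(J_k @ Zv(np.array([1.0])))         # ~[0, 0]: Z_k spans null(J_k)
```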
For large sparse problems, the reduced Hessian ZkT Hk Zk associated
with the solution of (2.14) will generally be much more dense than Hk and
B. However, in many cases, n − m is small enough to allow the storage of
a dense Cholesky factor of ZkT Hk Zk .
2.2. Inequality constraints. Given an approximate primal-dual solution (xk, πk) with xk ≥ 0, an outer iteration of a typical SQP method involves solving the QP subproblem (2.1), repeated here for convenience:

$$\begin{array}{ll} \underset{x\in\mathbb{R}^n}{\text{minimize}} & f_k + g_k^T(x - x_k) + \tfrac{1}{2}(x - x_k)^T H_k (x - x_k)\\[4pt] \text{subject to} & J_k(x - x_k) = -c_k, \quad x \ge 0. \end{array} \tag{2.17}$$

Assume for the moment that this subproblem is feasible, with primal-dual solution (x̂k, π̂k, ẑk). The next plain SQP iterate is xk+1 = x̂k, πk+1 = π̂k and zk+1 = ẑk. The QP first-order optimality conditions are

$$\begin{aligned} J_k(\hat x_k - x_k) + c_k &= 0, \qquad \hat x_k \ge 0;\\ g_k + H_k(\hat x_k - x_k) - J_k^T\hat\pi_k - \hat z_k &= 0,\\ \hat x_k \cdot \hat z_k &= 0, \qquad \hat z_k \ge 0. \end{aligned} \tag{2.18}$$

Let pk = x̂k − xk and let p̄k denote the vector of free components of pk, i.e., the components with indices in I(x̂k). Similarly, let z̄k denote the free components of ẑk. The complementarity conditions imply that z̄k = 0, and we may combine the first two sets of equalities in (2.18) to give

$$\begin{pmatrix} \bar H_k & \bar J_k^T \\ \bar J_k & 0 \end{pmatrix} \begin{pmatrix} \bar p_k \\ -\hat\pi_k \end{pmatrix} = -\begin{pmatrix} (g_k + H_k\eta_k)_I \\ c_k + J_k\eta_k \end{pmatrix}, \tag{2.19}$$

where J̄k is the matrix of free columns of Jk, and ηk is the vector

$$(\eta_k)_i = \begin{cases} (\hat x_k - x_k)_i & \text{if } i \in \mathcal{A}(\hat x_k);\\ 0 & \text{if } i \in \mathcal{I}(\hat x_k). \end{cases}$$

If the active sets at x̂k and xk are the same, i.e., A(x̂k) = A(xk), then ηk = 0. If x̂k lies in a sufficiently small neighborhood of a nondegenerate solution x∗, then A(x̂k) = A(x∗) and hence J̄k has full row rank (see Robinson [157]). In this case we say that the QP identifies the correct active set at x∗. If, in addition, (x∗, π∗) satisfies the second-order sufficient conditions for optimality, then the KKT system (2.19) is nonsingular and the plain SQP method is equivalent to Newton's method applied to the equality-constraint subproblem defined by fixing the variables in the active set at their bounds.
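For illustration, a small hypothetical helper (not from the text) that evaluates the residuals of the optimality conditions (2.18) at a candidate primal-dual triple; small residuals indicate QP optimality.

```python
# Assumed helper: residuals of the QP first-order conditions (2.18).
import numpy as np

def qp_kkt_residuals(x_hat, pi_hat, z_hat, x_k, g_k, H_k, J_k, c_k):
    r_feas = J_k @ (x_hat - x_k) + c_k                  # linearized equality
    r_stat = g_k + H_k @ (x_hat - x_k) - J_k.T @ pi_hat - z_hat
    r_comp = x_hat * z_hat                              # complementarity
    bounds_ok = bool(np.all(x_hat >= 0) and np.all(z_hat >= 0))
    return (np.linalg.norm(r_feas), np.linalg.norm(r_stat),
            np.linalg.norm(r_comp), bounds_ok)
```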
However, at a degenerate QP solution, the rows of J¯k are linearly
dependent and the KKT equations (2.19) are compatible but singular.


Broadly speaking, there are two approaches to dealing with the degenerate case, each linked to the method used to solve the QP subproblem. The first approach employs a QP method that not only finds the QP solution x̂k, but also identifies a "basic set" of variables that define a matrix Ĵk with linearly independent rows. The second approach solves a regularized or perturbed QP subproblem that provides a perturbed version of the KKT system (2.19) that is nonsingular for any J̄k.
Identifying independent constraints. The first approach is based on
using a QP algorithm that provides a primal-dual QP solution that satisfies
a nonsingular KKT system analogous to (2.19). A class of quadratic pro-
gramming methods with this property are primal-feasible active-set meth-
ods, which form the basis of the software packages NPSOL and SNOPT.
Primal-feasible QP methods have two phases: in phase 1, a feasible point
is found by minimizing the sum of infeasibilities; in phase 2, the quadratic
objective function is minimized while feasibility is maintained. In each it-
eration, the variables are labeled as being “basic” or “nonbasic”, where the
nonbasic variables are temporarily fixed at their current value. The indices
of the basic and nonbasic variables are denoted by B and N respectively.
A defining property of the B–N partition is that the rows of the Jacobian
appearing in the KKT matrix are always linearly independent. Once an
initial basic set is identified, all subsequent KKT equations have a con-
straint block with independent rows. (For more details of primal-feasible
active-set methods, see Section A.1 of the Appendix.)
Let pk = x̂k − xk, where (x̂k, π̂k) is the QP solution found by a primal-feasible active-set method. Let p̂k denote the vector of components of pk in the final basic set B, with Ĵk the corresponding columns of Jk. The vector (p̂k, π̂k) satisfies the nonsingular KKT equations

$$\begin{pmatrix} \widehat H_k & \widehat J_k^T \\ \widehat J_k & 0 \end{pmatrix} \begin{pmatrix} \hat p_k \\ -\hat\pi_k \end{pmatrix} = -\begin{pmatrix} (g_k + H_k\eta_k)_B \\ c_k + J_k\eta_k \end{pmatrix}, \tag{2.20}$$

where ηk is now defined in terms of the final QP nonbasic set, i.e.,

$$(\eta_k)_i = \begin{cases} (\hat x_k - x_k)_i & \text{if } i \in \mathcal{N};\\ 0 & \text{if } i \notin \mathcal{N}. \end{cases} \tag{2.21}$$
As in (2.19), if the basic-nonbasic partition is not changed during the solu-
tion of the subproblem, then ηk = 0. If this final QP nonbasic set is used to
define the initial nonbasic set for the next QP subproblem, it is typical for
the later QP subproblems to reach optimality in a single iteration because
the solution of the first QP KKT system satisfies the QP optimality condi-
tions immediately. In this case, the phase-1 procedure simply performs a
feasibility check that would be required in any case.
Constraint regularization. One of the purposes of regularization is to define KKT equations that are nonsingular regardless of the rank of J̄k. Consider the perturbed version of equations (2.19) such that

$$\begin{pmatrix} \bar H_k & \bar J_k^T \\ \bar J_k & -\mu I \end{pmatrix} \begin{pmatrix} \bar p_k \\ -\hat\pi_k \end{pmatrix} = -\begin{pmatrix} (g_k + H_k\eta_k)_I \\ c_k + J_k\eta_k \end{pmatrix}, \tag{2.22}$$

where μ is a small positive constant. In addition, assume that Z̄kT H̄k Z̄k is positive definite, where the columns of Z̄k form a basis for the null space of J̄k. With this assumption, the unperturbed KKT equations (2.19) are singular if and only if J̄k has linearly dependent rows.
For simplicity, assume that ηk = 0. Let ( U V ) be an orthonormal matrix such that the columns of U form a basis for null(J̄kT) and the columns of V form a basis for range(J̄k). The unique expansion π̂k = U πU + V πV allows us to rewrite (2.22) as

$$\begin{pmatrix} \bar H_k & \bar J_k^T V & 0 \\ V^T\bar J_k & -\mu I & 0 \\ 0 & 0 & -\mu I \end{pmatrix} \begin{pmatrix} \bar p_k \\ -\pi_V \\ -\pi_U \end{pmatrix} = -\begin{pmatrix} (g_k)_I \\ V^T c_k \\ 0 \end{pmatrix}, \tag{2.23}$$

where J̄kT U = 0 from the definition of U, and U T ck = 0 because ck ∈ range(J̄k). The following simple argument shows that the equations (2.23) are nonsingular, regardless of the rank of J̄k. First, observe that V T J̄k has full row rank. Otherwise, if v T V T J̄k = 0, it must be the case that V v ∈ null(J̄kT). But since V v ∈ range(V) and range(V) is orthogonal to null(J̄kT), we conclude that V v = 0, and the linear independence of the columns of V gives v = 0.

Moreover, equations (2.23) imply that πU = 0 and π̂k ∈ range(J̄k). If (gk+1)I denotes the free components of gk+1 = gk + Hk pk, then

$$\bar J_k^T\hat\pi_k = (g_{k+1})_I \quad\text{and}\quad \hat\pi_k \in \operatorname{range}(\bar J_k).$$

These are the necessary and sufficient conditions for π̂k to be the unique least-length solution of the compatible equations J̄kT π = (gk+1)I. This implies that the regularization gives a unique vector of multipliers.
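A small numerical illustration of this regularization (assumed data chosen so that J̄k has linearly dependent rows): the unperturbed KKT matrix is singular, while the perturbed matrix (2.22) has full rank for any μ > 0.

```python
# Assumed demonstration that the mu-perturbation restores nonsingularity.
import numpy as np

H_bar = np.eye(2)
J_bar = np.array([[1.0, 0.0],
                  [1.0, 0.0]])           # linearly dependent rows
mu = 1.0e-4

K0  = np.block([[H_bar, J_bar.T], [J_bar, np.zeros((2, 2))]])
Kmu = np.block([[H_bar, J_bar.T], [J_bar, -mu * np.eye(2)]])
print(np.linalg.matrix_rank(K0), np.linalg.matrix_rank(Kmu))   # 3 4
```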
Wright [173, 174, 175] and Hager [117] show that an SQP method using
the regularized equations (2.22) will converge at a superlinear rate, even
in the degenerate case. In Section A.3 of the Appendix, QP methods are
discussed that give equations of the form (2.22) at every outer iteration,
not just in the neighborhood of the solution. These methods implicitly
shift the constraints by an amount of order μ and give QP multipliers that
converge to an O(μ) estimate of the least-length multipliers.
A related regularization scheme has been proposed and analyzed by
Fischer [58], who solves a second QP to obtain the multiplier estimates.
Anitescu [3] regularizes the problem by imposing a trust-region constraint
on the plain SQP subproblem (2.1) and solving the resulting subproblem
by a semidefinite programming method.
3. The formulation of modern SQP methods. SQP methods
have evolved considerably since Wilson’s thesis appeared in 1963. Current


implementations of SQP methods for large-scale optimization have solved


problems with as many as 40,000 variables and inequality constraints (see,
e.g., Gill, Murray and Saunders [83]). During this evolution, both the
theory and practice of SQP methods have benefited substantially from de-
velopments in competing methods. Similarly, research in SQP methods
has had a considerable impact on the formulation and analysis of rival
methods—for example, on the treatment of equality constraints in interior
methods. On the surface, many recent SQP methods bear little resem-
blance to the plain SQP method proposed by Wilson. In this section we
review some of the principal developments in SQP methods since 1963 while
emphasizing connections to other methods. In our discussion, we use the
broad definition of an SQP method as one that uses a quadratic program-
ming subproblem to estimate the active set. Implicit in this definition is
the assumption that, in the neighborhood of the solution, an SQP method
will solve the Newton KKT equations (or some approximation) defined in
terms of the free variables.
The complex interrelationships that exist between optimization meth-
ods make it difficult (and controversial) to give a precise taxonomy of the
many different SQP approaches. Instead, we will discuss methods under
four topics that, in our opinion, were influential in shaping developments
in the area. Each of these topics will provide a starting-point for discussion
of related methods and extensions. The topics are: (i) merit functions and
the Han-Powell SQP method, (ii) sequential unconstrained methods, (iii)
line-search and trust-region filter methods, and (iv) methods that solve a
convex program to determine an estimate of the active set. The modern
era of SQP methods can be traced to the publication of the Han-Powell
method in 1976 [118, 153]. (It may be argued that almost all subsequent
developments in SQP methods are based on attempts to correct perceived
theoretical and practical deficiencies in the Wilson-Han-Powell approach.)
The sequential unconstrained approaches to SQP have evolved from a 1982
paper by Fletcher [61, 62]. Filter SQP methods are a more recent develop-
ment, being proposed by Fletcher and Leyffer [66, 67] in 1998.
3.1. Review of line-search and trust-region methods. Our dis-
cussion of the equality constrained problem in Section 2.1.1 emphasizes the
local equivalence between a plain SQP method and Newton’s method ap-
plied to the first-order optimality conditions. As the Newton iterates may
diverge or may not be well-defined if the starting point is not sufficiently
close to a solution, some modification is needed to force convergence from
arbitrary starting points. Line-search methods and trust-region methods
are two alternative modifications of Newton’s method. We begin by review-
ing the main properties of these methods in the context of unconstrained
minimization.
3.1.1. Line-search methods: the unconstrained case. Associ-
ated with the kth iteration of a conventional line-search method for un-


constrained optimization is a scalar-valued function mk (x) that represents


a local line-search model of f . The next iterate is then xk+1 = xk + dk ,
where dk is chosen so that the improvement in f is at least as good as a
fixed fraction of the improvement in the local model, i.e., dk must satisfy
$$f(x_k) - f(x_k + d_k) \ge \eta\,\bigl(m_k(x_k) - m_k(x_k + d_k)\bigr), \tag{3.1}$$

where η is a fixed parameter such that 0 < η < ½. Typical line-search


models are affine and quadratic functions based on a first- or second-order
Taylor-series approximation of f . For example, a first-order approximation
provides the affine line-search model mk (x) = f (xk ) + g(xk )T(x − xk ).
In a general line-search method, the change in variables has the form dk ≡ dk(αk), where αk is a scalar steplength that defines a point on the parameterized path dk(α). In the simplest case, dk(α) = αpk, where pk is an approximate solution of the unconstrained subproblem min over p ∈ Rn of gkT p + ½ pT Bk p, with Bk a positive-definite approximation of the Hessian Hk. More generally, if Hk is indefinite, dk(α) is defined in terms of pk and a direction sk such that skT Hk sk < 0 (see, e.g., Goldfarb [99], Moré and Sorensen [138], and Olivares, Moguerza and Prieto [144]). A crucial feature of a line-search method is that dk(α) is defined in terms of a convex subproblem, which may be defined implicitly during the calculation of pk (see, e.g., Greenstadt [112], Gill and Murray [81], and Schnabel and Eskow [163]).
Condition (3.1) may be written as f(xk) − f(xk + dk) ≥ η Δmk(dk), where the quantity

$$\Delta m_k(d) = m_k(x_k) - m_k(x_k + d) \tag{3.2}$$

is the change in f predicted by the line-search model function. An essential property of the line-search model is that it must always be possible to find an αk that satisfies (3.1). In particular, there must exist a positive ᾱ such that

$$f(x_k + d_k(\alpha)) \le f(x_k) - \eta\,\Delta m_k\bigl(d_k(\alpha)\bigr), \quad\text{for all } \alpha \in (0, \bar\alpha). \tag{3.3}$$

For this condition to hold, the model must predict a reduction in f(x) at x = xk, i.e., Δmk(dk(α)) > 0 for all α ∈ (0, ᾱ). Under the assumption
that (3.3) holds, there are various algorithms for finding an appropriate αk .
For example, in a backtracking line search, the step αk = 1 is decreased by
a fixed factor until condition (3.1) is satisfied. It can be shown that this
simple procedure is enough to guarantee a sufficient decrease in f . More
sophisticated methods satisfy (3.1) in conjunction with other conditions
that ensure a sufficient decrease in f (see, e.g., Ortega and Rheinboldt [145],
Moré and Thuente [140], and Gill et al. [86]).
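A minimal backtracking sketch for condition (3.1), assuming f, the direction pk, and the model-decrease function Δmk(dk(α)) as inputs; all names are illustrative, not the authors' code.

```python
# Backtracking line search enforcing the sufficient-decrease condition (3.1).
import numpy as np

def backtrack(f, x_k, p_k, delta_m, eta=0.1, alpha=1.0, shrink=0.5, max_it=30):
    f_k = f(x_k)
    for _ in range(max_it):
        if f_k - f(x_k + alpha * p_k) >= eta * delta_m(alpha):
            return alpha                      # sufficient decrease achieved
        alpha *= shrink                       # otherwise reduce the step
    return alpha

f = lambda x: float(x @ x)                    # simple quadratic test problem
x_k = np.array([1.0, -2.0]); p_k = -2.0 * x_k # steepest-descent direction
delta_m = lambda a: -a * (2.0 * x_k @ p_k)    # affine-model reduction
print(backtrack(f, x_k, p_k, delta_m))        # 0.5 for this example
```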
The line-search methods defined above enforce a monotone decrease
in f at each iteration. In some cases the definition of dk (α) may warrant
the use of a nonmonotone line search in which f is permitted to increase
on some iterations. An example of a nonmonotone line-search condition is

$$f(x_k + d_k(\alpha)) \le \max_{0\le j\le r}\,[f(x_{k-j})] - \eta\,\Delta m_k\bigl(d_k(\alpha)\bigr),$$
where r is some fixed number of previous iterations (for other schemes of


varying complexity, see, e.g., Grippo, Lampariello and Lucidi [113, 114,
115], Toint [167], and Zhang and Hager [180]). In Section 3.2, we discuss
the “watchdog technique”, which is a nonmonotone line search that allows
the value αk = 1 to be used for a limited number of steps, regardless of the
value of f .
3.1.2. Trust-region methods: the unconstrained case. When
there are no constraints, line-search methods and trust-region methods
have many properties in common. Both methods choose the value of a
scalar variable so that the objective improves by an amount that is at least
as good as a fraction of the improvement in a local model (see condition
(3.1)). A crucial difference is that a line-search method involves the solution
of a bounded convex subproblem. By contrast, trust-region methods solve
a constrained, possibly nonconvex, subproblem of the form

$$\min_{d\in\mathbb{R}^n}\; g_k^T d + \tfrac{1}{2} d^T H_k d \quad\text{subject to}\quad \|d\| \le \delta_k, \tag{3.4}$$

with condition (3.1) being enforced, if necessary, by reducing the positive


scalar δk (the trust-region radius). The final value of δk is also used to define
an initial estimate of δk+1 , with the possibility that δk+1 is increased to a
multiple of δk if the reduction in f is significantly better than the reduction
predicted by the model. If the trust-region radius is reduced over a sequence
of consecutive iterations, the step dk will go to zero along the direction of
steepest descent with respect to the particular norm used to define the trust
region. As in a line search, it is possible to define trust-region methods that
do not enforce a reduction in f at every step (see, e.g., Gu and Mo [116]).
The complexity of constrained minimization is generally higher than
that of unconstrained minimization. Moreover, the trust-region subprob-
lem may need to be solved more than once before the condition (3.1) is
satisfied. Nevertheless, trust-region methods provide computational bene-
fits when some of the eigenvalues of Hk are close to zero (see Kroyan [126]).
Modern trust-region methods require only an approximate solution of (3.4).
For a comprehensive review of trust-region methods for both unconstrained
and constrained optimization, see Conn, Gould and Toint [42].
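The following schematic loop sketches one such iteration; solve_tr_subproblem is a hypothetical stand-in for an approximate solver of (3.4), and the acceptance and radius-update thresholds are illustrative choices, not prescriptions from the text.

```python
# Schematic trust-region iteration: accept the step when the actual reduction
# is at least a fraction eta of the predicted reduction (condition (3.1)),
# then enlarge or shrink the radius based on model accuracy.
def trust_region_step(f, model_reduction, solve_tr_subproblem,
                      x_k, delta_k, eta=0.1, shrink=0.5, grow=2.0):
    d_k = solve_tr_subproblem(x_k, delta_k)     # approx. minimizer of (3.4)
    pred = model_reduction(x_k, d_k)            # m_k(x_k) - m_k(x_k + d_k)
    actual = f(x_k) - f(x_k + d_k)
    if actual >= eta * pred:                    # sufficient decrease: accept
        if actual > 0.75 * pred:                # model very accurate: grow
            delta_k = grow * delta_k
        return x_k + d_k, delta_k
    return x_k, shrink * delta_k                # reject step, shrink radius
```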
3.2. The Han-Powell method. Han [119] and Powell [153] intro-
duced two crucial improvements to the plain SQP method of Wilson. The
first was the use of a QP subproblem defined in terms of a positive-definite
quasi-Newton approximation. The second was the use of a line-search merit
function to obtain a sequence of improving estimates of the solution.
A merit function M is a scalar-valued function whose value provides
a measure of the quality of a given point as an estimate of a solution of the
constrained problem. Each value of M represents a compromise between


the (usually conflicting) aims of minimizing the objective function and


minimizing the constraint violations. Analogous to the unconstrained case,
the merit function is used in conjunction with a line-search model mk (x)
to define a sufficient decrease at the kth iteration. In the constrained case,
dk is chosen to satisfy
. /
M(xk ) − M(xk + dk ) ≥ η mk (xk ) − mk (xk + dk ) , xk + dk ≥ 0. (3.5)

Han and Powell proposed the use of the ℓ1 penalty function (1.8) as a merit function, i.e., M(x) = M(x; ρ) = P1(x; ρ). Moreover, they suggested that dk(α) = αpk = α(x̂k − xk), where x̂k is the solution of the convex subproblem

$$\begin{array}{ll} \underset{x\in\mathbb{R}^n}{\text{minimize}} & f_k + g_k^T(x - x_k) + \tfrac{1}{2}(x - x_k)^T B_k (x - x_k)\\[4pt] \text{subject to} & c_k + J_k(x - x_k) = 0, \quad x \ge 0, \end{array} \tag{3.6}$$
subject to ck + Jk (x − xk ) = 0, x ≥ 0,

with Bk a positive-definite approximation to the Hessian of the Lagrangian or augmented Lagrangian (in Section 3.2.1 below, we discuss the definition of Bk in this context). As the QP (3.6) is convex, it may be solved using either a primal or dual active-set method (see Section A.1 of the Appendix). In either case, the QP multipliers π̂k and the vector p̂k of components of pk in the final basic set satisfy the nonsingular KKT equation

$$\begin{pmatrix} \widehat B_k & \widehat J_k^T \\ \widehat J_k & 0 \end{pmatrix} \begin{pmatrix} \hat p_k \\ -\hat\pi_k \end{pmatrix} = -\begin{pmatrix} (g_k + B_k\eta_k)_B \\ c_k + J_k\eta_k \end{pmatrix}, \tag{3.7}$$

where B̂k and Ĵk denote the matrices of basic components of Bk and Jk, and ηk is defined as in (2.21).
3.2.1. Quasi-Newton approximations. Many methods for uncon-
strained minimization use a quasi-Newton approximation of the Hessian
when second derivatives are either unavailable or too expensive to evaluate.
Arguably, the most commonly used quasi-Newton approximation is defined
using the BFGS method (see Broyden [14], Fletcher [59], Goldfarb [98], and
Shanno [164]). Given iterates xk and xk+1 , and a symmetric approximate
Hessian Bk , the BFGS approximation for the next iteration has the form
$$B_{k+1} = B_k - \frac{1}{d_k^T B_k d_k}\, B_k d_k d_k^T B_k + \frac{1}{y_k^T d_k}\, y_k y_k^T, \tag{3.8}$$

where dk = xk+1 − xk and yk = g(xk+1) − g(xk). If Bk is positive definite, then Bk+1 is positive definite if and only if the approximate curvature ykT dk is positive.
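A direct transcription of (3.8) as a sketch, with the common safeguard of skipping the update when the approximate curvature is not sufficiently positive; the tolerance is an illustrative choice.

```python
# BFGS update (3.8) with a curvature safeguard; not the authors' code.
import numpy as np

def bfgs_update(B_k, d_k, y_k, tol=1.0e-8):
    yd = y_k @ d_k
    if yd <= tol:                   # curvature not positive: skip (see text)
        return B_k
    Bd = B_k @ d_k
    return B_k - np.outer(Bd, Bd) / (d_k @ Bd) + np.outer(y_k, y_k) / yd
```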
For the constrained case, Han [119] proposed maintaining a BFGS
approximation of the Hessian of the augmented Lagrangian function

$$L_A(x, \pi; \rho) = f(x) - c(x)^T\pi + \tfrac{1}{2}\rho\, c(x)^T c(x).$$


(As the Hessian of the Lagrangian does not include the linear constraints,
we have omitted z from the Lagrangian term.) This implies that the gra-
dient difference yk in (3.8) involves the gradient of the augmented La-
grangian, with

dk = xk+1 − xk , and yk = ∇x LA (xk+1 , πk+1 ; ρ) − ∇x LA (xk , πk+1 ; ρ),

where πk+1 are estimates of the optimal dual variables. This proposal is motivated by the fact that if ρ is sufficiently large, the Hessian of LA(x, π; ρ) is positive definite for all (x, π) close to an isolated solution (x∗, π∗) (see also Tapia [165], and Byrd, Tapia and Zhang [29]).
The use of an augmented Lagrangian Hessian for the QP subproblem changes the properties of the QP dual variables. In particular, if (x̂k, π̂k, ẑk) is the solution of the QP (3.6) with Bk defined as Hk + ρJkT Jk, then (x̂k, π̂k + ρck, ẑk) is the solution of the QP (3.6) with Bk replaced by Hk (assuming that the same local solution is found when Hk is not positive definite). In other words, if the augmented Lagrangian Hessian is used instead of the Lagrangian Hessian, the x and z variables do not change, but the π-values are shifted by ρck. An appropriate value for πk+1 in the definition of yk is then πk+1 = π̂k + ρck, giving, after some simplification,

$$y_k = g_{k+1} - g_k - (J_{k+1} - J_k)^T\hat\pi_k + \rho J_{k+1}^T(c_{k+1} - c_k).$$
-k + ρJk+1 (ck+1 − ck ).

If the approximate curvature ykT dk is not positive, the matrix Bk+1 of (3.8) is either indefinite or undefined. In terms of an update to the Hessian of the augmented Lagrangian, a negative ykT dk implies that either ρ is not sufficiently large, or the curvature of the penalty term ½ρ c(x)T c(x) is negative along dk. In the first case, ρ must be increased by an amount that is sufficiently large to give a positive value of ykT dk. In the second case, the approximate curvature of the Lagrangian is not sufficiently positive and there is no finite ρ that gives ykT dk > 0. In this case, the update should be skipped. The curvature is considered not sufficiently positive if

$$y_k^T d_k < \sigma_k, \qquad \sigma_k = \alpha_k(1 - \eta)\, p_k^T B_k p_k, \tag{3.9}$$

where η is a preassigned constant (0 < η < 1) and pk is the search direction x̂k − xk defined by the QP subproblem. If ykT dk < σk, then ρ is replaced by ρ + Δρ, where

$$\Delta\rho = \begin{cases} \dfrac{\sigma_k - y_k^T d_k}{d_k^T J_{k+1}^T (c_{k+1} - c_k)}, & \text{if } d_k^T J_{k+1}^T(c_{k+1} - c_k) > 0;\\[8pt] 0, & \text{otherwise.} \end{cases}$$

If Δρ = 0, the approximate curvature of c(x)T c(x) is not positive and the update should be skipped.
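The update logic of this paragraph can be sketched as follows; all inputs are assumed quantities from the current and next iterates, and the function is illustrative rather than the authors' implementation.

```python
# Sketch of the penalty update: test (3.9), then either increase rho by the
# Delta_rho that restores y_k^T d_k = sigma_k, or signal that the BFGS update
# should be skipped (returns None when no finite rho works).
import numpy as np

def penalty_increase(d_k, y_k, sigma_k, J_next, c_next, c_k):
    if y_k @ d_k >= sigma_k:
        return 0.0                  # curvature already sufficiently positive
    denom = d_k @ (J_next.T @ (c_next - c_k))
    if denom > 0.0:
        return (sigma_k - y_k @ d_k) / denom   # Delta_rho > 0
    return None                     # skip the update: penalty curvature <= 0
```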
Maintaining an approximation of the Hessian of LA (x; π, ρ) involves a
number of difficulties, all of which stem from the need to increase the value


of ρ. First, the usual convergence of the sequence {Bk } is disrupted when


ρ is increased. Second, a large increase in ρ will give an ill-conditioned
matrix Bk+1 . Finally, because ρ is always increased, the ill-effects of large
values of ρ persist throughout the computation.
Powell [153] suggested the use of a positive-definite BFGS approxima-
tion for the Lagrangian Hessian, i.e., the update pair is

$$d_k = x_{k+1} - x_k, \qquad y_k = \nabla_x L(x_{k+1}, \pi_{k+1}, z_{k+1}) - \nabla_x L(x_k, \pi_{k+1}, z_{k+1}). \tag{3.10}$$

If the QP multipliers are used for πk+1, the difference in Lagrangian gradients is given by yk = gk+1 − gk − (Jk+1 − Jk)T π̂k.
A positive-definite BFGS approximation may appear to be a surprising
choice for Bk , given that the Hessian of the Lagrangian is generally indefi-
nite at the solution. However, Powell’s proposal is based on the observation
that the approximate curvature is likely to be positive in the neighborhood
of an isolated solution, even when the Hessian of the Lagrangian is indefi-
nite. The reason for this is that the iterates of a quasi-Newton SQP converge
to the solution along a path that lies in the null space of the “free” columns
of the Jacobian. As the Lagrangian Hessian is generally positive definite
along this path, the approximate curvature ykT dk is positive as the iterates
converge and an R-superlinear convergence rate is obtained. Powell's proposal may be justified by considering the properties of (x̂k, π̂k), the solution of the QP subproblem. Let pk = x̂k − xk and ĝ(x) = gk + Bk(x − xk). It is shown in Section A.4.1 of the Appendix that (x̂k, π̂k) satisfies the equations

$$\begin{array}{ll} U_k p_Y = -c_k, & p_N = Y_k p_Y,\\ x_F = x_k + p_N, \quad Z_k^T B_k Z_k\, p_Z = -Z_k^T\hat g(x_F), & p_T = Z_k p_Z,\\ p_k = p_N + p_T, & U_k^T\hat\pi_k = Y_k^T\hat g(x_k + p_k), \end{array} \tag{3.11}$$

where Uk is nonsingular and the columns of Zk lie in the null space of Jk .


These equations indicate that the QP step is the sum of the vectors pN and
pT , where pN is the Newton step to the linearized constraints and pT is
a quasi-Newton step based on approximate second-derivative information
associated with the reduced Hessian ZkT Bk Zk . Because of this disparity in
the quality of the Newton steps, the constraints tend to converge to zero
faster than the reduced gradient and the convergence of a quasi-Newton
SQP method is characterized by the relationship ‖pN‖/‖pT‖ → 0, i.e., the
final search directions lie almost wholly in the null space of J(x∗ ).
If xk is far from a solution, the approximate curvature ykT dk may not
be positive and the formula (3.8) will give an indefinite or undefined Bk+1 .
If, as in the case of unconstrained minimization, the update is skipped
when ykT dk ≤ 0, no new information about curvature of the Lagrangian
will be gained. In this situation, an alternative pair of vectors satisfying


ykT dk > 0 can be used. Given the definition (3.9) of the least permissible approximate curvature, Powell [152] redefines yk as yk + Δyk, where Δyk is chosen so that (yk + Δyk)T dk = σk, i.e.,

$$\Delta y_k = \frac{\sigma_k - y_k^T d_k}{d_k^T(y_k - B_k d_k)}\,\bigl(y_k - B_k d_k\bigr).$$

The Powell modification is always well defined, which implies that it is always applied—even when it might be unwarranted because of negative curvature of the Lagrangian in the null space of Ĵk (cf. (3.7)).
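A sketch of Powell's modification; the function returns the modified gradient difference yk + Δyk and assumes dkT(yk − Bk dk) is nonzero.

```python
# Powell's modified y_k: satisfies (y_k + Delta_y_k)^T d_k = sigma_k.
import numpy as np

def powell_modified_y(B_k, d_k, y_k, sigma_k):
    w = y_k - B_k @ d_k
    theta = (sigma_k - y_k @ d_k) / (d_k @ w)   # assumes d_k^T w != 0
    return y_k + theta * w
```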
3.2.2. Properties of the merit function. The Han-Powell merit function M(x; ρ) = P1(x; ρ) has the appealing property that x∗ is an unconstrained minimizer of P1(x; ρ) for ρ > ‖π∗‖∞ (see, e.g., Zangwill [179], and Han and Mangasarian [120]). A potential line-search model for P1(x; ρ) is

$$m_k(x; \rho) = f_k + g_k^T(x - x_k) + \tfrac{1}{2}(x - x_k)^T B_k(x - x_k) + \rho\|c_k + J_k(x - x_k)\|_1,$$

which is the ℓ1 penalty function defined with local affine and quadratic approximations for c and f. However, because Bk is positive definite, a stronger condition on α is defined by omitting the quadratic term and using the line-search model

$$m_k(x; \rho) = f_k + g_k^T(x - x_k) + \rho\|c_k + J_k(x - x_k)\|_1. \tag{3.12}$$
To obtain a smaller value of P1(x; ρ) at each iteration, the line-search model must satisfy Δmk(dk; ρ) > 0, where Δmk(d; ρ) is the predicted reduction in M analogous to (3.2). The optimality conditions (2.18) for the QP subproblem together with the affine model (3.12) defined with x = xk + αpk allow us to write the predicted reduction as

$$\begin{aligned} \Delta m_k(\alpha p_k; \rho) &= \alpha\bigl(\rho\|c_k\|_1 + c_k^T\hat\pi_k - p_k^T\hat z_k + p_k^T B_k p_k\bigr)\\ &= \alpha\bigl(\rho\|c_k\|_1 + c_k^T\hat\pi_k - (\hat x_k - x_k)^T\hat z_k + p_k^T B_k p_k\bigr). \end{aligned} \tag{3.13}$$

The QP optimality conditions give x̂k · ẑk = 0, yielding

$$\Delta m_k(\alpha p_k; \rho) = \alpha\bigl(\rho\|c_k\|_1 + c_k^T\hat\pi_k + x_k^T\hat z_k + p_k^T B_k p_k\bigr) \ge \alpha\Bigl(\sum_{i=1}^m |c_i(x_k)|\bigl(\rho - |(\hat\pi_k)_i|\bigr) + \|x_k \cdot \hat z_k\|_1 + p_k^T B_k p_k\Bigr),$$

which implies that if Bk is positive definite, then a sufficient condition for Δmk(αpk; ρ) > 0 is ρ ≥ ‖π̂k‖∞. Han [118] uses this condition to define a nondecreasing sequence {ρk} such that ρk > ‖π̂j‖∞ for all k ≥ j. With this definition of {ρk}, and under assumptions that include the uniform boundedness of the sequence {Bk} and the existence of at least one nonnegative x such that ck + Jk(x − xk) = 0, Han shows that all accumulation points of the sequence {xk} are first-order KKT points of the constrained problem (1.2).
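Illustrative helpers (assumed names, not the authors' code) for the quantities in this argument: the line-search model (3.12), the predicted reduction Δmk(αpk; ρ), and the penalty lower bound ρ ≥ ‖π̂k‖∞.

```python
# Sketch of the l1 merit-model machinery from (3.12)-(3.13).
import numpy as np

def l1_model(x, x_k, f_k, g_k, J_k, c_k, rho):
    return f_k + g_k @ (x - x_k) + rho * np.linalg.norm(c_k + J_k @ (x - x_k), 1)

def predicted_reduction(alpha, p_k, x_k, f_k, g_k, J_k, c_k, rho):
    m0 = l1_model(x_k, x_k, f_k, g_k, J_k, c_k, rho)
    m1 = l1_model(x_k + alpha * p_k, x_k, f_k, g_k, J_k, c_k, rho)
    return m0 - m1                    # must be positive for a merit decrease

def rho_lower_bound(pi_hat):
    return np.linalg.norm(pi_hat, np.inf)   # sufficient penalty level
```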


3.2.3. Extensions. The introduction of the Wilson-Han-Powell SQP


method (i.e., the plain SQP method with a convex subproblem and a line-
search with respect to a merit function) had an immediate beneficial effect
on the performance of optimization codes. However, as is the case with all
successful innovations, it was not long before certain issues were identified
that have an impact on performance and reliability. In this section we
consider some of these issues and outline some extensions of the Wilson-
Han-Powell method that are intended to address them.
The Maratos effect and alternative merit functions. The value
of the penalty parameter in the ℓ1 merit function M in (3.5) can have a substantial effect on the overall efficiency of an SQP method. When solving a sequence of related problems, it may be possible to provide a good estimate of the optimal multipliers, and a value of ρ ≈ ‖π∗‖∞ can
be specified. When ρ is large relative to the magnitude of f , the level
surfaces of M closely resemble the constraint surface c(x) = 0. If the
constraints are changing rapidly and the SQP outer iterates become close
to a nonoptimal point near the constraints (as is the case for methods that
use a quasi-Newton approximation Bk , see Section 3.2.1), the iterates must
negotiate the base of a steep-sided curved valley. In this situation, the affine
model of the constraints provides for only a limited amount of progress
along the SQP direction, and the step α = 1 fails to reduce the value
of M. This rejection of the plain SQP step near x∗ causes a breakdown
of the superlinear convergence rate. Various strategies have been devised
to prevent this phenomenon, which is known as the “Maratos effect” (see
Maratos [134]). One approach is to use a “nonmonotone” line search that
allows the merit function to increase for a limited number of iterations (see,
e.g., Chamberlain et al. [30], and Dai and Schittkowski [45]).
Another approach, proposed by Fletcher [62], seeks to reduce the mag-
nitude of the penalty term by computing a second-order correction to the
Newton step. The second-order correction uses the step pk + sk , where the
step sk is the solution of a second subproblem:

$$\begin{array}{ll} \underset{s\in\mathbb{R}^n}{\text{minimize}} & f_k + g_k^T(p_k + s) + \tfrac{1}{2}(p_k + s)^T B_k (p_k + s)\\[4pt] \text{subject to} & c(x_k + p_k) + J_k s = 0, \quad x_k + p_k + s \ge 0. \end{array} \tag{3.14}$$

The second-order correction requires an additional constraint evaluation at


the SQP point xk + pk .
If a feasible-point active-set method is used to solve (3.6) and the
subproblem is started with the basic set from the previous iteration, the
second-order correction need be computed only if (3.6) is solved in one
iteration, which is a necessary condition for the method to be in the final
stages of convergence. If the solution of (3.14) is also identified in one
iteration and Ĵk is the matrix of basic columns of Jk, then sk satisfies the equations

$$\begin{pmatrix} \widehat B_k & \widehat J_k^T \\ \widehat J_k & 0 \end{pmatrix} \begin{pmatrix} s_k \\ -\bar\pi_k \end{pmatrix} = -\begin{pmatrix} \bigl(g_k + B_k(p_k + \eta_k)\bigr)_B \\ c(x_k + p_k) + J_k\eta_k \end{pmatrix}, \tag{3.15}$$

where π̄ k is the vector of optimal multipliers for (3.14) and ηk is defined


as in (2.21). In this situation, if some factorization of the KKT matrix
is available on termination of the solution of (3.6), the correction may
be obtained with just one solve with a different right-hand side. For an
analysis of the rate of convergence, see Fletcher [62] and Yuan [177].
Other merit functions may be defined for different choices of the norm of the constraint violations. For the infinity-norm, the ℓ∞ penalty function P∞(x; ρ) defined in (1.9) may be used in conjunction with the line-search model

$$m_k(x; \rho) = f_k + g_k^T(x - x_k) + \rho\|c_k + J_k(x - x_k)\|_\infty.$$

This model predicts a reduction in P∞(x; ρ) if pkT Bk pk ≥ 0 and ρ > ‖π̂k‖₁. Anitescu [2] considers the convergence of the ℓ∞ merit function when applied with various line-search strategies and a convex QP subproblem.

Like its ℓ1 counterpart, the ℓ∞ penalty function can exhibit the Maratos effect for large values of ρ. Merit functions that do not have this prob-
lem may be defined by using a smooth norm for the constraint violations.
In general, a merit function may be defined in terms of the primal vari-
ables only, or may include estimates of the Lagrange multipliers. A merit
function that does not suffer from the Maratos effect is the augmented
Lagrangian function:

$$M(x, \pi; \rho) \equiv f(x) - \pi^T c(x) + \tfrac{1}{2}\rho\, c(x)^T c(x), \tag{3.16}$$

where π is a multiplier estimate and ρ is a nonnegative penalty parameter.


Schittkowski [160, 161, 162], and Gill et al. [93, 83] define SQP meth-
ods in which both the primal and dual variables are modified by the line
search, with

$$x_{k+1} = x_k + \alpha_k p_k, \qquad \pi_{k+1} = \pi_k + \alpha_k q_k, \tag{3.17}$$

where the primal-dual search directions pk = x̂k − xk and qk = π̂k − πk are based on the solution (x̂k, π̂k) of a convex QP with a quasi-Newton Hessian.
When an augmented Lagrangian is used in the conventional role as an ob-
jective function for sequential unconstrained minimization, new multiplier
estimates are obtained by maximizing with respect to the dual variables.
In the SQP context, the inclusion of the dual variables as arguments for
minimization serves to make the augmented Lagrangian a continuous func-
tion of both the primal and dual variables, with the step length acting as a
continuation parameter that links the old and new values of π. If necessary,
the penalty parameter is increased to ensure that the primal-dual direction
is a descent direction for the merit function. However, it can be shown that


under typical assumptions on the problem, the penalty parameter remains


bounded (see Gill et al. [93], and Murray and Prieto [142] for details). If
the objective is convex and the feasible region is a convex set, it is often
the case that the penalty parameter never needs to be increased from an
initial value of zero.
A number of line-search SQP methods have been proposed that use
variants of the conventional augmented Lagrangian as a merit function (see,
e.g., DiPillo and Grippo [49], Bertsekas [6], Byrd, Tapia and Zhang [29], and
Anitescu [2]). A primal-dual augmented Lagrangian has been proposed by
Gill and Robinson [95]. Given an estimate πE of the multipliers π ∗ , consider
the function
$$L_A(x, \pi; \pi_E, \mu) = f(x) - c(x)^T\pi_E + \frac{1}{2\mu}\|c(x)\|_2^2 + \frac{1}{2\mu}\|c(x) + \mu(\pi - \pi_E)\|_2^2, \tag{3.18}$$

where μ is a positive inverse penalty parameter (see also, Forsgren and


Gill [71], Robinson [155], and Gill and Robinson [95]). The primal-dual aug-
mented Lagrangian has a bound-constrained minimization property anal-
ogous to the conventional augmented Lagrangian (1.11). In particular, if
πE is given the value of the optimal multiplier vector π∗ , then (x∗ , π∗ ) is a
first-order KKT point for the bound-constrained problem

$$\underset{x\in\mathbb{R}^n,\; \pi\in\mathbb{R}^m}{\text{minimize}}\; L_A(x, \pi; \pi^*, \mu) \quad\text{subject to}\quad x \ge 0.$$

Moreover, if the second-order sufficient conditions for optimality hold, then


there exists a finite μ̄ such that (x∗ , π ∗ ) is an isolated unconstrained min-
imizer of LA for all μ < μ̄. It follows that LA may be minimized simulta-
neously with respect to both the primal and dual variables. A benefit of
using LA as an SQP merit function is that it may be used in conjunction
with a regularized method for solving the QP subproblem (see Section A.3
of the Appendix for details).
We conclude this section by mentioning methods that avoid the need
for a merit function altogether by generating iterates that are always feasi-
ble. In many physical and engineering applications, the constraint functions
not only characterize the desired properties of the solution, but also define
a region in which the problem statement is meaningful (for example, f (x)
or some of the constraint functions may be undefined outside the feasible
region). In these applications, an interior point can usually be determined
trivially. Interior methods are therefore highly appropriate for this class
of problem. However, several SQP methods have been proposed for opti-
mization in this context, see, e.g., Lawrence and Tits [127], and Kostreva
and Chen [124, 125]. These methods are suitable for problems that have
only inequality constraints, the only exception being linear equality con-


straints, which can be kept feasible at every iterate (see, e.g., Gill, Murray
and Wright [94]).

Formulation of the QP subproblem. A potential difficulty associated


with SQP methods based on the direct linearization of the nonlinear con-
straints is that the QP subproblem may be infeasible. This can be caused
by the nonlinear constraints being infeasible, or by a poor linearization at
the current iterate. In the context of the Wilson-Han-Powell method, this
problem may be resolved by perturbing the QP subproblem so that the con-
straints always have a feasible point. The magnitude of the perturbation is
then reduced or eliminated as the iterates converge. Powell [153] focused
on the case of an infeasible linearization and considered the modified QP:

$$\begin{array}{ll} \underset{x\in\mathbb{R}^n,\;\theta\in\mathbb{R}}{\text{minimize}} & f_k + g_k^T(x - x_k) + \tfrac{1}{2}(x - x_k)^T B_k(x - x_k) + \tfrac{1}{2}\rho_k(1 - \theta)^2\\[4pt] \text{subject to} & (1 - \theta)c_k + J_k(x - x_k) = 0, \quad x + \theta\,[\,x_k\,]^- \ge 0, \end{array} \tag{3.19}$$

where [ x ]− is the vector with components max{−xi , 0}, and θ is an addi-


tional variable that is driven to zero by increasing the nonnegative penalty
parameter ρk . The modified QP is always feasible—for example, the point
(x, θ) = (xk , 1) satisfies the constraints.
Burke [21, 22], and Burke and Han [23] use a different approach that is
based on the observation that the problem (1.2) is actually two problems in
one: the feasibility problem of satisfying the constraints, and the optimality
problem of minimizing f . They define a line-search algorithm that has the
primary goal of attaining feasibility.
The computation of the line-search direction is organized into two
phases. The first phase ignores the objective and computes a descent di-
rection for a function that measures the distance of an arbitrary point to
the set of feasible points for the nonlinear problem. The required direction
is computed by minimizing the distance to the feasible set for the linearized
constraints. For the second phase, the constraint residuals corresponding
to the optimal value of the distance function are used to modify the con-
straints of the conventional QP subproblem. The modified QP is always
feasible, and the resulting direction is used in a line search with a merit
function that includes a term involving the value of the distance function.
Under certain assumptions, this procedure provides a sequence that con-
verges to a first-order stationary point of either the original problem or the
distance function.
The definition of the distance function requires a choice of norm, al-
though Burke and Han provide a general analysis that is independent of the
norm. For simplicity, we describe the computations for each phase when
the distance function is defined in terms of the one-norm. Given current
values of parameters σk and βk such that 0 < σk ≤ βk , the first phase
involves the solution of the linear program


$$\begin{array}{ll} \underset{x,\,v\in\mathbb{R}^n;\; u\in\mathbb{R}^m}{\text{minimize}} & e^T u + e^T v\\[4pt] \text{subject to} & -u \le c_k + J_k(x - x_k) \le u, \quad x + v \ge 0, \quad v \ge 0,\\ & -\sigma_k e \le x - x_k \le \sigma_k e. \end{array} \tag{3.20}$$
This problem gives vectors u and v of least one-norm for which the constraints ck + Jk(x − xk) = u, x + v ≥ 0 and ‖x − xk‖∞ ≤ σk are feasible. If
the original linearized constraints are feasible, then the work necessary to
solve problem (3.20) is comparable to that of the feasibility phase of a two-
phase active-set method for the plain QP subproblem (see Section A.1).
The difference is the extra expense of locating a bounded feasible point
with least-length distance from xk . Let xF denote the computed solution
of the phase-1 problem (3.20). The computation for phase 2 involves the
solution of the QP:
$$\begin{array}{ll} \underset{x\in\mathbb{R}^n}{\text{minimize}} & f_k + g_k^T(x - x_k) + \tfrac{1}{2}(x - x_k)^T B_k(x - x_k)\\[4pt] \text{subject to} & c_k + J_k(x - x_k) = \hat c_k(x_F), \quad x + [\,x_F\,]^- \ge 0,\\ & -\beta_k e \le x - x_k \le \beta_k e, \end{array} \tag{3.21}$$
where, as usual, ĉk(x) denotes the vector of linearized constraint functions ĉk(x) = ck + Jk(x − xk). The phase-2 problem (3.21) is a convex program with bounded solution x̂k (say). This solution is used to define the search direction pk = x̂k − xk for a line search on the merit function

$$M(x) = f(x) + \rho_k\|c(x)\|_1 + \rho_k\|[\,x\,]^-\|_1.$$
Once xk+1 = xk + αk pk has been determined, the positive-definite approx-
imate Hessian Bk is updated as in the conventional Han-Powell method.
For details on how the parameters ρk , σk and βk are updated, the reader is
referred to Burke and Han [23]. Other related variants of the Wilson-Han-
Powell formulation are proposed by Liu and Yuan [130], and Mo, Zhang
and Wei [135].
Another approach to the treatment of infeasibility is to modify the
original nonlinear constraints so that the linearized constraints of the QP
subproblem are always feasible. This is the approach taken in the method
of SNOPT (see Gill, Murray and Saunders [83]). The method proceeds
to solve (1.2) as given, using QP subproblems based on the conventional
linearization of the nonlinear constraints. If a QP subproblem proves to be
infeasible or unbounded (or if the Lagrange multiplier estimates become
large), SNOPT enters nonlinear elastic mode and switches to the nonlinear
elastic problem (1.4). The QP subproblem for the nonlinear elastic problem
is given by
$$\begin{array}{ll} \underset{x\in\mathbb{R}^n;\; u,v\in\mathbb{R}^m}{\text{minimize}} & \hat f_k(x) + \rho_k e^T u + \rho_k e^T v\\[4pt] \text{subject to} & c_k + J_k(x - x_k) - u + v = 0, \quad x \ge 0, \; u \ge 0, \; v \ge 0, \end{array} \tag{3.22}$$


where f̂k(x) + ρk eT u + ρk eT v is the composite QP objective and ρk is a nonnegative penalty parameter or elastic weight analogous to the quantity
defined in Section 1.3. This problem is always feasible and the solution is
used in conjunction with a merit function defined in terms of the nonlinear
elastic problem (1.4).
Quasi-Newton updates and indefiniteness. A necessary condition for
Q-superlinear convergence is that the approximate Hessian matrices {Bk }
satisfy
$$\lim_{k\to\infty} \frac{\bigl\|Z_k Z_k^T\bigl(B_k - H(x^*, \pi^*)\bigr) Z_k Z_k^T d_k\bigr\|}{\|d_k\|} = 0,$$
where Zk is the matrix defined in (3.11) (see Boggs and Tolle [9]). The
definition of yk and dk should ensure that this condition is satisfied as the
solution is approached, so that Q-superlinear convergence is not inhibited.
One possible modification uses the intermediate point xF defined by
equations (3.11). If xF is known, new values of dk and yk are computed
based on evaluating the nonlinear functions at the point wk = xk +αk (xF −
xk ). The BFGS update is then attempted using the update pair:
dk = xk+1 − wk , yk = ∇x L(xk+1 , πk+1 , zk+1 ) − ∇x L(wk , πk+1 , zk+1 ).
The purpose of this modification is to exploit the properties of the reduced
Hessian in the neighborhood of a local minimizer of (2.4). With this choice
of wk, the change in variables is dk = xk+1 − wk = αk pT, where pT is the vector x̂k − xF (see (3.11) above). Then,

$$y_k^T d_k = \alpha_k y_k^T p_T \approx \alpha_k^2\, p_T^T H(w_k, \pi_{k+1})\, p_T = \alpha_k^2\, p_Z^T Z_k^T H(w_k, \pi_{k+1}) Z_k\, p_Z.$$
It follows that ykTdk approximates the curvature of the reduced Hessian,
which is positive definite sufficiently close to an isolated local minimizer of
(2.4). If this modification does not provide sufficiently positive approximate
curvature, no update is made. An additional function evaluation is required
at wk , but the modification is rarely needed more than a few times—even
when the Hessian of the Lagrangian has negative eigenvalues at a solution.
(For further information, see Gill, Murray and Saunders [83].)
Large-scale Hessians. If the number of variables is large, conventional
quasi-Newton methods are prohibitively expensive because of the need to
store the (dense) matrix Bk. A limited-memory approach uses a fixed number of vectors, say ℓ, to define a positive-definite approximation to H(xk, πk) based on curvature information accrued during the most recent ℓ iterations. Let ℓ be preassigned (say ℓ = 10), and consider any iteration k such that k ≥ ℓ − 1. Given any initial positive-definite approximation Bk(0) to H(xk, πk), consider the sequence of matrices {Bk(i)}, for i = k − ℓ, k − ℓ + 1, . . . , k, such that

$$B_k^{(k-\ell)} = B_k^{(0)}, \qquad B_k^{(i+1)} = B_k^{(i)} + v_i v_i^T - u_i u_i^T, \quad i = k-\ell, \dots, k-1,$$


where the {(ui, vi)} are ℓ vector pairs with each (ui, vi) defined in terms of (dk−ℓ, yk−ℓ), . . . , (di−1, yi−1) (cf. (3.10)) via

$$u_i = \frac{1}{(d_i^T B_k^{(i)} d_i)^{1/2}}\, B_k^{(i)} d_i, \qquad\text{and}\qquad v_i = \frac{1}{(y_i^T d_i)^{1/2}}\, y_i.$$

Similar limited-memory quasi-Newton approximations are described by


Nocedal and Wright [143], Buckley and LeNir [15, 16] and Gilbert and
Lemaréchal [77]. More elaborate schemes are given by Liu and Nocedal
[129], Byrd, Nocedal, and Schnabel [27], and Gill and Leonard [80], and
some have been evaluated by Morales [136].
The definition of Bk requires the ℓ pairs (ui, vi). Each of the vectors ui (k − ℓ ≤ i ≤ k − 1) involves the product Bk(i) di, which is computed using the recurrence relation

$$B_k^{(i)} d_i = B_k^{(0)} d_i + \sum_{j=k-\ell}^{i-1} \bigl((v_j^T d_i)\, v_j - (u_j^T d_i)\, u_j\bigr).$$

For the vectors vi (k − ℓ ≤ i ≤ k − 1) and scalars vjT di (k − ℓ ≤ j ≤ i − 1), only vk−1 and vjT dk−1 (k − ℓ ≤ j ≤ k − 2) need to be computed at iteration k, as the other quantities are available from the previous iteration.
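A sketch of the resulting limited-memory product Bk z, assuming the pairs (ui, vi) have been accumulated as described and that Bk(0) is diagonal; the work is proportional to the memory length ℓ.

```python
# Limited-memory product B_k z for B_k = B_k^(0) + sum_i (v_i v_i^T - u_i u_i^T).
import numpy as np

def limited_memory_product(B0_diag, U_pairs, V_pairs, z):
    w = B0_diag * z                        # diagonal seed B_k^(0) z
    for u_i, v_i in zip(U_pairs, V_pairs):
        w += (v_i @ z) * v_i - (u_i @ z) * u_i
    return w
```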
A separate calculation may be used to update the diagonals of Bk from (3.8). On completion of iteration k, these diagonals form the next positive-definite Bk+1(0). Then, at the kth iteration, we define the approximate Hessian

$$B_k = B_k^{(k)} = B_k^{(0)} + V_k V_k^T - U_k U_k^T,$$

where Uk = ( uk−ℓ uk−ℓ+1 · · · uk−1 ) and Vk = ( vk−ℓ vk−ℓ+1 · · · vk−1 ).


It must be emphasized that Bk is not computed explicitly. Many sparse
QP solvers access Bk by requesting products of the form Bk u. These are
computed with work proportional to ℓ. For situations where the QP solver
solves an explicit sparse system of the form (3.7), the solution may be found
using the bordered matrix
$$\begin{pmatrix} \widehat B_k^{(0)} & \widehat J_k^T & \widehat V_k & \widehat U_k\\ \widehat J_k & 0 & 0 & 0\\ \widehat V_k^T & 0 & -I & 0\\ \widehat U_k^T & 0 & 0 & I \end{pmatrix} \begin{pmatrix} \hat p_k \\ -\hat\pi_k \\ r \\ s \end{pmatrix} = -\begin{pmatrix} (g_k + B_k\eta_k)_B \\ c_k + J_k\eta_k \\ 0 \\ 0 \end{pmatrix},$$

where B̂k(0), Ĵk, V̂k and Ûk denote the matrices of basic components of Bk(0), Jk, Vk and Uk. Following [88, Section 3.6.2], if we define

$$K_0 = \begin{pmatrix} \widehat B_k^{(0)} & \widehat J_k^T\\ \widehat J_k & 0 \end{pmatrix}, \qquad S = \begin{pmatrix} -I & 0\\ 0 & I \end{pmatrix} - \begin{pmatrix} \widehat V_k^T\\ \widehat U_k^T \end{pmatrix} K_0^{-1} \begin{pmatrix} \widehat V_k & \widehat U_k \end{pmatrix},$$


it would be efficient to work with a sparse factorization of K0 and dense


factors of its Schur complement S. (For a given QP subproblem, U and V
are constant, but changes to J0k would be handled by appropriate updates
to the Schur complement. See Section A.4.2 of the Appendix.) For general
QP solvers that require an explicit sparse Hessian, the limited-memory up-
dates can be applied implicitly by including additional linear equality con-
straints in the QP subproblem, see Gould and Robinson [107]. Bradley [12]
describes a BFGS limited-memory method for SQP that employs a diagonal
approximation in conjunction with a circular buffer.
In practice, the quasi-Newton approximation may become indefinite
because of rounding error and it is better numerically to write Bk in the
form Bk = GkT Gk, where Gk is the product of elementary matrices

$$G_k = G_k^{(0)} \prod_{j=k-\ell}^{k-1} \bigl(I + d_j w_j^T\bigr), \tag{3.23}$$

with Bk(0) = Gk(0)T Gk(0) and wj = (±vj − uj)/(djT Bk(j) dj)^{1/2} (see Brodlie,

Gourlay and Greenstadt [13], Dennis and Schnabel [48], and Gill, Murray
and Saunders [83]). The sign of vj may be chosen to minimize the rounding
error in computing wj . The quantities (dj , wj ) are stored for each j. During
outer iteration k, the QP solver accesses Bk by requesting products of the form Bk z. These are computed with work proportional to ℓ using the recurrence relations:

$$z \leftarrow z + (w_j^T z)\, d_j, \quad j = k-1 : k-\ell; \qquad z \leftarrow G_k^{(0)} z;$$
$$t \leftarrow G_k^{(0)T} z; \qquad t \leftarrow t + (d_j^T t)\, w_j, \quad j = k-\ell : k-1.$$

Products of the form uT Bk u are easily and safely computed as ‖z‖₂² with z = Gk u.
In a QP solver that updates a Schur complement matrix rather than an explicit sparse Hessian, the system (3.7) with Bk = GkT Gk is equivalent to

$$\begin{pmatrix} \widehat B_k^{(0)} & \widehat J_k^T & \hat u_{k-\ell} & \hat w_{k-\ell} & \cdots & \hat u_{k-1} & \hat w_{k-1}\\ \widehat J_k & & & & & & \\ \hat u_{k-\ell}^T & & \gamma_{k-\ell} & -1 & & & \\ \hat w_{k-\ell}^T & & -1 & & & & \\ \vdots & & & & \ddots & & \\ \hat u_{k-1}^T & & & & & \gamma_{k-1} & -1\\ \hat w_{k-1}^T & & & & & -1 & \end{pmatrix} \begin{pmatrix} \hat p_k\\ -\hat\pi_k\\ r_{k-\ell}\\ s_{k-\ell}\\ \vdots\\ r_{k-1}\\ s_{k-1} \end{pmatrix} = -\begin{pmatrix} (g_k + B_k\eta_k)_B\\ c_k + J_k\eta_k\\ 0\\ 0\\ \vdots\\ 0\\ 0 \end{pmatrix},$$

where d̂j = (dj)B, ûj = (Bk(j) dj)B, and γj = d̂jT ûj (see Gill, Murray and Saunders [83], and Huynh [123]).


An alternative form of the limited-memory update is used by Gill,


Murray and Saunders [83]. Let r and k denote two outer iterations such that r ≤ k ≤ r + ℓ. At iteration k the BFGS approximate Hessian may be expressed in terms of ℓ updates to a positive-definite Br:

$$B_k = B_r + \sum_{i=r}^{k-1} \bigl(v_i v_i^T - u_i u_i^T\bigr), \tag{3.24}$$

where ui = Bi di/(diT Bi di)^{1/2} and vi = yi/(yiT di)^{1/2}. In this scheme, the k − r pairs (ui, vi) do not need to be recomputed for each update. On
k − r pairs (ui , vi ) do not need to be recomputed for each update. On


completion of iteration k = r + , a total of  pairs have been accumulated,
and the storage is “reset” by discarding the previous updates. Moreover,
the definition of ui is simplified by the identity Bi di = −αi ∇x L(- xi , π
-i , z-i )
that follows from the QP optimality conditions (2.18). As in the previous
scheme, a separate calculation may be used to update the diagonals of Bk
from (3.8). On completion of iteration k = r + , these diagonals form the
next positive-definite Br (with r = k + 1).
This scheme has an advantage in the SQP context when the constraints
are linear: the reduced Hessian for the QP subproblem can be updated
between outer iterations (see Section A.4.1).
Early termination of QP subproblems. SQP theory usually assumes
that the QP subproblems are solved to optimality. For large problems with
a poor starting point and B0 = I, many thousands of iterations may be
needed for the first QP, building up many free variables that are promptly
eliminated by more thousands of iterations in the second QP. In general, it
seems wasteful to expend much effort on any QP before updating Bk and
the constraint linearization.
Any scheme for early termination must be implemented in a way that
does not compromise the reliability of the SQP method. For example, sup-
pose that the QP iterations are terminated after an arbitrary fixed number
of steps. If a primal active-set method is used to solve the subproblem,
the multipliers associated with QP constraints that have not been opti-
mized will be negative. Using these multipliers directly (or first setting
them to zero) in the definition of the Lagrangian function is problematic.
The resulting search direction may not be a descent direction for the merit
function, or may require the penalty parameter to be increased unnecessar-
ily. For example, the value of the lower bound on the penalty parameter
for the ℓ1 merit function involves the values of the QP multipliers (see (3.13)). Dembo and Tulowitzki [47] suggest using a dual feasible active-
set method for the QP subproblem and terminating the inner iterations
when the norm of a potential search direction pk = x̂k − xk is small. Dual
feasible active-set methods have the advantage that the approximate multi-
pliers are nonnegative, but a terminated iteration will have some negative
primal variables—this time making the definition of the search direction
problematic.


Murray and Prieto [142] suggest another approach to terminating the


QP solutions early, requiring that at least one QP subspace stationary point
be reached (see Definition A.1 of the Appendix). The associated theory im-
plies that any subsequent point x-k generated by a special-purpose primal-
feasible QP solver gives a sufficient decrease in the augmented Lagrangian
merit function (3.16), provided that - xk − xk  is nonzero.
Another way to save inner iterations safely during the early outer
iterations is to suboptimize the QP subproblem. At the start of an outer
iteration, many variables are fixed at their current values (i.e., xi is fixed
at (xk )i ) and an SQP outer iteration is performed on the reduced problem
(solving a smaller QP to get a search direction for the nonfixed variables).
Once a solution of the reduced QP is found, the fixed variables are freed,
and the outer iteration is completed with a “full” search direction that
happens to leave many variables unaltered because pi = (x̂k − xk)i = 0
for the temporarily fixed variables. At each step, the conventional theory
for the reduction in the merit function should guarantee progress on the
associated reduced nonlinear problem. In practice, it may not be obvious
which variables should be fixed at each stage, the reduced QP could be
infeasible, and degeneracy could produce a zero search direction. Instead,
the choice of which variables to fix is made within the QP solver. In
the method of SNOPT, QP iterations are performed on the full problem
until a feasible point is found or elastic mode is entered. The iterations
are continued until certain limits are reached and not all steps have been
degenerate. At this point all variables such that xi = (xk )i are frozen at
their current value and the reduced QP is solved to optimality. With this
scheme it is safe to impose rather arbitrary limits, such as limits on the
number of iterations (for the various termination conditions that may be
applied, see Gill, Murray and Saunders [83, 84]). Note that this form of
suboptimization enforces the condition ((x̂k − xk) · ẑk)i = 0 for the frozen variables, and so the nonoptimized variables have no effect on the magnitude
of the penalty parameter in (3.13).

3.3. Sequential unconstrained methods. Fletcher [61] observed


that the ℓ1 penalty function (1.8) can be minimized subject to bounds by solving a sequence of nondifferentiable unconstrained subproblems of the form

$$\underset{x\in\mathbb{R}^n}{\text{minimize}}\; \hat f_k(x) + \rho\|\hat c_k(x)\|_1 + \rho\|[\,x\,]^-\|_1, \tag{3.25}$$

where ĉk(x) denotes the linearized constraint functions ĉk(x) = ck + Jk(x − xk), and [ v ]− is the vector with components max{−vi, 0}. In this case the bound constraints are
not imposed explicitly. Fletcher proposed minimizing this function using a
trust-region method, although a line-search method would also be appropri-
ate, particularly if Hk were positive definite. The trust-region subproblem
has the form


$$\begin{array}{ll} \underset{d\in\mathbb{R}^n}{\text{minimize}} & \hat f_k(x_k + d) + \rho\|\hat c_k(x_k + d)\|_1 + \rho\|[\,x_k + d\,]^-\|_1\\[4pt] \text{subject to} & \|d\| \le \delta_k. \end{array} \tag{3.26}$$

The trust-region radius δk is increased or decreased based on condition


(3.5), where M(x; ρ) = P1 (x; ρ) and mk (x; ρ) is defined in terms of an affine
or quadratic model of the modified Lagrangian (see (3.12)). This approach
forms the basis of Fletcher’s “S1 -QP method”. Each subproblem has a
piecewise quadratic objective function and has the same complexity as a
quadratic program of the form (2.1). If the infinity norm is used to define
the size of the trust region, the subproblem is equivalent to the smooth
quadratic program

$$\begin{array}{ll} \underset{d,w\in\mathbb{R}^n;\; u,v\in\mathbb{R}^m}{\text{minimize}} & \hat f_k(x_k + d) + \rho e^T u + \rho e^T v + \rho e^T w\\[4pt] \text{subject to} & \hat c_k(x_k + d) - u + v = 0, \quad u \ge 0, \; v \ge 0,\\ & -\delta_k e \le d \le \delta_k e, \quad x_k + d + w \ge 0, \; w \ge 0, \end{array}$$

where, analogous to (1.4), the vectors u and v may be interpreted as the


positive and negative parts of the affine function ck + Jk d. A benefit of
this formulation is that a solution of (3.26) always exists, even when the
linearized constraints of the plain QP subproblem are inconsistent. Observe
that the unconstrained subproblem (3.25) is defined in terms of f̂k(x), the model of the modified Lagrangian (see (2.2)). This feature is crucial because it implies that if the trust-region radius and penalty parameter are sufficiently large in the neighborhood of an isolated solution, the Sℓ1QP subproblem is the same as the plain SQP subproblem (2.1). Nevertheless, the implicit minimization of the ℓ1 penalty function means that there is the possibility of the Maratos effect. For the trust-region approach, the
second-order correction may be determined from the quadratic program

$$\begin{array}{ll} \underset{s,w\in\mathbb{R}^n;\; u,v\in\mathbb{R}^m}{\text{minimize}} & \hat f_k(x_k + d_k + s) + \rho e^T u + \rho e^T v + \rho e^T w\\[4pt] \text{subject to} & J_k s - u + v = -c(x_k + d_k), \quad u \ge 0, \; v \ge 0,\\ & -\delta_k e \le d_k + s \le \delta_k e, \quad x_k + d_k + s + w \ge 0, \; w \ge 0. \end{array} \tag{3.27}$$

Yuan [177] gives an analysis of the superlinear convergence of trust-region


methods that use the second-order correction.
The Sℓ1QP approach can be used in conjunction with other unconstrained merit functions. Many of these extensions lead to a subproblem that is equivalent to a quadratic program. The "Sℓ∞QP method" uses the
trust-region subproblem

$$\begin{array}{ll} \underset{d\in\mathbb{R}^n}{\text{minimize}} & \hat f_k(x_k + d) + \rho\|\hat c_k(x_k + d)\|_\infty + \rho\|[\,x_k + d\,]^-\|_\infty\\[4pt] \text{subject to} & \|d\|_\infty \le \delta_k, \end{array} \tag{3.28}$$


which is equivalent to the quadratic program

$$\begin{array}{ll} \underset{d\in\mathbb{R}^n;\; \theta,\sigma\in\mathbb{R}}{\text{minimize}} & \hat f_k(x_k + d) + \rho\theta + \rho\sigma\\[4pt] \text{subject to} & -\theta e \le \hat c_k(x_k + d) \le \theta e, \quad \theta \ge 0,\\ & -\delta_k e \le d \le \delta_k e, \quad x_k + d + \sigma e \ge 0, \; \sigma \ge 0, \end{array} \tag{3.29}$$

see, e.g., Yuan [178], and Exler and Schittkowski [57]. A QP problem for the
second-order correction may be defined analogous to (3.27). For a general
discussion of the convergence properties of nondifferentiable exact penalty
functions in the SQP context, see Fletcher [64], Burke [22], and Yuan [176].
3.4. Filter methods. The definition of the merit function in the
Han-Powell method or the nonsmooth objective function in the sequential
unconstrained optimization method requires the specification of a penalty
parameter that weights the effect of the constraint violations against the
value of the objective. Another way of forcing convergence is to use a filter,
which is a two-dimensional measure of quality based on f (x) and c(x),
where we assume that x ≥ 0 is satisfied throughout. A filter method
requires that progress be made with respect to the two-dimensional function $\big(\|c(x)\|,\ f(x)\big)$. Using the conventional notation of filter methods, we define $h(x) = \|c(x)\|$ as the measure of infeasibility of the equality constraints, and use $(h_j, f_j)$ to denote the pair $\big(h(x_j),\ f(x_j)\big)$.
The two-dimensional measure provides the conditions for a point $\bar x$ to be "better" than a point $\hat x$. Given two points $\bar x$ and $\hat x$, the pair $\big(h(\bar x), f(\bar x)\big)$ is said to dominate the pair $\big(h(\hat x), f(\hat x)\big)$ if
$$
  h(\bar x) \le \beta h(\hat x) \quad\text{and}\quad f(\bar x) \le f(\hat x) - \gamma h(\bar x),
$$
where $\beta, \gamma \in (0,1)$ are constants with $1-\beta$ and $\gamma$ small (e.g., $\beta = 1-\gamma$ with $\gamma = 10^{-3}$). (For brevity, we say that $\bar x$ dominates $\hat x$, although it must be
emphasized that only the objective value and constraint norm are stored.)
A filter $\mathcal F$ consists of a list of entries $(h_j, f_j)$ such that no entry dominates another. (This filter is the so-called sloping filter proposed by Chin [31] and Chin and Fletcher [32]. The original filter proposed by Fletcher and Leyffer [66, 67] uses $\gamma = 0$ and $\beta = 1$.)
A pair $\big(h(x_k), f(x_k)\big)$ is said to be "acceptable to the filter" $\mathcal F$ if and only if it is not dominated by any entry in the filter, i.e.,

$$
  h(x_k) \le \beta h_j \quad\text{or}\quad f(x_k) \le f_j - \gamma h(x_k) \tag{3.30}
$$

for every $(h_j, f_j) \in \mathcal F$. In some situations, an accepted point $(h(x_k), f(x_k))$ is added to the filter. This operation adds $(h(x_k), f(x_k))$ to the list of entries $(h_j, f_j)$ in $\mathcal F$, and removes any entries that are dominated by the new pair. The test (3.30) provides an important inclusion property: if a pair $(h, f)$ is added to the filter, then the set of points that are unacceptable for the new filter always includes the points that are unacceptable for the old filter.
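The filter bookkeeping is simple to implement. The following minimal Python sketch (the class name and default constants are illustrative, with $\beta = 1 - \gamma$ and $\gamma = 10^{-3}$ as above) implements the acceptability test (3.30) and the update that removes dominated entries.

```python
class Filter:
    """Sloping filter: a list of (h, f) pairs, none dominating another."""

    def __init__(self, beta=1.0 - 1e-3, gamma=1e-3):
        self.beta, self.gamma = beta, gamma
        self.entries = []  # list of (h_j, f_j) pairs

    def acceptable(self, h, f):
        # Test (3.30): acceptable iff not dominated by any filter entry.
        return all(h <= self.beta * hj or f <= fj - self.gamma * h
                   for hj, fj in self.entries)

    def add(self, h, f):
        # Drop existing entries dominated by the new pair, then store it.
        self.entries = [(hj, fj) for hj, fj in self.entries
                        if not (h <= self.beta * hj and f <= fj - self.gamma * h)]
        self.entries.append((h, f))

F = Filter()
F.add(1.0, 5.0)
print(F.acceptable(0.5, 5.5))  # True: much less infeasible, slightly worse f
print(F.acceptable(1.0, 5.0))  # False: dominated by the stored entry
```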
As in the Burke-Han approach of Section 3.2.3, the principal goal of
a filter method is the attainment of feasibility. An important property of
the filter defined above is that if there is an infinite sequence of iterations
in which (h(xk ), f (xk )) is entered into the filter, and {f (xk )} is bounded
below, then h(xk ) → 0 (see Fletcher, Leyffer and Toint [68]).
3.4.1. Trust-region filter methods. Fletcher and Leyffer [66, 67]
propose a trust-region filter method in which a filter is used to accept
or reject points generated by a plain SQP subproblem with a trust-region
constraint. Below we give a brief description of the variant of the Fletcher-
Leyffer method proposed by Fletcher, Leyffer and Toint [68]. The filter is
defined in terms of the one-norm of the constraint violations, i.e., $h(x) = \|c(x)\|_1$, and the trust-region subproblem is given by

$$
  \begin{array}{ll}
  \displaystyle\min_{x\in\mathbb{R}^n} & f_k + g_k^T(x - x_k) + \tfrac12 (x - x_k)^T H_k (x - x_k) \\[4pt]
  \mbox{subject to} & c_k + J_k(x - x_k) = 0, \quad x \ge 0, \quad \|x - x_k\|_\infty \le \delta_k.
  \end{array} \tag{3.31}
$$

To simplify the discussion, we start by assuming that the QP subproblem (3.31) remains feasible. In this case, the filter method generates a sequence of points $\{x_k\}$ and a corresponding sequence of filters $\{\mathcal F_k\}$ such that $x_k$ is acceptable to the filter $\mathcal F_k$ and $x_{k+1} = \hat x_k$, where $\hat x_k$ is a global minimizer
of the QP subproblem (3.31). The use of a filter alone does not necessarily
enforce convergence to a solution of the constrained problem. For example,
if the iterates converge to an arbitrary feasible point and the infeasibility
measure h is reduced by a factor of at least β at each iteration, then the
iterates will be acceptable to the filter independently of f . This implies that
the filter must be used in conjunction with a sufficient reduction condition
analogous to (3.1), i.e.,

$$
  \Delta m_k(d_k) > 0 \quad\text{and}\quad f(x_k) - f(x_k + d_k) \ge \eta\, \Delta m_k(d_k), \tag{3.32}
$$

where $d_k = \hat x_k - x_k$, and $\Delta m_k(d_k) = m_k(x_k) - m_k(x_k + d_k)$ for a local model $m_k(x)$ of $f$ (e.g., $m_k(x) = f(x_k) + g(x_k)^T(x - x_k)$).
At the start of the kth iteration, we have a point xk and a filter
Fk−1 such that (hk , fk ) is acceptable to Fk−1 , but is not yet included
in Fk−1 (it will be shown below that xk may or may not be included
in the filter even though it constitutes an acceptable entry). The kth
iteration is analogous to that of a backtracking line-search method, except
that the backtracking steps are performed by solving the QP (3.31) with
decreasing values of the trust-region radius. The backtracking continues
until $\hat x_k$ is acceptable to the combined filter $\mathcal F_{k-1} \cup (h_k, f_k)$, and either
f (xk ) − f (xk + dk ) ≥ ηΔmk (dk ) or Δmk (dk ) ≤ 0. On termination of the
backtracking procedure, if Δmk (dk ) ≤ 0, then (hk , fk ) is added to Fk−1
(giving Fk ), otherwise Fk = Fk−1 . Finally, the next iterate is defined


as $x_{k+1} = \hat x_k$ and the trust-region radius $\delta_{k+1}$ for the next iteration is
initialized at some value greater than some preassigned minimum value
δmin . This reinitialization provides the opportunity to increase the trust-
region radius based on the change in f . For example, the trust-region
radius can be increased if the predicted reduction in $f$ is greater than some
positive factor of h.
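The backtracking logic just described can be summarized in a hedged sketch that reuses the `Filter` class above. Here `solve_qp`, `f`, `h` and `model_decrease` are hypothetical problem-specific callbacks, the restoration phase is omitted, and the trust-region reinitialization is simplified.

```python
def filter_tr_step(x_k, h_k, f_k, F, solve_qp, f, h, model_decrease,
                   delta, eta=1e-4, delta_shrink=0.5):
    """One backtracking loop of the trust-region filter method (Sec. 3.4.1).
    solve_qp(x, delta) returns a step d solving (3.31) (it should signal
    infeasibility, which would trigger restoration -- not shown here)."""
    while True:
        d = solve_qp(x_k, delta)
        x_trial = x_k + d
        dm = model_decrease(x_k, d)           # Delta m_k(d)
        ht, ft = h(x_trial), f(x_trial)
        # Acceptable to the combined filter F_{k-1} U (h_k, f_k):
        ok = F.acceptable(ht, ft) and \
             (ht <= F.beta * h_k or ft <= f_k - F.gamma * ht)
        if ok and (dm <= 0.0 or f_k - ft >= eta * dm):
            if dm <= 0.0:                     # h-type step: record (h_k, f_k)
                F.add(h_k, f_k)
            return x_trial
        delta *= delta_shrink                 # reject: shrink radius, re-solve
```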
As mentioned above, although (hk , fk ) is acceptable to Fk−1 , it is
not necessarily added to the filter. The point xk is added if and only if
Δmk (dk ) ≤ 0, in which case the QP solution predicts an increase in f , and
the primary aim of the iteration changes to that of reducing h (by allowing
f to increase if necessary). The requirement that Δmk (dk ) ≤ 0 for adding
to the filter ensures that all the filter entries have hj > 0. This is because
if hk = 0, then the QP must be compatible (even without this being an
assumption), and hence, if xk is not a KKT point, then Δmk (dk ) > 0 and
xk is not added to the filter.
Now we drop our assumption that the QP problem (3.31) is always
feasible. If a new entry is never added to the filter during the backtracking
procedure, then δk → 0 and there are two situations that can occur. If
$c(x_k) = 0$, then the problem looks like an unconstrained problem. If $f$ is reduced then we must make progress and conventional trust-region theory applies. On the other hand, if $c(x_k) \ne 0$, then reducing the trust-region ra-
dius will eventually give an infeasible QP. In this case, the method switches
to a restoration phase that focuses on minimizing h(x) subject to x ≥ 0.
In this case a restoration filter may be defined that allows nonmonotone
progress on h(x). Note that it is possible for the QP to be infeasible for
any infeasible xk . In this situation the filter method will converge to a
nonoptimal local minimizer of h(x) (just as the Han-Powell method may
converge to a nonoptimal local minimizer of the merit function).
The convergence properties of filter-SQP methods are similar to those
of methods that use a merit function. In particular, it is possible to estab-
lish convergence to either a point that satisfies the first-order necessary con-
ditions for optimality, or a point that minimizes h(x) locally (see Fletcher,
Leyffer and Toint [68] for details). It is not necessary that Hk be positive
definite, although $\hat x_k$ must be a global solution of the QP subproblem (3.31)
(see the cautionary opening remarks of the Appendix concerning the solu-
tion of indefinite QPs). Standard examples that exhibit the Maratos effect
for an SQP method with a merit function cause no difficulties for the filter
method. Although the unit step causes an increase in the constraint viola-
tion, and hence an increase in a penalty function, it also causes a decrease
in the objective and so it is acceptable to the filter. However, Fletcher and
Leyffer [67] give a simple example for which the QP solution increases both
the objective and the constraint violations, resulting in a reduction in the
trust-region radius and the rejection of the Newton step. Fletcher and Leyf-
fer propose the use of a second-order correction step analogous to (3.27).
Ulbrich [168] defines a filter that uses the Lagrangian function instead of


f and shows that superlinear convergence may be obtained without using


the second-order correction.
3.4.2. Line-search filter methods. The trust-region filter method
described in Section 3.4.1 may be modified to use a line search by solv-
ing the plain SQP subproblem and replacing the backtracking trust-region procedure by a conventional backtracking line search. In this case, the candidate pair for the filter is $\big(h(x_k + \alpha_k p_k),\ f(x_k + \alpha_k p_k)\big)$, where $\alpha_k$ is a member of a decreasing sequence of steplengths, and $p_k = \hat x_k - x_k$, with $\hat x_k$ a solution of the plain QP (2.17). Analogous to (3.32), the sufficient
decrease criteria for the objective are

$$
  \Delta m_k(\alpha_k p_k) > 0 \quad\text{and}\quad f(x_k) - f(x_k + \alpha_k p_k) \ge \eta\, \Delta m_k(\alpha_k p_k).
$$

If the trial step length is reduced below a minimum value $\alpha_k^{\min}$, the line
search is abandoned and the algorithm switches to the restoration phase.
For more details, the reader is referred to the two papers of Wächter and
Biegler [171, 170]. The caveats of the previous section concerning the def-
inition of Hk also apply to the line-search filter method. In addition, the
absence of an explicit bound on $\|x - x_k\|$ provided by the trust-region
constraint adds the possibility of unboundedness of the QP subproblem.
Chin, Rashid and Nor [33] consider a line-search filter method that
includes a second-order correction step during the backtracking procedure.
If xk + αpk is not acceptable to the filter, a second-order correction sk is
defined by solving the equality-constrained QP:

$$
  \begin{array}{ll}
  \displaystyle\min_{s\in\mathbb{R}^n} & f_k + g_k^T(p_k + s) + \tfrac12 (p_k + s)^T H_k (p_k + s) \\[4pt]
  \mbox{subject to} & c(x_k + p_k) + J_k s = 0, \quad (x_k + p_k + s)_{\mathcal A} = -\|p_k\|^{\nu} e,
  \end{array} \tag{3.33}
$$
where $\nu \in (2,3)$ and $\mathcal A(\hat x_k)$ is the active set predicted by the QP subproblem
(for a similar scheme, see Herskovits [121], and Panier and Tits [147, 148]).
Given an optimal solution sk , Chin, Rashid and Nor [33] show that under
certain assumptions, the sufficient decrease criteria

$$
  f(x_k) - f(x_k + \alpha_k p_k + \alpha_k^2 s_k) \ge \eta\, \Delta m_k(\alpha_k p_k) \quad\text{and}\quad \Delta m_k(\alpha_k p_k) > 0
$$

give a sequence {xk } with local Q-superlinear convergence.


3.5. SQP methods based on successive LP and QP. In the
MINLP context, it is necessary to solve a sequence of related nonlinear
programs, some with infeasible constraints. For maximum efficiency, it is
crucial that the active set from one problem is used to provide a warm
start for the next. A substantial benefit of SQP methods is that they are
easily adapted to accept an estimate of the active set. However, if warm
starts are to be exploited fully, it is necessary that the second derivatives of
the problem functions are available and that these derivatives are utilized
by the SQP method. Unfortunately, none of the SQP methods discussed


in Sections 3.2–3.4 are completely suitable for use with second derivatives.
The main difficulty stems from the possibility that the Hessian of the La-
grangian is indefinite, in which case the inequality constrained QP subprob-
lem is nonconvex. A nonconvex QP is likely to have many local solutions,
and may be unbounded. Some SQP methods are only well-defined if the
subproblem is convex—e.g., methods that rely on the use of a positive-
definite quasi-Newton approximate Hessian. Other methods require the
calculation of a global solution of the QP subproblem, which has the bene-
fit of ensuring that the “same” local solution is found for the final sequence
of related QPs. Unfortunately, nonconvex quadratic programming is NP-
hard, and even the seemingly simple task of checking for local optimality
is intractable when there are zero Lagrange multipliers (see the opening
remarks of the Appendix).
One approach to resolving this difficulty is to estimate the active set
using a convex programming approximation of the plain QP subproblem
(2.1). This active set is then used to define an equality-constrained QP
(EQP) subproblem whose solution may be used in conjunction with a merit
function or filter to obtain the next iterate. One of the first methods to
use a convex program to estimate the active set was proposed by Fletcher
and Sainz de la Maza [69], who proposed estimating the active set by
solving a linear program with a trust-region constraint. (Their method
was formulated first as a sequential unconstrained method for minimizing
a nonsmooth composite function. Here we describe the particular form of
the method in terms of minimizing the $\ell_1$ penalty function $P_1(x,\rho)$ defined
in (1.8).) The convex subproblem has the form

$$
  \begin{array}{ll}
  \displaystyle\min_{x\in\mathbb{R}^n} & l_k(x) = f_k + g_k^T(x - x_k) + \rho\,\|\hat c_k(x)\|_1 + \rho\,\|[\,x\,]^-\|_1 \\[4pt]
  \mbox{subject to} & \|x - x_k\| \le \delta_k,
  \end{array} \tag{3.34}
$$

which involves the minimization of a piecewise linear function subject to a trust-region constraint (cf. (3.26)). If the trust-region constraint is defined in terms of the infinity-norm, the problem (3.34) is equivalent to the linear programming (LP) problem:

$$
  \begin{array}{ll}
  \displaystyle\min_{x,w\in\mathbb{R}^n;\ u,v\in\mathbb{R}^m} & f_k + g_k^T(x - x_k) + \rho e^T u + \rho e^T v + \rho e^T w \\[4pt]
  \mbox{subject to} & c_k + J_k(x - x_k) - u + v = 0, \quad u \ge 0, \quad v \ge 0, \\
  & x_k - \delta_k e \le x \le x_k + \delta_k e, \quad x + w \ge 0, \quad w \ge 0.
  \end{array} \tag{3.35}
$$

This equivalence was the motivation for the method to be called the suc-
cessive linear programming (SLP) method. Fletcher and Sainz de la Maza
use the reduction in $P_1(x,\rho)$ predicted by the first-order subproblem (3.34) to assess the quality of the reduction $P_1(x_k,\rho) - P_1(x_k + d_k,\rho)$ defined by
a second-order method (to be defined below).
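For dense problem data, the LP (3.35) can be assembled and handed to an off-the-shelf solver. The following sketch uses `scipy.optimize.linprog`; the function name and argument conventions are ours, and no claim is made that this matches the implementation of any particular SLP code.

```python
import numpy as np
from scipy.optimize import linprog

def slp_subproblem(g, c, J, x_k, rho, delta):
    """Sketch of the SLP subproblem (3.35), variables stacked as (x, u, v, w).
    Minimizes g'(x - x_k) + rho*e'(u + v + w) subject to
    c + J(x - x_k) - u + v = 0, |x - x_k| <= delta, x + w >= 0, u,v,w >= 0."""
    n, m = len(x_k), len(c)
    cost = np.concatenate([g, rho * np.ones(2 * m + n)])
    # Equality constraints: J x - u + v = J x_k - c.
    A_eq = np.hstack([J, -np.eye(m), np.eye(m), np.zeros((m, n))])
    b_eq = J @ x_k - c
    # Inequality -x - w <= 0 expresses x + w >= 0.
    A_ub = np.hstack([-np.eye(n), np.zeros((n, 2 * m)), -np.eye(n)])
    b_ub = np.zeros(n)
    bounds = [(xi - delta, xi + delta) for xi in x_k] + \
             [(0, None)] * (2 * m + n)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n] - x_k  # the first-order step d_LP
```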


Given a positive-definite approximation $B_k$ of the Hessian of the Lagrangian, let $q_k(x)$ denote the piecewise quadratic function
$$
  q_k(x) = l_k(x) + \tfrac12 (x - x_k)^T B_k (x - x_k).
$$
Let $d_k^{LP} = \hat x_k^{LP} - x_k$, where $\hat x_k^{LP}$ is a solution of the LP (3.35), and define $\Delta l_k = l_k(x_k) - l_k(x_k + d_k^{LP})$. Then it holds that
$$
  q_k(x_k) - \min_d\, q_k(x_k + d) \ge \tfrac12\, \Delta l_k \min\{\Delta l_k/\beta_k,\ 1\},
$$
where $\beta_k = (d_k^{LP})^T B_k\, d_k^{LP} > 0$. This inequality suggests that a suitable acceptance criterion for an estimate $x_k + d$ is
$$
  P_1(x_k,\rho) - P_1(x_k + d,\rho) \ge \eta\, \Delta l_k \min\{\Delta l_k/\beta_k,\ 1\},
$$
where $\eta$ is some preassigned scalar such that $0 < \eta < \tfrac12$. This criterion is
used to determine if the new iterate $x_{k+1}$ should be set to (i) the current iterate $x_k$ (which always triggers a reduction in the trust-region radius); (ii) the second-order step $x_k + d_k$; or (iii) the first-order step $x_k + d_k^{LP}$. The test for accepting the second-order step is done first. If the second-order step fails, then the penalty function is recomputed at $x_k + d_k^{LP}$ and the test is repeated to determine if $x_{k+1}$ should be $x_k + d_k^{LP}$. Finally, the
trust-region radius is updated based on a conventional trust-region strategy
that compares the reduction in the penalty function with the reduction
predicted by the LP subproblem (the reader is referred to the original paper
for details).
Next, we consider how to define a second-order step. Let B and N
denote the final LP basic and nonbasic sets for the LP (3.35). To simplify
the description, assume that the optimal u, v and w are zero. A second-
order iterate $\hat x_k$ can be defined as the solution of an equality-constrained quadratic program (EQP) defined by minimizing the quadratic model $\hat f_k(x) = f_k + g_k^T(x - x_k) + \tfrac12 (x - x_k)^T B_k (x - x_k)$ subject to $c_k + J_k(x - x_k) = 0$, with the nonbasic variables fixed at their current values. Let $p_k = \hat x_k - x_k$, where $(\hat x_k, \hat\pi_k)$ is the primal-dual EQP solution. Let $\breve p_k$ denote the vector of components of $p_k$ in the final LP basic set $B$, with $\breve J_k$ the corresponding columns of $J_k$. The vector $(\breve p_k, \hat\pi_k)$ satisfies the KKT equations
    
$$
  \begin{pmatrix} \breve B_k & \breve J_k^T \\ \breve J_k & 0 \end{pmatrix}
  \begin{pmatrix} \breve p_k \\ -\hat\pi_k \end{pmatrix}
  = -\begin{pmatrix} (g_k + B_k \eta_k)_B \\ c_k + J_k \eta_k \end{pmatrix}, \tag{3.36}
$$
where $\breve B_k$ denotes the matrix of basic rows and columns of $B_k$, and $\eta_k$ is defined in terms of the final LP nonbasic set, i.e.,
$$
  (\eta_k)_i = \begin{cases} (\hat x_k^{LP} - x_k)_i & \text{if } i \in N, \\[2pt] 0 & \text{if } i \notin N. \end{cases}
$$

There are many ways of solving these KKT equations. The most appropri-
ate method will depend on certain basic properties of the problem being


solved, which include the size of the problem (i.e., the number of variables
and constraints); whether or not the Jacobian is dense or sparse; and how
the approximate Hessian is stored (see, e.g., Section 3.2.1). Fletcher and
Sainz de la Maza suggest finding an approximate solution of the EQP using
a quasi-Newton approximation of the reduced Hessian matrix (see Coleman
and Conn [36]).
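A dense-algebra sketch of the KKT solve (3.36) follows; the function and argument names are ours, and a production code would of course use a sparse symmetric indefinite factorization rather than `np.linalg.solve`.

```python
import numpy as np

def eqp_step(B_k, J_k, g_k, c_k, basic, eta):
    """Solve the EQP KKT system (3.36) for the basic step and multipliers.
    B_k: (n,n) Hessian approximation; J_k: (m,n) Jacobian; basic: indices in
    the final LP basic set B; eta: the vector eta_k (nonzero only in N)."""
    nB, m = len(basic), J_k.shape[0]
    BB = B_k[np.ix_(basic, basic)]          # basic rows and columns of B_k
    JB = J_k[:, basic]                      # basic columns of J_k
    K = np.block([[BB, JB.T], [JB, np.zeros((m, m))]])
    rhs = -np.concatenate([(g_k + B_k @ eta)[basic], c_k + J_k @ eta])
    sol = np.linalg.solve(K, rhs)
    p_basic, neg_pi = sol[:nB], sol[nB:]
    return p_basic, -neg_pi                 # step on basics, multipliers pi
```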
The results of Fletcher and Sainz de la Maza may be used to show that,
under reasonable nondegeneracy and second-order conditions, the active set
of the LP subproblem (3.35) ultimately predicts that of the smooth variant
of the penalty function at limit points of {xk }. This implies fast asymptotic
convergence. Fletcher and Sainz de la Maza did not consider the use of
exact second derivatives in their original paper, and it took more than 12
years before the advent of reliable second-derivative trust-region and filter
methods for the EQP subproblem allowed the potential of SLP methods
to be realized. Chin and Fletcher [32] proposed the use of a trust-region
filter method that does not require the use of the $\ell_1$ penalty function. For
a similar approach that uses a filter, see Fletcher et al. [65]. In a series of
papers, Byrd, Gould, Nocedal and Waltz [25, 26] proposed a method that
employs an additional trust region to safeguard the EQP direction. They
also define an appropriate method for adjusting the penalty parameter.
Recently, Morales, Nocedal and Wu [137], and Gould and Robinson [105,
106, 107] have proposed SQP methods that identify the active set using a
convex QP based on a positive-definite BFGS approximation of the Hessian.

4. SQP issues relevant to MINLP.

4.1. Treatment of linear constraints. An important feature of


SQP methods is that it is relatively easy to exploit the special proper-
ties of linear constraints. This can be an advantage when a method for
MINLP solves a sequence of NLPs that differ by only the number of linear
constraints that are imposed. Suppose that the general linear constraints
are a subset of the constraints defined by c(x) = 0, e.g., cL (x) = Ax−b = 0.
Then a feasible point for the linear constraints cL (x) = Ax − b = 0, x ≥ 0,
can be found by solving the elastic problem

$$
  \begin{array}{ll}
  \displaystyle\min_{x\in\mathbb{R}^n;\ u,v\in\mathbb{R}^m} & \rho e^T u + e^T v \\[4pt]
  \mbox{subject to} & Ax - u + v = b, \quad x \ge 0, \quad u \ge 0, \quad v \ge 0.
  \end{array} \tag{4.1}
$$

This is equivalent to minimizing the one-norm of the general linear constraint violations subject to the simple bounds. An important property of linear constraints is that it is possible to determine the solvability of a system of linear inequalities in a finite number of steps. If the linear constraints are infeasible ($u \ne 0$ or $v \ne 0$), then the SQP algorithm can terminate without computing the nonlinear functions. Otherwise, all subsequent iterates satisfy the linear constraints.
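For dense data, (4.1) is again just a linear program. The sketch below (taking the elastic weights equal, so the objective is the plain one-norm of the violations) uses `scipy.optimize.linprog`; all names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def linear_phase1(A, b):
    """Sketch of the elastic feasibility problem (4.1): seek x >= 0 with
    Ax = b by minimizing e'u + e'v subject to Ax - u + v = b."""
    m, n = A.shape
    cost = np.concatenate([np.zeros(n), np.ones(2 * m)])
    A_eq = np.hstack([A, -np.eye(m), np.eye(m)])
    res = linprog(cost, A_eq=A_eq, b_eq=b,
                  bounds=[(0, None)] * (n + 2 * m), method="highs")
    x = res.x[:n]
    return x, res.fun < 1e-8   # zero elastic cost => linear constraints feasible
```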


4.2. Treatment of infeasibilities. If the constraints of the QP sub-


problem (2.1) have no feasible point, then no QP solution exists. This
could be for two reasons, either: (i) the NLP is feasible but the quadratic
programming subproblem is locally infeasible, or (ii) the NLP is infeasible.
If the NLP is convex, then infeasibility of the quadratic programming sub-
problem implies infeasibility of the original problem, but in the nonconvex
case, there is no such implication.
If the subproblem is infeasible, the algorithm may continue in elastic
mode, by solving the elastic QP (3.22). There are two interpretations of
the role of the elastic QP. In one interpretation, the elastic problem defines
a regularization of the plain QP subproblem (2.1). In this case, if the
NLP is feasible and $\rho_k \ge \|\pi_{k+1}\|_\infty$, then problems (2.1) and (3.22) are equivalent. An alternative interpretation is to view the elastic QP as the QP subproblem associated with the elastic nonlinear problem (1.4), so that the elastic constraints are present in the original problem and are inherited by the QP subproblem. Note that any solution of the NLP may be regarded as a solution of (3.22) for a value of $\rho_k$ such that $\rho_k \ge \|\pi_{k+1}\|_\infty$. Hence, even if $\rho_k$ is not present explicitly, we may consider both the subproblem (3.22) and the original problem (1.4) to have an associated implicit value of $\rho_k$ that is larger than $\|\pi_{k+1}\|_\infty$.

4.3. Infeasibility detection. As we discussed in Section 1.3, it is


important to be able to determine as quickly as possible if the NLP (1.2)
is infeasible. In an SQP framework, infeasibility may be detected either
by solving the QP subproblem in elastic mode (3.22) with a sequence of
penalty parameters ρk → ∞, or by solving a sequence of elastic nonlinear
problems of the form (1.4) with ρk → ∞. For an SQP method that solves
a sequence of nonlinear elastic problems and uses a quasi-Newton approxi-
mation to the Hessian, infeasibility is usually signaled by the occurrence of
a sequence of elastic problems in which the penalty parameter is increased,
but the current xk remains fixed, i.e., an optimal solution for a problem
with ρ = ρk is optimal for the problem with ρ = ρk+1 , etc. This is usually
a reliable indication that xk is a local minimizer of the sum of infeasibil-
ities. This behavior can be explained by the fact that a warm start uses
the approximate Hessian from the previous elastic problem, which is not
changed as ρk and the QP-multipliers are increased. This is one situation
where the inability of a quasi-Newton Hessian to adapt to changes in the
multipliers is beneficial!
The situation is different when the SQP method uses the exact Hessian
of the Lagrangian. In this case, the multipliers reflect the magnitude of
ρk , and so the Hessian changes substantially. In the following, we give
a brief discussion of this case that reflects the paper of Byrd, Curtis and
Nocedal [24]. For an infeasible problem, it must hold that ρk → ∞ and
ρk > ρk−1 for an infinite subsequence of iterates. In this situation, different
problems are being solved at outer iterations k −1 and k. At iteration k −1,


the problem is the elastic problem (1.4) with ρ = ρk−1 , whereas at iteration
k, the problem is the elastic problem with ρ = ρk . We may write
$$
  f(x) + \rho_k e^T u + \rho_k e^T v = \frac{\rho_k}{\rho_{k-1}} \Big( \frac{\rho_{k-1}}{\rho_k}\, f(x) + \rho_{k-1} e^T u + \rho_{k-1} e^T v \Big). \tag{4.2}
$$

If the NLP is infeasible, it must hold that $u + v > 0$. If $\rho_{k-1}$ is large, with $\rho_k > \rho_{k-1}$ and $u + v > 0$, then the term $f(x)$ is negligible in (4.2), i.e., $f(x) \ll \rho_{k-1} e^T u + \rho_{k-1} e^T v$, so that
$$
  f(x) + \rho_k e^T u + \rho_k e^T v \approx \rho_k e^T u + \rho_k e^T v
  = \frac{\rho_k}{\rho_{k-1}} \big( \rho_{k-1} e^T u + \rho_{k-1} e^T v \big) \tag{4.3}
$$
$$
  \approx \frac{\rho_k}{\rho_{k-1}} \big( f(x) + \rho_{k-1} e^T u + \rho_{k-1} e^T v \big). \tag{4.4}
$$

The form of (4.4) implies that the elastic problems at iterations $k-1$ and $k$ differ (approximately) by only a multiplicative factor $\rho_k/\rho_{k-1}$ in the scaling of the objective function. The approximation becomes increasingly accurate as $\rho_{k-1}$ tends to infinity.
Let (xk , uk , vk ) be the solution provided by the elastic QP subprob-
lem at iteration k − 1, with corresponding Lagrange multiplier estimates
(πk , zk ). Also assume that (xk , uk , vk ) is close to optimal for the corre-
sponding elastic problem (1.4) with ρ = ρk−1 . If ρk > ρk−1 , the question
is how to provide a good initial point to this new problem. If (xk , uk , vk )
is the exact solution of the elastic problem for ρ = ρk−1 , then (πk , zk ) are
the corresponding Lagrange multipliers. Moreover, if the objective func-
tions differ by the factor ρk /ρk−1 , then (xk , uk , vk ) is again optimal for
the new problem, and the dual variables inherit the same scaling as the
objective function (see (1.6b)). In this situation, the new multipliers are $\big((\rho_k/\rho_{k-1})\pi_k,\ (\rho_k/\rho_{k-1})z_k\big)$. Based on these observations, in an idealized situation, we expect that $(x_k, u_k, v_k)$, together with scaled Lagrange multiplier estimates $(\rho_k/\rho_{k-1})\pi_k$ and $(\rho_k/\rho_{k-1})z_k$, provide good initial estimates for the new elastic QP subproblem. Hence, if second derivatives are used in the QP subproblem, the Hessian of the Lagrangian should be evaluated at $(x_k, u_k, v_k)$ with Lagrange multiplier estimates $\big((\rho_k/\rho_{k-1})\pi_k,\ (\rho_k/\rho_{k-1})z_k\big)$ in order to obtain fast convergence as $\rho_k$ increases.
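In code the rescaling is a one-liner; a hedged sketch (with NumPy arrays and illustrative names) is

```python
def scale_warm_start(pi, z, rho_prev, rho_new):
    """Scale multiplier estimates when the elastic penalty grows from
    rho_prev to rho_new, as suggested by (4.3)-(4.4)."""
    s = rho_new / rho_prev
    return s * pi, s * z
```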
As ρk tends to infinity, the objective function becomes less important
compared to the penalty term in the objective of (1.4). Eventually only
the infeasibilities matter, and the iterates converge to a local minimizer of
the sum of infeasibilities. See Byrd, Curtis and Nocedal [24] for a detailed
discussion on infeasibility detection, including a discussion on how to let
ρk → ∞ rapidly.
4.4. Solving a sequence of related QP subproblems. In MINLP
branch and bound methods it is necessary to solve a sequence of NLPs
that differ by a single constraint (see, e.g., Leyffer [128], and Goux and


Leyffer [111]). For example, at the solution of a relaxed problem, some


integer variables take a non-integer value. The MINLP algorithm selects
one of the integer variables that takes a non-integer value, say xi with value
x̄i , and branches on it. Branching generates two new NLP problems by
adding simple bounds $x_i \le \lfloor \bar x_i \rfloor$ and $x_i \ge \lfloor \bar x_i \rfloor + 1$ to the NLP relaxation (where $\lfloor v \rfloor$ is the largest integer not greater than $v$). The SQP methods of
Section 3.5 that solve an initial convex programming problem to determine
the active set have the advantage that a dual QP/LP solver may be used to
solve the convex QP subproblem (dual active-set QP methods are discussed
in Section A.2). This provides similar advantages to MINLP solvers as the
dual simplex method provides to MILP. If the SQP method is implemented
with a dual QP solver, and is warm started with the primal-dual solution
of the previous relaxation, then the dual variables are feasible and only one
branched variable is infeasible. The infeasible xi can be moved towards
feasibility immediately.
A similar situation applies if a nonlinear cut adds a constraint to the
NLP. For simplicity, assume that the QP has objective $g^T x + \tfrac12 x^T H x$ and
constraints Ax = b, x ≥ 0. As the QP is in standard form, the cut adds
a new row and column to A, a zero element to the objective g, and a zero
row and column to H. This gives a new problem with Ā, b̄, ḡ and H̄ (say).
The new column of Ā corresponds to the unit vector associated with the
new slack variable. An obvious initial basis for the new problem is

 
$$
  \bar A_B = \begin{pmatrix} A_B & 0 \\ a^T & 1 \end{pmatrix},
$$

so the new basic solution x̄B is the old solution xB , augmented by the new
slack, which is infeasible. This means that if we solve the primal QP then it
would be necessary to go into phase 1 to get started. However, by solving
the dual QP, then we have an initial feasible subspace minimizer for the
dual based on a $\bar y_B$ ($= \bar x_B$) such that $\bar A_B\, \bar y_B = \bar b$ and
$$
  \bar z = \bar g + \bar H \bar y - \bar A^T \bar\pi.
$$

We can choose π̄ to be the old π augmented by a zero. The new element


of ȳB corresponds to the new slack, so the new elements of ḡ and row and
column of H̄ are zero. This implies that z̄ is essentially z, and hence z̄ ≥ 0.
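A small NumPy sketch of the augmented-basis construction (dense and purely illustrative; the function name is ours):

```python
import numpy as np

def augment_basis_for_cut(A_B, a):
    """Build the augmented basis of Section 4.4 after a cut adds the row
    (a', 1) and a slack column to the problem."""
    mB = A_B.shape[0]
    top = np.hstack([A_B, np.zeros((mB, 1))])
    bot = np.hstack([a.reshape(1, -1), np.ones((1, 1))])
    return np.vstack([top, bot])

A_B = np.eye(2)
a = np.array([2.0, 3.0])
print(augment_basis_for_cut(A_B, a))   # old basis bordered by the cut row
```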

Acknowledgments. The authors would like to thank Anders Fors-


gren for numerous discussions on SQP methods during the preparation of
this paper. We are also grateful to the referees for suggestions that sub-
stantially improved the presentation.


APPENDIX
A. Methods for quadratic programming. We consider methods
for the quadratic program
$$
  \begin{array}{ll}
  \displaystyle\min_{x\in\mathbb{R}^n} & g^T(x - x_I) + \tfrac12 (x - x_I)^T H (x - x_I) \\[4pt]
  \mbox{subject to} & Ax = Ax_I - b, \quad x \ge 0,
  \end{array} \tag{A.1}
$$
where $g$, $H$, $b$, $A$ and $x_I$ are given constant quantities, with $H$ symmetric. The QP objective is denoted by $\hat f(x)$, with gradient $\hat g(x) = g + H(x - x_I)$. In some situations, the general constraints will be written as $\hat c(x) = 0$, with $\hat c(x) = A(x - x_I) + b$. The QP active set is denoted by $\mathcal A(x)$. A primal-dual QP solution is denoted by $(x^*, \pi^*, z^*)$. In terms of the QP defined at the $k$th outer iteration of an SQP method, we have $x_I = x_k$, $b = c(x_k)$, $g = g(x_k)$, $A = J(x_k)$ and $H = H(x_k, \pi_k)$. It is assumed that $A$ has rank
m. No assumptions are made about H other than symmetry. Conditions
that must hold at an optimal solution of (A.1) are provided by the following
result (see, e.g., Borwein [11], Contesse [43] and Majthay [132]).
Result A.1 (QP optimality conditions). The point $x^*$ is a local minimizer of the quadratic program (A.1) if and only if
(a) $\hat c(x^*) = 0$, $x^* \ge 0$, and there exists at least one pair of vectors $\pi^*$ and $z^*$ such that $\hat g(x^*) - A^T \pi^* - z^* = 0$, with $z^* \ge 0$, and $z^* \cdot x^* = 0$;
(b) $p^T H p \ge 0$ for all nonzero $p$ satisfying $\hat g(x^*)^T p = 0$, $Ap = 0$, and $p_i \ge 0$ for every $i \in \mathcal A(x^*)$.
Part (a) gives the first-order KKT conditions (2.18) for the QP (A.1). If $H$ is positive semidefinite, the first-order KKT conditions are both necessary and sufficient for $(x^*, \pi^*, z^*)$ to be a local primal-dual solution of (A.1). Suppose that $(x^*, \pi^*, z^*)$ satisfies condition (a) with $z_i^* = 0$ and $x_i^* = 0$ for some $i$. If $H$ is positive semidefinite, then $x^*$ is a weak minimizer of (A.1). In this case, $x^*$ is a global minimizer with a unique global minimum $\hat f(x^*)$. If $H$ has at least one negative eigenvalue, then $x^*$ is known as a dead
point. Verifying condition (b) at a dead point requires finding the global
minimizer of an indefinite quadratic form over a cone, which is an NP-hard
problem (see, e.g., Cottle, Habetler and Lemke [44], Pardalos and Schnit-
ger [149], and Pardalos and Vavasis [150]). This implies that the optimality
of a candidate solution of a general quadratic program can be verified only
if more restrictive (but computationally tractable) sufficient conditions are
satisfied. A dead point is a point at which the sufficient conditions are not
satisfied, but certain necessary conditions hold. Computationally tractable
necessary conditions are based on the following result.
Result A.2 (Necessary conditions for optimality). The point x∗ is a
local minimizer of the QP (A.1) only if
(a) $\hat c(x^*) = 0$, $x^* \ge 0$, and there exists at least one pair of vectors $\pi^*$ and $z^*$ such that $\hat g(x^*) - A^T \pi^* - z^* = 0$, with $z^* \ge 0$, and $z^* \cdot x^* = 0$;

(b) $p^T H p \ge 0$ for all nonzero $p$ satisfying $Ap = 0$, and $p_i = 0$ for every $i \in \mathcal A(x^*)$.
Suitable sufficient conditions for optimality are given by (a)–(b) with (b) replaced by the condition that $p^T H p \ge \omega \|p\|^2$ for some $\omega > 0$ and all $p$ such that $Ap = 0$, and $p_i = 0$ for every $i \in \mathcal A_+(x)$, where $\mathcal A_+(x)$ is the index set $\mathcal A_+(x) = \{\, i \in \mathcal A(x) : z_i > 0 \,\}$.
Typically, software for general quadratic programming is designed to
terminate at a dead point. Nevertheless, it is possible to define procedures
that check for optimality at a dead point, but the chance of success in a
reasonable amount of computation time depends on the dimension of the
problem (see Forsgren, Gill and Murray [72]).
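The first-order conditions of Result A.1(a) are straightforward to verify numerically. The sketch below (names and tolerances are ours) returns the residuals of those conditions; it deliberately does not attempt condition (b), for the intractability reasons just discussed.

```python
import numpy as np

def kkt_residuals(H, A, g, x_I, b, x, pi, z):
    """Residuals of the first-order conditions of Result A.1(a) for (A.1).
    A plain numerical check, not a certificate of local optimality."""
    g_hat = g + H @ (x - x_I)               # gradient of the QP objective
    return {
        "stationarity": np.linalg.norm(g_hat - A.T @ pi - z, np.inf),
        "primal_feas": max(np.linalg.norm(A @ x - (A @ x_I - b), np.inf),
                           max(0.0, -x.min())),
        "dual_feas": max(0.0, -z.min()),
        "complementarity": np.abs(z * x).max(),
    }
```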
A.1. Primal active-set methods. We start by reviewing the prop-
erties of primal-feasible active-set methods for quadratic programming. An
important feature of these methods is that once a feasible iterate is found,
all subsequent iterates are feasible. The methods have two phases. In the
first phase (called the feasibility phase or phase one), a feasible point is
found by minimizing the sum of infeasibilities. In the second phase (the
optimality phase or phase two), the quadratic objective function is mini-
mized while feasibility is maintained. Each phase generates a sequence of
inner iterates {xj } such that xj ≥ 0. The new iterate xj+1 is defined as
xj+1 = xj + αj pj , where the step length αj is a nonnegative scalar, and pj
is the QP search direction. For efficiency, it is beneficial if the computa-
tions in both phases are performed by the same underlying method. The
two-phase nature of the algorithm is reflected by changing the function
being minimized from a function that reflects the degree of infeasibility to
the quadratic objective function. For this reason, it is helpful to consider
methods for the optimality phase first.
At the $j$th step of the optimality phase, $\hat c(x_j) = A(x_j - x_I) + b = 0$ and $x_j \ge 0$. The vector $p_j$ is chosen to satisfy certain properties with respect to the objective and constraints. First, $p_j$ must be a direction of decrease for $\hat f$ at $x_j$, i.e., there must exist a positive $\bar\alpha$ such that
$$
  \hat f(x_j + \alpha p_j) < \hat f(x_j) \quad \text{for all } \alpha \in (0, \bar\alpha].
$$


In addition, xj +pj must be feasible with respect to the general constraints,
and feasible with respect to the bounds associated with a certain “working
set” of variables that serves as an estimate of the optimal active set of
the QP. Using the terminology of linear programming, we call this working
set of variables the nonbasic set, denoted by N = {ν1 , ν2 , . . . , νnN }.
Similarly, we define the set B of indices that are not in N as the basic set,
with B = {β1 , β2 , . . . , βnB }, where nB = n − nN . Although B and N are
strictly index sets, we will follow common practice and refer to variables
xβr and xνs as being “in B” and “in N ” respectively.
With these definitions, we define the columns of A indexed by N and
B, the nonbasic and basic columns of A, as AN and AB , respectively. We re-


frain from referring to the nonbasic and basic sets as the “fixed” and “free”
variables because some active-set methods allow some nonbasic variables
to move (the simplex method for linear programming being one prominent
example). An important attribute of the nonbasic set is that AB has rank
m, i.e., the rows of AB are linearly independent. This implies that the
cardinality of the nonbasic set must satisfy 0 ≤ nN ≤ n − m. It must be
emphasized that our definition of N does not require a nonbasic variable to
be active (i.e., at its lower bound). Also, whereas the active set is defined
uniquely at each point, there are many choices for N (including the empty
set). Given any n-vector y, the vector of basic components of y, denoted
by yB , is the nB -vector whose jth component is component βj of y. Simi-
larly, $y_N$, the vector of nonbasic components of $y$, is the $n_N$-vector whose $j$th
component is component νj of y.
Given a basic-nonbasic partition of the variables, we introduce the
definitions of stationarity and optimality with respect to a basic set.
Definition A.1 (Subspace stationary point). Let $B$ be a basic set defined at an $\hat x$ such that $\hat c(\hat x) = 0$. Then $\hat x$ is a subspace stationary point with respect to $B$ (or, equivalently, with respect to $A_B$) if there exists a vector $\pi$ such that $\hat g_B(\hat x) = A_B^T \pi$. Equivalently, $\hat x$ is a subspace stationary point with respect to $B$ if the reduced gradient $Z_B^T \hat g_B(\hat x)$ is zero, where the columns of $Z_B$ form a basis for the null-space of $A_B$.
If $\hat x$ is a subspace stationary point, $\hat f$ is stationary on the subspace $\{x : A(x - \hat x) = 0,\ x_N = \hat x_N\}$. At a subspace stationary point, it holds that $\hat g(\hat x) = A^T \pi + z$, where $z_i = 0$ for $i \in B$, i.e., $z_B = 0$. Subspace stationary points may be classified based on the curvature of $\hat f$ on the nonbasic set.
Definition A.2 (Subspace minimizer). Let $\hat x$ be a subspace stationary point with respect to $B$. Let the columns of $Z_B$ form a basis for the null-space of $A_B$. Then $\hat x$ is a subspace minimizer with respect to $B$ if the reduced Hessian $Z_B^T H Z_B$ is positive definite.
If the nonbasic variables are active at $\hat x$, then $\hat x$ is called a standard subspace minimizer. At a standard subspace minimizer, if $z_N \ge 0$ then $\hat x$ satisfies the necessary conditions for optimality. Otherwise, there exists an index $\nu_s \in N$ such that $z_{\nu_s} < 0$. If some nonbasic variables are not active at $\hat x$, then $\hat x$ is called a nonstandard subspace minimizer.
It is convenient sometimes to be able to characterize the curvature of $\hat f$ in a form that does not require the matrix $Z_B$ explicitly. The inertia of a symmetric matrix $X$, denoted by $\mathrm{In}(X)$, is the integer triple $(i_+, i_-, i_0)$, where $i_+$, $i_-$ and $i_0$ denote the number of positive, negative and zero eigenvalues of $X$. Gould [101] shows that if $A_B$ has rank $m$ and $A_B Z_B = 0$, then $Z_B^T H_B Z_B$ is positive definite if and only if
$$
  \mathrm{In}(K_B) = (n_B,\, m,\, 0), \quad\text{where}\quad K_B = \begin{pmatrix} H_B & A_B^T \\ A_B & 0 \end{pmatrix} \tag{A.2}
$$
(see Forsgren [70] for a more general discussion, including the case where $A_B$ does not have rank $m$). Many algorithms for solving symmetric equations that compute an explicit matrix factorization of $K_B$ also provide the inertia as a by-product of the calculation, see, e.g., Bunch [17], and Bunch and Kaufman [18].
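A sketch of the inertia test (A.2) via eigenvalues (illustrative only; as just noted, practical codes read the inertia off a symmetric indefinite factorization of $K_B$ rather than computing a spectrum):

```python
import numpy as np

def inertia(X, tol=1e-10):
    """Inertia (i+, i-, i0) of a symmetric matrix from its eigenvalues."""
    w = np.linalg.eigvalsh(X)
    return (int((w > tol).sum()), int((w < -tol).sum()),
            int((np.abs(w) <= tol).sum()))

def reduced_hessian_posdef(H_B, A_B):
    """Test In(K_B) = (n_B, m, 0) of (A.2), assuming A_B has rank m."""
    m, nB = A_B.shape
    K_B = np.block([[H_B, A_B.T], [A_B, np.zeros((m, m))]])
    return inertia(K_B) == (nB, m, 0)
```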
Below, we discuss two alternative formulations of an active-set method. Each generates a feasible sequence $\{x_j\}$ such that $x_{j+1} = x_j + \alpha_j p_j$ with $\hat f(x_{j+1}) \le \hat f(x_j)$. Neither method requires the QP to be convex, i.e., $H$ need not be positive semidefinite. The direction $p_j$ is defined as the solution of a QP subproblem with equality constraints. Broadly speaking, the nonbasic components of $p_j$ are specified and the basic components of $p_j$ are adjusted to satisfy the general constraints $A(x_j + p_j) = Ax_I - b$. If $p_B$ and $p_N$ denote the basic and nonbasic components of $p_j$, then the nonbasic components are fixed by enforcing constraints of the form $p_N = d_N$, where $d_N$ is a constant vector that characterizes the active-set method being used. The restrictions on $p_j$ define constraints $Ap = 0$ and $p_N = d_N$. Any remaining degrees of freedom are used to define $p_j$ as the direction that produces the largest reduction in $\hat f$. This gives the equality constrained QP subproblem
$$
  \min_{p}\ \hat g(x_j)^T p + \tfrac12 p^T H p \quad \text{subject to} \quad Ap = 0, \quad p_N = d_N.
$$

In the following sections we define two methods based on alternative definitions of $d_N$. Both methods exploit the properties of a subspace minimizer (see Definition A.2) in order to simplify the linear systems that must be solved.
A.1.1. Nonbinding-direction methods. We start with a method that defines a change in the basic-nonbasic partition at every iteration. In particular, one of three changes occurs: (i) a variable is moved from the basic set to the nonbasic set; (ii) a variable is moved from the nonbasic set to the basic set; or (iii) a variable in the basic set is swapped with a variable in the nonbasic set. These changes result in a column being added, deleted or swapped in the matrix $A_B$.
In order to simplify the notation, we drop the subscript j and consider
the definition of a single iteration that starts at the primal-dual point (x, π)
and defines a new iterate (x̄, π̄) such that x̄ = x + αp and π̄ = π + αqπ .
A crucial assumption about (x, π) is that it is a subspace minimizer with
respect to the basis B. It will be shown that this assumption guarantees
that the next iterate (x̄, π̄) (and hence each subsequent iterate) is also a
subspace minimizer.
Suppose that the reduced cost associated with the sth variable is neg-
ative, i.e., zνs < 0. The direction p is defined so that all the nonbasic
components are fixed except for the sth, which undergoes a unit change.
This definition implies that a positive step along p increases xνs but leaves
all the other nonbasics unchanged. The required direction is defined by the
equality constrained QP subproblem:
$$
  \min_{p}\ \hat g(x)^T p + \tfrac12 p^T H p \quad \text{subject to} \quad Ap = 0, \quad p_N = e_s, \tag{A.3}
$$


and is said to be nonbinding with respect to the nonbasic variables. If the multipliers for the constraints $Ap = 0$ are defined in terms of an increment $q_\pi$ to $\pi$, then $p_B$ and $q_\pi$ satisfy the optimality conditions
$$
  \begin{pmatrix} H_B & -A_B^T & H_D \\ A_B & 0 & A_N \\ 0 & 0 & I_N \end{pmatrix}
  \begin{pmatrix} p_B \\ q_\pi \\ p_N \end{pmatrix}
  = -\begin{pmatrix} \hat g_B(x) - A_B^T \pi \\ 0 \\ -e_s \end{pmatrix},
$$

where, as above, $\hat g_B(x)$ are the basic components of $\hat g(x)$, and $H_B$ and $H_D$ are the basic rows of the basic and nonbasic columns of $H$. If $x$ is a subspace minimizer, then $\hat g_B(x) - A_B^T \pi = 0$, so that this system simplifies to
$$
  \begin{pmatrix} H_B & -A_B^T & H_D \\ A_B & 0 & A_N \\ 0 & 0 & I_N \end{pmatrix}
  \begin{pmatrix} p_B \\ q_\pi \\ p_N \end{pmatrix}
  = \begin{pmatrix} 0 \\ 0 \\ e_s \end{pmatrix}, \tag{A.4}
$$
yielding $p_B$ and $q_\pi$ as the solution of the smaller system
$$
  \begin{pmatrix} H_B & -A_B^T \\ A_B & 0 \end{pmatrix}
  \begin{pmatrix} p_B \\ q_\pi \end{pmatrix}
  = -\begin{pmatrix} (h_{\nu_s})_B \\ a_{\nu_s} \end{pmatrix}. \tag{A.5}
$$
The increment $q_N$ for the multipliers $z_N$ is computed from $p_B$, $p_N$ and $q_\pi$ as $q_N = (Hp - A^T q_\pi)_N$. Once $p_B$ and $q_\pi$ are known, a nonnegative step $\alpha$ is computed so that $x + \alpha p$ is feasible and $\hat f(x + \alpha p) \le \hat f(x)$. The step that minimizes $\hat f$ as a function of $\alpha$ is given by
$$
  \alpha^* = \begin{cases} -\hat g(x)^T p / p^T H p & \text{if } p^T H p > 0, \\ +\infty & \text{otherwise.} \end{cases} \tag{A.6}
$$
The best feasible step is then $\alpha = \min\{\alpha^*, \alpha_M\}$, where $\alpha_M$ is the maximum feasible step:
$$
  \alpha_M = \min_{1 \le i \le n_B} \{\gamma_i\}, \quad\text{where}\quad
  \gamma_i = \begin{cases} \dfrac{(x_B)_i}{-(p_B)_i} & \text{if } (p_B)_i < 0, \\[6pt] +\infty & \text{otherwise.} \end{cases} \tag{A.7}
$$
(As $p_N = e_s$ and the problem contains only lower bounds, $x + tp$ remains feasible with respect to the nonbasic variables for all $t \ge 0$.) If $\alpha = +\infty$ then $\hat f$ decreases without limit along $p$ and the problem is unbounded. Otherwise, the new iterate is $(\bar x, \bar\pi) = (x + \alpha p, \pi + \alpha q_\pi)$.
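Combining (A.5)-(A.7), one nonbinding-direction computation can be sketched as follows (dense data, illustrative names; no attempt is made to maintain or update factorizations as a production code would):

```python
import numpy as np

def nonbinding_step(H, A, g_hat, basic, nu_s, x):
    """Solve (A.5) for (p_B, q_pi), then form alpha* of (A.6) and the
    ratio-test step alpha_M of (A.7). 'basic' lists the indices in B and
    nu_s is the entering nonbasic index."""
    nB, m = len(basic), A.shape[0]
    H_B = H[np.ix_(basic, basic)]
    A_B = A[:, basic]
    K = np.block([[H_B, -A_B.T], [A_B, np.zeros((m, m))]])
    rhs = -np.concatenate([H[basic, nu_s], A[:, nu_s]])
    sol = np.linalg.solve(K, rhs)
    p_B, q_pi = sol[:nB], sol[nB:]
    # Full direction: unit change in the entering nonbasic, p_B on basics.
    p = np.zeros(H.shape[0])
    p[basic] = p_B
    p[nu_s] = 1.0
    pHp = p @ H @ p
    alpha_star = -g_hat @ p / pHp if pHp > 0 else np.inf
    ratios = np.full(nB, np.inf)
    neg = p_B < 0
    ratios[neg] = x[basic][neg] / (-p_B[neg])
    alpha_M = ratios.min() if nB else np.inf
    return p, q_pi, min(alpha_star, alpha_M)
```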
It is instructive to define the step α∗ of (A.6) in terms of the identities

$$
  \hat g(x)^T p = z_{\nu_s} \quad\text{and}\quad p^T H p = (q_N)_s, \tag{A.8}
$$
which follow from the equations (A.4) that define $p_B$ and $p_N$. Then, if $\alpha^*$ is bounded, we have $\alpha^* = -z_{\nu_s}/(q_N)_s$, or, equivalently,
$$
  z_{\nu_s} + \alpha^* (q_N)_s = 0.
$$


Let $z(t)$ denote the vector of reduced costs at any point on the ray $(x + tp,\ \pi + tq_\pi)$, i.e., $z(t) = \hat g(x + tp) - A^T(\pi + tq_\pi)$. It follows from the definition
of p and qπ of (A.4) that zB (t) = 0 for all t, which implies that x + tp is a
subspace stationary point for any step t. (Moreover, x + tp is a subspace
minimizer because the KKT matrix KB is independent of t.) This property,
known as the parallel subspace property of quadratic programming, implies
that x + tp is the solution of an equality-constraint QP in which the bound
on the sth nonbasic is shifted to pass through x + tp. The component
zνs (t) is the reduced cost associated with the shifted version of the bound
xνs ≥ 0. By definition, the sth nonbasic reduced cost is negative at x,
i.e., zνs (0) < 0. Moreover, a simple calculation shows that zνs (t) is an
increasing linear function of t with zνs (α∗ ) = 0 if α∗ is bounded. A zero
reduced cost at t = α∗ means that the shifted bound can be removed from
the equality-constraint problem (A.3) (defined at x = x̄) without changing
its minimizer. Hence, if x̄ = x + α∗ p, the index νs is moved to the basic set,
which adds column aνs to AB for the next iteration. The shifted variable
has been removed from the nonbasic set, which implies that (x̄, π̄) is a
standard subspace minimizer.
If we take a shorter step to the boundary of the feasible region, i.e.,
αM < α∗ , then at least one basic variable lies on its bound at x̄ = x + αp,
and one of these, xβr say, is made nonbasic. If ĀB denotes the matrix AB
with column r deleted, then ĀB is not guaranteed to have full row rank
(for example, if x is a vertex, AB is square and ĀB has more rows than
columns). The linear independence of the rows of ĀB is characterized by
the so-called “singularity vector” uB given by the solution of the equations

    
$$
  \begin{pmatrix} H_B & -A_B^T \\ A_B & 0 \end{pmatrix}
  \begin{pmatrix} u_B \\ v_\pi \end{pmatrix}
  = \begin{pmatrix} e_r \\ 0 \end{pmatrix}. \tag{A.9}
$$

The matrix $\bar A_B$ has full rank if and only if $u_B \ne 0$. If $\bar A_B$ is rank deficient,
x̄ is a subspace minimizer with respect to the basis defined by removing
xνs , i.e., xνs is effectively replaced by xβr in the nonbasic set. In this case,
it is necessary to update the dual variables again to reflect the change of
basis (see Gill and Wong [96] for more details). The new multipliers are
$\bar\pi + \sigma v_\pi$, where $\sigma = \hat g(\bar x)^T p/(p_B)_r$.
As defined above, this method requires the solution of two KKT sys-
tems at each step (i.e., equations (A.5) and (A.9)). However, if the solution
of (A.9) is such that uB = 0, then the vectors pB and qπ needed at x̄ can be
updated in O(n) operations using the vectors uB and vπ . Hence, it is un-
necessary to solve (A.5) when a basic variable is removed from B following
a restricted step.
Given an initial standard subspace minimizer x0 and basic set B0 , this
procedure generates a sequence of primal-dual iterates {(xj , πj )} and an
associated sequence of basic sets {Bj }. The iterates occur in groups of
consecutive iterates that start and end at a standard subspace minimizer.


Each of the intermediate iterates is a nonstandard subspace minimizer at


which the same nonbasic variable may not be on its bound. At each in-
termediate iterate, a variable moves from B to N . At the first (standard)
subspace minimizer of the group, a nonbasic variable with a negative re-
duced cost is targeted for inclusion in the basic set. In the subsequent set
of iterations, this reduced cost is nondecreasing and the number of basic
variables decreases. The group of consecutive iterates ends when the tar-
geted reduced cost reaches zero, at which point the associated variable is
made basic.
The method outlined above is based on a method first defined for
constraints in all-inequality form by Fletcher [60], and extended to sparse
QP by Gould [103]. Recent refinements, including the technique for reduc-
ing the number of KKT solves, are given by Gill and Wong [96]. Each of
these methods is an example of an inertia-controlling method. The idea of
an inertia-controlling method is to use the active-set strategy to limit the
number of zero and negative eigenvalues in the KKT matrix KB so that it
has inertia (nB , m, 0) (for a survey, see Gill et al. [92]). At an arbitrary
feasible point, a subspace minimizer can be defined by making sufficiently
many variables temporarily nonbasic at their current value (see, e.g., Gill,
Murray and Saunders [83] for more details).
A.1.2. Binding-direction methods. The next method employs a
more conventional active-set strategy in which the nonbasic variables are
always active. We start by assuming that the QP is strictly convex, i.e.,
that H is positive definite. Suppose that (x, π) is a feasible primal-dual
pair such that xi = 0 for i ∈ N , where N is chosen so that AB has rank
m. As in a nonbinding direction method, the primal-dual direction (p, qπ )
is computed from an equality constrained QP subproblem. However, in
this case the constraints of the subproblem not only force Ap = 0 but also
require that every nonbasic variable remains unchanged for steps of the
form x + αp. This is done by fixing the nonbasic components of p at zero,
giving the equality constraints Ap = AB pB + AN pN = 0 and pN = 0. The
resulting subproblem defines a direction that is binding, in the sense that
it is “bound” or “attached” to the constraints in the nonbasic set. The QP
subproblem that gives the best improvement in f- is then

minimize g-(x)Tp + 12 pT Hp subject to AB pB = 0, pN = 0. (A.10)


p

The optimality conditions imply that pB and qπ satisfy the KKT system
    
HB −ATB pB g-B (x) − ATB π
=− . (A.11)
AB 0 qπ 0

These equations are nonsingular under our assumptions that $H$ is positive definite and $A_B$ has rank $m$. If $(x, \pi)$ is a subspace stationary point, then $z_B = \hat g_B(x) - A_B^T \pi = 0$ and the solution $(p_B, q_\pi)$ is zero. In this case, no


improvement can be made in $\hat f$ along directions in the null-space of $A_B$. If the components of $z = \hat g(x) - A^T \pi$ are nonnegative then $x$ is optimal
for (A.1). Otherwise, a nonbasic variable with a negative reduced cost is
selected and moved to the basic set (with no change to x), thereby defining
(A.11) with new AB , HB and (necessarily nonzero) right-hand side. Given a
nonzero solution of (A.11), x + p is either feasible or infeasible with respect
to the bounds. If x + p is infeasible, N cannot be the correct nonbasic set
and feasibility is maintained by limiting the step by the maximum feasible
step αM as in (A.7). At the point x̄ = x + αp, at least one of the basic
variables must reach its bound and it is moved to the nonbasic set for the
next iteration. Alternatively, if x + p is feasible, x̄ = x + p is a subspace
minimizer and a nonoptimal nonbasic variable is made basic as above.
The method described above defines groups of consecutive iterates that
start with a variable being made basic. No more variables are made basic
until either an unconstrained step is taken (i.e., α = 1), or a sequence of
constrained steps results in the definition of a subspace minimizer (e.g., at a
vertex). At each constrained step, the number of basic variables decreases.
As H is positive definite in the strictly convex case, the KKT equations
(A.11) remain nonsingular as long as AB has rank m. One of the most
important properties of a binding-direction method is that once an initial
nonbasic set is chosen (with the implicit requirement that the associated
AB has rank m), then all subsequent AB will have rank m (and hence
the solution of the KKT system is always well defined). This result is of
sufficient importance that we provide a brief proof.
If a variable becomes basic, a column is added to AB and the rank does
not change. It follows that the only possibility for AB to lose rank is when
a basic variable is made nonbasic. Assume that $A_B$ has rank $m$ and that the first basic variable is selected to become nonbasic, i.e., $r = 1$. If $\bar A_B$ denotes the matrix $A_B$ without its first column, then $A_B = \begin{pmatrix} a_{\beta_r} & \bar A_B \end{pmatrix}$. If $\bar A_B$ does not have rank $m$ then there must exist a nonzero $m$-vector $\bar v$ such that $\bar A_B^T \bar v = 0$. If $\sigma$ denotes the quantity $\sigma = -a_{\beta_r}^T \bar v$, then the $(m+1)$-vector $v = (\bar v, \sigma)$ satisfies
$$
  \begin{pmatrix} a_{\beta_r}^T & 1 \\ \bar A_B^T & 0 \end{pmatrix}
  \begin{pmatrix} \bar v \\ \sigma \end{pmatrix} = 0,
  \quad\text{or equivalently,}\quad
  \begin{pmatrix} A_B^T & e_r \end{pmatrix} v = 0.
$$

The scalar $\sigma$ must be nonzero or else $A_B^T \bar v = 0$, which would contradict the assumption that $A_B$ has rank $m$. Then
$$
  v^T \begin{pmatrix} A_B \\ e_r^T \end{pmatrix} p_B
  = v^T \begin{pmatrix} 0 \\ (p_B)_r \end{pmatrix}
  = \sigma (p_B)_r = 0,
$$

which implies that (pB )r = 0. This is a contradiction because the ratio test
(A.7) will choose βr as the outgoing basic variable only if (pB )r < 0. It
follows that v̄ = 0, and hence ĀB must have rank m.


If $H$ is not positive definite, the KKT matrix $K_B$ associated with the equations (A.11) may have fewer than $n_B$ positive eigenvalues (cf. (A.2)), i.e., the reduced Hessian $Z_B^T H_B Z_B$ may be singular or indefinite. In this situation, the subproblem (A.10) is unbounded and the equations (A.11) cannot be used directly to define $p$. In this case we seek a direction $p$ such that $p_N = 0$ and $A_B p_B = 0$, where
$$
  g_B^T p_B < 0 \quad\text{and}\quad p_B^T H_B p_B \le 0. \tag{A.12}
$$

The QP objective decreases without bound along such a direction, so either the largest feasible step $\alpha_M$ of (A.7) is infinite, or a basic variable must become nonbasic at some finite $\alpha_M$ such that $\hat f(x + \alpha_M p) \le \hat f(x)$. If $\alpha_M = +\infty$, the QP problem is unbounded and the algorithm is terminated.
A number of methods1 maintain an unsymmetric block-triangular de-
composition of KB in which the reduced Hessian ZBT HB ZB is one of the
diagonal blocks (the precise form of the decomposition is discussed in Sec-
tion A.4.1). Given this block-triangular decomposition, the methods of Gill
and Murray [82], Gill et al. [87, 92], and Gill, Murray and Saunders [83]
factor the reduced Hessian as LB DB LTB , where LB is unit lower triangular
and DB is diagonal. These methods control the inertia of KB by starting
the iterations at a subspace minimizer. With this restriction, the reduced
Hessian has at most one nonpositive eigenvalue, and the direction pB is
unique up to a scalar multiple. This property allows the computation to
be arranged so that DB has at most one nonpositive element, which al-
ways occurs in the last position. The vector pB is then computed from a
triangular system involving the rows and columns of LB associated with
the positive-definite principal submatrix of ZBT HB ZB (see, e.g., Gill et al.
[87, 92] for further details).
The method of Bunch and Kaufman [19] allows the reduced Hessian to
have any number of nonpositive eigenvalues in the KKT matrix (and there-
fore need not be started at a subspace minimizer). In this case, a symmetric
indefinite factorization of ZBT HB ZB is maintained, giving a block diagonal
factor DB with 1 × 1 or 2 × 2 diagonal blocks. In the strictly convex case,
methods may be defined that employ a symmetric block decomposition of
KB , see, e.g., Gill et al. [79].
As the reduced Hessian may not be positive definite, methods that
maintain a block-triangular decomposition of KB must use customized
methods to factor and modify ZBT HB ZB as the iterations proceed. This
makes it difficult to apply general-purpose solvers that exploit structure
in A and H. Methods that factor the KKT system directly are also prob-
lematic because KB can be singular. Fletcher [60] proposed that any po-
tential singularity be handled by embedding $K_B$ in a larger system that is known to be nonsingular. (¹Some were first proposed for the all-inequality constraint case, but they are easily reformulated for constraints in standard form.) This idea was extended to sparse KKT equations


by Gould [103]. Fletcher and Gould define an inertia-controlling method
based on solving a nonsingular bordered system that includes information
associated with the variable xβs that was most recently made basic. The
required binding direction pB may be found by solving the bordered system
$$
  \begin{pmatrix} H_B & A_B^T & e_s \\ A_B & 0 & 0 \\ e_s^T & 0 & 0 \end{pmatrix}
  \begin{pmatrix} p_B \\ -q_\pi \\ -\mu \end{pmatrix}
  = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},
$$

which is nonsingular. A simple calculation shows that pB , qπ and μ satisfy

    
$$
  \begin{pmatrix} \bar H_B & -\bar A_B^T \\ \bar A_B & 0 \end{pmatrix}
  \begin{pmatrix} p_B \\ q_\pi \end{pmatrix}
  = -\begin{pmatrix} (h_{\beta_s})_{\bar B} \\ a_{\beta_s} \end{pmatrix}
  \quad\text{and}\quad \mu = p_B^T H_B p_B, \tag{A.13}
$$

where B̄ is the basic set with index βs omitted. A comparison of (A.13) with
(A.5) shows that their respective values of $(p_B, q_\pi)$ are the same, which implies that the Fletcher-Gould binding direction is identical to the nonbinding
direction of Section A.1.1. In fact, all binding and nonbinding direction
inertia-controlling methods generate the same sequence of iterates when
started at the same subspace minimizer. The only difference is in the or-
der in which the computations are performed—binding-direction methods
make the targeted nonbasic variable basic at the start of the sequence of
consecutive iterates, whereas nonbinding-direction methods make the vari-
able basic at the end of the sequence when the associated shifted bound
constraint has a zero multiplier. However, it must be emphasized that not
all QP methods are inertia controlling. Some methods allow any number of
zero eigenvalues in the KKT matrix—see, for example, the Bunch-Kaufman
method mentioned above, and the QP methods in the GALAHAD software
package of Gould, Orban, and Toint [109, 110, 104].

A.2. Dual active-set methods. In the convex case (i.e., when H is


positive semidefinite) the dual of the QP subproblem (A.1) is

$$
  \begin{array}{ll}
  \displaystyle\min_{w\in\mathbb{R}^n,\,\pi\in\mathbb{R}^m,\,z\in\mathbb{R}^n} & \hat f_D(w, \pi, z) = b^T \pi + x_I^T z + \tfrac12 (w - x_I)^T H (w - x_I) \\[4pt]
  \mbox{subject to} & H(w - x_I) - A^T \pi - z = -g, \quad z \ge 0.
  \end{array} \tag{A.14}
$$

The dual constraints are in standard form, with nonnegativity constraints


on z. The optimality conditions for the dual are characterized by the
following result.
Result A.3 (Optimality conditions for the dual). The point (w, π, z)
is a minimizer of the dual QP (A.14) if and only if
(a) (w, π, z) satisfies H(w − xI ) − ATπ − z = −g, with z ≥ 0;
(b) there exists an n-vector y such that
(i) H(w − xI ) = H(y − xI );


(ii) Ay = AxI − b, with y ≥ 0; and


(iii) y · z = 0.
The components of y are the Lagrange multipliers for the dual bounds
z ≥ 0. Similarly, the components of y − xI are the “π-values”, i.e., the
multipliers for the equality constraints H(w − xI ) − ATπ − z = −g. The re-
lationship between the primal and dual solution was first given by Dorn [50].
If the dual has a bounded solution, then part (b) implies that the vector y
of multipliers for the dual is a KKT point of the primal, and hence consti-
tutes a primal solution. Moreover, if the dual has a bounded solution and
H is positive definite, then w = y.
A.2.1. A dual nonbinding direction method. Dual active-set
methods can be defined that are based on applying conventional primal
active-set methods to the dual problem (A.14). For brevity, we consider
the case where H is positive definite; the positive semidefinite case is con-
sidered by Gill and Wong [96]. Consider a feasible point (w, π, z) for the
dual QP (A.14), i.e., H(w − xI ) − ATπ − z = −g and z ≥ 0. Our intention
is to make the notation associated with the dual algorithm consistent with
the notation for the primal. To do this, we break with the notation of Sec-
tion A.1 and use B to denote the nonbasic set and N to denote the basic
set for the dual QP. This implies that the dual nonbasic variables are {zβ1 ,
zβ2 , . . . , zβnB }, where nB = n − nN .
. /
A dual basis contains all the columns of H −AT together with the
unit columns corresponding to the dual basic variables, i.e., the columns of
I with indices in N . It follows that the rows and columns of the dual basis
may be permuted to give
 
$$
  \begin{pmatrix} H_B & H_D & -A_B^T & 0 \\ H_D^T & H_N & -A_N^T & -I_N \end{pmatrix}, \tag{A.15}
$$
where AN and AB denote the columns of A indexed by N and B. The
dual nonbasic set B = {β1 , β2 , . . . , βnB } now provides an estimate of
which of the bounds z ≥ 0 are active at the solution of (A.14). As H is
positive definite, the dual basis has full row rank regardless of the rank of
the submatrix −ATB . This implies that if the columns of AB are to be used
to define a basis for a primal solution, it is necessary to impose additional
conditions on the dual basis. Here, we assume that the matrix
 
    KB = [ HB   AᵀB ]                                                 (A.16)
         [ AB    0  ]
is nonsingular. This condition ensures that AB has rank m. To distin-
guish KB from the full KKT matrix for the dual, we refer to KB as the
reduced KKT matrix. The next result concerns the properties of a subspace
minimizer for the dual QP.
Result A.4 (Properties of a subspace minimizer for the dual). Con-
sider the dual QP (A.14) with H positive definite.


(a) If (w, π, z) is a subspace stationary point, then there exists a vector


x such that

Hw = Hx, with AB xB + AN xN = AxI − b, and xN = 0.

(b) A dual subspace stationary point at which the reduced KKT matrix
(A.16) is nonsingular is a dual subspace minimizer.
(c) If (w, π, z) is a standard subspace minimizer, then zB = 0 and
zN ≥ 0.
This result implies that x = w at a dual subspace minimizer for the
special case of H positive definite. However, it is helpful to distinguish
between w and x to emphasize that x is the vector of dual variables for
the dual problem. At a subspace stationary point, x is a basic solution of
the primal equality constraints. Moreover, z = H(w − xI) − Aᵀπ + g =
ĝ(w) − Aᵀπ = ĝ(x) − Aᵀπ, which are the primal reduced-costs.
Let (w, π) be a nonoptimal dual subspace minimizer for the dual QP
(A.14). (It will be shown below that the vector w need not be com-
puted explicitly.) As (w, π) is not optimal, there is at least one nega-
tive component of the dual multiplier vector xB , say xβr . If we apply the
nonbinding-direction method of Section A.1.1, we define a dual search di-
rection (Δw, qπ , Δz) that is feasible for the dual equality constraints and
increases a nonbasic variable with a negative multiplier. As (w, π, z) is
assumed to be dual feasible, this gives the constraints for the equality-
constraint QP subproblem in the form

HΔw − ATqπ − Δz = 0, ΔzB = er .

The equations analogous to (A.4) for the dual direction (p, qπ, Δz) are

    [ HB    HD     0      0    −HB   −HD    0  ] [ ΔwB ]   [ 0  ]
    [ HᵀD   HN     0      0    −HᵀD  −HN    0  ] [ ΔwN ]   [ 0  ]
    [ 0     0      0      0     AB    AN    0  ] [ qπ  ]   [ 0  ]
    [ 0     0      0      0     0     IN    0  ] [ ΔzN ] = [ 0  ] ,
    [ HB    HD    −AᵀB    0     0     0     IB ] [ pB  ]   [ 0  ]
    [ HᵀD   HN    −AᵀN   −IN    0     0     0  ] [ pN  ]   [ 0  ]
    [ 0     0      0      0     0     0     IB ] [ ΔzB ]   [ er ]

where pB and pN denote the changes in the multipliers x of the dual. Block
elimination gives HΔw = Hp, where pB , pN and qπ are determined by the
equations

    pN = 0,   and   [ HB   −AᵀB ] [ pB ]   [ er ]
                    [ AB     0  ] [ qπ ] = [ 0  ] .                   (A.17)

As HΔw = Hp, the change in z can be computed as Δz = Hp − Aᵀqπ. The
curvature of the dual objective is given by ΔwᵀHΔw = (pB)r from (A.8).


If the curvature is nonzero, the step α∗ = −(xB)r/(pB)r minimizes the dual
objective f̂D(w + αΔw, π + αqπ, z + αΔz) with respect to α, and the rth
element of xB + α∗ pB is zero. If the xB are interpreted as estimates of the
primal variables, the step from xB to xB + α∗ pB increases the negative (and
hence infeasible) primal variable (xB )r until it reaches its bound of zero.
If α = α∗ gives a feasible point for dual inequalities, i.e., if z + α∗ Δz are
nonnegative, then the new iterate is (w + α∗ Δw, π + α∗ qπ , z + α∗ Δz). In
this case, the nonbinding variable is removed from the dual nonbasic set,
which means that the index βr is moved to N and the associated entries
of H and A are removed from HB and AB .
If α∗ is unbounded, or (w + α∗Δw, π + α∗qπ, z + α∗Δz) is not
feasible, the step is the largest α such that g(w + αΔw) − Aᵀ(π + αqπ) is
nonnegative. The required value is

    αF =  min  γi ,   where   γi = { (zN)i /(−(ΔzN)i)   if (ΔzN)i < 0,
         1≤i≤nN                    { +∞                  otherwise.   (A.18)

If αF < α∗ then at least one component of zN is zero at (w + αF Δw, π +


αF qπ , z + αF Δz), and the index of one of these, νs say, is moved to B.
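In outline, the computation of the dual search direction from (A.17), the optimal step α∗, and the ratio test (A.18) might be sketched as follows. This is an illustration on our part, under the assumption that H is positive definite; a dense solve replaces the O(n) updates discussed below, and all names are ours.

```python
import numpy as np

def dual_direction(HB, AB, r):
    """Solve (A.17) for (pB, q_pi), given the index r with (xB)_r < 0."""
    nB, m = HB.shape[0], AB.shape[0]
    K = np.block([[HB, -AB.T], [AB, np.zeros((m, m))]])
    rhs = np.zeros(nB + m)
    rhs[r] = 1.0                                   # e_r in the top block
    sol = np.linalg.solve(K, rhs)
    return sol[:nB], sol[nB:]                      # pB, q_pi (and pN = 0)

def steps(xB, pB, zN, dzN, r):
    """Optimal step alpha* and the largest feasible step alpha_F of (A.18)."""
    alpha_star = -xB[r] / pB[r]                    # drives (xB)_r to zero
    neg = dzN < 0                                  # ratio test (A.18)
    alpha_F = np.inf if not neg.any() else np.min(zN[neg] / (-dzN[neg]))
    return alpha_star, alpha_F
```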
The composition of the new dual basis is determined by the singularity
vector, adapted to the dual QP from the nonbinding direction method of
Section A.1.1. Define the vectors u, uπ and v such that u = v − eνs, where
vN = 0, and uπ and vB are determined by the equations

    [ HB   −AᵀB ] [ vB ]   [ (hνs)B ]
    [ AB     0  ] [ uπ ] = [  aνs   ] .

If u is zero, then (w + αF Δw, π + αF qπ, z + αF Δz) is a subspace minimizer
with respect to the basis defined with variable βr replaced by constraint νs.
Otherwise, νs is moved to B, which has the effect of adding the column aνs
to AB , and adding a row and column to HB . As in the primal nonbinding
direction method, the vectors p and qπ may be computed in O(n) operations
if no column swap is made.
If (w, π, z) is a subspace minimizer at which the reduced KKT matrix
(A.16) is nonsingular, then the next iterate is also a subspace minimizer
with a nonsingular reduced KKT matrix. (For more details, see Gill and
Wong [96]).
The algorithm described above is a special case of the method of Gill
and Wong [96], which is defined for the general convex case (i.e., when H
can be singular). If H = 0 this method is equivalent to the dual simplex
method. Bartlett and Biegler [5] propose a method for the strictly convex
case that uses the Schur-complement method to handle the changes to the
KKT equations when the active set changes (see Section A.4.2).
The dual problem (A.14) has fewer inequality constraints than vari-
ables, which implies that if H and A have no common nontrivial null vector,

then the dual constraint gradients, the rows of ( H   −Aᵀ ), are linearly in-
dependent, and the dual feasible region has no degenerate points. In this
situation, an active-set dual method cannot cycle, and will either terminate
with an optimal solution or declare the dual problem to be unbounded.
This nondegeneracy property does not hold for a dual linear program, but
it does hold for strictly convex problems and any QP with H and A of
the form
 
    H = [ H̄   0 ]   and   A = ( Ā   −Im ) ,
        [ 0   0 ]

where H̄ is an (n − m) × (n − m) positive-definite matrix.


A.2.2. Finding an initial dual-feasible point. An initial dual-
feasible point may be defined by applying a conventional phase-one method
to the dual constraints, i.e., by minimizing the sum of infeasibilities for the
dual constraints H(x − xI ) − ATπ − z = −g, z ≥ 0. If H is nonsingular and
A has full rank, another option is to define N = ∅ and compute (x0 , π0 )
from the equations
    
    [ H   −Aᵀ ] [ x0 ]     [ g − HxI ]
    [ A    0  ] [ π0 ] = − [ b − AxI ] .

This choice of basis gives z0 = 0, ĉ(x0) = 0, with (x0, π0, z0) a dual subspace


minimizer.
A.2.3. The Goldfarb-Idnani method. If H is nonsingular,
the vectors

    y = [ π ] ,   b̄ = [ b  ] ,   and   Ā = [ A ] ,
        [ z ]         [ xI ]                [ I ]

may be used to eliminate w − xI from (A.14) to give the dual problem:

    minimize      yᵀ(b̄ − ĀH⁻¹g) + ½ yᵀĀH⁻¹Āᵀy,   subject to z ≥ 0.
    y∈Rⁿ⁺ᵐ,z∈Rⁿ

Some references include: Goldfarb and Idnani [100], Powell [154]. A variant
of the Goldfarb and Idnani method for dense convex QP has been proposed
by Boland [10].
A.3. QP regularization. The methods considered above rely on the
assumption that each basis matrix AB has rank m. In an active-set method
this condition is guaranteed (at least in exact arithmetic) by the active-set
strategy if the initial basis has rank m. For methods that solve the KKT
system by factoring a subset of m columns of AB (see Section A.4.1), special
techniques can be used to select a linearly independent set of m columns
from A. These procedures depend on the method used to factor the basis—
for example, the SQP code SNOPT employs a combination of LU factor-
ization and basis repair to determine a full-rank basis. If a factorization


reveals that the square submatrix is rank deficient, suspected dependent


columns are discarded and replaced by the columns associated with slack
variables. However, for methods that solve the KKT system by direct fac-
torization, such as the Schur complement method of Section A.4.2, basis
repair is not an option because the factor routine may be a “black-box”
that does not incorporate rank-detection. Unfortunately, over the course
of many hundreds of iterations, performed with KKT matrices of varying
degrees of conditioning, an SQP method can place even the most robust
symmetric indefinite solver under considerable stress. (Even a relatively
small collection of difficult problems can test the reliability of a solver.
Gould, Scott, and Hu [108] report that none of the 9 symmetric indefinite
solvers tested was able to solve all of the 61 systems in their collection.)
In this situation it is necessary to use a regularized method, i.e., a method
based on solving equations that are guaranteed to be solvable without the
luxury of basis repair.
To illustrate how a problem may be regularized, we start by considering
a QP with equality constraints, i.e.,
    minimize    gᵀ(x − xI) + ½(x − xI)ᵀH(x − xI)
    x∈Rⁿ                                                              (A.19)
    subject to  Ax = AxI − b.
Assume for the moment that this subproblem has a feasible primal-dual
solution (x∗ , π∗ ). Given an estimate πE of the QP multipliers π∗ , a positive
μ and arbitrary ν, consider the generalized augmented Lagrangian
    M(x, π; πE, μ, ν) = f̂(x) − ĉ(x)ᵀπE + (1/(2μ))‖ĉ(x)‖₂²
                         + (ν/(2μ))‖ĉ(x) + μ(π − πE)‖₂²               (A.20)
(see Forsgren and Gill [71], and Gill and Robinson [95]). The function M
involves n + m variables and has gradient vector
    ∇M(x, π; πE, μ, ν) = [ ĝ(x) − Aᵀπ + (1 + ν)Aᵀ(π − π̄(x)) ]
                         [ νμ(π − π̄(x))                      ] ,     (A.21)

where π̄(x) = πE − ĉ(x)/μ. If we happen to know the value of π∗, and
define πE = π∗, then simple substitution in (A.21) shows that (x∗, π∗) is
a stationary point of M for all ν and all positive μ. The Hessian of M is
given by
    ∇²M(x, π; πE, μ, ν) = [ H + ((1 + ν)/μ) AᵀA    νAᵀ ]
                          [ νA                     νμI ] ,            (A.22)

which is independent of πE . If we make the additional assumptions that


ν is nonnegative and the reduced Hessian of the QP subproblem is posi-
tive definite, then ∇2M is positive semidefinite for all μ sufficiently small.


Under these assumptions, if πE = π ∗ it follows that (x∗ , π∗ ) is the unique


minimizer of the unconstrained problem
    minimize      M(x, π; πE, μ, ν)                                   (A.23)
    x∈Rⁿ,π∈Rᵐ

(see, e.g., Gill and Robinson [95], Gill and Wong [96]). This result implies
that if πE is an approximate multiplier vector (e.g., from the previous QP
subproblem in the SQP context), then the minimizer of M(x, π; πE , μ, ν)
will approximate the minimizer of (A.19). In order to distinguish between
a solution of (A.19) and a minimizer of (A.23) for an arbitrary πE , we
use (x∗ , π∗ ) to denote a minimizer of M(x, π; πE , μ, ν). Observe that sta-
tionarity of ∇M at (x∗ , π∗ ) implies that π∗ = π̄(x∗ ) = πE − c-(x∗ )/μ. The
components of π̄(x∗ ) are the so-called first-order multipliers associated with
a minimizer of (A.23).
Particular values of the parameter ν give some well-known functions
(although, as noted above, each function defines a problem with the com-
mon solution (x∗ , π∗ )). If ν = 0, then M is independent of π, with
    M(x; πE, μ) ≡ M(x; πE, μ, 0) = f̂(x) − ĉ(x)ᵀπE + (1/(2μ))‖ĉ(x)‖₂².   (A.24)

This is the conventional Hestenes-Powell augmented Lagrangian (1.11) ap-
plied to (A.19). If ν = 1 in (A.21), M is the primal-dual augmented
Lagrangian
    f̂(x) − ĉ(x)ᵀπE + (1/(2μ))‖ĉ(x)‖₂² + (1/(2μ))‖ĉ(x) + μ(π − πE)‖₂²   (A.25)
considered by Robinson [155] and Gill and Robinson [95]. If ν = −1, then
M is the proximal-point Lagrangian

    f̂(x) − ĉ(x)ᵀπ − (μ/2)‖π − πE‖₂².
As ν is negative in this case, ∇2M is indefinite and M has an unbounded
minimizer. Nevertheless, a unique minimizer of M for ν > 0 is a saddle-
point for an M defined with a negative ν. Moreover, for ν = −1, (x∗ , π∗ )
solves the min-max problem
    min max   f̂(x) − ĉ(x)ᵀπ − (μ/2)‖π − πE‖₂².
     x   π
In what follows, we use M(v) to denote M as a function of the primal-
dual variables v = (x, π) for given values of πE , μ and ν. Given the initial
point vI = (xI , πI ), the stationary point of M(v) is v∗ = vI + Δv, where
Δv = (p, q) with ∇2M(vI )Δv = −∇M(vI ). It can be shown that Δv
satisfies the equivalent system
    
    [ H   −Aᵀ ] [ p ]     [ ĝ(xI) − AᵀπI        ]
    [ A    μI ] [ q ] = − [ ĉ(xI) + μ(πI − πE)  ] ,                   (A.26)


which is independent of the value of ν (see Gill and Robinson [95]). If ν ≠ 0,
the primal-dual direction is unique. If ν = 0 (i.e., M is the conventional
augmented Lagrangian (A.24)), Δv satisfies the equations

    [ H   −Aᵀ ] [ p ]     [ ĝ(xI) − Aᵀπ        ]
    [ A    μI ] [ q ] = − [ ĉ(xI) + μ(π − πE)  ] ,                    (A.27)

for an arbitrary vector π. In this case, p is unique but q depends on the


choice of π. In particular, if we define the equations (A.27) with π = πI ,
then we obtain directions identical to those of (A.26). Clearly, it must hold
that p is independent of the choice of ν in (A.21).
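As a concrete illustration, the primal-dual step defined by (A.26) may be computed as follows. This is a sketch only: the vectors ghat and chat are assumed to hold ĝ(xI) and ĉ(xI), and a dense factorization stands in for the sparse methods of Section A.4.

```python
import numpy as np

def regularized_step(H, A, mu, ghat, chat, piI, piE):
    """Solve the regularized KKT system (A.26) for (p, q)."""
    n, m = H.shape[0], A.shape[0]
    # The (2,2) block mu*I regularizes the system: no full-rank
    # assumption on A is needed (cf. the discussion below (A.28)).
    K = np.block([[H, -A.T], [A, mu * np.eye(m)]])
    rhs = -np.concatenate([ghat - A.T @ piI,
                           chat + mu * (piI - piE)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]                        # p, q
```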
The point (x∗ , π∗ ) = (xI + p, πI + q) is the primal-dual solution of the
perturbed QP

    minimize    gᵀ(x − xI) + ½(x − xI)ᵀH(x − xI)
    x∈Rⁿ                                                              (A.28)
    subject to  Ax = AxI − b − μ(π∗ − πE),

where the perturbation shifts each constraint of (A.19) by an amount that


depends on the corresponding component of π∗ − πE . Observe that the
constraint shift depends on the solution, so it cannot be defined a priori.
The effect of the shift is to regularize the KKT equations by introducing the
nonzero (2, 2) block μI. In the regularized case it is not necessary for A
to have full row rank for the KKT equations to be nonsingular. A full-rank
assumption is required if the (2, 2) block is zero. In particular, if we choose
πE = πI , the system (A.26) is:
    
    [ H   −Aᵀ ] [ p ]     [ ĝ(xI) − AᵀπI ]
    [ A    μI ] [ q ] = − [ ĉ(xI)        ] .                          (A.29)

These equations define a regularized version of the Newton equations (2.7),
and they also form the basis for the primal-dual formulations of the
quadratic penalty method considered by Gould [102] (for related methods,
see Murray [141], Biggs [7] and Tapia [166]).
The price paid for the regularized equations is an approximate solution
of the original problem. However, once (x∗ , π∗ ) has been found, πE can be
redefined as π∗ and the process repeated—with a smaller value of μ if nec-
essary. There is more discussion of the choice of πE below. However, before
turning to the inequality constraint case, we summarize the regularization
for equality constraints.
• The primal-dual solution (x∗ , π∗ ) of the equality constraint prob-
lem (A.19) is approximated by the solution of the perturbed KKT
system (A.26).
• The resulting approximation (x∗ , π∗ ) = (xI + p, πI + q) is a station-
ary point of the function M (A.21) for all values of the parameter
ν. If μ > 0 and ν ≥ 0 then (x∗ , π∗ ) is a minimizer of M for all μ
sufficiently small.


As the solution of the regularized problem is independent of ν, there is


little reason to use nonzero values of ν in the equality-constraint case.
However, the picture changes when there are inequality constraints and an
approximate solution of the QP problem is required, as is often the case in
the SQP context.
The method defined above can be extended to the inequality constraint
problem (A.1) by solving the bound-constrained subproblem

    minimize    M(x; πE, μ)   subject to x ≥ 0.                       (A.30)
    x∈Rⁿ

This technique has been proposed for general nonlinear programming (see,
e.g., Conn, Gould and Toint [39, 40, 41], Friedlander [73], and Friedlander
and Saunders [75]), and for quadratic programming (see, e.g., Dostál, Fried-
lander and Santos [51, 52, 53], Delbos and Gilbert [46], Friedlander and
Leyffer [74], and Maes [131]). Subproblem (A.30) may be solved using one
of the active-set methods of Sections A.1.2 and A.1.1, although no explicit
phase-one procedure is needed because there are no general constraints.
In the special case of problem (A.30) a primal active-set method defines a
sequence of nonnegative iterates {xj } such that xj+1 = xj + αj pj ≥ 0. At
the jth iteration of the binding-direction method of Section A.1.2, variables
in the nonbasic set N remain unchanged for any value of the step length,
i.e., pN = (pj )N = 0. This implies that the elements pB of the direction pj
must solve the unconstrained QP subproblem:

    minimize    pᵀB(∇Mj)B + ½ pᵀB(∇²Mj)B pB .
       pB

As in (A.26), the optimality conditions imply that pB satisfies


    
    [ HB   −AᵀB ] [ pB ]     [ (ĝ(xj) − Aᵀπj)B      ]
    [ AB    μI  ] [ qj ] = − [ ĉ(xj) + μ(πj − πE)   ] ,               (A.31)

where πj is an estimate of the optimal multipliers π∗ of (A.30). The next


iterate is defined as xj+1 = xj + αj pj , where the steplength αj is chosen
such that xj + αj pj ≥ 0. As in the equality-constraint case, the dual
variables may be updated as πj+1 = πj + αj qj . The dual iterates πj will
converge to the multipliers π∗ of the perturbed QP:

    minimize    gᵀ(x − xI) + ½(x − xI)ᵀH(x − xI)
    x∈Rⁿ                                                              (A.32)
    subject to  Ax = AxI − b − μ(π∗ − πE),   x ≥ 0.

At an optimal solution (x∗, π∗) of (A.30) the vector z∗ = ĝ(x∗) − Aᵀπ∗
provides an estimate of the optimal reduced costs z∗. As in the equality-
constraint case, the vector of first-order multipliers π̄(x∗) = πE − ĉ(x∗)/μ
is identical to π∗ . Problem (A.30) may be solved using a bound-constraint
variant of the nonbinding-direction method of Section A.1.1. This method


has some advantages when solving convex and general QP problems. For
more details, see Gill and Wong [96].
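The inner iteration just described is simple to state in code. The sketch below is illustrative (the names are ours; the step alpha_model would come from the minimization of M along pj): it computes the largest step that keeps x nonnegative and performs the primal-dual update.

```python
import numpy as np

def max_step_to_bound(x, p):
    """Largest alpha with x + alpha * p >= 0."""
    neg = p < 0
    return np.inf if not neg.any() else np.min(x[neg] / (-p[neg]))

def inner_update(x, pi, p, q, alpha_model):
    """One step x_{j+1} = x_j + alpha_j p_j, pi_{j+1} = pi_j + alpha_j q_j."""
    alpha = min(alpha_model, max_step_to_bound(x, p))
    return x + alpha * p, pi + alpha * q
```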
If the QP is a “one-off” problem, then established techniques associated
with the bound-constrained augmented Lagrangian method can be used to
update πE and μ (see, e.g., Conn, Gould and Toint [40], Dostál, Friedlander
and Santos [51, 52, 53], Delbos and Gilbert [46], and Friedlander and Leyffer
[74]). These rules are designed to update πE and μ without the need to
find the exact solution of (A.30). In the SQP context, it may be more
appropriate to find an approximate solution of (A.30) for a fixed value of
πE , which is then updated in the outer iteration. Moreover, as μ is being
used principally for regularization, it is given a smaller value than is typical
in a conventional augmented Lagrangian method.
A.4. Solving the KKT system. The principal work associated with
a QP iteration is the cost of solving one or two saddle-point systems of
the form
    [ HB   −AᵀB   HD ] [ yB ]   [ gB ]
    [ AB    μI    AN ] [ w  ] = [ f1 ] ,                              (A.33)
    [ 0     0     IN ] [ yN ]   [ f2 ]
where μ is a nonnegative scalar. We focus on two approaches appropriate
for large-scale quadratic programming.
A.4.1. Variable-reduction methods. These methods are appropri-
ate for the case μ = 0. As AB has rank m, there exists a nonsingular QB
such that
    AB QB = ( 0   B ) ,                                               (A.34)
with B an m × m nonsingular matrix. If μ = 0 the matrix QB is used to
transform the generic system (A.33) to block-triangular form. The columns
of QB are partitioned so that QB = ( ZB   YB ) with ZB an nB × (nB − m)
matrix; then AB ZB = 0 and the columns of ZB span the null-space of AB.
Analogous to (2.11) we obtain the permuted block-triangular system:

    [ ZᵀBHB ZB   ZᵀBHB YB    0     ZᵀBHD ] [ yZ ]   [ gZ ]
    [ YᵀBHB ZB   YᵀBHB YB   −Bᵀ    YᵀBHD ] [ yY ]   [ gY ]
    [ 0          B           0     AN    ] [ w  ] = [ f1 ] ,          (A.35)
    [ 0          0           0     IN    ] [ yN ]   [ f2 ]
with gZ = ZBT gB and gY = YBT gB . We formulate the result of the block
substitution in a form that uses matrix-vector products involving the full
matrix H rather than the submatrix HB . This is done to emphasize the
practical utility of accessing the QP Hessian as an operator that defines
the product Hx for a given x. This reformulation requires the definition of
the explicit column permutation P that identifies the basic and nonbasic
columns AB and AN of A, i.e.,

    AP = ( AB   AN ) .                                                (A.36)


Given the permutation P, we define matrices Q, W, Y and Z that act
on vectors of length n, i.e., Q = ( Z   Y   W ), where

    Z = P [ ZB ] ,    Y = P [ YB ] ,    and    W = P [ 0  ] .
          [ 0  ]            [ 0  ]                   [ IN ]

In terms of these matrices, the block substitution yields

    yW = f2,                     y0 = W yW,
    B yY = f1 − AN f2,           y1 = Y yY + y0,
                                                                      (A.37)
    ZᵀHZ yZ = Zᵀ(g − H y1),      y = Z yZ + y1,
    Bᵀw = −Yᵀ(g − H y).
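A dense transcription of (A.37) might read as follows. This is illustrative only: Z, Y, W, B and AN are passed as explicit arrays, whereas a large-scale implementation would apply them as operators in the manner described below.

```python
import numpy as np

def variable_reduction_solve(H, g, Z, Y, W, B, AN, f1, f2):
    """Block substitution (A.37) for the system (A.33) with mu = 0."""
    yW = f2
    y0 = W @ yW
    yY = np.linalg.solve(B, f1 - AN @ f2)
    y1 = Y @ yY + y0
    yZ = np.linalg.solve(Z.T @ H @ Z, Z.T @ (g - H @ y1))  # reduced Hessian solve
    y = Z @ yZ + y1
    w = np.linalg.solve(B.T, -Y.T @ (g - H @ y))
    return y, w
```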

There are many practical choices for the matrix QB . For small-to-
medium scale problems with dense A and H, the matrix QB can be cal-
culated as the orthogonal factor associated with the QR factorization of
a row and column-permuted ATB (see, e.g., Gill et al. [87]). The method
of variable reduction is appropriate when A is sparse. In this case the
permutation P of (A.36) is specialized further to give
    AP = ( AB   AN ) ,   with   AB = ( B   S ) ,

where B is an m × m nonsingular subset of the columns of A and S an
m × nS matrix with nS = nB − m. The matrix Q = ( Z   Y   W ) is constructed
so that
    Z = P [ −B⁻¹S ] ,    Y = P [ Im ] ,    and    W = P [ 0  ] .
          [  InS  ]            [ 0  ]                   [ 0  ]
          [  0    ]            [ 0  ]                   [ IN ]
0 0 IN

This form means that matrix-vector products Zᵀv or Zv can be computed
using a factorization of B (typically, a sparse LU factorization; see Gill et
al. [89]), and Z need not be stored explicitly.
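For example, with perm an index vector representing P in (A.36) (so that column j of AP is column perm[j] of A), the products Zv and Zᵀu may be formed from LU factors of B as in the following sketch. This is our illustration; SciPy's dense LU stands in for the sparse factors of [89].

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def Z_times(Bfac, S, perm, v, n):
    """Compute Z v without storing Z; Bfac = lu_factor(B)."""
    m, nS = S.shape
    w = np.concatenate([-lu_solve(Bfac, S @ v), v, np.zeros(n - m - nS)])
    x = np.empty(n)
    x[perm] = w                                    # apply the permutation P
    return x

def ZT_times(Bfac, S, perm, u):
    """Compute Z^T u; trans=1 solves with B^T."""
    m, nS = S.shape
    uhat = u[perm]                                 # apply P^T
    return uhat[m:m + nS] - S.T @ lu_solve(Bfac, uhat[:m], trans=1)
```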
A.4.2. The Schur complement method. Solving a “one-off” KKT
system can be done very effectively using sparse matrix factorization tech-
niques. However, within a QP algorithm, many systems must be solved in
which the matrix changes by a single row and column. Instead of redefining
and re-solving the KKT equations at each iteration, the solution may be
found by solving a bordered system of the form
      
K0 V z b HB ATB
= , with K 0 = , (A.38)
VT D w f AB μI

where K0 is the KKT matrix at the initial point. For simplicity, we assume
that the second block of the variables is scaled by −1 so that the (1, 2)
block of K0 is ATB , not −ATB . The Schur complement method is based
on the assumption that factorizations for K0 and the Schur complement


C = D − VᵀK0⁻¹V exist. Then the solution of (A.38) can be determined
by solving the equations

    K0 t = b,    C w = f − Vᵀt,    K0 z = b − V w.

The work required is dominated by two solves with the fixed matrix K0
and one solve with the Schur complement C. If the number of changes to
the basic set is small enough, dense factors of C may be maintained.
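In dense prototype form, the three solves may be written as follows. This is illustrative; in practice K0 would be held in sparse factored form, and the factors of C would be updated, not recomputed, as the border grows.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def bordered_solve(K0fac, V, D, b, f):
    """Solve the bordered system (A.38) given K0fac = lu_factor(K0)."""
    t = lu_solve(K0fac, b)                         # K0 t = b
    C = D - V.T @ lu_solve(K0fac, V)               # Schur complement
    w = np.linalg.solve(C, f - V.T @ t)            # C w = f - V^T t
    z = lu_solve(K0fac, b - V @ w)                 # K0 z = b - V w
    return z, w
```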
We illustrate the definition of (A.38) immediately after the matrix K0
is factorized. (For more details, see, e.g., Bisschop and Meeraus [8], Gill
et al. [91].) Suppose that variable s enters the basic set. The next KKT
matrix can be written as
    [ HB        AᵀB   (hs)B ]
    [ AB        μI    as    ] ,
    [ (hs)ᵀB    aᵀs   hss   ]

where as and hs are the sth columns of A and H. This is a matrix of the
form (A.38) with D = (hss) and Vᵀ = ( (hs)ᵀB   aᵀs ).
Now consider the case where the rth basic variable is deleted from the
basic set, so that the rth column is removed from AB . The correspond-
ing changes can be enforced in the solution of the KKT system using the
bordered matrix:
⎛ ⎞
HB ATB (hs )B er
⎜ AB
⎜ μI as 0⎟⎟
⎜ (h )T aT hss 0⎟ .
⎝ s B s ⎠
eTr 0 0 0

Bordering with the unit row and column has the effect of zeroing out the
components of the solution corresponding to the deleted basic variable.
The Schur complement method can be extended to a block LU method
by storing the bordered matrix in block-factored form
    
    [ K0   V ]   [ L       ] [ U   Y ]
    [ Vᵀ   D ] = [ Zᵀ    I ] [     C ] ,                              (A.39)

where K0 = LU, LY = V, UᵀZ = V, and C = D − ZᵀY, which is the
Schur complement matrix (see Eldersveld and Saunders [56], Huynh [123]).
Using the block factors, the solution of (A.38) can be computed from
the equations

    L t = b,    C w = f − Zᵀt,    U z = t − Y w.

This method requires a solve with L and U each, one multiply with Y and
Z T , and one solve with the Schur complement C.


Although the augmented system (in general) increases in dimension


by one at every iteration, the K0 block is fixed and defined by the initial
basic set. As the inner iterations proceed, the size of C increases and the
work required to perform a solve or update increases. It may be necessary
to restart the process by discarding the existing factors and re-forming K0
based on the current set of basic variables.
A.5. Finding an initial feasible point. There are two approaches
to finding a feasible point for the QP constraints. The first, common in
linear programming, is to find a point that satisfies the equality constraints
and then iterate (if necessary) to satisfy the bounds. The second method
finds a point that satisfies the bound constraints and then iterates to satisfy
the equality constraints. In each case we assume that an initial nonbasic
set is known (in the SQP context, this is often the final nonbasic set from
the previous QP subproblem).
The first approach is suitable if the variable reduction method is used
in the optimality phase. In this case, a factorization is available of the
matrix B such that AB QB = ( 0   B ). Given xI, a point x0 is computed
that satisfies the constraints Ax = AxI − b, i.e., we define

    B pY = −ĉ(xI),    pF = Y pY,    x0 = xI + pF.

If x0 ≥ 0 does not hold, then x0 is the initial iterate for a phase-one algorithm
that minimizes the linear function −Σi∈V(x) xi, where V(x) is the index set of
violated bounds at x. The two-phase nature of the algorithm is reflected
by changing the function being minimized from the sum of infeasibilities to
the quadratic objective function. The function −Σi∈V(x) xi is a piece-wise
linear function that gives the one-norm of the constraint infeasibilities at
x. A feature of this approach is that many violated constraints can become
feasible at any given iteration.
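As a small illustration (ours), the index set V(x) and the phase-one objective can be evaluated directly:

```python
import numpy as np

def phase_one_objective(x):
    """-sum_{i in V(x)} x_i: the one-norm of the bound infeasibilities."""
    V = np.flatnonzero(x < 0)                      # violated bounds x_i >= 0
    return -np.sum(x[V])
```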
Minimizing the explicit sum of infeasibilities directly is less straight-
forward for the Schur complement method because the objective function
changes from one iteration to the next. In this case, a single phase with a
composite objective may be used, i.e.,

    minimize    gᵀ(x − xI) + ½(x − xI)ᵀH(x − xI) + eᵀu + eᵀv
    x, u, v                                                           (A.40)
    subject to  Ax − u + v = AxI − b,   x ≥ 0,   u ≥ 0,   v ≥ 0,

where e is the vector of ones. This approach has been used by Gould [103],
and Huynh [123]. An alternative is to define a phase-one subproblem that
minimizes the two-norm of the constraint violations, i.e.,
    minimize    ½‖v‖₂²   subject to Ax + v = AxI − b,   x ≥ 0.        (A.41)
      x, v

This problem is a convex QP. Given an initial point x0 and nonbasic set
N0 for the phase-two problem, the basic variables for phase one consist of


the x0 variables in B0 and the m variables v0 such that Ax0 + v0 = AxI − b.


The variables v are always basic.
Another phase-one subproblem appropriate for the Schur complement
method minimizes a strictly convex objective involving the two-norm of the
constraint violations and a primal regularization term:

    minimize    ½‖Ax − b‖₂² + ½σ‖x − xI‖₂²   subject to x ≥ 0.        (A.42)
    x∈Rⁿ

If a feasible point exists, then the problem is feasible with the objective
bounded below by zero. The only constraints of this problem are bounds
on x. Applying the nonbinding direction method of Section A.1.1 gives
    
    [ σIB   AᵀB ] [  pB ]   [  0   ]
    [ AB    −I  ] [ −qπ ] = [ −aνs ] ,

with pN = es and qN = σes − AᵀN qπ. Solving this phase-one problem is
equivalent to applying the regularized QP algorithm in [96] with μ = 1 and
πE = 0 to the problem

    minimize    (σ/2)‖x − x0‖₂²   subject to Ax = b,   x ≥ 0.
    x∈Rⁿ

REFERENCES

[1] P.R. Amestoy, I.S. Duff, J.-Y. L’Excellent, and J. Koster, A fully asyn-
chronous multifrontal solver using distributed dynamic scheduling, SIAM J.
Matrix Anal. Appl., 23 (2001), pp. 15–41 (electronic).
[2] M. Anitescu, On the rate of convergence of sequential quadratic programming
with nondifferentiable exact penalty function in the presence of constraint
degeneracy, Math. Program., 92 (2002), pp. 359–386.
[3] , A superlinearly convergent sequential quadratically constrained quadratic
programming algorithm for degenerate nonlinear programming, SIAM J. Op-
tim., 12 (2002), pp. 949–978.
[4] C. Ashcraft and R. Grimes, SPOOLES: an object-oriented sparse matrix li-
brary, in Proceedings of the Ninth SIAM Conference on Parallel Processing
for Scientific Computing 1999 (San Antonio, TX), Philadelphia, PA, 1999,
SIAM, p. 10.
[5] R.A. Bartlett and L.T. Biegler, QPSchur: a dual, active-set, Schur-
complement method for large-scale and structured convex quadratic program-
ming, Optim. Eng., 7 (2006), pp. 5–32.
[6] D.P. Bertsekas, Constrained optimization and Lagrange multiplier methods,
Athena Scientific, Belmont, Massachusetts, 1996.
[7] M.C. Biggs, Constrained minimization using recursive equality quadratic pro-
gramming, in Numerical Methods for Nonlinear Optimization, F.A. Lootsma,
ed., Academic Press, London and New York, 1972, pp. 411–428.
[8] J. Bisschop and A. Meeraus, Matrix augmentation and partitioning in the
updating of the basis inverse, Math. Program., 13 (1977), pp. 241–254.
[9] P.T. Boggs and J.W. Tolle, Sequential quadratic programming, in Acta Nu-
merica, 1995, Vol. 4 of Acta Numer., Cambridge Univ. Press, Cambridge,
1995, pp. 1–51.


[10] N.L. Boland, A dual-active-set algorithm for positive semi-definite quadratic


programming, Math. Programming, 78 (1997), pp. 1–27.
[11] J.M. Borwein, Necessary and sufficient conditions for quadratic minimality,
Numer. Funct. Anal. and Optimiz., 5 (1982), pp. 127–140.
[12] A.M. Bradley, Algorithms for the Equilibration of Matrices and Their Appli-
cation to Limited-Memory Quasi-Newton Methods, PhD thesis, Institute for
Computational and Mathematical Engineering, Stanford University, Stan-
ford, CA, May 2010.
[13] K.W. Brodlie, A.R. Gourlay, and J. Greenstadt, Rank-one and rank-two
corrections to positive definite matrices expressed in product form, J. Inst.
Math. Appl., 11 (1973), pp. 73–82.
[14] C.G. Broyden, The convergence of a class of double rank minimization algo-
rithms, I & II, J. Inst. Maths. Applns., 6 (1970), pp. 76–90 and 222–231.
[15] A. Buckley and A. LeNir, QN-like variable storage conjugate gradients, Math.
Program., 27 (1983), pp. 155–175.
[16] , BBVSCG–a variable storage algorithm for function minimization, ACM
Trans. Math. Software, 11 (1985), pp. 103–119.
[17] J.R. Bunch, Partial pivoting strategies for symmetric matrices, SIAM J. Numer.
Anal., 11 (1974), pp. 521–528.
[18] J.R. Bunch and L. Kaufman, Some stable methods for calculating inertia and
solving symmetric linear systems, Math. Comput., 31 (1977), pp. 163–179.
[19] , A computational method for the indefinite quadratic programming prob-
lem, Linear Algebra Appl., 34 (1980), pp. 341–370.
[20] J.R. Bunch and B.N. Parlett, Direct methods for solving symmetric indefinite
systems of linear equations, SIAM J. Numer. Anal., 8 (1971), pp. 639–655.
[21] J.V. Burke, A sequential quadratic programming method for potentially infeasi-
ble mathematical programs, J. Math. Anal. Appl., 139 (1989), pp. 319–351.
[22] , A robust trust region method for constrained nonlinear programming prob-
lems, SIAM J. Optim., 2 (1992), pp. 324–347.
[23] J.V. Burke and S.-P. Han, A robust sequential quadratic programming method,
Math. Programming, 43 (1989), pp. 277–303.
[24] R.H. Byrd, F.E. Curtis, and J. Nocedal, Infeasibility detection and SQP meth-
ods for nonlinear optimization, SIAM Journal on Optimization, 20 (2010),
pp. 2281–2299.
[25] R.H. Byrd, N.I.M. Gould, J. Nocedal, and R.A. Waltz, An algorithm for
nonlinear optimization using linear programming and equality constrained
subproblems, Math. Program., 100 (2004), pp. 27–48.
[26] , On the convergence of successive linear-quadratic programming algo-
rithms, SIAM J. Optim., 16 (2005), pp. 471–489.
[27] R.H. Byrd, J. Nocedal, and R.B. Schnabel, Representations of quasi-Newton
matrices and their use in limited-memory methods, Math. Program., 63
(1994), pp. 129–156.
[28] R.H. Byrd, J. Nocedal, and R.A. Waltz, Steering exact penalty methods for
nonlinear programming, Optim. Methods Softw., 23 (2008), pp. 197–213.
[29] R.H. Byrd, R.A. Tapia, and Y. Zhang, An SQP augmented Lagrangian
BFGS algorithm for constrained optimization, SIAM J. Optim., 20 (1992),
pp. 210–241.
[30] R.M. Chamberlain, M.J.D. Powell, C. Lemarechal, and H.C. Pedersen,
The watchdog technique for forcing convergence in algorithms for constrained
optimization, Math. Programming Stud. (1982), pp. 1–17. Algorithms for
constrained minimization of smooth nonlinear functions.
[31] C.M. Chin, A New Trust Region based SLP Filter Algorithm which uses EQP
Active Set Strategy, PhD thesis, Department of Mathematics, University of
Dundee, Scotland, 2001.
[32] C.M. Chin and R. Fletcher, On the global convergence of an SLP-filter algo-
rithm that takes EQP steps, Math. Program., 96 (2003), pp. 161–177.


[33] C.M. Chin, A.H.A. Rashid, and K.M. Nor, A combined filter line search and
trust region method for nonlinear programming, WSEAS Trans. Math., 5
(2006), pp. 656–662.
[34] J.W. Chinneck, Analyzing infeasible nonlinear programs, Comput. Optim.
Appl., 4 (1995), pp. 167–179.
[35] , Feasibility and infeasibility in optimization: algorithms and computa-
tional methods, International Series in Operations Research & Management
Science, 118, Springer, New York, 2008.
[36] T.F. Coleman and A.R. Conn, On the local convergence of a quasi-Newton
method for the nonlinear programming problem, SIAM J. Numer. Anal., 21
(1984), pp. 755–769.
[37] T.F. Coleman and A. Pothen, The null space problem I. Complexity, SIAM J.
on Algebraic and Discrete Methods, 7 (1986), pp. 527–537.
[38] T.F. Coleman and D.C. Sorensen, A note on the computation of an orthogonal
basis for the null space of a matrix, Math. Program., 29 (1984), pp. 234–242.
[39] A.R. Conn, N.I.M. Gould, and Ph. L. Toint, Global convergence of a class of
trust region algorithms for optimization with simple bounds, SIAM J. Numer.
Anal., 25 (1988), pp. 433–460.
[40] , A globally convergent augmented Lagrangian algorithm for optimization
with general constraints and simple bounds, SIAM J. Numer. Anal., 28
(1991), pp. 545–572.
[41] , LANCELOT: a Fortran package for large-scale nonlinear optimization (Re-
lease A), Lecture Notes in Computational Mathematics 17, Springer Verlag,
Berlin, Heidelberg, New York, London, Paris and Tokyo, 1992.
[42] , Trust-Region Methods, Society for Industrial and Applied Mathematics
(SIAM), Philadelphia, PA, 2000.
[43] L.B. Contesse, Une caractérisation complète des minima locaux en programma-
tion quadratique, Numer. Math., 34 (1980), pp. 315–332.
[44] R.W. Cottle, G.J. Habetler, and C.E. Lemke, On classes of copositive ma-
trices, Linear Algebra Appl., 3 (1970), pp. 295–310.
[45] Y.-H. Dai and K. Schittkowski, A sequential quadratic programming algorithm
with non-monotone line search, Pac. J. Optim., 4 (2008), pp. 335–351.
[46] F. Delbos and J.C. Gilbert, Global linear convergence of an augmented La-
grangian algorithm to solve convex quadratic optimization problems, J. Con-
vex Anal., 12 (2005), pp. 45–69.
[47] R.S. Dembo and U. Tulowitzki, Sequential truncated quadratic programming
methods, in Numerical optimization, 1984 (Boulder, Colo., 1984), SIAM,
Philadelphia, PA, 1985, pp. 83–101.
[48] J.E. Dennis, Jr. and R.B. Schnabel, A new derivation of symmetric positive
definite secant updates, in Nonlinear Programming, 4 (Proc. Sympos., Special
Interest Group on Math. Programming, Univ. Wisconsin, Madison, Wis.,
1980), Academic Press, New York, 1981, pp. 167–199.
[49] G. DiPillo and L. Grippo, A new class of augmented Lagrangians in nonlinear
programming, SIAM J. Control Optim., 17 (1979), pp. 618–628.
[50] W.S. Dorn, Duality in quadratic programming, Quart. Appl. Math., 18
(1960/1961), pp. 155–162.
[51] Z. Dostál, A. Friedlander, and S. A. Santos, Adaptive precision control in
quadratic programming with simple bounds and/or equalities, in High per-
formance algorithms and software in nonlinear optimization (Ischia, 1997),
Vol. 24 of Appl. Optim., Kluwer Acad. Publ., Dordrecht, 1998, pp. 161–173.
[52] , Augmented Lagrangians with adaptive precision control for quadratic
programming with equality constraints, Comput. Optim. Appl., 14 (1999),
pp. 37–53.
[53] , Augmented Lagrangians with adaptive precision control for quadratic pro-
gramming with simple bounds and equality constraints, SIAM J. Optim., 13
(2003), pp. 1120–1140 (electronic).


[54] I.S. Duff, MA57—a code for the solution of sparse symmetric definite and in-
definite systems, ACM Trans. Math. Software, 30 (2004), pp. 118–144.
[55] I.S. Duff and J.K. Reid, MA27: a set of Fortran subroutines for solving sparse
symmetric sets of linear equations, Tech. Rep. R-10533, Computer Science
and Systems Division, AERE Harwell, Oxford, England, 1982.
[56] S.K. Eldersveld and M.A. Saunders, A block-LU update for large-scale linear
programming, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 191–201.
[57] O. Exler and K. Schittkowski, A trust region SQP algorithm for mixed-integer
nonlinear programming, Optim. Lett., 1 (2007), pp. 269–280.
[58] A. Fischer, Modified Wilson’s method for nonlinear programs with nonunique
multipliers, Math. Oper. Res., 24 (1999), pp. 699–727.
[59] R. Fletcher, A new approach to variable metric algorithms, Computer Journal,
13 (1970), pp. 317–322.
[60] , A general quadratic programming algorithm, J. Inst. Math. Applics., 7
(1971), pp. 76–91.
[61] , A model algorithm for composite nondifferentiable optimization problems,
Math. Programming Stud. (1982), pp. 67–76. Nondifferential and variational
techniques in optimization (Lexington, Ky., 1980).
[62] , Second order corrections for nondifferentiable optimization, in Numeri-
cal analysis (Dundee, 1981), Vol. 912 of Lecture Notes in Math., Springer,
Berlin, 1982, pp. 85–114.
[63] , An ℓ1 penalty method for nonlinear constraints, in Numerical Optimiza-
tion 1984, P.T. Boggs, R.H. Byrd, and R.B. Schnabel, eds., Philadelphia,
1985, pp. 26–40.
[64] , Practical methods of optimization, Wiley-Interscience [John Wiley &
Sons], New York, 2001.
[65] R. Fletcher, N.I.M. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter,
Global convergence of a trust-region SQP-filter algorithm for general non-
linear programming, SIAM J. Optim., 13 (2002), pp. 635–659 (electronic)
(2003).
[66] R. Fletcher and S. Leyffer, User manual for filterSQP, Tech. Rep. NA/181,
Dept. of Mathematics, University of Dundee, Scotland, 1998.
[67] , Nonlinear programming without a penalty function, Math. Program., 91
(2002), pp. 239–269.
[68] R. Fletcher, S. Leyffer, and Ph. L. Toint, On the global convergence of a
filter-SQP algorithm, SIAM J. Optim., 13 (2002), pp. 44–59 (electronic).
[69] R. Fletcher and E. Sainz de la Maza, Nonlinear programming and nonsmooth
optimization by successive linear programming, Math. Program., 43 (1989),
pp. 235–256.
[70] A. Forsgren, Inertia-controlling factorizations for optimization algorithms,
Appl. Num. Math., 43 (2002), pp. 91–107.
[71] A. Forsgren and P.E. Gill, Primal-dual interior methods for nonconvex non-
linear programming, SIAM J. Optim., 8 (1998), pp. 1132–1152.
[72] A. Forsgren, P.E. Gill, and W. Murray, On the identification of local min-
imizers in inertia-controlling methods for quadratic programming, SIAM J.
Matrix Anal. Appl., 12 (1991), pp. 730–746.
[73] M.P. Friedlander, A Globally Convergent Linearly Constrained Lagrangian
Method for Nonlinear Optimization, PhD thesis, Department of Operations
Research, Stanford University, Stanford, CA, 2002.
[74] M.P. Friedlander and S. Leyffer, Global and finite termination of a two-
phase augmented Lagrangian filter method for general quadratic programs,
SIAM J. Sci. Comput., 30 (2008), pp. 1706–1729.
[75] M.P. Friedlander and M.A. Saunders, A globally convergent linearly con-
strained Lagrangian method for nonlinear optimization, SIAM J. Optim., 15
(2005), pp. 863–897.


[76] M.P. Friedlander and P. Tseng, Exact regularization of convex programs,


SIAM J. Optim., 18 (2007), pp. 1326–1350.
[77] J.C. Gilbert and C. Lemaréchal, Some numerical experiments with variable-
storage quasi-Newton algorithms, Math. Program. (1989), pp. 407–435.
[78] J.R. Gilbert and M.T. Heath, Computing a sparse basis for the null space, Re-
port TR86-730, Department of Computer Science, Cornell University, 1986.
[79] P.E. Gill, N.I.M. Gould, W. Murray, M.A. Saunders, and M.H. Wright,
A weighted Gram-Schmidt method for convex quadratic programming, Math.
Program., 30 (1984), pp. 176–195.
[80] P.E. Gill and M.W. Leonard, Limited-memory reduced-Hessian methods
for large-scale unconstrained optimization, SIAM J. Optim., 14 (2003),
pp. 380–401.
[81] P.E. Gill and W. Murray, Newton-type methods for unconstrained and linearly
constrained optimization, Math. Program., 7 (1974), pp. 311–350.
[82] , Numerically stable methods for quadratic programming, Math. Program.,
14 (1978), pp. 349–372.
[83] P.E. Gill, W. Murray, and M.A. Saunders, SNOPT: An SQP algorithm for
large-scale constrained optimization, SIAM Rev., 47 (2005), pp. 99–131.
[84] , User’s guide for SNOPT Version 7: Software for large-scale nonlinear
programming, Numerical Analysis Report 06-2, Department of Mathematics,
University of California, San Diego, La Jolla, CA, 2006.
[85] P.E. Gill, W. Murray, M.A. Saunders, G.W. Stewart, and M.H. Wright,
Properties of a representation of a basis for the null space, Math. Program-
ming, 33 (1985), pp. 172–186.
[86] P.E. Gill, W. Murray, M.A. Saunders, and M.H. Wright, A note on a
sufficient-decrease criterion for a nonderivative step-length procedure, Math.
Programming, 23 (1982), pp. 349–352.
[87] , Procedures for optimization problems with a mixture of bounds and gen-
eral linear constraints, ACM Trans. Math. Software, 10 (1984), pp. 282–298.
[88] , Sparse matrix methods in optimization, SIAM J. Sci. Statist. Comput., 5
(1984), pp. 562–589.
[89] , Maintaining LU factors of a general sparse matrix, Linear Algebra Appl.,
88/89 (1987), pp. 239–270.
[90] , A Schur-complement method for sparse quadratic programming, Report
SOL 87-12, Department of Operations Research, Stanford University, Stan-
ford, CA, 1987.
[91] , A Schur-complement method for sparse quadratic programming, in Reli-
able Numerical Computation, M.G. Cox and S.J. Hammarling, eds., Oxford
University Press, 1990, pp. 113–138.
[92] , Inertia-controlling methods for general quadratic programming, SIAM
Rev., 33 (1991), pp. 1–36.
[93] , Some theoretical properties of an augmented Lagrangian merit function,
in Advances in Optimization and Parallel Computing, P.M. Pardalos, ed.,
North Holland, North Holland, 1992, pp. 101–128.
[94] P.E. Gill, W. Murray, and M.H. Wright, Practical Optimization, Academic
Press, London and New York, 1981.
[95] P.E. Gill and D.P. Robinson, A primal-dual augmented Lagrangian, Compu-
tational Optimization and Applications (2010), pp. 1–25.
http://dx.doi.org/10.1007/s10589-010-9339-1.
[96] P.E. Gill and E. Wong, Methods for convex and general quadratic program-
ming, Numerical Analysis Report 11-1, Department of Mathematics, Univer-
sity of California, San Diego, La Jolla, CA, 2011.
[97] , A regularized method for convex and general quadratic programming, Nu-
merical Analysis Report 10-2, Department of Mathematics, University of
California, San Diego, La Jolla, CA, 2010.


[98] D. Goldfarb, A family of variable metric methods derived by variational means,


Math. Comp., 24 (1970), pp. 23–26.
[99] , Curvilinear path steplength algorithms for minimization which use direc-
tions of negative curvature, Math. Program., 18 (1980), pp. 31–40.
[100] D. Goldfarb and A. Idnani, A numerically stable dual method for solving
strictly convex quadratic programs, Math. Programming, 27 (1983), pp. 1–33.
[101] N.I.M. Gould, On practical conditions for the existence and uniqueness of so-
lutions to the general equality quadratic programming problem, Math. Pro-
gram., 32 (1985), pp. 90–99.
[102] , On the accurate determination of search directions for simple differen-
tiable penalty functions, IMA J. Numer. Anal., 6 (1986), pp. 357–372.
[103] , An algorithm for large-scale quadratic programming, IMA J. Numer.
Anal., 11 (1991), pp. 299–324.
[104] N.I.M. Gould, D. Orban, and Ph.L. Toint, GALAHAD, a library of thread-
safe Fortran 90 packages for large-scale nonlinear optimization, ACM Trans.
Math. Software, 29 (2003), pp. 353–372.
[105] N.I.M. Gould and D.P. Robinson, A second derivative SQP method with im-
posed descent, Numerical Analysis Report 08/09, Computational Laboratory,
University of Oxford, Oxford, UK, 2008.
[106] , A second derivative SQP method: Global convergence, SIAM J. Optim.,
20 (2010), pp. 2023–2048.
[107] , A second derivative SQP method: Local convergence and practical issues,
SIAM J. Optim., 20 (2010), pp. 2049–2079.
[108] N.I.M. Gould, J.A. Scott, and Y. Hu, A numerical evaluation of sparse direct
solvers for the solution of large sparse symmetric linear systems of equations,
ACM Trans. Math. Software, 33 (2007), pp. Art. 10, 32.
[109] N.I.M. Gould and Ph.L. Toint, An iterative working-set method for large-scale
nonconvex quadratic programming, Appl. Numer. Math., 43 (2002), pp. 109–
128. 19th Dundee Biennial Conference on Numerical Analysis (2001).
[110] , Numerical methods for large-scale non-convex quadratic programming, in
Trends in industrial and applied mathematics (Amritsar, 2001), Vol. 72 of
Appl. Optim., Kluwer Acad. Publ., Dordrecht, 2002, pp. 149–179.
[111] J.-P. Goux and S. Leyffer, Solving large MINLPs on computational grids, Op-
tim. Eng., 3 (2002), pp. 327–346. Special issue on mixed-integer programming
and its applications to engineering.
[112] J. Greenstadt, On the relative efficiencies of gradient methods, Math. Comp.,
21 (1967), pp. 360–367.
[113] L. Grippo, F. Lampariello, and S. Lucidi, Newton-type algorithms with non-
monotone line search for large-scale unconstrained optimization, in System
modelling and optimization (Tokyo, 1987), Vol. 113 of Lecture Notes in
Control and Inform. Sci., Springer, Berlin, 1988, pp. 187–196.
[114] , A truncated Newton method with nonmonotone line search for uncon-
strained optimization, J. Optim. Theory Appl., 60 (1989), pp. 401–419.
[115] , A class of nonmonotone stabilization methods in unconstrained optimiza-
tion, Numer. Math., 59 (1991), pp. 779–805.
[116] N.-Z. Gu and J.-T. Mo, Incorporating nonmonotone strategies into the trust
region method for unconstrained optimization, Comput. Math. Appl., 55
(2008), pp. 2158–2172.
[117] W.W. Hager, Stabilized sequential quadratic programming, Comput. Optim.
Appl., 12 (1999), pp. 253–273. Computational optimization—a tribute to
Olvi Mangasarian, Part I.
[118] S.P. Han, Superlinearly convergent variable metric algorithms for general nonlin-
ear programming problems, Math. Programming, 11 (1976/77), pp. 263–282.
[119] , A globally convergent method for nonlinear programming, J. Optim. The-
ory Appl., 22 (1977), pp. 297–309.

www.it-ebooks.info
222 PHILIP E. GILL AND ELIZABETH WONG

[120] S.P. Han and O.L. Mangasarian, Exact penalty functions in nonlinear pro-
gramming, Math. Programming, 17 (1979), pp. 251–269.
[121] J. Herskovits, A two-stage feasible directions algorithm for nonlinear con-
strained optimization, Math. Programming, 36 (1986), pp. 19–38.
[122] M.R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl., 4
(1969), pp. 303–320.
[123] H.M. Huynh, A Large-Scale Quadratic Programming Solver Based on Block-LU
Updates of the KKT System, PhD thesis, Program in Scientific Computing
and Computational Mathematics, Stanford University, Stanford, CA, 2008.
[124] M.M. Kostreva and X. Chen, A superlinearly convergent method of feasible
directions, Appl. Math. Comput., 116 (2000), pp. 231–244.
[125] , Asymptotic rates of convergence of SQP-type methods of feasible direc-
tions, in Optimization methods and applications, Vol. 52 of Appl. Optim.,
Kluwer Acad. Publ., Dordrecht, 2001, pp. 247–265.
[126] J. Kroyan, Trust-Search Algorithms for Unconstrained Optimization, PhD the-
sis, Department of Mathematics, University of California, San Diego, Febru-
ary 2004.
[127] C.T. Lawrence and A.L. Tits, A computationally efficient feasible sequential
quadratic programming algorithm, SIAM J. Optim., 11 (2001), pp. 1092–1118
(electronic).
[128] S. Leyffer, Integrating SQP and branch-and-bound for mixed integer nonlinear
programming, Comput. Optim. Appl., 18 (2001), pp. 295–309.
[129] D.C. Liu and J. Nocedal, On the limited memory BFGS method for large scale
optimization, Math. Program., 45 (1989), pp. 503–528.
[130] X.-W. Liu and Y.-X. Yuan, A robust algorithm for optimization with gen-
eral equality and inequality constraints, SIAM J. Sci. Comput., 22 (2000),
pp. 517–534 (electronic).
[131] C.M. Maes, A Regularized Active-Set Method for Sparse Convex Quadratic Pro-
gramming, PhD thesis, Institute for Computational and Mathematical Engi-
neering, Stanford University, Stanford, CA, August 2010.
[132] A. Majthay, Optimality conditions for quadratic programming, Math. Program-
ming, 1 (1971), pp. 359–365.
[133] O.L. Mangasarian and S. Fromovitz, The Fritz John necessary optimality
conditions in the presence of equality and inequality constraints, J. Math.
Anal. Appl., 17 (1967), pp. 37–47.
[134] N. Maratos, Exact Penalty Function Algorithms for Finite-Dimensional and
Control Optimization Problems, PhD thesis, Department of Computing and
Control, University of London, 1978.
[135] J. Mo, K. Zhang, and Z. Wei, A variant of SQP method for inequality con-
strained optimization and its global convergence, J. Comput. Appl. Math.,
197 (2006), pp. 270–281.
[136] J.L. Morales, A numerical study of limited memory BFGS methods, Appl.
Math. Lett., 15 (2002), pp. 481–487.
[137] J.L. Morales, J. Nocedal, and Y. Wu, A sequential quadratic programming
algorithm with an additional equality constrained phase, Tech. Rep. OTC-05,
Northwestern University, 2008.
[138] J.J. Moré and D.C. Sorensen, On the use of directions of negative curvature
in a modified Newton method, Math. Program., 16 (1979), pp. 1–20.
[139] , Newton’s method, in Studies in Mathematics, Volume 24. MAA Studies
in Numerical Analysis, G.H. Golub, ed., Math. Assoc. America, Washington,
DC, 1984, pp. 29–82.
[140] J.J. Moré and D. J. Thuente, Line search algorithms with guaranteed sufficient
decrease, ACM Trans. Math. Software, 20 (1994), pp. 286–307.
[141] W. Murray, An algorithm for constrained minimization, in Optimization (Sym-
pos., Univ. Keele, Keele, 1968), Academic Press, London, 1969, pp. 247–258.


[142] W. Murray and F.J. Prieto, A sequential quadratic programming algorithm


using an incomplete solution of the subproblem, SIAM J. Optim., 5 (1995),
pp. 590–640.
[143] J. Nocedal and S.J. Wright, Numerical Optimization, Springer-Verlag, New
York, 1999.
[144] A. Olivares, J.M. Moguerza, and F.J. Prieto, Nonconvex optimization using
negative curvature within a modified linesearch, European J. Oper. Res., 189
(2008), pp. 706–722.
[145] J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations
in several variables, Academic Press, New York, 1970.
[146] , Iterative solution of nonlinear equations in several variables, Society
for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000.
Reprint of the 1970 original.
[147] E.R. Panier and A.L. Tits, A superlinearly convergent feasible method for the
solution of inequality constrained optimization problems, SIAM J. Control
Optim., 25 (1987), pp. 934–950.
[148] , On combining feasibility, descent and superlinear convergence in inequal-
ity constrained optimization, Math. Programming, 59 (1993), pp. 261–276.
[149] P.M. Pardalos and G. Schnitger, Checking local optimality in constrained
quadratic programming is NP-hard, Oper. Res. Lett., 7 (1988), pp. 33–35.
[150] P.M. Pardalos and S.A. Vavasis, Quadratic programming with one negative
eigenvalue is NP-hard, J. Global Optim., 1 (1991), pp. 15–22.
[151] M.J.D. Powell, A method for nonlinear constraints in minimization problems,
in Optimization, R. Fletcher, ed., London and New York, 1969, Academic
Press, pp. 283–298.
[152] , The convergence of variable metric methods for nonlinearly constrained
optimization calculations, in Nonlinear Programming, 3 (Proc. Sympos., Spe-
cial Interest Group Math. Programming, Univ. Wisconsin, Madison, Wis.,
1977), Academic Press, New York, 1978, pp. 27–63.
[153] , A fast algorithm for nonlinearly constrained optimization calculations,
in Numerical Analysis, Dundee 1977, G.A. Watson, ed., no. 630 in Lecture
Notes in Mathematics, Heidelberg, Berlin, New York, 1978, Springer Verlag,
pp. 144–157.
[154] , On the quadratic programming algorithm of Goldfarb and Idnani, Math.
Programming Stud., (1985), pp. 46–61.
[155] D.P. Robinson, Primal-Dual Methods for Nonlinear Optimization, PhD thesis,
Department of Mathematics, University of California, San Diego, Septem-
ber 2007.
[156] S.M. Robinson, A quadratically-convergent algorithm for general nonlinear pro-
gramming problems, Math. Program., 3 (1972), pp. 145–156.
[157] , Perturbed Kuhn-Tucker points and rates of convergence for a class of
nonlinear programming algorithms, Math. Program., 7 (1974), pp. 1–16.
[158] R.T. Rockafellar, Augmented Lagrange multiplier functions and duality in
nonconvex programming, SIAM J. Control Optim., 12 (1974), pp. 268–285.
[159] O. Schenk and K. Gärtner, Solving unsymmetric sparse systems of linear
equations with PARDISO, in Computational science—ICCS 2002, Part II
(Amsterdam), Vol. 2330 of Lecture Notes in Comput. Sci., Springer, Berlin,
2002, pp. 355–363.
[160] K. Schittkowski, The nonlinear programming method of Wilson, Han, and
Powell with an augmented Lagrangian type line search function. I. Conver-
gence analysis, Numer. Math., 38 (1981/82), pp. 83–114.
[161] , The nonlinear programming method of Wilson, Han, and Powell with an
augmented Lagrangian type line search function. II. An efficient implemen-
tation with linear least squares subproblems, Numer. Math., 38 (1981/82),
pp. 115–127.


[162] , On the convergence of a sequential quadratic programming method with an


augmented Lagrangian line search function, Math. Operationsforsch. Statist.
Ser. Optim., 14 (1983), pp. 197–216.
[163] R.B. Schnabel and E. Eskow, A new modified Cholesky factorization, SIAM J.
Sci. and Statist. Comput., 11 (1990), pp. 1136–1158.
[164] D.F. Shanno, Conditioning of quasi-Newton methods for function minimization,
Math. Comp., 24 (1970), pp. 647–656.
[165] R.A. Tapia, A stable approach to Newton’s method for general mathematical
programming problems in Rn , J. Optim. Theory Appl., 14 (1974), pp. 453–
476.
[166] , Diagonalized multiplier methods and quasi-Newton methods for con-
strained optimization, J. Optim. Theory Appl., 22 (1977), pp. 135–194.
[167] Ph. L. Toint, An assessment of nonmonotone linesearch techniques for uncon-
strained optimization, SIAM J. Sci. Comput., 17 (1996), pp. 725–739.
[168] S. Ulbrich, On the superlinear local convergence of a filter-SQP method, Math.
Program., 100 (2004), pp. 217–245.
[169] G. Van der Hoek, Asymptotic properties of reduction methods applying lin-
early equality constrained reduced problems, Math. Program., 16 (1982),
pp. 162–189.
[170] A. Wächter and L.T. Biegler, Line search filter methods for nonlinear pro-
gramming: local convergence, SIAM J. Optim., 16 (2005), pp. 32–48 (elec-
tronic).
[171] , Line search filter methods for nonlinear programming: motivation and
global convergence, SIAM J. Optim., 16 (2005), pp. 1–31 (electronic).
[172] R.B. Wilson, A Simplicial Method for Convex Programming, PhD thesis, Har-
vard University, 1963.
[173] S.J. Wright, Superlinear convergence of a stabilized SQP method to a degenerate
solution, Comput. Optim. Appl., 11 (1998), pp. 253–275.
[174] , Modifying SQP for degenerate problems, SIAM J. Optim., 13 (2002),
pp. 470–497.
[175] , An algorithm for degenerate nonlinear programming with rapid local con-
vergence, SIAM J. Optim., 15 (2005), pp. 673–696.
[176] Y.-X. Yuan, Conditions for convergence of trust region algorithms for nonsmooth
optimization, Math. Programming, 31 (1985), pp. 220–228.
[177] , On the superlinear convergence of a trust region algorithm for nonsmooth
optimization, Math. Programming, 31 (1985), pp. 269–285.
[178] , On the convergence of a new trust region algorithm, Numer. Math., 70
(1995), pp. 515–539.
[179] W.I. Zangwill, Non-linear programming via penalty functions, Management
Sci., 13 (1967), pp. 344–358.
[180] H. Zhang and W.W. Hager, A nonmonotone line search technique and its appli-
cation to unconstrained optimization, SIAM J. Optim., 14 (2004), pp. 1043–
1056 (electronic).

USING INTERIOR-POINT METHODS WITHIN
AN OUTER APPROXIMATION FRAMEWORK FOR
MIXED INTEGER NONLINEAR PROGRAMMING
HANDE Y. BENSON∗

Abstract. Interior-point methods for nonlinear programming have been demon-
strated to be quite efficient, especially for large scale problems, and, as such, they
are ideal candidates for solving the nonlinear subproblems that arise in the solution
of mixed-integer nonlinear programming problems via outer approximation. However,
traditionally, infeasible primal-dual interior-point methods have had two main perceived
deficiencies: (1) lack of infeasibility detection capabilities, and (2) poor performance
after a warmstart. In this paper, we propose the exact primal-dual penalty approach
as a means to overcome these deficiencies. The generality of this approach to handle
any change to the problem makes it suitable for the outer approximation framework,
where each nonlinear subproblem can differ from the others in the sequence in a variety
of ways. Additionally, we examine cases where the nonlinear subproblems take on spe-
cial forms, namely those of second-order cone programming problems and semidefinite
programming problems. Encouraging numerical results are provided.

Key words. interior-point methods, nonlinear programming, integer programming.

AMS(MOS) subject classifications. 90C51, 90C11, 90C30, 90C25.

1. Introduction. The optimization problem considered in this paper is the Mixed Integer Nonlinear Programming (MINLP) problem of the form
min f (x, y)
x,y
s.t. h(x, y) ≥ 0
Ax x ≤ bx (1.1)
Ay y ≤ by
y ∈ Z p,
where x ∈ Rn , f : Rn+p → R and h : Rn+p → Rmn are twice continuously
differentiable, Ax ∈ Rmx ×n , Ay ∈ Rmy ×p , bx ∈ Rmx , by ∈ Rmy , and the
linear constraints define polyhedral sets X and Y, which we assume to be
bounded. When p = 0, we have the standard nonlinear programming prob-
lem (NLP), and when n = 0, we have an integer nonlinear programming
problem. The constraints h(x, y) ≥ 0 are nonlinear, and they can take
special forms such as second-order cone constraints:
ẑ − ‖(x̂, ŷ)‖2 ≥ 0,

where ẑ is a scalar equal to one of the elements of x or y, x̂ and ŷ are vectors consisting of some or all elements of x and y, respectively, and ‖·‖2 denotes the Euclidean norm. These special forms include nonlinear formulations of semidefinite constraints as well, as shown in [10], [13], and [14].

∗ Department of Decision Sciences, Bennett S. LeBow School of Business, Drexel University, Philadelphia, PA 19104 ([email protected]). Research supported by NSF grant CCF-0725692.
The existing algorithms for solving a problem of the form (1.1) em-
ploy a two-level approach. In Branch-and-Bound ([28], [25]), the outer
level successively partitions the feasible region of (1.1) by introducing or
modifying bounds on y, while the inner level solves the continuous subprob-
lems obtained by relaxing the integer constraints. In Outer Approximation
([18],[31]), the outer level solves a mixed-integer linear programming prob-
lem derived by the linearization of the objective and constraint functions
at the solutions of the inner problem which is obtained by fixing the val-
ues of y in the original MINLP. Generalized Benders Decomposition [22]
similarly alternates between the solution of a mixed-integer linear program-
ming problem, but in the dual sense, and a continuous inner problem af-
ter fixing y. Other approaches, such as cutting-plane algorithms [3] exist
for special forms of (1.1) including second-order cone programming prob-
lems. Software implementing these methods includes SBB [21], MINLP
[26], BARON [32], DICOPT [36], AlphaECP [38], FilMINT [1], and
Bonmin [11].
Regardless of the approach chosen to solve (1.1), a sequence of con-
tinuous optimization problems need to be solved, and the solution of these
problems can account for a significant portion of the total runtime. There-
fore, the solution algorithm employed to solve these problems must be
efficient. A major source of this efficiency stems from the ability to reuse
information obtained from solving related problems, or warmstarting. The
solution algorithm must also be provably convergent in a sense that guar-
antees to find the global optimum for the continuous problem when such
a solution exists and to issue a certificate of infeasibility when it does not.
Failing on any single continuous relaxation will mean the failure of the
overall algorithm.
In this paper, we will examine the use of an interior-point method
as the inner level solution algorithm. Lack of warmstart and infeasibil-
ity detection capabilities have long been the main perceived difficulties of
interior-point methods. Restarting from the solution of a previous problem
may lead the algorithm to encounter numerical problems or even to stall,
since the complementarity conditions force some nonnegative variables to
be on the boundary at the given solution. For an infeasible interior-point
method, it may be advantageous to start and remain infeasible throughout
the solution process, and therefore, issuing a certificate of infeasibility for
the problem in general is rather difficult. Additionally, a primal-dual inte-
rior point method seeks the analytic centers of the faces of optimal primal
and dual solutions. Constraint qualifications that confirm the existence
and finiteness of both primal and dual solutions are required to guaran-
tee convergence. In fact, only one of the MINLP codes mentioned above,
[11], uses a pure interior-point method, that is implemented in IPOPT
[37], to solve the inner problem. Nevertheless, numerical studies such as

[9], [8], and [29] demonstrate that interior-point solvers such as ipopt [37],
loqo [34], and knitro [30] are highly efficient and are the only solvers
capable of handling very large scale NLPs. Therefore, it is important to
resolve difficulties associated with warmstarting and infeasibility detection
to implement an efficient and robust MINLP solver using interior-point
methods.
In [5], we analyzed the use of an interior-point method within a branch-
and-bound framework. We showed that the changing bounds could cause
the algorithm to stall when warmstarting, and that, even with a coldstart,
fixed variables and infeasible problems could cause the algorithm to fail.
which was able to greatly improve efficiency, handle fixed variables, and
correctly identify all infeasible subproblems in numerical testing.
In this paper, we turn our attention to interior-point methods within
the Outer Approximation framework. Similar challenges arise in this frame-
work, as well. One key difference is that we will limit ourselves to MINLPs
with convex continuous relaxations, that is, cases where f is convex and h
are concave for (1.1). This is required for the underlying theory of the Outer
Approximation framework, and, while it is a limitation, it will also give us
the chance to explore certain special classes of convex problems, such as
second-order cone programming problems (SOCPs) and semidefinite pro-
gramming problems (SDPs), that arise in the continuous relaxations.
The outline of the paper is as follows: We start with a brief description
of the Outer Approximation framework in Section 2. In Section 3, we intro-
duce an infeasible interior-point method and analyze its challenges within
a MINLP algorithm. To address these challenges, we propose the exact
primal-dual penalty method in Section 4. In Section 5, we turn our attention
to the performance of our algorithm on certain special classes of problems,
such as SOCPs and SDPs. We present implementation details of our approach
and favorable numerical results on problems from the literature in Section 6.
2. Outer approximation. The Outer Approximation (OA) algo-
rithm solves an alternating sequence of NLPs and mixed-integer linear
programming problems (MILPs) to solve (1.1). For each y k ∈ Y ∩ Z p ,
the NLP to be solved is obtained from (1.1) by fixing y = y k :

min f (x, y k )
x
s.t. h(x, y k ) ≥ 0 (2.1)
Ax x ≤ bx .

(2.1) may or may not have a feasible solution. As such, we let xk denote
the solution if one exists and the minimizer of infeasibility otherwise. We
define F(Ŷ) as the set of all pairs of (xk , y k ) where xk is an optimal solution
of (2.1) and I(Ŷ) as the set of all pairs of (xk , y k ) where (2.1) is infeasible
for y k ∈ Ŷ. We also define the following MILP:

min  z
x,y,z
s.t. f (xk , y k ) + ∇f (xk , y k )T (x − xk ; y − y k ) ≤ z,   ∀(xk , y k ) ∈ F(Ŷ)
     h(xk , y k ) + ∇h(xk , y k )T (x − xk ; y − y k ) ≥ 0,   ∀(xk , y k ) ∈ F(Ŷ)          (2.2)
     h(xk , y k ) + ∇h(xk , y k )T (x − xk ; y − y k ) ≥ 0,   ∀(xk , y k ) ∈ I(Ŷ)
     Ax x ≤ bx
     Ay y ≤ by
     y ∈ Z p,

where z ∈ R is a dummy variable.


Assuming that f is convex and h are concave, (1.1) is equivalent to
(2.2) for Ŷ = Y, as shown in [18], [20], and [11]. Of course, solving (2.2)
for Ŷ = Y requires the solution of (2.1) for every yk ∈ Y ∩ Z p , which
constitutes the worst-case scenario. Instead, at each iteration, we solve
(2.2) with Ŷ ⊆ Y . Starting with y 0 ∈ Y ∩ Z p , we let Ŷ = {}. Then,
at each iteration k = 0, . . . , M , we solve (2.1) with y k to obtain xk , let
Ŷ = Ŷ ∪ {y k }, and solve (2.2). The solution gives y k+1 , and we repeat
the process. Throughout the iterations, we keep track of an upper bound
on the optimal objective function value of (2.2). Letting the upper bound
start at ∞, we update it with the optimal objective function value of (2.1)
whenever a solution exists. If this value is not less than the current upper
bound, then we stop the algorithm and declare that the pair (xk , y k ) which
gave the current upper bound is the optimal solution to (1.1).
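For concreteness, the bookkeeping of this loop can be sketched in C as follows. The two "solver" stubs merely stand in for actual solves of (2.1) and (2.2); the stub names, the canned objective values, and all other details are illustrative assumptions, not part of any particular implementation.

#include <math.h>
#include <stdio.h>

typedef struct { int feasible; double obj; } NlpSol;

/* Stub standing in for a solve of (2.1) at the k-th integer assignment. */
static NlpSol solve_nlp_stub(int k)
{
    static const double objs[] = { 7.0, 5.0, 4.0, 4.0 };  /* canned values */
    NlpSol s;
    s.feasible = 1;
    s.obj = objs[k < 3 ? k : 3];
    return s;
}

/* Stub standing in for a solve of the master MILP (2.2). */
static int solve_milp_stub(int k)
{
    return k + 1;            /* pretend the MILP proposes the next assignment */
}

int main(void)
{
    double ub = INFINITY;    /* upper bound on the MINLP optimal value */
    int k = 0, best = -1;
    for (;;) {
        NlpSol s = solve_nlp_stub(k);
        if (s.feasible) {
            if (s.obj >= ub) /* value not below current bound: stop */
                break;
            ub = s.obj;      /* update the upper bound */
            best = k;        /* remember the incumbent pair (x^k, y^k) */
        }
        k = solve_milp_stub(k);  /* get y^{k+1} from the master problem */
    }
    printf("incumbent from iteration %d is optimal, objective %g\n", best, ub);
    return 0;
}

Running the sketch, the stubbed objectives 7, 5, 4 improve the bound until iteration 3 repeats the value 4, at which point the loop declares the pair from iteration 2 optimal, exactly the termination rule described above.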

3. Interior-point methods. The OA approach described above requires the repeated solution of NLPs obtained by fixing the values of the
integer variables y in (1.1). At iteration k of the OA algorithm, (2.1)
is solved for a different value of y k . For each value of y k , therefore, we
can expect changes to both the objective function and the constraints of
(2.1). Depending on the implementation, these changes could even be re-
flected in the problem structure, including the number of constraints and
the nonzero structures of the Jacobian and the Hessian. To solve (2.1), we
use an interior-point method, for which we now provide an overview. A
more detailed explanation can be found in [35].
For ease of notation, let us rewrite (2.1) as follows:

min f (x, y k )
x (3.1)
s.t. g(x, y k ) ≥ 0,

where

g(x, y k ) = [ h(x, y k ) ; bx − Ax x ].

We start by adding the nonnegative slacks w ∈ Rm to the inequality constraints in (3.1):

min f (x, y k )
x,w
s.t. g(x, y k ) − w = 0 (3.2)
w ≥ 0.

We incorporate the slacks in a logarithmic barrier term in the objective function and eliminate the nonnegativity constraints:
min  f (x, y k ) − μ ∑_{i=1}^{m} log wi          (3.3)
x,w
s.t. g(x, y k ) − w = 0,

where μ > 0 is the barrier parameter.


Denoting the Lagrange multipliers by λ, the first-order conditions for
(3.3) are

∇f (x, y k ) − A(x, y k )T λ = 0,
−μe + W Λe = 0, (3.4)
g(x, y k ) − w = 0,

where e is the vector of all ones of appropriate dimension, A(x, y k ) is the Jacobian of the constraints, and W and Λ are diagonal matrices with entries from w and λ, respectively.
Newton’s method is employed to iterate to a point satisfying (3.4).
Letting
H(x, y k , λ) = ∇2 f (x, y k ) − ∑_{i=1}^{m} λi ∇2 gi (x, y k ),
σ = ∇f (x, y k ) − A(x, y k )T λ,          (3.5)
γ = μW −1 e − λ,
ρ = w − g(x, y k ),

the directions given by Newton’s method are found by solving the KKT
system:
⎡ −W −1 Λ     0     −I ⎤ ⎡ Δw ⎤   ⎡ −γ ⎤
⎢    0       −H     AT ⎥ ⎢ Δx ⎥ = ⎢  σ ⎥ .          (3.6)
⎣   −I        A      0 ⎦ ⎣ Δλ ⎦   ⎣  ρ ⎦

Note that we have omitted the use of function arguments for ease of display.
Letting

E = W Λ−1

we can eliminate the slacks to obtain the reduced KKT system:


    
⎡ −H   AT ⎤ ⎡ Δx ⎤   ⎡    σ    ⎤
⎣  A    E ⎦ ⎣ Δλ ⎦ = ⎣ ρ + Eγ ⎦ .          (3.7)

The reduced KKT system is solved by using the LDLᵀ form of Cholesky
factorization, including exploitation of sparsity by reordering the columns
in a symbolic Cholesky routine. As stated before, for each y k , the sparsity
structure of the matrix in (3.7) may change. Such changes are quite com-
mon, especially when the y variables are binary. Fixing yjk to 0 may cause terms in the
objective or the constraint functions to drop. A careful implementation of
the underlying algorithm can take advantage of such changes if they bring
about substantial reduction in size or complexity for certain subproblems,
or use a general enough sparsity structure so that each subsequent nonlin-
ear subproblem can be solved without additional sparsity structure setups
or calls to the symbolic Cholesky routine.
Once the step directions Δx and Δλ are obtained from (3.7), we can
obtain the step directions for the slack variables from the following formula:

Δw = W Λ−1 (μW −1 e − λ − Δλ). (3.8)

The algorithm then proceeds to a new estimate of the optimum by

x(l+1) = x(l) + α(l) Δx(l)
λ(l+1) = λ(l) + α(l) Δλ(l)          (3.9)
w(l+1) = w(l) + α(l) Δw(l) ,

where the superscripts denote the iteration number, α(l) is chosen to ensure
that the slacks w(l+1) and the dual variables λ(l+1) remain strictly positive
and sufficient progress toward optimality and feasibility is attained. At
each iteration, the value of the barrier parameter may also be updated as
a function of W (l+1) λ(l+1) . Both the notion of sufficient progress and the
exact formula for the barrier parameter update vary from one solver to
another, but the general principle remains the same.
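One common concrete choice for α(l) is a fraction-to-the-boundary rule, sketched below in C. The safety factor 0.95 and all names are illustrative assumptions, since, as noted above, the exact rule varies from one solver to another.

#include <stddef.h>

/* Returns the largest alpha in (0, 1] keeping w + alpha*dw and
   lam + alpha*dlam strictly positive, backed off by the factor 0.95. */
double boundary_step(const double *w, const double *dw,
                     const double *lam, const double *dlam, size_t m)
{
    double alpha = 1.0, a;
    for (size_t i = 0; i < m; i++) {
        if (dw[i] < 0.0 && (a = -0.95 * w[i] / dw[i]) < alpha)
            alpha = a;      /* keep w_i + alpha*dw_i > 0 */
        if (dlam[i] < 0.0 && (a = -0.95 * lam[i] / dlam[i]) < alpha)
            alpha = a;      /* keep lam_i + alpha*dlam_i > 0 */
    }
    return alpha;
}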
The algorithm concludes that it has reached an optimal solution when
the primal infeasibility, the dual infeasibility, and the average complemen-
tarity are all less than a given tolerance level. For (3.1), we have that

primal infeasibility = ‖ρ‖∞
dual infeasibility = ‖σ‖∞
average complementarity = wT λ / m,

where ‖·‖∞ denotes the infinity norm.
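A sketch of this stopping test in C, assuming the residual vectors ρ and σ and the current w and λ are already available (the function and parameter names are illustrative):

#include <math.h>
#include <stddef.h>

int converged(const double *rho, const double *sigma, size_t nsig,
              const double *w, const double *lam, size_t m, double tol)
{
    double pinf = 0.0, dinf = 0.0, comp = 0.0;
    for (size_t i = 0; i < m; i++) {
        if (fabs(rho[i]) > pinf) pinf = fabs(rho[i]);     /* ||rho||_inf */
        comp += w[i] * lam[i];
    }
    for (size_t j = 0; j < nsig; j++)
        if (fabs(sigma[j]) > dinf) dinf = fabs(sigma[j]); /* ||sigma||_inf */
    comp /= (double)m;                                    /* w^T lam / m */
    return pinf <= tol && dinf <= tol && comp <= tol;
}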

3.1. Challenges when using interior-point methods. An infeasible interior-point method, such as the one described above, has two main
challenges within the OA framework: guaranteeing that a certificate of in-
feasibility will be issued when a solution does not exist and warmstarting.
An interior-point method such as the one described above is also known
as an infeasible interior-point method. This terminology is used to indi-
cate that the initial values for the primal variables x are not required to
be feasible for the problem. In fact, the iterates are not even required
to be feasible until optimality is also attained. Therefore, an infeasible
interior-point method can potentially iterate forever when attempting to
solve an infeasible problem. In practice, the solver will stop after reaching
a preset iteration limit, and the result will be inconclusive. This leads to a
failure in the overall algorithm, as we cannot produce a certificate of opti-
mality or infeasibility at an iteration. Therefore, an infeasibility detection
scheme is required. This scheme could be a “Phase I” approach where the
interior-point method is first employed to solve the problem of minimizing
constraint violations. If a feasible solution is found, the algorithm proceeds
toward the optimum from there. Otherwise, a certificate of infeasibility
is issued. While detecting infeasibility early in the solution process, this
approach could significantly increase the solution time when an optimal
solution exists, since it essentially requires the solution of two problems.
Another possibility is to use the so-called “elastic mode,” where the algo-
rithm starts solving (2.1), but switches to minimizing the infeasibility only
after certain trigger conditions are observed. Therefore, when an optimal
solution to the original problem exists, it can be found within a single solve,
and if the problem is infeasible, trigger conditions that switch over to the
feasibility problem early enough can keep the number of iterations reason-
able for issuing a certificate of infeasibility. However, in the case of solving
an NLP using the interior-point method described above, defining such
trigger conditions could be a challenge. A third possibility is to use a one-
shot approach, where a reformulation of (2.1) is solved and the solution of
this reformulation gives the optimal solution or a certificate of infeasibility
for the original problem. An example is self-dual embedding, which is well-
developed for second-order cone and semidefinite programming problems.
Versions for convex NLPs ([39], [27]) also exist.
Even if an optimal solution exists, the interior-point method described
above may not be guaranteed to find it. Certain constraint qualifica-
tions required for standard convergence proofs, such as the Mangasarian-
Fromowitz Constraint Qualification (MFCQ), may not be satisfied. There-
fore, it is important to use an approach that is provably convergent under
mild assumptions. Penalty methods, which reformulate the problem (2.1)
and use interior-point methods to solve the resulting problem, are such ap-
proaches. [2], [6], [33], and [24] all use penalty methods, and the algorithms
proposed in these papers make few assumptions, including differentiability and boundedness of the iterates, without requiring strong constraint qualifications.
Warmstarting is the use of information obtained during the solution of
a problem to solve the subsequent, closely-related problems. For the case
of MINLP, warmstarting will refer specifically to setting the initial solution
(including primal, dual, and slack variables) of an NLP of the form (2.1) to
the optimal solution of the previous one solved within the OA framework.
Because of the complementarity conditions, at the optimal solution, some
of the nonnegative slack and dual variables are equal to 0, but starting
the next problem from these values may cause the algorithm to stall. The
following example illustrates exactly what can go wrong:
min  (x − 0.25)² + y
x,y
s.t. −60x³ ≥ −y
     x ≥ 0
     y ∈ {0, 1}.
The reduced KKT system for this problem is:

⎡ −2 − 360xλ1    −180x²     1      ⎤ ⎛ Δx  ⎞   ⎛ 2x − 0.5 + 180x²λ1 − λ2 ⎞
⎢ −180x²         w1 /λ1     0      ⎥ ⎜ Δλ1 ⎟ = ⎜ μ/λ1 − (y − 60x³)        ⎟ .
⎣ 1              0          w2 /λ2 ⎦ ⎝ Δλ2 ⎠   ⎝ μ/λ2 − x                 ⎠
Let y = 1 for the first subproblem. Then, x∗ = 0.25, w∗ = (0.062, 0.25),
and λ∗ = (0, 0). Let y = 0 for the next subproblem. In the first iteration,
Δx = 0, Δλ2 = 0, Δw2 = 0, and Δλ1 > 0 but very close to 0. Then,
Δw1 = μ/λ1 − w1 − (w1 /λ1 )Δλ1 = −1. The steplength is shortened to less than
0.062, and the algorithm becomes stuck at the old solution.


One possible remedy is to simply re-initialize the slack and dual vari-
ables away from 0. Doing so would modify both the diagonal matrix, D, in
(3.7) and the right-hand side of (3.7), forcing the variables to move. While
this seems like a simple remedy, there are two drawbacks to this approach.
First, the initialization is rather arbitrary and may adversely affect the
efficiency of the algorithm at the current node. Secondly, note that simply re-
initializing some of the variables may result in negative step directions for
other dual and slack variables. Since the interior-point method shortens
the steplength to keep such variables strictly positive, the algorithm may
still become stuck.
Traditional penalty approaches, which only provide primal relaxations,
may still become stuck when used with a primal-dual interior-point method
where the steplength α depends on the dual iterates as well. However, they
have other desirable properties, including infeasibility detection capabili-
ties and regularizations that automatically satisfy constraint qualifications.
The exact primal-dual penalty method proposed in [7] for linear program-
ming and in [8] for nonlinear programming is a remedy that has been
demonstrated to work for warmstarts. This method relaxes the nonnegativity constraint on the dual and slack variables and provides regularization
for the matrix in the reduced KKT system (3.7). Thus, the optimal solu-
tion of one problem can be used without modification to provide a warm-
start for another, the regularization ensures that the variables that need to
move indeed make progress, and the algorithm does not become stuck due
to the nonnegativity constraints. This approach was shown to work well for
mixed-integer nonlinear programming within the branch-and-bound frame-
work in [5]. Additional benefits include robustness due to regularization
and infeasibility detection capabilities. Details of this approach for gen-
eral nonlinear programming problems are given in [8], and we provide an
overview here.
4. The exact primal-dual penalty approach. The primal-dual
penalty problem corresponding to (3.1) has the form

min f (x, y k ) + cT ξ
x,w,ξ
s.t. g(x, y k ) − w = 0 (4.1)
−ξ ≤ w ≤ u
ξ ≥ 0,

where c ∈ Rm are the primal penalty parameters, u ∈ Rm are the dual penalty parameters, and ξ ∈ Rm are the primal relaxation variables. This
new form of the primal penalty problem differs from the classical approach
presented in [19] in two crucial aspects: (1) The slacks, w, rather than
the constraint functions and the bounds themselves are relaxed, and (2)
upper bounds are also added to the slacks. Both of these changes are
made specifically for warmstarting, as relaxing the slacks removes their
nonnegativity constraints and allows for longer steps and the upper bounds
serve to relax the dual variables in a similar manner. The dual problem
associated with (4.1) can be expressed as follows:

max  f (x, y k ) − ∇f (x, y k )T x − (h(x, y k ) − A(x, y k )x)T λ − uT ψ
λ,ψ
s.t. ∇f (x, y k ) − A(x, y k )T λ = 0          (4.2)
     −ψ ≤ λ ≤ c − ψ
     ψ ≥ 0,

where ψ ∈ Rm are the dual relaxation variables. These relaxation variables are incorporated into the objective function using a penalty term with dual
penalty parameters u. For further details of the primal and dual problems,
as well as a proof of the exactness of the penalty approach, the reader is
referred to [7] and [8].
We follow the development of Section 3 in order to present
the algorithm to solve (4.1). The logarithmic barrier problem associated
with (4.1) is

min  f (x, y k ) + cT ξ − μ ∑_{i=1}^{m} log(ξi ) − μ ∑_{i=1}^{m} log(wi + ξi ) − μ ∑_{i=1}^{m} log(ui − wi )          (4.3)
x,w,ξ
s.t. g(x, y k ) − w = 0,
where μ > 0 is the barrier parameter. Letting λ once again denote the
dual variables associated with the remaining constraints, the first-order
conditions for the Lagrangian of (4.3) can be written as
g(x, yk ) − w = 0
∇f (x, y k ) − A(x, y k )T λ = 0
λ − μ(W + Ξ)−1 e + μ(U − W )−1 e = 0
c − μΞ−1 e − μ(W + Ξ)−1 e = 0
where Ξ and U are the diagonal matrices with the entries of ξ and u,
respectively. Making the substitution
ψ = μ(U − W )−1 e
we can rewrite the first-order conditions as
g(x, y k ) − w = 0
∇f (x, y k ) − A(x, y k )T λ = 0
(W + Ξ)(Λ + Ψ)e = μe (4.4)
Ξ(C − Λ − Ψ)e = μe
Ψ(U − W )e = μe
where Ψ and C are the diagonal matrices with the entries of ψ and c,
respectively. Note that the new variables ψ serve to relax the nonnegativity
requirements on the dual variables λ, so we refer to them as the dual
relaxation variables.
Applying Newton’s Method to (4.4), and eliminating the step direc-
tions for w, ξ, and ψ, the reduced KKT system arising in the solution of
the penalty problem (4.1) has the same form as (3.7) with
E = [ ((Λ + Ψ)−1 (W + Ξ) + Ξ(C − Λ − Ψ)−1 )−1 + Ψ(U − W )−1 ]−1
                                                                          (4.5)
γ = ((Λ + Ψ)−1 (W + Ξ) + Ξ(C − Λ − Ψ)−1 )−1 (μ(Λ + Ψ)−1 e − μ(C − Λ − Ψ)−1 e − w) − (μ(U − W )−1 e − ψ).

The steplength, α(k) , at each iteration k is chosen to ensure that


w(k+1) + ξ (k+1) > 0
λ(k+1) + ψ (k+1) > 0
ξ (k+1) > 0
ψ (k+1) > 0
u − w(k+1) > 0
c − λ(k+1) − ψ (k+1) > 0

and sufficient progress toward optimality and feasibility is made. The barrier parameter, μ, may be updated at each iteration as a function of
(W + Ξ)(Λ + Ψ)e, Ξ(C − Λ − Ψ)e, and Ψ(U − W )e.
There are several things to note about this approach. First, the spar-
sity structure of the reduced KKT matrix of the penalty problem is the
same as the sparsity structure of (3.7). There are also no additional func-
tion evaluations or other time consuming computations required. This
means that solving the penalty problem (4.1) instead of (3.1) does not re-
quire significant additional computational effort. Second, by modifying E,
the relaxation/penalty scheme is said to regularize the reduced KKT ma-
trix, providing numerical stability as well as aiding in warmstarting. Third,
steplength control no longer relies on the dual and slack variables of the
original problem, thereby allowing for longer steps in the initial iterations
to ensure that the algorithm does not become stuck.
The primal-dual penalty approach presents an ideal remedy to the
warmstarting issues of an interior-point method. For each NLP subprob-
lem, we can use the optimal primal, dual, and slack variable values of the
previous subproblem as the initial solution, and simply re-initialize the
primal and dual relaxation variables in order to facilitate the original vari-
ables to move toward a new optimum. The penalty parameters need to
be chosen large enough to admit the optimal solution of the subproblem,
and warmstart information may be useful to determine appropriate values.
They may also need to be updated during the course of the algorithm.
4.1. Setting and updating the penalty parameters. The most
important aspect of setting the initial values of the penalty parameters is
to ensure that they are sufficiently larger than those components of the
current iterate for which they serve as upper bounds. We let the solution
of one NLP subproblem be (x∗ , w∗ , λ∗ ). The penalty parameters are set as
follows:
u = w∗ + κw e
c = λ∗ + ψ (0) + κλ e
where
κw = max(g(x∗ , y ∗ ), w∗ , 1.0)
κλ = max(λ∗ , 1.0)
The relaxation variables are initialized as
ξ (0) = βκw e          (4.6)
ψ (0) = βκλ e
where β is a constant with a default value of 10⁻⁴. These initializations are
generally sufficient after a warmstart to start the penalty method without
moving the iterates too far from the current point. Note that the relax-
ation is performed using a variable, so if a larger relaxation is needed, the
variables, ξ and ψ, will move as necessary.
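A minimal C sketch of this initialization, assuming each max(·) reduces its vector arguments to the largest single component (the text leaves this reduction implicit) and that g holds the constraint values at the previous optimum; all names are illustrative:

#include <stddef.h>

static double vmax(const double *v, size_t m, double lo)
{
    for (size_t i = 0; i < m; i++)
        if (v[i] > lo) lo = v[i];
    return lo;
}

void init_penalty(const double *g, const double *wstar, const double *lamstar,
                  size_t m, double beta,
                  double *u, double *c, double *xi0, double *psi0)
{
    double kw = vmax(g, m, vmax(wstar, m, 1.0));  /* kappa_w */
    double kl = vmax(lamstar, m, 1.0);            /* kappa_lambda */
    for (size_t i = 0; i < m; i++) {
        psi0[i] = beta * kl;                      /* psi^(0) = beta*kappa_l*e */
        xi0[i]  = beta * kw;                      /* xi^(0)  = beta*kappa_w*e */
        u[i] = wstar[i] + kw;                     /* u = w* + kappa_w e */
        c[i] = lamstar[i] + psi0[i] + kl;         /* c = lam* + psi^(0) + kappa_l e */
    }
}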

Since the initial values of the penalty parameters, u and c, may not
be large enough to admit the optimal solution, we also need an updating
scheme for these parameters. Given the relaxation, an optimal solution
can always be found for (4.1), and one possible “static” updating scheme
is to solve a problem to optimality and to increase the penalty parame-
ters if their corresponding relaxation variables are not sufficiently close to
zero. However, this may require multiple solves of a problem and sub-
stantially increase the number of iterations necessary to find the optimal
solution. Instead, we can use a “dynamic” updating scheme, where the
penalty parameters are checked at the end of each iteration and updated.
For i = 1, . . . , m + mx , if wi(k+1) > 0.9ui(k) , then ui(k+1) = 10ui(k) . Similarly,
if λi(k+1) + ψi(k) > 0.9ci(k) , then ci(k+1) = 10ci(k) .
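In C, this dynamic update rule amounts to the following sketch (names are illustrative):

#include <stddef.h>

void update_penalties(const double *w, const double *lam, const double *psi,
                      double *u, double *c, size_t m)
{
    for (size_t i = 0; i < m; i++) {
        if (w[i] > 0.9 * u[i])
            u[i] *= 10.0;            /* u_i^(k+1) = 10 u_i^(k) */
        if (lam[i] + psi[i] > 0.9 * c[i])
            c[i] *= 10.0;            /* c_i^(k+1) = 10 c_i^(k) */
    }
}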
4.2. Infeasibility detection. In the preceding discussion, we estab-
lished what can go wrong when warmstarting an interior-point method and
proposed the exact primal-dual penalty approach as a remedy. Another
concern for improving the inner level algorithm within our framework was
the efficient identification of infeasible NLP subproblems. The primal-dual
penalty method described as a remedy for warmstarting can also aid in
infeasibility identification. Since all of the slack variables are relaxed, the
penalty problem (4.1) always possesses a feasible solution. In addition, the
upper bounds on the slack variables guarantee that an optimal solution
to (4.1) always exists. Therefore, a provably convergent NLP algorithm is
guaranteed to find an optimal solution to (4.1). If this solution has the
property that ξi → a for at least one i = 1, . . . , m + mx for some scalar
a > 0 as ci → ∞, then the original problem is infeasible.
It is impractical to allow a penalty parameter to become infinite. How-
ever, a practical implementation can be easily devised by simply dropping
the original objective function and minimizing only the penalty term, which
is equivalent to letting all the penalty parameters become infinite. There-
fore, a feasibility restoration phase similar to the “elastic mode” of snopt
[23] can be used, in that the problem
min cT ξ
x,w,ξ
s.t. g(x, y k ) − w = 0 (4.7)
−ξ ≤ w ≤ u
ξ ≥ 0,
is solved in order to minimize infeasibility. It differs from snopt’s version
in that the slack variables are still bounded above by the dual penalty
parameters. Since these parameters get updated whenever necessary, we
can always find a feasible solution to (4.7). If the optimal objective function
value is nonzero (numerically, greater than the infeasibility tolerance), a
certificate of infeasibility can be issued.
While a feasibility problem can be defined for the original NLP sub-
problem (2.1) as well, a trigger condition for switching into the "elastic mode" for solving it is not easy to define within the context of the interior-
point method of Section 3. However, the exact primal-dual penalty ap-
proach can simply track the number of dynamic updates made to the
penalty parameters and switch over to solving (4.7) after a finite num-
ber of such updates are performed. In our numerical testing, we have set
this trigger to occur after three updates to any single penalty parameter.
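A sketch of such a trigger in C, shown for the u parameters only (the c parameters would be tracked the same way); the counter array and all names are illustrative assumptions:

#include <stddef.h>

/* Returns 1 when the iteration should switch to the feasibility
   problem (4.7), i.e., once any single parameter has been updated
   three times; otherwise returns 0. */
int update_with_trigger(const double *w, double *u, int *ucount, size_t m)
{
    int elastic = 0;
    for (size_t i = 0; i < m; i++) {
        if (w[i] > 0.9 * u[i]) {
            u[i] *= 10.0;
            if (++ucount[i] >= 3)
                elastic = 1;
        }
    }
    return elastic;
}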
Note that other infeasibility detection schemes based on penalty meth-
ods are available (see [16]) which would not require the solution of a sep-
arate feasibility problem. As their warmstarting capabilities are yet un-
known, we will investigate such approaches in future work.
5. Special forms of convex NLPs. One class of problems that fits
well into the OA framework is conic programming, specifically second-order
cone programming and semidefinite programming. This class of problems
is especially important in a variety of engineering applications and as re-
laxations of some NP-hard combinatorial problems. Much of the research
has focused on problems that are otherwise linear, due in part to the abun-
dance of strong theoretical results and the ease of extending established
and implemented linear programming algorithms. However, as the models
in each of these areas become more realistic and more complicated, many
of the problems are expressed with nonlinearities in the objective func-
tion and/or the constraints. To handle such nonlinearities efficiently, one
approach is to fit the problem into the NLP framework through reformu-
lation or separation into a series of NLP subproblems. In addition, these
problems can also have some discrete variables, and fitting them into an
NLP framework allows for the use of the efficient mixed-integer nonlinear
programming techniques for their solution.
In standard form, a mixed-integer nonlinear cone programming prob-
lem is given by

min f (x, y)
x,y
s.t. h(x, y) ≥ 0 (5.1)
x∈K
y ∈ Y,

where K is a cone. The second-order, or Lorentz, cone is defined by

K = {(x0 , x1 ) ∈ Rn : x0 ∈ R, x1 ∈ Rn−1 , ‖x1 ‖2 ≤ x0 },          (5.2)

where ‖·‖2 denotes the Euclidean norm. The semidefinite cone is defined by

K = {x ∈ Rn : mat(x) ⪰ 0},          (5.3)

where mat(x) ∈ Rk×k with n = k² is the matlab-like notation for the columnwise definition of a matrix from the vector x, and ⪰ 0 constrains this matrix to be symmetric and positive semidefinite. Note that K can also represent the intersection of finitely many such cones.
The primal-dual penalty method can be applied to this problem just
as in (4.1). The cone constraint can be handled as

x+ξ ∈ K
u−x ∈ K (5.4)
ξ ∈ K.

For a second order cone, it is sufficient to pick ξ = (ξ0 , 0), and for a semidef-
inite cone, we only need mat(ξ) to be a diagonal matrix. As before, the
objective function is also converted to

f (x, y) + cT ξ.

Since both the second-order cone and the cone of positive semidefinite
matrices are self-dual, the dual problem also involves a cone constraint,
which is similarly relaxed and bounded.
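For the Lorentz cone, the smallest admissible relaxation of the form ξ = (ξ0 , 0) can be computed directly: since x + ξ ∈ K requires ‖x1 ‖2 ≤ x0 + ξ0 and ξ ∈ K requires ξ0 ≥ 0, it suffices to take ξ0 = max(0, ‖x1 ‖2 − x0 ). The following C sketch is purely illustrative of this computation:

#include <math.h>
#include <stddef.h>

/* x = (x0, x1) with x1 of length n-1; returns the smallest xi0 with
   (x0 + xi0, x1) in the second-order cone (5.2). */
double soc_relaxation(const double *x, size_t n)
{
    double nrm = 0.0;
    for (size_t i = 1; i < n; i++)
        nrm += x[i] * x[i];
    nrm = sqrt(nrm);                        /* ||x1||_2 */
    return nrm > x[0] ? nrm - x[0] : 0.0;   /* xi0 */
}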
For both second-order and semidefinite cones, the reformulation of
the cone constraints to fit into the NLP framework has been extensively
discussed in [10]. For second-order cones, an additional challenge is the
nondifferentiability of the Euclidean norm in (5.2). In fact, if the optimal
solution includes x∗1 = 0, it can cause numerical problems for convergence
of the NLP algorithm and theoretical complications for the formulation of
the subsequent MILP even if numerical convergence can be attained for
the NLP. There are several ways around this issue: if a preprocessor is
used and a nonzero lower bound for x0 is available, then the so-called ratio
reformulation (see [10]) can be used to rewrite the cone constraint of (5.2)
as
x1ᵀ x1 / x0 ≤ x0 ,   x0 ≥ 0.

Similarly, if preprocessing can determine that ‖x1 ‖2 and x0 are bounded above by small values, then we can rewrite the cone constraint as

e^((x1ᵀ x1 − x0²)/2) ≤ 1,   x0 ≥ 0.

Both of these formulations yield convex NLPs, but they are not general
enough. In our implementation, we have used the constraint as given in
(5.2), but a more thorough treatment using a subgradient approach is dis-
cussed in [17].
6. Numerical results. We implemented an OA framework and the
interior-point method using the primal-dual penalty approach in the solver
milano [4]. For comparison purposes, we also implemented the interior-
point method outlined at the beginning of Section 2. The MILPs that arise

within the OA framework were solved using a branch-and-bound algorithm using interior-point methods to solve the LP relaxations. We tested both
codes on 12 problems from the MINLPLib [15] test suite and 2 MINLPs
with second-order cone constraints from [17]. The problems were chosen
to have convex nonlinear relaxations, to be small for easy implementation
in Matlab, and to require more than one NLP subproblem in the solution
process so that the effect of warmstarting could be measured. We included
only two of the small problems from [17] because several of the remaining
problems had artificial continuous variables and equality constraints and
only had integer variables when converted to the form (2.1). Since milano
is implemented for the Matlab environment, we converted the problems
from MINLPLib from the gams [12] format to Matlab format.

The initial primal and dual solutions used when warmstarting are the
optimal primal and dual solutions of the previous NLP subproblem. For
coldstarts, we used any user-provided initial solutions, and where none were
available, the primal variable was initialized to 0 and all nonnegative slack
and dual variables were initialized to 1. Numerical experience in Table
1 indicates that using this solution can improve the performance of the
algorithm. However, a better primal initial solution can be the optimal
x values from the current MILP. In this case, we would need to use an
approximation to the Lagrange multipliers, for example by approximately
solving a QP model of the NLP subproblem. This will be part of our future
work.

In Table 1, we present results highlighting the effect of the primal-dual penalty approach on the interior-point method. In our testing, we
have the primal-dual penalty approach determine the subproblems to be
solved, and the columns “WarmIters” and “ColdIters” provide the average
number of iterations over those subproblems after a warmstart using the
primal-dual penalty approach and a coldstart using the original form of
the interior-point method, respectively. The column “%Impr” indicates
the percentage improvement in the number of iterations. This number is
not always positive, but the warmstart approach is never more than 17%
worse than the coldstart approach. The worsening can be remedied in
many cases using different initial values for the penalty parameters and the
relaxation variables. In the remaining 30 of the 32 subproblems solved, the
percentage improvement can range from 0 to 65%.

We also provide information on the infeasible problems identified by the penalty approach. Since the original formulation of the interior-point
method has no mechanism with which to issue a certificate of infeasibility,
the coldstart algorithm goes to an iteration limit of 500 for each infeasible
subproblem, after making a significant computational effort. This means
that for problems alan, fuel, gbd, and synthes2, the OA algorithm would
fail after encountering an infeasible NLP subproblem.

Table 1
Comparison of the warmstarting primal-dual penalty approach to coldstarts on
small problems from the MINLPLib test suite and two mixed-integer second-order cone
programming problems from [17] (indicated by “*” in the table). “#” indicates the NLP
subproblem being solved, WarmIters and ColdIters are the numbers of warmstart and
coldstart iterations, respectively, and %Impr is the percentage of improvement in the
number of iterations. (INF) indicates that a certificate of infeasibility was issued, and
(IL) denotes that the algorithm reached its iteration limit.

NAME n p m + mx # WarmIters ColdIters %Impr

alan 4 4 6 1 (INF) (IL) –
2 8 10 20.00
3 10 13 23.07
4 11 13 15.38
5 9 13 30.77
6 9 10 10.00
bsp5var2* 2 1 4 1 7 7 0.00
2 6 7 14.29
ex1223a 3 4 9 1 10 13 23.07
2 7 12 41.67
ex1223b 3 4 9 1 11 13 15.38
2 12 15 20.00
3 9 12 25.00
4 9 11 18.18
5 10 11 9.09
ex1223 7 4 13 1 13 12 -8.30
2 13 15 13.33
3 15 16 6.25
4 12 15 20.00
5 14 12 -16.67
fuel 12 3 15 1 (INF) (IL) –
2 18 51 64.71
gbd 1 3 2 1 (INF) (IL) –
2 7 9 22.22
gkocis 8 3 8 1 15 14 -7.14
2 11 12 8.33
oaer 6 3 6 1 13 12 -8.33
2 12 16 25.00
procsel 7 3 7 1 10 10 0.00
2 9 12 25.00
3 9 10 10.00
4 10 10 0.00
st e14 7 4 13 1 11 13 15.38
2 12 15 20.00
3 9 12 25.00
4 9 11 18.18
5 10 11 9.09
synthes1 3 3 5 1 15 14 -7.14
2 12 12 0.00
3 12 14 14.29
synthes2 6 5 12 1 (INF) (IL) –
2 15 15 0.00
3 9 14 35.71
4 9 18 50.00
5 10 17 41.18
test5* 1 4 3 1 9 9 0.00
2 6 8 25.00

7. Conclusion. In this paper, we described the solution of a mixed-integer nonlinear programming problem using an interior-point method
within the context of an outer approximation algorithm. We resolved
the issues of warmstarting, infeasibility detection, and robustness for the
interior-point method. In doing so, we used the exact primal-dual penalty
method of [7] and [8]. The resulting algorithm was implemented using the
interior-point code MILANO [4] and tested on a suite of MINLPs. The
numerical testing yielded encouraging results.
As discussed, interior-point codes have been shown to be computation-
ally superior to other approaches in studies such as [9] for large problems.
Therefore, the proposed approach is especially attractive for large MINLPs,
where an interior-point code may be the only means of obtaining a solution
to each continuous relaxation in a reasonable amount of time. The use of
the primal-dual penalty method further improves the robustness and the
efficiency of this approach.
The next step in this study is to incorporate the proposed approach
within a more efficient algorithm to handle the integer variables and in-
troduce heuristics for generating feasible solutions quickly. Numerical re-
sults in [7] and [8] demonstrate the strong performance of the primal-dual
penalty approach under a variety of problem modifications, including the
addition of constraints and variables. Thus, we are optimistic that the
performance improvements demonstrated in this paper will continue to be
applicable when used within any integer programming framework.

Acknowledgements. The author wishes to thank Sven Leyffer and an anonymous referee for their helpful comments and suggestions.

REFERENCES

[1] K. Abhishek, S. Leyffer, and J. Linderoth, FilMINT: An outer-approximation-based solver for nonlinear mixed integer programs, Tech. Rep. Preprint ANL/MCS-P1374-0906, Argonne National Laboratory, Mathematics and Computer Science Division, September 2006.
[2] P. Armand, A quasi-Newton penalty barrier method for convex minimization prob-
lems, Computational Optimization and Applications, 26 (2003), pp. 5–34.
[3] A. Atamtürk and V. Narayanan, Conic mixed-integer rounding cuts, Research
Report BCOL.06.03, IEOR, University of California-Berkeley, December 2006.
[4] H. Benson, MILANO - a Matlab-based code for mixed-integer linear and nonlinear
optimization. http://www.pages.drexel.edu/~hvb22/milano.
[5] , Mixed-integer nonlinear programming using interior-point methods, tech.
rep., Submitted to Optimization Methods and Software, November 2007.
[6] H. Benson, A. Sen, and D. Shanno, Convergence analysis of an interior-point
method for nonconvex nonlinear programming, tech. rep., Submitted to Math-
ematical Programming Computation, February 2009.
[7] H. Benson and D. Shanno, An exact primal-dual penalty method approach to
warmstarting interior-point methods for linear programming, Computational
Optimization and Applications, 38 (2007), pp. 371–399.


[8] , Interior-point methods for nonconvex nonlinear programming: Regularization and warmstarts, Computational Optimization and Applications, 40
(2008), pp. 143–189.
[9] H. Benson, D. Shanno, and R. Vanderbei, Interior-point methods for nonconvex
nonlinear programming: Filter methods and merit functions, Computational
Optimization and Applications, 23 (2002), pp. 257–272.
[10] H. Benson and R. Vanderbei, Solving problems with semidefinite and related
constraints using interior-point methods for nonlinear programming, Mathe-
matical Programming B, 95 (2003), pp. 279–302.
[11] P. Bonami, L. Biegler, A. Conn, G. Cornuejols, I. Grossmann, C. Laird,
J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An algorithmic
framework for convex mixed integer nonlinear programs, Discrete Optimiza-
tion, 5 (2008), pp. 186–204.
[12] A. Brooke, D. Kendrick, and A. Meeraus, GAMS: A User’s Guide, Scientific
Press, 1988.
[13] S. Burer, R. Monteiro, and Y. Zhang, Solving semidefinite programs via non-
linear programming part I: Transformations and derivatives, tech. rep., TR99-
17, Dept. of Computational and Applied Mathematics, Rice University, Hous-
ton TX, 1999.
[14] , Solving semidefinite programs via nonlinear programming part II: Interior
point methods for a subclass of SDPs, tech. rep., TR99-17, Dept. of Compu-
tational and Applied Mathematics, Rice University, Houston TX, 1999.
[15] M. Bussieck, A. Drud, and A. Meeraus, MINLPLib - a collection of test models
for mixed-integer nonlinear programming, INFORMS Journal on Computing,
15(1) (2003), pp. 114–119.
[16] R.H. Byrd, F.E. Curtis, and J. Nocedal, Infeasibility detection and SQP meth-
ods for nonlinear optimization, SIAM Journal on Optimization, 20 (2010),
pp. 2281–2299.
[17] S. Drewes, Mixed integer second order cone programming, PhD thesis, Technische Universität Darmstadt, Darmstadt, Germany, 2009.
[18] M. Duran and I. Grossmann, An outer-approximation algorithm for a class of
mixed-integer nonlinear programs, Mathematical Programming, 36 (1986),
pp. 307–339.
[19] R. Fletcher, Practical Methods of Optimization, J. Wiley and Sons, Chichester,
England, 1987.
[20] R. Fletcher and S. Leyffer, Solving mixed integer nonlinear programs by outer
approximation, Mathematical Programming, 66 (1994), pp. 327–349.
[21] GAMS, GAMS-SBB user notes. March 2001.
[22] A. Geoffrion, Generalized Benders decomposition, Journal of Optimization The-
ory and Applications, 10 (1972), pp. 237–260.
[23] P. Gill, W. Murray, and M. Saunders, User’s guide for SNOPT 5.3: A Fortran
package for large-scale nonlinear programming, tech. rep., Systems Optimiza-
tion Laboratory, Stanford University, Stanford, CA, 1997.
[24] N. Gould, D. Orban, and P. Toint, An interior-point l1-penalty method for
nonlinear optimization, Tech. Rep. RAL-TR-2003-022, Rutherford Appleton
Laboratory Chilton, Oxfordshire, UK, November 2003.
[25] O.K. Gupta and A. Ravindran, Branch and bound experiments in convex nonlin-
ear integer programming, Management Science, 31(12) (1985), pp. 1533–1546.
[26] S. Leyffer, Integrating SQP and branch-and-bound for mixed integer nonlinear
programming, Tech. Rep. NA-182, Department of Mathematics, University of
Dundee, August 1998.
[27] Z. Luo, J. Sturm, and S. Zhang, Conic convex programming and self-dual em-
bedding, Optimization Methods and Software, 14 (2000), pp. 169–218.
[28] G. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization, Wiley,
New York, 1988.

www.it-ebooks.info
USING IPMS WITHIN OUTER APPROXIMATION FOR MINLP 243

[29] J. Nocedal, J. Morales, R. Waltz, G. Liu, and J. Goux, Assessing the po-
tential of interior-point methods for nonlinear optimization, in Large-Scale
PDE-Constrained Optimization, Lecture Notes in Computational Science and
Engineering, Vol. 30, 2003, pp. 167–183.
[30] J. Nocedal and R.A. Waltz, Knitro 2.0 user’s manual, Tech. Rep. OTC 02-2002,
Optimization Technology Center, Northwestern University, January 2002.
[31] I. Quesada and I. Grossmann, An LP/NLP based branch and bound algorithm for
convex MINLP optimization problems, Computers and Chemical Engineering,
16 (1992), pp. 937–947.
[32] N. Sahinidis, Baron: A general purpose global optimization software package,
Journal of Global Optimization, 8(2) (1996), pp. 201–205.
[33] A. Tits, A. Wächter, S. Bakhtiari, T. Urban, and C. Lawrence, A primal-
dual interior-point method for nonlinear programming with strong global and
local convergence properties, SIAM Journal on Optimization, 14 (2003),
pp. 173–199.
[34] R. Vanderbei, LOQO user’s manual—version 3.10, Optimization Methods and
Software, 12 (1999), pp. 485–514.
[35] R. Vanderbei and D. Shanno, An interior-point algorithm for nonconvex nonlin-
ear programming, Computational Optimization and Applications, 13 (1999),
pp. 231–252.
[36] J. Viswanathan and I. Grossmann, A combined penalty function and outer ap-
proximation method for MINLP optimization, Computers and Chemical En-
gineering, 14 (1990), pp. 769–782.
[37] A. Wächter and L.T. Biegler, On the implementation of an interior-point filter
line-search algorithm for large-scale nonlinear programming, Tech. Rep. RC
23149, IBM T.J. Watson Research Center, Yorktown, USA, March 2004.
[38] T. Westerlund and K. Lundqvist, Alpha-ECP version 5.01: An interactive MINLP-solver based on the extended cutting plane method, Tech. Rep. 01-178-A, Process Design Laboratory, Åbo Akademi University, 2001.
[39] S. Zhang, A new self-dual embedding method for convex programming, Journal of
Global Optimization, 29 (2004), pp. 479–496.

PART IV:
Expression Graphs

USING EXPRESSION GRAPHS IN
OPTIMIZATION ALGORITHMS
DAVID M. GAY∗

Abstract. An expression graph, informally speaking, represents a function in a way that can be manipulated to reveal various kinds of information about the function,
such as its value or partial derivatives at specified arguments and bounds thereon in
specified regions. (Various representations are possible, and all are equivalent in com-
plexity, in that one can be converted to another in time linear in the expression’s size.)
For mathematical programming problems, including the mixed-integer nonlinear pro-
gramming problems that were the subject of the IMA workshop that led to this paper,
there are various advantages to representing problems as collections of expression graphs.
“Presolve” deductions can simplify the problem, e.g., by reducing the domains of some
variables and proving that some inequality constraints are never or always active. To
find global solutions, it is helpful sometimes to solve relaxed problems (e.g., allowing
some “integer” variables to vary continuously or introducing convex or concave relax-
ations of some constraints or objectives), and to introduce “cuts” that exclude some
relaxed variable values. There are various ways to compute bounds on an expression
within a specified region or to compute relaxed expressions from expression graphs. This
paper sketches some of them. As new information becomes available in the course of a
branch-and-bound (or -cut) algorithm, some expression-graph manipulations and pre-
solve deductions can be revisited and tightened, so keeping expression graphs around
during the solution process can be helpful. Algebraic problem representations are a
convenient source of expression graphs. One of my reasons for interest in the AMPL
modeling language is that it delivers expression graphs to solvers.

Key words. Expression graphs, automatic differentiation, bound computation,


constraint propagation, presolve.

AMS(MOS) subject classifications. Primary 68U01, 68N20, 68W30, 05C85.

1. Introduction. For numerically solving a problem, various problem representations are often possible. Many factors can influence one's
choice of representation, including familiarity, computational costs, and
interfacing needs. Representation possibilities include some broad, often
overlapping categories that may be combined with uses of special-purpose
libraries: general-purpose programming compiled languages (such as C,
C++, Fortran, and sometimes Java), interpreted languages (such as awk,
Java, or Python), and “little languages” specialized for dealing with par-
ticular problem domains (such as AMPL for mathematical programming
or MATLAB for matrix computations — and much else). Common to

∗ Sandia National Laboratories, Albuquerque, NM 87185-1318. Sandia National Laboratories is a multi-program laboratory operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin company, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE–AC04–94AL85000. This manuscript (SAND2009–5066C) has been authored by a contractor of the U.S. Government under contract DE–AC04–94AL85000. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.


[Figure 1: leaf nodes x, 3, y, and 4 feed a − node (x − 3) and a + node (y + 4); each result passes through a ()ˆ2 node, and a final + node sums the two squares.]
Fig. 1. Expression graph for f (x, y) = (x − 3)² + (y + 4)².

most such representations is that they are turned into expression graphs
behind the scenes: directed graphs where each node represents an oper-
ation, incoming edges represent operands to the operation, and outgoing
edges represent uses of the result of the operation. This is illustrated in
Figure 1, which shows an expression graph for the function f : R2 → R
given by f (x, y) = (x − 3)² + (y + 4)², which involves operators for
addition (+), subtraction (−), and squaring (()ˆ2).
It can be convenient to have an explicit expression graph and to com-
pute with it or manipulate it in various ways. For example, for smooth
optimization problems, we can turn expression graphs for objective and
constraint-body evaluations into reasonably efficient ways to compute both
these functions and their gradients. When solving mixed-integer nonlinear
programming (MINLP) problems, computing bounds and convex underes-
timates (or concave overestimates) can be useful and can be done with ex-
plicit expression graphs. Problem simplifications by “presolve” algorithms
and (similarly) domain reductions in constraint programming are readily
carried out on expression graphs.
This paper is concerned with computations related to solving a math-
ematical programming problem: given D ⊆ Rn , f : D → R, c : D → Rm ,
and ℓ, u ∈ D ∪ {−∞, ∞}n with ℓi ≤ ui ∀ i, find x∗ such that x = x∗ solves

Minimize f (x)
subject to ℓ ≤ c(x) ≤ u          (1.1)
and x ∈ D.

For MINLP problems, D restricts some components to be integers, e.g.,

D = Rp × Zq , (1.2)

with n = p + q.
One of my reasons for interest in the AMPL [7, 8] modeling language
for mathematical programming is that AMPL makes explicit expression
graphs available to separate solvers. Mostly these graphs are only seen and
manipulated by the AMPL/solver interface library [13], but one could also
use them directly in the computations described below.
There are various ways to represent expression graphs. For exam-
ple, AMPL uses a Polish prefix notation (see, e.g., [31]) for the nonlinear
parts of problems conveyed to solvers via a “.nl” file. Kearfott [21] uses
a representation via 4-tuples (operation, result, left, and right operands).
Representations in XML have also been advocated ([9]). For all the specific
representations I have seen, converting from one form to another takes time
linear in the length (nodes + arcs) of the expression graph.
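As an illustration of the 4-tuple style of representation, the following self-contained C sketch encodes the graph of Figure 1 as (operation, result, left, right) tuples and evaluates it in one forward pass; the data layout is an assumption for illustration, not Kearfott's actual one:

#include <stdio.h>

enum Op { SUB, ADD, SQR };
typedef struct { enum Op op; int res, left, right; } Tuple;

int main(void)
{
    /* slots 0-3 hold x, y, and the constants 3 and 4; 4-8 hold results */
    double v[9] = { 1.0, 2.0, 3.0, 4.0 };   /* evaluate at x = 1, y = 2 */
    const Tuple g[] = {
        { SUB, 4, 0, 2 },    /* t4 = x - 3 */
        { ADD, 5, 1, 3 },    /* t5 = y + 4 */
        { SQR, 6, 4, -1 },   /* t6 = t4^2 (right operand unused) */
        { SQR, 7, 5, -1 },   /* t7 = t5^2 */
        { ADD, 8, 6, 7 },    /* t8 = t6 + t7 = f(x,y) */
    };
    for (int i = 0; i < 5; i++) {
        const Tuple *t = &g[i];
        switch (t->op) {
        case SUB: v[t->res] = v[t->left] - v[t->right]; break;
        case ADD: v[t->res] = v[t->left] + v[t->right]; break;
        case SQR: v[t->res] = v[t->left] * v[t->left];  break;
        }
    }
    printf("f(1,2) = %g\n", v[8]);   /* (1-3)^2 + (2+4)^2 = 40 */
    return 0;
}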
The rest of this paper is organized as follows. The next several sec-
tions discuss derivative computations (§2), bound computations (§3), pre-
solve and constraint propagation (§4), convexity detection (§5), and outer
approximations (§6). Concluding remarks appear in the final section (§7).
2. Derivative computations. When f and c in (1.1) are continu-
ously differentiable in their continuous variables (i.e., the first p variables
when (1.2) holds), use of their derivatives is important for some algorithms;
when integrality is relaxed, partials with respect to nominally integer vari-
ables may also be useful (as pointed out by a referee). Similarly, when
f and c are twice differentiable, some algorithms (variants of Newton’s
method) can make good use of their first and second derivatives. In the
early days of computing, the only known way to compute these derivatives
without the truncation errors of finite differences was to compute them by
the rules of calculus: deriving from, e.g., an expression for f (x) expres-
sions for the components of ∇f (x), then evaluating the derived formulae
as needed. Hand computation of derivatives is an error-prone process, and
many people independently discovered [18] a class of techniques called Au-
tomatic Differentiation (or Algorithmic Differentiation), called AD below.
The idea is to modify a computation so it computes both function and
desired partial derivatives as it proceeds — an easy thing to do with an
expression graph. Forward AD is easiest to understand and implement:
one simply applies the rules of calculus to recur desired partials for the
result of an operation from the partials of the operands. When there is

only one independent variable, it is easy and efficient to recur high-order derivatives with respect to that variable. For example, Berz et al. [3, 4]
have done highly accurate simulations of particle beams using high-order
Taylor series (i.e., by recurring high derivatives).
Suppose f is a function of n > 1 variables and that computing f (x)
involves L operations. Then the complexity of computing f (x) and its
gradient ∇f (x) by forward AD is O(nL). It can be much faster to use
“reverse AD” to compute f (x) and ∇f (x). With this more complicated
AD variant, one first computes f (x), then revisits the operations in reverse
order to compute the “adjoint” of each operation, i.e., the partial of f
with respect to the result of the operation. By the end of this “reverse
sweep”, the computed adjoints of the original variables are the partials
of f with respect to these variables, i.e., ∇f (x). The reverse sweep just
involves initializing variables representing the adjoints to zero and adding
partials of individual operations times adjoints to the adjoint variables
of the corresponding operands, which has the same complexity O(L) as
computing f (x). For large n, reverse AD can be much faster than forward
AD or finite differences.
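The forward recurrence is easy to sketch in C with a (value, partial) pair carried by each quantity; here a single seeded direction gives ∂f /∂x for the function of Figure 1 (the names and helper functions are illustrative):

#include <stdio.h>

typedef struct { double v, d; } Dual;     /* value and one partial */

static Dual dsub(Dual a, double c) { Dual r = { a.v - c, a.d }; return r; }
static Dual dadd(Dual a, double c) { Dual r = { a.v + c, a.d }; return r; }
static Dual dsqr(Dual a)           { Dual r = { a.v * a.v, 2.0 * a.v * a.d }; return r; }
static Dual dplus(Dual a, Dual b)  { Dual r = { a.v + b.v, a.d + b.d }; return r; }

int main(void)
{
    Dual x = { 1.0, 1.0 };   /* seed dx/dx = 1 */
    Dual y = { 2.0, 0.0 };   /* dy/dx = 0 */
    Dual f = dplus(dsqr(dsub(x, 3.0)), dsqr(dadd(y, 4.0)));
    printf("f = %g, df/dx = %g\n", f.v, f.d);   /* 40 and 2*(1-3) = -4 */
    return 0;
}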
The AMPL/solver interface library (ASL) makes arrangements for re-
verse AD sweeps while reading expression graphs from a “.nl” file and
converting them to internal expression graphs. This amounts to a prepro-
cessing step before any numerical computing is done, and is one of many
useful kinds of expression-graph walks. Many ways of handling implementa-
tion details are possible, but the style I find convenient is to represent each
operation (node in the expression graph) by a C “struct” that has a pointer
to a function that carries out the operation, pointers to the operands, and
auxiliary data that depend on the intended use of the graph. For exam-
ple, the “expr” structure used in the ASL for binary operations has the
fields shown in Figure 2 when only function and gradient computations
are allowed [12], and has the more elaborate form shown in Figure 3 when
Hessian computations are also allowed [14]. The intent here is not to give
a full explanation of these structures, but just to illustrate how represen-
tations can vary, depending on their intended uses. In reality, some other
type names appear in the ASL, and some fields appear in a different order.
Figures 2 and 3 both assume typedefs of the form

typedef struct expr expr;


typedef double efunc(expr*);

so that an “efunc” is a double-valued function that takes one argument,


a pointer to an “expr” structure. Use of such a function is illustrated in
Figure 4, which shows the ASL’s “op” function for multiplication. This
is a particularly simple binary operation in that the left partial is the
right operand and vice versa. Moreover the second partials are constants
(0 or 1) and need not be computed. In other cases, such as division and
struct expr {
efunc *op; /* function for this operation */
int a; /* adjoint index */
real dL, dR; /* left and right partials */
expr *L, *R; /* left and right operands */
};
Fig. 2. ASL structure for binary operations with only f and ∇f available.

struct expr {
efunc *op; /* function for this operation */
int a; /* adjoint index (for gradient comp.) */
expr *fwd, *bak; /* used in forward and reverse sweeps */
double dO; /* deriv of op w.r.t. t in x + t*p */
double aO; /* adjoint (in Hv computation) of op */
double adO; /* adjoint (in Hv computation) of dO */
double dL; /* deriv of op w.r.t. left operand */
expr *L, *R; /* left and right operands */
double dR; /* deriv of op w.r.t. right operand */
double dL2; /* second partial w.r.t. L, L */
double dLR; /* second partial w.r.t. L, R */
double dR2; /* second partial w.r.t. R, R */
};

Fig. 3. ASL structure for binary operations with f , ∇f , and ∇2 f available.

the “atan2” function, when Hessian computations are allowed, the function
also computes and stores some second partial derivatives.
Once a function evaluation has stored the partials of each operation,
the “reverse sweep” for gradient computations by reverse AD takes on a
very simple form in the ASL:
do *d->a.rp += *d->b.rp * *d->c.rp;        (2.1)
while(d = d->next);

Here d points to a “derp” structure (named for derivative propagation)
of four pointers: d->a.rp points to an adjoint being updated, d->b.rp
points to an adjoint of the current operation, d->c.rp points to a partial
derivative of this operation, and d->next points to the next derp structure
to be processed. Thus for each of its operands, an operation contributes
the product of its adjoint and a partial derivative to the adjoint of the
operand.
Hessian or Hessian-vector computations are sometimes useful. Given
a vector v ∈ Rn and a function f : Rn → R represented by an expression
graph of L nodes, we can compute ∇2 f (x)v in O(L) operations by what
amounts to a mixture of forward and reverse AD, either applying reverse

www.it-ebooks.info
252 DAVID M. GAY

double
f_OPMULT(expr *e A_ASL)
{
expr *eL = e->L.e;
expr *eR = e->R.e;
return (e->dR = (*eL->op)(eL))
* (e->dL = (*eR->op)(eR));
}
Fig. 4. ASL function for multiplication.

AD to the result of computing φ′(0) with φ(τ) ≡ f(x + τv) (computing
φ(0) and φ′(0) by forward AD), or by applying forward AD to vᵀ∇f(x),
with ∇f (x) computed by reverse AD. Both descriptions lead to the same
numerical operations (but have different overheads in the Sacado context dis-
cussed below). The ASL offers Hessian-vector computations done this way,
since some nonlinear optimization solvers use Hessian-vector products in it-
erative “matrix-free” methods, such as conjugate gradients, for computing
search directions.
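As a concrete (toy) illustration of the second description, applying forward AD to a hand-coded gradient, with the direction v seeded into the derivative parts, yields ∇²f(x)v; the Dual type below is invented for exposition and is not how the ASL implements this:

#include <stdio.h>

typedef struct { double v, d; } Dual; /* value and directional derivative */

static Dual dual(double v, double d) { Dual r = {v, d}; return r; }
static Dual mul(Dual a, Dual b) { return dual(a.v*b.v, a.d*b.v + a.v*b.d); }
static Dual scl(double c, Dual a) { return dual(c*a.v, c*a.d); }

int main(void) {
    /* x = (1, 3), direction v = (0.5, -1) seeded into the dual parts */
    Dual x1 = dual(1.0, 0.5), x2 = dual(3.0, -1.0);
    /* hand-coded gradient of f(x1, x2) = x1^2 * x2 */
    Dual g1 = scl(2.0, mul(x1, x2)); /* df/dx1 = 2 x1 x2 */
    Dual g2 = mul(x1, x1);           /* df/dx2 = x1^2    */
    /* value parts form grad f(x); derivative parts form Hv */
    printf("grad = (%g, %g), Hv = (%g, %g)\n", g1.v, g2.v, g1.d, g2.d);
    return 0;
}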
Many mathematical programming problems (1.1) involve “partially
separable” functions. A function f : Rn → R is partially separable if it has
the form

f(x) = Σᵢ fᵢ(Aᵢx),

in which the Aᵢ are matrices having only a few rows (varying with i) and
n columns, so that fᵢ is a function of only a few variables. A nice feature
of this structure is that f’s Hessian ∇²f(x) has the form

∇²f(x) = Σᵢ Aᵢᵀ ∇²fᵢ(Aᵢx) Aᵢ,

i.e., ∇²f(x) is a sum of outer products involving the little matrices Aᵢ and
the Hessians ∇²fᵢ of the fᵢ. Knowing this structure, we can compute
each ∇²fᵢ separately with a few Hessian-vector products, then assemble
the full ∇²f(x) — e.g., if it is to be used by a solver that wants to see
explicit Hessian matrices.
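For example, f(x) = (x₁ − x₂)² + (x₂ − x₃)² is partially separable with f₁(u) = f₂(u) = u², A₁ = (1, −1, 0) and A₂ = (0, 1, −1); each ∇²fᵢ is the 1×1 matrix (2), so ∇²f(x) = 2A₁ᵀA₁ + 2A₂ᵀA₂ is assembled from two small outer products.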
Many mathematical programming problems involve functions having
a more elaborate structure called partially-separable structure:
f(x) = Σᵢ θᵢ( Σ_{j≤rᵢ} fᵢⱼ(Aᵢⱼx) ),        (2.2)

in which θᵢ : R → R is smooth and each Aᵢⱼ matrix has only a small
number of rows. The full ∇²f(x) is readily assembled from the pieces of
this representation (and their partials). By doing suitable expression-graph
walks, the ASL finds partially-separable structure (2.2) automatically and
arranges for it to be exploited when explicit Hessians are desired. More
graph walks determine the sparsity of the desired Hessians — usually the
Hessian of the Lagrangian function. See [14] for more details. (For use in
a parallel computing context, I have recently been convinced to add a way
to express the Lagrangian function as a sum of pieces and to arrange for
efficient computation of the Hessian of each piece, with the sparsity of each
piece made available in a preprocessing step.)
The expression-graph walks that the ASL does once to prepare for later
numerical evaluations make such computations reasonably efficient, but, as
illustrated in the above reverse-sweep loop and in Figure 4, some pointer
chasing is still involved. With another kind of graph walk, that done by
the nlc program described in [13], we can convert expression graphs into
Fortran or C (C++), eliminating much of the pointer chasing and some
unnecessary operations, e.g., addition of zero and multiplication by ±1.
The expression graphs that AMPL uses internally often involve loops,
i.e., iterating something over a set, so dealing with loops in expression
graphs is not hard. For simplicity in the ASL, the graphs that AMPL
writes to “.nl” files to represent nonlinear programming problems are loop-
free, with all operations explicit. Perhaps sometime this will change, as
it somewhat limits problems that can be encoded in “.nl” files and some-
times makes them much larger than they might be if they were allowed
to use looping nodes. This current limitation is somewhat mitigated by
an imported-function facility that permits arbitrary functions to be intro-
duced via shared libraries. When such functions are involved in gradient
or Hessian computations, the functions must provide first or first and sec-
ond partials with respect to their arguments, so the ASL can involve the
functions in the derivative computations.
Some languages, such as Fortran and C++, allow operator overloading.
With overloading, one can use the same arithmetic operators and func-
tions in expressions involving new types; thus, after modifying source code
by changing the types of some variables, one can leave the bulk of the source
code unchanged and obtain a program with altered (ideally enhanced) be-
havior. Operator overloading in suitable languages provides another way to
make AD computations conveniently available. An early, very general, and
often used package for AD computations in C++ codes is ADOL-C [20],
which operates by capturing an expression graph (called a “tape” in [20])
as a side effect of doing a computation of interest, then walking the graph
to compute the desired derivatives. Because Sandia National Laboratories
does much C++ computing and because more specialized implementations
are sometimes more efficient, it has been worthwhile to develop our own
C++ package, Sacado [2, 33], for AD. The reverse-AD portion of Sacado
[15] does a reverse sweep whose core is equivalent to (2.1).
Computations based on operator overloading are very general and con-
venient, but present a restricted view of a calculation — somewhat like
looking through a keyhole. As indicated by the timing results in Table
1 below, when a more complete expression graph is available, it can be
used to prepare faster evaluations than are readily available from over-
loading techniques. Table 1 shows relative and absolute evaluation times
for function and gradient evaluations of an empirical energy function for
a protein-folding problem considered in [17]. This problem is rich in tran-
scendental function evaluations (such as cos(), sqrt(), atan()), which masks
some overhead. Computations were on a 3GHz Intel Xeon CPU; the times
in the “rel.” column are relative to the time to compute f (x) alone (the
“Compiled C, no ∇f ” line), using a C representation obtained with the
nlc program mentioned above. The “Sacado RAD” line is for C++ code
that uses reverse-mode AD via operator overloading provided by Sacado.
(Sacado also offers forward-mode AD, which would be considerably slower
on this example.) The two ASL lines compare evaluations designed for
computing f (x) and ∇f (x) only (“ASL, fg mode”) with evaluations that
save second partials for possible use in computing ∇2 f (x) or ∇2 f (x)v. The
“nlc” line is for evaluations using C from the nlc program; this line excludes
time to run nlc and compile and link its output. Solving nonlinear mathe-
matical programming problems often involves few enough evaluations that
ASL evaluations make the solution process faster than would use of nlc, but
for an inner loop repeated many times, preparing the inner-loop evaluation
with nlc (or some similar facility) could be worthwhile.
Table 1
Evaluation times for f and ∇f: protein folding (n = 66).

Eval style            sec/eval    rel.
Compiled C, no ∇f     2.92e–5     1.0
Sacado RAD            1.90e–4     6.5
nlc                   4.78e–5     1.6
ASL, fg mode          9.94e–5     3.4
ASL, pfgh mode        1.26e–4     4.3

One lesson to draw from Table 1 is that while operator overloading
is very convenient in large codes, in that one can significantly enhance
computations by doing little more than changing a few types, there may be
room to improve performance by replacing code involved in computational
bottlenecks by alternate code.
Hessian-vector computations provide a more dramatic contrast be-
tween evaluations done with operator overloading (in C++) and evalua-
tions prepared with the entire expression graph in view. Table 2 shows
timings for Hessian-vector computations done several ways on the Hessian
of a 100×100 dense quadratic form, f(x) = ½xᵀQx. (Such evaluations only
involve additions and multiplications and are good for exposing overhead.)
The kinds of evaluations in Table 2 include two ways of nesting the forward
(FAD) and reverse (RAD) packages of Sacado, a custom mixture (“RAD2”)
of forward- and reverse AD that is also in Sacado, and the “interpreted”
evaluations of the AMPL/solver interface library (ASL) prepared by the
ASL’s pfgh_read routine. The computations were again on a 3GHz Intel
Xeon CPU.
Table 2
Times for ∇²f(x)v with f(x) = ½xᵀQx, n = 100.

Eval style                sec/eval    rel.
RAD ◦ FAD                 4.70e–4     18.6
FAD ◦ RAD                 1.07e–3     42.3
RAD2 (custom mixture)     2.27e–4     9.0
ASL, pfgh mode            2.53e–5     1.0

For much more about AD in general, see Griewank’s book [19] and the
“autodiff” web site [1], which has pointers to many papers and packages
for AD.
3. Bound computations. Computing bounds on a given expression
can be helpful in various contexts. For nonlinear programming in general
and mixed-integer nonlinear programming in particular, it is sometimes
useful to “branch”, i.e., divide a compact domain into the disjoint union of
two or more compact subdomains that are then considered separately. If
we find a feasible point in one domain and can compute bounds showing
that any feasible points in another subdomain must have a worse objective
value, then we can discard that other subdomain.
Various kinds of bound computations can be done by suitable expres-
sion graph walks. Perhaps easiest to describe and implement are bound
computations based on interval arithmetic [24]: given interval bounds on
the operands of an operation, we compute an interval that contains the
results of the operation. For example, for any a ∈ [a̲, ā] and b ∈ [b̲, b̄], the
product ab = a · b satisfies

ab ∈ [min(a̲b̲, a̲b̄, āb̲, āb̄), max(a̲b̲, a̲b̄, āb̲, āb̄)].

(It is only necessary to compute all four products when max(a̲, b̲) < 0
and min(ā, b̄) > 0, in which case ab ∈ [min(a̲b̄, āb̲), max(a̲b̲, āb̄)].)
computing with the usual finite-precision floating-point arithmetic, we can
use directed roundings to obtain rigorous enclosures.
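A minimal C sketch of this interval product (invented for exposition; rounding modes are omitted here, whereas a rigorous implementation would round lower bounds down and upper bounds up):

#include <stdio.h>

typedef struct { double lo, hi; } Interval;

static double min4(double a, double b, double c, double d) {
    double m = a;
    if (b < m) m = b;
    if (c < m) m = c;
    if (d < m) m = d;
    return m;
}
static double max4(double a, double b, double c, double d) {
    double m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}
static Interval imul(Interval a, Interval b) {
    double p1 = a.lo*b.lo, p2 = a.lo*b.hi, p3 = a.hi*b.lo, p4 = a.hi*b.hi;
    Interval r = { min4(p1, p2, p3, p4), max4(p1, p2, p3, p4) };
    return r;
}

int main(void) {
    Interval a = {-1.0, 2.0}, b = {-3.0, 0.5};
    Interval c = imul(a, b);
    printf("[%g, %g]\n", c.lo, c.hi); /* prints [-6, 3] */
    return 0;
}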
Unfortunately, when the same variable appears several times in an
expression, interval arithmetic treats each appearance as though it could
have any value in its domain, which can lead to very pessimistic bounds.
More elaborate interval analysis (see, e.g., [25, 26]) can give much tighter
bounds. For instance, mean-value forms [25, 28] have an excellent outer
approximation property that will be explained shortly. Suppose domain
X ⊂ Rn is the Cartesian product of compact intervals, henceforth called
an interval vector, i.e.,

X = [x̲₁, x̄₁] × ··· × [x̲ₙ, x̄ₙ].

Suppose f : X → R and we have a point c ∈ X and another interval vector
S ⊂ Rn with the property that

for any x ∈ X, there is an s ∈ S
such that f(x) = f(c) + sᵀ(x − c);        (3.1)

if f ∈ C¹(X), i.e., f is continuously differentiable, it suffices for S to enclose
{∇f(x) : x ∈ X}. Then any enclosure of

{f(c) + sᵀ(x − c) : x ∈ X, s ∈ S}        (3.2)

also encloses f(X) ≡ {f(x) : x ∈ X}. For an interval vector V with
components [v̲ᵢ, v̄ᵢ], define the width w(V) by w(V) = max{v̄ᵢ − v̲ᵢ : 1 ≤ i ≤
n}, and for an interval enclosure F = [F̲, F̄] of f(X), define inf{f(X)} =
inf{f(x) : x ∈ X}, sup{f(X)} = sup{f(x) : x ∈ X}, and (F, f(X)) =
max(inf{f(X)} − F̲, F̄ − sup{f(X)}), which is sometimes called the excess
width. If f ∈ C¹(X) and S = ∇f(X) ≡ {∇f(x) : x ∈ X}, and F =
{f(c) + sᵀ(x − c) : s ∈ S, x ∈ X}, then (F, f(X)) = O(w(X)²), and this
remains true if we use interval arithmetic (by walking an expression graph
for f) to compute an outer approximation S of ∇f(X) and compute F
from S by interval arithmetic. If ∇f(c) ≠ 0, then for small enough h > 0
and X = [c₁ − h, c₁ + h] × ··· × [cₙ − h, cₙ + h], the relative excess width
(w(F) − w(f(X)))/w(X) = O(h), which is the excellent approximation
property mentioned above. This means that by dividing a given compact
domain into sufficiently small subdomains and computing bounds on each
separately, we can achieve bounds within a factor of (1 + τ) of optimal for
a specified τ > 0.
We can do better by computing slopes [23, 28] rather than interval
bounds on ∇f . Slopes are divided differences, and interval bounds on them
give an S that satisfies (3.1), so an enclosure of (3.2) gives valid bounds.
For φ ∈ C¹(R) and ξ, ζ ∈ R, the slope φ[ξ, ζ] is uniquely defined by

φ[ξ, ζ] = (φ(ξ) − φ(ζ))/(ξ − ζ) if ξ ≠ ζ,  and  φ[ξ, ζ] = φ′(ζ) if ξ = ζ.

Slopes for functions of n variables are n-tuples of bounds on divided differ-
ences; they are not uniquely defined, but can be computed (e.g., from an
expression graph) operation by operation in a way much akin to forward
AD.

Fig. 5 (figure omitted). Slopes and derivatives for φ(x) = x², x ∈ [−.5, 1.5], c = 0.5;
the plot contrasts the slope range with the wider derivative range.

The general idea is to compute bounds on f(X) = {f(x) : x ∈ X}
by choosing a nominal point z, computing f(z), and using slopes to bound
f(x) − f(z) for x ∈ X:

f(X) ⊆ {f(z) + Σᵢ₌₁ⁿ sᵢ(xᵢ − zᵢ) : x ∈ X, sᵢ ∈ f[X, z]ᵢ}

where the i-th component f[X, z]ᵢ of an interval slope is a bound on
(f(x + τeᵢ) − f(x))/τ (which is taken to be ∂f(x)/∂xᵢ if τ = 0) for x ∈ Xᵢ and
τ such that x + τ ei ∈ X. Most simply Xi = X for all i, but we can get
tighter bounds [32] at the cost of more memory by choosing

Xᵢ = [x̲₁, x̄₁] × ··· × [x̲ᵢ, x̄ᵢ] × {cᵢ₊₁} × ··· × {cₙ},

e.g., for cᵢ ≈ ½(x̲ᵢ + x̄ᵢ). With this scheme, permuting the variables may
result in different bounds; deciding which of the n! permutations is best
might not be easy.
Figure 5 indicates why slopes can give sharper bounds than we get from
a first-order Taylor expansion with an interval bound on the derivative.
Bounds on φ′(X) give S = [−1, 3], whereas slope bounds give S = [0, 2].
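Indeed, φ[x, c] = (x² − c²)/(x − c) = x + c, which for x ∈ [−.5, 1.5] and c = .5 lies in [0, 2], whereas φ′(x) = 2x ranges over [−1, 3].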
Sometimes we can obtain still tighter bounds by using second-order
slopes [38, 35, 36], i.e., slopes of slopes. The general idea is to compute a
slope matrix H such that an enclosure of

f(c) + ∇f(c)(x − c) + (x − c)ᵀH(x − c)

for x ∈ X gives valid bounds on f(X). (To be rigorous in the face of
roundoff error, we must compute interval bounds on ∇f (c).) Bounds com-
puted this way are sometimes better than those from the mean-value form
(3.2). In general, whenever there are several ways to compute bounds, it
is usually best to compute all of them and compute their intersection; this
is common practice in interval analysis.
As described in [16], I have been modifying some ASL routines to do
bound computations from first- and second-order slopes. The computations
are similar to forward AD, which greatly simplifies the .nl file reader in
that it does not have to set up a reverse sweep. One small novelty is my
exploitation of sparsity where possible; the preliminary expression-graph
walk done by the .nl reader sets things up so evaluations can proceed more
quickly. Table 3 summarizes the bound computations available at this
writing, and Table 4 shows the widths of two sets of resulting bounds. In
Table 3, F denotes an outer approximation of f . See [16] for details and
more references. Explicit expression graphs are convenient for this work,
in which the main focus is on properties of particular operations.
Table 3
Bound computations.

interval     F(X) ⊃ f(X)
Taylor 1     f(z) + F′(X)(X − z)
slope 1      f(z) + F[X, z](X − z)
slope 2      f(z) + f′(z)(X − z) + F[X, z, z](X − z)²
slope 2*     slope 2 plus Theorem 2 in [16]

Table 4
Bound widths.

Method      Barnes     Sn525
interval    162.417    0.7226
Taylor 1    9.350      0.3609
slope 1     6.453      0.3529
slope 2     3.007      0.1140
slope 2*    2.993      0.1003
true        2.330      0.0903

4. Presolve and constraint propagation. Often it is worthwhile to
spend a little time trying to simplify a mathematical programming problem
before presenting it to a solver. With the help of suitable bound compu-
tations, sometimes it is possible to fix some variables and remove some
constraints. Occasionally a problem can be completely solved this way,
but more likely the solver will run faster when presented with a simpler
problem.
For linear constraints and objectives, computing bounds is straightfor-
ward but best done with directed roundings [6] to avoid false simplifications.
For nonlinear problems, we can use general bounding techniques, such
as those sketched in §3, along with specific knowledge about some nonlinear
functions, such as that sin(x) ∈ [−1, 1] for all x.
Another use of bounding techniques is to reduce variable domains. If
we have rigorous bounds showing that part of the nominal problem domain
is mapped to values, all of which violate problem constraints, then we
can discard that part of the nominal problem domain. In the constraint-
programming community, this is called “constraint propagation”, but it can
also be regarded as “nonlinear presolve”. See [34] for more discussion of
constraint propagation on expression graphs.
In general, a presolve algorithm may repeatedly revisit bound compu-
tations when new information comes along. For instance, if we have upper
and lower bounds on all but one of the variables appearing in a linear in-
equality constraint, we can deduce a bound on the remaining variable; when
another deduction implies a tighter bound on one of the other variables,
we can revisit the inequality constraint and tighten any bounds deduced
from it. Sometimes this leads to a sequence of deductions tantamount to
solving linear equations by an iterative method, so it is prudent to limit the
repetitions in some way. Similar comments apply when we use nonlinear
bounding techniques.
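As a small illustration (a sketch invented for this discussion, not taken from any particular presolver), the following C fragment deduces a new upper bound on one variable of a linear inequality a·x ≤ b from bounds on the other variables; a rigorous version would use directed roundings, as noted above:

#include <stdio.h>

/* upper bound on x[k] implied by sum_j a[j]*x[j] <= b, assuming a[k] > 0 */
static double tighten_ub(const double *a, const double *lo,
                         const double *hi, double b, int n, int k) {
    double rest = 0.0;
    int j;
    for (j = 0; j < n; j++) {
        if (j == k) continue;
        /* smallest possible contribution of the term a[j]*x[j] */
        rest += (a[j] > 0 ? a[j]*lo[j] : a[j]*hi[j]);
    }
    return (b - rest) / a[k];
}

int main(void) {
    double a[] = {2.0, 3.0}, lo[] = {0.0, 0.0}, hi[] = {10.0, 2.0};
    /* 2*x0 + 3*x1 <= 12 with x1 in [0, 2] implies x0 <= 6 */
    printf("x0 <= %g\n", tighten_ub(a, lo, hi, 12.0, 2, 0));
    return 0;
}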
As mentioned in §3, it is sometimes useful to branch, i.e., divide the do-
main into disjoint subdomains that are then considered separately, perhaps
along with the addition of “cuts”, i.e., inequalities that must be satisfied
(e.g., due to the requirement that some variables assume integer values).
Any such branching and imposition of cuts invites revisiting relevant pre-
solve deductions, which might now be tightened, so an expression-graph
representation of the problem can be attractive and convenient.

5. Convexity detection. Convexity is a desirable property for sev-
eral reasons: it is useful in computing bounds; convex minimization (or
concave maximization) problems can be much easier to solve globally than
nonconvex ones; and convexity enables use of some algorithms that would
otherwise not apply. It is thus useful to know when an optimization prob-
lem is convex (or concave).
An approach useful for some problems is to use a problem specification
that guarantees convexity; examples include CVXMOD [5] and Young’s
recent Ph.D. thesis [37]. More generally, a suitable expression-graph walk
[10, 27, 29, 30] can sometimes find sufficient conditions for an expression to
be convex or concave. As a special case, some solvers make special provi-
sions for quadratic objectives and sometimes quadratic constraints. Walk-
ing a graph to decide whether it represents a constant, linear, quadratic,
or nonlinear expression is straightforward; if quadratic, we can attempt to
compute a Cholesky factorization to decide whether it is convex.
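A sketch of such a test in C (illustrative only; a robust implementation would use a tolerance and treat the semidefinite case with care): the factorization succeeds, with all pivots positive, exactly when the matrix is positive definite, certifying convexity of the quadratic form.

#include <math.h>
#include <stdio.h>

#define N 3

/* attempt a Cholesky factorization of symmetric A (lower triangle);
   return 1 if all pivots are positive (A positive definite), else 0 */
static int cholesky(double A[N][N]) {
    int i, j, k;
    for (j = 0; j < N; j++) {
        double d = A[j][j];
        for (k = 0; k < j; k++) d -= A[j][k]*A[j][k];
        if (d <= 0.0) return 0;
        A[j][j] = sqrt(d);
        for (i = j+1; i < N; i++) {
            double s = A[i][j];
            for (k = 0; k < j; k++) s -= A[i][k]*A[j][k];
            A[i][j] = s / A[j][j];
        }
    }
    return 1;
}

int main(void) {
    double Q[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
    printf("x'Qx convex: %s\n", cholesky(Q) ? "yes" : "no");
    return 0;
}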
6. Outer approximations. Finding outer approximations — convex
underestimates and concave overestimates — for a given expression can
be useful. By optimizing the outer approximations, we obtain bounds
on the expression, and if we compute linear outer approximations (i.e.,
sets of linear inequalities satisfied by the expression), we can use a linear
programming solver to compute bounds, which can be fast or convenient.
It is conceptually straightforward to walk an expression graph and derive
rigorous outer approximations; see, e.g., [11, 22] for details. Linear outer
approximations are readily available from first-order slopes; see §7 of [34].
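For example, on x ∈ [0, 2] the convex function x² is underestimated by its tangents (x² ≥ 2x̂x − x̂² for any x̂; at x̂ = 1 this gives x² ≥ 2x − 1) and overestimated by its secant x² ≤ 2x, so these linear inequalities already form a simple linear outer approximation.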
7. Concluding remarks. Expression graphs are not convenient for
users to create explicitly, but are readily derived from high-level represen-
tations that are convenient for users. Once we have expression graphs,
it is possible to do many useful sorts of computations with them, includ-
ing creating other representations that are faster to evaluate, carrying out
(automatic) derivative computations, computing bounds and outer approx-
imations, detecting convexity, recognizing problem structure, and sim-
plifying problems. Internal use of expression graphs can be a boon in opti-
mization (and other) algorithms in general, and in mixed-integer nonlinear
programming in particular.
Acknowledgment. I thank an anonymous referee for helpful com-
ments.

REFERENCES

[1] Autodiff Web Site, http://www.autodiff.org.
[2] Roscoe A. Bartlett, David M. Gay, and Eric T. Phipps, Automatic Differenti-
ation of C++ Codes for Large-Scale Scientific Computing. In Computational
Science – ICCS 2006, Vassil N. Alexandrov, Geert Dick van Albada, Peter
M.A. Sloot, and Jack Dongarra (eds.), Springer, 2006, pp. 525–532.
[3] Martin Berz, Differential Algebraic Description of Beam Dynamics to Very High
Orders, Particle Accelerators 24 (1989), p. 109.
[4] Martin Berz, Kyoko Makino, Khodr Shamseddine, Georg H. Hoffstätter,
and Weishi Wan, COSY INFINITY and Its Applications in Nonlinear Dy-
namics, SIAM, 1996.
[5] Stephen P. Boyd and Jacob Mattingley, CVXMOD — Convex Optimization
Software in Python, http://cvxmod.net/, accessed July 2009.
[6] R. Fourer and D.M. Gay, Experience with a Primal Presolve Algorithm. In
Large Scale Optimization: State of the Art, W.W. Hager, D.W. Hearn, and
P.M. Pardalos (eds.), Kluwer Academic Publishers, 1994, pp. 135–154.
[7] R. Fourer, D.M. Gay, and B.W. Kernighan, A Modeling Language for Mathe-
matical Programming, Management Science 36(5) (1990), pp. 519–554.
[8] Robert Fourer, David M. Gay, and Brian W. Kernighan, AMPL: A Mod-
eling Language for Mathematical Programming, Duxbury Press/Brooks/Cole
Publishing Co., 2nd edition, 2003.
[9] Robert Fourer, Jun Ma, and Kipp Martin, An Open Interface for Hooking
Solvers to Modeling Systems, slides for DIMACS Workshop on COIN-OR,
2006, http://dimacs.rutgers.edu/Workshops/COIN/slides/osil.pdf .
[10] R. Fourer, C. Maheshwari, A. Neumaier, D. Orban, and H. Schichl, Con-
vexity and Concavity Detection in Computational Graphs, manuscript, 2008,
to appear in INFORMS J. Computing.
[11] Edward P. Gatzke, John E. Tolsma, and Paul I. Barton, Construction of
Convex Function Relaxations Using Automated Code Generation Techniques,
Optimization and Engineering 3, 2002, pp. 305–326.
[12] David M. Gay, Automatic Differentiation of Nonlinear AMPL Models. In Auto-
matic Differentiation of Algorithms: Theory, Implementation, and Applica-
tion, A. Griewank and G. Corliss (eds.), SIAM, 1991, pp. 61–73.
[13] David M. Gay, Hooking Your Solver to AMPL, AT&T Bell Laboratories, Numer-
ical Analysis Manuscript 93-10, 1993 (revised 1997). http://www.ampl.com/
REFS/hooking2.pdf .
[14] David M. Gay, More AD of Nonlinear AMPL Models: Computing Hessian Infor-
mation and Exploiting Partial Separability. In Computational Differentiation
: Techniques, Applications, and Tools, Martin Berz, Christian Bischof, George
Corliss and Andreas Griewank (eds.), SIAM, 1996, pp. 173–184.
[15] David M. Gay, Semiautomatic Differentiation for Efficient Gradient Computa-
tions. In Automatic Differentiation: Applications, Theory, and Implementa-
tions, H. Martin Bücker, George F. Corliss, Paul Hovland and Uwe Naumann
and Boyana Norris (eds.), Springer, 2005, pp. 147–158.
[16] David M. Gay, Bounds from Slopes, report SAND-1010xxxx, to be available as
http://www.sandia.gov/~dmgay/bounds10.pdf.
[17] D.M. Gay, T. Head-Gordon, F.H. Stillinger, and M.H. Wright, An Appli-
cation of Constrained Optimization in Protein Folding: The Poly-L-Alanine
Hypothesis, Forefronts 8(2) (1992), pp. 4–6.
[18] Andreas Griewank, On Automatic Differentiation. In Mathematical Program-
ming: Recent Developments and Applications, M. Iri and K. Tanabe (eds.),
Kluwer, 1989, pp. 83–108.
[19] Andreas Griewank, Evaluating Derivatives, SIAM, 2000.
[20] A. Griewank, D. Juedes, and J. Utke, Algorithm 755: ADOL-C: A package for
the automatic differentiation of algorithms written in C/C++, ACM Trans.
Math Software 22(2) (1996), pp. 131–167.
[21] R. Baker Kearfott, An Overview of the GlobSol Package for Verified Global
Optimization, talk slides, 2002, http://www.mat.univie.ac.at/~neum/glopt/
mss/Kea02.pdf .
[22] Padmanaban Kesavan, Russell J. Allgor, Edward P. Gatzke, and Paul I.
Barton, Outer Approximation Algorithms for Separable Nonconvex Mixed-
Integer Nonlinear Programs, Mathematical Programming 100(3), 2004, pp.
517–535.
[23] R. Krawczyk and A. Neumaier, Interval Slopes for Rational Functions and As-
sociated Centered Forms, SIAM J. Numer. Anal. 22(3) (1985), pp. 604–616.
[24] R.E. Moore, Interval Arithmetic and Automatic Error Analysis in Digital Com-
puting, Ph.D. dissertation, Stanford University, 1962.
[25] Ramon E. Moore, Methods and Applications of Interval Analysis, SIAM, 1979.
[26] Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud, Introduction
to Interval Analysis, SIAM, 2009.
[27] Ivo P. Nenov, Daniel H. Fylstra, and Lubomir V. Kolev, Convexity Determi-
nation in the Microsoft Excel Solver Using Automatic Differentiation Tech-
niques, extended abstract, 2004, http://www.autodiff.org/ad04/abstracts/
Nenov.pdf.
[28] Arnold Neumaier, Interval Methods for Systems of Equations, Cambridge Uni-
versity Press, 1990.
[29] Dominique Orban, Dr. AMPL Web Site, http://www.gerad.ca/~orban/drampl/,
accessed July 2009.
[30] Dominique Orban and Robert Fourer, Dr. AMPL, A Meta Solver for Op-
timization, CORS/INFORMS Joint International Meeting, 2004, http://
users.iems.northwestern.edu/~4er/SLIDES/ban0406h.pdf.
[31] Polish notation, http://en.wikipedia.org/wiki/Polish_notation, accessed July
2009.
[32] S.M. Rump, Expansion and Estimation of the Range of Nonlinear Functions,
Mathematics of Computation 65(216) (1996), pp. 1503–1512.
[33] Sacado Web Site, http://trilinos.sandia.gov/packages/sacado/.
[34] Hermann Schichl and Arnold Neumaier, Interval Analysis on Directed Acyclic
Graphs for Global Optimization, Journal of Global Optimization 33(4) (2005),
pp. 541–562.
[35] Marco Schnurr, Steigungen hoeherer Ordnung zur verifizierten globalen Opti-
mierung, Ph.D. dissertation, Universität Karlsruhe, 2007.
[36] Marco Schnurr, The Automatic Computation of Second-Order Slope Tuples for
Some Nonsmooth Functions, Electronic Transactions on Numerical Analysis
30 (2008), pp. 203–223.
[37] Joseph G. Young, Program Analysis and Transformation in Mathematical Pro-
gramming, Ph.D. dissertation, Rice University, 2008.
[38] Shen Zuhe and M.A. Wolfe, On Interval Enclosures Using Slope Arithmetic,
Applied Mathematics and Computation 39 (1990), pp. 89–105.

SYMMETRY IN MATHEMATICAL PROGRAMMING
LEO LIBERTI∗

Abstract. Symmetry is mainly exploited in mathematical programming in order
to reduce the computation times of enumerative algorithms. The most widespread ap-
proach rests on: (a) finding symmetries in the problem instance; (b) reformulating the
problem so that it does not allow some of the symmetric optima; (c) solving the modi-
fied problem. Sometimes (b) and (c) are performed concurrently: the solution algorithm
generates a sequence of subproblems, some of which are recognized to be symmetrically
equivalent and either discarded or treated differently. We review symmetry-based anal-
yses and methods for Linear Programming, Integer Linear Programming, Mixed-Integer
Linear Programming and Semidefinite Programming. We then discuss a method (intro-
duced in [36]) for automatically detecting symmetries of general (nonconvex) Nonlinear
and Mixed-Integer Nonlinear Programming problems and a reformulation based on ad-
joining symmetry breaking constraints to the original formulation. We finally present a
new theoretical and computational study of the formulation symmetries of the Kissing
Number Problem.

Key words. MINLP, NLP, reformulation, group, graph isomorphism, permutation,
expression tree.

AMS(MOS) subject classifications. 90C11, 90C26, 90C30, 05C25, 20B25.

1. Introduction. Mathematical Programming (MP) is a language
for formally describing classes of optimization problems. A MP consists of:
parameters, encoding the problem input; decision variables, encoding the
problem output; one objective function to be minimized; and a set of con-
straints describing the feasible set (some of these constraints may be bounds
or integrality requirements on the decision variables). The objective func-
tion and constraints are described by mathematical expressions whose ar-
guments are the parameters and the decision variables. Let N = {1, . . . , n}
and M = {1, . . . , m} for some nonnegative integers m, n, and Z ⊆ N . In
general, a MP formulation is as follows:

min   f(x)
s.t.  g(x) ≤ 0        (1.1)
      ∀i ∈ Z  xᵢ ∈ ℤ,

where x ∈ Rn is a vector of decision variables, and f : Rn → R and g : Rn →
Rm are functions that can be written as mathematical expressions involving
a finite number of operators (e.g. {+, −, ×, ÷, ↑, log, exp, sin, cos, tan})
and x as arguments. If f, g are affine forms and Z = ∅, (1.1) is a Linear
Program (LP). If f, g contain some nonlinear term and Z = ∅, (1.1) is a
Nonlinear Program (NLP), and if f is a convex function and the feasible
set {x | g(x) ≤ 0} is convex, (1.1) is a convex NLP (cNLP). If f, g are affine

∗ LIX, École Polytechnique, F-91128 Palaiseau, France (liberti@lix.polytechnique.fr).

and Z ≠ ∅, (1.1) is a Mixed-Integer Linear Program (MILP), and if Z = N
it is an Integer Linear Program (ILP). If f, g contain some nonlinear term
and Z ≠ ∅, (1.1) is a Mixed-Integer Nonlinear Program (MINLP), and
if f and the feasible set are convex, it is a convex MINLP (cMINLP). A
special case of MP, called Semidefinite Programming (SDP) is when f, g
are affine forms, Z = ∅, x is a square matrix, and there is an additional
constraint stating that x must be symmetric positive semidefinite (x ⪰ 0).
Although this constraint cannot be written as a mathematical expression
of the operators listed above, SDP is important because it can be solved
in polynomial time by a special-purpose interior point method [1], and
because many tight relaxations of polynomial programming problems can
be cast as SDPs.
Symmetries have been used in MP for analysis purposes or in order
to speed up solution methods. The general approach is as follows. First,
symmetries are detected, either algorithmically or because of some known
mathematical property of the given optimization problem. Once a subset of
problem symmetries is known, either the MP is reformulated so that some
symmetric optima become infeasible and then solved via standard solution
methods (static symmetry breaking [43]), or a known solution method is
modified so that it recognizes and exploits symmetry dynamically. Sym-
metries in MP can be broadly classified in two types: solution symmetries,
i.e. those variable symmetries that fix the set of solutions setwise; and for-
mulation symmetries, i.e. those variable symmetries that fix the formulation
(encoded in some data structure). If the formulation group structure for
a given MP varies considerably from instance to instance, then automatic
symmetry detection methods may be required.
Currently, the main effort is that of removing symmetries from a prob-
lem in order to find a global optimum more quickly. After extensive com-
putational experimentation with all the symmetric instances of most pub-
lic instance libraries solved by means of Couenne [6] and BARON [57]
(both implementing a spatial Branch-and-Bound (sBB) algorithm) and also
RECIPE [38] (based on the Variable Neighbourhood Search (VNS) meta-
heuristic [22]), my own very personal opinion is that removing symmetries
is good when solving with sBB and bad when solving with VNS. The sBB
algorithm is a tree search based on bisecting the variable bounds at each
tree node along the spatial direction generating the largest error between
the solutions of the upper bounding and lower bounding subproblems. The
leaf nodes of the search tree contain globally optimal solutions or infeasible
portions of space. If the problem has many symmetric optima, a corre-
spondingly large number of leaves contain these optima. When some of
the symmetric optima are eliminated from the feasible region, the bound
relative to parent nodes is likely to increase, which accelerates sBB con-
vergence. VNS, on the other hand, works by jumping from optimum to
optimum via paths defined by local searches started from random points in
increasingly large neighbourhoods of the incumbent, attempting to improve
the objective function value. Since in general it is important to explore as
much as possible of the feasible region, one usually tends to move to a new
optimum even though the objective function value stays the same. In this
context, removing symmetries prevents the algorithm from exploring larger
portions of the search space.
In this paper, we provide a study of symmetry for NLPs and MINLPs
in general form (no convexity assumption is made on f, g). Our literature
review (Sect. 2) first presents a general overview of mathematical program-
ming techniques drawing from group theory, particularly in regard to LP,
ILP, MILP, SDP, and then focuses on those items that are relevant to auto-
matic symmetry detection (one of the main topics discussed in this paper).
An automatic symmetry detection method (originally introduced in [36]) is
recounted in Sections 3-4: we construct a digraph that encodes the struc-
ture of the mathematical expression representation of f, g, and then apply
known graph-based symmetry detection algorithms to derive the group of
permutations of the decision variable indices that fix the symbolic structure
of f, g. In Sect. 5 we introduce some linear inequalities that are valid for at
least one optimum of (1.1) but which are likely to make at least some sym-
metric optima infeasible. We then present an original application of the
proposed techniques to the Kissing Number Problem [30] in Sect. 6: we
use our automatic symmetry detection method to formulate a conjecture
on the KNP group structure, which we then prove to be true; we derive
some symmetry breaking inequalities, and discuss computational results
which show the positive impact of the proposed approach.
1.1. Notation. We follow the notation style common in classical al-
gebra, see e.g. [10, 2], with some modifications drawn from computational
group theory [60]. Most of the groups considered in this paper act on
vectors in Rn by permuting the components. Permutations act on sets of
vectors by acting on each vector in the set. We denote the identity per-
mutation by e. We employ standard group nomenclature: Sn , Cn are the
symmetric and cyclic groups of order n. For any function f : S → T (where
S, T are sets) we denote S (the domain of f ) by dom(f ).
For a group G ≤ Sn and a set X of row vectors, XG = {xg | x ∈ X ∧
g ∈ G}; if Y is a set of column vectors, GY = {gy | y ∈ Y ∧ g ∈ G}. If X =
{x}, we denote XG by xG (and similarly GY by Gy if Y = {y}); xG is also
known as the orbit of x in G (and similarly for Gy); in computational group
theory literature the notation orb(x, G) is sometimes employed instead of
the more algebraic xG. The (setwise) stabilizer stab(X, G) of a set X with
respect to a group G is the largest subgroup H of G such that XH = X.
For any permutation π ∈ Sn, let Γ(π) be the set of its disjoint cycles, so
that π = Π_{τ∈Γ(π)} τ. For a group G and π ∈ G let ⟨π⟩ be the subgroup
of G generated by π, and for a subset S ⊆ G let ⟨S⟩ be the subgroup of
G generated by all elements of S. Given B ⊆ {1, . . . , n}, Sym(B) is the
symmetric group of all the permutations of elements in B. A permutation

www.it-ebooks.info
266 LEO LIBERTI

π ∈ Sn is limited to B if it fixes every element outside B; π acts on
B ⊆ {1, . . . , n} as a permutation ρ ∈ Sym(B) if π fixes B setwise and
ρ = π[B] is the permutation of B induced by π. Because disjoint cycles
commute, it follows from the definition that for all k ∈ N, πᵏ[B] = (π[B])ᵏ.
A group G ≤ Sn with generators {g1 , . . . , gs } acts on B ⊆ {1, . . . , n} as H
if ⟨gᵢ[B] | i ≤ s⟩ = H; in this case we denote H by G[B]. If B is an orbit
of the natural action of G on the integers (i.e. the natural action of G on
∪_{π∈G} dom(π), which fixes every other integer), then it is easy to show that
G[B] is a transitive constituent of G [21]. In general, G[B] may not be a
subgroup of G: take G = ⟨(1, 2)(3, 4), (1, 3)(4, 2)⟩ and B = {1, 2}; then
G[B] = ⟨(1, 2)⟩ ≰ G. Let B, D ⊆ {1, . . . , n} with B ∩ D = ∅; if π ∈ Sn fixes
both B, D setwise, it is easy to show that π[B ∪ D] = π[B]π[D].
2. Literature review. This literature review does not only cover the
material strictly inherent to later sections, but attempts to be as informa-
tive as possible as concerns the use of symmetry in MP, in such a way
as to provide an overview which is complementary to [43]. The first part
(Sect. 2.1-2.3) of this review covers a representative subset of the most
important works about symmetry in optimization, notably in LP, MILP
and (briefly) SDP. We survey those topics which are most relevant to later
sections (i.e. symmetry detection methods) in Sect. 2.4.
2.1. Symmetry in Linear Programming. The geometrical objects
of LP are polyhedra, and there is a very rich literature on symmetric poly-
hedra [54]. Such results, however, are mostly about the classification of
symmetric polyhedra and are rarely used in MP.
The inherent symmetry of the simplex algorithm is studied in [64, 65,
66]. Given two m × n matrices A, B, let SA = {x ∈ Rm+n | (I|A)x = 0}
(where (I|A) is the m × (m + n) matrix formed by the m × m identity
followed by the columns of A) and SB = {x ∈ Rm+n | (I|B)x = 0}; A, B
are combinatorially equivalent (written A :: B) if there exists π in the sym-
metric group Sm+n such that πSA = SB . The paper [64] gives different
necessary and sufficient conditions for A :: B (among which a formula for
constructing all combinatorially equivalent matrices from submatrices of
A). In [65] an application to solving matrix games via the simplex method
is presented. In [55], Tucker’s combinatorial equivalence is used to devise
a simplex algorithm variant capable of solving a pair of primal/dual LPs
directly without many of the necessary pre-processing steps.
A very recent result [23] shows that every LP has an optimum in the
subspace generated by the fixed points of the action of its symmetry group
(2.1) on its feasible region.
2.2. Symmetry in Mixed-Integer Linear Programming. The
existing work on symmetry in MILP may be classified in three broad cate-
gories: (a) the abelian group approach proposed by Gomory to write integer
feasibility conditions for Integer Linear Programs (ILPs); (b) symmetry-
breaking techniques for specific problems, whose symmetry group can be
computed in advance; (c) general-purpose symmetry group computations
and symmetry-breaking techniques to be used in BB-type solution algo-
rithms. We consider MILPs of the form min{cx | Ax ≤ b ∧ ∀i ∈ Z xᵢ ∈ ℤ}.
Category (a) was established by R. Gomory [20]: given a basis B of
the constraint matrix A, it exploits the (abelian) group G = Zn/⟨col(B)⟩,
where Zn is the additive group of integer n-sequences and ⟨col(B)⟩ is the
additive group generated by the columns of the (nonsingular) matrix B.
Consider the natural group homomorphism ϕ : Zn → G with ker ϕ =
⟨col(B)⟩: letting (xB, xN) be a basic/nonbasic partition of the decision
variables, apply ϕ to the standard form constraints BxB + NxN = b to
obtain ϕ(BxB) + ϕ(NxN) = ϕ(b). Since ϕ(BxB) = 0 if and only if xB ∈ Zn,
setting ϕ(N xN ) = ϕ(b) is a necessary and sufficient condition for xB to
be integer feasible. Gomory’s seminal paper gave rise to further research,
among which [69, 5]. The book [25] is a good starting point.
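As a tiny worked illustration (our example, not from [20]): with n = 1 and basis B = (2), ⟨col(B)⟩ = 2Z, so G ≅ Z/2Z and ϕ maps an integer to its parity; the condition ϕ(NxN) = ϕ(b) then requires NxN and b to have the same parity, which is exactly what makes xB = B⁻¹(b − NxN) integral.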
Category (b) is possibly the richest in terms of number of published
papers. Many types of combinatorial problems exhibit a certain amount of
symmetry. Symmetries are usually broken by means of specific branching
techniques (e.g. [41]), appropriate global cuts (e.g. [61]) or special formu-
lations [31, 9] based on the problem structure. The main limitation of the
methods in this category is that they are difficult to generalize and/or to
be made automatic.
Category (c) contains two main research streams. The first was estab-
lished by Margot in the early 2000s [39, 40], and is applicable to Binary
Linear Programs (BLPs) in the form:

min  cx
Ax ≤ b
x ∈ {0, 1}ⁿ.

Margot [39, 43] defines the relaxation group GLP (P ) of a BLP P as:

GLP(P) = {π ∈ Sn | cπ = c ∧ ∃σ ∈ Sm (σb = b ∧ σAπ = A)},        (2.1)

or, in other words, all relabellings of problem variables for which the ob-
jective function and constraints are the same. The relaxation group (2.1)
is used to derive effective BB pruning strategies by means of isomorphism
pruning and isomorphism cuts local to some selected BB tree nodes (Margot
extended his work to general integer variables in [42]). Further results along
the same lines (named orbital branching) are obtained for covering and
packing problems in [50, 51]: if O is an orbit of some subgroup of the relax-
ation group, at each BB node the disjunction (Σ_{i∈O} xᵢ = 1) ∨ (Σ_{i∈O} xᵢ = 0)
induces a feasible division of the search space; orbital branching restricts
this disjunction to xₕ = 1 ∨ Σ_{i∈O} xᵢ = 0, where h is an arbitrary index in O.
The second was established by Kaibel et al. in 2007 [26, 15], with the
introduction of the packing and partitioning orbitopes, i.e. convex hulls
of certain 0-1 matrices that represent possible solutions to sets of packing
and partitioning constraints. These are used in problems defined in terms
of matrices of binary decision variables xᵢⱼ (for i ≤ m, j ≤ n). Since a
typical packing constraint is Σ_{j∈Jᵢ} xᵢⱼ ≤ 1 for some i ≤ m, Jᵢ ⊆ {1, . . . , n}
(partitioning constraints simply replace inequalities with equations), sets
of such constraints may exhibit column symmetries in Π_{i≤m} Sym(Jᵢ), and
row symmetries in Sm . Orbitopes are convex hulls of binary m × n matri-
ces that have lexicographically ordered columns: their vertices represent a
subset of feasible solutions of the corresponding packing/partitioning prob-
lem from which several symmetries have been removed. Given a partition
C1 , . . . , Cq of the variable indices, a permutation π ∈ GLP (P ) is an or-
bitopal symmetry if there are p, r ≤ q such that π is a bijection Cp → Cr
that keeps all other Cs elementwise fixed, for s ∉ {p, r} [7]. In [26], a
complete description of packing/partitioning orbitopes in terms of linear
inequalities is provided ([15] gives a much shorter, more enlightening and
less technical presentation than that given in [26]). Inspired by the work
on orbitopes, E. Friedman proposed a similar but more general approach
leading to fundamental domains [17]: given a feasible polytope X ⊆ [0, 1]n
with integral extreme points and a group G acting as an affine transforma-
tion on X (i.e. for all π ∈ G there is a matrix A ∈ GL(n) and an n-vector
d such that πx = Ax + d for all x ∈ X), a fundamental domain is a subset
F ⊂ X such that GF = X.

2.3. Symmetry in Semidefinite Programming. There are several
works describing the exploitation of symmetry in Semidefinite Program-
ming (see e.g. [27, 19, 28]). Much of the material in this section is taken
from the commendable tutorial [68]. Consider the following SDP:

minX C •X ⎪

∀k ≤ m Ak • X ≤ bi (2.2)


X  0,

where X is an n×n symmetric matrix and M₁ • M₂ = trace(M₁ᵀM₂) is the
trace product between matrices M₁, M₂. Let GSDP be the largest subgroup
of Sn such that if X ∗ is an optimum then πX ∗ is also an optimum for all
π ∈ GSDP , where the action of π on an n × n matrix M is to permute the
columns and the rows of M according to π. If X∗ is an optimum, taking
(1/|GSDP|) Σ_{π∈GSDP} πX∗ shows that there is always an optimal solution of (2.2) in
B, the space of GSDP-invariant matrices. Let R₁, . . . , Rₖ be the orbits of
{(i, j) | i, j ≤ n} under GSDP, and for all r ≤ k let Bʳ = (bʳᵢⱼ) be the 0-1
incidence matrix of (i, j) ∈ Rᵣ (i.e. bʳᵢⱼ = 1 if (i, j) ∈ Rᵣ and 0 otherwise).
Then B¹, . . . , Bᵏ is a basis of B and (2.2) can be re-cast as a search over
the coefficients of a linear form in B¹, . . . , Bᵏ:
min_y  Σ_{j≤k} (C • Bʲ) yⱼ
∀i ≤ m   Σ_{j≤k} (Aᵢ • Bʲ) yⱼ = bᵢ        (2.3)
Σ_{j≤k} yⱼBʲ ⪰ 0.

By rewriting (2.2) and (2.3) over C, B becomes a semisimple algebra over
C. This implies that it is possible to find an algebra isomorphism

φ : B → ⊕_{t≤d} C^(mₜ×mₜ)

for some integers d and mₜ (t ≤ d). This allows a size reduction of the SDP
being solved, as the search only needs to be conducted on the smaller-
dimensional space spanned by φ(B).
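As a small illustration (our example): if n = 2 and GSDP = S₂ (swapping the two indices), the orbits of {(i, j) | i, j ≤ 2} are R₁ = {(1,1), (2,2)} and R₂ = {(1,2), (2,1)}, so B¹ = I and B² is the matrix with 0 on the diagonal and 1 off it; every GSDP-invariant matrix is y₁B¹ + y₂B², and (2.2) reduces to a search over (y₁, y₂).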
A different line of research is pursued in [27]: motivated by an appli-
cation (truss optimization), it is shown that the barrier subproblem of a
typical interior point method for SDP “inherits” the same symmetries as
the original SDP.
2.4. Automatic symmetry detection. Automatic symmetry de-
tection does not appear prominently in the mathematical programming
literature. A method for finding the MILP relaxation group (2.1), based
on solving an auxiliary MILP encoding the condition σAπ = A, was pro-
posed and tested in [33] (to the best of our knowledge, the only approach for
symmetry detection that does not reduce the problem to a graph). A more
practically efficient method consists in finding the automorphism group of
a vertex-colored bipartite graph encoding the incidence of variables in con-
straints. If the symmetry π is orbitopal and the system Ax ≤ b contains at
least a leading constraint, i.e. a π-invariant constraint that has exactly one
nonzero column in each Cp (for p ≤ q) then a set of generators for GLP (P )
can be found in linear time in the number of nonzeroes of A [7].
The Constraint Programming (CP) literature contains many papers on
symmetries. Whereas most of them discuss symmetry breaking techniques,
a few of them deal with automatic symmetry detection and are relevant to
the material presented in the rest of the paper; all of them rely on reducing
the problem to a graph and solving the associated Graph Isomorphism
(GI) problem. In CP, symmetries are called local if they hold at a specific
search tree node, and global otherwise. Solution symmetries are also called
semantic symmetries, and formulation symmetries are also called syntactic
or constraint symmetries. A Constraint Satisfaction Problem (CSP) can be
represented by its microstructure complement, i.e. a graph whose vertices
are assignments x = a (where x ranges over all CSP variables and a over all
values in the domain of x), and whose edges (xi = a, xj = b) indicate that
the two assignments xi = a and xj = b are incompatible either because of a
constraint in the CSP or because i = j and a ≠ b. Constraint symmetries
are defined in [11] as the automorphisms of the microstructure comple-
ment. A k-ary nogood is a k-partial solution (i.e. an assignment of values
to k variables) which cannot be extended to a full solution of the given
CSP instance. The k-nogood hypergraph of the CSP has assignments x = a
as vertices and all m-ary nogoods as hyperedges, for m ≤ k. For a k-ary
CSP (one whose constraints have maximum arity k), the group of solution
symmetries is equal to the automorphism group of its k-nogood hypergraph
[11]. In [13] (possibly the first work in which a reduction from formulation-
type symmetries to GI was proposed), SAT symmetries are automatically
detected by reducing the problem to a bipartite graph, and identified by
solving the corresponding GI instance, similarly to the approach taken in
[50]. In [53], constraints involving the arithmetic operations +, −, × are
reduced to Directed Acyclic Graphs (DAG) whose leaf vertices represent
variables and intermediate vertices represent operators; vertex colors iden-
tify same operator types and constraints having the same right hand sides.
Thm. 3.1 in [53] shows that the automorphism group of this DAG is iso-
morphic to the constraint group of the corresponding CSP instance, and
might be considered the CP equivalent of Thm. 4.1 and Thm. 4.2 appearing
below (although the proof techniques are completely different). In [52], a
systematic reduction of many types of constraints to an equivalent graph
form is proposed; an improved representation and extensive computational
results are given in [46]. The problem of determining the constraint group
of a model (instead of an instance) is discussed in [47] — we pursue a similar
line of reasoning when inferring the structure of the KNP group (indepen-
dently of the instance) from a sequence of automatically computed KNP
instance groups.
3. Groups of a mathematical program. Let P be a MP with
formulation as in (1.1), and F(P ) (resp. G(P )) be the set of its fea-
sible (resp. globally optimal) points. Two important groups are con-
nected with P . The solution group is the group of all permutations of
the variable indices which map G(P ) into itself; it is defined formally as
G∗ (P ) = stab(G(P ), Sn ) and contains as a subgroup the “symmetry group”
of P , defined limited to MILPs in [43] as the group of permutations map-
ping feasible solutions into feasible solutions having the same objective
function value. Computing solution groups directly from their definition
would imply knowing G(P) a priori, which is evidently unrealistic.
The other important group related to P (denoted by ḠP ) fixes the
formulation of P . For two functions f1 , f2 : Rn → R we write f1 = f2 to
mean dom(f1 ) = dom(f2 ) ∧ ∀x ∈ dom(f1 ) (f1 (x) = f2 (x)). Then

ḠP = {π ∈ Sn | Zπ = Z ∧ f (xπ) = f (x) ∧ ∃ σ ∈ Sm (σg(xπ) = g(x))}.

It is easy to show that ḠP ≤ G∗(P): let π ∈ ḠP and x∗ ∈ G(P); x∗π ∈
F(P) because Zπ = Z, g(x∗π) = σ⁻¹g(x∗); and it has the same function
value because f(x∗π) = f(x∗) by definition. Thus G(P)π = G(P) and
π ∈ G∗ (P ).
The definition of ḠP implies the existence of a method for testing
whether f (xπ) = f (x) and whether there is a permutation σ ∈ Sm such
that σg(xπ) = g(x). Since Nonlinear Equations (determining if a set
of general nonlinear equations has a solution) is an undecidable problem in
general [70], such tests are algorithmically intractable. Instead, we assume
the existence of a YES/NO oracle ≡ that answers YES if it can establish
that f1 = f2 (i.e. f1 , f2 have the same domain and are pointwise equal
on their domain). Such an oracle defines an equivalence relation ≡ on
the set of all functions appearing in (1.1): if a pair of functions (f1 , f2 )
belongs to the relation then the functions are equal, but not all pairs of
equal functions might belong to ≡ (i.e. ≡ might answer NO even though
f1 = f2 ). This weakening of the equality relationship will allow us to
give an algorithmically feasible definition of the symmetry group of the
formulation.
We define the ≡ oracle by only considering functions that can be writ-
ten syntactically using infix notation in terms of a finite set of operators
(e.g. arithmetic, logarithm, exponential and so on), a finite set of constants
in Q and the set of problem variables x1 , . . . , xn . Such functions can be
naturally represented by means of expression trees (Fig. 1 left) which, by
contracting leaf vertices with equal labels, can be transformed into DAGs
as shown in Fig. 1 (right). The ≡ oracle is then implemented as a recur-

Fig. 1 (diagrams omitted). Expression tree for 2x₁ + x₂x₃ + x₃ (left); equal variable
vertices can be contracted to obtain a DAG (right).


sive graph exploration. The function DAG representation is well known
and should perhaps be attributed to folklore (it is mentioned e.g. in [29],
Sect. 2.3). DAG representable functions are routinely used in Global Op-
timization (GO) to automatically build relaxations of MINLPs [62, 32, 6]
or tighten the variable bounds [58]. In the context of symmetry in MP,
precise definitions for DAG representable functions and the ≡ oracle im-
plementation are given in [36, 12]. The formulation group of P can now be
defined as:
GP = {π ∈ Sn | Zπ = Z ∧ f(xπ) ≡ f(x) ∧ ∃σ ∈ Sm (σg(xπ) ≡ g(x))}.  (3.1)

Because for any function h, h(xπ) ≡ h(x) implies h(xπ) = h(x) for all
x ∈ dom(h), it is clear that GP ≤ ḠP . Thus, it also follows that GP ≤
G∗ (P ). Although ḠP is defined for any MINLP (1.1), if P is a BLP, then
ḠP = GLP (P ) [36]. We remark that if f1 , f2 are linear forms, then f1 = f2
implies f1 ≡ f2 . In other words, for linear forms, ≡ and = are the same
relation [36]. As a corollary, if P is a BLP, then GP = GLP (P ).
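As a toy illustration (our example, not from [36]): for P given by min{x₁ + x₂ + x₁x₂ | x₁ + x₂ ≥ 1}, the transposition π = (1, 2) satisfies f(xπ) ≡ f(x) and maps the single constraint to itself, so GP = ⟨(1, 2)⟩ = S₂.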
If a set of mathematical functions share the same arguments, as for the
objective function f and constraints g of (1.1), the corresponding DAGs for
f, g1 , . . . , gm can share the same variable leaf vertices. This yields a DAG
DP = (VP , AP ) (formed by the union of all the DAGs of functions in P
followed by the contraction of leaf vertices with same variable index label)
which represents the mathematical structure of P [49, 58].
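For illustration, the following minimal Python sketch (not the ROSE implementation; all names are invented for the example) builds the expression DAG of 2x1 + x2x3 + x3 by hash-consing subtrees, so equal subexpressions — here the two occurrences of x3 — become a single shared vertex, exactly as in Fig. 1.

    # Minimal sketch: build an expression DAG by hash-consing subtrees,
    # so equal leaves/subexpressions are represented by one vertex.
    class Dag:
        def __init__(self):
            self.ids = {}        # (label, children) -> vertex id
            self.label = {}      # vertex id -> operator/variable/constant label
            self.children = {}   # vertex id -> tuple of child ids

        def node(self, label, *children):
            key = (label, children)
            if key not in self.ids:          # reuse an existing vertex if any
                vid = len(self.ids)
                self.ids[key] = vid
                self.label[vid] = label
                self.children[vid] = children
            return self.ids[key]

    dag = Dag()
    x1, x2, x3 = (dag.node(("var", i)) for i in (1, 2, 3))
    two = dag.node(("const", 2))
    root = dag.node("+", dag.node("*", two, x1), dag.node("*", x2, x3), x3)
    print(len(dag.label))   # 7 vertices; the tree of Fig. 1 (left) has 8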
4. Automatic computation of the formulation group. The me-
thod proposed in this section also appears (with more details) in [36]. As
mentioned in the literature review, similar techniques are available in CP
[53].
We first define an equivalence relation on VP which determines the
interchangeability of two vertices of DP . Let SF be the singleton set con-
taining the root vertex of the objective function, SC of all constraint root
vertices, SO of all vertices representing operators, SK of all constant ver-
tices and SV of all variable vertices. For v ∈ SF , we denote optimization
direction of the corresponding objective function by d(v); for v ∈ SC , we
denote the constraint sense by s(v). For v ∈ SO , we let ℓ(v) be the level of v in DP , i.e. the length of the path from the root to v (ℓ is well defined
as the only vertices with more than one incoming arc are the leaf vertices),
λ(v) be its operator label and o(v) be the order of v as an argument of
its parent vertex if the latter represents a noncommutative operator, or 1
otherwise. For v ∈ SK , we let μ(v) be the value of v. For v ∈ SV we let
r(v) be the 2-vector of lower and upper variable bounds for v and ζ(v) be
1 if v represents an integral variable or 0 otherwise. We now define the
relation ∼ on VP as follows

    ∀u, v ∈ VP   u ∼ v ⇔ (u, v ∈ SF ∧ d(u) = d(v))
                 ∨ (u, v ∈ SC ∧ s(u) = s(v))
                 ∨ (u, v ∈ SO ∧ ℓ(u) = ℓ(v) ∧ λ(u) = λ(v) ∧ o(u) = o(v))
                 ∨ (u, v ∈ SK ∧ μ(u) = μ(v))
                 ∨ (u, v ∈ SV ∧ r(u) = r(v) ∧ ζ(u) = ζ(v)).

It is easy to show that ∼ is an equivalence relation on VP , and therefore partitions VP into K disjoint subsets V1 , . . . , VK .
For a digraph D = (V, A), its automorphism group Aut(D) is the group
of vertex permutations γ such that (γ(u), γ(v)) ∈ A for all (u, v) ∈ A [56].
Let GDAG (P ) be the largest subgroup of Aut(DP ) fixing Vk setwise for all
k ≤ K. We assume without loss of generality that the vertices of DP are uniquely numbered so that for all j ≤ n, the j-th vertex corresponds to the leaf vertex for variable xj (the rest of the numbering is not important), i.e. SV = {1, . . . , n}.
Let G ≤ Sn and ω be a subset of {1, . . . , n}. Let H = Sym(ω) and
define the mapping ψ : G → H by ψ(π) = π[ω] for all π ∈ G. Then the
following holds.
Theorem 4.1 ([36], Thm. 4). ψ is a group homomorphism if and
only if G stabilizes ω setwise.
Next, we note that GDAG (P ) fixes SV setwise [36]. As a corollary to
Thm. 4.1, the map ϕ : GDAG (P ) → Sym(SV ) given by ϕ(γ) = γ[SV ] is
a group homomorphism.
Theorem 4.2 ([36], Thm. 7). Imϕ = GP groupwise.
By Thm. 4.2, we can automatically generate GP by looking for the largest
subgroup of Aut(DP ) fixing all Vk ’s. Thus, the problem of computing
GP has been reduced to computing the (generators of the) automorphism
group of a certain vertex-coloured DAG. This is in turn equivalent to the
GI problem [3]. GI is in NP, but it is not known whether it is in P or NP-
complete. A notion of GI-completeness has therefore been introduced for
those graph classes for which solving the GI problem is as hard as solving
it on general graphs [67]. Rooted DAGs are GI-complete [8] but there is an
algorithm for solving the GI problem on trees which is linear in the number
of vertices in the tree ([56], Ch. 8.5.2). This should give an insight as to
the type of difficulty inherent to computing Aut(DP ).
Corollary 4.1. If C′ is a set of group generators of GDAG(P), then C = {π[SV ] | π ∈ C′} is a set of generators for GP .
Cor. 4.1 allows the practical computation of a formulation group: one first forms the graph DP , then computes generators C′ of GDAG(P), and finally
considers their action on SV to explicitly construct C. Based on the re-
sults of this section, we implemented a software system (called symmgroup)
that automatically detects the formulation group of a problem (1.1). Our
system first calls AMPL [16] to parse the instance; the ROSE Reformula-
tion/Optimization Software Engine [37] AMPL-hooked solver is then called
(with ROSE’s Rsymmgroup reformulator) to produce a file representation
of the problem expression DAG. This is then fed into nauty’s [45, 44]
dreadnaut shell to efficiently compute the generators of Aut(DP ). A sys-
tem of shell scripts and Unix tools parses the nauty output to form a valid
GAP [18] input, used to print the actual group description via the com-
mand StructureDescription.
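For illustration only, the nauty step of this pipeline can be mimicked on a toy example with the networkx library (a stand-in for nauty, not the actual symmgroup implementation): automorphisms of a vertex-coloured DAG are just its colour-preserving self-isomorphisms. The graph and colour labels below, for f(x) = x1² + x2², are invented for the example.

    # Toy stand-in for the nauty step: automorphisms of a small
    # vertex-coloured DAG as self-isomorphisms (requires networkx).
    import networkx as nx
    from networkx.algorithms import isomorphism

    G = nx.DiGraph()
    # DAG of f(x) = x1^2 + x2^2: a '+' root, two '^2' nodes, two leaves.
    G.add_edges_from([("plus", "sq1"), ("plus", "sq2"),
                      ("sq1", "x1"), ("sq2", "x2")])
    # Colours encode the classes of the relation ~ (operator label, level,
    # variable bounds, ...); both squares match, both variables match.
    colours = {"plus": "op:+", "sq1": "op:^2", "sq2": "op:^2",
               "x1": "var[0,1]", "x2": "var[0,1]"}
    nx.set_node_attributes(G, colours, "colour")

    gm = isomorphism.DiGraphMatcher(
        G, G, node_match=lambda a, b: a["colour"] == b["colour"])
    autos = list(gm.isomorphisms_iter())
    print(len(autos))                       # 2: identity and the x1 <-> x2 swap
    print([{v: a[v] for v in ("x1", "x2")} for a in autos])

Restricting these automorphisms to the variable leaves, as in Cor. 4.1, yields the generators of the formulation group — here the swap of x1 and x2.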
5. Symmetry breaking constraints. Once the formulation group
is detected, we can adjoin constraints to (1.1) in order to make some of the
symmetric optima infeasible. According to the classification in [35], this is
a reformulation of the narrowing type.
Definition 5.1. Given a problem P , a narrowing Q of P is a formu-
lation (1.1) such that (a) there is a function η : F (Q) → F(P ) for which

η(G(Q)) (the image of G(Q) under η) is a subset of G(P), and (b) Q is infeasible only if P is.
Our narrowing rests on adjoining some static symmetry breaking in-
equalities (SSBIs) [43] to the original formulation, i.e. inequalities that are
designed to cut some of the symmetric solutions while keeping at least one
optimal one. The reformulated problem can then be solved by standard
software packages such as CPLEX [24] (for MILPs) and Couenne [6] or
BARON [57] for MINLPs.
We first give a formal definition of SSBIs that makes them depend on
a group rather than just a set of solutions.
Definition 5.2. Given a permutation π ∈ Sn acting on the compo-
nent indices of the vectors in a given set X ⊆ Rn , the constraints g(x) ≤ 0
(that is, {g1 (x) ≤ 0, . . . , gq (x) ≤ 0}) are symmetry breaking constraints
(SBCs) with respect to π and X if there is y ∈ X such that g(yπ) ≤ 0.
Given a group G, g(x) ≤ 0 are SBCs w.r.t. G and X if there is y ∈ XG such that g(y) ≤ 0.
If there are no ambiguities as regards X, we simply say “SBCs with
respect to π” (respectively, G). In most cases, X = G(P ). The following
facts are easy to prove.
1. For any π ∈ Sn , if g(x) ≤ 0 are SBCs with respect to π, X then
they are also SBCs with respect to ⟨π⟩, X.
2. For any H ≤ G, if g(x) ≤ 0 are SBCs with respect to H, X then
they are also SBCs with respect to G, X.
3. Let g(x) ≤ 0 be SBCs with respect to π ∈ Sn , X ⊆ Rn and let
B ⊆ {1, . . . , n}. If g(x) ≡ g(x[B]) (i.e. the constraints g only
involve variable indices in B) then g(x) ≤ 0 are also SBCs with
respect to π[B], X[B].
As regards Fact 3, if g(x) ≡ g(x[B]) we denote the SBCs g(x) ≤ 0 by
g[B](x) ≤ 0; if B is the domain of a permutation α ∈ Sym(B), we also use
the notation g[α](x) ≤ 0.
Example 1. Let y = (1, 1, −1), X = {y} and π = (1, 2, 3); then {x1 ≤ x2 , x1 ≤ x3 } are SBCs with respect to π and X because yπ satisfies the constraints. {x1 ≤ x2 , x2 ≤ x3 } are SBCs with respect to S3 and X because (−1, 1, 1) = y(1, 2, 3) ∈ XS3 ; however, they are not SBCs with respect to ⟨(2, 3)⟩ and X because X⟨(2, 3)⟩ = {y, y(2, 3)} = {(1, 1, −1), (1, −1, 1)} and neither vector satisfies the constraints.
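The claims of Example 1 are easy to verify numerically; the following plain-Python check (illustration only; the helper act is ours, not part of any library) enumerates the relevant permuted copies of y.

    # Check of Example 1: which permuted copies of y = (1, 1, -1)
    # satisfy the candidate SBCs {x1 <= x2, x2 <= x3}.
    from itertools import permutations

    y = (1, 1, -1)

    def act(x, img):
        # apply a permutation to component indices: entry j moves to img[j]
        out = [None] * len(x)
        for j, i in enumerate(img):
            out[i] = x[j]
        return tuple(out)

    cycle123 = (1, 2, 0)            # 0-based images of the cycle (1, 2, 3)
    print(act(y, cycle123))         # (-1, 1, 1), i.e. y(1,2,3)

    def sbc(x):                     # the SBCs {x1 <= x2, x2 <= x3}
        return x[0] <= x[1] and x[1] <= x[2]

    # w.r.t. S3: some permuted copy of y satisfies the SBCs ...
    print(any(sbc(act(y, img)) for img in permutations(range(3))))   # True
    # ... but no element of <(2, 3)> does:
    swap23 = (0, 2, 1)
    print(sbc(y), sbc(act(y, swap23)))                               # False False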
We use SBCs to yield narrowings of the original problem P .
Theorem 5.1 ([36], Thm. 11). If g(x) ≤ 0 are SBCs for any subgroup
G of GP and G(P ), then the problem Q obtained by adjoining g(x) ≤ 0 to
the constraints of P is a narrowing of P .
6. An application to the Kissing Number Problem. Given pos-
itive integers D, N , the decision version of the Kissing Number Problem
(KNP) [30] asks whether N unit spheres can be positioned adjacent to a
unit sphere centered in the origin in RD . The optimization version asks


for the maximum possible N . The pedigree of this problem is illustrious,


having originated in a discussion between I. Newton and D. Gregory. The
name of the problem is linked to billiard game jargon: when two balls
touch each other, they are said to “kiss”. As both Newton and Gregory
were of British stock, one may almost picture the two chaps going down
the pub arm in arm for a game of pool and a pint of ale; and then, in
the fumes of alcohol, getting into a brawl about whether twelve or thirteen
spheres might kiss a central one if the billiard table surface was tridimen-
sional. This interpretation disregards the alleged scholarly note (mentioned
in [63]) about the problem arising from an astronomical question. When
D = 2, the maximum feasible N is of course 6 (hexagonal lattice). When
D = 3, the maximum feasible N was conjectured by Newton to be 12 and
by Gregory to be 13 (Newton was proven right 180 years later [59]). The
problem for D = 4 was settled recently with N = 24 [48]. The problem
for D = 5 is still open: a lower bound taken from lattice theory is 40,
and an upper bound derived with Bachoc and Vallentin’s extension [4] of
Delsarte’s Linear Programming (LP) bound [14] is 45.
We formulate the decision version of the KNP as a nonconvex NLP:

    max_{x,α}   α
    ∀i ≤ N          ‖x^i‖² = 4
    ∀i < j ≤ N      ‖x^i − x^j‖² ≥ 4α                (6.1)
    ∀i ≤ N          x^i ∈ [−2, 2]^D
    α ∈ [0, 1].
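As a small sanity check (illustration only, not part of the computational experiments of Section 6.1), the following numpy sketch verifies that the hexagonal configuration attains α = 1 in (6.1) for N = 6, D = 2:

    # Feasibility check for (6.1): six unit spheres on a hexagon in R^2
    # kiss the central unit sphere, i.e. alpha = 1 is feasible.
    import numpy as np

    N, D = 6, 2
    angles = 2 * np.pi * np.arange(N) / N
    x = 2 * np.column_stack([np.cos(angles), np.sin(angles)])   # ||x_i|| = 2

    center_ok = np.allclose((x ** 2).sum(axis=1), 4.0)          # ||x_i||^2 = 4
    dist_ok = all(np.sum((x[i] - x[j]) ** 2) >= 4.0 - 1e-9      # >= 4*alpha
                  for i in range(N) for j in range(i + 1, N))
    print(center_ok, dist_ok)                                   # True True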

For any given N, D > 1, if a global optimum (x∗ , α∗ ) of (6.1) has α∗ = 1


then a kissing configuration of N balls in RD exists; otherwise, it does
not. In practice, (6.1) is usually solved by heuristics such as Variable
Neighbourhood Search (VNS) [30], because solving it by sBB takes too long
even on very small instances. One of the reasons for the slow convergence
of sBB is that (6.1) has many symmetries. In fact, Aut(G(KNP)) has
infinite (uncountable) cardinality: each optimum x∗ can be rotated by any
angle in RD , and hence for all orthogonal transformations μ ∈ SO(D, R)
(the special orthogonal group of RD ), μ(x∗ ) ∈ G(KNP). Such symmetries
can be easily disposed of by deciding the placement of D spheres so that
they are mutually adjacent as well as adjacent to the central sphere in RD ,
but computational experience suggests that this does little, by itself, to
decrease the size of the sBB tree.
We used the symmgroup system in order to detect the structure of
G(6.1) automatically for a few KNP instances, obtaining an indication that
G(6.1) ≅ SD . However, since D is small with respect to N , this is not likely
to help the solution process significantly. Let xi = (xi1 , . . . , xiD ) for all
i ≤ N . As in [30] we remark that, for all i < j ≤ N :
 
    ‖x^i − x^j‖² = Σ_{k≤D} (x_{ik} − x_{jk})² = 8 − 2 Σ_{k≤D} x_{ik} x_{jk} ,        (6.2)



because Σ_{k≤D} x_{ik}² = ‖x^i‖² = 4 for all i ≤ N . Let Q be (6.1) reformulated
according to (6.2): automatic detection of GQ yields an indication that
GQ ≅ SD × SN , which is a considerably larger group. The difference lies
in the fact that the binary minus is in general not commutative; however,
it is commutative whenever it appears in terms like ‖x^i − x^j‖ (by defi-
nition of Euclidean norm). Since automatic symmetry detection is based
on expression trees, commutativity of an operator is decided at the vertex
representing the operator, rather than at the parent vertex. Thus, on (6.1),
our automatic system fails to detect the larger group. Reformulation (6.2)
prevents this from happening, thereby allowing the automatic detection of
the larger group.
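Identity (6.2) itself is immediate to confirm numerically; a minimal numpy check (illustration only):

    # Check of (6.2): for two centers of norm 2,
    # ||xi - xj||^2 = 8 - 2 <xi, xj>.
    import numpy as np

    rng = np.random.default_rng(0)
    xi, xj = (2 * v / np.linalg.norm(v) for v in rng.normal(size=(2, 3)))
    print(np.sum((xi - xj) ** 2), 8 - 2 * np.dot(xi, xj))   # equal up to rounding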
Example 2. Consider the KNP instance defined by N = 6, D = 2,
whose variable mapping
    ( x11  x12  x21  x22  x31  x32  x41  x42  x51  x52  x61  x62  α   )
    ( y1   y2   y3   y4   y5   y6   y7   y8   y9   y10  y11  y12  y13 )

yields the following flat [32] instance:


    min (−y13)  subject to

    y1² + y2² = 4        2y13 + y1 y3 + y2 y4 ≤ 4      2y13 + y3 y11 + y4 y12 ≤ 4
    y3² + y4² = 4        2y13 + y1 y5 + y2 y6 ≤ 4      2y13 + y5 y7 + y6 y8 ≤ 4
    y5² + y6² = 4        2y13 + y1 y7 + y2 y8 ≤ 4      2y13 + y5 y9 + y6 y10 ≤ 4
    y7² + y8² = 4        2y13 + y1 y9 + y2 y10 ≤ 4     2y13 + y5 y11 + y6 y12 ≤ 4
    y9² + y10² = 4       2y13 + y1 y11 + y2 y12 ≤ 4    2y13 + y7 y9 + y8 y10 ≤ 4
    y11² + y12² = 4      2y13 + y3 y5 + y4 y6 ≤ 4      2y13 + y7 y11 + y8 y12 ≤ 4
                         2y13 + y3 y7 + y4 y8 ≤ 4      2y13 + y9 y11 + y10 y12 ≤ 4.
                         2y13 + y3 y9 + y4 y10 ≤ 4
On the above instance, the symmgroup system reports GP ≅ C2 × S6 , generated as:

"(1, 2)(3, 4)(5, 6)(7, 8)(9, 10)(11, 12),


(1, 3)(2, 4), (3, 5)(4, 6), (5, 7)(6, 8),(7, 9)(8, 10), (9, 11)(10, 12)#,

which, in original variable space, maps to:

"(x11 , x12 )(x21 , x22 )(x31 , x32 )(x41 , x42 )(x51 , x52 )(x61 x62 ),
(x11 , x21 )(x12 , x22 ), (x21 , x31 )(x22 , x32 ), (x31 , x41 )(x32 , x42 ),
(x41 , x51 )(x42 , x52 ), (x51 , x61 )(x52 , x62 )#,

or, in other words, letting xi = (xi1 , xi2 ) for all i ≤ 6,

"τ, (x1 , x2 ), (x2 , x3 ), (x3 , x4 ), (x4 , x5 ), (x5 , x6 )#


where τ = ∏_{i=1}^{6} (xi1 , xi2 ). Carried over to the spheres in R², this is a
symmetric group action acting independently on the six spheres and on the
two spatial dimensions.

For N = 12, D = 3 the formulation group is S3 × S12 and for N = 24, D = 4 it is S4 × S24 . This suggests a formulation group SD × SN in
general, where the solutions can be permuted by symmetric actions on the
coordinate indices and, independently, the sphere indices. We now prove
this statement formally. For all i ≤ N call the constraints ‖x^i‖² = 4 the center constraints and for all i < j ≤ N call the constraints Σ_{k≤D} x_{ik} x_{jk} ≤ 4 − 2α the distance constraints.
Theorem 6.1. GQ ≅ SD × SN .
Proof. Let (x, α) ∈ G(Q); the following claims are easy to establish.

1. For any k ≤ D − 1, the permutation τk = ∏_{i≤N} (x_{ik} , x_{i,k+1}) is in GQ , as both center and distance constraints are invariant w.r.t. it; notice that ⟨τk | k ≤ D − 1⟩ ≅ SD .
2. For any i ≤ N − 1, the permutation σi = ∏_{k≤D} (x_{ik} , x_{i+1,k}) is in GQ , as both center and distance constraints are invariant w.r.t. it; notice that ⟨σi | i ≤ N − 1⟩ ≅ SN .
3. Any permutation moving α to one of the x variables is not in GQ .
This follows because the objective function only consists of the
variable α, so it is only invariant w.r.t. identity permutation.
4. For any k ≤ D − 1, if π ∈ GQ is such that π(x_{ik}) = x_{i,k+1} for some i ≤ N then π(x_{ik}) = x_{i,k+1} for all i ≤ N , as otherwise the term Σ_{k≤D} x_{ik} x_{jk} (appearing in the distance constraints) would not be invariant.
5. For any i ≤ N − 1, if π ∈ GQ is such that π(x_{ik}) = x_{i+1,k} for some k ≤ D, then π(x_{ik}) = x_{i+1,k} for all k ≤ D, as otherwise the term Σ_{k≤D} x_{ik} x_{i+1,k} (appearing in some of the distance constraints) would not be invariant.
Let HD = "τk | k ≤ D − 1# and HN = "σi | i ≤ N − 1#. Claims 1-2
imply that HD , HN ≤ GQ . It is easy (but tedious) to check that HD HN =
HN HD ; it follows that HD HN ≤ GQ [10] and hence HD , HN are normal
subgroups of HD HN . Since HD ∩HN = {e}, we have HD HN ∼ = HD ×HN ∼ =
SD ×SN ≤ GQ [2]. Now suppose π ∈ GQ with π = e. By Claim 3, π cannot
move α so it must map xih to xjk for some i < j ≤ N, h < k ≤ D; the
action h → k (resp. i → j) on the components (resp. spheres) indices can
be decomposed into a product of transpositions h → h + 1, . . . , k − 1 → k
(resp. i → i + 1, . . . , j − 1 → j). Thus, by Claim 4 (resp. 5), π involves a
certain product γ of τk ’s and σi ’s; furthermore, since by definition γ maps
xih to xjk , any permutation in GQ (including π) can be obtained as a
product of these elements γ; hence π is an element of HD HN , which shows
GQ ≤ HD HN . Therefore, GQ ≅ SD × SN as claimed.
In problems involving Euclidean distances, it is often assumed that
symmetries are rotations and translations of Rn ; we remark that GQ is
not necessarily isomorphic to a (finite) subgroup of SO(D, R). Permuting
two sphere indices out of N is an action in GQ but in general there is no
rotation that can act in the same way in RD . Hence enforcing SBCs for


GQ is not implied by simply fixing D adjacent spheres in order to break


symmetries in the special orthogonal group.
By Thm. 6.1, GQ = ⟨τk , σi | k ≤ D − 1, i ≤ N − 1⟩. It is easy to show
that there is just one orbit in the natural action of GQ on the set A =
{1, . . . , N } × {1, . . . , D}, and that the action of GQ on A is not symmetric
(otherwise GQ would be isomorphic to SN D , contradicting Thm. 6.1).
Proposition 6.1. For any fixed h ≤ D,

    ∀i ∈ {2, . . . , N }    x_{i−1,h} ≤ x_{ih}        (6.3)

are SBCs with respect to GQ , G(Q).


Proof. Let x̄ ∈ G(Q); since the σi generate the symmetric group
acting on the N spheres, there exists a permutation π ∈ GQ such that
(x̄π(i),h | i ≤ N ) are ordered as in (6.3).
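Computationally, Prop. 6.1 says that sorting the spheres by their h-th coordinate (a product of the σi , i.e. a relabelling of the spheres) always produces a point satisfying (6.3). A minimal numpy sketch (illustration only; the helper name sort_spheres is ours):

    # Illustration of Prop. 6.1: relabelling the spheres sorts the h-th
    # coordinates, so the SBCs (6.3) can always be satisfied.
    import numpy as np

    def sort_spheres(x, h=0):
        """Relabel the rows (spheres) of x so that (6.3) holds for column h."""
        return x[np.argsort(x[:, h])]

    x = np.random.uniform(-2.0, 2.0, size=(6, 2))      # some configuration
    xs = sort_spheres(x)
    print(bool(np.all(np.diff(xs[:, 0]) >= 0)))        # True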
6.1. Computational results on the KNP. Comparative solutions
yielded by running BARON [57] on KNP instances with and without SBC
reformulation have been obtained on one 2.4GHz Intel Xeon CPU of a
computer with 8 GB RAM (shared by 3 other similar CPUs) running Linux.
These results are shown in Table 1, which contains the following statistics
at termination (occurring after 10h of user CPU time):
1. the objective function value of the incumbent
2. the seconds of user CPU time taken (meaningful if < 10h)
3. the gap still open
4. the number of BB nodes closed and those still on the tree.
The first column contains the instance name in the form knp-N D. The
first group of columns refers to the solution of the original formulations (CPU time, best objective function value f ∗ , open gap at termination, number of nodes created and number of open nodes in the tree at termination); the second group of columns (labelled NarrowingKNP) refers to the solution of the formulation obtained
by adjoining (6.3) to the original formulation. The last column (R.t.) con-
tains the time (in user CPU seconds) needed to automatically compute
the formulation group using the methods in Sect. 4. In both formulations
we fixed the first sphere at (−2, 0, . . . , 0) to break some of the orthogonal
symmetries. We remark that the objective function values are negative
because we are using a minimization direction (instead of maximization).
Judging from the 2-dimensional KNP instances, where BARON con-
verges to optimality, it is evident that the NarrowingKNP reformulation is
crucial to decrease the CPU time significantly: the total CPU time needed
to solve the five 2D instances in the original formulation is 74047s, whereas
the NarrowingKNP reformulations only take 173s, that is a gain of over 400
times. It also appears clear from the results relating to the larger instances
that adjoining SBCs to the formulation makes a definite (positive) differ-
ence in the exploration rate of the search tree. The beneficial effects of the
narrowing decrease with the instance size (to the extent of disappearing


Table 1
Computational results for the Kissing Number Problem.

                   |          Original problem              |           NarrowingKNP                 |
    Instance   Slv |  CPU     f*      gap     nodes/tree    |  CPU     f*      gap     nodes/tree    | R.t.
    knp-6 2    B   |  8.66    -1      0%      1118/0        |  1.91    -1      0%      186/0         | 1.43
    knp-7 2    B   |  147.21  -0.753  0%      13729/0       |  3.86    -0.753  0%      260/0         | 1.47
    knp-8 2    B   |  1892    -0.586  0%      179994/0      |  12.17   -0.586  0%      650/0         | 2.94
    knp-9 2    B   |  36000   -0.47   33.75%  1502116/176357|  37.36   -0.47   0%      1554/0        | 1.96
    knp-10 2   B   |  36000   -0.382  170%    936911/167182 |  117.79  -0.382  0%      3446/0        | 1.97
    knp-12 3   B   |  36000   -1.105  8.55%   299241/12840  |  36000   -1.105  8.55%   273923/5356   | 3.39
    knp-13 3   B   |  36000   -0.914  118%    102150/64499  |  36000   -0.914  118%    68248/33013   | 3.38
    knp-24 4   B   |  36000   -0.966  107%    10156/7487    |  36000   -0.92   117%    4059/2985     | 5.62
    knp-24 5   B   |  36000   -0.93   116%    7768/5655     |  36000   -0.89   124%    4251/3122     | 6.1

    (Slv B = BARON; CPU in user seconds; nodes/tree = BB nodes created / nodes still open at termination.)

completely for knp-24 4) because we are keeping the CPU time fixed at
10h. We remark that the effectiveness of the NarrowingKNP reformulation
in low-dimensional spaces can be partly explained by the fact that it is de-
signed to break sphere-related symmetries rather than dimension-related
ones (naturally, the instance size also counts: the largest 2D instance,
knp-10 2, has 21 variables, whereas the smallest 3D one, knp-12 3, has 37
variables).

7. Conclusion. This paper introduces the study of symmetries in


nonlinear and mixed-integer nonlinear programming. We use a general-
ization of the definition of formulation group given by Margot, based on
transforming a mathematical programming formulation into a DAG. This
allows automatic symmetry detection using graph isomorphism tools. Sym-
metries are then broken by means of static symmetry-breaking inequalities.
We present an application of our findings to the Kissing Number Problem.

Acknowledgements. I wish to thank François Margot for many use-


ful discussions and suggestions, and for carefully checking [34] (from which
some of the present material is taken) as well as one particularly careful ref-
eree. This work was financially supported by grants: ANR 07-JCJC-0151
“ARS”, Digiteo Chair 2009-14D “RMNCCO”, Digiteo Emergence 2009-
55D “ARM”.

REFERENCES

[1] F. Alizadeh. Interior point methods in Semidefinite Programming with applica-


tions to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–
51, 1995.


[2] R. Allenby. Rings, Fields and Groups: an Introduction to Abstract Algebra.


Edward Arnold, London, 1991.
[3] L. Babai. Automorphism groups, isomorphism, reconstruction. In R. Graham,
M. Grötschel, and L. Lovász, editors, Handbook of Combinatorics, Vol. 2,
pages 1447–1540. MIT Press, Cambridge, MA, 1996.
[4] C. Bachoc and F. Vallentin. New upper bounds for kissing numbers from
Semidefinite Programming. Journal of the American Mathematical Society,
21:909–924, 2008.
[5] D. Bell. Constructive group relaxations for integer programs. SIAM Journal on
Applied Mathematics, 30(4):708–719, 1976.
[6] P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter. Branching and
bounds tightening techniques for non-convex MINLP. Optimization Methods
and Software, 24(4):597–634, 2009.
[7] T. Berthold and M. Pfetsch. Detecting orbitopal symmetries. In B. Fleis-
chmann, K.-H. Borgwardt, R. Klein, and A. Tuma, editors, Operations Re-
search Proceedings 2008, pages 433–438, Berlin, 2009. Springer.
[8] K. Booth and C. Colbourn. Problems polynomially equivalent to graph isomor-
phism. Technical Report CS-77-04, University of Waterloo, 1979.
[9] M. Boulle. Compact mathematical formulation for graph partitioning. Optimiza-
tion and Engineering, 5:315–333, 2004.
[10] A. Clark. Elements of Abstract Algebra. Dover, New York, 1984.
[11] D. Cohen, P. Jeavons, C. Jefferson, K. Petrie, and B. Smith. Symmetry defi-
nitions for constraint satisfaction problems. In P. van Beek, editor, Constraint
Programming, Vol. 3709 of LNCS. Springer, 2005.
[12] A. Costa, P. Hansen, and L. Liberti. Formulation symmetries in circle packing.
In R. Mahjoub, editor, Proceedings of the International Symposium on Com-
binatorial Optimization, Vol. 36 of Electronic Notes in Discrete Mathematics,
pages 1303–1310, Amsterdam, 2010. Elsevier.
[13] J. Crawford, M. Ginsberg, E. Luks, and A. Roy. Symmetry-breaking pred-
icates for search problems. In Principles of Knowledge Representation and
Reasoning, pages 148–159, Cambridge, MA, 1996. Morgan Kaufmann.
[14] Ph. Delsarte. Bounds for unrestricted codes by linear programming. Philips
Research Reports, 27:272–289, 1972.
[15] Y. Faenza and V. Kaibel. Extended formulations for packing and partitioning
orbitopes. Mathematics of Operations Research, 34(3):686–697, 2009.
[16] R. Fourer and D. Gay. The AMPL Book. Duxbury Press, Pacific Grove, 2002.
[17] E.J. Friedman. Fundamental domains for integer programs with symmetries. In
A. Dress, Y. Xu, and B. Zhu, editors, COCOA Proceedings, Vol. 4616 of
LNCS, pages 146–153. Springer, 2007.
[18] The GAP Group. GAP – Groups, Algorithms, and Programming, Version 4.4.10,
2007.
[19] K. Gatermann and P. Parrilo. Symmetry groups, Semidefinite Programs and
sums of squares. Journal of Pure and Applied Algebra, 192:95–128, 2004.
[20] R. Gomory. Some polyhedra related to combinatorial problems. Linear Algebra
and Its Applications, 2(4):451–558, 1969.
[21] M. Hall. Theory of Groups. Chelsea Publishing Company, New York, 2nd edition,
1976.
[22] P. Hansen and N. Mladenović. Variable neighbourhood search: Principles and
applications. European Journal of Operations Research, 130:449–467, 2001.
[23] K. Herr and R. Bödi. Symmetries in linear and integer programs. Technical
Report 0908.3329v1 [math.CO], arXiv.org, 2009.
[24] ILOG. ILOG CPLEX 11.0 User’s Manual. ILOG S.A., Gentilly, France, 2008.
[25] E. Johnson. Integer Programming: Facets, Subadditivity and Duality for Group
and Semi-group Problems. SIAM, Philadelphia, 1980.
[26] V. Kaibel and M. Pfetsch. Packing and partitioning orbitopes. Mathematical
Programming, 114(1):1–36, 2008.


[27] Y. Kanno, M. Ohsaki, K. Murota, and N. Katoh. Group symmetry in interior-


point methods for Semidefinite Program. Optimization and Engineering,
2:293–320, 2001.
[28] E. De Klerk and R. Sotirov. Exploiting group symmetry in Semidefinite Pro-
gramming relaxations of the quadratic assignment problem. Mathematical
Programming, 122(2):225–246, 2010.
[29] D.E. Knuth. The Art of Computer Programming, Part I: Fundamental Algo-
rithms. Addison-Wesley, Reading, MA, 1968.
[30] S. Kucherenko, P. Belotti, L. Liberti, and N. Maculan. New formulations
for the kissing number problem. Discrete Applied Mathematics, 155(14):1837–
1841, 2007.
[31] J. Lee and F. Margot. On a binary-encoded ILP coloring formulation. INFORMS
Journal on Computing, 19(3):406–415, 2007.
[32] L. Liberti. Writing global optimization software. In L. Liberti and N. Maculan,
editors, Global Optimization: from Theory to Implementation, pages 211–262.
Springer, Berlin, 2006.
[33] L. Liberti. Automatic generation of symmetry-breaking constraints. In B. Yang,
D.-Z. Du, and C.A. Wang, editors, COCOA Proceedings, Vol. 5165 of LNCS,
pages 328–338, Berlin, 2008. Springer.
[34] L. Liberti. Reformulations in mathematical programming: Symmetry. Technical
Report 2165, Optimization Online, 2008.
[35] L. Liberti. Reformulations in mathematical programming: Definitions and sys-
tematics. RAIRO-RO, 43(1):55–86, 2009.
[36] L. Liberti. Reformulations in mathematical programming: Automatic symmetry
detection and exploitation. Mathematical Programming, DOI 10.1007/s10107-
010-0351-0.
[37] L. Liberti, S. Cafieri, and F. Tarissan. Reformulations in mathematical pro-
gramming: A computational approach. In A. Abraham, A.-E. Hassanien,
P. Siarry, and A. Engelbrecht, editors, Foundations of Computational Intel-
ligence Vol. 3, number 203 in Studies in Computational Intelligence, pages
153–234. Springer, Berlin, 2009.
[38] L. Liberti, N. Mladenović, and G. Nannicini. A good recipe for solving
MINLPs. In V. Maniezzo, T. Stützle, and S. Voß, editors, Hybridizing meta-
heuristics and mathematical programming, Vol. 10 of Annals of Information
Systems, pages 231–244, New York, 2009. Springer.
[39] F. Margot. Pruning by isomorphism in branch-and-cut. Mathematical Program-
ming, 94:71–90, 2002.
[40] F. Margot. Exploiting orbits in symmetric ILP. Mathematical Programming B,
98:3–21, 2003.
[41] F. Margot. Small covering designs by branch-and-cut. Mathematical Program-
ming B, 94:207–220, 2003.
[42] F. Margot. Symmetric ILP: coloring and small integers. Discrete Optimization,
4:40–62, 2007.
[43] F. Margot. Symmetry in integer linear programming. In M. Jünger, T. Liebling,
D. Naddef, G. Nemhauser, W. Pulleyblank, G. Reinelt, G. Rinaldi, and
L. Wolsey, editors, 50 Years of Integer Programming, pages 647–681. Springer,
Berlin, 2010.
[44] B. McKay. Practical graph isomorphism. Congressus Numerantium, 30:45–87,
1981.
[45] B. McKay. nauty User’s Guide (Version 2.4). Computer Science Dept. , Aus-
tralian National University, 2007.
[46] C. Mears, M. Garcia de la Banda, and M. Wallace. On implementing sym-
metry detection. Constraints, 14(2009):443–477, 2009.


[47] C. Mears, M. Garcia de la Banda, M. Wallace, and B. Demoen. A novel


approach for detecting symmetries in CSP models. In L. Perron and M. Trick,
editors, Constraint Programming, Artificial Intelligence and Operations Re-
search, volume 5015 of LNCS, pages 158–172, New York, 2008. Springer.
[48] O. Musin. The kissing number in four dimensions. arXiv:math.MG/0309430v2,
April 2005.
[49] A. Neumaier. Complete search in continuous global optimization and constraint
satisfaction. Acta Numerica, 13:271–369, 2004.
[50] J. Ostrowski, J. Linderoth, F. Rossi, and S. Smriglio. Orbital branching.
In M. Fischetti and D.P. Williamson, editors, IPCO, volume 4513 of LNCS,
pages 104–118. Springer, 2007.
[51] J. Ostrowski, J. Linderoth, F. Rossi, and S. Smriglio. Constraint orbital
branching. In A. Lodi, A. Panconesi, and G. Rinaldi, editors, IPCO, volume
5035 of LNCS, pages 225–239. Springer, 2008.
[52] J.-F. Puget. Automatic detection of variable and value symmetries. In P. van
Beek, editor, Constraint Programming, volume 3709 of LNCS, pages 475–489,
New York, 2005. Springer.
[53] A. Ramani and I. Markov. Automatically exploiting symmetries in constraint
programming. In B. Faltings, A. Petcu, F. Fages, and F. Rossi, editors, Con-
straint Solving and Constraint Logic Programming, volume 3419 of LNAI,
pages 98–112, Berlin, 2005. Springer.
[54] S. Robertson. Polytopes and Symmetry. Cambridge University Press, Cambridge,
1984.
[55] R.T. Rockafellar. A combinatorial algorithm for linear programs in the general
mixed form. Journal of the Society for Industrial and Applied Mathematics,
12(1):215–225, 1964.
[56] K.H. Rosen, editor. Handbook of Discrete and Combinatorial Mathematics. CRC
Press, New York, 2000.
[57] N.V. Sahinidis and M. Tawarmalani. BARON 7.2.5: Global Optimization of
Mixed-Integer Nonlinear Programs, User’s Manual, 2005.
[58] H. Schichl and A. Neumaier. Interval analysis on directed acyclic graphs for
global optimization. Journal of Global Optimization, 33(4):541–562, 2005.
[59] K. Schütte and B.L. van der Waerden. Das problem der dreizehn kugeln.
Mathematische Annalen, 125:325–334, 1953.
[60] A. Seress. Permutation Group Algorithms. Cambridge University Press, Cam-
bridge, 2003.
[61] H. Sherali and C. Smith. Improving discrete model representations via symmetry
considerations. Management Science, 47(10):1396–1407, 2001.
[62] E. Smith and C. Pantelides. A symbolic reformulation/spatial branch-and-bound
algorithm for the global optimisation of nonconvex MINLPs. Computers &
Chemical Engineering, 23:457–478, 1999.
[63] G. Szpiro. Newton and the kissing problem. Plus magazine (online), 23, January
2003.
[64] A.W. Tucker. A combinatorial equivalence of matrices. In R. Bellman and
M. Hall, editors, Proceedings of the 10th Symposium of Applied Mathematics,
pages 129–140, Providence, Rhode Island, 1960. AMS.
[65] A.W. Tucker. Solving a matrix game by linear programming. IBM Journal of
Research and Development, 4:507–517, 1960.
[66] A.W. Tucker. Combinatorial theory underlying linear programs. In L. Graves and
P. Wolfe, editors, Recent Advances in Mathematical Programming. McGraw-
Hill, New York, 1963.
[67] R. Uehara, S. Toda, and T. Nagoya. Graph isomorphism completeness for
chordal bipartite graphs and strongly chordal graphs. Discrete Applied Math-
ematics, 145:479–482, 2005.
[68] F. Vallentin. Symmetry in Semidefinite Programs. Linear Algebra and its Ap-
plications, 430:360–369, 2009.


[69] L. Wolsey. Group representation theory in integer programming. Technical Re-


port Op. Res. Center 41, MIT, 1969.
[70] W. Zhu. Unsolvability of some optimization problems. Applied Mathematics and
Computation, 174:921–926, 2006.

PART V:
Convexification and
Linearization

USING PIECEWISE LINEAR FUNCTIONS FOR
SOLVING MINLPs
BJÖRN GEIßLER∗ , ALEXANDER MARTIN∗ , ANTONIO MORSI∗ , AND
LARS SCHEWE∗

1. Introduction. In this chapter we want to demonstrate that in


certain cases general mixed integer nonlinear programs (MINLPs) can be solved by applying purely mixed integer linear techniques. The way to achieve this is to approximate the nonlinearities by piecewise linear functions. The advantage of applying mixed integer linear techniques is that these methods are nowadays very mature, that is, they are fast, robust, and able to solve problems with up to millions of
variables. In addition, these methods have the potential of finding globally
optimal solutions or at least to provide solution guarantees. On the other
hand, one tends to say at this point “If you have a hammer, everything
is a nail.”[15], because one tries to reformulate or to approximate an ac-
tual nonlinear problem until one obtains a model that is tractable by the
methods one is familiar with. Besides the fact that this is a very typical approach in mathematics, the question remains whether this is a reasonable
approach for the solution of MINLPs or whether the nature of the nonlin-
earities inherent to the problem gets lost and the solutions obtained from
the mixed integer linear problem have no meaning for the MINLP. The
purpose of this chapter is to discuss this question. We will see that the
truth lies somewhere in between and that there are problems where this is
indeed a reasonable way to go and others where it is not.
The idea to obtain a mixed integer linear program (MIP) out of a
MINLP is simple. Each nonlinear function is approximated by a piecewise
linear function. This reads for 1-dimensional functions as follows: We
subdivide the interval [a, b], where we want to approximate the nonlinear
function f (x) by introducing vertices a = x̄0 < x̄1 < . . . < x̄n = b and
determine the function value ȳi = f (x̄i ) at each of these vertices. The pairs
(x̄i−1 , ȳi−1 ) and (x̄i , ȳi ) might then be connected by lines for i = 1, . . . , n to
obtain a piecewise linear function. In higher dimensions this principle is the
same except that the lines are triangles, tetrahedra, or in general simplices.
This fact immediately shows one major problem with this approach. The
number of simplices needed increases drastically with the dimension, for
instance, even the subdivision of a cube in dimension five contains at least
25 simplices, and in dimension six already 86. Thus this approach seems
applicable for MINLPs only if the nonlinear functions contain only a few
variables. We will see such examples later on.
∗ Department of Mathematics, Friedrich-Alexander-University of Erlangen-Nuremberg, Erlangen, Germany ({bjoern.geissler, alexander.martin, antonio.morsi, lars.schewe}@math.uni-erlangen.de).


On the other hand this drawback might be overcome by modeling


high dimensional nonlinear functions by sums of one-dimensional functions
by using the logarithm and by substituting variables. For instance the
nonlinear equation f = xy with x, y ∈ R+ may be modeled by substituting
u = log(x), v = log(y) and f′ = log(f ). Now we can rewrite the nonlinear equation in two variables as the linear equation f′ = u + v, consisting of sums of only one-dimensional functions. Functions in this form are called
separable and are discussed in Section 2. However, we already see at this
stage that these transformations might cause numerical difficulties due to
the application of the log.
If it is not advisable to apply these transformations or if the nonlinear
function is not separable we have to approximate the nonlinear functions
as they are by piecewise linear functions. And in fact the literature is huge
on how to model piecewise linear functions. Starting in the fifties where it
was suggested to model piecewise linear functions by convex combinations
of the vertices various new ideas have come up until today. A paper by
Nemhauser and Vielma [23] marks the current end. All these methods will
be discussed in Section 3. Interesting to note is that from a theoretical
point of view some of the methods are clearly superior to others, but this
fact cannot yet be confirmed by numerical results as we will see in Section 6.
What we have left out so far is a discussion about the introduction of
the vertices in order to get a good piecewise linear approximation. On one
side, the approach has the potential to approximate the nonlinear functions
arbitrarily close by introducing enough vertices. On the other hand, the
introduction of additional vertices implies the introduction of additional
binary variables or combinatorial conditions, and thus one must be careful
in order not to end up in intractable models. Thus, an interesting side
problem occurs how to introduce not too many of such vertices keeping
simultaneously the approximation error under control. This topic is ad-
dressed in Section 4.
Before starting the discussions in detail let us introduce some moti-
vating practical examples, where piecewise linear functions seem to be a
reasonable way to go. As pointed out above, the piecewise linear approach
for the solution of MINLPs seems to be appropriate when the involved non-
linear functions are of low dimension. This is true for example for any kind
of network problems with nonlinearities occurring on the edges such as the
design and management of energy networks. For instance, in gas network
optimization, the transport of gas is modeled by a system of partial differ-
ential equations. After discretization of the PDEs a system of differential
algebraic equations is left, where the nonlinearities depend on the ingoing
pressure, the outgoing pressure and the flow of the gas in each pipe. The
PDE or DAE systems, respectively, model the pressure loss of the gas in
the pipes. Similar, the fuel gas consumptions of the compressors, which are
used to increase the pressure of the gas again, depend also on these three


types of variables, see Section 6 for details. The same type of nonlinearities
occur when one wants to model water or power networks.
Despite the facts that piecewise linear functions might be helpful for
the solution of MINLPs and that they are of interest of their own, they
also directly show up in practical applications. One such example is the
optimization of the social welfare in power spot markets. One of the prod-
ucts that is traded for instance at the European Energy Exchange (EEX)
or the Nord Pool Spot (NPS) are hourly bid curves. For each trading hour of
the day the customers give their hourly bids by a set of points of the form
power per price. These set of points are linearly interpolated resulting in a
piecewise linear function. All other conditions in the welfare optimization
problem are of linear or combinatorial nature resulting in a huge mixed
integer linear program containing piecewise linear functions, see [16] for
details.

2. Linearizing 1D- and separable functions. We start our dis-


cussion with the case of univariate functions. The methods can directly be
adapted to the case of separable functions. Additionally, most methods for
higher dimensional functions are extensions of the one-dimensional case, so
the understanding of this easier case is helpful for the study of the more
complicated methods. However, we have already seen in the introduction
that also one-dimensional piecewise linear functions occur in applications.
We will assume in this section that we are already dealing with a piecewise
linear function, for methods how to discretize given nonlinear functions we
refer to Section 4.
The case of separable functions can be reduced to the case of one-
dimensional functions. A function f : R^d → R is called separable if it can be written as a sum of one-dimensional functions, i.e., there exist f1 , . . . , fd : R → R such that f (x1 , . . . , xd ) = Σ_{i=1}^{d} fi (xi ). However, one should note that in many cases where a function is given analytically it can be transformed into a composition of separable functions. For instance a monomial function f (x) = ∏_{i=1}^{d} x_i^{αi} can be transformed into g(x) = Σ_{i=1}^{d} αi log xi so that f = exp ◦ g.
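A quick numerical sanity check of this transformation (plain Python; the values are chosen arbitrarily, and the transformation is valid only for xi > 0, which is where the numerical caveats mentioned in the introduction enter):

    # f(x) = prod x_i^{alpha_i} equals exp(g(x)) with g(x) = sum alpha_i log x_i.
    import math

    alpha = [2.0, 0.5, 1.0]
    x = [1.3, 4.0, 0.7]
    f = math.prod(xi ** ai for xi, ai in zip(x, alpha))
    g = sum(ai * math.log(xi) for xi, ai in zip(x, alpha))
    print(f, math.exp(g))     # identical up to rounding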
For the rest of this section we will study the following situation (see
Fig. 1): We are given a continuous piecewise linear function φ : R → R
that is encoded by n + 1 vertices xi (i ∈ {0, . . . , n}) and the respective
function values y i = φ(xi ). Thus, the function consists of n line-segments.
The aim of this section is to formulate a mixed integer linear model of the
relation y = φ(x) for a point x ∈ [x0 , xn ]. For this we introduce auxiliary
variables to model the piecewise linear geometry. However, in all models
it is necessary to enforce some combinatorial constraints on these auxiliary
variables. For this we use additional binary variables, these will be called
z (with respective subscripts) throughout the chapter.


Fig. 1. A piecewise linear function.

Fig. 2. Convex-combination method.

We start with the convex-combination method [7] (see Fig. 2). It uses
the following observation: When φ is a piecewise linear function, we can
compute the function value at point x if we can express x as the con-
vex combination of the neighboring nodes. We know from Carathéodory’s
theorem that we only need these two neighboring points. This condition
can be expressed using n binary variables z1 , . . . , zn . Thus, we use the
following model
    x = Σ_{i=0}^{n} λi x^i ,      Σ_{i=0}^{n} λi = 1,      λ ≥ 0                  (2.1)

    y = Σ_{i=0}^{n} λi y^i ,                                                      (2.2)

    λ0 ≤ z1 ,
    λi ≤ zi + zi+1   ∀i ∈ {1, . . . , n − 1},
    λn ≤ zn ,                                                                     (2.3)
    Σ_{i=1}^{n} zi ≤ 1 .
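For concreteness, the following Python sketch (illustration only, not solver code; the helper cc_encoding is ours) computes the (λ, z) values that model (2.1)-(2.3) is intended to take for a given point x, which is handy when debugging a formulation:

    # Intended (lambda, z) values of the convex-combination model.
    import bisect

    def cc_encoding(xs, ys, x):
        n = len(xs) - 1
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)  # active segment
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        lam = [0.0] * (n + 1)
        lam[i], lam[i + 1] = 1.0 - t, t     # convex combination of x^i, x^{i+1}
        z = [0] * n
        z[i] = 1                            # z_i selects the active segment
        y = sum(l * yv for l, yv in zip(lam, ys))
        return lam, z, y

    xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]   # PWL approx. of x^2
    print(cc_encoding(xs, ys, 1.5))         # y = 2.5 on the middle segment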


Fig. 3. Incremental method.

Another method to model a piecewise linear function is the incremental


method [21] (see Fig. 3). Here we express x lying in interval i as x^{i−1} + δ. The function value can then be expressed as y = y^{i−1} + ((y^i − y^{i−1})/(x^i − x^{i−1})) δ. To
use this in a formulation we use variables δi for each interval i such that
0 ≤ δi ≤ xi − xi−1 . We then use binary variables zi to force the so-called
“filling condition”, i.e., the condition δi > 0 implies that δi−1 is at its upper
bound.
We obtain the following model
    x = x^0 + Σ_{i=1}^{n} δi                                            (2.4)

    y = y^0 + Σ_{i=1}^{n} ((y^i − y^{i−1})/(x^i − x^{i−1})) δi          (2.5)

    (x^i − x^{i−1}) zi ≤ δi          ∀i ∈ {1, . . . , n − 1},
    δ_{i+1} ≤ (x^{i+1} − x^i) zi     ∀i ∈ {1, . . . , n − 1}.           (2.6)
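An analogous sketch for the incremental model (again illustrative Python rather than solver code): the intended solution fills all earlier intervals completely.

    # Intended (delta, z) values of the incremental model (2.4)-(2.6).
    def incremental_encoding(xs, ys, x):
        n = len(xs) - 1
        delta = [min(max(x - xs[i], 0.0), xs[i + 1] - xs[i]) for i in range(n)]
        z = [1 if x >= xs[i + 1] else 0 for i in range(n - 1)]  # interval full?
        slope = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n)]
        y = ys[0] + sum(s * d for s, d in zip(slope, delta))
        return delta, z, y

    xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]
    print(incremental_encoding(xs, ys, 1.5))   # delta = [1.0, 0.5, 0.0], y = 2.5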

The quality of these two schemes was discussed by Padberg [24]. He


studied the case where the objective function of a linear program is a piece-
wise linear function. In that case the formulation of the incremental method
always yields an integral solution of the LP-relaxation whereas this is not
the case for the convex combination method. In any case the polyhedron
described by the incremental method is properly contained in the one de-
scribed by the convex combination method. These are not the only methods
that were proposed to model piecewise linear functions: In particular we
want to mention the so-called multiple choice method [1].
The methods described so far are generic and can be readily incorporated
in standard mixed integer linear models. However, the number of newly in-
troduced binary variables often slows down the solver. In addition, branch-
ing on the binary variables tends to result in deep and unbalanced branch-
and-bound trees. This problem was noticed quite early in the development



Fig. 4. Branching decision for SOS-2 constraints (the two branches Σ_{i=0}^{3} λi = 1 and Σ_{i=3}^{5} λi = 1 over breakpoints x0 , . . . , x5 ).

Fig. 5. Gray code encoding for SOS-2 constraints (c(k) = 000, 001, 011, 010, 110 for the five intervals between x0 , . . . , x5 ).

of these methods. The proposed remedy was to modify the branching


strategy of the solver (see e.g. [3, 4]). Thus, to model the combinatorial
constraints of the respective models we no longer use the auxiliary binary
variables, but we incorporate the constraints directly in the branching. We
describe these modifications for the case of the convex combination method;
in the case of the incremental formulation a similar approach can be made
(cf. [13]).
Looking at the convex combination method we see that we need to
enforce condition (2.3) on the components of λ: At most two components
of λ may be non-zero, and if two components are non-zero, these have to
be adjacent. This condition is called a Special Ordered Set Condition of
Type 2 (SOS-2). Thus, if the LP-relaxation yields some λ that violates
these conditions one can define new branches in the branch-and-bound-

tree by generating a node with the additional constraint that Σ_{i=0}^{k} λi = 1 and another one with the additional constraint Σ_{i=k}^{n} λi = 1 (see Fig. 4). One
can summarize this by saying that we take the decision in which of the n
intervals the point x should lie. As mentioned earlier, the main drawback of
the classical formulations with binary variables is the unbalanced branch-
and-bound tree that we obtain. On the other hand, the interaction between
the binary variables from the formulation of the piecewise linear function
with the rest of the model may yield speed ups compared to an approach
with modified branching.
However, it has been pointed out recently by Vielma and Nemhauser
in [23] that one can formulate conditions like the SOS-2 condition using
dramatically fewer binary variables. We discuss their formulation for the
SOS-2 case, i.e., the convex-combination method. We have already seen
that with n binary variables we can encode 2^n different cases, so we should be able to model an SOS-2 condition with only ⌈log2 n⌉ binary variables.


And that is indeed the case.
The idea is to use a Gray code to encode the definition intervals of the
piecewise linear function. We take an injective function c : {1, . . . , n} →
{0, 1}^⌈log2 n⌉ with the additional property that for any number k the vec-
tors c(k) and c(k + 1) only differ in one component. For an example see
Fig. 5. Using binary variables z1 , . . . , z⌈log2 n⌉ , we can then enforce the
SOS-2 condition with the following constraints (note that we still need the
constraints (2.1)):

    ∀k ∈ {1, . . . , n}:   Σ_{i=0}^{k−2} λi + Σ_{i=k+1}^{n} λi ≤ Σ_{l | c(k)_l = 1} (1 − zl ) + Σ_{l | c(k)_l = 0} zl .      (2.7)
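A standard reflected binary Gray code provides such a function c; the following Python sketch (illustration only) constructs one and checks the one-component-change property:

    # Reflected binary Gray code: an injective map into {0,1}^ceil(log2 n)
    # in which consecutive codewords differ in exactly one component.
    import math

    def gray(n):
        bits = max(1, math.ceil(math.log2(n)))
        return [tuple((k ^ (k >> 1)) >> l & 1 for l in range(bits))
                for k in range(n)]

    codes = gray(5)
    print(codes)
    print(all(sum(a != b for a, b in zip(c1, c2)) == 1
              for c1, c2 in zip(codes, codes[1:])))      # True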

This approach can even further be generalized, see Nemhauser and


Vielma [23] and Vielma et al. [30]. The model we present here can be
viewed as the one-dimensional variant of the logarithmic formulation of
the aggregated convex combination method (again see [30]). The so-called
disaggregated formulation will be presented in the next section in the con-
text of nonseparable functions.
In general, the methods here can be adapted to special cases. For
instance, sometimes one may drop the SOS-2 constraints for the convex
combination method. This is the case for convex φ, if it is the objective
function or it only appears in a constraint of the form φ(x) ≤ b. In addition,
the mentioned methods can also be extended to the case of discontinuous
φ, see [8, 31, 30].

3. Piecewise linearization of nonseparable functions. In this


section we present adequate mixed integer modeling techniques for higher
dimensional functions. Even if the methods from the previous section are
not restricted to univariate functions, as we have seen with separable func-
tions, many functions do not fall into this category. Commonly in that case modeling tricks can be applied to reduce the degree of a nonlinear term, but these tricks often cause numerically unstable programs.
The univariate MIP approaches from the previous section provide a
basis for the following models of multivariate piecewise linear functions. We
review the presented approaches with a view to piecewise linear functions
of arbitrary dimension.
For the remainder of this section let φ : Rd → R be an arbitrary con-
tinuous piecewise linear function. Thus the function φ does not necessarily
have to be separable and of course it does not need to be convex or concave. First, we generalize the definition of piecewise linear
functions to higher dimensions.


Definition 3.1. Let D ⊂ Rd be a compact set. A continuous function


φ : D → R is called piecewise linear if it can be written in the form
φ(x) = φS (x) for x ∈ S ∀S ∈ S, (3.1)

with affine functions φS for a finite set of simplices S that partitions D.


For the sake of simplicity we restrict ourselves to functions over com-
pact domains, though some techniques can be applied to unbounded do-
mains, too. Furthermore our MIP techniques deal with continuous func-
tions only. More information on a special class of discontinuous functions
so-called lower semi-continuous functions can be found in [31]. In the lit-
erature it is common to be not as restrictive and require the domain S of
each piece to be simply a polytope. However, since both definitions are
equivalent and some of our approaches rely on simplicial pieces we go for
the above definition.
According to Definition 3.1, we denote by S the set of simplices forming
D. The cardinality of this set is n := |S|. The set of vertices of a single d-simplex S is denoted by V(S) := {x_0^S , . . . , x_d^S }. Furthermore V(S) = {x_1 , . . . , x_m } := ∪_{S∈S} V(S) is the entire set of vertices of S. As in the
previous section our aim is to formulate a mixed integer linear model in
which y = φ(x) for x ∈ D holds. As in the univariate case, auxiliary binary variables will uniformly be denoted by z.
We start by adapting the convex-combination method to d-dimensional
functions. The key idea here is that every point x ∈ S ⊆ D can be expressed
as convex combination of vertices of the simplex S. In addition, with binary variables z1 , . . . , zn used to decide which simplex contains the point x, we obtain the following model:
    x = Σ_{j=1}^{m} λj xj ,      Σ_{j=1}^{m} λj = 1,      λ ≥ 0,        (3.2)

    y = Σ_{j=1}^{m} λj yj ,                                             (3.3)

    λj ≤ Σ_{i | xj ∈ V(Si )} zi      for j = 1, . . . , m,              (3.4)

    Σ_{i=1}^{n} zi ≤ 1.                                                 (3.5)

A slight modification of this model is known as disaggregated convex


combination method [22]. Its name results from explicitly introducing λ-
variables for each vertex of each simplex. Originally, this approach is designed


Fig. 6. A point x = x0 + δ1 (x1 − x0 ) + δ2 (x2 − x0 ) inside a triangle.

for semi-continuous functions, but on some continuous model instances the


drawback of having many more variables is offset by the fact that this formu-
lation, unlike the aggregated variant, is locally ideal. The term of locally
ideal MIP formulations for piecewise linear functions has been established
in [24, 25] and states that the polytope of the corresponding LP relaxation,
without further constraints, is integral.
Adapting the incremental method to higher dimensional functions is
not as simple as the convex combination method as we will notice. At first
we see that any point x^S inside a simplex S ∈ S can be expressed either as convex combination of its vertices or equivalently as x^S = x_0^S + Σ_{j=1}^{d} (x_j^S − x_0^S) δ_j^S with Σ_{j=1}^{d} δ_j^S ≤ 1 and nonnegative δ_j^S ≥ 0 for j = 1, . . . , d (cf. Fig. 6).
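For a given simplex, these δ-coordinates of a point are obtained by solving a small linear system; a minimal numpy sketch (illustration only; the helper simplex_delta is ours):

    # delta-coordinates of a point in a d-simplex:
    # solve x = x0 + sum_j (xj - x0) delta_j.
    import numpy as np

    def simplex_delta(vertices, x):
        """vertices: (d+1) x d array (rows x0, ..., xd); returns delta."""
        x0, rest = vertices[0], vertices[1:]
        A = (rest - x0).T                    # columns are the rays xj - x0
        delta = np.linalg.solve(A, x - x0)
        assert delta.min() >= -1e-9 and delta.sum() <= 1 + 1e-9   # x lies in S
        return delta

    tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(simplex_delta(tri, np.array([0.25, 0.5])))   # [0.25 0.5], cf. Fig. 6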
When we look back to the incremental method in dimension one, we
notice that the second main argument of this approach is that an ordering
of the simplices holds in which the last vertex of any simplex is equal to
the first vertex of the next one. A natural generalization of this approach
to dimension d ≥ 2 is therefore possible if an ordering of simplices with the
following properties is available.
(O1) The simplices in S = {S1 , . . . , Sn } are ordered in such a way that
Si ∩ Si+1 ≠ ∅ for i = 1, . . . , n − 1 holds and
(O2) for each simplex Si its vertices x_0^{Si} , . . . , x_d^{Si} can be labeled such that x_d^{Si} = x_0^{Si+1} holds for i = 1, . . . , n − 1.
An ordering of a two-dimensional example triangulation is illustrated
in Fig. 7. In the univariate case such an ordering is automatically given
since the set of simplices S simply consists of a sequence of line segments.
Based on properties (O1) and (O2) the first vertex x_0^{Si} of simplex Si is obtained by x_0^{Si} = x_0^{S1} + Σ_{k=1}^{i−1} (x_d^{Sk} − x_0^{Sk}).
Bringing this together with the representation of a point inside a single
simplex as sum of a vertex and the rays spanning the simplex from that
vertex, we get the generalized incremental model


Fig. 7. Triangle ordering on a rectangular domain.

    x = x_0^{S1} + Σ_{i=1}^{n} Σ_{j=1}^{d} (x_j^{Si} − x_0^{Si}) δ_j^{Si} ,      (3.6)

    y = y_0^{S1} + Σ_{i=1}^{n} Σ_{j=1}^{d} (y_j^{Si} − y_0^{Si}) δ_j^{Si} ,      (3.7)

    Σ_{j=1}^{d} δ_j^{Si} ≤ 1      for i = 1, . . . , n,                          (3.8)

    δ_j^{Si} ≥ 0      for i = 1, . . . , n and j = 1, . . . , d.                 (3.9)

In addition the δ-variables have to satisfy a “generalized filling condi-


tion”, that is, if for any simplex Si a variable δ_j^{Si} is positive then δ_d^{Si−1} = 1 must hold. Obviously we can conclude that a variable δ_j^{Si} can only be positive if for all previous simplices k = 1, . . . , i − 1 the variables δ_d^{Sk} are equal
to one. To enforce this condition we introduce auxiliary binary variables z
and use the constraints
    Σ_{j=1}^{d} δ_j^{Si+1} ≤ zi      for i = 1, . . . , n − 1,        (3.10)

    zi ≤ δ_d^{Si}      for i = 1, . . . , n − 1.                      (3.11)


As in the univariate case, the integrality of the polytope described


by inequalities (3.8)-(3.11) together with the nonnegativity constraints zi ≥
0 for i = 1, . . . , n − 1, is guaranteed as shown by Wilson [32].
In the following we will characterize for which simplicial sets an order-
ing with properties (O1) and (O2) exists as well as how this ordering can be
computed efficiently. In case of a univariate function, ordering the simplices
according to (O1) and (O2) is trivially done by labeling the line segment
connecting (xi−1 , y i−1 ) and (xi , yi ) with Si for i = 2, . . . , n. For bivariate
piecewise linear functions the simplices are triangles and it is somewhat
more involved to determine an appropriate labeling scheme. As early as
in 1979 Todd introduced an ordering of the simplices for triangulations of
type K1 and J1 [29], before Wilson [32] showed that the adjacency graph
of any triangulation of a domain that is a topological disc in R2 is hamil-
tonian. Here, the adjacency graph is constructed, by introducing a vertex
for each simplex of the triangulation and an edge between any two vertices,
whose corresponding simplices have at least one common point. Eventu-
ally, Bartholdi and Goldsman gave an algorithm with running time O(n2 )
to calculate a hamiltonian path through the adjacency graph of any facet-
adjacent triangulation in R2 [2]. We call a triangulation with d-simplices
facet-adjacent, if for each nonempty subset S′ of the triangulation there exist d-simplices S″ ∈ S′ and S ∈ S \ S′, such that |V(S″) ∩ V(S)| = d. In other words, a trian-
gulation is facet-adjacent, if and only if the graph consisting of the vertices
and edges of the triangulation is d-connected. For example, every triangu-
lation of a convex set is facet-adjacent. For d ≥ 3 we suggest Algorithm
1 to compute an ordering of the simplices and vertices of a facet-adjacent
triangulation, satisfying (O1) and (O2).
Theorem 3.1. Algorithm 1 determines in O(n² + nd²) an ordering
of the simplices and vertices satisfying (O1) and (O2).
Proof. To prove the correctness of Algorithm 1, we first show that before and after each iteration of the while loop, the elements of the triangulation, formed by the simplices in S′, are ordered according to (O1) and
(O2) such that the ordering induces a hamiltonian cycle on the adjacency
graph of S. Second, we show that the cardinality of S′ is increased by one
during each iteration of the loop.
In the first step of Algorithm 1, we have to choose two d-simplices
S1 , S2 ∈ S with |V(S1 ) ∩ V(S2 )| = d. Since the simplices in S form a facet-
adjacent triangulation, this is always possible and we can label the vertices
in V(S1) ∪ V(S2) such that x_d^{S1} = x_0^{S2} and x_0^{S1} = x_d^{S2} holds. In the next step S1 and S2 are added to S′, which is obviously ordered according to
(O1) and (O2).
Since S forms a facet-adjacent triangulation, there must be at least one simplex Si ∈ S′ which has a common facet (and thus d common vertices) with some simplex S ∈ S \ S′ at the beginning of each iteration of the loop. At the end of the while loop S is added to S′. Thus, Algorithm 1 is finite.


Data: A set S of d-simplices (d ≥ 3), forming a facet-adjacent triangulation
Result: An ordering of the simplices S ∈ S and vertices v ∈ V(S) for all S ∈ S, satisfying (O1) and (O2)

Choose two d-simplices S1 , S2 ∈ S with |V(S1) ∩ V(S2)| = d;
Label the vertices in V(S1) ∪ V(S2) such that x_d^{S1} = x_0^{S2} and x_0^{S1} = x_d^{S2} holds;
S′ = (S1 , S2);
while |S′| ≠ n do
    Choose simplices S ∈ S \ S′ and Si ∈ S′ with |V(S) ∩ V(Si)| = d;
    Choose some vertex w ∈ (V(S) ∩ V(Si)) \ {x_0^{Si}, x_d^{Si}};
    if x_0^{Si} ∈ V(S) then
        Set x_0^S = x_0^{Si} and x_d^S = w;
        Change the vertex labeling of Si by setting x_0^{Si} = w;
        Set S′ = (. . . , Si−1 , S, Si , Si+1 , . . .);
    else
        Set x_0^S = w and x_d^S = x_d^{Si};
        Change the vertex labeling of Si by setting x_d^{Si} = w;
        Set S′ = (. . . , Si−1 , Si , S, Si+1 , . . .);
    end
end
return S′

Algorithm 1: Ordering d-simplices.

To conclude that the simplices in $\mathcal{S}'$ are ordered appropriately at the end of each iteration, we have to see that there always exists some vertex $w \in (V(S) \cap V(S_i)) \setminus \{x_0^{S_i}, x_d^{S_i}\}$, because S and $S_i$ have $d \ge 3$ common vertices. Further, there is only one vertex in $V(S) \setminus V(S_i)$ and in $V(S_i) \setminus V(S)$, respectively. Therefore we either have $x_0^{S_i} \in V(S) \cap V(S_i)$ or $x_d^{S_i} \in V(S) \cap V(S_i)$ (or both).

In the first case, we insert S between $S_i$ and its predecessor in $\mathcal{S}'$ by setting $x_0^S = x_0^{S_i}$ and $x_d^S = w$, relabeling $x_0^{S_i} = w$, and leaving all other vertices untouched. In the second case we insert S between $S_i$ and its successor analogously. Thus, in both cases $\mathcal{S}'$ is ordered according to (O1) and (O2) at the end of each iteration.


Therefore, Algorithm 1 terminates after a finite number of steps with an ordering of the simplices and vertices satisfying properties (O1) and (O2).

Concerning the running time, it is easy to see that the while loop is executed n times, and choosing a pair of simplices $S \in \mathcal{S} \setminus \mathcal{S}'$ and $S_i \in \mathcal{S}'$ with a common facet can be done in O(n) steps. Updating the vertex labels depends only on the dimension d and can be done in time $O(d^2)$. All other operations can be done in constant time if appropriate data structures are used. Thus, we conclude that the running time of Algorithm 1 is $O(n^2 + nd^2)$.
We remark that a consequence of Theorem 3.1 is that every graph that is the adjacency graph of a facet-adjacent triangulation with d-simplices ($d \ge 3$) has a Hamiltonian cycle.
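To make the procedure concrete, the following Python sketch (our own illustration, not part of the original presentation; all names are hypothetical) implements Algorithm 1 on simplices given as sets of vertex ids. Only the two distinguished labels $x_0$ and $x_d$ that drive the insertion rule are tracked; a full labeling satisfying (O1) and (O2) would also order the intermediate vertices.

    # A minimal sketch of Algorithm 1: we maintain the ordered list S' and,
    # per simplex, the labels x_0 ("x0") and x_d ("xd").
    def order_simplices(simplices, d):
        S = [set(s) for s in simplices]
        s1, s2 = next((a, b) for a in range(len(S)) for b in range(len(S))
                      if a != b and len(S[a] & S[b]) == d)
        lab = {s1: {}, s2: {}}
        lab[s1]["xd"] = lab[s2]["x0"] = next(iter(S[s1] & S[s2]))
        lab[s1]["x0"] = next(iter(S[s1] - S[s2]))
        lab[s2]["xd"] = next(iter(S[s2] - S[s1]))
        order = [s1, s2]
        while len(order) < len(S):
            s, i = next((s, i) for s in range(len(S)) if s not in order
                        for i in order if len(S[s] & S[i]) == d)
            w = next(iter((S[s] & S[i]) - {lab[i]["x0"], lab[i]["xd"]}))
            if lab[i]["x0"] in S[s]:
                lab[s] = {"x0": lab[i]["x0"], "xd": w}
                lab[i]["x0"] = w
                order.insert(order.index(i), s)      # insert S before S_i
            else:
                lab[s] = {"x0": w, "xd": lab[i]["xd"]}
                lab[i]["xd"] = w
                order.insert(order.index(i) + 1, s)  # insert S after S_i
        return order, lab

    # Two tetrahedra glued along a facet plus a third one (d = 3):
    print(order_simplices([{0, 1, 2, 3}, {1, 2, 3, 4}, {0, 1, 2, 4}], d=3))

The quadratic scan for a facet-adjacent pair mirrors the O(n) neighbor search per iteration in the proof of Theorem 3.1.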
The third MIP model for piecewise linear multivariate functions we present is the logarithmic disaggregated convex combination model of Vielma and Nemhauser [23]. Their approach is based on the disaggregated convex combination model but uses just a logarithmic number of auxiliary binary variables to enforce that a single simplex is active. The idea for reducing the number of binary variables stems from the fact that $\lceil \log_2(n) \rceil$ binary variables are sufficient to encode n different states, each choosing one simplex out of n. Therefore an arbitrary injective function $c : \mathcal{S} \to \{0,1\}^{\lceil \log_2 n \rceil}$ can be used.

\[
x = \sum_{i=1}^{n} \sum_{j=0}^{d} \lambda_j^{S_i} x_j^{S_i}, \qquad \sum_{i=1}^{n} \sum_{j=0}^{d} \lambda_j^{S_i} = 1, \qquad \lambda \ge 0, \tag{3.12}
\]
\[
y = \sum_{i=1}^{n} \sum_{j=0}^{d} \lambda_j^{S_i} y_j^{S_i}, \tag{3.13}
\]
\[
\sum_{i=1}^{n} \sum_{j=0}^{d} c(S_i)_l \, \lambda_j^{S_i} \le z_l \quad \text{for } l = 1, \dots, \lceil \log_2 n \rceil, \tag{3.14}
\]
\[
\sum_{i=1}^{n} \sum_{j=0}^{d} \big(1 - c(S_i)_l\big) \, \lambda_j^{S_i} \le 1 - z_l \quad \text{for } l = 1, \dots, \lceil \log_2 n \rceil, \tag{3.15}
\]
\[
z_l \in \{0, 1\} \quad \text{for } l = 1, \dots, \lceil \log_2 n \rceil. \tag{3.16}
\]

Constraints (3.12) and (3.13) represent the point (x, y) as a convex combination of the given simplices and remain unchanged from the disaggregated convex combination model.

Fig. 8. Example of a binary encoding and the branching induced by the leftmost and rightmost bit using the logarithmic disaggregated convex combination model.

This model is another locally ideal formulation. An illustration of an example binary encoding and the branching on the leftmost and rightmost bit is depicted in Fig. 8. In each case, white triangles correspond to the down branch and the up branch is shown in gray. For a generalization of the aggregated logarithmic convex combination model from the univariate case to higher dimensions, we refer to Vielma and Nemhauser [23]. We skip this model because it is not as flexible as the others: it requires the set of simplices $\mathcal{S}$ to be topologically equivalent to the J1 triangulation, also known as the "Union Jack" triangulation, and thus, unlike the other formulations, cannot be applied to arbitrary triangulations.
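As a small illustration of the encoding (our own sketch; encode is a hypothetical helper), one valid injective function c is simply the binary expansion of the simplex index. Fixing the z-variables then leaves at most one simplex with nonzero λ-variables, since (3.14)–(3.15) force $\lambda_j^{S_i} = 0$ whenever $c(S_i)$ differs from z in some bit:

    from math import ceil, log2

    def encode(n):
        # c : S -> {0,1}^ceil(log2 n) as the binary expansion of the index.
        L = max(1, ceil(log2(n)))
        return {i: [(i >> l) & 1 for l in range(L)] for i in range(n)}, L

    c, L = encode(10)                 # 10 simplices need 4 binary variables
    z = [1, 0, 1, 0]                  # one fixed assignment of z_1, ..., z_4
    print([i for i in range(10) if c[i] == z])   # -> [5]: the active simplex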
4. Controlling the approximation error. In the preceding sections we have presented several methods to model piecewise linear functions in terms of mixed integer linear constraints. Although we have seen that there are applications in which piecewise linear functions arise naturally, it is also possible to apply the above-mentioned techniques to piecewise linear approximations of nonlinear expressions. By this means, we are able to formulate MIP approximations of (mixed integer) nonlinear programs.

In order to make a statement about the quality of such an approximation, we must be able to measure the linearization errors. Doing this a posteriori is quite simple: We solve some MIP approximation to optimality and evaluate the nonlinear expressions at the point where the optimum is attained. Unfortunately, this is not enough whenever we want the solution to satisfy some predefined error bounds. In this case we must be able to estimate the linearization errors a priori, in order to avoid solving MIPs of increasing complexity (due to finer linearization grids) again and again.

In this section, we introduce a general methodology to construct a-priori error estimators for piecewise linear approximations.

As we have seen, the complexity of the resulting MIPs strongly depends on the number of linear pieces of the approximations, and if we overestimate the linearization errors, we will need more linear pieces than necessary to


ensure that the error bounds are satisfied. Hence, we are interested in error estimators that are as strong as possible.
As a starting point for our considerations, we assume the following situation: Let $D \subseteq \mathbb{R}^d$ and let $f : D \to \mathbb{R}$ be some continuous function. Further, let $\phi : P \to \mathbb{R}$ be a piecewise linear approximation of f over some convex polytope $P \subseteq D$. We assume that $\phi$ interpolates f on the vertices of some triangulation of P. Thus, for a triangulation with simplex set $\mathcal{S}$, we can define affine functions $\phi_i : x \mapsto a_i^T x + b_i$ for each simplex $S_i \in \mathcal{S}$ with $\phi_i(x) = f(x)$ for every vertex x of $S_i$, such that we can write $\phi(x) = \phi_i(x)$ for $x \in S_i$.

If we are able to control the linearization error within a simplex, we are obviously able to control the overall linearization error, since we can simply add a point of maximum error to the vertex set of our triangulation and retriangulate the affected region locally. Repeating this process for all not-yet-checked simplices leads to a piecewise linearization which satisfies the error bound everywhere.

For this reason, we restrict our further considerations to the situation where $\phi(x) = a^T x + b$ is the linear interpolation of f over a simplex S with vertices $x_0, \dots, x_d$, i.e., $\phi(x_i) = f(x_i)$ for $i = 0, \dots, d$. We define the maximum linearization error in terms of the maximum under- and overestimation of a function (cf. Fig. 9):
Definition 4.1. We call $\varepsilon_u(f, S) := \max_{x \in S} f(x) - \phi(x)$ the maximum underestimation, $\varepsilon_o(f, S) := \max_{x \in S} \phi(x) - f(x)$ the maximum overestimation, and $\varepsilon(f, S) := \max\{\varepsilon_u(f, S), \varepsilon_o(f, S)\}$ the maximum linearization error of f by $\phi$ over S.

Fig. 9. The maximum underestimation $\varepsilon_u$ and the maximum overestimation $\varepsilon_o$ of f by $\phi$ over S.

Since the maximum overestimation $\varepsilon_o(f, S)$ of f over S with respect to $\phi$ equals the maximum underestimation $\varepsilon_u(-f, S)$ of $-f$ over S with respect to $-\phi$, we restrict ourselves to the maximum overestimation for the remainder of this section. In cases where f is convex (or concave) over S, the maximum overestimation can be calculated efficiently as a consequence of the following proposition:

Proposition 4.1. If f is convex over S, then $\varepsilon_o(f, S)$ can be obtained by solving a convex optimization problem. If f is concave over S, then $\varepsilon_o(f, S) = 0$ holds.

Proof. If f is convex, we have $\varepsilon_o(f, S) = \max_{x \in S} \phi(x) - f(x)$. Since f is convex, $-f$ is concave, and thus $\phi(x) - f(x)$ is concave as the sum of concave functions. Therefore, $f(x) - \phi(x)$ is convex and we can calculate $\varepsilon_o(f, S) = -\min_{x \in S} f(x) - \phi(x)$ by solving a convex minimization problem over a simplicial domain.
Next, we consider the case where f is concave. Since $\phi$ is an affine function, for every $x \in S$ the point $(x, \phi(x))$ is a convex combination of the vertices $x_0, \dots, x_d$ of S and the associated values of $\phi$, i.e.,

\[
x = \sum_{i=0}^{d} \lambda_i x_i, \qquad \phi(x) = \sum_{i=0}^{d} \lambda_i \phi(x_i), \qquad \sum_{i=0}^{d} \lambda_i = 1, \quad \lambda_i \ge 0, \; i = 0, \dots, d.
\]

Since f is concave, we can apply Jensen's inequality [12] to get

\[
f(x) = f\Big( \sum_{i=0}^{d} \lambda_i x_i \Big) \ge \sum_{i=0}^{d} \lambda_i f(x_i).
\]

Since $\phi$ interpolates f in the vertices of S, we get

\[
f(x) \ge \sum_{i=0}^{d} \lambda_i \phi(x_i) = \phi(x),
\]

which shows $\varepsilon_o(f, S) = 0$.
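As an illustration of Proposition 4.1 (a minimal sketch of ours, assuming SciPy is available; all names are illustrative), the convex problem can be solved in barycentric coordinates, where $\phi(\sum_i \lambda_i x_i) = \sum_i \lambda_i f(x_i)$ because $\phi$ interpolates f at the vertices:

    # Compute eps_o(f, S) for a convex f over a simplex S by minimizing
    # f - phi in barycentric coordinates (Proposition 4.1).
    import numpy as np
    from scipy.optimize import minimize

    V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # vertices x_0, x_1, x_2
    f = lambda x: x[0] ** 2 + x[1] ** 2                  # convex on S
    fV = np.array([f(v) for v in V])                     # phi's vertex values

    def gap(lam):                                        # f(x) - phi(x)
        return f(lam @ V) - lam @ fV

    res = minimize(gap, np.full(3, 1.0 / 3), method="SLSQP",
                   bounds=[(0.0, 1.0)] * 3,
                   constraints=[{"type": "eq",
                                 "fun": lambda lam: lam.sum() - 1.0}])
    print("eps_o(f, S) =", -res.fun)   # 0.5, attained at the midpoint of
                                       # the edge between x_1 and x_2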


Thus, we only have to maximize a concave function (i.e., minimize a convex function) over a simplicial domain in order to obtain the maximum overestimation, whenever we know that the approximated function is either convex or concave over S. This can be done efficiently. Unfortunately, the situation gets more involved if f is indefinite, or if we simply do not know the definiteness of f over S. In this case, we can in general only estimate the maximum overestimation. To show how to do this, we need the following definitions:
Definition 4.2.
(i) A function

\[
\mu \in U(f, S) := \{ \xi : S \to \mathbb{R} \;:\; \xi \text{ convex},\; \xi(x) \le f(x) \;\forall x \in S \}
\]

is called a convex underestimator of f over S.
(ii) The function $\mathrm{vex}_S[f] : S \to \mathbb{R}$, defined as

\[
\mathrm{vex}_S[f](x) := \sup\{ \mu(x) : \mu \in U(f, S) \},
\]

is called the convex envelope of f over S.


Clearly, the convex envelope is the tightest convex underestimator of
f over S, since the pointwise supremum of convex functions is convex [26].
To derive upper bounds for the maximum overestimation of f over
S, we can use convex underestimating functions as shown in the following
proposition.
Proposition 4.2. Let $\mu$ be a convex underestimating function of f over S. Then $\varepsilon_o(f, S) \le \varepsilon_o(\mu, S)$ holds.
Proof. Since $\mu(x) \le f(x)$ for all $x \in S$, we get

\[
\varepsilon_o(f, S) = \max_{x \in S} \phi(x) - f(x) \le \max_{x \in S} \phi(x) - \mu(x) = \varepsilon_o(\mu, S).
\]

There are many cases in which we can derive a convex underestimator for an indefinite function. For example, in the case of a twice continuously differentiable function, we can use the α-underestimators introduced in [20] or the further developed class of convex underestimators based on piecewise α-underestimators described in [9, 10]. As we mentioned above, the complexity of the approximating mixed integer linear programs depends highly on the quality of the error estimators. Naturally, the best possible error estimators can be constructed by using the convex envelopes of the approximated function. From Theorem 4.1 below, we will see that using the envelopes, we can actually calculate the exact value of the maximum linearization error within a simplex. Before we formulate the theorem, we give two lemmas summarizing some basic properties of convex envelopes:
Lemma 4.1. Let $C \subset \mathbb{R}^d$ be a subset of the d-dimensional real space, and let $\xi, \eta : C \to \mathbb{R}$ be real-valued functions with $\eta$ affine. Then for all $x \in \mathrm{conv}(C)$,

\[
\mathrm{vex}_C[\xi](x) + \mathrm{vex}_C[\eta](x) = \mathrm{vex}_C[\xi + \eta](x).
\]

Lemma 4.2. Let C be a compact subset of $\mathbb{R}^d$ and $\xi$ a lower semi-continuous real-valued function on C. Further, let M be the set of global minimum points of $\xi$ over C and N the set of global minimum points of $\mathrm{vex}_C[\xi]$. Then

\[
\min_{x \in C} \xi(x) = \min_{x \in \mathrm{conv}(C)} \mathrm{vex}_C[\xi](x) \qquad \text{and} \qquad N = \mathrm{conv}(M).
\]

Here, the convex hull of a set S is denoted by conv(S). A proof of Lemma 4.1 and Lemma 4.2 can be found in [28]. Now we can formulate Theorem 4.1, which constitutes the main result of this section. We will see that once we have the convex envelope of f, we are able to calculate the maximum overestimation by solving a convex optimization problem.


Moreover, we can efficiently identify a point where the maximum error is attained.
Theorem 4.1. Let $M_o := \{x \in S : \phi(x) - f(x) = \varepsilon_o(f, S)\}$ be the set of global maximizers for the overestimation of f by $\phi$ over S, and let $N_o := \{x \in S : \phi(x) - \mathrm{vex}_S[f](x) = \varepsilon_o(\mathrm{vex}_S[f], S)\}$ be the set of global maximizers for the overestimation of the convex envelope of f by $\phi$ over S. Then we get $\varepsilon_o(f, S) = \varepsilon_o(\mathrm{vex}_S[f], S)$ and $N_o = \mathrm{conv}(M_o)$.

Proof. We get

\[
\begin{aligned}
\varepsilon_o(\mathrm{vex}_S[f], S) &= \max_{x \in S}\, \phi(x) - \mathrm{vex}_S[f](x) \\
&= - \min_{x \in S}\, \mathrm{vex}_S[f](x) - \phi(x) \\
&= - \min_{x \in S}\, \mathrm{vex}_S[f](x) + \mathrm{vex}_S[-\phi](x),
\end{aligned}
\]

since $-\phi$ is affine. Next, we can apply Lemma 4.1 to conclude

\[
\varepsilon_o(\mathrm{vex}_S[f], S) = - \min_{x \in S}\, \mathrm{vex}_S[f - \phi](x).
\]

By assumption, S is convex, and we can use Lemma 4.2 to get

\[
\begin{aligned}
\varepsilon_o(\mathrm{vex}_S[f], S) &= - \min_{x \in S}\, (f - \phi)(x) \\
&= - \min_{x \in S}\, f(x) - \phi(x) \\
&= \max_{x \in S}\, \phi(x) - f(x) \\
&= \varepsilon_o(f, S).
\end{aligned}
\]

The identity $N_o = \mathrm{conv}(M_o)$ again follows from Lemma 4.2, which completes the proof.
In order to calculate a point where the maximum overestimation of f by $\phi$ over S is attained, it suffices to solve the convex optimization problems

\[
\min_{x \in N_o} \{\, x_i : x_j = x_j^j,\; j < i \,\} \tag{4.1}
\]

iteratively for $i = 1, \dots, d$. Here $x^j$ is the solution of Problem (4.1) for $i = j$. The point $x^j$ lies on all supporting hyperplanes $H^k = \{x \in \mathbb{R}^d : x_k = x_k^k\}$ for $k \le j$ of $N_o$. Since the normal vectors of $H^1, \dots, H^j$ are linearly independent, the dimension of $\big( \bigcap_{k \le j} H^k \big) \cap N_o$ is at most $d - j$. Thus, $\{x^d\} = \big( \bigcap_{i=1}^{d} H^i \big) \cap N_o$ must be an extreme point of $N_o$, and we get $x^d \in M_o$. Therefore, $x^d$ is a global maximizer of $\phi(x) - f(x)$. We remark that in general it is not necessary to solve d instances of Problem (4.1): if an optimal solution $x^j$ of (4.1) for $j < d$ satisfies $\phi(x^j) - f(x^j) = \varepsilon_o(\mathrm{vex}_S[f], S)$, the point $x^j$ already is an extreme point of the convex hull of $M_o$.
Unfortunately, it is in general hard to determine the envelopes of indefinite functions, but recently there has been much progress in this field; e.g., in [11] it was shown that the envelopes of indefinite (d − 1)-convex functions can be calculated efficiently. We have to point out that although Theorem 4.1 is valid for any dimension, it is impractical to apply the techniques described in this chapter to nonlinear expressions which depend on a large number of variables, because even the number $\tau_d$ of simplices necessary to triangulate the d-cube grows rapidly with d. From [27], we know that

\[
\tau_d \ge \frac{6^{d/2} \, d!}{2\,(d+1)^{(d+1)/2}}
\]

is a valid inequality. Even though we can only efficiently deal with piecewise linear approximations of nonlinear expressions depending on just a few variables, the vast majority of nonlinearities occurring in practical applications are of this type, or they are separable such that they can be decomposed into uni-, bi- or trivariate functions.
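Evaluating this bound numerically (a quick sketch of ours) shows how fast the required number of simplices grows:

    # Smith's lower bound tau_d >= 6^{d/2} d! / (2 (d+1)^{(d+1)/2}), cf. [27].
    from math import factorial

    for d in range(2, 8):
        lb = 6 ** (d / 2) * factorial(d) / (2 * (d + 1) ** ((d + 1) / 2))
        print(d, round(lb, 1))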
In the following section, we show how to use the error estimators developed so far, together with the modeling techniques from the preceding sections, to construct a mixed integer linear programming relaxation for any given mixed integer nonlinear program.

5. MIP relaxations of mixed integer nonlinear programs. In this section we show how to construct a mixed integer linear programming relaxation for a mixed integer nonlinear program. To this end we slightly modify the techniques introduced in Sections 2 and 3 by using the error estimators developed in Section 4. In the remainder of this chapter we deal with a mixed integer nonlinear program of the form

\[
\begin{aligned}
\min\;& c^T x \\
\text{s.t.}\;& g_i(x) = b_i && \text{for } i = 1, \dots, k_1, \\
& h_i(x) \ge b_i && \text{for } i = 1, \dots, k_2, \\
& l \le x \le u, \\
& x \in \mathbb{R}^{d-p} \times \mathbb{Z}^p,
\end{aligned} \tag{5.1}
\]

where $g_i : \mathbb{R}^d \to \mathbb{R}$ for $i = 1, \dots, k_1$ and $h_i : \mathbb{R}^d \to \mathbb{R}$ for $i = 1, \dots, k_2$ are real-valued functions, and $l, u \in \mathbb{R}^d$ are the vectors of lower and upper bounds on the variables.
Our goal is to construct piecewise polyhedral envelopes, as shown in Fig. 10, for each $g_i$ and $h_i$, using modified versions of the modeling techniques for piecewise linear functions introduced in Sections 2 and 3. For $f = g_i$ with $i \in \{1, \dots, k_1\}$, the idea is as simple as introducing an additional real-valued variable e, which is added to the approximated function value and which is bounded from below and above by $-\varepsilon_o(f, S)$ and $\varepsilon_u(f, S)$ (cf. Section 4), respectively. These variable bounds depend on the simplex S, and we will explain how to model them appropriately. To this end we consider some (nonlinear) expression $f = g_i$ occurring on the left-hand side of one of the equality constraints of Problem (5.1), together with its piecewise linear approximation $\phi$ (cf. Section 4).

Fig. 10. Piecewise polyhedral envelopes of sin(x) on $[0, 2\pi]$ with breakpoints $0, \frac{\pi}{2}, \frac{3\pi}{2}, 2\pi$.

In the case of the convex combination, logarithmic, or SOS model, the approximated function value is given by Equation (3.3), which we modify as follows:

\[
y = \sum_{j=1}^{m} \lambda_j y_j + e. \tag{5.2}
\]

In the case of the disaggregated logarithmic model, we modify Equation (3.13) to

\[
y = \sum_{i=1}^{n} \sum_{j=0}^{d} \lambda_j^{S_i} y_j^{S_i} + e. \tag{5.3}
\]

Additionally, we introduce the following two inequalities to model the bounds of e in the case of the convex combination model:

\[
- \sum_{i=1}^{n} \varepsilon_o(f, S_i)\, z_i \;\le\; e \;\le\; \sum_{i=1}^{n} \varepsilon_u(f, S_i)\, z_i. \tag{5.4}
\]

In a logarithmic or an SOS model, we can only incorporate the proper bounds during branching. For the disaggregated logarithmic model we guarantee the bounds on e by

\[
- \sum_{i=1}^{n} \Big( \varepsilon_o(f, S_i) \sum_{j=0}^{d} \lambda_j^{S_i} \Big) \;\le\; e \;\le\; \sum_{i=1}^{n} \Big( \varepsilon_u(f, S_i) \sum_{j=0}^{d} \lambda_j^{S_i} \Big), \tag{5.5}
\]

where $\varepsilon_u(f, S_i)$ and $\varepsilon_o(f, S_i)$ are computed as described in Section 4. In every feasible solution, the λ-variables sum up to 1 and satisfy the SOS condition (i.e., all positive λ-variables belong to vertices of the same simplex). The boxes depicted in Fig. 10 exactly describe the feasible region of the MIP resulting from the modified convex combination, SOS, or (disaggregated) logarithmic model together with (5.4) and (5.5), respectively.

If $f = h_i$ for some $i \in \{1, \dots, k_2\}$, we can omit the left inequality in (5.4) and (5.5), because e need not be bounded from below.
In the case of the incremental method we substitute Equation (3.7) by

\[
y = y_0^{S_1} + \sum_{i=1}^{n} \sum_{j=1}^{d} \big( y_j^{S_i} - y_0^{S_i} \big)\, \delta_j^{S_i} + e \tag{5.6}
\]

and add the inequalities

\[
\varepsilon_u(f, S_1) + \sum_{i=1}^{n-1} z_i \big( \varepsilon_u(f, S_{i+1}) - \varepsilon_u(f, S_i) \big) \;\ge\; e, \tag{5.7}
\]
\[
-\varepsilon_o(f, S_1) - \sum_{i=1}^{n-1} z_i \big( \varepsilon_o(f, S_{i+1}) - \varepsilon_o(f, S_i) \big) \;\le\; e. \tag{5.8}
\]

The feasible region of the MIP described by the modified incremental model together with Constraints (5.7) and (5.8) is again the union of the boxes depicted in Fig. 10. In contrast to the modified disaggregated logarithmic method, we can only model the piecewise polyhedral envelopes appropriately if we add inequalities containing binary variables. To verify the correctness of (5.7) and (5.8), remember that in every feasible solution of the described MIP there is some index j with $z_i = 1$ for all $i < j$ and $z_i = 0$ for all $i \ge j$. This means that all terms $\varepsilon_u(f, S_i)$ on the left-hand side of (5.7) and all terms $\varepsilon_o(f, S_i)$ on the left-hand side of (5.8) with $i \ne j$ either cancel out or are multiplied by 0. Therefore, we get $-\varepsilon_o(f, S_j) \le e \le \varepsilon_u(f, S_j)$ as desired.
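The telescoping effect can be checked directly; in the following sketch of ours (with made-up error values), the bounds collapse to those of the active simplex $S_j$:

    # With z_i = 1 for i < j and z_i = 0 otherwise, (5.7)-(5.8) reduce to
    # -eps_o(f, S_j) <= e <= eps_u(f, S_j).
    def incremental_bounds(eps_u, eps_o, j):
        n = len(eps_u)
        z = [1 if i < j else 0 for i in range(1, n)]   # z_1, ..., z_{n-1}
        ub = eps_u[0] + sum(z[i - 1] * (eps_u[i] - eps_u[i - 1])
                            for i in range(1, n))
        lb = -eps_o[0] - sum(z[i - 1] * (eps_o[i] - eps_o[i - 1])
                             for i in range(1, n))
        return lb, ub

    eps_u = [0.4, 0.1, 0.3]   # eps_u(f, S_1..S_3), made-up values
    eps_o = [0.2, 0.5, 0.1]
    for j in (1, 2, 3):
        print(j, incremental_bounds(eps_u, eps_o, j))  # (-eps_o[j-1], eps_u[j-1])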
As we have seen, it suffices to introduce $k_1 + k_2$ additional real-valued variables to model a mixed integer piecewise polyhedral relaxation instead of a mixed integer piecewise linear approximation of (5.1). This marginal overhead makes every feasible point of (5.1) also feasible for its MIP relaxation. Thus, we can use any NLP solver to produce feasible solutions for the MIP once the integer variables $x_{d-p+1}, \dots, x_d$ are fixed. If we only accept solutions feasible for the MINLP as incumbent solutions of the MIP relaxation, it is straightforward to implement an algorithm to solve (5.1) which can prove optimality and prune infeasible or suboptimal parts of the branch-and-bound tree by using the full range of well-evolved techniques integrated in modern solvers for mixed integer linear problems. Even if we think about such an MIP relaxation without having any global solver for the underlying MINLP in mind, there are some obvious advantages compared to a (piecewise linear) approximation. First, any solution of an MIP relaxation (and even any solution to a corresponding LP relaxation) yields a valid lower bound for the MINLP. On the other hand, if we consider just an approximation, no such statement can be made without making further assumptions concerning the convexity of the involved

nonlinearities. Second, even piecewise linear MIP approximations of high accuracy are likely to be infeasible, even though the underlying MINLP has a nonempty set of feasible points. This becomes apparent, e.g., for MINLPs having constraints of the type f(x) = g(x), where f and g are nonlinear functions whose graphs intersect in a finite number of feasible points. For an MIP relaxation all these points are feasible regardless of the chosen accuracy. On the other hand, even a low-accuracy approximation might capture such points of intersection rather exactly and thus reflect the properties of the MINLP model quite well, while a relaxation might smooth the feasible set in such a way that the resulting solutions violate some characteristic properties of the underlying MINLP.
In [18] a technique similar to the approach presented above was introduced. There, the piecewise polyhedral envelopes are constructed using special ordered sets, but instead of branching to fulfill the SOS condition, the authors branch on the original problem variables and introduce new breakpoints in each new node of the branch-and-bound tree. The second difference is that they decompose the involved nonlinear expressions into simpler expressions and derive explicit formulas for the maximum linearization error for a set of nonlinear expressions occurring within a problem of power system analysis. Therefore our approach to calculating the linearization errors is somewhat more general. On the other hand, we are not yet able to refine the polyhedral envelopes dynamically, but we believe that the combinatorics of piecewise linear MIP models can be further exploited by appropriate cutting planes and separation algorithms.

6. Computational results. In this section we report on computational results for problems that include the introduced piecewise linearization techniques. These problems are based upon the real-world applications of water supply network optimization and transient technical optimization of gas networks, which consider the problem of time-dependent optimization in distribution networks. These networks consist of pipes to transport water or gas from suppliers to consumers. Due to friction within the pipes, pressure is lost. This loss can be compensated by pumps in water supply networks or by compressor stations in the case of gas transportation. Running pumps and compressors consume power depending on their current pressure increase and flow. This power is generated either from electricity or by the combustion of fuel gas. The aim of our optimization problems is to minimize the overall electricity and fuel gas consumption, respectively.

After applying a time and space discretization, this problem reduces to a mixed integer nonlinear program. On the one hand, nonlinearities are used to represent the physical behavior of water or gas within the network components; on the other hand, binary variables are used to describe discrete switching processes of valves, compressors and pumps, or to enforce minimum running times of the devices.

Fig. 11. Piecewise linear approximation of the solution set for equation (6.1).
Fig. 12. Piecewise linear approximation of the solution set for equation (6.3).

We formulate an MIP to solve the problem of minimal power consumption of gas and water networks, respectively. Nonlinearities are approximated by piecewise linearizations and included in our models by applying the methods from Section 2 and Section 3. Afterward, CPLEX is used as a black-box MIP solver to solve the resulting mixed binary linear models. We refer to [19] for further details on the problem of transient technical optimization.
The arising nonlinearities of our water network optimization problems are the pressure losses due to friction within pipes, which can be described by equations of the form

\[
y_1 = \lambda\, x_1 |x_1|. \tag{6.1}
\]

The factor λ is implicitly defined through an equation

\[
\frac{1}{\sqrt{\lambda}} = -a \log_{10}\Big( b + \frac{c}{x_1 \sqrt{\lambda}} \Big), \tag{6.2}
\]

with constants a, b and c. Additionally, a pump's pressure increase is either defined by a smooth function of the form

\[
y_2 = d\, x_2^{e}, \tag{6.3}
\]

with constant values d and e, or directly by a piecewise linear interpolation of a given point set. All of the above nonlinearities are univariate, and we approximate them by interpolation as explained in Section 4. For instance, for a water network optimization problem instance with eight pipes, four pumps, and 24 time steps, we get 192 nonlinearities of the form (6.1) and 96 of the type (6.3) in our chosen example. On average, each of them is approximated by about 30 line segments (cf. Fig. 11 and Fig. 12). The accuracy is guaranteed through a relative approximation error of at most $10^{-3}$.
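A simple way to obtain such an approximation is to refine greedily until the deviation is below the tolerance. The following sketch is our own illustration (it uses an absolute tolerance and sampling instead of the exact error estimators of Section 4, and a made-up friction factor):

    def refine(f, lo, hi, tol, samples=64):
        # Bisect the worst interval until the sampled deviation between f
        # and its secant interpolant is below tol everywhere (sampling only
        # estimates the true maximum error).
        bps = [lo, hi]
        while True:
            worst = None
            for a, b in zip(bps, bps[1:]):
                lin = lambda x, a=a, b=b: f(a) + (f(b) - f(a)) * (x - a) / (b - a)
                xs = [a + (b - a) * t / samples for t in range(1, samples)]
                err = max(abs(f(x) - lin(x)) for x in xs)
                if err > tol and (worst is None or err > worst[0]):
                    worst = (err, 0.5 * (a + b))
            if worst is None:
                return bps
            bps = sorted(bps + [worst[1]])

    lam = 0.025                            # a made-up friction factor
    f = lambda x: lam * x * abs(x)         # pressure loss, cf. (6.1)
    print(len(refine(f, -4.0, 4.0, tol=1e-3)) - 1, "line segments")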

Table 1
Demonstration of univariate models for a water supply network example.

                     cc      inc     sos     log     dlog
No. of vars          6522    6621    3871    4697    7526
No. of ints          2851    3001    200     776     776
No. of aux           2751    2901    150     676     676
No. of cons          3920    9419    569     2145    2149
No. of nodes         136706  16004   182161  85383   128320
running time (s)     1200    106     413     1462    * (∞)

For all numerical results we use ILOG CPLEX 11.2 with default parameters, and we do not interfere with the solution process. All computations are done on one core of an Intel Core2Duo 2 GHz machine with 2 GB RAM running a Linux operating system. The maximum running time is restricted to at most one hour. We omit the time to calculate the piecewise linear interpolations, because in our examples it is insignificant compared to CPLEX's running time. In Table 1 we compare all mixed integer formulations from Section 2 on a water network optimization problem. The table shows the total number of variables, integer variables, auxiliary binary variables, and constraints of the resulting model, as well as the number of branch-and-bound nodes and the solution time in seconds CPLEX needed to prove optimality. An asterisk indicates that the time limit was reached; in this case the solver's gap at that time is printed in parentheses. The model abbreviations stand for the convex combination (cc), incremental (inc), special ordered sets (sos), logarithmic convex combination (log), and logarithmic disaggregated convex combination (dlog) models. We see that even if sos, log and dlog yield smaller models, it is not always a good idea to use just these models. In our example inc is by far the best choice, a result that agrees with further examinations of water supply network instances. In addition to the results computed by piecewise linear approximations, we also compare them to the piecewise linear relaxations introduced in Section 5. For our example the running times of both approaches do not differ very much. In general, it is slightly more difficult to find a feasible solution using the approximation techniques, whereas the optimality proof is typically somewhat faster. More interesting, however, is an investigation of the optimal solutions found by both approaches (see Table 3). The relative difference of the objective values obtained by piecewise linear approximation and piecewise linear relaxation is $10^{-3}$, which is not greater than our error tolerance for the nonlinearities. Of course we cannot ensure this to hold for arbitrary instances or models, but as our examples and experience show, it is commonly true for our problems. The exact objective value of the global optimum lies in between those achieved by interpolation and relaxation.


In gas network optimization problems, higher-dimensional nonlinearities arise, too. The friction within pipes is expressed by a bivariate nonlinear equation

\[
y_1 = c\, \lambda\, \frac{x_1^2}{x_2}, \tag{6.4}
\]

with constant factor c and λ defined again by equation (6.2). The 32 arising equations of this form are replaced by piecewise linearizations with approximately 14 triangles each to achieve a maximum relative error of at most 1%. Furthermore, trivariate nonlinear terms arise to describe a compressor's power consumption through an equation of the form

\[
y_2 = a\, x_1 \left( \Big( \frac{x_2}{x_3} \Big)^{b} - c \right), \tag{6.5}
\]

where a, b and c are constants. Each of the arising ten equations of the above type is piecewise linearized with approximately 700 tetrahedra. In Table 2, computational results for the three different formulations from Section 3, namely the convex combination (cc), incremental (inc), and logarithmic disaggregated convex combination (dlog) models, are listed for a gas network optimization problem. We see for our representative example that dlog plays out its advantage of a drastically reduced number of binary variables and constraints. Again, inc models seem to have the best structure, but the LPs are far too large compared to the others. A comparison with the results obtained by piecewise linear relaxations shows almost identical running times to those achieved by the interpolation techniques. As shown in Table 3, the relative difference between both objective values is approximately 2.7%, which is in the same order of magnitude as our specified maximum approximation error of the nonlinearities. Likewise, we see that the solution found by the piecewise linearization techniques lies within 1.3% to 1.4% of the exact optimum. In addition, we would like to mention that all crucial decisions, i.e., problem-specific integer variables that are not introduced to model piecewise linearizations, have an identical assignment in all presented cases.
7. Future directions. We have seen that using piecewise linear approximations can be an attractive way of tackling mixed integer nonlinear programs. We get a globally optimal solution within a-priori determined tolerances and are able to use the well-developed software tools of mixed integer linear programming. So far, the best case for these techniques is when there are only a few different nonlinear functions, each of which depends only on a few variables. Then the combinatorics of the problem dominate the complexity of the approximation, and we can expect a huge gain from being able to use MIP techniques.
This also opens up a variety of directions for further developing these methods. One important topic is to fuse the techniques shown in this chapter with classical nonlinear programming techniques.


Table 2
Demonstration of multivariate models for a gas network example.

                     cc         inc     dlog
No. of vars          10302      30885   29397
No. of ints          7072       7682    362
No. of aux           7036       7646    326
No. of cons          3114       23592   747
No. of nodes         620187     18787   75003
running time (s)     * (0.81%)  3204    1386

Table 3
Comparison between optimal objective values obtained by piecewise approximation, piecewise relaxation, and the exact optimum.

                          Approximation  Relaxation  Exact
Water network example     73.4473        73.3673     73.4211
Gas network example       2.7661         2.6903      2.7268

Can we find a piecewise linear approximation that is just fine enough to get the optimal setting of the integral variables? Then we could fix the integral variables and use a standard nonlinear programming tool to compute the optimal values of the continuous variables. As the schemes we have presented are quite general and do not require much of the functions to be approximated, it will be important to identify the cases in which one can do better. In the case of a second-order cone constraint, Ben-Tal and Nemirovski [5] have given an ingenious construction for an LP model of a piecewise linear approximation that does not need any extra binary variables. What about other functions? Can we give similar constructions for those? Modeling support will also be an important practical topic: How can we relieve the modeler of the burden of generating a good approximation of the nonlinear function and automate this process?

REFERENCES

[1] A. Balakrishnan and S.C. Graves, A composite algorithm for a concave-cost


network flow problem, Networks, 19 (1989), pp. 175–202.
[2] J.J. Bartholdi III and P. Goldsman, The vertex-adjacency dual of a triangulated
irregular network has a hamiltonian cycle, Operations Research Letters, 32
(2004), pp. 304–308.
[3] E.M.L. Beale and J.J.H. Forrest, Global optimization using special ordered
sets, Math. Programming, 10 (1976), pp. 52–69.
[4] E.M.L. Beale and J.A. Tomlin, Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables, in OR 69, J. Lawrence, ed., International Federation of Operational Research Societies, Tavistock Publications, 1970, pp. 447–454.
[5] A. Ben-Tal and A. Nemirovski, On polyhedral approximations of the second-
order cone, Math. Oper. Res., 26 (2001), pp. 193–205.
[6] K.L. Croxton, B. Gendron, and T.L. Magnanti, Variable disaggregation in
network flow problems with piecewise linear costs, Oper. Res., 55 (2007),
pp. 146–157.
[7] G.B. Dantzig, On the significance of solving linear programming problems with
some integer variables, Econometrica, 28 (1960), pp. 30–44.
[8] I.R. de Farias, Jr., M. Zhao, and H. Zhao, A special ordered set approach
for optimizing a discontinuous separable piecewise linear function, Oper. Res.
Lett., 36 (2008), pp. 234–238.
[9] C.E. Gounaris and C.A. Floudas, Tight convex underestimators for C²-continuous problems: II. Multivariate functions, Journal of Global Optimization, 42 (2008), pp. 69–89.
[10] C.E. Gounaris and C.A. Floudas, Tight convex underestimators for C²-continuous problems: I. Univariate functions, Journal of Global Optimization, 42 (2008), pp. 51–67.
[11] M. Jach, D. Michaels, and R. Weismantel, The convex envelope of (n−1)-convex functions, SIAM Journal on Optimization, 19 (3) (2008), pp. 1451–1466.
[12] J.L.W.V. Jensen, Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Mathematica, 30 (1) (1906), pp. 175–193.
[13] A.B. Keha, I.R. de Farias, Jr., and G.L. Nemhauser, Models for representing piecewise linear cost functions, Oper. Res. Lett., 32 (2004), pp. 44–48.
[14] A.B. Keha, I.R. de Farias, Jr., and G.L. Nemhauser, A branch-and-cut algorithm without binary variables for nonconvex piecewise linear optimization, Oper. Res., 54 (2006), pp. 847–858.
[15] T. Koch, Personal communication, 2008.
[16] A. Krion, Optimierungsmethoden zur Berechnung von Cross-Border-Flow beim
Market-Coupling im europäischen Stromhandel, Master’s thesis, Discrete Op-
timization Group, Department of Mathematics, Technische Universität Darm-
stadt, Darmstadt, Germany, 2008.
[17] J. Lee and D. Wilson, Polyhedral methods for piecewise-linear functions. I. The
lambda method, Discrete Appl. Math., 108 (2001), pp. 269–285.
[18] S. Leyffer, A. Sartenaer, and E. Wanufelle, Branch-and-refine for mixed-integer nonconvex global optimization, Tech. Rep. ANL/MCS-P1547-0908, Argonne National Laboratory, Mathematics and Computer Science Division, 2008.
[19] D. Mahlke, A. Martin, and S. Moritz, A mixed integer approach for time-dependent gas network optimization, Optimization Methods and Software, 25 (2010), pp. 625–644.
[20] C.D. Maranas and C.A. Floudas, Global minimum potential energy conforma-
tions of small molecules, Journal of Global Optimization, 4 (1994), pp. 135–
170.
[21] H.M. Markowitz and A.S. Manne, On the solution of discrete programming
problems, Econometrica, 25 (1957), pp. 84–110.
[22] R.R. Meyer, Mixed integer minimization models for piecewise-linear functions of a single variable, Discrete Mathematics, 16 (1976), pp. 163–171.
[23] J.P. Vielma and G. Nemhauser, Modeling disjunctive constraints with a logarithmic number of binary variables and constraints, in Integer Programming and Combinatorial Optimization, Vol. 5035 of Lecture Notes in Computer Science, 2008, pp. 199–213.
[24] M. Padberg, Approximating separable nonlinear functions via mixed zero-one
programs, Oper. Res. Lett., 27 (2000), pp. 1–5.
[25] M. Padberg and M.P. Rijal, Location, scheduling, design and integer program-
ming, Kluwer Academic Publishers, Boston, 1996.
[26] R.T. Rockafellar, Convex Analysis, Princeton University Press, 1970.


[27] W.D. Smith, A lower bound for the simplexity of the n-cube via hyperbolic vol-
umes, European Journal of Combinatorics, 21 (2000), pp. 131–137.
[28] F. Tardella, On the existence of polyhedral convex envelopes, in Frontiers in Global Optimization, C. Floudas and P.M. Pardalos, eds., Vol. 74 of Nonconvex Optimization and its Applications, Springer, 2004, pp. 563–573.
[29] M.J. Todd, Hamiltonian triangulations of $\mathbb{R}^n$, in Functional Differential Equations and Approximation of Fixed Points, A. Dold and B. Eckmann, eds., Vol. 730 of Lecture Notes in Mathematics, Springer, 1979, pp. 470–483.
[30] J.P. Vielma, S. Ahmed, and G. Nemhauser, Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions, Operations Research, 58 (2009), pp. 303–315.
[31] J.P. Vielma, A.B. Keha, and G.L. Nemhauser, Nonconvex, lower semicontinu-
ous piecewise linear optimization, Discrete Optim., 5 (2008), pp. 467–488.
[32] D. Wilson, Polyhedral methods for piecewise-linear functions, Ph.D. thesis in
Discrete Mathematics, University of Kentucky, 1998.

AN ALGORITHMIC FRAMEWORK FOR
MINLP WITH SEPARABLE NON-CONVEXITY
CLAUDIA D’AMBROSIO∗ , JON LEE† , AND ANDREAS WÄCHTER‡

Abstract. We present an algorithm for Mixed-Integer Nonlinear Programming


(MINLP) problems in which the non-convexity in the objective and constraint func-
tions is manifested as the sum of non-convex univariate functions. We employ a lower
bounding convex MINLP relaxation obtained by approximating each non-convex func-
tion with a piecewise-convex underestimator that is repeatedly refined. The algorithm
is implemented at the level of a modeling language. Favorable numerical results are
presented.

Key words. Mixed-integer nonlinear programming, global optimization, spatial


branch-and-bound, separable, non-convex.

AMS(MOS) subject classifications. 65K05, 90C11, 90C26, 90C30.

1. Introduction. The global solution of practical instances of Mixed-


Integer NonLinear Programming (MINLP) problems has been considered
for some decades. Over a considerable period of time, technology for the
global optimization of convex MINLP (i.e., the continuous relaxation of the
problem is a convex program) had matured (see, for example, [8, 17, 9, 3]),
and rather recently there has been considerable success in the realm of
global optimization of non-convex MINLP (see, for example, [18, 16, 13, 2]).
Global optimization algorithms, e.g., spatial branch-and-bound ap-
proaches like those implemented in codes like BARON [18] and COUENNE [2],
have had substantial success in tackling complicated, but generally small
scale, non-convex MINLPs (i.e., mixed-integer nonlinear programs having
non-convex continuous relaxations). Because they are aimed at a rather
general class of problems, the possibility remains that larger instances from
a simpler class may be amenable to a simpler approach.
We focus on MINLPs for which the non-convexity in the objective
and constraint functions is manifested as the sum of non-convex univariate
functions. There are many problems that are already in such a form, or
can be brought into such a form via some simple substitutions. In fact,
the first step in spatial branch-and-bound is to bring problems into nearly
such a form. For our purposes, we assume that the model already has this
form. We have developed a simple algorithm, implemented at the level of
a modeling language (in our case AMPL, see [10]), to attack such separable
problems. First, we identify subintervals of convexity and concavity for
the univariate functions using external calls to MATLAB [14]. With such an

∗ Department of ECSS, University of Bologna, Italy ([email protected]).
† Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, U.S.A. ([email protected]).
‡ IBM T.J. Watson Research Center, NY 10598, U.S.A. ([email protected]).


identification at hand, we develop a convex MINLP relaxation of the prob-


lem. Our convex MINLP relaxation differs from those typically employed
in spatial branch-and-bound; rather than relaxing the graph of a univariate
function on an interval to an enclosing polygon, we work on each subinter-
val of convexity and concavity separately, using linear relaxation on only
the “concave side” of each function on the subintervals. The subintervals
are glued together using binary variables. Next, we employ ideas of spa-
tial branch-and-bound, but rather than branching, we repeatedly refine our
convex MINLP relaxation by modifying it at the modeling level. We attack
our convex MINLP relaxation, to get lower bounds on the global minimum,
using the code BONMIN [3, 4] as a black-box convex MINLP solver. Finally,
by fixing the integer variables in the original non-convex MINLP, and then
locally solving the associated non-convex NLP restriction, we get an up-
per bound on the global minimum, using the code IPOPT [19]. We use
the solutions found by BONMIN and IPOPT to guide our choice of further
refinements.
We implemented our framework using the modeling language AMPL. In
order to obtain all of the information necessary for the execution of the
algorithm, external software, specifically the tool for high-level computa-
tional analysis MATLAB, the convex MINLP solver BONMIN, and the NLP
solver IPOPT, are called directly from the AMPL environment. A detailed description of the entire algorithmic framework, together with a proof of its convergence, is provided in Section 2.

We present computational results in Section 3. Some of the instances arise
from specific applications; in particular, Uncapacitated Facility Location
problems, Hydro Unit Commitment and Scheduling problems, and Nonlin-
ear Continuous Knapsack problems. We also present computational results
on selected instances of GLOBALLib and MINLPLib. We have had signif-
icant success in our preliminary computational experiments. In particular,
we see very few major iterations occurring, with most of the time being
spent in the solution of a small number of convex MINLPs. As we had
hoped, our method does particularly well on problems for which the non-
convexity is naturally separable. An advantage of our approach is that it
can be implemented easily using existing software components and that
further advances in technology for convex MINLP will immediately give us
a proportional benefit.
Finally, we note that a preliminary shorter version of the present paper
appeared as [7].
2. Our algorithmic framework. We focus now on MINLPs, where
the non-convexity in the objective and constraint functions is manifested
as the sum of non-convex univariate functions. Without loss of generality,
we take them to be of the form


\[
\begin{aligned}
\min\;& \textstyle\sum_{j \in N} C_j x_j \\
\text{subject to}\;& f(x) \le 0; \\
& r_i(x) + \textstyle\sum_{k \in H(i)} g_{ik}(x_k) \le 0, && \forall i \in M; \\
& L_j \le x_j \le U_j, && \forall j \in N; \\
& x_j \text{ integer}, && \forall j \in I,
\end{aligned} \tag{P}
\]

where $N := \{1, 2, \dots, n\}$, $f : \mathbb{R}^n \to \mathbb{R}^p$ and $r_i : \mathbb{R}^n \to \mathbb{R}$, $\forall i \in M$, are convex functions, $H(i) \subseteq N$ $\forall i \in M$, the $g_{ik} : \mathbb{R} \to \mathbb{R}$ are non-convex univariate functions ($\forall i \in M$, $k \in H(i)$), and $I \subseteq N$. Letting $H := \cup_{i \in M} H(i)$, we can take each $L_j$ and $U_j$ to be finite or infinite for $j \in N \setminus H$, but for $j \in H$ we assume that these are finite bounds.
We assume that the problem functions are sufficiently smooth (e.g., twice continuously differentiable), with the exception that we allow the univariate $g_{ik}$ to be continuous functions defined piecewise by sufficiently smooth functions over a finite set of subintervals of $[L_k, U_k]$. Without loss of generality, we have taken the objective function to be linear and all of the constraints to be inequalities, and further of the less-than-or-equal variety. Linear equality constraints could be included directly in this formulation, while we assume that nonlinear equalities have been split into two inequality constraints.
Our approach is an iterative technique based on three fundamental
ingredients:
• A reformulation method with which we obtain a convex MINLP
relaxation Q of the original problem P. Solving the convex MINLP
relaxation Q, we obtain a lower bound of our original problem P ;
• A non-convex NLP restriction R of the original MINLP problem
P obtained by fixing the variables within the set {xj : j ∈ I}.
Locally solving the non-convex NLP restriction R, we obtain an
upper bound of our original problem P ;
• A refinement technique aimed at improving, at each iteration, the
quality of the lower bound obtained by solving the convex MINLP
relaxation Q.
The main idea of our algorithmic framework is to iteratively solve a lower-bounding relaxation Q and an upper-bounding restriction R so that, in case the values of the upper and the lower bound coincide, the global optimality of the solution found is proven; otherwise we make a refinement to the lower-bounding relaxation Q. At each iteration, we seek to decrease the gap between the lower and the upper bound, and hopefully, before too long, the gap will be within a tolerance value, or the lower-bounding solution is deemed to be sufficiently feasible for the original problem. In these cases, or in case a time/iteration limit is reached, the algorithm stops. If the gap is closed, we have found a global optimum; otherwise we have a heuristic solution (provided that the upper bound is not +∞). The lower-bounding relaxation Q is a convex relaxation of the original non-


convex MINLP problem, obtained by approximating the concave part of the non-convex univariate functions using piecewise linear approximation. The novelty in this part of the algorithmic framework is the new formulation of the convex relaxation: The function is approximated only where it is concave, while the convex parts of the functions are not approximated, but taken as they are. The proposed convex relaxation is described in detail in Section 2.1. The upper-bounding restriction R, described in Section 2.2, is obtained simply by fixing the variables with integrality constraints. The refinement technique consists of adding one or more breakpoints where needed, i.e., where the approximation of the non-convex function is bad and the solution of the lower-bounding problem lies. Refinement strategies are described in Section 2.3. Once the ingredients of the algorithmic framework have been described in detail, we give a pseudo-code description of our algorithmic framework (see Section 2.4). Here, we also discuss some considerations about the general framework and the similarities and differences with popular global optimization methods. Theoretical convergence guarantees are discussed in Section 2.5. In Section 3, computational experiments are presented, detailing the performance of the algorithm and comparing the approach to other methods.

2.1. The lower-bounding convex MINLP relaxation Q. To obtain our convex MINLP relaxation Q of the MINLP problem P, we need to locate the subintervals of the domain of each univariate function $g_{ik}$ for which the function is uniformly convex or concave. For simplicity of notation, rather than refer to the constraint $r_i(x) + \sum_{k \in H(i)} g_{ik}(x_k) \le 0$, we consider a term of the form $g(x_k) := g_{ik}(x_k)$, where $g : \mathbb{R} \to \mathbb{R}$ is a univariate non-convex function of $x_k$, for some k ($1 \le k \le n$).
We want to explicitly view each such g as a piecewise-defined function, where on each piece the function is either convex or concave. This feature also allows us to handle functions that are already piecewise defined by the modeler. In practice, for each non-convex function g, we compute the points at which the convexity/concavity may change, i.e., the zeros of the second derivative of g, using MATLAB. In case a function g is naturally piecewise defined, we are essentially refining its piecewise definition in such a way that the convexity/concavity is uniform on each piece.

Example 1. Consider the piecewise-defined univariate function

\[
g(x_k) :=
\begin{cases}
1 + (x_k - 1)^3, & \text{for } 0 \le x_k \le 2; \\
1 + (x_k - 3)^2, & \text{for } 2 \le x_k \le 4,
\end{cases}
\]

depicted in Fig. 1. In addition to the breakpoints $x_k = 0, 2, 4$ of the definition of g, the convexity/concavity changes at $x_k = 1$, so by utilizing an additional breakpoint at $x_k = 1$ the convexity/concavity is now uniform on each piece.
Now, on each concave piece we can use a secant approximation to give
a piecewise-convex lower approximation of g.


Fig. 1. A piecewise-defined univariate function.
Fig. 2. A piecewise-convex lower approximation.
Fig. 3. Improved piecewise-convex lower approximation.


Example 1, continued. Relative to $g(x_k)$ of Example 1, we have the piecewise-convex lower approximation

\[
g(x_k) :=
\begin{cases}
x_k, & \text{for } 0 \le x_k \le 1; \\
1 + (x_k - 1)^3, & \text{for } 1 \le x_k \le 2; \\
1 + (x_k - 3)^2, & \text{for } 2 \le x_k \le 4,
\end{cases}
\]

depicted in Fig. 2.
We can obtain a better lower bound by refining the piecewise-linear lower approximation on the concave pieces. We let

\[
L_k =: P_0 < P_1 < \dots < P_{\bar{p}} := U_k \tag{2.1}
\]

be the ordered breakpoints at which the convexity/concavity of g changes, including, in the case of a piecewise definition of g, the points at which the definition of g changes. We define:

$[P_{p-1}, P_p]$ := the p-th subinterval of the domain of g ($p \in \{1, \dots, \bar{p}\}$);
$\check{H}$ := the set of indices of subintervals on which g is convex;
$\hat{H}$ := the set of indices of subintervals on which g is concave.

On the concave intervals, we will allow further breakpoints. We let $B_p$ be the ordered set of breakpoints for the concave interval indexed by $p \in \hat{H}$. We denote these breakpoints as

\[
P_{p-1} =: X_{p,1} < X_{p,2} < \dots < X_{p,|B_p|} := P_p,
\]

and in our relaxation we will view g as lower bounded by the piecewise-linear function that has value $g(X_{p,j})$ at the breakpoints $X_{p,j}$ and is otherwise linear between these breakpoints.
Example 1, continued again. Utilizing further breakpoints, for example at $x_k = 1/3$ and $x_k = 2/3$, we can improve the piecewise-convex lower approximation to

\[
g(x_k) :=
\begin{cases}
\frac{19}{9}\, x_k, & \text{for } 0 \le x_k \le \frac{1}{3}; \\[2pt]
\frac{19}{27} + \frac{7}{9}\big(x_k - \frac{1}{3}\big), & \text{for } \frac{1}{3} \le x_k \le \frac{2}{3}; \\[2pt]
\frac{26}{27} + \frac{1}{9}\big(x_k - \frac{2}{3}\big), & \text{for } \frac{2}{3} \le x_k \le 1; \\[2pt]
1 + (x_k - 1)^3, & \text{for } 1 \le x_k \le 2; \\[2pt]
1 + (x_k - 3)^2, & \text{for } 2 \le x_k \le 4,
\end{cases}
\]

depicted in Fig. 3.
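The secant coefficients above can be reproduced mechanically; in this small sketch of ours (names illustrative), the slopes 19/9, 7/9 and 1/9 of Example 1 come out of the generic construction:

    def secant_pieces(g, breakpoints):
        # Secant segments of g between consecutive breakpoints; these are
        # valid underestimators on intervals where g is concave.
        segs = []
        for a, b in zip(breakpoints, breakpoints[1:]):
            m = (g(b) - g(a)) / (b - a)
            segs.append((a, b, m, g(a) - m * a))
        return segs

    g = lambda x: 1 + (x - 1) ** 3                # concave on [0, 1]
    for a, b, m, c in secant_pieces(g, [0, 1/3, 2/3, 1]):
        print(f"[{a:.3f}, {b:.3f}]  y = {m:.4f}*x + {c:.4f}")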
Next, we define further variables to manage our convexification of g on its domain:

$z_p$ := a binary variable indicating whether $x_k \ge P_p$ ($p = 1, \dots, \bar{p} - 1$);
$\delta_p$ := a continuous variable assuming a positive value iff $x_k \ge P_{p-1}$ ($p = 1, \dots, \bar{p}$);
$\alpha_{p,b}$ := the weight of breakpoint b in the piecewise-linear approximation of the interval indexed by p ($p \in \hat{H}$, $b \in B_p$).
In the convex relaxation of the original MINLP P, we substitute each univariate non-convex term $g(x_k)$ with

\[
\sum_{p \in \check{H}} g(P_{p-1} + \delta_p) \;+\; \sum_{p \in \hat{H}} \sum_{b \in B_p} g(X_{p,b})\, \alpha_{p,b} \;-\; \sum_{p=1}^{\bar{p}-1} g(P_p), \tag{2.2}
\]

and we include the following set of new constraints:

\[
\begin{aligned}
& P_0 + \sum_{p=1}^{\bar{p}} \delta_p - x_k = 0; && \text{(2.3)} \\
& \delta_p - (P_p - P_{p-1})\, z_p \ge 0, \quad \forall p \in \check{H} \cup \hat{H}; && \text{(2.4)} \\
& \delta_p - (P_p - P_{p-1})\, z_{p-1} \le 0, \quad \forall p \in \check{H} \cup \hat{H}; && \text{(2.5)} \\
& P_{p-1} + \delta_p - \sum_{b \in B_p} X_{p,b}\, \alpha_{p,b} = 0, \quad \forall p \in \hat{H}; && \text{(2.6)} \\
& \sum_{b \in B_p} \alpha_{p,b} = 1, \quad \forall p \in \hat{H}; && \text{(2.7)} \\
& \{\alpha_{p,b} : b \in B_p\} := \text{SOS2}, \quad \forall p \in \hat{H}, && \text{(2.8)}
\end{aligned}
\]

with two dummy variables $z_0 := 1$ and $z_{\bar{p}} := 0$.

Constraints (2.3)–(2.5), together with the integrality of the z variables, ensure that, given an $x_k$ value, say $x_k^* \in [P_{p^*-1}, P_{p^*}]$:

\[
\delta_p =
\begin{cases}
P_p - P_{p-1}, & \text{if } 1 \le p \le p^* - 1; \\
x_k^* - P_{p-1}, & \text{if } p = p^*; \\
0, & \text{otherwise}.
\end{cases}
\]

Constraints (2.6)–(2.8) ensure that, for each concave interval, the convex combination of the breakpoints is correctly computed. Finally, (2.2) approximates the original non-convex univariate function $g(x_k)$. Each single term of the first and the second summation, using the definition of $\delta_p$, reduces, respectively, to

\[
g(P_{p-1} + \delta_p) =
\begin{cases}
g(P_p), & \text{if } p \in \{1, \dots, p^* - 1\}; \\
g(x_k^*), & \text{if } p = p^*; \\
g(P_{p-1}), & \text{if } p \in \{p^* + 1, \dots, \bar{p}\},
\end{cases}
\]

and

\[
\sum_{b \in B_p} g(X_{p,b})\, \alpha_{p,b} =
\begin{cases}
g(P_p), & \text{if } p \in \{1, \dots, p^* - 1\}; \\
\sum_{b \in B_{p^*}} g(X_{p^*,b})\, \alpha_{p^*,b}, & \text{if } p = p^*; \\
g(P_{p-1}), & \text{if } p \in \{p^* + 1, \dots, \bar{p}\},
\end{cases}
\]

reducing expression (2.2) to

\[
\sum_{p=1}^{p^*-1} g(P_p) + \gamma + \sum_{p=p^*+1}^{\bar{p}} g(P_{p-1}) - \sum_{p=1}^{\bar{p}-1} g(P_p) = \gamma,
\]


where

\[
\gamma =
\begin{cases}
g(x_k^*), & \text{if } p^* \in \check{H}; \\
\sum_{b \in B_{p^*}} g(X_{p^*,b})\, \alpha_{p^*,b}, & \text{if } p^* \in \hat{H}.
\end{cases}
\]

So, if $x_k^*$ is in a subinterval on which g is convex, then the approximation (2.2) is exact; while if $x_k^*$ is in a subinterval on which g is concave, then the approximation is a piecewise-linear underestimation of g.
Constraints (2.8) define $|\hat{H}|$ Special Ordered Sets of Type 2 (SOS2), i.e., ordered sets of positive variables among which at most 2 can assume a non-zero value, and, in this case, they must be consecutive (see Beale and Tomlin [1]). Unfortunately, at the moment, convex MINLP solvers do not typically handle SOS2 the way most MILP solvers do (also defining special-purpose branching strategies). For this reason, we substitute constraints (2.8), $\forall p \in \hat{H}$, with new binary variables $y_{p,b}$, $b \in \{1, \dots, |B_p| - 1\}$, and constraints:

\[
\alpha_{p,b} \le y_{p,b-1} + y_{p,b} \quad \forall b \in B_p; \tag{2.8.a}
\]
\[
\sum_{b=1}^{|B_p|-1} y_{p,b} = 1, \tag{2.8.b}
\]

with dummy values $y_{p,0} = y_{p,|B_p|} = 0$. In the future, when convex MINLP solvers handle the definition of SOS2, the variables y and constraints (2.8.a–b) will not be necessary.
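As a quick sanity check of this substitution (a sketch of ours with made-up data), the following predicate tests whether a pair (α, y) satisfies (2.8.a)–(2.8.b):

    def sos2_substitute_ok(alpha, y):
        # (2.8.b): exactly one segment selected; (2.8.a) with the dummy
        # values y_0 = y_{|B|} = 0: alpha_b can be positive only on the
        # chosen segment and its endpoints.
        B = len(alpha)
        if len(y) != B - 1 or sum(y) != 1:
            return False
        yy = [0] + list(y) + [0]
        return all(alpha[b] <= yy[b] + yy[b + 1] for b in range(B))

    print(sos2_substitute_ok([0.0, 0.3, 0.7, 0.0], [0, 1, 0]))   # True
    print(sos2_substitute_ok([0.3, 0.0, 0.7, 0.0], [0, 1, 0]))   # False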
It is important to note that if we utilized a very large number of breakpoints at the start, solving the resulting convex MINLP Q would mean essentially solving the original MINLP P globally, up to some pre-determined tolerance related to the density of the breakpoints. But of course such a convex MINLP Q would be too hard to solve in practice. With our algorithmic framework, we dynamically seek a significantly smaller convex MINLP Q, thus generally more easily solvable, which we can use to guide the non-convex NLP restriction R to a good local solution, eventually settling on and proving global optimality of such a solution to the original MINLP P.
2.2. The upper-bounding non-convex NLP restriction R. Given a solution $\bar{x}$ of the convex MINLP relaxation Q, the upper-bounding restriction R is defined as the non-convex NLP:

\[
\begin{aligned}
\min\;& \textstyle\sum_{j \in N} C_j x_j \\
\text{subject to}\;& f(x) \le 0; \\
& r_i(x) + \textstyle\sum_{k \in H(i)} g_{ik}(x_k) \le 0, && \forall i \in M; \\
& L_j \le x_j \le U_j, && \forall j \in N; \\
& x_j = \bar{x}_j, && \forall j \in I.
\end{aligned} \tag{R}
\]
A solution of this non-convex NLP R is a heuristic solution of the non-convex MINLP problem P for two reasons: (i) the integer variables $x_j$, $j \in I$, might not be fixed to globally optimal values; (ii) the NLP R is non-convex, and so even if the integer variables $x_j$, $j \in I$, are fixed to globally optimal values, the NLP solver may only find a local optimum of the non-convex NLP R, or may even fail to find a feasible point. This consideration emphasizes the importance of the lower-bounding relaxation Q for the guarantee of global optimality. The resolution of the upper-bounding problem can be seen as a "verification phase" in which a solution of the convex MINLP relaxation Q is tested for being truly feasible for the non-convex MINLP P. To emphasize this, the NLP solver for R is given the solution of the convex MINLP relaxation as its starting point.
2.3. The refinement technique. At the end of each iteration, we have two solutions: $\bar{x}$, the solution of the lower-bounding convex MINLP relaxation Q, and $\hat{x}$, the solution of the upper-bounding non-convex NLP restriction R; in case we cannot find a solution of R, e.g., if R is infeasible, then no $\hat{x}$ is available. If $\sum_{j \in N} C_j \bar{x}_j = \sum_{j \in N} C_j \hat{x}_j$ within a certain tolerance, or if $\bar{x}$ is sufficiently feasible for the original constraints, we return to the user as solution the point $\hat{x}$ or $\bar{x}$, respectively. Otherwise, in order to continue, we want to refine the approximation of the lower-bounding convex MINLP relaxation Q by adding further breakpoints. We employed two strategies:
• Based on the lower-bounding problem solution x̄: For each i ∈ M and k ∈ H(i), if x̄_k lies in a concave interval of g_{ik}, add x̄_k as a breakpoint for the relaxation of g_{ik}.
This procedure drives the convergence of the overall method, since it ensures that the lower-bounding problem eventually becomes a sufficiently accurate approximation of the original problem in the neighborhood of the global solution. Since adding a breakpoint increases the size of the convex MINLP relaxation, in practice we do not add such a new breakpoint if it would be within some small tolerance of an existing breakpoint for g_{ik}.
• Based on the upper-bounding problem solution x̂: For each i ∈ M and k ∈ H(i), if x̂_k lies in a concave interval of g_{ik}, add x̂_k as a breakpoint for the relaxation of g_{ik}.
The motivation behind this option is to accelerate the convergence
of the method. If the solution found by the upper-bounding prob-
lem is indeed the global solution, the relaxation should eventually
be exact at this point to prove its optimality. Again, to keep the
size of the relaxation MINLP manageable, breakpoints are only
added if they are not too close to existing ones.
We found that these strategies work well together. Hence, at each major iteration, we add a breakpoint in each concave interval where x̄ lies, in order to guarantee convergence, and one where x̂ lies, in order to speed up the convergence.
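A small Python sketch of this refinement rule (our illustration; the names are hypothetical, and the actual implementation operates at the AMPL level):

    import bisect

    def refine(breakpoints, concave_intervals, candidate, tol=1e-5):
        """Add `candidate` as a breakpoint if it lies in a concave
        subinterval and is not within `tol` of an existing breakpoint."""
        in_concave = any(lo <= candidate <= hi for lo, hi in concave_intervals)
        too_close = any(abs(candidate - p) < tol for p in breakpoints)
        if in_concave and not too_close:
            bisect.insort(breakpoints, candidate)   # keep breakpoints sorted
        return breakpoints

    pts = [0.0, 0.4, 1.0]
    refine(pts, concave_intervals=[(0.4, 1.0)], candidate=0.7)
    print(pts)   # -> [0.0, 0.4, 0.7, 1.0]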
2.4. The algorithmic framework. Algorithm 1 details our
SC-MINLP (Sequential Convex MINLP) Algorithm, while Figure 4 depicts,


at a high level, how we designed our implementation of it. For an optimization problem, val(·) refers to its optimal objective value.

Algorithm 1: SC-MINLP (Sequential Convex MINLP) Algorithm

  Choose tolerances ε, ε_feas > 0; initialize LB := −∞, UB := +∞.
  Find P_p^i, Ĥ^i, Ȟ^i, X_{pb}^i (∀i ∈ M, p ∈ {1, …, p̄^i}, b ∈ B_p^i).
  repeat
    Solve the convex MINLP relaxation Q of the original problem P to obtain x̄;
    if x̄ is feasible for the original problem P (within tolerance ε_feas) then
      return x̄
    end if
    if val(Q) > LB then
      LB := val(Q)
    end if
    Solve the non-convex NLP restriction R of the original problem P to obtain x̂;
    if a solution x̂ could be computed and val(R) < UB then
      UB := val(R); x^UB := x̂
    end if
    if UB − LB > ε then
      Update B_p^i, X_{pb}^i
    end if
  until (UB − LB ≤ ε) or (time or iteration limit exceeded)
  return the current best solution x^UB
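A high-level Python sketch of this loop (ours; the two solver callables are placeholders standing in for the BONMIN and IPOPT calls, and the `problem` interface is hypothetical):

    import math

    def sc_minlp(problem, solve_relaxation, solve_restriction,
                 eps=1e-4, eps_feas=1e-4, max_iter=50):
        """Sequential Convex MINLP loop in the spirit of Algorithm 1."""
        LB, UB, x_best = -math.inf, math.inf, None
        for _ in range(max_iter):
            # Lower bound: solve the convex MINLP relaxation Q.
            val_Q, x_bar = solve_relaxation(problem)
            if problem.is_feasible(x_bar, eps_feas):
                return x_bar                    # x_bar already solves P
            LB = max(LB, val_Q)
            # Upper bound: fix the integers to x_bar and solve the
            # non-convex NLP R locally, starting from x_bar.
            result = solve_restriction(problem, fix_to=x_bar, start=x_bar)
            if result is not None and result.value < UB:
                UB, x_best = result.value, result.point
            if UB - LB <= eps:
                break
            # Otherwise refine Q near x_bar and x_best (Section 2.3).
            problem.add_breakpoints(x_bar, x_best)
        return x_best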

At each iteration, the lower-bounding MINLP relaxation Q and the


upper-bounding NLP restriction R are redefined: What changes in Q are
the sets of breakpoints that refine the piecewise-linear approximation of
concave parts of the non-convex functions. At each iteration, the number of
breakpoint used increases, and so does the accuracy of the approximation.
What may change in R are the values of the fixed integer variables xj ,
j ∈ I . Moreover, what changes is the starting point given to the NLP
solver, derived from an optimal solution of the lower-bounding MINLP
relaxation Q.
Our algorithmic framework bears comparison with spatial branch-and-
bound, a successful technique in global optimization. In particular:
• during the refinement phase, the parts in which the approximation is poor are discovered, and the approximation is improved there; but we do so by adding one or more breakpoints instead of branching on a continuous variable as in spatial branch-and-bound;
• like spatial branch-and-bound, our approach is a rigorous global-
optimization algorithm rather than a heuristic;

[Fig. 4: The SC-MINLP (Sequential Convex MINLP) framework.]

• unlike spatial branch-and-bound, our approach does not utilize an expression tree; it works directly on the broad class of separable non-convex MINLPs of the form P, and of course on problems that can be put in such a form;
• unlike standard implementations of spatial branch-and-bound
methods, we can directly keep multivariate convex functions in
our relaxation instead of using linear approximations;
• unlike spatial branch-and-bound, our method can be effectively
implemented at the modeling-language level.

2.5. Convergence analysis. For the theoretical convergence analysis of Algorithm 1, we make the following assumptions, denoting by l the iteration counter for the repeat loop.
A1. The functions f (x) and ri (x) are continuous, and the univari-
ate functions gik in (P) are uniformly Lipschitz-continuous with
a bounded Lipschitz constant Lg .
A2. The sets of breakpoints (2.1) are correctly identified.
A3. The problem P has a feasible point. Hence, for each l, the relaxation Q^l is feasible, and we assume that its (globally) optimal solution x̄^l is computed.
A4. The refinement technique described in Section 2.3 adds a breakpoint for every lower-bounding problem solution x̄^l, even if it is very close to an existing breakpoint.
A5. The feasibility tolerance εfeas and the optimality gap tolerance ε
are both chosen to be zero, and no iteration limit is set.
Theorem 2.1. Under assumptions A1–A5, Algorithm 1 either terminates at a global solution of the original problem P, or each limit point of the sequence {x̄^l}_{l=1}^{∞} is a global solution of P.


Proof. By construction, Q^l is always a relaxation of P, and hence val(Q^l) is always less than or equal to the value val(P) of the objective function at a global solution of P.

If the algorithm terminates in a finite number of iterations, it either returns an iterate x̄^l that is feasible for P (after the feasibility test for P) and therefore a global solution of P, or it returns a best upper-bounding solution x^UB, which, as a solution of R, is feasible for P and has the same value as any global solution of P (since UB = LB).
In the case that the algorithm generates an infinite sequence {x̄^l} of iterates, let x∗ be a limit point of this sequence, i.e., there exists a subsequence {x̄^{l̃}}_{l̃=1}^{∞} of {x̄^l}_{l=1}^{∞} converging to x∗. As a solution of Q^{l̃}, each x̄^{l̃} satisfies the constraints of P, except for

r_i(x) + Σ_{k∈H(i)} g_{ik}(x_k) ≤ 0,  ∀i ∈ M,   (2.9)

because the "g_{ik}(x_k)" terms are replaced by a piecewise convex relaxation; see (2.2). We denote the values of this approximation in Q^l by g̃_{ik}^l(x_k). Then, the solutions of Q^{l̃} satisfy

r_i(x̄^{l̃}) + Σ_{k∈H(i)} g̃_{ik}^{l̃}(x̄_k^{l̃}) ≤ 0,  ∀i ∈ M.   (2.10)

Now choose l̃_1, l̃_2 with l̃_2 > l̃_1. Because the approximation g̃_{ik}^{l̃_2}(x_k) is defined to coincide with the convex parts of g_{ik}(x_k), and is otherwise a linear interpolation between breakpoints (note that x̄_k^{l̃_1} is a breakpoint for Q^{l̃_2} if x̄_k^{l̃_1} is in an interval where g_{ik}(x_k) is concave), the Lipschitz-continuity of the g_{ik}(x_k) gives us

g̃_{ik}^{l̃_2}(x̄_k^{l̃_2}) ≥ g_{ik}(x̄_k^{l̃_1}) − L_g |x̄_k^{l̃_2} − x̄_k^{l̃_1}|.

Together with (2.10), we therefore obtain

r_i(x̄^{l̃_2}) + Σ_{k∈H(i)} g_{ik}(x̄_k^{l̃_1}) ≤ r_i(x̄^{l̃_2}) + Σ_{k∈H(i)} g̃_{ik}^{l̃_2}(x̄_k^{l̃_2}) + Σ_{k∈H(i)} L_g |x̄_k^{l̃_2} − x̄_k^{l̃_1}| ≤ Σ_{k∈H(i)} L_g |x̄_k^{l̃_2} − x̄_k^{l̃_1}|

for all i ∈ M. Because l̃_1, l̃_2 with l̃_2 > l̃_1 have been chosen arbitrarily, taking the limit as l̃_1, l̃_2 → ∞ and using the continuity of r shows that x∗ satisfies (2.9). The continuity of the remaining constraints in P ensures that x∗ is feasible for P. Because val(Q^{l̃}) ≤ val(P) for all l̃, we finally obtain that the objective value of x∗ is at most val(P), so that x∗ must be a global solution of P.


3. Computational results. We implemented our algorithmic framework as an AMPL script, and we used MATLAB as a tool for numerical convexity analysis, BONMIN as our convex MINLP solver, and IPOPT as our NLP solver.
We used MATLAB to detect the subintervals of convexity and concavity
for the non-convex univariate functions in the model. In particular, MATLAB
reads a text file generated by the AMPL script, containing the constraints
with univariate non-convex functions, together with the names and bounds
of the independent variables. With this information, using the Symbolic
Math Toolbox, MATLAB first computes the formula for the second deriva-
tive of each univariate non-convex function, and then computes its zeros to
split the function into subintervals of convexity and concavity. The zeros
are computed in the following manner. The second derivative is evalu-
ated at 100 evenly-spaced points on the interval defined by the variable’s
lower and upper bounds. Then, between points where there is a change
in the sign of the second derivative, another 100 evenly-spaced points are
evaluated. Finally, the precise location of each breakpoint is computed us-
ing the MATLAB function “fzero”. For each univariate non-convex function,
we use MATLAB to return the number of subintervals, the breakpoints, and
associated function values in a text file which is read by the AMPL script.
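The procedure is easy to transcribe; the following Python sketch (ours, with plain bisection standing in for MATLAB's fzero, and g2 denoting the symbolically computed second derivative) mirrors the two-level grid search described above:

    def find_inflections(g2, lo, hi, coarse=100, fine=100, iters=60):
        """Return approximate zeros of g2 on [lo, hi]."""
        def grid(a, b, n):
            return [a + (b - a) * i / (n - 1) for i in range(n)]

        def bisect_root(a, b):              # bisection in place of fzero
            fa = g2(a)
            for _ in range(iters):
                m = 0.5 * (a + b)
                if fa * g2(m) <= 0:
                    b = m
                else:
                    a, fa = m, g2(m)
            return 0.5 * (a + b)

        roots = []
        pts = grid(lo, hi, coarse)          # 100 evenly-spaced points
        for a, b in zip(pts, pts[1:]):
            if g2(a) * g2(b) < 0:           # sign change of g''
                sub = grid(a, b, fine)      # another 100 points inside
                for c, d in zip(sub, sub[1:]):
                    if g2(c) * g2(d) < 0:
                        roots.append(bisect_root(c, d))
        return roots

    # Example: g(x) = x^3 has g''(x) = 6x, with one zero at x = 0.
    print(find_inflections(lambda x: 6.0 * x, -1.0, 1.0))   # ~[0.0]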
In this section we present computational results for four problem cat-
egories. The tests were executed on a single processor of an Intel Core2
CPU 6600, 2.40 GHz with 1.94 GB of RAM, using a time limit of 2 hours
per instance. The relative optimality gap and feasibility tolerance used for all the experiments are 10^{−4}, and we do not add a breakpoint if it would be within 10^{−5} of an existing breakpoint.
For each set of problems, we describe the non-convex MINLP model
P. Two tables with computational results exhibit the behavior of our
algorithm on some instances of each problem class. The first table presents
the iterations of our SC-MINLP Algorithm, with the columns labeled as
follows:
• instance: the instance name;
• var/int/cons: the total number of variables, the number of integer
variables, and the number of constraints in the convex relaxation
Q;
• iter #: the iteration count;
• LB: the value of the lower bound;
• UB: the value of the upper bound;
• int change: indicates whether the integer variables in the lower-bounding solution x̄ are different compared to the previous iteration;
• time MINLP: the CPU time needed to solve the convex MINLP
relaxation Q to optimality (in seconds);
• # br added: the number of breakpoints added at the end of the
previous iteration.

www.it-ebooks.info
328 CLAUDIA D’AMBROSIO, JON LEE, AND ANDREAS WÄCHTER

The second table presents comparisons of our SC-MINLP Algorithm with


COUENNE and BONMIN. COUENNE is an open-source Branch-and-Bound algo-
rithm aimed at the global solution of MINLP problems [2, 6]. It is an
exact method for the problems we address in this paper. BONMIN is an
open-source code for solving general MINLP problems [3, 4], but it is an
exact method only for convex MINLPs. Here, BONMIN’s nonlinear branch-
and-bound option was chosen. When used for solving non-convex MINLPs,
the solution returned is not guaranteed to be a global optimum. However,
a few heuristic options are available in BONMIN, specifically designed to
treat non-convex MINLPs. Here, we use the option that allows solving
the root node with a user-specified number of different randomly-chosen
starting points, continuing with the best solution found. This heuristic use
of BONMIN is in contrast to its use in SC-MINLP, where BONMIN is employed
only for the solution of the convex MINLP relaxation Q.
The columns in the second table have the following meaning:
• instance: the instance name;
• var/int/cons: the total number of variables, the number of integer
variables, and the number of constraints;
• for each approach, in particular SC-MINLP, COUENNE, BONMIN 1,
BONMIN 50, we report:
– time (LB): the CPU time (or the value of the lower bound (in
parentheses) if the time limit is reached);
– UB: the value of the upper bound.
BONMIN 1 and BONMIN 50 both refer to the use of BONMIN; they differ in the number of root-node solves: in the first case, the root node is solved just once, while in the second case, 50 randomly-generated starting points are given to the root-node NLP solver.
reached the time limit, we do not report the lower bound because BONMIN
cannot determine a valid lower bound for a non-convex problem.
3.1. Uncapacitated Facility Location (UFL) problem. The UFL application is presented in [12]. The set of customers is denoted by T and the set of facilities by K (w_{kt} is the fraction of the demand of customer t satisfied by facility k, for each t ∈ T, k ∈ K). Univariate non-convexity in the model arises due to nonlinear shipment costs. The UFL model formulation is as follows:
 
min Σ_{k∈K} C_k y_k + Σ_{t∈T} v_t
subject to
  v_t ≥ − Σ_{k∈K} S_{kt} s_{kt},  ∀t ∈ T;
  s_{kt} ≤ g_{kt}(w_{kt}),  ∀t ∈ T, k ∈ K;
  w_{kt} ≤ y_k,  ∀t ∈ T, k ∈ K;
  Σ_{k∈K} w_{kt} = 1,  ∀t ∈ T;
  w_{kt} ≥ 0,  ∀t ∈ T, k ∈ K;
  y_k ∈ {0, 1},  ∀k ∈ K.


Figure 5 depicts the three different nonlinear functions gkt (wkt ) that
were used for the computational results presented in Tables 1 and 2. The
dashed line depicts the non-convex function, while the solid line indicates
the initial piecewise-convex underestimator. Note that the third function
was intentionally designed to be pathological and challenging for SC-MINLP.
The remaining problem data was randomly generated.
In Table 1 the performance of SC-MINLP is shown. For the first instance, the global optimum is found at the first iteration, but 4 more iterations are needed to prove global optimality. For the second instance, only one iteration is needed. For the third instance, the first feasible solution found is not the global optimum, which is found at the third (and last) iteration. Table 2 demonstrates the good performance of SC-MINLP. In particular, instance ufl 1 is solved in about 117 seconds, compared to the 530 seconds needed by COUENNE, and instance ufl 2 in less than 18 seconds, compared to 233 seconds. On instance ufl 3, COUENNE performs better than SC-MINLP, but this instance is quite easy for both algorithms. BONMIN 1 finds solutions to all three instances very quickly, and these solutions turn out to be globally optimal (note, however, that BONMIN 1 is a heuristic algorithm and gives no guarantee of global optimality). BONMIN 50 also finds the three global optima, but in non-negligible time (greater than that needed by SC-MINLP for 2 out of 3 instances).

3.2. Hydro Unit Commitment and Scheduling problem. The Hydro Unit Commitment and Scheduling problem is described in [5]. Univariate non-convexity in the model arises due to the dependence of the power produced by each turbine on the water flow passing through the turbine. The following model was used for the computational results of Tables 3 and 4:
min − Σ_{j∈J} Σ_{t∈T} (Δt Π_t p_{jt} − C_j w̃_{jt} − (D_j + Π_t E_j) ỹ_{jt})
subject to
  v_t − V_t = 0;
  v_t − v_{t−1} − 3600 Δt (I_t − Σ_{j∈J} q_{jt} − s_t) = 0,  ∀t ∈ T;
  q_{jt} − (Q⁻_j u_{jt} + Q_j g_{jt}) ≥ 0,  ∀j ∈ J, t ∈ T;
  q_{jt} − (Q⁻_j u_{jt} + Q_j g_{jt}) ≤ 0,  ∀j ∈ J, t ∈ T;
  Σ_{j∈J} (q_{jt} − q_{j(t−1)}) + Δq⁻ ≥ 0,  ∀t ∈ T;
  Σ_{j∈J} (q_{jt} − q_{j(t−1)}) − Δq⁺ ≤ 0,  ∀t ∈ T;
  s_t − Σ_{j∈J} (W_j w̃_{jt} + Y_j ỹ_{jt}) ≥ 0,  ∀t ∈ T;
  Σ_{j∈J} q_{jt} + s_t − Θ ≥ 0,  ∀t ∈ T;
  g_{jt} − g_{j(t−1)} − (w̃_{jt} − w_{jt}) = 0,  ∀j ∈ J, t ∈ T;
  w̃_{jt} + w_{jt} ≤ 1,  ∀j ∈ J, t ∈ T;
  u_{jt} − u_{j(t−1)} − (ỹ_{jt} − y_{jt}) = 0,  ∀j ∈ J, t ∈ T;
  ỹ_{jt} + y_{jt} ≤ 1,  ∀j ∈ J, t ∈ T;
  g_{jt} + u_{kt} ≤ 1,  ∀j, k ∈ J, t ∈ T;
  Σ_{j∈J} u_{jt} ≤ n − 1,  ∀t ∈ T;
  p_{jt} − ϕ(q_{jt}) = 0,  ∀j ∈ J, t ∈ T.

[Fig. 5: UFL: Shapes of −g_{kt}(w_{kt}) for the three instances.]
Table 1: Results for Uncapacitated Facility Location problem.

  instance  var/int/cons  iter #  LB          UB          int change  time MINLP  # br added
  ufl 1     153/39/228    1       4,122.000   4,330.400   -           1.35        -
  ...                     2       4,324.780   4,330.400   no          11.84       11
  ...                     3       4,327.724   4,330.400   no          19.17       5
  ...                     4       4,328.993   4,330.400   no          30.75       5
            205/65/254    5       4,330.070   4,330.400   no          45.42       5
  ufl 2     189/57/264    1       27,516.600  27,516.569  -           4.47        -
  ufl 3     79/21/101     1       1,947.883   2,756.890   -           2.25        -
  ...                     2       2,064.267   2,756.890   no          2.75        2
            87/25/105     3       2,292.743   2,292.777   no          3.06        2

Table 2: Results for Uncapacitated Facility Location problem.

            var/int/cons  SC-MINLP             COUENNE              BONMIN 1           BONMIN 50
  instance  original      time (LB) / UB       time (LB) / UB       time / UB          time / UB
  ufl 1     45/3/48       116.47 / 4,330.400   529.49 / 4,330.400   0.32 / 4,330.400   369.85 / 4,330.400
  ufl 2     45/3/48       17.83 / 27,516.569   232.85 / 27,516.569  0.97 / 27,516.569  144.06 / 27,516.569
  ufl 3     32/2/36       8.44 / 2,292.777     0.73 / 2,292.775     3.08 / 2,292.777   3.13 / 2,292.775


Figure 6 shows the non-convex functions ϕ(q_{jt}) used for the three instances. The remaining problem data was chosen according to [5].

Our computational results are reported in Tables 3 and 4. We observe good performance of SC-MINLP: it is able to find the global optimum of all three instances within the time limit, whereas COUENNE does not solve any of the instances to global optimality. BONMIN 1 and BONMIN 50 also show good performance; in particular, a good solution is often found in a few seconds, and BONMIN 1 finds the global optimum in one case.

3.3. Nonlinear Continuous Knapsack problem. In this subsection, we present results for the Nonlinear Continuous Knapsack problem.
This purely continuous problem can be motivated as a continuous resource-
allocation problem. A limited resource of amount C (such as advertising
dollars) has to be partitioned for different categories j ∈ N (such as ad-
vertisements for different products). The objective function is the over-
all return from all categories. The non-convexity arises because a small
amount of allocation provides only a small return, up to a threshold, when
the advertisements are noticed by the consumers and result in substantial
sales. On the other hand, at some point, saturation sets in, and an increase
of advertisement no longer leads to a significant increase in sales.
Our model is the non-convex NLP problem:

  min − Σ_{j∈N} p_j
  subject to
    p_j − g_j(x_j) ≤ 0,  ∀j ∈ N;
    Σ_{j∈N} x_j ≤ C;
    0 ≤ x_j ≤ U,  ∀j ∈ N,

where g_j(x_j) = c_j / (1 + b_j e^{−a_j (x_j + d_j)}).


Note that the sigmoid function g_j(x_j) is increasing, and it is convex up to the inflection point at x = −d_j + ln(b_j)/a_j, whereupon it becomes concave.
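For example, the following snippet (ours, with made-up coefficients consistent with the instance generation described below) verifies the location of the inflection point numerically via a finite-difference second derivative:

    import math

    def g(x, a, b, c, d):
        return c / (1.0 + b * math.exp(-a * (x + d)))

    def g2(x, a, b, c, d, h=1e-4):
        # central finite-difference approximation of g''
        return (g(x + h, a, b, c, d) - 2 * g(x, a, b, c, d)
                + g(x - h, a, b, c, d)) / h**2

    a, b, c, d = 0.15, 40.0, 80.0, -60.0
    x_star = -d + math.log(b) / a            # inflection point
    print(g2(x_star - 5, a, b, c, d) > 0)    # True: convex side
    print(g2(x_star + 5, a, b, c, d) < 0)    # True: concave side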
The instances were generated randomly. In particular, independently
for each j ∈ N , aj was uniformly generated in the interval [0.1, 0.2], bj
and cj in the interval [0, U ] and dj in the interval [−U, 0]. The name of
each instance contains information about the values |N | and C, namely,
nck |N | C.
Our computational results are reported in Tables 5 and 6. SC-MINLP finds the global optimum for all 6 instances in less than 3 minutes, while COUENNE is able to close the gap for only 2 instances within the time limit. BONMIN 1 and BONMIN 50 terminate quickly, but the global optimum is found for only 1 instance by BONMIN 1 and for 2 instances by BONMIN 50.

Table 3: Results for Hydro Unit Commitment and Scheduling problem.

  instance  var/int/cons  iter #  LB           UB           int change  time MINLP  # br added
  hydro 1   324/142/445   1       -10,231.039  -10,140.763  -           18.02       -
            332/146/449   2       -10,140.760  -10,140.763  no          23.62       4
  hydro 2   324/142/445   1       -3,950.697   -3,891.224   -           21.73       -
  ...                     2       -3,950.583   -3,891.224   no          21.34       2
  ...                     3       -3,950.583   -3,891.224   no          27.86       2
            336/148/451   4       -3,932.182   -3,932.182   no          38.20       2
  hydro 3   324/142/445   1       -4,753.849   -4,634.409   -           59.33       -
  ...                     2       -4,719.927   -4,660.189   no          96.93       4
            336/148/451   3       -4,710.734   -4,710.734   yes         101.57      2

Table 4: Results for Hydro Unit Commitment and Scheduling problem.

            var/int/cons  SC-MINLP               COUENNE                     BONMIN 1             BONMIN 50
  instance  original      time (LB) / UB         time (LB) / UB              time / UB            time / UB
  hydro 1   124/62/165    107.77 / -10,140.763   (-11,229.80) / -10,140.763  5.03 / -10,140.763   5.75 / -7,620.435
  hydro 2   124/62/165    211.79 / -3,932.182    (-12,104.40) / -2,910.910   4.63 / -3,928.139    7.02 / -3,201.780
  hydro 3   124/62/165    337.77 / -4,710.734    (-12,104.40) / -3,703.070   5.12 / -4,131.095    13.76 / -3,951.199
[Fig. 6: Hydro UC: Shapes of −ϕ(q_{jt}) for the three instances: (a) instance hydro 1; (b) instance hydro 2; (c) instance hydro 3.]
Table 5: Results for Nonlinear Continuous Knapsack problem.

  instance    var/int/cons  iter #  LB        UB        int change  time MINLP  # br added
  nck 20 100  144/32/205    1       -162.444  -159.444  -           0.49        -
              146/33/206    2       -159.444  -159.444  -           0.94        1
  nck 20 200  144/32/205    1       -244.015  -238.053  -           0.67        -
  ...                       2       -241.805  -238.053  -           0.83        1
  ...                       3       -241.348  -238.053  -           1.16        1
  ...                       4       -240.518  -238.053  -           1.35        1
  ...                       5       -239.865  -238.053  -           1.56        1
  ...                       6       -239.744  -238.053  -           1.68        1
              156/38/211    7       -239.125  -239.125  -           1.81        1
  nck 20 450  144/32/205    1       -391.499  -391.337  -           0.79        -
              146/32/206    2       -391.364  -391.337  -           0.87        1
  nck 50 400  356/78/507    1       -518.121  -516.947  -           4.51        -
  ...                       2       -518.057  -516.947  -           14.94       2
  ...                       3       -517.837  -516.947  -           23.75       2
  ...                       4       -517.054  -516.947  -           25.07       2
              372/86/515    5       -516.947  -516.947  -           31.73       2
  nck 100 35  734/167/1035  1       -83.580   -79.060   -           3.72        -
  ...                       2       -82.126   -81.638   -           21.70       2
  ...                       3       -82.077   -81.638   -           6.45        2
              744/172/1040  4       -81.638   -81.638   -           11.19       1
  nck 100 80  734/167/1035  1       -174.841  -171.024  -           6.25        -
  ...                       2       -173.586  -172.631  -           24.71       2
              742/171/1039  3       -172.632  -172.632  -           12.85       2

Table 6: Results for Nonlinear Continuous Knapsack problem.

              var/int/cons  SC-MINLP            COUENNE                 BONMIN 1          BONMIN 50
  instance    original      time (LB) / UB      time (LB) / UB          time / UB         time / UB
  nck 20 100  40/0/21       15.76 / -159.444    3.29 / -159.444         0.02 / -159.444   1.10 / -159.444
  nck 20 200  40/0/21       23.76 / -239.125    (-352.86) / -238.053    0.03 / -238.053   0.97 / -239.125
  nck 20 450  40/0/21       15.52 / -391.337    (-474.606) / -383.149   0.07 / -348.460   0.84 / -385.546
  nck 50 400  100/0/51      134.25 / -516.947   (-1020.73) / -497.665   0.08 / -438.664   2.49 / -512.442
  nck 100 35  200/0/101     110.25 / -81.638    90.32 / -81.638         0.04 / -79.060    16.37 / -79.060
  nck 100 80  200/0/101     109.22 / -172.632   (-450.779) / -172.632   0.04 / -159.462   15.97 / -171.024
[Fig. 7: Example shapes of −g_j(x_j) for instance nck 20 100.]

3.4. GLOBALLib and MINLPLib instances. Our algorithm is suitable for problems in which all non-convexities are manifested as sums of univariate non-convex functions; moreover, the variables appearing in the univariate non-convex functions must be bounded. We selected 16 instances from GLOBALLib [11] and 9 from MINLPLib [15] to test our
the univariate non-convex functions should be bounded. We selected 16
instances from GLOBALLib [11] and 9 from MINLPLib [15] to test our
algorithm. These instances were selected because they were easily refor-
mulated to have these properties. Note that we are not advocating solution
of general global optimization problems via reformulation as problems of
the form P , followed by application of our algorithm SC-MINLP. We simply
produced these reformulations to enlarge our set of test problems.
To reformulate these instances so as to have only univariate non-convex functions, we used as a starting point the reformulation performed by COUENNE [2]. It reformulates MINLP problems in terms of the following "basic operators": sum, product, quotient, power, exp, log, sin, cos, abs. The only non-univariate basic operators are product and quotient (the power x^y is converted to e^{y log(x)}). In these cases we modified the reformulation of COUENNE in order to have only univariate non-convexities:


• product: xy is transformed into (w_1² − w_2²)/4 with w_1 = x + y and w_2 = x − y;
• quotient: x/y is transformed into xw with w = 1/y, and xw is then treated as any other product.
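A quick numerical check of this product reformulation (our sketch):

    # x*y = (w1^2 - w2^2)/4 with w1 = x + y and w2 = x - y, which leaves
    # only univariate non-convexities (the squares).
    def product_via_squares(x, y):
        w1, w2 = x + y, x - y
        return (w1**2 - w2**2) / 4.0

    assert product_via_squares(3.0, 7.0) == 21.0
    assert product_via_squares(-2.5, 4.0) == -10.0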
We report the GLOBALLib results in the first part of the tables and the MINLPLib results in the second part. The instances were selected among the ones reported in the computational results of a recent paper [2]. Some of these MINLPLib instances are originally convex, but their reformulation is non-convex; clearly, employing solvers designed for convex MINLPs, such as BONMIN, on such problems is much more efficient and advisable. However, we used the non-convex reformulations of these instances to extend our test bed. In Table 8, the second column reports the number of variables, integer variables, and constraints of the reformulated problem.
Table 8 shows that COUENNE performs much better than SC-MINLP on the GLOBALLib instances that we tested. But on such small instances COUENNE behaves very well in general, so there is little to recommend any alternative algorithm. BONMIN 1 and BONMIN 50 also perform well: they find global optima in 10 (resp. 13) out of the 16 GLOBALLib instances rather quickly. However, in 3 (resp. 1) instances they fail to produce a feasible solution.
Concerning the MINLPLib instances, 4 out of 9 instances are solved to
global optimality by SC-MINLP. COUENNE finishes within the imposed time
limit on precisely the same 4 instances (in one case, COUENNE with default
settings incorrectly returned a solution with a worse objective function).
For the 5 instances solved by neither solver within the imposed time limit, the lower bound given by SC-MINLP is always better (higher) than the one provided by COUENNE. This result emphasizes the quality of the
lower bound computed by the solution of the convex MINLP relaxation Q.
Moreover, the upper bound computed by SC-MINLP is better in 2 instances
out of 5. Concerning BONMIN 1 and BONMIN 50 performance, when they
terminate before the time limit is reached (4 instances out of 9), the solution
computed is globally optimal for 3 instances. Note that in these cases, the
CPU time needed by BONMIN 1 and BONMIN 50 is often greater than that
needed by SC-MINLP. When the time limit is reached, BONMIN 1 computes a better solution than SC-MINLP in only 1 instance out of 5 (for 1 instance they provide the same solution), and BONMIN 50 likewise computes a better solution than SC-MINLP in only 1 instance out of 5.
Finally, we note that for several instances of applying SC-MINLP, in
just the second iteration, the convex MINLP solver BONMIN was in the
midst of working when our self-imposed time limit of 2 hours was reached.
In many of these cases, much better lower bounds could be achieved by
increasing the time limit. Generally, substantial improvements in BONMIN
would produce a corresponding improvement in the results obtained with
SC-MINLP.

Table 7: Results for GLOBALLib and MINLPLib.

              reformulated     iter #  LB           UB           int change  time MINLP  # br added
  instance    var/int/cons
  ex14 2 1    239/39/358       1       0.000        0.000        -           5.35        -
  ex14 2 2    110/18/165       1       0.000        0.000        -           3.33        -
  ex14 2 6    323/53/428       1       0.000        0.000        -           7.02        -
  ex14 2 7    541/88/808       1       0.000        0.000        -           1.06        -
  ex2 1 1     27/5/38          1       -18.900      -16.500      -           0.01        -
  ...                          2       -18.318      -16.500      -           0.06        1
  ...                          3       -18.214      -16.500      -           0.12        1
  ...                          4       -18.000      -16.500      -           0.15        1
  ...                          5       -17.625      -17.000      -           0.22        2
              39/11/44         6       -17.000      -17.000      -           0.26        1
  ex2 1 2     29/5/40          1       -213.000     -213.000     -           0.01        -
  ex2 1 3     36/4/41          1       -15.000      -15.000      -           0.00        -
  ex2 1 4     15/1/16          1       -11.000      -11.000      -           0.00        -
  ex2 1 5     50/7/72          1       -269.453     -268.015     -           0.01        -
              54/9/74          2       -268.015     -268.015     -           0.15        2
  ex2 1 6     56/10/81         1       -44.400      -29.400      -           0.01        -
  ...                          2       -40.500      -39.000      -           0.16        2
  ...                          3       -40.158      -39.000      -           0.25        1
              66/15/86         4       -39.000      -39.000      -           0.52        2
  ex2 1 7     131/20/181       1       -13,698.362  -4,105.278   -           0.07        -
  ...                          2       -10,643.558  -4,105.278   -           1.34        11
  ...                          3       -8,219.738   -4,105.278   -           2.68        5
  ...                          4       -6,750.114   -4,105.278   -           4.66        10
  ...                          5       -5,450.142   -4,105.278   -           16.42       11
  ...                          6       -5,014.019   -4,105.278   -           32.40       6
  ...                          7       -4,740.743   -4,105.278   -           38.61       15
  ...                          8       -4,339.192   -4,150.410   -           86.47       4
  ...                          9       -4,309.425   -4,150.410   -           133.76      11
  ...                          10      -4,250.248   -4,150.410   -           240.50      6
  ...                          11      -4,156.125   -4,150.410   -           333.08      5
              309/109/270      12      -4,150.411   -4,150.410   -           476.47      5
  ex9 2 2     68/14/97         1       55.556       100.000      -           0.00        -
  ...                          2       78.679       100.000      -           0.83        10
  ...                          3       95.063       100.000      -           5.77        19
  ...                          4       98.742       100.000      -           18.51       8
  ...                          5       99.684       100.000      -           37.14       11
              184/72/155       6       99.960       100.000      -           56.00       10
  ex9 2 3     90/18/125        1       -30.000      0.000        -           0.00        -
  ...                          2       -30.000      0.000        -           2.29        12
  ...                          3       -30.000      0.000        -           6.30        12
  ...                          4       -30.000      0.000        -           8.48        12
  ...                          5       -24.318      0.000        -           30.44       22
  ...                          6       -23.393      0.000        -           16.25       10
  ...                          7       -21.831      0.000        -           31.92       12
  ...                          8       -19.282      0.000        -           26.53       12
  ...                          9       -7.724       0.000        -           203.14      17
              332/139/246      10      -0.000       0.000        -           5,426.48    12
  ex9 2 6     105/22/149       1       -1.500       3.000        -           0.02        -
  ...                          2       -1.500       1.000        -           9.30        28
  ...                          3       -1.500       0.544        -           34.29       15
  ...                          4       -1.500       -1.000       -           48.33       12
  ...                          5       -1.500       -1.000       -           130.04      22
  ...                          6       -1.500       -1.000       -           50.29       24
  ...                          7       -1.500       -1.000       -           53.66       13
  ...                          8       -1.500       -1.000       -           67.93       22
  ...                          9       -1.500       -1.000       -           103.13      14
  ...                          10      -1.366       -1.000       -           127.60      12
  ...                          11      -1.253       -1.000       -           1,106.78    22
  ...                          12      -1.116       -1.000       -           3,577.48    13
  ...                          13      -1.003       -1.000       -           587.11      15
              557/248/375      14      -1.003       -1.000       -           1,181.12    14
  ex5 2 5     802/120/1,149    1       -34,200.833  -3,500.000   -           0.10        -
              1,280/359/1,388  2       -23,563.500  -3,500.000   -           7,192.22    239
  ex5 3 3     1,025/181/1,474  1       -67,499.021  -            -           0.17        -
              1,333/335/1,628  2       -16,872.900  3.056        -           7,187.11    154
  du-opt      458/18/546       1       3.556        3.556        -           5.30        -
  du-opt5     455/15/545       1       8.073        8.073        -           9.38        -
  fo7         366/42/268       1       8.759        22.518       -           7,200       -
  m6          278/30/206       1       82.256       82.256       -           182.72      -
  no7 ar2 1   421/41/325       1       90.583       127.774      -           7,200       -
  no7 ar3 1   421/41/325       1       81.539       107.869      -           7,200       -
  no7 ar4 1   421/41/325       1       76.402       104.534      -           7,200       -
  o7 2        366/42/268       1       79.365       124.324      -           7,200       -
  stockcycle  626/432/290      1       119,948.675  119,948.676  -           244.67      -
Table 8: Results for GLOBALLib and MINLPLib.

              reformulated   SC-MINLP                    COUENNE                      BONMIN 1              BONMIN 50
  instance    var/int/cons   time (LB) / UB              time (LB) / UB               time / UB             time / UB
  ex14 2 1    122/0/124      21.81 / 0.000               0.18 / 0.000                 0.13 / 0.000          46.25 / 0.000
  ex14 2 2    56/0/57        12.86 / 0.000               0.10 / 0.000                 0.05 / 0.000          28.02 / 0.000
  ex14 2 6    164/0/166      42.56 / 0.000               1.03 / 0.000                 1.25 / 0.000          138.02 / 0.000
  ex14 2 7    277/0/280      128.70 / 0.000              0.63 / 0.000                 0.28 / 0.000          170.62 / 0.000
  ex2 1 1     12/0/8         2.84 / -17.000              0.16 / -17.000               0.04 / -11.925        0.89 / -17.000
  ex2 1 2     14/0/10        1.75 / -213.000             0.06 / -213.000              0.02 / -213.000       0.66 / -213.000
  ex2 1 3     24/0/17        1.62 / -15.000              0.04 / -15.000               0.02 / -15.000        0.36 / -15.000
  ex2 1 4     12/0/10        1.28 / -11.000              0.04 / -11.000               0.03 / -11.000        0.58 / -11.000
  ex2 1 5     29/0/30        2.21 / -268.015             0.13 / -268.015              0.03 / -268.015       0.96 / -268.015
  ex2 1 6     26/0/21        3.31 / -39.000              0.12 / -39.000               0.04 / -39.000        1.11 / -39.000
  ex2 1 7     71/0/61        1,388.00 / -4,150.410       3.98 / -4,150.410            0.04 / -4,150.410     13.64 / -4,150.410
  ex9 2 2     38/0/37        1,355.47 / 100.000          0.36 / 100.000               0.12 / -              2.65 / 100.000
  ex9 2 3     54/0/53        5,755.76 / 0.000            0.73 / 0.000                 0.08 / 15.000         2.85 / 5.000
  ex9 2 6     57/0/53        (-1.003) / -1.000           0.78 / -1.000                0.21 / -              4.04 / -1.000
  ex5 2 5     442/0/429      (-23,563.500) / -3,500.000  (-11,975.600) / -3,500.000   2.06 / -3,500.000     315.53 / -3,500.000
  ex5 3 3     563/0/550      (-16,872.900) / 3.056       (-16,895.400) / 3.056        20.30 / -             22.92 / -
  du-opt      242/18/230     13.52 / 3.556               38.04 / 3.556                41.89 / 3.556         289.88 / 3.556
  du-opt5     239/15/227     17.56 / 8.073               37.96 / 8.073                72.32 / 8.118         350.06 / 8.115
  fo7         338/42/437     (8.759) / 22.518            (1.95) / 22.833              - / 22.518            - / 24.380
  m6          254/30/327     185.13 / 82.256             54.13 / 82.256               154.42 / 82.256       211.49 / 82.256
  no7 ar2 1   394/41/551     (90.583) / 127.774          (73.78) / 111.141            - / 122.313           - / 107.871
  no7 ar3 1   394/41/551     (81.539) / 107.869          (42.19) / 113.810            - / 121.955           - / 119.092
  no7 ar4 1   394/41/551     (76.402) / 104.534          (54.85) / 98.518             - / 117.992           - / 124.217
  o7 2        338/42/437     (79.365) / 124.324          (5.85) / 133.988             - / 126.674           - / 130.241
  stockcycle  578/480/195    251.54 / 119,948.676        84.35 / 119,948.676*         321.89 / 119,948.676  328.03 / 119,948.676

  * This time and correct solution were obtained with non-default options of COUENNE (which failed with default settings).

4. Summary. In this paper, we proposed an algorithm for solving to global optimality a broad class of separable MINLPs. Our simple algorithm, implemented within the AMPL modeling language, works with a
lower-bounding convex MINLP relaxation and an upper-bounding non-
convex NLP restriction. For the definition of the lower-bounding prob-
lem, we identify subintervals of convexity and concavity for the univariate
functions using external calls to MATLAB; then we develop a convex MINLP
relaxation of the problem approximating the concave intervals of each non-
convex function with the linear relaxation. The subintervals are glued
together using binary variables. We iteratively refine our convex MINLP
relaxation by modifying it at the modeling level. The upper-bound is ob-
tained by fixing the integer variables in the original non-convex MINLP,
then locally solving the associated non-convex NLP restriction. We pre-
sented preliminary computational experiments on models from a variety of
application areas, including problems from GLOBALLib and MINLPLib.
We compared our algorithm with the open-source solvers COUENNE as an
exact approach, and BONMIN as a heuristic approach, obtaining significant
success.
Acknowledgments. This work was partially developed when the first
author was visiting the IBM T.J. Watson Research Center, and their sup-
port is gratefully acknowledged. We also thank Pietro Belotti for discus-
sions about the modification of COUENNE to reformulate GLOBALLib and
MINLPLib instances.

REFERENCES

[1] E. Beale and J. Tomlin, Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables, in Proc. of the 5th Int. Conf. on Operations Research, J. Lawrence, ed., 1970, pp. 447–454.
[2] P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter, Branching and
bounds tightening techniques for non-convex MINLP, Optimization Methods
and Software, 24 (2009), pp. 597–634.
[3] P. Bonami, L. Biegler, A. Conn, G. Cornuéjols, I. Grossmann, C. Laird,
J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An algorithmic
framework for convex mixed integer nonlinear programs, Discrete Optimiza-
tion, 5 (2008), pp. 186–204.
[4] BONMIN. projects.coin-or.org/Bonmin, v. 1.0.
[5] A. Borghetti, C. D’Ambrosio, A. Lodi, and S. Martello, An MILP approach
for short-term hydro scheduling and unit commitment with head-dependent
reservoir, IEEE Transactions on Power Systems, 23 (2008), pp. 1115–1124.
[6] COUENNE. projects.coin-or.org/Couenne, v. 0.1.
[7] C. D’Ambrosio, J. Lee, and A. Wächter, A global-optimization algorithm for
mixed-integer nonlinear programs having separable non-convexity, in Proc. of
17th Annual European Symposium on Algorithms (ESA), Copenhagen, Den-
mark. Lecture Notes in Computer Science, A. Fiat and P. Sander, eds., 5757
(2009), pp. 107–118.
[8] M. Duran and I. Grossmann, An outer-approximation algorithm for a class of
mixed-integer nonlinear programs, Mathematical Programming, 36 (1986),
pp. 307–339.

www.it-ebooks.info
AN ALGORITHMIC FRAMEWORK FOR MINLP 347

[9] R. Fletcher and S. Leyffer, Solving mixed integer nonlinear programs by outer
approximation, Mathematical Programming, 66 (1994), pp. 327–349.
[10] R. Fourer, D. Gay, and B. Kernighan, AMPL: A Modeling Language for
Mathematical Programming, Duxbury Press/Brooks/Cole Publishing Co., sec-
ond ed., 2003.
[11] GLOBALLib. www.gamsworld.org/global/globallib.htm.
[12] O. Günlük, J. Lee, and R. Weismantel, MINLP strengthening for separable con-
vex quadratic transportation-cost UFL, 2007. IBM Research Report RC24213.
[13] L. Liberti, Writing global optimization software, in Global Optimization: From
Theory to Implementation, L. Liberti and N. Maculan, eds., Springer, Berlin,
2006, pp. 211–262.
[14] MATLAB. www.mathworks.com/products/matlab/, R2007a.
[15] MINLPLib. www.gamsworld.org/minlp/minlplib.htm.
[16] I. Nowak, H. Alperin, and S. Vigerske, LaGO – an object oriented library for
solving MINLPs, in Global Optimization and Constraint Satisfaction, vol. 2861
of Lecture Notes in Computer Science, Springer, Berlin Heidelberg, 2003,
pp. 32–42.
[17] I. Quesada and I. Grossmann, An LP/NLP based branch and bound algorithm for convex MINLP optimization problems, Computers & Chemical Engineering, 16 (1992), pp. 937–947.
[18] N. Sahinidis, BARON: A general purpose global optimization software package,
J. Global Opt., 8 (1996), pp. 201–205.
[19] A. Wächter and L.T. Biegler, On the implementation of a primal-dual inte-
rior point filter line search algorithm for large-scale nonlinear programming,
Mathematical Programming, 106 (2006), pp. 25–57.

GLOBAL OPTIMIZATION OF MIXED-INTEGER
SIGNOMIAL PROGRAMMING PROBLEMS
ANDREAS LUNDELL∗ AND TAPIO WESTERLUND∗

Abstract. Described in this chapter is a global optimization algorithm for mixed-integer nonlinear programming problems containing signomial functions. The method
obtains a convex relaxation of the nonconvex problem through reformulations using
single-variable transformations in combination with piecewise linear approximations of
the inverse transformations. The solution of the relaxed problems converges to the
global optimal solution as the piecewise linear approximations are improved iteratively.
To illustrate how the algorithm can be used to solve problems to global optimality, a
numerical example is also included.

Key words. Nonconvex optimization, mixed-integer nonlinear programming, signomial functions, convex reformulations.

AMS(MOS) subject classifications. 90C11, 90C26, 90C30.

1. Introduction. In this chapter, an algorithm for finding the global optimal solution to mixed-integer nonlinear programming (MINLP) problems containing signomial functions is described. The algorithm employs
lems containing signomial functions is described. The algorithm employs
single-variable transformations convexifying the nonconvex signomial func-
tions termwise. When the inverse transformations are approximated with
piecewise linear functions, the feasible region of the reformulated problem
is overestimated, i.e., the solution of this problem is a lower bound of the
solution of the original problem. Also included in the technique for solving
these kinds of problems to global optimality is a preprocessor for deter-
mining an optimized set of transformations. By adding new breakpoints to
the piecewise linear functions, the feasible region can be made tighter and
tighter until the global optimal solution of the original problem is obtained.
Optimization problems involving signomial functions appear in many
different application areas, since, for instance, all polynomial, bilinear and
trilinear functions are special cases of signomials. Some special applica-
tions are in design of heat exchanger networks [5, 7], chemical reactors [6],
optimal condensers [2], delta-sigma modulator topologies [11] and induc-
tors [12], as well as, chemical equilibrium [24], optimal control [13], image
restoration [28] and trim-loss minimization [10] problems.
The results presented in this chapter are a review of previously published results from, e.g., [15, 16, 18, 25, 31, 33].
2. The problem formulation. The class of MINLP problems which
can be solved to global optimality using the techniques in this chapter is
of the mixed-integer signomial programming (MISP) type.

∗ Process Design and Systems Engineering, Åbo Akademi University, FIN-20500

Turku, Finland ({andreas.lundell, tapio.westerlund}@abo.fi).

J. Lee and S. Leyffer (eds.), Mixed Integer Nonlinear Programming, The IMA Volumes 349
in Mathematics and its Applications 154, DOI 10.1007/978-1-4614-1927-3_12,
© Springer Science+Business Media, LLC 2012
www.it-ebooks.info
350 ANDREAS LUNDELL AND TAPIO WESTERLUND

Definition 2.1. A MISP problem can be formulated as

minimize   f(x),
subject to Ax = a,  Bx ≤ b,
           g(x) ≤ 0,                         (2.1)
           q(x) + σ(x) ≤ 0.

The vector of variables x = [x_1, x_2, …, x_N] may consist of both continuous and discrete variables. The objective function f is assumed to be convex
and Ax = a and Bx ≤ b are linear equality and inequality constraints
respectively. The constraints g(x) ≤ 0 are nonlinear convex inequality
constraints, and the constraints q(x) + σ(x) ≤ 0 are generalized signo-
mial constraints, consisting of sums of convex functions q(x) and signomial
functions σ(x). A signomial objective function s(x) can also be used, by
introducing the variable μ as the new objective function, and including the
original objective function as the signomial constraint s(x) − μ ≤ 0.
A signomial function is a sum of signomial terms, where each term
consists of products of power functions. Thus, a signomial function of N
variables and J signomial terms can be expressed mathematically as


σ(x) = Σ_{j=1}^{J} c_j Π_{i=1}^{N} x_i^{p_{ji}},   (2.2)

where the coefficients cj and the powers pji are real-valued. A posynomial
term is a positive signomial term, so signomial functions are generalizations
of posynomial functions, since the terms are allowed to be both positive
and negative. Note that, if a certain variable xi does not exist in the j-th
term, then pji = 0.
The variables xi are allowed to be reals or integers. Since imaginary
solutions to the problems are not allowed, negative values on a variable
appearing in a signomial term with noninteger power must be excluded.
Also, zero has to be excluded in case a power is negative. Thus, all variables
occurring in the signomial terms are assumed to have a positive fixed lower
bound. For variables having a lower bound of zero, this bound may be
approximated with a small positive lower bound of  > 0. Furthermore,
translations of the form xi = xi + τi , where τi > | min xi |, may be used
for variables with a nonpositive lower bound. Note however, that using
translations may introduce additional variables and signomial terms into
the problem.
Signomials are often highly nonlinear and nonconvex. Unfortunately,
convex envelopes are only known for some special cases, for instance, so-
called McCormick envelopes for bilinear terms [22]. Therefore, other tech-
niques for dealing with optimization problems containing signomial func-
tions are needed, including the αBB underestimator [1, 8] or the methods


used in BARON [27, 29]. These techniques are not confined to convexifying signomial functions only; rather, they can be applied to larger classes of nonconvex functions.
In the transformation method presented here, based on single-variable
power and exponential transformations, the convexity of signomial func-
tions is guaranteed termwise. This is, however, only a sufficient con-
dition for convexity; for instance, the signomial function f (x1 , x2 ) =
x21 + 2x1 x2 + x22 is convex although the middle term is nonconvex. In
the next theorem convexity conditions for signomial terms, first derived in
[21], are given.
Theorem 2.1. The positive signomial term s(x) = c · x_1^{p_1} ··· x_N^{p_N}, where c > 0, is convex if one of the following two conditions is fulfilled: (i) all powers p_i are negative, or (ii) one power p_k is positive, the rest of the powers p_i, i ≠ k, are negative, and the sum of the powers is greater than or equal to one, i.e.,

Σ_{i=1}^{N} p_i ≥ 1.   (2.3)

The negative signomial term s(x) = c · x_1^{p_1} ··· x_N^{p_N}, where c < 0, is convex if all powers p_i are positive and the sum of the powers is between zero and one, i.e.,

0 ≤ Σ_{i=1}^{N} p_i ≤ 1.   (2.4)

By using the previous theorem, it is easy to determine the convexity of any signomial term, as illustrated in the following example.
Example 1. For x_1, x_2 > 0, the following signomial terms are convex:

1/(x_1 x_2) = x_1^{−1} x_2^{−1},   x_1²/x_2 = x_1² x_2^{−1}   and   −√x_1 = −x_1^{0.5},   (2.5)

while the following terms are nonconvex:

x_1/x_2 = x_1 x_2^{−1},   √x_1 = x_1^{0.5}   and   −x_1 x_2.   (2.6)
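The conditions of Theorem 2.1 are easy to code; the following Python sketch (ours) checks termwise convexity and reproduces the classifications of Example 1:

    def term_is_convex(c, powers):
        """Termwise convexity test for c * x1^p1 * ... * xN^pN (x > 0)."""
        if c > 0:
            if all(p < 0 for p in powers):
                return True                     # condition (i)
            pos = [p for p in powers if p > 0]
            return (len(pos) == 1
                    and all(p < 0 for p in powers if p != pos[0])
                    and sum(powers) >= 1)       # condition (ii)
        else:
            return all(p > 0 for p in powers) and 0 <= sum(powers) <= 1

    # Convex terms from Example 1 ...
    print(term_is_convex(1, [-1, -1]))   # 1/(x1*x2)   -> True
    print(term_is_convex(1, [2, -1]))    # x1^2/x2     -> True
    print(term_is_convex(-1, [0.5]))     # -sqrt(x1)   -> True
    # ... and nonconvex ones
    print(term_is_convex(1, [1, -1]))    # x1/x2       -> False
    print(term_is_convex(1, [0.5]))      # sqrt(x1)    -> False
    print(term_is_convex(-1, [1, 1]))    # -x1*x2      -> False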

In Section 3, it is explained how single-variable power transformations convexifying general nonconvex signomial terms can be deduced using Theorem 2.1.
2.1. Piecewise linear functions using special ordered sets. In
the transformation technique presented in the next section, piecewise linear
functions (PLFs) play an important role. There are many different ways
of expressing PLFs; here, one using so-called special ordered sets (SOS) is

given, but other variants, including methods using binary variables, can be found in [9]. However, SOS formulations are often computationally more efficient in optimization problems than those using binary variables.

[Fig. 1: A PLF using the SOS type 2 formulation, with breakpoints x_1 < x_2 < x_3; e.g., w_1 = 0.5, w_2 = 0.5 or w_2 = 0.25, w_3 = 0.75.]
A special ordered set of type 2 is defined as a set of integers, con-
tinuous or mixed-integer and continuous variables, where at most two
variables in the set are nonzero, and if there are two nonzero variables,
these must be adjacent in the set. For example, the sets {1, 0, . . . , 0} and
{0, a, b, 0 . . . , 0}, a, b ∈ R are SOS type 2 sets.
In the PLF formulation from [3] presented here, one SOS of type 2, {w_k}_{k=1}^{K}, is used, with the additional conditions that all variables w_k assume real values between zero and one, and that the sum of the variables in the set is equal to one, i.e.,

∀ k = 1, …, K:  0 ≤ w_k ≤ 1   and   Σ_{k=1}^{K} w_k = 1.   (2.7)

The variable x ∈ [x̲, x̄] can then be expressed as

x = Σ_{k=1}^{K} x_k w_k,   (2.8)

and the PLF approximating the function f in the interval [x̲, x̄] becomes

f̂(x) = Σ_{k=1}^{K} X_k w_k,   (2.9)

where the function f assumes the values X_k = f(x_k) at the K consecutive points x_k ∈ [x̲, x̄], x_1 < x_2 < ··· < x_K. In this formulation, only one additional variable w_k is required for each breakpoint added to the PLF. A PLF approximation using the SOS type 2 formulation is illustrated in Figure 1.
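As an illustration, the following Python sketch (ours) evaluates the PLF (2.9) at a point x by constructing the SOS2 weight vector of (2.7)–(2.8):

    import bisect, math

    def plf(x, xs, Xs):
        """Evaluate the PLF through (xs[k], Xs[k]) at x in [xs[0], xs[-1]]."""
        k = bisect.bisect_right(xs, x) - 1      # index of the active segment
        k = min(max(k, 0), len(xs) - 2)
        t = (x - xs[k]) / (xs[k + 1] - xs[k])
        w = [0.0] * len(xs)                     # the SOS2 set {w_k}
        w[k], w[k + 1] = 1.0 - t, t             # two adjacent nonzeros, sum 1
        return sum(wk * Xk for wk, Xk in zip(w, Xs))

    xs = [1.0, 2.0, 4.0, 8.0]
    Xs = [math.log(v) for v in xs]              # PLF approximation of ln(x)
    print(plf(3.0, xs, Xs))                     # ~1.0397, vs ln(3) ~ 1.0986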
3. The transformation technique. In this section, the transformation technique for signomial functions in MINLP problems of the type in Definition 2.1 is presented in more detail. First, the transformation schemes for positive and negative signomial terms are presented, and then some theoretical results regarding the relationships between the different transformations are given.

The transformation procedure for an individual generalized signomial constraint can be illustrated as follows:

q_m(x) + σ_m(x) ≤ 0   →(i)   q_m(x) + σ_m^C(x, X) ≤ 0   →(ii)   q_m(x) + σ_m^C(x, X̂) ≤ 0.   (3.1)

In step (i), the nonconvex signomial terms in the signomial function σ_m(x) are convexified using single-variable transformations x_i = T_{ji}(X_{ji}), where X_{ji} is the new transformation variable. In this step, the problem is reformulated so that the generalized signomial constraints are convex, but the problem itself is still nonconvex, since the nonlinear equality constraints representing the relations between the transformation variables and the original variables, i.e., X_{ji} = T_{ji}^{−1}(x_i), must be included. In step (ii), however, these relations are approximated with PLFs, in such a way that the relaxed and convex feasible region of the transformed problem overestimates that of the original problem. Thus, the solution of the transformed problem is a lower bound on that of the original problem. The lower bound can then be improved by iteratively adding more breakpoints to the PLFs, and, if the breakpoints are chosen in a certain way, the solutions of the approximated problems form a sequence converging to the global optimal solution of the original nonconvex problem.

The transformation technique using single-variable transformations has been studied previously in many papers, e.g., [4, 15, 25, 31].
3.1. Transformations for positive terms. A positive signomial term is convexified using single-variable power transformations (PTs), as well as underestimated by expressing the relation between the original and transformation variables with PLFs, according to

s(x) = c Π_i x_i^{p_i} = c Π_{i: p_i<0} x_i^{p_i} · Π_{i: p_i>0} X_i^{p_i Q_i} = s^C(x, X) ≥ s^C(x, X̂),   (3.2)

as long as the transformation powers Q_i fulfill certain conditions. There are two different types of transformation schemes using PTs – the negative power transformation (NPT) and the positive power transformation (PPT) – corresponding to the two different cases for the convexity of positive terms in Theorem 2.1. Note that, for a positive term, only the variables x_i having a positive power p_i must be transformed.
Definition 3.1. The NPT convex underestimator for a positive signomial term is obtained by applying the transformation

x_i = X_i^{Q_i},  Q_i < 0,   (3.3)

to all variables x_i with positive powers (p_i > 0), as long as the inverse transformation X_i = x_i^{1/Q_i} is approximated by a PLF X̂_i.

The transformation technique in [14] and [30] is similar to the NPT, in the respect that it employs single-variable PTs with negative powers approximated by linear functions; it is, however, designed to be used in a branch-and-bound type framework.

Definition 3.2. The PPT convex underestimator for a positive signomial term is obtained by applying the transformation

x_i = X_i^{Q_i},   (3.4)

to all variables with positive powers, where the transformation powers Q_i < 0 for all indices i except for one (i = k), for which Q_k ≥ 1. Furthermore, the condition

Σ_{i: p_i>0} p_i Q_i + Σ_{i: p_i<0} p_i ≥ 1   (3.5)

must be fulfilled and the inverse transformation X_i = x_i^{1/Q_i} approximated by a PLF X̂_i.
There is also another transformation scheme available for convexifying positive signomial terms, namely the exponential transformation (ET). The single-variable ET has long been used for the reformulation of nonconvex geometric programming problems. This transformation is based on the fact that the function

f(x) = c · e^{p_1 x_1 + p_2 x_2 + … + p_i x_i} · x_{i+1}^{p_{i+1}} x_{i+2}^{p_{i+2}} ··· x_I^{p_I},   (3.6)

where c > 0, p_1, …, p_i > 0 and p_{i+1}, …, p_I < 0, is convex on R_+^n. Using the ET, a nonconvex positive signomial term is convexified and underestimated according to

s(x) = c Π_i x_i^{p_i} = c Π_{i: p_i<0} x_i^{p_i} · Π_{i: p_i>0} e^{p_i X_i} = s^C(x, X) ≥ s^C(x, X̂).   (3.7)

As in the NPT and PPT, only the variables x_i having a positive power p_i must be transformed.

Definition 3.3. The ET convex underestimator for a positive signomial term is obtained by applying the transformation

x_i = e^{X_i}   (3.8)

to the individual variables with positive powers, as long as the inverse transformation X_i = ln x_i is approximated by a PLF X̂_i.


3.1.1. Examples of the transformation technique. In the following two examples, the transformations in Definitions 3.1–3.3 are used to find convex underestimators for two signomial functions.

Example 2. The convex underestimator for the function f(x_1, x_2) = x_1 x_2 obtained when applying the ET to the function is given by

f̂_E(X̂_{1,E}, X̂_{2,E}) = e^{X̂_{1,E}} e^{X̂_{2,E}},   (3.9)

where the inverse transformations X_{1,E} = ln x_1 and X_{2,E} = ln x_2 have been replaced with the PLF approximations X̂_{1,E} and X̂_{2,E}, respectively. Applying either of the PTs to the function gives the convex underestimator

f̂_P(X̂_{1,P}, X̂_{2,P}) = X̂_{1,P}^{Q_1} X̂_{2,P}^{Q_2},   (3.10)

where the inverse transformations X_{1,P} = x_1^{1/Q_1} and X_{2,P} = x_2^{1/Q_2} have been replaced with the PLF approximations X̂_{1,P} and X̂_{2,P}, respectively. If the transformation used is the NPT, both Q_1 and Q_2 must be negative, and if the PPT is used, one of these must be positive, the other negative, and Q_1 + Q_2 ≥ 1. For example, Q_1 = Q_2 = −1, or Q_1 = 2 and Q_2 = −1, give convex underestimators for f(x_1, x_2).
Example 3. The convex underestimator for the function f(x_1, x_2) = x_1/x_2 = x_1 x_2^{−1} obtained when applying the ET to the function is given by

f̂_E(X̂_{1,E}, x_2) = e^{X̂_{1,E}} x_2^{−1},   (3.11)

where the inverse transformation X_{1,E} = ln x_1 has been replaced with the PLF approximation X̂_{1,E}, i.e., only one variable is transformed. When applying either of the PTs to the function, the convex underestimator becomes

f̂_P(X̂_{1,P}, x_2) = X̂_{1,P}^{Q_1} x_2^{−1},   (3.12)

where the inverse transformation X_{1,P} = x_1^{1/Q_1} has been replaced with the PLF approximation X̂_{1,P}. If the transformation used is the NPT, Q_1 must be negative, and if the PPT is used, Q_1 must be positive and Q_1 − 1 ≥ 1, i.e., Q_1 ≥ 2. For example, Q_1 = 2 can be used. Also in these cases, only one transformation is needed.
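The underestimation in Example 2 can also be verified numerically; the sketch below (ours, with arbitrary breakpoints) builds the PLF of ln x on {1, 2, 4} and confirms that e^{X̂_1} e^{X̂_2} ≤ x_1 x_2 (linear interpolation underestimates the concave ln, and exp is increasing):

    import math

    def plf_ln(x, xs):
        """Piecewise linear interpolation of ln between sorted breakpoints."""
        for k in range(len(xs) - 1):
            if xs[k] <= x <= xs[k + 1]:
                t = (x - xs[k]) / (xs[k + 1] - xs[k])
                return (1 - t) * math.log(xs[k]) + t * math.log(xs[k + 1])
        raise ValueError("x outside the breakpoint range")

    xs = [1.0, 2.0, 4.0]
    x1, x2 = 3.0, 1.5
    under = math.exp(plf_ln(x1, xs)) * math.exp(plf_ln(x2, xs))
    print(under, "<=", x1 * x2)   # ~4.0 <= 4.5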
3.2. Transformations for negative terms. In a similar way as for positive terms, a negative signomial term is convexified and underestimated using single-variable PTs according to

s(x) = c Π_i x_i^{p_i} = c Π_{i: p_i<0} X_i^{p_i Q_i} · Π_{i: p_i>0} X_i^{p_i Q_i} = s^C(x, X) ≥ s^C(x, X̂).   (3.13)

[Fig. 2: Overviews of how the bilinear terms x_1 x_2 and −x_1 x_2 are transformed using the PTs into X_1^{Q_1} X_2^{Q_2}: (a) PPT and NPT; (b) PT for negative terms. The colored regions indicate convex transformations.]

However, as the convexity conditions for negative terms are different according to Theorem 2.1, the requirements on the powers Q_i in the transformations are also different. For example, variables with negative powers also require transformations.

Definition 3.4. A convex underestimator for a negative signomial term is obtained by applying the transformation

x_i = X_i^{Q_i},   (3.14)

where 0 < Q_i ≤ 1 for all variables with positive powers and Q_i < 0 for all variables with negative powers, to the individual variables in the term. Furthermore, the condition

0 < Σ_i p_i Q_i ≤ 1   (3.15)

must be fulfilled and the inverse transformation X_i = x_i^{1/Q_i} approximated by a PLF X̂_i.
3.2.1. Example of the transformation technique. In the following example, a convex underestimator for a negative signomial term is obtained using the previous definition.

Example 4. The convex underestimator for the function f(x_1, x_2) = −x_1 x_2^{−1}, obtained by applying the PTs for negative terms, is given by

f̂_P(X̂_{1,P}, X̂_{2,P}) = −X̂_{1,P}^{Q_1} X̂_{2,P}^{−1·Q_2},   (3.16)

where the inverse transformations X_{1,P} = x_1^{1/Q_1} and X_{2,P} = x_2^{1/Q_2} have been replaced with the PLF approximations X̂_{1,P} and X̂_{2,P}, respectively. In this case, Q_1 must be positive and Q_2 negative for the powers in the convexified term, 1·Q_1 and −1·Q_2, to both be positive. Furthermore, the sum of the powers in the transformed term needs to be less than or equal to one, so, for example, Q_1 = 1/2 and Q_2 = −1/2 give a valid transformation.
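A compact validity check for Definition 3.4 (our sketch; the helper name is made up) confirms the choice in Example 4:

    def valid_negative_term_transform(p, Q):
        """Check the sign pattern of Definition 3.4 and condition (3.15)."""
        signs_ok = all((0 < Qi <= 1 if pi > 0 else Qi < 0)
                       for pi, Qi in zip(p, Q) if pi != 0)
        total = sum(pi * Qi for pi, Qi in zip(p, Q))
        return signs_ok and 0 < total <= 1              # condition (3.15)

    # Example 4: -x1 * x2^(-1) with Q1 = 1/2, Q2 = -1/2.
    print(valid_negative_term_transform([1, -1], [0.5, -0.5]))   # True
    print(valid_negative_term_transform([1, -1], [2.0, -0.5]))   # False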
3.3. Relationships between the transformations. Application of the single-variable transformations, together with the approximation using PLFs, results in convex underestimators of the original nonconvex term. Depending on which transformation is used, and, in the case of the PTs, on the value of the transformation power Q, different underestimation errors occur. In this section, some results regarding the relationships between the transformations, connected to their underestimation properties, are summarized. More detail on this subject, as well as proofs of the results, can be found in [15] and [18].
The first result concerns the underestimation error resulting from transforming an individual nonconvex power function x^p.

Theorem 3.1. Assume that a single-variable ET and single-variable PTs with positive and negative powers, i.e., the transformations

x = e^{X_E},   x = X_P^{Q_P}, Q_P ≥ 1,   and   x = X_N^{Q_N}, Q_N < 0,

are applied to the power function x^p, p > 0, where x ∈ R_+ or x ∈ Z and x ∈ [x̲, x̄], x̲ > 0. Then the following is true:

∀x ∈ [x̲, x̄]:   (X̂_P^{Q_P})^p ≥ (e^{X̂_E})^p ≥ (X̂_N^{Q_N})^p,   (3.17)

when the inverse transformations

X_E = ln x,   X_P = x^{1/Q_P}   and   X_N = x^{1/Q_N}

have been replaced by the PLFs X̂_E, X̂_P and X̂_N, respectively.
Although this theorem states that any PT with positive transformation
power always gives a tighter convex underestimator than the ET and the
ET a tighter convex underestimator than any PT with negative transfor-
mation power, the limit of the single-variable PTs when the transformation
power Q tends to plus or minus infinity is actually the single-variable ET:
Theorem 3.2. For the piecewise linear approximations $\hat{X}_P$, $\hat{X}_N$ and $\hat{X}_E$ of the single-variable PTs with positive and negative powers and the ET, respectively, the following statement is true:

\[ \forall x \in [\underline{x}, \overline{x}] : \quad \lim_{Q \to \infty} \hat{X}_P^{Q} = e^{\hat{X}_E} = \lim_{Q \to -\infty} \hat{X}_N^{Q}, \tag{3.18} \]

i.e., a single-variable PT with positive or negative power tends to a single-variable ET as the transformation power tends to plus or minus infinity, respectively.


Using the results from the previous theorems, the following theorem
regarding the underestimation properties of a general positive signomial
term can be obtained:
Theorem 3.3. For a general nonconvex signomial term transformed
using the ET, NPT or PPT, the following statements are true:
(i) The ET always gives a tighter underestimator than the NPT.
(ii) The PPT gives a tighter underestimator than the NPT as long as the
transformation powers Qi,N and Qi,P in the NPT and PPT respectively,
fulfill the condition

\[ \forall\, i : p_i > 0,\ i \neq k : \quad Q_{i,P} \le Q_{i,N}. \tag{3.19} \]

(iii) Neither the PPT nor the ET gives a tighter convex underestimator over the whole domain.
4. The SGO algorithm. The signomial global optimization (SGO)
algorithm is a method for solving nonconvex MISP problems to global
optimality as a sequence of overestimated convex MINLP subproblems. It
is based on the generalized geometric programming extended cutting plane
(GGPECP) algorithm from [31] and [33]. The SGO algorithm in the form
given here was presented in [15] and [19]. One of the major differences
between the original GGPECP algorithm and the newer SGO algorithm,
is that the latter includes a preprocessing step, in which the single-variable
transformations convexifying the signomial terms are obtained by solving
a mixed-integer linear programming (MILP) problem. Furthermore, the
SGO algorithm can use any convex MINLP solver for the subproblems,
whereas the GGPECP algorithm, as the name indicates, uses the solver
αECP exclusively.
The SGO algorithm solves problems of the MISP type to global optimality as a sequence of converging convex subproblems. In each iteration,
the approximation of the overestimated feasible region of the reformulated
problem is improved by adding additional breakpoints to the PLFs de-
scribing the relationships between the transformation variables and origi-
nal variables. However, as shown in [31], it is only possible to guarantee
convergence to the global solution if the breakpoints are chosen in a certain
way. The strategies for selecting the breakpoints are described in more de-
tail later on in this section. Additionally, a flowchart illustrating the SGO
algorithm is included in Figure 3.
4.1. The preprocessing step. There are many degrees of freedom
regarding how to choose the transformations for the nonconvex signomial
terms in the MISP problem. For positive terms, the first choice is whether
to use the ET or either of the PTs on an individual term, and if either of the
PTs is chosen, the transformation power must also be selected. For nega-
tive terms, only PTs are applicable, however, also here the transformation
power can be selected in infinitely many different ways.


[Figure 3 shows the SGO loop: (1) preprocessing step: solve the MILP problem to obtain the required transformations; (2) select initial breakpoints for the PLFs; (3) solve the relaxed, convexified and overestimated problem using a convex MINLP solver; (4) if the solution fulfills the termination criteria, the optimal solution has been found; otherwise, add new breakpoints to one or more PLFs and return to (3).]

Fig. 3: A flowchart of the SGO algorithm.

Since the goal is to apply the underestimation technique in an automatic manner, a scheme for selecting the transformations appropriately is needed. In [16], a MILP problem formulation for deter-
mining an optimized set of transformations, subject to the values of certain
strategy parameters, was presented. It was later extended in [17] to also in-
clude the ET for positive signomial terms. Finally, in [20] some additional
enhancements were introduced.
By using the MILP method, it is possible to find an optimized set of transformations subject to certain strategy parameters. For instance, one can obtain the set containing the smallest number of transformations required to transform the problem, or the set of transformations in which the fewest original variables are transformed. The latter is important, since variables and expressions can be reused for the PLFs corresponding to transformations of the same variable, even if the transformations used are not identical, leading to a combinatorially simpler reformulated MISP problem.


4.2. Solving the convex MINLP problem. After the transfor-


mations have been determined, the nonconvex signomial constraints are
convexified using the single-variable transformations $x_i = T_{ji}(X_{ji})$. Fur-
thermore, the relationships between the inverse transformation and the
original variables are expressed using PLFs, for example using the SOS
formulation in Section 2.1. After this, the convexified and overestimated
MISP problem can, in principle, be solved using any convex MINLP solver.
4.3. Termination criteria. When the solution to the overestimated
problem has been found, it is checked whether this solution fulfills the
constraints in the original nonconvex problem. Mathematically, this can be stated as an $\epsilon$-criterion, for the $m = 1, \dots, M$ generalized signomial constraints to be satisfied, as

\[ \max_m \left( q_m(x^*) + \sigma_m(x^*) \right) \le \epsilon_T, \tag{4.1} \]

where $x^*$ is the optimal solution of the current subproblem and $\epsilon_T \ge 0$.


Another $\epsilon$-criterion can additionally be used: if the distance from the solution point $x^*$ to the nearest breakpoint is less than $\epsilon_D$, no more iterations are performed. This criterion can be specified as

\[ \max_i \left( \min_k \left| x_i^* - \check{x}_{i,k} \right| \right) \le \epsilon_D, \tag{4.2} \]

where $\{\check{x}_{i,k}\}_{k=1}^{K_i}$ is the set of breakpoints for the variable $x_i$ and $x_i^*$ is the optimal value of the variable $x_i$ in the current iteration. This criterion is based on the fact that a solution of the transformed problem is exactly equal to the solution given by the original problem at the breakpoints of the PLFs.
4.4. Updating the PLFs. If neither of the termination criteria is
met, additional breakpoints must be added to the PLFs. Now there are
two things that must be considered, namely which transformation variable
approximations to improve and which points to add to the corresponding
PLFs? This subject is explained in detail in [31]; here, only a brief summary
is given.
The simplest strategy for selecting the variables is to add breakpoints to the PLFs of all transformed variables. However, for large problems with many different transformations, this may make the transformed problem unnecessarily complex. Instead, breakpoints could be added to as few PLFs
as possible. For example, it is not necessary to update the PLFs cor-
responding to transformation variables in nonconvex constraints already
fulfilled. A further restriction is to only add breakpoints to the variables
in the constraints violated the most in each iteration.
The breakpoints to be added must also be determined. Several strate-
gies exist, for example the solution point in the previous iteration can be
added. However, this strategy may, unfortunately, lead to subproblems


[Figure 4: four panels, each showing a PLF approximation X of a transformation over an interval in x: (a) the original PLF; (b) solution point $x^*$ added; (c) midpoint of interval added; (d) most deviating point $\tilde{x}$ added.]

Fig. 4: Illustration of the impact on the PLF approximation when using the different strategies for adding new breakpoints.

whose solutions converge only to a local solution of the nonconvex problem. To guarantee that the global optimum is reached, other strategies must be used. For example, by adding the midpoint of the current interval of breakpoints to which the previous solution belongs, or the most deviating point in that interval, it has been shown (see [31]) that the global optimal solution will always be found.
The most deviating point for $x$ in the interval $[\underline{x}, \overline{x}]$ is, according to [15], the point

\[ \tilde{x} = \frac{\overline{x} - \underline{x}}{\ln(\overline{x}/\underline{x})} \tag{4.3} \]

for the single-variable ET and

\[ \tilde{x} = \left( Q \cdot \frac{\overline{x}^{1/Q} - \underline{x}^{1/Q}}{\overline{x} - \underline{x}} \right)^{\frac{Q}{1-Q}} \tag{4.4} \]


Fig. 5: Shown in the figure are the linear constraints l1 : 0.5x1 − x2 = 0


and l2 : −0.5x1 + x2 = −3, as well as the signomial constraint σ(x1 , x2 ).
The dark gray region is the integer-relaxed feasible region of the problem.

for any of the single-variable PTs. To exemplify this, an illustration of how the different strategies for selecting the breakpoint improve the PLFs is included in Figure 4.
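The selection strategies fit in a few lines of code; the sketch below (a hypothetical interface, with [lo, hi] denoting the breakpoint interval containing the current solution) implements the solution-point and midpoint strategies as well as the most deviating points (4.3) and (4.4):

```python
import math

def new_breakpoint(lo, hi, x_star, strategy, Q=None):
    if strategy == "solution":        # may stall at a local solution
        return x_star
    if strategy == "midpoint":        # guarantees global convergence [31]
        return lo + (hi - lo) / 2.0
    if strategy == "deviation":       # most deviating point, also convergent
        if Q is None:                 # single-variable ET, eq. (4.3)
            return (hi - lo) / math.log(hi / lo)
        s = (hi ** (1.0 / Q) - lo ** (1.0 / Q)) / (hi - lo)
        return (Q * s) ** (Q / (1.0 - Q))      # single-variable PT, eq. (4.4)
    raise ValueError(strategy)

# For an integer variable the midpoint is rounded upwards (cf. Section 5):
print(math.ceil(new_breakpoint(1, 7, 6, "midpoint")))   # -> 4
```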
5. An illustrative example. In this section, the SGO algorithm is
applied to the following MISP problem:

minimize − x1 − 3x2 ,
subject to − 0.5x1 + x2 ≤ 3,
0.5x1 − x2 ≤ 1,
8
 (5.1)
σ(x1 , x2 ) = sj (x1 , x2 ) + 1.5 ≤ 0,
j=1

1 ≤ x1 ≤ 7, 1 ≤ x2 ≤ 5,
x1 ∈ Z+ , x2 ∈ R+ .

The signomial terms sj (x1 , x2 ) in the signomial constraint, and the trans-
formations convexifying them, are listed in Table 1. The signomial function
σ(x1 , x2 ) is a modification of the Taylor series expansion of the function
sin(x1 +x2 ) at the point (x1 , x2 ) = (1, 1). The feasible region of the original
problem is shown in Figure 5.
As can be seen from Table 1, a total of nine transformations is needed to convexify the nonconvex signomial terms. However, some of these transformations are identical in different terms, so the number of distinct transformations is five: three for the variable $x_1$ and two for $x_2$. Furthermore, only two special ordered sets are needed in the PLF


Table 1
The signomial terms in the signomial constraint σ(x1, x2) in the example in Section 5.

j   s_j(x1, x2)             T_j(X_{1,∘})           T_j(X_{2,∘})           ŝ_j(x1, x2, X_{1,∘}, X_{2,∘})
1   1.572020 x1             —                      —                      1.572020 x1
2   −0.435398 x1^2          x1 = X_{1,1}^{0.5}     —                      −0.435398 X_{1,1}
3   1.572020 x2             —                      —                      1.572020 x2
4   −0.832294 x1 x2         x1 = X_{1,1}^{0.5}     x2 = X_{2,1}^{0.5}     −0.832294 X_{1,1}^{0.5} X_{2,1}^{0.5}
5   −0.246575 x1^2 x2       x1 = X_{1,2}^{0.25}    x2 = X_{2,1}^{0.5}     −0.246575 X_{1,2}^{0.5} X_{2,1}^{0.5}
6   −0.435398 x2^2          —                      x2 = X_{2,1}^{0.5}     −0.435398 X_{2,1}
7   −0.246575 x1 x2^2       x1 = X_{1,1}^{0.5}     x2 = X_{2,2}^{0.25}    −0.246575 X_{1,1}^{0.5} X_{2,2}^{0.5}
8   0.227324 x1^2 x2^2      x1 = X_{1,3}^{−0.5}    —                      0.227324 X_{1,3}^{−1} x2^2

formulation, one for each variable. Thus, the reformulated problem will
become

\[
\begin{aligned}
\text{minimize} \quad & -x_1 - 3x_2, \\
\text{subject to} \quad & -0.5x_1 + x_2 \le 3, \\
& 0.5x_1 - x_2 \le 1, \\
& \sum_{j=1}^{8} \hat{s}_j(x_1, x_2, \hat{X}_{1,\circ}, \hat{X}_{2,\circ}) + 1.5 \le 0, \\
& 1 \le x_1 \le 7, \quad 1 \le x_2 \le 5, \\
& x_1 \in \mathbb{Z}_+, \quad x_2 \in \mathbb{R}_+,
\end{aligned} \tag{5.2}
\]

where the transformation variables X1,◦ and X2,◦ have been replaced with
the PLFs X̂1,◦ and X̂2,◦ , which are formulated using the expressions from
Section 2.1. The interval endpoints for the variables x1 and x2 are used as
initial breakpoints for the PLFs. The overestimated feasible region of the
problem in the first iteration is illustrated in Figure 7a.
This problem is then solved using a MINLP solver able to solve convex
MINLP problems to global optimality. In this case GAMS/αECP [32]
has been used. The optimal solution in the first iteration is (x1 , x2 ) =
(6, 5.00), which gives an objective function value of −21.00. The original
signomial constraint σ(x1, x2) has the value 90.48 at this point, so the point is not feasible in the original problem since the value is positive. Therefore, more SGO iterations are needed and additional breakpoints must be added in the next iteration. In
this example, two different strategies are used: the first one is to add the
solution point of the transformed variables, and the second is to add the
midpoint of the current interval of breakpoints which the solution belongs


[Figure 6: a plot of the objective function value (ranging from −21 to −14) against the SGO iteration (1 to 6), with one curve for the midpoint strategy and one for the solution point strategy.]

Fig. 6: The objective function value in each SGO iteration.

to. Since x1 is a discrete variable, the calculated value of the midpoint


is rounded upwards if it is noninteger. The solution process and number
of SGO iterations required when using the two breakpoint strategies are
different as illustrated in Figure 6.
If the midpoint strategy is used, the breakpoint 4 is added to the PLFs
X̂1,◦ and the breakpoint 3 to the PLFs X̂2,◦ in the second iteration. In
total, the solution of this problem required four SGO iterations before the
value of the function in the signomial constraint became nonpositive, i.e.,
the global optimal solution was found. The solution to the overestimated
problems in each iteration is listed in Table 2 and the feasible regions are
shown in Figure 7.
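As a quick consistency check, σ(x1, x2) can be assembled directly from the coefficients in Table 1 and evaluated at the iterates of Table 2; the snippet below (not from the chapter) approximately reproduces the reported constraint values, the small discrepancies coming from rounding of the printed coefficients and solutions:

```python
def sigma(x1, x2):
    # The eight signomial terms of Table 1 plus the constant 1.5 from (5.1).
    return (1.572020 * x1 - 0.435398 * x1**2 + 1.572020 * x2
            - 0.832294 * x1 * x2 - 0.246575 * x1**2 * x2
            - 0.435398 * x2**2 - 0.246575 * x1 * x2**2
            + 0.227324 * x1**2 * x2**2 + 1.5)

for x1, x2 in [(6, 5.00), (5, 4.13), (3, 4.50), (2, 4.00)]:
    print((x1, x2), round(sigma(x1, x2), 2))   # ~ 90.49, 30.80, 5.78, -1.72
```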
If instead of adding the midpoint of the interval as a new breakpoint,
the solution point itself is added as breakpoint, the solution process will
be different. In this case, it will take a total of six iterations to find the
global optimal solution. How the feasible region is overestimated in each
SGO iteration, is shown in Figure 8.

6. Computational results. A proof-of-concept implementation of the SGO algorithm has been written in C# and GAMS [26]; it is described in [15] and [19]. Although it is too early to compare it performance-wise to other available global optimization solvers, to give an indication of what problem sizes the algorithm can handle, it has been applied to some trim-loss problems available from the MINLP Library [23]. In this type of problem, all the variables are discrete (binary or integer) and the nonlinearities consist of negative bilinear terms. These problems are somewhat problematic for the solver, since all the variables in the bilinear terms require translations because their lower bounds are zero. Although no additional nonconvex terms appear in the constraints in this type of problem, additional linear terms do. As can be seen from the results in

Table 2
The solution of the convex MINLP problem in each SGO iteration in the example in Section 5 when using the midpoint strategy.

Iter.   Breakpoints x1       Breakpoints x2         Opt. sol.   x1   x2     σ(x1, x2)
1       {1, 7}               {1, 5}                 −21.00      6    5.00   90.48
2       {1, 4, 7}            {1, 3, 5}              −17.39      5    4.13   30.82
3       {1, 4, 6, 7}         {1, 3, 4, 5}           −16.50      3    4.50   5.78
4       {1, 3, 4, 6, 7}      {1, 3, 4, 4.5, 5}      −14.00      2    4.00   −1.72

Table 3
The results of solving some trim-loss problems from the MINLP Library using an implementation of the SGO algorithm. The numbers of variables indicated are those in the original nonconvex problem. All transformations are power transformations of the type $x_i = X_i^{0.5}$.

Problem    #bin. vars   #int. vars   #transf.   SGO sol.   Glob. sol.
ex1263a    4            20           20         9.3        9.3
ex1264a    4            20           20         8.6        8.6
ex1265a    5            30           30         10.3       10.3
ex1266a    6            42           42         16.3       16.3

Table 3, the globally optimal solution was found in all cases (and for all choices of breakpoint and variable selection strategies).
7. Conclusions. In this chapter, the SGO algorithm, a global opti-
mization algorithm for solving nonconvex MISP problems to global opti-
mality as a sequence of convex and overestimated subproblems, was pre-
sented. It was shown how the nonconvex problem is reformulated using
single-variable transformations applied to the nonconvex signomial terms, after which the relationships describing the inverse transformations between the transformation variables and the original variables are approximated using PLFs. This resulted in a relaxed convex problem, the feasible region
of which is an overestimation of that of the original problem. The solution
of this transformed problem provides a lower bound to the solution of the
original problem, and by iteratively improving the PLFs by adding addi-
tional breakpoints, the global optimal solution can be found. It was also
illustrated, through an example, how the strategy for selecting the break-
points impacted the number of iterations required to solve the problem.


Fig. 7: The feasible region of the convex overestimated problem when using
the midpoint strategy. The dark gray region is the integer-relaxed feasible
region of the nonconvex problem and the lighter parts correspond to the
piecewise convex overestimation of the signomial constraint σ(x1 , x2 ). The
dark points correspond to the optimal solution of the current iteration.
The dashed lines indicate the location of the breakpoints in the PLFs.


Fig. 8: The feasible region of the convex overestimated problem when using
the solution point strategy. The dark gray region is the integer-relaxed
feasible region of the nonconvex problem and the lighter parts correspond to
the piecewise convex overestimation of the signomial constraint σ(x1 , x2 ).
The dark points correspond to the optimal solution of the current iteration.
The dashed lines indicate the location of the breakpoints in the PLFs.


REFERENCES

[1] C.S. Adjiman, S. Dallwig, C.A. Floudas, and A. Neumaier, A global opti-
mization method, αBB, for general twice-differentiable constrained NLPs –
I. Theoretical advances, Computers and Chemical Engineering, 22 (1998),
pp. 1137–1158.
[2] M. Avriel and D.J. Wilde, Optimal condenser design by geometric program-
ming, Industrial & Engineering Chemistry Process Design and Development,
6 (1967), pp. 256–263.
[3] E.M.L. Beale and J.J.H. Forrest, Global optimization using special ordered
sets, Mathematical Programming, 10 (1976), pp. 52–69.
[4] K.-M. Björk, A Global Optimization Method with Some Heat Exchanger Network
Applications, PhD thesis, Åbo Akademi University, 2002.
[5] K.-M. Björk, I. Grossmann, and T. Westerlund, Solving heat exchanger net-
work synthesis problems with non-constant heat capacity flowrates and heat
transfer coefficients, AIDIC Conference Series, 5 (2002), pp. 41–48.
[6] G.E. Blau and D.J. Wilde, A lagrangian algorithm for equality constrained gen-
eralized polynomial optimization, AIChE Journal, 17 (1971), pp. 235–240.
[7] R.J. Duffin and E.L. Peterson, Duality theory for geometric programming,
SIAM Journal on Applied Mathematics, 14 (1966), pp. 1307–1349.
[8] C.A. Floudas, Deterministic Global Optimization. Theory, Methods and Appli-
cations, no. 37 in Nonconvex Optimization and Its Applications, Kluwer Aca-
demic Publishers, 1999.
[9] C.A. Floudas and P.M. Pardalos, eds., Encyclopedia of Optimization, Kluwer
Academic Publishers, 2001.
[10] I. Harjunkoski, T. Westerlund, R. Pörn, and H. Skrifvars, Different trans-
formations for solving non-convex trim-loss problems by MINLP, European
Journal of Operational Research, 105 (1998), pp. 594–603.
[11] Y.H.A. Ho, H.-K. Kwan, N. Wong, and K.-L. Ho, Designing globally optimal
delta-sigma modulator topologies via signomial programming, International
Journal of Circuit Theory and Applications, 37 (2009), pp. 453–472.
[12] R. Jabr, Inductor design using signomial programming, The International Journal
for Computation and Mathematics in Electrical and Electronic Engineering,
26 (2007), pp. 461–475.
[13] T.R. Jefferson and C.H. Scott, Generalized geometric programming applied to
problems of optimal control: I. Theory, Journal of Optimization Theory and
Applications, 26 (1978), pp. 117–129.
[14] H.-C. Lu, H.-L. Li, C.E. Gounaris, and C.A. Floudas, Convex relaxation for
solving posynomial programs, Journal of Global Optimization, 46 (2010),
pp. 147–154.
[15] A. Lundell, Transformation Techniques for Signomial Functions in Global Opti-
mization, PhD thesis, Åbo Akademi University, 2009.
[16] A. Lundell, J. Westerlund, and T. Westerlund, Some transformation tech-
niques with applications in global optimization, Journal of Global Optimiza-
tion, 43 (2009), pp. 391–405.
[17] A. Lundell and T. Westerlund, Exponential and power transformations for
convexifying signomial terms in MINLP problems, in Proceedings of the 27th
IASTED International Conference on Modelling, Identification and Control,
L. Bruzzone, ed., ACTA Press, 2008, pp. 154–159.
[18] A. Lundell and T. Westerlund, Convex underestimation strategies for signomial
functions, Optimization Methods and Software, 24 (2009), pp. 505–522.
[19] A. Lundell and T. Westerlund, Implementation of a convexification technique for
signomial functions, in 19th European Symposium on Computer Aided Process
Engineering, J. Jezowski and J. Thullie, eds., Vol. 26 of Computer Aided
Chemical Engineering, Elsevier, 2009, pp. 579–583.


[20] A. Lundell and T. Westerlund, Optimization of transformations for convex
relaxations of MINLP problems containing signomial functions, in Proceedings
of the 10th International Symposium on Process Systems Engineering
(PSE2009), 2009.
[21] C.D. Maranas and C.A. Floudas, Finding all solutions of nonlinearly con-
strained systems of equations, Journal of Global Optimization, 7 (1995),
pp. 143–182.
[22] G.P. McCormick, Mathematical programming computability of global solutions
to factorable nonconvex programs: Part I – convex underestimating problems,
Mathematical Programming, 10 (1976), pp. 147–175.
[23] MINLPWorld, The MINLP Library, http://www.gamsworld.org/minlp/.
[24] U. Passy and D.J. Wilde, A geometric programming algorithm for solving chem-
ical equilibrium problems, SIAM Journal on Applied Mathematics, 16 (1968),
pp. 363–373.
[25] R. Pörn, Mixed Integer Non-Linear Programming: Convexification Techniques
and Algorithm Development, PhD thesis, Åbo Akademi University, 2000.
[26] R. E. Rosenthal, GAMS – A user’s guide, GAMS Development Corporation,
Washington, DC, USA, 2008.
[27] N. V. Sahinidis and M. Tawarmalani, BARON 7.2.5: Global optimization of
mixed-integer nonlinear programs, user’s manual, 2005.
[28] Y. Shen, E. Y. Lam, and N. Wong, Binary image restoration by signomial pro-
gramming, in OSA Topical Meeting in Signal Recovery and Synthesis, Optical
Society of America, 2007.
[29] M. Tawarmalani and N.V. Sahinidis, Global optimization of mixed-integer non-
linear programs: A theoretical and computational study, Mathematical Pro-
gramming, 99 (2004), pp. 563–591.
[30] J.F. Tsai and M.-H. Lin, Global optimization of signomial mixed-integer nonlin-
ear programming problems with free variables, Journal of Global Optimization,
42 (2008), pp. 39–49.
[31] T. Westerlund, Some transformation techniques in global optimization, in Global
Optimization: From Theory to Implementation, L. Liberti and N. Maculan,
eds., Vol. 84 of Nonconvex Optimization and its Applications, Springer, 2005,
pp. 47–74.
[32] T. Westerlund and T. Lastusilta, AlphaECP GAMS user’s manual, Åbo
Akademi University, 2008.
[33] T. Westerlund and J. Westerlund, GGPECP – An algorithm for solving
non-convex MINLP problems by cutting plane and transformation techniques,
Chemical Engineering Transactions, 3 (2003), pp. 1045–1050.

PART VI:
Mixed-Integer Quadratically
Constrained Optimization

THE MILP ROAD TO MIQCP
SAMUEL BURER∗ AND ANUREET SAXENA†

Abstract. This paper surveys results on the NP-hard mixed-integer quadratically


constrained programming problem. The focus is strong convex relaxations and valid
inequalities, which can become the basis of efficient global techniques. In particular, we
discuss relaxations and inequalities arising from the algebraic description of the problem
as well as from dynamic procedures based on disjunctive programming. These methods
can be viewed as generalizations of techniques for mixed-integer linear programming.
We also present brief computational results to indicate the strength and computational
requirements of these methods.

1. Introduction. More than fifty years have passed since Dantzig et al. [25] solved the 50-city travelling salesman problem. An achievement in itself at the time, their seminal paper gave birth to one of the most successful disciplines in computational optimization, Mixed Integer Linear Programming (MILP). Five decades of wonderful research, both theoretical
Programming (MILP). Five decades of wonderful research, both theoretical
and computational, have brought mixed integer programming to a stage
where it can solve many if not all MILPs arising in practice (see [43]).
The ideas discovered during the course of this development have naturally
influenced other disciplines. Constraint programming, for instance, has
adopted and refined many of the ideas from MILP to solve more general
classes of problems [2].
Our focus in this paper is to track the influence of MILP in solving
mixed integer quadratically constrained problems (MIQCP). In particu-
lar, we survey some of the recent research on MIQCP and establish their
connections to well known ideas in MILP. The purpose of this is two-fold.
First, it helps to catalog some of the recent results in a form that is accessi-
ble to a researcher with reasonable background in MILP. Second, it defines
a roadmap for further research in MIQCP; although significant progress
has been made in the field of MIQCP, the “breakthrough” results are yet
to come and we believe that the past of MILP holds the clues to the future
of MIQCP.
Specifically, we focus on the following mixed integer quadratically con-
strained problem

min xT Cx + cT x (MIQCP)
s.t. x∈F

∗ Department of Management Sciences, University of Iowa, Iowa City, IA 52242-

1994, USA ([email protected]). Author supported in part by NSF Grant CCF-


0545514.
† Axioma Inc., 8800 Roswell Road, Building B. Suite 295, Atlanta GA, 30350, USA

([email protected]).


where

\[ F := \left\{ x \in \mathbb{R}^n : \begin{array}{l} x^T A_k x + a_k^T x \le b_k \quad \forall\, k = 1, \dots, m \\ l \le x \le u \\ x_i \in \mathbb{Z} \quad \forall\, i \in I \end{array} \right\}. \]

The data of (MIQCP) is
• $(C, c) \in S^n \times \mathbb{R}^n$
• $(A_k, a_k, b_k) \in S^n \times \mathbb{R}^n \times \mathbb{R}$ for all $k = 1, \dots, m$
• $(l, u) \in (\mathbb{R} \cup \{-\infty\})^n \times (\mathbb{R} \cup \{+\infty\})^n$
• $I \subseteq [n]$
where, in particular, $S^n$ is the set of all $n \times n$ symmetric matrices and $[n] := \{1, \dots, n\}$. Without loss of generality, we assume $l < u$, and for all $i \in I$, $(l_i, u_i) \in (\mathbb{Z} \cup \{-\infty\}) \times (\mathbb{Z} \cup \{+\infty\})$. Note that any specific lower or upper bound may be infinite.
If all Ak = 0, then (MIQCP) reduces to MILP. So (MIQCP) is
NP-hard. In fact, the continuous variant of MIQCP, namely a non-convex
QCP, is already NP-hard and a well-known problem in global optimization
[45, 46]. The computational intractability of MIQCP is quite notorious
and can be traced to a result of Jeroslow [30] from the seventies, which shows that the variant of MIQCP without explicit finite lower/upper bounds on some of the variables is undecidable. (MIQCP) is itself a special
case of mixed integer nonlinear programming (MINLP); we refer the reader
to the website MINLP World [41] for surveys, software, and test instances
for MINLP. We also note that any polynomial optimization problem may
be reduced to (MIQCP) by the introduction of auxiliary variables and
constraints to reduce all polynomial degrees to 2, e.g., a cubic term x1 x2 x3
could be modeled as x1 X23 with X23 = x2 x3 .
Note that if the objective function and constraints in MIQCP are con-
vex, then the resulting optimization problem can be solved using standard
techniques for solving convex MINLP (see [16] for more details). Most of
the ideas and methods discussed in this paper specifically exploit the non-
convex quadratic nature of the objective and constraints of (MIQCP). In
fact, our viewpoint is that many ideas from the solution of MILPs can be
adapted in interesting ways for the study of (MIQCP). In this sense, we
view (MIQCP) as a natural progression from MILP rather than, say, a
special case of MINLP.
We are also not specifically concerned with the global optimization of
(MIQCP). Rather, we focus on generating strong convex relaxations and
valid inequalities, which could become the basis of efficient global tech-
niques.
In Section 2, we review the idea of lifting, which is commonly used to
convexify (MIQCP) and specifically the feasible set F . We then discuss
the generation of various types of linear, second-order-cone, and semidefi-
nite valid inequalities which strengthen the convexification. These inequali-
ties have the property that they arise directly from the algebraic form of F .


In this sense, they generalize the basic LP relaxation often used in MILP.
We also catalog several known and new results establishing the strength
of these inequalities for certain specifications of F . Then, in Section 3,
we describe several related approaches that shed further light on convex
relaxations of (MIQCP).
In Section 4, we discuss methods for dynamically generating valid in-
equalities, which can further improve the relaxations. One of the funda-
mental tools is that of disjunctive programming, which has been used in
the MILP community for five decades. However, the disjunctions employed
herein are new in the sense that they truly exploit the quadratic form
of (MIQCP). Recently, Belotti [11] studies disjunctive cuts for general
MINLP.
Finally, in Section 5, we consider a short computational study to give
some sense of the computational effort and effect of the methods surveyed
in this paper.
1.1. Notation and terminology. Most of the notation used in this
paper is standard. We define here just a few perhaps atypical notations.
For symmetric matrices $A$ and $B$ of conformable dimensions, we define $\langle A, B \rangle := \operatorname{tr}(AB)$; a standard fact is that the quadratic form $x^T A x$ can be represented as $\langle A, xx^T \rangle$. For a set $P$ in the space of variables $(x, y)$, $\operatorname{proj}_x(P)$ denotes the coordinate projection of $P$ onto the space of $x$. $\operatorname{clconv} P$ is the closed convex hull of $P$. For a square matrix $A$, $\operatorname{diag}(A)$ denotes the vector of diagonal entries of $A$. The vector $e$ is the all-ones vector, and $e_i$ is the vector having all zeros except a 1 in position $i$. The notation $X \succeq 0$ means that $X$ is symmetric positive semidefinite; $X \preceq 0$ means symmetric negative semidefinite.
2. Convex relaxations and valid inequalities. In this section,
we describe strong convex relaxations of (MIQCP) and F, which arise
from the algebraic description of F. For the purposes of presentation, we
partition the indices [m] of the quadratic constraints into three groups:

“linear”  $LQ := \{k : A_k = 0\}$
“convex quadratic”  $CQ := \{k : A_k \neq 0,\ A_k \succeq 0\}$
“nonconvex quadratic”  $NQ := \{k : A_k \neq 0,\ A_k \not\succeq 0\}$.

For those k ∈ CQ, there exists a rectangular matrix Bk (not necessarily


unique) such that $A_k = B_k^T B_k$. Using the $B_k$'s, it is well known that
each convex quadratic constraint can be represented as a second-order-cone
constraint.
Proposition 2.1 (Alizadeh and Goldfarb [5]). Let $k \in CQ$ with $A_k = B_k^T B_k$. A point $x \in \mathbb{R}^n$ satisfies $x^T A_k x + a_k^T x \le b_k$ if and only if

\[ \left\| \begin{pmatrix} B_k x \\ \frac{1}{2}(1 + a_k^T x - b_k) \end{pmatrix} \right\| \le \frac{1}{2}\left(1 - a_k^T x + b_k\right). \]


So $F \subseteq \mathbb{R}^n$ may be rewritten as

\[ F = \left\{ x : \begin{array}{ll} a_k^T x \le b_k & \forall\, k \in LQ \\[4pt] \left\| \begin{pmatrix} B_k x \\ \frac{1}{2}(a_k^T x - b_k + 1) \end{pmatrix} \right\| \le \frac{1}{2}(1 - a_k^T x + b_k) & \forall\, k \in CQ \\[4pt] x^T A_k x + a_k^T x \le b_k & \forall\, k \in NQ \\ l \le x \le u & \\ x_i \in \mathbb{Z} \quad \forall\, i \in I & \end{array} \right\}. \]

If so desired, we can model the bounds l ≤ x ≤ u within the linear con-


straints. However, since bounds often play a special role in approaches for
(MIQCP), we leave them separate.
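The equivalence in Proposition 2.1 is easy to verify numerically. The sketch below (random data; a Cholesky factorization is one possible choice of $B_k$) checks that the quadratic constraint and its second-order-cone form agree at a sample point:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)           # a positive definite A_k
B = np.linalg.cholesky(A).T       # A = B^T B
a = rng.standard_normal(n)
b = 5.0
x = 0.3 * rng.standard_normal(n)

quad_ok = x @ A @ x + a @ x <= b
soc_lhs = np.linalg.norm(np.append(B @ x, 0.5 * (1 + a @ x - b)))
soc_ok = soc_lhs <= 0.5 * (1 - a @ x + b)
assert quad_ok == soc_ok          # the two formulations agree
```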
2.1. Lifting, convexification, and relaxation. An idea fundamen-
tal to many methods for (MIQCP) is lifting to a higher dimensional
space. The simplest lifting idea is to introduce auxiliary variables Xij ,
which model the quadratic terms xi xj via equations Xij = xi xj for all
1 ≤ i, j ≤ n. The single symmetric matrix equation X = xxT also cap-
tures this lifting.
As an immediate consequence of lifting, the quadratic objective and
constraints may be expressed linearly in (x, X), e.g.,

X=xxT
xT Cx + cT x = "C, X# + cT x.

So (MIQCP) becomes

\[ \min\ \langle C, X \rangle + c^T x \quad \text{s.t.} \quad (x, X) \in \tilde{F}, \tag{2.1} \]

where

\[ \tilde{F} := \left\{ (x, X) \in \mathbb{R}^n \times S^n : \begin{array}{l} \langle A_k, X \rangle + a_k^T x \le b_k \quad \forall\, k = 1, \dots, m \\ l \le x \le u \\ x_i \in \mathbb{Z} \quad \forall\, i \in I \\ X = xx^T \end{array} \right\}. \]

This provides an interesting perspective: the “hard” quadratic objective


and constraints of (MIQCP) are represented as “easy” linear ones in the
space (x, X). The trade-off is the nonconvex equation X = xxT , and of
course the non-convex integrality conditions remain.
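The identity behind the lifting is trivial to confirm; the following few lines (illustrative random data) check that the quadratic objective equals its linearization at $X = xx^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
C = rng.standard_normal((n, n)); C = (C + C.T) / 2   # symmetric C
c = rng.standard_normal(n)
x = rng.standard_normal(n)

X = np.outer(x, x)                                   # lifting X = x x^T
quadratic = x @ C @ x + c @ x                        # x^T C x + c^T x
linearized = np.tensordot(C, X) + c @ x              # <C, X> + c^T x
assert np.isclose(quadratic, linearized)
```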
The linear objective in (2.1) allows convexification of the feasible region
without change to the optimal value. From standard convex optimization:
Proposition 2.2. The problem (2.1), and hence also (MIQCP), is equivalent to

\[ \min\ \left\{ \langle C, X \rangle + c^T x : (x, X) \in \operatorname{clconv} \tilde{F} \right\}. \]


Thus, many lifting approaches may be interpreted as attempting to better understand $\operatorname{clconv} \tilde{F}$. We adopt this point of view.

A straightforward linear relaxation of $\operatorname{clconv} \tilde{F}$, which is analogous to the basic linear relaxation of a MILP, is obtained by simply dropping $X = xx^T$ and $x_i \in \mathbb{Z}$:

\[ \tilde{L} := \left\{ (x, X) \in \mathbb{R}^n \times S^n : \begin{array}{l} \langle A_k, X \rangle + a_k^T x \le b_k \quad \forall\, k = 1, \dots, m \\ l \le x \le u \end{array} \right\}. \]

There are many ways to strengthen $\tilde{L}$, as discussed in the following three subsections.
2.2. Valid linear inequalities. The most common idea for constructing valid linear inequalities for $\operatorname{clconv} \tilde{F}$ is the following. Let $\alpha^T x \le \alpha_0$ and $\beta^T x \le \beta_0$ be any two valid linear inequalities for $F$ (possibly the same). Then the quadratic inequality

\[ 0 \le (\alpha_0 - \alpha^T x)(\beta_0 - \beta^T x) = \alpha_0 \beta_0 - \beta_0 \alpha^T x - \alpha_0 \beta^T x + x^T \alpha \beta^T x \]

is also valid for $F$, and so the linear inequality

\[ \alpha_0 \beta_0 - \beta_0 \alpha^T x - \alpha_0 \beta^T x + \langle \beta \alpha^T, X \rangle \ge 0 \]

is valid for $\operatorname{clconv} \tilde{F}$.


The above idea works with any pair of valid inequalities, e.g., ones
given explicitly in the description of F or derived ones. For those explicitly
given (the bounds l ≤ x ≤ u and the constraints corresponding to k ∈ LQ),
the complete list of derived valid quadratic inequalities is

\[
\begin{array}{ll}
(x_i - l_i)(x_j - l_j) \ge 0 & \\
(x_i - l_i)(u_j - x_j) \ge 0 & \\
(u_i - x_i)(x_j - l_j) \ge 0 & \forall\, (i, j) \in [n] \times [n],\ i \le j \hspace{3em} (2.2\text{a}) \\
(u_i - x_i)(u_j - x_j) \ge 0 & \\[6pt]
(x_i - l_i)(b_k - a_k^T x) \ge 0 & \\
(u_i - x_i)(b_k - a_k^T x) \ge 0 & \forall\, (i, k) \in [n] \times LQ \hspace{3em} (2.2\text{b}) \\[6pt]
(b_\ell - a_\ell^T x)(b_k - a_k^T x) \ge 0 & \forall\, (\ell, k) \in LQ \times LQ,\ \ell \le k. \hspace{3em} (2.2\text{c})
\end{array}
\]

The linearizations of (2.2) were considered in [37] and are sometimes re-
ferred to as rank-2 linear inequalities [33]. So we denote the collection of
all (x, X), which satisfy these linearizations, as R2 , i.e.,

R2 := { (x, X) : linearized versions of (2.2) hold }.

In particular, the linearized versions of (2.2a) are called the RLT in-
equalities after the “reformulation-linearization technique” of [55], though
they first appeared in [3, 4, 40]. These inequalities have been studied exten-
sively because of their wide applicability and simple structure. Specifically,


the RLT constraints provide simple bounds on the entries of $X$, which otherwise may be weakly constrained in $\tilde{L}$, especially when $m$ is small. After linearization via $X_{ij} = x_i x_j$, the four inequalities (2.2a) for a specific $(i, j)$ are

\[ \max\left\{ \begin{array}{l} l_i x_j + x_i l_j - l_i l_j \\ u_i x_j + x_i u_j - u_i u_j \end{array} \right\} \le X_{ij} \le \min\left\{ \begin{array}{l} x_i u_j + l_i x_j - l_i u_j \\ x_i l_j + u_i x_j - u_i l_j \end{array} \right\}. \tag{2.3} \]

Using the symmetry of $X$, the entire collection of RLT inequalities in matrix form is

\[ \begin{array}{l} l x^T + x l^T - l l^T \\ u x^T + x u^T - u u^T \end{array} \le X \le x u^T + l x^T - l u^T. \tag{2.4} \]

It can be shown easily that the original inequalities l ≤ x ≤ u are implied


by (2.4) if both l and u are finite in all components.
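The bounds (2.4) are straightforward to evaluate; the sketch below (illustrative random box and point, not from the paper) checks that the lifted point $X = xx^T$ satisfies all four RLT bounds:

```python
import numpy as np

def rlt_bounds(x, l, u):
    # Lower bounds: l_i x_j + x_i l_j - l_i l_j and u_i x_j + x_i u_j - u_i u_j.
    lower = np.maximum(np.outer(l, x) + np.outer(x, l) - np.outer(l, l),
                       np.outer(u, x) + np.outer(x, u) - np.outer(u, u))
    # Upper bound x_i u_j + l_i x_j - l_i u_j; its transpose gives the other one.
    upper = np.outer(x, u) + np.outer(l, x) - np.outer(l, u)
    return lower, np.minimum(upper, upper.T)

rng = np.random.default_rng(2)
l, u = -np.ones(3), 2.0 * np.ones(3)
x = rng.uniform(l, u)
lower, upper = rlt_bounds(x, l, u)
X = np.outer(x, x)
assert np.all(lower <= X + 1e-12) and np.all(X <= upper + 1e-12)
```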
Since the RLT inequalities are just a portion of the inequalities defin-
ing R2 , it might be reasonable to consider R2 generally and not the RLT
constraints specifically. However, it is sometimes convenient to study the
RLT constraints on their own. So we write (x, X) ∈ RLT when (x, X)
satisfies (2.4) but not necessarily all rank-2 linear inequalities, i.e.,

RLT := { (x, X) : (2.4) holds }.

2.3. Valid second-order-cone inequalities. Similar to the derivation of the inequalities defining $R_2$, the linearizations of the following quadratic inequalities are valid for $\operatorname{clconv} \tilde{F}$: for all $(k, \ell) \in LQ \times CQ$,

\[ (b_k - a_k^T x) \left( \frac{1}{2}(1 - a_\ell^T x + b_\ell) - \left\| \begin{pmatrix} B_\ell x \\ \frac{1}{2}(a_\ell^T x - b_\ell + 1) \end{pmatrix} \right\| \right) \ge 0. \tag{2.5} \]

We call the linearizations rank-2 second-order inequalities and denote by $S_2$ the set of all satisfying $(x, X)$, i.e.,

\[ S_2 := \{ (x, X) : \text{linearized versions of (2.5) hold} \}. \]

As an example, suppose we have the two valid inequalities $x_1 + x_2 \le 1$ and $x_1^2 + x_2^2 \le 2/3$, the second of which, via Proposition 2.1, is equivalent to the second-order-cone constraint $\|x\| \le \sqrt{2/3}$. Then we multiply $1 - x_1 - x_2 \ge 0$ with $\sqrt{2/3} - \|x\| \ge 0$ to obtain the valid inequality

\[ 0 \le (1 - x_1 - x_2)\left(\sqrt{2/3} - \|x\|\right) = \sqrt{2/3}\,(1 - x_1 - x_2) - \|(1 - x_1 - x_2)\,x\|, \]

which is linearized as

\[ \left\| \begin{pmatrix} x_1 - X_{11} - X_{12} \\ x_2 - X_{21} - X_{22} \end{pmatrix} \right\| \le \sqrt{2/3}\,(1 - x_1 - x_2). \]

2.4. Valid semidefinite inequalities. The application of SDP to (MIQCP) arises from the following fundamental observation:

Lemma 2.1 (Shor [56]). Given $x \in \mathbb{R}^n$, it holds that

\[ \begin{pmatrix} 1 & x^T \\ x & xx^T \end{pmatrix} = \begin{pmatrix} 1 \\ x \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix}^T \succeq 0. \]

Thus, the linearized semidefinite inequality

\[ Y := \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq 0 \tag{2.6} \]

is valid for $\operatorname{clconv} \tilde{F}$. We define

\[ \text{PSD} := \{ (x, X) : \text{(2.6) holds} \}. \]

Instead of enforcing (x, X) ∈ PSD, i.e., the full PSD condition (2.6),
one can enforce relaxations of it. For example, since all principal submatri-
ces of Y  0 are semidefinite, one could enforce just that all or some of the
2 × 2 principal submatrices of Y are positive semidefinite. This has been
done in [32], for example.
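Both the full condition (2.6) and the weaker 2×2-minor relaxation are simple to test numerically; the helper names in the sketch below are invented for illustration:

```python
import numpy as np

def lifted_Y(x, X):
    return np.block([[np.ones((1, 1)), x[None, :]], [x[:, None], X]])

def psd_ok(x, X, tol=1e-9):
    # Full condition (2.6): Y positive semidefinite.
    return np.all(np.linalg.eigvalsh(lifted_Y(x, X)) >= -tol)

def minors_ok(x, X, tol=1e-9):
    # Weaker relaxation: all 2x2 principal submatrices of Y are PSD.
    Y = lifted_Y(x, X)
    n = Y.shape[0]
    return (np.all(np.diag(Y) >= -tol) and
            all(Y[i, i] * Y[j, j] - Y[i, j] ** 2 >= -tol
                for i in range(n) for j in range(i + 1, n)))

x = np.array([0.5, 0.25])
print(psd_ok(x, np.outer(x, x)), minors_ok(x, np.outer(x, x)))  # True True
```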
2.5. The strength of valid inequalities. From the preceding subsections, we have the following result by construction:

Proposition 2.3. $\operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap \text{RLT} \cap R_2 \cap S_2 \cap \text{PSD}$.

Even though $R_2 \subseteq \text{RLT}$, we retain RLT in the above expression for emphasis. We next catalog and discuss various special cases in which equality is known to hold in Proposition 2.3.
2.5.1. Simple bounds. We first consider the case when $F$ is defined by simple, finite bounds, i.e., $F = \{x \in \mathbb{R}^n : l \le x \le u\}$ with $(l, u)$ finite in all components. In this case, $R_2 = \text{RLT} \subseteq \tilde{L}$ and $S_2$ is vacuous. So Proposition 2.3 can be stated more simply as $\operatorname{clconv} \tilde{F} \subseteq \text{RLT} \cap \text{PSD}$. Equality holds if and only if $n \le 2$:

Theorem 2.1 (Anstreicher and Burer [6]). Let $F = \{x \in \mathbb{R}^n : l \le x \le u\}$ with $(l, u)$ finite in all components. Then $\operatorname{clconv} \tilde{F} \subseteq \text{RLT} \cap \text{PSD}$ with equality if and only if $n \le 2$.
For n > 2, [6] and [19] derive additional valid inequalities but are still
unable to determine an exact representation by valid inequalities even for
n = 3. ([6] does give an exact disjunctive representation for n = 3.)
We also mention a classical result, which is in some sense subsumed by
Theorem 2.1. Even still, this result indicates the strength of the RLT in-
equalities and can be useful when one-variable quadratics $X_{ii} = x_i^2$ are not of interest. The result does not fully classify $\operatorname{clconv} \tilde{F}$ but rather coordinate projections of it.
Theorem 2.2 (Al-Khayyal and Falk [4]). Let $F = \{x \in \mathbb{R}^n : l \le x \le u\}$ with $(l, u)$ finite in all components. Then, for all $1 \le i < j \le n$, $\operatorname{proj}_{(x_i, x_j, X_{ij})}(\operatorname{clconv} \tilde{F}) = \text{RLT}_{ij}$, where $\text{RLT}_{ij} := \{(x_i, x_j, X_{ij}) \in \mathbb{R}^3 : \text{(2.3) holds}\}$.
2.5.2. Binary integer grid. We next consider the case when $F$ is a binary integer grid: that is, $F = \{x \in \mathbb{Z}^n : l \le x \le u\}$ with $u = l + e$ and $l$ finite in all components. Note that this is simply a shift of the standard 0-1 binary grid and that $\operatorname{clconv} \tilde{F}$ is a polytope. In this case, $R_2 = \text{RLT} \subseteq \tilde{L}$ and $S_2$ is vacuous. So Proposition 2.3 states that $\operatorname{clconv} \tilde{F} \subseteq \text{RLT} \cap \text{PSD}$. Also, some additional, simple linear equations are valid for $\operatorname{clconv} \tilde{F}$.

Proposition 2.4. Suppose that $i \in I$ has $u_i = l_i + 1$ with $l_i$ finite. Then the equation $X_{ii} = (1 + 2l_i)x_i - l_i - l_i^2$ is valid for $\operatorname{clconv} \tilde{F}$.

Proof. The shift $x_i - l_i$ is either 0 or 1. Hence, $(x_i - l_i)^2 = x_i - l_i$. After linearization with $X_{ii} = x_i^2$, this quadratic equation becomes the claimed linear one.

When $I = [n]$, the individual equations $X_{ii} = (1 + 2l_i)x_i - l_i - l_i^2$ can be collected as $\operatorname{diag}(X) = (e + 2l) \circ x - l - l \circ l$. We remark that the next result does not make use of PSD.

Theorem 2.3 (Padberg [44]). Let $F = \{x \in \mathbb{Z}^n : l \le x \le u\}$ with $u = l + e$ and $l$ finite in all components. Then

\[ \operatorname{clconv} \tilde{F} \subseteq \text{RLT} \cap \left\{ (x, X) : \operatorname{diag}(X) = (e + 2l) \circ x - l - l \circ l \right\} \]

with equality if and only if $n \le 2$.


In this case, clconv F- is closely related to the boolean quadric polytope
[44]. For n > 2, additional valid inequalities, such as the triangle inequal-
ities described by Padberg [44], provide an even better approximation of
clconv F.- For general n, a full description should not be easily available
unless P = N P .
2.5.3. The nonnegative orthant and standard simplex. We now
consider a case arising in the study of completely positive matrices in opti-
mization and linear algebra [13, 23, 24]. A matrix Y is completely positive
if $Y = NN^T$ for some nonnegative, rectangular matrix $N$, and the set of all completely positive matrices is a closed, convex cone. Although this cone is apparently intractable [42], it can be approximated from the outside to any precision using a sequence of polyhedral-semidefinite relaxations [26, 47]. The simplest approximation is by the so-called doubly nonnegative matrices, which are matrices $Y$ satisfying $Y \succeq 0$ and $Y \ge 0$. Clearly, every completely positive matrix $Y$ is doubly nonnegative. The converse holds if and only if $n \le 4$ [39].
Let $F = \{x \in \mathbb{R}^n : x \ge 0\}$. Then, since $\text{RLT} = R_2$ and $S_2$ is vacuous, Proposition 2.3 states $\operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap \text{RLT} \cap \text{PSD}$. Note that, because $u$ is infinite in this case, $(x, X) \in \text{RLT}$ does not imply $x \ge 0$, and so we must include $\tilde{L}$ explicitly to enforce $x \ge 0$. It is easy to see that

\[ \tilde{L} \cap \text{RLT} \cap \text{PSD} = \left\{ (x, X) \ge 0 : \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq 0 \right\}, \tag{2.7} \]

which is the set of all doubly nonnegative matrices of size (n + 1) × (n + 1)


having a 1 in the top-left corner.
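Double nonnegativity is cheap to verify computationally; the snippet below (random nonnegative factors chosen for illustration) confirms it for a matrix that is completely positive by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
N = rng.uniform(0, 1, size=(4, 6))      # nonnegative rectangular factor
Y = N @ N.T                             # completely positive by construction
assert np.all(Y >= 0)                   # entrywise nonnegative
assert np.all(np.linalg.eigvalsh(Y) >= -1e-9)   # positive semidefinite
```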
For general $n$, it appears that equality in Proposition 2.3 does not hold in this case. However, it does hold for $n \le 3$, which we prove next. As far as we are aware, this result has not appeared in the literature.

We first characterize the difference between $\operatorname{conv} \tilde{F}$ and $\tilde{L} \cap \text{RLT} \cap \text{PSD}$ for $n \le 3$. The following lemma shows that this difference is precisely the recession cone of $\tilde{L} \cap \text{RLT} \cap \text{PSD}$, which equals

\[ \operatorname{rcone}(\tilde{L} \cap \text{RLT} \cap \text{PSD}) = \{ (0, D) \ge 0 : D \succeq 0 \}. \tag{2.8} \]

Lemma 2.2. Let $F = \{x \in \mathbb{R}^n : x \ge 0\}$ with $n \le 3$. Then

\[ \tilde{L} \cap \text{RLT} \cap \text{PSD} = \operatorname{conv} \tilde{F} + \operatorname{rcone}(\tilde{L} \cap \text{RLT} \cap \text{PSD}). \]

Proof. To prove the statement, we show containment in both directions; the containment $\supseteq$ is easy. To show $\subseteq$, let $(x, X) \in \tilde{L} \cap \text{RLT} \cap \text{PSD}$ be arbitrary. By (2.7),

\[ Y := \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \]

is doubly nonnegative of size $(n+1) \times (n+1)$. Since $n \le 3$, the “$n \le 4$” result of [39] implies that $Y$ is completely positive. Hence, there exists a rectangular $N \ge 0$ such that $Y = NN^T$. By decomposing each column $N_{\cdot j}$ of $N$ as

\[ N_{\cdot j} = \begin{pmatrix} \zeta_j \\ z_j \end{pmatrix}, \qquad (\zeta_j, z_j) \in \mathbb{R}_+ \times \mathbb{R}^n_+, \]

we can write

\[
\begin{aligned}
Y = \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix}
&= \sum_j \begin{pmatrix} \zeta_j \\ z_j \end{pmatrix} \begin{pmatrix} \zeta_j \\ z_j \end{pmatrix}^T \\
&= \sum_{j : \zeta_j > 0} \zeta_j^2 \begin{pmatrix} 1 \\ \zeta_j^{-1} z_j \end{pmatrix} \begin{pmatrix} 1 \\ \zeta_j^{-1} z_j \end{pmatrix}^T + \sum_{j : \zeta_j = 0} \begin{pmatrix} 0 \\ z_j \end{pmatrix} \begin{pmatrix} 0 \\ z_j \end{pmatrix}^T \\
&= \sum_{j : \zeta_j > 0} \zeta_j^2 \begin{pmatrix} 1 & \zeta_j^{-1} z_j^T \\ \zeta_j^{-1} z_j & \zeta_j^{-2} z_j z_j^T \end{pmatrix} + \sum_{j : \zeta_j = 0} \begin{pmatrix} 0 & 0^T \\ 0 & z_j z_j^T \end{pmatrix}.
\end{aligned}
\]

This shows that $\sum_{j : \zeta_j > 0} \zeta_j^2 = 1$ and, from (2.8), that $(x, X)$ is expressed as a convex combination of points in $\tilde{F}$ plus a sum of points in $\operatorname{rcone}(\tilde{L} \cap \text{RLT} \cap \text{PSD})$, as desired.
Using the lemma, we can prove equality in Proposition 2.3 for $n \le 3$.

Theorem 2.4. Let $F = \{x \in \mathbb{R}^n : x \ge 0\}$. Then $\operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap \text{RLT} \cap \text{PSD}$ with equality if $n \le 3$.


Proof. The first statement of the theorem is just Proposition 2.3. Next, for contradiction, suppose there exists $(\bar{x}, \bar{X}) \in \tilde{L} \cap \text{RLT} \cap \text{PSD} \setminus \operatorname{clconv} \tilde{F}$. By the separating hyperplane theorem, there exists $(c, C)$ such that

\[ \min\{ \langle C, X \rangle + c^T x : (x, X) \in \operatorname{clconv} \tilde{F} \} \ge 0 > \langle C, \bar{X} \rangle + c^T \bar{x}. \]

Since $(\bar{x}, \bar{X}) \in \tilde{L} \cap \text{RLT} \cap \text{PSD}$, by the lemma there exist $(z, Z) \in \operatorname{conv} \tilde{F}$ and $(0, D) \in \operatorname{rcone}(\tilde{L} \cap \text{RLT} \cap \text{PSD})$ such that $(\bar{x}, \bar{X}) = (z, Z + D)$. Thus, $\langle C, D \rangle < 0$.

Since $D \ge 0$, $D \succeq 0$, and $n \le 3$, $D$ is completely positive, i.e., there exists a rectangular $N \ge 0$ such that $D = NN^T$. We have $\langle C, NN^T \rangle < 0$, which implies $d^T C d < 0$ for some nonzero column $d \ge 0$ of $N$. It follows that $d$ is a negative direction of recession for the function $x^T C x + c^T x$. In other words,

\[ \min\{ \langle C, X \rangle + c^T x : (x, X) \in \operatorname{clconv} \tilde{F} \} = -\infty, \]

a contradiction.
A related result occurs for a bounded slice of the nonnegative orthant, e.g., the standard simplex $\{x \ge 0 : e^T x = 1\}$. In this case, however, the boundedness, the linear constraint, and $R_2$ ensure that equality holds in Proposition 2.3 for $n \le 4$.

Theorem 2.5 (Anstreicher and Burer [6]). Let $F := \{x \ge 0 : e^T x = 1\}$. Then $\operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap \text{RLT} \cap R_2 \cap \text{PSD}$ with equality if and only if $n \le 4$.

[36] and [6] also give related results where $F$ is an affine transformation of the standard simplex.

2.5.4. Half ellipsoid. Let $F$ be a half ellipsoid, that is, the intersection of a linear half-space and a possibly degenerate ellipsoid. In contrast to the previous cases considered, [57] proved that this case achieves equality in Proposition 2.3 regardless of the dimension $n$. On the other hand, the number of constraints is fixed. In particular, all simple bounds are infinite, $|LQ| = 1$, $|CQ| = 1$, and $NQ = \emptyset$, in which case Proposition 2.3 states simply $\operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap S_2 \cap \text{PSD}$.

Theorem 2.6 (Sturm and Zhang [57]). Suppose

\[ F = \left\{ x \in \mathbb{R}^n : \begin{array}{l} a_1^T x \le b_1 \\ x^T A_2 x + a_2^T x \le b_2 \end{array} \right\} \]

with $A_2 \succeq 0$ is nonempty. Then $\operatorname{clconv} \tilde{F} = \tilde{L} \cap S_2 \cap \text{PSD}$.

As far as we are aware, this is the only case where Proposition 2.3 is provably strengthened via use of the rank-2 second-order inequalities enforced by $S_2$.


2.5.5. Bounded quadratic form. The final case we consider is that of a bounded quadratic form. Specifically, for a given quadratic form $x^T A x + a^T x$ and bounds $-\infty \le b_l \le b_u \le +\infty$, let $F$ be the set of points such that the form falls within the bounds, i.e., $F = \{x : b_l \le x^T A x + a^T x \le b_u\}$. No assumptions are made on $A$; e.g., we do not assume that $A$ is positive semidefinite. As far as we are aware, the result proved below is new, but closely related results can be found in [57, 61].

Since there are no explicit bounds and no linear or convex quadratic inequalities, Proposition 2.3 states simply that $\operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap \text{PSD}$, where

\[ \tilde{L} \cap \text{PSD} = \left\{ (x, X) : \begin{array}{l} b_l \le \langle A, X \rangle + a^T x \le b_u \\[2pt] \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq 0 \end{array} \right\}. \tag{2.9} \]

In general, it appears that equality in Proposition 2.3 does not hold in this case. However, we can still characterize the difference between $\operatorname{clconv} \tilde{F}$ and $\tilde{L} \cap \text{PSD}$. As it turns out in the theorem below, this difference is precisely the recession cone of $\tilde{L} \cap \text{PSD}$, which equals

\[ \operatorname{rcone}(\tilde{L} \cap \text{PSD}) = \left\{ (0, D) : \begin{array}{ll} 0 \le \langle A, D \rangle & \text{if } -\infty < b_l \\ \langle A, D \rangle \le 0 & \text{if } b_u < +\infty \\ D \succeq 0 & \end{array} \right\}. \]

In particular, if $\operatorname{rcone}(\tilde{L} \cap \text{PSD})$ is trivial, e.g., when $A \succ 0$ and $b_l = b_u$ are finite, then we will have equality in Proposition 2.3. The proof makes use of an important external lemma but otherwise is self-contained. Related proof techniques have been used in [10, 18].
Lemma 2.3 (Pataki [48]). Consider a consistent feasibility system in the symmetric matrix variable $Y$, which enforces $Y \succeq 0$ as well as $p$ linear equalities and $q$ linear inequalities. Suppose $\bar{Y}$ is an extreme point of the feasible set, and let $\bar{r} := \operatorname{rank}(\bar{Y})$ and $\bar{s}$ be the number of inactive linear inequalities at $\bar{Y}$. Then $\bar{r}(\bar{r}+1)/2 + \bar{s} \le p + q$.
Theorem 2.7. Let $F = \{x \in \mathbb{R}^n : b_l \le x^T A x + a^T x \le b_u\}$ with $-\infty \le b_l \le b_u \le +\infty$ be nonempty. Then

\[ \operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap \text{PSD} = \operatorname{conv} \tilde{F} + \operatorname{rcone}(\tilde{L} \cap \text{PSD}). \]

Proof. Proposition 2.3 gives the inclusion $\operatorname{clconv} \tilde{F} \subseteq \tilde{L} \cap \text{PSD}$. So we need to prove $\tilde{L} \cap \text{PSD} = \operatorname{conv} \tilde{F} + \operatorname{rcone}(\tilde{L} \cap \text{PSD})$. The containment $\supseteq$ is straightforward by construction.

For the containment $\subseteq$, recall that any point in a convex set may be written as a convex combination of finitely many extreme points plus a finite number of extreme rays. Hence, to complete the proof, it suffices to show that every extreme point of $\tilde{L} \cap \text{PSD}$ is in $\tilde{F}$.

So let $(\bar{x}, \bar{X})$ be any extreme point of $\tilde{L} \cap \text{PSD}$. Examining (2.9) in the context of Lemma 2.3, $\tilde{L} \cap \text{PSD}$ can be represented as a feasibility system in

\[ Y := \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \]

under four scenarios based on the values of $(b_l, b_u)$:
(i) $(p, q) = (1, 0)$ if both $b_l$ and $b_u$ are infinite;
(ii) $(p, q) = (1, 1)$ if exactly one is finite;
(iii) $(p, q) = (1, 2)$ if both are finite and $b_l < b_u$; or
(iv) $(p, q) = (2, 0)$ if both are finite and $b_l = b_u$.
Define $\bar{Y}$ according to $(\bar{x}, \bar{X})$ and $(\bar{r}, \bar{s})$ as in the lemma. In the three cases (i), (ii), and (iv), $p + q \le 2$, and since $\bar{s} \ge 0$, we have

\[ \bar{r}(\bar{r}+1)/2 \le \bar{r}(\bar{r}+1)/2 + \bar{s} \le p + q \le 2, \]

in which case $\bar{r} \le 1$. In case (iii), $p + q = 3$ but $\bar{s} \ge 1$ since $b_l < b_u$, and so

\[ \bar{r}(\bar{r}+1)/2 \le p + q - \bar{s} \le 2, \]

which implies $\bar{r} \le 1$ as well. Overall, $\bar{r} \le 1$ means that $(\bar{x}, \bar{X})$ satisfies $\bar{X} = \bar{x}\bar{x}^T$, in which case $(\bar{x}, \bar{X}) \in \tilde{F}$, as desired.
We remark that, if one were able to prove $\operatorname{rcone}(\operatorname{clconv} \tilde{F}) = \operatorname{rcone}(\tilde{L} \cap \text{PSD})$, then one would have equality in general. Using Lemma 2.3, it is possible to show that $\operatorname{rcone}(\tilde{L} \cap \text{PSD}) = \operatorname{conv} \tilde{D}$, where

\[ D := \left\{ d \in \mathbb{R}^n : \begin{array}{ll} 0 \le d^T A d & \text{if } -\infty < b_l \\ d^T A d \le 0 & \text{if } b_u < +\infty \end{array} \right\}, \qquad \tilde{D} := \left\{ (0, dd^T) \in \mathbb{R}^n \times S^n : d \in D \right\}, \]

but the relationship between $\tilde{D}$ and $\operatorname{rcone}(\operatorname{clconv} \tilde{F})$ is unclear.
3. Convex relaxations and valid inequalities: Related topics.
3.1. Another approach to convexification. We have presented Section 2 in terms of the set $\operatorname{clconv} \tilde{F}$. Another common approach [29, 58] is to study the so-called convex and concave envelopes of nonconvex functions. For example, in the formulation (MIQCP), suppose $f(x) := x^T C x + c^T x$ is nonconvex, and let $f_-(x)$ be any convex function that underestimates $f(x)$ on $F$, i.e., $f_-(x) \le f(x)$ holds for all $x \in F$. Then one can relax $f(x)$ to $f_-(x)$. Considering all such $f_-(x)$, since the point-wise supremum of convex functions is convex, there is a single convex $\hat{f}_-(x)$ which most closely underestimates the objective. By definition, $\hat{f}_-(x)$ is the convex envelope of $f(x)$ over $F$. The convex envelope is closely related to the closed convex hull of the graph or epigraph of $f(x)$, i.e., $\operatorname{clconv}\{(x, f(x)) : x \in F\}$ or $\operatorname{clconv}\{(x, w) : x \in F, f(x) \le w\}$. Concave envelopes apply similar ideas to overestimation.
One can obtain a convex relaxation of (MIQCP) by relaxing the ob-
jective and all nonconvex constraints using convex envelopes. This can be


seen as an alternative to lifting via X = xxT . Either approach is generi-


cally hard. Another alternative is to perform some mixture of lifting and
convex envelopes as is done in [36].
Like the various results in Section 2 concerning $\operatorname{clconv} \tilde{F}$, there are relatively few cases (typically low-dimensional) for which exact convex envelopes are known. For example, the following gives another perspective on Theorem 2.2 and equation (2.3) above:

Theorem 3.1 (Al-Khayyal and Falk [4]). Let $F = \{x \in \mathbb{R}^n : l \le x \le u\}$ with $(l, u)$ finite in all components. For all $1 \le i < j \le n$, the convex and concave envelopes of $f(x_i, x_j) = x_i x_j$ over $F$ are, respectively,

\[ \max\{ l_i x_j + x_i l_j - l_i l_j,\ u_i x_j + x_i u_j - u_i u_j \}, \]
\[ \min\{ x_i u_j + l_i x_j - l_i u_j,\ x_i l_j + u_i x_j - u_i l_j \}. \]

These basic formulas can be used to construct convex underestimators


(not necessarily envelopes) for any quadratic function by separating that
quadratic function into pieces based on all xi xj . Such techniques are used
in the software BARON [51]. Also, [21] generalizes the above theorem to
the case where f is the product of two linear functions having disjoint
support.
3.2. An SOCP relaxation. [31] proposes an SOCP relaxation for (MIQCP), which does not explicitly require the lifting $X = xx^T$ and is related to ideas in difference-of-convex programming [28].
First, the authors assume without loss of generality that the objective of (MIQCP) is linear. This can be achieved, for example, by introducing a new variable $t \in \mathbb{R}$ as well as a new quadratic constraint $x^T C x + c^T x \le t$ and then minimizing $t$. Next, for each $k \in NQ$, $A_k$ is written as the difference of two (carefully chosen) positive semidefinite matrices $A_k^+, A_k^- \succeq 0$, i.e., $A_k = A_k^+ - A_k^-$, so that the $k$-th constraint may be expressed as

\[ x^T A_k^+ x + a_k^T x \le b_k + x^T A_k^- x. \]

Then, an auxiliary variable $z_k \in \mathbb{R}$ is introduced to represent $x^T A_k^- x$ but also immediately relaxed as $x^T A_k^- x \le z_k$, resulting in the convex system

\[ x^T A_k^+ x + a_k^T x \le b_k + z_k, \qquad x^T A_k^- x \le z_k. \]

Finally, $z_k$ must be bounded in some fashion, say as $z_k \le \mu_k \in \mathbb{R}$, or else the relaxation will in fact be useless. Bounding $z_k$ depends very much on the problem and the choice of $A_k^+, A_k^-$; [31] provides strategies for bounding $z_k$. The relaxation thus obtained can be modeled as a problem having only linear and convex quadratic inequalities, which can in turn be represented as an SOCP. In total, the relaxation obtained by the authors is
\[ \left\{ x \in \mathbb{R}^n : \begin{array}{ll} a_k^T x \le b_k & \forall\, k \in LQ \\ x^T A_k x + a_k^T x \le b_k & \forall\, k \in CQ \\ x^T A_k^+ x + a_k^T x \le b_k + z_k & \forall\, k \in NQ \\ x^T A_k^- x \le z_k & \forall\, k \in NQ \\ l \le x \le u, \quad z \le \mu & \end{array} \right\}. \]

This SOCP model is shown to be dominated by the SDP relaxation $\tilde{L} \cap \text{PSD}$, while it is not directly comparable to the basic LP relaxation $\tilde{L}$.
The above relaxation was recently revisited in [53]. The authors studied the relaxation obtained by the following splitting of the $A_k$ matrices:

\[ A_k = \sum_{\lambda_{kj} > 0} \lambda_{kj} v_{kj} v_{kj}^T - \sum_{\lambda_{kj} < 0} |\lambda_{kj}| v_{kj} v_{kj}^T, \]

where $\{\lambda_{k1}, \dots, \lambda_{kn}\}$ and $\{v_{k1}, \dots, v_{kn}\}$ are the sets of eigenvalues and eigenvectors of $A_k$, respectively. The constraint $x^T A_k x + a_k^T x \le b_k$ can thus be reformulated as

\[ \sum_{\lambda_{kj} > 0} \lambda_{kj} \left( v_{kj}^T x \right)^2 + a_k^T x \le b_k + \sum_{\lambda_{kj} < 0} |\lambda_{kj}| \left( v_{kj}^T x \right)^2. \]

The nonconvex terms $(v_{kj}^T x)^2$ ($\lambda_{kj} < 0$) can be relaxed by using their secant approximation to derive a convex relaxation of the above constraint. Instances of (MIQCP) tend to have geometric correlations along those $v_{kj}$ with $\lambda_{kj} < 0$, which can be captured by projection techniques and embedded within the polarity framework to derive strong cutting planes. We refer the reader to [53] for further details.
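A sketch of this splitting follows; it illustrates the idea rather than reproducing the code of [53], and common bounds y_lo ≤ $v_{kj}^T x$ ≤ y_hi (assumed derivable from the box constraints) are used for brevity:

```python
import numpy as np

def eig_split(A):
    # Split A into its PSD part and |eigenvalue|-weighted negative part.
    lam, V = np.linalg.eigh(A)
    A_plus = np.zeros_like(A, dtype=float)
    neg = []
    for l, v in zip(lam, V.T):
        if l > 0:
            A_plus += l * np.outer(v, v)
        elif l < 0:
            neg.append((-l, v))          # weight |lambda_kj|, eigenvector v_kj
    return A_plus, neg

def secant(y, y_lo, y_hi):
    # Overestimates y**2 on [y_lo, y_hi] (chord of the convex function y**2).
    return (y_lo + y_hi) * y - y_lo * y_hi

def relaxed_violation(A, a, b, x, y_lo, y_hi):
    # Relaxed form of x^T A x + a^T x <= b: the negative part moves to the
    # right-hand side and each (v^T x)^2 there is replaced by its secant.
    A_plus, neg = eig_split(A)
    rhs = b + sum(w * secant(v @ x, y_lo, y_hi) for w, v in neg)
    return x @ A_plus @ x + a @ x - rhs   # <= 0 whenever x is feasible
```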
3.3. Results relating simple bounds and the binary integer grid. Motivated by theorems such as 2.1 and 2.3 and the prior work of Padberg [44] and Yajima and Fujie [60], Burer and Letchford [19] studied the relationship between the two convex hulls

\[ \operatorname{clconv}\left\{ (x, xx^T) : x \in [0, 1]^n \right\} \tag{3.1a} \]
\[ \operatorname{clconv}\left\{ (x, xx^T) : x \in \{0, 1\}^n \right\}. \tag{3.1b} \]

The convex hull (3.1a) has been named $QPB_n$ by the authors because of its relationship to “quadratic programming over the box.” The convex hull (3.1b) is essentially the well-known boolean quadric polytope $BQP_n$ [44]. In fact, the authors show that $BQP_n$ is simply the coordinate projection of (3.1b) onto the variables $x_i$ ($1 \le i \le n$) and $X_{ij}$ ($1 \le i < j \le n$). Note that nothing is lost in the projection because $X_{ii} = x_i$ and $X_{ji} = X_{ij}$ are valid for (3.1b).

We let $\pi$ represent the coordinate projection just mentioned, i.e., onto the variables $x_i$ and $X_{ij}$ ($i < j$). The authors' result can be stated as $\pi(QPB_n) = BQP_n$, which immediately implies the following:

Theorem 3.2 (Burer and Letchford [19]). Any inequality in the variables $x_i$ ($1 \le i \le n$) and $X_{ij}$ ($1 \le i < j \le n$), which is valid for $BQP_n$, is also valid for $QPB_n$.

For proper interpretation of the theorem, it is important to keep in mind that $QPB_n$ still involves the variables $X_{ii}$, while those same variables have been projected out to obtain $BQP_n$. So another way to phrase the theorem is as follows: a valid inequality for $BQP_n$ becomes valid for $QPB_n$ when the variables $X_{ii}$ are introduced into the inequality with zero coefficients.

This result shows that, in a certain sense, describing $QPB_n$ is at least as hard as describing $BQP_n$, and since many classes of valid inequalities are already known for $BQP_n$, it also gives many classes of valid inequalities for $QPB_n$. Indeed, the authors prove that many classes of facets for $BQP_n$ are in fact facets for $QPB_n$. The authors also demonstrate that PSD and the RLT inequalities for pairs $(i, i)$ help describe $QPB_n$ beyond $BQP_n$.
3.4. Completely positive programming. Burer [18] has recently
studied a special case of (MIQCP) having
⎧ ⎫
⎨ Ax = b ⎬
F = x ≥ 0 : xi xj = 0 ∀ (i, j) ∈ E , (3.2)
⎩ ⎭
xi ∈ {0, 1} ∀ i ∈ I

where, in particular, A is a rectangular matrix and E ⊆ [n] × [n] is sym-


metric. The author considers this specific form because it is amenable to
analysis and yet is still fairly general. The results in [18] do not appear to
hold, for example, under the general quadratic constraints of (MIQCP).
Proposition 2.3 and similar logic as in Theorem 2.3 show that

clconv F̂ ⊆ Ĥ_PSD := L̂ ∩ RLT ∩ R₂ ∩ PSD ∩ {(x, X) : Xᵢᵢ = xᵢ ∀ i ∈ I}.

The following simplifying proposition is fairly easy to show:
Proposition 3.1 (Burer [17]). It holds that

$$\hat{H}_{PSD} \;=\; \left\{\, (x, X) \in PSD \;:\;
\begin{array}{l}
(x, X) \ge 0\\
Ax = b, \;\; \mathrm{diag}(A X A^T) = b^2\\
X_{ij} = 0 \;\;\forall\, (i,j) \in E\\
X_{ii} = x_i \;\;\forall\, i \in I
\end{array}\right\}.$$

Actually, (x, X) ∈ clconv F̂ satisfies a stronger convex condition than (x, X) ∈ PSD. Recall that (x, X) ∈ PSD is derived from

$$\begin{pmatrix} 1 & x^T\\ x & xx^T \end{pmatrix} \;=\; \begin{pmatrix} 1\\ x \end{pmatrix}\begin{pmatrix} 1\\ x \end{pmatrix}^T \;\succeq\; 0.$$

Using that x ∈ F has x ≥ 0, the above matrix is completely positive, not just positive semidefinite; see Section 2.5.3. We write

$$(x, X) \in CPP \;\Longleftrightarrow\; \begin{pmatrix} 1 & x^T\\ x & X \end{pmatrix} \text{ is completely positive}$$

www.it-ebooks.info
388 SAMUEL BURER AND ANUREET SAXENA

and define Ĥ_CPP := Ĥ_PSD ∩ CPP. The result below establishes that clconv F̂ = Ĥ_CPP.
Theorem 3.3 (Burer [18], Bomze and Jarre [15]). Let F be defined as in (3.2). Define J := {j : ∃ k s.t. (j,k) ∈ E or (k,j) ∈ E}, and suppose xᵢ is bounded in {x ≥ 0 : Ax = b} for all i ∈ I ∪ J. Then clconv F̂ = Ĥ_CPP.
We emphasize that the result holds regardless of the boundedness of F as a whole; it is only important that certain variables are bounded. Completely positive representations of clconv F̂ for different F, which are not already covered by the above theorem, can also be found in [49, 50].
Starting from the above theorem, Burer [17] has implemented a specialized algorithm for optimizing the relaxation Ĥ_PSD. We briefly discuss this implementation in Section 5.
3.5. Higher-order liftings and projections. Whenever it is not possible to capture clconv F̂ exactly in the lifted space (x, X), it is still possible to lift into ever higher dimensional spaces and to linearize, say, cubic, quartic, or higher-degree valid inequalities there. This is quite a deep and powerful technique for capturing clconv F̂. We refer the reader to the following papers: [9, 14, 33, 34, 35, 37, 54].
One of the most famous results in this area is the sequential convexification result for mixed 0-1 linear programs (M01LPs). Balas [8] showed that M01LPs are special cases of facial disjunctive programs, which possess the sequential convexifiability property. Simply put, this means that the closed convex hull of all feasible solutions to an M01LP can be obtained by imposing the 0-1 condition on the binary variables sequentially, i.e., by imposing the 0-1 condition on the first binary variable and convexifying the resulting set, followed by imposing the 0-1 condition on the second variable, and so on. This is stated as the following theorem.
Theorem 3.4 (Balas [8]). Let F be the feasible set of an M01LP, i.e.,

F = { x ∈ {0,1}ⁿ : aₖᵀx ≤ bₖ ∀ k = 1, ..., m },

and define L to be its basic linear relaxation in x. For each i = 1, ..., n, define Tᵢ := {x : xᵢ ∈ {0,1}} and

S₀ := L,
Sᵢ := clconv (Sᵢ₋₁ ∩ Tᵢ)  ∀ i = 1, ..., n.

Then Sₙ = clconv F.
There exists an analogous sequential convexification for the continuous case of (MIQCP) with general quadratic constraints.
Theorem 3.5 (Saxena et al. [52]). Suppose that the feasible region F of (MIQCP) is bounded with I = ∅, i.e., no integer variables. For each i = 1, ..., n, define T̂ᵢ := {(x, X) : Xᵢᵢ ≤ xᵢ²}. Also define

Ŝ₀ := L̂ ∩ PSD,
Ŝᵢ := clconv (Ŝᵢ₋₁ ∩ T̂ᵢ)  ∀ i = 1, ..., n.

www.it-ebooks.info
OLD WINE IN A NEW BOTTLE: THE MILP ROAD TO MIQCP 389

Then Ŝₙ = clconv F̂.
Part of the motivation for this theorem comes from the fact that

PSD ∩ T̂₁ ∩ ⋯ ∩ T̂ₙ = {(x, X) : X = xxᵀ},

i.e., enforcing all T̂ᵢ along with positive semidefiniteness recovers the non-convex condition X = xxᵀ. This is analogous to the fact that T₁ ∩ ⋯ ∩ Tₙ recovers the integer condition in Theorem 3.4.
There is one crucial difference between Theorems 3.4 and 3.5. Note that an M01LP with a single binary variable is polynomial-time solvable; Balas [8] gave a polynomial-sized LP for this problem. On the other hand, the analogous problem in the context of (MIQCP) involves minimizing a linear function over a nonconvex set of the form

L̂ ∩ PSD ∩ {(x, X) : Xᵢᵢ ≤ xᵢ²}.

It is not immediately clear if this is a polynomial-time solvable problem. Indeed, it is likely to be NP-hard [38].
An immediate consequence of any sequential convexification theorem is that it decomposes the non-convexity of the original problem (M01LP or MIQCP) into a set of simple atomic non-convex conditions, such as xⱼ ∈ {0,1} or Xᵢᵢ ≤ xᵢ², that can be handled separately. For instance, Balas, Ceria and Cornuéjols [9] studied a lifted LP formulation of M01LP with a single binary variable and combined it with projection techniques to derive a family of cutting planes for M01LP, widely known as lift-and-project cuts. In order to apply the same idea to (MIQCP) we need systematic techniques for deriving valid cutting planes for the set L̂ ∩ PSD ∩ {(x, X) : Xᵢᵢ ≤ xᵢ²}; a disjunctive programming based approach is described in the following section.
Theorems 3.4 and 3.5 can actually be combined to convexify any bounded F having a mix of binary and continuous variables. Also, Theorem 3.5 holds if the sets T̂ᵢ are defined with respect to any orthonormal basis {v₁, ..., vₙ}, i.e., T̂ᵢ = {(x, X) : ⟨vᵢvᵢᵀ, X⟩ ≤ (vᵢᵀx)²}, not just the standard basis {e₁, ..., eₙ}. We refer the reader to [52] for proofs of these results.
4. Dynamic approaches for generating valid inequalities. Our starting point in this section is the lifted version F̂ of the feasible set F, whose convex hull can be relaxed, for example, as clconv F̂ ⊆ L̂ ∩ RLT ∩ R₂ ∩ S₂ ∩ PSD (see Proposition 2.3). We are particularly interested in improving this relaxation through valid inequalities coming from a certain disjunctive programming approach.
Besides the presence of the integrality constraints xᵢ ∈ ℤ, the only nonconvex constraint in F̂ is the nonlinear equation X = xxᵀ, which can be represented exactly by the pair of SDP inequalities X − xxᵀ ⪰ 0 and


X − xxᵀ ⪯ 0. In fact, by the Schur complement theorem, the former is equivalent to the inequality (2.6), which is enforced by PSD. However, the latter is nonconvex. So relaxing F̂ to L̂ ∩ PSD can be viewed as simply dropping X − xxᵀ ⪯ 0. Said differently, F̂ = L̂ ∩ PSD ∩ {(x, X) : X − xxᵀ ⪯ 0}. Harnessing the power of the inequality X − xxᵀ ⪯ 0 constitutes the emphasis of this section.
As an aside, we would like to mention that all the results presented in this section exploit the continuous non-convexities in (MIQCP) to generate cutting planes. Non-convexities arising from the presence of integer variables can be handled in the manner that is usually done in MILP; we refer the reader to [52] for computational results on disjunctive cuts for (MIQCP) derived from elementary 0-1 disjunctions, and to [7] for mixed integer rounding cuts for conic programs with 0-1 variables.
4.1. A procedure for generating disjunctive cuts. For any orthonormal basis {v₁, ..., vₙ} of Rⁿ,

F̂ = L̂ ∩ PSD ∩ {(x, X) : X − xxᵀ ⪯ 0}                                  (4.1)
  = L̂ ∩ PSD ∩ {(x, X) : ⟨X, vᵢvᵢᵀ⟩ ≤ (vᵢᵀx)² ∀ i = 1, ..., n}.        (4.2)

Given an arbitrary incumbent solution (x̂, X̂), say, from optimizing over L̂ or L̂ ∩ PSD, we would like to choose a basis {v₁, ..., vₙ} whose corresponding reformulation most effectively elucidates the infeasibility of (x̂, X̂) with respect to (4.1). The problem of choosing such a basis can be formulated as the following optimization problem that focuses on maximizing the violation of (x̂, X̂) with respect to the set of nonconvex constraints ⟨X, vᵢvᵢᵀ⟩ ≤ (vᵢᵀx)²:

max   max_{i=1,...,n} ⟨X̂, vᵢvᵢᵀ⟩ − (vᵢᵀx̂)²
s.t.  {v₁, ..., vₙ} is an orthonormal basis.

Clearly, a set of orthonormal eigenvectors of X̂ − x̂x̂ᵀ is an optimal solution to the above problem. This exercise of choosing a reformulation that hinges on certain characteristics of the incumbent solution (in this case the spectral decomposition of X̂ − x̂x̂ᵀ) can be viewed as a dynamic reformulation technique that rotates the coordinate axes so as to most effectively highlight the infeasibility of the incumbent solution.
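In code, this choice amounts to a single spectral decomposition of X̂ − x̂x̂ᵀ per incumbent; a minimal sketch (ours, assuming numpy):

import numpy as np

def most_violated_directions(x_hat, X_hat, tol=1e-8):
    # Eigen-decompose X_hat - x_hat x_hat^T.  For a unit eigenvector v with
    # eigenvalue lam, <X_hat, vv^T> - (v^T x_hat)^2 = lam, so the eigenvectors
    # with positive eigenvalues are exactly the directions along which the
    # incumbent violates the nonconvex constraints in (4.2).
    lam, V = np.linalg.eigh(X_hat - np.outer(x_hat, x_hat))
    return [(lam[j], V[:, j]) for j in range(len(lam)) if lam[j] > tol]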
Having chosen an orthonormal basis, we need a systematic technique to derive cutting planes for clconv F̂ using (4.2). We use the framework of disjunctive programming to accomplish this goal. Classical disjunctive programming of Balas [8] requires a linear relaxation P̂ of F̂ and a disjunction that is satisfied by all (x, X) ∈ F̂. The linear relaxation P̂ could be taken equal to L̂ but could also incorporate cutting planes generated from previous incumbent solutions. As for the choice of disjunctions, we seek the sources of nonconvexities in (4.2). Evidently, (4.2) has two of these,


namely, the integrality conditions on the variables xᵢ for i ∈ I and the nonconvex constraints ⟨X, vᵢvᵢᵀ⟩ ≤ (vᵢᵀx)². Integrality constraints have been used to derive disjunctions in MILP for the past five decades; examples of such disjunctions include elementary 0-1 disjunctions, split disjunctions, GUB disjunctions, etc. We do not detail these disjunctions here. For constraints of the type ⟨X, vvᵀ⟩ ≤ (vᵀx)² for fixed v ∈ Rⁿ, Saxena et al. [52] proposed a technique to derive a valid disjunction, which we detail next. Following [52], we refer to ⟨X, vvᵀ⟩ ≤ (vᵀx)² as a univariate expression.
Let

η_L(v) := min { vᵀx | (x, X) ∈ P̂ },
η_U(v) := max { vᵀx | (x, X) ∈ P̂ },
θ ∈ (η_L(v), η_U(v)).

In their computational experiments, Saxena et al. [52] chose θ = vᵀx̂, where (x̂, X̂) is the current incumbent. Every (x, X) ∈ F̂ satisfies the following disjunction:

$$\left[\begin{array}{c}
\eta_L(v) \le v^T x \le \theta\\
-(v^T x)(\eta_L(v) + \theta) + \theta\,\eta_L(v) \le -\langle X, vv^T\rangle
\end{array}\right]
\;\bigvee\;
\left[\begin{array}{c}
\theta \le v^T x \le \eta_U(v)\\
-(v^T x)(\eta_U(v) + \theta) + \theta\,\eta_U(v) \le -\langle X, vv^T\rangle
\end{array}\right] \tag{4.3}$$

This disjunction can be derived by splitting the range [η_L(v), η_U(v)] of the function vᵀx over P̂ into the two intervals [η_L(v), θ] and [θ, η_U(v)] and constructing a secant approximation of the function −(vᵀx)² in each of the intervals, respectively (see Figure 1 for an illustration).
The disjunction (4.3) can then be embedded within the framework of
Cut Generation Linear Programs (CGLPs) to derive disjunctive cuts as
discussed in the following theorem.
Theorem 4.1 ([8]).¹ Let a polyhedral set P = {x : Ax ≥ b}, a disjunction D = ∨_{k=1}^{q} (Dₖx ≥ dₖ), and a point x̂ ∈ P be given. Then x̂ ∈ Q := clconv ∪_{k=1}^{q} {x ∈ P | Dₖx ≥ dₖ} if and only if the optimal value of the following Cut Generation Linear Program (CGLP) is non-negative:

¹ We caution the reader that the notation used in this theorem is not specifically tied to the notation for F and related sets.


Fig. 1. The constraint −(vᵀx)² ≤ −⟨X, vvᵀ⟩ and the disjunction (4.3) represented in the space spanned by vᵀx (horizontal axis) and −⟨X, vvᵀ⟩ (vertical axis). The feasible region is the grey area above the parabola between η_L(v) and η_U(v). Disjunction (4.3) is obtained by taking the piecewise-linear approximation of the parabola, using a breakpoint at θ, and given by the two lines L₁ and L₂. Clearly, if η_L(v) ≤ vᵀx ≤ θ then (x, X) must be above L₁ to be in the grey area; if θ ≤ vᵀx ≤ η_U(v) then (x, X) must be above L₂.

$$\begin{array}{lll}
\min & \alpha^T \hat{x} - \beta & \text{(CGLP)}\\
\text{s.t.} & A^T u^k + D_k^T v^k = \alpha & k = 1, \ldots, q\\
& b^T u^k + d_k^T v^k \ge \beta & k = 1, \ldots, q\\
& u^k, v^k \ge 0 & k = 1, \ldots, q\\
& \displaystyle\sum_{k=1}^{q} \left( \xi^T u^k + \xi_k^T v^k \right) = 1 &
\end{array}$$

where ξ and ξₖ (k = 1, ..., q) are any non-negative vectors of conformable dimensions that satisfy ξₖ > 0 (k = 1, ..., q). If the optimal value of (CGLP) is negative, and (α, β) are part of an optimal solution, then αᵀx ≥ β is a valid inequality for Q which cuts off x̂.
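To give a concrete impression of (CGLP), the following sketch (ours; it assumes scipy is available and uses the all-ones normalization ξ = ξₖ = 1) sets it up and solves it with an off-the-shelf LP solver for P = {x : Ax ≥ b} and terms Dₖx ≥ dₖ:

import numpy as np
from scipy.optimize import linprog

def cglp_cut(A, b, terms, x_hat, eps=1e-9):
    # CGLP of Theorem 4.1; terms is a list of (D_k, d_k) pairs meaning
    # D_k x >= d_k.  Variables: alpha (n), beta (1), then (u^k, v^k) >= 0.
    mP, n = A.shape
    sizes = [Dk.shape[0] for Dk, _ in terms]
    nvar = n + 1 + sum(mP + mk for mk in sizes)
    c = np.zeros(nvar)
    c[:n], c[n] = x_hat, -1.0                 # minimize alpha^T x_hat - beta
    A_eq, b_eq, A_ub, b_ub = [], [], [], []
    norm = np.zeros(nvar)                     # normalization row (xi = 1)
    off = n + 1
    for (Dk, dk), mk in zip(terms, sizes):
        eq = np.zeros((n, nvar))              # A^T u^k + D_k^T v^k = alpha
        eq[:, :n] = -np.eye(n)
        eq[:, off:off + mP] = A.T
        eq[:, off + mP:off + mP + mk] = Dk.T
        A_eq.append(eq); b_eq.append(np.zeros(n))
        ub = np.zeros(nvar)                   # beta <= b^T u^k + d_k^T v^k
        ub[n] = 1.0
        ub[off:off + mP] = -b
        ub[off + mP:off + mP + mk] = -dk
        A_ub.append(ub); b_ub.append(0.0)
        norm[off:off + mP + mk] = 1.0
        off += mP + mk
    A_eq.append(norm[None, :]); b_eq.append(np.ones(1))
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (nvar - n - 1)
    res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.vstack(A_eq), b_eq=np.concatenate(b_eq),
                  bounds=bounds)
    if res.status == 0 and res.fun < -eps:    # negative value: cut found
        return res.x[:n], res.x[n]            # alpha^T x >= beta cuts off x_hat
    return None                               # x_hat already in Q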
Next, we illustrate the above procedure for deriving disjunctive cuts on a small example. Consider the following instance of (MIQCP) derived from the st_ph11 instance from the GLOBALLib repository [27]:

min  x₁ + x₂ + x₃ − ½(x₁² + x₂² + x₃²)
s.t. 2x₁ + 3x₂ + 4x₃ ≤ 35
     0 ≤ x₁, x₂, x₃ ≤ 4.


An optimal solution to the linear-semidefinite relaxation

min { x₁ + x₂ + x₃ − ½(X₁₁ + X₂₂ + X₃₃) : (x, X) ∈ L̂ ∩ PSD }

is

$$\hat{x} = \begin{pmatrix} 4\\ 4\\ 3.75 \end{pmatrix}, \qquad
\hat{X} = \begin{pmatrix} 16 & 16 & 15\\ 16 & 16 & 15\\ 15 & 15 & 15 \end{pmatrix}$$

and so

$$\hat{X} - \hat{x}\hat{x}^T = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0.9375 \end{pmatrix}$$

has exactly one non-zero eigenvalue. The associated eigenvector and univariate expression are given by cᵀ = (0, 0, 1) and X₃₃ ≤ x₃², respectively. Note that (x̂, X̂) satisfies the secant approximation X₃₃ ≤ 4x₃ of X₃₃ ≤ x₃² at equality; hence the secant inequality does not cut off this point. Choosing θ = 2 in (4.3), we get the following disjunction, which is satisfied by every feasible solution (x, X) ∈ F̂ for this example:

[ 0 ≤ x₃ ≤ 2,  2x₃ − X₃₃ ≥ 0 ]  ∨  [ 2 ≤ x₃ ≤ 4,  6x₃ − X₃₃ ≥ 8 ].

In order to derive a disjunctive cut, for each term in the disjunction we sum a non-negative weighted combination of its constraints together with the original linear constraints of F̂ to construct a new constraint valid for that term in the disjunction. If the separate weights in each term can be chosen in such a way that the resulting constraints for both terms are the same, then that constraint is a disjunctive cut. In particular, using the weighting scheme

$$\left[\begin{array}{lr}
2x_3 - X_{33} \ge 0 & (14.70588)\\
-x_1 \ge -4 & (15.68627)\\
-x_2 \ge -4 & (23.52941)\\
x_3 \ge 0 & (27.45098)
\end{array}\right]
\;\bigvee\;
\left[\begin{array}{lr}
6x_3 - X_{33} \ge 8 & (14.70588)\\
-2x_1 - 3x_2 - 4x_3 \ge -35 & (7.84314)
\end{array}\right]$$

we arrive at the disjunctive cut

−15.68627x₁ − 23.52941x₂ + 56.86275x₃ − 14.70588X₃₃ ≥ −156.86274.

This disjunctive cut is violated by (x̂, X̂).
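The aggregation above can be verified numerically; the short script below (ours, assuming numpy) encodes each term's rows as "lhs ≥ rhs" in the variables (x₁, x₂, x₃, X₃₃), applies the weights, and evaluates the resulting cut at (x̂, X̂):

import numpy as np

# Rows are (coef x1, coef x2, coef x3, coef X33, rhs) for "lhs >= rhs".
term1 = np.array([[0., 0., 2., -1., 0.],      # 2 x3 - X33 >= 0
                  [-1., 0., 0., 0., -4.],     # -x1 >= -4
                  [0., -1., 0., 0., -4.],     # -x2 >= -4
                  [0., 0., 1., 0., 0.]])      # x3 >= 0
w1 = np.array([14.70588, 15.68627, 23.52941, 27.45098])
term2 = np.array([[0., 0., 6., -1., 8.],      # 6 x3 - X33 >= 8
                  [-2., -3., -4., 0., -35.]]) # -2x1 - 3x2 - 4x3 >= -35
w2 = np.array([14.70588, 7.84314])

cut1, cut2 = w1 @ term1, w2 @ term2           # identical up to rounding:
# both ~ (-15.68627, -23.52941, 56.86275, -14.70588 | rhs -156.86274)

point = np.array([4.0, 4.0, 3.75, 15.0])      # x_hat and the X33 entry of X_hat
print(cut1[:4] @ point, ">=", cut1[4])        # -164.22 >= -156.86 fails: cut off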


This example highlights a very important aspect of disjunctive programming: its ability to involve additional problem constraints in deriving strong cuts for clconv F̂. For illustration, consider the same example and the relaxation L̂ ∩ RLT ∩ PSD. Note that this relaxation only incorporates the effect of the general linear constraint 2x₁ + 3x₂ + 4x₃ ≤ 35 via the set L̂. Defining
$$x^1 = \begin{pmatrix} 4\\ 4\\ 4 \end{pmatrix},\;\; X^1 = x^1 (x^1)^T, \qquad
x^2 = \begin{pmatrix} 4\\ 4\\ 0 \end{pmatrix},\;\; X^2 = x^2 (x^2)^T$$

and (x̂, X̂) := (15/16)(x¹, X¹) + (1/16)(x², X²), i.e., the same (x̂, X̂) as in the example, it holds that (x̂, X̂) ∈ L̂ ∩ RLT ∩ PSD. Note, however, that the endpoint (x¹, X¹) is not in clconv F̂ since x¹ ∉ F, which implies (x¹, X¹) ∉ L̂ in this case. So it remains possible that (x̂, X̂) is not in clconv F̂. Indeed, by explicitly involving the linear constraint in a more powerful way during the convexification process, disjunctive programming cuts off (x̂, X̂) from clconv F̂.
In fact, something stronger can be said. For this same example, define F_lu := {x : 0 ≤ x₁, x₂, x₃ ≤ 4} so that F = F_lu ∩ {x : 2x₁ + 3x₂ + 4x₃ ≤ 35}; also define F̂_lu accordingly. Now consider the stronger relaxation L̂ ∩ clconv F̂_lu of clconv F̂, which completely convexifies with respect to the bounds l ≤ x ≤ u. Still, even this stronger relaxation contains (x̂, X̂), and so we see that convexifying with respect to the bounds is simply not enough to cut off (x̂, X̂). One must incorporate the general linear inequality in a more aggressive fashion, as disjunctive programming does.
4.2. Computational insights. In [52], the authors report computational results with a cutting plane procedure based on these ideas. For instances of (MIQCP) coming from GLOBALLib, the authors solved five separate relaxations of

v* := min { ⟨C, X⟩ + cᵀx : (x, X) ∈ clconv F̂ }.

These relaxations were (with accompanying "version numbers" and optimal values for reference)

(V0)      v_RLT := min { ⟨C, X⟩ + cᵀx : (x, X) ∈ L̂ ∩ RLT },
(V1)      v_PSD := min { ⟨C, X⟩ + cᵀx : (x, X) ∈ L̂ ∩ RLT ∩ PSD },


(V2)      v_dsj := min { ⟨C, X⟩ + cᵀx : (x, X) ∈ L̂ ∩ RLT ∩ PSD ∩ "disjunctive cuts" },
(V2-SI)   v_sec := min { ⟨C, X⟩ + cᵀx : (x, X) ∈ L̂ ∩ RLT ∩ PSD ∩ "secant cuts" },
(V2-Dsj)  v′_dsj := min { ⟨C, X⟩ + cᵀx : (x, X) ∈ L̂ ∩ RLT ∩ "disjunctive cuts" }.

The secant cut referred to in the description of V2-SI is obtained by constructing the convex envelope of the non-convex inequality ⟨X, vvᵀ⟩ ≤ (vᵀx)²; using the notation introduced above, the corresponding secant inequality is given by ⟨X, vvᵀ⟩ ≤ (η_L(v) + η_U(v))vᵀx − η_L(v)η_U(v). Since the secant inequality can be obtained cheaply once η_L(v) and η_U(v) have been computed, variant V2-SI helps us assess the marginal importance of using the computationally expensive disjunctive cut as compared to the readily available secant inequality.

Note that v* ≤ v_dsj ≤ v_sec ≤ v_PSD ≤ v_RLT and v* ≤ v_dsj ≤ v′_dsj ≤ v_RLT. V0 was used as a base relaxation by which the others were judged. In particular, for each of the four remaining relaxations, the metric

percent duality gap closed := 100 × (v_RLT − v) / (v_RLT − v*)

was recorded on each instance using the optimal value v for that relaxation. Only instances having v_RLT > v* were selected for testing (129 instances).
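For reference, the metric as a one-line helper (ours):

def pct_duality_gap_closed(v_rlt, v_star, v):
    # Fraction of the V0 duality gap closed by a relaxation with value v.
    return 100.0 * (v_rlt - v) / (v_rlt - v_star)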
We remark that, when present, constraint PSD was enforced with a cutting-
plane approach based on convex quadratic cuts rather than a black-box
SDP solver.
Table 1
Summary of computational results.

  Gap closed            V1        V2      V2-SI    V2-Dsj
  > 99.99%              16        23        24         1
  98-99.99%              1        44         4        29
  75-98%                10        23        17        10
  25-75%                11        22        26        29
  0-25%                 91        17        58        60
  Average gap closed  24.80%   76.49%    44.40%    41.54%

Table 1 summarizes the authors’ key results on the 129 instances.


Each of the main columns gives, for that version, the number of instances
in several bins of the metric “percentage gap closed.” Some comments are
in order.


First, the variant V2 code that uses disjunctive cuts closes 50% more
duality gap than the SDP relaxation V1. In fact, relaxations obtained
by adding disjunctive cuts close more than 98% of the duality gap on 67
out of 129 instances; the same figure for SDP relaxations is 17 out of 129
instances. Second, the authors were able to close 99% of the duality gap on some of the instances, such as st_qpc-m3a, st_ph13, st_ph11, ex3_1_4, st_jcbpaf2, ex2_1_9, etc., on which the SDP relaxation closes 0% of the duality gap.
Third, the variant V2-SI of the code that uses the secant inequality instead of disjunctive cuts does close a significant proportion (44.40%) of the duality gap. However, using disjunctive cuts improves this statistic to 76.49%, thereby demonstrating the marginal benefits of disjunctive programming. Fourth, it is worth observing that both variants V2 and V2-SI have access to the same kinds of nonconvexities, namely, univariate expressions ⟨X, vvᵀ⟩ ≤ (vᵀx)² derived from eigenvectors v of X̂ − x̂x̂ᵀ. Despite this commonality, why does V2, which has access to the CGLP apparatus, outperform V2-SI? The answer to this question lies in the way the individual frameworks process the nonconvex expression ⟨X, vvᵀ⟩ ≤ (vᵀx)². While V2-SI takes a local view of the problem and convexifies ⟨X, vvᵀ⟩ ≤ (vᵀx)² in the 2-dimensional space spanned by vᵀx and ⟨X, vvᵀ⟩, V2 takes a global view of the problem and combines disjunctive terms with other problem constraints. It is precisely this ability to derive stronger inferences by combining disjunctive information with other problem constraints that allows V2 to outperform its local counterpart V2-SI.
Fifth, it is worth observing that removing PSD has a debilitating effect
on the cutting plane algorithm presented in [52] as demonstrated by the
performance of V2-Dsj relative to V2. While the CGLP apparatus allows
us to take a global view of the problem, its ability to derive strong disjunc-
tive cuts is limited by the strength of the initial relaxation. By removing
PSD, the relaxation is significantly weakened, and this subsequently has a
deteriorating effect on the strength of disjunctive cuts later derived.

Table 2
Selection criteria.

  % duality gap closed by
  V1         V2         Instance chosen
  < 10%      > 90%      st_jcbpaf2
  > 40%      < 60%      ex9_2_7
  < 10%      < 10%      ex7_3_1

The basic premise of the work in [52] lies in generating valid cutting planes for clconv F̂ from the spectrum of X̂ − x̂x̂ᵀ, where (x̂, X̂) is the incumbent solution. In order to highlight the impact of these cuts on



Fig. 2. Plot of the sum of positive and negative eigenvalues for st_jcbpaf2 with V1–V2.


Fig. 3. Plot of the sum of positive and negative eigenvalues for ex9_2_7 with V1–V2.

the spectrum itself, the authors presented details on three instances listed in Table 2, which we reproduce here for the sake of illustration. Figures 2–4 report the key results. The horizontal axis represents the number of iterations while the vertical axis reports the sum of the positive eigenvalues of X̂ − x̂x̂ᵀ (broken line) and the sum of the negative eigenvalues of X̂ − x̂x̂ᵀ (solid line). Some remarks are in order.
First, the graph of the sum of negative eigenvalues converges to zero much faster than the corresponding graph for positive eigenvalues. This is not surprising because the problem of eliminating the negative eigenvalues is a convex programming problem, namely an SDP; the approach of adding convex-quadratic cuts is just an iterative cutting-plane based technique to impose the X − xxᵀ ⪰ 0 condition. Second, V1 has a widely varying effect on the sum of positive eigenvalues of X − xxᵀ. This is to be expected because the X − xxᵀ ⪰ 0 condition imposes no constraint on the positive eigenvalues of X − xxᵀ. Furthermore, the sum of positive eigenvalues represents the part of the nonconvexity of F̂ that is not captured by PSD.



Fig. 4. Plot of the sum of the positive and negative eigenvalues for ex7_3_1 with V1–V2.

namely V2, is able to force both positive and negative eigenvalues to converge to 0 for st_jcbpaf2, thereby generating an almost feasible solution to this problem.
4.3. Working with only the original variables. Finally, we would like to emphasize that all of the relaxations of clconv F̂ discussed until now are defined in the lifted space of (x, X). While the additional variable X enhances the expressive power of the formulation, it also increases the size of the formulation drastically, resulting in an enormous computational overhead which would be, for example, incurred at every node of a branch-and-bound tree. Ideally, we would like to extract the strength of these extended reformulations in the form of cutting planes that are defined only in the space of the original x variable. Systematic approaches for constructing such convex relaxations of (MIQCP) are described in a recent paper by Saxena et al. [53]. We briefly reproduce some of these results to expose the reader to this line of research.
Consider the relaxation L̂ ∩ RLT ∩ PSD of clconv F̂, and define Q := proj_x(L̂ ∩ RLT ∩ PSD), which is a relaxation of clconv F (not clconv F̂!) in the space of the original variable x, but one that retains the power of L̂ ∩ RLT ∩ PSD. Can we separate from Q, hence enabling us to work solely in the x-space? Specifically, given a point x̂ that satisfies at least the simple bounds l ≤ x ≤ u, we desire an algorithmic framework that either shows x̂ ∈ Q or finds an inequality valid for Q which cuts off x̂. Note that x̂ ∈ Q if and only if the following system is feasible in X with x̂ fixed:
if and only if the following system is feasible in X with x̂ fixed:
$$\begin{gathered}
\langle A_k, X\rangle + a_k^T \hat{x} \le b_k \quad \forall\, k = 1, \ldots, m\\
\left.\begin{array}{c} l\hat{x}^T + \hat{x}l^T - ll^T\\ u\hat{x}^T + \hat{x}u^T - uu^T \end{array}\right\} \le X \le \hat{x}u^T + l\hat{x}^T - lu^T\\
\begin{pmatrix} 1 & \hat{x}^T\\ \hat{x} & X \end{pmatrix} \succeq 0.
\end{gathered}$$
As is typical, if this system is infeasible, then duality theory provides a cut
(in this case, a convex quadratic cut) cutting off x̂ from Q. Further, one


can optimize to obtain a deep cut. We refer the reader to [53] for further
details where the authors report computational results to demonstrate the
computational dividends of working in the space of original variables possi-
bly augmented by a few additional variables. We reproduce a slice of their
computational results in Section 5.
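As a rough prototype of this membership test (ours, not the implementation of [53]; it assumes the cvxpy package with an SDP-capable solver), one can pose the system above as an SDP feasibility problem in X with x̂ fixed; when it is infeasible, the dual certificate is what yields the convex quadratic cut:

import cvxpy as cp
import numpy as np

def x_hat_in_Q(x_hat, quads, l, u):
    # Feasibility in X of the system above with x fixed at x_hat;
    # quads is a list of (A_k, a_k, b_k) for <A_k, X> + a_k^T x <= b_k.
    n = len(x_hat)
    X = cp.Variable((n, n), symmetric=True)
    lx, ux = np.outer(l, x_hat), np.outer(u, x_hat)   # l x_hat^T, u x_hat^T
    cons = [cp.bmat([[np.ones((1, 1)), x_hat.reshape(1, n)],
                     [x_hat.reshape(n, 1), X]]) >> 0,
            X >= lx + lx.T - np.outer(l, l),
            X >= ux + ux.T - np.outer(u, u),
            X <= ux.T + lx - np.outer(l, u)]          # x_hat u^T + l x_hat^T - l u^T
    cons += [cp.trace(A_k @ X) + a_k @ x_hat <= b_k for A_k, a_k, b_k in quads]
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    return prob.status in ("optimal", "optimal_inaccurate")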
5. Computational case study. To give the reader an impression of the computational requirements of the relaxations and techniques proposed in this paper, we compare three implementations for solving the relaxation

min { ⟨C, X⟩ + cᵀx : (x, X) ∈ L̂ ∩ RLT ∩ PSD }     (5.1)

of the following particular case of (MIQCP), which is called quadratic programming over the box:

min { xᵀCx + cᵀx : x ∈ [0,1]ⁿ }.     (5.2)

We compare a black-box interior-point-method SDP solver (called ipm for


“interior-point method”), the specialized completely-positive solver of [17]
mentioned in Section 3.4 (called cp for “completely positive”), and the
projection cutting plane method of [53] discussed in Section 4.3 (called
proj for “projection”). We refer the reader to the original papers for full
details of the implementations.
Methods ipm and cp work with the formulation (5.1). On the other hand, proj first reformulates (5.2) as

min { t : x ∈ [0,1]ⁿ, xᵀCx + cᵀx ≤ t }     (5.3)

and then, in a pre-processing step, calculates several convex quadratic constraints as cutting planes for the relaxation proj_(t,x)(L̂ ∩ RLT ∩ PSD) of the reformulation. The procedure for calculating the cutting planes is outlined briefly in Section 4.3. Theoretically, if all possible cuts are generated, then the power of (5.1) is recovered. In practice, however, it is hoped that a few deep cuts will recover most of the power of (5.1) but save a significant amount of computation time. Finally, letting αₖt² + βₖt + xᵀAₖx + aₖᵀx ≤ bₖ represent the derived convex cuts, proj solves the relaxation

min { t : x ∈ [0,1]ⁿ, αₖt² + βₖt + xᵀAₖx + aₖᵀx ≤ bₖ ∀ k }.     (5.4)

Nine instances from [20] are tested. Their relevant characteristics under relaxations (5.1) and (5.4) are given in Table 3, and the timings (in seconds) are given in Table 4. Also, in Table 5, we give the percentage gaps closed by the three methods relative to the pure linear relaxation L̂ ∩ RLT (see Section 4.2 for a careful definition of the percentage gap closed). A few comments are in order.


Table 3
Sizes of tested instances.

                    # Variables       # Linear constr.   # Convex constr.
  Instance          ipm/cp    proj    ipm/cp    proj     ipm/cp    proj
                                                         (SDP)     (quad)
  spar100-025-1      5151      203     20201     156        1        119
  spar100-025-2      5151      201     20201     151        1         95
  spar100-025-3      5151      201     20201     150        1        114
  spar100-050-1      5151      201     20201     150        1         98
  spar100-050-2      5151      201     20201     150        1        113
  spar100-050-3      5151      201     20201     150        1         97
  spar100-075-1      5151      201     20201     150        1        131
  spar100-075-2      5151      201     20201     150        1        109
  spar100-075-3      5151      199     20201     147        1         90

Table 4
Computational utility of projected relaxations.

Time (sec)
Instances ipm cp proj (pre-process + solve)
spar100-025-1 5719.42 59 670.15 + 1.14
spar100-025-2 10185.65 54 538.03 + 1.52
spar100-025-3 5407.09 58 656.59 + 1.24
spar100-050-1 10139.57 76 757.14 + 1.07
spar100-050-2 5355.20 92 929.91 + 1.26
spar100-050-3 7281.26 76 747.46 + 0.82
spar100-075-1 9660.79 101 1509.96 + 2.00
spar100-075-2 6576.10 100 936.61 + 1.23
spar100-075-3 10295.88 81 657.84 + 0.87

First, on each instance, ipm is not competitive with either cp or proj.


This illustrates a recognized trend in solving relaxations of this sort, namely
that, at this point in time, specialized solvers perform better than black-box
ones. Perhaps this will change as black-box solvers become more robust.
Second, cp performs best in terms of overall time on each instance, but
proj, discounting its pre-processing phase, solves its relaxation the quick-
est while still closing most of the gap that ipm and cp do. Within the
context of using proj within branch-and-bound, this accrues significance
due to two observations: (i) most contemporary branch-and-bound proce-
dures generate cutting planes primarily at the root node and only sparingly
at other nodes; and (ii) such a relaxation would be solved hundreds or thou-
sands of times within the tree. So the pre-processing time of proj can be
effectively amortized over the entire branch-and-bound tree.


Table 5
Computational utility of projected relaxations.

% Gap Closed
Instances ipm/cp proj
spar100-025-1 98.93% 92.36%
spar100-025-2 99.09% 92.16%
spar100-025-3 99.33% 93.26%
spar100-050-1 98.17% 93.62%
spar100-050-2 98.57% 94.13%
spar100-050-3 99.39% 95.81%
spar100-075-1 99.19% 95.84%
spar100-075-2 99.18% 96.47%
spar100-075-3 99.19% 96.06%

We also mention that, while proj is currently being solved by a non-


linear programming algorithm, the convex quadratic constraints of proj
could actually be approximated by polyhedral relaxations introduced by
Ben-Tal and Nemirovski [12] (also see [59]) yielding LP relaxations of these
problems. Such LP relaxations are extremely desirable for branch-and-
bound algorithms for two reasons. One, they can be efficiently re-optimized
using warm-starting capabilities of LP solvers thereby reducing the com-
putational overhead at nodes of the enumeration tree. Two, these LP re-
laxations can easily avail techniques, such as branching strategies, cutting
planes, heuristics, etc., which have been developed by the MILP commu-
nity in the past five decades (see [1] for application of these techniques in
the context of convex MINLPs).

6. Conclusion. Table 6 catalogues the results covered in this paper.


The first column lists the main concepts while the following two columns list
their manifestations for M01LP and MIQCP, respectively. Some remarks
are in order.
First, while linear programming based relaxations are almost univer-
sally used in M01LP, the same does not hold for MIQCP. There is a wide
variety of relaxations for MIQCP that can be used, starting from the ex-
tended RLT+SDP relaxations to the compact eigen-reformulations (see
[53]) defined in the original space of variables. It must be noted that all of
these relaxations are currently solved by interior point methods that lack
efficient re-optimization capabilities making them bottlenecks in a branch-
and-bound procedure.
Second, there is a well established theory of exact formulations in
M01LP (see [22]). Many of these results were obtained as byproducts of
the tremendous amount of research that went into proving the perfect
graph conjecture. Unfortunately, the progress in this direction in MIQCP


Table 6
M01LP vs. MIQCP.

  Concept               M01LP                       MIQCP
  Relaxation            LP relaxation               L̂ ∩ RLT ∩ R₂ ∩ S₂ ∩ PSD;
                                                    projected SDP;
                                                    eigen-reformulation
  Exact Description     total unimodularity;        theorems in Section 2.5
                        perfect, ideal, and
                        balanced matrices
  Elementary            xⱼ ∈ {0,1}                  Xᵢᵢ ≤ xᵢ²
  Non-Convexity
  Linear Transformed    (πx ≤ π₀) ∨ (πx ≥ π₀ + 1)   ⟨X, vvᵀ⟩ ≤ (vᵀx)²
  Non-Convexity
  Sequential            Balas [8]                   Saxena et al. [52]
  Convexification

has been rather slow, and exact descriptions are unknown for most classes of problems except for some very small problem instances.
Third, there is an interesting connection between cuts derived from the univariate expression ⟨X, vvᵀ⟩ ≤ (vᵀx)² for MIQCP and split cuts derived from split disjunctions (πx ≤ π₀) ∨ (πx ≥ π₀ + 1) (π ∈ ℤⁿ) in M01LP. To see this, note that ⟨X, vvᵀ⟩ ≤ (vᵀx)² can be obtained from the elementary non-convex constraint Xᵢᵢ ≤ xᵢ² by the linear transformation (x, X) → (vᵀx, ⟨X, vvᵀ⟩), where the linear transformation is chosen depending on the incumbent solution; for example, Saxena et al. [52] derive the v vector from the spectral decomposition of X̂ − x̂x̂ᵀ. Similarly, the split disjunction (πx ≤ π₀) ∨ (πx ≥ π₀ + 1) can be obtained from the elementary 0-1 disjunction (xⱼ ≤ 0) ∨ (xⱼ ≥ 1) by the linear transformation x → πx, where the linear transformation is chosen depending on the incumbent solution; for instance, the well known mixed integer Gomory cuts can be obtained from split disjunctions derived by monoidal strengthening of elementary 0-1 disjunctions, wherein the monoid that is chosen to strengthen the cut depends on the incumbent solution (see [9]).

Acknowledgements. The authors are indebted to two anonymous referees for many helpful suggestions that have improved the paper significantly.

REFERENCES

[1] K. Abhishek, S. Leyffer, and J.T. Linderoth, FilMINT: An outer-


approximation-based solver for convex mixed-integer nonlinear programs, IN-
FORMS Journal on Computing, 22 (2010), pp. 555–567.


[2] T. Achterberg, T. Berthold, T. Koch, and K. Wolter, Constraint integer programming: A new approach to integrate CP and MIP, Lecture Notes in Computer Science, 5015 (2008), pp. 6–20.
[3] F.A. Al-Khayyal, Generalized bilinear programming, Part i: Models, applica-
tions, and linear programming relaxations, European Journal of Operational
Research, 60 (1992), pp. 306–314.
[4] F.A. Al-Khayyal and J.E. Falk, Jointly constrained biconvex programming,
Math. Oper. Res., 8 (1983), pp. 273–286.
[5] F. Alizadeh and D. Goldfarb, Second-order cone programming, Math. Pro-
gram., 95 (2003), pp. 3–51. ISMP 2000, Part 3 (Atlanta, GA).
[6] K.M. Anstreicher and S. Burer, Computable representations for convex hulls of low-dimensional quadratic forms, Mathematical Programming (Series B), 124(1-2) (2010), pp. 33–43.
[7] A. Atamtürk and V. Narayanan, Conic mixed-integer rounding cuts, Math.
Program., 122 (2010), pp. 1–20.
[8] E. Balas, Disjunctive programming: properties of the convex hull of feasible
points, Discrete Appl. Math., 89 (1998), pp. 3–44.
[9] E. Balas, S. Ceria, and G. Cornuéjols, A lift-and-project cutting plane al-
gorithm for mixed 0-1 programs, Mathematical Programming, 58 (1993),
pp. 295–324.
[10] A. Beck, Quadratic matrix programming, SIAM J. Optim., 17 (2006), pp. 1224–
1238 (electronic).
[11] P. Belotti, Disjunctive cuts for non-convex MINLP, IMA Volume Series, Springer
2010, accepted.
http://myweb.clemson.edu/∼pbelott/papers/belotti-disj-MINLP.pdf.
[12] A. Ben-Tal and A. Nemirovski, On polyhedral approximations of the second-
order cone, Math. Oper. Res., 26 (2001), pp. 193–205.
[13] A. Berman and N. Shaked-Monderer, Completely Positive Matrices, World
Scientific, 2003.
[14] D. Bienstock and M. Zuckerberg, Subset algebra lift operators for 0-1 integer
programming, SIAM J. Optim., 15 (2004), pp. 63–95 (electronic).
[15] I. Bomze and F. Jarre, A note on Burer’s copositive representation of mixed-
binary QPs, Optimization Letters, 4 (2010), pp. 465–472.
[16] P. Bonami, L. Biegler, A. Conn, G. Cornuéjols, I. Grossmann, C. Laird,
J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An algorithmic
framework for convex mixed-integer nonlinear programs., Discrete Optimiza-
tion, 5 (2008), pp. 186–204.
[17] S. Burer, Optimizing a polyhedral-semidefinite relaxation of completely positive
programs, Mathematical Programming Computation, 2(1), pp 1–19 (2010).
[18] S. Burer, On the copositive representation of binary and continuous nonconvex quadratic programs, Mathematical Programming, 120 (2009), pp. 479–495.
[19] S. Burer and A.N. Letchford, On nonconvex quadratic programming with box
constraints, SIAM J. Optim., 20 (2009), pp. 1073–1089.
[20] S. Burer and D. Vandenbussche, Globally solving box-constrained nonconvex
quadratic programs with semidefinite-based finite branch-and-bound, Comput.
Optim. Appl., 43 (2009), pp. 181–195.
[21] D. Coppersmith, O. Günlük, J. Lee, and J. Leung, A polytope for a product of real linear functions in 0/1 variables, manuscript, IBM, Yorktown Heights, NY, December 2003.
[22] G. Cornuéjols, Combinatorial optimization: packing and covering, Society for
Industrial and Applied Mathematics, Philadelphia, PA, USA, 2001.
[23] R.W. Cottle, G.J. Habetler, and C.E. Lemke, Quadratic forms semi-definite
over convex cones, in Proceedings of the Princeton Symposium on Mathemat-
ical Programming (Princeton Univ., 1967), Princeton, N.J., 1970, Princeton
Univ. Press, pp. 551–565.


[24] G. Danninger and I.M. Bomze, Using copositivity for global optimality criteria
in concave quadratic programming problems, Math. Programming, 62 (1993),
pp. 575–580.
[25] G. Dantzig, R. Fulkerson, and S. Johnson, Solution of a large-scale traveling-
salesman problem, J. Operations Res. Soc. Amer., 2 (1954), pp. 393–410.
[26] E. de Klerk and D.V. Pasechnik, Approximation of the stability number of a
graph via copositive programming, SIAM J. Optim., 12 (2002), pp. 875–892.
[27] See the website: www.gamsworld.org/global/globallib/globalstat.htm.
[28] R. Horst and N.V. Thoai, DC programming: overview, J. Optim. Theory Appl.,
103 (1999), pp. 1–43.
[29] M. Jach, D. Michaels, and R. Weismantel, The convex envelope of (n − 1)-convex functions, SIAM J. Optim., 19 (2008), pp. 1451–1466.
[30] R. Jeroslow, There cannot be any algorithm for integer programming with
quadratic constraints., Operations Research, 21 (1973), pp. 221–224.
[31] S. Kim and M. Kojima, Second order cone programming relaxation of non-
convex quadratic optimization problems, Optim. Methods Softw., 15 (2001),
pp. 201–224.
[32] S. Kim and M. Kojima, Exact solutions of some nonconvex quadratic optimization problems via SDP and SOCP relaxations, Comput. Optim. Appl., 26 (2003), pp. 143–154.
[33] M. Kojima and L. Tunçel, Cones of matrices and successive convex relaxations
of nonconvex sets, SIAM J. Optim., 10 (2000), pp. 750–778.
[34] J.B. Lasserre, Global optimization with polynomials and the problem of moments,
SIAM J. Optim., 11 (2001), pp. 796–817.
[35] M. Laurent, A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre
relaxations for 0-1 programming, Math. Oper. Res., 28 (2003), pp. 470–496.
[36] J. Linderoth, A simplicial branch-and-bound algorithm for solving quadratically
constrained quadratic programs, Math. Program., 103 (2005), pp. 251–282.
[37] L. Lovász and A. Schrijver, Cones of matrices and set-functions and 0-1 opti-
mization, SIAM Journal on Optimization, 1 (1991), pp. 166–190.
[38] T. Matsui, NP-hardness of linear multiplicative programming and related prob-
lems, J. Global Optim., 9 (1996), pp. 113–119.
[39] J.E. Maxfield and H. Minc, On the matrix equation X′X = A, Proc. Edinburgh Math. Soc. (2), 13 (1962/1963), pp. 125–129.
[40] G.P. McCormick, Computability of global solutions to factorable nonconvex pro-
grams. I. Convex underestimating problems, Math. Programming, 10 (1976),
pp. 147–175.
[41] See the website: http://www.gamsworld.org/minlp/.
[42] K.G. Murty and S.N. Kabadi, Some NP-complete problems in quadratic and
nonlinear programming, Math. Programming, 39 (1987), pp. 117–129.
[43] G. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization, Wiley-
Interscience, 1999.
[44] M. Padberg, The Boolean quadric polytope: some characteristics, facets and rel-
atives, Math. Programming, 45 (1989), pp. 139–172.
[45] P. Pardalos, Global optimization algorithms for linearly constrained indefi-
nite quadratic problems, Computers and Mathematics with Applications, 21
(1991), pp. 87–97.
[46] P.M. Pardalos and S.A. Vavasis, Quadratic programming with one negative
eigenvalue is NP-hard, J. Global Optim., 1 (1991), pp. 15–22.
[47] P. Parrilo, Structured Semidefinite Programs and Semi-algebraic Geometry
Methods in Robustness and Optimization, PhD thesis, California Institute
of Technology, 2000.
[48] G. Pataki, On the rank of extreme matrices in semidefinite programs and the
multiplicity of optimal eigenvalues, Mathematics of Operations Research, 23
(1998), pp. 339–358.
[49] J. Povh and F. Rendl, A copositive programming approach to graph partitioning,
SIAM J. Optim., 18 (2007), pp. 223–241.


[50] J. Povh and F. Rendl, Copositive and semidefinite relaxations of the quadratic assignment problem, Discrete Optim., 6 (2009), pp. 231–241.
[51] N.V. Sahinidis, BARON: a general purpose global optimization software package,
J. Glob. Optim., 8 (1996), pp. 201–205.
[52] A. Saxena, P. Bonami, and J. Lee, Convex relaxations of non-convex
mixed integer quadratically constrained programs: Extended formulations,
Mathematical Programming (Series B), 124(1-2), pp. 383–411 (2010).
http://dx.doi.org/10.1007/s10107-010-0371-9.
[53] A. Saxena, P. Bonami, and J. Lee, Convex relaxations of non-convex mixed integer quadratically constrained programs: Projected formulations, 2010. To appear in Mathematical Programming. http://dx.doi.org/10.1007/s10107-010-0340-3.
[54] H.D. Sherali and W.P. Adams, A hierarchy of relaxations between the continuous
and convex hull representations for zero-one programming problems, SIAM J.
Discrete Math., 3 (1990), pp. 411–430.
[55] H.D. Sherali and W.P. Adams, A Reformulation-Linearization Technique (RLT)
for Solving Discrete and Continuous Nonconvex Problems, Kluwer, 1997.
[56] N. Shor, Quadratic optimization problems, Soviet Journal of Computer and Sys-
tems Science, 25 (1987), pp. 1–11. Originally published in Tekhnicheskaya
Kibernetika, 1:128–139, 1987.
[57] J.F. Sturm and S. Zhang, On cones of nonnegative quadratic functions, Math.
Oper. Res., 28 (2003), pp. 246–267.
[58] M. Tawarmalani and N.V. Sahinidis, Convexification and global optimization in
continuous and mixed-integer nonlinear programming, Vol. 65 of Nonconvex
Optimization and its Applications, Kluwer Academic Publishers, Dordrecht,
2002. Theory, algorithms, software, and applications.
[59] J.P. Vielma, S. Ahmed, and G.L. Nemhauser, A lifted linear programming
branch-and-bound algorithm for mixed-integer conic quadratic programs, IN-
FORMS J. Comput., 20 (2008), pp. 438–450.
[60] Y. Yajima and T. Fujie, A polyhedral approach for nonconvex quadratic program-
ming problems with box constraints, J. Global Optim., 13 (1998), pp. 151–170.
[61] Y. Ye and S. Zhang, New results on quadratic minimization, SIAM J. Optim.,
14 (2003), pp. 245–267 (electronic).

LINEAR PROGRAMMING RELAXATIONS
OF QUADRATICALLY CONSTRAINED
QUADRATIC PROGRAMS
ANDREA QUALIZZA∗ , PIETRO BELOTTI† , AND FRANÇOIS MARGOT∗‡

Abstract. We investigate the use of linear programming tools for solving semidefinite programming relaxations of quadratically constrained quadratic problems. Classes of valid linear inequalities are presented, including sparse PSD cuts and principal minor PSD cuts. Computational results based on instances from the literature are presented.

Key words. Quadratic programming, semidefinite programming, relaxation, linear programming.

AMS(MOS) subject classifications. 90C57.

1. Introduction. Many combinatorial problems have Linear Programming (LP) relaxations that are commonly used for their solution through branch-and-cut algorithms. Some of them also have stronger relaxations involving positive semidefinite (PSD) constraints. In general, stronger relaxations should be preferred when solving a problem, thus using these PSD relaxations is tempting. However, they come with the drawback of requiring a Semidefinite Programming (SDP) solver, creating practical difficulties for an efficient implementation within a branch-and-cut algorithm. Indeed, a major weakness of current SDP solvers compared to LP solvers is their lack of efficient warm starting mechanisms. Another weakness is solving problems involving a mix of PSD constraints and a large number of linear inequalities, as these linear inequalities put a heavy toll on the linear algebra steps required during the solution process.
In this paper, we investigate LP relaxations of PSD constraints with the aim of capturing most of the strength of the PSD relaxation, while still being able to use an LP solver. The LP relaxation we obtain is an outer-approximation of the PSD cone, with the typical convergence difficulties when aiming to solve problems to optimality. We thus do not cast this work as an efficient way to solve PSD problems, but we aim at finding practical ways to approximate PSD constraints with linear ones.
We restrict our experiments to Quadratically Constrained Quadratic
Programs (QCQP). A QCQP problem with variables x ∈ Rn and y ∈ Rm
is a problem of the form

∗ Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213.


† Department of Mathematical Sciences, Clemson University, Clemson, SC 29634.
‡ Corresponding author ([email protected]). Supported by NSF grant NSF-

0750826.


$$\begin{array}{lll}
\max & x^T Q_0 x + a_0^T x + b_0^T y &\\
\text{s.t.} & x^T Q_k x + a_k^T x + b_k^T y \le c_k & k = 1, 2, \ldots, p \qquad \text{(QCQP)}\\
& l_{x_i} \le x_i \le u_{x_i} & i = 1, 2, \ldots, n\\
& l_{y_j} \le y_j \le u_{y_j} & j = 1, 2, \ldots, m
\end{array}$$

where, for k = 0, 1, 2, ..., p, Qₖ is a rational symmetric n × n matrix, aₖ is a rational n-vector, bₖ is a rational m-vector, and cₖ ∈ ℚ. Moreover, the lower and upper bounds l_xᵢ, u_xᵢ for i = 1, ..., n, and l_yⱼ, u_yⱼ for j = 1, ..., m are all finite. If Q₀ is negative semidefinite and Qₖ is positive semidefinite for each k = 1, 2, ..., p, problem QCQP is convex and thus easy to solve. Otherwise, the problem is NP-hard [6].
An alternative lifted formulation for QCQP is obtained by replacing each quadratic term xᵢxⱼ with a new variable Xᵢⱼ. Let X = xxᵀ be the matrix with entry Xᵢⱼ corresponding to the quadratic term xᵢxⱼ. For square matrices A and B of the same dimension, let A • B denote the Frobenius inner product of A and B, i.e., the trace of AᵀB. Problem QCQP is then equivalent to

$$\begin{array}{lll}
\max & Q_0 \bullet X + a_0^T x + b_0^T y &\\
\text{s.t.} & Q_k \bullet X + a_k^T x + b_k^T y \le c_k & k = 1, 2, \ldots, p\\
& l_{x_i} \le x_i \le u_{x_i} & i = 1, 2, \ldots, n \qquad \text{(LIFT)}\\
& l_{y_j} \le y_j \le u_{y_j} & j = 1, 2, \ldots, m\\
& X = xx^T.
\end{array}$$

The difficulty in solving problem LIFT lies in the non-convex constraint X = xxᵀ. A relaxation, dubbed PSD, that is possible to solve relatively efficiently is obtained by relaxing this constraint to the requirement that X − xxᵀ be positive semidefinite, i.e., X − xxᵀ ⪰ 0. An alternative relaxation of QCQP, dubbed RLT, is obtained by the Reformulation Linearization Technique [17], using products of pairs of original constraints and bounds and replacing nonlinear terms with new variables.
Anstreicher [2] compares the PSD and RLT relaxations on a set of quadratic problems with box constraints, i.e., QCQP problems with p = 0 and with all the variables bounded between 0 and 1. He shows that the PSD relaxations of these instances are fairly good and that combining the PSD and RLT relaxations yields significantly tighter relaxations than either of the PSD or RLT relaxations. The drawback of combining the two relaxations is that current SDP solvers have difficulties handling the large number of linear constraints of the RLT.
Our aim is to solve relaxations of QCQP using exclusively linear programming tools. The RLT is readily applicable for our purposes, while the PSD technique requires a cutting plane approach, as described in Section 2.


In Section 3 we consider several families of valid cuts. The focus is essentially on capturing the strength of the positive semidefinite condition using standard cuts [18], and some sparse versions of these.
We analyze empirically the strength of the considered cuts on instances taken from GLOBALLib [10] and quadratic programs with box constraints described in more detail in the next section. Implementation and computational results are presented in Section 4. Finally, Section 5 summarizes the results and gives possible directions for future research.

2. Relaxations of QCQP problems. A typical approach to get bounds on the optimal value of a QCQP is to solve a convex relaxation. Since our aim is to work with linear relaxations, the first step is to linearize LIFT by relaxing the last constraint to X = Xᵀ. We thus get the Extended formulation

$$\begin{array}{lll}
\max & Q_0 \bullet X + a_0^T x + b_0^T y &\\
\text{s.t.} & Q_k \bullet X + a_k^T x + b_k^T y \le c_k & k = 1, 2, \ldots, p\\
& l_{x_i} \le x_i \le u_{x_i} & i = 1, 2, \ldots, n \qquad \text{(EXT)}\\
& l_{y_j} \le y_j \le u_{y_j} & j = 1, 2, \ldots, m\\
& X = X^T.
\end{array}$$

EXT is a Linear Program with n(n + 3)/2 + m variables and the same number of constraints as QCQP. Note that the optimal value of EXT is usually a weak upper bound for QCQP, as no constraint links the values of the x and X variables. Two main approaches for doing so have been proposed; both are based on relaxations of the last constraint of LIFT, namely

X − xxᵀ = 0.     (2.1)

They are known as the Positive Semidefinite (PSD) relaxation and the Reformulation Linearization Technique (RLT) relaxation.

2.1. PSD relaxation. As X − xxᵀ = 0 implies X − xxᵀ ⪰ 0, using this last constraint yields a convex relaxation of QCQP. This is the approach used in [18, 20, 21, 23], among others.
Moreover, using Schur's complement

$$X - xx^T \succeq 0 \;\Longleftrightarrow\; \begin{pmatrix} 1 & x^T\\ x & X \end{pmatrix} \succeq 0,$$

and defining

$$\tilde{Q}_k = \begin{pmatrix} -c_k & a_k^T/2\\ a_k/2 & Q_k \end{pmatrix}, \qquad
\tilde{X} = \begin{pmatrix} 1 & x^T\\ x & X \end{pmatrix},$$


we can write the PSD relaxation of QCQP in the compact form

$$\begin{array}{lll}
\max & \tilde{Q}_0 \bullet \tilde{X} + b_0^T y &\\
\text{s.t.} & \tilde{Q}_k \bullet \tilde{X} + b_k^T y \le 0 & k = 1, 2, \ldots, p\\
& l_{x_i} \le x_i \le u_{x_i} & i = 1, 2, \ldots, n \qquad \text{(PSD)}\\
& l_{y_j} \le y_j \le u_{y_j} & j = 1, 2, \ldots, m\\
& \tilde{X} \succeq 0.
\end{array}$$

This is a positive semidefinite problem with linear constraints. It can thus


be solved in polynomial time using interior point algorithms. PSD is
tighter than usual linear relaxations for problems such as the Maximum
Cut, Stable Set, and Quadratic Assignment problems [25]. All these prob-
lems can be formulated as QCQPs.
2.2. RLT relaxation. The Reformulation Linearization Technique [17] can be used to produce a relaxation of QCQP. It adds linear inequalities to EXT. These inequalities are derived from the variable bounds and constraints of the original problem as follows: multiply together two original constraints or bounds and replace each product term xᵢxⱼ with the variable Xᵢⱼ. For instance, let xᵢ, xⱼ, i, j ∈ {1, 2, ..., n} be two variables from QCQP. By taking into account only the four original bounds xᵢ − l_xᵢ ≥ 0, xᵢ − u_xᵢ ≤ 0, xⱼ − l_xⱼ ≥ 0, xⱼ − u_xⱼ ≤ 0, we get the RLT inequalities

$$\begin{array}{l}
X_{ij} - l_{x_i} x_j - l_{x_j} x_i \ge -l_{x_i} l_{x_j},\\
X_{ij} - u_{x_i} x_j - u_{x_j} x_i \ge -u_{x_i} u_{x_j},\\
X_{ij} - l_{x_i} x_j - u_{x_j} x_i \le -l_{x_i} u_{x_j},\\
X_{ij} - u_{x_i} x_j - l_{x_j} x_i \le -u_{x_i} l_{x_j}.
\end{array} \tag{2.2}$$

Anstreicher [2] observes that, for Quadratic Programs with box constraints, the PSD and RLT constraints together yield much better bounds than those obtained from the PSD or RLT relaxations alone. In this work, we want to capture the strength of both techniques and generate a Linear Programming relaxation of QCQP.
Notice that the four inequalities above, introduced by McCormick [12], constitute the convex envelope of the set {(xᵢ, xⱼ, Xᵢⱼ) ∈ R³ : l_xᵢ ≤ xᵢ ≤ u_xᵢ, l_xⱼ ≤ xⱼ ≤ u_xⱼ, Xᵢⱼ = xᵢxⱼ}, as proven by Al-Khayyal and Falk [1], i.e., they are the tightest relaxation for the single term Xᵢⱼ.
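For illustration, generating the four inequalities (2.2) for a pair (i, j) is a few lines (a sketch, ours):

def rlt_bound_cuts(i, j, lx, ux):
    # The four RLT/McCormick inequalities (2.2) for the pair (i, j).
    # Each entry is (coef_xi, coef_xj, sense, rhs) meaning
    # X_ij + coef_xi * x_i + coef_xj * x_j  sense  rhs.
    li, ui, lj, uj = lx[i], ux[i], lx[j], ux[j]
    return [(-lj, -li, ">=", -li * lj),
            (-uj, -ui, ">=", -ui * uj),
            (-uj, -li, "<=", -li * uj),
            (-lj, -ui, "<=", -ui * lj)]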
3. Our framework. While the RLT constraints are linear in the variables of the EXT formulation and therefore can be added directly to EXT, this is not the case for the PSD constraint. We use a linear outer-approximation of the PSD relaxation and a cutting plane framework, adding a linear inequality separating the current solution from the PSD cone.
The initial relaxation we use and the various cuts generated by our separation procedure are described in more detail in the next sections.


3.1. Initial relaxation. Our initial relaxation is the EXT formulation together with the O(n²) RLT constraints derived from the bounds on the variables xᵢ, i = 1, 2, ..., n. We did not include the RLT constraints derived from the problem constraints due to their large number and the fact that we want to avoid the introduction of extra variables for the multivariate terms that occur when quadratic constraints are multiplied together.
The bounds [Lᵢⱼ, Uᵢⱼ] for the extended variables Xᵢⱼ are computed as follows:

Lᵢⱼ = min{ l_xᵢ l_xⱼ ; l_xᵢ u_xⱼ ; u_xᵢ l_xⱼ ; u_xᵢ u_xⱼ },  ∀ i = 1, ..., n; j = i, ..., n,
Uᵢⱼ = max{ l_xᵢ l_xⱼ ; l_xᵢ u_xⱼ ; u_xᵢ l_xⱼ ; u_xᵢ u_xⱼ },  ∀ i = 1, ..., n; j = i, ..., n.

In addition, equality (2.1) implies Xᵢᵢ ≥ xᵢ². We therefore also make sure that Lᵢᵢ ≥ 0. In the remainder of the paper, this initial relaxation is identified as EXT+RLT.
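In code, these bounds are one pass over the four bound products (a sketch, ours):

def extended_variable_bounds(lx, ux):
    # Bounds [L_ij, U_ij] on X_ij from the products of variable bounds;
    # L_ii is kept non-negative since (2.1) implies X_ii >= x_i^2 >= 0.
    n = len(lx)
    L, U = {}, {}
    for i in range(n):
        for j in range(i, n):
            p = (lx[i] * lx[j], lx[i] * ux[j], ux[i] * lx[j], ux[i] * ux[j])
            L[i, j] = max(min(p), 0.0) if i == j else min(p)
            U[i, j] = max(p)
    return L, U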
3.2. PSD cuts. We use the equivalence that a matrix is positive semidefinite if and only if

vᵀX̃v ≥ 0 for all v ∈ Rⁿ⁺¹.     (3.1)

We can reformulate PSD as the semi-infinite Linear Program

$$\begin{array}{lll}
\max & \tilde{Q}_0 \bullet \tilde{X} + b_0^T y &\\
\text{s.t.} & \tilde{Q}_k \bullet \tilde{X} + b_k^T y \le 0 & k = 1, 2, \ldots, p\\
& l_{x_i} \le x_i \le u_{x_i} & i = 1, 2, \ldots, n \qquad \text{(PSDLP)}\\
& l_{y_j} \le y_j \le u_{y_j} & j = 1, 2, \ldots, m\\
& v^T \tilde{X} v \ge 0 & \text{for all } v \in \mathbb{R}^{n+1}.
\end{array}$$

A practical way to use PSDLP is to adopt a cutting plane approach to separate constraints (3.1), as done in [18].
Let X̃* be an arbitrary point in the space of the X̃ variables. The spectral decomposition of X̃* is used to decide if X̃* is in the PSD cone or not. Let the eigenvalues and corresponding orthonormal eigenvectors of X̃* be λₖ and vₖ for k = 1, 2, ..., n, and assume without loss of generality that λ₁ ≤ λ₂ ≤ ... ≤ λₙ, and let t ∈ {0, ..., n} be such that λₜ < 0 ≤ λₜ₊₁. If t = 0, then all the eigenvalues are non-negative and X̃* is positive semidefinite. Otherwise, vₖᵀX̃*vₖ = vₖᵀλₖvₖ = λₖ < 0 for k = 1, ..., t. Hence, the valid cut

vₖᵀX̃vₖ ≥ 0     (3.2)

is violated by X̃*. Cuts of the form (3.2) are called PSDCUTs in the remainder of the paper.
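A minimal separation routine along these lines (ours, assuming numpy) returns the coefficient matrices vₖvₖᵀ of the violated cuts:

import numpy as np

def psd_cuts(X_tilde, tol=1e-6):
    # Spectral decomposition of the incumbent X_tilde; each eigenvector
    # with a negative eigenvalue yields a violated cut v^T X v >= 0,
    # i.e., the linear inequality <vv^T, X> >= 0 of the form (3.2).
    lam, V = np.linalg.eigh(X_tilde)
    return [np.outer(V[:, k], V[:, k]) for k in range(len(lam))
            if lam[k] < -tol]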
The above procedure has two major weaknesses: First, only one cut is
obtained from eigenvector vk for k = 1, . . . , t, while computing the spectral


Sparsify(v, X̃, pct_NZ, pct_VIOL)
 1  minVIOL ← −vᵀX̃v · pct_VIOL
 2  maxNZ ← ⌊length[v] · pct_NZ⌋
 3  w ← v
 4  perm ← random permutation of 1 to length[w]
 5  for j ← 1 to length[w]
 6  do
 7      z ← w,  z[perm[j]] ← 0
 8      if −zᵀX̃z > minVIOL
 9          then w ← z
10  if number of non-zeroes in w < maxNZ
11      then output w

Fig. 1. Sparsification procedure for PSD cuts.

decomposition requires a nontrivial investment in CPU time, and second, the cuts are usually very dense, i.e., almost all entries in vvᵀ are nonzero. Dense cuts are frowned upon when used in a cutting plane approach, as they might slow down considerably the reoptimization of the linear relaxation.
To address these weaknesses, we describe in the next section a heuristic to generate several sparser cuts from each of the vectors vₖ for k = 1, ..., t.
3.3. Sparsification of PSD cuts. A simple idea to get sparse cuts is to start with vector w = vₖ, for k = 1, ..., t, and iteratively set to zero some components of w, provided that wᵀX̃*w remains sufficiently negative. If the entries are considered in random order, several cuts can be obtained from a single eigenvector vₖ. For example, consider the Sparsify procedure in Figure 1, taking as parameters an initial vector v, a matrix X̃, and two numbers between 0 and 1, pct_NZ and pct_VIOL, that control the maximum percentage of nonzero entries in the final vector and the minimum violation requested for the corresponding cut, respectively. In the procedure, parameter length[v] identifies the size of vector v.
It is possible to implement this procedure to run in O(n²) if length[v] = n + 1: Compute and update a vector m such that

mⱼ = wⱼ · Σᵢ₌₁ⁿ⁺¹ wᵢ X̃ᵢⱼ   for j = 1, ..., n + 1.

Its initial computation takes O(n²) and its update, after a single entry of w is set to 0, takes O(n). The vector m can be used to evaluate the test in step 8 in constant time, given the current value d = wᵀX̃w for the vector w: Setting the entry ℓ = perm[j] of w to zero changes this value to d − 2mℓ + wℓ²X̃ℓℓ, whose negative is the violation of the resulting cut.
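For concreteness, here is a Python transcription (ours) of the basic procedure with this O(n) update; unlike the cyclical variant described next, it returns at most one sparse vector per call:

import numpy as np

def sparsify(v, X_tilde, pct_nz, pct_viol, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n1 = len(v)
    min_viol = -(v @ X_tilde @ v) * pct_viol        # step 1
    max_nz = int(np.floor(n1 * pct_nz))             # step 2
    w = np.array(v, dtype=float)                    # step 3
    m = w * (X_tilde @ w)          # m_j = w_j * sum_i w_i * X_ij
    d = w @ X_tilde @ w            # current value of w^T X w (negative)
    for idx in rng.permutation(n1):                 # steps 4-9
        d_new = d - 2.0 * m[idx] + w[idx] ** 2 * X_tilde[idx, idx]
        if -d_new > min_viol:                       # step 8, in O(1)
            m -= w[idx] * (w * X_tilde[:, idx])     # O(n) update of m
            m[idx] = 0.0
            w[idx], d = 0.0, d_new
    if np.count_nonzero(w) < max_nz:                # steps 10-11
        return w
    return None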


A slight modification of the procedure is used to obtain several cuts from the same eigenvector: Change the loop condition in step 5 to consider the entries in perm in cyclical order, from all possible starting points s in {1, 2, ..., length[w]}, with the additional condition that entry s − 1 is not set to 0 when starting from s, to guarantee that we do not always generate the same cut. From our experiments, this simple idea produces collections of sparse and well-diversified cuts. This is referred to as SPARSE1 in the remainder of the paper.
We also consider the following variant of the procedure given in Fig-
ure 1. Given a vector w, let X̃[w] be the principal minor of X̃ induced by
the indices of the nonzero entries in w. Replace step 7 with
7. z ← w̄ where w̄ is an eigenvector corresponding to the
most negative eigenvalue of a spectral decomposition of
X̃[w] , z[perm[j]] ← 0.
This is referred to as SPARSE2 in the remainder, and we call the cuts
generated by SPARSE1 or SPARSE2 described above Sparse PSD cuts.
Once sparse PSD cuts are generated, for each vector w produced, we can also add all PSD cuts given by the eigenvectors corresponding to negative eigenvalues of a spectral decomposition of X̃[w]. These cuts are valid and sparse. They are called Minor PSD cuts and denoted by MINOR in the following.
An experiment to determine good values for the parameters pct_NZ and pct_VIOL was performed on the 38 GLOBALLib instances and 51 BoxQP instances described in Section 4.1. It is run by selecting two sets of three values in [0, 1], {V_LOW, V_MID, V_UP} for pct_VIOL and {N_LOW, N_MID, N_UP} for pct_NZ. The nine possible combinations of these parameter values are used, and the best of the nine, (V_best, N_best), is selected. We then center and reduce the possible ranges around V_best and N_best, respectively, and repeat the operation. The procedure is stopped when the best candidate parameters are (V_MID, N_MID) and the sizes of the ranges satisfy |V_UP − V_LOW| ≤ 0.2 and |N_UP − N_LOW| ≤ 0.1.
In order to select the best value of the parameters, we compare the
bounds obtained by both algorithms after 1, 2, 5, 10, 20, and 30 seconds
of computation. At each of these times, we count the number of times
each algorithm outperforms the other by at least 1% and the winner is the
algorithm with the largest number of wins over the 6 clocked times. It is
worth noting that typically the majority of the comparisons end up as ties,
implying that the results are not extremely sensitive to the selected values
for the parameters.
For SPARSE1, the best parameter values are pct_VIOL = 0.6 and pct_NZ = 0.2. For SPARSE2, they are pct_VIOL = 0.6 and pct_NZ = 0.4. These values are used in all experiments using either SPARSE1 or SPARSE2 in the remainder of the paper.


4. Computational results. In the implementation, we have used


the Open Solver Interface (Osi-0.97.1) from COIN-OR [8] to create and
modify the LPs and to interface with the LP solver ILOG Cplex-11.1. To
compute eigenvalues and eigenvectors, we use the dsyevx function provided
by the LAPACK library version 3.1.1. We also include a cut management
procedure to reduce the number of constraints in the outer approximation
LP. This procedure, applied at the end of each iteration, removes the cuts
that are not satisfied with equality by the optimal solution. Note however
that the constraints from the EXT+RLT formulation are never removed,
only constraints from added cutting planes are possibly removed.
The machine used for the tests has a 64-bit 2.66 GHz AMD processor, 64 GB of RAM, and runs Linux kernel 2.6.29. Tolerances on the accu-
racy of the primal and dual solutions of the LP solver and LAPACK calls
are set to 10−8 .
The set of instances used for most experiments consists of 51 BoxQP
instances with at most 50 variables and the 38 GLOBALLib instances as
described in Section 4.1.
For an instance I and a given relaxation of it, we define the gap closed
by the relaxation as

    100 · (RLT − BND) / (RLT − OPT),    (4.1)

where BND and RLT are the optimal values of the given relaxation and of the EXT+RLT relaxation, respectively, and OPT is either the optimal value of I or the best known value of a feasible solution. The OPT values are taken from [14].
4.1. Instances. Tests are performed on a subset of instances from
GLOBALLib [10] and on Box Constrained Quadratic Programs (BoxQPs)
[24]. GLOBALLib contains 413 continuous global optimization problems
of various sizes and types, such as BoxQPs, problems with complemen-
tarity constraints, and general QCQPs. Following [14], we select 160 in-
stances from GLOBALLib having at most 50 variables and that can easily
be formulated as QCQP. The conversion of a non-linear expression into a
quadratic expression, when possible, is performed by adding new variables
and constraints to the problem. Additionally, bounds on the variables are
derived using linear programming techniques, and these bounds are included
in the formulation. From these 160 instances in AMPL format, we substi-
tute each bilinear term xi xj by the new variable Xij as described for the
LIFT formulation. We build two collections of linearized instances in MPS
format, one with the original precision on the coefficients and right hand
side, and the second with 8-digit precision. In our experiments we used the
latter.
As observed in [14], using together the SDP and RLT relaxations
yields stronger bounds than those given by the RLT relaxation only for 38


out of 160 GLOBALLib instances. Hence, we focus on these 38 instances


to test the effectiveness of the PSD cuts and their sparse versions.
The BoxQP collection contains 90 instances with a number of variables
ranging from 20 to 100. Due to time limit constraints and the number of
experiments to run, we consider only instances with a number of variables
between 20 and 50, for a total of 51 BoxQP problems.
The converted GLOBALLib and BoxQP instances are available in
MPS format from [13].

4.2. Effectiveness of each class of cuts. We first compare the


effectiveness of the various classes of cuts when used in combination with
the standard PSDCUTs. For these tests, at most 1,000 cutting iterations
are performed, at most 600 seconds are used, and operations are stopped
if tailing off is detected. More precisely, let z_t be the optimal value of the linear relaxation at iteration t. The operations are halted if t ≥ 50 and z_t ≥ (1 − 0.0001) · z_{t−50}. A cut purging procedure is used to remove cuts that are not tight at iteration t if the condition z_t ≥ (1 − 0.0001) · z_{t−1} is satisfied. On average, in each iteration the algorithm generates n²/2 cuts, of which only n/2 are kept by the cut purging procedure and the rest are discarded.
In order to compare two different cutting plane algorithms, we compare
the closed gap values first after a fixed number of iterations, and second
at several given times, for all available QCQP instances. Comparisons at
fixed iterations indicate the quality of the cuts, irrespective of the time
used to generate them. Comparisons at given times are useful if only
limited time is available for running the cutting plane algorithms and a
good approximation of the PSD cone is sought. The closed gaps obtained
at a given point are deemed different only if their difference is at least g% of
the initial gap. We report comparisons for g = 1 and g = 5. Comparisons at
one point are possible only if both algorithms reach that point. The number
of problems for which this does not happen – because, at a given time,
either result was not available or one of the two algorithms had already
stopped, or because either algorithm had terminated in fewer iterations
– is listed in the “inc.” (incomparable) columns in the tables. For the
remaining problems, we report the percentage of problems for which one
algorithm is better than the other and the percentage of problems were they
are tied. Finally, we also report the average improvement in gap closed for
the second algorithm over the first algorithm in the column labeled “impr.”.
Tests are first performed to decide which combination of the SPARSE1,
SPARSE2, and MINOR cuts performs best on average. Based on Tables 1
and 2 below, we conclude that using MINOR is useful both in terms of itera-
tion and time, and that the algorithm using PSDCUT+SPARSE2+MINOR
(abbreviated S2M in the remainder) dominates the algorithm using PSD-
CUT+SPARSE1+MINOR (abbreviated S1M) both in terms of iteration
and time. Table 1 gives the comparison between S1M and S2M at differ-


Table 1
Comparison of S1M with S2M at several iterations.

g=1 g=5
Iteration S1M S2M Tie S1M S2M Tie inc. impr.
1 7.87 39.33 52.80 1.12 19.1 79.78 0.00 3.21
2 17.98 28.09 53.93 0.00 10.11 89.89 0.00 2.05
3 17.98 19.10 62.92 1.12 7.87 91.01 0.00 1.50
5 12.36 14.61 73.03 3.37 5.62 91.01 0.00 1.77
10 10.11 13.48 76.41 0.00 5.62 94.38 0.00 1.42
15 4.49 13.48 82.03 1.12 6.74 92.14 0.00 1.12
20 1.12 10.11 78.66 1.12 6.74 82.02 10.11 1.02
30 1.12 8.99 79.78 1.12 5.62 83.15 10.11 0.79
50 2.25 6.74 80.90 1.12 4.49 84.28 10.11 0.47
100 0.00 4.49 28.09 0.00 2.25 30.33 67.42 1.88
200 0.00 3.37 15.73 0.00 2.25 16.85 80.90 2.51
300 0.00 2.25 12.36 0.00 2.25 12.36 85.39 3.30
500 0.00 2.25 7.87 0.00 2.25 7.87 89.88 3.85
1000 0.00 2.25 3.37 0.00 2.25 3.37 94.38 7.43

ent iterations. S2M clearly dominates S1M in the very first iteration and after 200 iterations, while in the intermediate iterations S1M also manages to obtain good bounds. Table 2 gives the comparison between these two
algorithms at different times. For comparisons with g = 1, S1M is better
than S2M only in at most 2.25% of the problems, while the converse varies
between roughly 50% (at early times) and 8% (for late times). For g = 5,
S2M still dominates S1M in most cases.
Sparse cuts yield better bounds than using solely the standard PSD
cuts. The observed improvement is around 3% and 5% respectively for
SPARSE1 and SPARSE2. When the MINOR cuts are also used, this value rises to 6% and 8%, respectively, for the two sparsification algorithms. Table 3 compares PSDCUT (abbreviated by S) with S2M. The ta-
ble shows that the sparse cuts generated by the sparsification procedures
and minor PSD cuts yield better bounds than the standard cutting plane
algorithm at fixed iterations. Comparisons performed at fixed times, on
the other hand, show that considering the whole set of instances we do not
get any improvement in the first 60 to 120 seconds of computation (see
Table 4). Indeed S2M initially performs worse than the standard cutting
plane algorithm, but after 60 to 120 seconds, it produces better bounds on
average.
In Section 6 detailed computational results are given in Tables 5 and
6 where for each instance we compare the duality gap closed by S and S2M
at several iterations and times. The initial duality gap is obtained as in
(4.1) as RLT − OPT. We then let S2M run with no time limit until the


Table 2
Comparison of S1M with S2M at several times.

g=1 g=5
Time S1M S2M Tie S1M S2M Tie inc. impr.
0.5 3.37 52.81 12.36 0.00 43.82 24.72 31.46 2.77
1 0.00 51.68 14.61 0.00 40.45 25.84 33.71 4.35
2 0.00 47.19 15.73 0.00 39.33 23.59 37.08 5.89
3 1.12 44.94 14.61 0.00 34.83 25.84 39.33 5.11
5 1.12 43.82 15.73 0.00 38.20 22.47 39.33 6.07
10 1.12 41.58 16.85 0.00 24.72 34.83 40.45 4.97
15 2.25 37.08 16.85 1.12 21.35 33.71 43.82 3.64
20 1.12 35.96 16.85 1.12 17.98 34.83 46.07 3.49
30 1.12 28.09 22.48 1.12 16.86 33.71 48.31 2.99
60 1.12 20.23 28.09 0.00 12.36 37.08 50.56 2.62
120 0.00 15.73 32.58 0.00 10.11 38.20 51.69 1.73
180 0.00 13.49 32.58 0.00 5.62 40.45 53.93 1.19
300 0.00 11.24 31.46 0.00 3.37 39.33 57.30 0.92
600 0.00 7.86 24.72 0.00 0.00 32.58 67.42 0.72

Table 3
Comparison of S with S2M at several iterations.

g=1 g=5
Iteration S S2M Tie S S2M Tie inc. impr.
1 0.00 76.40 23.60 0.00 61.80 38.20 0.00 10.47
2 0.00 84.27 15.73 0.00 55.06 44.94 0.00 10.26
3 0.00 83.15 16.85 0.00 48.31 51.69 0.00 10.38
5 0.00 80.90 19.10 0.00 40.45 59.55 0.00 10.09
10 1.12 71.91 26.97 0.00 41.57 58.43 0.00 8.87
15 1.12 60.67 38.21 1.12 35.96 62.92 0.00 7.49
20 1.12 53.93 40.45 1.12 29.21 65.17 4.50 6.22
30 1.12 34.83 53.93 0.00 16.85 73.03 10.12 5.04
50 1.12 25.84 62.92 0.00 13.48 76.40 10.12 3.75
100 1.12 8.99 21.35 0.00 5.62 25.84 68.54 5.57
200 0.00 5.62 8.99 0.00 3.37 11.24 85.39 7.66
300 0.00 3.37 7.87 0.00 3.37 7.87 88.76 8.86
500 0.00 3.37 5.62 0.00 3.37 5.62 91.01 8.72
1000 0.00 2.25 0.00 0.00 2.25 0.00 97.75 26.00

value s obtained does not improve by at least 0.01% over ten consecutive
iterations. This value s is an upper bound on the value of the PSD+RLT
relaxation. The column “bound” in the tables gives the value of RLT − s
as a percentage of the gap RLT − OPT, i.e., an approximation of the


Table 4
Comparison of S with S2M at several times.

g=1 g=5
Time S S2M Tie S S2M Tie inc. impr.
0.5 41.57 17.98 5.62 41.57 17.98 5.62 34.83 -9.42
1 41.57 14.61 5.62 39.33 13.48 8.99 38.20 -8.66
2 42.70 10.11 6.74 29.21 8.99 21.35 40.45 -8.73
3 41.57 8.99 8.99 31.46 6.74 21.35 40.45 -8.78
5 35.96 7.87 15.72 33.71 5.62 20.22 40.45 -7.87
10 34.84 7.87 13.48 30.34 4.50 21.35 43.81 -5.95
15 37.07 5.62 11.24 22.47 2.25 29.21 46.07 -5.48
20 37.07 5.62 8.99 17.98 1.12 32.58 48.32 -4.99
30 30.34 5.62 15.72 11.24 1.12 39.32 48.32 -3.9
60 11.24 12.36 25.84 11.24 2.25 35.95 50.56 -1.15
120 8.99 12.36 24.72 2.25 2.25 41.57 53.93 0.48
180 2.25 14.61 29.21 0.00 4.50 41.57 53.93 1.09
300 0.00 15.73 26.97 0.00 6.74 35.96 57.30 1.60
600 0.00 14.61 13.48 0.00 5.62 22.47 71.91 2.73

percentage of the gap closed by the PSD+RLT relaxation. The columns


labeled S and S2M in the tables give the gap closed by the corresponding
algorithms at different iterations.
Note that although S2M relies on numerous spectral decomposition
computations, most of its running time is spent in generating cuts and
reoptimization of the LP. For example, on the BoxQP instances with a
time limit of 300 seconds, the average percentage of CPU time spent for
obtaining spectral decompositions is below 21% for instances of size 30, below 15% for instances of size 40, and below 7% for instances of size 50.

5. Conclusions. This paper studies linearizations of the PSD cone


based on spectral decompositions. Sparsification of eigenvectors corre-
sponding to negative eigenvalues is shown to produce useful cuts in practice,
in particular when the minor cuts are used. The goal of capturing most of
the strength of a PSD relaxation through linear inequalities is achieved,
although tailing off occurs relatively quickly. As an illustration of typical
behavior of a PSD solver and our linear outer-approximation scheme, con-
sider the two instances, spar020-100-1 and spar030-060-1, with respectively
20 and 30 variables. We use the SDP solver SeDuMi and S2M, keeping track
at each iteration of the bound achieved and the time spent. Figure 2 and
Figure 3 compare the bounds obtained by the two solvers at a given time.
For the small size instance spar020-100-1, we note that S2M converges
to the bound value more than twenty times faster than SeDuMi. In the
medium size instance spar030-060-1 we note that S2M closes a large gap in
the first ten to twenty iterations, and then tailing off occurs. To compute


the exact bound, SeDuMi requires 408 seconds while S2M requires 2,442
seconds to reach the same precision. Nevertheless, for our purposes, most
of the benefits of the PSD constraints are captured in the early iterations.
Two additional improvements are possible. The first one is to use a
cut separation procedure for the RLT inequalities, avoiding their inclusion
in the initial LP and managing them as other cutting planes. This could
potentially speed up the reoptimization of the LP. Another possibility is
to use a mix of the S and S2M algorithms, using the former in the early
iterations and then switching to the latter.

[Figure omitted: plot of the lower bound versus time in seconds for SeDuMi and S2M, together with the exact bound value.]

Fig. 2. Instance spar020-100-1.

[Figure omitted: plot of the lower bound versus time in seconds for SeDuMi and S2M, together with the exact bound value.]

Fig. 3. Instance spar030-060-1.

Acknowledgments. The authors warmly thank Anureet Saxena for


the useful discussions that led to the results obtained in this work.
6. Appendix.


Table 5
Duality gap closed at several iterations for each instance.

iter. 2 iter. 10 iter. 50


Instance |x| |y| bound S S2M S S2M S S2M
circle 3 0 45.79 0.00 0.00 10.97 41.31 45.77 45.79
dispatch 3 1 100.00 25.59 27.92 37.25 35.76 95.90 92.17
ex2_1_10 20 0 22.05 3.93 8.65 15.93 21.05 22.05 22.05
ex3_1_2 5 0 49.75 49.75 49.75 49.75 49.75 49.75 49.75
ex4_1_1 3 0 100.00 99.81 99.84 100.00 100.00 100.00 100.00
ex4_1_3 3 0 56.40 0.00 0.00 51.19 51.19 56.40 56.40
ex4_1_4 3 0 100.00 22.33 42.78 98.98 99.98 100.00 100.00
ex4_1_6 3 0 100.00 69.44 69.87 92.62 99.94 100.00 100.00
ex4_1_7 3 0 100.00 18.00 48.17 96.86 99.90 100.00 100.00
ex4_1_8 3 0 100.00 56.90 81.93 99.76 99.93 100.00 100.00
ex8_1_4 4 0 100.00 94.91 95.19 99.98 100.00 100.00 100.00
ex8_1_5 5 0 68.26 32.32 39.17 59.01 66.76 68.00 68.25
ex8_1_7 9 0 77.43 3.04 33.75 33.13 53.44 64.03 75.38
ex8_4_1 21 1 91.81 4.45 21.80 18.60 45.08 38.07 69.83
ex9_2_1 10 0 54.52 0.00 42.55 0.01 50.13 0.01 51.90
ex9_2_2 10 0 70.37 0.00 14.08 2.34 51.97 7.12 69.41
ex9_2_4 6 2 99.87 0.00 24.84 25.24 99.85 86.37 99.87

ex9_2_6 16 0 99.88 3.50 99.42 23.09 99.86 62.32 99.88
ex9_2_7 10 0 42.30 0.00 4.59 0.00 27.34 3.14 34.91

himmel11 5 4 49.75 49.75 49.75 49.75 49.75 49.75 49.75


hydro 12 19 52.06 0.00 20.87 21.95 29.03 26.04 31.39
mathopt1 4 0 100.00 95.76 100.00 99.96 100.00 100.00 100.00
mathopt2 3 0 100.00 99.84 99.93 100.00 100.00 100.00 100.00
meanvar 7 1 100.00 0.00 0.00 78.35 95.84 100.00 100.00
nemhaus 5 0 53.97 26.00 26.41 48.49 50.16 53.87 53.96
prob06 2 0 100.00 90.61 92.39 98.39 98.39 98.39 98.39
prob09 3 1 100.00 0.00 99.00 61.14 99.96 99.64 100.00
process 9 3 8.00 0.00 4.25 0.00 4.98 0.00 5.73
qp1 50 0 100.00 79.59 89.09 93.89 99.77 98.93 100.00
qp2 50 0 100.00 55.94 70.99 82.42 93.92 93.04 99.35
rbrock 3 0 100.00 97.48 100.00 99.96 100.00 100.00 100.00
st_e10 3 1 100.00 56.90 81.93 99.76 99.93 100.00 100.00
st_e18 2 0 100.00 0.00 0.00 98.72 98.72 100.00 100.00
st_e19 3 1 93.51 5.14 15.93 29.97 60.10 93.40 93.50
st_e25 4 0 87.55 55.80 55.80 87.02 87.01 87.23 87.23
st_e28 5 4 49.75 49.75 49.75 49.75 49.75 49.75 49.75
st_iqpbk1 8 0 97.99 71.99 76.69 97.20 97.95 97.99 97.99
st_iqpbk2 8 0 97.93 70.55 75.16 94.93 97.52 97.93 97.93
spar020-100-1 20 0 100.00 91.15 94.64 99.77 99.99 100.00 100.00
spar020-100-2 20 0 99.70 90.12 92.64 98.17 99.32 99.66 99.69
spar020-100-3 20 0 100.00 96.96 98.51 100.00 100.00 100.00 100.00
spar030-060-1 30 0 98.87 43.53 53.64 79.61 87.39 93.90 97.14
spar030-060-2 30 0 100.00 80.74 89.73 99.89 100.00 100.00 100.00
spar030-060-3 30 0 99.40 67.43 71.94 91.48 95.68 98.75 99.26
spar030-070-1 30 0 97.99 49.05 54.94 76.54 86.51 91.15 95.68
spar030-070-2 30 0 100.00 81.19 85.82 99.26 99.99 100.00 100.00
spar030-070-3 30 0 99.98 85.97 87.43 98.44 99.52 99.92 99.97
spar030-080-1 30 0 98.99 64.44 70.99 87.32 92.11 96.23 98.01
spar030-080-2 30 0 100.00 92.78 95.45 100.00 100.00 100.00 100.00
spar030-080-3 30 0 100.00 92.71 94.18 99.99 100.00 100.00 100.00
spar030-090-1 30 0 100.00 80.37 86.35 97.27 99.30 100.00 100.00
spar030-090-2 30 0 100.00 86.09 89.26 98.13 99.65 100.00 100.00

spar030-090-3 30 0 100.00 90.65 91.56 99.97 100.00 100.00 100.00
spar030-100-1 30 0 100.00 77.28 83.25 95.20 98.30 99.85 100.00
spar030-100-2 30 0 99.96 76.78 81.65 93.44 96.84 98.70 99.72
spar030-100-3 30 0 99.85 86.82 88.74 97.45 98.75 99.75 99.83
spar040-030-1 40 0 100.00 25.60 41.96 73.59 84.72 99.13 100.00
spar040-030-2 40 0 100.00 30.93 53.39 79.34 95.62 99.46 100.00
spar040-030-3 40 0 100.00 9.21 31.38 66.46 86.62 98.53 100.00

spar040-040-1 40 0 96.74 23.62 29.03 63.04 75.93 85.93 93.29


spar040-040-2 40 0 100.00 33.17 48.87 89.08 97.94 100.00 100.00
spar040-040-3 40 0 99.18 21.77 30.31 70.44 80.96 91.37 96.69
spar040-050-1 40 0 99.42 35.62 44.87 73.11 84.05 92.81 97.21
spar040-050-2 40 0 99.48 36.79 47.68 82.38 91.27 97.26 98.93
spar040-050-3 40 0 100.00 41.91 51.72 84.04 90.70 96.88 99.34
spar040-060-1 40 0 98.09 46.22 52.89 81.65 87.28 92.39 95.97
spar040-060-2 40 0 100.00 63.02 72.87 94.09 97.66 99.78 100.00
spar040-060-3 40 0 100.00 78.09 87.91 99.30 99.99 100.00 100.00
spar040-070-1 40 0 100.00 64.02 71.33 93.92 97.35 99.77 100.00
spar040-070-2 40 0 100.00 67.49 76.78 95.12 97.97 99.97 100.00
spar040-070-3 40 0 100.00 70.13 79.43 95.65 97.99 99.75 100.00
spar040-080-1 40 0 100.00 63.06 69.40 91.09 95.44 99.00 99.97
spar040-080-2 40 0 100.00 71.42 79.77 94.98 97.62 99.92 100.00
spar040-080-3 40 0 99.99 83.93 88.65 97.76 98.86 99.81 99.95
spar040-090-1 40 0 100.00 75.73 79.96 95.34 97.43 99.46 99.91
spar040-090-2 40 0 99.97 76.39 80.97 95.16 96.72 99.20 99.81
spar040-090-3 40 0 100.00 84.90 87.04 98.33 99.52 100.00 100.00
spar040-100-1 40 0 100.00 87.64 90.43 98.27 99.35 99.98 100.00
spar040-100-2 40 0 99.87 79.78 83.02 94.58 96.76 98.74 99.50
spar040-100-3 40 0 98.70 72.69 78.31 90.83 93.03 95.84 97.36
spar050-030-1 50 0 100.00 3.11 17.60 58.23 79.98 - -
spar050-030-2 50 0 99.27 1.35 16.67 51.11 70.58 - -
spar050-030-3 50 0 99.29 0.08 13.63 50.19 67.46 - -
spar050-040-1 50 0 100.00 23.13 30.86 72.10 81.73 - -
spar050-040-2 50 0 99.39 21.89 34.45 71.24 81.63 - -
spar050-040-3 50 0 100.00 27.18 37.42 83.96 91.70 - -
spar050-050-1 50 0 93.02 25.24 33.77 61.42 68.75 - -

spar050-050-2 50 0 98.74 32.10 41.26 77.48 83.48 - -
spar050-050-3 50 0 98.84 38.57 44.67 80.97 85.36 - -

Average - - - 48.75 59.00 75.53 84.39 85.85 89.60


Table 6
Duality gap closed at several times for each instance. (Instances solved in less than 1 second are not shown.)

1 s 60 s 180 s 300 s 600 s


Instance bound S S2M S S2M S S2M S S2M S S2M
ex4_1_4 100.00 - 100.00 - - - - - - - -
ex8_1_4 100.00 - 100.00 - - - - - - - -
ex8_1_7 77.43 77.43 77.37 - - - - - - - -
ex8_4_1 91.81 28.14 36.24 61.60 90.43 - - - - - -
ex9_2_2 70.37 - 70.35 - - - - - - - -
ex9_2_6 99.88 96.28 - - - - - - - - -
hydro 52.06 26.43 31.46 - - - - - - - -
mathopt2 100.00 - 100.00 - - - - - - - -
process 8.00 - 7.66 - - - - - - - -
qp1 100.00 79.99 80.28 98.22 99.52 99.73 99.96 99.92 99.98 99.99 100.00
qp2 100.00 55.82 55.27 91.74 95.56 95.86 98.69 97.41 99.66 98.80 100.00
spar020-100-1 100.00 100.00 100.00 - - - - - - - -
spar020-100-2 99.70 99.67 99.61 - - - - - - - -
spar020-100-3 100.00 - 100.00 - - - - - - - -
spar030-060-1 98.87 69.98 58.72 96.53 97.61 98.45 98.70 98.68 98.82 - -
spar030-060-2 100.00 96.52 91.05 - - - - - - - -
spar030-060-3 99.40 82.99 76.15 99.27 99.32 99.38 99.39 99.39 99.40 99.40 99.40

spar030-070-1 97.99 69.81 60.36 94.50 96.38 97.29 97.73 97.70 97.91 - 97.98
spar030-070-2 100.00 96.05 87.93 - - - - - - - -
spar030-070-3 99.98 96.26 90.42 99.98 99.98 99.98 99.98 - 99.98 - -
spar030-080-1 98.99 83.36 74.42 97.80 98.11 98.74 98.88 98.89 98.96 - 98.99
spar030-080-2 100.00 99.83 96.70 - - - - - - - -
spar030-080-3 100.00 99.88 95.87 - - - - - - - -
spar030-090-1 100.00 92.86 87.69 - - - - - - - -
spar030-090-2 100.00 93.80 88.46 - 100.00 - - - - - -

spar030-090-3 100.00 97.78 91.35 - - - - - - - -


spar030-100-1 100.00 91.04 84.34 100.00 100.00 - - - - - -
spar030-100-2 99.96 90.21 83.14 99.56 99.75 99.91 99.95 99.95 99.96 - 99.96
spar030-100-3 99.85 94.26 89.55 99.84 99.84 99.85 99.85 99.85 99.85 99.85 99.85
spar040-030-1 100.00 28.97 40.51 89.30 84.19 99.06 99.98 99.98 100.00 - 100.00
spar040-030-2 100.00 31.97 48.01 94.01 96.39 99.58 99.98 99.99 100.00 - -
spar040-030-3 100.00 9.20 27.59 81.66 85.43 97.25 99.86 99.81 100.00 100.00 -
spar040-040-1 96.74 19.38 22.90 70.35 75.45 80.73 88.63 85.34 92.29 90.79 94.74
spar040-040-2 100.00 24.51 29.87 98.63 98.60 100.00 100.00 - - - -
spar040-040-3 99.18 20.88 21.31 78.28 79.31 86.02 91.22 89.52 95.04 94.07 97.71
spar040-050-1 99.42 28.96 21.27 80.18 84.01 88.70 94.62 92.75 96.71 96.53 98.32
spar040-050-2 99.48 29.52 16.91 91.33 91.42 97.01 97.97 98.26 98.87 - 99.31
spar040-050-3 100.00 28.67 19.81 90.03 90.72 95.68 97.51 97.49 99.08 98.92 99.89
spar040-060-1 98.09 37.16 17.10 86.26 87.13 90.18 93.50 92.25 95.32 95.05 96.84
spar040-060-2 100.00 39.57 22.83 98.09 98.22 99.90 99.96 100.00 100.00 100.00 -
spar040-060-3 100.00 52.41 30.57 100.00 99.99 - - - - - -
spar040-070-1 100.00 50.01 21.79 97.74 97.78 99.80 99.87 99.97 99.99 100.00 100.00
spar040-070-2 100.00 47.57 25.19 98.81 98.46 99.99 99.99 100.00 100.00 - -
spar040-070-3 100.00 47.22 21.95 98.96 98.70 99.88 99.92 99.98 100.00 100.00 100.00
spar040-080-1 100.00 51.66 28.00 95.13 95.38 98.29 99.05 99.09 99.74 99.77 99.99
spar040-080-2 100.00 52.24 25.94 98.71 98.31 99.95 99.97 100.00 100.00 - -
spar040-080-3 99.99 56.05 26.98 99.54 99.25 99.89 99.88 99.94 99.95 99.97 99.98
spar040-090-1 100.00 59.71 28.17 98.10 97.86 99.43 99.61 99.70 99.86 99.90 99.99
spar040-090-2 99.97 59.14 29.82 97.83 97.70 99.34 99.58 99.68 99.81 99.86 99.93
spar040-090-3 100.00 63.07 34.62 99.94 99.85 100.00 100.00 - - - -
spar040-100-1 100.00 69.47 28.24 99.66 99.47 99.99 99.99 100.00 100.00 - -

spar040-100-2 99.87 65.27 26.07 97.34 96.87 98.60 98.98 99.02 99.39 99.44 99.69
spar040-100-3 98.70 61.40 29.61 93.01 93.17 94.91 96.02 95.81 97.00 96.84 97.77

spar050-030-1 100.00 0.37 3.63 54.46 37.52 70.10 73.34 76.87 84.75 86.23 96.33
spar050-030-2 99.27 0.08 2.79 44.68 38.62 59.58 64.94 67.79 74.98 77.02 86.58
spar050-030-3 99.29 0.00 2.75 44.32 32.31 57.13 59.07 62.54 68.99 71.18 82.86
spar050-040-1 100.00 3.76 1.77 69.97 56.87 77.15 78.30 80.31 84.30 84.90 91.79
spar050-040-2 99.39 2.08 2.84 68.64 58.47 77.72 77.61 81.54 83.63 86.40 90.94
spar050-040-3 100.00 1.76 2.31 79.44 65.71 89.73 87.74 92.67 93.00 95.99 97.69
spar050-050-1 93.02 4.91 1.84 60.64 53.28 65.52 66.42 66.81 70.38 68.45 74.76
spar050-050-2 98.74 6.18 3.39 76.56 68.33 82.34 82.21 84.94 86.52 - 91.34
spar050-050-3 98.84 6.12 2.82 79.38 69.23 84.95 83.23 86.99 86.98 89.77 91.57
Average - 51.45 42.96 87.50 86.38 92.14 93.22 93.18 94.77 93.16 95.86

REFERENCES

[1] F.A. Al-Khayyal and J.E. Falk, Jointly constrained biconvex programming.
Math. Oper. Res. 8, pp. 273–286, 1983.
[2] K.M. Anstreicher, Semidefinite Programming versus the Reformulation-
Linearization Technique for Nonconvex Quadratically Constrained Quadratic
Programming, Journal of Global Optimization, 43, pp. 471–484, 2009.
[3] E. Balas, Disjunctive programming: properties of the convex hull of feasible
points. Disc. Appl. Math. 89, 1998.
[4] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty, Nonlinear Programming: The-
ory and Algorithms. Wiley, 2006.
[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University
Press, 2004.
[6] S.A. Burer and A.N. Letchford, On Non-Convex Quadratic Programming with
Box Constraints, SIAM Journal on Optimization, 20, pp. 1073–1089, 2009.
[7] B. Borchers, CSDP, A C Library for Semidefinite Programming, Optimization
Methods and Software 11(1), pp. 613–623, 1999.
[8] COmputational INfrastructure for Operations Research (COIN-OR).
http://www.coin-or.org.
[9] S.J. Benson and Y. Ye, DSDP5: Software For Semidefinite Programming. Avail-
able at http://www-unix.mcs.anl.gov/DSDP.
[10] GamsWorld Global Optimization library,
http://www.gamsworld.org/global/globallib/globalstat.htm.
[11] L. Lovász and A. Schrijver, Cones of matrices and set-functions and 0-1 opti-
mization. SIAM Journal on Optimization, May 1991.
[12] G.P. McCormick, Nonlinear programming: theory, algorithms and applications.
John Wiley & Sons, 1983.
[13] http://www.andrew.cmu.edu/user/aqualizz/research/MIQCP.
[14] A. Saxena, P. Bonami, and J. Lee, Convex Relaxations of Non-Convex Mixed
Integer Quadratically Constrained Programs: Extended Formulations, Math-
ematical Programming, Series B, 124(1–2), pp. 383–411, 2010.
[15] A. Saxena, P. Bonami, and J. Lee, Convex Relaxations of Non-Convex Mixed Integer Quadratically Constrained Programs: Projected Formulations, Optimization Online, November 2008. Available at
http://www.optimization-online.org/DB_HTML/2008/11/2145.html.
[16] J.F. Sturm, SeDuMi: An Optimization Package over Symmetric Cones. Available
at http://sedumi.mcmaster.ca.
[17] H.D. Sherali and W.P. Adams, A reformulation-linearization technique for solv-
ing discrete and continuous nonconvex problems. Kluwer, Dordrecht 1998.
[18] H.D. Sherali and B.M.P. Fraticelli, Enhancing RLT relaxations via a new class
of semidefinite cuts. J. Global Optim. 22, pp. 233–261, 2002.
[19] N.Z. Shor, Quadratic optimization problems. Tekhnicheskaya Kibernetika, 1,
1987.
[20] K. Sivaramakrishnan and J. Mitchell, Semi-infinite linear programming ap-
proaches to semidefinite programming (SDP) problems. Novel Approaches to
Hard Discrete Optimization, edited by P.M. Pardalos and H. Wolkowicz, Fields
Institute Communications Series, American Math. Society, 2003.
[21] K. Sivaramakrishnan and J. Mitchell, Properties of a cutting plane method for semidefinite programming, Technical Report, Department of Mathematics, North Carolina State University, September 2007.
[22] K.C. Toh, M.J. Todd, and R.H. Tütüncü, SDPT3: A MATLAB software for
semidefinite-quadratic-linear programming. Available at
http://www.math.nus.edu.sg/~mattohkc/sdpt3.html.


[23] L. Vandenberghe and S. Boyd, Semidefinite Programming. SIAM Review 38(1),


pp. 49–95, 1996.
[24] D. Vandenbussche and G. L. Nemhauser, A branch-and-cut algorithm for non-
convex quadratic programs with box constraints. Math. Prog. 102(3), pp. 559–
575, 2005.
[25] H. Wolkowicz, R. Saigal, and L. Vandenberghe, Handbook of Semidefinite
Programming: Theory, Algorithms, and Applications. Springer, 2000.

EXTENDING A CIP FRAMEWORK TO SOLVE MIQCPS∗
TIMO BERTHOLD† , STEFAN HEINZ† , AND STEFAN VIGERSKE‡

Abstract. This paper discusses how to build a solver for mixed integer quadrati-
cally constrained programs (MIQCPs) by extending a framework for constraint integer
programming (CIP). The advantage of this approach is that we can utilize the full power
of advanced MILP and CP technologies, in particular for the linear relaxation and the
discrete components of the problem. We use an outer approximation generated by lin-
earization of convex constraints and linear underestimation of nonconvex constraints to
relax the problem. Further, we give an overview of the reformulation, separation, and
propagation techniques that are used to handle the quadratic constraints efficiently.
We implemented these methods in the branch-cut-and-price framework SCIP. Com-
putational experiments indicating the potential of the approach and evaluating the im-
pact of the algorithmic components are provided.

Key words. Mixed integer quadratically constrained programming, constraint


integer programming, branch-and-cut, convex relaxation, domain propagation, primal
heuristic, nonconvex.

AMS(MOS) subject classifications. 90C11, 90C20, 90C26, 90C27, 90C57.

1. Introduction. In recent years, substantial progress has been made


in the solvability of generic mixed integer linear programs (MILPs) [2, 12].
Furthermore, it has been shown that successful MILP solving techniques
can often be extended to the more general case of mixed integer nonlinear
programs (MINLPs) [1, 6, 13]. Analogously, several authors have shown
that an integrated approach of constraint programming (CP) and MILP
can help to solve optimization problems that were intractable with either
of the two methods alone, for an overview see [17].
The paradigm of constraint integer programming (CIP) [2, 4] combines
modeling and solving techniques from the fields of constraint programming
(CP), mixed integer programming, and satisfiability testing (SAT). The
concept of CIP aims at restricting the generality of CP modeling as little
as needed while still retaining the full performance of MILP solving tech-
niques. Such a paradigm allows us to address a wide range of optimization
problems. For example, in [2], it is shown that CIP includes MILP and
constraint programming over finite domains as special cases.
The goal of this paper is to show, how a framework for CIPs can be
extended towards a competitive solver for mixed integer quadratically con-
strained programs (MIQCPs), which are an important subclass of MINLPs.
This framework allows us to utilize the power of already existing MILP and
∗ Supported by the DFG Research Center Matheon Mathematics for key technologies in Berlin, http://www.matheon.de.
† Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany ({berthold,heinz}@zib.de).
‡ Humboldt University Berlin, Unter den Linden 6, 10099 Berlin, Germany ([email protected]).


[Flowchart omitted. It depicts the main solving loop of SCIP: after presolving, each selected subproblem is processed by domain propagation, solving the relaxation (with pricing and cutting plane separation), constraint enforcement, conflict analysis in case of infeasibility, primal heuristics, and branching.]

Fig. 1. Flowchart of the main solving loop of SCIP.

CP technologies for handling the linear and the discrete parts of the prob-
lem. The integration of MIQCP is a first step towards the incorporation
of MINLP into the concept of constraint integer programming.
We extended the branch-cut-and-price framework SCIP (Solving Con-
straint Integer Programs) [2, 3] by adding methods for MIQCP. SCIP incor-
porates the idea of CIP and implements several state-of-the-art techniques
for solving MILPs. Due to its plugin-based design, it can be easily cus-
tomized, e.g., by adding problem specific separation, presolving, or domain
propagation algorithms.
The framework SCIP solves CIPs by a branch-and-bound algorithm.
The problem is recursively split into smaller subproblems, thereby creating
a search tree and implicitly enumerating all potential solutions. At each
subproblem, domain propagation is performed to exclude further values
from the variables’ domains, and a relaxation may be solved to achieve
a local lower bound – assuming the problem is to minimize the objective
function. The relaxation may be strengthened by adding further valid
constraints (e.g., linear inequalities), which cut off the optimal solution
of the relaxation. In case a subproblem is found to be infeasible, conflict
analysis is performed to learn additional valid constraints. Primal heuristics
are used as supplementary methods to improve the upper bound. Figure 1
illustrates the main algorithmic components of SCIP. In the context of this
article, the relaxation employed in SCIP is a linear program (LP).
The remainder of this article is organized as follows. In Section 2, we
formally define MIQCP and CIP, in Sections 3, 4, and 5, we show how
to handle quadratic constraints inside SCIP, and in Section 6, we present
computational results.


2. Problem definition. An MIQCP is an optimization problem of


the form
    min  d^T x                                            (2.1)
    s.t. x^T A_i x + b_i^T x + c_i ≤ 0    for i = 1, . . . , m
         x_k^L ≤ x_k ≤ x_k^U              for all k ∈ N
         x_k ∈ Z                           for all k ∈ I,

where I ⊆ N := {1, . . . , n} is the index set of the integer variables, d ∈ Q^n, A_i ∈ Q^{n×n} and symmetric, b_i ∈ Q^n, c_i ∈ Q for i = 1, . . . , m, and x^L ∈ Q̄^n and x^U ∈ Q̄^n, with Q̄ := Q ∪ {±∞}, are the lower and upper bounds of the variables x, respectively (Q denotes the rational numbers). Note that
we do not require the matrices Ai to be positive semidefinite, hence we
also allow for nonconvex quadratic constraints. If I = ∅, we call (2.1) a
quadratically constrained program (QCP).
The definition of CIP, as given in [2, 4], requires a linear objective
function. This is, however, just a technical prerequisite, as a quadratic (or
more general) objective f (x) can be modeled by introducing an auxiliary
objective variable z that is linked to the actual nonlinear objective function
with a constraint f (x) ≤ z. Thus, formulation (2.1) also covers the general
case of mixed integer quadratically constrained quadratic problems.
In this article, we use a definition of CIP which is slightly different from
the one given in [2, 4]. A constraint integer program consists of solving
    min  d^T x
    s.t. C_i(x) = 1    for i = 1, . . . , m
         x_k ∈ Z       for all k ∈ I,

with a finite set of constraints C_i : Q^n → {0, 1}, for i = 1, . . . , m, the index set I ⊆ N of the integer variables, and an objective function vector d ∈ Q^n.
In [2, 4], it is required that the subproblem remaining after fixing all
integer variables be a linear program in order to guarantee termination
in finite time. In this article, we, however, require a subproblem with all
integer variables fixed to be a QCP. Note that, using spatial branch-and-
bound algorithms, QCPs with finite bounds on the variables can be solved
in finite time up to a given tolerance [18].
3. A constraint handler for quadratic constraints. In SCIP, a
constraint handler defines the semantics and the algorithms to process con-
straints of a certain class. A single constraint handler is responsible for all
the constraints belonging to its constraint class. Each constraint handler
has to implement an enforcement method. In enforcement, the handler has
to decide whether a given solution, e.g., the optimum of a relaxation1 , sat-
1 For this section, we assume that the LP relaxation is bounded. In our implemen-

tation, the so-called pseudo solution, see [2, 3] for details, will be used in the case of
unbounded LP relaxations.


isfies all of its constraints. If the solution violates one or more constraints,
the handler may resolve the infeasibility by adding another constraint, per-
forming a domain reduction, or a branching.
For speeding up computation, a constraint handler may further im-
plement additional features like presolving, cutting plane separation, and
domain propagation for its particular class of constraints. Besides that,
a constraint handler can add valid linear inequalities to the initial LP re-
laxation. For example, all constraint handler for (general or specialized)
linear constraints add their constraints to the initial LP relaxation. The
constraint handler for quadratic constraints adds one linear inequality that
is obtained by the method given in Section 3.2 below.
In the following, we discuss the presolving, separation, propagation,
and enforcement algorithms that are used to handle quadratic constraints.
3.1. Presolving. During the presolving phase, a set of reformula-
tions and simplifications are tried. If SCIP fixes or aggregates variables,
e.g., using global presolving methods like dual bound reduction [2], then
the corresponding reformulations will also be realized in the quadratic con-
straints. Bounds on the variables are tightened using the domain prop-
agation method described in Section 3.3. If, due to reformulations, the
quadratic part of a constraint vanishes, it is replaced by the corresponding
linear constraint. Furthermore, the following reformulations are performed.
Binary Variables. A square of a binary variable is replaced by the
binary variable itself. Further, if a constraint contains a product of a binary variable with a linear term, i.e., x · ∑_{i=1}^k a_i y_i, where x is a binary variable, y_i are variables with finite bounds, and a_i ∈ Q, i = 1, . . . , k, then this product is replaced by a new variable z ∈ R and the linear constraints

    y^L x ≤ z ≤ y^U x,
    ∑_{i=1}^k a_i y_i − y^U (1 − x) ≤ z ≤ ∑_{i=1}^k a_i y_i − y^L (1 − x),

where

    y^L := ∑_{i=1, a_i>0}^k a_i y_i^L + ∑_{i=1, a_i<0}^k a_i y_i^U,  and    (3.1)

    y^U := ∑_{i=1, a_i>0}^k a_i y_i^U + ∑_{i=1, a_i<0}^k a_i y_i^L.

In the case that k = 1 and y_1 is also a binary variable, the product x y_1 can also be handled by SCIP's handler for AND constraints [11].
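As a minimal illustration of reformulation (3.1) (our own sketch, not SCIP code; the string output is just a readable convention chosen for the example), the following Python function computes y^L and y^U and returns the four linear constraints that replace z = x · ∑ a_i y_i:

def linearize_binary_product(a, y_lb, y_ub):
    # yL/yU as in (3.1): pick the bound of y_i that minimizes/maximizes a_i*y_i
    yL = sum(ai * (lb if ai > 0 else ub) for ai, lb, ub in zip(a, y_lb, y_ub))
    yU = sum(ai * (ub if ai > 0 else lb) for ai, lb, ub in zip(a, y_lb, y_ub))
    lin = " + ".join("%g*y%d" % (ai, i) for i, ai in enumerate(a))
    return [
        "%g*x <= z" % yL,                  # yL*x <= z
        "z <= %g*x" % yU,                  # z <= yU*x
        "%s - %g*(1-x) <= z" % (lin, yU),  # sum a_i y_i - yU(1-x) <= z
        "z <= %s - %g*(1-x)" % (lin, yL),  # z <= sum a_i y_i - yL(1-x)
    ]

# Example: z = x*(2*y0 - y1) with y0, y1 in [0, 1] gives yL = -1, yU = 2.
print(linearize_binary_product([2.0, -1.0], [0.0, 0.0], [1.0, 1.0]))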
Second-Order Cone (SOC) constraints. Constraints of the form

    γ + ∑_{i=1}^k (α_i (x_i + β_i))² ≤ (α_0 (y + β_0))²,    (3.2)


with k ≥ 2, α_i, β_i ∈ Q, i = 0, . . . , k, γ ∈ Q_+, and y^L ≥ −β_0 are recog-


nized as SOC constraints and handled by a specialized constraint handler,
cf. Section 4.
Convexity. After the presolving phase, each quadratic function is
checked for convexity by computing the sign of the minimum eigenvalue
of the coefficient matrix A. This information will be used for separation.
3.2. Separation. If the current LP solution x̃ violates some con-
straints, a constraint handler may add valid cutting planes in order to
strengthen the formulation.
For a violated convex constraint i, this is always possible by linearizing
the constraint function at x̃. Thus, we add the valid inequality

    c_i − x̃^T A_i x̃ + (b_i^T + 2 x̃^T A_i) x ≤ 0    (3.3)

to separate x̃. In the important special case that x^T A_i x ≡ a x_j² for some a > 0 and j ∈ I with x̃_j ∉ Z, we generate the cut

    c_i + b_i^T x + a (2⌊x̃_j⌋ + 1) x_j − a ⌊x̃_j⌋ ⌈x̃_j⌉ ≤ 0,    (3.4)

which is obtained by underestimating the function x_j ∈ Z → x_j² by the secant defined by the points (⌊x̃_j⌋, ⌊x̃_j⌋²) and (⌈x̃_j⌉, ⌈x̃_j⌉²). Note that the violation of (3.4) by x̃ is larger than that of (3.3).
For a violated nonconvex constraint i, we currently underestimate each term of x^T A_i x separately. A term a x_j² with a > 0, j ∈ N, is underestimated as just discussed. For the case a < 0, however, the tightest linear underestimator for the term a x_j² is given by the secant approximation a (x_j^L + x_j^U) x_j − a x_j^L x_j^U, if x_j^L and x_j^U are finite. Otherwise, if x_j^L = −∞ or x_j^U = ∞, we skip separation for constraint i. For a bilinear term a x_j x_k with a > 0, we utilize the McCormick underestimators [21]

    a x_j x_k ≥ a x_j^L x_k + a x_k^L x_j − a x_j^L x_k^L,
    a x_j x_k ≥ a x_j^U x_k + a x_k^U x_j − a x_j^U x_k^U.

If (x_j^U − x_j^L) x̃_k + (x_k^U − x_k^L) x̃_j ≤ x_j^U x_k^U − x_j^L x_k^L and the bounds x_j^L and x_k^L are finite, the former is used for cut generation, otherwise the latter is used. If both x_j^L or x_k^L and x_j^U or x_k^U are infinite, we skip separation for constraint i. Similarly, for a bilinear term a x_j x_k with a < 0, the McCormick underestimators are

    a x_j x_k ≥ a x_j^U x_k + a x_k^L x_j − a x_j^U x_k^L,
    a x_j x_k ≥ a x_j^L x_k + a x_k^U x_j − a x_j^L x_k^U.

If (x_j^U − x_j^L) x̃_k − (x_k^U − x_k^L) x̃_j ≤ x_j^U x_k^L − x_j^L x_k^U and the bounds x_j^U and x_k^L are finite, the former is used for cut generation, otherwise the latter is used.
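To make the selection rule concrete, the following Python sketch (our own illustration, not SCIP's implementation) returns the coefficients (α, β, γ) of the chosen underestimator a x_j x_k ≥ α x_j + β x_k + γ, or None when the required bounds are infinite and separation is skipped:

import math

def mccormick_underestimator(a, xjL, xjU, xkL, xkU, xj_tilde, xk_tilde):
    if a > 0:
        prefer_lower = ((xjU - xjL) * xk_tilde + (xkU - xkL) * xj_tilde
                        <= xjU * xkU - xjL * xkL)
        if prefer_lower and math.isfinite(xjL) and math.isfinite(xkL):
            return (a * xkL, a * xjL, -a * xjL * xkL)
        if math.isfinite(xjU) and math.isfinite(xkU):
            return (a * xkU, a * xjU, -a * xjU * xkU)
    else:
        prefer_first = ((xjU - xjL) * xk_tilde - (xkU - xkL) * xj_tilde
                        <= xjU * xkL - xjL * xkU)
        if prefer_first and math.isfinite(xjU) and math.isfinite(xkL):
            return (a * xkL, a * xjU, -a * xjU * xkL)
        if math.isfinite(xjL) and math.isfinite(xkU):
            return (a * xkU, a * xjL, -a * xjL * xkU)
    return None  # not enough finite bounds: skip separation for this term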


In the case that a linear inequality generated by this method does


not cut off the current LP solution x̃, the infeasibility has to be resolved
in enforcement, see Section 3.4. Among other options, the enforcement method
may apply a spatial branching operation on a variable xj , creating two
subproblems, which both contain a strictly smaller domain for xj . This
results in tighter linear underestimators.

3.3. Propagation. In the domain propagation call, a constraint han-


dler may deduce new restrictions upon local domains of variables. Such
deductions may yield stronger linear underestimators in the separation
procedures, prune nodes due to infeasibility of a constraint, or result in
further deductions for other constraints. For quadratic constraints, we im-
plemented an interval-arithmetic based method similar to [16]. To allow
for an efficient propagation, we write a quadratic constraint in the form

    ∑_{j∈J} d_j x_j + ∑_{k∈K} (e_k + p_{k,k} x_k + ∑_{r∈K} p_{k,r} x_r) x_k ∈ [ℓ, u],    (3.5)

such that d_j, e_k, p_{k,r} ∈ Q, ℓ, u ∈ Q̄, J ∪ K ⊆ N, J ∩ K = ∅, and p_{k,r} = 0 for k > r. For a given a ∈ Q, an interval [b^L, b^U], and a variable y with domain [y^L, y^U], we denote by q(a, [b^L, b^U], y) the set {b y + a y² : y ∈ [y^L, y^U], b ∈ [b^L, b^U]}. This set can be computed analytically [16].
The forward propagation step aims at tightening the bounds [ℓ, u] in (3.5). For this purpose, we replace the variables x_j and x_r in (3.5) by their domains to obtain the “interval-equation”

    ∑_{j∈J} d_j [x_j^L, x_j^U] + ∑_{k∈K} ([f_k^L, f_k^U] x_k + p_{k,k} x_k²) ∈ [ℓ, u],

where [f_k^L, f_k^U] := [e_k, e_k] + ∑_{r∈K} p_{k,r} [x_r^L, x_r^U]. Computing [h^L, h^U] := ∑_{j∈J} d_j [x_j^L, x_j^U] + ∑_{k∈K} q(p_{k,k}, [f_k^L, f_k^U], x_k) yields an interval that contains all values that the left hand side of (3.5) can take w.r.t. the current variables’ domains. If [h^L, h^U] ∩ [ℓ, u] = ∅, then (3.5) cannot be satisfied for any x ∈ [x^L, x^U] and the current branch-and-bound node can be pruned. Otherwise, the interval [ℓ, u] can be tightened to [ℓ, u] ∩ [h^L, h^U].
The backward propagation step aims at inferring domain deductions on the variables in (3.5) using the interval [ℓ, u]. For a “linear” variable x_j, j ∈ J, we can easily infer the bounds

    (1 / d_j) ([ℓ, u] − ∑_{j'∈J, j'≠j} d_{j'} [x_{j'}^L, x_{j'}^U] − ∑_{k∈K} q(p_{k,k}, [f_k^L, f_k^U], x_k)).

For a “quadratic” variable x_k, k ∈ K, one way to compute potentially tighter bounds is by solving the quadratic interval-equation

    ∑_{j∈J} d_j [x_j^L, x_j^U] + ∑_{k'∈K, k'≠k} q(p_{k',k'}, [e_{k'}, e_{k'}] + ∑_{r∈K, r≠k} p_{k',r} [x_r^L, x_r^U], x_{k'})
        + ([e_k, e_k] + ∑_{r∈K, r≠k} (p_{k,r} + p_{r,k}) [x_r^L, x_r^U]) x_k + p_{k,k} x_k² ∈ [ℓ, u].

However, since evaluating the argument of q(·) for each k' ∈ K may produce a huge computational overhead, especially for constraints with many bilinear terms, we compute the solution set of

    ∑_{j∈J} d_j [x_j^L, x_j^U] + ∑_{k'∈K, k'≠k} (q(p_{k',k'}, [e_{k'}, e_{k'}], x_{k'}) + ∑_{r∈K, r≠k} p_{k',r} [x_{k'}^L, x_{k'}^U] [x_r^L, x_r^U])
        + ([e_k, e_k] + ∑_{r∈K, r≠k} (p_{k,r} + p_{r,k}) [x_r^L, x_r^U]) x_k + p_{k,k} x_k² ∈ [ℓ, u],    (3.6)

which can be performed more efficiently. If the intersection of the current


domain [xL U
k , xk ] of xk with the solution set of (3.6) is empty, we can deduce
infeasibility and prune the corresponding node. Otherwise, we may be able
to tighten the bounds of xk .
As in [16], all interval operations detailed in this section are performed
in outward rounding mode.
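As a concrete illustration (our own sketch; it assumes finite bounds and ignores the outward rounding that an exact implementation needs), the interval q(a, [b^L, b^U], y) can be computed by exploiting that b y + a y² is linear in b and a parabola in y:

def q_interval(a, bL, bU, yL, yU):
    # Enumerate the candidates: b at its endpoints, y at its endpoints
    # or at the vertex -b/(2a) of the parabola if it lies in [yL, yU].
    values = []
    for b in (bL, bU):
        ys = [yL, yU]
        if a != 0.0:
            y_vertex = -b / (2.0 * a)
            if yL <= y_vertex <= yU:
                ys.append(y_vertex)
        values.extend(b * y + a * y * y for y in ys)
    return min(values), max(values)

# Example: q(1, [-2, 2], y) for y in [-1, 3] gives the interval [-1, 15].
print(q_interval(1.0, -2.0, 2.0, -1.0, 3.0))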
3.4. Enforcement. In the enforcement call, a constraint handler has
to check whether the current LP solution x̃ is feasible for all its constraints.
It can resolve an infeasibility by either adding cutting planes that separate
x̃ from the relaxation, by tightening bounds on a variable such that x̃ is
separated from the current domain, by pruning the current node from the
branch-and-bound tree, or by performing a branching operation.
We have configured SCIP to call the enforcement method of the quadra-
tic constraint handler with a lower priority than the enforcement method for
the handler of integrality constraints. Thus, at the point where quadratic
constraints are enforced, all integer variables take an integral value in the
LP optimum x̃. For a violated quadratic constraint, we first perform a
forward propagation step, see Section 3.3, which may prune the current
node. If the forward propagation does not declare infeasibility, we call the
separation method, see Section 3.2. If the separator fails to cut off x̃, we
perform a spatial branching operation. We use the following branching rule
to resolve infeasibility in a nonconvex quadratic constraint.
Branching rule. We consider each unfixed variable x_j that appears in a violated nonconvex quadratic constraint as a branching candidate. Let x_j^l, x_j^u ∈ Q̄ be the local lower and upper bounds of x_j, and x_j^b ∈ (x_j^l, x_j^u) be the potential branching point for branching on x_j. Usually, we choose x_j^b = x̃_j. If, however, x̃_j is very close to one of the bounds, x_j^b is shifted inwards the interval. Thus, for x_j^l, x_j^u ∈ Q, we let x_j^b := min{max{x̃_j, λ x_j^u + (1 − λ) x_j^l}, λ x_j^l + (1 − λ) x_j^u}, i.e., x̃_j is clipped to the inner interval [x_j^l + λ(x_j^u − x_j^l), x_j^u − λ(x_j^u − x_j^l)], where the parameter λ is set to 0.2 in our experiments.
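A one-line sketch of this branching point selection for finite local bounds (our own illustration):

def branching_point(x_tilde, lb, ub, lam=0.2):
    # Clip the LP value into [lb + lam*(ub-lb), ub - lam*(ub-lb)].
    return min(max(x_tilde, (1.0 - lam) * lb + lam * ub),
               (1.0 - lam) * ub + lam * lb)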


As suggested in [6], we select the branching variable w.r.t. its pseudo-


cost values. The pseudocosts are used to estimate the objective change in
the LP relaxation when branching downwards and upwards on a particular
variable. The pseudocosts of a variable are defined as the average objective
gains per unit change, taken over all nodes, where this variable has been
chosen for branching, see [8] for details.
In classical pseudocost branching for integer variables, the distances
of x̃j to the nearest integers are used as multipliers of the pseudocosts.
For continuous variables, we use another measure, similar to “rb-int-br-rev” suggested in [6]: the distance of x_j^b to the bounds x_j^L and x_j^U of the variable x_j. This measure is motivated by the observation that the length of the domain determines the quality of the convexification. If the domain of x_j is unbounded, then the “convexification error of the variable x_j” is used as multiplier instead. This value is computed by assigning to each variable the gap, evaluated at x̃, that is introduced by using a secant or McCormick underestimator for a nonconvex term that includes this variable.
As in [2], we combine the two estimates for downwards and upwards
branching by multiplication rather than by a convex sum.

4. A constraint handler for Second-Order Cone constraints.


Constraints of the form
    √(γ + ∑_{i=1}^k (α_i (x_i + β_i))²) ≤ α_0 y + β_0,    α_0 y ≥ −β_0,    (4.1)

where α_i, β_i ∈ Q, i = 0, . . . , k, γ ∈ Q_+, are handled by a constraint handler


for second-order cone constraints. Note that SOC constraints are con-
vex, i.e., the nonlinear function on the left hand side of (4.1) is convex.
Therefore, unlike nonconvex quadratic constraints, SOC constraints can
be enforced by separation routines solely. First, the inequality α0 y ≥ −β0
is ensured by tightening the bounds of y accordingly. Next, if the current
LP solution (x̃, ỹ) violates some SOC constraint (4.1), then we add the
valid gradient-based inequality

    η + (1/η) ∑_{i=1}^k α_i² (x̃_i + β_i)(x_i − x̃_i) ≤ α_0 y + β_0,

where η := √(γ + ∑_{i=1}^k (α_i (x̃_i + β_i))²). Note that since (x̃, ỹ) violates (4.1), one has η > α_0 ỹ + β_0 ≥ 0. For the initial linear relaxation, no inequalities are added.
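As a minimal NumPy sketch (ours, assuming the constraint is violated so that η > 0), the coefficients of this gradient cut, written as c · x − α_0 y ≤ rhs, can be computed as follows:

import numpy as np

def soc_gradient_cut(alpha, beta, gamma, alpha0, beta0, x_tilde):
    eta = np.sqrt(gamma + np.sum((alpha * (x_tilde + beta)) ** 2))
    c = alpha ** 2 * (x_tilde + beta) / eta   # gradient w.r.t. x at x_tilde
    rhs = beta0 - eta + c @ x_tilde
    return c, alpha0, rhs                     # cut: c.x - alpha0*y <= rhs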
We also experimented with adding a linear outer-approximation as
suggested in [7] a priori, but did not observe computational benefits. Thus,
this option has been disabled for the experiments in Section 6.


5. Primal heuristics. When solving MIQCPs, we still make use of


all default MILP primal heuristics of SCIP. Most of these heuristics aim at
finding good integer and LP feasible solutions starting from the optimum
of the LP relaxation. For details and a computational study of the primal
MILP heuristics available in SCIP, see [9].
So far, we have implemented two additional primal heuristics for solv-
ing MIQCPs in SCIP, both of which are based on large neighborhood search.
QCP local search. There are several cases, where the MILP primal
heuristics already yield feasible solutions for the MIQCP. However, the
heuristics usually construct a point x̂ which is feasible for the MILP relax-
ation, i.e., the LP relaxation plus the integrality requirements, but violates
some of the quadratic constraints. Such a point may, nevertheless, provide
useful information, since it can serve as starting point for a local search.
The local search we currently use considers the space of continuous
variables, i.e., if there are continuous variables in a quadratic part of a
constraint, we solve a QCP obtained from the MIQCP by fixing all integer
variables to the values of x̂, using x̂ as starting point for the QCP solver.
Each feasible solution of this QCP also is a feasible solution of the MIQCP.
RENS. Furthermore, we implemented an extended form of the relax-
ation enforced neighborhood search (RENS) heuristic [10]. This heuristic
creates a sub-MIQCP problem by exploiting the optimal solution x̃ of the
LP relaxation at some node of the branch-and-bound-tree. In particular,
it fixes all integer variables which take an integral value in x̃ and restricts
the bounds of all integer variables with fractional LP solution value to the
two nearest integral values. This, hopefully much easier, sub-MIQCP is
then partially solved by a separate SCIP instance. Obviously, each feasible
solution of the sub-MIQCP is a feasible solution of the original MIQCP.
Note that, during the solution process of the sub-MIQCP, the QCP
local search heuristic may be used along with the default SCIP heuris-
tics. For some instances this works particularly well since, amongst oth-
ers, RENS performs additional presolving reductions on the sub-MIQCP –
which yields a better performance of the QCP solver.
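The fixing scheme of RENS can be sketched as follows (our own illustration on top of a hypothetical model interface; copy, fix_variable, and set_bounds are assumed methods, not an actual SCIP API):

import math

def build_rens_subproblem(model, lp_solution, int_vars, eps=1e-6):
    sub = model.copy()
    for j in int_vars:
        val = lp_solution[j]
        if abs(val - round(val)) < eps:
            sub.fix_variable(j, round(val))                     # integral: fix
        else:
            sub.set_bounds(j, math.floor(val), math.ceil(val))  # two nearest integers
    return sub  # to be partially solved by a separate solver instance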
6. Computational experiments. We conducted numerical experi-
ments on three different test sets. The first is a test set of mixed integer
quadratic programs (MIQPs) [22], i.e., problems with a quadratic objec-
tive function and linear constraints. Secondly, we selected a test set of
mixed integer conic programs (MICPs) [27], which have been formulated
as MIQCP. Finally, we assembled a test set of 24 general MIQCPs from
the MINLPLib [14] and six constrained layout problems (clay*) from [15].
We will refer to these test sets as Miqp, Micp, and Minlp test sets.
In Tables 1–3, each entry shows the number of seconds a certain solver
needs to solve a problem. If the problem was not solved within the given
time limit, the lower and upper bounds at termination are given. For


each instance, the fastest solution time or – in case all solvers hit the time
limit – the best bounds, are depicted in bold face. Further, for each solver
we calculated the geometric mean of the solution time (in which unsolved
instances are accounted for with the time limit), and collected statistics on
how often a solver solved a problem, computed the best dual bound, found
the best primal solution value, or was the fastest among all solvers.
For our benchmark, we ran SCIP 1.2.0.7 using CPLEX 11.2.1 [19] as
LP solver, Ipopt 3.8 [28] as QCP solver for the heuristics (cf. Section 5),
and LAPACK 3.1.0 to compute eigenvalues. For comparison, we ran BA-
RON 9.0.2 [26], Couenne 0.3 [6], CPLEX 12.1, LindoGlobal 6.0.1 [20], and
MOSEK 6.0.0.55 [23]. Note that BARON, Couenne, and LindoGlobal can
also be applied to general MINLPs. All solvers were run with a time limit
of one hour, a final gap tolerance of 10−4 , and a feasibility tolerance of 10−6
on a 2.5 GHz Intel Core2 Duo CPU with 4 GB RAM and 6 MB Cache.
Mixed Integer Quadratic Programs. Table 6 presents the 25 in-
stances from the Miqp test set [22]. We observe that due to the refor-
mulation (3.1), 15 instances could be reformulated as mixed integer linear
programs during presolving.
Table 1 compares the performance of SCIP, BARON, Couenne, and
CPLEX on the Miqp test set. We did not run LindoGlobal since many
of the Miqp instances exceed limitations of our LindoGlobal license. Note
that some of the instances are nonconvex before applying the reformulation
described in Section 3.1, so that we did not apply solvers which have only
been designed for convex problems. Instance ivalues is the only instance
that cannot be handled by CPLEX due to nonconvexity. Altogether, SCIP
performs much better than BARON and Couenne and slightly better than
CPLEX w.r.t. the mean computation time.
Mixed Integer Conic Programs. The Micp test set consists of
three types of optimization problems, see Table 5. The classical_XXX_YY instances contain one convex quadratic constraint of the form ∑_{j=1}^k x_j² ≤ u for some u ∈ Q, where XXX stands for the dimension k and YY is a problem index. The robust_XXX_YY instances contain one convex quadratic and one SOC constraint of dimension k. The shortfall_XXX_YY instances contain two SOC constraints of dimension k.
Table 2 compares the performance of BARON, Couenne, CPLEX, MO-
SEK, LindoGlobal, and SCIP on the Micp test set. We observe that on
this specific test set SCIP outperforms BARON, Couenne, and LindoGlobal.
It solves one instance more but is about 20% slower than the commercial
solvers CPLEX and MOSEK.
Mixed Integer Quadratically Constrained Programs. The in-
stances lop97ic, lop97icx, pb302035, pb351535, qap, and qapw were
transformed into MILPs by presolving – which is due to the reformula-
tion (3.1). The instances nuclear*, space25, space25a, and waste are


Table 1
Results on Miqp instances. Each entry shows the number of seconds to solve a
problem, or the bounds obtained after the one hour limit.

instance BARON Couenne CPLEX SCIP


iair04 [−∞, ∞] fail 37.52 228.27
iair05 [−∞, ∞] [25886, ∞] 30.71 113.65
ibc1 [1.792, 3.72] [1.696, 3.98] 895.54 43.06
ibell3a 58.95 198.90 3.96 14.59
ibienst1 1048.04 [−∞, 49.11] 2836.05 31.53
icap6000 [−2448496, −2441852] fail 6.28 6.10
icvxqp1 [324603, 613559] fail [327522, 410439] [0, 4451398]
ieilD76 [729.5, 1081] [808.3, 898.5] 13.50 41.28
ilaser0 [−∞, ∞] fail [2409925, 2412734] fail
imas284 [89193, 92241] 3139.12 4.36 27.33
imisc07 [2432, 2814] [1696, 3050] 70.02 30.52
imod011 [−3.823, −3.843] fail [−∞, ∞] [−∞, 0]
inug06-3rd [177.8, 1434] fail [527.2, 1434] [1152, ∞]
inug08 [1451, 14696] [683.1, ∞] 2126.68 24.83
iportfolio [−∞, 0] [−0.4944, ∞] [−0.4944, −0.4937] [−0.5251, 0]
iqap10 [−∞, ∞] [329.8, ∞] 1411.26 657.71
iqiu [−402.5, −108.6] [−357.5, −126.3] 91.77 64.53
iran13x13 [2930, 3355] [3014, 3476] 20.02 68.44
iran8x32 [5013, 5454] [5034, 5629] 25.24 8.31
isqp0 [−∞, ∞] [−∞, −20137] [−20338, −20320] [−∞, −19895]
isqp1 [−∞, ∞] [−∞, −18801] [−19028, −18993] [−∞, −17883]
isqp [−∞, ∞] [−∞, −20722] [−21071, −21001] [−∞, ∞]
iswath2 [335.6, 661.9] [335.9, 411.8] 212.15 121.29
itointqor [−∞, −1146] fail [−1150, −1147] [−∞, 0]
ivalues [−12.88, −0.4168] [−6.054, −1.056] – [−172.6, ∞]
mean time 2923.78 3193.55 423.387 303.013
#solved 2 2 15 15
#best dual bound 4 5 21 16
#best primal sol. 5 3 23 15
#fastest 0 0 6 9

particularly difficult since they contain continuous variables that appear in
quadratic terms with at least one bound at infinity. This prohibits using
the reformulation (3.1) for products of binary variables with a linear term.
Further, generating secant and McCormick cuts for nonconvex terms is not
possible. Thus, if the propagation algorithm cannot reduce domains for
such unbounded variables, it may require many branching operations until
reasonable variable bounds and a lower bound can be computed.
Table 3 compares the performance of BARON, Couenne, LindoGlobal,
and SCIP on the Minlp test set. Figure 2 shows a performance profile
for this particular test set. Regarding the number of solved instances,
LindoGlobal performs best: it could solve two instances more than BARON
and SCIP, which both solved six instances more than Couenne. SCIP was,
however, significantly faster than the other solvers.
BARON wrongly declared the instance product to be infeasible and
hit the time limit while parsing the instances pb302035 and pb351535.
Couenne wrongly declared the instances product and waste to be infeasible.
Using a time limit of 3600 seconds, LindoGlobal and Couenne did not stop


Table 2
Results on Micp instances. Each entry shows the number of seconds to solve a problem, or the bounds obtained after the one hour limit.

instance BARON Couenne CPLEX LindoGlobal MOSEK SCIP


classical_40_0 207.24 178.72 1.60 38.25 12.79 33.16
classical_40_1 7.09 114.49 1.01 217.55 22.28 5.53
classical_50_0 [−0.09191, −0.09074] [−0.09447, −0.09054] 41.33 [−0.09572, −0.09074] 161.17 2664.58
classical_50_1 250.46 [−0.09595, −0.09459] 7.38 [−0.09593, −0.09476] 30.62 210.65
classical_200_0 [−0.1247, −0.1077] [−0.1255, −0.0951] [−0.1231, −0.1106] [−0.1256, −0.08574] [−0.124, −0.1103] [−0.1284, −0.108]
classical_200_1 [−0.1269, −0.1149] [−0.1283, −0.1036] [−0.1257, −0.1164] [−0.1284, −0.1093] [−0.1266, −0.1162] [−0.1311, −0.1162]
robust_40_0 3473.03 [−0.09706, −0.07602] 0.67 3600.95 1.37 4.03
robust_40_1 1752.70 [−0.116, −0.07646] 0.64 249.72 2.88 2.29
robust_50_0 [−0.08615, −0.08609] [−0.1263, −0.0861] 1.88 165.45 0.90 1.51
robust_50_1 [−0.08574, −0.08569] [−0.1274, −0.0857] 2.32 506.36 3.18 8.71
robust_100_0 [−0.1013, −0.0932] [−0.1514, −0.09747] [−0.1043, −0.09721] [−0.1542, −0.08833] 1210.86 1392.47
robust_100_1 [−0.07501, −0.0703] [−0.1257, −0.0677] 525.14 [−0.1269, 0] 292.21 592.92
robust_200_0 [−0.1722, −0.1363] fail [−0.145, −0.1411] [−1, 0] [−0.1468, −0.1411] [−0.1452, −0.1411]
robust_200_1 [−0.1477, −0.1424] [−0.1995, −0.1377] [−0.1454, −0.1425] [−1, 0] [−0.1456, −0.1427] [−0.1467, −0.1413]
shortfall_40_0 102.17 333.76 247.99 1027.02 45.71 15.39
shortfall_40_1 5.99 133.49 5.71 288.12 13.72 3.00

shortfall_50_0 [−1.098, −1.095] [−1.102, −1.095] 1913.00 [−1.104, −1.095] 405.93 1602.81
shortfall_50_1 91.84 [−1.103, −1.099] 13.13 [−1.104, −1.102] 21.73 15.44
shortfall_100_0 [−1.12, −1.114] [−1.126, −1.102] [−1.132, −1.112] [−1.125, −1.114] [−1.116, −1.114] [−1.121, −1.114]
shortfall_100_1 [−1.109, −1.106] [−1.113, −1.091] 3301.75 [−1.113, −1.105] [−1.111, −1.106] 2152.56
shortfall_200_0 [−1.149, −1.12] [−1.15, −1.094] [−1.146, −1.125] [−1.479, −1.08] [−1.146, −1.126] [−1.149, −1.119]
shortfall_200_1 [−1.15, −1.131] [−1.152, −1.101] [−1.15, −1.133] [−1.361, −1.089] [−1.151, −1.135] [−1.148, −1.134]
mean time 1202.41 2092.28 226.932 1552.67 228.392 288.956
#solved 8 4 14 8 14 15
#best dual bound 8 4 19 8 16 16
#best primal sol. 13 9 17 12 19 17

#fastest 0 0 8 0 4 3

Table 3
Results on Minlp instances. Each entry shows the number of seconds to solve a
problem, or the bounds obtained after the one hour limit.

instance BARON Couenne LindoGlobal SCIP


clay0203m 1.56 2.03 43.13 0.15
clay0204m 48.25 4.79 85.50 0.52
clay0205m 971.83 27.73 1162.00 6.49
clay0303m 1.29 [−∞, 29911] 62.17 0.37
clay0304m 14.77 16.31 187.38 0.90
clay0305m 3584.73 27.65 1112.42 7.45
du-opt 137.39 [−8727, ∞] 2204.70 1.07
du-opt5 150.54 [−2437, 9.012] 697.10 0.47
lop97ic [2549, ∞] [3826, ∞] [−∞, ∞] [3069, 4547]
lop97icx [2812, 4415] [3903, 4272] [0, 5259] [3763, 4099]
nous1 641.19 [1.345, 1.567] 41.44 [1.195, 1.567]
nous2 0.97 2.69 0.36 1349.72
nuclear14a [−12.26, ∞] [−12.26, ∞] [−∞, ∞] [−228.2, −1.105]
nuclear14b [−2.078, −1.107] [−2.234, ∞] [−∞, ∞] [−198.3, −1.118]
nuclear14 [−∞, ∞] [−∞, −1.12] [−∞, −1.126] [−∞, −1.122]
nuclearva [−∞, ∞] [−∞, −1.005] [−∞, ∞] [−∞, ∞]
nvs19 12.14 778.31 457.04 0.21
nvs23 44.54 [−1380, −1109] 2533.14 0.40
pb302035 [−∞, ∞] fail [622588, ∞] [1138613, 4019210]
pb351535 [−∞, ∞] fail fail [1710093, 4976433]
product fail fail [−2200, −2092] 326.76
qap [103040, 388250] [0, ∞] [−∞, ∞] [24761, 410450]
qapw [265372, 391210] [0, ∞] [0, 405354] [32191, 400150]
space25a [99.99, 490.2] [35.09, ∞] [330.6, 489.2] [72.46, ∞]
space25 [84.91, 520.9] [42.68, ∞] [33.07, 638.8] [72.46, ∞]
tln12 [32.73, ∞] [16.19, ∞] [85.8, 139.1] [16.41, 91.6]
tln5 798.56 [6.592, 10.3] 174.02 32.56
tln6 [13.75, 15.3] [7.801, 15.3] 182.23 [10.21, 15.3]
tln7 [12.38, 15.6] [5.038, 16.1] [14.2, 15.6] [7.016, 15]
waste [306.7, 712.3] fail 1532.71 [306.7, 670.6]
mean time 755.466 1215.11 995.697 400.464
#solved 13 7 15 13
#best dual bound 18 10 18 15
#best primal sol. 17 11 17 23
#fastest 0 0 4 12

after 4000 seconds for pb351535 and for pb302035 and pb351535, respectively.
Further, no bounds were reported in the log file.
CPLEX can be applied to 11 instances of this test set. The clay* and
du-opt* instances were solved within seconds; 4 times CPLEX was fastest,
4 times SCIP was fastest. For the instances pb302035, pb351535, and qap,
CPLEX found good primal solutions, but very weak lower bounds.

Evaluation of implemented MIQCP techniques. In order to
evaluate the computational effects of the implemented techniques, we com-
pare the default settings of SCIP with a series of settings where a single
technique has been turned off at a time. The methods we evaluated are
the reformulation (3.1) for products that involve binary variables, cf. Sec-
tion 3.1, the handling of SOC constraints by the SOC constraint handler,
cf. Section 4, the domain propagation for quadratic constraints during
branch-and-bound, cf. Section 3.3, the detection of convexity for multi-
variate quadratic functions, cf. Section 3.1, the QCP local search heuristic,
cf. Section 5, and the extended RENS heuristic, cf. Section 5.

Fig. 2. Performance profile for Minlp test set.
For each method, we evaluate the performance only on those instances
from the test sets Miqp, Micp, and Minlp where the method to evaluate
may have an effect (e.g., disabling the reformulation (3.1) is only evaluated
on instances where this reformulation can be applied). The results are
summarized in Table 4. For a number of performance measures we report
the relative change caused by disabling a particular method.
We observe that deactivating one of the methods always leads to more
deteriorations than improvements for both the dual and primal bounds at
termination. Except for one instance in the case of switching off binary
reformulations, the number of solved instances remains equal or decreases.
Recognizing SOC constraints and convexity allows instances of those
special types to be solved much faster. Disabling domain propagation or one
of the primal heuristics yields a small improvement w.r.t. computation
time for easy instances, but results in weaker bounds for those instances
which could not be solved within the time limit. We further observed that
switching off the QCP local search heuristic increases the time until the first
feasible solution is found by 93% and the time until the optimal solution
is found by 26%. For RENS, the numbers are 12% and 43%, respectively.
Therefore, we still consider applying these techniques to be worthwhile.
7. Conclusions. In this paper, we have shown how a framework for
constraint integer programming can be extended towards a solver for gen-
eral MIQCPs. We implemented methods to correctly handle the quadratic


Table 4
Relative impact of implemented MIQCP methods. Percentages in columns 3–9 are
relative to the size of the test set. The percentage in the mean time column is relative
to the mean time of SCIP with default settings.

                                 primal sol.    dual bound    running time
disabled feature   size  solved  better worse   better worse  better worse   mean
binary reform. 32 +3% 13% 22% 6% 25% 22% 34% +3%
SOC upgrade 16 −69% 0% 69% 0% 100% 0% 69% +1317%
domain prop. 48 ±0% 4% 8% 6% 17% 29% 15% −4%
convexity check 10 −20% 20% 20% 0% 30% 10% 30% +159%
QCP local search 48 ±0% 2% 17% 2% 4% 38% 17% −6%
RENS heuristic 56 ±0% 5% 9% 7% 7% 41% 11% −3%

constraints. In order to speed up computations we further implemented
MIQCP specific presolving, propagation, and separation methods. Further-
more, we discussed two large neighborhood search heuristics for MIQCP.
The computational results indicate that this already suffices for building a
solver which is competitive with state-of-the-art solvers like CPLEX, BARON,
Couenne, and LindoGlobal. SCIP performed particularly well on the Miqp
and Micp test sets, which contain a large number of linear constraints and
a few quadratic constraints. These results meet our expectations, since
SCIP already features several sophisticated MILP technologies.
We conclude that the extension of a full-scale MILP solver for handling
MIQCP is a promising approach. The next step towards a full-scale MIQCP
solver will be the incorporation of further MIQCP specific components into
SCIP, e.g., more sophisticated separation routines [5, 24] and specialized
constraint handlers, e.g., for bilinear covering constraints [25].

Acknowledgments. We would like to thank Ambros M. Gleixner and Marc
E. Pfetsch for their valuable comments on the paper. We further thank the
anonymous reviewers for their constructive suggestions.

REFERENCES

[1] K. Abhishek, S. Leyffer, and J. Linderoth, FilMINT: An outer-approximation-
based solver for nonlinear mixed integer programs, INFORMS Journal on
Computing, 22 (2010), pp. 555–567.
[2] T. Achterberg, Constraint Integer Programming, PhD thesis, Technische Uni-
versität Berlin, 2007.
[3] T. Achterberg, SCIP: Solving Constraint Integer Programs, Math. Program. Comput.,
1 (2009), pp. 1–41.
[4] T. Achterberg, T. Berthold, T. Koch, and K. Wolter, Constraint integer
programming: A new approach to integrate CP and MIP, in Integration of AI
and OR Techniques in Constraint Programming for Combinatorial Optimiza-
tion Problems, 5th International Conference, CPAIOR 2008, L. Perron and
M. Trick, eds., Vol. 5015 of LNCS, Springer, 2008, pp. 6–20.


[5] X. Bao, N. Sahinidis, and M. Tawarmalani, Multiterm polyhedral relaxations for
nonconvex, quadratically-constrained quadratic programs, Optimization Meth-
ods and Software, 24 (2009), pp. 485–504.
[6] P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter, Branching and
bounds tightening techniques for non-convex MINLP, Optimization Methods
and Software, 24 (2009), pp. 597–634.
[7] A. Ben-Tal and A. Nemirovski, On polyhedral approximations of the second-
order cone, Math. Oper. Res., 26 (2001), pp. 193–205.
[8] M. Bénichou, J. M. Gauthier, P. Girodet, G. Hentges, G. Ribière, and
O. Vincent, Experiments in mixed-integer linear programming, Math. Pro-
gram., 1 (1971), pp. 76–94.
[9] T. Berthold, Primal heuristics for mixed integer programs. Diploma thesis, Tech-
nische Universität Berlin, 2006.
[10] T. Berthold, RENS – relaxation enforced neighborhood search, ZIB-Report 07-28, Zuse
Institute Berlin, 2007.
[11] T. Berthold, S. Heinz, and M.E. Pfetsch, Nonlinear pseudo-boolean optimiza-
tion: relaxation or propagation?, in Theory and Applications of Satisfiability
Testing – SAT 2009, O. Kullmann, ed., no. 5584 in LNCS, Springer, 2009,
pp. 441–446.
[12] R.E. Bixby, M. Fenelon, Z. Gu, E. Rothberg, and R. Wunderling, MIP:
theory and practice – closing the gap, in System Modelling and Optimization:
Methods, Theory and Applications, M. Powell and S. Scholtes, eds., Kluwer,
2000, pp. 19–50.
[13] P. Bonami, L.T. Biegler, A.R. Conn, G. Cornuéjols, I.E. Grossmann, C.D.
Laird, J. Lee, A. Lodi, F. Margot, N.W. Sawaya, and A. Wächter, An
algorithmic framework for convex mixed integer nonlinear programs, Discrete
Optim., 5 (2008), pp. 186–204.
[14] M.R. Bussieck, A.S. Drud, and A. Meeraus, MINLPLib - a collection of test
models for mixed-integer nonlinear programming, INFORMS J. Comput., 15
(2003), pp. 114–119.
[15] CMU-IBM MINLP Project. http://egon.cheme.cmu.edu/ibm/page.htm.
[16] F. Domes and A. Neumaier, Quadratic constraint propagation, Constraints, 15
(2010), pp. 404–429.
[17] J.N. Hooker, Integrated Methods for Optimization, International Series in Oper-
ations Research & Management Science, Springer, New York, 2007.
[18] R. Horst and H. Tuy, Global Optimization: Deterministic Approaches, Springer,
1990.
[19] IBM, CPLEX. http://ibm.com/software/integration/optimization/cplex.
[20] Y. Lin and L. Schrage, The global solver in the LINDO API, Optimization
Methods and Software, 24 (2009), pp. 657–668.
[21] G. McCormick, Computability of global solutions to factorable nonconvex pro-
grams: Part I-Convex Underestimating Problems, Math. Program., 10 (1976),
pp. 147–175.
[22] H. Mittelmann, MIQP test instances. http://plato.asu.edu/ftp/miqp.html.
[23] MOSEK Corporation, The MOSEK optimization tools manual, 6.0 ed., 2009.
[24] A. Saxena, P. Bonami, and J. Lee, Convex relaxations of non-convex mixed in-
teger quadratically constrained programs: Projected formulations, Tech. Rep.
RC24695, IBM Research, 2008. to appear in Math. Program.
[25] M. Tawarmalani, J.-P.P. Richard, and K. Chung, Strong valid inequalities
for orthogonal disjunctions and bilinear covering sets, Math. Program., 124
(2010), pp. 481–512.
[26] M. Tawarmalani and N. Sahinidis, Convexification and Global Optimization in
Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms,
Software, and Applications, Kluwer Academic Publishers, 2002.


[27] J.P. Vielma, S. Ahmed, and G.L. Nemhauser, A lifted linear programming
branch-and-bound algorithm for mixed integer conic quadratic programs, IN-
FORMS J. Comput., 20 (2008), pp. 438–450.
[28] A. Wächter and L.T. Biegler, On the implementation of a primal-dual interior
point filter line search algorithm for large-scale nonlinear programming, Math.
Program., 106 (2006), pp. 25–57.

APPENDIX
In this section, detailed problem statistics are presented for the three
test sets, Micp (Table 5), Miqp, and Minlp (both in Table 6). The
columns belonging to “original problem” state the structure of the origi-
nal problem. The “presolved problem” columns show statistics about the
MIQCP after SCIP has applied its presolving routines, including the ones
described in Section 3.1. The columns “vars”, “int”, and “bin” show the
number of all variables, the number of integer variables, and the num-
ber of binary variables, respectively. The columns “linear”, “quad”, and
“soc” show the number of linear, quadratic, and second-order cone con-
straints, respectively. The column “conv” indicates whether all quadratic
constraints of the presolved MIQCP are convex or whether at least one of
them is nonconvex.
Table 5
Statistics of instances in Micp test set.

original problem presolved problem


instance vars int bin linear quad vars int bin linear quad soc
classical_40_0 120 0 40 82 1 120 0 40 82 1 0
classical_40_1 120 0 40 82 1 120 0 40 82 1 0
classical_50_0 150 0 50 102 1 150 0 50 102 1 0
classical_50_1 150 0 50 102 1 150 0 50 102 1 0
classical_200_0 600 0 200 402 1 600 0 200 402 1 0
classical_200_1 600 0 200 402 1 600 0 200 402 1 0
robust_40_0 163 0 41 124 2 163 0 41 124 1 1
robust_40_1 163 0 41 124 2 163 0 41 124 1 1
robust_50_0 203 0 51 154 2 203 0 51 154 1 1
robust_50_1 203 0 51 154 2 203 0 51 154 1 1
robust_100_0 403 0 101 304 2 403 0 101 304 1 1
robust_100_1 403 0 101 304 2 403 0 101 304 1 1
robust_200_0 803 0 201 604 2 803 0 201 604 1 1
robust_200_1 803 0 201 604 2 803 0 201 604 1 1
shortfall_40_0 164 0 41 125 2 164 0 41 125 0 2
shortfall_40_1 164 0 41 125 2 164 0 41 125 0 2
shortfall_50_0 204 0 51 155 2 204 0 51 155 0 2
shortfall_50_1 204 0 51 155 2 204 0 51 155 0 2
shortfall_100_0 404 0 101 305 2 404 0 101 305 0 2
shortfall_100_1 404 0 101 305 2 404 0 101 305 0 2
shortfall_200_0 804 0 201 605 2 804 0 201 605 0 2
shortfall_200_1 804 0 201 605 2 804 0 201 605 0 2


Table 6
Statistics of instances in Miqp (first part) and Minlp (second part) test set.

original problem presolved problem


instance vars int bin linear quad vars int bin linear quad conv
iair04 8905 0 8904 823 1 12848 0 7362 17464 0 
iair05 7196 0 7195 426 1 10574 0 6117 14218 0 
ibc1 1752 0 252 1913 1 866 0 252 1438 0 
ibell3a 123 29 31 104 1 129 29 31 161 1 
ibienst1 506 0 28 576 1 473 0 28 592 0 
icap6000 6001 0 6000 2171 1 7323 0 5865 6362 0 
icvxqp1 10001 10000 0 5000 1 10003 9998 2 5006 1 
ieilD76 1899 0 1898 75 1 2685 0 1898 3168 0 
ilaser0 1003 151 0 2000 1 1003 151 0 1000 1 
imas284 152 0 150 68 1 228 0 150 299 0 
imisc07 261 0 259 212 1 360 0 238 598 0 
imod011 10958 1 96 4480 1 8963 1 96 2730 1 
inug06-3rd 2887 0 2886 3972 1 3709 0 2886 7779 0 
inug08 1633 0 1632 912 1 2217 0 1632 3076 0 
iportfolio 1201 192 775 201 1 1201 192 775 201 1 
iqap10 4151 0 4150 1820 1 5879 0 4150 9047 0 
iqiu 841 0 48 1192 1 871 0 48 1285 0 
iran13x13 339 0 169 195 1 468 0 169 585 0 
iran8x32 513 0 256 296 1 651 0 256 713 0 
isqp0 1001 50 0 249 1 1001 50 0 249 1 
isqp1 1001 0 100 249 1 1068 0 100 480 1
isqp 1001 50 0 249 1 1001 50 0 249 1 
iswath2 6405 0 2213 483 1 8007 0 2213 5631 0 
itointqor 51 50 0 0 1 51 50 0 0 1 
ivalues 203 202 0 1 1 203 202 0 1 1
clay0203m 30 0 18 30 24 27 0 15 27 24 
clay0204m 52 0 32 58 32 48 0 28 54 32 
clay0205m 80 0 50 95 40 75 0 45 90 40 
clay0303m 33 0 21 30 36 31 0 19 29 36 
clay0304m 56 0 36 58 48 54 0 34 57 48 
clay0305m 85 0 55 95 60 81 0 51 93 60 
du-opt 21 13 0 9 1 21 13 0 5 1 
du-opt5 21 13 0 9 1 19 11 0 4 1 
lop97ic 1754 831 831 52 40 5228 708 708 11521 0 
lop97icx 987 831 68 48 40 488 68 68 1138 0 
nous1 51 0 2 15 29 47 0 2 11 29
nous2 51 0 2 15 29 47 0 2 11 29
nvs19 9 8 0 0 9 9 8 0 0 9
nvs23 10 9 0 0 10 10 9 0 0 10
pb302035 601 0 600 50 1 1199 0 600 1847 0 
pb351535 526 0 525 50 1 1048 0 525 1619 0 
product 1553 0 107 1793 132 446 0 92 450 82
qap 226 0 225 30 1 449 0 225 702 0 
qapw 451 0 225 255 1 675 0 225 930 0 
space25 893 0 750 210 25 767 0 716 118 25
space25a 383 0 240 176 25 308 0 240 101 25
nuclear14 1562 0 576 624 602 986 0 576 48 602
nuclear14a 992 0 600 49 584 1568 0 600 2377 560
nuclear14b 1568 0 600 1225 560 1568 0 600 1225 560
nuclearva 351 0 168 50 267 327 0 144 24 267
tln12 168 156 12 60 12 180 144 24 85 11
tln5 35 30 5 25 5 35 30 5 20 5
tln6 48 42 6 30 6 48 42 6 24 6
tln7 63 56 7 35 7 63 56 7 28 7
waste 2484 0 400 623 1368 1238 0 400 516 1230

PART VII:
Combinatorial Optimization

COMPUTATION WITH POLYNOMIAL EQUATIONS AND
INEQUALITIES ARISING IN
COMBINATORIAL OPTIMIZATION
JESUS A. DE LOERA∗ , PETER N. MALKIN† , AND PABLO A. PARRILO‡

Abstract. This is a survey of a recent methodology to solve systems of polynomial
equations and inequalities for problems arising in combinatorial optimization. The
techniques we discuss use the algebra of multivariate polynomials with coefficients over
a field to create large-scale linear algebra or semidefinite programming relaxations of
many kinds of feasibility or optimization questions.

Key words. Polynomial equations and inequalities, combinatorial optimization,
Nullstellensatz, Positivstellensatz, graph colorability, max-cut, stable sets, semidefinite
programming, large-scale linear algebra, semi-algebraic sets, real algebra.

AMS(MOS) subject classifications. 90C27, 90C22, 68W05.

1. Introduction. A wide variety of problems in optimization can be
easily modeled using systems of polynomial equations and inequalities. Fea-
sibility and optimization problems translate, either directly or via branch-
ing, into the problem of finding a solution of a system of equations and
inequalities. In this survey paper, we explain how to manipulate such sys-
tems for finding solutions or proving that they do not exist. Although these
techniques work in general, we are particularly motivated by problems of
combinatorial origin. For example, in the case of graphs, here is how one
can think about stable sets, k-colorability and max-cut problems in terms
of polynomial (non-linear) constraints:
Proposition 1.1. Let G = (V, E) be a graph.
• For a given positive integer k, consider the following polynomial
system:

$$x_i^2 - x_i = 0 \ \ \forall i \in V, \qquad x_i x_j = 0 \ \ \forall (i,j) \in E \qquad \text{and} \qquad \sum_{i \in V} x_i = k.$$

This system is feasible if and only if G has a stable set of size k.

∗ Department of Mathematics, University of California at Davis, Davis, CA 95616
([email protected]); partially supported by NSF DMS-0914107 and an IBM
OCR award.
† Department of Mathematics, University of California at Davis, Davis, CA 95616
([email protected]); partially supported by an IBM OCR award.
‡ Laboratory for Information and Decision Systems, Department of Electrical Engi-
neering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
02139 ([email protected]); partially supported by AFOSR MURI 2003-07688-1 and NSF
FRG DMS-0757207.


• For a positive integer k, consider the following polynomial system
of $|V| + |E|$ polynomial equations:
$$x_i^k - 1 = 0 \ \ \forall i \in V \qquad \text{and} \qquad \sum_{s=0}^{k-1} x_i^{k-1-s} x_j^s = 0 \ \ \forall (i,j) \in E.$$

The graph G is k-colorable if and only if this system has a complex
solution. Furthermore, when k is odd, G is k-colorable if and only
if this system has a common root over $\overline{\mathbb{F}}_2$, the algebraic closure of
the finite field with two elements.
• We can represent the set of cuts of G (i.e., bipartitions on V ) as
the 0-1 incidence vectors

$$\mathcal{S}_G := \{\chi^F : F \subseteq E \text{ is contained in a cut of } G\} \subseteq \{0,1\}^E.$$

Thus, the max cut problem with non-negative weights $w_e$ on the
edges $e \in E$ is
$$\max\Big\{\sum_{e \in E} w_e x_e : x \in \mathcal{S}_G\Big\}.$$

The vectors $\chi^F$ are the solutions of the polynomial system
$$x_e^2 - x_e = 0 \ \ \forall e \in E, \qquad \text{and} \qquad \prod_{i \in T} x_i = 0 \ \ \forall\, T \text{ an odd cycle in } G.$$
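To make the first encoding concrete, here is a minimal sketch (ours, not from the survey; it assumes Python with SymPy, and the triangle graph is our illustrative choice) that builds the stable-set system for $K_3$ with $k = 2$ and confirms it has no solution:

    # A minimal sketch (our illustration): the stable-set system of
    # Proposition 1.1 for the triangle K3 with k = 2, checked with SymPy.
    import sympy as sp

    V = [0, 1, 2]
    E = [(0, 1), (1, 2), (0, 2)]
    k = 2

    x = sp.symbols('x0:3')
    system  = [xi**2 - xi for xi in x]        # x_i^2 - x_i = 0 for i in V
    system += [x[i]*x[j] for (i, j) in E]     # x_i x_j = 0 for (i,j) in E
    system += [sum(x) - k]                    # sum_i x_i = k

    # K3 has no stable set of size 2, so the system is infeasible.
    print(sp.solve(system, x))   # -> []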

There are many other combinatorial problems that can be modeled
concisely by polynomial systems (see [9] and the many references therein).
In fact, a given problem can often be modeled non-linearly in many differ-
ent ways, and in practice choosing a “good” formulation is critical for an
efficient solution.
Given a polynomial system encoding a combinatorial question, we ex-
plain how to use two famous algebraic identities to derive solution methods.
In what follows, let K denote a field and let K[x1 , . . . , xn ] = K[x] denote the
ring of polynomials in n variables with coefficients over K. The situation
is slightly different depending on whether only equations are being consid-
ered, or if there are also inequalities (more precisely, on whether the underlying
field K is formally real):
1. First, suppose that the system contains only the polynomial equa-
tions $f_1(x) = 0, f_2(x) = 0, \ldots, f_s(x) = 0$ where $f_1,\ldots,f_s \in K[x]$.
We explain how to generate a finite sequence of linear algebra sys-
tems over $K$ which terminate with either a solution over $\overline{K}$, the
algebraic closure of $K$, or provide a certificate of infeasibility. Cru-
cially for practical computation, the linear algebra systems are
over $K$, not $\overline{K}$. The calculations reduce to matrix manipulations
over $K$, mostly rank computations. The techniques we use are a
specialization of prior techniques from computational algebra (see
[37, 20, 21, 38]). As it turns out, this technique is particularly ef-
fective when the number of solutions is finite, when K is a finite
field, and when the system has nice combinatorial information (see
[9]).
2. Second, several authors (see e.g. [23, 41, 29] and references therein)
have considered the solvability (over the reals) of systems of poly-
nomial equations and inequalities. It was shown that in this situ-
ation there is a way to set up the feasibility problem

$$\exists\, x \in \mathbb{R}^n \text{ s.t. } f_1(x) = 0, \ldots, f_s(x) = 0,\ g_1(x) \ge 0, \ldots, g_k(x) \ge 0,$$

where $f_1,\ldots,f_s, g_1,\ldots,g_k \in \mathbb{R}[x]$, as a sequence of semidefinite
programs terminating with a feasible solution (see [41, 29]). Once
more, the combinatorial structure can help in the understanding
of the structure of these relaxations, as is well-known from the
case of stable sets [32] and max-cut [28]. In recent work, Gouveia
et al. [15, 14] considered a sequence of semidefinite relaxations of
the convex hull of real solutions of a polynomial system encoding
a combinatorial problem. They called these approximations theta
bodies because, for stable sets of graphs, the first theta body in
this hierarchy is exactly Lovász’s theta body of a graph [32].
The common central idea to both of the relaxation procedures de-
scribed above is to use the right infeasibility certificates or theorems of
the alternative. Just as Farkas' lemma is a centerpiece for the development of
Linear Programming, here the key point is that the infeasibility of poly-
nomial systems can always be certified by particular algebraic identities
(on non-linear polynomials). To find these infeasibility certificates we rely
either on linear algebra or semidefinite programming (for a quick overview
of semidefinite programming see [51]).
We now introduce some necessary notation and algebraic concepts.
For a detailed introduction we recommend the books [2, 5, 6, 36]. In the
paper $K$ denotes a field and when the distinction is necessary we denote its
algebraic closure by $\overline{K}$. Let $K[x_1,\ldots,x_n]$ denote the ring of polynomials in
$n$ variables with coefficients over $K$, which will be abbreviated as $K[x]$. We
denote the monomials in the polynomial ring $K[x]$ as $x^\alpha := x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$
for $\alpha \in \mathbb{N}^n$. The degree of $x^\alpha$ is $\deg(x^\alpha) := |\alpha| := \sum_{i=1}^n \alpha_i$. The degree
of a polynomial $f = \sum_{\alpha \in \mathbb{N}^n} f_\alpha x^\alpha$, written $\deg(f)$, is the maximum degree
of $x^\alpha$ where $f_\alpha \neq 0$ for $\alpha \in \mathbb{N}^n$. Given a set of polynomials $F \subset K[x]$, we
write $\deg(F)$ for the maximum degree of the polynomials in $F$. Given a
set of polynomials $F := \{f_1,\ldots,f_m\} \subseteq K[x]$, we define the ideal generated
by $F$ as
$$\mathrm{ideal}(F) := \Big\{ \sum_{i=1}^m \beta_i f_i \ \Big|\ \beta_i \in K[x] \Big\}.$$


Studying the solutions of a system over a non-algebraically closed field
like $\mathbb{R}$ requires extra structure. Given a set of real polynomials $G :=
\{g_1,\ldots,g_m\} \subseteq \mathbb{R}[x]$, following page 86 in Section 4.2 of [2], we define the
cone generated by $G$ as
$$\mathrm{cone}(G) := \Big\{ \sum_{\alpha \in \{0,1\}^m} s_\alpha g^\alpha \ \Big|\ s_\alpha \in \mathbb{R}[x] \text{ is SOS} \Big\}$$
where $g^\alpha := \prod_{i=1}^m g_i^{\alpha_i}$ and a polynomial $s(x) \in \mathbb{R}[x]$ is SOS if it can be
written as a sum of squares of other polynomials, that is, $s(x) = \sum_i q_i^2(x)$
for some $q_i(x) \in \mathbb{R}[x]$. We note that the cone of $G$ is also called a preordering
generated by $G$ in [36]. If $s(x)$ is SOS, then clearly $s(x) \ge 0$ for all $x \in \mathbb{R}^n$.
The sum in the definition of $\mathrm{cone}(G)$ is finite, with a total of $2^m$ terms,
corresponding to the subsets of $\{g_1,\ldots,g_m\}$.
The notions of ideal and cone are standard in algebraic geometry,
but they also have inherent convex geometry: Ideals are affine sets and
cones are closed under convex combinations and non-negative scalings, i.e.,
they are actually cones in the convex geometry sense. Ideals and cones
are used for deriving new valid constraints, which are logical consequences
of the given constraints. For example, notice that by construction, every
polynomial in ideal({f1 , . . . , fm }) vanishes in the solution set of the system
f1 (x) = 0, . . . , fm (x) = 0 over the algebraic closure of K. Similarly, every
element of cone({g1 , ..., gm }) is clearly non-negative on the feasible set of
g1 (x) ≥ 0, . . . , gm (x) ≥ 0 over R.
It is well-known that optimization algorithms are intimately tied to the
development of infeasibility certificates. For example, the simplex method
is closely related to Farkas’ lemma. Our starting point is a generalization of
this famous principle. We start with a description of two powerful infeasi-
bility certificates for polynomial systems which generalize the classical ones
for linear optimization. First, as motivation, recall from elementary linear
algebra the “Fredholm alternative theorem” (e.g., see page 28 Corollary
3.1.b in [46]):
Theorem 1.1 (Fredholm’s alternative). Given a matrix A ∈ Km×n
and a vector b ∈ Km ,

 x ∈ Kn s.t. Ax + b = 0 ⇔ ∃ μ ∈ Km s.t. μT A = 0, μT b = 1.

It turns out that there are much stronger versions for general polynomials,
which unfortunately do not seem to be widely known among optimizers
(for more details see e.g., [5]).
Theorem 1.2 (Hilbert’s Nullstellensatz). Let F := {f1 , . . . , fm } ⊆
K[x]. Then,
n
 x ∈ K s.t. f1 (x) = 0, ..., fs (x) = 0 ⇔ 1 ∈ ideal(F ).


Note that $1 \in \mathrm{ideal}(F)$ means that there exist polynomials $\beta_1,\ldots,\beta_m \in
K[x]$ such that $1 = \sum_{i=1}^m \beta_i f_i$, and this polynomial identity is thus a cer-
tificate of infeasibility. Fredholm's alternative theorem is simply a linear
version of Hilbert's Nullstellensatz where all the polynomials are linear and
the $\beta_i$'s are constant.
Example 1. Consider the following set of polynomials in R[x1 , x2 , x3 ]:

$$F := \{f_1 := x_1^2 - 1,\ f_2 := 2x_1x_2 + x_3,\ f_3 := x_1 + x_2,\ f_4 := x_1 + x_3\}.$$

By the Nullstellensatz, the system $f_1(x) = 0$, $f_2(x) = 0$, $f_3(x) = 0$, $f_4(x) =
0$ is infeasible over $\mathbb{C}$ if and only if there exist polynomials $\beta_1, \beta_2, \beta_3, \beta_4 \in
\mathbb{R}[x_1,x_2,x_3]$ that satisfy the polynomial identity $\beta_1 f_1 + \beta_2 f_2 + \beta_3 f_3 + \beta_4 f_4 =
1$. Here, the system is infeasible, so there exist such polynomials as follows:
$$\beta_1 = -1 - \tfrac{2}{3}x_2,\quad \beta_2 = -\tfrac{2}{3} + \tfrac{1}{3}x_1,\quad \beta_3 = -\tfrac{2}{3} + \tfrac{4}{3}x_1,\quad \beta_4 = \tfrac{2}{3} - \tfrac{1}{3}x_1.$$
The resulting identity provides a certificate of infeasibility of the system.
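A quick mechanical check of this identity (our own illustration, assuming Python with SymPy) is to expand the left-hand side and verify it equals the constant polynomial 1:

    # Verify the Nullstellensatz certificate of Example 1 with SymPy.
    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    f1, f2, f3, f4 = x1**2 - 1, 2*x1*x2 + x3, x1 + x2, x1 + x3
    b1 = -1 - sp.Rational(2, 3)*x2
    b2 = sp.Rational(-2, 3) + sp.Rational(1, 3)*x1
    b3 = sp.Rational(-2, 3) + sp.Rational(4, 3)*x1
    b4 = sp.Rational(2, 3) - sp.Rational(1, 3)*x1

    print(sp.expand(b1*f1 + b2*f2 + b3*f3 + b4*f4))   # -> 1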
Now, the two theorems above deal only with the case of equations.
The inclusion of inequalities in the problem formulation poses additional
algebraic challenges because we need to take into account special properties
of the reals. Consider first the case of linear inequalities, which is familiar
to optimizers, where linear programming duality provides the following
characterization:
Theorem 1.3 (Farkas’ lemma). Let A ∈ Rm×n , b ∈ Rm , C ∈ Rk×n ,
and d ∈ Rk .

 x ∈ Rn s.t. Ax + b = 0, Cx + d ≥ 0
+
∃λ ∈ Rm
+, ∃μ ∈ R k
s.t. μ A + λT C = 0, μT b + λT d = −1.
T

Again, although not widely known in optimization, it turns out that similar
certificates do exist for non-linear systems of polynomial equations and
inequalities over the reals. The result essentially appears in this form in [2]
and is due to Stengle [49].
Theorem 1.4 (Positivstellensatz). Let $F := \{f_1,\ldots,f_m\} \subset \mathbb{R}[x]$ and
$G := \{g_1,\ldots,g_k\} \subset \mathbb{R}[x]$.
$$\nexists\, x \in \mathbb{R}^n \text{ s.t. } f_1(x) = 0, \ldots, f_m(x) = 0,\ g_1(x) \ge 0, \ldots, g_k(x) \ge 0$$
$$\Updownarrow$$
$$\exists\, f \in \mathrm{ideal}(F),\ \exists\, g \in \mathrm{cone}(G) \text{ s.t. } f(x) + g(x) = -1.$$

The theorem states that for every infeasible system of polynomial equa-
tions and inequalities, there exists a simple polynomial identity of the form
$\sum_{i=1}^m \beta_i f_i + \sum_{\alpha \in \{0,1\}^k} s_\alpha g^\alpha = -1$ for some $\beta_i, s_\alpha \in \mathbb{R}[x]$ where the $s_\alpha$ are SOS,
that directly gives a certificate of infeasibility of real solutions.


Example 2. Consider the polynomial system {f = 0, g ≥ 0}, where

$$f := x_2 + x_1^2 + 2 = 0, \qquad g := x_1 - x_2^2 + 3 \ge 0.$$

By the Positivstellensatz, there are no solutions $(x_1, x_2) \in \mathbb{R}^2$ if and only
if there exist polynomials $\beta, s_1, s_2 \in \mathbb{R}[x_1, x_2]$ that satisfy
$$\beta \cdot f + s_1 + s_2 \cdot g = -1 \quad \text{where } s_1 \text{ and } s_2 \text{ are SOS}.$$

Here, the system is infeasible, so there exist such polynomials as follows:
$$s_1 = \tfrac{1}{3} + 2\left(x_2 + \tfrac{3}{2}\right)^2 + 6\left(x_1 - \tfrac{1}{6}\right)^2, \qquad s_2 = 2 \quad \text{and} \quad \beta = -6.$$

The resulting identity provides a certificate of infeasibility of the system.
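Again, the identity can be checked mechanically (our own illustration, assuming Python with SymPy):

    # Verify that beta*f + s1 + s2*g expands to -1 for Example 2.
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = x2 + x1**2 + 2
    g = x1 - x2**2 + 3
    s1 = sp.Rational(1, 3) + 2*(x2 + sp.Rational(3, 2))**2 \
         + 6*(x1 - sp.Rational(1, 6))**2
    s2 = sp.Integer(2)
    beta = -6

    print(sp.expand(beta*f + s1 + s2*g))   # -> -1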


Of course, we are very concerned with the effective practical computa-
tion of the infeasibility certificates. For the sake of computation and com-
plexity, we must worry about the growth of degrees and thus the growth in
the encoding size of the infeasibility certificates. Here, we define the degree
of a Nullstellensatz certificate $\sum_{i=1}^m \beta_i f_i = 1$ as $\max_i\{\deg(\beta_i f_i)\}$ and the
degree of a Positivstellensatz certificate $\sum_{i=1}^m \beta_i f_i + \sum_{\alpha \in \{0,1\}^k} s_\alpha g^\alpha = -1$
as the larger of $\max_i\{\deg(\beta_i f_i)\}$ and $\max_\alpha\{\deg(s_\alpha g^\alpha)\}$. On the negative
side, the degrees of the certificates are expected to grow at least linearly,
leading to exponential growth in the encoding size of the certificates, simply
because of the NP-hardness of the original combinatorial questions; see e.g.
[9]. At the same time, tight exponential upper bounds on the degrees have
[9]. At the same time, tight exponential upper bounds on the degrees have
been derived (see e.g. [22], [16] and references therein). Nevertheless, for
many problems of practical interest, it is often the case that it is possible
to prove infeasibility using low-degree certificates (see [8, 7]). Even more
important is the fact that for a fixed degree of the certificates, the calcu-
lations are polynomial time (see Lemma 2.1 and [41]) and can be reduced
to either linear algebra or semidefinite programming. We summarize the
strong analogies between the case of linear equations and inequalities with
high-degree polynomial systems in the following table:
Table 1
Infeasibility certificates and their associated computational techniques.

Degree\Field    Arbitrary                         Real
Linear          Fredholm Alternative              Farkas' Lemma
                (Linear Algebra)                  (Linear Programming)
Polynomial      Nullstellensatz                   Positivstellensatz
                (Bounded degree Linear Algebra)   (Bounded degree SDP)

It is important to remark that just as in the classical case of linear
programming, the problem of computation of certificates has very natural
primal-dual formulations, with the corresponding primal and dual vari-
ables playing distinct, but well-defined roles. For example, in the case of

Fredholm’s alternative, the primal variables are the variables x1 , . . . , xn


while there is a dual variable for each equation. For Nullstellensatz and
Positivstellensatz there is a similar duality, based on linear duality and
semidefinite programming duality, respectively. In what follows, we use
the most intuitive or convenient set-up and we leave to the reader the
exercise of transferring the results to the corresponding dual version.
The remainder of the paper is divided in two main sections: Section 2
is a study of the Hilbert Nullstellensatz, for general fields, used in the
solution of systems of equations. In Section 3, we survey the use of the
Positivstellensatz in the context of solving systems of equations and in-
equalities over the reals. Both sections contain combinatorial applications
that show why these techniques can be of interest in this setting. The fo-
cus of the combinatorial results is understanding those situations when a
constant degree certificate is enough to show infeasibility. These are situa-
tions when hard combinatorial problems have polynomial time algorithms
and as such provide structural insight. Finally, in Section 4, we describe a
methodology, common to both approaches, to recover feasible solutions of
the original combinatorial problem from the outcome of these relaxations.
In addition, we have included an Appendix A that contains proofs of some
of the results used in the main body of the paper that are either hard to find
or whose original proof, available elsewhere, is not written in the language
of this survey.
To conclude the introduction we include some more notation and ter-
minology. The variety of $F$ over $K$, written $V_K(F)$, is the set of common
zeros of polynomials in $F$ in $K^n$, that is, $V_K(F) := \{v \in K^n : f(v) = 0\ \forall f \in F\}$.
Also, $V_{\overline{K}}(F)$, the variety of $F$ over $\overline{K}$, is the set of common zeros of $F$ in
$\overline{K}^n$. Note that in combinatorial problems, the variety of a polynomial sys-
tem typically has finitely many solutions (e.g., colorings, cuts, stable sets,
etc.). For an ideal $I \subseteq K[x]$, when $V_{\overline{K}}(I)$ is finite, the ideal is called zero-
dimensional (this is the case for all of the applications considered here). We
say that a system of polynomial equations is a combinatorial system when
its variety encodes a combinatorial problem (e.g., zeros represent stable
sets, colorings, matchings, etc.) and it is zero-dimensional.
An ideal $I \subseteq K[x]$ is radical if $f^k \in I$ for some positive integer $k$
implies $f \in I$. We denote by $\sqrt{I}$ the ideal of all polynomials $f \in K[x]$ such
that $f^k \in I$ for some positive integer $k$. The ideal $\sqrt{I}$ is necessarily radical
and it is called the radical ideal of $I$. Note that $I$ is radical if and only if
$I = \sqrt{I}$.
dimension of W . Given vector spaces U ⊆ W , we write W/U as the vector
space quotient. Recall that dim(W/U ) = dim(W ) − dim(U ). Given a set
F ⊂ K[x], span(F ) denotes the vector space generated by F over the field
K. Please note the distinction between the vector space span(F ) and the
ideal ideal(F ).


2. Solving combinatorial systems of equations. In this section,
we wish to solve a given system of polynomial equations $f_1(x) = 0, f_2(x) =
0, \ldots, f_m(x) = 0$ where $f_1,\ldots,f_m \in K[x]$. The systems we consider have
finitely many solutions, each corresponding to a combinatorial object. To
simplify our arguments we also assume that $K$ is algebraically closed, i.e.,
$K = \overline{K}$. We abbreviate this system as $F(x) = 0$ where $F := \{f_1,\ldots,f_m\} \subset
K[x]$. Here, by solving a system, we mean first determining if $F(x) = 0$
is feasible over K, and furthermore finding a solution (or all solutions) of
F (x) = 0 if feasible. The literature on polynomial solving is very extensive
and it continues to be an area of active research (see [50, 6, 10] for an
overview and background).
Here we choose to focus on techniques that fit well with traditional op-
timization methods. The main idea is that solving a polynomial system of
equations can be reduced to solving a sequence of linear algebra problems.
The foundations of this technique can be traced back to ([37, 20, 21, 38]).
The specific approach we take to present this technique is closest to that
of Mourrain in [37]. Variants of this technique have been applied to stable
sets [9, 35], vertex coloring [8, 35], satisfiability (see e.g., [3]) and cryp-
tography (see for example [4]). This technique is also strongly related to
Border basis and Gröbner basis techniques, which can also be viewed in
terms of linear algebra computations (see e.g., [20, 21, 38, 50]).
The linear algebra systems of equations have primal and dual represen-
tations in the sense of Fredholm’s lemma. Specifically, in this survey, the
primal approach solves a linear system to find constant multipliers $\mu \in K^m$
such that $1 = \sum_{i=1}^m \mu_i f_i$, providing a certificate of (non-linear) infeasibility.
Then, the dual approach aims to find a vector $\lambda$ with entries in $K$ indexed
by monomials such that $\sum_\alpha \lambda_{x^\alpha} f_{i,\alpha} = 0$ for all $i = 1,\ldots,m$ and $\lambda_1 = 1$,
where $f_i = \sum_\alpha f_{i,\alpha} x^\alpha$ for all $i$. As we see in Section 2.2, the dual approach
amounts to constructing linear relaxations of the set of feasible solutions.
In Sections 2.1 and 2.2, we present the primal and dual approaches respec-
tively.

2.1. Linear algebra certificates. Consider the following corollary
of Hilbert's Nullstellensatz: If there exist constants $\mu \in K^m$ such that
$\sum_{i=1}^m \mu_i f_i = 1$, then the polynomial system $F(x) = 0$ must be infeasible.
In other words, if $1 \in \mathrm{span}(F)$, then the system $F(x) = 0$ is infeasible.
The crucial point here is that determining whether there exists a $\mu \in K^m$
such that $\sum_{i=1}^m \mu_i f_i = 1$ is a linear algebra problem over $K$. The equa-
tion $\sum_{i=1}^m \mu_i f_i = 1$ is called a certificate of infeasibility of the polynomial
system.
Example 3. Consider again the following set of polynomials from
Example 1:

$$F := \{f_1 := x_1^2 - 1,\ f_2 := 2x_1x_2 + x_3,\ f_3 := x_1 + x_2,\ f_4 := x_1 + x_3\}.$$


We can abbreviate the infeasible polynomial system of equations $f_1(x) = 0$,
$f_2(x) = 0$, $f_3(x) = 0$, $f_4(x) = 0$ as $F(x) = 0$. We can prove that the
system $F(x) = 0$ is infeasible if we can find $\mu \in \mathbb{R}^4$ satisfying the following:
\begin{align*}
&\mu_1 f_1 + \mu_2 f_2 + \mu_3 f_3 + \mu_4 f_4 = 1\\
\Leftrightarrow\ &\mu_1(x_1^2 - 1) + \mu_2(2x_1x_2 + x_3) + \mu_3(x_1 + x_2) + \mu_4(x_1 + x_3) = 1\\
\Leftrightarrow\ &\mu_1 x_1^2 + 2\mu_2 x_1x_2 + (\mu_2 + \mu_4)x_3 + \mu_3 x_2 + (\mu_3 + \mu_4)x_1 - \mu_1 = 1.
\end{align*}

Then, equating coefficients on the left and right hand sides of the equation
above gives the following linear system of equations:

$$-\mu_1 = 1\ (1),\quad \mu_3 + \mu_4 = 0\ (x_1),\quad \mu_3 = 0\ (x_2),$$
$$\mu_2 + \mu_4 = 0\ (x_3),\quad 2\mu_2 = 0\ (x_1x_2),\quad \mu_1 = 0\ (x_1^2).$$

We abbreviate this system as $\mu^T F = 1$. Even though $F(x) = 0$ is infeasible,
the linear system $\mu^T F = 1$ is infeasible, and so, we have not found a
certificate of infeasibility of $F(x) = 0$.
More formally, let $f_i = \sum_{\alpha \in \mathbb{N}^n} f_{i,\alpha} x^\alpha$ where only finitely many $f_{i,\alpha}$ are
non-zero for $i = 1,\ldots,m$. Then, $\sum_{i=1}^m \mu_i f_i = 1$ if and only if $\sum_{i=1}^m \mu_i f_{i,0} = 1$
and $\sum_{i=1}^m \mu_i f_{i,\alpha} = 0$ for all $\alpha \in \mathbb{N}^n$ where $\alpha \neq 0$. Note that there is one
linear equation per monomial appearing in $F$. We abbreviate this linear
system as $\mu^T F = 1$ where we consider $F$ as a matrix whose rows are the
coefficient vectors of its polynomials and we consider the constant polyno-
mial 1 as the vector of its coefficients (i.e., a unit vector). The columns of
$F$ are indexed by monomials with non-zero coefficients. We remark that in
the special case where $F(x) = 0$ is a linear system of equations, Fredholm's
alternative says that $F(x) = 0$ is infeasible if and only if $\mu^T F = 1$
is feasible.
Remark 2.1. Crucially for computation, when we solve the linear
system μT F = 1, we can do so over the smallest subfield of K containing
the coefficients of the polynomials in F , which is particularly useful if such
a subfield is a finite field.
In general, even if F (x) = 0 is infeasible, μT F = 1 may not be feasible
as in the above example. In order to prove infeasibility, we must add
polynomials from ideal(F ) to F and try again to find a μ such that μT F =
1. Hilbert’s Nullstellensatz guarantees that, if F (x) = 0 is infeasible, there
exists a finite set of polynomials from ideal(F ) that we can add to F so
that the linear system μT F = 1 is feasible.
More precisely, it is enough to add polynomials of the form $x^\alpha f$ for
$x^\alpha$ a monomial and some polynomial $f \in F$. Why is this? If $F(x) = 0$
is infeasible, then Hilbert's Nullstellensatz says $\sum_{i=1}^m \beta_i f_i = 1$ for some
$\beta_1,\ldots,\beta_m \in K[x]$. Let $d = \max_i\{\deg(\beta_i)\}$. If we add to $F$ all
polynomials of the form $x^\alpha f$ where $f \in F$ and $\deg(x^\alpha) \le d$, then the
$K$-linear span of $F$, that is $\mathrm{span}(F)$, contains $\beta_i f_i$ for all $i$, and thus,

1 ∈ span(F ) or equivalently μT F  = 1 is feasible (as a linear algebra


problem) where F  denotes the larger polynomial system.
Example 4. Consider again the polynomial system $F(x) = 0$ from Ex-
ample 3. Here, $\mu^T F = 1$ is infeasible, so we must add redundant polyno-
mial equations to the system $F(x) = 0$. In particular, we add the following
redundant polynomial equations: $x_2 f_1(x) = 0$, $x_1 f_2(x) = 0$, $x_1 f_3(x) = 0$,
and $x_1 f_4(x) = 0$. Let $F' := \{f_1, f_2, f_3, f_4, x_2 f_1, x_1 f_2, x_1 f_3, x_1 f_4\}$.
Then, the system $\mu^T F' = 1$ is now as follows:

$$-\mu_1 = 1\ (1),\quad \mu_3 + \mu_4 = 0\ (x_1),\quad \mu_3 - \mu_5 = 0\ (x_2),$$
$$\mu_2 + \mu_4 = 0\ (x_3),\quad 2\mu_2 + \mu_7 = 0\ (x_1x_2),\quad \mu_1 + \mu_7 + \mu_8 = 0\ (x_1^2),$$
$$\mu_6 + \mu_8 = 0\ (x_1x_3),\quad \mu_5 + 2\mu_6 = 0\ (x_1^2x_2).$$

This system is feasible, proving that $F(x) = 0$ is infeasible. The solution is
$\mu = (-1, -\tfrac{2}{3}, -\tfrac{2}{3}, \tfrac{2}{3}, -\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{4}{3}, -\tfrac{1}{3})$, which gives the following certificate of
infeasibility as given in Example 1:
$$-f_1 - \tfrac{2}{3}f_2 - \tfrac{2}{3}f_3 + \tfrac{2}{3}f_4 - \tfrac{2}{3}x_2f_1 + \tfrac{1}{3}x_1f_2 + \tfrac{4}{3}x_1f_3 - \tfrac{1}{3}x_1f_4 = 1.$$
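The rank test behind this computation can be carried out numerically; the following sketch (ours, assuming Python with numpy; the monomial ordering is our bookkeeping choice) checks $1 \in \mathrm{span}(F')$ for Example 4:

    # Rank test: is the constant polynomial 1 in the row span of F'?
    # Columns: 1, x1, x2, x3, x1x2, x1x3, x1^2, x1^2*x2.
    import numpy as np

    F = np.array([
        [-1, 0, 0, 0, 0, 0, 1, 0],   # f1 = x1^2 - 1
        [ 0, 0, 0, 1, 2, 0, 0, 0],   # f2 = 2x1x2 + x3
        [ 0, 1, 1, 0, 0, 0, 0, 0],   # f3 = x1 + x2
        [ 0, 1, 0, 1, 0, 0, 0, 0],   # f4 = x1 + x3
        [ 0, 0,-1, 0, 0, 0, 0, 1],   # x2*f1
        [ 0, 0, 0, 0, 0, 1, 0, 2],   # x1*f2
        [ 0, 0, 0, 0, 1, 0, 1, 0],   # x1*f3
        [ 0, 0, 0, 0, 0, 1, 1, 0],   # x1*f4
    ])
    one = np.zeros(8); one[0] = 1    # the constant polynomial 1

    # 1 is in span(F') iff appending it does not increase the rank.
    r = np.linalg.matrix_rank(F)
    print(r == np.linalg.matrix_rank(np.vstack([F, one])))  # True -> infeasible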
Next, we present the dual approach to the one in this section.
2.2. Linear algebra relaxations. In optimization, it is quite com-
mon to “linearize” non-linear polynomial systems of equations by replacing
all monomials in the system with new variables giving a system of linear
constraints. Specifically, we can construct a linear algebra relaxation of
the solutions of F (x) = 0 by replacing every monomial xα in a polynomial
equation in F (x) = 0 with a new variable λxα thereby giving a system
of linear equations in the new λ variables, one variable for each mono-
mial appearing in F . Readers familiar with relaxation procedures such as
Sherali-Adams and Lovász-Schrijver (see [27] and references therein) will
see a lot of similarities, but here we deal only with equality constraints.
Example 5. Consider the following feasible system in C[x1 , x2 , x3 ]:

$$f_1(x) = x_1^2 - 1 = 0,\quad f_2(x) = 2x_1x_2 + x_3 = 0,\quad f_3(x) = x_1 + x_2 = 0.$$

This system has two solutions $(x_1, x_2, x_3) = (1, -1, 2)$ and $(x_1, x_2, x_3) =
(-1, 1, 2)$. Let $F = \{f_1, f_2, f_3\}$. So, we abbreviate the above system as
$F(x) = 0$. We can replace the monomials $1, x_1, x_2, x_3, x_1^2, x_1x_2$ with the
variables $\lambda_1, \lambda_{x_1}, \lambda_{x_2}, \lambda_{x_3}, \lambda_{x_1^2}, \lambda_{x_1x_2}$ respectively. The system $F(x) = 0$
thus gives rise to the following set of linear equations:

$$\lambda_{x_1^2} - \lambda_1 = 0,\quad 2\lambda_{x_1x_2} + \lambda_{x_3} = 0,\quad \lambda_{x_1} + \lambda_{x_2} = 0. \tag{2.1}$$

We abbreviate the above system as F ∗ λ = 0.


Solutions of $F(x) = 0$ give solutions of $F * \lambda = 0$: If $x$ is a solution of
$F(x) = 0$ above, then setting $\lambda_1 = 1$, $\lambda_{x_1} = x_1$, $\lambda_{x_2} = x_2$, $\lambda_{x_3} = x_3$,
$\lambda_{x_1^2} = x_1^2$, $\lambda_{x_1x_2} = x_1x_2$ gives a solution of $F * \lambda = 0$. So, taking
$x = (1, -1, 2)$, we set $\lambda_1 = 1$, $\lambda_{x_1} = 1$, $\lambda_{x_2} = -1$, $\lambda_{x_3} = 2$, $\lambda_{x_1^2} = 1$,
and $\lambda_{x_1x_2} = -1$. Then, we have $F * \lambda = 0$. Thus, the solutions of $F * \lambda = 0$
give a vector space effectively containing all of the solutions of $F(x) = 0$.
Hence, $F * \lambda = 0$ gives a linear relaxation of $F(x) = 0$.
There are solutions of $F * \lambda = 0$ that do not correspond to solutions
of $F(x) = 0$ because the linear system $F * \lambda = 0$ does not take into account
the non-linear constraints that $\lambda_1 = 1$, $\lambda_{x_1^2} = \lambda_{x_1}^2$ and $\lambda_{x_1x_2} = \lambda_{x_1}\lambda_{x_2}$;
for example, $\lambda_1 = 1$, $\lambda_{x_1} = 2$, $\lambda_{x_2} = -2$, $\lambda_{x_3} = -2$, $\lambda_{x_1^2} = 1$ and
$\lambda_{x_1x_2} = 1$ is a solution of $F * \lambda = 0$, but $x_1 = \lambda_{x_1} = 2$, $x_2 = \lambda_{x_2} = -2$,
and $x_3 = \lambda_{x_3} = -2$ is not a solution of $F(x) = 0$.
We now formalize the above example construction of a linear system.
We can consider the polynomial ring $K[x]$ as an infinite dimensional vector
space over $K$ where the set of all monomials $x^\alpha$ forms a vector space basis
of $K[x]$. In other words, a polynomial $f = \sum_{\alpha \in \mathbb{N}^n} f_\alpha x^\alpha$ can be represented
as an infinite sequence $(f_\alpha)_{\alpha \in \mathbb{N}^n}$ where only finitely many $f_\alpha$ are non-
zero. We define $K[[x_1,\ldots,x_n]] = K[[x]]$ as the ring of formal power series
in the variables $x_1,\ldots,x_n$ with coefficients in $K$. So, the power series
$\lambda = \sum_{\alpha \in \mathbb{N}^n} \lambda_\alpha x^\alpha$ can be represented as an infinite sequence $(\lambda_\alpha)_{\alpha \in \mathbb{N}^n}$.
Note that we do not require that only finitely many $\lambda_\alpha$ are non-zero. We
define the bilinear form $* : K[x] \times K[[x]] \to K$ as follows: given $f =
\sum_{\alpha \in \mathbb{N}^n} f_\alpha x^\alpha \in K[x]$ and $\lambda = \sum_{\alpha \in \mathbb{N}^n} \lambda_\alpha x^\alpha \in K[[x]]$, we define $f * \lambda =
\sum_{\alpha \in \mathbb{N}^n} f_\alpha \lambda_\alpha$, which is always finite since only finitely many $f_\alpha$ are non-
zero. Thus, we define a linear relaxation of $\{x \in K^n : F(x) = 0\}$, written
as $\{\lambda \in K[[x]] : F * \lambda = 0\}$, as the set of linear equations $f * \lambda = 0$ for
all $f \in F$. We denote the set of solutions of the linear system $F * \lambda = 0$
as $F^\circ := \{\lambda \in K[[x]] : F * \lambda = 0\}$, called the annihilator of $F$, which is a
vector subspace of $K[[x]]$. See Appendix A for further details.
Note that, for any polynomial $f \in K[x]$ and any point $v \in K^n$, we have
$f(v) = f * \lambda(v)$ where $\lambda(v) = (v^\alpha)_{\alpha \in \mathbb{N}^n}$. Thus, for any $v \in K^n$, $F(v) = 0$
if and only if $F * \lambda(v) = 0$. So, the system $F * \lambda = 0$ can be considered
as a linear relaxation of the system F (x) = 0. As mentioned in the above
example, there are solutions of F ∗λ = 0 that do not correspond to solutions
of F (x) = 0 because the linear system F ∗ λ = 0 does not take into account
the relationships between the λ variables. Specifically, if λ corresponded to
a solution of $F(x) = 0$, then we must have $\lambda_{x^\alpha} = \lambda_{x^\beta}\lambda_{x^\gamma}$ for all monomials
$x^\alpha, x^\beta, x^\gamma$ where $x^\alpha = x^\beta x^\gamma$. If we added these non-linear constraints to
the linear constraints F ∗ λ = 0, then we would essentially have the original
polynomial system F (x) = 0.
The system F ∗ λ = 0 is always feasible, but the constraint λ1 = 1 also
holds for any λ that corresponds to a solution x of F (x) = 0. Thus, if the
inhomogeneous linear system {F ∗ λ = 0, λ1 = 1} is infeasible, then so is
the system of polynomials F (x) = 0.


Remark 2.2. Crucially for computation again, when we solve the
linear system $\{F * \lambda = 0, \lambda_1 = 1\}$, we can do so over the smallest subfield
of $K$ containing the coefficients of the polynomials in $F$.
Remark 2.3. Importantly, the linear system {F ∗ λ = 0, λ1 = 1} is
dual to the linear system μT F = 1 from the previous section by Fredholm’s
alternative meaning that {F ∗ λ = 0, λ1 = 1} is infeasible if and only if
μT F = 1 is feasible.
There is a fundamental observation we wish to make here: adding
redundant polynomial equations can lead to a tighter relaxation.
Example 6. (Cont.) Add $x_1 f_3(x) = x_1^2 + x_1x_2 = 0$ to the system
$F(x) = 0$ giving the system $F'(x) = 0$ where $F' := \{f_1, f_2, f_3, x_1f_3\}$. The
system $F'(x) = 0$ has the same solutions as $F(x) = 0$. The polynomial
equation $x_1 f_3(x) = 0$ gives rise to a new linear equation $\lambda_{x_1^2} + \lambda_{x_1x_2} = 0$
giving the following linear system $F' * \lambda = 0$:

$$\lambda_{x_1^2} - \lambda_1 = 0,\quad 2\lambda_{x_1x_2} + \lambda_{x_3} = 0,\quad \lambda_{x_1} + \lambda_{x_2} = 0,\quad \lambda_{x_1^2} + \lambda_{x_1x_2} = 0. \tag{2.2}$$

The dimension of the solution space of the original system $F * \lambda = 0$ is three
if we ignore all $\lambda$ variables that do not appear in the linear system, or in
other words, if we project the solution space onto the $\lambda$ variables appearing
in the system. However, the dimension of the projected solution space of
$F' * \lambda = 0$ is two; so, $F' * \lambda = 0$ is a tighter relaxation of $F(x) = 0$.
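These two dimensions can be checked with a few lines of numerical linear algebra; the following sketch (ours, assuming Python with numpy; the variable ordering is our choice) compares the nullspace dimensions of (2.1) and (2.2):

    # Example 6: relaxation F*lambda = 0 vs F'*lambda = 0, with lambda
    # variables ordered (1, x1, x2, x3, x1^2, x1x2).
    import numpy as np

    F = np.array([
        [-1, 0, 0, 0, 1, 0],   # lambda_{x1^2} - lambda_1 = 0
        [ 0, 0, 0, 1, 0, 2],   # 2 lambda_{x1x2} + lambda_{x3} = 0
        [ 0, 1, 1, 0, 0, 0],   # lambda_{x1} + lambda_{x2} = 0
    ])
    row_x1f3 = np.array([[0, 0, 0, 0, 1, 1]])  # lambda_{x1^2} + lambda_{x1x2} = 0

    dim1 = F.shape[1] - np.linalg.matrix_rank(F)                         # 3
    dim2 = F.shape[1] - np.linalg.matrix_rank(np.vstack([F, row_x1f3]))  # 2
    print(dim1, dim2)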
Extending this idea, consider the ideal I = ideal(F ), which is the
set of all redundant polynomials given as a polynomial combination of
polynomials in F , then I ◦ becomes a finite dimensional vector space where
dim(I ◦ ) is precisely the number of solutions of F (x) = 0 over K, including
multiplicities, assuming that there are finitely many solutions. Note that
by linear algebra, I ◦ is isomorphic to the vector space quotient K[x]/I
(see e.g., [50]). Furthermore, if I is radical, then dim(I ◦ ) = dim(K[x]/I)
is precisely the number of solutions of F (x) = 0. So, there is a direct
relationship between the number of solutions of a polynomial system and
the dimension of the solution space of its linear relaxation (see e.g., [6] for
a proof).
Theorem 2.1. Let $I \subseteq K[x]$ be a zero-dimensional ideal. Then,
$\dim(I^\circ)$ is finite and $\dim(I^\circ)$ is the number of solutions of the polynomial sys-
tem $I(x) = 0$ over $K$ including multiplicities, so $|V_K(I)| \le \dim(I^\circ)$ with
equality when $I$ is radical.
So, if we can compute dim(I ◦ ), then we can determine the feasibility
of I(x) = 0 over K. Unfortunately, we cannot compute dim(I ◦ ) directly.
Instead, under some conditions (see Theorem 2.2), we can compute dim(I ◦ )
by computing the dimension of $F^\circ$ when projected onto the $\lambda_{x^\alpha}$ variables
where $\deg(x^\alpha) \le \deg(F)$.
2.3. Nullstellensatz Linear Algebra Algorithm (NulLA). We
now present an algorithm for determining whether a polynomial system of
equations is infeasible using linear relaxations. Let F ⊆ K[x] and again let


F (x) = 0 be the polynomial system f (x) = 0 for all f ∈ F . We wish to


determine whether F (x) = 0 has a solution over K.
The idea behind NulLA [8] is straightforward: we check whether the
linear system {F ∗ λ = 0, λ1 = 1} is infeasible or equivalently whether
μT F = 1 is feasible (i.e., 1 ∈ span(F )) using linear algebra over K and if
not then we add polynomials from ideal(F ) to F and try again. We add
polynomials in the following systematic way: for each polynomial f ∈ F
and for each variable xi , we add xi f to F . So, the NulLA algorithm is as
follows: if {F ∗ λ = 0, λ1 = 1} is infeasible, then F (x) = 0 is infeasible and
stop, otherwise for every variable xi and every f ∈ F add xi f to F and
repeat.
In the following, we assume without loss of generality that F is closed
under K-linear combinations, that is F = span(F ), and thus, F is a vector
space over K. Note that taking the closure of F under K-linear combina-
tions does not change the set of solutions of F (x) = 0 and does not change
the set of solutions of F ∗ λ = 0. In practice, we must choose a vector
space basis of F for computation, but the point we wish to make is that
the choice of basis is irrelevant. Moreover, we find that it is more natural
to work with vector spaces and that it leads to a more concise exposition.
Recall from above that {F ∗ λ = 0, λ1 = 1} is infeasible if and only if
1 ∈ span(F ), which when F is a vector space, simplifies to 1 ∈ F since
span(F ) = F .
For a vector space $F \subset K[x]$, we define $F^+ := F + \sum_{i=1}^n x_i F$ where
$x_i F := \{x_i f : f \in F\}$. Note that $F^+$ is also a vector subspace of $K[x]$.
Then, $F^+$ is precisely the linear span of $F$ and $x_i F$ for all $i = 1,\ldots,n$. So,
the NulLA algorithm for vector spaces is as follows (see Algorithm 1): if
1 ∈ F , then F (x) = 0 is infeasible and stop, otherwise set F ← F + and
repeat. There is an upper bound on the number of times we need to repeat
the above step given by the Nullstellensatz bound of the system F (x) = 0
(see [22]): if F (x) = 0 has a Nullstellensatz bound D, then if F (x) = 0
is
infeasible, there must exist a Nullstellensatz certificate of infeasibility
$\sum_i \beta_i f_i = 1$ where $\deg(\beta_i) \le D$; that is, the degree of the certificate is
at most deg(F ) + D. After d iterations of NulLA, the set F contains all
linear combinations of polynomials of the form xα f where |α| ≤ d and
where f was one of the initial polynomials in F , and so, if the system is
infeasible, then NulLA will find a certificate of infeasibility in at most the
Nullstellensatz bound number of iterations.
While theoretically the Nullstellensatz bound limits the number of
iterations, this bound is in general too large to be practically useful (see
[8]). Hence, in practice, NulLA is most useful for proving infeasibility (see
Section 2.4).
Next, we discuss improving NulLA by adding redundant polynomials
to F in such a way so that deg(F ) does not grow unnecessarily. We call
this improved algorithm the Fixed-Point Nullstellensatz Linear Algebra
(FPNulLA) algorithm. Some variations of FPNulLA appeared, e.g., in

www.it-ebooks.info
460 J.A. DE LOERA, P.N. MALKIN, AND P.A. PARRILO

Algorithm 1 NulLA Algorithm [8]


Input: A finite dimensional vector space F ⊆ K[x] and a Nullstellensatz
bound D.
Output: Feasible, if F (x) = 0 is feasible over the algebraic closure of K, else Infeasible.
1: for k = 0, 1, 2, . . . , D do
2: If 1 ∈ F , then return Infeasible.
3: F ← F +.
4: end for
5: Return Feasible.

[37, 44, 25]. The basic idea behind the FPNulLA algorithm is that, if
1 ∈ F , then instead of replacing F with F + and thereby increasing deg(F ),
we check to see whether there are any new polynomials in F + with degree
at most deg(F ) that were not in F and add them to F , and then check
again whether 1 ∈ F . More formally, if 1 ∈ F , then we replace F with
F + ∩ K[x]d where K[x]d is the set of all polynomials with degree at most
d = deg(F ). We keep replacing F with F + ∩ K[x]d until either 1 ∈ F or
we reach a fixed point, F = F + ∩ K[x]d . This process must terminate.
Note that if we find that 1 ∈ F at some stage of FPNulLA, this implies
that there exists an infeasibility certificate of the form 1 = β1 f1 + · · · + βs fs where
β1 , ..., βs ∈ K[x] and the polynomials f1 , ..., fs ∈ K[x] are a vector space
basis of the original set F .
Moreover, we can also improve NulLA by proving that the system
F (x) = 0 is feasible well before reaching the Nullstellensatz bound as fol-
lows. When 1 ∈ F and F = F + ∩ K[x]d , then we could set F ← F + and
d ← d + 1 and repeat the above process. However, when we reach the
fixed point F = F + ∩ K[x]d , we can use the following theorem to determine
if the system is feasible and if so how many solutions it has. First, we
introduce some notation. Let πd : K[[x]] → K[[x]]d be the truncation or
projection of a power series onto a polynomial of degree at most d with coef-
ficients in K. Below, we abbreviate dim(πd (F ◦ )) as dimd (F ◦ ) and similarly
dim(πd−1 (F ◦ )) as dimd−1 (F ◦ ).
Theorem 2.2. Let F ⊂ K[x] be a finite dimensional vector space and
let d = deg(F ). If F = F + ∩ K[x]d and dimd (F ◦ ) = dimd−1 (F ◦ ), then
dim(I ◦ ) = dimd (F ◦ ) where I = ideal(F ).
See the Appendix for a proof of Theorem 2.2 or see original proof in
[37]. There are many equivalent forms of the above theorem that appear
in the literature (see e.g., [37, 44, 25]).
Recall from Theorem 2.1, that there are dim(I ◦ ) solutions of F (x) = 0
over the algebraic closure of K including multiplicities where I = ideal(F ) and exactly dim(I ◦ ) so-
lutions when I is radical. Checking the fixed point condition in FPNulLA
whether F = F + ∩ K[x]d is equivalent to checking whether dim(F ) =
dim(F + ∩ K[x]d ). Furthermore, to check the condition that dimd (F ◦ ) =
dimd−1 (F ◦ ), we need to compute dim(F + ∩ K[x]d ) and dim(F ∩ K[x]d−1 )


since dim(K[x]d /F ) = dimd (F ◦ ) and also dim(K[x]d−1 /(F ∩ K[x]d−1)) =


dimd−1 (F ◦ ) (see Lemma A.1). So, in order to check the condition in FP-
NulLA, we need to compute dim(F ), dim(F + ∩K[x]d ) and dim(F ∩K[x]d−1 ),
which amounts to matrix rank calculations over the field of coefficients of
a given basis of F .
We can now present the FPNulLA algorithm. See the Appendix or
[37, 7] for details. The FPNulLA algorithm always terminates for zero-
dimensional polynomials systems, which in particular includes combinato-
rial systems (see Lemma A.2).

Algorithm 2 FPNulLA Algorithm


Input: A vector space F ⊂ K[x].
Output: The number of solutions of F (x) = 0 over the algebraic closure of K, up to multiplicities.
1: Let d ← deg(F ).
2: loop
3: if 1 ∈ F then Return 0 (infeasible).
4: while F = F + ∩ K[x]d do
5: Set F ← F + ∩ K[x]d .
6: if 1 ∈ F then Return 0 (infeasible).
7: end while
8: if dimd (F ◦ ) = dimd−1 (F ◦ ) then Return dimd (F ◦ ) (feasible).
9: F ← F +.
10: d ← d + 1.
11: end loop

Example 7. Consider again the system below with polynomials in


K[x, y] with K = F2 . This system has two solutions.

1 + x + x2 = 0, 1 + y + y 2 = 0, x2 + xy + y 2 = 0.

Let F := span({1 + x + x2 , 1 + y + y 2 , x2 + xy + y 2 }). Then, 1 ∉ F and


deg(F ) = 2. Now,

F + = F + xF + yF
= F + span({x + x2 + x3 , x + xy + xy 2 , x3 + x2 y + xy 2 })
+ span({y + xy + x2 y, y + y 2 + y 3 , x2 y + xy 2 + y 3 }).

Then, F + ∩ K[x]2 = span({1 + x + x2 , 1 + y + y 2 , x2 + xy + y 2 , 1 + x + y}).


So, F ≠ F + ∩ K[x]2 . Next, let F := F + ∩ K[x]2 . One can check that now
F = F + ∩ K[x]2 . Moreover,

dim2 (F ◦ ) = dim(K[x]2 /F ) = dim(K[x]2 ) − dim(F ) = 2

and

dim1 (F ◦ ) = dim(K[x]1 /(F ∩ K[x]1 )) = dim(K[x]1 ) − dim(F ∩ K[x]1 ) = 2.


Therefore, dim2 (F ◦ ) = dim1 (F ◦ ) proving that F (x) = 0 is feasible with at


most 2 solutions.
We refer to the number of iterations (the for loop) that NulLA takes
to solve a given system of equations as the NulLA rank of the system. Note
that if an infeasible system F (x) = 0 has a NulLA rank of r, then it has a
Nullstellensatz certificate of infeasibility of degree r + deg(F ). Similarly to
the NulLA rank, we refer to the number of outer iterations (the outer loop)
that FPNulLA takes to solve the system as the FPNulLA rank of the system.
We can consider the NulLA rank and the FPNulLA rank as measures of
the “hardness” of proving infeasibility of the system. In section 2.4, we
present experimental evidence that the NulLA rank and even more so the
FPNulLA rank are “good” measures of the “hardness” of proving infeasibility
of a system (see also [3] for theoretical evidence for FPNulLA).
For a given class of polynomial systems of equations, it is interesting
to understand the growth of the NulLA rank or FPNulLA rank because of
the implications for the complexity of solving the given class of problems.
Furthermore, for some fixed rank, it is also interesting to characterize which
systems can be solved at that rank since this class of systems is polynomial-
time solvable by Lemma 2.1 below (see proof in Appendix and a proof
for NulLA in [35]). For example, in Section 2.5, we characterize systems
encoding 3-colorability with NulLA rank one.
Lemma 2.1. Let L ∈ N be fixed. Let F = span({f1 , f2 , . . . , fm }) ⊆
K[x] be a finite dimensional vector space of K[x]. Polynomials are assumed
to be encoded as vectors of coefficients indexed by all monomials of degree
at most deg(F ).
1. The first L iterations (the for loop) of the NulLA algorithm can be
computed in polynomial time in n and the input size of the defining
basis of F .
2. When K is a finite field, the first L iterations (the outer loop) of
the FPNulLA algorithm can be computed in polynomial time in n,
log2 (|K|) and the input size of the defining basis of F .

2.4. Experimental results. In this section, we summarize experi-


mental results for graph 3-coloring from [7], which illustrate the practical
performance of the NulLA and FPNulLA algorithms. For further and more
detailed results, see [8, 35, 7]. Experimentally, for graph 3-coloring, NulLA
and FPNulLA are well-suited to proving infeasibility, that is, that no 3-
coloring exists. The system of polynomials we use to encode 3-colorability has
coefficients in F2 (see Proposition 1.1) and thus the linear algebra opera-
tions are very fast. However, even though in theory NulLA and FPNulLA
can determine feasibility, for the experiments described below NulLA and
FPNulLA are only suitable for proving infeasibility.
Here, we are interested in the percentage of randomly generated graphs
whose polynomial system encoding has a NulLA rank of one, a NulLA
rank of two or a FPNulLA rank of one. The G(n, p) model [13] is used


for generating random graphs where n is the number of vertices and p is


the probability that an edge is included between any two vertices. Also,
without loss of generality, for a slightly smaller polynomial encoding, the
color of one of the vertices of each randomly generated graph was fixed.
The experimental results are presented in Figure 1 (taken from [7]),
which plots the percentage of 1000 random graphs in G(100, p) that were
proven infeasible with a NulLA rank of one, with a NulLA rank of two, with
a FPNulLA rank of one, or with an exact method versus the p value. The
exact method used was to model graph 3-coloring as a Boolean satisfiability
problem [12] and then use the program zchaff [52] to solve the satisfiability
problem.

[Figure: percentage of 1000 random graphs in G(100, p) proven infeasible versus the edge probability p, with curves for Exact, NulLA1, NulLA2, and FPNulLA1.]

Fig. 1. Non-3-colorable graphs with NulLA rank 1 and 2 and FPNulLA rank 1.

It is well-known that there is a distinct phase transition from feasibil-


ity to infeasibility for graph 3-coloring, and it is at this phase transition
that graphs exist for which it is difficult on average to prove infeasibility or
feasibility (see [19]). Observe that the infeasibility curve for NulLA resem-
bles the exact infeasibility curve, and that the infeasibility curve for
FPNulLA also resembles the exact curve and clearly dominates the
infeasibility curve for NulLA. These results suggest that the NulLA rank
or FPNulLA rank are a reasonable measure of the hardness of proving in-
feasibility since those graphs that require a high rank are located near the
phase transition.
Lastly, we comment on the running times of NulLA and FPNulLA and
the exact approach using zchaff for the experiments on random graphs in


G(100, p) above. The NulLA rank one and FPNulLA rank one approaches
ran on average in less than a second for all p values. However, the exact
approach using zchaff ran in split second times for all p values, but pre-
liminary computational experiments indicate that the gap in running times
between the exact approach and the FPNulLA rank one approach closes
for larger graphs. The NulLA rank two approach ran on average in less
than a second for p ≤ 0.04 and p ≥ 0.08, but the average running times
peaked at about 24 seconds at p = 0.065. Interestingly, for each approach,
the average running time peaked at the transition from feasible to infeasi-
ble at the p value where about half of the graphs were proven infeasible by
the approach.
In order to better understand the practical implications of the NulLA
and FPNulLA approaches, there needs to be more detailed computational
studies performed to compare this approach with the exact method using
satisfiability and other exact approaches such as traditional integer pro-
gramming techniques. See [8] for some additional experimental data.
2.5. Application: The structure of non-3-colorable graphs. In
this section, we state a combinatorial characterization of those graphs
whose combinatorial system of equations encoding 3-colorability has a
NulLA rank of one, thus giving a class of polynomially solvable graphs by
Lemma 2.1, and also, we recall bounds for the NulLA rank (see [35]):
Theorem 2.3. The NulLA rank for a polynomial encoding over F2 of
the 3-colorability of a graph with n vertices with no 3-coloring is at least
one and at most 2n. Moreover, in the case of a non-3-colorable graph
containing an odd-wheel (e.g. a 4-clique) as a subgraph, the NulLA rank is
exactly one.
Now we look at those non-3-colorable graphs that have a NulLA rank of
one. Let A denote the set of all possible directed edges or arcs in the graph
G. We are interested in two types of substructures of the graph G: oriented
partial-3-cycles and oriented chordless 4-cycles (see Figure 2). An oriented
partial-3-cycle is a set of two arcs of a 3-cycle, that is, a set {(i, j), (j, k)}
also denoted (i, j, k) where (i, j), (j, k), (k, i) ∈ A. An oriented chordless
4-cycle is a set of four arcs {(i, j), (j, l), (l, k), (k, i)} also denoted (i, j, k, l)
where (i, j), (j, l), (l, k), (k, i) ∈ A and (j, k), (i, l) ∉ A.

Fig. 2. (i) oriented partial 3-cycle and (ii) an oriented chordless 4-cycle.

Now, we can state a sufficient condition for non-3-colorability [7]. This


sufficient condition is satisfied if and only if the combinatorial system en-
coding 3-coloring has a NulLA rank of one, which is proved in [7].


Theorem 2.4. The graph G is not 3-colorable if there exists a set C


of oriented partial 3-cycles and oriented chordless 4-cycles such that
1. |C(i,j) | + |C(j,i) | ≡ 0 (mod 2) for all (i, j) ∈ E, and
2. Σ(i,j)∈A, i<j |C(i,j) | ≡ 1 (mod 2),
where |C(i,j) | denotes the number of cycles in C (either 3-cycles or 4-cycles)
in which the arc (i, j) ∈ A appears.
Condition 1 in Theorem 2.4 means that every undirected edge of G
is covered by an even number of directed edges from cycles in C (ignoring
orientation). Condition 2 in Theorem 2.4 means that, given any orientation
of G, the total number of times the arcs in that orientation appear in the
cycles of C is odd. The particular orientation we use in Theorem 2.4 is the
orientation given by the set of arcs {(i, j) ∈ A : i < j}, but the particular
orientation we use for Condition 2 is irrelevant (see [7]).
Using Theorem 2.4, proving that graphs containing odd wheels (e.g.,
4-cliques) are not 3-colorable (see Theorem 2.3) is straightforward ([7]):
Example 8. Assume a graph G contains an odd wheel with vertices
labelled as in Figure 3 below. Consider the following set of oriented partial
3-cycles: C := {(i, 1, i + 1) : 2 ≤ i ≤ n − 1} ∪ {(n, 1, 2)}. The oriented
partial 3-cycles of C are shown in Figure 3.

[Figure: an odd wheel with hub vertex 1 and rim vertices 2, 3, . . . , n.]

Fig. 3. Odd wheel.

The set C satisfies Condition 1 of Theorem 2.4 since each edge is


covered by exactly zero or two cycles in C. Also, C satisfies Condition 2
of Theorem 2.4 since each arc (1, i) ∈ Arcs(G) is covered exactly once by
a cycle in C and there are an odd number of arcs (1, i) ∈ Arcs(G). Thus,
G is non-3-colorable by Theorem 2.4.
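As an illustration, the following small script (our own sketch) checks the two parity conditions of Theorem 2.4 for the odd-wheel certificate of Example 8; recall that an oriented partial 3-cycle (i, j, k) contributes exactly the two arcs (i, j) and (j, k).

from collections import Counter

def check_certificate(cycles):
    # Count |C(i,j)| for every arc; a partial 3-cycle (i, j, k) contributes
    # the consecutive arcs (i, j) and (j, k).
    count = Counter()
    for c in cycles:
        for a, b in zip(c, c[1:]):
            count[(a, b)] += 1
    # Condition 1: every underlying undirected edge is covered an even
    # number of times, counting both orientations.
    cond1 = all((count[(i, j)] + count[(j, i)]) % 2 == 0
                for i, j in {tuple(sorted(arc)) for arc in count})
    # Condition 2: the arcs (i, j) with i < j appear an odd number of times.
    cond2 = sum(v for (i, j), v in count.items() if i < j) % 2 == 1
    return cond1 and cond2

n = 10                                  # hub 1, rim 2, ..., 10 (9 vertices, odd)
C = [(i, 1, i + 1) for i in range(2, n)] + [(n, 1, 2)]
print(check_certificate(C))             # -> True, so G is not 3-colorable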
The Grötzsch graph is a non-trivial example of a non-3-colorable graph
with a degree one Nullstellensatz certificate ([7]):
Example 9. Consider the Grötzsch graph (Mycielski 4) in Fig-
ure 4, which has no 3-coloring. It contains no 3-cycles. Now, consider the


following set of oriented chordless 4-cycles, which we show gives a certificate


of non-3-colorability by Theorem 2.4.

C := {(1, 2, 3, 7), (2, 3, 4, 8), (3, 4, 5, 9), (4, 5, 1, 10), (1, 10, 11, 7),
(2, 6, 11, 8), (3, 7, 11, 9), (4, 8, 11, 10), (5, 9, 11, 6)}.

Figure 4 illustrates the edge directions for the 4-cycles of C. Each undi-
rected edge of the graph is contained in exactly two 4-cycles, so C satisfies
Condition 1 of Theorem 2.4. Now,

|C(6,11) | = |C(7,11) | = |C(8,11) | = |C(9,11) | = |C(10,11) | = 1,

and |C(i,j) | ≡ 0 (mod 2) for all other arcs (i, j) ∈ A where i < j. Thus,

Σ(i,j)∈A, i<j |C(i,j) | ≡ 1 (mod 2),

so Condition 2 is satisfied, and therefore, the graph has no 3-coloring.

Fig. 4. Grötzsch graph

There are no known combinatorial characterizations concerning higher


NulLA ranks.
3. Adding polynomial inequalities. Up until this point we have
worked over arbitrary fields (with special attention to finite fields due to
their fast and exact computation), where the only allowable constraints
were equations. Now we turn our attention to the real case (i.e. K = R),
where we have the additional possibility of specifying inequalities (more
generally, one can work over ordered or formally real fields). In this case,
following the terminology of real algebraic geometry, we call the solution set
of a system of polynomial equations and inequalities a basic semialgebraic
set. Note that convex polyhedra correspond to the particular case where


all the constraint polynomials have degree one. As we have seen earlier
in the Positivstellensatz (Theorem 1.4 above), the emptiness of a basic
semialgebraic set can be certified through an algebraic identity involving
sum of squares of polynomials.
The connection between sum of squares decompositions of polynomi-
als and convex optimization can be traced back to the work of N. Z. Shor
[48]. His work went relatively unnoticed for several years, until several au-
thors, including Lasserre, Nesterov, and Parrilo, observed, around the year
2000, that the existence of sum of squares decompositions and the search
for infeasibility certificates for a semialgebraic set can be addressed via a
sequence of semidefinite programming relaxations [23, 40, 41, 39]. The first
part of this section will be a short description of the connections between
sums of squares and semidefinite programming, and how the Positivstel-
lensatz allows, in an analogous way to what was presented in Section 2 for
the Nullstellensatz, for a systematic way to formulate these semidefinite
relaxations.
A very central preoccupation of combinatorial optimizers has been the
understanding of the facets that describe the integer hull (normally binary)
of a combinatorial problem. As we will see later on, one can recover quite
a bit of information about the integer hull of combinatorial problems from
a sequence of combinatorially controlled SDPs. This kind of approach was
pioneered in the lift-and-project method of Balas, Ceria and Cornuéjols
[1], the matrix-cut method of Lovász and Schrijver [34] and the lineariza-
tion technique of Sherali-Adams [47]. Here we try to present more recent
developments (see [30] and references therein for a very extensive survey).

3.1. Sums of squares, SDP, and feasibility of semialgebraic


sets. Recall that a multivariate polynomial p(x) is a sum of squares (SOS
for short) if it can be written as a sum of squares of other polynomials,
that is, p(x) = Σi qi(x)^2, qi(x) ∈ R[x]. The condition that a polynomial
is a sum of squares is a quite natural sufficient test for polynomial non-
negativity. Thus, instead of asking whether even-degree polynomials are
non-negative, we ask the easier question of whether they are sums of squares.
More importantly, as we shall see, the existence of a sum of squares de-
composition can be decided via semidefinite programming.
Theorem 3.1. A polynomial p(x) is SOS if and only if p(x) = z T Qz,
where z is a vector of monomials in the xi variables, and Q is a symmetric
positive semidefinite matrix.
By the theorem above, every SOS polynomial can be written as a
quadratic form in a set of monomials, with the corresponding matrix being
positive semidefinite. The vector of monomials z in general depends on
the degree and sparsity pattern of p(x). If p(x) has n variables and total
degree 2d, then z can always be chosen as a subset of the. set/of monomials
of degree less than or equal to d, which has cardinality n+dd .


Example 10. The polynomial p(x1 , x2 ) = x21 − x1 x22 + x42 + 1 is SOS.


Among infinitely many others, p(x1 , x2 ) has the following decompositions:
p(x1 , x2 ) = (3/4)(x1 − x2^2)^2 + (1/4)(x1 + x2^2)^2 + 1
           = (1/9)(3 − x2^2)^2 + (2/3) x2^2 + (1/288)(9x1 − 16x2^2)^2 + (23/32) x1^2 .
The polynomial p(x1 , x2 ) also has the representation p(x1 , x2 ) = (1/6) z^T Qz with z = (1, x2 , x2^2 , x1 )^T and

Q = [[ 6, 0, −2, 0 ],
     [ 0, 4, 0, 0 ],
     [ −2, 0, 6, −3 ],
     [ 0, 0, −3, 6 ]],

where the matrix Q is positive semidefinite.


In the representation f (x) = z^T Qz, for the right- and left-hand sides
to be identical, all the coefficients of the corresponding polynomials should
be equal. Since Q is simultaneously constrained by linear equations and a
positive semidefiniteness condition, the problem can be easily seen to be
directly equivalent to a semidefinite programming feasibility problem in the
standard primal form.
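As a quick check of this equivalence on Example 10, the following sketch (ours, assuming the numpy and sympy packages) reconstructs p from the Gram matrix and confirms that Q is positive semidefinite:

import numpy as np
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
z = sp.Matrix([1, x2, x2**2, x1])
Qrows = [[6, 0, -2, 0],
         [0, 4, 0, 0],
         [-2, 0, 6, -3],
         [0, 0, -3, 6]]
p = sp.expand((z.T * sp.Matrix(Qrows) * z)[0] / 6)
print(p)                                          # x1**2 - x1*x2**2 + x2**4 + 1
print(np.linalg.eigvalsh(np.array(Qrows, dtype=float)))   # all >= 0: Q is PSD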
Now we describe an algorithm (originally presented in [40, 41]) and
illustrate it with an example, showing how we can use SDPs to decide the fea-
sibility of a system of polynomial inequalities. Exactly as we did for the
Nullstellensatz case, we can look for the existence of a Positivstellensatz
certificate of bounded degree D (see Theorem 1.4). Once we assume that
the degree D is fixed we can apply Theorem 3.1 and obtain a reformulation
as a semidefinite programming problem. We formalize this description in
the following algorithm:

Algorithm 3 Bounded degree Positivstellensatz [40, 41]


Input: A polynomial system {fi (x) = 0, gi (x) ≥ 0} and a Positivstellen-
satz bound D.
Output: Feasible, if {fi (x) = 0, gi(x) ≥ 0} is feasible over R, else In-
feasible.
for d = 0, 1, 2, . . . , D do  
If there exist βi , sα ∈ R[x] such that −1 = Σi βi fi + Σα∈{0,1}^n sα g^α,
with sα SOS, deg(βi fi ) ≤ d, deg(sα g^α ) ≤ d, then return Infeasible.
d ← d + 1.
end for
Return Feasible.

Notice that the membership test in the main loop of the algorithm
is, by the results described at the beginning of this section, equivalent to
a finite-sized semidefinite program. Similarly to the Nullstellensatz case,


the number of iterations (i.e., the degree of the certificates) serves as a


quantitative measure of the hardness in proving infeasibility of the system.
As we will describe in more detail in Section 3.4, in several situations one
can give further refined characterization on these degrees.
Example 11. Consider the polynomial system {f = 0, g ≥ 0} from
Example 2, where f := x2 + x1^2 + 2 = 0 and g := x1 − x2^2 + 3 ≥ 0. At the d-th
iteration of Algorithm 3 applied to the polynomial problem {f = 0, g ≥ 0},
one asks whether there exist polynomials β, s1 , s2 ∈ R[x] such that βf +
s1 + s2 · g = −1 where s1 , s2 are SOS and deg(s1 ), deg(s2 · g), deg(β · f ) ≤ d.
For each fixed positive integer d this can be tested by a (possibly large)
semidefinite program.
Solving this for d = 2, we have deg(s1 ) ≤ 2, deg(s2 ) = 0 and deg(β) =
0, so s2 and β are constants and

s1 = z^T Qz = Q11 + 2Q12 x1 + 2Q13 x2 + Q22 x1^2 + 2Q23 x1 x2 + Q33 x2^2

where z = (1, x1 , x2 )^T and Q ∈ R^{3×3} is a symmetric positive semidefinite
matrix. Thus, the certificate for D = 2 is βf + z^T Qz + s2 · g = −1 where
Q ⪰ 0 and s2 ≥ 0. If we expand the left-hand side and equate coefficients
on both sides of the equation, we arrive at the following SDP:

2β + Q11 + 3s2 = −1   (1),      2Q12 + s2 = 0   (x1),
β + 2Q13 = 0   (x2),            β + Q22 = 0   (x1^2),
2Q23 = 0   (x1 x2),             Q33 − s2 = 0   (x2^2)
where Q ⪰ 0 and s2 ≥ 0. This SDP has a solution as follows:

Q = [[ 5, −1, 3 ], [ −1, 6, 0 ], [ 3, 0, 2 ]],   s2 = 2 and β = −6.
The resulting identity, which is the same as the one given in Example 2,
proves the inconsistency of the system.
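The certificate can be verified directly; the following sketch (ours, assuming numpy and sympy) checks both the algebraic identity and the positive definiteness of Q, which makes s1 = z^T Qz an explicit sum of squares:

import numpy as np
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x2 + x1**2 + 2
g = x1 - x2**2 + 3
z = sp.Matrix([1, x1, x2])
Qrows = [[5, -1, 3], [-1, 6, 0], [3, 0, 2]]
beta, s2 = -6, 2
lhs = beta * f + (z.T * sp.Matrix(Qrows) * z)[0] + s2 * g
print(sp.expand(lhs))                                     # -> -1
print(np.linalg.eigvalsh(np.array(Qrows, dtype=float)))   # all > 0: s1 is SOS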
As outlined in the preceding paragraphs, there is a direct connec-
tion going from general polynomial optimization problems to SDP, via the
Positivstellensatz infeasibility certificates. Even though we have discussed
only feasibility problems here, there are obvious straightforward connec-
tions with optimization. For instance, by considering the emptiness of the
sublevel sets of the objective function, or using representation theorems
for positive polynomials, sequences of converging bounds indexed by cer-
tificate degree can be directly constructed; see e.g. [40, 23, 42]. These
schemes have been implemented in software packages such as SOSTOOLS
[43], GloptiPoly [17], and YALMIP [31].


3.2. Semidefinite programming relaxations. In the last section,


we have described the search for Positivstellensatz infeasibility certificates
formulated as a semidefinite programming problem. We now describe an al-
ternative interpretation, obtained by dualizing the corresponding semidefi-
nite programs. This is the exact analogue of the construction presented in
Section 2.2, and is closely related to the approach via truncated moment
sequences developed by Lasserre [23].
Recall that in the approach in Section 2.2, the linear relaxations were
constructed by replacing every monomial xα by a new variable λxα . Fur-
thermore, new redundant equations were obtained by multiplying an exist-
ing constraint f (x) = 0 by terms of the form xi , yielding xi f (x) = 0 (essen-
tially, generating the ideal of valid equations). In the inequality case, and as
suggested by the Positivstellensatz, new inequality constraints will be gen-
erated by both squarefree multiplication of the original constraints, and by
multiplication against sums of squares. That is, if gi (x) ≥ 0 and gj (x) ≥ 0
are valid inequalities, then so are gi (x)gj (x) ≥ 0 and gi (x)s(x) ≥ 0, where
s(x) is SOS. After substitution with the extended variables λ, we then ob-
tain a new system of linear equations and inequalities, with the property
that the resulting inequality conditions are semidefinite conditions. The
presence of the semidefinite constraints arises because we do not specify a
priori what the multipliers s(x) are, but only give their linear span.
Example 12. Consider the polynomial system discussed earlier in
Example 2. As described, new linear and semidefinite constraints are ob-
tained by linearizing all the polynomial constraints in the original system.
The corresponding relaxation is (for d = 2):

[[ λ1, λx1, λx2 ], [ λx1, λx1^2, λx1x2 ], [ λx2, λx1x2, λx2^2 ]] ⪰ 0,
λx2 + λx1^2 + 2λ1 = 0,   λx1 − λx2^2 + 3λ1 ≥ 0,
plus the condition λ1 > 0 (without loss of generality, we can take λ1 = 1).
The first semidefinite constraint arises from linearizing the square of an
arbitrary degree one polynomial, while the other two constraints are the
direct linearization of the original equality and inequality constraints. The
resulting problem is a semidefinite program, and in this case, its infeasibility
directly shows that the original system of polynomial inequalities does not
have a solution.
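This relaxation is small enough to state verbatim; the following sketch (ours, assuming the cvxpy modeling package with an SDP solver installed) lets the solver certify its infeasibility:

import cvxpy as cp

# Moment matrix indexed by the monomials (1, x1, x2):
# M = [[l1, lx1, lx2], [lx1, lx1^2, lx1x2], [lx2, lx1x2, lx2^2]].
M = cp.Variable((3, 3), symmetric=True)
constraints = [M >> 0,
               M[0, 0] == 1,                 # l1 = 1
               M[0, 2] + M[1, 1] + 2 == 0,   # linearization of x2 + x1^2 + 2 = 0
               M[0, 1] - M[2, 2] + 3 >= 0]   # linearization of x1 - x2^2 + 3 >= 0
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)                           # -> 'infeasible'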
An appealing geometric interpretation follows from considering the
projection of the feasible set of these relaxations in the space of original
variables (i.e., λxi ). For the linear algebra relaxations of Section 2.2, we ob-
tain outer approximations to the affine hull of the solution set (an algebraic
variety), while the SDP relaxation described here constructs outer approx-
imations to the convex hull of the corresponding semialgebraic set. This
latter viewpoint will be discussed in Section 3.3, for the case of equations
arising from combinatorial problems.


3.3. Theta bodies. Recall that traditional modeling of combinato-


rial optimization problems often uses 0/1 incidence vectors. The set S of
solutions of a combinatorial problem (e.g., the stable sets, traveling sales-
man tours) is often computed through the (implicit) convex hull of such
incidence vectors. Just as in the stable set and max-cut examples in Propo-
sition 1.1, the incidence vectors can be seen as the set of real solutions to a
system of polynomial equations: f1 (x) = f2 (x) = · · · = fm (x) = 0, where
f1 , . . . , fm ∈ R[x] := R[x1 , . . . , xn ]. Over the years there have been well-
known attempts to understand the structure of these convex hulls through
semidefinite programming relaxations (see [47, 34, 26, 33]) and in fact they
are closely related [27, 30]. Here we wish to summarize some recent results
that give appealing structural properties, in terms of the associated system
of equations (see [15, 14] for details).
Let us start with a historically important example: Given an undi-
rected finite graph G = (V, E), consider the set SG of characteristic vectors
of stable sets of G. The convex hull of SG , denoted by STAB(G), is the sta-
ble set polytope. As we mentioned already the vanishing ideal of SG is given
by IG := ⟨xi^2 − xi (∀ i ∈ V ), xi xj (∀ {i, j} ∈ E)⟩ which is a real radical
zero-dimensional ideal in R[x]. In [32], Lovász introduced a semidefinite
relaxation, TH(G), of the polytope STAB(G), called the theta body of G.
There are multiple descriptions of TH(G), but the one in [34, Lemma 2.17],
for instance, shows that TH(G) can be defined completely in terms of the
polynomial system IG . It is easy to show that STAB(G) ⊆ TH(G), and
remarkably, we have that STAB(G) = TH(G) if and only if the graph is
perfect. We will now explain how the case of stable sets can be generalized
to construct theta bodies for many other combinatorial problems.
We will construct an approximation of the convex hull of a finite set of
points S, denoted conv(S), by a sequence of convex bodies recovered from
“degree truncations” of the defining polynomial systems. In what follows I
will be a radical polynomial ideal. A polynomial f is non-negative modulo
I, written as f ≥ 0 mod I, if f (s) ≥ 0 for all s ∈ VR (I). More strongly,
the polynomial f is a sum of squares (sos) mod I if there exist h1 , . . . , ht ∈ R[x]
such that f ≡ h1^2 + · · · + ht^2 mod I for some t, or equivalently, f − (h1^2 + · · · + ht^2) ∈ I.
If, in addition, each hj has degree at most k, then we say that f is k-sos
mod I. The ideal I is k-sos if every polynomial that is non-negative mod I
is k-sos mod I. If every polynomial of degree at most d that is non-negative
mod I is k-sos mod I, we say that I is (d, k)-sos.
Note that conv(VR (I)), the convex hull of VR (I), is described by the
linear polynomials f such that f ≥ 0 mod I. A certificate for the non-
negativity of f mod I is the existence of an sos polynomial h1^2 + · · · + ht^2 that
is congruent to f mod I. One can now investigate the convex hull of S
through the hierarchy of nested closed convex sets defined by the semidef-
inite programming relaxations of the set of (1, k)-sos polynomials.
Definition 3.1. Let I ⊆ R[x] be an ideal, and let k be a positive
integer. Let Σk ⊂ R[x] be the set of all polynomials that are k-sos mod I.


1. The k-th theta body of I is

THk (I) := {x ∈ Rn : f (x) ≥ 0 for every linear f ∈ Σk }.

2. The ideal I is THk -exact if the k-th theta body THk (I) coincides
with the closure of conv(VR (I)).
3. The theta-rank of I is the smallest k such that THk (I) coincides
with the closure of conv(VR (I)).
Example 13. Consider the ideal I = ⟨x^2 y − 1⟩ ⊂ R[x, y]. Then
conv(VR (I)) = {(p1 , p2 ) ∈ R^2 : p2 > 0}, and any linear polynomial that
is non-negative over VR (I) is of the form α + βy, where α, β ≥ 0. Since
α + βy ≡ (√β xy)^2 + (√α)^2 mod I, I is (1, 2)-sos and TH2 -exact.
Example 14. For the case of the stable sets of a graph G, one can
see that
TH1 (IG ) = { y ∈ R^n : ∃ M ⪰ 0, M ∈ R^{(n+1)×(n+1)} such that
             M00 = 1, M0i = Mi0 = Mii = yi ∀ i ∈ V, and Mij = 0 ∀ {i, j} ∈ E }.

It is known that TH1 (IG ) is precisely Lovász’s theta body of G. The ideal
IG is TH1 -exact precisely when the graph G is perfect.
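As an illustration, the following sketch (ours, assuming cvxpy) maximizes Σ yi over the matrix description of TH1(IG) for the 5-cycle C5; the optimal value is Lovász's theta number √5 ≈ 2.236, strictly above the stability number 2, reflecting the fact that C5 is not perfect:

import cvxpy as cp

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]       # the 5-cycle C5
M = cp.Variable((n + 1, n + 1), symmetric=True)    # rows/columns 0, 1, ..., n
cons = [M >> 0, M[0, 0] == 1]
cons += [M[i + 1, i + 1] == M[0, i + 1] for i in range(n)]   # Mii = M0i = yi
cons += [M[i + 1, j + 1] == 0 for i, j in edges]             # Mij = 0 on edges
prob = cp.Problem(cp.Maximize(cp.sum(M[0, 1:])), cons)
prob.solve()
print(prob.value)                                  # approx. 2.236 = sqrt(5)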
By definition, TH1 (I) ⊇ TH2 (I) ⊇ · · · ⊇ conv(VR (I)). As seen in
Example 13, conv(VR (I)) may not always be closed and so the theta-body
sequence of I can converge, if at all, only to the closure of conv(VR (I)).
But the good news for combinatorial optimization is that there is plenty of
good behavior for problems arising with a finite set of possible solutions.
3.4. Application: cuts and exact finite sets. We discuss now a
few important combinatorial examples. As we have seen in Section 2.5 for
3-colorability, and in the preceding section for stable sets, in some special
cases it is possible to give nice combinatorial characterizations of when
low-degree certificates can exactly recognize infeasibility. Here are a few
additional results for the real case:
Example 15. For the max-cut problem we saw earlier, the defining
vanishing ideal is I(SG) = ⟨xe^2 − xe ∀ e ∈ E, xT ∀ T an odd cycle in G⟩.
In this case one can prove that the ideal I(SG) is TH1 -exact if and only
if G is a bipartite graph. In general the theta-rank of I(SG) is bounded
above by the size of the max-cut in G. There is no constant k such that
THk (I(SG)) = conv(SG), for all graphs G. Other formulations of max-cut
are studied in [14].
Recall that when S ⊂ Rn is a finite set, its vanishing ideal I(S) is
zero-dimensional and real radical (see [36] Section 12.5 for a definition of
the real radical). In what follows, we say that a finite set S ⊂ Rn is exact
if its vanishing ideal I(S) ⊆ R[x] is TH1 -exact.
Theorem 3.2 ([15]). For a finite set S ⊂ Rn , the following are
equivalent.


1. S is exact.
2. There is a finite linear inequality description of conv(S) in which
for every inequality g(x) ≥ 0, g is 1-sos mod I(S).
3. There is a finite linear inequality description of conv(S) such that
for every inequality g(x) ≥ 0, every point in S lies either on the
hyperplane g(x) = 0 or on a unique parallel translate of it.
4. The polytope conv(S) is affinely equivalent to a compressed lattice
polytope (every reverse lexicographic triangulation of the polytope
is unimodular with respect to the defining lattice).
Example 16. The vertices of the following 0/1-polytopes in Rn are
exact for every n: (1) hypercubes, (2) (regular) cross polytopes, (3) hyper-
simplices (includes simplices), (4) joins of 2-level polytopes, and (5) stable
set polytopes of perfect graphs on n vertices.
More strongly one can say the following.
Proposition 3.1. Suppose S ⊆ Rn is a finite point set such that for
each facet F of conv(S) there is a hyperplane HF such that HF ∩conv(S) =
F and S is contained in at most t + 1 parallel translates of HF . Then I(S)
is THt -exact.
In [15] the authors show that theta bodies can be computed explicitly
as projections of the feasible set of a semidefinite program. These SDPs are
constructed using the combinatorial moment matrices introduced by [29].

4. Recovering solutions in the feasible case. In principle, it is


possible to find the actual roots of the system of equations (and thus the
colorings, stable sets, or desired combinatorial object) whenever the relax-
ations are feasible and a few additional conditions are satisfied. Here we
outline the linear algebra relaxations case, but the semidefinite case is very
similar; see e.g. [18, 25] for this case.
We describe below how, under certain conditions, it is possible to
recover the solution of the original polynomial system from the relaxations
(linear or semidefinite) described in earlier sections. The main concepts
are very similar for both methodologies, and are based on the well-known
eigenvalue methods for polynomial equations; see e.g. [6, §2.4]. The key
idea for extracting solutions is the fact that from the relaxations one can
obtain a finite-dimensional representation of the vector space K[x]/I and its
multiplicative structure, where I is the ideal ideal(F ) (in the case of linear
relaxations). In order to do this, we need to compute a basis of the vector
space K[x]/I, and construct matrix representations for the multiplication
operators Mxi : K[x]/I → K[x]/I where [f ] → [xi f ] for all [f ] ∈ K[x]/I.
Then, we can use the eigenvalue/eigenvector methods to compute solutions
(see e.g., [10]).
A sufficient condition for the existence of a suitable basis of K[x]/I is
given by Theorem 2.2. Under this condition, multiplication matrices Mxi
can be easily computed. In particular, if we have computed a set F ⊂ K[x]
that satisfies the conditions of Theorem 2.2 by running FPNulLA, then


[Figure: the six-node graph of Example 17 on vertices x0 , x1 , . . . , x5 .]

Fig. 5. Graph for Example 17.

finding a basis of R/I and computing its multiplicative structure is straight-


forward using linear algebra (see e.g., [37]). By construction, the matrices
Mxi commute pairwise, and to obtain the roots one must diagonalize the
corresponding commutative algebra. It is well-known (see, e.g., [6]), that
this can be achieved by forming a random linear combination of these ma-
trices. This random matrix will generically have distinct eigenvalues, and
the corresponding matrix of eigenvectors will give the needed change of
basis. In the case of a finite field, it is enough to choose the random co-
efficients over an algebraic extension of sufficiently large degree, instead of
working over the algebraic closure (alternatively, the more efficient meth-
ods in [11] can be used). The entries of the diagonalized matrices directly
provide the coordinates of the roots.
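The following toy instance (ours, assuming numpy) carries out these steps for I = ⟨x^2 − x, y^2 − y, xy⟩, whose variety is {(0, 0), (1, 0), (0, 1)} and for which {1, x, y} is a basis of the quotient ring:

import numpy as np

# Multiplication matrices in the basis (1, x, y) of K[x]/I.
Mx = np.array([[0., 0., 0.],    # x*1 = x, x*x = x, x*y = 0 (mod I)
               [1., 1., 0.],
               [0., 0., 0.]])
My = np.array([[0., 0., 0.],    # y*1 = y, y*x = 0, y*y = y (mod I)
               [0., 0., 0.],
               [1., 0., 1.]])

rng = np.random.default_rng(0)
a, b = rng.random(2)                    # a generic random combination
_, T = np.linalg.eig(a * Mx + b * My)   # eigenvectors give the change of basis
Tinv = np.linalg.inv(T)
xs = np.diag(Tinv @ Mx @ T).real        # x-coordinates of the roots
ys = np.diag(Tinv @ My @ T).real        # y-coordinates of the roots
print(np.round(np.column_stack([xs, ys]), 6))   # rows: (0,0), (1,0), (0,1), in some order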
Remark 4.1. The condition in Theorem 2.2 can in general be a
strong requirement for recovery of solutions, since it implies that we can
obtain all solutions of the polynomial system. In some occasions, it may be
desirable to obtain just a single solution, in which case weaker conditions
may be of interest.
Example 17. Consider the following polynomial system over F2 , that
corresponds to the 3-colorings of the six-node graph in Figure 5:

xi^3 + 1 = 0 ∀ i ∈ V,   xi^2 + xi xj + xj^2 = 0 ∀ (i, j) ∈ E.

We add to these equations the symmetry-breaking constraint x0 = 1. Af-


ter running NulLA with this system as an input, we obtain multiplication
matrices over F2 , of dimensions 4 × 4, given by:
Mx1 = [[0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]],
Mx2 = [[0, 0, 0, 1], [0, 0, 1, 1], [1, 0, 1, 1], [0, 1, 1, 0]],
Mx3 = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 1], [1, 0, 1, 1]],
Mx4 = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]],
Mx5 = [[0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]].


Diagonalizing the corresponding commutative algebra, we obtain the change


of basis matrix given by
T = [[1, 1, 1, 1], [ω^2, ω, ω, ω^2], [1, 1, ω^2, ω], [ω^2, ω, 1, 1]],

where ω is a primitive cube root of unity, i.e., it satisfies ω^2 + ω + 1 = 0. It can be


easily verified that all the matrices T −1 Mxi T are diagonal, and given by:
T^{-1} Mx1 T = diag[ω, ω^2, ω, ω^2],   T^{-1} Mx2 T = diag[ω^2, ω, 1, 1],
T^{-1} Mx3 T = diag[1, 1, ω^2, ω],     T^{-1} Mx4 T = diag[ω, ω^2, ω^2, ω],
T^{-1} Mx5 T = diag[ω^2, ω, ω, ω^2],
which correspond to the four possible 3-colorings of the graph. For instance,
from the second diagonal entry of each matrix we obtain the feasible coloring
(x0 , x1 , x2 , x3 , x4 , x5 ) → (1, ω^2, ω, 1, ω^2, ω).
Acknowledgements. We are grateful to the two anonymous referees
who provided many useful corrections and comments that greatly enhanced
the quality of presentation. We are also grateful to Jon Lee, Susan Mar-
gulies, Mohamed Omar, and Chris Hillar for their ideas and support.

APPENDIX
A. Proofs. This appendix contains proofs of some of the results used in
the main body of the paper that are either hard to find or whose original
proof, available elsewhere, is not written in the language of this survey.
For the purpose of formally proving Theorem 2.2, we need to formalize
further some of the notions of Section 2: The space K[[x]] is isomorphic to
the dual vector space of K[x] consisting of all linear functionals on K[x],
that is, K[[x]] ≅ Hom(K[x], K). We choose to use K[[x]] instead of Hom(K[x], K) as
the dual vector space of K[x] (see e.g., [24]) because using K[[x]] makes
clearer the linearization of a system of polynomial equations. The map
τ : K[[x]] → Hom(K[x], K) where (τ(λ))(g) = Σα∈N^n λα gα = λ ∗ g for
all λ ∈ K[[x]] and g ∈ K[x] is an isomorphism, and the inverse map is
τ^{-1}(ψ) = Σα∈N^n ψ(x^α) x^α for all ψ ∈ Hom(K[x], K). For a given set
F ⊆ K[x], there is an analogue of the annihilator F ◦ in the context of
the dual vector space Hom(K[x], K) as follows: Ann(K[x], F ) := {ψ ∈
Hom(K[x], K) : ψ(f ) = 0, ∀ f ∈ F }. Note that F ◦ ≅ Ann(K[x], F ) since
τ(F ◦) = Ann(K[x], F ).
Lemma A.1. Let F ⊆ K[x] be a vector subspace and k ∈ N. Then,
dim(K[x]k /(F ∩ K[x]k )) = dimk (F ◦ ).
Proof. We know from Theorem 3.14 in [45] that Ann(K[x], F ∩ K[x]k ) =
Ann(K[x], F ) + Ann(K[x], K[x]k ); thus, (F ∩ K[x]k )◦ = F ◦ + K[x]◦k , and


so, we have πk ((F ∩ K[x]k )◦ ) = πk (F ◦ ) + πk (K[x]k◦ ) = πk (F ◦ ). More-
over, from Theorems 3.12 and 3.15 in [45], we have Ann(K[x]k , F ∩
K[x]k ) ≅ Hom(K[x]k /(F ∩ K[x]k ), K) ≅ K[x]k /(F ∩ K[x]k ) since K[x]k /(F ∩
K[x]k ) is finite dimensional, and finally, Ann(K[x]k , F ∩ K[x]k ) ≅
πk ((F ∩ K[x]k )◦ ) since, for the isomorphism τk : Hom(K[x]k , K) → K[x]k
where τk (ψ) = Σα∈N^n:|α|≤k ψ(x^α) x^α, we have τk (Ann(K[x]k , F ∩ K[x]k )) =
πk ((F ∩ K[x]k )◦ ). Combining these isomorphisms gives dim(K[x]k /(F ∩
K[x]k )) = dimk (F ◦ ).
We now present proofs verifying the correctness and efficiency of Al-
gorithm 2. We begin by proving Theorem 2.2.
Proof. [Proof of Theorem 2.2] We will explicitly show that, under the
hypothesis of the theorem, one can recover a finite dimensional vector space
B such that B ⊕ F = K[x]d and B ⊕ I = K[x]. The result then follows
from the equalities dim(I ◦ ) = dim(K[x]/I) = dim(B) = dim(K[x]d /F ) =
dimd (F ◦ ). We define the vector space B ⊆ K[x]d−1 such that B ⊕ (F ∩
K[x]d−1 ) = K[x]d−1 . By assumption, dimd (F ◦ ) = dimd−1 (F ◦ ) implying
dim(B) = dim(K[x]d−1 /F ∩ K[x]d−1 ) = dim(K[x]d /F ), and thus, it follows
that B ⊕ F = K[x]d . It only remains to show that B ⊕ I = K[x].
Denote F [0] = F and F [k] = (F [k−1] )+ for all k ≥ 1. We show by
induction on k that B ⊕ F [k] = K[x]d+k for all k ≥ 0, and hence B ⊕ I =
K[x]. We have already established B ⊕ F = K[x]d , so the claim holds for
k = 0. The claim also holds for k = 1 as follows: K[x]d+1 = (K[x]d )+ =
(B ⊕ F )+ = B + + F + = B + F + since B + ⊆ K[x]d = B ⊕ F , and
furthermore, the assumption F + ∩ K[x]d = F implies F + ∩ B = {0}, and
therefore, B ⊕ F + = K[x]d+1 . Now assume that the claim holds for k ≥ 1
and let us prove it must hold for k + 1.
By the assumption that B ⊕ F [k] = K[x]d+k , there exists a vector
space projection ρk : K[x]d+k → K[x]d+k where im(ρk ) = B, ρk (b) = b for
all b ∈ B and ker(ρk ) = F [k] . We extend the map ρk to the map ρk+1 :
K[x]d+k+1 → K[x]d+k+1 by defining ρk+1 (g) := ρk (g0 ) + Σi ρk (xi ρk (gi ))
where g = g0 + Σi xi gi is a representation of g with g0 , g1 , . . . , gn ∈ K[x]d+k .
We show below that ρk+1 is well-defined meaning that the value of ρk+1 (g)
is independent of the chosen representation of g since there may be multiple
possible representations of g. It follows by construction that ρk+1 is K-
linear, im(ρk+1 ) = B, ρk+1 (b) = b for all b ∈ B and ker(ρk+1 ) = F [k+1] ,
implying that ρk+1 is a vector space projection and B ⊕F [k+1] = K[x]d+k+1
as required.
We now show that ρk+1 is well-defined. First, consider the special
case where g ∈ K[x]d+k+1 is a monomial, that is, g = xi xj xγ for some
i, j and some monomial xγ ∈ K[x]d+k−1 , so ρk+1 (g) = ρk (xi ρk (xj xγ )) or
ρk+1 (g) = ρk (xj ρk (xi xγ )). We thus need to show that ρk (xi ρk (xj xγ )) =
ρk (xj ρk (xi xγ )). Now, xγ = b + f for some b ∈ B where ρk (xγ ) = b and
f ∈ F [k−1] (k ≥ 1). Then, ρk (xi xγ ) = ρk (xi b + xi f ) = ρk (xi b) + ρk (xi f ) =
ρk (xi b), and similarly, ρk (xj xγ ) = ρk (xj b). Then,


ρk (xi ρk (xj xγ )) − ρk (xj ρk (xi xγ ))
= ρk (xi ρk (xj b)) − ρk (xj ρk (xi b))
= ρk (xi (xj b − f1 )) − ρk (xj (xi b − f1′ ))          (f1 , f1′ ∈ F )
= (xi (xj b − f1 ) − f2 ) − (xj (xi b − f1′ ) − f2′ )   (f2 , f2′ ∈ F )
= xj f1′ − xi f1 + f2′ − f2 ∈ F + .

So, ρk (xi ρk (xj xγ )) − ρk (xj ρk (xi xγ )) ∈ F + . But ρk (xi ρk (xj xγ )) ∈ B and
ρk (xj ρk (xi xγ )) ∈ B by definition, so ρk (xi ρk (xj xγ )) − ρk (xj ρk (xi xγ )) ∈ F + ∩ B =
{0} since F + ∩ K[x]d = F . Thus, ρk (xi ρk (xj xγ )) = ρk (xj ρk (xi xγ )) as required.
By the K-linearity of ρk , ρk+1 is well-defined on K[x]d+k+1 as required.
Theorem 2.2 (and its proof) can be seen as an adaptation and simplifi-
cation of Theorem 4.2 and Algorithm 4.3 in [37], the main difference being
that in Mourrain’s terminology, we stick to a particular order ideal and
only need to keep track of vector space dimensions instead of an explicit
basis for B.
We now present a proof of termination of the FPNulLA algorithm (see
also the comments following Algorithm 4.3 in [37]).
Lemma A.2. Let I be a zero-dimensional ideal, then FPNulLA (Al-
gorithm 2) terminates.
Proof. First, we prove that the inner while loop must terminate.
Let F ⊆ K[x]d be a vector space. We denote F [0,d] = F and F [k,d] =
(F [k−1,d] )+ ∩ K[x]d for all k ≥ 1 where d = deg(F ). By construction,
F [k,d] ⊆ F [k+1,d] ⊆ K[x]d for all k. So, the sequence of vector spaces
F [0,d] , F [1,d] , . . . , F [k,d] , . . . is an inclusion-wise increasing sequence of vec-
tor subspaces of K[x]d . Since K[x]d is finite-dimensional, the sequence must
reach a fixed point where F [k,d] = F [k+1,d] , which is the terminating con-
dition of the inner loop of FPNulLA (Steps 4-7). Let F [∗,d] denote this
fixed point.
The outer loop of FPNulLA is essentially the same as NulLA. After k
iterations of the outer loop, the vector space F contains at least all linear
combinations of polynomials of the form xα f where the total degree |α| ≤ k
and where f is one of the initial polynomials in F . Therefore, if the system
F (x) = 0 is infeasible, Hilbert’s Nullstellensatz guarantees that after a
finite number of iterations, 1 ∈ F and the algorithm terminates.
It remains to show that the algorithm terminates when the system
F (x) = 0 is feasible. Let I = I(F ). Since I is zero-dimensional, there must
exist a finite-dimensional vector space B ⊂ K[x] such that K[x] = I ⊕B (see
e.g. [5, 50]). Since the system F (x) = 0 is feasible, Hilbert’s Nullstellensatz
implies 1 ∈ I. Thus, we can choose B such that 1 ∈ B. Now after
finitely many iterations of the outer loop, any f ∈ I will eventually be
in F . Combined with the fact that B + is finite dimensional and B + ⊂
I ⊕ B, this implies that B + ⊂ F ⊕ B after finitely many iterations of
the outer loop. Also, since the inner loop has terminated, we know that
F = F + ∩ K[x]d = F [1,d] . Next, we show that K[x]d = F ⊕ B. Now,


(F ⊕ B)[1,d] = F [1,d] + B [1,d] = F ⊕ B since F = F [1,d] and B + ⊆ F ⊕ B.


Thus, (F ⊕ B)[∗,d] = F ⊕ B, and since B + ⊆ F ⊕ B, this implies

(B + )[∗,d] ⊆ (F ⊕ B)[∗,d] = F ⊕ B.

But 1 ∈ B, so K[x]d ⊆ (B + )[∗,d] , which then implies K[x]d = F ⊕ B.
Then, since B ⊆ K[x]d−1 , we also have K[x]d−1 = (F ∩ K[x]d−1 ) ⊕ B, and
thus, dim(K[x]d /F ) = dim(B) = dim(K[x]d−1 /F ), which is the stopping
criterion of the outer loop.
Now, we show that the NulLA and FPNulLA algorithms run in polynomial
time in the bit-size of the input data when the Nullstellensatz degree is
assumed to be fixed. To begin, note that the number of monomials xα
with deg(xα ) ≤ k is C(n + k, k), which is O(n^k).

Proof. (of Lemma 2.1). Let d = deg(F ). First note that by definition
(see Section 2.1 in [46]) the input size of the defining basis {f1 , f2 , . . . , fm }
of F equals O(cmn^d) where c is the average bit-size of the coefficients in
the basis.
For the proof of (1), observe that in the k-th iteration of Algorithm 1
(when the F + operation has increased the degree of F by k), we solve a
system of linear equations Ak x = bk to find coefficients of the Nullstellen-
satz certificate in Step 2 of Algorithm 1. The rows of Ak consist of vectors
of coefficients of all polynomials of the form xα fi where i = 1, . . . m and
deg(xα ) ≤ k. Therefore, Ak has O(mn^k) rows and each row has input size
O(cn^{d+k}). Hence, the input size of Ak is O(cmn^{d+2k}). The input size of
bk , which is a vector of zeros and ones, is O(mn^k). Thus, the input size
of the linear system Ak x = bk is O(cmn^{d+2k}), which is polynomial in the
input size of the basis of F and n, and thus, the system can be solved in
polynomial time (see e.g. Theorem 3.3 of [46]). The complexity of the first
L iterations is thus bounded by L times the complexity of the Lth iteration,
which is polynomial in L, n and the input size of the defining basis of F .
This completes the proof of the first part.
We now prove part (2). Denote by Fk the vector space computed at
the start of the kth outer loop iteration. Let {g1 , . . . gmk } be a basis of Fk
which was given to us either as an input or from the previous iteration.
Observe that deg(Fk ) = d + k, so each basis polynomial of Fk has bit
size O(log2(|K|) n^{d+k}). Note that dim(Fk ) = mk ≤ O(n^{d+k}); therefore the
bit size of the entire basis {g1 , . . . , gmk } is M = O(log2(|K|) n^{2(d+k)}). Note
that M is polynomial size in the input size of the initial basis f1 , . . . , fm .
Now we proceed to analyze the cost of the kth iteration, meaning steps
3 to 10 in the pseudocode. As in part (1), Step 3, involves solving a linear
system of size M ; thus it can be done in polynomial time. In Step 4 we
check whether dim(Fk ) = dim(Fk+ ∩ K[x]d+k ), which involves computing
a basis of Fk+ ∩ K[x]d+k . Note that Fk+ has bit size (n + 1)M , and to
compute the desired basis we perform Gaussian elimination on a matrix of
size (n + 1)M , which is polynomial time. If dim(Fk ) ≠ dim(Fk+ ∩ K[x]d+k ),


then in Step 5, we set Fk := Fk+ ∩ K[x]d+k . We still have Fk ⊆ K[x]d+k ,


and Fk still has bit size M ; thus, as above, Step 6 can be computed in
polynomial time. The number of iterations of the while loop (Steps 4-7)
is O(nd ) since the dim(Fk ) is at most dim(K[x]d ) = O(nd ) and dim(Fk )
increases each iteration of the loop. So, the loop terminates in polynomial
time. Then, Step 8 involves computing a basis for Fk ∩K[x]d−1 using Gaus-
sian elimination, which is polynomial time again. Lastly, Step 9 involves
computing a basis of Fk+ which has bit size (n + 1)M and thus polynomial
time. The complexity of the first L iterations is thus bounded by L times
the complexity of the Lth iteration, which is polynomial in L, n, log2 (|K|)
and the input size of the defining basis of F , and the result follows.

REFERENCES

[1] E. Balas, S. Ceria, and G. Cornuéjols, A lift-and-project cutting plane al-


gorithm for mixed 0-1 programs, Mathematical Programming, 58 (1993),
pp. 295–324.
[2] J. Bochnak, M. Coste, and M.-F. Roy, Real algebraic geometry, Springer, 1998.
[3] M. Clegg, J. Edmonds, and R. Impagliazzo, Using the Groebner basis algorithm
to find proofs of unsatisfiability, in STOC ’96: Proceedings of the twenty-
eighth annual ACM symposium on Theory of computing, New York, NY,
USA, 1996, ACM, pp. 174–183.
[4] N. Courtois, A. Klimov, J. Patarin, and A. Shamir, Efficient algorithms for
solving overdefined systems of multivariate polynomial equations, in EURO-
CRYPT, 2000, pp. 392–407.
[5] D. Cox, J. Little, and D. O’Shea, Ideals, Varieties and Algorithms: An In-
troduction to Computational Algebraic Geometry and Commutative Algebra,
Springer Verlag, 1992.
[6] ———, Using Algebraic Geometry, Vol. 185 of Graduate Texts in Mathematics,
Springer, 2nd ed., 2005.
[7] J. De Loera, C. Hillar, P. Malkin, and M. Omar, Recognizing graph theoretic
properties with polynomial ideals. http://arxiv.org/abs/1002.4435, 2010.
[8] J. De Loera, J. Lee, P. Malkin, and S. Margulies, Hilbert’s Nullstellensatz
and an algorithm for proving combinatorial infeasibility, in Proceedings of the
Twenty-first International Symposium on Symbolic and Algebraic Computa-
tion (ISSAC 2008), 2008.
[9] J. De Loera, J. Lee, S. Margulies, and S. Onn, Expressing combinatorial opti-
mization problems by systems of polynomial equations and the nullstellensatz,
to appear in the Journal of Combinatorics, Probability and Computing (2008).
[10] A. Dickenstein and I. Emiris, eds., Solving Polynomial Equations: Founda-
tions, Algorithms, and Applications, Vol. 14 of Algorithms and Computation
in Mathematics, Springer Verlag, Heidelberg, 2005.
[11] W. Eberly and M. Giesbrecht, Efficient decomposition of associative algebras
over finite fields, Journal of Symbolic Computation, 29 (2000), pp. 441–458.
[12] A.V. Gelder, Another look at graph coloring via propositional satisfiability, Dis-
crete Appl. Math., 156 (2008), pp. 230–243.
[13] E. Gilbert, Random graphs, Annals of Mathematical Statistics, 30 (1959),
pp. 1141–1144.
[14] J. Gouveia, M. Laurent, P.A. Parrilo, and R.R. Thomas, A new semidefi-
nite programming relaxation for cycles in binary matroids and cuts in graphs.
http://arxiv.org/abs/0907.4518, 2009.


[15] J. Gouveia, P.A. Parrilo, and R.R. Thomas, Theta bodies for polynomial ideals,
SIAM Journal on Optimization, 20 (2010), pp. 2097–2118.
[16] D. Grigoriev and N. Vorobjov, Complexity of Nullstellensatz and Positivstel-
lensatz proofs, Annals of Pure and Applied Logic, 113 (2002), pp. 153–160.
[17] D. Henrion and J.-B. Lasserre, GloptiPoly: Global optimization over polyno-
mials with MATLAB and SeDuMi, ACM Trans. Math. Softw., 29 (2003),
pp. 165–194.
[18] ———, Detecting global optimality and extracting solutions in GloptiPoly, in Posi-
tive polynomials in control, Vol. 312 of Lecture Notes in Control and Inform.
Sci., Springer, Berlin, 2005, pp. 293–310.
[19] T. Hogg and C. Williams, The hardest constraint problems: a double phase
transition, Artif. Intell., 69 (1994), pp. 359–377.
[20] A. Kehrein and M. Kreuzer, Characterizations of border bases, Journal of Pure
and Applied Algebra, 196 (2005), pp. 251 – 270.
[21] A. Kehrein, M. Kreuzer, and L. Robbiano, An algebraist’s view on border bases,
in Solving Polynomial Equations: Foundations, Algorithms, and Applications,
A. Dickenstein and I. Emiris, eds., Vol. 14 of Algorithms and Computation in
Mathematics, Springer Verlag, Heidelberg, 2005, ch. 4, pp. 160–202.
[22] J. Kollár, Sharp effective Nullstellensatz, Journal of the AMS, 1 (1988),
pp. 963–975.
[23] J. Lasserre, Global optimization with polynomials and the problem of moments,
SIAM J. on Optimization, 11 (2001), pp. 796–817.
[24] J. Lasserre, M. Laurent, and P. Rostalski, Semidefinite characterization and
computation of zero-dimensional real radical ideals, Found. Comput. Math.,
8 (2008), pp. 607–647.
[25] ———, A unified approach to computing real and complex zeros of zero-
dimensional ideals, in Emerging Applications of Algebraic Geometry, M. Puti-
nar and S. Sullivant, eds., vol. 149 of IMA Volumes in Mathematics and its
Applications, Springer, 2009, pp. 125–155.
[26] J.B. Lasserre, An explicit equivalent positive semidefinite program for nonlinear
0-1 programs, SIAM J. on Optimization, 12 (2002), pp. 756–769.
[27] M. Laurent, A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre
relaxations for 0–1 programming, Math. Oper. Res., 28 (2003), pp. 470–496.
[28] ———, Semidefinite relaxations for max-cut, in The Sharpest Cut: The Impact
of Manfred Padberg and His Work, M. Grötschel, ed., Vol. 4 of MPS-SIAM
Series in Optimization, SIAM, 2004, pp. 257–290.
[29] ———, Semidefinite representations for finite varieties, Mathematical Program-
ming, 109 (2007), pp. 1–26.
[30] ———, Sums of squares, moment matrices and optimization over polynomials, in
Emerging Applications of Algebraic Geometry, M. Putinar and S. Sullivant,
eds., Vol. 149 of IMA Volumes in Mathematics and its Applications, Springer,
2009, pp. 157–270.
[31] J. Löfberg, YALMIP: A toolbox for modeling and optimization in MATLAB, in
Proceedings of the CACSD Conference, Taipei, Taiwan, 2004.
[32] L. Lovász, Stable sets and polynomials, Discrete Math., 124 (1994), pp. 137–153.
[33] ———, Semidefinite programs and combinatorial optimization, in Recent advances
in algorithms and combinatorics, B. Reed and C. Sales, eds., Vol. 11 of CMS
Books in Mathematics, Springer, New York, 2003, pp. 137–194.
[34] L. Lovász and A. Schrijver, Cones of matrices and set-functions and 0-1 opti-
mization, SIAM J. Optim., 1 (1991), pp. 166–190.
[35] S. Margulies, Computer Algebra, Combinatorics, and Complexity: Hilbert’s Null-
stellensatz and NP-Complete Problems, PhD thesis, UC Davis, 2008.
[36] M. Marshall, Positive polynomials and sums of squares, Vol. 146 of Mathematical
Surveys and Monographs, American Mathematical Society, Providence, RI, 2008.


[37] B. Mourrain, A new criterion for normal form algorithms, in Proc. AAECC,
Vol. 1719 of LNCS, Springer, 1999, pp. 430–443.
[38] B. Mourrain and P. Trébuchet, Stable normal forms for polynomial system
solving, Theoretical Computer Science, 409 (2008), pp. 229 – 240. Symbolic-
Numerical Computations.
[39] Y. Nesterov, Squared functional systems and optimization problems, in High
Performance Optimization, H. Frenk et al., eds., Kluwer Academic, 2000,
pp. 405–440.
[40] P.A. Parrilo, Structured semidefinite programs and semialgebraic geometry meth-
ods in robustness and optimization, PhD thesis, California Institute of Tech-
nology, May 2000.
[41] , Semidefinite programming relaxations for semialgebraic problems, Mathe-
matical Programming, 96 (2003), pp. 293–320.
[42] P.A. Parrilo and B. Sturmfels, Minimizing polynomial functions, in Proceed-
ings of the DIMACS Workshop on Algorithmic and Quantitative Aspects
of Real Algebraic Geometry in Mathematics and Computer Science (March
2001), S. Basu and L. Gonzalez-Vega, eds., American Mathematical Society,
Providence RI, 2003, pp. 83–100.
[43] S. Prajna, A. Papachristodoulou, P. Seiler, and P.A. Parrilo, SOSTOOLS:
Sum of squares optimization toolbox for MATLAB, 2004.
[44] G. Reid and L. Zhi, Solving polynomial systems via symbolic-numeric reduction
to geometric involutive form, Journal of Symbolic Computation, 44 (2009),
pp. 280–291.
[45] S. Roman, Advanced Linear Algebra, Vol. 135 of Graduate Texts in Mathematics,
Springer New York, third ed., 2008.
[46] A. Schrijver, Theory of linear and integer programming, Wiley, 1986.
[47] H. Sherali and W. Adams, A hierarchy of relaxations between the continuous
and convex hull representations for zero-one programming problems, SIAM
Journal on Discrete Mathematics, 3 (1990), pp. 411–430.
[48] N.Z. Shor, Class of global minimum bounds of polynomial functions, Cybernetics,
23 (1987), pp. 731–734.
[49] G. Stengle, A Nullstellensatz and a Positivstellensatz in semialgebraic geometry,
Mathematische Annalen, 207 (1973), pp. 87–97.
[50] H. Stetter, Numerical Polynomial Algebra, SIAM, 2004.
[51] L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Review, 38
(1996), pp. 49–95.
[52] L. Zhang, zchaff v2007.3.12. Available at http://www.princeton.edu/∼chaff/
zchaff.html, 2007.

www.it-ebooks.info
www.it-ebooks.info
MATRIX RELAXATIONS
IN COMBINATORIAL OPTIMIZATION
FRANZ RENDL∗
∗Institut für Mathematik, Alpen-Adria-Universität Klagenfurt, Austria.

Abstract. The success of interior-point methods for solving semidefinite optimization problems (SDP) has spurred interest in SDP as a modeling tool in various mathematical fields, in particular in combinatorial optimization. SDP can be viewed as a matrix relaxation of the underlying 0-1 optimization problem. In this survey, some of the main techniques for obtaining matrix relaxations of combinatorial optimization problems are presented. These are based either on semidefinite matrices, leading in general to tractable relaxations, or on completely positive or copositive matrices. Copositive programs are intractable, but they can be used to get exact formulations of many NP-hard combinatorial optimization problems. It is the purpose of this survey to show the potential of matrix relaxations.

Key words. Semidefinite optimization, lift-and-project, integer programming.

1. Introduction. Integer programming and nonlinear optimization developed and grew rather independently of one another for a long time. The theoretical basis of integer programming consisted essentially of polyhedral combinatorics and the algorithmic machinery for linear programming, while nonlinear optimization relies on local analysis based on vector calculus (Taylor expansion, steepest descent principle, etc.). In the last 15 years these two fields have mutually opened up, and today interior-point methods are a widely accepted tool in integer programming, while the modeling power of 0-1 decision variables in an otherwise continuous setting substantially expands the flexibility of real-world modeling.
In this article we explore the idea of matrix liftings joining integer and nonlinear optimization. We first introduce the 0-1 formulation of an abstract combinatorial optimization problem (COP), given as follows. Let E be a finite set and let F be a (finite) family of subsets of E. The elements F ∈ F represent the feasible solutions of (COP). Each e ∈ E has a given integer cost c_e. We define the cost c(F) of F ∈ F to be c(F) := Σ_{e∈F} c_e. The problem (COP) now consists in finding a feasible solution F of minimum cost:

(COP)   z∗ = min{c(F) : F ∈ F}.

The 0-1 model of (COP) is obtained by assigning to each F ∈ F a characteristic vector x_F ∈ {0, 1}^{|E|} with (x_F)_e = 1 if and only if e ∈ F. We can write (COP) as a linear program as follows. Let

P := conv{x_F : F ∈ F}

denote the convex hull of the characteristic vectors of feasible solutions. Then it is clear that

z∗ = min{c^T x_F : F ∈ F} = min{c^T x : x ∈ P}.

The first minimization is over the finite set F; the second one is a linear program. This is the basic principle underlying the polyhedral approach to solving combinatorial optimization problems. The practical difficulty lies in the fact that, in general, the polyhedron P is not easily available. We recall two classical examples to illustrate this point.
As a nice example, we consider first the linear assignment problem. For a given n × n matrix C = (c_ij), it consists of finding a permutation φ of N = {1, . . . , n} such that Σ_{i∈N} c_{iφ(i)} is minimized. The set of all such permutations is denoted by Π. In our general setting, we define the ground set to be E = N × N, the set of all ordered pairs (i, j). Feasible solutions are now given through permutations φ as F_φ ⊂ E with F_φ = {(i, φ(i)) : i ∈ N}. In this case the characteristic vector of F_φ is the permutation matrix X_φ given by (X_φ)_ij = 1 if and only if j = φ(i). Birkhoff's theorem tells us that the convex hull of the set of permutation matrices Π is the set of doubly stochastic matrices Ω = {X : Xe = X^T e = e, X ≥ 0} (for notation, we refer to the end of this section).
Theorem 1.1. conv{X_φ : φ ∈ Π} = Ω.
Hence we have a simple polyhedral description of P in this case. Therefore

min{Σ_i c_{iφ(i)} : φ ∈ Π} = min{⟨C, X⟩ : X ∈ Ω},

and the linear assignment problem can be solved as an ordinary linear program.
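To make this concrete, here is a minimal numerical sketch (an illustration added to this survey, not part of the original text), assuming SciPy is available. By Theorem 1.1 the LP over Ω has an optimal vertex that is a permutation matrix, and scipy.optimize.linear_sum_assignment returns exactly that optimal permutation; the cost matrix is arbitrary example data.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Small illustrative cost matrix (arbitrary data).
C = np.array([[4., 1., 3.],
              [2., 0., 5.],
              [3., 2., 2.]])

# By Birkhoff's theorem, min{<C,X> : X in Omega} is attained at a
# permutation matrix; the solver below computes that permutation phi.
rows, cols = linear_sum_assignment(C)
print("phi:", dict(zip(rows, cols)))        # i -> phi(i)
print("optimal cost:", C[rows, cols].sum())
```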
Unfortunately the set P does not always have such a nice description. As a second example we consider the stable set problem. Given a graph G = (V, E) with vertex set V = {1, . . . , n}, the problem is to find S ⊆ V such that no edge joins two vertices in S (such sets S are called stable) and |S| is maximized. The ground set here is V, and F consists of all subsets of V which are stable. The characteristic vectors x ∈ R^n of the stable sets can be characterized by x = (x_i) with x_i ∈ {0, 1} and

x_i + x_j ≤ 1 ∀ij ∈ E(G),   (1.1)

because no stable set can contain both i and j if ij ∈ E(G). A partial description of the convex hull of the characteristic vectors of stable sets is therefore given by

FSTAB(G) := {x ∈ R^n : x ≥ 0, x_i + x_j ≤ 1 ∀ij ∈ E(G)},   (1.2)

leading to the following linear programming relaxation:

max{e^T x : x ∈ FSTAB(G)}.

If we take G to be the 5-cycle C_5, we see that x = (1/2)e is feasible for FSTAB(C_5) with value 5/2, showing that this is indeed only a relaxation of the stable set problem.
The use of a computationally tractable partial description of P by linear inequalities, in combination with systematic enumeration such as branch and bound, has led to quite successful solution methods for a variety of combinatorial optimization problems like the traveling salesman problem (TSP); see for instance [40]. It turned out, however, that for some prominent NP-hard problems like Stable-Set or Max-Cut, this polyhedral approach was not as successful as one might have hoped in view of the results for TSP. It is the purpose of this article to describe matrix-based relaxations, which generalize the purely polyhedral methods and have the potential for stronger approximations of the original problem.
We conclude the introduction with a summary of the notation used
throughout.
Vectors and matrices: e denotes the vector of all ones of appropriate dimension, and J = ee^T is the all-ones matrix. I = (e_1, . . . , e_n) denotes the n × n identity matrix, so the e_i represent the standard unit vectors. We also use (δ_ij) := I, thereby defining the Kronecker delta δ_ij. The sum of the main diagonal entries of a square matrix A is called the trace, tr(A) = Σ_i a_ii. The inner product in R^n, as well as in the space of n × n matrices, is denoted by ⟨·, ·⟩. Hence ⟨a, b⟩ = a^T b for a, b ∈ R^n, and ⟨A, B⟩ = tr(A^T B) for matrices A, B. The Kronecker product of two matrices P, Q is the matrix consisting of all possible products of elements from P and Q, P ⊗ Q = (p_ij Q). For m ∈ R^n we define Diag(m) to be the diagonal matrix having m on the main diagonal; diag(M) is the vector containing the main diagonal elements of M. If X = (x_1, . . . , x_n) is a matrix with columns x_i, then vec(X) = (x_1^T, . . . , x_n^T)^T is the vector obtained by stacking the columns of X.
Sets and matrix cones: The standard simplex in R^n is given by Δ = {x ∈ R^n : e^T x = 1, x ≥ 0}. In this survey, we will put special emphasis on the following matrix cones:

N := {X : X ≥ 0}, the elementwise nonnegative matrices,
S⁺ := {X : X = X^T, a^T Xa ≥ 0 ∀a}, the positive semidefinite matrices,
C := {X : X = X^T, a^T Xa ≥ 0 ∀a ≥ 0}, the copositive matrices.

We also use the notation X ⪰ 0 to express X ∈ S⁺. If K is a cone in some finite-dimensional vector space R^d, then by definition the dual cone, denoted K∗, contains all elements from R^d having nonnegative inner product with all elements from K:

K∗ := {y ∈ R^d : ⟨x, y⟩ ≥ 0 ∀x ∈ K}.

It is well known, and easily shown, that both N and S⁺ are self-dual. The dual cone of C is the cone

C∗ = conv{aa^T : a ≥ 0} = {Y : ∃X ≥ 0, Y = XX^T}

of completely positive matrices. Membership in S⁺ can be checked in polynomial time, for instance through the existence (or non-existence) of the Cholesky decomposition. In contrast, it is NP-complete to decide whether a matrix does not belong to C, see [49]. While positive semidefinite matrices are covered in any reasonable textbook on advanced linear algebra, we refer the reader to [4] for a thorough treatment of completely positive matrices and to [33] for a recent survey on copositive matrices.
Graphs: A graph G = (V, E) is given through its set of vertices V and its set of edges E. We sometimes write E(G) to indicate the dependence on G. If S ⊂ V, we denote by δ(S) := {uv ∈ E : u ∈ S, v ∉ S} the set of edges joining S and V \ S. We also say that the edges in δ(S) are cut by S.
2. Matrix relaxations: basic ideas. The classical polyhedral approach is formulated as a relaxation in R^n, the natural space in which to embed F. Here n denotes the cardinality of E, |E| = n. Matrix-based relaxations of (COP) are easiest explained as follows. To a feasible solution F ∈ F with characteristic vector x_F we associate the matrix x_F x_F^T and consider

M := conv{x_F x_F^T : F ∈ F},   (2.1)

see for instance [44, 61, 62]. Note that

diag(x_F x_F^T) = x_F,

because x_F ∈ {0, 1}^n. This property immediately shows that the original linear relaxation, obtained through a partial description of P, can also be modeled in this setting. The full power of matrix lifting is based on the possibility to constrain M to matrix cones other than polyhedral ones. Moreover, quadratic constraints on x_F turn into linear constraints on matrices in M.
If K is some matrix cone, and matrices C, A_1, . . . , A_m and b ∈ R^m are given, the problem

inf{⟨C, X⟩ : ⟨A_i, X⟩ = b_i, i = 1, . . . , m, X ∈ K}   (2.2)

is called a linear program over K. Linear programs over S⁺ are also called semidefinite programs (SDP), and those over C or C∗ are called copositive programs (CP) for short. In this paper we will mostly concentrate on SDP and CP relaxations of combinatorial optimization problems.
The duality theory of linear programming generalizes easily to conic linear programs. The (Lagrangian) dual associated with (2.2) is given as

sup{b^T y : C − Σ_i y_i A_i ∈ K∗}.   (2.3)

Weak duality (sup ≤ inf) holds by construction of the dual. Strong duality (sup = inf), as well as attainment of the respective optima, requires some sort of regularity of the feasible regions. We refer to Duffin [17] for the original paper, and to the handbook [70] on semidefinite programming for a detailed discussion of SDP. The existence of feasible points in the interior of the primal and dual cone ensures the following characterization of optimality. For ease of notation we write A(X) = b for the equations in (2.2). The linear operator A has an adjoint A^T, defined through the adjoint identity

⟨A(X), y⟩ = ⟨X, A^T(y)⟩.

We should point out that the inner product on the left is in R^m, while on the right it is in the space of n × n matrices. In this paper the inner products will always be canonical, so we do not overload the notation to distinguish them. The adjoint can be expressed as A^T(y) = Σ_i y_i A_i.
Theorem 2.1. [17, 70] Suppose there exists X_0 ∈ int(K) such that A(X_0) = b, and there is y_0 such that C − A^T(y_0) ∈ int(K∗). Then the optima in (2.2) and (2.3) are attained. Moreover, X and y are optimal if and only if A(X) = b, X ∈ K, Z := C − A^T(y) ∈ K∗, and the optimal objective values coincide, ⟨X, Z⟩ = 0.
Matrix relaxations can be used in several ways to better understand (COP). The seminal work of Goemans and Williamson [23] opened the way to new approximation techniques for some COPs. We will briefly explain them as we develop the various relaxations. From a computational point of view, SDP-based relaxations pose a serious challenge to existing algorithms for SDP. We will describe the currently most efficient ways to solve these relaxations (at least approximately).
There exist several recent survey papers devoted to the connection between semidefinite optimization and integer programming. The interested reader is referred to [39] for an extensive summary of the topic covering the development until 2003. The surveys by Lovász [43], Goemans [22] and Helmberg [29] all focus on the same topic, but also reflect the scientific interests and preferences of the respective authors. The present paper is no exception to this principle. The material selected, and also omitted, reflects the author's subjective view on the subject. It is a continuation and an extension of [57].

We are now going to look at several techniques to obtain matrix relaxations of combinatorial optimization problems. We start out with the generic idea, as explained in the introduction, and show how it works for graph partitioning.
3. Graph partition. Graph partition problems come in various formulations. The starting point is a graph G, given through its weighted n × n adjacency matrix A_G, or simply A. If ij ∈ E(G), then a_ij denotes the weight of edge ij; otherwise a_ij = 0. Hence A = A^T and diag(A) = 0. The Laplacian L_A of A, or L for short, is defined to be the matrix

L := Diag(Ae) − A.   (3.1)

The following simple properties of the Laplacian L will be used later on.
Proposition 3.1. The Laplacian L of the matrix A satisfies Le = 0, and A ≥ 0 implies that L ⪰ 0.
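As a quick numerical illustration of (3.1) and Proposition 3.1 (a sketch added here, with an arbitrary example matrix A):

```python
import numpy as np

# Weighted adjacency matrix of a small graph (arbitrary example data).
A = np.array([[0., 2., 1.],
              [2., 0., 3.],
              [1., 3., 0.]])
L = np.diag(A @ np.ones(3)) - A      # L = Diag(Ae) - A, see (3.1)

print(L @ np.ones(3))                # Le = 0
print(np.linalg.eigvalsh(L))         # all eigenvalues nonnegative: L is PSD
```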
Graph partition problems ask to separate the vertices of a graph into a specified number of partition blocks so that the total weight of the edges joining different blocks is minimized or maximized. Partition problems lead rather naturally to matrix-based relaxations, because encoding whether or not vertices i and j ∈ V are separated has a natural matrix representation, as we will shortly see. We recall the definition of the cut given by S ⊂ V: δ(S) = {uv ∈ E(G) : u ∈ S, v ∉ S}.
3.1. Max-k-Cut. For k ≥ 2, Max-k-Cut asks to partition V(G) into k subsets (S_1, . . . , S_k) such that the total weight of the edges joining distinct subsets is maximized. We introduce characteristic vectors s_i ∈ {0, 1}^n for each S_i. The n × k matrix S = (s_1, . . . , s_k) is called the k-partition matrix. Since ∪_i S_i = V, we have

Σ_{i=1}^k s_i = Se = e.

Partition matrices have the following properties.
Proposition 3.2. Let S = (s_1, . . . , s_k) be a k-partition matrix. Then diag(SS^T) = e and kSS^T − J ⪰ 0.
We prove a more general result which will also be of use later on. Its proof was pointed out by M. Laurent (Oberwolfach conference on Discrete Optimization, November 2008); see also [18], Lemma 2.
Lemma 3.1. Let s_1, . . . , s_k be a set of 0-1 vectors and λ_i ≥ 0 be such that Σ_{i=1}^k λ_i s_i = e, hence Σ_i λ_i = t > 0. Let M = Σ_i λ_i s_i s_i^T. Then diag(M) = e and tM − J ⪰ 0.
Proof. Consider

Σ_i λ_i (1, s_i^T)^T (1, s_i^T) = ( t   e^T
                                    e   M  ) ⪰ 0.

We note that diag(M) = Σ_i λ_i diag(s_i s_i^T) = Σ_i λ_i s_i = e. The Schur-complement lemma shows that M − (1/t) ee^T ⪰ 0.
Proposition 3.2 clearly follows with λ_i = 1. We recall that δ(S_i) denotes the set of edges joining S_i to V \ S_i. A simple calculation using basic properties of the Laplacian L shows that

s_i^T L s_i = Σ_{uv∈δ(S_i)} a_uv   (3.2)

gives the weight of all edges cut by S_i. Therefore the total weight of all edges joining distinct subsets is given by

(1/2) Σ_i s_i^T L s_i = (1/2) ⟨S, LS⟩.

The factor 1/2 comes from the fact that an edge uv ∈ E(G) with u ∈ S_i and v ∈ S_j appears in both δ(S_i) and δ(S_j). Thus Max-k-Cut can be modeled as

max (1/2) ⟨S, LS⟩

such that the n × k matrix S has entries 0 or 1 and Se = e. After replacing SS^T by Y, we get the following SDP relaxation:

z_GP-k := max{(1/2) ⟨L, Y⟩ : diag(Y) = e, kY − J ∈ S⁺, Y ∈ N}.   (3.3)

The conditions diag(Y) = e and kY − J ∈ S⁺ are derived from Proposition 3.2. Note in particular that Y ⪰ 0 is implied by kY ⪰ J ⪰ 0. The standard SDP formulation, see [19, 16], is obtained through the variable transformation

X = (1/(k−1)) [kY − J]

and yields, using ⟨J, L⟩ = 0,

max{((k−1)/(2k)) ⟨L, X⟩ : diag(X) = e, X ∈ S⁺, x_ij ≥ −1/(k−1)}.   (3.4)
3.2. Max-Cut. The special case of Max-k-Cut with k = 2 is usually simply called Max-Cut, as the task is to separate V into S and V \ S so as to maximize the weight of the edges in δ(S). In view of (3.2) we clearly have

z_MC = max{s^T Ls : s ∈ {0, 1}^n}.

Setting y = e − 2s, we have y ∈ {−1, 1}^n and, using Le = 0, we get

z_MC = max{(1/4) y^T Ly : y ∈ {−1, 1}^n}.   (3.5)

The following identity is a simple consequence of the definition of the Laplacian L through the adjacency matrix A, see (3.1):

(1/4) y^T Ly = Σ_{ij∈E(G)} a_ij (1 − y_i y_j)/2.   (3.6)

The resulting SDP relaxation becomes

max{(1/4) ⟨L, X⟩ : diag(X) = e, X ∈ S⁺}.   (3.7)

This model is identical to (3.4) with k = 2. We point out in particular that the sign constraint x_ij ≥ −1 is implied by X ⪰ 0 and diag(X) = e, and is hence redundant.
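For readers who want to experiment, here is a minimal sketch of (3.7) in Python, assuming the cvxpy package with an SDP-capable solver (such as SCS) is installed; the 5-cycle C_5 serves as arbitrary example input:

```python
import numpy as np
import cvxpy as cp

# Adjacency matrix and Laplacian of the 5-cycle C5.
n = 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# SDP relaxation (3.7): max 1/4 <L, X> s.t. diag(X) = e, X PSD.
X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Maximize(0.25 * cp.trace(L @ X)),
                  [cp.diag(X) == 1])
prob.solve()
print(prob.value)   # about 4.52, while the true max cut of C5 is 4
```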
3.3. k-Equicut. The following version of graph partition constrains the cardinalities of the partition blocks. In the simplest version of k-Equicut they are required to be equal to one another, |S_i| = n/k ∀i. Thus the column sums of S are n/k, i.e. S^T e = (n/k) e. We also have Se = e, because each vertex is in exactly one partition block. From this it follows that SS^T e = (n/k) e, so n/k is the eigenvalue of SS^T for the eigenvector e. The relaxation from (3.3) therefore leads to

min{(1/2) ⟨L, X⟩ : diag(X) = e, Xe = (n/k) e, X ≥ 0, X ⪰ 0}.

In the context of Equicut, one is often interested in minimizing the total weight of the edges cut. It is well known that without the cardinality constraints on the partition blocks, one can find the minimum cut (in the bisection case k = 2) using maximum flows. We also note that X ⪰ 0 together with Xe = (n/k) e implies X = (1/k)J + Σ_i λ_i u_i u_i^T with eigenvalues λ_i ≥ 0 and eigenvectors u_i ⊥ e. Therefore kX − J ⪰ 0, as in (3.3), is implied. Further modeling ideas for Equicut using SDP can be found for instance in [36]. Applications of Equicut in telecommunication, and some computational experience with k-Equicut, are shown in [41].
3.4. Approximation results for graph partition. The SDP relaxations for Max-Cut and Max-k-Cut can be used to get the following polynomial-time approximations. The key idea underlying this approach was introduced by Goemans and Williamson [23] and consists of the following geometric construction. A feasible solution X of (3.4) or (3.7) has a Gram representation X = (x_ij) with x_ij = v_i^T v_j. The constraint diag(X) = e implies that the v_i are unit vectors.
Let us first consider Max-Cut. Goemans and Williamson [23] interpret the Max-Cut problem as finding an embedding v_i of the vertices i in the unit sphere in R^1, hence v_i ∈ {−1, 1}, such that (1/4) Σ_{ij} l_ij v_i^T v_j is maximized, see (3.5).
The optimal solution of the relaxation (3.7) gives such an embedding in R^d, where d is the rank of an optimal X. Clearly 1 ≤ d ≤ n. How should we get a (bi)partition of approximately maximum weight? Goemans and Williamson propose the following simple but powerful hyperplane rounding trick.
Take a random hyperplane H through the origin, and let S ⊆ V be the set of vertices on one side of H. The probability that H separates i and j is proportional to the angle between v_i and v_j and is given by

(1/π) arccos(v_i^T v_j).

In [23] it is shown that

(1/π) arccos t ≥ α (1/2)(1 − t)

holds for −1 ≤ t ≤ 1 with α ≈ 0.87856. We therefore get the following performance bound for the expected value of the cut y obtained this way, provided a_ij ≥ 0:

Σ_{ij∈E(G)} a_ij (1 − y_i y_j)/2 ≥ α Σ_{ij} a_ij (1 − v_i^T v_j)/2 ≥ 0.87856 z_MC.

Note the use of (3.6). Later on, Nesterov [51] generalized this result to the case where only L ⪰ 0 is assumed. The analysis in this case shows that the expected value of the cut y obtained from hyperplane rounding satisfies

E[(1/4) y^T Ly] ≥ (2/π) z_MC ≈ 0.636 z_MC.
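In code, the rounding step is only a few lines. The following sketch (an illustration added here, not code from [23]) recovers Gram vectors from an optimal X of (3.7) via an eigendecomposition and tries several random hyperplanes, keeping the best cut found:

```python
import numpy as np

def hyperplane_rounding(X, L, trials=100, seed=0):
    """Goemans-Williamson rounding applied to a solution X of (3.7)."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.maximum(w, 0.0))).T    # Gram factor: X = V^T V
    best_y, best_val = None, -np.inf
    for _ in range(trials):
        r = rng.standard_normal(V.shape[0])    # random hyperplane normal
        y = np.where(V.T @ r >= 0, 1.0, -1.0)  # side of the hyperplane
        val = 0.25 * y @ L @ y                 # cut value, see (3.5)
        if val > best_val:
            best_y, best_val = y, val
    return best_y, best_val
```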
4 π
Frieze and Jerrum [19] generalize the hyperplane rounding idea to Max-k-Cut. Starting again from the Gram representation X = V^T V with unit vectors v_i forming V, we now take k independent random vectors r_1, . . . , r_k ∈ R^n for rounding. The idea is that partition block S_h contains those vertices i whose v_i is most parallel to r_h:

i ∈ S_h ⟺ v_i^T r_h = max{v_i^T r_l : 1 ≤ l ≤ k}.

Ties are broken arbitrarily. For the computation of the probability that two vertices end up in the same partition block, it is useful to assume that the entries of the r_i are drawn independently from the standard normal distribution:

Pr(v_s, v_t ∈ S_1) = Pr(v_s^T r_1 = max_i v_s^T r_i, v_t^T r_1 = max_i v_t^T r_i).

The symmetry properties of the normal distribution imply that this probability depends only on ρ = v_s^T v_t. We denote the resulting probability by I(ρ). Therefore

Pr(v_s and v_t not separated) = k I(ρ).

The computation of I(ρ) involves multiple integrals. A Taylor series expansion is used in [19] to derive the following estimate for the expected value of the cut given by the partition S from hyperplane rounding:

E[(1/2) ⟨S, LS⟩] ≥ α_k z_GP-k,

where α_2 = 0.87856 as for Max-Cut, α_3 ≈ 0.8327, α_4 ≈ 0.85. In [19], values for α_k are also provided for larger values of k. Later, these bounds on α_k were slightly improved, see [16]. It should be emphasized that the mathematical analysis underlying this simple rounding scheme involves rather subtle techniques from classical calculus to deal with the probability estimates leading to the final error bounds α_k; see also the monograph [14].
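The k-dimensional analogue of the rounding step can be sketched as follows (an illustrative implementation added here, with ties broken by argmax):

```python
import numpy as np

def k_cut_rounding(X, k, seed=0):
    """Frieze-Jerrum rounding for a solution X of the Max-k-Cut relaxation:
    vertex i joins the block h whose random vector r_h is most parallel to
    the Gram vector v_i."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.maximum(w, 0.0))).T   # columns v_i, with X = V^T V
    R = rng.standard_normal((V.shape[0], k))  # k independent Gaussian vectors
    return np.argmax(V.T @ R, axis=1)         # block index for every vertex
```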
4. Stable sets, cliques and coloring. The seminal work of Lovász [42] introduces a semidefinite program which can be interpreted both as a relaxation of the Max-Clique problem and as a relaxation of the Coloring problem.
4.1. Stable sets and cliques. The Stable-Set problem has already been described in the introduction. We denote by α(G) the stability number of G (the cardinality of a largest stable set in G). It is given as the optimal value of the following integer program:

α(G) = max{e^T x : x ∈ {0, 1}^n, x_i + x_j ≤ 1 ∀ij ∈ E(G)}.

We first observe that the inequalities could equivalently be replaced by

x_i x_j = 0 ∀ij ∈ E(G).

A clique in G is a subset of pairwise adjacent vertices. Moving from G to the complement graph Ḡ, which joins vertices i ≠ j whenever ij ∉ E(G), it is immediately clear that stable sets in G are cliques in Ḡ and vice versa.
In the spirit of matrix lifting, we introduce for a nonzero characteristic vector x of some stable set the matrix

X := (1/(x^T x)) xx^T.   (4.1)

These matrices satisfy

X ⪰ 0, tr(X) = 1, x_ij = 0 ∀ij ∈ E(G).

We also have ⟨J, X⟩ = (e^T x)²/(x^T x) = e^T x. We collect the equations x_ij = 0 ∀ij ∈ E(G) in the operator equation A_G(X) = 0. Therefore we get the semidefinite programming upper bound α(G) ≤ ϑ(G), where

ϑ(G) := max{⟨J, X⟩ : tr(X) = 1, A_G(X) = 0, X ⪰ 0}.   (4.2)

This is in fact one of the first relaxations for a combinatorial optimization problem based on SDP. It was introduced by Lovász [42] in 1979. This problem has been the starting point for many quite far-reaching theoretical investigations. It is beyond the scope of this paper to explain them in detail, but here are some key results.
Grötschel, Lovász and Schrijver [25] show that α(G) can be computed in polynomial time for perfect graphs. This is essentially a consequence of the tractability of computing ϑ(G), and of the fact that α(G) = ϑ(G) holds for perfect graphs G. We do not explain the concept of perfect graphs here, but refer for instance to [25, 60]. It is however a prominent open problem to provide a polynomial-time algorithm for computing α(G) on perfect graphs which is purely combinatorial (i.e., does not make use of ϑ(G)).
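Since (4.2) is just a linear SDP, it is easy to evaluate for small graphs. The following sketch (again assuming cvxpy with an SDP solver) computes ϑ for the 5-cycle, where Lovász's well-known result gives ϑ(C_5) = √5:

```python
import numpy as np
import cvxpy as cp

def theta(n, edges):
    """Lovasz theta function, see (4.2)."""
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, cp.trace(X) == 1]
    constraints += [X[i, j] == 0 for (i, j) in edges]       # A_G(X) = 0
    prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)  # <J, X>
    prob.solve()
    return prob.value

print(theta(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))  # ~ sqrt(5) = 2.236
```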
The Stable-Set problem provides a good example for approximations based on other matrix relaxations. Looking at (4.1), we can additionally ask that X ∈ N. In this case the individual equations x_ij = 0 ∀ij ∈ E(G) can be added up into the single equation Σ_{ij∈E(G)} x_ij = 0. If we use A_G for the adjacency matrix of G, this means

⟨A_G, X⟩ = 0.

Hence we get the following stronger relaxation, proposed independently by Schrijver [59] and McEliece et al. [47]:

α(G) ≤ max{⟨J, X⟩ : ⟨A_G, X⟩ = 0, tr(X) = 1, X ∈ S⁺ ∩ N} =: ϑ′(G).

In terms of matrix cones, we have moved from S⁺ to S⁺ ∩ N. We can go one step further. The matrix X from (4.1) is in fact completely positive, hence we also get

α(G) ≤ max{⟨J, X⟩ : ⟨A_G, X⟩ = 0, tr(X) = 1, X ∈ C∗}.   (4.3)

We will see shortly that the optimal value of the copositive program on the right-hand side is in fact equal to α(G). This result is implicitly contained in Bomze et al. [8] and was stated explicitly by de Klerk and Pasechnik [15]. A simple derivation can be obtained from the following theorem of Motzkin and Straus [48]. We recall that Δ := {x ∈ R^n : x ≥ 0, e^T x = 1}.
Theorem 4.1. [48] Let A_G be the adjacency matrix of a graph G. Then

1/α(G) = min{x^T (A_G + I)x : x ∈ Δ}.   (4.4)

The relation (4.4) implies in particular that

0 = min{x^T (A_G + I − (1/α) ee^T)x : x ∈ Δ} = min{x^T (A_G + I − (1/α) J)x : x ≥ 0}.

This clearly qualifies the matrix α(A_G + I) − J to be copositive. Therefore

inf{λ : λ(A_G + I) − J ∈ C} ≤ α(G).

But weak duality for conic linear programs also shows that

sup{⟨J, X⟩ : ⟨A_G + I, X⟩ = 1, X ∈ C∗} ≤ inf{λ : λ(A_G + I) − J ∈ C}.

Finally, any matrix X of the form (4.1) is feasible for the sup-problem, hence

α(G) ≤ sup{⟨J, X⟩ : ⟨A_G + I, X⟩ = 1, X ∈ C∗}.

Combining the last three inequalities, we see that equality must hold throughout; the infimum is attained at λ = α(G), and the supremum is attained at (4.1) with x being a characteristic vector of a stable set of size α(G). Hence we have shown the following result.
Theorem 4.2. [15] Let G be a graph. Then

α(G) = max{⟨J, X⟩ : ⟨A_G + I, X⟩ = 1, X ∈ C∗}.

This shows on one hand that copositive programs are intractable. It also shows, however, that models based on CP may be substantially stronger than SDP-based models. We will see more of this in some of the subsequent sections.
4.2. Coloring. Partitioning the vertex set V of a graph into stable sets is pictorially also called vertex coloring. Each partition block S_i receives a distinct 'color', and vertices having the same color are non-adjacent, because S_i is stable. We could for instance partition into singletons, so each vertex would get a distinct color. The chromatic number χ(G) is the smallest number k such that G has a k-partition into stable partition blocks.
If we let {S_1, . . .} denote the set of all stable sets of G, then the following integer program determines χ(G):

χ(G) = min{Σ_i λ_i : Σ_i λ_i x_i = e, λ_i ∈ {0, 1}},

where x_i denotes the characteristic vector of the stable set S_i. It should be observed that the number of variables λ_i is in general not polynomially bounded in |V(G)|. The fractional chromatic number χ_f(G) is obtained by allowing λ_i ≥ 0:

χ_f(G) = min{Σ_i λ_i : Σ_i λ_i x_i = e, λ_i ≥ 0}.   (4.5)

This is now a linear program with a possibly exponential number of variables. Computing χ_f(G) is known to be NP-hard; see for instance [60].
A semidefinite programming based lower bound on χ_f(G) can be obtained as follows, see Lovász [42] and Schrijver [60]. Let λ_i ≥ 0 be an optimal solution of (4.5), so χ_f(G) = Σ_i λ_i. Since Σ_i λ_i x_i = e, we can apply Lemma 3.1. The matrix

M = Σ_i λ_i x_i x_i^T   (4.6)
therefore satisfies

χ_f(G) M − J ∈ S⁺, diag(M) = e.

Moreover, since each x_i is the characteristic vector of a stable set in G, we also have m_uv = 0 ∀uv ∈ E(G), i.e. A_G(M) = 0. Therefore the optimal value of the following SDP is a lower bound on χ_f(G):

χ_f(G) ≥ min{t : diag(M) = e, A_G(M) = 0, tM − J ⪰ 0}.   (4.7)

Strictly speaking, this is not a linear SDP, because both t and M are variables, but it can easily be linearized by introducing a new matrix variable Y for tM and asking that diag(Y) = te. The resulting problem is the dual of

max{⟨J, X⟩ : ⟨I, X⟩ = 1, x_ij = 0 ∀ij ∉ E(G), X ⪰ 0},

which equals ϑ(Ḡ). Thus we have shown the Lovász 'sandwich theorem'. In [42], the weaker upper bound χ(G) is shown for ϑ(Ḡ), but it is quite clear that the argument goes through also with χ_f(G).
Theorem 4.3. [42] Let G be a graph. Then α(Ḡ) ≤ ϑ(Ḡ) ≤ χ_f(G).
Let us now imitate the steps leading from ϑ(G) to the copositive strengthening of the stability number from the previous section. The crucial observation is that M from (4.6) is completely positive, and therefore tM ∈ C∗.
This leads to the following conic problem, involving matrices both in S⁺ and C∗:

t∗ := min{t : diag(M) = e, A_G(M) = 0, tM − J ∈ S⁺, M ∈ C∗}.

Using again Lemma 3.1, we see that M from (4.6) is feasible for this problem, therefore

t∗ ≤ χ_f(G).

In [18] it is in fact shown that equality holds.
Theorem 4.4. [18] The optimal value t∗ of the above SDP-CP relaxation of the chromatic number is equal to χ_f(G).
This shows again the strength of modeling with copositive programs. New relaxations of the chromatic number based on graph products have very recently been introduced in [26, 27, 28]. It is beyond the scope of this introductory survey to elaborate on this approach.

4.3. Approximation results for coloring. Similarly to Max-k-Cut, we can use the SDP relaxations of coloring to derive a vertex partition using hyperplane rounding. An additional complication comes from the fact that the partition blocks have to be stable sets. The first groundbreaking results were obtained by Karger et al. [35]. We briefly explain some of the ideas for the case of graphs G having χ(G) = 3. This may seem artificial, but simply knowing that χ(G) = 3 does not help much. In fact, finding a 4-coloring of a 3-colorable graph is NP-hard, see [37].
Wigderson [68] observes that if the largest degree d_max of a graph on n vertices is large, d_max > √n, then 2 colors suffice to color the neighbourhood of the vertex with largest degree, thereby legally coloring at least √n vertices. If d_max ≤ √n, then the graph can be colored with √n + 1 colors. This yields a simple algorithm that colors any 3-colorable graph with at most 3√n colors, see [68].
In [35], the SDP underlying ϑ(G) is used for hyperplane rounding. The key observation is that a carefully chosen rounding procedure can be used to produce with high probability a stable set of size Õ(n/d_max^{1/3}). The Õ notation ignores polylogarithmic terms. This leads to a coloring with Õ(n^{1/4}) colors. Based on this approach, Blum and Karger [6] refine the analysis and end up with colorings using Õ(n^{3/14}) colors. Note that 3/14 ≈ 0.2143. Very recently, these results were further improved by Arora et al. [2] to at most Õ(n^{0.2111}) colors. The derivation of these estimates is rather complex. We refer to the recent dissertation [13] for a detailed discussion of hyperplane rounding for coloring.

5. Bandwidth of graphs. The minimum bandwidth problem of a graph G can be interpreted as a reordering problem of the vertices of G such that the nonzero entries of the adjacency matrix (after reordering) are collected within a band of small width around the main diagonal. Formally, we denote the vertices of G again by V = {1, . . . , n}. For a permutation φ ∈ Π, the bandwidth bw(φ) of φ is defined as

bw(φ) := max{|φ(i) − φ(j)| : ij ∈ E(G)}.

The bandwidth of G is the smallest of these values over all bijections φ : V → V:

bw(G) := min{bw(φ) : φ ∈ Π}.   (5.1)

Determining bw(G) is in general NP-hard, see [52]. It remains NP-hard even if G is restricted to be a tree with maximum degree 3, see [45].
The bandwidth problem has many applications. Consider for instance a sparse symmetric system of linear equations. Having an ordering of the system matrix with small bandwidth may result in a substantial computational speedup when actually solving the system.

In the following approximation approach, Blum et al. [7] formulate the bandwidth problem as an ordering of V on n equidistant points on a quarter-circle of radius n. Let

P_j := n (cos(jπ/2n), sin(jπ/2n)), j = 1, . . . , n,

denote these points. The problem

min{b : ∃φ ∈ Π such that v_i = P_φ(i), ‖v_i − v_j‖ ≤ b ∀ij ∈ E(G)}

is clearly equivalent to finding bw(G). Blum et al. relax the difficult part of bijectively mapping V to {P_1, . . . , P_n}. Let us define the constraints

‖v_i‖ = n ∀i ∈ V, ‖v_i − v_j‖ ≤ b ∀ij ∈ E.   (5.2)

Simply solving

min{b : the v_i satisfy (5.2)}

could be done using SDP by introducing X = V^T V ⪰ 0. The constraints (5.2) translate into

x_ii = n² ∀i, x_ii + x_jj − 2x_ij ≤ β ∀ij ∈ E,   (5.3)

and minimizing β would yield b = √β. Unfortunately, the optimum of this SDP has value β = 0, obtained by assigning each i to the same point, say P_1. To force some spreading of the v_i, we observe the following.
Proposition 5.1. Let i ∈ V and S ⊆ V \ {i}. Then

Σ_{j∈S} (i − j)² ≥ (|S|/6)(|S| + 1)(|S|/2 + 1) =: f(|S|).

Since ‖P_i − P_j‖ ≥ |i − j|, we can include these additional spread constraints

Σ_{j∈S} ‖v_i − v_j‖² ≥ f(|S|) ∀S ⊆ V, i ∈ V   (5.4)

into the SDP. Blum et al. [7] consider the following strengthened problem, which is equivalent to an SDP once we make the change of variables X = V^T V:

min{b : (v_i) satisfy (5.2), (5.4)}.   (5.5)

Even though there is an exponential number of these spread constraints, their separation can easily be argued to be possible in polynomial time by sorting, for fixed i, the remaining vertices j ∈ V \ {i} in increasing order of ‖v_i − v_j‖²; a violated constraint, if one exists, then appears among the prefix sets of this ordering (see the sketch below).
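The following sketch makes the sorting argument concrete (an illustrative implementation added here, not code from [7]); V is the d × n matrix whose columns are the vectors v_j, and f is the bound from Proposition 5.1:

```python
import numpy as np

def f(s):
    # f(|S|) = (|S|/6)(|S| + 1)(|S|/2 + 1), cf. Proposition 5.1
    return (s / 6.0) * (s + 1) * (s / 2.0 + 1)

def violated_spread_constraint(V, i):
    """Separation for (5.4) at fixed i: among all sets S of size t, the
    prefix of the distance-sorted vertices minimizes the left-hand side,
    so only these n-1 candidate sets need to be checked."""
    d2 = np.sum((V - V[:, [i]]) ** 2, axis=0)       # ||v_i - v_j||^2
    order = [j for j in np.argsort(d2) if j != i]   # increasing distance
    prefix = 0.0
    for t, j in enumerate(order, start=1):
        prefix += d2[j]
        if prefix < f(t) - 1e-9:                    # violated constraint found
            return order[:t]
    return None                                     # all spread constraints hold
```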

Having an optimal solution b, v_1, . . . , v_n of the SDP, Blum et al. apply the hyperplane rounding idea as follows. Take a random line through the origin and project the v_i onto this line. The resulting permutation of the vertices is shown to satisfy the following estimate.
Theorem 5.1. [7] Let G be a graph and let b, v_1, . . . , v_n denote the optimal solution of (5.5). The ordering φ produced by projecting the v_i onto a random line through the origin satisfies

bw(φ) ≤ O(√(n/b) log(n)) bw(G)

with high probability.
The proof of this result is rather long and technical and is omitted here.
We are now going to describe another modeling approach which can be used to bound bw(G). Let G be a connected graph, given through its adjacency matrix A ≥ 0. We now consider 3-partitions (S_1, S_2, S_3) of V having prescribed cardinalities |S_i| = m_i. Let us denote the set of edges joining S_1 and S_2 by δ(S_1, S_2). If we set

B = ( 0 1 0
      1 0 0
      0 0 0 )

and use the characteristic vectors s_i for S_i, we get the total weight of the edges in δ(S_1, S_2) as

s_1^T A s_2 = (1/2) ⟨S, ASB⟩.

Minimizing this cost function over 3-partitions of prescribed cardinalities m = (m_1, m_2, m_3) may look artificial, but we will see that it provides a handle on several graph optimization problems, in particular the bandwidth problem. So let us consider

z_3gp := min{(1/2) ⟨S, ASB⟩ : Se = e, S^T e = m, S ≥ 0, S^T S = M},   (5.6)

where we set M = Diag(m). This is a nonconvex quadratic optimization problem, hence intractable. We leave it to the reader to verify that feasible matrices S of this problem are integral. We can use this problem as follows. If z_3gp = 0, then the graph underlying A has a vertex separator of size m_3. (We recall that S ⊂ V is a vertex separator if the removal of S disconnects the graph.) If z_3gp > 0, then the bandwidth of A is at least m_3 + 1, see Helmberg et al. [31]. The same conclusion could also be drawn from any positive lower bound on z_3gp. In [31], such a lower bound was derived using eigenvalue techniques, on the basis that the columns of S are pairwise orthogonal. Povh and Rendl [54] consider the matrix lifting

M_3gp := conv{ss^T : s = vec(S), S is a 3-partition with |S_i| = m_i}.   (5.7)

It turns out that M_3gp can be represented as the intersection of a set of linear equations with the cone C∗ of completely positive matrices. To see how these equations are derived, we start out with the quadratic equation S^T S = M. At this point the Kronecker product of two matrices P and Q, given by P ⊗ Q := (p_ij Q), is extremely useful. If P, Q and S have suitable size, and s = vec(S), the following identity is easy to verify:

⟨S, PSQ⟩ = ⟨Q^T ⊗ P, ss^T⟩.   (5.8)

Using it, we see that (S^T S)_ij = e_i^T S^T S e_j = ⟨e_j e_i^T ⊗ I, ss^T⟩. The symmetry of ss^T allows us to replace e_j e_i^T by B_ij := (1/2)(e_j e_i^T + e_i e_j^T). Therefore (S^T S)_ij = M_ij becomes

⟨B_ij ⊗ I, Y⟩ = M_ij,   (5.9)

where Y ∈ M_3gp. In a similar way we get

⟨J_3 ⊗ e_i e_i^T, Y⟩ = 1, 1 ≤ i ≤ n,   (5.10)

by squaring the n equations Se = e. Pairwise multiplication of the constraints S^T e = m gives

⟨B_ij ⊗ J_n, Y⟩ = m_i m_j, 1 ≤ i ≤ j ≤ 3.   (5.11)

Finally, elementwise multiplication of Se = e with S^T e = m gives

⟨e_i e^T ⊗ e e_j^T, Y⟩ = m_i, 1 ≤ i ≤ 3, 1 ≤ j ≤ n.   (5.12)

The following result is shown in [54].
Theorem 5.2.

M_3gp = {Y : Y ∈ C∗, Y satisfies (5.9), (5.10), (5.11), (5.12)}.

Therefore z_3gp could be determined as the optimal value of the (intractable) copositive program

z_3gp = min{(1/2) ⟨B ⊗ A, Y⟩ : Y ∈ C∗, Y satisfies (5.9), (5.10), (5.11), (5.12)}.

We get a tractable relaxation by asking Y ∈ S⁺ instead of Y ∈ C∗. Further details can be found in [54] and in the dissertation [53]. While the model investigated in [7] is based on n × n matrices, it has to be emphasized that this last model uses 3n × 3n matrices, and hence is computationally significantly more demanding.
6. Quadratic assignments. The quadratic assignment problem, QAP for short, is a generalization of the (linear) assignment problem briefly described in the introduction. It consists of minimizing a quadratic function over the set of permutation matrices {X_φ : φ ∈ Π}. The general form of QAP, for a given symmetric n² × n² matrix Q, is as follows:

z_QAP := min{x_φ^T Q x_φ : x_φ = vec(X_φ), φ ∈ Π}.
In applications, Q is often of the form Q = B ⊗ A, where A and B are symmetric n × n matrices. In this case the objective function has a representation in terms of n × n matrices, see (5.8):

x_φ^T (B ⊗ A) x_φ = ⟨X_φ, A X_φ B⟩.

We refer to the recent monograph [12] by Burkard et al. for further details on assignment problems.
To get a handle on QAP, we consider the matrix lifting

M_QAP := conv{x_φ x_φ^T : x_φ = vec(X_φ), φ ∈ Π}.

We will now see that M_QAP has a 'simple' description by linear equations intersected with the cone C∗. Matrices Y ∈ M_QAP are of order n² × n². It will be useful to consider the following partitioning of Y into n × n block matrices Y^{i,j}:

Y = ( Y^{1,1} . . . Y^{1,n}
      ...          ...
      Y^{n,1} . . . Y^{n,n} ).

Let X = (x_1, . . . , x_n) be a permutation matrix (with columns x_i). Then X can be characterized by X ≥ 0, X^T X = XX^T = I. These quadratic constraints translate into linear constraints on Y:

XX^T = Σ_i x_i x_i^T = Σ_i Y^{i,i} = I.

Similarly, (X^T X)_ij = x_i^T x_j = tr(x_i x_j^T) = tr(Y^{i,j}) = δ_ij. Finally, we have (Σ_ij x_ij)² = n² for any permutation matrix X. We get the following set of constraints for Y:

Σ_i Y^{i,i} = I, tr(Y^{i,j}) = δ_ij, ⟨J, Y⟩ = n².   (6.1)
i

Povh and Rendl [55] show the following characterization of MQAP , which
can be viewed as a lifted version of Birkhoff’s theorem 1.1.
Theorem 6.1. MQAP = {Y : Y ∈ C ∗ , Y satisfies (6.1)}.
It is not hard to verify that the above result would be wrong without
the seemingly redundant equation "J, Y # = n2 . We can therefore formulate
the quadratic problem QAP as a (linear but intractable) copositive program

zQAP = min{"Q, Y # : Y ∈ C ∗ , Y satisfies (6.1)}.

In [55] some semidefinite relaxations based on this model are investigated


and compared to previously published SDP relaxations of QAP.
We have now seen several instances of combinatorial optimization
problems, where CP relaxations in fact gave the exact value. This raises the

www.it-ebooks.info
MATRIX RELAXATIONS 501

question whether there is some general principle behind this observation.


Burer [10] gives a rather general answer and shows that an appropriate
reformulation of quadratic programs is equivalent to a linear copositive
program.
Theorem 6.2. [10] Let c and a_j be vectors from R^n, b ∈ R^k, Q ∈ S_n and I ⊆ {1, . . . , n}. The optimal values of the following two problems are equal:

min{x^T Qx + c^T x : a_j^T x = b_j, x ≥ 0, x_i ∈ {0, 1} ∀i ∈ I},

min{⟨Q, X⟩ + c^T x : a_j^T x = b_j, a_j^T X a_j = b_j², X_ii = x_i ∀i ∈ I,
    ( 1  x^T
      x  X  ) ∈ C∗}.

7. Optimal mixing rate of Markov chains. In the previous sections we focused on various ways to get matrix liftings of NP-hard optimization problems. We conclude now with an SDP model which optimizes the mixing rate of a finite Markov chain. Let G be a connected graph. We consider random walks on the vertex set V(G). Suppose that we can either stay at i ∈ V(G) or move to j, provided ij ∈ E(G). Suppose further that the transition probabilities p_ij are symmetric, p_ij = p_ji ∀ij. The resulting Markov chain is then described by a transition matrix P satisfying

P ∈ P_G := {P : P = P^T, P ∈ Ω, p_ij = 0 ∀ij ∉ E(G)}.

Finally, we assume that P is primitive (aperiodic and irreducible). This means there exists some k such that P^k > 0. Let us first recall the Perron-Frobenius theorem for primitive matrices.
Theorem 7.1. Let P be a nonnegative primitive square matrix. Then the spectral radius of P is a simple eigenvalue of P with eigenvector x > 0.
We denote by π(t) ∈ R^n the probability distribution on V at time t. The definition of the transition probabilities in P implies that π(t + 1) = P^T π(t). Symmetry of P therefore shows that π(t) is determined from the initial distribution π(0) through π(t) = P^t π(0). We denote the eigenvalues of P by

1 = λ_1(P) > λ_2(P) ≥ . . . ≥ λ_n(P) > −1.

The Perron-Frobenius theorem tells us that (1/n) e is the eigenvector of the eigenvalue 1, which is also the spectral radius of P. Therefore

lim_{t→∞} π(t) = (1/n) e.

What can be said about the speed of convergence? There are several ways to measure the distance of π(t) from the equilibrium distribution π(∞) = π = (1/n) e. One such measure is the maximum relative error at time t,

r(t) := max_{ij} |(P^t)_ij − π_j| / π_j,

see for instance [3]. Let

μ_P := max{λ_2(P), −λ_n(P)} = max{|λ_i(P)| : i > 1}

denote the second largest eigenvalue of P in modulus (SLEM). It is well known that μ_P is closely related to how fast π(t) converges to the equilibrium distribution (1/n) e.
Theorem 7.2. [3] Let P be a symmetric irreducible transition matrix. Then

r(t) ≤ n (μ_P)^t.

Moreover, r(t) ≥ (μ_P)^t if t is even.
Given the graph G, we can ask the question of selecting the transition probabilities p_ij > 0 for ij ∈ E(G) in such a way that the mixing rate of the resulting Markov chain is as fast as possible. In view of the bounds from the previous theorem, it makes sense to consider the following optimization problem (in the matrix variable P), see Boyd et al. [9]:

min{μ_P : P ∈ P_G}.

They show that μ_P is in fact a convex function of P. It follows from the Perron-Frobenius theorem and the spectral decomposition theorem for symmetric matrices that μ_P is the absolute value of either the smallest or the largest eigenvalue of P − (1/n)J. Hence we can determine μ_P as the solution of the following SDP, see [9]:

min{s : sI ⪰ P − (1/n)J ⪰ −sI, P ∈ P_G}.   (7.1)

The variables are s and the matrix P. In [9] it is shown that an optimal choice of P may significantly increase the mixing rate of the resulting chain. Some further extensions of this idea are discussed in [64].
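A compact sketch of (7.1), again assuming cvxpy, is given below; the path on four vertices serves as arbitrary example input, and the constraint set P_G is written out directly (diagonal entries, i.e. staying at a vertex, are allowed):

```python
import numpy as np
import cvxpy as cp

def fastest_mixing_chain(n, edges):
    """SDP (7.1): minimize the SLEM of P over P_G, cf. Boyd et al. [9]."""
    allowed = set(edges) | {(j, i) for (i, j) in edges} | {(i, i) for i in range(n)}
    P = cp.Variable((n, n), symmetric=True)
    s = cp.Variable()
    J = np.ones((n, n)) / n
    cons = [P >= 0, cp.sum(P, axis=1) == 1,
            P - J << s * np.eye(n), P - J >> -s * np.eye(n)]
    cons += [P[i, j] == 0 for i in range(n) for j in range(i + 1, n)
             if (i, j) not in allowed]
    cp.Problem(cp.Minimize(s), cons).solve()
    return s.value, P.value

mu, P = fastest_mixing_chain(4, [(0, 1), (1, 2), (2, 3)])  # path graph
print(mu)
```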
8. Computational progress. Up to now we have mostly concentrated on the modeling power of SDP and CP. From a practical point of view, it is also important to investigate the algorithmic possibilities for actually solving the relaxations. Solving copositive programs is at least as hard as general integer programming, hence we only consider solving SDP, which is tractable. Before we describe various algorithms to solve SDP, we recall the basic assumption from Theorem 2.1:

∃ X_0 ≻ 0, y_0 such that A(X_0) = b, C − A^T(y_0) ≻ 0.   (8.1)

In this case (X, y, Z) is optimal for the primal-dual pair

min{⟨C, X⟩ : A(X) = b, X ⪰ 0} = max{b^T y : C − A^T(y) = Z ⪰ 0}

if and only if

A(X) = b, A^T(y) + Z = C, X ⪰ 0, Z ⪰ 0, ⟨X, Z⟩ = 0.   (8.2)

We are now going to briefly describe several classes of algorithms for solving SDP and point out their strengths and limitations.
8.1. Interior-point methods. There exist several quite in-depth descriptions of primal-dual interior-point path-following methods for SDP. We refer to [67, 63, 14] and the SDP handbook [70]. The website http://plato.asu.edu/bench.html maintains a collection of software and various data sets, and provides benchmark comparisons of several competing software packages for solving SDP.
We therefore explain here only the basic ideas. First, it follows from X ⪰ 0, Z ⪰ 0, ⟨X, Z⟩ = 0 that in fact XZ = 0. Interior-point (path-following) methods can be viewed as solving a sequence of problems parametrized by μ ≥ 0. Consider the set Pμ, defined as follows:

Pμ := {(X, y, Z) : A(X) = b, Z + A^T(y) = C, X ⪰ 0, Z ⪰ 0, ZX = μI}.

Clearly, P_0 ≠ ∅ if (8.1) holds. It can in fact be shown that Pμ consists of a unique point (X_μ, y_μ, Z_μ) for any μ > 0 if and only if (8.1) holds; see for instance Theorem 10.2.1 in [70]. In this case the set {(X_μ, y_μ, Z_μ) : μ ≥ 0} defines a smooth curve, parametrized by μ > 0. Following this path until μ ≈ 0 clearly leads to an optimal solution of the SDP. This is the basic idea underlying interior-point methods. There is quite a variety of different approaches to achieve this goal. Todd [65] gives a detailed summary of popular variants for solving SDP by path-following methods.
The crucial step in all these variants consists of the following. Given a current iterate (X_k, y_k, Z_k) with X_k ≻ 0 and Z_k ≻ 0 and a target path parameter μ_k > 0, we use the Newton method to determine a search direction (ΔX, Δy, ΔZ) towards the point in P_{μ_k}. If there are m equations in the primal problem, so b ∈ R^m, this amounts to setting up and solving a (dense) linear system of order m to determine Δy. To set up this system, and to recover ΔX and ΔZ, some additional computational effort is necessary, involving matrix operations (multiplication, inversion) with matrices of order n. Having the search direction, one needs to test whether the full Newton step is feasible (X_k + ΔX ≻ 0 and Z_k + ΔZ ≻ 0). If not, some sort of backtracking strategy is used to find a smaller steplength leading to a new iterate in the interior of S⁺. Then a new (smaller) target value for μ is selected, and the process is iterated until μ ≈ 0 and the current iterates are primal and dual feasible.
Table 1. Interior-point computation times to solve (3.7) with relative accuracy 10⁻⁶. Here m = n.

n       time (secs.)
1000    12
2000    102
3000    340
4000    782
5000    1570

Table 2. Interior-point computation times to solve (4.2) with relative accuracy 10⁻⁶, for m = (n choose 2)/2 and m = 5n.

n     m = (n choose 2)/2   time (secs.)     n      m = 5n   time (secs.)
100   2488                 12               500    2500     14
150   5542                 125              1000   5000     120
200   9912                 600              1500   7500     410

The convergence analysis shows that under suitable parameter settings it takes O(√n) Newton iterations to reach a solution with the required accuracy. Typically, the number of such iterations is not too large, often only a few dozen, but both the memory requirements (a dense m × m matrix has to be handled) and the computation times grow rapidly with n and m. To give some impression, we provide in Table 1 some sample timings for solving the basic relaxation (3.7) of Max-Cut. It has m = n rather simple constraints x_ii = 1. We also consider computing the theta number ϑ(G) from (4.2), see Table 2. Here the computational effort is also influenced by the cardinality |E(G)|. We consider dense graphs (m = (n choose 2)/2) and sparse graphs (m = 5n). In the first case, the number n of vertices cannot be much larger than about 200; in the second case we can go to much larger graphs. Looking at these timings, it is quite clear that interior-point methods become impractical once n ≈ 3000 or m ≈ 5000.
There have been attempts to avoid working explicitly with the dense system matrix of order m. Toh [66], for instance, reports quite encouraging results for larger problems obtained by iteratively solving the linear system for the search direction. A principal drawback of this approach lies in the fact that the system matrix becomes ill-conditioned as one gets close to the optimum. This implies that high accuracy is not easily reachable. We also mention the approach of Kocvara and Stingl [38], which uses a modified 'barrier function' and also handles large-scale problems. Another line of research to overcome some of these limitations consists in exploiting sparsity in the data. We refer to [20, 50] for some first fundamental steps in this direction.

8.2. Projection methods. To overcome some of the computational bottlenecks of interior-point methods, we can exploit the fact that the projection of an arbitrary symmetric matrix M onto the cone of semidefinite matrices can be obtained through a spectral decomposition of M. More precisely, let M = Σ_i λ_i u_i u_i^T with pairwise orthogonal eigenvectors u_i. Then

argmin{‖M − X‖ : X ⪰ 0} = Σ_{i : λ_i > 0} λ_i u_i u_i^T =: M⁺,   (8.3)

see for instance [24].
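In code, this projection is a one-liner around an eigendecomposition (a small sketch, reused in a later example):

```python
import numpy as np

def proj_psd(M):
    """Projection (8.3) of a symmetric M onto the PSD cone: drop the
    negative eigenvalues in the spectral decomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.maximum(w, 0.0)) @ U.T
```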


A rather natural use of projection was recently proposed in [34] and
can be explained as follows. We recall the optimality conditions (8.2) and
observe that "X, Z# = 0 can be replaced by the linear equation bT y −
"C, X# = 0. Hence we can group the optimality conditions into the affine
linear constraints

(LP ) A(X) = b, (LD ) AT (y) + Z = C, (LC ) "C, X# − bT y = 0,

and the SDP conditions X  0, Z  0. The projection onto SDP is given


by (8.3). Projecting onto an affine space is also quite easy. Linear algebra
tells us that the projection ΠP (X) of a symmetric matrix X onto (LP ) is
given by

ΠP (X) := X − AT (AAT )−1 (A(X) − b),

and similarly, Z has the projection

ΠD (Z) := C + AT (AAT )−1 A(Z − C)

onto LD . Thus ΠD (Z) = C + AT (y) with y = (AAT )−1 A(Z − C). Finally,
the projection onto the hyperplane LC is trivial. Thus one can use alter-
nate projections to solve SDP. Take a starting point (X, y, Z), and project
it onto the affine constraints. This involves solving two linear equations
with system matrix AAT , which remains unchanged throughout. Then
project the result onto the SDP cone and iterate. This requires the spec-
tral decomposition of both X and Z.
This simple iterative scheme is known to converge slowly. In [34] some
acceleration strategies are discussed and computational results with m ≈
100, 000 are reported.
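Stripped of all acceleration strategies and of the objective hyperplane (L_C), the alternating projection idea applied to the primal constraints of, say, the Max-Cut relaxation (3.7) looks as follows (an illustrative sketch using proj_psd from above; it produces a feasible point only, not an optimizer):

```python
import numpy as np

def alternating_projections(n, iters=200, seed=0):
    """Alternate between the affine set {diag(X) = e} and the PSD cone."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, n))
    X = (X + X.T) / 2                  # start from a random symmetric matrix
    for _ in range(iters):
        np.fill_diagonal(X, 1.0)       # project onto {X = X^T : diag(X) = e}
        X = proj_psd(X)                # project onto the PSD cone, see (8.3)
    return X
```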
Another solution approach for SDP, using only the SDP projection and the solution of a linear equation with system matrix AA^T, is proposed by Povh et al. [56] and Malick et al. [46]. The approach from [34] can be viewed as maintaining A(X) = b, Z + A^T(y) = C and the zero-duality-gap condition b^T y = ⟨C, X⟩ while trying to get X and Z into S⁺. In contrast, the approach from [56, 46] maintains X ⪰ 0, Z ⪰ 0, ZX = 0 and tries to reach feasibility with respect to the linear equations. The starting point of this approach consists of looking at the augmented Lagrangian formulation of the dual SDP. Let

f_{σ,X}(y, Z) := b^T y − ⟨X, A^T(y) + Z − C⟩ − (σ/2) ‖A^T(y) + Z − C‖²   (8.4)

and consider

max_{y, Z⪰0} f_{σ,X}(y, Z).

Having (approximate) maximizers y, Z (for σ and X held constant), the augmented Lagrangian method, see [5], asks to update X by

X ← X + σ(Z + A^T(y) − C)   (8.5)

and to iterate until dual feasibility is reached. The special structure of the subproblem given by (8.4) allows us to interpret the update (8.5) differently. After introducing the Lagrangian L = f_{σ,X}(y, Z) + ⟨V, Z⟩ with respect to the constraint Z ⪰ 0, we get the following optimality conditions for maximizing (8.4):

∇_y L = b − A(X) − σ A(A^T(y) + Z − C) = 0,
∇_Z L = V − X − σ(A^T(y) + Z − C) = 0, V ⪰ 0, Z ⪰ 0, ⟨V, Z⟩ = 0.

The condition ∇_Z L = 0 suggests setting the new X equal to V, see (8.5). We note that L can also be written as

L = b^T y − (σ/2) ‖Z − (C − A^T(y) − (1/σ)X)‖² + (1/(2σ)) ‖X‖².

Therefore, at the optimum, Z must be the projection of W := C − A^T(y) − (1/σ)X onto S⁺, that is, Z = W⁺. Thus ∇_Z L = 0 becomes

V = σ(Z + (1/σ)X + A^T(y) − C) = σ(W⁺ − W) = −σW⁻,

where W⁻ is the projection of W onto −S⁺. This leads to the boundary point method from [56]. Given X and Z, solve ∇_y L = 0 for y:

AA^T(y) = (1/σ)(b − A(X)) + A(C − Z).

Then compute the spectral decomposition of W = C − A^T(y) − (1/σ)X, set the new iterates X = −σW⁻ and Z = W⁺, and repeat.
The computational effort of one iteration essentially consists of solving the linear system with matrix AA^T and computing a spectral factorization of W. Further details, such as the convergence analysis and the parameter updates, are described in [56, 46]. To show the potential of this approach, we compute ϑ(G) for

larger dense graphs with m = n²/4, see Table 3.

Table 3. Boundary-point computation times to solve (4.2) with relative accuracy 10⁻⁶.

n      m = n²/4   time (secs.)
400    40000      40
600    90000      100
800    160000     235
1000   250000     530
1200   360000     1140

It is clear from these results that projection methods extend the computational possibilities for SDP. Zhao et al. [71] recently generalized this approach to include second-order information in the updates. At higher computational cost, they obtain more accurate solutions.
The computational limitations of projection methods are determined on one hand by the need to compute a spectral decomposition of a symmetric matrix, which limits the matrix dimension similarly to interior-point methods. On the other hand, the system matrix AA^T does not change during the algorithm, and any sparsity properties of the input can therefore be fully exploited. In the case of computing the ϑ function, for instance, it turns out that AA^T is in fact diagonal. This is one explanation why instances with m beyond 100,000 are easily manageable by projection methods.
8.3. Further solution approaches for SDP. To avoid limits on the size of the primal matrices (imposed by the spectral decomposition or the linear algebra of interior-point methods), one can also use the encoding of semidefiniteness through the condition λ_min(X) ≥ 0. The computation of λ_min(X), for symmetric X, can be done iteratively for quite large matrices. The only requirement is that v = Xu can be evaluated. In particular, X need not be stored explicitly. The spectral bundle method exploits this fact and applies to SDP where (primal) feasibility implies a constant trace. It was introduced in [32], and we refer to that paper for all further details. It turns out that SDP with rather large matrices (n ≈ 10,000) can be handled by this method, but the convergence properties are much weaker than in the case of interior-point methods. Helmberg [30] describes computational results on a variety of large-scale combinatorial optimization problems.
Another idea to get rid of the semidefiniteness condition is to use the factorization X = RR^T and work with R, using algorithms from nonlinear optimization. Burer and Monteiro [11] investigate this approach and present some encouraging computational results for specially structured SDP like the Max-Cut relaxation (3.7). The drawback here is that by going from X to the factor R, one loses convexity.
Finally, SDP-based relaxations have been used successfully to get exact solutions. Exact solutions of the Max-Cut problem are reported in [58]

for instances having several hundred vertices; see also the thesis [69]. A combination of polyhedral and SDP relaxations for the bisection problem is studied in [1]; exact solutions of rather large sparse instances (n ≈ 1000) are obtained there for the first time. Finally, exact solutions for Max-k-Cut are given in [21].
Algorithms for linear optimization have reached a high level of sophistication, making them easily accessible even for non-experts. In contrast, most algorithms for SDP require a basic understanding of semidefinite optimization. While small problems (n ≈ 100) can be solved routinely, there is no standard way to solve medium-sized SDP in a routine manner. Finally, it is a challenging open problem to find efficient ways of optimizing over S⁺ ∩ N.

Acknowledgements. Many thanks go to Nathan Krislock and an
anonymous referee for giving numerous suggestions for improvement, and
for pointing out minor inconsistencies.

REFERENCES

[1] M. Armbruster, M. Fügenschuh, C. Helmberg, and A. Martin. A compara-
tive study of linear and semidefinite branch-and-cut methods for solving the
minimum graph bisection problem. In Integer Programming and Combinatorial
Optimization (IPCO 2008), pages 112–124, 2008.
[2] S. Arora, E. Chlamtac, and M. Charikar. New approximation guarantee for
chromatic number. In Proceedings of the 38th STOC, Seattle, USA, pages
215–224, 2006.
[3] E. Behrends. Introduction to Markov chains with special emphasis on rapid mix-
ing. Vieweg Advanced Lectures in Mathematics, 2000.
[4] A. Berman and N. Shaked-Monderer. Completely positive matrices. World
Scientific Publishing, 2003.
[5] D.P. Bertsekas. Constrained optimization and Lagrange multipliers. Academic
Press, 1982.
[6] A. Blum and D. Karger. An Õ(n^{3/14})-coloring algorithm for 3-colorable graphs.
Information Processing Letters, 61(1):49–53, 1997.
[7] A. Blum, G. Konjevod, R. Ravi, and S. Vempala. Semi-definite relaxations for
minimum bandwidth and other vertex-ordering problems. Theoretical Com-
puter Science, 235(1):25–42, 2000.
[8] I.M. Bomze, M. Dür, E. de Klerk, C. Roos, A.J. Quist, and T. Terlaky.
On copositive programming and standard quadratic optimization problems.
Journal of Global Optimization, 18(4):301–320, 2000.
[9] S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph.
SIAM Review, 46(4):667–689, 2004.
[10] S. Burer. On the copositive representation of binary and continuous nonconvex
quadratic programs. Mathematical Programming (A), 120:479–495, 2009.
[11] S. Burer and R.D.C. Monteiro. A nonlinear programming algorithm for solving
semidefinite programs via low-rank factorization. Mathematical Programming
(B), 95:329–357, 2003.
[12] R.E. Burkard, M. Dell’Amico, and S. Martello. Assignment problems. SIAM,
2009.
[13] E. Chlamtac. Non-local Analysis of SDP based approximation algorithms. PhD
thesis, Princeton University, USA, 2009.

[14] E. de Klerk. Aspects of semidefinite programming: interior point algorithms and
selected applications. Kluwer Academic Publishers, 2002.
[15] E. de Klerk and D.V. Pasechnik. Approximation of the stability number of a
graph via copositive programming. SIAM Journal on Optimization, 12:875–
892, 2002.
[16] E. de Klerk, D.V. Pasechnik, and J.P. Warners. On approximate graph colour-
ing and Max-k-Cut algorithms based on the ϑ-function. Journal of Combina-
torial Optimization, 8(3):267–294, 2004.
[17] R.J. Duffin. Infinite programs. Ann. Math. Stud., 38:157–170, 1956.
[18] I. Dukanovic and F. Rendl. Copositive programming motivated bounds on the
stability and chromatic numbers. Mathematical Programming, 121:249–268,
2010.
[19] A. Frieze and M. Jerrum. Improved approximation algorithms for Max k-Cut
and Max Bisection. Algorithmica, 18(1):67–81, 1997.
[20] M. Fukuda, M. Kojima, K. Murota, and K. Nakata. Exploiting sparsity in
semidefinite programming via matrix completion. I: General framework. SIAM
Journal on Optimization, 11(3):647–674, 2000.
[21] B. Ghaddar, M. Anjos, and F. Liers. A branch-and-cut algorithm based on
semidefinite programming for the minimum k-partition problem. Annals of
Operations Research, 2008.
[22] M.X. Goemans. Semidefinite programming in combinatorial optimization. Math-
ematical Programming, 79:143–162, 1997.
[23] M.X. Goemans and D.P. Williamson. Improved approximation algorithms for
maximum cut and satisfiability problems using semidefinite programming.
Journal of the ACM, 42:1115–1145, 1995.
[24] G. Golub and C.F. Van Loan. Matrix computations. 3rd ed. Baltimore, MD:
The Johns Hopkins Univ. Press., 1996.
[25] M. Grötschel, L. Lovász, and A. Schrijver. Geometric algorithms and com-
binatorial optimization. Springer, Berlin, 1988.
[26] N. Gvozdenović. Approximating the stability number and the chromatic number
of a graph via semidefinite programming. PhD thesis, University of Amster-
dam, 2008.
[27] N. Gvozdenović and M. Laurent. Computing semidefinite programming lower
bounds for the (fractional) chromatic number via block-diagonalization. SIAM
Journal on Optimization, 19(2):592–615, 2008.
[28] N. Gvozdenović and M. Laurent. The operator Ψ for the chromatic number of
a graph. SIAM Journal on Optimization, 19(2):572–591, 2008.
[29] C. Helmberg. Semidefinite programming. European Journal of Operational Re-
search, 137:461–482, 2002.
[30] C. Helmberg. Numerical evaluation of SBmethod. Mathematical Programming,
95:381–406, 2003.
[31] C. Helmberg, B. Mohar, S. Poljak, and F. Rendl. A spectral approach to
bandwidth and separator problems in graphs. Linear and Multilinear Algebra,
39:73–90, 1995.
[32] C. Helmberg and F. Rendl. A spectral bundle method for semidefinite program-
ming. SIAM Journal on Optimization, 10:673–696, 2000.
[33] J.-B. Hiriart-Urruty and A. Seeger. A variational approach to copositive
matrices. SIAM Review, 52(4):593–629, 2010.
[34] F. Jarre and F. Rendl. An augmented primal-dual method for linear conic
problems. SIAM Journal on Optimization, 20:808–823, 2008.
[35] D. Karger, R. Motwani, and M. Sudan. Approximate graph colouring by
semidefinite programming. Journal of the ACM, 45:246–265, 1998.
[36] S.E. Karisch and F. Rendl. Semidefinite Programming and Graph Equipartition.
Fields Institute Communications, 18:77–95, 1998.
[37] S. Khanna, N. Linial, and S. Safra. On the hardness of approximating the
chromatic number. Combinatorica, 20:393–415, 2000.

[38] M. Kocvara and M. Stingl. On the solution of large-scale SDP problems by the
modified barrier method using iterative solvers. Mathematical Programming,
95:413–444, 2007.
[39] M. Laurent and F. Rendl. Semidefinite programming and integer program-
ming. In K. Aardal, G.L. Nemhauser, and R. Weismantel, editors, Discrete
Optimization, pages 393–514. Elsevier, 2005.
[40] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (eds.).
The traveling salesman problem, a guided tour of combinatorial optimization.
Wiley, Chicester, 1985.
[41] A. Lisser and F. Rendl. Graph partitioning using Linear and Semidefinite Pro-
gramming. Mathematical Programming (B), 95:91–101, 2002.
[42] L. Lovász. On the Shannon capacity of a graph. IEEE Trans. Inform. Theory,
25:1–7, 1979.
[43] L. Lovász. Semidefinite programs and combinatorial optimization. In B. A. Reed
and C.L. Sales, editors, Recent advances in algorithms and combinatorics,
pages 137–194. CMS books in Mathematics, Springer, 2003.
[44] L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 opti-
mization. SIAM Journal on Optimization, 1:166–190, 1991.
[45] M. Garey, R. Graham, D. Johnson, and D.E. Knuth. Complexity results for
bandwidth minimization. SIAM Journal on Applied Mathematics, 34:477–495,
1978.
[46] J. Malick, J. Povh, F. Rendl, and A. Wiegele. Regularization methods for
semidefinite programming. SIAM Journal on Optimization, 20:336–356, 2009.
[47] R.J. McEliece, E.R. Rodemich, and H.C. Rumsey Jr. The Lovász bound and
some generalizations. Journal of Combinatorics and System Sciences, 3:134–
152, 1978.
[48] T.S. Motzkin and E.G. Straus. Maxima for graphs and a new proof of a theorem
of Turán. Canadian Journal of Mathematics, 17:533–540, 1965.
[49] K.G. Murty and S.N. Kabadi. Some NP-complete problems in quadratic and
nonlinear programming. Mathematical Programming, 39:117–129, 1987.
[50] K. Nakata, K. Fujisawa, M. Fukuda, K. Kojima, and K. Murota. Exploiting
sparsity in semidefinite programming via matrix completion. II: Implementa-
tion and numerical results. Mathematical Programming, 95:303–327, 2003.
[51] Y. Nesterov. Quality of semidefinite relaxation for nonconvex quadratic opti-
mization. Technical report, CORE, 1997.
[52] C.H. Papadimitriou. The NP-completeness of the bandwidth minimization prob-
lem. Computing, 16:263–270, 1976.
[53] J. Povh. Applications of semidefinite and copositive programming in combinato-
rial optimization. PhD thesis, University of Ljubljana, Slovenia, 2006.
[54] J. Povh and F. Rendl. A copositive programming approach to graph partitioning.
SIAM Journal on Optimization, 18(1):223–241, 2007.
[55] J. Povh and F. Rendl. Copositive and semidefinite relaxations of the quadratic
assignment problem. Discrete Optimization, 6(3):231–241, 2009.
[56] J. Povh, F. Rendl, and A. Wiegele. A boundary point method to solve semidef-
inite programs. Computing, 78:277–286, 2006.
[57] F. Rendl. Semidefinite relaxations for integer programming. In M. Jünger,
Th.M. Liebling, D. Naddef, G.L. Nemhauser, W.R. Pulleyblank, G. Reinelt,
G. Rinaldi, and L.A. Wolsey, editors, 50 Years of Integer Programming
1958–2008, pages 687–726. Springer, 2009.
[58] F. Rendl, G. Rinaldi, and A. Wiegele. Solving Max-Cut to optimality by inter-
secting semidefinite and polyhedral relaxations. Mathematical Programming,
121:307–335, 2010.
[59] A. Schrijver. A comparison of the Delsarte and Lovász bounds. IEEE Transac-
tions on Information Theory, IT-25:425–429, 1979.
[60] A. Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. B,
volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.

[61] H.D. Sherali and W.P. Adams. A hierarchy of relaxations between the continuous
and convex hull representations for zero-one programming problems. SIAM
Journal on Discrete Mathematics, 3(3):411–430, 1990.
[62] H.D. Sherali and W.P. Adams. A hierarchy of relaxations and convex hull
characterizations for mixed-integer zero-one programming problems. Discrete
Appl. Math., 52(1):83–106, 1994.
[63] J. Sturm. Theory and algorithms of semidefinite programming. In H. Frenk,
K. Roos, T. Terlaky, and S. Zhang, editors, High Performance Optimization,
pages 1–194. Springer Series on Applied Optimization, 2000.
[64] J. Sun, S. Boyd, L. Xiao, and P. Diaconis. The fastest mixing Markov process
on a graph and a connection to a maximum variance unfolding problem. SIAM
Review, 48(4):681–699, 2006.
[65] M.J. Todd. A study of search directions in primal-dual interior-point methods
for semidefinite programming. Optimization Methods and Software, 11:1–46,
1999.
[66] K.-C. Toh. Solving large scale semidefinite programs via an iterative solver on the
augmented systems. SIAM Journal on Optimization, 14:670–698, 2003.
[67] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review,
38:49–95, 1996.
[68] A. Wigderson. Improving the performance guarantee for approximate graph
colouring. Journal of the ACM, 30:729–735, 1983.
[69] A. Wiegele. Nonlinear optimization techniques applied to combinatorial opti-
mization problems. PhD thesis, Alpen-Adria-Universität Klagenfurt, Austria,
2006.
[70] H. Wolkowicz, R. Saigal, and L. Vandenberghe (eds.). Handbook of semidef-
inite programming. Kluwer, 2000.
[71] X. Zhao, D. Sun, and K. Toh. A Newton CG augmented Lagrangian method
for semidefinite programming. SIAM Journal on Optimization, 20:1737–1765,
2010.

A POLYTOPE FOR A PRODUCT OF
REAL LINEAR FUNCTIONS IN 0/1 VARIABLES
OKTAY GÜNLÜK∗ , JON LEE† , AND JANNY LEUNG‡

Abstract. In the context of integer programming, we develop a polyhedral method
for linearizing a product of a pair of real linear functions in 0/1 variables. As an ex-
ample, by writing a pair of integer variables in binary expansion, we have a technique
for linearizing their product. We give a complete linear description for the resulting
polytope, and we provide an efficient algorithm for the separation problem. Along the
way to establishing the complete description, we also give a complete description for an
extended-variable formulation, and we point out a generalization.

Key words. Integer nonlinear programming, polytope, product.

AMS(MOS) subject classifications. 52B11, 90C10, 90C57, 90C30.

1. Introduction. We assume familiarity with polyhedral methods of
linear integer programming (see [NW88], for example). There is a well-
known method of linear integer programming for modeling the product (i.e.,
logical AND) of a pair of binary variables. Specifically, the 0/1 solutions
of y = x1 x2 are, precisely, the extreme points of the polytope in R3 that is
the solution set of

    y ≥ 0 ;                    (1.1)
    y ≥ x1 + x2 − 1 ;          (1.2)
    y ≤ x1 ;                   (1.3)
    y ≤ x2 .                   (1.4)
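This equivalence is small enough to verify by brute force; the following Python snippet (ours, for illustration) checks all eight 0/1 points against (1.1)–(1.4).

```python
from itertools import product

# The 0/1 solutions of (1.1)-(1.4) are exactly the points with y = x1 * x2.
for x1, x2, y in product((0, 1), repeat=3):
    feasible = y >= 0 and y >= x1 + x2 - 1 and y <= x1 and y <= x2
    assert feasible == (y == x1 * x2)
```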

There has been extensive interest in the development of linear integer
programming methods for handling and/or exploiting products of many
0/1 variables. In particular, quite a lot is known about the facets of the
convex hull of the 0/1 solutions to yij = xi xj , 1 ≤ i < j ≤ n (the
Boolean quadric polytope); see [Pad89]. This polytope is related to other
well-studied structures such as the cut polytope [DS90] and the correlation
polytope [Pit91]. Also see [BB98], [DL97] and references therein. Of course,
we know less and less about the totality of the facets of the polytope as n
increases, because optimization over these 0/1 solutions is NP-hard.
Rather than considering pairwise products of many 0/1 variables, we
consider a single product of a pair of real linear functions in 0/1 variables.
For l = 1, 2, let kl be a positive integer, let Kl = {1, 2, . . . , kl }, let al be

∗ IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A. ([email protected]).
† Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, U.S.A. ([email protected]).
‡ Chinese University of Hong Kong ([email protected]).

a kl -vector of positive real numbers ali , and let xl be a kl -vector of binary
variables xli . Note that if we had any ali = 0, then we could just delete
such xli , and if we had any ali < 0, then we could just complement such xli
and apply our methods to the nonlinear part of the resulting product. So,
hereafter, we assume without loss of generality that ali are all positive.
Now, we let P (a1 , a2 ) be the convex hull of the solutions in Rk1 +k2 +1 of

    y = ( Σ_{i∈K1} a1i x1i ) ( Σ_{j∈K2} a2j x2j ) = Σ_{i∈K1} Σ_{j∈K2} a1i a2j x1i x2j ;    (1.5)

    xli ∈ {0, 1}, for i ∈ Kl , l = 1, 2 .                                                  (1.6)

We note that P ((1), (1)) is just the solution set of (1.1–1.4). Our goal
is to investigate the polytope P (a1 , a2 ) generally.
In Section 2, we describe an application to modeling a product of a
pair of nonnegative integer variables using binary expansion. In Section
3, we describe a linear integer formulation of P (a1 , a2 ). In Section 4, we
investigate which of our inequalities are facet describing. In Section 5,
we determine a complete polyhedral characterization of P (a1 , a2 ). In es-
tablishing this characterization, we also find an inequality characterization
of a natural extended-variable formulation. In Section 6, we demonstrate
how to solve the separation problem for the facet describing inequalities of
P (a1 , a2 ). In Section 7, we investigate some topological properties of real
points in the P (a1 , a2 ) that satisfy the product equation (1.5). In Section
8, we briefly describe a generalization of our results.
Our results were first reported in complete form in [CGLL03], which
in turn was developed from [CLL99]. Other related work, much of which
was undertaken or announced later, includes [BB98, AFLS01, HHPR02,
JKKW05, Kub05, CK05, ÇKM06, FFL05, Lee07].

2. Products of nonnegative integer variables. Let x1 and x2
be a pair of nonnegative integer variables. We can look directly at the
convex hull of the integer solutions of y = x1 x2 . This convex set, in R3 ,
contains integer points that do not correspond to solutions of y = x1 x2 .
For example, x1 = x2 = y = 0 is in this set, as is x1 = x2 = 2, y = 4,
but the average of these two points is not a solution of y = x1 x2 . More
concretely, the convex hull of the integer solutions to:

y = x1 x2 ;
0 ≤ x1 ≤ 2 ;
0 ≤ x2 ≤ 2

is precisely the solution set of

y≥0;
y ≥ 2x1 + 2x2 − 4 ;
y ≤ 2x1 ;
y ≤ 2x2 .

If we use these latter linear inequalities to model the product y, and then
seek to maximize y subject to these constraints and a side constraint x1 +
x2 ≤ 2, we find the optimal solution x1 = 1, x2 = 1, y = 2, which does
not satisfy y = x1 x2 . Therefore, this naïve approach is inadequate in the
context of linear integer programming.
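The failure is easy to reproduce by enumeration; the sketch below (Python; the names are ours) maximizes the modeled y over the integer points with x1 + x2 ≤ 2, using the fact that the four inequalities above cap y from above only by min{2x1, 2x2}.

```python
# Enumerate integer (x1, x2) with 0 <= x1, x2 <= 2 and x1 + x2 <= 2; the four
# inequalities allow any y with max(0, 2*x1 + 2*x2 - 4) <= y <= min(2*x1, 2*x2).
points = [(x1, x2) for x1 in range(3) for x2 in range(3) if x1 + x2 <= 2]
x1, x2 = max(points, key=lambda p: min(2 * p[0], 2 * p[1]))
y = min(2 * x1, 2 * x2)
assert (x1, x2, y) == (1, 1, 2) and y != x1 * x2   # modeled optimum, yet y != x1*x2
```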
We adopt an approach that avoids the specific problem above where
an integer infeasible point is caught in the convex hull of integer feasible
solutions. Specifically, we assume that, for practical purposes, x1 and x2
can be bounded above. So we can write the xl in binary expansion: xl =
Σ_{i∈Kl} 2^{i−1} xli , for l = 1, 2. That is, we let ali = 2^{i−1} . The only integer
points in P (a1 , a2 ) are the solutions of (1.5–1.6). Therefore, we avoid the
problem that we encountered when we did not use the binary expansions
of x1 and x2 .
3. Linear integer formulation. Obviously, for l = 1, 2, the simple
bound inequalities are valid for P (a1 , a2 ):

xli ≥ 0, for i ∈ Kl ; (3.1)


xli ≤ 1, for i ∈ Kl . (3.2)

We view the remaining inequalities that we present as lower and upper
bounds on the product variable y. In the sequel, H is a subset of Kl × Kl̄ ,
where l is either 1 or 2, and l̄ = 3 − l.
Consider the following lower bound inequalities for y:

    y ≥ Σ_{(i,j)∈H} a1i a2j ( x1i + x2j − 1 ) .              (3.3)

Proposition 3.1. The inequalities (3.3) are valid for P (a1 , a2 ).
Proof.

    y = Σ_{i∈K1} Σ_{j∈K2} a1i a2j x1i x2j
      ≥ Σ_{(i,j)∈H} a1i a2j x1i x2j
      ≥ Σ_{(i,j)∈H} a1i a2j ( x1i + x2j − 1 ) .

We derive upper bounds on y by considering certain transformations
φl , l = 1, 2, of the polytope P (a1 , a2 ), defined by the following formulae:

    x̃li = 1 − xli , for i ∈ Kl ;
    x̃l̄j = xl̄j , for j ∈ Kl̄ ;
    ỹ = Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j x̃li x̃l̄j .

Proposition 3.2. The transformation φl , l = 1, 2, is an affine
involution.
Proof. Clearly φl ◦ φl is the identity transformation. To see that it
is affine, we need just check that ỹ is an affine function of the original
variables:

    ỹ = Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j x̃li x̃l̄j
      = Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j (1 − xli ) xl̄j
      = Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j xl̄j − Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j xli xl̄j
      = Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j xl̄j − y .

Our upper bound inequalities take the form:

    y ≤ Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j xl̄j + Σ_{(i,j)∈H} ali al̄j ( xli − xl̄j ) .      (3.4)

Proposition 3.3. The inequalities (3.4) are valid for P (a1 , a2 ).
Proof. We apply the lower bound inequalities (3.3) to the variables
transformed by φl , to obtain

    Σ_{i∈Kl} Σ_{j∈Kl̄} ali al̄j xl̄j − y = ỹ ≥ Σ_{(i,j)∈H} ali al̄j ( x̃li + x̃l̄j − 1 ) .

Solving for y, and rewriting this in terms of xl and xl̄ , we obtain the upper
bound inequalities (3.4).
Note that the sets of inequalities (3.4) with l = 1 and l = 2 are
equivalent — this follows by checking that changing l is equivalent to com-
plementing H.
The transformation φl corresponds to the “switching” operation used
in the analysis of the cut polytope (see [DL97], for example). Specifically,
(1.2) and (1.3) are switches of each other under φ1 , and (1.2) and (1.4) are
switches of each other under φ2 .

Proposition 3.4. The points satisfying (1.6) and (3.3–3.4) for all
cross products H are precisely the points satisfying (1.5–1.6).
Proof. By Propositions 3.1 and 3.3, we need only show that every
point satisfying (1.6) and (3.3–3.4) for all cross products H also satisfies
(1.5). Let (x1 , x2 , y) be a point satisfying (1.6) and (3.3–3.4). Letting

H = {i ∈ K1 : x1i = 1} × {j ∈ K2 : x2j = 1} ,

we obtain a lower bound inequality (3.3) that is satisfied as an equation by


the point (x1 , x2 , y). Similarly, letting

H = {i ∈ K1 : x1i = 0} × {j ∈ K2 : x2j = 1} ,

we obtain an upper bound inequality (3.4) that is satisfied as an equa-
tion by the point (x1 , x2 , y). So x1 and x2 together with (3.3–3.4) for
cross products H determine y. But by the definition of P (a1 , a2 ), the
point ( x1 , x2 , Σ_{i∈K1} Σ_{j∈K2} a1i a2j x1i x2j ) is in P (a1 , a2 ), so (1.5) must be
satisfied.
4. Facets. For i ∈ Kl , l = 1, 2, we let ei ∈ Rkl denote the i-th
standard unit vector. For simplicity, we say that a point is tight for an
inequality if it satisfies the inequality as an equation.
Proposition 4.1. For k1 , k2 > 0, the polytope P (a1 , a2 ) is full di-
mensional.
Proof. We prove this directly. The following k1 + k2 + 2 points in
P (a1 , a2 ) are easily seen to be affinely independent:
• (x1 , x2 , y) = (0, 0, 0) ;
• (x1 , x2 , y) = (ej , 0, 0), for j ∈ K1 ;
• (x1 , x2 , y) = (0, ej , 0), for j ∈ K2 ;
• (x1 , x2 , y) = (e1 , e1 , a11 a21 ) .
Proposition 4.2. For l = 1, 2, the inequalities (3.1) describe facets
of P (a1 , a2 ) when kl > 1.
Proof. Again, we proceed directly. For both l = 1 and l = 2, the
following k1 + k2 + 1 points in P (a1 , a2 ) are tight for (3.1) and are easily
seen to be affinely independent:
• (x1 , x2 , y) = (0, 0, 0) ;
• (x1 , x2 , y) defined by xl = ej , xl̄ = 0, y = 0, for j ∈ Kl \ {i} ;
• (x1 , x2 , y) defined by xl = 0, xl̄ = ej , y = 0, for j ∈ Kl̄ ;
• (x1 , x2 , y) defined by xl = em , for some m ∈ Kl \ {i}, xl̄ = e1 ,
y = alm al̄1 .
Proposition 4.3. For l = 1, 2, the inequalities (3.2) describe facets
of P (a1 , a2 ) when kl > 1.
Proof. The affine transformation φl is an involution from the face of
P (a1 , a2 ) described by the simple lower bound inequality xli ≥ 0 to the
face of P (a1 , a2 ) described by the simple upper bound inequality xli ≤ 1.

Since the affine transformation is invertible, it is dimension preserving.
Therefore, by Proposition 4.2, the result follows.
One might suspect from Proposition 3.4 that the only inequalities of
the form (3.3–3.4) that yield facets have H being a cross product, but this
is not the case. We now define a key subset H(k1 , k2 ) of the set of subsets
of K1 × K2 . Each H in H(k1 , k2 ) arises by choosing a permutation of the
variables
{x1i : i ∈ K1 } ∪ {x2j : j ∈ K2 } .
If x1i precedes x2j in the permutation, which we denote by x1i ≺ x2j , then
(i, j) is in H.
For example, if S1 ⊂ K1 and S2 ⊂ K2 , then we get the cross product
H = S1 × S2 ∈ H(k1 , k2 ) by choosing any permutation of the variables
having {x2j : j ∈ K2 \ S2 } first, followed by {x1i : i ∈ S1 }, followed by
{x2j : j ∈ S2 }, followed by {x1i : i ∈ K1 \ S1 }.
As another example, let k1 = 3 and k2 = 2, and consider the permu-
tation: x12 , x22 , x11 , x13 , x21 , which yields
H = {(2, 2), (2, 1), (1, 1), (3, 1)} .
This choice of H is not a cross product. However, this H yields the lower
bound inequality:
y ≥ a12 a22 (x12 + x22 − 1)
+ a12 a21 (x12 + x21 − 1)
+ a11 a21 (x11 + x21 − 1)
+ a13 a21 (x13 + x21 − 1) .
We demonstrate that this inequality describes a facet by displaying k1 +
k2 + 1 = 6 affinely independent points in P (a1 , a2 ) that are tight for the
inequality: We display the points as rows of the matrix below, where we
have permuted the columns, in a certain manner, according to the permu-
tation of the variables that led to H, and we have inserted a column of
1’s. It suffices to check that the square “caterpillar matrix” obtained by
deleting the y column is nonsingular.

    x22  x21  1  x12  x11  x13    y
     0    0   1   1    1    1     0 = (a13 + a11 + a12 )(0)
     0    1   1   1    1    1     (a13 + a11 + a12 )(a21 )
     0    1   1   1    1    0     (a11 + a12 )(a21 )
     0    1   1   1    0    0     (a12 )(a21 )
     1    1   1   1    0    0     (a12 )(a21 + a22 )
     1    1   1   0    0    0     0 = (0)(a21 + a22 )

For m ≥ 3, an order-m caterpillar matrix is obtained by choosing


k1 , k2 ≥ 1 with k1 + k2 + 1 = m. The first row of such a matrix is
(0 ∈ Rk2 , 1, 1 ∈ Rk1 ), and the last row is (1 ∈ Rk2 , 1, 0 ∈ Rk1 ). Each
row is obtained from the one above it by either flipping the right-most
1 to 0, or flipping the 0 immediately to the left of the left-most 1 to 1. So
the rows depict snapshots of a “caterpillar” of 1’s that moves from right to
left by either pushing its head forward a bit or pulling its tail forward a bit.
We note that besides translating affine to linear dependence, the column
of 1’s precludes the unsettling possibility of a 1-bit caterpillar disappear-
ing by pulling its tail forward a bit, and then reappearing by pushing its
head forward a bit. Since caterpillar matrices have the 1’s in each row
appearing consecutively, they are totally unimodular (see [HK56]). That
is, the determinant of a caterpillar matrix is in {0, ±1}. In fact, we have
the following result.
Proposition 4.4. The determinant of a caterpillar matrix is in
{±1}.
Proof. The proof is by induction on m. The only order-3 caterpillar
matrices are

    ⎛ 0 1 1 ⎞        ⎛ 0 1 1 ⎞
    ⎜ 0 1 0 ⎟  and   ⎜ 1 1 1 ⎟ .
    ⎝ 1 1 0 ⎠        ⎝ 1 1 0 ⎠

It is easy to check that these have determinant in {±1}.


Now, suppose that we have a caterpillar matrix of order m ≥ 4. De-
pending on the bit flip that produces the second row from the first row,
the matrix has the form

    ⎛ 0 0 ⋯ 0 1 1 1 ⋯ 1 1 ⎞
    ⎜ 0 0 ⋯ 0 1 1 1 ⋯ 1 0 ⎟
    ⎜         ⋮          ⋮ ⎟
    ⎜                 1  0 ⎟
    ⎝                 1  0 ⎠

or

    ⎛ 0 0 ⋯ 0 0 1 1 1 ⋯ 1 ⎞
    ⎜ 0 0 ⋯ 0 1 1 1 1 ⋯ 1 ⎟
    ⎜         ⋮          ⋮ ⎟
    ⎜                 1  1 ⎟
    ⎝                 1  1 ⎠ .

In the first case, the last column is the first standard unit vector, so we can
expand the determinant along the last column and
we obtain a caterpillar matrix of order m − 1 with the same determinant
as the original matrix, up to the sign. In the second case, we subtract the
first row from the second row (which does not affect the determinant), and
then we expand along the second row of the resulting matrix. Again, we
obtain a caterpillar matrix of order m − 1 with the same determinant as
the original matrix, up to the sign. In either case, the result follows by
induction.
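The induction can also be checked computationally; in the sketch below (Python/NumPy; the 'h'/'t' move encoding is our own device, not notation from the text), any interleaving of k2 head pushes and k1 tail pulls generates an order-(k1 + k2 + 1) caterpillar matrix, whose determinant is then verified to be ±1.

```python
import numpy as np

def caterpillar(moves, k1, k2):
    """Build an order-(k1 + k2 + 1) caterpillar matrix row by row.
    'h' flips the 0 immediately left of the left-most 1 ("push head"),
    't' flips the right-most 1 to 0 ("pull tail")."""
    row = [0] * k2 + [1] + [1] * k1        # first row: (0 in R^k2, 1, 1 in R^k1)
    rows = [row[:]]
    head, tail = k2, k2 + k1               # positions of left-/right-most 1
    for mv in moves:
        if mv == 'h':
            head -= 1
            row[head] = 1
        else:
            row[tail] = 0
            tail -= 1
        rows.append(row[:])
    return np.array(rows)

# k1 = 3, k2 = 2: any interleaving of 2 head moves and 3 tail moves works;
# this particular one reproduces the rows of the example matrix above.
M = caterpillar(['h', 't', 't', 'h', 't'], k1=3, k2=2)
assert M.shape == (6, 6) and round(abs(np.linalg.det(M))) == 1
```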
Proposition 4.5. An inequality of the form (3.3) describes a facet
of P (a1 , a2 ) when H ∈ H(k1 , k2 ).
Proof. By Proposition 3.1, these inequalities are valid for P (a1 , a2 ).
It suffices to exhibit k1 + k2 + 1 affinely independent points of P (a1 , a2 )
that are tight for (3.3). Let Φ be the permutation that gives rise to H. It
is easy to check that (x1 , x2 , y) = (0, 1, 0) is a point of P (a1 , a2 ) that is
tight for (3.3). We generate the remaining k1 + k2 points by successively
flipping bits in the order of the permutation Φ. We simply need to check
that each bit flip preserves equality in (3.3). If a variable x1i is flipped
from 0 to 1, the increase in y (i.e., the left-hand side of (3.3)) and in
Σ_{(i,j)∈H} a1i a2j ( x1i + x2j − 1 ) (i.e., the right-hand side of (3.3)) is precisely
Σ_{j : x1i ≺ x2j} a1i a2j . Similarly, if a variable x2j is flipped from 1 to 0, the
decrease in both of these quantities is precisely Σ_{i : x1i ≺ x2j} a1i a2j .
Next, we arrange these k1 + k2 + 1 points, in the order generated, as
the rows of a caterpillar matrix of order k1 + k2 + 1. A point (x1 , x2 , y)
yields the row (x2Φ , 1, x1Φ ), where xlΦ is just xl permuted according to the
order of the xli in Φ. Clearly this defines a caterpillar matrix, which is
nonsingular by Proposition 4.4. Hence, the generated points are affinely
independent, so (3.3) describes a facet when H ∈ H(k1 , k2 ).
Corollary 4.1. Each inequality (3.3) with H ∈ H(k1 , k2 ) admits a
set of tight points in P (a1 , a2 ) that correspond to the rows of a caterpillar
matrix.
Proposition 4.6. An inequality of the form (3.4) describes a facet
of P (a1 , a2 ) when H ∈ H(k1 , k2 ).
Proof. Using the transformation φl , this follows from Proposi-
tion 4.5.
Conversely, every caterpillar matrix of order k1 + k2 + 1 corresponds
to a facet of the form (3.3). More precisely, we have the following result.
Proposition 4.7. Let C be a caterpillar matrix of order k1 + k2 + 1
such that its first k2 columns correspond to a specific permutation of {x2j :
j ∈ K2 } and its last k1 columns correspond to a specific permutation of
{x1i : i ∈ K1 }. Then there exists a facet of P (a1 , a2 ) of the form (3.3)
such that the points corresponding to the rows of C are tight for it.
Proof. It is easy to determine the permutation Ψ that corresponds
to C, by interleaving the given permutations of {x2j : j ∈ K2 } and of
{x1i : i ∈ K1 }, according to the head and tail moves of the caterpillar.
Then, as before, we form the H of (3.3) by putting (i, j) in H if x1i ≺ x2j
in the final permutation.
It is easy to see that each row of C corresponds to a point of P (a1 , a2 )
that is tight for the resulting inequality (3.3).

5. Inequality characterization. In this section, we demonstrate
how every facet of P (a1 , a2 ) is one of the ones described in Section 4.
We do this by projecting from an extended formulation in a higher-
dimensional space. Of course this is a well-known technique (see e.g.,
[BP83, BP89, Bal01]).
Consider the system

    y = Σ_{i∈K1} Σ_{j∈K2} a1i a2j δij ;                      (5.1)

δij ≤ x1i , for all i ∈ K1 , j ∈ K2 ; (5.2)

δij ≤ x2j , for all i ∈ K1 , j ∈ K2 ; (5.3)

δij ≥ x1i + x2j − 1 , for all i ∈ K1 , j ∈ K2 ; (5.4)

δij ≥ 0 , for all i ∈ K1 , j ∈ K2 , (5.5)

and let

    Qδ (a1 , a2 ) = conv{ x1 ∈ Rk1 , x2 ∈ Rk2 , y ∈ R, δ ∈ Rk1×k2 : (3.1–3.2, 5.1–5.5) } ,

where we use conv(X) to denote the convex hull of X. Let Q(a1 , a2 ) be
the projection of Qδ (a1 , a2 ) in the space of x1 , x2 and y variables. We next
show that Q(a1 , a2 ) is integral.
Proposition 5.1. Q(a1 , a2 ) is integral on x1 and x2 .
Proof. We will show that if p is fractional (on x1 and x2 ), then it is
not an extreme point of Q(a1 , a2 ).
Assume that p = (x1 , x2 , y) is a fractional extreme point of Q(a1 , a2 ).
For v ∈ R, let (v)+ = max{0, v}. For fixed x1 and x2 , notice that δ ∈
Rk1×k2 is feasible to (5.2–5.5) if and only if it satisfies min{x1i , x2j } ≥ δij ≥
( x1i + x2j − 1 )+ for all i ∈ K1 , j ∈ K2 . Therefore, if we define

    yup = Σ_{i∈K1} Σ_{j∈K2} a1i a2j min{x1i , x2j } ,
    ydown = Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( x1i + x2j − 1 )+ ,

then the points pup = (x1 , x2 , yup ), and pdown = (x1 , x2 , ydown ) are in
Q(a1 , a2 ), and pup ≥ p ≥ pdown . Furthermore, if p is an extreme point, it
has to be one of pup and pdown .
Let K̄1 ⊆ K1 and K̄2 ⊆ K2 be the sets of indices corresponding to the
fractional components of x1 and x2 , respectively. Clearly, K̄1 ∪ K̄2 ≠ ∅. Let
ε > 0 be a small number so that 1 > xli + ε > xli − ε > 0 for all i ∈ K̄l ,
l = 1, 2. Define xl+ by xl+i := xli + ε if i ∈ K̄l and xl+i := xli otherwise,
and define xl− similarly. We consider the following two cases and show that
if p is fractional, then it can be represented as a convex combination of two
distinct points in Q(a1 , a2 ).
Case 1. Assume p = pup . Let pa = (x1+ , x2+ , ya ) and pb = (x1− , x2− , yb ),
where

    ya = Σ_{i∈K1} Σ_{j∈K2} a1i a2j min{x1+i , x2+j } ,
    yb = Σ_{i∈K1} Σ_{j∈K2} a1i a2j min{x1−i , x2−j } ,

and note that pa , pb ∈ Q(a1 , a2 ) and pa ≠ pb .


For i ∈ K1 and j ∈ K2 , let δij = min{x1i , x2j }, and define δaij and δbij
similarly. Note that y = Σ_{i∈K1} Σ_{j∈K2} a1i a2j δij . Due to the construction,
if min{x1i , x2j } = 0, then we have min{x1+i , x2+j } = min{x1−i , x2−j } = 0,
and therefore δaij = δbij = δij = 0. Similarly, if δij = 1, then we have
δaij = δbij = 1 as well. On the other hand, if δij ∉ {0, 1}, then δaij = δij + ε
and δbij = δij − ε. Therefore, δa + δb = 2δ and p = (1/2)pa + (1/2)pb .

Case 2. Assume p = pdown . Let pc = (x1+ , x2− , yc ) and pd = (x1− , x2+ , yd ),
where

    yc = Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( x1+i + x2−j − 1 )+ ,
    yd = Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( x1−i + x2+j − 1 )+ ,

and note that pc , pd ∈ Q(a1 , a2 ) and pc ≠ pd .


Let δij = ( x1i + x2j − 1 )+ for i ∈ K1 and j ∈ K2 , and define δcij and δdij
similarly. Note that y = Σ_{i∈K1} Σ_{j∈K2} a1i a2j δij .
If min{x1i , x2j } = 0, then min{x1+i , x2−j } = min{x1−i , x2+j } = 0 and
δcij = δdij = δij = 0. If x1i = 1, then δij = x2j , implying δcij = x2−j and
δdij = x2+j . Similarly, if x2j = 1, then δij = x1i , implying δcij = x1+i and
δdij = x1−i . Finally, if 1 > x1i , x2j > 0, then δcij = δdij = δij . Therefore,
δc + δd = 2δ and p = (1/2)pc + (1/2)pd .
Now, let R(a1 , a2 ) be the real solution set of (3.1–3.4), and note that
P (a1 , a2 ) ⊆ R(a1 , a2 ). To prove that P (a1 , a2 ) = R(a1 , a2 ), we will first
argue that Q(a1 , a2 ) ⊆ P (a1 , a2 ), and then we will show that R(a1 , a2 ) ⊆
Q(a1 , a2 ).
Proposition 5.2. Q(a1 , a2 ) ⊆ P (a1 , a2 ).
Proof. As P (a1 , a2 ) is a bounded convex set, it is sufficient to show
that all of the extreme points of Q(a1 , a2 ) are contained in P (a1 , a2 ). Using
Proposition 5.1 and its proof, we know that if p = (x1 , x2 , y) is an extreme
point of Q(a1 , a2 ), then x1 and x2 are integral and

    Σ_{i∈K1} Σ_{j∈K2} a1i a2j min{x1i , x2j } ≥ y ≥ Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( x1i + x2j − 1 )+ .

Notice that for any u, v ∈ {0, 1}, min{u, v} = (u + v − 1)+ = uv. Therefore,
y = Σ_{i∈K1} Σ_{j∈K2} a1i a2j x1i x2j , and p ∈ P (a1 , a2 ).
Proposition 5.3. R(a1 , a2 ) ⊆ Q(a1 , a2 ).
Proof. Assume not, and let p = (x1 , x2 , y) ∈ R(a1 , a2 ) \ Q(a1 , a2 ).
As in the proof of Proposition 5.1, let pup = (x1 , x2 , yup ) and pdown =
(x1 , x2 , ydown ), where yup = Σ_{i∈K1} Σ_{j∈K2} a1i a2j min{x1i , x2j } and
ydown = Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( x1i + x2j − 1 )+ . Note that pup , pdown ∈
Q(a1 , a2 ). We next show that yup ≥ y ≥ ydown , and therefore p ∈
conv{pup , pdown } ⊆ Q(a1 , a2 ).
Let H1 = {(i, j) ∈ K1 × K2 : x1i > x2j } and H2 = {(i, j) ∈ K1 × K2 :
x2j + x1i > 1}. Applying (3.4) with H = H1 gives

    y ≤ Σ_{i∈K1} Σ_{j∈K2} a1i a2j x1i + Σ_{(i,j)∈H1} a1i a2j ( x2j − x1i )
      = Σ_{i∈K1} Σ_{j∈K2} a1i a2j x1i + Σ_{i∈K1} Σ_{j∈K2} a1i a2j min{0, x2j − x1i }
      = Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( x1i + min{0, x2j − x1i } )
      = Σ_{i∈K1} Σ_{j∈K2} a1i a2j min{x1i , x2j } = yup .

Applying (3.3) with H = H2 gives

    y ≥ Σ_{(i,j)∈H2} a1i a2j ( x1i + x2j − 1 )
      = Σ_{i∈K1} Σ_{j∈K2} a1i a2j max{0, x1i + x2j − 1} = ydown .

As a consequence, we have our main theorem:
Theorem 5.4. P (a1 , a2 ) = R(a1 , a2 ) = Q(a1 , a2 ).
Although our main goal was to establish the inequality description
(3.1–3.4) of P (a1 , a2 ), we have also shown that, from a mathematical point
of view, the extended formulation (5.1–5.5) has the same power as a de-
scription. Which formulation is preferable in an application will likely
depend on implementation details.

6. Separation. We can efficiently include all facet describing inequal-
ities of P (a1 , a2 ) implicitly in a linear programming formulation, provided
that we can separate on them in polynomial time (see [GLS84, GLS81,
GLS93]). That is, provided we have a polynomial-time algorithm that de-
termines whether a given point is in P (a1 , a2 ) and provides a violated facet
describing inequality if the point is not in P (a1 , a2 ).
Separation for the simple lower and upper bound inequalities (3.1–3.2)
is easily handled by enumeration. For a point (x1 , x2 , y) satisfying (3.1–
3.2), separation for the lower and upper bound inequalities (3.3–3.4) is also
rather simple. For the lower bound inequalities (3.3), we simply let
 
    H0 = { (i, j) ∈ K1 × K2 : x1i + x2j > 1 } ,

and then we just check whether (x1 , x2 , y) violates the lower bound in-
equality (3.3) for the choice of H = H0 . Similarly, for the upper bound
inequalities (3.4), we let
 
    H0 = { (i, j) ∈ Kl × Kl̄ : xli − xl̄j < 0 } ,

and then we just check whether (x1 , x2 , y) violates the upper bound in-
equality (3.4) for the choice of H = H0 . Note that for any H ⊆ K1 × K2 ,

    Σ_{(i,j)∈H} a1i a2j ( x1i + x2j − 1 ) ≤ Σ_{(i,j)∈H0} a1i a2j ( x1i + x2j − 1 ) .

Therefore, (x1 , x2 , y) satisfies the lower bounds (3.3) for all sets H ⊆ K1 ×
K2 if and only if it satisfies (3.3) for H = H0 .
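The same observation applies to the upper bounds, so the whole separation routine fits in a few lines. The Python sketch below (function and variable names are ours) returns the index set H0 of a violated lower or upper bound inequality, or None when y lies between the two bounds.

```python
def separate(x1, x2, y, a1, a2):
    """O(k1*k2) separation sketch for (3.3)-(3.4), given a point
    (x1, x2, y) that already satisfies the simple bounds (3.1)-(3.2)."""
    K1, K2 = range(len(x1)), range(len(x2))
    # Lower bounds (3.3): the most violated H collects pairs with x1_i + x2_j > 1.
    H0 = [(i, j) for i in K1 for j in K2 if x1[i] + x2[j] > 1]
    if y < sum(a1[i] * a2[j] * (x1[i] + x2[j] - 1) for i, j in H0):
        return ('lower', H0)
    # Upper bounds (3.4) with l = 1: the most violated H has x1_i - x2_j < 0.
    H0 = [(i, j) for i in K1 for j in K2 if x1[i] < x2[j]]
    bound = (sum(a1[i] * a2[j] * x2[j] for i in K1 for j in K2)
             + sum(a1[i] * a2[j] * (x1[i] - x2[j]) for i, j in H0))
    if y > bound:
        return ('upper', H0)
    return None   # y lies between the bounds: no (3.3)-(3.4) cut is violated

# Example: y = 0 is cut off by the lower bound generated from x1 + x2 > 1.
assert separate([0.3], [0.9], 0.0, [1.0], [1.0]) == ('lower', [(0, 0)])
```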
Using Propositions 4.5 and 4.6, we can see how this separation method
yields violated facet-describing inequalities. We develop a permutation of
the variables

{x1i : i ∈ K1 } ∪ {x2j : j ∈ K2 } ,

according to their values. Let δ1 > δ2 > · · · > δp denote the distinct
values in {x1i : i ∈ K1 }. For convenience, let δ0 = 1 and δp+1 = 0. We
define the partition via 2p + 1 blocks, some of which may be empty. For
t = 1, 2, . . . , p, block 2t consists of

    {x1i : i ∈ K1 , x1i = δt } .

For t = 1, 2, . . . , p, block 2t + 1 consists of

    {x2j : j ∈ K2 , 1 − δt < x2j ≤ 1 − δt+1 } ,

and block 1 consists of

    {x2j : j ∈ K2 , 1 − δ0 = 0 ≤ x2j ≤ 1 − δ1 } .

This permutation of the variables determines a subset H of K1 × K2 as
described in Section 4. This choice of H yields a facet-describing lower-
bound inequality (3.3).
Similarly, for the upper bound inequalities (3.4), we let δ1 < δ2 <
· · · < δp denote the distinct values of the {xli : i ∈ Kl }. As before, let
δ0 = 0 and δp+1 = 1, and we define a partition via 2p + 1 blocks, some of
which may be empty. For t = 1, 2, . . . , p, block 2t consists of

    {xli : i ∈ Kl , xli = δt } .

For t = 0, 1, 2, . . . , p, block 2t + 1 consists of

    {xl̄j : j ∈ Kl̄ , δt < xl̄j ≤ δt+1 } .

This permutation of the variables determines a subset H of K1 × K2 as
described in Section 4. This choice of H yields a facet describing upper
bound inequality (3.4).
Our separation algorithm can easily be implemented with running time
O(k1 k2).
7. Ideal points. For many combinatorial polytopes, it is natural to
investigate adjacency of extreme points via edges (i.e., 1-dimensional faces).
One motivation is that this notion of adjacency may prove useful in some
local-search heuristics. In this section, we investigate a different notion of
adjacency for extreme points — one that seems more natural for P (a1 , a2 )
as it directly captures the product structure.
The point (x1 , x2 , y) ∈ P (a1 , a2 ) is ideal if it satisfies (1.5). Clearly,
the extreme points of P (a1 , a2 ) are ideal. Also, P (a1 , a2 ) contains points
that are not ideal. For example,

    (x11 , x21 , y) = (1/2, 1/2, 1/2) = (1/2)(0, 0, 0) + (1/2)(1, 1, 1)

is in P ((1), (1)), but it is not ideal because

    Σ_{i∈K1} Σ_{j∈K2} a1i a2j x1i x2j = 1 · 1 · (1/2) · (1/2) = 1/4 ≠ 1/2 .

Proposition 7.1. Every pair of distinct extreme points of P (a1 , a2 )
is connected by a curve of ideal points of P (a1 , a2 ). Moreover, such a
curve can be taken to be either a line segment or two line segments joined
at another extreme point of P (a1 , a2 ).
Proof. Let (x11 , x12 , y 11 ) and (x21 , x22 , y 22 ) be a pair of distinct ex-
treme points of P (a1 , a2 ). If x11 = x21 then we consider the curve obtained
by letting

(z1 , z2 , y) = λ(x11 , x12 , y 11 ) + (1 − λ)(x21 , x22 , y 22 ),

as λ ranges between 0 and 1. For λ = 1 we have (z1 , z2 , y) = (x11 , x12 , y11 ),
and for λ = 0 we have (z1 , z2 , y) = (x21 , x22 , y22 ). Clearly, the curve is a
line segment entirely contained in the convex polytope P (a1 , a2 ), because
we have defined each point on the curve as a convex combination of the
pair of extreme points of P . So it remains to demonstrate that each point
on the curve is ideal:
 
a1i a2j zi1 zj2
i∈K1 j∈K2
  . / . 12 /
= a1i a2j λx11 21
i + (1 − λ)xi · λxj + (1 − λ)x22
j
i∈K1 j∈K2
  . 12 /
= a1i a2j x11
i λxj + (1 − λ)x22
j
i∈K1 j∈K2
 
= a1i a2j λx11 12 21 22
i xj + (1 − λ)xi xj
i∈K1 j∈K2

= λy 11 + (1 − λ)y 22
=y .

Therefore the points on the curve are ideal.


Similarly, if x12 = x22 , we use the same line segment above to connect
the points.
Suppose now that x11 ≠ x21 and x12 ≠ x22 . We define a third point
(x11 , x22 , y12 ), where

    y12 = Σ_{i∈K1} Σ_{j∈K2} a1i a2j x11i x22j .

Then we connect this third point, in the manner above, to each of the
points of the original pair.
The curve of ideal points given to us by Proposition 7.1 is entirely
contained in a 2-dimensional polytope, but it is not smooth in general. By
allowing the curve to be contained in a 3-dimensional polytope, we can
construct a smooth curve of ideal points connecting each pair of extreme
points of P (a1 , a2 ).
Proposition 7.2. Every pair of extreme points of P (a1 , a2 ) is con-
nected by a smooth curve of ideal points of P (a1 , a2 ).
Proof. Let (x11 , x12 , y 11 ) and (x21 , x22 , y 22 ) be a pair of distinct ex-
treme points of P (a1 , a2 ). Our goal is to connect these points with a smooth
curve of ideal points of P (a1 , a2 ). Toward this end, we consider two other
points of P (a1 , a2 ), (x11 , x22 , y12 ) and (x21 , x12 , y21 ), which we obtain from
the original pair by letting

    y12 = Σ_{i∈K1} Σ_{j∈K2} a1i a2j x11i x22j

and

    y21 = Σ_{i∈K1} Σ_{j∈K2} a1i a2j x21i x12j .

Now, we consider the curve obtained by letting

    (z1 , z2 , y) = λ^2 (x11 , x12 , y11 ) + (1 − λ)^2 (x21 , x22 , y22 )
                  + λ(1 − λ)(x11 , x22 , y12 ) + λ(1 − λ)(x21 , x12 , y21 ) ,

as λ ranges between 0 and 1. For λ = 1 we have (z1 , z2 , y) = (x11 , x12 , y11 ),
and for λ = 0 we have (z1 , z2 , y) = (x21 , x22 , y22 ). Clearly, the curve
is entirely contained in the convex polytope P (a1 , a2 ), because we have
defined each point on the curve as a convex combination of extreme points
of P . So it remains to demonstrate that each point on the curve is ideal:
 
    Σ_{i∈K1} Σ_{j∈K2} a1i a2j z1i z2j
      = Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( λ^2 x11i + (1 − λ)^2 x21i + λ(1 − λ)x11i + λ(1 − λ)x21i )
            · ( λ^2 x12j + (1 − λ)^2 x22j + λ(1 − λ)x22j + λ(1 − λ)x12j )
      = Σ_{i∈K1} Σ_{j∈K2} a1i a2j ( λx11i + (1 − λ)x21i ) · ( λx12j + (1 − λ)x22j )
      = λ^2 Σ_{i∈K1} Σ_{j∈K2} a1i a2j x11i x12j + (1 − λ)^2 Σ_{i∈K1} Σ_{j∈K2} a1i a2j x21i x22j
        + λ(1 − λ) Σ_{i∈K1} Σ_{j∈K2} a1i a2j x11i x22j + λ(1 − λ) Σ_{i∈K1} Σ_{j∈K2} a1i a2j x21i x12j
      = λ^2 y11 + (1 − λ)^2 y22 + λ(1 − λ)y12 + λ(1 − λ)y21
      = y .

Therefore the points on the curve are ideal.
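Since on ideal points y factors as ( Σ_{i∈K1} a1i x1i )( Σ_{j∈K2} a2j x2j ), the identity behind this proof is easy to test numerically. The sketch below (Python/NumPy, with randomly generated data; all names are ours) confirms that the quadratic curve is ideal at a sample of parameter values.

```python
import numpy as np

rng = np.random.default_rng(7)
k1, k2 = 3, 2
a1 = rng.integers(1, 5, size=k1).astype(float)   # positive coefficients
a2 = rng.integers(1, 5, size=k2).astype(float)
f = lambda u, v: (a1 @ u) * (a2 @ v)             # product value on ideal points

# Two extreme points (0/1 vectors together with their exact product values).
u1, u2 = rng.integers(0, 2, size=k1), rng.integers(0, 2, size=k2)
v1, v2 = rng.integers(0, 2, size=k1), rng.integers(0, 2, size=k2)

for lam in np.linspace(0.0, 1.0, 11):
    z1 = lam * u1 + (1 - lam) * v1               # curve in the x-variables
    z2 = lam * u2 + (1 - lam) * v2
    y = (lam**2 * f(u1, u2) + (1 - lam)**2 * f(v1, v2)
         + lam * (1 - lam) * (f(u1, v2) + f(v1, u2)))
    assert abs(f(z1, z2) - y) < 1e-12            # every sampled point is ideal
```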

8. A generalization. Although we do not have an application for
this, our results generalize. Let A be a k1 × k2 matrix with positive com-
ponents, and let P (A) be the convex hull of solutions of

    y = Σ_{i∈K1} Σ_{j∈K2} aij x1i x2j ;
    xli ∈ {0, 1}, for i ∈ Kl , l = 1, 2 .

The reader can easily check that everything that we have done applies to
P (A) after substituting aij for a1i a2j throughout.

Acknowledgments. The authors are extremely grateful to Don Cop-
persmith, who was a key participant in earlier versions of this work (see
[CGLL03, CLL99]). The authors are grateful to Komei Fukuda for making
his program cdd available. Evidence collected with the use of cdd led us
to conjecture some of our results. The research of Jon Lee was supported
in part by the Department of Systems Engineering and Engineering Man-
agement, Chinese University of Hong Kong. The work of Janny Leung was
partially supported by the Hong Kong Research Grants Council.

REFERENCES

[AFLS01] Kim Allemand, Komei Fukuda, Thomas M. Liebling, and Erich
Steiner. A polynomial case of unconstrained zero-one quadratic op-
timization. Math. Program., 91(1, Ser. A):49–52, 2001.
[Bal01] Egon Balas. Projection and lifting in combinatorial optimization. In
Computational combinatorial optimization (Schloß Dagstuhl, 2000),
Volume 2241 of Lecture Notes in Comput. Sci., pages 26–56. Springer,
Berlin, 2001.
[BB98] Tamas Badics and Endre Boros. Minimization of half-products. Math.
Oper. Res., 23(3):649–660, 1998.
[BP83] Egon Balas and William R. Pulleyblank. The perfectly matchable
subgraph polytope of a bipartite graph. Networks, 13(4):495–516,
1983.
[BP89] Egon Balas and William R. Pulleyblank. The perfectly matchable
subgraph polytope of an arbitrary graph. Combinatorica, 9(4):321–
337, 1989.
[CGLL03] Don Coppersmith, Oktay Günlük, Jon Lee, and Janny Leung. A
polytope for a product of real linear functions in 0/1 variables, 2003.
Manuscript, November 2003.
[CK05] Jinliang Cheng and Wieslaw Kubiak. A half-product based approxi-
mation scheme for agreeably weighted completion time variance. Eu-
ropean J. Oper. Res., 162(1):45–54, 2005.
[ÇKM06] Eranda Çela, Bettina Klinz, and Christophe Meyer. Polynomially
solvable cases of the constant rank unconstrained quadratic 0-1 pro-
gramming problem. J. Comb. Optim., 12(3):187–215, 2006.
[CLL99] Don Coppersmith, Jon Lee, and Janny Leung. A polytope for a prod-
uct of real linear functions in 0/1 variables, 1999. IBM Research
Report RC21568, September 1999.
[DL97] Michel Deza and Monique Laurent. Geometry of cuts and metrics.
Springer-Verlag, Berlin, 1997.
[DS90] Caterina De Simone. The cut polytope and the Boolean quadric poly-
tope. Discrete Math., 79(1):71–75, 1989/90.
[FFL05] Jean-Albert Ferrez, Komei Fukuda, and Thomas M. Liebling. Solv-
ing the fixed rank convex quadratic maximization in binary variables
by a parallel zonotope construction algorithm. European J. Oper.
Res., 166(1):35–50, 2005.
[GLS81] Martin Grötschel, László Lovász, and Alexander Schrijver. The
ellipsoid method and its consequences in combinatorial optimization.
Combinatorica, 1(2):169–197, 1981.
[GLS84] Martin Grötschel, László Lovász, and Alexander Schrijver. Cor-
rigendum to our paper: “The ellipsoid method and its consequences
in combinatorial optimization”. Combinatorica, 4(4):291–295, 1984.

[GLS93] Martin Grötschel, László Lovász, and Alexander Schrijver. Ge-
ometric algorithms and combinatorial optimization. Springer-Verlag,
Berlin, second edition, 1993.
[HHPR02] Peter L. Hammer, Pierre Hansen, Panos M. Pardalos, and David J.
Rader, Jr. Maximizing the product of two linear functions in 0-1
variables. Optimization, 51(3):511–537, 2002.
[HK56] Alan J. Hoffman and Joseph B. Kruskal. Integral boundary points of
convex polyhedra. In Linear inequalities and related systems, pages
223–246. Princeton University Press, Princeton, N. J., 1956. Annals
of Mathematics Studies, no. 38.
[JKKW05] Adam Janiak, Mikhail Y. Kovalyov, Wieslaw Kubiak, and Frank
Werner. Positive half-products and scheduling with controllable pro-
cessing times. European J. Oper. Res., 165(2):416–422, 2005.
[Kub05] Wieslaw Kubiak. Minimization of ordered, symmetric half-products. Dis-
crete Appl. Math., 146(3):287–300, 2005.
[Lee07] Jon Lee. In situ column generation for a cutting-stock problem. Com-
puters and Operations Research, 34(8):2345–2358, 2007.
[NW88] George L. Nemhauser and Laurence A. Wolsey. Integer and com-
binatorial optimization. Wiley-Interscience Series in Discrete Mathe-
matics and Optimization. John Wiley & Sons Inc., New York, 1988.
A Wiley-Interscience Publication.
[Pad89] Manfred W. Padberg. The Boolean quadric polytope: Some characteris-
tics, facets and relatives. Math. Programming, Ser. B, 45(1):139–172,
1989.
[Pit91] Itamar Pitowsky. Correlation polytopes: their geometry and complexity.
Math. Programming, 50(3, (Ser. A)):395–414, 1991.

PART VIII:
Complexity

ON THE COMPLEXITY OF NONLINEAR
MIXED-INTEGER OPTIMIZATION
MATTHIAS KÖPPE∗

Abstract. This is a survey on the computational complexity of nonlinear mixed-
integer optimization. It highlights a selection of important topics, ranging from incom-
putability results that arise from number theory and logic, to recently obtained fully
polynomial time approximation schemes in fixed dimension, and to strongly polynomial-
time algorithms for special cases.

1. Introduction. In this survey we study the computational com-
plexity of nonlinear mixed-integer optimization problems, i.e., models of
the form

    max/min  f (x1 , . . . , xn )
    s.t.     g1 (x1 , . . . , xn ) ≤ 0
                   ⋮                                         (1.1)
             gm (x1 , . . . , xn ) ≤ 0
             x ∈ R^{n1} × Z^{n2} ,

where n1 + n2 = n and f, g1 , . . . , gm : R^n → R are arbitrary nonlinear
functions.
This is a very rich topic. From the very beginning, questions such
as how to present the problem to an algorithm, and, in view of possible
irrational outcomes, what it actually means to solve the problem need to
be answered. Fundamental intractability results from number theory and
logic on the one hand and from continuous optimization on the other hand
come into play. The spectrum of theorems that we present ranges from
incomputability results, to hardness and inapproximability theorems, to
classes that have efficient approximation schemes, or even polynomial-time
or strongly polynomial-time algorithms.
We restrict our attention to deterministic algorithms in the usual bit
complexity (Turing) model of computation. Some of the material in the
present survey also appears in [31]. For an excellent recent survey focusing
on other aspects of the complexity of nonlinear optimization, including
the performance of oracle-based models and combinatorial settings such as
nonlinear network flows, we refer to Hochbaum [34]. We also do not cover
the recent developments by Onn et al. [11–13,21–23,32,44,45] in the context
of discrete convex optimization, for which we refer to the monograph [53].
Other excellent sources are [16] and [55].

∗ Dept. of Mathematics, University of California, Davis, One Shields Avenue, Davis,
CA 95616, USA ([email protected]).


2. Preliminaries.
2.1. Presentation of the problem. We restrict ourselves to a model
where the problem is presented explicitly. In most of this survey, the func-
tions f and gi will be polynomial functions presented in a sparse encoding,
where all coefficients are rational (or integer) and encoded in the binary
scheme. It is useful to assume that the exponents of monomials are given
in the unary encoding scheme; otherwise already in very simple cases the
results of function evaluations will have an encoding length that is expo-
nential in the input size.
In an alternative model, the functions are presented by oracles, such
as comparison oracles or evaluation oracles. This model permits the han-
dling of more general functions (not just polynomials), and on the other
hand it is very useful for obtaining hardness results.
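Concretely, a sparse encoding can be realized as a finite map from exponent vectors to rational coefficients. The following Python sketch (our own toy representation, for illustration only) encodes and evaluates f(x1, x2) = 3 x1^2 x2 − (7/2) x2^3 with exact rational arithmetic.

```python
from fractions import Fraction

# Sparse encoding: {exponent vector: rational coefficient}; coefficients are
# exact rationals, and exponents are kept as small integers (unary spirit).
f = {(2, 1): Fraction(3), (0, 3): Fraction(-7, 2)}   # 3*x1^2*x2 - (7/2)*x2^3

def evaluate(poly, x):
    """Evaluate a sparsely encoded polynomial at a rational point x."""
    total = Fraction(0)
    for exponents, coeff in poly.items():
        term = coeff
        for xi, ei in zip(x, exponents):
            term *= Fraction(xi) ** ei
        total += term
    return total

assert evaluate(f, (2, 1)) == Fraction(17, 2)        # 3*4*1 - 7/2 = 17/2
```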
2.2. Encoding issues for solutions. When we want to study the
computational complexity of these optimization problems, we first need to
discuss how to encode the input (the data of the optimization problem) and
the output (an optimal solution if it exists). In the context of linear mixed-
integer optimization, this is straightforward: Seldom are we concerned with
irrational objective functions or constraints; when we restrict the input to
be rational as is usual, then also optimal solutions will be rational.
This is no longer true even in the easiest cases of nonlinear optimiza-
tion, as can be seen on the following quadratically constrained problem in
one continuous variable:

    max f (x) = x^4   s.t.   x^2 ≤ 2.

Here the unique optimal solution is irrational (x∗ = √2, with f (x∗ ) = 4),
and so it does not have a finite binary encoding. We ignore here the
possibilities of using a model of computation and complexity over the real
numbers, such as the celebrated Blum–Shub–Smale model [14]. In the
familiar Turing model of computation, we need to resort to approximations.
In the example above it is clear that for every ε > 0, there exists a ra-
tional x that is a feasible solution for the problem and satisfies |x − x∗ | < ε
(proximity to the optimal solution) or |f (x) − f (x∗ )| < ε (proximity to the
optimal value). However, in general we cannot expect to find approxima-
tions by feasible solutions, as the following example shows:

    max f (x) = x   s.t.   x^3 − 2x = 0.

(Again, the optimal solution is x = √2, but the closest rational feasible
solution is x = 0.) Thus, in the general situation, we will have to use the
following notion of approximation:
Definition 2.1. An algorithm A is said to efficiently approximate
an optimization problem if, for every value of the input parameter ε > 0,
it returns a rational vector x (not necessarily feasible) with ‖x − x∗ ‖ ≤ ε,

where x∗ is an optimal solution, and the running time of A is polynomial
in the input encoding of the instance and in log(1/ε).
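For the first example above, such an algorithm is given by plain bisection; the Python sketch below (our own illustration) brackets x∗ = √2 with exact rational arithmetic in O(log(1/ε)) iterations.

```python
from fractions import Fraction

def approximate_sqrt2(eps):
    """Return a rational x with |x - sqrt(2)| <= eps by bisection on x^2 <= 2;
    the number of iterations is O(log(1/eps))."""
    lo, hi = Fraction(1), Fraction(2)        # sqrt(2) lies in [1, 2]
    while hi - lo > 2 * eps:
        mid = (lo + hi) / 2
        if mid * mid <= 2:                   # invariant: lo^2 <= 2 < hi^2
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2                     # midpoint of a 2*eps bracket

x = approximate_sqrt2(Fraction(1, 10**6))
assert abs(x * x - 2) < Fraction(3, 10**6)   # hence |x - sqrt(2)| < 1e-6
```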
2.3. Approximation algorithms and schemes. The polynomial
dependence of the running time on log(1/ε), as defined above, is a very
strong requirement. For many problems, efficient approximation algorithms
of this type do not exist, unless P = NP. The following, weaker notions of
approximation are useful; here it is common to ask for the approximations
to be feasible solutions, though.
Definition 2.2.
(a) An algorithm A is an ε-approximation algorithm for a maximization
problem with optimal cost fmax , if for each instance of the problem
of encoding length n, A runs in polynomial time in n and returns a
feasible solution with cost fA , such that

    fA ≥ (1 − ε) · fmax .                                    (2.1)

(b) A family of algorithms {Aε } is a polynomial time approximation scheme
(PTAS) if for every error parameter ε > 0, Aε is an ε-approximation
algorithm and its running time is polynomial in the size of the instance
for every fixed ε.
(c) A family {Aε } of ε-approximation algorithms is a fully polynomial
time approximation scheme (FPTAS) if the running time of Aε is poly-
nomial in the encoding size of the instance and 1/ε.
These notions of approximation are the usual ones in the domain of
combinatorial optimization. It is clear that they are only useful when the
function f (or at least the maximal value fmax ) is non-negative. For
polynomial or general nonlinear optimization problems, various authors
[9, 17, 65] have proposed to use a different notion of approximation, where
we compare the approximation error to the range of the objective function
on the feasible region:

    |fA − fmax | ≤ ε |fmax − fmin | .                        (2.2)

(Here fmin denotes the minimal value of the function on the feasible re-
gion.) It enables us to study objective functions that are not restricted to
be non-negative on the feasible region. In addition, this notion of approxi-
mation is invariant under shifting of the objective function by a constant,
and under exchanging minimization and maximization. On the other hand,
it is not useful for optimization problems that have an infinite range. We
remark that, when the objective function can take negative values on the
feasible region, (2.2) is weaker than (2.1). We will call approximation al-
gorithms and schemes with respect to this notion of approximation weak.
This terminology, however, is not consistent in the literature; [16], for in-
stance, uses the notion (2.2) without an additional attribute and instead
reserves the word weak for approximation algorithms and schemes that give
a guarantee on the absolute error:

    |fA − fmax | ≤ ε .                                       (2.3)

3. Incomputability. Before we can even discuss the computational
complexity of nonlinear mixed-integer optimization, we need to be aware
of fundamental incomputability results that preclude the existence of any
algorithm to solve general integer polynomial optimization problems.
Hilbert’s tenth problem asked for an algorithm to decide whether
a given multivariate polynomial p(x1 , . . . , xn ) has an integer root, i.e.,
whether the Diophantine equation

p(x1 , . . . , xn ) = 0, x1 , . . . , xn ∈ Z (3.1)

is solvable. It was answered in the negative by Matiyasevich [48], based
on earlier work by Davis, Putnam, and Robinson; see also [49]. A short
self-contained proof, using register machines, is presented in [39].
Theorem 3.1.
(i) There does not exist an algorithm that, given polynomials p1 , . . . , pm ,
decides whether the system pi (x1 , . . . , xn ) = 0, i = 1, . . . , m, has a
solution over the integers.
(ii) There does not exist an algorithm that, given a polynomial p, decides
whether p(x1 , . . . , xn ) = 0 has a solution over the integers.
(iii) There does not exist an algorithm that, given a polynomial p, decides
whether p(x1 , . . . , xn ) = 0 has a solution over the non-negative inte-
gers Z+ = {0, 1, 2, . . . }.
(iv) There does not exist an algorithm that, given a polynomial p, decides
whether p(x1 , . . . , xn ) = 0 has a solution over the natural numbers
N = {1, 2, . . . }.
These variants of the statement are easily seen to be equivalent.
The solvability of the system pi (x1 , . . . , xn ) = 0, i = 1, . . . , m, is equivalent
to the solvability of Σ_{i=1}^{m} pi^2 (x1 , . . . , xn ) = 0. Also, if (x1 , . . . , xn ) ∈ Z^n
is a solution of p(x1 , . . . , xn ) = 0 over the integers, then by splitting vari-
ables into their positive and negative parts, yi = max{0, xi } and zi =
max{0, −xi }, clearly (y1 , z1 ; . . . ; yn , zn ) is a non-negative integer solution of
the polynomial equation q(y1 , z1 ; . . . ; yn , zn ) := p(y1 − z1 , . . . , yn − zn ) = 0.
(A construction with only one extra variable is also possible: Use the non-
negative variables w = max{|xi | : xi < 0} and yi := xi + w.) In the other
direction, using Lagrange's four-square theorem, any non-negative integer x
can be represented as the sum a^2 + b^2 + c^2 + d^2 with integers a, b, c, d. Thus,
if (x1 , . . . , xn ) ∈ Z^n_+ is a solution over the non-negative integers, then there
exists a solution (a1 , b1 , c1 , d1 ; . . . ; an , bn , cn , dn ) of the polynomial equation
r(a1 , b1 , c1 , d1 ; . . . ; an , bn , cn , dn ) := p(a1^2 + b1^2 + c1^2 + d1^2 , . . . , an^2 + bn^2 + cn^2 + dn^2 ) = 0.
The equivalence of the statement with non-negative integers and the one
with the natural numbers follows from a simple change of variables.
Sharper statements of the above incomputability result can be found
in [38]. All incomputability statements appeal to the classic result by Tur-
ing [64] on the existence of recursively enumerable (or listable) sets of natu-
ral numbers that are not recursive, such as the halting problem of universal
Turing machines.
Theorem 3.2. For the following universal pairs (ν, δ)

(58, 4), . . . , (38, 8), . . . , (21, 96), . . . , (14, 2.0 × 10^5 ), . . . , (9, 1.638 × 10^45 ),

there exists a universal polynomial U (x; z, u, y; a1 , . . . , aν ) of degree δ in
4 + ν variables, i.e., for every recursively enumerable (listable) set X there
exist natural numbers z, u, y, such that

x∈X ⇐⇒ ∃a1 , . . . , aν ∈ N : U (x; z, u, y; a1 , . . . , aν ) = 0.

Jones explicitly constructs these universal polynomials, using and extend-
ing techniques by Matiyasevich. Jones also constructs an explicit system of
quadratic equations in 4 + 58 variables that is universal in the same sense.
The reduction of the degree, down to 2, works at the expense of introducing
additional variables; this technique goes back to Skolem [62].
In the following, we highlight some of the consequences of these re-
sults. Let U be a universal polynomial corresponding to a universal pair
(ν, δ), and let X be a recursively enumerable set that is not recursive, i.e.,
there does not exist any algorithm (Turing machine) to decide whether
a given x is in X. By the above theorem, there exist natural numbers
z, u, y such that x ∈ X holds if and only if the polynomial equation
U (x; z, u, y; a1 , . . . , aν ) = 0 has a solution in natural numbers a1 , . . . , aν
(note that x and z, u, y are fixed parameters here). This implies:
Theorem 3.3.
(i) Let (ν, δ) be any of the universal pairs listed above. Then there does
not exist any algorithm that, given a polynomial p of degree at most δ
in ν variables, decides whether p(x1 , . . . , xn ) = 0 has a solution over
the non-negative integers.
(ii) In particular, there does not exist any algorithm that, given a polyno-
mial p in at most 9 variables, decides whether p(x1 , . . . , xn ) = 0 has
a solution over the non-negative integers.
(iii) There also does not exist any algorithm that, given a polynomial p in
at most 36 variables, decides whether p(x1 , . . . , xn ) = 0 has a solution
over the integers.
(iv) There does not exist any algorithm that, given a polynomial p of degree
at most 4, decides whether p(x1 , . . . , xn ) = 0 has a solution over the
non-negative integers (or over the integers).
(v) There does not exist any algorithm that, given a system of quadratic
equations in at most 58 variables, decides whether it has a solution over
the non-negative integers.
(vi) There does not exist any algorithm that, given a system of quadratic
equations in at most 232 variables, decides whether it has a solution
over the integers.


We remark that the bounds of 4 × 9 = 36 and 4 × 58 = 232 are most
probably not sharp; they are obtained by a straightforward application of
the reduction using Lagrange’s theorem.
For integer polynomial optimization, this has the following fundamen-
tal consequences. First of all, Theorem 3.3 can be understood as a state-
ment on the feasibility problem of an integer polynomial optimization prob-
lem. Thus, the feasibility of an integer polynomial optimization problem
with a single polynomial constraint in 9 non-negative integer variables or
36 free integer variables is undecidable, etc.
If we wish to restrict our attention to feasible optimization problems,
we can consider the problem of minimizing p²(x1 , . . . , xn ) over the inte-
gers or non-negative integers and conclude that unconstrained polynomial
optimization in 9 non-negative integer or 36 free integer variables is unde-
cidable. We can also follow Jeroslow [37] and associate with an arbitrary
polynomial p in n variables the optimization problem
min u
s.t. (1 − u) · p(x1 , . . . , xn ) = 0,
u ∈ Z+ , x ∈ Zn+ .
This optimization problem is always feasible and has the optimal solution
value 0 if and only if p(x1 , . . . , xn ) = 0 is solvable, and 1 otherwise. Thus,
optimizing linear forms over one polynomial constraint in 10 non-negative
integer variables is incomputable, and similar statements can be derived
from the other universal pairs above. Jeroslow [37] used the above program
and a degree reduction (by introducing additional variables) to prove the
following.
Theorem 3.4. The problem of minimizing a linear form over
quadratic inequality constraints in integer variables is not computable; this
still holds true for the subclass of problems that are feasible, and where the
minimum value is either 0 or 1.
This statement can be strengthened by giving a bound on the number
of integer variables.
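
To make the mechanics of the Jeroslow construction above concrete, here is
a minimal Python sketch that evaluates the toy program by brute-force search
over a bounded box. The bound box=10 and the function names are illustrative
assumptions of ours; by Theorem 3.4 no general algorithm can exist, so the
search is honest only within the box.

    from itertools import product

    def jeroslow_optimum(p, n, box=10):
        # min u s.t. (1 - u) * p(x) = 0 over u, x >= 0 integer:
        # the value is 0 iff p has a root in Z+^n (here: within the box).
        for x in product(range(box + 1), repeat=n):
            if p(*x) == 0:
                return 0, x      # u = 0 is feasible
        return 1, None           # otherwise only u = 1 is feasible

    # (x1 + 2)(x2 + 2) - 15 has the root (1, 3), so the optimum is 0:
    print(jeroslow_optimum(lambda x1, x2: (x1 + 2) * (x2 + 2) - 15, 2))
    # 2*x1 + 1 is never 0 over the integers, so the reported optimum is 1:
    print(jeroslow_optimum(lambda x1: 2 * x1 + 1, 1))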
4. Hardness and inapproximability. All incomputability results,
of course, no longer apply when finite bounds for all variables are known;
in this case, a trivial enumeration approach gives a finite algorithm. This
is immediately the case when finite bounds for all variables are given in the
problem formulation, such as for 0-1 integer problems.
For other problem classes, even though finite bounds are not given, it is
possible to compute such bounds that either hold for all feasible solutions or
for an optimal solution (if it exists). This is well-known for the case of linear
constraints, where the usual encoding length estimates of basic solutions
[26] are available. As we explain in section 6.2 below, such finite bounds
can also be computed for convex and quasi-convex integer optimization
problems.


In other cases, algorithms to decide feasibility exist even though no
finite bounds for the variables are known. An example is the case of single
Diophantine equations of degree 2, which are decidable using an algorithm
by Siegel [61]. We discuss the complexity of this case below.
Within any such computable subclass, we can ask the question of the
complexity. Below we discuss hardness results that come from the number-
theoretic side of the problem (section 4.1) and those that come from the
continuous optimization side (section 4.2).
4.1. Hardness results from quadratic Diophantine equations
in fixed dimension. The computational complexity of single quadratic
Diophantine equations in 2 variables is already very interesting and rich in
structure; we refer to the excellent paper by Lagarias [43]. Below we
discuss some of these aspects and their implications on optimization.
Testing primality of a number N is equivalent to deciding feasibility
of the equation
(x + 2)(y + 2) = N (4.1)
over the non-negative integers. Recently, Agrawal, Kayal, and Saxena [2]
showed that primality can be tested in polynomial time. However, the
complexity status of finding factors of a composite number, i.e., finding a
solution (x, y) of (4.1), is still unclear.
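
For illustration, the correspondence between feasibility of (4.1) and
compositeness can be made explicit by the following deliberately naive
(exponential-time) Python search; the names are ours.

    def composite_witness(N):
        # (x + 2)(y + 2) = N has a solution over Z+ iff N is composite.
        for x in range(N - 3):               # x + 2 runs over 2, ..., N - 2
            if N % (x + 2) == 0:
                return x, N // (x + 2) - 2   # a solution (x, y) of (4.1)
        return None                          # infeasible: N is prime (or N < 4)

    print(composite_witness(91))   # 91 = 7 * 13  ->  (5, 11)
    print(composite_witness(97))   # 97 is prime  ->  None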
The class also contains subclasses of NP-complete feasibility problems,
such as the problem of deciding for given α, β, γ ∈ N whether there exist
x1 , x2 ∈ Z+ with αx1² + βx2 = γ [47]. On the other hand, the problem of
deciding for given a, c ∈ N whether there exist x1 , x2 ∈ Z with
ax1 x2 + x2 = c lies in NP \ coNP unless NP = coNP [1].
The feasibility problem of the general class of quadratic Diophantine
equations in two (non-negative) variables was shown by Lagarias [43] to
be in NP. This is not straightforward because minimal solutions can have
an encoding size that is exponential in the input size. This can be seen
in the case of the so-called anti-Pellian equation x² − dy² = −1. Here
Lagarias [42] proved that for all d = 5^{2n+1} , there exists a solution, and the
solution with minimal binary encoding length has an encoding length of
Ω(5^n ) (while the input is of encoding length Θ(n)). (We remark that the
special case of the anti-Pellian equation is in coNP, as well.)
Related hardness results include the problem of quadratic congruences
with a bound, i.e., deciding for given a, b, c ∈ N whether there exists a pos-
itive integer x < c with x² ≡ a (mod b); this is the NP-complete problem
AN1 in [25].
From these results, we immediately get the following consequences on
optimization.
Theorem 4.1.
(i) The feasibility problem of quadratically constrained problems in n = 2
integer variables is NP-complete.


(ii) The problem of computing a feasible (or optimal) solution of quadrat-
ically constrained problems in n = 2 integer variables is not
polynomial-time solvable (because the output may require exponential
space).
(iii) The feasibility problem of quadratically constrained problems in n > 2
integer variables is NP-hard (but it is unknown whether it is in NP).
(iv) The problem of minimizing a degree-4 polynomial over the lattice
points of a convex polygon (dimension n = 2) is NP-hard.
(v) The problem of finding the minimal value of a degree-4 polynomial
over Z2+ is NP-hard; writing down an optimal solution cannot be done
in polynomial time.
However, the complexity of minimizing a quadratic form over the in-
teger points in polyhedra of fixed dimension is unclear, even in dimen-
sion n = 2. Consider the integer convex minimization problem
    min αx1² + βx2 ,
    s.t. x1 , x2 ∈ Z+
for α, β ∈ N. Here an optimal solution can be obtained efficiently, as we
explain in section 6.2; in fact, clearly x1 = x2 = 0 is the unique optimal
solution. On the other hand, the problem whether there exists a point
(x1 , x2 ) of a prescribed objective value γ = αx1² + βx2 is NP-complete (see
above). For indefinite quadratic forms, even in dimension 2, nothing seems
to be known.
In varying dimension, the convex quadratic maximization case, i.e.,
maximizing positive definite quadratic forms, is an NP-hard problem. This
is even true in very restricted settings, such as the problem to maximize
Σi (wi x)² over x ∈ {0, 1}n [53].
4.2. Inapproximability of nonlinear optimization in varying
dimension. Even in the pure continuous case, nonlinear optimization is
known to be hard. Bellare and Rogaway [9, 10] proved the following inap-
proximability results using the theory of interactive proof systems.
Theorem 4.2. Assume that P ≠ NP.
(i) For any ε < 1/3, there does not exist a polynomial-time weak ε-approxi-
mation algorithm for the problem of (continuous) quadratic program-
ming over polytopes.
(ii) There exists a constant δ > 0 such that the problem of polynomial
programming over polytopes does not have a polynomial-time weak
(1 − n^{−δ} )-approximation algorithm.
Here the number 1 − n^{−δ} becomes arbitrarily close to 1 for growing n;
note that a weak 1-approximation algorithm is one that gives no guarantee
other than returning a feasible solution.
Inapproximability still holds for the special case of minimizing a qua-
dratic form over the cube [−1, 1]n or over the standard simplex. In the
case of the cube, inapproximability of the max-cut problem is used. In
the case of the standard simplex, it follows via the celebrated Motzkin–
Straus theorem [51] from the inapproximability of the maximum stable set
problem. These are results by Håstad [28]; see also [16].
5. Approximation schemes. For important classes of optimization
problems, while exact optimization is hard, good approximations can still
be obtained efficiently.
Many such examples are known in combinatorial settings. As an ex-
ample in continuous optimization, we refer to the problem of maximizing
homogeneous polynomial functions of fixed degree over simplices. Here de
Klerk et al. [17] proved a weak PTAS.
Below we present a general result for mixed-integer polynomial opti-
mization over polytopes.
5.1. Mixed-integer polynomial optimization in fixed dimen-
sion over linear constraints: FPTAS and weak FPTAS. Here we
consider the problem

max/min f (x1 , . . . , xn )
subject to Ax ≤ b (5.1)
x ∈ Rn1 × Zn2 ,

where A is a rational matrix and b is a rational vector. As we pointed
out above (Theorem 4.1), optimizing degree-4 polynomials over problems
with two integer variables (n1 = 0, n2 = 2) is already a hard problem.
Thus, even when we fix the dimension, we cannot get a polynomial-time
algorithm for solving the optimization problem. The best we can hope for,
even when the number of both the continuous and the integer variables is
fixed, is an approximation result.
We present here the FPTAS obtained by De Loera et al. [18–20], which
uses the “summation method” and the theory of short rational generating
functions pioneered by Barvinok [6, 7]. We review the methods below; the
FPTAS itself appears in Section 5.1.3. An open question is briefly discussed
at the end of this section.
5.1.1. The summation method. The summation method for opti-
mization is the idea of using the elementary relation

    max{s1 , . . . , sN } = lim_{k→∞} (s1^k + · · · + sN^k)^{1/k} ,     (5.2)

which holds for any finite set S = {s1 , . . . , sN } of non-negative real num-
bers. This relation can be viewed as an approximation result for ℓk -norms.
Now if P is a polytope and f is an objective function non-negative on
P ∩ Zd , let x1 , . . . , xN denote all the feasible integer solutions in P ∩ Zd
and collect their objective function values si = f (xi ) in a vector s ∈ QN .


Fig. 1. Approximation properties of ℓk -norms (unit balls for k = 1 and k = 2).

Then, comparing the unit balls of the ℓk -norm and the ℓ∞ -norm (Figure 1),
we get the relation

    Lk := N^{−1/k} ‖s‖k ≤ ‖s‖∞ ≤ ‖s‖k =: Uk .

These estimates are independent of the function f . (Different estimates
that make use of the properties of f , and that are suitable also for the
continuous case, can be obtained from the Hölder inequality; see for in-
stance [3].)
Thus, for obtaining a good approximation of the maximum, it suffices
to solve a summation problem of the polynomial function h = f^k on P ∩ Zd
for a value of k that is large enough. Indeed, for k = ⌈(1 + 1/ε) log N ⌉,
we obtain Uk − Lk ≤ ε f (xmax ). On the other hand, this choice of k is
polynomial in the input size (because 1/ε is encoded in unary in the input,
and log N is bounded by a polynomial in the binary encoding size of the
polytope P ). Hence, when the dimension d is fixed, we can expand the
polynomial function f^k as a list of monomials in polynomial time.
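
As a purely numeric illustration of these bounds, the following Python
sketch evaluates Lk and Uk on an arbitrary example vector of objective
values; it only demonstrates the convergence of the bounds, not the
generating-function machinery (described next) that makes the summation
tractable.

    s = [3.0, 7.0, 2.0, 7.0, 5.0]      # example values s_i = f(x_i)
    N, s_max = len(s), max(s)

    for k in (1, 2, 4, 8, 16, 32):
        Uk = sum(v**k for v in s) ** (1.0 / k)   # the l_k-norm of s
        Lk = Uk / N ** (1.0 / k)                 # N^(-1/k) times the norm
        print(f"k={k:2d}  L_k={Lk:7.3f}  max={s_max}  U_k={Uk:7.3f}")
    # Both bounds converge to max(s) = the l_infinity-norm as k grows.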
5.1.2. Rational generating functions. To solve the summation
problem, one uses the technology of short rational generating functions.
We explain the theory on a simple, one-dimensional example. Let us con-
sider the set S of integers in the interval P = [0, . . . , n]. We associate with S
the polynomial g(P ; z) = z^0 + z^1 + · · · + z^{n−1} + z^n ; i.e., every integer α ∈ S
corresponds to a monomial z^α with coefficient 1 in the polynomial g(P ; z).
This polynomial is called the generating function of S (or of P ). From the
viewpoint of computational complexity, this generating function is of expo-
nential size (in the encoding length of n), just as an explicit list of all the
integers 0, 1, . . . , n − 1, n would be. However, we can observe that g(P ; z)
is a finite geometric series, so there exists a simple summation formula that
expresses it in a much more compact way:

    g(P ; z) = z^0 + z^1 + · · · + z^{n−1} + z^n = (1 − z^{n+1})/(1 − z).     (5.3)
The “long” polynomial has a “short” representation as a rational function.
The encoding length of this new formula is linear in the encoding length
of n. On the basis of this idea, we can solve the summation problem.
Consider the generating function of the interval P = [0, 4],

    g(P ; z) = z^0 + z^1 + z^2 + z^3 + z^4 = 1/(1 − z) − z^5/(1 − z).
We now apply the differential operator z d/dz and obtain

    (z d/dz) g(P ; z) = 1z^1 + 2z^2 + 3z^3 + 4z^4 = z/(1 − z)^2 − (5z^5 − 4z^6)/(1 − z)^2 .

Applying the same differential operator again, we obtain

    (z d/dz)(z d/dz) g(P ; z) = 1z^1 + 4z^2 + 9z^3 + 16z^4
        = (z + z^2)/(1 − z)^3 − (25z^5 − 39z^6 + 16z^7)/(1 − z)^3 .

We have thus evaluated the monomial function h(α) = α^2 for α = 0, . . . , 4;
the results appear as the coefficients of the respective monomials. Substi-
tuting z = 1 yields the desired sum

    [ (z d/dz)(z d/dz) g(P ; z) ]_{z=1} = 1 + 4 + 9 + 16 = 30.

The idea now is to evaluate this sum instead by computing the limit of the
rational function for z → 1,

    Σ_{α=0}^{4} α^2 = lim_{z→1} [ (z + z^2)/(1 − z)^3 − (25z^5 − 39z^6 + 16z^7)/(1 − z)^3 ];

this can be done using residue techniques.
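
The whole computation above can be replayed with a computer algebra
system; in the short sympy sketch below, the limit at z = 1 stands in for
the residue computation.

    import sympy as sp

    z = sp.symbols('z')
    g = (1 - z**5) / (1 - z)               # g(P; z) for P = [0, 4]
    Dg = sp.simplify(z * sp.diff(g, z))    # coefficients become h(a) = a
    DDg = sp.simplify(z * sp.diff(Dg, z))  # coefficients become h(a) = a^2

    print(sp.series(DDg, z, 0, 5))  # z + 4*z**2 + 9*z**3 + 16*z**4 + O(z**5)
    print(sp.limit(DDg, z, 1))      # 1 + 4 + 9 + 16 = 30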


We now present the general definitions and results. Let P ⊆ Rd be a
rational polyhedron. We first define its generating function as the formal
Laurent series g̃(P ; z) = Σ_{α∈P ∩Zd} z^α ∈ Z[[z1 , . . . , zd , z1^{−1} , . . . , zd^{−1} ]], i.e.,
without any consideration of convergence properties. By convergence, one
moves to a rational generating function g(P ; z) ∈ Q(z1 , . . . , zd ).
The following breakthrough result was obtained by Barvinok in 1994.
Theorem 5.1 (Barvinok [6]). Let d be fixed. Then there exists a
polynomial-time algorithm for computing the generating function g(P ; z)
of a polyhedron P ⊆ Rd given by rational inequalities in the form of a
rational function

    g(P ; z) = Σ_{i∈I} εi z^{ai} / Π_{j=1}^{d} (1 − z^{bij}) ,     (5.4)

with εi ∈ {±1}, ai ∈ Zd , and bij ∈ Zd .


5.1.3. Efficient summation using rational generating functions.
Below we describe the theorems on the summation method based on
short rational generating functions, which appeared in [18–20]. Let g(P ; z)
be the rational generating function of P ∩ Zd , computed using Barvinok’s al-
gorithm. By symbolically applying differential operators to g(P ; z), we can
compute a short rational function representation of the Laurent polynomial
g(P, h; z) = Σ_{α∈P ∩Zd} h(α) z^α , where each monomial z^α corresponding to
an integer point α ∈ P ∩ Zd has a coefficient that is the value h(α). As
in the one-dimensional example above, we use the partial differential op-
erators zi ∂/∂zi for i = 1, . . . , d on the short rational generating function. In
fixed dimension, the size of the rational function expressions occurring in
the symbolic calculation can be bounded polynomially. Thus one obtains
the following result.
Theorem 5.2 (see [19], Lemma 3.1).
(a) Let h(x1 , . . . , xd ) = Σβ cβ x^β ∈ Q[x1 , . . . , xd ] be a polynomial. Define
the differential operator

    Dh = h(z1 ∂/∂z1 , . . . , zd ∂/∂zd ) = Σβ cβ (z1 ∂/∂z1 )^{β1} · · · (zd ∂/∂zd )^{βd} .

Then Dh maps the generating function g(P ; z) = Σ_{α∈P ∩Zd} z^α to the
weighted generating function

    (Dh g)(z) = g(P, h; z) = Σ_{α∈P ∩Zd} h(α) z^α .

(b) Let the dimension d be fixed. Let g(P ; z) be the Barvinok representation
of the generating function Σ_{α∈P ∩Zd} z^α of P ∩ Zd . Let h ∈ Q[x1 , . . . , xd ]
be a polynomial, given as a list of monomials with rational coeffi-
cients cβ encoded in binary and exponents β encoded in unary. We
can compute in polynomial time a Barvinok representation g(P, h; z)
for the weighted generating function Σ_{α∈P ∩Zd} h(α) z^α .
Thus, we can implement the following algorithm in polynomial time
(in fixed dimension).
Algorithm 1 (Computation of bounds for the optimal value).
Input: A rational convex polytope P ⊂ Rd ; a polynomial objective func-
tion f ∈ Q[x1 , . . . , xd ] that is non-negative over P ∩ Zd , given as a list of
monomials with rational coefficients cβ encoded in binary and exponents β
encoded in unary; an index k, encoded in unary.
Output: A lower bound Lk and an upper bound Uk for the maximal function
value f ∗ of f over P ∩Zd . The bounds Lk form a nondecreasing, the bounds
Uk a nonincreasing sequence of bounds that both reach f ∗ in a finite number
of steps.
1. Compute a short rational function expression for the generating function
   g(P ; z) = Σ_{α∈P ∩Zd} z^α . Using residue techniques, compute |P ∩ Zd | =
   g(P ; 1) from g(P ; z).
2. Compute the polynomial f^k from f .
3. From the rational function g(P ; z) compute the rational function rep-
   resentation of g(P, f^k ; z) of Σ_{α∈P ∩Zd} f^k (α) z^α by Theorem 5.2. Using
   residue techniques, compute

    Lk := ⌈ (g(P, f^k ; 1)/g(P ; 1))^{1/k} ⌉ and Uk := ⌊ g(P, f^k ; 1)^{1/k} ⌋.
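
For illustration, the following Python sketch traces the steps of
Algorithm 1 on a toy two-dimensional instance, with explicit enumeration
of P ∩ Z² standing in for the polynomial-time generating-function
computations of steps 1 and 3; the polytope and the objective are example
choices of ours.

    import math
    from itertools import product

    # P: triangle {x >= 0, y >= 0, x + y <= 6};  f(x, y) = x*y + 1 >= 0 on P.
    points = [(x, y) for x, y in product(range(7), repeat=2) if x + y <= 6]
    f = lambda x, y: x * y + 1

    N = len(points)                          # |P ∩ Z^2| = g(P; 1)
    for k in (1, 2, 4, 8):
        Sk = sum(f(*p)**k for p in points)   # g(P, f^k; 1)
        Lk = math.ceil((Sk / N) ** (1 / k))
        Uk = math.floor(Sk ** (1 / k))
        print(f"k={k}:  {Lk} <= f* <= {Uk}")
    print("true optimum:", max(f(*p) for p in points))   # f(3, 3) = 10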

From the discussion of the convergence of the bounds, one then obtains
the following result.
Theorem 5.3 (Fully polynomial-time approximation scheme). Let
the dimension d be fixed. Let P ⊂ Rd be a rational convex polytope. Let f
be a polynomial with rational coefficients that is non-negative on P ∩ Zd ,
given as a list of monomials with rational coefficients cβ encoded in binary
and exponents β encoded in unary.
(i) Algorithm 1 computes the bounds Lk , Uk in time polynomial in k, the
input size of P and f , and the total degree D. The bounds satisfy

    Uk − Lk ≤ f ∗ · ( |P ∩ Zd |^{1/k} − 1 ).
(ii) For k = ⌈(1 + 1/ε) log(|P ∩ Zd |)⌉ (a number bounded by a polynomial in
the input size), Lk is a (1 − ε)-approximation to the optimal value f ∗
and it can be computed in time polynomial in the input size, the total
degree D, and 1/ε. Similarly, Uk gives a (1 + ε)-approximation to f ∗ .
(iii) With the same complexity, by iterated bisection of P , we can also find
a feasible solution xε ∈ P ∩ Zd with |f (xε ) − f ∗ | ≤ ε f ∗ .
5.1.4. Extension to the mixed-integer case by discretization.
The mixed-integer case can be handled by discretization of the continuous
variables. We illustrate on an example that one needs to be careful to pick
a sequence of discretizations that actually converges. Consider the mixed-
integer linear optimization problem depicted in Figure 2, whose feasible
region consists of the point (1/2, 1) and the segment { (x, 0) : x ∈ [0, 1] }.
The unique optimal solution is x = 1/2, z = 1. Now consider the sequence
of grid approximations where x ∈ (1/m) Z≥0 . For even m, the unique optimal
solution to the grid approximation is x = 1/2, z = 1. However, for odd m,
the unique optimal solution is x = 0, z = 0. Thus the full sequence of the
optimal solutions to the grid approximations does not converge because it
has two limit points; see Figure 2.
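
The phenomenon is easy to reproduce computationally. In the Python
sketch below, the objective f (x, z) = z − x is only an illustrative stand-in
for the objective behind Figure 2: any objective whose unique optimum is
(1/2, 1), with (0, 0) best on the segment, exhibits the same two limit points.

    from fractions import Fraction

    def grid_optimum(m):
        f = lambda x, z: z - x
        cands = [(Fraction(j, m), 0) for j in range(m + 1)]   # segment z = 0
        if Fraction(1, 2) % Fraction(1, m) == 0:              # is 1/2 on the grid?
            cands.append((Fraction(1, 2), 1))
        return max(cands, key=lambda point: f(*point))

    for m in (2, 3, 4, 5):
        print(m, grid_optimum(m))   # even m -> (1/2, 1);  odd m -> (0, 0)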
To handle polynomial objective functions that take arbitrary (posi-
tive and negative) values, one can shift the objective function by a large
constant. Then, to obtain a strong approximation result, one iteratively
reduces the constant by a factor. Altogether we have the following result.
Theorem 5.4 (Fully polynomial-time approximation schemes). Let
the dimension n = n1 + n2 be fixed. Let an optimization problem (5.1) of
a polynomial function f over the mixed-integer points of a polytope P and
an error bound ε be given, where


Fig. 2. A mixed-integer linear optimization problem and a sequence of optimal
solutions to grid problems with two limit points: f (1/2, 1) = 1 is optimal for even m,
and f (0, 0) = 0 is optimal for odd m.

(I1 ) f is given as a list of monomials with rational coefficients cβ encoded
in binary and exponents β encoded in unary,
(I2 ) P is given by rational inequalities in binary encoding,
(I3 ) the rational number 1/ε is given in unary encoding.
(a) There exists a fully polynomial time approximation scheme (FPTAS) for
the maximization problem for all polynomial functions f (x, z) that are
non-negative on the feasible region. That is, there exists a polynomial-
time algorithm that, given the above data, computes a feasible solution
(xε , zε ) ∈ P ∩ (Rn1 × Zn2 ) with

    |f (xε , zε ) − f (xmax , zmax )| ≤ ε f (xmax , zmax ).

(b) There exists a polynomial-time algorithm that, given the above data,
computes a feasible solution (xε , zε ) ∈ P ∩ (Rn1 × Zn2 ) with

    |f (xε , zε ) − f (xmax , zmax )| ≤ ε ( f (xmax , zmax ) − f (xmin , zmin ) ).

In other words, this is a weak FPTAS.
5.1.5. Open question. Consider the problem (5.1) for a fixed num-
ber n2 of integer variables and a varying number n1 of continuous variables.
Of course, even with no integer variables present (n2 = 0), this is NP-hard
and inapproximable. On the other hand, if the objective function f is
linear, the problem can be solved in polynomial time using Lenstra’s al-
gorithm. Thus it is interesting to consider the problem for an objective
function of restricted nonlinearity, such as
f (x, z) = g(z) + c x,
with an arbitrary polynomial function g in the integer variables and a
linear form in the continuous variables. The complexity (in particular the
existence of approximation algorithms) of this problem is an open question.
6. Polynomial-time algorithms. Here we study important special
cases where polynomial-time algorithms can be obtained. We also include
cases here where the algorithms efficiently approximate the optimal solution
to arbitrary precision, as discussed in section 2.2.


6.1. Fixed dimension: Continuous polynomial optimization.


Here we consider pure continuous polynomial optimization problems of
the form
min f (x1 , . . . , xn )
s.t. g1 (x1 , . . . , xn ) ≤ 0
.. (6.1)
.
gm (x1 , . . . , xn ) ≤ 0
x ∈ Rn .

When the dimension is fixed, this problem can be solved in polynomial
time, in the sense that there exists an algorithm that efficiently computes
an approximation to an optimal solution. This follows from a much more
general theory on the computational complexity of approximating the solu-
tions to general algebraic and semialgebraic formulae over the real numbers
by Renegar [60], which we review in the following. The bulk of this theory
was developed in [57–59]. Similar results appeared in [29]; see also [8, Chap-
ter 14]. One considers problems associated with logic formulas of the form

Q1 x1 ∈ Rn1 : . . . Qω xω ∈ Rnω : P (y, x1 , . . . , xω ) (6.2)

with quantifiers Qi ∈ {∃, ∀}, where P is a Boolean combination of polyno-
mial inequalities such as

gi (y, x1 , . . . , xω ) ≤ 0, i = 1, . . . , m,

or using ≥, <, >, or = as the relation. Here y ∈ Rn0 is a free (i.e., not
quantified) variable. Let d ≥ 2 be an upper bound on the degrees of the
polynomials gi . A vector ȳ ∈ Rn0 is called a solution of this formula if
the formula (6.2) becomes a true logic sentence if we set y = ȳ. Let Y
denote the set of all solutions. An ε-approximate solution is a vector yε
with ‖ȳ − yε ‖ < ε for some solution ȳ ∈ Y .
The following bound can be proved. When the number ω of “blocks”
of quantifiers (i.e., the number of alternations of the quantifiers ∃ and ∀)
is fixed, then the bound is singly exponential in the dimension.
Theorem 6.1. If the formula (6.2) has only integer coefficients of
binary encoding size at most ℓ, then every connected component of Y in-
tersects with the ball { ‖y‖ ≤ r }, where

    log r = ℓ (md)^{2^{O(ω)} n0 n1 ··· nω} .

This bound is used in the following fundamental result, which gives a
general algorithm to compute ε-approximate solutions to the formula (6.2).
Theorem 6.2. There exists an algorithm that, given numbers 0 < ε <
r that are integral powers of 2 and a formula (6.2), computes a set {yi }i of
(md)^{2^{O(ω)} n0 n1 ··· nω} distinct ε-approximate solutions of the formula with the
property that for each connected component of Y ∩ { ‖y‖ ≤ r } one of the
yi is within distance ε. The algorithm runs in time

    (md)^{2^{O(ω)} n0 n1 ··· nω} · ( ℓ + md + log(1/ε) + log r )^{O(1)} .

This can be applied to polynomial optimization problems as follows.


Consider the formula

    ∀x ∈ Rn : g1 (y) ≤ 0 ∧ · · · ∧ gm (y) ≤ 0
        ∧ [ g1 (x) > 0 ∨ · · · ∨ gm (x) > 0 ∨ f (y) − f (x) ≤ 0 ];     (6.3)

this describes that y is an optimal solution (all other solutions x are either
infeasible or have a higher objective value). Thus optimal solutions can be
efficiently approximated using the algorithm of Theorem 6.2.
6.2. Fixed dimension: Convex and quasi-convex integer poly-
nomial minimization. In this section we consider the case of the mini-
mization of convex and quasi-convex polynomials f over the mixed-integer
points in convex regions given by convex and quasi-convex polynomial func-
tions g1 , . . . , gm :

min f (x1 , . . . , xn )
s.t. g1 (x1 , . . . , xn ) ≤ 0
.. (6.4)
.
gm (x1 , . . . , xn ) ≤ 0
x ∈ Rn1 × Zn2 .

Here a function g : Rn → R is called quasi-convex if every lower level set


Lλ = { x ∈ Rn : g(x) ≤ λ } is a convex set.
The complexity in this setting is fundamentally different from the gen-
eral (non-convex) case. One important aspect is that bounding results for
the coordinates of optimal integer solutions exist, which are similar to
the ones for continuous solutions in Theorem 6.1 above. For the case of
convex functions, these bounding results were obtained by [40, 63]. An im-
proved bound was obtained by [4, 5], which also handles the more general
case of quasi-convex polynomials. This bound follows from the efficient
theory of quantifier elimination over the real numbers that we referred to
in section 6.1.
Theorem 6.3. Let f, g1 , . . . , gm ∈ Z[x1 , . . . , xn ] be quasi-convex poly-
nomials of degree at most d ≥ 2, whose coefficients have a binary encoding
length of at most ℓ. Let

    F = { x ∈ Rn : gi (x) ≤ 0 for i = 1, . . . , m }


Fig. 3. Branching on hyperplanes corresponding to approximate lattice width di-
rections of the feasible region in a Lenstra-type algorithm.

be the (continuous) feasible region. If the integer minimization problem
min{ f (x) : x ∈ F ∩ Zn } is bounded, there exists a radius R ∈ Z+ of binary
encoding length at most (md)^{O(n)} ℓ such that

    min{ f (x) : x ∈ F ∩ Zn } = min{ f (x) : x ∈ F ∩ Zn , ‖x‖ ≤ R }.
Using this finite bound, a trivial enumeration algorithm can find an
optimal solution (but not in polynomial time, not even in fixed dimen-
sion). Thus the incomputability result for integer polynomial optimization
(Theorem 3.3) does not apply to this case.
The unbounded case can be efficiently detected in the case of quasi-
convex polynomials; see [5] and [52], the latter of which also handles the
case of “faithfully convex” functions that are not polynomials.
In fixed dimension, the problem of convex integer minimization can be
solved efficiently using variants of Lenstra’s algorithm [46] for integer pro-
gramming. Lenstra-type algorithms are algorithms for solving feasibility
problems. We consider a family of feasibility problems associated with the
optimization problem,
∃x ∈ Fα ∩ Zn where Fα = { x ∈ F : f (x) ≤ α } for α ∈ Z. (6.5)
If bounds for f (x) on the feasible regions of polynomial binary encoding
size are known, a polynomial time algorithm for this feasibility problem
can be used in a binary search to solve the optimization problem in poly-
nomial time. Indeed, when the dimension n is fixed, the bound R given by
Theorem 6.3 has a binary encoding size that is bounded polynomially by
the input data.
A Lenstra-type algorithm uses branching on hyperplanes (Figure 3) to
obtain polynomial time complexity in fixed dimension. Note that only the
binary encoding size of the bound R, but not R itself, is bounded polyno-
mially. Thus, multiway branching on the values of a single variable xi will
create an exponentially large number of subproblems. Instead, a Lenstra-
type algorithm computes a primitive lattice vector w ∈ Zn such that there
are only few lattice hyperplanes w x = γ (with γ ∈ Z) that can have
a nonempty intersection with Fα . The width of Fα in the direction w,
defined as

max{ w x : x ∈ Fα } − min{ w x : x ∈ Fα } (6.6)

essentially gives the number of these lattice hyperplanes. A lattice width
direction is a minimizer of the width among all directions w ∈ Zn ; the
lattice width is the corresponding width. Any polynomial bound on the width
will yield a polynomial-time algorithm in fixed dimension.
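
Computing the width (6.6) in a fixed direction w amounts to two linear
programs. The following sketch (using scipy, on an example slab of ours)
illustrates why the direction matters: the slab meets many lattice
hyperplanes orthogonal to a coordinate direction, but only two in the
direction of its normal. Finding the minimizing w is the hard part,
addressed by the lattice techniques discussed next.

    import numpy as np
    from scipy.optimize import linprog

    def width(A, b, w):
        # max{wx : Ax <= b} - min{wx : Ax <= b}, variables unbounded
        hi = linprog(-np.asarray(w, float), A_ub=A, b_ub=b, bounds=(None, None))
        lo = linprog(np.asarray(w, float), A_ub=A, b_ub=b, bounds=(None, None))
        return -hi.fun - lo.fun

    # Thin slab 0 <= x + 10y <= 1, 0 <= y <= 10, written as Ax <= b.
    A = np.array([[1., 10.], [-1., -10.], [0., 1.], [0., -1.]])
    b = np.array([1., 0., 10., 0.])
    print(width(A, b, [1, 0]))    # 101.0: many lattice hyperplanes x = gamma
    print(width(A, b, [1, 10]))   # 1.0: only gamma = 0, 1 can meet the slab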
Exact and approximate lattice width directions w can be constructed
using geometry of numbers techniques. We refer to the excellent tuto-
rial [24] and the classic references cited therein. The key to dealing with
the feasible region Fα is to apply ellipsoidal rounding. By applying the
shallow-cut ellipsoid method (which we describe in more detail below), one
finds concentric proportional inscribed and circumscribed ellipsoids that
differ by some factor β that only depends on the dimension n. Then
any η-approximate lattice width direction for the ellipsoids gives a βη-
approximate lattice width direction for Fα . Lenstra’s original algorithm
now uses an LLL-reduced basis of the lattice Zn with respect to a norm
associated with the ellipsoid; the last basis vector then serves as an approx-
imate lattice width direction.
The first algorithm of this kind for convex integer minimization was
announced by Khachiyan [40]. In the following we present the variant
of Lenstra’s algorithm due to Heinz [30], which seems to yield the best
complexity bound for the problem published so far. The complexity result
is the following.
Theorem 6.4. Let f, g1 , . . . , gm ∈ Z[x1 , . . . , xn ] be quasi-convex poly-
nomials of degree at most d ≥ 2, whose coefficients have a binary en-
coding length of at most ℓ. There exists an algorithm running in time
m ℓ^{O(1)} d^{O(n)} 2^{O(n³)} that computes a minimizer x∗ ∈ Zn of the problem (6.4)
or reports that no minimizer exists. If the algorithm outputs a mini-
mizer x∗ , its binary encoding size is ℓ d^{O(n)} .
We remark that the complexity guarantees can be improved dramati-
cally by combining Heinz’ technique with more recent variants of Lenstra’s
algorithm that rely on the fast computation of shortest vectors [33].
A complexity result of greater generality was presented by Khachiyan
and Porkolab [41]. It covers the case of minimization of convex polynomials
over the integer points in convex semialgebraic sets given by arbitrary (not
necessarily quasi-convex) polynomials.
Theorem 6.5. Let Y ⊆ Rn0 be a convex set given by

    Y = { y ∈ Rn0 : Q1 x1 ∈ Rn1 : · · · Qω xω ∈ Rnω : P (y, x1 , . . . , xω ) }

with quantifiers Qi ∈ {∃, ∀}, where P is a Boolean combination of polyno-
mial inequalities

    gi (y, x1 , . . . , xω ) ≤ 0, i = 1, . . . , m,

with degrees at most d ≥ 2 and coefficients of binary encoding size at most ℓ.
There exists an algorithm for solving the problem min{ yn0 : y ∈ Y ∩ Zn0 }
in time ℓ^{O(1)} (md)^{O(n0⁴) · Π_{i=1}^{ω} O(ni)} .
When the dimension n0 + n1 + · · · + nω is fixed, the algorithm runs in
polynomial time. For the case of convex minimization where the feasible
region is described by convex polynomials, the complexity bound of Theo-
rem 6.5, however, translates to ℓ^{O(1)} m^{O(n²)} d^{O(n⁴)} , which is worse than the
bound of Theorem 6.4 [30].


In the remainder of this subsection, we describe the ingredients of
the variant of Lenstra’s algorithm due to Heinz. The algorithm starts out
by “rounding” the feasible region, by applying the shallow-cut ellipsoid
method to find proportional inscribed and circumscribed ellipsoids. It is
well-known [26] that the shallow-cut ellipsoid method only needs an initial
circumscribed ellipsoid that is “small enough” (of polynomial binary en-
coding size – this follows from Theorem 6.3) and an implementation of a
shallow separation oracle, which we describe below.
For a positive-definite matrix A we denote by E(A, x̂) the ellipsoid
{ x ∈ Rn : (x − x̂) A^{−1} (x − x̂) ≤ 1 }.
Lemma 6.1 (Shallow separation oracle). Let g0 , . . . , gm+1 ∈ Z[x] be
quasi-convex polynomials of degree at most d, the binary encoding sizes of
whose coefficients are at most r. Let the (continuous) feasible region F =
{ x ∈ Rn : gi (x) < 0 } be contained in the ellipsoid E(A, x̂), where A and x̂
have binary encoding size at most ℓ. There exists an algorithm with running
time m (ℓnr)^{O(1)} d^{O(n)} that outputs
(a) “true” if

    E((n + 1)^{−3} A, x̂) ⊆ F ⊆ E(A, x̂);     (6.7)

(b) otherwise, a vector c ∈ Qn \ {0} of binary encoding length (ℓ + r)(dn)^{O(1)}
with

    F ⊆ E(A, x̂) ∩ { x ∈ Rn : c (x − x̂) ≤ (1/(n + 1)) (c Ac)^{1/2} }.     (6.8)

Proof. We give a simplified sketch of the proof, without hard complex-
ity estimates. By applying an affine transformation to F ⊆ E(A, x̂), we
can assume that F is contained in the unit ball E(I, 0). Let us denote as
usual by e1 , . . . , en the unit vectors and by en+1 , . . . , e2n their negatives.
The algorithm first constructs numbers λi1 , . . . , λid > 0 with

    1/(n + 3/2) < λi1 < · · · < λid < 1/(n + 1)     (6.9)


Fig. 4. The implementation of the shallow separation oracle. (a) Test points xij
in the circumscribed ball E(1, 0). (b) Case I: All test points xi1 are (continuously)
feasible; so their convex hull (a cross-polytope) and its inscribed ball E((n + 1)−3 , 0) are
contained in the (continuous) feasible region F .

Fig. 5. The implementation of the shallow separation oracle. (a) Case II: The
center 0 violates a polynomial inequality g0 (x) < 0 (say). Due to convexity, for all
i = 1, . . . , n, one set of each pair Bi ∩ F and Bn+i ∩ F must be empty. (b) Case III: A
test point xk1 is infeasible, as it violates an inequality g0 (x) < 0 (say). However, the
center 0 is feasible at least for this inequality.

and the corresponding point sets Bi = { xij := λij ei : j = 1, . . . , d };
see Figure 4 (a). The choice of the bounds (6.9) for λij will ensure that
we either find a large enough inscribed ball for (a) or a deep enough cut
for (b). Then the algorithm determines the (continuous) feasibility of the
center 0 and the 2n innermost points xi,1 .
Case I. If xi,1 ∈ F for i = 1, . . . , 2n, then the cross-polytope
conv{ xi,1 : i = 1, . . . , 2n } is contained in F ; see Figure 4 (b). An easy
calculation shows that the ball E((n + 1)−3 , 0) is contained in the cross-
polytope and thus in F ; see Figure 4. Hence the condition in (a) is satisfied
and the algorithm outputs “true”.
Case II. We now discuss the case when the center 0 violates a poly-
nomial inequality g0 (x) < 0 (say). Let F0 = { x ∈ Rn : g0 (x) < 0 } ⊇ F .
Due to convexity of F0 , for all i = 1, . . . , n, one set of each pair Bi ∩ F0 and
Bn+i ∩ F0 must be empty; see Figure 5 (a). Without loss of generality, let
us assume Bn+i ∩ F0 = ∅ for all i. We can determine whether an n-variate
polynomial function of known maximum degree d is constant by evaluating
it on (d + 1)^n suitable points (this is a consequence of the Fundamental
Theorem of Algebra). For our case of quasi-convex polynomials, this can
be improved; indeed, it suffices to test whether the gradient ∇g0 vanishes
on the nd points in the set B1 ∪ · · · ∪ Bn . If it does, we know that g0 is con-
stant, thus F = ∅, and so we can return an arbitrary vector c. Otherwise,
there is a point xij ∈ Bi with c := ∇g0 (xij ) ≠ 0; we return this vector as
the desired normal vector of a shallow cut. Due to the choice of λij as a
number smaller than 1/(n + 1), the cut is deep enough into the ellipsoid
E(A, x̂), so that (6.8) holds.
Case III. The remaining case to discuss is when 0 ∈ F but there exists
a k ∈ {1, . . . , 2n} with xk,1 ∉ F . Without loss of generality, let k = 1, and
let x1,1 violate the polynomial inequality g0 (x) < 0, i.e., g0 (x1,1 ) ≥ 0; see
Figure 5 (b). We consider the univariate polynomial φ(λ) = g0 (λe1 ). We
have φ(0) = g0 (0) < 0 and φ(λ1,1 ) ≥ 0, so φ is not constant. Because φ has
degree at most d, its derivative φ′ has degree at most d − 1, so φ′ has at most
d − 1 roots. Thus, for at least one of the d different values λ1,1 , . . . , λ1,d ,
say λ1,j , we must have φ′ (λ1,j ) ≠ 0. This implies that c := ∇g0 (x1,j ) ≠ 0.
By convexity, we have x1,j ∉ F , so we can use c as the normal vector of a
shallow cut.
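
The following Python sketch caricatures this case analysis for a single
smooth quasi-convex constraint after the transformation to the unit ball.
It collapses Cases II and III to a single gradient cut at the violating point
(valid for quasi-convex g by the first-order characterization of
quasi-convexity), and it omits the λij bookkeeping and the degenerate
subcases with vanishing gradients treated above; the choice of λ and all
names are ours. Needless to say, it is the full oracle of Lemma 6.1, not
this caricature, that enters the ellipsoid method below.

    import numpy as np

    def shallow_oracle(g, grad_g, n):
        lam = 1.0 / (n + 1.25)          # between 1/(n + 3/2) and 1/(n + 1)
        center = np.zeros(n)
        if g(center) >= 0:              # Case II: center infeasible
            return "cut", grad_g(center)
        for e in np.eye(n):             # Case III: an innermost point infeasible
            for x in (lam * e, -lam * e):
                if g(x) >= 0:
                    return "cut", grad_g(x)
        return "true", None             # Case I: inscribed ball exists

    # Small ellipse g(x) = x0^2 + 4 x1^2 - 0.5 < 0 in R^2.
    g = lambda x: x[0]**2 + 4 * x[1]**2 - 0.5
    grad = lambda x: np.array([2 * x[0], 8 * x[1]])
    print(shallow_oracle(g, grad, 2))   # ('true', None)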
By using this oracle in the shallow-cut ellipsoid method, one obtains
the following result.
Corollary 6.1. Let g0 , . . . , gm ∈ Z[x] be quasi-convex polynomials
of degree at most d ≥ 2. Let the (continuous) feasible region F = { x ∈
Rn : gi (x) ≤ 0 } be contained in the ellipsoid E(A0 , 0), given by the positive-
definite matrix A0 ∈ Qn×n . Let ε ∈ Q>0 be given. Let the entries of A0
and the coefficients of all monomials of g0 , . . . , gm have binary encoding
size at most ℓ.
There exists an algorithm with running time m (ℓn)^{O(1)} d^{O(n)} that com-
putes a positive-definite matrix A ∈ Qn×n and a point x̂ ∈ Qn with
(a) either E((n + 1)^{−3} A, x̂) ⊆ F ⊆ E(A, x̂)
(b) or F ⊆ E(A, x̂) and vol E(A, x̂) < ε.
Finally, there is a lower bound for the volume of a continuous feasible
region F that can contain an integer point.
Lemma 6.2. Under the assumptions of Corollary 6.1, if F ∩ Zn ≠ ∅,
there exists an ε ∈ Q>0 of binary encoding size ℓ (dn)^{O(1)} with vol F > ε.
On the basis of these results, one obtains a Lenstra-type algorithm for
the decision version of the convex integer minimization problem with the
desired complexity. By applying binary search, the optimization problem
can be solved, which provides a proof of Theorem 6.4.
6.3. Fixed dimension: Convex integer maximization. Maxi-
mizing a convex function over the integer points in a polytope in fixed
dimension can be done in polynomial time. To see this, note that the op-
timal value is taken on at a vertex of the convex hull of all feasible integer
points. But when the dimension is fixed, there is only a polynomial number
of vertices, as Cook et al. [15] showed.
Theorem 6.6. Let P = { x ∈ Rn : Ax ≤ b } be a rational polyhedron
with A ∈ Qm×n and let φ be the largest binary encoding size of any of the
rows of the system Ax ≤ b. Let P I = conv(P ∩ Zn ) be the integer hull
of P . Then the number of vertices of P I is at most 2m^n (6n²φ)^{n−1} .
Moreover, Hartmann [27] gave an algorithm for enumerating all the
vertices, which runs in polynomial time in fixed dimension.
By using Hartmann’s algorithm, we can therefore compute all the ver-
tices of the integer hull P I , evaluate the convex objective function on each of
them and pick the best. This simple method already provides a polynomial-
time algorithm.
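
In dimension two this simple method is easy to emulate: in the sketch
below, brute-force enumeration of P ∩ Z² plus an off-the-shelf convex hull
routine stand in for Hartmann's polynomial-time vertex enumeration; the
polytope and objective are example data of ours.

    import numpy as np
    from itertools import product
    from scipy.spatial import ConvexHull

    # P = {x : Ax <= b} intersected with the box [0, 6]^2.
    A = np.array([[1., 1.], [-1., 0.], [0., -1.], [1., -2.]])
    b = np.array([8., 0., 0., 3.])
    pts = np.array([p for p in product(range(7), repeat=2)
                    if np.all(A @ np.array(p) <= b)])       # P ∩ Z^2

    hull = ConvexHull(pts)                # its vertices span the integer hull
    f = lambda v: (v[0] - 1)**2 + (v[1] - 1)**2   # a convex objective
    best = max((pts[i] for i in hull.vertices), key=f)
    print(best, f(best))   # the convex maximum is attained at a hull vertex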
7. Strongly polynomial-time algorithms: Submodular func-
tion minimization. In important specially structured cases, even
strongly polynomial-time algorithms are available. Probably the most
well-known case is that of submodular function minimization. We briefly
present the most recent developments below.
Here we consider the important problem of submodular function min-
imization. This class of problems consists of unconstrained 0/1 program-
ming problems

min{ f (x) : x ∈ {0, 1}n },

where the function f is submodular, i.e.,

f (x) + f (y) ≥ f (max{x, y}) + f (min{x, y}).

Here max and min denote the componentwise maximum and minimum of
the vectors, respectively.
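
For a concrete feel, the sketch below checks the defining inequality for
the cut function of a small graph (a standard example of a submodular
function) and minimizes it by exhaustive search over {0, 1}^n; Orlin's
algorithm, discussed next, replaces this exponential enumeration.

    from itertools import product

    edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]   # a graph on 4 nodes

    def cut(x):          # number of edges leaving S = {i : x_i = 1}
        return sum(1 for u, v in edges if x[u] != x[v])

    def is_submodular(f, n):
        pts = list(product((0, 1), repeat=n))
        return all(f(x) + f(y) >=
                   f(tuple(max(a, b) for a, b in zip(x, y))) +
                   f(tuple(min(a, b) for a, b in zip(x, y)))
                   for x in pts for y in pts)

    print(is_submodular(cut, 4))                      # True
    print(min(product((0, 1), repeat=4), key=cut))    # (0, 0, 0, 0): empty cut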
The fastest algorithm known for submodular function minimization
seems to be by Orlin [54], who gave a strongly polynomial-time algorithm
of running time O(n^5 Teval + n^6 ), where Teval denotes the running time of
the evaluation oracle. The algorithm is “combinatorial”, i.e., it does not
use the ellipsoid method. This complexity bound simultaneously improved
that of the fastest strongly polynomial-time algorithm using the ellipsoid
method, of running time Õ(n^5 Teval + n^7 ) (see [50]), and the fastest “com-
binatorial” strongly polynomial-time algorithm by Iwata [35], of running
time O((n^6 Teval + n^7 ) log n). We remark that the fastest polynomial-time
algorithm, by Iwata [35], runs in time O((n^4 Teval + n^5 ) log M ), where M is
the largest function value. We refer to the recent survey by Iwata [36], who
reports on the developments that preceded Orlin’s algorithm [54].
For the special case of symmetric submodular function minimization,
i.e., f (x) = f (1 − x), Queyranne [56] presented an algorithm of running
time O(n^3 Teval ).
Acknowledgments. The author wishes to thank the referees, in par-
ticular for their comments on the presentation of the Lenstra-type algo-
rithm, and his student Robert Hildebrand for a subsequent discussion about
this topic.


REFERENCES

[1] L. Adleman and K. Manders, Reducibility, randomness and intractability,


in Proc. 9th Annual ACM Symposium on Theory of Computing, 1977,
pp. 151–163.
[2] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Annals of Math., 160
(2004), pp. 781–793.
[3] V. Baldoni, N. Berline, J.A. De Loera, M. Köppe, and M. Vergne, How to
integrate a polynomial over a simplex. To appear in Mathematics of Compu-
tation. eprint arXiv:0809.2083 [math.MG], 2008.
[4] B. Bank, J. Heintz, T. Krick, R. Mandel, and P. Solernó, Une borne optimale
pour la programmation entière quasi-convexe, Bull. Soc. math. France, 121
(1993), pp. 299–314.
[5] B. Bank, T. Krick, R. Mandel, and P. Solernó, A geometrical bound for inte-
ger programming with polynomial constraints, in Fundamentals of Computa-
tion Theory, Vol. 529 of Lecture Notes In Computer Science, Springer-Verlag,
1991, pp. 121–125.
[6] A.I. Barvinok, A polynomial time algorithm for counting integral points in poly-
hedra when the dimension is fixed, Mathematics of Operations Research, 19
(1994), pp. 769–779.
[7] A.I. Barvinok and J.E. Pommersheim, An algorithmic theory of lattice points
in polyhedra, in New Perspectives in Algebraic Combinatorics, L.J. Billera,
A. Björner, C. Greene, R.E. Simion, and R.P. Stanley, eds., Vol. 38 of Math.
Sci. Res. Inst. Publ., Cambridge Univ. Press, Cambridge, 1999, pp. 91–147.
[8] S. Basu, R. Pollack, and M.-F. Roy, Algorithms in Real Algebraic Geometry,
Springer-Verlag, second ed., 2006.
[9] M. Bellare and P. Rogaway, The complexity of aproximating a nonlinear pro-
gram, in Pardalos [55].
[10] M. Bellare and P. Rogaway, The complexity of approximating a nonlinear pro-
gram, Mathematical Programming, 69 (1995), pp. 429–441.
[11] Y. Berstein, J. Lee, H. Maruri-Aguilar, S. Onn, E. Riccomagno, R. Weis-
mantel, and H. Wynn, Nonlinear matroid optimization and experimental
design, SIAM Journal on Discrete Mathematics, 22 (2008), pp. 901–919.
[12] Y. Berstein, J. Lee, S. Onn, and R. Weismantel, Nonlinear optimization for
matroid intersection and extensions, IBM Research Report RC24610 (2008).
[13] Y. Berstein and S. Onn, Nonlinear bipartite matching, Discrete Optimization, 5
(2008), pp. 53–65.
[14] L. Blum, M. Shub, and S. Smale, On a theory of computation and complexity
over the real numbers: NP-completeness, recursive functions and universal
machines, Bull. Am. Math. Soc., 21 (1989), pp. 1–46.
[15] W.J. Cook, M.E. Hartmann, R. Kannan, and C. McDiarmid, On integer points
in polyhedra, Combinatorica, 12 (1992), pp. 27–37.
[16] E. de Klerk, The complexity of optimizing over a simplex, hypercube or sphere:
a short survey, Central European Journal of Operations Research, 16 (2008),
pp. 111–125.
[17] E. de Klerk, M. Laurent, and P.A. Parrilo, A PTAS for the minimization of
polynomials of fixed degree over the simplex, Theoretical Computer Science,
361 (2006), pp. 210–225.
[18] J.A. De Loera, R. Hemmecke, M. Köppe, and R. Weismantel, FPTAS for
mixed-integer polynomial optimization with a fixed number of variables, in
17th ACM-SIAM Symposium on Discrete Algorithms, 2006, pp. 743–748.
[19] , Integer polynomial optimization in fixed dimension, Mathematics of Oper-
ations Research, 31 (2006), pp. 147–153.
[20] , FPTAS for optimizing polynomials over the mixed-integer points of poly-
topes in fixed dimension, Mathematical Programming, Series A, 118 (2008),
pp. 273–290.


[21] J.A. De Loera, R. Hemmecke, S. Onn, and R. Weismantel, N-fold integer


programming, Disc. Optim., to appear (2008).
[22] J.A. De Loera and S. Onn, All linear and integer programs are slim 3-
way transportation programs, SIAM Journal of Optimization, 17 (2006),
pp. 806–821.
[23] , Markov bases of three-way tables are arbitrarily complicated, Journal of
Symbolic Computation, 41 (2006), pp. 173–181.
[24] F. Eisenbrand, Integer programming and algorithmic geometry of numbers, in 50
Years of Integer Programming 1958–2008, M. Jünger, T. Liebling, D. Naddef,
W. Pulleyblank, G. Reinelt, G. Rinaldi, and L. Wolsey, eds., Springer-Verlag,
2010.
[25] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the
Theory of NP-completeness, W.H. Freeman and Company, New York, NY,
1979.
[26] M. Grötschel, L. Lovász, and A. Schrijver, Geometric Algorithms and Com-
binatorial Optimization, Springer, Berlin, Germany, 1988.
[27] M.E. Hartmann, Cutting Planes and the Complexity of the Integer Hull, phd
thesis, Cornell University, Department of Operations Research and Industrial
Engineering, Ithaca, NY, 1989.
[28] J. Håstad, Some optimal inapproximability results, in Proceedings of the 29th
Symposium on the Theory of Computing (STOC), ACM, 1997, pp. 1–10.
[29] J. Heintz, M. Roy, and P. Solernó, Sur la complexité du principe de Tarski–
Seidenberg, Bull. Soc. Math. France, 118 (1990), pp. 101–126.
[30] S. Heinz, Complexity of integer quasiconvex polynomial optimization, Journal of
Complexity, 21 (2005), pp. 543–556.
[31] R. Hemmecke, M. Köppe, J. Lee, and R. Weismantel, Nonlinear integer
programming, in 50 Years of Integer Programming 1958–2008, M. Jünger,
T. Liebling, D. Naddef, W. Pulleyblank, G. Reinelt, G. Rinaldi, and L. Wolsey,
eds., Springer-Verlag, 2010.
[32] R. Hemmecke, S. Onn, and R. Weismantel, A polynomial oracle-time algorithm
for convex integer minimization, Mathematical Programming, Series A (2009).
Published online 06 March 2009.
[33] R. Hildebrand and M. Köppe, A faster algorithm for quasi-convex integer poly-
nomial optimization. eprint arXiv:1006.4661 [math.OC], 2010.
[34] D. Hochbaum, Complexity and algorithms for nonlinear optimization problems,
Annals of Operations Research, 153 (2007), pp. 257–296.
[35] S. Iwata, A faster scaling algorithm for minimizing submodular functions, SIAM
Journal on Computing, 32 (2003), pp. 833–840.
[36] , Submodular function minimization, Mathematical Programming, 112
(2008), pp. 45–64.
[37] R.G. Jeroslow, There cannot be any algorithm for integer programming with
quadratic constraints, Operations Research, 21 (1973), pp. 221–224.
[38] J.P. Jones, Universal diophantine equation, Journal of Symbolic Logic, 47 (1982),
pp. 403–410.
[39] J.P. Jones and Yu. V. Matiyasevich, Proof of recursive unsolvability of Hilbert’s
tenth problem, The American Mathematical Monthly, 98 (1991), pp. 689–709.
[40] L.G. Khachiyan, Convexity and complexity in polynomial programming, in Pro-
ceedings of the International Congress of Mathematicians, August 16–24, 1983,
Warszawa, Z. Ciesielski and C. Olech, eds., New York, 1984, North-Holland,
pp. 1569–1577.
[41] L.G. Khachiyan and L. Porkolab, Integer optimization on convex semialgebraic
sets., Discrete and Computational Geometry, 23 (2000), pp. 207–224.
[42] J.C. Lagarias, On the computational complexity of determining the solvability or
unsolvability of the equation x² − dy² = −1, Transactions of the American
Mathematical Society, 260 (1980), pp. 485–508.


[43] , Succinct certificates for the solvability of binary quadratic diophantine


equations. e-print arXiv:math/0611209v1, 2006. Extended and updated ver-
sion of a 1979 FOCS paper.
[44] J. Lee, S. Onn, and R. Weismantel, Nonlinear optimization over a weighted
independence system, IBM Research Report RC24513 (2008).
[45] , On test sets for nonlinear integer maximization, Operations Research Let-
ters, 36 (2008), pp. 439–443.
[46] H.W. Lenstra, Integer programming with a fixed number of variables, Mathemat-
ics of Operations Research, 8 (1983), pp. 538–548.
[47] K. Manders and L. Adleman, NP-complete decision problems for binary quadrat-
ics, J. Comp. Sys. Sci., 16 (1978), pp. 168–184.
[48] Yu. V. Matiyasevich, Enumerable sets are diophantine, Doklady Akademii Nauk
SSSR, 191 (1970), pp. 279–282. (Russian); English translation, Soviet Math-
ematics Doklady, Vol. 11 (1970), pp. 354–357.
[49] , Hilbert’s tenth problem, The MIT Press, Cambridge, MA, USA, 1993.
[50] S.T. McCormick, Submodular function minimization, in Discrete Optimization,
K. Aardal, G. Nemhauser, and R. Weismantel, eds., Vol. 12 of Handbooks in
Operations Research and Management Science, Elsevier, 2005.
[51] T.S. Motzkin and E.G. Straus, Maxima for graphs and a new proof of a theorem
of Turán, Canadian Journal of Mathematics, 17 (1965), pp. 533–540.
[52] W.T. Obuchowska, On boundedness of (quasi-)convex integer optimization prob-
lems, Math. Meth. Oper. Res., 68 (2008).
[53] S. Onn, Convex discrete optimization. eprint arXiv:math/0703575, 2007.
[54] J.B. Orlin, A faster strongly polynomial time algorithm for submodular function
minimization, Math. Program., Ser. A, 118 (2009), pp. 237–251.
[55] P.M. Pardalos, ed., Complexity in Numerical Optimization, World Scientific,
1993.
[56] M. Queyranne, Minimizing symmetric submodular functions, Mathematical Pro-
gramming, 82 (1998), pp. 3–12.
[57] J. Renegar, On the computational complexity and geometry of the first-order
theory of the reals, part I: Introduction. Preliminaries. The geometry of semi-
algebraic sets. The decision problem for the existential theory of the reals,
Journal of Symbolic Computation, 13 (1992), pp. 255–300.
[58] , On the computational complexity and geometry of the first-order theory of
the reals, part II: The general decision problem. Preliminaries for quantifier
elimination, Journal of Symbolic Computation, 13 (1992), pp. 301–328.
[59] , On the computational complexity and geometry of the first-order theory of
the reals. part III: Quantifier elimination, Journal of Symbolic Computation,
13 (1992), pp. 329–352.
[60] , On the computational complexity of approximating solutions for real alge-
braic formulae, SIAM Journal on Computing, 21 (1992), pp. 1008–1025.
[61] C.L. Siegel, Zur Theorie der quadratischen Formen, Nachrichten der Akademie
der Wissenschaften in Göttingen, II, Mathematisch-Physikalische Klasse, 3
(1972), pp. 21–46.
[62] T. Skolem, Diophantische Gleichungen, Vol. 5 of Ergebnisse der Mathematik und
ihrer Grenzgebiete, 1938.
[63] S.P. Tarasov and L.G. Khachiyan, Bounds of solutions and algorithmic com-
plexity of systems of convex diophantine inequalities, Soviet Math. Doklady,
22 (1980), pp. 700–704.
[64] A.M. Turing, On computable numbers, with an application to the Entschei-
dungsproblem, Proceedings of the London Mathematical Society, Series 2, 42
(1936), pp. 230–265. Errata in ibidem, 43 (1937):544–546.
[65] S.A. Vavasis, Polynomial time weak approximation algorithms for quadratic pro-
gramming, in Pardalos [55].

THEORY AND APPLICATIONS
OF N-FOLD INTEGER PROGRAMMING
SHMUEL ONN∗

Abstract. We overview our recently introduced theory of n-fold integer program-
ming which enables the polynomial time solution of fundamental linear and nonlinear
integer programming problems in variable dimension. We demonstrate its power by
obtaining the first polynomial time algorithms in several application areas including
multicommodity flows and privacy in statistical databases.

Key words. Integer programming, transportation problem, multiway table, mul-
ticommodity flow, combinatorial optimization, nonlinear optimization, Graver base,
Graver complexity, discrete optimization, privacy in databases, data security.

AMS(MOS) subject classifications. 05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q,
68R, 68U, 68W, 90B, 90C.

1. Introduction. Linear integer programming is the following fun-
damental optimization problem,

min {wx : x ∈ Zn , Ax = b , l ≤ x ≤ u} ,

where A is an integer m × n matrix, b ∈ Zm , and l, u ∈ Zn∞ with Z∞ :=
Z ∪ {±∞}. It is generally NP-hard, but polynomial time solvable in two
fundamental situations: the dimension is fixed [18]; the underlying matrix
is totally unimodular [15].
Recently, in [4], a new fundamental polynomial time solvable situa-
tion was discovered. We proceed to describe this class of so-termed n-fold
integer programs.
An (r, s) × t bimatrix is a matrix A consisting of two blocks A1 , A2 ,
with A1 its r × t submatrix consisting of the first r rows and A2 its s × t
submatrix consisting of the last s rows. The n-fold product of A is the
following (r + ns) × nt matrix,
    A^(n) :=  [ A1  A1  ···  A1 ]
              [ A2  0   ···  0  ]
              [ 0   A2  ···  0  ]
              [ ⋮    ⋮    ⋱    ⋮  ]
              [ 0   0   ···  A2 ] .
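To make the block structure concrete, here is a minimal sketch (ours, not from the source; it assumes NumPy) that assembles A^(n) from the two blocks and reproduces the 2I_n example discussed after Theorem 1.1 below:

    import numpy as np

    def n_fold_product(A1: np.ndarray, A2: np.ndarray, n: int) -> np.ndarray:
        """Assemble the (r + ns) x nt n-fold product of the bimatrix
        with blocks A1 (r x t) and A2 (s x t)."""
        top = np.hstack([A1] * n)                    # r rows: A1 repeated n times
        bottom = np.kron(np.eye(n, dtype=int), A2)   # ns rows: block-diagonal A2
        return np.vstack([top, bottom])

    # The (0, 1) x 1 bimatrix with A1 empty and A2 := (2) gives A^(3) = 2 I_3:
    A1 = np.zeros((0, 1), dtype=int)
    A2 = np.array([[2]])
    print(n_fold_product(A1, A2, 3))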

The following result of [4] asserts that n-fold integer programs are efficiently
solvable.
∗ Technion - Israel Institute of Technology, 32000 Haifa, Israel
([email protected]). Supported in part by a grant from ISF - the Israel
Science Foundation.


Theorem 1.1. [4] For each fixed integer (r, s) × t bimatrix A, there is an algorithm that, given a positive integer n, bounds l, u ∈ Z^{nt}_∞, b ∈ Z^{r+ns}, and w ∈ Z^{nt}, solves in time which is polynomial in n and in the binary-encoding length ⟨l, u, b, w⟩ of the rest of the data, the following so-termed linear n-fold integer programming problem,

    min { wx : x ∈ Z^{nt} , A^(n) x = b , l ≤ x ≤ u } .

Some explanatory notes are in order. First, the dimension of an n-fold integer program is nt and is variable. Second, n-fold products A^(n) are highly non totally unimodular: the n-fold product of the simple (0, 1) × 1 bimatrix with A1 empty and A2 := 2 satisfies A^(n) = 2I_n and has exponential determinant 2^n. So this is indeed a class of programs which cannot be solved by methods of fixed dimension or totally unimodular matrices. Third, this class of programs turns out to be very natural and has numerous applications, the most generic being to integer optimization over multidimensional tables (see §2). In fact it is universal: the results of [7] imply that every integer program is an n-fold program over some simple bimatrix A (see §4).
The above theorem extends to n-fold integer programming with nonlinear objective functions as well. The following results, from [12], [5] and [13], assert that the minimization and maximization of broad classes of convex functions over n-fold integer programs can also be done in polynomial time. The function f is presented either by a comparison oracle that for any two vectors x, y can answer whether or not f(x) ≤ f(y), or by an evaluation oracle that for any vector x can return f(x).
In the next theorem, f is separable convex, namely f(x) = Σ_i f_i(x_i) with each f_i univariate convex. Like linear forms, such functions can be minimized over totally unimodular programs [14]. We show that they can also be efficiently minimized over n-fold programs. The running time depends also on log f̂, with f̂ the maximum value of |f(x)| over the feasible set (which need not be part of the input).
Theorem 1.2. [12] For each fixed integer (r, s) × t bimatrix A, there is an algorithm that, given n, l, u ∈ Z^{nt}_∞, b ∈ Z^{r+ns}, and separable convex f : Z^{nt} → Z presented by a comparison oracle, solves in time polynomial in n and ⟨l, u, b, f̂⟩, the program

    min { f(x) : x ∈ Z^{nt} , A^(n) x = b , l ≤ x ≤ u } .

An important natural special case of Theorem 1.2 is the following result that concerns finding a feasible point which is l_p-closest to a given desired goal point.
Theorem 1.3. [12] For each fixed integer (r, s) × t bimatrix A, there is an algorithm that, given positive integers n and p, l, u ∈ Z^{nt}_∞, b ∈ Z^{r+ns}, and x̂ ∈ Z^{nt}, solves in time polynomial in n, p, and ⟨l, u, b, x̂⟩, the following distance minimization program,


    min { ‖x − x̂‖_p : x ∈ Z^{nt} , A^(n) x = b , l ≤ x ≤ u } .    (1.1)

For p = ∞ the problem (1.1) can be solved in time polynomial in n and ⟨l, u, b, x̂⟩.
The next result concerns the maximization of a convex function of the composite form f(W x), with f : Z^d → Z convex and W an integer matrix with d rows.
Theorem 1.4. [5] For each fixed d and (r, s) × t integer bimatrix A, there is an algorithm that, given n, bounds l, u ∈ Z^{nt}_∞, integer d × nt matrix W, b ∈ Z^{r+ns}, and convex function f : Z^d → R presented by a comparison oracle, solves in time polynomial in n and ⟨W, l, u, b⟩, the convex n-fold integer maximization program

    max { f(W x) : x ∈ Z^{nt} , A^(n) x = b , l ≤ x ≤ u } .

Finally, we have the following broad extension of Theorem 1.2 where the objective can include a composite term f(W x), with f : Z^d → Z separable convex and W an integer matrix with d rows, and where also inequalities on W x can be included. As before, f̂, ĝ denote the maximum values of |f(W x)|, |g(x)| over the feasible set.
Theorem 1.5. [13] For each fixed integer (r, s) × t bimatrix A and integer (p, q) × t bimatrix W, there is an algorithm that, given n, l, u ∈ Z^{nt}_∞, l̂, û ∈ Z^{p+nq}, b ∈ Z^{r+ns}, and separable convex functions f : Z^{p+nq} → Z, g : Z^{nt} → Z presented by evaluation oracles, solves in time polynomial in n and ⟨l, u, l̂, û, b, f̂, ĝ⟩, the generalized program

    min { f(W^(n) x) + g(x) : x ∈ Z^{nt} , A^(n) x = b , l̂ ≤ W^(n) x ≤ û , l ≤ x ≤ u } .

The article is organized as follows. In Section 2 we discuss some of


the many applications of this theory and use Theorems 1.1–1.5 to obtain
the first polynomial time algorithms for these applications. In Section 3 we
provide a concise development of the theory of n-fold integer programming
and prove our Theorems 1.1–1.5. Sections 2 and 3 can be read in any
order. We conclude in Section 4 with a discussion of the universality of n-
fold integer programming and of a new (di)-graph invariant, about which
very little is known, that is important in understanding the complexity of
our algorithms. Further discussion of n-fold integer programming within
the broader context of nonlinear discrete optimization can be found in [21]
and [22].
2. Applications.
2.1. Multiway tables. Multiway tables occur naturally in any con-
text involving multiply-indexed variables. They have been studied exten-
sively in mathematical programming in the context of high dimensional
transportation problems (see [27, 28] and the references therein) and in


statistics in the context of disclosure control and privacy in statistical


databases (see [3, 9] and the references therein). The theory of n-fold
integer programming provides the first polynomial time algorithms for mul-
tiway table problems in these two contexts, which are discussed in §2.1.1
and §2.1.2 respectively.
We start with some terminology and background that will be used
in the sequel. A d-way table is an m1 × · · · × md array x = (xi1 ,...,id ) of
nonnegative integers. A d-way transportation polytope (d-way polytope for
brevity) is the set of m1 × · · · × md nonnegative arrays x = (xi1 ,...,id ) with
specified sums of entries over some of their lower dimensional subarrays
(margins in statistics). The d-way tables with specified margins are the
integer points in the d-way polytope. For example (see Figure 1), the
3-way polytope of l × m × n arrays with specified line-sums (2-margins) is

    T := { x ∈ R^{l×m×n}_+ : Σ_i x_{i,j,k} = v_{∗,j,k} , Σ_j x_{i,j,k} = v_{i,∗,k} , Σ_k x_{i,j,k} = v_{i,j,∗} } ,

where the specified line-sums are mn + ln + lm given nonnegative integer numbers

    v_{∗,j,k} , v_{i,∗,k} , v_{i,j,∗} ,   1 ≤ i ≤ l , 1 ≤ j ≤ m , 1 ≤ k ≤ n .
Our results hold for k-margins for any 0 ≤ k ≤ d, and much more gen-
erally for any so-called hierarchical family of margins. For simplicity of
the exposition, however, we restrict attention here to line-sums, that is,
(d − 1)-margins, only.
We conclude this preparation with the universality theorem for multi-
way tables and polytopes. It provides a powerful tool in establishing the
presumable limits of polynomial time solvability of table problems, and
will be used in §2.1.1 and §2.1.2 to contrast the polynomial time solvability
attainable by n-fold integer programming.
Theorem 2.1. [7] Every rational polytope P = { y ∈ R^d_+ : Ay = b } is in polynomial time computable integer preserving bijection with some l × m × 3 line-sum polytope

    T = { x ∈ R^{l×m×3}_+ : Σ_i x_{i,j,k} = v_{∗,j,k} , Σ_j x_{i,j,k} = v_{i,∗,k} , Σ_k x_{i,j,k} = v_{i,j,∗} } .

2.1.1. Multi-index transportation problems. The multi-index


transportation problem of Motzkin [19] is the integer programming problem
over multiway tables with specified margins. For line-sums it is the program
1 ×···×md
min{wx : x ∈ Zm
+ ,
 
xi1 ,...,id = v∗,i2 ,...,id , . . . , xi1 ,...,id = vi1 ,...,id−1 ,∗ }.
i1 id

For d = 2 this program is totally unimodular and can be solved in polyno-


mial time. However, already for d = 3 it is generally not, and the problem


[Figure 1 displays m1 × · · · × md × n tables with given margins such as line-sums. Such tables form an n-fold program { x : A^(n) x = b , x ≥ 0 , x integer } for a suitable bimatrix A determined by m1, . . . , md, where A1 controls the margin equations that involve summation over layers, whereas A2 controls the margin equations involving summation within a single layer at a time.]

Fig. 1. Multiway tables.

is much harder. Consider the problem over l × m × n tables. If l, m, n are


all fixed then the problem is solvable in polynomial time (in the natural
binary-encoding length of the line-sums), but even in this very restricted
situation one needs off-hand the algorithm of integer programming in fixed
dimension lmn. If l, m, n are all variable then the problem is NP-hard
[17]. The in-between cases are much more delicate and were resolved only
recently. If two sides are variable and one is fixed then the problem is still
NP-hard [6]; moreover, Theorem 2.1 implies that it is NP-hard even over
l × m × 3 tables with fixed n = 3. Finally, if two sides are fixed and one
is variable, then the problem can be solved in polynomial time by n-fold
integer programming. Note that even over 3×3×n tables, the only solution
of the problem available to-date is the one given below using n-fold integer
programming.
The polynomial time solvability of the multi-index transportation prob-
lem when one side is variable and the others are fixed extends to any di-
mension d. We have the following important result on the multi-index
transportation problem.
Theorem 2.2. [4] For every fixed d, m1, . . . , md, there is an algorithm that, given n, integer m1 × · · · × md × n cost w, and integer line-sums v = ((v_{∗,i2,...,id+1}), . . . , (v_{i1,...,id,∗})), solves in time polynomial in n and ⟨w, v⟩, the (d + 1)-index transportation problem

    min { wx : x ∈ Z^{m1×···×md×n}_+ ,
          Σ_{i1} x_{i1,...,id+1} = v_{∗,i2,...,id+1} , . . . , Σ_{id+1} x_{i1,...,id+1} = v_{i1,...,id,∗} } .

Proof. Re-index arrays as x = (x^1, . . . , x^n) with each x^{id+1} = (x_{i1,...,id,id+1}) a suitably indexed m1m2 · · · md vector representing the id+1-th layer of x. Similarly re-index the array w. Let t := r := m1m2 · · · md and s := m2 · · · md + · · · + m1 · · · md−1. Let b := (b^0, b^1, . . . , b^n) ∈ Z^{r+ns}, where b^0 := (v_{i1,...,id,∗}) and, for id+1 = 1, . . . , n,

    b^{id+1} := ( (v_{∗,i2,...,id,id+1}), . . . , (v_{i1,...,id−1,∗,id+1}) ) .

Let A be the (t, s) × t bimatrix with first block A1 := I_t the t × t identity matrix and second block A2 a matrix defining the line-sum equations on m1 × · · · × md arrays. Then the equations A1 (Σ_{id+1} x^{id+1}) = b^0 represent the line-sum equations Σ_{id+1} x_{i1,...,id+1} = v_{i1,...,id,∗} where summations over layers occur, whereas the equations A2 x^{id+1} = b^{id+1} for id+1 = 1, . . . , n represent all other line-sum equations, where summations are within a single layer at a time. Therefore the multi-index transportation problem is encoded as the n-fold integer programming problem

    min { wx : x ∈ Z^{nt} , A^(n) x = b , x ≥ 0 } .

Using the algorithm of Theorem 1.1, this n-fold integer program, and hence the given multi-index transportation problem, can be solved in polynomial time.
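As an illustration of this encoding in the smallest case d = 2 (3-way l × m × n tables with line-sums), the following sketch, with names of our own choosing rather than from the source, builds the two blocks used in the proof:

    import numpy as np

    def table_bimatrix(l: int, m: int):
        """Blocks of the (t, s) x t bimatrix of the proof for d = 2:
        t = l*m, A1 = I_t handles the sums over layers, and A2 gives the
        within-layer line-sum equations, with the entries of a layer
        indexed by (i, j) in row-major order."""
        t = l * m
        A1 = np.eye(t, dtype=int)
        A2 = np.zeros((l + m, t), dtype=int)
        for i in range(l):
            for j in range(m):
                A2[i, i * m + j] = 1        # sum over j: the margin v_{i,*,k}
                A2[l + j, i * m + j] = 1    # sum over i: the margin v_{*,j,k}
        return A1, A2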
This proof extends immediately to multi-index transportation prob-
lems with nonlinear objective functions of the forms in Theorems 1.2–1.5.
Moreover, as mentioned before, a similar proof shows that multi-index
transportation problems with k-margin constraints, and more generally,
hierarchical margin constraints, can be encoded as n-fold integer program-
ming problems as well. We state this as a corollary.
Corollary 2.1. [5] For every fixed d and m1, . . . , md, the nonlinear multi-index transportation problem, with any hierarchical margin constraints, over (d + 1)-way tables of format m1 × · · · × md × n with variable number n of layers, is polynomial time solvable.
2.1.2. Privacy in statistical databases. A common practice in the
disclosure of sensitive data contained in a multiway table is to release some
of the table margins rather than the entries of the table. Once the margins
are released, the security of any specific entry of the table is related to the
set of possible values that can occur in that entry in all tables having the
same margins as those of the source table in the database. In particular, if


this set consists of a unique value, that of the source table, then this entry
can be exposed and privacy can be violated. This raises the following
fundamental problem.
Entry uniqueness problem: Given a hierarchical margin family and an entry index, is the value that can occur in that entry in all tables with these margins unique?
The complexity of this problem turns out to behave in analogy to the
complexity of the multi-index transportation problem discussed in §2.1.1.
Consider the problem for d = 3 over l × m × n tables. It is polynomial
time decidable when l, m, n are all fixed, and coNP-complete when l, m, n
are all variable [17]. We discuss next in more detail the in-between cases
which are more delicate and were settled only recently.
If two sides are variable and one is fixed then the problem is still
coNP-complete, even over l × m × 3 tables with fixed n = 3 [20]. Moreover,
Theorem 2.1 implies that any set of nonnegative integers is the set of values
of an entry of some l × m × 3 tables with some specified line-sums. Figure
2 gives an example of line-sums for 6 × 4 × 3 tables where one entry attains
the set of values {0, 2} which has a gap.

[Figure 2 displays line-sums for 6 × 4 × 3 tables such that the only values occurring in the designated entry in all 6 × 4 × 3 tables with the specified line-sums are 0 and 2, a set of entry values with a gap.]

Fig. 2. Set of entry values with a gap.


Theorem 2.3. [8] For every finite set S ⊂ Z+ of nonnegative integers,


there exist l, m, and line-sums for l×m×3 tables, such that the set of values
that occur in some fixed entry in all l × m × 3 tables that have these line-
sums, is precisely S.
Proof. Consider any finite set S = {s1, . . . , sh} ⊂ Z_+. Consider the polytope

    P := { y ∈ R^{h+1}_+ : y_0 − Σ_{j=1}^h s_j y_j = 0 , Σ_{j=1}^h y_j = 1 } .

By Theorem 2.1, there are l, m, and an l × m × 3 polytope T with line-sums

    v_{∗,j,k} , v_{i,∗,k} , v_{i,j,∗} ,   1 ≤ i ≤ l , 1 ≤ j ≤ m , 1 ≤ k ≤ 3 ,

such that the integer points in T, which are precisely the l × m × 3 tables with these line-sums, are in bijection with the integer points in P. Moreover (see [7]), this bijection is obtained by a simple projection from R^{l×m×3} to R^{h+1} that erases all but some h + 1 coordinates. Let x_{i,j,k} be the coordinate that is mapped to y_0. Then the set of values that this entry attains in all tables with these line-sums is, as desired,

    { x_{i,j,k} : x ∈ T ∩ Z^{l×m×3} } = { y_0 : y ∈ P ∩ Z^{h+1} } = S .

Finally, if two sides are fixed and one is variable, then entry uniqueness
can be decided in polynomial time by n-fold integer programming. Note
that even over 3 × 3 × n tables, the only solution of the problem available
to-date is the one below.
The polynomial time decidability of the problem when one side is
variable and the others are fixed extends to any dimension d. It also extends
to any hierarchical family of margins, but for simplicity we state it only for
line-sums, as follows.
Theorem 2.4. [20] For every fixed d, m1, . . . , md, there is an algorithm that, given n, integer line-sums v = ((v_{∗,i2,...,id+1}), . . . , (v_{i1,...,id,∗})), and an entry index (k1, . . . , kd+1), solves in time which is polynomial in n and ⟨v⟩, the corresponding entry uniqueness problem, of deciding if the entry x_{k1,...,kd+1} is the same in all (d + 1)-way tables in the set

    S := { x ∈ Z^{m1×···×md×n}_+ :
          Σ_{i1} x_{i1,...,id+1} = v_{∗,i2,...,id+1} , . . . , Σ_{id+1} x_{i1,...,id+1} = v_{i1,...,id,∗} } .

Proof. By Theorem 2.2 we can solve in polynomial time both n-fold programs

    l := min { x_{k1,...,kd+1} : x ∈ S } ,
    u := max { x_{k1,...,kd+1} : x ∈ S } .

Clearly, the entry x_{k1,...,kd+1} has the same value in all tables with the given line-sums if and only if l = u, which can therefore be tested in polynomial time.
The algorithm of Theorem 2.4 and its extension to any family of hi-
erarchical margins allow statistical agencies to efficiently check possible
margins before disclosure: if an entry value is not unique then disclosure
may be assumed secure, whereas if the value is unique then disclosure may
be risky and fewer margins should be released.
We note that long tables, with one side much larger than the others, of-
ten arise in practical applications. For instance, in health statistical tables,
the long factor may be the age of an individual, whereas other factors may
be binary (yes-no) or ternary (subnormal, normal, and supnormal). More-
over, it is always possible to merge categories of factors, with the resulting
coarser tables approximating the original ones, making the algorithm of
Theorem 2.4 applicable.
Finally, we describe a procedure based on a suitable adaptation of the
algorithm of Theorem 2.4, that constructs the entire set of values that can
occur in a specified entry, rather than just decides its uniqueness. Here
S is the set of tables satisfying the given (hierarchical) margins, and the
running time is output-sensitive, that is, polynomial in the input encoding
plus the number of elements in the output set.
Procedure for constructing the set of values in an entry:
1. Initialize l := −∞, u := ∞, and E := ∅.
2. Solve in polynomial time the following linear n-fold integer programs:

       l̂ := min { x_{k1,...,kd+1} : l ≤ x_{k1,...,kd+1} ≤ u , x ∈ S } ,
       û := max { x_{k1,...,kd+1} : l ≤ x_{k1,...,kd+1} ≤ u , x ∈ S } .

3. If the programs in Step 2 are feasible then update l := l̂ + 1, u := û − 1, E := E ∪ {l̂, û}, and repeat Step 2; else stop and output the set of values E.
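A minimal sketch of this procedure follows; solve is a hypothetical routine standing in for the linear n-fold integer programming algorithm of Theorem 1.1, and the interface is our own invention:

    import math

    def entry_value_set(solve):
        """Output-sensitive sketch of the procedure above. solve(sense, lo, hi)
        is assumed to min/maximize the designated entry over S subject to the
        extra bounds lo <= entry <= hi, returning None when infeasible."""
        lo, hi, E = -math.inf, math.inf, set()
        while True:
            l_hat = solve("min", lo, hi)
            if l_hat is None:           # Step 3: bounds became infeasible, stop
                return E
            u_hat = solve("max", lo, hi)
            E |= {l_hat, u_hat}         # record the two extreme values found
            lo, hi = l_hat + 1, u_hat - 1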
2.2. Multicommodity flows. The multicommodity transshipment
problem is a very general flow problem which seeks minimum cost routing
of several discrete commodities over a digraph subject to vertex demand
and edge capacity constraints. The data for the problem is as follows (see
Figure 3 for a small example). There is a digraph G with s vertices and t
edges. There are l types of commodities. Each commodity has a demand
vector dk ∈ Zs with dkv the demand for commodity k at vertex v (interpreted
as supply when positive and consumption when negative). Each edge e has
a capacity ue (upper bound on the combined flow of all commodities on


[Figure 3 shows a small multicommodity transshipment example: a digraph G (three vertices and three edges in the displayed data) with two commodities, red and green, unlimited edge capacities u_e, edge costs f_e(x^1_e + x^2_e) := (x^1_e + x^2_e)^2 and g^1_e(x^1_e) := g^2_e(x^2_e) := 0, and vertex demands d^1 := (3, −1, −2), d^2 := (−3, 2, 1). A solution is x^1 = (3, 2, 0), x^2 = (0, 2, 3), with cost (3 + 0)^2 + (2 + 2)^2 + (0 + 3)^2 = 34.]

Fig. 3. Multicommodity transshipment example.

it). A multicommodity transshipment is a vector x = (x^1, . . . , x^l) with x^k ∈ Z^t_+ for all k and x^k_e the flow of commodity k on edge e, satisfying the capacity constraint Σ_{k=1}^l x^k_e ≤ u_e for each edge e and the demand constraint Σ_{e∈δ^+(v)} x^k_e − Σ_{e∈δ^−(v)} x^k_e = d^k_v for each vertex v and commodity k (with δ^+(v), δ^−(v) the sets of edges entering and leaving vertex v).
The cost of transshipment x is defined as follows. There are cost functions f_e, g^k_e : Z → Z for each edge and each edge-commodity pair. The transshipment cost on edge e is f_e(Σ_{k=1}^l x^k_e) + Σ_{k=1}^l g^k_e(x^k_e), with the first term being the value of f_e on the combined flow of all commodities on e and the second term being the sum of costs that depend on both the edge and the commodity. The total cost is

    Σ_{e=1}^t ( f_e(Σ_{k=1}^l x^k_e) + Σ_{k=1}^l g^k_e(x^k_e) ) .

Our results apply to cost functions which can be standard linear or convex such as α_e |Σ_{k=1}^l x^k_e|^{β_e} + Σ_{k=1}^l γ^k_e |x^k_e|^{δ^k_e} for some nonnegative integers α_e, β_e, γ^k_e, δ^k_e, which take into account the increase in cost due to channel congestion when subject to heavy traffic or communication load (with the linear case obtained by β_e = δ^k_e = 1).
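As a quick sanity check, the quadratic congestion case (β_e = 2, γ^k_e = 0) on the data of Figure 3 reproduces the stated cost of 34:

    # Flows of the two commodities on the three edges of Figure 3.
    x1 = [3, 2, 0]
    x2 = [0, 2, 3]
    # f_e(y) = y^2 on the combined flow of each edge, g^k_e = 0.
    cost = sum((a + b) ** 2 for a, b in zip(x1, x2))
    print(cost)  # (3+0)^2 + (2+2)^2 + (0+3)^2 = 34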


The theory of n-fold integer programming provides the first polynomial


time algorithms for the problem in two broad situations discussed in §2.2.1
and §2.2.2.
2.2.1. The many-commodity transshipment problem. Here we
consider the problem with variable number l of commodities over a fixed
(but arbitrary) digraph - the so-termed many-commodity transshipment
problem. This problem may seem at first very restricted: however, even
deciding if a feasible many-transshipment exists (regardless of its cost) is
NP-complete already over the complete bipartite digraphs K3,n (oriented
from one side to the other) with only 3 vertices on one side [13]; moreover,
even over the single tiny digraph K3,3 , the only solution available to-date
is the one given below via n-fold integer programming.
As usual, f̂ and ĝ denote the maximum absolute values of the objective functions f and g over the feasible set. It is usually easy to determine an upper bound on these values from the problem data. For instance, in the special case of linear cost functions f, g, bounds which are polynomial in the binary-encoding length of the costs α_e, γ^k_e, capacities u, and demands d^k_v, are easily obtained by Cramer's rule.
We have the following theorem on (nonlinear) many-commodity trans-
shipment.
Theorem 2.5. [13] For every fixed digraph G there is an algorithm that, given l commodity types, demands d^k_v ∈ Z for each commodity k and vertex v, edge capacities u_e ∈ Z_+, and convex costs f_e, g^k_e : Z → Z presented by evaluation oracles, solves in time polynomial in l and ⟨d^k_v, u_e, f̂, ĝ⟩, the many-commodity transshipment problem,

    min  Σ_e ( f_e(Σ_{k=1}^l x^k_e) + Σ_{k=1}^l g^k_e(x^k_e) )
    s.t. x^k_e ∈ Z , Σ_{e∈δ^+(v)} x^k_e − Σ_{e∈δ^−(v)} x^k_e = d^k_v , Σ_{k=1}^l x^k_e ≤ u_e , x^k_e ≥ 0 .

Proof. Assume G has s vertices and t edges and let D be its s × t vertex-edge incidence matrix. Let f : Z^t → Z and g : Z^{lt} → Z be the separable convex functions defined by f(y) := Σ_{e=1}^t f_e(y_e) with y_e := Σ_{k=1}^l x^k_e, and g(x) := Σ_{e=1}^t Σ_{k=1}^l g^k_e(x^k_e). Let x = (x^1, . . . , x^l) be the vector of variables with x^k ∈ Z^t the flow of commodity k for each k. Then the problem can be rewritten in vector form as

    min { f(Σ_{k=1}^l x^k) + g(x) : x ∈ Z^{lt} , Dx^k = d^k , Σ_{k=1}^l x^k ≤ u , x ≥ 0 } .

We can now proceed in two ways.
First way: extend the vector of variables to x = (x^0, x^1, . . . , x^l) with x^0 ∈ Z^t representing an additional slack commodity. Then the capacity constraints become Σ_{k=0}^l x^k = u and the cost function becomes f(u − x^0) + g(x^1, . . . , x^l), which is also separable convex. Now let A be the (t, s) × t bimatrix with first block A1 := I_t the t × t identity matrix and second block A2 := D. Let d^0 := Du − Σ_{k=1}^l d^k and let b := (u, d^0, d^1, . . . , d^l). Then the problem becomes the (l + 1)-fold integer program

    min { f(u − x^0) + g(x^1, . . . , x^l) : x ∈ Z^{(l+1)t} , A^(l+1) x = b , x ≥ 0 } .    (2.1)

By Theorem 1.2 this program can be solved in polynomial time as claimed.
Second way: let A be the (0, s) × t bimatrix with first block A1 empty and second block A2 := D. Let W be the (t, 0) × t bimatrix with first block W1 := I_t the t × t identity matrix and second block W2 empty. Let b := (d^1, . . . , d^l). Then the problem is precisely the following l-fold integer program,

    min { f(W^(l) x) + g(x) : x ∈ Z^{lt} , A^(l) x = b , W^(l) x ≤ u , x ≥ 0 } .

By Theorem 1.5 this program can be solved in polynomial time as claimed.
We also point out the following immediate corollary of Theorem 2.5.
Corollary 2.2. For any fixed s, the (convex) many-commodity transshipment problem with a variable number l of commodities on any s-vertex digraph is polynomial time solvable.
2.2.2. The multicommodity transportation problem. Here we
consider the problem with fixed (but arbitrary) number l of commodities
over any bipartite subdigraph of Km,n (oriented from one side to the other)
- the so-called multicommodity transportation problem - with fixed number
m of suppliers and variable number n of consumers. This is very natural in
operations research applications where few facilities serve many customers.
The problem is difficult even for l = 2 commodities: deciding if a feasible
2-commodity transportation exists (regardless of its cost) is NP-complete
already over the complete bipartite digraphs Km,n [7]; moreover, even over
the digraphs K3,n with only m = 3 suppliers, the only available solution
to-date is the one given below via n-fold integer programming.
This problem seems harder than the one discussed in the previous
subsection (with no seeming analog for non bipartite digraphs), and the
formulation below is more delicate. Therefore it is convenient to change
the labeling of the data a little bit as follows (see Figure 4). We now
denote edges by pairs (i, j) where 1 ≤ i ≤ m is a supplier and 1 ≤ j ≤ n is
a consumer. The demand vectors are now replaced by (nonnegative) supply
and consumption vectors: each supplier i has a supply vector si ∈ Zl+ with
sik its supply in commodity k, and each consumer j has a consumption
vector cj ∈ Zl+ with cjk its consumption in commodity k. In addition,
here each commodity k may have its own volume vk ∈ Z+ per unit flow.


[Figure 4 illustrates the multicommodity transportation problem: find an integer l-commodity transportation x of minimum f, g cost from m suppliers to n consumers in the bipartite digraph K_{m,n}, given supply and consumption vectors s^i and c^j in Z^l, edge capacities u_{i,j}, and volume v_k per unit of commodity k. For a suitable (ml, l) × ml bimatrix A and a suitable (0, m) × ml bimatrix W derived from the v_k, the problem becomes the n-fold integer program min { f(W^(n) x) + g(x) : x ∈ Z^{nml} , A^(n) x = (s^i, c^j) , W^(n) x ≤ u , x ≥ 0 }.]

Fig. 4. Multicommodity transportation problem.

A multicommodity transportation is now indexed as x = (x^1, . . . , x^n) with x^j = (x^j_{1,1}, . . . , x^j_{1,l}, . . . , x^j_{m,1}, . . . , x^j_{m,l}), where x^j_{i,k} is the flow of commodity k from supplier i to consumer j. The capacity constraint on edge (i, j) is Σ_{k=1}^l v_k x^j_{i,k} ≤ u_{i,j} and the cost is f_{i,j}(Σ_{k=1}^l v_k x^j_{i,k}) + Σ_{k=1}^l g^j_{i,k}(x^j_{i,k}) with f_{i,j}, g^j_{i,k} : Z → Z convex. As before, f̂, ĝ denote the maximum absolute values of f, g over the feasible set.

We assume below that the underlying digraph is Km,n (with edges ori-
ented from suppliers to consumers), since the problem over any subdigraph
G of Km,n reduces to that over Km,n by simply forcing 0 capacity on all
edges not present in G.

We have the following theorem on (nonlinear) multicommodity transportation.
Theorem 2.6. [13] For any fixed l commodities, m suppliers, and volumes v_k, there is an algorithm that, given n, supplies and demands s^i, c^j ∈ Z^l_+, capacities u_{i,j} ∈ Z_+, and convex costs f_{i,j}, g^j_{i,k} : Z → Z presented by evaluation oracles, solves in time polynomial in n and ⟨s^i, c^j, u, f̂, ĝ⟩, the multicommodity transportation problem,

    min  Σ_{i,j} ( f_{i,j}(Σ_{k=1}^l v_k x^j_{i,k}) + Σ_{k=1}^l g^j_{i,k}(x^j_{i,k}) )
    s.t. x^j_{i,k} ∈ Z , Σ_j x^j_{i,k} = s^i_k , Σ_i x^j_{i,k} = c^j_k , Σ_{k=1}^l v_k x^j_{i,k} ≤ u_{i,j} , x^j_{i,k} ≥ 0 .

Proof. Construct bimatrices A and W as follows. Let D be the (l, 0) × l bimatrix with first block D1 := I_l and second block D2 empty. Let V be the (0, 1) × l bimatrix with first block V1 empty and second block V2 := (v_1, . . . , v_l). Let A be the (ml, l) × ml bimatrix with first block A1 := I_{ml} and second block A2 := D^(m). Let W be the (0, m) × ml bimatrix with first block W1 empty and second block W2 := V^(m). Let b be the (ml + nl)-vector b := (s^1, . . . , s^m, c^1, . . . , c^n).
Let f : Z^{nm} → Z and g : Z^{nml} → Z be the separable convex functions defined by f(y) := Σ_{i,j} f_{i,j}(y_{i,j}) with y_{i,j} := Σ_{k=1}^l v_k x^j_{i,k}, and g(x) := Σ_{i,j} Σ_{k=1}^l g^j_{i,k}(x^j_{i,k}).
Now note that A^(n) x is an (ml + nl)-vector, whose first ml entries are the flows from each supplier of each commodity to all consumers, and whose last nl entries are the flows to each consumer of each commodity from all suppliers. Therefore the supply and consumption equations are encoded by A^(n) x = b. Next note that the nm-vector y = (y_{1,1}, . . . , y_{m,1}, . . . , y_{1,n}, . . . , y_{m,n}) satisfies y = W^(n) x. So the capacity constraints become W^(n) x ≤ u and the cost function becomes f(W^(n) x) + g(x). Therefore, the problem is precisely the following n-fold integer program,

    min { f(W^(n) x) + g(x) : x ∈ Z^{nml} , A^(n) x = b , W^(n) x ≤ u , x ≥ 0 } .

By Theorem 1.5 this program can be solved in polynomial time as claimed.
3. Theory. In §3.1 we define Graver bases of integer matrices and
show that they can be used to solve linear and nonlinear integer programs
in polynomial time. In §3.2 we show that Graver bases of n-fold products
can be computed in polynomial time and, incorporating the results of §3.1,
prove our Theorems 1.1–1.5 that establish the polynomial time solvability
of linear and nonlinear n-fold integer programming.
To simplify the presentation, and since the feasible set in most appli-
cations is finite or can be made finite by more careful modeling, whenever
an algorithm detects that the feasible set is infinite, it simply stops. So,
throughout our discussion, an algorithm is said to solve a (nonlinear) inte-
ger programming problem if it either finds an optimal solution x or concludes
that the feasible set is infinite or empty.
As noted in the introduction, any nonlinear function f involved is
presented either by a mere comparison oracle that for any two vectors x, y


can answer whether or not f (x) ≤ f (y), or by an evaluation oracle that for
any vector x can return f (x).
3.1. Graver bases and nonlinear integer programming. The
Graver basis is a fundamental object in the theory of integer programming
which was introduced by J. Graver already back in 1975 [11]. However,
only very recently, in the series of papers [4, 5, 12], it was established that
the Graver basis can be used to solve linear (as well as nonlinear) integer
programming problems in polynomial time. In this subsection we describe
these important new developments.
3.1.1. Graver bases. We begin with the definition of the Graver
basis and some of its basic properties. Throughout this subsection let A
be an integer m × n matrix. The lattice of A is the set L(A) := { x ∈ Z^n : Ax = 0 } of integer vectors in its kernel. We use L∗(A) := { x ∈ Z^n : Ax = 0, x ≠ 0 } to denote the set of nonzero elements in L(A). We use a partial order ⊑ on R^n which extends the usual coordinate-wise partial order ≤ on the nonnegative orthant R^n_+ and is defined as follows. For two vectors x, y ∈ R^n we write x ⊑ y and say that x is conformal to y if x_i y_i ≥ 0 and |x_i| ≤ |y_i| for i = 1, . . . , n, that is, x and y lie in the same orthant of R^n and each component of x is bounded by the corresponding component of y in absolute value. A suitable extension of the classical lemma of Gordan [10] implies that every subset of Z^n has finitely many ⊑-minimal elements. We have the following fundamental definition.
Definition 3.1. [11] The Graver basis of an integer matrix A is defined to be the finite set G(A) ⊂ Z^n of ⊑-minimal elements in L∗(A) = { x ∈ Z^n : Ax = 0, x ≠ 0 }. Note that G(A) is centrally symmetric, that is, g ∈ G(A) if and only if −g ∈ G(A). For instance, the Graver basis of the 1 × 3 matrix A := [1 2 1] consists of 8 elements,

    G(A) = ± { (2, −1, 0), (0, −1, 2), (1, 0, −1), (1, −1, 1) } .
Note also that the Graver basis may contain elements, such as (1, −1, 1)
in the above small example, whose support involves linearly dependent
columns of A. So the cardinality of the Graver basis cannot be bounded in
terms of m and n only and depends on the entries of A as well. Indeed, the
Graver basis is typically exponential and cannot be written down, let alone
computed, in polynomial time. But, as we will show in the next section,
for n-fold products it can be computed efficiently.
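For tiny matrices, though, the example above can be verified directly. The following brute-force sketch (ours, for illustration only) recovers the 8 elements for A = [1 2 1] by searching a box that contains all ⊑-minimal candidates:

    from itertools import product

    def conforms(x, y):
        """x is conformal to y (x ⊑ y) and x != y."""
        return x != y and all(a * b >= 0 and abs(a) <= abs(b)
                              for a, b in zip(x, y))

    def graver_by_brute_force(row, bound):
        """Graver basis of a 1 x n matrix `row`, searched inside the box
        [-bound, bound]^n. Box-minimality equals global minimality here,
        since a conformally dominated vector only shrinks entrywise."""
        kernel = [x for x in product(range(-bound, bound + 1), repeat=len(row))
                  if any(x) and sum(a * b for a, b in zip(row, x)) == 0]
        return [x for x in kernel if not any(conforms(y, x) for y in kernel)]

    print(sorted(graver_by_brute_force((1, 2, 1), 3)))  # the 8 elements above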

A finite sum u := Σ_i v_i of vectors in R^n is called conformal if v_i ⊑ u for all i and hence all summands lie in the same orthant. We start with a simple lemma.
Lemma 3.1. Any x ∈ L∗(A) is a conformal sum x = Σ_i g_i of Graver basis elements g_i ∈ G(A), with some elements possibly appearing more than once in the sum.
Proof. We use induction on the well partial order ⊑. Consider any x ∈ L∗(A). If it is ⊑-minimal in L∗(A) then x ∈ G(A) by definition of the Graver basis and we are done. Otherwise, there is an element g ∈ G(A) such that g ⊑ x and g ≠ x. Set y := x − g. Then y ∈ L∗(A) and y ⊑ x, so by induction there is a conformal sum y = Σ_i g_i with g_i ∈ G(A) for all i. Now x = g + Σ_i g_i is a conformal sum of x.
We now provide a stronger form of Lemma 3.1 which basically follows from the integer analogs of Carathéodory's theorem established in [2] and [26].
Lemma 3.2. Any x ∈ L∗(A) is a conformal sum x = Σ_{i=1}^t λ_i g_i involving t ≤ 2n − 2 Graver basis elements g_i ∈ G(A) with nonnegative integer coefficients λ_i ∈ Z_+.
Proof. We prove the slightly weaker bound t ≤ 2n − 1 from [2]. A proof of the stronger bound can be found in [26]. Consider any x ∈ L∗(A) and let g_1, . . . , g_s be all elements of G(A) lying in the same orthant as x. Consider the linear program

    max { Σ_{i=1}^s λ_i : x = Σ_{i=1}^s λ_i g_i , λ_i ∈ R_+ } .    (3.1)

By Lemma 3.1 the point x is a nonnegative linear combination of the g_i and hence the program (3.1) is feasible. Since all g_i are nonzero and in the same orthant as x, program (3.1) is also bounded. As is well known, it then has a basic optimal solution, that is, an optimal solution λ_1, . . . , λ_s with at most n of the λ_i nonzero. Let

    y := Σ_i (λ_i − ⌊λ_i⌋) g_i = x − Σ_i ⌊λ_i⌋ g_i .

If y = 0 then x = Σ_i ⌊λ_i⌋ g_i is a conformal sum of at most n of the g_i and we are done. Otherwise, y ∈ L∗(A) and y lies in the same orthant as x, and hence, by Lemma 3.1 again, y = Σ_{i=1}^s μ_i g_i with all μ_i ∈ Z_+. Then x = Σ_i (μ_i + ⌊λ_i⌋) g_i and hence, since the λ_i form an optimal solution to (3.1), we have Σ_i (μ_i + ⌊λ_i⌋) ≤ Σ_i λ_i. Therefore Σ_i μ_i ≤ Σ_i (λ_i − ⌊λ_i⌋) < n, with the last inequality holding since at most n of the λ_i are nonzero. Since the μ_i are integer, at most n − 1 of them are nonzero. So x = Σ_i (μ_i + ⌊λ_i⌋) g_i is a conformal sum of x involving at most 2n − 1 of the g_i.
The Graver basis also enables checking the finiteness of a feasible integer program.
Lemma 3.3. Let G(A) be the Graver basis of a matrix A and let l, u ∈ Z^n_∞. If there is some g ∈ G(A) satisfying g_i ≤ 0 whenever u_i < ∞ and g_i ≥ 0 whenever l_i > −∞, then every set of the form S := { x ∈ Z^n : Ax = b , l ≤ x ≤ u } is either empty or infinite, whereas if there is no such g, then every set S of this form is finite. Clearly, the existence of such g can be checked in time polynomial in ⟨G(A), l, u⟩.
Proof. First suppose there exists such g. Consider any such S. Suppose S contains some point x. Then for all λ ∈ Z_+ we have l ≤ x + λg ≤ u and A(x + λg) = Ax = b and hence x + λg ∈ S, so S is infinite. Next suppose S is infinite. Then the polyhedron P := { x ∈ R^n : Ax = b , l ≤ x ≤ u } is unbounded and hence, as is well known, has a recession vector, that is, a nonzero h, which we may assume to be integer, such that x + αh ∈ P for all x ∈ P and α ≥ 0. This implies that h ∈ L∗(A) and that h_i ≤ 0 whenever u_i < ∞ and h_i ≥ 0 whenever l_i > −∞. So h is a conformal sum h = Σ_i g_i of vectors g_i ∈ G(A), each of which also satisfies g_i ≤ 0 whenever u_i < ∞ and g_i ≥ 0 whenever l_i > −∞, providing such g.
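The check described in this lemma is immediate to code; a minimal sketch (ours), with infinite bounds encoded as ±math.inf:

    import math

    def has_infinite_direction(graver, l, u):
        """Return True iff some g in G(A) satisfies g_i <= 0 whenever
        u_i < inf and g_i >= 0 whenever l_i > -inf, i.e. every nonempty set
        {x : Ax = b, l <= x <= u} is infinite (the criterion of Lemma 3.3)."""
        for g in graver:
            if all((u_i == math.inf or g_i <= 0) and
                   (l_i == -math.inf or g_i >= 0)
                   for g_i, l_i, u_i in zip(g, l, u)):
                return True
        return False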
3.1.2. Separable convex integer minimization. In this subsection we consider the following nonlinear integer minimization problem

    min { f(x) : x ∈ Z^n , Ax = b , l ≤ x ≤ u } ,    (3.2)

where A is an integer m × n matrix, b ∈ Z^m, l, u ∈ Z^n_∞, and f : Z^n → Z is a separable convex function, that is, f(x) = Σ_{j=1}^n f_j(x_j) with f_j : Z → Z a univariate convex function for all j. We prove a sequence of lemmas and then combine them to show that the Graver basis of A enables solving this problem in polynomial time.
We start with two simple lemmas about univariate convex functions. The first lemma establishes a certain supermodularity property of such functions.
Lemma 3.4. Let f : R → R be a univariate convex function, let r be a real number, and let s_1, . . . , s_m be real numbers satisfying s_i s_j ≥ 0 for all i, j. Then we have

    f(r + Σ_{i=1}^m s_i) − f(r) ≥ Σ_{i=1}^m ( f(r + s_i) − f(r) ) .

Proof. We use induction on m. The claim holding trivially for m = 1, consider m > 1. Since all nonzero s_i have the same sign, s_m = λ Σ_{i=1}^m s_i for some 0 ≤ λ ≤ 1. Then

    r + s_m = (1 − λ) r + λ ( r + Σ_{i=1}^m s_i ) ,
    r + Σ_{i=1}^{m−1} s_i = λ r + (1 − λ) ( r + Σ_{i=1}^m s_i ) ,

and so the convexity of f implies

    f(r + s_m) + f(r + Σ_{i=1}^{m−1} s_i)
      ≤ (1 − λ) f(r) + λ f(r + Σ_{i=1}^m s_i) + λ f(r) + (1 − λ) f(r + Σ_{i=1}^m s_i)
      = f(r) + f(r + Σ_{i=1}^m s_i) .

Subtracting 2f(r) from both sides and applying induction, we obtain, as claimed,

    f(r + Σ_{i=1}^m s_i) − f(r)
      ≥ ( f(r + s_m) − f(r) ) + ( f(r + Σ_{i=1}^{m−1} s_i) − f(r) )
      ≥ Σ_{i=1}^m ( f(r + s_i) − f(r) ) .

The second lemma shows that univariate convex functions can be minimized efficiently over an interval of integers using repeated bisections.
Lemma 3.5. There is an algorithm that, given any two integer numbers r ≤ s and any univariate convex function f : Z → R given by a comparison oracle, solves in time polynomial in ⟨r, s⟩ the following univariate integer minimization problem,

    min { f(λ) : λ ∈ Z , r ≤ λ ≤ s } .

Proof. If r = s then λ := r is optimal. Assume then r ≤ s − 1. Consider the integers

    r ≤ ⌊(r + s)/2⌋ < ⌊(r + s)/2⌋ + 1 ≤ s .

Use the oracle of f to compare f(⌊(r + s)/2⌋) and f(⌊(r + s)/2⌋ + 1). By the convexity of f:

    f(⌊(r + s)/2⌋) = f(⌊(r + s)/2⌋ + 1)  ⇒  λ := ⌊(r + s)/2⌋ is a minimizer of f;
    f(⌊(r + s)/2⌋) < f(⌊(r + s)/2⌋ + 1)  ⇒  the minimum of f is in [r, ⌊(r + s)/2⌋];
    f(⌊(r + s)/2⌋) > f(⌊(r + s)/2⌋ + 1)  ⇒  the minimum of f is in [⌊(r + s)/2⌋ + 1, s].

Thus, we either obtain the optimal point, or bisect the interval [r, s] and repeat. So in O(log(s − r)) = O(⟨r, s⟩) bisections we find an optimal solution λ ∈ Z ∩ [r, s].
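The bisection of Lemma 3.5 translates directly into code; a minimal sketch (ours) using only the comparison oracle:

    def argmin_convex(leq, r: int, s: int) -> int:
        """Minimize a univariate convex f over Z ∩ [r, s], given only the
        comparison oracle leq(a, b) <=> f(a) <= f(b); O(log(s-r)) queries."""
        while r < s:
            m = (r + s) // 2
            if leq(m, m + 1):
                s = m        # f(m) <= f(m+1): a minimizer lies in [r, m]
            else:
                r = m + 1    # f(m) >  f(m+1): the minimum lies in [m+1, s]
        return r

    # Example: f(x) = (x - 7)^2 on [0, 100].
    print(argmin_convex(lambda a, b: (a - 7) ** 2 <= (b - 7) ** 2, 0, 100))  # 7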
The next two lemmas extend Lemmas 3.4 and 3.5. The first lemma shows the supermodularity of separable convex functions with respect to conformal sums.
Lemma 3.6. Let f : R^n → R be any separable convex function, let x ∈ R^n be any point, and let Σ_i g_i be any conformal sum in R^n. Then the following inequality holds,

    f(x + Σ_i g_i) − f(x) ≥ Σ_i ( f(x + g_i) − f(x) ) .

Proof. Let f_j : R → R be univariate convex functions such that f(x) = Σ_{j=1}^n f_j(x_j). Consider any 1 ≤ j ≤ n. Since Σ_i g_i is a conformal sum, we have g_{i,j} g_{k,j} ≥ 0 for all i, k and so, setting r := x_j and s_i := g_{i,j} for all i, Lemma 3.4 applied to f_j implies

    f_j(x_j + Σ_i g_{i,j}) − f_j(x_j) ≥ Σ_i ( f_j(x_j + g_{i,j}) − f_j(x_j) ) .    (3.3)

Summing the inequalities (3.3) for j = 1, . . . , n, we obtain the claimed inequality.
The second lemma concerns finding a best improvement step in a given direction.
Lemma 3.7. There is an algorithm that, given bounds l, u ∈ Z^n_∞, direction g ∈ Z^n, point x ∈ Z^n with l ≤ x ≤ u, and convex function f : Z^n → R presented by a comparison oracle, solves in time polynomial in ⟨l, u, g, x⟩ the univariate problem,

    min { f(x + λg) : λ ∈ Z_+ , l ≤ x + λg ≤ u } .    (3.4)

Proof. Let S := { λ ∈ Z_+ : l ≤ x + λg ≤ u } be the feasible set and let s := sup S, which is easy to determine. If s = ∞ then conclude that S is infinite and stop. Otherwise, S = {0, 1, . . . , s} and the problem can be solved by the algorithm of Lemma 3.5 minimizing the univariate convex function h(λ) := f(x + λg) over S.
We can now show that the Graver basis of A allows solving problem (3.2) in polynomial time, provided we are given an initial feasible point to start with. We will later show how to find such an initial point as well. As noted in the introduction, f̂ below denotes the maximum value of |f(x)| over the feasible set (which need not be part of the input). An outline of the algorithm is provided in Figure 5.
Lemma 3.8. There is an algorithm that, given an integer m × n matrix A, its Graver basis G(A), vectors l, u ∈ Z^n_∞ and x ∈ Z^n with l ≤ x ≤ u, and separable convex function f : Z^n → Z presented by a comparison oracle, solves the integer program

    min { f(z) : z ∈ Z^n , Az = b , l ≤ z ≤ u } ,   b := Ax ,    (3.5)

in time polynomial in the binary-encoding length ⟨G(A), l, u, x, f̂⟩ of the data.
Proof. First, apply the algorithm of Lemma 3.3 to G(A) and l, u and
either detect that the feasible set is infinite and stop, or conclude it is finite
and continue. Next produce a sequence of feasible points x0 , x1 , . . . , xs with
x0 := x the given input point, as follows. Having obtained xk , solve the
univariate minimization problem

min{f (xk + λg) : λ ∈ Z+ , g ∈ G(A) , l ≤ xk + λg ≤ u } (3.6)

by applying the algorithm of Lemma 3.7 for each g ∈ G(A). If the mini-
mal value in (3.6) satisfies f (xk + λg) < f (xk ) then set xk+1 := xk + λg


[Figure 5 outlines separable convex minimization using Graver bases: to solve min { f(x) : x ∈ Z^n , Ax = b , l ≤ x ≤ u }, given the Graver basis G(A) and an initial feasible point, iteratively and greedily augment the point to an optimal one using elements from G(A); the supermodularity of f and the integer Carathéodory theorem assure polynomial convergence.]

Fig. 5. Separable convex minimization using Graver bases.

and repeat; else stop and output the last point x_s in the sequence. Now, Ax_{k+1} = A(x_k + λg) = Ax_k = b by induction on k, so each x_k is feasible. Since the feasible set is finite and the x_k have decreasing objective values and hence are distinct, the algorithm terminates.
We now show that the point x_s output by the algorithm is optimal. Let x∗ be any optimal solution to (3.5). Consider any point x_k in the sequence and suppose it is not optimal. We claim that a new point x_{k+1} will be produced and will satisfy

    f(x_{k+1}) − f(x∗) ≤ ((2n − 3)/(2n − 2)) ( f(x_k) − f(x∗) ) .    (3.7)

By Lemma 3.2, we can write the difference x∗ − x_k = Σ_{i=1}^t λ_i g_i as a conformal sum involving 1 ≤ t ≤ 2n − 2 elements g_i ∈ G(A) with all λ_i ∈ Z_+. By Lemma 3.6,

    f(x∗) − f(x_k) = f(x_k + Σ_{i=1}^t λ_i g_i) − f(x_k) ≥ Σ_{i=1}^t ( f(x_k + λ_i g_i) − f(x_k) ) .

Adding t ( f(x_k) − f(x∗) ) on both sides and rearranging terms we obtain

    Σ_{i=1}^t ( f(x_k + λ_i g_i) − f(x∗) ) ≤ (t − 1) ( f(x_k) − f(x∗) ) .

Therefore there is some summand on the left-hand side satisfying

    f(x_k + λ_i g_i) − f(x∗) ≤ ((t − 1)/t) ( f(x_k) − f(x∗) ) ≤ ((2n − 3)/(2n − 2)) ( f(x_k) − f(x∗) ) .

So the point x_k + λg attaining the minimum in (3.6) satisfies

    f(x_k + λg) − f(x∗) ≤ f(x_k + λ_i g_i) − f(x∗) ≤ ((2n − 3)/(2n − 2)) ( f(x_k) − f(x∗) ) ,

and so indeed x_{k+1} := x_k + λg will be produced and will satisfy (3.7). This shows that the last point x_s produced and output by the algorithm is indeed optimal.
We proceed to bound the number s of points. Consider any i < s and the intermediate non-optimal point x_i in the sequence produced by the algorithm. Then f(x_i) > f(x∗) with both values integer, and so repeated use of (3.7) gives

    1 ≤ f(x_i) − f(x∗) = ( Π_{k=0}^{i−1} ( f(x_{k+1}) − f(x∗) ) / ( f(x_k) − f(x∗) ) ) ( f(x) − f(x∗) )
      ≤ ((2n − 3)/(2n − 2))^i ( f(x) − f(x∗) ) ,

and therefore

    i ≤ ( log ((2n − 2)/(2n − 3)) )^{−1} log ( f(x) − f(x∗) ) .

Therefore the number s of points produced by the algorithm is at most one unit larger than this bound, and using a simple bound on the logarithm, we obtain

    s = O( n log ( f(x) − f(x∗) ) ) .

Thus, the number of points produced and the total running time are polynomial.
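In outline, the augmentation loop of this proof looks as follows; this is a sketch of ours, with best_step an assumed helper implementing the line search of Lemma 3.7, and it presumes the finiteness check of Lemma 3.3 has already been done:

    def graver_augment(f, graver, x, best_step):
        """Greedy Graver augmentation (outline of the Lemma 3.8 loop).
        f: separable convex objective on integer tuples; graver: G(A);
        x: initial feasible point; best_step(x, g): assumed helper returning
        the step size lam in Z_+ minimizing f(x + lam*g) within the bounds."""
        while True:
            best_x, best_val = None, f(x)
            for g in graver:
                lam = best_step(x, g)
                y = tuple(xi + lam * gi for xi, gi in zip(x, g))
                if f(y) < best_val:
                    best_x, best_val = y, f(y)
            if best_x is None:      # no Graver step improves: x is optimal
                return x
            x = best_x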
Next we show that Lemma 3.8 can also be used to find an initial feasible point for the given integer program, or assert that none exists, in polynomial time.
Lemma 3.9. There is an algorithm that, given an integer m × n matrix A, its Graver basis G(A), l, u ∈ Z^n_∞, and b ∈ Z^m, either finds an x ∈ Z^n satisfying l ≤ x ≤ u and Ax = b or asserts that none exists, in time which is polynomial in ⟨A, G(A), l, u, b⟩.
Proof. Assume that l ≤ u and that l_j < ∞ and u_j > −∞ for all j, since otherwise there is no feasible point. Also assume that there is no g ∈ G(A) satisfying g_i ≤ 0 whenever u_i < ∞ and g_i ≥ 0 whenever l_i > −∞, since otherwise every set S = { x ∈ Z^n : Ax = b , l ≤ x ≤ u } is empty or infinite by Lemma 3.3. Now, either detect that there is no integer solution to the system of equations Ax = b (without the lower and upper bound constraints) and stop, or determine some such solution x̂ ∈ Z^n and continue; it is well known that this can be done in polynomial time, say, using the Hermite normal form of A, see [25]. Next define a separable convex function on Z^n by f(x) := Σ_{j=1}^n f_j(x_j) with

    f_j(x_j) :=  l_j − x_j   if x_j < l_j ,
                 0           if l_j ≤ x_j ≤ u_j ,        j = 1, . . . , n ,
                 x_j − u_j   if x_j > u_j ,

and extended lower and upper bounds

    l̂_j := min{ l_j , x̂_j } ,   û_j := max{ u_j , x̂_j } ,   j = 1, . . . , n .

Consider the auxiliary separable convex integer program

    min { f(z) : z ∈ Z^n , Az = b , l̂ ≤ z ≤ û } .    (3.8)

First note that l̂_j > −∞ if and only if l_j > −∞, and û_j < ∞ if and only if u_j < ∞. Therefore there is no g ∈ G(A) satisfying g_i ≤ 0 whenever û_i < ∞ and g_i ≥ 0 whenever l̂_i > −∞, and hence the feasible set of (3.8) is finite by Lemma 3.3. Next note that x̂ is feasible in (3.8). Now apply the algorithm of Lemma 3.8 to (3.8) and obtain an optimal solution x. Note that this can be done in polynomial time since the binary length of x̂, and therefore also of l̂, û and of the maximum value f̂ of |f(x)| over the feasible set of (3.8), are polynomial in the length of the data.
Now note that every point z ∈ S is feasible in (3.8), and every point z feasible in (3.8) satisfies f(z) ≥ 0 with equality if and only if z ∈ S. So, if f(x) > 0 then the original set S is empty, whereas if f(x) = 0 then x ∈ S is a feasible point.
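The auxiliary objective of this proof is easy to state in code; a one-function sketch (ours):

    def bound_violation(xj: float, lj: float, uj: float) -> float:
        """The piecewise-linear convex penalty f_j from the proof of
        Lemma 3.9: zero on [lj, uj] and growing with slope 1 outside, so the
        separable sum f(x) = sum_j f_j(x_j) vanishes exactly on the points
        satisfying the original bounds."""
        if xj < lj:
            return lj - xj
        if xj > uj:
            return xj - uj
        return 0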
We are finally in position, using Lemmas 3.8 and 3.9, to show that the Graver basis allows solving the nonlinear integer program (3.2) in polynomial time. As usual, f̂ is the maximum of |f(x)| over the feasible set and need not be part of the input.
Theorem 3.1. [12] There is an algorithm that, given an integer m × n matrix A, its Graver basis G(A), l, u ∈ Z^n_∞, b ∈ Z^m, and separable convex f : Z^n → Z presented by a comparison oracle, solves in time polynomial in ⟨A, G(A), l, u, b, f̂⟩ the problem

    min { f(x) : x ∈ Z^n , Ax = b , l ≤ x ≤ u } .


Proof. First, apply the polynomial time algorithm of Lemma 3.9 and
either conclude that the feasible set is infinite or empty and stop, or obtain
an initial feasible point and continue. Next, apply the polynomial time
algorithm of Lemma 3.8 and either conclude that the feasible set is infinite
or obtain an optimal solution.
3.1.3. Specializations and extensions.

Linear integer programming. Clearly, any linear function wx = Σ_{i=1}^n w_i x_i is separable convex. Moreover, an upper bound on |wx| over the feasible set (when finite), which is polynomial in the binary-encoding length of the data, readily follows from Cramer's rule. Therefore we obtain, as an immediate special case of Theorem 3.1, the following important result, asserting that Graver bases enable the polynomial time solution of linear integer programming.
Theorem 3.2. [4] There is an algorithm that, given an integer m × n matrix A, its Graver basis G(A), l, u ∈ Z^n_∞, b ∈ Z^m, and w ∈ Z^n, solves in time which is polynomial in ⟨A, G(A), l, u, b, w⟩, the following linear integer programming problem,

    min { wx : x ∈ Z^n , Ax = b , l ≤ x ≤ u } .
Distance minimization. Another useful special case of Theorem 3.1, which is natural in various applications such as image processing, tomography, communication, and error correcting codes, is the following result, which asserts that the Graver basis enables determining a feasible point which is l_p-closest to a given desired goal point in polynomial time.
Theorem 3.3. [12] There is an algorithm that, given an integer m × n matrix A, its Graver basis G(A), positive integer p, vectors l, u ∈ Z^n_∞, b ∈ Z^m, and x̂ ∈ Z^n, solves in time polynomial in p and ⟨A, G(A), l, u, b, x̂⟩, the distance minimization problem

    min { ‖x − x̂‖_p : x ∈ Z^n , Ax = b , l ≤ x ≤ u } .    (3.9)

For p = ∞ the problem (3.9) can be solved in time which is polynomial in ⟨A, G(A), l, u, b, x̂⟩.
Proof. For finite p apply the algorithm of Theorem 3.1 taking f to be the p-th power ‖x − x̂‖_p^p of the l_p distance. If the feasible set is nonempty and finite (else the algorithm stops) then the maximum value f̂ of |f(x)| over it is polynomial in p and ⟨A, l, u, b, x̂⟩, and hence an optimal solution can be found in polynomial time.
Consider p = ∞. Using Cramer's rule it is easy to compute an integer ρ with ⟨ρ⟩ polynomially bounded in ⟨A, l, u, b⟩ that, if the feasible set is finite, provides an upper bound on ‖x‖_∞ for any feasible x. Let q be a positive integer satisfying

    q > log n / log (1 + (2ρ)^{−1}) .

Now apply the algorithm of the first paragraph above for the l_q distance. Assuming the feasible set is nonempty and finite (else the algorithm stops), let x∗ be the feasible point which minimizes the l_q distance to x̂ obtained by the algorithm. We claim that it also minimizes the l_∞ distance to x̂ and hence is the desired optimal solution. Consider any feasible point x. By standard inequalities between the l_∞ and l_q norms,

    ‖x∗ − x̂‖_∞ ≤ ‖x∗ − x̂‖_q ≤ ‖x − x̂‖_q ≤ n^{1/q} ‖x − x̂‖_∞ .

Therefore

    ‖x∗ − x̂‖_∞ − ‖x − x̂‖_∞ ≤ (n^{1/q} − 1) ‖x − x̂‖_∞ ≤ (n^{1/q} − 1) 2ρ < 1 ,

where the last inequality holds by the choice of q. Since ‖x∗ − x̂‖_∞ and ‖x − x̂‖_∞ are integers we find that ‖x∗ − x̂‖_∞ ≤ ‖x − x̂‖_∞. This establishes the claim.
In particular, for all positive p ∈ Z_∞, using the Graver basis we can solve

    min { ‖x‖_p : x ∈ Z^n , Ax = b , l ≤ x ≤ u } ,

which for p = ∞ is equivalent to the min-max integer program

    min { max{ |x_i| : i = 1, . . . , n } : x ∈ Z^n , Ax = b , l ≤ x ≤ u } .

Convex integer maximization. We proceed to discuss the maximization of a convex function of the composite form f(W x), with f : Z^d → Z any convex function and W any integer d × n matrix.
We need a result of [23]. A linear-optimization oracle for a set S ⊂ Z^n is one that, given w ∈ Z^n, solves the linear optimization problem max{ wx : x ∈ S }. A direction of an edge (1-dimensional face) e of a polyhedron P is any nonzero scalar multiple of u − v where u, v are any two distinct points in e. A set of all edge-directions of P is one that contains some direction of each edge of P; see Figure 6.
Theorem 3.4. [23] For all fixed d there is an algorithm that, given a finite set S ⊂ Z^n presented by a linear-optimization oracle, integer d × n matrix W, set E ⊂ Z^n of all edge-directions of conv(S), and convex f : Z^d → R presented by a comparison oracle, solves in time polynomial in max{ ‖x‖_∞ : x ∈ S } and ⟨W, E⟩, the convex problem

    max { f(W x) : x ∈ S } .

We now show that, fortunately enough, the Graver basis of a matrix A is a set of all edge-directions of the integer hull related to the integer program defined by A.
Lemma 3.10. For every integer m × n matrix A, l, u ∈ Z^n_∞, and b ∈ Z^m, the Graver basis G(A) is a set of all edge-directions of P_I := conv{ x ∈ Z^n : Ax = b , l ≤ x ≤ u }.


Fig. 6. Edge-directions of a convex polytope.

Proof. Consider any edge e of P_I and pick two distinct integer points x, y ∈ e. Then g := y − x is in L∗(A) and hence Lemma 3.1 implies that g = Σ_i h_i is a conformal sum for suitable h_i ∈ G(A). We claim that x + h_i ∈ P_I for all i. Indeed, h_i ∈ G(A) implies A(x + h_i) = Ax = b, and l ≤ x, x + g ≤ u and h_i ⊑ g imply l ≤ x + h_i ≤ u.
Now let w ∈ Z^n be uniquely maximized over P_I at the edge e. Then wh_i = w(x + h_i) − wx ≤ 0 for all i. But Σ_i wh_i = wg = wy − wx = 0, implying that in fact wh_i = 0 and hence x + h_i ∈ e for all i. This implies that each h_i is a direction of e (in fact, all h_i are the same and g is a multiple of some Graver basis element).
Using Theorems 3.2 and 3.4 and Lemma 3.10 we obtain the following theorem.
Theorem 3.5. [5] For every fixed d there is an algorithm that, given an integer m × n matrix A, its Graver basis G(A), l, u ∈ Z^n_∞, b ∈ Z^m, integer d × n matrix W, and convex function f : Z^d → R presented by a comparison oracle, solves in time which is polynomial in ⟨A, W, G(A), l, u, b⟩, the convex integer maximization problem

    max { f(W x) : x ∈ Z^n , Ax = b , l ≤ x ≤ u } .

Proof. Let S := { x ∈ Z^n : Ax = b , l ≤ x ≤ u }. The algorithm of Theorem 3.2 allows simulating in polynomial time a linear-optimization oracle for S. In particular, it allows us to either conclude that S is infinite and stop, or conclude that it is finite, in which case max{ ‖x‖_∞ : x ∈ S } is polynomial in ⟨A, l, u, b⟩, and continue. By Lemma 3.10, the given Graver basis is a set of all edge-directions of conv(S) = P_I. Hence the algorithm of Theorem 3.4 can be applied, and provides the polynomial time solution of the convex integer maximization program.
3.2. N-fold integer programming. In this subsection we focus our attention on (nonlinear) n-fold integer programming. In §3.2.1 we study Graver bases of n-fold products of integer bimatrices and show that they can be computed in polynomial time. In §3.2.2 we combine the results of §3.1 and §3.2.1, and prove our Theorems 1.1–1.5, which establish the polynomial time solvability of linear and nonlinear n-fold integer programming.
3.2.1. Graver bases of n-fold products. Let A be a fixed integer (r, s) × t bimatrix with blocks A1, A2. For each positive integer n we index vectors in Z^{nt} as x = (x^1, . . . , x^n) with each brick x^k lying in Z^t. The type of vector x is the number type(x) := |{ k : x^k ≠ 0 }| of nonzero bricks of x. The following definition plays an important role in the sequel.
Definition 3.2. [24] The Graver complexity of an integer bimatrix A is defined as

    g(A) := inf { g ∈ Z_+ : type(x) ≤ g for all x ∈ G(A^(n)) and all n } .

We proceed to establish a result of [24] and its extension in [16] which show that, in fact, the Graver complexity of every integer bimatrix A is finite.
Consider n-fold products A^(n) of A. By definition of the n-fold product, A^(n) x = 0 if and only if A1 Σ_{k=1}^n x^k = 0 and A2 x^k = 0 for all k. In particular, a necessary condition for x to lie in L(A^(n)), and in particular in G(A^(n)), is that x^k ∈ L(A2) for all k. Call a vector x = (x^1, . . . , x^n) full if, in fact, x^k ∈ L∗(A2) for all k, in which case type(x) = n, and pure if, moreover, x^k ∈ G(A2) for all k. Full vectors, and in particular pure vectors, are natural candidates for lying in the Graver basis G(A^(n)) of A^(n), and will indeed play an important role in its construction.
Consider any full vector y = (y^1, . . . , y^m). By definition, each brick of y satisfies y^i ∈ L∗(A2) and is therefore a conformal sum y^i = Σ_{j=1}^{k_i} x^{i,j} of some elements x^{i,j} ∈ G(A2) for all i, j. Let n := k_1 + · · · + k_m ≥ m and let x be the pure vector

    x = (x^1, . . . , x^n) := (x^{1,1}, . . . , x^{1,k_1}, . . . , x^{m,1}, . . . , x^{m,k_m}) .

We call the pure vector x an expansion of the full vector y, and we call the full vector y a compression of the pure vector x. Note that A1 y^i = Σ_j A1 x^{i,j} and therefore y ∈ L(A^(m)) if and only if x ∈ L(A^(n)). Note also


that each full y may have many different expansions and each pure x may
have many different compressions.
Lemma 3.11. Consider any full y = (y^1, . . . , y^m) and any expansion x = (x^1, . . . , x^n) of y. If y is in the Graver basis G(A^(m)) then x is in the Graver basis G(A^(n)).
Proof. Let x = (x^{1,1}, . . . , x^{m,k_m}) = (x^1, . . . , x^n) be an expansion of y = (y^1, . . . , y^m) with y^i = Σ_{j=1}^{k_i} x^{i,j} for each i. Suppose indirectly that y ∈ G(A^(m)) but x ∉ G(A^(n)). Since y ∈ L∗(A^(m)) we have x ∈ L∗(A^(n)). Since x ∉ G(A^(n)), there exists an element g = (g^{1,1}, . . . , g^{m,k_m}) in G(A^(n)) satisfying g ⊑ x, g ≠ x. Let h = (h^1, . . . , h^m) be the compression of g defined by h^i := Σ_{j=1}^{k_i} g^{i,j}. Since g ∈ L∗(A^(n)) we have h ∈ L∗(A^(m)). But h ⊑ y and h ≠ y, contradicting y ∈ G(A^(m)). This completes the proof.
Lemma 3.12. The Graver complexity g(A) of every integer bimatrix
A is finite.
Proof. We need to bound the type of any element in the Graver basis of
the l-fold product of A for any l. Suppose there is an element z of type m in
some G(A^{(l)}). Then its restriction y = (y^1, . . . , y^m) to its m nonzero bricks
is a full vector and is in the Graver basis G(A^{(m)}). Let x = (x^1, . . . , x^n)
be any expansion of y. Then type(z) = m ≤ n = type(x), and by Lemma
3.11, the pure vector x is in G(A^{(n)}).
Therefore, it suffices to bound the type of any pure element in the
Graver basis of the n-fold product of A for any n. Suppose x = (x^1, . . . , x^n)
is a pure element in G(A^{(n)}) for some n. Let G(A_2) = {g^1, . . . , g^p} be the
Graver basis of A_2 and let G_2 be the t × p matrix whose columns are the g^i.
Let v ∈ Z^p_+ be the vector with v_i := |{k : x^k = g^i}| counting the number of
bricks of x which are equal to g^i for each i. Then ∑_{i=1}^p v_i = type(x) = n.
Now, note that A_1 G_2 v = A_1 ∑_{k=1}^n x^k = 0 and hence v ∈ L*(A_1 G_2). We
claim that, moreover, v is in G(A_1 G_2). Suppose indirectly it is not. Then
there is a v̂ ∈ G(A_1 G_2) with v̂ ⊑ v, and it is easy to obtain a nonzero
x̂ ⊑ x from x by zeroing out some bricks so that v̂_i = |{k : x̂^k = g^i}| for all
i. Then A_1 ∑_{k=1}^n x̂^k = A_1 G_2 v̂ = 0 and hence x̂ ∈ L*(A^{(n)}), contradicting
x ∈ G(A^{(n)}).
So the type of any pure vector, and hence the Graver complexity of
A, is at most the largest value ∑_{i=1}^p v_i of any nonnegative vector v in the
Graver basis G(A_1 G_2).
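The final bound in this proof is directly computable. As a rough, non-authoritative illustration, the Python sketch below assumes a routine graver(M) that returns the Graver basis of an integer matrix M (in practice such a routine is supplied by external software such as 4ti2; it is not part of this chapter); everything else is plain NumPy.

```python
import numpy as np

def graver(M):
    """Hypothetical helper: return the Graver basis of integer matrix M
    as a list of integer vectors (e.g., computed by 4ti2)."""
    raise NotImplementedError

def graver_complexity_bound(A1, A2):
    """Bound g(A) as in Lemma 3.12: the largest coordinate sum of a
    nonnegative element of G(A1 G2), where the columns of G2 are the
    Graver-basis elements of A2."""
    G2 = np.array(graver(A2)).T          # t x p matrix of Graver elements of A2
    M = A1 @ G2                          # the matrix A1 G2 from the proof
    best = 0
    for v in graver(M):                  # elements v of G(A1 G2)
        v = np.array(v)
        if (v >= 0).all():               # only nonnegative elements matter
            best = max(best, int(v.sum()))
    return best
```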
We proceed to establish the following theorem from [4] which asserts
that Graver bases of n-fold products can be computed in polynomial time.
An n-lifting of a vector y = (y^1, . . . , y^m) consisting of m bricks is any vector
z = (z^1, . . . , z^n) consisting of n bricks such that for some 1 ≤ k_1 < · · · <
k_m ≤ n we have z^{k_i} = y^i for i = 1, . . . , m, and all other bricks of z are
zero; in particular, n ≥ m and type(z) = type(y).
Theorem 3.6. [4] For every fixed integer bimatrix A there is an
algorithm that, given positive integer n, computes the Graver basis G(A(n) )
of the n-fold product of A, in time which is polynomial in n. In particular,

the cardinality |G(A^{(n)})| and the binary-encoding length ⟨G(A^{(n)})⟩ of the
Graver basis of A^{(n)} are polynomial in n.
Proof. Let g := g(A) be the Graver complexity of A. Since A is
fixed, so is g. Therefore, for every n ≤ g, the Graver basis G(A^{(n)}), and
in particular, the Graver basis G(A^{(g)}) of the g-fold product of A, can be
computed in constant time.
Now, consider any n > g. We claim that G(A^{(n)}) satisfies

    G(A^{(n)}) = { z : z is an n-lifting of some y ∈ G(A^{(g)}) } .

Consider any n-lifting z of any y ∈ G(A^{(g)}). Suppose indirectly z ∉ G(A^{(n)}).
Then there exists z′ ∈ G(A^{(n)}) with z′ ⊑ z. But then z′ is the n-lifting of
some y′ ∈ L*(A^{(g)}) with y′ ⊑ y, contradicting y ∈ G(A^{(g)}). So z ∈ G(A^{(n)}).
Conversely, consider any z ∈ G(A^{(n)}). Then type(z) ≤ g and hence z
is the n-lifting of some y ∈ L*(A^{(g)}). Suppose indirectly y ∉ G(A^{(g)}). Then
there exists y′ ∈ G(A^{(g)}) with y′ ⊑ y. But then the n-lifting z′ of y′ satisfies
z′ ∈ L*(A^{(n)}) with z′ ⊑ z, contradicting z ∈ G(A^{(n)}). So y ∈ G(A^{(g)}).
Now, the number of n-liftings of each y ∈ G(A^{(g)}) is at most \binom{n}{g}, and
hence

    |G(A^{(n)})| ≤ \binom{n}{g} |G(A^{(g)})| = O(n^g) .

So the set of all n-liftings of vectors in G(A^{(g)}) and hence the Graver basis
G(A^{(n)}) of the n-fold product can be computed in time polynomial in n as
claimed.
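To make the lifting construction in this proof concrete, here is a minimal Python sketch. It assumes G_g is the (precomputed, constant-size) Graver basis G(A^{(g)}) given as a list of g-tuples of bricks, each brick a tuple of t integers; the enumeration of the \binom{n}{g} brick placements then follows the proof verbatim, and the tiny usage example at the end uses fabricated data purely for illustration.

```python
from itertools import combinations

def n_liftings(G_g, g, t, n):
    """Enumerate G(A^(n)) as all n-liftings of elements of G(A^(g)),
    as in the proof of Theorem 3.6 (O(n^g) elements for fixed A)."""
    zero = (0,) * t
    basis = set()
    for y in G_g:                          # y is a g-tuple of bricks
        for positions in combinations(range(n), g):
            z = [zero] * n
            for brick, k in zip(y, positions):
                z[k] = tuple(brick)        # place brick y^i at slot k_i
            basis.add(tuple(z))            # set() merges coinciding liftings
    return basis

# Tiny usage example with a fabricated one-brick "basis" (t = 2, g = 1):
G_g = [((1, -1),), ((-1, 1),)]
print(len(n_liftings(G_g, g=1, t=2, n=3)))   # 6 liftings
```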
3.2.2. N-fold integer programming in polynomial time. Com-
bining Theorem 3.6 and the results of §3.1 we now obtain Theorems 1.1–1.4.
Theorem 1.1 [4] For each fixed integer (r, s) × t bimatrix A, there is an
algorithm that, given positive integer n, l, u ∈ Z^{nt}_∞, b ∈ Z^{r+ns}, and w ∈ Z^{nt},
solves in time which is polynomial in n and ⟨l, u, b, w⟩, the following linear
n-fold integer program,

    min { wx : x ∈ Z^{nt} , A^{(n)} x = b , l ≤ x ≤ u } .

Proof. Compute the Graver basis G(A^{(n)}) using the algorithm of The-
orem 3.6. Now apply the algorithm of Theorem 3.2 with this Graver basis
and solve the problem.
Theorem 1.2 [12] For each fixed integer (r, s) × t bimatrix A, there is
an algorithm that, given n, l, u ∈ Z^{nt}_∞, b ∈ Z^{r+ns}, and separable convex
f : Z^{nt} → Z presented by a comparison oracle, solves in time polynomial
in n and ⟨l, u, b, f̂⟩, the program

    min { f(x) : x ∈ Z^{nt} , A^{(n)} x = b , l ≤ x ≤ u } .

Proof. Compute the Graver basis G(A^{(n)}) using the algorithm of The-
orem 3.6. Now apply the algorithm of Theorem 3.1 with this Graver basis
and solve the problem.
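These proofs all follow one pattern: build G(A^{(n)}) by Theorem 3.6, then run a Graver-basis augmentation scheme. As a rough illustration of the second step only, the sketch below implements a plain greedy augmentation loop for separable convex minimization. It is a simplification, not the algorithm of Theorem 3.1 itself (which needs care in choosing step lengths and a polynomial bound on the number of augmentations); a feasible starting point, finite bounds, the objective f, and the Graver basis are all assumed given.

```python
def best_step(x, g, f, lower, upper):
    """Largest-improvement step length s >= 0 with l <= x + s*g <= u.
    A linear scan over s is used purely for clarity; convexity of f
    would allow bisection instead."""
    smax = min(
        ((upper[i] - x[i]) // g[i] if g[i] > 0 else (lower[i] - x[i]) // g[i])
        for i in range(len(x)) if g[i] != 0
    )
    best_s, best_val = 0, f(x)
    for s in range(1, smax + 1):
        val = f([xi + s * gi for xi, gi in zip(x, g)])
        if val < best_val:
            best_s, best_val = s, val
    return best_s

def graver_minimize(x, graver_basis, f, lower, upper):
    """Greedy augmentation: repeat improving (g, s) steps until no Graver
    direction improves f. x must be feasible for A^(n) x = b."""
    improved = True
    while improved:
        improved = False
        for g in graver_basis:
            s = best_step(x, g, f, lower, upper)
            if s > 0:
                x = [xi + s * gi for xi, gi in zip(x, g)]
                improved = True
    return x
```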
Theorem 1.3 [12] For each fixed integer (r, s) × t bimatrix A, there is an
algorithm that, given positive integers n and p, l, u ∈ Z^{nt}_∞, b ∈ Z^{r+ns}, and
x̂ ∈ Z^{nt}, solves in time polynomial in n, p, and ⟨l, u, b, x̂⟩, the following
distance minimization program,

    min { ‖x − x̂‖_p : x ∈ Z^{nt} , A^{(n)} x = b , l ≤ x ≤ u } .        (3.10)

For p = ∞ the problem (3.10) can be solved in time polynomial in n and
⟨l, u, b, x̂⟩.
Proof. Compute the Graver basis G(A^{(n)}) using the algorithm of The-
orem 3.6. Now apply the algorithm of Theorem 3.3 with this Graver basis
and solve the problem.
Theorem 1.4 [5] For each fixed d and (r, s) × t integer bimatrix A, there
is an algorithm that, given n, bounds l, u ∈ Z^{nt}_∞, an integer d × nt matrix W,
b ∈ Z^{r+ns}, and a convex function f : Z^d → R presented by a comparison
oracle, solves in time polynomial in n and ⟨W, l, u, b⟩, the convex n-fold
integer maximization program

    max { f(Wx) : x ∈ Z^{nt} , A^{(n)} x = b , l ≤ x ≤ u } .

Proof. Compute the Graver basis G(A^{(n)}) using the algorithm of The-
orem 3.6. Now apply the algorithm of Theorem 3.5 with this Graver basis
and solve the problem.
3.2.3. Weighted separable convex integer minimization. We
proceed to establish Theorem 1.5 which is a broad extension of Theorem
1.2 that allows the objective function to include a composite term of the
form f(Wx), where f : Z^d → Z is a separable convex function and W is
an integer matrix with d rows, and to incorporate inequalities on Wx. We
begin with two lemmas. As before, f̂, ĝ denote the maximum values of
|f(Wx)|, |g(x)| over the feasible set.
Lemma 3.13. There is an algorithm that, given an integer m × n
matrix A, an integer d × n matrix W, l, u ∈ Z^n_∞, l̂, û ∈ Z^d_∞, b ∈ Z^m, the
Graver basis G(B) of

    B := [ A 0 ]
         [ W I ] ,

and separable convex functions f : Z^d → Z, g : Z^n → Z presented by evalu-
ation oracles, solves in time polynomial in ⟨A, W, G(B), l, u, l̂, û, b, f̂, ĝ⟩, the
problem

    min { f(Wx) + g(x) : x ∈ Z^n , Ax = b , l̂ ≤ Wx ≤ û , l ≤ x ≤ u } .   (3.11)

Proof. Define h : Z^{n+d} → Z by h(x, y) := f(−y) + g(x) for all x ∈ Z^n
and y ∈ Z^d. Clearly, h is separable convex since f, g are. Now, problem
(3.11) can be rewritten as

    min { h(x, y) : (x, y) ∈ Z^{n+d} ,
          [ A 0 ] [ x ]   [ b ]
          [ W I ] [ y ] = [ 0 ] ,  l ≤ x ≤ u , −û ≤ y ≤ −l̂ }

and the statement follows at once by applying Theorem 3.1 to this
problem.
Lemma 3.14. For every fixed integer (r, s) × t bimatrix A and (p, q) ×
t bimatrix W, there is an algorithm that, given any positive integer n,
computes in time polynomial in n, the Graver basis G(B) of the following
(r + ns + p + nq) × (nt + p + nq) matrix,

    B := [ A^{(n)}  0 ]
         [ W^{(n)}  I ] .

Proof. Let D be the (r + p, s + q) × (t + p + q) bimatrix whose blocks
are defined by

    D_1 := [ A_1  0    0 ]          D_2 := [ A_2  0  0   ]
           [ W_1  I_p  0 ] ,               [ W_2  0  I_q ] .

Apply the algorithm of Theorem 3.6 and compute in polynomial time the
Graver basis G(D^{(n)}) of the n-fold product of D, which is the following
matrix:

    D^{(n)} = [ A_1  0    0    A_1  0    0    · · ·  A_1  0    0   ]
              [ W_1  I_p  0    W_1  I_p  0    · · ·  W_1  I_p  0   ]
              [ A_2  0    0    0    0    0    · · ·  0    0    0   ]
              [ W_2  0    I_q  0    0    0    · · ·  0    0    0   ]
              [ 0    0    0    A_2  0    0    · · ·  0    0    0   ]
              [ 0    0    0    W_2  0    I_q  · · ·  0    0    0   ]
              [ ...  ...  ...  ...  ...  ...  . .    ...  ...  ... ]
              [ 0    0    0    0    0    0    · · ·  A_2  0    0   ]
              [ 0    0    0    0    0    0    · · ·  W_2  0    I_q ]

Suitable row and column permutations applied to D^{(n)} give the following
matrix:

    C := [ A_1  A_1  · · ·  A_1   0    0    · · ·  0     0    0    · · ·  0   ]
         [ A_2  0    · · ·  0     0    0    · · ·  0     0    0    · · ·  0   ]
         [ 0    A_2  · · ·  0     0    0    · · ·  0     0    0    · · ·  0   ]
         [ ...  ...  . .    ...   ...  ...  . .    ...   ...  ...  . .    ... ]
         [ 0    0    · · ·  A_2   0    0    · · ·  0     0    0    · · ·  0   ]
         [ W_1  W_1  · · ·  W_1   I_p  I_p  · · ·  I_p   0    0    · · ·  0   ]
         [ W_2  0    · · ·  0     0    0    · · ·  0     I_q  0    · · ·  0   ]
         [ 0    W_2  · · ·  0     0    0    · · ·  0     0    I_q  · · ·  0   ]
         [ ...  ...  . .    ...   ...  ...  . .    ...   ...  ...  . .    ... ]
         [ 0    0    · · ·  W_2   0    0    · · ·  0     0    0    · · ·  I_q ]

Obtain the Graver basis G(C) in polynomial time from G(D^{(n)}) by per-
muting the entries of each element of the latter by the permutation of the
columns of D^{(n)} that is used to get C (the permutation of the rows does
not affect the Graver basis).
Now, note that the matrix B can be obtained from C by dropping all
but the first p columns in the second block. Consider any element in G(C),
indexed, according to the block structure, as

    (x^1, x^2, . . . , x^n, y^1, y^2, . . . , y^n, z^1, z^2, . . . , z^n) .

Clearly, if y^k = 0 for k = 2, . . . , n then the restriction

    (x^1, x^2, . . . , x^n, y^1, z^1, z^2, . . . , z^n)

of this element is in the Graver basis of B. On the other hand, if

    (x^1, x^2, . . . , x^n, y^1, z^1, z^2, . . . , z^n)

is any element in G(B) then its extension

    (x^1, x^2, . . . , x^n, y^1, 0, . . . , 0, z^1, z^2, . . . , z^n)

is clearly in G(C). So the Graver basis of B can be obtained in polynomial
time by

    G(B) := { (x^1, . . . , x^n, y^1, z^1, . . . , z^n) :
              (x^1, . . . , x^n, y^1, 0, . . . , 0, z^1, . . . , z^n) ∈ G(C) } .

This completes the proof.
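The block manipulations in this proof are easy to mirror in code. Below is a small NumPy sketch, for illustration only, that assembles the n-fold product of a bimatrix from its two blocks and then forms the matrix B of Lemma 3.14; the column permutation taking D^{(n)} to C is left implicit.

```python
import numpy as np

def nfold(A1, A2, n):
    """n-fold product: A1 repeated horizontally on top, over a block
    diagonal of n copies of A2."""
    top = np.hstack([A1] * n)
    diag = np.kron(np.eye(n, dtype=int), A2)
    return np.vstack([top, diag])

def lemma_3_14_B(A1, A2, W1, W2, n):
    """The (r+ns+p+nq) x (nt+p+nq) matrix B = [[A^(n), 0], [W^(n), I]]."""
    An = nfold(A1, A2, n)
    Wn = nfold(W1, W2, n)
    p_nq = Wn.shape[0]                                   # p + nq rows
    upper = np.hstack([An, np.zeros((An.shape[0], p_nq), dtype=int)])
    lower = np.hstack([Wn, np.eye(p_nq, dtype=int)])
    return np.vstack([upper, lower])
```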


Theorem 1.5 [13] For each fixed integer (r, s) × t bimatrix A and integer
(p, q) × t bimatrix W, there is an algorithm that, given n, l, u ∈ Z^{nt}_∞, l̂, û ∈
Z^{p+nq}_∞, b ∈ Z^{r+ns}, and separable convex functions f : Z^{p+nq} → Z, g :
Z^{nt} → Z presented by evaluation oracles, solves in time polynomial in n
and ⟨l, u, l̂, û, b, f̂, ĝ⟩, the generalized program

    min { f(W^{(n)} x) + g(x) : x ∈ Z^{nt} , A^{(n)} x = b , l̂ ≤ W^{(n)} x ≤ û , l ≤ x ≤ u } .

Proof. Use the algorithm of Lemma 3.14 to compute the Graver basis
G(B) of

    B := [ A^{(n)}  0 ]
         [ W^{(n)}  I ] .

Now apply the algorithm of Lemma 3.13 and solve the nonlinear integer
program.
4. Discussion. We conclude with a short discussion of the universality
of n-fold integer programming and the Graver complexity of (directed)
graphs, a new important invariant which controls the complexity of our
multiway table and multicommodity flow applications.
4.1. Universality of n-fold integer programming. Let us intro-
duce the following notation. For an integer s × t matrix D, let D̄ denote
the (t, s) × t bimatrix whose first block is the t × t identity matrix and
whose second block is D. Consider the following special form of the n-fold
product, defined for a matrix D, by D^{[n]} := (D̄)^{(n)}. We consider such m-
fold products of the 1 × 3 matrix 1_3 := [1, 1, 1]. Note that 1_3^{[m]} is precisely
the (3 + m) × 3m incidence matrix of the complete bipartite graph K_{3,m}.
⎛ ⎞
1 0 0 1 0 0 1 0 0
⎜ 0 1 0 0 1 0 0 1 0 ⎟
⎜ ⎟
⎜ 0 0 1 0 0 1 0 0 1 ⎟
[3]
13 = ⎜ ⎜ ⎟ .

⎜ 1 1 1 0 0 0 0 0 0 ⎟
⎝ 0 0 0 1 1 1 0 0 0 ⎠
0 0 0 0 0 0 1 1 1
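This matrix is straightforward to generate for any m; the short NumPy sketch below (an illustration, not part of the original text) builds 1_3^{[m]} exactly as the m-fold product just described: a horizontal strip of 3 × 3 identities over a block diagonal of rows [1 1 1].

```python
import numpy as np

def one3_m(m):
    """Incidence matrix of K_{3,m}, i.e., the m-fold product 1_3^[m]."""
    top = np.hstack([np.eye(3, dtype=int)] * m)            # 3 x 3m
    bottom = np.kron(np.eye(m, dtype=int), np.ones((1, 3), dtype=int))
    return np.vstack([top, bottom])                        # (3+m) x 3m

print(one3_m(3))   # reproduces the 6 x 9 matrix displayed above
```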

We can now rewrite Theorem 2.1 in the following compact and elegant
form.
The Universality Theorem [7] Every rational polytope {y ∈ R^d_+ : Ay =
b} stands in polynomial time computable integer preserving bijection with
some polytope

    { x ∈ R^{3mn}_+ : 1_3^{[m][n]} x = a } .        (4.1)

The bijection constructed by the algorithm of this theorem is, moreover, a
simple projection from R^{3mn} to R^d that erases all but some d coordinates
(see [7]). For i = 1, . . . , d let x_{σ(i)} be the coordinate of x that is mapped
to y_i under this projection. Then any linear or nonlinear integer program

min { f(y) : y ∈ Z^d_+ , Ay = b } can be lifted in polynomial time to the
following integer program over a simple {0, 1}-valued matrix 1_3^{[m][n]} which
is completely determined by two parameters m and n only,

    min { f(x_{σ(1)}, . . . , x_{σ(d)}) : x ∈ Z^{3mn}_+ , 1_3^{[m][n]} x = a } .      (4.2)

This also shows the universality of n-fold integer programming: every lin-
ear or nonlinear integer program is equivalent to an n-fold integer program
over some bimatrix D̄ with D = 1_3^{[m]}, which is completely determined by
a single parameter m.
Moreover, for every fixed m, program (4.2) can be solved in polynomial
time for linear forms and broad classes of convex and concave functions by
Theorems 1.1–1.5.
4.2. Graver complexity of graphs and digraphs. The signifi-
cance of the following new (di)-graph invariant will be explained below.
Definition 4.1. [1] The Graver complexity of a graph or a digraph
G is the Graver complexity g(G) := g(D̄) of the bimatrix D̄ with D the
incidence matrix of G.
One major task done by our algorithms for linear and nonlinear n-fold
integer programming over a bimatrix A is the construction of the Graver
basis G(A^{(n)}) in time O(n^{g(A)}) with g(A) the Graver complexity of A (see
proof of Theorem 3.6).
Since the bimatrix underlying the universal n-fold integer program
(4.2) is precisely D̄ with D = 1_3^{[m]} the incidence matrix of K_{3,m}, it
follows that the complexity of computing the relevant Graver bases for this
program for fixed m and variable n is O(n^{g(K_{3,m})}) where g(K_{3,m}) is the
Graver complexity of K_{3,m} as just defined.
Turning to the many-commodity transshipment problem over a di-
graph G discussed in §2.2.1, the bimatrix underlying the n-fold integer
program (2.1) in the proof of Theorem 2.5 is precisely D̄ with D the
incidence matrix of G, and so it follows that the complexity of computing
the relevant Graver bases for this program is O(n^{g(G)}) where g(G) is the
Graver complexity of the digraph G as just defined.
So the Graver complexity of a (di)-graph controls the complexity of
computing the Graver bases of the relevant n-fold integer programs, and
hence its significance.
Unfortunately, our present understanding of the Graver complexity of
(di)-graphs is very limited and much more study is required. Very little
is known even for the complete bipartite graphs K3,m : while g(K3,3 ) = 9,
already g(K3,4 ) is unknown. See [1] for more details and a lower bound on
g(K3,m ) which is exponential in m.
Acknowledgements. I thank Jon Lee and Sven Leyffer for inviting
me to write this article. I am indebted to Jesus De Loera, Raymond Hem-
mecke, Uriel Rothblum and Robert Weismantel for their collaboration in
developing the theory of n-fold integer programming, and to Raymond
Hemmecke for his invaluable suggestions. The article was written mostly
while I was visiting and delivering the Nachdiplom Lectures at ETH Zürich
during Spring 2009. I thank the following colleagues at ETH for useful feed-
back: David Adjiashvili, Jan Foniok, Martin Fuchsberger, Komei Fukuda,
Dan Hefetz, Hans-Rudolf Künsch, Hans-Jakob Lüthi, and Philipp Zum-
stein. I also thank Dorit Hochbaum, Peter Malkin, and a referee for useful
remarks.

REFERENCES

[1] Berstein Y. and Onn S., The Graver complexity of integer programming, Annals
Combin. 13 (2009) 289–296.
[2] Cook W., Fonlupt J., and Schrijver A., An integer analogue of Carathéodory’s
theorem, J. Comb. Theory Ser. B 40 (1986) 63–70.
[3] Cox L.H., On properties of multi-dimensional statistical tables, J. Stat. Plan.
Infer. 117 (2003) 251–273.
[4] De Loera J., Hemmecke R., Onn S., and Weismantel R., N-fold integer pro-
gramming, Disc. Optim. 5 (Volume in memory of George B. Dantzig) (2008)
231–241.
[5] De Loera J., Hemmecke R., Onn S., Rothblum U.G., and Weismantel R.,
Convex integer maximization via Graver bases, J. Pure App. Algeb. 213 (2009)
1569–1577.
[6] De Loera J. and Onn S., The complexity of three-way statistical tables, SIAM
J. Comp. 33 (2004) 819–836.
[7] De Loera J. and Onn S., All rational polytopes are transportation polytopes
and all polytopal integer sets are contingency tables, In: Proc. IPCO 10 –
Symp. on Integer Programming and Combinatorial Optimization (Columbia
University, New York), Lec. Not. Comp. Sci., Springer 3064 (2004) 338–351.
[8] De Loera J. and Onn S., Markov bases of three-way tables are arbitrarily com-
plicated, J. Symb. Comp. 41 (2006) 173–181.
[9] Fienberg S.E. and Rinaldo A., Three centuries of categorical data analysis: Log-
linear models and maximum likelihood estimation, J. Stat. Plan. Infer. 137
(2007) 3430–3445.
[10] Gordan P., Über die Auflösung linearer Gleichungen mit reellen Coefficienten,
Math. Annalen 6 (1873) 23–28.
[11] Graver J.E., On the foundations of linear and linear integer programming I, Math.
Prog. 9 (1975) 207–226.
[12] Hemmecke R., Onn S., and Weismantel R., A polynomial oracle-time algorithm
for convex integer minimization, Math. Prog. 126 (2011) 97–117.
[13] Hemmecke R., Onn S., and Weismantel R., N-fold integer programming and
nonlinear multi-transshipment, Optimization Letters 5 (2011) 13–25.
[14] Hochbaum D.S. and Shanthikumar J.G., Convex separable optimization is not
much harder than linear optimization, J. Assoc. Comp. Mach. 37 (1990)
843–862.
[15] Hoffman A.J. and Kruskal J.B., Integral boundary points of convex polyhedra,
In: Linear inequalities and Related Systems, Ann. Math. Stud. 38, 223–246,
Princeton University Press, Princeton, NJ (1956).
[16] Hoşten S. and Sullivant S., Finiteness theorems for Markov bases of hierarchical
models, J. Comb. Theory Ser. A 114 (2007) 311–321.
[17] Irving R. and Jerrum M.R., Three-dimensional statistical data security problems,
SIAM J. Comp. 23 (1994) 170–184.

[18] Lenstra Jr. H.W., Integer programming with a fixed number of variables, Math.
Oper. Res. 8 (1983) 538–548.
[19] Motzkin T.S., The multi-index transportation problem, Bull. Amer. Math. Soc.
58 (1952) 494.
[20] Onn S., Entry uniqueness in margined tables, In: Proc. PSD 2006 – Symp. on
Privacy in Statistical Databases (Rome, Italy), Lec. Not. Comp. Sci., Springer
4302 (2006) 94–101.
[21] Onn S., Convex discrete optimization, In: Encyclopedia of Optimization, Springer
(2009) 513–550.
[22] Onn S., Nonlinear discrete optimization, Zurich Lectures in Advanced Mathemat-
ics, European Mathematical Society, 2010.
[23] Onn S. and Rothblum U.G., Convex combinatorial optimization, Disc. Comp.
Geom. 32 (2004) 549–566.
[24] Santos F. and Sturmfels B., Higher Lawrence configurations, J. Comb. Theory
Ser. A 103 (2003) 151–164.
[25] Schrijver A., Theory of Linear and Integer Programming, Wiley, New York
(1986).
[26] Sebö A., Hilbert bases, Carathéodory's theorem and combinatorial optimization,
In: Proc. IPCO 1 - 1st Conference on Integer Programming and Combinatorial
Optimization (R. Kannan and W.R. Pulleyblank Eds.) (1990) 431–455.
[27] Vlach M., Conditions for the existence of solutions of the three-dimensional planar
transportation problem, Disc. App. Math. 13 (1986) 61–78.
[28] Yemelichev V.A., Kovalev M.M., and Kravtsov M.K., Polytopes, Graphs and
Optimisation, Cambridge University Press, Cambridge (1984).

PART IX:
Applications

MINLP APPLICATION FOR
ACH INTERIORS RESTRUCTURING
ERICA KLAMPFL∗ AND YAKOV FRADKIN∗

Abstract. In 2006, Ford Motor Company committed to restructure the $1.5 billion
ACH interiors business. This extensive undertaking required a complete re-engineering
of the supply footprint of 42 high-volume product lines over 26 major manufacturing pro-
cesses and more than 50 potential supplier sites. To enable data-driven decision making,
we developed a decision support system (DSS) that could quickly yield a variety of so-
lutions for different business scenarios. To drive this DSS, we developed a non-standard
mathematical model for the assignment problem and a novel practical approach to solve
the resulting large-scale mixed-integer nonlinear program (MINLP). In this paper, we
present the MINLP and describe how we reformulated it to remove the nonlinearity in
the objective function, while still capturing the supplier facility cost as a function of the
supplier’s utilization. We also describe our algorithm and decoupling method that scale
well with the problem size and avoid the nonlinearity in the constraints. Finally, we
provide a computational example to demonstrate the algorithm’s functionality.

Key words. Mixed-integer programming, applications of mathematical program-


ming.

AMS(MOS) subject classifications. Primary 90C11, 90C90.

1. Introduction. As part of the 2006 Accelerated Way Forward plan,


Ford committed to sell or close all existing Automotive Components Hold-
ings, LLC (ACH) plants by the end of 2008 [4]. In this paper, we focus on
the interiors business portion of the ACH business, which was a major piece
of the ACH portfolio, about $1.5 billion in annual revenue. Ford’s proactive
Way Forward approach was key in the ACH interiors business’ avoidance
of bankruptcy to which many other interiors suppliers (e.g. Plastech [10],
Blue Water [11]) succumbed.
Making strategic sourcing decisions is well suited to operations re-
search techniques, as there are often millions of possible alternatives that
cannot be readily evaluated or analyzed manually. We developed a quan-
titative decision support system that allowed the major stakeholders to
assign the interiors product programs to suppliers that had available ca-
pacity at their facilities.
The resulting mathematical program that describes this sourcing prob-
lem is a large-scale mixed-integer nonlinear program (MINLP); the nonlin-
earity is due to a product program’s production cost being a function of
a facility’s capacity utilization (suppliers prefer to operate at almost full
capacity to get the full value of the fixed cost of their equipment and fa-
cility). Since it is not well known how to solve MINLPs of this size, we
developed a MINLP reformulation that removed the nonlinearity in the

∗ Ford Research & Advanced Engineering, Systems Analytics & Environmental Sci-

ences Department, Dearborn, MI 48124.

objective function, and subsequently, we developed an algorithm based on


decoupling the reformulated MINLP into solvable mixed-integer programs
(MIPs).
In this paper, we first provide an overview of the business problem
in Section 2. Next, in Section 3, we describe the MINLP formulation
together with the constraints that provide details on the business needs,
such as certain products being constrained to be produced together due to
shipping requirements or intellectual property licensing. In Section 4, we
describe the problem reformulation that removes the nonlinearity in the
objective function. The resulting formulation is also a MINLP with the
nonlinearity in the constraints remaining. In Section 5, we demonstrate
how we decouple this MINLP into two MIPs and describe the algorithm
that iteratively solves the MIPs until convergence. In Section 6, we provide
a small computational example to demonstrate the solution methodology.
Finally, in Section 7, we provide a summary of the business outcome of this
work, as well as the MINLP modeling insight that we gained.

2. Problem overview. In this section, we will give a brief overview of


the business problem of making the strategic sourcing decisions associated
with the ACH restructuring. For details of the business climate, problem,
and scenario analysis see [7].
ACH’s interiors business was located in two large-scale manufacturing
plants in southeast Michigan (Utica and Saline), producing instrument
panels, consoles, door trim panels and cockpit module assemblies for Ford
assembly plants. Saline and Utica were underutilized and not profitable,
and the interiors business was consuming a substantial amount of company
cash. Ford had committed to sell or close all existing ACH plants, including
Utica and Saline, by the end of 2008 as a part of the 2006 Accelerated Way
Forward plan. However, after traditional improvement actions were made
and a concerted marketing effort was attempted in 2006, Ford was unable
to find an interested, qualified buyer for the interiors business.
ACH and Ford Purchasing were left with a few options: consolidate
the product programs into either Saline or Utica, making the consolidated
plant more attractive to a potential investor due to better facility uti-
lization; outsource all product programs to suppliers that have facilities
with available capacity; or devise some combination of restructuring and
outsourcing. How to make this decision became daunting, as there was
an overwhelming number of possible sourcing alternatives of the interiors
business to the overall supply base: more than 40 product programs re-
quiring multiple manufacturing processes that could be sourced to over 50
potential production sites, each with different configurations of production
technologies, capacities, utilizations and shipping distances to assembly
plants.
It soon became apparent that the quality of these decisions could be
drastically improved if a computational tool was available for identifying

and evaluating the millions of possible product programs-to-suppliers allo-


cations. Specifically, the business problem centered on identifying the best
possible sourcing pattern for the product programs at the two ACH interi-
ors plants, given certain physical, financial and operational considerations.
In this case, the “best possible sourcing pattern” was defined as a sourc-
ing footprint that yielded the minimum net present value of five-year cost
stream to Ford, factoring in 1) investment costs to achieve the new sourcing
footprint, 2) the resulting annual purchased part expenditure (dependent
on facility utilization), and 3) the annual freight expenditure to ship the
interiors products to their respective assembly destinations.
ACH and Ford Purchasing wanted a model that faithfully represented
the business problem by accounting for the nonlinear supplier facility cost
curve as a function of capacity utilization, as well as the many business
constraints. They also wanted a means for allowing ACH customers the
ability to independently run the tool and perform scenario analysis.

3. MINLP formulation. The underlying model that represents the


interiors restructuring problem is a large-scale MINLP. The MINLP field
is still an active area of research (see [5] for recent developments), with no
general methods available for large-scale problems: most early work was
done in the eighties for problems arising in chemical engineering [1]. The
nonlinearity in our MINLP is due to the product program’s production
cost being a function of a facility’s capacity utilization. The standard MIP
modeling approach that uses a fixed cost plus per-unit variable cost as
described in [8] was not readily applicable here, as Ford does not directly
pay a third party supplier’s fixed cost. However, Ford is likely to get a
reduced per unit cost from the third party supplier dependent upon the
facility’s total utilization. This potential reduction in per unit total cost is
what we attempt to capture in our modeling approach.

3.1. Input information. The list below describes the inputs and
calculated coefficients that we use in the model.

S = {1, . . . , m} set of facilities with available processes; includes


both Ford-owned and third-party facilities.
S c ⊆ S = set of excluded (closed-for-business) facilities.
C = {1, . . . , n} set of manufacturing requirements: a product pro-
gram may have multiple such requirements.
C p ⊆ C = set of indices with cardinality λ that corresponds to unique
product programs.
P = {1, . . . , r} set of processes.
NPV = net present value multiplier used to discount future annual
costs in order to properly reflect the time value of money.
COF = capacity overload factor: allows override of a facility’s capacity.
UMIN = fewest number of employees required across all union facilities.
CCj = cost of closing a Ford-owned facility j ∈ S c .

Uj = designates if a facility j ∈ S is a union facility: Uj = 1 if


facility j is a union facility; otherwise Uj = 0.
SUj = supplier to which a facility j ∈ S belongs.
EHi = employees needed to produce product program i ∈ C p .
IFi = intellectual property family for product program i ∈ C p . IFi
= 0 if product program i does not fall under any intellectual
property agreement. The products in the same intellectual
property family must be assigned to the same supplier but
possibly to different facilities of that supplier.
VVi = production volume for product program i ∈ C p .
CPMi = product program i ∈ C p freight-family identifier. If the value
is greater than zero, it provides the group identifier for
product programs that can be shipped for one combined cost.
PFi = product-program family-number identifier for each require-
ment i ∈ C that is used to ensure that all requirements of the
same product program are assigned to the same facility.
PRi = process quantity required for requirement i ∈ C.
FCij = freight cost to ship product program i ∈ C p from facility j ∈ S
to assembly destination. This is the net present value cost
over the life of the project, calculated as the annual freight
cost multiplied by the NPV factor.
TCij = tooling move cost to make product program i ∈ C p in facility
j ∈ S.
MCpj = cost of moving any additional unit of capacity of process p ∈ P
to facility j ∈ S.
MPMpj = maximum number of units of capacity of process p ∈ P that
can be moved to facility j ∈ S.
CPUpj = capacity per unit of process p ∈ P at facility j ∈ S.
RLip = required load for requirement i ∈ C on process p ∈ P.
SACpj = available capacity of process p ∈ P at facility j ∈ S.
IPMCpj = the total initial installed capacity for process p ∈ P at
facility j ∈ S. Note that IPMCpj − SACpj is the capacity
already used by other parties at facility j.
IPPj = initial total number of processes having IPMCpj > 0 available
at facility j ∈ S.

There were λ = 42 product programs, which we expanded into n = 229


manufacturing operations (also referred to as requirements) in order to
improve granularity of manufacturing process assignments. Each product
program could have multiple requirements, such as floor space and machine
time requirements. In addition, for machine requirements, there were sev-
eral machines on which a manufacturing requirement could be made, and
the manufacturing requirements could be assigned to be made on several
machines in the same facility. For example, an instrument panel could

be made on a 1,500, 2,000, or 2,500-ton press or any combination of the
aforementioned.
3.2. Variables. For this MINLP, we have both continuous and bi-
nary decision variables. The first variable, x_ij, is a binary variable that
determines if a requirement i ∈ C is assigned to facility j ∈ S. In other
words,

    x_ij = { 1  if requirement i is assigned to facility j,
           { 0  otherwise.

The second variable, vipj , is the fraction of requirement i ∈ C pro-


duced on process p ∈ P at facility j ∈ S. This variable enables partial
loading of the total requirement across multiple machine sizes. For exam-
ple, the required process hours could be split between 2000 ton and 2500
ton machines.
The third variable is a binary variable that specifies whether or not a
requirement that is produced at a certain facility avoids double counting
freight:

    w_ij = { 1  if another product program that belongs to the same
           {    freight family as product program i ∈ C^p is produced at
           {    the same facility j ∈ S,
           { 0  otherwise.

Note that only one product program of any given product program group
can get a freight discount. So, if product programs i1 and i2 are in the same
freight family and both product program i1 and i2 are produced at facility
j, then wi1 j = 1 and wi2 j = 0 ∀ j ∈ S and ∀ i1 , i2 ∈ C p . For example, an
instrument panel could receive reduced transportation costs by assigning
cockpit assembly work to the same location as the base instrument panel
and shipping a single finished assembly to the final assembly plant, as
opposed to separately shipping the cockpit and instrument panel.
The fourth variable, mpj , specifies how many units of capacity of type
p ∈ P to move to supplier j ∈ S. This is an integer variable that is greater
than or equal to zero. Note that this variable is integer-constrained (see
constraint (3.18)) by M P Mpj , which is introduced in Section 3.1. An
example unit of capacity that could be moved to a facility is a 2,500 ton
press.
The variable u_pj is a binary variable that keeps track of whether or
not any additional units of capacity of process p were added to a facility j
that did not already have any initial installed capacity of process type p.

    u_pj = { 1  if any units of process p ∈ P are added to facility j ∈ S
           {    such that IPMC_pj = 0,
           { 0  otherwise.

The variable nppj is the total number of processes available at facility


j ∈ S, including the “just moved in” processes: see constraint (3.13). It
is a function of the initial total number of processes plus any units of new
processes that are moved into a facility. Hence, we can let it be a continuous
variable because it is the sum of integers.
The continuous variable pmcpj is the maximum capacity for process
p ∈ P at facility j ∈ S, including any capacity that has been moved into
facility j: see constraint (3.14).
The continuous variable ipuj is the initial percent utilization for facility
j ∈ S: see constraint (3.18).
The continuous variable ucij is the unit cost for product program
i ∈ C p at facility j ∈ S, which can be approximated by the function in
constraint (3.16).
The continuous variable pcij is the production cost of a product pro-
gram i ∈ C p in facility j ∈ S: see constraint (3.17).
For this MINLP, we had λ = 42 product programs, n = 229 re-
quirements, m = 57 facilities and r = 26 processes. The MINLP has
m(n + λ + r) = 16, 929 binary variables, rm = 1, 482 integer (non-binary)
variables, and m(nr+r +2+2λ) = 345, 762 continuous variables, for a total
of m(n + nr + 3λ + 3r + 2) = 364, 173 variables. Note that the number of
variables is an upper bound, as all facilities were not necessarily considered
in all scenarios, and constraints (3.3) and (3.11) also reduce the number of
variables.
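As a quick sanity check on these counts, the arithmetic can be reproduced in a few lines of plain Python (shown only to make the bookkeeping transparent):

```python
m, n, lam, r = 57, 229, 42, 26              # facilities, requirements, programs, processes

binary     = m * (n + lam + r)              # x_ij, w_ij, u_pj
integer    = r * m                          # m_pj
continuous = m * (n * r + r + 2 + 2 * lam)  # v_ipj, pmc_pj, npp_j, ipu_j, uc_ij, pc_ij
print(binary, integer, continuous, binary + integer + continuous)
# 16929 1482 345762 364173, matching the totals quoted above
```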
3.3. Objective function. Our objective is to minimize all costs as-
sociated with the interiors business: product program producing, trans-
porting, tooling, equipment moving, and facility closing. Including all of
these costs allows the interiors business to capture not only traditional
transportation and production costs, but also allows sourcing to occur at
facilities to which equipment can be moved. The resulting nonlinear objec-
tive function is
 
    min  ∑_{i∈C^p, j∈S} (FC_ij + TC_ij) x_ij + ∑_{i∈C^p, j∈S} pc_ij x_ij
         − ∑_{i∈C^p\{i: CPM_i=0}, j∈S} FC_ij w_ij
         + ∑_{p∈P, j∈S} MC_pj m_pj + ∑_{j∈S^c} CC_j                        (3.1)

where the minimization is over w_ij, pc_ij, uc_ij for all i ∈ C^p, j ∈ S, and
over x_ij, v_ipj, m_pj, u_pj for all i ∈ C, j ∈ S, p ∈ P.

We describe the individual cost components in more detail below:
• Freight costs, FC_ij, are incurred by shipping a product program
i ∈ C^p from facility j ∈ S. Since this cost is associated with a
product program, it is carried by only one of the requirements;
multiple entries for a product program representing additional re-
quirements do not have additional costs. This is represented by
∑_{i∈C^p, j∈S} FC_ij x_ij, where the freight cost is realized only if the
product program i is assigned to a facility j (i.e., x_ij = 1). In
addition, freight cost can be avoided for building common parts
in the same facility. This is because they have the same volume
shipped together or separately, so we can subtract the freight cost
for additional common parts that are assigned to the same facil-
ity. This is captured by − ∑_{i∈C^p\{i: CPM_i=0}, j∈S} FC_ij w_ij, where
the freight cost is subtracted for additional common parts i that
are shipped from facility j (i.e., this is indicated when w_ij = 1).
• The tooling move cost, TC_ij, is the cost to move tooling to pro-
duce product program i ∈ C^p in facility j ∈ S. This is realized
by ∑_{i∈C^p, j∈S} TC_ij x_ij, where tooling move cost is added only if
product program i is assigned to facility j (i.e., x_ij = 1). TC_ij = 0
when sourcing a product program to its current production facility;
however, when moving production to a different facility (even to a
facility having the capacity available to make a product program),
there is generally some tooling cost involved.
• The capacity move cost, MC_pj, is the cost to move a unit of ca-
pacity of process p ∈ P to facility j ∈ S. This is to account for
cost associated with moving capacity to a facility to either increase
existing capacity or introduce new capacity. It is represented by
∑_{p∈P, j∈S} MC_pj m_pj, where cost is added only if a unit of capacity
of process p is added to facility j (i.e., m_pj > 0).
• Production cost of a product program, pc_ij, is a variable that is de-
pendent on capacity utilization at facility j, introducing nonlinear-
ity into the objective function. It is captured by ∑_{i∈C^p, j∈S} pc_ij x_ij,
where production cost is only charged for a product program i if
it is produced in facility j (i.e., x_ij = 1).
• The facility closure cost, CCj , is the cost to close facility j. Note
that this is an input value and is added to the total objective func-
tion value if either Ford-owned facility, Utica or Saline, is chosen
for closure (i.e., nothing is sourced to the Ford-owned facility).
Ford does not have the ability to close any outside supplier facility
and likewise would not incur closing cost if an outside supplier’s
facility closed.

3.4. Constraints. The following constraints for the MINLP convey
the different business requirements:
• Each requirement must be assigned to exactly one facility (n = 229
constraints).

    ∑_{j∈S} x_ij = 1   ∀ i ∈ C                                      (3.2)

• If requirements are in the same family, then they must be assigned
to the same facility (mn(n−1)/2 = 1,488,042 constraints).

    x_{i1 j} = x_{i2 j}   ∀ i1, i2 ∈ C, j ∈ S :
               i2 < i1 ∧ PF_{i1} ≠ 0 ∧ PF_{i2} = PF_{i1}.           (3.3)

We guarantee that manufacturing requirements belonging to the
same product program are sourced to the same facility by grouping
them in the same "product-program family" and constraining those
in the same product-program family to be sourced to the same
facility.
• If a requirement i is produced at facility j, then the total fractional
amount produced at j over all processes p should equal 1. In
addition, if a requirement i is not produced at facility j, then no
fractional amount of requirement i must be produced on a process
p belonging to that facility j (nm = 13,053 constraints).

    ∑_{p∈P: RL_ip>0} v_ipj = x_ij   ∀ i ∈ C, j ∈ S.                 (3.4)

• The capacity on every process p at a facility j must not be violated
(mr = 1,482 constraints).

    ∑_{i∈C: RL_ip>0} RL_ip v_ipj ≤ COF(SAC_pj + CPU_pj m_pj)
                                   ∀ j ∈ S, p ∈ P.                  (3.5)
• Every requirement i must be fully assigned to one or more processes
p at facility j (n = 229 constraints). Note that this constraint only
needs to be for every requirement because constraints (3.2) and
(3.4) guarantee that the total fractional amount of requirement i is
produced only at one facility j. Although this constraint is implied
by (3.2) and (3.4), it is useful for better defining the convex hull
when solving the MIPs in Section 5.

    ∑_{p∈P: RL_ip>0, j∈S} v_ipj = 1   ∀ i ∈ C.                      (3.6)

• At least some threshold of business must be sourced to dedicated
UAW facilities.

    ∑_{i∈C, j∈S: U_j=1} EH_i x_ij ≥ UMIN.                           (3.7)

• If product programs are in the same intellectual property family,
then they must be assigned to the same holding company; that is,
the facilities to which the product programs are assigned must
belong to the same supplier (λ(λ−1)/2 = 861 constraints).

    ∑_{j∈S} SU_j x_{i1 j} = ∑_{j∈S} SU_j x_{i2 j}   ∀ i1 ∈ C^p : IF_{i1} > 0,
            i2 ∈ C^p : IF_{i1} = IF_{i2} ∧ i2 > i1.                 (3.8)

As long as this constraint holds for i1, i2 ∈ C^p, it will also hold for
all other i1, i2 ∈ C because of constraint (3.3).
• If a product program i is not assigned to facility j, then the product
program is not eligible for a freight discount to facility j (λm =
2, 394 constraints).

wij ≤ xij ∀ i ∈ C p , j ∈ S (3.9)

• The following constraint forces the "produced together" variable,
w_ij, to be zero if two product programs that get a benefit for being
produced together are not produced at the same facility.

    x_{i1 j} + x_{i2 j} ≥ 2 w_{i1 j}   ∀ j ∈ S, i1 ∈ C^p : CPM_{i1} > 0,
            i2 ∈ C^p : CPM_{i1} = CPM_{i2} ∧ i2 > i1.               (3.10)

Similar to the previous constraint, we only consider i1, i2 ∈ C^p
because a reduction in freight cost is only applicable to the unique
product programs (mλ(λ−1)/2 = 49,077 constraints).
• The following constraint guarantees that the "produced together"
variable, w_{i1 j}, is zero if a product program i1 does not get a benefit
for being produced together with other product programs at the
same facility. It also forces w_{i1 j} to be zero if product program
i1 has a CPM_{i1} value greater than zero that is the same as some
other product program i2 and i1 > i2. This constraint is useful
in the presolve phase in eliminating variables (mλ(λ−1)/2 = 49,077
constraints).

    w_{i1 j} = 0   ∀ i1, i2 ∈ C^p, j ∈ S : (CPM_{i1} = 0) ∨
               (CPM_{i1} > 0 ∧ CPM_{i1} = CPM_{i2} ∧ i2 < i1).      (3.11)

• The following ensures that if requirement i is assigned to facility
j, then the sum of all processes that requirement i uses in facility
j satisfies requirement i's demand (nm = 13,053 constraints):

    ∑_{p∈P: RL_ip>0} RL_ip v_ipj = PR_i x_ij   ∀ i ∈ C, j ∈ S.      (3.12)

• The following keeps track of the number of unique process types
available at a facility. It adds the number of new processes moved
to a facility (does not include adding more process units to existing
processes) to the initial number of processes at a facility (m = 57
constraints).

    npp_j = IPP_j + ∑_{p∈P} u_pj   ∀ j ∈ S.                         (3.13)

• The constraint below updates the maximum capacity for a pro-
cess at a facility, updating the initial total capacity with the total
capacity that is moved into the facility (rm = 1,482 constraints).

    pmc_pj = IPMC_pj + CPU_pj m_pj   ∀ p ∈ P, j ∈ S.                (3.14)

• The following nonlinear constraint keeps track of the initial percent
utilization of a facility, accounting for any new capacity that was
added to the facility (m = 57 constraints).

    ipu_j = ( ∑_{p∈P} (pmc_pj − SAC_pj)/pmc_pj ) / npp_j   ∀ j ∈ S.  (3.15)

• The following nonlinear constraint approximates the unit cost,
where A_ij is the unit variable cost at this facility, and B_ij is the
facility fixed cost attributed to a unit at 100% utilization of all
processes at the facility (λm = 2,394 constraints). The facility
congestion costs are not accounted for in this approximation. (A
small numerical illustration of this cost curve follows at the end of
this subsection.)

    uc_ij = A_ij + B_ij / ( ∑_{p∈P, i∈C} v_ipj + ipu_j )   ∀ i ∈ C^p, j ∈ S.  (3.16)

• The following constraint defines the production cost of a product
program assigned to a facility (λm = 2,394 constraints):

    pc_ij = NPV · (VV_i / 1,000) · uc_ij   ∀ i ∈ C^p, j ∈ S.        (3.17)

Note that we divide VV_i by a thousand to convert to the same
units of measure: pc_ij is in millions of dollars, uc_ij is in dollars,
and VV_i is in thousands of units per year.
• The following constraint guarantees that u_pj = 1 if any units of
capacity of a process are moved into a facility when there is no
initial installed capacity for that process.

    m_pj ≤ MPM_pj u_pj   ∀ p ∈ P, j ∈ S : IPMC_pj = 0.              (3.18)

If there is initial installed capacity for a process, then u_pj = 0
(together with constraint (3.18), this gives rm = 1,482 constraints).

    u_pj = 0   ∀ p ∈ P, j ∈ S : IPMC_pj > 0.                        (3.19)

• The following constraint guarantees that if no units of process are
moved to a facility, then there are no units of process p that were
added to facility j (rm = 1,482 constraints).

    u_pj ≤ m_pj   ∀ p ∈ P, j ∈ S.                                   (3.20)
This MINLP has a nonlinear objective function and several nonlinear
constraints, (3.15)–(3.17). Altogether, there are mn(n−1)/2 +
λ(λ−1)(2m+1)/2 + m(3λ + 2n + 4r + 2) + 2n + 1 = 1,626,846 constraints.
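To see why (3.16) rewards consolidation, here is a small numeric illustration; the figures A_ij = 10, B_ij = 6 and the utilization levels are invented for the example, not taken from the ACH data. The per-unit cost falls as the facility fills up.

```python
def unit_cost(A, B, load, initial_utilization):
    """Utilization-dependent unit cost in the spirit of (3.16):
    variable cost A plus a fixed-cost share B spread over utilization."""
    return A + B / (load + initial_utilization)

A, B = 10.0, 6.0                      # invented cost parameters
for load in (0.2, 0.5, 0.8):          # Ford-sourced utilization share
    print(load, round(unit_cost(A, B, load, initial_utilization=0.1), 2))
# 0.2 -> 30.0, 0.5 -> 20.0, 0.8 -> 16.67: fuller facilities quote lower unit cost
```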

4. Discrete MINLP reformulation. The MINLP problem, as de-


scribed in Section 3, has a nonlinear objective function, several hundred
thousand variables, and over a million constraints (a subset of which is
nonlinear): this MINLP is too large to solve with current MINLP solu-
tion techniques. With a very tight deadline in which we had to solve this
problem and deliver an analysis (two months), we focused on developing a
formulation that could be easily solved using existing solution techniques.
Our goal was to remove the nonlinearity in both the objective function and
constraints so that we could reduce the MINLP to a MIP, allowing the
use of available solvers capable of tackling problems with millions of vari-
ables and constraints. In this section, we describe how we first reduced the
complexity of the problem by removing the nonlinearity in the objective
function.
Recall that the nonlinearity in the objective function was caused by
how we derived the non-standard unit cost of a product program. In this
section, we describe how we linearized this cost by approximating the fa-
cilities’ utilization curves into discrete steps having constant cost factors.
To get the product program’s utilization-adjusted unit cost, we multiply
those factors by the variable unit cost. Because the nonlinear cost curves,
themselves, are estimates, the approach of using the utilization levels ad-
equately represented the business problem. To validate this assumption,
we ran a calibration scenario using the existing sourcing pattern at ACH
facilities to confirm agreement with current budget revenues: tests found
that we were within five percent accuracy.
We refer to the resulting model as the “reformulated MINLP” because
although we remove the nonlinearity in the objective function, there are still
nonlinear constraints. We will first describe the new and modified inputs
required for this reformulated MINLP. Next, we describe the additional
dimension to some of the variables and the new variables for this MINLP
that reflect the utilization range choice. We conclude with a discussion of
the added and modified constraints.

4.1. Input Information. The list below describes the additional in-
puts that were necessary for the MINLP reformulation.
Q = {1, . . . , ℓ} set of utilization ranges for facilities.
QCqj = production cost multiplier for facility j ∈ S when the facility
is in utilization range q ∈ Q.
UBq = upper bound on utilization range q ∈ Q; UB_0 = 0.
UBCi = base unit cost of product program i ∈ C^p.
UCi = calculated unit cost of product program i ∈ C^p.
PCij = calculated production cost of product program i ∈ C^p at
facility j ∈ S.
Note that UC_i and PC_ij were previously the variables uc_ij and pc_ij in
Section 3 but are now parameters in the reformulated MINLP.

4.2. Variables. Many of the variables for the reformulated MINLP
are the same as the variables in Section 3.2; exceptions include the vari-
ables x_iqj and v_ipqj that now have the dimension q added to designate the
utilization range for a facility. We also introduce a new variable y_qj that
will determine which utilization range is chosen for a facility.
So, the variable x_ij is now redefined as

    x_iqj = { 1  if requirement i ∈ C is produced in utilization range
            {    q ∈ Q by facility j ∈ S,
            { 0  otherwise.

The variable v_ipj is now redefined as v_ipqj, which allows a fraction of
requirement i ∈ C to be produced on different processes p ∈ P at facility
j ∈ S in utilization range q ∈ Q.
The new variable, y_qj, is a binary variable that determines whether
or not a facility's total production is within a certain utilization range q.
The utilization level is only defined for the facilities to which we source a
share of our own business; we are ambivalent regarding the utilization of a
facility to which we have not sourced any business. We write this as

    y_qj = { 1  if facility j ∈ S to which we source a share of our own
           {    business has utilization levels in range q ∈ Q,
           { 0  otherwise.
Assuming m, n, λ, and r have the same values as in Section 3.2, and ℓ = 10,
there are m(nℓ + λ + r + ℓ) = 134,976 binary variables, rm = 1,482 integer
(non-binary) variables, and m(nrℓ + r + 2 + 2λ) = 3,400,164 continuous
variables, with a total of m(nℓ + nrℓ + 3λ + 3r + 2 + ℓ) = 3,536,622
variables. Note that this is an upper bound on the number of variables, as
not all facilities are necessarily considered in each scenario, and constraints
(3.3) and (3.11) also reduce the number of variables.
4.3. Objective function. The objective function for the reformu-
lated MINLP is similar to the one in Section 3.3. However, the production
cost for a product program is now a fixed input that depends on the fa-
cility's utilization range; this makes the objective function linear. This
is because the unit cost variable, uc_ij, from the original MINLP has
been approximated by UC_{i q_j j} = UBC_i · QC_{q_j j} for some q_j ∈ Q and for all
j ∈ S such that y_{q_j j} = 1: see Figure 1. Hence, the production cost of a
product program, PC_{i q_j j}, is (NPV · VV_i) UC_{i q_j j}/1,000.
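Once the ranges are fixed, this discretization is a simple table lookup. The sketch below is illustrative only (the range bounds UB and multipliers QC are invented, not ACH data): it maps a facility utilization to its range q via the UB_q bounds and prices a product program accordingly.

```python
import bisect

UB = [0.0, 0.25, 0.50, 0.75, 1.00]        # invented range bounds UB_0..UB_4
QC = [1.60, 1.30, 1.10, 1.00]             # invented multipliers QC_qj, one per range

def utilization_range(u):
    """Index q with UB_{q-1} < u <= UB_q (1-based, as in the text)."""
    return bisect.bisect_left(UB, u, lo=1)

def production_cost(UBC_i, VV_i, NPV, u):
    q = utilization_range(u)
    UC = UBC_i * QC[q - 1]                # UC_iqj = UBC_i * QC_qj
    return NPV * VV_i * UC / 1000.0       # PC in millions; VV in thousands/yr

print(utilization_range(0.6))                                   # -> 3
print(production_cost(UBC_i=20.0, VV_i=150.0, NPV=4.5, u=0.6))  # -> 14.85
```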
4.4. Constraints. As a result of linearizing the objective function,
we now have some modified and additional constraints in the reformulated
MINLP. Constraints (3.2) through (3.10) and (3.12) have been modified
from the constraints discussed in Section 3.4 to include the new dimension
associated with the utilization range q. For every variable that contains a
q subscript, there is a summation over q in the constraint for that variable,
with the exception of constraint (3.4) that is for every q.

Fig. 1. We discretized each facility's utilization curve into ranges; range lengths
could be unequal to better approximate the curve. In this diagram, the utilization
range q_j is defined by the interval from UB_{q_j −1} to UB_{q_j} and has a unit cost
factor QC_{q_j j} ∀ j ∈ S.

The following constraints are new constraints that were not in the
original MINLP described in Section 3.4. They have been added as a result
of introducing the utilization range approximation to the facility cost curve.
• A facility j can use at most one utilization level q. If a facility is
not chosen for Ford's business (even though the facility may carry
third-party business), then we specify that no utilization range at
that facility is chosen (m = 57 constraints).

    ∑_{q∈Q} y_qj ≤ 1   ∀ j ∈ S                                      (4.1)

• Facility utilization, defined as an average of all installed process
utilizations, has a known upper bound in each utilization range
q, which we enforce for each facility with the following nonlinear
constraint (mℓ = 570 constraints).

    ∑_{i∈C, p∈P: pmc_pj>0 ∧ RL_ip>0} RL_ip v_ipqj / (npp_j pmc_pj)
            ≤ (UB_q − ipu_j) y_qj   ∀ j ∈ S, q ∈ Q                  (4.2)

• Similarly, each utilization range has a known lower bound utiliza-
tion, which we enforce for each facility with the following nonlinear
constraint (m(ℓ−1) = 513 constraints).

    ∑_{i∈C, p∈P: pmc_pj>0 ∧ RL_ip>0} RL_ip v_ipqj / (npp_j pmc_pj)
            ≥ (UB_{q−1} − ipu_j) y_qj   ∀ j ∈ S, q ∈ Q : q > 1      (4.3)

• The next two constraints are linking constraints. The first guar-
antees that if no product program i is sourced to a facility j in
utilization range q, then utilization range q for facility j cannot
be selected (mℓ = 570 constraints).

    y_qj ≤ ∑_{i∈C} x_iqj   ∀ j ∈ S, q ∈ Q.                          (4.4)

The second guarantees that if at least one product program i is
sourced to a facility j in utilization range q, then utilization range
q for facility j must be selected (nℓm = 130,530 constraints).

    x_iqj ≤ y_qj   ∀ i ∈ C, q ∈ Q, j ∈ S.                           (4.5)

• If a product program i is assigned to a process p at facility j, then
it also must be assigned to a utilization range q at plant j (nm =
13,053 constraints).

    ∑_{p∈P: RL_ip>0, q∈Q} RL_ip v_ipqj = PR_i ∑_{q∈Q} x_iqj   ∀ i ∈ C, j ∈ S.  (4.6)

This reformulated MINLP has a linear objective function and several non-
linear constraints: (3.15) through (3.17) from the original MINLP plus (4.2)
and (4.3). In total, there are mn(n−1)/2 + λ(λ−1)(2m+1)/2 + m(3λ +
2n + 4r + 2nℓ + 3ℓ + 2) + 2n + 1 = 1,889,616 constraints in this discrete
MINLP reformulation. Recall that this is an upper bound on the number of
constraints since not all facilities are considered in each scenario.
5. Solution technique. Although we were able to remove the nonlin-
earity in the objective function to create the reformulated MINLP described
in the previous section, we are still left with a large-scale MINLP where
the discrete and continuous variables are non-separable. In this section,
we describe how we obviate the nonlinearity in the constraints of the refor-
mulated MINLP by iteratively fixing different variables in the reformulated
MINLP, allowing us to solve a MIP at each iteration. By carefully selecting
which variable to fix at each iteration, we not only are able to solve MIPs
at each iteration, but we are also able to reduce the number of constraints
and variables in each formulation. We first introduce the two MIPs that
we iteratively solve, each of which captures a subset of the decisions to be
made. Next, we describe the algorithm that provides us with a solution
to the discrete MINLP. Finally, we discuss convergence properties of the
algorithm.
5.1. Decoupling the MINLP. We introduce a decoupling of the
discrete MINLP model into two MIPs that can then be solved iteratively
until convergence to a solution of the discrete MINLP reformulation (see
Figure 2). Each MIP is actually a subset of the reformulated MINLP
with a different subset of variables fixed in the different MIPs. We chose
this approach because solution techniques for MIPs are well defined for

efficiently solving problems with millions of variables and constraints [6].


Not only do we avoid the nonlinearity in the constraints, but we are also
able to reduce the number of variables and constraints needed to describe
the problem.

6WDUW

,QLWLDOL]H

&DOFXODWH &DOFXODWH

6ROYH)DFLOLW\&DSDFLW\ 6ROYH)DFLOLW\
0RGHO 8WLOL]DWLRQ0RGHO

<HV <HV
&RQYHUJHQFHWHVW &RQYHUJHQFHWHVW

1R 1R
6WRS

Fig. 2. Flowchart that describes the solution technique.

The first MIP model we refer to as the Facility Capacity Model (FCM);
it assumes that the utilization range for a supplier is fixed, and it solves to
determine to which facilities and processes to source and whether or not
facilities that do not currently have capacity should have capacity added.
This solution will then determine upper bounds on a facility’s process ca-
pacity. The second model we refer to as the Facility Utilization Model
(FUM). This model accepts as inputs a facility’s upper bound on capacity
for each process, the number of unique processes at each facility, and the
initial facility percent utilization and determines which utilization range a
facility falls within, as well as the same sourcing decisions. We conclude
this section with a detailed discussion of the algorithm in Figure 2.
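Before turning to the two formulations, the outer loop of Figure 2 can be summarized in a few lines of code. This is only a schematic: solve_fcm, solve_fum, and derive_capacity_constants stand for the MIP solves and bookkeeping of §5.1.1 and §5.1.2 (hypothetical helpers, not actual ACH tool code), and convergence is tested on the objective values.

```python
def restructuring_heuristic(data, tol=1e-6, max_iters=50):
    """Iterate FCM and FUM until their objective values stop changing,
    as in the flowchart of Figure 2."""
    q = data.initial_utilization_ranges   # fixes PC_ij for the first FCM
    prev_c = prev_u = float("inf")
    for _ in range(max_iters):
        # FCM: fix utilization ranges; choose assignments and capacity moves
        min_c, assignment, moves = solve_fcm(data, q)        # assumed helper
        # derive npp, pmc, ipu from the capacity moves before the FUM
        derived = derive_capacity_constants(data, moves)     # assumed helper
        # FUM: fix capacities; choose assignments and utilization ranges
        min_u, assignment, q = solve_fum(data, derived)      # assumed helper
        if abs(prev_c - min_c) < tol and abs(prev_u - min_u) < tol:
            break
        prev_c, prev_u = min_c, min_u
    return assignment, q
```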
5.1.1. Facility capacity problem formulation. The FCM MIP assumes that the utilization range for each supplier is fixed; hence, we remove the subscript $q$ from all variables. The FCM answers the following two questions: to which facilities and processes to source, and whether or not facilities that do not currently have capacity should have capacity added. As stated in Section 2, most categories of ACH-owned equipment could be moved from Saline or Utica to a new supplier's site, at a certain expense. The FCM solution will then determine upper bounds on a facility's process capacity. The FCM differs from the reformulated MINLP in that it does not include the variable $y_{qj}$ and so does not need the associated bounds or the constraints associated with this variable in (4.1)–(4.6). Additionally, constraints (3.13) through (3.15) are not needed because $npp_j$, $pmc_{pj}$, and $ipu_j$ are calculated after the FCM is solved. Constraints (3.16) and (3.17) are not needed because $PC_{iqj}$ and $UC_{iqj}$ are not decision variables and are set according to the fixed utilization range from the previous solve of the FUM. Constraints (3.18) through (3.20) are no longer needed


because $u_{pj}$ is determined by testing whether $m^k_{pj} > 0$ and $IPMC_{pj} = 0$ after the FCM is solved.
The formulation below defines the objective function value $\mathrm{Min}^k_C$ for the FCM at iteration $k$.
This MIP has $m(n + nr + \lambda + r) = 356{,}307$ variables and $\lambda(\lambda-1)(m+2)/2 + mn(n-1)/2 + 2n(m+1) + mr + 1 = 1{,}566{,}888$ constraints before the pre-solve. The number of variables and constraints that can be eliminated during CPLEX's pre-solve is dependent upon the scenario (e.g., the number of external facilities we consider, the number of closed-for-business facilities, etc.); in one scenario, the MIP had 5,864 variables and 2,575 constraints after the pre-solve.


\begin{align*}
\mathrm{Min}^k_C = \min_{\substack{x_{ij},\, v_{ipj},\, m_{pj}\ \forall\, i\in C,\, j\in S,\, p\in P;\\ w_{ij}\ \forall\, i\in C^p,\, j\in S}}
& \sum_{i\in C,\, j\in S} \bigl(FC_{ij} + TC_{ij} + PC^{k-1}_{ij}\bigr)\, x_{ij} \\
& - \sum_{\substack{i\in C\setminus\{i:\, CPM_i=0\},\\ j\in S}} FC_{ij}\, w_{ij}
+ \sum_{p\in P,\, j\in S} MC_{pj}\, m_{pj}
+ \sum_{j\in S^c} CC_j
\end{align*}

s.t.

\begin{align}
&\sum_{j\in S} x_{ij} = 1 \quad \forall\, i \in C \tag{5.1}\\
&x_{i_1 j} = x_{i_2 j} \quad \forall\, i_1, i_2 \in C,\ j \in S:\ i_2 < i_1 \wedge PF_{i_1} \neq 0 \wedge PF_{i_2} = PF_{i_1} \tag{5.2}\\
&\sum_{p\in P:\, RL_{ip}>0} v_{ipj} = x_{ij} \quad \forall\, i \in C,\ j \in S \tag{5.3}\\
&\sum_{i\in C:\, RL_{ip}>0} RL_{ip}\, v_{ipj} \le COF\,\bigl(SAC_{pj} + CPU_{pj}\, m_{pj}\bigr) \quad \forall\, j \in S,\ p \in P \tag{5.4}\\
&\sum_{p\in P:\, RL_{ip}>0,\ j\in S} v_{ipj} = 1 \quad \forall\, i \in C \tag{5.5}\\
&\sum_{i\in C,\, j\in S:\, U_j=1} EH_i\, x_{ij} \ge UMIN \tag{5.6}\\
&\sum_{j\in S} SU_j\, x_{i_1 j} = \sum_{j\in S} SU_j\, x_{i_2 j} \quad \forall\, i_1 \in C^p:\ IF_{i_1} > 0,\ i_2 \in C^p:\ IF_{i_1} = IF_{i_2} \wedge i_2 > i_1 \tag{5.7}\\
&w_{ij} \le x_{ij} \quad \forall\, i \in C^p,\ j \in S \tag{5.8}\\
&x_{i_1 j} + x_{i_2 j} \ge 2\, w_{i_1 j} \quad \forall\, j \in S,\ i_1 \in C^p:\ CPM_{i_1} > 0,\ i_2 \in C^p:\ CPM_{i_1} = CPM_{i_2} \wedge i_2 > i_1 \tag{5.9}\\
&w_{i_1 j} = 0 \quad \forall\, i_1, i_2 \in C^p,\ j \in S:\ (CPM_{i_1} = 0) \vee (CPM_{i_1} > 0 \wedge CPM_{i_1} = CPM_{i_2} \wedge i_2 < i_1) \tag{5.10}\\
&\sum_{p\in P:\, RL_{ip}>0} RL_{ip}\, v_{ipj} = PR_i\, x_{ij} \quad \forall\, i \in C,\ j \in S \tag{5.11}\\
&x_{ij} \in \mathbb{B} \quad \forall\, i \in C,\ j \in S \tag{5.12}\\
&0 \le v_{ipj} \le 1 \quad \forall\, i \in C,\ p \in P,\ j \in S \tag{5.13}\\
&w_{ij} \in \mathbb{B} \quad \forall\, i \in C^p,\ j \in S \tag{5.14}\\
&0 \le m_{pj} \le MPM_{pj} \quad \forall\, p \in P,\ j \in S. \tag{5.15}
\end{align}

5.1.2. Facility utilization problem formulation. The FUM accepts as an input each facility's upper bound on capacity for each process (determined by the variable $m^k_{pj}$ in the FCM) and decides which utilization range a facility falls within, as well as the same sourcing decisions. As mentioned in Section 2, a manufacturing site's capacity utilization impacts its cost structure; this model allows a supplier that runs a plant at higher capacity to incur a lower per-unit cost than an under-utilized facility due to improved fixed-cost coverage.
The formulation below defines the objective function value $\mathrm{Min}^k_U$ for the FUM at iteration $k$. The FUM differs from the reformulated MINLP in that the variables $m^k_{pj}$, $npp^k_j$, $pmc^k_{pj}$, $u^k_{pj}$, and $ipu^k_j$ from the reformulated MINLP are constants. The value for $m^k_{pj}$ is passed from the FCM and is used to determine $u^k_{pj}$, $npp^k_j$, $pmc^k_{pj}$, and $ipu^k_j$. Hence, constraints (4.2) and (4.3), which are (5.28) and (5.29) in the FUM, respectively, are now linear.
This MIP has $m\ell(nr + n + 1) + \lambda m = 3{,}527{,}274$ variables and $mn(n-1)/2 + \lambda(\lambda-1)(2m+1)/2 + n + m(3 + \lambda + 2n + r + 1) + 1 = 1{,}619{,}036$ constraints before the pre-solve. The number of variables and constraints that can be eliminated during CPLEX's pre-solve is dependent upon the scenario; in one scenario, the MIP had 14,741 variables and 4,940 constraints after the pre-solve.



\begin{align*}
\mathrm{Min}^k_U = \min_{\substack{x_{iqj},\, y_{qj},\, v_{ipqj}\ \forall\, i\in C,\, j\in S,\, q\in Q,\, p\in P;\\ w_{ij}\ \forall\, i\in C^p,\, j\in S}}
& \sum_{i\in C,\, j\in S,\, q\in Q} \bigl(FC_{ij} + TC_{ij} + PC_{iqj}\bigr)\, x_{iqj} \\
& - \sum_{\substack{i\in C\setminus\{i:\, CPM_i=0\},\\ j\in S}} FC_{ij}\, w_{ij}
+ \sum_{p\in P,\, j\in S} MC_{pj}\, m^k_{pj}
+ \sum_{j\in S^c} CC_j
\end{align*}

s.t.

\begin{align}
&\sum_{q\in Q,\, j\in S} x_{iqj} = 1 \quad \forall\, i \in C \tag{5.16}\\
&\sum_{q\in Q} x_{i_1 qj} = \sum_{q\in Q} x_{i_2 qj} \quad \forall\, i_1, i_2 \in C,\ j \in S:\ i_2 < i_1 \wedge PF_{i_1} \neq 0 \wedge PF_{i_2} = PF_{i_1} \tag{5.17}\\
&\sum_{p\in P:\, RL_{ip}>0} v_{ipqj} = x_{iqj} \quad \forall\, i \in C,\ j \in S,\ q \in Q \tag{5.18}\\
&\sum_{i\in C:\, RL_{ip}>0,\ q\in Q} RL_{ip}\, v_{ipqj} \le COF\,\bigl(SAC_{pj} + CPU_{pj}\, m^k_{pj}\bigr) \sum_{q\in Q} y_{qj} \quad \forall\, j \in S,\ p \in P \tag{5.19}\\
&\sum_{p\in P:\, RL_{ip}>0,\ q\in Q,\ j\in S:\, SAC_{pj}>0} v_{ipqj} = 1 \quad \forall\, i \in C \tag{5.20}\\
&\sum_{i\in C,\, q\in Q,\, j\in S:\, U_j=1} EH_i\, x_{iqj} \ge UMIN \tag{5.21}\\
&\sum_{q\in Q,\, j\in S} SU_j\, x_{i_1 qj} = \sum_{q\in Q,\, j\in S} SU_j\, x_{i_2 qj} \quad \forall\, i_1 \in C^p:\ IF_{i_1} > 0,\ i_2 \in C^p:\ IF_{i_1} = IF_{i_2} \wedge i_2 > i_1 \tag{5.22}\\
&w_{ij} \le \sum_{q\in Q} x_{iqj} \quad \forall\, i \in C^p,\ j \in S \tag{5.23}\\
&\sum_{q\in Q} x_{i_1 qj} + \sum_{q\in Q} x_{i_2 qj} \ge 2\, w_{i_1 j} \quad \forall\, j \in S,\ i_1 \in C^p:\ CPM_{i_1} > 0,\ i_2 \in C^p:\ CPM_{i_1} = CPM_{i_2} \wedge i_2 > i_1 \tag{5.24}\\
&w_{i_1 j} = 0 \quad \forall\, i_1, i_2 \in C^p,\ j \in S:\ (CPM_{i_1} = 0) \vee (CPM_{i_1} > 0 \wedge CPM_{i_1} = CPM_{i_2} \wedge i_2 < i_1) \tag{5.25}\\
&\sum_{p\in P:\, RL_{ip}>0,\ q\in Q} RL_{ip}\, v_{ipqj} = PR_i \sum_{q\in Q} x_{iqj} \quad \forall\, i \in C,\ j \in S \tag{5.26}\\
&\sum_{q\in Q} y_{qj} \le 1 \quad \forall\, j \in S \tag{5.27}\\
&\sum_{\substack{i\in C,\, p\in P:\\ pmc^k_{pj}>0 \,\wedge\, RL_{ip}>0}} \frac{RL_{ip}}{npp^k_j\, pmc^k_{pj}}\, v_{ipqj} \le \bigl(UB_q - ipu^k_j\bigr)\, y_{qj} \quad \forall\, j \in S,\ q \in Q \tag{5.28}\\
&\sum_{\substack{i\in C,\, p\in P:\\ pmc^k_{pj}>0 \,\wedge\, RL_{ip}>0}} \frac{RL_{ip}}{npp^k_j\, pmc^k_{pj}}\, v_{ipqj} \ge \bigl(UB_{q-1} - ipu^k_j\bigr)\, y_{qj} \quad \forall\, j \in S,\ q \in Q:\ q > 1 \tag{5.29}\\
&y_{qj} \le \sum_{i\in C} x_{iqj} \quad \forall\, j \in S,\ q \in Q \tag{5.30}\\
&x_{iqj} \le y_{qj} \quad \forall\, i \in C,\ q \in Q,\ j \in S \tag{5.31}\\
&x_{iqj} \in \mathbb{B} \quad \forall\, i \in C,\ q \in Q,\ j \in S \tag{5.32}\\
&y_{qj} \in \mathbb{B} \quad \forall\, q \in Q,\ j \in S \tag{5.33}\\
&0 \le v_{ipqj} \le 1 \quad \forall\, i \in C,\ p \in P,\ q \in Q,\ j \in S \tag{5.34}\\
&w_{ij} \in \mathbb{B} \quad \forall\, i \in C^p,\ j \in S \tag{5.35}
\end{align}


5.2. Algorithm. In this section, we describe in detail the algorithm


we used to solve the reformulated MINLP. We will refer to the FUM and
FCM models described above and the flowchart of the algorithm in Figure
2 on page 611. This algorithm allows us to solve the reformulated MINLP
by fixing certain variables in the reformulated MINLP at each stage in the
iteration, resulting in solving only MIPs (i.e., the FCM and FUM).
5.2.1. Initialization. The first step in the algorithm in Figure 2 is to initialize parameters. We set the iteration number $k = 1$, initialize the starting FUM objective function value, $\mathrm{Min}^0_U$, to an arbitrarily large value, and set the utilization level, $y^0_{qj}$, for each facility $j \in S$. We performed testing to determine the initialization of $y^0_{qj}$ that would yield the fastest convergence. We tried the following three strategies: starting from the highest, the current, and the lowest facility utilization range. After much testing, the results demonstrated that we achieved the fastest convergence on average when using the highest facility utilization for $y^0_{qj}$; that is, we assumed that all facilities (no matter whether we sourced or never sourced any business to such a facility in the past) were in their last (i.e., highest) utilization range:
$$y^0_{qj} = 0 \quad \forall\, q \in Q: q \neq \ell,\ j \in S; \qquad y^0_{\ell j} = 1 \quad \forall\, j \in S.$$
5.2.2. Calculate input parameters for FCM. The next step in the algorithm in Figure 2 is to calculate the product program costs at all facilities based on the utilization level that was either assumed in Section 5.2.1 for each facility, or determined in Section 5.2.6 for facilities having a share of Ford business.
But first, note that if the FUM determined that facility $j$ does not have a share of Ford business, then the value $y^k_{qj} = 0\ \forall\, q \in Q$ will be passed from the FUM. We need to adjust these values to account for the non-Ford initial utilization (a Python sketch of this loop follows below):
For $j = 1, \ldots, m$
  $t = 0$
  For $q = 1, \ldots, \ell$
    $t = t + y^k_{qj}$
  End For
  If ($t = 0$) then
    For $q = 1, \ldots, \ell$
      If ($UB_{q-1} < ipu^k_j \le UB_q$) then
        $y^k_{qj} = 1$
      End If
    End For
  End If
End For
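A compact version of this adjustment (a sketch under our reading of the loop above; all names are hypothetical, and UB holds the upper bounds $UB_1, \ldots, UB_\ell$ indexed from 0 in the code):

    def reestablish_levels(y, ipu, UB, m, ell):
        # If the FUM left facility j with no active utilization level, activate
        # the level q whose range (UB[q-1], UB[q]] contains the facility's
        # initial percent utilization ipu[j]; the first level's lower bound is 0.
        for j in range(m):
            if sum(y[q][j] for q in range(ell)) == 0:
                for q in range(ell):
                    lo = UB[q - 1] if q > 0 else 0.0
                    if lo < ipu[j] <= UB[q]:
                        y[q][j] = 1
        return y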


Once this is done, we can determine the cost multiplier for each facility: if facility $j$ is in utilization level $\bar q_j$ (that is, if $y^k_{\bar q_j, j} = 1$), then we use cost multiplier $QC^k_{\bar q_j j}$ for that facility. The value of $QC^k_{\bar q_j j}$ is subsequently used to determine the unit cost for each product program:
$$UC^k_{ij} = UBC_i \cdot QC^k_{\bar q_j j} \quad \forall\, i \in C^p,\ j \in S; \qquad PC^k_{ij} = \mathrm{NPV} \cdot UC^k_{ij} \cdot \frac{VV_i}{1{,}000} \quad \forall\, i \in C^p,\ j \in S.$$
Recall that we approximate $UC^k_{ij}$ as described in Section 4.3 and calculate $PC^k_{ij}$ as defined in (3.17). Finally, we update the iteration counter (i.e., $k = k + 1$) and return to the step in Section 5.2.3.
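As a concrete illustration with the example data of Section 6 (and under the per-thousand volume scaling in the formula as we reconstruct it above): if facility $j = 1$ falls in utilization range $\bar q_1 = 8$, Table 8 gives $QC^k_{8,1} = 1.01$, so for product program $i = 1$ with $UBC_1 = \$57.33$, $\mathrm{NPV} = 3.79$, and $VV_1 = 43.3$,
$$UC^k_{1,1} = 57.33 \cdot 1.01 \approx \$57.90, \qquad PC^k_{1,1} = 3.79 \cdot 57.90 \cdot 43.3 / 1{,}000 \approx \$9.50\ \text{mln}.$$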
5.2.3. Solving the FCM. The third step in the algorithm in Figure 2 is to solve the FCM, which tells us how many units of process capacity are moved to each facility.
This MIP typically took around 10 minutes to solve. The output from this model is $\mathrm{Min}^k_C$, $m^k_{pj}$, $w_{ij}$, $x_{ij}$, and $v_{ipj}$; note that we only pass $\mathrm{Min}^k_C$ and $m^k_{pj}$, the objective function value and the process capacity moves to each facility, respectively, to the next step in the algorithm. The value of $m^k_{pj}$ will be used to determine the upper bound on process capacity at the $k$th iteration in the FUM.
5.2.4. Convergence check after FCM. The fourth step in the algorithm in Figure 2 is to check for convergence. If $\mathrm{Min}^k_C < \mathrm{Min}^{k-1}_U$ (i.e., we obtained a smaller objective function value for the FCM than we did for the FUM in the previous iteration), then the algorithm continues. If not, then the algorithm has converged.
5.2.5. Calculate input parameters for FUM. The fifth step in the algorithm in Figure 2 is to calculate the input parameters necessary for the FUM that are functions of, or dependent upon, $m^k_{pj}$. First, we determine whether any new processes were added to a facility that had no prior process of that type. In the reformulated MINLP, the binary variable $u_{pj}$ established this. However, we can now make this determination with the following check: if $IPMC_{pj} = 0 \wedge m^k_{pj} > 0$, then no prior process of that type previously existed, and one has now been added.
Next, we need to calculate $npp^k_j$; constraint (3.13) in the reformulated MINLP was previously used to determine this. We can now update $npp^k_j$ after solving the FCM by iterating over all plants $j$ and all processes $p$, incrementing $npp^k_j$ whenever $m^k_{pj} > 0$ and $IPMC_{pj} = 0$. Subsequently, we can update $pmc^k_{pj}$ and $ipu^k_j$. That is, we update these parameters as follows (a Python sketch follows below):
Let $npp^k_j = IPP_j$
For $j = 1, \ldots, m$
  For $p = 1, \ldots, r$
    $pmc^k_{pj} = IPMC_{pj} + CPU_{pj}\, m^k_{pj}$
    If $IPMC_{pj} = 0 \wedge m^k_{pj} > 0$ then
      $npp^k_j = npp^k_j + 1$
    End If
  End For
  $ipu^k_j = \Bigl(\sum_{p \in P:\, pmc^k_{pj} > 0} \frac{pmc^k_{pj} - SAC_{pj}}{pmc^k_{pj}}\Bigr) \Big/ npp^k_j$
End For
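The same update in compact form (a sketch; function and variable names are hypothetical, and the restriction of the average to processes with $pmc^k_{pj} > 0$ is our reading, consistent with (5.28)):

    def update_fum_inputs(IPP, IPMC, CPU, SAC, m_move, n_fac, n_proc):
        # Recompute npp (process-type counts), pmc (process capacities after
        # moves), and ipu (initial percent utilization) from the FCM output.
        npp = list(IPP)
        pmc = [[0.0] * n_fac for _ in range(n_proc)]
        ipu = [0.0] * n_fac
        for j in range(n_fac):
            for p in range(n_proc):
                pmc[p][j] = IPMC[p][j] + CPU[p][j] * m_move[p][j]
                if IPMC[p][j] == 0 and m_move[p][j] > 0:
                    npp[j] += 1      # a new process type was added at facility j
            terms = [(pmc[p][j] - SAC[p][j]) / pmc[p][j]
                     for p in range(n_proc) if pmc[p][j] > 0]
            ipu[j] = sum(terms) / npp[j]
        return npp, pmc, ipu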

Recall that we calculate $pmc^k_{pj}$ as defined in (3.14) and $ipu^k_j$ as defined in (3.18).
5.2.6. Solving the FUM. The sixth step in the algorithm in Figure 2 is to solve the FUM. We pass the values $\mathrm{Min}^k_C$, $m^k_{pj}$, $npp^k_j$, $pmc^k_{pj}$, and $ipu^k_j$ to the FUM. These values provide an upper bound on the process capacity at each facility, the facility's initial percent utilization, and the number of processes at each facility. Initially, this MIP took between 20 and 30 minutes to solve using the dual simplex method. However, we changed to an interior point solver and reduced the solve time by half.
The output of this model is $\mathrm{Min}^k_U$, $w^k_{ij}$, $x^k_{iqj}$, $y^k_{qj}$, and $v^k_{ipqj}$. Note that we only pass $\mathrm{Min}^k_U$ and $y^k_{qj}$, the objective function value and the utilization range for each facility, respectively, to the next step in the algorithm.
5.2.7. Convergence check after FUM. After solving the FUM, the seventh step in the algorithm in Figure 2 is to check for convergence. If $\mathrm{Min}^k_U < \mathrm{Min}^k_C$ (i.e., we obtained a smaller objective function value for the FUM than we did for the FCM in this iteration), then the algorithm continues. If not, then the algorithm has converged.
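Putting Sections 5.2.1 through 5.2.7 together, the overall scheme can be sketched as follows. This is a schematic under one reading of the two convergence tests, not the authors' implementation; every callable passed in is a placeholder for the corresponding step above:

    import math

    def solve_restructuring(init_levels, program_costs, solve_fcm,
                            update_inputs, solve_fum, max_iter=50):
        # Alternate FCM and FUM solves until the objective values meet.
        y = init_levels()                         # Section 5.2.1: highest utilization
        min_u_prev, f_star = math.inf, math.inf   # Min^0_U arbitrarily large
        for _ in range(max_iter):
            pc = program_costs(y)                 # Section 5.2.2
            min_c, m_moves = solve_fcm(pc)        # Section 5.2.3
            if not min_c < min_u_prev:            # Section 5.2.4: converged
                break
            params = update_inputs(m_moves)       # Section 5.2.5
            min_u, y = solve_fum(min_c, params)   # Section 5.2.6
            if min_u < min_c:                     # Section 5.2.7: accept and continue
                f_star = min_u
                min_u_prev = min_u
            else:                                 # objective values met: converged
                f_star = min_c
                break
        return f_star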
5.2.8. Implementation. The two MIP models, the Facility Capacity Model (Section 5.1.1) and the Facility Utilization Model (Section 5.1.2), were implemented in ILOG OPL Studio 5.2 [3], using CPLEX 10.2, with Microsoft Excel 2003 as the user interface. The tool was run on a 3.0 GHz Intel Xeon dual-processor workstation with 2 GB RAM.
5.2.9. Algorithm convergence. We will now show that our algorithm is guaranteed to converge. We introduce the following notation for clarity. We define the vector $z^k_C$ to be the solution to the FCM at the $k$th iteration and the vector $z^k_U$ to be the solution to the FUM at the $k$th iteration. We let the optimal solution value be $F^*$ and update its value after solving the FCM or FUM.
CPLEX's MIP solver produces a globally optimal solution for both the FUM and FCM at each iteration $k$; [2] provides details on convergence theory for the barrier and dual simplex methods. In addition, at the $k$th iteration, the solution for $FCM^k$ is also feasible for $FUM^k$ because $z^k_C$ satisfies the constraints of $FUM^k$ for the fixed utilization range $q$. Because $z^k_C$ is feasible for $FUM^k$, it follows that $\mathrm{Min}^k_U \le \mathrm{Min}^k_C$ because $z^k_U$ is the globally optimal solution to $FUM^k$. If $\mathrm{Min}^k_U < \mathrm{Min}^k_C$, then $F^* = \mathrm{Min}^k_U$.


That is, we only accept an updated solution for $F^*$ if the solution to the FUM is strictly less than the solution to the FCM at the $k$th iteration. Similarly, at the $(k+1)$st iteration, the solution for $FUM^k$ is also feasible for $FCM^{k+1}$ because $z^k_U$ satisfies the constraints of $FCM^{k+1}$ by the nature of the problem formulation. Because $z^k_U$ is feasible for $FCM^{k+1}$, it follows that $\mathrm{Min}^{k+1}_C \le \mathrm{Min}^k_U$ since $z^{k+1}_C$ is the globally optimal solution to $FCM^{k+1}$. If $\mathrm{Min}^{k+1}_C < \mathrm{Min}^k_U$, then $F^* = \mathrm{Min}^{k+1}_C$. Therefore, our algorithm produces a sequence of accepted objective values that is strictly monotonically decreasing. In addition, this sequence is bounded below because both $\mathrm{Min}^k_C$ and $\mathrm{Min}^k_U$ are bounded below. Therefore, our algorithm converges, and convergence occurs when $\mathrm{Min}^k_U = \mathrm{Min}^k_C = F^*$ or $\mathrm{Min}^{k+1}_C = \mathrm{Min}^k_U = F^*$. More strongly, we know that the algorithm converges finitely because the MIPs being alternately solved have discrete and finite feasible regions: the strict monotonicity implies that the algorithm must eventually stop. [9] provides details on convergence theory for monotone decreasing sequences.
6. Computational example. Throughout this paper, we have de-
scribed the size of the problem that we actually solved for the ACH interiors
restructuring. In this section, we provide a small computational example
to demonstrate the algorithm and the core decision making capabilities of
the model. We will first introduce the inputs for this example and follow
with the model runs and solution.
6.1. Inputs. First, we describe the input parameters that affect the dimension of the problem. We have four product programs ($\lambda = 4$) having a total of ten manufacturing requirements ($n = 10$) and four third-party suppliers having seven manufacturing facilities ($m = 7$) to which the product programs need to be sourced. There are thirteen potential processes ($r = 13$), although not all processes are available at every facility. We divide a facility's utilization into ten levels ($\ell = 10$).
Table 1 lists ten manufacturing requirements $C$, each characterized by a certain process requirement $PR_i$. The column $PF_i$ defines the product-
program family-numbers; there are four unique product programs. For
example, the first two manufacturing requirements correspond to the same
product program, so they must be assigned to the same manufacturing fa-
cility. The column with C p ⊂ C provides the indices for the corresponding
manufacturing requirements that represent the four unique product pro-
grams. Each product program has the following values associated with it:
NPV, VVi , EHi , UBCi , IFi , and CPMi . Note that NPV = 3.79 for ev-
ery product program. VVi is the annual production volume, EHi is the
number of required employees, and UBCi is the unit production cost. IFi ,
when non-zero, is the product program’s “intellectual property family”; in
our example, product programs 1 and 4 fall under the same intellectual
property agreement and must be assigned to the same supplier but pos-
sibly to different manufacturing facilities of that supplier. Finally, CPMi ,
when non-zero, is the product program’s “freight-family” identifier. In our


Table 1
This table lists the characteristics of four product programs C p having ten manu-
facturing requirements C.

i∈C PRi PFi Cp VVi EHi UBCi IFi CPMi


1 149 101 1 43.3 56 $57.33 105 -
2 13.082 101
3 72000 102 3 106.3 - $40.00 - 100
4 52.1 103 4 19.9 - $40.45 105 -
5 230.2 103
6 456 104 6 24.9 6 $400.59 - 100
7 125 104
8 5300 104
9 6300 104
10 8174.8 104

example, product programs 3 and 6, if assigned to the same manufacturing


facility, can be shipped for one combined cost.
Table 2 lists seven manufacturing facilities S, of which facility 4 is
a union facility. In this example, we assume that the two Ford facilities
must be closed, requiring that all product programs be sourced to the seven
listed non-Ford facilities in Table 2: we do not include the two Ford facilities
in our set S. The column Uj denotes which facilities are union facilities.
In this example, however, union considerations are set aside; that is, the
fewest number of employees required across all union facilities is UMIN = 0.
The column SUj provides the supplier identification code: there are three
different suppliers. Facilities with the same supplier code are owned by the
same supplier. For example, the last two facilities (j = 6, 7) with SUj = 108
belong to the same supplier. The facilities each presently have $IPP_j$ types of manufacturing processes; that is, each facility has $IPP_j$ processes having non-zero installed capacity, $IPMC_{pj}$ (see Table 4 for the $IPMC_{pj}$ values). We do not incur any closing cost $CC_j$ for not assigning any business to any of the seven facilities because they are all non-Ford facilities; also, all non-Ford facilities are open for business, i.e., $S^c = \emptyset$.
The underlying assumption of this example is that in the “current
state” all product programs are assigned to two company-owned facilities,
and we require all production to move to third-party suppliers. Moving
production to a different facility often requires moving of tools (e.g. dies)
at a certain one-time cost defined at the level of product programs C p . We
assume that tooling move costs for the product programs are TC1j = $0.3,
TC3j = TC4j = $0.0, and TC6j = $2.0 for every j ∈ S.
Production also requires certain manufacturing “real estate,” such as
molding machines or floor space, and there is often a choice of which man-


Table 2
This table lists the characteristics of seven manufacturing facilities S.

j∈S Uj SUj IPPj CCj


1 - 106 9 -
2 - 107 5 -
3 - 107 6 -
4 1 107 6 -
5 - 107 4 -
6 - 108 7 -
7 - 108 4 -

ufacturing process can be used for a certain manufacturing requirement.


Table 3 provides the values for RLip for each manufacturing requirement
on each process. If RLip is non-zero, then the value in the table represents
the amount of process p ∈ P needed to support a manufacturing require-
ment i ∈ C if it is fully assigned to that process. For example, if the first
requirement is fully assigned to process 2, then it consumes 149 units of that
process. Requirements can be spread over multiple processes if available;
for example, the first requirement could be split fifty-fifty between processes
3 and 4, creating a load of 74.5 on each process. Note that requirement 1
cannot be assigned to any process other than process 2, 3, or 4.

Table 3
The load RLip caused by manufacturing requirement i ∈ C if and when fully as-
signed to process p ∈ P.

p∈P
i∈C 1 2 3 4 5 6 7 8 9 10 11 12 13
1 - 149 149 149 - - - - - - - - -
2 - - - 13.1 13.1 13.1 - - - - - - -
3 - - - - - - - - - - - - 72000
4 52.1 52.1 52.1 - - - - - - - - - -
5 - - 230.2 230.2 230.2 - - - - - - - -
6 - - - - - 456 456 456 456 - - - -
7 - - - - - - - - - 125 - - -
8 - - - - - - - - - - 5300 - -
9 - - - - - - - - - - - 6300 -
10 - - - - - - - - - - - - 8174.8

We know each facility’s installed process capacity IPMCpj (see Table


4) and the portion SACpj of that capacity that is available for sourcing (see
Table 5); the difference IPMCpj − SACpj represents the capacity promised
by that facility to other parties. We assume the value of the capacity


Table 4
The total initial installed capacity IPMCpj for process p ∈ P at facility j ∈ S.

j∈S
p∈P 1 2 3 4 5 6 7
1 2,400 1,680 - - 1,440 360 480
2 120 600 - - - 480 240
3 - 960 - 120 1,320 120 240
4 1,200 - 120 - 120 240 -
5 150 - - - - 240 -
6 300 - 480 240 - 240 -
7 - - - 480 - - -
8–9 - - - - - - -
10 150 - 240 120 - - -
11 6,500 - 18,252 25,272 - - -
12 7,800 480 9,600 12,000 - - -
13 40,000 874,000 160,000 178,000 288,000 55,000 14,000

Table 5
The available capacity SACpj of process p ∈ P at facility j ∈ S.

j∈S
p∈P 1 2 3 4 5 6 7
1 1,200 359 - - 1,071 62 -
2 60 241 - - - 111 16
3 - 98 - 120 1,047 17 16
4 330 - - - 102 168 -
5 50 - - - - 46 -
6 90 - 130 25 - 69 -
7 - - - 20 - - -
8-9 - - - - - - -
10 80 - 240 - - - -
11 1,000 - 17,002 21,626 - - -
12 1,000 400 6,384 8,400 - - -
13 10,000 406,010 50,000 75,000 - 16,000 5,000

overload flex factor COF = 1.0; i.e., no overload on any of the processes is
allowed.
We have the option of moving, if needed, the process capacities from
the closed company-owned facilities to a different location, within certain
limits and at a certain cost. In this example, we assume that the maximum
number of units of capacity of process that can be moved to a facility


j ∈ S is as follows: MPMpj = 3 for processes p = 1–4, 6 and 8–11; and


MPMpj = 0 for the remaining processes p = 5, 7, 12 and 13. The capacity
per unit of process at a facility j ∈ S is defined as follows: CPUpj = 120
for processes p = 1–4, 6 and 8–10; CPUpj = 2, 880 for process p = 11; and
CPUpj = 0 for the remaining p = 5, 7, 12 and 13.
Table 6 provides the cost for moving a unit of capacity of process to
a facility. A table entry with no cost means that you cannot move the
specified process to the specified facility. Process moves can only occur in
predefined “chunks”; for example, one can only move the stamping machine
in its entirety, even if just a tiny portion of that machine's capacity would
be needed at the new location. Some capacities, such as floor space, cannot
be moved at all. In our example, the capacities of processes 5, 7, 12, and
13 are unmoveable; processes 8 and 9 presently have no capacities at any
of the facilities (as seen in Table 5) but can be moved into the facility of
our choice if needed.

Table 6
The cost MCpj of moving a unit of capacity of process p ∈ P to facility j ∈ S.

j∈S
p∈P 1 2 3 4 5 6 7
1 $0.08 $0.08 $0.08 $0.08 $0.08 $0.08 $0.08
2 $0.09 $0.09 $0.09 $0.09 $0.09 $0.09 $0.09
3 $0.10 $0.10 $0.10 $0.10 $0.10 $0.10 $0.10
4 $0.18 $0.18 $0.18 $0.18 $0.18 $0.18 $0.18
5, 7, 12, 13 - - - - - - -
6 $0.33 $0.33 $0.33 $0.33 $0.33 $0.33 $0.33
8 $0.35 $0.35 $0.35 $0.35 $0.35 $0.35 $0.35
9 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38
10 $0.13 $0.13 $0.13 $0.13 $0.13 $0.13 $0.13
11 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38 $0.38

Once produced, the goods need to be shipped from the manufacturing


facilities to the vehicle assembly plants, at a certain cost. Table 7 provides
the net present value freight cost, FCij , to ship product program i ∈ C p
from facility j ∈ S to the final vehicle assembly destination. Recall that
some parts that are shipped together get a freight discount. Those eligible
are shown in Table 1 under the column CPMi .
Finally, the unit production cost depends on a facility’s utilization. In
the first approximation, it is a hyperbolic function accounting for variable
and fixed costs; the second-degree factors account for shop floor congestion
at critical (high) utilization. The utilization curve has been modeled with
ten utilization levels q ∈ Q. For example, the second utilization level rep-
resents facility utilization range from UB1 = 10% to UB2 = 20%. Table 8


displays the values for the utilization-dependent cost multipliers QCqj for
each facility j ∈ S in all of the ranges q ∈ Q.

Table 7
Freight cost FCij , in $mln, to ship product program i ∈ C p from facility j ∈ S to
the final vehicle assembly destination. This is the net present value cost over the life of
the project, calculated as the annual freight cost multiplied by the NPV factor.

j∈S
i∈C p
1 2 3 4 5 6 7
1 0.190 0.476 1.198 0.941 0.654 0.667 0.689
3 648.772 1,031.588 886.521 1,571.560 224.048 1,161.343 224.048
4 0.324 0.322 0.627 0.067 0.453 0.443 0.464
6 96.819 193.072 259.753 322.666 95.312 224.905 106.048

Table 8
This table shows the utilization-dependent cost multiplier QCqj for each facility j ∈ S.

j∈S
q∈Q UBq 1 2 3 4 5 6 7
1 0.1 1.85 2.57 1.85 1.85 2.57 2.57 1.85
2 0.2 1.4 1.76 1.4 1.4 1.76 1.76 1.4
3 0.3 1.17 1.35 1.17 1.17 1.35 1.35 1.17
4 0.4 1.1 1.22 1.1 1.1 1.22 1.22 1.1
5 0.5 1.061 1.15 1.061 1.061 1.15 1.15 1.061
6 0.6 1.04 1.111 1.04 1.04 1.111 1.111 1.04
7 0.7 1.0201 1.08 1.0201 1.0201 1.08 1.08 1.0201
8 0.8 1.01 1.05 1.01 1.01 1.05 1.05 1.01
9 0.9 1 1.04 1 1 1.04 1.04 1
10 1.0 1.00001 1.04001 1.00001 1.00001 1.04001 1.04001 1.00001

6.2. Results. We now provide results for the computational example


described above, using the algorithm described in Section 5.2.
Iteration 1: Facility capacity model. We first initialize the FUM objective function value, $\mathrm{Min}^0_U$, to an arbitrarily large value and the utilization levels $y^0_{qj}$ with the "highest" facility utilization range; that is, we assumed that all facilities were in their maximum utilization range ($q = 10$). Note that by setting all facilities to be at their highest utilization in the first iteration, we are potentially giving them a more beneficial cost multiplier $QC_{qj}$ than they actually deserve. For example, looking at Table 8, if facility 1 is really 25% utilized, then $q = 3$ and $QC_{3,1} = 1.17$, but if we initialize with $q = 10$, then $QC_{10,1} = 1.00001$.


This FCM problem had 124 variables and 71 constraints after the
pre-solve. After running the FCM, the manufacturing requirement facility
sourcing decisions are in Table 9, the manufacturing process sourcing deci-
sions are in Table 10, and the objective function value Min1C = 1362.615864.
Also after running the FCM, we get that the units of capacity to move
m6,3 = 3 (increases the capacity of process 6 at facility 3 from 130 to 490),
and mpj = 0 for all other p ∈ P and j ∈ S. Note from Table 3 that
manufacturing requirement 6 can be run on process 6, 7, 8, or 9 and has a
required load of 456. However, we can see in Table 5 that no processes of
type 6, 7, 8, or 9 at any of the facilities has enough available capacity to
meet the requirement for manufacturing requirement 6. So, the problem
would be infeasible if we were not allowed to move units of capacity into
a facility. Although the moving costs for the different processes 6, 8, and
9 are the same for facilities 1, 3, 4, and 6 according to Table 6 (process 7
cannot be moved), adding process capacity of type 6 at facilities 1, 4, and
6 and process capacity of type 8 and 9 at all facilities would require moving
4 units of process capacity: moving process capacity of type 6 to facility 3
only requires moving 3 units of process capacity.

Table 9
This table shows the output value xij specifying whether or not manufacturing
requirement i ∈ C is produced by facility j ∈ S. These output values are equivalent for
the first and the second executions of both the FCM and the FUM.

j∈S
i∈C 1 2 3 4 5 6 7

1–2 1 - - - - - -
3 - 1 - - - - -
4–5 1 - - - - - -
6–10 - - 1 - - - -

Since we are solving the FCM in the first iteration, and $\mathrm{Min}^0_U$ is set to some arbitrarily large number, we do not need to check for convergence. We proceed to the next step, where we calculate the values for $npp^1_j$ as defined in the algorithm section, $pmc^1_{pj}$ as defined in (3.14), and $ipu^1_j$ as defined in (3.18). Note that although we move $m_{6,3} = 3$ units of capacity of process $p = 6$ to facility $j = 3$, some units of capacity of process six were already available at facility three, so all the values of $npp^1_j$ as defined in Table 2 stay the same. We update the other two values for the affected facility $j = 3$: $pmc^1_{6,3} = 480 + 3 \cdot 120 = 840$, and
$$ipu^1_3 = \left(\tfrac{120-0}{120} + \tfrac{840-130}{840} + \tfrac{240-240}{240} + \tfrac{18252-17002}{18252} + \tfrac{9600-6384}{9600} + \tfrac{160000-50000}{160000}\right)\big/\,6 = 0.7680378.$$


Table 10
This table shows the output from the first and the second executions of the FCM: the actual load $\sum_{j \in S,\, q \in Q} RL_{ip}\, v_{ipqj}$ of manufacturing requirement $i \in C$ on process $p \in P$.

p∈P
i∈C 1 2 3 4 5 6 7 8 9 10 11 12 13
1 - - - 149 - - - - - - - - -
2 - - - - - 13.1 - - - - - - -
3 - - - - - - - - - - - - 72,000.0
4 52.1 - - - - - - - - - - - -
5 - - - 181 49.2 - - - - - - - -
6 - - - - - 456 - - - - - - -
7 - - - - - - - - - 125 - - -
8 - - - - - - - - - - 5300 - -
9 - - - - - - - - - - - 6300 -
10 - - - - - - - - - - - - 8,174.8

Iteration 1: Facility utilization model. The output from iteration 1 of the FCM was then used as the input to the FUM; so, in this run, facility 3's available capacity of process 6 was increased by three units.
This FUM problem had 151 variables and 113 constraints after the pre-solve. After running the FUM, we get the following outputs: the objective function value, the facility utilizations, and the manufacturing requirement sourcing decisions regarding facilities and processes. The objective function value is $\mathrm{Min}^1_U = 1364.143179$. Note that the objective function value for the FUM in iteration 1 is greater than the objective function value for the FCM in iteration 1 (i.e., $\mathrm{Min}^1_U > \mathrm{Min}^1_C$). This is because the FCM got a better utilization cost multiplier than was actually feasible, while the utilization level constraints were enforced in the FUM, giving the accurate, yet higher, utilization cost multiplier.
The manufacturing requirements are sourced to the same facilities as in iteration 1 of the FCM (see Table 9). The facilities have the following utilization levels: facility 1 is utilized at 70–80% ($y_{8,1} = 1$); facilities 2 and 3 are utilized at 60–70% ($y_{7,2} = y_{7,3} = 1$); and all the remaining facility utilization levels, $y_{qj}$, are zero. Recall that the utilization level is only defined for the facilities to which we source a share of our own business. Hence, we use the post-processing technique in Section 5.2.2 to re-establish the utilization levels of the non-Ford-utilized facilities $4, \ldots, 7$ to be the same as their initial utilization levels (see Table 12); this results in $y_{\bar q_j, j} = 1$.
The manufacturing requirement process sourcing decisions are in Ta-
ble 11. It is instructive to compare these results against the required loads
RLip listed in Table 3. For example, the manufacturing requirement num-


ber 1 (requiring a load of 149 on any one or several of the processes 2, 3, or 4) has been split between two processes: a load of 17.9 on process 2 and a load of 131.1 on process 4. Note that the FUM tends to source one manufacturing requirement to multiple processes if cost effective, because a facility's utilization is actually an average of the process utilizations at that facility.
Before proceeding to the second iteration, we apply the method in Section 5.2.2 to update the values of the cost multipliers $QC^k_{\bar q_j j}$ (see Table 12), the requirement costs $UC^k_{ij}$, and the product program costs $PC^k_{ij}$.

Table 11
This table shows the output from the first execution of the FUM: the actual load $\sum_{j \in S,\, q \in Q} RL_{ip}\, v_{ipqj}$ of manufacturing requirement $i \in C$ on process $p \in P$.

p∈P
i∈C 1 2 3 4 5 6 7 8 9 10 11 12 13
1 - 17.9 - 131.1 - - - - - - - - -
2 - - - - - 13.1 - - - - - - -
3 - - - - - - - - - - - - 72000
4 52.1 - - - - - - - - - - - -
5 - - - 180.2 50 - - - - - - - -
6 - - - - - 456 - - - - - - -
7 - - - - - - - - - 125 - - -
8 - - - - - - - - - - 5300 - -
9 - - - - - - - - - - - 6300 -
10 - - - - - - - - - - - - 8174.8

Table 12
This table shows the utilization ranges $\bar q_j$ for all facilities, as calculated by the first execution of the FUM (for Ford-utilized facilities $1, \ldots, 3$) and enhanced by the post-processing logic of Section 5.2.2 (for non-Ford-utilized facilities $4, \ldots, 7$). The corresponding cost multipliers $QC^k_{\bar q_j j}$ are then passed to the second execution of the FCM.

j∈S
1 2 3 4 5 6 7
$\bar q_j$ 8 7 7 5 5 8 9
$QC^k_{\bar q_j j}$ 1.01 1.08 1.0201 1.061 1.15 1.05 1

Iteration 2: Facility capacity model. In the second iteration, we use the utilization levels from the FUM in the previous iteration as inputs. The objective function value from the run of this FCM is $\mathrm{Min}^2_C = 1364.143179$, which is the same as the objective function value of the FUM in the first iteration. Moreover, the manufacturing requirement

facility sourcing decisions and the manufacturing process sourcing decisions are the same as in the run of the first FCM; see Tables 9 and 10, respectively. In the FCM, the overall facility has the same utilization as in the FUM, but the average process utilization between the two solutions is different. Since the objective function value from the run of this second FCM is the same as the objective value from the run of the first FUM, the algorithm has converged.

7. Summary. In this paper, we presented a real world problem that


uses operations research techniques to aid in restructuring Ford’s interiors
business in a distressed supplier environment. We first gave an overview
of the business problem and the importance behind using a data-driven
approach to arrive at sourcing decisions in an area of strategic importance
to the company.
The underlying model that described the business problem of identi-
fying feasible allocations of 229 manufacturing operations for 42 product
programs at more than 50 different supplier sites is a large scale MINLP.
We described how we reformulated the large scale MINLP by linearizing
the objective function through introduction of discrete utilization levels.
We then decoupled the discrete MINLP into two easily solved MIPs for
which solution techniques are well known, and we developed an algorithm
that iteratively solved the MIPs until convergence. We provided a decision
support system that allowed the users to easily perform scenario and sen-
sitivity analysis. This feature was especially important in practice, since
considerations other than cost (e.g. union relations, intellectual property,
concentration of sourcing) may play a part in evaluating alternative sourc-
ing decisions.
As a result of the availability of the new tool, a new process was de-
veloped by which ACH and Purchasing identified optimal and next-best
suppliers for each product program in the portfolio, allowing targeted Re-
quests For Quotes (RFQ) to these top suppliers. This practice represents
a major departure from standard procedures (typically all suppliers would
be sent RFQs).
In the final approved strategy (May 2007), based on the results of the tool output and the resulting targeted market tests, 39 of 42 product programs agreed completely with the optimal recommendation produced by the tool. Perhaps most importantly, the tool's output was used as a direct basis for securing $45 million in restructuring funding to achieve the recommended sourcing pattern, thereby saving the company approximately $50–55 million in additional costs over a five-year period.
Although the tool was developed for the problem of re-sourcing plastic
interiors, it can easily be adapted to address similar strategic sourcing
issues for other classes of commodities. We provided a small computational
example to illustrate the use and functionality of the algorithm and solution
approach.


Acknowledgements. We would like to acknowledge the key involve-


ment of our business partners Chip McDaniel, formerly of Ford, and Mike
Wolcott of ACH for posing this problem, helping shape the decisions of
the model, painstakingly gathering all of the data, and actively engaging
in testing and validation of the model.

REFERENCES

[1] C. Floudas, Nonlinear and Mixed-Integer Optimization: Fundamentals and Applications, Oxford University Press, 1995.
[2] J. Nocedal and S. Wright, Numerical Optimization, Springer, New York, NY, 1999.
[3] ILOG, ILOG OPL-CPLEX Development System, http://www.ilog.com/products/oplstudio/, 2008.
[4] Ford Motor Company, Ford Accelerates "Way Forward," http://media.ford.com/article_display.cfm?article_id=24261, 2006. Retrieved June 23, 2009.
[5] I. Nowak, Relaxation and Decomposition Methods for Mixed Integer Nonlinear Programming, Birkhäuser Verlag, 2005.
[6] L. Wolsey, Integer Programming, John Wiley & Sons, Inc., 1998.
[7] E. Klampfl, Y. Fradkin, C. McDaniel, and M. Wolcott, Ford optimizes urgent sourcing decisions in a distressed supplier environment, Interfaces, Special Section: The Daniel H. Wagner Prize for Excellence in Operations Research Practice (J.H. Discenza, ed.), September–October 2009.
[8] R. Rardin, Optimization in Operations Research, Prentice Hall, Upper Saddle River, NJ, 2000.
[9] J. Deshpande, Mathematical Analysis and Applications: An Introduction, Alpha Science Int'l Ltd., Harrow, UK, 2004.
[10] S.S. Carty, Many auto parts suppliers failed to widen base, USA Today, March 27, 2008, http://www.usatoday.com/money/industries/manufacturing/2008-03-26-auto-parts-suppliers_N.htm. Retrieved November 20, 2008.
[11] AutomotiveWorld.com, US: Blue Water files for bankruptcy, February 14, 2008, http://www.automotiveworld.com/news/powertrain/66356-us-blue-water-files-for-bankruptcy. Retrieved November 20, 2008.

A BENCHMARK LIBRARY OF MIXED-INTEGER
OPTIMAL CONTROL PROBLEMS
SEBASTIAN SAGER∗

Abstract. Numerical algorithm developers need standardized test instances for


empirical studies and proofs of concept. There are several libraries available for finite-
dimensional optimization, such as the netlib or the miplib. However, for mixed-integer
optimal control problems (MIOCP) this is not yet the case. One explanation for this
is the fact that no dominant standard format has been established yet. In many cases
instances are used in a discretized form, but without proper descriptions of the modeling
assumptions and discretizations that have been applied. In many publications crucial
values, such as initial values, parameters, or a concise definition of all constraints are
missing.
In this contribution we intend to establish the basis for a benchmark library of
mixed-integer optimal control problems that is meant to be continuously extended online
on the open community web page http://mintoc.de. The guiding principles will be
comprehensiveness, a detailed description of where a model comes from and what the
underlying assumptions are, a clear distinction between problem and method description
(such as a discretization in space or time), reproducibility of solutions and a standardized
problem formulation. Also, the problems will be classified according to model and
solution characteristics. We do not benchmark MIOCP solvers, but provide a library
infrastructure and sample problems as a basis for future studies.
A second objective is to formulate mixed-integer nonlinear programs (MINLPs) orig-
inating from these MIOCPs. The snag is of course that we need to apply one out of
several possible method-specific discretizations in time and space in the first place to
obtain a MINLP. Yet the resulting MINLPs originating from control problems with an
indication of the currently best known solution are hopefully a valuable test set for de-
velopers of generic MINLP solvers. The problem specifications can also be downloaded
from http://mintoc.de.

AMS(MOS) subject classifications. Primary 1234, 5678, 9101112.

1. Introduction. For empirical studies and proofs of concept, devel-


opers of optimization algorithms need standardized test instances. There
are several libraries available, such as the netlib for linear programming
(LP) [4], the Schittkowski library for nonlinear programming (NLP) [59],
the miplib [44] for mixed-integer linear programming (MILP), or more
recently the MINLPLib [13] and the CMU-IBM Cyber-Infrastructure for mixed-integer nonlinear programming (MINLP) collaborative site [15].
Further test libraries and related links can be found on [12], a comprehen-
sive testing environment is CUTEr [28]. The solution of these problems
with different solvers is facilitated by the fact that standard formats such
as the standard input format (SIF) or the Mathematical Programming Sys-
tem format (MPS) have been defined.
Collections of optimal control problems (OCPs) in ordinary differential
equations (ODE) and in differential algebraic equations (DAE) have also

∗ Interdisciplinary Center for Scientific Computing, University of Heidelberg, 69120

Heidelberg, Germany.


been set up. The PROPT (a MATLAB toolkit for dynamic optimization using collocation) homepage lists over 100 test cases from different applications with their results and computation times [32]. The software package dsoa [20] currently comes with 77 test problems. The ESA provides a test set of
solutions [3].
This is a good starting point. However, no standard has evolved yet
as in the case of finite-dimensional optimization. The specific formats for
which only few optimization / optimal control codes have an interface,
insufficient information on the modeling assumptions, or missing initial
values, parameters, or a concise definition of all constraints make a transfer
to different solvers and environments very cumbersome. The same is true
for hybrid systems, which incorporate MIOCPs as defined in this paper as
a special case. Two benchmark problems have been defined at [19].
Although a general open library would be highly desirable for opti-
mal control problems, we restrict ourselves here to the case of MIOCPs, in
which some or all of the control values and functions need to take values
from a finite set. MIOCPs are of course more general than OCPs as they
include OCPs as a special case, however the focus in this library will be
on integer aspects. We want to be general in our formulation, without
becoming too abstract. It will allow to incorporate ordinary and partial
differential equations, as well as algebraic constraints. Most hybrid systems
can be formulated by means of state-dependent switches. Closed-loop con-
trol problems are on a different level, because a unique and comparable
scenario would include well-defined external disturbances. We try to leave
our approach open to future extensions to nonlinear model predictive con-
trol (NMPC) problems, but do not incorporate them yet. The formulation
allows for different kinds of objective functions, e.g., time minimal or of
tracking type, and of boundary constraints, e.g., periodicity constraints.
Abstract problem formulations, together with a proposed categorization of
problems according to model, objective, and solution characteristics will
be given in Section 2.
MIOCPs include features related to different mathematical disciplines.
Hence, it is not surprising that very different approaches have been pro-
posed to analyze and solve them. There are three generic approaches to
solve model-based optimal control problems, compare [8]: first, solution of the Hamilton-Jacobi-Bellman equation and, in a discrete setting, Dynamic Programming; second, indirect methods, also known as the first optimize, then discretize approach; and third, direct methods (first discretize, then optimize), in particular all-at-once approaches that solve the simulation and the optimization task simultaneously.
ditional combinatorial restrictions on control functions comes at different
levels: for free in dynamic programming, as the control space is evaluated
anyhow, by means of an enumeration in the inner optimization problem of


the necessary conditions of optimality in Pontryagin’s maximum principle,


or by various methods from integer programming in the direct methods.
Even in the case of direct methods, there are multiple alternatives to
proceed. Various approaches have been proposed to discretize the differ-
ential equations by means of shooting methods or collocation, e.g., [10, 7],
to use global optimization methods by under- and overestimators, e.g.,
[18, 48, 14], to optimize the time-points for a given switching structure,
e.g., [36, 26, 58], to consider a static optimization problem instead of the
transient behavior, e.g., [30], to approximate nonlinearities by piecewise-
linear functions, e.g., [45], or by approximating the combinatorial decisions
by continuous formulations, as in [11] for drinking water networks. Also
problem (re)formulations play an important role, e.g., outer convexifica-
tion of nonlinear MIOCPs [58], the modeling of MPECs and MPCCs [6, 5],
or mixed-logic problem formulations leading to disjunctive programming,
[50, 29, 47].
We do not want to discuss reformulations, solvers, or methods in detail,
but rather refer to [58, 54, 29, 5, 47, 50] for more comprehensive surveys
and further references. The main purpose of mentioning them is to point
out that they all discretize the optimization problem in function space in a
different manner, and hence result in different mathematical problems that
are actually solved on a computer.
We have two objectives. First, we intend to establish the basis for
a benchmark library of mixed-integer optimal control problems that is
meant to be continuously extended online on the open community web page
http://mintoc.de. The guiding principles will be comprehensiveness, a
detailed description of where a model comes from and what the underlying
assumptions are, a clear distinction between problem and method descrip-
tion (such as a discretization in space or time), reproducibility of solutions
and a standardized problem formulation that allows for an easy transfer,
once a method for discretization has been specified, to formats such as
AMPL or GAMS. Also, the problems will be classified according to model and
solution characteristics.
Although the focus of this paper is on formulating MIOCPs before any
irreversible reformulation and numerical solution strategy has been applied,
a second objective is to provide specific MINLP formulations as benchmarks
for developers of MINLP solvers. Powerful commercial MILP solvers and
advances in MINLP solvers as described in the other contributions to this
book make the usage of general purpose MILP/MINLP solvers more and
more attractive. Please be aware however that the MINLP formulations we
provide in this paper are only one out of many possible ways to formulate
the underlying MIOCP problems.
In Section 2 a classification of problems is proposed. Sections 3 to
11 describe the respective control problems and currently best known so-
lutions. In Section 12 two specific MINLP formulations are presented for
illustration. Section 13 gives a conclusion and an outlook.


2. Classifications. The MIOCPs in our benchmark library have dif-


ferent characteristics. In this section we describe these general character-
istics, so we can simply list them later on where appropriate. Beside its
origins from application fields such as mechanical engineering, aeronautics,
transport, systems biology, chemical engineering and the like, we propose
three levels to characterize a control problem. First, characteristics of the
model from a mathematical point of view, second the formulation of the
optimization problem, and third characteristics of an optimal solution from
a control theory point of view. We will address these three in the following
subsections.
Although we strive for a standardized problem formulation, we do not
formulate a specific generic formulation as such. Such a formulation is not
even agreed upon for PDEs, let alone the possible extensions in the direc-
tion of algebraic variables, network topologies, logical connections, multi-
stage processes, MPEC constraints, multiple objectives, functions including
higher-order derivatives and much more that might come in. Therefore we
chose to start with a very abstract formulation, formulate every control
problem in its specific way as is adequate and to connect the two by us-
ing a characterization. On the most abstract level, we want to solve an
optimization problem that can be written as

\begin{align}
\min_{x, u, v} \quad & \Phi[x, u, v] \nonumber \\
\text{s.t.} \quad & 0 = F[x, u, v], \nonumber \\
& 0 \le C[x, u, v], \tag{2.1} \\
& 0 = \Gamma[x]. \nonumber
\end{align}

Here $x(\cdot): \mathbb{R}^d \mapsto \mathbb{R}^{n_x}$ denotes the differential-algebraic states$^1$ in a $d$-dimensional space. Until now, for most applications we have $d = 1$ and the independent variable time $t \in [t_0, t_f]$, the case of ordinary or algebraic differential equations. $u(\cdot): \mathbb{R}^d \mapsto \mathbb{R}^{n_u}$ and $v(\cdot): \mathbb{R}^d \mapsto \Omega$ are controls, where $u(\cdot)$ are continuous values that map to $\mathbb{R}^{n_u}$, and $v(\cdot)$ are controls that map to a finite set $\Omega$. We allow also control values that are constant in time or constant in space, rather than only distributed controls.
We will also use the term integer control for $v(\cdot)$, while binary control refers to $\omega(t) \in \{0, 1\}^{n_\omega}$, which will be introduced later. We use the expression relaxed whenever a restriction $v(\cdot) \in \Omega$ is relaxed to a convex control set, which is typically the convex hull, $v(\cdot) \in \mathrm{conv}\,\Omega$.
Basically two different kinds of switching events are at the origin of
hybrid systems, controllable and state-dependent ones. The first kind is
due to degrees of freedom for the optimization, in particular with controls
that may only take values from a finite set. The second kind is due to
1 Note that we use the notation common in control theory with x as differential states

and u as controls, not the PDE formulation with x as independent variable and u as
differential states.


state-dependent switches in the model equations, e.g., ground contact of


a robot leg or overflow of weirs in a distillation column. The focus in the
benchmark library is on the first kind of switches, whereas the second one
is of course important for a classification of the model equations, as for
certain MIOCPs both kinds occur.
The model equations are described by the functional F [·], to be spec-
ified in Section 2.1. The objective functional Φ[·], the constraints C[·]
that may include control- and path-constraints, and the interior point con-
straints Γ[x] that specify also the boundary conditions are classified in
Section 2.2. In Section 2.3 characteristics of an optimal solution from a
control theory point of view are listed.
The formulation of optimization problems is typically not unique.
Sometimes, as in the case of MPEC reformulations of state-dependent
switches [5], disjunctive programming [29], or outer convexification [58],
reformulations may be seen as part of the solution approach in the sense of
the modeling for optimization paradigm [47]. Even in obvious cases, such
as a Mayer term versus a Lagrange term formulation, they may be math-
ematically, but not necessarily algorithmically equivalent. We propose to
use either the original or the most adequate formulation of the optimization
problem and list possible reformulations as variants.

2.1. Model classification. This Section addresses possible realiza-


tions of the state equation

0 = F [x, u, v]. (2.2)

We assume throughout that the differential-algebraic states x are uniquely


determined for appropriate boundary conditions and fixed (u, v).

2.1.1. ODE model. This category includes all problems constrained


by the solution of explicit ordinary differential equations (ODE). In par-
ticular, no algebraic variables are present, and derivatives are taken with respect to only one independent variable (typically time).
Equation (2.2) reads as

$$\dot{x}(t) = f(x(t), u(t), v(t)), \tag{2.3}$$
which holds for $t \in [t_0, t_f]$ almost everywhere. We will often leave the argument $(t)$
away for notational convenience.

2.1.2. DAE model. If the model includes algebraic constraints and


variables, for example from conservation laws, a problem will be categorized
as a DAE model. Equality (2.2) will then include both differential equations
and algebraic constraints that determine the algebraic states in dependence
of the differential states and the controls. A more detailed classification
includes the index of the algebraic equations.


2.1.3. PDE model. If d > 1 the model equation (2.2) becomes a par-
tial differential equation (PDE). Depending on whether convection or dif-
fusion prevails, a further classification into hyperbolic, elliptic, or parabolic
equations is necessary. A more elaborate classification will evolve as more
PDE constrained MIOCPs are described on http://mintoc.de. In this
work one PDE-based instance is presented in Section 11.
2.1.4. Outer convexification. For time-dependent and space-independent integer controls, often another formulation is beneficial, e.g., [37]. For every element $v^i$ of $\Omega$ a binary control function $\omega_i(\cdot)$ is introduced. Equation (2.2) can then be written as
$$0 = \sum_{i=1}^{n_\omega} F[x, u, v^i]\, \omega_i(t), \quad t \in [0, t_f]. \tag{2.4}$$
If we impose the special ordered set type one condition
$$\sum_{i=1}^{n_\omega} \omega_i(t) = 1, \quad t \in [0, t_f], \tag{2.5}$$
there is a bijection between every feasible integer function $v(\cdot) \in \Omega$ and an appropriately chosen binary function $\omega(\cdot) \in \{0, 1\}^{n_\omega}$, compare [58]. The relaxation of $\omega(t) \in \{0, 1\}^{n_\omega}$ is given by $\omega(t) \in [0, 1]^{n_\omega}$. We will refer to (2.4) and (2.5) as outer convexification of (2.2). This characteristic applies to the control problems in Sections 3, 6, 9, 10, and 11.
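As a small illustration (a toy example of ours, not one of the library problems): for $\Omega = \{v^1, v^2\}$ and dynamics $\dot{x} = f(x, u, v)$, outer convexification yields
$$\dot{x}(t) = \omega_1(t)\, f(x, u, v^1) + \omega_2(t)\, f(x, u, v^2), \qquad \omega_1(t) + \omega_2(t) = 1, \qquad \omega(t) \in \{0, 1\}^2,$$
and the relaxation $\omega(t) \in [0, 1]^2$ is a purely continuous optimal control problem in which the former integer control enters affinely.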
2.1.5. State-dependent switches. Many processes are modelled by
means of state-dependent switches that indicate, e.g., model changes due to
a sudden ground contact of a foot or a weir overflow in a chemical process.
Mathematically, we write

0 = Fi [x, u, v] if σi (x(t)) ≥ 0 (2.6)

with well defined switching functions σi (·) for t ∈ [0, tf ]. This characteristic
applies to the control problems in Sections 6 and 8.
2.1.6. Boolean variables. Discrete switching events can also be expressed by means of Boolean variables and logical implications. E.g., by introducing logical functions $\delta_i: [0, t_f] \mapsto \{\text{true}, \text{false}\}$ that indicate whether a model formulation $F_i[x, u, v]$ is active at time $t$, both state-dependent switches and outer convexification formulations may be written as disjunctive programs, i.e., optimization problems involving Boolean variables and logical conditions. Using disjunctive programs can be seen as a more natural way of modeling discrete events and has the main advantage of resulting in tighter relaxations of the discrete decisions, when compared to integer programming techniques. More details can be found in [29, 46, 47].
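Schematically (a sketch in the spirit of generalized disjunctive programming, compare [29], not a formulation taken from the library), the model alternatives are coupled through a disjunction
$$\bigvee_{i} \begin{bmatrix} \delta_i(t) \\ 0 = F_i[x, u, v] \end{bmatrix}, \qquad \Omega(\delta) = \text{true},$$
where $\Omega(\delta)$ collects the logical conditions among the $\delta_i$, e.g., that exactly one model formulation be active at each time $t$.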


2.1.7. Multistage processes. Processes of interest are often mod-


elled as multistage processes. At transition times the model can change,
sometimes in connection with a state-dependent switch. The equations
read as

0 = Fi [x, u, v] t ∈ [ti , ti+1 ] (2.7)

on a time grid {ti }i . With smooth transfer functions also changes in the
dimension of optimization variables can be incorporated, [43].

2.1.8. Unstable dynamics. For numerical reasons it is interesting


to keep track of instabilities in process models. As small changes in inputs
lead to large changes in outputs, challenges for optimization methods arise.
This characteristic applies to the control problems in Sections 3 and 7.

2.1.9. Network topology. Complex processes often involve an un-


derlying network topology, such as in the control of gas or water networks
[45, 11]. The arising structures should be exploited by efficient algorithms.

2.2. Classification of the optimization problem. The optimiza-


tion problem (2.1) is described by means of an objective functional Φ[·] and
inequality constraints C[·] and equality constraints Γ[·]. The constraints
come in form of multipoint constraints that are defined on a time grid
t0 ≤ t1 ≤ · · · ≤ tm = tf , and of path-constraints that need to hold almost
everywhere on the time horizon. The equality constraints Γ[·] will often fix
the initial values or impose a periodicity constraint. In this classification
we assume all functions to be sufficiently often differentiable.
In the future, the classification will also include problems with non-
differentiable objective functions, multiple objectives, online control tasks
including feedback, indication of nonconvexities, and more characteristics
that allow for a specific choice of test instances.

2.2.1. Minimum time. This category comprises all control problems that seek time-optimal solutions, e.g., reaching a certain goal or completing a certain process as fast as possible. The objective function is of Mayer type, Φ[·] = t_f. This characteristic applies to the control problems in Sections 3, 9, and 10.

2.2.2. Minimum energy. This category comprises all control problems that seek energy-optimal solutions, e.g., reaching a certain goal or completing a certain process with a minimum amount of energy. The objective function is of Lagrange type and sometimes proportional to a minimization of the squared control (e.g., acceleration) u(·), e.g.,

    Φ[·] = ∫_{t_0}^{t_f} u^2(t) dt.

Almost always an upper bound on the free end time t_f needs to be specified. This characteristic applies to the control problems in Sections 6 and 8.


2.2.3. Tracking problem. This category lists all control problems in which a tracking type Lagrange functional of the form

    Φ[·] = ∫_{t_0}^{t_f} ||x(τ) − x_ref||_2^2 dτ                          (2.8)

is to be minimized. This characteristic applies to the control problems in Sections 4, 5, and 7.
2.2.4. Periodic processes. This is a category with all control problems that seek periodic solutions, i.e., a condition of the kind

    Γ[x] = P(x(t_f)) − x(t_0) = 0                                         (2.9)

has to hold. P(·) is an operation that allows, e.g., for a perturbation of states (such as needed for the formulation of Simulated Moving Bed processes, Section 11, or for offsets of angles by a multiple of 2π such as in driving on closed tracks, Section 10). This characteristic applies to the control problems in Sections 8, 10, and 11.
2.2.5. Equilibrium constraints. This category contains mathematical programs with equilibrium constraints (MPECs). An MPEC is an optimization problem constrained by a variational inequality, which takes for generic variables/functions y_1, y_2 the following general form:

    min_{y_1, y_2}  Φ(y_1, y_2)
    s.t.  0 = F(y_1, y_2),
          0 ≤ C(y_1, y_2),                                                (2.10)
          0 ≤ (μ − y_2)^T φ(y_1, y_2),   y_2 ∈ Y(y_1),   ∀ μ ∈ Y(y_1),

where Y(y_1) is the feasible region for the variational inequality and φ(·) a given function. Variational inequalities arise in many domains and are generally referred to as equilibrium constraints. The variables y_1 and y_2 may be controls or states.
2.2.6. Complementarity constraints. This category contains optimization problems with complementarity constraints (MPCCs), for generic variables/functions y_1, y_2, y_3 in the form of

    min_{y_1, y_2, y_3}  Φ(y_1, y_2, y_3)
    s.t.  0 = F(y_1, y_2, y_3),                                           (2.11)
          0 ≤ C(y_1, y_2, y_3),
          0 ≤ y_1 ⊥ y_2 ≥ 0.

The complementarity operator ⊥ implies the disjunctive behavior

    y_{1,i} = 0  OR  y_{2,i} = 0,   ∀ i = 1 . . . n_y.
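Componentwise, 0 ≤ y_1 ⊥ y_2 ≥ 0 means y_{1,i} ≥ 0, y_{2,i} ≥ 0, and y_{1,i} y_{2,i} = 0. A common smooth reformulation — standard in the MPCC literature rather than specific to this collection — uses the Fischer–Burmeister function:

    φ(a, b) := √(a^2 + b^2) − a − b,   φ(y_{1,i}, y_{2,i}) = 0  ⇔  y_{1,i} ≥ 0, y_{2,i} ≥ 0, y_{1,i} y_{2,i} = 0.

Either form violates standard constraint qualifications at solutions, which is why regularizations or relaxations are typically applied in practice.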


MPCCs may arise from a reformulation of a bilevel optimization problem by writing the optimality conditions of the inner problem as variational constraints of the outer optimization problem, or from a special treatment of state-dependent switches, [5]. Note that all MPCCs can be reformulated as MPECs.
2.2.7. Vanishing constraints. This category contains mathematical programs with vanishing constraints (MPVCs). The problem

    min_y  Φ(y)
    s.t.  0 ≥ g_i(y) h_i(y),   i ∈ {1, . . . , m},                        (2.12)
          0 ≤ h(y)

with smooth functions g, h : R^{n_y} ↦ R^m is called an MPVC. Note that every MPVC can be transformed into an MPEC [2, 33]. Examples of vanishing constraints are engine speed constraints that are only active if the corresponding gear control is nonzero. This characteristic applies to the control problems in Sections 9 and 10.
2.3. Solution classification. The classification that we propose for
switching decisions is based on insight from Pontryagin’s maximum princi-
ple, [49], applied here only to the relaxation of the binary control functions
ω(·), denoted by α(·) ∈ [0, 1]nω . In the analysis of linear control problems
one distinguishes three cases: bang-bang arcs, sensitivity-seeking arcs, and
path-constrained arcs, [61], where an arc is defined to be a nonzero time-
interval. Of course a problem’s solution can show two or even all three
behaviors at once on different time arcs.
2.3.1. Bang-bang arcs. Bang-bang arcs are time intervals on which the control bounds are active, i.e., α_i(t) ∈ {0, 1} ∀ t. The case where the optimal solution contains only bang-bang arcs is in a sense the easiest. The solution of the relaxed MIOCP will be integer feasible if the control discretization grid is a superset of the switching points of the optimal control. Hence, the main goal will be to adapt the control discretization grid such that the solution of the relaxed problem is already integer. Also on fixed time grids good solutions are easy to obtain, as rounded solutions approximate the relaxed solution very well in the sense of a small integrated difference between relaxed and binary control; one such rounding strategy is sketched below.
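One such rounding strategy is sum-up rounding, cf. [58]: given a relaxed control α_i(·) with values α_{i,j} on grid intervals of length Δt_j, a binary control is defined recursively by

    w_{i,k} = 1   if   Σ_{j=0}^{k} α_{i,j} Δt_j − Σ_{j=0}^{k−1} w_{i,j} Δt_j ≥ 0.5 Δt_k,   and   w_{i,k} = 0   else,

so that (for a single control) the accumulated difference |∫ (α_i − w_i) dτ| stays bounded by half the maximal interval length.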
A prominent example of this class is time-optimal car driving, see Sections 9 and 10. Further examples of “bang-bang solutions” include free switching of ports in Simulated Moving Bed processes (see Section 11), unconstrained energy-optimal operation of subway trains (see Section 6), a simple F-8 flight control problem (see Section 3), and phase resetting in biological systems, such as in Section 7.
2.3.2. Path–constrained arcs. Whenever a path constraint is active, i.e., c_i(x(t)) = 0 holds ∀ t ∈ [t_start, t_end] ⊆ [0, t_f], and no continuous control u(·) can be determined to compensate for the changes in x(·), naturally α(·) needs to do so by taking values in the interior of its feasible domain. An illustrating example has been given in [58], where velocity limitations for the energy-optimal operation of New York subway trains are taken into account, see Section 6. The optimal integer solution only exists in the limit case of infinite switching (Zeno behavior), or when a tolerance is given. Another example is compressor control in supermarket refrigeration systems, see Section 8. Note that all applications may comprise path-constrained arcs, once path constraints need to be added.
2.3.3. Sensitivity–seeking arcs. We define sensitivity–seeking (also
compromise–seeking) arcs in the sense of Srinivasan and Bonvin, [61], as
arcs which are neither bang–bang nor path–constrained and for which the
optimal control can be determined by time derivatives of the Hamiltonian.
For control–affine systems this implies so-called singular arcs.
A classical small-sized benchmark problem for a sensitivity-seeking
(singular) arc is the Lotka-Volterra Fishing problem, see Section 4. The
treatment of sensitivity–seeking arcs is very similar to the one of path–
constrained arcs. As above, an approximation up to any a priori specified
tolerance is possible, probably at the price of frequent switching.
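In the standard Pontryagin framework — a textbook characterization, cf. [61], not specific to this collection — such an arc is characterized by an identically vanishing switching function, so that the control must be recovered from its time derivatives:

    ∂H/∂α_i (t) ≡ 0 on [t_start, t_end]   ⇒   (d^k/dt^k) ∂H/∂α_i (t) = 0,   k = 1, 2, . . . ,

and for control-affine systems the control typically appears for the first time in an even-order total derivative, from which it can be solved explicitly.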
2.3.4. Chattering arcs. Chattering controls are bang–bang controls
that switch infinitely often in a finite time interval [0, tf ]. An extensive an-
alytical investigation of this phenomenon can be found in [63]. An example
for a chattering arc solution is the famous example of Fuller, see Section 5.
2.3.5. Sliding mode. Solutions of model equations with state-dependent switches as in (2.6) may show a sliding mode behavior in the sense of Filippov systems [21]. This means that at least one of the functions σ_i(·) has infinitely many zeros on the finite time interval [0, t_f]. In other words, the right hand side switches infinitely often in a finite time horizon.
The two examples with state-dependent switches in this paper in Sec-
tions 6 and 8 do not show sliding mode behavior.
3. F-8 flight control. The F-8 aircraft control problem is based on
a very simple aircraft model. The control problem was introduced by Kaya
and Noakes [36] and aims at controlling an aircraft in a time-optimal way
from an initial state to a terminal state. The mathematical equations form
a small-scale ODE model. The interior point equality conditions fix both
initial and terminal values of the differential states. The optimal, relaxed
control function shows bang bang behavior. The problem is furthermore
interesting as it should be reformulated equivalently. Despite the reformu-
lation the problem is nonconvex and exhibits multiple local minima.
3.1. Model and optimal control problem. The F-8 aircraft con-
trol problem is based on a very simple aircraft model in ordinary differential
equations, introduced by Garrard [24]. The differential states consist of x0
as the angle of attack in radians, x1 as the pitch angle, and x2 as the pitch


rate in rad/s. The only control function w = w(t) is the tail deflection angle
in radians. The control objective is to control the airplane from one point
in space to another in minimum time. For t ∈ [0, T ] almost everywhere the
mixed-integer optimal control problem is given by
    min_{x,w,T}  T

    s.t.  ẋ_0 = −0.877 x_0 + x_2 − 0.088 x_0 x_2 + 0.47 x_0^2 − 0.019 x_1^2
                − x_0^2 x_2 + 3.846 x_0^3
                − 0.215 w + 0.28 x_0^2 w + 0.47 x_0 w^2 + 0.63 w^3
          ẋ_1 = x_2                                                       (3.1)
          ẋ_2 = −4.208 x_0 − 0.396 x_2 − 0.47 x_0^2 − 3.564 x_0^3
                − 20.967 w + 6.265 x_0^2 w + 46 x_0 w^2 + 61.4 w^3
          x(0) = (0.4655, 0, 0)^T,   x(T) = (0, 0, 0)^T,
          w(t) ∈ {−0.05236, 0.05236},   t ∈ [0, T].
In the control problem, both initial and terminal values of the differential states are fixed. The control w(t) is restricted to take values from a finite set only. Hence, the control problem can be reformulated equivalently to

    min_{x,w,T}  T

    s.t.  ẋ_0 = −0.877 x_0 + x_2 − 0.088 x_0 x_2 + 0.47 x_0^2 − 0.019 x_1^2
                − x_0^2 x_2 + 3.846 x_0^3
                + 0.215 ξ − 0.28 x_0^2 ξ + 0.47 x_0 ξ^2 − 0.63 ξ^3
                − ( 0.215 ξ − 0.28 x_0^2 ξ − 0.63 ξ^3 ) 2w
          ẋ_1 = x_2                                                       (3.2)
          ẋ_2 = −4.208 x_0 − 0.396 x_2 − 0.47 x_0^2 − 3.564 x_0^3
                + 20.967 ξ − 6.265 x_0^2 ξ + 46 x_0 ξ^2 − 61.4 ξ^3
                − ( 20.967 ξ − 6.265 x_0^2 ξ − 61.4 ξ^3 ) 2w
          x(0) = (0.4655, 0, 0)^T,   x(T) = (0, 0, 0)^T,
          w(t) ∈ {0, 1},   t ∈ [0, T]
with ξ = 0.05236. Note that there is a bijection between optimal solu-
tions of the two problems, and that the second formulation is an outer
convexification, compare Section 2.1.
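The bijection can be made explicit by a direct calculation (spelled out here for convenience): the controls of (3.1) and (3.2) are related by

    w_{(3.1)}(t) = ξ (2 w_{(3.2)}(t) − 1) ∈ {−ξ, +ξ},   so   w_{(3.1)}^2 = ξ^2,   w_{(3.1)}^3 = ξ^3 (2 w_{(3.2)}(t) − 1),

since (2w − 1) ∈ {−1, +1} is invariant under odd powers. Substituting these identities into (3.1) and collecting the terms multiplied by 2w reproduces exactly the right hand sides of (3.2).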
3.2. Results. We provide in Table 1 a comparison of different solutions reported in the literature. The numbers show the respective lengths t_i − t_{i−1} of the switching arcs with the value of w(t) on the upper or lower bound (given in the second column). The infeasibility row shows values obtained by a simulation with a Runge-Kutta-Fehlberg method of 4th/5th order and an integration tolerance of 10^{−8}.


Table 1
Results for the F-8 flight control problem. The solution in the second last column is a personal communication by Martin Schlüter and Matthias Gerdts.

Arc            w(t)  Lee[42]  Kaya[36]  Sager[53]  Schlüter  Sager
1              1     0.00000  0.10292   0.10235    0.0       1.13492
2              0     2.18800  1.92793   1.92812    0.608750  0.34703
3              1     0.16400  0.16687   0.16645    3.136514  1.60721
4              0     2.88100  2.74338   2.73071    0.654550  0.69169
5              1     0.33000  0.32992   0.32994    0.0       0.0
6              0     0.47200  0.47116   0.47107    0.0       0.0
Infeasibility        1.75E-3  1.64E-3   5.90E-6    3.29E-6   2.21E-7
Objective            6.03500  5.74218   5.72864    4.39981   3.78086

The best known optimal objective value of this problem is given by T = 3.78086. The corresponding solution is shown in Figure 1 (right); another local minimum is plotted in Figure 1 (left). The bang-bang solutions switch three and five times, respectively, starting with w(t) = 1.

Fig. 1. Trajectories for the F-8 flight control problem. Left: corresponding to the
Sager[53] column in Table 1. Right: corresponding to the rightmost column in Table 1.

4. Lotka Volterra fishing problem. The Lotka Volterra fishing problem seeks an optimal fishing strategy to be performed on a fixed time horizon to bring the biomasses of both predator and prey fish to a prescribed
steady state. The problem was set up as a small-scale benchmark problem
in [55] and has since been used for the evaluation of algorithms, e.g., [62].
The mathematical equations form a small-scale ODE model. The inte-
rior point equality conditions fix the initial values of the differential states.
The optimal integer control shows chattering behavior, making the Lotka
Volterra fishing problem an ideal candidate for benchmarking of algorithms.


4.1. Model and optimal control problem. The biomasses of two fish species — one predator, the other one prey — are the differential states of the model; the binary control is the operation of a fishing fleet. The optimization goal is to penalize deviations from a steady state,

    min_{x,w}  ∫_{t_0}^{t_f} (x_0 − 1)^2 + (x_1 − 1)^2 dt

    s.t.  ẋ_0 = x_0 − x_0 x_1 − c_0 x_0 w                                 (4.1)
          ẋ_1 = −x_1 + x_0 x_1 − c_1 x_1 w,
          x(0) = (0.5, 0.7)^T,
          w(t) ∈ {0, 1},   t ∈ [0, t_f],
with tf = 12, c0 = 0.4, and c1 = 0.2.
4.2. Results. If the problem is relaxed, i.e., we demand that w(·)
be in the continuous interval [0, 1] instead of the binary choice {0, 1}, the
optimal solution can be determined by means of Pontryagin’s maximum
principle [49]. The optimal solution contains a singular arc, [55].
The optimal objective value of this relaxed problem is Φ = 1.34408.
As follows from MIOC theory [58] this is the best lower bound on the
optimal value of the original problem with the integer restriction on the
control function. In other words, this objective value can be approximated
arbitrarily close, if the control only switches often enough between 0 and
1. As no optimal solution exists, a suboptimal one is shown in Figure 2,
with 26 switches and an objective function value of Φ = 1.34442.

Fig. 2. Trajectories for the Lotka Volterra Fishing problem. Top left: optimal
relaxed solution on grid with 52 intervals. Top right: feasible integer solution. Bottom:
corresponding differential states, biomass of prey and of predator fish.

4.3. Variants. There are several alternative formulations and variants of the above problem, in particular
• a prescribed time grid for the control function [55],
• a time-optimal formulation to get into a steady-state [53],
• the usage of a different target steady-state, such as the one corresponding to w(·) = 1, which is (1 + c_1, 1 − c_0),
• different fishing control functions for the two species,
• different parameters and start values.


5. Fuller’s problem. The first control problem with an optimal chattering solution was given by [23]. An optimal trajectory does exist for all initial and terminal values in a vicinity of the origin. As Fuller showed, this optimal trajectory contains a bang-bang control function that switches infinitely often. The mathematical equations form a small-scale ODE model. The interior point equality conditions fix initial and terminal values of the differential states, the objective is of tracking type.
5.1. Model and optimal control problem. The MIOCP reads as

    min_{x,w}  ∫_0^1 x_0^2 dt

    s.t.  ẋ_0 = x_1
          ẋ_1 = 1 − 2w                                                    (5.1)
          x(0) = (0.01, 0)^T,   x(T) = (0.01, 0)^T,
          w(t) ∈ {0, 1},   t ∈ [0, 1].

5.2. Results. The optimal trajectories for the relaxed control problem on an equidistant grid G^0 with n_ms = 20, 30, 60 are shown in the top row of Figure 3. Note that this solution is not bang–bang due to the discretization of the control space. Even if this discretization is made very fine, a trajectory with w(·) = 0.5 on an interval in the middle of [0, 1] will be found as a minimum.
The application of MS MINTOC [54] yields an objective value of Φ = 1.52845 · 10^{−5}, which is better than the limit of the relaxed problems, Φ^{20} = 1.53203 · 10^{−5}, Φ^{30} = 1.53086 · 10^{−5}, and Φ^{60} = 1.52958 · 10^{−5}.

Fig. 3. Trajectories for Fuller’s problem. Top row and bottom left: relaxed optima
for 20, 30, and 60 equidistant control intervals. Bottom right: feasible integer solution.

5.3. Variants. An extensive analytical investigation of this problem


and a discussion of the ubiquity of Fuller’s problem can be found in [63].


6. Subway ride. The optimal control problem we treat in this section goes back to work of [9] for the city of New York. In an extension, velocity limits that lead to path–constrained arcs also appear. The aim is to minimize the energy used for a subway ride from one station to another, taking into account boundary conditions and a restriction on the time.
6.1. Model and optimal control problem. The MIOCP reads as

    min_{x,w}  ∫_0^{t_f} L(x, w) dt

    s.t.  ẋ_0 = x_1
          ẋ_1 = f_1(x, w)                                                 (6.1)
          x(0) = (0, 0)^T,   x(t_f) = (2112, 0)^T,
          w(t) ∈ {1, 2, 3, 4},   t ∈ [0, t_f].

The terminal time t_f = 65 denotes the time of arrival of a subway train in the next station. The differential states x_0(·) and x_1(·) describe position and velocity of the train, respectively. The train can be operated in one of four different modes, w(·) = 1 series, w(·) = 2 parallel, w(·) = 3 coasting, or w(·) = 4 braking, that accelerate or decelerate the train and have different energy consumption. Acceleration and energy consumption are velocity-dependent. Hence, we will need switching functions σ_i(x_1) = v_i − x_1 for given velocities v_i, i = 1, 2, 3. The Lagrange term reads as

    L(x, 1) = { e p_1                                     if σ_1 ≥ 0,
                e p_2                                     else if σ_2 ≥ 0,        (6.2)
                e Σ_{i=0}^{5} c_i(1) ( γ x_1 / 10 )^{−i}  else,

    L(x, 2) = { ∞                                         if σ_2 ≥ 0,
                e p_3                                     else if σ_3 ≥ 0,        (6.3)
                e Σ_{i=0}^{5} c_i(2) ( γ x_1 / 10 − 1 )^{−i}  else,

    L(x, 3) = L(x, 4) = 0.                                                        (6.4)

The right hand side function f_1(x, w) reads as

    f_1(x, 1) = { f_1^{1A} := g e a_1 / W_eff                     if σ_1 ≥ 0,
                  f_1^{1B} := g e a_2 / W_eff                     else if σ_2 ≥ 0,   (6.5)
                  f_1^{1C} := g ( e T(x_1, 1) − R(x_1) ) / W_eff  else,

    f_1(x, 2) = { 0                                               if σ_2 ≥ 0,
                  f_1^{2B} := g e a_3 / W_eff                     else if σ_3 ≥ 0,   (6.6)
                  f_1^{2C} := g ( e T(x_1, 2) − R(x_1) ) / W_eff  else,

    f_1(x, 3) = − g R(x_1) / W_eff − C,                                              (6.7)
    f_1(x, 4) = −u = −u_max.                                                         (6.8)


Table 2
Parameters used for the subway MIOCP and its variants.

Symbol  Value       Unit               Symbol  Value        Unit
W       78000       lbs                v1      0.979474     mph
Weff    85200       lbs                v2      6.73211      mph
S       2112        ft                 v3      14.2658      mph
S4      700         ft                 v4      22.0         mph
S5      1200        ft                 v5      24.0         mph
γ       3600/5280   sec·mile/(h·ft)    a1      6017.611205  lbs
a       100         ft                 a2      12348.34865  lbs
nwag    10          –                  a3      11124.63729  lbs
b       0.045       –                  umax    4.4          ft/sec²
C       0.367       –                  p1      106.1951102  –
g       32.2        ft/sec²            p2      180.9758408  –
e       1.0         –                  p3      354.136479   –

The braking deceleration u(·) can be varied between 0 and a given umax . It
can be shown that for problem (6.1) only maximal braking can be optimal,
hence we fixed u(·) to umax without loss of generality. Occurring forces are

    R(x_1) = c a γ^2 x_1^2 + b W γ x_1 + (1.3/2000) W + 116,              (6.9)

    T(x_1, 1) = Σ_{i=0}^{5} b_i(1) ( γ x_1 / 10 − 0.3 )^{−i},             (6.10)

    T(x_1, 2) = Σ_{i=0}^{5} b_i(2) ( γ x_1 / 10 − 1 )^{−i}.               (6.11)

Parameters are listed in Table 2, while bi (w) and ci (w) are given by

b0 (1) −0.1983670410E02, c0 (1) 0.3629738340E02,


b1 (1) 0.1952738055E03, c1 (1) −0.2115281047E03,
b2 (1) 0.2061789974E04, c2 (1) 0.7488955419E03,
b3 (1) −0.7684409308E03, c3 (1) −0.9511076467E03,
b4 (1) 0.2677869201E03, c4 (1) 0.5710015123E03,
b5 (1) −0.3159629687E02, c5 (1) −0.1221306465E03,
b0 (2) −0.1577169936E03, c0 (2) 0.4120568887E02,
b1 (2) 0.3389010339E04, c1 (2) 0.3408049202E03,
b2 (2) 0.6202054610E04, c2 (2) −0.1436283271E03,
b3 (2) −0.4608734450E04, c3 (2) 0.8108316584E02,
b4 (2) 0.2207757061E04, c4 (2) −0.5689703073E01,
b5 (2) −0.3673344160E03, c5 (2) −0.2191905731E01.


Details about the derivation of this model and the assumptions made
can be found in [9] or in [38].
6.2. Results. The optimal trajectory for this problem has been cal-
culated by means of an indirect approach in [9, 38], and based on the
direct multiple shooting method in [58]. The resulting trajectory is listed
in Table 3.
Table 3
Optimal trajectory for the subway MIOCP as calculated in [9, 38, 58].

Time t    w(·)  f_1       x_0 [ft]  x_1 [mph]  x_1 [ftps]  Energy
0.00000   1     f_1^{1A}  0.0       0.0        0.0         0.0
0.63166   1     f_1^{1B}  0.453711  0.979474   1.43656     0.0186331
2.43955   1     f_1^{1C}  10.6776   6.73211    9.87375     0.109518
3.64338   2     f_1^{2B}  24.4836   8.65723    12.6973     0.147387
5.59988   2     f_1^{2C}  57.3729   14.2658    20.9232     0.339851
12.6070   1     f_1^{1C}  277.711   25.6452    37.6129     0.93519
45.7827   3     f_1(3)    1556.5    26.8579    39.3915     1.14569
46.8938   3     f_1(3)    1600      26.5306    38.9115     1.14569
57.1600   4     f_1(4)    1976.78   23.5201    34.4961     1.14569
65.0000   –     –         2112      0.0        0.0         1.14569

6.3. Variants. The given parameters have to be modified to match different parts of the track, subway train types, or the number of passengers. A minimization of travel time might also be considered.
The problem becomes more challenging, when additional point or path
constraints are considered. First we consider the point constraint

x1 ≤ v4 if x0 = S4 (6.12)

for a given distance 0 < S4 < S and velocity v4 > v3 . Note that the state
x0 (·) is strictly monotonically increasing with time, as ẋ0 = x1 > 0 for all
t ∈ (0, T ).
The optimal order of gears for S4 = 1200 and v4 = 22/γ with the ad-
ditional interior point constraints (6.12) is 1, 2, 1, 3, 4, 2, 1, 3, 4. The stage
lengths between switches are 2.86362, 10.722, 15.3108, 5.81821, 1.18383,
2.72451, 12.917, 5.47402, and 7.98594 with Φ = 1.3978. For different pa-
rameters S4 = 700 and v4 = 22/γ we obtain the gear choice 1, 2, 1, 3, 2, 1,
3, 4 and stage lengths 2.98084, 6.28428, 11.0714, 4.77575, 6.0483, 18.6081,
6.4893, and 8.74202 with Φ = 1.32518.
A more practical restriction is a path constraint on subsets of the track. We will consider a problem with the additional path constraint

x1 ≤ v5 if x0 ≥ S5 . (6.13)



Fig. 4. The differential state velocity of a subway train over time. The dotted ver-
tical line indicates the beginning of the path constraint, the horizontal line the maximum
velocity. Left: one switch leading to one touch point. Right: optimal solution for three
switches. The energy-optimal solution needs to stay as close as possible to the maximum
velocity on this time interval to avoid even higher energy-intensive accelerations in the
start-up phase to match the terminal time constraint tf ≤ 65 to reach the next station.

The additional path constraint changes the qualitative behavior of the relaxed solution. While all solutions considered so far were bang–bang and the main work consisted in finding the switching points, we now have a path–constrained arc. The optimal solutions for refined grids yield a series of monotonically decreasing objective function values, where the limit is the best value that can be approximated by an integer feasible solution. In our case we obtain

1.33108, 1.31070, 1.31058, 1.31058, . . . (6.14)

Figure 4 shows two possible integer realizations, with a trade-off between


energy consumption and number of switches. Note that the solutions ap-
proximate the optimal driving behavior (a convex combination of two op-
eration modes) by switching between the two and causing a touching of the
velocity constraint from below as many times as we switch.
7. Resetting calcium oscillations. The aim of the control prob-
lem is to identify strength and timing of inhibitor stimuli that lead to a
phase singularity which annihilates intra-cellular calcium oscillations. This
is formulated as an objective function that aims at minimizing the state


deviation from a desired unstable steady state, integrated over time. A


calcium oscillator model describing intra-cellular calcium spiking in hep-
atocytes induced by an extracellular increase in adenosine triphosphate
(ATP) concentration is described. The calcium signaling pathway is initi-
ated via a receptor activated G-protein inducing the intra-cellular release of
inositol triphosphate (IP3) by phospholipase C. The IP3 triggers the open-
ing of endoplasmic reticulum and plasma membrane calcium channels and
a subsequent inflow of calcium ions from intra-cellular and extracellular
stores leading to transient calcium spikes.
The mathematical equations form a small-scale ODE model. The inte-
rior point equality conditions fix the initial values of the differential states.
The problem is, despite its low dimension, very hard to solve, as the
target state is unstable.
7.1. Model and optimal control problem. The MIOCP reads as

    min_{x,w,w_max}  ∫_0^{t_f} ||x(t) − x̃||_2^2 + p_1 w(t) dt

    s.t.  ẋ_0 = k_1 + k_2 x_0 − k_3 x_0 x_1 / (x_0 + K_4) − k_5 x_0 x_2 / (x_0 + K_6)
          ẋ_1 = k_7 x_0 − k_8 x_1 / (x_1 + K_9)
          ẋ_2 = k_10 x_1 x_2 x_3 / (x_3 + K_11) + k_12 x_1 + k_13 x_0
                − k_14 x_2 / (w · x_2 + K_15) − k_16 x_2 / (x_2 + K_17) + x_3/10   (7.1)
          ẋ_3 = − k_10 x_1 x_2 x_3 / (x_3 + K_11) + k_16 x_2 / (x_2 + K_17) − x_3/10
          x(0) = (0.03966, 1.09799, 0.00142, 1.65431)^T,
          1.1 ≤ w_max ≤ 1.3,
          w(t) ∈ {1, w_max},   t ∈ [0, t_f]

with fixed parameter values [t_0, t_f] = [0, 22], k_1 = 0.09, k_2 = 2.30066, k_3 = 0.64, K_4 = 0.19, k_5 = 4.88, K_6 = 1.18, k_7 = 2.08, k_8 = 32.24, K_9 = 29.09, k_10 = 5.0, K_11 = 2.67, k_12 = 0.7, k_13 = 13.58, k_14 = 153.0, K_15 = 0.16, k_16 = 4.85, K_17 = 0.05, p_1 = 100, and reference values x̃_0 = 6.78677, x̃_1 = 22.65836, x̃_2 = 0.384306, x̃_3 = 0.28977.
The differential states (x0 , x1 , x2 , x3 ) describe concentrations of acti-
vated G-proteins, active phospholipase C, intra-cellular calcium, and intra-
ER calcium, respectively. The external control w(·) is a temporally varying
concentration of an uncompetitive inhibitor of the PMCA ion pump.
Modeling details can be found in [39]. In the given equations that
stem from [41], the model is identical to the one derived there, except
for an additional first-order leakage flow of calcium from the ER back to


the cytoplasm, which is modeled by the term x_3/10. It reproduces well the experimental observations of cytoplasmic calcium oscillations as well as bursting behavior, and in particular the frequency encoding of the triggering stimulus strength, which is a well known mechanism for signal processing in cell biology.

7.2. Results. The depicted optimal solution in Figure 5 consists of a stimulus of w_max = 1.3 and a timing given by the stage lengths 4.6947115, 0.1491038, and 17.1561845. The optimal objective function value is Φ = 1610.654. As can be seen from the additional plots, this solution is extremely unstable. A small perturbation in the control, or simply rounding errors on a longer time horizon, leads to a transition back to the stable limit-cycle oscillations.
The determination of the stimulus by means of optimization is quite hard for two reasons. First, the target steady-state is unstable. Only a stable all-at-once algorithm such as multiple shooting or collocation can be applied successfully. Second, the objective landscape of the problem in switching time formulation (that is, for a fixed stimulus strength and modifying only beginning and length of the stimulus) is quite nasty, as the visualizations in [53] and on the web page [52] show.

Fig. 5. Trajectories for the calcium problem. Top left: optimal integer solution.
Top right: corresponding differential states with phase resetting. Bottom left: slightly
perturbed control: stimulus 0.001 too early. Bottom right: long time behavior of optimal
solution: numerical rounding errors lead to transition back from unstable steady-state
to stable limit-cycle.


7.3. Variants. Alternatively, the annihilation of calcium oscillations with PLC activation inhibition, i.e., the use of two control functions, is also possible, compare [41]. Of course, results depend very much on the scaling of the deviation in the objective function.

8. Supermarket refrigeration system. This benchmark problem was formulated first within the European network of excellence HYCON [19] by Larsen et al. [40]. However, the formulation lacks a precise definition of initial values and constraints, which are only formulated as “soft constraints”. The task is to control a refrigeration system in an energy optimal way, while guaranteeing safeguards on the temperature of the showcases. This problem would typically be a moving horizon online optimization problem; here it is defined as a fixed horizon optimization task. The mathematical equations form a periodic ODE model.

8.1. Model and optimal control problem. The MIOCP reads as

    min_{x,w,t_f}  (1/t_f) ∫_0^{t_f} (w_2 + w_3) · 0.5 · η_vol · V_sl · f dt

    s.t.
    ẋ_0 = [ ( x_4 (x_2 − T_e(x_0)) + x_8 (x_6 − T_e(x_0)) ) · UA_wrm / (M_rm · Δh_lg(x_0))
            + ṁ_rc − η_vol · V_sl · 0.5 (w_2 + w_3) ρ_suc(x_0) ] / ( V_suc · (dρ_suc/dP_suc)(x_0) )
    ẋ_1 = − UA_goods−air (x_1 − x_3) / ( M_goods · C_p,goods )
    ẋ_2 = [ UA_air−wall (x_3 − x_2) − x_4 (UA_wrm / M_rm) (x_2 − T_e(x_0)) ] / ( M_wall · C_p,wall )
    ẋ_3 = [ UA_goods−air (x_1 − x_3) + Q̇_airload − UA_air−wall (x_3 − x_2) ] / ( M_air · C_p,air )
    ẋ_4 = w_0 (M_rm − x_4) / τ_fill − (1 − w_0) ( UA_wrm / (M_rm · Δh_lg(x_0)) ) x_4 (x_2 − T_e(x_0))
    ẋ_5 = − UA_goods−air (x_5 − x_7) / ( M_goods · C_p,goods )
    ẋ_6 = [ UA_air−wall (x_7 − x_6) − x_8 (UA_wrm / M_rm) (x_6 − T_e(x_0)) ] / ( M_wall · C_p,wall )
    ẋ_7 = [ UA_goods−air (x_5 − x_7) + Q̇_airload − UA_air−wall (x_7 − x_6) ] / ( M_air · C_p,air )


Table 4
Parameters used for the supermarket refrigeration problem.

Symbol        Value    Unit      Description
Q̇airload      3000.00  J/s       Disturbance, heat transfer
ṁrc           0.20     kg/s      Disturbance, constant mass flow
Mgoods        200.00   kg        Mass of goods
Cp,goods      1000.00  J/(kg·K)  Heat capacity of goods
UAgoods−air   300.00   J/(s·K)   Heat transfer coefficient
Mwall         260.00   kg        Mass of evaporator wall
Cp,wall       385.00   J/(kg·K)  Heat capacity of evaporator wall
UAair−wall    500.00   J/(s·K)   Heat transfer coefficient
Mair          50.00    kg        Mass of air in display case
Cp,air        1000.00  J/(kg·K)  Heat capacity of air
UAwrm         4000.00  J/(s·K)   Maximum heat transfer coefficient
τfill         40.00    s         Filling time of the evaporator
TSH           10.00    K         Superheat in the suction manifold
Mrm           1.00     kg        Maximum mass of refrigerant
Vsuc          5.00     m³        Total volume of suction manifold
Vsl           0.08     m³/s      Total displacement volume
ηvol          0.81     −         Volumetric efficiency

 
    ẋ_8 = w_1 (M_rm − x_8) / τ_fill − (1 − w_1) ( UA_wrm / (M_rm · Δh_lg(x_0)) ) x_8 (x_6 − T_e(x_0))
    x(0) = x(t_f),
    650 ≤ t_f ≤ 750,
    x_0 ≤ 1.7,   2 ≤ x_3 ≤ 5,   2 ≤ x_7 ≤ 5,
    w(t) ∈ {0, 1}^4,   t ∈ [0, t_f].

The differential state x_0 describes the suction pressure in the suction manifold (in bar). The next three states model temperatures in the first display case (in °C): x_1 is the goods' temperature, x_2 that of the evaporator wall, and x_3 the air temperature surrounding the goods. x_4 then models the mass of the liquefied refrigerant in the evaporator (in kg). x_5 to x_8 describe the corresponding states in the second display case. w_0 and w_1 describe the inlet valves of the first two display cases, respectively. w_2 and w_3 denote the activity of a single compressor each.
The model uses the parameter values listed in Table 4 and the poly-
nomial functions obtained from interpolations:


    T_e(x_0) = −4.3544 x_0^2 + 29.224 x_0 − 51.2005,
    Δh_lg(x_0) = (0.0217 x_0^2 − 0.1704 x_0 + 2.2988) · 10^5,
    ρ_suc(x_0) = 4.6073 x_0 + 0.3798,
    (dρ_suc/dP_suc)(x_0) = −0.0329 x_0^3 + 0.2161 x_0^2 − 0.4742 x_0 + 5.4817.

8.2. Results. For the relaxed problem the optimal solution is Φ = 12072.45. The integer solution plotted in Figure 6 is feasible, but yields an increased objective function value of Φ = 12252.81, a compromise between effectiveness and a reduced number of switches.

Fig. 6. Periodic trajectories for optimal relaxed (left) and integer feasible controls
(right), with the controls w(·) in the first row and the differential states in the three
bottom rows.

8.3. Variants. Since the compressors are connected in parallel, one can introduce a single control w_2 ∈ {0, 1, 2} instead of two equivalent controls. The same holds for scenarios with n parallel connected compressors.
In [40], the problem was stated slightly differently:
• The temperature constraints were not hard bounds; instead, a penalization term was added to the objective function to minimize the violation of these constraints.
• The differential equation for the mass of the refrigerant had another switch if the valve (e.g., w_0) is closed. It was formulated as

    ẋ_4 = (M_rm − x_4) / τ_fill                                          if w_0 = 1,
    ẋ_4 = − ( UA_wrm / (M_rm · Δh_lg(x_0)) ) x_4 (x_2 − T_e(x_0))        if w_0 = 0 and x_4 > 0,
    ẋ_4 = 0                                                              if w_0 = 0 and x_4 = 0.

This additional switch is redundant, because the mass itself is a factor on the right hand side, and so the complete right hand side is 0 if x_4 = 0.
• A night scenario with two different parameters was given. At night the following parameters change their value to Q̇_airload = 1800.00 J/s and ṁ_rc = 0.00 kg/s. Additionally the constraint on the suction pressure x_0(t) is softened to x_0(t) ≤ 1.9.
• The number of compressors and display cases is not fixed. Larsen also proposed the problem with 3 compressors and 3 display cases. This leads to a change in the compressor rack's performance to V_sl = 0.095 m³/s. Unfortunately this constant is only given for these two cases, although Larsen proposed scenarios with more compressors and display cases.
9. Elchtest testdrive. We consider a time-optimal car driving ma-
neuver to avoid an obstacle with small steering effort. At any time, the car
must be positioned on a prescribed track. This control problem was first
formulated in [25] and used for subsequent studies [26, 37].
The mathematical equations form a small-scale ODE model. The in-
terior point equality conditions fix initial and terminal values of the differ-
ential states, the objective is of minimum-time type.
9.1. Model and optimal control problem. We consider a car
model derived under the simplifying assumption that rolling and pitch-
ing of the car body can be neglected. Only a single front and rear wheel is
modelled, located in the virtual center of the original two wheels. Motion
of the car body is considered on the horizontal plane only.
The MIOCP reads as

    min_{t_f, x(·), u(·)}  t_f + ∫_0^{t_f} w_δ^2(t) dt                    (9.1a)

    s.t.  ċ_x = v cos(ψ − β)                                              (9.1b)
          ċ_y = v sin(ψ − β)                                              (9.1c)
          v̇ = (1/m) [ (F_lr^μ − F_Ax) cos β + F_lf cos(δ + β)
                      − (F_sr − F_Ay) sin β − F_sf sin(δ + β) ]           (9.1d)
          δ̇ = w_δ                                                         (9.1e)
          β̇ = w_z − (1/(m v)) [ (F_lr − F_Ax) sin β + F_lf sin(δ + β)
                      + (F_sr − F_Ay) cos β + F_sf cos(δ + β) ]           (9.1f)
          ψ̇ = w_z                                                         (9.1g)
          ẇ_z = (1/I_zz) [ F_sf l_f cos δ − F_sr l_r − F_Ay e_SP + F_lf l_f sin δ ]   (9.1h)
          c_y(t) ∈ [ P_l(c_x(t)) + B/2, P_u(c_x(t)) − B/2 ]               (9.1i)
          w_δ(t) ∈ [−0.5, 0.5],  F_B(t) ∈ [0, 1.5 · 10^4],  φ(t) ∈ [0, 1] (9.1j)
          μ(t) ∈ {1, . . . , 5}                                           (9.1k)
          x(t_0) = (−30, free, 10, 0, 0, 0, 0),  (c_x, ψ)(t_f) = (140, 0) (9.1l)


for t ∈ [t_0, t_f] almost everywhere. The four control functions contained in u(·) are the steering wheel angular velocity w_δ, the total braking force F_B, the accelerator pedal position φ, and the gear μ. The differential states contained in x(·) are the horizontal position of the car c_x, the vertical position of the car c_y, the magnitude of the directional velocity of the car v, the steering wheel angle δ, the side slip angle β, the yaw angle ψ, and the yaw angle velocity w_z.
The model parameters are listed in Table 5, while the forces and expressions in (9.1b) to (9.1h) are given for fixed μ by

    F_sf,sr(α_f,r) := D_f,r sin( C_f,r arctan( B_f,r α_f,r
                        − E_f,r (B_f,r α_f,r − arctan(B_f,r α_f,r)) ) ),

    α_f := δ(t) − arctan( ( l_f ψ̇(t) − v(t) sin β(t) ) / ( v(t) cos β(t) ) ),

    α_r := arctan( ( l_r ψ̇(t) + v(t) sin β(t) ) / ( v(t) cos β(t) ) ),

    F_lf := −F_Bf − F_Rf,

    F_lr^μ := (i_g^μ i_t / R) M_mot^μ(φ) − F_Br − F_Rr,

    M_mot^μ(φ) := f_1(φ) f_2(w_mot^μ) + (1 − f_1(φ)) f_3(w_mot^μ),

    f_1(φ) := 1 − exp(−3 φ),
    f_2(w_mot) := −37.8 + 1.54 w_mot − 0.0019 w_mot^2,
    f_3(w_mot) := −34.9 − 0.04775 w_mot,

    w_mot^μ := (i_g^μ i_t / R) v(t),

    F_Bf := (2/3) F_B,   F_Br := (1/3) F_B,

    F_Rf(v) := f_R(v) m l_r g / (l_f + l_r),   F_Rr(v) := f_R(v) m l_f g / (l_f + l_r),

    f_R(v) := 9 · 10^{−3} + 7.2 · 10^{−5} v + 5.038848 · 10^{−10} v^4,

    F_Ax := (1/2) c_w ρ A v^2(t),   F_Ay := 0.

The test track is described by setting up piecewise cubic spline functions P_u(x) and P_l(x) modeling the top and bottom track boundary, given a horizontal position x.


Table 5
Parameters used in the car model.

Symbol  Value           Unit    Description
m       1.239 · 10³     kg      Mass of the car
g       9.81            m/s²    Gravity constant
lf      1.19016         m       Front wheel distance to c.o.g.
lr      1.37484         m       Rear wheel distance to c.o.g.
R       0.302           m       Wheel radius
Izz     1.752 · 10³     kg m²   Moment of inertia
cw      0.3             –       Air drag coefficient
ρ       1.249512        kg/m³   Air density
A       1.4378946874    m²      Effective flow surface
ig¹     3.09            –       Gear 1 transmission ratio
ig²     2.002           –       Gear 2 transmission ratio
ig³     1.33            –       Gear 3 transmission ratio
ig⁴     1.0             –       Gear 4 transmission ratio
ig⁵     0.805           –       Gear 5 transmission ratio
it      3.91            –       Engine torque transmission
Bf      1.096 · 10¹     –       Pacejka coeff. (stiffness)
Br      1.267 · 10¹     –
Cf,r    1.3             –       Pacejka coefficients (shape)
Df      4.5604 · 10³    –       Pacejka coefficients (peak)
Dr      3.94781 · 10³   –
Ef,r    −0.5            –       Pacejka coefficients (curv.)



    P_l(x) := { 0                        if x ≤ 44,
                4 h_2 (x − 44)^3         if 44 < x ≤ 44.5,
                4 h_2 (x − 45)^3 + h_2   if 44.5 < x ≤ 45,
                h_2                      if 45 < x ≤ 70,                  (9.2)
                4 h_2 (70 − x)^3 + h_2   if 70 < x ≤ 70.5,
                4 h_2 (71 − x)^3         if 70.5 < x ≤ 71,
                0                        if 71 < x,

    P_u(x) := { h_1                              if x ≤ 15,
                4 (h_3 − h_1)(x − 15)^3 + h_1    if 15 < x ≤ 15.5,
                4 (h_3 − h_1)(x − 16)^3 + h_3    if 15.5 < x ≤ 16,
                h_3                              if 16 < x ≤ 94,          (9.3)
                4 (h_3 − h_4)(94 − x)^3 + h_3    if 94 < x ≤ 94.5,
                4 (h_3 − h_4)(95 − x)^3 + h_4    if 94.5 < x ≤ 95,
                h_4                              if 95 < x,

where B = 1.5 m is the car's width and

    h_1 := 1.1 B + 0.25,   h_2 := 3.5,   h_3 := 1.2 B + 3.75,   h_4 := 1.3 B + 0.25.


9.2. Results. In [25, 26, 37] numerical results for the benchmark
problem have been deduced. In [37] one can also find an explanation why
a bang-bang solution for the relaxed and convexified gear choices has to
be optimal. Table 6 gives the optimal gear choice and the resulting ob-
jective function value (the end time) for different numbers N of control
discretization intervals, which were also used for a discretization of the
path constraints.

Table 6
Gear choice depending on discretization in time N. Times when gear becomes active.

N    μ=1  μ=2       μ=3       μ=4       μ=5  tf
10   0.0  0.435956  2.733326  –         –    6.764174
20   0.0  0.435903  2.657446  6.467723  –    6.772046
40   0.0  0.436108  2.586225  6.684504  –    6.782052
80   0.0  0.435796  2.748930  6.658175  –    6.787284

10. Elliptic track testdrive. This control problem is very similar to the one in Section 9. However, instead of a simple lane change maneuver, the time-optimal driving on an elliptic track with periodic boundary conditions is considered, [57].
10.1. Model and optimal control problem. With the notation of Section 9 the MIOCP reads as

    min_{t_f, x(·), u(·)}  t_f

    s.t.  (9.1b)−(9.1h), (9.1j), (9.1k),
          (c_x, c_y) ∈ X,                                                 (10.1a)
          x(t_0) = x(t_f) − (0, 0, 0, 0, 0, 2π, 0)^T,
          c_y(t_0) = 0,
          0 ≤ r_eng(v, μ),

for t ∈ [t_0, t_f] almost everywhere.

The set X describes an elliptic track with axes of a = 170 meters and b = 80 meters, respectively, centered in the origin. The track's width is W = 7.5 meters, five times the car's width B = 1.5 meters:

    X = { ( (a + r) cos η, (b + r) sin η ) | r ∈ [−W/2, W/2] } ⊂ R^2,

with η = arctan(c_y / c_x). Note that the special case c_x = 0, leading to η = ±π/2, requires separate handling.
The model in Section 9 has a shortcoming, as switching to a low gear is
possible also at high velocities, although this would lead to an unphysically


high engine speed. Therefore we extend it by additional constraints on the car's engine speed,

    800 =: n_eng^MIN ≤ n_eng ≤ n_eng^MAX := 8000,                         (10.2)

in the form of equivalent velocity constraints

    π n_eng^MIN R / (30 i_t i_g^μ) ≤ v ≤ π n_eng^MAX R / (30 i_t i_g^μ)   (10.3)

for all t ∈ [0, t_f] and the active gear μ. We write this as r_eng(v, μ) ≥ 0.
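For illustration, evaluating (10.3) with the parameters of Table 5 (our arithmetic, rounded to three significant digits) gives the admissible velocity bands

    μ = 1:   π · 800 · 0.302 / (30 · 3.91 · 3.09) ≈ 2.09 m/s ≤ v ≤ 20.9 m/s,
    μ = 5:   8.04 m/s ≤ v ≤ 80.4 m/s,

so a switch into first gear only remains feasible below roughly 21 m/s — exactly the behavior that becomes relevant in the braking phases discussed below.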
10.2. Results. Parts of the optimal trajectory from [57] are shown in
Figures 7 and 8. The order of gears is (2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2). The
gear switches take place after 1.87, 5.96, 10.11, 11.59, 12.21, 12.88, 15.82,
19.84, 23.99, 24.96, 26.10, and 26.76 seconds, respectively. The final time
is tf = 27.7372 s.

Fig. 7. The steering angle velocity (control), and some differential states of the
optimal solution: directional velocity, side slip angle β, and velocity of yaw angle wz
plotted over time. The vertical lines indicate gear shifts.

As can be seen in Fig. 8, the car uses the track width to its full extent,
leading to active path constraints. As was expected, the optimal gear
increases in an acceleration phase. When the velocity has to be reduced, a
combination of braking, no acceleration, and engine brake is used.
The result depends on the engine speed constraint reng (v, μ) that be-
comes active in the braking phase. If the constraint is omitted, the optimal


Fig. 8. Elliptic race track seen from above with optimal position and gear choices
of the car. Note the exploitation of the slip (sliding) to change the car’s orientation as
fast as possible, when in first gear. The gear order changes when a different maximum
engine speed is imposed.

solution switches directly from the fourth gear into the first one to maximize the effect of the engine brake. For n_eng^MAX = 15000, braking occurs in the gear order 4, 2, 1.
Although this was left as a degree of freedom, the optimizer yields a
symmetric solution with respect to the upper and lower parts of the track
for all scenarios we considered.
10.3. Variants. By a more flexible use of Bézier patches, more general track constraints can be specified, e.g., those of Formula 1 race courses.
11. Simulated moving bed. We consider a simplified model of a
Simulated Moving Bed (SMB) chromatographic separation process that
contains time–dependent discrete decisions. SMB processes have been gain-
ing increased attention lately, see [17, 34, 56] for further references. The
related optimization problems are challenging from a mathematical point of
view, as they combine periodic nonlinear optimal control problems in par-
tial differential equations (PDE) with time–dependent discrete decisions.
11.1. Model and optimal control problem. SMB chromatogra-
phy finds various industrial applications such as sugar, food, petrochemical
and pharmaceutical industries. A SMB unit consists of multiple columns
filled with solid absorbent. The columns are connected in a continuous
cycle. There are two inlet streams, desorbent (De) and feed (Fe), and two
outlet streams, raffinate (Ra) and extract (Ex). The continuous counter-
current operation is simulated by switching the four streams periodically
in the direction of the liquid flow in the columns, thereby leading to better
separation. This is visualized in Figure 9.


Fig. 9. Scheme of SMB process with 6 columns.

Due to this discrete switching of columns, SMB processes reach a cyclic or periodic steady state, i.e., the concentration profiles at the end of a period are equal to those at the beginning shifted by one column ahead in direction of the fluid flow. A number of different operating schemes have been proposed to further improve the performance of SMB.
The considered SMB unit consists of N_col = 6 columns. The flow rate through column i is denoted by Q_i, i ∈ I := {1, . . . , N_col}. The raffinate, desorbent, extract and feed flow rates are denoted by Q_Ra, Q_De, Q_Ex and Q_Fe, respectively. The (possibly) time–dependent value w_iα(t) ∈ {0, 1} denotes whether the port of flow α ∈ {Ra, De, Ex, Fe} is positioned at column i ∈ I. As in many practical realizations of SMB processes only one pump per flow is available and the ports are switched by a 0–1 valve, we obtain the additional special ordered set type one restriction

    Σ_{i ∈ I} w_iα(t) = 1,   ∀ t ∈ [0, T],   α ∈ {Ra, De, Ex, Fe}.        (11.1)

The flow rates Q_1, Q_De, Q_Ex and Q_Fe enter as control functions u(·) resp. time–invariant parameters p into the optimization problem, depending on the operating scheme to be optimized. The remaining flow rates are derived by mass balance as

    Q_Ra = Q_De − Q_Ex + Q_Fe,                                            (11.2)
    Q_i = Q_{i−1} − Σ_{α ∈ {Ra,Ex}} w_iα Q_α + Σ_{α ∈ {De,Fe}} w_iα Q_α   (11.3)

for i = 2, . . . , N_col. The feed contains two components A and B dissolved in desorbent, with concentrations c_Fe^A = c_Fe^B = 0.1. The concentrations of A and B in desorbent are c_De^A = c_De^B = 0.
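For illustration, consider the hypothetical (assumed, not optimized) port configuration with desorbent at column 1, extract at column 3, feed at column 4, and raffinate at column 6. Then (11.3) yields

    Q_2 = Q_1,   Q_3 = Q_2 − Q_Ex,   Q_4 = Q_3 + Q_Fe,   Q_5 = Q_4,   Q_6 = Q_5 − Q_Ra,

and closing the loop gives Q_6 + Q_De = Q_1, consistent with the mass balance (11.2); only Q_1 and the external flow rates remain as degrees of freedom.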
A simplified equilibrium model is described in Diehl and Walther [16]. It can be derived from an equilibrium assumption between solid and liquid phases along with a simple spatial discretization. The mass balance in the liquid phase for K = A, B is given by

    ε_b ∂c_i^K(x, t)/∂t + (1 − ε_b) ∂q_i^K(x, t)/∂t + u_i(t) ∂c_i^K(x, t)/∂x = 0   (11.4)


with equilibrium between the liquid and solid phases given by a linear isotherm:

    q_i^K(x, t) = C_K c_i^K(x, t).                                        (11.5)

Here ε_b is the void fraction, c_i^K(x, t) is the concentration in the liquid phase of component K in column i, and q_i^K is the concentration in the solid phase. Also, i is the column index and N_col is the number of columns. We can combine (11.4) and (11.5) and rewrite the model as

    ∂c_i^K(x, t)/∂t = −( u_i(t) / K̄_K ) ∂c_i^K(x, t)/∂x,                  (11.6)

where K̄_K = ε_b + (1 − ε_b) C_K. Dividing the column into N_FEX compartments and applying a simple backward difference with Δx = L/N_FEX leads to

    dc_{i,j}^K/dt = ( u_i(t) N_FEX / (K̄_K L) ) [ c_{i,j−1}^K(t) − c_{i,j}^K(t) ]
                  = k^K [ c_{i,j−1}^K(t) − c_{i,j}^K(t) ]                 (11.7)

for j = 1, . . . , N_FEX, with k^A = 2 N_FEX, k^B = N_FEX, and c_{i,j}^K(t) a discretization of c_i^K(jΔx, t) for j = 0, . . . , N_FEX.
This simplified model for the dynamics in each column considers axial convection and axial mixing introduced by dividing the respective column into N_dis perfectly mixed compartments. Although this simple discretization does not consider all effects present in the advection–diffusion equation for the time and space dependent concentrations, the qualitative behavior of the concentration profiles moving at different velocities through the respective columns is sufficiently well represented. We assume that the compartment concentrations are constant. We denote the concentrations of A and B in the compartment with index i by c_i^A, c_i^B and leave away the time dependency. For the first compartment j = (i − 1) N_dis + 1 of column i ∈ I we have by mass transfer for K = A, B

    ċ_j^K / k^K = Q_{i−} c_{j−}^K − Q_i c_j^K
                  − Σ_{α ∈ {Ra,Ex}} w_iα Q_α c_{j−}^K + Σ_{α ∈ {De,Fe}} w_iα Q_α c_α^K   (11.8)

where i− denotes the preceding column (i− = N_col if i = 1, i− = i − 1 else) and equivalently j− the preceding compartment (j− = N if j = 1, j− = j − 1 else). k^K denotes the axial convection in the column, k^A = 2 N_dis and k^B = N_dis. Component A is less adsorbed, thus travels faster and is prevailing in the raffinate, while B travels slower and is prevailing in the extract. For interior compartments j in column i we have

    ċ_j^K / k^K = Q_{i−} c_{j−}^K − Q_i c_j^K.                            (11.9)


The compositions of extract and raffinate, α ∈ {Ex, Ra}, are given by

    Ṁ_α^K = Q_α Σ_{i ∈ I} w_iα c_{j(i)}^K                                 (11.10)

with j(i) the last compartment of column i−. The feed consumption is

    Ṁ_Fe = Q_Fe.                                                          (11.11)

These are altogether 2N + 5 differential equations for the differential states x = (x_A, x_B, x_M) with x_A = (c_0^A, . . . , c_N^A), x_B = (c_0^B, . . . , c_N^B), and finally x_M = (M_Ex^A, M_Ex^B, M_Ra^A, M_Ra^B, M_Fe). They can be summarized as

    ẋ(t) = f(x(t), u(t), w(t), p).                                        (11.12)

We define a linear operator P : R^{n_x} → R^{n_x} that shifts the concentration profiles by one column and sets the auxiliary states to zero, i.e.,

    x ↦ P x := (P_A x_A, P_B x_B, P_M x_M) with
    P_A x_A := (c_{N_dis+1}^A, . . . , c_N^A, c_1^A, . . . , c_{N_dis}^A),
    P_B x_B := (c_{N_dis+1}^B, . . . , c_N^B, c_1^B, . . . , c_{N_dis}^B),
    P_M x_M := (0, 0, 0, 0, 0).

Then we can impose periodicity after the unknown cycle duration T by requiring x(0) = P x(T). The purity of component A in the raffinate at the end of the cycle must be higher than p_Ra = 0.95 and the purity of B in the extract must be higher than p_Ex = 0.95, i.e., we impose the terminal purity conditions

    M_Ex^A(T) ≤ ( (1 − p_Ex) / p_Ex ) M_Ex^B(T),                          (11.13)
    M_Ra^B(T) ≤ ( (1 − p_Ra) / p_Ra ) M_Ra^A(T).                          (11.14)
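With p_Ex = p_Ra = 0.95 the factor (1 − p)/p equals 1/19 ≈ 0.0526, so the conditions read

    M_Ex^A(T) ≤ (1/19) M_Ex^B(T),   M_Ra^B(T) ≤ (1/19) M_Ra^A(T),

i.e., the undesired component collected in each outlet may amount to at most about 5.3% of the desired one.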

We impose lower and upper bounds on all external and internal flow rates,

    0 ≤ Q_Ra, Q_De, Q_Ex, Q_Fe, Q_1, Q_2, Q_3, Q_4, Q_5, Q_6 ≤ Q_max = 2. (11.15)

To avoid draining inflow into outflow streams without going through a column,

    Q_i − w_iDe Q_De − w_iFe Q_Fe ≥ 0                                     (11.16)

has to hold for all i ∈ I. The objective is to maximize the feed throughput M_Fe(T)/T. Summarizing, we obtain the following MIOCP:

    max_{x(·), u(·), w(·), p, T}  M_Fe(T)/T

    s.t.  ẋ(t) = f(x(t), u(t), w(t), p),
          x(0) = P x(T),
          (11.13) − (11.16),                                              (11.17)
          Σ_{i ∈ I} w_iα(t) = 1,   ∀ t ∈ [0, T],   α ∈ {Ra, De, Ex, Fe},
          w(t) ∈ {0, 1}^{4 N_col},   ∀ t ∈ [0, T].
11.2. Results. We optimized different operation schemes that fit into the general problem formulation (11.17):
• SMB fix. The w_iα are fixed as shown in Table 7. The flow rates Q_· are constant in time, i.e., they enter as optimization parameters p into (11.17). Optimal solution Φ = 0.7345.
• SMB relaxed. As above, but the w_iα are free for optimization and relaxed to w_iα ∈ [0, 1], allowing for a ”splitting” of the ports. Φ = 0.8747.
• PowerFeed. The flow rates are modulated during one period, i.e., the Q_· enter as control functions u(·) into (11.17). Φ = 0.8452.
• VARICOL. The ports switch asynchronously, but in a given order. The switching times are subject to optimization. Φ = 0.9308.
• Superstruct. This scheme is the most general and allows for arbitrary switching of the ports. The flow rates enter as continuous control functions, but are found to be bang–bang by the optimizer (i.e., whenever the port is given in Table 7, the respective flow rate is at its upper bound). Φ = 1.0154.

Table 7
Fixed or optimized port assignment wiα and switching times of the process strategies.

Process Time 1 2 3 4 5 6
SMB fix 0.00 – 0.63 De Ex Fe Ra
SMB relaxed 0.00 – 0.50 De,Ex Ex Fe Ra
PowerFeed 0.00 – 0.56 De Ex Fe Ra
VARICOL 0.00 – 0.18 De Ex Fe Ra
0.18 – 0.36 De Ex Fe Ra
0.36 – 0.46 De,Ra Ex Fe
0.46 – 0.53 De,Ra Ex Fe
Superstruct 0.00 – 0.10 Ex De
0.10 – 0.18 De,Ex
0.18 – 0.24 De Ra
0.24 – 0.49 De Ex Fe Ra
0.49 – 0.49 De,Ex

12. Discretizations to MINLPs. In this section we provide AMPL code for two discretized variants of the control problems from Sections 3 and 4 as an illustration of the discretization of MIOCPs to MINLPs. More examples will be collected in the future on http://mintoc.de.
12.1. General AMPL code. In Listings 1 and 2 we provide two AMPL
input files that can be included for MIOCPs with one binary control w(t).
Listing 1
Generic settings AMPL model file to be included

param T > 0;         # End time
param nt > 0;        # Number of discretization points in time
param nu > 0;        # Number of control discretization points
param nx > 0;        # Dimension of differential state vector
param nt_per_u > 0;  # nt / nu
set I := 0..nt;
set U := 0..nu-1;
param uidx {I};  param fix_w;  param fix_dt;

var w {U} >= 0, <= 1 binary;  # control function
var dt {U} >= 0, <= T;        # stage length vector

Listing 2
Generic settings AMPL data file to be included

if (fix_w > 0)  then { for {i in U} { fix w[i]; } }
if (fix_dt > 0) then { for {i in U} { fix dt[i]; } }

# Set indices of controls corresponding to time points
for {i in 0..nu-1} {
  for {j in 0..nt_per_u-1} { let uidx[i*nt_per_u + j] := i; }
}
let uidx[nt] := nu-1;

12.2. Lotka Volterra fishing problem. The AMPL code in Listings 3 and 4 shows a discretization of the problem (4.1) with piecewise constant controls on an equidistant grid of length T/nu and with an implicit Euler method. Note that for other MIOCPs, especially for unstable ones as in Section 7, more advanced integration methods such as Backward Differentiation Formulae need to be applied.
Listing 3
AMPL model file for Lotka Volterra Fishing Problem

var x {I, 1..nx} >= 0;
param c1 > 0;  param c2 > 0;  param ref1 > 0;  param ref2 > 0;

minimize Deviation:
    0.5*(dt[0]/nt_per_u)    * ((x[0,1]-ref1)^2  + (x[0,2]-ref2)^2)
  + 0.5*(dt[nu-1]/nt_per_u) * ((x[nt,1]-ref1)^2 + (x[nt,2]-ref2)^2)
  + sum {i in I diff {0,nt}} ( (dt[uidx[i]]/nt_per_u) *
      ((x[i,1]-ref1)^2 + (x[i,2]-ref2)^2) );

subj to ODE_DISC_1 {i in I diff {0}}:
  x[i,1] = x[i-1,1] + (dt[uidx[i]]/nt_per_u) *
           ( x[i,1] - x[i,1]*x[i,2] - x[i,1]*c1*w[uidx[i]] );

subj to ODE_DISC_2 {i in I diff {0}}:
  x[i,2] = x[i-1,2] + (dt[uidx[i]]/nt_per_u) *
           ( -x[i,2] + x[i,1]*x[i,2] - x[i,2]*c2*w[uidx[i]] );

subj to overall_stage_length:
  sum {i in U} dt[i] = T;


Listing 4
AMPL dat file for Lotka Volterra Fishing Problem

# Algorithmic parameters
param nt_per_u := 100;  param nu := 100;  param nt := 10000;
param nx := 2;  param fix_w := 0;  param fix_dt := 1;

# Problem parameters
param T := 12.0;  param c1 := 0.4;  param c2 := 0.2;
param ref1 := 1.0;  param ref2 := 1.0;

# Initial values differential states
let x[0,1] := 0.5;  let x[0,2] := 0.7;
fix x[0,1];  fix x[0,2];

# Initial values control
let {i in U} w[i] := 0.0;
for {i in 0..(nu-1)/2} { let w[i*2] := 1.0; }
let {i in U} dt[i] := T/nu;

Note that the constraint overall_stage_length is only necessary when the value of fix_dt is zero, i.e., for a switching time optimization.
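A minimal driver script for combining the listings into a solvable instance might look as follows (the file names are our assumptions, not prescribed by the library; the statements themselves are standard AMPL):

model listing1.mod;     # generic declarations of Listing 1 (assumed file name)
model lotka.mod;        # Lotka Volterra model of Listing 3 (assumed file name)
data lotka.dat;         # parameter values and initial point of Listing 4
include listing2.inc;   # fixing logic and uidx setup of Listing 2
option solver bonmin;   # any MINLP solver registered with AMPL
solve;
display Deviation;

Note that the data file must be read before the include file, since Listing 2 references the parameters fix_w and fix_dt.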
The solution calculated by Bonmin (subversion revision number 1453,
default settings, 3 GHz, Linux 2.6.28-13-generic, with ASL(20081205)) has
an objective function value of Φ = 1.34434, while the optimum of the
relaxation is Φ = 1.3423368. Bonmin needs 35301 iterations and 2741 nodes
(4899.97 seconds). The intervals on the equidistant grid on which w(t) = 1
holds, counting from 0 to 99, are 20–32, 34, 36, 38, 40, 44, 53.
12.3. F-8 flight control. The main difficulty in calculating a time-
optimal solution for the problem in Section 3 is the determination of the
correct switching structure and of the switching points. If we want to
formulate a MINLP, we have to slightly modify this problem. Our aim is
not a minimization of the overall time, but now we want to get as close
as possible to the origin (0, 0, 0) in a prespecified time tf = 3.78086 on an
equidistant time grid. As this time grid is not a superset of the one used
for the time-optimal solution in Section 3, one cannot expect to reach the
target state exactly. Listings 5 and 6 show the AMPL code.
Listing 5
AMPL model file for F-8 Flight Control Problem

var x {I, 1..nx};
param xi > 0;

minimize Deviation: sum {i in 1..3} x[nt,i]*x[nt,i];

subj to ODE_DISC_1 {i in I diff {0}}:
  x[i,1] = x[i-1,1] + (dt[uidx[i]]/nt_per_u) * (
    - 0.877*x[i,1] + x[i,3] - 0.088*x[i,1]*x[i,3] + 0.47*x[i,1]*x[i,1]
    - 0.019*x[i,2]*x[i,2]
    - x[i,1]*x[i,1]*x[i,3] + 3.846*x[i,1]*x[i,1]*x[i,1]
    + 0.215*xi - 0.28*x[i,1]*x[i,1]*xi + 0.47*x[i,1]*xi^2 - 0.63*xi^3
    - 2*w[uidx[i]] * (0.215*xi - 0.28*x[i,1]*x[i,1]*xi - 0.63*xi^3) );

subj to ODE_DISC_2 {i in I diff {0}}:
  x[i,2] = x[i-1,2] + (dt[uidx[i]]/nt_per_u) * x[i,3];

subj to ODE_DISC_3 {i in I diff {0}}:
  x[i,3] = x[i-1,3] + (dt[uidx[i]]/nt_per_u) * (
    - 4.208*x[i,1] - 0.396*x[i,3] - 0.47*x[i,1]*x[i,1]
    - 3.564*x[i,1]*x[i,1]*x[i,1]
    + 20.967*xi - 6.265*x[i,1]*x[i,1]*xi + 46*x[i,1]*xi^2 - 61.4*xi^3
    - 2*w[uidx[i]] * (20.967*xi - 6.265*x[i,1]*x[i,1]*xi - 61.4*xi^3) );

Listing 6
AMPL dat file for F-8 Flight Control Problem

# Parameters
param nt_per_u := 500;  param nu := 60;  param nt := 30000;
param nx := 3;  param fix_w := 0;  param fix_dt := 1;
param xi := 0.05236;  param T := 8;

# Initial values differential states
let x[0,1] := 0.4655;
let x[0,2] := 0.0;
let x[0,3] := 0.0;
for {i in 1..3} { fix x[0,i]; }

# Initial values control
let {i in U} w[i] := 0.0;
for {i in 0..(nu-1)/2} { let w[i*2] := 1.0; }
let {i in U} dt[i] := 3.78086/nu;

The solution calculated by Bonmin has an objective function value of
Φ = 0.023405, while the optimum of the relaxation is Φ = 0.023079. Bonmin
needs 85702 iterations and 7031 nodes (64282 seconds). The intervals on
the equidistant grid on which w(t) = 1 holds, counting from 0 to 59, are 0,
1, 31, 32, 42, 52, and 54. This optimal solution is shown in Figure 10.

Fig. 10. Trajectories for the discretized F-8 flight control problem. Left: optimal
integer control. Right: corresponding differential states.
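The computations above follow the usual AMPL workflow; as a minimal sketch,
assuming the model and data of Listings 5 and 6 are saved as f8.mod and f8.dat
(the file names are illustrative):

model f8.mod;
data f8.dat;
option solver bonmin;  # any AMPL-hooked MINLP solver can be substituted
solve;
display Deviation;     # reported objective value: 0.023405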

13. Conclusions and outlook. We presented a collection of mixed-
integer optimal control problem descriptions. These descriptions comprise
details on the model and a specific instance of control objective, con-
straints, parameters, and initial values that yield well-posed optimization
problems that allow for reproducibility and comparison of solutions. Fur-
thermore, specific discretizations in time and space are applied with the
intention of supplying benchmark problems also to MINLP algorithm devel-
opers. The descriptions are complemented by references and best known
solutions. All problem formulations are or will be available for download
at http://mintoc.de in a suitable format, such as Optimica or AMPL.


The author hopes to achieve at least two things. First, to provide a
benchmark library that will be of use for both MIOC and MINLP algorithm
developers. Second, to motivate others to contribute to the extension of
this library. For example, challenging and well-posed instances from water
or gas networks [11, 45], traffic flow [31, 22], supply chain networks [27],
submarine control [51], distributed autonomous systems [1], and chemical
engineering [35, 60] would be highly interesting for the community.

Acknowledgements. Important contributions to the online resource
http://mintoc.de by Alexander Buchner, Michael Engelhart, Christian
Kirches, and Martin Schlüter are gratefully acknowledged.

REFERENCES

[1] P. Abichandani, H. Benson, and M. Kam, Multi-vehicle path coordination
under communication constraints, in American Control Conference, 2008,
pp. 650–656.
[2] W. Achtziger and C. Kanzow, Mathematical programs with vanishing con-
straints: optimality conditions and constraint qualifications, Mathematical
Programming Series A, 114 (2008), pp. 69–99.
[3] European Space Agency, GTOP database: Global optimisation trajectory problems and
solutions. http://www.esa.int/gsp/ACT/inf/op/globopt.htm.
[4] AT&T Bell Laboratories, University of Tennessee, and Oak Ridge National Labo-
ratory, Netlib linear programming library. http://www.netlib.org/lp/.
[5] B. Baumrucker and L. Biegler, MPEC strategies for optimization of a class
of hybrid dynamic systems, Journal of Process Control, 19 (2009),
pp. 1248–1256. Special Section on Hybrid Systems: Modeling, Simulation and
Optimization.
[6] B. Baumrucker, J. Renfro, and L. Biegler, MPEC problem formulations and
solution strategies with chemical engineering applications, Computers and
Chemical Engineering, 32 (2008), pp. 2903–2913.
[7] L. Biegler, Solution of dynamic optimization problems by successive quadratic
programming and orthogonal collocation, Computers and Chemical Engineer-
ing, 8 (1984), pp. 243–248.
[8] T. Binder, L. Blank, H. Bock, R. Bulirsch, W. Dahmen, M. Diehl, T. Kro-
nseder, W. Marquardt, J. Schlöder, and O. Stryk, Introduction to model
based optimization of chemical processes on moving horizons, in Online Opti-
mization of Large Scale Systems: State of the Art, M. Grötschel, S. Krumke,
and J. Rambau, eds., Springer, 2001, pp. 295–340.
[9] H. Bock and R. Longman, Computation of optimal controls on disjoint control
sets for minimum energy subway operation, in Proceedings of the American
Astronomical Society. Symposium on Engineering Science and Mechanics, Tai-
wan, 1982.
[10] H. Bock and K. Plitt, A Multiple Shooting algorithm for direct solu-
tion of optimal control problems, in Proceedings of the 9th IFAC World
Congress, Budapest, 1984, Pergamon Press, pp. 243–247. Available at
http://www.iwr.uni-heidelberg.de/groups/agbock/FILES/Bock1984.pdf.
[11] J. Burgschweiger, B. Gnädig, and M. Steinbach, Nonlinear programming tech-
niques for operative planning in large drinking water networks, The Open
Applied Mathematics Journal, 3 (2009), pp. 1–16.
[12] M. Bussieck, GAMS Performance World.
http://www.gamsworld.org/performance.


[13] M.R. Bussieck, A.S. Drud, and A. Meeraus, MINLPLib – a collection of test models
for mixed-integer nonlinear programming, INFORMS Journal on Computing, 15
(2003), pp. 114–119.
[14] B. Chachuat, A. Singer, and P. Barton, Global methods for dynamic opti-
mization and mixed-integer dynamic optimization, Industrial and Engineering
Chemistry Research, 45 (2006), pp. 8373–8392.
[15] CMU-IBM, Cyber-infrastructure for MINLP collaborative site. http://minlp.org.
[16] M. Diehl and A. Walther, A test problem for periodic optimal control algorithms,
tech. rep., ESAT/SISTA, K.U. Leuven, 2006.
[17] S. Engell and A. Toumi, Optimisation and control of chromatography, Comput-
ers and Chemical Engineering, 29 (2005), pp. 1243–1252.
[18] W. Esposito and C. Floudas, Deterministic global optimization in optimal con-
trol problems, Journal of Global Optimization, 17 (2000), pp. 97–126.
[19] European Network of Excellence Hybrid Control, Website.
http://www.ist-hycon.org/.
[20] B.C. Fabien, dsoa: Dynamic system optimization.
http://abs-5.me.washington.edu/noc/dsoa.html.
[21] A. Filippov, Differential equations with discontinuous right hand side, AMS
Transl., 42 (1964), pp. 199–231.
[22] A. Fügenschuh, M. Herty, A. Klar, and A. Martin, Combinatorial and contin-
uous models for the optimization of traffic flows on networks, SIAM Journal
on Optimization, 16 (2006), pp. 1155–1176.
[23] A. Fuller, Study of an optimum nonlinear control system, Journal of Electronics
and Control, 15 (1963), pp. 63–71.
[24] W. Garrard and J. Jordan, Design of nonlinear automatic control systems,
Automatica, 13 (1977), pp. 497–505.
[25] M. Gerdts, Solving mixed-integer optimal control problems by Branch&Bound:
A case study from automobile test-driving with gear shift, Optimal Control
Applications and Methods, 26 (2005), pp. 1–18.
[26] M. Gerdts, A variable time transformation method for mixed-integer optimal control
problems, Optimal Control Applications and Methods, 27 (2006), pp. 169–182.
[27] S. Göttlich, M. Herty, C. Kirchner, and A. Klar, Optimal control for con-
tinuous supply network models, Networks and Heterogenous Media, 1 (2007),
pp. 675–688.
[28] N. Gould, D. Orban, and P. Toint, CUTEr testing environment for optimiza-
tion and linear algebra solvers. http://cuter.rl.ac.uk/cuter-www/.
[29] I. Grossmann, Review of nonlinear mixed-integer and disjunctive programming
techniques, Optimization and Engineering, 3 (2002), pp. 227–252.
[30] I. Grossmann, P. Aguirre, and M. Barttfeld, Optimal synthesis of complex
distillation columns using rigorous models, Computers and Chemical Engi-
neering, 29 (2005), pp. 1203–1215.
[31] M. Gugat, M. Herty, A. Klar, and G. Leugering, Optimal control for traffic
flow networks, Journal of Optimization Theory and Applications, 126 (2005),
pp. 589–616.
[32] Tomlab Optimization Inc., PROPT – MATLAB optimal control software (DAE, ODE).
http://tomdyn.com/.
[33] A. Izmailov and M. Solodov, Mathematical programs with vanishing constraints:
Optimality conditions, sensitivity, and a relaxation method, Journal of Opti-
mization Theory and Applications, 142 (2009), pp. 501–532.
[34] Y. Kawajiri and L. Biegler, A nonlinear programming superstructure for opti-
mal dynamic operations of simulated moving bed processes, I&EC Research,
45 (2006), pp. 8503–8513.
[35] Y. Kawajiri and L. Biegler, Optimization strategies for Simulated Moving Bed and PowerFeed pro-
cesses, AIChE Journal, 52 (2006), pp. 1343–1350.
[36] C. Kaya and J. Noakes, A computational method for time-optimal control, Jour-
nal of Optimization Theory and Applications, 117 (2003), pp. 69–92.


[37] C. Kirches, S. Sager, H. Bock, and J. Schlöder, Time-optimal control of auto-
mobile test drives with gear shifts, Optimal Control Applications and Methods,
31 (2010), pp. 137–153.
[38] P. Krämer-Eis, Ein Mehrzielverfahren zur numerischen Berechnung optimaler
Feedback–Steuerungen bei beschränkten nichtlinearen Steuerungsproblemen,
Vol. 166 of Bonner Mathematische Schriften, Universität Bonn, Bonn, 1985.
[39] U. Kummer, L. Olsen, C. Dixon, A. Green, E. Bornberg-Bauer, and
G. Baier, Switching from simple to complex oscillations in calcium signal-
ing, Biophysical Journal, 79 (2000), pp. 1188–1195.
[40] L. Larsen, R. Izadi-Zamanabadi, R. Wisniewski, and C. Sonntag, Super-
market refrigeration systems – a benchmark for the optimal control of hy-
brid systems, tech. rep., Technical report for the HYCON NoE., 2007.
http://www.bci.tu-dortmund.de/ast/hycon4b/index.php.
[41] D. Lebiedz, S. Sager, H. Bock, and P. Lebiedz, Annihilation of limit cycle os-
cillations by identification of critical phase resetting stimuli via mixed-integer
optimal control methods, Physical Review Letters, 95 (2005), p. 108303.
[42] H. Lee, K. Teo, V. Rehbock, and L. Jennings, Control parametrization en-
hancing technique for time-optimal control problems, Dynamic Systems and
Applications, 6 (1997), pp. 243–262.
[43] D. Leineweber, Efficient reduced SQP methods for the optimization of chemi-
cal processes described by large sparse DAE models, Vol. 613 of Fortschritt-
Berichte VDI Reihe 3, Verfahrenstechnik, VDI Verlag, Düsseldorf, 1999.
[44] A. Martin, T. Achterberg, T. Koch, and G. Gamrath, MIPLIB – mixed integer
problem library. http://miplib.zib.de/.
[45] A. Martin, M. Möller, and S. Moritz, Mixed integer models for the stationary
case of gas network optimization, Mathematical Programming, 105 (2006),
pp. 563–582.
[46] J. Oldenburg, Logic–based modeling and optimization of discrete–continuous dy-
namic systems, Vol. 830 of Fortschritt-Berichte VDI Reihe 3, Verfahrens-
technik, VDI Verlag, Düsseldorf, 2005.
[47] J. Oldenburg and W. Marquardt, Disjunctive modeling for optimal con-
trol of hybrid systems, Computers and Chemical Engineering, 32 (2008),
pp. 2346–2364.
[48] I. Papamichail and C. Adjiman, Global optimization of dynamic systems, Com-
puters and Chemical Engineering, 28 (2004), pp. 403–415.
[49] L. Pontryagin, V. Boltyanski, R. Gamkrelidze, and E. Miscenko, The Math-
ematical Theory of Optimal Processes, Wiley, Chichester, 1962.
[50] A. Prata, J. Oldenburg, A. Kroll, and W. Marquardt, Integrated scheduling
and dynamic optimization of grade transitions for a continuous polymeriza-
tion reactor, Computers and Chemical Engineering, 32 (2008), pp. 463–476.
[51] V. Rehbock and L. Caccetta, Two defence applications involving discrete valued
optimal control, ANZIAM Journal, 44 (2002), pp. E33–E54.
[52] S. Sager, MIOCP benchmark site. http://mintoc.de.
[53] S. Sager, Numerical methods for mixed–integer optimal control problems, Der an-
dere Verlag, Tönning, Lübeck, Marburg, 2005. ISBN 3-89959-416-9. Available
at http://sager1.de/sebastian/downloads/Sager2005.pdf.
[54] S. Sager, Reformulations and algorithms for the optimization of switching de-
cisions in nonlinear optimal control, Journal of Process Control, 19 (2009),
pp. 1238–1247.
[55] S. Sager, H. Bock, M. Diehl, G. Reinelt, and J. Schlöder, Numerical methods
for optimal control with binary control functions applied to a Lotka-Volterra
type fishing problem, in Recent Advances in Optimization (Proceedings of
the 12th French-German-Spanish Conference on Optimization), A. Seeger,
ed., Vol. 563 of Lecture Notes in Economics and Mathematical Systems,
Heidelberg, 2006, Springer, pp. 269–289.


[56] S. Sager, M. Diehl, G. Singh, A. Küpper, and S. Engell, Determining SMB
superstructures by mixed-integer control, in Proceedings OR2006, K.-H. Wald-
mann and U. Stocker, eds., Karlsruhe, 2007, Springer, pp. 37–44.
[57] S. Sager, C. Kirches, and H. Bock, Fast solution of periodic optimal control
problems in automobile test-driving with gear shifts, in Proceedings of the 47th
IEEE Conference on Decision and Control (CDC 2008), Cancun, Mexico, 2008,
pp. 1563–1568. ISBN: 978-1-4244-3124-3.
[58] S. Sager, G. Reinelt, and H. Bock, Direct methods with maximal lower bound
for mixed-integer optimal control problems, Mathematical Programming, 118
(2009), pp. 109–149.
[59] K. Schittkowski, Test problems for nonlinear programming – user's guide, tech.
rep., Department of Mathematics, University of Bayreuth, 2002.
[60] C. Sonntag, O. Stursberg, and S. Engell, Dynamic optimization of an indus-
trial evaporator using graph search with embedded nonlinear programming, in
Proc. 2nd IFAC Conf. on Analysis and Design of Hybrid Systems (ADHS),
2006, pp. 211–216.
[61] B. Srinivasan, S. Palanki, and D. Bonvin, Dynamic Optimization of Batch Pro-
cesses: I. Characterization of the nominal solution, Computers and Chemical
Engineering, 27 (2003), pp. 1–26.
[62] M. Szymkat and A. Korytowski, The method of monotone structural evolution
for dynamic optimization of switched systems, in IEEE CDC08 Proceedings,
2008.
[63] M. Zelikin and V. Borisov, Theory of chattering control with applications to
astronautics, robotics, economics and engineering, Birkhäuser, Basel Boston
Berlin, 1994.

IMA HOT TOPICS WORKSHOP PARTICIPANTS

Mixed-Integer Nonlinear Optimization:
Algorithmic Advances and Applications

• Kurt M. Anstreicher, Department of Management Sciences, University of Iowa
• Pietro Belotti, Department of Mathematical Sciences, Clemson
University
• Hande Yurttan Benson, Department of Decision Sciences, Drexel
University
• Dimitris Bertsimas, Sloan School of Management, Massachusetts
Institute of Technology
• Lorenz T. Biegler, Chemical Engineering Department, Carnegie
Mellon University
• Christian Bliek, CPLEX R&D, ILOG Corporation
• Pierre Bonami, Laboratoire d’Informatique Fondamentale de Mar-
seille, Centre National de la Recherche Scientifique (CNRS)
• Samuel Burer, Department of Management Sciences, University of
Iowa
• Alfonso Cano, University of Minnesota
• Xianjin Chen, Institute for Mathematics and its Applications, Uni-
versity of Minnesota
• Claudia D’Ambrosio, Dipartimento di Elettronica, Informatica e
Sistemistica, Università di Bologna
• Michel Jacques Daydé, ENSEEIHT, Institut National Polytech-
nique de Toulouse
• Jesus Antonio De Loera, Department of Mathematics, University
of California, Davis
• Sarah Drewes, Mathematics, Research Group Nonlinear Optimiza-
tion, Technische Universität Darmstadt
• Ricardo Fukasawa, T.J. Watson Research Center, IBM
• Kevin Furman, Exxon Research and Engineering Company
• Weiguo Gao, School of Mathematical Sciences, Fudan University
• David M. Gay, Optimization and Uncertainty Estimation, Sandia
National Laboratories
• Philip E. Gill, Department of Mathematics, University of Califor-
nia, San Diego
• Ignacio E. Grossmann, Department of Chemicial Engineering, Car-
negie Mellon University


• Oktay Gunluk, Mathematical Sciences Department, IBM


• William E. Hart, Sandia National Laboratories
• David Haws, Department of Mathematics, University of California,
Davis
• Christoph Helmberg, Fakultät für Mathematik, Technische Uni-
versität Chemnitz-Zwickau
• Andreas G. Karabis, Department of Research and Development,
PI Medical Ltd.
• Markus Keel, Institute for Mathematics and its Applications, Uni-
versity of Minnesota
• Vassilio Kekatos, University of Minnesota
• Mustafa Rasim Kilinc, Department of Industrial Engineering, Uni-
versity of Pittsburgh
• Erica Zimmer Klampfl, Ford Research Laboratory, Ford
• Matthias Koeppe, Department of Mathematics, University of Cal-
ifornia, Davis
• Jon Lee, Mathematical Sciences Department, IBM T.J. Watson
Research Center
• Thomas Lehmann, Fachgruppe Informatik, Universität Bayreuth
• Sven Leyffer, Mathematics and Computer Science Division, Ar-
gonne National Laboratory
• Tong Li, Department of Mathematics, University of Iowa
• Yongfeng Li, Institute for Mathematics and its Applications,
University of Minnesota
• Leo Liberti, LIX, École Polytechnique
• Jeff Linderoth, Department of Industrial and Systems Engineering,
University of Wisconsin-Madison
• Chun Liu, Institute for Mathematics and its Applications, Univer-
sity of Minnesota
• Andrea Lodi, DEIS, Università di Bologna
• James Luedtke, Industrial and Systems Engineering Department,
University of Wisconsin-Madison
• Tom Luo, Department of Electrical and Computer Engineering,
University of Minnesota
• Francois Margot, Tepper School of Business, Carnegie Mellon Uni-
versity
• Susan Margulies, Department of Computational and Applied
Mathematics, Rice University
• Andrew James Miller, Department of Mathematics, Université de
Bordeaux I


• Kien Ming Ng, Department of Industrial and Systems Engineering,
National University of Singapore
• John E. Mitchell, Department of Mathematical Sciences, Rensse-
laer Polytechnic Institute
• Hans Mittelmann, Department of Mathematics and Statistics, Ari-
zona State University
• Todd S. Munson, Mathematics and Computer Science Division,
Argonne National Laboratory
• Mahdi Namazifar, Industrial and Systems Engineering Depart-
ment, University of Wisconsin-Madison
• Giacomo Nannicini, Laboratoire d’Informatique, École Polytech-
nique
• Jorge Nocedal, Department of Electrical Engineering and Com-
puter Science, Northwestern University
• Isamu Onishi, Department of Mathematical and Life Sciences, Hi-
roshima University
• Shmuel Onn, IE & M, Technion – Israel Institute of Technology
• Pablo A. Parrilo, Electrical Engineering and Computer Science
Department, Massachusetts Institute of Technology
• Jaroslav Pekar, Honeywell Prague Laboratory, Honeywell
• Jiming Peng, Department of Industrial and Enterprise System En-
gineering, University of Illinois at Urbana-Champaign
• Cynthia A. Phillips, Discrete Mathematics and Complex Systems
Department, Sandia National Laboratories
• Kashif Rashid, Schlumberger-Doll
• Franz F. Rendl, Institut für Mathematik, Universität Klagenfurt
• Kees Roos, Department of Information Systems and Algorithms,
Technische Universiteit te Delft
• Sebastian Sager, Interdisciplinary Center for Scientific Computing,
Ruprecht-Karls-Universität Heidelberg
• Fadil Santosa, Institute for Mathematics and its Applications, Uni-
versity of Minnesota
• Annick Sartenaer, Department of Mathematics, Facultés Universi-
taires Notre Dame de la Paix (Namur)
• Anureet Saxena, Axioma Inc.
• Uday V. Shanbhag, Industrial & Enterprise Systems Engineering
Department, University of Illinois at Urbana-Champaign
• Tamás Terlaky, Department of Industrial and Systems Engineer-
ing, Lehigh University
• Jon Van Laarhoven, Department of Applied Mathematics and
Computational Sciences, University of Iowa


• Stefan Vigerske, Department of Mathematics, Humboldt-Universität
• Andreas Wächter, Mathematical Sciences Department, IBM
• Richard A. Waltz, Department of Industrial and Systems Engi-
neering, University of Southern California
• Robert Weismantel, Department of Mathematical Optimization,
Otto-von-Guericke-Universität Magdeburg
• Tapio Westerlund, Department of Chemical Engineering, Åbo Aka-
demi (Finland-Swedish University of Åbo)
• Angelika Wiegele, Department of Mathematics, Universität Kla-
genfurt
• Fei Yang, Department of Biomedical Engineering, University of
Minnesota
• Hongchao Zhang, Department of Mathematics, Louisiana State
University

IMA ANNUAL PROGRAMS (continued)

1992–1993 Control Theory and its Applications


1993–1994 Emerging Applications of Probability
1994–1995 Waves and Scattering
1995–1996 Mathematical Methods in Material Science
1996–1997 Mathematics of High Performance Computing
1997–1998 Emerging Applications of Dynamical Systems
1998–1999 Mathematics in Biology
1999–2000 Reactive Flows and Transport Phenomena
2000–2001 Mathematics in Multimedia
2001–2002 Mathematics in the Geosciences
2002–2003 Optimization
2003–2004 Probability and Statistics in Complex Systems: Genomics,
Networks, and Financial Engineering
2004–2005 Mathematics of Materials and Macromolecules: Multiple
Scales, Disorder, and Singularities
2005–2006 Imaging
2006–2007 Applications of Algebraic Geometry
2007–2008 Mathematics of Molecular and Cellular Biology
2008–2009 Mathematics and Chemistry
2009–2010 Complex Fluids and Complex Flows
2010–2011 Simulating Our Complex World: Modeling, Computation
and Analysis
2011–2012 Mathematics of Information
2012–2013 Infinite Dimensional and Stochastic Dynamical Systems
and their Applications
2013–2014 Scientific and Engineering Applications of Algebraic Topology

IMA SUMMER PROGRAMS

1987 Robotics
1988 Signal Processing
1989 Robust Statistics and Diagnostics
1990 Radar and Sonar (June 18–29)
New Directions in Time Series Analysis (July 2–27)
1991 Semiconductors
1992 Environmental Studies: Mathematical, Computational, and
Statistical Analysis
1993 Modeling, Mesh Generation, and Adaptive Numerical Methods
for Partial Differential Equations
1994 Molecular Biology


1995 Large Scale Optimizations with Applications to Inverse Problems,
Optimal Control and Design, and Molecular and Structural Optimization
1996 Emerging Applications of Number Theory (July 15–26)
Theory of Random Sets (August 22–24)
1997 Statistics in the Health Sciences
1998 Coding and Cryptography (July 6–18)
Mathematical Modeling in Industry (July 22–31)
1999 Codes, Systems, and Graphical Models (August 2–13, 1999)
2000 Mathematical Modeling in Industry: A Workshop for Graduate
Students (July 19–28)
2001 Geometric Methods in Inverse Problems and PDE Control
(July 16–27)
2002 Special Functions in the Digital Age (July 22–August 2)
2003 Probability and Partial Differential Equations in Modern
Applied Mathematics (July 21–August 1)
2004 n-Categories: Foundations and Applications (June 7–18)
2005 Wireless Communications (June 22–July 1)
2006 Symmetries and Overdetermined Systems of Partial Differential
Equations (July 17–August 4)
2007 Classical and Quantum Approaches in Molecular Modeling
(July 23–August 3)
2008 Geometrical Singularities and Singular Geometries (July 14–25)
2009 Nonlinear Conservation Laws and Applications (July 13–31)

IMA “HOT TOPICS/SPECIAL” WORKSHOPS

• Challenges and Opportunities in Genomics: Production, Storage,
Mining and Use, April 24–27, 1999
• Decision Making Under Uncertainty: Energy and Environmental
Models, July 20–24, 1999
• Analysis and Modeling of Optical Devices, September 9–10, 1999
• Decision Making under Uncertainty: Assessment of the Reliability
of Mathematical Models, September 16–17, 1999
• Scaling Phenomena in Communication Networks, October 22–24,
1999
• Text Mining, April 17–18, 2000
• Mathematical Challenges in Global Positioning Systems (GPS),
August 16–18, 2000
• Modeling and Analysis of Noise in Integrated Circuits and Systems,
August 29–30, 2000
• Mathematics of the Internet: E-Auction and Markets, December
3–5, 2000
• Analysis and Modeling of Industrial Jetting Processes, January
10–13, 2001


• Special Workshop: Mathematical Opportunities in Large-Scale
Network Dynamics, August 6–7, 2001
• Wireless Networks, August 8–10, 2001
• Numerical Relativity, June 24–29, 2002
• Operational Modeling and Biodefense: Problems, Techniques, and
Opportunities, September 28, 2002
• Data-driven Control and Optimization, December 4–6, 2002
• Agent Based Modeling and Simulation, November 3–6, 2003
• Enhancing the Search of Mathematics, April 26–27, 2004
• Compatible Spatial Discretizations for Partial Differential Equa-
tions, May 11–15, 2004
• Adaptive Sensing and Multimode Data Inversion, June 27–30, 2004
• Mixed Integer Programming, July 25–29, 2005
• New Directions in Probability Theory, August 5–6, 2005
• Negative Index Materials, October 2–4, 2006
• The Evolution of Mathematical Communication in the Age of Dig-
ital Libraries, December 8–9, 2006
• Math is Cool! and Who Wants to Be a Mathematician?, November
3, 2006
• Special Workshop: Blackwell-Tapia Conference, November 3–4,
2006
• Stochastic Models for Intracellular Reaction Networks, May 11–13,
2008
• Multi-Manifold Data Modeling and Applications, October 27–30,
2008
• Mixed-Integer Nonlinear Optimization: Algorithmic Advances and
Applications, November 17–21, 2008
• Higher Order Geometric Evolution Equations: Theory and Appli-
cations from Microfluidics to Image Understanding, March 23–26,
2009
• Career Options for Women in Mathematical Sciences, April 2–4,
2009
• MOLCAS, May 4–8, 2009
• IMA Interdisciplinary Research Experience for Undergraduates,
June 29–July 31, 2009
• Research in Imaging Sciences, October 5–7, 2009
• Career Options for Underrepresented Groups in Mathematical Sci-
ences, March 25–27, 2010
• Physical Knotting and Linking and its Applications, April 9, 2010
• IMA Interdisciplinary Research Experience for Undergraduates,
June 14–July 16, 2010
• Kickoff Workshop for Project MOSAIC, June 30–July 2, 2010
• Finite Element Circus Featuring a Scientific Celebration of Falk,
Pasciak, and Wahlbin, November 5–6, 2010


• Integral Equation Methods, Fast Algorithms and Applications,
August 2–5, 2010
• Medical Device-Biological Interactions at the Material-Tissue In-
terface, September 13–15, 2010
• First Abel Conference – A Mathematical Celebration of John Tate,
January 3–5, 2011
• Strain Induced Shape Formation: Analysis, Geometry and Mate-
rials Science, May 16–20, 2011
• Uncertainty Quantification in Industrial and Energy Applications:
Experiences and Challenges, June 2–4, 2011
• Girls and Mathematics Summer Day Program, June 20–24, 2011
• Special Workshop: Wavelets and Applications: A Multi-Disciplinary
Undergraduate Course with an Emphasis on Scientific Computing,
July 13–16, 2011
• Special Workshop: Wavelets and Applications: Project Building
Workshop, July 13–16, 2011
• Macaulay2, July 25–29, 2011
• Instantaneous Frequencies and Trends for Nonstationary Nonlinear
Data, September 7–9, 2011

SPRINGER LECTURE NOTES FROM THE IMA

The Mathematics and Physics of Disordered Media
Editors: Barry Hughes and Barry Ninham
(Lecture Notes in Math., Volume 1035, 1983)
Orienting Polymers
Editor: J.L. Ericksen
(Lecture Notes in Math., Volume 1063, 1984)
New Perspectives in Thermodynamics
Editor: James Serrin
(Springer-Verlag, 1986)
Models of Economic Dynamics
Editor: Hugo Sonnenschein
(Lecture Notes in Econ., Volume 264, 1986)


THE IMA VOLUMES
IN MATHEMATICS AND ITS APPLICATIONS

Volume 1: Homogenization and Effective Moduli of Materials and Media
Editors: Jerry Ericksen, David Kinderlehrer, Robert Kohn, and J.-L. Lions

Volume 2: Oscillation Theory, Computation, and Methods of Compensated Compactness
Editors: Constantine Dafermos, Jerry Ericksen, David Kinderlehrer, and Marshall Slemrod

Volume 3: Metastability and Incompletely Posed Problems
Editors: Stuart Antman, Jerry Ericksen, David Kinderlehrer, and Ingo Muller

Volume 4: Dynamical Problems in Continuum Physics
Editors: Jerry Bona, Constantine Dafermos, Jerry Ericksen, and David Kinderlehrer

Volume 5: Theory and Applications of Liquid Crystals
Editors: Jerry Ericksen and David Kinderlehrer

Volume 6: Amorphous Polymers and Non-Newtonian Fluids
Editors: Constantine Dafermos, Jerry Ericksen, and David Kinderlehrer

Volume 7: Random Media
Editor: George Papanicolaou

Volume 8: Percolation Theory and Ergodic Theory of Infinite Particle Systems
Editor: Harry Kesten

Volume 9: Hydrodynamic Behavior and Interacting Particle Systems
Editor: George Papanicolaou

Volume 10: Stochastic Differential Systems, Stochastic Control Theory, and Applications
Editors: Wendell Fleming and Pierre-Louis Lions

Volume 11: Numerical Simulation in Oil Recovery
Editor: Mary Fanett Wheeler

Volume 12: Computational Fluid Dynamics and Reacting Gas Flows
Editors: Bjorn Engquist, M. Luskin, and Andrew Majda

Volume 13: Numerical Algorithms for Parallel Computer Architectures
Editor: Martin H. Schultz

Volume 14: Mathematical Aspects of Scientific Software
Editor: J.R. Rice

Volume 15: Mathematical Frontiers in Computational Chemical Physics
Editor: D. Truhlar

Volume 16: Mathematics in Industrial Problems
by Avner Friedman

Volume 17: Applications of Combinatorics and Graph Theory to the Biological and Social Sciences
Editor: Fred Roberts

Volume 18: q-Series and Partitions
Editor: Dennis Stanton

Volume 19: Invariant Theory and Tableaux
Editor: Dennis Stanton

Volume 20: Coding Theory and Design Theory Part I: Coding Theory
Editor: Dijen Ray-Chaudhuri

Volume 21: Coding Theory and Design Theory Part II: Design Theory
Editor: Dijen Ray-Chaudhuri

Volume 22: Signal Processing: Part I - Signal Processing Theory
Editors: L. Auslander, F.A. Grünbaum, J.W. Helton, T. Kailath, P. Khargonekar, and S. Mitter

Volume 23: Signal Processing: Part II - Control Theory and Applications of Signal Processing
Editors: L. Auslander, F.A. Grünbaum, J.W. Helton, T. Kailath, P. Khargonekar, and S. Mitter

Volume 24: Mathematics in Industrial Problems, Part II
by Avner Friedman

Volume 25: Solitons in Physics, Mathematics, and Nonlinear Optics
Editors: Peter J. Olver and David H. Sattinger

Volume 26: Two Phase Flows and Waves
Editors: Daniel D. Joseph and David G. Schaeffer

Volume 27: Nonlinear Evolution Equations that Change Type
Editors: Barbara Lee Keyfitz and Michael Shearer

Volume 28: Computer Aided Proofs in Analysis
Editors: Kenneth Meyer and Dieter Schmidt

Volume 29: Multidimensional Hyperbolic Problems and Computations
Editors: Andrew Majda and Jim Glimm

Volume 30: Microlocal Analysis and Nonlinear Waves
Editors: Michael Beals, R. Melrose, and J. Rauch

Volume 31: Mathematics in Industrial Problems, Part III
by Avner Friedman

Volume 32: Radar and Sonar, Part I
by Richard Blahut, Willard Miller, Jr., and Calvin Wilcox

Volume 33: Directions in Robust Statistics and Diagnostics: Part I
Editors: Werner A. Stahel and Sanford Weisberg

Volume 34: Directions in Robust Statistics and Diagnostics: Part II
Editors: Werner A. Stahel and Sanford Weisberg

Volume 35: Dynamical Issues in Combustion Theory
Editors: P. Fife, A. Liñán, and F.A. Williams

Volume 36: Computing and Graphics in Statistics
Editors: Andreas Buja and Paul Tukey

Volume 37: Patterns and Dynamics in Reactive Media
Editors: Harry Swinney, Gus Aris, and Don Aronson

Volume 38: Mathematics in Industrial Problems, Part IV
by Avner Friedman

Volume 39: Radar and Sonar, Part II
Editors: F. Alberto Grünbaum, Marvin Bernfeld, and Richard E. Blahut

Volume 40: Nonlinear Phenomena in Atmospheric and Oceanic Sciences
Editors: George F. Carnevale and Raymond T. Pierrehumbert

Volume 41: Chaotic Processes in the Geological Sciences
Editor: David A. Yuen

Volume 42: Partial Differential Equations with Minimal Smoothness and Applications
Editors: B. Dahlberg, E. Fabes, R. Fefferman, D. Jerison, C. Kenig, and J. Pipher

Volume 43: On the Evolution of Phase Boundaries
Editors: Morton E. Gurtin and Geoffrey B. McFadden

Volume 44: Twist Mappings and Their Applications
Editors: Richard McGehee and Kenneth R. Meyer

Volume 45: New Directions in Time Series Analysis, Part I
Editors: David Brillinger, Peter Caines, John Geweke, Emanuel Parzen, Murray Rosenblatt, and Murad S. Taqqu

Volume 46: New Directions in Time Series Analysis, Part II
Editors: David Brillinger, Peter Caines, John Geweke, Emanuel Parzen, Murray Rosenblatt, and Murad S. Taqqu

Volume 47: Degenerate Diffusions
Editors: Wei-Ming Ni, L.A. Peletier, and J.-L. Vazquez

Volume 48: Linear Algebra, Markov Chains, and Queueing Models
Editors: Carl D. Meyer and Robert J. Plemmons

Volume 49: Mathematics in Industrial Problems, Part V
by Avner Friedman

Volume 50: Combinatorial and Graph-Theoretic Problems in Linear Algebra
Editors: Richard A. Brualdi, Shmuel Friedland, and Victor Klee

Volume 51: Statistical Thermodynamics and Differential Geometry of Microstructured Materials
Editors: H. Ted Davis and Johannes C.C. Nitsche

Volume 52: Shock Induced Transitions and Phase Structures in General Media
Editors: J.E. Dunn, Roger Fosdick, and Marshall Slemrod

Volume 53: Variational and Free Boundary Problems
Editors: Avner Friedman and Joel Spruck

Volume 54: Microstructure and Phase Transitions
Editors: David Kinderlehrer, Richard James, Mitchell Luskin, and Jerry L. Ericksen

Volume 55: Turbulence in Fluid Flows: A Dynamical Systems Approach
Editors: George R. Sell, Ciprian Foias, and Roger Temam

Volume 56: Graph Theory and Sparse Matrix Computation
Editors: Alan George, John R. Gilbert, and Joseph W.H. Liu

Volume 57: Mathematics in Industrial Problems, Part VI
by Avner Friedman

Volume 58: Semiconductors, Part I
Editors: W.M. Coughran, Jr., Julian Cole, Peter Lloyd, and Jacob White

Volume 59: Semiconductors, Part II
Editors: W.M. Coughran, Jr., Julian Cole, Peter Lloyd, and Jacob White

Volume 60: Recent Advances in Iterative Methods
Editors: Gene Golub, Anne Greenbaum, and Mitchell Luskin

Volume 61: Free Boundaries in Viscous Flows
Editors: Robert A. Brown and Stephen H. Davis

Volume 62: Linear Algebra for Control Theory
Editors: Paul Van Dooren and Bostwick Wyman

Volume 63: Hamiltonian Dynamical Systems: History, Theory, and Applications
Editors: H.S. Dumas, K.R. Meyer, and D.S. Schmidt

Volume 64: Systems and Control Theory for Power Systems
Editors: Joe H. Chow, Petar V. Kokotovic, and Robert J. Thomas

Volume 65: Mathematical Finance
Editors: Mark H.A. Davis, Darrell Duffie, Wendell H. Fleming, and Steven E. Shreve

Volume 66: Robust Control Theory
Editors: Bruce A. Francis and Pramod P. Khargonekar

Volume 67: Mathematics in Industrial Problems, Part VII
by Avner Friedman

Volume 68: Flow Control
Editor: Max D. Gunzburger

Volume 69: Linear Algebra for Signal Processing
Editors: Adam Bojanczyk and George Cybenko

Volume 70: Control and Optimal Design of Distributed Parameter Systems
Editors: John E. Lagnese, David L. Russell, and Luther W. White

Volume 71: Stochastic Networks
Editors: Frank P. Kelly and Ruth J. Williams

Volume 72: Discrete Probability and Algorithms
Editors: David Aldous, Persi Diaconis, Joel Spencer, and J. Michael Steele

Volume 73: Discrete Event Systems, Manufacturing Systems, and Communication Networks
Editors: P.R. Kumar and P.P. Varaiya

Volume 74: Adaptive Control, Filtering, and Signal Processing
Editors: K.J. Åström, G.C. Goodwin, and P.R. Kumar

Volume 75: Modeling, Mesh Generation, and Adaptive Numerical Methods for Partial Differential Equations
Editors: Ivo Babuska, Joseph E. Flaherty, William D. Henshaw, John E. Hopcroft, Joseph E. Oliger, and Tayfun Tezduyar

Volume 76: Random Discrete Structures
Editors: David Aldous and Robin Pemantle

Volume 77: Nonlinear Stochastic PDE’s: Hydrodynamic Limit and Burgers’ Turbulence
Editors: Tadahisa Funaki and Wojbor A. Woyczynski

Volume 78: Nonsmooth Analysis and Geometric Methods in Deterministic Optimal Control
Editors: Boris S. Mordukhovich and Hector J. Sussmann

Volume 79: Environmental Studies: Mathematical, Computational, and Statistical Analysis
Editor: Mary Fanett Wheeler

Volume 80: Image Models (and their Speech Model Cousins)
Editors: Stephen E. Levinson and Larry Shepp

Volume 81: Genetic Mapping and DNA Sequencing
Editors: Terry Speed and Michael S. Waterman

Volume 82: Mathematical Approaches to Biomolecular Structure and Dynamics
Editors: Jill P. Mesirov, Klaus Schulten, and De Witt Sumners

Volume 83: Mathematics in Industrial Problems, Part VIII
by Avner Friedman

Volume 84: Classical and Modern Branching Processes
Editors: Krishna B. Athreya and Peter Jagers

Volume 85: Stochastic Models in Geosystems
Editors: Stanislav A. Molchanov and Wojbor A. Woyczynski

Volume 86: Computational Wave Propagation
Editors: Bjorn Engquist and Gregory A. Kriegsmann

Volume 87: Progress in Population Genetics and Human Evolution
Editors: Peter Donnelly and Simon Tavaré

Volume 88: Mathematics in Industrial Problems, Part 9
by Avner Friedman

Volume 89: Multiparticle Quantum Scattering with Applications to Nuclear, Atomic and Molecular Physics
Editors: Donald G. Truhlar and Barry Simon

Volume 90: Inverse Problems in Wave Propagation
Editors: Guy Chavent, George Papanicolaou, Paul Sacks, and William Symes

Volume 91: Singularities and Oscillations
Editors: Jeffrey Rauch and Michael Taylor

Volume 92: Large-Scale Optimization with Applications, Part I: Optimization in Inverse Problems and Design
Editors: Lorenz T. Biegler, Thomas F. Coleman, Andrew R. Conn, and Fadil Santosa

Volume 93: Large-Scale Optimization with Applications, Part II: Optimal Design and Control
Editors: Lorenz T. Biegler, Thomas F. Coleman, Andrew R. Conn, and Fadil Santosa

Volume 94: Large-Scale Optimization with Applications, Part III: Molecular Structure and Optimization
Editors: Lorenz T. Biegler, Thomas F. Coleman, Andrew R. Conn, and Fadil Santosa

Volume 95: Quasiclassical Methods
Editors: Jeffrey Rauch and Barry Simon

Volume 96: Wave Propagation in Complex Media
Editor: George Papanicolaou

Volume 97: Random Sets: Theory and Applications
Editors: John Goutsias, Ronald P.S. Mahler, and Hung T. Nguyen

Volume 98: Particulate Flows: Processing and Rheology
Editors: Donald A. Drew, Daniel D. Joseph, and Stephen L. Passman

Volume 99: Mathematics of Multiscale Materials
Editors: Kenneth M. Golden, Geoffrey R. Grimmett, Richard D. James, Graeme W. Milton, and Pabitra N. Sen

Volume 100: Mathematics in Industrial Problems, Part 10
by Avner Friedman

Volume 101: Nonlinear Optical Materials
Editor: Jerome V. Moloney

Volume 102: Numerical Methods for Polymeric Systems
Editor: Stuart G. Whittington

Volume 103: Topology and Geometry in Polymer Science
Editors: Stuart G. Whittington, De Witt Sumners, and Timothy Lodge

Volume 104: Essays on Mathematical Robotics
Editors: John Baillieul, Shankar S. Sastry, and Hector J. Sussmann

Volume 105: Algorithms for Parallel Processing
Editors: Robert S. Schreiber, Michael T. Heath, and Abhiram Ranade

Volume 106: Parallel Processing of Discrete Problems
Editor: Panos Pardalos

Volume 107: The Mathematics of Information Coding, Extraction, and Distribution
Editors: George Cybenko, Dianne O’Leary, and Jorma Rissanen

Volume 108: Rational Drug Design
Editors: Donald G. Truhlar, W. Jeffrey Howe, Anthony J. Hopfinger, Jeff Blaney, and Richard A. Dammkoehler

Volume 109: Emerging Applications of Number Theory
Editors: Dennis A. Hejhal, Joel Friedman, Martin C. Gutzwiller, and Andrew M. Odlyzko

Volume 110: Computational Radiology and Imaging: Therapy and Diagnostics
Editors: Christoph Börgers and Frank Natterer

Volume 111: Evolutionary Algorithms
Editors: Lawrence David Davis, Kenneth De Jong, Michael D. Vose, and L. Darrell Whitley

Volume 112: Statistics in Genetics
Editors: M. Elizabeth Halloran and Seymour Geisser

Volume 113: Grid Generation and Adaptive Algorithms
Editors: Marshall Bern, Joseph E. Flaherty, and Mitchell Luskin

Volume 114: Diagnosis and Prediction
Editor: Seymour Geisser

Volume 115: Pattern Formation in Continuous and Coupled Systems: A Survey Volume
Editors: Martin Golubitsky, Dan Luss, and Steven H. Strogatz

Volume 116: Statistical Models in Epidemiology, the Environment and Clinical Trials
Editors: M. Elizabeth Halloran and Donald Berry

Volume 117: Structured Adaptive Mesh Refinement (SAMR) Grid Methods
Editors: Scott B. Baden, Nikos P. Chrisochoides, Dennis B. Gannon, and Michael L. Norman

Volume 118: Dynamics of Algorithms
Editors: Rafael de la Llave, Linda R. Petzold, and Jens Lorenz

Volume 119: Numerical Methods for Bifurcation Problems and Large-Scale Dynamical Systems
Editors: Eusebius Doedel and Laurette S. Tuckerman

Volume 120: Parallel Solution of Partial Differential Equations
Editors: Petter Bjørstad and Mitchell Luskin

Volume 121: Mathematical Models for Biological Pattern Formation
Editors: Philip K. Maini and Hans G. Othmer

Volume 122: Multiple-Time-Scale Dynamical Systems
Editors: Christopher K.R.T. Jones and Alexander Khibnik

Volume 123: Codes, Systems, and Graphical Models
Editors: Brian Marcus and Joachim Rosenthal

Volume 124: Computational Modeling in Biological Fluid Dynamics
Editors: Lisa J. Fauci and Shay Gueron

Volume 125: Mathematical Approaches for Emerging and Reemerging Infectious Diseases: An Introduction
Editors: Carlos Castillo-Chavez with Sally Blower, Pauline van den Driessche, Denise Kirschner, and Abdul-Aziz Yakubu

Volume 126: Mathematical Approaches for Emerging and Reemerging Infectious Diseases: Models, Methods and Theory
Editors: Carlos Castillo-Chavez with Sally Blower, Pauline van den Driessche, Denise Kirschner, and Abdul-Aziz Yakubu

Volume 127: Mathematics of the Internet: E-Auction and Markets
Editors: Brenda Dietrich and Rakesh V. Vohra

Volume 128: Decision Making Under Uncertainty: Energy and Power
Editors: Claude Greengard and Andrzej Ruszczynski

Volume 129: Membrane Transport and Renal Physiology
Editors: Harold E. Layton and Alan M. Weinstein

Volume 130: Atmospheric Modeling
Editors: David P. Chock and Gregory R. Carmichael

Volume 131: Resource Recovery, Confinement, and Remediation of Environmental Hazards
Editors: John Chadam, Al Cunningham, Richard E. Ewing, Peter Ortoleva, and Mary Fanett Wheeler

Volume 132: Fractals in Multimedia
Editors: Michael F. Barnsley, Dietmar Saupe, and Edward Vrscay

Volume 133: Mathematical Methods in Computer Vision
Editors: Peter J. Olver and Allen Tannenbaum

Volume 134: Mathematical Systems Theory in Biology, Communications, Computation, and Finance
Editors: Joachim Rosenthal and David S. Gilliam

Volume 135: Transport in Transition Regimes
Editors: Naoufel Ben Abdallah, Anton Arnold, Pierre Degond, Irene Gamba, Robert Glassey, C. David Levermore, and Christian Ringhofer

Volume 136: Dispersive Transport Equations and Multiscale Models
Editors: Naoufel Ben Abdallah, Anton Arnold, Pierre Degond, Irene Gamba, Robert Glassey, C. David Levermore, and Christian Ringhofer

Volume 137: Geometric Methods in Inverse Problems and PDE Control
Editors: Christopher B. Croke, Irena Lasiecka, Gunther Uhlmann, and Michael S. Vogelius

Volume 138: Mathematical Foundations of Speech and Language Processing
Editors: Mark Johnson, Sanjeev Khudanpur, Mari Ostendorf, and Roni Rosenfeld

Volume 139: Time Series Analysis and Applications to Geophysical Systems
Editors: David R. Brillinger, Enders Anthony Robinson, and Frederic Paik Schoenberg

Volume 140: Probability and Partial Differential Equations in Modern Applied Mathematics
Editors: Jinqiao Duan and Edward C. Waymire

Volume 141: Modeling of Soft Matter
Editors: Maria-Carme T. Calderer and Eugene M. Terentjev

Volume 142: Compatible Spatial Discretizations
Editors: Douglas N. Arnold, Pavel B. Bochev, Richard B. Lehoucq, Roy A. Nicolaides, and Mikhail Shashkov

Volume 143: Wireless Communications
Editors: Prathima Agrawal, Daniel Matthew Andrews, Philip J. Fleming, George Yin, and Lisa Zhang

Volume 144: Symmetries and Overdetermined Systems of Partial Differential Equations
Editors: Michael Eastwood and Willard Miller, Jr.

Volume 145: Topics in Stochastic Analysis and Nonparametric Estimation
Editors: Pao-Liu Chow, Boris Mordukhovich, and George Yin

Volume 146: Algorithms in Algebraic Geometry
Editors: Alicia Dickenstein, Frank-Olaf Schreyer, and Andrew J. Sommese

Volume 147: Symmetric Functionals on Random Matrices and Random Matchings Problems
by Grzegorz A. Rempala and Jacek Wesolowski

Volume 148: Software for Algebraic Geometry
Editors: Michael E. Stillman, Nobuki Takayama, and Jan Verschelde

Volume 149: Emerging Applications of Algebraic Geometry
Editors: Mihai Putinar and Seth Sullivant

Volume 150: Mathematics of DNA Structure, Function, and Interactions
Editors: Craig John Benham, Stephen Harvey, Wilma K. Olson, De Witt L. Sumners, and David Swigon

Volume 151: Nonlinear Computational Geometry
Editors: Ioannis Z. Emiris, Frank Sottile, and Thorsten Theobald

Volume 152: Towards Higher Categories
Editors: John C. Baez and J. Peter May

Volume 153: Nonlinear Conservation Laws and Applications
Editors: Alberto Bressan, Gui-Qiang Chen, Marta Lewicka, and Dehua Wang

Volume 154: Mixed Integer Nonlinear Programming
Editors: Jon Lee and Sven Leyffer
