
Lecture Notes in Computer Science 6080

Commenced Publication in 1973


Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Friedrich Eisenbrand, F. Bruce Shepherd (Eds.)

Integer Programming
and Combinatorial
Optimization

14th International Conference, IPCO 2010


Lausanne, Switzerland, June 9-11, 2010
Proceedings

Volume Editors

Friedrich Eisenbrand
École Polytechnique Fédérale de Lausanne
Institute of Mathematics
1015 Lausanne, Switzerland
E-mail: [email protected]

F. Bruce Shepherd
McGill University
Department of Mathematics and Statistics
805 Sherbrooke West, Montreal, Quebec, H3A 2K6, Canada
E-mail: [email protected]

Library of Congress Control Number: 2010926408

CR Subject Classification (1998): F.2, E.1, I.3.5, G.2, G.1.6, F.2.2

LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

ISSN 0302-9743
ISBN-10 3-642-13035-6 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13035-9 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper 06/3180
Preface

The idea of a refereed conference for the mathematical programming community
was proposed by Ravi Kannan and William Pulleyblank to the Mathematical
Programming Society (MPS) in the late 1980s. Thus IPCO was born, and MPS
has sponsored the conference as one of its main events since IPCO I at the
University of Waterloo in 1990. The conference has become the main forum for
recent results in Integer Programming and Combinatorial Optimization in the
non-Symposium years.
This volume compiles the papers presented at IPCO XIV held June 9-11,
2010, at EPFL in Lausanne. The scope of papers considered for IPCO XIV is
likely broader than at IPCO I. This is sometimes due to the wealth of new
questions and directions brought from related areas. It can also be due to the
successful application of “math programming” techniques to models not tradi-
tionally considered. In any case, the interest in IPCO is greater than ever and
this is reflected in both the number (135) and quality of the submissions. The
Programme Committee with 13 members was also IPCO’s largest. We thank the
members of the committee, as well as their sub-reviewers, for their exceptional
(and time-consuming) work, especially during the online committee meeting
held during January. The process resulted in the selection of 34 excellent research
papers, which were presented in non-parallel sessions over three days in Lau-
sanne. Unavoidably, this meant that many excellent submissions could not be
included. As is typical, we expect to see full versions of many
of the IPCO papers in scientific journals in the not too distant future. Finally,
a sincere thanks to all authors who submitted their current research to IPCO.
It is this support that determines the excellence of the conference.

March 2010 Friedrich Eisenbrand


Bruce Shepherd
Conference Organization

Programme Committee
Alper Atamtürk UC Berkeley
David Avis McGill
Friedrich Eisenbrand EPFL
Marcos Goycoolea Adolfo Ibañez
Oktay Günlük IBM
Satoru Iwata Kyoto
Tamás Király Eötvös Budapest
François Margot CMU
Bruce Shepherd (Chair) McGill
Levent Tunçel Waterloo
Santosh Vempala Georgia Tech
Peter Winkler Dartmouth
Neal E. Young UC Riverside

Local Organization
Michel Bierlaire
Jocelyne Blanc
Friedrich Eisenbrand (Chair)
Thomas Liebling
Martin Niemeier
Thomas Rothvoß
Laura Sanità

External Reviewers
Tobias Achterberg
Ernst Althaus
Reid Andersen
Matthew Andrews
Elliot Anshelevich
Gary Au
Mourad Baiou
Nina Balcan
Nikhil Bansal
Andre Berger
Attila Bernáth
Dan Bienstock
John Birge
Jaroslaw Byrka
Alberto Caprara
Deeparnab Chakrabarty
Chandra Chekuri
Kevin Cheung
Marek Chrobak
Jose Coelho de Pina
Michelangelo Conforti
Miguel Constantino
Jose Correa
Sanjeeb Dash
Santanu Dey
David Eppstein
Daniel Espinoza
Guy Even
Uriel Feige
Zsolt Fekete
Christina Fernandes
Carlo Filippi
Samuel Fiorini
Nathan Fisher
Lisa Fleischer
Keith Frikken
Tetsuya Fujie
Toshihiro Fujito
Ricardo Fukasawa
Joao Gouveia
Marcos Goycoolea
Fabrizio Grandoni
Bertrand Guenin
Dave Hartvigsen
Christoph Helmberg
Hiroshi Hirai
Dorit Hochbaum
Chien-Chung Huang
Cor Hurkens
Sungjin Im
Nicole Immorlica
Toshimasa Ishii
Takehiro Ito
Garud Iyengar
Kamal Jain
Klaus Jansen
David Johnson
Tibor Jordan
Vincent Jost
Alpár Jüttner
Satyen Kale
George Karakostas
Anna Karlin
Sean Kennedy
Rohit Khandekar
Sanjeev Khanna
Samir Khuller
Shuji Kijima
Zoltan Kiraly
Tamas Kis
Robert Kleinberg
Yusuke Kobayashi
Jochen Könemann
Lingchen Kong
Nitish Korula
Christos Koufogiannakis
Erika Kovacs
Marek Krcal
Sven Krumke
Simge Kucukyavuz
Lap Chi Lau
Monique Laurent
Adam Letchford
Asaf Levin
Sven Leyffer
Christian Liebchen
Jeff Linderoth
Quentin Louveaux
James Luedtke
Avner Magen
Dániel Marx
Monaldo Mastrolilli
Kurt Mehlhorn
Zoltan Miklos
Hiroyoshi Miwa
Atefeh Mohajeri
Eduardo Moreno
Yiannis Mourtos
Kiyohito Nagano
Arkadi Nemirovski
Martin Niemeier
Neil Olver
Gianpaolo Oriolo
Gyula Pap
Julia Pap
Gabor Pataki
Sebastian Pokutta
Imre Polik
David Pritchard
Kirk Pruhs
Linxia Qin
Maurice Queyranne
R Ravi
Gerhard Reinelt
Thomas Rothvoß
Laura Sanità
Andreas S. Schulz
Andras Sebo
David Shmoys
Marcel Silva
Mohit Singh
Christian Sommer
Gregory Sorkin
Frits Spieksma
Clifford Stein
Ruediger Stephan
Nicolas Stier-Moses
Zoya Svitkina
Chaitanya Swamy
Jacint Szabo
Tamas Szantai
Tami Tamir
Torsten Tholey
Rekha Thomas
László Végh
Juan Vera
Adrian Vetta
Juan Pablo Vielma
Jan Vondrak
David Wagner
Gerhard Woeginger
Mihalis Yannakakis
Giacomo Zambelli
Rico Zenklusen
Miklos Zoltan
Table of Contents

Solving LP Relaxations of Large-Scale Precedence Constrained Problems ..... 1
   Daniel Bienstock and Mark Zuckerberg

Computing Minimum Multiway Cuts in Hypergraphs from Hypertree Packings ..... 15
   Takuro Fukunaga

Eigenvalue Techniques for Convex Objective, Nonconvex Optimization Problems ..... 29
   Daniel Bienstock

Restricted b-Matchings in Degree-Bounded Graphs ..... 43
   Kristóf Bérczi and László A. Végh

Zero-Coefficient Cuts ..... 57
   Kent Andersen and Robert Weismantel

Prize-Collecting Steiner Network Problems ..... 71
   MohammadTaghi Hajiaghayi, Rohit Khandekar, Guy Kortsarz, and Zeev Nutov

On Lifting Integer Variables in Minimal Inequalities ..... 85
   Amitabh Basu, Manoel Campelo, Michele Conforti, Gérard Cornuéjols, and Giacomo Zambelli

Efficient Edge Splitting-Off Algorithms Maintaining All-Pairs Edge-Connectivities ..... 96
   Lap Chi Lau and Chun Kong Yung

On Generalizations of Network Design Problems with Degree Bounds ..... 110
   Nikhil Bansal, Rohit Khandekar, Jochen Könemann, Viswanath Nagarajan, and Britta Peis

A Polyhedral Study of the Mixed Integer Cut ..... 124
   Steve Tyber and Ellis L. Johnson

Symmetry Matters for the Sizes of Extended Formulations ..... 135
   Volker Kaibel, Kanstantsin Pashkovich, and Dirk Oliver Theis

A 3-Approximation for Facility Location with Uniform Capacities ..... 149
   Ankit Aggarwal, L. Anand, Manisha Bansal, Naveen Garg, Neelima Gupta, Shubham Gupta, and Surabhi Jain

Secretary Problems via Linear Programming ..... 163
   Niv Buchbinder, Kamal Jain, and Mohit Singh

Branched Polyhedral Systems ..... 177
   Volker Kaibel and Andreas Loos

Hitting Diamonds and Growing Cacti ..... 191
   Samuel Fiorini, Gwenaël Joret, and Ugo Pietropaoli

Approximability of 3- and 4-Hop Bounded Disjoint Paths Problems ..... 205
   Andreas Bley and Jose Neto

A Polynomial-Time Algorithm for Optimizing over N-Fold 4-Block Decomposable Integer Programs ..... 219
   Raymond Hemmecke, Matthias Köppe, and Robert Weismantel

Universal Sequencing on a Single Machine ..... 230
   Leah Epstein, Asaf Levin, Alberto Marchetti-Spaccamela, Nicole Megow, Julián Mestre, Martin Skutella, and Leen Stougie

Fault-Tolerant Facility Location: A Randomized Dependent LP-Rounding Algorithm ..... 244
   Jaroslaw Byrka, Aravind Srinivasan, and Chaitanya Swamy

Integer Quadratic Quasi-polyhedra ..... 258
   Adam N. Letchford

An Integer Programming and Decomposition Approach to General Chance-Constrained Mathematical Programs ..... 271
   James Luedtke

An Effective Branch-and-Bound Algorithm for Convex Quadratic Integer Programming ..... 285
   Christoph Buchheim, Alberto Caprara, and Andrea Lodi

Extending SDP Integrality Gaps to Sherali-Adams with Applications to Quadratic Programming and MaxCutGain ..... 299
   Siavosh Benabbas and Avner Magen

The Price of Collusion in Series-Parallel Networks ..... 313
   Umang Bhaskar, Lisa Fleischer, and Chien-Chung Huang

The Chvátal-Gomory Closure of an Ellipsoid Is a Polyhedron ..... 327
   Santanu S. Dey and Juan Pablo Vielma

A Pumping Algorithm for Ergodic Stochastic Mean Payoff Games with Perfect Information ..... 341
   Endre Boros, Khaled Elbassioni, Vladimir Gurvich, and Kazuhisa Makino

On Column-Restricted and Priority Covering Integer Programs ..... 355
   Deeparnab Chakrabarty, Elyot Grant, and Jochen Könemann

On k-Column Sparse Packing Programs ..... 369
   Nikhil Bansal, Nitish Korula, Viswanath Nagarajan, and Aravind Srinivasan

Hypergraphic LP Relaxations for Steiner Trees ..... 383
   Deeparnab Chakrabarty, Jochen Könemann, and David Pritchard

Efficient Deterministic Algorithms for Finding a Minimum Cycle Basis in Undirected Graphs ..... 397
   Edoardo Amaldi, Claudio Iuliano, and Romeo Rizzi

Efficient Algorithms for Average Completion Time Scheduling ..... 411
   René Sitters

Experiments with Two Row Tableau Cuts ..... 424
   Santanu S. Dey, Andrea Lodi, Andrea Tramontani, and Laurence A. Wolsey

An OPT + 1 Algorithm for the Cutting Stock Problem with Constant Number of Object Lengths ..... 438
   Klaus Jansen and Roberto Solis-Oba

On the Rank of Cutting-Plane Proof Systems ..... 450
   Sebastian Pokutta and Andreas S. Schulz

Author Index ..... 465


Solving LP Relaxations of Large-Scale
Precedence Constrained Problems

Daniel Bienstock¹ and Mark Zuckerberg²

¹ APAM and IEOR Depts., Columbia University
² Resource and Business Optimization Group Function, BHP Billiton Ltd.

Abstract. We describe new algorithms for solving linear programming
relaxations of very large precedence constrained production scheduling
problems. We present theory that motivates a new set of algorithmic
ideas that can be employed on a wide range of problems; on data sets
arising in the mining industry our algorithms prove effective on prob-
lems with many millions of variables and constraints, obtaining provably
optimal solutions in a few minutes of computation.¹

1 Introduction
We consider problems involving the scheduling of jobs over several periods sub-
ject to precedence constraints among the jobs as well as side-constraints. We
must choose the subset of jobs to be performed, and, for each of these jobs,
how to perform it, choosing from among a given set of options (representing
facilities or modes of operation). Finally, there are side-constraints to be satis-
fied, including period-wise, per-facility processing capacity constraints, among
others. There are standard representations of these problems as (mixed) integer
programs.
Our data sets originate in the mining industry, where problems typically have
a small number of side constraints – often well under one hundred – but may
contain millions of jobs and tens of millions of precedences, as well as spanning
multiple planning periods. Appropriate formulations often achieve small inte-
grality gap in practice; unfortunately, the linear programming relaxations are
far beyond the practical reach of commercial software.
We present a new iterative algorithm for solving the LP relaxation of this prob-
lem. The algorithm incorporates, at a low level, ideas from Lagrangian relaxation
and column generation, but is ultimately based on fundamental observations on
the underlying combinatorial structure of precedence constrained, capacitated
optimization problems. Rather than updating dual information, the algorithm
uses primal structure gleaned from the solution of subproblems in order to ac-
celerate convergence. The general version of our ideas should be applicable to
a wide class of problems. The algorithm can be proved to converge to optimal-
ity; in practice we have found that even for problems with millions of variables
and tens of millions of constraints, convergence to proved optimality is usually
obtained in under twenty iterations, with each iteration requiring only a few
seconds on current computer hardware.

¹ The first author was partially funded by a gift from BHP Billiton Ltd., and ONR
Award N000140910327.

2 Definitions and Preliminaries

2.1 The Precedence Constrained Production Scheduling Problem


Definition 1. We are given a directed graph G = (N , A), where the elements
of N represent jobs, and the arcs A represent precedence relationships among
the jobs: for each (i, j) ∈ A, job j can be performed no later than job i. Denote by
F the number of facilities, and by T the number of scheduling periods.
Let y_{j,t} ∈ {0, 1} represent the choice to process job j in period t, and x_{j,t,f} ∈
[0, 1] represent the proportion of job j performed in period t and processed ac-
cording to processing option, or "facility", f.
Let c^T x be an objective function, and let Dx ≤ d be a collection of arbitrary
“side” constraints.
The linear programming relaxation of the resulting problem, which we will
refer to as PCPSP, is as follows:

(PCPSP):   max  c^T x                                                      (1)

Subject to:   Σ_{τ=1}^{t} y_{i,τ} ≤ Σ_{τ=1}^{t} y_{j,τ},   ∀(i, j) ∈ A, 1 ≤ t ≤ T   (2)

              Dx ≤ d                                                       (3)

              y_{j,t} = Σ_{f=1}^{F} x_{j,t,f},   ∀j ∈ N, 1 ≤ t ≤ T         (4)

              Σ_{t=1}^{T} y_{j,t} ≤ 1,   ∀j ∈ N                            (5)

              x ≥ 0.                                                       (6)

For precedence constrained production scheduling problems that occur in the
mining industry, some typical numbers are as follows:
– 1 million – 10 million jobs, and 1 million – 100 million precedences,
– 20 – 200 side-constraints, 10 – 20 periods, and 2 – 3 facilities.
These numbers indicate that the number of constraints of the form (2), (4) and
(5) can be expected to be very large.
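
To make the formulation concrete, the following small sketch (ours, not from
the paper) checks a candidate fractional pair (x, y) against constraints (2),
(4) and (5); the dictionary-based data layout and the function name are
illustrative assumptions.

    # Hypothetical sketch: verify a fractional (x, y) against PCPSP
    # constraints (2), (4), (5). Data layout (dicts keyed by job/period/
    # facility indices) is our own choice, not the paper's.
    def check_pcpsp(y, x, arcs, jobs, T, F, tol=1e-9):
        # (2): cumulative precedence -- if i is done by period t, so is j
        for (i, j) in arcs:
            for t in range(1, T + 1):
                lhs = sum(y[i, tau] for tau in range(1, t + 1))
                rhs = sum(y[j, tau] for tau in range(1, t + 1))
                if lhs > rhs + tol:
                    return False
        for j in jobs:
            # (4): y splits into facility shares in every period
            for t in range(1, T + 1):
                if abs(y[j, t] - sum(x[j, t, f] for f in range(1, F + 1))) > tol:
                    return False
            # (5): each job is performed at most once over the horizon
            if sum(y[j, t] for t in range(1, T + 1)) > 1 + tol:
                return False
        return True

Note that (2) is checked in its cumulative form, which is what makes the
precedences read "by period t" rather than "in period t".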

2.2 Background
The Open Pit Mine Scheduling Problem. The practical motivating prob-
lem behind our study is the open pit mine scheduling problem. We are given a
three-dimensional region representing a mine to be exploited; this region is di-
vided into “blocks” (jobs, from a scheduling perspective) corresponding to units
of earth (“cubes”) that can be extracted in one step. In order for a block to be
extracted, the set of blocks located (broadly speaking) in a cone above it must
be extracted first. This gives rise to a set of precedences, i.e. to a directed graph
whose vertices are the blocks, and whose arcs represent the precedences. Finally,
the extraction of a block entails a certain (net) profit or cost.
The problem of selecting which blocks to extract so as to maximize profit can
be stated as follows:

    max { c^T x : x_i ≤ x_j  ∀ (i, j) ∈ A,  x_j ∈ {0, 1} ∀j },

where as before A indicates the set of precedences. This is the so-called maximum
weight closure problem – in a directed graph, a closure is a set S of vertices such
that there exist no arcs (i, j) with i ∈ S and j ∉ S. It can be solved as a minimum
s − t cut problem in a related graph of roughly the same size. See [P76], and also
[J68], [Bal70] and [R70]. Further discussion can be found in [HC00], where the
authors note (at the end of Section 3.4) that it can be shown by reduction from
max clique that adding a single cardinality constraint to a max closure problem
is enough to make it NP-hard. For additional related material see [F06], [LG65],
[CH03], and references therein.
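
For illustration, here is a minimal sketch of the reduction just described
(Picard's construction [P76]), assuming the networkx library; the graph
layout and the function name are ours, not from the paper.

    # Sketch of the max closure <-> min s-t cut reduction of Picard [P76].
    import networkx as nx

    def max_weight_closure(weights, arcs):
        # weights: dict node -> net profit (may be negative)
        # arcs: (i, j) meaning "if i is in the closure then j must be too"
        G = nx.DiGraph()
        G.add_nodes_from(['s', 't'])
        for i, j in arcs:
            G.add_edge(i, j)                    # no capacity => infinite
        for v, w in weights.items():
            if w > 0:
                G.add_edge('s', v, capacity=w)  # collect positive profits
            elif w < 0:
                G.add_edge(v, 't', capacity=-w) # pay for negative ones
        cut_value, (source_side, _) = nx.minimum_cut(G, 's', 't')
        closure = source_side - {'s'}
        value = sum(w for w in weights.values() if w > 0) - cut_value
        return closure, value

The infinite-capacity arcs force the source side of the cut to be closed
under the precedence relation, so the source side minus s is a maximum
weight closure.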
The problem we are concerned with here, by contrast, also incorporates pro-
duction scheduling. When a block is extracted it will be processed at one of
several facilities with different operating capabilities. The processing of a given
block i at a given facility f consumes a certain amount of processing capacity
vif and generates a certain net profit pif . This overall planning problem spans
several time periods; in each period we will have one or more knapsack (capac-
ity) constraints for each facility. We usually will also have additional, ad-hoc,
non-knapsack constraints. In this version the precedence constraints apply across
periods as per (2): if (i, j) ∈ A then j can only be extracted in the same or in a
later period than i.
Typically, we need to produce schedules spanning 10 to 20 periods. Addi-
tionally, we may have tens of thousands (or many more) blocks; this can easily
make for an optimization problem with millions of variables and tens of millions
of precedence constraints, but with (say) on the order of one hundred or fewer
processing capacity constraints (since the total number of processing facilities is
typically small).

Previous work. A great deal of research has been directed toward algorithms
for the maximum weight closure problem, starting with [LG65] and culminat-
ing in the very efficient method described in [H08] (also see [CH09]). A “nested
shells” heuristic for the capacitated, multiperiod problem, based on the work in
[LG65], is applicable to problems with a single capacity constraint, among other
simplifications. As commercial integer programming software has improved, mine
scheduling software packages have recently emerged that aggregate blocks in order
to yield a mixed integer program of tractable size. The required degree of aggrega-
tion can however be enormous; this can severely compromise the validity and
the usefulness of the solution. For an overview of other heuristic approaches that
have appeared in the open mine planning literature, see [HC00] and [F06].
Recently (and independently of our work) there has been some new work rele-
vant to the solution of the LP relaxation of the open pit mine scheduling problem.
[BDFG09]² have suggested a new approach in which blocks are aggregated only
with respect to the digging decisions but not with respect to the processing
decisions, i.e. all original blocks in an aggregate must be extracted in a com-
mon period, but the individual blocks comprising an aggregate can be processed
in different ways. This problem is referred to by the authors as the "Optimal
Binning Problem". As long as there is more than one processing option this ap-
proach still maintains variables for each block and period and is therefore still
very large, but the authors propose an algorithm for the LP relaxation of this
problem that is only required to solve a sequence of linear programs with a num-
ber of variables on the order of the number of aggregates (times the number of
periods) in order to come to a solution of the large LP. Thus if the number of
aggregates is small the LP can be solved quickly.
Another development that has come to our attention recently is an algorithm
by [CEGMR09] which can solve the LP relaxation of even very large instances
of the open pit mine scheduling problem very efficiently. This algorithm is only
applicable however to problems for which there is a single processing option and
for which the only constraints are knapsacks and there is a single such constraint
in each scheduling period. The authors note however that more general problems
can be relaxed to have this form in order to yield an upper bound on the solution
value.
From a broad perspective, the method we give below uses dual information in
order to effectively reduce the size of the linear program; in this sense our work
is similar to that in [BDFG09]. In the full version of this paper we describe what
our algorithm would look like when applied to the aggregated problem treated
by [BDFG09], which is a special case of ours. The relationship between the max
closure problem and the LP is a theme in common with the work of [CEGMR09].

3 Our Results
Empirically, it can be observed that formulation (1)-(6) frequently has small in-
tegrality gap. We present a new algorithm for solving the continuous relaxation
of this formulation and generalizations. Our algorithm is applicable to problems
with an arbitrary number of process options and arbitrary side constraints, and
it requires no aggregation. On very large, real-world instances our algorithm
proves very efficient.
² We interacted with [BDFG09] as part of an industrial partnership, but our work was
performed independently.

Our algorithmic developments hinge on three ideas. In order to describe these
ideas, we will first recast PCPSP as a special case of a more general problem, to
which these results (and our solution techniques) apply.

Definition 2. Given a directed graph G = (N, A) with n vertices, and a system
Dx ≤ d of d constraints on n variables, the General Precedence Constrained
Problem is the following linear program:

    (GPCP):   max  c^T x                                 (7)

              Dx ≤ d                                     (8)
              x_i − x_j ≤ 0,   ∀ (i, j) ∈ A,             (9)
              0 ≤ x_j ≤ 1,   ∀ j ∈ N.                    (10)

This problem is more general than PCPSP:

Lemma 1. Any instance of PCPSP can be reduced to an equivalent instance of
GPCP with the same number of variables and of constraints.

Proof. Consider an instance of PCPSP on G = (N, A), with T time periods,
F facilities and side constraints Dx ≤ d. Note that the y variables can be
eliminated. Consider the following system of inequalities on variables z_{j,t,f}
(j ∈ N, 1 ≤ t ≤ T, 1 ≤ f ≤ F):

    z_{j,t,f} − z_{j,t,f+1} ≤ 0,   ∀ j ∈ N, 1 ≤ t ≤ T, 1 ≤ f < F,    (11)
    z_{j,t,F} − z_{j,t+1,1} ≤ 0,   ∀ j ∈ N, 1 ≤ t < T,               (12)
    z_{j,T,F} ≤ 1,   ∀ j ∈ N,                                        (13)
    z_{i,t,F} − z_{j,t,F} ≤ 0,   ∀ (i, j) ∈ A, 1 ≤ t ≤ T,            (14)
    z ≥ 0.                                                           (15)

Given a solution (x, y) to PCPSP, we obtain a solution z to (11)-(15) by setting,
for all j, t and f:

    z_{j,t,f} = Σ_{τ=1}^{t−1} Σ_{f'=1}^{F} x_{j,τ,f'} + Σ_{f'=1}^{f} x_{j,t,f'},

and conversely. Thus, for an appropriate system D̄z ≤ d̄ (with the same number
of rows as Dx ≤ d) and objective c̄^T z, PCPSP is equivalent to the linear program
max{c̄^T z : D̄z ≤ d̄, and constraints (11)-(15)}.

Note: In Lemma 1 the number of precedences in the instance of GPCP is larger
than in the original instance of PCPSP; nevertheless we stress that the number
of constraints (and variables) is indeed the same in both instances.
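
As an illustration of the change of variables in the proof, the following
sketch (ours) computes z from x; it is nothing more than a running prefix
sum over the (period, facility) pairs in lexicographic order.

    # Illustrative helper (ours) for the change of variables in Lemma 1:
    # z[j,t,f] accumulates all processing of job j strictly before period
    # t, plus the processing in period t at facilities 1..f.
    def x_to_z(x, jobs, T, F):
        z = {}
        for j in jobs:
            total = 0.0
            for t in range(1, T + 1):
                for f in range(1, F + 1):
                    total += x[j, t, f]
                    z[j, t, f] = total   # running prefix sum over (t, f)
        return z

In this prefix-sum form, constraints (11)-(13) state that the running totals
are nondecreasing and bounded by 1, and (14) transfers the precedences to
the cumulative variables.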
We will now describe ideas that apply to GPCP. First, we have the following
remark.
Observation 1. Consider an instance of problem GPCP, and let π ≥ 0 be a
given vector of dual variables for the side-constraints (8). Then the Lagrangian
obtained by dualizing (8) using π,

    max  c^T x + π^T (d − Dx)                            (16)
    Subject to:   x_i − x_j ≤ 0,   ∀ (i, j) ∈ A          (17)
                  0 ≤ x_j ≤ 1,   ∀ j ∈ N.                (18)

is a maximum closure problem with |A| precedences.


Note: There is a stronger version of Observation 1 in the specific case of problem
PCPSP; namely, the x variables can be eliminated from the Lagrangian (details:
full paper, also see [BZ09]).
Observation 1 suggests that a Lagrangian relaxation algorithm for solving
problem GPCP – that is to say, an algorithm that iterates by solving prob-
lems of the form (16)-(18) for various vectors π – would enjoy fast individual
iterations. This is correct, as our experiments confirm that even extremely large
max closure instances can be solved quite fast using the appropriate algorithm
(details below). However, in our experiments we also observed that traditional
Lagrangian relaxation methods (such as subgradient optimization), applied to
GPCP, performed quite poorly, requiring vast numbers of iterations and not
quite converging to solutions with desirable accuracy.
Our approach, instead, relies on leveraging combinatorial structure that opti-
mal solutions to GPCP must satisfy. Lemmas 2 and 3 are critical in suggesting
such structure.
Lemma 2. Let P = {x ∈ R^n : Ax ≤ b, Dx ≤ d}, where A, D, b, d are matrices
and vectors of appropriate dimensions. Let x̂ be an extreme point of P. Let
Āx = b̄, D̄x = d̄ be the set of binding constraints at x̂. Assume D̄ has q linearly
independent rows, and let N^x̂ be the null space of Ā. Then dim(N^x̂) ≤ q.
Proof: Ā must have at least n − q linearly independent rows and thus its null
space must have dimension ≤ q.
Lemma 3. Let P be the feasible space of a GPCP with q side constraints.
Denote by Ax ≤ b the subset of constraints containing the precedence constraints
and the constraints 0 ≤ x ≤ 1, and let Dx ≤ d denote the side constraints. Let x̂
be an extreme point of P, and let the entries of x̂ attain k distinct fractional
values {α_1, ..., α_k}. For 1 ≤ r ≤ k, let θ^r ∈ {0, 1}^n be defined by:

    θ^r_j = 1 if x̂_j = α_r, and θ^r_j = 0 otherwise   (1 ≤ j ≤ n).

Let Ā be the submatrix of A containing the binding constraints at x̂. Then the
vectors θ^r are linearly independent and belong to the null space of Ā. As a
consequence, k ≤ q.
Proof: First we prove that Āθ^r = 0. Given a precedence constraint x_i − x_j ≤ 0,
if the constraint is binding then x̂_i = x̂_j. Thus if x̂_i = α_r, so that θ^r_i = 1, then
x̂_j = α_r also, and so θ^r_j = 1 as well, and so θ^r_i − θ^r_j = 0. By the same token if
x̂_i ≠ α_r then x̂_j ≠ α_r and again θ^r_i − θ^r_j = 0. If a constraint x_i ≥ 0 or x_i ≤ 1 is
binding at x̂ then naturally θ^r_i = 0 for all r, as x̂_i is not fractional. The supports
of the θ^r vectors are disjoint, yielding linear independence. Finally, k ≤ q follows
from Lemma 2.

Observation 1 implies that an optimal solution x* to an instance of GPCP can
be written as a weighted sum of incidence vectors of closures, i.e.,

    x* = Σ_{q=1}^{Q} μ_q v^q,                            (19)

where μ ≥ 0 and, for each q, v^q ∈ {0, 1}^n is the incidence vector of a closure
S^q ⊂ N. [In fact, the S^q can be assumed to be nested.] So for any i, j ∈ N,
x*_j = x*_i if i and j belong to precisely the same family of sets S^q. Also, Lemma 3
states that the number of distinct values that x*_j can take is small, if the number
of side constraints is small. Therefore it can be shown that when the number
of side constraints is small the number of closures (terms) in (19) must also be
small. In the full paper we show that a rich relationship exists between the max
closures produced by Lagrangian problems and the optimal dual and primal
solutions to GPCP. Next, we will develop an algorithm that solves GPCP by
attempting to “guess” the correct representation (19).
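
A small sketch (ours) of the object being "guessed": grouping the coordinates
of a fractional extreme point by value recovers the level sets V^r of Lemma 3,
which correspond to the (nested) closures appearing in (19).

    # Sketch (ours): group coordinates of a fractional extreme point by
    # value, recovering the level sets of Lemma 3.
    def level_sets(x_hat):
        sets = {}
        for j, v in x_hat.items():
            sets.setdefault(v, set()).add(j)
        # by Lemma 3, the number of distinct fractional values is at most
        # the number of linearly independent binding side constraints
        return sorted(sets.items())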
First, we present a result that partially generalizes Lemma 3.

Theorem 2. Let P, A, Ā, D, q, x̂ and N^x̂ be as in Lemma 2, and assume
additionally that A is totally unimodular and that b is integral. Define

    I^x̂ = {y ∈ R^n : y_i = 0, ∀i s.t. x̂_i is integer}.              (20)

Then there exists an integral vector x^i ∈ R^n, and vectors θ^h ∈ R^n, 1 ≤ h ≤ q,
such that:
(a) Ax^i ≤ b,
(b) Āx^i = b̄,
(c) x^i_j = x̂_j, ∀j s.t. x̂_j is integer,
(d) x̂ = x^i + Σ_{r=1}^{q} α_r θ^r, for some α ∈ R^q,
(e) the set {θ^1, ..., θ^q} spans N^x̂ ∩ I^x̂,
(f) |θ^h_j| ≤ rank(Ā), for all 1 ≤ h ≤ q and 1 ≤ j ≤ n.

In the special case of the GPCP, we can choose x^i satisfying the additional
condition:
(g) x^i_j = 0, for all j such that x̂_j is fractional.

Proof sketch: Let us refer to the integer coordinates of x as x_I and to the
corresponding columns of A as A_I, and to the fractional coordinates of x as x_F,
and to the corresponding columns of A as A_F. Let h be the number of columns
in A_F. Note that b − A_I x_I is integer, and so by total unimodularity there exists
an integer y ∈ R^h satisfying A_F y ≤ b − A_I x_I, Ā_F y = b̄ − Ā_I x_I. Defining now
x^i = (x_I, y), then x^i is integer; it is equal to x everywhere that x is integer, and
it satisfies Ax^i ≤ b and Āx^i = b̄. Clearly x − x^i belongs to I^x, and moreover
Ā(x − x^i) = 0 so that it belongs to N^x as well, and so it can be decomposed as
x − x^i = Σ_{r=1}^{q} α_r θ^r. For the special case of GPCP we have already described
a decomposition for which x^i equals x everywhere that x is integer and is zero
elsewhere. See the full paper for other details.

Comment: Note that rank(Ā) can be high and thus condition (d) is not quite
as strong as Lemma 3; nevertheless q is small in any case and so we obtain
a decomposition of x̂ into “few” terms when the number of side-constraints is
“small”. Theorem 2 can be strengthened for specific families of totally unimodu-
lar matrices. For example, when A is the node-arc incidence matrix of a digraph,
the θ vectors are incidence vectors of cycles, which yields the following corollary.

Corollary 1. Let P be the feasible set for a minimum cost network flow problem
with integer data and side constraints. Let x̂ be an extreme point of P , and let
q be the number of linearly independent side constraints that are binding at x̂.
Let ζ = {j : x̂j integral}. Then x̂ can be decomposed into the sum of an integer
vector v satisfying all network flow (but not necessarily side) constraints, and
with vj = x̂j ∀j ∈ ζ, and a sum of no more than q fractional cycle flows, over a
set of cycles disjoint from ζ.

4 A General Algorithmic Template


Now we return to the generic algorithm for GPCP that attempts to guess the
right representation of an optimal solution as a weighted sum of incidence vectors
of “few” closures. To motivate our approach, we first consider a more general
situation. We are given a linear program:

    (P_1):   max  c^T x
             s.t.  Ax ≤ b
                   Dx ≤ d.                               (21)

Denote by L(P_1, μ) the Lagrangian relaxation in which constraints (21) are du-
alized with penalties μ, i.e. the problem max{c^T x + μ^T (d − Dx) : Ax ≤ b}.
One can approach problem P1 by means of Lagrangian relaxation, i.e. an algo-
rithm that iterates by solving multiple problems L(P1 , μ) for different choices of
μ; the multipliers μ are updated according to some procedure. A starting point
for our work concerns the fact that traditional Lagrangian relaxation schemes
(such as subgradient optimization) can prove frustratingly slow to achieve con-
vergence, often requiring seemingly instance-dependent choices of algorithmic
parameters. They also do not typically yield optimal feasible primal solutions;
in fact they frequently fail to deliver a sufficiently accurate solution (primal or
dual). However, as observed in [B02] (and also see [BA00]) Lagrangian relaxation
schemes can discover useful “structure.”
For example, Lagrangian relaxation can provide early information on which
constraints from among those that were dualized are likely to be tight, and on
which variables x are likely to be nonzero, even if the actual numerical values
for primal or dual variables computed by the relaxation are inaccurate. The
question then is how to use such structure in order to accelerate convergence
and to obtain higher accuracy. In [B02] the following approach was used:

– Periodically, interrupt the Lagrangian relaxation scheme to solve a restricted
linear program consisting of P_1 with some additional constraints used to im-
pose the desired structure. Then use the duals for constraints (21) obtained
in the solution to the restricted LP to restart the Lagrangian procedure.

The restricted linear program includes all constraints, and thus could (poten-
tially) still be very hard – the idea is that the structure we have imposed renders
the LP much easier. Further, the LP includes all constraints, and thus the solu-
tion we obtain is fully feasible for P1 , thus proving a lower bound. Moreover, if
our guess as to “structure” is correct, we also obtain a high-quality dual feasible
vector, and our use of this vector so as to restart the Lagrangian scheme should
result in accelerated convergence (as well as proving an upper bound on P1 ). In
[B02] these observations were experimentally verified in the context of several
problem classes.

1. Set μ_0 = 0 and set k = 1.
2. Solve L(P_1, μ_{k−1}). Let w^k be an optimal solution.
   If k > 1 and H^{k−1} w^k = h^{k−1}, STOP.
3. Let H^k x = h^k be a system of equations satisfied by w^k.
4. Define the restricted problem:
       (P_2^k):   max  c^T x
                  s.t.  Ax ≤ b,  Dx ≤ d,  H^k x = h^k.
5. Solve P_2^k to obtain x^k, an optimal primal vector (with value z^k), and
   μ_k, an optimal dual vector corresponding to constraints Dx ≤ d.
   If μ_k = μ_{k−1}, STOP.
6. Set k = k + 1 and go to Step 2.

Fig. 1. Algorithmic template for solving P_1

In this work we extend and recast these ideas in a generalized framework as an
algorithm to systematically extract information from the Lagrangian and from
restricted LP’s symbiotically so as to solve the Lagrangian and the primal LP
simultaneously.
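
The template of Figure 1 can be phrased as the following skeleton (our
paraphrase, not the authors' code); solve_lagrangian, guess_structure,
solve_restricted_lp and satisfies are hypothetical placeholders for the
problem-specific pieces.

    # Skeleton of the template in Fig. 1. The callbacks are placeholders:
    # solve_lagrangian(mu) maximizes L(P1, mu); guess_structure(w) returns
    # the system H x = h satisfied by w; solve_restricted_lp(H) returns an
    # optimal (x, mu) for the restricted problem P2.
    def template(solve_lagrangian, guess_structure, solve_restricted_lp,
                 satisfies, max_iter=100):
        mu = None    # stands for mu_0 = 0; the callback treats None as 0
        H = None
        x = None
        for k in range(1, max_iter + 1):
            w = solve_lagrangian(mu)            # Step 2
            if H is not None and satisfies(H, w):
                return x                        # Step 2 stop: x optimal
            H = guess_structure(w)              # Step 3
            x, new_mu = solve_restricted_lp(H)  # Steps 4-5
            if new_mu == mu:
                return x                        # Step 5 stop: x optimal
            mu = new_mu
        return x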
In the template in Figure 1, at each iteration k we employ a linear system
H^k x = h^k that represents a structure satisfied by the current iteration's La-
grangian solution and which can be interpreted as an educated guess for condi-
tions that an optimal solution to P_1 should satisfy. This is problem-specific; we
will indicate later how this structure is discovered in the context of GPCP.
Notes:
1. Ideally, imposing H^k x = h^k in Step 4 should result in an easier linear program.
2. For simplicity, in what follows we will assume that P_2^k is always feasible;
though this is a requirement that can be easily circumvented in practice (full
paper).

Theorem 3. (a) If the algorithm stops at iteration k in Step 2, then x^{k−1} is
optimal for P_1. (b) If it stops in Step 5 then x^k is optimal for P_1.

Proof: (a) We have

    z^{k−1} = max{c^T x + μ^T_{k−1} (d − Dx) : Ax ≤ b, H^{k−1} x = h^{k−1}}
            = c^T w^k + μ^T_{k−1} (d − Dw^k),

where the first equality follows by duality and the second by definition of w^k in
Step 2, since H^{k−1} w^k = h^{k−1}. Also, clearly z^{k−1} ≤ z*, and so in summary

    z* ≤ c^T w^k + μ^T_{k−1} (d − Dw^k) = z^{k−1} ≤ z*.             (22)

(b) μ_k = μ_{k−1} implies that w^k optimally solves L(P_1, μ_k), so that we could
choose w^{k+1} = w^k and so H^k w^{k+1} = h^k, obtaining case (a) again.

4.1 Applying the Template


We will now apply the above template to a case where P1 is an instance of GPCP
where, as before, we denote by N the set of jobs; and we use Ax ≤ b to describe
the precedences and the box constraints 0 ≤ xj ≤ 1 (∀j ∈ N ), and Dx ≤ d
denotes the side-constraints.
Thus, in Step 2 of the template, L(P_1, μ_{k−1}) can be solved as a max closure
problem (Observation 1); therefore its solution can be described by a 0/1-vector
which we will denote by y^k. Recall that Lemma 3 implies that where D has m
rows, an optimal extreme point solution to GPCP has q ≤ m + 2 distinct values
0 ≤ α_1 < α_2 < ... < α_q ≤ 1 and can therefore be written x* = Σ_{r=1}^{q} α_r θ^r,
where for 1 ≤ r ≤ q, V^r = {j ∈ N : x*_j = α_r}, and θ^r is the incidence vector
of V^r.
The structure that we will "guess", as exposed by the current iter-
ate's Lagrangian solution, is that the nodes inside the max closure should be
distinguished from those nodes outside, i.e. that the nodes inside should not be
required to take the same value in the LP solution as those outside. Given an
existing partition of the nodeset N that represented our previous guess as to the
sets {V^r}, this guess at structure implies a refinement of this partition. We will
note later that this partition never needs more than a small number of elements
for the algorithm to converge.
At iteration k, we denote by C^k = {C_1^k, ..., C_{r_k}^k} the constructed partition of
N. Our basic application of the template is as follows:

GPCP Algorithm
1. Set μ_0 = 0. Set r_0 = 1, C_1^0 = N, C^0 = {C_1^0}, z^0 = −∞, and k = 1.
2. Let y^k be an optimal solution to L(P_1, μ_{k−1}), and define
       I^k = {j ∈ N : y_j^k = 1}                                     (23)
   and
       O^k = {j ∈ N : y_j^k = 0}.                                    (24)
   If k > 1 and, for each 1 ≤ h ≤ r_{k−1}, either C_h^{k−1} ∩ I^k = ∅ or
   C_h^{k−1} ∩ O^k = ∅, then STOP.
3. Let C^k = {C_1^k, ..., C_{r_k}^k} consist of all nonempty sets in the collection
       {I^k ∩ C_h^{k−1} : 1 ≤ h ≤ r_{k−1}} ∪ {O^k ∩ C_h^{k−1} : 1 ≤ h ≤ r_{k−1}}.
   Let H^k x = h^k consist of the constraints
       x_i = x_j,   for 1 ≤ h ≤ r_k and each pair i, j ∈ C_h^k.
4. Let P_2^k consist of P_1, plus the additional constraints H^k x = h^k.
5. Solve P_2^k, with optimal solution x^k, and let μ_k denote the optimal duals
   corresponding to the side-constraints Dx ≤ d. If μ_k = μ_{k−1}, STOP.
6. Set k = k + 1 and goto Step 2.
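
Step 3 is a plain partition refinement; a minimal sketch (ours) of that one
step, with illustrative names:

    # Refine the current partition by the closure I^k and its complement.
    def refine(partition, inside):
        # partition: list of sets of jobs; inside: the set I^k
        refined = []
        for C in partition:
            for part in (C & inside, C - inside):
                if part:
                    refined.append(part)
        return refined  # new classes, each entirely inside or outside I^k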
We have:
Lemma 4. (a) For each k, problem P_2^k is an instance of GPCP with r_k variables
and the same number of side-constraints as in Dx ≤ d. (b) If P_2^1 is feasible, the
above algorithm terminates finitely with an optimal solution.
Proof: full paper.

Comments: Since each problem P_2^k is a GPCP, its extreme point solution x^k
never attains more than m + 2 distinct values (where m is the number of linearly
independent rows in D), and thus the partition C^k can be coarsened while main-
taining the feasibility of x^k by merging the sets C_j^k with common x^k values. Note
also that in choosing C^{k+1} to be a refinement of C^k, the LP solution x^k remains
available to the problem P_2^{k+1}. The above algorithm is a basic application of
the template. Finer partitions than {I^k, O^k} may also be used. The feasibility
assumption in (b) of Lemma 4 can be bypassed. Details will be provided in the
full paper.
In the full paper an analysis is presented that explains why the structure
exposed by the Lagrangian solutions can be expected to point the algorithm in
the right direction. In particular, the solution to the Lagrangian obtained by
using optimal duals for the side constraints can be shown to exhibit significant
structure.

Table 1. Sample runs, 1

                                 Marvin   Mine1B    Mine2     Mine3,s  Mine3,b
Jobs                             9400     29277     96821     2975     177843
Precedences                      145640   1271207   1053105   1748     2762864
Periods                          14       14        25        8        8
Facilities                       2        2         2         8        8
Variables                        199626   571144    3782250   18970    3503095
Constraints                      2048388  17826203  26424496  9593     19935500
Problem arcs                     2229186  18338765  30013104  24789    23152350
Side-constraints                 28       28        50        120      132
Binding side-constr. at optimum  14       11        23        33       44
Cplex time (sec)                 55544    —         —         5        —

Algorithm Performance
Iterations to 10^−5 optimality   8        8         9         13       30
Time to 10^−5 optimality (sec)   10       60        344       1        1076
Iterations to comb. optimality   11       12        16        15       39
Time to comb. optimality (sec)   15       96        649       1        1583

5 Computational Experiments
In this section we present results from some of our experiments. A more complete
set of results will be presented in the full paper. All these tests were conducted
using a single core of a dual quad-core 3.2 GHz Xeon machine with 64 GB of
memory. The LP solver we used was Cplex, version 12, and the min cut solver
we used was our implementation of Hochbaum’s pseudoflow algorithm ([H08]).
The tests reported on in Tables 1 and 2 are based on three real-world ex-
amples provided by BHP Billiton³, to which we refer as 'Mine1', 'Mine2' and
’Mine3’ and a synthetic but realistic model called ’Marvin’ which is included
with Gemcom’s Whittle [W] mine planning software. ’Mine1B’ is a modifica-
tion of Mine1 with a denser precedence graph. Mine3 comes in two versions to
which we refer as ’big’ and ’small’. Using Mine1, we also obtained smaller and
larger problems by modifying the data in a number of realistic ways. Some of
the row entries in these tables are self-explanatory; the others have the following
meaning:
³ Data was masked.

– Problem arcs. The number of arcs in the graph that the algorithm creates to
represent the scheduling problem (i.e., the size of the min cut problems we solve).
– Iterations, time to 10^−5 optimality. The number of iterations (resp.,
the CPU time) taken by the algorithm until it obtained a solution it could
certify as having ≤ 10^−5 relative optimality error.
– Iterations, time to combinatorial optimality. The number of iterations
(resp., the CPU time) taken by the algorithm to obtain a solution it could cer-
tify as optimal as per the stopping criteria in Steps 2 or 5. Notice that this
implies that the solution is optimal as per the numerical tolerances of Cplex.
Finally, an entry of "—" indicates that Cplex was unable to terminate after 100000
seconds of CPU time. More detailed analyses will appear in the full paper.

Table 2. Sample runs, 2

                                 Mine1      Mine1    Mine1    Mine1     Mine1,3
                                 very small medium   large    full      weekly
Jobs                             755        7636     15003    29277     87831
Precedences                      222        22671    113703   985011    2955033
Periods                          12         12       12       12        100
Facilities                       2          2        2        2         2
Variables                        14282      160944   292800   489552    12238800
Constraints                      8834       327628   1457684  11849433  295591331
Problem arcs                     22232      477632   1727565  12280407  307654269
Side-constraints                 24         24       24       24        200
Binding side-constr. at optimum  12         11       11       11        151
Cplex time (sec)                 1          12424    —        —         —

Algorithm Performance
Iterations to 10^−5 optimality   6          6        8        7         10
Time to 10^−5 optimality (sec)   0          1        7        45        2875
Iterations to comb. optimality   7          7        11       9         20
Time to comb. optimality (sec)   0          2        10       61        6633

References
[Bal70] Balinski, M.L.: On a selection problem. Management Science 17, 230–
231 (1970)
[BA00] Barahona, F., Anbil, R.: The Volume Algorithm: producing primal solu-
tions with a subgradient method. Math. Programming 87, 385–399 (2000)
[B02] Bienstock, D.: Potential Function Methods for Approximately Solving
Linear Programming Problems, Theory and Practice. Kluwer Academic
Publishers, Boston (2002), ISBN 1-4020-7173-6
[BZ09] Bienstock, D., Zuckerberg, M.: A new LP algorithm for precedence con-
strained production scheduling, posted on Optimization Online (August
2009)
[BDFG09] Boland, N., Dumitrescu, I., Froyland, G., Gleixner, A.M.: LP-based
disaggregation approaches to solving the open pit mining production
scheduling problem with block processing selectivity. Computers and
Operations Research 36, 1064–1089 (2009)
[CH03] Caccetta, L., Hill, S.P.: An application of branch and cut to open pit
mine scheduling. Journal of Global Optimization 27, 349–365 (2003)
[CH09] Chandran, B., Hochbaum, D.: A Computational Study of the Pseud-
oflow and Push-Relabel Algorithms for the Maximum Flow Problem.
Operations Research 57, 358–376 (2009)
[CEGMR09] Chicoisne, R., Espinoza, D., Goycoolea, M., Moreno, E., Rubio, E.: A
New Algorithm for the Open-Pit Mine Scheduling Problem (submitted
for publication), http://mgoycool.uai.cl/
[F06] Fricke, C.: Applications of integer programming in open pit mine plan-
ning, PhD thesis, Department of Mathematics and Statistics, The Uni-
versity of Melbourne (2006)
[H08] Hochbaum, D.: The pseudoflow algorithm: a new algorithm for the max-
imum flow problem. Operations Research 56, 992–1009 (2008)
[HC00] Hochbaum, D., Chen, A.: Improved planning for the open-pit mining
problem. Operations Research 48, 894–914 (2000)
[J68] Johnson, T.B.: Optimum open pit mine production scheduling, PhD the-
sis, Operations Research Department, University of California, Berkeley
(1968)
[LG65] Lerchs, H., Grossman, I.F.: Optimum design of open-pit mines. Trans-
actions C.I.M. 68, 17–24 (1965)
[P76] Picard, J.C.: Maximal Closure of a graph and applications to combina-
torial problems. Management Science 22, 1268–1272 (1976)
[R70] Rhys, J.M.W.: A selection problem of shared fixed costs and network
flows. Management Science 17, 200–207 (1970)
[W] Gemcom Software International, Vancouver, BC, Canada
Computing Minimum Multiway Cuts in
Hypergraphs from Hypertree Packings

Takuro Fukunaga

Department of Applied Mathematics and Physics,
Graduate School of Informatics, Kyoto University, Japan
[email protected]

Abstract. The hypergraph k-cut problem is the problem of finding a mini-
mum capacity set of hyperedges whose removal divides a given hyper-
graph into k connected components. We present an algorithm for this
problem which runs in strongly polynomial-time if both k and the rank
of the hypergraph are constants. Our algorithm extends the algorithm
due to Thorup (2008) for computing minimum k-cuts of graphs from
greedy packings of spanning trees.

1 Introduction

Let Q+ denote the set of non-negative rationals. For a connected hypergraph
H = (V, E) with a non-negative hyperedge capacity c : E → Q+ and an integer
k ≥ 2, a k-cut of H is defined as a subset of E whose removal divides H into
k connected components. Hypergraph k-cut problem is a problem of finding a
minimum capacity k-cut of a hypergraph. If the given hypergraph is a graph,
then the problem is called graph k-cut problem.
The graph k-cut problem is one of the fundamental problems in combinatorial
optimization. It is closely related to the reliability of networks, and has many
applications, for example, to the traveling salesperson problem, VLSI design,
and evolutionary tree construction [4,14]. By Goldschmidt and Hochbaum [6], it
is shown that the problem is NP-hard when k is not fixed, and polynomial-time
solvable when k is fixed to a constant. After their work, there have been many works
on the algorithmic aspect of this problem.
In spite of these active studies on the graph k-cut problem, there are few works
on the hypergraph k-cut problem. If k is not fixed, the NP-hardness of the graph k-
cut problem implies that of the hypergraph k-cut problem. When k = 2, the k-cut
problem is usually called the minimum cut problem. Klimmek and Wagner [9] and
Mak and Wong [13] extended an algorithm proposed by Stoer and Wagner [16] for
the minimum cut problem in graphs to hypergraphs. Lawler [10] showed that the
(s, t)-cut problem in hypergraphs can be reduced to computing maximum flows
in digraphs. For the case of k = 3, Xiao [19] gave a polynomial-time algorithm.

This work was partially supported by Grant-in-Aid for Scientific Research from the
Ministry of Education, Culture, Sports, Science and Technology of Japan.


However, it is not known whether the hypergraph k-cut problem is polynomially
solvable or NP-hard when k is a constant larger than 3.
In this paper, we partially answer this question. We present an algorithm
which runs in strongly polynomial-time if k and the rank γ of hyperedges (i.e.,
γ = maxe∈E |e|) are fixed to constants. Since graphs can be regarded as hyper-
graphs with γ = 2, this result extends the polynomial-solvability of the graph
k-cut problem.
Our algorithm is based on an idea due to Thorup [18], which was success-
fully applied to the graph k-cut problem. He showed that a maximum spanning
tree packing of a graph contains a spanning tree sharing at most a constant
number of edges with a minimum k-cut of the graph. Although this fact itself
gives a strongly polynomial-time algorithm for computing the minimum k-cuts
of graphs, he also showed that a set of spanning trees constructed in a greedy
way has the same property. Based on this fact, he gave the fastest algorithm
to the graph k-cut problem. In this paper, we show that these facts can be ex-
tended to hypergraphs with a hypertree packing theorem due to Frank, Király
and Kriesell [2] (see Section 3).
Let us mention the previous works on problems related to the hypergraph
k-cut problem. As mentioned above, the first polynomial-time algorithm for the
graph k-cut problem with fixed k was presented by Goldschmidt and
Hochbaum [6]. Its running time is O(n^{k^2} T(n, m)), where T(n, m) is the time for
computing a max-flow in a graph consisting of n vertices and m edges. T(n, m) is
known to be O(mn log(n^2/m)) for now [5]. After their work, many polynomial-
time algorithms for fixed k were obtained. An algorithm due to Kamidoi, Yoshida
and Nagamochi [7] runs in O(n^{4k/(1−1.71/√k)−34} T(n, m)). An algorithm due
to Xiao [20] runs in O(n^{4k−log k}). An algorithm due to Thorup [17] runs in
Õ(n^{2k}). In addition, Karger and Stein [8] gave a randomized algorithm running
in O(n^{2(k−1)} log^3 n).
For the hypergraph k-cut problem, Xiao [19] gave a polynomial-time divide-
and-conquer algorithm for k = 3. Zhao, Nagamochi and Ibaraki [21] gave an
approximation algorithm. It achieves the approximation factor (1 − 2/k) min{k, γ}
for k ≥ 4 by using Xiao's algorithm for k = 3 as a subroutine. Moreover,
it is shown by Okumoto, Fukunaga and Nagamochi [15] that the problem can be
reduced to the terminal k-vertex cut problem in bipartite graphs (refer to [15]
for the definition of the terminal k-vertex cut problem). Hence the LP-rounding
algorithm due to Garg, Vazirani and Yannakakis [3] for the terminal k-vertex
cut problem achieves approximation factor 2 − 2/k also for the hypergraph k-cut
problem. Recently Chekuri and Korula [1] claim that the randomized algorithm
proposed by Karger and Stein [8] for the graph k-cut problem can be extended
to the hypergraph k-cut problem.
Okumoto, Fukunaga and Nagamochi [15] showed that the hypergraph k-cut
problem is contained in the submodular system k-partition problem. Zhao,
Nagamochi and Ibaraki [21] presented a (k − 1)-approximation algorithm for this
problem. Okumoto, Fukunaga and Nagamochi [15] presented an approximation
algorithm whose approximation factor is 1.5 for k = 4 and k + 1 − 2√(k − 1) for
k ≥ 5. They also showed that, for the hypergraph 4-cut problem, their algorithm
achieves approximation factor 4/3.
The rest of this paper is organized as follows. Section 2 introduces basic facts
and notations. Section 3 explains the outline of our result and presents our algo-
rithm. Section 4 shows that a maximum hypertree packing contains a hyper-
tree sharing at most a constant number of hyperedges with a minimum k-cut.
Section 5 discusses a property of a set of hypertrees constructed greedily.
Section 6 concludes this paper and mentions future work.

2 Preliminaries

Let H = (V, E) be a hypergraph with a capacity c : E → Q+. Throughout
this paper, we denote |V| by n, |E| by m, and max_{e∈E} |e| by γ. We sometimes
denote the vertex set of H by V_H, and the edge set of H by E_H, respectively.
For non-empty X ⊂ V and F ⊆ E, δ_F(X) denotes the set of hyperedges in F
intersecting both X and V \ X. When F = E, we may represent δ_F(X) by δ(X).
For a function f : E → Q+ and F ⊆ E, Σ_{e∈F} f(e) is represented by f(F).
For non-empty X ⊂ V , E[X] denotes the set of hyperedges in E contained in
X, and H[X] denotes the sub-hypergraph (X, E[X]) of H.
It is sometimes convenient to associate a hypergraph H = (V, E) with a
bipartite graph B_H = (V, V_E, E') as follows. Each vertex in V_E corresponds to
an edge in E. x ∈ V and y ∈ V_E are joined by an edge in E' if and only if x is
contained in the hyperedge corresponding to y in H.
H is called ℓ-connected if c(δ(X)) ≥ ℓ for all non-empty X ⊂ V. H is
called connected if |δ(X)| ≥ 1 for all non-empty X ⊂ V. Notice that
1-connectedness is not equivalent to connectedness.
A partition V = {V_1, V_2, ..., V_k} of V into k non-empty subsets is called a k-
partition of V. We let δ_F(V) = ∪_{i=1}^{k} δ_F(V_i). H is called ℓ-partition-connected
if c(δ(V)) ≥ ℓ(|V| − 1) for all partitions V of V into non-empty subsets. It is
easy to see that ℓ-partition-connectivity is a stronger condition than
ℓ-connectivity. We call a partition V achieving min{c(δ(V))/(|V| − 1)} weakest.
min{c(δ(V))/(|V| − 1)} is denoted by ℓ_H. If |δ(V)| ≥ |V| − 1 for all V, H is called
partition-connected. Notice that 1-partition-connectedness is not equivalent
to partition-connectedness.
A minimal k-cut of H is represented by δ(V) where V is the k-partition of
V consisting of the connected components after removing the k-cut. Hence the
hypergraph k-cut problem is equivalent to the problem of finding a k-partition
V of H minimizing c(δ(V)). We call such a partition a minimum k-partition.
A hyperforest in H = (V, E) is defined as F ⊆ E such that |F[X]| ≤ |X| − 1 for
every non-empty X ⊆ V. A hyperforest F is called a hypertree if |F| = |V| − 1 and
∪_{e∈F} e = V. Notice that if H = (V, E) is a graph, F ⊆ E is a hypertree if and
only if F is a spanning tree. Actually a hypertree is an extension of a spanning
tree which inherits many important properties of spanning trees. However, there
is also a difference between them. For example, in contrast to spanning trees, a
connected hypergraph may contain no hypertree.
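
As a concrete reading of these definitions, the following sketch (ours)
computes δ(V) and c(δ(V)) for a partition; representing hyperedges as
frozensets is our choice.

    # delta(V): the hyperedges crossing a partition, and their capacity.
    def cut_hyperedges(parts, edges):
        # parts: disjoint vertex sets covering V; edges: frozensets
        return [e for e in edges
                if sum(1 for P in parts if e & P) >= 2]  # meets >= 2 parts

    def cut_capacity(parts, edges, c):
        # c(delta(V)); c maps each hyperedge to its capacity
        return sum(c[e] for e in cut_hyperedges(parts, edges))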
A hypertree packing of H is a pair of a set T of hypertrees in H and a non-
negative weight α : T → Q+ such that α(T_e) ≤ c(e) holds for all e ∈ E, where
T_e denotes the set of hypertrees in T containing e. A hypertree packing is called
maximum if α(T) is maximum. We define the packing value of a hypergraph H
as the maximum of α(T) over all hypertree packings of H. If α is integer, then
the hypertree packing (T, α) is called integer.
Frank, Király and Kriesell [2] characterized hypergraphs containing hypertrees
as follows.

Theorem 1 (Frank, Király, Kriesell [2]). Let H be a hypergraph with integer
hyperedge capacity. H has an integer hypertree packing (T, α) such that α(T) ≥ ℓ
if and only if H is ℓ-partition-connected.

Let F be the family of hyperforests. In the proof of Theorem 1, it is mentioned
that (E, F) is a matroid, which was originally proven by Lorea [11]. Matroids
defined from hypergraphs in such a way are called hypergraphic matroids. Hy-
pertrees are bases of the hypergraphic matroid.
Independence testing in hypergraphic matroids requires deciding whether a given F ⊆ E satisfies |F [X]| ≤ |X| − 1 for every ∅ ≠ X ⊆ V . By Hall's theorem, this condition holds if and only if, after removing any vertex v ∈ V , the bipartite graph defined from (V, F ) still contains a matching covering the vertices corresponding to F . Thus independence testing can be done in O(nθ(n, m − 1, γm)) time, where θ(n, m − 1, γm) denotes the time for computing a maximum matching in a bipartite graph with colour classes of size at most n and m and with at most γm edges.
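
To illustrate, here is a brute-force Python transcription of this Hall-type test (helper names are our own; it uses a simple augmenting-path matching rather than the θ(·)-time routine assumed above):

```python
def has_matching_covering_edges(edge_sets, vertices):
    """Kuhn's augmenting-path algorithm: True iff the bipartite incidence
    graph has a matching covering every hyperedge in edge_sets, using
    only the vertices in `vertices`."""
    match = {}  # vertex -> index of hyperedge matched to it

    def augment(i, seen):
        for u in edge_sets[i]:
            if u in vertices and u not in seen:
                seen.add(u)
                if u not in match or augment(match[u], seen):
                    match[u] = i
                    return True
        return False

    return all(augment(i, set()) for i in range(len(edge_sets)))

def is_hyperforest(F, V):
    """F is a hyperforest iff |F[X]| <= |X| - 1 for all nonempty X,
    i.e. iff the F-side can still be matched after deleting any v in V."""
    return all(has_matching_covering_edges(F, V - {v}) for v in V)

V = {1, 2, 3}
assert is_hyperforest([{1, 2}, {2, 3}], V)
assert not is_hyperforest([{1, 2}, {1, 2}, {1, 2, 3}], V)  # 3 edges inside a 3-set
```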

3 Outline of Our Result

The first step of our result is to prove the following theorem originally proven for
graphs by Thorup [18]. A recursively maximum hypertree packing is a maximum
hypertree packing that satisfies some condition, which will be defined formally
in Section 4.

Theorem 2. A recursively maximum hypertree packing of H contains a hyper-


tree that shares at most γk − 3 hyperedges with a minimum k-cut of H.

We prove Theorem 2 in Section 4.


Assume that the hypertree and the h = γk − 3 hyperedges in Theorem 2 are
specified. Since each of the other n − 1 − h hyperedges in the hypertree intersects
only one element of the minimum k-partition, shrinking them into single vertices
preserves the minimum k-partition. If γ = 2, these n − 1 − h hyperedges form at
most h + 1 connected components. Hence the hypergraph obtained by shrinking
them contains at most h + 1 vertices, for which the minimum k-partition can
be found by enumerating all k-partitions. If γ ≥ 3, the number of the connected
components cannot be bounded in general because one large deleted hyperedge
may connect many components. However, a characterization of hypertrees due to Lovász [12] shows that such a case does not occur even if γ ≥ 3.

Theorem 3 (Lovász [12]). Consider an operation that replaces each hyper-


edge by an edge joining two vertices chosen from the hyperedge. It is possible to
construct a spanning tree from a hypergraph by this operation if and only if the
hypergraph is a hypertree.

Corollary 1. After removing h hyperedges from a hypertree, there exist at most


h + 1 connected components.

Proof. Consider the spanning tree constructed from a hypertree as shown by


Theorem 3. After removing h edges from the spanning tree, the remaining edges
form h + 1 connected components. The vertices in the same connected com-
ponent are also connected by the hyperedges corresponding to the remaining
edges. Hence removing h hyperedges from a hypertree results in at most h + 1
connected components. 


Another issue to take care of is the existence of hypertrees. As mentioned in Section 2, there exist connected hypergraphs which contain no hypertrees. For such hypergraphs, hypertree packings give no information on minimum k-cuts. We avoid this situation by replacing each hyperedge e ∈ E by its |e| copies with capacity c(e)/|e|. Obviously this replacement has no effect on the capacities of k-partitions, while the obtained hypergraphs contain hypertrees. Notice that after the replacement, the number of hyperedges is increased to at most γm.
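
This replacement is mechanical; a small sketch, building on the Hypergraph class from Section 2 (our own code, with exact rational capacities so that c(e)/|e| is represented exactly):

```python
def replace_with_copies(H):
    """Replace each hyperedge e by |e| copies of capacity c(e)/|e|.
    Cut capacities c(delta(V)) are unchanged, and by Theorem 4 below
    the resulting hypergraph contains a hypertree whenever H is connected."""
    edges, caps = [], []
    for e, ce in zip(H.E, H.c):
        edges.extend([set(e)] * len(e))
        caps.extend([ce / len(e)] * len(e))
    return Hypergraph(H.V, edges, caps)
```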

Theorem 4. Let H ′ = (V, E ′ ) be the hypergraph obtained from a connected hypergraph H = (V, E) by replacing each e ∈ E by |e| copies of e. Then H ′ contains a hypertree.
 
Proof. Let V be a partition of V . Since each e′ ∈ δE′ (V) intersects at most |e′| components of V,
|δE′ (V)| ≥ ∑_{U ∈V} ∑_{e′∈δE′ (U )} (1/|e′|).
Since each e ∈ E has |e| copies in E′, ∑_{e′∈δE′ (U )} (1/|e′|) = ∑_{e∈δE (U )} 1 = |δE (U )|. Moreover, |δE (U )| ≥ 1 because H is connected. Thus |δE′ (V)| ≥ ∑_{U ∈V} 1 = |V|, which implies that H ′ is partition-connected. Hence by Theorem 1, H ′ contains a hypertree.


Now we describe our algorithm for the hypergraph k-cut problem.

Algorithm 1: Hypergraph k-Cut Algorithm


Input: A connected hypergraph H = (V, E) with capacity c : E → Q+ and an
integer k ≥ 2
Output: A minimum k-cut of H
Step 1: For each e ∈ E, prepare |e| copies e1 , e2 , . . . , e|e| of e with capacity
c(ei ) = c(e)/|e|, i ∈ {1, 2, . . . , |e|}, and replace e by them.
Step 2: Define F = E. Compute a recursively maximum hypertree packing (T ∗ , α∗ ) of
H.
Step 3: For each T ∈ T ∗ and each set T ′ of h = γk − 3 hyperedges in T ,
execute the following operations.
3-1: Compute a hypergraph H ′ obtained by shrinking all hyperedges in T \ T ′ .

3-2: Compute a minimum k-cut F ′ of H ′ .
3-3: Let F := F ′ if c(F ′ ) ≤ c(F ).
Step 4: Output F .
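
The only non-trivial data manipulation in Step 3-1 is the shrinking itself; a union-find sketch (our own helper, again on the Hypergraph class from Section 2) could look as follows. Hyperedges that collapse inside a single merged vertex survive as loops, which never contribute to any cut.

```python
def shrink(H, keep):
    """Contract the hyperedges of H whose indices are in `keep`,
    merging their vertices; all other hyperedges are relabelled."""
    parent = {v: v for v in H.V}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i in keep:
        vs = list(H.E[i])
        for u in vs[1:]:
            parent[find(u)] = find(vs[0])

    rest = [i for i in range(len(H.E)) if i not in keep]
    edges = [{find(v) for v in H.E[i]} for i in rest]
    caps = [H.c[i] for i in rest]
    return Hypergraph({find(v) for v in H.V}, edges, caps)
```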

Let us discuss the running time of this algorithm. For each hypertree, there
are O(nh ) ways to choose h hyperedges. By Corollary 1, shrinking n − 1 − h hy-
peredges in a hypertree results in a hypergraph with at most h+1 vertices. Hence
Step 3-2 can be done in O(k h+1 ) time. It means that Step 3 of the algorithm
runs in O(k h+1 nh ) time per hypertree in T ∗ .
To bound the running time of all the steps, we must consider how to compute
a recursively maximum hypertree packing and how large its size is. A recursively
maximum hypertree packing can be computed in polynomial time. However we
know no algorithm to compute small recursively maximum hypertree packings.
Hence this paper follows the approach taken by Thorup [18] for the graph k-cut
problem. We show that a set of hypertrees constructed as below approximates a
recursively maximum hypertree packing well. This enables us to avoid computing a recursively maximum hypertree packing explicitly.

Algorithm 2: Greedy Algorithm for Computing a Set of Hypertrees


Input: A connected hypergraph H = (V, E) with capacity c : E → Q+ and an
integer t.
Output: A set of t hypertrees of H.
Step 1: Let T := ∅.
Step 2: Compute a minimum cost hypertree T of H with respect to the cost defined as |Te |/c(e) for each e ∈ E, and set T := T ∪ {T }.
Step 3: If |T | = t, then output T . Otherwise, return to Step 2.

As mentioned in Section 2, hypertrees are bases of a hypergraphic matroid.


Hence a minimum cost hypertree can be computed by a greedy algorithm. The
running time of Algorithm 2 is O(tγm log(γm)nθ(n, γm − 1, γ 2 m)). The set of
hypertrees computed by Algorithm 2 approximates the recursively maximum
hypertree packing well.
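
Putting the pieces together, a direct (and deliberately unoptimized) sketch of Algorithm 2 via the matroid greedy rule, reusing the is_hyperforest test from Section 2; all function names are ours, and a real implementation would replace the brute-force independence oracle by the faster matching-based routine discussed there.

```python
def min_cost_hypertree(H, cost):
    """Matroid greedy: edge indices of a minimum-cost hypertree of H,
    or None if H contains no hypertree.  cost[i] is the cost of edge i."""
    base = []
    for i in sorted(range(len(H.E)), key=lambda i: cost[i]):
        if is_hyperforest([H.E[j] for j in base + [i]], H.V):
            base.append(i)
    return base if len(base) == len(H.V) - 1 else None

def greedy_packing(H, t):
    """Algorithm 2: t greedy hypertrees w.r.t. the cost |T_e| / c(e)."""
    trees, count = [], [0] * len(H.E)     # count[i] = |T_e| for edge i
    for _ in range(t):
        T = min_cost_hypertree(H, [count[i] / H.c[i] for i in range(len(H.E))])
        if T is None:
            return None                   # H has no hypertree
        for i in T:
            count[i] += 1
        trees.append(T)
    return trees
```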

Theorem 5. Let H = (V, E) be a hypergraph such that each e ∈ E has at least


|e| − 1 copies in E \ {e} of the same capacity. For this H, Algorithm 2 with
t = 24γ 4 mk 3 ln(2γ 2 kmn) outputs a set of hypertrees which contains a hypertree
sharing at most h = γk − 2 hyperedges with a minimum k-cut of H.

Replace the computation of recursively maximum hypertree packings in Step 2


of Algorithm 1 by Algorithm 2 with t = 24γ 4 mk 3 ln(2γ 2 kmn). Moreover, change
the definition of h in Step 3 as γk − 2. Then we obtain another algorithm for
the hypergraph k-cut problem as summarized in the next corollary.

Corollary 2. The hypergraph k-cut problem is solvable in time

O(k γk+2 nγk−1 γ 5 m2 θ(n, γm − 1, γ 2 m) log(kγ 2 mn) log(γm)).



4 Proof of Theorem 2
Let V be an arbitrary partition of V , and (T , α) be an arbitrary hypertree
packing of H. Since every hypertree T ∈ T satisfies |T | = |V | − 1 and has at
most |U | − 1 hyperedges contained by U for each U ∈ V, we have
 
|δT (V)| = |T | − ∑_{U ∈V} |T [U ]| ≥ |V | − 1 − ∑_{U ∈V} (|U | − 1) = |V| − 1. (1)

Moreover,
c(e) ≥ α(Te ) for each e ∈ δ(V) (2)
by the definition of hypertree packings. Thus it follows that
 
c(δ(V))/(|V| − 1) ≥ (∑_{e∈δ(V)} α(Te ))/(|V| − 1) = (∑_{T ∈T} α(T )|δT (V)|)/(|V| − 1) ≥ α(T ). (3)

Let V ∗ be a weakest partition of V (i.e., it attains minV c(δ(V))/(|V|−1)), and


(T ∗ , α∗ ) be a maximum hypertree packing of H (i.e., it attains max(T ,α) α(T )).
From Theorem 1, we can derive the following important properties.

Lemma 1. V ∗ and (T ∗ , α∗ ) satisfy c(δ(V ∗ ))/(|V ∗ | − 1) = α∗ (T ∗ ). Moreover,


|δT (V ∗ )| = |V ∗ | − 1 holds for each T ∈ T ∗ , and α∗ (Te∗ ) = c(e) holds for each
e ∈ δ(V ∗ ). T [U ] defined from any T ∈ T ∗ and U ∈ V ∗ is a hypertree of H[U ].

Proof. Let M be a positive integer such that all of M c(e), e ∈ E and M α∗ (T ),


T ∈ T ∗ are integers. Notice that (T ∗ , M α∗ ) is a maximum hypertree packing of
the hypergraph H associated with hyperedge capacity M c. Applying Theorem 1
to this hypergraph shows that M c(δ(V ∗ ))/(|V ∗ | − 1) = ∑_{T ∈T ∗} M α∗ (T ) holds.
That is to say, V ∗ and (T ∗ , α∗ ) satisfy c(δ(V ∗ ))/(|V ∗ | − 1) = α∗ (T ∗ ).
Since V ∗ and (T ∗ , α∗ ) satisfy (3) with equality, they also satisfy (1) and
(2), used for deriving (3), with equality. This proves the remaining part of the
lemma. 


Let U ∈ V ∗ , T  = {T [U ] | T ∈ T ∗ }, and α be the weight defined on the


hypertrees in T  such that α (T [U ]) = α∗ (T ) for T ∈ T ∗ . Since T  consists of
hypertrees of H[U ] by Lemma 1, (T  , α ) is a hypertree packing of H[U ]. How-
ever, it may not be a maximum hypertree packing of H[U ] since the partition-
connectivity of H[U ] may be larger than α (T  ). Let (S , β) be a maximum
hypertree packing of H[U ]. For each T ∈ T ∗ and S ∈ S , replacing hyperedges
in T contained by U with those in S generates another hypertree of H because
hypertrees are bases of hypergraphic matroids. Hence, from (T , α) and (S , β),
we can construct another maximum hypertree packing (U , ζ) of H such that
|U | ≤ |T | + |S | and (U  , ζ  ) is a maximum hypertree packing of H[U ] where
U  = {T [U ] | T ∈ U } and ζ  (T [U ]) = ζ(T )β(S )/ζ(U ) for each T [U ] ∈ U  .
A maximum hypertree packing obtained by repeating this operation is called
recursively maximum. That is to say, a recursively maximum hypertree packing
is defined as a hypertree packing computed by the following algorithm.

Algorithm 3: Computing a Recursively Maximum Hypertree Packing

Input: A connected hypergraph H = (V, E) with capacity c : E → Q+ .


Output: A recursively maximum hypertree packing of H.
Step 1: Compute a maximum hypertree packing (T ∗ , α∗ ) of H, and a weakest
partition V ∗ of H.
Step 2: While there exists U ∈ V ∗ such that |U | > 1, execute the following
operations.
2-1: Compute a maximum hypertree packing (S, β) of H[U ] and a weakest partition V of H[U ]. Define T := ∅, and β ′ (S) := β(S)α∗ (T ∗ )/β(S) for each S ∈ S (here the denominator β(S) denotes the total weight ∑_{S ′ ∈S} β(S ′ )).
2-2: Choose T ∈ T ∗ \ T and S ∈ S. If α∗ (T ) < β ′ (S), then replace the hyperedges in T [U ] by those in S, set β ′ (S) := β ′ (S) − α∗ (T ), and T := T ∪ {T }. Otherwise, i.e., if α∗ (T ) ≥ β ′ (S), then construct a hypertree T ′ = (T \ T [U ]) ∪ S with α∗ (T ′ ) := β ′ (S), update α∗ (T ) := α∗ (T ) − β ′ (S), T := T ∪ {T ′ }, and S := S \ {S}. If α∗ (T ) = β ′ (S) in the latter case, remove T from T ∗ in addition.
2-3: If T ∗ \ T ≠ ∅, then return to Step 2-2.
2-4: V ∗ := (V ∗ \ {U }) ∪ V.
Step 3: Output (T ∗ , α∗ ).

From now on, we let (T ∗ , α∗ ) stand for a recursively maximum hypertree pack-
ing. For U ∈ V ∗ , let T ′ = {T [U ] | T ∈ T ∗ } and α′ (T [U ]) = α∗ (T )ℓH[U ] /α∗ (T ∗ ), where ℓH[U ] is the partition-connectivity of H[U ]. The definition of (T ∗ , α∗ ) implies that (T ′ , α′ ) is a recursively maximum hypertree packing of H[U ] for any
U ∈ V ∗.
From T ∗ and given k, define Vk as the k-partition of V constructed by the
following algorithm.

Algorithm 4: Computing Vk
Input: A connected hypergraph H = (V, E) with capacity c : E → Q+ , and an
integer k ≥ 2.
Output: A k-partition of V .
Step 1: Define Vk := {V }.
Step 2: Let U ∈ Vk be a set attaining min{ℓH[U ] | U ∈ Vk , |U | ≥ 2}. Compute a weakest partition U = {U1 , U2 , . . . , U|U | } of H[U ], where we assume that ∑_{T ∈T ∗} α∗ (T )|δ(Ui ) ∩ T [U ]| ≤ ∑_{T ∈T ∗} α∗ (T )|δ(Uj ) ∩ T [U ]| for 1 ≤ i < j ≤ |U|.
Step 3: If |Vk | − 1 + |U| < k, then Vk := (Vk \ {U }) ∪ U and return to Step 2.
Step 4: If |Vk | − 1 + |U| ≥ k, then Vk := (Vk \ {U }) ∪ {U1 , U2 , . . . , Uk−|Vk | , U \ ∪_{i=1}^{k−|Vk |} Ui }, and output Vk .

Lemma 2. ∑_{e∈δ(Vk )} α∗ (Te∗ ) < (γk − 2)α∗ (T ∗ ).

Proof. In this proof, V ′k stands for Vk immediately before executing Step 4 of Algorithm 4, and Vk stands for the partition output by Algorithm 4.
By the definition of recursively maximum hypertree packings, T [U ] is a hypertree of H[U ] for every pair of U ∈ V ′k and T ∈ T ∗ . Thus |δ(V ′k ) ∩ T | = |T | − ∑_{U ∈V ′k} |T [U ]| = |V | − 1 − ∑_{U ∈V ′k} (|U | − 1) = |V ′k | − 1 holds for each T ∈ T ∗ , and hence ∑_{e∈δ(V ′k )} α∗ (Te∗ ) = ∑_{T ∈T ∗} α∗ (T )|δ(V ′k ) ∩ T | = (|V ′k | − 1)α∗ (T ∗ ) holds.
Let U be the element of V ′k chosen in Step 2, and U = {U1 , U2 , . . . , U|U | } be the weakest partition of H[U ] computed there immediately before executing Step 4. Note that |U| > k − |V ′k | holds by the condition of Step 4. By the same reasoning as above, |δ(U) ∩ T [U ]| = |U| − 1 holds for each T ∈ T ∗ . Hence ∑_{T ∈T ∗} α∗ (T )|δ(U) ∩ T [U ]| = (|U| − 1)α∗ (T ∗ ).
Let VU = {U1 , U2 , . . . , Uk−|V ′k | , U \ ∪_{j=1}^{k−|V ′k |} Uj }. Then
∑_{T ∈T ∗} α∗ (T )|δ(VU ) ∩ T [U ]| ≤ ∑_{j=1}^{k−|V ′k |} ∑_{T ∈T ∗} α∗ (T )|δ(Uj ) ∩ T [U ]|.


The elements in U are ordered so that they satisfy ∑_{T ∈T ∗} α∗ (T )|δ(Ui ) ∩ T [U ]| ≤ ∑_{T ∈T ∗} α∗ (T )|δ(Uj ) ∩ T [U ]| for 1 ≤ i < j ≤ |U|. Hence it holds that

∑_{j=1}^{k−|V ′k |} ∑_{T ∈T ∗} α∗ (T )|δ(Uj ) ∩ T [U ]| ≤ ((k − |V ′k |)/|U|) ∑_{j=1}^{|U |} ∑_{T ∈T ∗} α∗ (T )|δ(Uj ) ∩ T [U ]|.

Since each hyperedge intersects at most γ elements in δ(U), it holds that
((k − |V ′k |)/|U|) ∑_{j=1}^{|U |} ∑_{T ∈T ∗} α∗ (T )|δ(Uj ) ∩ T [U ]| ≤ γ ((k − |V ′k |)/|U|) ∑_{T ∈T ∗} α∗ (T )|δ(U) ∩ T [U ]|
= γ ((k − |V ′k |)/|U|) (|U| − 1)α∗ (T ∗ )
< (k − |V ′k |)γα∗ (T ∗ ).

Combining these implies that ∑_{T ∈T ∗} α∗ (T )|δ(VU ) ∩ T [U ]| < (k − |V ′k |)γα∗ (T ∗ ). Notice that δ(Vk ) ∩ T = (δ(V ′k ) ∩ T ) ∪ (δ(VU ) ∩ T [U ]). Recall that |V ′k | ≥ 1 and γ ≥ 2. Therefore it follows that
∑_{e∈δ(Vk )} α∗ (Te∗ ) = ∑_{T ∈T ∗} α∗ (T )|δ(Vk ) ∩ T | < {(|V ′k | − 1) + (k − |V ′k |)γ}α∗ (T ∗ ) ≤ (γk − 2)α∗ (T ∗ ).




Lemma 3. For each e ∈ δ(Vk ) and f ∈ E \ δ(Vk ), α∗ (Te∗ )/c(e) ≥ α∗ (Tf∗ )/c(f )
holds.

Proof. Let U ∈ Vk denote the set containing f (i.e., f ∈ E[U ]). Let V ′k denote Vk immediately before e enters δ(V ′k ) in Algorithm 4. Assume that e is contained in U ′ ∈ V ′k (i.e., e ∈ E[U ′ ]). Moreover, let ℓU (resp., ℓU ′ ) denote the packing value of H[U ] (resp., H[U ′ ]).
From (T ∗ , α∗ ), define T ′ = {T [U ′ ] | T ∈ T ∗ }. Moreover, define α′ (T [U ′ ]) = α∗ (T )ℓU ′ /α∗ (T ∗ ) for T ∈ T ∗ . By the definition of recursively maximum hypertree packings, (T ′ , α′ ) is a maximum hypertree packing of H[U ′ ]. By Lemma 1, the capacity constraint of edge e is tight for any maximum hypertree packing of H[U ′ ], i.e., c(e) = α′ (Te′ ). Since α′ (Te′ ) = α∗ (Te∗ )ℓU ′ /α∗ (T ∗ ), it holds that ℓU ′ = c(e)α∗ (T ∗ )/α∗ (Te∗ ).
On the other hand, a maximum hypertree packing of H[U ] satisfies the capacity constraint for edge f . Hence, similarly as above, ℓU ≤ c(f )α∗ (T ∗ )/α∗ (Tf∗ ).
V ′k contains a set U ′′ such that U ⊆ U ′′ . In other words, U = U ′′ , or U is obtained by dividing U ′′ in Algorithm 4. As explained when recursively maximum hypertree packings were defined, ℓU ′′ ≤ ℓU holds. Since Step 2 chose U ′ immediately before e enters δ(V ′k ), ℓU ′ ≤ ℓU ′′ holds. These facts show that
c(e)α∗ (T ∗ )/α∗ (Te∗ ) = ℓU ′ ≤ ℓU ′′ ≤ ℓU ≤ c(f )α∗ (T ∗ )/α∗ (Tf∗ ),
implying the required inequality.



Let V opt denote a minimum k-partition of H.
Lemma 4. ∑_{T ∈T ∗} α∗ (T )|δ(V opt ) ∩ T | ≤ ∑_{e∈δ(Vk )} α∗ (Te∗ ).


Proof. Let η = min_{e∈δ(Vk )} α∗ (Te∗ )/c(e). By Lemma 3, each hyperedge e ∈ δ(V opt ) \ δ(Vk ) satisfies α∗ (Te∗ )/c(e) ≤ η. Hence it holds that
∑_{e∈δ(V opt )\δ(Vk )} α∗ (Te∗ ) = ∑_{e∈δ(V opt )\δ(Vk )} (α∗ (Te∗ )/c(e)) c(e) ≤ η c(δ(V opt ) \ δ(Vk )).
The definition of V opt implies that c(δ(V opt )) ≤ c(δ(Vk )), and hence c(δ(V opt ) \ δ(Vk )) ≤ c(δ(Vk ) \ δ(V opt )). Thus
∑_{T ∈T ∗} α∗ (T )|δ(V opt ) ∩ T | = ∑_{e∈δ(V opt )∩δ(Vk )} α∗ (Te∗ ) + ∑_{e∈δ(V opt )\δ(Vk )} α∗ (Te∗ )
≤ ∑_{e∈δ(V opt )∩δ(Vk )} α∗ (Te∗ ) + η c(δ(V opt ) \ δ(Vk ))
≤ ∑_{e∈δ(V opt )∩δ(Vk )} α∗ (Te∗ ) + η c(δ(Vk ) \ δ(V opt ))
≤ ∑_{e∈δ(Vk )} α∗ (Te∗ ).




From Lemmas 2 and 4, we can observe that
(∑_{T ∈T ∗} α∗ (T )|δ(V opt ) ∩ T |) / α∗ (T ∗ ) < γk − 2.
This means that |δ(V opt ) ∩ T | < γk − 2 holds for some T ∈ T ∗ . Therefore
Theorem 2 has been proven.

5 Proof of Theorem 5
In this section, we present a proof of Theorem 5. Although it is almost the same as the proof for γ = 2 presented by Thorup [18], we sketch it for self-containedness.
Throughout this section, we let H = (VH , EH ) be a hypergraph such that
each e ∈ EH has at least |e| − 1 copies in EH \ {e} of the same capacity. We
denote |EH | by γm, and the capacity of hyperedges in H by cH in order to avoid
confusion. Moreover, we assume that a recursively maximum hypertree packing
(T ∗ , α∗ ) of H satisfies α∗ (Te∗ ) = α∗ (Te′∗ ) for e ∈ EH and a copy e′ ∈ EH of e.
For a set T of hypertrees of H and e ∈ EH , define uT H (e) = |Te |/(cH (e)|T |).
For each e ∈ EH , we also define u∗H (e) as α∗ (Te∗ )/(cH (e)α∗ (T ∗ )) from a recur-
sively maximum hypertree packing (T ∗ , α∗ ) of H. Since cH (e) ≥ α∗ (Te∗ ) for all
e ∈ EH , 1/u∗H (e) is at least the packing value of H, i.e., 1/u∗H (e) ≥ α∗ (T ∗ ).
Moreover, since cH (e) = α∗ (Te∗ ) holds for some e ∈ EH by the maximality of
(T ∗ , α∗ ), mine∈EH 1/u∗H (e) = α∗ (T ∗ ) holds.
Recall that Algorithm 3 updates V ∗ by partitioning non-singleton sets in V ∗ repeatedly until no such sets exist. For e ∈ EH , define Ue as the last set in V ∗ such that e ∈ EH [Ue ] during the execution of the algorithm. Then max_{e′∈EH [Ue ]} u∗H[Ue ] (e′ ) = u∗H[Ue ] (e). The definition of recursively maximum hypertree packings implies that u∗H[Ue ] (e′ ) = u∗H (e′ ) for each e′ ∈ EH [Ue ] because α∗ (Te′∗ )/α∗ (T ∗ ) = β(Se′ )/β(S) holds with a maximum hypertree packing (S, β) of H[Ue ]. Therefore, the partition-connectivity of H[Ue ] is 1/u∗H (e).
Lemma 5. Let I be a subgraph of H and assume that each hyperedge e in I has capacity cI (e) such that cmin ≤ cI (e) ≤ cH (e). Let C = ∑_{e∈EI} cI (e), and uI = max_{e∈EI} u∗I (e). Moreover, let ε be an arbitrary real such that 0 < ε < 1/2, and T g be a set of hypertrees in H constructed by Algorithm 2 with t ≥ 3 ln(C/cmin )/(cmin uI ε2 ). Then
u_H^{T g}(e) < (1 + ε)uI (4)
holds for each e ∈ EI .
Proof. Scaling hyperedge capacities has no effect on the claim. Hence we assume without loss of generality that cmin = 1.
Let T denote the set of hypertrees kept by Algorithm 2 at some moment while it is running to compute T g . The key is the following quantity:
∑_{e∈EI} cI (e) (1 + ε)^{|Te |/cH (e)} (1 + εuI )^{t−|T |} / (1 + ε)^{(1+ε)uI t} . (5)

This quantity has the following properties:


(i) When T = ∅, (5) is less than 1;
(ii) If (5) is less than 1 when |T | = t, then (4) holds for all e ∈ EI ;
(iii) When a tree is added to T in Step 2 of Algorithm 2, (5) does not increase.
Clearly these three facts imply (4) for all e ∈ EI . We do not prove these properties here due to space limitations. Refer to Thorup [18] or the full version of this paper for their proofs. We would like to note that an important fact for having
(iii) is that hypertrees are bases of the hypergraphic matroid. 

By applying Lemma 5 to some subgraph of H, we obtain the next lemma. We
skip the proof due to space limitations.
Lemma 6. Let 0 < ε ≤ 1/2, and T g be a set of hypertrees of H constructed by Algorithm 2 with |T g | = t ≥ 3γm ln(γmn/ε)/ε3 . Then
|Teg | ≤ ((1 + ε)/(1 − ε)) · (α∗ (Te∗ )/α∗ (T ∗ )) · |T g | + 1
holds for each e ∈ EH .
Lemma 6 proves Theorem 5 as follows. Let V opt stand for a minimum k-partition of H. Lemma 6 shows that
∑_{T ∈T g} |δ(V opt ) ∩ T | = ∑_{e∈δ(V opt )} |Teg |
≤ ∑_{e∈δ(V opt )} ( t · ((1 + ε)/(1 − ε)) · (α∗ (Te∗ )/α∗ (T ∗ )) + 1 )
≤ t · ((1 + ε)/(1 − ε)) · (∑_{e∈δ(V opt )} α∗ (Te∗ ))/α∗ (T ∗ ) + γm.

At the end of Section 4, we observed that
(∑_{e∈δ(V opt )} α∗ (Te∗ ))/α∗ (T ∗ ) = (∑_{T ∈T ∗} α∗ (T )|δ(V opt ) ∩ T |)/α∗ (T ∗ ) < γk − 2.
These imply that
(∑_{T ∈T g} |δ(V opt ) ∩ T |)/t < ((1 + ε)/(1 − ε))(γk − 2) + γm/t.
Recall that t = 3γm ln(γmn/ε)/ε3 . Assume that n, m ≥ 2. Then t ≥ 6γm/ε3 , and hence the right-hand side of the above inequality is at most
((1 + ε)/(1 − ε))(γk − 2) + ε3 /6 = γk − 2 + (2γk − 4)ε/(1 − ε) + ε3 /6.
Setting ε to 1/(4k), the right-hand side is at most γk − 1, which means that
(∑_{T ∈T g} |δ(V opt ) ∩ T |)/t < γk − 1.
This implies that T g contains a hypertree T such that |δ(V opt ) ∩ T | < γk − 1.
Moreover, t = 3γm ln(γmn/ε)/ε3 = 24γ 4 mk 3 ln(2γ 2 kmn). Therefore the proof
has been completed.

6 Concluding Remarks

The algorithm proposed in this paper is not polynomial if γ is not fixed. One reason for this is that the bound obtained in Theorem 2 depends on γ. If we could remove γ from the bound, we would have a polynomial algorithm even if γ is not fixed. However, there exists a hypergraph in which every hypertree in a recursively maximum hypertree packing shares γ + k − 3 hyperedges with any minimum k-cut.
Define a set V of vertices as {v1 , v2 , . . . , vn }. We identify i with i + n for
each i ∈ {1, 2, . . . , n} for convenience. We also define a set E of hyperedges
as {e1 , e2 , . . . , en−1 } where each hyperedge ei is defined as {vi , vi+1 , . . . , vi+γ }.
Let H = (V, E) be the hypergraph with uniform hyperedge capacity. Figure 1
illustrates H. The intervals represented by gray lines in the figure denote the
hyperedges of H.
Observe that H is a hypertree. Hence a recursively maximum hypertree packing
of H consists of a single hypertree H. On the other hand, any minimum k-partition
of H is represented by {{vi }, {vi+1 }, {vi+2 }, . . . , {vi+k−2 }, V − ∪_{j=i}^{i+k−2} {vj }} with some i ∈ {1, 2, . . . , n} because each hyperedge in H contains vertices of consecutive indices. Since fewer hyperedges contain vj , j ∈ {n, 1, . . . , γ − 1}, i ≤ n < γ − 1 ≤ i + k − 2 holds. Hence any minimum k-cut of H contains γ + k − 3 hyperedges (a minimum k-partition is represented by the dotted lines in Figure 1).
Therefore, any hypertree in a recursively maximum hypertree packing of H and
any minimum k-cut shares γ + k − 3 hyperedges.

Fig. 1. A hypergraph H in which every hypertree in a recursively maximum hypertree packing shares γ + k − 3 hyperedges with any minimum k-cut. Dotted lines represent a minimum k-partition {{vn }, {v1 }, . . . , {vk−2 }, {vk−1 , vk , . . . , vn−1 }}.

References
1. Chekuri, C., Korula, N.: Personal Communication (2010)
2. Frank, A., Király, T., Kriesell, M.: On decomposing a hypergraph into k connected
sub-hypergraphs. Discrete Applied Mathematics 131(2), 373–383 (2003)
3. Garg, N., Vazirani, V.V., Yannakakis, M.: Multiway cuts in node weighted graphs.
Journal of Algorithms 50, 49–61 (2004)
4. Gasieniec, L., Jansson, J., Lingas, A., Óstlin, A.: On the complexity of constructing
evolutionary trees. Journal of Combinatorial Optimization 3, 183–197 (1999)
5. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximum flow problem.
Journal of the ACM 35, 921–940 (1988)
6. Goldschmidt, O., Hochbaum, D.: A polynomial algorithm for the k-cut problem
for fixed k. Mathematics of Operations Research 19, 24–37 (1994)
7. Kamidoi, Y., Yoshida, N., Nagamochi, H.: A deterministic algorithm for finding
all minimum k-way cuts. SIAM Journal on Computing 36, 1329–1341 (2006)
8. Karger, D.R., Stein, C.: A new approach to the minimum cut problem. Journal of
the ACM 43, 601–640 (1996)
9. Klimmek, R., Wagner, F.: A simple hypergraph min cut algorithm. Internal Report
B 96-02, Bericht FU Berlin Fachbereich Mathematik und Informatik (1995)
10. Lawler, E.L.: Cutsets and partitions of hypergraphs. Networks 3, 275–285 (1973)
11. Lorea, M.: Hypergraphes et matroides. Cahiers Centre Etudes Rech. Oper. 17,
289–291 (1975)
12. Lovász, L.: A generalization of König’s theorem. Acta. Math. Acad. Sci. Hungar. 21,
443–446 (1970)
13. Mak, W.-K., Wong, D.F.: A fast hypergraph min-cut algorithm for circuit parti-
tioning. Integ. VLSI J. 30, 1–11 (2000)
14. Nagamochi, H.: Algorithms for the minimum partitioning problems in graphs. IE-
ICE Transactions on Information and Systems J86-D-1, 53–68 (2003)
15. Okumoto, K., Fukunaga, T., Nagamochi, H.: Divide-and-conquer algorithms for
partitioning hypergraphs and submodular systems. In: Dong, Y., Du, D.-Z., Ibarra,
O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 55–64. Springer, Heidelberg (2009)
16. Stoer, M., Wagner, F.: A simple min-cut algorithm. Journal of the ACM 44, 585–591 (1997)
17. Thorup, M.: Fully-dynamic min-cut. Combinatorica 27, 91–127 (2007)
18. Thorup, M.: Minimum k-way cuts via deterministic greedy tree packing. In: Pro-
ceedings of the 40th Annual ACM Symposium on Theory of Computing, pp. 159–
166 (2008)
19. Xiao, M.: Finding minimum 3-way cuts in hypergraphs. In: Agrawal, M., Du, D.-
Z., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 270–281. Springer,
Heidelberg (2008)
20. Xiao, M.: An improved divide-and-conquer algorithm for finding all minimum k-
way cuts. In: Hong, S.-H., Nagamochi, H., Fukunaga, T. (eds.) ISAAC 2008. LNCS,
vol. 5369, pp. 208–219. Springer, Heidelberg (2008)
21. Zhao, L., Nagamochi, H., Ibaraki, T.: A unified framework for approximating mul-
tiway partition problems. In: Eades, P., Takaoka, T. (eds.) ISAAC 2001. LNCS,
vol. 2223, pp. 682–694. Springer, Heidelberg (2001)
Eigenvalue Techniques for Convex Objective,
Nonconvex Optimization Problems

Daniel Bienstock

APAM and IEOR Depts., Columbia University

Abstract. A fundamental difficulty when dealing with a minimization


problem given by a nonlinear, convex objective function over a nonconvex
feasible region, is that even if we can efficiently optimize over the con-
vex hull of the feasible region, the optimum will likely lie in the interior
of a high dimensional face, “far away” from any feasible point, yielding
weak bounds. We present theory and implementation for an approach
that relies on (a) the S-lemma, a major tool in convex analysis, (b) effi-
cient projection of quadratics to lower dimensional hyperplanes, and (c)
efficient computation of combinatorial bounds for the minimum distance
from a given point to the feasible set, in the case of several significant
optimization problems. On very large examples, we obtain significant
lower bound improvements at a small computational cost1 .

1 Introduction
We consider problems with the general form

(F ) : F̄ := min F (x), (1)


s.t. x ∈ P, (2)
x ∈ K, where (3)

– F is a convex quadratic, i.e. F (x) = xT M x + v T x (with M ⪰ 0 and


v ∈ Rn ). Extensions to the non-quadratic case are possible (see below).
– P ⊆ Rn is a convex set over which we can efficiently optimize F ,
– K ⊆ Rn is a non-convex set with “special structure”.
We assume that a given convex relaxation of the set described by (2), (3) is under
consideration. A fundamental difficulty is likely to be encountered: because of
the convexity of F , the optimum solution to the relaxation will frequently be
attained in the interior of a high-dimensional face of the relaxation, and far from
the set K. Thus, the lower bound proved by the relaxation will often be weak.
What is more, if one were to rely on branch-and-cut, the proved lower bound may improve little, if at all, when n is large, even after massive amounts of branching.
This stalling of the lower bounding procedure is commonly encountered in
practice and constitutes a significant challenge, the primary subject of our study.
1
Work partially funded by ONR Award N000140910327 and a gift from BHP Billiton
Ltd.


After obtaining the solution x∗ to the given relaxation for problem F , our meth-
ods will use techniques of convex analysis, of eigenvalue optimization, and com-
binatorial estimations, in order to quickly obtain a valid lower bound on F̄ which
is strictly larger than F (x∗ ). Our methods apply if F is not quadratic but there
is a convex quadratic G such that F (x) − F (x∗ ) ≥ G(x − x∗ ) for all feasible x.
We will describe an important class of problems where our method, applied
to a “cheap” but weak formulation, produces bounds comparable to or better
than those obtained by much more sophisticated formulations, and at a small
fraction of the computational cost.
Cardinality constrained optimization problems. Here, for some integer 0 < K ≤ n, K = { x ∈ Rn : ‖x‖0 ≤ K }, where the zero-norm ‖v‖0 of a vector v is used to denote the number of nonzeros in v. This constraint arises in portfo-
lio optimization (see e.g. [2]) but modern applications involving this constraint
arise in statistics, machine learning [13], and, especially, in engineering and bi-
ology [19]. Problems related to compressive sensing have an explicit cardinality
constraint (see www.dsp.ece.rice.edu/cs for material). Also see [7].
The simplest canonical example of problem F is as follows:

F̄ = min F (x), (4)
s.t. ∑_j xj = 1, x ≥ 0, (5)
‖x‖0 ≤ K. (6)

This problem is strongly NP-hard, and it does arise in practice, exactly as stated.
In spite of its difficulty, this example already incorporates the fundamental difficulty alluded to above: clearly, conv{ x ∈ Rn+ : ∑_j xj = 1, ‖x‖0 ≤ K } = { x ∈ Rn+ : ∑_j xj = 1 }. In other words, from a convexity standpoint the cardinality constraint disappears. Moreover, if the quadratic in F is positive definite and dominates the linear term, then the minimizer of F over the unit simplex will be an interior point (all coordinates positive), whereas K ≪ n in practice.
A second relevant example is given by a system of multiple (linear) disjunctions, such as split-cuts [6]. Also see [3], [4]. Details appear in the full paper. To the extent
that disjunctive sets are a general-purpose technique for formulating combinato-
rial constraints, the methods in this paper apply to a wide variety of optimization
problems.

1.1 Techniques
Our methods embody two primary techniques:
(a) The S-lemma (see [20], also [1], [5], [15]). Let f, g : Rn → R be quadratic
functions and suppose there exists x̄ ∈ Rn such that g(x̄) > 0. Then

f (x) ≥ 0 whenever g(x) ≥ 0

if and only if there exists μ ≥ 0 such that (f − μg)(x) ≥ 0 for all x.



Remark: here, a “quadratic” may contain a linear as well as a constant term.


The S-lemma can be used as an algorithmic framework for minimizing a quadratic
subject to a quadratic constraint. Let p, q be quadratic functions and let α, β be
reals. Then

min{p(x) : q(x) ≥ β} ≥ α, iff ∃ μ ≥ 0 s.t. p(x) − α − μq(x) + μβ ≥ 0 ∀ x.


(7)

In other words, the minimization problem in (7) can be approached as a simul-


taneous search for two reals α and μ ≥ 0, with α largest possible such that the
last inequality in (7) holds. The S-lemma is significant in that it provides a good
characterization (i.e. polynomial-time) for a usually non-convex optimization
problem. See [14], [16], [17], [18], [21] and the references therein, in particular
regarding the connection to the trust-region subproblem.
(b) Consider a given nonconvex set K. We will assume, as a primitive, that (possibly after an appropriate change of coordinates), given a point x̂ ∈ Rn , we can efficiently compute a strong (combinatorial) lower bound for the Euclidean distance between x̂ and the nearest point in P ∩ K. We will show that this is indeed the case for the cardinality constrained case (see Section 1.4). Roughly, we exploit the "structure" of a set K of interest. We will denote by D(x̂) our lower bound on the minimum distance from x̂ to P ∩ K.
Using (a) and (b), we can compute a lower bound for F̄ :

Simple Template
S.1 Compute an optimal solution x∗ to the given relaxation to problem F .
S.2 Obtain the quantity D(x∗ ).
S.3 Apply the S-lemma as in (7), using F (x) for p(x), and (the exterior of)
the ball centered at x∗ with radius D(x∗ ) for q(x) − β.


Fig. 1. A simple case

For a simple application of this template, consider Figure 1. This shows an


instance of problem (4)-(6), with n = 3 and K = 2 where all coordinates of x∗
are positive. The figure also assumes that D(x∗ ) is exact – it equals the minimum
distance from x∗ to the feasible region. If we minimize F (x), subject to being
on the exterior of this ball, the optimum will be attained at y. Thus, F (y) ≤ F̄ ;
we have F (y) = F (x∗ ) + λ̃1 R2 , where R is the radius of the ball and λ̃1 is the
minimum eigenvalue of the restriction of F (x) to the unit simplex.

Now consider the example in Figure 2, corresponding to the case of a single


disjunction. Here, xF is the optimizer of F (x) over the affine hull of the set P.
A straightforward application of the S-Lemma will yield as a lower bound (on
F̄ ) the value F (y), which is weak – weaker, in fact, than F (x∗ ). The problem is
caused by the fact that xF is not in the relative interior of the convex hull of
the feasible region. In summary, a direct use of our template will not work.


Fig. 2. The simple template fails

1.2 Adapting the Template


In order to correct the general form of the difficulty depicted by Figure 2, we would need to solve a problem of the form:
V := min{ F (x) : x − x∗ ∈ C, (x − x∗ )T (x − x∗ ) ≥ δ 2 } (8)
where δ > 0, and C is the cone of feasible directions (for P) at x∗ . We can view
this as a ’cone constrained’ version of the problem addressed by the S-Lemma.
Clearly, F (x∗ ) ≤ V ≤ F̄ with the first inequality in general strict. If we are
dealing with polyhedral sets, (8) becomes (after some renaming):
 
V = min{ F (ω) : Cω ≥ 0, ω T ω ≥ δ 2 } (9)
where C is an appropriate matrix. However, we have (proof in full paper):
Theorem 1. Problem (9) is strongly NP-hard.
We stress that the NP-hardness result is not simply a consequence of the non-
convex constraint in (9) – without the linear constraints, the problem becomes
polynomially solvable (i.e., it is handled by the S-lemma, see the references).
To bypass this negative result, we will adopt a different approach. We assume
that there is a positive-definite quadratic function q(x) such that for any y ∈
Rn , in polynomial time we can produce a (strong, combinatorial) lower bound D2min (y, q) on the quantity
min{ q(y − x) : x ∈ P ∩ K }.

In Section 1.4 we will address how to produce the quadratic q(x) and the value
D2 (y, q) when K is defined by a cardinality constraint.
Let c = ∇F (x∗ ) (other choices for c discussed in full paper). Note that for
any x ∈ P ∩ K, cT (x − x∗ ) ≥ 0. For α ≥ 0, let pα = x∗ + αc, and let H α be the
hyperplane through pα orthogonal to c. Finally, define

V (α) := min{F (x) : q(x − pα ) ≥ D2 (pα , q), x ∈ H α }, (10)

and let y α attain the minimum. Note: computing V (α) entails an application of
the S-lemma, “restricted” to H α . See Figure 3. Clearly, V (α) ≤ F̄ . Then

– Suppose α = 0, i.e. pα = x∗ . Then x∗ is a minimizer of F (x) subject to


x ∈ H 0 . Thus V (0) > F (x∗ ) when F is positive-definite.
– Suppose α > 0. Since cT (y α − x∗ ) > 0, by convexity V (α) = F (y) > F (x∗ ).

Thus, F (x∗ ) ≤ inf α≥0 V (α) ≤ F̄ ; the first inequality being strict in the positive-
definite case. [It can be shown that the “inf” is a “min”]. Each value V (α)
incorporates combinatorial information (through the quantity D2 (pα , q)) and
thus the computation of minα≥0 V (α) cannot be obtained through direct convex
optimization techniques. As a counterpoint to Theorem 1, one can prove (using the notation in eq. (8)):

Theorem 2. In (9), if C has one row and q(x) = ∑_j x2j , then V ≤ inf α≥0 V (α).

[Figure 3 sketches the direction c, the bounding ellipsoid in H α , and the minimizer of F (x) in H α .]
Fig. 3. A better paradigm

In order to develop a computationally practicable approach that uses these observations, let 0 = α(0) < α(1) < . . . < α(J) , such that for any x ∈ P ∩ K, cT (x − x∗ ) ≤ α(J) ‖c‖22 . Then:

Updated Template

1. For 0 ≤ i < J, compute a value Ṽ (i) ≤ min{ V (α) : α(i) ≤ α ≤ α(i+1) }.


2. Output min0≤i<J Ṽ (i).

The idea here is that if (for all i) α(i+1) −α(i) is small then V (α(i) ) ≈ V (α(i+1) ).
Thus the quantity output in (2) will closely approximate minα≥0 V (α).

In our implementation, we compute Ṽ (i) by appropriately interpolating between V (α(i) ) and V (α(i+1) ) (details in the full paper). Thus our approach reduces to computing quantities of the form V (α). We need a fast procedure for this task (since J may be large). Considering eq. (10), we see that this involves an applica-
computing quantities of the form V (α). We need a fast procedure for this task
(since J may be large). Considering eq. (10) we see that this involves an applica-
tion of the S-lemma, “restricted” to the hyperplane H α . An efficient realization
of this idea, which allows for additional leveraging of combinatorial information,
is obtained by computing the projection of the quadratic F (x) to H α . This is
the subject of the next section.

1.3 Projecting a Quadratic


Let M = QΛQT be an n × n matrix. Here the columns of Q are the eigenvectors of M and Λ = diag{λ1 , . . . , λn } where the λi are the eigenvalues of M . We assume λ1 ≤ . . . ≤ λn . Let c ≠ 0 be an arbitrary vector, denote H = { x ∈ Rn : cT x = 0 }, and let P be the projection matrix onto H.
In this section we describe an efficient algorithm for computing an eigenvalue-
eigenvector decomposition of the “projected quadratic” P M P . Note that if
x ∈ H, xT P M P x = xT M x. The vector c could be dense (is dense in important
cases) and Q could also be dense.
In [8] (also see Section 12.6 of [9] and references therein) the following “in-
verse” problem is considered. Suppose λi < λi+1 (1 ≤ i < n) and that for
1 ≤ i ≤ n − 1 we are given a number λ̃i with λi < λ̃i < λi+1 . Then we want
to find c (and hence, P ) such that the λ̃i are the nonzero eigenvalues of P M P .
Our approach reverse engineers that of [8], and extends it so as to handle the
case where the λi are not distinct.
Returning to our problem, clearly c is an eigenvector of P M P (corresponding
to eigenvalue 0). The remaining eigenvalues λ̃1 , . . . , λ̃n−1 are known to satisfy
λ1 ≤ λ̃1 ≤ λ2 ≤ λ̃2 ≤ . . . ≤ λn−1 ≤ λ̃n−1 ≤ λn .
Definition 1. An eigenvector q of M is called acute if q T c ≠ 0. An eigenvalue λ of M is called acute if at least one eigenvector corresponding to λ is acute.
In (e.2) below we will use the convention 0/0 = 0.
Lemma 1. Let α1 < α2 < . . . < αq be the acute eigenvalues of M . Write d = QT c. Then, for 1 ≤ i ≤ q − 1,
(e.1) The equation ∑_{j=1}^{n} d2j /(λj − λ) = 0 has a unique solution λ̂i in (αi , αi+1 ).
(e.2) Let wi = Q(Λ − λ̂i I)−1 d. Then cT wi = 0 and P M P wi = λ̂i wi .
Proof (e.2). Note that the expression in (e.1), evaluated at λ̂i , can be written as
0 = dT (Λ − λ̂i I)−1 d = cT Q(Λ − λ̂i I)−1 QT c = cT wi . (11)
Thus, we have that wi is a linear combination of acute eigenvectors of M and that wi ∈ H, and therefore P wi = wi . So
(M − λ̂i I)wi = Q(Λ − λ̂i I)QT wi = QQT c = c,

and therefore

P M P wi = P M wi = λ̂i P wi = λ̂i wi ,
as desired.
Altogether, Lemma 1 produces q − 1 eigenvalue/eigenvector pairs of P M P . The vector in (e.2) should not be explicitly computed; rather, the factorized form in (e.2) will suffice. The root of the equation in (e.1) can be quickly obtained using numerical methods (such as golden section search) since the expression in (e.1) is monotonically increasing in (αi , αi+1 ) (it may also be possible to adapt the basic trust-region algorithm [14], which addresses a similar but not identical problem).
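
As a concrete illustration, a minimal numpy sketch of this computation (our own code; plain bisection instead of golden section, exploiting the monotonicity just noted — the function increases from −∞ to +∞ between consecutive acute eigenvalues):

```python
import numpy as np

def secular_root(lam, d, lo, hi, iters=100):
    """Unique zero of f(x) = sum_j d_j^2 / (lam_j - x) on (lo, hi),
    where lo, hi are consecutive acute eigenvalues of M."""
    f = lambda x: np.sum(d ** 2 / (lam - x))
    a, b = lo, hi
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if f(mid) < 0.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

def projected_eigvec(Q, lam, d, lam_hat):
    """w = Q (Lambda - lam_hat I)^{-1} d, as in (e.2)."""
    return Q @ (d / (lam - lam_hat))
```

As the text notes, when n is large the eigenvector should be kept in the factorized form rather than assembled.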
Lemma 2. Let α be an eigenvalue of M , V α the set of columns of Q with
eigenvalue α, and A = A(α) denote the acute members of V α . If |A| > 0, then
we can construct |A| − 1 eigenvectors of P M P corresponding to eigenvalue α,
each of which is a linear combination of elements of A and is orthogonal to c.
Proof: Write m = |A|, and let H be the m × m Householder matrix [9] corresponding to dA , i.e. H is a symmetric matrix with H 2 = Im such that
HdA = (‖dA ‖2 , 0, ..., 0)T ∈ Rm .
Let QA be the n × m submatrix of Q consisting of the columns corresponding
to A, and define
W = QA H. (12)
Then cT W = dTA H = (‖dA ‖2 , 0, ..., 0). In other words, the columns of the
submatrix Ŵ consisting of the last m − 1 columns of W are orthogonal to c.
Denoting by Ĥ the submatrix of H consisting of the last m − 1 columns of H,
we therefore have
Ŵ = QA Ĥ, and
P M P Ŵ = P QΛQT Ŵ = P QΛQT QA Ĥ = αP QA Ĥ = αŴ .
Finally, Ŵ T Ŵ = Ĥ T Ĥ = Im−1 , as desired.
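
Numerically this construction is only a few lines; a numpy sketch (our own helper names), using the standard stable Householder reflector u = dA + sign(dA,1 )‖dA ‖e1 (the resulting sign of the first entry of HdA is immaterial, since only the remaining columns are kept):

```python
import numpy as np

def acute_block(QA, dA):
    """Given the n x m matrix QA of acute eigenvectors for one eigenvalue
    alpha, and dA = QA^T c, return W_hat (n x (m-1)) whose columns are
    orthonormal, orthogonal to c, and satisfy P M P W_hat = alpha W_hat."""
    m = dA.shape[0]
    u = dA.astype(float).copy()
    s = 1.0 if u[0] >= 0 else -1.0
    u[0] += s * np.linalg.norm(dA)                   # avoids cancellation
    Hh = np.eye(m) - 2.0 * np.outer(u, u) / (u @ u)  # Hh dA = (-s*||dA||, 0, ..., 0)^T
    return (QA @ Hh)[:, 1:]                          # drop the column aligned with c
```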
Now suppose that
α1 < α2 < . . . < αq
denote the distinct acute eigenvalues of M (possibly q = 0). Let p denote the
number of columns of Q which are perpendicular eigenvectors. Writing mi =
|A(αi )| > 0 for 1 ≤ i ≤ q, we have that

q
n= mi + p.
i=1

(p.1) Using Lemma 1 we obtain q − 1 eigenvectors of P M P , each of which is


a linear combination of acute eigenvectors among Q. Any eigenvalue of
P M P constructed in this manner is different from all acute eigenvalues of
M.

(p.2) Using Lemma 2 we obtain, for each i, a set of mi − 1 eigenvectors of


P M P , orthogonal to c and with eigenvalue αi , each of which is a linear
combination of elements of A(αi ). In total, we obtain n−q −p eigenvectors
of P M P .
(p.3) Let p denote the number of perpendicular vectors among Q. Any such
vector v (with eigenvalue λ, say) by definition satisfies P M P v = P M v =
λP v = λv.

By construction, all eigenvectors of P M P constructed as per (p.1) and (p.2) are


distinct. Those arising in (p.3) are different from those in (p.1) and (p.2) since
no column of Q is a linear combination of other columns of Q. Thus, altogether,
(p.1)-(p.3) account for n−1 distinct eigenvectors of P M P , all of them orthogonal
to c, by construction. Finally, the vector c itself is an eigenvector of P M P ,
corresponding to eigenvalue 0.
To conclude this section, we note that it is straightforward to iterate the procedure in this section, so as to project a quadratic onto hyperplanes of dimension less than n − 1. More details will be provided in the full paper.

1.4 Combinatorial Bounds on Distance Functions

Here we take up the problem of computing strong lower bounds on the Euclidean
distance from a point to the set P ∩ K. In this abstract we will focus on the
cardinality constrained problem, but results of a similar flavor hold for the case
of disjunctive sets.
Let a ∈ Rn , b ∈ R, K < n be a positive integer, and ω ∈ Rn . Consider the problem
D2min (ω, a) := min { ∑_{j=1}^{n} (xj − ωj )2 : aT x = b and ‖x‖0 ≤ K } . (13)

Clearly, the sum of the n − K smallest values ωj2 constitutes a ("naive") lower bound for problem (13). But it is straightforward to show that an exact solution to (13) is obtained by choosing S ⊆ {1, . . . , n} with |S| ≤ K, so as to minimize
(b − ∑_{j∈S} aj ωj )2 / ∑_{j∈S} a2j + ∑_{j∉S} ωj2 . (14)

[We use the convention that 0/0 = 0.] Empirically, the naive bound mentioned
above is very weak since the first term in (14) is typically at least an order of
magnitude larger than the second; and it is the bound, rather than the set S
itself, that matters.
Suppose aj = 1 for all j. It can be shown, using (14), that the optimal set S
has the following structure: S = P ∪ N , where |P | + |N | ≤ K, and P consists of
the indices of the |P | smallest nonnegative ωj (resp., N consists of the indices
of the |N | smallest |ωj | with ωj < 0). The optimal S can be computed in O(K)

time, after sorting the ωj . When ω ≥ 0 or ω ≤ 0 we recover the naive procedure


mentioned above (though again we stress that the first term in (14) dominates).
In general, however, we have:
2
Theorem 3. (a) It is NP-hard to compute Dmin (ω, a). (b) Let 0 < < 1. We
can compute a vector x̂ with j aj x̂j = b and x̂0 ≤ K, and such that


n
(x̂j − ωj )2 ≤ (1 + )Dmin
2
(ω, a),
j=1

in time polynomial in n, −1 , and the number of bits needed to represent ω


and a.
In our current implementation we have not used the algorithm in part (b) of the
Lemma, though we certainly plan to evaluate this option. Instead, we proceed
as follows. Assume aj = 0 for all j. Rather than solving problem (13), instead
we consider
⎧ ⎫
⎨ n ⎬
min a2j (xj − ωj )2 : aT x = b and x0 ≤ K .
⎩ ⎭
j=1

Writing ω̄j = aj ωj (for all j), this becomes
min { ∑_{j=1}^{n} (xj − ω̄j )2 : ∑_j xj = b and ‖x‖0 ≤ K } ,
which as noted above can be efficiently solved.

1.5 Application of the S-Lemma


Let M = QΛQT ⪰ 0 be a matrix given by its eigenvector factorization. Let H be a hyperplane through the origin, x̂ ∈ H, v ∈ Rn , δj > 0 for 1 ≤ j ≤ n, and β > 0. Here we solve the problem
min xT M x + v T x, subject to ∑_{i=1}^{n} δi (xi − x̂i )2 ≥ β, and x ∈ H. (15)

By rescaling, translating, and appropriately changing notation, the problem becomes:
min xT M x + v T x, subject to ∑_{i=1}^{n} x2i ≥ β, and x ∈ H. (16)

Let P be the n × n matrix corresponding to projection onto H. Using Section 1.3 we can produce a representation of P M P as Q̃Λ̃Q̃T , where the nth eigenvector q̃n is orthogonal to H, and λ̃1 = min_{i<n} {λ̃i }. Thus, problem (16) becomes, for appropriately defined ṽ,
Γ := min ∑_{j=1}^{n−1} λ̃j yj2 + 2ṽ T y, subject to ∑_{j=1}^{n−1} yj2 ≥ β. (17)

Using the S-lemma, we have that Γ ≥ γ iff there exists μ ≥ 0 s.t.
∑_{j=1}^{n−1} λ̃j yj2 + 2ṽ T y − γ − μ ( ∑_{j=1}^{n−1} yj2 − β ) ≥ 0 ∀ y ∈ Rn−1 . (18)

Using some linear algebra, this is equivalent to
Γ = max { μβ − ∑_{i=1}^{n−1} ṽi2 /(λ̃i − μ) : 0 ≤ μ < λ̃1 } . (19)

This is a simple task, since in [0, λ̃1 ) the objective in (19) is concave in μ.
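
For completeness, a sketch of this one-dimensional maximization (our own code; golden-section search on the concave objective, assuming λ̃1 > 0). Note that every feasible μ certifies a valid lower bound via (18), so the returned value is safe even if the search is terminated early.

```python
import numpy as np

def gamma_lower_bound(lams, v, beta, iters=100):
    """Approximately maximize mu*beta - sum_i v_i^2/(lams_i - mu) over
    0 <= mu < lams[0], where lams holds the projected eigenvalues in
    increasing order and v the projected linear term from (17)."""
    obj = lambda mu: mu * beta - np.sum(v ** 2 / (lams - mu))
    lo, hi = 0.0, lams[0] * (1.0 - 1e-9)        # stay strictly below lam_1
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    for _ in range(iters):
        if obj(a) < obj(b):                      # maximum lies to the right
            lo, a = a, b
            b = lo + phi * (hi - lo)
        else:                                    # maximum lies to the left
            hi, b = b, a
            a = hi - phi * (hi - lo)
    return obj(0.5 * (lo + hi))                  # valid lower bound on Gamma
```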
Remarks:
(1) Our updated template in Section 1.2 requires the solution of multiple prob-
lems of the form (19) but just one computation of Q̃ and Λ̃.
(2) Consider any integer 1 ≤ p < n − 1. When μ < λ̃1 , the expression maximized in (19) is lower bounded by μβ − ∑_{i=1}^{p} ṽi2 /(λ̃i − μ) − (∑_{i=p+1}^{n−1} ṽi2 )/(λ̃p+1 − μ). This, and related facts, yield an approximate version of our approach which only asks for the first p elements of the eigenspace of P M P (and M ).

Capturing the second eigenvalue. We see that Γ < λ̃1 β (and frequently this
bound is close). In experiments, the solution y ∗ to (16) often “cheats” in that y1∗
is close to zero. We can then improve on our procedure if the second projected
eigenvalue, λ̃2 , is significantly larger than λ̃1 . Assuming that is the case, pick a
value θ with y1∗2 /β < θ < 1.

(a) If we assert that y12 ≥ θβ, then we may be able to strengthen the constraint in (15) to ∑_{i=1}^{n} δi (xi − x̂i )2 ≥ γ, where γ = γ(θ) > β. See Lemma 3 below. So the assertion amounts to applying the S-lemma, but using γ in place of β.
(b) Otherwise, we have that ∑_{i=2}^{n−1} yi2 ≥ (1 − θ)β. In this case, instead of the right-hand side of (19), we will have
 
max { μ(1 − θ)β − ∑_{i=2}^{n−1} ṽi2 /(λ̃i − μ) : 0 ≤ μ ≤ λ̃2 } . (20)

The minimum of the quantities obtained in (a) and (b) yields a valid lower bound on Γ ; we can evaluate several candidates for θ and choose the strongest bound. When λ̃2 is significantly larger than λ̃1 we often obtain an improvement over the basic approach of Section 1.5.
Note: the approach in this section constitutes a form of branching and in our
testing has proved very useful when λ2 > λ1 . It is, intrinsically, a combinatorial
approach, and thus not easily reproducible using convexity arguments alone.
To complete this section, we point out that the construction of the quantities γ(θ) above is based on the following observation:

Lemma 3. Let v ∈ Rn , let H ⊂ Rn be an (n − 1)-dimensional hyperplane with v ∉ H, and w be the projection of v onto H. Let G ⊂ Rn be a hyperplane of dimension at most n − 1, K the intersection of G with the closed half-space of Rn separated from v by H, Dw,G the distance from w to G, and D̄v,K the distance from v to K (D̄v,K = +∞ if K = ∅). Then D̄2v,K ≥ ‖v − w‖2 + D2w,G .
≥ v − w2 + Dw,G 2
.

2 Computational Experiments

We consider problems min{ xT M x + v T x : ∑_j xj = 1, x ≥ 0, ‖x‖0 ≤ K }. The matrix M ⪰ 0 is given in its eigenvector/eigenvalue factorization QΛQT . To stress-test our linear algebra routines, we construct Q as the product of random rotations: as the number of rotations increases, so does the number of nonzeroes in Q, and the overall "complexity" of M . We ran our procedure after computing the solution to the (diagonalized) "weak" formulation
min{ y T Λy + v T x : QT x = y, ∑_j xj = 1, x ≥ 0 }.
j

We also ran the (again, diagonalized) perspective formulation [10], [12], a strong
conic formulation (here, λmin is the minimum λi ):
 
min λmin ∑_j wj + ∑_j (λj − λmin )yj2
s.t. QT x = y, ∑_j xj = 1,
x2j − wj zj ≤ 0, 0 ≤ zj ≤ 1 ∀ j, (21)
∑_j zj ≤ K, xj ≤ zj ∀ j, x, w ∈ Rn+ .

We used the Updated Template given above, with c = ∇F (x∗ ) and with the α(i) quantities set according to the following method: (a) J = 100, and (b) α(J) = argmax{α ≥ 0 : H α ∩ S n−1 ≠ ∅} (S n−1 is the unit simplex). The improvement technique involving the second eigenvalue was applied in all cases.
For the experiments in Tables 1 and 2, we used Cplex 12.1 on a single core
of a 2.66 GHz quad-core Xeon machine with 16 GB of RAM, which was never
exceeded. In the tests in Table 1, n = 2443 and the eigenvalues are from a
finance application. Q is the product of 5000 random rotations, resulting in
142712 nonzeros in Q.
Here, rQMIP refers to the weak formulation, PRSP to the perspective for-
mulation, and SLE to the approach in this paper. “LB” is the lower bound
produced by a given approach, and “sec” is the CPU time in seconds. The second
eigenvalue technique proved quite effective in all these tests.
In Table 2 we consider examples with n = 10000 and random Λ. In the table,
Nonz indicates the number of nonzeroes in Q; as this number increases the
quadratic becomes less diagonally dominant.

Table 1. Examples with few nonzeroes

K rQMIP PRSP SLE rQMIP PRSP SLE


LB LB LB sec sec sec

200 0.031 0.0379 0.0382 14.02 59.30 5.3


100 0.031 0.0466 0.0482 13.98 114.86 5.8
90 0.031 0.0485 0.0507 14.08 103.38 5.9
80 0.031 0.0509 0.0537 14.02 105.02 6.2
70 0.031 0.0540 0.0574 13.95 100.06 6.2
60 0.031 0.0581 0.0624 15.64 111.63 6.4
50 0.031 0.0638 0.0696 13.98 110.78 6.4
40 0.031 0.0725 0.0801 14.03 104.48 6.5
30 0.031 0.0869 0.0958 14.17 104.48 6.8
20 0.031 0.1157 0.1299 15.69 38.13 6.9
10 0.031 0.2020 0.2380 14.05 43.77 7.2

Table 2. Larger examples

Nonz rQMIP PRSP SLE rQMIP PRSP SLE


in Q LB LB LB sec sec sec

5.3e+05 2.483e-03 1.209e-02 1.060e-02 332 961.95 57.69


3.7e+06 2.588e-03 1.235e-02 1.113e-02 705 2299.75 57.55
1.8e+07 2.671e-03 1.248e-02 1.117e-02 2.4e+03 1.3e+04 57.69
5.3e+07 2.781e-03 1.263e-02 1.120e-02 1.1e+04 8.5e+04 58.44
8.3e+07 2.758e-03 1.262e-02 1.211e-02 2.3e+04 1.4e+05 57.38

As in Table 1, SLE and PRSP provide similar improvements over rQMIP


(which is clearly extremely weak). SLE proves uniformly fast. In the examples
in Table 2, the smallest ten (or so) eigenvalues are approximately equal, with
larger values after that. As a result, on these examples our second eigenvalue
technique proved ineffective.
Also note that the perspective formulation quickly proves impractical. A
cutting-plane procedure that replaces the conic constraints in (21) with (outer
approximating) linear inequalities is outlined in [10], [12] and tested on random
problems with n ≤ 400. The procedure begins by solving rQMIP and then it-
eratively adds the inequalities; or it could simply solve a formulation consisting
of rQMIP, augmented with a set of pre-computed inequalities. In our experi-
ments with this linearized approximation, we found that (a) it can provide a very
good lower bound to the conic perspective formulation, (b) it can run signifi-
cantly faster than the full conic formulation, but, (c) it proves significantly slower
than rQMIP, and, in particular, still significantly slower than the combination of rQMIP and SLE.

Table 3. Detailed analysis of K = 70 case of Table 1

algorithm (setting, sec/node)             threads  nodes   wall-clock (sec)  LB      UB      root LB
QPMIP, mip emph 3 (16.67 sec/node)           4     10000       41685         0.0314  0.241
PRSP-MIP, mip emph 2 (90.4 sec/node)        16     14000       39550         0       0.8265
PRSP-MIP, mip emph 3 (45.30 sec/node)       16      7000       19817         0       0.8099
LPRSP-MIP, mip emph 0 (11.21 sec/node)       4     39000      109333         0.0554  0.305   0.0540
LPRSP-MIP, mip emph 1 (84.04 sec/node)      16      7000       36751         0.0542  0.412   0.0540
LPRSP-MIP, mip emph 2 (35.22 sec/node)      16     16000       35222         0.0543  0.309   0.0540
LPRSP-MIP, mip emph 3 (153 sec/node)        16      6000       57469         0.0564  0.702   0.0540

A strengthened version of the perspective formulation,


which requires the solution of a semidefinite program, is given in [11].
Note that the perspective formulation itself is an example of the paradigm that
we consider in this paper: a convex formulation for a nonconvex problem with a
convex objective; thus we expect it to exhibit stalling. Table 3 concerns the K =
70 case of Table 1, using Cplex 12.1 on a dual 2.93 GHz quad-core “Nehalem”
machine with 48GB of physical memory. [This CPU uses “hyperthreading” and
Cplex 12.1, as a default, will use 16 threads]. On this machine, rQMIP requires
4.35 seconds (using Cplex) and our method, 3.54 seconds (to prove a lower bound
of 0.0574).
In this table, QPMIP is the weak formulation, PRSP-MIP is the perspec-
tive formulation, and LPRSP-MIP is the linearized perspective version (con-
straint (21) is linearized at xj = 1/K which proved better than other choices).
[Comment: Cplex 12.1 states a lower bound of 0 for PRSP-MIP]. “wall-clock
time” indicates the observed running time. The estimates of CPU time per node
were computed using the formula (wall-clock time)*threads/nodes.

References
1. Ben-Tal, A., Nemirovsky, A.: Lectures on Modern Convex Optimization: Analysis,
Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization.
SIAM, Philadelphia (2001)
2. Bienstock, D.: Computational study of a family of mixed-integer quadratic pro-
gramming problems. Math. Programming 74, 121–140 (1996)
3. Bienstock, D., Zuckerberg, M.: Subset algebra lift algorithms for 0-1 integer pro-
gramming. SIAM J. Optimization 105, 9–27 (2006)
4. Bienstock, D., McClosky, B.: Tightening simple mixed-integer sets with guaranteed
bounds (submitted 2008)
5. Boyd, S., El Ghaoui, L., Feron, E., Balakrishnan, V.: Linear matrix inequalities in
system and control theory. SIAM, Philadelphia (1994)
6. Cook, W., Kannan, R., Schrijver, A.: Chvátal closures for mixed integer programs.
Math. Programming 47, 155–174 (1990)
7. De Farias, I., Johnson, E., Nemhauser, G.: A polyhedral study of the cardinality
constrained knapsack problem. Math. Programming 95, 71–90 (2003)
8. Golub, G.H.: Some modified matrix eigenvalue problems. SIAM Review 15, 318–
334 (1973)
9. Golub, G.H., van Loan, C.: Matrix Computations. Johns Hopkins University Press,
Baltimore (1996)
10. Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0-1 mixed integer
programs. Mathematical Programming 106, 225–236 (2006)
11. Frangioni, A., Gentile, C.: SDP Diagonalizations and Perspective Cuts for a Class
of Nonseparable MIQP. Oper. Research Letters 35, 181–185 (2007)
12. Günlük, O., Linderoth, J.: Perspective Relaxation of Mixed Integer Nonlinear Pro-
grams with Indicator Variables. In: Lodi, A., Panconesi, A., Rinaldi, G. (eds.)
IPCO 2008. LNCS, vol. 5035, pp. 1–16. Springer, Heidelberg (2008)
13. Moghaddam, B., Weiss, Y., Avidan, S.: Generalized spectral bounds for sparse
LDA. In: Proc. 23rd Int. Conf. on Machine Learning, pp. 641–648 (2006)
14. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat.
Comput. 4, 553–572 (1983)
15. Pólik, I., Terlaky, T.: A survey of the S-lemma. SIAM Review 49, 371–418 (2007)
16. Rendl, F., Wolkowicz, H.: A semidefinite framework for trust region subproblems
with applications to large scale minimization. Math. Program 77, 273–299 (1997)
17. Stern, R.J., Wolkowicz, H.: Indefinite trust region subproblems and nonsymmetric
eigenvalue perturbations. SIAM J. Optim. 5, 286–313 (1995)
18. Sturm, J., Zhang, S.: On cones of nonnegative quadratic functions. Mathematics
of Operations Research 28, 246–267 (2003)
19. Miller, W., Wright, S., Zhang, Y., Schuster, S., Hayes, V.: Optimization methods for
selecting founder individuals for captive breeding or reintroduction of endangered
species (2009) (manuscript)
20. Yakubovich, V.A.: S-procedure in nonlinear control theory, vol. 1, pp. 62–77. Vest-
nik Leningrad University (1971)
21. Ye, Y., Zhang, S.: New results on quadratic minimization. SIAM J. Optim. 14,
245–267 (2003)
Restricted b-Matchings
in Degree-Bounded Graphs

Kristóf Bérczi and László A. Végh

MTA-ELTE Egerváry Research Group (EGRES),


Department of Operations Research, Eötvös Loránd University,
Pázmány Péter sétány 1/C, Budapest, Hungary, H-1117
{berkri,veghal}@cs.elte.hu

Abstract. We present a min-max formula and a polynomial time al-


gorithm for a slight generalization of the following problem: in a simple
undirected graph in which the degree of each node is at most t + 1, find a
maximum t-matching containing no member of a list K of forbidden Kt,t
and Kt+1 subgraphs. An analogous problem for bipartite graphs without
degree bounds was solved by Makai [15], while the special case of finding
a maximum square-free 2-matching in a subcubic graph was solved in [1].

Keywords: square-free, Kt,t -free, Kt+1 -free, b-matching, subcubic graph.

1 Introduction
Let G = (V, E) be an undirected graph and let b : V → Z+ be an upper bound
on the nodes. An edge set F ⊆ E is called a b-matching if dF (v), the number of
edges in F incident to v, is at most b(v) for each node v. (This is often called
simple b-matching in the literature.) For some integer t ≥ 2, by a t-matching
we mean a b-matching with b(v) = t for every v ∈ V . Let K be a set consisting
of Kt,t ’s, complete bipartite subgraphs of G on two colour classes of size t, and
Kt+1 ’s, complete subgraphs of G on t + 1 nodes. The node-set and the edge-set
of a subgraph K ∈ K are denoted by VK and EK , respectively. By a K-free b-
matching we mean a b-matching not containing any member of K. In this paper,
we give a min-max formula on the size of K-free b-matchings and a polynomial
time algorithm for finding one with maximum size (that is, a K-free b-matching
F ⊆ E with maximum cardinality) under the assumptions that for any K ∈ K
and any node v of K,
VK spans no parallel edges (1)
b(v) = t (2)
dG (v) ≤ t + 1. (3)
Note that this is a generalization of the problem mentioned in the abstract. The
most important special case of K-free b-matching is to find a maximum C3 -free
⋆ Supported by the Hungarian National Foundation for Scientific Research (OTKA) grant K60802.

or C4-free 2-matching in a graph, where Ck stands for a cycle of length k. The motivation for these problems is twofold. On the one hand, a natural relaxation
of the Hamiltonian cycle problem is to find a C≤k -free 2-factor, that is, a 2-factor
containing no cycle of length at most k. Cornuéjols and Pulleyblank [2] showed
this problem to be NP-complete for k ≥ 5. In his Ph.D. thesis [6], Hartvigsen
proposed a solution for the case k = 3. Hence the remaining question is to
find a maximum C≤4 -free 2-matching, and another natural question is to find a
maximum C4 -free 2-matching (possibly containing triangles).
The other motivation comes from connectivity-augmentation, that is, when
one would like to make a graph G = (V, E) k-node-connected by the addition of
a minimum number of new edges. It is easy to see that for k = n − 2 (n = |V |)
this problem is equivalent to finding a maximum matching in the complement
graph of G. For k = n − 3 the problem is equivalent to finding a maximum
C4 -free 2-matching.
The C4 -free 2-matching problem admits two natural generalizations. The first
one is Kt,t -free t-matchings considered in this paper, while the second is t-
matchings containing no complete bipartite graph Ka,b with a + b = t + 2.
This latter problem is equivalent to connectivity augmentation for k = n − t − 1.
The complexity of connectivity augmentation for general k is still open, while
connectivity augmentation by one, that is, when the input graph is already
(k − 1)-connected was recently solved in [20] (this corresponds to the case when
the graph contains no Ka,b with a + b = t + 3, in particular, d(v) ≤ t + 1).
The weighted versions of these problems are also of interest. The weighted
C≤k -free 2-matching problem asks for a C≤k -free 2-matching with maximum
weight for a weight function defined on the edge set. For k = 2 the problem
is just to find a 2-matching with maximum weight, while Király showed [11]
that the problem is NP-complete for k = 4 even in bipartite graphs with 0 − 1
weights on the edges. The case of k = 3 in general graphs is still open. Hartvigsen
and Li [9], and recently Kobayashi [12] gave polynomial-time algorithms for
the weighted C3 -free 2-matching problem in subcubic graphs with an arbitrary
weight function.
Let us now consider the special case of C4 -free 2-matchings in bipartite graphs.
This problem was solved by Hartvigsen [7,8] and Király [10]. A generalization of
the problem to maximum Kt,t -free t-matchings in bipartite graphs was given by
Frank [3] who observed that this is a special case of covering positively crossing
supermodular functions on set pairs, solved by Frank and Jordán in [4]. Makai
[15] generalized Frank’s theorem for the case when a list K of forbidden Kt,t ’s
is given (that is, a t-matching may contain Kt,t ’s not in K.) He gave a min-max
formula based on a polyhedral description for the minimum cost version for node-
induced cost functions. Pap [16] gave a further generalization of the maximum
cardinality version for excluded complete bipartite subgraphs and developed a
simple, purely combinatorial algorithm. For node induced cost functions, such
an algorithm was given by Takazawa [19] for Kt,t -free t-matching.
Much less is known when the underlying graph is not assumed to be bipartite
and finding a maximum C4 -free 2-matching is still open. The special case when
the graph is subcubic was solved by the first author and Kobayashi [1]. In terms
of connectivity augmentation, the equivalent problem is augmenting an (n − 4)-connected graph to (n − 3)-connected. Our theorem is a generalization of this
result.
It is worth mentioning that the polynomial solvability of the above problems
seems to show a strong connection with jump systems. In [18], Szabó proved that
for a list K of forbidden Kt,t and Kt+1 subgraphs the degree sequences of K-free t-
matchings form a jump system in any graph. Concerning bipartite graphs,
Kobayashi and Takazawa showed [14] that the degree sequences of C≤k -free 2-
matchings do not always form a jump system for k ≥ 6. These results are con-
sistent with the polynomial solvability of the C≤k -free 2-matching problem, even
when restricting it to bipartite graphs. Similar results are known about even fac-
tors due to [13]. Although Szabó’s result suggests that finding a maximum K-free
t-matching should be solvable in polynomial time, the problem is still open.
Among our assumptions, (1) and (2) may be considered as natural ones as
they hold for the maximum Kt,t -free t-matching problem in a simple graph.
We exclude parallel edges on the node sets of members of K in order to avoid
having two different Kt,t ’s on the same two colour classes or two Kt+1 ’s on the
same ground set. However, the degree bound (3) is a restrictive assumption and
dissipates essential difficulties. Our proof strongly relies on this and the theorem
cannot be straightforwardly generalized, as it can be shown by using the example
in Chapter 6 of [20].
The proof and algorithm use the contraction technique of [11], [16] and [1].
Our contribution on the one hand is the extension of this technique for t ≥ 2 and
forbidding Kt+1 ’s as well, while on the other hand the argument is significantly
simpler than the argument in [1].
Throughout the paper we use the following notation. For an undirected graph G = (V, E), the set of edges induced by X ⊆ V is denoted by E[X]. For disjoint subsets X, Y of V, E[X, Y] denotes the set of edges between X and Y. The set of nodes in V − X adjacent to X by some edge from F ⊆ E is denoted by ΓF(X). We let dF(v) denote the number of edges in F ⊆ E incident to v, where loops in G are counted twice, while dF(X, Y) stands for the number of edges going between disjoint subsets X and Y. For a node v ∈ V, we sometimes abbreviate the set {v} by v, e.g. dF(v, X) is the number of edges between v and X. For a set X ⊆ V, let hF(X) = Σ_{v∈X} dF(v), the sum of the number of edges incident to X and twice the number of edges spanned by X. We use b(U) = Σ_{v∈U} b(v) for a function b : V → Z+ and a set U ⊆ V.
Let K be the list of forbidden Kt,t and Kt+1 subgraphs. For disjoint subsets
X, Y of V we denote by K[X] and K[X, Y ] the members of K contained in X
and having edges only between X and Y , respectively. That is, K[X, Y ] stands
for forbidden Kt,t ’s whose colour classes are subsets of X and Y . Recall that
VK and EK denote the node-set and edge-set of the forbidden graph K ∈ K,
respectively.
The rest of the paper is organized as follows. In Section 2 we formalize the
theorem and prove the trivial max ≤ min direction. Two shrinking operations
are introduced in Section 3, and Section 4 contains the proof of the max ≥ min
direction. Finally, the algorithm is presented in Section 5.

2 Main Theorem
Before stating our theorem, let us recall the well-known min-max formula on the
maximum size of a b-matching (see e.g. [17, Vol A, p. 562.]).

Theorem 1 (Maximum size of a b-matching). Let G = (V, E) be a graph with an upper bound b : V → Z+. The maximum size of a b-matching is equal to the minimum value of

    b(U) + |E[W]| + Σ_T ⌊(b(T) + |E[T, W]|)/2⌋    (4)

where U and W are disjoint subsets of V, and T ranges over the connected components of G − U − W.
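For instance, if G is a triangle and b ≡ 1, the choice U = W = ∅ leaves a single component T = V with bound ⌊(b(T) + |E[T, W]|)/2⌋ = ⌊(3 + 0)/2⌋ = 1, which is indeed the maximum size of a matching in a triangle.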

Let us now formulate our theorem. There are minor technical difficulties when
t = 2 that do not occur for larger t. In order to make both the formulation and
the proof simpler it is worth introducing the following definitions. We refer to
forbidden K2,2 and K3 subgraphs as squares and triangles, respectively.

Definition 2. For t = 2, we call a complete subgraph on four nodes square-full if it contains three forbidden squares.

Note that, by assumption (3), every square-full subgraph is a connected component of G. We denote the number of square-full components of G by S(G) for t = 2, and define S(G) = 0 for t > 2. It is easy to see that a K-free b-matching contains at most three edges from each square-full component of G. The following definition will be used in the proof of the theorem.

Definition 3. For t = 2, a forbidden triangle is called square-covered if its node set is contained in the node set of a forbidden square, otherwise uncovered.

The theorem is as follows.

Theorem 4. Let G = (V, E) be a graph with an upper bound b : V → Z+ and K be a list of forbidden Kt,t and Kt+1 subgraphs of G so that (1), (2) and (3) hold. Then the maximum size of a K-free b-matching is equal to the minimum value of

    b(U) + |E[W]| − |K̇[W]| + Σ_{T∈P} ⌊(b(T) + |E[T, W]| − |K̇[T, W]|)/2⌋ − S(G)    (5)

where U and W are disjoint subsets of V, P is a partition of the connected components of G − U − W and K̇ ⊆ K is a collection of node-disjoint forbidden subgraphs.
For fixed U, W, P and K̇ the value of (5) is denoted by τ (U, W, P, K̇). It is easy
to see that the contribution of a square-full component to (5) is always 3 and
a maximum K-free b-matching contains exactly 3 of its edges. Hence we may
count these components of G separately, so the following theorem immediately
implies the general one.

Theorem 5. Let G = (V, E) be a graph with an upper bound b : V → Z+ and K be a list of forbidden Kt,t and Kt+1 subgraphs of G so that (1), (2) and (3) hold. Furthermore, if t = 2, assume that G has no square-full component. Then the maximum size of a K-free b-matching is equal to the minimum value of

    b(U) + |E[W]| − |K̇[W]| + Σ_{T∈P} ⌊(b(T) + |E[T, W]| − |K̇[T, W]|)/2⌋    (6)

where U and W are disjoint subsets of V, P is a partition of the connected components of G − U − W and K̇ ⊆ K is a collection of node-disjoint forbidden subgraphs.

Proof (of max ≤ min in Theorem 5). Let M be a K-free b-matching. Then clearly |M ∩ (E[U] ∪ E[U, V − U])| ≤ b(U) and |M ∩ E[W]| ≤ |E[W]| − |K̇[W]|. Moreover, for each T ∈ P we have

    2 · |M ∩ (E[T] ∪ E[T, W])| = 2 · |M ∩ E[T]| + 2 · |M ∩ E[T, W]|
                               ≤ 2 · |M ∩ E[T]| + |M ∩ E[T, W]| + |E[T, W]| − |K̇[T, W]|
                               ≤ b(T) + |E[T, W]| − |K̇[T, W]|.

These together prove the inequality. □
3 Shrinking
In the proof of max ≥ min we use two shrinking operations to get rid of the Kt,t
and Kt+1 subgraphs in K.

Definition 6 (Shrinking a Kt,t subgraph). Let K be a Kt,t subgraph of G = (V, E) with colour classes KA and KB. Shrinking K in G consists of the following operations:

• identify the nodes in KA, and denote the corresponding node by ka,
• identify the nodes in KB, and denote the corresponding node by kb, and
• replace the edges between KA and KB with t − 1 parallel edges between ka and kb (we call the set of these edges a shrunk bundle between ka and kb).

When identifying the nodes in KA and KB, the edges (and also loops) spanned by KA and KB are replaced by loops on ka and kb, respectively. Each edge e ∈ E − EK is denoted by e again after shrinking a Kt,t subgraph and is called the image of the original edge. By abuse of notation, for an edge set F ⊆ E − EK, the corresponding subset of edges in the contracted graph is also denoted by F. Hence for an edge set F ⊆ E − EK we have hF(KA) = dF(ka), hF(KB) = dF(kb).

Fig. 1. Shrinking a Kt,t subgraph (the edges between KA and KB are replaced by t − 1 parallel edges between ka and kb)
Definition 7 (Shrinking a Kt+1 subgraph). Let K be a Kt+1 subgraph of G = (V, E). Shrinking K in G consists of the following operations:

• identify the nodes in VK, and denote the corresponding node by k,
• replace the edges in EK by ⌊(t + 1)/2⌋ − 1 loops on the new node k.
Fig. 2. Shrinking a Kt+1 subgraph (the node set VK is contracted to k, which receives ⌊(t + 1)/2⌋ − 1 loops)
Again, for an edge set F ⊆ E − EK, the corresponding subset of edges in the contracted graph is also denoted by F.
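To make the two operations concrete, the following Python sketch performs both shrinkings on a multigraph stored as a list of (u, v) edge pairs, with loops written as (u, u). The function names and this representation are ours and only illustrate Definitions 6 and 7, including the loop count reconstructed above.

def shrink_ktt(edges, KA, KB, ka, kb, t):
    """Shrink a K_{t,t} with colour classes KA, KB into nodes ka, kb (Def. 6)."""
    relabel = lambda v: ka if v in KA else (kb if v in KB else v)
    out = []
    for u, v in edges:
        if (u in KA and v in KB) or (u in KB and v in KA):
            continue                          # drop the edges of E_K ...
        out.append((relabel(u), relabel(v)))  # edges spanned by KA/KB become loops
    return out + [(ka, kb)] * (t - 1)         # ... and add the shrunk bundle

def shrink_kt1(edges, VK, k, t):
    """Shrink a K_{t+1} on node set VK into a single node k (Def. 7)."""
    relabel = lambda v: k if v in VK else v
    out = [(relabel(u), relabel(v)) for u, v in edges
           if not (u != v and u in VK and v in VK)]  # drop exactly E_K
    return out + [(k, k)] * ((t + 1) // 2 - 1)       # floor((t+1)/2) - 1 loops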
We usually denote the graph obtained by applying one of the shrinking op-
erations by G◦ = (V ◦ , E ◦ ). Throughout the section, the graph G, the function
b and the list K of forbidden subgraphs are supposed to satisfy the conditions
of Theorem 5. It is easy to see, by using (3), that two members of K are edge-
disjoint if and only if they are also node-disjoint, hence we simply call such pairs
disjoint.
The following two lemmas give the connection between the maximum size of a K-free b-matching in G and a K◦-free b◦-matching in G◦, where b◦ is a properly defined upper bound on V◦ and K◦ is a list of forbidden subgraphs in the contracted graph.
Lemma 8. Let G◦ = (V◦, E◦) be the graph obtained by shrinking a Kt,t subgraph K. Let K◦ be the set of forbidden subgraphs disjoint from K and define b◦ as b◦(v) = b(v) for v ∈ V − VK and b◦(ka) = b◦(kb) = t. Then the difference between the maximum size of a K-free b-matching in G and the maximum size of a K◦-free b◦-matching in G◦ is exactly t² − t.
Lemma 9. Let G◦ = (V◦, E◦) be the graph obtained by shrinking a Kt+1 subgraph K ∈ K where K is uncovered if t = 2. Let K◦ be the set of forbidden subgraphs disjoint from K and define b◦ as b◦(v) = b(v) for v ∈ V − VK, b◦(k) = t if t is even and b◦(k) = t + 1 if t is odd. Then the difference between the maximum size of a K-free b-matching in G and the maximum size of a K◦-free b◦-matching in G◦ is exactly ⌊t²/2⌋.

The proof of Lemma 8 is based on the following claim.

Claim 10. Assume that K ∈ K is a Kt,t subgraph with colour classes KA and KB and M′ is a K-free b-matching of G − EK. Then M′ can be extended to a K-free b-matching M of G with |M| = |M′| + t² − max{1, hM′(KA), hM′(KB)}.
Proof. First we consider the case t ≥ 3. Let P be a minimum size matching of K covering each node v ∈ VK with dM′(v) = 1 (note that dM′(v) ≤ 1 for v ∈ VK as d(v) ≤ t + 1). If there is no such node, then let P consist of an arbitrary edge in EK. We claim that M = M′ ∪ (EK − P) satisfies the above conditions. Indeed, M is a b-matching and |M ∩ EK| = t² − max{1, hM′(KA), hM′(KB)} clearly holds, so we only have to verify that it is also K-free.

Assume that there is a forbidden Kt,t subgraph K′ in M with colour classes K′A, K′B. EK′ must contain an edge uv ∈ EK ∩ M with u ∈ K′A and v ∈ K′B. By symmetry, we may assume that u ∈ KA. As b(u) = t, ΓM(u) = K′B and also |ΓM(u) ∩ KB| ≥ t − 1. Hence |K′B ∩ KB| ≥ t − 1. Consider a node z ∈ K′A. Since dM(z, K′B) ≥ t − 1 and t ≥ 3, we get dM(z, KB) > 0, thus K′A ⊆ ΓM(KB). Because of ΓM(KB) = KA, this gives K′A = KA. K′B = KB follows similarly, giving a contradiction.

If there is a forbidden Kt+1 subgraph K′ in M, then EK′ must contain an edge uv ∈ EK ∩ M, u ∈ KA. As above, |VK′ ∩ KB| ≥ t − 1. Using t ≥ 3 again, KA ⊆ ΓM(VK′ ∩ KB) ⊆ VK′. But KA ⊆ VK′ is a contradiction since t + 1 = |VK′| ≥ |VK′ ∩ KA| + |VK′ ∩ KB| ≥ t + t − 1 = 2t − 1 > t + 1.

Now let t = 2 and KA = {v1, v3}, KB = {v2, v4}. If max{hM′(KA), hM′(KB)} ≤ 1, then we may assume by symmetry that dM′(v1) = dM′(v2) = 0. Clearly, M = M′ ∪ {v1v2, v1v4, v2v3} is a K-free 2-matching. If max{hM′(KA), hM′(KB)} = 2, we claim that at least one of M1 = M′ ∪ {v1v2, v3v4} and M2 = M′ ∪ {v1v4, v2v3} is K-free. Assume M1 contains a forbidden square or triangle K′; by symmetry assume it contains the edge v1v2. If K′ contains v3v4 as well, then K′ is the square v1v3v4v2. Otherwise, it consists of v1v2 and a path L of length 2 or 3 between v1 and v2, not containing v3 and v4. In the first case, the only forbidden subgraph possibly contained in M2 is the square v1v3v2v4, implying that {v1, v2, v3, v4} is a square-full component, a contradiction. In the latter case, it is easy to see that M2 cannot contain a forbidden subgraph. □
Proof (of Lemma 8). First we show that if M is a K-free b-matching in G then there is a K◦-free b◦-matching M◦ in G◦ with |M◦| ≥ |M| − (t² − t). Let M′ = M − EK. Clearly, |M ∩ EK| ≤ t² − max{1, hM′(KA), hM′(KB)}. In G◦, let M◦ be the union of M′ and t − max{1, dM′(ka), dM′(kb)} parallel edges from the shrunk bundle between ka and kb. It is easy to see that M◦ is a K◦-free b◦-matching in G◦ with |M◦| ≥ |M| − (t² − t).

The proof is completed by showing that for an arbitrary K◦-free b◦-matching M◦ in G◦ there exists a K-free b-matching M in G with |M| ≥ |M◦| + (t² − t). Let H denote the set of parallel edges in the shrunk bundle between ka and kb, and let M′ = M◦ − H. Now |M◦ ∩ H| ≤ t − max{1, dM′(ka), dM′(kb)} and, by Claim 10, M′ may be extended to a K-free b-matching M in G with |M ∩ EK| = t² − max{1, hM′(KA), hM′(KB)}, that is

    |M| = |M◦| − |M◦ ∩ H| + |M ∩ EK| ≥ |M◦| − (t − max{1, dM′(ka), dM′(kb)})
        + (t² − max{1, hM′(KA), hM′(KB)}) ≥ |M◦| + (t² − t). □
Lemma 9 can be proved in a similar way by using the following claim.

Claim 11. Assume that K ∈ K is a Kt+1 subgraph and M′ is a K-free b-matching of G − EK. If t = 2, then assume that K is uncovered. Then M′ can be extended to obtain a K-free b-matching M of G with |M| = |M′| + t(t + 1)/2 − ⌈max{1, hM′(VK)}/2⌉.
Proof. Let P be a minimum size subgraph of K covering each node v ∈ VK with dM′(v) = 1 (so P is a matching or a matching and one more edge in EK). If there is no such node, then let P consist of an arbitrary edge in EK. For t = 2 and 3, we will choose P in a specific way, as given later in the proof. We show that M = M′ ∪ (EK − P) satisfies the above conditions. Indeed, M is a b-matching and |M ∩ EK| = t(t + 1)/2 − ⌈max{1, hM′(K)}/2⌉ clearly holds, so we only have to show that it is also K-free.
Assume that there is a forbidden Kt+1 subgraph K′ in M. EK′ must contain an edge uv ∈ EK ∩ M. By the minimal choice of P at least one of |ΓM(u) ∩ VK| ≥ t − 1 and |ΓM(v) ∩ VK| ≥ t − 1 is satisfied, which implies |VK′ ∩ VK| ≥ t − 1. For t ≥ 3 this immediately implies VK ⊆ ΓM(VK′ ∩ VK) ⊆ VK′, a contradiction.

If t = 2, then |VK′ ∩ VK| ≥ 1 does not imply VK ⊆ VK′, and an improper choice of P may enable M to contain a forbidden K3. The only such case is when hM′(VK) = 3, VK = {v1, v2, v3}, VK′ = {v2, v3, v4}, v2v4, v3v4 ∈ M′ and P = {v1v2, v1v3} (Figure 3). In this case, we may leave out the edge incident to v1 from M′, and then P = {v2v3} is a good choice. Indeed, the only problem could be that v1v2v3v4 is a forbidden square, contradicting K being uncovered.

Fig. 3. Choice of P for t = 2 in the proof of Claim 11

Otherwise hM′(VK) ≤ 2 implies |P| ≤ 1. Hence at least one of |ΓM(u) ∩ VK| = 2 and |ΓM(v) ∩ VK| = 2 is satisfied, meaning K′ = K, a contradiction again.

Now assume that there is a forbidden Kt,t subgraph K′ in M with colour classes K′A, K′B. The same argument gives a contradiction for t ≥ 4. However, in case of t = 3, choosing P arbitrarily may enable M to contain a forbidden K3,3 in the following single configuration: VK = {v1, v2, v3, v4}, K′A = {v1, v2, x}, K′B = {v3, v4, y}, xv3, xv4, yv1, yv2, xy ∈ M′ and P = {v1v2, v3v4} (Figure 4). In this case, P = {v1v4, v2v3} is a good choice.

Fig. 4. Choice of P for t = 3 in the proof of Claim 11

Finally, for t = 2 no forbidden square appears if hM′(K) ≤ 2, as otherwise K would be a square-covered triangle. If hM′(K) = 3, then such a square K′ may appear only if VK = {v1, v2, v3}, VK′ = {v2, v3, v4, v5}, v3v4, v4v5, v5v2 ∈ M′, P = {v1v2, v1v3} (v1 ≠ v4, v5 as K is uncovered). In this case both P = {v1v2, v2v3} and P = {v1v3, v2v3} give a proper M (Figure 5). □

Fig. 5. Choice of P for t = 2 in the proof of Claim 11

Proof (of Lemma 9). First we show that if M is a K-free b-matching in G then there is a K◦-free b◦-matching M◦ in G◦ with |M◦| ≥ |M| − ⌊t²/2⌋. Let M′ = M − EK. Clearly, |M ∩ EK| ≤ t(t + 1)/2 − ⌈max{1, hM′(VK)}/2⌉. In G◦, let M◦ be the union of M′ and ⌊(t − max{1, dM′(k)})/2⌋ or ⌊(t + 1 − max{1, dM′(k)})/2⌋ loops on k depending on whether t is even or not, respectively. It is easy to see that M◦ is a K◦-free b◦-matching in G◦ with |M◦| ≥ |M| − ⌊t²/2⌋.

The proof is completed by showing that for an arbitrary K◦-free b◦-matching M◦ in G◦ there exists a K-free b-matching M in G with |M| ≥ |M◦| + ⌊t²/2⌋. Let H denote the set of loops on k obtained when shrinking K, and let M′ = M◦ − H. Now |M◦ ∩ H| ≤ ⌊(t − max{1, dM′(k)})/2⌋ if t is even and |M◦ ∩ H| ≤ ⌊(t + 1 − max{1, dM′(k)})/2⌋ if t is odd. By Claim 11, M′ can be extended to a K-free b-matching M in G with |M ∩ EK| = t(t + 1)/2 − ⌈max{1, hM′(VK)}/2⌉, that is

    |M| = |M◦| − |M◦ ∩ H| + |M ∩ EK| ≥ |M◦| − ⌊(t − max{1, dM′(k)})/2⌋
        + t(t + 1)/2 − ⌈max{1, hM′(VK)}/2⌉ ≥ |M◦| + ⌊t²/2⌋

if t is even and

    |M| = |M◦| − |M◦ ∩ H| + |M ∩ EK| ≥ |M◦| − ⌊(t + 1 − max{1, dM′(k)})/2⌋
        + t(t + 1)/2 − ⌈max{1, hM′(VK)}/2⌉ ≥ |M◦| + ⌊t²/2⌋

if t is odd. □
4 Proof of Theorem 5
We prove max ≥ min by induction on |K|. For K = ∅, this is simply a consequence
of Theorem 1.
Assume now that K ≠ ∅ and let K ∈ K be a forbidden subgraph such that K is uncovered if t = 2. Let G◦ = (V◦, E◦) denote the graph obtained by shrinking K, let b◦ be defined as in Lemma 8 or 9 depending on whether K is a Kt,t or a Kt+1. We denote by K◦ the list of forbidden subgraphs disjoint from K.

By induction, the maximum size of a K◦-free b◦-matching in G◦ is equal to the minimum value of τ(U◦, W◦, P◦, K̇◦). Let us choose an optimal U◦, W◦, P◦, K̇◦ so that |U◦| is minimal. The following claim gives a useful property of U◦.
Claim 12. Assume that v ∈ U is such that d(v, W) + |Γ(v) ∩ (V − W)| ≤ b(v) + 1. Then τ(U − v, W, P′, K̇) ≤ τ(U, W, P, K̇), where P′ is obtained from P by replacing its members incident to v by their union plus v.

Proof. By removing v from U, b(U) decreases by b(v). |E[W]| − |K̇[W]| remains unchanged, while the bound on d(v, W) + |Γ(v) ∩ (V − W)| implies that the increment in the sum over the components of G − U − W is at most b(v). □
Case 1: K is a Kt,t with colour classes KA and KB.

By Lemma 8, the difference between the maximum size of a K-free b-matching in G and the maximum size of a K◦-free b◦-matching in G◦ is exactly t² − t. We will define U, W, P and K̇ such that

    τ(U, W, P, K̇) = τ(U◦, W◦, P◦, K̇◦) + t² − t.    (7)

The shrinking replaces KA and KB by two nodes ka and kb with t − 1 parallel edges between them. Let U, W and P denote the pre-images of U◦, W◦, P◦ in G, respectively, and let K̇ = K̇◦ ∪ {K}. By (3), dG◦−kb(ka), dG◦−ka(kb) ≤ t. Since b◦(ka) = b◦(kb) = t, Claim 12 and the minimal choice of |U◦| imply that if ka ∈ U◦, then kb ∈ W◦.

Hence we have the following cases (T◦ denotes a member of P◦). In each case we are only considering those terms in τ(U◦, W◦, P◦, K̇◦) that change when taking τ(U, W, P, K̇) instead.

• ka ∈ U◦, kb ∈ W◦: b(U) = b◦(U◦) + t² − t.
• ka, kb ∈ W◦: |E[W]| = |E◦[W◦]| + t² − t + 1 and |K̇[W]| = |K̇◦[W◦]| + 1.
• ka ∈ W◦, kb ∈ T◦: |E[T, W]| = |E◦[T◦, W◦]| + t² − t + 1, b(T) = b◦(T◦) + t² − t and |K̇[T, W]| = |K̇◦[T◦, W◦]| + 1 (see Figure 6 for an example).
• ka ∈ T◦, kb ∈ W◦: similar to the previous case.
• ka, kb ∈ T◦: b(T) = b◦(T◦) + 2t² − 2t.

(7) is satisfied in each of the above cases, hence we are done. Note that in the first and the last case we may leave out K from K̇ as it is not counted in any term.

Fig. 6. Extending M◦ (example with a forbidden K3,3: τ(U◦, W◦, P◦, K̇◦) = 5 before shrinking is undone, and τ(U, W, P, K̇) = 5 + 3² − 3 = 11)

Case 2: K is a Kt+1.

By Lemma 9, the difference between the maximum size of a K-free b-matching in G and the maximum size of a K◦-free b◦-matching in G◦ is ⌊t²/2⌋. We show that the pre-images U, W and P of U◦, W◦ and P◦ with K̇ = K̇◦ ∪ {K} satisfy

    τ(U, W, P, K̇) = τ(U◦, W◦, P◦, K̇◦) + ⌊t²/2⌋.    (8)

After shrinking K = (VK, EK) we get a new node k with ⌊(t + 1)/2⌋ − 1 loops on it. (3) implies that there are at most t + 1 non-loop edges incident to k. Since b◦(k) ≥ t, Claim 12 implies k ∉ U◦. Hence we have the following two cases (T◦ denotes a member of P◦).

• k ∈ W◦: |E[W]| = |E◦[W◦]| + t(t + 1)/2 − ⌊(t + 1)/2⌋ + 1 and |K̇[W]| = |K̇◦[W◦]| + 1.
• k ∈ T◦: b(T) = b◦(T◦) + t² if t is even and b(T) = b◦(T◦) + t² − 1 for an odd t.

(8) is satisfied in both cases, hence we are done. We may also leave out K from K̇ in the second case as it is not counted in any term. □
5 Algorithm
In this section we show how the proof of Theorem 5 immediately yields an
algorithm for finding a maximum K-free b-matching in strongly polynomial time.
In such problems, an important question from an algorithmic point of view is how
K is represented. For example, in the K-free b-matching problem for bipartite
graphs solved by Pap in [16], the set of excluded subgraphs may be exponentially
large. Therefore Pap assumes that K is given by a membership oracle, that is,
a subroutine is given for determining whether a given subgraph is a member
of K. However, with such an oracle there is no general method for determining
whether K = ∅. Fortunately, we do not have to tackle such problems: by the next
claim, we may assume that K is given explicitly, as its size is linear in n. We use
n = |V |, m = |E| for the number of nodes and edges of the graph, respectively.

Claim 13. If the graph G = (V, E) satisfies (1) and (3), then the total number of Kt,t and Kt+1 subgraphs is bounded by (t + 3)n/2.
Proof. Assume that v ∈ V is contained in a forbidden subgraph and so dG(v) = t + 1. If we select an edge incident to v, the remaining t edges may be contained in at most one Kt+1 subgraph, hence the number of Kt+1's containing v is at most t + 1. However, these t edges also determine one of the colour classes of those Kt,t's containing them. If we pick a node v′ from this colour class (implying dG(v′) = t + 1) and pick an edge incident to v′ (but not to v), then the remaining t edges, if they span such a subgraph, exactly determine the other colour class of a Kt,t subgraph. Therefore the number of Kt,t subgraphs containing v is bounded by (t + 1)t = t² + t. Hence the total number of forbidden Kt,t and Kt+1 subgraphs is at most (t² + t)n/(2t) + (t + 1)n/(t + 1) = (t + 3)n/2. □
Now we turn to the algorithm. First we choose an inclusionwise maximal subset H = {H1, . . . , Hk} of disjoint forbidden subgraphs greedily. For t = 2, let us always choose squares as long as possible and then go on with triangles. This can be done in O(t³n) time as follows. Maintain an array of size m that encodes for each edge whether it is used in one of the selected forbidden subgraphs or not. When increasing H, one only has to check whether any of the edges of the examined forbidden subgraph is already used, which takes O(t²) time. This and Claim 13 together give an O(t³n) bound; a sketch of this bookkeeping follows below.
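As a rough illustration (not code from the paper), the greedy phase can be written as follows, assuming the members of K are given explicitly as lists of edge indices, with squares listed before triangles when t = 2; since members of K are edge-disjoint exactly when they are node-disjoint, the edge array suffices.

def greedy_disjoint(forbidden, m):
    """Select an inclusionwise maximal family of pairwise disjoint members
    of `forbidden`; each member is a list of edge indices in {0, ..., m-1}."""
    used = [False] * m               # used[e]: edge e lies in a chosen subgraph
    chosen = []
    for K in forbidden:              # O(t^2) edges per subgraph, O(t^3 n) in total
        if not any(used[e] for e in K):
            chosen.append(K)
            for e in K:
                used[e] = True
    return chosen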
Let us shrink the members of H simultaneously (this can be easily done since they are disjoint), resulting in a graph G′ = (V′, E′) with a bound b′ : V′ → Z+ and no forbidden subgraphs, since H was maximal. One can find a maximum b′-matching M′ in G′ in O(|V′||E′| log |V′|) = O(nm log m) time as in [5]. Using the constructions described in Lemmas 8 and 9 for Hk, . . . , H1, this can be modified into a maximum K-free b-matching M. Note that, for t = 2, Hi is always uncovered in the actual graph by the selection rule. A dual optimal solution U, W, P, K̇ can be constructed simultaneously as in the proof of Theorem 5. The best time bound of the shrinking and extension steps may depend on the data structure used and the representation of the graph. In any case, one such step may be performed in O(m) time and |H| = O(n), hence the total running time is O(t³n + nm log m).
References
1. Bérczi, K., Kobayashi, Y.: An Algorithm for (n − 3)–Connectivity Augmentation
Problem: Jump System Approach. Technical report, Department of Mathematical
Engineering, University of Tokyo, METR 2009-12
2. Cornuéjols, G., Pulleyblank, W.: A Matching Problem With Side Conditions. Dis-
crete Math. 29, 135–139 (1980)
3. Frank, A.: Restricted t-matchings in Bipartite Graphs. Discrete Appl. Math. 131,
337–346 (2003)
4. Frank, A., Jordán, T.: Minimal Edge-Coverings of Pairs of Sets. J. Comb. Theory
Ser. B 65, 73–110 (1995)
5. Gabow, H.N.: An Efficient Reduction Technique for Degree-Constrained Subgraph
and Bidirected Network Flow Problems. In: STOC ’83: Proceedings of the fifteenth
annual ACM symposium on Theory of computing, pp. 448–456. ACM, New York
(1983)
6. Hartvigsen, D.: Extensions of Matching Theory. PhD Thesis, Carnegie-Mellon Uni-
versity (1984)
7. Hartvigsen, D.: The Square-Free 2-factor Problem in Bipartite Graphs. In:
Cornuéjols, G., Burkard, R.E., Woeginger, G.J. (eds.) IPCO 1999. LNCS, vol. 1610,
pp. 234–241. Springer, Heidelberg (1999)
8. Hartvigsen, D.: Finding maximum square-free 2-matchings in bipartite graphs. J.
Comb. Theory Ser. B 96, 693–705 (2006)
9. Hartvigsen, D., Li, Y.: Triangle-Free Simple 2-matchings in Subcubic Graphs (Ex-
tended Abstract). In: Fischetti, M., Williamson, D.P. (eds.) IPCO 2007. LNCS,
vol. 4513, pp. 43–52. Springer, Heidelberg (2007)
10. Király, Z.: C4 -free 2-factors in Bipartite Graphs. Technical report, Egerváry Re-
search Group, Department of Operations Research, Eötvös Loránd University, Bu-
dapest, TR-2001-13 (2001)
11. Király, Z.: Restricted t-matchings in Bipartite Graphs. Technical report, Egerváry
Research Group, Department of Operations Research, Eötvös Loránd University,
Budapest, TR-2009-04 (2009)
12. Kobayashi, Y.: A Simple Algorithm for Finding a Maximum Triangle-Free 2-


matching in Subcubic Graphs. Technical report, Department of Mathematical En-
gineering, University of Tokyo, METR 2009-26 (2009)
13. Kobayashi, Y., Takazawa, K.: Even Factors, Jump Systems, and Discrete Con-
vexity. Technical report, Department of Mathematical Engineering, University of
Tokyo, METR 2007-36 (2007)
14. Kobayashi, Y., Takazawa, K.: Square-Free 2-matchings in Bipartite Graphs and
Jump Systems. Technical report, Department of Mathematical Engineering, Uni-
versity of Tokyo, METR 2008-40 (2008)
15. Makai, M.: On Maximum Cost Kt,t -free t-matchings of Bipartite Graphs. SIAM J.
Discret. Math. 21, 349–360 (2007)
16. Pap, G.: Alternating Paths Revisited II: Restricted b-matchings in Bipartite
Graphs. Technical report, Egerváry Research Group, Department of Operations
Research, Eötvös Loránd University, Budapest, TR-2005-13 (2005)
17. Schrijver, A.: Combinatorial Optimization - Polyhedra and Efficiency. Springer,
Heidelberg (2003)
18. Szabó, J.: Jump Systems and Matroid Parity (in Hungarian). Master's Thesis, Eötvös Loránd University, Budapest (2002)
19. Takazawa, K.: A Weighted Kt,t -free t-factor Algorithm for Bipartite Graphs. Math. Oper. Res. 34, 351–362 (2009)
20. Végh, L. A.: Augmenting Undirected Node-Connectivity by One. Technical report,
Egerváry Research Group, Department of Operations Research, Eötvös Loránd
University, Budapest, TR-2009-10 (2009)
Zero-Coefficient Cuts

Kent Andersen and Robert Weismantel

Otto-von-Guericke-University of Magdeburg,
Department of Mathematics/IMO, Universitätsplatz 2,
39106 Magdeburg, Germany
{andersen,weismant}@mail.math.uni-magdeburg.de
Abstract. Many cuts used in practice to solve mixed integer programs are derived from a basis of the linear relaxation. Every such cut is of the form αT x ≥ 1, where x ≥ 0 is the vector of non-basic variables and α ≥ 0. For a point x̄ of the linear relaxation, we call αT x ≥ 1 a zero-coefficient cut wrt. x̄ if αT x̄ = 0, since this implies αj = 0 when x̄j > 0. We consider the following problem: Given a point x̄ of the linear relaxation, find a basis, and a zero-coefficient cut wrt. x̄ derived from this basis, or provide a certificate that shows no such cut exists. We show that this problem can be solved in polynomial time. We also test the performance of zero-coefficient cuts on a number of test problems. For several instances zero-coefficient cuts provide a substantial strengthening of the linear relaxation.

Keywords: Mixed integer program; Lattice basis; Cutting plane; Split cut.

1 Introduction
This paper concerns mixed integer linear sets of the form:

PI := {x ∈ Rn : Ax = b, x ≥ 0 and xj ∈ Z for j ∈ NI }, (1)

where A ∈ Qm×n, b ∈ Qm, N := {1, 2, . . . , n} is an index set for the variables, and NI ⊆ N is an index set for the integer constrained variables. The linear
relaxation of PI is denoted P . For simplicity we assume A has full row rank. A
basis of A is a subset B ⊆ N of m variables such that the columns {a.j }j∈B
of A are linearly independent. From a basis B one obtains the basic polyhedron
associated with B:

P (B) := {x ∈ Rn : Ax = b and xj ≥ 0 for all j ∈ N \ B}, (2)

and the corresponding corner polyhedron:

PI (B) := {x ∈ P (B) : xj ∈ Z for all j ∈ NI }. (3)

Observe that P (B) can be obtained from P by deleting the non-negativity con-
straints on the basic variables xi for i ∈ B. Also observe that P (B) can be


written in the form:

    P(B) = {x ∈ Rn : x = x̄B + Σ_{j∈N\B} xj r^{j,B} and xj ≥ 0 for j ∈ N \ B},    (4)

where x̄B ∈ Rn is the basic solution associated with B, and the vectors r^{j,B} ∈ Rn for j ∈ N \ B are the extreme rays of P(B). Finally observe that every non-trivial valid inequality for PI(B) is of the form Σ_{j∈N\B} αj xj ≥ 1 with αj ≥ 0 for all j ∈ N \ B. We say that a valid inequality Σ_{j∈N\B} αj xj ≥ 1 for PI(B) is a valid cut for PI that can be derived from the basis B.
Several classes of cuts can be derived from a basis. Some of these cuts are derived from a single equation. This equation may be one of the equalities x = x̄B + Σ_{j∈N\B} xj r^{j,B}, or an integer combination of these equalities. The integrality constraints on the variables are then used to obtain a valid cut. Single-row cuts are named either Mixed Integer Gomory (MIG) cuts [10], Mixed Integer Rounding (MIR) cuts [11] or Split Cuts [8].
Recent research has attempted to use several equations simultaneously to
generate valid cuts from a basis. In [3,9], two equations were considered, and
cuts named disection cuts and lifted two-variable cuts were derived. All these
cuts are intersection cuts [4], and their validity is based on lattice-free polyhedra.
This paper is motivated by the following question: Which properties should
cuts derived from bases of the linear relaxation have in order to be effective
cutting planes for mixed integer programs? Such properties could be useful for
identifying classes of cuts that are effective in practice.
A first realization is that such cuts must be sparse, i.e., the cuts must have
many zero coefficients. Dense cuts are hard for linear programming solvers to
handle, and they have not been shown to be effective in closing the integrality
gap of mixed integer programs.
Secondly, deciding exactly which variables should have a zero coefficient in a cut seems hard. It therefore seems natural to consider a specific point x̄ ∈ P and aim at cuts that form solutions to the following variant of a separation problem:

    min{ Σ_{j∈N\B} αj x̄j : Σ_{j∈N\B} αj xj ≥ 1 is valid for PI(B) for some basis B }    (5)

with many zero coefficients on variables j ∈ N \ B for which x̄j > 0. Ideally a cut Σ_{j∈N\B} αj xj ≥ 1 should be maximally violated, i.e., the cut should satisfy αj = 0 for all j ∈ N \ B with x̄j > 0. We call a maximally violated cut obtained from a basis of the linear relaxation a zero-coefficient cut wrt. x̄. Zero-coefficient cuts are optimal solutions to the above separation problem when they exist, and they necessarily have coordinates with zero coefficients. Zero-coefficient cuts therefore seem to be a class of cuts of high quality for solving mixed integer programs in practice.
The main result in this paper is to show that a zero-coefficient cut wrt. a point
x̄ ∈ P can be identified in polynomial time if such a cut exists. In other words,
given a point x̄ ∈ P , it is possible in polynomial time to find a basis B, and a
valid inequality Σ_{j∈N\B} αj xj ≥ 1 for PI(B) which satisfies Σ_{j∈N\B} αj x̄j = 0 if such an inequality exists. The cuts we identify are split cuts, and we show that, if there exists a zero-coefficient cut wrt. x̄, then there also exists a zero-coefficient cut wrt. x̄ which is a split cut. The cut is computed by first pivoting to an appropriate basis, and then computing a lattice basis of a well chosen lattice.

It has been shown that, in general, the separation problem for split cuts is NP-hard [7]. Our result demonstrates that, if one insists on a maximally violated split cut, then the separation problem can be solved in polynomial time. Zero-coefficient cuts therefore seem to provide a reasonably efficient alternative to optimizing over the split closure of a mixed integer set. The quality of the split closure as an approximation of a mixed integer set was demonstrated in [5].
The performance of zero-coefficient cuts is tested computationally on instances from miplib 3.0 [6] and miplib 2003 [1]. We restrict our experiments to the corner polyhedron PI(B∗) obtained from an optimal basis B∗ of the LP relaxation. In other words we do not examine the effect of pivoting in our experiments. On several test problems, zero-coefficient cuts close substantially more integrality gap than the MIG cuts obtained from the equations defining the optimal simplex tableau.
The remainder of the paper is organized as follows. In Sect. 2 we derive an
infeasibility certificate for the set PI (B) for a given basis B. This certificate
is key for deriving zero-coefficient cuts. Zero-coefficient cuts are motivated and
presented in Sect. 3. Our main theorem is proved in Sect. 4. Finally our compu-
tational results are presented in Sect. 5.

2 Infeasibility Certificates for Corner Polyhedra
We now consider a fixed basis B, and we address the question of when PI(B) is empty. We will present a certificate that proves PI(B) = ∅ whenever this is the case. We first derive the representation (4) of P(B). Since we consider a fixed basis B throughout this section, we let x̄ := x̄B and r^j := r^{j,B} for j ∈ N \ B. Let AB denote the induced sub-matrix of A composed of the columns in B, and define ā.j := (AB)^{−1} a.j for j ∈ N \ B. We may write P(B) in the form:

    P(B) = {x ∈ Rn : xi = x̄i − Σ_{j∈N\B} āi,j xj for i ∈ B,
                     xj = xj for j ∈ N \ B,
                     xj ≥ 0 for j ∈ N \ B }.    (6)

Defining the following vectors r^j ∈ Rn for j ∈ N \ B:

    r^j_k := −āk,j if k ∈ B,    1 if k = j,    0 otherwise,    (7)

the representation (6) of P(B) can be re-written in the form

    P(B) = {x ∈ Rn : x = x̄ + Σ_{j∈N\B} sj r^j and sj ≥ 0 for j ∈ N \ B}.    (8)
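As a small illustration of (7) (our code, not the paper's), the rays can be assembled directly from the tableau entries, assuming abar[i][j] holds āi,j for basic i and non-basic j (e.g. a dict of dicts):

def rays(n, basic, abar):
    """Build {j: r^j} for the non-basic indices j, following (7)."""
    in_basis = set(basic)
    R = {}
    for j in range(n):
        if j in in_basis:
            continue
        r = [0] * n
        for i in basic:
            r[i] = -abar[i][j]   # r^j_k = -abar_{k,j} for k in B
        r[j] = 1                 # r^j_j = 1; all other entries stay 0
        R[j] = r
    return R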
Hence PI(B) is empty if and only if the translated cone x̄ + cone({r^j}_{j∈N\B}) does not contain mixed integer points. Our certificate for proving PI(B) is empty is a split disjunction. A split disjunction is of the form πT x ≤ π0 ∨ πT x ≥ π0 + 1, where (π, π0) ∈ Zn+1 and πj = 0 for all j ∈ N \ NI. All mixed integer points x ∈ PI(B) satisfy all split disjunctions.

Our point of departure is a result characterizing when an affine set contains integer points (see [2]). Specifically, consider the affine set:

    T^a := f + span({q^j}_{j∈J}),    (9)

where f ∈ Qn, J is a finite index set and {q^j}_{j∈J} ⊂ Qn. A result in [2] shows that T^a does not contain integer points if and only if there exists π ∈ Zn such that πT f ∉ Z and πT q^j = 0 for all j ∈ J. Observe that such a vector π ∈ Zn gives a split disjunction πT x ≤ ⌊πT f⌋ ∨ πT x ≥ ⌈πT f⌉ which proves T^a ∩ Zn = ∅, i.e., we have T^a ⊆ {x ∈ Rn : ⌊πT f⌋ < πT x < ⌈πT f⌉}.
We first generalize this result from integer points in affine sets to mixed integer points in affine sets.

Lemma 1. The set T^a does not contain mixed integer points if and only if there exists π ∈ Zn such that πT f ∉ Z, πT q^j = 0 for all j ∈ J and πj = 0 for all j ∈ N \ NI.

Proof. We have that {x ∈ T^a : xj ∈ Z for all j ∈ NI} is empty if and only if {x ∈ Rn : xi = fi + Σ_{j∈J} sj q^j_i and xi ∈ Z for all i ∈ NI} is empty if and only if there exists a vector π ∈ Zn such that πT f ∉ Z, πT q^j = 0 for all j ∈ J and πj = 0 for all j ∈ N \ NI.
Lemma 1 shows that, if T^a does not contain mixed integer points, then there exists a split disjunction πT x ≤ ⌊πT f⌋ ∨ πT x ≥ ⌈πT f⌉, where π ∈ Zn satisfies πj = 0 for all j ∈ N \ NI, such that T^a ⊂ {x ∈ Rn : ⌊πT f⌋ < πT x < ⌈πT f⌉}, i.e., this split disjunction provides a certificate that shows that T^a does not contain any mixed integer points.
We next generalize Lemma 1 from mixed integer points in affine sets to mixed integer points in translates of polyhedral cones.

Lemma 2. The set f + cone({q^j}_{j∈J}) contains no mixed integer points if and only if the set f + span({q^j}_{j∈J}) contains no mixed integer points.

Proof. Let T^c := f + cone({q^j}_{j∈J}). We have to show that T^c does not contain mixed integer points if and only if T^a does not contain mixed integer points. Clearly, if T^c contains mixed integer points, then T^a also contains mixed integer points since T^c ⊆ T^a. Hence we only need to show the other direction.

Therefore suppose T^c does not contain mixed integer points, and assume, for a contradiction, that T^a contains mixed integer points. Let x′ ∈ T^a satisfy x′j ∈ Z for all j ∈ NI, and let s ∈ R^{|J|} be such that x′ = f + Σ_{j∈J} sj q^j. Choose an integer d > 0 such that dq^j ∈ Zn for all j ∈ J and define x″ := x′ − Σ_{j∈J} ⌊sj/d⌋ dq^j. We have x″ ∈ {x ∈ Rn : xj ∈ Z for j ∈ NI} and x″ = f + Σ_{j∈J} (sj/d − ⌊sj/d⌋) dq^j. Hence x″ ∈ T^c, which is a contradiction.
Since PI(B) is the set of mixed integer points in a translate of a polyhedral cone, we now have the following certificate for when PI(B) is empty.

Corollary 1. We have PI(B) = ∅ if and only if there exists π ∈ Zn such that πT x̄ ∉ Z, πT r^j = 0 for all j ∈ N \ B and πj = 0 for all j ∈ N \ NI.

3 Zero-Coefficient Cuts from Corner Polyhedra

We now use the certificate obtained in Sect. 2 to derive zero-coefficient cuts for a corner polyhedron PI(B) for a fixed basis B. As in Sect. 2, we let x̄ := x̄B and r^j := r^{j,B} for j ∈ N \ B. We consider an optimization problem (MIP):

    min{ Σ_{j∈N\B} cj xj : x ∈ PI(B)},

where c ∈ R^{|N\B|} denotes the objective coefficients. The linear programming relaxation of (MIP) is denoted (LP). The set of feasible solutions to (LP) is the set P(B). We assume cj ≥ 0 for all j ∈ N \ B since otherwise (LP) is unbounded.

To motivate zero-coefficient cuts, we first consider a generic cutting plane algorithm for strengthening the LP relaxation (LP) of (MIP). Let V ⊂ Q+^{|N\B|} be a family of valid inequalities for PI(B), i.e., we have that Σ_{j∈N\B} αj xj ≥ 1 is valid for PI(B) for all α ∈ V. Let x′ ∈ P(B) be arbitrary. A separation problem (SEP) wrt. x′ can be formulated as:

    min{ Σ_{j∈N\B} αj x′j : α ∈ V }.

A cutting plane algorithm (CPalg) for using V to strengthen the LP relaxation (LP) of (MIP) can now be designed by iteratively solving (SEP) wrt. various points x′ ∈ P(B):

Cutting plane algorithm (CPalg):

(1) Set k := 0. Let x^0 := x̄ be an optimal solution to (LP).
(2) Solve (SEP) wrt. x^k. Let α^k ∈ V be an optimal solution.
(3) While Σ_{j∈N\B} α^k_j x^k_j < 1:
    (a) Add Σ_{j∈N\B} α^k_j xj ≥ 1 to (LP) and re-optimize.
        Let x^{k+1} be an optimal solution.
    (b) Solve (SEP) wrt. x^{k+1}. Let α^{k+1} ∈ V be an optimal solution.
    (c) Set k := k + 1.
End.

In (CPalg) above, only one cut is added in every iteration. It is also possible to add several optimal and sub-optimal solutions to (SEP). Furthermore, for many classes V of valid cutting planes for PI(B), (SEP) cannot necessarily be solved in polynomial time, and finite convergence of (CPalg) is not guaranteed. For instance, if V is the class of split cuts, (SEP) is NP-hard [7].

For α ∈ V and x′ ∈ P(B), the inequality Σ_{j∈N\B} αj xj ≥ 1 is maximally violated by x′ when Σ_{j∈N\B} αj x′j = 0. We call Σ_{j∈N\B} αj xj ≥ 1 a zero-coefficient cut wrt. x′ when Σ_{j∈N\B} αj x′j = 0. Observe that if a zero-coefficient cut wrt. x^k exists in the family V of valid inequalities for PI(B) in the k-th iteration of (CPalg), then this cut is an optimal solution to (SEP).

Since (SEP) always returns a zero-coefficient cut wrt. the point that is being separated whenever such a cut exists, the structure of (CPalg) is such that zero-coefficient cuts are separated first, i.e., the first iterations of (CPalg) consist of the following cutting plane algorithm (InitCPalg):

Cutting plane algorithm (InitCPalg):

(1) Set k := 0. Let x^0 := x̄ be an optimal solution to (LP).
(2) While there exists α^k ∈ V such that Σ_{j∈N\B} α^k_j xj ≥ 1 is a zero-coefficient cut wrt. x^k:
    (a) Add Σ_{j∈N\B} α^k_j xj ≥ 1 to (LP) and re-optimize.
        Let x^{k+1} be an optimal solution.
    (b) Set k := k + 1.
End.

When (InitCPalg) terminates, a point x∗ ∈ P(B) is obtained that satisfies Σ_{j∈N\B} αj x∗j > 0 for all α ∈ V, i.e., there does not exist any zero-coefficient cut wrt. x∗ in the family V. Following (InitCPalg), if possible and desirable, (CPalg) can be continued in order to strengthen the LP relaxation of (MIP) further with valid cuts that are not maximally violated.

In the following we show that (InitCPalg) can be implemented to run in polynomial time by using only split cuts. Since the initial phase (InitCPalg) of (CPalg) can be implemented with split cuts only, this could suggest an explanation of why split cuts have been observed to close a large amount of the integrality gap on many instances [5].
We first review how split cuts are derived for PI(B). Every split cut is derived from a vector π ∈ Zn satisfying πT x̄ ∉ Z and πj = 0 for all j ∈ N \ NI. Define f0(π) := πT x̄ − ⌊πT x̄⌋ and fj(π) := πT r^j − ⌊πT r^j⌋ for j ∈ NI \ B. The inequality:

    Σ_{j∈N\B} xj / αj(π) ≥ 1    (10)

is the (strengthened) split cut defined by π, where αj(π) for j ∈ N \ B is:

    αj(π) := (1 − f0(π)) / fj(π)     if j ∈ NI and 0 < fj(π) < 1 − f0(π),
             f0(π) / (1 − fj(π))     if j ∈ NI and 1 − f0(π) ≤ fj(π) < 1,
             (1 − f0(π)) / πT r^j    if j ∉ NI and πT r^j > 0,
             −f0(π) / πT r^j         if j ∉ NI and πT r^j < 0,
             +∞                      otherwise.    (11)
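Formula (11) is simple to evaluate; the following Python sketch (the function and argument names are ours) returns αj(π) given the two inner products. A value of +∞ simply means that xj receives coefficient zero in (10).

from math import floor, inf

def alpha_j(pi_dot_ray, pi_dot_xbar, j_is_integer):
    """Evaluate alpha_j(pi) of (11); the cut (10) is sum_j x_j/alpha_j >= 1."""
    f0 = pi_dot_xbar - floor(pi_dot_xbar)      # f_0(pi), assumed non-zero
    if j_is_integer:
        fj = pi_dot_ray - floor(pi_dot_ray)    # f_j(pi)
        if 0 < fj < 1 - f0:
            return (1 - f0) / fj
        if 1 - f0 <= fj < 1:
            return f0 / (1 - fj)
        return inf                             # f_j(pi) = 0
    if pi_dot_ray > 0:
        return (1 - f0) / pi_dot_ray
    if pi_dot_ray < 0:
        return -f0 / pi_dot_ray
    return inf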

We next prove that, for a given point x ∈ P (B), if there exists any valid
inequality for PI (B) which is maximally violated wrt. x , then there also exists
a split cut which is maximally violated wrt. x .
Theorem 1. Let x ∈ P (B) be arbitrary. If there exists a valid inequality for
PI (B) which is a zero-coefficient cut wrt. x , then there also exists a split cut for
PI (B) which is a zero-coefficient cut wrt. x .

Proof. Let x ∈ P (B), and let j∈N \B αj xj ≥ 1 be a valid inequality for PI (B)

which is a zero-coefficient cut wrt. x . Since j∈N \B αj xj = 0 and αj , sj ≥ 0
for all j ∈ N \ B, we must have αj = 0 for all j ∈ N \ B satisfying xj > 0. Let
X  := {j ∈ N \ B : xj > 0}. It follows that 0 ≥ 1 is valid for:

QI := {x ∈ PI (B) : xj = 0 for all j ∈ (N \ B) \ X  }


= {x ∈ x̄ + cone({rj }j∈X  ) : xj ∈ Z for all j ∈ NI }.

Since QI = ∅, Lemma 2 shows there exists π̄ ∈ Zn such that π̄ T rj = 0 for


j ∈ X  , π̄j = 0 for j ∈ N \ NI and π̄ T x̄ ∈
/ Z. From (10) and (11) it now follows
that the split cut derived from π̄ is a zero-coefficient cut wrt. x .
In general it is NP-hard to separate a split cut for PI(B) [7]. However, as we will show next, it is possible to separate a zero-coefficient split cut wrt. a given point in polynomial time whenever such a split cut exists.

Let x′ ∈ P(B). Define X′ := {j ∈ N \ B : x′j > 0}. From (11) we have that π ∈ Zn defines a maximally violated split cut wrt. x′ if and only if

    πT r^j = 0 for all j ∈ X′,    (12)
    πj = 0 for all j ∈ N \ NI,    (13)
    πT x̄ ∉ Z.    (14)

If π ∈ Zn satisfies (12)-(14), then a split cut can be derived from π since πT x̄ ∉ Z, and we have αj(π) = +∞ for all j ∈ X′, which implies that the coefficients on the variables xj for j ∈ X′ in the split cut (10) are all zero. Hence any π ∈ Zn that satisfies (12)-(14) defines a zero-coefficient split cut wrt. x′. Conversely, if there exists a valid inequality for PI(B) which is maximally violated wrt. x′, then Theorem 1 shows there exists π ∈ Zn that satisfies (12)-(14).
Let L(x′) ⊆ Zn denote the set of π ∈ Zn that satisfy (12) and (13):

    L(x′) := {π ∈ Zn : πT r^j = 0 for all j ∈ X′ and πj = 0 for all j ∈ N \ NI}.

Observe that L(x′) is a lattice, i.e., for any π^1, π^2 ∈ L(x′) and k ∈ Z, we have kπ^1 ∈ L(x′) and π^1 + π^2 ∈ L(x′). For any lattice it is possible to compute a basis for the lattice in polynomial time. Hence we can find vectors π^1, . . . , π^p ∈ L(x′) in polynomial time such that:

    L(x′) = {π ∈ Zn : π = Σ_{i=1}^{p} λi π^i and λi ∈ Z for i = 1, 2, . . . , p}.
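The paper computes lattice bases with NTL (cf. Sect. 5); purely to illustrate what such a computation does, here is a self-contained Python sketch that derives a basis of an integer kernel lattice by unimodular column operations. The function names and the dense integer-matrix representation are ours.

def xgcd(a, b):
    """Extended Euclid: returns (g, s, t) with s*a + t*b == g."""
    old_r, r, old_s, s, old_t, t = a, b, 1, 0, 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def kernel_lattice_basis(A, n):
    """Basis of the lattice {x in Z^n : A x = 0}, A an integer matrix with n columns."""
    M = [row[:] for row in A]
    U = [[int(i == j) for j in range(n)] for i in range(n)]  # tracks the column ops
    col = 0
    for r in range(len(M)):
        piv = None
        for j in range(col, n):
            if M[r][j] == 0:
                continue
            if piv is None:
                piv = j
                continue
            a, b = M[r][piv], M[r][j]
            g, s, t = xgcd(a, b)                 # unimodular operation on two columns
            for X in (M, U):
                for row in X:
                    xi, xj = row[piv], row[j]
                    row[piv] = s * xi + t * xj                  # pivot entry becomes g
                    row[j] = (a // g) * xj - (b // g) * xi      # entry in row r becomes 0
        if piv is not None:
            for X in (M, U):
                for row in X:
                    row[piv], row[col] = row[col], row[piv]
            col += 1
    # columns col..n-1 of U are annihilated by every row of A: a kernel lattice basis
    return [[U[i][j] for i in range(n)] for j in range(col, n)]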
Now, if there exists a lattice basis vector π^ī ∈ L(x′) with ī ∈ {1, 2, . . . , p} such that (π^ī)T x̄ ∉ Z, then the split cut derived from π^ī is maximally violated wrt. x′. Conversely, if we have (π^i)T x̄ ∈ Z for all i ∈ {1, 2, . . . , p}, then πT x̄ ∈ Z for all π ∈ L(x′). We therefore have the following.

Corollary 2. Let x′ ∈ P(B) be arbitrary. If there exists a valid inequality for PI(B) that is maximally violated wrt. x′, then it is possible to find such an inequality in polynomial time.
Based on the above results, we have the following implementation of the cutting plane algorithm (InitCPalg) presented earlier:

Implementation of (InitCPalg):
(1) Set k := 0. Let x^0 := x̄ be an optimal solution to (LP).
(2) Find a lattice basis π^1, . . . , π^{p_k} for L(x^k).
    Let I(x^k) := {i ∈ {1, . . . , p_k} : (π^i)T x̄ ∉ Z}.
(3) While I(x^k) ≠ ∅:
    (a) Add all split cuts generated from vectors π^i with i ∈ I(x^k) to (LP) and re-optimize.
        Let x^{k+1} be an optimal solution.
    (b) Find a lattice basis π^1, . . . , π^{p_{k+1}} for L(x^{k+1}).
        Let I(x^{k+1}) := {i ∈ {1, . . . , p_{k+1}} : (π^i)T x̄ ∉ Z}.
    (c) Set k := k + 1.
End.
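Combining the two sketches above gives a hypothetical version of step (2): build the defining rows of L(x^k), extract a lattice basis, and keep the basis vectors whose inner product with x̄ is fractional. Exact rational data (ints or fractions.Fraction) is assumed, and all names are ours.

from fractions import Fraction
from math import lcm

def integerize(row):
    """Scale a rational row to integers; this leaves its kernel lattice unchanged."""
    fr = [Fraction(x) for x in row]
    d = lcm(*(f.denominator for f in fr)) if fr else 1
    return [int(f * d) for f in fr]

def violated_basis_vectors(rays_X, int_mask, xbar):
    """Lattice basis vectors pi of L(x') with pi^T xbar not in Z, cf. (12)-(14)."""
    n = len(xbar)
    rows = [integerize(r) for r in rays_X]               # pi^T r^j = 0, j in X'   (12)
    rows += [[int(k == j) for k in range(n)]
             for j in range(n) if not int_mask[j]]       # pi_j = 0, j not in N_I  (13)
    out = []
    for pi in kernel_lattice_basis(rows, n):             # sketch from Sect. 3 above
        if sum(Fraction(x) * p for x, p in zip(xbar, pi)).denominator != 1:
            out.append(pi)                               # pi^T xbar not in Z      (14)
    return out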

We next argue that mixed integer Gomory cuts play a natural role in the above implementation of (InitCPalg). Consider the computation of the lattice basis for L(x^0) in step (2) of (InitCPalg). Observe that, since x^0 = x̄, the set X^0 is empty, so L(x^0) = {π ∈ Zn : πj = 0 for all j ∈ N \ NI}, and therefore the unit vectors ei for i ∈ NI form a lattice basis for L(x^0). Since a split cut (10) obtained from a unit vector is a mixed integer Gomory cut, the first cuts added in step (3).(a) of the above implementation of (InitCPalg) are the mixed integer Gomory cuts. A natural computational question therefore seems to be how much more integrality gap can be closed by continuing (InitCPalg) and generating the remaining zero-coefficient cuts.

4 Zero-Coefficient Cuts from Mixed Integer Polyhedra
In Sect. 3 we considered a fixed basis B and a point x′ ∈ P(B), and we demonstrated how to obtain a zero-coefficient cut wrt. x′ from PI(B) whenever such a cut exists. Given x′ ∈ P, we now consider how to obtain an appropriate basis, i.e., we show how to identify a basis B such that a zero-coefficient cut wrt. x′ can be derived from PI(B). For this, we first relate the emptiness of two corner polyhedra PI(B) and PI(B′) obtained from two adjacent bases B and B′ for P.
Lemma 3. Let B be a basis for P, and let B′ := (B \ {ī}) ∪ {j̄} be an adjacent basis to B, where ī ∈ B and j̄ ∈ N \ B. Then PI(B) = ∅ if and only if PI(B′) = ∅.

Proof. For simplicity let x̄ := x̄B and x̄′ := x̄B′. Also let ā.j := (AB)^{−1} a.j for all j ∈ N \ B and ā′.j := (AB′)^{−1} a.j for all j ∈ N \ B′, where AB and AB′ denote the basis matrices associated with the bases B and B′ respectively.

Suppose z ∈ PI(B). We have zi = x̄′i + Σ_{j∈N\B′} ā′i,j zj for all i ∈ B′, zj ≥ 0 for all j ∈ N \ (B′ ∪ {ī}) and zj ∈ Z for all j ∈ NI. If zī ≥ 0, we are done, so suppose zī < 0. Choose an integer k > 0 such that kā′.ī ∈ Zm and zī + k ≥ 0. Defining z′i := zi + kā′i,ī for all i ∈ B′, z′ī := zī + k and z′j := zj for all j ∈ N \ (B′ ∪ {ī}), we have z′ ∈ PI(B′). Hence PI(B) = ∅ implies PI(B′) = ∅. The opposite direction is symmetric.
From Lemma 3 it follows that either all corner polyhedra PI(B) associated with bases B for P are empty, or they are all non-empty. We next present a pivot operation from a basis B to an adjacent basis B′ with the property that, if a zero-coefficient cut wrt. a point x′ ∈ P can be derived from B, then a zero-coefficient cut wrt. x′ can also be derived from B′.

Lemma 4. Let B be a basis for P, let x′ ∈ P and define X′ := {j ∈ N : x′j > 0}. Also let B′ := (B \ {ī}) ∪ {j̄} be an adjacent basis to B, where ī ∈ B \ X′ and j̄ ∈ X′ \ B. If a zero-coefficient cut wrt. x′ can be derived from B, then a zero-coefficient cut wrt. x′ can also be derived from B′.

Proof. Given a set S ⊆ N, we will use sets obtained from P, PI, P(B) and PI(B) by setting xj = 0 for all j ∈ N \ S. For S ⊆ N, define Q(S) := {x ∈ P : xj = 0 for j ∈ N \ S} and QI(S) := {x ∈ PI : xj = 0 for j ∈ N \ S}. Also, given a basis B ⊆ S, define Q(B, S) := {x ∈ P(B) : xj = 0 for j ∈ N \ S} and QI(B, S) := {x ∈ PI(B) : xj = 0 for j ∈ N \ S}.

Assume a zero-coefficient cut wrt. x′ can be derived from B. Observe that this implies QI(B, B ∪ X′) = ∅. Now, QI(B, B ∪ X′) is a corner polyhedron associated with QI(B ∪ X′), and QI(B′, B ∪ X′) is also a corner polyhedron associated with QI(B ∪ X′). Since any two bases of Q(B ∪ X′) can be obtained from each other by pivoting, it follows from Lemma 3 that also QI(B′, B ∪ X′) = ∅. Corollary 1 now gives a split cut which is a zero-coefficient cut wrt. x′ derived from B′.
Lemma 4 shows that, for the purpose of identifying zero-coefficient cuts wrt. x′, the interesting bases to consider are those bases for which it is not possible to pivot a variable xj with j ∈ X′ into the basis.

Definition 1. Let x′ ∈ P, and define X′ := {j ∈ N : x′j > 0}. A basis B for P is called maximal wrt. x′ if (B \ {ī}) ∪ {j̄} is not a basis for P for all ī ∈ B \ X′ and j̄ ∈ X′ \ B.
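The pivoting suggested by Lemma 4 and Definition 1 is a plain linear-algebra loop: as long as some j̄ ∈ X′ \ B has a non-zero tableau entry in the row of some ī ∈ B \ X′, the exchange (B \ {ī}) ∪ {j̄} is a basis and we perform it. Below is a rough numpy sketch under that description; the names, the dense representation, and the tolerance are ours.

import numpy as np

def make_maximal_basis(A, basis, support, tol=1e-9):
    """Pivot columns j in `support` (the set X') into `basis` while some
    exchange with an index i outside X' still yields a basis (Definition 1)."""
    basis = list(basis)
    done = False
    while not done:
        done = True
        AB_inv = np.linalg.inv(A[:, basis])
        for jbar in set(support) - set(basis):
            col = AB_inv @ A[:, jbar]          # tableau column of jbar
            for pos, i in enumerate(basis):
                if i not in support and abs(col[pos]) > tol:
                    basis[pos] = jbar          # the exchange keeps a basis
                    done = False
                    break
            if not done:
                break                          # refactorize and rescan
    return basis

Each successful pivot increases |B ∩ X′|, so the loop performs at most m exchanges.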
From the above results it is not clear whether it is necessary to investigate all maximal bases wrt. x′ in order to identify a zero-coefficient cut wrt. x′. However, the following lemma shows that it is sufficient to examine just a single arbitrarily chosen maximal basis wrt. x′. In other words, if there exists a basis from which a zero-coefficient cut wrt. x′ can be derived, then a zero-coefficient cut wrt. x′ can be derived from every maximal basis wrt. x′.
Lemma 5. If there exists a basis B for P from which a zero-coefficient cut wrt.
x can be derived, then a zero-coefficient cut can be derived from every basis for
P which is maximal wrt. x .
Proof. Suppose B is a basis from which a zero-coefficient cut wrt. x can be
derived. Let J := N \ B, Bx := B ∩ X  and Jx := J ∩ X  . Also let x̄ := x̄B and
ā.j := (AB )−1 a.j for j ∈ J, where AB denotes the basis matrix associated with
Lemma 4 shows that we may assume B is maximal, i.e., we may assume that the simplex tableau associated with B is of the form:

  xi = 0 + Σ_{j∈J\Jx} āi,j xj   for all i ∈ B \ Bx,   (15)
  xi = x̄i + Σ_{j∈Jx} āi,j xj + Σ_{j∈J\Jx} āi,j xj   for all i ∈ Bx,   (16)
  xj ≥ 0   for all j ∈ J.   (17)

Observe that x̄i = 0 for all i ∈ B \ Bx, since x∗ satisfies (15). The set P(B) is the set of solutions to (15)-(17), and PI(B) is the set of mixed integer solutions to (15)-(17). Furthermore, from (15)-(17) it follows that a zero-coefficient cut can be derived from B if and only if the following set does not contain mixed integer points:

  T(B) := {x ∈ Rn : xi = x̄i + Σ_{j∈Jx} āi,j xj for i ∈ Bx, and xj ≥ 0 for j ∈ Jx}.

Now, T(B) is a basic polyhedron associated with the set:

  T := {x ∈ Rn : xi = x̄i + Σ_{j∈Jx} āi,j xj for i ∈ Bx, and xj ≥ 0 for j ∈ Jx ∪ Bx}.

Furthermore, with any basis B′ for P which is maximal wrt. x∗, one can associate a basic polyhedron T(B′) of T of the above form, and a zero-coefficient cut wrt. x∗ can be derived from B′ if and only if T(B′) does not contain mixed integer points. Since T(B) does not contain mixed integer points, it follows from Lemma 3 that no basic polyhedron T(B′) of T contains mixed integer points. Hence a zero-coefficient cut can be derived from every basis B′ for P which is maximal wrt. x∗.
Since a maximal basis wrt. x∗ ∈ P can be obtained in polynomial time, we immediately have our main theorem.
Theorem 2. Let x∗ ∈ P be arbitrary. If there exists a basis B and a valid inequality for PI(B) which is a zero-coefficient cut wrt. x∗, then such a zero-coefficient cut can be obtained in polynomial time.
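The maximality needed here can be reached by greedy pivoting: each successful swap strictly enlarges B ∩ X∗, so at most |B| pivots occur. The following Python sketch is ours, not the authors' implementation; it assumes P is given in standard equality form with constraint matrix A, and a swap is admissible exactly when the candidate basis matrix stays nonsingular.

```python
import numpy as np

def maximal_basis(A, basis, support):
    """Greedy pivoting towards a basis maximal wrt x* (Definition 1).

    A       : m x n constraint matrix of P in standard equality form
              (an assumption of this sketch, not notation from the paper)
    basis   : list of m column indices with A[:, basis] nonsingular
    support : the set X* = {j in N : x*_j > 0}

    Each accepted swap strictly enlarges the overlap of the basis with X*,
    so at most m pivots are performed, giving a polynomial bound."""
    m = len(basis)
    basis = list(basis)
    improved = True
    while improved:
        improved = False
        for i in [b for b in basis if b not in support]:
            for j in [s for s in support if s not in basis]:
                candidate = [j if b == i else b for b in basis]
                # the swap is a legal pivot iff the new basis matrix is nonsingular
                if np.linalg.matrix_rank(A[:, candidate]) == m:
                    basis, improved = candidate, True
                    break
            if improved:
                break
    return basis
```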

5 Computational Results
We now test the performance of the cutting plane algorithm (InitCPalg) de-
scribed in Sect. 3. In our implementation, we use CPLEX 9.1 for solving linear
programs, and the open source software NTL for the lattice computations. We
use instances from miplib 3.0 [6] and miplib 2003 [1] in our experiments. All in-
stances are minimization problems, and we use the preprocessed version of each
instance, i.e., when we refer to an instance, we refer to the instance obtained
after applying the preprocessor of CPLEX 9.1.
For each instance, we formulate the optimization problem over the corner polyhedron associated with an optimal basis of the LP relaxation. To distinguish the optimization problem over the corner polyhedron from the original mixed integer program, we use the following notation: The original mixed integer program is denoted (MIP), and the mixed integer program over the corner polyhedron is denoted (MIPc). The optimal objective value of (MIP) is denoted z^MIP, and the optimal objective value of (MIPc) is denoted z^MIPc. The LP relaxation of (MIP) is denoted (LP), and the optimal objective value of (LP) is denoted z^LP.
We assume the (original) mixed integer program (MIP) has n variables, and
includes slack, surplus and artificial variables in the formulation:
min cT x
such that
aTi. x = bi , for all i ∈ M, (18)
lj ≤ xj ≤ uj , for all j ∈ N, (19)
xj ∈ Z, for all j ∈ NI . (20)
where M is an index set for the constraints, c ∈ Qn+|M| denotes the objective
coefficients, N := {1, 2, . . . , (n + |M |)} is an index set for the variables, NI ⊆ N
denotes those variables that are integer constrained, l and u are the lower and
upper bounds on the variables respectively and (ai. , bi ) ∈ Q|N |+1 for i ∈ M
denotes the coefficients in the ith constraint. The variables xn+i for i ∈ M are
either slack, surplus or artificial variables.
The problem (MIPc) is formulated as follows. An optimal basis for (LP) is an |M|-subset B∗ ⊆ N of basic variables. Let J∗ := N \ B∗ denote the non-basic variables. The problem (MIPc) can be constructed from (MIP) by eliminating certain bounds on the variables. Let JA∗ denote the non-basic artificial variables, let JL∗ ⊆ J∗ \ JA∗ denote the non-basic structural variables on lower bound, and let JU∗ ⊆ J∗ \ JA∗ denote the non-basic structural variables on upper bound. By re-defining the bounds on the variables xj for j ∈ N to

  lj(B∗) := 0 if j ∈ JA∗;  lj if j ∈ JL∗;  −∞ otherwise,
  uj(B∗) := 0 if j ∈ JA∗;  uj if j ∈ JU∗;  +∞ otherwise,   (21)
then the problem (MIPc) associated with B∗ is given by:

  min cT x
  such that
  aTi. x = bi,   for all i ∈ M,   (22)
  lj(B∗) ≤ xj ≤ uj(B∗),   for all j ∈ N,   (23)
xj ∈ Z, for all j ∈ NI . (24)
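The bound relaxation (21) is mechanical; a minimal sketch (our own illustrative data structures, not the authors' code) reads:

```python
from math import inf

def corner_bounds(l, u, JA, JL, JU):
    """Relax bounds as in (21): fix artificials to 0, keep only the bounds
    that are active at the optimal LP basis B*, and drop all other bounds.

    l, u : dicts mapping variable index -> lower/upper bound
    JA   : set of non-basic artificial variables
    JL   : set of non-basic structural variables on lower bound
    JU   : set of non-basic structural variables on upper bound
    """
    lB = {j: 0 if j in JA else (l[j] if j in JL else -inf) for j in l}
    uB = {j: 0 if j in JA else (u[j] if j in JU else +inf) for j in u}
    return lB, uB
```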

We evaluate the performance of zero-coefficient cuts by calculating how much more of the integrality gap of (MIPc) can be closed by continuing (InitCPalg) beyond the generation of the mixed integer Gomory (MIG) cuts. For this, we first evaluate the quality of (MIPc) as an approximation to (MIP). Our measure of quality is the size of the integrality gap. The integrality gap of (MIPc) is the number GapI(MIPc) := z^MIPc − z^LP, and the integrality gap of (MIP) is the number GapI(MIP) := z^MIP − z^LP. The relationship between the numbers GapI(MIP) and GapI(MIPc) gives information on the quality of (MIPc) as an object for cutting plane generation for (MIP).
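As a worked check of this measure, the qnet1 row of Table 1 gives:

```python
z_MIPc, z_MIP, z_LP = 15997.04, 16029.69, 14274.10   # qnet1, from Table 1
gap_closed = (z_MIPc - z_LP) / (z_MIP - z_LP) * 100
print(f"{gap_closed:.2f}%")                          # 98.14%, as in the last column
```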
Table 1 contains our results for evaluating the quality of (MIPc ). The first
three columns contain the problem name, the number of constraints and the
number of variables for each instance respectively.
There are six instances (not included in Table 1) for which (MIPc ) does not
close any of the integrality gap between (LP) and (MIP). This means that the
bounds deleted from (MIP) to create (MIPc ) are important constraints of (MIP),
although these bounds are not active in defining an optimal solution to (LP).
This seems to indicate that, for these instances, (MIPc ) is not the right relaxation
of (MIP) from which to derive strong cuts.
For the first seven instances in Table 1 the opposite is true, i.e., for these instances the optimal objective of (MIPc) is the same as the optimal objective of (MIP). Therefore, for these instances, if we can identify all facet defining inequalities associated with (MIPc) and add these inequalities to (LP), then all of the integrality gap between (LP) and (MIP) will be closed. Hence, for these instances, (MIPc) seems to be the right object from which to derive strong cuts for (MIP). For the remaining instances in Table 1, not all the integrality gap between (LP) and (MIP) is closed by valid cuts from (MIPc). However, for most instances in Table 1, there is still a large amount of integrality gap between (LP) and (MIP) that can potentially be closed with valid cuts for (MIPc).
We next evaluate the performance of zero-coefficient cuts. Table 2 contains
the main results. Before considering the results in Table 2, we first make a few
comments on those instances that are not in Table 2.
For three instances, MIG cuts close all the integrality gap of (MIPc ). For these
instances, zero-coefficient cuts can therefore not close any additional integrality
gap, and we did not include these instances in our test of the performance of
zero-coefficient cuts. Furthermore, for another nine instances, no further zero-
coefficient cuts were generated besides the MIG cuts. Observe that (InitCPalg)
does not do much work for these instances, since this is detected after the first
lattice basis has been computed.
For the remaining instances, we divided them into those instances where MIG
cuts closed less than 80% of the total integrality gap that can be closed with
zero-coefficient cuts, and those where MIG cuts closed more than 80% of the
total integrality gap that can be closed with zero-coefficient cuts.
Table 1. Strength of corner polyhedron

Problem  # Constr.  # Var.  z^MIPc  z^MIP  z^LP  GapI(MIPc)/GapI(MIP) × 100%
10teams 210 1600 904 904 897 100.00 %
air04 614 7564 54632 54632 54030.44 100.00 %
egout 35 47 299.01 299.01 242.52 100.00 %
l152lav 97 1988 4722 4722 4656.36 100.00 %
mas76 12 148 40005.05 40005.05 38893.90 100.00 %
mod008 6 319 307 307 290.93 100.00 %
p0282 160 200 258401 258401 179990.30 100.00 %
qnet1 363 1417 15997.04 16029.69 14274.10 98.14 %
flugpl 13 14 759600 760500 726875 97.32 %
nsrand-ipx 76 4162 50880.00 51200.00 49667.89 79.11 %
vpm1 128 188 19 20 16.43 71.99 %
vpm2 128 188 13 13.75 11.14 71.26 %
pp08a 133 234 5870 7350 2748.35 67.84 %
p2756 702 2642 2893 3124 2701.67 45.30 %
swath 482 6260 378.07 467.41 334.50 32.78 %
modglob 286 384 19886358 20099766 19790206 31.06 %
fixnet6 477 877 3357 3981 3190.04 21.11 %
p0201 107 183 7185 7615 7155 6.52 %
rout 290 555 -1388.42 -1297.69 -1393.39 5.19 %

Table 2. Instances where the increase in objective with all ZC cuts was substantially larger than the increase in objective with only MIG cuts

Problem  # Constr.  # Var.  # MIG cuts  # Additional ZC cuts  ΔObj. MIG cuts / ΔObj. All cuts × 100%
l152lav 97 1988 53 6 0.00%
mkc∗ 1286 3230 119 29 0.00%
p0201 107 183 42 27 0.00%
p2756 702 2642 201 7 6.02%
rout 290 555 52 36 6.24%
swath 482 6260 80 26 7.58%
vpm1 114 188 15 40 22.38%
vpm2 128 188 29 29 22.73%
flugpl 13 14 13 5 23.36%
fixnet6 477 877 12 19 27.06%
timtab2∗ 287 648 237 653 29.85%
timtab1∗ 166 378 133 342 30.93%
egout 35 47 8 5 41.47%
qnet1 363 1417 55 47 45.05%
p0282 160 200 34 53 49.60%
air04 614 7564 290 30 50.53%
modglob 286 384 60 28 56.77%
mas76 12 148 11 11 65.77%
pp08a 133 234 51 34 66.16%
10teams 210 1600 179 76 66.67%
mod008 6 319 5 10 69.77%
nsrand-ipx 590 4162 226 91 73.17%

Table 2 contains those instances where MIG cuts closed less than 80% of the
total integrality gap that can be closed with zero-coefficient cuts. We observe
that for the first 16 instances in Table 2, continuing (InitCPalg) beyond MIG
cuts closed at least twice as much integrality gap as would have been achieved
by using only MIG cuts. For the remaining instances in Table 2, the improvement was less than a factor of two, but still substantial.
The instances marked with an asterisk in Table 2 are instances where we were unable to solve (MIPc). For those instances, the results are based on the best solution we were able to find.
The remaining class of instances are those instances where MIG cuts closed
more than 80% of the total integrality gap that can be closed with zero-coefficient
cuts. There were 28 of these instances. For these instances, continuing (InitC-
Palg) beyond MIG cuts was therefore not beneficial. However, we observe that
for all except two of these instances (markshare1 and markshare2), this was
because very few zero-coefficient cuts were generated that are not MIG cuts.
Detecting that it is not beneficial to continue (InitCPalg) beyond the generation
of MIG cuts was therefore done after only very few lattice basis computations.

References
1. Achterberg, T., Koch, T.: MIPLIB 2003. Operations Research Letters 34, 361–372
(2006)
2. Andersen, K., Louveaux, Q., Weismantel, R.: Certificates of linear mixed integer
infeasibility. Operations Research Letters 36, 734–738 (2008)
3. Andersen, K., Louveaux, Q., Weismantel, R., Wolsey, L.A.: Inequalities from Two
Rows of a Simplex Tableau. In: Fischetti, M., Williamson, D.P. (eds.) IPCO 2007.
LNCS, vol. 4513, pp. 1–15. Springer, Heidelberg (2007)
4. Balas, E.: Intersection Cuts - a new type of cutting planes for integer programming.
Operations Research 19, 19–39 (1971)
5. Balas, E., Saxena, A.: Optimizing over the split closure. Mathematical Program-
ming, Ser. A 113, 219–240 (2008)
6. Bixby, R.E., Ceria, S., McZeal, C.M., Savelsbergh, M.W.P.: An updated mixed
integer programming library: MIPLIB 3.0. Optima 58, 12–15 (1998)
7. Caprara, A., Letchford, A.: On the separation of split cuts and related inequalities.
Mathematical Programming, Ser. A 94, 279–294 (2003)
8. Cook, W.J., Kannan, R., Schrijver, A.: Chvátal closures for mixed integer pro-
gramming problems. Mathematical Programming 47, 155–174 (1990)
9. Cornuéjols, G., Margot, F.: On the Facets of Mixed Integer Programs with Two
Integer Variables and Two Constraints. Mathematical Programming, Ser. A 120,
429–456 (2009)
10. Gomory, R.E.: An algorithm for the mixed integer problem. Technical Report RM-
2597, The Rand Corporation (1960a)
11. Nemhauser, G., Wolsey, L.A.: A recursive procedure to generate all cuts for 0-1
mixed integer programs. Mathematical Programming, Ser. A 46, 379–390 (1990)
Prize-Collecting Steiner Network Problems

MohammadTaghi Hajiaghayi1, Rohit Khandekar2, Guy Kortsarz3,⋆⋆ and Zeev Nutov4

1 AT&T Labs Research
[email protected]
2 IBM T.J. Watson Research Center
[email protected]
3 Rutgers University, Camden
[email protected]
4 The Open University of Israel
[email protected]

Abstract. In the Steiner Network problem we are given a graph G with


edge-costs and connectivity requirements ruv between node pairs u, v.
The goal is to find a minimum-cost subgraph H of G that contains ruv
edge-disjoint paths for all u, v ∈ V . In Prize-Collecting Steiner Network
problems we do not need to satisfy all requirements, but are given a
penalty function for violating the connectivity requirements, and the goal
is to find a subgraph H that minimizes the cost plus the penalty. The case
when ruv ∈ {0, 1} is the classic Prize-Collecting Steiner Forest problem.
In this paper we present a novel linear programming relaxation for
the Prize-Collecting Steiner Network problem, and by rounding it, obtain
the first constant-factor approximation algorithm for submodular and
monotone non-decreasing penalty functions. In particular, our setting
includes all-or-nothing penalty functions, which charge the penalty even
if the connectivity requirement is slightly violated; this resolves an open
question posed in [SSW07]. We further generalize our results for element-
connectivity and node-connectivity.

1 Introduction
Prize-collecting Steiner problems are well-known network design problems with
several applications in expanding telecommunications networks (see for exam-
ple [JMP00, SCRS00]), cost sharing, and Lagrangian relaxation techniques (see
e.g. [JV01, CRW01]). A general form of these problems is the Prize-Collecting
Steiner Forest problem1 : given a network (graph) G = (V, E), a set of source-
sink pairs P = {{s1 , t1 }, {s2 , t2 }, . . . , {sk , tk }}, a non-negative cost function
c : E → ℝ+, and a non-negative penalty function π : P → ℝ+, our goal is

⋆ Part of this work was done while the authors were meeting at DIMACS. We would like to thank DIMACS for hospitality.
⋆⋆ Partially supported by NSF Award Grant number 0829959.
1 In the literature, this problem is also called "prize-collecting generalized Steiner tree".


a minimum-cost way of installing (buying) a set of links (edges) and paying the
penalty for those pairs which are not connected via installed links. When all
penalties are ∞, the problem is the classic APX-hard Steiner Forest problem, for
which the best known approximation ratio is 2 − 2/n (n is the number of nodes of
the graph) due to Agrawal, Klein, and Ravi [AKR95] (see also [GW95] for a more
general result and a simpler analysis). The case of Prize-Collecting Steiner Forest
problem when all sinks are identical is the classic Prize-Collecting Steiner Tree
problem. Bienstock, Goemans, Simchi-Levi, and Williamson [BGSLW93] first
considered this problem (based on a problem earlier proposed by Balas [Bal89])
and gave for it a 3-approximation algorithm. The current best ratio for this
problem is 1.992 by Archer, Bateni, Hajiaghayi, and Karloff [ABHK09], improving upon a primal-dual (2 − 1/(n−1))-approximation algorithm of Goemans
and Williamson [GW95]. When in addition all penalties are ∞, the problem is
the classic Steiner Tree problem, which is known to be APX-hard [BP89] and
for which the best approximation ratio is 1.55 [RZ05]. Very recently, Byrka et
al. [BGRS10] have announced an improved approximation algorithm for the
Steiner tree problem.
The general form of the Prize-Collecting Steiner Forest problem was first formulated by Hajiaghayi and Jain [HJ06]. They showed how a primal-dual algorithm applied to a novel integer programming formulation of the problem, with doubly-exponentially many variables, yields a 3-approximation algorithm for the problem. In addition, they show that the factor 3 in the analysis of their algorithm is tight. They also show that a direct randomized LP-rounding algorithm with approximation factor 2.54 can be obtained for this problem. Their
approach has been generalized by Sharma, Swamy, and Williamson [SSW07] for
network design problems where violated arbitrary 0-1 connectivity constraints
are allowed in exchange for a very general penalty function. The work of Ha-
jiaghayi and Jain has also motivated a game-theoretic version of the problem
considered by Gupta et al. [GKL+ 07].
In this paper, we consider a much more general high-connectivity version of
Prize-Collecting Steiner Forest, called Prize-Collecting Steiner Network, in which
we are also given connectivity requirements ruv for pairs of nodes u and v and
a penalty function in case we do not satisfy all ruv. Our goal is to find a minimum-cost way of constructing a network (graph) in which we connect u and v with r′uv ≤ ruv edge-disjoint paths and pay a penalty for all violated connectivity between source-sink pairs. This problem can arise in real-world network design,
in which a typical client not only might want to connect to the network but
also might want to connect via a few disjoint paths (e.g., to have a higher band-
width or redundant connections in case of edge failures) and a penalty might
be charged if we cannot satisfy its connectivity requirement. When all penalties
are ∞, the problem is the classic Steiner Network problem. Improving on a long
line of earlier research that applied primal-dual methods, Jain [Jai01] obtained
a 2-approximation algorithm for Steiner Network using the iterative rounding
method. This algorithm was generalized to so called “element-connectivity” by
Fleischer, Jain, and Williamson [FJW01] and by Cheriyan, Vempala, and Vetta
[CVV06]. Recently, some results were obtained for the node-connectivity version;
the currently best known ratios for the node-connectivity case are O(R3 log n)
for general requirements [CK09] and O(R2 ) for rooted requirements [Nut09],
where R = maxu,v∈V ruv is the maximum requirement. See also the survey by
Kortsarz and Nutov [KN07] for various min-cost connectivity problems.
Hajiaghayi and Nasri [HN10] generalize the iterative rounding approach of
Jain to Prize-Collecting Steiner Network when there is a separate non-increasing
marginal penalty function for each pair u, v whose ruv -connectivity requirement
is not satisfied. They obtain an iterative rounding 3-approximation algorithm for
this case. For the special case when penalty functions are linear in the violation
of the connectivity requirements, Nagarajan, Sharma, and Williamson [NSW08]
using Jain's iterative rounding algorithm as a black box give a 2.54-factor approx-
imation algorithm. They also generalize the 0-1 requirements of Prize-Collecting
Steiner Forest problem introduced by Sharma, Swamy, and Williamson [SSW07]
to include general connectivity requirements. Assuming the monotone submod-
ular penalty function of Sharma et al. is generalized to a multiset function that
can be decomposed into functions in the same type as that of Sharma et al.,
they give an O(log R)-approximation algorithm (recall that R is the maximum
connectivity requirement). In this algorithm, they assume that we can use each
edge possibly many times (without bound). They raise the question whether we
can obtain a constant ratio without all these assumptions, when penalty is a sub-
modular multi-set function of the set of disconnected pairs? More importantly
they pose as an open problem to design a good approximation algorithm for the
all-or-nothing version of penalty functions: penalty functions which charge the
penalty even if the connectivity requirement is slightly violated. In this paper,
we answer affirmatively all these open problems by proving the first constant
factor 2.54-approximation algorithm which is based on a novel LP formulation
of the problem. We further generalize our results for element-connectivity and
node-connectivity. In fact, for all types of connectivities, we prove a very gen-
eral result (see Theorem 1) stating that if Steiner Network (the version without
penalties) admits an LP-based ρ-approximation algorithm, then the correspond-
ing prize-collecting version admits a (ρ + 1)-approximation algorithm.

1.1 Problems We Consider


In this section, we define formally the terms used in the paper. For a subset
S of nodes in a graph H, let λSH (u, v) denote the S-connectivity between u
and v in H, namely, the maximum number of edge-disjoint uv-paths in H so
that no two of them have a node in S − {u, v} in common. In the Generalized
Steiner-Network (GSN) problem we are given a graph G = (V, E) with edge-costs
{ce ≥ 0 | e ∈ E}, a node subset S ⊆ V , a collection {u1 , v1 }, . . . , {uk , vk } of node
pairs from V , and S-connectivity requirements r1 , . . . , rk . The goal is to find a
minimum cost subgraph H of G so that λSH (ui , vi ) ≥ ri for all i. Extensively
studied particular cases of GSN are: the Steiner Network problem, called also
Edge-Connectivity GSN (S = ∅), Node-Connectivity GSN (S = V ), and Element-
Connectivity GSN (S ∩ {ui , vi } = ∅ for all i). The case of rooted requirements
is when there is a “root” s that belongs to all pairs {ui , vi }. We consider the
following “prize-collecting” version of GSN.

All-or-Nothing Prize Collecting Generalized Steiner Network (PC-GSN):

Instance: A graph G = (V, E) with edge-costs {ce ≥ 0 | e ∈ E}, S ⊆ V, a collection {u1, v1}, . . . , {uk, vk} of node pairs from V, S-connectivity requirements r1, . . . , rk > 0, and a penalty function π : 2^{1,...,k} → ℝ+.

Objective: Find a subgraph H of G that minimizes the value

  val(H) = c(H) + π(unsat(H))

of H, where unsat(H) = {i | λSH(ui, vi) < ri} is the set of requirements not satisfied by H.
We will assume that the penalty function π is given by an evaluation oracle.
We will also assume that π is submodular, namely, that π(A) + π(B) ≥ π(A ∩
B) + π(A ∪ B) for all A, B and that it is monotone non-decreasing, namely,
π(A) ≤ π(B) for all A, B with A ⊆ B. As was mentioned, approximating the
edge-connectivity variant of PC-GSN was posed as the main open problem by
Nagarajan, Sharma, and Williamson [NSW08]. We resolve this open problem for
the submodular function val(H) considered here.
We next define the second problem we consider.

Generalized Steiner Network with Generalized Penalties (GSN-GP):

Instance: A graph G = (V, E) with edge-costs {ce ≥ 0 | e ∈ E}, S ⊆ V, a collection {u1, v1}, . . . , {uk, vk} of node pairs from V, and non-increasing penalty functions p1, . . . , pk : {0, 1, . . . , n − 1} → ℝ+.

Objective: Find a subgraph H of G that minimizes the value

  val′(H) = c(H) + Σ_{i=1}^{k} pi(λSH(ui, vi)).

The above problem captures general penalty functions of the S-connectivity


λS (ui , vi ) for given pairs {ui , vi }. It is natural to assume that the penalty func-
tions are non-increasing, i.e., we pay less in the objective function if the achieved
connectivity is more. This problem was posed as an open question by Nagarajan
et al. [NSW08]. In this paper, we use the convention that pi (n) = 0 for all i.
We need some definitions to introduce our results. A pair T = {T′, T′′} of subsets of V is called a setpair (of V) if T′ ∩ T′′ = ∅. Let K = {1, . . . , k}. Let T = {T′, T′′} be a setpair of V. We denote by δ(T) the set of edges in E between T′ and T′′. For i ∈ K we use T ⊙ (i, S) to denote that |T′ ∩ {ui, vi}| = 1, |T′′ ∩ {ui, vi}| = 1 and V \ (T′ ∪ T′′) ⊆ S. While in the case of edge-connectivity a "cut" consists of edges only, in the case of S-connectivity a cut that separates
between u and v is "mixed", meaning it may contain both edges in the graph and nodes from S. Note that if T ⊙ (i, S) then δ(T) ∪ (V \ (T′ ∪ T′′)) is such a mixed cut that separates ui and vi. Intuitively, Menger's Theorem for S-connectivity (c.f. [KN07]) states that the S-connectivity between ui and vi equals the minimum size of such a mixed cut. Formally, for a node pair ui, vi of a graph H = (V, E) and S ⊆ V we have:

  λSH(ui, vi) = min_{T⊙(i,S)} (|δ(T)| + |V \ (T′ ∪ T′′)|) = min_{T⊙(i,S)} (|δ(T)| + |V| − (|T′| + |T′′|)).

Hence if λSH(ui, vi) ≥ ri for a graph H = (V, E), then for any setpair T with T ⊙ (i, S) we must have |δ(T)| ≥ ri(T), where ri(T) = max{ri + |T′| + |T′′| − |V|, 0}. Consequently, a standard "cut-type" LP-relaxation of the GSN problem is as follows (c.f. [KN07]):

  min { Σ_{e∈E} ce xe | Σ_{e∈δ(T)} xe ≥ ri(T) ∀T ⊙ (i, S), ∀i ∈ K, xe ∈ [0, 1] ∀e }.   (1)

1.2 Our Results


We introduce a novel LP relaxation of the problem which is shown to be bet-
ter, in terms of the integrality gap, than a “natural” LP relaxation considered
in [NSW08]. Using our LP relaxation, we prove the following main result.
Theorem 1. Suppose that there exists a polynomial time algorithm that com-
putes an integral solution to LP (1) of cost at most ρ times the optimal value
of LP (1) for any subset of node pairs. Then PC-GSN admits a (1 − e−1/ρ )−1 -
approximation algorithm, provided that the penalty function π is submodular and
monotone non-decreasing.
Note that since 1 − 1/ρ < e^{−1/ρ} < 1 − 1/(ρ+1) holds for ρ ≥ 1, we have ρ < (1 − e^{−1/ρ})^{−1} < ρ + 1.
Let R = maxi ri denote the maximum requirement. The best known values of
ρ are as follows: 2 for Edge-GSN [Jai01], 2 for Element-GSN [FJW01, CVV06],
O(R3 log |V |) for Node-GSN [CK09], and O(R2 ) for Node-GSN with rooted re-
quirements [Nut09]. Substituting these values in Theorem 1, we obtain:
Corollary 1. PC-GSN problems admit the following approximation ratios pro-
vided that the penalty function π is submodular and monotone non-decreasing:
2.54 for edge- and element-connectivity, O(R3 log |V |) for node-connectivity, and
O(R2 ) for node-connectivity with rooted requirements.
Our results for GSN-GP follow from Corollary 1.
Corollary 2. GSN-GP problems admit the following approximation ratios: 2.54
for edge- and element-connectivity, O(R3 log |V |) for node-connectivity, and O(R2 )
for node-connectivity with rooted requirements. Here R = max1≤i≤k min{λ ≥ 0 |
pi (λ) = 0}.
Proof. We present an approximation ratio preserving reduction from the GSN-GP problem to the corresponding PC-GSN problem. Given an instance of the GSN-GP problem, we create an instance of the PC-GSN problem as follows. The PC-GSN instance inherits the graph G, its edge-costs, and the set S. Let (ui, vi) be a pair in GSN-GP and let Ri = min{λ ≥ 0 | pi(λ) = 0}. We introduce Ri copies of this pair, {(u^1_i, v^1_i), . . . , (u^{Ri}_i, v^{Ri}_i)}, to the set of pairs in the PC-GSN instance. We set the connectivity requirement of the pair (u^t_i, v^t_i) to be t for 1 ≤ t ≤ Ri. We also set the penalty function for singleton sets as follows: π({(u^t_i, v^t_i)}) = pi(t − 1) − pi(t) for all 1 ≤ t ≤ Ri. Finally, we extend this function π to a set of pairs P by linearity, i.e., π(P) = Σ_{p∈P} π({p}). Note that such a function π is clearly submodular and monotone non-decreasing.
It is sufficient to show that for any subgraph H of G, its value in the GSN-GP instance equals its value in the PC-GSN instance, i.e., val′(H) = val(H); then we can use the algorithm from Corollary 1 to complete the proof. Fix a pair (ui, vi) in the GSN-GP instance. Let λSH(ui, vi) = ti. Thus the contribution of the pair (ui, vi) to the objective function val′(H) of the GSN-GP instance is pi(ti). On the other hand, since π is linear, the total contribution of the pairs {(u^1_i, v^1_i), . . . , (u^{Ri}_i, v^{Ri}_i)} to the objective function val(H) of the PC-GSN instance is Σ_{t=ti+1}^{Ri} π({(u^t_i, v^t_i)}) = Σ_{t=ti+1}^{Ri} (pi(t − 1) − pi(t)) = pi(ti). Note that the pairs (u^t_i, v^t_i) for 1 ≤ t ≤ ti do not incur any penalty. Summing up over all pairs, we conclude that val′(H) = val(H), as claimed.
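The reduction in this proof is easy to make concrete. The sketch below uses our own illustrative encoding (pairs as tuples, each pi as a Python function), not code from the paper:

```python
def gsn_gp_to_pc_gsn(pairs, p):
    """Reduction of GSN-GP to PC-GSN used in the proof of Corollary 2.

    pairs : list of (u_i, v_i)
    p     : list of non-increasing penalty functions p_i with
            p_i(lam) = 0 for all lam >= R_i

    Returns (new_pairs, reqs, pen): new_pairs[t] carries connectivity
    requirement reqs[t] and singleton penalty pen[t]; the penalty of a set
    of pairs is the sum of its singleton penalties (linear, hence
    submodular and monotone non-decreasing)."""
    new_pairs, reqs, pen = [], [], []
    for (u, v), pi in zip(pairs, p):
        Ri = 0
        while pi(Ri) > 0:          # R_i = min{lam >= 0 : p_i(lam) = 0}
            Ri += 1
        for t in range(1, Ri + 1):
            new_pairs.append((u, v))        # the copy (u^t_i, v^t_i)
            reqs.append(t)                  # its requirement is t
            pen.append(pi(t - 1) - pi(t))   # telescoping singleton penalty
    return new_pairs, reqs, pen
```

Telescoping then recovers the original objective: if the achieved connectivity of (ui, vi) is ti, exactly the copies with t > ti are violated, and their singleton penalties sum to pi(ti).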

2 A New LP Relaxation

We use the following LP-relaxation for the PC-GSN problem. We introduce variables xe for e ∈ E (xe = 1 if e ∈ H), fi,e for i ∈ K and e ∈ E (fi,e = 1 if i ∉ unsat(H) and e appears on a chosen set of ri S-disjoint {ui, vi}-paths in H), and zI for I ⊆ K (zI = 1 if I = unsat(H)).

  Minimize   Σ_{e∈E} ce xe + Σ_{I⊆K} π(I) zI
  Subject to Σ_{e∈δ(T)} fi,e ≥ (1 − Σ_{I:i∈I} zI) ri(T)   ∀i ∀T ⊙ (i, S)
             fi,e ≤ 1 − Σ_{I:i∈I} zI                      ∀i ∀e
             xe ≥ fi,e                                    ∀i ∀e        (2)
             Σ_{I⊆K} zI = 1
             xe, fi,e, zI ∈ [0, 1]                        ∀i ∀e ∀I

We first prove that (2) is a valid LP-relaxation of the PC-GSN problem.


Lemma 1. The optimal value of LP (2) is at most the optimal solution value
to the PC-GSN problem. Moreover, if π is monotone non-decreasing, the opti-
mum solution value to the PC-GSN problem is at most the value of the optimum
integral solution of LP (2).
Proof. Given a feasible solution H to the PC-GSN problem, define a feasible solution to LP (2) as follows. Let xe = 1 if e ∈ H and xe = 0 otherwise. Let zI = 1 if I = unsat(H) and zI = 0 otherwise. For each i ∈ unsat(H) set fi,e = 0 for all e ∈ E, while for i ∉ unsat(H) the variables fi,e take values as follows: fix a set of ri pairwise S-disjoint {ui, vi}-paths, and let fi,e = 1 if e belongs to one of these paths and fi,e = 0 otherwise. The defined solution is feasible for LP (2): the first set of constraints is satisfied by Menger's Theorem for S-connectivity, while the remaining constraints are satisfied by the above definition of the variables. It is also easy to see that the above solution has value exactly val(H).
If π is monotone non-decreasing, we prove that for any integral solution {xe, fi,e, zI} to (2), the graph H with edge-set {e ∈ E | xe = 1} has val(H) at most the value of the solution {xe, fi,e, zI}. To see this, first note that there is a unique set I ⊆ K with zI = 1, since the variables zI are integral and Σ_{I⊆K} zI = 1. Now consider an index i ∉ I. Since Σ_{I:i∈I} zI = 0, we have Σ_{e∈δ(T)} xe ≥ Σ_{e∈δ(T)} fi,e ≥ ri(T) for all T ⊙ (i, S). This implies that i ∉ unsat(H), by Menger's Theorem for S-connectivity. Consequently, unsat(H) ⊆ I, hence π(unsat(H)) ≤ π(I) by the monotonicity of π. Thus val(H) = c(H) + π(unsat(H)) ≤ Σ_{e∈E} ce xe + Σ_{I⊆K} π(I) zI and the lemma follows.

2.1 Why Does a "Natural" LP Relaxation Not Work?

One may be tempted to consider a natural LP without using the flow variables fi,e, namely, the LP obtained from LP (2) by replacing the first three sets of constraints by the set of constraints

  Σ_{e∈δ(T)} xe ≥ (1 − Σ_{I:i∈I} zI) ri(T)

for all i and T ⊙ (i, S). Here is an example demonstrating that the integrality
for all i and T  (i, S). Here is an example demonstrating that the integrality
gap of this LP can be as large as R = maxi ri even for edge-connectivity. Let G
consist of R − 1 edge-disjoint paths between two nodes s and t. All the edges
have cost 0. There is only one pair {u1 , v1 } = {s, t} that has requirement r1 = R
and penalty π({1}) = 1. Let π(∅) = 0. Clearly, π is submodular and monotone
non-decreasing. We have S = ∅. No integral solution can satisfy the requirement
r1 , hence an optimal integral solution pays the penalty π({1}) and has value 1.
A feasible fractional solution (without the flow variables) sets xe = 1 for all e, and sets z{1} = 1/R, z∅ = 1 − 1/R. The new set of constraints is satisfied since Σ_{e∈δ(T)} xe ≥ (1 − 1/R) · R = (1 − z{1}) r1(T) for any {s, t}-cut T. Thus the optimal LP-value is at most 1/R, giving a gap of at least R.
With flow variables, however, we have an upper bound f1,e ≤ 1 − z{1}. Since there is an {s, t}-cut T with |δ(T)| = R − 1, we cannot satisfy the constraints Σ_{e∈δ(T)} f1,e ≥ (1 − z{1}) r1(T) and f1,e ≤ 1 − z{1} simultaneously unless we set z{1} = 1. Thus in this case, our LP (2) with flow variables has the same optimal value as the integral optimum.
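For concreteness, here is the arithmetic of this example for R = 3 (exact rationals avoid floating-point noise):

```python
from fractions import Fraction

R = 3
cut_capacity = R - 1                  # |delta(T)| = R - 1 zero-cost edges with x_e = 1
z1 = Fraction(1, R)                   # z_{1} in the natural LP
assert cut_capacity >= (1 - z1) * R   # 2 >= (2/3)*3 = 2: the natural LP is feasible
print("natural LP value:", float(z1)) # pays only z_{1} * pi({1}) = 1/3
print("LP (2) value    :", 1.0)       # the flow bounds force z_{1} = 1
```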
2.2 Some Technical Results Regarding LP (2)

We will prove the following two statements that together imply Theorem 1.
Lemma 2. Any basic feasible solution to (2) has a polynomial number of non-
zero variables. Furthermore, an optimal basic solution to (2) (the non-zero en-
tries) can be computed in polynomial time.

Lemma 3. Suppose that there exists a polynomial time algorithm that computes
an integral solution to LP (1) of cost at most ρ times the optimal value of LP (1)
for any subset of node pairs. Then there exists a polynomial time algorithm that
given a feasible solution to (2) computes as a solution to PC-GSN a subgraph H
of G so that val(H) = c(H) + π(unsat(H)) is at most (1 − e−1/ρ )−1 times the
value of this solution, assuming π is submodular and monotone non-decreasing.

Before proving these lemmas, we prove some useful results. The following state-
ment can be deduced from a theorem of Edmonds for polymatroids (c.f. [KV02,
Chapter 14.2]), as the dual LP d(γ) in the lemma seeks to optimize a linear
function over a polymatroid. We provide a direct proof for completeness of ex-
position.
Lemma 4. Let γ ∈ [0, 1]^k be a vector. Consider a primal LP

  p(γ) := min { Σ_{I⊆K} π(I) zI | Σ_{I:i∈I} zI ≥ γi ∀i ∈ K, zI ≥ 0 ∀I ⊆ K }

and its dual LP

  d(γ) := max { Σ_{i∈K} γi yi | Σ_{i∈I} yi ≤ π(I) ∀I ⊆ K, yi ≥ 0 ∀i ∈ K }.

Let σ be a permutation of K such that γσ(1) ≤ γσ(2) ≤ . . . ≤ γσ(k). Let us also use the notation γσ(0) = 0. The optimum solutions to p(γ) and d(γ) respectively are given by

  zI = γσ(i) − γσ(i−1)  for I = {σ(i), . . . , σ(k)}, i ∈ K;  zI = 0 otherwise;

and

  yσ(i) = π({σ(i), . . . , σ(k)}) − π({σ(i + 1), . . . , σ(k)}),  for i ∈ K.

Proof. To simplify the notation, we assume without loss of generality that γ1 ≤ γ2 ≤ · · · ≤ γk, i.e., that σ is the identity permutation.
We argue that the above defined {zI} and {yi} form feasible solutions to the primal and dual LPs respectively. Note that zI ≥ 0 for all I and Σ_{I:i∈I} zI = Σ_{j=1}^{i} (γj − γj−1) = γi for all i. Since π is monotone non-decreasing, the above
defined yi satisfy yi ≥ 0 for all i. Now fix I ⊆ K. Let I = {i1, . . . , ip} where i1 < · · · < ip. Therefore

  Σ_{i∈I} yi = Σ_{j=1}^{p} y_{ij} = Σ_{j=1}^{p} [π({ij, . . . , k}) − π({ij + 1, . . . , k})]
            ≤ Σ_{j=1}^{p} [π({ij, ij+1, . . . , ip}) − π({ij+1, ij+2, . . . , ip})]
            = π({i1, . . . , ip}) = π(I).

The above inequality holds because of the submodularity of π. Next observe that the solutions {zI} and {yi} satisfy

  Σ_I π(I) zI = Σ_{i=1}^{k} π({i, . . . , k}) · (γi − γi−1)
             = Σ_{i=1}^{k} γi · (π({i, . . . , k}) − π({i + 1, . . . , k})) = Σ_{i=1}^{k} γi · yi.

Thus from weak LP duality, they in fact form optimum solutions to the primal and dual LPs respectively.
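Lemma 4 is constructive, so both optimal solutions can be written down directly. A minimal sketch (ours; π is supplied, as the paper assumes, by an evaluation oracle, with π(∅) = 0):

```python
def lovasz_primal_dual(gamma, pi):
    """Closed-form optimal solutions of Lemma 4.

    gamma : list of values in [0, 1]
    pi    : oracle mapping a frozenset of indices to its penalty;
            convention: pi(frozenset()) == 0

    Returns (z, y): z is sparse, keyed by frozenset; y is indexed by i."""
    k = len(gamma)
    sigma = sorted(range(k), key=lambda i: gamma[i])   # gamma ascending
    z, y = {}, [0.0] * k
    prev = 0.0                                         # gamma_sigma(0) = 0
    for pos, i in enumerate(sigma):
        tail = frozenset(sigma[pos:])                  # {sigma(pos), ..., sigma(k)}
        rest = frozenset(sigma[pos + 1:])
        z[tail] = gamma[i] - prev                      # z_I = gamma_sigma(i) - gamma_sigma(i-1)
        y[i] = pi(tail) - pi(rest)                     # greedy dual values
        prev = gamma[i]
    return z, y
```

For a submodular, monotone non-decreasing π the two objective values Σ_I π(I) zI and Σ_{i∈K} γi yi coincide, which is exactly the equality chain closing the proof; p(γ) is then Lovász's extension evaluated at γ.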
Recall that a sub-gradient of a convex function g : ℝ^k → ℝ at a point γ ∈ ℝ^k is a vector d ∈ ℝ^k such that for any γ′ ∈ ℝ^k, we have g(γ′) − g(γ) ≥ d · (γ′ − γ). For a differentiable convex function g, the sub-gradient corresponds to the gradient ∇g. The function p(γ) defined in Lemma 4 is essentially Lovász's continuous extension of the submodular function π. The fact that p is convex and its subgradient can be computed efficiently is given in [Fuj05]. We provide a full proof for completeness of exposition.
Lemma 5. The function p(γ) in Lemma 4 is convex and given γ ∈ [0, 1]k , both
p(γ) and its sub-gradient ∇p(γ) can be computed in polynomial time.
Proof. We first prove that p is convex. Fix γ1, γ2 ∈ [0, 1]^k and α ∈ [0, 1]. To show that p is convex, we will show p(αγ1 + (1 − α)γ2) ≤ αp(γ1) + (1 − α)p(γ2). Let {z^1_I} and {z^2_I} be the optimum solutions of the primal LP defining p for γ1 and γ2 respectively. Note that the solution {αz^1_I + (1 − α)z^2_I} is feasible for this LP for γ = αγ1 + (1 − α)γ2. Thus the optimum solution has value not greater than the value of this solution, which is αp(γ1) + (1 − α)p(γ2).
From Lemma 4, it is clear that given γ ∈ [0, 1]^k, the value p(γ) can be computed in polynomial time. Lemma 4 also implies that the optimum dual solution y∗ = (y∗1, . . . , y∗k) ∈ ℝ^k_+ can be computed in polynomial time. We now argue that y∗ is a sub-gradient of p at γ. Fix any γ′ ∈ ℝ^k. First note that, from LP duality, p(γ) = y∗ · γ. Thus we have

  p(γ) + y∗ · (γ′ − γ) = y∗ · γ + y∗ · (γ′ − γ) = y∗ · γ′ ≤ p(γ′).

The last inequality holds from weak LP duality since y∗ is a feasible solution for the dual LP d(γ′) as well. The lemma follows.
3 Proof of Lemma 3

We now describe how to round LP (2) solutions to obtain a (ρ + 1)-approximation for PC-GSN. Later we show how to improve it to (1 − e^{−1/ρ})^{−1}. Let {x∗e, f∗i,e, z∗I} be a feasible solution to LP (2). Let α ∈ (0, 1) be a parameter to be fixed later. We partition the requirements into two classes: we call a requirement i ∈ K good if Σ_{I:i∈I} z∗I ≤ α and bad otherwise. Let Kg denote the set of good requirements. The following statement shows how to satisfy the good requirements.

Lemma 6. There exists a polynomial time algorithm that computes a subgraph H of G of cost c(H) ≤ (ρ/(1 − α)) · Σ_e ce x∗e that satisfies all good requirements.

Proof. Consider the LP-relaxation (1) of the GSN problem with good requirements only, with K replaced by Kg; namely, we seek a minimum cost subgraph H of G that satisfies the set Kg of good requirements. We claim that x∗∗e = min{1, x∗e/(1 − α)} for each e ∈ E is a feasible solution to LP (1). Thus the optimum value of LP (1) is at most Σ_{e∈E} ce x∗∗e. Consequently, using the algorithm that computes an integral solution to LP (1) of cost at most ρ times the optimal value of LP (1), we can construct a subgraph H that satisfies all good requirements and has cost at most c(H) ≤ ρ Σ_{e∈E} ce x∗∗e ≤ (ρ/(1 − α)) Σ_e ce x∗e, as desired.
as desired.
We now show that {x∗∗e} is a feasible solution to LP (1), namely, that Σ_{e∈δ(T)} x∗∗e ≥ ri(T) for any i ∈ Kg and any T ⊙ (i, S). Let i ∈ Kg and let ζi = 1 − Σ_{I:i∈I} z∗I. Note that ζi ≥ 1 − α, by the definition of Kg. By the second and the third sets of constraints in LP (2), for every e ∈ E we have min{ζi, x∗e} ≥ f∗i,e. Thus we obtain:

  x∗∗e = min{1, x∗e/(1 − α)} ≥ min{1, x∗e/ζi} = (1/ζi) · min{ζi, x∗e} ≥ f∗i,e/ζi = f∗i,e/(1 − Σ_{I:i∈I} z∗I).

Consequently, combining with the first set of constraints in LP (2), for any T ⊙ (i, S) we obtain that Σ_{e∈δ(T)} x∗∗e ≥ (Σ_{e∈δ(T)} f∗i,e)/(1 − Σ_{I:i∈I} z∗I) ≥ ri(T).

Let H be as in Lemma 6, and recall that unsat(H) denotes the set of require-
ments not satisfied by H. Clearly each requirement i ∈ unsat(H) is bad. The
following lemma bounds the total penalty we pay for unsat(H).

Lemma 7. π(unsat(H)) ≤ (1/α) · Σ_I π(I) z∗I.

Proof. Define γ ∈ [0, 1]^k as follows: γi = 1 if i ∈ unsat(H) and γi = 0 otherwise. Now consider the LP p(γ) defined in Lemma 4. Since each i ∈ unsat(H) is bad, from the definition of bad requirements, it is clear that {z∗I/α} is a feasible solution to the LP p(γ). Furthermore, from Lemma 4, the solution {zI} defined as zI = 1 if I = unsat(H) and zI = 0 otherwise is the optimum solution to p(γ). The cost of this solution, π(unsat(H)), is therefore at most the cost of the feasible solution {z∗I/α}, which is (1/α) · Σ_I π(I) z∗I. The lemma thus follows.

Combining Lemmas 6 and 7, we obtain a max{ρ/(1 − α), 1/α}-approximation. If we substitute α = 1/(ρ + 1), we obtain a (ρ + 1)-approximation for PC-GSN.
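In outline, the whole (ρ + 1)-approximation is a few lines. The sketch below is ours, under the paper's assumptions: `solve_gsn` stands in for the hypothesized LP-based ρ-approximation oracle for LP (1), and `x`, `z` encode the fractional solution {x∗e, z∗I} (with z sparse, as Lemma 2 permits).

```python
def round_pc_gsn(x, z, K, alpha, solve_gsn):
    """One rounding pass of this section (a sketch, with our own names).

    x     : dict edge -> x*_e            (fractional LP (2) solution)
    z     : dict frozenset -> z*_I       (sparse)
    K     : iterable of requirement indices
    alpha : threshold in (0, 1)
    solve_gsn(x_scaled, good) : assumed oracle returning an integral
        subgraph satisfying the requirements in `good`, of cost at most
        rho times the cost of x_scaled.

    Bad requirements may stay unsatisfied; Lemma 7 bounds their total
    penalty by (1/alpha) * sum_I pi(I) z*_I."""
    mass = {i: sum(zI for I, zI in z.items() if i in I) for i in K}
    good = [i for i in K if mass[i] <= alpha]                            # K_g
    x_scaled = {e: min(1.0, xe / (1.0 - alpha)) for e, xe in x.items()}  # x** of Lemma 6
    return solve_gsn(x_scaled, good)
```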
Improving the Approximation to (1 − e^{−1/ρ})^{−1}. We use a technique introduced by Goemans as follows. We pick α uniformly at random from the interval (0, β] where β = 1 − e^{−1/ρ}. From Lemmas 6 and 7, the expected cost of the solution is at most

  Eα[ρ/(1 − α)] · Σ_{e∈E} ce x∗e + Eα[π(unsat(H))].   (3)

To complete the proof of the (1/β)-approximation, we now argue that the above expectation is at most (1/β) · (Σ_{e∈E} ce x∗e + Σ_I π(I) z∗I).
Since Eα[ρ/(1 − α)] = 1/β, the first term in (3) is at most (1/β) · Σ_{e∈E} ce x∗e. Since unsat(H) ⊆ {i | Σ_{I:i∈I} z∗I ≥ α} and since π is monotone non-decreasing, the second term in (3) is at most Eα[π({i | Σ_{I:i∈I} z∗I ≥ α})]. Lemma 8 bounds this quantity as follows. The ideas used here are also presented in Sharma et al. [SSW07].

Lemma 8. We have

  Eα[π({i | Σ_{I:i∈I} z∗I ≥ α})] ≤ (1/β) · Σ_I π(I) z∗I.   (4)

Proof. Let γi = Σ_{I:i∈I} z∗I for all i ∈ K. Let us, without loss of generality, order the elements i ∈ K such that γ1 ≤ γ2 ≤ · · · ≤ γk. We also use the notation γ0 = 0. Note that {z∗I} forms a feasible solution to the primal LP p(γ) given in Lemma 4. Therefore, from Lemma 4, its objective value is at least that of the optimum solution:

  Σ_I π(I) z∗I ≥ Σ_{i=1}^{k} [(γi − γi−1) · π({i, . . . , k})].   (5)

We now observe that the LHS of (4) can be expressed as follows. Since α is picked uniformly at random from (0, β], we have that for all 1 ≤ i ≤ k, with probability at most (γi − γi−1)/β, the random variable α lies in the interval (γi−1, γi]. When this event happens, we get that {i′ | Σ_{I:i′∈I} z∗I ≥ α} = {i′ | γi′ ≥ α} = {i, . . . , k}. Thus the expectation in the LHS of (4) is at most

  Σ_{i=1}^{k} [((γi − γi−1)/β) · π({i, . . . , k})].   (6)

From expressions (5) and (6), the lemma follows.

Thus the proof of the (1 − e^{−1/ρ})^{−1}-approximation is complete. It is worth mentioning that so far in this section we obtained a solution with a bound on its expected cost. However, the choice of α can be simply derandomized by trying out all the breakpoints where a good demand pair becomes a bad one (plus 0 and β).
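In code, this derandomization is a plain enumeration over the breakpoints (names as in the previous sketch; `value_at(alpha)` is assumed to evaluate c(H) + π(unsat(H)) of the solution rounded at that α):

```python
from math import exp

def derandomized_alpha(z, K, rho, value_at):
    """Try every breakpoint where some requirement flips from good to bad,
    plus 0 and beta, and keep the alpha whose rounded solution is cheapest."""
    beta = 1.0 - exp(-1.0 / rho)
    mass = {i: sum(zI for I, zI in z.items() if i in I) for i in K}
    candidates = {0.0, beta} | {m for m in mass.values() if 0.0 < m <= beta}
    return min(sorted(candidates), key=value_at)
```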
4 Proof of Lemma 2
We next show that even if LP (2) has exponential number of variables and
constraints, the following lemma holds.
Lemma 9. Any basic feasible solution to LP (2) has a polynomial number of
non-zero variables.
Proof. Fix a basic feasible solution {x∗e, f∗i,e, z∗I} to (2). For i ∈ K, let

  γ′i = 1 − (min_{T⊙(i,S)} Σ_{e∈δ(T)} f∗i,e) / ri   and   γ′′i = 1 − max_e f∗i,e.
Now fix the values of the variables {xe, fi,e} to {x∗e, f∗i,e} and project the LP (2) onto the variables {zI} as follows:

  Σ_{e∈E} ce x∗e + min { Σ_{I⊆K} π(I) zI | Σ_{I⊆K} zI = 1, γ′i ≤ Σ_{I:i∈I} zI ≤ γ′′i ∀i ∈ K, zI ≥ 0 ∀I ⊆ K }.   (7)

Since {x∗e, f∗i,e, z∗I} is a basic feasible solution to (2), it cannot be written as a convex combination of two distinct feasible solutions to (2). Thus we get that {z∗I} cannot be written as a convex combination of two distinct feasible solutions to (7), and hence it forms a basic feasible solution to (7). Since there are 1 + 2|K| non-trivial constraints in (7), at most 1 + 2|K| variables zI can be non-zero in any basic feasible solution of (7). Thus the lemma follows.
We prove that LP (2) can be solved in polynomial time. Introduce variables γ ∈ [0, 1]^k and obtain the following program (the function p is as in Lemma 4):

  Minimize   Σ_{e∈E} ce xe + p(γ)
  Subject to Σ_{e∈δ(T)} fi,e ≥ (1 − γi) ri(T)   ∀i ∀T ⊙ (i, S)
             fi,e ≤ 1 − γi                      ∀i ∀e        (8)
             xe ≥ fi,e                          ∀i ∀e
             xe, fi,e, γi ∈ [0, 1]              ∀i ∀e
It is clear that solving (8) is enough to solve (2). Now note that this is a convex program since p is a convex function. To solve (8), we convert its objective function into a constraint Σ_{e∈E} ce xe + p(γ) ≤ opt, where opt is the target objective value, and thus reduce it to a feasibility problem. Now, to find a feasible solution using an ellipsoid algorithm, we need to exhibit a polynomial time separation oracle. The separation oracle for the first set of constraints can be reduced to a minimum u-v cut problem using standard techniques. The separation oracle for the remaining constraints is trivial.
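For edge-connectivity (S = ∅, so ri(T) = ri and the mixed cut is a plain edge cut), this standard reduction amounts to one minimum ui-vi cut per pair. A minimal sketch (ours, assuming the networkx library; the point to be separated is given by `f` and `gamma`):

```python
import networkx as nx

def separate_cut_constraints(G, pairs, r, f, gamma, eps=1e-9):
    """Separation for the first constraint family of (8) when S is empty.

    G      : undirected networkx graph
    pairs  : pairs[i] = (u_i, v_i)
    r      : r[i] = requirement of pair i
    f      : f[i] = dict edge -> current value of f_{i,e}
    gamma  : gamma[i] = current value of gamma_i

    Returns (i, cut_edges) exhibiting a violated constraint, or None."""
    for i, (u, v) in enumerate(pairs):
        H = nx.Graph()
        H.add_nodes_from(G.nodes)
        for e in G.edges:
            H.add_edge(*e, capacity=f[i].get(e, 0.0))
        cut_value, (side, _) = nx.minimum_cut(H, u, v)
        if cut_value < (1.0 - gamma[i]) * r[i] - eps:
            cut = [(a, b) for a, b in G.edges if (a in side) != (b in side)]
            return i, cut   # the inequality for this cut is violated
    return None
```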
The separation oracle for the objective function is as follows. Given a point (x, γ) = {xe, γi} that satisfies Σ_{e∈E} ce xe + p(γ) > opt, we compute a sub-gradient of the function Σ_{e∈E} ce xe + p(γ) w.r.t. the variables {xe, γi}. The sub-gradient of Σ_{e∈E} ce xe w.r.t. x is simply the cost vector c. The sub-gradient of p(γ) w.r.t. γ is computed using Lemma 5; denote it by y ∈ ℝ^k_+. From the definition of sub-gradient, we have that the sub-gradient (c, y) to the objective function at the point (x, γ) satisfies

  (Σ_{e∈E} ce x′e + p(γ′)) − (Σ_{e∈E} ce xe + p(γ)) ≥ (c, y) · ((x′, γ′) − (x, γ)).

Now fix any feasible solution (x∗ , γ ∗ ), i.e., the one that satisfies e∈E ce x∗e +
p(γ ∗ ) ≤ opt. Substituting (x , γ  ) = (x∗ , γ ∗ ) in the above equation we get,
) * ) *
 
0 = opt − opt > ce x∗e + p(γ ∗ ) − ce xe + p(γ)
e∈E e∈E
≥ (c, y) · (x∗ , γ ∗ ) − (c, y) · (x, γ).

Thus (c, y) defines a separating hyperplane between the point (x, γ) and any point (x∗, γ∗) that satisfies Σ_{e∈E} ce x∗e + p(γ∗) ≤ opt. Hence we have a polynomial time separation oracle for the objective function as well.
Thus we can solve (8) using the ellipsoid algorithm. The proof of Lemma 2 is
hence complete.

References
[ABHK09] Archer, A., Bateni, M., Hajiaghayi, M., Karloff, H.: A technique for im-
proving approximation algorithms for prize-collecting problems. In: Proc.
50th IEEE Symp. on Foundations of Computer Science, FOCS (2009)
[AKR95] Agrawal, A., Klein, P., Ravi, R.: When trees collide: an approximation
algorithm for the generalized Steiner problem on networks. SIAM J. Com-
put. 24(3), 440–456 (1995)
[Bal89] Balas, E.: The prize collecting traveling salesman problem. Networks 19(6),
621–636 (1989)
[BGRS10] Byrka, J., Grandoni, F., Rothvoss, T., Sanità, L.: An improved LP-based
approximation for Steiner tree. In: Proceedings of the 42nd Annual ACM
Symposium on Theory of Computing, STOC (2010)
[BGSLW93] Bienstock, D., Goemans, M., Simchi-Levi, D., Williamson, D.: A note on
the prize collecting traveling salesman problem. Math. Programming 59(3,
Ser. A), 413–420 (1993)
[BP89] Bern, M., Plassmann, P.: The Steiner problem with edge lengths 1 and 2.
Information Processing Letters 32, 171–176 (1989)
[CK09] Chuzhoy, J., Khanna, S.: An O(k3 log n)-approximation algorithm for
vertex-connectivity network design. In: Proceedings of the 50th Annual
IEEE Symposium on Foundations of Computer Science, FOCS (2009)
[CRW01] Chudak, F., Roughgarden, T., Williamson, D.: Approximate k-MSTs and
k-Steiner trees via the primal-dual method and Lagrangean relaxation. In:
Aardal, K., Gerards, B. (eds.) IPCO 2001. LNCS, vol. 2081, pp. 60–70.
Springer, Heidelberg (2001)
84 M. Hajiaghayi et al.

[CVV06] Cheriyan, J., Vempala, S., Vetta, A.: Network design via iterative rounding
of setpair relaxations. Combinatorica 26(3), 255–275 (2006)
[FJW01] Fleischer, L., Jain, K., Williamson, D.: An iterative rounding 2-
approximation algorithm for the element connectivity problem. In: Proc.
of the 42nd IEEE Symp. on Foundations of Computer Science (FOCS), pp.
339–347 (2001)
[Fuj05] Fujishige, S.: Submodular functions and optimization. Elsevier, Amster-
dam (2005)
[GKL+ 07] Gupta, A., Könemann, J., Leonardi, S., Ravi, R., Schäfer, G.: An efficient
cost-sharing mechanism for the prize-collecting steiner forest problem. In:
Proc. of the 18th ACM-SIAM Symposium on Discrete algorithms (SODA),
pp. 1153–1162 (2007)
[GW95] Goemans, M., Williamson, D.: A general approximation technique for con-
strained forest problems. SIAM J. Comput. 24(2), 296–317 (1995)
[HJ06] Hajiaghayi, M., Jain, K.: The prize-collecting generalized Steiner tree prob-
lem via a new approach of primal-dual schema. In: Proc. of the 17th ACM-
SIAM Symp. on Discrete Algorithms (SODA), pp. 631–640 (2006)
[HN10] Hajiaghayi, M., Nasri, A.: Prize-collecting Steiner networks via iterative
rounding. In: LATIN (to appear, 2010)
[Jai01] Jain, K.: A factor 2 approximation algorithm for the generalized Steiner
network problem. Combinatorica 21(1), 39–60 (2001)
[JMP00] Johnson, D., Minkoff, M., Phillips, S.: The prize collecting Steiner tree
problem: theory and practice. In: Proceedings of the Eleventh Annual
ACM-SIAM Symposium on Discrete Algorithms, pp. 760–769 (2000)
[JV01] Jain, K., Vazirani, V.: Approximation algorithms for metric facility loca-
tion and k-median problems using the primal-dual schema and Lagrangian
relaxation. J. ACM 48(2), 274–296 (2001)
[KN07] Kortsarz, G., Nutov, Z.: Approximating minimum cost connectivity prob-
lems. In: Gonzalez, T.F. (ed.) Approximation Algorithms and Metaheuris-
tics, ch. 58. CRC Press, Boca Raton (2007)
[KV02] Korte, B., Vygen, J.: Combinatorial Optimization: Theory and Algorithms.
Springer, Berlin (2002)
[NSW08] Nagarajan, C., Sharma, Y., Williamson, D.: Approximation algorithms for
prize-collecting network design problems with general connectivity require-
ments. In: Bampis, E., Skutella, M. (eds.) WAOA 2008. LNCS, vol. 5426,
pp. 174–187. Springer, Heidelberg (2009)
[Nut09] Nutov, Z.: Approximating minimum cost connectivity problems via un-
crossable bifamilies and spider-cover decompositions. In: Proc. of the 50th
IEEE Symposium on Foundations of Computer Science, FOCS (2009)
[RZ05] Robins, G., Zelikovsky, A.: Tighter bounds for graph Steiner tree approx-
imation. SIAM J. on Discrete Mathematics 19(1), 122–134 (2005)
[SCRS00] Salman, F., Cheriyan, J., Ravi, R., Subramanian, S.: Approximating the
single-sink link-installation problem in network design. SIAM J. on Opti-
mization 11(3), 595–610 (2000)
[SSW07] Sharma, Y., Swamy, C., Williamson, D.: Approximation algorithms for
prize collecting forest problems with submodular penalty functions. In:
Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms
(SODA), pp. 1275–1284 (2007)
On Lifting Integer Variables in Minimal
Inequalities

Amitabh Basu1,⋆, Manoel Campelo2,⋆⋆, Michele Conforti3, Gérard Cornuéjols1,4,⋆⋆⋆, and Giacomo Zambelli3

1 Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213
2 Departamento de Estatística e Matemática Aplicada, Universidade Federal do Ceará, Brazil
3 Dipartimento di Matematica Pura e Applicata, Università di Padova, Via Trieste 63, 35121 Padova, Italy
4 LIF, Faculté des Sciences de Luminy, Université de Marseille, France

Abstract. This paper contributes to the theory of cutting planes for


mixed integer linear programs (MILPs). Minimal valid inequalities are
well understood for a relaxation of an MILP in tableau form where all
the nonbasic variables are continuous. In this paper we study lifting
functions for the nonbasic integer variables starting from such minimal
valid inequalities. We characterize precisely when the lifted coefficient is
equal to the coefficient of the corresponding continuous variable in every
minimal lifting. The answer is a nonconvex region that can be obtained
as the union of convex polyhedra.

1 Introduction
There has been a renewed interest recently in the study of cutting planes for
general mixed integer linear programs (MILPs) that cut off a basic solution
of the linear programming relaxation. More precisely, consider a mixed integer
linear set in which the variables are partitioned into a basic set B and a nonbasic
set N , and K ⊆ B ∪ N indexes the integer variables:

  xi = fi − Σ_{j∈N} aij xj   for i ∈ B
  x ≥ 0                                 (1)
  xk ∈ Z   for k ∈ K.

Let X be the relaxation of (1) obtained by dropping the nonnegativity restriction


on all the basic variables xi , i ∈ B. The convex hull of X is the corner polyhedron
introduced by Gomory [11] (see also [12]). Note that, for any i ∈ B \ K, the
equation xi = fi − Σ_{j∈N} aij xj can be removed from the formulation of X
since it just defines variable xi . Therefore, throughout the paper, we will assume

⋆ Supported by a Mellon Fellowship and NSF grant CMMI0653419.
⋆⋆ Partially supported by CNPq Brazil.
⋆⋆⋆ Supported by NSF grant CMMI0653419, ONR grant N00014-09-1-0133 and ANR grant ANR06-BLAN-0375.

B ⊆ K, i.e. all basic variables are integer. Andersen, Louveaux, Weismantel and
Wolsey [1] studied the corner polyhedron when |B| = 2 and B = K, i.e. all
nonbasic variables are continuous. They give a complete characterization of the
corner polyhedron using intersection cuts (Balas [2]) arising from splits, triangles
and quadrilaterals. This very elegant result has been extended to |B| > 2 and
B = K by showing a correspondence between minimal valid inequalities and
maximal lattice-free convex sets [5], [7]. These results and their extensions [6],
[10] are best described in an infinite model, which we motivate next.
A classical family of cutting planes for (1) is that of Gomory mixed integer cuts. For a given row of the tableau, the Gomory mixed integer cut is of the form Σ_{j∈N\K} ψ(aij) xj + Σ_{j∈N∩K} π(aij) xj ≥ 1, where ψ and π are functions given by simple formulas. A nice feature of the Gomory mixed integer cut is that, for fixed fi, the same functions ψ, π are used for any possible choice of the aij in (1). It is well known that the Gomory mixed integer cuts are also valid for X. More generally, let aj be the vector with entries aij, i ∈ B; we are interested in pairs (ψ, π) of functions such that the inequality Σ_{j∈N\K} ψ(aj) xj + Σ_{j∈N∩K} π(aj) xj ≥ 1 is valid for X for any possible choice of the nonbasic coefficients aij. Since we
is valid for X for any possible choice of the nonbasic coefficients aij . Since we
are interested in nonredundant inequalities, we can assume that the function
(ψ, π) is pointwise minimal. While a general characterization of minimal valid
functions seems hopeless (see for example [4]), when N ∩ K = ∅ the minimal
valid functions ψ are well understood in terms of maximal lattice-free convex
sets, as already mentioned. Starting from such a minimal valid function ψ, an
interesting question is how to generate a function π such that (ψ, π) is valid and
minimal. Recent papers [8], [9] study when such a function π is unique. Here we
prove a theorem that generalizes and unifies results from these two papers.
In order to formalize the concept of valid function (ψ, π), we introduce the
following infinite model. In the setting below, we also allow further linear con-
straints on the basic variables. Let S be the set of integral points in some rational
polyhedron in Rn such that dim(S) = n (for example, S could be the set of non-
negative integer points). Let f ∈ Rn \ S. Consider the following semi-infinite
relaxation of (1), introduced in [10].
 
  x = f + Σ_{r∈Rn} r sr + Σ_{r∈Rn} r yr,   (2)
  x ∈ S,
  sr ∈ R+, ∀r ∈ Rn,
  yr ∈ Z+, ∀r ∈ Rn,
  s, y have finite support

where the nonbasic continuous variables have been renamed s and the nonbasic integer variables have been renamed y. Given two functions ψ, π : Rn → R, (ψ, π) is said to be valid for (2) if the inequality Σ_{r∈Rn} ψ(r) sr + Σ_{r∈Rn} π(r) yr ≥ 1 holds for every (x, s, y) satisfying (2). We also consider the semi-infinite model where we only have continuous nonbasic variables.

  x = f + Σ_{r∈Rn} r sr   (3)
  x ∈ S,
  sr ∈ R+, ∀r ∈ Rn,
  s has finite support.

A function ψ : Rn → R is said to be valid for (3) if the inequality Σ_{r∈Rn} ψ(r) sr ≥
1 holds for every (x, s) satisfying (3). Given a valid function ψ for (3), a function
π is a lifting of ψ if (ψ, π) is valid for (2). One is interested only in (pointwise)
minimal valid functions, since non-minimal ones are implied by some minimal
valid function. If ψ is a minimal valid function for (3) and π is a lifting of ψ such
that (ψ, π) is a minimal valid function for (2) then we say that π is a minimal
lifting of ψ.
While minimal valid functions for (3) have a simple characterization [6], min-
imal valid functions for (2) are not well understood. A general idea to derive
minimal valid functions for (2) is to start from some minimal valid function ψ
for (3), and construct a minimal lifting π of ψ. While there is no general tech-
nique to compute such minimal lifting π, it is known that there exists a region
Rψ , containing the origin in its interior, where ψ coincides with π for any mini-
mal lifting π. This latter fact was observed by Dey and Wolsey [9] for the case
of S = Z2 and by Conforti, Cornuéjols and Zambelli [8] for the general case.
Furthermore, it is remarked in [8] and [10] that, if π is a minimal lifting of ψ,
then π(r) = π(r′) for every r, r′ ∈ Rn such that r − r′ ∈ Zn ∩ lin(conv(S)).
Therefore the coefficients of any minimal lifting π are uniquely determined in
the region Rψ + (Zn ∩ lin(conv(S))). In particular, whenever translating Rψ by
integer vectors in lin(conv(S)) covers Rn , ψ has a unique minimal lifting. The
purpose of this paper is to give a precise description of the region Rψ .
To state our main result, we need to explain the characterization of minimal
valid functions for (3). We say that a convex set B ⊆ Rn is S-free if B does not
contain any point of S in its interior. A set B is a maximal S-free convex set if it
is an S-free convex set that is not properly contained in any S-free convex set.
It was proved in [6] that maximal S-free convex sets are polyhedra containing a
point of S in the relative interior of each facet.
Given an S-free polyhedron B ⊆ Rn containing f in its interior, B can be
uniquely written in the form

B = {x ∈ Rn : ai (x − f ) ≤ 1, i ∈ I},

where I is a finite set of indices and ai (x − f ) ≤ 1 is facet-defining for B for


every i ∈ I.
Let ψB : Rn → R be the function defined by

  ψB(r) = max_{i∈I} ai r,   ∀r ∈ Rn.

Note in particular that, since maximal S-free convex sets are polyhedra, the
above function is defined for all maximal S-free convex sets B.
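Since ψB is just a maximum of finitely many linear forms, it is trivial to evaluate once the facet normals are known. The following minimal sketch is our own illustration (not from the paper); the rows of the matrix A are assumed to hold the normals ai.

```python
import numpy as np

def psi_B(A, r):
    """Evaluate psi_B(r) = max_{i in I} a_i r, where row i of A is the
    facet normal a_i of B = {x : a_i (x - f) <= 1, i in I}."""
    A = np.asarray(A, dtype=float)
    return float(np.max(A @ np.asarray(r, dtype=float)))
```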
Theorem 1. [6] Let ψ be a minimal valid function for (3). Then the set

Bψ := {x ∈ Rn | ψ(x − f ) ≤ 1}

is a maximal S-free convex set containing f in its interior, and ψ = ψBψ .
Conversely, if B is a maximal S-free convex set containing f in its interior, then
ψB is a minimal valid function for (3).
We are now ready to state the main result of the paper. Given a minimal valid
function ψ for (3), by Theorem 1 Bψ is a maximal S-free convex set containing
f in its interior, thus it can be uniquely written as Bψ = {x ∈ Rn | ai (x − f ) ≤
1, i ∈ I}. For every r ∈ Rn , let I(r) = {i ∈ I | ψ(r) = ai r}. Given x ∈ S, let

R(x) := {r ∈ Rn | I(r) ⊇ I(x − f ) and I(x − f − r) ⊇ I(x − f )}.

We define

Rψ := ∪_{x ∈ S∩Bψ} R(x).

Theorem 2. Let ψ be a minimal valid function for (3). If π is a minimal lifting
of ψ, then π(r) = ψ(r) for every r ∈ Rψ .
Conversely, for every r̄ ∉ Rψ , there exists a lifting π of ψ such that π(r̄) < ψ(r̄).
Figure 1 illustrates the region Rψ for several examples. We conclude the introduction by presenting a different characterization of the regions R(x).
Proposition 1. Let ψ be a minimal valid function for (3), and let x ∈ S. Then
R(x) = {r ∈ Rn | ψ(r) + ψ(x − f − r) = ψ(x − f )}.

Proof. We can uniquely write Bψ = {x ∈ Rn | ai (x − f ) ≤ 1, i ∈ I}. Let h ∈
I(x − f ). Then

ψ(x − f ) = ah (x − f ) = ah r + ah (x − f − r) ≤ max_{i∈I} ai r + max_{i∈I} ai (x − f − r) = ψ(r) + ψ(x − f − r).

In the above expression, equality holds if and only if h ∈ I(r) and h ∈ I(x − f − r).
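Proposition 1 also gives a convenient numerical membership test for R(x). The sketch below is our own illustration, reusing the hypothetical psi_B helper above; it checks the identity of Proposition 1 up to a tolerance.

```python
import numpy as np

def in_R_of_x(A, f, x, r, tol=1e-9):
    """Membership test for R(x) via Proposition 1: r lies in R(x) iff
    psi(r) + psi(x - f - r) = psi(x - f)."""
    f, x, r = (np.asarray(v, dtype=float) for v in (f, x, r))
    return abs(psi_B(A, r) + psi_B(A, x - f - r) - psi_B(A, x - f)) <= tol
```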

2 Minimum Lifting Coefficient of a Single Variable


Given r∗ ∈ Rn , we consider the set of solutions to

x = f + Σ_{r∈Rn} r sr + r∗ yr∗ ,
x ∈ S,
s ≥ 0,        (4)
yr∗ ≥ 0, yr∗ ∈ Z,
s has finite support.

[Figure 1 appears here; its four panels are: (a) a maximal Z2 -free triangle with three integer points, (b) a wedge, (c) a maximal Z2 -free triangle with integer vertices, (d) a truncated wedge.]

Fig. 1. Regions R(x) for some maximal S-free convex sets in the plane. The thick dark
line indicates the boundary of Bψ . For a particular x, the dark gray regions denote
R(x). The jagged lines in a region indicate that it extends to infinity. For example, in
Figure 1(b), R(x1 ) is the strip between lines l1 and l. Figure 1(c) shows an example
where R(x) is full-dimensional for x2 , x4 , x6 , but is not full-dimensional for x1 , x3 , x5 .

Given a minimal valid function ψ for (3) and a scalar λ, we say that the inequality
Σ_{r∈Rn} ψ(r)sr + λyr∗ ≥ 1 is valid for (4) if it holds for every (x, s, yr∗ ) satisfying (4).
We denote by ψ∗ (r∗ ) the minimum value of λ for which Σ_{r∈Rn} ψ(r)sr + λyr∗ ≥ 1
is valid for (4).
We observe that, for any lifting π of ψ, we have

ψ ∗ (r∗ ) ≤ π(r∗ ).

Indeed, Σ_{r∈Rn} ψ(r)sr + π(r∗ )yr∗ ≥ 1 is valid for (4), since, for any (s̄, ȳr∗ )
satisfying (4), the vector (s̄, ȳ), defined by ȳr = 0 for all r ∈ Rn \ {r∗ }, satisfies (2).
Moreover, the following fact was shown in [8].
Lemma 1. If ψ is a minimal valid function for (3) and π is a minimal lifting
of ψ, then π ≤ ψ.
So we have the following relation for every minimal lifting π of ψ :

ψ ∗ (r) ≤ π(r) ≤ ψ(r) for all r ∈ Rn .

In general ψ ∗ is not a lifting of ψ, but if it is, then the above relation implies
that it is the unique minimal lifting of ψ.
Remark 1. For any r ∈ Rn such that ψ ∗ (r) = ψ(r), we have π(r) = ψ(r) for
every minimal lifting π of ψ. Conversely, if ψ ∗ (r∗ ) < ψ(r∗ ) for some r∗ ∈ Rn ,
then there exists some lifting π of ψ such that π(r∗ ) < ψ(r∗ ).

Proof. The first part follows from ψ∗ ≤ π ≤ ψ. For the second part, given
r∗ ∈ Rn such that ψ∗ (r∗ ) < ψ(r∗ ), we can define π by π(r∗ ) = ψ∗ (r∗ ) and
π(r) = ψ(r) for all r ∈ Rn , r ≠ r∗ . Since ψ is valid for (3), it follows from the
definition of ψ∗ (r∗ ) that π is a lifting of ψ.

By the above remark, in order to prove Theorem 2 we need to show that
Rψ = {r ∈ Rn | ψ(r) = ψ∗ (r)}. We will need the following results.
Theorem 3. [6] A full-dimensional convex set B is a maximal S-free convex
set if and only if it is a polyhedron such that B does not contain any point
of S in its interior and each facet of B contains a point of S in its relative
interior. Furthermore if B ∩ conv(S) has nonempty interior, lin(B) contains
rec(B ∩ conv(S)).

Remark 2. The proof of Theorem 3 in [6] implies the following. Given a maximal
S-free convex set B, there exists δ > 0 such that there is no point of S \ B at
distance less than δ from B.

Let r∗ ∈ Rn . Given a maximal S-free convex set B = {x ∈ Rn | ai (x − f ) ≤
1, i ∈ I}, for any λ ∈ R, we define the set B(λ) ⊂ Rn+1 as follows:

B(λ) = {(x, xn+1 ) ∈ Rn+1 | ai (x − f ) + (λ − ai r∗ )xn+1 ≤ 1, i ∈ I}.        (5)
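Continuing the illustration begun after the definition of ψB (again ours, with the rows of A holding the normals ai), the facet description of B(λ) in (5) is obtained by appending one coordinate per facet:

```python
import numpy as np

def B_lambda_facets(A, r_star, lam):
    """Row i of the returned matrix is the normal of the facet
    a_i (x - f) + (lam - a_i r*) x_{n+1} <= 1 of B(lambda) in R^{n+1}."""
    A = np.asarray(A, dtype=float)
    last = lam - A @ np.asarray(r_star, dtype=float)  # coefficients of x_{n+1}
    return np.hstack([A, last[:, None]])
```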
Theorem 4. [8] Let r∗ ∈ Rn . Given a maximal S-free convex set B, let ψ =
ψB . Given λ ∈ R, the inequality Σ_{r∈Rn} ψ(r)sr + λyr∗ ≥ 1 is valid for (4) if and
only if B(λ) is (S × Z+ )-free.
Remark 3. Let r∗ ∈ Rn , and let B be a maximal S-free convex set. For every λ
such that B(λ) is (S × Z+ )-free, B(λ) is maximal (S × Z+ )-free.
Proof. Since B is a maximal S-free convex set, by Theorem 3 each facet of B
contains a point x̄ of S in its relative interior. Therefore the corresponding facet
of B(λ) contains the point (x̄, 0) in its relative interior. If B(λ) is (S × Z+ )-free,
by Theorem 3 it is a maximal (S × Z+ )-free convex set.

3 Characterizing the Region Rψ


Next we state and prove a theorem that characterizes when ψ ∗ (r∗ ) = ψ(r∗ ). The
main result of this paper, Theorem 2, will then follow easily.
Theorem 5. Given a maximal S-free convex set B, let ψ = ψB . Given r∗ ∈ Rn ,
the following are equivalent:
(i) ψ∗ (r∗ ) = ψ(r∗ ).
(ii) There exists a point x̄ ∈ S such that (x̄, 1) ∈ B(ψ(r∗ )).

Proof. Let λ∗ = ψ(r∗ ). Note that the inequality Σ_{r∈Rn} ψ(r)sr + λ∗ yr∗ ≥ 1 is
valid for (4). Thus, it follows from Theorem 4 that B(λ∗ ) is (S × Z+ )-free.
We first show that (ii) implies (i). Assume there exists x̄ ∈ S such that
(x̄, 1) ∈ B(λ∗ ). Then, for every ε > 0, (x̄, 1) is in the interior of B(λ∗ − ε), because
ai (x̄ − f ) + (λ∗ − ε − ai r∗ ) ≤ 1 − ε < 1 for all i ∈ I. Theorem 4 then implies that
ψ∗ (r∗ ) = λ∗ .
Next we show that (i) implies (ii). Assume that ψ∗ (r∗ ) = ψ(r∗ ) = λ∗ . We
recall that λ∗ = max_{i∈I} ai r∗ .
Note that, if ai r∗ = λ∗ for all i ∈ I, then B(λ∗ ) = B × R, so given any point
x̄ in B ∩ S, (x̄, 1) is in B(λ∗ ). Thus we assume that there exists an index h such
that ah r∗ < λ∗ .
By Remark 3, B(λ∗ ) is maximal (S × Z+ )-free. Theorem 3 implies the following:
a) rec(B ∩ conv(S)) ⊆ lin(B),
b) rec(B(λ∗ ) ∩ conv(S × Z+ )) ⊆ lin(B(λ∗ )).
Lemma 2. rec(B(λ∗ ) ∩ conv(S × Z+ )) = rec(B ∩ conv(S)) × {0}.
Let (r̄, r̄n+1 ) ∈ rec(B(λ∗ ) ∩ conv(S × Z+ )). Note that rec(conv(S × Z+ )) = rec(conv(S)) × R+ ,
thus r̄ ∈ rec(conv(S)) and r̄n+1 ≥ 0. We only need to show that r̄n+1 = 0.

By b), (r̄, r̄n+1 ) satisfies

ai r̄ + (λ∗ − ai r∗ )r̄n+1 = 0,  i ∈ I.        (6)

Since λ∗ − ai r∗ ≥ 0 and r̄n+1 ≥ 0,

ai r̄ ≤ 0, i ∈ I,
therefore r̄ ∈ rec(B). Thus r̄ ∈ rec(B ∩ conv(S)) which, by a), is contained in
lin(B). This implies

ai r̄ = 0,  i ∈ I.
It follows from the above and from (6) that (λ∗ − ai r∗ )r̄n+1 = 0 for i ∈ I. Since
λ∗ − ah r∗ > 0 for some index h, it follows that r̄n+1 = 0. This concludes the
proof of Lemma 2.
Lemma 3. There exists ε̄ > 0 such that rec(B(λ∗ − ε) ∩ conv(S × Z+ )) =
rec(B ∩ conv(S)) × {0} for every ε ∈ [0, ε̄].
Since conv(S) is a rational polyhedron, conv(S) = {x ∈ Rn | Cx ≤ d} for some rational
matrix (C, d). By Lemma 2, there is no vector (r, 1) in rec(B(λ∗ ) ∩ conv(S × Z+ )).
Thus the system

ai r + (λ∗ − ai r∗ ) ≤ 0,  i ∈ I
Cr ≤ 0
is infeasible. By Farkas Lemma, there exist scalars μi ≥ 0, i ∈ I and a nonnega-
tive vector γ such that

Σ_{i∈I} μi ai + γC = 0,
λ∗ (Σ_{i∈I} μi ) − (Σ_{i∈I} μi ai ) r∗ > 0.

This implies that there exists some ε̄ > 0 such that for all ε ≤ ε̄,

Σ_{i∈I} μi ai + γC = 0,
(λ∗ − ε)(Σ_{i∈I} μi ) − (Σ_{i∈I} μi ai ) r∗ > 0,

thus the system

ai r + (λ∗ − ε − ai r∗ ) ≤ 0,  i ∈ I
Cr ≤ 0

is infeasible. This implies that rec(B(λ∗ − ε) ∩ conv(S × Z+ )) = rec(B ∩ conv(S)) × {0}.
Lemma 4. B(λ∗ ) contains a point (x̄, x̄n+1 ) ∈ S × Z+ such that x̄n+1 > 0.
By Lemma 3, there exists ε̄ such that, for every ε ∈ [0, ε̄], rec(B(λ∗ − ε) ∩ conv(S ×
Z+ )) = rec(B ∩ conv(S)) × {0}. This implies that there exists a scalar M such
that, for every ε ∈ [0, ε̄] and every point (x̄, x̄n+1 ) ∈ B(λ∗ − ε) ∩ (S × Z+ ), it follows
that x̄n+1 ≤ M .
Remark 2 and Remark 3 imply that there exists δ > 0 such that, for every
(x̄, x̄n+1 ) ∈ (S × Z+ ) \ B(λ∗ ), there exists h ∈ I such that ah (x̄ − f ) + (λ∗ −
ah r∗ )x̄n+1 ≥ 1 + δ. Choose ε > 0 such that ε ≤ ε̄ and εM ≤ δ.
Since ψ∗ (r∗ ) = λ∗ , by Theorem 4, B(λ∗ − ε) has a point (x̄, x̄n+1 ) ∈ S × Z+ in
its interior. Thus ai (x̄ − f ) + (λ∗ − ε − ai r∗ )x̄n+1 < 1, i ∈ I.
We show that (x̄, x̄n+1 ) is also in B(λ∗ ). Suppose not. Then, by our choice of δ,
there exists h ∈ I such that ah (x̄ − f ) + (λ∗ − ah r∗ )x̄n+1 ≥ 1 + δ. By our choice
of M and ε,

1 + δ ≤ ah (x̄ − f ) + (λ∗ − ah r∗ )x̄n+1 ≤ ah (x̄ − f ) + (λ∗ − ε − ah r∗ )x̄n+1 + εM < 1 + εM ≤ 1 + δ,

a contradiction.
Hence (x̄, x̄n+1 ) is in B(λ∗ ). Since B is S-free and B(λ∗ − ε) ∩ (Rn × {0}) = B × {0},
it follows that B(λ∗ − ε) does not contain any point of S × {0} in its interior.
Thus x̄n+1 > 0. This concludes the proof of Lemma 4.
By the previous lemma, B(λ∗ ) contains a point (x̄, x̄n+1 ) ∈ S × Z+ such that
x̄n+1 > 0. Note that B(λ∗ ) contains (x̄, 1), since

ai (x̄ − f ) + (λ∗ − ai r∗ ) ≤ ai (x̄ − f ) + (λ∗ − ai r∗ )x̄n+1 ≤ 1,  i ∈ I,

since λ∗ − ai r∗ ≥ 0, i ∈ I.

Corollary 1. Let ψ be a minimal valid function for (3). Then ψ ∗ (r∗ ) = ψ(r∗ )
if and only if there exists x̄ ∈ S such that

ψ(r∗ ) + ψ(x̄ − f − r∗ ) = 1. (7)

Proof. We first show that, if there exists x̄ ∈ S satisfying (7), then ψ∗ (r∗ ) =
ψ(r∗ ). Indeed, since Σ_{r∈Rn} ψ(r)sr + ψ∗ (r∗ )yr∗ ≥ 1 is valid for (4),

1 ≤ ψ∗ (r∗ ) + ψ(x̄ − f − r∗ ) ≤ ψ(r∗ ) + ψ(x̄ − f − r∗ ) = 1.

We show the converse. Since ψ is a valid function for (3), ψ(x̄ − f − r∗ ) + ψ(r∗ ) ≥ 1.
Since ψ is a minimal valid function for (3), by Theorem 1 there exists a maximal
S-free convex set B such that ψ = ψB . Let Bψ = {x ∈ Rn | ai (x − f ) ≤ 1, i ∈ I}.
Assume ψ∗ (r∗ ) = ψ(r∗ ). By Theorem 5, there exists a point x̄ ∈ S such that
(x̄, 1) ∈ B(ψ(r∗ )). Therefore

ai (x̄ − f ) + ψ(r∗ ) − ai r∗ ≤ 1,  i ∈ I.

Thus

max_{i∈I} ai (x̄ − f − r∗ ) ≤ 1 − ψ(r∗ ),

which implies ψ(x̄ − f − r∗ ) + ψ(r∗ ) ≤ 1. Hence ψ(x̄ − f − r∗ ) + ψ(r∗ ) = 1.

Proof (Proof of Theorem 2). By Remark 1, we only need to show that Rψ =
{r ∈ Rn | ψ(r) = ψ∗ (r)}. For every x ∈ S, we have ψ(x − f ) = 1 if and only if
{r ∈ Rn | ψ(r) = ψ ∗ (r)}. For every x ∈ S, we have ψ(x − f ) = 1 if and only if
x ∈ S ∩ Bψ . Therefore, by Proposition 1, R(x) = {r ∈ Rn | ψ(r) + ψ(x − f − r) =
ψ(x − f ) = 1} if and only if x ∈ S ∩ Bψ . The latter fact and Corollary 1 imply
that a vector r ∈ Rn satisfies ψ ∗ (r) = ψ(r) if and only if r ∈ R(x) for some
x ∈ S ∩ Bψ . The statement now follows from the definition of Rψ .
4 Conclusion
In this paper we give an exact characterization of the region where a minimal
valid inequality ψ and any minimal lifting π of ψ coincide. This was exhibited in
Theorem 2, which generalizes results from [8] and [9] about liftings of minimal
valid inequalities.
As already mentioned in the introduction, the following theorem was proved
in [8].
Theorem 6. Let ψ be a minimal valid function for (3). If Rψ + (Zn ∩ lin(conv(S)))
covers all of Rn , then there exists a unique minimal lifting π of ψ.
We conjecture that the converse also holds.
Conjecture 7. Let ψ be a minimal valid function for (3). There exists a unique
minimal lifting π of ψ if and only if Rψ + (Zn ∩ lin(conv(S))) covers all of Rn .

Acknowledgements
The authors would like to thank Marco Molinaro for helpful discussions about
the results presented in this paper.

References
1. Andersen, K., Louveaux, Q., Weismantel, R., Wolsey, L.A.: Cutting Planes from
Two Rows of a Simplex Tableau. In: Fischetti, M., Williamson, D.P. (eds.) IPCO
2007. LNCS, vol. 4513, pp. 1–15. Springer, Heidelberg (2007)
2. Balas, E.: Intersection Cuts - A New Type of Cutting Planes for Integer Program-
ming. Operations Research 19, 19–39 (1971)
3. Barvinok, A.: A Course in Convexity. In: Graduate Studies in Mathematics, vol. 54.
American Mathematical Society, Providence (2002)
4. Basu, A., Conforti, M., Cornuejols, G., Zambelli, G.: A Counterexample to a Con-
jecture of Gomory and Johnson. Mathematical Programming Ser. A (to appear
2010)
5. Basu, A., Conforti, M., Cornuejols, G., Zambelli, G.: Maximal Lattice-free Convex
Sets in Linear Subspaces (2009) (manuscript)
6. Basu, A., Conforti, M., Cornuejols, G., Zambelli, G.: Minimal Inequalities for an
Infinite Relaxation of Integer Programs. SIAM Journal of Discrete Mathematics
(to appear 2010)
7. Borozan, V., Cornuéjols, G.: Minimal Valid Inequalities for Integer Constraints.
Mathematics of Operations Research 34, 538–546 (2009)
8. Conforti, M., Cornuejols, G., Zambelli, G.: A Geometric Perspective on Lifting
(May 2009) (manuscript)
9. Dey, S.S., Wolsey, L.A.: Lifting Integer Variables in Minimal Inequalities corre-
sponding to Lattice-Free Triangles. In: Lodi, A., Panconesi, A., Rinaldi, G. (eds.)
IPCO 2008. LNCS, vol. 5035, pp. 463–475. Springer, Heidelberg (2008)
10. Dey, S.S., Wolsey, L.A.: Constrained Infinite Group Relaxations of MIPs (March
2009) (manuscript)
11. Gomory, R.E.: Some Polyhedra related to Combinatorial Problems. Linear Algebra
and its Applications 2, 451–558 (1969)
12. Gomory, R.E., Johnson, E.L.: Some Continuous Functions Related to Corner Poly-
hedra, Part I. Mathematical Programming 3, 23–85 (1972)
13. Johnson, E.L.: On the Group Problem for Mixed Integer Programming. In: Math-
ematical Programming Study, pp. 137–179 (1974)
14. Schrijver, A.: Theory of Linear and Integer Programming. John Wiley & Sons,
New York (1986)
15. Meyer, R.R.: On the Existence of Optimal Solutions to Integer and Mixed-Integer
Programming Problems. Mathematical Programming 7, 223–235 (1974)
Efficient Edge Splitting-Off Algorithms
Maintaining All-Pairs Edge-Connectivities

Lap Chi Lau and Chun Kong Yung

Department of Computer Science and Engineering


The Chinese University of Hong Kong

Abstract. In this paper we present new edge splitting-off results maintaining
all-pairs edge-connectivities of a graph. We first give an alternate
proof of Mader's theorem, and use it to obtain a deterministic
Õ(rmax^2 · n^2 )-time complete edge splitting-off algorithm for unweighted
graphs, where rmax denotes the maximum edge-connectivity requirement.
This improves upon the best known algorithm by Gabow by a factor of
Ω̃(n). We then prove a new structural property, and use it to further
speed up the algorithm and obtain a randomized Õ(m + rmax^3 · n)-time
algorithm. These edge splitting-off algorithms can be used directly to
speed up various graph algorithms.

1 Introduction
The edge splitting-off operation plays an important role in many basic graph
problems, both in proving theorems and obtaining efficient algorithms. Splitting-
off a pair of edges (xu, xv) means deleting these two edges and adding a new
edge uv if u ≠ v. This operation was introduced by Lovász [18], who showed that
splitting-off can be performed so as to maintain the global edge-connectivity of a graph.
Mader extended Lovász’s result significantly to prove that splitting-off can be
performed to maintain the local edge-connectivity for all pairs:
Theorem 1 (Mader [19]). Let G = (V, E) be an undirected graph that has at
least r(s, t) edge-disjoint paths between s and t for all s, t ∈ V − x. If there is
no cut edge incident to x and d(x) ≠ 3, then some edge pair (xu, xv) can be
split-off so that in the resulting graph there are still at least r(s, t) edge-disjoint
paths between s and t for all s, t ∈ V − x.
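To fix notation for what follows, here is a minimal sketch (ours, not the authors' code) of one splitting-off operation on an unweighted multigraph, where parallel edges encode edge multiplicities:

```python
import networkx as nx  # multigraph container; any multigraph type would do

def split_off(G, x, u, v):
    """Split off the edge pair (xu, xv): delete one copy each of xu and xv,
    and add a new edge uv if u != v."""
    G.remove_edge(x, u)
    G.remove_edge(x, v)
    if u != v:
        G.add_edge(u, v)
```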
These splitting-off theorems have applications in various graph problems. Lovász
[18] and Mader [19] used their splitting-off theorems to derive Nash-Williams’ graph
orientation theorems [23]. Subsequently these theorems and their extensions have
found applications in a number of problems, including edge-connectivity augmen-
tation problems [4, 8, 9], network design problems [7, 13, 16], tree packing problems
[1, 6, 17], and graph orientation problems [11].
Efficient splitting-off algorithms have been developed to give fast algorithms
for the above problems [4, 6, 12, 20, 22]. However, most of the efficient algorithms
are developed only in the global edge-connectivity setting, although there are
important applications in the more general local edge-connectivity setting.

In this paper we present new edge splitting-off results maintaining all-pairs


edge-connectivities. First we give an alternate proof of Mader’s theorem (The-
orem 1). Based on this, we develop a faster deterministic algorithm for edge
splitting-off maintaining all-pairs edge-connectivities (Theorem 2). Then we prove
a new structural property (Theorem 3), and use it to design a randomized
procedure to further speed up the splitting-off algorithm (Theorem 2). These
algorithms improve the best known algorithm by a factor of Ω̃(n), and can be applied
directly to speed up various graph algorithms using edge splitting-off.

1.1 Efficient Complete Edge Splitting-Off Algorithm

Mader’s theorem can be applied repeatedly until d(x) = 0 when d(x) is even and
there is no cut edge incident to x. This is called a complete edge splitting-off at
x, which is a key subroutine in algorithms for connectivity augmentation, graph
orientation, and tree packing.
A straightforward algorithm to compute a complete splitting-off sequence is
to split-off (xu, xv) for every pair u, v ∈ N (x) where N (x) is the neighbor set
of x, and then check whether the connectivity requirements are violated by
computing all-pairs edge-connectivities in the resulting graph, and repeat this
procedure until d(x) = 0.
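A sketch of this straightforward algorithm (ours), assuming the split_off routine above and a hypothetical checker requirements_satisfied(G, r) that verifies λ(s, t) ≥ r(s, t) for all demand pairs; one concrete realization of such a checker is sketched after Theorem 9 in Section 2.1.

```python
import itertools

def naive_complete_split_off(G, x, r, requirements_satisfied):
    """Try every pair of distinct x-neighbors, split off one copy of
    (xu, xv), undo the attempt if some requirement becomes violated,
    and repeat until d(x) = 0 or no admissible pair remains."""
    while G.degree(x) > 0:
        progress = False
        for u, v in itertools.combinations(list(G.neighbors(x)), 2):
            split_off(G, x, u, v)
            if requirements_satisfied(G, r):
                progress = True
                break
            # the pair was not admissible: undo the attempt
            G.add_edge(x, u); G.add_edge(x, v)
            G.remove_edge(u, v)
        if not progress:
            break  # d(x) = 3, or no admissible pair exists
```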
Several efficient algorithms have been proposed for the complete splitting-off
problem, but only Gabow's algorithm [12] can be used in the local edge-connectivity
setting, with running time O(rmax^2 · n^3 ). Our algorithms improve the running
time of Gabow's algorithm by a factor of Ω̃(n). In applications where rmax is
small, the improvement of the randomized algorithm can be a factor of Ω̃(n^2 ).
Theorem 2. In the local edge-connectivity setting, there is a deterministic
Õ(rmax^2 · n^2 )-time algorithm and a randomized Õ(m + rmax^3 · n)-time algorithm
for the complete edge splitting-off problem in unweighted graphs.
These edge splitting-off algorithms can be used directly to improve the running
time of various graph algorithms [7, 9, 12, 13, 17, 23]. For instance, using
Theorem 2 in Gabow's local edge-connectivity augmentation algorithm [12]
for unweighted graphs, the running time improves from Õ(rmax^2 n^3 ) to
Õ(rmax^2 n^2 ). Similarly, using Theorem 2 in Gabow's orientation algorithm
[12], one can find a well-balanced orientation of an unweighted graph in Õ(rmax^3 n^2 )
expected time, improving the O(rmax^2 n^3 ) result by Gabow [12]. We will not
discuss the details of these applications in this paper.
Our edge splitting-off algorithms are conceptually very simple, which can be
seen as refinements of the straightforward algorithm. The improvements come
from some new structural results, and a recent fast Gomory-Hu tree construc-
tion algorithm by Bhalgat, Hariharan, Kavitha, and Panigrahi [5]. First, in
Section 3.2, we show how to find a complete edge splitting-off sequence by using
at most O(|N (x)|) splitting-off attempts, instead of O(|N (x)|^2 ) attempts by the
straightforward algorithm. This is based on an alternative proof of Mader’s the-
orem in Section 3.1. Then, in Section 3.4, we show how to reduce the problem of
checking local edge-connectivities for all pairs, to the problem of checking local
edge-connectivities from a particular vertex (i.e., checking at most O(n) pairs
instead of O(n^2 ) pairs). This allows us to use the recent fast Gomory-
Hu tree algorithm [5] to check connectivities efficiently. Finally, using a new
structural property (Theorem 3), we show how to speed up the algorithm by a
randomized edge splitting-off procedure in Section 4.

1.2 Structural Property and Randomized Algorithm


Mader’s theorem shows the existence of one admissible edge pair, whose splitting-
off maintains the local edge-connectivity requirements of the graph. Given an
edge xv, we say an edge xw is a non-admissible partner of xv if (xv, xw) is not
admissible. We prove a tight upper bound on the number of non-admissible part-
ners of a given edge xv, which may be of independent interest. In the following,
rmax := max_{s,t∈V −x} r(s, t) is the maximum edge-connectivity requirement.
Theorem 3. Suppose there is no cut edge incident to x and rmax ≥ 2. Then the
number of non-admissible partners for any given edge xv is at most 2rmax − 2.
This improves the result of Bang-Jensen and Jordán [2] by a factor of rmax , and
the bound is best possible as there are examples achieving it. Theorem 3 implies
that when d(x) is considerably larger than rmax , most of the edge pairs incident
to x are admissible. Therefore, we can split-off edge pairs randomly to speed up
our efficient splitting-off algorithm. The proof of Theorem 3 is based on a new
inductive argument and will be presented in Section 4.

2 Preliminaries
Let G = (V, E) be a graph. For X, Y ⊆ V , denote by δ(X, Y ) the set of edges
with one endpoint in X − Y and the other endpoint in Y − X, and d(X, Y ) :=
|δ(X, Y )|; also define d̄(X, Y ) := d(X ∩ Y, V − (X ∪ Y )). For X ⊆ V , define
δ(X) := δ(X, V − X) and the degree of X as d(X) := |δ(X)|. Denote the degree
of a vertex as d(v) := d({v}). Also denote the set of neighbors of v by N (v), and
call a vertex in N (v) a v-neighbor.
Let λ(s, t) be the maximum number of edge-disjoint paths between s and t in
V , and let r(s, t) be an edge-connectivity requirement for s, t ∈ V . The connec-
tivity requirement is global if r(s, t) = k for all s, t ∈ V , otherwise it is local. We
say a graph G satisfies the connectivity requirements if λ(s, t) ≥ r(s, t) for any
s, t ∈ V . The requirement r(X) of a set X ⊆ V is the maximum edge-connectivity
requirement between u and v with u ∈ X and v ∈ V − X. By Menger’s theorem,
to satisfy the requirements, it suffices to guarantee that d(X) ≥ r(X) for all
X ⊂ V . The surplus s(X) of a set X ⊆ V is defined as d(X) − r(X). A graph
satisfies the edge-connectivity requirements if s(X) ≥ 0 for all ∅ ≠ X ⊂ V . For
X ⊂ V − x, X is called dangerous if s(X) ≤ 1 and tight if s(X) = 0. The
following proposition will be used throughout our proofs.
Proposition 4 ([10] Proposition 2.3). For X, Y ⊆ V at least one of the
following inequalities holds:
s(X) + s(Y ) ≥ s(X ∩ Y ) + s(X ∪ Y ) + 2d(X, Y )        (4a)
s(X) + s(Y ) ≥ s(X − Y ) + s(Y − X) + 2d̄(X, Y )        (4b)
In edge splitting-off problems, the objective is to split-off a pair of edges incident
to a designated vertex x to maintain the edge-connectivity requirements for all
other pairs in V −x. For this purpose, we may assume that the edge-connectivity
requirements between x and other vertices are zero. In particular, we may assume
that r(V − x) = 0 and thus the set V − x is not a dangerous set. Two edges
xu, xv form an admissible pair if the graph after splitting-off (xu, xv) does not
violate s(X) ≥ 0 for all X ⊂ V . Given an edge xv, we say an edge xw is a
non-admissible partner of xv if (xv, xw) is not admissible. The following simple
proposition characterizes when a pair is admissible.
Proposition 5 ([10] Claim 3.1). A pair xu, xv is not admissible if and only
if u, v are contained in a dangerous set.
A vertex subset S ⊆ N (x) is called a non-admissible set if (xu, xv) is non-
admissible for every u, v ∈ S. We define the capacity of an edge pair to be the
number of copies of the edge pair that can be split-off while satisfying edge-
connectivity requirements. In our algorithms we will always split-off an edge
pair to its capacity (which could be zero), and attempt at most O(|N (x)|)
pairs in total. Following the definition of Gabow [12], we say that a splitting-off
operation voids a vertex u if d(x, u) = 0 after the splitting-off.
Throughout the complete splitting-off algorithm, we assume that there is no
cut edge incident to x. This holds at the beginning by our assumption, and so
the local edge-connectivity between x and v is at least two for each x-neighbor
v. Therefore, we can reset the connectivity requirement between u and v as
max{r(u, v), 2}, and hence splitting-off any admissible pair would maintain the
property that there is no cut edge incident to x at each step.

2.1 Some Useful Results


The first lemma is about a reduction step of contracting tight sets. Suppose
there is a non-trivial tight set T , i.e. T is a tight set and |T | ≥ 2. Clearly there
are no admissible pairs xu, xv with u, v ∈ T . Let G/T be the graph obtained
by contracting T into a single vertex t, and define the connectivity requirement
r(t, v) as maxu∈T r(u, v), while other connectivity requirements remain the same.
The following lemma says that one can consider the admissible pairs in G/T ,
without losing any information about the admissible pairs in G. This lemma is
useful in proofs to assume that every tight set is a singleton, and is useful in
algorithms to allow us to make progress by contracting non-trivial tight sets.
Lemma 6 ([19], [10] Claim 3.2). Let T be a non-trivial tight set. For an x-
neighbor w in G/T , let w′ be the corresponding vertex in G if w ≠ t, and let w′
be any x-neighbor in T in G if w = t. If (xu, xv) is an admissible pair in
G/T , then (xu′ , xv′ ) is an admissible pair in G.
The next lemma proved in [7] shows that if the conditions in Mader’s theorem are
satisfied, then there is no “3-dangerous-set structure”. This lemma is important
in the efficient edge splitting-off algorithm.
Lemma 7 ([7] Lemma 2.7). If d(x) ≠ 3 and there is no cut edge incident to
x, then there are no maximal dangerous sets X, Y, Z and u, v, w ∈ N (x) with
u ∈ X ∩ Y , v ∈ X ∩ Z, w ∈ Y ∩ Z and u, v, w ∉ X ∩ Y ∩ Z.
Nagamochi and Ibaraki [21] gave a fast algorithm to find a sparse subgraph that
satisfies edge-connectivity requirements, which will be used in Section 3.3 as a
preprocessing step.
Theorem 8 ([21] Lemma 2.1). There is an O(m)-time algorithm to construct
a subgraph with O(rmax · n) edges that satisfies all the connectivity requirements.
As a key tool in checking local edge-connectivities, we need to construct a
Gomory-Hu tree, which is a compact representation of all pairwise min-cuts
of an undirected graph. Let G = (V, E) be an undirected graph, a Gomory-Hu
tree is a weighted tree T = (V, F ) with the following property. Consider any
s, t ∈ V , the unique s-t path P in T , an edge e = uv on P with minimum
weight, and any component K of T − e. Then the local edge-connectivity be-
tween s and t in G is equal to the weight of e in T , and δ(K) is a minimum s-t
cut in G. To check whether the connectivity requirements are satisfied, we only
need to check the pairs with λ(u, v) ≤ rmax . A partial Gomory-Hu tree Tk of G is
obtained from a Gomory-Hu tree T of G by contracting all edges with weight at
least k. Therefore, each node in Tk represents a subset of vertices S in G, where
the local edge-connectivity between each pair of vertices in S is at least k. For
vertices u, v ∈ G in different nodes of Tk , their local edge-connectivity (which
is less than k) is determined in the same way as in an ordinary Gomory-Hu
tree. Bhalgat et al. [5] gave a fast randomized algorithm to construct a partial
Gomory-Hu tree. We will use the following theorem by setting k = rmax . The
following result can be obtained by using the algorithm in [15], with the fast tree
packing algorithm in [5].
Theorem 9 ([5, 15]). A partial Gomory-Hu tree Tk can be constructed in
Õ(km) expected time.
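For intuition, here is one way (our sketch) to realize the connectivity checker used in the algorithms above. networkx's gomory_hu_tree works on simple capacitated graphs, so parallel edges are first collapsed into capacities; the paper instead builds the partial tree Tk with k = rmax via the much faster algorithm of [5], but the tree is queried in the same way.

```python
import networkx as nx

def requirements_satisfied(G, r):
    """Check lambda(s, t) >= r(s, t) for every demand pair (s, t) in the
    dict r, using the fact that lambda(s, t) equals the minimum edge
    weight on the unique s-t path of a Gomory-Hu tree."""
    H = nx.Graph()
    H.add_nodes_from(G)
    for u, v in G.edges():  # collapse parallel edges into capacities
        if H.has_edge(u, v):
            H[u][v]["capacity"] += 1
        else:
            H.add_edge(u, v, capacity=1)
    T = nx.gomory_hu_tree(H)
    for (s, t), req in r.items():
        path = nx.shortest_path(T, s, t)
        lam = min(T[a][b]["weight"] for a, b in zip(path, path[1:]))
        if lam < req:
            return False
    return True
```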

3 Efficient Complete Edge Splitting-Off Algorithm


In this section we present the deterministic splitting-off algorithm as stated
in Theorem 2. First we present an alternative proof of Mader’s theorem in
Section 3.1. Extending the ideas in the alternative proof we show how to find
a complete edge splitting-off sequence by only O(|N (x)|) edge splitting-off at-
tempts in Section 3.2. Then, in Section 3.3, we show how to efficiently perform
one edge splitting-off attempt, by doing some preprocessing and applying some
fast algorithms to check edge-connectivities. Combining these two steps yields
an Õ(rmax^2 · n^2 ) randomized algorithm for the complete splitting-off problem.
Finally, in Section 3.5, we describe how to modify some steps in Section 3.3 to
obtain an Õ(rmax^2 · n^2 ) deterministic algorithm for the problem.
3.1 Mader’s Theorem


We present an alternative proof of Mader’s theorem, which can be extended to
obtain an efficient algorithm. The following lemma about non-admissible sets
can be used directly to derive Mader’s theorem.
Lemma 10. Suppose there is no 3-dangerous set structure. Then, for any non-
admissible set U ⊆ N (x) with |U | ≥ 2, there is a dangerous set containing U .

Proof. We prove the lemma by a simple induction. The statement holds trivially
for |U | = 2 by Proposition 5. Consider U = {u1 , u2 , . . . , uk+1 } ⊆ N (x) where
every pair (ui , uj ) is non-admissible. By induction, there are maximal dangerous
sets X, Y such that {u1 , ..., uk−1 , uk } ⊆ X and {u1 , ..., uk−1 , uk+1 } ⊆ Y . Since
(uk , uk+1 ) is non-admissible, by Proposition 5 there is a dangerous set Z containing
uk and uk+1 . If uk+1 ∉ X, uk ∉ Y , and some ui ∉ Z, then X, Y and Z would
form a 3-dangerous-set structure with u = ui , v = uk , w = uk+1 . Hence X, Y or
Z contains U .

To prove Mader’s theorem, consider a vertex x ∈ V with d(x) is even and


there is no cut edge incident to it. By Lemma 7, there is no 3-dangerous set
structure in G. Suppose that there is no admissible pair incident to x. Then, by
Lemma 10, there is a dangerous set D containing all the vertices in N (x). But
this is impossible since r(V −D−x) = r(D) ≥ d(D)−1 = d(V −D−x)+d(x)−1 ≥
d(V − D − x) + 1, contradicting that the connectivity requirements are satisfied
in G. This completes the proof.

3.2 An Upper Bound on Splitting-Off Attempts


Extending the ideas in the proof of Lemma 10, we present an algorithm to
find a complete splitting-off sequence by making at most O(|N (x)|) splitting-off
attempts (to split-off to capacity). In the algorithm we maintain a non-admissible
set C; initially C = ∅. The algorithm will apply one of the following three
operations guaranteed by the following lemma. Here we assume that {u} is a
non-admissible set for every u ∈ N (x). This can be achieved by a preprocessing
step that splits off every pair (xu, xu) to capacity.
Lemma 11. Suppose that C is a non-admissible set and there is a vertex u ∈
N (x) − C. Then, using at most three splitting-off attempts, at least one of the
following operations can be applied:
1. Splitting-off an edge pair to capacity that voids an x-neighbor.
2. Deducing that every pair in C ∪ {u} is non-admissible, and adding u to C.
3. Contracting a tight set T containing at least two x-neighbors.

Proof. We consider three cases based on the size of C. When |C| = 0, we simply
assign C = {u}. When |C| = 1, pick the vertex v ∈ C, and split-off (u, v) to
capacity. Either case (1) applies when either u or v becomes void, or case (2)
applies in the resulting graph after (u, v) is split-off to capacity. Hence, when
|C| ≤ 1, either case (1) or case (2) applies after only one splitting-off attempt.
The interesting case is when |C| ≥ 2; let v1 , v2 ∈ C. Since C is a non-admissible
set, by Lemma 10 there is a maximal dangerous set D containing C.
First, we split-off (u, v1 ) and (u, v2 ) to capacity. If case (1) applies then we are done,
so we assume that none of the three x-neighbors becomes void, implying that (u, v1 )
and (u, v2 ) are non-admissible in the resulting graph G′ after splitting-off these edge
pairs to capacity. Note that the edge pair (v1 , v2 ) is also non-admissible, since a
non-admissible edge pair in G remains non-admissible in G′ . By Lemma 10, there exists
a maximal dangerous set D′ covering the non-admissible set {u, v1 , v2 }. Then
inequality (4b) cannot hold for D and D′ , since 1 + 1 ≥ s(D) + s(D′ ) ≥ s(D − D′ ) +
s(D′ − D) + 2d̄(D, D′ ) ≥ 0 + 0 + 2d(x, {v1 , v2 }) ≥ 2 · 2. Therefore inequality (4a)
must hold for D and D′ , hence 1 + 1 ≥ s(D) + s(D′ ) ≥ s(D ∩ D′ ) + s(D ∪ D′ ).
This implies that either D ∪ D′ is a dangerous set, for which case (2) applies
(since C ∪ {u} is then contained in a dangerous set and hence every pair is
non-admissible by Proposition 5), or D ∩ D′ is a tight set, for which case (3)
applies since v1 and v2 are x-neighbors. Note that v1 , v2 are contained in a
tight set if and only if after splitting-off one copy of (xv1 , xv2 ) the connectivity
requirement of some pair is violated by two. Hence this can be checked by one
splitting-off attempt, and thus we can distinguish between case (2) and case (3),
and in case (3) we can find such a tight set efficiently. Therefore, by making
at most three splitting-off attempts ((xu, xv1 ), (xu, xv2 ), (xv1 , xv2 )), one of the
three operations can be applied.

The following result can be obtained by applying Lemma 11 repeatedly.
Lemma 12. The algorithm computes a complete edge splitting-off sequence using
at most O(|N (x)|) splitting-off attempts.
Proof. The algorithm maintains the property that C is a non-admissible set,
which holds at the beginning when C = ∅. It is clear that in case (2) the set
C remains non-admissible. In case (1), by splitting-off an admissible pair, every
pair of vertices in C remains non-admissible. Also, in case (3), by contracting a
tight set, every pair of vertices in C remains non-admissible by Lemma 6.
The algorithm terminates when there is no vertex in N (x) − C. At that time,
if C = ∅, then we have found a complete splitting-off sequence; if C ≠ ∅, then by
Mader's theorem (or by the proof in Section 3.1), this only happens if d(x) = 3
at that point, i.e., if d(x) was odd at the beginning. In either case, the longest
splitting-off sequence has been found and the given complete edge splitting-off problem is solved.
It remains to prove that the total number of splitting-off attempts in the whole
algorithm is at most O(|N (x)|). To see this, we claim that each of the operations
in Lemma 11 will be performed at most |N (x)| times. Indeed, cases (1) and (3)
will be applied at most |N (x)| times since each application reduces the number
of x-neighbors by at least one, and case (2) will be applied at most |N (x)| times
since each application reduces the number of x-neighbors in N (x)−C by one.  

3.3 Algorithm Outline


The following is an outline of the whole algorithm for the complete splitting-off
problem. First we use the O(m) time algorithm in Theorem 8 to construct a
subgraph of G with O(rmax · n) edges satisfying the connectivity requirements.


To find a complete splitting-off sequence, we can thus restrict our attention to
maintaining the local edge-connectivities in this subgraph.
In the next preprocessing step, we will reduce the problem further to an
instance with a particular indicator vertex t ≠ x, which has the property
that for any pair of vertices u, v ∈ V − x with λ(u, v) ≤ rmax , it holds that
λ(u, v) = min{λ(u, t), λ(v, t)}. With this indicator vertex, to check the local
edge-connectivity for all pairs with λ(u, v) ≤ rmax , we only need to check the
local edge-connectivities from t to every vertex v with λ(v, t) ≤ rmax . This allows
us to make only O(n) queries (instead of O(n^2 ) queries) to check the local edge-
connectivities. This reduction step can be done by computing a partial Gomory-
Hu tree and contracting appropriate tight sets; see the details in Section 3.4.
The total preprocessing time is at most Õ(m + rmax^2 · n), by using the fast
Gomory-Hu tree algorithm in Theorem 9.
After these two preprocessing steps, we can perform a splitting-off attempt
(split-off a pair to capacity) efficiently. For a vertex pair (u, v), we replace
min{d(x, u), d(x, v)} copies of xu and xv by copies of uv, and then determine
the maximum violation of connectivity requirements by constructing a partial
Gomory-Hu tree and checking the local edge-connectivities from the indicator
vertex t to every other vertex. If q is the maximum violation of the connectivity
requirements, then exactly min{d(x, u), d(x, v)} − ⌈q/2⌉ copies of (xu, xv) are
admissible. Therefore, using Theorem 9, one splitting-off attempt can be performed
in Õ(rmax · m + n) = Õ(rmax^2 · n) expected time. By Lemma 12, the complete
splitting-off problem can be solved with at most O(|N (x)|) = O(n) splitting-off
attempts. Hence we obtain the following result.
Theorem 13. The complete edge splitting-off problem can be solved in Õ(rmax^2 ·
|N (x)| · n) = Õ(rmax^2 · n^2 ) expected time.
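A sketch (ours) of one such splitting-off attempt, with split_off as before and a hypothetical max_violation(G, r) returning the largest amount q by which any requirement is violated (computable via the indicator-vertex checks of Section 3.4); u ≠ v is assumed for simplicity.

```python
def split_off_to_capacity(G, x, u, v, r, max_violation):
    """Tentatively split off k = min(d(x,u), d(x,v)) copies of (xu, xv),
    measure the maximum violation q, and restore ceil(q/2) copies, since
    each restored copy repairs the violation by at most two."""
    k = min(G.number_of_edges(x, u), G.number_of_edges(x, v))
    for _ in range(k):
        split_off(G, x, u, v)
    q = max_violation(G, r)
    undo = (q + 1) // 2
    for _ in range(undo):
        G.remove_edge(u, v)
        G.add_edge(x, u); G.add_edge(x, v)
    return k - undo  # the capacity of the pair (xu, xv)
```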

3.4 Indicator Vertex


We show how to reduce the problem into an instance with a particular indicator
vertex t ≠ x, with the property that if λ(u, v) ≤ rmax for u, v ≠ x, then λ(u, v) =
min{λ(u, t), λ(v, t)}. Hence if we could maintain the local edge-connectivity from
t to v for every v ∈ V −x with λ(v, t) ≤ rmax , then the connectivity requirements
for every pair in V − x will be satisfied. Furthermore, by maintaining the local
edge-connectivity, the indicator vertex t will remain to be an indicator vertex,
and therefore this procedure needs to be executed only once. Without loss of
generality, we assume that the connectivity requirement for each pair of vertices
u, v ∈ V − x is equal to min{λ(u, v), rmax }, and r(x, v) = 0 for every v ∈ V − x.
First we compute a partial Gomory-Hu tree Trmax in Õ(rmax · m) time by
Theorem 9, which is Õ(rmax^2 · n) after applying the sparsifying algorithm in
Theorem 8. Recall that each node in Trmax represents a subset of vertices in G.
In the following we will use a capital letter (say U ) to denote both a node in
Trmax and the corresponding subset of vertices in G. If Trmax has only one node,
then this means that the local edge-connectivity between every pair of vertices in
G is at least rmax . In this case, any vertex t = x is an indicator vertex. So assume
that Trmax has at least two nodes. Let X be the node in Trmax that contains x
in G, and U1 , . . . , Up be the nodes adjacent to X in Trmax , and let XU1 be the
edge in Trmax with largest weight among XUi for 1 ≤ i ≤ p. See Figure (a).

[Figure: (a) the node X containing x, the adjacent nodes U1 , . . . , Up in Trmax , and the contracted sets Ui∗ ; (b) the case X = {x}: the nodes W1 , . . . , Wq adjacent to U1 , and the contracted sets Ui∗ and Wj∗ .]

Suppose X contains a vertex t ≠ x in G. The idea is to contract tight sets so


that t will become an indicator vertex in the resulting graph. For any edge XUi
in Trmax , let Ti be the component of Trmax that contains Ui when XUi is removed
from Trmax . We claim that each Ui∗ := ∪_{U∈Ti} U is a tight set in G; see Figure (a).
By the definition of a Gomory-Hu tree, the local edge-connectivity between any
vertex ui ∈ Ui and t is equal to the edge weight of XUi in Trmax . Also, by the
definition of a Gomory-Hu tree, d(Ui∗ ) is equal to the weight of edge XUi in
Trmax . Therefore, Ui∗ is a tight set in G, because r(ui , t) = λ(ui , t) = d(Ui∗ ) for
some pair ui , t ∈ V − x. By Proposition 5, we can contract each Ui∗ into a single
vertex ui for 1 ≤ i ≤ p without losing any information about admissible pairs
in G. Since each Ui∗ becomes a single vertex, the vertex t becomes an indicator
vertex in the resulting graph.
Suppose X contains only x in G. Then U1∗ may not be a tight set, since there
may not exist a pair u, v ∈ V − x with r(u, v) = λ(u, v) = d(U1∗ ) (note that
there is a vertex v with λ(x, v) = d(U1∗ ), but r(x, v) = 0 for every vertex v). In
this case, we will contract some tight sets so that any vertex in U1 will become
an indicator vertex. Let W1 , . . . , Wq be the nodes other than X (if any) adjacent
to U1 in Trmax ; see Figure (b). By using similar arguments as before, it can be
shown that each Ui∗ is a tight set for 2 ≤ i ≤ p (through ui ∈ Ui and u1 ∈ U1 ).
Therefore we can contract each Ui∗ into a single vertex ui for 2 ≤ i ≤ p. Similarly,
we can argue that each Wj∗ (defined analogously as Ui∗ ) is a tight set, and hence
we can contract each Wj∗ into a single vertex wj for each 1 ≤ j ≤ q. We can
see that any vertex t ∈ U1 is an indicator vertex in the resulting graph, because
λ(t, v) ≥ min{λ(w, v), rmax } for any pair of vertices v, w.
Henceforth we can consider this resulting graph instead of G for the purpose of
computing a complete splitting-off sequence, and using t as the indicator vertex
to check connectivities. The running time of this procedure is dominated by the
partial Gomory-Hu tree computation, which is at most Õ(rmax^2 · n).
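Concretely, once the indicator vertex t is in place, validating a splitting-off attempt reduces to the following one-vertex check (our sketch; lam_from(G, t) stands for a routine returning min{λ(t, v), rmax} for all v, such as the algorithm of Theorem 14 in the next subsection, and base records these values before the attempt).

```python
def attempt_ok(G, t, base, lam_from):
    """The attempt preserves all pairwise requirements iff no connectivity
    from the indicator vertex t dropped below its recorded value."""
    lam = lam_from(G, t)
    return all(lam[v] >= base[v] for v in base)
```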

3.5 Deterministic Algorithm


We describe how to modify the randomized algorithm in Theorem 13 to obtain a
deterministic algorithm with the same running time. Every step in the algorithm
is deterministic except the Gomory-Hu tree construction in Theorem 9. The


randomized Gomory-Hu tree construction is used in two places. First it is used in
finding an indicator vertex in Section 3.4, and for this purpose it is executed only
once. Here we can replace it by a slower deterministic partial Gomory-Hu tree
construction algorithm. It is well-known that a Gomory-Hu tree can be computed
using at most n − 1 max-flow computations [14]. By using the Ford-Fulkerson
flow algorithm, one can obtain an O(rmax^2 · n^2 )-time deterministic algorithm
to construct a partial Gomory-Hu tree Trmax . The randomized partial Gomory-
Hu construction is also used in every splitting-off attempt to check whether the
connectivity requirements are satisfied. With the indicator vertex t, this task
reduces to checking the local edge-connectivities from t to other vertices, and
there is a fast deterministic algorithm for this simpler task by Bhalgat et al. [5].

Theorem 14 ([5]). Given an undirected graph G and a vertex t, there is an


Õ(rmax · m)-time deterministic algorithm to compute min{λG (t, v), rmax } for all
vertices v ∈ G.
Thus we can replace the randomized partial Gomory-Hu tree algorithm by this
algorithm, and Theorem 13 still holds deterministically. Hence there is a
deterministic Õ(rmax^2 · n^2 )-time algorithm for the complete splitting-off problem.

4 Structural Property and Randomized Algorithm


Before we give the proof of Theorem 3, we first show how to use it in a randomized
edge splitting-off procedure to speed up the algorithm. By Theorem 3, when the
degree of x is much larger than 2rmax , even a random edge pair will be admissible
with probability at least 1 − 2rmax /(d(x) − 1). Using this observation, we show
how to reduce d(x) to O(rmax ) in Õ(rmax^3 · n) time. Then, by Theorem 13, the
remaining edges can be split-off in Õ(rmax^2 · d(x) · n) = Õ(rmax^3 · n) time. So
the total running time of the complete splitting-off algorithm is improved to
Õ(m + rmax^3 · n), proving Theorem 2.
The idea is to split-off many random edge pairs in parallel, before checking if
some connectivity requirement is violated. Suppose that 2^{l+q−1} < d(x) ≤ 2^{l+q}
and 2^{l−1} < rmax ≤ 2^l for some positive integers l and q. To reduce d(x) to 2^{l+q−1},
we need to split-off at most 2^{l+q−1} x-edges. Since each x-edge has fewer than
2rmax non-admissible partners by Theorem 3, the probability that a random
edge pair is admissible is at least

((d(x) − 1) − 2rmax) / (d(x) − 1) ≥ (2^{l+q−1} − 2^{l+1}) / 2^{l+q−1} = (2^{q−2} − 1) / 2^{q−2}.

Now consider a random splitting-off operation that splits off at most 2^{q−2} edge pairs at
random in parallel. The operation is successful if all the edge pairs are admissible.
The probability for the operation to succeed is at least ((2^{q−2} − 1)/2^{q−2})^{2^{q−2}} = Ω(1).
After each operation, we run the checking algorithm to determine whether the
operation was successful. Consider an iteration that consists of c · log n
operations, for some constant c. The iteration is successful if it finds a set of
2^{q−2} admissible pairs, i.e., if any of its operations succeeds. The probability for an
iteration to fail is hence at most 1/n^c for q ≥ 3. The time complexity of an
iteration is Õ(rmax^2 · n).
Since each iteration reduces the degree of x by 2^{q−2}, with at most 2^{l+1} =
O(rmax ) successful iterations we can reduce d(x) to 2^{l+q−1}, i.e., reduce
d(x) by half. This procedure is applicable as long as q ≥ 3. Therefore, we can
reduce d(x) to 2^{l+2} by using this procedure O(log n) times. The total running
time is thus Õ(2^{l+1} · log n · rmax^2 · n) = Õ(rmax^3 · n). Note that there are at most
Õ(rmax ) iterations and the failure probability of each iteration is at most 1/n^c .
By the union bound, the probability for the above randomized algorithm to fail
is at most 1/n^{c−1} . Therefore, with high probability, the algorithm succeeds in
Õ(rmax^3 · n) time in reducing d(x) to O(rmax ). Since the correctness of the solution
can be verified by a Gomory-Hu tree, this also gives a Las Vegas algorithm with
the same expected running time.
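A sketch (ours) of one randomized operation, with split_off and requirements_satisfied as in the earlier sketches; batch plays the role of the (at most) 2^{q−2} pairs split off in parallel.

```python
import random

def random_operation(G, x, r, requirements_satisfied, batch):
    """Split off `batch` random pairs of x-edges in parallel and check the
    requirements only once at the end; on failure, undo the whole batch.
    By Theorem 3 this succeeds with constant probability when d(x) is
    large compared to rmax."""
    done = []
    for _ in range(batch):
        # endpoints of the edges incident to x, listed with multiplicity
        ends = [v for v in G.neighbors(x)
                for _ in range(G.number_of_edges(x, v))]
        if len(ends) < 2:
            break
        i, j = random.sample(range(len(ends)), 2)
        u, v = ends[i], ends[j]
        if u == v:
            continue  # skip degenerate draws in this sketch
        split_off(G, x, u, v)
        done.append((u, v))
    if requirements_satisfied(G, r):
        return True
    for u, v in reversed(done):  # undo the batch
        G.remove_edge(u, v)
        G.add_edge(x, u); G.add_edge(x, v)
    return False
```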

4.1 Proof of Theorem 3


In this subsection we will prove that each edge has at most 2rmax − 2 non-
admissible partners. Given an edge pair (xv, xw), if it is a non-admissible pair,
then there is a dangerous set D with {xv, xw} ⊆ δ(D) by Proposition 5, and we
say such a dangerous set D covers xv and xw. Let P be the set of non-admissible
partners of xv in the initial graph. Our goal is to show that |P | ≤ 2rmax − 2.
Proposition 15 ([2] Lemma 5.4). Suppose there is no cut edge incident to
x. If S1 , S2 are disjoint vertex sets with d(S1 , S2 ) = 0, d(x, S1 ) ≥ 1 and
d(x, S2 ) ≥ 1, then S1 ∪ S2 is not a dangerous set.
We first present an outline of the proof. Let DP be a minimal set of maximal
dangerous sets such that (i) each set D ∈ DP covers the edge xv and (ii) each
edge in P is covered by some set D ∈ DP . First, we consider the base case
with |DP | ≤ 2. The theorem follows immediately if |DP | = 1, so assume DP =
{D1 , D2 }. By Proposition 15, d(D1 − D2 , D1 ∩ D2 ) ≥ 1 as DP is minimal. Hence
d(D, V − x − D) ≥ 1 for each D ∈ DP . Since d(D) ≤ rmax + 1 and D covers
xv for each D ∈ DP , each set in DP can cover at most rmax − 1 non-admissible
partners of xv, proving |P | ≤ 2rmax − 2.
The next step is to show that |DP | ≤ rmax − 1 when |DP | ≥ 3; the
proofs of this step use ideas very similar to those in [2, 24]. When |DP | ≥ 3, we show
in Lemma 16 that inequality (4a) must hold for each pair of dangerous sets in
DP . Since each dangerous set is connected by Proposition 15, this allows us to
conclude in Lemma 17 that |DP | ≤ rmax − 1. This implies that |P | < rmax^2 .
To improve this bound, we use a new inductive argument to show that |P | ≤
rmax − 1 + |DP | ≤ 2rmax − 2. First we prove in Lemma 18 that there is an
admissible pair (xa, xb) in P (so by definition a, b ≠ v). By splitting-off (xa, xb),
let P′ = P − {xa, xb}, so that |P′ | = |P | − 2. In the resulting graph, we prove
in Lemma 19 that |DP′ | ≤ |DP | − 2. Hence, by repeating this reduction, we
can show that after splitting-off |DP |/2 pairs of edges in P , the remaining
edges in P are covered by one dangerous set. Therefore, we can conclude that
|P | ≤ rmax − 1 + |DP | ≤ 2rmax − 2. In the following we will first prove the upper
bound on |DP |, and then we will provide the details of the inductive argument.
An Upper Bound on |DP |: By contracting non-trivial tight sets, each edge in P
is still a non-admissible partner of xv by Lemma 6. Henceforth, we will assume that
all tight sets in G are singletons. We also assume that there is no cut edge incident to x
and that rmax ≥ 2, as required in Theorem 3. Recall that DP is a minimal
set of maximal dangerous sets such that (i) each set D ∈ DP covers the edge xv
and (ii) each edge in P is covered by some set D ∈ DP . We use the following result.

Lemma 16 ([2] Lemma 5.4, [24] Lemma 2.6). If |DP | ≥ 3, then inequal-
ity (4a) holds for every X, Y ∈ DP . Furthermore, X ∩ Y = {v} and is a tight
set for any X, Y ∈ DP .
Lemma 17. |DP | ≤ rmax − 1 when |DP | ≥ 3.

Proof. By Lemma 16, we have X ∩ Y = {v} for any X, Y ∈ DP . For each set
X ∈ DP , we have d(x, v) ≥ 1 and d(x, X − v) ≥ 1 by the minimality of DP .
Therefore, we must have d(v, X − v) ≥ 1 by Proposition 15. By Lemma 16, X − v
and Y − v are disjoint for each pair X, Y ∈ DP . Since d(v, X − v) ≥ 1 for each
X ∈ DP and d(x, v) ≥ 1, it follows that |DP | ≤ d(v) − 1. By Lemma 16, {v} is
a tight set, and thus |DP | ≤ d(v) − 1 ≤ rmax − 1. 

An Inductive Argument: The goal is to prove that |P | ≤ rmax − 1 + |DP |.
By Lemma 17, this holds if d(x, X − v) = 1 for every dangerous set X ∈ DP .
Hence we assume that there is a dangerous set A ∈ DP with d(x, A − v) ≥ 2;
this property will only be used at the very end of the proof. By Lemma 16,
inequality (4a) holds for A and B for every B ∈ DP . By the minimality of DP ,
there exists an x-neighbor a ∈ A which is not contained in any other set in DP .
Similarly, there exists b ∈ B which is not contained in any other set in DP . The
following lemma shows that the edge pair (xa, xb) is admissible.
Lemma 18. For any A, B ∈ DP satisfying inequality (4a), an edge pair (xa, xb)
is admissible if a ∈ A − B and b ∈ B − A.
Proof. Suppose, by way of contradiction, that (xa, xb) is non-admissible. Then,
by Proposition 5, there exists a maximal dangerous set C containing a and b. We
claim that v ∈ C; otherwise there exists a 3-dangerous-set structure, contradicting
Lemma 7. Then d(x, A ∩ C) ≥ d(x, {v, a}) ≥ 2, and so inequality (4b) cannot
hold for A and C, since 1 + 1 ≥ s(A) + s(C) ≥ s(A − C) + s(C − A) + 2d̄(A, C) ≥
0 + 0 + 2 · 2. Therefore, inequality (4a) must hold for A and C. Since A and
C are maximal dangerous sets, A ∪ C cannot be a dangerous set, and thus
1 + 1 ≥ s(A) + s(C) ≥ s(A ∪ C) + s(A ∩ C) + 2d(A, C) ≥ 2 + s(A ∩ C) + 0, which
implies that A ∩ C is a tight set; but this contradicts the assumption that each
tight set is a singleton, as {v, a} ⊆ A ∩ C.

After splitting-off (xa, xb), let the resulting graph be G′ and P′ = P − {xa, xb}.
Clearly, since each edge in P′ is a non-admissible partner of xv in G, every edge
in P′ is still a non-admissible partner of xv in G′ . Furthermore, by contracting
non-trivial tight sets in G′ , each edge in P′ is still a non-admissible partner of
xv by Lemma 6. Hence we assume all tight sets in G′ are singletons. Let DP′ be a
minimal set of maximal dangerous sets such that (i) each set D ∈ DP′ covers the
edge xv and (ii) each edge in P′ is covered by some set D ∈ DP′ . The following
lemma shows that there exists DP′ with |DP′ | ≤ |DP | − 2.
Lemma 19. When |DP | ≥ 3, the edges in P′ can be covered by a set DP′ of
maximal dangerous sets in G′ such that (i) each set in DP′ covers xv, (ii)
each edge in P′ is covered by some set in DP′ , and (iii) |DP′ | ≤ |DP | − 2.
Proof. We will use the dangerous sets in DP to construct DP′ . Since each pair of sets
in DP satisfies inequality (4a), we have s(A ∪ D) = 2 before splitting-off (xa, xb) for
each D ∈ DP . Also, before splitting-off (xa, xb), for A, B, C ∈ DP , inequality (4b)
cannot hold for A ∪ B and C, because 2 + 1 ≥ s(A ∪ B) + s(C) ≥ s((A ∪ B) − C) +
s(C − (A ∪ B)) + 2d̄(A ∪ B, C) ≥ 2 + 0 + 2 · 1, where the last inequality follows since
v ∈ A ∩ B ∩ C and (A ∪ B) − C is not dangerous (as it covers the admissible edge pair
(xa, xb)). Therefore inequality (4a) must hold for A ∪ B and C, which implies that
s(A ∪ B ∪ C) ≤ 3, since 2 + 1 ≥ s(A ∪ B) + s(C) ≥ s((A ∪ B) ∪ C) + s((A ∪ B) ∩ C).
For A and B as defined before Lemma 18, since s(A ∪ B) = 2 before splitting-off
(xa, xb), A ∪ B becomes a tight set after splitting-off (xa, xb). For any other set C ∈
DP − A − B, since s(A ∪ B ∪ C) ≤ 3 before splitting-off (xa, xb), A ∪ B ∪ C becomes
a dangerous set after splitting-off (xa, xb). Hence, after splitting-off (xa, xb) and
contracting the tight set A ∪ B into v, each set in DP − A − B becomes a dangerous
set. Then DP′ = DP − A − B is a set of dangerous sets covering each edge in P′ ,
satisfying properties (i)–(iii). By replacing a dangerous set C ∈ DP′ by a maximal
dangerous set C′ ⊇ C and removing redundant dangerous sets in DP′ so that it
minimally covers P′ , we have found DP′ as required by the lemma.

Recall that we chose A with d(x, A − v) ≥ 2, and hence d(x, v) ≥ 2 after the
splitting-off and contraction of tight sets. Therefore, inequality (4a) holds for
every two maximal dangerous sets in DP′ . By induction, when |DP | ≥ 3, we
have |P | = |P′ | + 2 ≤ rmax − 1 + |DP′ | + 2 ≤ rmax − 1 + |DP |. In the base case,
when |DP | = 2 and A, B ∈ DP satisfy (4a), the same argument as in Lemma 19 can
be used to show that the edges in P′ are covered by one tight set after splitting-off
(xa, xb), and thus |P | = |P′ | + 2 ≤ rmax − 1 + 2 ≤ rmax − 1 + |DP |. This completes
the proof that |P | ≤ rmax − 1 + |DP |, proving the theorem.

5 Concluding Remarks
Theorem 3 can be applied to constrained edge splitting-off problems, and give
additive approximation algorithms for constrained augmentation problems. The
efficient algorithms can also be adapted to these problems. We refer the reader
to [25] for these results.

References
1. Bang-Jensen, J., Frank, A., Jackson, B.: Preserving and increasing local edge-
connectivity in mixed graphs. SIAM J. Disc. Math. 8(2), 155–178 (1995)
2. Bang-Jensen, J., Jordán, T.: Edge-connectivity augmentation preserving simplicity.
SIAM Journal on Discrete Mathematics 11(4), 603–623 (1998)
3. Bernáth, A., Király, T.: A new approach to splitting-off. In: Lodi, A., Panconesi, A.,
Rinaldi, G. (eds.) IPCO 2008. LNCS, vol. 5035, pp. 401–415. Springer, Heidelberg
(2008)
4. Benczúr, A.A., Karger, D.R.: Augmenting undirected edge connectivity in Õ(n^2 )
time. Journal of Algorithms 37(1), 2–36 (2000)
5. Bhalgat, A., Hariharan, R., Kavitha, T., Panigrahi, D.: An Õ(mn) Gomory-Hu
tree construction algorithm for unweighted graphs. In: STOC 2007, pp. 605–614
(2007)
6. Bhalgat, A., Hariharan, R., Kavitha, T., Panigrahi, D.: Fast edge splitting and
Edmonds’ arborescence construction for unweighted graphs. In: SODA ’08, pp.
455–464 (2008)
7. Chan, Y.H., Fung, W.S., Lau, L.C., Yung, C.K.: Degree Bounded Network Design
with Metric Costs. In: FOCS ’08, pp. 125–134 (2008)
8. Cheng, E., Jordán, T.: Successive edge-connectivity augmentation problems. Math-
ematical Programming 84(3), 577–593 (1999)
9. Frank, A.: Augmenting graphs to meet edge-connectivity requirements. SIAM
Journal on Discrete Mathematics 5(1), 25–53 (1992)
10. Frank, A.: On a theorem of Mader. Ann. of Disc. Math. 101, 49–57 (1992)
11. Frank, A., Király, Z.: Graph orientations with edge-connection and parity con-
straints. Combinatorica 22(1), 47–70 (2002)
12. Gabow, H.N.: Efficient splitting off algorithms for graphs. In: STOC ’94, pp. 696–
705 (1994)
13. Goemans, M.X., Bertsimas, D.J.: Survivable networks, linear programming relax-
ations and the parsimonious property. Math. Prog. 60(1), 145–166 (1993)
14. Gomory, R.E., Hu, T.C.: Multi-terminal network flows. Journal of the Society for
Industrial and Applied Mathematics 9(4), 551–570 (1961)
15. Hariharan, R., Kavitha, T., Panigrahi, D.: Efficient algorithms for computing all
low st edge connectivities and related problems. In: SODA ’07, pp. 127–136 (2007)
16. Jordán, T.: On minimally k-edge-connected graphs and shortest k-edge-connected
Steiner networks. Discrete Applied Mathematics 131(2), 421–432 (2003)
17. Lau, L.C.: An approximate max-Steiner-tree-packing min-Steiner-cut theorem.
Combinatorica 27(1), 71–90 (2007)
18. Lovász, L.: Lecture. Conference of Graph Theory, Prague (1974); See also Combi-
natorial problems and exercises. North-Holland (1979)
19. Mader, W.: A reduction method for edge-connectivity in graphs. Annals of Discrete
Mathematics 3, 145–164 (1978)
20. Nagamochi, H.: A fast edge-splitting algorithm in edge-weighted graphs. IEICE
Transactions on Fundamentals of Electronics, Communications and Computer Sci-
ences, 1263–1268 (2006)
21. Nagamochi, H., Ibaraki, T.: Linear time algorithm for finding a sparse k-connected
spanning subgraph of a k-connected graph. Algorithmica 7(1), 583–596 (1992)
22. Nagamochi, H., Ibaraki, T.: Deterministic O(nm) time edge-splitting in undirected
graphs. Journal of Combinatorial Optimization 1(1), 5–46 (1997)
23. Nash-Williams, C.S.J.A.: On orientations, connectivity and odd vertex pairings in
finite graphs. Canadian Journal of Mathematics 12, 555–567 (1960)
24. Szigeti, Z.: Edge-splittings preserving local edge-connectivity of graphs. Discrete
Applied Mathematics 156(7), 1011–1018 (2008)
25. Yung, C.K.: Edge splitting-off and network design problems. Master thesis, The
Chinese University of Hong Kong (2009)
On Generalizations of Network Design Problems with
Degree Bounds

Nikhil Bansal1 , Rohit Khandekar1, Jochen Könemann2,


Viswanath Nagarajan1, and Britta Peis3
1
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
2
University of Waterloo
3
Technische Universität Berlin

Abstract. Iterative rounding and relaxation have arguably become the method of
choice in dealing with unconstrained and constrained network design problems.
In this paper we extend the scope of the iterative relaxation method in two direc-
tions: (1) by handling more complex degree constraints in the minimum spanning
tree problem (namely laminar crossing spanning tree), and (2) by incorporating
‘degree bounds’ in other combinatorial optimization problems such as matroid
intersection and lattice polyhedra. We give new or improved approximation al-
gorithms, hardness results, and integrality gaps for these problems.

1 Introduction
Iterative rounding and relaxation have arguably become the method of choice in dealing
with unconstrained and constrained network design problems. Starting with Jain’s ele-
gant iterative rounding scheme for the generalized Steiner network problem in [14], an
extension of this technique (iterative relaxation) has more recently lead to breakthrough
results in the area of constrained network design, where a number of linear constraints
are added to a classical network design problem. Such constraints arise naturally in
a wide variety of practical applications, and model limitations in processing power,
bandwidth or budget. The design of powerful techniques to deal with these problems is
therefore an important goal.
The most widely studied constrained network design problem is the minimum-cost
degree-bounded spanning tree problem. In an instance of this problem, we are given an
undirected graph, non-negative costs for the edges, and positive, integral degree-bounds
for each of the nodes. The problem is easily seen to be NP-hard, even in the absence
of edge-costs, since finding a spanning tree with maximum degree two is equivalent to
finding a Hamiltonian Path. A variety of techniques have been applied to this problem
[5,6,11,17,18,23,24], culminating in Singh and Lau’s breakthrough result in [27]. They
presented an algorithm that computes a spanning tree of at most optimum cost whose
degree at each vertex v exceeds its bound by at most 1, using the iterative relaxation
framework developed in [20,27].
The iterative relaxation technique has been applied to several constrained network
design problems: spanning tree [27], survivable network design [20,21], directed graphs
with intersecting and crossing super-modular connectivity [20,2]. It has also been ap-
plied to degree bounded versions of matroids and submodular flow [15].

F. Eisenbrand and B. Shepherd (Eds.): IPCO 2010, LNCS 6080, pp. 110–123, 2010.
c Springer-Verlag Berlin Heidelberg 2010
On Generalizations of Network Design Problems with Degree Bounds 111

In this paper we further extend the applicability of iterative relaxation, and obtain
new or improved bicriteria approximation results for minimum crossing spanning tree
(MCST), crossing matroid intersection, and crossing lattice polyhedra. We also provide
hardness results and integrality gaps for these problems.
Notation. As is usual, when dealing with an undirected graph G = (V, E), for any
S ⊆ V we let δG (S) := {(u, v) ∈ E | u ∈ S, v ∈ S}. When the graph is clear from
context, the subscript is dropped. A collection {U1 , · · · , Ut } of vertex-sets is called
laminar if for every pair Ui , Uj in this collection, we have Ui ⊆ Uj , Uj ⊆ Ui , or
Ui ∩ Uj = ∅. A (ρ, f (b)) approximation for minimum cost degree bounded problems
refers to a solution that (1) has cost at most ρ times the optimum that satisfies the degree
bounds, and (2) satisfies the relaxed degree constraints in which a bound b is replaced
with a bound f (b).

1.1 Our Results, Techniques and Paper Outline


Laminar MCST. Our main result is for a natural generalization of bounded-degree MST
(called Laminar Minimum Crossing Spanning Tree or laminar MCST), where we are
given an edge-weighted undirected graph with a laminar family L = {Si }m i=1 of vertex-
sets having bounds {bi }mi=1 ; and the goal is to compute a spanning tree of minimum cost
that contains at most bi edges from δ(Si ) for each i ∈ [m].
The motivation behind this problem is in designing a network where there is a hi-
erarchy (i.e. laminar family) of service providers that control nodes (i.e. vertices). The
number of edges crossing the boundary of any service provider (i.e. its vertex-cut) rep-
resents some cost to this provider, and is therefore limited. The laminar MCST problem
precisely models the question of connecting all nodes in the network while satisfying
bounds imposed by all the service providers.
From a theoretical viewpoint, cut systems induced by laminar families are well stud-
ied, and are known to display rich structure. For example, one-way cut-incidence ma-
trices are matrices whose rows are incidence vectors of directed cuts induced by the
vertex-sets of a laminar family; It is well known (e.g., see [19]) that such matrices are
totally unimodular. Using the laminar structure of degree-constraints and the iterative
relaxation framework, we obtain the following main result, and present its proof in
Section 2.
Theorem 1. There is a polynomial time (1, b + O(log n)) bicriteria approximation al-
gorithm for laminar MCST. That is, the cost is no more than the optimum cost and the
degree violation is at most additive O(log n). This guarantee is relative to the natural
LP relaxation.
This guarantee is substantially stronger than what follows from known results for the
general minimum crossing spanning tree (MCST) problem: where the degree bounds
could be on arbitrary edge-subsets E1 , . . . , Em . In particular, for general MCST a
(1, b + Δ − 1) [2,15] is known where Δ is the maximum number of degree-bounds an
edge appears in. However, this guarantee is not useful for laminar MCST as Δ can be as
large as Ω(n) in this case. If a multiplicative factor
 in the degree violationis allowed,
Chekuri et al. [8] recently gave a very elegant 1, (1 + )b + O( 1 log m) guarantee
(which subsumes the previous best (O(log n), O(log m) b) [4] result). However, these
112 N. Bansal et al.

results also cannot be used to obtain a small additive violation, especially if b is large.
In particular, both the results [4,8] for general MCST √ are based on the natural LP relax-
ation, for which there is an integrality gap of b + Ω( n) even without regard to costs
and when m = O(n) [26] (see also [3]). On the other hand, Theorem 1 shows that a
purely additive O(log n) guarantee on degree (relative to the LP relaxation and even in
presence of costs) is indeed achievable for MCST, when the degree-bounds arise from
a laminar cut-family.
The algorithm in Theorem 1 is based on iterative relaxation and uses two main new
ideas. Firstly, we drop a carefully chosen constant fraction of degree-constraints in each
iteration. This is crucial as it can be shown that dropping one constraint at a time as in
the usual applications of iterative relaxation can indeed lead to a degree violation of
Ω(Δ). Secondly, the algorithm does not just drop degree constraints, but in some itera-
tions it also generates new degree constraints, by merging existing degree constraints.
All previous applications of iterative relaxation to constrained network design treat
connectivity and degree constraints rather asymmetrically. While the structure of the
connectivity constraints of the underlying LP is used crucially (e.g., in the ubiquitous
uncrossing argument), the handling of degree constraints is remarkably simple. Con-
straints are dropped one by one, and the final performance of the algorithm is good only
if the number of side constraints is small (e.g., in recent work by Grandoni et al. [12]),
or if their structure is simple (e.g., if the ‘frequency’ of each element is small). In con-
trast, our algorithm for laminar MCST exploits the structure of degree constraints in a
non-trivial manner.
Hardness Results. We obtain the following hardness of approximation for the general
MCST problem (and its matroid counterpart). In particular this rules out any algorithm
for MCST that has additive constant degree violation, even without regard to costs.
Theorem 2. Unless N P has quasi-polynomial time algorithms, the MCST problem
admits no polynomial time O(logα m) additive approximation for the degree bounds
for some constant α > 0; this holds even when there are no costs.
The proof for this theorem is given in Section 3, and uses a a two-step reduction from
the well-known Label Cover problem. First, we show hardness for a uniform matroid
instance. In a second step, we then demonstrate how this implies the result for MCST
claimed in Theorem 2.
Note that our hardness bound nearly matches the result obtained by Chekuri et al.
in [8]. We note however that in terms of purely additive degree guarantees,√a large gap
remains. As noted above, there is a much stronger lower bound of b + Ω( n) for LP-
based algorithms [26] (even without regard to costs), which is based on discrepancy. In
light of the small number of known hardness results for discrepancy type problems, it
is unclear how our bounds for MCST could be strengthened.
Degree Bounds in More General Settings. We consider crossing versions of other clas-
sic combinatorial optimization problems, namely matroid intersection and lattice poly-
hedra. We discuss our results briefly and defer the proofs to the full version of the
paper [3].
On Generalizations of Network Design Problems with Degree Bounds 113

Definition 1 (Minimum crossing matroid intersection problem). Let r1 , r2 : 2E →


Z be two supermodular functions, c : E → R and {Ei }i∈I be a collection of subsets of
E with corresponding bounds {bi }i∈I . Then the goal is to minimize:
-
{cT x - x(S) ≥ max{r1 (S), r2 (S)}, ∀ S ⊆ E;
x(Ei ) ≤ bi , ∀ i ∈ [m]; x ∈ {0, 1}E }.

We remark that there are alternate definitions of matroid intersection (e.g., see Schri-
jver [25]) and that our result below extends to those as well.
Let Δ = maxe∈E |{i ∈ [m] | e ∈ Ei }| be the largest number of sets Ei that any
element of E belongs to, and refer to it as frequency.
Theorem 3. Any optimal basic solution x∗ of the linear relaxation of the minimum
crossing matroid intersection problem can be rounded into an integral solution x̂ such
that x̂(S) ≥ max{r1 (S), r2 (S)} for all S ⊆ E and

cT x̂ ≤ 2cT x∗ and x̂(Ei ) ≤ 2bi + Δ − 1 ∀i ∈ I.

The algorithm for this theorem again uses iterative relaxation, and its proof is based on
a ‘fractional token’ counting argument similar to the one used in [2].
An interesting special case is for the bounded-degree arborescence problem (where
Δ = 1). As the set of arborescences in a digraph can be expressed as the intersection
of partition and graphic matroids, Theorem 3 readily implies a (2, 2b) approximation
for this problem. This is an improvement over the previously best-known (2, 2b + 2)
bound [20] for this problem.
The bounded-degree arborescence problem is potentially of wider interest since it is
a relaxation of ATSP, and it is hoped that ideas from this problem lead to new ideas
for ATSP. In fact Theorem 3 also implies an improved (2, 2b)-approximation for the
bounded-degree arborescence packing problem, where the goal is to pack a given num-
ber of arc-disjoint arborescences while satisfying degree-bounds on vertices (arbores-
cence packing can again be phrased as matroid intersection). The previously best known
bound for this problem was (2, 2b + 4) [2]. We also give the following integrality gap.
Theorem 4. For any > 0, there exists an instance of unweighted minimum crossing
arborescence for which the LP is feasible, and any integral solution must violate the
bound on some set {Ei }m
i=1 by a multiplicative factor of at least 2 − . Moreover, this
instance has Δ = 1, and just one non-degree constraint.
Thus Theorem 3 is the best one can hope for, relative to the LP relaxation. First,
Theorem 4 implies that the multiplicative factor in the degree cannot be improved be-
yond 2 (even without regard to costs). Second, the lower bound for arborescences with
costs presented in [2] implies that no cost-approximation ratio better than 2 is possible,
without violating degrees by a factor greater than 2.
Crossing Lattice Polyhedra. Classical lattice polyhedra form a unified framework for
various discrete optimization problems and go back to Hoffman and Schwartz [13] who
proved their integrality. They are polyhedra of type

{x ∈ [0, 1]E | x(ρ(S)) ≥ r(S), ∀S ∈ F }


114 N. Bansal et al.

where F is a consecutive submodular lattice, ρ : F → 2E is a mapping from F to


subsets of the ground-set E, and r ∈ RF is supermodular. A key property of lattice
polyhedra is that the uncrossing technique can be applied which turns out to be cru-
cial in almost all iterative relaxation approaches for optimization problems with degree
bounds. We refer the reader to [25] for a more comprehensive treatment of this subject.
We generalize our work further to crossing lattice polyhedra which arise from clas-
sical lattice polyhedra by adding “degree-constraints” of the form ai ≤ x(Ei ) ≤ bi
for a given collection {Ei ⊆ E | i ∈ I} and lower and upper bounds a, b ∈ RI . We
mention that this model covers several important applications including the crossing
matroid basis and crossing planar mincut problems, among others.
We can show that the standard LP relaxation for the general crossing lattice polyhe-
dron problem is weak; details are deferred to the full version of the paper in [3]. For
this reason, we henceforth focus on a restricted class of crossing lattice polyhedra in
which the underlying lattice (F , ≤) satisfies the following monotonicity property

(∗) S < T =⇒ |ρ(S)| < |ρ(T )| ∀ S, T ∈ F.


We obtain the following theorem whose proof is given in [3].
Theorem 5. For any instance of the crossing lattice polyhedron problem in which F
satisfies property (∗), there exists an algorithm that computes an integral solution of
cost at most the optimal, where all rank constraints are satisfied, and each degree bound
is violated by at most an additive 2Δ − 1.
We note that the above property (∗) is satisfied for matroids, and hence Theorem 5
matches the previously best-known bound [15] for degree bounded matroids (with both
upper/lower bounds). Also note that property (∗) holds whenever F is ordered by inclu-
sion. In this special case, we can improve the result to an additive Δ − 1 approximation
if only upper bounds are given.

1.2 Related Work


As mentioned earlier, the basic bounded-degree MST problem has been extensively stud-
ied [5,6,11,17,18,23,24,27]. The iterative relaxation technique for degree-constrained
problems was developed in [20,27].
MCST was first introduced by Bilo et al. [4], who presented a randomized-rounding
algorithm that computes a tree of cost O(log n) times the optimum where each degree
constraint is violated by a multiplicative O(log n) factor and an additive O(log m) term.
Subsequently, Bansal et al. [2] gave an algorithm that attains an optimal cost guarantee
and an additive Δ − 1 guarantee on degree; recall that Δ is the maximum number of de-
gree constraints that an edge lies in. This algorithm used iterative relaxation as its main

tool. Recently, Chekuri et al. [8] obtained an improved 1, (1 + )b + O( 1 log m) ap-
proximation algorithm for MCST, for any > 0; this algorithm is based on pipage
rounding.
The minimum crossing matroid basis problem was introduced in [15], where the au-
thors used iterative relaxation to obtain (1) (1, b + Δ − 1)-approximation when there
are only upper bounds on degree, and (2) (1, b + 2Δ − 1)-approximation in the pres-
ence of both upper and lowed degree-bounds. The [8] result also holds in this matroid
On Generalizations of Network Design Problems with Degree Bounds 115

setting. [15] also considered a degree-bounded version of the submodular flow problem
and gave a (1, b + 1) approximation guarantee.
The bounded-degree arborescence problem was considered in Lau et al. [20], where
a (2, 2b + 2) approximation guarantee was obtained. Subsequently Bansal et al. [2]
designed an algorithm that for any 0 < ≤ 1/2, achieves a (1/ , bv /(1 − ) + 4)
approximation guarantee. They also showed that this guarantee is the best one can hope
for via the natural LP relaxation (for every 0 < ≤ 1/2). In the absence of edge-costs,
[2] gave an algorithm that violates degree bounds by at most an additive two. Recently
Nutov [22] studied the arborescence problem under weighted degree constraints, and
gave a (2, 5b) approximation for it.
Lattice polyhedra were first investigated by Hoffman and Schwartz [13] and the nat-
ural LP relaxation was shown to be totally dual integral. Even though greedy-type algo-
rithms are known for all examples mentioned earlier, so far no combinatorial algorithm
has been found for lattice polyhedra in general. Two-phase greedy algorithms have been
established only in cases where an underlying rank function satisfies a monotonicity
property [10], [9].

2 Crossing Spanning Tree with Laminar Degree Bounds


In this section we prove Theorem 1 by presenting an iterative relaxation-based algo-
rithm with the stated performance guarantee. During its execution, the algorithm selects
and deletes edges, and it modifies the given laminar family of degree bounds. A generic
iteration starts with a subset F of edges already picked in the solution, a subset E of
undecided edges, i.e., the edges not yet picked or dropped from the solution, a laminar
family L on V , and residual degree bounds b(S) for each S ∈ L.
The laminar family L has a natural forest-like structure with nodes corresponding
to each element of L. A node S ∈ L is called the parent of node C ∈ L if S is the
inclusion-wise minimal set in L \ {C} that contains C; and C is called a child of S.
Node D ∈ L is called a grandchild of node S ∈ L if S is the parent of D’s parent.
Nodes S, T ∈ L are siblings if they have the same parent node. A node that has no
parent is called root. The level of any node S ∈ L is the length of the path in this forest
from S to the root of its tree. We also maintain a linear ordering of the children of
each L-node. A subset B ⊆ L is called consecutive if all nodes in B are siblings (with
parent S) and they appear consecutively in the ordering of S’s children. In any iteration
(F, E, L, b), the algorithm solves the following LP relaxation of the residual problem.

min ce xe (1)
e∈E

s.t. x(E(V )) = |V | − |F | − 1
x(E(U )) ≤ |U | − |F (U )| − 1 ∀U ⊂ V
x(δE (S)) ≤ b(S) ∀S ∈ L
xe ≥ 0 ∀e ∈ E

For any vertex-subset W ⊆ V and edge-set H, we let H(W ) := {(u, v) ∈ H | u, v ∈


W } denote the edges induced on W ; and δH (W ) := {(u, v) ∈ H | u ∈ W, v ∈ W }
the set of edges crossing W . The first two sets of constraints are spanning tree con-
straints while the third set corresponds to the degree bounds. Let x denote an optimal
116 N. Bansal et al.

extreme point solution to this LP. By reducing degree bounds b(S), if needed, we as-
sume that x satisfies all degree bounds at equality (the degree bounds may therefore be
fractional-valued). Let α := 24.
Definition 2. An edge e ∈ E is said to be local for S ∈ L if e has at least one end-point
in S but is neither in E(C) nor in δ(C) ∩ δ(S) for any grandchild C of S. Let local(S)
denote the set of local edges for S. A node S ∈ L is said to be good if |local(S)| ≤ α.
The figure on the left shows a set S, its
children B1 and B2 , and grand-children
C1 , . . . , C4 ; edges in local(S) are drawn
S
solid, non-local ones are shown dashed. C4 B2
C1
Initially, E is the set of edges in the C 3
B1
given graph, F ← ∅, L is the original C2
laminar family of vertex sets for which
there are degree bounds, and an arbitrary
linear ordering is chosen on the children
of each node in L. In a generic iteration (F, E, L, b), the algorithm performs one of the
following steps (see also Figure 1):

1. If xe = 1 for some edge e ∈ E then F ← F ∪ {e}, E ← E \ {e}, and set


b(S) ← b(S) − 1 for all S ∈ L with e ∈ δ(S).
2. If xe = 0 for some edge e ∈ E then E ← E \ {e}.
3. DropN: Suppose there at least |L|/4 good non-leaf nodes in L. Then either odd-
levels or even-levels contain a set M ⊆ L of |L|/8 good non-leaf nodes. Drop
the degree bounds of all children of M and modify L accordingly. The ordering of
siblings also extends naturally.
4. DropL: Suppose there are more than |L|/4 good leaf nodes in L, denoted by N .
Then partition N into parts corresponding to siblings in L. For any part {N1 , · · · ,
Nk } ⊆ N consisting of ordered (not necessarily contiguous) children of some node
S:
(a) Define Mi = N2i−1 ∪ N2i for all 1 ≤ i ≤ k/2 (if k is odd Nk is not used).
(b) Modify L by removing leaves {N1 , · · · , Nk } and adding new leaf-nodes {M1 ,
· · · , Mk/2 } as children of S (if k is odd Nk is removed). The children of S in
the new laminar family are ordered as follows: each node Mi takes the position
of either N2i−1 or N2i , and other children of S are unaffected.
(c) Set the degree bound of each Mi to b(Mi ) = b(N2i−1 ) + b(N2i ).

Assuming that one of the above steps applies at each iteration, the algorithm terminates
when E = ∅ and outputs the final set F as a solution. It is clear that the algorithm
outputs a spanning tree of G. An inductive argument (see e.g. [20]) can be used to show
that the LP (1) is feasible at each each iteration and c(F ) + zcur ≤ zo where zo is
the original LP value, zcur is the current LP value, and F is the chosen edge-set at the
current iteration. Thus the cost of the final solution is at most the initial LP optimum zo .
Next we show that one of the four iterative steps always applies.
Lemma 1. In each iteration, one of the four steps above applies.
On Generalizations of Network Design Problems with Degree Bounds 117

S S

N1 T N2 N3 N4 N5
1 2 3 4 DropN step DropL step
Good non-leaf S Good leaves {Ni}5i=1
S
S

1 2 3 4 T
M1 M2
Fig. 1. Examples of the degree constraint modifications DropN and DropL

Proof. Let x∗ be the optimal basic solution of (1), and suppose that the first two steps
do not apply. Hence, we have 0 < x∗e < 1 for all e ∈ E. The fact that x∗ is a basic
solution together with a standard uncrossing argument (e.g., see [14]) implies that x∗ is
uniquely defined by

x(E(U )) = |U | − |F (U )| − 1 ∀ U ∈ S, and x(δE (S)) = b(S), ∀ S ∈ L ,

where S is a laminar subset of the tight spanning tree constraints, and L is a subset of
tight degree constraints, and where |E| = |S| + |L |.
A simple counting argument (see, e.g., [27]) shows that there are at least 2 edges
induced on each S ∈ S that are not induced on any of its children; so 2|S| ≤ |E|. Thus
we obtain |E| ≤ 2|L | ≤ 2|L|.
From the definition of local edges, we get that any edge e = (u, v) is local to at most
the following six sets: the smallest set S1 ∈ L containing u, the smallest set S2 ∈ L
containing v, the parents P1 and P2 of S1  and S2 resp., the least-common-ancestor L
of P1 and P2 , andthe parent of L. Thus S∈L |local(S)| ≤ 6|E|. From the above,
we conclude that S∈L |local(S)| ≤ 12|L|. Thus at least |L|/2 sets S ∈ L must have
|local(S)| ≤ α = 24, i.e., must be good. Now either at least |L|/4 of them must be
non-leaves or at least |L|/4 of them must be leaves. In the first case, step 3 holds and in
the second case, step 4 holds.
It remains to bound the violation in the degree constraints, which turns out to be rather
challenging. We note that this is unlike usual applications of iterative rounding/relaxation,
where the harder part is in showing that one of the iterative steps applies.
It is clear that the algorithm reduces the size of L by at least |L|/8 in each DropN or
DropL iteration. Since the initial number of degree constraints is at most 2n − 1, we get
the following lemma.
Lemma 2. The number of drop iterations (DropN and DropL) is T := O(log n).
Performance guarantee for degree constraints. We begin with some notation. The
iterations of the algorithm are broken into periods between successive drop iterations:
there are exactly T drop-iterations (Lemma 2). In what follows, the t-th drop iteration
118 N. Bansal et al.

is called round t. The time t refers to the instant just after round t; time 0 refers to the
start of the algorithm. At any time t, consider the following parameters.
– Lt denotes the laminar family of degree constraints.
– Et denotes the undecided edge set, i.e., support of the current
 LP optimal solution.
– For any set B of consecutive siblings in Lt , Bnd(B, t) = N ∈B b(N ) equals the
sum of the residual degree bounds on nodes of B.
– For any set B of consecutive siblings in Lt , Inc(B, t) equals the number of edges
from δEt (∪N ∈B N ) included in the final solution.
Recall that b denotes the residual degree bounds at any point in the algorithm. The
following lemma is the main ingredient in bounding the degree violation.
Lemma 3. For any set B of consecutive siblings in Lt (at any time t), Inc(B, t) ≤
Bnd(B, t) + 4α · (T − t).
Observe that this implies the desired bound on each original degree constraint S: using
t = 0 and B = {S}, the violation is bounded by an additive 4α · T term.
Proof. The proof of this lemma is by induction on T − t. The base case t = T is trivial
since the only iterations after this correspond to including 1-edges: hence there is no
bound, i.e. Inc({N }, T) ≤ b(N ) for all N ∈ LT . Hence for any
violation in any degree
B ⊆ L, Inc(B, T ) ≤ N ∈B Inc({N }, T ) ≤ N ∈B b(N ) = Bnd(B, T ).
Now suppose t < T , and assume the lemma for t + 1. Fix a consecutive B ⊆ Lt . We
consider different cases depending on what kind of drop occurs in round t + 1.
DropN round. Here either all nodes in B get dropped or none gets dropped.
Case 1: None of B is dropped. Then observe that B is consecutive in Lt+1 as well;
so the inductive hypothesis implies Inc(B, t + 1) ≤ Bnd(B, t + 1) + 4α · (T − t − 1).
Since the only iterations between round t and round t + 1 involve edge-fixing, we have
Inc(B, t) ≤ Bnd(B, t) − Bnd(B, t + 1) + Inc(B, t + 1) ≤ Bnd(B, t) + 4α · (T − t − 1) ≤
Bnd(B, t) + 4α · (T − t).
Case 2: All of B is dropped. Let C denote the set of all children (in Lt ) of nodes in
B. Note that C consists of consecutive siblings in Lt+1 , and inductively Inc(C, t + 1) ≤
Bnd(C, t + 1) + 4α · (T − t − 1). Let S ∈ Lt denote the parent of the B-nodes;
so C are grand-children of S in Lt . Let x denote the optimal LP solution just before
round t + 1 (when the degree bounds are still given by Lt ), and H = Et+1 the support
edges of x. At that  point, we have b(N ) = x(δ(N )) for all N ∈ B ∪ C. Also let
Bnd (B, t + 1) := N ∈B b(N ) be the sum of bounds on B-nodes just before round

t+ 1. Since S is  t + 1, |Bnd (B,
a good node in round t + 1) − Bnd(C, t + 1)| =
| N ∈B b(N ) − M∈C b(M )| = | N ∈B x(δ(N )) − M∈C x(δ(M ))| ≤ 2α. The
last inequality follows since S is good; the factor of 2 appears since some edges, e.g.,
the edges between two children or two grandchildren of S, may get counted twice. Note
also that the symmetric difference of δH (∪N ∈B N ) and δH (∪M∈C M ) is contained in
local(S). Thus δH (∪N ∈B N ) and δH (∪M∈C M ) differ in at most α edges.
Again since all iterations between time t and t + 1 are edge-fixing:

Inc(B, t) ≤ Bnd(B, t) − Bnd (B, t + 1) + |δH (∪N ∈B N ) \ δH (∪M∈C M )|


+Inc(C, t + 1)
On Generalizations of Network Design Problems with Degree Bounds 119

≤ Bnd(B, t) − Bnd (B, t + 1) + α + Inc(C, t + 1)


≤ Bnd(B, t) − Bnd (B, t + 1) + α + Bnd(C, t + 1) + 4α · (T − t − 1)
≤ Bnd(B, t) − Bnd (B, t + 1) + α + Bnd (B, t + 1) + 2α+4α ·(T − t − 1)
≤ Bnd(B, t) + 4α · (T − t)
The first inequality above follows from simple counting; the second follows since
δH (∪N ∈B N ) and δH (∪M∈C M ) differ in at most α edges; the third is the induction
hypothesis, and the fourth is Bnd(C, t + 1) ≤ Bnd (B, t + 1) + 2α (as shown above).
DropL round. In this case, let S be the parent of B-nodes in Lt , and N = {N1 , · · · , Np }
be all the ordered children of S, of which B is a subsequence (since it is consecutive).
Suppose indices 1 ≤ π(1) < π(2) < · · · < π(k) ≤ p correspond to good leaf-nodes
in N . Then for each 1 ≤ i ≤ k/2, nodes Nπ(2i−1) and Nπ(2i) are merged in this
round. Let {π(i) | e ≤ i ≤ f } (possibly empty) denote the indices of good leaf-nodes
in B. Then it is clear that the only nodes of B that may be merged with nodes outside
B are Nπ(e) and Nπ(f ) ; all other B-nodes are either not merged or merged with another
B-node. Let C be the inclusion-wise minimal set of children of S in Lt+1 s.t.
– C is consecutive in Lt+1 ,
– C contains all nodes of B \ {Nπ(i) }ki=1 , and
– C contains all new leaf nodes resulting from merging two good leaf nodes of B.
Note that ∪M∈C M consists of some subset of B and at most two good leaf-nodes in
N \ B. These two extra nodes (if any) are those
merged with the good leaf-nodes Nπ(e)
and Nπ(f ) of B. Again let Bnd (B, t + 1) := N ∈B b(N ) denote the sum of bounds
on B just before drop round t + 1, when degree constraints are Lt . Let H = Et+1 be
the undecided edges in round t + 1. By the definition of bounds on merged leaves, we
have Bnd(C, t + 1) ≤ Bnd (B, t + 1) + 2α. The term 2α is present due to the two extra
good leaf-nodes described above.
Claim 6. We have |δH (∪N ∈B N ) \ δH (∪M∈C M )| ≤ 2α.
Proof. We say that N ∈ N is represented in C if either N ∈ C or N is contained
in some node of C. Let D be set of nodes of B that are not represented in C and the
nodes of N \ B that are represented in C. Observe that by definition of C, the set D ⊆
{Nπ(e−1) , Nπ(e) , Nπ(f ) , Nπ(f +1) }; in fact it can be easily seen that |D| ≤ 2. Moreover
D consists of only good leaf nodes. Thus, we have | ∪L∈D δH (L)| ≤ 2α. Now note that
the edges in δH (∪N ∈B N ) \ δH (∪M∈C M ) must be in ∪L∈D δH (L). This completes the
proof.
As in the previous case, we have:
Inc(B, t) ≤ Bnd(B, t) − Bnd (B, t + 1) + |δH (∪N ∈B N ) \ δH (∪M∈C M )|
+Inc(C, t + 1)
≤ Bnd(B, t) − Bnd (B, t + 1) + 2α + Inc(C, t + 1)
≤ Bnd(B, t) − Bnd (B, t + 1) + 2α + Bnd(C, t + 1) + 4α · (T − t − 1)
≤ Bnd(B, t) − Bnd (B, t + 1)+2α+Bnd (B, t + 1)+2α+4α · (T − t − 1)
= Bnd(B, t) + 4α · (T − t)
120 N. Bansal et al.

The first inequality follows from simple counting; the second uses Claim 6, the third
is the induction hypothesis (since C is consecutive), and the fourth is Bnd(C, t + 1) ≤
Bnd (B, t + 1) + 2α (from above).
This completes the proof of the inductive step and hence Lemma 3.

3 Hardness Results
We now prove Theorem 2. The first step to proving this result is a hardness for the more
general minimum crossing matroid basis problem: given a matroid M on a ground set
V of elements, a cost function c : V → R+ , and degree bounds specified by pairs
i=1 (where each Ei ⊆ V and bi ∈ N), find a minimum cost basis I in M
{(Ei , bi )}m
such that |I ∩ Ei | ≤ bi for all i ∈ [m].
Theorem 7. Unless N P has quasi-polynomial time algorithms, the unweighted min-
imum crossing matroid basis problem admits no polynomial time O(logc m) additive
approximation for the degree bounds for some fixed constant c > 0.
Proof. We reduce from the label cover problem [1]. The input is a graph G = (U, E)
where the vertex set U is partitioned into pieces U1 , · · · , Un each having size q, and all
edges in E are between distinct pieces. We say that there is a superedge between Ui and
Uj if there is an edge connecting some vertex in Ui to some vertex in Uj . Let t denote
the total number of superedges; i.e.,
-   .-
- [n] -
t = -- (i, j) ∈ : there is an edge in E between Ui and Uj --
2
The goal is to pick one vertex from each part {Ui }ni=1 so as to maximize the number of
induced edges. This is called the value of the label cover instance and is at most t.
It is well known that there exists a universal constant γ > 1 such that for every
k ∈ N, there is a reduction from any instance of SAT (having size N ) to a label cover
instance "G = (U, E), q, t# such that:
– If the SAT instance is satisfiable, the label cover instance has optimal value t.
– If the SAT instance is not satisfiable, the label cover instance has optimal value
< t/γ k .
– |G| = N O(k) , q = 2k , |E| ≤ t2 , and the reduction runs in time N O(k) .
We consider a uniform matroid M with rank t on ground set E (recall that any subset
of t edges is a basis in a uniform matroid). We now construct a crossing matroid basis
instance I on M. There is a set of degree bounds corresponding to each i ∈ [n]: for
every collection C of edges incident to vertices in Ui such that no two edges in C are
incident to the same vertex in Ui , there is a degree bound in I requiring at most one
element to be chosen from C. Note that the number of degree bounds m is at most
k
|E|q ≤ N O(k 2 ) . The following claim links the SAT and crossing matroid instances.
Its proof is deferred to the full version of this paper.
Claim 8. [Yes instance] If the SAT instance is satisfiable, there is a basis (i.e. subset
B ⊆ E with |B| = t) satisfying all degree bounds.
 
√subset B ⊆ E with |B | ≥ t/2
[No instance] If the SAT instance is unsatisfiable, every
k/2
violates some degree bound by an additive ρ = γ / 2.
On Generalizations of Network Design Problems with Degree Bounds 121

The steps described in the above reduction can be done in time polynomial in m and
|G|. Also, instead of randomly choosing vertices from the sets Wi , we can use condi-
tional expectations to derive a deterministic algorithm that recovers at least t/ρ2 edges.
Setting k = Θ(log log N ) (recall that N is the size of the original SAT instance), we
a
obtain an instance of bounded-degree matroid basis of size max{m, |G|} = N log N
and ρ = logb N , where a, b > 0 are constants. Note that log m = loga+1 N , which
implies ρ = logc m for c = a+1 b
> 0, a constant. Thus it follows that for this constant
c > 0 the bounded-degree matroid basis problem has no polynomial time O(logc m)
additive approximation for the degree bounds, unless N P has quasi-polynomial time
algorithms.
We now prove Theorem 2.
Proof. [Proof of Theorem 2] We show how the bases of a uniform matroid can be
represented in a suitable instance of the crossing spanning tree problem. Let the uniform √
matroid from Theorem 7 consist of e elements and have rank t ≤ e; recall that t ≥ e
and clearly m ≤ 2e . We construct a graph as in Figure 2, with vertices v1 , · · · , ve
corresponding to elements in the uniform matroid. Each vertex vi is connected to the
root r by two vertex-disjoint paths: "vi , ui , r# and "vi , wi , r#. There are no costs in
this instance. Corresponding to each degree bound (in the uniform matroid) of b(C)
on a subset C ⊆ [e], there is a constraint to pick at most |C| + b(C) edges from
δ({ui / | i ∈ C}). Additionally, there is a special degree bound of 2e − t on the edge-set
E  = ei=1 δ(wi ); this corresponds to picking a basis in the uniform matroid.
Observe that for each i ∈ [e], any r
spanning tree must choose exactly three u
1
w
edges amongst {(r, ui ), (ui , vi ), (r, wi ), w
e

1
e u
(wi , vi )}, in fact any three edges suffice. u
i w
i
v1
v
Hence every spanning tree T in this graph e

corresponds to a subset X ⊆ [e] such


v
that: (I) T contains both edges in δ(ui ) i

and one edge from δ(wi ), for each i ∈ X,


Fig. 2. The crossing spanning tree instance used
and (II) T contains both edges in δ(wi )
in the reduction
and one edge from δ(ui ) for each i ∈
[e] \ X.
From Theorem 7, for the crossing matroid problem, we obtain the two cases:
Yes instance. There is a basis B ∗ (i.e. B ∗ ⊆ [e], |B ∗ | = t) satisfying all degree bounds.
Consider the spanning tree
,
T ∗ = {(r, ui ), (ui , vi ), (r, wi ) | i ∈ B ∗ } {(r, wi ), (ui , wi ), (r, ui ) | i ∈ [e] \ B ∗ }.

Since B ∗ satisfies its degree-bounds, T ∗ satisfies all degree bounds derived from the
crossing matroid instance. For the special degree bound on E  , note that |T ∗ ∩ E  | =
2e − |B ∗ | = 2e − t; so this is also satisfied. Thus there is a spanning tree satisfying all
the degree bounds.
No instance. Every subset B  ⊆ [e] with |B  | ≥ t/2 (i.e. near basis) violates some
degree bound by an additive ρ = Ω(logc m) term, where c > 0 is a fixed constant.
Consider any spanning tree T that corresponds to subset X ⊆ [e] as described above.
122 N. Bansal et al.

1. Suppose that |X| ≤ t/2; then we have |T ∩ E  | = 2e − |X| ≥ 2e − t + 2t , i.e. the



special degree bound is violated by t/2 ≥ Ω( e) = Ω(log1/2 m).
2. Now suppose that |X| ≥ t/2. Then by the guarantee on the no-instance, T violates
some degree-bound derived from the crossing matroid instance by additive ρ.
Thus in either case, every spanning tree violates some degree bound by additive ρ =
Ω(logc m).
By Theorem 7, it is hard to distinguish the above cases and we obtain the correspond-
ing hardness result for crossing spanning tree, as claimed in Theorem 2.

References
1. Arora, S., Babai, L., Stern, J., Sweedyk, Z.: The hardness of approximate optima in lattices,
codes, and systems of linear equations. J. Comput. Syst. Sci. 54(2), 317–331 (1997)
2. Bansal, N., Khandekar, R., Nagarajan, V.: Additive guarantees for degree bounded network
design. In: STOC, pp. 769–778 (2008)
3. Bansal, N., Khandekar, R., Könemann, J., Nagarajan, V., Peis, B.: On Generalizations of
Network Design Problems with Degree Bounds (full version),Technical Report (2010)
4. Bilo, V., Goyal, V., Ravi, R., Singh, M.: On the crossing spanning tree problem. In: Jansen,
K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) RANDOM 2004 and APPROX 2004. LNCS,
vol. 3122, pp. 51–60. Springer, Heidelberg (2004)
5. Chaudhuri, K., Rao, S., Riesenfeld, S., Talwar, K.: What would Edmonds do? Augment-
ing paths and witnesses for degree-bounded MSTs. In: Chekuri, C., Jansen, K., Rolim,
J.D.P., Trevisan, L. (eds.) APPROX 2005 and RANDOM 2005. LNCS, vol. 3624, pp. 26–39.
Springer, Heidelberg (2005)
6. Chaudhuri, K., Rao, S., Riesenfeld, S., Talwar, K.: Push relabel and an improved approxima-
tion algorithm for the bounded-degree MST problem. In: Bugliesi, M., Preneel, B., Sassone,
V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4051, pp. 191–201. Springer, Heidelberg
(2006)
7. Chazelle, B.: The Discrepancy Method: Randomness and Complexity. Cambridge University
Press, Cambridge (2000)
8. Chekuri, C., Vondrák, J., Zenklusen, R.: Dependent Randomized Rounding for Matroid Poly-
topes and Applications (2009), http://arxiv.org/abs/0909.4348
9. Faigle, U., Peis, B.: Two-phase greedy algorithms for some classes of combinatorial linear
programs. In: SODA, pp. 161–166 (2008)
10. Frank, A.: Increasing the rooted connectivity of a digraph by one. Math. Programming 84,
565–576 (1999)
11. Goemans, M.X.: Minimum Bounded-Degree Spanning Trees. In: FOCS, pp. 273–282 (2006)
12. Grandoni, F., Ravi, R., Singh, M.: Iterative Rounding for Multiobjective Optimization Prob-
lems. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 95–106. Springer,
Heidelberg (2009)
13. Hoffman, A., Schwartz, D.E.: On lattice polyhedra. In: Hajnal, A., Sos, V.T. (eds.) Proceed-
ings of Fifth Hungarian Combinatorial Coll, pp. 593–598. North-Holland, Amsterdam (1978)
14. Jain, K.: A factor 2 approximation algorithm for the generalized Steiner network problem.
In: Combinatorica, pp. 39–61 (2001)
15. Király, T., Lau, L.C., Singh, M.: Degree bounded matroids and submodular flows. In: Lodi,
A., Panconesi, A., Rinaldi, G. (eds.) IPCO 2008. LNCS, vol. 5035, pp. 259–272. Springer,
Heidelberg (2008)
On Generalizations of Network Design Problems with Degree Bounds 123

16. Klein, P.N., Krishnan, R., Raghavachari, B., Ravi, R.: Approximation algorithms for finding
low degree subgraphs. Networks 44(3), 203–215 (2004)
17. Könemann, J., Ravi, R.: A matter of degree: Improved approximation algorithms for degree
bounded minimum spanning trees. SIAM J. on Computing 31, 1783–1793 (2002)
18. Könemann, J., Ravi, R.: Primal-Dual meets local search: approximating MSTs with nonuni-
form degree bounds. SIAM J. on Computing 34(3), 763–773 (2005)
19. Korte, B., Vygen, J.: Combinatorial Optimization, 4th edn. Springer, New York (2008)
20. Lau, L.C., Naor, J., Salavatipour, M.R., Singh, M.: Survivable network design with degree or
order constraints (full version). In: STOC, pp. 651–660 (2007)
21. Lau, L.C., Singh, M.: Additive Approximation for Bounded Degree Survivable Network
Design. In: STOC, pp. 759–768 (2008)
22. Nutov, Z.: Approximating Directed Weighted-Degree Constrained Networks. In: Goel, A.,
Jansen, K., Rolim, J.D.P., Rubinfeld, R. (eds.) APPROX 2008 and RANDOM 2008. LNCS,
vol. 5171, pp. 219–232. Springer, Heidelberg (2008)
23. Ravi, R., Marathe, M.V., Ravi, S.S., Rosenkrantz, D.J., Hunt, H.B.: Many birds with one
stone: Multi-objective approximation algorithms. In: STOC, pp. 438–447 (1993)
24. Ravi, R., Singh, M.: Delegate and Conquer: An LP-based approximation algorithm for Min-
imum Degree MSTs. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP
2006. LNCS, vol. 4051, pp. 169–180. Springer, Heidelberg (2006)
25. Schrijver, A.: Combinatorial Optimization. Springer, Heidelberg (2003)
26. Singh, M.: Personal Communication (2008)
27. Singh, M., Lau, L.C.: Approximating minimum bounded degree spanning trees to within one
of optimal. In: STOC, pp. 661–670 (2007)
A Polyhedral Study of the Mixed Integer Cut

Steve Tyber and Ellis L. Johnson

H. Milton Stewart School of Industrial and Systems Engineering,


Georgia Institute of Technology, Atlanta, GA USA
{styber,ejohnson}@isye.gatech.edu

Abstract. General purpose cutting planes have played a central role in


modern IP solvers. In practice, the Gomory mixed integer cut has proven
to be among the most useful general purpose cuts. One may obtain this
inequality from the group relaxation of an IP, which arises by relaxing
non-negativity on the basic variables. We study the mixed integer cut as a
facet of the master cyclic group polyhedron and characterize its extreme
points and adjacent facets in this setting. Extensions are provided under
automorphic and homomorphic mappings.

Keywords: Integer Programming, Group Relaxation, Master Cyclic


Group Polyhedron, Master Knapsack Polytope, Cutting Planes.

1 Introduction
Consider the integer program
min(cx : Ax = b, x ∈ Zn+ ), (1)
where A ∈ Zm×n , b ∈ Zm , and c ∈ Rn . Given a basis B of the LP relaxation of
(1), the group relaxation of X, is obtained by relaxing non-negativity on xB , i.e.
XGR = {x : BxB + N xN = b, xB ∈ Zm , xN ∈ Zn−m
+ }.
It follows that for an integer vector xN , xB is integral if and only if N xN ≡ b
(mod B); that is, N xN − b belongs to the lattice generated by the columns of
B.
Consider the group G of equivalence classes of Zn modulo B. Let N be the
set of distinct equivalence classes represented by the columns of N , and let g0
be the equivalence class represented by b. The group polyhedron is given by
⎧ ⎫
⎨  ⎬
|N |
P (N , g0 ) = conv t ∈ Z+ : gt(g) = g0 ,
⎩ ⎭
g∈N

where equality is taken modulo B. Letting G+ = G \ 0, i.e. the set of equivalence


classes distinct from the lattice generated by B, the master group polyhedron is
given by ⎧ ⎫
⎨  ⎬
|G|−1
P (G, g0 ) = conv t ∈ Z+ : gt(g) = g0 .
⎩ ⎭
g∈G+

F. Eisenbrand and B. Shepherd (Eds.): IPCO 2010, LNCS 6080, pp. 124–134, 2010.

c Springer-Verlag Berlin Heidelberg 2010
A Polyhedral Study of the Mixed Integer Cut 125

When A consists of a single row, the master group polyhedron is of the form
⎧ ⎫
⎨ |D|−1
 ⎬
|D|−1
P (CD , r) = conv t ∈ Z+ : iti ≡ r (mod D)
⎩ ⎭
i=1

and is called the master cyclic group polyhedron.


In [3], Gomory introduces the group polyhedron and studies its facets. In
particular, he shows that one can obtain facets of the group polyhedron from
facets of its corresponding master polyhedron, and that these can be used to
obtain valid inequalities for P . Further, Gomory identifies the mixed integer cut
as a facet of the master cyclic group polyhedron P (Cn , r) where r = 0:
1 r−1 n−r−1 1
t1 + · · · + tr−1 + tr + tr+1 + · · · + tn−1 ≥ 1.
r r n−r n−r
Indeed, by dropping the coefficients in the above inequality for elements not
appearing in the group problem for a tableau row, one readily obtains the familiar
Gomory mixed integer cut.
Empirically, this cut has been effective in solving integer programs [2], and
shooting experiments indicate that this facet describes a large portion of the
master cyclic group polyhedron [4].
We continue this investigation of the mixed integer cut. In Section 2, we
characterize its extreme points and identify the adjacent facets of the master
cyclic group polyhedron; in Section 3, we extend our characterization of extreme
points to all integer points of the mixed integer cut; and in Section 4, we discuss
mappings of the mixed integer cut under automorphisms and homomorphisms of
groups and provide extensions of our results. We conclude with future research
directions.

2 Facets and Extreme Points of the Mixed Integer Cut


Throughout, we consider the master cyclic group polyhedron:
 

n−1
P (Cn , r) = conv t ∈ Z+ :
n−1
iti ≡ r (mod n) .
i=1

We will also frequently refer to the master knapsack polyhedron,


 

m
P (Km ) = conv x ∈ Zm + : ixi = m .
i=1

Further, we will always assume that r > 0 and that n ≥ 3. By observing that the
recession cone of P (Cn , r) is the non-negative orthant, one notes that P (Cn , r) is
of dimension n − 1. It is also easily observed that P (Km ) is of dimension m − 1.
By the assumption that n ≥ 3, it follows that the non-negativity constraints
are facet defining. In our discussion, these shall be referred to as the trivial facets.
126 S. Tyber and E.L. Johnson

Let (π, π0 ), denote the inequality



n−1
πi ti ≥ π0 .
t=1

When speaking of valid inequalities for the master knapsack polyhedra, we shall
use the same notation where entries are understood to be of appropriate dimen-
sion. Denote the mixed integer cut by (μ, 1), where

i
i≤r
μi = rn−i .
n−r i>r
For completeness, we include the following theorem to which we have already
referred:
Theorem 1 (Gomory [3]). (μ, 1) is a facet of P (Cn , r).
We consider the mixed integer cut as the polytope
PMIC (n, r) = P (Cn , r) ∩ {t : μt = 1}.
Since (μ, 1) is a facet of P (Cn , r) and P (Cn , r) is integral, PMIC (n, r) is also
integral. Note that a facet (π, π0 ) is adjacent to (μ, 1) if and only if it is a facet
of PMIC (n, r). We assume that 1 < r < n − 1, since otherwise the non-trivial
facets of PMIC (n, r) are clearly knapsack facets.
We shall now discuss the connection between PMIC (n, r) and the master knap-
sack polytopes P (Kr ) and P (Kn−r ). The following proposition highlights an
operation that we will call extending a knapsack solution.
Proposition 1. If x ∈ P (Kr ), x = (x1 , . . . , xr ), then t = (x1 , . . . , xr , 0, . . . , 0)
belongs to PMIC (n, r). Likewise, if x ∈ P (Kn−r ), x = (x1 , . . . , xn−r ), then t =
(0, . . . , 0, xn−r , . . . , x1 ) belongs to PMIC (n, r).
Proof. For x ∈ P (Kr ), the result is trivial. So take x ∈ P (Kn−r ). Since P (Kn−r )
is convex and integral, we may assume that x is integral. Rewriting i = n−(n−i)
for i = 1, . . . , r and applying the assumption that x is an integral knapsack
solution, the proposition follows.
In terms of facets, we shall focus on a family of facets introduced in [1]. Before
stating the theorem, we note that for any non-trivial knapsack facet, by taking
an appropriate linear combination with the knapsack equation, we may assume
the following:
Proposition 2. Let (ρ, ρ0 ) be a non-trivial facet of P (Km ). Without loss of
generality we may assume that (ρ, ρ0 ) ≥ 0, ρ0 = ρm = 1. Moreover, we may
assume there exists some i = m such that ρi = 0.
Theorem 2 (Aráoz et. al. [1]). Let (ρ, ρr ) be a non-trivial facet of P (Kr )
such that ρ ≥ 0, ρi = 0 for at least one i, and ρr = 1. Let
 
n−r−1 1
ρ = ρ1 , . . . , ρr = 1, ,..., .
n−r n−r
A Polyhedral Study of the Mixed Integer Cut 127

Then there exists some α ∈ R such that (π, π0 ) = (ρ + αμ, 1 + α) is a facet of


P (Cn , r).

Although not explicitly stated in [1], as an easy consequence of Theorem 2 and


Theorem 6 (Section 4), this operation can also be performed using non-trivial
facets of P (Kn−r ).
Proposition 3. Let (ρ, ρn−r ) be a non-trivial facet of P (Kn−r ) such that ρ ≥ 0,
ρi = 0 for at least one i, and ρn−r = 1. Let
 
1 r−1
ρ= ,..., , 1 = ρn−r , ρn−r−1 , . . . , ρ1 .
r r

Then there exists some α ∈ R such that (π, π0 ) = (ρ + αμ, 1 + α) is a facet of


P (Cn , r).
In particular, given any non-trivial facet of P (Kr ) or P (Kn−r ) we can construct
a facet of P (Cn , r). Such facets are called tilted knapsack facets and the α is
called the tilting coefficient. Details for calculating α are given in [1]. Applying
Proposition 1 we arrive at the following:
Lemma 1. The tilted knapsack facets are facets of PMIC (n, r).

Proof. We argue for facets tilted from P (Kr ); an analogous argument proves the
result for facets tilted from P (Kn−r ).
Let (π, π0 ) be tilted from (ρ, 1), and let ρ be as described in Theorem 2 and
α be the corresponding tilting coefficient. Since (ρ, 1) is a facet of P (Kr ), there
exist r − 1 affinely independent extreme points x1 , . . . , xr−1 satisfying (ρ, 1)
at equality. As described in Proposition 1, these points may be extended to
points t1 , . . . , tr−1 ∈ PMIC (n, r), and clearly this preserves affine independence.
Moreover, for i = 1, . . . , r − 1, μti = 1 and ρti = ρx = 1, thus

πti = (ρ + αμ)ti = ρti + α · μti = 1 + α = πr .

Now consider n−r affinely independent extreme points y 1 , . . . , y n−r of P (Kn−r ),


and again as in Proposition 1, extend them to points s1 , . . . , sn−r ∈ PMIC (n, r).

πsi = (ρ + αμ)si = ρsi + α · μsi = 1 + α = πr .

It is easily seen that {t1 , . . . , tr−1 } ∩ {s1 , . . . , sn−r } = er . Therefore we have


produced n − 2 affinely independent points, proving the claim.

Consider a tilted knapsack facet (π, π0 ) arising from the facet (ρ, 1) of P (Kr )
with tilting coefficient α. Letting μ denote first r coefficients of μ, the same
facet of P (Kr ) is described by (γ, 0) = (ρ, 1) − (μ , 1). In particular letting,

(γ̄, 0) = (γ1 , . . . , γr = 0, 0, . . . , 0),

it follows that (π, π0 ) = (γ̄, 0)+(1+α)(μ, 1). The same applies to tilted knapsack
facets arising from P (Kn−r ).
128 S. Tyber and E.L. Johnson

Therefore we will think of tilted knapsack facets as arising from facets of


the form (ρ, 0), and by subtracting off the mixed integer cut we think of tilted
knapsack facets in the form (ρ̄, 0).
We now prove our main result.
Theorem 3. The convex hull of PMIC (n, r) is given by the tilted knapsack facets
and the non-negativity constraints.

Proof. For convenience, say that P (Kr ) has non-trivial facets (ρ1 , 0), . . . , (ρM , 0)
and that P (Kn−r ) has non-trivial facets (γ 1 , 0), . . . , (γ N , 0). Let (ρ̄i , 0) and
(γ̄ i , 0) denote the tilted knapsack facets from (ρi , 0) and (γ i , 0) respectively.
We shall show that the system

min c · t
s.t. μ · t = 1
ρ̄i · t ≥ 0 i = 1, . . . M (2)
γ̄ i · t ≥ 0 i = 1, . . . N
t ≥0

attains an integer optimum that belongs to PMIC (n, r) for every c. 
Let c = (c1 , . . . ,!cr ) and c = (cn−1 , . . . , cr ), μ = 1r , . . . , r−1 
r , 1 , and μ =
1 n−r−1
n−r , . . . , n−r , 1 . Consider the systems

min c · x
s.t. μ · x = 1
(3)
ρi · x ≥ 0 i = 1, . . . M
x ≥0

and
min c · x
s.t. μ · x = 1
(4)
γ i · x ≥ 0 i = 1, . . . N
x ≥0
representing P (Kr ) and P (Kn−r ) respectively. Since both systems are integral,
the minima are obtained at integer extreme points x0 and x 1 respectively. Now

let t be obtained by extending the solution achieving the smaller objective value
to a feasible point of PMIC (n, r). Indeed this t∗ is feasible and integral; it remains
to show that it is optimal.
We now consider the duals. The dual of (3) is given by

max(λ1 : λ1 μ + α1 ρ1 + · · · + αM ρM ≤ c , α ≥ 0), (5)


λ1 ,α

and the dual of (4) is given by

max (λ2 : λ2 μ + β1 γ 1 + · · · + βN



γ N ≤ c , β  ≥ 0). (6)
λ2 ,β
A Polyhedral Study of the Mixed Integer Cut 129

Lastly the dual of (2) is given by


 
λμ + α1 ρ̄1 + · · · + αM ρ̄M + β1 γ̄ 1 + · · · + βN γ̄ N ≤ c
max λ : . (7)
λ,α,β α, β ≥ 0

11 , α0 ) and (λ
Let (λ 12 , β0 ) attain the maxima in (5) and (6) respectively. Setting
0 1 1
λ = min(λ1 , λ2 ), it easily follows from the zero pattern of (2) and non-negativity
of μ that (λ, 0 α0 , β0 ) is feasible to (7). Moreover λ
0 = c · t∗ , proving optimality.

Further observe that PMIC (n, r) is pointed, and so from this same proof we get
the following characterization of extreme points:
Theorem 4. A point t is an extreme point of PMIC (n, r) if and only if it can
be obtained by extending an extreme point of P (Kr ) or P (Kn−r ).

3 Integer Points of the Mixed Integer Cut

In this section we highlight a noteworthy extension of Theorem 4 to all integer


points of PMIC (n, r).
Theorem 5. t ∈ PMIC (n, r) ∩ Zn−1 if and only if t can be obtained by extending
an integer solution of P (Kr ) or P (Kn−r ).

Proof. If t = er the claim is obvious. So we suppose that tr = 0. We shall show


that if t ∈ PMIC (n, r) ∩ Zn−1 , tr = 0, then either (I) (t1 , . . . , tr ) > 0 or (II)
(tr , . . . , tn−1 ) > 0 but not both.
Since t ∈ P (Cn , r),

t1 + · · · + (r − 1)tr−1 + rtr + (r + 1)tr+1 + · · · + (n − 1)tn−1 ≡ r (mod n).

Thus there exists some β ∈ Z such that

t1 + · · · + (r − 1)tr−1 + rtr + (r + 1)tr+1 + · · · + (n − 1)tn−1 = r + βn

, and since r > 0, we may rewrite this


1 r−1 r+1 n−1 n
t1 + · · · + tr−1 + tr + tr+1 + · · · + tn−1 = 1 + β . (8)
r r r r r
Now, t ∈ PMIC (n, r) therefore

1 r−1 n−r−1 1
t1 + · · · + tr−1 + tr + tr+1 + · · · + tn−1 = 1
r r n−r n−r
or
" #
1 r−1 n−r−1 1
t1 + · · · + tr−1 + tr = 1 − tr+1 + · · · + tn−1 . (9)
r r n−r n−r
130 S. Tyber and E.L. Johnson

Substituting (9) into (8), we obtain


$ %
1− n−r−1
n−r tr+1 + · · · + n−r
1
tn−1 + r+1 r tr+1 + · · · + r tn−1 = 1 + β r
n−1 n

! !
⇒ r+1
r − n−r−1
n−r t r+1 + · · · + n−1
r − 1
n−r tn−1 = β r
n

⇒ n
r · n−r tr+1 + · · · + r · n−r tn−1
1 n n−r−1
= β nr
⇒ n−r tr+1 + · · · + n−r tn−1 =
1 n−r−1
β
" #
n−r−1 1
⇒ [tr+1 + · · · + tn−1 ] − tr+1 + · · · + tn−1 = β.
2 34 5 n−r n−r
(∗) 2 34 5
(∗∗)

Because t was assumed to be integral (∗) is necessarily integral. Suppose con-


versely that both (I) and (II) hold; by the assumption that tr = 0 and because
t is necessarily non-negative, the relation
" #
n−r−1 1 1 r−1
tr+1 + · · · + tn−1 = 1 − t1 + · · · + tr−1 + tr
n−r n−r r r

implies that (∗∗) must be fractional. But this contradicts that β is integral.
Therefore (I) and (II) cannot simultaneously hold.

4 Extensions under Automorphisms and Homomorphisms

Here we review some general properties of facets of the master group polyhedra
and discuss extensions of our previous results. Throughout, some basic knowledge
of algebra is assumed.
Let G be an abelian group with identity 0, G+ = G \ 0, and g0 ∈ G+ . The
master group polyhedron, P (G, g0 ) is defined by
⎧ ⎫
⎨  ⎬
|G|−1
P (G, g0 ) = conv t ∈ Z+ : gt(g) = g0 .
⎩ ⎭
g∈G+

Because |G|g = 0 for all g ∈ G+ , the recession cone of P (G, g0 ) is the non-negative
orthant, and since P (G, g0 ) is nonempty, the polyhedron is of full dimension.
As before, let (π, π0 ) denote the inequality

π(g)t(g) ≥ π0 .
g∈G+

If |G| − 1 ≥ 2, then the inequality t(g) ≥ 0 is facet defining for all g ∈ G+ , and
it is easily verified that these are the only facets with π0 = 0. Likewise, we call
these the trivial facets of P (G+ , g0 ).
A Polyhedral Study of the Mixed Integer Cut 131

4.1 Automorphisms

We are able to use automorphisms of $G$ to obtain facets of $P(G, g_0)$ from other master group polyhedra. Throughout, let $\phi$ be an automorphism of $G$.

Theorem 6 (Gomory [3], Theorem 14). If $(\pi, \pi_0)$ is a facet of $P(G, g_0)$ with components $\pi(g)$, then $(\pi', \pi_0)$ with components $\pi'(g) = \pi(\phi^{-1}(g))$ is a facet of $P(G, \phi(g_0))$.

Similarly, if $t$ satisfies $(\pi, \pi_0)$ at equality, then $t'$ with components $t'(g) = t(\phi^{-1}(g))$ satisfies $(\pi', \pi_0)$ at equality, and since $\phi$ is an automorphism of $G$, $t'$ necessarily satisfies the group equation for $P(G, \phi(g_0))$. As an obvious consequence, a point $t$ lies on the facet $(\pi, \pi_0)$ of $P(G, g_0)$ if and only if the corresponding point $t'$ lies on the facet $(\pi', \pi_0)$ of $P(G, \phi(g_0))$. Hence we obtain the following:

Proposition 4. If $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ are facets of $P(G, g_0)$, then $(\gamma, \gamma_0)$ is adjacent to $(\pi, \pi_0)$ if and only if $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ are adjacent facets of $P(G, \phi(g_0))$, where $\pi'(g) = \pi(\phi^{-1}(g))$ and $\gamma'(g) = \gamma(\phi^{-1}(g))$.

Proof. Since $(\gamma, \gamma_0)$ and $(\pi, \pi_0)$ define adjacent facets, there exist affinely independent points $t^1, \ldots, t^{|G|-2}$ satisfying both at equality. By the previous remarks, we may define points $(t^1)', \ldots, (t^{|G|-2})'$ satisfying both $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ at equality. Since these are all defined by the same permutation of the indices of $t^1, \ldots, t^{|G|-2}$, affine independence is preserved.

Now consider the case when $G = C_n$, $g_0 = r$. Let $(\mu', 1)$ be obtained by applying $\phi$ to $(\mu, 1)$. Our previous results extend in the following sense:

Theorem 7. The non-trivial facets of $P(C_n, \phi(r))$ adjacent to $(\mu', 1)$ are exactly those obtained by applying $\phi$ to tilted knapsack facets.

Theorem 8. An integer point $t \in P(C_n, \phi(r))$ satisfies $(\mu', 1)$ at equality if and only if $t$ is obtained by extending a knapsack solution of $P(K_r)$ or $P(K_{n-r})$ and applying $\phi$ to the indices of $t$.

4.2 Homomorphisms

Additionally, one can obtain facets from homomorphisms of $G$ by homomorphic lifting. Let $\psi : G \to H$ be a homomorphism with kernel $K$ such that $g_0 \notin K$. For convenience, let $h_0 = \psi(g_0)$.

Theorem 9 (Gomory [3], Theorem 19). Let $(\pi, \pi_0)$ be a non-trivial facet of $P(H, h_0)$. Then $(\pi', \pi_0)$ is a facet of $P(G, g_0)$, where $\pi'(g) = \pi(\psi(g))$ for all $g \in G \setminus K$ and $\pi'(k) = 0$ for all $k \in K$.

Unlike with automorphisms, it is not clear that homomorphic lifting preserves the adjacency of facets. We show next that it in fact does preserve adjacency.
First we prove the following useful proposition:

Proposition 5. Let $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ be adjacent non-trivial facets of $P(H, h_0)$ ($h_0 \neq 0$). Then the face
$$T = P(H, h_0) \cap \{t \in \mathbb{R}^{|H|-1} : \pi t = \pi_0,\ \gamma t = \gamma_0\}$$
does not lie in the hyperplane $H(h) = \{t \in \mathbb{R}^{|H|-1} : t(h) = 0\}$ for any $h \in H_+$.

Proof. In [3], Gomory shows that every non-trivial facet $(\pi, \pi_0)$ of $P(H, h_0)$ satisfies $\pi(h) + \pi(h_0 - h) = \pi(h_0) = \pi_0$. In particular, for all $h \in H_+ \setminus \{h_0\}$, the point $t = e_h + e_{h_0 - h}$ belongs to $T$ and has $t(h) > 0$. Similarly, the point $t = e_{h_0}$ belongs to $T$ and has $t(h_0) > 0$.
Using this proposition we obtain the following:

Lemma 2. Let $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ be adjacent non-trivial facets of $P(H, h_0)$, and let $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ be facets of $P(G, g_0)$ obtained by homomorphic lifting using the homomorphism $\psi$. Then $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ are adjacent.
Proof. Let $K = \ker(\psi)$. Let $\varphi$ be a function selecting one element from each coset of $G/K$ distinct from $K$, and let $\varphi(H)$ denote the set of coset representatives chosen by $\varphi$.

Since we are assuming that $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ are obtained by homomorphic lifting, $h_0 \neq 0$. Since $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ are adjacent, there exist affinely independent points $t^1, \ldots, t^{|H|-2}$ in $P(H, h_0)$ satisfying $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ at equality. By Proposition 5, for all $h \in H_+$ there exists an $i \in \{1, \ldots, |H|-2\}$ such that $t^i(h) > 0$.

Using these points, we will construct $|G| - 2$ affinely independent points belonging to $P(G, g_0)$ that satisfy both $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ at equality. We proceed as follows:

1. Set $N = H_+$.
2. For $i = 1, \ldots, |H| - 2$:
   - Set $N(i) = \{h \in H_+ : t^i(h) > 0\} \cap N$.
   - Define $s^i$ as follows:
     $s^i(\varphi(h)) = t^i(h)$ for all $h \in H_+$;
     $s^i(g) = 0$ for $g \in G \setminus (K \cup \varphi(H))$;
     $s^i(k) = 1$ for $k = g_0 - \sum_{g \in G_+ \setminus K} s^i(g) \cdot g$, and $s^i(k) = 0$ for all other $k \in K$.
   - For each $h' \in N(i)$ and $k' \in K_+$, define the point $s^i_{k',h'}$ as follows:
     $s^i_{k',h'}(\varphi(h') + k') = t^i(h')$;
     $s^i_{k',h'}(\varphi(h')) = 0$;
     $s^i_{k',h'}(g) = s^i(g)$ for $g \in G_+ \setminus K$ with $g \neq \varphi(h')$ and $g \neq \varphi(h') + k'$;
     $s^i_{k',h'}(k) = 1$ for $k = g_0 - \sum_{g \in G_+ \setminus K} s^i_{k',h'}(g) \cdot g$, and $s^i_{k',h'}(k) = 0$ for all other $k \in K$.
   - Set $N = N \setminus N(i)$.
3. For each $k \in K_+$, define $s_k$ by $s_k = s^1 + |G|\, e_k$.
By construction, these points satisfy $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ at equality. It remains to verify that the above procedure indeed produces $|G| - 2$ affinely independent points belonging to $P(G, g_0)$.

First we show that the above points belong to $P(G, g_0)$. Let $s$ be one of the above points. Then
$$\psi\Bigl(\sum_{g \in G_+ \setminus K} g\, s(g)\Bigr) = \sum_{g \in G_+ \setminus K} \psi(g)\, s(g) = \sum_{h \in H_+} h \cdot \Bigl(\sum_{g \in G_+ : \psi(g) = h} s(g)\Bigr) = h_0,$$
where the first equality comes from the fact that $\psi$ is a homomorphism and the second equality follows from how we defined the above points. Therefore,
$$\sum_{g \in G_+ \setminus K} g\, s(g) \in g_0 + K,$$
and by construction,
$$\sum_{k \in K_+} k\, s(k) = g_0 - \Bigl(\sum_{g \in G_+ \setminus K} g\, s(g)\Bigr).$$
Thus $s \in P(G, g_0)$.

Note that we have the $|H| - 2$ points $s^1, \ldots, s^{|H|-2}$. By Proposition 5, we obtain $(|H| - 1)(|K| - 1)$ points of the form $s^i_{k,h}$ for $k \in K_+$ and $h \in H_+$, and lastly, we obtain $|K| - 1$ points $s_k$ for $k \in K_+$. Using the identity $|G| = |K||H|$, it immediately follows that we have $(|H| - 2) + (|H| - 1)(|K| - 1) + (|K| - 1) = |K||H| - 2 = |G| - 2$ points.
The affine independence of these points is easily verified. Construct a matrix whose rows are the points above, in which the first $|K| - 1$ columns correspond to $K$, the next $|H| - 1$ columns correspond to $\varphi(H)$, and the remaining columns are arranged in blocks by the cosets; using the affine independence of $t^1, \ldots, t^{|H|-2}$, it is then readily observed that the newly defined points are affinely independent.

Given a point $s \in P(G, g_0)$ that satisfies the lifted facets at equality, we can obtain a point $t \in P(H, h_0)$ that satisfies $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ at equality under the mapping $t(h) = \sum_{g \in G : \psi(g) = h} s(g)$. By a fairly routine exercise in linear algebra, one can use this to verify that $s$ is in the affine hull of the points described above.
Hence we obtain the following theorem:

Theorem 10. Let $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ be non-trivial facets of $P(H, h_0)$, and let $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ be facets of $P(G, g_0)$ obtained by homomorphic lifting using the homomorphism $\psi$. Then $(\pi', \pi_0)$ and $(\gamma', \gamma_0)$ are adjacent if and only if $(\pi, \pi_0)$ and $(\gamma, \gamma_0)$ are adjacent.

Now consider $G = C_{n'}$, $g_0 = r'$, a homomorphism $\psi : C_{n'} \to C_n$ with $\psi(r') = r \neq 0$, and let $(\mu', 1)$ be obtained by applying homomorphic lifting to $(\mu, 1)$. Then, by applying Theorem 10, we know that the only lifted facets under $\psi$ that are adjacent to $(\mu', 1)$ come from tilted knapsack facets. Stated precisely:
Theorem 11. Let $(\pi', \pi_0)$ be obtained by homomorphic lifting using $\psi$ applied to $(\pi, \pi_0)$. Then $(\pi', \pi_0)$ is adjacent to $(\mu', 1)$ if and only if $(\pi, \pi_0)$ is a tilted knapsack facet.

Moreover, for the integer points we obtain the following:

Theorem 12. If an integer point $s \in P(C_{n'}, r')$ satisfies $(\mu', 1)$ at equality, then the point $t$ defined by the mapping
$$t_i = \sum_{j : \psi(j) = i} s_j$$
is an integer point of $P(C_n, r)$ and satisfies $(\mu, 1)$ at equality. In particular, it is obtained by extending a knapsack solution of $P(K_r)$ or $P(K_{n-r})$.

5 Future Work and Conclusions


Several questions remain for both the group polyhedron and knapsack polytope.
One worthy avenue of research is to expand the existing library of knapsack
facets, which in turn will provide even more information about the mixed integer
cut.
Another interesting problem is to obtain non-trivial necessary and sufficient
conditions to describe the extreme points of the master knapsack polytope and
the master group polyhedron. A natural candidate condition, irreducibility, was considered for the group polyhedron; it is necessary for all vertices but not sufficient. One might hope that this condition becomes sufficient for the master knapsack polytope; however, it again fails.
Lastly, a closer inspection will reveal that in homomorphic lifting we gain no
information about the kernel of our homomorphism. If we consider the lifted
mixed integer cut as a polyhedron, it is no longer sufficient to characterize its
extreme points in terms of two related knapsacks. Similarly, it is easy to see that
lifted tilted knapsack facets are not the only adjacent non-trivial facets of the
lifted mixed integer cut. One might address whether there exists a family of facets
that when added to the lifted tilted knapsack facets completely characterizes the
adjacent facets of the lifted mixed integer cut.

References
1. Aráoz, J., Evans, L., Gomory, R.E., Johnson, E.L.: Cyclic group and knapsack facets.
Math. Program. 96(2), 377–408 (2003)
2. Dash, S., Günlük, O.: On the strength of Gomory mixed-integer cuts as group cuts.
Math. Program. 115(2), 387–407 (2008)
3. Gomory, R.E.: Some polyhedra related to combinatorial problems. Linear Algebra
and Its Applications (2), 451–558 (1969)
4. Gomory, R.E., Johnson, E.L., Evans, L.: Corner polyhedra and their connection
with cutting planes. Math. Program. 96(2), 321–339 (2003)
Symmetry Matters for the Sizes of Extended
Formulations

Volker Kaibel, Kanstantsin Pashkovich, and Dirk O. Theis

Otto-von-Guericke-Universität Magdeburg, Institut für Mathematische Optimierung,
Universitätsplatz 2, 39108 Magdeburg, Germany
{kaibel,pashkovich,theis}@ovgu.de

Abstract. In 1991, Yannakakis [17] proved that no symmetric extended formulation for the matching polytope of the complete graph $K_n$ with $n$ nodes has a number of variables and constraints that is bounded subexponentially in $n$. Here, symmetric means that the formulation remains invariant under all permutations of the nodes of $K_n$. It was also conjectured in [17] that “asymmetry does not help much,” but no corresponding result for general extended formulations has been found so far. In this paper we show that for the polytopes associated with the matchings in $K_n$ with $\log n$ edges there are non-symmetric extended formulations of polynomial size, while nevertheless no symmetric extended formulation of polynomial size exists. We furthermore prove similar statements for the polytopes associated with cycles of length $\log n$. Thus, with respect to the question for smallest possible extended formulations, in general symmetry requirements may matter a lot.

1 Introduction
Linear Programming techniques have proven to be extremely fruitful for com-
binatorial optimization problems with respect to both structural analysis and
the design of algorithms. In this context, the paradigm is to represent the prob-
lem by a polytope $P \subseteq \mathbb{R}^m$ whose vertices correspond to the feasible solutions of the problem in such a way that the objective function can be expressed by a linear functional $x \mapsto \langle c, x \rangle$ on $\mathbb{R}^m$ (with some $c \in \mathbb{R}^m$). If one succeeds in finding a description of $P$ by means of linear constraints, then algorithms as well as structural results from Linear Programming can be exploited. In many cases, however, the polytope $P$ has exponentially (in $m$) many facets, thus $P$ can only be described by exponentially many inequalities. Also it may be that the inequalities needed to describe $P$ are too complicated to be identified.

In some of these cases one may find an extended formulation for $P$, i.e., a (preferably small and simple) description by linear constraints of another polyhedron $Q \subseteq \mathbb{R}^d$ in some higher dimensional space that projects to $P$ via some (simple) linear map $p : \mathbb{R}^d \to \mathbb{R}^m$ with $p(y) = Ty$ for all $y \in \mathbb{R}^d$ (and some matrix $T \in \mathbb{R}^{m \times d}$). Indeed, if $p' : \mathbb{R}^m \to \mathbb{R}^d$ with $p'(x) = T^t x$ for all $x \in \mathbb{R}^m$ denotes the linear map that is adjoint to $p$ (with respect to the standard bases), then we have $\max\{\langle c, x \rangle : x \in P\} = \max\{\langle p'(c), y \rangle : y \in Q\}$.
As for an example, let us consider the spanning tree polytope $P_{\mathrm{spt}}(n) = \operatorname{conv}\{\chi(T) \in \{0,1\}^{E_n} : T \subseteq E_n \text{ spanning tree of } K_n\}$, where $K_n = ([n], E_n)$ denotes the complete graph with node set $[n] = \{1, \ldots, n\}$ and edge set $E_n = \{\{v,w\} : v, w \in [n], v \neq w\}$, and $\chi(A) \in \{0,1\}^B$ is the characteristic vector of the subset $A \subseteq B$ of $B$, i.e., for all $b \in B$, we have $\chi(A)_b = 1$ if and only if $b \in A$. Thus, $P_{\mathrm{spt}}(n)$ is the polytope associated with the bases of the graphical matroid of $K_n$, and hence (see [7]), it consists of all $x \in \mathbb{R}^{E_n}_+$ satisfying $x(E_n) = n - 1$ and $x(E_n(S)) \le |S| - 1$ for all $S \subseteq [n]$ with $2 \le |S| \le n - 1$, where $\mathbb{R}^{E_n}_+$ is the nonnegative orthant of $\mathbb{R}^{E_n}$, we denote by $E_n(S)$ the subset of all edges with both nodes in $S$, and $x(F) = \sum_{e \in F} x_e$ for $F \subseteq E_n$. This linear description of $P_{\mathrm{spt}}(n)$ has an exponential (in $n$) number of constraints, and as all the inequalities define pairwise disjoint facets, none of them is redundant.
The following much smaller extended formulation for $P_{\mathrm{spt}}(n)$ (with $O(n^3)$ variables and constraints) appears in [5] (and a similar one in [17], who attributes it to [13]). Let us introduce additional 0/1-variables $z_{e,v,u}$ for all $e \in E_n$, $v \in e$, and $u \in [n] \setminus e$. While each spanning tree $T \subseteq E_n$ is represented by its characteristic vector $x^{(T)} = \chi(T)$ in $P_{\mathrm{spt}}(n)$, in the extended formulation it will be represented by the vector $y^{(T)} = (x^{(T)}, z^{(T)})$ with $z^{(T)}_{e,v,u} = 1$ (for $e \in E_n$, $v \in e$, $u \in [n] \setminus e$) if and only if $e \in T$ and $u$ is contained in the component of $v$ in $T \setminus e$. The polyhedron $Q_{\mathrm{spt}}(n) \subseteq \mathbb{R}^d$ defined by the nonnegativity constraints $x \ge 0$, $z \ge 0$, the equations $x(E_n) = n - 1$, $x_{\{v,w\}} - z_{\{v,w\},v,u} - z_{\{v,w\},w,u} = 0$ for all pairwise distinct $v, w, u \in [n]$, as well as $x_{\{v,w\}} + \sum_{u \in [n] \setminus \{v,w\}} z_{\{v,u\},u,w} = 1$ for all distinct $v, w \in [n]$, satisfies $p(Q_{\mathrm{spt}}(n)) = P_{\mathrm{spt}}(n)$, where $p : \mathbb{R}^d \to \mathbb{R}^{E_n}$ is the orthogonal projection onto the $x$-variables.
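To see the formulation in action, the following small sketch (ours, purely illustrative) constructs the section point $y^{(T)} = (x^{(T)}, z^{(T)})$ for a given spanning tree of $K_n$ and verifies that it satisfies all equations of $Q_{\mathrm{spt}}(n)$:

```python
from itertools import combinations

def component_of(tree, removed, start):
    """Nodes reachable from `start` in the forest `tree - {removed}` (DFS)."""
    adj = {}
    for e in tree:
        if e == removed:
            continue
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, stack = {start}, [start]
    while stack:
        for w in adj.get(stack.pop(), []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def section_point(n, tree):
    """y^(T) = (x^(T), z^(T)): z[e,v,u] = 1 iff e in T and u lies in the
    component of v in T - e."""
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    x = {e: int(e in tree) for e in edges}
    z = {(e, v, u): int(e in tree and u in component_of(tree, e, v))
         for e in edges for v in e for u in set(range(n)) - e}
    return x, z

def check_in_Qspt(n, tree):
    x, z = section_point(n, tree)
    assert sum(x.values()) == n - 1                      # x(E_n) = n - 1
    for e in x:                                          # first equation family
        v, w = tuple(e)
        for u in set(range(n)) - e:
            assert x[e] == z[(e, v, u)] + z[(e, w, u)]
    for v in range(n):                                   # second equation family
        for w in range(n):
            if v != w:
                s = sum(z[(frozenset({v, u}), u, w)]
                        for u in range(n) if u not in (v, w))
                assert x[frozenset({v, w})] + s == 1

check_in_Qspt(5, {frozenset(e) for e in [(0, 1), (1, 2), (1, 3), (3, 4)]})
print("section point lies in Q_spt(5)")
```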
For many other polytopes (with exponentially many facets) associated with
polynomial time solvable combinatorial optimization problems polynomially sized
extended formulations can be constructed as well (see, e.g., the recent survey [5]).
Probably the most prominent problem in this class for which, however, no such
small formulation is known, is the matching problem. In fact, Yannakakis [17]
proved that no symmetric polynomially sized extended formulation of the match-
ing polytope exists.
Here, symmetric refers to the symmetric group S(n) of all permutations
π : [n] → [n] of the node set [n] of Kn acting on En via π.{v, w} = {π(v), π(w)}
for all π ∈ S(n) and {v, w} ∈ En . Clearly, this action of S(n) on En induces
an action on the set of all subsets of En . For instance, this yields an action
on the spanning trees of Kn , and thus, on the vertices of Pspt (n). The ex-
tended formulation of Pspt (n) discussed above is symmetric in the sense that,
for every π ∈ S(n), replacing all indices associated with edges e ∈ En and
nodes v ∈ [n] by π.e and π.v, respectively, does not change the set of constraints
in the formulation. Phrased informally, all subsets of nodes of Kn of equal cardi-
nality play the same role in the formulation. For a general definition of symmetric
extended formulations see Section 2.
In order to describe the main results of Yannakakis' paper [17] and the contributions of the present paper, let us denote by $M^\ell(n) = \{M \subseteq E_n : M \text{ matching in } K_n,\ |M| = \ell\}$ the set of all matchings of size $\ell$ (a matching

being a subset of edges no two of which share a node), and by Pmatch (n) =
conv{χ(M ) ∈ {0, 1}En : M ∈ M (n)} the associated polytope. According to
n/2
Edmonds [6] the perfect matching polytope Pmatch (n) (for even n) is described
by
n/2
Pmatch (n) = {x ∈ RE
+ : x(δ(v)) = 1 for all v ∈ [n],
n

x(E(S)) ≤ (|S| − 1)/2 for all S ⊆ [n], 3 ≤ |S| odd} (1)

(with $\delta(v) = \{e \in E_n : v \in e\}$). Yannakakis [17, Thm. 1 and its proof] shows that there is a constant $C > 0$ such that, for every extended formulation for $P^{n/2}_{\mathrm{match}}(n)$ (with $n$ even) that is symmetric in the sense above, the number of variables and constraints is at least $C \cdot \binom{n}{n/4} = 2^{\Omega(n)}$. This in particular implies that there is no polynomial size symmetric extended formulation for the matching polytope of $K_n$ (the convex hull of characteristic vectors of all matchings in $K_n$), of which the perfect matching polytope is a face.

Yannakakis [17] also obtains a similar (maybe less surprising) result on traveling salesman polytopes. Denoting the set of all (simple) cycles of length $\ell$ in $K_n$ by $C^\ell(n) = \{C \subseteq E_n : C \text{ cycle in } K_n,\ |C| = \ell\}$, and the associated polytopes by $P^\ell_{\mathrm{cycl}}(n) = \operatorname{conv}\{\chi(C) \in \{0,1\}^{E_n} : C \in C^\ell(n)\}$, the traveling salesman polytope is $P^n_{\mathrm{cycl}}(n)$. Identifying $P^{n/2}_{\mathrm{match}}(n)$ (for even $n$) with a suitable face of $P^{3n}_{\mathrm{cycl}}(3n)$, Yannakakis concludes that all symmetric extended formulations for $P^n_{\mathrm{cycl}}(n)$ have size at least $2^{\Omega(n)}$ as well [17, Thm. 2 and its proof].
Yannakakis’ results in a fascinating way illuminate the borders of our principal
abilities to express combinatorial optimization problems like the matching or the
traveling salesman problem by means of linear constraints. However, they only
refer to linear descriptions that respect the inherent symmetries in the problems.
In fact, the second open problem mentioned in the concluding section of [17] is
described as follows: “We do not think that asymmetry helps much. Thus, prove
that the matching and TSP polytopes cannot be expressed by polynomial size
LP’s without the asymmetry assumption.”
The contribution of our paper is to show that, in contrast to the assumption expressed in the quotation above, asymmetry can help much, or, phrased differently, that symmetry requirements on extended formulations indeed can matter significantly with respect to the minimal sizes of extended formulations. Our main results are that both $P^{\log n}_{\mathrm{match}}(n)$ and $P^{\log n}_{\mathrm{cycl}}(n)$ do not admit symmetric extended formulations of polynomial size, while they have non-symmetric extended formulations of polynomial size (see Cor. 1 and 2 for matchings, as well as Cor. 3 and 4 for cycles). The corresponding theorems from which these corollaries are derived provide some more general and more precise results for $P^\ell_{\mathrm{match}}(n)$ and $P^\ell_{\mathrm{cycl}}(n)$. In order to establish the lower bounds for symmetric extensions, we generalize the techniques developed by Yannakakis [17]. The constructions of the compact non-symmetric extended formulations rely on small families of perfect hash functions [1,8,15].
The paper is organized as follows. In Section 2, we provide definitions of
extensions, extended formulations, their sizes, the crucial notion of a section
of an extension, and we give some auxiliary results. In Section 3, we present Yannakakis' method to derive lower bounds on the sizes of symmetric extended formulations for perfect matching polytopes in a general setting, which we then exploit in Section 4 in order to derive lower bounds on the sizes of symmetric extended formulations for the polytopes $P^\ell_{\mathrm{match}}(n)$ associated with cardinality restricted matchings. In Section 5, we describe our non-symmetric extended formulations for these polytopes. Finally, in Section 6 we present the results on $P^\ell_{\mathrm{cycl}}(n)$. Some remarks conclude the paper in Section 7.

2 Extended Formulations, Extensions, and Symmetry

An extension of a polytope P ⊆ Rm is a polyhedron Q ⊆ Rd together with


a projection (i.e., a linear map) p : Rd → Rm with p(Q) = P ; it is called a
subspace extension if Q is the intersection of an affine subspace of Rd and the
nonnegative orthant Rd+ . For instance, the polyhedron Qspt (n) defined in the
Introduction is a subspace extension of the spanning tree polytope Pspt (n). A
(finite) system of linear equations and inequalities whose solutions are the points
in an extension Q of P is an extended formulation for P . The size of an extension
is the number of its facets plus the dimension of the space it lies in. The size of
an extended formulation is its number of inequalities (including nonnegativity
constraints, but not equations) plus its number of variables. Clearly, the size
of an extended formulation is at least as large as the size of the extension it
describes. Conversely, every extension is described by an extended formulation
of at most its size.
Extensions or extended formulations of a family of polytopes P ⊆ Rm (for
varying m) are compact if their sizes and the encoding lengths of the coeffi-
cients needed to describe them can be bounded by a polynomial in m and the
maximal encoding length of all components of all vertices of P . Clearly, the
extension Qspt (n) of Pspt (n) from the Introduction is compact.
In our context, sections s : X → Q play a crucial role, i.e., maps that assign
to every vertex x ∈ X of P some point s(x) ∈ Q ∩ p−1 (x) in the intersection
of the polyhedron Q and the fiber p−1 (x) = {y ∈ Rd : p(y) = x} of x under
the projection p. Such a section induces a bijection between X and its image
$s(X) \subseteq Q$, whose inverse is given by $p$. In the spanning tree example from the Introduction, the assignment $\chi(T) \mapsto y^{(T)} = (x^{(T)}, z^{(T)})$ defines such a section.
Note that, in general, sections will not be induced by linear maps. In fact, if a
section is induced by a linear map s : Rm → Rd , then the intersection of Q with
the affine subspace of Rd generated by s(X) is isomorphic to P , thus Q has at
least as many facets as P .
For a family $\mathcal{F}$ of subsets of $X$, an extension $Q \subseteq \mathbb{R}^d$ is said to be indexed by $\mathcal{F}$ if there is a bijection between $\mathcal{F}$ and $[d]$ such that (identifying $\mathbb{R}^{\mathcal{F}}$ with $\mathbb{R}^d$ via this bijection) the map $\mathbb{1}_{\mathcal{F}} = (\mathbb{1}_F)_{F \in \mathcal{F}} : X \to \{0,1\}^{\mathcal{F}}$, whose component functions are the characteristic functions $\mathbb{1}_F : X \to \{0,1\}$ (with $\mathbb{1}_F(x) = 1$ if and only if $x \in F$), is a section for the extension, i.e., $\mathbb{1}_{\mathcal{F}}(X) \subseteq Q$ and $p(\mathbb{1}_{\mathcal{F}}(x)) = x$ hold for all $x \in X$. For instance, the extension $Q_{\mathrm{spt}}(n)$ of $P_{\mathrm{spt}}(n)$ is indexed by
the family $\{T(e) : e \in E_n\} \cup \{T(e, v, u) : e \in E_n, v \in e, u \in [n] \setminus e\}$, where $T(e)$ contains all spanning trees using edge $e$, and $T(e, v, u)$ consists of all spanning trees in $T(e)$ for which $u$ and $v$ are in the same component of $T \setminus \{e\}$.
In order to define the notion of symmetry of an extension precisely, let the group $S(d)$ of all permutations of $[d] = \{1, \ldots, d\}$ act on $\mathbb{R}^d$ by coordinate permutations. Thus we have $(\sigma.y)_j = y_{\sigma^{-1}(j)}$ for all $y \in \mathbb{R}^d$, $\sigma \in S(d)$, and $j \in [d]$.
Let P ⊆ Rm be a polytope and G be a group acting on Rm with π.P = P
for all π ∈ G, i.e., the action of G on Rm induces an action of G on the set X
of vertices of P . An extension Q ⊆ Rd of P with projection p : Rd → Rm
is symmetric (with respect to the action of G), if for every π ∈ G there is a
permutation κπ ∈ S(d) with κπ .Q = Q and

p(κπ .y) = π.p(y) for all y ∈ Rd . (2)

The prime examples of symmetric extensions arise from extended formulations that “look symmetric”. To be more precise, we define an extended formulation $A^= y = b^=$, $A^\le y \le b^\le$ describing the polyhedron $Q = \{y \in \mathbb{R}^d : A^= y = b^=,\ A^\le y \le b^\le\}$ extending $P \subseteq \mathbb{R}^m$ as above to be symmetric (with respect to the action of $G$ on the set $X$ of vertices of $P$) if for every $\pi \in G$ there is a permutation $\kappa_\pi \in S(d)$ satisfying (2) and there are two permutations $\varrho^=_\pi$ and $\varrho^\le_\pi$ of the rows of $(A^=, b^=)$ and $(A^\le, b^\le)$, respectively, such that the corresponding simultaneous permutations of the columns and the rows of the matrices $(A^=, b^=)$ and $(A^\le, b^\le)$ leave them unchanged. Clearly, in this situation the permutations $\kappa_\pi$ satisfy $\kappa_\pi.Q = Q$, which implies the following.
Lemma 1. Every symmetric extended formulation describes a symmetric
extension.
One example of a symmetric extended formulation is the extended formulation
for the spanning tree polytope described in the Introduction (with respect to
the group G of all permutations of the nodes of the complete graph).
For the proof of the central result on the non-existence of certain symmetric
subspace extensions (Theorem 1), a weaker notion of symmetry will be sufficient.
We call an extension as above weakly symmetric (with respect to the action
of G) if there is a section s : X → Q for which the action of G on s(X)
induced by the bijection s works by permutation of variables, i.e., for every
π ∈ G there is a permutation κπ ∈ S(d) with s(π.x) = κπ .s(x) for all x ∈ X.
The following statement (and its proof, for which we refer to [12]) generalizes
the construction of sections for symmetric extensions of matching polytopes
described in Yannakakis’ paper [17, Claim 1 in the proof of Thm. 1].
Lemma 2. Every symmetric extension is weakly symmetric.
Finally, the following result (again, we refer to [12] for a proof) will turn out to
be useful in order to derive lower bounds on the sizes of symmetric extensions
for one polytope from bounds for another one.
Lemma 3. Let $Q \subseteq \mathbb{R}^d$ be an extension of the polytope $P \subseteq \mathbb{R}^m$ with projection $p : \mathbb{R}^d \to \mathbb{R}^m$, and let the face $P'$ of $P$ be an extension of a polytope $R \subseteq \mathbb{R}^k$ with projection $q : \mathbb{R}^m \to \mathbb{R}^k$. Then the face $Q' = p^{-1}(P') \cap Q \subseteq \mathbb{R}^d$ of $Q$ is an extension of $R$ via the composed projection $q \circ p : \mathbb{R}^d \to \mathbb{R}^k$.

If the extension $Q$ of $P$ is symmetric with respect to an action of a group $G$ on $\mathbb{R}^m$ (with $\pi.P = P$ for all $\pi \in G$), and a group $H$ acts on $\mathbb{R}^k$ such that, for every $\tau \in H$, we have $\tau.R = R$, and there is some $\pi_\tau \in G$ with $\pi_\tau.P' = P'$ and $q(\pi_\tau.x) = \tau.q(x)$ for all $x \in \mathbb{R}^m$, then the extension $Q'$ of $R$ is symmetric (with respect to the action of the group $H$).

3 Yannakakis’ Method
Here, we provide an abstract view on the method used by Yannakakis [17] in or-
der to bound from below the sizes of symmetric extensions for perfect matching
polytopes, without referring to these concrete polytopes. That method is capable
of establishing lower bounds on the number of variables of weakly symmetric
subspace extensions of certain polytopes. By the following lemma, which is ba-
sically Step 1 in the proof of [17, Theorem 1], such bounds imply similar lower
bounds on the dimension of the ambient space and the number of facets for
general symmetric extensions (that are not necessarily subspace extensions).

Lemma 4. If, for a polytope $P$, there is a symmetric extension in $\mathbb{R}^{\tilde d}$ with $f$ facets, then $P$ also has a symmetric subspace extension in $\mathbb{R}^d$ with $d \le 2\tilde d + f$.
The following simple lemma provides the strategy for Yannakakis' method, which we need to extend slightly by allowing restrictions to affine subspaces.

Lemma 5. Let $Q \subseteq \mathbb{R}^d$ be a subspace extension of the polytope $P \subseteq \mathbb{R}^m$ with vertex set $X \subseteq \mathbb{R}^m$, and let $s : X \to Q$ be a section for the extension. If $S \subseteq \mathbb{R}^m$ is an affine subspace and, for some $X' \subseteq X \cap S$, the coefficients $c_x \in \mathbb{R}$ ($x \in X'$) yield an affine combination of a nonnegative vector,
$$\sum_{x \in X'} c_x s(x) \ge 0_d \quad \text{with} \quad \sum_{x \in X'} c_x = 1, \tag{3}$$
from the section images of the vertices in $X'$, then $\sum_{x \in X'} c_x x \in P \cap S$ holds.

Proof. Since $Q$ is a subspace extension, we obtain $\sum_{x \in X'} c_x s(x) \in Q$ from $s(x) \in Q$ (for all $x \in X'$). Thus, if $p : \mathbb{R}^d \to \mathbb{R}^m$ is the projection of the extension, we derive
$$P \ni p\Bigl(\sum_{x \in X'} c_x s(x)\Bigr) = \sum_{x \in X'} c_x\, p(s(x)) = \sum_{x \in X'} c_x x. \tag{4}$$
As $S$ is an affine subspace containing $X'$, we also have $\sum_{x \in X'} c_x x \in S$.

Due to Lemma 5 one can prove that subspace extensions of some polytope P
with certain properties do not exist by finding, for such a hypothetical extension,
a subset $X'$ of vertices of $P$ and an affine subspace $S$ containing $X'$, for which one can construct coefficients $c_x \in \mathbb{R}$ satisfying (3) such that $\sum_{x \in X'} c_x x$ violates some inequality that is valid for $P \cap S$.
Actually, following Yannakakis, we will not apply Lemma 5 directly to a hy-
pothetical small weakly symmetric subspace extension, but we will rather first
construct another subspace extension from the one assumed to exist that is in-
dexed by some convenient family F . We say that an extension Q of a polytope P
is consistent with a family F of subsets of the vertex set X of P if there is a
section s : X → Q for the extension such that, for every component function sj
of s, there is a subfamily Fj of F such that sj is constant on every set in Fj , and
the sets in Fj partition X. In this situation, we also call the section s consistent
with F . The proof of the following lemma can be found in [12].
Lemma 6. If $P \subseteq \mathbb{R}^m$ is a polytope and $\mathcal{F}$ is a family of vertex sets of $P$ for which there is some extension $Q$ of $P$ that is consistent with $\mathcal{F}$, then there is some extension $Q'$ for $P$ that is indexed by $\mathcal{F}$. If $Q$ is a subspace extension, then $Q'$ can be chosen to be a subspace extension as well.
Lemmas 5 and 6 suggest the following strategy for proving that subspace extensions of some polytope $P$ with certain properties (e.g., being weakly symmetric and using at most $B$ variables) do not exist: (a) exhibit a family $\mathcal{F}$ of subsets of the vertex set $X$ of $P$ with which such an extension would be consistent, and (b) determine a subset $X' \subseteq X$ of vertices and an affine subspace $S$ containing $X'$ for which one can construct coefficients $c_x \in \mathbb{R}$ satisfying
$$\sum_{x \in X'} c_x \mathbb{1}_{\mathcal{F}}(x) \ge 0_{\mathcal{F}} \quad \text{with} \quad \sum_{x \in X'} c_x = 1, \tag{5}$$
such that $\sum_{x \in X'} c_x x$ violates some inequality that is valid for $P \cap S$.
Let us finally investigate more closely the sections that come with weakly symmetric extensions. In particular, we will discuss an approach to find suitable families $\mathcal{F}$ within the strategy mentioned above in the following setting. Let $Q \subseteq \mathbb{R}^d$ be a weakly symmetric extension of the polytope $P \subseteq \mathbb{R}^m$ (with respect to an action of the group $G$ on the vertex set $X$ of $P$) along with a section $s : X \to Q$ such that for every $\pi \in G$ there is a permutation $\kappa_\pi \in S(d)$ that satisfies $s(\pi.x) = \kappa_\pi.s(x)$ for all $x \in X$ (with $(\kappa_\pi.s(x))_j = s_{\kappa_\pi^{-1}(j)}(x)$).

In this setting, we can define an action of $G$ on the set $S = \{s_1, \ldots, s_d\}$ of the component functions of the section $s : X \to Q$ by $\pi.s_j = s_{\kappa_{\pi^{-1}}^{-1}(j)} \in S$ for each $j \in [d]$. In order to see that this definition indeed is well-defined (note that $s_1, \ldots, s_d$ need not be pairwise distinct functions) and yields a group action, observe that, for each $j \in [d]$ and $\pi \in G$, we have
$$(\pi.s_j)(x) = s_{\kappa_{\pi^{-1}}^{-1}(j)}(x) = (\kappa_{\pi^{-1}}.s(x))_j = s_j(\pi^{-1}.x) \quad \text{for all } x \in X, \tag{6}$$
from which one deduces $1.s_j = s_j$ for the identity element $1$ of $G$ as well as $(\pi\pi').s_j = \pi.(\pi'.s_j)$ for all $\pi, \pi' \in G$. The isotropy group of $s_j \in S$ under this action is $\operatorname{iso}_G(s_j) = \{\pi \in G : \pi.s_j = s_j\}$. From (6) one sees that, for all $x \in X$ and
$\pi \in \operatorname{iso}_G(s_j)$, we have $s_j(x) = s_j(\pi^{-1}.x)$. Thus, $s_j$ is constant on every orbit of the action of the subgroup $\operatorname{iso}_G(s_j)$ of $G$ on $X$. We conclude the following.
Remark 1. In the setting described above, if F is a family of subsets of X such
that, for each j ∈ [d], there is a sub-family Fj partitioning X and consisting of
vertex sets each of which is contained in an orbit under the action of isoG (sj )
on X, then s is consistent with F .
In general, it will be impossible to identify the isotropy groups isoG (sj ) without
more knowledge on the section s. However, for each isotropy group isoG (sj ), one
can at least bound its index (G : isoG (sj )) in G.
Lemma 7. In the setting described above, we have (G : isoG (sj )) ≤ d .
Proof. This follows readily from the fact that the index (G : isoG (sj )) of the
isotropy group of the element sj ∈ S under the action of G on S equals the
cardinality of the orbit of sj under that action, which due to |S| ≤ d, clearly is
bounded from above by d.
The bound provided in Lemma 7 can become useful in case one is able to establish a statement like “if $\operatorname{iso}_G(s_j)$ has index less than $\tau$ in $G$, then it contains a certain subgroup $H_j$”. If we choose $\mathcal{F}_j$ as the family of orbits of $X$ under the action of the subgroup $H_j$ of $G$, then $\mathcal{F} = \mathcal{F}_1 \cup \cdots \cup \mathcal{F}_d$ is a family as in Remark 1.
If this family (or any refinement of it) can be used to perform Step (b) in the
strategy outlined in the paragraph right after the statement of Lemma 6, then
one can conclude the lower bound d ≥ τ on the number of variables d in an
extension as above.

4 Bounds on Symmetric Extensions of $P^\ell_{\mathrm{match}}(n)$
In this section, we use Yannakakis’ method described in Section 3 to prove the
following result.
Theorem 1. For every $n \ge 3$ and odd $\ell$ with $\ell \le \frac{n}{2}$, there exists no weakly symmetric subspace extension for $P^\ell_{\mathrm{match}}(n)$ with at most $\binom{n}{(\ell-1)/2}$ variables (with respect to the group $S(n)$ acting via permuting the nodes of $K_n$ as described in the Introduction).
From Theorem 1, we can derive the following more general lower bounds. Since
we need it in the proof of the next result, and also for later reference, we state
a simple fact on binomial coefficients first.
Lemma 8. For each constant $b \in \mathbb{N}$ there is some constant $\beta > 0$ with $\binom{M-b}{N} \ge \beta \binom{M}{N}$ for all large enough $M \in \mathbb{N}$ and $N \le \frac{M}{2}$.
Theorem 2. There is a constant $C > 0$ such that, for all $n$ and $1 \le \ell \le \frac{n}{2}$, the size of every extension for $P^\ell_{\mathrm{match}}(n)$ that is symmetric (with respect to the group $S(n)$ acting via permuting the nodes of $K_n$ as described in the Introduction) is bounded from below by $C \cdot \binom{n}{\lfloor(\ell-1)/2\rfloor}$.
Proof. For odd $\ell$, this follows from Theorem 1 using Lemmas 1, 2, and 4. For even $\ell$, the polytope $P^{\ell-1}_{\mathrm{match}}(n-2)$ is (isomorphic to) a face of $P^\ell_{\mathrm{match}}(n)$ defined by $x_e = 1$ for an arbitrary edge $e$ of $K_n$. From this, as $\ell - 1$ is odd (and not larger than $(n-2)/2$) with $(\ell-2)/2 = \lfloor(\ell-1)/2\rfloor$, and due to Lemma 8, the theorem follows by Lemma 3.
For even $n$ and $\ell = n/2$, Theorem 2 provides a bound similar to Yannakakis' result (see Step 2 in the proof of [17, Theorem 1]) that no weakly symmetric subspace extension of the perfect matching polytope of $K_n$ has a number of variables that is bounded by $\binom{n}{k}$ for any $k < n/4$.
Theorem 2 in particular implies that the size of every symmetric extension for $P^\ell_{\mathrm{match}}(n)$ with $\Omega(\log n) \le \ell \le n/2$ is bounded from below by $n^{\Omega(\log n)}$, which has the following consequence.

Corollary 1. For $\Omega(\log n) \le \ell \le n/2$, there is no compact extended formulation for $P^\ell_{\mathrm{match}}(n)$ that is symmetric (with respect to the group $G = S(n)$ acting via permuting the nodes of $K_n$ as described in the Introduction).
The rest of this section is devoted to indicating the proof of Theorem 1. Throughout, with $\ell = 2k+1$, we assume that $Q \subseteq \mathbb{R}^d$ with $d \le \binom{n}{k}$ is a weakly symmetric subspace extension of $P^{2k+1}_{\mathrm{match}}(n)$ for $4k + 2 \le n$. We will only consider the case $k \ge 1$, as for $\ell = 1$ the theorem trivially is true (note that we restrict to $n \ge 3$). Weak symmetry is meant with respect to the action of $G = S(n)$ on the set $X$ of vertices of $P^{2k+1}_{\mathrm{match}}(n)$ as described in the Introduction, and we assume $s : X \to Q$ to be a section as required in the definition of weak symmetry. Thus, we have $X = \{\chi(M) \in \{0,1\}^{E_n} : M \in M^{2k+1}(n)\}$, where $M^{2k+1}(n)$ is the set of all matchings $M \subseteq E_n$ with $|M| = 2k+1$ in the complete graph $K_n = (V, E)$ (with $V = [n]$), and $(\pi.\chi(M))_{\{v,w\}} = \chi(M)_{\{\pi^{-1}(v),\pi^{-1}(w)\}}$ holds for all $\pi \in S(n)$, $M \in M^{2k+1}(n)$, and $\{v,w\} \in E$.
In order to identify suitable subgroups of the isotropy groups $\operatorname{iso}_{S(n)}(s_j)$ (see the remarks at the end of Section 3), we use the following result on subgroups of the symmetric group $S(n)$, where $A(n) \subseteq S(n)$ is the alternating group formed by all even permutations of $[n]$. This result is Claim 2 in the proof of Thm. 1 of Yannakakis' paper [17]. Its proof relies on a theorem of Bochert [3] stating that any subgroup of $S(m)$ that acts primitively on $[m]$ and does not contain $A(m)$ has index at least $\lfloor(m+1)/2\rfloor!$ in $S(m)$ (see [16, Thm. 14.2]).

Lemma 9. For each subgroup $U$ of $S(n)$ with $(S(n) : U) \le \binom{n}{k}$ for $k < \frac{n}{4}$, there is a $W \subseteq [n]$ with $|W| \le k$ and $\{\pi \in A(n) : \pi(v) = v \text{ for all } v \in W\} \subseteq U$.
 
As we assumed d ≤ nk (with k < n4 due to 4k + 2 ≤ n), Lemmas 7 and 9 imply
Hj ⊆ isoS(n) (sj ) for all j ∈ [d]. For each j ∈ [d], two vertices χ(M ) and χ(M  )

match (n) (with M, M ∈ M
of P2k+1 2k+1
(n)) are in the same orbit under the action
of the group Hj if and only if we have
M ∩ E(Vj ) = M  ∩ E(Vj ) and Vj \ M = Vj \ M  . (7)

Indeed, it is clear that (7) holds if we have χ(M ) = π.χ(M ) for some per-
mutation π ∈ Hj . In turn, if (7) holds, then there clearly is some permu-
tation π ∈ S(n) with π(v) = v for all v ∈ Vj and M  = π.M . Due to
$|M| = 2k+1 > 2|V_j|$, there is some edge $\{u, w\} \in M$ with $u, w \notin V_j$. Denoting by $\tau \in S(n)$ the transposition of $u$ and $w$, we thus also have $\pi\tau(v) = v$ for all $v \in V_j$ and $M' = \pi\tau.M$. As one of the permutations $\pi$ and $\pi\tau$ is even, say $\pi'$, we find $\pi' \in H_j$ and $M' = \pi'.M$, proving that $M$ and $M'$ are contained in the same orbit under the action of $H_j$.
As it will be convenient for Step (b) (referring to the strategy described after the statement of Lemma 6), we will use the following refinements of the partitionings of $X$ into orbits of $H_j$ (as mentioned at the end of Section 3). Clearly, for $j \in [d]$ and $M, M' \in M^{2k+1}(n)$,
$$M \setminus E(V \setminus V_j) = M' \setminus E(V \setminus V_j) \tag{8}$$
implies (7). Thus, for each $j \in [d]$, the equivalence classes of the equivalence relation defined by (8) refine the partitioning of $X$ into orbits under $H_j$, and we may use the collection of all these equivalence classes (for all $j \in [d]$) as the family $\mathcal{F}$ in Remark 1. With
$$\Lambda = \{(A, B) : A \subseteq E \text{ matching and there is some } j \in [d] \text{ with } A \subseteq E \setminus E(V \setminus V_j),\ B = V_j \setminus V(A)\}$$
(with $V(A) = \bigcup_{a \in A} a$) we hence have $\mathcal{F} = \{F(A, B) : (A, B) \in \Lambda\}$, where
$$F(A, B) = \{\chi(M) : M \in M^{2k+1}(n),\ A \subseteq M \subseteq E(V \setminus B)\}.$$
In order to construct a subset $X' \subseteq X$ which will be used to derive a contradiction as mentioned after Equation (5), we choose two arbitrary disjoint subsets $V', V'' \subset V$ of nodes with $|V'| = |V''| = 2k+1$, and define $M^\star = \{M \in M^{2k+1}(n) : M \subseteq E(V' \cup V'')\}$ as well as $X' = \{\chi(M) : M \in M^\star\}$. Thus, $M^\star$ is the set of perfect matchings on $K(V' \cup V'')$. Clearly, $X'$ is contained in the affine subspace $S$ of $\mathbb{R}^E$ defined by $x_e = 0$ for all $e \in E \setminus E(V' \cup V'')$. In fact, $X'$ is the vertex set of the face $P^{2k+1}_{\mathrm{match}}(n) \cap S$ of $P^{2k+1}_{\mathrm{match}}(n)$, and for this face the inequality $x(V' : V'') \ge 1$ is valid (where $(V' : V'')$ is the set of all edges having one node in $V'$ and the other one in $V''$), since every matching $M \in M^\star$ intersects $(V' : V'')$ in an odd number of edges. Therefore, in order to derive the desired contradiction, it suffices to find $c_x \in \mathbb{R}$ (for all $x \in X'$) with $\sum_{x \in X'} c_x = 1$, $\sum_{x \in X'} c_x \cdot \mathbb{1}_{\mathcal{F}}(x) \ge 0_{\mathcal{F}}$, and $\sum_{x \in X'} c_x \sum_{e \in (V' : V'')} x_e = 0$. For the details on how this can be done we refer to [12].

5 A Non-symmetric Extension for $P^\ell_{\mathrm{match}}(n)$

We shall establish the following result on the existence of extensions for cardinality restricted matching polytopes in this section.

Theorem 3. For all $n$ and $\ell$, there are extensions for $P^\ell_{\mathrm{match}}(n)$ whose sizes can be bounded by $2^{O(\ell)} n^2 \log n$ (and for which the encoding lengths of the coefficients needed to describe the extensions by linear systems can be bounded by a constant).
In particular, Theorem 3 implies the following, although, according to Corollary 1, no compact symmetric extended formulations exist for $P^\ell_{\mathrm{match}}(n)$ with $\ell = \Theta(\log n)$.

Corollary 2. For all $n$ and $\ell \le O(\log n)$, there are compact extended formulations for $P^\ell_{\mathrm{match}}(n)$.

The proof of Theorem 3 relies on the following result on the existence of small families of perfect-hash functions, which is from [1, Sect. 4]. Its proof is based on results from [8,15].

Theorem 4 (Alon, Yuster, Zwick [1]). There are maps $\phi_1, \ldots, \phi_{q(n,r)} : [n] \to [r]$ with $q(n,r) \le 2^{O(r)} \log n$ such that, for every $W \subseteq [n]$ with $|W| = r$, there is some $i \in [q(n,r)]$ for which the map $\phi_i$ is bijective on $W$.
Furthermore, we will use the following two auxiliary results that can be derived from general results on polyhedral branching systems [11, see Cor. 3 and Sect. 4.4]. The first one (Lemma 10) provides a construction of an extension of a polytope that is specified as the convex hull of some polytopes of which extensions are already available. In fact, in this section it will be needed only for the case that these extensions are the polytopes themselves (this is a special case of a result of Balas', see [2, Thm. 2.1]). However, we will face the slightly more general situation in our treatment of cycle polytopes in Section 6.

Lemma 10. If the polytopes $P_i \subseteq \mathbb{R}^m$ (for $i \in [q]$) have extensions $Q_i$ of size $s_i$, respectively, then $P = \operatorname{conv}(P_1 \cup \cdots \cup P_q)$ has an extension of size $\sum_{i=1}^{q} (s_i + 2) + 1$.
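For orientation, in the special case where each extension is the polytope itself, $P_i = \{x \in \mathbb{R}^m : A_i x \le b_i\}$, one standard way to realize such an extension is Balas' disjunctive formulation [2] (a sketch for bounded $P_i$; the precise system behind the size bound of Lemma 10 is the branched polyhedral system of [11]):

```latex
\begin{align*}
  x &= x^1 + \cdots + x^q,\\
  A_i x^i &\le \lambda_i b_i && (i \in [q]),\\
  \lambda_1 + \cdots + \lambda_q &= 1, \qquad \lambda \ge 0.
\end{align*}
```

Projecting $(x, x^1, \ldots, x^q, \lambda)$ onto $x$ gives exactly $\operatorname{conv}(P_1 \cup \cdots \cup P_q)$; boundedness is what makes the case $\lambda_i = 0$ behave correctly, and it holds here since all polytopes involved are 0/1-polytopes.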
The second auxiliary result that we need deals with describing a 0/1-polytope that is obtained by splitting variables of a 0/1-polytope of which a linear description is already available.

Lemma 11. Let $\mathcal{S}$ be a set of subsets of $[t]$, let $P = \operatorname{conv}\{\chi(S) \in \{0,1\}^t : S \in \mathcal{S}\} \subseteq \mathbb{R}^t$ be the corresponding 0/1-polytope, let $J = J(1) \uplus \cdots \uplus J(t)$ be a disjoint union of finite sets $J(i)$, let
$$\mathcal{S}' = \{S' \subseteq J : \text{there is some } S \in \mathcal{S} \text{ with } |S' \cap J(i)| = 1 \text{ for all } i \in S,\ |S' \cap J(i)| = 0 \text{ for all } i \notin S\}, \tag{9}$$
and let $P' = \operatorname{conv}\{\chi(S') \in \{0,1\}^J : S' \in \mathcal{S}'\}$. If $P = \{y \in [0,1]^t : Ay \le b\}$ for some $A \in \mathbb{R}^{s \times t}$ and $b \in \mathbb{R}^s$, then
$$P' = \Bigl\{\, x \in [0,1]^J : \sum_{i=1}^{t} A_{\cdot,i} \sum_{j \in J(i)} x_j \le b \,\Bigr\}. \tag{10}$$
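As a tiny illustration of Lemma 11 (our example, not from [11]): for $t = 2$ and $\mathcal{S} = \{\{1\}, \{2\}\}$ we have $P = \{y \in [0,1]^2 : y_1 + y_2 = 1\}$; splitting $J(1) = \{a, b\}$ and $J(2) = \{c\}$ gives $\mathcal{S}' = \{\{a\}, \{b\}, \{c\}\}$, and (10) yields

```latex
\[
  P' \;=\; \bigl\{\, x \in [0,1]^{\{a,b,c\}} \;:\; (x_a + x_b) + x_c \;=\; 1 \,\bigr\},
\]
% i.e., each column of (A, b) is applied to the aggregated variable
% sum_{j in J(i)} x_j in place of y_i.
```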

In order to prove Theorem 3, let $\phi_1, \ldots, \phi_q$ be maps as guaranteed to exist by Theorem 4 with $r = 2\ell$ and $q = q(n, 2\ell) \le 2^{O(\ell)} \log n$, and denote $\mathcal{M}_i = \{M \in M^\ell(n) : \phi_i \text{ is bijective on } V(M)\}$ for each $i \in [q]$. By Theorem 4, we have $M^\ell(n) = \mathcal{M}_1 \cup \cdots \cup \mathcal{M}_q$. Consequently,
$$P^\ell_{\mathrm{match}}(n) = \operatorname{conv}(P_1 \cup \cdots \cup P_q) \tag{11}$$
with $P_i = \operatorname{conv}\{\chi(M) : M \in \mathcal{M}_i\}$ for all $i \in [q]$, where we have
$$P_i = \{x \in \mathbb{R}^E_+ : x_{E \setminus E_i} = 0,\ \ x(\delta(\phi_i^{-1}(s))) = 1 \text{ for all } s \in [2\ell],\ \ x(E_i(\phi_i^{-1}(S))) \le (|S|-1)/2 \text{ for all } S \subseteq [2\ell],\ |S| \text{ odd}\},$$
where $E_i = E \setminus \bigcup_{j \in [2\ell]} E(\phi_i^{-1}(j))$. This follows by Lemma 11 from Edmonds' linear description (1) of the perfect matching polytope $P^{\ell}_{\mathrm{match}}(2\ell)$ of $K_{2\ell}$. As the sum of the number of variables and the number of inequalities in the description of $P_i$ is at most $2^{O(\ell)} + n^2$ (the summand $n^2$ comes from the nonnegativity constraints on $x \in \mathbb{R}^E_+$, and the constant in $O(\ell)$ is independent of $i$), we obtain an extension of $P^\ell_{\mathrm{match}}(n)$ of size $2^{O(\ell)} n^2 \log n$ by Lemma 10. This proves Theorem 3.
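The deterministic family of Theorem 4 comes from [1,8,15]; as a rough illustration of why such small families exist (this randomized stand-in is ours, not the construction used in the proof), a random map $[n] \to [r]$ is bijective on a fixed $r$-set with probability $r!/r^r \ge e^{-r}$, so $2^{O(r)} \log n$ independent random maps suffice with high probability:

```python
import itertools, math, random

def random_hash_family(n, r, seed=0):
    """Randomized stand-in for Theorem 4: about e^r * (r ln n + 3) random
    maps [n] -> [r] are, with high probability, bijective on every r-set."""
    rng = random.Random(seed)
    q = int(math.e ** r * (r * math.log(n) + 3)) + 1
    return [[rng.randrange(r) for _ in range(n)] for _ in range(q)]

def is_perfect(family, n, r):
    """Check the defining property: every r-subset W of [n] is colored
    bijectively by at least one map of the family."""
    return all(
        any(len({phi[v] for v in W}) == r for phi in family)
        for W in itertools.combinations(range(n), r)
    )

family = random_hash_family(10, 4)
print(len(family), is_perfect(family, 10, 4))
```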

6 Extensions for Cycle Polytopes


By a modification of Yannakakis' construction for the derivation of lower bounds on the sizes of symmetric extensions for traveling salesman polytopes from the corresponding lower bounds for matching polytopes [17, Thm. 2], we obtain lower bounds on the sizes of symmetric extensions for $P^\ell_{\mathrm{cycl}}(n)$. The lower bound $\ell \ge 42$ in the statement of the theorem (whose proof can be found in [12]) is convenient with respect to both formulating the bound and proving its validity.

Theorem 5. There is a constant $C' > 0$ such that, for all $n$ and $42 \le \ell \le n$, the size of every extension for $P^\ell_{\mathrm{cycl}}(n)$ that is symmetric (with respect to the group $S(n)$ acting via permuting the nodes of $K_n$ as described in the Introduction) is bounded from below by $C' \cdot \binom{n/3}{(\lceil \ell/6 \rceil - 1)/2}$.
Corollary 3. For $\Omega(\log n) \le \ell \le n$, there is no compact extended formulation for $P^\ell_{\mathrm{cycl}}(n)$ that is symmetric (with respect to the group $S(n)$ acting via permuting the nodes of $K_n$ as described in the Introduction).

On the other hand, if we drop the symmetry requirement, we find extensions of the following size.

Theorem 6. For all $n$ and $\ell$, there are extensions for $P^\ell_{\mathrm{cycl}}(n)$ whose sizes can be bounded by $2^{O(\ell)} n^3 \log n$ (and for which the encoding lengths of the coefficients needed to describe the extensions by linear systems can be bounded by a constant).

Before we prove Theorem 6, we state a consequence that is similar to Corollary 1 for matching polytopes. It shows that, despite the non-existence of compact symmetric extensions for the polytopes associated with cycles of length $\Theta(\log n)$ (Corollary 3), there are non-symmetric compact extensions of these polytopes.

Corollary 4. For all $n$ and $\ell \le O(\log n)$, there are compact extended formulations for $P^\ell_{\mathrm{cycl}}(n)$.
The rest of the section is devoted to proving Theorem 6, i.e., to constructing an extension of $P^\ell_{\mathrm{cycl}}(n)$ whose size is bounded by $2^{O(\ell)} n^3 \log n$. We proceed similarly
to the proof of Theorem 3 (the construction of extensions for matching polytopes), this time starting with maps $\phi_1, \ldots, \phi_q$ as guaranteed to exist by Theorem 4 with $r = \ell$ and $q = q(n, \ell) \le 2^{O(\ell)} \log n$, and defining $\mathcal{C}_i = \{C \in C^\ell(n) : \phi_i \text{ is bijective on } V(C)\}$ for each $i \in [q]$. Thus, we have $C^\ell(n) = \mathcal{C}_1 \cup \cdots \cup \mathcal{C}_q$, and hence, $P^\ell_{\mathrm{cycl}}(n) = \operatorname{conv}(P_1 \cup \cdots \cup P_q)$ with $P_i = \operatorname{conv}\{\chi(C) : C \in \mathcal{C}_i\}$ for all $i \in [q]$. Due to Lemma 10, it suffices to exhibit, for each $i \in [q]$, an extension of $P_i$ of size bounded by $O(2^\ell \cdot n^3)$ (with the constant independent of $i$). Towards this end, let, for $i \in [q]$, $V_c = \phi_i^{-1}(c)$ for all $c \in [\ell]$, and define $P_i(v^\star) = \operatorname{conv}\{\chi(C) : C \in \mathcal{C}_i,\ v^\star \in V(C)\}$ for each $v^\star \in V_\ell$. Thus, we have $P_i = \operatorname{conv} \bigcup_{v^\star \in V_\ell} P_i(v^\star)$, and hence, due to Lemma 10, it suffices to construct extensions of the $P_i(v^\star)$ whose sizes are bounded by $O(2^\ell \cdot n^2)$.
In order to derive such extensions, define, for each $i \in [q]$ and $v^\star \in V_\ell$, a directed acyclic graph $D$ with nodes $(A, v)$ for all $A \subseteq [\ell-1]$ and $v \in \phi_i^{-1}(A)$, as well as two additional nodes $s$ and $t$, and arcs $\bigl(s, (\{\phi_i(v)\}, v)\bigr)$ and $\bigl(([\ell-1], v), t\bigr)$ for all $v \in \phi_i^{-1}([\ell-1])$, as well as $\bigl((A, v), (A \cup \{\phi_i(w)\}, w)\bigr)$ for all $A \subseteq [\ell-1]$, $v \in \phi_i^{-1}(A)$, and $w \in \phi_i^{-1}([\ell-1] \setminus A)$. This is basically the dynamic programming digraph (using an idea going back to [10]) from the color-coding method for finding paths of prescribed lengths described in [1]. Each $s$-$t$-path in $D$ corresponds to a cycle in $\mathcal{C}_i$ that visits $v^\star$, and each such cycle, in turn, corresponds to two $s$-$t$-paths in $D$ (one for each of the two directions of traversal).

Defining $Q_i(v^\star)$ as the convex hull of the characteristic vectors of all $s$-$t$-paths in $D$ in the arc space of $D$, we find that $P_i(v^\star)$ is the image of $Q_i(v^\star)$ under the projection whose component function corresponding to the edge $\{v, w\}$ of $K_n$ is given by the sum of all arc variables corresponding to arcs $((A, v), (A', w))$ (for $A, A' \subseteq [\ell-1]$) if $v^\star \notin \{v, w\}$, and by the sum of the two arc variables corresponding to $(s, (\{\phi_i(w)\}, w))$ and $(([\ell-1], w), t)$ in case of $v = v^\star$. Clearly, $Q_i(v^\star)$ can be described by nonnegativity constraints, flow conservation constraints for all nodes in $D$ different from $s$ and $t$, and by the equation stating that there must be exactly one flow-unit leaving $s$. As the number of arcs of $D$ is in $O(2^\ell \cdot n^2)$, we thus have found an extension of $P_i(v^\star)$ of the desired size.
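The digraph $D$ is small enough to spell out in code. The following sketch (ours; written as a dynamic program over the states $(A, v)$ of $D$ rather than an explicit arc list) counts $s$-$t$ paths, i.e., twice the number of cycles in $\mathcal{C}_i$ through $v^\star$, for one coloring $\phi_i$:

```python
from itertools import combinations

def count_st_paths(n, ell, phi, v_star):
    """Dynamic program over D: states (A, v) with A a set of colors from
    {1, ..., ell-1} and phi[v] in A.  Returns the number of s-t paths,
    i.e., twice the number of colorful ell-cycles through v_star in K_n."""
    assert phi[v_star] == ell
    nodes = [v for v in range(n) if phi[v] != ell]
    colors = range(1, ell)
    paths = {(frozenset([phi[v]]), v): 1 for v in nodes}    # arcs leaving s
    for size in range(1, ell - 1):                          # grow A color by color
        for A in map(frozenset, combinations(colors, size)):
            for v in nodes:
                cnt = paths.get((A, v), 0)
                if cnt == 0:
                    continue
                for w in nodes:                             # arc (A,v) -> (A+{phi[w]}, w)
                    if phi[w] not in A:
                        key = (A | {phi[w]}, w)
                        paths[key] = paths.get(key, 0) + cnt
    full = frozenset(colors)                                # arcs into t
    return sum(paths.get((full, v), 0) for v in nodes)

# K_4 with a bijective coloring and v_star of color 4: all three Hamiltonian
# cycles through v_star are colorful, each counted once per direction.
print(count_st_paths(4, 4, {0: 1, 1: 2, 2: 3, 3: 4}, 3))   # -> 6
```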

7 Conclusions
The results presented in this paper demonstrate that there are polytopes which
have compact extended formulations though they do not admit symmetric ones.
These polytopes are associated with matchings (or cycles) of some prescribed
cardinalities (see [4] for a recent survey on general cardinality restricted com-
binatorial optimization problems). Similarly, for the permutahedron associated
with [n] there is a gap between the smallest sizes Θ(n log n) of a non-symmetric
extension [9] and Θ(n2 ) of a symmetric extension [14].
Nevertheless, the question whether there are compact extended formulations for general matching polytopes (or for perfect matching polytopes) remains one of the most interesting open questions here. In fact, it is even unknown whether there are (non-symmetric) extended formulations of these polytopes of size $2^{o(n)}$.
Actually, it seems that there are almost no lower bounds known on the sizes
of (not necessarily symmetric) extensions, except for the one obtained by the
observation that every extension Q of a polytope P with f faces has at least f
faces itself, thus Q has at least log f facets (since a face is uniquely determined
by the subset of facets it is contained in) [9]. It would be most interesting to
obtain other lower bounds, including special ones for 0/1-polytopes.

Acknowledgements. We thank Christian Bey for useful discussions on subspaces


that are invariant under coordinate permutations.

References
1. Alon, N., Yuster, R., Zwick, U.: Color-coding. J. Assoc. Comput. Mach. 42(4),
844–856 (1995)
2. Balas, E.: Disjunctive programming and a hierarchy of relaxations for discrete
optimization problems. SIAM J. Algebraic Discrete Methods 6(3), 466–486 (1985)
3. Bochert, A.: Ueber die Zahl der verschiedenen Werthe, die eine Function gegebener
Buchstaben durch Vertauschung derselben erlangen kann. Math. Ann. 33(4), 584–
590 (1889)
4. Bruglieri, M., Ehrgott, M., Hamacher, H.W., Maffioli, F.: An annotated bibliog-
raphy of combinatorial optimization problems with fixed cardinality constraints.
Discrete Appl. Math. 154(9), 1344–1357 (2006)
5. Conforti, M., Cornuéjols, G., Zambelli, G.: Extended formulations in combinatorial
optimization. Tech. Rep., Università di Padova (2009)
6. Edmonds, J.: Maximum matching and a polyhedron with 0, 1-vertices. J. Res. Nat.
Bur. Standards Sect. B 69B, 125–130 (1965)
7. Edmonds, J.: Matroids and the greedy algorithm. Math. Programming 1, 127–136
(1971)
8. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst
case access time. J. Assoc. Comput. Mach. 31(3), 538–544 (1984)
9. Goemans, M.: Smallest compact formulation for the permutahedron,
http://www-math.mit.edu/~goemans/publ.html
10. Held, M., Karp, R.M.: A dynamic programming approach to sequencing problems.
J. Soc. Indust. Appl. Math. 10, 196–210 (1962)
11. Kaibel, V., Loos, A.: Branched polyhedral systems. In: Eisenbrand, F., Shepherd,
B. (eds.) IPCO 2010. LNCS, vol. 6080, pp. 177–190. Springer, Heidelberg (2010)
12. Kaibel, V., Pashkovich, K., Theis, D.O.: Symmetry matters for the sizes of extended
formulations. arXiv:0911.3712v1 [math.CO]
13. Kipp Martin, R.: Using separation algorithms to generate mixed integer model
reformulations. Tech. Rep., University of Chicago (1987)
14. Pashkovich, K.: Tight lower bounds on the sizes of symmetric extensions of per-
mutahedra and similar results (in preparation)
15. Schmidt, J.P., Siegel, A.: The spatial complexity of oblivious k-probe hash func-
tions. SIAM J. Comput. 19(5), 775–786 (1990)
16. Wielandt, H.: Finite permutation groups. Translated from the German by Bercov,
R. Academic Press, New York (1964)
17. Yannakakis, M.: Expressing combinatorial optimization problems by linear pro-
grams. J. Comput. System Sci. 43(3), 441–466 (1991)
A 3-Approximation for Facility Location with
Uniform Capacities

Ankit Aggarwal1, L. Anand2, Manisha Bansal3, Naveen Garg4,⋆,
Neelima Gupta3, Shubham Gupta5, and Surabhi Jain6
1
Tower Research Capital LLC, Gurgaon
2
Georgia Tech
3
University of Delhi
4
Indian Institute of Technology Delhi
5
U. of Waterloo
6
Morgan Stanley, Mumbai

Abstract. We consider the facility location problem where each facility can serve at most $U$ clients. We analyze a local search algorithm for this problem which uses only the operations of add, delete and swap and prove that any locally optimum solution is no more than 3 times the global optimum. This improves on a result of Chudak and Williamson who proved an approximation ratio of $3 + 2\sqrt{2}$ for the same algorithm. We also provide an example which shows that our analysis is tight.

1 Introduction

In a facility location problem we are given a set of clients C and facility locations
F . Opening a facility at location i ∈ F costs fi (the facility cost). The cost of
servicing a client j by a facility i is given by ci,j (the service cost) and these
costs form a metric: for facilities $i, i'$ and clients $j, j'$, $c_{i',j'} \le c_{i',j} + c_{i,j} + c_{i,j'}$.
The objective is to determine which locations to open facilities in, so that the
total cost for opening the facilities and for serving all the clients is minimized.
Note that in this setting each client would be served by the open facility which
offers the smallest service cost.
When the number of clients that a facility can serve is bounded, we have a
capacitated facility location problem. In this paper we assume that these capac-
ities are the same, U , for all facilities. For this problem of uniform capacities
the first approximation algorithm was due to Chudak and Williamson [2] who
analyzed a local search algorithm and proved that any locally optimum solution
has cost no more than 6 times the facility cost plus 5 times the service cost of
an (global) optimum solution. In this paper we refer to such a guarantee as a
(6,5)-approximation; note that this is different from the bi-criterion guarantees
for which this notation is typically used. The result of Chudak and Williamson
⋆ Work done as part of the “Approximation Algorithms” partner group of MPI-Informatik, Germany.

built on earlier work of Korupolu, Plaxton and Rajaraman [3] who were the first
to analyze local search algorithms for facility location problems.
Given the set of open facilities, the best way of serving the clients can be
determined by solving an assignment problem. Thus any solution is completely
determined by the set of open facilities. The local search procedure analyzed
by Chudak and Williamson, starts with an arbitrary set of open facilities and
then updates this set using one of the operations add, delete, swap, whenever that operation reduces the total cost of the solution. We show that a solution
which is locally optimum with respect to this same set of operations is a (3,3)-
approximation. We then show that our analysis of this local search algorithm is
best possible by demonstrating an instance where the locally optimum solution
is three times the optimum solution.
When facilities have different capacities, the best result known is a (6,5)-
approximation by Zhang, Chen and Ye [7]. The local search in this case relies
on a multi-exchange operation, in which, loosely speaking, a subset of facilities
from the current solution is exchanged with a subset not in the solution. This
result improves on a (8,4)-approximation by Mahdian and Pal [4] and a (9,5)
approximation by Pal, Tardos and Wexler [6].
For capacitated facility location, the only algorithms known are based on local
search. One version of capacitated facility location arises when we are allowed
to make multiple copies of the facilities. Thus if facility $i$ has capacity $U_i$ and opening cost $f_i$, then to serve $k > U_i$ clients by facility $i$ we need to open $\lceil k/U_i \rceil$ copies of $i$ and incur an opening cost $f_i \lceil k/U_i \rceil$. This version is usually referred
to as “facility location with soft capacities” and the best known algorithm for
this problem is a 2-approximation [5].
All earlier work for capacitated facility location (uniform or non-uniform)
reroutes all clients in a swap operation from the facility which is closing to one
of the facilities being opened. This however can be quite expensive and cannot
lead to the tight bounds that we achieve in this paper. We use the idea of
Arya et al. [1] to reassign clients of the facility being closed in a swap operation
to other facilities in our current solution. However, to be able to handle the
capacity constraints in this reassignment we need to extend the notion of the
mapping between clients used in [1] to a fractional assignment. As in earlier work,
we use the fact that when we have a local optimum, no operation leads to an
improvement in cost. However, we now take carefully defined linear combinations
of the inequalities capturing this local optimality. All previous work that we are
aware of seems to only use the sum of such inequalities and therefore requires
additional properties like the integrality of the assignment polytope to carry the
argument through [2]. Our approach is therefore more general and amenable
to better analysis. The idea of doing things fractionally appears more often
in our analysis. Thus, when analyzing the cost of an operation we assign clients
fractionally to the facilities and rely on the fact that such a fractional assignment
cannot be better than the optimum assignment.
In Section 4 we give a tight example that relies on the construction of a suitable
triangle free set-system. While this construction itself is quite straightforward,
this is the first instance we know of where such an idea is applied to prove a large locality gap.

2 Preliminaries
Let C be the set of clients and F denote the facility locations. Let S (resp. O)
be the set of open facilities in our solution (resp. optimum solution). We abuse
notation and use S (resp O) to denote our solution (resp. optimum solution).
Initially S is an arbitrary set of facilities which can serve all the clients. Let
cost(S) denote the total cost (facility plus service) of solution S. The three
operations that make up our local search algorithm are
Add. For $s \notin S$, if $\mathrm{cost}(S + \{s\}) < \mathrm{cost}(S)$, then $S \leftarrow S + \{s\}$.
Delete. For $s \in S$, if $\mathrm{cost}(S - \{s\}) < \mathrm{cost}(S)$, then $S \leftarrow S - \{s\}$.
Swap. For $s \in S$ and $s' \notin S$, if $\mathrm{cost}(S - \{s\} + \{s'\}) < \mathrm{cost}(S)$, then $S \leftarrow S - \{s\} + \{s'\}$.
$S$ is locally optimum if none of the three operations is possible, and at this point our algorithm stops. Polynomial running time can be ensured, at the expense of an additive $\epsilon$ in the approximation guarantee, by performing a local step only if the cost reduces by more than a factor of $1 - \epsilon/n$, for $\epsilon > 0$.
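To make the procedure concrete, here is a small executable sketch (ours; not the authors' implementation, and without the $(1-\epsilon/n)$ stopping rule). It computes the optimal capacitated assignment by duplicating each open facility into $U$ unit slots and solving an assignment problem; scipy and numpy are assumed to be available.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def service_cost(open_facs, c, U):
    """Optimal capacitated assignment: duplicate each open facility into U
    unit slots and solve the resulting (clients x slots) assignment problem."""
    n_clients = c.shape[1]
    if len(open_facs) * U < n_clients:
        return float("inf")                     # not enough capacity
    slots = [i for i in open_facs for _ in range(U)]
    cost = c[slots, :].T                        # rows: clients, cols: slots
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()

def total_cost(S, f, c, U):
    return sum(f[i] for i in S) + service_cost(sorted(S), c, U)

def local_search(f, c, U):
    """Add / delete / swap local search (plain sketch, first-improvement)."""
    m = len(f)
    S = set(range(m))                           # any feasible start works
    improved = True
    while improved:
        improved = False
        base = total_cost(S, f, c, U)
        candidates  = [S | {a} for a in range(m) if a not in S]          # add
        candidates += [S - {d} for d in S]                               # delete
        candidates += [S - {d} | {a} for d in S for a in range(m) if a not in S]
        for T in candidates:
            if T and total_cost(T, f, c, U) < base:
                S, improved = T, True
                break
    return S

# toy instance: 3 facilities, 4 clients, capacity U = 2
f = [3.0, 3.0, 10.0]
c = np.array([[1.0, 1, 5, 5],
              [5, 5, 1, 1],
              [2, 2, 2, 2]])
S = local_search(f, c, U=2)
print(sorted(S), total_cost(S, f, c, U=2))      # -> [0, 1] 10.0
```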
We use $f_i$, $i \in F$, to denote the cost of opening a facility at location $i$. Let $S_j, O_j$ denote the service cost of client $j$ in the solutions $S$ and $O$, respectively. $N_S(s)$ denotes the clients served by facility $s$ in the solution $S$; similarly, $N_O(o)$ denotes the clients served by facility $o$ in solution $O$. $N_s^o$ denotes the set of clients served by facility $s$ in solution $S$ and by facility $o$ in solution $O$.
clients in any locally optimum solution is at most the total cost of the optimum
solution [2]. Hence in this paper we only consider the problem of bounding the
facility cost of a locally optimum solution which we show is no more than 2 times
the cost of an optimum solution.

3 Bounding the Facility Costs


Let S denote the locally optimum solution obtained. For the rest of this section
we assume that the sets S and O are disjoint.
We will associate a weight, wt : C → [0, 1], with each client, satisfying
the following properties.
1. For a client j ∈ C let σ(j) be the facility which serves j in solution S. Then

   wt(j) ≤ min( 1, (U − |N_S(σ(j))|) / |N_S(σ(j))| ).

   Let init-wt(j) denote the quantity on the right of the above inequality.
   Since |N_S(σ(j))| ≤ U, we have that 0 ≤ init-wt(j) ≤ 1.
2. For all o ∈ O and s ∈ S, wt(N_s^o) ≤ wt(N_O(o))/2. Here, for X ⊆ C, wt(X)
   denotes the sum of the weights of the clients in X.
To determine wt(j) so that these two properties are satisfied, we start by assigning
wt(j) = init-wt(j). However, this assignment might violate the second property.
A facility s ∈ S captures a facility o ∈ O if init-wt(N_s^o) > init-wt(N_O(o))/2.
Note that at most one facility in S can capture a given facility o. If s does not capture
o then for all j ∈ N_s^o define wt(j) = init-wt(j). However, if s captures o then
for all j ∈ N_s^o define wt(j) = α · init-wt(j), where α < 1 is chosen such that wt(N_s^o) =
wt(N_O(o))/2.
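To illustrate with hypothetical numbers: with U = 10 and |N_S(σ(j))| = 8 we get init-wt(j) = min(1, 2/8) = 1/4. And if a capturing facility s has init-wt(N_s^o) = 2 while the remaining clients of o contribute total init-wt 1, then α solves 2α = (2α + 1)/2, i.e., α = 1/2, so that wt(N_s^o) = 1 = wt(N_O(o))/2.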
For a facility o ∈ O we define a fractional assignment π_o : N_O(o) × N_O(o) → ℝ_+
with the following properties.

separation. π_o(j, j′) > 0 only if j and j′ are served by different facilities in S.
balance. ∑_{j′∈N_O(o)} π_o(j′, j) = ∑_{j′∈N_O(o)} π_o(j, j′) = wt(j) for all j ∈ N_O(o).

The fractional assignment π_o can be obtained along the same lines as the mapping
in [1]. The individual fractional assignments π_o are extended to a fractional
assignment over all clients, π : C × C → ℝ_+, in the obvious way:
π(j, j′) = π_o(j, j′) if j, j′ ∈ N_O(o), and π(j, j′) = 0 otherwise.
To bound the facility cost of a facility s ∈ S we will close the facility and assign
the clients served by s to other facilities in S and, possibly, some facility in
O. The reassignment of the clients served by s to the facilities in S is done using
the fractional assignment π. Thus if client j is served by s in the solution S and
π(j, j′) > 0 then we assign a π(j, j′) fraction of j to the facility σ(j′). Note that
1. σ(j′) ≠ s, which follows from the separation property of π.
2. j is reassigned to the facilities in S to a total extent of wt(j) (balance
   property).
3. A facility s′ ∈ S, s′ ≠ s, would get some additional clients. The total extent
   to which these additional clients are assigned to s′ is at most wt(N_S(s′))
   (balance property). Since

      wt(N_S(s′)) ≤ init-wt(N_S(s′)) ≤ U − |N_S(s′)|,

   the total extent to which clients are assigned to s′ is at most U.
Let Δ(s) denote the increase in the service cost of the clients served by s due to
the above reassignment.

Lemma 1. ∑_{s∈S} Δ(s) ≤ ∑_{j∈C} 2·O_j·wt(j).

Proof. Let π(j, j′) > 0. When the facility σ(j) is closed and a π(j, j′) fraction of
client j is assigned to facility σ(j′), the increase in service cost is π(j, j′)(c_{j,σ(j′)} −
c_{j,σ(j)}). Since c_{j,σ(j′)} ≤ O_j + O_{j′} + S_{j′} we have

   ∑_{s∈S} Δ(s) = ∑_{j,j′∈C} π(j, j′)(c_{j,σ(j′)} − c_{j,σ(j)})
                ≤ ∑_{j,j′∈C} π(j, j′)(O_j + O_{j′} + S_{j′} − S_j)
                = 2 ∑_{j∈C} O_j·wt(j),

where the last equality follows from the balance property. □
If wt(j) < 1 then some part of j remains unassigned. The quantity 1−wt(j) is the
residual weight of client j and is denoted by res-wt(j). Clearly 0 ≤ res-wt(j) ≤
1. Note that
1. If we close facility s ∈ S and assign the residual weight of all clients served
   by s to a facility o ∈ O − S, then the total extent to which clients are assigned
   to o equals res-wt(N_S(s)), which is less than U.
2. The service cost of a client j which is assigned to o would increase by
   c_{j,o} − c_{j,s}. Let

      c_{s,o} = max_{j∈C} (c_{j,o} − c_{j,s})

   denote the maximum possible increase in service cost of a client when it is
   assigned to o instead of s. Since service costs satisfy the metric property we
   have

      c_{s,o} ≤ min_{j∈N_s^o} (S_j + O_j).

3. The total increase in service cost of all clients in N_S(s) which are assigned
   (partly) to o is at most c_{s,o}·res-wt(N_S(s)).
Let "s, o# denote the swapping of facilities s, o and the reassignment of clients
served by s to facilities in S ∪{o} as discussed above. Since S is a locally optimum
we have
fo − fs + cs,o res-wt(NS (s)) + Δ(s) ≥ 0. (1)
The above inequalities are written for every pair (s, o), s ∈ S, o ∈ O. We take
a linear combination of these inequalities, with the inequality corresponding to
⟨s, o⟩ having weight λ_{s,o} in the combination, to get

   ∑_{s,o} λ_{s,o}·f_o − ∑_{s,o} λ_{s,o}·f_s + ∑_{s,o} λ_{s,o}·c_{s,o}·res-wt(N_S(s)) + ∑_{s,o} λ_{s,o}·Δ(s) ≥ 0,   (2)

where

   λ_{s,o} = res-wt(N_s^o) / res-wt(N_S(s)),

and λ_{s,o} = 0 if res-wt(N_S(s)) = 0. Let S′ be the subset of facilities in the solution
S for which res-wt(N_S(s)) = 0. A facility s ∈ S′ can be deleted from S and its
clients reassigned completely to the other facilities in S. This implies

   −f_s + Δ(s) ≥ 0.

We write such an inequality for each s ∈ S′ and add them to inequality (2).

Note that for all s ∈ S − S , o λs,o = 1. This implies that
  
fs + λs,o fs = fs (3)
s∈S  s,o s

and    
Δ(s) + λs,o Δ(s) = Δ(s) ≤ 2Oj wt(j) (4)
s∈S  s,o s j∈C

However, the reason for defining λs,o as above is to ensure the following property.
 
Lemma 2. ∑_{s,o} λ_{s,o}·c_{s,o}·res-wt(N_S(s)) ≤ ∑_{j∈C} res-wt(j)(O_j + S_j).

Proof. The left hand side of the inequality is ∑_{s,o} c_{s,o}·res-wt(N_s^o). Since for
each client j ∈ N_s^o we have c_{s,o} ≤ O_j + S_j,

   c_{s,o}·res-wt(N_s^o) = ∑_{j∈N_s^o} c_{s,o}·res-wt(j) ≤ ∑_{j∈N_s^o} res-wt(j)(O_j + S_j),

which, when summed over all s and o, implies the lemma. □
Incorporating equations (3), (4) and Lemma 2 into inequality (2) we get

   ∑_s f_s ≤ ∑_{s,o} λ_{s,o}·f_o + ∑_{j∈C} res-wt(j)(O_j + S_j) + ∑_{j∈C} 2·O_j·wt(j)
           = ∑_{s,o} λ_{s,o}·f_o + 2·∑_{j∈C} O_j + ∑_{j∈C} res-wt(j)(S_j − O_j).   (5)

We now need to bound the number of times a facility of the optimum solution
may be opened.

Lemma 3. For all o ∈ O, ∑_s λ_{s,o} ≤ 2.

Proof. We begin with the following observations.

1. For all s, o, λ_{s,o} ≤ 1.
2. Let I ⊆ S be the set of facilities s such that |N_S(s)| ≤ U/2 and s does not
   capture o. Let s ∈ I and j ∈ N_s^o. Note that wt(j) = init-wt(j) = 1 and
   so res-wt(j) = 0. This implies that res-wt(N_s^o) = 0 and so, for all s ∈ I,
   λ_{s,o} = 0.

Thus we only need to show that ∑_{s∉I} λ_{s,o} ≤ 2.
We first consider the case when o is not captured by any s ∈ S. Let s be a
facility not in I which does not capture o. For j ∈ N_s^o,

   res-wt(j) = 1 − wt(j) = 1 − init-wt(j) = 2 − U/|N_S(s)|.

However, for j ∈ N_S(s) we have that

   res-wt(j) = 1 − wt(j) ≥ 1 − init-wt(j) = 2 − U/|N_S(s)|.

Therefore λ_{s,o} ≤ |N_s^o|/|N_S(s)| and hence

   ∑_s λ_{s,o} = ∑_{s∉I} λ_{s,o} ≤ ∑_{s∉I} |N_s^o|/|N_S(s)| ≤ ∑_{s∉I} |N_s^o|/(U/2) ≤ |N_O(o)|/(U/2) ≤ 2.
We next consider the case when o is captured by s′ ∈ S. This implies

   init-wt(N_{s′}^o) ≥ ∑_{s≠s′} init-wt(N_s^o)
                     ≥ ∑_{s∉I∪{s′}} init-wt(N_s^o)
                     = ∑_{s∉I∪{s′}} |N_s^o| · (U − |N_S(s)|)/|N_S(s)|
                     = ∑_{s∉I∪{s′}} ( U·|N_s^o|/|N_S(s)| − |N_s^o| ).

Since init-wt(N_{s′}^o) ≤ |N_{s′}^o|, rearranging we get

   ∑_{s∉I∪{s′}} |N_s^o|/|N_S(s)| ≤ ∑_{s∉I} |N_s^o|/U ≤ 1.

Now

   ∑_{s∉I∪{s′}} λ_{s,o} ≤ ∑_{s∉I∪{s′}} |N_s^o|/|N_S(s)| ≤ 1,

and since λ_{s′,o} ≤ 1 we have

   ∑_s λ_{s,o} = ∑_{s∉I} λ_{s,o} ≤ 2.

This completes the proof. □
Incorporating Lemma 3 into inequality (5) we get

   ∑_s f_s ≤ 2( ∑_o f_o + ∑_{j∈C} O_j ) + ∑_{j∈C} res-wt(j)(S_j − O_j).

Note that ∑_{j∈C} res-wt(j)(S_j − O_j) is at most ∑_{j∈C} (S_j − O_j), which in turn
can be bounded by ∑_o f_o by considering the operation of adding facilities in the
optimum solution. This, however, would only lead to a bound of 3·∑_o f_o + 2·∑_{j∈C} O_j
on the facility cost of our solution.
The key to obtaining a sharper bound on the facility cost of our solution is the
observation that in the swap ⟨s, o⟩ facility o gets only res-wt(N_S(s)) clients and
so can accommodate an additional U − res-wt(N_S(s)) clients. Since we need to
bound ∑_{j∈C} res-wt(j)(S_j − O_j), we assign the clients in N_O(o) to facility o in
the ratio of their residual weights. Thus client j would be assigned to an extent
β_{s,o}·res-wt(j), where

   β_{s,o} = min( 1, (U − res-wt(N_S(s))) / res-wt(N_O(o)) ).
β_{s,o} is defined so that o gets at most U clients. Let Δ′(s, o) denote the increase
in service cost of the clients of N_O(o) due to this reassignment. Hence

   Δ′(s, o) = ∑_{j∈N_O(o)} β_{s,o}·res-wt(j)(O_j − S_j).
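For illustration (the numbers are ours): if U = 10, res-wt(N_S(s)) = 4 and res-wt(N_O(o)) = 8, then β_{s,o} = min(1, 6/8) = 3/4, so o receives the 4 units rerouted from s plus (3/4)·8 = 6 units from N_O(o), exactly filling its capacity U.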

The inequality (1) corresponding to the swap ⟨s, o⟩ would now get an additional
term Δ′(s, o) on the left. Hence the term ∑_{s,o} λ_{s,o}·Δ′(s, o) would appear on the
left in inequality (2) and on the right in inequality (5). To bound this term note
that

   ∑_s λ_{s,o}·Δ′(s, o) = ∑_s λ_{s,o}·β_{s,o} ∑_{j∈N_O(o)} res-wt(j)(O_j − S_j)
                        = ( ∑_s λ_{s,o}·β_{s,o} ) · ∑_{j∈N_O(o)} res-wt(j)(O_j − S_j).

If ∑_s λ_{s,o}·β_{s,o} > 1 then we reduce some of the β_{s,o} so that the sum is exactly 1. On
the other hand, if ∑_s λ_{s,o}·β_{s,o} = 1 − γ_o for some γ_o > 0, then we take the inequality
corresponding to the operation of adding the facility o ∈ O,

   f_o + ∑_{j∈N_O(o)} res-wt(j)(O_j − S_j) ≥ 0,   (6)

and add these inequalities to inequality (2), each with weight γ_o. Hence the total increase in the
left hand side of inequality (2) is

   ∑_{s,o} λ_{s,o}·Δ′(s, o) + ∑_o γ_o ( f_o + ∑_{j∈N_O(o)} res-wt(j)(O_j − S_j) )
   = ∑_o ∑_{j∈N_O(o)} (1 − γ_o)·res-wt(j)(O_j − S_j)
     + ∑_o γ_o·f_o + ∑_o γ_o ∑_{j∈N_O(o)} res-wt(j)(O_j − S_j)
   = ∑_o ∑_{j∈N_O(o)} res-wt(j)(O_j − S_j) + ∑_o γ_o·f_o
   = ∑_{j∈C} res-wt(j)(O_j − S_j) + ∑_o γ_o·f_o

and so inequality (5) now becomes

   ∑_s f_s ≤ ∑_{s,o} λ_{s,o}·f_o + 2·∑_{j∈C} O_j + ∑_o γ_o·f_o
             + ∑_{j∈C} res-wt(j)(S_j − O_j) + ∑_{j∈C} res-wt(j)(O_j − S_j)
           = ∑_o ( γ_o + ∑_s λ_{s,o} ) f_o + 2·∑_{j∈C} O_j
           = ∑_o ( 1 + ∑_s λ_{s,o}(1 − β_{s,o}) ) f_o + 2·∑_{j∈C} O_j
           ≤ 2( ∑_o f_o + ∑_{j∈C} O_j ),

where the last inequality follows from the following lemma.



Lemma 4. ∑_s λ_{s,o}(1 − β_{s,o}) ≤ 1.

Proof. Since res-wt(N_O(o)) ≤ |N_O(o)| ≤ U we have

   β_{s,o} = min( 1, (U − res-wt(N_S(s))) / res-wt(N_O(o)) )
           ≥ min( 1, 1 − res-wt(N_S(s)) / res-wt(N_O(o)) )
           = 1 − res-wt(N_S(s)) / res-wt(N_O(o)).

Hence

   ∑_s λ_{s,o}(1 − β_{s,o}) ≤ ∑_s res-wt(N_s^o) / res-wt(N_O(o)) = 1.   □
This completes the proof of the following theorem.


Theorem 1. The total cost of open facilities in any locally optimum solution is
at most twice the cost of an optimum solution.

4 When S ∩ O ≠ ∅

We now consider the case when S ∩ O ≠ ∅. We construct a bipartite graph,
G, on the vertex set C ∪ F as in [2]. Every client j ∈ C has an edge from the
facility σ(j) ∈ S and an edge to the facility τ (j) ∈ O, where τ (j) is the facility
in O serving client j. Thus each client has one incoming and one outgoing edge.
A facility s ∈ S has |NS (s)| outgoing edges and a facility o ∈ O has |NO (o)|
incoming edges.
Decompose the edges of G into a set of maximal paths, P, and cycles, C. Note
that all facilities on a cycle are from S ∩ O. Consider a maximal path, p ∈ P
which starts at a vertex s ∈ S and ends at a vertex o ∈ O. Let head(p) denote
the client served by s on this path and tail(p) be the client served by o on this
path. Let s = s0 , j0 , s1 , j1 , . . . , sk , jk , o be the sequence of vertices on this path.
Note that {s_1, s_2, . . . , s_k} ⊆ S ∩ O. A shift along this path is a reassignment of
clients so that j_i, which was earlier assigned to s_i, is now assigned to s_{i+1}, where
s_{k+1} = o. As a consequence of this shift, facility s serves one less client while
facility o serves one more client. Let shift(p) denote the increase in service cost
due to a shift along the path p. Then

   shift(p) = ∑_{c∈C∩p} (O_c − S_c).

We can similarly define a shift along a cycle. The increase in service cost equals
the sum of O_j − S_j over all clients j on the cycle, and since the assignment of
clients to facilities is done optimally both in our solution and in the global optimum,
this sum is zero. Thus

   ∑_{j on cycles} (O_j − S_j) = 0.

Consider the operation of adding a facility o ∈ O. We shift along all paths which
end at o. The increase in service cost due to these shifts equals the sum of O_j − S_j
over all clients j on these paths, and this quantity is at least −f_o. Summing over
all o ∈ O,

   ∑_{j on paths} (O_j − S_j) ≥ −∑_{o∈O} f_o.

Thus

   ∑_{j∈C} (O_j − S_j) = ∑_{j on paths} (O_j − S_j) + ∑_{j on cycles} (O_j − S_j) ≥ −∑_{o∈O} f_o,

which implies that the service cost of S is bounded by ∑_{o∈O} f_o + ∑_{j∈C} O_j.
To bound the cost of facilities in S − O we only need the paths that start from
a vertex in S − O. Hence we throw away all cycles and all paths that start at a
facility in S ∩ O; this is done by removing all clients on these cycles and paths.
Let P denote the remaining paths and C the remaining clients. Every client in
C either belongs to a path which ends in S ∩ O (transfer path) or to a path
which ends in O − S (swap path). Let T denote the set of transfer paths and S
the set of swap paths.
Let N_s^o be the set of paths that start at s ∈ S and end at o ∈ O. Define

   N_S(s) = ∪_{o∈O−S} N_s^o.

Note that we do not include the transfer paths in the above definition. Similarly,
for all o ∈ O define

   N_O(o) = ∪_{s∈S−O} N_s^o.
Just as we defined the init-wt, wt and res-wt of a client, we can define the
init-wt, wt and res-wt of a swap path. Thus for a path p which starts from
s ∈ S − O we define

   init-wt(p) = min( 1, (U − |N_S(s)|) / |N_S(s)| ).
The notion of capture remains the same and we reduce the initial weights of the
paths to obtain their weights. Thus wt(p) ≤ init-wt(p), and for every s ∈ S
and o ∈ O, wt(N_s^o) ≤ wt(N_O(o))/2. For every o ∈ O − S we define a fractional
mapping π_o : N_O(o) × N_O(o) → ℝ_+ such that

separation. π_o(p, p′) > 0 only if p and p′ start at different facilities in S − O.
balance. ∑_{p′∈N_O(o)} π_o(p′, p) = ∑_{p′∈N_O(o)} π_o(p, p′) = wt(p) for all p ∈ N_O(o).

This fractional mapping can be constructed in the same way as done earlier.
The way we use this fractional mapping, π, will differ slightly. When facility s
is closed, we will use π to partly reassign the clients served by s in the solution
S to other facilities in S. If p is a path starting from s and π(p, p′) > 0, then we
shift along p and the client tail(p) is assigned to s′, where s′ is the facility from
which p′ starts. This whole operation is done to an extent of π(p, p′).
Let Δ(s) denote the total increase in service cost due to the reassignment of
clients on all swap paths starting from s. Define the length of the path p as

   length(p) = ∑_{c∈C∩p} (O_c + S_c).

Then

   ∑_s Δ(s) ≤ ∑_s ∑_{p∈N_S(s)} ∑_{p′∈P} π(p, p′)(shift(p) + length(p′))
            = ∑_{p∈S} wt(p)(shift(p) + length(p)).

As a result of the above reassignment, a facility s′ ∈ S − O, s′ ≠ s, might get
additional clients whose "number" is at most wt(N_S(s′)). Note that this is less
than init-wt(N_S(s′)), which is at most U − |N_S(s′)|. The number of clients s′
was serving equals |N_S(s′)| + |T(s′)|, where T(s′) is the set of transfer paths
starting from s′. This implies that the total number of clients s′ would have
after the reassignment could exceed U. To prevent this violation of our capacity
constraint, we also perform a shift along these transfer paths. To determine when
the shift should be done, we define a mapping t : N_S(s′) × T(s′) → ℝ_+ such
that

1. For all q ∈ T(s′), ∑_{p∈N_S(s′)} t(p, q) ≤ 1.
2. For all p ∈ N_S(s′), ∑_{q∈T(s′)} t(p, q) ≤ wt(p).
3. ∑_{p,q} t(p, q) = min( |T(s′)|, wt(N_S(s′)) ).

Now suppose s′ gets an additional client, say tail(p), to an extent of π(p, p′),
where p′ ∈ N_S(s′). Then for all paths q ∈ T(s′) for which t(p′, q) > 0, we
shift along path q to an extent π(p, p′)·t(p′, q) / ∑_{q∈T(s′)} t(p′, q). The mapping
ensures that
1. The total extent to which we will shift along a path q ∈ T(s′) is at most
   1. This in turn implies that we do not violate the capacity of any facility
   in S ∩ O. This is because, if there are t transfer paths ending at a facility
   o ∈ S ∩ O, then o serves t more clients in solution O than in S. Hence, in
   solution S, o serves at most U − t clients. Since the total extent to which we
   could shift along a transfer path ending at o is 1, even if we were to perform
   shifts along all transfer paths ending in o, the capacity of o in our solution S
   would not be violated.
2. The capacity constraint of no facility in S − O is violated. If a facility s′ ∈
   S − O gets an additional x clients as a result of reassigning the clients of
   some facility s ≠ s′, then it would also lose some clients, say y, due to the
   shifts along the transfer paths. From property 3 of the mapping t it follows
   that

      x − y ≤ wt(N_S(s′)) − |T(s′)| ≤ U − |N_S(s′)| − |T(s′)|.

   Since, initially, s′ was serving |N_S(s′)| + |T(s′)| clients, the total number of
   clients that s′ is serving after the reassignment is at most U.
Consider a transfer path, q, starting from s. We would shift once along path
q when we close facility s. We would also be shifting along q to an extent of
∑_{p′∈N_S(s)} t(p′, q), which is at most 1. Let Δ′(s) denote the total increase in
service cost due to shifts on all transfer paths starting from s. Then

   Δ′(s) ≤ 2·∑_{q∈T(s)} shift(q).   (7)

For a swap path p, define res-wt(p) = 1 − wt(p). If a client j belongs to a swap
path p then define wt(j) = wt(p) and res-wt(j) = res-wt(p). When facility s
is closed, a client j served by s has been assigned to an extent wt(j) to other
facilities in S. We will be assigning the remaining part of j to a facility o ∈ O − S
that will be opened when s is closed. Hence the total number of clients that will
be assigned to o is res-wt(N_S(s)), which is less than U. The increase in service
cost due to this reassignment is at most c_{s,o}·res-wt(N_S(s)). The remaining
available capacity of o is utilized by assigning each client j ∈ N_O(o) to an extent
β_{s,o}·res-wt(j), where β_{s,o} is defined as before. This assignment is actually done
by shifting along each path p ∈ N_O(o) to an extent β_{s,o}·res-wt(p). As done
earlier, the inequality corresponding to the swap ⟨s, o⟩ is counted to an extent
λ_{s,o} in the linear combination.
Just as in Lemma 2 we have

   ∑_{s,o} λ_{s,o}·c_{s,o}·res-wt(N_S(s)) ≤ ∑_{p∈S} res-wt(p)·length(p).

Lemma 3 continues to hold and so does Lemma 4. As before, we might have to
add a facility o ∈ O − S, shift each path p ∈ N_O(o) to an extent res-wt(p), and
add a γ_o multiple of the corresponding inequality to the linear combination. Putting everything
together we get
  
   ∑_{s∈S−O} f_s ≤ 2·∑_{o∈O−S} f_o + ∑_{p∈S} wt(p)(shift(p) + length(p))
                  + ∑_{p∈S} res-wt(p)(length(p) + shift(p)) + 2·∑_{p∈T} shift(p)
                = 2·∑_{o∈O−S} f_o + ∑_{p∈S} (shift(p) + length(p)) + 2·∑_{p∈T} shift(p)
                ≤ 2( ∑_{o∈O−S} f_o + ∑_{j∈C} O_j ).

5 A Tight Example
Our tight example consists of r facilities in the optimum solution O, r facilities
in the locally optimum solution S and rU clients. The facilities are F = O ∪ S.
Since no facility can serve more than U clients, each facility in S and O serves
exactly U clients. Our instance has the property that a facility in O and a facility
in S share at most one client.
We can view our instance as a set-system — the set of facilities O is the
ground set and for every facility s ∈ S we have a subset Xs of this ground set.
o ∈ Xs iff there is a client which is served by s in the solution S and by o in the
solution O. This immediately implies that each element of the ground set is in
exactly U sets and that each set is of size exactly U .
A triangle in the set-system is a collection of 3 elements, o1 , o2 , o3 and 3 sets
Xs1 , Xs2 , Xs3 such that oi is not in Xsi but belongs to the other two sets. An
important property of our instance is that the corresponding set-system has no
triangles.
We now show how to construct a set system with the three properties mentioned
above. With every o ∈ O we associate a distinct point x^o = (x^o_1, x^o_2, . . . , x^o_U)
in U-dimensional space, where x^o_i ∈ {1, 2, 3, . . . , U} for all i. For every choice
of a coordinate i, 1 ≤ i ≤ U, we form U^{U−1} sets, each of which contains all points
differing only in coordinate i. Thus the total number of sets we form is r = U^U,
which is the same as the number of points. Each set can be viewed as a line in
U-dimensional space. To see that this set system satisfies all three properties,
note that each line contains U points and each point is on exactly U lines.
Further, we do not have 3 points and 3 lines such that each line includes two of
the points and excludes the third.
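The construction is easy to verify computationally; the following sketch (our own, purely illustrative, and brute force, so feasible only for tiny U) builds the points and lines and checks the degree and triangle-freeness properties:

```python
from itertools import product, combinations, permutations

def lines_in_grid(U):
    """Points: U-tuples over {1,...,U}.  For each coordinate i, the points
    that agree on every coordinate except i form one line (U^U lines total)."""
    points = list(product(range(1, U + 1), repeat=U))
    lines = {}
    for p in points:
        for i in range(U):
            lines.setdefault((i, p[:i] + p[i + 1:]), set()).add(p)
    return points, [frozenset(L) for L in lines.values()]

U = 2  # the brute-force triangle check below is only feasible for tiny U
points, lines = lines_in_grid(U)
assert all(len(L) == U for L in lines)                       # every line has U points
assert all(sum(p in L for L in lines) == U for p in points)  # every point on U lines
# triangle-freeness: no 3 points o and 3 lines X with o[i] off line X[i]
# but on the other two lines
for o in combinations(points, 3):
    for X in permutations(lines, 3):
        assert not all(o[i] not in X[i] and o[i] in X[(i + 1) % 3]
                       and o[i] in X[(i + 2) % 3] for i in range(3))
```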
We now define the facility and the service costs. For a facility o ∈ O, f_o = 2U,
while for a facility s ∈ S, f_s = 6U − 6. For a client j ∈ N_s^o we have c_{s,j} = 3 and
c_{o,j} = 1. All other service costs are given by the metric property.
Since the service cost of each client in O is 1 and the facility cost of each facility
in O is 2U, we have cost(O) = 3U^{U+1}. Similarly, cost(S) = (3 − 2/U)·3U^{U+1},
and hence cost(S) = (3 − 2/U)·cost(O). We now need to prove that S is indeed
a locally optimum solution with respect to the local search operations of add,
delete and swap.
Adding a facility o ∈ O to the solution S would incur an opening cost of 2U.
The optimum assignment would reassign only the clients in N_O(o), and all of these
are assigned to o. The reduction in the service cost due to this is exactly 2U,
which is offset by the increase in the facility cost. Hence the cost of the solution
does not improve.
If we delete a facility in the solution S, the solution is no longer feasible, since
the total capacity of the facilities is now U^{U+1} − U while the number of clients is
U^{U+1}.
Now, consider swapping a facility s ∈ S with a facility o ∈ O. The net decrease
in the facility cost is 4U − 6. One can show that the new optimum assignment of
clients to facilities would reassign only the clients in N_S(s), and all of these are assigned
to o. Since |N_s^o| ≤ 1, there are U − 1 clients in N_S(s) whose service cost would
increase from 3 to 7. The client in N_s^o would see a decrease in service cost of 2.
The net increase in service cost is 4U − 6, which is exactly equal to the decrease
in facility cost. Hence, swapping any pair of facilities s ∈ S and o ∈ O does not
improve the solution.

References
1. Arya, V., Garg, N., Khandekar, R., Meyerson, A., Munagala, K., Pandit, V.: Lo-
cal search heuristics for k-median and facility location problems. SIAM J. Com-
put. 33(3), 544–562 (2004)
2. Chudak, F., Williamson, D.P.: Improved approximation algorithms for capacitated
facility location problems. Math. Program. 102(2), 207–222 (2005)
3. Korupolu, M.R., Plaxton, C.G., Rajaraman, R.: Analysis of a local search heuristic
for facility location problems. J. Algorithms 37(1), 146–188 (2000)
4. Mahdian, M., Pál, M.: Universal facility location. In: Di Battista, G., Zwick, U.
(eds.) ESA 2003. LNCS, vol. 2832, pp. 409–421. Springer, Heidelberg (2003)
5. Mahdian, M., Ye, Y., Zhang, J.: A 2-approximation algorithm for the soft-
capacitated facility location problem. In: Arora, S., Jansen, K., Rolim, J.D.P., Sa-
hai, A. (eds.) RANDOM 2003 and APPROX 2003. LNCS, vol. 2764, pp. 129–140.
Springer, Heidelberg (2003)
6. Pál, M., Tardos, É., Wexler, T.: Facility location with nonuniform hard capacities.
In: FOCS ’01: Proceedings of the 42nd IEEE symposium on Foundations of Com-
puter Science, Washington, DC, USA, p. 329. IEEE Computer Society, Los Alamitos
(2001)
7. Zhang, J., Chen, B., Ye, Y.: A multiexchange local search algorithm for the capac-
itated facility location problem. Math. Oper. Res. 30(2), 389–403 (2005)
Secretary Problems via Linear Programming

Niv Buchbinder¹, Kamal Jain², and Mohit Singh³

¹ Microsoft Research, New England, Cambridge, MA
² Microsoft Research, Redmond, WA, USA
³ McGill University, Montreal, Canada

Abstract. In the classical secretary problem an employer would like to
choose the best candidate among n competing candidates that arrive in
a random order. This basic concept of n elements arriving in a random
order and irrevocable decisions made by an algorithm have been explored
extensively over the years, and used for modeling the behavior of many
processes. Our main contribution is a new linear programming technique
that we introduce as a tool for obtaining and analyzing mechanisms for
the secretary problem and its variants. The linear program is formulated
using judiciously chosen variables and constraints and we show a one-to-
one correspondence between mechanisms for the secretary problem and
feasible solutions to the linear program. Capturing the set of mechanisms
as a linear polytope holds the following immediate advantages.
– Computing the optimal mechanism reduces to solving a linear program.
– Proving an upper bound on the performance of any mechanism
reduces to finding a feasible solution to the dual program.
– Exploring variants of the problem is as simple as adding new con-
straints, or manipulating the objective function of the linear
program.
We demonstrate these ideas by exploring some natural variants of the
secretary problem. In particular, using our approach, we design optimal
secretary mechanisms in which the probability of selecting a candidate
at any position is equal. We refer to such mechanisms as incentive com-
patible and these mechanisms are motivated by the recent applications
of secretary problems to online auctions. We also show a family of linear
programs which characterize all mechanisms that are allowed to choose J
candidates and gain profit from the K best candidates. We believe that
linear programming based approach may be very helpful in the context
of other variants of the secretary problem.

1 Introduction

In the classical secretary problem an employer would like to choose the best
candidate among n competing candidates. The candidates are assumed to arrive
in a random order. After each interview, the position of the interviewee in the
total order is revealed vis-á-vis already interviewed candidates. The interviewer
has to decide, irrevocably, whether to accept the candidate for the position or
to reject the candidate. The objective in the basic problem is to accept the


best candidate with high probability. A mechanism used for choosing the best
candidate is to interview the first n/e candidates for the purpose of evaluation,
and then hire the first candidate that is better than all previous candidates.
Analysis of the mechanism shows that it hires the best candidate with probability
1/e and that it is optimal [8,18].
This basic concept of n elements arriving in a random order and irrevocable de-
cisions made by an algorithm have been explored extensively over the years. We
refer the reader to the survey by Ferguson [9] on the historical and extensive work
on different variants of the secretary problem. Recently, there has been an interest
in the secretary problem with its application to the online auction problem [13,3].
This has led to the study of variants of the secretary problem which are motivated
by this application. For example, [15] studied a setting in which the mechanism
is allowed to select multiple candidates and the goal is to maximize the expected
profit. Imposing other combinatorial structure on the set of selected candidates,
for example, selecting elements which form an independent set of a matroid [4],
selecting elements that satisfy a given knapsack constraint [2], selecting elements
that form a matching in a graph or hypergraph [16], have also been studied. Other
variants include when the profit of selecting a secretary is discounted with time [5].
Therefore, finding new ways of abstracting, as well as analyzing and designing
algorithms, for secretary type problems is of major interest.

1.1 Our Contributions


Our main contribution is a new linear programming technique that we introduce
as a tool for obtaining and analyzing mechanisms for various secretary problems.
We introduce a linear program with judiciously chosen variables and constraints
and show a one-to-one correspondence between mechanisms for the secretary
problem and feasible solutions to the linear program. Obtaining a mechanism
which maximizes a certain objective therefore reduces to finding an optimal
solution to the linear program. We use linear programming duality to give a
simple proof that the mechanism obtained is optimal. We illustrate our technique
by applying it to the classical secretary problem and obtaining a simple proof of
optimality of the 1/e mechanism [8] in Section 2.
Our linear program for the classical secretary problem consists of a single con-
straint for each position i, bounding the probability that the mechanism may
select the ith candidate. Despite its simplicity, we show that such a set of con-
straints suffices to correctly capture all possible mechanisms. Thus, optimizing
over this polytope results in the optimal mechanism. The simplicity and the
tightness of the linear programming formulation makes it flexible and applicable
to many other variants. Capturing the set of mechanisms as a linear polytope
holds the following immediate advantages.
– Computing the optimal mechanism reduces to solving a linear program.
– Proving an upper bound on the performance of any mechanism reduces to
finding a feasible solution to the dual program.
– Exploring variants of the problem is as simple as adding new constraints, or
manipulating the objective function of the linear program.
We next demonstrate these ideas by exploring some natural variants of the sec-
retary problem.

Incentive Compatibility. As discussed earlier, the optimal mechanism for the


classical secretary problem is to interview the first n/e candidates for the pur-
pose of evaluation, and then hire the first candidate that is better than all
previous candidates. This mechanism suffers from a crucial drawback. The can-
didates arriving early have an incentive to delay their interview, and candidates
arriving after position n/e + 1 have an incentive to advance their interview.
Such a behavior challenges the main assumption of the model that interviewees
arrive in a random order. This issue of incentives is of major importance espe-
cially since secretary problems have been used recently in the context of online
auctions [13,3].
Using the linear programming technique, we study mechanisms that are in-
centive compatible. We call a mechanism for the secretary problem incentive
compatible if the probability of selecting a candidate at ith position is equal for
each position 1 ≤ i ≤ n. Since the probability of being selected in each position
is the same, there is no incentive for any interviewee to change his or her posi-
tion and therefore the interviewee arrives at the randomly assigned position. We
show that there exists an incentive compatible mechanism which selects the best
candidate with probability 1 − 1/√2 ≈ 0.29 and that this mechanism is optimal.
Incentive compatibility is captured in the linear program by introducing a set of
very simple constraints.
Surprisingly, we find that the optimal incentive compatible mechanism some-
time selects a candidate who is worse than a previous candidate. To deal with
this issue, we call a mechanism regret-free if the mechanism only selects can-
didates which are better than all previous candidates. We show that the best
incentive compatible mechanism which is regret free accepts the best candidate
with probability 1/4. Another issue with the optimal incentive compatible mech-
anism is that it does not always select a candidate. In the classical secretary
problem, the mechanism can always pick the last candidate but this solution
is unacceptable when considering incentive compatibility. We call a mechanism
must-hire if it always hires a candidate. We show that there is a must-hire incen-
tive compatible mechanism which hires the best candidate with probability 1/4.
All the above results are optimal and we use the linear programming technique
to derive the mechanisms as well as prove their optimality.
In subsequent work [6], we further explore the importance of incentive com-
patibility in the context of online auctions. In this context, bidders are bidding
for an item and may have an incentive to change their position if this may in-
crease their utility. We show how to obtain truthful mechanisms for such settings
using underlying mechanisms for secretary type problems. While there are in-
herent differences in the auction model and the secretary problem, a mechanism
for the secretary problem is used as a building block for obtaining an incentive
compatible mechanism for the online auction problem.
The J-choice, K-best Secretary Problem. Our LP formulation approach is able


to capture a much broader class of secretary problems. We define a most general
problem that we call the J-Choice, K-best secretary problem, referred to as the
(J, K)-secretary problem. Here, n candidates arrive randomly. The mechanism
is allowed to pick up to J different candidates and the objective is to pick as
many from the top K ranked candidates. The (1, 1)-secretary problem is the
classical secretary problem. For any J, K, we provide a linear program which
characterizes all mechanisms for the problem by generalizing the linear program
for the classical secretary problem.
A sub-class that is especially interesting is the (K, K)-secretary problem, since
it is closely related to the problem of maximizing the expected profit in a cardinal
version of the problem. In the cardinal version of the problem, n elements that
have arbitrary non-negative values arrive in a random order. The mechanism is
allowed to pick at most k elements and its goal is to maximize its expected profit.
We define a monotone mechanism to be an mechanism that, at any position,
does not select an element that is t best so far with probability higher than an
element that is t < t best so far. We note that any reasonable mechanism (and
in particular the optimal mechanism) is monotone. The following is a simple
observation. We omit the proof due to lack of space.

Observation 1. Let Alg be a monotone mechanism for the (K, K)-secretary


problem that is c-competitive. Then the mechanism is also c-competitive for max-
imizing the expected profit in the cardinal version of the problem.

Kleinberg [15] gave an asymptotically tight mechanism for the cardinal version of
the problem. However, this mechanism is randomized, and also not tight for small
values of k. Better mechanisms, even restricted to small values of k, are helpful
not only for solving the original problem, but also for improving mechanisms that
are based upon them. For example, a mechanism for the secretary knapsack [2]
uses a mechanism that is 1/e competitive for maximizing the expected profit
for small values of k (k ≤ 27). Analyzing the LP asymptotically for any value
n is a challenge even for small value k. However, using our characterization we
solve the problem easily for small values k and n which gives an idea on how
competitive ratio behaves for small values of k. Our results appear in Table 1.
We also give complete asymptotic analysis for the cases of (1, 2), (2, 1)-secretary
problems.

Table 1. Competitive ratio for maximizing expected profit. Experimental results for
n = 100.

   Number of elements the mechanism may pick    Competitive ratio
   1                                            1/e = 0.368
   2                                            0.474
   3                                            0.565
   4                                            0.613
1.2 Related Work

The basic secretary problem was introduced in a puzzle by Martin Gardner [11].
Dynkin [8] and Lindley [18] gave the optimal solution and showed that no other
strategy can do better (see the historical survey by Ferguson [9] on the history of
the problem). Subsequently, various variants of the secretary problem have been
studied with different assumptions and requirements [20](see the survey [10]).
More recently, there has been significant work using generalizations of secre-
tary problems as a framework for online auctions [2,3,4,13,15]. Incentives issues
in online mechanisms have been studied in several models [1,13,17]. These works
designed mechanisms where incentive issues were considered for both value and
time strategies. For example, Hajiaghayi et al. [13] studied a limited supply
online auction problem, in which an auctioneer has a limited supply of identical
goods and bidders arrive and depart dynamically. In their problem bidders also
have a time window which they can lie about.
Our linear programming technique is similar to the technique of factor reveal-
ing linear programs that have been used successfully in many different settings
[7,12,14,19]. A factor-revealing linear program formulates the performance of an
algorithm for a problem as a linear program (or sometimes, a more general con-
vex program). The objective function is the approximation factor of the algo-
rithm on the problem. Thus solving the linear program gives a bound on
the worst-case instance, which an adversary could choose so as to maximize or minimize
the approximation factor. Our technique, in contrast, captures the information
structure of the problem itself by a linear program. We do not apriori assume
any algorithm but formulate a linear program which captures every possible
algorithm. Thus optimizing our linear program not only gives us an optimal
algorithm, but it also proves that the algorithm itself is the best possible.

2 Introducing the Technique: Classical Secretary (and


Variants)

In this section, we give a simple linear program which we show characterizes all
possible mechanisms for the secretary problem. We stress that the LP captures
not only thresholding mechanisms, but any mechanism including probabilistic
mechanisms. Hence, finding the best mechanism for the secretary problem is
equivalent to finding the optimal solution to the linear program. The linear
program and its dual appear in Figure 1. The following two lemmas show that

   (P)  max  (1/n) · ∑_{i=1}^n i·p_i
        s.t. i·p_i ≤ 1 − ∑_{j=1}^{i−1} p_j    for all 1 ≤ i ≤ n
             p_i ≥ 0                          for all 1 ≤ i ≤ n

   (D)  min  ∑_{i=1}^n x_i
        s.t. ∑_{j=i+1}^n x_j + i·x_i ≥ i/n    for all 1 ≤ i ≤ n
             x_i ≥ 0                          for all 1 ≤ i ≤ n

   Fig. 1. Linear program and its dual for the secretary problem
the linear program exactly characterizes all feasible mechanisms for the secretary
problem.

Lemma 1. (Mechanism to LP solution) Let π be any mechanism for selecting
the best candidate. Let p_i^π denote the probability of selecting the candidate at
position i. Then p^π is a feasible solution to the linear program (P), i.e., it satisfies
the constraints p_i^π ≤ (1/i)(1 − ∑_{j<i} p_j^π) for each 1 ≤ i ≤ n. Moreover, the objective
value (1/n)·∑_{i=1}^n i·p_i^π is at least the probability of selecting the best candidate by π.

Proof. Let p_i^π be the probability with which mechanism π selects candidate i. Since no
mechanism can increase its chances of hiring the best candidate by selecting
a candidate that is not the best so far, we may consider only such
mechanisms. We now show that p^π satisfies the constraints of the linear program:

   p_i^π = Pr[π selects candidate i | candidate i is best so far]
           · Pr[candidate i is best so far]
        ≤ Pr[π did not select candidates {1, . . . , i − 1} | candidate i is best so far] · (1/i).

However, the probability of selecting candidates 1 to i − 1 depends only on the
relative ranks of these candidates and is independent of whether candidate i is
best so far (which can be determined after the mechanism has made its choices
regarding candidates 1 to i − 1). Therefore, we obtain p_i^π ≤ (1/i)(1 − ∑_{j<i} p_j^π),
which proves our claim.
Now we show that the objective function of the linear program is at least
the probability with which π accepts the best candidate. Since the mechanism
cannot distinguish whether the ith candidate is the best candidate so far or the
best candidate overall, the probability that the mechanism hires candidate i
given that the best candidate is in the ith position equals the probability that the
mechanism hires candidate i given that the best candidate among candidates 1
to i is in the ith position. Since the ith candidate is best so far with probability
1/i, the latter probability is at least i·p_i^π. Summing over all n positions, we get
that π hires the best candidate with probability at least (1/n)·∑_{i=1}^n i·p_i^π.

Lemma 1 shows that the optimal solution to (P) is an upper bound on the performance
of any mechanism. The following lemma shows that every LP solution
actually corresponds to a mechanism which performs as well as the objective
value of the solution.

Lemma 2. (LP solution to Mechanism) Let p_i for 1 ≤ i ≤ n be any feasible
LP solution to (P). Consider the mechanism π which selects candidate i with
probability i·p_i / (1 − ∑_{j<i} p_j) if candidate i is the best candidate so far and
candidates 1, . . . , i − 1 have not been selected, i.e., the mechanism reaches candidate
i. Then π is a mechanism which selects the best candidate with probability
(1/n)·∑_{i=1}^n i·p_i.
   (P)  max  (1/n) · ∑_{i=1}^n i·p_i + q(1 − ∑_{i=1}^n p_i)
        s.t. i·p_i ≤ 1 − ∑_{j=1}^{i−1} p_j       for all 1 ≤ i ≤ n
             p_i ≥ 0                             for all 1 ≤ i ≤ n

   (D)  min  ∑_{i=1}^n x_i + q
        s.t. ∑_{j=i+1}^n x_j + i·x_i ≥ i/n − q   for all 1 ≤ i ≤ n
             x_i ≥ 0                             for all 1 ≤ i ≤ n

   Fig. 2. Linear program and its dual for the rehiring secretary problem

Proof. First, notice that the mechanism is well defined since, for any i,
i·p_i / (1 − ∑_{j<i} p_j) ≤ 1. We prove by induction that the probability that the mechanism
selects the candidate at position i is exactly p_i. The base case is trivial. Assume this
holds up to i − 1. At step i, the probability that we choose i is the probability that
we did not choose candidates 1 to i − 1, which is 1 − ∑_{j<i} p_j, times the probability
that the current candidate is best so far, which is 1/i, times i·p_i / (1 − ∑_{j<i} p_j);
this product is exactly p_i.
The probability of hiring the ith candidate given that the ith candidate is the
best candidate equals the probability of hiring the ith candidate given that the ith
candidate is the best candidate among candidates 1 to i. Otherwise, it would mean
that the mechanism is able to distinguish between the event of seeing the relative
ranks and the absolute ranks, which is a contradiction to the definition of the
secretary problem. Since the ith candidate is best so far with probability 1/i,
the latter probability equals i·p_i (the mechanism hires only the best candidate
so far). Summing over all possible positions, we get that the mechanism π hires
the best candidate with probability (1/n)·∑_{i=1}^n i·p_i.

Using the above equivalence between LP solutions and mechanisms, it is easy
to show that the optimal mechanism can hire the best candidate with probability
no more than 1/e. The proof is simply by constructing a feasible solution to
the dual linear program.

Lemma 3 ([8]). No mechanism can hire the best candidate with probability
better than 1/e + o(1).

Proof. To prove an upper bound of 1/e + o(1) we only need to construct a feasible
dual solution to program (D) with value 1/e + o(1). Set x_i = 0 for each 1 ≤ i ≤ n/e
and x_i = (1/n)(1 − ∑_{j=i}^{n−1} 1/j) for n/e < i ≤ n. A simple calculation shows that x is
feasible and has objective value at most 1/e + o(1).
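As a purely illustrative numerical check (ours, not part of the paper), solving (P) with an off-the-shelf LP solver recovers both the value tending to 1/e and the threshold structure of the optimal mechanism:

```python
import numpy as np
from scipy.optimize import linprog

n = 200
# maximize (1/n) * sum_i i*p_i  <=>  minimize -(1/n) * sum_i i*p_i
c = -np.arange(1, n + 1) / n
# constraint for each i:  i*p_i + sum_{j<i} p_j <= 1
A = np.tril(np.ones((n, n)), k=-1) + np.diag(np.arange(1, n + 1))
b = np.ones(n)
res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * n, method="highs")
p = res.x
print("LP optimum:", -res.fun)                            # approaches 1/e ~ 0.3679
print("first i with p_i > 0:", np.argmax(p > 1e-9) + 1)   # roughly n/e
```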

2.1 Allowed to Rehire

One natural extension of the secretary problem is the case when one is allowed
to rehire the best secretary at the end with certain probability. That is, suppose
that after the interviewer has seen all n candidates, he is allowed to hire the
best candidate with certain probability q if no other candidate has been hired.
Observe that if q = 0, the problem reduces to the classical secretary problem
  
   (P1)  max (1/n)·∑_{i=1}^n f_i   s.t.  p ≤ 1/n;  f_i + (i−1)·p ≤ 1 and f_i ≤ i·p for all i;  p, f_i ≥ 0.   (Incentive compatible)
   (P2)  max (1/n)·∑_{i=1}^n f_i   s.t.  p ≤ 1/n;  f_i + (i−1)·p ≤ 1 and f_i = i·p for all i;  p, f_i ≥ 0.   (Regret free)
   (P3)  max (1/n)·∑_{i=1}^n f_i   s.t.  p = 1/n;  f_i + (i−1)·p ≤ 1 and f_i ≤ i·p for all i;  p, f_i ≥ 0.   (Must-hire)

   Fig. 3. (P1) characterizes all incentive compatible mechanisms; (P2) characterizes
   mechanisms that are additionally regret free; (P3) characterizes must-hire mechanisms.

while if q = 1, then the optimal strategy is to wait till the end and then hire the
best candidate. We give a tight description of strategies as q changes. This can
be achieved simply by modifying the linear program: add the term q(1 − ∑_{i=1}^n p_i)
to the objective function. That is, if the mechanism did not hire any candidate,
it may hire the best candidate with probability q. Solving the primal and the
corresponding dual (see Figure 2) gives the following tight result. The proof is
omitted.

Theorem 2. There is a mechanism for the rehire variant that selects the best
secretary with probability e^{−(1−q)} + o(1), and it is optimal.
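As a sanity check: q = 0 gives e^{−1} = 1/e, the classical guarantee, while q = 1 gives e^0 = 1, matching the strategy of waiting until the end and hiring the best candidate via the rehire option.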

3 Incentive Compatibility

In this section we study incentive compatible mechanisms for the secretary prob-
lem. We design a set of mechanisms Mp and show that with certain parameters
these mechanisms are the optimal mechanisms for certain secretary problems.
To this end, we derive linear formulations that characterize the set of possible
incentive compatible mechanisms and also analyze the dual linear programs.
The basic linear formulation that characterizes all incentive compatible mech-
anisms appears in Figure 3. We give a set of three linear formulations. The for-
mulation (P 1) characterizes all mechanisms that are incentive compatible, (P 2)
captures mechanisms that are also regret free and (P 3) captures mechanisms
that are must-hire mechanisms. This is formalized in the following two lemmas.

Lemma 4. (Mechanism to LP solution) Let π be any incentive compatible mechanism
for selecting the best candidate. Let p^π denote the probability that the mechanism
selects a candidate at each position i, and let f_i^π be the probability
that the mechanism selects the candidate at position i given that the candidate at
position i is the best candidate. Then:

– (p^π, f^π) is a feasible solution to the linear program (P1).
– If the mechanism is also regret free then (p^π, f^π) is a feasible solution to the
  linear program (P2).
– If the mechanism is also must-hire then (p^π, f^π) is a feasible solution to the
  linear program (P3).
– The objective value (1/n)·∑_{i=1}^n f_i^π is at least the probability of selecting the best
  candidate by π.

Proof. The proof follows the same ideas as the proof of Lemma 1. The condition
of incentive compatibility implies that p_i = p_j = p for any two positions
i and j.
Also, in the original secretary problem, every mechanism could be modified to
be a regret-free mechanism. This is not true for an incentive compatible mechanism.
Indeed, we have the constraint f_i ≤ i·p, since the probability of
hiring in the ith position is at least the probability of hiring in the ith position
given that the candidate is best so far times 1/i. If the mechanism is also supposed
to be regret free then equality must hold for each i. In the must-hire part
we demand that the sum of the p_i is 1. The resulting formulation given in Figure 3
is after simplification.

Lemma 4 shows that the optimal solution to the linear formulations is an upper
bound on the performance of any mechanism. To show the converse we define a
family of mechanisms, parameterized by their probability 0 ≤ p ≤ 1/n of selecting
a candidate at each position; we show that the set of feasible solutions to
(P1) corresponds to the set of mechanisms M_p defined here.

Incentive Compatible Mechanism M_p:
– Let 0 ≤ p ≤ 1/n. For each 1 ≤ i ≤ n, while no candidate is selected, do
  • If 1 ≤ i ≤ 1/(2p), select the ith candidate with probability i/(1/p − i + 1) if
    she is the best candidate so far.
  • If 1/(2p) < i ≤ n, let r = i/(1/p − i + 1). Select the ith candidate with probability
    1 if her rank is in the top ⌊r⌋, and with probability r − ⌊r⌋ if her
    rank is ⌊r⌋ + 1.
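A small simulation sketch of M_p (our own illustration, not from the paper) makes the incentive compatibility concrete: the estimated selection frequency is flat across positions, matching the per-position probability p asserted in Lemma 6 below.

```python
import math
import random

def run_Mp(n, p):
    """One run of M_p on a uniformly random arrival order.
    ranks[i] is the global rank of the candidate in position i + 1.
    Returns the selected position, or None if nobody is hired."""
    ranks = list(range(1, n + 1))
    random.shuffle(ranks)
    for i in range(1, n + 1):
        k = sum(1 for x in ranks[:i] if x <= ranks[i - 1])  # rank among first i
        r = i / (1 / p - i + 1)
        if i <= 1 / (2 * p):
            if k == 1 and random.random() < r:              # best so far, prob. r
                return i
        else:
            if k <= math.floor(r):                          # top floor(r): prob. 1
                return i
            if k == math.floor(r) + 1 and random.random() < r - math.floor(r):
                return i
    return None

n, p, trials = 50, 1 / 50, 100000
counts = [0] * (n + 1)
for _ in range(trials):
    pos = run_Mp(n, p)
    if pos is not None:
        counts[pos] += 1
print([round(c / trials, 3) for c in counts[1:]])  # each entry near p = 1/n = 0.02
```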
The following lemma shows that every LP solution to (P1) corresponds to a
mechanism which performs as well as the objective value of the solution.

Lemma 5. (LP solution to Mechanism) Let p, f_i for 1 ≤ i ≤ n be a feasible
LP solution to (P1). Then the mechanism M_p selects the best candidate with
probability at least (1/n)·∑_{i=1}^n f_i.

Proof. For any p, the optimal values of f_i are given by the following: f_i = i·p for
1 ≤ i ≤ 1/(2p) and f_i = 1 − (i − 1)·p for i > 1/(2p). For ease of calculation, we ignore
the fact that the fractions need not be integers. These are exactly the values achieved
by the mechanism M_p for any value p.
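Note that the two expressions agree at i = 1/(2p), where (up to the ignored rounding) both give f_i = 1/2: below this point the binding LP constraint is f_i ≤ i·p, while above it the binding constraint is f_i + (i − 1)·p ≤ 1.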

Lemma 6. The mechanism M_p is incentive compatible for each 0 ≤ p ≤ 1/n
and has efficiency 1 − ( 1/(4pn) + pn/2 ).
Proof. We prove by induction that the mechanism M_p selects each position i
with probability p. It is easy to verify that this is true for i = 1. For i > 1, by the
induction hypothesis the mechanism reaches position i with probability 1 − (i − 1)p,
and in both phases it selects there with probability r/i, so the probability that the
mechanism chooses position i is

   (r/i)·(1 − (i − 1)p) = (1 − (i − 1)p)/(1/p − i + 1) = p.

The probability that the mechanism selects the best candidate is related to the f_i:
f_i = i·p for 1 ≤ i ≤ 1/(2p), and f_i = 1 − (i − 1)p for 1/(2p) < i ≤ n. Thus, we get:

   (1/n)·∑_{i=1}^n f_i = (1/n)·( ∑_{i=1}^{1/(2p)} i·p + ∑_{i=1/(2p)+1}^n (1 − (i − 1)p) ) = 1 − ( 1/(4pn) + pn/2 ).

Optimizing the linear programs (P1), (P2) and (P3) exactly, we get the following
theorem. The optimality of the mechanisms can also be shown by exhibiting an
optimal dual solution.

Theorem 3. The family of mechanisms M_p achieves the following.
1. Mechanism M_{1/(√2·n)} is incentive compatible with efficiency 1 − 1/√2 ≈ 0.29.
2. Mechanism M_{1/(2n)} is incentive compatible and regret free with efficiency 1/4.
3. Mechanism M_{1/n} is incentive compatible and must-hire with efficiency 1/4.
Moreover, all these mechanisms are optimal for efficiency along with the additional
property.
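These values can be checked directly against Lemma 6: with pn = 1/√2 the efficiency is 1 − (√2/4 + √2/4) = 1 − 1/√2; with pn = 1/2 it is 1 − (1/2 + 1/4) = 1/4; and with pn = 1 it is 1 − (1/4 + 1/2) = 1/4.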

4 The J-Choice K-Best Secretary Problem


In this section we study the general problem of selecting as many as possible of the
top K ranked secretaries given J rounds in which to select. The mechanism is given J
possible rounds in which it may select a candidate, and it gains from selecting
any of the top K ranked candidates. The classical secretary problem is exactly
the 1-choice 1-best secretary problem. Other special cases include the case in which
the mechanism is given J rounds and profits only from the best candidate, or
is given a single round but profits from any of the best K candidates. Our
result is a simple linear formulation that characterizes all strategies for selecting
the candidates.
The following two lemmas show that the linear program in Figure 4 exactly
characterizes all feasible mechanisms for the (J, K)-secretary problem.
Lemma 7. (Mechanism to LP solution) Let π be any mechanism for the
(J, K)-secretary problem. Let
– p_i^j(π) be the probability of accepting the candidate at the ith position in the jth
  round, for each 1 ≤ i ≤ n and each 1 ≤ j ≤ J;
– q_i^{j|k}(π) be the probability of accepting the candidate at the ith position in the jth
  round given that the candidate is the kth best candidate among the first i
  candidates, for each 1 ≤ i ≤ n, 1 ≤ j ≤ J and 1 ≤ k ≤ K.
   max  F(q) = (1/n) · ∑_{i=1}^n ∑_{j=1}^J ∑_{k=1}^K ∑_{ℓ=1}^k [ (n−i choose k−ℓ)·(i−1 choose ℓ−1) / (n−1 choose k−1) ] · q_i^{j|ℓ}
   s.t.
   ∀ 1 ≤ i ≤ n, 1 ≤ j ≤ J:              p_i^j = (1/i) · ∑_{k=1}^{min{i,K}} q_i^{j|k}
   ∀ 1 ≤ i ≤ n, 1 ≤ k ≤ K:              q_i^{1|k} ≤ 1 − ∑_{ℓ<i} p_ℓ^1
   ∀ 1 ≤ i ≤ n, 1 ≤ k ≤ K, 2 ≤ j ≤ J:   q_i^{j|k} ≤ ∑_{ℓ<i} p_ℓ^{j−1} − ∑_{ℓ<i} p_ℓ^j
   ∀ 1 ≤ i ≤ n, 1 ≤ k ≤ K, 1 ≤ j ≤ J:   p_i^j, q_i^{j|k} ≥ 0

   Fig. 4. Linear program for the (J, K)-secretary problem

Then (p(π), q(π)) is a feasible solution, and the expected number of top-K candidates
selected is at most F(p(π), q(π)).
Proof. Let us first prove the first type of constraints, of the form p_i^j = (1/i)·∑_{k=1}^{min{i,K}} q_i^{j|k}.
It is clear that there is no reason for any mechanism to select a candidate which
is not at least the Kth best so far. Such a candidate cannot even potentially be one
of the K best globally and therefore is not profitable for the mechanism. Thus,
the probability that any mechanism selects the ith candidate in the jth round is the
sum over k of the probability of selecting the ith candidate in the jth round given that
the candidate is the kth best so far, times 1/i, which is the probability
that the candidate is the kth best so far. We sum up to the minimum of
i and K to get the desired equality, which holds for every mechanism. Let us
now prove the third type of constraints (the second type follows by the same
arguments). Consider any mechanism, some position i and some round j. Then

   q_i^{j|k} = Pr[π selects candidate i in round j | candidate i is kth best so far]
             ≤ Pr[π selected exactly j − 1 candidates out of candidates {1, . . . , i − 1}
                  | candidate i is kth best so far]
             = Pr[π selected exactly j − 1 candidates out of candidates {1, . . . , i − 1}]
             = ∑_{ℓ<i} p_ℓ^{j−1}(π) − ∑_{ℓ<i} p_ℓ^j(π).

The inequality follows since in order to select candidate i in round j the mechanism
must have selected exactly j − 1 candidates out of the previous i − 1 candidates. The
following equality holds since the decisions made by the policy with respect
to the first i − 1 candidates depend only on the relative ranks of these i − 1 candidates, and
are independent of the rank of the ith candidate with respect to them.
The final equality follows since the event of selecting at least j − 1 candidates contains the
event of selecting at least j candidates, which concludes our proof.
Finally, let us consider the objective function and prove that it upper bounds
the performance of the mechanism. For the analysis, consider the probabilities
f_i^{j|k}, defined as the probability of selecting the ith candidate in the
jth round given that the kth best candidate overall is at the ith position. Note that the
main difference between f_i^{j|k} and q_i^{j|k} is that while the former refers to the kth
best candidate overall, the latter looks only from the mechanism's perspective
and therefore considers the event that the candidate is the kth best among the first i
candidates. It is easy to state the objective function using the former variables:
it is simply the sum of f_i^{j|k} over all values of i, j and k, multiplied by 1/n.
To finish, we simply express each f_i^{j|k} in terms of the q_i^{j|ℓ}, which proves the lemma.
Claim. For each 1 ≤ i ≤ n, 1 ≤ j ≤ J and 1 ≤ k ≤ K, we must have

   f_i^{j|k} = ∑_{ℓ=1}^k [ (i−1 choose ℓ−1)·(n−i choose k−ℓ) / (n−1 choose k−1) ] · q_i^{j|ℓ}.

The proof is omitted due to lack of space. The proof of Lemma 7 follows directly
from the claim.
Lemma 7 shows that the optimal solution to (P) is an upper-bound on the per-
formance of the mechanism. The following lemma shows that every LP solution
actually corresponds to a mechanism which performs as well as the objective
value of the solution.
Lemma 8. (LP solution to Mechanism) Let (p, q) be any feasible LP solution
to (P). Consider the mechanism π defined inductively as follows. For
each position 1 ≤ i ≤ n,
– If the mechanism has not selected any candidate among positions {1, . . . , i−1}
  and the rank of candidate i among {1, . . . , i} is k for some 1 ≤ k ≤ K, then
  select candidate i with probability q_i^{1|k} / (1 − ∑_{ℓ<i} p_ℓ^1).
– If the mechanism has selected j − 1 candidates in positions 1, . . . , i − 1 for
  some 2 ≤ j ≤ J and the rank of candidate i among {1, . . . , i} is k for some
  1 ≤ k ≤ K, then select candidate i with probability q_i^{j|k} / ( ∑_{ℓ<i} p_ℓ^{j−1} − ∑_{ℓ<i} p_ℓ^j ).
– Else do not select candidate i.
Then the expected number of top-K candidates selected by π is exactly F(p, q).
Proof (Sketch). The proof is by induction on the steps of the mechanism. It
can easily be verified that the procedure above maintains, by induction, that p_i^j(π) =
p_i^j and q_i^{j|k}(π) = q_i^{j|k}; that is, the probability that the mechanism selects the ith
candidate in the jth round is the same as in the LP. As stated in Lemma 7, there is a
correspondence between the values of q_i^{j|k}(π) and f_i^{j|k}(π), the probabilities
of hiring the ith candidate in the jth round given that the candidate is the
kth best. Thus, the objective value of π is exactly F(p, q).
We now give optimal mechanisms for the (1, 2)- and (2, 1)-secretary problems.
Observe that the (1, 1)-secretary problem is the traditional secretary problem.

Theorem 4. There exist mechanisms which achieve a performance of
1. 1/e + 1/e^{3/2} ≈ 0.591 for the (2, 1)-secretary problem;
2. ≈ 0.572284 for the (1, 2)-secretary problem.
Moreover, all these mechanisms are (nearly) optimal.
Proof. (Sketch) To give a mechanism, we will give a primal solution to LP (J, K).
The optimality is shown by exhibiting a dual solution of the same value. Due to
lack of space we only prove the (2, 1) case.
(2,1)-secretary. Let $t_1 = n/e^{3/2}$ and $t_2 = n/e$. Consider the following mechanism: select the $i$th candidate if the $i$th candidate is best so far, $t_1 \le i < t_2$, and no other candidate has been selected; or if $t_2 \le i \le n$, the $i$th candidate is best so far, and at most one candidate has been selected. The performance of this mechanism is $\frac{1}{e} + \frac{1}{e^{3/2}}$. The mechanism corresponds to the primal LP solution where $p_i^1 = 0$ for $1 \le i < t_1$ and $p_i^1 = \frac{t_1 - 1}{i(i-1)}$ for $t_1 \le i \le n$; $p_i^2 = 0$ for $1 \le i < t_2$ and $p_i^2 = \frac{t_2 - t_1}{i(i-1)} + \frac{t_1 - 1}{i(i-1)} \sum_{j=t_2-1}^{i-2} \frac{1}{j}$ for $t_2 \le i \le n$; and $q_i^{j|1} = i \cdot p_i^j$ for each $1 \le j \le 2$ and $1 \le i \le n$.
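The following Monte Carlo sketch (an illustration added here, not from the paper; the parameter values are ours) simulates the threshold mechanism directly and can be used to confirm the stated performance numerically:

```python
import math, random

# Added Monte Carlo illustration: simulate the (2,1) threshold mechanism and
# estimate the probability that the best candidate is selected; it should
# approach 1/e + 1/e^{3/2} ~ 0.591 as n grows.
def simulate(n=500, trials=20_000):
    t1, t2 = n / math.e ** 1.5, n / math.e
    hits = 0
    for _ in range(trials):
        values = random.sample(range(n), n)   # values[i-1] = quality of candidate i
        best_seen, selections, got_best = -1, 0, False
        for i, v in enumerate(values, start=1):
            if v > best_seen:                 # candidate i is best so far
                best_seen = v
                if (t1 <= i < t2 and selections == 0) or \
                   (t2 <= i and selections <= 1):
                    selections += 1
                    got_best = got_best or (v == n - 1)
        hits += got_best
    return hits / trials

print(simulate())   # roughly 0.59
```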
Dual Solution. We first simplify the primal linear program by eliminating the $q_i^{j|k}$ variables using the first set of constraints. Let $y_i$ denote the dual variables corresponding to the second set of constraints and $z_i$ the variables corresponding to the third set of constraints. Then the following dual solution is of value $\frac{1}{e} + \frac{1}{e^{3/2}} - o(1)$. Set $z_i = 0$ for $1 \le i < t_2$ and $z_i = \frac{1}{n}\bigl(1 - \sum_{j=i+1}^{n} \frac{1}{j}\bigr)$ for $t_2 \le i \le n$. Set $y_i = 0$ for $1 \le i < t_1$, $y_i = \frac{1}{n}\bigl(1 - \sum_{j=i+1}^{n} \frac{1}{j}\bigr) + \sum_{j=t_2}^{n} \frac{1}{in}\bigl(1 - \sum_{k=j+1}^{n} \frac{1}{k}\bigr)$ for $t_1 \le i < t_2$, and $y_i = \frac{1}{n}\bigl(1 - \sum_{j=i+1}^{n} \frac{1}{j}\bigr) + \sum_{j=i}^{n} \frac{1}{in}\bigl(1 - \sum_{k=j+1}^{n} \frac{1}{k}\bigr)$ for $t_2 \le i \le n$.
5 Further Discussion
Characterizing the set of mechanisms in secretary-type problems as a polytope has many advantages. In contrast to the method of factor-revealing LPs, in which linear programs are used to analyze a single algorithm, here we characterize all mechanisms by a linear program. One direction for future research is to try to capture more complex settings of a more combinatorial nature. One such example is the clean problem studied in [4], in which elements of a matroid arrive one-by-one. This problem seems extremely appealing, since matroid constraints are exactly captured by a linear program. Another promising direction is obtaining upper bounds. While the linear program which characterizes the performance may be too complex to yield a simple mechanism, the dual linear program may still be used for obtaining upper bounds on the performance of any mechanism. We believe that linear programming and duality form a powerful approach for studying secretary problems and will be applicable in more generality.

References
1. Awerbuch, B., Azar, Y., Meyerson, A.: Reducing Truth-Telling Online Mechanisms to Online Optimization. In: Proceedings of ACM Symposium on Theory of Computing, pp. 503–510 (2003)
2. Babaioff, M., Immorlica, N., Kempe, D., Kleinberg, R.: A Knapsack Secretary Problem with Applications. In: Charikar, M., Jansen, K., Reingold, O., Rolim, J.D.P. (eds.) RANDOM 2007 and APPROX 2007. LNCS, vol. 4627, pp. 16–28. Springer, Heidelberg (2007)
3. Babaioff, M., Immorlica, N., Kempe, D., Kleinberg, R.: Online Auctions and Generalized Secretary Problems. SIGecom Exchange 7, 1–11 (2008)
4. Babaioff, M., Immorlica, N., Kleinberg, R.: Matroids, Secretary Problems, and Online Mechanisms. In: Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms (2007)
5. Babaioff, M., Dinitz, M., Gupta, A., Immorlica, N., Talwar, K.: Secretary problems: weights and discounts. In: SODA '09: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1245–1254. Society for Industrial and Applied Mathematics, Philadelphia (2009)
6. Buchbinder, N., Singh, M., Jain, K.: Incentives in Online Auctions and Secretary Problems via Linear Programming (2009) (manuscript)
7. Buchbinder, N., Jain, K., Naor, J.(S.): Online primal-dual algorithms for maximizing ad-auctions revenue. In: Arge, L., Hoffmann, M., Welzl, E. (eds.) ESA 2007. LNCS, vol. 4698, pp. 253–264. Springer, Heidelberg (2007)
8. Dynkin, E.B.: The Optimum Choice of the Instant for Stopping a Markov Process. Sov. Math. Dokl. 4 (1963)
9. Ferguson, T.S.: Who Solved the Secretary Problem? Statist. Sci. 4, 282–289 (1989)
10. Freeman, P.R.: The Secretary Problem and its Extensions: A Review. International Statistical Review 51, 189–206 (1983)
11. Gardner, M.: Mathematical Games. Scientific American, 150–153 (1960)
12. Goemans, M., Kleinberg, J.: An improved approximation ratio for the minimum latency problem. In: SODA '96: Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 152–158 (1996)
13. Hajiaghayi, M.T., Kleinberg, R., Parkes, D.C.: Adaptive Limited-Supply Online Auctions. In: Proceedings of the 5th ACM Conference on Electronic Commerce (2004)
14. Jain, K., Mahdian, M., Markakis, E., Saberi, A., Vazirani, V.V.: Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. J. ACM 50(6), 795–824 (2003)
15. Kleinberg, R.: A Multiple-Choice Secretary Algorithm with Applications to Online Auctions. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2005)
16. Korula, N., Pál, M.: Algorithms for secretary problems on graphs and hypergraphs. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009. LNCS, vol. 5556, pp. 508–520. Springer, Heidelberg (2009)
17. Lavi, R., Nisan, N.: Competitive Analysis of Incentive Compatible On-line Auctions. In: Proceedings of the 2nd ACM Conference on Electronic Commerce, pp. 233–241 (2000)
18. Lindley, D.V.: Dynamic Programming and Decision Theory. Applied Statistics 10, 39–51 (1961)
19. Mehta, A., Saberi, A., Vazirani, U., Vazirani, V.: AdWords and generalized online matching. J. ACM 54(5), 22 (2007)
20. Samuels, S.M.: Secretary Problems. In: Handbook of Sequential Analysis, vol. 118, pp. 381–405 (1991)
Branched Polyhedral Systems

Volker Kaibel and Andreas Loos

Otto-von-Guericke-Universität Magdeburg, Institut für Mathematische Optimierung,
Universitätsplatz 2, 39108 Magdeburg, Germany
{kaibel,loos}@ovgu.de
Abstract. We introduce the framework of branched polyhedral systems
that can be used in order to construct extended formulations for polyhe-
dra by combining extended formulations for other polyhedra. The frame-
work, for instance, simultaneously generalizes extended formulations like
the well-known ones (see Balas [1]) for the convex hulls of unions of
polyhedra (disjunctive programming) and like those obtained from dy-
namic programming algorithms for combinatorial optimization problems
(due to Martin, Rardin, and Campbell [11]). Using the framework, we
construct extended formulations for full orbitopes (the convex hulls of
all 0/1-matrices with lexicographically sorted columns), we show for two
special matching problems, how branched polyhedral systems can be
exploited in order to construct formulations for certain nested combina-
torial problems, and we indicate how one can build extended formula-
tions for stable set polytopes using the framework of branched polyhedral
systems.

1 Introduction
An extended formulation for a polyhedron $P \subseteq \mathbb{R}^n$ is a linear system $Ay \le b$ defining a polyhedron $Q = \{y \in \mathbb{R}^d \mid Ay \le b\}$ such that there is a projection (linear map) $p : \mathbb{R}^d \to \mathbb{R}^n$ with $p(Q) = P$. With respect to optimization of linear functionals $x \mapsto \langle c, x\rangle$ over $P$, such an extended formulation is similarly useful as a description of $P$ by means of linear inequalities in $\mathbb{R}^n$, since we have $\max\{\langle c, x\rangle \mid x \in P\} = \max\{\langle p^{\star}(c), y\rangle \mid Ay \le b\}$ with the map $p^{\star} : \mathbb{R}^n \to \mathbb{R}^d$ that is adjoint to $p$, i.e., $p^{\star}(x) = T^{t} x$ for all $x \in \mathbb{R}^n$ if $T \in \mathbb{R}^{n \times d}$ is the matrix with $p(y) = Ty$ for all $y \in \mathbb{R}^d$.
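As a standard illustration of this definition (a textbook example, not one of the constructions of this paper), consider the cross-polytope $P = \mathrm{conv}\{\pm e_1, \ldots, \pm e_n\} = \{x \in \mathbb{R}^n \mid \sum_{i=1}^{n} |x_i| \le 1\}$: a description in the original space needs all $2^n$ inequalities $\sum_i \varepsilon_i x_i \le 1$ ($\varepsilon \in \{-1,+1\}^n$), while $P = p(Q)$ holds for the polyhedron
$$Q = \bigl\{(x, y) \in \mathbb{R}^n \oplus \mathbb{R}^n \;\bigm|\; -y_i \le x_i \le y_i \ (i \in [n]),\ \textstyle\sum_{i=1}^{n} y_i = 1\bigr\}$$
with only $2n + 1$ constraints and the projection $p(x, y) = x$.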
Extended formulations play an increasingly important role in polyhedral com-
binatorics and mixed integer programming. For a survey on extended formula-
tions in combinatorial optimization we refer to [2], and for examples of recent
interesting results on extended formulations for mixed integer problems to, e.g.,
[3,4]. The reason for the importance of extended formulations lies in the fact
that they can have far fewer constraints than a description in the original space.
Moreover, in many cases the extensions reflect much better the structure of the
underlying problem, because not all relevant aspects can be expressed linearly
in the original variables. In fact, in several cases, (small) extended formulations
are rather easy to derive, while it seems extremely difficult to come up with a
linear description in the original space.
A trivial extended formulation for a polytope $P \subseteq \mathbb{R}^n$ with vertex set $X$ is given by the simplex $Q \subseteq \mathbb{R}^X$ that is the convex hull of the standard unit vectors in $\mathbb{R}^X$, with the projection $p : \mathbb{R}^X \to \mathbb{R}^n$ defined via $p(y) = \sum_{x \in X} y_x \cdot x$. Of
course, this extended formulation usually neither helps to solve an optimization
problem nor does it provide much insight, because it is just trivially combined
from many trivial building blocks, the vertices of P . In many cases, however, one
can construct much better (e.g., smaller) extended formulations by combining
fewer non-trivial building blocks that are still simple enough. For instance, one
may be able to cover the vertex set X by X = X1 ∪ · · · ∪ Xq such that one
has available nice linear descriptions of the polytopes P (i) with vertex sets Xi ,
respectively, which one wants to combine to a linear description or an extended
formulation of the whole polytope P (see Section 4.1). This is known as disjunc-
tive programming (see, e.g., [1]). As a simple combinatorial example in which
the interplay of the non-trivial building blocks is slightly more involved, let us
consider those perfect matchings in a $2\ell$-partite graph, with corresponding partitioning of the node set into $2\ell$ parts, each of size $p$, that induce, on each of the bipartite subgraphs defined by any pair of parts, either an empty or a perfect matching. The question is: How can one derive a linear description or an extended formulation for the polytope associated with these matchings from linear descriptions of the perfect matching polytope on the complete graph on $2\ell$ nodes and from the perfect matching polytopes on the bipartite subgraphs induced by pairs of parts (see Subsection 4.4)? Or suppose that we can partition the node set of a graph in such a way that the stable set polytopes on each of the parts can
be described by linear inequalities (or by extended formulations). How can one
construct extended formulations for the stable set polytope of the whole graph
from descriptions of the stable set polytopes of the parts and some polytope that
describes the interplay of the parts (see Subsection 4.5)?
The framework of branched polyhedral systems that we introduce in this pa-
per can be used for combining knowledge on several polyhedra in order to de-
rive extended formulations (or, in some cases, even linear descriptions in the
original space) for other polyhedra. We demonstrate the capabilities of the
framework on the examples mentioned in the previous paragraph, as well as
by showing how the framework generalizes the directed hypergraph framework
developed in [11] in order to obtain extended formulations from certain dynamic
programming algorithms (see Subsection 4.3). The relation between branched
polyhedral systems and the composition methods developed by Margot [10] and
Schaffers [12] will be discussed in the journal version of this paper. We also
provide the first polynomial size extended formulations for full orbitopes, i.e.,
the convex hulls of all 0/1-matrices with lexicographically sorted columns (see
Section 4.2).
The interplay between the polyhedra in our framework is governed by acyclic
directed graphs, where the polyhedra that form the building blocks are assigned
to the (non-sink) nodes of the digraph. The central result of the paper is Theo-
rem 1 (stated and proved in Section 3) that establishes how a polyhedron defined
by an inner description constructed from inner descriptions of the building block
polyhedra as described in Section 2 can be described by means of an extended


formulation derived from extended formulations for those building blocks. The
applications mentioned above are treated in Section 4.
For easier reference, we define here most of the notions and notations used
throughout the paper. For a directed graph D = (V, A) (with A ⊆ V × V \
{(v, v) | v ∈ V }) and some node v ∈ V , we denote by Nin (v) = {u ∈ V | (u, v) ∈
A} and Nout (v) = {w ∈ V | (v, w) ∈ A} the sets of in-neighbors and out-
neighbors of v, respectively, by δ in (v) = {(u, v) | u ∈ Nin (v)} and δ out (v) =
{(v, w) | w ∈ Nout (v)} the in-star and the out-star of v, respectively, and by
R(v) (the reach of v) the set of nodes that can be reached from v by some
directed path in D. A digraph D = (V, A) is acyclic if it has no directed cycle.
We denote the set of sinks of D = (V, A) by VT ⊆ V .
An acyclic subset $\emptyset \ne B \subseteq A$ of arcs of a directed graph $D = (V, A)$ is called a branching in $D$ if, viewed undirectedly, $B$ forms a tree, and $|B \cap \delta^{in}(v)| = 1$ holds for every $v$ in the set $V(B)$ of nodes incident to arcs in $B$ (the nodes covered by $B$), except for one node, called the root $r \in V(B)$ of $B$ (for which $B \cap \delta^{in}(r) = \emptyset$ holds). For $W \subseteq V$, we denote by $G[W]$ the subgraph of $G$ induced by $W$.
For an undirected graph $G = (V, E)$, we denote $\delta(v) = \{e \in E \mid v \in e\}$ for all $v \in V$, and $\delta(W) = \{e \in E \mid |e \cap W| = 1\}$ for all $W \subseteq V$. For $W, W' \subseteq V$ with $W \cap W' = \emptyset$, we define $(W : W') = \{e \in E \mid e \cap W \ne \emptyset,\ e \cap W' \ne \emptyset\}$ and $(w : W') = (\{w\} : W')$ for $w \in V \setminus W'$.
For a vector $x \in \mathbb{R}^M$, the support $\mathrm{supp}(x) \subseteq M$ of $x$ is the set of all $i \in M$ with $x_i \ne 0$. For a subset $N \subseteq M$ of indices of $x$, we define $x(N) = \sum_{i \in N} x_i$. Furthermore, we denote $[n] = \{1, 2, \ldots, n\}$. For a set $X \subseteq \mathbb{R}^n$, $\mathrm{conv}(X) = \{\sum_{x \in \tilde X} \lambda_x \cdot x \mid \tilde X \subseteq X \text{ finite},\ \sum_{x \in \tilde X} \lambda_x = 1,\ \lambda_x \ge 0 \text{ for all } x \in \tilde X\}$ denotes the convex hull, $\mathrm{ccone}(X) = \{\sum_{x \in \tilde X} \lambda_x \cdot x \mid \tilde X \subseteq X \text{ finite},\ \lambda_x \ge 0 \text{ for all } x \in \tilde X\}$ is the convex-conic hull of $X$, and $\mathrm{cone}(X) = \{\lambda \cdot x \mid x \in X,\ \lambda \in \mathbb{R}_+\}$ is the conic hull of $X$.
If $N \subseteq M$ is a subset of the finite set $M$, then $\chi(N) \in \{0,1\}^M$ with $\chi(N)_i = 1$ if and only if $i \in N$ is the characteristic vector of $N \subseteq M$. For $x \in \mathbb{R}^M$, $x_N \in \mathbb{R}^N$ is the vector formed by the components of $x$ corresponding to $N$. The zero-vector and the standard unit vectors in $\mathbb{R}^M$ are denoted by $O_M$ and $e_i$ (for $i \in M$). The symbol $\uplus$ is used to represent disjoint unions of sets.

2 The Concept
A branched polyhedral system (BPS) is a pair $B = (D, (P^{(v)}))$ of an acyclic directed graph $D = (V, A)$ with a unique source $s \in V$ and a family $P^{(v)} \subseteq \mathbb{R}^{N^{out}(v)}$ ($v \in V \setminus V_T$) of non-empty polyhedra, such that, for every $v \in V \setminus V_T$, there is an admissible pair $(G^{(v)}_{cv}, G^{(v)}_{cc})$ of generating sets, i.e., finite sets $\emptyset \ne G^{(v)}_{cv} \subseteq \mathbb{R}^{N^{out}(v)}$ and $G^{(v)}_{cc} \subseteq \mathbb{R}^{N^{out}(v)}$ with
$$P^{(v)} = \mathrm{conv}(G^{(v)}_{cv}) + \mathrm{ccone}(G^{(v)}_{cc})$$
satisfying the following for all $\tilde x \in G^{(v)}_{cv} \cup G^{(v)}_{cc}$:
(S.1) $\tilde x_w > 0$ for all $w \in \mathrm{supp}(\tilde x) \setminus V_T$
(S.2) $R(w) \cap R(w') = \emptyset$ for all $w, w' \in \mathrm{supp}(\tilde x)$, $w \ne w'$
If $P^{(v)}$ is pointed (i.e., it has vertices), then we only need to consider the vertex set $G^{(v)}_{cv}$ of $P^{(v)}$ and some set $G^{(v)}_{cc}$ that contains exactly one non-zero vector from each extreme ray of the recession cone of $P^{(v)}$. As we can identify $N^{out}(v)$ with $\delta^{out}(v)$, we will also consider the polyhedra $P^{(v)}$ as subsets of $\mathbb{R}^{\delta^{out}(v)}$.
For the remainder of this section, let $B = (D, (P^{(v)}))$ be a BPS with $D = (V, A)$ and source node $s \in V$. We fix one family $(G^{(v)}_{cv}, G^{(v)}_{cc})$ of admissible pairs of generating sets.
With respect to the fixed family of admissible pairs of generating sets, we define two finite sets $G_{cv}(B), G_{cc}(B) \subseteq \mathbb{R}^V$ that will be used to define a polyhedron associated with the BPS later (it will turn out that this polyhedron does not depend on the particular family, but on the BPS only).
We start by constructing $G_{cv}(B) \subseteq \mathbb{R}^V$ as the set that contains all points $x \in \mathbb{R}^V$ for which the following holds:
(V.1) $x_s = 1$
(V.2) For each $v \in \mathrm{supp}(x) \setminus V_T$, we have $\frac{1}{x_v} x_{N^{out}(v)} \in G^{(v)}_{cv}$.
(V.3) For each $v \in \mathrm{supp}(x) \setminus \{s\}$, we have $x_{N^{in}(v)} \ne O_{N^{in}(v)}$.
Note that $G_{cv}(B) \ne \emptyset$ holds, since we required the polyhedra $P^{(v)}$ to be non-empty. Furthermore, looking at the nodes in order of a topological ordering of the acyclic digraph $D$ (i.e., $v$ appears before $w$ in the ordering for all $(v, w) \in A$), one finds $|G_{cv}(B)| < \infty$.
For a node $v \in V$, the truncation of $B$ at $v$ is the BPS $B_v$ induced by $B$ on the reach $R(v)$ of $v$ in $D$ (with $v$ as its source node). Clearly, a family of admissible pairs of generating sets for $B$ induces such a family for $B_v$, to which we refer in the following characterization of $G_{cv}(B)$, which in particular clarifies the name branched polyhedral system. For a vector $x \in \mathbb{R}^V$, we denote by $A[x]$ the set of arcs $(v, w) \in A$ with $x_v, x_w \ne 0$. We omit the proof of the following proposition in this extended abstract.
Proposition 1. For each $x \in \mathbb{R}^V$, we have $x \in G_{cv}(B)$ if and only if
(a) $x_s = 1$,
(b) $\frac{1}{x_v} \cdot x_{R(v)} \in G_{cv}(B_v)$ for each $v \in \mathrm{supp}(x) \setminus V_T$, and
(c) $A[x]$ is a branching in $D$ with root $s$, $\mathrm{supp}(x)$ being the set of covered nodes.
We continue by constructing $G_{cc}(B)$ as the set of all $x \in \mathbb{R}^V$ for which there is some node $v_x \in V \setminus V_T$ (the root node of $x$) with:
(R.1) $x_{N^{out}(v_x)} \in G^{(v_x)}_{cc}$
(R.2) $x_{\bar V} = O$ for $\bar V = V \setminus \bigcup\{R(w) \mid w \in N^{out}(v_x) \cap \mathrm{supp}(x)\}$
(R.3) $\frac{1}{x_w} x_{R(w)} \in G_{cv}(B_w)$ for all $w \in N^{out}(v_x) \cap \mathrm{supp}(x)$
Note that, since $G_{cv}(B_w)$ is finite for all $w \in V \setminus V_T$, we also have $|G_{cc}(B)| < \infty$ (we already observed $|G_{cv}(B)| < \infty$).
Finally, the polyhedron defined by the branched polyhedral system $B$ is
$$P(B) = \mathrm{conv}(G_{cv}(B)) + \mathrm{ccone}(G_{cc}(B)),$$
where Theorem 1 will imply that this definition does not depend on the particular choice of a family of admissible pairs of generating sets for the polyhedra $P^{(v)}$.
Before concluding the section with considering a special class of branched
polyhedral systems, we state two remarks on nonnegativity (following readily
from (S.1)) and integrality (following readily from (V.1), (V.2), (V.3)).
Remark 1
1. For v ∈ V \ VT , we have xv ≥ 0 for all x ∈ Gcv (B) ∪ Gcc (B) (thus for all
x ∈ P(B)).
2. If P (v) is a pointed integral polyhedron for each v ∈ V \ VT , then P(B) is
integral as well.
A branched combinatorial system (BCS) is a pair $C = (D, (S^{(v)}))$ of an acyclic directed graph $D = (V, A)$ with a unique source $s \in V$ (and set $V_T \subseteq V$ of sinks) and a non-empty family $S^{(v)}$ of subsets of $N^{out}(v)$ for every node $v \in V \setminus V_T$ such that, for each $v \in V \setminus V_T$ and $S \in S^{(v)}$ and for every pair $w, w' \in S$ with $w \ne w'$, we have $R(w) \cap R(w') = \emptyset$. A subset $F$ of nodes is feasible for the BCS $C$ if it satisfies
(F.1) $s \in F$,
(F.2) $F \cap N^{out}(v) \in S^{(v)}$ for all $v \in F \setminus V_T$, and
(F.3) $F \cap N^{in}(v) \ne \emptyset$ for all $v \in F \setminus \{s\}$.
With $P^{(v)} = \mathrm{conv}\{\chi(S) \in \{0,1\}^{N^{out}(v)} \mid S \in S^{(v)}\}$ for all $v \in V \setminus V_T$, we denote by $B(C) = (D, (P^{(v)}))$ the BPS defined by the BCS $C$. Clearly, the admissible pairs of generating sets that we consider in this context are $G^{(v)}_{cv} = \{\chi(S) \in \{0,1\}^{N^{out}(v)} \mid S \in S^{(v)}\}$ and $G^{(v)}_{cc} = \emptyset$ for all $v \in V \setminus V_T$, for which we find $G_{cv}(B(C)) = \{\chi(F) \in \{0,1\}^V \mid F \subseteq V \text{ feasible for } C\}$ (and, clearly, $\mathrm{ccone}(G_{cc}(B(C))) = \{O\}$). In particular, Proposition 1(c) implies that $A \cap (F \times F)$ is a branching with root $s$, and, with $P(C) = P(B(C))$, we have $P(C) = \mathrm{conv}\{\chi(F) \in \{0,1\}^V \mid F \subseteq V \text{ feasible for } C\}$.
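To make conditions (F.1)–(F.3) concrete, the following small Python check (an added illustration on a toy digraph of our own choosing, not an example from the paper) enumerates all feasible sets by brute force:

```python
from itertools import chain, combinations

# Toy BCS: source 's', sinks 't1','t2', arcs s->1, s->2, 1->t1, 1->t2, 2->t2,
# with S^(s) = {{1},{2}}, S^(1) = {{t1},{t1,t2}}, S^(2) = {{t2}}.
V = ['s', 1, 2, 't1', 't2']
N_out = {'s': {1, 2}, 1: {'t1', 't2'}, 2: {'t2'}}
N_in = {1: {'s'}, 2: {'s'}, 't1': {1}, 't2': {1, 2}}
S = {'s': [{1}, {2}], 1: [{'t1'}, {'t1', 't2'}], 2: [{'t2'}]}
sinks = {'t1', 't2'}

def feasible(F):
    return ('s' in F                                           # (F.1)
            and all(F & N_out[v] in S[v] for v in F - sinks)   # (F.2)
            and all(F & N_in[v] for v in F - {'s'}))           # (F.3)

subsets = chain.from_iterable(combinations(V, r) for r in range(1, len(V) + 1))
print([sorted(F, key=str) for F in map(set, subsets) if feasible(F)])
# -> [[1, 's', 't1'], [2, 's', 't2'], [1, 's', 't1', 't2']]
```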

3 Inequality Descriptions
For a non-empty polyhedron $\emptyset \ne Q \subseteq \mathbb{R}^d$ with recession cone $\mathrm{rec}(Q) = \{z \in \mathbb{R}^d \mid \tilde z + z \in Q \text{ for all } \tilde z \in Q\}$, the homogenization
$$\mathrm{homog}(Q) = \mathrm{cone}\{(z, 1) \mid z \in Q\} + \{(z, 0) \mid z \in \mathrm{rec}(Q)\} \subseteq \mathbb{R}^d \oplus \mathbb{R}$$
of $Q = \{z \in \mathbb{R}^d \mid Az \le b\}$ with $A \in \mathbb{R}^{m \times d}$, $b \in \mathbb{R}^m$ is the polyhedral cone
$$\mathrm{homog}(Q) = \{(z, \xi) \in \mathbb{R}^d \oplus \mathbb{R} \mid Az - \xi b \le 0,\ \xi \ge 0\}.$$
The following result, showing how to obtain, for a BPS B, an extended formula-
tion for P(B) from extended formulations for the polyhedra P (v) , makes up the
core of the paper.
Theorem 1. Let $B = (D, (P^{(v)}))$ be a branched polyhedral system with a digraph $D = (V, A)$ with source node $s \in V$, let $V_T \subseteq V$ be the set of sinks of $D$, and suppose that $P^{(v)} = \pi^{(v)}(Q^{(v)})$ with a polyhedron $Q^{(v)} \subseteq \mathbb{R}^{d(v)}$ and a projection (linear map) $\pi^{(v)} : \mathbb{R}^{d(v)} \to \mathbb{R}^{\delta^{out}(v)}$ for all $v \in V \setminus V_T$.
Then, with the polyhedron $Q(B) \subseteq \mathbb{R}^V \oplus \mathbb{R}^A \oplus \bigl(\bigoplus_{v \in V \setminus V_T} \mathbb{R}^{d(v)}\bigr)$ defined by
$x_s = 1$ (1)
$x_v = y(\delta^{in}(v))$ for all $v \in V \setminus \{s\}$ (2)
$y_{\delta^{out}(v)} = \pi^{(v)}(z^{(v)})$ for all $v \in V \setminus V_T$ (3)
$(z^{(v)}, x_v) \in \mathrm{homog}(Q^{(v)})$ for all $v \in V \setminus V_T$ (4)
and the orthogonal projection $\pi : \mathbb{R}^V \oplus \mathbb{R}^A \oplus \bigl(\bigoplus_{v \in V \setminus V_T} \mathbb{R}^{d(v)}\bigr) \to \mathbb{R}^V$, we have
$$\pi(Q(B)) = \mathrm{conv}(G_{cv}(B)) + \mathrm{ccone}(G_{cc}(B)) = P(B)$$
with $G_{cv}(B)$ and $G_{cc}(B)$ defined with respect to any family $(G^{(v)}_{cv}, G^{(v)}_{cc})$ of admissible pairs of generating sets for the polyhedra $P^{(v)}$ (for all $v \in V \setminus V_T$).
Proof. We start by establishing $P(B) \subseteq \pi(Q(B))$.
First, let $x \in G_{cv}(B)$ be arbitrary. For all $v \in (V \setminus V_T) \setminus \mathrm{supp}(x)$, we define $y_{(v,w)} = 0$ for all $w \in N^{out}(v)$ and $z^{(v)} = O$, and for all $v \in (V \setminus V_T) \cap \mathrm{supp}(x)$ (in particular, $x_v > 0$ due to Remark 1(1)), we set $y_{(v,w)} = x_w$ for all $w \in N^{out}(v)$ and $z^{(v)} = x_v \cdot \tilde z^{(v)}$ for an arbitrary $\tilde z^{(v)} \in Q^{(v)}$ with $\pi^{(v)}(\tilde z^{(v)}) = \frac{1}{x_v} y_{\delta^{out}(v)}$ (such a $\tilde z^{(v)}$ exists, as we have $\frac{1}{x_v} x_{N^{out}(v)} \in P^{(v)}$ due to (V.2) and $y_{\delta^{out}(v)} = x_{N^{out}(v)}$ due to the settings made just before). Thus, conditions (3) and (4) hold for this vector $(x, y, z)$. Furthermore, (1) (due to (V.1)) and (2) (due to Proposition 1 implying that $y_{(u,v)} = x_v$ holds for exactly one $u \in N^{in}(v)$ and $y_{(u,v)} = 0$ for all others) are satisfied as well, which shows $x \in \pi(Q(B))$, and thus $\mathrm{conv}(G_{cv}(B)) \subseteq \pi(Q(B))$. Similarly, one can show $\mathrm{ccone}(G_{cc}(B)) \subseteq \pi(\mathrm{rec}(Q(B)))$ (which we omit in this extended abstract). Hence, we have $P(B) \subseteq \pi(Q(B))$.
In order to prove $\pi(Q(B)) \subseteq P(B)$, we show that for all $c \in \mathbb{R}^V$ and $\omega = \max\{\langle c, x\rangle \mid (x, y, z) \in Q(B)\} \in \mathbb{R} \cup \{\infty\}$ there is some $x' \in G_{cv}(B)$ with $\langle c, x'\rangle = \omega$ if $\omega < \infty$, and that there is some $x' \in G_{cc}(B)$ with $\langle c, x'\rangle > 0$ if $\omega = \infty$.
Towards this end, we reformulate (3) and (4) by inequality systems in order to exploit linear programming duality. For each $v \in V \setminus V_T$, let $T^{(v)} \in \mathbb{R}^{\delta^{out}(v) \times [d(v)]}$ be a matrix with $\pi^{(v)}(z^{(v)}) = T^{(v)} z^{(v)}$ for all $z^{(v)} \in \mathbb{R}^{d(v)}$, and let $A^{(v)} \in \mathbb{R}^{m(v) \times d(v)}$, $b^{(v)} \in \mathbb{R}^{m(v)}$ with $Q^{(v)} = \{z^{(v)} \in \mathbb{R}^{d(v)} \mid A^{(v)} z^{(v)} \le b^{(v)}\}$. Thus, $Q(B)$ is the polyhedron defined by the system
$x_s = 1$ (5)
$x_v - y(\delta^{in}(v)) = 0$ for all $v \in V \setminus \{s\}$ (6)
$y_{\delta^{out}(v)} - T^{(v)} z^{(v)} = O$ for all $v \in V \setminus V_T$ (7)
$A^{(v)} z^{(v)} - x_v b^{(v)} \le O$ for all $v \in V \setminus V_T$ (8)
$x_v \ge 0$ for all $v \in V \setminus V_T$ (9)
(where (9) turns out to be redundant).
In order to construct an $x'$ as required above, let us initialize vectors $x^{(v)} = O \in \mathbb{R}^V$ for all $v \in V \setminus V_T$ and $x^{(v)} = e_v \in \mathbb{R}^V$ (the standard unit vector associated with $v \in V$) for all $v \in V_T$, as well as an auxiliary objective function vector $c' \in \mathbb{R}^V$ with $c'_{V_T} = c_{V_T}$. We process the nodes $v \in V \setminus V_T$ in the reverse order of some topological ordering of the acyclic digraph $D$ (i.e., for each $(v, w) \in A$, node $v$ is processed after node $w$) in the following way:
1. Let $\zeta^{(v)} = \max\{\langle c'_{N^{out}(v)}, \tilde x\rangle \mid \tilde x \in P^{(v)}\} \in \mathbb{R} \cup \{\infty\}$ (recall $P^{(v)} \ne \emptyset$).
2. If $\zeta^{(v)} = \infty$:
(a) Let $\tilde x \in G^{(v)}_{cc}$ with $\langle c'_{N^{out}(v)}, \tilde x\rangle > 0$.
(b) Set $x' = \sum_{w \in \mathrm{supp}(\tilde x)} \tilde x_w \cdot x^{(w)}$, and terminate the construction of $x'$.
3. If $\zeta^{(v)} < \infty$:
(a) Let $\tilde x \in G^{(v)}_{cv}$ with $\langle c'_{N^{out}(v)}, \tilde x\rangle = \zeta^{(v)}$, and let $\lambda^{(v)} \in \mathbb{R}^{m(v)}_+$ be an optimal dual solution to $\max\{\langle (T^{(v)})^t c'_{N^{out}(v)}, \tilde z\rangle \mid A^{(v)} \tilde z \le b^{(v)}\}$ $(= \zeta^{(v)})$.
(b) Set $x^{(v)} = e_v + \sum_{w \in \mathrm{supp}(\tilde x)} \tilde x_w \cdot x^{(w)}$.
(c) Set $c'_v = c_v + \zeta^{(v)}$.
If we did not terminate in Step (2b), then we finally set $x' = x^{(s)}$.
From Proposition 1 (and exploiting (S.2)), one deduces by induction that after processing a node $v \in V \setminus V_T$, we have $x^{(v)} \in G_{cv}(B_v)$ and $c'_v = \langle c, x^{(v)}\rangle$, if $\zeta^{(v)} < \infty$, and $x' \in G_{cc}(B)$ with $\langle c, x'\rangle > 0$ in case of $\zeta^{(v)} = \infty$. Thus, it suffices to exhibit, in case that we did not terminate in Step (2b), a feasible dual solution to $\max\{\langle c, x\rangle \mid (x, y, z) \text{ satisfies } (5), \ldots, (9)\}$ of value $\langle c, x'\rangle = c'_s$. We have already defined dual variables $\lambda^{(v)} \in \mathbb{R}^{m(v)}_+$ for all inequalities (8). Let us complete these dual variables by setting $\mu^{(v)} = -c'_{N^{out}(v)}$ for all $v \in V \setminus V_T$ (for equations (7)) and $\nu_v = c'_v$ for all $v \in V$ (for equations (5) and (6)). Obviously, the vector $(\nu, \mu, \lambda)$ has dual objective function value $\nu_s = c'_s$. We omit the calculation showing that $(\nu, \mu, \lambda)$ is dually feasible (and revealing the redundancy of (9)) in this extended abstract.
Corollary 1. Let $B = (D, (P^{(v)}))$ be a branched polyhedral system with a digraph $D = (V, A)$ with source node $s \in V$, and let $V_T \subseteq V$ be the set of sinks of $D$. Then, with the polyhedron $Q(B) \subseteq \mathbb{R}^V \oplus \mathbb{R}^A$ defined by the system
$x_s = 1$ (10)
$x_v = y(\delta^{in}(v))$ for all $v \in V \setminus \{s\}$ (11)
$(y_{\delta^{out}(v)}, x_v) \in \mathrm{homog}(P^{(v)})$ for all $v \in V \setminus V_T$ (12)
and the orthogonal projection $\pi : \mathbb{R}^V \oplus \mathbb{R}^A \to \mathbb{R}^V$, we have $P(B) = \pi(Q(B))$.
Remark 2. For each BPS B, the orthogonal projection of Q(B) (defined by (10),
(11), (12)) to the y-space is isomorphic to Q(B) (due to (10), (11)).
In case that the digraph D is a branching itself, the projection P(B) of Q(B) to
the x-space is isomorphic to Q(B) as well. Thus, in this case, from descriptions
of the polyhedra P (v) (v ∈ V \ VT ) we even obtain a description of P(B) in the
original space.
Corollary 2. Let $B = (D, (P^{(v)}))$ be a branched polyhedral system whose digraph $D = (V, A)$ itself is a branching rooted at $s \in V$, and let $V_T \subseteq V$ be the set of sinks of $D$. Then $P(B)$ is the polyhedron defined by
$x_{N^{out}(s)} \in P^{(s)}$ (13)
$(x_{N^{out}(v)}, x_v) \in \mathrm{homog}(P^{(v)})$ for all $v \in V \setminus (\{s\} \cup V_T)$. (14)
Proof. This follows readily from Corollary 1, because for branchings $D = (V, A)$, (11) means $x_w = y_{(v,w)}$ for all $(v, w) \in A$.

4 Applications
4.1 Unions of Polyhedra
The following extended formulation for the convex hull of the union of finitely
many polyhedra is basically due to Balas [1] (see also [2, Thm. 5.1]). We show
how Balas’ result can be established as a consequence of Theorem 1. The slight
generalization to the case that the given polyhedra themselves are specified by
extended formulations is used, e.g., in [8] in order to construct compact extended
formulations of the polytopes associated with the cycles of length log n in a
complete graph with $n$ nodes. We use $\overline{\mathrm{conv}}(S)$ to denote the topological closure of the convex hull of $S$.

Corollary 3. If the non-empty polyhedra $\emptyset \ne P^{(i)} \subseteq \mathbb{R}^n$ (for $i \in [q]$) are projections $P^{(i)} = \pi^{(i)}(Q^{(i)})$ of polyhedra $Q^{(i)} = \{z^{(i)} \in \mathbb{R}^{d(i)} \mid A^{(i)} z^{(i)} \le b^{(i)}\}$ (with $A^{(i)} \in \mathbb{R}^{m_i \times d(i)}$, $b^{(i)} \in \mathbb{R}^{m_i}$ and linear maps $\pi^{(i)} : \mathbb{R}^{d(i)} \to \mathbb{R}^n$), then the topological closure $P = \overline{\mathrm{conv}}(P^{(1)} \cup \cdots \cup P^{(q)})$ of the convex hull of the union of the polyhedra $P^{(i)}$ is the projection $P = p(Q)$ (with $p(z^{(1)}, \ldots, z^{(q)}, x) = \pi^{(1)}(z^{(1)}) + \cdots + \pi^{(q)}(z^{(q)})$) of the polyhedron $Q \subseteq \mathbb{R}^{d(1)} \times \cdots \times \mathbb{R}^{d(q)} \times \mathbb{R}^q$ defined by $A^{(i)} z^{(i)} \le x_i b^{(i)}$ for all $i \in [q]$, $\sum_{i \in [q]} x_i = 1$, and $x \ge O$.

Proof. We define a BPS $B$ on the digraph $D = (V, A)$ with node set $V = \{s\} \uplus [q] \uplus \{t_1, \ldots, t_n\}$ and arc set $A = (\{s\} \times [q]) \cup ([q] \times \{t_1, \ldots, t_n\})$; thus, $V_T = \{t_1, \ldots, t_n\}$ is the set of sinks of $D$. Identifying $\mathbb{R}^n$ with $\mathbb{R}^{V_T}$, the polyhedra associated with the non-sink nodes are $P^{(i)} \subseteq \mathbb{R}^{V_T}$ (for $i \in [q]$) and $P^{(s)} = \mathrm{conv}(G^{(s)}_{cv}) \subseteq \mathbb{R}^{[q]}$ with $G^{(s)}_{cv} = \{e_1, \ldots, e_q\}$ (and $G^{(s)}_{cc} = \emptyset$). We choose any finite sets $G^{(i)}_{cv}, G^{(i)}_{cc} \subseteq \mathbb{R}^{V_T}$ with $P^{(i)} = \mathrm{conv}(G^{(i)}_{cv}) + \mathrm{ccone}(G^{(i)}_{cc})$ for all $i \in [q]$.
For each $x \in \mathbb{R}^V$ we have $x \in G_{cv}(B)$ if and only if $x_s = 1$ and there is some $i \in [q]$ with $x_{[q]} = e_i \in \mathbb{R}^{[q]}$ as well as $x_{V_T} \in G^{(i)}_{cv}$; similarly, $x \in G_{cc}(B)$ if and only if $x_s = 0$, $x_{[q]} = O$, and $x_{V_T} \in G^{(i)}_{cc}$ for some $i \in [q]$. Denoting by $[\cdots]_{V_T}$ the orthogonal projection of a set of vectors to the $x_{V_T}$-space, we thus find $[\mathrm{conv}(G_{cv}(B))]_{V_T} = \mathrm{conv}(\cup_{i=1}^{q} G^{(i)}_{cv})$ and $[\mathrm{ccone}(G_{cc}(B))]_{V_T} = \mathrm{ccone}(\cup_{i=1}^{q} G^{(i)}_{cc})$, from which $\mathrm{conv}(P^{(1)} \cup \cdots \cup P^{(q)}) \subseteq [P(B)]_{V_T}$ follows, which implies $\overline{\mathrm{conv}}(P^{(1)} \cup \cdots \cup P^{(q)}) \subseteq [P(B)]_{V_T}$, because as a (projection of a) polyhedron, $[P(B)]_{V_T}$ is closed. We further derive $[P(B)]_{V_T} = [Q(B)]_{V_T} \subseteq p(Q)$, the equation following from Theorem 1 and the inclusion from the fact that (2) and (3) imply $x_{V_T} = \sum_{i=1}^{q} \pi^{(i)}(z^{(i)})$. Thus, we have $\overline{\mathrm{conv}}(P^{(1)} \cup \cdots \cup P^{(q)}) \subseteq p(Q)$.
To see the reverse inclusion, let $(z^{(1)}, \ldots, z^{(q)}, x) \in Q$. With $I = \{i \in [q] \mid x_i > 0\}$ we find $\frac{1}{x_i} A^{(i)} z^{(i)} \le b^{(i)}$ from (4) or (8), hence $\frac{1}{x_i} z^{(i)} \in Q^{(i)}$ for all $i \in I$, and thus (as we have $\sum_{i \in I} x_i = 1$ due to (4) or (8), as well as $x \ge O$)
$$v = \sum_{i \in I} \pi^{(i)}(z^{(i)}) \in \mathrm{conv}(P^{(1)} \cup \cdots \cup P^{(q)}). \qquad (15)$$
If $I = [q]$, we have $p(z^{(1)}, \ldots, z^{(q)}, x) = v$, and (15) proves the claim. Otherwise, for each $i \in [q] \setminus I$ (again from (4) resp. (8)) we find $A^{(i)} z^{(i)} \le O$, hence $z^{(i)} \in \mathrm{rec}(Q^{(i)})$, and thus $\pi^{(i)}(z^{(i)}) \in \mathrm{rec}(P^{(i)})$. Therefore, choosing $x^{(i)} \in P^{(i)} \ne \emptyset$ arbitrarily for all $i \in [q] \setminus I$, we have
$$w^{(\varepsilon)} = \frac{1}{q - |I|} \sum_{i \in [q] \setminus I} \Bigl(x^{(i)} + \frac{q - |I|}{\varepsilon}\, \pi^{(i)}(z^{(i)})\Bigr) \in \mathrm{conv}(P^{(1)} \cup \cdots \cup P^{(q)}) \qquad (16)$$
for all $\varepsilon > 0$. By (15) and (16) we obtain $(1 - \varepsilon)v + \varepsilon w^{(\varepsilon)} \in \mathrm{conv}(P^{(1)} \cup \cdots \cup P^{(q)})$ for all $0 < \varepsilon \le 1$. Due to $p(z^{(1)}, \ldots, z^{(q)}, x) = \lim_{\varepsilon \to 0} \bigl((1 - \varepsilon)v + \varepsilon w^{(\varepsilon)}\bigr)$, this proves the claim.
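As a toy numerical check of Balas' construction (added here; the instance and the use of scipy are our own choices, not from the paper), one can set up the polyhedron $Q$ for the union of the two intervals $[0,1]$ and $[2,3]$ in $\mathbb{R}$ and optimize over it:

```python
from scipy.optimize import linprog

# Variables (z1, z2, x1, x2); the projected point is z1 + z2.
A_ub = [[ 1, 0, -1,  0],   # z1 <= 1*x1
        [-1, 0,  0,  0],   # z1 >= 0*x1
        [ 0, 1,  0, -3],   # z2 <= 3*x2
        [ 0,-1,  0,  2]]   # z2 >= 2*x2
for c in (1.0, -1.0):
    res = linprog([c, c, 0, 0], A_ub=A_ub, b_ub=[0] * 4,
                  A_eq=[[0, 0, 1, 1]], b_eq=[1],
                  bounds=[(None, None)] * 2 + [(0, 1)] * 2)
    print(c, res.fun)   # c = 1.0 -> 0.0 (point 0); c = -1.0 -> -3.0 (point 3)
```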

4.2 Full Orbitopes

We denote by $M^{max}_{p,q}$ the set of 0/1-matrices with $p$ rows and $q$ columns whose columns are sorted in lexicographically non-increasing order. The full orbitope is the polytope $O_{p,q} = \mathrm{conv}(M^{max}_{p,q})$. While for the related packing and partitioning orbitopes (i.e., the convex hulls of those matrices in $M^{max}_{p,q}$ with at most and exactly, respectively, one 1-entry in every row) descriptions by (exponentially many) linear inequalities are known [9,6], no such description is available for $O_{p,q}$. In fact, computer experiments indicate that the facet-defining inequalities for $O_{p,q}$ are extremely complicated. However, applying Corollary 1 to a suitable BCS, we can easily provide a compact extended formulation for $O_{p,q}$.
Let us define a branched combinatorial system on a digraph $D = (V, A)$ whose node set consists, next to a source node $s$, of nodes $a(i, j, \ell)$ and $b(i, j, \ell)$ for all $i \in [p]$ and $j, \ell \in [q]$ with $j \le \ell$. The nodes $a(i, j, \ell)$ and $b(i, j, \ell)$ represent intervals of ones and intervals of zeros, respectively, to be inserted at positions $(i, j), (i, j+1), \ldots, (i, \ell)$ in a rowwise construction of a matrix $M \in M^{max}_{p,q}$ from top to bottom. The arc set $A$ consists of all arcs connecting $s$ to the nodes $a(1, j, \ell)$ and $b(1, j, \ell)$ (with $j, \ell \in [q]$, $j \le \ell$), as well as all pairs $(v, w)$ with $v \in \{a(i, j, \ell), b(i, j, \ell)\}$ and $w \in \{a(i+1, j, k), b(i+1, k, \ell)\}$ for $i \in [p-1]$ and $j, k, \ell \in [q]$ with $j \le k \le \ell$. The set $V_T$ of sinks of $D$ thus consists of all nodes $a(p, j, \ell)$ and $b(p, j, \ell)$ (with $j, \ell \in [q]$, $j \le \ell$). We define
$$S^{(s)} = \bigl\{\{a(1,1,q)\}, \{b(1,1,q)\}\bigr\} \cup \bigl\{\{a(1,1,k), b(1,k+1,q)\} \mid 1 \le k \le q-1\bigr\}$$
and, for all $i \in [p-1]$, $j, \ell \in [q]$, $j \le \ell$, and $v \in \{a(i, j, \ell), b(i, j, \ell)\}$,
$$S^{(v)} = \bigl\{\{a(i+1,j,\ell)\}, \{b(i+1,j,\ell)\}\bigr\} \cup \bigl\{\{a(i+1,j,k), b(i+1,k+1,\ell)\} \mid j \le k \le \ell-1\bigr\}.$$
Then $C = (D, (S^{(v)}))$ is a BCS, and the projection $p : \mathbb{R}^V \to \mathbb{R}^{p \times q}$ with $p(x)_{i,k} = \sum_{1 \le j \le k \le \ell \le q} x_{a(i,j,\ell)}$ maps $P(C) = \mathrm{conv}\{\chi(F) \in \{0,1\}^V \mid F \subseteq V \text{ feasible for } C\}$ to $O_{p,q}$. Because $\mathrm{conv}\bigl(\{e_1, e_2\} \cup \{e_{2r-1} + e_{2r} \mid r \in \{2, \ldots, n\}\}\bigr)$ equals $\{x \in \mathbb{R}^{2n}_+ \mid x_{2r-1} - x_{2r} = 0 \text{ for all } r \in \{2, \ldots, n\},\ 2x_1 + 2x_2 + \sum_{r=3}^{2n} x_r = 2\}$, one obtains linear descriptions of the polytopes $P^{(s)}$ and $P^{(v)}$ that yield the following extended formulation for $O_{p,q}$ via Corollary 1.

Theorem 2. The full orbitope $O_{p,q}$ is a projection (obtained by first projecting orthogonally to the $x$-space and then applying the projection $p$ defined above) of the polyhedron defined by $y \ge O$ and
$x_v - y(\delta^{in}(v)) = 0$ (17)
$2y_{(s,a(1,1,q))} + 2y_{(s,b(1,1,q))} + \sum_{k=1}^{q-1}\bigl(y_{(s,a(1,1,k))} + y_{(s,b(1,k+1,q))}\bigr) = 2$
$y_{(s,a(1,1,k))} - y_{(s,b(1,k+1,q))} = 0$ (18)
$2y_{v,i+1,j,\ell} + \sum_{k=j}^{\ell-1}\bigl(y_{(v,a(i+1,j,k))} + y_{(v,b(i+1,k+1,\ell))}\bigr) - 2x_v = 0$ (19)
$y_{(v,a(i+1,j,k))} - y_{(v,b(i+1,k+1,\ell))} = 0,$ (20)
where, for layout reasons, we use $y_{v,i+1,j,\ell} = y_{(v,a(i+1,j,\ell))} + y_{(v,b(i+1,j,\ell))}$; (17) has to be included into the system for all $v \in V \setminus \{s\}$, (18) for all $1 \le k \le q-1$, as well as (19) and (20) for all $i \in [p-1]$, $j, \ell \in [q]$ with $j \le \ell$, $v \in \{a(i,j,\ell), b(i,j,\ell)\}$, and $1 \le k \le \ell - 1$ (where the latter only refers to (20)).
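For small $p$ and $q$, one can enumerate $M^{max}_{p,q}$, i.e., the vertex set of $O_{p,q}$, by brute force; the following Python sketch (an illustration added here, not from the paper) does so for $p = q = 3$:

```python
from itertools import product

# Enumerate the 0/1 p x q matrices whose columns are lexicographically
# non-increasing; their number is the number of non-increasing sequences of
# q columns chosen from the 2^p possible ones, i.e. C(2^p + q - 1, q).
p, q = 3, 3
cols = list(product([0, 1], repeat=p))            # all 2^p possible columns
mats = [M for M in product(cols, repeat=q)        # M is a tuple of columns
        if all(M[j] >= M[j + 1] for j in range(q - 1))]
print(len(mats))                                  # C(10, 3) = 120
```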

4.3 Dynamic Programming


We briefly show how the framework developed in Martin, Rardin, and Camp-
bell [11] for deriving extended formulations for combinatorial optimization prob-
lems from dynamic programming algorithms is related to a special case of The-
orem 1 for certain branched combinatorial systems.
The dynamic programming algorithms considered in [11] work by searching hyperpaths in a directed hypergraph $H = (V, \mathcal{H})$ on a (finite) state space $V = [n]$, where the directed hyperarcs in $\mathcal{H}$ are of the form $(v, S)$ with $v \in V$ and $S \subseteq \{w \in V \mid w > v\}$. Furthermore, it is assumed that, for each $w \in V \setminus \{1\}$, there is at least one hyperarc $(v, S) \in \mathcal{H}$ with $w \in S$, i.e., $s = 1$ is the only source of the directed hypergraph. The set $V_T \subseteq V$ of boundary states is the set of all $v \in V$ for which there is no hyperarc $(v, S) \in \mathcal{H}$. And finally, there needs to be a (finite) reference set $R(v) \ne \emptyset$ for each state $v \in V$ such that $R(w) \subseteq R(v)$ for all $w \in S$ and $R(w) \cap R(w') = \emptyset$ for all $w, w' \in S$ with $w \ne w'$ is satisfied for all $(v, S) \in \mathcal{H}$. A hyperpath in the hypergraph is a subset $L$ of hyperarcs that contains exactly one hyperarc $(1, S)$ and, for each $v \in V \setminus \{1\}$ incident to some hyperarc in $L$, there is exactly one hyperarc $(u, S) \in L$ with $v \in S$, and, if $v \notin V_T$, there is exactly one $(v, S) \in L$.
In this setting, it is proved in [11] that the convex hull of the characteristic vectors (in the hyperarc space) of hyperpaths is described by the system
$\sum_{(1,S) \in \mathcal{H}} z_{(1,S)} = 1$ (21)
$\sum_{(v,S) \in \mathcal{H}} z_{(v,S)} = \sum_{(u,S') \in \mathcal{H} :\, v \in S'} z_{(u,S')}$ for all $v \in V \setminus (\{s\} \cup V_T)$ (22)
$z \ge O.$ (23)
System (21), (22), (23) arises from Theorem 1 in the following way. We construct a BCS on the acyclic digraph $D = (V, A)$ on the node set $V$ of the hypergraph $H$, where $A$ consists of all arcs $(v, w)$ for which there is some $(v, S) \in \mathcal{H}$ with $w \in S$. Clearly, $V_T$ (the set of boundary states) is the set of all sinks of $D$. Defining $S^{(v)} = \{S \subseteq V \mid (v, S) \in \mathcal{H}\}$ for every $v \in V \setminus V_T$, we obtain a BCS $C$ (for this, the existence of the reference sets $R(v)$ is crucial) whose feasible sets are exactly the node sets of hyperpaths in $H$. Describing the polytopes $P^{(v)} = \mathrm{conv}\{\chi(S) \mid S \in S^{(v)}\}$, for all $v \in V \setminus V_T$, by their trivial extended formulations $P^{(v)} = \pi^{(v)}(Q^{(v)})$ with $Q^{(v)} = \{z^{(v)} \in \mathbb{R}^{S^{(v)}}_+ \mid \sum_{S \in S^{(v)}} z^{(v)}_S = 1\}$ and $\pi^{(v)}(z^{(v)}) = \sum_{S \in S^{(v)}} z^{(v)}_S \cdot \chi(S)$, Theorem 1 yields an extended formulation of the convex hull $P(C)$ of all node sets of hyperpaths in $H$ that is isomorphic to (21), (22), (23).
As the feasible sets for the BCS C are the node sets of the hyperpaths in H
(representing the set of states used in order to construct a solution during the
dynamic programming), quite often, the polytope associated with the combina-
torial optimization problem solved by the dynamic programming algorithm is
a projection of P(C), and thus, (21), (22), (23) provides an extended formula-
tion for that polytope. In principle, the concept of BCS allows more flexibility
here, because we can choose other representations of the polytopes P (v) than
their trivial extended formulations. This might be advantageous, e.g., if there are
states v for which the number of hyperedges (v, S) is much larger than the num-
ber of states. For the full orbitope example in Section 4.2 this is not the case.
In fact, one could derive the extended formulation in Theorem 2 also within the
hypergraph framework (and eliminating some redundant variables afterwards).

4.4 Nested Combinatorial Problems


Let $N_1, \ldots, N_q$ be pairwise disjoint finite sets, $S^{(i)} \ne \emptyset$ a set of subsets of $N_i$ for each $i \in [q]$, and $S^{(0)} \ne \emptyset$ a set of subsets of $[q]$. Denoting $N = N_1 \cup \cdots \cup N_q$, we define $\mathcal{S} = \bigl\{S \subseteq N \mid S \cap N_i \in S^{(i)} \cup \{\emptyset\} \text{ for all } i \in [q],\ \{i \in [q] \mid S \cap N_i \ne \emptyset\} \in S^{(0)}\bigr\}$. In order to derive linear formulations of the polytope $P = \mathrm{conv}\{\chi(S) \mid S \in \mathcal{S}\} \subseteq \mathbb{R}^N$, let us construct a BCS on the digraph $D = (V, A)$ with $V = \{0\} \uplus [q] \uplus N$ and $A$ consisting of all arcs $(0, i)$ ($i \in [q]$) and $(i, j)$ ($i \in [q]$, $j \in N_i$). Thus $D$ is a branching rooted at $s = 0$, the set of sinks is $V_T = N$, and $C = (D, (S^{(v)}))$ indeed forms a BCS. From linear descriptions $P^{(0)} = \mathrm{conv}\{\chi(S) \mid S \in S^{(0)}\} = \{x \in \mathbb{R}^q \mid A^{(0)} x \le b^{(0)}\}$ and $P^{(i)} = \mathrm{conv}\{\chi(S) \mid S \in S^{(i)}\} = \{x \in \mathbb{R}^{N_i} \mid A^{(i)} x \le b^{(i)}\}$ for all $i \in [q]$ of the polytopes associated with the given set systems, we conclude via Corollary 2 that
$A^{(0)} x_{[q]} \le b^{(0)}$ (24)
$A^{(i)} x_{N_i} - x_i\, b^{(i)} \le O$ for all $i \in [q]$ (25)
$x \ge O$ (26)
is an extended formulation in $\mathbb{R}^{V \setminus \{0\}}$ for $P$ (via orthogonal projection to the $x_N$-space).
Let us further consider the case that, for each $i \in [q]$, a linear equation $\langle a^{(i)}, x\rangle = 1$ is valid for the polytope $P^{(i)}$. In this case, (25) implies $x_i = \langle a^{(i)}, x_{N_i}\rangle$ for all $i \in [q]$. Thus, we find that $P \subseteq \mathbb{R}^N$ is defined by
$\sum_{i=1}^{q} \langle a^{(i)}, x_{N_i}\rangle \cdot A^{(0)}_{\cdot, i} \le b^{(0)}$ (27)
$A^{(i)} x_{N_i} - \langle a^{(i)}, x_{N_i}\rangle\, b^{(i)} \le O$ for all $i \in [q]$ (28)
$\langle a^{(i)}, x_{N_i}\rangle \ge 0$ for all $i \in [q]$. (29)

We consider two particular examples of such nested combinatorial systems. The first one is used by [8] in the construction of compact extended formulations of the polytopes associated with the matchings of size $\log n$ in a complete graph with $n$ nodes. Let $G = (W, E)$ be a $2\ell$-partite graph with $W = W_1 \uplus \cdots \uplus W_{2\ell}$ and $e \not\subseteq W_i$ for all $e \in E$ and $i \in [2\ell]$, and denote by $\mathcal{M}_\ell$ the set of all matchings $M \subseteq E$ with $|M| = \ell$ and $|W(M) \cap W_i| = 1$ for all $i \in [2\ell]$ (where $W(M) = \cup_{e \in M} e$ is the set of nodes matched by $M$).
In order to obtain a linear description of the polytope $P = \mathrm{conv}\{\chi(M) \mid M \in \mathcal{M}_\ell\}$ from a BCS as above, we identify $[q]$ with the set $E_{2\ell}$ of edges of the complete graph $K_{2\ell}$ with node set $[2\ell]$, and let $N_{\{i,j\}} = (W_i : W_j)$ for all $\{i, j\} \in E_{2\ell}$. Choosing $S^{(0)}$ as the set of all perfect matchings in $K_{2\ell}$ and $S^{(\{i,j\})} = \{\{e\} \mid e \in N_{\{i,j\}}\}$ for all $\{i, j\} \in E_{2\ell}$, we thus obtain a BCS as above with $\mathcal{S} = \mathcal{M}_\ell$. Since we have
$$P^{(0)} = \{x \in \mathbb{R}^{E_{2\ell}}_+ \mid x(\delta(i)) = 1\ (i \in [2\ell]),\ x(\delta(I)) \ge 1\ (I \subseteq [2\ell],\ |I| \text{ odd})\}$$
(due to Edmonds [5]) and $P^{(\{i,j\})} = \{x \in \mathbb{R}^{N_{\{i,j\}}}_+ \mid x(N_{\{i,j\}}) = 1\}$ for all $\{i, j\} \in E_{2\ell}$, the system (27), (28), (29) provides the linear description
$x(\delta(W_i)) = 1$ for all $i \in [2\ell]$
$x(\delta(\cup_{i \in I} W_i)) \ge 1$ for all $I \subseteq [2\ell]$, $|I|$ odd
$x \ge O$
of the polytope $P \subseteq \mathbb{R}^E$ associated with the matchings in $\mathcal{M}_\ell$.
In order to modify this example to a second one (mentioned in the Introduction), we assume that there is a number $p$ with $|W_i| = p$ for all $i \in [2\ell]$, and we replace the sets $S^{(\{i,j\})}$ (which one can consider as the sets of matchings of size one in the bipartite subgraph $G[W_i \cup W_j]$ of $G$ induced by $W_i \cup W_j$) by the sets $S'^{(\{i,j\})}$ of all perfect matchings in $G[W_i \cup W_j]$. Clearly, instead of $P^{(\{i,j\})}$ we now use $P'^{(\{i,j\})} = \{x \in \mathbb{R}^{N_{\{i,j\}}}_+ \mid x(\delta(k)) = 1 \text{ for all } k \in W_i \cup W_j\}$ for all $\{i, j\} \in E_{2\ell}$. As for each $P'^{(\{i,j\})}$ the equation $x(W_i : W_j) = p$ holds, the system (27), (28), (29) for the modified example yields the linear description
$x(\delta(W_i)) = p$ for all $i \in [2\ell]$
$x(\delta(\cup_{i \in I} W_i)) \ge p$ for all $I \subseteq [2\ell]$, $|I|$ odd
$p \cdot x(v : W_j) - x(W_i : W_j) = 0$ for all $\{i, j\} \in E_{2\ell}$, $v \in W_i$
$x \ge O$
of the polytope $P' = \mathrm{conv}\{\chi(M) \mid M \in \mathcal{M}'\}$, where $\mathcal{M}'$ is the set of all perfect matchings in $G$ that induce, for each $i, j \in [2\ell]$ with $i \ne j$, an empty or a perfect matching in $G[W_i \cup W_j]$.

4.5 Stable Set Polytopes

Let $G = (W, E)$ be an undirected graph, and $\mathrm{stab}(G) = \mathrm{conv}\{\chi(S) \mid S \subseteq W \text{ stable in } G\} \subseteq \mathbb{R}^W$ the stable set polytope of $G$ (where stable means that no two nodes in $S$ are adjacent). Let $W = W_1 \uplus \cdots \uplus W_k$ be a partitioning of the node set $W$ into nonempty subsets. We define, for each $i \in [k]$, the boundary $\partial W_i = \{w \in W_i \mid \{w, w'\} \in E \text{ for some } w' \in W \setminus W_i\}$ of $W_i$, as well as $\mathcal{U} = \{\emptyset \ne U \subseteq W \mid U \text{ stable in } G,\ U \subseteq \partial W_i \text{ for some } i \in [k]\} \uplus \{u_1, \ldots, u_k\}$. For $U \in \mathcal{U}$, let $i(u_i) = i$ for all $i \in [k]$ and let $i(U)$ be the index with $U \subseteq W_{i(U)}$ for all $U \in \mathcal{U} \setminus \{u_1, \ldots, u_k\}$. Moreover, let $H = (\mathcal{U}, K)$ be the undirected graph on $\mathcal{U}$ with, for all $U, U' \in \mathcal{U}$ with $U \ne U'$, $\{U, U'\} \in K$ if and only if $i(U) = i(U')$, or $U, U' \notin \{u_1, \ldots, u_k\}$ and $(U : U') \ne \emptyset$.
We construct a BCS on the acyclic digraph $D = (V, A)$ with $V = \{s\} \uplus \mathcal{U} \uplus W$, and $A$ containing the arcs $(s, U)$ for all $U \in \mathcal{U}$, as well as the arcs $(U, w)$ for all $U \in \mathcal{U}$, $w \in W_{i(U)}$. Thus, $s$ is the unique source of $D$, and the set of sinks of $D$ is $V_T = W$. With $S^{(s)} = \{S \subseteq \mathcal{U} \mid S \text{ stable in } H\}$ and $S^{(U)} = \{S \subseteq W_{i(U)} \mid S \text{ stable in } G[W_{i(U)}],\ S \cap \partial W_{i(U)} = U\}$ for all $U \in \mathcal{U} \setminus \{u_1, \ldots, u_k\}$, as well as $S^{(u_i)} = \{S \subseteq W_i \mid S \text{ stable in } G[W_i],\ S \cap \partial W_i = \emptyset\}$ for all $i \in [k]$, we obtain a BCS for which the intersections of the feasible sets with $W$ are the stable sets in $G$. Thus, from extended formulations for $\mathrm{stab}(H)$ and $\mathrm{stab}(G[W_i])$ for all $i \in [k]$, one obtains via Theorem 1 an extended formulation for $\mathrm{stab}(G)$ (note that $P^{(U)} = \mathrm{conv}\{\chi(S) \mid S \in S^{(U)}\}$ is a face of $\mathrm{stab}(G[W_{i(U)}])$ for each $U \in \mathcal{U}$).
If $|\mathcal{U}|$ is bounded polynomially in the size of $G$, then the size of the constructed extended formulation for $\mathrm{stab}(G)$ is bounded by a polynomial in the size of $G$ plus, of course, the sum of the sizes of the used extended formulations for $\mathrm{stab}(H)$ and $\mathrm{stab}(G[W_i])$ ($i \in [k]$). This is, e.g., the case if every boundary $\partial W_i$ ($i \in [k]$) can be covered by a constant number of cliques.
The basic idea in the above construction is to consider two stable sets in $W_i$ equivalent if they agree on the boundary $\partial W_i$ of $W_i$. One can also work with a weaker equivalence relation by first partitioning the boundaries $\partial W_i$ into cliques $\partial W_i = W_i(1) \uplus \cdots \uplus W_i(\ell_i)$ such that every two nodes in the same clique are adjacent to the same nodes outside $W_i$, and then considering two stable sets $S, S' \subseteq W_i$ equivalent in case they satisfy $|S \cap W_i(j)| = |S' \cap W_i(j)|$ for all $j \in [\ell_i]$.
This can result in a significantly smaller BCS. It provides, e.g., an alternative construction for the final step in the derivation of an extended formulation for the stable set polytopes of claw-free graphs due to Faenza, Oriolo, and Stauffer [7].

Acknowledgements. We are grateful to the referees for careful reading and some very helpful remarks.

References
1. Balas, E.: Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM J. Algebraic Discrete Methods 6(3), 466–486 (1985)
2. Conforti, M., Cornuéjols, G., Zambelli, G.: Extended Formulations in Combinatorial Optimization. Technical Report (2009)
3. Conforti, M., Cornuéjols, G., Zambelli, G.: Polyhedral approaches to mixed integer linear programming. In: Jünger, M., Liebling, T., Naddef, D., Nemhauser, G., Pulleyblank, W., Reinelt, G., Rinaldi, G., Wolsey, L. (eds.) 50 Years of Integer Programming 1958–2008. Springer, Heidelberg (2010)
4. Conforti, M., Di Summa, M., Eisenbrand, F., Wolsey, L.: Network formulations of mixed-integer programs. Math. Oper. Res. 34, 194–209 (2009)
5. Edmonds, J.: Maximum matching and a polyhedron with 0,1 vertices. Journal of Research of the National Bureau of Standards 69B, 125–130 (1965)
6. Faenza, Y., Kaibel, V.: Extended formulations for packing and partitioning orbitopes. Math. Oper. Res. 34(3), 686–697 (2009)
7. Faenza, Y., Oriolo, G., Stauffer, G.: The hidden matching structure of the composition of strips: a polyhedral perspective. In: 14th Aussois Workshop on Combinatorial Optimization, Aussois (January 2010)
8. Kaibel, V., Pashkovich, K., Theis, D.O.: Symmetry matters for the sizes of extended formulations. In: Eisenbrand, F., Shepherd, B. (eds.) IPCO 2010. LNCS, vol. 6080, pp. 135–148. Springer, Heidelberg (2010)
9. Kaibel, V., Pfetsch, M.: Packing and partitioning orbitopes. Math. Program. 114(1, Ser. A), 1–36 (2008)
10. Margot, F.: Composition de Polytopes Combinatoires: Une Approche par Projection. Ph.D. thesis, École Polytechnique Fédérale de Lausanne (1994)
11. Martin, R.K., Rardin, R.L., Campbell, B.A.: Polyhedral characterization of discrete dynamic programming. Oper. Res. 38(1), 127–138 (1990)
12. Schaffers, M.: On Links Between Graphs with Bounded Decomposability, Existence of Efficient Algorithms, and Existence of Polyhedral Characterizations. Ph.D. thesis, Université Catholique de Louvain (1994)
Hitting Diamonds and Growing Cacti

Samuel Fiorini¹, Gwenaël Joret², and Ugo Pietropaoli³

¹ Université Libre de Bruxelles (ULB), Département de Mathématique, CP 216, B-1050 Brussels, Belgium. [email protected]
² Université Libre de Bruxelles (ULB), Département d'Informatique, CP 212, B-1050 Brussels, Belgium. [email protected]
³ Università di Roma "Tor Vergata", Dipartimento di Ingegneria dell'Impresa, Rome, Italy. [email protected]
Abstract. We consider the following NP-hard problem: in a weighted graph, find a minimum cost set of vertices whose removal leaves a graph
in which no two cycles share an edge. We obtain a constant-factor ap-
proximation algorithm, based on the primal-dual method. Moreover, we
show that the integrality gap of the natural LP relaxation of the problem
is Θ(log n), where n denotes the number of vertices in the graph.

1 Introduction

Graphs in this paper are finite, undirected, and may contain parallel edges but
no loops. We study the following combinatorial optimization problem: given a
vertex-weighted graph, remove a minimum cost subset of vertices so that all
the cycles in the resulting graph are edge-disjoint. We call this problem the
diamond hitting set problem, because it is equivalent to covering all subgraphs
which are diamonds with a minimum cost subset of vertices, where a diamond
is any subdivision of the graph consisting of three parallel edges.
The diamond hitting set problem can be thought of as a generalization of
the vertex cover and feedback vertex set problems: Suppose you wish to remove
a minimum cost subset of vertices so that the resulting graph has no pair of
vertices linked by k internally disjoint paths. Then, for k = 1 and k = 2, this
is respectively the vertex cover problem and feedback vertex set problem, while
for k = 3 this corresponds to the diamond hitting set problem.
⋆ This work was supported by the "Actions de Recherche Concertées" (ARC) fund of the "Communauté française de Belgique".
⋆⋆ G. Joret is a Postdoctoral Researcher of the Fonds National de la Recherche Scientifique (F.R.S.–FNRS).
⋆⋆⋆ This work was done while U.P. was at Département de Mathématique, Université Libre de Bruxelles, as a Postdoctoral Researcher of the F.R.S.–FNRS.

It is well-known that both the vertex cover and feedback vertex set prob-
lems admit constant-factor approximation algorithms1 . Hence, it is natural to
ask whether the same is true for the diamond hitting set problem. Our main
contribution is a positive answer to this question.

1.1 Background and Related Work

Although there exists a simple 2-approximation algorithm for the vertex cover
problem, there is strong evidence that approximating the problem with a fac-
tor of 2 − ε might be hard, for every ε > 0 [8]. It should be noted that the
feedback vertex set and diamond hitting set problems are at least as hard to
approximate as the vertex cover problem, in the sense that the existence of a
ρ-approximation algorithm for one of these two problems implies the existence of
a ρ-approximation algorithm for the vertex cover problem, where ρ is a constant.
Concerning the feedback vertex set problem, the first approximation algo-
rithm is due to Bar-Yehuda, Geiger, Naor, and Roth [2] and its approximation
factor is O(log n). Later, 2-approximation algorithms have been proposed by
Bafna, Berman, and Fujito [1], and Becker and Geiger [3]. Chudak, Goemans,
Hochbaum and Williamson [4] showed that these algorithms can be seen as de-
riving from the primal-dual method (see for instance [9,7]). Starting with an
integer programming formulation of the problem, these algorithms simultane-
ously construct a feasible integral solution and a feasible dual solution of the
linear programming relaxation, such that the values of these two solutions are
within a constant factor of each other.
These algorithms also lead to a characterization of the integrality gap2 of
two different integer programming formulations of the problem, as we now ex-
plain. Let C(G) denote the collection of all the cycles C of G. A natural integer
programming formulation for the feedback vertex set problem is as follows:

Min cv xv
v∈V (G)

s.t. xv  1 ∀C ∈ C(G) (1)
v∈V (C)

xv ∈ {0, 1} ∀v ∈ V (G).

(Throughout, cv denotes the (non-negative) cost of vertex v.) The algorithm of


Bar-Yehuda et al. [2] implies that the integrality gap of this integer program is
O(log n). Later, Even, Naor, Schieber, and Zosin [5] proved that its integrality
gap is also Ω(log n).
1
A ρ-approximation algorithm for a minimization problem is an algorithm that runs
in polynomial time and outputs a feasible solution whose cost is no more than ρ times
the cost of the optimal solution. The number ρ is called the approximation factor.
2
The integrality gap of an integer programming formulation is the worst-case ratio
between the optimum value of the integer program and the optimum value of its
linear relaxation.
A better formulation has been introduced by Chudak et al. [4]. For S ⊆ V (G),
denote by E(S) the set of the edges of G having both ends in S, by G[S] the
subgraph of G induced by S, and by dS (v) the degree of v in G[S]. Then, the
following is a formulation for the feedback vertex set problem:

Min $\sum_{v \in V(G)} c_v x_v$
s.t. $\sum_{v \in S} (d_S(v) - 1)\, x_v \ge |E(S)| - |S| + 1$  $\forall S \subseteq V(G) : E(S) \ne \emptyset$  (2)
$x_v \in \{0, 1\}$  $\forall v \in V(G)$.

Chudak et al. [4] showed that the integrality gap of this integer program asymp-
totically equals 2. Constraints (2) derive from the simple observation that the
removal of a feedback vertex set X from G generates a forest having at most
|G| − |X| − 1 edges. Notice that the covering inequalities (1) are implied by (2).

1.2 Contribution and Key Ideas

First, we obtain a O(log n)-approximation algorithm for the diamond hitting set
problem, leading to a proof that the integrality gap of the natural LP formulation
is Θ(log n). Then, we develop a 9-approximation algorithm. Both the O(log n)-
and 9-approximation algorithm are based on the primal-dual method.
Our first key idea is contained in the following observation: every simple graph
of order n and minimum degree at least 3 contains a O(log n)-size diamond. This
directly yields a O(log n)-approximation algorithm for the diamond hitting set
problem, in the unweighted case. However, the weighted case requires more work.
Our second key idea is to generalize constraints (2) by introducing ‘sparsity
inequalities’, that enable us to derive a constant-factor approximation algorithm
for the diamond hitting set problem: First, by using reduction operations, we
ensure that every vertex of G has at least three neighbors. Then, if G contains
a diamond with at most 9 edges, we raise the dual variable of the corresponding
covering constraint. Otherwise, no such small diamond exists in G, and we can
use this information to select the right sparsity inequality, and raise its dual
variable. This inequality would not be valid in case G contained a small diamond.
The way we use the non-existence of small diamonds is perhaps best explained
via an analogy with planar graphs: An n-vertex planar simple graph G has at
most 3n − 6 edges. However, if we know that G has no small cycle, then this
upper bound can be much strengthened. (For instance, if G is triangle-free then
G has at most 2n − 4 edges.)
We remark that this kind of local/global trade-off did not appear in the work
of Chudak et al. [4] on the feedback vertex set problem, because the cycle cov-
ering inequalities are implied by their more general inequalities. In our case, the
covering inequalities and the sparsity inequalities form two incomparable classes
of inequalities, and examples show that the sparsity inequalities alone are not
enough to derive a constant-factor approximation algorithm.
This extended abstract is organized as follows. Preliminaries are given in
Section 2. Then, in Section 3, we define some reduction operations that allow us
to work mainly with graphs where each vertex has at least three distinct neigh-
bors. Next, in Section 4, we deal with the unweighted version of the diamond
hitting set problem and provide a simple O(log n)-approximation algorithm. In
Section 5, we turn to the weighted version of the problem. We sketch a O(log n)-
approximation algorithm. It turns out that the integrality gap of the natural
formulation of the problem is Θ(log n). Finally, in Section 6, we introduce the
sparsity inequalities, prove their validity and sketch a 9-approximation algo-
rithm. Due to length restrictions, most proofs and details are not included in
this extended abstract. A full version of the paper can be found in [6].

2 Preliminaries
A cactus is a connected graph where each edge belongs to at most one cycle.
Equivalently, a connected graph is a cactus if and only if each of its blocks is
isomorphic to either K2 or a cycle. Thus, a connected graph is a cactus if and
only if it does not contain a diamond as a subgraph. A graph without diamonds
is called a forest of cacti (see Figure 1).

Fig. 1. A forest of cacti

A diamond hitting set (or simply hitting set) of a graph is a subset of vertices
that hits every diamond of the graph. A minimum (diamond) hitting set of a
weighted graph is a hitting set of minimum total cost, and its cost is denoted by
OP T .
Let D(G) denote the collection of all diamonds contained in G. From the
standard IP formulation for a covering problem, we obtain the following LP
relaxation for the diamond hitting set problem:

Min $\sum_{v \in V(G)} c_v x_v$
s.t. $\sum_{v \in V(D)} x_v \ge 1$  $\forall D \in \mathcal{D}(G)$  (3)
$x_v \ge 0$  $\forall v \in V(G)$.

We call inequalities (3) diamond inequalities.
3 Reductions
In this section, we define two reduction operations on graphs: First, we define the
‘shaving’ of an arbitrary graph, and then introduce a ‘bond reduction’ operation
for shaved graphs.
The aim of these two operations is to modify a given graph so that the follow-
ing useful property holds: each vertex either has at least three distinct neighbors,
or is incident to at least three parallel edges.

3.1 Shaving a Graph


Let G be a graph. Every block of G is either isomorphic to K1 , K2 , a cycle, or
contains a diamond. Mark every vertex of G that is included in a block containing
a diamond. The shaving of G is the graph obtained by removing every unmarked
vertex from G. A graph is shaved if all its vertices belong to a block containing a
diamond. Observe that, in particular, every endblock3 of a shaved graph contains
a diamond.
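For simple graphs, the blocks containing a diamond are exactly the biconnected components with more edges than vertices, so shaving can be sketched as follows (an illustration added here using networkx; the paper's graphs may contain parallel edges, which this sketch does not handle):

```python
import networkx as nx

def shave(G):
    # Keep exactly the vertices lying in a block that contains a diamond,
    # i.e. in a biconnected component with more edges than vertices
    # (such a block is neither K2 nor a cycle).
    marked = set()
    for block_edges in nx.biconnected_component_edges(G):
        edges = list(block_edges)
        nodes = {u for e in edges for u in e}
        if len(edges) > len(nodes):
            marked |= nodes
    return G.subgraph(marked).copy()

# The same criterion tests for forests of cacti: a simple graph is
# diamond-free exactly when no block has more edges than vertices,
# in which case shave() returns the empty graph.
```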

3.2 Reducing a Bond


A bond of a graph G is a connected subgraph Q ⊆ G equipped with two distin-
guished vertices v, w (called ends) satisfying the following requirements:
– Q is a cactus with at least two blocks;
– the block-graph of Q is a path;
– v and w belong to distinct endblocks of Q;
– v and w are not adjacent in Q;
– Q − {v, w} is a non-empty component of G − {v, w}, and
– Q contains all the edges in G between {v, w} and V (Q) − {v, w}.
Observe that Q is “almost” an induced subgraph of G, since Q includes every
edge of G between vertices of Q, except those between v and w (if any). The
vertices in V (Q) − {v, w} are said to be the internal vertices of Q. The bond Q
is simple if Q is isomorphic to a path, double otherwise.
Let G be a shaved graph. A vertex u of G is reducible if u has exactly two
neighbors in G, and there are at most two parallel edges connecting u to each
of its neighbors. The bond reduction operation is defined as follows. Let u be a
reducible vertex and let Qu be an inclusion-wise maximal bond of G containing
u, with ends v and w. (Observe that such a bond exists by our hypothesis on u;
moreover, it might not be unique.) Then, remove from G every internal vertex
of Qu , and add one or two edges between v and w, depending on whether Qu
is simple or double. In the latter case, the two new parallel edges are said to
be twins. See Figure 2 for an illustration of the operation. Observe that the
resulting graph is also a shaved graph.
3
We recall that the block-graph of G has the blocks of G and the cutvertices of G as
vertices, a block and a cutvertex are adjacent if the former contains the latter. This
graph is always acyclic. An endblock of G is a vertex of the block-graph with degree
at most one.
Fig. 2. (a) A shaved graph G with two maximal bonds (in grey). (b) Reduction of the first bond. (c) Reduction of the second bond. The graph is now reduced.

A crucial property of the bond reduction operation is that, when applying it


iteratively, we never include in the bond to be reduced any edge coming from
previous bond reductions [6, Lemma 3.1].
A reduced graph G̃ of G is any graph obtained from G by iteratively applying
a bond reduction, as long as there is a reducible vertex (see Figure 2). We remark
that there is not necessarily a unique reduced graph of G (consider for instance
K3 where two edges are doubled).

4 A O(log n)-Approximation Algorithm in the Unweighted Case
As a first step, we show that every reduced graph contains a diamond of size
O(log n) [6, Lemmas 4.1 and 4.4].
Lemma 1. Every simple n-vertex graph with minimum degree at least 3 contains
a diamond of size at most 6 log_{3/2} n + 8. Moreover, such a diamond can be found
in polynomial time. The same holds for reduced graphs.
Our algorithm for the diamond hitting set problem on unweighted graphs is
described in Algorithm 1.

Algorithm 1. A O(log n)-approximation algorithm for unweighted graphs.


– X ← ∅
– While X is not a hitting set of G, repeat the following steps:
  • Compute a reduced graph G̃ of G − X
  • Find a diamond D̃ in G̃ of size at most 6 log_{3/2} |G̃| + 8 (using Lemma 1)
  • Include in X all vertices of D̃

The algorithm relies on the simple fact that every hitting set of a reduced
graph G̃ of a graph G is also a hitting set of G itself. The set of diamonds com-
puted by the algorithm yields a collection D of pairwise vertex-disjoint diamonds
in G. In particular, the size of a minimum hitting set is at least |D|. For each
diamond in D, at most 6 log3/2 n + 8 vertices were added to the hitting set X.
Hence, the approximation factor of the algorithm is O(log n).
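The main loop of Algorithm 1 is easy to state in code once the subroutines of Sections 3 and 4 are available; in the sketch below (ours) the three callables are placeholders for those routines, and the hitting set test can be realized with the block-based cactus check sketched after Figure 1.

```python
def unweighted_hitting_set(G, reduce_graph, find_small_diamond, is_hitting_set):
    """Skeleton of Algorithm 1 (a sketch; all callables are placeholders).

    reduce_graph       -- shaving + bond reductions of Section 3,
    find_small_diamond -- the O(log n)-size diamond of Lemma 1,
    is_hitting_set     -- test whether G - X is diamond-free.
    """
    X = set()
    while not is_hitting_set(G, X):
        H = reduce_graph(G, X)        # reduced graph of G - X
        D = find_small_diamond(H)     # at most 6 log_{3/2} |H| + 8 vertices
        X |= set(D)                   # hit the diamond by taking all its vertices
    return X
```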

5 A O(log n)-Approximation Algorithm


The present section is devoted to a O(log n)-approximation algorithm for the
diamond hitting set problem in the weighted case, which is based on the primal-
dual method. We start by defining, in Section 5.1, the actual LP relaxation of
the problem used by the algorithm, together with its dual. Then, in Section 5.2,
we sketch our approximation algorithm. Details and proofs can be found in the
full version of this paper [6, Section 5].

5.1 The Working LP and Its Dual


Our approximation algorithm for the weighted case is based on the natural LP
relaxation for the diamond hitting set problem, given on page 194. To simplify
the presentation, we do not directly resort to that LP relaxation but to a possibly
weaker relaxation that is constructed during the execution of the algorithm,
that we call the working LP. At each iteration, an inequality is added to the
working LP. These inequalities, that we name blended diamond inequalities, are
all implied by diamond inequalities (3). The final working LP reads:

(LP)  Min Σ_{v∈V(G)} c_v x_v
      s.t. Σ_{v∈V(G)} a_{i,v} x_v ≥ β_i  ∀i ∈ {1, . . . , k}
           x_v ≥ 0  ∀v ∈ V(G),

where k is the total number of iterations of the algorithm. The dual of (LP) is:

(D)  Max Σ_{i=1}^k β_i y_i
     s.t. Σ_{i=1}^k a_{i,v} y_i ≤ c_v  ∀v ∈ V(G)
          y_i ≥ 0  ∀i ∈ {1, . . . , k}.

The algorithm is based on the primal-dual method. It maintains a boolean pri-


mal solution x and a feasible dual solution y. Initially, all variables are set to
0. Then the algorithm enters its main loop, that ends when x satisfies all dia-
mond inequalities. At the ith iteration, a violated inequality Σ_{v∈V} a_{i,v} x_v ≥ β_i
is added to the working LP and the corresponding dual variable y_i is increased.
In order to preserve the feasibility of the dual solution, we stop increasing y_i
whenever some dual inequality becomes tight. That is, we stop increasing when
Σ_{j=1}^i a_{j,v} y_j = c_v for some vertex v, that is said to be tight. (Actually, we
should also stop increasing y_i in case a ‘collision’ occurs [6, Section 5.2.4].)
All tight vertices v (if any) are then added to the primal solution. That is,
the corresponding variables xv are increased from 0 to 1. The current iteration

then ends and we check whether x satisfies all diamond inequalities. If so, then
we exit the loop, perform a reverse delete step, and output the current primal
solution.
The precise way the violated blended diamond inequality is chosen depends
among other things on the residual cost (or slack) of the vertices. The residual
cost of vertex v at the ith iteration is the number c_v − Σ_{j=1}^{i−1} a_{j,v} y_j. Note that
the residual cost of a vertex is always nonnegative, and zero if and only if the
vertex is tight.
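In code, one dual increase amounts to a single minimum over the support of the new inequality. The sketch below is ours; it ignores the ‘collision’ rule of [6, Section 5.2.4] and assumes the support coefficients a_{i,v} are positive.

```python
import math

def raise_dual(a_i, residual):
    """Largest increase of y_i keeping the dual feasible (a sketch).

    a_i      -- dict mapping vertex v to the coefficient a_{i,v} > 0,
    residual -- dict mapping v to c_v - sum_j a_{j,v} y_j.
    Raising y_i by delta lowers residual[v] by a_i[v] * delta, so the
    first vertex whose residual reaches zero bounds delta.
    """
    delta = min(residual[v] / a_i[v] for v in a_i)
    tight = [v for v in a_i if math.isclose(residual[v], delta * a_i[v])]
    for v in a_i:                     # update the residual costs
        residual[v] -= delta * a_i[v]
    return delta, tight
```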

5.2 The Algorithm


The algorithm (rather, a simplified version of the algorithm) is described in
Algorithm 2.

Algorithm 2. A O(log n)-approximation algorithm for weighted graphs.


– X ← ∅; y ← 0; i ← 0
– While X is not a hitting set of G = (V, E), repeat the following steps:
  • i ← i + 1
  • Let H be the graph obtained by shaving G − X
  • Find a reduced graph H̃ of H
  • Find a diamond D̃ in H̃ of size at most 6 log_{3/2} |H̃| + 8 (using Lemma 1)
  • Compute a violated blended diamond inequality Σ_{v∈V} a_{i,v} x_v ≥ β_i based on D̃ and the residual costs; add it to (LP)
  • Increase y_i until some vertex becomes tight
  • Add all tight vertices v to X, in a carefully chosen order [6, Section 5.2.6]
– k ← i
– Perform a reverse delete step on X

We remark that the set X naturally corresponds to a primal solution x, ob-


tained by setting xv to 1 if v ∈ X, to 0 otherwise, for every v ∈ V (G). This
vector x satisfies the diamond inequalities (3) exactly when we exit the while
loop of the algorithm, that is, when X becomes a hitting set.
The reverse delete step consists in considering the vertices of X in the reverse
order in which they were added to X and deleting those vertices v such that
X − {v} is still a hitting set. Observe that, because of this step, the hitting set
X output by the algorithm is inclusion-wise minimal.

Theorem 1. Algorithm 2 yields a O(log n)-approximation for the diamond hit-


ting set problem.

We can prove that the integrality gap of the natural LP relaxation for the prob-
lem (see page 194) is Θ(log n). This result is obtained using expander graphs
with large girth [6, Section 5.4].

6 A 9-Approximation Algorithm
In this section we give a primal-dual 9-approximation algorithm for the diamond
hitting set problem. We start with a sketch of the algorithm in Section 6.1. The
algorithm makes use of the sparsity inequalities. In order to describe them, we
first bound the number of edges in a forest of cacti in Section 6.2; using this
bound, in Sections 6.3 and 6.4 we introduce the sparsity inequalities and prove
their validity. Once again, missing proofs and details can be found in the full
version of this paper [6, Section 6].
In the whole section, q is a global parameter of our algorithm which is set to
5. (We remark that the analysis below could be adapted to other values of q,
but this would not give an approximation factor better than 9.)

6.1 The Algorithm


Our 9-approximation algorithm for the diamond hitting set problem is very sim-
ilar to the O(log n)-approximation algorithm. The main difference is that we use
a larger set of inequalities to build the working LP relaxation. See Algorithm 3
for a description of (a simplified version of) the algorithm.

Algorithm 3. A 9-approximation algorithm.


– X ← ∅; y ← 0; i ← 0
– While X is not a hitting set of G = (V, E), repeat the following steps:
  • i ← i + 1
  • Let H be the graph obtained by shaving G − X
  • Find a reduced graph H̃ of H
  • If, in H̃, no two cycles of size at most q share an edge, then let Σ_{v∈V} a_{i,v} x_v ≥ β_i be the extended sparsity inequality with support V(H)
  • Otherwise, H̃ contains a diamond D̃ with at most 2q − 1 edges and let Σ_{v∈V} a_{i,v} x_v ≥ β_i be a blended diamond inequality deduced from D̃ and the residual costs
  • Add the inequality Σ_{v∈V} a_{i,v} x_v ≥ β_i to the working LP
  • Increase y_i until some vertex becomes tight
  • Add all tight vertices to X, in a carefully chosen order
– k ← i
– Perform a reverse delete step on X

Our main result is as follows.


Theorem 2. Algorithm 3 yields a 9-approximation for the diamond hitting set
problem.

6.2 Bounding the Number of Edges in a Forest of Cacti


The following lemma provides a bound on the number of edges in a forest of
cacti. For i ∈ {2, . . . , q}, we denote by γi (G) the number of cycles of length i of
a graph G.

Lemma 2. Let F be a forest of cacti with k components and let q ≥ 2. Then

||F|| ≤ ((q+1)/q) (|F| − k) + Σ_{i=2}^q ((q−i+1)/q) γ_i(F).

Proof. Denote by γ_{>q}(F) the number of cycles of F whose length exceeds q. We
have

||F|| = |F| − k + Σ_{i=2}^q γ_i(F) + γ_{>q}(F).   (4)

In the right hand side, the first two terms represent the number of edges in a
spanning forest of F, while the last terms give the number of edges that should
be added to obtain the forest of cacti F.
Because every two cycles in F are edge disjoint, we have

||F|| ≥ Σ_{i=2}^q i γ_i(F) + (q + 1) γ_{>q}(F).

Combining this with (4), we get

γ_{>q}(F) ≤ (1/q) (|F| − k − Σ_{i=2}^q (i − 1) γ_i(F)).   (5)

From (4) and (5), we finally infer

||F|| ≤ |F| − k + Σ_{i=2}^q γ_i(F) + (1/q) (|F| − k − Σ_{i=2}^q (i − 1) γ_i(F))
     ≤ ((q+1)/q) (|F| − k) + Σ_{i=2}^q ((q−i+1)/q) γ_i(F).   □
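As a quick sanity check of the bound (our example, not from the paper): two triangles glued at a vertex form a cactus with |F| = 5, k = 1, γ_3(F) = 2 and 6 edges; with q = 5 the bound evaluates to (6/5)·4 + (3/5)·2 = 6, so it is tight here.

```python
def cacti_edge_bound(vertices, k, cycle_counts, q):
    """Right-hand side of Lemma 2; cycle_counts[i] = gamma_i(F)."""
    return ((q + 1) / q) * (vertices - k) + sum(
        ((q - i + 1) / q) * cycle_counts.get(i, 0) for i in range(2, q + 1))

# two triangles glued at a vertex: 5 vertices, 6 edges, gamma_3 = 2
assert cacti_edge_bound(5, 1, {3: 2}, 5) >= 6   # value 6, tight
```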

6.3 The Sparsity Inequalities


We define the load of a vertex v in a graph G as

ℓ_G(v) := d_G(v) − Σ_{i=2}^q λ_i γ_i(G, v),

where, for i ∈ {2, . . . , q}, γ_i(G, v) denotes the number of cycles of length i
incident to v in G and

λ_i := (q − i + 1) / (⌊i/2⌋ q).

Lemma 3. Let X be a hitting set of a graph G where no two cycles of length at
most q share an edge. Then,

Σ_{v∈X} (ℓ_G(v) − (q+1)/q) ≥ ||G|| − ((q+1)/q) |G| − Σ_{i=2}^q ((q−i+1)/q) γ_i(G) + (q+1)/q.   (6)

We call Inequality (6) a sparsity inequality.



Proof. For i ∈ {2, . . . , q} and j ∈ {0, . . . , i}, we denote by ξ_i^j the number of
cycles of G that have length i and exactly j vertices in X.
Letting ||X|| and |δ(X)| respectively denote the number of edges of G with
both ends in X and the number of edges of G having an end in X and the other
in V(G) − X, we have

Σ_{v∈X} ℓ_G(v) = 2||X|| + |δ(X)| − Σ_{i=2}^q Σ_{j=1}^i j λ_i ξ_i^j
             = ||X|| + ||G|| − ||G − X|| − Σ_{i=2}^q Σ_{j=1}^i j λ_i ξ_i^j
             ≥ ||X|| + ||G|| − ((q+1)/q) (|G − X| − 1) − Σ_{i=2}^q ((q−i+1)/q) ξ_i^0 − Σ_{i=2}^q Σ_{j=1}^i j λ_i ξ_i^j,

where the last inequality follows from Lemma 2 applied to the forest of cacti
G − X (notice that γ_i(G − X) = ξ_i^0).
Because no two cycles of length at most q share an edge and, in a cycle of
length i, each subset of size j induces a subgraph that contains at least 2j − i
edges, we have

||X|| ≥ Σ_{i=2}^q Σ_{j=1+⌊i/2⌋}^i (2j − i) ξ_i^j.

Thus, we obtain

Σ_{v∈X} ℓ_G(v) ≥ Σ_{i=2}^q Σ_{j=1+⌊i/2⌋}^i (2j − i) ξ_i^j + ||G|| − ((q+1)/q) (|G − X| − 1)
             − Σ_{i=2}^q ((q−i+1)/q) ξ_i^0 − Σ_{i=2}^q Σ_{j=1}^i j λ_i ξ_i^j.

We leave it to the reader to check that, in the right hand side of the last in-
equality, the total coefficient of ξ_i^j is at least −(q−i+1)/q, for all i ∈ {2, . . . , q} and
j ∈ {0, . . . , i}. Hence,

Σ_{v∈X} ℓ_G(v) ≥ ||G|| − ((q+1)/q) |G| + ((q+1)/q) |X| + (q+1)/q − Σ_{i=2}^q ((q−i+1)/q) γ_i(G).

Inequality (6) follows.   □

6.4 The Extended Sparsity Inequalities

Consider a shaved graph H and denote by H̃ a reduced graph of H. Suppose that,
in H̃, no two cycles of length at most q share an edge. The sparsity inequality (6)
for H̃ yields a valid inequality also for H, where the coefficient of each variable
corresponding to a vertex that was removed in the reduction of H is zero. However,
as it is, the inequality is useless. We have to raise the coefficients of those variables
in such a way that the resulting inequality remains valid.
First, we associate to each edge of H̃ a corresponding primitive subgraph in
H, defined as follows. Consider an edge e ∈ E(H̃). If e was already present in
H, then its primitive subgraph is the edge itself and its two ends. Otherwise, the
primitive subgraph of e is the bond whose reduction produced e. In particular,
if e has a twin edge e′, then the primitive subgraphs of e and e′ coincide. The
primitive subgraph J of a subgraph J̃ ⊆ H̃ is defined simply as the union of the
primitive subgraphs of every edge in E(J̃).
Thus, the graph H can be uniquely decomposed into simple or double pieces
corresponding respectively to edges or pairs of parallel edges in H̃. Here, the
pieces of H are defined as follows: let v and w be two adjacent vertices of H̃,
and let J̃ denote the subgraph of H̃ induced by {v, w} (thus J̃ is either an edge
or a pair of parallel edges, together with the endpoints v and w). The primitive
subgraph of J̃ in H, say J, is a piece of H with ends v and w. Such a piece is
simple if J̃ has exactly one edge and double otherwise (see Fig. 3), that is, if J̃
has two parallel edges. The vertices of H are of two types: the branch vertices
are those that survive in H̃, and the other vertices are internal to some piece
of H.

Fig. 3. A double piece

Consider a double piece Q of H (if any) and a cycle C contained in Q. Then


C is a block of Q. A vertex v of C is said to be an end of the cycle C if v is an
end of the piece Q or v belongs to a block of Q distinct from C. Observe that C
has always two distinct ends. The cycle C has also two handles, defined as the
two v–w paths in C, where v and w are the two ends of C.
The two handles of C are labelled top and bottom as described in [6, Section
6]. Denote by t(C) (resp. b(C)) the minimum residual cost of a vertex in the top
handle (resp. bottom handle) of C. Thus, t(C) ≤ b(C). Also, we choose a cycle
of Q (if any) and declare it to be special. If possible, the special cycle is chosen
among the cycles C contained in Q with t(C) = b(C). (So every double piece of
H has a special cycle.)
The extended sparsity inequality for H reads

Σ_{v∈V(H)} a_v x_v ≥ β,   (7)

where

a_v :=
  ℓ_{H̃}(v) − (q+1)/q  if v is a branch vertex,
  1                    if v is an internal vertex of a simple piece,
  2                    if v is an internal vertex of a double piece and does not belong to any handle,
  0                    if v belongs to the top handle of a cycle C with t(C) < b(C),
  2                    if v belongs to the bottom handle of a cycle C with t(C) < b(C),
  1                    if v belongs to a handle of a cycle C with t(C) = b(C) or C is special,

and

β := ||H̃|| − ((q+1)/q) |H̃| − Σ_{i=2}^q ((q−i+1)/q) γ_i(H̃) + (q+1)/q.
In the definition of av above, we always assume that the cycle C is contained in
a double piece of H. The next lemma is proved in [6, Lemma 6.3].

Lemma 4. Let H be a graph and let H̃ be a reduced graph of H such that
no two cycles of length at most q share an edge. Then Inequality (7) is valid,
that is,

Σ_{v∈X} a_v ≥ β

whenever X is a hitting set of H.

The following lemma is the key of our analysis of Algorithm 3. Notice that the
constant involved is smaller than the approximation guarantee of our algorithm.
The number 9 comes from the fact that we use blended diamond inequalities for
diamonds with up to 9 pieces.

Lemma 5. Let H be a shaved graph and let H̃ be a reduced graph of H. Suppose
that, in H̃, no two cycles of length at most q share an edge. Then,

Σ_{v∈X} a_v ≤ 8β

for every minimal hitting set X of H.

Acknowledgements
We thank Dirk Oliver Theis for his valuable input in the early stage of this
research. We also thank Jean Cardinal and Marcin Kamiński for stimulating
discussions.

References
1. Bafna, V., Berman, P., Fujito, T.: A 2-approximation algorithm for the undirected
feedback vertex set problem. SIAM Journal on Discrete Mathematics 12(3), 289–297
(1999)
2. Bar-Yehuda, R., Geiger, D., Naor, J., Roth, R.M.: Approximation algorithms for
the feedback vertex set problem with applications to constraint satisfaction and
Bayesian inference. SIAM Journal on Computing 27(4), 942–959 (1998)
3. Becker, A., Geiger, D.: Optimization of Pearl’s method of conditioning and greedy-
like approximation algorithms for the vertex feedback set problem. Artificial Intel-
ligence 83, 167–188 (1996)
4. Chudak, F.A., Goemans, M.X., Hochbaum, D.S., Williamson, D.P.: A primal-dual
interpretation of two 2-approximation algorithms for the feedback vertex set problem
in undirected graphs. Operations Research Letters 22, 111–118 (1998)
5. Even, G., Naor, J., Schieber, B., Zosin, L.: Approximating minimum subset feedback
sets in undirected graphs with applications. SIAM Journal on Discrete Mathemat-
ics 13(2), 255–267 (2000)
6. Fiorini, S., Joret, G., Pietropaoli, U.: Hitting diamonds and growing cacti (2010),
http://arxiv.org/abs/0911.4366v2
7. Goemans, M.X., Williamson, D.P.: The primal-dual method for approximation al-
gorithms and its application to network design problems. In: Approximation Al-
gorithms for NP-Hard Problems, ch. 4, pp. 144–191. PWS Publishing Company
(1997)
8. Khot, S., Regev, O.: Vertex cover might be hard to approximate to within 2 − ε.
Journal of Computer and System Sciences 74(3), 334–349 (2008)
9. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and
Complexity. Prentice-Hall, Englewood Cliffs (1982)
Approximability of 3- and 4-Hop Bounded
Disjoint Paths Problems

Andreas Bley1,⋆ and Jose Neto2


1 TU Berlin, Institute of Mathematics, Straße des 17. Juni 136, D-10623 Berlin, Germany
[email protected]
2 Institut Telecom, Telecom & Management SudParis, CNRS UMR 5157, 9 rue Charles Fourier, F-91011 Evry, France
[email protected]

Abstract. A path is said to be ℓ-bounded if it contains at most ℓ edges.
We consider two types of ℓ-bounded disjoint paths problems. In the max-
imum edge- or node-disjoint path problems MEDP(ℓ) and MNDP(ℓ), the
task is to find the maximum number of edge- or node-disjoint ℓ-bounded
(s, t)-paths in a given graph G with source s and sink t, respectively.
In the weighted edge- or node-disjoint path problems WEDP(ℓ) and
WNDP(ℓ), we are also given an integer k ∈ N and non-negative edge
weights c_e ∈ N, e ∈ E, and seek for a minimum weight subgraph of G
that contains k edge- or node-disjoint ℓ-bounded (s, t)-paths. Both prob-
lems are of great practical relevance in the planning of fault-tolerant
communication networks, for example.
Even though length-bounded cut and flow problems have been stud-
ied intensively in the last decades, the NP-hardness of some 3- and
4-bounded disjoint paths problems was still open. In this paper, we set-
tle the complexity status of all open cases showing that WNDP(3) can
be solved in polynomial time, that MEDP(4) is APX-complete and ap-
proximable within a factor of 2, and that WNDP(4) and WEDP(4) are
APX-hard and NPO-complete, respectively.

Keywords: Graph algorithms; length-bounded paths; complexity; ap-


proximation algorithms.

1 Introduction

Two major concerns in the design of modern communication networks are the
protection against potential failures and the permanent provision of a guaran-
teed minimum level of service quality. A wide variety of models expressing such
requirements may be found in the literature, e.g. [1,2,3,4]. Coping simultane-
ously with both requirements naturally leads to length-restricted disjoint paths
problems: In order to ensure that a pair of nodes remains connected also after
some nodes or edges of the network fail, one typically demands the existence

⋆ Supported by the DFG Research Center Matheon – Mathematics for key technologies.


of several node- or edge-disjoint transmission paths between them. Each node


on a transmission path, however, may lead to additional packet delay, jitter,
and potential transmission errors for the corresponding data stream. To provide
a guaranteed level of transmission service quality, these paths thus must not
contain more than a certain number of intermediate nodes or, equivalently, of
edges.
Mathematically, the task of verifying if a given network satisfies the robustness
and quality requirements of a given node pair can be formulated as an edge- or
node-disjoint paths problem. Let G = (V, E) be a simple graph with source
s ∈ V and sink t ∈ V and let k ∈ N. A path in G is said to be ℓ-bounded
for a given number ℓ ∈ N if it contains at most ℓ edges. In the edge-disjoint
paths problem EDP(ℓ), the task is to decide if there are k edge-disjoint ℓ-
bounded (s, t)-paths in G or not. In the corresponding maximum edge-disjoint
paths problem MEDP(ℓ), we wish to find the maximum number of edge-disjoint
ℓ-bounded (s, t)-paths. The analogous node-disjoint path problems are denoted
as NDP(ℓ) and MNDP(ℓ). The task of designing a network that satisfies the
requirements of a single node pair can be modeled as a weighted edge- or node-
disjoint path problem WEDP(ℓ) and WNDP(ℓ). In these problems, we are given
the graph G, source s and sink t, the number of paths k, and non-negative edge
weights c_e ∈ N, e ∈ E. The task is to find a minimum cost subset E′ ⊆ E such
that the subgraph (V, E′) contains at least k edge- or node-disjoint ℓ-bounded
(s, t)-paths, respectively.
Due to their great practical relevance, problems asking for disjoint paths or
unsplittable flows between some node pairs have received considerable attention
in the literature. Structural results, complexity issues, and approximation algo-
rithms for disjoint paths problems without length restrictions are discussed in
[5,6,7], for example.
In a seminal article Menger [8] shows that the maximum number of edge- or
node-disjoint (s, t)-paths in a graph is equal to the minimum size of an (s, t)-
edge- or (s, t)-node-cut, respectively. Lovász et al. [9], Exoo [10], and Niepel et
al. [11] showed that this strong duality between disjoint paths and (suitably
defined) cuts still holds for 4-bounded node-disjoint paths and node-cuts and for
3-bounded edge-disjoint paths and edge-cuts, but that Menger’s theorem does
not hold for length bounds ℓ ≥ 5. The ratio between the number of paths and
the cut size is studied in [12,13]. Generalizations of Menger’s theorem and of
Ford and Fulkerson’s max flow min cut theorem to length-bounded flows are an
area of active research [14,15].
Polynomial time algorithms for the minimum ℓ-bounded edge-cut problem
with ℓ ≤ 3 have been presented by Mahjoub and McCormick [16]. Baier et
al. [17] proved that the minimum ℓ-bounded edge-cut problem is APX-hard for
ℓ ≥ 4 and that the corresponding node-cut problem is APX-hard for ℓ ≥ 5.
Itai et al. [18] and Bley [19] showed that the problems MEDP(ℓ) and MNDP(ℓ)
of finding the maximum number of edge- and node-disjoint ℓ-bounded paths are
polynomially solvable for ℓ ≤ 3 and ℓ ≤ 4, respectively, and that both problems are
APX-complete for ℓ ≥ 5. Heuristics to find large sets of disjoint length bounded

Table 1. Known and new (bold) complexity results for node- and edge-disjoint ℓ-bounded paths problems

         MNDP(ℓ)        WNDP(ℓ)                MEDP(ℓ)         WEDP(ℓ)
ℓ ≤ 2    P              P                      P               P
ℓ = 3    P              P                      P               P
ℓ = 4    P              APX-hard (at least)    APX-complete    NPO-complete
ℓ ≥ 5    APX-complete   NPO-complete           APX-complete    NPO-complete

paths can be found, e.g., in [18,20,21]. Polyhedral approaches to these problems
are investigated in [22,23,24]. The weighted disjoint paths problems WEDP(ℓ)
and WNDP(ℓ) are known to be NPO-complete for ℓ ≥ 5 and to be polynomi-
ally solvable for ℓ ≤ 2 in the node-disjoint case and for ℓ ≤ 3 in the edge-disjoint
case [19]. Further results and a finer analysis of the complexity of disjoint paths
problems by means of different parameterizations (namely w.r.t. the number of
paths, their length, or the graph treewidth) are presented in [25,26]. The com-
plexity of MEDP(4), WEDP(4), WNDP(3), and WNDP(4), however, has been
left open until now.
The contribution of this paper is to close all these open cases. In Section 2,
we prove that the maximum edge-disjoint 4-bounded paths problem MEDP(4)
is APX-complete, presenting a 2-approximation algorithm and an approxima-
tion preserving reduction from Max-k-Sat(3) to MEDP(4). This implies that
the corresponding weighted edge-disjoint paths problem WEDP(4) is NPO-
complete. In Section 3, we then show how to solve the weighted node-disjoint 3-
bounded paths problem WNDP(3) via matching techniques in polynomial time
and prove that the 4-bounded version of this problem is at least APX-hard.
Table 1 summarizes the known and new complexity results regarding these prob-
lems. All hardness results and algorithms presented in this paper generalize in
a straightforward way to directed graphs and to non-simple graphs containing
parallel edges.

2 Edge-Disjoint 4-Bounded Paths

In this section, we study the approximability of the two edge-disjoint 4-bounded


paths problems. First, we consider the problem of maximizing the number of edge-
disjoint paths. One easily observes that any inclusion-wise maximal set of edge-
disjoint 4-bounded (s, t)-paths, which can be computed in polynomial time by
greedily adding disjoint paths to the solution, is a 4-approximate solution for
MEDP(4) [19].
A 2-approximation algorithm is obtained as shown in algorithm ExFlow
on page 208. In the first step of algorithm ExFlow, we construct the directed
graph G′ = (V′, E′) with V′ = ∪_{i=0}^4 V_i for V_0 := {s_0}, V_4 := {t_4}, and V_i := {v_i |
v ∈ V \ {s, t} with dist_G(v, s) ≤ i and dist_G(v, t) ≤ 4 − i} for all i ∈ {1, 2, 3},
and E′ := ∪_{i=0}^4 E_i with E_0 := {(s_0, t_4)} if st ∈ E, E_0 := ∅ if st ∉ E, and
E_i := {(v_{i−1}, w_i) ∈ V_{i−1} × V_i | vw ∈ E or v = w} for i ∈ {1, . . . , 4}, where
dist_G(u, v) denotes the distance from node u to node v in G. We assign cost 0
and capacity 1 to all edges u_i u_{i+1} ∈ E′ and capacity 1 and cost 1 to all other
edges in E′. Figure 1 illustrates this construction.

Fig. 1. Construction of the hop-extended digraph G′ (right) from the given graph G
(left) in Step 1 of algorithm ExFlow. Arcs with cost 0 in G′ are thick.

In this layered digraph, we compute an (integer) minimum cost maximum
(s_0, t_4)-flow and its decomposition into paths P′_1, . . . , P′_k. Each such path P′_i =
(s_0, u_1, v_2, w_3, t_4) defines a 4-bounded walk (s, u, v, w, t) in G, which can be short-
ened to a simple 4-bounded path P_i. Let F = {P_1, . . . , P_k} be the set of these
paths. Note that these paths are 4-bounded, but not necessarily edge-disjoint.
In the second step, we create the associated “conflict graph” H = (F, {P_i P_j |
P_i ∩ P_j ≠ ∅}). By Lemma 1, H consists only of disjoint paths and isolated
nodes. Choosing all isolated nodes and a maximum independent set in each of
these paths, we thus can compute an independent set S ⊆ F of size |S| ≥ |F|/2
in H. This is done in the third step of our algorithm.
Finally, we return the path set corresponding to this independent set.
Steps 1, 2, and 4 of this algorithm clearly can be done in polynomial time. The
possibility to perform also Step 3 in polynomial time follows from the following
lemma.
Lemma 1. The conflict graph H = (F, {P_i P_j | P_i ∩ P_j ≠ ∅}) created in Step 2
of algorithm ExFlow consists of isolated nodes and disjoint paths only.

ExFlow
1. Compute a min cost max (s_0, t_4)-flow f in the hop-extended digraph G′.
   Let F := {P_1, . . . , P_k} be the corresponding 4-bounded simple paths in G.
2. Create the conflict graph H := (F, {P_i P_j | P_i ∩ P_j ≠ ∅}).
3. Compute an independent set S ⊆ F in H with |S| ≥ |F|/2.
4. Return S.
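Step 1 translates almost literally into code. The sketch below is ours; it uses networkx, assumes a simple undirected graph, and omits the decomposition of the flow into the paths P′_1, . . . , P′_k as well as the conflict-graph steps.

```python
import networkx as nx

def hop_extended_digraph(G, s, t):
    """Build the layered digraph G' of Step 1 (a sketch).

    Node (v, i) is the copy v_i of v in layer i.  Every arc gets
    capacity 1; 'stay' arcs ((v, i-1), (v, i)) get cost 0 and arcs
    coming from edges of G get cost 1, as in the construction above.
    """
    ds = nx.single_source_shortest_path_length(G, s)
    dt = nx.single_source_shortest_path_length(G, t)
    layer = {0: {s}, 4: {t}}
    for i in (1, 2, 3):
        layer[i] = {v for v in G if v not in (s, t)
                    and ds.get(v, 5) <= i and dt.get(v, 5) <= 4 - i}
    D = nx.DiGraph()
    D.add_nodes_from([(s, 0), (t, 4)])
    if G.has_edge(s, t):
        D.add_edge((s, 0), (t, 4), capacity=1, weight=1)
    for i in (1, 2, 3, 4):
        for v in layer[i - 1]:
            for w in layer[i]:
                if v == w:
                    D.add_edge((v, i - 1), (w, i), capacity=1, weight=0)
                elif G.has_edge(v, w):
                    D.add_edge((v, i - 1), (w, i), capacity=1, weight=1)
    return D

# flow = nx.max_flow_min_cost(hop_extended_digraph(G, s, t), (s, 0), (t, 4))
```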

Proof. Let f be the minimum cost maximum (s_0, t_4)-flow in G′ computed in
Step 1 of ExFlow and let P′_1, . . . , P′_k be its path decomposition. Note that the
paths P′_i are edge-disjoint in G′.
By construction of G′, each edge e ∈ δ(s) ∪ δ(t) corresponds to at most one
arc (s_0, v_1), (v_3, t_4), or (s_0, t_4) in G′. Thus, any such edge is contained in at most
one path in F. Furthermore, for each edge e = uv ∈ E \ δ(s) \ δ(t), the paths
P′_1, . . . , P′_k in G′ contain at most one of the arcs (u_1, v_2) and (v_1, u_2) and at most
one of the arcs (u_2, v_3) and (v_2, u_3). Otherwise, these paths do not correspond to
a minimum cost maximum flow: If there were two paths P′_1 = (s_0, u_1, v_2, w_3, t_4)
and P′_2 = (s_0, v_1, u_2, q_3, t_4), for example, then replacing these paths by the paths
(s_0, u_1, u_2, q_3, t_4) and (s_0, v_1, v_2, w_3, t_4) would reduce the cost of the
corresponding flow. Consequently, any edge e ∈ E \ δ(s) \ δ(t) can be contained in
at most two of the paths in F and, further on, a path in F can intersect with at
most two other paths in F. This implies that the conflict graph H constructed
in Step 2 of ExFlow consists only of isolated nodes and disjoint paths and
cycles.
To see that H cannot contain cycles, let C = {P_1, . . . , P_n} be the shortest
cycle in H. Then each edge in M := ∪_{i∈C} P_i \ δ(s) \ δ(t) must appear in ex-
actly two paths in C, once as the second and once as the third edge. If there
were two paths P_1 and P_2 in C that traverse one of the edges e = uv ∈ M in
opposite directions, then the corresponding paths in G′ would be of the form
P′_1 = (s_0, u_1, v_2, w_3, t_4) and P′_2 = (s_0, q_1, v_2, u_3, t_4). In this case, replacing P′_1 and
P′_2 by (s_0, u_1, u_2, u_3, t_4) and (s_0, q_1, v_2, w_3, t_4) would reduce the cost
of the corresponding flow in G′ (and the size of the remaining cycle in C), which
is a contradiction to our assumption that the paths P′_i correspond to a minimum
cost maximum flow in G′.
So, we may assume that the paths in C traverse each edge e ∈ M in the same
direction. Then, for each e = uv ∈ M, there is exactly one path of the form
(s, u, v, w, t) and exactly one path of the form (s, q, u, v, t) in C. In this case,
however, we can replace each path P′_i = (s_0, u_1, v_2, w_3, t_4) that corresponds to a
path in C by the less costly path (s_0, u_1, v_2, v_3, t_4) without exceeding the
edge capacities in G′. This is again a contradiction to our assumption that the
paths P′_i define a minimum cost maximal flow in G′. Consequently, there are no
cycles in H.   □


Theorem 2. ExFlow is a 2-approximation algorithm for MEDP(4).

Proof. By Lemma 1, all steps of the algorithm can be executed in polynomial
time. The paths in S are derived from the 4-bounded (s_0, t_4)-flow paths in G′,
so they are clearly 4-bounded. As S is an independent set in the conflict graph
H, the paths in S are also edge-disjoint.
Furthermore, any set of edge-disjoint 4-bounded (s, t)-paths in G defines a
feasible (s_0, t_4)-flow in G′. Hence, k = |F| is an upper bound on the maximum
number k∗ of edge-disjoint 4-bounded (s, t)-paths in G, which immediately im-
plies |S| ≥ k∗/2.   □


Fig. 2. Graph G_i for variable x_i occurring as literals x_i^1, x̄_i^2, x̄_i^3
Fig. 3. Graph H_l for clause C_l = (x̄_i^3 ∨ x_j^1 ∨ x̄_k^2)

In order to show that MEDP(4) is APX-hard, i.e., that there is some c > 1
such that approximating MEDP(4) within a factor less than c is NP-hard,
we construct an approximation preserving reduction from the Max-k-Sat(3)
problem to MEDP(4). Given a set X of boolean variables and a collection C
of disjunctive clauses such that each clause contains at most k literals and each
variable occurs at most 3 times as a literal, the Max-k-Sat(3) problem is to
find a truth assignment to the variables that maximizes the number of satisfied
clauses. Max-k-Sat(3) is known to be APX-complete [27].
Theorem 3. MEDP(4) is APX-hard.

Proof. We construct an approximation preserving reduction from Max-k-Sat(3)
to MEDP(4). Let x_i, i ∈ I, be the boolean variables and C_l, l ∈ L, be the clauses
of the given Max-k-Sat(3) instance. Without loss of generality we may as-
sume that each variable x_i occurs exactly 3 times as a literal and denote these
occurrences by x_i^j, j ∈ J := {1, . . . , 3}.
We construct an undirected graph G = (V, E) that consists of |I| + |L| sub-
graphs, one for each variable and one for each clause, as follows. For each i ∈ I,
we construct a variable graph G_i = (V_i, E_i) as shown in Figure 2. G_i contains
the nodes and edges

V_i := {s, t} ∪ {u_i^j, v_i^j, w_i^j, w̄_i^j, a_i^j, ā_i^j | j ∈ J} and
E_i := {su_i^j, u_i^j v_i^j, v_i^j w_i^j, v_i^j w̄_i^j, w_i^j t, w̄_i^j t, sa_i^j, sā_i^j, w̄_i^j w_i^{j+1} | j ∈ J}
       ∪ {a_i^j w_i^{j+1} | j ∈ J : x_i^j occurs as unnegated literal x_i^j}
       ∪ {ā_i^j w̄_i^j | j ∈ J : x_i^j occurs as negated literal x̄_i^j},

where w_i^4 = w_i^1 for notational simplicity. The nodes s and t are contained in
all subgraphs and serve as source and destination for all paths. For each l ∈ L,
we construct a clause graph H_l = (W_l, F_l) as shown in Figure 3. In addition to
the nodes and edges it shares with the variable graphs, H_l contains 2 nodes and

Fig. 4. Union of G_i and H_l for variable x_i and clause C_l = (x̄_i^3 ∨ . . . ). Thick lines
are paths in P̄_i and path Q_i^l.

k′ + 2 edges, where k′ is the number of literals in clause C_l. Formally, W_l and F_l
are defined as

W_l := {s, t, b_l, c_l} ∪ {ā_i^j | i ∈ I, j ∈ J : negated literal x̄_i^j occurs in C_l}
       ∪ {a_i^j | i ∈ I, j ∈ J : unnegated literal x_i^j occurs in C_l} and
F_l := {b_l c_l, c_l t} ∪ {sā_i^j, ā_i^j b_l | i ∈ I, j ∈ J : negated literal x̄_i^j occurs in C_l}
       ∪ {sa_i^j, a_i^j b_l | i ∈ I, j ∈ J : unnegated literal x_i^j occurs in C_l}.

The goal in the constructed MEDP(4) instance is to find the maximum number of
edge-disjoint 4-bounded (s, t)-paths in the simple undirected graph G obtained as
the union of all variable and clause (sub)-graphs. It is clear that the constructions
can be performed in polynomial time.
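For concreteness, the construction can be scripted; the sketch below is ours, with an input encoding of our own choosing: each literal is a triple (i, j, neg), the j-th occurrence of variable x_i, negated iff neg.

```python
import networkx as nx

def medp4_instance(clauses):
    """Build the graph G of the Theorem 3 reduction (a sketch).

    clauses[l] lists the literals of clause C_l as triples (i, j, neg).
    Each variable is assumed to occur exactly three times in total.
    Node ('u', i, j) stands for u_i^j, ('wb', i, j) for the barred
    w-node, ('ab', i, j) for the barred a-node, and so on.
    """
    G, s, t = nx.Graph(), 's', 't'
    occ = {(i, j): neg for cl in clauses for (i, j, neg) in cl}
    for (i, j), neg in occ.items():
        jn = j % 3 + 1                   # index j + 1, with w_i^4 = w_i^1
        G.add_edges_from([
            (s, ('u', i, j)), (('u', i, j), ('v', i, j)),
            (('v', i, j), ('w', i, j)), (('v', i, j), ('wb', i, j)),
            (('w', i, j), t), (('wb', i, j), t),
            (s, ('a', i, j)), (s, ('ab', i, j)),
            (('wb', i, j), ('w', i, jn))])
        if neg:                          # negated occurrence
            G.add_edge(('ab', i, j), ('wb', i, j))
        else:                            # unnegated occurrence
            G.add_edge(('a', i, j), ('w', i, jn))
    for l, cl in enumerate(clauses):
        G.add_edges_from([(('b', l), ('c', l)), (('c', l), t)])
        for (i, j, neg) in cl:
            G.add_edge(('ab' if neg else 'a', i, j), ('b', l))
    return G, s, t
```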
For notational convenience, we denote for each i ∈ I and j ∈ J the paths

P_i^j = (s, u_i^j, v_i^j, w̄_i^j, t),   P_i^{j′} = (s, ā_i^j, w̄_i^j, w_i^{j+1}, t) if x̄_i^j occurs,
                                     P_i^{j′} = (s, ā_i^j, a_i^j, w_i^{j+1}, t) if x_i^j occurs,
P̄_i^j = (s, u_i^j, v_i^j, w_i^j, t),   P̄_i^{j′} = (s, a_i^j, ā_i^j, w̄_i^j, t) if x̄_i^j occurs,
                                     P̄_i^{j′} = (s, a_i^j, w_i^{j+1}, w̄_i^j, t) if x_i^j occurs.

For each i ∈ I and l ∈ L such that variable x_i occurs in clause C_l, we denote

Q_i^l = (s, a_i^j, b_l, c_l, t) if literal x_i^j occurs in C_l,
Q_i^l = (s, ā_i^j, b_l, c_l, t) if literal x̄_i^j occurs in C_l.

Furthermore, we define P_i := {P_i^j, P_i^{j′} | j ∈ J} and P̄_i := {P̄_i^j, P̄_i^{j′} | j ∈ J}
for all i ∈ I, and Q_l := {Q_i^l | i ∈ I : x_i occurs in C_l} for all l ∈ L. Figure 4
illustrates the paths in P̄_i and path Q_i^l.
In the first part of the proof we show that any truth assignment x̂ that satisfies
r clauses of the given Max-k-Sat(3) instance can be transformed into a set S(x̂)
of 6|I|+r edge-disjoint 4-bounded (s, t)-paths in G. Let x̂ be a truth assignment.

For each clause C_l that is satisfied by this truth assignment, let i_l(x̂) be one of
the variables whose literal evaluates to true in C_l. We define

S = S(x̂) := ∪_{i∈I: x̂_i = true} P_i ∪ ∪_{i∈I: x̂_i = false} P̄_i ∪ {Q_{i_l(x̂)}^l | l ∈ L : C_l(x̂) = true}.

Clearly, all paths in S contain at most 4 edges, |S| = 6|I| + r, and all paths in
S ∩ ∪_i (P_i ∪ P̄_i) are edge-disjoint. Note that if some path Q_i^l is contained in S,
then either the negated literal x̄_i^j occurring in clause C_l evaluates to true, which
implies that x_i = false and P_i^{j′} ∉ S, or the unnegated literal x_i^j occurring in C_l
evaluates to true and, hence, P̄_i^{j′} ∉ S. Furthermore, observe that these paths
P_i^{j′} and P̄_i^{j′} are the only paths that could share an edge
with Q_{i_l}^l. Consequently, each path Q_i^l ∈ S is edge-disjoint to any other path in
S and, thus, all paths in S are edge-disjoint.
In the second part of the proof we show that any set S of 6|I| + r edge-disjoint
4-bounded (s, t)-paths in G can be transformed into a truth assignment x̂(S) that
satisfies at least r clauses of the given Max-k-Sat(3) instance. We may ignore
path sets with |S| < 6|I|, as the path set ∪_{i∈I} P_i is a feasible solution for the
constructed MEDP(4) instance with 6|I| paths. Furthermore, we may restrict
our attention to path sets S that satisfy the property that, for each i ∈ I, either
P_i ⊆ S or P̄_i ⊆ S. Any path set S that does not satisfy this property can be
turned into a path set S′ with |S′| ≥ |S| that does as follows:
Suppose that, for some i, neither P_i ⊆ S nor P̄_i ⊆ S. Let S_i ⊆ S be the set
of paths in S that are fully contained in the variable subgraph G_i. As there are
only 6 edges adjacent to t in G_i, we have |S_i| ≤ 6. Observe that each 4-bounded
(s, t)-path in G is either of the form Q_i^l or it is fully contained in one of the
variable subgraphs G_i. Furthermore, all (s, t)-paths of length exactly 4 in G_i are
contained in P_i ∪ P̄_i. The only other 4-bounded paths in G_i are the three paths
of length 3, which we denote P̄_i^{j″} = (s, ā_i^j, w̄_i^j, t) for the negated literals x̄_i^j and
P_i^{j″} = (s, a_i^j, w_i^{j+1}, t) for the unnegated literals x_i^j. In terms of edge-disjointness,
however, the paths P_i^{j″} and P̄_i^{j″} conflict with the same 4-bounded (s, t)-paths as
the paths P_i^{j′} or P̄_i^{j′}, respectively. Replacing all paths P_i^{j″} and P̄_i^{j″} in S by the
paths P_i^{j′} and P̄_i^{j′}, respectively, thus yields a set of edge-disjoint 4-bounded paths
of the same size as S. Hence, we can assume that S_i ⊆ P_i ∪ P̄_i.
Now consider the paths Q_i^l corresponding to the clauses C_l in which variable
x_i occurs. Recall that variable x_i occurs exactly 3 times in the clauses, so there
are at most 3 paths Q_i^l in S that may share an edge with the paths in P_i ∪ P̄_i.
If variable x_i occurs uniformly in all 3 clauses negated or unnegated, then these
three paths Q_i^l are edge-disjoint from either all 6 paths in P_i or from all 6 paths
in P̄_i. Replacing the paths in S_i by P_i or P̄_i, respectively, yields an edge-disjoint
path set S′ with |S′| ≥ |S|. If variable x_i occurs non-uniformly, then either the
paths in P_i or the paths in P̄_i conflict with at most one of the three Q_i^l paths. In
this case we have |S_i| ≤ 5, as the only edge-disjoint path sets of size 6 in P_i ∪ P̄_i are
P_i and P̄_i themselves. Replacing the at most 5 paths in S_i and the 1 potentially
conflicting path Q_i^l (if it is contained in S at all) by either P_i or P̄_i thus yields

a path set S′ with |S′| ≥ |S| and either P_i ⊆ S′ or P̄_i ⊆ S′. Repeating this
procedure for all i ∈ I, we obtain a path set with the desired property.
So, suppose we are given a set S of 4-bounded edge-disjoint (s, t)-paths in G
with |S| = 6|I| + r and P_i ⊆ S or P̄_i ⊆ S for each i ∈ I. Then we define the
truth assignment x̂(S) as

x̂_i(S) := true if P_i ⊆ S, and false otherwise, for all i ∈ I.

To see that x̂(S) satisfies at least r clauses, consider the (s, t)-cut in G formed
by the edges adjacent to node t. As S contains either P_i or P̄_i for each i ∈ I,
which amounts to a total of 6|I| paths, each of the remaining r paths in S must
be of the form Q_i^l for some i ∈ I and l ∈ L. Path Q_i^l, however, can be contained
in S only if clause C_l evaluates to true. Otherwise it would intersect with the
path P_i^{j′} or P̄_i^{j′} in S that corresponds to literal x_i^j occurring in clause C_l. Hence,
at least r clauses of the given Max-k-Sat(3) instance are satisfied by the truth
assignment x̂(S).
It now follows in a straightforward way that MEDP(4) is APX-hard. Suppose
there is an algorithm ALG to approximate MEDP(4) within a factor of α > 1 and
denote by S the solution computed by this algorithm. Let r(S) be the number
of clauses satisfied by the truth assignment x̂(S) and let |S∗| and r∗ be optimal
solution values of MEDP(4) and Max-k-Sat(3), respectively. The fact that at
least half of the clauses in any Max-k-Sat(3) instance can be satisfied implies
r∗ ≥ |L|/2 and, further on, r∗ ≥ (3/(2k)) |I|. Applying the problem transformation and
algorithm ALG to a given Max-k-Sat(3) instance, we get

r(S) ≥ |S| − 6|I| ≥ (1/α) |S∗| − 6|I| ≥ (1/α) (r∗ + 6|I|) − 6|I| ≥ ((1 + 4k − 4kα)/α) r∗.

As there is a threshold c > 1 such that approximating Max-k-Sat(3) within a
factor smaller than c is NP-hard, it is also NP-hard to approximate MEDP(4)
within a factor less than c′ = (4kc + c)/(4kc + 1) > 1.   □


Theorem 3 immediately implies the following corollary.

Corollary 4. Given a graph G = (V, E), s, t ∈ V, and k ∈ Z_+, it is NP-hard
to decide if there are k edge-disjoint 4-bounded (s, t)-paths in G.

Now consider the weighted problem WEDP(4). By Corollary 4, it is already
NP-hard to decide whether a given subgraph of the given graph contains k edge-
disjoint (s, t)-paths and, thus, comprises a feasible solution or not. Consequently,
finding a minimum cost such subgraph is NPO-complete.

Theorem 5. WEDP(4) is NPO-complete.

As a consequence of Theorem 5, it is NP-hard to approximate WEDP(4) within
a factor 2^{f(n)} for any polynomial function f in the input size n of the problem.

3 Node-Disjoint 3- and 4-Bounded Paths

In this section we study the complexity of the node-disjoint paths problems.
The maximum disjoint paths problem MNDP(ℓ) is known to be polynomially
solvable for ℓ ≤ 4 and to be APX-hard for ℓ ≥ 5 [19,18]. The weighted problem
WNDP(ℓ) is solvable in polynomial time for ℓ ≤ 2, and NPO-complete for ℓ ≥ 5.
In the special case where c_e ≤ Σ_{f∈C−e} c_f holds for every cycle C in G and every
edge e ∈ C, the weighted problem can be solved polynomially also for ℓ = 3 and
ℓ = 4 [19]. For ℓ = 3, the problem can still be solved efficiently if this condition
is not satisfied.
Theorem 6. WNDP(3) can be solved in polynomial time.

Proof. Let S and T denote the sets of neighbors of node s and t in the given
graph G, respectively. We may assume w.l.o.g. that each node in G is contained
in {s, t} ∪ S ∪ T, for otherwise it may not appear in any 3-bounded (s, t)-path.
We reduce WNDP(3) to the problem of finding a minimum weight matching
with cardinality k in an auxiliary graph G′ = (V′, E′), which is constructed
as follows: For each node v ∈ S (resp. w ∈ T), there is an associated node
u_v ∈ V′ (resp. u′_w ∈ V′). For each node v ∈ S ∩ T, there is an associated edge
e_v = (u_v, u′_v) ∈ E′ with weight c_sv + c_vt. Choosing this edge in the matching
corresponds to choosing the path (s, v, t) in G. For each edge (v, w) ∈ (S × T) \
(S ∩ T)², there is an associated edge (ū_v, u′_w) ∈ E′, with ū_z = u′_z if z ∈ T
and ū_z = u_z otherwise, for any z ∈ V. The weight of edge (ū_v, u′_w) is set to
c_sv + c_vw + c_wt. Choosing (ū_v, u′_w) in the matching in G′ corresponds to choosing
path (s, v, w, t) in G. For each edge (v, w) ∈ (S ∩ T)², there is an associated
edge (u_v, u′_w) ∈ E′, with weight min{c_sv + c_vw + c_wt, c_sw + c_wv + c_vt}, which
represents the paths (s, v, w, t) and (s, w, v, t) in G. For each edge (s, t) ∈ E,
there is an associated edge (u_s, u′_t) ∈ E′ with weight c_st.
Clearly, this construction can be performed in polynomial time. One easily
verifies that any set of k vertex-disjoint 3-bounded (s, t)-paths in G corresponds
to a matching of size k and the same cost in G′, and vice versa. Since the
cardinality constrained minimum weight matching problem can be solved in
polynomial time [28,29], the claim follows.   □
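A sketch of the auxiliary graph construction (ours; the handling of S ∩ T follows our reading of the construction above, and the cardinality-constrained minimum weight matching solver itself, e.g. a Blossom-based one as in [28,29], is left out):

```python
import networkx as nx

def wndp3_auxiliary_graph(G, s, t, c):
    """Auxiliary graph G' of the Theorem 6 reduction (a sketch).

    ('L', v) plays the role of u_v and ('R', w) of u'_w; the edge
    cost dict c may be keyed in either vertex order.
    """
    cost = lambda a, b: c[(a, b)] if (a, b) in c else c[(b, a)]
    S, T = set(G[s]) - {t}, set(G[t]) - {s}
    H = nx.Graph()
    for v in S & T:                            # the paths (s, v, t)
        H.add_edge(('L', v), ('R', v), weight=cost(s, v) + cost(v, t))
    for v, w in G.edges():
        if s in (v, w) or t in (v, w):
            continue
        if v in S & T and w in S & T:          # one edge for both directions
            H.add_edge(('L', v), ('R', w),
                       weight=min(cost(s, v) + cost(v, w) + cost(w, t),
                                  cost(s, w) + cost(w, v) + cost(v, t)))
        else:
            for x, y in ((v, w), (w, v)):      # orientation (s, x, y, t)
                if x in S and y in T:
                    ubar = ('R', x) if x in T else ('L', x)
                    H.add_edge(ubar, ('R', y),
                               weight=cost(s, x) + cost(x, y) + cost(y, t))
    if G.has_edge(s, t):                       # the direct path (s, t)
        H.add_edge(('L', s), ('R', t), weight=cost(s, t))
    return H
```

A minimum weight matching of cardinality k in the returned graph then corresponds to k node-disjoint 3-bounded (s, t)-paths of the same cost.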


For ℓ = 4, the problem becomes at least APX-hard in the general case.
Theorem 7. WNDP(4) is (at least) APX-hard.

Proof. We use a construction similar to the one presented in the previous section
to reduce Max-k-Sat(3) to WNDP(4). Again, we let x_i, i ∈ I, be the boolean
variables and C_l, l ∈ L, be the clauses of the given Max-k-Sat(3) instance and
we denote the three occurrences of variable x_i by x_i^j, j ∈ J := {1, . . . , 3}.
For each l ∈ L, we construct a clause graph H_l = (W_l, F_l) exactly as in the
proof of Theorem 3 and shown in Figure 3. For each i ∈ I, we construct a variable
graph G_i = (V_i, E_i) as

V_i := {s, t} ∪ {a_i^j, ā_i^j, u_i^j, ū_i^j, v_i^j, w_i^j, r_i^j | j ∈ J} and
E_i := {sa_i^j, sā_i^j, su_i^j, sū_i^j, a_i^j u_i^j, ā_i^j ū_i^j, u_i^j v_i^j, ū_i^j v_i^j,
        v_i^j w_i^j, u_i^j r_i^j, ū_i^j r_i^{j+1}, r_i^j t, w_i^j t | j ∈ J},

where r_i^4 = r_i^1. Figure 5 illustrates these graphs. The graph G is obtained as
the union of all G_i and H_l (sub-)graphs. Finally, we assign weight 1 to all edges
su_i^j and sū_i^j and weight 0 to all other edges in G. The goal in the constructed
WNDP(4) instance is to find a minimum cost subgraph of G that contains (at
least) 6|I| + |L| node-disjoint 4-bounded (s, t)-paths.
For each i ∈ I and j ∈ J, we denote the paths

P_i^j = (s, u_i^j, v_i^j, w_i^j, t),   P_i^{j′} = (s, ā_i^j, ū_i^j, r_i^{j+1}, t),   P_i^{j″} = (s, ū_i^j, r_i^{j+1}, t),
P̄_i^j = (s, ū_i^j, v_i^j, w_i^j, t),   P̄_i^{j′} = (s, a_i^j, u_i^j, r_i^j, t),   P̄_i^{j″} = (s, u_i^j, r_i^j, t).

For each variable x_i that occurs in clause C_l, we denote

Q_i^l = (s, a_i^j, b_l, c_l, t) if literal x_i^j occurs in C_l,
Q_i^l = (s, ā_i^j, b_l, c_l, t) if literal x̄_i^j occurs in C_l.

Note that these are the only 4-bounded (s, t)-paths in G. Furthermore, we let
P_i := {P_i^j, P_i^{j′} | j ∈ J}, P̄_i := {P̄_i^j, P̄_i^{j′} | j ∈ J}, and Q_l := {Q_i^l | i ∈ I :
x_i occurs in C_l}. Figure 5 illustrates the paths in P̄_i and path Q_i^l.
As in the proof of Theorem 3, one finds that a truth assignment x̂ that satisfies
r clauses of the given Max-k-Sat(3) instance corresponds to a path set

S = S(x̂) := ∪_{i∈I: x̂_i = true} P_i ∪ ∪_{i∈I: x̂_i = false} P̄_i ∪ {Q_{i_l(x̂)}^l | l ∈ L : C_l(x̂) = true}

with |S| = 6|I| + r and cost c(S) = 3|I|. In order to obtain a set of 6|I| + |L|
paths, we modify S as follows: For each l ∈ L with C_l(x̂) = false, we arbitrarily
choose one i such that x_i^j or x̄_i^j occurs in C_l, add the path Q_i^l to S, and replace the
Fig. 5. Union of G_i and H_l for variable x_i and clause C_l = (x̄_i^3 ∨ . . . ). Thick lines
are paths in P̄_i and path Q_i^l.

path P_i^{j′} or P̄_i^{j′} in S with P_i^{j″} or P̄_i^{j″}, respectively. This modification maintains
the node-disjointness of the paths in S and increases both the size and the cost
of S by |L| − r. The cost of the resulting path set S thus is

c(S(x̂)) = 3|I| + |L| − r.   (1)

Conversely, one finds that any set S of 6|I| + |L| node-disjoint 4-bounded (s, t)-
paths must contain one path from each set Q_l and 6 paths within each variable
subgraph G_i. The only way to have 6 node-disjoint 4-bounded paths within G_i,
however, is to have either all 3 paths P_i^j or all 3 paths P̄_i^j, complemented
with 3 paths of the type P_i^{j′} and P_i^{j″} or with 3 paths of the type P̄_i^{j′} and P̄_i^{j″},
respectively. The cost of such a path set is equal to the number of P_i^j and P̄_i^j
paths it contains, which amounts to a total of 3|I|, plus the number of P_i^{j″} and
P̄_i^{j″} paths. Note that the paths P_i^{j″} and P̄_i^{j″} contain only a subset of the nodes
in P_i^{j′} and P̄_i^{j′}, respectively, and that the cost induced by P_i^{j″} and P̄_i^{j″} is 1, while
the cost induced by P_i^{j′} and P̄_i^{j′} is 0. Thus, we may assume that S contains path
P_i^{j″} or P̄_i^{j″} only if it contains path Q_i^l for the clause l in which literal x_i^j occurs.
Let x̂(S) be the truth assignment defined as

x̂_i(S) := true if P_i^1 ∈ S, and false otherwise, for all i ∈ I.

Consider a path Q_i^l ∈ S and suppose C_l(x̂(S)) = false. Then also the literal x_i^j
or x̄_i^j occurring in C_l evaluates to false. Since S contains either P_i^j or P̄_i^j, it
also must contain P_i^{j″} or P̄_i^{j″}, respectively. As these paths induce a cost of one,
the number of clauses satisfied by x̂(S) is

r(x̂(S)) ≥ |L| + 3|I| − c(S).   (2)

As in the proof of Theorem 3, it follows straightforwardly from (1) and (2) that
approximation ratios are transformed linearly by the presented reduction and,
hence, WNDP(4) is APX-hard.   □

Unfortunately, it remains open if WNDP(4) is approximable within a constant
factor or not. The best known approximation ratio for WNDP(4) is O(k), which
is achieved by a simple greedy algorithm.
Theorem 8. WNDP(4) can be approximated within a factor of 4k.

Proof. Consider the algorithm, which adds the edges in order of non-decreasing
cost until the constructed subgraph contains k node-disjoint 4-bounded (s, t)-
paths and then returns the subgraph defined by these paths. As, in each iteration,
we can check in polynomial time whether such paths exist or not [18], this
algorithm runs in polynomial time. Furthermore, the optimal solution must
contain at least one edge whose cost is at least as big as the cost of the last edge
added by the greedy algorithm. Therefore, the total cost of the greedy solution
is at most 4k times the optimal solution’s cost.   □
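The greedy procedure, with the feasibility test of [18] abstracted into a placeholder callable (a sketch, assuming the cost dict c is keyed by the edge tuples yielded by G.edges()):

```python
import networkx as nx

def greedy_wndp4(G, s, t, k, c, has_k_paths):
    """Greedy 4k-approximation of Theorem 8 (a sketch).

    has_k_paths(H, s, t, k) stands for the polynomial-time test of
    Itai et al. [18] for k node-disjoint 4-bounded (s, t)-paths.
    """
    H = nx.Graph()
    H.add_nodes_from(G)
    for u, v in sorted(G.edges(), key=lambda e: c[e]):  # non-decreasing cost
        H.add_edge(u, v)
        if has_k_paths(H, s, t, k):
            return H    # return the subgraph; the k paths span the solution
    return None         # even G itself contains no k such paths
```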


4 Conclusion

In this paper we show that the maximum edge-disjoint 4-bounded paths prob-
lem MEDP(4) is APX-complete and that the corresponding weighted edge-
disjoint paths problem WEDP(4) is NPO-complete. The weighted node-disjoint
ℓ-bounded paths problem was proven to be polynomially solvable for ℓ = 3 and
to be at least APX-hard for ℓ = 4. This closes all basic complexity issues that
were left open in [18,19]. In addition, we presented a 2-approximation algorithm
for MEDP(4) and a 4k-approximation algorithm for WNDP(4). It remains open
whether WNDP(4) is approximable within a factor better than O(k) or if there
is a stronger, non-constant approximation threshold.
The hardness results and algorithms presented in this paper also hold for
directed graphs and for graphs containing parallel edges.

References

1. Alevras, D., Grötschel, M., Wessäly, R.: Capacity and survivability models
for telecommunication networks. ZIB Technical Report SC-97-24, Konrad-Zuse-
Zentrum für Informationstechnik Berlin (1997)
2. Gouveia, L., Patricio, P., Sousa, A.D.: Compact models for hop-constrained node
survivable network design: An application to MPLS. In: Anandaligam, G., Ragha-
van, S. (eds.) Telecommunications Planning: Innovations in Pricing, Network De-
sign and Management. Springer, Heidelberg (2005)
3. Gouveia, L., Patricio, P., Sousa, A.D.: Hop-constrained node survivable network
design: An application to MPLS over WDM. Networks and Spatial Economics 8(1)
(2008)
4. Grötschel, M., Monma, C., Stoer, M.: Design of Survivable Networks. In: Hand-
books in Operations Research and Management Science, Volume Networks, pp.
617–672. Elsevier, Amsterdam (1993)
5. Chakraborty, T., Chuzhoy, J., Khanna, S.: Network design for vertex connectivity.
In: Proceedings of the 40th Annual ACM Symposium on the Theory of Computing
(STOC ’08), pp. 167–176 (2008)
6. Chekuri, C., Khanna, S., Shepherd, F.: An O(√n) approximation and integrality
gap for disjoint paths and unsplittable flow. Theory of Computing 2, 137–146
(2006)
7. Lando, Y., Nutov, Z.: Inapproximability of survivable networks. Theoretical Com-
puter Science 410, 2122–2125 (2009)
8. Menger, K.: Zur allgemeinen Kurventheorie. Fund. Mathematicae 10, 96–115
(1927)
9. Lovász, L., Neumann-Lara, V., Plummer, M.: Mengerian theorems for paths of
bounded length. Periodica Mathematica Hungarica 9(4), 269–276 (1978)
10. Exoo, G.: On line disjoint paths of bounded length. Discrete Mathematics 44,
317–318 (1983)
11. Niepel, L., Safarikova, D.: On a generalization of Menger’s theorem. Acta Mathe-
matica Universitatis Comenianae 42, 275–284 (1983)
12. Ben-Ameur, W.: Constrained length connectivity and survivable networks. Net-
works 36, 17–33 (2000)

13. Pyber, L., Tuza, Z.: Menger-type theorems with restrictions on path lengths. Dis-
crete Mathematics 120, 161–174 (1993)
14. Baier, G.: Flows with path restrictions. PhD thesis, Technische Universität Berlin
(2003)
15. Martens, M., Skutella, M.: Length-bounded and dynamic k-splittable flows. In:
Operations Research Proceedings 2005, pp. 297–302 (2006)
16. Mahjoub, A., McCormick, T.: Max flow and min cut with bounded-length paths:
Complexity, algorithms and approximation. Mathematical Programming (to ap-
pear)
17. Baier, G., Erlebach, T., Hall, A., Köhler, E., Kolman, P., Panagrác, O., Schilling,
H., Skutella, M.: Length-bounded cuts and flows. ACM Transactions on Algorithms
(to appear)
18. Itai, A., Perl, Y., Shiloach, Y.: The complexity of finding maximum disjoint paths
with length constraints. Networks 12, 277–286 (1982)
19. Bley, A.: On the complexity of vertex-disjoint length-restricted path problems.
Computational Complexity 12, 131–149 (2003)
20. Perl, Y., Ronen, D.: Heuristics for finding a maximum number of disjoint bounded
paths. Networks 14 (1984)
21. Wagner, D., Weihe, K.: A linear-time algorithm for edge-disjoint paths in planar
graphs. Combinatorica 15(1), 135–150 (1995)
22. Botton, Q., Fortz, B., Gouveia, L.: The k-edge 3-hop constrained network design
polyhedron. In: Proceedings of the 9th INFORMS Telecommunications Conference,
Available as preprint at Université Catholique de Louvain: Le polyedre du probleme
de conception de reseaux robustes K-arête connexe avec 3 sauts (2008)
23. Dahl, G., Huygens, D., Mahjoub, A., Pesneau, P.: On the k edge-disjoint 2-hop-
constrained paths polytope. Operations Research Letters 34, 577–582 (2006)
24. Huygens, D., Mahjoub, A.: Integer programming formulations for the two 4-hop
constrained paths problem. Networks 49(2), 135–144 (2007)
25. Golovach, P., Thilikos, D.: Paths of bounded length and their cuts: Parameterized
complexity and algorithms. In: Chen, J., Fomin, F. (eds.) IWPEC 2009. LNCS,
vol. 5917, pp. 210–221. Springer, Heidelberg (2009)
26. Guruswami, V., Khanna, S., Shepherd, B., Rajaraman, R., Yannakakis, M.: Near-
optimal hardness results and approximation algorithms for edge-disjoint paths and
related problems. In: Proceedings of the 31st Annual ACM Symposium on the
Theory of Computing (STOC ’99), pp. 19–28 (1999)
27. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Pro-
tasi, M.: Complexity and Approximation: Combinatorial Optimization Problems
and their Approximability Properties. Springer, Heidelberg (1999)
28. Edmonds, J.: Maximum matching and a polyhedron with 0-1 vertices. Journal of
Research of the National Bureau of Standards 69B, 125–130 (1965)
29. Schrijver, A.: Combinatorial Optimization: Polyhedra and Efficiency. Springer, Hei-
delberg (2003)
A Polynomial-Time Algorithm for Optimizing
over N -Fold 4-Block Decomposable Integer
Programs

Raymond Hemmecke1 , Matthias Köppe2 , and Robert Weismantel3


1 Technische Universität Munich, Germany
2 University of California, Davis, USA
3 ETH Zürich, Switzerland

Abstract. In this paper we generalize N -fold integer programs and two-


stage integer programs with N scenarios to N -fold 4-block decomposable
integer programs. We show that for fixed blocks but variable N , these
integer programs are polynomial-time solvable for any linear objective.
Moreover, we present a polynomial-time computable optimality certifi-
cate for the case of fixed blocks, variable N and any convex separable
objective function. We conclude with two sample applications, stochastic
integer programs with second-order dominance constraints and stochas-
tic integer multi-commodity flows, which (for fixed blocks) can be solved
in polynomial time in the number of scenarios and commodities and in
the binary encoding length of the input data. In the proof of our main
theorem we combine several non-trivial constructions from the theory
of Graver bases. We are confident that our approach paves the way for
further extensions.

Keywords: N -fold integer programs, Graver basis, augmentation al-


gorithm, polynomial-time algorithm, stochastic multi-commodity flow,
stochastic integer programming.

1 Introduction
Let A ∈ Z^{d×n} be a matrix. We associate with A a finite set G(A) of vectors with
remarkable properties. Consider the set ker(A) ∩ Z^n. Then we put into G(A) all
nonzero vectors v ∈ ker(A) ∩ Z^n that cannot be written as a sum v = v′ + v″ of
nonzero vectors v′, v″ ∈ ker(A) ∩ Z^n that lie in the same orthant (or equivalently,
have the same sign pattern in {≥ 0, ≤ 0}^n) as v. The set G(A) has been named
the Graver basis of A, since Graver [6] introduced this set G(A) in 1975 and
showed that it constitutes an optimality certificate for a whole family of integer
linear programs that share the same problem matrix, A. By this we mean that
G(A) provides an augmenting vector/step to any non-optimal feasible solution
and hence allows the design of a simple augmentation algorithm to solve the
integer linear program.
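For tiny matrices the definition can be checked by brute force. The toy sketch below (ours) enumerates kernel vectors in a box and keeps those without a sign-compatible decomposition; decomposition witnesses outside the box are missed, so it is only reliable for small examples.

```python
import itertools
import numpy as np

def graver_brute_force(A, bound=3):
    """Graver basis elements of A with entries in [-bound, bound].

    v belongs to G(A) iff v is a nonzero kernel vector with no
    sign-compatible decomposition, i.e. there is no kernel vector
    u != v with u_i v_i >= 0 and |u_i| <= |v_i| for all i.
    """
    A = np.asarray(A)
    box = itertools.product(range(-bound, bound + 1), repeat=A.shape[1])
    kernel = [v for v in box if any(v) and not A.dot(v).any()]
    def below(u, v):      # u lies strictly between 0 and v in v's orthant
        return u != v and all(ui * vi >= 0 and abs(ui) <= abs(vi)
                              for ui, vi in zip(u, v))
    return [v for v in kernel if not any(below(u, v) for u in kernel)]

# e.g. graver_brute_force([[1, 2, 1]]) contains (1, 0, -1), (0, 1, -2)
# and (1, -1, 1) together with their negatives
```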
In the last 10 years, a tremendous theoretical progress has been made in the
theory of Graver bases. It has been shown that G(A) constitutes an optimal-
ity certificate for a much wider class of integer minimization problems, namely


for those minimizing a concave or a separable convex objective function over


{z : Az = b, l ≤ z ≤ u, z ∈ Zn } [2,13,15]. Moreover, it has been shown that only
polynomially many Graver basis augmentation steps are needed to find a feasible
solution and to turn it into an optimal feasible solution [7,8,18]. Finally, based
on the fundamental finiteness results for certain highly structured matrices A
(N -fold IPs and two- and multi-stage stochastic IPs) [1,9,10,17], it has been
shown that concave and separable convex N -fold IPs and two- and multi-stage
stochastic IPs can be solved in polynomial time [3,8] for fixed blocks.
In this paper, we will combine the two cases of N -fold IPs and of two-stage
stochastic IPs by considering problems with a problem matrix that is N -fold
4-block decomposable as follows:
⎛ ⎞
C D D ··· D
⎜B A 0 0⎟
⎜ ⎟
(N ) ⎜B 0 A 0⎟
C D
(B A) := ⎜ ⎟
⎜ .. .. ⎟
⎝ . . ⎠
B 0 0 A
(N )
for some given N ∈ Z+ and N copies of A. We call ( C D
B A) an N -fold 4-block
matrix. For B = 0 and C = 0 we recover the problem matrix of an N -fold IP and
for C = 0 and D = 0 we recover the problem matrix of a two-stage stochastic
IP.
Note that N -fold 4-block decomposable matrices also arise in the context of
combinatorial optimization [19,20]. More precisely, for totally unimodular ma-
trices C, A their 1-sum is totally unimodular (B = 0, D = 0). Similarly, total
unimodularity is preserved under the 2-sum and 3-sum composition. Indeed,
it can be verified that a repeated application of specialized 1-sum, 2-sum and
3-sum compositions leads to a particular family of N -fold 4-block decomposable
matrices with structure regarding the matrices B and D.
Example. For matrices C and A, column vector   a and row
 vector b of appro-
 
b
priate dimensions, the 2-sum of ( C a ) and A gives C ab
! 0 A . The 2-sum of
    C ab ab
C ab a and b creates the matrix 0 A 0 , which is the 2-fold 4-block
0 A 0 B
  (2)
0 0 A
decomposable matrix C0 ab A . 
Our main result is the following.
Theorem 1. Let A ∈ ZdA ×nA , B ∈ ZdA ×nB , C ∈ ZdC ×nB , D ∈ ZdC ×nA be
fixed matrices. For given N ∈ Z+ let l, u ∈ ZnB +N nA , b ∈ ZdC +N dA , and let
f : RnB +N nA → R be a separable convex function and denote by fˆ the maximum
of |f | over the feasible region of the convex integer minimization problem

(IP)N,b,l,u,f : C D )(N ) z = b, l ≤ z ≤ u, z ∈ ZnB +N nA .


min f (z) : ( B A

We assume that f is given only by a comparison oracle that, when queried on


z and z decides whether f (z) < f (z ), f (z) = f (z ) or f (z) > f (z ). Then the
following hold:
A Polynomial-Time Algorithm 221

(a) There exists an algorithm that computes a feasible solution to (IP)N,b,l,u,f


or decides that no such solution exists and that runs in time polynomial in
N , in the binary encoding lengths "l, u, b#.
(b) Given a feasible solution z0 to (IP)N,b,l,u,f , there exists an algorithm that de-
cides whether z0 is optimal or finds a better feasible solution z1 to (IP)N,b,l,u,f
with f (z1 ) < f (z0 ) and that runs in time polynomial in N , in the binary en-
coding lengths "l, u, b, fˆ#, and in the number of calls to the evaluation oracle
for f .
(c) If f is linear, there exists an algorithm that finds an optimal solution to
(IP)N,b,l,u,f or decides that (IP)N,b,l,u,f is infeasible or unbounded and that
runs in time polynomial in N , in the binary encoding lengths "l, u, b, fˆ#, and
in the number of calls to the evaluation oracle for f .

This theorem generalizes a similar statement for N -fold integer programming


and for two-stage stochastic integer programming. In these two special cases, one
can even prove claim (c) of Theorem 1 for all separable convex functions and for
a certain class of separable convex functions, respectively. It is a fundamental
open question, whether one can construct not only some augmenting vector
for a given separable convex objective function f in polynomially many steps
but a best-improvement
! (or greedy) augmentation step αv with α ∈ Z+ and
(N )
v ∈ G (B C D)
A . If this can be done, part (c) of Theorem 1 can be extended
from linear f to a class of separable convex functions f by applying the main
result from [8].
In fact, Theorem! 1 will be a consequence of the following structural result
about G ( B C D )(N ) .
A

A ×nA
Theorem 2. If A ∈ Zd , B ∈ ZdA ×nB , C !
∈ ZdC ×nB , D ∈ ZdC ×nA are
(N )
fixed matrices, then max v1 : v ∈ G ( B A )
C D is bounded by a polynomial
in N .

It should be noted that in the two special cases of N -fold IPs and of two-stage
stochastic IPs each component of any Graver basis element is bounded by a
constant (depending only on the fixed problem matrices and not on N ). Hence,
Theorem 2 specialized to these two cases is essentially trivial. In the general
N -fold 4-block situation, however, each component of any Graver basis element is
bounded only by a polynomial in N . This fact demonstrates that N -fold 4-block
IPs are much richer and more difficult to solve than the two special cases of
N -fold IPs and of two-stage stochastic IPs. Moreover, a proof of Theorem 2 in
this general setting is not obvious.
In the next section, we present two sample applications of Theorem 1: stochas-
tic integer programming with second-order dominance constraints [5,11] and
stochastic integer multi-commodity flows [12,16]. For both cases we will develop
tractability results based on our general theory. We do, however, not claim that
these algorithms are particularly useful in practice. While the first application
has an N -fold 4-block matrix as problem matrix, the second application can be
222 R. Hemmecke, M. Köppe, and R. Weismantel

modeled as an N -fold 4-block IP after a suitable transformation. To state the


result, we introduce the following type of matrices. For given N ∈ Z+ let
⎛ ⎞
A B ··· B
⎜ .. .. .. ⎟
⎜ . . .⎟
⎜ ⎟
⎜ A B · · · B ⎟
A B
[D C ]
(N ) ⎜
:= ⎜ ⎟,
D · · · D C ⎟
⎜ ⎟
⎜ . . . ⎟
⎝ .. .. .. ⎠
D ··· D C
where we have N copies of A and of C. Then the following holds.
Corollary 1. Let A ∈ ZdA ×nA , B ∈ ZdA ×nB , C ∈ ZdC ×nB , D ∈ ZdC ×nA be
fixed matrices. For given N ∈ Z+ let l, u ∈ ZN (nA +nB ) , b ∈ ZN (dA +dC ) , and let
f : RN (nA +nB ) → R be a separable convex function and denote by fˆ the maximum
of |f | over the feasible region of the convex integer minimization problem

(IP)N,b,l,u,f : A B ](N ) z = b, l ≤ z ≤ u, z ∈ ZN (nA +nB ) .


min f (z) : [ D C

We assume that f is given only by a comparison oracle that, when queried on


z and z decides whether f (z) < f (z ), f (z) = f (z ) or f (z) > f (z ). Then the
following hold:
(a) There exists an algorithm that computes a feasible solution to (IP)N,b,l,u,f
or decides that no such solution exists and that runs in time polynomial in
N , in the binary encoding lengths "l, u, b#.
(b) Given a feasible solution z0 to (IP)N,b,l,u,f , there exists an algorithm that de-
cides whether z0 is optimal or finds a better feasible solution z1 to (IP)N,b,l,u,f
with f (z1 ) < f (z0 ) and that runs in time polynomial in N , in the binary en-
coding lengths "l, u, b, fˆ#, and in the number of calls to the evaluation oracle
for f .
(c) If f is linear, there exists an algorithm that finds an optimal solution to
(IP)N,b,l,u,f or decides that (IP)N,b,l,u,f is infeasible or unbounded and that
runs in time polynomial in N , in the binary encoding lengths "l, u, b, fˆ#, and
in the number of calls to the evaluation oracle for f .
We do now present problems to which Theorem 1 and its Corollary 1 apply.
Thereafter, we prove our claims. Our proof of Theorem 1 combines several non-
trivial constructions from the theory of Graver bases. Although each of these
constructions has been used before, we are confident that our combined approach
paves the way for further extensions.

2 Sample Applications
In this section we present two N -fold 4-block decomposable integer programming
problems that are polynomial-time solvable for given fixed blocks and variable
N by Theorem 1 and its Corollary 1.
A Polynomial-Time Algorithm 223

2.1 Stochastic Integer Multi-commodity Flow

Stochastic integer multi-commodity flows have been considered for example in


[12,16]. Let us now introduce our setting. Let there be M integer (in contrast
to continuous) commodities to be transported over a given network. While we
assume that supply and demands are deterministic, we assume that the upper
bounds for the capacities per edge are uncertain and given initially only via
some probability distribution. The problem setup is as follows: first, we have
to decide how to transport the M commodities over the given network without
knowing the true capacities per edge. Then, after observing the true capacities
per edge, penalties have to be paid if the capacity is exceeded. Assuming that
we have knowledge about the probability distributions of the uncertain upper
bounds, we wish to minimize the costs for the integer multi-commodity flow
plus the expected penalties to be paid for exceeding capacities. To solve this
problem, we discretize as usual the probability distribution for the uncertain
upper bounds into N scenarios. Doing so, we obtain a (typically large-scale)
(two-stage stochastic) integer programming problem with problem matrix
⎛ ⎞
A 0 0 ··· 0
0
⎜ .. .. .. .. ⎟
..
⎜ . . . .. ⎟
⎜ ⎟
⎜ A 0 0 ··· 0 0 ⎟
⎜ ⎟.
⎜ I · · · I I −I ⎟
⎜ ⎟
⎜. . .. ⎟
⎝ .. .. . ⎠
I ··· I I −I

Herein, A is the node-edge incidence matrix of the given network, I is an identity


matrix of appropriate size, and the columns containing −I correspond to the
penalty variables. If the network is kept fix, A, I, and −I are fix, too. As the
& '(N )
problem matrix is simply A 0
I ( I −I ) , we can apply Corollary 1 and obtain
the following.

Theorem 3. For given fixed network the two-stage stochastic integer linear
multi-commodity flow problem is solvable in polynomial time in the number M
of commodities, in the number N of scenarios, and in the encoding lengths of
the input data.

Proof. The only issue that prevents us to apply Corollary 1 directly is the fact
that M and N are different. But by introducing additional commodities or sce-
narios, we can easily obtain an equivalent (bigger) problem with M = N for
which we can apply Corollary 1. If M < N , we introduce additional commodi-
ties with zero flow and if M > N , we take one scenario, copy it additional M −N
times and choose for each of these M − N + 1 identical scenarios 1/(M − N + 1)
times the original cost vector. So, in total, these M − N + 1 scenarios are equiv-
alent to the one we started from. 
224 R. Hemmecke, M. Köppe, and R. Weismantel

It should be noted that we can extend the problem and still get the same poly-
nomiality result. For example, we may assume that we are allowed to change
the routing of the M commodities in the second-stage decision. Penalties could
be enforced for the amount of change of the first-stage decision or only for the
amount of additional flow on edges compared to the first-stage decision. Writ-
ing down the constraints and introducing suitable additional variables with zero
lower and upper bounds, one obtains again a problem matrix that allows the
application of Corollary 1.

2.2 Stochastic Integer Programs with Second-Order Dominance


Constraints

Stochastic integer programs with second-order dominance constraints were con-


sidered for example in [5,11]. We will consider here the special situation where
all scenarios have the same probability. In Proposition 3.1 of [5], the following
mixed-integer linear program was obtained as a deterministic equivalent to solve
the stochastic problem at hand. We refer the reader to [5] for the details.
⎧ ⎫

⎪ c x + q ylk − ak ≤ vlk ∀l∀k ⎪


⎪ T x + W ylk = zl ∀l∀k ⎪

⎨  ⎬
 L
(SIP) : min g x : πl v lk ≤ ā k ∀k


l=1 ⎪


⎪ x ∈ X, ⎪

⎩  ⎭
ylk ∈ Z+ × R+ , vlk ≥ 0 ∀l∀k
m̄ m

We assume now that all variables are integral and, for simplicity of exposition,
we assume that the inequalities of the polyhedron X are incorporated into the
constraints T x + W ylk = zl . Moreover, we assume that all scenarios have the
same probability, that is, πl = 1/L, l = 1, . . . , L.

Theorem 4. For given fixed matrices T and W and for fixed number K, problem
(SIP) is solvable in polynomial time in the number L of (data) scenarios, and in
the encoding lengths of the input data.

Proof. We transform the problem in such a way that Theorem 1 can be applied.
First, we include the constraints c x + q ylk − ak ≤ vlk into the constraint
Tx +) W y*
lk = zl (by adding
) slack*variables to get an equation). Then, we set
T W
T̄ = .. and W̄ = .. , in which we use K copies of T and W ,
. .
T W
respectively. As T , W , and K are assumed to be fixed, so are T̄ and W̄ . With
this, the problem matrix now becomes
⎛ ⎞
0 I ··· I I
⎜ T̄ W̄ 0⎟
⎜ ⎟
⎜ .. .. .. ⎟ .
⎝. . .⎠
T̄ W̄ 0
A Polynomial-Time Algorithm 225

Introducing suitable additional variables with zero lower and upper bounds, we
obtain a problem matrix of the form ( BC D )(l) with A = (
A W̄ 0 ), B = T̄ , C = 0,
and D = ( I I ). Thus, we can apply Theorem 1 and the result follows. 

3 Proof of Main Results


For a concise introduction to Graver bases (and to the results on N -fold IPs),
including short proofs to the main results, we refer the reader to the survey
paper by Onn [14]. In this section, we state and prove results on Graver bases
needed for the proof of our main theorem in the next section. Let us start by
bounding the 1-norm of Graver basis elements of matrices with only one row.
This lemma is a straight-forward consequence of Theorem 2 in [4].
Lemma 1. Let A ∈ Z1×n be a matrix consisting of only one row and let M
be an upper bound on the absolute values of the entries of A. Then we have
max{v1 : v ∈ G(A)} ≤ 2M − 1.
Let us now prove some more general degree bounds on Graver bases that we will
use in the proof of the main theorem below.
Lemma 2. Let A ∈ Zd×n and let B ∈ Zm×n . Moreover, put C := ( B
A ). Then

we have

max{v1 : v ∈ G(C)} ≤ max{λ1 : λ ∈ G(B · G(A))} · max{v1 : v ∈ G(A)}.

Proof. Let v ∈ G(C). Then v ∈ ker(A) implies that


 v can be written as a non-
negative integer linear sign-compatible sum v = λi gi using Graver basis vec-
tors gi ∈ G(A). Adding zero components if necessary, we can write v = G(A)λ.
We now claim that v ∈ G(C) implies λ ∈ G(B · G(A)) and the result follows.
First, observe that v ∈ ker(B) implies Bv = B · (G(A)λ) = (B · G(A))λ = 0
and thus, λ ∈ ker(B · G(A)). If λ ∈ G(B · G(A)), then it can be written as a
sign-compatible sum λ = μ + ν with μ, ν ∈ ker(B · G(A)). But then

v = (G(A)μ) + (G(A)ν)

gives a sign-compatible decomposition of v into vectors G(A)μ, G(A)ν ∈ ker(C),


contradicting the minimality property of v ∈ G(C). Hence, λ ∈ G(B · G(A)) and
the result follows. 
We will employ the following simple corollary.
Corollary 2. Let A ∈ Zd×n and let a ∈ Zn be a row vector. Moreover, put
C := ( A
a ). Then we have

max{v1 : v ∈ G(C)} ≤ (2 · max {|a v| : v ∈ G(A)} − 1)·max{v1 : v ∈ G(A)}.

In particular, if M := max{|a(i) | : i = 1, . . . , n} then


2
max{v1 : v ∈ G(C)} ≤ 2nM (max{v1 : v ∈ G(A)}) .
226 R. Hemmecke, M. Köppe, and R. Weismantel

Proof. By Lemma 2, we already get

max{v1 : v ∈ G(C)} ≤ max{λ1 : λ ∈ G(a · G(A))} · max{v1 : v ∈ G(A)}.

Now, observe that a · G(A) is a 1 × |G(A)|-matrix. Thus, the degree bound of


primitive partition identities, Lemma 1, applies, which gives

max{λ1 : λ ∈ G(a · G(A))} ≤ 2 · max {|a v| : v ∈ G(A)} − 1,

and thus, the first claim is proved. The second claim is a trivial consequence of
the first. 
Let us now extend this corollary to a form that we need to prove Theorem 1.
Corollary 3. Let A ∈ Zd×n and let B ∈ Zm×n . Let the entries of B be bounded
A ). Then we have
by M in absolute value. Moreover, put C := ( B
m
−1 2m
max{v1 : v ∈ G(C)} ≤ (2nM )2 (max{v1 : v ∈ G(A)}) .

Proof. This claim follows by simple induction, adding one row of B at a time,
and by using the second inequality of Corollary 2 to bound the sizes of the
intermediate Graver bases in comparison to the Graver basis of the matrix with
one row of B less. 
0
In order to prove Theorem 1, let us consider the submatrix ( B 0 (N ) .
A) A main
result from [9] is the following.
Lemma 3. Let A ∈ ZdA ×nA and B ∈ ZdA ×nB . There exists a number g ∈ Z+
depending only on A and !B but not on N such that for every N ∈ Z+ and
for every v ∈ G ( B 0 0 )(N ) , the components of v are bounded by g in absolute
A
!
value. In particular, v1 ≤ (nB + N nA )g for all v ∈ G ( B0 0 )(N ) .
A

Combining this result with Corollary 3, we get a bound for the 1-norms of the
C D )(N ) . Note that the second claim of the following
Graver basis elements of ( B A
corollary is exactly Theorem 2.
Corollary 4. Let A ∈ ZdA ×nA , B ∈ ZdA ×nB , C ∈ ZdC ×nB , D ∈ ZdC ×nA be
given matrices. Moreover, let M be a bound on the absolute values of the entries
in C and D, and let g ∈ Z+ be the number from Lemma 3. Then for any N ∈ Z+
we have
!

max v1 : v ∈ G ( C D (N )
B A)
!
!2dC
≤ (2(nB + N nA )M )2 −1 max v1 : v ∈ G ( B
dC
0 0 )(N )
A
dC
−1 2dC
≤ (2(nB + N nA )M )2 ((nB + N nA )g) .
!

If A, B, C, D are fixed matrices, then max v1 : v ∈ G ( B


C D )(N ) is bounded
A
by a polynomial in N .
A Polynomial-Time Algorithm 227

Proof. While the first claim is a direct consequence of Lemma 3 and Corollary
3, the polynomial bound for fixed matrices A, B, C, D and varying N follows
immediately by observing that nA , nB , dC , M, g are constants as they depend
only on the fixed matrices A, B, C, D. 
Now we are ready to prove our main theorem.
Proof of Theorem 1. Let N ∈ Z+ , l, u ∈ ZnB +N nA , b ∈ ZdC +N dA , and a
separable convex function f : RnB +N nA → R be given. To prove claim (a),
observe that one can turn any integer solution to ( C D (N ) z = b (which can
B A)
be found in polynomial time using for example the Hermite normal form of
C D )(N ) ) into a feasible solution (that in addition fulfills l ≤ z ≤ u) by a
(B A
sequence of linear integer programs (with the same problem matrix ( B C D )(N ) )
A
that “move” the components of z into the direction of the given bounds, see [7].
This step is similar to phase I of the Simplex Method in linear programming.
In order to solve these linear integer programs, it suffices (by the ! result of [18])
(N )
to find Graver basis augmentation vectors from G ( B A ) C D for a directed
augmentation oracle. So, claim (b) will imply both claim (a) and claim (c).
Let us now assume that we are given a feasible solution z0 = (x, y1 , . . . , yN )
and that we wish to decide whether there exists another feasible solution z1
with f (z1 ) < f (z0 ). By the main result in [13], it suffices to decide whether
there exists some vector v = (x̂, ŷ1 , . . . , ŷN ) in the Graver basis of ( C D (N )
B A)
such that z0 + v is feasible and f (z0 + v) < f (z0 ). By Corollary 4 and by the
fact that nB is constant, there is only a polynomial number of candidates for
the x̂-part of v. For each such candidate x̂, we can find a best possible choice
for ŷ1 , . . . , ŷN by solving the following separable convex N -fold IP:
⎧ ⎛ ⎞ ⎫


x+x̂



⎪ D )(N ) ⎝
y1 +ŷ1
⎠ = b, ⎪


⎪ (C . ⎪


⎪ B A .. ⎪


⎪ ⎪


⎪ ⎛⎛ x+x̂ ⎞⎞ yN +ŷN ⎪


⎨ ⎪

y1 +ŷ1 ⎛ ⎞
min f ⎝⎝ .. ⎠⎠ : x+x̂
,

⎪ . y1 +ŷ1
⎠ ≤ u, ⎪⎪


⎪ yN +ŷN l≤⎝ .. ⎪



⎪ . ⎪


⎪ yN +ŷN ⎪


⎪ ⎪


⎪ ⎪

⎩ nA ⎭
y1 , . . . , yN ∈ Z

for given z0 = (x, y1 , . . . , yN ) and x̂. Observe that the problem (IP)N,b,l,u,f does
(N )
indeed simplify to a separable convex N -fold IP with problem matrix ( 00 D A)
because z0 = (x, y1 , . . . , yN ) and x̂ are fixed. For fixed matrices A and D,
however, each such N -fold IP is solvable in polynomial time [8]. If the N -fold IP
is feasible and if for the resulting optimal vector v := (x̂, ŷ1 , . . . , ŷN ) we have
f (z0 + v) ≥ f (z0 ), then no augmenting vector can be constructed using this
particular choice of x̂. If on the other hand we have f (z0 + v) < f (z0 ), then v
is a desired augmenting vector for z0 and we can stop. As we solve polynomially
228 R. Hemmecke, M. Köppe, and R. Weismantel

many polynomially solvable N -fold IPs, claim (b) and thus also claims (a) and
(c) follow. 

Proof of Corollary 1. To prove Corollary 1, observe that after introducing


additional variables, problem (IP)N,b,l,u,f can be modeled as an N -fold 4-block
IP and is thus polynomial-time solvable by Theorem 1. First, write the constraint
A B ](N ) z = b in (IP)
[D C N,b,l,u,f as follows:
⎛ ⎞
A B ··· B ⎛ b1 ⎞
⎜ .. .. .. ⎟ ⎛ x1 ⎞
⎜ . . ⎟ .
⎜ . ⎟ ⎜ .. ⎟ ⎜ ... ⎟
⎜ A B ··· B⎟ ⎜ bN ⎟
⎜ ⎟⎜ xN ⎟
y1 ⎟ = ⎜ ⎟
⎜D ··· ⎟⎜ ⎜ bN +1 ⎟ .
⎜ DC ⎟⎝ . ⎠ ⎝ . ⎠
⎜ . .. .. ⎟ . ..
⎝ .. . . ⎠ y.N b2N
D ··· D C
N N
Now introduce variables wx = i=1 xi and wy = i=1 yi . Then we get the new
constraints
⎛ ⎞
−I I I ··· I
⎜ −I I I ··· I⎟ ⎛ 0 ⎞
⎜ ⎟
⎜D C ⎟ ⎛ wx ⎞ 0
⎜ ⎟ wy ⎜ b1 ⎟
⎜ ⎟ x1
⎜ B A ⎜
⎟ ⎜ y ⎟ ⎜ .. ⎟
⎜D ⎟⎜ 1 ⎟ ⎜ . ⎟
⎜ C ⎟ ⎜ . ⎟ = ⎜ bN ⎟ .
⎜ B A ⎟ ⎝ .. ⎠ ⎜ bN +1 ⎟

⎜ ⎟ x ⎝ . ⎠
⎜ .. .. ⎟ yN
⎜ . . ⎟ ..
⎜ ⎟
N

⎝D C ⎠ b2N

B A

Hence, (IP)N,b,l,u,f can be modeled as an N -fold 4-block decomposable IP and


thus, Corollary 1 follows by applying Theorem 1 to this transformed integer
program. 
Acknowledgments. We wish to thank Rüdiger Schultz for valuable comments
on Section 2 and for pointing us to [5]. The second author was supported by
grant DMS-0914873 of the National Science Foundation.

References
1. Aschenbrenner, M., Hemmecke, R.: Finiteness theorems in stochastic integer pro-
gramming. Foundations of Computational Mathematics 7, 183–227 (2007)
2. De Loera, J.A., Hemmecke, R., Onn, S., Rothblum, U., Weismantel, R.: Convex
integer maximization via Graver bases. Journal of Pure and Applied Algebra 213,
1569–1577 (2009)
3. De Loera, J.A., Hemmecke, R., Onn, S., Weismantel, R.: N-fold integer program-
ming. Discrete Optimization 5, 231–241 (2008)
A Polynomial-Time Algorithm 229

4. Diaconis, P., Graham, R., Sturmfels, B.: Primitive partition identities. In: Miklós,
D., Sós, V.T., Szonyi, T. (eds.) Combinatorics, Paul Erdos is Eighty, pp. 173–192.
Janos Bolyai Mathematical Society, Budapest (1996)
5. Gollmer, R., Gotzes, U., Schultz, R.: A note on second-order stochastic dominance
constraints induced by mixed-integer linear recourse. Mathematical Programming
(to appear, 2010), doi:10.1007/s10107-009-0270-0
6. Graver, J.E.: On the foundation of linear and integer programming I. Mathematical
Programming 9, 207–226 (1975)
7. Hemmecke, R.: On the positive sum property and the computation of Graver test
sets. Mathematical Programming 96, 247–269 (2003)
8. Hemmecke, R., Onn, S., Weismantel, R.: A polynomial oracle-time algorithm for
convex integer minimization. Mathematical Programming, Series A (to appear,
2010), doi:10.1007/s10107-009-0276-7
9. Hemmecke, R., Schultz, R.: Decomposition of test sets in stochastic integer pro-
gramming. Mathematical Programming 94, 323–341 (2003)
10. Hoşten, S., Sullivant, S.: Finiteness theorems for Markov bases of hierarchical mod-
els. Journal of Combinatorial Theory, Series A 114(2), 311–321 (2007)
11. New Formulations for Optimization Under Stochastic Dominance Constraints.
SIAM J. Optim. 19, 1433–1450 (2008)
12. Mirchandani, P.B., Soroush, H.: The stochastic multicommodity flow problem. Net-
works 20, 121–155 (1990)
13. Murota, K., Saito, H., Weismantel, R.: Optimality criterion for a class of nonlinear
integer programs. Operations Research Letters 32, 468–472 (2004)
14. Onn, S.: Theory and Applications of N -fold Integer Programming. In: IMA Volume
on Mixed Integer Nonlinear Programming. Frontier Series. Springer, Heidelberg (in
preparation 2010)
15. Onn, S., Rothblum, U.: Convex combinatorial optimization. Discrete Computa-
tional Geometry 32, 549–566 (2004)
16. Powell, W.B., Topaloglu, H.: Dynamic-Programming Approximations for Stochas-
tic Time-Staged Integer Multicommodity-Flow Problems. INFORMS Journal on
Computing 18, 31–42 (2006)
17. Santos, F., Sturmfels, B.: Higher Lawrence configurations. Journal of Combinato-
rial Theory, Series A 103, 151–164 (2003)
18. Schulz, A.S., Weismantel, R.: A polynomial time augmentation algorithm for in-
teger programming. In: Proc. of the 10th ACM-SIAM Symposium on Discrete
Algorithms, Baltimore (1999)
19. Schrijver, A.: Theory of linear and integer programming. Wiley, Chichester (1986)
20. Seymour, P.D.: Decomposition of regular matroids. Journal of Combinatorial The-
ory, Series B 28, 305–359 (1980)
Universal Sequencing on a Single Machine

Leah Epstein1 , Asaf Levin2 , Alberto Marchetti-Spaccamela3, , Nicole Megow4 ,


Julián Mestre4 , Martin Skutella5, , and Leen Stougie6,
1
Dept. of Mathematics, University of Haifa, Israel
[email protected]
2
Chaya fellow. Faculty of Industrial Engineering and Management, The Technion, Haifa, Israel
[email protected]
3
Dept. of Computer and System Sciences, Sapienza University of Rome, Italy
[email protected]
4
Max-Planck-Institut für Informatik, Saarbrücken, Germany
{nmegow,jmestre}@mpi-inf.mpg.de
5
Inst. für Mathematik, Technische Universität Berlin, Germany
[email protected].
6
Dept. of Econometrics and Operations Research, Vrije Universiteit Amsterdam & CWI,
Amsterdam, The Netherlands
[email protected]

Abstract. We consider scheduling on an unreliable machine that may experi-


ence unexpected changes in processing speed or even full breakdowns. We aim
for a universal solution that performs well without adaptation for any possible
machine behavior. For the objective of minimizing the total weighted completion
time, we design a polynomial time deterministic algorithm that finds a universal
scheduling sequence with a solution value within 4 times the value of an optimal
clairvoyant algorithm that knows the disruptions in advance. A randomized ver-
sion of this algorithm attains in expectation a ratio of e. We also show that both
results are best possible among all universal solutions. As a direct consequence of
our results, we answer affirmatively the question of whether a constant approx-
imation algorithm exists for the offline version of the problem when machine
unavailability periods are known in advance.
When jobs have individual release dates, the situation changes drastically.
Even if all weights are equal, there are instances for which any universal solution
is a factor of Ω(log n/ log log n) worse than an optimal sequence. Motivated by
this hardness, we study the special case when the processing time of each job is
proportional to its weight. We present a non-trivial algorithm with a small con-
stant performance guarantee.

1 Introduction

Traditional scheduling problems normally assume that jobs run on an ideal machine that
provides a constant performance throughout time. While in some settings this is a good

Supported by EU project 215270 FRONTS.

Supported by DFG research center M ATHEON in Berlin.

Supported by the Dutch BSIK-BRIKS project.

F. Eisenbrand and B. Shepherd (Eds.): IPCO 2010, LNCS 6080, pp. 230–243, 2010.
c Springer-Verlag Berlin Heidelberg 2010
Universal Sequencing on a Single Machine 231

enough approximation of real life machine behavior, in other situations this assump-
tion is decidedly unreasonable. Our machine, for example, can be a server shared by
multiple users; if other users suddenly increase their workload, this can cause a general
slowdown; or even worse, the machine may become unavailable for a given user due
to priority issues. In other cases, our machine may be a production unit that can break
down altogether and remain offline for some time until it is repaired. In these cases, it
is crucial to have schedules that take such unreliable machine behavior into account.
Different machine behaviors will typically lead to widely different optimal sched-
ules. This creates a burden on the scheduler who would have to periodically recompute
the schedule from scratch. In some situations, recomputing the schedule may not even
be feasible: when submitting a set of jobs to a server, a user can choose the order in
which it presents these jobs, but cannot alter this ordering later on. Therefore, it is de-
sirable in general to have a fixed master schedule that will perform well regardless of
the actual machine behavior. In other words, we want a universal schedule that, for any
given machine behavior, has cost close to that of an optimal clairvoyant algorithm.
In this paper we initiate the study of universal scheduling by considering the problem
of sequencing jobs on a single machine to minimize average completion times. Our
main result is an algorithm for computing a universal schedule that is always a constant
factor away from an optimal clairvoyant algorithm. We complement this by showing
that our upper bound is best possible among universal schedules. We also consider
the case when jobs have release dates. Here we provide an almost logarithmic lower
bound on the performance of universal schedules, thus showing a drastic difference
with respect to the setting without release dates. Finally, we design an algorithm with
constant performance for the interesting case of scheduling jobs with release dates and
proportional weights. Our hope is that these results stimulate the study of universal
solutions for other scheduling problems, and, more broadly, the study of more realistic
scheduling models. In the rest of this section we introduce our model formally, discuss
related work, and explain our contributions in detail.
The model. We are given a job set J with processing times pj ∈ Q+ and weights
wj ∈ Q+ for each job j ∈ J. Using a standard scaling argument, we can assume
w.l.o.g. that wj ≥ 1 for j ∈ J. The problem is to find a sequence π of jobs to be
scheduled on a single machine that minimizes the total sum of weighted completion
times. The jobs are processed in the prefixed order π no matter how the machine may
change its processing speed or whether it becomes unavailable. In case of a machine
breakdown the currently running job is preempted and will be resumed processing at
any later moment when the machine becomes available again. We analyze the worst
case performance by comparing the solution value provided by an algorithm with that
of an optimal clairvoyant algorithm that knows the machine behavior in advance, and
that is even allowed to preempt jobs at any time.
We also consider the more general problem in which each job j ∈ J has its individ-
ual release date rj ≥ 0, which is the earliest point in time when it can start processing.
In this model, it is necessary to allow job preemption, otherwise no constant perfor-
mance guarantee is possible as simple examples show. We allow preemption in the
actual scheduling procedure, however, as in the case without release dates, we aim for
non-adaptive universal solutions. That is, a schedule will be specified by a total ordering
232 L. Epstein et al.

of the jobs. At any point in time we work on the first job in this ordering that has not
finished yet and that has already been released. This procedure is called preemptive list
scheduling [9, 28]. Note that a newly released job will preempt the job that is currently
running if it comes earlier than the current job in the ordering.
Related work. The concept of universal solutions, that perform well for every single
input of a superset of possible inputs, has been used already decades ago in different
contexts, as e.g. in hashing [4] and routing [31]. The latter is also known as oblivious
routing and has been studied extensively; see [26] for a state-of-the-art overview. Jia et
al. [12] considered universal approximations for TSP, Steiner Tree, and Set Cover Prob-
lems. All this research falls broadly into the field of robust optimization [3]. The term
robust is not used consistently in the literature. In particular, the term robust scheduling
refers mainly to robustness against uncertain processing times; see e.g. [17, chap. 7]
and [23]. Here, quite strong restrictions on the input or weakened notions of robustness
are necessary to guarantee meaningful worst case solutions. We emphasize, that our
results in this paper are robust in the most conservative, classical notion of robustness
originating by Soyster [30], also called strict robustness [22], and in this regard, we
follow the terminology of universal solutions.
Scheduling with limited machine availability is a subfield of machine scheduling
that has been studied for over twenty years; see, e.g., the surveys [27, 20, 7]. Different
objective functions, stochastic breakdowns, as well as the offline problem with known
availability periods have been investigated. Nevertheless, only few results are known on
the problem of scheduling to minimize the total weighted completion time, and none of
these deal with release dates. If all jobs have equal weights, a simple interchange argu-
ment shows that sequencing jobs in non-increasing order of processing times is optimal
as it is in the setting with continuous machine availability [29]. Obviously, this result
immediately transfers to the universal setting in which machine breakdowns or changes
in processing speeds are not known beforehand. The special case of proportional jobs,
in which the processing time of each job is proportional to its weight, has been stud-
ied in [32]. The authors showed that scheduling in non-increasing order of processing
times (or weights) yields a 2-approximation for preemptive scheduling. However, for
the general problem with arbitrary job weights, it remained an open question [32] if a
polynomial time algorithm with constant approximation ratio exists, even without re-
lease dates. In this case, the problem is strongly NP-hard [32].
A major line of research within this area focused on the offline scheduling prob-
lem with a single unavailable period. This problem is weakly NP-hard in both, the
preemptive [19] and the non-preemptive variant [1, 21]. Several approximation results
have been derived, see [19, 21, 32, 13, 24]. Only very recently, and independently of
us, Kellerer and Strusevich [16] derived FPTASes with running time O(n4 / 2 ) for the
non-preemptive problem and O(n6 / 3 ) in the preemptive case. An even improved non-
preemptive FPTAS with running time O(n2 / 2 ) is claimed in [14]. However, the proof
seems incomplete in bounding the deviation of an algorithm’s solution from an optimal
one; in particular, the claim after Ineq. (11) in the proof of Lem. 1 is not proved.
Our results. Our main results are algorithms that compute deterministic and randomized
universal schedules for jobs without release dates. These algorithms run in polynomial
time and output an ordering of the jobs such that scheduling the jobs in this order will
Universal Sequencing on a Single Machine 233

always yield a solution that remains within multiplicative factor 4 and within multiplica-
tive factor e in expectation from any given schedule. Furthermore, we show that our
algorithms can be adapted to solve more general problem instances with certain types
of precedence constraints without loosing performance quality. We also show that our
upper bounds are best possible for universal scheduling. This is done by establishing an
interesting connection between our problem and a certain online bidding problem [5].
It may seem rather surprising that universal schedules with constant performance
guarantee should always exist. In fact, our results immediately answer affirmatively
an open question in the area of offline scheduling with limited machine availability:
whether there exists a constant factor approximation algorithm for scheduling jobs in a
machine having multiple unavailable periods that are known in advance.
To derive our results, we study the objective of minimizing the total weight of un-
completed jobs at any point in time. First, we show that the performance guarantee
is given directly by a bound on the ratio between the remaining weight of our algo-
rithm and that of an optimal clairvoyant algorithm at every point in time on an ideal.
Then, we devise an algorithm that computes the job sequence iteratively backwards: in
each iteration we find a subset of jobs with largest total processing time subject to a
bound on their total weight. The bound is doubled in each iteration. Our approach is
related to, but not equivalent to, an algorithm of Hall et al. [9] for online scheduling
on ideal machines—the doubling there happens in the time horizon. Indeed, this type
of doubling strategy has been applied successfully in the design of algorithms for var-
ious problems; the interested reader is referred to the excellent survey of Chrobak and
Kenyon-Mathieu [6] for a collection of such examples.
The problem of minimizing the total weight of uncompleted jobs at any time was
previously considered [2] in the context of on-line scheduling to minimize flow time on
a single machine; there, a constant approximation algorithm is presented with a worst
case bound of 24. Our results imply an improved 4-approximation for this problem.
Furthermore, we show that the same guarantee holds for the setting with release dates;
unfortunately, unlike in the case without release dates, this does not translate into the
same performance guarantee for universal schedules. In fact, when jobs have individual
release dates, the problem changes drastically.
In Section 4 we show that in the presence of release dates, even if all weights are
equal, there are instances for which the ratio between the value of any universal solution
and that of an optimal schedule is Ω(log n/ log log n). Our proof relies on the classical
theorem of Erdős and Szekeres [8] on the existence of long increasing/decreasing sub-
sequences of a given sequence of numbers. Motivated by this hardness, we study the
class of instances with proportional jobs. We present a non-trivial algorithm and prove
a performance guarantee of 5. Additionally, we give a lower bound of 3 for all universal
solutions in this special case.
Our last result, Section 5, is a fully polynomial time approximation scheme (FPTAS)
for offline scheduling on a machine with a single unavailable period. Compared to the
FPTAS presented recently in [16], our scheme, which was discovered independently
from the former, is faster and seems to be simpler, even though the basic ideas are
similar. Our FPTAS for the non-preemptive variant has running time O(n3 / 2 ) and for
the preemptive variant O(n4 / 3 log pmax ).
234 L. Epstein et al.

2 Preliminaries and Key Observations


Given a single machine that runs continuously at unit speed (ideal machine), the com-
pletion time Cjπ of job j when applying preemptive list scheduling to sequence π is
uniquely defined. For some point in time t ≥ 0 let W π (t) denote the total weight
π
of
 jobs that are not yet completed by time t according to sequence π, i.e., W (t) :=
j:Cjπ >t wj . Then,

 : ∞
wj Cjπ = W π (t)dt. (1)
j∈J 0

Clearly, breaks or fluctuations in the speed of the machine delay the completion times.
To describe a particular machine behavior, let f : R+ → R+ be a non-decreasing con-
tinuous function, with f (t) being the aggregated amount of processing time available
on the machine up to time t. We refer to f as the machine capacity function. If the
derivative of f at time t exists, it can be interpreted as the speed of the machine at that
point in time.
For a given capacity function f , let S(π, f ) denote the single machine schedule
S(π,f )
when applying preemptive list scheduling to permutation π, and let Cj denote
the completion time of job j in this particular schedule. For some point in time t ≥ 0,
let W S(π,f ) (t) denote the total weight of jobs that are not yet completed by time t in
schedule S(π, f ). Then,
 : ∞
S(π,f )
wj Cj = W S(π,f ) (t)dt .
j∈J 0


For t ≥ 0 let W S (f )
(t) := minπ W S(π,f ) (t).
Observation 1. For a given machine capacity function f ,
: ∞

W S (f ) (t)dt (2)
0

is a lower bound on the objective function of any schedule.


We construct a universal sequence of jobs π such that, no matter how the single machine
behaves, the objective value of the corresponding schedule S(π, f ) is within a constant
factor of the optimum.
 S(π,f )
Lemma 1. Let π be a sequence of jobs, and let c > 0. Then, the value j∈J wj Cj
is at most c times the optimum for all machine capacity functions f if and only if

W S(π,f ) (t) ≤ cW S (f )
(t) for all t ≥ 0, and for each f .

Proof. The “if” part is clear, since by Observation 1


 : ∞ : ∞
S(π,f ) ∗
wj Cj = W S(π,f ) (t)dt ≤ c WS (f )
(t)dt.
j∈J 0 0
Universal Sequencing on a Single Machine 235


We prove the “only if” part by contradiction. Assume that W S(π,f ) (t0 ) > cW S (f ) (t0 )
for some t0 and f . For any t1 > t0 consider the following machine capacity function


⎨f (t) if t ≤ t0 ,

f (t) = f (t0 ) if t0 < t ≤ t1 ,


f (t − t1 + t0 ) if t > t1

which equals f up to time t0 and then remains constant at value f  (t) = f (t0 ) for the
time interval [t0 , t1 ]. Hence,
 S(π,f  )
 S(π,f  ) 
wj Cj = wj Cj + (t1 − t0 )W S(π,f ) (t0 ). (3)
j∈J j∈J

∗  ∗
(f  )
On the other hand, let π ∗ be a sequence of jobs with W S(π ,f ) (t0 ) = W S (t0 ).
Then,
 S(π ∗ ,f  )
 S(π ∗ ,f  ) ∗ 
wj Cj = wj Cj + (t1 − t0 )W S (f ) (t0 ). (4)
j∈J j∈J

 ∗
(f  )
As t1 tends to infinity, the ratio of (3) and (4) tends to W S(π,f ) (t0 )/W S (t0 ) > c,
a contradiction. 


In case that all release dates are equal, approximating the sum of weighted completion
times on a machine with unknown processing behavior is equivalent to approximating
the total remaining weight at any point in time on an ideal machine: f (t) = t, t ≥ 0.
Scheduling according to sequence π on such a machine yields for each j, Cjπ :=

k:π(k)≤π(j) pk . The completion time under machine capacity function f is

S(π,f )
Cj = min{t | f (t) ≥ Cjπ }.

Observation 2. For any machine capacity function f and any sequence π of jobs with-
out release dates,

W S(π,f ) (t) = W π (f (t)) for all t ≥ 0.



For f (t) = t let W ∗ (t) := W S (f ) (t). With Observation 2 we can significantly
strengthen the statement of Lemma 1.
Lemma 2. Let π be a sequence of jobs with equal release dates, and let c > 0. Then,
 S(π,f )
the objective value j∈J wj Cj is at most c times the optimum for all machine
capacity functions f if and only if

W π (t) ≤ cW ∗ (t) for all t ≥ 0.

Simple counter examples show that this lemma is only true if all release dates are equal,
otherwise, Observation 2 is simply not true.
236 L. Epstein et al.

3 Universal Scheduling without Release Dates

3.1 Upper Bounds



In the sequel we use for a subset of jobs J  ⊆ J the notation p(J  ) := j∈J  pj
and w(J  ) := j∈J  wj . Based on key Lemma 2, we aim at approximating the min-
imum total weight of uncompleted jobs at any point in time on an ideal machine, i.e.,
we approximate the value of W ∗ (t) for all values of t ≤ p(J) for a machine with ca-
pacity function f (t) = t, t ≥ 0. In our algorithm we do so by solving the problem
to find the set of jobs that has maximum total processing time and total weight within
a given bound. By sequentially doubling the weight bound, a sequence of job sets is
obtained. Jobs in job sets corresponding to smaller weight bounds are to come later in
the schedule, breaking ties arbitrarily.

Algorithm D OUBLE:
1. For i ∈ {0, 1, . . . , log w(J)}, find a subset Ji∗ of jobs of total weight w(Ji∗ ) ≤ 2i
and maximum total processing time p(Ji∗ ). Notice that Jlog ∗
w(J) = J.
2. Construct a permutation π as follows. Start with / an empty sequence of jobs. For i =
log w(J) down to 0, append the jobs in Ji∗ \ k=0 Jk∗ in any order at the end of
i−1

the sequence.

Theorem 1. For every scheduling instance, D OUBLE produces a permutation π such


 S(π,f )
that the objective value j∈J wj Cj is less than 4 times the optimum for all ma-
chine capacity functions f .

Proof. Using Lemma 2 it is sufficient to show that W π (t) < 4W ∗ (t) for all t ≥ 0.
Let t ≥ 0 and let i be minimal such that p(Ji∗ ) ≥ p(J) − t. By construction of π, only
/i
jobs j in k=0 Jk∗ have a completion time Cjπ > t. Thus,


i 
i
W π (t) ≤ w(Jk∗ ) ≤ 2k = 2i+1 − 1. (5)
k=0 k=0

In case i = 0, the claim is trivially true since wj ≥ 1 for any j ∈ J, and thus, W ∗ (t) =

W π (t). Suppose i ≥ 1, then by our choice of i, it holds that p(Ji−1 ) < p(J) − t.
Therefore, in any sequence π  , the total weight of jobs completing after time t is larger

than 2i−1 , because otherwise we get a contradiction to the maximality of p(Ji−1 ). That
∗ i−1
is, W (t) > 2 . Together with (5) this concludes the proof. 


Notice that the algorithm takes exponential time since finding the subsets of jobs Ji∗ is
a K NAPSACK problem and, thus, NP-hard [15]. However, we adapt the algorithm by,
instead of Ji∗ , computing a subset of jobs Ji of total weight w(Ji ) ≤ (1 + /4)2i and
processing time p(Ji ) ≥ max{p(J  ) | J  ⊆ J and w(J  ) ≤ 2i }. This can be done in
time polynomial in the input size and 1/ adapting, e.g., the FPTAS in [11] for K NAP -
SACK . The subsets Ji obtained in this way are turned into a sequence π  as in D OUBLE.
Universal Sequencing on a Single Machine 237

Theorem 2. Let > 0. For every scheduling instance, we can construct a permuta-
 S(π,f )
tion π in time polynomial in the input size and 1/ such that the value j∈J wj Cj
is less than 4 + times the optimum for all machine capacity functions f .

Proof. Again, by Lemma 2 it is sufficient to prove that W π (t) < 4W ∗ (t) for all t ≥ 0.
Instead of inequality (5) we get the slightly weaker bound

 
i 
i
W π (t) ≤ w(Jk ) ≤ (1 + /4)2k = (1 + /4)(2i+1 − 1) < (4 + ) 2i−1 .
k=0 k=0

Moreover, the lower bound W ∗ (t) > 2i−1 still holds. 




We improve Theorem 1 by adding randomization to D OUBLE in a quite standard fash-


ion. Instead of the fixed bound of 2i on the total weight of job set Ji∗ in iteration i ∈
{0, 1, . . . , log w(J)} we use the randomly chosen bound Xei where X = eY and Y
is picked uniformly at random from [0, 1] before the first iteration. We omit the proof.

Theorem 3. Let > 0. For every scheduling instance, randomized D OUBLE constructs
a permutation π in time that is polynomial in the input size and 1/ such that the objec-
 S(π,f )
tive value j∈J wj Cj is in expectation less than e + times the optimum value
for all machine capacity functions f .

A natural generalization of the universal sequencing problem requires that jobs are se-
quenced in compliance with given precedence constraints. We extend the results in
Theorems 1 and 3 to this model for certain classes of precedence constraints such as
directed out-trees, two dimensional orders, and the complement of chordal bipartite
orders.

3.2 Lower Bounds


In this section we show a connection between the performance guarantee for sequencing
jobs on a single machine without release dates and an online bidding problem investi-
gated by Chrobak et al. [5]. This allows us to prove tight lower bounds for our problem.
In online bidding, we are given a universe U = {1, . . . , n}. A bid set is just a subset
of U. A given bid set B is said to be α-competitive if

b + min b ≤ α T ∀ T ∈ U. (6)
b∈B : b≥T
b∈B : b<T

Chrobak et al. [5] gave lower bounds of 4− and e− , for any > 0, for deterministic
and randomized algorithms, respectively.

Theorem 4. For any > 0, there exists an instance of the universal scheduling problem
without release dates on which the performance ratio of any deterministic schedule is
at least 4 − and the performance ratio of any randomized schedule is at least e − .
238 L. Epstein et al.

Proof. Take an instance of the online bidding problem and create the following instance
of the scheduling problem: For each j ∈ U create job j with weight wj = j and
processing time pj = j j . Consider any permutation π of the jobs U. For any j ∈ U,
j−1
let k(j) be the largest index such
 that πk(j) ≥ j. Since pj > i=1 pj , at time t =
p(U) − pj we have W π (t) = k=k(j) wπk , while W ∗ (t) = wj . If sequence π yields a
n

performance ratio of α then, Lemma 2 tell us that


n
πk ≤ α j ∀ j ∈ U. (7)
k=k(j)

From sequence  sequence of jobs as follows: W1 = πn , Wk =


 π we extract another
argmaxi∈U π −1 (i) | i > Wk−1 . Then Wi+1 > Wi , and all j with π −1 (Wi+1 ) <
π −1 (j) < π −1 (Wi ) have weight less than Wi . Therefore, we have {i ∈ W | i < j} ∪
min {i ∈ W | i ≥ j} ⊂ {πk(j) , . . . , πn }, for all j ∈ U. Hence, if π achieves a perfor-
mance ratio of α then
 
n
i + min i ≤ πk ≤ α j ∀ j ∈ U,
i∈W : i≥j
i∈W : i<j k=k(j)

that is, the bid set W induced by the sequence π must be α-competitive. Since there
is a lower bound of 4 − for the competitiveness of deterministic strategies for on-
line bidding, the same bound holds for the performance ratio of deterministic universal
schedules.
The same approach yields the lower bound for randomized strategies. 


4 Universal Scheduling with Release Dates


In this section we study the problem of the previous section when jobs have release
dates. Algorithm D OUBLE, which aims at minimizing the total remaining weight, can
be adapted to the setting with release dates: Instead of a knapsack algorithm we use,
within a binary search routine, Lawler’s pseudo-polynomial time algorithm [18] or the
FPTAS by Pruhs and Woeginger [25] for preemptively scheduling jobs with release
dates and due dates on a single machine to minimize the total weight of late jobs.
However, this approach does not yield a bounded performance guarantee for the uni-
versal scheduling problem. In the presence of release dates approximation ratios on
an ideal machine do not translate directly to a performance guarantee of the univer-
sal scheduling strategy, see Section 2. In fact, universal scheduling with release dates
cannot be approximated within a constant ratio as we show below.

4.1 Lower Bound


Theorem 5. There exists an instance with n jobs with equal weights and release dates,
where any universal schedule has a performance guarantee of Ω(log n/ log log n).
Universal Sequencing on a Single Machine 239

In our lower bound instance each job j has wj = 1, j = 0, 1, . . . , n−1. Their processing
j = 2 , j = 0, 1, . . . , n − 1, and they are released in
j
times form a geometric
n series p
reversed order rj = i>j 2 = i>j pi , j = 0, 1, . . . , n − 1.
i

To show the bound, we rely on a classic theorem of Erdős and Szekeres [8] or, more
precisely, on Hammersley’s proof [10] of this result.

Lemma 3 (Hammersley [10]). Given a sequence of n distinct numbers x1 , x2 , . . . , xn ,


we can decompose this set into k increasing subsequences 1 , 2 , . . . , k such that:
– There is a decreasing subsequence of length k;
– If xi belongs to a then for all j > i if xj < xi then xj belongs to b and b > a.

The idea is now to view a universal schedule as a permutation of {0, 1, . . . , n − 1}


and use Lemma 3 to decompose the sequence into k increasing subsequences. This
decomposition is then used to design a breakdown pattern that will yield Theorem 5.
The next two lemmas outline two kinds of breakdown patterns that apply to the two
possibilities offered by Lemma 3.

Lemma 4. The performance guarantee of a universal schedule that has  as a decreas-


ing subsequence is at least ||.

Proof. Let j be the first job in . The machine has breakdowns [rj , r0 ] and [r0 + 2j −
1, L] for large L. At time r0 all jobs have been released. 2j − 1 time units later, at
the start of the second breakdown, all jobs in  belong to the set of jobs uncompleted
by the universal schedule, whereas an optimal solution can complete all jobs except j.
Choosing L large enough implies the lemma. 


Lemma 5. Let 1 , 2 , . . . , k be the decomposition described in Lemma 3 when applied


to a universal schedule. Then for all i = 1, . . . , k the performance guarantee is at
least |1+|
i |+|i−1 |+···+|1 |
i−1 |+···+|1 |

Proof. For each job j in i there is a breakdown [rj , rj + ]. For each job j in i+1 , . . . , k
there is a breakdown [rj , rj + pj ] = [rj , rj + 2j ]. As a consequence, at time 2n − 1 the
universal schedule has all jobs in i and all jobs in i+1 , . . . , k uncompleted, whereas,
a schedule exists that leaves the last job of i and all jobs in j+1 , . . . , k uncompleted.
Therefore, a breakdown [2n − 1, L] for L large enough implies the lemma. 


Proof (Proof of Theorem 5). Consider an arbitrary universal scheduling solution and its
decomposition into increasing subsequences 1 , . . . , k as in Lemma 3 and let α be its
performance guarantee. Using Lemma 5, one can easily prove by induction that |i | ≤
αk−i+1 . Since 1 , . . . , k is a partition of the jobs, we have

k 
k
n= |i | ≤ αk−i+1 ≤ αk+1 .
i=1 i=1

By Lemma 4, !it follows that k ≤ α. Therefore log n = O(α log α) and


α = Ω logloglogn n .


240 L. Epstein et al.

4.2 Jobs with Proportional Weights

Motivated by the negative result in the previous section, we turn our attention to the
special case with proportional weights, that is, there exists a fixed γ ∈ Q such that wj =
γpj , for all j ∈ J. Using a standard scaling argument we can assume w.l.o.g. that pj =
wj , for all j. We provide an algorithm with a performance guarantee 5, and prove a
lower bound of 3 on the performance guarantee of any universal scheduling algorithm.

Algorithm S ORT C LASS:


1. Partition the set of jobs into z := log maxj∈J wj  classes, such that j belongs to
class Ji , for i ∈ 1, 2, . . . , z, if and only if pj ∈ (2i−1 , 2i ].
2. Construct a permutation π as follows. Start with an empty sequence of jobs. For
i = z down to 1, append the jobs of class Ji in non-decreasing order of release
dates at the end of π.

Theorem 6. The performance guarantee of S ORT C LASS for universal scheduling of


jobs with proportional weights and release dates is exactly 5.

Proof. Let π be the job sequence computed by S ORT C LASS. By Lemma 1, it is suffi-
cient to prove

W S(π,f ) (t) ≤ 5W S (f ) (t) ∀t > 0. (8)
Take any time t and any machine capacity function f . Let j ∈ Ji be the job being
processed at time t according to the schedule S(π, f ). We say that a job other than
job j is in the stack at time t if it was processed for a positive amount of time before t.
The algorithm needs to complete all jobs in the stack, job j, and jobs that did not start
before t, which have a total weight of at most p(J) − f (t), the amount of remaining
processing time at time t to be done by the algorithm.
Since jobs within a class are ordered by release times, there is at most one job per
class in the stack at any point in time. Since jobs in higher classes have higher priority
and job j ∈ Ji is processed at time t, there are no jobs in Ji+1 , . . . , Jz in the stack at
time t.Thus the weight of the jobs in the stack together with the weight of job j is at
i
most k=1 2k = 2i+1 − 1. Hence,

W S(π,f ) (t) < 2i+1 + p(J) − f (t) . (9)

A first obvious lower bound on the remaining weight of any schedule at time t is

WS (f )
(t) ≥ p(J) − f (t) . (10)

For another lower bound, let t be the last time before t in which the machine is avail-
able but it is either idle or a job of a class Ji with i < i is being processed. Note that t
is well-defined. By definition, all jobs processed during the time interval [t , t] are in
classes with index at least i, but also, they are released in the interval [t , t] since at t a
job of a lower class was processed or the machine was idle. Since at time t at least one
of these jobs is unfinished in S(π, f ), even though the machine continuously processed
Universal Sequencing on a Single Machine 241

only those jobs, no algorithm can complete all these jobs. Thus, at time t, an optimal
schedule also still needs to complete at least one job with weight at least 2i−1 :

WS (f )
(t) ≥ 2i−1 . (11)

Combining (9), (10), and (11) yields (8) and thus the upper bound of the theorem.
We omit the example that shows that the analysis is tight. 


We complement this result by a lower bound of 3, but have to omit the proof.
Theorem 7. There is no algorithm with performance guarantee strictly smaller than 3
for universal scheduling of jobs with release dates and wj = pj , for all j ∈ J.

5 The Offline Problem


Clearly, the performance guarantees derived in Sections 3 and 4 also hold in the offline
version of our problem in which machine breakdowns and changes in speed are known
in advance. Additionally, we investigate in this section the special case in which the
machine has a single, a priori-known non-availability interval [s, t], for 1 ≤ s < t.

 an FPTAS with running time O(n / ) for non-preemptive


3 2
Theorem 8. There exists
scheduling to minimize wj Cj on a single machine that is not available for processing
during a given time interval [s, t]. The approximation scheme can be extended to the
preemptive (resumable) setting with an increased running time of O(n4 / 2 log pmax ).

Due to space limitations we defer all details to the full version of the paper. The idea for
our FPTAS is based on a natural non-preemptive dynamic programming algorithm, used
also in [16]. Given a non-available time interval [s, t], the job set must be partitioned
into jobs that complete before s and jobs that complete after t. Clearly, the jobs in
each individual set are scheduled in non-increasing order of ratios wj /pj . This order is
known to be optimal on an ideal machine [29].
The main challenge in designing the FPTAS is to discretize the range of possible
total processing times of jobs scheduled before s in an appropriate way. Notice that
we cannot afford to round these values since they contain critical information on how
much processing time remains before the break. Perturbing this information causes a
considerable change in the set of feasible schedules that cannot be controlled easily. The
intuition behind our algorithm is to reduce the number of states by removing those with
the same (rounded) objective value and nearly the same total processing time before the
break. Among them, we want to store those with smallest amount of processing before
the break in order to make sure that enough space remains for further jobs that need to
be scheduled there.
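To make this state-trimming idea concrete, here is a minimal Python sketch of such a trimmed dynamic program (our illustration, not the authors' code): jobs are assumed pre-sorted by Smith's rule, a crude upper bound sets the rounding granularity delta, and for each rounded objective value only the state consuming the least processing time before the break is kept.

```python
def fptas_sketch(jobs, s, t, eps):
    """jobs: list of (p, w) pairs, pre-sorted by Smith's rule (w/p non-increasing).
    The machine is unavailable during [s, t]. Returns an approximate optimum."""
    n = len(jobs)
    total = sum(p for p, _ in jobs)
    ub = sum(w for _, w in jobs) * (total + t)       # crude upper bound on the objective
    delta = max(eps * ub / (2 * n), 1e-9)            # rounding granularity
    # state: rounded objective -> (processing before break, exact objective)
    states = {0: (0.0, 0.0)}
    done = 0.0                                       # total processing of decided jobs
    for p, w in jobs:
        done += p
        new = {}
        for P, W in states.values():
            for before in (True, False):
                if before and P + p > s:
                    continue                         # job does not fit before the break
                if before:
                    cand = (P + p, W + w * (P + p))
                else:                                # job completes after the break
                    cand = (P, W + w * (t + done - P))
                key = int(cand[1] // delta)
                # among states with (nearly) equal objective, keep the one that
                # consumes the least processing time before the break
                if key not in new or cand[0] < new[key][0]:
                    new[key] = cand
        states = new
    return min(W for _, W in states.values())
```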
The algorithm can be extended easily to the preemptive (resumable) problem. We
can assume, w.l.o.g., that in an optimal solution there is at most one job j interrupted
by the break [s, t] and that it resumes processing as soon as the machine is available again.
For a given job j with start time Sj, we define a non-preemptive problem with non-
available period [Sj, Sj + pj + t − s], which we solve by the FPTAS above. Thus, we
can solve the preemptive problem by running the FPTAS above O(n log p_max) times.
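A sketch of this reduction, reusing fptas_sketch from above; the enumerator candidate_starts of relevant start times is hypothetical and stands in for the O(n log p_max) candidates of the paper's analysis.

```python
def preemptive_sketch(jobs, s, t, eps, candidate_starts):
    """Reduce the preemptive (resumable) case to non-preemptive instances."""
    best = fptas_sketch(jobs, s, t, eps)             # schedule with no interrupted job
    for j, (p, w) in enumerate(jobs):
        rest = jobs[:j] + jobs[j + 1:]
        for Sj in candidate_starts(j):               # hypothetical enumerator
            Cj = Sj + p + (t - s)                    # completion time of the split job j
            # the other jobs see a single forbidden interval [Sj, Cj]
            best = min(best, w * Cj + fptas_sketch(rest, Sj, Cj, eps))
    return best
```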

6 Further Remarks
In Section 4 we have shown that the performance of universal scheduling algorithms
may deteriorate drastically when generalizing the universal scheduling problem slightly.
Other generalizations do not admit any (exponential time) algorithm with bounded per-
formance guarantee. If a non-adaptive algorithm cannot guarantee to finish within the
minimum makespan, then an adversary creates an arbitrarily long breakdown at the mo-
ment that an optimal schedule has completed all jobs. Examples of such variations are
the problem with two or more machines instead of a single machine, or the problem in
which preempting or resuming a job requires (even the slightest amount of) extra work.
The offline version of our problem (without release dates) in which preemption is
not allowed or causes extra work is not approximable in polynomial time; a reduction
from 2-PARTITION shows that the problem with two or more non-available periods is
not approximable, unless P=NP, even if all jobs have equal weight. A reduction in that
spirit has been used in [33] for a scheduling problem with some jobs having a fixed
position in the schedule. Similarly, we can rule out constant approximation factors for
any preemptive problem variant in which the makespan cannot be computed exactly in
polynomial time. This is shown by simple reductions from the corresponding decision
version of the makespan minimization problem. Such variations of our problem are
scheduling with release dates and scheduling with general precedence constraints.

References
1. Adiri, I., Bruno, J., Frostig, E., Rinnooy Kan, A.: Single machine flow-time scheduling with
a single breakdown. Acta Informatica 26(7), 679–696 (1989)
2. Becchetti, L., Leonardi, S., Marchetti-Spaccamela, A., Pruhs, K.: Online weighted flow time
and deadline scheduling. In: Goemans, M.X., Jansen, K., Rolim, J.D.P., Trevisan, L. (eds.)
RANDOM 2001 and APPROX 2001. LNCS, vol. 2129, pp. 36–47. Springer, Heidelberg
(2001)
3. Ben-Tal, A., Nemirovski, A.: Robust solutions of linear programming problems contami-
nated with uncertain data. Mathematical Programming 88, 411–424 (2000)
4. Carter, J., Wegman, M.: Universal classes of hash functions. Journal of Computer and System
Sciences 18, 143–154 (1979)
5. Chrobak, M., Kenyon, C., Noga, J., Young, N.E.: Incremental medians via online bidding.
Algorithmica 50(4), 455–478 (2008)
6. Chrobak, M., Kenyon-Mathieu, C.: SIGACT News online algorithms column 10: Competi-
tiveness via doubling. SIGACT News 37(4), 115–126 (2006)
7. Diedrich, F., Jansen, K., Schwarz, U.M., Trystram, D.: A survey on approximation algo-
rithms for scheduling with machine unavailability. In: Lerner, J., Wagner, D., Zweig, K.A.
(eds.) Algorithmics of Large and Complex Networks. LNCS, vol. 5515, pp. 50–64. Springer,
Heidelberg (2009)
8. Erdős, P., Szekeres, G.: A combinatorial problem in geometry. Compositio Mathematica 2,
463–470 (1935)
9. Hall, L., Schulz, A.S., Shmoys, D., Wein, J.: Scheduling to minimize average comple-
tion time: off-line and on-line approximation algorithms. Mathematics of Operations Re-
search 22, 513–544 (1997)
10. Hammersley, J.: A few seedlings of research. In: Proceedings Sixth Berkeley Symp. Math.
Statist. and Probability, vol. 1, pp. 345–394. University of California Press, Berkeley (1972)
11. Ibarra, O.H., Kim, C.E.: Fast approximation algorithms for the knapsack and sum of subset
problems. Journal of the ACM 22(4), 463–468 (1975)

12. Jia, L., Lin, G., Noubir, G., Rajaraman, R., Sundaram, R.: Universal approximations for TSP,
Steiner tree, and set cover. In: Proceedings of the 37th Annual ACM Symposium on Theory
of Computing (STOC ’05), pp. 386–395 (2005)
13. Kacem, I.: Approximation algorithm for the weighted flow-time minimization on a single
machine with a fixed non-availability interval. Computers & Industrial Engineering 54(3),
401–410 (2008)
14. Kacem, I., Mahjoub, A.R.: Fully polynomial time approximation scheme for the weighted
flow-time minimization on a single machine with a fixed non-availability interval. Computers
& Industrial Engineering 56(4), 1708–1712 (2009)
15. Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of computer com-
putations (Proc. Sympos., IBM Thomas J. Watson Res. Center), pp. 85–103. Plenum, New
York (1972)
16. Kellerer, H., Strusevich, V.A.: Fully polynomial approximation schemes for a symmetric
quadratic knapsack problem and its scheduling applications (2009) (accepted for publication
in Algorithmica)
17. Kouvelis, P., Yu, G.: Robust Discrete Optimization and Its Applications. Springer, Heidelberg
(1997)
18. Lawler, E.L.: A dynamic programming algorithm for preemptive scheduling of a single ma-
chine to minimize the number of late jobs. Annals of Operations Research 26, 125–133 (1990)
19. Lee, C.Y.: Machine scheduling with an availability constraint. Journal of Global Optimiza-
tion 9, 395–416 (1996)
20. Lee, C.Y.: Machine scheduling with availability constraints. In: Leung, J.Y.T. (ed.) Handbook
of scheduling. CRC Press, Boca Raton (2004)
21. Lee, C.Y., Liman, S.D.: Single machine flow-time scheduling with scheduled maintenance.
Acta Informatica 29(4), 375–382 (1992)
22. Liebchen, C., Lübbecke, M., Möhring, R.H., Stiller, S.: Recoverable robustness. Technical
report ARRIVAL-TR-0066, ARRIVAL Project (2007)
23. Mastrolilli, M., Mutsanas, N., Svensson, O.: Approximating single machine scheduling with
scenarios. In: Goel, A., Jansen, K., Rolim, J.D.P., Rubinfeld, R. (eds.) APPROX and RAN-
DOM 2008. LNCS, vol. 5171, pp. 153–164. Springer, Heidelberg (2008)
24. Megow, N., Verschae, J.: Note on scheduling on a single machine with one non-availability
period (2008) (unpublished)
25. Pruhs, K., Woeginger, G.J.: Approximation schemes for a class of subset selection problems.
Theoretical Computer Science 382(2), 151–156 (2007)
26. Räcke, H.: Survey on oblivious routing strategies. In: Ambos-Spies, K., Löwe, B., Merkle, W.
(eds.) Mathematical Theory and Computational Practice, Proceedings of 5th Conference on
Computability in Europe (CiE). LNCS, vol. 5635, pp. 419–429. Springer, Heidelberg (2009)
27. Schmidt, G.: Scheduling with limited machine availability. European Journal of Operational
Research 121(1), 1–15 (2000)
28. Schulz, A.S., Skutella, M.: The power of α-points in preemptive single machine scheduling.
Journal of Scheduling 5, 121–133 (2002)
29. Smith, W.E.: Various optimizers for single-stage production. Naval Research Logistics Quar-
terly 3, 59–66 (1956)
30. Soyster, A.: Convex programming with set-inclusive constraints and applications to inexact
linear programming. Operations Research 21(4), 1154–1157 (1973)
31. Valiant, L.G., Brebner, G.J.: Universal schemes for parallel communication. In: Proc. of
STOC, pp. 263–277 (1981)
32. Wang, G., Sun, H., Chu, C.: Preemptive scheduling with availability constraints to minimize
total weighted completion times. Annals of Operations Research 133, 183–192 (2005)
33. Yuan, J., Lin, Y., Ng, C., Cheng, T.: Approximability of single machine scheduling with fixed
jobs to minimize total completion time. European Journal of Operational Research 178(1),
46–56 (2007)
Fault-Tolerant Facility Location:
A Randomized Dependent LP-Rounding
Algorithm

Jaroslaw Byrka1,⋆, Aravind Srinivasan2, and Chaitanya Swamy3


1 Institute of Mathematics, École Polytechnique Fédérale de Lausanne,
  CH-1015 Lausanne, Switzerland
  [email protected]
2 Dept. of Computer Science and Institute for Advanced Computer Studies,
  University of Maryland, College Park, MD 20742, USA
  [email protected]
3 Dept. of Combinatorics & Optimization, Faculty of Mathematics,
  University of Waterloo, Waterloo, ON N2L 3G1, Canada
  [email protected]

Abstract. We give a new randomized LP-rounding 1.725-approximation


algorithm for the metric Fault-Tolerant Uncapacitated Facility Location
problem. This improves on the previously best known 2.076-approximation
algorithm of Swamy & Shmoys. To the best of our knowledge, our work
provides the first application of a dependent-rounding technique in the do-
main of facility location. The analysis of our algorithm benefits from, and
extends, methods developed for Uncapacitated Facility Location; it also
helps uncover new properties of the dependent-rounding approach.
An important concept that we develop is a novel, hierarchical cluster-
ing scheme. Typically, LP-rounding approximation algorithms for facility
location problems are based on partitioning facilities into disjoint clus-
ters and opening at least one facility in each cluster. We extend this
approach and construct a laminar family of clusters, which then guides
the rounding procedure: this allows us to exploit properties of dependent
rounding, and provides a quite tight analysis resulting in the improved
approximation ratio.

1 Introduction
In Facility Location problems we are given a set of clients C that require a certain
service. To provide such a service, we need to open a subset of a given set of

facilities F. Opening each facility i ∈ F costs fi and serving a client j by facility
i costs cij; the standard assumption is that the cij are symmetric and constitute
a metric. (The non-metric case is much harder to approximate.) In this paper,
we follow Swamy & Shmoys [11] and study the Fault-Tolerant Facility Location
(FTFL) problem, where each client has a positive integer coverage requirement rj.
The task is to find a minimum-cost solution which opens some facilities from F
and connects each client j to rj different open facilities.

⋆ This work was partially supported by: (i) the Future and Emerging Technologies Unit
of EC (IST priority - 6th FP), under contract no. FP6-021235-2 (project ARRIVAL);
(ii) MNiSW grant number N N206 1723 33, 2007-2010; (iii) NSF ITR Award CNS-0426683
and NSF Award CNS-0626636; and (iv) NSERC grant 327620-09 and an Ontario Early
Researcher Award. Work of the first author was partially conducted at CWI Amsterdam,
TU Eindhoven, and while visiting the University of Maryland.
The FTFL problem was introduced by Jain & Vazirani [7]. Guha et al. [6]
gave the first constant factor approximation algorithm with approximation ra-
tio 2.408. This was later improved by Swamy & Shmoys [11] who gave a 2.076-
approximation algorithm. FTFL generalizes the standard Uncapacitated Facility
Location (UFL) problem wherein rj = 1 for all j, for which Guha & Khuller [5]
proved an approximation lower bound of ≈ 1.463. The current-best approxima-
tion ratio for UFL is achieved by the 1.5-approximation algorithm of Byrka [2].
In this paper we give a new LP-rounding 1.7245-approximation algorithm
for the FTFL problem. It is the first application of the dependent rounding
technique of [10] to a facility location problem.
Our algorithm uses a novel clustering method, which allows clusters not to
be disjoint, but rather to form a laminar family of subsets of facilities. The hi-
erarchical structure of the obtained clustering exploits properties of dependent
rounding. By first rounding the “facility-opening” variables within smaller clus-
ters, we are able to ensure that at least a certain number of facilities is open
in each of the clusters. Intuitively, by allowing clusters to have different sizes
we may, in a more efficient manner, guarantee the opening of sufficiently-many
facilities around clients with different coverage requirements rj . In addition, one
of our main technical contributions is Theorem 2, which develops a new property
of the dependent-rounding technique that appears likely to have further appli-
cations. Basically, suppose we apply dependent rounding to a sequence of reals
and consider an arbitrary subset S of the rounded variables (each of which lies
in {0, 1}) as well as an arbitrary integer k > 0. Then, a natural fault-tolerance-
related objective is the following: if X denotes the number of variables rounded to 1 in S,
then the random variable Z = min{k, X} should be “large”. (In other words, we want
X to be “large”, but X exceeding k adds no marginal utility.) We
prove that if X0 denotes the corresponding sum wherein the reals are rounded
independently and if Z0 = min{k, X0 }, then E[Z] ≥ E[Z0 ]. Thus, for analysis
purposes, we can work with Z0 , which is much more tractable due to the inde-
pendence; at the same time, we derive all the benefits of dependent rounding
(such as a given number of facilities becoming available in a cluster, with prob-
ability one). Given the growing number of applications of dependent-rounding
methodologies, we view this as a useful addition to the toolkit.

2 Dependent Rounding

Given a fractional vector y = (y1, y2, . . . , yN) ∈ [0, 1]^N, we often seek to round
it to an integral vector ŷ ∈ {0, 1}^N that is, in a problem-specific sense, very

“close to” y. The dependent-randomized-rounding technique of [10] is one such


approach known for preserving the sum of the entries deterministically, along
with concentration bounds for any linear combination of the entries; we will
generalize a known property of this technique in order to apply it to the FTFL
problem. The very useful pipage rounding technique of [1] was developed prior to
[10], and can be viewed as a derandomization (deterministic analog) of [10] via
the method of conditional probabilities. Indeed, the results of [1] were applied in
the work of [11]; the probabilistic intuition, as well as our generalization of the
analysis of [10], help obtain our results.
Define [t] = {1, 2, . . . , t}. Given a fractional vector y = (y1, y2, . . . , yN) ∈
[0, 1]^N, the rounding technique of [10] (henceforth just referred to as “dependent
rounding”) is a polynomial-time randomized algorithm to produce a random
vector ŷ ∈ {0, 1}^N with the following three properties:

(P1): marginals. For all i, Pr[ŷi = 1] = yi;
(P2): sum-preservation. With probability one, Σ_{i=1}^N ŷi equals either ⌊Σ_{i=1}^N yi⌋
      or ⌈Σ_{i=1}^N yi⌉; and
(P3): negative correlation. For all S ⊆ [N], Pr[⋀_{i∈S}(ŷi = 0)] ≤ ∏_{i∈S}(1 − yi), and
      Pr[⋀_{i∈S}(ŷi = 1)] ≤ ∏_{i∈S} yi.
In this paper, we also exploit the order in which the entries of the given fractional
vector y are rounded. We initially define a laminar family of subsets of indices
S ⊆ 2^{[N]}. When applying the dependent rounding procedure, we first round
within the smaller sets, until at most one fractional entry per set is left; then
we proceed with bigger sets, possibly containing the already-rounded entries. It
can easily be shown that this assures the following version of property (P2) for all
subsets S from the laminar family S:

(P2′): sum-preservation. With probability one, Σ_{i∈S} ŷi equals either ⌊Σ_{i∈S} yi⌋ or
       ⌈Σ_{i∈S} yi⌉; in particular, |{i ∈ S : ŷi = 1}| ≥ ⌊Σ_{i∈S} yi⌋.
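For concreteness, the following Python sketch shows one standard implementation of the pairwise rounding step and of the laminar-guided order just described; it is our illustration, not code from [10], and all names are ours.

```python
import random

def couple(y, i, j):
    """One pairwise rounding step: preserves y[i] + y[j] exactly, keeps the
    marginal Pr[entry -> 1] equal to its current value, and makes at least
    one of the two entries integral."""
    a = min(1.0 - y[i], y[j])        # move pushing y[i] up / y[j] down
    b = min(y[i], 1.0 - y[j])        # move pushing y[i] down / y[j] up
    if random.random() < b / (a + b):
        y[i] += a; y[j] -= a
    else:
        y[i] -= b; y[j] += b

def dependent_rounding(y, laminar=()):
    """Round the dict y (index -> value in [0,1]) in place; `laminar` lists
    the clusters smallest-first, so each set S is rounded internally before
    its entries are coupled with outside entries, giving property (P2')."""
    def frac(S):
        return [i for i in S if 0.0 < y[i] < 1.0]
    for S in list(laminar) + [list(y)]:
        f = frac(S)
        while len(f) > 1:
            couple(y, f[0], f[1])
            f = frac(S)
    for i in frac(list(y)):          # at most one fractional entry can remain
        y[i] = 1.0 if random.random() < y[i] else 0.0
    return {i: int(round(v)) for i, v in y.items()}
```

Rounding within each cluster of the laminar family before touching its complement is exactly what yields (P2′) for every cluster.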
Now, let S ⊆ [N] be any subset, not necessarily from S. In order to present our
results, we need two functions, Sum_S and g_{λ,S}. For any vector x ∈ [0, 1]^n, let
Sum_S(x) = Σ_{i∈S} xi be the sum of the elements of x indexed by elements of S;
in particular, if x is a (possibly random) vector with all entries either 0 or 1,
then Sum_S(x) counts the number of entries in S that are 1. Next, given s = |S|
and a real vector λ = (λ0, λ1, λ2, . . . , λs), we define, for any x ∈ {0, 1}^n,

    g_{λ,S}(x) = Σ_{i=0}^{s} λi · I(Sum_S(x) = i),

where I(·) denotes the indicator function. Thus, g_{λ,S}(x) = λi if Sum_S(x) = i.
Let R(y) be a random vector in {0, 1}N obtained by independently rounding
each yi to 1 with probability yi , and to 0 with the complementary probability
of 1 − yi . Suppose, as above, that ŷ is a random vector in {0, 1}N obtained
by applying the dependent rounding technique to y. We start with a general
theorem and then specialize it to Theorem 2 that will be very useful for us:

Theorem 1. Suppose we conduct dependent rounding on y = (y1, y2, . . . , yN).
Let S ⊆ [N] be any subset with cardinality s ≥ 2, and let λ = (λ0, λ1, λ2, . . . , λs)
be any vector such that for all r with 0 ≤ r ≤ s − 2 we have λr − 2λ_{r+1} + λ_{r+2} ≤ 0.
Then, E[g_{λ,S}(ŷ)] ≥ E[g_{λ,S}(R(y))].
Theorem 2. For any y ∈ [0, 1]^N, S ⊆ [N], and k = 1, 2, . . ., we have

    E[min{k, Sum_S(ŷ)}] ≥ E[min{k, Sum_S(R(y))}].

Using the notation exp(t) = e^t, our next key result is:

Theorem 3. For any y ∈ [0, 1]^N, S ⊆ [N], and k = 1, 2, . . ., we have

    E[min{k, Sum_S(R(y))}] ≥ k · (1 − exp(−Sum_S(y)/k)).

The above two theorems yield a key corollary that we will use:

Corollary 1.

    E[min{k, Sum_S(ŷ)}] ≥ k · (1 − exp(−Sum_S(y)/k)).

Proofs will appear in the full version of the paper (see also [3]).
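As a sanity check (ours, purely illustrative), one can estimate both sides of Theorem 2 together with the bound of Theorem 3 by simulation, reusing dependent_rounding from the sketch above:

```python
import math, random

def compare_bounds(yvals, S, k, trials=20000):
    """Estimate E[min{k, Sum_S}] under dependent vs. independent rounding and
    compare with k(1 - exp(-Sum_S(y)/k)); we expect dep >= ind >= bound."""
    dep = ind = 0.0
    for _ in range(trials):
        yd = dependent_rounding({i: v for i, v in enumerate(yvals)})
        dep += min(k, sum(yd[i] for i in S))
        ind += min(k, sum(1 for i in S if random.random() < yvals[i]))
    bound = k * (1 - math.exp(-sum(yvals[i] for i in S) / k))
    return dep / trials, ind / trials, bound

# e.g. compare_bounds([0.3, 0.7, 0.5, 0.5, 0.4], S=[0, 1, 2, 4], k=2)
```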

3 Algorithm
3.1 LP-Relaxation
The FTFL problem is defined by the following Integer Program (IP).
  
    minimize    Σ_{i∈F} fi yi + Σ_{j∈C} Σ_{i∈F} cij xij                      (1)
    subject to: Σ_{i∈F} xij ≥ rj          ∀j ∈ C                             (2)
                xij ≤ yi                  ∀j ∈ C, ∀i ∈ F                     (3)
                yi ≤ 1                    ∀i ∈ F                             (4)
                xij, yi ∈ Z≥0             ∀j ∈ C, ∀i ∈ F,                    (5)

where C is the set of clients, F is the set of possible facility locations, fi is
the cost of opening a facility at location i, cij is the cost of serving client j from
a facility at location i, and rj is the number of facilities that client j needs to be
connected to.
If we relax constraint (5) to xij , yi ≥ 0 we obtain the standard LP-relaxation
of the problem. Let (x∗ , y ∗ ) be an optimal solution to this LP relaxation. We will
give an algorithm that rounds this solution to an integral solution (x̃, ỹ) with
cost at most γ ≈ 1.7245 times the cost of (x∗ , y ∗ ).
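For illustration, the LP relaxation can be written down directly; the following sketch uses the PuLP modelling library (an arbitrary choice of solver interface, with illustrative names; F and C are iterables, and f, c, r are dictionaries of costs and requirements).

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

def solve_relaxation(F, C, f, c, r):
    """Solve the LP relaxation of (1)-(5) with x, y >= 0."""
    prob = LpProblem("FTFL_LP", LpMinimize)
    y = {i: LpVariable(f"y_{i}", 0, 1) for i in F}                  # (4) and y >= 0
    x = {(i, j): LpVariable(f"x_{i}_{j}", 0) for i in F for j in C}
    prob += lpSum(f[i] * y[i] for i in F) + \
            lpSum(c[i, j] * x[i, j] for i in F for j in C)          # objective (1)
    for j in C:
        prob += lpSum(x[i, j] for i in F) >= r[j]                   # coverage (2)
    for i in F:
        for j in C:
            prob += x[i, j] <= y[i]                                 # linking (3)
    prob.solve()
    return {i: y[i].value() for i in F}, {k: v.value() for k, v in x.items()}
```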

3.2 Scaling
We may assume, without loss of generality, that for any client j ∈ C there ex-
ists at most one facility i ∈ F such that 0 < x∗ij < y∗i. Moreover, this facility

can be assumed to be at the largest distance from client j among the facilities that
fractionally serve j in (x∗, y∗).
We first set x̃ij = ỹi = 0 for all i ∈ F, j ∈ C. Then we scale up the fractional
solution by the constant γ ≈ 1.7245 to obtain a fractional solution (x̂, ŷ). To be
precise: we set x̂ij = min{1, γ · x∗ij}, ŷi = min{1, γ · y∗i}. We open each facility
i with ŷi = 1 and connect each client-facility pair with x̂ij = 1. To be more
precise, we modify ŷ, ỹ, x̂, x̃ and the service requirements r as follows. For each
facility i with ŷi = 1, set ŷi = 0 and ỹi = 1. Then, for every pair (i, j) such
that x̂ij = 1, set x̂ij = 0, x̃ij = 1 and decrease rj by one. When this process is
finished, we denote the resulting r, ŷ and x̂ by r̄, ȳ and x̄. Note that the connections
made in this phase can be paid for by the difference in connection cost between
x̂ and x̄. We will show that the remaining connection cost of the solution of the
algorithm is expected to be at most the cost of x̄.
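A direct transcription of this scaling phase into Python (a sketch with illustrative names; GAMMA stands for the constant γ fixed in Section 4):

```python
GAMMA = 1.7245   # the constant gamma fixed in Section 4

def scale(xstar, ystar, r, F, C):
    """Sketch of the scaling phase of Section 3.2."""
    xt = {(i, j): 0 for i in F for j in C}        # x-tilde: integral connections
    yt = {i: 0 for i in F}                        # y-tilde: integrally opened
    xh = {(i, j): min(1.0, GAMMA * xstar[i, j]) for i in F for j in C}
    yh = {i: min(1.0, GAMMA * ystar[i]) for i in F}
    rbar = dict(r)
    for i in F:
        if yh[i] == 1.0:                          # fully opened by scaling
            yh[i], yt[i] = 0.0, 1
    for i in F:
        for j in C:
            if xh[i, j] == 1.0:                   # fully used pair: pre-connect
                xh[i, j], xt[i, j] = 0.0, 1
                rbar[j] -= 1
    return xh, yh, xt, yt, rbar                   # xh, yh, rbar play x-bar, y-bar, r-bar
```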
For the feasibility of the final solution, it is essential that if we connected
client j to facility i in this initial phase, we will not connect it again to i in the
rest of the algorithm. There will be two ways of connecting clients in the process
of rounding x̄. The first one connects client j to a subset of the facilities serving j in
x̄. Recall that if j was connected to facility i in the initial phase, then x̄ij = 0,
and no additional i-j connection will be created.
The connections of the second type will be created in a process of clustering.
The clustering that we will use is a generalization of the one of Chudak & Shmoys
for the UFL problem [4]. As a result of this clustering process, client j will be
allowed to connect itself via a different client j′ to a facility opened around j′. Client j′
will be called a cluster center for a subset of facilities, and it will make sure that
at least some guaranteed number of these facilities get opened.
To be certain that client j does not get connected to facility i again via a path
through client j′, facility i will never be a member of the set of facilities clustered
by client j′. We call a facility i special for client j iff ỹi = 1 and 0 < x̄ij < 1.
Note that, by our earlier assumption, there is at most one special facility for
each client j, and that a special facility must be at maximal distance among the
facilities serving j in x̄. When rounding the fractional solution in Section 3.5, we
take care that special facilities are not members of the formed clusters.

3.3 Close and Distant Facilities


Before we describe how we cluster facilities, we specify the facilities that are
interesting for a particular client in the clustering process. The following can be
thought of as a version of the filtering technique of Lin and Vitter [8], first applied
to facility location by Shmoys et al. [9]. The analysis that we use here is a version
of the argument of Byrka [2].
As a result of the scaling described in the previous section, the con-
nection variables x̄ provide a total connectivity that exceeds the requirement
r̄. More precisely, we have Σ_{i∈F} x̄ij ≥ γ · r̄j for every client j ∈ C. We will
consider for each client j a subset of facilities that are just enough to provide it
a fractional connection of r̄j. Such a subset is called the set of close facilities of
client j and is defined as follows.

For every client j consider the following construction. Let i1, i2, . . . , i|F| be the
ordering of the facilities in F in nondecreasing order of distance cij to client j. Let
i_k be the facility in this ordering such that Σ_{l=1}^{k−1} x̄_{i_l j} < r̄j and Σ_{l=1}^{k} x̄_{i_l j} ≥ r̄j.
Define

    x̄^(c)_{i_l j} = x̄_{i_l j}                         for l < k,
    x̄^(c)_{i_l j} = r̄j − Σ_{l′=1}^{k−1} x̄_{i_{l′} j}   for l = k,
    x̄^(c)_{i_l j} = 0                                 for l > k.

Define x̄^(d)_ij = x̄ij − x̄^(c)_ij for all i ∈ F, j ∈ C.
We will call the set of facilities i ∈ F with x̄^(c)_ij > 0 the set of close
facilities of client j and denote it by Cj. By analogy, we will call the set of
facilities i ∈ F with x̄^(d)_ij > 0 the set of distant facilities of client j and
denote it by Dj. Observe that for a client j the intersection of Cj and Dj is either
empty or contains exactly one facility. In the latter case, we will say that this
facility is both distant and close. Note that, unlike in the UFL problem, we
cannot simply split this facility into a close and a distant part, because it is
essential that we make at most one connection to this facility in the final integral
solution. Let d^(max)_j = c_{i_k j} be the distance from client j to the farthest of its
close facilities.
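The construction of close and distant facilities is a simple greedy split, sketched below (illustrative names; x, c and rbar stand for x̄, the distances and r̄):

```python
def close_distant(j, x, c, rbar, F):
    """Greedy split of the column x_{.j} into a close part of total mass rbar[j]
    (nearest facilities first) and the remaining distant part."""
    order = sorted((i for i in F if x[i, j] > 0), key=lambda i: c[i, j])
    xc, xd, need = {}, {}, rbar[j]
    for i in order:
        take = min(x[i, j], need)
        xc[i] = take
        xd[i] = x[i, j] - take
        need -= take
    Cj = {i for i, v in xc.items() if v > 0}
    Dj = {i for i, v in xd.items() if v > 0}
    return xc, xd, Cj, Dj
```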

3.4 Clustering
We will now construct a family S ⊆ 2^F of subsets of facilities. These subsets S ∈
S will be called clusters, and they will guide the rounding procedure described
next. There will be a client related to each cluster, and each single client j will
be related to at most one cluster, which we call Sj.

Not all the clients participate in the clustering process. Clients j with r̄j = 1
and a special facility i ∈ Cj (recall that a special facility is a facility that is fully
open in ỹ but only partially used by j in x̄) will be called special and will not
take part in the clustering process. Let C′ denote the set of all other, non-special
clients. Observe that, as a result of scaling, clients j with r̄j ≥ 2 do not have any
special facilities among their close facilities (since Σi x̄ij ≥ γ r̄j > r̄j + 1). As
a consequence, there are no special facilities among the close facilities of clients
from C′, the only clients actively involved in the clustering procedure.
For each client j ∈ C′ we will keep two families Aj and Bj of disjoint subsets
of facilities. Initially Aj = {{i} : i ∈ Cj}, i.e., Aj is initialized to contain a
singleton set for each close facility of client j; Bj is initially empty. Aj will be
used to store these initial singleton sets, but also clusters containing only close
facilities of j; Bj will be used to store only clusters that contain at least one
close facility of j. When adding a cluster to either Aj or Bj we will remove all
the subsets it intersects from both Aj and Bj; therefore the subsets in Aj ∪ Bj will
always be pairwise disjoint.
The family of clusters that we will construct will be a laminar family of subsets
of facilities, i.e., any two clusters are either disjoint or one entirely contains the
other. One can imagine facilities being leaves and clusters being internal nodes
of a forest that eventually becomes a tree, when all the clusters are added.


We will use y(S) as a shorthand for Σ_{i∈S} ȳi, and write ⌊y(S)⌋ for its integer part. As
a consequence of using the family of clusters to guide the rounding process, by
Property (P2′) of the dependent rounding procedure applied to a cluster, the
quantity ⌊y(S)⌋ lower bounds the number of facilities that will certainly be
opened in cluster S. Additionally, let us define the residual requirement of client
j to be rrj = r̄j − Σ_{S∈(Aj∪Bj)} ⌊y(S)⌋, that is, r̄j minus a lower bound on the
number of facilities that will be opened in clusters from Aj and Bj.
We use the following procedure to compute clusters. While there exists a client
j ∈ C′ such that rrj > 0, take such j with minimal d^(max)_j and do the following:

1. Take Xj to be an inclusion-wise minimal subset of Aj such that
   Σ_{S∈Xj} (y(S) − ⌊y(S)⌋) ≥ rrj. Form the new cluster Sj = ⋃_{S∈Xj} S.
2. Make Sj a new cluster by setting S ← S ∪ {Sj}.
3. Update Aj ← (Aj \ Xj) ∪ {Sj}.
4. For each client j′ with rrj′ > 0 do
   – If Xj ⊆ Aj′, then set Aj′ ← (Aj′ \ Xj) ∪ {Sj}.
   – If Xj ∩ Aj′ ≠ ∅ and Xj \ Aj′ ≠ ∅,
     then set Aj′ ← Aj′ \ Xj and Bj′ ← {S ∈ Bj′ : S ∩ Sj = ∅} ∪ {Sj}.

Eventually, add a cluster Sr = F containing all the facilities to the family S.
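The procedure can be transcribed as follows (a sketch with illustrative data structures; it assumes the invariant, argued below, that the clusters in Aj always carry enough fractional opening, and it uses the fact that greedily picking sets with the largest fractional parts yields an inclusion-wise minimal Xj).

```python
import math

def y_of(S, y):
    """Total fractional opening y(S) of a facility set S, and its floor."""
    tot = sum(y[i] for i in S)
    return tot, math.floor(tot)

def clustering(Cprime, Cj_sets, y, rbar, dmax):
    """Sketch of the clustering procedure. A[j], B[j] are families of disjoint
    frozensets of facilities; the returned list is the laminar family S,
    smaller clusters first (the order in which they are created)."""
    A = {j: {frozenset([i]) for i in Cj_sets[j]} for j in Cprime}
    B = {j: set() for j in Cprime}
    S_family = []

    def rr(j):                                  # residual requirement of client j
        return rbar[j] - sum(y_of(S, y)[1] for S in A[j] | B[j])

    while True:
        active = [j for j in Cprime if rr(j) > 0]
        if not active:
            break
        j = min(active, key=lambda jj: dmax[jj])
        # greedy by largest fractional part: dropping any chosen set would
        # fall below rr(j), so the choice is inclusion-wise minimal (step 1)
        X, frac = set(), 0.0
        for S in sorted(A[j], key=lambda S: y_of(S, y)[0] - y_of(S, y)[1],
                        reverse=True):
            if frac >= rr(j):
                break
            X.add(S)
            frac += y_of(S, y)[0] - y_of(S, y)[1]
        Sj = frozenset().union(*X)
        S_family.append(Sj)                     # step 2
        A[j] = (A[j] - X) | {Sj}                # step 3
        for jp in Cprime:                       # step 4
            if jp == j or rr(jp) <= 0:
                continue
            if X <= A[jp]:
                A[jp] = (A[jp] - X) | {Sj}
            elif X & A[jp]:
                A[jp] = A[jp] - X
                B[jp] = {S for S in B[jp] if not (S & Sj)} | {Sj}
    S_family.append(frozenset(y))               # the root cluster Sr = F
    return S_family
```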
We call a client j active in a particular iteration if, before this iteration, its
residual requirement rrj = r̄j − Σ_{S∈(Aj∪Bj)} ⌊y(S)⌋ was positive. During the above
procedure, all active clients j have in their sets Aj and Bj only maximal subsets
of facilities; that means they are not subsets of any other clusters (i.e., they
are roots of their trees in the current forest). Therefore, when a new cluster
Sj is created, it contains all the other clusters with which it has nonempty
intersection (i.e., the new cluster Sj becomes a root of a new tree).
We shall now argue that there is enough fractional opening in the clusters in Aj
to cover the residual requirement rrj when cluster Sj is to be formed. Consider
a fixed client j ∈ C′. Recall that at the start of the clustering we have Aj =
{{i} : i ∈ Cj}, and therefore Σ_{S∈Aj} (y(S) − ⌊y(S)⌋) = Σ_{i∈Cj} ȳi ≥ r̄j = rrj. It
remains to show that Σ_{S∈Aj} (y(S) − ⌊y(S)⌋) − rrj does not decrease over time
until client j is considered. When a client j′ with d^(max)_{j′} ≤ d^(max)_j is considered
and cluster Sj′ is created, the following cases are possible:

1. Sj′ ∩ (⋃_{S∈Aj} S) = ∅: then Aj and rrj do not change;
2. Sj′ ⊆ (⋃_{S∈Aj} S): then Aj changes its structure, but Σ_{S∈Aj} y(S) and
   Σ_{S∈Bj} y(S) do not change; hence Σ_{S∈Aj} (y(S) − ⌊y(S)⌋) − rrj also does not
   change;
3. Sj′ ∩ (⋃_{S∈Aj} S) ≠ ∅ and Sj′ \ (⋃_{S∈Aj} S) ≠ ∅: then, by the inclusion-wise
   minimality of the set Xj′, we have y(Sj′) − Σ_{S∈Bj, S⊆Sj′} y(S) − Σ_{S∈Aj, S⊆Sj′} y(S) ≥ 0;
   hence Σ_{S∈Aj} (y(S) − ⌊y(S)⌋) − rrj cannot decrease.

Let A′j = Aj ∩ S be the set of clusters in Aj. Recall that all facilities in clusters
in Aj are close facilities of j. Note also that each cluster Sj′ ∈ Bj was created
from close facilities of a client j′ with d^(max)_{j′} ≤ d^(max)_j. We also have for each

Sj′ ∈ Bj that Sj′ ∩ Cj ≠ ∅; hence, by the triangle inequality, all facilities in Sj′
are at distance at most 3 · d^(max)_j from j. We thus infer the following.

Corollary 2. The family of clusters S contains, for each client j ∈ C′, a collec-
tion of disjoint clusters A′j ∪ Bj containing only facilities within distance 3 · d^(max)_j,
such that Σ_{S∈A′j∪Bj} ⌊Σ_{i∈S} ȳi⌋ ≥ r̄j.

Note that our clustering is related to, but more complex than, the one of Chudak
and Shmoys [4] for UFL and of Swamy and Shmoys [11] for FTFL, where clusters
are pairwise disjoint and each contains facilities whose fractional opening sums
up to, or slightly exceeds, the value of 1.

3.5 Opening of Facilities by Dependent Rounding


Given the family of subsets S ⊆ 2^F computed by the clustering procedure of
Section 3.4, we can proceed with rounding the fractional opening vector ȳ into
an integral vector y^R. We do so by applying the rounding technique of Section 2,
guided by the family S, as follows.

While there is more than one fractional entry, select a minimal subset S ∈ S
which contains more than one fractional entry, and apply the rounding procedure
to the entries of ȳ indexed by elements of S until at most one entry in S remains
fractional. Eventually, if a fractional entry remains, round it independently,
and let y^R be the resulting vector.

Observe that the above process is one of the possible implementations of
dependent rounding applied to ȳ. As a result, the random integral vector y^R
satisfies properties (P1), (P2), and (P3). Additionally, property (P2′) holds for
each cluster S ∈ S. Hence, at least ⌊Σ_{i∈S} ȳi⌋ entries in each S ∈ S are rounded
to 1. Therefore, by Corollary 2, we get
Corollary 3. For each client j ∈ C′,

    |{i ∈ F : y^R_i = 1 and cij ≤ 3 · d^(max)_j}| ≥ r̄j.

Next, we combine the facilities opened by rounding y^R with the facilities already
opened during scaling, which are recorded in ỹ; i.e., we update ỹ ← ỹ + y^R.
Eventually, we connect each client j ∈ C to its rj closest opened facilities and
encode this in x̃.

4 Analysis
We will now estimate the expected cost of the solution (x̃, ỹ). The tricky part is
to bound the connection cost, which we do as follows. We argue that a certain
fraction of the demand of client j can be satisfied from its close facilities, then
some part of the remaining demand can be satisfied from its distant facilities.
Eventually, the remaining (not too large in expectation) part of the demand is
satisfied via clusters.

4.1 Average Distances


Let us consider weighted average distances from a client j to the sets of facilities
fractionally serving it. Let dj be the average connection cost in x̄ij, defined as

    dj = ( Σ_{i∈F} cij · x̄ij ) / ( Σ_{i∈F} x̄ij ).

Let d^(c)_j and d^(d)_j be the average connection costs in x̄^(c)_ij and x̄^(d)_ij, defined as

    d^(c)_j = ( Σ_{i∈F} cij · x̄^(c)_ij ) / ( Σ_{i∈F} x̄^(c)_ij ),

    d^(d)_j = ( Σ_{i∈F} cij · x̄^(d)_ij ) / ( Σ_{i∈F} x̄^(d)_ij ).

Let Rj be a parameter defined as

    Rj = ( dj − d^(c)_j ) / dj
if dj > 0 and Rj = 0 otherwise. Observe that Rj takes value between 0 and 1.


(c) (d) (c)
Rj = 0 implies dj = dj = dj , and Rj = 1 occurs only when dj = 0. The role
played by Rj is that it measures a certain parameter of the instance, big values
are good for one part of the analysis, small values are good for the other.
(d) Rj
Lemma 1. dj ≤ dj (1 + γ−1 ).

Proof. Recall that Σ_{i∈F} x̄^(c)_ij = r̄j and Σ_{i∈F} x̄^(d)_ij ≥ (γ − 1) · r̄j. Therefore, we
have (d^(d)_j − dj) · (γ − 1) ≤ (dj − d^(c)_j) · 1 = Rj · dj, which can be rearranged to
get d^(d)_j ≤ dj (1 + Rj/(γ − 1)).

Finally, observe that the average distance from j to the distant facilities of j
gives an upper bound on the maximal distance from j to any of its close facilities;
namely, d^(max)_j ≤ d^(d)_j.
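These quantities are straightforward to compute; a sketch (illustrative names, reusing the output of the close/distant split above):

```python
def averages(j, x, xc, xd, c, F):
    """The average distances d_j, d_j^(c), d_j^(d) and the parameter R_j."""
    def avg(vec):
        tot = sum(vec.get(i, 0.0) for i in F)
        return (sum(c[i, j] * vec.get(i, 0.0) for i in F) / tot) if tot else 0.0
    dj = avg({i: x[i, j] for i in F})
    djc, djd = avg(xc), avg(xd)
    Rj = (dj - djc) / dj if dj > 0 else 0.0
    return dj, djc, djd, Rj
```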

4.2 Amount of Service from Close and Distant Facilities


We now argue that in the solution (x̃, ỹ), a certain portion of the demand is
expected to be served by the close and distant facilities of each client. Recall
that for a client j it is possible, that there is a facility that is both its close
and its distant facility. Once we have a solution that opens such a facility, we
would like to say what fraction of the demand is served from the close facilities.
To make our analysis simpler we will toss a properly biased coin to decide if
using this facility counts as using a close facility. With this trick we, in a sense,

[Figure 1 shows the facilities serving j as rectangles along a horizontal axis, close
facilities first and distant facilities after them, with the marked distances
d^(c)_j = dj (1 − Rj) (average distance to close facilities), dj, d^(max)_j (maximal
distance to close facilities), and d^(d)_j (average distance to distant facilities).]

Fig. 1. Distances to facilities serving client j in x̄. The width of the rectangle correspond-
ing to facility i is equal to x̄ij. The figure helps to understand the meaning of Rj.

split such a facility into a close part and a distant part. Note that we can only do
this for this part of the analysis, not for the actual rounding algorithm of
Section 3.5. Applying the above-described split of the undecided facility, we get
that the total fractional opening of the close facilities of client j is exactly r̄j, and
the total fractional opening of both close and distant facilities is at least γ · r̄j.
Therefore, Corollary 1 yields the following:

Corollary 4. The amount of close facilities used by client j in the solution de-
scribed in Section 3.5 is expected to be at least (1 − 1/e) · r̄j.

Corollary 5. The amount of close and distant facilities used by client j in the
solution described in Section 3.5 is expected to be at least (1 − 1/e^γ) · r̄j.
Motivated by the above bounds we design a selection method to choose a (large-
enough in expectation) subset of facilities opened around client j:
Lemma 2. For j ∈ C′ we can select a subset Fj of open facilities from Cj ∪ Dj
such that:

    |Fj| ≤ r̄j                    (with probability 1),
    E[|Fj|] = (1 − 1/e^γ) · r̄j,
    E[ Σ_{i∈Fj} cij ] ≤ ((1 − 1/e) · r̄j) · d^(c)_j + (((1 − 1/e^γ) − (1 − 1/e)) · r̄j) · d^(d)_j.

We sketch the technical but not difficult proof in the Appendix.



4.3 Calculation

We can now combine the pieces into the algorithm ALG:

1. solve the LP-relaxation of (1)-(5);


2. scale the fractional solution as described in Section 3.2;
3. create a family of clusters as described in Section 3.4;
4. round the fractional openings as described in Section 3.5;
5. connect each client j to rj closest open facilities;
6. output the solution as (x̃, ỹ).
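For orientation, the following sketch wires together the illustrative helpers defined earlier (solve_relaxation, scale, close_distant, clustering, dependent_rounding); it glosses over the special-client handling of Sections 3.2 and 3.4 for brevity and is a reading aid, not the authors' implementation.

```python
def ALG(F, C, f, c, r):
    """End-to-end sketch of ALG, assembling the earlier illustrative sketches."""
    ybar, xbar = solve_relaxation(F, C, f, c, r)                    # step 1
    xh, yh, xt, yt, rbar = scale(xbar, ybar, r, F, C)               # step 2
    Cj = {j: close_distant(j, xh, c, rbar, F)[2] for j in C}
    dmax = {j: max((c[i, j] for i in Cj[j]), default=0.0) for j in C}
    laminar = clustering(C, Cj, yh, rbar, dmax)                     # step 3
    yR = dependent_rounding(dict(yh), laminar)                      # step 4
    opened = [i for i in F if yt[i] + yR[i] >= 1]
    connect = {}
    for j in C:                                                     # step 5
        pre = [i for i in F if xt[i, j] == 1]                       # from scaling
        extra = sorted((i for i in opened if i not in pre),
                       key=lambda i: c[i, j])[:rbar[j]]
        connect[j] = pre + extra
    return opened, connect                                          # step 6
```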

Theorem 4. ALG is an 1.7245-approximation algorithm for FTFL.

Proof. First observe that the solution produced by ALG is trivially feasible for
the original problem (1)-(5), as we simply choose rj different facilities for client
j in Step 5. What is less trivial is that all the rj facilities used by j are within
a certain small distance. Let us now bound the expected connection cost of the
obtained solution.

For each client j ∈ C we get rj − r̄j facilities opened in Step 2. As we already
argued in Section 3.2, we can afford to connect j to these facilities and pay the
connection cost from the difference between Σi cij x̂ij and Σi cij x̄ij. We will
now argue that client j can connect to the remaining r̄j facilities with expected
connection cost bounded by Σi cij x̄ij.

For a special client j ∈ (C \ C′) we have r̄j = 1, and already in Step 2 one special
facility at distance d^(max)_j from j is opened. We cannot always just connect j
to this facility, since d^(max)_j may potentially be bigger than γ · dj. What we do
instead is first look at the close facilities of j which, as a result of the rounding
in Step 4, give, with a certain probability, one open facility at a small distance.
By Corollary 4 this probability is at least 1 − 1/e. It is easy to observe that
the expected connection cost to this open facility is at most d^(c)_j. Only if no
close facility is open do we use the special facility, which results in the expected
connection cost of client j being at most

    (1 − 1/e) d^(c)_j + (1/e) d^(d)_j ≤ (1 − 1/e) dj + (1/e) dj (1 + Rj/(γ − 1))
                                     ≤ dj (1 + 1/(e · (γ − 1))) ≤ γ · dj,

where the first inequality is a consequence of Lemma 1, and the last one is a
consequence of the choice of γ ≈ 1.7245.
In the remainder, we only look at non-special clients j ∈ C′. By Lemma 2,
client j can select to connect itself to the subset Fj of open facilities, and pay for
this connection at most ((1 − 1/e) · r̄j) · d^(c)_j + (((1 − 1/e^γ) − (1 − 1/e)) · r̄j) · d^(d)_j in
expectation. The expected number of facilities needed on top of those from Fj is
r̄j − E[|Fj|] = (1/e^γ) · r̄j. These remaining facilities client j gets deterministically
within distance at most 3 · d^(max)_j, which is possible by the properties of
the rounding procedure described in Section 3.5; see Corollary 3. Therefore, the
expected connection cost to facilities not in Fj is at most ((1/e^γ) · r̄j) · (3 · d^(max)_j).

Concluding, the total expected connection cost of j may be bounded by

    ((1 − 1/e) · r̄j) · d^(c)_j + (((1 − 1/e^γ) − (1 − 1/e)) · r̄j) · d^(d)_j + ((1/e^γ) · r̄j) · (3 · d^(max)_j)
      ≤ r̄j · ( (1 − 1/e) · d^(c)_j + ((1 − 1/e^γ) − (1 − 1/e)) · d^(d)_j + (1/e^γ) · (3 d^(d)_j) )
      = r̄j · ( (1 − 1/e) · d^(c)_j + ((1 + 2/e^γ) − (1 − 1/e)) · d^(d)_j )
      ≤ r̄j · ( (1 − 1/e) · (1 − Rj) · dj + ((1 + 2/e^γ) − (1 − 1/e)) · (1 + Rj/(γ − 1)) · dj )
      = r̄j · dj · ( (1 − 1/e) · (1 − Rj) + (2/e^γ + 1/e) · (1 + Rj/(γ − 1)) )
      = r̄j · dj · ( (1 − 1/e) + (2/e^γ + 1/e) + Rj · ( (2/e^γ + 1/e) · (1/(γ − 1)) − (1 − 1/e) ) )
      = r̄j · dj · ( 1 + 2/e^γ + Rj · ( (2/e^γ + 1/e)/(γ − 1) − (1 − 1/e) ) ),

where the second inequality follows from Lemma 1 and the definition of Rj.
Observe that for 1 < γ < 2 we have (2/e^γ + 1/e)/(γ − 1) − (1 − 1/e) > 0. Recall
that, by definition, Rj ≤ 1; so Rj = 1 is the worst case for our estimate, and
therefore

    r̄j · dj · ( 1 + 2/e^γ + Rj · ( (2/e^γ + 1/e)/(γ − 1) − (1 − 1/e) ) )
      ≤ r̄j · dj · (1/e + 2/e^γ) · (1 + 1/(γ − 1)).

Recall that x̄ incurs, for each client j, a fractional connection cost Σ_{i∈F} cij x̄ij ≥
γ · r̄j · dj. We fix γ = γ0 such that γ0 = (1/e + 2/e^{γ0}) · (1 + 1/(γ0 − 1)) ≤ 1.7245.
To conclude, the expected connection cost of j to facilities opened during the
rounding procedure is at most the fractional connection cost of x̄. The total
connection cost is, therefore, at most the connection cost of x̂, which is at most
γ times the connection cost of x∗.

By property (P1) of dependent rounding, every single facility i is opened with
probability ŷi, which is at most γ times y∗i. Therefore, the total expected
cost of the solution produced by ALG is at most γ ≈ 1.7245 times the cost of
the optimal fractional solution (x∗, y∗).


5 Concluding Remarks
We have presented an improved approximation algorithm for the metric Fault-
Tolerant Uncapacitated Facility Location problem. The main technical innova-
tion is the usage and analysis of dependent rounding in this context. We believe
that variants of dependent rounding will also be fruitful in other location prob-
lems. Finally, we conjecture that the approximation threshold for both UFL and
FTFL is the value 1.46 · · · suggested by [5]; it would be very interesting to prove
or refute this.
Acknowledgment. We thank the IPCO referees for their helpful comments.

References
1. Ageev, A., Sviridenko, M.: Pipage rounding: a new method of constructing algo-
rithms with proven performance guarantee. Journal of Combinatorial Optimiza-
tion 8(3), 307–328 (2004)
2. Byrka, J.: An optimal bifactor approximation algorithm for the metric uncapaci-
tated facility location problem. In: APPROX-RANDOM, pp. 29–43 (2007)
3. Byrka, J., Srinivasan, A., Swamy, C.: Fault-tolerant facility location: a randomized
dependent LP-rounding algorithm. arXiv:1003.1295v1
4. Chudak, F.A., Shmoys, D.B.: Improved approximation algorithms for the unca-
pacitated facility location problem. SIAM J. Comput. 33(1), 1–25 (2003)
5. Guha, S., Khuller, S.: Greedy strikes back: Improved facility location algorithms.
J. Algorithms 31(1), 228–248 (1999)
6. Guha, S., Meyerson, A., Munagala, K.: A constant factor approximation algorithm
for the fault-tolerant facility location problem. J. Algorithms 48(2), 429–440 (2003)
7. Jain, K., Vazirani, V.V.: An approximation algorithm for the fault tolerant metric
facility location problem. Algorithmica 38(3), 433–439 (2003)
8. Lin, J.-H., Vitter, J.S.: Epsilon-approximations with minimum packing constraint
violation (extended abstract). In: STOC, pp. 771–782 (1992)
9. Shmoys, D.B., Tardos, É., Aardal, K.: Approximation algorithms for facility loca-
tion problems (extended abstract). In: STOC, pp. 265–274 (1997)
10. Srinivasan, A.: Distributions on level-sets with applications to approximation al-
gorithms. In: FOCS, pp. 588–597 (2001)
11. Swamy, C., Shmoys, D.B.: Fault-tolerant facility location. ACM Transactions on
Algorithms 4(4) (2008)

Appendix
Proof (of Lemma 2, a sketch). Given client j, the fractional facility-opening
vector ȳ, distances cij, requirement r̄j, and facility subsets Cj and Dj, we
describe how to randomly choose a subset of at most k = r̄j open facilities from
Cj ∪ Dj with the desired properties.

For this argument we assume that all the numbers are rational. Recall that
the opening of facilities is decided in a dependent rounding routine that, in a
single step, couples two fractional entries so as to leave at most one of them
fractional. Observe that, for the purpose of this argument, we can split a single
facility into many identical copies with smaller fractional opening. One can think
that the input facilities and their original openings were obtained along the process
of dependent rounding applied to the multiple “small” copies that we prefer to
consider here. Therefore, without loss of generality, we can assume that all the
facilities have fractional opening equal to ε, i.e., ȳi = ε for all i ∈ Cj ∪ Dj. Moreover,
we can assume that the sets Cj and Dj are disjoint.

By renaming facilities we obtain that Cj = {1, 2, . . . , |Cj|}, Dj = {|Cj| +
1, . . . , |Cj| + |Dj|}, and cij ≤ c_{i′j} for all 1 ≤ i < i′ ≤ |Cj| + |Dj|.

Consider the random set S0 ⊆ Cj ∪ Dj created as follows. Let ŷ be the outcome of
rounding the fractional opening vector ȳ with the dependent rounding procedure,
and define S0 = {i : ŷi = 1, Σ_{j′<i} ŷ_{j′} < k}. By Corollary 1, we have that

E[|S0|] ≥ k · (1 − exp(−Sum_{Cj∪Dj}(ȳ)/k)). Define a random set Sα for α ∈ (0, |Cj| +
|Dj|] as follows. For i = 1, 2, . . . , |Cj| + |Dj| − ⌈α⌉ we have i ∈ Sα if and only
if i ∈ S0. For i = |Cj| + |Dj| − ⌊α⌋, in case i ∈ S0 we toss a (suitably biased)
coin and include i in Sα with probability ⌈α⌉ − α. For i > |Cj| + |Dj| − ⌊α⌋ we
deterministically have i ∉ Sα.

Observe that E[|Sα|] is a continuous, monotone non-increasing function of α;
hence there is an α0 such that E[|Sα0|] = k · (1 − exp(−Sum_{Cj∪Dj}(ȳ)/k)). We fix
Fj = Sα0 and claim that it has the desired properties. By definition, we have
E[|Fj|] = k · (1 − exp(−Sum_{Cj∪Dj}(ȳ)/k)) = (1 − 1/e^γ) · r̄j. We next show that the
expected total connection cost between j and the facilities in Fj is not too large.

Let p^α_i = Pr[i ∈ Sα] and p_i = p^{α0}_i = Pr[i ∈ Fj]. Consider the cumulative
probability cp^α_i = Σ_{j′≤i} p^α_{j′}. Observe that applying Corollary 1 to the
subsets of the first i elements of Cj ∪ Dj yields cp^0_i ≥ k · (1 − exp(−εi/k)) for
i = 1, . . . , |Cj| + |Dj|. Since (1 − exp(−εi/k)) is a monotone increasing function
of i, one easily gets that also cp^α_i ≥ k · (1 − exp(−εi/k)) for α ≤ α0 and i =
1, . . . , |Cj| + |Dj|. In particular, we get cp^{α0}_{|Cj|} ≥ k · (1 − exp(−ε|Cj|/k)).

Since (1 − exp(−εi/k)) is a concave function of i, we also have

    cp^{α0}_i ≥ k · (1 − exp(−εi/k)) ≥ (i/|Cj|) · k · (1 − exp(−ε|Cj|/k)) = (i/|Cj|) · (1 − 1/e) · r̄j

for all 1 ≤ i ≤ |Cj|. Analogously, we get

    cp^{α0}_i ≥ k · (1 − exp(−ε|Cj|/k))
              + ((i − |Cj|)/|Dj|) · k · ( (1 − exp(−ε(|Cj| + |Dj|)/k)) − (1 − exp(−ε|Cj|/k)) )
            = r̄j · (1 − 1/e) + r̄j · ((i − |Cj|)/|Dj|) · ((1 − 1/e^γ) − (1 − 1/e))

for all |Cj| < i ≤ |Cj| + |Dj|.
Recall that we want to bound E[ Σ_{i∈Fj} cij ] = Σ_{i∈Cj∪Dj} pi cij. From the above
bounds on the cumulative probabilities we get that, by shifting probability
from earlier facilities to later ones, one can obtain a probability vector p̄ with
p̄i = (1/|Cj|) · ((1 − 1/e) · r̄j) for all 1 ≤ i ≤ |Cj|, and p̄i = (1/|Dj|) · (((1 − 1/e^γ) −
(1 − 1/e)) · r̄j) for all |Cj| < i ≤ |Cj| + |Dj|. As the connection costs cij are monotone
non-decreasing in i, shifting probability never decreases the weighted sum; hence,

    E[ Σ_{i∈Fj} cij ] = Σ_{i∈Cj∪Dj} pi cij
                      ≤ Σ_{i∈Cj∪Dj} p̄i cij
                      = Σ_{1≤i≤|Cj|} (1/|Cj|) · ((1 − 1/e) · r̄j) · cij
                        + Σ_{|Cj|<i≤|Cj|+|Dj|} (1/|Dj|) · (((1 − 1/e^γ) − (1 − 1/e)) · r̄j) · cij
                      = ((1 − 1/e) · r̄j) · d^(c)_j + (((1 − 1/e^γ) − (1 − 1/e)) · r̄j) · d^(d)_j.  □


Integer Quadratic Quasi-polyhedra

Adam N. Letchford

Department of Management Science, Lancaster University,


Lancaster LA1 4YW, United Kingdom
[email protected]

Abstract. This paper introduces two fundamental families of ‘quasi-


polyhedra’ — polyhedra with a countably infinite number of facets —
that arise in the context of integer quadratic programming. It is shown
that any integer quadratic program can be reduced to the minimisation
of a linear function over a quasi-polyhedron in the first family. Some
fundamental properties of the quasi-polyhedra are derived, along with
connections to some other well-studied convex sets. Several classes of
facet-inducing inequalities are also derived. Finally, extensions to the
mixed-integer case are briefly examined.

Keywords: mixed-integer non-linear programming, polyhedral combina-


torics, convex analysis.

1 Introduction
In recent years, there has been increasing interest in Mixed-Integer Non-Linear
Programming (MINLP), due to the realisation that it has a wealth of applica-
tions. This paper is concerned with a special case of MINLP: Integer Quadratic
Programming (IQP). It is assumed that instances of IQP are written in the
following standard form:
    min { c^T x + x^T Q x : Ax = b, x ∈ Z^n_+ },                             (1)

where c ∈ Q^n, Q ∈ Q^{n×n}, A ∈ Q^{m×n} and b ∈ Q^m. (As in linear program-
ming, inequalities can be converted into equations using slack variables, and free
variables can be expressed as the difference between two non-negative variables.)
We assume (without loss of generality) that the matrix Q is symmetric, but
we do not require it to be positive semidefinite. That is, we do not assume that
the objective function is convex.
Polyhedral combinatorics — the study of polyhedra associated with combi-
natorial problems — has proven to be a very useful tool for deriving strong
formulations of Mixed-Integer Linear Programs (e.g., [1,16]). The purpose of
this paper is to apply it to IQP. It turns out, however, that one has to deal with
‘quasi-polyhedra’: convex sets that are the intersection of a countably infinite
number of half-spaces. For this reason, polyhedral theory has to be combined
with elements of convex analysis. (A similar strategy was used in [5] to study a
continuous quadratic optimisation problem.)

F. Eisenbrand and B. Shepherd (Eds.): IPCO 2010, LNCS 6080, pp. 258–270, 2010.

c Springer-Verlag Berlin Heidelberg 2010
Integer Quadratic Quasi-polyhedra 259

The paper is structured as follows. In Sect. 2, two families of quasi-polyhedra


are defined, and it is shown that any IQP instance can be reduced to the problem
of optimising a linear function over a quasi-polyhedron in the first family. In
Sect. 3, some fundamental properties of the quasi-polyhedra are derived, such as
dimension, extreme points, affine symmetries, and relationships with some other
well-studied convex sets. In Sect. 4, we derive several classes of valid inequalities,
all of which are proven to induce facets under mild conditions. Finally, in Sect. 5,
we suggest possible extensions to the mixed-integer case, and pose some questions
for future research.

2 The Quasi-polyhedra
A standard trick when dealing with quadratic optimisation problems is to lin-
earise the objective and/or constraints by introducing additional variables (e.g.,
[14,17,18]). More precisely, for 1 ≤ i ≤ j ≤ n, we define a new variable yij , which
represents the product xi xj . The IQP (1) can then be reformulated as:
 
    min { c^T x + q^T y : Ax = b, x ∈ Z^n_+, yij = xi xj (1 ≤ i ≤ j ≤ n) },

where q ∈ Q^{\binom{n+1}{2}} is defined appropriately. Notice that the non-linearity (and
non-convexity, if any) is now captured in the constraints yij = xi xj.
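For illustration, building the objective vector q from c and a symmetric Q is elementary (our sketch; off-diagonal products are counted twice because Q is symmetric):

```python
from itertools import combinations_with_replacement

def linearise(c, Q):
    """Build the coefficient dict q of the linearised objective: for i < j the
    pair (i, j) absorbs both Q[i][j] and Q[j][i]."""
    n = len(c)
    q = {}
    for i, j in combinations_with_replacement(range(n), 2):
        q[i, j] = Q[i][j] if i == j else 2 * Q[i][j]
    return q   # minimise sum(c[i]*x[i]) + sum(q[i,j]*y[i,j]) s.t. y_ij = x_i x_j
```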
It is an interesting fact that the linear equations can be eliminated from the
problem. Indeed, we can delete an arbitrary linear equation a^T x = r, provided
that we add M (a^T x − r)^2 to the objective function, where M is a suitably large
integer. For this reason, one can concentrate on the unconstrained case, in which
the linear system Ax = b is vacuous.
The set of feasible solutions to an unconstrained IQP, in the extended (x, y)-
space, is:
    F^+_n := { (x, y) ∈ Z^{n+\binom{n+1}{2}}_+ : yij = xi xj (1 ≤ i ≤ j ≤ n) }.

We wish to apply a polyhedral approach to IQP, and are therefore interested in


the convex hull of this set. Unfortunately, there are two minor technical issues
to address.
The first technical issue is that the convex hull of F^+_n is not closed, as expressed
in the following proposition:

Proposition 1. The convex hull of F^+_n is not closed for any n.
Proof. For any t ∈ Z_+, let (x^t, y^t) be the member of F^+_n that arises when x1 = t,
y11 = t^2, and all other variables are equal to zero. Moreover, let

    (x̃^t, ỹ^t) = (1/t^2) · (x^t, y^t) + ((t^2 − 1)/t^2) · (x^0, y^0).

Note that (x̃^t, ỹ^t) is a convex combination of members of F^+_n and therefore lies in
the convex hull. Note also that (x̃^t, ỹ^t) is obtained by setting x1 = 1/t, y11 = 1,

and all other variables to zero. On the other hand, the point with y11 = 1 and
all other variables at zero does not lie in the convex hull of F^+_n. Since the convex
hull does not contain all of its limit points, it is not closed. □

We are therefore led to look at the closure of the convex hull, which we denote
by IQ^+_n. Figure 1 represents IQ^+_1. It can be seen that it is described by the
non-negativity inequality x1 ≥ 0, together with the inequalities y11 ≥ (2t +
1)x1 − t(t + 1) for all t ∈ Z_+. (A similar observation was made by Michaels &
Weismantel [11] for a closely-related family of polytopes.)
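As a quick check (ours, not from the paper), each such inequality is the secant of the parabola y11 = x1^2 through two consecutive integer points:

    x1 = t:      (2t + 1)t − t(t + 1) = t^2;
    x1 = t + 1:  (2t + 1)(t + 1) − t(t + 1) = (t + 1)^2;

and for every integer x1,

    x1^2 − (2t + 1)x1 + t(t + 1) = (x1 − t)(x1 − t − 1) ≥ 0,

so each inequality is valid at every point of F^+_1 and tight exactly at x1 ∈ {t, t + 1}.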

[Figure: IQ^+_1 in the (x1, y11)-plane. The extreme points (t, t^2) for t = 0, 1, 2, 3, 4
lie on the parabola y11 = x1^2, and the facets are the secant segments joining
consecutive extreme points.]

Fig. 1. The convex set IQ^+_1

The second technical issue is that IQ^+_n is, in fact, not a polyhedron. A polyhe-
dron is defined as the intersection of a finite number of half-spaces, but we have
seen that IQ^+_1 is the intersection of a countably infinite number of half-spaces.
(The results that we give in Sect. 4 show that the same holds when n > 1 as
well.) The correct term for such sets is quasi-polyhedra (see, e.g., Anderson et
al. [2]). Fortunately, this issue does not cause any difficulty in what follows.

For the purposes of what follows, we introduce a closely-related family of
quasi-polyhedra, obtained by omitting the non-negativity requirement. Specifi-
cally, we define

    F_n := { (x, y) ∈ Z^{n+\binom{n+1}{2}} : yij = xi xj (1 ≤ i ≤ j ≤ n) },

and then let IQ_n denote the closure of the convex hull of F_n. One can check that
IQ_1 is described by the inequalities y11 ≥ (2t + 1)x1 − t(t + 1) for all t ∈ Z.
Next, we present two simple complexity results:

Proposition 2. Minimising a linear function over IQ^+_n is NP-hard in the
strong sense.

Proof. It follows from the above discussion that this problem is equivalent to
IQP. Now, IQP is clearly NP-hard in the strong sense, since it contains Integer
Linear Programming as a special case. □


Proposition 3. Minimising a linear function over IQ_n is NP-hard in the strong
sense.

Proof. The well-known Closest Vector Problem, proven to be strongly NP-hard
by van Emde Boas [8], takes the form:

    min { ‖Bx − t‖_2 : x ∈ Z^n },

where B ∈ Z^{n×n} is a basis matrix and t ∈ Q^n is a target point. Clearly, squar-
ing the objective function leaves the optimal solution unchanged. The resulting
problem is one of minimising a quadratic function over the integer lattice Z^n. It
follows from the definitions that this is equivalent to minimising a linear function
over IQ_n. □

We therefore cannot expect to obtain complete linear descriptions of IQ_n or IQ^+_n
for general n.

On a more positive note, we have the following result:

Proposition 4. Minimising a linear function over IQ_n is solvable in polynomial
time when n is fixed.

Proof. As already mentioned, minimising a linear function over IQ_n is equiva-
lent to minimising a quadratic function over the integer lattice Z^n. Now, if the
function is not convex, the problem is easily shown to be unbounded. If, on the
other hand, the function is convex, then the problem can be solved for fixed n
by the algorithm of Khachiyan & Porkolab [9]. □

There is therefore some hope of obtaining a complete linear description of IQ_n
for small values of n (just as we have already done for the case n = 1). We do
not know the complexity of minimising a linear function over IQ^+_n for fixed n.

3 Fundamental Properties of the Quasi-polyhedra


In this section, we establish some fundamental properties of the quasi-polyhedra
IQn and IQ+ n.

3.1 Dimension and Extreme Points


We begin with two elementary results:

Proposition 5. For all n, both IQ_n and IQ^+_n are full-dimensional, i.e., of
dimension n + \binom{n+1}{2}.

Proof. Consider the following extreme points of IQ^+_n:

– the origin (i.e., all variables set to zero);
– for i = 1, . . . , n, the point having xi = yii = 1 and all other variables zero;
– for i = 1, . . . , n, the point having xi = 2, yii = 4 and all other variables zero;
– for 1 ≤ i < j ≤ n, the point having xi = xj = 1, yii = yjj = yij = 1, and all
  other variables zero.
 
These n + \binom{n+1}{2} + 1 points are easily shown to be affinely independent, and
therefore IQ^+_n is full-dimensional. Since IQ^+_n is contained in IQ_n, the same is
true for IQ_n. □

Proposition 6. Every point in F_n is an extreme point of IQ_n, and every point
in F^+_n is an extreme point of IQ^+_n.

Proof. Let x̄ be an arbitrary point in Z^n, and let (x̄, ȳ) be the corresponding
member of F_n. The quadratic function Σ_{i=1}^n (xi − x̄i)^2 has a unique minimum
at x = x̄. Since every point in F_n satisfies yij = xi xj for all 1 ≤ i ≤ j ≤ n, the
linear function Σ_{i=1}^n (yii − 2x̄i xi + x̄i^2) has a unique minimum at (x̄, ȳ). Therefore
(x̄, ȳ) is an extreme point of IQ_n. The proof for IQ^+_n is similar. □


3.2 Affine Symmetries


Now we examine the affine symmetries of the quasi-polyhedra, i.e., affine trans-
formations that map the quasi-polyhedra onto themselves.

Proposition 7. Let π be an arbitrary permutation of the index set {1, . . . , n}.
Consider the linear transformation that takes any (x, y) ∈ R^{n+\binom{n+1}{2}} and
maps it to a point (x′, y′) ∈ R^{n+\binom{n+1}{2}}, where

– x′_i = x_{π(i)} for all i ∈ {1, . . . , n},
– y′_ij = y_{π(i),π(j)} for all 1 ≤ i ≤ j ≤ n.

This transformation maps IQ^+_n onto itself.

Proof. Trivial. □

Theorem 1. Let U be a unimodular integral square matrix of order n, and let
w ∈ Z^n be an arbitrary integer vector. Consider the affine transformation that
takes any (x, y) ∈ R^{n+\binom{n+1}{2}} and maps it to a point (x′, y′) ∈ R^{n+\binom{n+1}{2}},
where

– x′ = U x + w;
– y′_ij = x′_i x′_j for all 1 ≤ i ≤ j ≤ n.

This transformation maps IQ_n onto itself.

Proof. Let (x, y) be an extreme point of IQ_n, and let (x′, y′) be its image under
the transformation. Since U and w are integral, x′ is integral. Moreover, since
y′_ij = x′_i x′_j, (x′, y′) is an extreme point of IQ_n. For the reverse direction, let
(x′, y′) be an extreme point of IQ_n, and let (x, y) be its image under the inverse
transformation. Note that x = U^{−1}(x′ − w), and is therefore integral. Moreover,
yij = xi xj for all 1 ≤ i ≤ j ≤ n, which implies that (x, y) is an extreme point of
IQ_n. □

Proposition 7 simply states that IQ+ n is invariant under a permutation of the
index set {1, . . . , n}, which is unsurprising. Theorem 1, on the other hand, has
a very useful corollary:

Corollary 1. Let U be a unimodular integral square matrix of order n, and let
w ∈ Z^n be an arbitrary integer vector. If the linear inequality α^T x + β^T y ≥ γ is
facet-inducing for IQ_n, then so is the linear inequality α^T x′ + β^T y′ ≥ γ, where
(x′, y′) is defined as in Theorem 1.

Intuitively speaking, this means that any inequality inducing a facet of IQ_n can
be ‘rotated’ and ‘translated’ to yield a countably infinite family of facet-inducing
inequalities.
It is also possible to convert any facet-inducing inequality for IQn into a facet-inducing inequality for IQ+n:

Theorem 2. Suppose the inequality αT x + βT y ≥ γ induces a facet of IQn. Then there exists a vector v ∈ Zn+ such that the inequality αT x' + βT y' ≥ γ induces a facet of IQ+n, where:

– x' = x − v;
– y'_ij = x'_i x'_j for all 1 ≤ i ≤ j ≤ n.
 
Proof. Let $d = n + \binom{n+1}{2}$. Since the original inequality αT x + βT y ≥ γ induces a facet of IQn, there exist d affinely-independent members of Fn that satisfy it at equality. Let x¹, . . . , x^d denote the corresponding x vectors. Now, for i = 1, . . . , n, set v_i to min_{1≤j≤d} x^j_i. The resulting transformed inequality αT x' + βT y' ≥ γ induces a facet of IQn by Theorem 1. It is also valid for IQ+n, since IQ+n is contained in IQn. Moreover, the points x¹ − v, . . . , x^d − v all lie in the non-negative orthant by construction. These points correspond to affinely-independent members of Fn+ that satisfy the transformed inequality at equality. Therefore the transformed inequality induces a facet of IQ+n. □


Therefore, any inequality inducing a facet of IQn yields a countably infinite family of facet-inducing inequalities for IQ+n as well.

3.3 Two Related Cones

Recall that a symmetric matrix M ∈ Rⁿˣⁿ is called positive semidefinite (psd) if it can be factorised as AAT for some real matrix A. The set of psd matrices of order n forms a convex cone in Rⁿˣⁿ. It is well known that this cone is completely described by the linear inequalities vT M v ≥ 0 for all vectors v ∈ Rn.
We now use a standard construction [10,17] to establish a connection between IQn and the psd cone. Define the n × n symmetric matrix Y = xxT, and note that, for any 1 ≤ i ≤ j ≤ n, Yij = yij. Define also the augmented matrix

$$\hat{Y} := \begin{pmatrix} 1 \\ x \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix}^T = \begin{pmatrix} 1 & x^T \\ x & Y \end{pmatrix} .$$

Since Ŷ is the product of a vector and its transpose, it must be psd. Equivalently, vT Y v + (2s)vT x + s² ≥ 0 for all vectors v ∈ Rn and scalars s ∈ R. This observation immediately yields the following result:



Proposition 8. The following ‘psd inequalities’ are valid for IQn (and therefore also for IQ+n):

$$(2s)v^T x + \sum_{i=1}^n v_i^2\, y_{ii} + 2\sum_{1\le i<j\le n} v_i v_j\, y_{ij} + s^2 \;\ge\; 0 \qquad (\forall v \in \mathbb{R}^n,\ s \in \mathbb{R}) . \tag{2}$$

To the knowledge of the author, the validity of the psd inequalities for extended formulations of quadratic optimisation problems was first observed by Ramana [15]. The inequalities can be shown to induce proper faces of IQn and IQ+n under mild conditions. We will see in the next section, however, that they never induce facets.
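As a quick numerical sanity check of this construction (our own illustration, not taken from the paper), one can verify in Python that Ŷ has no negative eigenvalues for a random lattice point, and that each psd inequality (2), being just (vT x + s)² ≥ 0, holds for random v and s:

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-5, 6, size=4).astype(float)   # an arbitrary lattice point
Y = np.outer(x, x)                              # Y = x x^T

# The augmented matrix Yhat is psd (all eigenvalues >= 0, up to rounding):
Y_hat = np.block([[np.array([[1.0]]), x[None, :]],
                  [x[:, None],        Y        ]])
assert np.all(np.linalg.eigvalsh(Y_hat) >= -1e-9)

# Spot-check inequality (2): v^T Y v + 2s v^T x + s^2 = (v^T x + s)^2 >= 0.
for _ in range(1000):
    v, s = rng.normal(size=4), rng.normal()
    assert v @ Y @ v + 2 * s * (v @ x) + s**2 >= -1e-9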
Now recall that a symmetric matrix M ∈ Rⁿˣⁿ is called completely positive if it can be factorised as AAT for some non-negative real matrix A. The set of completely positive matrices of order n also forms a convex cone in Rⁿˣⁿ. Using exactly the same argument as above, any valid inequality for the completely positive cone yields a valid inequality for IQ+n. Unfortunately, this additional information does not help us much, because a complete linear description of the completely positive cone is unknown, and unlikely to be found for general n [12].

3.4 A Connection to the Boolean Quadric Polytope

We close this section by pointing out a connection between IQn, IQ+n and the so-called boolean quadric polytope. The boolean quadric polytope of order n is denoted by BQPn and is defined as:

$$BQP_n = \operatorname{conv}\Bigl\{ (x, y) \in \{0,1\}^{n+\binom{n}{2}} : y_{ij} = x_i x_j \ (1 \le i < j \le n) \Bigr\} .$$

Note that the yii variables are not present in the case of BQPn.
The boolean quadric polytope was defined by Padberg [14] in the context of
quadratic 0-1 programming. It has many applications in other fields and has
been studied in great depth [7].
We will need the following lemma:
Lemma 1. For all 1 ≤ i ≤ n, the inequality yii ≥ xi is valid for IQn.

Proof. This follows from the fact that all members of Fn satisfy yii = xi², and the fact that t² ≥ t for any integer t. □


The following proposition states that BQPn is essentially nothing but a face of
IQn :
Proposition 9. Let H be the face of IQn obtained by setting the inequality
yii ≥ xi to an equation for all 1 ≤ i ≤ n. The boolean quadric polytope BQPn is
an affine image of H.

Proof. Note that t² = t if and only if t ∈ {0, 1}. Therefore, the extreme points of H are precisely the members of Fn that satisfy x ∈ {0, 1}ⁿ. So there is a one-to-one correspondence between extreme points of H and extreme points of BQPn. Moreover, every extreme point (x*, y*) of BQPn can be mapped onto an extreme point of H simply by setting yii = x*_i for all i = 1, . . . , n. This mapping is affine. □

An immediate consequence of Proposition 9 is that valid or facet-inducing inequalities for BQPn can be lifted to yield valid or facet-inducing inequalities for IQn:

Corollary 2. Suppose the inequality

$$\sum_{i=1}^n a_i x_i + \sum_{1\le i<j\le n} b_{ij}\, y_{ij} \le c$$

induces a facet of BQPn. Then there exists at least one facet-inducing inequality for IQn of the form

$$\sum_{i=1}^n (a_i - \lambda_i) x_i + \sum_{i=1}^n \lambda_i\, y_{ii} + \sum_{1\le i<j\le n} b_{ij}\, y_{ij} \le c ,$$

with λ ∈ Qn.

Similar results can be shown to hold for IQ+n.

4 Some Facet-Inducing Inequalities


We now move on to consider some specific classes of facet-inducing inequalities.

4.1 Non-negativity Inequalities


Since IQ+n is contained in the completely positive cone, it is clear that all variables are constrained to be non-negative. The following theorem states conditions under which non-negativity inequalities induce facets of IQ+n:

Theorem 3. The inequalities xi ≥ 0 for all 1 ≤ i ≤ n, and the inequalities yij ≥ 0 for all 1 ≤ i < j ≤ n, induce facets of IQ+n. The inequalities of the form yii ≥ 0, on the other hand, never induce facets of IQ+n.

Proof. To see that the inequalities of the form yij ≥ 0 induce facets, simply note that all but one of the affinely-independent points listed in the proof of Proposition 5 satisfy yij = 0. To see that the inequalities of the form yii ≥ 0 do not induce facets, simply note that they are dominated by the inequalities xi ≥ 0 and yii ≥ xi (refer to Fig. 1). The inequalities of the form xi ≥ 0 are a little more tricky: one can easily construct $n + \binom{n}{2}$ affinely-independent points with xi = 0, but to complete the proof one needs an additional n extreme rays of IQ+n having xi = 0. The proof of Proposition 1 shows that there is a ray with yii = 1 and all other variables zero. Using a similar argument, one can show that, for all j ≠ i, there is a ray with xj = yij = 1 and all other variables zero. □

The non-negativity inequalities are of course not valid for IQn .

4.2 Split Inequalities


In this subsection, we introduce a more interesting class of inequalities, valid for both IQ+n and IQn. Before presenting them, we recall the definition of split disjunctions, taken from [6]. A split disjunction is a disjunction of the form (vT x ≤ s) ∨ (vT x ≥ s + 1), where v ∈ Zn and s ∈ Z. Split disjunctions are obviously satisfied by all lattice points x ∈ Zn. An example of a split disjunction is illustrated in Fig. 2.

[Figure omitted: lattice points of the (x1, x2)-plane with two parallel lines separating the lattice into the two sides of the disjunction.]

Fig. 2. The split disjunction (x1 − 2x2 ≤ −2) ∨ (x1 − 2x2 ≥ −1)

The following proposition uses split disjunctions to derive an infinite family of valid inequalities:

Proposition 10. For any vector v ∈ Zn and scalar s ∈ Z, the following ‘split’ inequality is valid for both IQn and IQ+n:

$$(2s+1)v^T x + \sum_{i=1}^n v_i^2\, y_{ii} + 2\sum_{1\le i<j\le n} v_i v_j\, y_{ij} + s(s+1) \;\ge\; 0 . \tag{3}$$

Proof. The split disjunction (vT x ≤ −s − 1) ∨ (vT x ≥ −s) implies the quadratic inequality (vT x + s)(vT x + s + 1) ≥ 0. Expanding this and substituting Y for xxT yields vT Y v + (2s + 1)vT x + s(s + 1) ≥ 0, which is equivalent to the inequality (3). □
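The linearisation step in this proof is mechanical and can be reproduced symbolically; the following sketch (our own check, using SymPy) expands (vT x + s)(vT x + s + 1) for n = 2 and confirms that, after substituting yij for xi xj, the result is exactly the left-hand side of (3):

import sympy as sp

x1, x2, s, v1, v2 = sp.symbols('x1 x2 s v1 v2')
vTx = v1*x1 + v2*x2

# The quadratic implied by the split disjunction:
expanded = sp.expand((vTx + s) * (vTx + s + 1))

# Left-hand side of (3) for n = 2, with y11 = x1**2, y12 = x1*x2, y22 = x2**2:
lhs = ((2*s + 1)*vTx
       + v1**2 * x1**2 + v2**2 * x2**2 + 2*v1*v2*x1*x2
       + s*(s + 1))
assert sp.simplify(expanded - lhs) == 0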



We remark that an important class of cutting planes for Mixed-Integer Linear Programs, called split cuts, can be derived using split disjunctions [3,6]. It is important to note, however, that the split inequalities (3) are not split cuts in the traditional sense. Indeed, split cuts arise from the interaction between a split disjunction and a set of linear constraints, whereas the split inequalities (3) are directly implied by the disjunctions themselves.
One can check that IQ1 is completely described by the split inequalities, and that IQ+1 is completely described by the split inequalities together with the non-negativity inequality x1 ≥ 0. The following three theorems give further evidence that split inequalities are theoretically strong:

Theorem 4. The split inequalities (3) dominate the psd inequalities (2).

Proof. First, suppose that a psd inequality is derived using an integral vector v and an integral scalar s. Recall that the psd inequality can be written as vT Y v + (2s)vT x + s² ≥ 0. This is dominated by the two inequalities vT Y v + (2s + 1)vT x + s(s + 1) ≥ 0 and vT Y v + (2s − 1)vT x + s(s − 1) ≥ 0, which are both split inequalities.
To complete the proof, we must show that the psd inequalities derived from integral v and s dominate all the others. Suppose a point (x*, y*) violates a psd inequality with non-integral v or s, and let ε be a small positive quantity. Let v' be a rational vector such that |v'_i − v_i| < ε for all i, and let s' be a rational number such that |s' − s| < ε. Provided ε is small enough, the psd inequality obtained by using v' and s' in place of v and s will also be violated by (x*, y*). Now let M be a positive integer such that Mv' ∈ Zn and Ms' ∈ Z. The psd inequality with Mv' and Ms' in place of v and s will also be violated by (x*, y*). Therefore the original psd inequality is redundant. □




Theorem 5. Split inequalities induce facets of IQn if the non-zero components of v are relatively prime.

Proof. First, note that the trivial inequality y11 ≥ x1 is a split inequality, obtained by linearising the quadratic inequality (x1 − 1)x1 ≥ 0. This trivial split inequality induces a facet of IQn, because all but one of the affinely-independent points listed in the proof of Proposition 5 satisfy y11 = x1.
Now consider a non-trivial split inequality of the form (3), and assume that the non-zero components of v are relatively prime. A well-known result on integral matrices (see, e.g., p. 15 of Newman [13]) implies that there exists a unimodular matrix U ∈ Zⁿˣⁿ having v as its first row. Let U be such a matrix, and let w ∈ Zn be an arbitrary vector satisfying w1 = s + 1. Note that, if (x, y) is an extreme point of IQn and (x', y') is the transformed extreme point described in Theorem 1, then x'_1 = vT x + s + 1 and y'_11 = (x'_1)² = vT Y v + 2(s + 1)vT x + (s + 1)². Thus, if we apply the transformation mentioned in Corollary 1 to the trivial split inequality y11 ≥ x1, we obtain the inequality vT Y v + 2(s + 1)vT x + (s + 1)² ≥ vT x + s + 1. This is equivalent to the non-trivial split inequality. By Corollary 1, it induces a facet of IQn. □


Theorem 6. Split inequalities induce facets of IQ+n if the non-zero components of v are relatively prime and not all of the same sign.

Proof. First, note that when v satisfies the stated condition, there exists a vector w ∈ Zn such that vT w = 0 and such that wi > 0 for all i. To see this, let k and k' be the number of components of v that are positive and negative, respectively, and let m be the product of the non-zero components of v. The desired vector w can be obtained by setting wi to k'|m|/vi when vi > 0, to k|m|/|vi| when vi < 0, and to 1 otherwise.
Second, observe that an extreme point (x̄, ȳ) of IQn satisfies the split inequality (3) at equality if and only if vT x̄ ∈ {−s − 1, −s}. Therefore, if (x̄, ȳ) is such an extreme point, then so is the extreme point obtained by replacing x̄ with x̄ + w, and adjusting ȳ accordingly. Let us call this (affine) transformation ‘shifting’.
Now, since the split inequality induces a facet of IQn under the stated conditions, there exist $n + \binom{n+1}{2}$ affinely-independent points in Fn that satisfy the split inequality at equality. By shifting this set of points, repeatedly if necessary, we obtain $n + \binom{n+1}{2}$ affinely-independent points in Fn+ that satisfy the split inequality at equality. Therefore the split inequality induces a facet of IQ+n as well. □
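The vector w used for ‘shifting’ in the proof above is entirely explicit, so it is easy to compute; the sketch below (our own) implements the construction from the first paragraph of the proof and checks that w is strictly positive with vT w = 0:

import numpy as np

def shifting_vector(v):
    """Build w > 0 with v^T w = 0, as in the proof of Theorem 6."""
    v = np.asarray(v)
    pos, neg = v > 0, v < 0
    k, k_prime = pos.sum(), neg.sum()     # number of positive / negative components
    m = abs(np.prod(v[v != 0]))           # |m|, m = product of the non-zero components
    w = np.ones_like(v)                   # w_i = 1 when v_i = 0
    w[pos] = k_prime * m // v[pos]        # w_i = k'|m| / v_i   for v_i > 0
    w[neg] = k * m // np.abs(v[neg])      # w_i = k |m| / |v_i| for v_i < 0
    return w

v = np.array([2, -3, 0, 1])               # arbitrary test vector with both signs
w = shifting_vector(v)
assert np.all(w > 0) and v @ w == 0
print(w)                                  # [3 4 1 6]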


If the non-zero components of the vector v all have the same sign, then the split inequality need not induce even a proper face of IQ+n (because there may not exist a lattice point x ∈ Zn+ such that vT x ∈ {−s − 1, −s}). Theorem 2, however, implies the following result:


Corollary 3. Let v ∈ Zn be such that all its components are relatively prime and of the same sign. Then there exists an integer s, of the opposite sign, such that the split inequality (3) induces a facet of IQ+n.

To close this subsection, we remark that Propositions 9 and 10 imply the validity of the following inequalities for BQPn:

$$\sum_{i=1}^n v_i(v_i + 2s + 1)\, x_i + 2\sum_{1\le i<j\le n} v_i v_j\, y_{ij} + s(s+1) \;\ge\; 0 \qquad (\forall v \in \mathbb{Z}^n,\ s \in \mathbb{Z}) . \tag{4}$$

These inequalities were discovered by Boros & Hammer [4].

4.3 Other Inequalities

We have seen that IQ+1 is completely described by the split inequalities and the non-negativity inequality x1 ≥ 0. A natural question is whether the split and non-negativity inequalities are enough to describe IQ+2. This is unfortunately not the case, as we now explain.
Consider the two lines in R² defined by the equations x1 + x2 = 3 and x1 + 2x2 = 4. As illustrated in Fig. 3, these lines pass through several points in Z²+. Moreover, all points in Z²+ are either above both lines (satisfying x1 + x2 ≥ 3 and x1 + 2x2 ≥ 4), or below both lines (satisfying x1 + x2 ≤ 3 and x1 + 2x2 ≤ 4). This implies that all points in F2+ satisfy the non-linear inequality (x1 + x2 − 3)(x1 + 2x2 − 4) ≥ 0, and therefore that the linear inequality

−7x1 − 10x2 + y11 + 3y12 + 2y22 ≥ −12

is valid for IQ+2. One can check (either by hand or with the aid of a computer) that this inequality induces a facet of IQ+2.
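Since the argument rests on every point of Z²+ lying on the same (weak) side of both lines, a brute-force check is straightforward; the sketch below (our own) verifies the product inequality, and hence the linearised inequality, over a finite grid:

# Check validity of the 'non-standard' split inequality over a grid of Z_+^2.
for x1 in range(50):
    for x2 in range(50):
        # Every point of Z_+^2 satisfies (x1 + x2 - 3)(x1 + 2*x2 - 4) >= 0 ...
        assert (x1 + x2 - 3) * (x1 + 2*x2 - 4) >= 0
        # ... equivalently, with y11 = x1^2, y12 = x1*x2, y22 = x2^2:
        y11, y12, y22 = x1*x1, x1*x2, x2*x2
        assert -7*x1 - 10*x2 + y11 + 3*y12 + 2*y22 >= -12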
Using ‘non-standard’ split disjunctions of this kind, one can easily derive other inequalities that induce facets of IQ+n for n ≥ 2. Details will be given in the full version of the paper.

[Figure omitted: the lattice points of Z²+ together with the two lines x1 + x2 = 3 and x1 + 2x2 = 4.]

Fig. 3. A ‘non-standard’ split when n = 2

Turning attention to IQn, we have seen that IQ1 is completely described by split inequalities. A natural question is whether the split inequalities are enough to describe IQ2. We do not know the answer to this question. We are however able to show that the split inequalities do not completely describe IQ6. Indeed, one can show using results in [7] that the boolean quadric polytope BQP6 is not completely described by the Boros-Hammer inequalities (4). This implies, via Corollary 2, that there exist facet-inducing inequalities for IQ6 that are not split inequalities. Specific inequalities of this kind will be presented in the full version of the paper.

5 Concluding Remarks

This paper marks a first step in applying polyhedral methods to Integer Quadratic Programs. There are many interesting open questions. We have already mentioned the question of whether one can optimise a linear function over IQ+n in polynomial time for fixed n, and whether the split inequalities completely describe IQ2. Another important question is whether the separation problem for the split inequalities can be solved in polynomial time.
Perhaps more importantly, it would be worthwhile extending the approach
given in this paper to the mixed-integer case. Some preliminary observations on
this case are the following. First, one has to deal with general convex sets rather
than quasi-polyhedra, since the number of feasible solutions is no longer count-
able. Second, the split inequalities should be defined only when the components
of the vector v are zero for all continuous variables, since otherwise they may not
be valid. Third, it is no longer the case that the psd inequalities are dominated
by the split inequalities. Indeed, if the vector v has a non-zero component for at
least one continuous variable, it is even possible for a psd inequality to induce
a maximal face of the convex set. Details will be given in the full version of the
paper.

Acknowledgement. The author was supported by the Engineering and Phys-


ical Sciences Research Council under grant EP/D072662/1.

References
1. Aardal, K.I., Weismantel, R.: Polyhedral combinatorics. In: Dell’Amico, M., Maffioli, F., Martello, S. (eds.) Annotated Bibliographies in Combinatorial Optimization. Wiley, New York (1997)
2. Anderson, E.J., Goberna, M.A., López, M.A.: Simplex-like trajectories on quasi-
polyhedral sets. Math. Oper. Res. 26, 147–162 (2001)
3. Balas, E.: Disjunctive programming. Ann. Discr. Math. 5, 3–51 (1979)
4. Boros, E., Hammer, P.L.: Cut-polytopes, Boolean quadric polytopes and nonneg-
ative quadratic pseudo-Boolean functions. Math. Oper. Res. 18, 245–253 (1993)
5. Burer, S., Letchford, A.N.: On non-convex quadratic programming with box con-
straints. SIAM J. Opt. 20, 1073–1089 (2009)
6. Cook, W., Kannan, R., Schrijver, A.: Chvátal closures for mixed integer program-
ming problems. Math. Program. 47, 155–174 (1990)
7. Deza, M.M., Laurent, M.: Geometry of Cuts and Metrics. Springer, Berlin (1997)
8. van Emde Boas, P.: Another NP-complete problem and the complexity of com-
puting short vectors in a lattice. Technical Report 81-04, Mathematics Institute,
University of Amsterdam (1981)
9. Khachiyan, L., Porkolab, L.: Integer optimization on convex semialgebraic sets.
Discr. Comput. Geom. 23, 207–224 (2000)
10. Lovász, L., Schrijver, A.J.: Cones of matrices and set-functions and 0-1 optimiza-
tion. SIAM J. Opt. 1, 166–190 (1991)
11. Michaels, D., Weismantel, R.: Polyhedra related to integer-convex polynomial sys-
tems. Math. Program. 105, 215–232 (2006)
12. Murty, K.G., Kabadi, S.N.: Some NP-complete problems in quadratic and nonlinear programming. Math. Program. 39, 117–129 (1987)
13. Newman, M.: Integral Matrices. Academic Press, New York (1972)
14. Padberg, M.W.: The boolean quadric polytope: some characteristics, facets and
relatives. Math. Program. 45, 139–172 (1989)
15. Ramana, M.: An Algorithmic Analysis of Multiquadratic and Semidefinite Pro-
gramming Problems. PhD thesis, Johns Hopkins University, Baltimore, MD (1993)
16. Schrijver, A.: Combinatorial Optimization: Polyhedra and Efficiency. Springer,
Berlin (2003)
17. Sherali, H.D., Adams, W.P.: A Reformulation-Linearization Technique for Solving
Discrete and Continuous Nonconvex Problems. Kluwer, Dordrecht (1998)
18. Yajima, Y., Fujie, T.: A polyhedral approach for nonconvex quadratic programming
problems with box constraints. J. Global Opt. 13, 151–170 (1998)
An Integer Programming and Decomposition
Approach to General Chance-Constrained
Mathematical Programs

James Luedtke

Department of Industrial and Systems Engineering


University of Wisconsin-Madison, Madison, WI 53706, USA
[email protected]

Abstract. We present a new approach for exactly solving general chance


constrained mathematical programs having discrete distributions. Such
problems have been notoriously difficult to solve due to nonconvexity of
the feasible region, and currently available methods are only able to find
provably good solutions in certain very special cases. Our approach uses
both decomposition, to enable processing subproblems corresponding to
one possible outcome at a time, and integer programming techniques, to
combine the results of these subproblems to yield strong valid inequal-
ities. Computational results on a chance-constrained two-stage problem
arising in call center staffing indicate the approach works significantly
better than both an existing mixed-integer programming formulation
and a simple decomposition approach that does not use strong valid in-
equalities. Thus, the strength of this approach results from the successful
merger of stochastic programming decomposition techniques with inte-
ger programming techniques for finding strong valid inequalities.

Keywords: Stochastic programming, integer programming, chance


constraints, probabilistic constraints, decomposition.

1 Introduction
We introduce a new approach for exactly solving general chance-constrained
mathematical programs (CCMPs). A chance constraint states that the chosen
decision vector should, with high probability, lie within a region that depends
on a set of random variables. A generic CCMP can be stated as
 
$$\min\bigl\{\, f(x) \;\bigm|\; \mathbb{P}\{x \in P(\omega)\} \ge 1 - \epsilon,\ x \in X \,\bigr\} , \tag{1}$$

where x ∈ Rn is the vector of decision variables to be chosen to minimize f(x), ω is a random vector and P(ω) ⊆ Rn is a region parameterized by ω. The interpretation is that the region P(ω) is defined such that the event x ∉ P(ω) is an undesirable outcome. The parameter ε ∈ (0, 1) is a risk tolerance, typically small, that limits the likelihood of such an outcome. A problem with uncertain linear constraints is the special case of this problem in which P(ω) = {x | T(ω)x ≥ b(ω)}


and a two-stage problem with the possibility to take recourse after observing the
random outcome has P (ω) = {x | ∃y with T (ω)x + W (ω)y ≥ b(ω)}. (In §5.1 we
describe an example application.)
Our approach works for CCMPs with discrete (and finite support) distribution. Specifically, we assume that P{ω = ωk} = 1/N for k = 1, . . . , N.¹ We refer to the possible outcomes as scenarios. While this is a restriction, recent results on using sample average approximations on problems with general distributions [1] demonstrate that such finite support approximations, when obtained from a Monte Carlo sample of the original distribution, can be used to find good feasible solutions to the original problem and statistical bounds on solution quality. We also assume that the sets Pk := P(ωk) are polyhedra described by

$$P_k = \bigl\{\, x \in \mathbb{R}^n_+ \;\bigm|\; \exists\, y \in \mathbb{R}^d_+ \text{ with } T^k x + W^k y \ge b^k \,\bigr\} , \tag{2}$$

where b^k ∈ Rm and T^k and W^k are appropriately sized matrices. The special case with d = 0 yields a mathematical program with chance-constrained linear constraints having random coefficients: P{T(ω)x ≥ b(ω)} ≥ 1 − ε. We omit the details here, but our approach can be extended to the case in which P(ω) is convex, provided we have oracles for separation and optimization over P(ω).
CCMPs have a long history dating back to Charnes, Cooper and Symonds
The general version considered here, which requires a system of constraints to hold jointly with high probability, was introduced by Prékopa [3]. However,
solution of the general problem (1) has remained computationally challenging for
two reasons: the feasible region is generally not convex, and evaluating solution
feasibility requires multi-dimensional integration. As discussed above, the latter
difficulty can be addressed by a sample-average approximation approach. How-
ever, this approach still requires a computationally efficient method for solving
the resulting approximation problem which still has the form (1), except that
the probability distribution is simplified to one with finite support.
Methods for obtaining provably good solutions for CCMPs have been suc-
cessful in only a couple very special cases. If the chance constraint consists of
a single row and all random coefficients are normally distributed [4,5], then
a deterministic (nonlinear and convex) reformulation is possible. If the ran-
domness appears only in the right-hand side of the chance constraints (i.e.,
P (ω) = {x | T x ≥ b(ω)}) and the random vectors b(ω) have continuous and
log-concave distributions, the resulting feasible region is convex and so nonlin-
ear programming techniques can be used [3]. If the randomness appears only
in the right-hand side and the distribution of b(ω) is discrete, then approaches
based on certain “efficient points” of the random vector [6,7] or on strong integer
programming formulations [8,9] have been proposed.
Very few methods are available for finding provably good solutions for CCMPs
with the general structure we consider here, e.g., for problems having linear con-
straints with random coefficients or two stage problems as in (2). In [10], an ap-
proach based on an integer programming formulation (which we give in §2),
¹ The extension to more general discrete distributions of the form P{ω = ωk} = pk, where pk ≥ 0 and Σk pk = 1, is straightforward and is omitted to simplify exposition.

strengthened with precedence constraints, is presented. In more recent work, [11]


presents a specialized branch-and-cut algorithm based on identification of irre-
ducible infeasible sets of certain linear inequality systems. While these are impor-
tant contributions, the size of instances that are demonstrated to be solvable with
these approaches is very limited, in particular, because these approaches do not
enable decomposition. In another recent important stream of research, a number
of conservative approximations [12,13,14,15,16,17,18] have been studied that solve
tractable (convex) approximations to yield feasible solutions to general CCMPs.
However, these approaches do not say anything about the cost of the resulting so-
lutions relative to the optimal, and tend to yield highly conservative solutions.
Our approach is important because it is an exact approach for solving prob-
lems with general chance constraints, and as we show in §5, has the potential to
solve problems with high-dimensional random parameters and a large number
of scenarios. The approach builds on the ideas of [19,8] that were very successful
for solving chance-constrained problems with random right-hand side only by
developing a method to apply the same types of valid inequalities used there to
the much more general case considered here. The other important aspect of our
approach is that it enables decomposition of the problem into single scenario
subproblems. This is important for solving CCMPs with discrete distributions
because the problem size grows as the size of the support increases. The ability
of this approach to solve large instances of this problem, even for the particular
structure of the test problem described in §5.1, is significant because, until now,
a major impediment to using a chance-constrained model has been the difficulty
in solving such problems in all but a few very special cases. The approach we
present here, when combined with the sample average approximation results of
[1], has the potential to remove this barrier.
Decomposition has long been used for solving traditional two-stage stochastic
programming problems, where the objective is to minimize the sum of costs of
the first stage decisions and the expected costs of second-stage recourse decisions
(see, e.g., [20,21,22]). For CCMPs, the only existing paper we are aware of that
considers a decomposition approach is [23] which applies a decomposition ap-
proach to a chance-constrained formulation of an application insuring vital arcs
in a critical path network. The decomposition idea is similar to what we present
here, but the mechanism for generating cuts is significantly different: they use a
convex hull reformulation (based on Relaxation-Linearization techniques) which
involves “big-M ” constants, likely leading to weak inequalities. In contrast, we
combine the valid inequalities we obtain from different subproblems in a way
that avoids the need for “big-M ” constants and hence yields strong valid in-
equalities for the overall problem. As we will see in the computational results in
§5, the use of strong valid inequalities makes a very significant difference beyond
the benefits obtained from decomposition.
The remainder of this extended abstract is organized as follows. We start with
an overview of the approach in §2. In §3 we describe how we generate strong valid
inequalities, and in §4 we describe the decomposition branch-and-cut algorithm.
Finally, we present preliminary computational results of the approach in §5.

2 Overview of the Approach


To fix notation, and motivate the approach, we first describe a standard integer programming formulation of problem (1). We also make a couple of assumptions that assure this formulation is well-defined, and that also simplify exposition of the main results in the rest of the paper. We assume without loss of generality that the sets Pk are non-empty for all k ∈ N, since we could otherwise discard such a scenario and consider a problem with risk tolerance ε′ = ε − 1/N. We also assume that the sets Pk ∩ X are compact for all k ∈ N. Finally, for notational convenience, we define the scenario index set N = {1, . . . , N}.
The standard mixed-integer programming formulation (e.g., [10]) uses a bi-
nary variable zk for each scenario k, where zk = 0 implies the constraints of
scenario k should be satisfied:

$$\begin{aligned}
\min\;& f(x) &&\text{(3a)}\\
\text{s.t. } & T^k x + W^k y^k + z_k M^k \ge b^k, \qquad k \in N, &&\text{(3b)}\\
& \textstyle\sum_{k=1}^N z_k \le p, &&\text{(3c)}\\
& x \in X,\ z \in \{0,1\}^N,\ y^k \in \mathbb{R}^d_+,\ k \in N. &&\text{(3d)}
\end{aligned}$$

Here p := ⌊εN⌋ and Mk ∈ Rm+ are sufficiently large to ensure that, when zk = 1, constraints (3b) are not active. On the other hand, when zk = 0, constraints (3b) enforce x ∈ Pk. Thus, (3c), which is a rewritten and strengthened version of the constraint

$$\frac{1}{N}\sum_{k=1}^N (1 - z_k) \;\ge\; 1 - \epsilon ,$$

successfully models the constraint P{x ∈ P(ω)} ≥ 1 − ε.

Our approach is motivated by the desire to avoid the use of big-M constants as in (3b), which are likely to lead to weak lower bounds when solving a continuous relaxation of (3), and also to use decomposition to avoid explicit introduction of the constraints (3b) and recourse variables y^k, which may make (3) very large-scale if N is large.
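For reference, the big-M formulation (3) that our approach is designed to avoid is easy to state in a modelling layer. The sketch below is our own, using the PuLP library with small randomly generated placeholder data (any LP/MIP modeller would do); it assumes a linear objective cx and X = Rn+. Since the placeholder data are non-negative, taking Mk equal to bk row-wise is large enough to deactivate (3b) when zk = 1:

import random
import pulp

random.seed(0)
n, d, m, N, eps = 2, 1, 2, 10, 0.2               # sizes and risk tolerance (placeholders)
p = int(eps * N)                                  # p = floor(eps * N)

# Non-negative placeholder data T^k, W^k, b^k and costs c:
T = [[[random.randint(1, 3) for _ in range(n)] for _ in range(m)] for _ in range(N)]
W = [[[random.randint(1, 2) for _ in range(d)] for _ in range(m)] for _ in range(N)]
b = [[random.randint(2, 6) for _ in range(m)] for _ in range(N)]
c = [random.randint(1, 5) for _ in range(n)]

prob = pulp.LpProblem("bigM_CCMP", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", lowBound=0) for i in range(n)]
y = [[pulp.LpVariable(f"y{k}_{j}", lowBound=0) for j in range(d)] for k in range(N)]
z = [pulp.LpVariable(f"z{k}", cat="Binary") for k in range(N)]

prob += pulp.lpSum(c[i] * x[i] for i in range(n))                    # objective f(x) = cx
for k in range(N):
    for i in range(m):                                               # constraints (3b)
        prob += (pulp.lpSum(T[k][i][j] * x[j] for j in range(n))
                 + pulp.lpSum(W[k][i][j] * y[k][j] for j in range(d))
                 + b[k][i] * z[k] >= b[k][i])
prob += pulp.lpSum(z) <= p                                           # constraint (3c)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))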
The goal of our approach, to avoid using big-M constraints and associated
variables, is similar in spirit to the goal of combinatorial Benders’ cuts intro-
duced by Codato and Fischetti [24]. However, we are able to take advantage of
the structural properties of the CCMP to obtain stronger valid inequalities. In
particular, the valid inequalities we use include both the “logical” zk variables
and the variables x, in contrast to the combinatorial Benders’ cuts that are
based only on the logical variables. We refer the reader to [11] for an approach
to CCMPs that has a closer connection to combinatorial Benders’ cuts.
Our decomposition algorithm is based on a master problem that includes the
original variables x, and the binary variables z. The constraints (3b) are enforced
implicitly with cutting planes, similar in spirit to a Benders’ decomposition ap-
proach. The key difference, however, is that given the mixed-integer nature of
our master problem, we seek to add cutting planes that are strong. Specifically,

we are interested in strong valid inequalities for the projection of the feasible region of (3) onto the space of the x and z variables. Specifically, we define this projection as

$$F = \bigl\{\, (x, z) \in X \times \{0,1\}^N \;\bigm|\; \exists\, y \in \mathbb{R}^{d \times N}_+ \text{ s.t. (3b)--(3c) hold} \,\bigr\} . \tag{4}$$

Note that (x, z) ∈ F if and only if x ∈ X, z ∈ {0, 1}^N satisfies (3c), and x ∈ Pk for any k with zk = 0. Given this definition of F, we can then succinctly state a reformulation of the original chance-constrained problem (1) as:

$$\min\{\, f(x) \mid (x, z) \in F \,\} . \tag{5}$$

Our algorithm solves this reformulation.


In §3 we describe how we obtain strong valid inequalities for F , for a given
set of coefficients on the x variables. Then, in §4 we describe the decomposition
approach which naturally suggests a choice for the coefficients on the x variables
that leads to a convergent branch-and-cut algorithm. In our current implemen-
tation, we use only this approach for choosing these coefficients, but we believe
that, depending on the problem structure, alternative approaches may be useful
for yielding additional strong valid inequalities.

3 Generating Strong Valid Inequalities


We now describe our procedure for generating valid inequalities of the form

$$\alpha x + \pi z \ge \beta \tag{6}$$

for the set F defined in (4), where α ∈ Rn, π ∈ RN, and β ∈ R. We assume here that the coefficients α are given, so our task is to find π and β that make (6) valid for F. In addition, given a possibly fractional solution (x̂, ẑ), our separation task is to find, if possible, π and β such that (x̂, ẑ) violates the resulting inequality.
The approach is very similar to that used in [19,8], which applies only to chance-constrained problems with random right-hand side. However, by exploiting the fact that we have assumed α to be fixed, we are able to reduce our significantly more general problem to the structure studied in [19,8] and ultimately apply the same types of valid inequalities.
The first step in our procedure is to solve the following auxiliary “single scenario” problems:

$$h_k(\alpha) := \min\bigl\{\, \alpha x \;\bigm|\; x \in P_k \cap \bar{X} \,\bigr\}, \qquad k \in N . \tag{7}$$

Here X̄ ⊆ Rn is a relaxation of the set X, i.e., X̄ ⊇ X, chosen such that Pk ∩ X̄ is non-empty and compact, guaranteeing that the above optimal values exist. The choice of X̄ represents a trade-off between the time needed to compute the values hk(α) and the strength of the resulting valid inequalities. Choosing X̄ = Rn leads to a problem for calculating hk(α) that has the fewest constraints (and presumably the shortest computation time), but choosing X̄ = X yields the strongest inequalities. In particular, if X is described as a polyhedron with additional integer restrictions on some of the variables, problem (7) would become a mixed-integer program and hence could be computationally demanding to solve, although doing so may yield significantly better valid inequalities.
Observe that the calculation of the hk(α) values decomposes by scenario and can easily be implemented in parallel. Having obtained the values hk(α) for k ∈ N, we then sort them to obtain a permutation σ of N such that

$$h_{\sigma_1}(\alpha) \ge h_{\sigma_2}(\alpha) \ge \cdots \ge h_{\sigma_N}(\alpha) .$$

Although the permutation depends on α, we suppress this dependence to simplify notation. Our first lemma uses these values to establish a set of “base” inequalities that are valid for F, which we ultimately combine to obtain stronger valid inequalities.
Lemma 1. The following inequalities are valid for F:

$$\alpha x + \bigl(h_{\sigma_i}(\alpha) - h_{\sigma_{p+1}}(\alpha)\bigr) z_{\sigma_i} \ge h_{\sigma_i}(\alpha), \qquad i = 1, \dots, p . \tag{8}$$

The proof of this result is almost identical to an argument in [19] and follows
from the observation that zk = 0 implies αx ≥ hk (α), whereas (3c) implies that
zk = 0 for at least one of the p + 1 largest values of hk (α).
Now, as was done in [19,8], we can apply the star inequalities of [25], or equivalently in this case, the mixing inequalities of [26] to “mix” the inequalities (8) and obtain additional strong valid inequalities.

Theorem 1 ([25,26]). Let T = {t1, t2, . . . , tl} ⊆ {σ1, . . . , σp} be such that h_{t_i}(α) ≥ h_{t_{i+1}}(α) for i = 1, . . . , l, where h_{t_{l+1}}(α) := h_{σ_{p+1}}(α). Then the inequality

$$\alpha x + \sum_{i=1}^{l} \bigl(h_{t_i}(\alpha) - h_{t_{i+1}}(\alpha)\bigr) z_{t_i} \ge h_{t_1}(\alpha) \tag{9}$$

is valid for F.
These inequalities are strong in the sense that, if we consider the set Y defined by

$$Y = \bigl\{\, (y, z) \in \mathbb{R} \times \{0,1\}^p \;\bigm|\; y + \bigl(h_{\sigma_i}(\alpha) - h_{\sigma_{p+1}}(\alpha)\bigr) z_{\sigma_i} \ge h_{\sigma_i}(\alpha),\ i = 1, \dots, p \,\bigr\} ,$$

then the inequalities (9), with αx replaced by y, define the convex hull of Y [25]. Furthermore, the inequalities of Theorem 1 are facet-defining for the convex hull of Y (again with y = αx) if and only if h_{t_1}(α) = h_{σ_1}(α), which suggests that when searching for a valid inequality of the form (9), one should always include σ1 ∈ T. In particular, the valid inequalities

$$\alpha x + \bigl(h_{\sigma_1}(\alpha) - h_{\sigma_i}(\alpha)\bigr) z_{\sigma_1} + \bigl(h_{\sigma_i}(\alpha) - h_{\sigma_{p+1}}(\alpha)\bigr) z_{\sigma_i} \ge h_{\sigma_1}(\alpha), \quad i = 1, \dots, p , \tag{10}$$

dominate the inequalities (8), which can be obtained by aggregating (10) with the valid inequality z_{σ1} ≤ 1, the latter weighted by $h_{\sigma_1}(\alpha) - h_{\sigma_i}(\alpha)$.

Theorem 1 presents an exponential family of valid inequalities, but given a point (x̂, ẑ), separation of these inequalities can be accomplished very efficiently. In [25] an algorithm based on finding a longest path in an acyclic graph is presented that has complexity O(p²), and [26] gives an O(p log p) algorithm. We use the algorithm of [26].
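To illustrate how such a separation can be organised, the sketch below is our own and need not coincide with the algorithm of [26]. It restricts attention to inequalities (9) with t1 = σ1 (which the facet discussion above suggests is the right place to look) and assumes ẑ ∈ [0, 1]^N: after sorting the hk(α) values, the set T minimising the left-hand side of (9) at (x̂, ẑ) is given by the running minima of ẑ along the sorted order, so a most violated inequality of this family is found in O(p log p) time:

def separate_mixing(alpha_x_hat, h, z_hat, p):
    """Return (T, violation) for a most violated inequality (9), or None.

    h[k] = h_k(alpha) and z_hat[k] for k = 0, ..., N-1; assumes z_hat in [0,1]^N.
    """
    order = sorted(range(len(h)), key=lambda k: -h[k])   # sigma: h in decreasing order
    sigma, h_tail = order[:p], h[order[p]]               # sigma_1..sigma_p, h_{sigma_{p+1}}
    hs = [h[k] for k in sigma] + [h_tail]
    T, run_min, lhs_min = [], 1.0, 0.0
    for i, k in enumerate(sigma):
        if z_hat[k] < run_min:            # new running minimum of z_hat: k joins T
            run_min = z_hat[k]
            T.append(k)
        lhs_min += (hs[i] - hs[i + 1]) * run_min   # cheapest multiplier on this segment
    violation = hs[0] - lhs_min - alpha_x_hat
    return (T, violation) if T and violation > 1e-9 else None

# Tiny example: p = 2, three scenarios; prints ([0], 2.4).
print(separate_mixing(alpha_x_hat=1.0, h=[5.0, 3.0, 1.0], z_hat=[0.4, 0.8, 0.0], p=2))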

4 Decomposition Algorithm

We are now ready to describe the branch-and-cut decomposition algorithm. The algorithm works with a master relaxation defined as follows:

$$\begin{aligned}
RP^*(N_0, N_1, C) := \min\;& f(x) &&\text{(11a)}\\
\text{s.t. } & \textstyle\sum_{k=1}^N z_k \le p,\quad (x, z) \in C,\quad x \in X,\quad z \in [0,1]^N, &&\text{(11b)}\\
& z_k = 0,\ k \in N_0, \qquad z_k = 1,\ k \in N_1 . &&\text{(11c)}
\end{aligned}$$

Here N0 is the set of binary variables currently fixed to 0, N1 is the set of binary variables currently fixed to 1, and C is the relaxation defined by all the globally valid inequalities added so far when the relaxation is solved. At the root node in the search tree, we set N0 = N1 = ∅ and C = Rn+N.
Algorithm 1 presents a simple version of the proposed approach. The algo-
rithm is a basic branch-and-bound algorithm, with branching being done on the
binary variables zk , with the only important difference being how the nodes are
processed (Step 2 in the algorithm). In this step, the current node relaxation
(11) is solved repeatedly until no cuts have been added to the description of
C or the lower bound exceeds the incumbent objective value U . Whenever an
integer feasible solution ẑ is found, and optionally otherwise, the cut separation
routine SepCuts is called. The SepCuts routine must be called when ẑ is integer
feasible to check whether the solution (x̂, ẑ) is truly feasible to the set F . The
routine is optionally called otherwise to possibly improve the lower bound.
The SepCuts routine, described in Algorithm 2, attempts to find strong violated inequalities using the approach described in §3. The key here is the method for selecting the coefficients α that are taken as given in §3. The idea is to consider all scenarios k such that ẑk = 0, so that the associated constraints x ∈ Pk are supposed to be satisfied, and for such scenarios to test whether this indeed holds. If x̂ ∈ Pk, then the condition that ẑk = 0 should imply x̂ ∈ Pk is not violated. However, if x̂ ∉ Pk, this contradicts the value of ẑk, and hence we seek to find an inequality that cuts off this infeasible solution. We therefore find an inequality, say αx ≥ β, that is facet-defining for Pk and that separates x̂ from Pk. We then use the coefficients α to generate one or more strong valid inequalities as derived in §3. While stated as two separate steps, the test of x̂ ∈ Pk (line 3) and the subsequent finding of a facet-defining inequality of Pk that cuts off x̂ if not would typically be done together. For example, if we have an inequality description of Pk (possibly in a lifted space, as in (2)), then this can be accomplished by solving an appropriate linear program. If Pk has special structure

Algorithm 1. Simple version of branch-and-cut decomposition algorithm.

 1  t ← 0, N0(0) ← ∅, N1(0) ← ∅, C ← Rn+N, OPEN ← {0}, U ← +∞;
 2  while OPEN ≠ ∅ do
 3      Step 1: Choose l ∈ OPEN and let OPEN ← OPEN \ {l};
 4      Step 2: Process node l;
 5      repeat
 6          Solve (11);
 7          if (11) is infeasible then
 8              CUTFOUND ← FALSE;
 9          else
10              Let (x̂, ẑ) be an optimal solution to (11), and lb ← RP*(N0(l), N1(l), C);
11              if ẑ ∈ {0, 1}N then
12                  CUTFOUND ← SepCuts(x̂, ẑ, C);
13                  if CUTFOUND = FALSE then U ← lb;
14              else
15                  CUTFOUND ← FALSE;
16                  Optional: CUTFOUND ← SepCuts(x̂, ẑ, C);
17              end
18          end
19      until CUTFOUND = FALSE or lb ≥ U;
20      Step 3: Branch if necessary;
21      if lb < U then
22          Choose k ∈ N such that ẑk ∈ (0, 1);
23          N0(t + 1) ← N0(l) ∪ {k}, N1(t + 1) ← N1(l);
24          N0(t + 2) ← N0(l), N1(t + 2) ← N1(l) ∪ {k};
25          OPEN ← OPEN ∪ {t + 1, t + 2}, t ← t + 2;
26      end
27  end

(such as the constraint set of a shortest path problem) it may be accomplished


with a specialized (e.g., combinatorial) algorithm.
Observe that, in line 2 of Algorithm 2, we actually test whether x̂ ∈ Pk for
any k such that ẑk < 1. To obtain a convergent algorithm, it would be sufficient
to check only those k such that ẑk = 0; we also optionally check k such that
ẑk ∈ (0, 1) in order to possibly generate additional strong valid inequalities. We
now establish that Algorithm 1 solves (5).

Theorem 2. Algorithm 1 terminates finitely, and at termination if U = +∞,


problem (5) is infeasible, otherwise U is the optimal value of (5).

Proof (Sketch). The details of the proof are left out of this extended abstract.
However, the first main point is that the algorithm terminates finitely because it
is based on branching on a finite number of binary variables, and the processing
of each node terminates finitely because the valid inequalities are derived from
a finite number of facet-defining inequalities (for the polyhedral sets Pk ). The

Algorithm 2. Cut separation routine SepCuts(x̂, ẑ, C).

Data: x̂, ẑ, C
Result: If valid inequalities for F are found that are violated by (x̂, ẑ), adds these to the description of C and returns TRUE; else returns FALSE.

 1  CUTFOUND ← FALSE;
 2  for k ∈ N such that ẑk < 1 do
 3      if x̂ ∉ Pk then
 4          Separate x̂ from Pk: find an inequality αx ≥ β that is facet-defining for Pk such that αx̂ < β;
 5          Using the coefficients α, find a valid inequality for F of the form (9) that is violated by (x̂, ẑ) and add it to the description of C;
 6          CUTFOUND ← TRUE;
 7          Optionally break;
 8      end
 9  end
10  return CUTFOUND;

second point is that the algorithm never cuts off an optimal solution, because the branching never excludes part of the feasible region and only valid inequalities for the set F are added. The final point is that no solutions outside the feasible region F are accepted for updating the incumbent objective value U (in line 13 of the algorithm), because the SepCuts routine is always called for integer feasible solutions ẑ, and it can be shown that it is guaranteed to find a violated inequality if (x̂, ẑ) ∉ F. □

Aside from solving the master relaxation (11), the main work of Algorithm 1 happens within the SepCuts routine. An advantage of this approach is that most of this work is done for one scenario at a time and can be implemented to run in parallel. In particular, checking whether x̂ ∈ Pk (and finding a violated facet-defining inequality if not) for any k such that ẑk < 1 can be done in parallel. The subsequent work of generating a strong valid inequality is dominated by the calculation of the values hk(α) as in (7), which can also be done in parallel.
We have stated our approach in relatively simple form in Algorithm 1. How-
ever, as this approach is essentially a variant of branch-and-cut for solving a
(particularly structured) integer programming problem, we can also use all the
computational enhancements commonly used in such algorithms. In particular,
using heuristics to find good feasible solutions early and using some sort of
pseudocost branching [27], strong branching [28], or reliability branching [29]
approach for choosing which variable to branch on would be important. In our
implementation (described in §5.2) we have embedded the key cut generation
step of our algorithm within the CPLEX commercial integer programming solver
which has such enhancements already implemented.
In our definition of the master relaxation (11), we have enforced the con-
straints x ∈ X. If X is a polyhedron and f(x) is linear, (11) is a linear program.
However, if X is not a polyhedron, suitable modifications to the algorithm could

be made to ensure that the relaxations solved remain linear programming prob-
lems. For example, if X is defined by a polyhedron Q with integrality constraints
on some of the variables, then we could instead define the master relaxation to
enforce x ∈ Q, and then perform branching both on the integer-constrained x
variables and on the zk variables. Such a modification is also easy to imple-
ment within existing integer programming solvers. Note also that, in this case,
Q would be a natural choice for the relaxation X̄ of X used in §3 when obtaining
the hk (α) values as in (7).

5 Preliminary Computational Results


5.1 Call Center Staffing Problem
We tested our approach on randomly generated instances of a call center staffing
problem recently studied in [30]. In this problem the staffing levels of different
types of available servers (xi for i = 1, . . . , n) must be set before knowing what
the actual arrival rates of the customers will be. The routing of arriving cus-
tomers to servers, however, can be done as the arrivals are observed. In [30], a
static and fluid approximation of the second-stage dynamic routing problem is
used, in which servers are simply (fractionally) allocated to customers, leading to the following formulation:

$$\min\bigl\{\, cx \;\bigm|\; \mathbb{P}\{x \in P(\Lambda)\} \ge 1 - \epsilon,\ x \in \mathbb{R}^n_+ \,\bigr\} ,$$

where c ∈ Rn+ represents the staffing costs, Λ is an m-dimensional random vector of arrival rates, and

$$P(\lambda) = \Bigl\{\, x \in \mathbb{R}^n_+ \;\Bigm|\; \exists\, y \in \mathbb{R}^{nm}_+ \ \text{s.t.}\ \sum_{j=1}^m y_{ij} \le x_i\ \ \forall i,\ \ \sum_{i=1}^n \mu_{ij}\, y_{ij} \ge \lambda_j\ \ \forall j \,\Bigr\} . \tag{12}$$

Here μij is the service rate of server type i when serving customer type j (μij = 0 if server type i cannot serve customer type j). This formulation aims to choose minimum-cost staffing levels such that the probability of meeting quality of service targets is high.
When generating the test instances, we first generated the service rates, and the mean and covariance of the arrival rate vector. We then generated the cost vector in such a way that “more useful” server types were generally more expensive, in order to make the solutions nontrivial. Finally, to generate specific instances with finite support, we sampled N joint-normally distributed arrival rate vectors independently, using the fixed mean and covariance matrix, for various sample sizes N. In all our test instances we use ε = 0.1 as the risk tolerance.
We see that this problem has the two-stage structure given in (2), and hence available methods for finding exact solutions (or even any solution with a bound on optimality error) are very limited. However, we point out that the form of P(λ) still possesses some special structure, in that the second-stage constraints have no random coefficients (i.e., in the form of (2) the matrices T^k and W^k do not vary with k). In addition, the constraints x ∈ X are very simple for this problem; we simply have X = Rn+. Thus, while this test problem is certainly beyond the capabilities of existing approaches, it is not yet a test of the algorithm in the most general settings.
Technically, this problem does not satisfy our assumptions given in §2 because
the sets Rn+ ∩ P (λ) are not bounded. However, our approach really only requires
that the optimal solutions to (7) always exist for any coefficient vector α of a
facet-defining inequality for P (λ). As valid inequalities for P (λ) necessarily have
non-negative coefficients, this clearly holds.

5.2 Implementation Details


We implemented our approach within the commercial integer programming
solver CPLEX 11.2. The main component of the approach, separation of valid
inequalities of the form (9), was implemented within a cut callback that CPLEX
calls whenever it has finished solving a node (whether the solution is integer
feasible or not) and also after it has found a heuristic solution. In the feasibility
checking phase of the SepCuts routine (line 2) we searched for k with ẑk < 1 and
x̂ ∉ Pk in increasing order of ẑk (so, in particular, we always first check the sce-
narios k with ẑk = 0). For the first such k we find (and only the first) we add all
the violated valid inequalities of the form (10) as well as the single most violated
inequality of the form (9). Our motivation for adding the inequalities (10) is that
they are sparse and this is a simple way to add additional valid inequalities in
one round; we found that doing this yielded somewhat faster convergence.

5.3 Results
We compared our algorithm against the Big-M formulation (3) (with the M
values chosen as small as possible) and also against a basic decomposition algo-
rithm that does not use the strong valid inequalities of §3. We compare against
this simple decomposition approach to understand whether the success of our
algorithm is due solely to the decomposition, or whether the strong inequalities
are also important. The difference between the basic decomposition algorithm
and the strengthened version is in the type of cuts that are added in the SepCuts
routine. Specifically, in the case of an uncertainty set Pk of the form (12), if we
find a scenario k with ẑk = 0, and a valid inequality αx ≥ β for the set Pk that
is violated by x̂, the basic decomposition algorithm simply adds the inequality

αx ≥ βzk .

It is not hard to see that when the sets Pk have the form (12), this inequality is
valid for F because x ≥ 0 and any valid inequality for Pk has α ≥ 0. Furthermore,
this inequality successfully cuts off the infeasible solution (x̂, ẑ).
Table 1 presents the results of these three approaches for varying problem size, in terms of number of agent types (n), number of customer types (m), which is also the dimension of the random vector Λ, and number of scenarios N. These tests were done with a time limit of one hour. Unless stated otherwise, each entry is an average over ten randomly generated samples from the same base underlying instance (i.e., the instance is fixed, but ten different samples of the N scenarios are taken). The big-M formulation (3) only successfully solves the LP relaxation and finds a feasible solution for the two smallest instance sizes. The entries ‘-’ in the other cases mean that either no solution was found in the time limit, or that the LP relaxation did not solve in the time limit. For the largest instances, CPLEX failed with an out-of-memory error before the time limit was reached. Using the basic decomposition approach makes a significant improvement over the big-M formulation in that feasible solutions are now found for all instances. However, only the smallest of the instances (and only 9 of 10 of them) could be solved to optimality, and the larger instances had very large optimality gaps after the time limit. Combining decomposition with strong valid inequalities (“Strong Decomp” in the table), we are able to solve all the instances to optimality in an average of less than five minutes.

Table 1. Results on call center staffing instances; solution time (sec) or final optimality gap (%)

  n   m   N      Big-M     Basic Decomp   Strong Decomp
 10  20   500    23.0%     1752^a           2.9
          1000   27.3%     5.5%            17.3
          2000   -         10.1%          143.4
 20  30   1000   28.9%^b   7.3%            11.8
          2000   -         16.7%           27.5
          3000   -         24.3%           73.9
 40  50   1000   -         16.2%           65.3
          2000   -         24.1%          190.9
          3000   -         28.7%          256.3

 ^a Average based on nine instances that solved within the time limit.
 ^b Gap for one instance; the remaining nine instances failed.
To understand these results a little better, we present in Table 2 the root gaps (relative to the optimal values) after all cuts have been added for the two decomposition approaches. We also present the average number of nodes processed in each approach (to the time limit for the basic approach, and to optimality for our approach). It is clear that the strong valid inequalities lead to very strong relaxations for this particular problem, and hence very few nodes need to be explored. In comparison, for the smallest instance size, in which the basic decomposition approach can solve most of the instances, the average number of nodes required is over 20,000. (The smaller number of nodes for the larger instances merely reflects that fewer could be processed in the time limit.)

Table 2. Average root gaps and nodes for decomposition approaches

                     Root gap (%)           Nodes
  n   m   N      Basic     Strong     Basic     Strong
 10  20   500    20.3%     0.00%      22969     0
          1000   20.1%     0.01%      15034     0
          2000   19.5%     0.01%       4641     6.7
 20  30   1000   20.2%     0.00%       6271     0
          2000   19.9%     0.01%        557     0
          3000   20.4%     0.00%        399     0.1
 40  50   1000   20.1%     0.00%        878     0.3
          2000   20.7%     0.00%        101     0.1
          3000   20.7%     0.00%         13     1

6 Discussion
We have presented a promising approach for solving general CCMPs, although
additional computational tests are needed on problems having more general
structures than the test problem we considered. The approach uses both de-
composition, to enable processing subproblems corresponding to one scenario at
a time, and integer programming techniques, to yield strong valid inequalities.
From a stochastic programming perspective, it is not surprising that decom-
position is necessary to yield an efficient algorithm, as this is well-known for
traditional two-stage stochastic programs. From an integer programming per-
spective, it is not surprising that using strong valid inequalities has an enormous
impact. The approach presented here represents a successful merger of these
approaches to solve CCMPs.

Acknowledgments. The author thanks Shabbir Ahmed for the suggestion to


compare the presented approach with a basic decomposition algorithm. This
research has been supported in part by the National Science Foundation under
grant CMMI-0952907.

References
1. Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with
probabilistic constraints. SIAM J. Optim. 19, 674–699 (2008)
2. Charnes, A., Cooper, W.W., Symonds, G.H.: Cost horizons and certainty equiva-
lents: an approach to stochastic programming of heating oil. Manage. Sci. 4, 235–
263 (1958)
3. Prékopa, A.: On probabilistic constrained programming. In: Kuhn, H.W. (ed.)
Proceedings of the Princeton Symposium on Mathematical Programming, Prince-
ton, NJ, pp. 113–138. Princeton University Press, Princeton (1970)
4. Charnes, A., Cooper, W.W.: Deterministic equivalents for optimizing and satisfic-
ing under chance constraints. Oper. Res. 11, 18–39 (1963)
5. Calafiore, G., El Ghaoui, L.: On distributionally robust chance-constrained linear
programs. J. Optim. Theory Appl. 130, 1–22 (2006)
6. Beraldi, P., Ruszczyński, A.: The probabilistic set-covering problem. Oper. Res. 50,
956–967 (2002)
7. Dentcheva, D., Prékopa, A., Ruszczyński, A.: Concavity and efficient points of dis-
crete distributions in probabilistic programming. Math. Program. 89, 55–77 (2000)
284 J. Luedtke

8. Luedtke, J., Ahmed, S., Nemhauser, G.L.: An integer programming approach for linear programs with probabilistic constraints. Math. Program. 122, 247–272 (2010)
9. Saxena, A., Goyal, V., Lejeune, M.: MIP reformulations of the probabilistic set
covering problem. Math. Program. 121, 1–31 (2009)
10. Ruszczyński, A.: Probabilistic programming with discrete distributions and prece-
dence constrained knapsack polyhedra. Math. Program. 93, 195–215 (2002)
11. Tanner, M., Ntaimo, L.: IIS branch-and-cut for joint chance-constrained programs
with random technology matrices (2008)
12. Ben-Tal, A., Nemirovski, A.: Robust solutions of linear programming problems
contaminated with uncertain data. Math. Program. 88, 411–424 (2000)
13. Bertsimas, D., Sim, M.: The price of robustness. Oper. Res. 52, 35–53 (2004)
14. Calafiore, G., Campi, M.: Uncertain convex programs: randomized solutions and
confidence levels. Math. Program. 102, 25–46 (2005)
15. Nemirovski, A., Shapiro, A.: Scenario approximation of chance constraints. In:
Calafiore, G., Dabbene, F. (eds.) Probabilistic and Randomized Methods for Design
Under Uncertainty, pp. 3–48. Springer, London (2005)
16. Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained pro-
grams. SIAM J. Optim. 17, 969–996 (2006)
17. Erdoğan, E., Iyengar, G.: Ambiguous chance constrained problems and robust op-
timization. Math. Program. 107, 37–61 (2006)
18. Erdoğan, E., Iyengar, G.: On two-stage convex chance constrained problems. Math.
Meth. Oper. Res. 65, 115–140 (2007)
19. Luedtke, J., Ahmed, S., Nemhauser, G.: An integer programming approach for
linear programs with probabilistic constraints. In: Fischetti, M., Williamson, D.P.
(eds.) IPCO 2007. LNCS, vol. 4513, pp. 410–423. Springer, Heidelberg (2007)
20. Birge, J., Louveaux, F.: Introduction to stochastic programming. Springer, New
York (1997)
21. Van Slyke, R., Wets, R.J.: L-shaped linear programs with applications to optimal
control and stochastic programming. SIAM J. Appl. Math. 17, 638–663 (1969)
22. Higle, J.L., Sen, S.: Stochastic decomposition: an algorithm for two-stage stochastic
linear programs. Math. Oper. Res. 16, 650–669 (1991)
23. Shen, S., Smith, J., Ahmed, S.: Expectation and chance-constrained models and
algorithms for insuring critical paths (2009) (submitted for publication)
24. Codato, G., Fischetti, M.: Combinatorial benders’ cuts for mixed-integer linear
programming. Oper. Res. 54, 756–766 (2006)
25. Atamtürk, A., Nemhauser, G.L., Savelsbergh, M.W.P.: The mixed vertex packing
problem. Math. Program. 89, 35–53 (2000)
26. Günlük, O., Pochet, Y.: Mixing mixed-integer inequalities. Math. Program. 90,
429–457 (2001)
27. Linderoth, J., Savelsbergh, M.: A computational study of search strategies for
mixed integer programming. INFORMS J. Comput. 11, 173–187 (1999)
28. Applegate, D., Bixby, R., Chvátal, V., Cook, W.: Finding cuts in the TSP. Tech-
nical Report 95-05, DIMACS (1995)
29. Achterberg, T., Koch, T., Martin, A.: Branching rules revisited. Oper. Res. Lett. 33,
42–54 (2004)
30. Gurvich, I., Luedtke, J., Tezcan, T.: Call center staffing with uncertain arrival
rates: a chance-constrained optimization approach. Technical report (2009)
An Effective Branch-and-Bound Algorithm for
Convex Quadratic Integer Programming

Christoph Buchheim¹,², Alberto Caprara², and Andrea Lodi²

¹ Fakultät für Mathematik, Technische Universität Dortmund,
Vogelpothsweg 87, D-44227 Dortmund, Germany
[email protected]
² DEIS, Università di Bologna,
Viale Risorgimento 2, I-40136 Bologna, Italy
{alberto.caprara,andrea.lodi}@unibo.it

Abstract. We present a branch-and-bound algorithm for minimizing a


convex quadratic objective function over integer variables subject to con-
vex constraints. In a given node of the enumeration tree, corresponding
to the fixing of a subset of the variables, a lower bound is given by the
continuous minimum of the restricted objective function. We improve
this bound by exploiting the integrality of the variables using suitably-
defined lattice-free ellipsoids. Experiments show that our approach is
very fast on both unconstrained problems and problems with box con-
straints. The main reason is that all expensive calculations can be done
in a preprocessing phase, while a single node in the enumeration tree can
be processed in linear time in the problem dimension.

1 Introduction
Nonlinear integer optimization has attracted a lot of attention recently. Besides
its practical importance, it is challenging from a theoretical and methodological
point of view. While intensive research has led to tremendous progress in the
practical solution of integer linear programs in the last decades [9], practical
methods for the nonlinear case are still rare [5]. This is true even in special
cases, such as Convex Quadratic Integer Programming (CQIP):

$$\min\ f(x) = x^\top Q x + L^\top x + c \quad\text{s.t.}\quad x \in \mathbb{Z}^n \cap X, \tag{1}$$

where Q is an n × n positive definite symmetric matrix, L ∈ Rⁿ, c ∈ R, and
X ⊆ Rⁿ is a convex set for which membership can be tested in polynomial time.
Positive definiteness of Q guarantees strict convexity of f.

1.1 The Two Applications Considered


Our original motivating application is in Electronics and arises in the develop-
ment of pulse coders for actuation, signal synthesis and audio amplification. The


aim is to synthesize periodic waveforms by either bipolar or tripolar pulse codes.


The latter problem amounts to solving CQIP for X = [−1, 1]n . In other words,
each variable can take three different values {−1, 0, 1}, leading to a ternary
CQIP. The real problem is called Filtered Approximation and corresponds to
finding a discrete-valued x(kT ) so that the discrete-time continuous signal w(kT )
and x(kT ) are as similar as possible once filtered through a discrete-time, lin-
ear, causal filter H(z) with three states. This must be done in the time interval
[n₁T, n₂T] where n₂ − n₁ = n in the mapping of FA to CQIP. Moreover, the
mapping shows that Q only depends on the filter H(z) and is then strictly
positive definite, because the quantity xᵀQx can be interpreted as the energy
over the period of x(nT) filtered by H(z) (see, e.g., [2]).
Besides the original motivation, if Q is positive definite and X = Rn , CQIP is
equivalent to the Closest Vector Problem (CVP), which, given a basis b1 , . . . , bn
of Rn and an additional vector v, calls for an integer linear combination of the
vectors of the basis which is as close as possible (with respect to the Euclidean
distance) to v. Equivalently, the problem calls for scalars λ₁, . . . , λₙ ∈ Z such
that $\|\sum_{i=1}^n \lambda_i b_i - v\|_2$ is minimized. It is elementary to check that this amounts
to solving (1) for a positive definite matrix Q = BᵀB, where B is the n × n
matrix whose columns are b₁, . . . , bₙ. Vice versa, given an instance of (1) in which
Q is positive definite and symmetric, a corresponding CVP instance is obtained
by computing a Cholesky decomposition Q = BᵀB, which yields the basis
b₁, . . . , bₙ, and by defining v accordingly. This problem has a wide relevance
from both the theoretical and practical viewpoints. Specifically, it is very hard
to approximate and it is used in cryptosystems (see, e.g., [10]).
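To make the equivalence concrete, here is a minimal sketch (Python/NumPy; the helper name `cvp_to_cqip` is ours, chosen for illustration) of the CVP-to-CQIP direction: expanding ‖Bλ − v‖² gives Q = BᵀB, L = −2Bᵀv and c = vᵀv.

```python
import numpy as np

def cvp_to_cqip(B, v):
    """Map a CVP instance (basis = columns of B, target v) to CQIP data (Q, L, c):
    ||B @ lam - v||^2 = lam' (B'B) lam - 2 (B'v)' lam + v'v."""
    Q = B.T @ B               # positive definite when B is nonsingular
    L = -2.0 * B.T @ v
    c = float(v @ v)
    return Q, L, c

# toy check: the two objectives agree on an arbitrary integer point
B = np.array([[1.0, -2.0], [0.0, -2.0]])
v = np.array([0.3, 1.1])
Q, L, c = cvp_to_cqip(B, v)
lam = np.array([2.0, -1.0])
assert np.isclose(lam @ Q @ lam + L @ lam + c, np.sum((B @ lam - v) ** 2))
```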

1.2 Literature Review and State-of-the-Art


It is clear that the ternary CQIP is strongly related to at least two famous
combinatorial optimization problems, namely Unconstrained Binary Quadratic
Programming (UBQP) and Maximum Cut (MC). More precisely, UBQP is the
special case of CQIP in which X = [0, 1]n , while, given a graph G = (V, E) and
an edge function c ∈ RE , MC is to find a cut δ(W ) of G having maximum weight
c(δ(W )). Although UBQP and MC have been treated in an almost separate way
in the literature, a well-known result by [13] shows that they have in fact the
same polyhedral description. Thus, one might be tempted to use the available
algorithmic technology (as well as the software) to solve CQIP. We tried three
different approaches, one based on the rooted-semimetric MC relaxation [4,12],
one based on reduction to a binary CQIP by the obvious replacement xᵢ =
xᵢ⁺ − xᵢ⁻ for i = 1, . . . , n, and one based on convex MINLP approaches using
Bonmin [1]. These three approaches performed quite poorly, and were widely
outperformed by the direct application of the CPLEX MIQP solver [6] to the
original problem, which was able to solve instances with n up to 50 for our real-
world application in Electronics, too slowly however to be of practical use to
engineers.
Regarding the second application, we are not aware of any computational
work for the unconstrained CQIP, i.e., for CVP. Thus, for such a problem as

well we used CPLEX MIQP as a reference, which was able to solve instances
with n up to about 55.

1.3 Our Contribution


In this paper, we present a branch-and-bound algorithm for CQIP that is very
fast at processing nodes but still computes reasonable lower bounds. The main
observation behind the method is that, in the unconstrained case X = Rn , strict
convexity leads to a unique continuous minimum x̄ of f over Rn which is easy
to compute, yielding a lower bound that is of course valid also for any X. This
bound can be improved by taking the integrality constraints into account, using
lattice-free ellipsoids. Roughly speaking, the general idea is to center a given
ellipsoid E in x̄ and to compute the value λ such that the scaled ellipsoid λE
contains at least one integer point on its border and no integer point in its
interior. This can be done quickly if E is chosen appropriately. Then we find
the minimum of the function f over the border of λE, which yields an improved
lower bound on f .
Our enumeration strategy is depth-first, branching by fixing the value of one
of the n variables. A crucial property of our algorithm is that we restrict the
order in which variables are fixed. In other words, the set of variables fixed only
depends on the depth of the node in the tree. We thus loose the flexibility of
choosing the best branching variable, but this strategy allows us to process a
single node in the tree much faster. Our main observation is that all expensive
calculations to be performed in a node actually only depend on the depth d,
i.e., on the set of variables fixed, but not on the particular values to which
variables are fixed. This allows us to move these calculations to a preprocessing
phase. We will show that, after this preprocessing, the running time per node is
only O(n − d), i.e., sublinear in the problem input size which is Θ(n2 ).
Experimentally, we show that our approach leads to the solution of large CQIP
instances for our real-world Filtered Approximation application in Electronics,
with n up to about 120, allowing engineers to validate their practical approaches
[2]. For the CVP we solve instances with n up to about 70. Even for the largest
instances we can solve, our algorithm is able to process around 400, 000 nodes
per second.
In Section 2 we discuss our methods for computing lower bounds, explaining
how to incorporate constraints into our framework, whereas in Section 3 we
present an outline of the overall branch-and-bound algorithm, illustrating how
to compute the lower bounds in linear time. Finally, in Section 4 we present
computational results for our algorithm. For space reasons, proofs are deferred
to the full paper.

1.4 Basic Definitions and Notation


We will denote scalars by lower case letters, matrices by upper case letters, and
vectors by both lower and upper case letters. Given a (column) vector x ∈ Rn ,
we will let x denote its transposed (row) vector and xi its ith component. In

some cases it will be convenient to use apices to indicate vectors, such as xi , and
in this case we will denote the transposed vector by (xi ) . As customary, given
a matrix Q we will let qi,j denote its entry in the ith row and jth column. Given
a scalar a ∈ R, we will let ⌈a⌋ denote the integer value closest to a. Analogously,
given a vector x we will let ⌈x⌋ denote its componentwise rounding to the
nearest integer.
A box X ⊆ Rⁿ is a set of the form X = {x ∈ Rⁿ : l ≤ x ≤ u}, where l, u ∈ Rⁿ,
l ≤ u. Let Q′ be a positive semidefinite matrix. For x′ ∈ Rⁿ we consider the
corresponding ellipsoid

$$E(Q', x') := \{x \in \mathbb{R}^n : (x - x')^\top Q' (x - x') \le 1\},$$

which is the translation of E(Q′) := E(Q′, 0) by the vector x′. Moreover, for
α ∈ R₊, we let αE(Q′, x′) denote E(Q′, x′) scaled by α with respect to the
center x′, i.e.,

$$\alpha E(Q', x') := x' + \alpha E(Q') = \{x' + \alpha x : x \in E(Q')\}.$$

Given a closed convex set X, we let int X denote the interior of X and bd X
the border of X.

2 Lower Bounds
As anticipated, the main inspiring observation for our method is that the com-
putation of the minimum of the objective function (neglecting all constraints
including integrality) simply requires solving a system of linear equations.
Remark 1. The unique minimum of f(x) = xᵀQx + Lᵀx + c in case Q is positive
definite is attained at x̄ = −½Q⁻¹L and has value c − ¼LᵀQ⁻¹L. Moreover, for
every x ∈ Rⁿ, f(x) = f(x̄) + (x − x̄)ᵀQ(x − x̄).
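As a minimal illustration of Remark 1 (our own throwaway NumPy code), one linear solve yields both the minimizer and the minimum, and the shift identity can be checked at a random point:

```python
import numpy as np

def continuous_minimum(Q, L, c):
    """Remark 1: minimizer and minimum of f(x) = x'Qx + L'x + c, Q pos. def."""
    xbar = -0.5 * np.linalg.solve(Q, L)       # xbar = -Q^{-1}L / 2
    return xbar, c + 0.5 * float(L @ xbar)    # f(xbar) = c - L'Q^{-1}L / 4

rng = np.random.default_rng(1)
Q = np.array([[1.0, -2.0], [-2.0, 8.0]]); L = rng.standard_normal(2); c = 0.5
xbar, fbar = continuous_minimum(Q, L, c)
x = rng.standard_normal(2)                    # the shift identity of Remark 1
assert np.isclose(x @ Q @ x + L @ x + c, fbar + (x - xbar) @ Q @ (x - xbar))
```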
Our aim in this section is to get stronger bounds by exploiting the integrality of
the variables and possibly the structure of X.

2.1 The Unconstrained Case


For a given x′ ∈ Rⁿ, we let

$$\mu(Q', x') := \max\{\alpha : (\operatorname{int} \alpha E(Q', x')) \cap \mathbb{Z}^n = \emptyset\} = \min\{\alpha : \alpha E(Q', x') \cap \mathbb{Z}^n \ne \emptyset\}$$

be the scaling factor α such that the ellipsoid αE(Q′, x′) contains some integer
point on its border but no integer point in its interior.
Observation 1. μ(Q′, x′) = max{α : (x − x′)ᵀQ′(x − x′) ≥ α² for each x ∈ Zⁿ}.

Fig. 1. Improving the lower bounds: the light gray ellipsoid is E(Q′, x̄) scaled
by μ(Q′, x̄); the dark gray ellipsoid is E(Q, x̄) scaled by λ(Q, Q′)μ(Q′, x̄)

Note that, given our objective function f(x) = xᵀQx + Lᵀx + c and the associated
continuous minimum x̄, the level sets of f(x) are precisely the borders of
ellipsoids of the form αE(Q, x̄). Given this, it is easy to visualize the fact that
finding the integer point that minimizes f is equivalent to scaling E(Q, x̄) by α
starting from α = 0 and stopping as soon as the border of αE(Q, x̄) contains
an integer point. This is the same as computing μ(Q, x̄). Since this is as hard as
solving (1) when X = Rⁿ, we rather do the same scaling for some other ellipsoid
E(Q′, x̄), and then scale E(Q, x̄) in turn until it touches the border of the first
ellipsoid, see Figure 1. This requires one to be able to compute μ(Q′, x̄) as well
as the maximum α ∈ R₊ such that αE(Q) is contained in E(Q′):

$$\lambda(Q, Q') := \max\{\alpha : \alpha E(Q) \subseteq E(Q')\},$$

noting that this latter value has nothing to do with x̄.
Observation 2. $\lambda(Q, Q') = \max\{\alpha : Q - \alpha^2 Q' \succeq 0\} = \min\{1/\sqrt{x^\top Q' x} : x \in E(Q)\}$.
Proposition 1. Given f(x) = xᵀQx + Lᵀx + c with Q positive definite and
continuous minimum x̄, and a positive semidefinite matrix Q′ of the same size
as Q,

$$\min\{f(x) : x \in \mathbb{Z}^n\} \ \ge\ f(\bar x) + \lambda^2(Q, Q')\,\mu^2(Q', \bar x).$$
Note that, in order to find hopefully strong lower bounds, one would like to have
matrices Q′ such that on the one hand E(Q′) is as close as possible to E(Q) and
on the other μ(Q′, x̄) is fast to compute. It is particularly fast to compute μ(Q′, x̄)
if Q′ is a split, i.e., if Q′ = ddᵀ for some vector d ∈ Zⁿ \ {0} with d₁, . . . , dₙ
coprime. In this case, we have
Observation 3. μ(ddᵀ, x′) = |dᵀx′ − ⌈dᵀx′⌋|.
In order to derive strong lower bounds, we aim at splits Q′ that yield large
factors λ(Q, Q′). To this end, we consider flat directions of the ellipsoid E(Q),
i.e., vectors d ∈ Zⁿ \ {0} minimizing the width of E(Q) along d, defined as

$$\max\{d^\top x : x \in E(Q)\} - \min\{d^\top x : x \in E(Q)\} = 2\max\{d^\top x : x \in E(Q)\}.$$

Observation 4. d ∈ Zⁿ \ {0} maximizes λ(Q, ddᵀ) if and only if it is a flat
direction of E(Q).
The following remark is stated explicitly, e.g., in [3].
Remark 2. If Q = BᵀB then the width of E(Q) along d is given by 2‖dᵀB⁻¹‖.
In other words, finding a flat direction of E(Q) is equivalent to finding the
coefficients d₁, . . . , dₙ yielding a shortest non-zero vector in the lattice generated
by the columns of (B⁻¹)ᵀ, which is well known to be NP-hard. A natural heuristic
to compute short vectors is obtained by taking as candidates the vectors in a
reduced basis of the lattice. Accordingly, we compute such a reduced basis by the
LLL algorithm. If t¹, . . . , tⁿ ∈ Zⁿ \ {0} are the columns of the corresponding
transformation matrix T, from the original basis to the reduced basis, we use
the splits Qᵢ := tⁱ(tⁱ)ᵀ and compute μ(Qᵢ, x̄) = |(tⁱ)ᵀx̄ − ⌈(tⁱ)ᵀx̄⌋| as in
Observation 3.
Moreover, we consider the matrix $Q_0 := \sum_{i=1}^n \lambda^2(Q, Q_i)\,Q_i$, so that the
ellipsoid E(Q₀) is axis-parallel with respect to t¹, . . . , tⁿ.
Observation 5. $\mu(Q_0, x') = \sqrt{\sum_{i=1}^n \lambda^2(Q, Q_i)\,\mu^2(Q_i, x')}$.

Note that λ(Q, Q₀) can be strictly smaller than one, so that the lower bound
derived from Q₀,

$$f(\bar x) + \lambda^2(Q, Q_0)\,\mu^2(Q_0, \bar x) = f(\bar x) + \lambda^2(Q, Q_0)\sum_{i=1}^n \lambda^2(Q, Q_i)\,\mu^2(Q_i, \bar x),$$

can be weaker than the bound f(x̄) + λ²(Q, Qᵢ)μ²(Qᵢ, x̄) derived from Qᵢ for
some i ≥ 1. In general, which Qᵢ gives the strongest lower bound depends on
the position of x̄.
Example 1. We illustrate the ideas in Section 2 by an example in the plane. Let

$$Q = \begin{pmatrix} 1 & -2 \\ -2 & 8 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -2 & -2 \end{pmatrix}\begin{pmatrix} 1 & -2 \\ 0 & -2 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -2 \\ 0 & -2 \end{pmatrix}, \quad B^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & -1/2 \end{pmatrix}.$$

The ellipse E(Q) is shown in Figure 2. Short vectors of the lattice generated by
the rows of B⁻¹, the vectors (1, −1) and (0, −1/2), are (0, −1/2)ᵀ and (1, 0)ᵀ.
These correspond to the transformation matrix

$$T = \begin{pmatrix} 0 & 1 \\ 1 & -2 \end{pmatrix}, \quad T^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix},$$

and hence to the (hopefully) flat directions (0, 1)ᵀ and (1, −2)ᵀ. Thus

$$Q_1 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad Q_2 = \begin{pmatrix} 1 & -2 \\ -2 & 4 \end{pmatrix}$$

and λ(Q, Q₁) = 2, λ(Q, Q₂) = 1. The ellipses E(Q₁) and E(Q₂) are illustrated
in Figure 2. Finally, in this case we are lucky to obtain Q₀ = 4Q₁ + Q₂ = Q, so
that the improved lower bound given by Q₀ agrees with the optimal solution of
the problem, independently of L and c.
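The numbers in Example 1 can be checked mechanically. For a split Q′ = ddᵀ, Remark 2 gives λ(Q, ddᵀ) = 1/max{dᵀx : x ∈ E(Q)} = 1/‖dᵀB⁻¹‖; the following throwaway snippet (ours) reproduces λ(Q, Q₁) = 2, λ(Q, Q₂) = 1 and Q₀ = Q:

```python
import numpy as np

Q = np.array([[1.0, -2.0], [-2.0, 8.0]])
B = np.array([[1.0, -2.0], [0.0, -2.0]])            # Q = B'B
assert np.allclose(B.T @ B, Q)

def lam_split(d):
    # lambda(Q, dd') = 1 / max{d'x : x in E(Q)} = 1 / ||d' B^{-1}|| (Remark 2)
    return 1.0 / np.linalg.norm(d @ np.linalg.inv(B))

d1, d2 = np.array([0.0, 1.0]), np.array([1.0, -2.0])  # the flat directions
l1, l2 = lam_split(d1), lam_split(d2)
assert np.isclose(l1, 2.0) and np.isclose(l2, 1.0)

Q0 = l1**2 * np.outer(d1, d1) + l2**2 * np.outer(d2, d2)
assert np.allclose(Q0, Q)                             # Q0 = 4*Q1 + Q2 = Q
```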

Fig. 2. The ellipse E(Q) in (a); the split E(Q₁) given by (0, 1)ᵀ in (b); the split E(Q₂)
given by (1, −2)ᵀ in (c)

2.2 Dealing with Constraints

Even if X is a box, determination of the continuous minimum of f over X is


much more complex than in the unconstrained case [11] and would not fit well
within our method, which is aimed at exploring branch-and-bound nodes quickly.
On the other hand, even if we stick to the computation of the unconstrained
continuous minimum, we can take into account the structure of X in the lower
bound improvement of Section 2.1.
Assume we are given an arbitrary X and a linear optimization oracle for X ∩
Zn , which in many applications is a reasonable assumption (and trivially true
for a box). We can replace the definition of μ(Q′, x′) by

$$\mu_X(Q', x') := \max\{\mu : (\operatorname{int} \mu E(Q', x')) \cap X \cap \mathbb{Z}^n = \emptyset\}.$$

In other words, we scale E(Q′, x′) until it touches a feasible point, instead of
simply any integer point. The counterpart of Proposition 1 reads:
Proposition 2. Given f(x) = xᵀQx + Lᵀx + c with Q positive definite and
continuous minimum x̄, and a positive semidefinite matrix Q′ of the same size
as Q,

$$\min\{f(x) : x \in X \cap \mathbb{Z}^n\} \ \ge\ f(\bar x) + \lambda^2(Q, Q')\,\mu_X^2(Q', \bar x).$$
Unfortunately, in this case we cannot compute the factors exactly, even for splits,
in polynomial time. However, we can give lower bounds for μ_X(Q′, x′) that
improve strongly over μ(Q′, x′) in general.
Observation 6. Let d_min and d_max denote the minimum and maximum of dᵀx
over X ∩ Zⁿ, respectively. Then,

$$\mu_X(dd^\top, x') \ \ge\ \max\left\{\ |d^\top x' - \lceil d^\top x' \rfloor|,\ \ d^\top x' - d_{\max},\ \ d_{\min} - d^\top x'\ \right\}.$$
Letting Qi be defined as in the previous section for i = 0, . . . , n, this observation


applies to Qi with i = 1, . . . , n. We also have
Observation 7. $\mu_X(Q_0, x') \ge \sqrt{\sum_{i=1}^n \lambda^2(Q, Q_i)\,\mu_X^2(Q_i, x')}$.
Note that this approach is still correct if the oracle optimizes over X instead
of X ∩ Zn , or any set containing X ∩ Zn , but this may lead to weaker bounds.
Finally, note that the calculation of d_min and d_max is trivial if X is a box.
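For a box the quantities in Observation 6 are indeed cheap: the extremes of dᵀx over X decompose componentwise. A small sketch (function name ours; it assumes integer bounds l, u, so the extremes over X and over X ∩ Zⁿ coincide — and by the note above, using the relaxation X would still be valid, only possibly weaker):

```python
import numpy as np

def mu_X_split_box(d, xprime, l, u):
    """Observation 6 for a split Q' = dd' and the box X = {l <= x <= u}."""
    dx = float(d @ xprime)
    d_max = float(np.sum(np.maximum(d * l, d * u)))  # max of d'x over the box
    d_min = float(np.sum(np.minimum(d * l, d * u)))  # min of d'x over the box
    return max(abs(dx - round(dx)), dx - d_max, d_min - dx)

# ternary box X = [-1, 1]^n as in the pulse-coding application
d = np.array([1.0, -2.0]); xp = np.array([0.4, 0.9])
print(mu_X_split_box(d, xp, l=-np.ones(2), u=np.ones(2)))  # 0.4
```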

3 Outline of the Algorithm


Our algorithm is a branch-and-bound algorithm with a depth-first enumeration
strategy. Branching consists of fixing a single variable to an integer value, and is
illustrated in detail in the following. Assume that the next variable to be fixed
is xi . We consider the value x̄i of xi in the continuous minimum computed in the
current node. We fix xi to integer values by increasing distance from x̄i . More
precisely, if ⌈x̄ᵢ⌋ ≥ x̄ᵢ, the variable xᵢ is fixed to

⌈x̄ᵢ⌋, ⌈x̄ᵢ⌋ − 1, ⌈x̄ᵢ⌋ + 1, ⌈x̄ᵢ⌋ − 2, . . .    (2)

while otherwise it is fixed to

⌈x̄ᵢ⌋, ⌈x̄ᵢ⌋ + 1, ⌈x̄ᵢ⌋ − 1, ⌈x̄ᵢ⌋ + 2, . . .    (3)

By the convexity of f and its symmetry with respect to x̄, the continuous minima
with respect to these fixings are non-decreasing, so that we can stop as soon as
one of these minima exceeds the current upper bound. In particular, we get a
finite algorithm even without bounds on the variables, since we assume that f
is strictly convex.
In order to enumerate subproblems as quickly as possible, our aim is to per-
form the most time-consuming computations in a preprocessing phase. In partic-
ular, having fixed d variables, we get the reduced objective function f̄ : Rⁿ⁻ᵈ →
R of the form

$$\bar f(x) = x^\top \bar Q_d\, x + \bar L^\top x + \bar c.$$

If xᵢ is fixed to rᵢ for i = 1, . . . , d, we have $\bar c = c + \sum_{i=1}^d L_i r_i + \sum_{i=1}^d\sum_{j=1}^d q_{ij} r_i r_j$
and $\bar L_{j-d} = L_j + 2\sum_{i=1}^d q_{ij} r_i$ for j = d + 1, . . . , n. On the other hand, the
matrix Q̄_d is obtained from Q by deleting the first d rows and columns, and
therefore is positive definite and does not depend on the values at which the
first d variables are fixed.

3.1 Achieving Quadratic Time Per Node


For Q̄_d, we need the inverse matrix and all factors λ(Q̄_d, Q′). For this reason, we
do not change the order of fixing variables, i.e., we always fix the first unfixed
variable according to an order that is determined before starting the enumer-
ation. This implies that, in total, we only have to consider n different matri-
ces Q̄d , which we know in advance as soon as the fixing order is determined. (If
the variables to be fixed were chosen freely, the number of such matrices could
be exponential.)

Algorithm 1: Outline of the basic algorithm

Input: a strictly convex function f : Rⁿ → R, x ↦ xᵀQx + Lᵀx + c
Output: a vector x ∈ Zⁿ minimizing f(x)
determine a variable order x₁, . . . , xₙ;
let Q̄_d be the submatrix of Q for rows and columns d + 1, . . . , n;
compute the inverse matrices Q̄_d⁻¹ for d = 0, . . . , n − 1;
set d := 0, ub := ∞;
while d ≥ 0 do
    define f̄ : Rⁿ⁻ᵈ → R by f̄(x) := f((r₁, . . . , r_d, x₁, . . . , x_{n−d}));
    compute L̄ and c̄ such that f̄(x) = xᵀQ̄_d x + L̄ᵀx + c̄;
    // compute lower bound
    compute the continuous minimum x̄ := −½ Q̄_d⁻¹ L̄ ∈ Rⁿ⁻ᵈ of f̄;
    set lb := f̄(x̄);
    // compute upper bound
    set rⱼ := ⌈x̄_{j−d}⌋ for j = d + 1, . . . , n to form r ∈ Zⁿ;
    apply primal heuristics to improve r;
    // update solution
    if f̄((r_{d+1}, . . . , rₙ)) < ub then
        set r* := r;
        set ub := f̄((r_{d+1}, . . . , rₙ));
    end
    // prepare next node
    if lb < ub then
        // branch on variable x_{d+1}
        set d := d + 1;
        set r_d := ⌈x̄₁⌋;
    else
        // always holds if d = n
        // prune current node
        set d := d − 1;
        if d > 0 then
            // go to next node
            increment r_d according to (2) or (3);
        end
    end
end

See Algorithm 1 for an outline of our method, for the case in which X = Rn
and we simply use the continuous lower bound. Clearly, the running time of this
algorithm is exponential in general. However, every node in the enumeration tree
can be processed in O(n²) time (if the primal heuristics obey the same runtime
bound), the bottleneck being the computation of the continuous minimum given
the pre-computed inverse matrix Q̄_d⁻¹. Note that Algorithm 1 can easily be
adapted to the constrained case where X ≠ Rⁿ. In this case, we just prune all
nodes with invalid variable fixings.
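For concreteness, the following self-contained Python sketch (ours, not the authors' C implementation) transliterates the same depth-first scheme in its naive form: each node recomputes its continuous minimum from scratch, so it costs roughly O((n−d)³) per node rather than the O(n−d) achieved below, but the enumeration order and the convexity-based pruning follow Algorithm 1.

```python
import numpy as np

def solve_cqip(Q, L, c=0.0):
    """Minimize x'Qx + L'x + c over x in Z^n, Q positive definite."""
    best = {"ub": np.inf, "x": None}

    def recurse(Q, L, c, prefix):
        if len(L) == 0:                          # all variables fixed: a leaf
            if c < best["ub"]:
                best["ub"], best["x"] = c, list(prefix)
            return
        xbar = -0.5 * np.linalg.solve(Q, L)      # continuous minimum (Remark 1)
        lb = c + 0.5 * float(L @ xbar)           # = c - L'Q^{-1}L / 4
        if lb >= best["ub"]:
            return                               # prune this node
        # fixing the next variable to v raises the bound to lb + kappa*(v - xbar[0])^2
        kappa = 1.0 / np.linalg.inv(Q)[0, 0]
        r0 = int(round(float(xbar[0])))
        k = 0
        while True:                              # values by increasing distance, cf. (2)/(3)
            for v in ([r0] if k == 0 else [r0 - k, r0 + k]):
                Qr, Lr = Q[1:, 1:], L[1:] + 2.0 * v * Q[0, 1:]
                cr = c + L[0] * v + Q[0, 0] * v * v
                recurse(Qr, Lr, cr, prefix + [v])
            k += 1
            gap = max(0.0, k - abs(float(xbar[0]) - r0))
            if lb + kappa * gap * gap >= best["ub"]:
                break                            # by convexity, no farther value can help

    recurse(np.asarray(Q, float), np.asarray(L, float), float(c), [])
    return best["ub"], best["x"]

# toy run on the quadratic form of Example 1 with an arbitrary linear part
print(solve_cqip([[1.0, -2.0], [-2.0, 8.0]], [0.3, -0.7]))
```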

For the computation of stronger lower bounds as explained in Section 2, at
each node we consider the matrices Q̄₀, . . . , Q̄ₙ derived from Q̄_d. It is crucial that
the values λ(Q̄_d, Q̄ᵢ) can be computed in the preprocessing phase for each depth
d and for i = 0, . . . , n. In the unconstrained case, the running time per node is
then affected by the time needed to compute μ(Q̄ᵢ, x̄). This requires O(n) time
for i = 1, . . . , n and an additional O(n) time for i = 0, i.e., O(n²) time in total.
The same applies to the constrained case, where in each node we compute the
stronger bounds given by Observations 6 and 7. For this, we determine all tⁱ_min
and tⁱ_max in the preprocessing phase by calling the linear optimization oracle 2n²
times in total. After that, we only need two additional comparisons for each tⁱ
in order to compute the improved bounds.
The above discussion is summarized in the following proposition.
Proposition 3. The running time per node of the branch-and-bound algorithm
outlined in Algorithm 1, with the lower bounds improved as illustrated in Section 2,
is O(n²).

3.2 Improvement to Linear Time Per Node


With some adjustments, the running time to process a node of depth d in our
algorithm can be decreased from O(n²) to O(n − d). For this, we have to improve
the running time of two different components: the computation of the continuous
minima of f̄, and the lower bound improvement by ellipsoids described in
Section 2.
For the computation of the continuous minima, we can replace the O((n−d)²)
method by an incremental technique needing O(n − d) time only, which deter-
mines the new continuous minimum from the old one in linear time whenever a
new variable is fixed. For this, we exploit the basic observation that in a given
node, the continuous minima according to all possible fixings of the next variable
lie on a line. Moreover, the direction of this line only depends on which variables
have been fixed so far, but not on the values to which they were fixed. This
implies that, in our algorithm, the direction of this line is fully determined by
the depth of the current node. Additional care has to be taken for computing
the objective function values of the continuous minima, since a direct evaluation
of the objective function would take quadratic time.
Formally, recall that by f̄(x) = xᵀQ̄_d x + L̄ᵀx + c̄ we denote the function
obtained from f by fixing variable xᵢ to rᵢ for i = 1, . . . , d, and let x̄ ∈ Rⁿ⁻ᵈ
be the continuous minimum of f̄, noting that x̄₁ corresponds to the value of the
original variable x_{d+1}. Finally, let

$$z^d := \bigl(1,\ -(q_{d+1,d+2},\, q_{d+1,d+3},\, \dots,\, q_{d+1,n})\,\bar Q_{d+1}^{-1}\bigr)^\top \in \mathbb{R}^{n-d}.$$

Observation 8. If we fix variable x_{d+1} to r_{d+1} and re-optimize f̄, the resulting
continuous minimum is given by x̄ + (r_{d+1} − x̄₁)z^d.
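Observation 8 is easy to sanity-check numerically (throwaway NumPy code, ours): fix x₁ = r in a random strictly convex instance and compare the re-optimized minimum with the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2.0
A = rng.standard_normal((n, n))
Q = A.T @ A + n * np.eye(n)                     # strictly convex instance
L = rng.standard_normal(n)

xbar = -0.5 * np.linalg.solve(Q, L)             # minimum at depth d = 0
z0 = np.concatenate(([1.0], -np.linalg.solve(Q[1:, 1:], Q[0, 1:])))
pred = xbar + (r - xbar[0]) * z0                # Observation 8

# direct re-optimization with x_1 fixed to r
direct = -0.5 * np.linalg.solve(Q[1:, 1:], L[1:] + 2.0 * r * Q[0, 1:])
assert np.isclose(pred[0], r) and np.allclose(pred[1:], direct)
```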
The above discussion implies that, in order to find the continuous optimum for
the nodes generated by branching from a given node and the associated value,

we simply have to compute x̄ + αz^d and f̄(x̄ + αz^d) for a given α ∈ R. As to the
latter, if we define

$$v^d := 2\bar Q_d\, z^d \in \mathbb{R}^{n-d}, \qquad s^d := (z^d)^\top \bar Q_d\, z^d \in \mathbb{R},$$

then we get

$$\bar f(\bar x + \alpha z^d) = \bar f(\bar x) + \alpha\bigl(\bar x^\top v^d + \bar L^\top z^d\bigr) + \alpha^2 s^d.$$
Since L̄ can be computed incrementally in O(n − d) time, we get:
Proposition 4. If, in the preprocessing phase of Algorithm 1, we
compute z^d, v^d, s^d as defined above for d = 0, . . . , n − 1, then the computation
of the continuous minimum and the associated lower bound can be carried out
in O(n − d) time per node.
When improving lower bounds by ellipsoids as illustrated in Section 2, the
following natural restriction leads to linear time per node: if the splits in the root
node are defined by the columns of the transformation matrix T, then the splits
on level d are defined by the columns of the matrix T̄_d arising from T by deleting
the first d rows and columns. Indeed, in this case, for the continuous minimum
$\bar{\bar x} = \bar x + (r_{d+1} - \bar x_1)z^d$ obtained after having fixed variable x_{d+1} to r_{d+1}, we have
to compute $\bar T_{d+1}^\top(\bar{\bar x}_2, \dots, \bar{\bar x}_{n-d})^\top$ (recall the notation in Section 1.4) in order to
determine the scaling factors $\mu(\bar Q_i, (\bar{\bar x}_2, \dots, \bar{\bar x}_{n-d})^\top)$. If we define

$$w^{d+1} := \bar T_{d+1}^\top (z_2^d, \dots, z_{n-d}^d)^\top \in \mathbb{R}^{n-d-1},$$

we have

$$\bar T_{d+1}^\top(\bar{\bar x}_2, \dots, \bar{\bar x}_{n-d})^\top = \bar T_{d+1}^\top(\bar x_2, \dots, \bar x_{n-d})^\top + (r_{d+1} - \bar x_1)\,\bar T_{d+1}^\top(z_2^d, \dots, z_{n-d}^d)^\top$$
$$= \bigl((\bar T_d^\top \bar x)_2, \dots, (\bar T_d^\top \bar x)_{n-d}\bigr)^\top - \bar x_1(t_{d+1,d+2}, \dots, t_{d+1,n})^\top + (r_{d+1} - \bar x_1)\,w^{d+1}.$$

Now $\bar T_d^\top \bar x$ has already been determined, hence each of the n − d factors
$\mu(\bar Q_i, (\bar{\bar x}_2, \dots, \bar{\bar x}_{n-d})^\top)$ can be computed in constant extra time. After that, the
last factor $\mu(\bar Q_0, (\bar{\bar x}_2, \dots, \bar{\bar x}_{n-d})^\top)$ can be computed in O(n − d) time by
Observation 5. In the constrained case, we can determine improved bounds as explained
in Section 3.1. In summary, we thus have
Proposition 5. If, in the preprocessing phase of Algorithm 1, we
compute w^{d+1} and all tⁱ_min and tⁱ_max for d = 0, . . . , n − 1, then the lower bound
improvement as illustrated in Section 2 can be carried out in O(n − d) time per
node.

4 Computational Results
In this section, we present experimental results for the two special cases of CQIP
mentioned in the introduction, namely the cases X = [−1, 1]n (Section 4.1)
and X = Rn (Section 4.2). In all cases, we compare our algorithm with the
CPLEX MIQP solver [6], which, as mentioned in the introduction, turned out

to be by far the best method among those available when we approached the
problem. For our algorithm, we also state results obtained without the lower
bound improvements discussed in Section 2.
We implemented the versions of our algorithm running in quadratic and linear
time per node in C. All experiments were run on Intel Xeon processors running
at 2.33 GHz. For basis reduction, we used the LLL-algorithm [7] implemented in
LiDIA [8]. The runtime limit for all instances and solution methods was 8 hours.
Besides total running times, we investigated the time needed for preprocessing
and the time per node in the enumeration tree. Moreover, we state the total
number of nodes processed.

4.1 Convex Quadratic Ternary Optimization

In Table 1, we present the experimental results for instances corresponding to


second order Butterworth filters, for a sinusoidal target signal. In the first col-
umn, we state the parameters of the problem instances: the amplitude of the
signal (γ), the cut-off frequency (σ), and the number of pulses per period, which
agrees with the number of variables n. We compare the O(n − d) time/node
and the O(n2 ) time/node versions of our algorithm with CPLEX MIQP. The

Table 1. Experimental results for second order Butterworth filter instances

instance          Algorithm 1, O(n − d)          Algorithm 1, O(n²)          CPLEX MIQP
γ   σ    n      tt/s    pt/s    nodes        tt/s    pt/s    nodes        tt/s    nodes        optimum
0.2 2.25 30 0.04 0.04 1.85e+03 0.05 0.04 1.84e+03 1.13 1.20e+04 6.657892e-04
0.2 2.25 40 0.11 0.09 1.69e+04 0.31 0.15 1.68e+04 61.71 5.14e+05 2.703067e-04
0.2 2.25 50 0.24 0.21 3.00e+04 0.88 0.41 3.00e+04 20995.52 1.20e+08 1.088225e-04
0.2 2.25 60 0.77 0.43 2.62e+05 6.24 0.90 2.62e+05 — — 5.987001e-05
0.2 2.25 70 2.49 0.82 1.09e+06 32.96 1.91 1.09e+06 — — 3.811803e-05
0.2 2.25 80 2.38 1.46 5.05e+05 25.91 3.64 5.05e+05 — — 1.863941e-05
0.2 2.25 90 22.72 2.45 1.09e+07 523.11 6.41 1.09e+07 — — 1.398550e-05
0.2 2.25 100 104.58 3.93 5.12e+07 2958.38 10.68 5.12e+07 — — 9.677417e-06
0.2 2.25 110 1039.53 6.05 4.99e+08 — — — — — 7.226546e-06
0.2 2.25 120 7815.37 9.04 3.58e+09 — — — — — 5.332916e-06
0.3 2.25 30 0.03 0.03 2.50e+03 0.06 0.04 2.49e+03 1.83 1.86e+04 7.691583e-04
0.3 2.25 40 0.12 0.08 3.89e+04 0.52 0.15 3.89e+04 211.93 1.79e+06 3.624281e-04
0.3 2.25 50 0.32 0.21 9.36e+04 1.78 0.40 9.36e+04 — — 1.453055e-04
0.3 2.25 60 1.53 0.43 8.75e+05 16.94 0.90 8.75e+05 — — 7.434848e-05
0.3 2.25 70 2.83 0.82 1.34e+06 39.34 1.91 1.34e+06 — — 4.060845e-05
0.3 2.25 80 11.10 1.45 5.87e+06 213.88 3.66 5.87e+06 — — 2.452317e-05
0.3 2.25 90 28.07 2.45 1.41e+07 688.08 6.41 1.41e+07 — — 1.513070e-05
0.3 2.25 100 78.37 3.92 3.84e+07 2135.50 10.69 3.84e+07 — — 9.448114e-06
0.3 2.25 110 322.24 6.04 1.46e+08 10425.21 16.85 1.46e+08 — — 6.957344e-06
0.3 2.25 120 10814.91 9.03 4.99e+09 — — — — — 5.685910e-06
0.4 2.25 30 0.03 0.03 2.05e+03 0.06 0.04 2.04e+03 1.22 1.29e+04 8.177199e-04
0.4 2.25 40 0.09 0.08 1.06e+04 0.26 0.15 1.06e+04 82.13 6.80e+05 3.136527e-04
0.4 2.25 50 0.29 0.20 7.70e+04 1.53 0.40 7.70e+04 — — 1.518196e-04
0.4 2.25 60 0.73 0.43 2.15e+05 5.49 0.90 2.15e+05 — — 7.385543e-05
0.4 2.25 70 3.57 0.82 1.88e+06 51.64 1.92 1.88e+06 — — 4.473205e-05
0.4 2.25 80 1.67 1.46 1.04e+05 8.60 3.65 1.04e+05 — — 1.776351e-05
0.4 2.25 90 20.82 2.45 9.98e+06 520.05 6.85 9.98e+06 — — 1.587669e-05
0.4 2.25 100 244.66 3.94 1.24e+08 6615.51 11.62 1.24e+08 — — 1.167412e-05
0.4 2.25 110 804.77 6.06 3.86e+08 24945.14 16.85 3.86e+08 — — 7.950806e-06
0.4 2.25 120 8702.36 9.11 4.08e+09 — — — — — 5.842935e-06

Table 2. Experimental results for CVP instances

Algorithm 1, O(n − d) CPLEX MIQP


n # tt/s pt/s nt/μs nodes # tt/s nt/μs nodes
20 10 0.01 0.01 0.00 3.26e+02 10 0.08 87.10 8.66e+02
25 10 0.02 0.02 0.00 1.96e+03 10 0.40 93.80 4.40e+03
30 10 0.04 0.03 1.16 1.07e+04 10 2.54 101.75 2.51e+04
35 10 0.14 0.05 1.22 7.39e+04 10 17.72 117.49 1.50e+05
40 10 0.75 0.09 1.31 4.97e+05 10 176.80 138.89 1.26e+06
45 10 6.46 0.14 1.51 4.21e+06 10 1759.09 167.00 1.03e+07
50 10 40.45 0.21 1.59 2.51e+07 10 10875.65 202.87 5.18e+07
55 10 144.87 0.32 1.74 8.39e+07 3 15379.42 235.16 6.52e+07
60 10 2936.50 0.45 2.00 1.45e+09 0 — — —
65 10 6935.54 0.63 2.18 3.17e+09 0 — — —
70 5 15065.94 0.86 2.33 6.45e+09 0 — — —

columns marked “tt/s” contain the total time in seconds needed to solve the
instance to optimality, while “pt/s” states the time in seconds needed for the
preprocessing. Finally, “nodes” contains the total number of nodes processed in
the enumeration tree. The last column contains the value of the optimal solution.
As is obvious from Table 1, our algorithm outperforms CPLEX by several or-
ders of magnitude. CPLEX could not solve instances with more than 50 variables
within the time limit of 8 hours, while the linear-time version of our algorithm
solves all instances up to 120 variables. As expected, quadratic running time
per node leads to much slower total times. However, even the quadratic version
clearly outperforms CPLEX.

4.2 Closest Vector Problem


As anticipated in Section 1.2, we are not aware of computational contributions to
the (optimal) solution of the CVP. Thus, even for the practical applications arising
in cryptography, we could not find any public library of test problems. Therefore,
in this section, we present results for random instances of the problem whose gen-
eration is illustrated in the full paper. Given that all methods widely benefit from
this, we apply basis reduction to these instances before solving them.
The results are presented in Table 2, where 10 random instances have been
considered for each n. The first column for every solution method shows the
number of instances solved to optimality, the remaining figures are averages
over all solved instances. Again it turns out that (the linear-time version of) our
algorithm is much faster than CPLEX, even if the difference is smaller than in
the ternary case.

Acknowledgments. This work was partially supported by the University of


Bologna within the OpIMA project. The first author is supported by the DFG
under contract BU 2313/1-1.

References
1. Bonami, P., Biegler, L.T., Conn, A.R., Cornuéjols, G., Grossmann, I.E., Laird,
C.D., Lee, J., Lodi, A., Margot, F., Sawaya, N., Wächter, A.: An algorithmic
framework for convex mixed integer nonlinear programs. Discrete Optimization 5,
186–204 (2008)
2. Callegari, S., Bizzarri, F., Rovatti, R., Setti, G.: Discrete quadratic programming
problems by ΔΣ modulation: the case of circulant quadratic forms. Technical re-
port, Arces, University of Bologna (2009)
3. Eisenbrand, F.: Integer programming and algorithmic geometry of numbers. In:
Jünger, M., Liebling, T., Naddef, D., Nemhauser, G., Pulleyblank, W., Reinelt,
G., Rinaldi, G., Wolsey, L.A. (eds.) 50 Years of Integer Programming 1958-2008.
The Early Years and State-of-the-Art Surveys. Springer, Heidelberg (2009)
4. Frangioni, A., Lodi, A., Rinaldi, G.: Optimizing over semimetric polytopes. In:
Bienstock, D., Nemhauser, G.L. (eds.) IPCO 2004. LNCS, vol. 3064, pp. 431–443.
Springer, Heidelberg (2004)
5. Hemmecke, R., Köppe, M., Lee, J., Weismantel, R.: Nonlinear integer program-
ming. In: Jünger, M., Liebling, T., Naddef, D., Nemhauser, G., Pulleyblank, W.,
Reinelt, G., Rinaldi, G., Wolsey, L.A. (eds.) 50 Years of Integer Programming 1958-
2008. The Early Years and State-of-the-Art Surveys. Springer, Heidelberg (2009)
6. ILOG, Inc. ILOG CPLEX 12.1 (2009), http://www.ilog.com/products/cplex
7. Lenstra, A.K., Lenstra Jr., H.W., Lovász, L.: Factoring polynomials with rational
coefficients. Mathematische Annalen 261, 515–534 (1982)
8. LiDIA. LiDIA: A C++ Library For Computational Number Theory (2006),
http://www.cdc.informatik.tu-darmstadt.de/TI/LiDIA/
9. Lodi, A.: MIP computation and beyond. In: Jünger, M., Liebling, T., Naddef, D.,
Nemhauser, G., Pulleyblank, W., Reinelt, G., Rinaldi, G., Wolsey, L.A. (eds.) 50
Years of Integer Programming 1958-2008. The Early Years and State-of-the-Art
Surveys. Springer, Heidelberg (2009)
10. Micciancio, D., Goldwasser, S.: Complexity of Lattice Problems: A Cryptographic
Perspective. Springer, Heidelberg (2002)
11. Moré, J.J., Toraldo, G.: On the solution of large quadratic programming problems
with bound constraints. SIAM Journal on Optimization 1, 93–113 (1991)
12. Rajan, D., Dash, S., Lodi, A.: {−1, 0, 1} unconstrained quadratic programs us-
ing max-flow based relaxations. Technical Report OR/05/13, DEIS, University of
Bologna (2005)
13. De Simone, C.: The cut polytope and the boolean quadric polytope. Discrete Math-
ematics 79, 71–75 (1989)
Extending SDP Integrality Gaps to
Sherali-Adams with Applications to
Quadratic Programming and MaxCutGain

Siavosh Benabbas and Avner Magen

Department of Computer Science, University of Toronto
{siavosh,avner}@cs.toronto.edu

Abstract. We show how under certain conditions one can extend con-
structions of integrality gaps for semidefinite relaxations into ones that
hold for stronger systems: those SDPs to which the so-called k-level con-
straints of the Sherali-Adams hierarchy are added. The value of k above
depends on properties of the problem. We present two applications, to the
Quadratic Programming problem and to the MaxCutGain problem.
Our technique is inspired by a paper of Raghavendra and Steurer
[Raghavendra and Steurer, FOCS 09] and our result gives a doubly ex-
ponential improvement for Quadratic Programming on another re-
sult by the same authors [Raghavendra and Steurer, FOCS 09]. They
provide a tight integrality gap for the system above which is valid up to
k = (log log n)^{Ω(1)} whereas we give such a gap for up to k = n^{Ω(1)}.

1 Introduction

A powerful tool in obtaining approximation algorithms for NP-hard problems is


semidefinite programming. Here one formulates a problem as a quadratic pro-
gram and then “relaxes” it to a semidefinite program. The interesting quantity
one tries to minimize is then the integrality gap defined as the ratio of the ob-
jective value in the relaxation compared to the objective value of the original
problem. When this integrality gap is too big (the relaxation is not tight enough)
one often considers various strengthenings and hopes that they provide a better
integrality gap.
Lift-and-Project systems are systematic ways to produce stronger and stronger
relaxations for a problem from a canonical LP or SDP relaxation. These systems
are equipped with a parameter called “level” which tunes the strength of the
relaxation. As the level increases the relaxation becomes stronger while its size
(and time required to solve it) increases. At one extreme the zeroth level is the
original relaxation and at the other extreme the nth level is an exact formula-
tion of the problem but has exponential size. In between one gets a spectrum of
stronger and stronger relaxations, the l-th of which can be solved in time n^{Θ(l)}. See
[8, 9, 15] for a more in-depth discussion about this methodology. Recently, there
has been a growing interest in understanding the exact interplay between the


quality of such relaxations and the level parameter. See [3, 5, 10, 13, 14, 16, 17]
for a few examples.
Our main problem of interest will be Quadratic Programming as defined
by [4]. Here the input is a matrix A_{n×n} and the objective is to find
x ∈ {−1, 1}ⁿ that maximizes the quadratic form $\sum_{i\ne j} a_{ij}x_ix_j$. The natural
semidefinite programming relaxation of this problem replaces the integer (±1)
valued xᵢ's with unit vectors, and products with inner products. This problem
has applications in correlation clustering and its relaxation is related to the well-
known Grothendieck inequality[6] (see [1].) Charikar and Wirth [4] show that the
integrality gap of this relaxation is at most O(log n) and a matching lower bound
was later established by [1, 2, 7]. To lower-bound the integrality gap, Khot and
O’Donnell [7] construct a solution for the SDP relaxation that has objective
value Θ(log n) times bigger than any integral (±1) solution.
It is then natural to ask if strengthening the SDP with a lift and project
method will reduce this Θ(log n) gap resulting in a better approximation algo-
rithm. We investigate the performance of one of the stronger Lift-and-Project
systems, the so called Sherali-Adams system for this problem. We show that for
Quadratic Programming the integrality gap does not asymptotically decrease even
after level n^δ of the Sherali-Adams hierarchy for some constant δ. The technique
used is somewhat general and can be applied to other problems, and as an ex-
ample we apply it to the related problem of MaxCutGain. We can prove the
following two theorems.
Theorem 1. The standard SDP formulation of Quadratic Programming
admits a Θ(log N) integrality gap even when strengthened with N^{Ω(1)} levels of
the Sherali-Adams hierarchy.
Theorem 2. The standard SDP formulation of MaxCutGain has integrality
gap ω(1) even when strengthened with ω(1) levels of the Sherali-Adams hierarchy.
It should be mentioned that it is known [2, 7] that assuming the Unique Games
Conjecture no polynomial time algorithm can improve upon the Θ(log n) factor
for Quadratic Programming or ω(1) for MaxCutGain. However, our re-
sults do not rely on any such assumptions. Also, given that the Sherali-Adams
strengthening can only be solved in polynomial time for constant levels, our re-
sults preclude an algorithm based on the standard relaxation with Sherali-Adams
strengthening that runs in time $2^{n^\delta}$ for some small enough constant δ.
Here is an outline of our approach. We start with a known integrality gap
instance for the SDP relaxation and its SDP solution A. Now consider an in-
tegral solution B to the same instance (obviously B is a solution to an SDP
strengthened with an arbitrary level of the Sherali-Adams hierarchy.) We com-
bine A and B into a candidate solution C that has some of the good qualities
of both, namely it has high objective value (like A) and has almost the same
behavior with regards to the Sherali-Adams hierarchy as B. In particular, the
only conditions of the Sherali-Adams hierarchy C violates are some positivity
conditions, and those violations can be bounded by some quantity ξ. To handle
these violations we take a convex combination of C with a solution in which

these violated conditions are satisfied with substantial enough slack. In fact a
uniform distribution over all integral points (considered as a solution) has this
property. The weights we need to give to C and the uniform solution in the
convex combination will depend on ξ and the amount of slack in the uniform
solution which are in turn a function of the level parameter k. In turn these two
weights determine the objective value of the resulting solution.
This idea of “smoothening” a solution like the above to satisfy Sherali-Adams
constraints is due to Raghavendra and Steurer[12]. They use this idea together
with dimensionality reduction to give a rounding algorithm that is optimal for
a large class of constraint satisfaction problems. In another relevant new result
[13] the same authors show how to get integrality gaps for the Sherali-Adams
SDPs similar to our work. However, in [13] the main vehicle in getting the results
are reductions from the Unique Games Problem and “smoothening” is used as
a small step.
It is interesting to compare our results with [13]. While [13] gives a more
general result in that it applies to a broader spectrum of problems, it only
holds to a relatively low level of the Sherali-Adams hierarchy. In particular,
for Quadratic Programming their result is valid for Sherali-Adams SDPs of
level up to (log log n)^{Ω(1)} whereas ours is valid up to level n^{Ω(1)}. We note that
the two results exhibit different level/integrality gap tradeoffs as well, in that
[13] provides the same integrality gap asymptotically until the “critical level”
in which point it breaks down completely. Our results supplies a more smooth
tradeoff with the integrality gap dropping “continuously” as the level increases.
An additional difference is that our result (which does not use Unique Games
reductions) is elementary.
The rest of the paper is organized as follows. In Section 2 we are going to
introduce our notation and some basic definitions and identities from Fourier
analysis on the cube. In the same section we will review the relevant definition
of the Sherali-Adams hierarchy. In section 3 we are going to state and prove
our main technical lemma. Section 4 presents an application of the lemma to
Quadratic Programming. There we prove an integrality gap for the SDP
relaxation of this problem strengthened to the n^δ-th level of the Sherali-Adams
hierarchy (for some δ > 0.) In Section 5 we present our application to the
MaxCutGain problem. In particular we present super-constant integrality gaps
for some super-constant level of the Sherali-Adams SDP hierarchy. We conclude
in section 6 by pointing out some of the limitations of our approach and some
open problems.

2 Preliminaries

Notation and Fourier Analysis on the cube


We denote by [n] the set {1, . . . , n} and by $\binom{[n]}{k}$ and $\binom{[n]}{\le k}$ the sets of subsets of
[n] of size exactly k and at most k, respectively. For a distribution μ on some
finite space D we denote by μ(x) the probability of choosing x ∈ D according

to μ. Note that this allows us to think of distributions as real functions with
domain D.
Consider the set of real functions with domain {−1, 1}ⁿ as a linear space of
dimension 2ⁿ. It is well known that under the inner product $\langle f, g\rangle := E_x[f(x)g(x)]$
the functions $\{\chi_S\}_{S\subseteq[n]}$ defined as $\chi_S(x) := \prod_{i\in S} x_i$ form an orthonormal basis,
called the Fourier basis, for this space. In particular, any function f has a unique
Fourier expansion:

$$f = \sum_S \hat f(S)\,\chi_S, \qquad \hat f(S) = \langle f, \chi_S\rangle.$$

We call the $\hat f(S)$'s the Fourier coefficients of f. An immediate corollary is Parseval's
identity,

$$E_x[f(x)g(x)] = \langle f, g\rangle = \sum_S \hat f(S)\,\hat g(S).$$
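These identities are easy to verify by brute force for small n; the following illustrative Python (ours) computes the coefficients as inner products against the χ_S basis and checks the expansion and Parseval:

```python
import numpy as np
from itertools import product, combinations

n = 3
cube = list(product([-1, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
chi = {S: np.array([np.prod([x[i] for i in S]) for x in cube]) for S in subsets}

rng = np.random.default_rng(0)
f = rng.standard_normal(len(cube))
g = rng.standard_normal(len(cube))

fhat = {S: float(np.mean(f * chi[S])) for S in subsets}  # <f, chi_S> = E[f * chi_S]
ghat = {S: float(np.mean(g * chi[S])) for S in subsets}

assert np.allclose(f, sum(fhat[S] * chi[S] for S in subsets))               # expansion
assert np.isclose(np.mean(f * g), sum(fhat[S] * ghat[S] for S in subsets))  # Parseval
```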

For a distribution μ on {−1, 1}ⁿ and for S ⊆ [n] we let Mar_S μ be the marginal
distribution of μ on the set S. In particular, for y ∈ {−1, 1}^S we have
$(\mathrm{Mar}_S\,\mu)(y) = \Pr_{z\sim\mu}[z_S = y_S]$. Alternatively,

$$(\mathrm{Mar}_S\,\mu)(y) = \sum_{z\in\{-1,1\}^{[n]\setminus S}} \mu(y \circ z),$$

where ◦ denotes the concatenation of two vectors on disjoint sets of coordinates in
the natural way. In fact, for any function f : {−1, 1}ⁿ → R we define (Mar_S f) :
{−1, 1}^S → R as above. One can easily express the Fourier coefficients of
Mar_S μ in terms of those of μ:

$$\widehat{\mathrm{Mar}_S\,\mu}(U) = E_{x\in\{-1,1\}^S}\bigl[(\mathrm{Mar}_S\,\mu)(x)\,\chi_U(x)\bigr] = E_{x\in\{-1,1\}^S}\Bigl[\sum_{z\in\{-1,1\}^{[n]\setminus S}} \mu(x \circ z)\,\chi_U(x)\Bigr]$$
$$= 2^{-|S|} \sum_{x\in\{-1,1\}^S}\ \sum_{z\in\{-1,1\}^{[n]\setminus S}} \mu(x \circ z)\,\chi_U(x) = 2^{-|S|} \sum_{y\in\{-1,1\}^{[n]}} \mu(y)\,\chi_U(y) = 2^{n-|S|}\,\hat\mu(U). \tag{1}$$

In particular, for all x ∈ {−1, 1}^S we have that

$$(\mathrm{Mar}_S\,\mu)(x) = \sum_{U\subseteq S} \widehat{\mathrm{Mar}_S\,\mu}(U)\,\chi_U(x) = 2^{n-|S|} \sum_{U\subseteq S} \hat\mu(U)\,\chi_U(x). \tag{2}$$
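Identity (2) says that the marginal is obtained by keeping only the Fourier coefficients inside S, rescaled by 2^{n−|S|}. A quick brute-force check in the same style (our own code):

```python
import numpy as np
from itertools import product, combinations

n, S = 3, (0, 2)                     # marginal on the first and third coordinates
cube = list(product([-1, 1], repeat=n))

rng = np.random.default_rng(1)
mu = rng.random(len(cube)); mu /= mu.sum()        # a distribution on {-1,1}^n
muhat = {U: np.mean([p * np.prod([x[i] for i in U]) for x, p in zip(cube, mu)])
         for r in range(n + 1) for U in combinations(range(n), r)}

for y in product([-1, 1], repeat=len(S)):
    direct = sum(p for x, p in zip(cube, mu)
                 if all(x[S[j]] == y[j] for j in range(len(S))))
    # identity (2): (Mar_S mu)(y) = 2^(n-|S|) * sum over U subseteq S of muhat(U)*chi_U(y)
    via = 2 ** (n - len(S)) * sum(muhat[U] * np.prod([y[S.index(i)] for i in U])
                                  for r in range(len(S) + 1) for U in combinations(S, r))
    assert np.isclose(direct, via)
```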

We will use S^{d−1} for the (d − 1)-dimensional unit sphere, i.e., the set of unit
vectors in R^d. Throughout the paper we will use bold letters for real vectors,
and ⟨v, u⟩ for the inner product of two vectors v, u ∈ R^d.

The Sherali-Adams Hierarchy


The Sherali-Adams hierarchy[15] is a Lift and Project method to strengthen a
canonical LP or SDP formulation of a problem. In this method one starts with
an exact formulation of a problem as an integer program and relaxes it to a
Linear Program or Semidefinite Program, which we call the canonical formula-
tion. One then strengthens the canonical formulation by adding extra variables
and constraints valid for the integer solutions in a specific way. The method has
a parameter, level, to choose the strength of the resulting relaxation, and one
can think of the resulting relaxations (for different level parameters) as mak-
ing a hierarchy, each stronger than the previous. The weakest relaxation is then
the canonical formulation for level k = 0, and the strongest is a perfect for-
mulation of the problem (albeit of exponential size) for k = n, where n is the
number of variables in the canonical formulation. It is known that one can solve
the strengthened relaxation of level k in running time n^{O(k)}. We use the term
Sherali-Adams SDP of level k to identify the Sherali-Adams strengthening of
level k of the canonical SDP formulation of a problem.
In this work we concentrate on SDPs for combinatorial problems that have
no constraints. In particular, consider an optimization problem of the form

max c(x1 , . . . , xn )
Subject to xi ∈ {−1, 1} for all i.

where c is a quadratic polynomial. For such a problem the definition of the


hierarchy is simplified using the following definition.
Definition [Compatibility]. Let k ≤ n be integers. We say that vectors
u₀, . . . , uₙ and a family of distributions {μ_S}_{S⊂[n],|S|≤k}, where each μ_S is a
distribution on {−1, 1}^S, are compatible if the distributions are consistent on
marginals and the inner products of the vectors agree with the appropriate biases
and correlations as follows:

$$\forall i \in [n]\qquad \langle u_0, u_i\rangle = E_{x\sim\mu_{\{i\}}}[x_i], \tag{P1}$$
$$\forall i \ne j \in [n]\qquad \langle u_i, u_j\rangle = E_{x\sim\mu_{\{i,j\}}}[x_ix_j], \tag{P2}$$
$$\forall T \subseteq S \subset [n],\ |S| \le k\qquad \mu_T = \mathrm{Mar}_T\,\mu_S. \tag{P3}$$

Fact 1. A set of vectors is a solution to a Sherali-Adams SDP of level k of such
a problem if and only if they are compatible with some family of distributions (on
subsets of size up to k) as above.
In particular, for Quadratic Programming, MaxCutGain and constraint
satisfaction problems where we do not have any hard constraints on the variables
we can just focus on the definition of compatibility as above. We do not supply
the equivalence proof for Fact 1 in this extended abstract, but instead refer the
reader to a similar concept laid forth by [3, 17].

The goal of the present work is to show that (for particular problems), even
for large values of k, the Sherali-Adams SDP of level k is not much stronger
than the canonical relaxation. In particular, it has asymptotically the same
integrality gap.

3 Extending Vector Solutions to Sherali-Adams


Hierarchy
In this section we provide the general framework that suggests how vector (or
SDP) solutions can be extended to Sherali-Adams solutions of comparable ob-
jective value. This framework is captured by the following lemma.

Lemma 2. Let v₀, v₁, . . . , vₙ ∈ S^{n−1} be unit vectors, ν be a distribution on
{−1, 1}ⁿ, k be a positive integer and ε ∈ (0, 1/2] be small enough to satisfy,

$$\text{for all } i \in [n]\qquad 2\epsilon k^2\,\bigl|\langle v_0, v_i\rangle - E_{x\sim\nu}[x_i]\bigr| \le 1,$$
$$\text{for all } i \ne j \in [n]\qquad 2\epsilon k^2\,\bigl|\langle v_i, v_j\rangle - E_{x\sim\nu}[x_ix_j]\bigr| \le 1.$$

Then there exist unit vectors u₀, u₁, . . . , uₙ ∈ S^{n−1} and a family of distributions
{μ_S}_{S⊂[n],|S|≤k} such that the uᵢ's are compatible with {μ_S} as defined in Section 2,
i.e., they are a valid solution for the Sherali-Adams SDP of level k. Furthermore,
the inner products of the uᵢ's are related to those of the vᵢ's as follows:

$$\text{for all } i \in [n]\qquad \langle u_0, u_i\rangle = \epsilon\langle v_0, v_i\rangle, \tag{3}$$
$$\text{for all } i \ne j \in [n]\qquad \langle u_i, u_j\rangle = \epsilon\langle v_i, v_j\rangle. \tag{4}$$

Before we prove the lemma we will briefly describe its use. For simplicity we will
only describe the case where ν is the uniform distribution on {−1, 1}n. To use the
lemma one starts with an integrality gap instance for the canonical SDP relax-
ation of the problem of interest (say Quadratic Programming) and its vector
solution v0 , v1 , . . . , vn . Given that the instance is an integrality gap instance one
knows that the objective value attained by v0 , v1 , . . . , vn is much bigger than
what is attainable by any integral solution. The simplest thing one can hope for
is to show that this vector solution is also a solution to the Sherali-Adams SDP
of level k, or in the language of Fact 1 the vectors are compatible with a set of
distributions {μS }S⊂[n],|S|≤k . However, this is not generally possible. Instead we
use the lemma (in the simplest case with ν being the uniform distribution on
{−1, 1}n) to get another set of vectors u0 , . . . , un which are in fact compatible
with some {μS }S⊂[n],|S|≤k . Given that in the problems we consider the objective
function is a quadratic polynomial, and given the promise of the lemma that
the inner products of the uᵢ's are ε times those of the vᵢ's, it follows that the
objective value attained by the uᵢ's is ε times that attained by the vᵢ's. As the
vᵢ's are the vector solution for an integrality gap instance, it follows that the
integrality gap will decrease by a multiplicative factor of at most ε when one goes
from the canonical SDP relaxation to the Sherali-Adams SDP of level k.

How big one can take ε (satisfying the requirements of the lemma) will then
determine the quality of the resulting integrality gap. In the simplest case one
can take ν to be the uniform distribution on {−1, 1}ⁿ and argue that in this case
the requirements of the lemma are satisfied as long as 2εk² ≤ 1, and in particular
for ε = 1/2k². In fact our application to the MaxCutGain problem detailed in
Section 5 follows this simple outline. For Quadratic Programming we can get
a better result by taking a close look at the particular structure of the vᵢ's for the
integrality gap instance of [7] and using a more appropriate distribution for ν.
We will now prove Lemma 2.

Proof (of Lemma 2). Our proof proceeds in two steps. First, we construct a
family of functions {fS : {−1, 1}S → R}S⊂[n],|S|≤k that satisfy all the required
conditions except being the probability mass function of distributions. In par-
ticular, while for any S the sum of the values of fS is 1, this function can take
negative values for some inputs. The construction of fS uses the distribution
ν and guarantees that while fS may take negative values, these values are not
too small. In the second step we take a convex combination of the fS ’s with
the uniform distribution on {−1, 1}S to get the desired family of distributions
{μS }|S|≤k .
For any subset $S \in \binom{[n]}{\le k}$ we define f_S as a “hybrid” object that uses
both the vectors {vᵢ} and the distribution ν. Given that a function is uniquely
determined by its Fourier expansion, we will define f_S in terms of its Fourier
coefficients:

$$\hat f_S(\emptyset) = 2^{-|S|} = 2^{n-|S|}\,\hat\nu(\emptyset),$$
$$\forall i \in S\qquad \hat f_S(\{i\}) = 2^{-|S|}\,\langle v_0, v_i\rangle,$$
$$\forall i \ne j \in S\qquad \hat f_S(\{i,j\}) = 2^{-|S|}\,\langle v_i, v_j\rangle,$$
$$\forall T \subseteq S,\ |T| > 2\qquad \hat f_S(T) = 2^{n-|S|}\,\hat\nu(T).$$
Comparing the above definition with (1), f_S is exactly like the marginal
distribution of ν on the set S except that it has different degree-one and degree-two
Fourier coefficients. First, observe that for any S, the sum of the values of f_S is 1:

$$\sum_{x\in\{-1,1\}^S} f_S(x) = 2^{|S|}\,E_x[f_S(x)] = 2^{|S|}\,\hat f_S(\emptyset) = 1.$$

Then observe that, by (1), for all U ⊆ T ⊂ S,

$$\widehat{\mathrm{Mar}_T f_S}(U) = 2^{|S|-|T|}\,\hat f_S(U) = \hat f_T(U).$$

So, f_S satisfies (P3). Now observe that


$$\sum_{x\in\{-1,1\}^S} f_S(x)\,x_i = 2^{|S|}\,E_x\bigl[f_S(x)\chi_{\{i\}}(x)\bigr] = 2^{|S|}\,\hat f_S(\{i\}) = \langle v_0, v_i\rangle,$$
$$\sum_{x\in\{-1,1\}^S} f_S(x)\,x_ix_j = 2^{|S|}\,E_x\bigl[f_S(x)\chi_{\{i,j\}}(x)\bigr] = 2^{|S|}\,\hat f_S(\{i,j\}) = \langle v_i, v_j\rangle.$$

So, the f_S's satisfy (P1) and (P2) and are compatible with the vᵢ's (except that
they are not distributions.¹) Next we show that f_S(y) cannot be too negative:

$$f_S(y) = \sum_{T\subseteq S} \hat f_S(T)\,\chi_T(y)$$
$$= 2^{n-|S|}\sum_{T\subseteq S} \hat\nu(T)\,\chi_T(y) + \sum_{T\subseteq S}\bigl(\hat f_S(T) - 2^{n-|S|}\hat\nu(T)\bigr)\chi_T(y)$$
$$= (\mathrm{Mar}_S\,\nu)(y) + \sum_{T\subseteq S}\bigl(\hat f_S(T) - 2^{n-|S|}\hat\nu(T)\bigr)\chi_T(y) \qquad \text{by (2)}$$
$$\ge -\sum_{T\subseteq S}\bigl|\hat f_S(T) - 2^{n-|S|}\hat\nu(T)\bigr|$$
$$= -\sum_{i\in S}\bigl|\hat f_S(\{i\}) - 2^{n-|S|}\hat\nu(\{i\})\bigr| - \sum_{i\ne j\in S}\bigl|\hat f_S(\{i,j\}) - 2^{n-|S|}\hat\nu(\{i,j\})\bigr|$$
$$= -2^{-|S|}\Bigl(\sum_{i\in S}\bigl|\langle v_0, v_i\rangle - E_{x\sim\nu}[x_i]\bigr| + \sum_{i\ne j\in S}\bigl|\langle v_i, v_j\rangle - E_{x\sim\nu}[x_ix_j]\bigr|\Bigr)$$
$$\ge -\,2^{-|S|}/2\epsilon,$$

where the last step follows from the condition on ε and because |S| ≤ k. This
completes the first step of the proof.
Next, define π to be the uniform distribution on {−1, 1}ⁿ and μ_S as a convex
combination of f_S and Mar_S π, i.e.,

$$\forall y \in \{-1,1\}^S\qquad \mu_S(y) = \epsilon f_S(y) + (1-\epsilon)(\mathrm{Mar}_S\,\pi)(y).$$

Similarly, define

$$u_i = \sqrt{\epsilon}\cdot v_i + \sqrt{1-\epsilon}\cdot w_i.$$

Here, wi ’s are defined such that they are perpendicular to all vi ’s and each other.
It is easy to check that the uᵢ's are compatible with the μ_S's and satisfy all the
required properties of the lemma (except that the μ_S's could potentially be
negative.) In fact (P1)–(P3) are linear, and given that they hold for {f_S} and {vᵢ}
and for {Mar_S π} and {wᵢ}, they must hold for {μ_S} and {uᵢ}. Finally, for any
$S \in \binom{[n]}{\le k}$ and y ∈ {−1, 1}^S we have

$$\mu_S(y) = \epsilon f_S(y) + (1-\epsilon)(\mathrm{Mar}_S\,\pi)(y) \ \ge\ -\epsilon\cdot 2^{-|S|}/2\epsilon + (1-\epsilon)\,2^{-|S|} = 2^{-|S|-1} - \epsilon\,2^{-|S|} \ \ge\ 0.$$

¹ Note that for a distribution μ_S we have $E_{x\sim\mu_S}[x_i] = \sum_{x\in\{-1,1\}^S}\mu_S(x)\,x_i$. Hence,
the sums above are the quantities relevant to (P1) and (P2).
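A brute-force check of the construction for tiny parameters (our own test code; it takes ν and π both uniform, so the lemma's condition reduces to 2εk² ≤ 1, and uses the Fourier description of f_S given above):

```python
import numpy as np
from itertools import product, combinations

n, k = 4, 2
rng = np.random.default_rng(2)
V = rng.standard_normal((n + 1, n + 1))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # rows: v_0, v_1, ..., v_n
eps = 1.0 / (2 * k * k)                          # uniform nu: 2*eps*k^2 <= 1

for S in combinations(range(n), k):
    def f_S(y):                                  # f_S via its Fourier expansion;
        val = 2.0 ** (-k)                        # uniform nu kills all |T| > 2 terms
        for j, i in enumerate(S):
            val += 2.0 ** (-k) * (V[0] @ V[i + 1]) * y[j]
        for (ja, a), (jb, b) in combinations(list(enumerate(S)), 2):
            val += 2.0 ** (-k) * (V[a + 1] @ V[b + 1]) * y[ja] * y[jb]
        return val

    mu = {y: eps * f_S(y) + (1.0 - eps) * 2.0 ** (-k)   # the smoothing step
          for y in product([-1, 1], repeat=k)}
    assert all(p >= 0 for p in mu.values())             # negativity is gone
    assert np.isclose(sum(mu.values()), 1.0)
    for j, i in enumerate(S):                           # biases scaled by eps, cf. (3)
        bias = sum(p * y[j] for y, p in mu.items())
        assert np.isclose(bias, eps * (V[0] @ V[i + 1]))
```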

Remark: Some of the conditions in Lemma 2 can readily be relaxed. First,


notice that the choice of π as the uniform distribution is not essential. The
only property that was needed for the argument to work was that the marginal


distribution of π over S assigns positive probability to every member of {−1, 1}^S
(the existence of compatible wᵢ's is also required, but is true for any distribution
on {−1, 1}ⁿ.) More precisely, there is a positive function δ(k) so that

$$\Pr_{x\sim\mathrm{Mar}_S\,\pi}[x = y] \ \ge\ \delta(k).$$

One would need a stronger condition on ε in this case that depends on δ(k). The
inner products of the resulting vectors would of course depend on the distribution
π, namely,

$$\langle u_0, u_i\rangle = \epsilon\langle v_0, v_i\rangle + (1-\epsilon)\,E_{x\sim\pi}[x_i],$$
$$\langle u_i, u_j\rangle = \epsilon\langle v_i, v_j\rangle + (1-\epsilon)\,E_{x\sim\pi}[x_ix_j].$$

Another observation is that ν and π do not have to be true (global) distributions


on {−1, 1}n. Instead, we can start with two families of distributions on sets of
size at most k and a set of vectors w0 , w1 , . . . , wn such that πS ’s are compatible
with wi ’s and νS ’s are consistent with each other in the sense of equality (P3).

4 Application to Quadratic Programming


In the Quadratic Programming problem one is given a matrix  An×n and
is asked to find x ∈ {−1, 1}n maximizing the quadratic form i=j aij xi xj .
The canonical semidefinite programming relaxation of this problem replaces the
integer (±1) valued xi ’s with unit vectors vi ’s, and products xi xj with the inner
products "vi , vj #. In particular, showing an integrality gap of Θ(log N ) for the
canonical SDP relaxation is equivalent to showing that for any large enough N
 AN ×N and value ξ, such that the following hold; (a) for all
there exists a matrix
x ∈ {−1, 1}
N
, i=j aij xi xj ≤ O(ξ/ log(N )), (b) there exist unit vectors {vi }
such that i=j aij "vi , vj # ≥ Ω(ξ). In order to show an integrality gap for the
Sherali-Adams SDP of level k one in addition needs to show that {vi }’s are
compatible with a set of distributions {μS }S⊂[n],|S|≤k .
As discussed earlier we will start with the integrality gap instance of Khot and
O’Donnell [7] and apply lemma 2. The following is a fairly immediate corollary
that follows from Theorem 4.4 and Proposition 2.2 in [7]. It’s worth noting that
Khot and O’Donnell are mainly interested in the existence of integrality gap
instances, and the matrix A is implicit in their work.
Corollary 1 (follows from [7]). Let ξ > 0 be sufficiently small and let d =
1/ξ 3 and n = Θ(d7 ). Further, let m = 1ξ log( 1ξ ) = Θ(n1/21 log n) and N =
nm. Choose n vectors u1 , . . . , un ∈ Rd according to the d-dimensional Gaussian
distribution. Then one can define AN ×N as a function of ui ’s such that almost
surely the following two conditions hold:

1. i=j aij xi xj ≤ O(ξ/ log(1/ξ)) for all
x ∈ {−1, 1}N .
2. There exist unit vectors vi such that i=j aij "vi , vj # ≥ Ω(ξ).
308 S. Benabbas and A. Magen

Furthermore, the vi ’s are produced in this simple manner based on ui ’s. Divide
the N variables into n classes of size m each, and assign the vector uj /uj  to
the variables in the jth class. Formally, vi = ui/m /ui/m .
We will need the the following property of the vi ’s which easily follows from
well-known properties of random unit vectors. We will prove it in appendix A
for completeness.
Fact 3. In the SDP solution of [7], with probability at least 1−4/n4, for all pairs
of indices 1 ≤ i, j ≤ N the inner product "vi , vj # has the following property,
vi , vj  = 1 if i and j are in the same class, i.e. i/m = j/m,
=
|vi , vj | ≤ (12 log n)/d if i and j are in different classes, i.e. i/m = j/m.
Recall that in Lemma 2 a choice of a distribution ν on {−1, 1}n is required. In
particular, if for every pair of variables i, j, Ex∼ν [xi xj ] is close to "vi , vj # one
can choose a big value for in the lemma, which in turn means that the resulting
ui ’s will have inner products close to those of vi ’s.
Indeed, the key to Theorem 1 is using ν which is “agreeable” with fact 3: two
variables will have a large or a small covariance depending on whether they are
from the same or different classes, respectively. Luckily, this is easily achievable
by identifying variables in the same class and assigning values independently
across classes. In other words the distribution ν will choose a random value from
{−1, 1} for x1 = x2 = · · · = xm , an independently chosen value for xm+1 = · · · =
x2m , and similarly an independently chosen value for xnm−m+1 = · · · = xnm .
Such ν clearly satisfies,

1 if i and j are in the same class,
E [xi xj ] =
x∼ν 0 otherwise.
Consider the vector solution of [7], v1 , . . . , vnm and define v0 as a vector per-
pendicular to all other vi ’s. Consider the distribution ν defined above and apply
Lemma 2 for v0 , . . . , vi , ν, k = d0.2 , and = 1/2. By Fact 3 the inner products
of the vi vectors are close to the corresponding correlations of the distribution
ν. It is easy to check that the conditions of Lemma 2 are satisfied,
2 k 2 |"v0 , vi # − E [xi ] | = 2 k 2 |0 − 0| = 0,
x∼ν
= =
2 k |"vi , vj # − E [xi xj ] | ≤ d0.4 (12 log n)/d = 12 log n/d0.1  1,
2
x∼ν

and the lemma applies for large enough n thus the resulting vectors, ui ’s, are
a solution to the Sherali-Adams SDP of level k. It is now easy to see that (4)
implies a big objective value for this solution,
 
aij "ui , uj # = aij "vi , vj # ≥ Ω(ξ).
i=j i=j

It remains to estimate the value of k in terms of N :


k = d0.2 = Θ(n1/35 ) = Θ(N 21/770 / log1/35 N ) = Ω(N 1/37 ),
and we conclude,
Extending SDP Integrality Gaps to Sherali-Adams with Applications 309

Theorem 1 (restated). For δ = 1/37, the Sherali-Adams SDP of level N δ for


Quadratic Programming has integrality gap Ω(log N ).

5 Application to MaxCutGain
The MaxCutGain problem is an important special case of the Quadratic Pro-
gramming problem where all entries of the matrix A are nonpositive. In other
words the input is a nonpositive n by n matrixA and the objective is to find
x ∈ {−1, 1}n that maximizes the quadratic form i=j aij xi xj . This problem gets
its name and main motivation from studying algorithms for the MaxCut prob-
lem that perform well for graphs with maximum cut value close to half the edges.
See [7] for a discussion.
Naturally, constructing integrality gap instances for MaxCutGain is harder
than Quadratic Programing. The best integrality gap instances are due to Khot
and O’Donnell [7] who, for any constant Δ, construct an instance of integrality
gap at least Δ. The following is a restatement of their Theorem 4.1 tailored to
our application.
Theorem 3 ([7]). The standard SDP relaxation for MaxCutGain has super
constant integrality gap. Specifically, for any constant ξ > 0, there is a big enough
n and a matrix An×n such that,

i=j aij xi xj ≤ ξ/ log ξ for all x ∈ {−1, 1} 
1 n
1. .
2. There are unit vectors v1 , . . . , vn such that i=j aij "vi , vj # ≥ Ω(ξ).
It should be mentioned that the proof of [7] is continuous in nature and it is not
entirely clear how n grows as a function of 1/ξ (once some form of discretization
is applied to their construction.) However, an integrality gap of f (n) for some
function f (n) = ω(1) is implicit in the above theorem.
Consider=the instance from Theorem 3 and the function f (n) as above and
let g(n) =3 f (n). We know that for every n, there are unit vectors v1 , . . . , vn
such that i=j aij "vi , vj # ≥ Ω(ξ). Let v0 be a unit vector perpendicular to all
vi ’s and set k = g(n), = g(n)−2 /2 and let ν be the uniform distribution on
{−1, 1}n. Note that for all i < j,
2 k 2 |"v0 , vi # − E [xi ] | = |"v0 , vi #| = 0,
x∼ν
2 k 2 |"vi , vj # − E [xi xj ] | = |"vi , vj #| ≤ 1,
x∼ν

hence the conditions of Lemma 2 hold. Consequently, there are vectors


u0 , u1 , . . . , un compatible with a family of distributions {μS } on subsets of size
up to k = g(n) which satisfy (4). Now,
 
aij "ui , uj # = aij "vi , vj # ≥ Ω(ξ/g(n)2 ),
i=j i=j

∀x ∈ {−1, 1} n
aij xi xj ≤ O(ξ/f (n)) = O(ξ/g(n)3 )
i=j

and we obtain,
310 S. Benabbas and A. Magen

Theorem 2 (restated).There exists a function g(n) = ω(1), such that the


Sherali-Adams SDP of level g(n) for MaxCutGain has integrality gap Ω(g(n)).

6 Discussion

We saw that the Sherali-Adams SDP of level nΩ(1) for Quadratic Program-
ming has the same asymptotic integrality gap as the canonical SDP, namely
Ω(log n). It is interesting to see other problems for which this kind of construc-
tion can prove meaningful integrality gap results. It is easy to see that as long
as a problem does not have “hard” constraints, and a super constant integrality
gap for the canonical SDP relaxation is known, one can get super constant inte-
grality gaps for super constant levels of the Sherali-Adams SDP just by plugging
in the uniform distribution for ν in Lemma 2.
It is possible to show that the same techniques apply when the objective
function is a polynomial of degree greater than 2 (but still constant.) This is
particularly relevant to Max-CSP(P ) problems. When formulated as a maxi-
mization problem of a polynomial q over ±1 valued variables, q will have degree
r, the arity of P . In fact, the canonical SDP formulation for the case r > 2 will
be very similar to Sherali-Adams SDP of level r in our case. In order to adapt
Lemma 2 to this setting, the Fourier expansion of fS ’s should be adjusted appro-
priately. Specifically, their Fourier expansion would match that of the starting
SDP solution up to level r and that of ν beyond level r. It is also possible to
define “gain” versions of Max-CSP(P ) problems in this setting and extend ex-
isting superconstant integrality gaps to the Sherali-Adams SDP of superconstant
level (details omitted in this extended abstract.)
The first obvious open problem is to extend these constructions to be appli-
cable to problems with “hard” constraints for which SDP integrality gaps have
been (or may potentially be) shown. A possibly reasonable candidate along these
lines would be the Sparsest Cut problem in which we have one normalizing con-
straint in the SDP relaxation, and for which superconstant integrality gaps are
known. In contrast, it seems quite unlikely that these techniques can be extended
for Vertex Cover where the integrality gap is constant. Another direction is to
extend these techniques to the Lasserre hierarchy for which very few integrality
gaps are known. We believe that this is possible but more sophisticated “merge
operators” of distributions and vectors á la Lemma 2 will be necessary.

References

1. Alon, N., Makarychev, K., Makarychev, Y., Naor, A.: Quadratic forms on graphs.
Invent. Math. 163(3), 499–522 (2006)
2. Arora, S., Berger, E., Kindler, G., Safra, M., Hazan, E.: On non-approximability for
quadratic programs. In: FOCS ’05: Proceedings of the 46th Annual IEEE Sympo-
sium on Foundations of Computer Science, pp. 206–215. IEEE Computer Society,
Washington (2005)
Extending SDP Integrality Gaps to Sherali-Adams with Applications 311

3. Charikar, M., Makarychev, K., Makarychev, Y.: Integrality gaps for sherali-adams
relaxations. In: STOC ’09: Proceedings of the 41st annual ACM symposium on
Theory of computing, pp. 283–292. ACM, New York (2009)
4. Charikar, M., Wirth, A.: Maximizing quadratic programs: Extending
grothendieck’s inequality. In: FOCS ’04: Proceedings of the 45th Annual IEEE
Symposium on Foundations of Computer Science, pp. 54–60. IEEE Computer
Society, Washington (2004)
5. Georgiou, K., Magen, A., Tulsiani, M.: Optimal sherali-adams gaps from pairwise
independence. In: APPROX-RANDOM, pp. 125–139 (2009)
6. Grothendieck, A.: Résumé de la théorie métrique des produits tensoriels
topologiques. Bol. Soc. Mat. São Paulo 8, 1–79 (1953)
7. Khot, S., O’Donnell, R.: Sdp gaps and ugc-hardness for maxcutgain. In: FOCS ’06:
Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer
Science, pp. 217–226. IEEE Computer Society, Washington (2006)
8. Lasserre, J.B.: An explicit exact sdp relaxation for nonlinear 0-1 programs. In:
Aardal, K., Gerards, B. (eds.) IPCO 2001. LNCS, vol. 2081, pp. 293–303. Springer,
Heidelberg (2001)
9. Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0–1 optimization.
SIAM Journal on Optimization 1(2), 166–190 (1991)
10. Mathieu, C., Sinclair, A.: Sherali-adams relaxations of the matching polytope.
In: STOC ’09: Proceedings of the 41st annual ACM symposium on Theory of
computing, pp. 293–302. ACM, New York (2009)
11. Matousek, J.: Lectures on Discrete Geometry. Springer, New York (2002)
12. Raghavendra, P., Steurer, D.: How to round any csp. In: FOCS ’09: Proceedings
of the 50th Annual IEEE Symposium on Foundations of Computer Science (to
appear 2009)
13. Raghavendra, P., Steurer, D.: Integrality gaps for strong sdp relaxations of unique
games. In: FOCS ’09: Proceedings of the 50th Annual IEEE Symposium on Foun-
dations of Computer Science (to appear 2009)
14. Schoenebeck, G.: Linear level lasserre lower bounds for certain k-csps. In: FOCS
’08: Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of
Computer Science, pp. 593–602. IEEE Computer Society, Washington (2008)
15. Sherali, H.D., Adams, W.P.: A hierarchy of relaxations between the continuous and
convex hull representations for zero-one programming problems. SIAM Journal on
Discrete Mathematics 3(3), 411–430 (1990)
16. Tulsiani, M.: Csp gaps and reductions in the lasserre hierarchy. In: STOC ’09:
Proceedings of the 41st annual ACM symposium on Theory of computing, pp.
303–312. ACM, New York (2009)
17. de la Vega, W.F., Kenyon-Mathieu, C.: Linear programming relaxations of maxcut.
In: SODA ’07: Proceedings of the eighteenth annual ACM-SIAM symposium on
Discrete algorithms, pp. 53–61. Society for Industrial and Applied Mathematics,
Philadelphia (2007)
312 S. Benabbas and A. Magen

A Proof of Fact 3

We will need the following simple and well known lemma that most of the area
of Sd−1 is concentrated around the equator.

Lemma 4. For any unit vector v ∈ Sd−1 , if a unit vector x ∈ Sd−1 is chosen
uniformly at random, |"v, x#| is sharply concentrated:
$ √ %
Pr |"v, x#| ≥ t/ d ≤ 4e−t /2 .
2

Proof. Define,
def
f (x) = "v, x#,

and apply Lévy’s lemma (see Theorem 14.3.2 of [11]) observing that f (x) is
1-Lipschitz. We will have,
$ √ % $ √ % √ 2
Pr |"v, x#| ≥ t/ d = Pr |f (x) − median(f )| ≥ t/ d ≤ 4e−(t/ d) d/2

= 4e−t
2
/2
.

Now, proving the fact is a matter of looking at the actual definition of the
solution vectors and applying lemma 4.
Fact 3 (restated). In the SDP solution of [7], with probability at least 1−4/n4,
for all pairs of indices 1 ≤ i, j ≤ N the inner product "vi , vj # has the following
property,

"vi , vj # = 1 if i and j are in the same class,


=
|"vi , vj #| ≤ (12 log n)/d if i and j are in different classes.

Proof. The first case follows from the definition of vi ’s. For the second case vi
and vj are independent d-dimensional vectors distributed uniformly on Sd−1 .
Consider a particular choice of vi , according to lemma 4,
$ = %
Pr |"vi , vj #| ≥ (12 log n)/d ≤ 4e−6 log n = 4n−6 .
vj

Applying union bound on all n2 pairs of classes shows that the condition of the
lemma holds for all pairs with probability at least,

1 − n2 4n−6 = 1 − 4/n4 .
The Price of Collusion in Series-Parallel
Networks

Umang Bhaskar1, Lisa Fleischer1 , and Chien-Chung Huang2,


1
Department of Computer Science, Dartmouth College, Hanover, NH 03755, U.S.A.
{umang,lkf}@cs.dartmouth.edu
2
Max-Planck-Institut für Informatik, Saarbrücken, 66123, Germany
[email protected]

Abstract. We study the quality of equilibrium in atomic splittable rout-


ing games. We show that in single-source single-sink games on series-
parallel graphs, the price of collusion — the ratio of the total delay
of atomic Nash equilibrium to the Wardrop equilibrium — is at most 1.
This proves that the existing bounds on the price of anarchy for Wardrop
equilibria carry over to atomic splittable routing games in this setting.

1 Introduction
In a routing game, players have a fixed amount of flow which they route in
a network [16,18,24]. The flow on any edge in the network faces a delay, and
the delay on an edge is a function of the total flow on that edge. We look at
routing games in which each player routes flow to minimize his own delay, where
a player’s delay is the sum over edges of the product of his flow on the edge and
the delay of the edge. This objective measures the average delay of his flow and
is commonly used in traffic planning [11] and network routing [16].
Routing games are used to model traffic congestion on roads, overlay routing
on the Internet, transportation of freight, and scheduling tasks on machines.
Players in these games can be of two types, depending on the amount of flow they
control. Nonatomic players control only a negligible amount of flow, while atomic
players control a larger, non-negligible amount of flow. Further, atomic players
may or may not be able to split their flow along different paths. Depending
on the players, three types of routing games are: games with (i) nonatomic
players, (ii) atomic players who pick a single path to route their flow, and (iii)
atomic players who can split their flow along several paths. These are nonatomic
[21,22,24], atomic unsplittable [3,10] and atomic splittable [8,16,19] routing games
respectively. We study atomic splittable routing games in this work. These games
are less well-understood than either nonatomic or atomic unsplittable routing
games. One significant challenge here is that, unlike most other routing games,
each player has an infinite strategy space. Further, unlike nonatomic routing
games, the players are asymmetric since each player has different flow value.

This work was supported in part by NSF grant CCF-0728869.

Research supported by an Alexander von Humboldt fellowship.

F. Eisenbrand and B. Shepherd (Eds.): IPCO 2010, LNCS 6080, pp. 313–326, 2010.

c Springer-Verlag Berlin Heidelberg 2010
314 U. Bhaskar, L. Fleischer, and C.-C. Huang

An equilibrium flow in a routing game is a flow where no single player can


change his flow pattern and reduce his delay. Equilibria are of interest since they
are a stable outcome of games. In both atomic splittable and nonatomic routing
games, equilibria exist under mild assumptions on the delay functions [4,17]. We
refer to equilibria in atomic splittable games as Nash equilibria and in nonatomic
games as Wardrop equilibria [24]. While the Wardrop equilibrium is known to be
essentially unique [24], atomic splittable games can have multiple equilibria [5].
One measure of the quality of a flow is the total delay of the flow: the sum
over all edges of the product of the flow on the edge and the induced delay on the
edge. For routing games, one conern is the degradation in the quality of equilib-
rium flow caused by lack of central coordination. This is measured by the price
of anarchy of a routing game, defined as the ratio of the total delay of worst-case
equilibrium in a routing game to the total delay of the flow that minimizes the to-
tal delay. Tight bounds on the price of anarchy are known for nonatomic routing
games [20], and are extensively studied in various settings [8,9,20,21,19,22,23].
In [13], Hayrapetyan et al. consider the total delay of nonatomic routing games
when nonatomic players form cost-sharing coalitions. These coalitions behave
as atomic splittable players. Hayrapetyan et al. introduce the notion of price of
collusion as a measure of the price of forming coalitions. For an atomic splittable
routing game the price of collusion is defined as the ratio of the total delay of
the worst Nash equilibrium to the Wardrop equilibrium. Together, a bound α on
the price of anarchy for nonatomic routing games and a bound β on the price of
collusion for an atomic splittable routing game, imply the price of anarchy for
the atomic splittable routing game is bounded by αβ.
For atomic splittable routing games, bounds on the price of anarchy are ob-
tained in [8,12]. These bounds do not match the best known lower bounds.
Bounds on the price of collusion in general also remains an open problem. Previ-
ously, the price of collusion has been shown to be 1 only in the following special
cases: in the graph consisting of parallel links [13]; when the players are symmet-
ric, i.e. each player has the same flow value and the same source and sink [8]; and
when all delay functions are monomials of a fixed degree [2]. Conversely, if there
are multiple sources and sinks, the total delay of Nash equilibrium can be worse
than the Wardrop equilibrium of equal flow value, i.e., the price of collusion can
exceed 1, even with linear delays [7,8].

Our Contribution. Let C denote the class of differentiable nondecreasing convex


functions. We prove the following theorem for atomic splittable routing games.

Theorem 1. In single source-destination routing games on series-parallel graphs


with delay functions drawn from the class C, the price of collusion is 1.

We first consider the case when all delays are affine. We show that in the case
of affine delays in the setting described above, the total delay at equilibrium is
largest when the players are symmetric, i.e. all players have the same flow value
(Section 3). To do this, we first show that the equilibrium flow for a player i
remains unchanged if we modify the game by changing slightly the value of flow
The Price of Collusion in Series-Parallel Networks 315

of any player with larger flow value than player i. Then starting from a game
with symmetric players, we show that if one moves flow from a player i evenly to
all players with higher flow value the cost of the corresponding equilibrium flow
never increases. Since it is known that the price of collusion is 1 if the players
are symmetric [8], this shows that the bound extends to arbitrary players with
affine delays.
In Section 4, we extend the result for general convex delays, by showing that
the worst case price of collusion is obtained when the delays are affine.
In contrast to Theorem 1 which presents a bound on the price of collusion,
we also present a new bound on the price of anarchy of atomic splittable routing
games in series-parallel graphs.

Theorem 2. In single source-destination routing games on series-parallel graphs,


the price of anarchy is bounded by k, the number of players.

This bound was proven earlier for parallel links in [12]. For nonatomic routing
games bounds on the price of anarchy depend on the delay functions in the
graph and, in the case of polynomial delays, the price of anarchy is bounded by
O(d/ log d). These bounds are known to be tight even on simple graphs consisting
of 2 parallel links [20]. Theorem 2 improves on the bounds obtained by Theorem 1
when k ≤ d/ log d. All missing proofs are contained in the full version [6].

2 Preliminaries
Let G = (V, E) be a directed graph, with two special vertices s and t called the
source and sink. The vector f , indexed by edges e ∈ E, is defined as a flow of
value v if the following conditions are satisfied.
 
fuw − fwu = 0, ∀u ∈ V − {s, t} (1)
w w
 
fsw − fws = v (2)
w w
fe ≥ 0, ∀e ∈ E .

Here fuw represents the flow on arc (u, w). If there are several flows f 1 , f 2 , · · · , f k ,
we define f := (f 1 ,f 2 , · · · , f k ) and f −i is the vector of the flows except f i . In this
k
case the flow on an edge fe = i=1 fei .
Let C be the class of differentiable nondecreasing convex functions. Each edge
e is associated with a delay function le : R+ → R drawn from C. Note that we
allow delay functions to be negative. For a given flow f , the induced delay on
edge e is le (fe ). We define the total delay on an edge e as the product of the
flow on the edge and the induced delay Ce (fe ) := fe le (fe ). The marginal delay

on an edge e is the rate of change of the  total delay: Le (fe ) := fe le (fe ) + le (fe ).
The total delay of a flow f is C(f ) = e∈E fe le (fe ).
An atomic splittable routing game is a tuple (G,v,l,s,t) where l is a vector
of delay functions for edges in G and v = (v 1 ,v 2 ,· · · ,v k ) is a tuple indicating
316 U. Bhaskar, L. Fleischer, and C.-C. Huang

the flow value of the players from 1 to k. We always assume that the players
are indexed by the order of decreasing flow value, hence v 1 ≥ v 2 · · · ≥ v k . All
players have source s and destination t. Player i has a strategy space consisting
of all possible s-t flows of volume v i . Let (f 1 , f 2 , · · · ,f k ) be a strategy vector.
Player i incurs a delay Cei (fei , fe
) := fei le (fe ) on each edge e, and his objective is
to minimize his delay C (f ) := e∈E Cei (fei , fe ). A set of players are symmetric
i

if each player has the same flow value.


A flow is a Nash equilibrium if no player can unilaterally alter his flow and
reduce his delay. Formally,

Definition 3 (Nash Equilibrium). In an atomic splittable routing game, flow


f is a Nash equilibrium if and only if for every player i and every s-t flow g of
volume v i , C i (f i , f −i ) ≤ C i (g, f −i ).

For player i, the marginal delay on edge e is defined as the rate of change of
his delay on the edge Lie (fei , fe ) := le (fe ) + fei le (fe ). For any s-t path p, the
marginal delay on path p is defined as the rate of change of totaldelay of player
i when he adds flow along the edges of the path: Lip (f ) := i i
e∈p Le (fe , fe ).
The following lemma follows from Karush-Kuhn-Tucker optimality conditions
for convex programs [15] applied to player i’s minimization problem.

Lemma 4. Flow f is a Nash equilibrium flow if and only if for any player i and
any two directed paths p and q between the same pair of vertices such that on all
edges e ∈ p, fei > 0, then Lip (f ) ≤ Liq (f ).

By Lemma 4, at equilibrium the marginal delay of a player is the same on any


s-t path on every edge of which he has positive flow. For a player i, the marginal
delay is Li (f ) := Lip (f ), where p is any s-t path on which player i has positive
flow on every edge.
For a given flow f and for every player i, we let E i (f ) = {e|fei > 0}. P i is the
set of all directed s-t paths p on which for every e ∈ p, fei > 0. We will use e ∈ P i
to mean that the edge e is in some path p ∈ P i ; then e ∈ P i ⇔ e ∈ E i . Let p
be a directed simple s-t path. A path flow on path p is a directed flow on p of
value fp . A cycle flow along cycle C is a directed flow along C of value fC . Any
flow f can be decomposed into a set of directed path flows and directed cycle
flows {fp }p∈P ∪ {fc }c∈C , [1]. This is a flow decomposition of f . Directed cycle
flows cannot exist in atomic splittable or nonatomic games (this follows easily
from Lemma 4). Thus, f i in these games can be expressed as a set of path flows
{fpi }p∈P i such that fei = p∈P i :e∈p fpi . This is a path flow decomposition of the
given flow. A generalized path flow decomposition is a flow decomposition along
paths where we allow the path flows to be negative.

Series-Parallel Graphs. Given graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ) and


vertices v1 ∈ V1 , v2 ∈ V2 , the operation merge(v1 , v2 ) creates a new graph
G = (V  = V1 ∪ V2 , E  = E1 ∪ E2 ), replaces v1 and v2 in V  with a single vertex
v and replaces each edge e = (u, w) ∈ E  incident to v1 or v2 by an edge incident
to v, directed in the same way as the original edge.
The Price of Collusion in Series-Parallel Networks 317

Definition 5. A tuple (G, s, t) is series-parallel if G is a single edge e = (s, t),


or is obtained by a series or parallel composition of two series-parallel graphs
(G1 , s1 , t1 ) and (G2 , s2 , t2 ). Nodes s and t are terminals of G.
(i) Parallel Composition: s = merge(s1 , s2 ), t = merge(t1 , t2 ),
(ii) Series Composition: s := s1 , t := t2 , v = merge(s2 , t1 ).

In directed series-parallel graphs, all edges are directed from the source to the
destination and the graph is acyclic in the directed edges. This is without loss
of generality, since any edge not on an s-t path is not used in an equilibrium
flow, and no flow is sent along a directed cycle. The following lemma describes
a basic property of flows in a directed series-parallel graph.

Lemma 6. Let G = (V, E) be a directed series-parallel graph with terminals s


 flow of value |h|, and c is a function defined
and t. Let h be an s-t  on the edges of
the graph
 G. (i) If e∈p c(e) ≥ κ on every s-t path
 p, then e∈E c(e)he ≥ κ|h|.
(ii) If e∈p c(e) = κ on every s-t paths p then e∈E c(e)he = κ|h| .

Vectors and matrices in the paper, except for flow vectors, will be referred to
using boldface. 1 and 0 refer to the column vectors consisting of all ones and
all zeros respectively. When the size of the vector or matrix is not clear from
context, we use a subscript to denote it, e.g. 1n .

Uniqueness of Equilibrium Flow. The equilibria in atomic splittable and


nonatomic routing games are known to be unique for affine delays, up to induced
delays on the edges (this is true for a larger class of delays [4], [17], but here we
only need affine delays). Although there may be multiple equilibrium flows, in
each of these flows the delay on an edge remains unchanged. If the delay func-
tions are strictly increasing, then the flow on each edge is uniquely determined.
However with constant delays, for two parallel links between s and t with the
same constant delay on each edge, any valid flow is an equilibrium flow. In this
paper, we assume only that the delay functions are differentiable, nondecreasing
and convex, hence we allow edges to have constant delays. We instead assume
that in the graph, between any pair of vertices, there is at most one path on
which all edges have constant delay. This does not affect the generality of our re-
sults. In graphs without this restriction there are Nash and Wardrop equilibrium
flows in which for every pair of vertices, there is at most one constant delay path
which has flow in either equilibrium. To see this, consider any equilibrium flow
in a general graph. For every pair of vertices with more than one constant delay
path between them, only the minimum delay path will be used at equilibrium. If
there are multiple minimum constant delay paths, we can shift all the flow onto
a single path; this does not affect the marginal delay of any player on any path,
and hence the flow is still an equilibrium flow.

Lemma 7. For atomic splittable and nonatomic routing games on series-parallel


networks with affine delays and at most one path between any pair of vertices
with constant delays on all edges, the equilibrium flow is unique.
318 U. Bhaskar, L. Fleischer, and C.-C. Huang

For technical reasons, for proving Theorem 1 we also require that every s-t path
in the graph have at least one edge with strictly increasing delay. We modify the
graph in the following way: we add a single edge e in series with graph G, with
delay function le (x) = x. It is easy to see that for any flow, this increases the
total delay by exactly v 2 where v is the value of the flow, and does not change
the value of flow on any edge at equilibrium. In addition, if the price of collusion
in the modified graph is less than one, then the price of collusion in the original
graph is also less than one. The proof of Theorem 2 does not use this assumption.

3 Equilibria with Affine Delays


In this section we prove Theorem 1 where all delays are affine functions of the
form le (x) = ae x + be . Our main result in this section is:

Theorem 8. In a series-parallel graph with affine delay functions, the total de-
lay of a Nash equilibrium is bounded by that of a Wardrop equilibrium of the
same total flow value.

We first present the high-level ideas of our proof. Given a series-parallel graph G,
terminals s and t, and edge delay functions l, let f (·) : Rk+ → Rm×k + denote the
function mapping a vector of flow values to the equilibrium flow in the atomic
splittable routing game. By Lemma 7, the equilibrium flow is unique and hence
the function f (·) is well-defined. Let (G, u, l, s, t) be an atomic splittable routing
game. Our proof consists of the following three steps:

Step 1. Start with v i = kj=1 uj /k for each player i, i.e. the players are sym-
metric.
Step 2. Gradually adjust the flow values v of the k players so that the total
delay of the equilibrium flow f (v) is monotonically nonincreasing.
Step 3. Stop the flow redistribution process when for each i, v i = ui .

In step 1, we make use of a result of Cominetti et al. [8].

Lemma 9. [8] Let (G, v, l, s, t) denote an atomic splittable routing game with
k
symmetric players. Let g be a Wardrop equilibrium of the same flow value
k
i=1 v . Then C(f (v)) ≤ C(g).
i

Step 2 is the heart of our proof. The flow redistribution works as follows. Let
i
v denote the current flow value of player i. Initially, each player i has v i =
k j
j=1 u /k. Consider each player in turn from k to 1. We decrease the flow of
the kth player and give it evenly to the first k−1 players until v k = uk . Similarly,
when we consider the rth player, for any r < k, we decrease v r and give the flow
evenly to the first r−1 players until v r = ur . Throughout the following discussion
and proofs, player r refers specifically to the player whose flow value is currently
being decreased in our flow redistribution process.
Our flow redistribution strategy traces out a curve S in Rk+ , where points in
the curve correspond to flow value vectors v.
The Price of Collusion in Series-Parallel Networks 319

Lemma 10. For all e ∈ E, i ∈ [k], the function f (v) is continuous and piece-
wise linear along the curve S, with breakpoints occurring where the set of edges
used by any player changes.

In what follows, we consider expressions of the form ∂J(f (v))


∂v i , where J is some
differentiable function defined on a flow (e.g., the total delay, or the marginal
delay along a path). The expression ∂J(f (v))
∂v i considers the change in the function
J(·) evaluated at the equilibrium flow, as the flow value of player i changes by
an infinitesimal amount, keeping the flow values of the other players constant.
Though f (v) is not differentiable at all points in S, S is continous. Therefore,
it suffices to look at the intervals between these breakpoints of S. In the rest of
the paper, we confine our attention to these intervals.
We show that when the flow values are adjusted as described, the total delay
is monotonically nonincreasing.

Lemma 11. In a series-parallel graph, suppose that v 1 = v 2 = · · · = v r−1 ≥


v r ≥ · · · ≥ v k . If i < r, then ∂C(f (v))
∂v i ≤ ∂C(f (v))
∂v r .

Proof of Theorem 8. By Lemma 9, the equilibrium flow in Step 1 has total


delay at most the delay of the Wardrop equilibrium. We show below that during
step 2, C(f (v)) does not increase. Since the total volume of flow remains fixed,
the Wardrop equilibrium is unchanged throughout. Thus, the price of collusion
does not increase above 1, and hence the final equilibrium flow when v = u also
has this property.
Let v be the current flow values of the players. Since C(f (v)) is a continuous
function of v (Lemma 10), it is sufficient to show that the C(f (v)) does not
increase between breakpoints. Define x as follows: xr = −1; xi = 0, if i > r; and
1
xi = r−1 , if 1 ≤ i < r. The vector x is the rate of change of v when we decrease
the flow of player r in Step 2. Thus, using Lemma 11, the change in total delay
between two breakpoints in S satisfies

∂C(f (v))  ∂C(f (v)) 1


r−1
C(f (v + δx)) − C(f (v))
lim = − + ≤ 0 .
δ→0 δ ∂v r i=1
∂v i r−1



The proof of Lemma 11 is described in Section 3.2. Here we highlight the main
ideas. To simplify notation, when the vector of flow values is clear from the
context, we use f instead of f (v) to denote the equilibrium flow.

By chain rule, we have that C(f )
∂v i = e∈E
∂Le (fe ) ∂fe
∂fe ∂v i . The exact expressions
of ∂C(f )
∂v i , for 1 ≤ i ≤ r, are given in Lemmas 18 and 19 in Section 3.2. Our
derivations use the fact that it is possible to simplify the expression ∂fe
∂v i using
the following “nesting property” of a series-parallel graph.

Definition 12. A graph G with delay functions l, source s, and destination t


satisfies the nesting property if all atomic splittable routing games on G satisfy
320 U. Bhaskar, L. Fleischer, and C.-C. Huang

the following condition: for any players i and j with flow values v i and v j , v i > v j
if and only if on every edge e ∈ E, for the equilibrium flow f , either fei = fej = 0
or fei > fej .

Lemma 13 ([5]). A series-parallel graph satisfies the nesting property for any
choice of non-decreasing, convex delay functions.

If a graph satisfies the nesting property, symmetric players have identical flows at
equilibrium. When the flow value of player r is decreased in Step 2, the first r − 1
players are symmetric. Thus, by Lemma 13, these players have identical flows
at equilibrium. Hence, for any player i < r, fei = fe1 and Lie (fei , fe ) = L1e (fe1 , fe )
for any edge e. With affine delays, the nesting property has the following
implication.

Lemma 14 (Frozen Lemma). Let f be an equilibrium flow in an atomic


splittable routing game (G,v,l,s,t) with affine delays on the edges, and assume
that the nesting property holds for (G,l,s,t). Then for all players j, j = i with
∂fej
E j (f ) ⊆ E i (f ) and all edges e, = 0.
∂v i
The frozen lemma has two important implications for our proof. Firstly, in Step
2, players r + 1, · · · , k will not change their flow at equilibrium. Secondly, this
implies a simple expression for ∂f ∂v i , 1 ≤ i ≤ r,
e

∂fe k
∂fei ∂fe1 ∂fer
= = (r − 1) + . (3)
∂v r i=1
∂v r ∂v r ∂v r

∂fe k
∂fei ∂fei
= = , ∀i < r . (4)
∂v i i=1
∂v i ∂v i

3.1 Proof of Lemma 14 (Frozen Lemma)


By Lemma 10, we can assume that f is between the breakpoints of S and is thus
differentiable.

Lemma 15. If player h has positive flow on every edge of two directed paths p
∂Lh
p (f ) ∂Lh
q (f )
and q between the same pair of vertices, then ∂v i = ∂v i .

Proof. Since f is an equilibrium, Lemma 4 implies that Lhp (f ) = Lhq (f ). Differenti-


ation of the two quantities are the same since f is maintained as an equilibrium.  

Lemma 16. Let G be a directed acyclic graph. For an atomic splittable routing
game (G, v, l, s, t) with equilibrium flow f , let c and κ be defined as in Lemma 6.
 ∂fei (v)
Then e∈E c(e) ∂v j = κ if i = j, and is zero otherwise.
The Price of Collusion in Series-Parallel Networks 321

Proof. Define x as follows: xj = 1 and xi = 0 for j = i. Then


   
∂f i (v) f i (v + δx) − fei (v)
c(e) e j = c(e) lim e
∂v δ→0 δ
e∈E e∈E

c(e)(fei (v + δx) − fei (v))
= lim e∈E ,
δ→0 δ
where the second equality is due to the fact that fei (·) is differentiable.

For any two s-t flows f i , g i , it follows from Lemma 6 that e∈E  c(e)(fe −g
i i
e) =
κ(|f | − |g |). If i = j then |f (v + δx)| = |f (v)|, hence
i i i i i
e∈E c(e)(fe (v +
δx)
 − f i
e (v)) = 0. If i = j, then |f i
(v + δx)| − |f i
(v)| = δ, implying that
e∈E c(e)(f i
e (v + δx) − f i
e (v)) = κδ. The proof follows. 


Proof of Lemma 14. We prove by induction on the decreasing order of the


index of j. We make use of the folllowing claim.
Claim 17. Let S j = {h : E h (f ) ⊇ E j (f )}. For player j and an s-t path p on
which j has positive flow,
∂Ljp (f )  ∂Lhp (f )  ∂f j
|S j | i
− i
= (|S j | + 1) e∈p ae ∂vei
∂v ∂v
jh∈S \{j}

 ∂ h:E h (f )⊂E j (f )
feh
+ e∈p ae ∂v i .

Proof. Given players i and h,


∂Lhp (f )  ∂(fe + f h )
e
= ae . (5)
∂v i e∈p
∂v i

Summing (5) over all players h in S j \{j} and subtract it from |S j | times (5) for
player j gives the proof. 


Let Gj = (V, E j ). By definition, all players h ∈ S j have flow on every s-t path
in this graph. Lemma 15 implies that for any s-t paths p, q in Gj and any player
∂Lh (f ) ∂Lh (f )
h ∈ S j , ∂v p
i
q
= ∂v i . The expression on the left hand side of Claim 17 is
thus equal for any path p ∈ P j , and therefore so is the expression on the right.
For the base case j = k, the set {h : E h (f ) ⊂ E j (f )} is empty. Hence, the
second term on the right of Claim 17 is zero, and by the previous discussion,
 ∂f k ∂f k
the quantity e∈p ae ∂vei is equal for any path p ∈ P k . Define c(e) = ae ∂vei
 ∂f k
for each e ∈ E k and κ = e∈p ae ∂vei for any s-t path p in Gk . By Lemma 16,
  !2
∂fek ∂fek ∂f k
e∈E j (f ) c(e) ∂v i = e∈E j (f ) ae ∂v i = 0. Hence, ∂vei = 0, ∀e ∈ E.
∂f h
For the induction step j < k, due to the inductive hypothesis, ∂vei = 0 for
h > j. Since by the nesting property if E h (f ) ⊂ E j (f ) then h > j, the second
term on the right of Claim 17 is again zero. By the same argument as in the
∂f j
base case, ∂vei = 0, for each e ∈ E, proving the lemma. 

322 U. Bhaskar, L. Fleischer, and C.-C. Huang

3.2 Proof of Lemma 11


An unstated assumption for all lemmas in this section is that the nesting property
holds. For the proof of Lemma 11, our first step is to express the rate of change of
total delay in terms of the rate of change of marginal delay of the players, as the
flow value of player r is being decreased. The next lemma gives this expression
for the first r − 1 players.
k
∂C(f ) ∂Li (f ) vj
Lemma 18. For f = f (v), and for each i < r, ∂v i = Li (f ) + ∂v i
j=2
2 .
Proof. For any player j, the set of edges used by player j is a subset of the
edges used by player i < r, since player i has the largest flow value and we
assume that the nesting property holds. Hence, the total delay at equilibrium
C(f ) = e∈E i (f ) Ce (fe ).
∂C(f )  ∂Ce (fe ) ∂fe  ∂fe
= = (2ae fe + be ) i
∂v i ∂f e ∂v i ∂v
e∈E i (f ) e∈E i (f )
⎛ ⎞
 ∂fe 
= ⎝Lie (fei , fe ) + ae fej ⎠ . (6)
∂v i
i
e∈E (f ) j=i

By Lemma 16 with c(e) = Lie (fei , fe ) and κ = Li (f ), e∈E i Lie (fei , fe ) ∂fe
∂v i =
∂C(f )  ∂fe
Li (f ). Thus, i
= Li (f ) + ae fej i .
∂v i
∂v
j=i e∈E
∂(fe +fei ) i i
1 ∂Le (fe ,fe )
By (4), we have that ae ∂fe
∂v i = 12 ae ∂v i = 2 ∂v i . It follows that
∂C(f ) 1   j ∂Lie (fei , fe )
= Li (f ) + fe
∂v i 2 i
∂v i
j=i e∈E

1  ∂Lie (fei , fe )


= Li (f ) + fqj ,
2 ∂v i
j=i e∈E i q∈P i :e∈q

where the last equality is because for any player j, fej = j
q∈P j :e∈q fq =
 j
q∈P i :e∈q fq , and the nesting property. Reversing the order of summation and
 ∂Li (f i ,f ) i
observing that e∈p:p∈P i e∂vei e = ∂L∂v(f i
)
and v i = v 1 , we have the required
expression. 

∂C(f )
We obtain a similar expression for ∂v r .
Lemma 19. Let f = f (v). For player r whose flow value decreases in Step 2,
) * ) *
r − 1 ∂L1 (f )  i ∂Lr (f )  i
k k
∂C(f ) 1 1
= L (f ) + v + v
∂v r r+1 ∂v r i=r r+1 ∂v r i=r
) *
 ∂f e
+ (r − 2) ae fe1 r . (7)
1
∂v
e∈E
The Price of Collusion in Series-Parallel Networks 323

Let P denote the set of all s-t paths in G, and for equilibrium flow f , let
{fpi }p∈P,i∈[k] denote a path flow decomposition of f . For players i, j ∈ [r] with
player r defined as in the flow redistribution, we will be interested in the rate
of change of marginal delay of player i along an s-t path p as the value of flow
controlled by player j changes. Given a decomposition {fpi }p∈P,i∈[k] along paths
of the equilibrium flow, this rate of change can be expressed as

∂Lip (f )  ∂(fe + fei )   ∂(fq + fqi )


= ae = ae
∂v j e∈p
∂v j e∈p
∂v j
q∈P:e∈q
 ∂(fq + fqi ) 
= ae . (8)
∂v j e∈q∩p
q∈P

Let upq = e∈p∩q ae for any paths p, q ∈ P and the matrix U is defined as the
matrix of size |P| × |P| with entries [upq ]p,q∈P .
Lemma 20. For an equilibrium flow f , there exists a generalized path flow de-
i 1 2 k
composition {fpi }p∈P i ,i∈[k] so that P ⊆ P i for all i ∈ [k] and P ⊇ P ⊇ · · · P .
Moreover, each of the submatrices Ui = [upq ]p,q∈P i of U is invertible, ∀i ∈ [k].
i i−1
Since P ⊆ P , we can arrange the rows and columns of U so that Ui is a
leading principal submatrix of U for every player i.
Since matrix Ui is invertible, we define Wi = U−1 . For a matrix A ∈ Rm×n ,
we use Ap to refer to the pth row vector and
apq to refer to the entry in the pth
row and qth column. We define A = aij .
i∈[m],j∈[n]

i
Lemma 21. For equilibrium flow f and sets P ⊆ P as described in Lemma 20,
for all players i ∈ [k], Wi  ≥ Wi+1  and Wk  > 0.
The next lemma gives the rate of change of marginal delay at equilibrium.
Lemma 22. For player r defined as in the flow redistribution process and any
player i < r, for f = f (v),
∂Li (f (v)) 2
(i) = ,
∂v i
Wi 
∂Li (f ) 1
(ii) = ,
∂v r Wi 
r
∂L (f ) r+1 1 r−1 1
(iii) = r + .
∂v r r W  r W1 

If we have just two players, it follows by substituting i = 1 and r = 2 and the


expressions from Lemma
! 22 into Lemma 18 and Lemma 19 that ∂C(f ) ∂C(f )
∂v 2 − ∂v 1 =
∂C(f ) ∂C(f )
1 2
2v
1
2 − 1
1 . By Lemma 21, W1  ≥ W2 , and hence ∂v 2 − ∂v 1 ≥
W  W 
0, proving Lemma 11 for the case of two players. However, if we have more than
324 U. Bhaskar, L. Fleischer, and C.-C. Huang

two players, when r = 2 the fourth term on the right hand side of (7) has
nonzero contribution. Calculating this term is complicated. However, we show
the following inequality for this expression.
Lemma 23. For f = f (v) and the player
 r as defined in
 the flow redistribution
 ∂f e v 1
v r
1 1
process, ae fe1 r ≥ − − .
1
∂v W1  r W1  Wr 
e∈E
i
Proof of Lemma 11. For any player i < r, substituting the expression for ∂L∂v(f
i
)

from Lemma 22 into Lemma 18, and observing that Li (f ) = L1 (f ) and Wi  =
W1  since the flow of the first r − 1 players is identical,
k j
∂C(f ) 1 j=2 v
= L (f ) + . (9)
∂v i W1 

Similarly, substituting from Lemmas 22 and 23 into Lemma 19 and simplifying,


k i
  ) k
*
∂C(f ) v 1 1 1
≥ L1 (f )+ i=21 + − v i − (r − 2)(v 1 − v r ) .
∂v r W  r Wr  W1  i=2
(10)

We subtract (9) from (10) to obtain, for any player i < r,


  )
k
*
∂C(f ) ∂C(f ) 1 1 1
− ≥ − v − (r − 2)(v − v )
i 1 r
. (11)
∂v r ∂v i r Wr  W1  i=2


From Lemma 21 we know that W1  ≥ Wr . Also, ki=2 v i = (r − 2)v 1 +
k
i=r v ≥ (r − 2)(v − v ). Hence, the expression on the right of (11) is nonneg-
i 1 r

ative, completing the proof. 




4 Convex Delays on Series-Parallel Graphs

Let C denote the class of continuous, differentiable, nondecreasing and convex


functions. In this section we prove the following result.

Theorem 24. The price of collusion on a series-parallel graph with delay func-
tions taken from the set C is at most the price of collusion with linear delay
functions.

This theorem combined with Theorem 8, suffices to prove Theorem 1. The fol-
lowing lemma is proved by Milchtaich.1
1
Milchtaich in fact shows the same result for undirected series-parallel graphs. In our
context, every simple s-t path in the underlying undirected graph is also an s-t path
in the directed graph G.
The Price of Collusion in Series-Parallel Networks 325

Lemma 25 ([14]). Let (G,v,l,s,t) and (G,ṽ,l̃,s,t) be nonatomic routing games


on a directed series-parallel graph with terminals s and t, where v ≥ ṽ, and
∀x ∈ R+ and e ∈ E, le (x) ≥ ˜ le (x). Let f and f˜ be equilibrium flows for the
games with delays l and l respectively. Then C(f ) ≥ C̃(f˜).
˜

We now use Lemma 25 to prove Theorem 24.


Proof of Theorem 24. Given a series-parallel graph G with delay functions l taken
from C, let g denote the atomic equilibrium flow and f denote the nonatomic
equilibrium. We define a set of linear - delay functions l̃ as follows. For an edge,
∂le (fe ) -
l̃e (x) = ae x+be , where ae = ∂fe - and be = le (ge )−ae ge . Hence, the delay
fe =ge
function ˜le is the tangent to the original delay function at the atomic equilibrium
flow. Note that a convex continuous differentiable function lies above all of its
tangents.
Let g̃ and f˜ denote the atomic and nonatomic equilibrium flows respectively
with delay functions l̃. Then by the definition of l̃, g̃ = g and l̃(g̃) = l(g). Hence,
C(g) C̃(g̃)
C̃(g̃) = C(g). Further, by Lemma 25, C(f ) ≥ C̃(f˜). Since C(f ) ≤ C̃(f˜) , the proof
follows.

5 Total Delay without the Nesting Property

If the nesting property does not hold, the total delay can increase as we decrease
the flow of a smaller player and increase the flow of a larger player, thus causing
our flow redistribution strategy presented in Section 3.2 to break down. See the
full version for an example.

References

1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows. Prentice-Hall, Englewood
Cliffs (1993)
2. Altman, E., Basar, T., Jimenez, T., Shimkin, N.: Competitive routing in networks
with polynomial costs. IEEE Transactions on Automatic Control 47(1), 92–96
(2002)
3. Awerbuch, B., Azar, Y., Epstein, A.: The price of routing unsplittable flow. In:
STOC, pp. 57–66. ACM, New York (2005)
4. Beckmann, M., McGuire, C.B., Winsten, C.B.: Studies in the Economics of Trans-
portation. Yale University Press, New Haven (1956)
5. Bhaskar, U., Fleischer, L., Hoy, D., Huang, C.-C.: Equilibria of atomic flow games
are not unique. In: SODA, pp. 748–757 (2009)
6. Bhaskar, U., Fleischer, L., Huang, C.-C.: The price of collusion in series-parallel
networks (unpublished manuscript 2010)
7. Catoni, S., Pallottino, S.: Traffic equilibrium paradoxes. Transportation Sci-
ence 25(3), 240–244 (1991)
8. Cominetti, R., Correa, J.R., Stier-Moses, N.E.: The impact of oligopolistic compe-
tition in networks. Operations Research 57(6), 1421–1437 (2009)
326 U. Bhaskar, L. Fleischer, and C.-C. Huang

9. Correa, J.R., Schulz, A.S., Stier-Moses, N.E.: On the inefficiency of equilibria in


congestion games. In: Jünger, M., Kaibel, V. (eds.) IPCO 2005. LNCS, vol. 3509,
pp. 167–181. Springer, Heidelberg (2005)
10. Fotakis, D., Spirakis, P.: Selfish unsplittable flows. In: TCS, pp. 593–605. Springer,
Heidelberg (2004)
11. Harker, P.: Multiple equilibrium behaviors on networks. Transportation Sci-
ence 22(1), 39–46 (1988)
12. Harks, T.: Stackelberg strategies and collusion in network games with splittable
flow. In: Bampis, E., Skutella, M. (eds.) WAOA 2008. LNCS, vol. 5426, pp. 133–
146. Springer, Heidelberg (2009)
13. Hayrapetyan, A., Tardos, É., Wexler, T.: The effect of collusion in congestion
games. In: STOC, pp. 89–98. ACM Press, New York (2006)
14. Milchtaich, I.: Network topology and the efficiency of equilibrium. Games and
Economic Behavior 57(2), 321–346 (2006)
15. Nocedal, J., Wright, S.T.: Numerical Optimization. Springer, Heidelberg (2006)
16. Orda, A., Rom, R., Shimkin, N.: Competitive routing in multiuser communication
networks. IEEE/ACM Transactions on Networking 1(5), 510–521 (1993)
17. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave n-person
games. Econometrica 33(3), 520–534 (1965)
18. Rosenthal, R.W.: A class of games possessing pure-strategy nash equilibria. Intl.
J. of Game Theory 2, 65–67 (1973)
19. Roughgarden, T.: Selfish routing with atomic players. In: SODA, pp. 1184–1185
(2005)
20. Roughgarden, T.: The price of anarchy is independent of the network topology. J.
Comput. Syst. Sci. 67(2), 341–364 (2003)
21. Roughgarden, T.: Selfish Routing and the Price of Anarchy. The MIT Press, Cam-
bridge (2005)
22. Roughgarden, T., Tardos, E.: How bad is selfish routing? Journal of the ACM 49(2),
236–259 (2002)
23. Roughgarden, T., Tardos, E.: Bounding the inefficiency of equilibria in nonatomic
congestion games. Games and Economic Behavior 47, 389–403 (2004)
24. Wardrop, J.G.: Some theoretical aspects of road traffic research. In: Proc. Institute
of Civil Engineers, Pt. II, vol. 1, pp. 325–378 (1952)
The Chvátal-Gomory Closure of an Ellipsoid Is a
Polyhedron

Santanu S. Dey1 and Juan Pablo Vielma2,3


1
H. Milton Stewart School of Industrial and Systems Engineering,
Georgia Institute of Technology, USA
[email protected]
2
Business Analytics and Mathematical Sciences Department
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
3
Department of Industrial Engineering
University of Pittsburgh, Pittsburgh, PA, USA
[email protected]

Abstract. It is well-know that the Chvátal-Gomory (CG) closure of


a rational polyhedron is a rational polyhedron. In this paper, we show
that the CG closure of a bounded full-dimensional ellipsoid, described
by rational data, is a rational polytope. To the best of our knowledge,
this is the first extension of the polyhedrality of the CG closure to a non-
polyhedral set. A key feature of the proof is to verify that all non-integral
points on the boundary of ellipsoids can be separated by some CG cut.
Given a point u on the boundary of an ellipsoid that cannot be trivially
separated using the CG cut parallel to its supporting hyperplane, the
proof constructs a sequence of CG cuts that eventually separates u. The
polyhedrality of the CG closure is established using this separation result
and a compactness argument. The proof also establishes some sufficient
conditions for the polyhedrality result for general compact convex sets.

1 Introduction

Nonlinear Integer Programming has received significant attention from the Inte-
ger Programming (IP) community in recent time. Although, some special classes
are efficiently solvable [32], even simple nonlinear IP problems can be NP-Hard
or undecidable [33]. However, there has been considerable progress in the devel-
opment of practical algorithms that can be effective for many important applica-
tions (e.g. [1,8,9,10,32,36,37]). Building on work for linear IP, practical algorithms
for nonlinear IP have benefited from the development of several classes of cutting
planes or valid inequalities (e.g. [3,4,5,6,13,14,25,29,30,31,35,28,39,40,43]). Many
of these inequalities are based on the generalization of ideas used in linear IP. For
example, [4,5,39,14] exploit the interaction between superadditive functions and
nonlinear constraints to develop techniques that can yield several strong valid
inequalities.
Following the success of such approaches we study some theoretical properties
of this interaction when the superadditive function is the integer round down

F. Eisenbrand and B. Shepherd (Eds.): IPCO 2010, LNCS 6080, pp. 327–340, 2010.

c Springer-Verlag Berlin Heidelberg 2010
328 S.S. Dey and J.P. Vielma

operation · and the nonlinear constraints are convex. Specifically we study
the polyhedrality of the (first) Chvátal-Gomory (CG) closure [15,26,27,41] of
a non-polyhedral convex set. The study of properties of the CG closure of a
rational polyhedron has yielded many well known results for linear IP. In this
case, the closure is a rational polyhedron [41] for which the associated optimiza-
tion, separation and membership problems are NP-hard even for restricted cases
[11,12,21,34]. However, optimization over the CG closure of a polyhedron has
been successfully used to show its strength computationally [22,23]. Similar re-
sults have also been obtained for closures associated to other valid inequalities
such as split cuts [2,7,12,17,19,20,44].
CG cuts for non-polyhedral sets are considered implicitly in [15,41] and explic-
itly in [14], but only [41] deals with the polyhedrality of the CG closure. Although
[41] shows that for rational polyhedra the closure is a rational polyhedron, the
result does not automatically extend to non-polyhedral sets. Furthermore, nei-
ther of the known proofs of the result for rational polyhedra [16,41,42] can be
easily adapted to consider other convex sets. In fact, as noted in [41] even the
polyhedrality of the CG closure of non-rational polytopes remains unknown.
Because of this, we study the polyhedrality of the CG closure of an ellipsoid as
the first natural step towards understanding the closure of other non-polyhedral
convex sets.
Let a rational ellipsoid be the image of an Euclidean ball under a rational
affine transformation. Our main result is to show that the CG closure of a full-
dimensional bounded rational ellipsoid is a rational polytope. To the best of
our knowledge, this is the first extension to a non-polyhedral set of the well
known result for rational polyhedra. Additionally, the proof of our main result
reveals some general sufficient conditions for the polyhedrality of the CG closure
and other interesting properties. For example, we show that every non-integral
point on the boundary of an ellipsoid can be separated by a CG cut. We recently
verified [18] that this geometrically natural property holds for some other classes
of convex sets.
The rest of the paper is organized as follows. In Section 2, we give some back-
ground on CG cuts, formally state the main result of the paper and present an
outline of its proof. In Section 3, we present notation and review some standard
results from convex analysis. In Section 4, we consider two separation results that
are needed for the proof of the main theorem, which we present in Section 5. We
end with some remarks in Section 6.

2 Background, Main Result and Proof Outline

For a polyhedron P ⊂ Rn , the CG cutting plane procedure [15,26,27] can be


described as follows. For an integer vector a ∈ Zn , let d ∈ R be such that
{x ∈ Rn : "a, x# ≤ d} ⊃ P where "u, v# is the inner product between u and v.
We then have that PI := P ∩ Zn ⊂ {x ∈ Rn : "a, x# ≤ d} and hence the
CG cut "a, x# ≤ d is a valid inequality for conv(PI ). The first CG closure
P 1 of P is defined as the convex set obtained by adding all possible CG cuts
The Chvátal-Gomory Closure of an Ellipsoid Is a Polyhedron 329

to P . If P is a rational polyhedron, then P 1 is also a polyhedron [41] and


hence we can recursively define the k-th CG closure P k of P as the first CG
closure of P k−1 . Furthermore, for any rational polyhedron P we have that there
exists k ∈ Z+ such that P k = conv(PI ) [15,41]. Non-rational polytopes are also
considered in [15] and the CG procedure is extended to the feasible region of
Conic Programming (CP) problems in [14]. In fact, the CG procedure can be
extended to, at least, any compact convex set as follows.
Let C ⊂ Rn be a compact B convex set and let σC (a) := supx∈C "a, x# be its
support function so that C = a∈Rn {x ∈ Rn : "a, x# ≤ σC (a)}. Because Qn is
dense in Rn and B σC (a) is positively homogeneous and continuous, it can be
verified that C = a∈Zn {x ∈ Rn : "a, x# ≤ σC (a)}.
B
Definition 1. For any S ⊂ Zn , let CC(S, C) := a∈S {x ∈ Rn : "a, x# ≤
σC (a)}. We recursively define the k-th CG closure C k of C as C 1 := CC(Zn , C)
and C k+1 := CC(Zn , C k ) for all k > 1.
The definition is consistent because C 1 is a closed convex set contained in C and,
when C is a polyhedron, it coincides with the traditional definition. Furthermore,
CI := C ∩ Zn ⊂ C k for all k and, as noted in [41], the following theorem follows
from [15,41].
Theorem 1 ([15,41]). There exists k such that C k = conv(CI ).
Theorem 1 is also shown in [14] for CP problems with bounded feasible regions.
However, the result neither implies nor requires the polyhedrality of C 1 . In fact,
the original proof of Theorem 1 in [15] does not use the polyhedrality of either P
or P 1 . Although surprising, it could be entirely possible for Theorem 1 to hold
and for C k to be the only polyhedron in the hierarchy {C l : l = 1, . . . , k}. Our main result
is the first proof of the polyhedrality of C 1 for a non-polyhedral set C.
Theorem 2 (Main Theorem). Let T be a full-dimensional bounded rational
ellipsoid. Then CC(Zn , T ) is a rational polytope.
Before presenting an outline of our proof of Theorem 2, we discuss why some of
the well-known polyhedrality proofs and results do not easily extend to ellipsoids.
We begin by noting that it is not clear how to extend the polyhedrality proofs
in [16,41,42] beyond rational polyhedra because they rely on properties that are
characteristic of these sets such as TDI systems and finite integral generating
sets. However, we could try to prove Theorem 2 by using the polyhedrality of the
first CG closure of polyhedral approximations of T . One natural scheme could be
to attempt constructing a sequence of rational polytope pairs {Pi , Qi }i∈N such
that (i) Pi ∩ Zn = Qi ∩ Zn = T ∩ Zn , (ii) Pi ⊂ T ⊂ Qi and (iii) Vol(Qi \ Pi ) ≤ 1/i.
We then would have that
Pi^k ⊂ T^k ⊂ Qi^k ,    (1)
for all i, k ≥ 1. As noted in [41], using this approach Theorem 1 in general
follows directly from Theorem 1 for rational polytopes. Unfortunately, it is not
clear how to show that there exists i such that (1) holds as equality for k = 1
without knowing a priori that T 1 is a polyhedron. Finally, we note that cut
domination arguments commonly used in polyhedrality proofs of closures do not
seem to adapt well to the proof of Theorem 2.
Due to the reasons stated above, to prove Theorem 2 we resort to a different
approach that relies on being able to separate with a CG cut every non-integral
point on the boundary of T . Specifically, we show that CC(Zn , T ) can be gen-
erated with the procedure described in Figure 1.

Step 1 Construct a polytope Q defined by a finite number of CG cuts such that:
– Q ⊂ T.
– Q ∩ bd(T ) ⊂ Zn .
Step 2 Update Q with a CG cut that separates a point of Q \ CC(Zn , T ) until
no such cut exists.

Fig. 1. A procedure to generate the first CG closure of an ellipsoid
To show that Step 1 can be accomplished, we first show that every non-integral
point on the boundary of T can be separated by a CG cut. If there are no integral
points on the boundary of T , then this separation result allows us to cover the
boundary of T with a possibly infinite number of open sets that are associated
to the CG cuts. We then use compactness of the boundary of T to obtain a finite
sub-cover that yields a finite number of CG cuts that separate every point on
the boundary of T . If there are integer points on the boundary, then we use a
stronger separation result and a similar argument to show that there is a finite
set of CG cuts that separate every non-integral point on the boundary of T .
To show that Step 2 terminates finitely, we simply show that the set of CG
cuts that separate at least one point in Q \ CC(Zn , T ) is finite.
We note that the separation of non-integral points using CG cuts on the
boundary of T , required in Step 1 of Figure 1, is not straightforward. A natural
first approach to separate a non-integral point u on the boundary of T is to take
an inequality ⟨a, x⟩ ≤ σT (a) that is supporting at u, scale it so that a ∈ Zn , and
then generate the CG cut ⟨a, x⟩ ≤ ⌊σT (a)⌋. If σT (a) ∉ Z, then the CG cut will
separate u because a was selected such that ⟨a, u⟩ = σT (a). Unfortunately, as
the following examples show, this approach can fail either because a cannot be
scaled to be integral or because σT (a) ∈ Z for every scaling that yields a ∈ Zn .
Example 1. Let T := {x ∈ R2 | √(x1^2 + x2^2 ) ≤ 1} and u = (1/2, √3/2)T ∈ bd(T ).
We have that the supporting inequality for u is a1 x1 + a2 x2 ≤ σT (a) where
a = u. Since u is irrational in only one component, observe that a cannot be
scaled to be integral.
For Example 1, it is easy to see that selecting an alternative integer left-hand-
side vector a′ resolves the issue. We can use a′ = (1, 1)T , which has σT (a′ ) = √2, to
obtain the CG cut x1 + x2 ≤ 1. In Example 1 this CG cut separates every non-
negative non-integral point on the boundary of T . In Section 4, we will show that
given any non-integral point u on the boundary of T such that the left-hand-side
of its supporting hyperplane cannot be scaled to be integral, there exists an al-
ternative left-hand-side integer vector a′ such that the CG cut ⟨a′ , x⟩ ≤ ⌊σT (a′ )⌋
separates u. This vector a′ will be systematically obtained using simultaneous
Diophantine approximation of the left-hand-side of an inequality describing the
supporting hyperplane at u.
Example 2. Let T := {x ∈ R2 | √(x1^2 + x2^2 ) ≤ 5} and u = (25/13, 60/13)T ∈
bd(T ). We have that the supporting inequality for u can be scaled to a1 x1 +
a2 x2 ≤ σT (a) for a = (5, 12)T , which has σT (a) = 65. Because 5 and 12 are
coprime and σT (·) is positively homogeneous, a cannot be scaled so that a ∈ Z2
and σT (a) ∉ Z.
Observe that Example 2 is not an isolated case. In fact, these cases are closely
related to primitive Pythagorean triples. For T := {x ∈ R2 | √(x1^2 + x2^2 ) ≤ r},
select any primitive Pythagorean triple v1^2 + v2^2 = v3^2 and consider the point
r(v1 /v3 , v2 /v3 ) (such that r(v1 /v3 , v2 /v3 ) ∉ Z2 ). Then, since v1 and v2 are coprime, the
behavior in Example 2 will be observed. Also note that these examples are not
restricted to Euclidean balls in R2 , since it is easy to construct integers
a1 , . . . , an , an+1 such that a1^2 + · · · + an^2 = an+1^2 (e.g. 3^2 + 4^2 + 12^2 = 13^2 ). For the
class of points u ∈ bd(T ) where the left-hand-side of an inequality describing
the supporting hyperplane is scalable to an integer vector a, we will show in
Section 4 that there exists a systematic method to obtain a′ ∈ Zn such that
⟨a′ , x⟩ ≤ ⌊σT (a′ )⌋ separates u.
3 Notation and Standard Results from Convex Analysis
In this paper we consider an ellipsoid given by a non-singular and surjective
rational linear transformation of a Euclidean ball followed by a rational trans-
lation. Without loss of generality, we may assume that this ellipsoid is
described as T := {x ∈ Rn : γB (x − c) ≤ 1}, where c ∈ Qn and γB (x) := ‖Ax‖
is the gauge of B := {x ∈ Rn : ‖Ax‖ ≤ 1}, such that A ∈ Qn×n is a symmetric
positive definite matrix. Then T is the translation by c of B. The set B is a full-
dimensional compact convex set with the zero vector in its interior and hence
has the following properties.
– The support function of B is σB (a) = ‖A−1 a‖.
– The polar of B, given by B ◦ := {a ∈ Rn | ⟨a, x⟩ ≤ 1 ∀x ∈ B} = {a ∈
Rn | σB (a) ≤ 1}, is a full-dimensional and compact convex set.
– For any u ∈ bd(B) we have that sB (u) := AT Au is such that ⟨sB (u), u⟩ =
σB (sB (u)) = 1 and hence ⟨sB (u), x⟩ ≤ 1 = σB (sB (u)) is a valid inequality
for B that is supporting at u.
– ⟨a, x⟩ ≤ σB (a)γB (x).
– The boundary of B is bd(B) := {x ∈ Rn : γB (x) = 1}.
Because T = B + c we also have the following properties of T .
– The support function of T is σT (a) = σB+c (a) = σB (a) + ⟨a, c⟩ = ‖A−1 a‖ +
⟨a, c⟩.
– For any u ∈ bd(T ) we have that sT (u) := sB (u − c) = AT A(u − c) is such
that ⟨sT (u), u − c⟩ = ⟨sB (u − c), u − c⟩ = σB (sB (u − c)) = σB (s(u)) = 1 and
hence ⟨s(u), x⟩ ≤ 1 + ⟨s(u), c⟩ = σT (s(u)) is a valid inequality for T that is
supporting at u.
– The boundary of T is bd(T ) := {x ∈ Rn : γB (x − c) = 1}.
To simplify the notation, we regularly drop the T from σT (·), sT (·) and CC(·, T ),
so that σ(·) := σT (·), s(·) := sT (·) and CC(·) := CC(·, T ). In addition, for u ∈ R
we denote its fractional part by F (u) := u − ⌊u⌋.
4 Separation
To prove Theorem 2 we need two separation results. The first one simply states
that every non-integral point on the boundary of T can be separated by a CG
cut.
Proposition 1. If u ∈ bd(T ) \ Zn , then there exists a CG cut that separates
point u.
An integer point u ∈ bd(T ) ∩ Zn cannot be separated by a CG cut, but Proposi-
tion 1 states that every point in bd(T ) that is close enough to u will be separated
by a CG cut. However, for the compactness argument to work we need a stronger
separation result for points on the boundary that are close to integral boundary
points. This second result states that all points in bd(T ) that are sufficiently
close to an integral boundary point can be separated by a finite number of CG
cuts.
Proposition 2. Let u ∈ bd(T ) ∩ Zn . Then there exist εu > 0 and a finite set
Wu ⊂ Zn such that
⟨w, u⟩ = ⌊σ(w)⌋    ∀w ∈ Wu ,    (2)
∀v ∈ (bd(T ) ∩ {x ∈ Rn : ‖x − u‖ < εu }) \ {u} ∃w ∈ Wu s.t. ⟨w, v⟩ > ⌊σ(w)⌋,    (3)
and
∀v ∈ int(T ) ∃w ∈ Wu s.t. ⟨w, v⟩ < ⌊σ(w)⌋.    (4)
The main ideas used in the proof of Proposition 2 are as follows. First, it is
verified that for any nonzero integer vector q, there exists a finite i ∈ Z+ such
that the CG cut of the form ⟨q + iλs(u), x⟩ ≤ ⌊σ(q + iλs(u))⌋ satisfies (2) (here
λs(u) ∈ Zn for some scalar λ ≠ 0). Second, it is verified that by carefully
selecting a finite number of integer vectors and applying the above construction,
all points in a sufficiently small neighborhood of u can be separated. Finally, (4) is
established by adding the supporting hyperplane at u, which is trivially a CG cut.
Although this proof of Proposition 2 is similar to the proof of Proposition 1,
it is significantly more technical. We therefore refer the readers to [18], where a
more general version of Proposition 2 is proven, and confine our discussion to an
outline of the proof of Proposition 1 here.
4.1 Outline of Proof of Proposition 1
To prove Proposition 1 we construct a separating CG cut for u ∈ bd(T ) \ Zn by
modifying the supporting inequality for T at u. In the simplest case, we scale
⟨s(u), x⟩ ≤ σ(s(u)) by λ > 0 so that λs(u) ∈ Zn , to obtain a CG cut ⟨λs(u), x⟩ ≤
⌊σ(λs(u))⌋ that separates u. If this is not successful, then we approximate the
direction s(u) by a sequence {si }i∈N ⊂ Zn such that si /‖si ‖ → s(u)/‖s(u)‖ and
for which ⟨si , x⟩ ≤ ⌊σ(si )⌋ separates u for sufficiently large i. For this approach
to work we will need a sequence that complies with the following two properties.
C1 limi→+∞ ⟨si , u⟩ − σ(si ) = 0.
C2 limi→+∞ F (σ(si )) = δ > 0. (A weaker condition like lim supi→+∞ F (σ(si ))
> 0 is sufficient, but we will verify the stronger condition.)
Neither condition holds for every sequence such that si /‖si ‖ → s(u)/‖s(u)‖.
For instance, for s(u) = (0, 1)T the sequence si = (i, i^2 ) does not comply with
condition C1. For this reason we need the following proposition.
Proposition 3. Let u ∈ bd(T ) \ Zn and let el be the l-th unit vector for some
l ∈ {1, . . . , n} such that ul ∉ Z.
(a) If there exists λ > 0 such that p := λs(u) ∈ Zn and σ(λs(u)) ∈ Z, then
si := el + ip complies with conditions C1 and C2.
(b) If λs(u) ∉ Zn for all λ > 0, then let {(pi , qi )}i∈N ⊂ Zn × (Z+ \ {0}) be the
coefficients obtained using Dirichlet’s Theorem to approximate s(u). That is,
{(pi , qi )}i∈N is such that
|qi s(u)j − pij | < 1/i    ∀j ∈ {1, . . . , n}.
For M ∈ Z+ such that M c ∈ Zn we have that si := el + M pi complies with
conditions C1 and C2.
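The following brute-force sketch of ours (Dirichlet's theorem is classically proved by pigeonhole; this search is for illustration only and is not efficient) computes a pair (p, q) with the accuracy used in Proposition 3 (b):

import math

def dirichlet_approx(s, i):
    # Search q = 1, ..., i**len(s) and round each coordinate; Dirichlet's
    # theorem guarantees some q in this range achieves |q*s_j - p_j| < 1/i.
    for q in range(1, i ** len(s) + 1):
        p = [round(q * sj) for sj in s]
        if all(abs(q * sj - pj) < 1.0 / i for sj, pj in zip(s, p)):
            return p, q
    raise AssertionError("unreachable by Dirichlet's theorem")

p, q = dirichlet_approx((math.sqrt(2), math.sqrt(3)), 10)
print(p, q)   # e.g. p = [58, 71], q = 41: both |41*sqrt(2)-58| and |41*sqrt(3)-71| < 1/10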
With this proposition we can proceed to the proof of Proposition 1.
Proof (Proof of Proposition 1). Let u ∈ bd(T ) \ Zn . There are three possible
cases:
1. There exists λ > 0 such that λs(u) ∈ Zn and σ(λs(u)) ∉ Z.
2. There exists λ > 0 such that λs(u) ∈ Zn and σ(λs(u)) ∈ Z.
3. λs(u) ∉ Zn for all λ > 0.
Case 1: ⟨λs(u), x⟩ ≤ ⌊σ(λs(u))⌋ is a CG cut that separates u.
Cases 2 and 3: From Proposition 3, we have that in both cases there exists a
sequence {si }i∈N ⊂ Zn satisfying conditions C1 and C2. Together with
⟨si , u⟩ − ⌊σ(si )⌋ = ⟨si , u⟩ − σ(si ) + F (σ(si )),    (5)
conditions C1 and C2 yield that for sufficiently large i we have ⟨si , u⟩ − ⌊σ(si )⌋ >
0 and hence ⟨si , x⟩ ≤ ⌊σ(si )⌋ separates u.
We discuss the proof of Proposition 3 in the next two subsections.
Condition C1 in Proposition 3. Condition C1 is not difficult to satisfy. In
fact, it is satisfied by any sequence for which the angle between si and s(u) con-
verges fast enough (e.g. if ‖si ‖ → +∞, then C1 is satisfied if we have that
‖(si /‖si ‖) − (s(u)/‖s(u)‖)‖ ∈ o(1/‖si ‖)). For the specific sequences in Proposi-
tion 3 (a) and 3 (b), condition C1 can be verified using properties from Section 3
and the following lemma, which we do not prove here.
Lemma 1. Let w ∈ Rn and let {v i }i∈N ⊂ Rn be any sequence such that there exists
N > 0 for which
|vji wk − vki wj | < N    ∀i ∈ N, j, k ∈ {1, . . . , n}, j ≠ k    (6)
and limi→+∞ ⟨v i , w⟩ = +∞. Then
limi→+∞ ⟨v i , w⟩ − ‖v i ‖ ‖w‖ = 0.
Condition C2 in Proposition 3. Condition C2 is much more interesting,
and showing that it holds for our specific sequences is the crux of the proof of
Proposition 3. The intuition behind the proof is the following: For the sequence
in Proposition 3 (a) we have si = el + ip. For large enough i, σ(si ) ≈ ⟨el + ip, u⟩ =
ul + i⟨λs(u), u⟩ = ul + iσ(λs(u)). Now since σ(λs(u)) is integral, the fractional
part of σ(si ) is therefore approximately equal to ul . The formal proof is presented
next. We first present a simple lemma.
Lemma 2. Let α ∈ R, t ∈ R+ and {βi }i∈N ⊂ R be such that limi→∞ βi = ∞.
Then, for every ε > 0 there exists Nε such that
α + βi ≤ √((α + βi )^2 + t) ≤ α + βi + ε    ∀i ≥ Nε .
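One elementary way to see the upper bound of Lemma 2 (a derivation step we add here for readability; it is not spelled out in the text) uses the estimate √(1 + z) ≤ 1 + z/2 for z ≥ 0:

\[
\sqrt{(\alpha+\beta_i)^2+t}
 = (\alpha+\beta_i)\sqrt{1+\tfrac{t}{(\alpha+\beta_i)^2}}
 \le (\alpha+\beta_i)\Bigl(1+\tfrac{t}{2(\alpha+\beta_i)^2}\Bigr)
 = \alpha+\beta_i+\tfrac{t}{2(\alpha+\beta_i)},
\]

so the slack t/(2(α + βi )) drops below any fixed ε once βi is large enough (here α + βi > 0 for large i and t ≥ 0).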
Lemma 3. The sequence in Proposition 3 (a) satisfies Condition C2.
Proof. Let α = ⟨A−1 el , A−1 p⟩/‖A−1 p‖, βi = i‖A−1 p‖ and t = ‖A−1 el ‖^2 −
(⟨A−1 el , A−1 p⟩/‖A−1 p‖)^2 . We have that limi→∞ βi = ∞ because ‖A−1 p‖ > 0,
and t ≥ 0 by the Cauchy-Schwarz inequality. Observe that
‖A−1 si ‖ = √(i^2 ‖A−1 p‖^2 + 2i⟨A−1 el , A−1 p⟩ + ‖A−1 el ‖^2 )
= √((⟨A−1 el , A−1 p⟩/‖A−1 p‖ + i‖A−1 p‖)^2 + ‖A−1 el ‖^2 − (⟨A−1 el , A−1 p⟩/‖A−1 p‖)^2 )
= √((α + βi )^2 + t).
Then, by Lemma 2, we have that
σ(si ) = √((α + βi )^2 + t) + ⟨c, si ⟩
≥ ⟨A−1 el , A−1 p⟩/‖A−1 p‖ + i‖A−1 p‖ + ⟨c, el + ip⟩
= iσ(p) + ⌊⟨A−1 el , A−1 p⟩/‖A−1 p‖ + ⟨c, el ⟩⌋ + F (⟨A−1 el , A−1 p⟩/‖A−1 p‖ + ⟨c, el ⟩)
and, similarly, for i ≥ Nε we have
σ(si ) ≤ iσ(p) + ⌊⟨A−1 el , A−1 p⟩/‖A−1 p‖ + ⟨c, el ⟩⌋ + F (⟨A−1 el , A−1 p⟩/‖A−1 p‖ + ⟨c, el ⟩) + ε.
Hence, by setting k := iσ(p) + ⌊⟨A−1 el , A−1 p⟩/‖A−1 p‖ + ⟨c, el ⟩⌋ and δ :=
F (⟨A−1 el , A−1 p⟩/‖A−1 p‖ + ⟨c, el ⟩) we have that
k + δ ≤ σ(si ) ≤ k + δ + ε    ∀i ≥ Nε .    (7)
Now, k ∈ Z and
δ = F (⟨A−1 el , A−1 p⟩/‖A−1 p‖ + ⟨c, el ⟩)
= F (⟨A−1 el , λA(u − c)⟩/λ + ⟨c, el ⟩)
= F (⟨el , A−1 λA(u − c)⟩/λ + ⟨c, el ⟩)
= F (⟨el , u − c⟩ + ⟨c, el ⟩) = F (⟨el , u⟩) = F (ul ) ∈ (0, 1),
because ul ∉ Z and because p = λs(u) implies ‖A−1 p‖ = λ‖A−1 s(u)‖ = λ. Thus
limi→+∞ F (σ(si )) = δ > 0.
Lemma 4. The sequence in Proposition 3 (b) satisfies Condition C2.
Proof. Let ε̄i := pi − qi s(u), so that ‖ε̄i ‖ ≤ √n/i. We then have that
limi→+∞ ‖A−1 ε̄i ‖ = limi→+∞ ‖A−1 (−ε̄i )‖ = 0.    (8)

Now observe that

A−1 si  = A−1 (M pi + el )
= M qi A−1 s(u) + A−1 el + M A−1 ¯i 
≤ M qi A−1 s(u) + A−1 el  + M A−1 ¯i 
E 2
"A−1 s(u), A−1 el #
= + M qi + t + M A−1 ¯i .
A−1 s(u)
!2
A−1 s(u),A−1 el 
where t := A−1 el 2 − A−1 s(u) . Since A−1 s(u) = A(u − c) = 1,
t = A−1 el 2 −"A(u−c), A e # which is non-negative by the Chauchy-Schwartz
−1 l 2
−1 −1 l
inequality. Therefore by setting α := A As(u),A
−1 s(u)
e 
, βi := M qi we can use
−1
Lemma 2 and the fact that A s(u) = A(u − c) = 1 to obtain that for every
ε > 0 there exists Nε such that

A−1 si  ≤ M qi + "A(u − c), A−1 el # + ε + M A−1 ¯i  ∀i ≥ Nε . (9)
Similarly, we also have that
‖A−1 si ‖ = ‖A−1 (M pi + el )‖
= ‖M qi A−1 s(u) + A−1 el + M A−1 ε̄i ‖
≥ ‖M qi A−1 s(u) + A−1 el ‖ − M ‖A−1 (−ε̄i )‖
= √((⟨A−1 s(u), A−1 el ⟩/‖A−1 s(u)‖ + M qi )^2 + t) − M ‖A−1 (−ε̄i )‖
≥ M qi + ⟨A(u − c), A−1 el ⟩ − M ‖A−1 (−ε̄i )‖
= M qi + ⟨u − c, el ⟩ − M ‖A−1 (−ε̄i )‖.    (10)
Combining (9) and (10), and using (8) and the definition of σ(·), we obtain that
for every ε̃ > 0 there exists Nε̃ such that
M qi + ⟨pi , M c⟩ + ⟨u, el ⟩ − ε̃ ≤ σ(si ) ≤ M qi + ⟨pi , M c⟩ + ⟨u, el ⟩ + ε̃    (11)
holds for all i ≥ Nε̃ . Noting that M qi + ⟨pi , M c⟩ ∈ Z for all i, we obtain that
limi→+∞ F (σ(si )) = F (⟨u, el ⟩) = F (ul ) > 0.
5 Proof of Main Theorem
To prove Theorem 2, we first verify that Step 2 in Figure 1 can be accomplished
using a finite number of CG cuts.
Proposition 4. If there exists a finite set S ⊂ Zn such that
CC(S, T ) ⊂ T (12a)
CC(S, T ) ∩ bd(T ) ⊂ Zn , (12b)
then CC(Zn , T ) is a rational polytope.
Proof. Let V be the set of vertices of CC(S). By (12) we have that bd(T ) ∩ V ⊂
Zn ∩ T ⊂ CC(Zn ). Hence any CG cut that separates u ∈ CC(S) \ CC(Zn ) must
also separate a point in V \ bd(T ). It is then sufficient to show that the set of CG
cuts that separate some point in V \ bd(T ) is finite. To achieve this we will use
the fact that, because V \ bd(T ) ⊂ T \ bd(T ) and |V | < ∞, there exists 1 > ε > 0
such that
γB (v − c) ≤ 1 − ε    ∀v ∈ V \ bd(T ).    (13)
Now, if a CG cut ⟨a, x⟩ ≤ ⌊σ(a)⌋ for a ∈ Zn separates v ∈ V \ bd(T ), then
⟨a, v⟩ > ⌊σ(a)⌋    (14)
⇒ ⟨a, v⟩ > σ(a) − 1    (15)
⇒ ⟨a, v⟩ > σB (a) + ⟨a, c⟩ − 1    (16)
⇒ σB (a)γB (v − c) ≥ ⟨a, v − c⟩ > σB (a) − 1    (17)
⇒ σB (a) < 1/(1 − γB (v − c)) ≤ 1/ε    (18)
⇒ a ∈ (1/ε)B ◦ .    (19)
The result follows from the fact that (1/ε)B ◦ is a bounded set.
The separation results from Section 4 allow the construction of the set re-
quired in Proposition 4, which proves our main result.
Proof (Proof of Theorem 2). Let I := bd(T ) ∩ Zn be the finite (and possibly
empty) set of integer points on the boundary of T . We divide the proof into the
following cases:
1. CC(Zn ) = ∅.
2. CC(Zn ) ≠ ∅ and CC(Zn ) ∩ int(T ) = ∅.
3. CC(Zn ) ∩ int(T ) ≠ ∅.
For the first case, the result follows directly. For the second case, by Proposition 1
and the strict convexity of T , we have that |I| = 1 and CC(Zn ) = I so the
result again follows directly. For the third case we show the existence of a set S
complying with conditions (12) presented in Proposition 4.
For each u ∈ I, let εu > 0 be the value from Proposition 2. Let D := bd(T ) \
∪u∈I {x ∈ Rn : ‖x − u‖ < εu }. Observe that D ∩ Zn = ∅ by construction and
that D is compact because it is obtained from the compact set bd(T ) by removing a
finite number of open sets. Now, for any a ∈ Zn let O(a) := {x ∈ bd(T ) | ⟨a, x⟩ >
⌊σ(a)⌋} be the set of points of bd(T ) that are separated by the CG cut ⟨a, x⟩ ≤
⌊σ(a)⌋. This set is open with respect to D. Furthermore, by Proposition 1 and
the construction of D, we have that D ⊂ ∪a∈A O(a) for a possibly infinite set
A ⊂ Zn . However, since D is a compact set, we have that there exists a finite
subset A0 ⊂ A such that
D ⊂ ∪a∈A0 O(a).    (20)
Let S := A0 ∪ (∪u∈I Wu ), where, for each u ∈ I, Wu is the set from Proposition 2.
Then by (20) and Proposition 2 we have that S is a finite set that complies with
condition (12b).
To show that S complies with condition (12a) we will show that if p ∉ T ,
then p ∉ CC(S, T ). To achieve this, we use the fact that CC(Zn ) ∩ int(T ) ≠ ∅.
Let c̃ ∈ CC(Zn ) ∩ int(T ), B̃ = B + c − c̃ and γB̃ (x) = inf{λ > 0 : x ∈ λB̃}
be the gauge of B̃. Then B̃ is a convex body with 0 ∈ int(B̃), T = {x ∈ Rn :
γB̃ (x − c̃) ≤ 1} and bd(T ) = {x ∈ Rn : γB̃ (x − c̃) = 1}. Now, for p ∉ T ,
let p̄ := c̃ + (p − c̃)/γB̃ (p − c̃), so that p̄ ∈ {μc̃ + (1 − μ)p : μ ∈ (0, 1)} and
p̄ ∈ bd(T ). If p̄ ∉ Zn , then by the definitions of S and c̃ we have that there exists
a ∈ S such that ⟨a, c̃⟩ ≤ ⌊σ(a)⌋ and ⟨a, p̄⟩ > ⌊σ(a)⌋. Then ⟨a, p⟩ > ⌊σ(a)⌋ and
hence p ∉ CC(S, T ). If p̄ ∈ Zn , let w ∈ Wp̄ be such that ⟨w, c̃⟩ < ⌊σ(w)⌋ and
⟨w, p̄⟩ = ⌊σ(w)⌋. Then ⟨w, p⟩ > ⌊σ(w)⌋ and hence p ∉ CC(S, T ).
6 Remarks
We note that the proof of Proposition 4 only uses the fact that T is a convex
body and Theorem 2 uses the fact that T is additionally an ellipsoid only through
Proposition 1 and Proposition 2. Therefore, we have the following general suf-
ficient conditions for the polyhedrality of the first CG closure of a compact
convex set.
Corollary 1. Let T be any compact convex set. CC(Zn , T ) is a rational poly-
hedron if any of the following conditions hold:
Property 1 There exists a finite S ⊂ Zn such that (12) holds.
Property 2 For any u ∈ bd(T ) \ Zn there exists a CG cut that separates u and
for any u ∈ bd(T ) ∩ Zn there exist εu > 0 and a finite set Wu ⊂ Zn
such that (2)–(4) hold.
A condition similar to (12) was considered in [41] for polytopes that are not
necessarily rational. Specifically the author stated that if P is a polytope in real
space such that CC(Zn , P ) ∩ bd(P ) = ∅, then CC(Zn , P ) is a rational polytope.
We imagine that the proof he had in mind could have been something along the
lines of Proposition 4.
We also note that Step 2 of the procedure described in Section 2 can be directly
turned into a finitely terminating algorithm by simple enumeration. However, it
is not clear how to obtain a finitely terminating algorithmic version of Step 1
because it requires obtaining a finite subcover of the boundary of T from a quite
complicated infinite cover.
Acknowledgements. We would like to thank Shabbir Ahmed, George Nemhauser
and Arkadi Nemirovski for various discussions on this problem.
References
1. Abhishek, K., Leyffer, S., Linderoth, J.T.: FilMINT: An outer-approximation-
based solver for nonlinear mixed integer programs. In: Preprint ANL/MCS-P1374-
0906, Argonne National Laboratory, Mathematics and Computer Science Division,
Argonne, IL (September 2006)
2. Andersen, K., Cornuéjols, G., Li, Y.: Split closure and intersection cuts. Mathe-
matical Programming 102, 457–493 (2005)
3. Atamtürk, A., Narayanan, V.: Cuts for conic mixed-integer programming. In: Fis-
chetti and Williamson [24], pp. 16–29
4. Atamtürk, A., Narayanan, V.: Lifting for conic mixed-integer programming. Re-
search Report BCOL.07.04, IEOR, University of California-Berkeley, October 2007,
Forthcoming in Mathematical Programming (2007)
5. Atamtürk, A., Narayanan, V.: The submodular 0-1 knapsack polytope. Discrete
Optimization 6, 333–344 (2009)
6. Atamtürk, A., Narayanan, V.: Conic mixed-integer rounding cuts. Mathematical
Programming 122, 1–20 (2010)
7. Balas, E., Saxena, A.: Optimizing over the split closure. Mathematical Program-
ming 113, 219–240 (2008)
8. Belotti, P., Lee, J., Liberti, L., Margot, F., Waechter, A.: Branching and bound
tightening techniques for non-convex MINLP. Optimization Methods and Soft-
ware 24, 597–634 (2009)
9. Bonami, P., Biegler, L.T., Conn, A.R., Cornuéjols, G., Grossmann, I.E., Laird,
C.D., Lee, J., Lodi, A., Margot, F., Sawaya, N., Waechter, A.: An algorithmic
framework for convex mixed integer nonlinear programs. Discrete Optimization 5,
186–204 (2008)
10. Bonami, P., Kilinç, M., Linderoth, J.: Algorithms and software for convex mixed in-
teger nonlinear programs, Technical Report 1664, Computer Sciences Department,
University of Wisconsin-Madison (October 2009)
11. Caprara, A., Fischetti, M.: {0, 1/2}-Chvátal-Gomory cuts. Mathematical Program-
ming 74, 221–235 (1996)
12. Caprara, A., Letchford, A.N.: On the separation of split cuts and related inequal-
ities. Mathematical Programming 94, 279–294 (2003)
13. Ceria, S., Soares, J.: Perspective cuts for a class of convex 0-1 mixed integer pro-
grams. Mathematical Programming 86, 595–614 (1999)
14. Çezik, M.T., Iyengar, G.: Cuts for mixed 0-1 conic programming. Mathematical
Programming 104, 179–202 (2005)
15. Chvátal, V.: Edmonds polytopes and a hierarchy of combinatorial problems. Dis-
crete Mathematics 4, 305–337 (1973)
16. Cook, W.J., Cunningham, W.H., Pulleyblank, W.R., Schrijver, A.: Combinatorial
optimization. John Wiley and Sons, Inc., Chichester (1998)
17. Cook, W.J., Kannan, R., Schrijver, A.: Chvátal closures for mixed integer pro-
gramming problems. Mathematical Programming 58, 155–174 (1990)
18. Dadush, D., Dey, S.S., Vielma, J.P.: The Chvátal-Gomory closure of strictly convex
sets. Working paper, Georgia Institute of Technology (2010)
19. Dash, S., Günlük, O., Lodi, A.: On the MIR closure of polyhedra. In: Fischetti and
Williamson [24], pp. 337–351
20. Dash, S., Günlük, O., Lodi, A.: MIR closures of polyhedral sets. Mathematical
Programming 121, 33–60 (2010)
21. Eisenbrand, F.: On the membership problem for the elementary closure of a poly-
hedron. Combinatorica 19, 297–300 (1999)
22. Fischetti, M., Lodi, A.: Optimizing over the first Chvátal closure. In: Jünger, M.,
Kaibel, V. (eds.) IPCO 2005. LNCS, vol. 3509, pp. 12–22. Springer, Heidelberg
(2005)
23. Fischetti, M., Lodi, A.: Optimizing over the first Chvátal closure. Mathematical
Programming, Series B 110, 3–20 (2007)
24. Fischetti, M., Williamson, D.P. (eds.): IPCO 2007. LNCS, vol. 4513. Springer,
Heidelberg (2007)
25. Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0-1 mixed integer
programs. Mathematical Programming 106, 225–236 (2006)
26. Gomory, R.E.: Outline of an algorithm for integer solutions to linear programs.
Bulletin of the American Mathematical Society 64, 275–278 (1958)
27. Gomory, R.E.: An algorithm for integer solutions to linear programs. In: Recent
advances in mathematical programming, pp. 269–302. McGraw-Hill, New York
(1963)
28. Grossmann, I., Lee, S.: Generalized convex disjunctive programming: Nonlinear
convex hull relaxation. Computational Optimization and Applications 26, 83–100
(2003)
29. Günlük, O., Lee, J., Weismantel, R.: MINLP strengthening for separable convex
quadratic transportation-cost UFL, IBM Research Report RC24213, IBM, York-
town Heights, NY (March 2007)
30. Günlük, O., Linderoth, J.: Perspective relaxation of mixed integer nonlinear pro-
grams with indicator variables. In: Lodi, et al. (eds.) [38], pp. 1–16
31. Günlük, O., Linderoth, J.: Perspective relaxation of mixed integer nonlinear pro-
grams with indicator variables. Mathematical Programming, Series B (to appear
2009)
32. Hemmecke, R., Köppe, M., Lee, J., Weismantel, R.: Nonlinear integer program-
ming. IBM Research Report RC24820, IBM, Yorktown Heights, NY (December
2008); to appear in: Jünger, M., Liebling, T., Naddef, D., Nemhauser, G., Pulleyblank, W.,
Reinelt, G., Rinaldi, G., Wolsey, L. (eds.): 50 Years of Integer Programming 1958–2008:
The Early Years and State-of-the-Art Surveys. Springer, Heidelberg (2010),
ISBN 3540682740
33. Jeroslow, R.: There cannot be any algorithm for integer programming with
quadratic constraints. Operations Research 21, 221–224 (1973)
34. Letchford, A.N., Pokutta, S., Schulz, A.S.: On the membership problem for the
{0, 1/2}-closure. Working paper, Lancaster University (2009)
35. Letchford, A.N., Sørensen, M.M.: Binary positive semidefinite matrices and asso-
ciated integer polytopes. In: Lodi, et al. (eds.) [38], pp. 125–139
36. Leyffer, S., Linderoth, J.T., Luedtke, J., Miller, A., Munson, T.: Applications and
algorithms for mixed integer nonlinear programming. Journal of Physics: Confer-
ence Series 180 (2009)
37. Leyffer, S., Sartenaer, A., Wanufelle, E.: Branch-and-refine for mixed-integer non-
convex global optimization. In: Preprint ANL/MCS-P1547-0908, Argonne National
Laboratory, Mathematics and Computer Science Division, Argonne, IL (September
2008)
38. Lodi, A., Panconesi, A., Rinaldi, G. (eds.): IPCO 2008. LNCS, vol. 5035. Springer,
Heidelberg (2008)
39. Richard, J.-P.P., Tawarmalani, M.: Lifting inequalities: a framework for generating
strong cuts for nonlinear programs. Mathematical Programming 121, 61–104 (2010)
40. Saxena, A., Bonami, P., Lee, J.: Disjunctive cuts for non-convex mixed integer
quadratically constrained programs. In: Lodi, et al. (eds.) [38], pp. 17–33
41. Schrijver, A.: On cutting planes. Annals of Discrete Mathematics 9, 291–296 (1980);
Combinatorics 79 (Proc. Colloq., Univ. Montréal, Montreal, Que., 1979), Part II
(1979)
42. Schrijver, A.: Theory of linear and integer programming. John Wiley & Sons, Inc.,
New York (1986)
43. Stubbs, R.A., Mehrotra, S.: A branch-and-cut method for 0-1 mixed convex pro-
gramming. Mathematical Programming 86, 515–532 (1999)
44. Vielma, J.P.: A constructive characterization of the split closure of a mixed integer
linear program. Operations Research Letters 35, 29–35 (2007)
A Pumping Algorithm for Ergodic Stochastic
Mean Payoff Games with Perfect Information

Endre Boros1 , Khaled Elbassioni2 , Vladimir Gurvich1 , and Kazuhisa Makino3

1 RUTCOR, Rutgers University, 640 Bartholomew Road, Piscataway NJ 08854-8003
{boros,gurvich}@rutcor.rutgers.edu
2 Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
[email protected]
3 Graduate School of Information Science and Technology, University of Tokyo,
Tokyo, 113-8656, Japan
[email protected]

Abstract. In this paper, we consider two-person zero-sum stochastic
mean payoff games with perfect information, or BWR-games, given by a
digraph G = (V = VB ∪ VW ∪ VR , E), with local rewards r : E → R, and
three types of vertices: black VB , white VW , and random VR . The game
is played by two players, White and Black: When the play is at a white
(black) vertex v, White (Black) selects an outgoing arc (v, u). When the
play is at a random vertex v, a vertex u is picked with the given proba-
bility p(v, u). In all cases, Black pays White the value r(v, u). The play
continues forever, and White aims to maximize (Black aims to minimize)
the limiting mean (that is, average) payoff. It was recently shown in [7]
that BWR-games are polynomially equivalent with the classical Gillette
games, which include many well-known subclasses, such as cyclic games,
simple stochastic games (SSG s), stochastic parity games, and Markov
decision processes. In this paper, we give a new algorithm for solving
BWR-games in the ergodic case, that is when the optimal values do not
depend on the initial position. Our algorithm solves a BWR-game by re-
ducing it, using a potential transformation, to a canonical form in which
the optimal strategies of both players and the value for every initial posi-
tion are obvious, since a locally optimal move in it is optimal in the whole
game. We show that this algorithm is pseudo-polynomial when the num-
ber of random nodes is constant. We also provide an almost matching
lower bound on its running time, and show that this bound holds for a
wider class of algorithms. Let us add that the general (non-ergodic) case
is at least as hard as SSG s, for which no pseudo-polynomial algorithm
is known.

Keywords: mean payoff games, local reward, Gillette model, perfect


information, potential, stochastic games.
This research was partially supported by DIMACS, Center for Discrete Mathematics
and Theoretical Computer Science, Rutgers University, and by the Scientific Grant-
in-Aid from Ministry of Education, Science, Sports and Culture of Japan.

1 Introduction
1.1 BWR-Games
We consider two-person zero-sum stochastic games with perfect information and
mean payoff: Let G = (V, E) be a digraph whose vertex-set V is partitioned
into three subsets V = VB ∪ VW ∪ VR that correspond to black, white, and
random positions, controlled respectively, by two players, Black - the minimizer
and White - the maximizer, and by nature. We also fix a local reward function
r : E → R, and probabilities p(v, u) for all arcs (v, u) going out of v ∈ VR .
Vertices v ∈ V and arcs e ∈ E are called positions and moves, respectively. In
a personal position v ∈ VW or v ∈ VB the corresponding player White or Black
selects an arc (v, u), while in a random position v ∈ VR a move (v, u) is chosen
with the given probability p(v, u). In all cases Black pays White the reward
r(v, u).
From a given initial position v0 ∈ V the game produces an infinite walk (called
a play). White’s objective is to maximize the limiting mean payoff
n
bi
c = lim inf i=0 , (1)
n→∞ n+1
where bi is the reward incurred at step i of the play,  nwhile the objective of Black
i=0 bi
is the opposite, that is, to minimize lim supn→∞ n+1 .
For this model it was shown in [7] that a saddle point exists in pure positional
uniformly optimal strategies. Here “pure” means that the choice of a move (v, u)
in a personal position v ∈ VB ∪ VR is deterministic, not random; “positional”
means that this choice depends solely on v, not on previous positions or moves;
finally, “uniformly optimal” means that it does not depend on the initial position
v0 , either. The results and methods in [7] are similar to those of Gillette [17];
see also Liggett and Lippman [28]: First, we analyze the so-called discounted
version, in which the payoff  is discounted by a factor β i at step i, giving the
effective payoff: aβ = (1 − β) ∞ i
i=0 β bi , and then we proceed to the limit as the
discount factor β ∈ [0, 1) tends to 1.
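For intuition, here is a small numeric sketch of ours (not from [7] or [17]): for a play whose rewards repeat periodically, the discounted payoff aβ approaches the limiting mean payoff as the discount factor tends to 1.

def discounted_payoff(cycle, beta, horizon=100000):
    # a_beta = (1 - beta) * sum_i beta**i * b_i, truncated at a long horizon
    return (1 - beta) * sum(beta ** i * cycle[i % len(cycle)]
                            for i in range(horizon))

cycle = (1, 4, -2, 5)                 # limiting mean payoff = 8 / 4 = 2
for beta in (0.9, 0.99, 0.999):
    print(beta, discounted_payoff(cycle, beta))   # tends to 2.0 as beta -> 1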
This class of the BWR-games was introduced in [19]; see also [10]. It was
recently shown in [7] that the BWR-games and classical Gillette games [17] are
polynomially equivalent. The special case when there are no random positions,
VR = ∅, is known as cyclic, or mean payoff, or BW-games. They were introduced
for the complete bipartite digraphs in [32,31], for all (not necessarily complete)
bipartite digraphs in [15], and for arbitrary digraphs in [19]. A more special
case was considered extensively in the literature under the name of parity games
[2,3,11,20,22,24], and later generalized also to include random nodes in [10]. A
BWR-game is reduced to a minimum mean cycle problem in case VW = VR = ∅,
see, for example [25]. If one of the sets VB or VW is empty, we obtain a Markov
decision process; see, for example, [30]. Finally, if both are empty VB = VW = ∅,
we get a weighted Markov chain.
It was noted in [9] that “parlor games”, like Backgammon (and even Chess)
can be solved in pure positional uniformly optimal strategies, based on their
BWR-model.
In the special case of a BWR-game, when all rewards are zero except at a
single node t called the terminal, at which there is a self-loop with reward 1,
we obtain the so-called simple stochastic games (SSG), introduced by Condon
[12,13] and considered in several papers [18,20]. In these games, the objective
of White is to maximize the probability of reaching the terminal, while Black
wants to minimize this probability. Recently, it was shown that Gillette games
(and hence BWR-games by [7]) are equivalent to SSG’s under polynomial-time
reductions [1]. Thus, by recent results of Björklund, Vorobyov [5], and Halman
[20],√ all these games can be solved in randomized strongly subexponential time
2O( nd log nd ) , where nd = |VB | + |VW | is the number of deterministic vertices.
Let us note that several pseudo-polynomial and subexponential algorithms exist
for BW-games [19,26,33,6,4,20,36]; see also [14] for a so called policy iteration
method, and [24] for parity games.
Besides their many applications (see e.g. [29,23]), all these games are of in-
terest to Complexity Theory: Karzanov and Lebedev [26] (see also [36]) proved
that the decision problem “whether the value of a BW-game is positive” is in
the intersection of NP and co-NP. Yet, no polynomial algorithm is known for
these games, see e.g., the recent survey by Vorobyov [35]. A similar complexity
claim can be shown to hold for SSG s and BWR-games, see [1,7].
While there are numerous pseudo-polynomial algorithms known for the BW-
case, it is a challenging open question whether a pseudo-polynomial algorithm
exists for SSG s or BWR-games.

1.2 Potential Transformations and Canonical Forms


Given a BWR-game, we consider potential transformations x : V → R, assigning
a real-value x(v) to each vertex v ∈ V , and transforming the local reward on
each arc (v, u) to rx (v, u) = r(v, u) + x(v) − x(u). It is known that for BW-
games there exists a potential transformation such that, in the obtained game
the locally optimal strategies are globally optimal, and hence, the value and
optimal strategies become obvious [19]. This result was extended for the more
general class of BWR-games in [7]: in the transformed game, the equilibrium
value μ(v) = μx (v) is given simply by the maximum local reward for v ∈ VW ,
the minimum local reward for v ∈ VB , and the average local reward for v ∈ VR .
In this case we say that the transformed game is in canonical form.
It is not clear how the algorithm given in [19] for the BW-case can be gen-
eralized to BWR-case. The proof in [7] follows by considering the discounted
case and then taking the discount factor β to the limit. While such an approach
is sufficient to prove the existence of a canonical form, it does not provide an
algorithm to compute the potentials, since the corresponding limits appear to be
infinite. In this paper, we give such an algorithm that does not go through the
discounted case. Our method computes an optimal potential transformation in
case the game is ergodic, that is, when the optimal values do not depend on the
initial position. If the game is not ergodic then our algorithm terminates with a
proof of non-ergodicity, by exhibiting at least two vertices with provably distinct
values. Unfortunately, our approach cannot be applied recursively in this case.
This is not a complete surprise, since this case is at least as hard as SSG s, for
which no pseudo-polynomial algorithm is known.
Theorem 1. Consider a BWR-game with k random nodes, a total of n vertices,
and integer rewards in the range [−R, R], and assume that all probabilities are
rational whose common denominator is bounded by W . Then there is an algo-
rithm that runs in time n^{O(k)} W^{O(k^2)} R log(nRW ) and either brings the game by
a potential transformation to canonical form, or proves that it is non-ergodic.
Let us remark that the ergodic case is frequent in applications. For instance,
it is the case when G = (VW ∪ VB ∪ VR , E) is a complete tripartite digraph
(where p(v, u) > 0 for all v ∈ VR and (v, u) ∈ E); see Section 3 for more general
sufficient conditions.
Theorem 1 states that our algorithm is pseudo-polynomial if the number
of random nodes is fixed. As far as we know, this is the first algorithm with
such a guarantee (in comparison, for example, to strategy improvement meth-
ods [4,21,34], for which exponential lower bounds are known [16]; it is worth
mentioning that the algorithm of [21] also works only for the ergodic case). In
fact, we are not aware of any previous results bounding the running time of an
algorithm for a class of BWR-games in terms of the number of random nodes,
except for [18] which shows that simple stochastic games on k random nodes can
be solved in time O(k!(|V ||E| + L)), where L is the maximum bit length of a
transition probability. It is worth remarking here that even though BWR-games
are polynomially reducible to simple stochastic games, under this reduction the
number of random nodes k becomes a polynomial in n, even if the original BWR-
game has constantly many random nodes. In particular, the result in [18] does
not imply a bound similar to that of Theorem 1 for general BWR-games.
One should also contrast the bound in Theorem 1 with the subexponential
bounds in [20]: roughly, the algorithm of Theorem 1 will be more efficient if |VR |
is o((|VW | + |VB |)^{1/4} ) (assuming that W and R are polynomials in n). However,
our algorithm could be practically much faster since it can stop much earlier
than its estimated worst-case running time (unlike the subexponential algorithms
[20], or those based on dynamic programming [36]). In fact, as our preliminary
experiments indicate, to approximate the value of a random game on up to
15, 000 nodes within an additive error ε = 0.001, the algorithm takes no more
than a few hundred iterations, even if the maximum reward is very large. One
more desirable property of this algorithm is that it is of the certifying type (see
e.g. [27]), in the sense that, given an optimal pair of strategies, the vector of
potentials provided by the algorithm can be used to verify optimality in linear
time (otherwise verifying optimality requires solving two linear programs).

1.3 Overview of the Techniques


Our algorithm for proving Theorem 1 is quite simple. Starting form zero po-
tentials, and depending on the current locally optimal rewards (maximum for
White, minimum for Black, and average for Random), the algorithm keeps se-
lecting a subset of nodes and reducing their potentials by some value, until either
the locally optimal rewards at different nodes become sufficiently close to each
other, or a proof of non-ergodicity is obtained in the form of a certain partition
of the nodes. The upper bound on the running time consists of three technical
parts. The first one is to show that if the number of iterations becomes too large,
then there is a large enough potential gap to ensure an ergodic partition. In the
second part, we show that the range of potentials can be kept sufficiently small
throughout the algorithm, namely ‖x∗ ‖∞ ≤ nRk(2W )^k , and hence the range of
the transformed rewards does not explode. The third part concerns the required
accuracy. It can be shown that it is enough in our algorithm to get the value of
the game within an accuracy of
ε = 1/(n^{2(k+1)} k^{2k} (2W )^{4k+2k^2+2} ),    (2)
in order to guarantee that it is equal to the exact value. As far as we know, such
a bound in terms of k is new, and it could be of independent interest. We also
show the lower bound W^{Ω(k)} on the running time of the algorithm of Theorem
1 by providing an instance of the problem, with only random nodes.
The paper is organized as follows. In the next section, we formally define
BWR-games, canonical forms, and state some useful propositions. In Section 3,
we give a sufficient condition for the ergodicity of a BWR-game, which will be
used as one possible stopping criterion in our algorithm. We give the algorithm in
Section 4.1, and prove it converges in Section 4.2. In Section 5, we show that this
convergence proof can, in fact, be turned into a quantitative statement giving
the precise bounds stated in Theorem 1. The last section gives a lower bound
example for the algorithm. Due to lack of space, most of the proofs are omitted
(see [8] for details).
2 Preliminaries
2.1 BWR-Games

A BWR-game is defined by the quadruple G = (G, P, v0 , r), where G = (V =
VW ∪ VB ∪ VR , E) is a digraph that may have loops and multiple arcs, but
no terminal vertices1 , i.e., vertices of out-degree 0; P is the set of probability
distributions for all v ∈ VR specifying the probability p(v, u) of a move from v
to u; v0 ∈ V is an initial position from which the play starts; and r : E → R is
a local reward function. We assume that Σ_{u:(v,u)∈E} p(v, u) = 1 for all v ∈ VR . For
convenience we will assume that p(v, u) > 0 whenever (v, u) ∈ E and v ∈ VR ,
and set p(v, u) = 0 for (v, u) ∈ E.
Standardly, we define a strategy sW ∈ SW (respectively, sB ∈ SB ) as a
mapping that assigns a move (v, u) ∈ E to each position v ∈ VW (respec-
tively, v ∈ VB ). A pair of strategies s = (sW , sB ) is called a situation. Given a
1 This assumption is without loss of generality since otherwise one can add a loop to
each terminal vertex.
BWR-game G = (G, P, v0 , r) and situation s = (sW , sB ), we obtain a (weighted)
Markov chain Gs = (G, Ps , v0 , r) with transition matrix Ps defined in the obvious way:
ps (v, u) = 1 if (v ∈ VW and u = sW (v)) or (v ∈ VB and u = sB (v));
ps (v, u) = 0 if (v ∈ VW and u ≠ sW (v)) or (v ∈ VB and u ≠ sB (v));
ps (v, u) = p(v, u) if v ∈ VR .
In the obtained Markov chain Gs = (G, Ps , v0 , r), we define the limiting (mean)
effective payoff cs (v0 ) as
cs (v0 ) = Σv∈V p∗ (v) Σu ps (v, u) r(v, u),    (3)
where p∗ : V → [0, 1] is the limiting distribution for Gs starting from v0 . Doing
this for all possible strategies of Black and White, we obtain a matrix game
Cv0 : SW × SB → R, with entries Cv0 (sW , sB ) defined by (3).
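The following sketch of ours evaluates (3) for one fixed situation; it assumes an aperiodic chain so that plain power iteration converges to the limiting distribution (otherwise a Cesàro average would be needed).

import numpy as np

def limiting_mean_payoff(P, r, v0, iters=10000):
    # P: n x n transition matrix of G_s, r: n x n local rewards, v0: start.
    n = P.shape[0]
    p = np.zeros(n)
    p[v0] = 1.0
    for _ in range(iters):              # p converges to the limiting distribution
        p = p @ P
    step_reward = (P * r).sum(axis=1)   # expected reward leaving each vertex
    return p @ step_reward

# Two-state example: move to either state with probability 1/2;
# rewards are +1 out of state 0 and -1 out of state 1, so the mean payoff is 0.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([[1.0, 1.0], [-1.0, -1.0]])
print(limiting_mean_payoff(P, r, v0=0))   # -> 0.0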

2.2 Solvability and Ergodicity
It is known that every such game has a saddle point in pure strategies [17,28].
Moreover, there are optimal strategies (s∗W , s∗B ) that do not depend on the
starting position v0 , so-called uniformly optimal strategies. In contrast, the value
of the game μ(v0 ) = Cv0 (s∗W , s∗B ) may depend on v0 .
The triplet G = (G, P, r) is called an un-initialized BWR-game. Furthermore, G
is called ergodic if the value μ(v0 ) of each corresponding BWR-game (G, P, v0 , r)
is the same for all initial positions v0 ∈ V .

2.3 Potential Transforms
Given a BWR-game G = (G, P, v0 , r), let us introduce a mapping x : V → R,
whose values x(v) will be called potentials, and define the transformed reward
function rx : E → R as:
rx (v, u) = r(v, u) + x(v) − x(u), where (v, u) ∈ E. (4)
It is not difficult to verify that the two normal form matrices Cx and C, of the
obtained game Gx and the original game G, are equal (see [7]). In particular,
their optimal (pure positional) strategies coincide, and the values also coincide:
μx (v0 ) = μ(v0 ). Given a BWR-game G = (G, P, r), let us define a mapping
m : V → R as follows:
m(v) = max(r(v, u) | u : (v, u) ∈ E) for v ∈ VW ,
m(v) = min(r(v, u) | u : (v, u) ∈ E) for v ∈ VB ,    (5)
m(v) = mean(r(v, u) | u : (v, u) ∈ E) = Σ_{u:(v,u)∈E} r(v, u) p(v, u) for v ∈ VR .
A move (v, u) ∈ E in a position v ∈ VW (respectively, v ∈ VB ) is called locally
optimal if it realizes the maximum (respectively, minimum) in (5). A strategy
sW of White (respectively, sB of Black) is called locally optimal if it chooses a
locally optimal move (v, u) ∈ E in every position v ∈ VW (respectively, v ∈ VB ).
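As a small illustration (our sketch, with hypothetical helper names), the transformed reward (4) and the locally optimal value (5) can be computed directly from adjacency data:

def r_x(r, x, v, u):
    # transformed local reward (4): r_x(v, u) = r(v, u) + x(v) - x(u)
    return r[(v, u)] + x[v] - x[u]

def m_value(v, out_arcs, r, x, owner, p=None):
    # locally optimal value (5) at vertex v under potential x
    vals = [r_x(r, x, v, u) for u in out_arcs[v]]
    if owner[v] == 'W':      # White maximizes
        return max(vals)
    if owner[v] == 'B':      # Black minimizes
        return min(vals)
    # random vertex: probability-weighted mean of the transformed rewards
    return sum(p[(v, u)] * r_x(r, x, v, u) for u in out_arcs[v])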
Proposition 1. If in a BWR-game the function m(v) = M for all v ∈ V , then
(i) every locally optimal strategy is optimal and (ii) the game is ergodic: M is
its value for every initial position v0 ∈ V .
3 Sufficient Conditions for Ergodicity of BWR-Games
A digraph G = (V = VW ∪ VB ∪ VR , E) is called ergodic if any un-initialized
BWR-game G = (G, P, r) on G is ergodic, that is, the values of the games
G = (G, P, v0 , r) do not depend on v0 . We will give a simple characterization of
ergodic digraphs, which, obviously, provides a sufficient condition for ergodicity
of the BWR-games.
In addition to partition Πp : V = VW ∪ VB ∪ VR , let us consider one more
partition Πr : V = V W ∪ V B ∪ V R with the following properties:
(i) Sets V W and V B are not empty (while V R might be empty).
(ii) There is no arc (v, u) ∈ E such that (v ∈ (VW ∪ VR ) ∩ V B and u ∉ V B )
or, vice versa, (v ∈ (VB ∪ VR ) ∩ V W and u ∉ V W ). In other words, White
cannot leave V B , Black cannot leave V W , and there are no random moves
from V W ∪ V B .
(iii) For each v ∈ VW ∩ V W (respectively, v ∈ VB ∩ V B ) there is a move (v, u) ∈ E
such that u ∈ V W (respectively, u ∈ V B ). In other words, White (Black)
cannot be forced to leave V W (respectively, V B ).

In particular, the properties above imply that the induced subgraphs G[V W ]
and G[V B ] have no terminal vertex.
Partition Πr : V = V W ∪ V B ∪ V R satisfying (i), (ii), and (iii) will be called
a contra-ergodic partition for digraph G = (VW ∪ VB ∪ VR , E).
Theorem 2. A digraph G is ergodic iff it has no contra-ergodic partition.
The “only if” part can be strengthened as follows:
Proposition 2. Given a BWR-game G whose graph has a contra-ergodic par-
tition, if m(v) > m(u) for every v ∈ V W , u ∈ V B then μ(v) > μ(u) for every
v ∈ V W , u ∈ V B.
Definition 1. A contra-ergodic decomposition of G is a contra-ergodic partition
Πr : V = V W ∪ V B ∪ V R such that m(v) > m(u) for every v ∈ V W and u ∈ V B .
By Proposition 2, G is not ergodic whenever it has such a decomposition.
4 Pumping Algorithm for the Ergodic BWR-Games
4.1 Description of the Algorithm
Given a BWR-game G = (G, P, r), let us compute m(v) for all v ∈ V using (5).
Throughout, we will denote by [m] := [m− , m+ ] and [r] := [r− , r+ ] the range
of functions m and r, respectively, and let M = m+ − m− and R = r+ − r− .
Given potentials x : V → R, we denote by mx the function m in (5) in which r is
replaced by the transformed reward rx . Given a subset I ⊆ [m], let V (I) = {v ∈
V | m(v) ∈ I} ⊆ V . In the following algorithm, set I will always be a closed or
semi-closed interval within [m].
Let m− = t0 < t1 < t2 < t3 < t4 = m+ be given thresholds. We will
successively apply potential transforms x : V → R such that no vertex ever leaves
the interval [t0 , t4 ] or [t1 , t3 ]; in other words, Vx [t0 , t4 ] = V [t0 , t4 ] and Vx [t1 , t3 ] ⊇
V [t1 , t3 ] for all considered transforms x, where Vx (I) = {v ∈ V | mx (v) ∈ I}.
Let us initialize potentials x(v) = 0 for all v ∈ V . We will fix
t0 := m−x , t1 := m−x + (1/4)Mx , t2 := m−x + (1/2)Mx , t3 := m−x + (3/4)Mx , t4 := m+x ,    (6)
where Mx = m+x − m−x . Then, let us reduce all potentials of Vx [t2 , t4 ] by a
maximum constant δ such that no vertex leaves the closed interval [t1 , t3 ]; in
other words, we stop the iteration whenever a vertex from this interval reaches
its border. After this we compute potentials x(v), new values mx (v), for v ∈ V ,
and start the next iteration.
It is clear that δ can be computed in linear time: it is the maximum value δ
such that mδx (v) ≥ t1 for all v ∈ Vx [t2 , t4 ] and mδx (v) ≤ t3 for all v ∈ Vx [t0 , t2 ),
where mδx (v) is the new value of mx (v) after all potentials in Vx [t2 , t4 ] have been
reduce by δ.
It is also clear from our update method (and important) that δ ≥ Mx /4.
Indeed, vertices from [t2 , t4 ] can only go down, while vertices from [t0 , t2 ) can
only go up. Each of them must traverse a distance of at least Mx /4 before it can
reach the border of the interval [t1 , t3 ]. Moreover, if after some iteration one of
the sets Vx [t0 , t1 ) or Vx (t3 , t4 ] becomes empty then the range of mx is reduced
at least by 25%.
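One pumping iteration can be sketched as follows (our simplified sketch: m_of is a hypothetical helper that recomputes mx (v) after a candidate shift, e.g. with m_value above; feasibility is monotone in δ so bisection applies; the unbounded-δ case handled in line 8 of Algorithm 1 below is reported as None):

def pump_once(x, S, m_of, V, t1, t3, tol=1e-9, cap=1e12):
    # S = Vx[t2, t4]: lower these potentials by the largest feasible delta.
    def feasible(delta):
        y = {v: x[v] - delta if v in S else x[v] for v in V}
        return all((m_of(y, v) >= t1) if v in S else (m_of(y, v) <= t3)
                   for v in V)
    lo, hi = 0.0, 1.0
    while feasible(hi) and hi < cap:   # grow an upper bound on delta
        hi *= 2
    if hi >= cap:
        return None                    # delta unbounded: ergodic partition case
    while hi - lo > tol:               # bisect to the largest feasible delta
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if feasible(mid) else (lo, mid)
    for v in S:
        x[v] -= lo
    return x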
Procedure PUMP(G, ε) below tries to reduce any BWR-game G by a potential
transformation x into one in which Mx ≤ ε. Two subroutines are used in the
procedure. REDUCE-POTENTIALS(G, x) replaces the current potential x with
another potential with a sufficiently small norm (cf. Lemma 3 below). This reduc-
tion is needed since without it the potentials and, hence, the transformed local re-
wards too, can grow exponentially. The second routine FIND-PARTITION(G, x)
uses the current potential vector x to construct a contra-ergodic decomposition
of G (cf. line 19 of the algorithm below). We will prove in Lemma 2 that if the
number of pumping iterations performed is large enough:
N = ⌈8n^2 Rx /(Mx θ^{k−1} )⌉ + 1,    (7)
where Rx = rx+ − rx− , θ = min{p(v, u) : (v, u) ∈ E}, and k is the number of
random nodes, and yet the range of mx is not reduced, then we will be able to
find a contra-ergodic decomposition.
In Section 4.2, we will first argue that the algorithm terminates in finite time
if ε = 0 and the considered BWR-game is ergodic. In the following section,
this will be turned into a quantitative argument with the precise bound on the
running time. Yet, in Section 6, we will show that this time can be exponential
already for R-games.

4.2 Proof of Finiteness for the Ergodic Case
Let us assume without loss of generality that the range of m is [0, 1], and the initial
potential x0 = 0. Suppose that during N iterations no new vertex enters the
interval [1/4, 3/4]. Then, −x(v) ≥ N/4 for each v ∈ V (3/4, 1], since these vertices
“were pumped” N times, and x(v) ≡ 0 for each v ∈ V [0, 1/4), since these vertices
“were not pumped” at all. We will show that if N is sufficiently large then the
considered game is not ergodic.
Consider infinitely many iterations i = 0, 1, . . ., and denote by V B ⊆ V (re-
spectively, by V W ⊆ V ) the set of vertices that were pumped just finitely many
times (respectively, always but finitely many times); in other words, mx0 (v) ∈
[1/2, 1] if v ∈ V W (respectively, mx0 (v) ∈ [0, 1/2) if v ∈ V B ) for all but finitely
many i’s. It is not difficult to verify that the partition Πr : V = V W ∪ V B ∪ V R ,
where V R = V \ (V W ∪ V B ), is contra-ergodic. It is also clear that after suffi-
ciently many iterations mx0 (v) > 1/2 for all v ∈ V W , while mx0 (v) ≤ 1/2 for
all v ∈ V B . Thus, by Proposition 2, the considered game G is not ergodic, or in
other words, our algorithm is finite for the ergodic BWR-games. We shall give
an upper bound below for the number of times a vertex can oscillate around 1/2
before it finally settles itself down in [1/4, 1/2) or in (1/2, 3/4].
5 Running Time Analysis
Consider the execution of the algorithm on a given BWR-game. We define a
phase to be a set of iterations during which the range of mx , defined with respect
to the current potential x, is not reduced by a constant factor of what it was at
the beginning of the phase, i.e., none of the sets Vx [t0 , t1 ) or Vx (t3 , t4 ] becomes
empty (cf. line 12 of the procedure). Note that the number of iterations in each
phase is at most N defined by (7). Lemma 2 states that if N iterations are
performed in a phase, then the game is not ergodic. Lemma 4 bounds the total
number of phases and estimates the overall running time.

5.1 Finding a Contra-Ergodic Decomposition

We assume throughout this section that we are inside phase h of the algorithm,
which started with a potential xh , and that Mx > (3/4)Mxh in all N iterations of
the phase, and hence we proceed to step 19. For convenience, we will write (·)xh
as (·)h , where (·) could be m, r, r+ , etc. (e.g., m−h = m−xh , m+h = m+xh ). For
simplicity, we assume that the phase starts with local reward function r = rh ,
and hence(2) xh = 0. Given a potential vector x, we use the following notation:
EXTx = {(v, u) ∈ E : v ∈ VB ∪ VW and rx (v, u) = mx (v)}, Δx = min{x(v) : v ∈ V }.
(2) In particular, note that rx (v, u) and mx (v) are used, for simplicity of notation, to
actually mean rx+xh (v, u) and mx+xh (v), respectively.
Algorithm 1. PUMP(G, ε)
Input: A BWR-game G = (G = (V, E), P, r) and a desired accuracy ε
Output: a potential x : V → R s.t. |mx (v) − mx (u)| ≤ ε for all u, v ∈ V if the game
is ergodic, and a contra-ergodic decomposition otherwise
1: let x0 (v) := x(v) := 0 for all v ∈ V ; i := 1
2: let t0 , t1 , . . . , t4 , and N be as defined by (6) and (7)
3: while i ≤ N do
4: if Mx ≤ ε then
5: return x
6: end if
7: δ := max{δ′ | mδ′x (v) ≥ t1 for all v ∈ Vx0 [t2 , t4 ] and mδ′x (v) ≤ t3 for all v ∈
Vx0 [t0 , t2 )}
8: if δ = ∞ then
9: return the ergodic partition Vx0 [t0 , t2 ) ∪ Vx0 [t2 , t4 ]
10: end if
11: x(v) := x(v) − δ for all v ∈ Vx0 [t2 , t4 ]
12: if Vx0 [t0 , t1 ) = ∅ or Vx0 (t3 , t4 ] = ∅ then
13: x := x0 :=REDUCE-POTENTIALS(G, x); i := 1
14: recompute the thresholds t0 , t1 , . . . , t4 and N using (6) and (7)
15: else
16: i := i + 1;
17: end if
18: end while
19: V W ∪ V B ∪ V R :=FIND-PARTITION(G, x)
20: return contra-ergodic partition V W ∪ V B ∪ V R

Let tl < 0 be the largest value satisfying the following conditions:
(i) there are no arcs (v, u) ∈ E with v ∈ VW ∪ VR , x(v) ≥ tl and x(u) < tl ;
(ii) there are no arcs (v, u) ∈ EXTx with v ∈ VB , x(v) ≥ tl and x(u) < tl .
Let X = {v ∈ V : x(v) ≥ tl }. In words, X is the set of nodes with potential as
close to 0 as possible, such that no white or random node in X has an arc crossing
to V \ X, and no black node has an extremal arc crossing to V \ X. Similarly,
define tu > Δx to be the smallest value satisfying the following conditions:
(i) there are no arcs (v, u) ∈ E with v ∈ VB ∪ VR , x(v) ≤ tu and x(u) > tu ;
(ii) there are no arcs (v, u) ∈ EXTx with v ∈ VW , x(v) ≤ tu and x(u) > tu ,
and let Y = {v ∈ V : x(v) ≤ tu }. Note that the sets X and Y can be computed
in O(|V | log |V | + |E|) time.
Lemma 1. It holds that max{−tl , tu − Δx } ≤ nRh (1/θ)^{k−1} .
The correctness of the algorithm follows from the following lemma.
Lemma 2. Suppose that pumping is performed for Nh ≥ 2nTh + 1 iterations,
where Th = 4nRh /(Mh θ^{k−1} ), and neither the set Vh [t0 , t1 ) nor Vh (t3 , t4 ] becomes
empty. Let V B = X and V W = Y be the sets constructed as above, and let
V R = V \ (X ∪ Y ). Then V W ∪ V B ∪ V R is a contra-ergodic decomposition.
5.2 Potential Reduction

One problem that arises during the pumping procedure is that the potentials
can increase exponentially in the number of phases, making our bounds on the
number of iterations per phase also exponential in n. For the BW-case Pisaruk
[33] solved this problem by giving a procedure that reduces the range of the
potentials after each round, while keeping all the desired properties needed for the
running time analysis. Pisaruk’s potential reduction procedure can be thought
of as a combinatorial procedure for finding an extreme point of a polyhedron,
given a point in it. Indeed, given a BWR-game and a potential x, let us assume
without loss of generality, by shifting the potentials if necessary, that x ≥ 0, and
let E′ = {(v, u) ∈ E : r_x(v, u) ∈ [m_x^-, m_x^+], v ∈ V_B ∪ V_W}, where r is the
original local reward function. Then the following polyhedron is non-empty:


Γ_x = { x′ ∈ R^V :
    m_x^- ≤ r(v, u) + x′(v) − x′(u) ≤ m_x^+,                       ∀(v, u) ∈ E′,
    r(v, u) + x′(v) − x′(u) ≤ m_x^+,                                ∀v ∈ V_W, (v, u) ∈ E \ E′,
    m_x^- ≤ r(v, u) + x′(v) − x′(u),                                ∀v ∈ V_B, (v, u) ∈ E \ E′,
    m_x^- ≤ Σ_{u∈V} p(v, u)(r(v, u) + x′(v) − x′(u)) ≤ m_x^+,      ∀v ∈ V_R,
    x′(v) ≥ 0,                                                      ∀v ∈ V }.

Moreover, Γx is pointed, and hence, it must have an extreme point.


Lemma 3. Consider a BWR-game in which all rewards are integral with range
R = r+ − r− , and probabilities p(v, u) are rational with common denominator at
most W, and let k = |V_R|. Then any extreme point x* of Γ_x satisfies ‖x*‖_∞ ≤
nRk(2W)^k.
Note that any point x′ ∈ Γ_x satisfies [m_{x′}] ⊆ [m_x], and hence, replacing x by x*
does not increase the range of m_x.

5.3 Proof of Theorem 1

Consider a BWR-game G = (G = (V, E), P, r) with |V | = n vertices and k


random nodes. Assume r to be integral in the range [−R, R] and all transition
probabilities are rational with common denominator W . From Lemmas 2 and 3,
we can conclude the following bound.
Lemma 4. Procedure PUMP(G, ε) terminates in O(nk(2W)^k (1/ε + n²|E|) R log(R/ε)) time.
Theorem 1 follows by setting ε sufficiently small:
Corollary 1. When procedure PUMP(G, ε) is run with ε as in (2), it either
outputs a potential vector x such that m_x(v) is constant for all v ∈ V, or finds a
contra-ergodic partition. The total running time is n^{O(k)} W^{O(k²)} R log(nRW).

[Figure: the weighted Markov chain on the nodes u_l, …, u_1, u_0 = v_0, v_1, …, v_l, with the transition probabilities specified in the text below.]

Fig. 1. An exponential example

6 Lower Bound Example

We show now that the execution time of the algorithm, in the worst case, can
be exponential in the number of random nodes k, already for weighted Markov
chains, that is, for R-games. Consider the following example. Let G = (V, E)
be a digraph on k = 2l + 1 vertices ul , . . . , u1 , u0 = v0 , v1 , . . . , vl , and with the
following set of arcs:

E = {(ul , ul ), (vl , vl )} ∪ {(ui−1 , ui ), (ui , ui−1 ), (vi−1 , vi ), (vi , vi−1 ) : i = 1, . . . , l}.

Let W ≥ 1 be an integer. All nodes are random with the following transition
probabilities: p(u_l, u_l) = p(v_l, v_l) = 1 − 1/(W+1), p(u_0, u_1) = p(u_0, v_1) = 1/2,
p(u_{i−1}, u_i) = p(v_{i−1}, v_i) = 1 − 1/(W+1) for i = 2, …, l, and p(u_i, u_{i−1}) =
p(v_i, v_{i−1}) = 1/(W+1) for i = 1, …, l. The local rewards are zero on every arc,
except for r(u_l, u_l) = −r(v_l, v_l) = 1. Clearly this Markov chain consists of a
single recurrent class, and it is easy to verify that the limiting distribution p* is
as follows:
p*(u_0) = (W − 1)/((W + 1)W^l − 2),   p*(u_i) = p*(v_i) = W^{i−1}(W² − 1)/(2((W + 1)W^l − 2)) for i = 1, …, l.

The optimal expected reward at each vertex is


μ(u_i) = μ(v_i) = 1 · (1 − 1/(W+1)) · p*(u_l) − 1 · (1 − 1/(W+1)) · p*(v_l) = 0,
for i = 0, …, l. Up to a shift, there is a unique set of potentials that transform the
Markov chain into canonical form, and they satisfy a linear system of equations
in Δ_i = x(u_i) − x(u_{i−1}) and Δ′_i = x(v_i) − x(v_{i−1}); solving this system we get
Δ_i = −Δ′_i = W^{k−i+1} for i = 1, …, l. Any pumping algorithm that starts with 0
potentials and modifies the potentials in each iteration by at most γ needs at least
W^{l−1}/(2γ) iterations on the above example. In particular, the algorithm in
Section 4 has γ ≤ 1/min{p(v, u) : (v, u) ∈ E, p(v, u) ≠ 0} = W + 1 in our example.
We conclude that the running time of the algorithm is Ω(W^{l−2}) = W^{Ω(k)} on
this example.
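As a sanity check on the formulas above, the following short numerical experiment (a sketch assuming numpy; the node-indexing convention is our own) builds the chain for a small l and W and verifies that the stated p* is indeed the stationary distribution:

import numpy as np

l, W = 3, 4
k = 2 * l + 1
idx = {i: i + l for i in range(-l, l + 1)}   # negative = u_i, positive = v_i
P = np.zeros((k, k))
P[idx[-l], idx[-l]] = P[idx[l], idx[l]] = 1 - 1 / (W + 1)   # self-loops at u_l, v_l
P[idx[0], idx[1]] = P[idx[0], idx[-1]] = 1 / 2
for i in range(1, l):        # forward transitions away from u_0 = v_0
    P[idx[i], idx[i + 1]] = P[idx[-i], idx[-(i + 1)]] = 1 - 1 / (W + 1)
for i in range(1, l + 1):    # backward transitions towards u_0 = v_0
    P[idx[i], idx[i - 1]] = P[idx[-i], idx[-(i - 1)]] = 1 / (W + 1)
assert np.allclose(P.sum(axis=1), 1)

# stationary distribution: solve p^T P = p^T together with sum(p) = 1
A = np.vstack([P.T - np.eye(k), np.ones(k)])
rhs = np.zeros(k + 1); rhs[-1] = 1
p = np.linalg.lstsq(A, rhs, rcond=None)[0]

denom = (W + 1) * W**l - 2
assert np.isclose(p[idx[0]], (W - 1) / denom)
for i in range(1, l + 1):
    assert np.isclose(p[idx[i]], W**(i - 1) * (W**2 - 1) / (2 * denom))
    assert np.isclose(p[idx[-i]], p[idx[i]])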

References
1. Andersson, D., Miltersen, P.B.: The complexity of solving stochastic games on
graphs. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878,
pp. 112–121. Springer, Heidelberg (2009)
2. Beffara, E., Vorobyov, S.: Adapting Gurvich-Karzanov-Khachiyan’s algorithm for
parity games: Implementation and experimentation. Technical Report 2001-020,
Department of Information Technology, Uppsala University (2001),
https://www.it.uu.se/research/reports/#2001
3. Beffara, E., Vorobyov, S.: Is randomized Gurvich-Karzanov-Khachiyan’s algorithm
for parity games polynomial? Technical Report 2001-025, Department of Informa-
tion Technology, Uppsala University (2001),
https://www.it.uu.se/research/reports/#2001
4. Björklund, H., Sandberg, S., Vorobyov, S.: A combinatorial strongly sub-
exponential strategy improvement algorithm for mean payoff games. DIMACS
Technical Report 2004-05, DIMACS, Rutgers University (2004)
5. Björklund, H., Vorobyov, S.: Combinatorial structure and randomized subexponen-
tial algorithms for infinite games. Theoretical Computer Science 349(3), 347–360
(2005)
6. Björklund, H., Vorobyov, S.: A combinatorial strongly sub-exponential strategy im-
provement algorithm for mean payoff games. Discrete Applied Mathematics 155(2),
210–229 (2007)
7. Boros, E., Elbassioni, K., Gurvich, V., Makino, K.: Every stochastic game with
perfect information admits a canonical form. RRR-09-2009, RUTCOR. Rutgers
University (2009)
8. Boros, E., Elbassioni, K., Gurvich, V., Makino, K.: A pumping algorithm for er-
godic stochastic mean payoff games with perfect information. RRR-19-2009, RUT-
COR. Rutgers University (2009)
9. Boros, E., Gurvich, V.: Why chess and backgammon can be solved in pure posi-
tional uniformly optimal strategies? RRR-21-2009, RUTCOR. Rutgers University
(2009)
10. Chatterjee, K., Henzinger, T.A.: Reduction of stochastic parity to stochastic mean-
payoff games. Inf. Process. Lett. 106(1), 1–7 (2008)
11. Chatterjee, K., Jurdziński, M., Henzinger, T.A.: Quantitative stochastic parity
games. In: SODA ’04: Proceedings of the fifteenth annual ACM-SIAM symposium
on Discrete algorithms, pp. 121–130. Society for Industrial and Applied Mathe-
matics, Philadelphia (2004)
12. Condon, A.: The complexity of stochastic games. Information and Computation 96,
203–224 (1992)
13. Condon, A.: An algorithm for simple stochastic games. In: Advances in computa-
tional complexity theory. DIMACS series in discrete mathematics and theoretical
computer science, vol. 13 (1993)
14. Dhingra, V., Gaubert, S.: How to solve large scale deterministic games with mean
payoff by policy iteration. In: Valuetools ’06: Proceedings of the 1st international
conference on Performance evaluation methodolgies and tools, vol. 12. ACM, New
York (2006)
15. Ehrenfeucht, A., Mycielski, J.: Positional strategies for mean payoff games. Inter-
national Journal of Game Theory 8, 109–113 (1979)
16. Friedmann, O.: An exponential lower bound for the parity game strategy improve-
ment algorithm as we know it. In: Symposium on Logic in Computer Science, pp.
145–156 (2009)

17. Gillette, D.: Stochastic games with zero stop probabilities. In: Dresher, M., Tucker,
A.W., Wolfe, P. (eds.) Contribution to the Theory of Games III. Annals of Mathe-
matics Studies, vol. 39, pp. 179–187. Princeton University Press, Princeton (1957)
18. Gimbert, H., Horn, F.: Simple stochastic games with few random vertices are
easy to solve. In: Amadio, R.M. (ed.) FOSSACS 2008. LNCS, vol. 4962, pp. 5–
19. Springer, Heidelberg (2008)
19. Gurvich, V., Karzanov, A., Khachiyan, L.: Cyclic games and an algorithm to find
minimax cycle means in directed graphs. USSR Computational Mathematics and
Mathematical Physics 28, 85–91 (1988)
20. Halman, N.: Simple stochastic games, parity games, mean payoff games and dis-
counted payoff games are all LP-type problems. Algorithmica 49(1), 37–50 (2007)
21. Hoffman, A.J., Karp, R.M.: On nonterminating stochastic games. Management
Science, Series A 12(5), 359–370 (1966)
22. Jurdziński, M.: Deciding the winner in parity games is in UP ∩ co-UP. Inf. Process.
Lett. 68(3), 119–124 (1998)
23. Jurdziński, M.: Games for Verification: Algorithmic Issues. PhD thesis, Faculty of
Science, University of Aarhus, Denmark (2000)
24. Jurdziński, M., Paterson, M., Zwick, U.: A deterministic subexponential algorithm
for solving parity games. In: SODA ’06: Proceedings of the seventeenth annual
ACM-SIAM symposium on Discrete algorithms, pp. 117–123. ACM, New York
(2006)
25. Karp, R.M.: A characterization of the minimum cycle mean in a digraph. Discrete
Math. 23, 309–311 (1978)
26. Karzanov, A.V., Lebedev, V.N.: Cyclical games with prohibition. Mathematical
Programming 60, 277–293 (1993)
27. Kratsch, D., McConnell, R.M., Mehlhorn, K., Spinrad, J.P.: Certifying algorithms
for recognizing interval graphs and permutation graphs. In: SODA ’03: Proceedings
of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 158–
167. Society for Industrial and Applied Mathematics, Philadelphia (2003)
28. Liggett, T.M., Lippman, S.A.: Stochastic games with perfect information and time-
average payoff. SIAM Review 11(4), 604–607 (1969)
29. Littman, M.L.: Algorithm for sequential decision making, CS-96-09. PhD thesis,
Dept. of Computer Science, Brown Univ., USA (1996)
30. Mine, H., Osaki, S.: Markovian decision process. American Elsevier Publishing Co.,
New York (1970)
31. Moulin, H.: Extension of two person zero sum games. Journal of Mathematical
Analysis and Applications 5(2), 490–507 (1976)
32. Moulin, H.: Prolongement des jeux à deux joueurs de somme nulle. Bull. Soc. Math.
France, Memoire 45 (1976)
33. Pisaruk, N.N.: Mean cost cyclical games. Mathematics of Operations Re-
search 24(4), 817–828 (1999)
34. Vöge, J., Jurdzinski, M.: A discrete strategy improvement algorithm for solving
parity games. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS, vol. 1855,
pp. 202–215. Springer, Heidelberg (2000)
35. Vorobyov, S.: Cyclic games and linear programming. Discrete Applied Mathemat-
ics 156(11), 2195–2231 (2008)
36. Zwick, U., Paterson, M.: The complexity of mean payoff games on graphs. Theo-
retical Computer Science 158(1-2), 343–359 (1996)
On Column-Restricted and Priority Covering
Integer Programs

Deeparnab Chakrabarty, Elyot Grant, and Jochen Könemann

Department of Combinatorics and Optimization


University of Waterloo, Waterloo, ON, Canada N2L 3G1
[email protected], [email protected], [email protected]

Abstract. In a column-restricted covering integer program (CCIP), all


the non-zero entries of any column of the constraint matrix are equal.
Such programs capture capacitated versions of covering problems. In
this paper, we study the approximability of CCIPs, in particular, their
relation to the integrality gaps of the underlying 0,1-CIP.
If the underlying 0,1-CIP has an integrality gap O(γ), and assuming
that the integrality gap of the priority version of the 0,1-CIP is O(ω), we
give a factor O(γ + ω) approximation algorithm for the CCIP. Priority
versions of 0,1-CIPs (PCIPs) naturally capture quality of service type
constraints in a covering problem.
We investigate priority versions of the line (PLC) and the (rooted) tree
cover (PTC) problems. Apart from being natural objects to study, these
problems fall in a class of fundamental geometric covering problems. We
bound the integrality gap of certain classes of these PCIPs by a constant. Algo-
rithmically, we give a polytime exact algorithm for PLC, show that the
PTC problem is APX-hard, and give a factor 2-approximation algorithm
for it.

1 Introduction
In a 0,1-covering integer program (0,1-CIP, in short), we are given a constraint
matrix A ∈ {0, 1}^{m×n}, demands b ∈ Z_+^m, non-negative costs c ∈ Z_+^n, and upper
bounds d ∈ Z_+^n, and the goal is to solve the following integer linear program
(which we denote by Cov(A, b, c, d)).

min{c^T x : Ax ≥ b, 0 ≤ x ≤ d, x integer}.
Problems that can be expressed as 0,1-CIPs are essentially equivalent to set
multi-cover problems, where sets correspond to columns and elements correspond
to rows. This directly implies that 0,1-CIPs are rather well understood in terms
of approximability: the class admits efficient O(log n) approximation algorithms
and this is best possible unless NP = P. Nevertheless, in many cases one can
get better approximations by exploiting the structure of matrix A. For example,
it is well known that whenever A is totally unimodular (TU) (e.g., see [18]), the

Supported by NSERC grant no. 288340 and by an Early Research Award.


canonical LP relaxation of a 0,1-CIP is integral; hence, the existence of efficient


algorithms for solving linear programs immediately yields fast exact algorithms
for such 0,1-CIPs as well.
While a number of general techniques have been developed for obtaining im-
proved approximation algorithms for structured 0, 1-CIPs, not much is known
for structured non-0, 1 CIP instances. In this paper, we attempt to mitigate this
problem, by studying the class of column-restricted covering integer programs
(CCIPs), where all the non-zero entries of any column of the constraint matrix
are equal. Such CIPs arise naturally out of 0, 1-CIPs, and the main focus of this
paper is to understand how the structure of the underlying 0,1-CIP can be used
to derive improved approximation algorithms for CCIPs.

Column-Restricted Covering IPs (CCIPs): Given a 0,1-covering problem


Cov(A, b, c, d) and a supply vector s ∈ Zn+ , the corresponding CCIP is obtained
as follows. Let A[s] be the matrix obtained by replacing all the 1’s in the jth
column by sj ; that is, A[s]ij = Aij sj for all 1 ≤ i ≤ m, 1 ≤ j ≤ n. The column-
restricted covering problem is given by the following integer program.

min{c^T x : A[s]x ≥ b, 0 ≤ x ≤ d, x integer}. (Cov(A[s], b, c, d))

CCIPs naturally capture capacitated versions of 0,1-covering problems. To il-


lustrate this we use the following 0,1-covering problem called the tree covering
problem. The input is a tree T = (V, E) rooted at a vertex r ∈ V , a set of
segments S ⊆ {(u, v) : u is a child of v}, non-negative costs cj for all j ∈ S, and
demands be ∈ Z+ for all e ∈ E. An edge e is contained in a segment j = (u, v) if
e lies on the unique u, v-path in T . The goal is to find a minimum-cost subset C
of segments such that each edge e ∈ E is contained in at least be segments of C.
When T is just a line, we call the above problem the line cover (LC) problem.
In this example, the constraint matrix A has a row for each edge of the tree and
a column for each segment in S. It is not too hard to show that this matrix is
TU and thus these can be solved exactly in polynomial time.
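To make the TU observation concrete, here is a minimal sketch (assuming numpy and scipy; the toy instance data is invented) that builds the interval constraint matrix of a small LC instance and solves its canonical LP. Since each column has consecutive ones, the matrix is TU, so a basic optimal solution — such as one returned by a simplex-based solver — is integral for integral demands:

import numpy as np
from scipy.optimize import linprog

# Toy line cover instance: a path with 4 edges; each segment is
# (first_edge, last_edge, cost), and b_e is the demand of edge e.
segments = [(0, 1, 2.0), (1, 3, 3.0), (0, 3, 4.0), (2, 2, 1.0)]
b = np.array([1, 2, 1, 1])
m, n = len(b), len(segments)

A = np.zeros((m, n))
for j, (lo, hi, _) in enumerate(segments):
    A[lo:hi + 1, j] = 1        # consecutive ones in each column => TU
c = np.array([cost for (_, _, cost) in segments])

res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * n, method="highs-ds")
print(res.x, res.fun)          # the vertex optimum here is integral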
In the above tree cover problem, suppose each segment j ∈ S also has a
capacity supply sj associated with it, and call an edge e covered by a collection
of segments C iff the total supply of the segments containing e exceeds the
demand of e. The problem of finding the minimum cost subset of segments
covering every edge is precisely the column-restricted tree cover problem. The
column-restricted line cover problem encodes the minimum knapsack problem
and is thus NP-hard.
For general CIPs, the best known approximation algorithm, due to Kolliopou-
los and Young [15], has a performance guarantee of O(1 + log α), where α, called
the dilation of the instance, denotes the maximum number of non-zero entries
in any column of the constraint matrix. Nothing better is known for the special
case of CCIPs unless one aims for bicriteria results where solutions violate the
upper bound constraints x ≤ d (see Section 1.1 for more details).
In this paper, our main aim is to understand how the approximability of a
given CCIP instance is determined by the structure of the underlying 0, 1-CIP. In

particular, if a 0, 1-CIP has a constant integrality gap, under what circumstances


can one get a constant factor approximation for the corresponding CCIP? We
make some steps toward finding an answer to this question.
In our main result, we show that there is a constant factor approximation
algorithm for CCIP if two induced 0, 1-CIPs have constant integrality gap.
The first is the underlying original 0,1-CIP. The second is a priority version
of the 0,1-CIP (PCIP, in short), whose constraint matrix is derived from that of
the 0,1-CIP as follows.

Priority versions of Covering IPs (PCIPs): Given a 0,1-covering problem


Cov(A, b, c, d), a priority supply vector s ∈ Z_+^n, and a priority demand vector
π ∈ Z_+^m, the corresponding PCIP is as follows. Define A[s, π] to be the following
0,1 matrix:

    A[s, π]_ij = 1 if A_ij = 1 and s_j ≥ π_i, and A[s, π]_ij = 0 otherwise.   (1)
Thus, a column j covers row i only if its priority supply is at least the
priority demand of row i. The priority covering problem is now as follows.

min{c^T x : A[s, π]x ≥ 1, 0 ≤ x ≤ d, x integer}. (Cov(A[s, π], 1, c))
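As a concrete illustration of the two derived matrices, the sketch below (assuming numpy; the helper names and the toy data are our own) constructs A[s] and A[s, π] from a small 0,1 matrix:

import numpy as np

def column_restrict(A, s):
    # A[s]: scale column j of the 0,1 matrix A by the supply s_j
    return np.asarray(A) * np.asarray(s)[None, :]

def priority_matrix(A, s, pi):
    # A[s, pi] as in (1): keep A_ij = 1 only where s_j >= pi_i
    keep = np.asarray(s)[None, :] >= np.asarray(pi)[:, None]
    return (np.asarray(A).astype(bool) & keep).astype(int)

A = np.array([[1, 1, 0],
              [0, 1, 1]])
s, pi = [5, 2, 3], [3, 4]
print(column_restrict(A, s))      # [[5 2 0], [0 2 3]]
print(priority_matrix(A, s, pi))  # [[1 0 0], [0 0 0]]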

We believe that priority covering problems are interesting in their own right,
and they arise quite naturally in covering applications where one wants to model
quality of service (QoS) or priority restrictions. For instance, in the tree cover
problem defined above, suppose each segment j has a quality of service (QoS)
or priority supply sj associated with it and suppose each edge e has a QoS or
priority demand πe associated with it. We say that a segment j covers e iff j
contains e and the priority supply of j exceeds the priority demand of e. The
goal is to find a minimum cost subset of segments that covers every edge. This
is the priority tree cover problem.
Besides being a natural covering problem to study, we show that the priority
tree cover problem is a special case of a classical geometric covering problem:
that of finding a minimum cost cover of points by axis-parallel rectangles in 3
dimensions. Finding a constant factor approximation algorithm for this problem,
even when the rectangles have uniform cost, is a long-standing open problem.
We show that although the tree cover is polynomial time solvable, the priority
tree cover problem is APX-hard. We complement this with a factor 2 approx-
imation for the problem. Furthermore, we present constant upper bounds for
the integrality gap of this PCIP in a number of special cases, implying constant
upper bounds on the corresponding CCIPs in these special cases. We refer the
reader to Section 1.2 for a formal statement of our results, which we give after
summarizing works related to our paper.

1.1 Related Work


There is a rich and long line of work ([9,11,17,19,20]) on approximation algorithms
for CIPs, of which we state the most relevant to our work. Assuming no upper

bounds on the variables, Srinivasan [19] gave an O(1 + log α)-approximation to the
problem (where α is the dilation as before). Later on, Kolliopoulos and Young
[15] obtained the same approximation factor, respecting the upper bounds. How-
ever, these algorithms didn’t give any better results when special structure of the
constraint matrix was known. On the hardness side, Trevisan [21] showed that it
is NP-hard to obtain a (log α − O(log log α))-approximation algorithm even for
0,1-CIPs.
The most relevant work to this paper is that of Kolliopoulos [12]. The author
studies CCIPs which satisfy a rather strong assumption, called the no bottleneck
assumption, that the supply of any column is smaller than the demand of any
row. Kolliopoulos [12] shows that if one is allowed to violate the upper bounds
by a multiplicative constant, then the integrality gap of the CCIP is within a
constant factor of that of the original 0,1-CIP.¹ As the author notes, such a
violation is necessary; otherwise the CCIP has unbounded integrality gap. If one
is not allowed to violate upper bounds, nothing better than the result of [15]
is known for the special case of CCIPs.
Our work on CCIPs parallels a large body of work on column-restricted
packing integer programs (CPIPs). Assuming the no-bottleneck assumption, Kol-
liopoulos and Stein [14] show that CPIPs can be approximated asymptotically
as well as the corresponding 0,1-PIPs. Chekuri et al. [7] subsequently improve
the constants in the result from [14]. These results imply constant factor approx-
imations for the column-restricted tree packing problem under the no-bottleneck
assumption. Without the no-bottleneck assumption, however, only polylogarith-
mic approximation is known for the problem [6].
The only work on priority versions of covering problems that we are aware
of is due to Charikar, Naor and Schieber [5] who studied the priority Steiner
tree and forest problems in the context of QoS management in a network multi-
casting application. Charikar et al. present an O(log n)-approximation algorithm
for the problem, and Chuzhoy et al. [8] later show that no efficient o(log log n)
approximation algorithm can exist unless NP ⊆ DTIME(nlog log log n ) (n is the
number of vertices).
To the best of our knowledge, the column-restricted or priority versions of the
line and tree cover problem have not been studied. The best known approxima-
tion algorithm known for both is the O(log n) factor implied by the results of [15]
stated above. However, upon completion of our work, Nitish Korula [16] pointed
out to us that a 4-approximation for column-restricted line cover is implicit in
a result of Bar-Noy et al. [2]. We remark that their algorithm is not LP-based,
although our general result on CCIPs is.

1.2 Technical Contributions and Formal Statement of Results

Given a 0,1-CIP Cov(A, b, c, d), we obtain its canonical LP relaxation by remov-


ing the integrality constraint. The integrality gap of the CIP is defined as the
¹ Such a result is implicit in the paper; the author only states an O(log α) integrality gap.

supremum of the ratio of optimal IP value to optimal LP value, taken over all
non-negative integral vectors b, c, and d. The integrality gap of an IP captures
how much the integrality constraint affects the optimum, and is an indicator of
the strength of a linear programming formulation.

CCIPs: Suppose the CCIP is Cov(A[s], b, c, d). We make the following two as-
sumptions about the integrality gaps of the 0,1 covering programs, both the
original 0,1-CIP and the priority version of the 0,1-CIP.
Assumption 1. The integrality gap of the original 0,1-CIP is γ ≥ 1. Specifi-
cally, for any non-negative integral vectors b, c, and d, if the canonical LP re-
laxation to the CIP has a fractional solution x, then one can find in polynomial
time an integral feasible solution to the CIP of cost at most γ · cT x. We stress
here that the entries of b, c, d could be 0 as well as ∞.
Assumption 2. The integrality gap of the PCIP is ω ≥ 1. Specifically, for any
non-negative integral vectors s, π, c, if the canonical LP relaxation to the PCIP
has a fractional solution x, then one can find in polynomial time, an integral
feasible solution to the PCIP of cost at most ω · cT x.
We give an LP-based approximation algorithm for solving CCIPs. Since the
canonical LP relaxation of a CCIP can have unbounded integrality gap, we
strengthen it by adding a set of valid constraints called the knapsack cover
constraints. We show that the integrality gap of this strengthened LP is O(γ +ω),
and can be used to give a polynomial time approximation algorithm.
Theorem 1. Under Assumptions 1 and 2, there is a (24γ + 8ω)-approximation
algorithm for column-restricted CIPs.
Knapsack cover constraints to strengthen LP relaxations were introduced in
[1,10,22]; Carr et al. [3] were the first to employ them in the design of approximation
algorithms. The paper of Kolliopoulos and Young [15] also uses these to get their
result on general CIPs.
The main technique in the design of algorithms for column-restricted problems
is grouping-and-scaling developed by Kolliopoulos and Stein [13,14] for packing
problems, and later used by Kolliopoulos [12] in the covering context. In this
technique, the columns of the matrix are divided into groups of ‘close-by’ supply
values; in a single group, the supply values are then scaled to be the same; for
a single group, the integrality gap of the original 0,1-CIP is invoked to get an
integral solution for that group; the final solution is a ‘union’ of the solutions
over all groups.
There are two issues in applying the technique to the new strengthened LP
relaxation of our problem. Firstly, although the original constraint matrix is
column-restricted, the new constraint matrix with the knapsack cover constraints
is not. Secondly, unless additional assumptions are made, the current grouping-
and-scaling analysis doesn’t give a handle on the degree of violation of the upper
bound constraints. This is the reason why Kolliopoulos [12] needs the strong
no-bottleneck assumption.

We get around the first difficulty by grouping the rows as well, into those that
get most of their coverage from columns not affected by the knapsack constraints,
and the remainder. On the first group of rows, we apply a subtle modification to
the vanilla grouping-and-scaling analysis and obtain an O(γ)-approximate feasible
solution satisfying these rows; we then show that one can treat the remainder
of the rows as a PCIP and get an O(ω)-approximate feasible solution satisfying
them, using Assumption 2. Combining the two gives the O(γ + ω) factor. The
full details are given in Section 2.
We stress here that apart from the integrality gap assumptions on the 0,1-
CIPs, we do not make any other assumption (like the no-bottleneck assumption).
In fact, we can use the modified analysis of the grouping-and-scaling technique to
get a similar result as [12] for approximating CCIPs violating the upper-bound
constraints, under a weaker assumption than the no-bottleneck assumption. The
no-bottleneck assumption states that the supply of any column is less than the
demand of any row. In particular, even though a column has entry 0 on a certain
row, its supply needs to be less than the demand of that row. We show that if we
weaken the no-bottleneck assumption to assuming that the supply of a column
j is less than the demand of any row i only if A[s]ij is positive, a similar result
can be obtained via our modified analysis.

Theorem 2. Under Assumption 1 and assuming A_ij s_j ≤ b_i for all i, j, given a


fractional solution x to the canonical LP relaxation of Cov(A[s], b, c, d), one can
find an integral solution xint whose cost c · xint ≤ 10γ(c · x) and xint ≤ 10d.

Priority Covering Problems. In the following, we use PLC and PTC to refer
to the priority versions of the line cover and tree cover problems, respectively.
Recall that the constraint matrices for line and tree cover problems are totally
unimodular, and the integrality gap of the corresponding 0,1-covering problems is
therefore 1 in both cases. It is interesting to note that the 0,1-coefficient matrices
for PLC and PTC are not totally unimodular in general. The following integrality
gap bound is obtained via a primal-dual algorithm.

Theorem 3. The canonical LP for priority line cover has an integrality gap of
at least 3/2 and at most 2.

In the case of tree cover, we obtain constant upper bounds on the integrality gap
for the case c = 1, that is, for the minimum cardinality version of the problem.
We believe that the PCIP for the tree cover problem with general costs also has
a constant integrality gap. On the negative side, we can show an integrality gap
of at least e/(e − 1).

Theorem 4. The canonical LP for unweighted PTC has an integrality gap of


at most 6.

We obtain the upper bound by taking a given PTC instance and a fractional so-
lution to its canonical LP, and decomposing it into a collection of PLC instances

with corresponding fractional solutions, with the following two properties. First,
the total cost of the fractional solutions of the PLC instances is within a constant
of the cost of the fractional solution of the PTC instance. Second, union of
integral solutions to the PLC instances gives an integral solution to the PTC
instance. The upper bound follows from Theorem 3. Using Theorem 1, we get
the following as an immediate corollary.

Corollary 1. There are O(1)-approximation algorithms for column-restricted


line cover and the cardinality version of the column-restricted tree cover.

We also obtain the following combinatorial results.

Theorem 5. There is a polynomial-time exact algorithm for PLC.

Theorem 6. PTC is APX-hard, even when all the costs are unit.

Theorem 7. There is an efficient 2-approximation algorithm for PTC.

The algorithm for PLC is a non-trivial dynamic programming approach that


makes use of various structural observations about the optimal solution. The
approximation algorithm for PTC is obtained via a similar decomposition used
to prove Theorem 4.
We end by noting some interesting connections between the priority tree cover-
ing problem and set covering problems in computational geometry. The rectangle
cover problem in 3-dimensions is the following: given a collection of points P in
R3 , and a collection C of axis-parallel rectangles with costs, find a minimum cost
collection of rectangles that covers every point. We believe studying the PTC
problem could give new insights into the rectangle cover problem.

Theorem 8. The priority tree covering problem is a special case of the rectangle
cover problem in 3-dimensions.

Due to space restrictions, we omit many proofs. A full version of the paper is
available [4].

2 General Framework for Column Restricted CIPs


In this section we prove Theorem 1. Our goal is to round a solution to an LP re-
laxation of Cov(A[s], b, c, d) into an approximate integral solution. We strengthen
the following canonical LP relaxation of the CCIP

min{c^T x : A[s]x ≥ b, 0 ≤ x ≤ d}

by adding valid knapsack cover constraints. In the following we use C for the set
of columns and R for the set of rows of A.

2.1 Strengthening the Canonical LP Relaxation


Let F ⊆ C be a subset of the columns of the column-restricted CIP Cov(A[s], b, c, d).
For all rows i ∈ R, define b^F_i = max{0, b_i − Σ_{j∈F} A[s]_ij d_j} to be the residual de-
mand of row i w.r.t. F. Define the matrix A^F[s] by letting

    A^F[s]_ij = min{A[s]_ij, b^F_i} if j ∈ C \ F, and A^F[s]_ij = 0 if j ∈ F,   (2)

for all i ∈ R and all j ∈ C. The following Knapsack-Cover (KC) inequality

    Σ_{j∈C} A^F[s]_ij x_j ≥ b^F_i

is valid for the set of all integer solutions x for Cov(A[s], b, c, d). Adding the set
of all KC inequalities yields the following stronger LP formulation of the CIP. We note
that the LP is not column-restricted, in that different values may appear in the same
column of the new constraint matrix.

opt_P := min Σ_{j∈C} c_j x_j                                       (P)
  s.t.   Σ_{j∈C} A^F[s]_ij x_j ≥ b^F_i     ∀F ⊆ C, ∀i ∈ R          (3)
         0 ≤ x_j ≤ d_j                      ∀j ∈ C
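A direct transcription of (2) and (3) may help: the sketch below (our own helper, assuming numpy) computes, for a given row i and column subset F, the residual demand b^F_i and the coefficient row of the corresponding KC inequality:

import numpy as np

def kc_row(As, b, d, F, i):
    # returns (A^F[s]_{i,.}, b^F_i) for row i and column subset F
    As = np.asarray(As, dtype=float)
    F = list(F)
    bF = max(0.0, float(b[i]) - As[i, F] @ np.asarray(d, dtype=float)[F])
    row = np.minimum(As[i], bF)   # truncate entries at the residual demand
    row[F] = 0.0                  # columns in F contribute nothing
    return row, bF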
It is not known whether (P) can be solved in polynomial time. For α ∈ (0, 1),
call a vector x∗ α-relaxed if its cost is at most optP , and if it satisfies (3) for
F = {j ∈ C : x∗j ≥ αdj }. An α-relaxed solution to (P) can be computed
efficiently for any α. To see this note that one can check whether a candidate
solution satisfies (3) for a set F ; we are done if it does, and otherwise we have
found an inequality of (P) that is violated, and we can make progress via the
ellipsoid method. Details can be found in [3] and [15].
We fix an α ∈ (0, 1), specifying its precise value later. Compute an α-relaxed
solution, x*, for (P), and let F = {j ∈ C : x*_j ≥ αd_j}. Define x̄ as x̄_j = x*_j if
j ∈ C \ F, and x̄_j = 0 otherwise. Since x* is an α-relaxed solution, we get that x̄
is a feasible fractional solution to the residual CIP, Cov(AF [s], bF , c, αd). In the
next subsection, our goal will be to obtain an integral feasible solution to the
covering problem Cov(AF [s], bF , c, d) using x̄. The next lemma shows how this
implies an approximation to our original CIP.
Lemma 1. If there exists an integral feasible solution, xint , to Cov(AF [s], bF , c, d)
with cT xint ≤ β · cT x̄, then there exists a max{1/α, β}-factor approximation to
Cov(A[s], b, c, d).

2.2 Solving the Residual Problem


In this section we use a feasible fractional solution x̄ of Cov(AF [s], bF , c, αd), to ob-
tain an integral feasible solution xint to the covering problem Cov(AF [s], bF , c, d),
with cT xint ≤ βcT x̄ for β = 24γ + 8ω. Fix α = 1/24.

Converting to Powers of 2. For ease of exposition, we first modify the input
to the residual problem Cov(A^F[s], b^F, c, d) so that all supplies and residual
demands are powers of 2. For every i ∈ R, let b̄_i denote the smallest power of 2
larger than b^F_i. For every column j ∈ C, let s̄_j denote the largest power of 2
smaller than s_j.
Lemma 2. y = 4x̄ is feasible for Cov(AF [s̄], b̄, c, 4αd).
Partitioning the rows. We call b̄_i the residual demand of row i. For a row
i, a column j ∈ C is i-large if the supply of j is at least the residual demand of
row i; it is i-small otherwise. Formally,

    L_i = {j ∈ C : A_ij = 1, s̄_j ≥ b̄_i} is the set of i-large columns,
    S_i = {j ∈ C : A_ij = 1, s̄_j < b̄_i} is the set of i-small columns.

Recall from (2) that A^F[s̄]_ij = min{A[s̄]_ij, b^F_i}. Therefore, A^F[s̄]_ij = A_ij b^F_i
for all j ∈ L_i, since s̄_j ≥ b̄_i ≥ b^F_i; and A^F[s̄]_ij = A_ij s̄_j for all j ∈ S_i since,
both being powers of 2, s̄_j < b̄_i implies s̄_j ≤ b̄_i/2 ≤ b^F_i.
We now partition the rows into large and small depending on which columns
most of their coverage comes from. Formally, call a row i ∈ R large if

    Σ_{j∈S_i} A^F[s̄]_ij y_j ≤ Σ_{j∈L_i} A^F[s̄]_ij y_j,

and small otherwise. Note that Lemma 2, together with the fact that each column
in row i's support is either small or large, implies

    Σ_{j∈L_i} A^F[s̄]_ij y_j ≥ b̄_i/2 for all large rows i, and
    Σ_{j∈S_i} A^F[s̄]_ij y_j ≥ b̄_i/2 for all small rows i.

Let R_L and R_S be the sets of large and small rows, respectively.


In the following, we address small and large rows separately. We compute a
pair of integral solutions x^{int,S} and x^{int,L} that are feasible for the small and
large rows, respectively. We then obtain x^{int} by letting

    x^{int}_j = max{x^{int,S}_j, x^{int,L}_j},   (4)

for all j ∈ C.

Small rows. For these rows we use the grouping-and-scaling technique à la
[7,12,13,14]. However, as mentioned in the introduction, we use a modified anal-
ysis that bypasses the no-bottleneck assumptions made by earlier works.

Lemma 3. We can find an integral solution x^{int,S} such that
a) x^{int,S}_j ≤ d_j for all j,
b) Σ_{j∈C} c_j x^{int,S}_j ≤ 24γ Σ_{j∈C} c_j x̄_j, and
c) for every small row i ∈ R_S, Σ_{j∈C} A^F[s]_ij x^{int,S}_j ≥ b^F_i.

Proof. (Sketch) Since the rows are small, for any row i, we can zero out the entries
that are larger than b̄i , and still 2y will be a feasible solution. Note that, now in
each row, the entries are < b̄i , and thus are at most b̄i /2 (everything being powers
of 2). We stress that it could be that b̄i of some row is less than the entry in some
other row, that is, we don’t have the no-bottleneck assumption. However, when a
particular row i is fixed, b̄i is at least any entry of the matrix in the ith row. Our
modified analysis of grouping and scaling then makes the proof go through.
We group the columns into classes that have s_j as the same power of 2, and
for each row i we let b̄_i^{(t)} be the contribution of the class-t columns towards the
demand of row i. The columns of class t, the small rows, and the demands b̄_i^{(t)}
form a CIP where all non-zero entries of the matrix are the same power of 2.
We scale both the constraint matrix and b̄_i^{(t)} down by that power of 2 to get a
0,1-CIP, and using Assumption 1, we get an integral solution to this 0,1-CIP. Our
final integral solution is obtained by concatenating all these integral solutions
over all classes.
Up to this point, the algorithm is the standard grouping-and-scaling algorithm. The
difference lies in our analysis in proving that this integral solution is feasible for
the original CCIP. Originally the no-bottleneck assumption was used to prove
this. However, we show that, since the column values in different classes are
geometrically decreasing, the weaker assumption of b̄i being at least any entry
in the ith row is enough to make the analysis go through.
This completes the sketch of the proof.
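The grouping step itself is mechanical; the following skeleton (Python; solve_01_cip is a hypothetical oracle standing in for the rounding guaranteed by Assumption 1, and the data layout is our own) shows the control flow of grouping-and-scaling — bucket the columns by their power-of-2 supply class, round the 0,1-CIP induced by each class separately, and return the union of the per-class solutions:

import math
from collections import defaultdict

def group_and_scale(columns, solve_01_cip):
    # columns: list of (index, supply, lp_value) with supplies already
    # rounded to powers of 2; solve_01_cip(t, cols) rounds the 0,1-CIP of
    # class t (all supplies equal 2**t, matrix and demand shares divided
    # by 2**t) and returns a partial integral solution {index: value}.
    classes = defaultdict(list)
    for j, s_j, x_j in columns:
        classes[int(round(math.log2(s_j)))].append((j, s_j, x_j))
    solution = {}
    for t in sorted(classes):
        solution.update(solve_01_cip(t, classes[t]))
    return solution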

Large rows. The large rows can be shown to form a PCIP, and thus
Assumption 2 can be invoked to obtain a lemma analogous to Lemma 3.

Lemma 4. We can find an integral solution x^{int,L} such that
a) x^{int,L}_j ≤ 1 for all j,
b) Σ_{j∈C} c_j x^{int,L}_j ≤ 8ω Σ_{j∈C} c_j x̄_j, and
c) for every large row i ∈ R_L, Σ_{j∈C} A^F[s]_ij x^{int,L}_j ≥ b^F_i.

Define x^{int} by x^{int}_j = max{x^{int,S}_j, x^{int,L}_j} for all j; using the previous two lemmas
and Lemma 1, this integral solution proves Theorem 1.

3 Priority Line Cover


In this extended abstract, we show that the integrality gap of the canonical linear
programming relaxation of PLC is at most 2. Subsequently, we sketch an exact
combinatorial algorithm for the problem.

3.1 Canonical LP Relaxation: Integrality Gap


We start with the canonical LP relaxation for PLC and its dual in Figure 1.
We use the terminology that an edge e is larger than f if π_e ≥ π_f. The algorithm
maintains a set of segments Q, initially empty. Call an edge e unsatisfied if no


min { Σ_{j∈S} c_j x_j : x ∈ R_+^S, Σ_{j∈S : j covers e} x_j ≥ 1 ∀e ∈ E }      (Primal)

max { Σ_{e∈E} y_e : y ∈ R_+^E, Σ_{e∈E : j covers e} y_e ≤ c_j ∀j ∈ S }       (Dual)

Fig. 1. The PLC canonical LP relaxation and its dual

segment in Q covers e, and let U be the set of unsatisfied edges. The algorithm
picks the largest edge in U and raises the dual value y_e till some segment
becomes tight. The tight segments with the farthest left endpoint and the farthest
right endpoint are picked into Q, and all edges contained in any of them are
removed from U. Note that since we chose the largest edge in U, all such edges are
covered. The algorithm repeats this process till U becomes ∅, that is, all edges
are covered. The final set of segments is obtained by a reverse delete step, where
a segment is deleted if its deletion does not leave any edge uncovered.
The algorithm is a factor 2 approximation algorithm. To show this, it suffices,
by a standard argument for analysing primal-dual algorithms, to show that any edge
with a positive dual y_e is contained in at most two segments of Q. These two
segments correspond to the left-most and the right-most segments that cover e;
it is not too hard to show that if a third segment covers e, then either e has zero dual,
or the third segment is removed in the reverse delete step.
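A minimal transcription of this primal-dual procedure follows (Python; the data layout and helper names are our own, and we assume the instance is feasible, so every edge is covered by at least one segment):

def primal_dual_plc(edges, segments, eps=1e-9):
    # edges[e] is the priority demand pi_e of the e-th edge of the line;
    # each segment is a dict with keys 'l', 'r' (first/last edge), 'pri', 'cost'
    def covers(j, e):
        s = segments[j]
        return s['l'] <= e <= s['r'] and s['pri'] >= edges[e]

    y = [0.0] * len(edges)                  # dual values
    slack = [s['cost'] for s in segments]   # remaining dual slack per segment
    Q, U = [], set(range(len(edges)))       # picked segments, unsatisfied edges
    while U:
        e = max(U, key=lambda f: edges[f])  # largest unsatisfied edge
        js = [j for j in range(len(segments)) if covers(j, e)]
        delta = min(slack[j] for j in js)   # raise y_e until a segment is tight
        y[e] += delta
        for j in js:
            slack[j] -= delta
        tight = [j for j in js if slack[j] <= eps]
        for j in (min(tight, key=lambda t: segments[t]['l']),
                  max(tight, key=lambda t: segments[t]['r'])):
            if j not in Q:
                Q.append(j)
        U = {f for f in U if not any(covers(j, f) for j in Q)}
    for j in reversed(Q):                   # reverse delete
        rest = [i for i in Q if i != j]
        if all(any(covers(i, f) for i in rest) for f in range(len(edges))):
            Q = rest
    return Q, y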

3.2 An Exact Algorithm for PLC


We sketch the exact algorithm for PLC. A segment j covers only a subset of the
edges it contains. We call a maximal contiguous interval of edges covered by j a
valley of j; the uncovered edges form mountains. Thus a segment can be thought
of as forming a series of valleys and mountains.
Given a solution S ⊆ S to the PLC (or even a PTC) instance, we say that
segment j ∈ S is needed for edge e if j is the unique segment in S that covers
e. We let ES,j be the set of edges that need segment j. We say a solution is
valley-minimal if it satisfies the following two properties: (a) If a segment j is
needed for edge e that lies in the valley v of j, then no higher supply segment of
S intersects this valley v, and (b) every segment j is needed for its first and last
edges. We show that an optimum solution can be assumed to be valley-minimal,
and thus it suffices to find the minimum cost valley-minimal solution.
The crucial observation follows from properties (a) and (b) above. The valley-
minimality of solution S implies that there is a unique segment j ∈ S that covers
the first edge of the line. At a very high level, we may now use j to decompose
the given instance into a set of smaller instances. For this we first observe that
each of the remaining segments in S \ {j} is either fully contained in the strict
interior of segment j, or it is disjoint from j, and lies to the right of it. The set
of all segments that are disjoint from j form a feasible solution for the smaller
PLC instance induced by the portion of the original line instance to the right

of j. On the other hand, we show how to reduce the problem of finding an


optimal solution for the part of the line contained in j to a single shortest-
path computation in an auxiliary digraph. Each of the arcs in this digraph
once again corresponds to a smaller sub-instance of the original PLC instance,
and its cost is that of its optimal solution. The algorithm follows by dynamic
programming.

4 Priority Tree Cover


In this extended abstract, we sketch a factor 2 approximation for the PTC
problem, and show how the PTC problem is a special case of the 3-dimensional
rectangle cover problem. For the APX-hardness and the integrality gap of the
unweighted PTC LP, we refer the reader to the full version.

4.1 An Approximation Algorithm for PTC


We use the exact algorithm for PLC to get the factor 2 algorithm for PTC.
The crucial idea is the following. Given an optimum solution S ∗ ⊆ S, we can
partition the edge-set E of T into disjoint sets E1 , . . . , Ep , and partition two
copies of S ∗ into S1 , . . . , Sp , such that Ei is a path in T for each i, and Si is a
priority line cover for the path Ei . Using this, we describe the 2-approximation
algorithm which proves Theorem 7.

Proof of Theorem 7: For any two vertices t (top) and b (bottom) of the tree T,
such that t is an ancestor of b, let P_tb be the unique path from b to t. Note that
P_tb, together with the restrictions of the segments in S to P_tb, defines an instance
of PLC. Therefore, for each pair t and b, we can compute the optimal solution
to the corresponding PLC instance using the exact algorithm; let the cost of
this solution be c_tb. Create an instance of the 0,1-tree cover problem with T and
segments S′ := {(t, b) : t is an ancestor of b} with costs c_tb. Solve the 0,1-tree
cover instance exactly (recall we are in the rooted version) and, for each segment
(t, b) returned, output the solution of the corresponding PLC instance of cost c_tb.
One now uses the decomposition above to obtain a solution to the 0,1-tree
cover instance (T, S′) of cost at most 2 times the cost of S*; this proves the
theorem. The segments of S′ picked are precisely the segments corresponding to
the paths E_i, i = 1, …, p, and each S_i is a solution to the corresponding PLC
instance. Since we find the optimum PLC solution, there is a solution to (T, S′)
with costs c_tb of total cost at most the total cost of the segments in S_1 ∪ · · · ∪ S_p.
But that cost is at most twice the cost of S* since each segment of S* is in at
most two S_i's.

4.2 Priority Tree Cover and Geometric Covering Problems


We sketch how the PTC problem can be encoded as a rectangle cover problem.
To do so, an auxiliary problem is defined, which we call 2-PLC.

2-Priority Line Cover (2-PLC). The input is similar to PLC, except that each
segment and each edge now has an ordered pair of priorities, and a segment covers
an edge it contains iff each of the priorities of the segment exceeds the corresponding
priority of the edge. The goal, as in PLC, is to find a minimum cost cover.
It is not too hard to show that 2-PLC is a special case of rectangle cover: the edges
correspond to points in 3 dimensions and the segments correspond to axis-parallel
rectangles in 3 dimensions; the dimensions are encoded by the linear coordinates on
the line and the two priority values. In general, p-PLC can be shown to be a special
case of (p + 1)-dimensional rectangle cover.
What is more involved is to show that PTC is a special case of 2-PLC. To do so,
we run two DFS orderings on the tree, in which the orders that the children of each
node are visited are completely opposite. The first ordering gives the order in which
the edges are placed on a line. The second gives one of the two priorities of each
edge; the other priority of an edge comes from its original priority in the PTC
instance. It can be shown that the segment priorities can be set so that the feasible
solutions of the two instances are precisely the same, proving Theorem 8.
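The two opposite DFS orderings are easy to generate; below is a small self-contained sketch (Python; representing the tree as an ordered child list is our own convention):

def double_dfs_orders(children, root):
    # children maps each node to the ordered list of its children; returns
    # two dicts mapping each tree edge (parent, child) to its visiting time,
    # with the children of every node visited in opposite orders
    def dfs(flip):
        order, time, stack = {}, 0, [(root, None)]
        while stack:
            v, parent = stack.pop()
            if parent is not None:
                order[(parent, v)] = time
                time += 1
            kids = children.get(v, [])
            if flip:
                kids = list(reversed(kids))
            for c in reversed(kids):   # push reversed so pops follow kids' order
                stack.append((c, v))
        return order
    return dfs(False), dfs(True)

# e.g. double_dfs_orders({'r': ['a', 'b'], 'a': ['c']}, 'r') visits the edges
# in orders (r,a),(a,c),(r,b) and (r,b),(r,a),(a,c) respectively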

5 Concluding Remarks
In this paper we studied column restricted covering integer programs. In particu-
lar, we studied the relationship between CCIPs and the underlying 0,1-CIPs. We
conjecture that the approximability of a CCIP should be asymptotically within
a constant factor of the integrality gap of the original 0,1-CIP. We couldn’t show
this; however, if the integrality gap of a PCIP is shown to be within a constant of
the integrality gap of the 0,1-CIP, then we will be done. At this point, we don’t
even know how to prove that PCIPs of special 0,1-CIPs, those whose constraint
matrices are totally unimodular, have constant integrality gap. Resolving the
case of PTC is an important step in this direction, and hopefully in resolving
our conjecture regarding CCIPs.

References
1. Balas, E.: Facets of the knapsack polytope. Math. Programming 8, 146–164 (1975)
2. Bar-Noy, A., Bar-Yehuda, R., Freund, A., Naor, J., Schieber, B.: A unified approach
to approximating resource allocation and scheduling. J. ACM 48(5), 1069–1090
(2001)
3. Carr, R.D., Fleischer, L.K., Leung, V.J., Phillips, C.A.: Strengthening integrality
gaps for capacitated network design and covering problems. In: Proceedings of
ACM-SIAM Symposium on Discrete Algorithms, pp. 106–115 (2000)
4. Chakrabarty, D., Grant, E., Könemann, J.: On column-restricted and priority cov-
ering integer programs. arXiv eprint (2010)
5. Charikar, M., Naor, J., Schieber, B.: Resource optimization in QoS multicast routing
of real-time multimedia. IEEE/ACM Trans. Netw. 12(2), 340–348 (2004)

6. Chekuri, C., Ene, A., Korula, N.: Unsplittable flow in paths and trees and column-
restricted packing integer programs. In: Proceedings of International Workshop on
Approximation Algorithms for Combinatorial Optimization Problems (2009) (to
appear)
7. Chekuri, C., Mydlarz, M., Shepherd, F.B.: Multicommodity demand flow in a tree
and packing integer programs. ACM Trans. Alg. 3(3) (2007)
8. Chuzhoy, J., Gupta, A., Naor, J., Sinha, A.: On the approximability of some net-
work design problems. ACM Trans. Alg. 4(2) (2008)
9. Dobson, G.: Worst-case analysis of greedy heuristics for integer programming with
non-negative data. Math. Oper. Res. 7(4), 515–531 (1982)
10. Hammer, P., Johnson, E., Peled, U.: Facets of regular 0-1 polytopes. Math. Pro-
gramming 8, 179–206 (1975)
11. Hochbaum, D.S.: Approximation algorithms for the set covering and vertex cover
problems. SIAM Journal on Computing 11(3), 555–556 (1982)
12. Kolliopoulos, S.G.: Approximating covering integer programs with multiplicity con-
straints. Discrete Appl. Math. 129(2-3), 461–473 (2003)
13. Kolliopoulos, S.G., Stein, C.: Approximation algorithms for single-source unsplit-
table flow. SIAM Journal on Computing 31(3), 919–946 (2001)
14. Kolliopoulos, S.G., Stein, C.: Approximating disjoint-path problems using packing
integer programs. Math. Programming 99(1), 63–87 (2004)
15. Kolliopoulos, S.G., Young, N.E.: Approximation algorithms for covering/packing
integer programs. J. Comput. System Sci. 71(4), 495–505 (2005)
16. Korula, N.: Private Communication (2009)
17. Rajagopalan, S., Vazirani, V.V.: Primal-dual RNC approximation algorithms for
(multi)set (multi)cover and covering integer programs. In: Proceedings of IEEE
Symposium on Foundations of Computer Science (1993)
18. Schrijver, A.: Combinatorial optimization. Springer, New York (2003)
19. Srinivasan, A.: Improved approximation guarantees for packing and covering inte-
ger programs. SIAM Journal on Computing 29(2), 648–670 (1999)
20. Srinivasan, A.: An extension of the Lovász local lemma, and its applications to
integer programming. SIAM Journal on Computing 36(3), 609–634 (2006)
21. Trevisan, L.: Non-approximability results for optimization problems on bounded
degree instances. In: Proceedings of ACM Symposium on Theory of Computing,
pp. 453–461 (2001)
22. Wolsey, L.: Facets for a linear inequality in 0-1 variables. Math. Programming 8,
168–175 (1975)
On k-Column Sparse Packing Programs

Nikhil Bansal1 , Nitish Korula2 ,


Viswanath Nagarajan1, and Aravind Srinivasan3
1
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598
{nikhil,viswanath}@us.ibm.com
2
Dept. of Computer Science, University of Illinois, Urbana IL 61801,
Partially supported by NSF grant CCF 07-28782 and a University of Illinois
Dissertation Completion Fellowship
[email protected]
3
Dept. of Computer Science and Institute for Advanced Computer Studies,
University of Maryland, College Park, MD 20742, Supported in part by NSF ITR
Award CNS-0426683 and NSF Award CNS-0626636
[email protected]

Abstract. We consider the class of packing integer programs (PIPs)
that are column sparse, where there is a specified upper bound k on
the number of constraints that each variable appears in. We give an
improved (ek + o(k))-approximation algorithm for k-column sparse PIPs.
Our algorithm is based on a linear programming relaxation, and involves
randomized rounding combined with alteration. We also show that the
integrality gap of our LP relaxation is at least 2k − 1; it is known that even
special cases of k-column sparse PIPs are Ω(k/log k)-hard to approximate.
We generalize our result to the case of maximizing monotone sub-
modular functions over k-column sparse packing constraints, and obtain
an (e²k/(e−1) + o(k))-approximation algorithm. In obtaining this result, we
prove a new property of submodular functions that generalizes the frac-
tionally subadditive property, which might be of independent interest.
1 Introduction
Packing integer programs (PIPs) are those of the form:
    max{w^T x : Sx ≤ c, x ∈ {0, 1}^n},  where w ∈ R_+^n, c ∈ R_+^m, and S ∈ R_+^{m×n}.

Above, n is the number of variables/columns, m is the number of rows/constraints,


S is the matrix of sizes, c is the capacity vector, and w is the weight vector. In
general, PIPs are very hard to approximate: a special case is the classic indepen-
dent set problem, which is NP-hard to approximate within a factor of n^{1−ε} [30],
whereas an n-approximation is trivial. Thus, various special cases of PIPs are of-
ten studied. Here, we consider k-column sparse PIPs (denoted k-CS-PIP), which
are PIPs where the number of non-zero entries in each column of matrix S is at
most k. This is a fairly general class and models several basic problems such as
k-set packing [19] and independent set in graphs with degree at most k.


Recently, in a somewhat surprising result, Pritchard [25] gave an algorithm


for k-CS-PIP where the approximation ratio only depends on k; this is useful
when k is small. This result is surprising because, in contrast, no such guarantee
is possible for k-row sparse PIPs. In particular, the independent set problem
on general graphs is a 2-row sparse PIP, but is n^{1−o(1)}-hard to approximate.
Pritchard's algorithm [25] had an approximation ratio of 2^k · k². Subsequently,
an improved O(k²)-approximation algorithm was obtained independently by
Chekuri et al. [14] and Chakrabarty-Pritchard [11].
Our Results: In this paper, we first consider the k-CS-PIP problem and obtain
an (ek + o(k))-approximation algorithm for it. Our algorithm is based on solv-
ing a strengthened version of the natural LP relaxation of k-CS-PIP, and then
performing randomized rounding followed by suitable alterations. In the ran-
domized rounding step, we pick each variable independently (according to its LP
value) and obtain a set of variables with good expected weight; however some
constraints may be violated. Then in the alteration step, we drop some variables
so as to satisfy all constraints, while still having good expected weight. A simi-
lar approach can be used with the natural relaxation for k-CS-PIP obtained by
simply dropping the integrality constraints on the variables; this gives a slightly
weaker 8k-approximation bound. However, the analysis of this weaker result is
much simpler and we thus present it first. To obtain the ek + o(k) bound, we
construct a stronger LP relaxation by adding additional valid constraints to
the natural relaxation for k-CS-PIP. The analysis of our rounding procedure is
based on exploiting these additional constraints and using the positive correla-
tion between various probabilistic events via the FKG inequality. Due to space
constraints, we omit some details; these and other omitted proofs can be found
in the full version of this paper [5].
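To make the round-and-alter template concrete, here is a minimal sketch (Python; the scaling factor and the greedy drop rule are generic placeholders, not the precise alteration step analyzed in this paper, whose analysis takes the scale proportional to k and a more careful drop rule to obtain the ek + o(k) bound):

import random

def round_and_alter(x, S, c, scale):
    # Sample each column independently with probability x_j / scale, then
    # drop columns from every violated constraint until all are satisfied.
    n, m = len(x), len(c)
    chosen = [j for j in range(n) if random.random() < x[j] / scale]
    load = [sum(S[i][j] for j in chosen) for i in range(m)]
    for i in range(m):
        for j in list(chosen):
            if load[i] <= c[i]:
                break
            if S[i][j] > 0:            # j contributes to the violated row i
                chosen.remove(j)
                for r in range(m):
                    load[r] -= S[r][j]
    return chosen                      # feasible for S x <= c by construction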
Our result is almost the best possible that one can hope for using the LP
based approach. We show that the integrality gap of the strengthened LP is
at least 2k − 1, so our analysis is tight up to a small constant factor e/2 ≈
1.36 for large values of k. Even without restricting to LP based approaches,
an O(k) approximation is nearly best possible since it is NP-Hard to obtain
an O(k/ log k)-approximation for the special case of k-set packing [18]. We also
obtain improved results for k-CS-PIP when capacities are large relative to the
sizes. In particular, we obtain a Θ(k^{1/B})-approximation algorithm for k-CS-
PIP, where B := min{c_i/s_ij : i ∈ [m], j ∈ [n], s_ij > 0} measures the relative slack
between the capacities c and sizes S. We also show that this result is tight up to constant
factors relative to its LP relaxation.
Our second main result is for the more general problem of maximizing a mono-
tone submodular function over packing constraints that are k-column sparse.
This problem is a common generalization of maximizing a submodular function
over (a) a k-dimensional knapsack [22], and (b) the intersection of k partition
matroids [24]. Here, we obtain an (e²k/(e−1) + o(k))-approximation algorithm for this
problem. Our algorithm uses the continuous greedy algorithm of Vondrák [29]
in conjunction with our randomized rounding plus alteration based approach.
However, it turns out that the analysis of the approximation guarantee is much

more intricate: In particular, we need a generalization of a result of Feige [16]


that shows that submodular functions are also fractionally subadditive. See Sec-
tion 3 for a statement of the new result, Theorem 5, and related context. This
generalization is based on an interesting connection between submodular func-
tions and the FKG inequality. We believe that this result and technique might
be of further use in the study of submodular optimization.

Related Previous Work: Various special cases of k-CS-PIP have been extensively
studied. An important special case is the k-set packing problem, where given a
collection of sets of cardinality at most k, the goal is to find the maximum weight
sub-collection of mutually disjoint sets. This is equivalent to k-CS-PIP where the
constraint matrix S is 0-1 and the capacity c is all ones. Note that for k = 2 this
is maximum weight matching which can be solved in polynomial time, and for
k = 3 the problem becomes APX-hard [18]. After a long line of work [19,2,12,9],
the best-known approximation ratio for this problem is k+1 2 +  obtained using
local search techniques [9]. An improved bound of k2 +  is also known [19] for the
unweighted case, i.e., the weight vector w = 1. It is also known that the natural
LP relaxation for this problem has integrality gap at least k − 1 + 1/k, and in
particular this holds for the projective plane instance of order k − 1. Hazan et
al. [18] showed that k-set packing is Ω( logk k )-hard to approximate.
Another special case of k-CS-PIP is the independent set problem in graphs
with maximum degree at most k. This is equivalent to k-CS-PIP where the
constraint matrix S is 0-1, capacity c is all ones, and each row is 2-sparse. This
problem has an O(k log log k/ log k)-approximation [17], and is Ω(k/ log2 k)-hard
to approximate [3], assuming the Unique Games Conjecture [20].
Shepherd and Vetta [26] studied the demand matching problem on graphs,
which is k-CS-PIP with k = 2, with the further restriction that in each column
the non-zero entries are equal, and that no two columns have non-zero entries in
the same two rows. They gave an LP-based 3.264-approximation algorithm [26],
and showed that the natural LP relaxation for this problem has integrality gap
at least 3. They also showed the demand matching problem to be APX-hard even
on bipartite graphs. For larger values of k, problems similar to demand matching
have been studied under the name of column-restricted PIPs [21], which arise in
the context of routing flow unsplittably (see also [6,7]). In particular, an 11.54k-
approximation algorithm was known [15] where (i) in each column all non-zero
entries are equal, and (ii) the maximum entry in S is at most the minimum entry
in c (this is also known as the no-bottleneck assumption); later, it was observed
in [13] that even without the second of these conditions, one can obtain an 8k-
approximation. The literature on unsplittable flow is quite extensive; we refer
the reader to [4,13] and references therein.
For the general k-CS-PIP, Pritchard [25] gave a 2^k·k²-approximation algo-
rithm, which was the first result with approximation ratio depending only on k.
Pritchard’s algorithm was based on solving an iterated LP relaxation, and then
applying a randomized selection procedure. Independently, [14] and [11] showed
that this final step could be derandomized, yielding an improved bound of O(k²).
All these previous results crucially use the structural properties of basic feasible
solutions of the LP relaxation. However, as stated above, our result is based on
randomized rounding with alterations and does not use properties of basic solu-
tions. This is crucial for the submodular maximization version of the problem,
as a solution to the fractional relaxation there does not have these properties.
We remark that randomized rounding with alteration has also been used ear-
lier by Srinivasan [28] in the context of PIPs. However, the focus of this paper
is different from ours; in previous work [27], Srinivasan had bounded the inte-
grality gap for PIPs by showing a randomized algorithm that obtained a “good”
solution (one that satisfies all constraints) with positive — but perhaps exponen-
tially small — probability. In [28], he proved that rounding followed by alteration
leads to an efficient and parallelizable algorithm; the rounding gives a “solution”
of good value in which most constraints are satisfied, and one can alter this so-
lution to ensure that all constraints are satisfied. (We note that [27,28] also gave
derandomized versions of these algorithms.)
Related issues have been considered in discrepancy theory, where the goal
is to round a fractional solution to a k-column sparse linear program so that
the capacity violation for any constraint is minimized. A celebrated result of
Beck-Fiala [8] shows that the capacity violation is at most O(k). A major open
question in discrepancy theory is whether the above bound can be improved to
O(√k), or even O(k^{1−ε}) for some ε > 0. While the result of [25] uses techniques
similar to that of [8], a crucial difference in our problem is that no constraint
can be violated at all.
There is a large body of work on constrained maximization of submodular
functions; we only cite the relevant papers here. Calinescu et al. [10] intro-
duced a continuous relaxation (called the multi-linear extension or extension-
by-expectation) of submodular functions and subsequently Vondrák [29] gave
an elegant e/(e−1)-approximation algorithm for solving this continuous relaxation
over any “downward monotone” polytope P, as long as there is a polynomial-
time algorithm for optimizing linear functions over P. We use this continuous
relaxation in our algorithm for submodular maximization over k-sparse pack-
ing constraints. As noted earlier, k-sparse packing constraints generalize both
k-partition matroids and k-dimensional knapsacks. Nemhauser et al. [24] gave
a (k + 1)-approximation for submodular maximization over the intersection of
k partition matroids; when k is constant, Lee et al. [23] improved this to k + ε.
Kulik et al. [22] gave an (e/(e−1) + ε)-approximation for submodular maximization
over k-dimensional knapsacks when k is constant; if k is part of the input, the
best known approximation bound is O(k).
Problem Definition and Notation: Before we begin, we formally describe the k-
CS-PIP problem and fix some notation. Let the items (i.e., columns) be indexed
by i ∈ [n] and the constraints (i.e., rows) be indexed by j ∈ [m]. We consider
the following packing integer program.
    max { Σ_{i=1}^n w_i·x_i : Σ_{i=1}^n s_{ij}·x_i ≤ c_j ∀ j ∈ [m]; x_i ∈ {0, 1} ∀ i ∈ [n] }
We say that item i participates in constraint j if sij > 0. For each i ∈ [n], let
N (i) := {j ∈ [m] | sij > 0} be the set of constraints that i participates in. In a
k-column sparse PIP, we have |N (i)| ≤ k for each i ∈ [n]. The goal is to find the
maximum weight subset of items such that all the constraints are satisfied.
We define the slack as B := min_{i∈[n], j∈[m]} c_j/s_{ij}. By scaling the constraint
matrix, we may assume that c_j = 1 for all j ∈ [m]. We also assume that s_{ij} ≤ 1
for each i, j; otherwise, we can just fix xi = 0. Finally, for each constraint j, we
let P (j) denote the set of items participating in this constraint. Note that |P (j)|
can be arbitrarily large.
Organization: In Section 2 we begin with the natural LP relaxation, and describe
a simple 8k-approximation algorithm. We then present a stronger relaxation, and
sketch a proof of an (e + o(1))k-approximation. We also present the integrality
gap of 2k − 1 for this strengthened LP, implying that our result is almost tight.
In Section 3, we describe the O(k)-approximation for k-column sparse packing
problems over a submodular objective. Finally, in Section 4, we state the sig-
nificantly better ratios that can be obtained for both linear and submodular
objectives if the capacities of all constraints are large relative to the sizes; there
are matching integrality gaps up to a constant factor.
2 The Algorithm for k-CS-PIP

Before presenting our algorithm, we describe a (seemingly correct) algorithm
that does not quite work. Understanding why this easier algorithm fails gives
useful insight into the design for the correct algorithm.
A Strawman Algorithm: Consider the following algorithm. Let x be some opti-
mum solution to the natural LP relaxation of k-CS-PIP (i.e., dropping integral-
ity). For each element i ∈ [n], select it independently at random with probability
xi /(2k). Let S be the chosen set of items. For any constraint j ∈ [m], if it is
violated, then discard all items in S ∩ P (j), i.e., items i ∈ S for which sij > 0.
As the probabilities are scaled down by 2k, by Markov’s inequality any con-
straint j is violated with probability at most 1/(2k), and hence discards its items
with at most this probability. By the k-sparse property, each element can be dis-
carded by at most k constraints, and so by the union bound it is discarded with
probability at most k · 1/(2k) = 1/2. Since an element is chosen in S with prob-
ability xi /(2k), this implies that it lies in the overall solution with probability
at least xi /(4k), implying that the proposed algorithm is a 4k-approximation.
However, the above argument is not correct. Consider the following example.
Suppose there is a single constraint (and so k = 1),

    M·x_1 + x_2 + x_3 + x_4 + ... + x_M ≤ M

where M  1 is a large integer. Clearly, setting xi = 1/2 for i = 1, . . . , M is a
feasible solution. Now consider the execution of the strawman algorithm. Note
that whenever item 1 is chosen in S, it is very likely that some item other than
1 will also be chosen (since M  1 and we pick each item independently with
probability xi /(2k) = 1/4); in this case, item 1 will be discarded. Thus the final
solution will almost always not contain item 1, violating the claim that it lies in
the final solution with probability at least x1 /(4k) = 1/8.
The key point is that we must consider the probability of an item being
discarded by some constraint, conditional on it being chosen in the set S (for
item 1 in the above example, this probability is close to one, not at most half).
This is not a problem if either all item sizes are small (say sij ≤ cj /2), or all item
sizes are large (say sij ≈ cj ). The algorithm we analyze shows that the difficult
case is indeed when some constraints contain both large and small items, as in
the example above.
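To make this failure concrete, the following small Monte Carlo experiment (a sketch of our own, not part of the algorithm of this paper; the values of M and the trial count are arbitrary illustration choices) estimates the conditional survival probability of item 1 in the example above:

    import random

    M = 1000         # single constraint: M*x_1 + x_2 + ... + x_M <= M, so k = 1
    trials = 100000
    survive_1 = chosen_1 = 0
    for _ in range(trials):
        S = [i for i in range(M) if random.random() < 0.5 / 2]  # x_i/(2k) = 1/4
        if 0 in S:                        # item 1 (index 0) was chosen
            chosen_1 += 1
            if M + (len(S) - 1) <= M:     # constraint holds only if S = {item 1}
                survive_1 += 1
    print(survive_1 / max(chosen_1, 1))   # close to 0, far below the claimed 1/2

Here the conditional survival probability is (3/4)^{M−1}, essentially zero, confirming that the union-bound argument breaks once we condition on i ∈ S.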

2.1 A Simple Algorithm for k-CS-PIP
We use the obvious LP relaxation for k-CS-PIP (i.e., dropping the integrality
condition) to obtain an 8k-approximation algorithm. An item i ∈ [n] is called
big for constraint j ∈ [m] iff s_{ij} > 1/2, and small for constraint j iff 0 < s_{ij} ≤ 1/2.
We first solve the LP relaxation to obtain an optimal fractional solution x, and
then round to an integral solution as follows (a sketch in code follows the three
steps below). With foresight, set α = 4.

1. Sample each item i ∈ [n] independently with probability xi /(αk).
Let S denote the set of chosen items. We call an item in S an S-item.
2. For each item i, mark i (for deletion) if, for any constraint j ∈ N (i), either:
– S contains some other item i′ ∈ [n] \ {i} which is big for constraint j, or
– The sum of sizes of S-items that are small for j exceeds 1 (i.e., the
capacity).
3. Delete all marked items, and return S′, the set of remaining items.
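In code, one run of this rounding-with-alteration scheme could look as follows (a sketch with our own variable names; x is the fractional LP solution and s the size matrix, with capacities scaled to 1 as above):

    import random

    def round_and_alter(x, s, k, alpha=4.0):
        # x[i]: fractional LP value; s[i][j]: size of item i in constraint j
        n, m = len(x), len(s[0])
        S = [i for i in range(n) if random.random() < x[i] / (alpha * k)]
        marked = set()
        for j in range(m):
            items = [i for i in S if s[i][j] > 0]        # S-items in constraint j
            big = [i for i in items if s[i][j] > 0.5]
            small_load = sum(s[i][j] for i in items if s[i][j] <= 0.5)
            for i in items:
                # step 2: mark i if some other S-item is big for j,
                # or the small S-items together overflow the unit capacity
                if any(b != i for b in big) or small_load > 1:
                    marked.add(i)
        return [i for i in S if i not in marked]         # step 3: S'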

Analysis: We will show that this algorithm gives an 8k-approximation.
Lemma 1. Solution S′ is feasible with probability one.
Proof Sketch. Consider any fixed constraint j ∈ [m]. If there is some i ∈ S′ that
is big for j, it will be the only item in S′ that participates in constraint j. If all
S′-items participating in j are small, their total size is at most 1. □
We now prove the main technical result of this section.
Theorem 1. For any item i ∈ [n], the probability Pr[i ∈ S′ | i ∈ S] ≥ 1 − 2/α.
Equivalently, the probability that item i is deleted from S conditional on it being
chosen in S is at most 2/α.
Proof. For any item i and constraint j ∈ N (i), let Bij denote the event that
i is marked for deletion from S because there is some other S-item that is big
for constraint j. Let Gj denote the event that the total size of S-items that are
small for constraint j exceeds 1. For any item i ∈ [n] and constraint j ∈ N (i),
we will show that:

    Pr[B_ij | i ∈ S] + Pr[G_j | i ∈ S] ≤ 2/(αk).   (1)
To see that (1) implies the theorem, for any item i, simply take the union bound
over all j ∈ N (i). Thus, the probability that i is deleted from S conditional on
it being chosen in S is at most 2/α. Equivalently, Pr[i ∈ S′ | i ∈ S] ≥ 1 − 2/α.
We now prove (1) using the following intuition: The total extent to which the
LP selects items that are big for any constraint cannot be more than 2 (each
big item has size at least 1/2); therefore, Bij is unlikely to occur since we scaled
down probabilities by factor αk. Ignoring for a moment the conditioning on
i ∈ S, event Gj is also unlikely, by Markov’s Inequality. But items are selected
for S independently, so if i is big for constraint j, then its presence in S does
not affect the event Gj at all. If i is small for constraint j, then even if i ∈ S,
the total size of S-items is unlikely to exceed 1.
To prove (1) formally, let B(j) denote the set of items that are big for con-
straint j, and Y_j := Σ_{ℓ∈B(j)} x_ℓ. By the LP constraint for j, it follows that Y_j ≤ 2
(since each ℓ ∈ B(j) has size s_{ℓj} > 1/2). Now by a union bound,

    Pr[B_ij | i ∈ S] ≤ Σ_{ℓ∈B(j)\{i}} x_ℓ/(αk) ≤ Y_j/(αk) ≤ 2/(αk).   (2)

Now, let G_{−i}(j) denote the set of items that are small for constraint j, not
counting item i, even if it is small. Using the LP constraint j, we have:

    Σ_{ℓ∈G_{−i}(j)} s_{ℓj}·x_ℓ ≤ 1 − Σ_{ℓ∈B(j)} s_{ℓj}·x_ℓ ≤ 1 − Y_j/2.   (3)

Since each item ℓ is chosen into S with probability x_ℓ/(αk), inequality (3) im-
plies that the expected total size of S-items in G_{−i}(j) is at most (1/(αk))·(1 − Y_j/2).
By Markov's inequality, the probability that the total size of these S-items ex-
ceeds 1/2 is at most (2/(αk))·(1 − Y_j/2). Since items are chosen independently and
i ∉ G_{−i}(j), we obtain this probability even conditioned on i ∈ S.
Whether i is big or small for j, event G_j can occur only if the total size of
S-items in G_{−i}(j) exceeds 1/2. Thus,

    Pr[G_j | i ∈ S] ≤ (2/(αk))·(1 − Y_j/2) = 2/(αk) − Y_j/(αk),

which, combined with inequality (2), yields (1): the two bounds sum to at most
Y_j/(αk) + 2/(αk) − Y_j/(αk) = 2/(αk).
Using the theorem above, we obtain the desired approximation:
Theorem 2. There is a randomized 8k-approximation algorithm for k-CS-PIP.
Proof. From Lemma 1, our algorithm always outputs a feasible solution. To
bound the objective value, recall that Pr[i ∈ S] = x_i/(αk) for all i ∈ [n]. Hence
Theorem 1 implies that for all i ∈ [n]
    Pr[i ∈ S′] ≥ Pr[i ∈ S] · Pr[i ∈ S′ | i ∈ S] ≥ (x_i/(αk)) · (1 − 2/α).
Finally, using linearity of expectation and α = 4, we obtain the theorem.
Remarks: We note that the analysis above only uses Markov’s inequality con-
ditioned on a single item being chosen in set S. Thus a pairwise independent
distribution suffices to choose the set S, and hence the algorithm can be eas-
ily derandomized. More generally, one could consider k-CS-PIP with arbitrary
upper-bounds on the variables: the above 8k-approximation algorithm extends
easily to this setting (details in the full version).

2.2 A Stronger LP, and Improved Approximation
We now present our strengthened LP and the (ek + o(k))-approximation algo-
rithm for k-CS-PIP.

Stronger LP relaxation. Recall that entries are scaled so that all capacities are
one. An item i is called big for constraint j iff sij > 1/2. For each constraint
j ∈ [m], let B(j) = {i ∈ [n] | s_{ij} > 1/2} denote the set of big items. Since no
two items that are big for some constraint can be chosen in an integral solution,
the inequality Σ_{i∈B(j)} x_i ≤ 1 is valid for each j ∈ [m]. The strengthened LP
relaxation that we consider is as follows.
    max Σ_{i=1}^n w_i·x_i   (4)
    s.t. Σ_{i=1}^n s_{ij}·x_i ≤ c_j, ∀ j ∈ [m]   (5)
         Σ_{i∈B(j)} x_i ≤ 1, ∀ j ∈ [m]   (6)
         0 ≤ x_i ≤ 1, ∀ i ∈ [n].   (7)
The Algorithm: The algorithm obtains an optimal solution x to the LP relax-
ation (4)-(7), and rounds it to an integral solution S′ as follows (parameter α will
be set to 1 later).
1. Pick each item i ∈ [n] independently with probability x_i/(αk), with α ≥ 1.
Let S denote the set of chosen items.
2. For any item i and constraint j ∈ N(i), let E_ij denote the event that the
items {i′ ∈ S | s_{i′j} ≥ s_{ij}} have total size (in constraint j) exceeding one.
Mark i for deletion if E_ij occurs for any j ∈ N(i).
3. Return set S′ ⊆ S consisting of all items i ∈ S not marked for deletion.
Note the rule for deleting an item from S. In particular, whether item i is deleted
due to constraint j only depends on items that are at least as large as i in j.
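As a sketch (our own variable names again), the alteration step of this refined algorithm can be written as:

    def alter(S, s, m):
        # delete i iff, for some constraint j in N(i), the S-items whose size
        # in j is at least s[i][j] (including i itself) have total size > 1
        S_prime = []
        for i in S:
            safe = True
            for j in range(m):
                if s[i][j] == 0:
                    continue                                  # j not in N(i)
                load = sum(s[i2][j] for i2 in S if s[i2][j] >= s[i][j])
                if load > 1:                                  # event E_ij
                    safe = False
                    break
            if safe:
                S_prime.append(i)
        return S_prime

Note how the rule is monotone in S: enlarging S can only create more events E_ij, never remove one; this is exactly the monotonicity property that becomes important for the submodular objective in Section 3.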
Analysis: It is clear that S′ is feasible with probability one. The improved ap-
proximation ratio comes from four different steps: First, we use the stronger LP
relaxation. Second, the more careful alteration step does not discard items un-
necessarily; the previous algorithm sometimes deleted items from S even when
constraints were not violated. Third, in analyzing the probability that constraint
j causes item i to be deleted from S, we further exploit discreteness of item sizes.
And fourth, for each item i, we use the FKG inequality to bound the probability
it is deleted instead of the weaker union bound over all constraints in N (i).
The main lemma is the following, where we show that each item appears in
S′ with good probability.
Lemma 2. For every item i ∈ [n] and constraint j ∈ N(i), we have
Pr[E_ij | i ∈ S] ≤ (1/(αk))·(1 + (2/(αk))^{1/3}).
Proof Sketch. Let ℓ := (4αk)^{1/3}. We classify items in relation to constraints as:

– Item i ∈ [n] is big for constraint j ∈ [m] if s_{ij} > 1/2.
– Item i ∈ [n] is medium for constraint j ∈ [m] if 1/ℓ ≤ s_{ij} ≤ 1/2.
– Item i ∈ [n] is tiny for constraint j ∈ [m] if s_{ij} < 1/ℓ.
We separately bound Pr[E_ij | i ∈ S] when item i is big, medium, and tiny.

Claim. For any i ∈ [n] and j ∈ [m]:

1. If item i is big for constraint j, Pr[E_ij | i ∈ S] ≤ 1/(αk).
2. If item i is medium for constraint j, Pr[E_ij | i ∈ S] ≤ (1/(αk))·(1 + ℓ²/(2αk)).
3. If item i is tiny for constraint j, Pr[E_ij | i ∈ S] ≤ (1/(αk))·(1 + 2/ℓ).
In case 1, E_ij occurs only if some other big item for constraint j is chosen in S;
the new constraints (6) of the strengthened LP bound this probability. In case
2, E_ij can occur only if some big item or at least two medium items other than
i are selected for S; we argue that the latter probability is much smaller than
1/(αk). In case 3, E_ij can occur only if the total size (in constraint j) of items in
S \ {i} is greater than 1 − 1/ℓ; Markov's inequality gives the desired result.
Thus, for any item i and constraint j ∈ N(i), Pr[E_ij | i ∈ S] ≤
(1/(αk))·max{(1 + 2/ℓ), (1 + ℓ²/(2αk))}. From the choice of ℓ = (4αk)^{1/3}, which
makes the bounds in parts 2 and 3 of the claim equal, we obtain the lemma. □
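For the record, the balancing computation behind this choice of ℓ is the following one-line check (spelled out here for convenience):

    \[
      \frac{2}{\ell} = \frac{2}{(4\alpha k)^{1/3}}
        = \Big(\frac{8}{4\alpha k}\Big)^{1/3}
        = \Big(\frac{2}{\alpha k}\Big)^{1/3},
      \qquad
      \frac{\ell^2}{2\alpha k} = \frac{(4\alpha k)^{2/3}}{2\alpha k}
        = \frac{2^{4/3}}{2\,(\alpha k)^{1/3}}
        = \Big(\frac{2}{\alpha k}\Big)^{1/3},
    \]

so both bounds equal (1/(αk))·(1 + (2/(αk))^{1/3}), as stated in Lemma 2.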
We now prove the main result of this section.
Theorem 3. For each i ∈ [n], Pr[i ∈ S′ | i ∈ S] ≥ (1 − (1/(αk))·(1 + (2/(αk))^{1/3}))^k.
Proof. For any item i and constraint j ∈ N (i), the conditional event (¬Eij | i ∈ S)
is a decreasing function over the choice of items in set [n] \ {i}. Thus, by the FKG
inequality [1], for any fixed item i ∈ [n], the probability that no event (Eij | i ∈ S)
occurs is:

    Pr[ ∧_{j∈N(i)} ¬E_ij | i ∈ S ] ≥ Π_{j∈N(i)} Pr[¬E_ij | i ∈ S].

From Lemma 2, Pr[¬E_ij | i ∈ S] ≥ 1 − (1/(αk))·(1 + (2/(αk))^{1/3}). As each item
is in at most k constraints, we obtain the theorem.
Now, by setting α = 1 (a choice that is optimal only asymptotically; in the case
of k = 2, for instance, it is better to choose α ≈ 2.8), we have Pr[i ∈ S] = x_i/k
and Pr[i ∈ S′ | i ∈ S] ≥ 1/(e + o(1)), which immediately implies:
Theorem 4. There is a randomized (ek + o(k))-approximation algorithm for
k-CS-PIP.
Remark: We note that this algorithm can be derandomized using conditional
expectation and pessimistic estimators, since we can exactly compute estimates
of the relevant probabilities. Also, using ideas from [28], the algorithm can be
implemented in RNC. We defer details to the full version.
Integrality Gap of LP (4)-(7). Consider the instance on n = m = 2k − 1 items
and constraints defined as follows. We view the indices [n] = {0, 1, ..., n − 1} as
integers modulo n. The weights are w_i = 1 for all i ∈ [n]. The sizes are:

    s_{ij} := 1 if i = j;  ε if j ∈ {i + 1, ..., i + k − 1 (mod n)};  0 otherwise,  ∀ i, j ∈ [n],

where ε > 0 is arbitrarily small, in particular ε ≪ 1/(nk).
Observe that setting x_i = 1 − kε for all i ∈ [n] is a feasible fractional solution
to the strengthened LP (4)-(7); each constraint has only one big item and so the
new constraint (6) is satisfied. Thus the optimal LP value is at least (1 − kε)·n ≈
n = 2k − 1. On the other hand, it is easy to see that the optimal integral solution
can only choose one item and hence has value 1. Thus the integrality gap of the
LP we consider is at least 2k − 1, for every k ≥ 1.
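This instance is small enough to check mechanically; the sketch below (our own code, with k and ε chosen arbitrarily for illustration) verifies both the fractional feasibility of x_i = 1 − kε and the fact that no two items fit together integrally:

    k = 4
    n = 2 * k - 1
    eps = 1e-9
    s = [[1.0 if i == j else (eps if (j - i) % n in range(1, k) else 0.0)
          for j in range(n)] for i in range(n)]
    x = [1 - k * eps] * n
    # fractional feasibility: each constraint j has exactly one big item (j itself)
    assert all(sum(s[i][j] * x[i] for i in range(n)) <= 1 for j in range(n))
    # integral infeasibility: any two items together overflow some constraint
    for a in range(n):
        for b in range(a + 1, n):
            assert any(s[a][j] + s[b][j] > 1 for j in range(n))
    print("LP value >=", sum(x), "; integral optimum = 1")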
3 Submodular Objective Functions
We now consider the more general case when the objective we seek to maximize
is an arbitrary non-negative monotone submodular function f : 2^{[n]} → R₊. The
problem we consider is:

    max { f(T) : Σ_{i∈T} s_{ij} ≤ c_j ∀ j ∈ [m]; T ⊆ [n] }   (8)
As is standard when dealing with submodular functions, we only assume value-
oracle access to the function: i.e., the algorithm can query any subset T ⊆
[n], and it obtains the function value f (T ) in constant time. Again, we let k
denote the column-sparseness of the underlying constraint matrix. In this section
we obtain an O(k)-approximation algorithm for Problem (8). The algorithm is
similar to that for k-CS-PIP (where the objective was linear):
1. We first solve (approximately) a suitable continuous relaxation of (8). This
step follows directly from the algorithm of Vondrák [29].
2. Then, using the fractional solution, we perform the randomized rounding
with alteration described in Section 2. Although the algorithm is the same
as for linear functions, the analysis requires considerably more work. In the
process, we also establish a new property of submodular functions that gen-
eralizes fractional subadditivity [16].
Solving the Continuous Relaxation. The extension-by-expectation (also
called the multi-linear extension) of a submodular function f is a continuous
function F : [0, 1]^n → R₊ defined as follows:

    F(x) := Σ_{T⊆[n]} Π_{i∈T} x_i · Π_{j∉T} (1 − x_j) · f(T)
Note that F(x) = f(x) for x ∈ {0, 1}^n and hence F is an extension of f. Even
though F is a non-linear function, using the continuous greedy algorithm of
Vondrák [29], we can obtain a (1 − 1/e)-approximation algorithm for the following
fractional relaxation of (8):

    max { F(x) : Σ_{i=1}^n s_{ij}·x_i ≤ c_j ∀ j ∈ [m]; 0 ≤ x_i ≤ 1 ∀ i ∈ [n] }   (9)
In order to apply the algorithm from [29], one needs to solve in polynomial time
the problem of maximizing a linear objective over the constraints {Σ_{i=1}^n s_{ij}·x_i ≤
c_j ∀ j ∈ [m]; 0 ≤ x_i ≤ 1 ∀ i ∈ [n]}. This is indeed possible since it is a linear
program on n variables and m constraints.
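Although F has exponentially many terms, it can be evaluated to arbitrary accuracy by sampling, which is also how continuous greedy is typically implemented in practice; a minimal sketch (our own code, with a toy coverage function standing in for f):

    import random

    def estimate_F(f, x, samples=20000):
        # F(x) = E[f(T)] where each i enters T independently with prob. x[i]
        total = 0.0
        for _ in range(samples):
            T = {i for i, xi in enumerate(x) if random.random() < xi}
            total += f(T)
        return total / samples

    f = lambda T: 1.0 if T else 0.0           # coverage: 1 iff T is nonempty
    print(estimate_F(f, [0.5, 0.5]))          # close to 1 - 0.5*0.5 = 0.75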
The Rounding Algorithm and Analysis. The rounding algorithm is identi-
cal to that for k-CS-PIP. Let x denote any feasible solution to Problem (9). We
apply the rounding algorithm from the previous section, to first obtain a (possibly
infeasible) solution S ⊆ [n] and then a feasible integral solution S′ ⊆ [n].
However, the analysis approach in Theorem 3 does not work. The problem is
that even though S (which is chosen by random sampling) has good expected
profit, i.e., E[f(S)] = Ω(1/k)·F(x), it may happen that the alteration step used
to obtain S′ from S may end up throwing away essentially all the profit. This
was not an issue for linear objective functions since our alteration procedure
guarantees that Pr[i ∈ S′ | i ∈ S] = Ω(1) for each i ∈ [n]; if f is linear, this im-
plies E[f(S′)] = Ω(1)·E[f(S)]. However, this property is not enough for general
monotone submodular functions. Consider the following:
Example: Let set S ⊆ [n] be drawn from the following distribution:
– With probability 1/2n, S = [n].
– For each i ∈ [n], S = {i} with probability 1/2n.
– With probability 1/2 − 1/2n, S = ∅.
Define S′ = S if S = [n], and S′ = ∅ otherwise. For each i ∈ [n], we have
Pr[i ∈ S′ | i ∈ S] = 1/2 = Ω(1). However, consider the profit with respect to the
"coverage" submodular function f, where f(T) = 1 if T ≠ ∅ and f(T) = 0 otherwise.
We have E[f(S)] = 1/2 + 1/2n, but E[f(S′)] is only 1/2n ≪ E[f(S)].
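These numbers follow from a direct calculation, spelled out below for a concrete n (our own snippet; any n works):

    n = 10
    p_full = p_single = 1 / (2 * n)    # Pr[S = [n]] and Pr[S = {i}] for each i
    E_f_S = p_full + n * p_single      # f(S) = 1 iff S is nonempty
    E_f_Sp = p_full                    # S' is nonempty only when S = [n]
    pr_cond = p_full / (p_full + p_single)  # Pr[i in S' | i in S] for fixed i
    print(E_f_S, E_f_Sp, pr_cond)      # prints 0.55 0.05 0.5

So Pr[i ∈ S′ | i ∈ S] = 1/2 even though E[f(S′)] = 1/2n is a vanishing fraction of E[f(S)] = 1/2 + 1/2n.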
Remark: Note that if S′ itself were chosen randomly from S such that Pr[i ∈
S′ | S = T] = Ω(1) for every T ⊆ [n] and i ∈ T, then we would be done by Feige's
Subadditivity Lemma [16]. Unfortunately, this is too much to hope for. In our
rounding procedure, for any particular choice of S, set S′ is a fixed subset of S;
and there could be (bad) sets S, where after the alteration step we end up with
sets S′ such that |S′| ≪ |S|.
However, it turns out that we can use the following two additional properties
of our algorithm to argue that S  has reasonable profit. First, the sets S we con-
struct are drawn from a product distribution on the items. Second, our alteration
procedure has the following ‘monotonicity’ property: Suppose i ∈ T_1 ⊆ T_2 ⊆ [n],
and i ∈ S′ when S = T_2. Then we are guaranteed that i ∈ S′ when S = T_1.
(That is, if S contains additional items, it is more likely that i will be discarded
by some constraint it participates in.) The example above does not satisfy ei-
ther of these properties. Corollary 1 shows that these properties suffice. Roughly
speaking, the intuition is that since f is submodular, the marginal contribution
of item i to S is largest when S is "small"; this is also the case when i is most
likely to be retained for S′. That is, for every i ∈ [n], both Pr[i ∈ S′ | i ∈ S] and
the marginal contribution of i to f(S) are decreasing functions of S. We prove
(see [5]) the following generalization of Feige's Subadditivity Lemma.
Theorem 5. Let [n] denote a ground set, x ∈ [0, 1]^n, and for each B ⊆ [n]
define p(B) := Π_{i∈B} x_i · Π_{j∉B} (1 − x_j). Associated with each B ⊆ [n], there is an
arbitrary distribution over subsets of B, where each set A ⊆ B has probability
q_B(A); so Σ_{A⊆B} q_B(A) = 1 for all B ⊆ [n]. That is, we choose B from a
product distribution, and then retain a subset A of B by applying a randomized
alteration. Suppose that the system satisfies the following conditions.
Marginal Property:

    ∀ i ∈ [n]:  Σ_{B⊆[n]} p(B) Σ_{A⊆B: i∈A} q_B(A) ≥ β · Σ_{B⊆[n]: i∈B} p(B).   (10)

Monotonicity: For any two subsets B ⊆ B′ ⊆ [n] we have,

    ∀ i ∈ B:  Σ_{A⊆B: i∈A} q_B(A) ≥ Σ_{A′⊆B′: i∈A′} q_{B′}(A′).   (11)

Then, for any monotone submodular function f,

    Σ_{B⊆[n]} p(B) Σ_{A⊆B} q_B(A)·f(A) ≥ β · Σ_{B⊆[n]} p(B)·f(B).   (12)

Corollary 1. Let S be a random set drawn from a product distribution on [n].
Let S′ be another random set where for each choice of S, set S′ is an arbitrary
subset of S. Suppose that for each i ∈ [n] the following hold.
– Pr_S[i ∈ S′ | i ∈ S] ≥ β, and
– For all T_1 ⊆ T_2 with T_1 ∋ i, if i ∈ S′ when S = T_2 then i ∈ S′ when S = T_1.
Then E[f(S′)] ≥ β·E[f(S)].
We are now ready to prove the performance guarantee of our algorithm. Ob-
serve that our rounding algorithm satisfies the hypothesis of Corollary 1 with
β = 1/(e + o(1)) when parameter α = 1. Moreover, one can show that E[f(S)] ≥
F(x)/(αk). Thus, E[f(S′)] ≥ (1/(e + o(1)))·E[f(S)] ≥ (1/(ek + o(k)))·F(x). Combined with
the fact that x is an e/(e−1)-approximate solution to the continuous relaxation (9),
we have proved our main result:
Theorem 6. There is a randomized algorithm for maximizing any monotone
submodular function over k-column sparse packing constraints achieving approx-
imation ratio e²/(e−1)·k + o(k).
4 k-CS-PIP Algorithm for Large B
We can obtain substantially better approximation guarantees for k-CS-PIP when
the capacities are large relative to the sizes. Recall the definition of the slack
parameter B. We consider the k-CS-PIP problem as a function of both k and B,
and obtain improved approximation ratios given in the following.
Theorem 7. There is a 4e·((e + o(1))·B·k)^{1/B}-approximation algorithm
for k-CS-PIP, and a (4e²/(e−1))·((e + o(1))·B·k)^{1/B}-approximation for maximiz-
ing monotone submodular functions over k-column sparse packing constraints.
The algorithms that obtain these approximation ratios are similar to those of
the preceding sections, but additional care is required in the analysis; as B is
large, one can now use a smaller scaling factor in the randomized rounding step
while bounding the probability that an element is deleted in the alteration step.
We also show that the natural LP relaxation for k-CS-PIP has an Ω(k^{1/B})
integrality gap for every B ≥ 1.
Acknowledgements. NK thanks Chandra Chekuri and Alina Ene for detailed
discussions on k-CS-PIP. We also thank Deeparnab Chakrabarty and David
Pritchard for discussions and sharing a copy of [11]. We thank Jan Vondrák and
Chandra Chekuri for pointing out an error in the original proof of Theorem 6,
which prompted us to prove Theorem 5. Our thanks also to the IPCO referees
for their helpful suggestions.
References
1. Alon, N., Spencer, J.: The Probabilistic Method, 3rd edn. Wiley-Interscience,
New York (2008)
2. Arkin, E.M., Hassin, R.: On Local Search for Weighted k-Set Packing. In: European
Symposium on Algorithms, pp. 13–22 (1997)
3. Austrin, P., Khot, S., Safra, S.: Inapproximability of Vertex Cover and Independent
Set in Bounded Degree Graphs. In: Comp. Complexity Conference (2009)
4. Bansal, N., Friggstad, Z., Khandekar, R., Salavatipour, M.R.: A logarithmic ap-
proximation for unsplittable flow on line graphs. In: SODA (2009)
5. Bansal, N., Korula, N., Nagarajan, V., Srinivasan, A.: On k-Column Sparse Packing
Programs (full version), arXiv (2010)
6. Baveja, A., Srinivasan, A.: Approximating Low-Congestion Routing and Column-
Restricted Packing Problems. Information Proc. Letters (74), 19–25 (2000)
7. Baveja, A., Srinivasan, A.: Approximation Algorithms for Disjoint Paths and Re-
lated Routing and Packing Problems. Math. of Oper. Res. (25), 255–280 (2000)
8. Beck, J., Fiala, T.: “Integer making” theorems. Discrete Appl. Math. 3, 1–8 (1981)
9. Berman, P.: A d/2 approximation for maximum weight independent set in d-claw
free graphs. Nordic Journal of Computing 7(3), 178–184 (2000)
10. Calinescu, G., Chekuri, C., Pál, M., Vondrák, J.: Maximizing a monotone submod-
ular function under a matroid constraint. In: Fischetti, M., Williamson, D.P. (eds.)
IPCO 2007. LNCS, vol. 4513, pp. 182–196. Springer, Heidelberg (2007)
11. Chakrabarty, D., Pritchard, D.: Personal Communication (2009)
12. Chandra, B., Halldórsson, M.: Greedy Local Improvement and Weighted Packing
Approximation. In: SODA (1999)
13. Chekuri, C., Ene, A., Korula, N.: Unsplittable Flow in Paths and Trees and
Column-Restricted Packing Integer Programs. In: Dinur, I., Jansen, K., Naor,
J., Rolim, J. (eds.) APPROX and RANDOM 2009. LNCS, vol. 5687, pp. 42–55.
Springer, Heidelberg (2009)
14. Chekuri, C., Ene, A., Korula, N.: Personal Communication (2009)
15. Chekuri, C., Mydlarz, M., Shepherd, B.: Multicommodity Demand Flow in a Tree
and Packing Integer Programs. ACM Trans. on Algorithms 3(3) (2007)
16. Feige, U.: On maximizing welfare when utility functions are subadditive. In: STOC,
pp. 41–50 (2006)
17. Halperin, E.: Improved Approximation Algorithms for the Vertex Cover Problem
in Graphs and Hypergraphs. SIAM J. Comput. 31(5), 1608–1623 (2002)
18. Hazan, E., Safra, S., Schwartz, O.: On the complexity of approximating k-set pack-
ing. Computational Complexity 15(1), 20–39 (2003)
19. Hurkens, A.J., Schrijver, A.: On the Size of Systems of Sets Every t of Which Have
an SDR, with an Application to the Worst-Case Ratio of Heuristics for Packing
Problems. SIAM J. Discrete Math. 2(1), 68–72 (1989)
20. Khot, S.: On the power of unique 2-prover 1-round games. In: STOC, pp. 767–775
(2002)
21. Kolliopoulos, S., Stein, C.: Approximating Disjoint-Path Problems using Packing
Integer Programs. Mathematical Programming A (99), 63–87 (2004)
22. Kulik, A., Shachnai, H., Tamir, T.: Maximizing submodular functions subject to
multiple linear constraints. In: SODA (2009)
23. Lee, J., Mirrokni, V., Nagarajan, V., Sviridenko, M.: Non-monotone submodular
maximization under matroid and knapsack constraints. In: STOC, pp. 323–332 (2009)
24. Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for
maximizing submodular set functions II. Math. Prog. Study 8, 73–87 (1978)
25. Pritchard, D.: Approximability of Sparse Integer Programs. In: Fiat, A., Sanders,
P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 83–94. Springer, Heidelberg (2009)
26. Shepherd, B., Vetta, A.: The demand matching problem. Mathematics of Opera-
tions Research 32, 563–578 (2007)
27. Srinivasan, A.: Improved Approximation Guarantees for Packing and Covering
Integer Programs. SIAM J. Comput. 29(2), 648–670 (1999)
28. Srinivasan, A.: New approaches to covering and packing problems. In: SODA, pp.
567–576 (2001)
29. Vondrák, J.: Optimal approximation for the submodular welfare problem in the
value oracle model. In: STOC, pp. 67–74 (2008)
30. Zuckerman, D.: Linear Degree Extractors and the Inapproximability of Max Clique
and Chromatic Number. Theory of Computing 3(1), 103–128 (2007)
Hypergraphic LP Relaxations for Steiner Trees

Deeparnab Chakrabarty, Jochen Könemann, and David Pritchard

University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
Abstract. We investigate hypergraphic LP relaxations for the Steiner
tree problem, primarily the partition LP relaxation introduced by Köne-
mann et al. [Math. Programming, 2009]. Specifically, we are interested
in proving upper bounds on the integrality gap of this LP, and studying
its relation to other linear relaxations. Our results are the following.
Structural results: We extend the technique of uncrossing, usually
applied to families of sets, to families of partitions. As a consequence we
show that any basic feasible solution to the partition LP formulation has
sparse support. Although the number of variables could be exponential,
the number of positive variables is at most the number of terminals.
Relations with other relaxations: We show the equivalence of the
partition LP relaxation with other known hypergraphic relaxations. We
also show that these hypergraphic relaxations are equivalent to the well
studied bidirected cut relaxation, if the instance is quasibipartite.
Integrality gap upper bounds: We show an upper bound of √3 ≈
1.729 on the integrality gap of these hypergraph relaxations in general
graphs. In the special case of uniformly quasibipartite instances, we show
an improved upper bound of 73/60 ≈ 1.216. By our equivalence theorem,
the latter result implies an improved upper bound for the bidirected cut
relaxation as well.
1 Introduction

In the Steiner tree problem, we are given an undirected graph G = (V, E), non-
negative costs ce for all edges e ∈ E, and a set of terminal vertices R ⊆ V . The
goal is to find a minimum-cost tree T spanning R, and possibly some Steiner
vertices from V \R. We can assume that the graph is complete and that the costs
induce a metric. The problem takes a central place in the theory of combinatorial
optimization and has numerous practical applications. Since the Steiner tree
problem is NP-hard1 we are interested in approximation algorithms for it. The
best published approximation algorithm for the Steiner tree problem is due to
Robins and Zelikovsky [20], which for any fixed ε > 0, achieves a performance
ratio of 1 + (ln 3)/2 + ε ≈ 1.55 in polynomial time; an improvement is currently in
press [2], see also Remark 1.
⋆ Supported by NSERC grant no. 288340 and by an Early Research Award.
¹ Chlebík and Chlebíková show that no (96/95 − ε)-approximation algorithm can exist
for any positive ε unless P=NP [5].
In this paper, we study linear programming (LP) relaxations for the Steiner
tree problem, and their properties. Numerous such formulations are known (e.g.,
see [7,11,16,17,24,25]), and their study has led to impressive running time im-
provements for integer programming based methods. Despite the significant body
of work in this area, none of the known relaxations is known to exhibit an in-
tegrality gap provably smaller than 2. The integrality gap of a relaxation is the
maximum ratio of the cost of integral and fractional optima, over all instances.
It is commonly regarded as a measure of strength of a formulation. One of the
contributions of this paper are improved bounds on the integrality gap for a
number of Steiner tree LP relaxations.
A Steiner tree relaxation of particular interest is the bidirected cut relaxation
[7,25] (precise definitions will follow in Section 1.2). This relaxation has a flow
formulation using O(|E||R|) variables and constraints, which is much more com-
pact than the other relaxations we study. It is also widely believed to have
an integrality gap significantly smaller than 2 (e.g., see [3,19,23]). The largest
lower bound on the integrality gap known is 8/7 (by Martin Skutella, reported
in [15]), and Chakrabarty et al. [3] prove an upper bound of 4/3 in so called
quasi-bipartite instances (where Steiner vertices form an independent set).
Another class of formulations are the so called hypergraphic LP relaxations
for the Steiner tree problem. These relaxations are inspired by the observation
that the minimum Steiner tree problem can be encoded as a minimum cost
hyper-spanning tree (see Section 1.2) of a certain hypergraph on the terminals.
They are known to be stronger than the bidirected cut relaxation [18], and it is
therefore natural to try to use them to get better approximation algorithms, by
drawing on the large corpus of known LP techniques. In this paper, we focus on
one hypergraphic LP in particular: the partition LP of Könemann et al. [15].

1.1 Our Results and Techniques
There are three classes of results in this paper: structural results, equivalence
results, and integrality gap upper bounds.
Structural results, Section 2: We extend the powerful technique of uncrossing,
traditionally applied to families of sets, to families of partitions. Set uncrossing
has been very successful in obtaining exact and approximate algorithms for a
variety of problems (for instance, [9,14,21]). Using partition uncrossing, we show
that any basic feasible solution to the partition LP has at most (|R| − 1) positive
variables (even though it can have an exponentially large number of variables
and constraints).
Equivalence results, Section 3: In addition to the partition LP, two other
hypergraphic LPs have been studied before: one based on subtour elimination
due to Warme [24], and a directed hypergraph relaxation of Polzin and Vahdati
Daneshmand [18]; these two are known to be equivalent [18]. We prove that in
fact all three hypergraphic relaxations are equivalent (that is, they have the same
objective value for any Steiner tree instance).
We also show that, on quasibipartite instances, the hypergraphic and the bidi-
rected cut LP relaxations are equivalent. This result is surprising since we are
aware of no qualitative similarity to suggest why the two relaxations should be


equivalent. We believe a better understanding of the bidirected cut relaxation
is important because it is central in theory and practical for implementation.
Improved integrality gap upper bounds, Section 4: For uniformly quasi-
bipartite instances (quasibipartite instances where for each Steiner vertex, all
incident edges have the same cost), we show that the integrality gap of the hy-
.
pergraphic LP relaxations is upper bounded by 73/60 = 1.216. Our proof uses
the approximation algorithm of Gröpl et al. [13] which achieves the same ratio
with respect to the (integral) optimum. We show, via a simple dual fitting argu-
ment, that this ratio is also valid with respect to the LP value. To the best of our
knowledge this is the only nontrivial class of instances where the best currently
known approximation ratio and integrality gap upper bound are the same.
For general graphs, we give simple upper bounds of 2√2 − 1 ≈ 1.83 and
√3 ≈ 1.729 on the integrality gap of the hypergraph relaxation. Call a graph
gainless if the minimum spanning tree of the terminals is the optimal Steiner tree.
To obtain these integrality gap upper bounds, we use the following key property
of the hypergraphic relaxation which was implicit in [15]: on gainless instances
(instances where the optimum terminal spanning tree is the optimal Steiner
tree), the LP value equals the minimum spanning tree and the integrality gap
is 1. Such a theorem was known for quasibipartite instances and the bidirected
cut relaxation (implicitly in [19], explicitly in [3]); we extend techniques of [3] to
obtain improved integrality gaps on all instances.
Remark 1. The recent independent work of Byrka et al. [2], which gives an im-
proved approximation for Steiner trees in general graphs, also shows an inte-
grality gap bound of 1.55 on the hypergraphic directed cut LP. This is stronger
than our integrality gap bounds and was obtained prior to the completion of our
paper; yet we include our bounds because they are obtained using fairly different
methods which might be of independent interest in certain settings.
The proof in [2] can be easily modified to show an integrality gap upper
bound of 1.28 in quasibipartite instances. Then using our equivalence result, we
get an integrality gap upper bound of 1.28 for the bidirected cut relaxation on
quasibipartite instances, improving the previous best of 4/3.
1.2 Bidirected Cut and Hypergraphic Relaxations
The Bidirected Cut Relaxation. The first bidirected LP was given by Ed-
monds [7] as an exact formulation for the spanning tree problem. Wong [25]
later extended this to obtain the bidirected cut relaxation for the Steiner tree
problem, and gave a dual ascent heuristic based on the relaxation. For this re-
laxation, introduce two arcs (u, v) and (v, u) for each edge uv ∈ E, and let both
of their costs be cuv . Fix an arbitrary terminal r ∈ R as the root. Call a subset
U ⊆ V valid if it contains a terminal but not the root, and let valid(V ) be the
family of all valid sets. Clearly, the in-tree rooted at r (the directed tree with
all vertices but the root having out-degree exactly 1) of a Steiner tree T must
have at least one arc with tail in U and head outside U , for all valid U . This
leads to the bidirected cut relaxation (B) (shown in Figure 1 with dual) which
has a variable for each arc a ∈ A, and a constraint for every valid set U . Here
and later, δ out (U ) denotes the set of arcs in A whose tail is in U and whose head
lies in V \ U . When there are no Steiner vertices, Edmonds’ work [7] implies this
relaxation is exact.
    (B):  min Σ_{a∈A} c_a·x_a  over x ∈ R^A_{≥0},  s.t.  Σ_{a∈δ^out(U)} x_a ≥ 1 ∀ U ∈ valid(V)

    (BD): max Σ_{U∈valid(V)} z_U  over z ∈ R^{valid(V)}_{≥0},  s.t.  Σ_{U: a∈δ^out(U)} z_U ≤ c_a ∀ a ∈ A

Fig. 1. The bidirected cut relaxation (B) and its dual (BD)
Goemans & Myung [11] made significant progress in understanding the LP,
by showing that the bidirected cut LP has the same value independent of which
terminal is chosen as the root, and by showing that a whole “catalogue” of very
different-looking LPs also has the same value; later Goemans [10] showed that
if the graph is series-parallel, the relaxation is exact. Rajagopalan and Vazirani
[19] were the first to show a non-trivial integrality gap upper bound of 3/2 on
quasibipartite graphs; this was subsequently improved to 4/3 by Chakrabarty et
al. [3], who gave another alternate formulation for (B).
Hypergraphic Relaxations. Given a Steiner tree T , a full component of T is a


maximal subtree of T all of whose leaves are terminals and all of whose internal
nodes are Steiner nodes. The edge set of any Steiner tree can be partitioned
in a unique way into full components by splitting at internal terminals; see
Figure 2 for an example.
Fig. 2. Black nodes are terminals and white nodes are Steiner nodes. Left: a Steiner
tree for this instance. Middle: the Steiner tree’s edges are partitioned into full com-
ponents; there are four full components. Right: the hyperedges corresponding to these
full components.
Let K be the set of all nonempty subsets of terminals (hyperedges). We as-
sociate with each K ∈ K a fixed full component spanning the terminals in K,
and let C_K be its cost². The problem of finding a minimum-cost Steiner tree
² We choose the minimum-cost full component if there are many. If there is no full
component spanning K, we let C_K be infinity. Such a minimum-cost component can
be found in polynomial time, if |K| is a constant.


in the hypergraph (R, K).
Spanning trees in (normal) graphs are well understood and there are many
different exact LP relaxations for this problem. These exact LP relaxations for
spanning trees in graphs inspire the hypergraphic relaxations for the Steiner tree
problem. Such relaxations have a variable x_K for every³ K ∈ K, and the different
relaxations are based on the constraints used to capture a hyper-spanning tree,
just as constraints on edges are used to capture a spanning tree in a graph.
The oldest hypergraphic LP relaxation is the subtour LP introduced by Warme
[24] which is inspired by Edmonds’ subtour elimination LP relaxation [8] for the
spanning tree polytope. This LP relaxation uses the fact that there are no hyper-
cycles in a hyper-spanning tree, and that it is spanning. More formally, let
ρ(X) := max(0, |X| − 1) be the rank of a set X of vertices. Then a sub-hypergraph
(R, K′) is a hyper-spanning tree iff Σ_{K∈K′} ρ(K) = ρ(R) and Σ_{K∈K′} ρ(K ∩ S) ≤ ρ(S)
for every subset S of R. The corresponding LP relaxation, denoted below as (S), is
called the subtour elimination LP relaxation.

    min Σ_{K∈K} C_K·x_K  over x ∈ R^K_{≥0},  s.t.  Σ_{K∈K} x_K·ρ(K) = ρ(R),   (S)
        Σ_{K∈K} x_K·ρ(K ∩ S) ≤ ρ(S) ∀ S ⊂ R
Warme showed that if the maximum hyperedge size r is bounded by a constant,
the LP can be solved in polynomial time.
The next hypergraphic LP introduced for Steiner tree was a directed hyper-
graph formulation (D), introduced by Polzin and Vahdati Daneshmand [18], and
inspired by the bidirected cut relaxation. Given a full component K and a ter-
minal i ∈ K, let K^i denote the arborescence obtained by directing all the edges
of K towards i. Think of this as directing the hyperedge K towards i to get the
directed hyperedge K^i. Vertex i is called the head of K^i while the terminals in
K \ i are the tails of K^i. The cost of each directed hyperedge K^i is the cost of the
corresponding undirected hyperedge K. In the directed hypergraph formulation,
there is a variable x_{K^i} for every directed hyperedge K^i. As in the bidirected cut
relaxation, there is a vertex r ∈ R which is a root, and as described above, a
subset U ⊆ R of terminals is valid if it does not contain the root but contains
at least one vertex in R. We let Δ^out(U) be the set of directed full components
coming out of U, that is, all K^i such that U ∩ K ≠ ∅ but i ∉ U. Let K⃗ be the
set of all directed hyperedges. We show the directed hypergraph relaxation and
its dual in Figure 3.
Polzin & Vahdati Daneshmand [18] showed that OPT (D) = OPT (S). More-
over they observed that this directed hypergraphic relaxation strengthens the
bidirected cut relaxation.
³ Observe that there could be exponentially many hyperedges. This computational
issue is circumvented by considering hyperedges of size at most r, for some constant
r. By a result of Borchers and Du [1], this leads to only a (1 + Θ(1/ log r)) factor
increase in the optimal Steiner tree cost.
    (D):  min Σ_{K∈K, i∈K} C_K·x_{K^i}  over x ∈ R^{K⃗}_{≥0},  s.t.  Σ_{K^i∈Δ^out(U)} x_{K^i} ≥ 1 ∀ valid U ⊆ R

    (DD): max Σ_U z_U  over z ∈ R^{valid(R)}_{≥0},  s.t.  Σ_{U: K∩U≠∅, i∉U} z_U ≤ C_K ∀ K ∈ K, i ∈ K

Fig. 3. The directed hypergraph relaxation (D) and its dual (DD)
Lemma 1 ([18]). For any instance, OPT (D) ≥ OPT (B). There are instances
for which this inequality is strict.
Könemann et al. [15], inspired by the work of Chopra [6], described a partition-
based relaxation which captures that given any partition of the terminals, any
hyper-spanning tree must have sufficiently many “cross hyperedges”. More for-
mally, a partition, π, is a collection of pairwise disjoint nonempty terminal sets
(π1 , . . . , πq ) whose union equals R. The number of parts q of π is referred to
as the partition’s rank and denoted as r(π). Let ΠR be the set of all partitions
of R. Given a partition π = {π_1, ..., π_q}, define the rank contribution rc^π_K of
hyperedge K ∈ K for π as the rank reduction of π obtained by merging the
parts of π that are touched by K; i.e., rc^π_K := |{i : K ∩ π_i ≠ ∅}| − 1. Then a
hyper-spanning tree (R, K′) must satisfy Σ_{K∈K′} rc^π_K ≥ r(π) − 1. The partition-
based LP of [15] and its dual are given in Figure 4.
    (P):  min Σ_{K∈K} C_K·x_K  over x ∈ R^K_{≥0},  s.t.  Σ_{K∈K} x_K·rc^π_K ≥ r(π) − 1 ∀ π ∈ Π_R

    (PD): max Σ_{π∈Π_R} (r(π) − 1)·y_π  over y ∈ R^{Π_R}_{≥0},  s.t.  Σ_{π∈Π_R} y_π·rc^π_K ≤ C_K ∀ K ∈ K

Fig. 4. The unbounded partition relaxation (P) and its dual (PD)
The feasible region of (P) is unbounded, since if x is a feasible solution for (P)
then so is any x′ ≥ x. We obtain a bounded partition LP relaxation, denoted by
(P′) and shown below, by adding a valid equality constraint to the LP.

    min Σ_{K∈K} C_K·x_K  s.t.  x ∈ (P),  Σ_{K∈K} x_K·(|K| − 1) = |R| − 1   (P′)
2 Uncrossing Partitions
In this section we are interested in uncrossing a minimal set of tight partitions
that uniquely define a basic feasible solution to (P). We start with a few pre-
liminaries necessary to state our result formally.
2.1 Preliminaries
We introduce some needed well-known properties of partitions that arise in com-
binatorial lattice theory [22].
Fig. 5. Illustrations of some partitions. The black dots are the terminal set R. (a): two
partitions; neither refines the other. (b): the meet of the partitions from (a). (c): the
join of the partitions from (a).
Definition 1. We say that a partition π′ refines another partition π if each part
of π′ is contained in some part of π. We also say π coarsens π′. Two partitions
cross if neither refines the other. A family of partitions forms a chain if no pair
of them cross. Equivalently, a chain is any family π¹, π², ..., π^t such that π^i
refines π^{i−1} for each 1 < i ≤ t.
The family Π_R of all partitions of R forms a lattice with a meet operator
∧ : Π_R² → Π_R and a join operator ∨ : Π_R² → Π_R. The meet π ∧ π′ is the coarsest
partition that refines both π and π′, and the join π ∨ π′ is the most refined
partition that coarsens both π and π′. See Figure 5 for an illustration.
Definition 2 (Meet of partitions). Let the parts of π be π_1, ..., π_t and let
the parts of π′ be π′_1, ..., π′_u. Then the parts of the meet π ∧ π′ are the nonempty
intersections of parts of π with parts of π′:

    π ∧ π′ = {π_i ∩ π′_j | 1 ≤ i ≤ t, 1 ≤ j ≤ u, and π_i ∩ π′_j ≠ ∅}.
Given a graph G and a partition π of V (G), we say that G induces π if the parts
of π are the vertex sets of the connected components of G.
Definition 3 (Join of partitions). Let (R, E) be a graph that induces π, and
let (R, E′) be a graph that induces π′. Then the graph (R, E ∪ E′) induces π ∨ π′.
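Both operations are straightforward to compute; the sketch below (our own Python) forms the meet by pairwise intersections and the join by merging overlapping parts, exactly as in Definitions 2 and 3, and reproduces the partitions of Example 1 further down:

    def meet(pi, pi2):
        # coarsest common refinement: nonempty pairwise intersections
        return [p & q for p in pi for q in pi2 if p & q]

    def join(pi, pi2):
        # finest common coarsening: repeatedly merge parts sharing an element
        parts = [set(p) for p in pi] + [set(q) for q in pi2]
        merged = True
        while merged:
            merged = False
            for a in range(len(parts)):
                for b in range(a + 1, len(parts)):
                    if parts[a] & parts[b]:
                        parts[a] |= parts.pop(b)
                        merged = True
                        break
                if merged:
                    break
        return parts

    pi, pi2 = [{1, 2}, {3, 4}], [{1, 3}, {2, 4}]
    print(meet(pi, pi2))   # four singletons: the partition of rank 4
    print(join(pi, pi2))   # one part {1, 2, 3, 4}: the partition of rank 1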
Given a feasible solution x to (P), a partition π is tight if Σ_{K∈K} x_K·rc^π_K =
r(π) − 1. Let tight(x) be the set of all tight partitions. We are interested in
uncrossing this set of partitions. More precisely, we wish to find a cross-free set
of partitions (a chain) which uniquely defines x. One way would be to prove the
following.
Property 1. If two crossing partitions π and π′ are in tight(x), then so are
π ∧ π′ and π ∨ π′.
This type of property is already well-used [9,14,21] for sets (with meets and joins
replaced by unions and intersections respectively), and the standard approach is
the following. The typical proof considers the constraints in (P) corresponding to
π and π′ and uses the "supermodularity" of the RHS and the "submodularity"
of the coefficients in the LHS. In particular, if the following is true,

    ∀ π, π′:  r(π ∨ π′) + r(π ∧ π′) ≥ r(π) + r(π′)   (1)

    ∀ K, π, π′:  rc^π_K + rc^{π′}_K ≥ rc^{π∨π′}_K + rc^{π∧π′}_K   (2)

then Property 1 can be proved easily by writing a string of inequalities.⁴
Inequality (1) is indeed true (see, for example, [22]), but unfortunately in-
equality (2) is not true in general, as the following example shows.
Example 1. Let R = {1, 2, 3, 4}, π = {{1, 2}, {3, 4}} and π′ = {{1, 3}, {2, 4}}.
Let K denote the full component {1, 2, 3, 4}. Then rc^π_K + rc^{π′}_K = 1 + 1 < 0 + 3 =
rc^{π∨π′}_K + rc^{π∧π′}_K.
Nevertheless, Property 1 is true; its correct proof is given in the full version of
this paper [4] and depends on a simple though subtle extension of the usual
approach. The crux of the insight needed to fix the approach is not to consider
pairs of constraints in (P), but rather multi-sets which may contain more than
two inequalities. Using this uncrossing result, we can prove the following theorem
(details are given in [4]). Here, we let π̄ denote {R}, the unique partition with
(minimal) rank 1; later we use π̲ to denote {{r} | r ∈ R}, the unique partition
with (maximal) rank |R|.
Theorem 1. Let x* be a basic feasible solution of (P), and let C be an inclusion-
wise maximal chain in tight(x*) \ {π̄}. Then x* is uniquely defined by

    Σ_{K∈K} rc^π_K·x*_K = r(π) − 1  ∀ π ∈ C.   (3)
Any chain of distinct partitions of R that does not contain π̄ has size at most
|R| − 1, and this is an upper bound on the rank of the system in (3). Elementary
linear programming theory immediately yields the following corollary.
Corollary 1. Any basic solution x∗ of (P) has at most |R| − 1 non-zero
coordinates.
3 Equivalence of Formulations

In this section we describe our equivalence results. A summary of the known and
new results is given in Figure 6.
For lack of space, we present only sketches for our main equivalence results in
this extended abstract, and refer the reader to [4] for details.
⁴ In this hypothetical scenario we get r(π) + r(π′) − 2 = Σ_K x_K·(rc^π_K + rc^{π′}_K) ≥
Σ_K x_K·(rc^{π∧π′}_K + rc^{π∨π′}_K) ≥ r(π ∧ π′) + r(π ∨ π′) − 2 ≥ r(π) + r(π′) − 2; thus the
inequalities hold with equality, and the middle one shows π ∧ π′ and π ∨ π′ are tight.
    OPT(P) = OPT(P′) [Thm. 2];  OPT(P′) = OPT(D) [4];  OPT(D) ≥ OPT(B) [Lemma 1, [18]]
    OPT(P′) = OPT(S) [Thm. 3];  OPT(D) = OPT(S) [18];  OPT(D) ≤ OPT(B) in quasibipartite instances [Thm. 4]

Fig. 6. Summary of relations among various LP relaxations
Theorem 2. The LPs (P′) and (P) have the same optimal value.

Proof sketch. To show this, it suffices to find an optimum solution of (P) which
satisfies the equality in (P′); i.e., we want to find a solution for which the
maximal-rank partition π̲ is tight. We pick the optimum solution to (P) which
minimizes the sum Σ_{K∈K} x_K·|K|. Using Property 1, we show that either π̲ is
tight or there is a shrinking operation which decreases Σ_{K∈K} x_K·|K| without
increasing the cost. Since the latter is impossible, the theorem is proved.
Theorem 3. The feasible regions of (P′) and (S) are the same.

Proof sketch. We show that the inequalities defining (P′) are valid for (S), and
vice versa. Note that both have the same equality and non-negativity constraints.
To show that the partition inequality of (P′) for π holds for any x ∈ (S), we
use the subtour inequalities in (S) for every part of π. For the other direction,
given any subset S ⊆ R, we invoke the inequality in (P′) for the partition
π := {S} ∪ {{r} | r ∈ R \ S}, i.e., S as one part and the remaining terminals as
singletons.
Theorem 4. On quasibipartite Steiner tree instances, OPT (B) ≥ OPT (D).
Proof sketch. We look at the duals of the two LPs and we show OPT (BD ) ≥
OPT (DD ) in quasibipartite instances. Recall that the support of a solution to
(DD ) is the family of sets with positive zU . A family of sets is called laminar if
for any two of its sets A, B we have A ⊆ B, B ⊆ A, or A ∩ B = ∅. The following
fact follows along the standard line of “set uncrossing” argumentation.
Lemma 2. There is an optimal solution to (DD ) with laminar support.
Given the above result, we may now assume that we have a solution z to (DD )
whose support is laminar. The heart of the proof of Theorem 4 is to show that
z can be converted into a feasible solution to (BD ) of the same value.
Comparing (DD ) and (BD ) one first notes that the former has a variable for
every valid subset of the terminals, while the latter assigns values to all valid
subsets of the entire vertex set. We say that an edge uv is satisfied for a candidate
solution z, if both (a) Σ_{U: u∈U, v∉U} z_U ≤ c_uv and (b) Σ_{U: v∈U, u∉U} z_U ≤ c_uv hold;
z is then feasible for (BD) if all edges are satisfied.
Let z be a feasible solution to (DD ). One easily verifies that all terminal-
terminal edges are satisfied. On the other hand, terminal-Steiner edges may
initially not be satisfied; e.g., consider the Steiner vertex v and its neighbours
depicted in Figure 7 below. Initially, none of the sets in z’s support contains
v, and the load on the edges incident to v is quite skewed: the left-hand side
of condition a) above may be large, while the left-hand side of condition b) is
initially 0.
To construct a valid solution for (BD), we therefore lift the initial value z_S of
each terminal subset S to supersets of S, by adding Steiner vertices. The lifting
procedure processes each Steiner vertex v one at a time; when processing v, we
change z by moving dual from some sets U to U ∪ {v}. Such a dual transfer
decreases the left-hand side of condition (a) for edge uv, and increases the
(initially 0) left-hand sides of condition (b) for edges connecting v to neighbours
other than u. (Fig. 7 illustrates lifting the variable z_U.)
We are able to show that there is a way of
carefully lifting duals around v that ensures that all edges incident to v become
satisfied. The definition of our procedure will ensure that these edges remain
satisfied for the rest of the lifting procedure. Since there are no Steiner-Steiner
edges, all edges will be satisfied once all Steiner vertices are processed.
Throughout the lifting procedure, we will maintain that z remains unchanged
when projected to the terminals. The main consequence of this is that the ob-
jective value Σ_{U⊆V} z_U remains constant throughout, and the objective value of
z in (BD) is not affected by the lifting. This yields Theorem 4.

4 Improved Integrality Gap Upper Bounds

In this extended abstract, we show the improved bound of 73/60 for uniformly quasibipartite graphs and, due to space restrictions, we only show the weaker (2√2 − 1) ≈ 1.828 upper bound on general graphs.

4.1 Uniformly Quasibipartite Instances


Uniformly quasibipartite instances of the Steiner tree problem are quasibipartite graphs in which all edges incident to a given Steiner vertex have the same cost. They were first studied by Gröpl et al. [13], who gave a 73/60-factor approximation algorithm. We start by describing the algorithm of Gröpl et al. [13] in terms of full components. A collection K of full components is acyclic if there is no list of t > 1 distinct terminals and hyperedges in K of the form r_1 ∈ K_1 ∋ r_2 ∈ K_2 ∋ · · · ∋ r_t ∈ K_t ∋ r_1; i.e., there are no hypercycles.
Procedure RatioGreedy
1: Initialize the acyclic collection of full components L to ∅.
2: Let L∗ be a minimizer of C_L/(|L| − 1) over all full components L such that |L| ≥ 2 and L ∪ {L} is acyclic.
3: Add L∗ to L.
4: Continue until (R, L) is a hyper-spanning tree and return L.
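
A natural way to implement the acyclicity test in step 2 is a union-find structure over the terminals: adding a full component L keeps the collection acyclic exactly when the terminals of L lie in |L| distinct connected components of the hyperforest built so far. The following Python sketch is our own illustration of RatioGreedy under this test, not the authors' implementation; it assumes the full components are given as (terminal set, cost) pairs and makes no attempt at efficiency.

class DSU:
    """Union-find over terminals, used to test whether adding a full
    component would close a hypercycle."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:        # path compression
            self.parent[x], x = root, self.parent[x]
        return root
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def ratio_greedy(full_components):
    """full_components: list of (frozenset_of_terminals, cost) pairs.
    Returns the greedily chosen acyclic collection L."""
    dsu, chosen = DSU(), []
    remaining = [(K, c) for K, c in full_components if len(K) >= 2]
    while remaining:
        # A component is addable iff its terminals lie in pairwise
        # distinct components, i.e., the collection stays a hyperforest.
        addable = [(c / (len(K) - 1), K, c) for K, c in remaining
                   if len({dsu.find(t) for t in K}) == len(K)]
        if not addable:  # (R, L) is a hyper-spanning tree if G is connected
            break
        _, K, c = min(addable, key=lambda t: t[0])  # minimize C_L/(|L| - 1)
        chosen.append((K, c))
        ts = list(K)
        for t in ts[1:]:
            dsu.union(ts[0], t)
        remaining.remove((K, c))
    return chosen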

Theorem 5. On a uniformly quasibipartite instance RatioGreedy returns a Steiner tree of cost at most (73/60) OPT(P).

Proof sketch. Let t denote the number of iterations and L := {L_1, . . . , L_t} be the ordered sequence of full components obtained. We define a dual solution y to (P_D) as follows. Let π(i) denote the partition induced by the connected components of {L_1, . . . , L_i}. Let θ(i) denote C_{L_i}/(|L_i| − 1) and note that θ is nondecreasing. Define θ(0) = 0 for convenience. We set

    y_{π(i)} = θ(i + 1) − θ(i)

for 0 ≤ i < t, and all other coordinates of y to zero. It is straightforward to verify that the objective value ∑_i y_{π(i)} (r(π(i)) − 1) of y in (P_D) equals C(L).
The key is to show that for all K ∈ K,

    ∑_i y_{π(i)} rc_K^{π(i)} ≤ (|K| − 1 + H(|K| − 1))/|K| · C_K ,   (4)

where H denotes the harmonic series, H(k) = ∑_{i=1}^{k} 1/i; this is obtained by using the greedy nature of the algorithm and the fact that, in uniformly quasi-bipartite graphs, C_{K′} ≤ C_K |K′|/|K| whenever K′ ⊂ K. Now, (|K| − 1 + H(|K| − 1))/|K| is always at most 73/60. Thus (4) implies that (60/73) · y is a feasible dual solution, which completes the proof.

4.2 General Graphs


For conciseness we let a “graph” be a triple G = (V, E, R), where R ⊂ V is the set of G's terminals. In the following, we let mtst(G; c) denote the minimum terminal spanning tree, i.e., the minimum spanning tree of the terminal-induced subgraph G[R] under edge costs c : E → ℝ. We abuse notation and let mtst(G; c) mean both the tree and its cost under c.
When contracting an edge uv in a graph, the new merged node resulting from
contraction is defined to be a terminal iff at least one of u or v was a terminal;
this is natural since a Steiner tree in the new graph is a minimal set of edges
which, together with uv, connects all terminals in the old graph. Our algorithm
performs contraction, which may introduce parallel edges, but one may delete
all but the cheapest edge from each parallel class without affecting the analysis.
Our algorithm proceeds in stages. In each stage we apply the operation G →
G/K which denotes contracting all edges in some full component K. To describe
and analyze the algorithm we introduce some notation. For a minimum terminal spanning tree T = mtst(G; c) define drop_T(K; c) := c(T) − mtst(G/K; c). We also define gain_T(K; c) := drop_T(K; c) − c(K), where c(K) is the cost of full component K. A tree T is called gainless if for every full component K we have gain_T(K; c) ≤ 0. The following useful fact is implicit in [15] (see also [4]).

Theorem 6 (Implicit in [15]). If mtst(G; c) is gainless, then OPT (P) equals


the cost of mtst(G; c).

We now give the algorithm and its analysis, which uses a reduced-cost trick introduced by Chakrabarty et al. [3].

Procedure Reduced One-Pass Heuristic
1: Define costs c′ by c′_e := c_e/√2 for all terminal-terminal edges e, and c′_e := c_e for all other edges. Let G_1 := G, T_1 := mtst(G_1; c′), and i := 1.
2: The algorithm considers the full components in any order. When we examine a full component K, if gain_{T_i}(K; c′) > 0, let K_i := K, G_{i+1} := G_i/K_i, T_{i+1} := mtst(G_{i+1}; c′), and i := i + 1.
3: Let f be the final value of i. Return the tree T_alg := T_f ∪ ⋃_{i=1}^{f−1} K_i.

Note that the full components are scanned in an arbitrary order and need not all be known in advance. Hence the algorithm works just as well if the full components arrive “online,” which might be useful for some applications.
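
The following Python sketch simulates the heuristic. It is our own illustration, not the authors' implementation: it assumes networkx is available, that original costs are stored in edge attribute 'c', that full components are supplied as (node set, cost) pairs (terminals plus Steiner vertices of the component), and that the terminal-induced subgraph stays connected so that mtst is well defined.

import math
import networkx as nx

def mtst(G, terminals, key="cr"):
    """Minimum terminal spanning tree: an MST of the subgraph induced
    by the terminals; returns (tree, cost)."""
    T = nx.minimum_spanning_tree(G.subgraph(terminals), weight=key)
    return T, T.size(weight=key)

def contract(G, nodes, into, key="cr"):
    """Contract `nodes` into the single node `into`, keeping only the
    cheapest edge of each parallel class (as in the text)."""
    mapping = {x: (into if x in nodes else x) for x in G}
    H = nx.Graph()
    H.add_nodes_from(set(mapping.values()))
    for u, v, d in G.edges(data=True):
        mu, mv = mapping[u], mapping[v]
        if mu == mv:
            continue
        if not H.has_edge(mu, mv) or d[key] < H[mu][mv][key]:
            H.add_edge(mu, mv, **d)
    return H

def reduced_one_pass(G, terminals, full_components):
    """Sketch of the Reduced One-Pass Heuristic."""
    G, terms = G.copy(), set(terminals)
    for u, v, d in G.edges(data=True):   # step 1: reduced costs c'
        both = u in terms and v in terms
        d["cr"] = d["c"] / math.sqrt(2) if both else d["c"]
    chosen = []
    _, cur = mtst(G, terms)
    for nodes, cost in full_components:  # step 2: any order, even online
        rep = min(nodes & terms)         # the merged node stays a terminal
        H = contract(G, nodes, rep)
        new_terms = (terms - nodes) | {rep}
        _, contracted = mtst(H, new_terms)
        # gain(K; c') = drop(K; c') - c'(K); here c'(K) = c(K) because a
        # full component contains no terminal-terminal edges.
        if (cur - contracted) - cost > 0:
            chosen.append(nodes)
            G, terms, cur = H, new_terms, contracted
    return chosen  # T_alg is the final mtst plus the chosen components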

Theorem 7. c(T_alg) ≤ (2√2 − 1) OPT(P).

Proof. First we claim that gain_{T_f}(K; c′) ≤ 0 for all K. To see this there are two cases. If K = K_i for some i, then we immediately see that drop_{T_j}(K) = 0 for all j > i, so gain_{T_f}(K) = −c(K) ≤ 0. Otherwise (if K ≠ K_i for all i), K had nonpositive gain when examined by the algorithm, and the well-known contraction lemma (e.g., see [12, §1.5]) immediately implies that gain_{T_i}(K) is nonincreasing in i, so gain_{T_f}(K) ≤ 0.
By Theorem 6, c′(T_f) equals the value of (P) on the graph G_f with costs c′. Since c′ ≤ c, and since at each step we only contract terminals, the value of this optimum must be at most OPT(P). Using the fact that c(T_f) = √2 · c′(T_f), we get

    c(T_f) = √2 · c′(T_f) ≤ √2 · OPT(P).   (5)

Furthermore, for every i we have gain_{T_i}(K_i; c′) > 0, that is, drop_{T_i}(K_i; c′) > c′(K_i) = c(K_i). The equality follows since K_i contains no terminal-terminal edges. However, drop_{T_i}(K_i; c′) = (1/√2) drop_{T_i}(K_i; c) because all edges of T_i are terminal-terminal. Thus, we get for every i = 1 to f − 1, drop_{T_i}(K_i; c) > √2 · c(K_i).
Since drop_{T_i}(K_i; c) := mtst(G_i; c) − mtst(G_{i+1}; c), we have

    ∑_{i=1}^{f−1} drop_{T_i}(K_i; c) = mtst(G; c) − c(T_f).
Thus, we have

    ∑_{i=1}^{f−1} c(K_i) ≤ (1/√2) ∑_{i=1}^{f−1} drop_{T_i}(K_i; c) = (1/√2) (mtst(G; c) − c(T_f))
                        ≤ (1/√2) (2 OPT(P) − c(T_f)),

where we use the fact that mtst(G; c) is at most twice OPT(P).⁵ Therefore

    c(T_alg) = c(T_f) + ∑_{i=1}^{f−1} c(K_i) ≤ (1 − 1/√2) c(T_f) + √2 · OPT(P).

Finally, using c(T_f) ≤ √2 · OPT(P) from (5), the proof of Theorem 7 is complete.

References
1. Borchers, A., Du, D.: The k-Steiner ratio in graphs. SIAM J. Comput. 26(3), 857–
869 (1997)
2. Byrka, J., Grandoni, F., Rothvoß, T., Sanità, L.: An improved LP-based approxi-
mation for Steiner tree. In: Proc. 42nd STOC (to appear 2010)
3. Chakrabarty, D., Devanur, N.R., Vazirani, V.V.: New geometry-inspired relax-
ations and algorithms for the metric Steiner tree problem. In: Lodi, A., Panconesi,
A., Rinaldi, G. (eds.) IPCO 2008. LNCS, vol. 5035, pp. 344–358. Springer, Heidel-
berg (2008)
4. Chakrabarty, D., Könemann, J., Pritchard, D.: Hypergraphic LP relaxations for
Steiner trees. Technical Report 0910.0281, arXiv (2009)
5. Chlebík, M., Chlebíková, J.: Approximation hardness of the Steiner tree problem
on graphs. In: Penttonen, M., Schmidt, E.M. (eds.) SWAT 2002. LNCS, vol. 2368,
pp. 170–179. Springer, Heidelberg (2002)
6. Chopra, S.: On the spanning tree polyhedron. Operations Research Letters 8, 25–29
(1989)
7. Edmonds, J.: Optimum branchings. Journal of Research of the National Bureau of
Standards B 71B, 233–240 (1967)
8. Edmonds, J.: Matroids and the greedy algorithm. Math. Programming 1, 127–136
(1971)
9. Edmonds, J., Giles, R.: A min-max relation for submodular functions on graphs.
Annals of Discrete Mathematics 1, 185–204 (1977)
10. Goemans, M.X.: The Steiner tree polytope and related polyhedra. Math. Pro-
gram. 63(2), 157–182 (1994)
11. Goemans, M.X., Myung, Y.: A catalog of Steiner tree formulations. Networks 23,
19–28 (1993)
12. Gröpl, C., Hougardy, S., Nierhoff, T., Prömel, H.J.: Approximation algorithms for
the Steiner tree problem in graphs. In: Cheng, X., Du, D. (eds.) Steiner trees in
industries, pp. 235–279. Kluwer Academic Publishers, Norwell (2001)
⁵ This follows using standard arguments, and can be seen, for instance, by applying Theorem 6 to the cost function with all terminal-terminal costs divided by 2, and using short-cutting.
13. Gröpl, C., Hougardy, S., Nierhoff, T., Prömel, H.J.: Steiner trees in uniformly
quasi-bipartite graphs. Inform. Process. Lett. 83(4), 195–200 (2002); Preliminary
version appeared as a Technical Report at TU Berlin (2001)
14. Jain, K.: A factor 2 approximation algorithm for the generalized Steiner network
problem. Combinatorica 21(1), 39–60 (2001); Preliminary version appeared in Proc.
39th FOCS, pp. 448–457 (1998)
15. Könemann, J., Pritchard, D., Tan, K.: A partition-based relaxation for Steiner
trees. Math. Programming (2009) (in press)
16. Polzin, T.: Algorithms for the Steiner Problem in Networks. PhD thesis, Universität
des Saarlandes (February 2003)
17. Polzin, T., Vahdati Daneshmand, S.: A comparison of Steiner tree relaxations. Dis-
crete Applied Mathematics 112(1-3), 241–261 (2001); Preliminary version appeared
at COS 1998 (1998)
18. Polzin, T., Vahdati Daneshmand, S.: On Steiner trees and minimum spanning trees
in hypergraphs. Oper. Res. Lett. 31(1), 12–20 (2003)
19. Rajagopalan, S., Vazirani, V.V.: On the bidirected cut relaxation for the metric
Steiner tree problem. In: Proceedings of ACM-SIAM Symposium on Discrete Al-
gorithms, pp. 742–751 (1999)
20. Robins, G., Zelikovsky, A.: Tighter bounds for graph Steiner tree approximation.
SIAM J. Discrete Math. 19(1), 122–134 (2005); Preliminary version appeared as
Improved Steiner tree approximation in graphs at SODA 2000 (2000)
21. Singh, M., Lau, L.C.: Approximating minimum bounded degree spanning trees to
within one of optimal. In: Proc. 39th STOC, pp. 661–670 (2007)
22. Stanley, R.P.: Enumerative Combinatorics, vol. 1. Wadsworth & Brooks/Cole
(1986)
23. Vazirani, V.: Recent results on approximating the Steiner tree problem and its
generalizations. Theoret. Comput. Sci. 235(1), 205–216 (2000)
24. Warme, D.: Spanning Trees in Hypergraphs with Applications to Steiner Trees.
PhD thesis, University of Virginia (1998)
25. Wong, R.T.: A dual ascent approach for Steiner tree problems on a directed graph.
Math. Programming 28, 271–287 (1984)
Efficient Deterministic Algorithms for Finding a
Minimum Cycle Basis in Undirected Graphs

Edoardo Amaldi^1, Claudio Iuliano^1, and Romeo Rizzi^2

^1 Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
{amaldi,iuliano}@elet.polimi.it
^2 Dipartimento di Matematica e Informatica, Università degli Studi di Udine, Italy
[email protected]

Abstract. We consider the problem of, given an undirected graph G with a nonnegative weight on each edge, finding a basis of the cycle space of G of minimum total weight, where the total weight of a basis is the sum of the weights of its cycles. Minimum cycle bases are of interest in a variety of fields. In [13] Horton proposed a first polynomial-time algorithm, in which a minimum cycle basis is extracted from a polynomial-size subset of candidate cycles in O(m^3 n) by using Gaussian elimination. In a different approach, due to de Pina [7] and refined in [15], the cycles of a minimum cycle basis are determined sequentially in O(m^2 n + mn^2 log n). A more sophisticated hybrid algorithm proposed in [18] has the best worst-case complexity of O(m^2 n/log n + mn^2).
In this work we revisit Horton's and de Pina's approaches and we propose a simple hybrid algorithm which improves the worst-case complexity to O(m^2 n/log n). We also present a very efficient related algorithm that relies on an adaptive independence test à la de Pina. Computational results on a wide set of instances show that the latter algorithm outperforms the previous algorithms by one or two orders of magnitude on medium-size instances and makes it possible to solve instances with up to 3000 vertices in a reasonable time.

1 Introduction

Let G = (V, E) be an undirected graph with n = |V | vertices and m = |E| edges.


Assume w.l.o.g. that G is simple, that is, without loops and multiple edges. An
elementary cycle is a connected subset of edges such that all incident vertices
have degree 2. A cycle C is a (possibly empty) subset of edges such that every
vertex of V is incident to an even number of edges in C. Cycles can be viewed
as the (possibly empty) union of edge-disjoint elementary cycles. Each cycle C
can be represented by an edge incidence vector in {0, 1}^m, in which the component corresponding to edge e equals 1 precisely when e ∈ C. The composition of two cycles C_1 and C_2, denoted by C_1 ⊕ C_2, is defined as the symmetric difference of the two edge subsets, i.e., (C_1 ∪ C_2) \ (C_1 ∩ C_2), or equivalently as the sum modulo 2 of their edge incidence vectors. For any undirected graph G, the edge incidence vectors of all the cycles, including the null cycle, form a vector space over GF(2), called the cycle space. If G is connected, the dimension of this space is ν = m − n + 1. Since a cycle basis of G is the union of the cycle bases of the connected components of G, we assume w.l.o.g. that G is connected. A maximal set of linearly independent cycles is called a cycle basis.
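
Since everything that follows manipulates cycles as vectors over GF(2), it may help to see the operations concretely. In the Python sketch below (our own encoding, one bit per edge, not taken from the paper), composition is bitwise XOR, and linear independence is tested by incremental Gaussian elimination.

def cycle_mask(cycle_edges, edge_index):
    """Encode a cycle (iterable of edges) as an integer bitmask, one
    bit per edge: the incidence vector in GF(2)^m."""
    mask = 0
    for e in cycle_edges:
        mask |= 1 << edge_index[e]
    return mask

def compose(c1, c2):
    """C1 (+) C2: symmetric difference = addition mod 2 = bitwise XOR."""
    return c1 ^ c2

def independent(cycle_masks):
    """Linear independence over GF(2) via incremental Gaussian
    elimination; the basis keeps one vector per leading bit."""
    basis = {}                       # leading-bit position -> vector
    for c in cycle_masks:
        while c:
            h = c.bit_length() - 1   # position of the leading 1
            if h not in basis:
                basis[h] = c
                break
            c ^= basis[h]            # eliminate the leading bit
        if c == 0:
            return False             # dependent on earlier cycles
    return True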
We consider the following combinatorial optimization problem known as the
minimum cycle basis problem.
Min CB: Given an undirected connected graph G with a nonnegative weight w_e assigned to each edge e ∈ E, find a cycle basis C of minimum total weight, i.e., which minimizes w(C) = ∑_{i=1}^{ν} w(C_i), where w(C_i) = ∑_{e∈C_i} w_e.
Cycle bases with small total weight are of interest in a variety of fields including
electrical networks [4], periodic event scheduling [7], chemistry and biochem-
istry [10]. For instance, in testing electrical circuits the problem arises when one wishes to check as quickly as possible that Kirchhoff's law is satisfied along all loops of the circuit. Variants of Min CB where the cycle bases are restricted to have a cer-
tain structure (e.g. to be weakly fundamental [20] and strictly fundamental [8],
integral or totally unimodular [16]) also arise in practice. The reader is referred
to the recent survey [14].
Early references to Min CB go back to the Sixties (see e.g. [21,22], in Russian), and the first polynomial-time algorithm was proposed by Horton only in 1987 [13]. A different approach was presented in de Pina's Ph.D. thesis [7] and improved to O(m^2 n + mn^2 log n) in [15]. A hybrid algorithm in O(m^2 n^2) was proposed in [17]. Currently, the best deterministic algorithm is the one described in [18] by Mehlhorn and Michail, with an O(m^2 n/log n + mn^2) worst-case complexity. In joint work with Jurkiewicz and Mehlhorn [1] we have recently presented the best randomized algorithms for undirected and directed graphs, and improved the deterministic algorithm for planar graphs of [12].
In this paper we are concerned with deterministic algorithms for Min CB. Most of the previous work aims at reducing the worst-case complexity, and little attention has been devoted so far to evaluating the practical performance of these algorithms. The only detailed computational work we are aware of is [17].
We revisit Horton's and de Pina's approaches and propose a hybrid algorithm which improves the best worst-case complexity to O(m^2 n/log n). This is achieved by restricting attention to the so-called isometric cycles, which were mentioned in [13] but whose power was first exploited by us in [1], and by using a bit-packing technique appropriately combined with the divide-and-conquer framework described in [15]. We also present a very efficient algorithm that focuses on isometric cycles and is based on an adaptive independence test à la de Pina, which outperforms in practice all previous algorithms on a set of benchmark instances. In particular, it is faster by one or two orders of magnitude on medium-size instances and makes it possible to solve instances with up to 3000 vertices in a reasonable time.
2 Previous Work

Before presenting our improved algorithms, we summarize the main algorithms in the literature.
Horton's polynomial-time algorithm [13] proceeds as follows. First, all-pairs shortest (minimum-weight) paths p_{uv}, for all u, v ∈ V, are computed in O(nm + n^2 log n), with uniqueness ensured by a lexicographic ordering on the vertices [12,1,14]. For any vertex x, let T_x denote the shortest-path tree rooted at x. The key observation is that an O(nm)-size set of candidate cycles H, which we refer to as the Horton set, is guaranteed to contain a minimum cycle basis. For each possible pair of a vertex x and an edge [u, v] not in T_x, the set H contains the candidate cycle formed by the two shortest paths p_{xu} and p_{xv} plus the edge [u, v] itself. Degenerate cases where p_{xu} and p_{xv} share some edges are not considered. Since the number of such pairs is at most nν, all candidate cycles in H can be generated in O(nm). Since all cycles in a graph form a matroid [4], a greedy procedure can be applied: cycles are sorted by nondecreasing weight in O(nm log n) and the ν lightest independent cycles are extracted by an independence test. The latter step, which is performed via Gaussian elimination in O(m^3 n), is the bottleneck, so this is also the overall complexity of the algorithm. In [11], the complexity is reduced to O(m^ω), where ω is the exponent of fast matrix multiplication and ω < 2.376 [6].
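
For concreteness, the generation of H can be sketched as follows. This Python fragment is our own illustration (assuming networkx), not the authors' code: the lexicographic tie-breaking that guarantees uniqueness of shortest paths is omitted, and duplicate cycles are not removed here.

import networkx as nx

def horton_set(G, weight="weight"):
    """Generate Horton candidate cycles: for each vertex x and each
    edge [u, v] outside the shortest-path tree T_x, the cycle formed by
    p_xu, the edge [u, v], and p_vx. Degenerate candidates whose two
    paths share an edge are skipped."""
    candidates = []
    for x in G.nodes:
        dist, path = nx.single_source_dijkstra(G, x, weight=weight)
        tree_edges = {frozenset(e) for p in path.values()
                      for e in zip(p, p[1:])}
        for u, v in G.edges:
            if u not in path or v not in path:
                continue
            if frozenset((u, v)) in tree_edges:
                continue                      # edge belongs to T_x
            ed_u = set(map(frozenset, zip(path[u], path[u][1:])))
            ed_v = set(map(frozenset, zip(path[v], path[v][1:])))
            if ed_u & ed_v:
                continue                      # degenerate: shared edges
            cycle = ed_u | ed_v | {frozenset((u, v))}
            w = dist[u] + dist[v] + G[u][v][weight]
            candidates.append((w, cycle))
    return candidates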
A different approach is proposed by de Pina in [7]. The cycles of a minimum cycle basis are determined sequentially, and linear independence is guaranteed by maintaining at each step a basis of the linear space orthogonal to the subspace spanned by the cycles selected so far. The vectors of this basis are referred to as witnesses. Given an arbitrary spanning tree T of G and the corresponding set E_T = {e_1, . . . , e_ν} of co-tree edges, the initial basis {S_1, . . . , S_ν} is the standard basis of GF(2)^{E_T}, where each S_j, for 1 ≤ j ≤ ν, has only the e_j-th component equal to 1. Any cycle of G can be viewed as a restricted incidence vector in GF(2)^{E_T}, and it is easy to verify that linear independence of the restricted vectors is equivalent to linear independence of the full incidence vectors. At the beginning of the i-th iteration, the basis consists of linearly independent vectors S_i, . . . , S_ν such that ⟨C_l, S_j⟩ = 0, for i ≤ j ≤ ν and 1 ≤ l ≤ i − 1, where C_1, . . . , C_{i−1} are the cycles already selected. Given the current set of witnesses {S_i, . . . , S_ν}, the cycle C_i is determined as the lightest cycle C in the graph G such that ⟨C, S_i⟩ = 1. This can be done in O(nm + n^2 log n) by n shortest-path computations in an appropriate graph. Since C_i is not orthogonal to S_i, C_i is linearly independent from C_1, . . . , C_{i−1}. The optimality of the resulting cycle basis is guaranteed by the fact that such a lightest cycle C is selected at each step. The set of witnesses {S_{i+1}, . . . , S_ν} can then be updated in O(m^2) to become orthogonal to C_i (while preserving the orthogonality w.r.t. {C_1, . . . , C_{i−1}}) by setting S_j := S_j ⊕ S_i if ⟨C_i, S_j⟩ = 1, for each S_j with i + 1 ≤ j ≤ ν. By operating recursively on bulks of witnesses and exploiting fast matrix multiplication, all witness updates can be carried out in O(m^ω) and the complexity of the overall algorithm can be reduced from O(m^3 + mn^2 log n) to O(m^2 n + mn^2 log n) [15].
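
The witness mechanics translate directly into code. In the sketch below (our illustration, with witnesses and cycles encoded as integer bitmasks over the co-tree edges), the search for the lightest cycle non-orthogonal to S_i, i.e., the n shortest-path computations mentioned above, is abstracted into a callback lightest_nonorthogonal.

def depina(witnesses, lightest_nonorthogonal):
    """De Pina's scheme on bitmask-encoded vectors.

    witnesses: list of int bitmasks S_1, ..., S_nu (restricted to
    co-tree edges). lightest_nonorthogonal(S): returns the lightest
    cycle C, as a bitmask, with <C, S> = 1 (oracle not shown here)."""
    basis = []
    while witnesses:
        s_i = witnesses[0]
        c_i = lightest_nonorthogonal(s_i)
        basis.append(c_i)
        rest = []
        for s_j in witnesses[1:]:
            # <C_i, S_j> over GF(2) = parity of the AND of the bitmasks.
            if bin(c_i & s_j).count("1") % 2 == 1:
                s_j ^= s_i            # S_j := S_j (+) S_i
            rest.append(s_j)
        witnesses = rest
    return basis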
A hybrid algorithm that combines the two previous approaches is presented in [17]. First, the Horton candidate cycles are generated and sorted by nondecreasing weight. Then a variant of de Pina's scheme is applied where, at the i-th iteration, the search for the lightest cycle C_i is performed within H. The Horton cycles are tested in order of nondecreasing weight until one non-orthogonal to S_i is found. Since there are at most nν cycles, each with at most n edges, each iteration takes O(mn^2), for a total of O(m^2 n^2). This dominates the O(m^ω) improved witness update and is also the overall complexity of the algorithm.
A hybrid variant with two major differences is proposed by Mehlhorn and Michail in [18]. Horton candidate cycles are generated only for pairs of a vertex x and an edge [u, v] with x in a close-to-minimum Feedback Vertex Set (FVS), that is, a set of vertices that intersects every cycle. Such an FVS-based set of candidate cycles is still guaranteed to contain a minimum cycle basis. Although finding a minimum FVS is NP-hard, an approximate solution within a factor 2 can be obtained in polynomial time [3]. A simple way to extract a minimum cycle basis from the resulting set of candidate cycles then leads to an overall O(m^2 n) complexity, which can be further reduced to O(m^2 n/log n + mn^2) by using a bit-packing trick.

3 Improved Deterministic Algorithms


As was the case in [1], an important step towards improved deterministic algorithms is to restrict attention to a subset of the Horton candidate cycles, referred to as isometric cycles. A cycle C is isometric if for any two vertices u and v on C the path p_{uv} is in C, i.e., if C cannot be obtained as the composition of two cycles of smaller weight. The set of isometric cycles, denoted by I, is a subset of H and still contains a minimum cycle basis. A cycle in H can be generated from different pairs of a vertex x and an edge [u, v]. Each pair gives rise to a different representation (x, [u, v]) of the same cycle, which we denote by C(x, [u, v]). Clearly, the number of representations of any cycle C cannot exceed the number of vertices, and hence of edges, in C. As observed in [1], the isometric cycles are exactly those whose number of representations in H equals their number of edges. Although the resulting reduced set I of candidate cycles is still of cardinality O(nm), it has the following simple but fundamental property.

Sparseness property: The total number of edges (nonzero components in the corresponding incidence vectors) over all isometric cycles is at most nν.

In [1] we also describe an efficient O(nm) procedure that detects a single representation of each isometric cycle without explicitly constructing the non-isometric cycles.
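
This characterization yields a direct, if inefficient, way to isolate the isometric cycles: count the representations of each Horton candidate and keep those with exactly one representation per edge. The Python fragment below (our illustration, reusing the (weight, cycle) candidate format of the earlier sketch) does exactly that; the O(nm) procedure of [1] achieves the same without explicit counting.

from collections import Counter

def isometric_cycles(candidates):
    """candidates: list of (weight, cycle) pairs, possibly containing
    several representations of the same cycle. Keeps one copy of each
    cycle whose representation count equals its number of edges."""
    reps = Counter(frozenset(cycle) for _, cycle in candidates)
    seen, result = set(), []
    for w, cycle in candidates:
        key = frozenset(cycle)
        if key in seen:
            continue
        if reps[key] == len(cycle):  # one representation per edge
            seen.add(key)
            result.append((w, cycle))
    result.sort(key=lambda t: t[0])  # nondecreasing weight
    return result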
It is interesting to point out that the concept of isometric cycle encompasses
other previous attempts to reduce the size of the set of candidate cycles H.
In [13] Horton suggests removing, for each cycle C, the redundant representations (which yield duplicate candidate cycles) from H by keeping only the one generated

Fig. 1. (a) A small weighted graph with n = 5, m = 8, and ν = 4. (b) T1 in
bold, with C(1, [2, 5]) and C(1, [4, 5]) of weight 7, C(1, [2, 3]) and C(1, [3, 4]) of weight
10; T2 , T3 , and T4 are identical to T1 up to rotations and lead to similar cycles. (c) T5
in bold, with the 4 inner triangles of weight 7 which are the only isometric cycles. H
without duplicates contains 7 cycles: the 4 isometric cycles plus C(1, [2, 5]), C(1, [4, 5]),
C(2, [3, 4]), while C(2, [1, 4]) is discarded. Any minimum FVS leads to a FVS-based
set of candidate cycles containing 6 cycles. For the FVS marked in (a), the FVS-based
set contains C(1, [2, 5]) and C(1, [4, 5]) in addition to the 4 isometric cycles.

from the pair involving the vertex in C with smallest index, assuming a given numbering of the vertices from 1 to n. All redundant representations (x, [u, v]) can be discarded at no additional cost by checking beforehand whether the path p_{xu} or the path p_{xv} contains a vertex with index smaller than x. Non-isometric cycles are discarded if they do not admit a representation for their vertex of smallest index.
Since an isometric cycle has a representation for all its vertices and at least one
of them belongs to any FVS, the FVS-based set of candidate cycles considered
in [18] contains the set I of isometric cycles, while the reverse is not true. An
example is illustrated in Fig. 1. As we shall see in Section 4, restricting attention
to isometric cycles leads to a substantially smaller set of candidate cycles w.r.t.
just removing all the duplicate ones or focusing on the FVS-based ones.

3.1 An O(m^2 n/log n) Deterministic Algorithm

We follow a hybrid strategy. First, we generate all the isometric cycles (I) with the above-mentioned O(nm) procedure and sort them by nondecreasing weight. This can be done in O(nm log n). Then we adopt de Pina's scheme, with the key difference that we restrict the search for the cycles to the set I.
We now describe how we achieve the O(m^2 n/log n) worst-case complexity. We assume a RAM model of computation with words of logarithmic length (of at least log n bits) that allows bitwise operations in constant time. We refer to the divide-and-conquer framework described in [15], which exploits fast matrix multiplication to update many witnesses simultaneously. In [15], when searching for the new cycles to be included in the minimum basis, the witnesses are considered one by one, but they are updated in bulks using a recursive strategy. To achieve the improved worst-case complexity, we consider blocks of log n witnesses as single elements in the above-mentioned update procedure. In addition, within each block we exploit the sparseness property deriving from the restriction to isometric cycles and apply an ad hoc bit-packing technique.
We first present the underlying idea of the witness update procedure in [15] in an iterative, rather than recursive, manner. For the sake of exposition, we assume w.l.o.g. that the initial number of witnesses is a power of 2, with null vectors beyond the ν-th position. There are always O(m) such vectors. At the i-th iteration, for 1 ≤ i ≤ ν, the i-th witness S_i is considered and the lightest cycle C_i with ⟨C_i, S_i⟩ = 1 is found. The index i can be expressed in a unique way as i = 2^q r, where q is a nonnegative integer and r is a positive odd integer. Obviously, q = 0 if and only if i is odd. In the update phase, the 2^q witnesses {S_{i+1}, . . . , S_{i+2^q}} are updated so that they become orthogonal to {C_{i+1−2^q}, . . . , C_i}, namely to the last 2^q cycles added to the basis. In [15] it is shown that the overall cost of all update phases is O(m^ω). In this work, we proceed by blocks of b := log n witnesses and consider at the i-th iteration the i-th block instead of a single witness. For exposition purposes, we assume w.l.o.g. that b divides the initial number of witnesses and that the number of blocks is a power of 2. For the i-th block, with i = 2^q r, after finding the corresponding b cycles of a minimum basis in the way described below, the next 2^q blocks of witnesses, namely the 2^q b witnesses contained in those blocks, are updated so that they become orthogonal to the last 2^q b cycles added to the basis. Since we consider blocks of witnesses, the total amount of work needed for all the witness updates is lower than that in [15] and still O(m^ω).
In order to find the b cycles of a minimum basis corresponding to the i-th block, we proceed as follows. At the beginning of the i-th iteration, we have already selected s = (i − 1)b cycles. We must specify how to find the next b cycles C_{s+1}, . . . , C_{s+b} using the corresponding b witnesses S_{s+1}, . . . , S_{s+b}. We encode the witnesses S_{s+j}, with 1 ≤ j ≤ b, as follows: for each edge e ∈ E we have a word W(e) in which the j-th bit is set to 1 if the e-th component of S_{s+j} is 1. Then, we scan the isometric cycles searching for the lightest cycle in the subspace spanned by S_{s+1}, . . . , S_{s+b}, i.e., non-orthogonal to at least one witness S_{s+j}, for 1 ≤ j ≤ b. A cycle C ∈ I is tested by taking the x-or of the words W(e) over all edges e ∈ C. Let W(C) denote the resulting word. If W(C) is null, then C is orthogonal to all considered witnesses; otherwise C is the selected cycle. The witnesses S_{s+j} non-orthogonal to C are those whose j-th bit in W(C) is 1. Any one of these witnesses, say S_{s+t}, can be discarded while the remaining ones are updated. This can be done implicitly in O(m) as follows: for each edge e ∈ E, if the e-th component of S_{s+t} is 1 then W(e) := W(e) x-or W(C); otherwise W(e) remains unchanged. This is equivalent to performing the update according to the standard rule, but has the advantage of preserving the witness encoding in the desired form. Note that this internal update is necessary only after finding a cycle of a minimum basis, so its cost over all phases is O(m^2). After selecting C, we search for the other b − 1 cycles in the same way. To select the b cycles we examine each isometric cycle at most once. By the sparseness property, this takes O(nm). Since the above step must be applied O(m/log n) times, a minimum cycle basis is obtained in O(m^2 n/log n). This dominates the O(m^ω) cost of the witness update and is thus the overall complexity of the algorithm.
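
The inner loop of a block phase can be sketched as follows; this is our own illustration, assuming W is the dictionary encoding just described (one b-bit word per edge) and that the isometric cycles are given as edge lists sorted by weight.

def find_cycle_in_block(isometric_cycles, W):
    """Scan isometric cycles (lightest first) for one non-orthogonal
    to some witness of the current block; returns it, or None."""
    for cycle in isometric_cycles:
        wc = 0
        for e in cycle:              # sparseness: O(n*nu) work in total
            wc ^= W[e]               # W(C) = x-or of W(e) over e in C
        if wc:                       # non-orthogonal to some S_{s+j}
            # Implicit update: discard one hit witness S_{s+t} and make
            # the remaining hit witnesses orthogonal to the cycle.
            t_bit = wc & (-wc)       # lowest set bit selects S_{s+t}
            for e in W:
                if W[e] & t_bit:     # e-th component of S_{s+t} is 1
                    W[e] ^= wc
            return cycle
    return None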
Note that our algorithm is simpler than the O(m^2 n/log n + mn^2) one proposed in [18] and has a better complexity for sparse graphs with m < n log n.

3.2 An Efficient Adaptive Isometric Cycles Extraction Algorithm


Since the computational results (see Section 4) show that the independence test is the bottleneck in practice as well, we propose an efficient algorithm based not only on the isometric cycles but also on a very effective adaptive independence test à la de Pina. We refer to the overall algorithm as Adaptive Isometric Cycles Extraction (AICE).
The main drawback of de Pina’s scheme is arbitrariness: the method works
for any choice of the spanning tree T in G and for any ordering e1 , . . . , eν of the
corresponding co-tree edges that induces the initial set of witnesses. But differ-
ent choices can lead to very different computational loads and times. Ideally, we
would like to pick a spanning tree which requires, along the whole procedure,
as few witness updates as possible and which tends to keep the witnesses as
sparse as possible. We have devised a procedure that tries to reduce the overall
number of elementary operations by iteratively and adaptively building the cur-
rent spanning tree T . Roughly speaking, starting from an empty T we keep on
adding the lightest available isometric cycle and greedily construct T including
as many edges of this newly added cycle as possible. In the independence test we
try to minimize the number of edges that must be considered and in the witness
update we aim at maintaining vector sparsity.
A detailed description of AICE is given in Algorithm 1. To fully exploit the sparseness property, the isometric cycles and the witnesses are stored as ordered lists of edges corresponding to the nonzero components of their incidence vectors. Clearly, two incidence vectors are non-orthogonal if and only if the intersection of the two lists has odd cardinality.
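
In code, this sparse non-orthogonality test is a one-liner (our illustration):

def non_orthogonal(cycle_edges, witness_edges):
    """Sparse inner product over GF(2): two incidence vectors are
    non-orthogonal iff their edge lists share an odd number of edges."""
    return len(set(cycle_edges) & set(witness_edges)) % 2 == 1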
In our algorithm, we restrict attention to the set I of isometric cycles and
reverse the traditional de Pina’s scheme, where for each witness Si one looks
for the lightest non-orthogonal cycle Ci . Indeed, we scan the isometric cycles
according to nondecreasing weights and for each isometric cycle Ci we search for
a witness Si such that Ci and Si are not orthogonal. If such a witness does not
exist, Ci can be discarded. Another key difference is that we build the spanning
tree T in an adaptive way. We start with an empty spanning tree T , whose
set of edges is denoted by ET (Step 2). Since the co-tree edges are unknown, we
cannot build the witnesses of the initial standard basis but we can consider them
implicitly until the co-tree edges are identified. The set S (initially empty) only
contains the witnesses that have been made explicit by an update. In EcoT we
also keep track of the co-tree edges relative to T that do no longer appear in the
witnesses. Since the witness update is based on the symmetric difference of the
edge sets, the edges in EcoT correspond to the components that are no longer
relevant for the independence test. At each iteration, given the current (lightest)
isometric cycle C (Step 4), we neglect the irrelevant edges (in EcoT ) together
with the edges in T (Step 5) and we check if some edges of C can be added to T
without creating a cycle (Step 6). The remaining edges are then partitioned into
Algorithm 1. AICE algorithm for finding a minimum cycle basis

Input: an undirected graph G = (V, E)
Output: a minimum cycle basis C
1: Generate all the isometric cycles (I) and sort them by nondecreasing weight
2: C := ∅, S := ∅, E_T := ∅, E_coT := ∅
3: while E ≠ E_T ∪ E_coT do
4:   Extract the lightest cycle C from I
5:   C_R := C \ (E_T ∪ E_coT)
6:   ∀e ∈ C_R : E_T ∪ {e} is acyclic do E_T := E_T ∪ {e}, C_R := C_R \ {e}
7:   E_S := {e ∈ C_R : e ∈ S_j for some S_j ∈ S}, E_N := C_R \ E_S
8:   if E_N = {e_1, . . . , e_k} ≠ ∅ then
9:     if k > 1 then S := S ∪ S_j with S_j := {e_j, e_{j+1}}, for 1 ≤ j ≤ k − 1 end if
10:    if E_S ≠ ∅ then ∀S_j ∈ S : |E_S ∩ S_j| is odd do S_j := S_j ∪ {e_1} end if
11:    if k = 1 and no S_j ∈ S is updated in (10) then E_coT := E_coT ∪ {e_1} end if
12:  else if E_S ≠ ∅ and ∃S_i ∈ S : |E_S ∩ S_i| is odd then
13:    S := S \ {S_i}
14:    ∀S_j ∈ S : |E_S ∩ S_j| is odd do S_j := S_j ⊕ S_i
15:    E_coT := E_coT ∪ {e ∈ E_S : e ∉ S_j for every S_j ∈ S}
16:  else
17:    Discard C
18:  end if
19:  if C is not discarded then C := C ∪ {C} end if
20: end while

ES and EN depending on whether they belong or do not belong to some explicit


witnesses (Step 7), where EN contains the newly identified co-tree edges. Different
cases may occur. In the most favorable one (Step 11), C contains only one new co-tree edge and no witnesses need to be updated. Note that this is equivalent to selecting an implicit witness S_i with a single nonzero component that, from that point in time, can be neglected and hence added to E_coT. If C contains k new co-tree edges, with k > 1, we should select one of the corresponding implicit witnesses and update the remaining k − 1 ones. This is achieved in Step 9 by defining each of the k − 1 witnesses by two consecutive co-tree edges, so that each of these new co-tree edges is in at most two witnesses.
verify that these witnesses are orthogonal to C. Explicit witnesses in S must
only be updated when ES is not empty. Note that the inner product with the
witnesses is restricted to this set of edges. The explicit witnesses that are not
orthogonal to the current cycle C are then appropriately updated. Whenever
possible, the update is carried out by selecting as S_i an implicit witness with a single nonzero component (Step 10) or any one of the explicit witnesses (Steps 12 to 15).
It is worth pointing out that in spite of the heuristic choices aiming at keeping
the witnesses as sparse as possible, the overall algorithm is guaranteed to yield
a minimum cycle basis. Although this approach does not seem to have a better
worst-case complexity than the original de Pina’s one, we shall see that it is
computationally very efficient.
4 Computational Results
To evaluate the practical performance of our AICE algorithm, we carried out an extensive computational campaign and compared the results with those provided by the main alternative algorithms, namely the improved de Pina algorithm described in [15], the hybrid one in [17], the FVS-based hybrid one [18], and an efficient implementation of Horton's method. In the previous computational work [17], implementations are based on LEDA [19] and the executables are available. For fairness of comparison, we have implemented all algorithms in C with the same data structures. Since, owing to vector sparsity, we do not expect an actual speed-up from fast matrix multiplication and bit-packing tricks, we have not included these features in the implementations of the previous algorithms, and we have not tested the algorithm described in Section 3.1.
As benchmark we use random graphs and hypercubes like those in [17] but of larger size, and Euclidean graphs like those in [2]. Random graphs are generated using the G(n; p) model [9], for p = 0.3, 0.5, 0.9 and for p = 4/n (sparse graphs, m ≈ n), and hypercubes are in dimension d = 7, . . . , 10, with n = 2^d. Both unweighted and weighted cases are considered, with integer weights randomly chosen from the uniform distribution on the range [0 . . . 2^16]. Euclidean graphs have density p = 0.3, 0.5, 0.9, and the weight of an edge is the Euclidean distance between its endpoints, the vertices being randomly distributed on a 10 × 10 square. The algorithms are run on a Xeon 2.0 GHz 64-bit machine with 3 GB of memory, under Linux. We use the GNU gcc 4.1 compiler (g++ 4.1 for the LEDA implementations) with the -O3 optimization flag. For each type of graph, results are reported as averages (for ν and the size of the witnesses the averages are rounded up) over 20 randomly generated instances with the same characteristics.
Table 1 reports the ratio between the size of the sets of candidate cycles and the cyclomatic number ν. Both sets H and H_FVS (the set of FVS-based candidate cycles) are without duplicates. The set H_FVS does not contain the representations (x, [u, v]) whose corresponding cycle has a vertex with index smaller than x that also belongs to the given FVS. A close-to-minimum FVS is obtained by the heuristic in [5]. For weighted graphs, the number of isometric cycles is very close to ν, so that very few of them must be tested for independence. The standard deviation is also in general very small, less than 2%. The size of H and H_FVS is up to 20 times larger than ν, with a higher standard deviation of 10 to 20%. Thus even a close-to-minimum FVS does not help. If duplicate cycles are not deleted, the size of both H and H_FVS turns out to be up to 10 times larger (for lack of space not shown in the table).
For unweighted graphs, isometric cycles are less effective for larger density
because in this case a random graph tends to become a complete graph with nν/3
isometric triangles. The particular weighting of Euclidean graphs also reduces
the impact of the restriction to isometric cycles.
In order to try to further trim the set of isometric cycles, we have removed the isometric cycles that admit a wheel decomposition, i.e., that can be expressed as the composition of more than two cycles of smaller weight around a vertex r not incident to any edge in C, such that each co-tree edge in C relative to T_r closes a lighter fundamental cycle in T_r. For an example see Fig. 2(a). The
Table 1. Comparison of the number of candidate cycles divided by ν for the Horton cycles (H), the FVS-based cycles (H_FVS), and the isometric cycles (I). Both H and H_FVS do not include duplicates. For unweighted graphs, the isometric cycles which do not admit a wheel decomposition (I_noWh) are also considered, with the generation time (in seconds). The label “x” indicates memory limit exceeded (3 GB).

Random graphs:
                              Weighted                 Unweighted
               n       ν      H    H_FVS     I        H   H_FVS      I   I_noWh (Time)
Random 0.3   100    1375   6.67   6.35   1.07     9.28   10.86   4.57    1.97 (0.03)
             200    5775  11.65  11.51   1.06    13.02   15.94   7.78    2.44 (0.42)
             300   13173  16.39  16.19   1.05    15.89   18.94  10.74    2.67 (1.70)
             400   23527  19.40  19.23   1.04    19.13   23.82  13.76    2.83 (4.46)
             500   36903  23.42  23.29   1.04    22.12   27.56  16.78    2.91 (9.46)
Random 0.5   100    2379   6.29   6.21   1.04    10.81   12.55   9.05    1.76 (0.06)
             200    9749  11.65  11.59   1.03    19.06   20.18  17.35    1.83 (0.65)
             300   22119  15.45  15.38   1.03    27.47   29.29  25.67    1.89 (2.56)
             400   39556  20.01  19.88   1.03    35.98   40.43  34.19    1.93 (5.93)
             500   61871  24.14  24.17   1.02    44.31   46.25  42.41    1.97 (11.66)
Random 0.9   100    4360   6.62   6.66   1.02    27.34   27.36  27.12    1.06 (0.10)
             200   17706  11.15  11.15   1.02    54.26   54.58  54.04    1.05 (1.03)
             300   40052  15.15  15.09   1.01    81.22   82.04  81.02    1.01 (3.81)
             400   71425  20.30  20.27   1.01   108.27  108.32 108.09    1.00 (9.19)
             500  111770  23.53  23.49   1.01   135.27  135.70 135.06    x

Weighted hypercubes:
   d      ν      H    H_FVS     I
   7    321   7.67    7.59   1.56
   8    769  11.99   12.22   1.69
   9   1793  20.12   20.70   1.79
  10   4097  35.31   34.44   1.92

Weighted random sparse:
    n      ν      H    H_FVS     I
  500    510  18.21   13.72   2.15
  750    761  25.59   19.45   2.51
 1000   1010  31.86   23.10   2.73

Euclidean n = 200:
   p      ν      H    H_FVS      I   I_noWh (Time)
 0.1   1781  27.13   25.26    3.32   1.08 (0.08)
 0.3   5770  41.69   40.96    7.92   1.03 (0.30)
 0.5   9733  49.54   49.21   17.65   1.02 (0.68)
 0.7  13728  56.89   56.78   33.04   1.01 (1.52)
 0.9  17697  63.36   63.28   53.83   1.01 (2.96)

Table 2. Time in seconds and size of the witnesses for the AICE independence test and the original de Pina test restricted to the isometric cycles. For AICE, the percentage of the selected cycles which do not require a witness update (% c.n.u.) is also reported.

Weighted:
                          De Pina's test on I           AICE test
               n      ν    Time  max|S_i| avg|S_i|   Time  max|S_i| avg|S_i|  % c.n.u.
Random 0.3   100   1375    0.10     628       9      0.00      12       2      97.20
             200   5775    2.02    2656      13      0.00      16       2      98.08
             300  13173   15.88    6185      15      0.01      26       2      98.75
Random 0.5   100   2379    0.29    1105      10      0.00      12       2      98.15
             200   9749    5.67    4560      13      0.01      17       2      98.93
             300  22119   74.68   10411      15      0.02      24       2      99.20
Random 0.9   100   4360    0.92    2007       9      0.00       8       2      99.16
             200  17706   30.25    8273      13      0.02      13       2      99.44

Unweighted:
               n      ν    Time  max|S_i| avg|S_i|   Time  max|S_i| avg|S_i|  % c.n.u.
Random 0.3   100   1375    0.08     665      11      0.00      39       2      74.73
             200   5775    1.57    2767      16      0.01      56       2      87.18
             300  13173   13.03    6399      18      0.02      39       2      92.51
Random 0.5   100   2379    0.21    1117       9      0.00       9       2      96.41
             200   9749    4.82    4567      11      0.02       8       2      98.33
             300  22119   40.70   10461      12      0.07       8       2      98.83
Random 0.9   100   4360    0.54    1973       7      0.01       2       2      99.80
             200  17706   11.96    8074       7      0.07       2       2      99.91
Table 3. Comparison of the running times in seconds for the main algorithms, for our C implementations as well as the available LEDA ones. The labels “-” and “x” indicate that the time limit of 900 seconds and the memory limit of 3 GB are respectively exceeded.

Weighted random 0.3:
         LEDA implementation            Our implementation
   n   De Pina  Hybrid  Hyb FVS   De Pina  Hybrid  Hyb FVS  Horton   AICE
 100      1.09   16.68     7.09      0.31    0.35     0.89    0.05   0.01
 200     18.49     -     413.95      4.83   46.10    27.37    0.70   0.08
 300    111.01     -        -       28.11  439.40   275.68    3.83   0.26
 400    574.30     -        -      161.52     -        -     11.39   0.78
 500       -       -        -      474.31     -        -     30.98   1.90

Unweighted random 0.3:
 100      3.55    0.64     1.80      0.43    0.07     0.61    0.04   0.01
 200    139.47   17.73    47.60     10.30    1.18     5.57    0.54   0.08
 300       -       x     263.39     60.34   10.57    23.68    2.68   0.29
 400       -       x        -      418.28   47.93    79.46    8.00   0.76
 500       -       x        -         -    143.36   211.75   20.13   1.67

Weighted random 0.5:
 100      2.58   69.13    21.93      0.74    1.56     1.53    0.09   0.02
 200     46.64     -        -       13.60  161.95    91.03    1.32   0.12
 300    399.43     -        -      101.65     -     676.70    6.02   0.39
 400       -       -        -      475.95     -        -     24.67   1.18
 500       -       -        -         -       -        -     50.97   2.54

Unweighted random 0.5:
 100      9.13    1.63     3.07      0.87    0.17     0.67    0.18   0.02
 200    268.18   55.40    81.61     13.64    3.27     6.57    3.40   0.19
 300       -       x     446.03    123.18   28.26    33.15   17.26   0.68
 400       -       x        -      528.03  107.59   120.76   61.49   1.72
 500       -       x        -         -    283.58   309.33  171.99   3.54

Weighted random 0.9:
 100      7.34  309.14    76.75      1.98   10.41     3.53    0.18   0.03
 200    180.76     -        -       47.28  477.55   264.74    2.17   0.20
 300       -       -        -      472.30     -        -     11.31   0.71
 400       -       -        -         -       -        -     42.62   1.86
 500       -       -        -         -       -        -     86.69   4.27

Unweighted random 0.9:
 100     18.86    3.73     5.42      1.24    0.33     0.59    1.29   0.06
 200       -       x     152.99     24.14    5.27     6.95   23.76   0.55
 300       -       x     827.04    195.04   31.75    32.62  154.88   2.04
 400       -       x        -         -    108.90   109.36  601.42   5.30
 500       -       x        -         -    296.74   289.61     -    11.12

Weighted random sparse:
    n  De Pina  Hybrid  Hyb FVS   De Pina  Hybrid  Hyb FVS  Horton   AICE
  500     0.83    2.52    31.71      0.44    0.27    59.54    0.27   0.21
  750     2.33    7.65   108.38      1.25    0.81   260.37    0.81   0.63
 1000     4.74   16.47   249.72      2.63    1.70   769.09    1.65   1.32

Weighted hypercubes:
    d  De Pina  Hybrid  Hyb FVS   De Pina  Hybrid  Hyb FVS  Horton   AICE
    7     0.14    0.21     0.88      0.06    0.02     0.79    0.01   0.00
    8     0.96    2.49    14.25      0.44    0.11     7.44    0.07   0.03
    9     6.41   25.21   134.52      3.04    0.95    75.57    0.58   0.32
   10    42.85  265.08      -       19.28   10.37      -      3.92   1.96

Euclidean n = 200 (our implementation only):
    p  De Pina  Hybrid  Hyb FVS  Horton   AICE
  0.1     1.31    1.72     6.19    0.35   0.03
  0.3     6.30   45.51    53.16    2.07   0.10
  0.5    15.46  152.94   190.86    4.34   0.29
  0.7    27.74  312.83   430.56    7.59   0.75
  0.9    47.88  521.49   790.85   11.10   1.56

Fig. 2. (a) A small example of a wheel decomposition centered at vertex 7 (T_7 in bold): the isometric hexagonal cycle of weight 6 can be obtained as the composition of the 6 inner triangles of weight 5. (b) A wheel decomposition centered at vertex 4 (T_4 in bold) of the outer cycle of weight 8. Since the non-isometric cycle C(4, [7, 8]) of weight 9 can be separated by [5, 6] (dashed edge) into two cycles of strictly smaller weight (6 and 7), when decomposing the outer cycle the weight of the heavier of the two cycles, 7, is considered instead of 9.

isometric cycles are checked for a wheel decomposition after they are sorted by nondecreasing weight. If a fundamental cycle of a possible wheel decomposition is non-isometric, we consider the weight of the heavier of the two cycles into which it can be separated, see Fig. 2(b). Since the total number of edges is bounded by nν (sparseness property) and there are n vertices, it is easy to see that all candidate cycles that admit a wheel decomposition can be discarded in O(mn^2).
In Table 1 the set of isometric cycles that are left after checking for wheel decompositions is denoted by I_noWh. Although its size is very close to ν, in particular for Euclidean graphs, the time needed to obtain I_noWh is very high compared to the total time of AICE (reported in Table 3). This suggests that, owing to the efficiency of the independence test, it does not pay to further reduce the number of cycles with a similar technique.
In Table 2, we assess the impact of the adaptive independence test of AICE w.r.t. the original de Pina test. For a fair comparison, both are applied to the set I of isometric cycles. AICE results are averages over 20 instances, whereas the results for de Pina's test are averages over n runs (n = 100, 200, 300) corresponding to n different randomly generated spanning trees that induce the initial (standard) basis {S_1, . . . , S_ν}. As in [17], we report statistics on the size (number of nonzero components) of the witnesses S_i, for 1 ≤ i ≤ ν, used in the independence test, namely the maximum and the rounded-up average cardinality. Note that in the AICE tests the rounded-up average size of the witnesses S_i, with 1 ≤ i ≤ ν, is always equal to 2. The maximum has a large standard deviation, since it depends on the specific instance, but it is always much smaller than that of de Pina's test, whose standard deviation is less than 10%. Not only is the size of the witnesses very small, but for almost all cycles identified by AICE no witness update is needed (Step 11 in Algorithm 1). Since many unnecessary operations are avoided, the overall computing time is greatly reduced.
In Table 3, we compare the running times of the main algorithms. First, we
consider the algorithms whose implementation based on LEDA [19] is available,
Table 4. Comparison of the running times in seconds for large instances. In most cases,
the previous algorithms exceed the time limit of 1800 seconds (“-”).

Weighted random 0.3:                       Weighted random 0.9:
    n  De Pina  Hybrid   Horton     AICE      n  De Pina  Hybrid    Horton     AICE
 1000      -       -     361.04    24.10   1000      -       -     1261.80    59.66
 1500      -       -        -      87.46   1500      -       -        -      204.82
 2000      -       -        -     195.82   2000      -       -        -      490.56
 2500      -       -        -     396.12   2500      -       -        -      900.61
 3000      -       -        -     685.16   3000      -       -        -     1424.03

Weighted random 0.5:                       Euclidean n = 500:
    n  De Pina  Hybrid   Horton     AICE      p  De Pina  Hybrid    Horton     AICE
 1000      -       -     629.57    32.76    0.1    42.53  416.87     16.41     0.85
 1500      -       -        -     119.13    0.3   366.34     -       85.69     3.01
 2000      -       -        -     278.72    0.5  1117.00     -      189.53     8.38
 2500      -       -        -     575.96    0.7      -       -      330.68    21.14
 3000      -       -        -    1061.33    0.9      -       -      510.17    43.67

namely de Pina’s [15], the Hybrid [17] and the FVS-based Hybrid [18] algo-
rithms. For a fair comparison, these algorithms have also been implemented in
C within the same environment used for AICE. De Pina’s algorithm is imple-
mented with the heuristics suggested in [15] and the Hybrid method is that
in [17] but duplicates are removed from H. In the FVS-based Hybrid algorithm
a close-to-minimum FVS is obtained by the heuristic in [5], but the heuristic
computing time is neglected, and duplicate cycles are also removed. We also de-
vised an efficient version of Horton’s algorithm using H without duplicates and
an ad hoc Gaussian elimination exploiting operations on GF(2). For Euclidean
graphs LEDA algorithms cannot be tested since they require integer weights.
The time limit is set to 900 seconds. For all results the standard deviation is in
general less than 10%. Our C implementations of the previous algorithms turn out to be more effective than the ones based on LEDA. It is worth pointing out that, except for dense unweighted random graphs, the ad hoc implementation of Horton's algorithm is substantially faster than the sophisticated algorithms based on de Pina's idea. However, AICE outperforms all previous algorithms, in most cases by one or two orders of magnitude.
Finally, in Table 4 we report the results of our C implementations for larger
weighted instances. AICE finds an optimal solution for graphs with up to 3000
vertices within the time limit of 1800 seconds, while the other algorithms cannot
solve most of the instances.
An interesting open question is whether it is possible to do without the in-
dependence test, even though in practice it is unlikely to lead to an efficient
algorithm.

Acknowledgements. We thank Kurt Mehlhorn and Dimitrios Michail for mak-


ing available the executables of their algorithms.

References
1. Amaldi, E., Iuliano, C., Jurkiewicz, T., Mehlhorn, K., Rizzi, R.: Breaking the
O(m2 n) barrier for minimum cycle bases. In: Fiat, A., Sanders, P. (eds.) ESA
2009. LNCS, vol. 5757, pp. 301–312. Springer, Heidelberg (2009)
2. Amaldi, E., Liberti, L., Maculan, N., Maffioli, F.: Edge-swapping algorithms for the
minimum fundamental cycle basis problem. Mathematical Methods of Operations
Research 69(12), 205–233 (2009)
3. Bafna, V., Berman, P., Fujito, T.: A 2-approximation algorithm for the undirected
feedback vertex set problem. SIAM J. Discrete Math. 12(3), 289–297 (1999)
4. Bollobás, B.: Modern Graph Theory. Graduate Texts in Mathematics, vol. 184. Springer, Heidelberg (2nd printing)
5. Brunetta, L., Maffioli, F., Trubian, M.: Solving the feedback vertex set problem on
undirected graphs. Discrete Applied Mathematics 101(1-3), 37–51 (2000)
6. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions.
J. Symb. Comput. 9(3), 251–280 (1990)
7. De Pina, J.C.: Applications of shortest path methods. Ph.D. thesis, University of
Amsterdam, The Netherlands (1995)
8. Deo, N., Prabhu, G., Krishnamoorthy, M.S.: Algorithms for generating fundamen-
tal cycles in a graph. ACM Trans. on Mathematical Software 8(1), 26–42 (1982)
9. Erdős, P., Rényi, A.: On random graphs, I. Publicationes Mathematicae (Debre-
cen) 6, 290–297 (1959)
10. Gleiss, P.M.: Short cycles: minimum cycle bases of graphs from chemistry and
biochemistry. Ph.D. thesis, Universität Wien, Austria (2001)
11. Golynski, A., Horton, J.D.: A polynomial time algorithm to find the minimum
cycle basis of a regular matroid. In: Penttonen, M., Schmidt, E.M. (eds.) SWAT
2002. LNCS, vol. 2368, pp. 200–209. Springer, Heidelberg (2002)
12. Hartvigsen, D., Mardon, R.: The all-pairs min cut problem and the minimum cycle
basis problem on planar graphs. SIAM J. Discrete Math. 7(3), 403–418 (1994)
13. Horton, J.D.: A polynomial-time algorithm to find the shortest cycle basis of a
graph. SIAM J. Computing 16(2), 358–366 (1987)
14. Kavitha, T., Liebchen, C., Mehlhorn, K., Michail, D., Rizzi, R., Ueckerdt, T.,
Zweig, K.A.: Cycle bases in graphs characterization, algorithms, complexity, and
applications. Computer Science Review 3(4), 199–243 (2009)
15. Kavitha, T., Mehlhorn, K., Michail, D., Paluch, K.E.: An Õ(m^2 n) algorithm for
minimum cycle basis of graphs. Algorithmica 52(3), 333–349 (2008)
16. Liebchen, C., Rizzi, R.: Classes of cycle bases. Discrete Applied Mathemat-
ics 155(3), 337–355 (2007)
17. Mehlhorn, K., Michail, D.: Implementing minimum cycle basis algorithms. ACM
Journal of Experimental Algorithmics 11 (2006)
18. Mehlhorn, K., Michail, D.: Minimum cycle bases: Faster and simpler. Accepted for
publication in ACM Trans. on Algorithms (2007)
19. Mehlhorn, K., Näher, S.: LEDA: A Platform for Combinatorial and Geometric
Computing. Cambridge University Press, Cambridge (1999)
20. Rizzi, R.: Minimum weakly fundamental cycle bases are hard to find. Algorith-
mica 53(3), 402–424 (2009)
21. Stepanec, G.F.: Basis systems of vector cycles with extremal properties in graphs.
Uspekhi Mat. Nauk II 19, 171–175 (1964) (in Russian)
22. Zykov, A.A.: Theory of Finite Graphs. Nauka, Novosibirsk (1969) (in Russian)
Efficient Algorithms for Average Completion
Time Scheduling

René Sitters

Department of Econometrics and Operations Research, Free University, Amsterdam


[email protected]

Abstract. We analyze the competitive ratio of algorithms for minimizing the (weighted) average completion time on identical parallel machines and prove that the well-known shortest remaining processing time algorithm (SRPT) is 5/4-competitive w.r.t. the average completion time objective. For weighted completion times we give a deterministic algorithm with competitive ratio 1.791 + o(1) (as a function of the number of machines m). This ratio holds for preemptive and non-preemptive scheduling.

1 Introduction
There is a vast number of papers on minimizing average completion time in machine scheduling. Most appeared in the combinatorial optimization community in the last fifteen years. The papers by Schulz and Skutella [22] and Correa and Wagner [6] give a good overview.
The shortest remaining processing time (SRPT) algorithm is a well-known and simple online procedure for preemptive scheduling of jobs. It produces an optimal schedule on a single machine with respect to the average completion time objective [20]. The example in Figure 1 shows that this is no longer true when SRPT is applied to parallel machines. The best known upper bound on its competitive ratio was 2 [18] until recently (SODA 2010), when Chung et al. [5] showed that the ratio is at most 1.86. Moreover, they showed that the ratio is not better than 21/19 > 1.105. In this paper, we show that the competitive ratio of SRPT is at most 1.25.
The SRPT algorithm has a natural generalization to the case where jobs have given weights. Unfortunately, our proof does not carry over to this case. No algorithm is known to have a competitive ratio less than 2. Remarkably, even for the offline problem, the only ratio less than 2 results from the approximation scheme given by Afrati et al. [1]. Schulz and Skutella [22] give a randomized 2-approximation algorithm which can be derandomized and applied online (although not both at the same time). A deterministic online algorithm for the preemptive case is given by Megow and Schulz [16] and for the non-preemptive case by Correa and Wagner [6]. The ratios are, respectively, 2 and 2.62. The first bound is tight for the algorithm; the latter is probably not. On the single machine, no

Supported by a research grant from the Netherlands Organization for Scientific Re-
search (NWO-veni grant).

non-preemptive online algorithm can be better than 2-competitive [27], but it was unknown whether the same is true for parallel machines. We give a simple online algorithm that runs in O(n log n) time and has competitive ratio 1.791 + o(1), i.e., the ratio drops to 1.791 as m → ∞. This gives new insight into online and offline algorithms for average completion time minimization on parallel machines.

Fig. 1. There are two machines. At time 0, two jobs of length 1 and one job of length
2 are released and at time 2, two jobs of length 1 are released. The picture shows the
suboptimal SRPT schedule and the optimal schedule.

The first approximation guarantee for weighted non-preemptive scheduling was given by Hall et al. [10]. This ratio of 4 + ε was reduced to 3.28 by Megow and Schulz [16] and then to 2.62 by Correa and Wagner [6]. Table 1 gives a summary of the best known ratios for a selection of problems. Remarkable is the large gap between lower and upper bounds for parallel machines. Not mentioned in the table are recent papers by Jaillet and Wagner [13] and by Chou et al. [4], which analyze the asymptotic ratio for several of these problems. Asymptotic, in this case, means that jobs have comparable weights and the number of jobs goes to infinity.

1.1 Problem Definition


An instance is given by a number of machines m, a job set J ⊂ N, and for each j ∈ J integer parameters pj ≥ 1, rj ≥ 0, and wj ≥ 0 indicating, respectively, the required processing time, the release time, and the weight of the job. A schedule is an assignment of jobs to machines over time such that no job is processed by more than one machine at a time and no machine processes more than one job at a time. In the non-preemptive setting, each job j is assigned to one machine and is processed without interruption. In the preemptive setting, we may repeatedly interrupt the processing of a job and continue it at any time on any machine.

Table 1. Known lower and upper bounds on the competitive ratio for randomized and deterministic online algorithms. An entry of the form a → b indicates that the previously best known bound a is improved to b in this paper.

Problem (online)      | L.B. Rand.   | U.B. Rand.          | L.B. Det.  | U.B. Det.
1|rj, pmtn| Σj Cj     | 1            | 1 [20]              | 1          | 1 [20]
1|rj, pmtn| Σj wj Cj  | 1.038 [7]    | 4/3 [21]            | 1.073 [7]  | 1.57 [24]
1|rj| Σj Cj           | e/(e−1) [26] | e/(e−1) ≈ 1.58 [3]  | 2 [27]     | 2 [11,15,18]
1|rj| Σj wj Cj        | e/(e−1) [26] | 1.69 [9]            | 2 [27]     | 2 [2,19]
P|rj, pmtn| Σj Cj     | 1            | 1.86 → 5/4 [5]      | 1.047 [27] | 1.86 → 5/4 [5]
P|rj, pmtn| Σj wj Cj  | 1            | 2 → 1.791 [22,16]   | 1.047 [27] | 2 → 1.791 [16]
P|rj| Σj Cj           | 1.157 [23]   | 2 → 1.791 [22]      | 1.309 [27] | 2 → 1.791 [14]
P|rj| Σj wj Cj        | 1.157 [23]   | 2 → 1.791 [22]      | 1.309 [27] | 2.62 → 1.791 [6]
The algorithm has to construct the schedule online, i.e., the number of machines is known a priori but jobs are only revealed at their release times. Even the number of jobs n = |J| is unknown until the last job has been scheduled. Given a schedule, we denote the completion time of job j by Cj. The value of a schedule is the weighted average completion time (1/n) Σ_{j∈J} wj Cj, and the objective is to find a schedule of small value. We say that an algorithm is c-competitive if, for any instance, it finds a schedule with value at most c times the optimal value.

2 The Competitive Ratio of SRPT


Phillips et al. [18] showed that SRPT is at most 2-competitive and that their analysis is tight. Hence, a new idea is needed to prove a smaller ratio. Indeed, the proof by Chung et al. [5] is completely different and uses a sophisticated randomized analysis of the optimal solution. In contrast, our proof builds on the original proof of Phillips et al. and continues where that proof stops. Their main lemma is one of the four lemmas in our proof (Lemma 2).
In the proof, we may restrict ourselves to schedules that preempt jobs only at integer time points, since all processing times and release times are integer. For any integer t ≥ 1 we define slot t as the time interval [t − 1, t]. In this notation, the first slot available for job j is slot rj + 1. Given a (partial) schedule σ, we say that job j is unfinished at time t (or, equivalently, unfinished before slot t + 1) if less than pj units of j are processed before t in σ. A job j is available at time t (or, equivalently, available for slot t + 1) if rj ≤ t and j is unfinished at time t. Let σ(t) be the set of jobs processed in slot t, and denote by μi(σ) the i-th smallest completion time in σ.

The SRPT algorithm:

Let t = 1. Repeat: if more than m jobs are available for slot t, process in slot t the m available jobs with the shortest remaining processing times; otherwise, process all available jobs. Set t = t + 1.
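For concreteness, here is a minimal Python sketch of this slot-by-slot procedure; the instance encoding as (rj, pj) pairs and the stable-sort tie-breaking are our own illustrative choices, not part of the paper.

```python
def srpt(jobs, m):
    """Run SRPT on m identical machines with integer data.

    jobs: list of (release_time, processing_time) pairs.
    Returns the completion time of every job (ties between equal
    remaining processing times are broken arbitrarily, as in the text).
    """
    n = len(jobs)
    remaining = [p for (_, p) in jobs]
    completion = [None] * n
    t = 0
    while any(c is None for c in completion):
        # Jobs available for slot t+1: released by time t and unfinished.
        available = [j for j in range(n) if jobs[j][0] <= t and remaining[j] > 0]
        # Process (up to) m available jobs with the shortest remaining time.
        for j in sorted(available, key=lambda j: remaining[j])[:m]:
            remaining[j] -= 1
            if remaining[j] == 0:
                completion[j] = t + 1
        t += 1
    return completion
```

On the instance of Figure 1 (m = 2 and jobs (rj, pj) = (0, 1), (0, 1), (0, 2), (2, 1), (2, 1)) every SRPT tie-breaking gives total completion time 12, while starting the long job at time 0 gives the optimal value 11.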

The SRPT algorithm as defined here is not deterministic, since it may need to choose between jobs with the same remaining processing time. We say that a schedule σ is an SRPT schedule for instance I if it is a possible output of the SRPT algorithm applied to I. Note that the values μi(σ) do not depend on the non-deterministic choices of the algorithm, i.e., if σ and σ′ are SRPT schedules for the same instance on n jobs, then μi(σ) = μi(σ′) for all i ∈ {1, 2, . . . , n}.
All four lemmas are quite intuitive. For the first lemma, imagine that for
a given instance we reduce the release time of some job by δ and increase its
processing time by at least the same amount. Then, the optimum value cannot
improve since there is no advantage in starting a job earlier if this is undone
by an increase in its processing time. The first lemma shows that SRPT has an
even stronger property in this case.
Lemma 1. Let I and I′ satisfy J = J′ and, for each j ∈ J, r′j = rj − δj ≥ 0 and p′j ≥ pj + δj for some integers δj ≥ 0. Let σ and σ′ be SRPT schedules for, respectively, I and I′. Then, for every i ∈ {1, 2, . . . , n},

μi(σ) ≤ μi(σ′).

Proof. Let qj(t) and q′j(t) be the remaining processing time of job j at time t in, respectively, σ and σ′. Define the multiset Q(t) = {qj(t) | rj ≤ t}, i.e., it contains the remaining processing times of all jobs released at time t or earlier. Let Q′(t) contain the remaining processing times of the same set of jobs in σ′, i.e., Q′(t) = {q′j(t) | rj ≤ t}. Note that we take rj and not r′j in Q′. Let Qi(t) and Q′i(t) be the i-th smallest element in, respectively, Q(t) and Q′(t). We claim that for any time point t,

Qi(t) ≤ Q′i(t), for all i ∈ {1, 2, . . . , |Q(t)|}. (1)

If we can show (1), then the lemma follows directly, since μi(σ) (respectively μi(σ′)) is the smallest t such that Q(t) (respectively Q′(t)) has at least i zero elements.
The proof of (1) is by induction on t. It is true for t = 0 since Q(0) = Q′(0). Now consider an arbitrary time t0 and assume the claim is true for all t ≤ t0.
First we analyze the changes when no job is released at time t0 + 1. If σ processes less than m jobs in slot t0 + 1, then all non-zero elements in Q(t0) are reduced by one, implying Qi(t0 + 1) ≤ Q′i(t0 + 1) for all i ≤ |Q(t0 + 1)|. Now assume σ processes exactly m jobs in slot t0 + 1. Then it processes jobs with remaining processing times Qk+1(t0), Qk+2(t0), . . . , Qk+m(t0) for some k ≥ 0, while Qj(t0) = 0 for any j ≤ k. Since, by the induction hypothesis, Q′k+1(t0), Q′k+2(t0), . . . , Q′k+m(t0) are also non-zero, only values Q′s(t0) with s ≤ k + m can be reduced in σ′. Again, Qi(t0 + 1) ≤ Q′i(t0 + 1) for all i ≤ |Q(t0 + 1)|.
Now assume some jobs are released at time t0 + 1. We may use the analysis above and only consider the effect of the newly added jobs. For any new job j we have qj(t0 + 1) = pj ≤ q′j(t0 + 1), since in σ′ job j has been processed for at most δj units by time t0 + 1 and p′j ≥ pj + δj. Clearly, (1) remains valid after the addition of these jobs. □


The next lemma follows directly from Lemma 1 and was given before by Phillips et al. (Lemma 4.3 in [18]).

Lemma 2. Let instance I′ be obtained from I by removing some of the jobs of I. Let σ and σ′ be SRPT schedules for, respectively, I and I′, and let n, n′ be the number of jobs in I and I′. Then, for every i ≤ n′,

μi(σ) ≤ μi(σ′).

Proof. For each job j that is included in I but not in I′, we add a job to I′ with the same release time rj and pj = ∞ (or some large enough number). In an SRPT schedule for the extended instance, the added jobs complete last and the other jobs are scheduled as in σ′. The lemma now follows directly from Lemma 1 with δj = 0 for all j. (N.B. Phillips et al. [18] use the same argument. However, we do need the stronger version of Lemma 1 with arbitrary δj ≥ 0 to prove Lemma 4.) □
An advantage of unweighted completion times over weighted completion times is that we can use a volume argument. For example, in any feasible schedule, the sum of the last m completion times is bounded from below by the sum of all processing times. To bound the sum of the last m completion times, we may compare the total volume that SRPT has processed up to a moment t with the volume that could have been processed by any other schedule. This backlog argument enables us to bound the sum of the last m completion times, as we do in Lemma 4.
Given a schedule σ, let Vt(σ) be the volume processed until time t. Call a schedule busy if at any moment t either all machines are busy or all jobs available at time t are being processed. Clearly, any SRPT schedule is busy. The next lemma bounds how far a busy schedule can fall behind any other schedule in processed volume. Figure 2 shows that the lemma is tight for m = 2. (Essentially the same lemma is given in [12] and was also found by the authors of [5].)
Lemma 3. Let σ be a busy schedule and σ∗ be any feasible schedule, both for the same instance I on m machines. Then, at any moment t,

Vt(σ∗) − Vt(σ) ≤ mt/4.

Proof. A complete proof is given in [25]. Here we give a short proof of the weaker bound Vt(σ∗) − Vt(σ) ≤ mt/2; using this bound it already follows that SRPT is at most 3/2-competitive.
Fix an arbitrary time t and, for any job j, let qj and q∗j be the number of units of j processed until time t in, respectively, σ and σ∗. Then Vt(σ∗) − Vt(σ) = Σj (q∗j − qj). If q∗j − qj ≥ 0, there are at least q∗j − qj slots in which j is processed in σ∗ but not in σ. For each such slot, mark the position (i.e., slot plus machine) in σ∗. Note that σ must process some other job at this position. Doing this for all jobs, we see that the volume that σ processes before time t is at least Σj (q∗j − qj). Hence, Vt(σ∗) − Vt(σ) ≤ Vt(σ), which implies

2(Vt(σ∗) − Vt(σ)) ≤ (Vt(σ∗) − Vt(σ)) + Vt(σ) = Vt(σ∗) ≤ mt.


Lemma 4. Given an instance I with n ≥ m jobs, let τ be an SRPT schedule for I and ρ an arbitrary feasible schedule for I. Then,

Σ_{i=n−m+1}^{n} μi(τ) ≤ (5/4) Σ_{i=n−m+1}^{n} μi(ρ).

Proof. Let t = μn−m(ρ). We change the instance I into I′ as follows, such that no job is released after time t in the new instance: every job j with rj ≥ t + 1 gets release time r′j = t and processing time p′j = pj + rj − t. Let τ′ be an SRPT schedule for I′. Then, by Lemma 1 we have

μi(τ) ≤ μi(τ′), for any i ∈ {1, 2, . . . , n}. (2)

On the other hand, we can change ρ into a feasible schedule ρ′ for I′ without changing any of the completion times, since at most m jobs are processed after time t in ρ. Hence, we may assume

μi(ρ) = μi(ρ′), for any i ∈ {1, 2, . . . , n}. (3)
Let Wt(τ′) and Wt(ρ′) be the total remaining processing time at time t in, respectively, τ′ and ρ′. Since the last m jobs complete at time t or later in ρ′, we have

Σ_{i=n−m+1}^{n} μi(ρ′) ≥ mt + Wt(ρ′). (4)

Since no jobs are released after t, the SRPT schedule satisfies

Σ_{i=n−m+1}^{n} μi(τ′) ≤ mt + Wt(τ′). (5)

(Equality holds in (5) if τ′ completes at least m jobs at time t or later.) By Lemma 3, Wt(τ′) − Wt(ρ′) = Vt(ρ′) − Vt(τ′) ≤ mt/4. Combined with (4) and (5) this gives

Σ_{i=n−m+1}^{n} μi(τ′) ≤ mt + Wt(τ′)
                     ≤ (5/4)mt + Wt(ρ′)
                     ≤ (5/4)(mt + Wt(ρ′))
                     ≤ (5/4) Σ_{i=n−m+1}^{n} μi(ρ′).

Equations (2) and (3) complete the proof. □

Fig. 2. A tight example for Lemma 3. Take m = 2, two jobs of length 1, and one job of length 2, all released at time 0. It is possible to complete all jobs by time 2, while the remaining volume at time t = 2 in the SRPT schedule is 1 = mt/4.



Theorem 1. SRPT is 5/4-competitive for minimizing total completion time on identical machines.

Proof. Let σ be an SRPT schedule for the instance and let ϕ be an optimal schedule. Take any n′ ≤ n, let J′ be the set of the first n′ jobs completed in ϕ, and consider an SRPT schedule σ′ for J′. By Lemma 2 we know that

μi(σ) ≤ μi(σ′) for all i ≤ |J′|. (6)

We distinguish between the cases n′ ≤ m and n′ ≥ m. In the first case we have μi(σ′) ≤ μi(ϕ), since σ′ starts each job at its release time and processes it without preemption. Combining this with (6), we get

μi(σ) ≤ μi(ϕ) for all i ≤ n′. (7)
Now assume n′ ≥ m and let ϕ′ be the schedule ϕ restricted to the jobs of J′. By definition,

μi(ϕ′) = μi(ϕ) for all i ≤ |J′|. (8)

We apply Lemma 4 with τ = σ′ and ρ = ϕ′:

Σ_{i=n′−m+1}^{n′} μi(σ′) ≤ (5/4) Σ_{i=n′−m+1}^{n′} μi(ϕ′). (9)

Using (6) and (8) we conclude that

Σ_{i=n′−m+1}^{n′} μi(σ) ≤ (5/4) Σ_{i=n′−m+1}^{n′} μi(ϕ). (10)

Hence, we see from (7) and (10) that the theorem follows by partitioning the completion times into groups of size m, where the first group may be smaller. □


2.1 More Properties of SRPT


Given Lemmas 1 and 2, one might believe that a similar statement holds with respect to release times. However, it is not true that completion times cannot decrease if release times are increased. In the example of Figure 1, SRPT produces an optimal schedule if we change the release time of one small job from 0 to 1. The same example shows that SRPT may not be optimal even if no job is preempted. Finally, it is also not true that SRPT is optimal if its schedule contains no idle time. This can be seen by adding two long jobs to the example of Figure 1: this does not change the schedule of the other jobs, and the sum of the completion times of the two long jobs is the same for SRPT and the optimal schedule. We conjecture that an SRPT schedule is optimal if it is non-preemptive and has no idle time.

3 Weighted Jobs
The SRPT algorithm has a natural generalization to the case where jobs have given weights; unfortunately, our proof does not carry over to this case. A common approach in the analysis of the weighted average completion time is to use the mean busy time of a job, which is defined as the average point in time at which the job is processed. Given a schedule σ, let Z(σ) be the sum of weighted completion times and Z^R(σ) the sum of weighted mean busy times. On a single machine, the average (or total) weighted mean busy time is minimized by scheduling jobs preemptively in order of highest ratio wj/pj [8]. This is called the preemptive weighted shortest processing time (WSPT) schedule. The WSPT schedule is not unique, but its total mean busy time is. Now consider a fast single machine that runs each job m times faster, i.e., job j has release time rj and processing time pj/m. For a given instance I, let σm(I) be its preemptive WSPT schedule on
the fast single machine. The following inequality is a well-known lower bound on the optimal value of both the preemptive and the non-preemptive problem [4,22]:

Z^R(σm(I)) + (1/2) Σj wj pj ≤ Opt(I). (11)

Our algorithm uses the same two steps as the algorithms by Schulz and Skutella [22] and Correa and Wagner [6]: first, the jobs are scheduled on the fast single machine and then, as soon as an α-fraction of a job is processed, the job is placed as early as possible on one of the parallel machines. The algorithm in [22] uses random values of α and a random assignment to machines. The deterministic algorithm of [6] optimizes over α and simply takes the first available machine for each job. Our algorithm differs on three points. First, we take a fast single machine schedule of a modified instance I′ instead of I. Second, we do not apply preemptive WSPT but use non-preemptive WSPT instead. Third, we simply take α = 0 for each job. The behavior of our algorithm depends on the input I and a real number ε > 0.

Theorem 2. With ε = 1/√m, algorithm Online(ε) is δm-competitive for minimizing total weighted completion time, where δm = (1 + 1/√m)²(3e − 2)/(2e − 2). The ratio holds for preemptive and non-preemptive scheduling on m identical parallel machines.

We denote the start and completion time of job j on the fast machine in ρm by, respectively, sj and cj, and in the parallel machine schedule ρ by Sj and Cj. First, we prove that the optimal value does not change much by the modification made in step (i) of the algorithm (displayed below).
Lemma 5. Opt(I′) ≤ (1 + ε)Opt(I).

Proof. Let σ∗ be an optimal schedule for I, and for any job j let C∗j be the completion time of j in σ∗. We stretch the schedule by a factor 1 + ε such that each job j completes at time (1 + ε)C∗j and starts at time

(1 + ε)C∗j − pj ≥ (1 + ε)(rj + pj) − pj = (1 + ε)rj + εpj ≥ r′j.

We see that the stretched schedule is feasible for I′ and its value is exactly 1 + ε times the optimal value of I. □


Since we apply non-preemptive WSPT, the schedule ρm derived in step (ii) will in general not be the same as the fast single machine schedule σm(I′), which is derived by preemptive WSPT. Hence, we cannot use inequality (11) directly. We define a new instance I′′ such that ρm is the fast machine schedule of I′′. We shall prove this in Lemma 7, but first we introduce I′′ and bound its optimal value as we did in the previous lemma. Let I′′ = {(p′′j, w′′j, r′′j) | j = 1 . . . n} with p′′j = p′j, w′′j = w′j and r′′j = min{γ r′j, sj}, where γ = 1 + 1/(εm).

Lemma 6. Opt(I′′) ≤ (1 + 1/(εm))Opt(I′).
Proof. The proof is similar to that of Lemma 5. Let σ′ be an optimal schedule for I′ and C′j the completion time of j in σ′. We stretch the schedule by a factor γ such that each job j completes at time γC′j and starts at time

γC′j − p′j ≥ γ(r′j + p′j) − p′j = γr′j + (γ − 1)p′j ≥ γr′j ≥ r′′j.

We see that the stretched schedule is feasible for I′′ and its value is exactly γ times the optimal value of I′. □


Algorithm Online(ε):

Input: Instance I = {(pj, wj, rj) | j = 1 . . . n}.

(i) Let I′ = {(p′j, w′j, r′j) | j = 1 . . . n} with p′j = pj, w′j = wj and r′j = rj + εpj.
(ii) Apply non-preemptive WSPT to I′ on the fast single machine. Let ρm be this schedule and let sj be the start time of job j in ρm.
(iii) Place each job j, at time sj, on one of the parallel machines as early as possible (but not before sj). Let ρ be the final schedule.
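A compact Python sketch of the three steps follows; the arbitrary tie-breaking and the small floating-point tolerance are implementation details of ours, not part of the algorithm's description.

```python
def online_eps(jobs, m, eps):
    """Sketch of Online(eps). jobs: list of (p, w, r) tuples; returns (S, C)."""
    n = len(jobs)
    # Step (i): shift release times to r' = r + eps * p.
    rprime = [r + eps * p for (p, w, r) in jobs]

    # Step (ii): non-preemptive WSPT on the m-times-faster single machine.
    s = [None] * n                    # fast-machine start times
    t = 0.0
    pending = set(range(n))
    while pending:
        avail = [j for j in pending if rprime[j] <= t + 1e-12]
        if not avail:
            t = min(rprime[j] for j in pending)  # machine idles until next release
            continue
        j = max(avail, key=lambda j: jobs[j][1] / jobs[j][0])  # highest w/p first
        s[j] = t
        t += jobs[j][0] / m           # the fast machine runs m times faster
        pending.discard(j)

    # Step (iii): place each job, in order of s_j, as early as possible,
    # but never before s_j, on the machine that frees up first.
    free = [0.0] * m
    S, C = [None] * n, [None] * n
    for j in sorted(range(n), key=lambda j: s[j]):
        i = min(range(m), key=lambda i: free[i])
        S[j] = max(s[j], free[i])
        C[j] = S[j] + jobs[j][0]
        free[i] = C[j]
    return S, C
```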

Clearly, Opt(I) ≤ Opt(I′), since we only shift release times forward. Combining Lemmas 5 and 6, we see that Opt(I′′) ≤ (1 + 1/(εm))(1 + ε)Opt(I). Choosing ε = 1/√m we obtain the following corollary.

Corollary 1.

Opt(I) ≤ Opt(I′′) ≤ (1 + 1/√m)² Opt(I). (12)

If we want to prove a bound on the competitive ratio of our algorithm only for large values of m, then we may just as well compare our schedule with the optimal schedule of I′′ instead of I, since Opt(I′′)/Opt(I) → 1 for m → ∞. The next lemma states that the total mean busy time of ρm equals the total mean busy time of the preemptive WSPT schedule of I′′ on the fast single machine.

Lemma 7. Z^R(ρm) = Z^R(σm(I′′)).

Proof. We show that schedule ρm is a preemptive WSPT schedule for I′′. First, ρm is a feasible schedule for the fast single machine relaxation of I′′ since, by definition, r′′j ≤ sj. Next we use sj ≥ r′j ≥ εpj:

cj/sj = (sj + pj/m)/sj (13)
      = 1 + pj/(m sj)
      ≤ 1 + pj/(m ε pj)
      = 1 + 1/(εm).

Assume that at moment t, job j is being processed in ρm and job k is available in I′′, i.e., r′′k ≤ t. Recall that γ = 1 + 1/(εm) and, by definition, r′′k = min{γr′k, sk}. Since also r′′k ≤ t < sk, we must have r′′k = γr′k. Using (13) we get

r′k = r′′k/γ ≤ t/γ < cj/γ ≤ (1 + 1/(εm))sj/γ = sj.

We see that job k was already available at the time we started job j in step (ii). Hence, we must have wk/pk ≤ wj/pj. □


We apply the lower bound of (11) to instance I′′:

Z^R(σm(I′′)) + (1/2) Σj wj pj ≤ Opt(I′′). (14)

Combining this with Corollary 1 and Lemma 7, we finally get a useful lower bound on the optimal solution.

Corollary 2.

Z^R(ρm) + (1/2) Σj wj pj ≤ (1 + 1/√m)² Opt(I).

The lower bound of Corollary 2, together with the obvious lower bound Opt(I) ≥ Σj wj pj, results in the following lemma.

Lemma 8. Let 1 ≤ α ≤ 2. If Sj ≤ αsj for every job j, then

Σj wj Cj ≤ (1 + α/2)(1 + 1/√m)² Opt(I).

Proof. Let bj be the mean busy time of j in ρm; then sj = bj − pj/(2m) < bj. Hence,

Cj = Sj + pj
   ≤ αsj + pj
   < αbj + pj
   = α(bj + pj/2) + (1 − α/2)pj.

Next, we add weights and take the sum over all jobs:

Σj wj Cj ≤ α (Z^R(ρm) + (1/2) Σj wj pj) + (1 − α/2) Σj wj pj.

Now we use Corollary 2 and the fact that Opt(I′′) ≥ Opt(I) ≥ Σj wj pj. For any α ≤ 2 we have

Σj wj Cj ≤ α(1 + 1/√m)² Opt(I) + (1 − α/2)Opt(I)
         ≤ (1 + α/2)(1 + 1/√m)² Opt(I). □


First we give a short proof that one can take α = 2, which shows that the competitive ratio is at most 2 + o(m).

Lemma 9. Sj ≤ 2sj for any job j.

Proof. Consider an arbitrary job j. At time sj, the total processing time of jobs k with sk < sj is at most msj. Since these are the only jobs processed on the parallel machines between time sj and Sj, we have msj ≥ m(Sj − sj). Hence, Sj ≤ 2sj. □

The bound of the next lemma is stronger; its proof is given in the technical report [25]. With it, Lemma 8 tells us that the competitive ratio is at most 1 + e/(2(e − 1)) ≈ 1.791 in the limit.

Lemma 10. Sj ≤ (e/(e − 1)) sj, and this bound is tight.

3.1 Removing the o(m)


We can easily get rid of the o(m) term at the cost of a slightly higher ratio. Correa and Wagner [6] give a randomized αm-competitive algorithm for the preemptive problem and a βm-competitive algorithm for the non-preemptive version, where 2 − 1/m = αm < βm < 2 for m ≥ 3. Let δm be our ratio as defined in Theorem 2. Then 2 − 1/m > δm for m ≥ 320. Hence, we get a randomized (2 − 1/320) < 1.997-competitive algorithm for the preemptive version by applying our algorithm for m ≥ 320 and the αm-competitive algorithm for m < 320. The ratio for the non-preemptive version is even closer to 2 (but strictly less than 2).

4 Conclusion
We have shown that approximation ratios less than 2 can be obtained for parallel machines by simple and efficient online algorithms. The lower bounds indicate that competitive ratios close to 1 may be possible for randomized algorithms, especially when preemption is allowed. Our analysis of SRPT is tight, and it seems that a substantially different proof is needed to get below 1.25. Already, the gap with the lower bound of 1.105 is quite small. Muthukrishnan et al. [17] show that SRPT is at most 14-competitive w.r.t. the average stretch of jobs. Possibly, our result can reduce this ratio substantially. The analysis of algorithm Online is not tight, and a slight modification of the algorithm and analysis may give a ratio e/(e − 1) + o(m) ≈ 1.58 + o(m). Moreover, the analysis is not parameterized by m; a refined analysis would reduce the o(m) term for small values of m.

References
1. Afrati, F., Bampis, E., Chekuri, C., Karger, D., Kenyon, C., Khanna, S., Milis, I.,
Queyranne, M., Skutella, M., Stein, C., Sviridenko, M.: Approximation schemes
for minimizing average weighted completion time with release dates. In: FOCS ’99,
pp. 32–44 (1999)
2. Anderson, E.J., Potts, C.N.: Online scheduling of a single machine to minimize
total weighted completion time. Math. Oper. Res. 29, 686–697 (2004)
3. Chekuri, C., Motwani, R., Natarajan, B., Stein, C.: Approximation techniques for
average completion time scheduling. SIAM Journal on Computing 31, 146–166
(2001)
4. Chou, M.C., Queyranne, M., Simchi-Levi, D.: The asymptotic performance ratio
of an on-line algorithm for uniform parallel machine scheduling with release dates.
Mathematical Programming 106, 137–157 (2006)
5. Chung, C., Nonner, T., Souza, A.: SRPT is 1.86-competitive for completion time
scheduling. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete
Algorithms (Austin, Texas), pp. 1373–1388 (2010)
6. Correa, J.R., Wagner, M.R.: LP-based online scheduling: from single to parallel
machines. Mathematical Programming 119, 109–136 (2009)
7. Epstein, L., van Stee, R.: Lower bounds for on-line single-machine scheduling.
Theoretical Computer Science 299, 439–450 (2003)
8. Goemans, M.X.: Improved approximation algorithms for scheduling with release
dates. In: Proc. 8th Symp. on Discrete Algorithms, New Orleans, Louisiana, United
States, pp. 591–598 (1997)
9. Goemans, M.X., Queyranne, M., Schulz, A.S., Skutella, M., Wang, Y.: Single ma-
chine scheduling with release dates. SIAM Journal on Discrete Mathematics 15,
165–192 (2002)
10. Hall, L.A., Schulz, A.S., Shmoys, D.B., Wein, J.: Scheduling to minimize average
completion time: Off-line and on-line approximation algorithms. Mathematics of
Operations Research 22, 513–544 (1997)
11. Hoogeveen, J.A., Vestjens, A.P.A.: Optimal on-line algorithms for single-machine
scheduling. In: Cunningham, W.H., Queyranne, M., McCormick, S.T. (eds.) IPCO
1996. LNCS, vol. 1084, pp. 404–414. Springer, Heidelberg (1996)
12. Hussein, M.E., Schwiegelshohn, U.: Utilization of nonclairvoyant online schedules.
Theoretical Computer Science 362, 238–247 (2006)
13. Jaillet, P., Wagner, R.M.: Almost sure asymptotic optimality for online routing and machine scheduling problems. Networks 55, 2–12 (2009)
14. Liu, P., Lu, X.: On-line scheduling of parallel machines to minimize total comple-
tion times. Computers and Operations Research 36, 2647–2652 (2009)
15. Lu, X., Sitters, R.A., Stougie, L.: A class of on-line scheduling algorithms to min-
imize total completion time. Operations Research Letters 31, 232–236 (2002)
16. Megow, N., Schulz, A.S.: On-line scheduling to minimize average completion time
revisited. Operations Research Letters 32, 485–490 (2004)
17. Muthukrishnan, S., Rajaraman, R., Shaheen, A., Gehrke, J.E.: Online scheduling
to minimize average stretch. SIAM J. Comput. 34, 433–452 (2005)
18. Phillips, C., Stein, C., Wein, J.: Minimizing average completion time in the presence
of release dates, networks and matroids; sequencing and scheduling. Mathematical
Programming 82, 199–223 (1998)
19. Queyranne, M.: On the Anderson-Potts single machine on-line scheduling algo-
rithm (2001) (unpublished manuscript)
20. Schrage, L.: A proof of the optimality of the shortest remaining processing time
discipline. Operations Research 16(3), 687–690 (1968)
21. Schulz, A.S., Skutella, M.: The power of α-points in single machine scheduling.
Journal of Scheduling 5, 121–133 (2002)
22. Schulz, A.S., Skutella, M.: Scheduling unrelated machines by randomized rounding.
SIAM Journal on Discrete Mathematics 15, 450–469 (2002)
23. Seiden, S.: A guessing game and randomized online algorithms. In: Proceedings of
the 32nd ACM Symposium on Theory of Computing, pp. 592–601 (2000)
24. Sitters, R.A.: Complexity and approximation in routing and scheduling. Ph.D. thesis, Eindhoven University of Technology, the Netherlands (2004)
25. Sitters, R.A.: Efficient algorithms for average completion time scheduling, Tech. Re-
port 2009-58, FEWEB research memorandum, Free University Amsterdam (2009)
26. Stougie, L., Vestjens, A.P.A.: Randomized on-line scheduling: How low can’t you
go? Operations Research Letters 30, 89–96 (2002)
27. Vestjens, A.P.A.: On-line machine scheduling, Ph.D. thesis, Department of Math-
ematics and Computing Science, Technische Universiteit Eindhoven, Eindhoven,
the Netherlands (1997)
Experiments with Two Row Tableau Cuts

Santanu S. Dey¹, Andrea Lodi², Andrea Tramontani², and Laurence A. Wolsey³

¹ H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, USA. [email protected]
² DEIS, Università di Bologna, Italy. {andrea.lodi,andrea.tramontani}@unibo.it
³ CORE, Catholic University of Louvain, Belgium. [email protected]

Abstract. Following the flurry of recent theoretical work on cutting


planes from two row mixed integer group relaxations of an LP tableau,
we report on some computational tests to evaluate the effectiveness of
two row cuts based on lattice-free (type 2) triangles having more than
one integer point on one side. A heuristic procedure to generate such
triangles is presented, and then the coefficients of the integer variables
are tightened by lifting. As a first step in testing the effectiveness of the triangle cuts, we compare the gap closed by one round of Gomory mixed integer cuts with the gap closed by one round of all the triangle cuts generated by our heuristic. Our tests are carried out on different classes of randomly generated instances designed to represent different models in the literature, varying the number of integer non-basic variables, bounds, and non-negativity constraints.

1 Introduction

Addition of cutting planes to strengthen linear programming relaxations has


proven to be an indispensable tool in solving mixed integer programs (MIPs).
For general MIPs, Gomory mixed integer cuts [19] (GMIC) and mixed integer
rounding inequalities [25] have been found to be important cutting planes. Con-
struction of most cutting planes can be viewed as deriving valid inequalities for
some relaxation of MIPs. Seen from this perspective, the above mentioned cut-
ting planes can be obtained by deriving valid inequalities for single constraint
relaxations of MIPs. Therefore a natural extension is to consider inequalities
that are valid for multiple constraint relaxations of MIPs. This approach has
been pursued recently with the aim of obtaining new cutting planes that could
possibly improve on the gap closed by GMICs.
A two constraint relaxation of a simplex tableau introduced in [5] is

{(z, y) ∈ Z² × Rⁿ₊ | z = f + Σ_{j=1}^{n} r^j yj}, (1)

where f ∈ Q² \ {(0, 0)} and r^j ∈ Q² for all j ∈ {1, . . . , n}. Various variants of (1) have been
studied; see for example [3,4,9,10,11,12,13,15,16,26] and [14] for a recent survey on
the topic. The relaxation (1) can be obtained in three steps. First select two rows of
a simplex tableau corresponding to integer basic variables currently at fractional
value(s). Then relax the non-negativity of the integer basic variables. Finally relax
the non-basic integer variables to be continuous variables. Two appealing features
of this relaxation are the complete characterization of all facet-defining inequal-
ities using the so-called lattice-free convex sets and the advantage of obtaining
the strongest possible coefficients for continuous variables in the resulting cutting
planes for MIPs. Note that the importance of obtaining cutting planes where the
continuous variables have strong coefficients has been emphasized by previous the-
oretical and computational studies; see for example [2,18].
Since the continuous variables receive strong coefficients, the focus next is to
improve the coefficients of non-basic integer variables. One way to strengthen
the coefficients of non-basic integer variables is to consider lifting them to obtain
valid inequalities for

{(z, y, x) ∈ Z² × Rⁿ₊ × Zᵏ₊ | z = f + Σ_{j=1}^{n} r^j yj + Σ_{j=n+1}^{n+k} r^j xj}, (2)

where all data is rational. One possible approach for lifting is by the use of
the so-called ‘fill-in functions’ [21,23] or ‘monoidal strengthening’ [8]. Observe
that such lifted inequalities do not provide the complete list of facet-defining
inequalities of the convex hull of (2). In fact, (2) is a face of the mixed integer
master group relaxation and a complete description of the convex hull of the
master group relaxation is unknown. (See discussion in [20,22]).
The goal of this paper is to evaluate the quality of the two-row lifted cutting
planes described above computationally. In particular, we would like to under-
stand how to select a small subset of these cutting planes which are useful in prac-
tice, to discover what are the potential weaknesses of these cutting planes, and to
evaluate how far these weaknesses can be attributed to the different relaxations or
the lack of knowledge of the complete description of the convex hull of (2). We work
with a specific subclass of facet-defining inequalities of the convex hull of (1) and consider its lifted version. We attempt to answer the following specific questions. How good are these cutting planes in the presence of integer non-basic variables? Is there a cut-off on the ratio of the number of integer and continuous non-basic
variables for these cutting planes to be useful? What is the strength of these two
row cuts in the presence of multiple rows? Can the sparsity structure be used to se-
lect important two row relaxations? How important is the effect of non-negativity
of the basic variables? Through experiments designed to answer each of these ques-
tions individually by keeping all other parameters constant, we hope to gain insight
into the strength of relaxations of m-row sets consisting of the intersection of the
convex hulls of two-row relaxations, and possibly obtain guidelines for selecting
the most useful two-row lifted cutting planes.
The outline of this paper is the following. In Section 2, we summarize the rele-
vant results regarding valid inequalities of (1) and the lifting of these inequalities.
In Section 3, we briefly discuss the algorithmic aspects of generating the lifted two
row inequalities. In Section 4, we describe various experiments conducted and the
results. We conclude in Section 5.

2 Basics
We begin with a definition of maximal lattice-free convex sets and describe these
sets in R2 .
Definition 1 ([24]). A set M ⊆ Rᵐ is called lattice-free if int(M) ∩ Zᵐ = ∅. A lattice-free convex set M is maximal if there exists no lattice-free convex set M′ ≠ M such that M ⊊ M′.
Proposition 1. Let M be a full-dimensional maximal lattice-free convex set in
R2 . Then M is one of the following:
1. A split set {(x1 , x2 ) | b ≤ a1 x1 + a2 x2 ≤ b + 1} where a1 and a2 are coprime
integers and b is an integer,
2. A triangle which is one of the following:
(a) A type 1 triangle: triangle with integral vertices and exactly one integral
point in the relative interior of each edge,
(b) A type 2 triangle: triangle with at least one fractional vertex v, exactly
one integral point in the relative interior of the two edges incident to v
and at least two integral points on the third edge,
(c) A type 3 triangle: triangle with exactly three integral points on the bound-
ary, one in the relative interior of each edge.
3. A quadrilateral containing exactly one integral point in the relative interior
of each of its edges.
A maximal lattice-free convex set M containing f in its interior can be used to derive the intersection cut Σ_{j=1}^{n} π(r^j) yj ≥ 1 ([7]) for (1), where the coefficients are obtained as:

π(r^j) = λ   if ∃λ > 0 s.t. f + (1/λ) r^j ∈ boundary(M),
π(r^j) = 0   if r^j belongs to the recession cone of M. (3)
All non-trivial valid inequalities for (1) are intersection cuts and can be derived from maximal lattice-free convex sets using (3). We refer the reader to [5,13] for a complete characterization of the facet-defining inequalities of the convex hull of (1).
Next consider the strengthening of the coefficients of the integer variables. It is possible to obtain the valid inequality Σ_{j=1}^{n} π(r^j) yj + Σ_{j=n+1}^{n+k} φ(r^j) xj ≥ 1 for (2), where φ(r^j) = inf_{u∈Z²} {π(r^j + u)} ([21,23],[8]). For an infinite version of the relaxation (2), it has been shown [15] that this strengthening yields extreme inequalities if π was obtained using (3) and M is a type 1 or type 2 triangle. Moreover, in the case in which M is a type 1 or type 2 triangle [15], the function φ can be evaluated as

φ(r^j) = min_{u∈Z²} {π(r^j + u) | f + r^j + u ∈ M}.
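To illustrate formula (3) and the lifting, the following Python sketch computes π as the gauge of M − f for a triangle given by its three vertices, and evaluates φ by brute force over a small box of integer shifts. The box size is our own heuristic assumption; for type 1 and type 2 triangles the minimizer satisfies f + r^j + u ∈ M, so a small box suffices in practice.

```python
import numpy as np
from itertools import product

def triangle_facets(vertices):
    """Facet description A x <= b of the triangle conv(vertices) in R^2."""
    v = [np.asarray(p, dtype=float) for p in vertices]
    A, b = [], []
    for i in range(3):
        p, q, opp = v[i], v[(i + 1) % 3], v[(i + 2) % 3]
        a = np.array([q[1] - p[1], p[0] - q[0]])  # normal of the edge through p, q
        if a @ opp > a @ p:                       # orient so the triangle satisfies a.x <= b
            a = -a
        A.append(a)
        b.append(a @ p)
    return np.array(A), np.array(b)

def pi_coeff(r, A, b, f):
    """Formula (3): the lambda > 0 with f + r/lambda on boundary(M), else 0."""
    vals = (A @ np.asarray(r, dtype=float)) / (b - A @ np.asarray(f, dtype=float))
    return float(max(0.0, vals.max()))  # 0 only for r in the recession cone (here, r = 0)

def phi_coeff(r, A, b, f, box=3):
    """Lifted coefficient: min over integer shifts u of pi(r + u), searched in a box."""
    r = np.asarray(r, dtype=float)
    return min(pi_coeff(r + np.array(u), A, b, f)
               for u in product(range(-box, box + 1), repeat=2))
```

For example, with the type 1 triangle conv{(0,0), (2,0), (0,2)} and f = (0.5, 0.5), pi_coeff((1, 0), ...) returns 1, since f + (1, 0) lies on the edge x₁ + x₂ = 2.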
3 Generating Type 2 Triangles


In this paper, we focus on the inequalities that are generated using maximal lattice-free triangles of type 2.
If the exact rational representation of the data in (1) is known, then it is possible to efficiently enumerate all the vertices of the convex hull of (1) by the use of the Euclidean algorithm. Once all the vertices are known, by setting up the polar, all facet-defining inequalities can be obtained. Since the exact rational representation is typically not known, we outline a different strategy to construct maximal lattice-free triangles of type 2.
Maximal lattice-free triangles of type 2 that yield facet-defining inequalities have the following interpretation.

1. Construct a facet-defining inequality of the form αj1 yj1 + αj2 yj2 ≥ 1 for the set Y² := {(z, y) ∈ Z² × R²₊ | z = f + r^j1 yj1 + r^j2 yj2}, where αj1, αj2 > 0. The set conv{f, f + r^j1/αj1, f + r^j2/αj2} ⊆ R² is lattice-free, and the line segment between f + r^j1/αj1 and f + r^j2/αj2 contains at least two integer points.
2. Lift a third continuous variable yj3 to obtain the inequality αj1 yj1 + αj2 yj2 + αj3 yj3 ≥ 1. The set conv{f + r^j1/αj1, f + r^j2/αj2, f + r^j3/αj3} ⊆ R² is lattice-free, and at least one of the sides of the triangle (other than the side between f + r^j1/αj1 and f + r^j2/αj2) contains one integer point.

Based on the above interpretations, we now give details of our implementation for generating type 2 triangles.

1. Given three variables yj1, yj2 and yj3 such that cone(r^j1, r^j2, r^j3) = R², we first attempt to generate one facet αj1 yj1 + αj2 yj2 ≥ 1 of conv(Y²) that is 'most different' from a GMIC. Since the line segment between f + r^j1/αj1 and f + r^j2/αj2 contains at least two integer points, we divide this step into two sub-steps corresponding to finding these two integer points. At the end of these two sub-steps, either the integer points are discovered or it is verified that no such integer points exist. For simplicity let 1 := j1, 2 := j2 and 3 := j3.
(a) Finding the first integer point: Let c := (c1, c2) be a strict convex combination of r¹ and r². Solve min{cᵀz | λ1 r¹ + λ2 r² = −f + z, λ1, λ2 ≥ 0, z ∈ Z²}. Let the optimal objective value be w⁰ and the optimal solution be (λ̄⁰₁, λ̄⁰₂, z̄⁰). The point z̄⁰ is tentatively one of the integer points in the interior of the side of the triangle containing multiple integer points.
(b) Finding the second integer point: Let v = λ̄⁰₁ r¹ + λ̄⁰₂ r² (i.e., v = z̄⁰ − f). First, an integer point belonging to the set {u | u = f + μ1 r¹ + μ2 r², μ1, μ2 ≥ 0} and different from z̄⁰ is found. Let r^new = v + θr¹ for a suitable θ > 0. For some c such that cᵀr¹ ≥ 0 and cᵀr^new ≥ 0, solve min{cᵀz | λ1 r¹ + λ2 r^new = −f + z, λ1, λ2 ≥ 0, z ∈ Z²}. Let the optimal solution be (λ̄¹₁, λ̄¹₂, z̄¹).
Now we verify whether the integer point z̄¹ is suitable to be the second integer point on the side of the triangle; if not, we iteratively update this point. Let e¹ ∈ R² be such that (e¹)ᵀz̄⁰ = (e¹)ᵀz̄¹ = 1 and (e¹)ᵀf < 1. This is equivalent to verifying whether (e¹)ᵀz ≥ 1 is a valid inequality for Y². Repeat the following step: solve

min{(eⁱ)ᵀz | λ1 r¹ + λ2 r² = −f + z, λ1, λ2 ≥ 0, z ∈ Z²}.

Let the optimal objective value be wⁱ⁺¹ and the optimal solution be (λ̄ⁱ⁺¹₁, λ̄ⁱ⁺¹₂, z̄ⁱ⁺¹), and let eⁱ⁺¹ ∈ R² be such that (eⁱ⁺¹)ᵀz̄⁰ = (eⁱ⁺¹)ᵀz̄ⁱ⁺¹ = 1 and (eⁱ⁺¹)ᵀf < 1. If wⁱ⁺¹ = 1, then denote the point z̄ⁱ⁺¹ by z̄^{i0} and stop. If wⁱ⁺¹ < 1, then set i ← i + 1 and repeat this step.
One final check is needed to verify that z̄^{i0} is indeed a vertex of conv(Y²). This check becomes relevant when conv(Y²) has only one vertex: verify that (z̄^{i0} − z̄⁰) and (−z̄^{i0} + z̄⁰) do not belong to the cone formed by r¹ and r².
2. Lifting the third continuous variable: From the previous step we obtain two integer points z̄⁰ and z̄^{i0} that are tight for the inequality we will obtain. Let z^{v1} and z^{v2} be the two points obtained by extending the line segment passing through z̄⁰ and z̄^{i0} until it intersects the two half-lines f + μj r^j, μj ≥ 0, j ∈ {1, 2}. The next step is to identify two other integer points which lie in the relative interior of the other two sides of the triangle. Let (a, b) = z̄⁰ − z̄^{i0}. Update a and b by dividing them by their greatest common divisor, and let p, q ∈ Z be such that pa + qb = ±1 and (q, −p)ᵀ r³ > 0. Then the two other integer points are of the form z̄⁰ + (q, −p) + k(a, b) and z̄⁰ + (q, −p) + (k + 1)(a, b) for some integer k. The integer k can be calculated as follows. There are two cases:
(a) f lies in the set {(w1, w2) | z̄⁰₁(−b) + z̄⁰₂ a ≤ w1(−b) + w2 a ≤ z̄⁰₁(−b) + z̄⁰₂ a + 1}. In this case solve z̄⁰ + (q, −p) + λ(a, b) = f + r³μ, μ ≥ 0, for λ and μ, and set k = ⌊λ⌋.
(b) f lies in the set {(w1, w2) | w1(−b) + w2 a ≥ z̄⁰₁(−b) + z̄⁰₂ a + 1}. Then solve (q, −p) + λ(a, b) = μ(f − z̄⁰), μ ≥ 0, for λ and μ, and set k = ⌊λ⌋.

Denote z̄⁰ + (q, −p) + k(a, b) by z^L and z̄⁰ + (q, −p) + (k + 1)(a, b) by z^R. Construct the triangle by joining the two vertices z^{v1} and z^{v2} obtained in the previous step to z^L and z^R, and extending these line segments until they intersect at the third vertex.
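The lattice computation in this step is elementary. The sketch below, a hedged illustration rather than the authors' code, normalizes the direction (a, b) and finds the offset (q, −p) via the extended Euclidean algorithm, leaving the determination of k to the two small linear systems of cases (a) and (b).

```python
from math import gcd

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g, where |g| = gcd(|a|, |b|)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def direction_and_offset(z0, zi0, r3):
    """Primitive direction (a, b) of the segment z0-zi0 and offset (q, -p).

    z0, zi0: the two integer points found in step 1 (integer tuples);
    r3: the third ray, used only to orient (q, -p) so that (q, -p)^T r3 > 0.
    The two remaining integer points are then z0 + (q, -p) + k*(a, b)
    and z0 + (q, -p) + (k + 1)*(a, b) for the integer k of cases (a)/(b).
    """
    a, b = z0[0] - zi0[0], z0[1] - zi0[1]
    g = gcd(abs(a), abs(b))
    a, b = a // g, b // g               # divide out the greatest common divisor
    _, p, q = extended_gcd(a, b)        # p*a + q*b = +-1
    if q * r3[0] - p * r3[1] <= 0:      # enforce (q, -p)^T r3 > 0
        p, q = -p, -q
    return (a, b), (q, -p)
```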
We observe here that the method described above is a heuristic and not an exact method for selecting the best possible type 2 triangles, for at least two reasons. First, due to the method employed to lift the third continuous variable in step 2, the resulting cut may not always be facet-defining. More importantly, for every choice of two continuous variables corresponding to the set Y² in step 1, we select only one inequality, whereas there are typically many possible candidates.
4 Computational Experiments

The aim of this computational section is to analyze the effectiveness of cuts associated with type 2 triangles. The most natural setting, which is generally used in these cases, is to run standard benchmarks on MIPLIB 2003 [1] with a cutting plane approach, with and without the new family of cuts, for a fixed number of iterations/rounds, and to look at the usual performance indicators: separation time, overall computing time, percentage gap closed, density of cuts, etc. This is the framework adopted in [17], in which multi-row cuts were tested for the first time.
However, in this particular context there are a number of issues that prevent such an approach from being entirely satisfactory. The first of these issues is precisely the fact that the relaxation is no longer single-row, as in all cutting planes from the literature so far. Indeed, independently of the yes/no answer about the effectiveness of the new cuts, one has to deal with a series of specific questions. Below we list some that appear important:

1. Is the 2-row relaxation stronger than the classical 1-row one?


(a) Should one consider cuts from 2 rows instead of cuts from 1 row?
(b) Are cuts from 2 rows “complementary” to cuts from 1 row, i.e., are the
cuts sufficiently different to be used together?
(c) Would several rounds of 1-row cuts give as much improvement as 2-row
cuts (if any)?
2. The theory developed so far considers separating facets of the 2-row relax-
ation (1) and then lifting the integer variables. In other words, 2-row cuts
are concerned mostly with the continuous component of the problem because
all integer non-basic variables are considered afterwards. Is that an effective
model?
3. Again on the theoretical side, model (1) assumes no bounds on the two
integer basic variables and only non-negativity on the non-basic variables.
What is the role played by bounds on the variables?
4. Although it is rather straightforward to separate and add GMICs from the tableau, one for each basic fractional variable defined to be integer (at least for a limited number of iterations), it is clearly not possible to add all cuts from type 2 triangles without affecting the performance of the cutting plane algorithm in various ways. Specifically, the number of rows explodes, the tableau becomes too dense, and overall the numerical stability of the linear programming engine deteriorates. Thus, we have to face a serious cut selection problem, which is far from trivial at this stage.

To begin to answer some of the questions raised above, we have decided to use randomly generated instances of which we can control the size and (partially) the structure. In particular, we used multidimensional knapsack instances obtained by the random generator of Atamtürk [6], kindly provided by the author. In addition, we considered a single round of cuts and we did not apply any cut/triangle selection rule.
It is clear that the ultimate assessment of the effectiveness of multi-row cuts will be their use within MIP solvers on real-world problems. However, as it took around 40 years before GMICs were proven to be effective in a computational setting, we believe that investing time in understanding multi-row cuts step by step is worthwhile.

4.1 Computational Setting


By using the generator [6], we first obtained multidimensional knapsack instances having m rows and ℓ integer non-negative variables. These instances are of the form

max  Σ_{j=1}^{ℓ} pj xj
s.t. Σ_{j=1}^{ℓ} wij xj ≤ bi,   i = 1, . . . , m,
     xj ≥ 0 and integer,        j = 1, . . . , ℓ,
where all the data are strictly positive. After solving the continuous relaxation
of these instances we slightly modified them so as to obtain three sets having
characteristics suitable for our tests. Namely:
A. We relaxed the m basic variables to be free variables, changed 10% of the remaining variables to be non-negative continuous variables, and removed all other variables.
In this way, we obtain instances with only continuous variables (besides the basic ones) but without too many different directions/rays (not too many variables), so that the number of type 2 triangles is small (discussed later).
B. As for set A but the remaining 90% are kept as non-negative integer variables.
This is now a real mixed integer program in which the 10% of continuous
variables need to interact with the non-basic integer variables.
C. As for set B but the objective coefficients of the 10% of continuous variables
are divided by 100.
In this way, the importance of the integer variables is significantly
increased.
For the sets above, we considered: (i) m ∈ {2, 5}; (ii) ℓ ∈ {50, 100, 150}; (iii) only one round of cuts, either GMICs, or type 2 triangle cuts, or both; and (iv) 30 instances for each pair (m, ℓ).
Concerning type 2 triangle cuts, we separate cuts from any pair of rows (only one pair for m = 2) in the following way. Given a set, say R, of suitable rays, for each pair of rays r¹, r² ∈ R we select a third ray r³ ∈ R such that cone(r¹, r², r³) = R² and try to construct a type 2 triangle as described in Section 3. We then distinguish two cases for R:
(y): R contains the rays associated with the continuous variables.
(y+x): We consider rays associated with both continuous and integer variables. However, note that the direction read from the tableau for an integer variable is not well-defined, because such a variable will be lifted afterwards (see Section 1). Thus, for each integer variable we consider the direction obtained by applying the GMIC lifting to the variable. Precisely, let r^j = (r^j₁, r^j₂) be the ray associated with the integer variable xj in the tableau. We construct the ray r̂^j, where

r̂^j_i = r^j_i − ⌊r^j_i⌋       if fi + r^j_i − ⌊r^j_i⌋ ≤ 1,
r̂^j_i = r^j_i − ⌊r^j_i⌋ − 1   otherwise,             i = 1, 2. (4)

Of course, r̂^j is only one of the possible liftings of variable xj, not necessarily the best one.
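In code, the GMIC-lifted direction (4) is a one-liner per component; this small sketch (our illustration, not the authors' implementation) makes the two cases explicit.

```python
import math

def gmic_lifted_ray(r, f):
    """Direction (4): GMIC-style lifting of the tableau ray of an integer variable.

    r = (r1, r2): tableau ray of the integer variable x_j;
    f = (f1, f2): fractional point of relaxation (1).
    """
    rhat = []
    for ri, fi in zip(r, f):
        frac = ri - math.floor(ri)                 # fractional part of r_i
        rhat.append(frac if fi + frac <= 1 else frac - 1)
    return tuple(rhat)
```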
The experiments with m = 2 are aimed at analyzing the case in which the two constraint relaxation (1) is supposed to capture much of the structure of the original problem (see Section 4.2 below). Of course, real problems generally have (many) more constraints, but in this context we limited the setting to m = 5 because we do not apply any cut selection mechanism, and otherwise the number of separated type 2 triangle cuts would have been too large.
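The pair-plus-third-ray enumeration described above can be sketched as follows; the cross-product test for cone(r¹, r², r³) = R² (all three consecutive angular gaps smaller than π) and the choice of the first qualifying third ray are our own simplifications.

```python
from itertools import combinations

def candidate_triples(rays):
    """Yield one ray triple (r1, r2, r3) with cone(r1, r2, r3) = R^2 per pair.

    rays: list of 2-dimensional tuples. The cone spans R^2 exactly when the
    pairwise cross products r1 x r2, r2 x r3, r3 x r1 share a strict sign,
    i.e. when all three angular gaps between consecutive rays are < pi.
    """
    def cross(u, v):
        return u[0] * v[1] - u[1] * v[0]

    for (i, r1), (j, r2) in combinations(enumerate(rays), 2):
        for k, r3 in enumerate(rays):
            if k in (i, j):
                continue
            c1, c2, c3 = cross(r1, r2), cross(r2, r3), cross(r3, r1)
            if (c1 > 0 and c2 > 0 and c3 > 0) or (c1 < 0 and c2 < 0 and c3 < 0):
                yield r1, r2, r3
                break  # keep only one third ray per pair, as in the text
```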
The code has been implemented in C++, using IBM ILOG Cplex 10.0 as the LP solver. In order to limit possible numerical issues in the cut generation, we adopted several tolerances and safeguards. In particular, concerning GMIC generation, we did not generate a GMIC from a tableau row if the corresponding basic variable has a fractional part smaller (resp. greater) than 0.001 (resp. 0.999). Similarly, we generated cuts from a triangle only if the fractional point f is safely in its interior; more precisely, a triangle is discarded if the Euclidean distance of f from its boundary is smaller than 0.001. In addition, both for GMICs and triangle cuts, we discarded all generated cuts with "dynamism" greater than 10⁹, where the dynamism is computed as the ratio between the greatest and smallest nonzero absolute values of the cut coefficients.
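Paraphrasing these safeguards in Python (the thresholds are the ones quoted above; the functions themselves are our illustrative reconstruction, not the authors' C++ code):

```python
def gmic_row_ok(frac, eps=0.001):
    """Generate a GMIC only if the basic variable is 'fractional enough'."""
    return eps <= frac <= 1.0 - eps

def cut_ok(coeffs, max_dynamism=1e9):
    """Discard a cut whose dynamism (max/min nonzero |coefficient|) is too large."""
    nz = [abs(c) for c in coeffs if c != 0.0]
    return bool(nz) and max(nz) / min(nz) <= max_dynamism
```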
Before going into the details of the numbers, it is important to note that the way used to derive the instances accomplishes a very important goal: the cuts generated are exactly the same for the three sets A, B and C (except that no integer rays are present in case A). Indeed, what changes is the impact of the cuts on the solution process, whereas (i) the number of continuous variables is constant from A to C, (ii) the number of integer variables is the same from B to C, and (iii) the rays of all variables stay the same from A to C. The changes are instead due to the fact that (1) no integer variables are present in A, and (2) the objective function coefficients of the continuous variables have been decreased from B to C.

4.2 2-Row Results


In this section we report the computational results for instances with m = 2 and all three sets A, B and C described above. In particular, note that on problems with only two rows and no integer non-basic variables (set A), model (1) is not a relaxation but coincides with the problem itself. Thus, type 2 triangle cuts are supposed to be really strong, and the experiments are aimed at assessing their real effect in practice. On the other hand, sets B and C allow one to evaluate the impact of non-basic integer variables on the strength of the two constraint relaxation (1).
Moreover, in order to better understand the impact of bounds on the generated
instances, we also considered introducing bounds on instances of sets B and C
in the following way. For the generic set Z (Z ∈ {B, C}):

Z.1: means basic binary, y ≥ 0, x ≥ 0;


Z.2: means basic free, y ≥ 0, x binary;
Z.3: means basic free, 0 ≤ y ≤ 1, x ≥ 0.

Table 1 compares one round of Gomory mixed integer cuts (GMIC in the table) with type 2 triangle cuts, either in the setting (y) above – only rays associated with continuous variables, Tr.(y) in the table – or in the setting (y+x) – rays from both integer and continuous variables, Tr.(y+x) in the table. All entries are average values of the results over 90 instances and, for each set, the first 5 columns report the characteristics of the instances in the set: namely, the number of y variables (ny), the number of x variables (nx), the number of non-basic slacks that are nonzero in the MIP optimal solution (nz.s), the number of non-basic y variables that are nonzero in the MIP optimal solution (nz.y), and the number of non-basic x variables that are nonzero in the MIP optimal solution (nz.x). Then, for each cut family, Table 1 reports three columns: percentage gap closed (%gap), number of generated cuts (ncuts), and number of cuts tight at the second optimal tableau, i.e., after reoptimization (ntight).
Table 1 does not report computing times for the separation. It is easy to see
that separating type 2 triangle cuts is more time consuming than separating
GMICs, mainly because of the procedure for finding the edge of the triangle
containing more than one integral point (see Section 3). However, the separation
time is quite reasonable: the average 2740.8 type 2 triangle cuts for instances
of type B and C require a separation time of 8 CPU seconds on a workstation

Table 1. 2-row instances

     |    Characteristics            |       GMIC         |       Tr.(y)       |      Tr.(y+x)
Set  | ny   nx    nz.s  nz.y  nz.x   | %gap  ncuts ntight | %gap  ncuts ntight | %gap  ncuts    ntight
A    | 9.8  —     0.0   2.0   —      | 75.10  2.0   1.6   | 97.99  68.1  2.1   | —     —        —
B    | 9.8  90.2  0.0   2.0   0.5    | 65.33  2.0   1.5   | 82.01  68.1  2.1   | 91.13 2,740.8  2.5
C    | 9.8  90.2  0.3   0.1   2.8    | 16.27  2.0   1.3   | 25.80  68.1  1.9   | 37.17 2,740.8  3.0
B    | 9.8  90.2  0.0   2.0   0.5    | 65.33  2.0   1.5   | 82.01  68.1  2.1   | —     —        —
B.1  |            0.0   2.0   0.5    | 69.60  2.0   1.6   | 87.54  68.1  2.2   | —     —        —
B.2  |            0.0   2.0   0.4    | 66.10  2.0   1.6   | 86.69  68.1  2.2   | —     —        —
B.3  |            0.0   2.0   0.5    | 63.55  2.0   1.5   | 79.54  68.1  2.1   | —     —        —
C    | 9.8  90.2  0.3   0.1   2.8    | 16.27  2.0   1.3   | 25.80  68.1  1.9   | —     —        —
C.1  |            0.5   0.2   3.3    | 17.36  2.0   1.4   | 25.77  68.1  2.3   | —     —        —
C.2  |            0.4   0.3   3.2    | 14.68  2.0   1.5   | 22.82  68.1  2.4   | —     —        —
C.3  |            0.3   0.1   2.9    | 16.20  2.0   1.3   | 25.67  68.1  1.9   | —     —        —
with an Intel(R) 2.40 GHz processor running the SUSE Linux 10.1 operating system. Again, developing effective selection strategies is likely to reduce this computational effort considerably.
Finally, we did not report detailed results concerning the use of GMICs and type 2 triangle cuts together, because the improvement over the latter is negligible (even if consistent).

Tentative Observations. An additional step in the interpretation of the results


of Table 1 can be made by analyzing some statistics on the separated type 2
triangle cuts which are collected in Table 2. More precisely, the table contains
for each triangle type and set of instances, the percentage of cuts whose smallest
angle belongs to the ranges ]0,15], ]15,30], ]30,45] and ]45,60]. These numbers
are reported separately for separated and tight cuts.
Below we note possible interpretations of the results reported in the Tables:

– Type 2 triangle cuts vs. other lattice-free cuts: Type 2 triangle cuts appear to be quite important: very few of them, together with the GMICs, close 98% of the gap on the instances of set A.
– Type 2 triangle cuts vs. GMICs: The fact that the triangles close more of the gap than GMICs is very interesting. In fact, the number of triangles needed is not much larger than the number of GMICs. This suggests that effort spent in finding the "right" triangles is probably very useful, especially in the case in which the continuous variables are more important.
– Need for study of new relaxations: On the instances where the integer variables are more important, the performance of both the GMICs and the triangle inequalities deteriorates. This suggests that the analysis of other relaxations based on integer non-basic variables should be pursued. The importance of generating inequalities with strong coefficients on integer variables is again illustrated by the improvement obtained by the triangle cutting planes

Table 2. Type 2 triangle statistics on 2-row instances

                                    |            %Min angle
Cut type  | Separated/Tight         | 0–15   15–30  30–45  45–60
Tr.(y)    | Separated in A,B,C      | 64.66  25.23   8.42   1.70
          | Tight in A              | 62.23  29.79   6.38   1.60
          | Tight in B              | 72.11  24.21   3.16   0.53
          | Tight in C              | 78.16  17.82   3.45   0.57
Tr.(y)    | Separated in B.n        | 64.66  25.23   8.42   1.70
          | Tight in B              | 72.11  24.21   3.16   0.53
          | Tight in B.1            | 66.50  27.50   4.50   1.50
          | Tight in B.2            | 62.81  28.64   6.53   2.01
          | Tight in B.3            | 71.66  24.60   3.21   0.53
Tr.(y)    | Separated in C.n        | 64.66  25.23   8.42   1.70
          | Tight in C              | 78.16  17.82   3.45   0.57
          | Tight in C.1            | 74.76  17.62   6.67   0.95
          | Tight in C.2            | 74.77  19.16   3.74   2.34
          | Tight in C.3            | 78.16  17.82   3.45   0.57
Tr.(y+x)  | Separated in B,C        | 61.86  24.94  10.52   2.67
          | Tight in B              | 77.63  15.35   5.26   1.75
          | Tight in C              | 91.21   5.86   2.56   0.37
based on all non-basic variables over the triangle cutting planes obtained using only the continuous non-basic variables.
– Effect of bounds on variables: Although bounds on the different types of variables deteriorate the quality of the triangle cutting planes, this reduction in quality is not significant. Note that the gap for set B (continuous variables more important) deteriorates when we have bounds on the non-basic continuous variables. Similarly, the gap for set C (integer variables more important) deteriorates when we have bounds on the non-basic integer variables. This illustrates that if a certain type of variable is important, then adding bounds on these variables deteriorates the effect of the triangle inequalities and GMICs.
– Shape of triangles and their importance: Almost consistently, the percentage of triangle cutting planes that have a smaller minimum angle and are tight after reoptimization is greater than the percentage of all separated triangle cutting planes with a smaller minimum angle. This observation may relate to the fact that a thin angle gives strong coefficients to integer non-basic variables. However, further study is needed to understand which triangles are more important.

4.3 5-Row Results


The natural question about the strength of the 2-row relaxation for m-row instances (m > 2) is addressed in this section, in which m = 5. The results are reported in Table 3, in which we only consider the basic versions A, B, and C. In addition, we considered a fourth set of instances (C.s in the table) obtained by modifying the original multidimensional knapsack instances with the aim of enforcing some structured sparsity in the constraints. More precisely, we first generated multidimensional knapsack instances having m = 5 rows and ℓ ∈ {50, 100, 150} integer non-negative variables, as for the sets A, B and C. Then, we partitioned the ℓ variables into two sets L1 = {1, . . . , ℓ/2} and L2 = {ℓ/2 + 1, . . . , ℓ}, we fixed to 0 the coefficients of the variables in L1 in the first and second knapsack constraints, we fixed to 0 the coefficients of the variables in L2 in the third and fourth knapsack constraints, and we left the fifth knapsack constraint unchanged. Afterwards, as for the instances of set C, we solved the continuous relaxation, we relaxed the m basic variables to be free, we changed 10% of the variables in L1 and L2 to be non-negative continuous, and we divided the profit of the continuous variables by 100 in the objective function.
The columns reported in Table 3 are the same as in Table 1. Again, we did not report the results of GMICs and type 2 triangle cuts together; as for the 2-row case, the

Table 3. 5-row Instances


Characteristics GMIC Tr.(y) Tr.(y+x)
Set ny nx nz.s nz.y nz.x %gap ncuts ntight %gap ncuts ntight %gap ncuts ntight
A 9.5 — 0.3 4.7 — 37.76 5.0 2.3 62.83 1,260.0 4.1 — — —
B 9.5 90.5 0.1 5.0 0.9 32.18 5.0 2.1 51.14 1,260.0 4.5 56.79 30,430.8 5.1
C 9.5 90.5 2.9 0.3 4.5 8.97 5.0 1.9 14.16 1,260.0 3.2 18.80 30,430.8 4.4
C.s 9.5 90.5 2.5 1.5 2.3 33.53 5.0 2.7 43.04 182.4 4.5 51.54 10,261.3 5.8
Experiments with Two Row Tableau Cuts 435

improvement is negligible for sets A, B and C. However, the situation changes


substantially for the set C.s where the percentage gap closed by Tr.(y) im-
proves from 43.04 (see Table 3) to 45.08 with the 5 additional GMICs, a 2%
improvement. A smaller, but still non-negligible improvement, is obtained by
using GMICs together with Tr.(x+y): the percentage gap closed goes from 51.54
(see Table 3) to 52.33.

Tentative Observations. As for the 2-row instances, we collected statistics on the separated type 2 triangle cuts in Table 4. In addition to the information
reported for 2-row instances, we collected the percentage of cuts separated by
four distinct row-pairing types. Namely, the pairings are distinguished among:
two sparse rows with many overlapping variables (s-s-s in the table), two sparse
rows with few overlapping variables (s-s-d), one sparse and one dense row (s-d),
and two dense rows (d-d). Note that the rows in the sets A, B and C are all
dense, thus only case “d-d” above occurs.
Below we note possible interpretations of the results reported in the
Tables:
– Gap closed by triangle cutting planes for the five-row problem: In moving from 2 rows to 5 rows, we observe that the amount of gap closed deteriorates. This is not surprising. However, the triangle cutting planes still seem to perform quite well and could possibly be important with more rows.
– Effect of Sparsity: Sparsity significantly improves the quality of the gap
closed both by the triangle cutting planes and GMICs. Interestingly, for the
sparse instances, the GMICs and the triangle inequalities seem to be different
from each other, and their effect is additive.
– Use of sparsity for row selection: From the tables it is clear that the triangle cutting planes generated from two rows with the same sparsity pattern are more useful. This suggests that searching the tableau rows and selecting pairs with similar sparsity patterns to generate two-row cutting planes may be a useful heuristic; a sketch of such a pairing rule is given below.
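To make the suggested heuristic concrete, the following is a minimal sketch (not from the paper; the scoring rule, function name, and all data are illustrative assumptions) that ranks tableau row pairs by the Jaccard similarity of their nonzero supports:

```python
def pair_rows_by_sparsity(rows, tol=1e-9):
    """Rank row pairs by Jaccard similarity of their sparsity patterns."""
    supports = [frozenset(j for j, a in enumerate(r) if abs(a) > tol) for r in rows]
    scored = []
    for i in range(len(rows)):
        for k in range(i + 1, len(rows)):
            union = supports[i] | supports[k]
            score = len(supports[i] & supports[k]) / len(union) if union else 0.0
            scored.append((score, i, k))
    return sorted(scored, reverse=True)  # most similar row pairs first

# Hypothetical tableau rows: rows 0 and 1 share the same support {0, 2}.
rows = [[1.0, 0, 0.5, 0], [2.0, 0, 0.3, 0], [0, 1.5, 0, 0.7]]
print(pair_rows_by_sparsity(rows)[0])  # -> (1.0, 0, 1)
```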

Table 4. Type 2 Triangle Statistics on 5-row Instances

%Min angle %Row pairing


Cut type Separated/Tight 0–15 15–30 30–45 45–60 s-s-s s-s-d s-d d-d
Tr.(y) Separated in A,B,C 65.33 23.30 8.87 2.49 0.00 0.00 0.00 100.00
Tight in A 68.75 20.63 8.75 1.88 0.00 0.00 0.00 100.00
Tight in B 79.43 12.00 8.00 0.57 0.00 0.00 0.00 100.00
Tight in C 92.74 4.84 1.61 0.81 0.00 0.00 0.00 100.00

Separated in C.s 67.68 23.42 7.59 1.31 8.69 0.00 55.39 35.92
Tight in C.s 69.29 23.60 6.37 0.75 32.21 0.00 28.84 38.95
Tr.(x+y) Separated in B,C 61.80 25.36 10.15 2.69 0.00 0.00 0.00 100.00
Tight in B 81.41 14.57 3.52 0.50 0.00 0.00 0.00 100.00
Tight in C 94.12 5.29 0.59 0.00 0.00 0.00 0.00 100.00

Separated in C.s 61.57 25.36 10.45 2.62 7.29 0.00 43.90 48.81
Tight in C.s 76.86 16.29 6.57 0.29 35.43 0.00 24.86 39.71

5 Conclusions

In this paper, we carried out experiments with type 2 triangle inequalities on


different classes of randomly generated instances designed to study the effect
of integer non-basic variables, multiple rows and their sparsity patterns. By
comparing the gap closed by these inequalities and the Gomory mixed integer
cuts for one round, we were able to obtain some understanding of these cutting
planes. It is clear that the comparison might be unfair because only a few GMICs can be separated with respect to a huge number of type 2 triangle cuts, but this is inherent to the fact that we compared a one-constraint relaxation with a two-constraint relaxation. However, although there are numerous triangles of
type 2, very few of these cuts are eventually useful. Concerning quality, we have
observed that both GMICs and triangle inequalities significantly deteriorate in
performance in the presence of important non-basic integer variables. Finally,
we believe that the computational methodology introduced in this paper may
be useful in the future to analyze multi-row cuts whose inherent structure is
rather different from that of the classical 1-row cuts.
While these initial observations are interesting, the important question of cut
selection still remains. Moreover, additional experiments need to be conducted
on more application-oriented problems and possibly more rounds of cuts need
to be considered to truly understand the impact of these cutting planes.

References
1. Achterberg, T., Koch, T., Martin, A.: MIPLIB 2003. Operations Research Let-
ters 34, 361–372 (2006), http://miplib.zib.de
2. Andersen, K., Cornuéjols, G., Li, Y.: Reduce-and-split cuts: Improving the perfor-
mance of mixed integer Gomory cuts. Management Science 51, 1720–1732 (2005)
3. Andersen, K., Louveaux, Q., Weismantel, R.: An analysis of mixed integer linear
sets based on lattice point free convex sets. Mathematics of Operations Research 35,
233–256 (2010)
4. Andersen, K., Louveaux, Q., Weismantel, R.: Mixed-integer sets from two rows of
two adjacent simplex bases (2009), http://hdl.handle.net/2268/35089
5. Andersen, K., Louveaux, Q., Weismantel, R., Wolsey, L.A.: Cutting planes from
two rows of a simplex tableau. In: Fischetti, M., Williamson, D.P. (eds.) Pro-
ceedings 12th Conference on Integer and Combinatorial Optimization, pp. 1–15.
Springer, Heidelberg (2007)
6. Atamturk, A.: http://ieor.berkeley.edu/~ atamturk/data/
7. Balas, E.: Intersection cuts - a new type of cutting planes for integer programming.
Operations Research 19, 19–39 (1971)
8. Balas, E., Jeroslow, R.: Strenghtening cuts for mixed integer programs. European
Journal of Operations Research 4, 224–234 (1980)
9. Basu, A., Conforti, M., Cornuéjols, G., Zambelli, G.: Maximal lattice-free convex
sets in linear subspaces (2009), http://www.math.unipd.it/~ giacomo/
10. Basu, A., Conforti, M., Cornuéjols, G., Zambelli, G.: Minimal inequalities for an
infinite relaxation of integer programs (2009),
http://www.math.unipd.it/~ giacomo/
Experiments with Two Row Tableau Cuts 437

11. Borozan, V., Cornuéjols, G.: Minimal valid inequalities for integer constraints.
Mathematics of Operations Research 34, 538–546 (2009)
12. Conforti, M., Cornuéjols, G., Zambelli, G.: A geometric perspective on lifting
(2009), http://www.math.unipd.it/~ giacomo/
13. Cornuéjols, G., Margot, F.: On the facets of mixed integer programs with two
integer variables and two constraints. Mathematical Programming 120, 429–456
(2009)
14. Dey, S.S., Tramontani, A.: Recent developments in multi-row cuts. Optima 80, 2–8
(2009)
15. Dey, S.S., Wolsey, L.A.: Lifting integer variables in minimal inequalities correspond-
ing to lattice-free triangles. In: Lodi, A., Panconesi, A., Rinaldi, G. (eds.) Proceed-
ings 13th Conference on Integer and Combinatorial Optimization, pp. 463–475.
Springer, Heidelberg (2008)
16. Dey, S.S., Wolsey, L.A.: Constrained infinite group relaxations of MIPs, Tech. Re-
port CORE DP 33, Université catholique de Louvain, Louvain-la-Neuve, Belgium
(2009)
17. Espinoza, D.: Computing with multiple-row Gomory cuts. In: Lodi, A., Panconesi,
A., Rinaldi, G. (eds.) IPCO 2008. LNCS, vol. 5035, pp. 214–224. Springer, Heidel-
berg (2008)
18. Fischetti, M., Saturni, C.: Mixed integer cuts from cyclic groups. Mathematical
Programming 109, 27–53 (2007)
19. Gomory, R.E.: An algorithm for integer solutions to linear programs. In: Graves,
R.L., Wolfe, P. (eds.) Recent Advances in Mathematical Programming, pp. 269–
308. Mcgraw-Hill Book Company Inc., New York (1963)
20. Gomory, R.E., Johnson, E.L.: Some continuous functions related to corner polyhe-
dra, part I. Mathematical Programming 3, 23–85 (1972)
21. Gomory, R.E., Johnson, E.L.: Some continuous functions related to corner polyhe-
dra, part II. Mathematical Programming 3, 359–389 (1972)
22. Gomory, R.E., Johnson, E.L.: T-space and cutting planes. Mathematical Program-
ming 96, 341–375 (2003)
23. Johnson, E.L.: On the group problem for mixed integer programming. Mathemat-
ical Programming Study 2, 137–179 (1974)
24. Lovász, L.: Geometry of numbers and integer programming. Mathematical Pro-
gramming: Recent Developments and Applications, 177–210 (1989)
25. Nemhauser, G.L., Wolsey, L.A.: A recursive procedure to generate all cuts for 0-1
mixed integer programs. Mathematical Programming 46, 379–390 (1990)
26. Zambelli, G.: On degenerate multi-row Gomory cuts. Operations Research Let-
ters 37, 21–22 (2009)
An OPT + 1 Algorithm for the Cutting Stock Problem with Constant Number of Object Lengths

Klaus Jansen¹ and Roberto Solis-Oba²

¹ Institut für Informatik, Universität zu Kiel, Kiel, Germany
[email protected]
² Department of Computer Science, The University of Western Ontario, London, Canada
[email protected]

Abstract. In the cutting stock problem we are given a set T of object types, where objects of type $T_i \in T$ have integer length $p_i > 0$. Given a set O of n objects containing $n_i$ objects of type $T_i$, for each $i = 1, \ldots, d$, the problem is to pack O into the minimum number of bins of capacity β. In this paper we consider the version of the problem in which the number d of different object types is constant, and we present an algorithm that computes a solution using at most OPT + 1 bins, where OPT is the value of an optimum solution.

1 Introduction

In the cutting stock problem we are given a set T = {T1 , T2 , . . . , Td } of object


types, where objects of type Ti have positive integer length pi . Given an infinite
set of bins, each of integer capacity β, the problem is to pack a set O of n objects
into the minimum possible number of bins in such a way that the capacity of the
bins is not exceeded; in set O there are ni objects of type Ti , for all i = 1, . . . , d.
In this paper we consider the version of the problem in which the number d of
different object types is constant.
In the related bin packing problem the goal is to pack a set of n objects with
positive integer lengths into the minimum possible number of unit capacity bins.
The cutting stock problem can be considered as the high multiplicity version of
bin packing, as defined by Hochbaum and Shamir [8]. In a high multiplicity
problem, the input objects are partitioned into types and all objects of the same
type are identical. The number of objects of a given type is called the type’s mul-
tiplicity. Note that a high multiplicity problem allows a compact representation
of the input, as the attributes of each type need to be listed only once along with
the multiplicity of the type. Hence, any instance of the cutting stock problem
with a constant number of object types can be represented with a number of bits that is logarithmic in the number of objects.

⋆ Research partially supported by the European Project AEOLUS, contract 015964.
⋆⋆ Research partially supported by the Natural Sciences and Engineering Research Council of Canada, grant 227829-2009.

There is extensive research literature on the bin packing and cutting stock problems, attesting to their importance both from the theoretical and practi-
There is extensive research literature on the bin packing and cutting stock
problem was introduced by Eisemann [2] in 1957 under the name of the “Trim
cal points of view (see e.g. the survey by Coffman et al. [1]). The cutting stock
problem was introduced by Eisemann [2] in 1957 under the name of the “Trim
problem”. The cutting stock and bin packing problems are known to be strongly
NP-hard and no approximation algorithm for them can have approximation ra-
tio smaller than 3/2 unless P = NP. In 1985 Marcotte [17] showed that the
cutting stock problem with two different object types has the so called integer
round-up property and so the algorithm by Orlin in [19] can solve this particu-
lar version of the problem in polynomial time. Later, McCormick et al. [18] pre-
sented a more efficient O(log2 β log n) time algorithm for this same version of the
problem.
Filippi and Agnetis [4] proposed an algorithm for the cutting stock problem
that uses at most OP T + d − 2 bins, where OP T is the value of an optimum
solution; hence, this algorithm also finds an optimum solution for the case of
d = 2. Recently, Filippi [5] improved on the above algorithm for the case when
d ≥ 4 by providing an algorithm that uses at most OP T + 1 bins for 2 < d ≤ 6
and at most OP T + 1 + (d − 1)/3 bins for d > 6. From the asymptotic point
of view, the best known algorithm for the problem is by Karmarkar and Karp
[12] and it produces solutions of value at most OP T + log2 d. This algorithm has
running time that is polynomial in log n, and interestingly the exponent of log n
in the running time is independent of d.
It is not known whether the cutting stock problem can be solved in polynomial time for every fixed value d. Similarly, it is not known whether there is any polynomial time algorithm for bin packing that produces a solution of value at most OPT + k for some constant value k. In this paper we make further progress towards answering these questions by providing an algorithm for the cutting stock problem that uses at most OPT + 1 bins, for any fixed value d.
Theorem 1. There is an $O(d^{21d}\, 2^{2^{3d+3}} (\log_2 n + \log\beta)^4)$ time algorithm for the cutting stock problem with a constant number d of different object types that solves the problem using at most OPT + 1 bins, where OPT is the value of an optimum solution.

When computing time complexities we use the log-cost RAM model, where each
arithmetic operation requires time proportional to the logarithm of the size of its
operands. Our algorithm uses a variation of the integer programming formula-
tion (IP) for the cutting stock problem of Gilmore and Gomory [6]; furthermore,
we take advantage of a result by Eisenbrand and Shmonin [3] stating that IP
has an optimum solution with only a constant number of positive variables.
By partitioning the set of objects into two groups of small and big objects, we
can re-write IP so that only a constant number of constraints is needed to restrict
the placement of big objects in the bins. Then, by relaxing the integrality con-
straints on the variables controlling the packing for the small objects, we obtain
a mixed integer program with a constant number of integer variables. We show
that this mixed integer program can be solved in polynomial time using Lenstra’s
algorithm [15], and a simple rounding procedure can be then used to transform
this solution into a feasible solution for the cutting stock problem that uses at
most OP T + 1 bins.

2 Mixed Integer Program Formulation


Let $\varepsilon = \frac{1}{d(2^d + d + 1)}$. We partition the set O of objects into two sets: the big objects, with lengths at least $\varepsilon\beta$, and the small objects, with lengths smaller than $\varepsilon\beta$. Without loss of generality, let $p_1, \ldots, p_\alpha$ be the different lengths of the big objects and let $p_{\alpha+1}, \ldots, p_d$ be the lengths of the small objects. Note that a bin can have at most $1/\varepsilon$ big objects in it.
A configuration Ci is a set of objects of total length at most β, so all ob-
jects in a configuration can be packed in a bin. Given a configuration Ci , the
subset CiB of big objects in Ci is called a big configuration and the subset CiS
of small objects in Ci is called a small configuration. Observe that CiB could
be empty. A configuration $C_i$ can be specified by a d-dimensional vector $C_i = \langle a(C_i, 1), a(C_i, 2), \ldots, a(C_i, d)\rangle$ in which the j-th entry, $a(C_i, j)$, specifies the number of objects of length $p_j$ in $C_i$. As the number of different object lengths is d, the number of different configurations is at most $n^d$; similarly, the number of different big configurations is at most $1/\varepsilon^d$.
Let C be the set of all configurations. The cutting stock problem can be
formulated as the following integer program, first proposed by Gilmore and
Gomory [6].

IP : min xCi
Ci ∈ C

s.t. a(Ci , j)xCi ≥ nj , for j = 1, . . . , d (1)
Ci ∈ C
xCi ∈ Z≥0 , for all Ci ∈ C

In this integer program, nj is the total number of objects of length pj , and


for each configuration Ci , variable xCi indicates the number of bins storing ob-
jects according to Ci . Constraint (1) ensures that all objects are placed in the
bins.
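To make the formulation concrete, here is a minimal, self-contained sketch (the instance data are hypothetical, and the brute-force search merely stands in for an IP solver) that enumerates all configurations and computes the optimum value of IP for a toy instance:

```python
from itertools import product
from functools import lru_cache

# Hypothetical toy instance: object lengths p_i, multiplicities n_i, capacity beta.
lengths = (5, 3, 2)
demand  = (2, 2, 3)
beta    = 10

# Enumerate all configurations <a(C,1), ..., a(C,d)> of total length at most beta.
configs = [c for c in product(*(range(beta // p + 1) for p in lengths))
           if any(c) and sum(a * p for a, p in zip(c, lengths)) <= beta]

@lru_cache(maxsize=None)
def min_bins(remaining):
    """Value of IP by exhaustive search: fewest configurations covering the demand."""
    if not any(remaining):
        return 0
    best = float("inf")
    for c in configs:
        if any(a and r for a, r in zip(c, remaining)):  # c must pack something useful
            best = min(best, 1 + min_bins(tuple(max(r - a, 0)
                                                for r, a in zip(remaining, c))))
    return best

print(min_bins(demand))  # prints 3: e.g., bins {5,5}, {3,3,2,2}, {2}
```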
Eisenbrand and Shmonin [3] show that this integer program has an optimum solution $x^*$ in which at most $2^d$ of the variables $x^*_{C_i}$ are non-zero. We will use this result to re-write IP so that the number of big configurations used is at most $2^d$. To do this, let us first split each configuration $C_i \in C$ into a big, $C_i^B$, and a small, $C_i^S$, configuration. Let $C^B$ be the set of all big configurations. Note that $C^B$ includes the configuration with no big objects in it.

IP, then, can be re-written as the following feasibility problem.

IP1 : xCi ≤ y , for all B ∈ C B (2)
Ci ∈ C
CiB =B

y  = m∗ (3)
B ∈ C B

a(B , j)y ≥ nj , for all j = 1, . . . , α (4)
B ∈ C B

a(Ci , j)xCi ≥ nj , for all j = α + 1, . . . , d (5)
Ci ∈ C
xCi ∈ Z≥0 , for all Ci ∈ C
y ∈ Z≥0 , for all B ∈ C B

In this integer program, $a(B_\ell, j)$ is the number of objects of length $p_j$ in configuration $B_\ell$, $y_\ell$ is a variable indicating the number of bins in which the big objects are packed according to big configuration $B_\ell$, and $m^*$ is the minimum number of bins needed to pack all the objects. Constraint (3) ensures that the number of bins used is the optimum one, while constraints (4) and (5) guarantee that all big and small objects are packed in the bins.

Lemma 1. $x = \langle x_{C_1}, x_{C_2}, \ldots, x_{C_{|C|}}\rangle$ is an optimum solution for IP if and only if $(x, y)$ is a feasible solution for IP1, where $y = \langle y_1, y_2, \ldots, y_{|C^B|}\rangle$ and $y_\ell = \sum_{C_i \in C:\, C_i^B = B_\ell} x_{C_i}$ for every index $\ell = 1, 2, \ldots, |C^B|$.

From Corollary 6 in [3] and Lemma 1, IP1 has an optimum solution $(x^*, y^*)$ with at most $2^d$ non-zero variables $y^*_\ell$. Let $S^B$ be the set of at most $2^d$ big configurations corresponding to the non-zero variables in $y^*$. Thus, we can reduce the number of constraints of type (2) by not considering all big configurations $C^B$, but only those in $S^B$. Since we do not know the optimum solution $(x^*, y^*)$, we do not know either which big configurations to select. However, as the number of big configurations is only $1/\varepsilon^d$, there is a constant number $\binom{\varepsilon^{-d}}{2^d}$ of subsets $S^B$ of $2^d$ big configurations, so we can try them all, knowing that one of them will lead to an optimum solution. Furthermore, in IP1 the value of $m^*$ is unknown. However, since $m^* \le n$, we can use binary search to find in $O(\log n)$ iterations the smallest value for $m^*$ for which IP1 has a feasible solution. Finally, we consider a mixed integer linear programming relaxation of IP1 by relaxing the integrality constraints on the variables $x_{C_i}$:


$$\text{MILP}(m, S^B): \quad \sum_{C_i \in C:\, C_i^B = B_\ell} x_{C_i} \le y_\ell, \quad \text{for all } B_\ell \in S^B \qquad (6)$$
$$\sum_{B_\ell \in S^B} y_\ell = m \qquad (7)$$
$$\sum_{B_\ell \in S^B} a(B_\ell, j)\, y_\ell \ge n_j, \quad \text{for all } j = 1, \ldots, \alpha \qquad (8)$$
$$\sum_{C_i \in C:\, C_i^B \in S^B} a(C_i, j)\, x_{C_i} \ge n_j, \quad \text{for all } j = \alpha + 1, \ldots, d \qquad (9)$$
$$x_{C_i} \ge 0 \ \text{ for all } C_i \in C \text{ such that } C_i^B \in S^B, \qquad y_\ell \in \mathbb{Z}_{\ge 0} \ \text{ for all } B_\ell \in S^B,$$
$$x_{C_i} = 0 \ \text{ for all } C_i \in C \text{ such that } C_i^B \notin S^B, \qquad y_\ell = 0 \ \text{ for all } B_\ell \notin S^B,$$

where m is the number of bins and $S^B$ is a subset of big configurations.

3 Rounding
In Section 4 we show how to solve MILP$(m, S^B)$ using Lenstra's algorithm [15]. Let $(x^+, y^+)$ be the solution produced by Lenstra's algorithm for MILP$(m^*, S^B)$; as we show below (see Theorem 2), in this solution at most $2^d + d + 1$ of the variables $x^+$ have non-zero value. We show now how to obtain an integer solution from $(x^+, y^+)$ that uses at most $m^* + 1$ bins.

For each big configuration $B_\ell \in S^B$ such that $y_\ell^+ > 0$, the solution $(x^+, y^+)$ uses $y_\ell^+$ bins so that the big objects stored in them conform to $B_\ell$. Let $\Delta_\ell^+ = y_\ell^+ - \sum_{C_i \in C:\, C_i^B = B_\ell} x^+_{C_i}$. If for some big configuration $B_\ell$, $\Delta_\ell^+ > 0$, then we select any $C_h \in C$ such that $C_h^B = B_\ell$ and increase the value of $x^+_{C_h}$ by $\Delta_\ell^+$. Note that this change does not affect the number of bins that the solution uses, but it ensures that constraint (6) is satisfied with equality for every $B_\ell \in S^B$; we need this property for our rounding procedure. For simplicity, let us denote the new solution as $(x^+, y^+)$.

For each $C_i \in C$ such that $x^+_{C_i} > 0$, let $\bar{x}_{C_i} = x^+_{C_i} - \lfloor x^+_{C_i} \rfloor$, so $x^+_{C_i}$ can be split into an integer part $\lfloor x^+_{C_i} \rfloor$ and a fractional one $\bar{x}_{C_i}$. We round each $\bar{x}_{C_i}$ to an integer value as follows.

1. Let $C' = \{C_i \mid C_i \in C \text{ and } \bar{x}_{C_i} > 0\}$.

2. Consider a $B_\ell \in S^B$ such that $\sum_{C_i \in C':\, C_i^B = B_\ell} \bar{x}_{C_i} > 0$ and select a set $Q \subseteq C'$ such that $C_i^B = B_\ell$ for all $C_i \in Q$, $\sum_{C_i \in Q} \bar{x}_{C_i} \ge 1$, and $\sum_{C_i \in Q'} \bar{x}_{C_i} < 1$ for all $Q' \subset Q$.
Note that since condition (6) holds with equality for every $B_\ell \in S^B$ and $y_\ell$ is integer, such a set $Q$ always exists, unless $C' = \emptyset$.
Take any configuration $C_p \in Q$ and split $\bar{x}_{C_p}$ into two parts $\bar{x}'_{C_p}$, $\bar{x}''_{C_p}$ so that $\bar{x}_{C_p} = \bar{x}'_{C_p} + \bar{x}''_{C_p}$ and
$$\bar{x}'_{C_p} + \sum_{C_i \in Q \setminus \{C_p\}} \bar{x}_{C_i} = 1. \qquad (10)$$
Observe that $\bar{x}''_{C_p}$ could be equal to zero. For simplicity, set $\bar{x}_{C_p} = \bar{x}'_{C_p}$.

3. All configurations in $Q$ have the same big configuration $B_\ell$, so we will combine all configurations of $Q$ into a single pseudo-configuration $C_Q = \langle C_{Q1}, C_{Q2}, \ldots, C_{Qd}\rangle$, where $C_{Qj} = \sum_{C_i \in Q} a(C_i, j)\, \bar{x}_{C_i}$. Note that
$$\sum_{j=1}^{d} C_{Qj}\, p_{\pi(j)} = \sum_{j=1}^{d} \Big(\sum_{C_i \in Q} a(C_i, j)\, \bar{x}_{C_i}\Big) p_{\pi(j)} = \sum_{C_i \in Q} \bar{x}_{C_i} \Big(\sum_{j=1}^{d} a(C_i, j)\, p_{\pi(j)}\Big) \le \beta \sum_{C_i \in Q} \bar{x}_{C_i} = \beta,$$
where the inequality holds because each $C_i$ is a configuration, so $\sum_{j=1}^{d} a(C_i, j)\, p_{\pi(j)} \le \beta$, and the final equality follows from (10). So the total size of the objects in $C_Q$ is at most β. But $C_Q$ might not be a feasible configuration, as some of its components $C_{Qi}$ might not be integer. Let $\lfloor C_Q \rfloor = \langle \lfloor C_{Q1} \rfloor, \ldots, \lfloor C_{Qd} \rfloor \rangle$ and $\widetilde{C}_Q = \langle C_{Q1} - \lfloor C_{Q1} \rfloor, \ldots, C_{Qd} - \lfloor C_{Qd} \rfloor \rangle$. Clearly $\lfloor C_Q \rfloor$ is a valid configuration, and each component $C_{Qi} - \lfloor C_{Qi} \rfloor$ of $\widetilde{C}_Q$ has value smaller than 1.
Note that the first α components of $\widetilde{C}_Q$ are zero because for all configurations $C_j \in Q$, $C_j^B = B_\ell$; hence, for $j \le \alpha$, $C_{Qj} = \sum_{C_i \in Q} a(C_i, j)\, \bar{x}_{C_i} = a(B_\ell, j) \sum_{C_i \in Q} \bar{x}_{C_i} = a(B_\ell, j)$, by (10); observe that $a(B_\ell, j)$ is integer. Thus, each $\widetilde{C}_Q$ is of the form
$$\widetilde{C}_Q = \langle 0, \ldots, 0, \widetilde{C}_{Q,\alpha+1}, \ldots, \widetilde{C}_{Q,d} \rangle, \quad \text{where } 0 \le \widetilde{C}_{Q,i} < 1 \text{ for all } i. \qquad (11)$$
We can think of $\widetilde{C}_Q$ as containing only a fraction (maybe equal to zero) of an object of each different small length. The fractional items in $\widetilde{C}_Q$ are set aside for the time being.

4. Remove $Q$ from $C'$ and, if $\bar{x}''_{C_p} > 0$, add $C_p$ back to $C'$ and set $\bar{x}_{C_p} = \bar{x}''_{C_p}$.

5. If $C' \neq \emptyset$, go back to Step 2.
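To illustrate Steps 2–4 on small hypothetical values: suppose three configurations with the same big configuration $B_\ell$ have fractional parts $\bar{x}_{C_1} = 0.4$, $\bar{x}_{C_2} = 0.5$, and $\bar{x}_{C_3} = 0.3$. Then $Q = \{C_1, C_2, C_3\}$ satisfies the conditions of Step 2: the total is $1.2 \ge 1$, while every proper subset sums to less than 1. Choosing $C_p = C_1$, condition (10) forces $\bar{x}'_{C_1} = 1 - 0.5 - 0.3 = 0.2$, leaving $\bar{x}''_{C_1} = 0.2$, which is returned to $C'$ in Step 4.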
The above procedure yields a solution that uses $m^*$ bins, but the objects from the configurations $\widetilde{C}_Q$ identified in Step 3 are not yet packed. Let $\widetilde{\mathcal{C}}_Q$ be the set of all these configurations.

Lemma 2. All objects in the configurations of $\widetilde{\mathcal{C}}_Q$ can be packed in one bin.

Lemma 3. The above rounding procedure packs all the objects in at most $m^* + 1$ bins.

A description of our algorithm for bin packing is given below.

Algorithm BinPacking(P, U, β)
Input: Sets $P = \{p_1, \ldots, p_d\}$ and $U = \{n_1, \ldots, n_d\}$ of object lengths and their multiplicities; capacity β of each bin.
Output: A packing of the objects into at most OPT + 1 bins.
1. Set $\varepsilon = \frac{1}{d(2^d + d + 1)}$ and then partition the set of objects into big (of length at least $\varepsilon\beta$) and small (of length smaller than $\varepsilon\beta$). Set $m^* = n$.
2. For each set $S^B$ of $2^d$ big configurations do:
   Use Lenstra's algorithm and binary search over the set $V = \{1, 2, \ldots, m^*\}$ to find the smallest value $j \in V$, if any, for which MILP$(j, S^B)$ has a solution.
   If a value $j < m^*$ was found for which MILP$(j, S^B)$ has a solution, then set $m^* = j$ and let $(x^+, y^+)$ be the solution computed by Lenstra's algorithm for MILP$(j, S^B)$.
3. Round $(x^+, y^+)$ as described above and output the corresponding packing of the objects into $m^* + 1$ bins.
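The control flow of BinPacking can be summarized in a short sketch. This is not the paper's implementation: `milp_solve` is a hypothetical oracle standing in for Lenstra's algorithm applied to MILP$(j, S^B)$, and the binary search assumes, as in Step 2, that feasibility is monotone in the number of bins:

```python
from itertools import combinations

def bin_packing_outer_loop(big_configs, n, d, milp_solve):
    """Try every candidate set S^B of big configurations; for each, binary-search
    the smallest j with MILP(j, S^B) feasible, keeping the best solution found."""
    m_star, best = n, None
    for SB in combinations(big_configs, min(2 ** d, len(big_configs))):
        lo, hi, found = 1, m_star, None
        while lo <= hi:                      # binary search over V = {1, ..., m_star}
            j = (lo + hi) // 2
            sol = milp_solve(j, SB)          # returns (x+, y+) or None if infeasible
            if sol is not None:
                found, hi = (j, sol), j - 1
            else:
                lo = j + 1
        if found is not None and found[0] <= m_star:
            m_star, best = found
    return m_star, best                      # rounding then packs into m_star + 1 bins
```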

Lemma 4. Algorithm BinPacking computes a solution for the cutting stock problem that uses at most OPT + 1 bins.

4 Solving the Mixed Integer Linear Program

Lenstra’s algorithm [15] can be used to solve mixed integer linear programs
in which the number of integer variables is constant; the time complexity of
the algorithm is O(P (N  )), where P is a polynomial and N  = O(M η log k)
is the maximum number of bits needed to specify the input, where M is the
number of constraints in the mixed integer linear program, η is the number of
variables, and k is the maximum of the absolute values of the coefficients of
the constraints. Since MILP(m, S B ) has O(nd ) variables, it might seem that
the time complexity of Lenstra’s algorithm is too high for our purposes as an
instance of the high multiplicity bin packing problem is specified with only N =
 d 
i=1 (log pi + log ni ) + log β = O(log β + log n) bits, and so P (N ) is not a
polynomial function of N . In this section we show that Lenstra’s algorithm can,
in fact, be implemented to run in time polynomial in N .
An OP T + 1 Algorithm for the Cutting Stock Problem 445

The set of constraints of MILP$(m, S^B)$ can be written in the form $A(y, x) \le b$. Let $K'$ denote the closed convex set
$$K' = \{(y, x) \in \mathbb{R}^{2^d + n^d} \mid A(y, x) \le b\}$$
and let
$$K = \{y \in \mathbb{R}^{2^d} \mid \text{there exists } x \in \mathbb{R}^{n^d} \text{ such that } (y, x) \in K'\};$$
then deciding whether MILP$(m, S^B)$ has a feasible solution is equivalent to deciding whether $K \cap \mathbb{Z}^{2^d} \neq \emptyset$. For completeness we give below a brief description of Lenstra's algorithm.

Algorithm Lenstra(K)
Input: Closed convex set K of dimension D.
Output: A point in $K \cap \mathbb{Z}^D$, if one exists, or null if $K \cap \mathbb{Z}^D = \emptyset$.
1. Reduce the dimension of K until we ensure that K has positive volume.
2. Compute a linear transformation τ that maps K into a ball-like set τK such that there is a point σ and radii r, R with $R/r \le 2D^{3/2}$ for which $B(\sigma, r) \subset \tau K \subset B(\sigma, R)$, where $B(\sigma, z) \subset \mathbb{R}^D$ is the closed ball with center σ and radius z.
3. Compute a reduced basis $b_1, b_2, \ldots, b_D$ for the lattice $\tau \mathbb{Z}^D$: a basis such that $\prod_{i=1}^{D} \|b_i\| \le 2^{D(D-1)/4} \cdot |\det(b_1, b_2, \ldots, b_D)|$, where $\|\cdot\|$ denotes the Euclidean norm.
4. Find a point $v \in \tau \mathbb{Z}^D$ such that $\|v - \sigma\| \le \frac12 \sqrt{D} \max\{\|b_i\| \mid i = 1, \ldots, D\}$.
5. If $v \in \tau K$ then output $\tau^{-1} v$.
6. If $v \notin \tau K$, let $H = \sum_{i=1}^{D-1} \mathbb{R} b_i$ be the $(D-1)$-hyperplane spanned by $b_1, \ldots, b_{D-1}$.
   For each integer i such that $H + i b_D$ intersects $B(\sigma, R)$ do:
      Let $\bar{K}$ be the intersection of K with $H + i b_D$.
      If $\bar{v} = \text{Lenstra}(\bar{K})$ is not null, then output $\tau^{-1}(\bar{v}, i b_D)$.
   Output null.

4.1 Time Complexity of Lenstra’s Algorithm


Step 1 of Lenstra’s algorithm (for details see [15]) requires maximizing O(22d )
linear functions on K and sometimes using the Hermite normal form algorithm
of Kannan and Bachem [10]; the algorithm in [10] requires the computation of
O(22d ) determinants of matrices of size O(2d ) × O(2d ), each of whose entries
is encoded with log n bits, and so it runs in O(24.7d log n) time by using the
algorithm of [9] for computing the determinants.
Maximizing a linear function f (y) = f1 y1 +f2 y2 +· · · f2d y2d on K is equivalent
to maximizing on K  a linear function that depends only on the 2d variables y;
this latter problem can be written as follows.
$$\text{LP}: \quad \max \sum_{B_\ell \in S^B} f_\ell\, y_\ell$$
$$\text{s.t.} \quad y_\ell - \sum_{C_i \in C:\, C_i^B = B_\ell} x_{C_i} \ge 0, \quad \text{for all } B_\ell \in S^B$$
$$-\sum_{B_\ell \in S^B} y_\ell \ge -m$$
$$\sum_{B_\ell \in S^B} a(B_\ell, j)\, y_\ell \ge n_j, \quad \text{for all } j = 1, \ldots, \alpha$$
$$\sum_{C_i \in C:\, C_i^B \in S^B} a(C_i, j)\, x_{C_i} \ge n_j, \quad \text{for all } j = \alpha + 1, \ldots, d$$
$$y_\ell \ge 0 \ \text{ for all } B_\ell \in S^B; \qquad x_{C_i} \ge 0 \ \text{ for all } C_i \in C \text{ such that } C_i^B \in S^B$$

LP has $2^d + d + 1$ constraints, but it might have a very large number of variables, so we deal with its dual instead:

$$\text{DLP}: \quad \min\ \delta_0 m - \sum_{j=1}^{d} \lambda_j n_j$$
$$\text{s.t.} \quad \delta_\ell - \delta_0 + \sum_{j=1}^{\alpha} a(B_\ell, j)\, \lambda_j \le -f_\ell, \quad \text{for all } B_\ell \in S^B \qquad (12)$$
$$-\delta_\ell + \sum_{j=\alpha+1}^{d} a(C_i, j)\, \lambda_j \le 0, \quad \text{for all } C_i \in C,\ B_\ell \in S^B \text{ s.t. } C_i^B = B_\ell \qquad (13)$$
$$\delta_0 \ge 0; \quad \delta_\ell \ge 0 \ \text{ for all } B_\ell \in S^B; \quad \lambda_j \ge 0,\ j = 1, \ldots, d$$

We use the ellipsoid algorithm [13, 7] to solve DLP. Note that DLP has only $2^d + d + 1$ variables, but it has a large number, $O(n^d)$, of constraints; so for the ellipsoid algorithm to solve DLP in time polynomial in N, we need an efficient separation oracle that, given a vector $\delta = \langle \lambda_1, \ldots, \lambda_d, \delta_0, \ldots, \delta_{2^d} \rangle$, either determines that δ is a feasible solution for DLP or finds a constraint of DLP that is violated by δ.
To design this separation oracle, we can think that each object $o_i \in O$ has length $p_i$ and value $\lambda_i$. Each constraint (12) can be tested in constant time. However, constraints (13) are a bit harder to test. Since a configuration $C_i$ for which $C_i^B = B_\ell \in S^B$ includes small objects of total length at most $\beta - \beta_\ell$, where $\beta_\ell$ is the total length of the big objects in $B_\ell$, constraints (13) check that for each $C_i \in C$ and $B_\ell \in S^B$ such that $C_i^B = B_\ell$, the set of small objects in $C_i$ has total value at most $\delta_\ell$.
Hence (as was also observed by Karmarkar and Karp [12]), to determine whether the constraints (13) are satisfied we need to solve an instance of the knapsack problem in which the input is the set of small objects and the knapsack has capacity $\beta - \beta_\ell$. If the maximum value of any subset of small objects of total length at most $\beta - \beta_\ell$ is larger than $\delta_\ell$, then we know that a constraint of type (13) is violated; furthermore, the solution of the knapsack problem indicates exactly which constraint is not satisfied by δ.
Therefore, a separation oracle for DLP needs to be able to efficiently solve any instance of the knapsack problem formed by a set of objects of $d' \le d$ different types, where objects of type i have length $p_i$ and value $\lambda_i$. This knapsack problem can be formulated as an integer program with a constant number of variables, and so it can be solved, for example, by using Kannan's algorithm [11] in $O(d^{9d}(d2^{8d}\log_2 n + \log\beta)^3)$ time.
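For intuition, the separation test itself is easy to state in code. The following is a simple pseudo-polynomial bounded-knapsack sketch (illustrative only; the paper instead solves the knapsack as a fixed-dimension integer program via Kannan's algorithm, which is polynomial in $\log n$ and $\log\beta$):

```python
def max_small_value(lengths, values, counts, capacity):
    """Max total value of small objects whose total length is at most `capacity`."""
    best = [0.0] * (capacity + 1)
    for p, v, n in zip(lengths, values, counts):
        for _ in range(n):                        # naive copy expansion (sketch only)
            for c in range(capacity, p - 1, -1):  # classic 0/1-knapsack recurrence
                best[c] = max(best[c], best[c - p] + v)
    return best[capacity]

# A constraint of type (13) for big configuration B_ell is violated exactly when
# max_small_value(p_small, lambda_small, n_small, beta - beta_ell) > delta_ell.
```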
By Lemmas 2.1 and 8.5 of [20], the basic feasible solutions of DLP are $(2^d + d + 1)$-vectors of rational numbers whose numerators and denominators have absolute values at most $L = (2^d)!\, n^{d2^d}$. Therefore, we can use the version of the ellipsoid algorithm described in [7] with precision $L^{-1}$ to solve DLP in time polynomial in $\log n$ and $\log\beta$.

Lemma 5. The maximum number of bits needed to encode each value in the solution $(\delta, \lambda)$ computed by the ellipsoid algorithm for DLP is $O(d2^{8d}\log_2 n)$.

Lemma 6. The running time of the ellipsoid algorithm when solving DLP is $O(d^{9d}2^{4d}(d2^{8d}\log_2 n + \log\beta)^3 \log n)$.

Lemma 7. Step 1 of Lenstra's algorithm can be performed in time $O(d^{9d}2^{6d}(d2^{8d}\log_2 n + \log\beta)^3 \log n)$.

Step 2 of algorithm Lenstra requires finding a linear transformation τ that maps K into a set τK that is nearly “spherical”. To do this, a simplex S spanned by $O(2^d)$ vertices of K is first found, and then it is transformed into a polyhedron of “large” volume through an iterative procedure that at each step modifies S by replacing one of its vertices with some vertex v of K such that the volume of the resulting polyhedron increases by at least a factor of 3/2. Each vertex v can be found by maximizing $2^d + 1$ linear functions on K.

Since the volume of the simplex S is at least $1/(2^d)!$ (see [15]) and, by constraint (7) of MILP$(m, S^B)$, the volume of K is at most $n^{2^d}$, the number of iterations in the above procedure is at most $\log_{3/2}(n^{2^d}(2^d)!) = O(2^d \log n)$. Therefore, for Step 2 of algorithm Lenstra we need to maximize $O(2^d \log n)$ linear functions over K. By using the ellipsoid algorithm to maximize each linear function, the total time needed to perform Step 2 of Lenstra's algorithm is $O(d^{9d}2^{5d}(d2^{8d}\log_2 n + \log\beta)^3 \log^2 n)$.
For Step 3 we can use the algorithm of Lenstra, Lenstra, and Lovász [14] to find a reduced basis. By Proposition 1.26 in [14], this step can be performed in $O(2^{6d})$ time. Step 4 requires projecting σ onto each one of the vectors $b_i$ from the reduced basis and then rounding each component of the projection down to the nearest integer. This step requires $O(2^d)$ multiplications on numbers encoded with $O(2^{6d})$ bits, so it can be performed in $O(2^{12d})$ time.

Finally, in Step 5, to decide whether $y \in \tau K$, we need to determine whether $y' = \tau^{-1}y \in K$. This requires us to solve MILP when the values of the variables y are known: this is just linear program LP when the objective function f is constant and the values for the variables y are given. The dual of this linear program is DLP without constraints (12), so we can solve it using the ellipsoid algorithm.
Lemma 8. The running time of Lenstra’s algorithm is
3d
O(d9d 22 +6d
(d28d log2 n + log β)3 log2 n).

Proof. We have shown that steps 1-5 of the algorithm can be performed in time
T = O(d9d 26d (d28d log2 n + log β)3 log2 n). As shown in [15] in the “for” loop
of step 6 we need to consider 21+2d+2 (2 −1)/4 different values for i. In each
d d

iteration of the “for” loop we perform a recursive call to the algorithm, and the
recursive call dominates the running time of every iteration of the loop.
Let F (D) be the time complexity of the algorithm when the input convex set
has dimension D. Then, F (D) satisfies the following recurrence:
d
(2d −1)/4
F (D) = T + 21+d+2 F (D − 1).
!
d
(2d −1)/4 D
Therefore, F (D) = O T (21+d+2 ) . Since D = 2d , the complexity of
3d
the algorithm is O(d9d 22 +6d
(d28d log2 n + log β)3 log2 n).

Theorem 2. A solution for MILP$(m^*, S^B)$ in which at most $2^d + d + 1$ variables $x_{C_i}$ have positive value can be computed in $O(d^{18d}\,2^{2^{3d}+6d}(d2^{8d}\log_2 n + \log\beta)^3 \log^2 n)$ time.

Proof of Theorem 1: By Lemma 4, algorithm BinPacking produces a solution for the high multiplicity bin packing problem using at most OPT + 1 bins. By Theorem 2, the time complexity of BinPacking is $O(d^{21d}\,2^{2^{3d+3}}(\log_2 n + \log\beta)^4)$, as the algorithm needs to solve $O\big(\binom{\varepsilon^{-d}}{2^d}\log n\big)$ mixed integer linear programs.

References
1. Coffman Jr., E.G., Garey, M.R., Johnson, D.S.: Approximation algorithms for bin
packing: a survey. In: Hochbaum, D.S. (ed.) Approximation algorithms for NP-hard
problems, pp. 46–86. PWS Publishing Company (1997)
2. Eisemann, K.: The trim problem. Management Science 3(3), 279–284 (1957)
3. Eisenbrand, F., Shmonin, G.: Carathéodory bounds for integer cones. Operations
Research Letters 34, 564–568 (2006)
4. Filippi, C., Agnetis, A.: An asymptotically exact algorithm for the high-multiplicity
bin packing problem. Mathematical Programming 104(1), 21–57 (2005)
5. Filippi, C.: On the bin packing problem with a fixed number of object weights.
European Journal of Operational Research 181, 117–126 (2007)

6. Gilmore, P.C., Gomory, R.E.: A linear programming approach to the cutting stock
problem. Operations Research 9, 849–859 (1961)
7. Grötschel, M., Lovász, L., Schrijver, A.: The ellipsoid method and its consequences
in Combinatorial Optimization. Combinatorica 1(2), 169–197 (1981)
8. Hochbaum, D.S., Shamir, R.: Strongly polynomial algorithms for the high multi-
plicity scheduling problem. Operations Research 39, 648–653 (1991)
9. Kaltofen, E., Villard, G.: On the complexity of computing determinants. Compu-
tational Complexity 13(3-4), 91–130 (2004)
10. Kannan, R., Bachem, A.: Polynomial algorithms for computing the Smith and
Hermite normal forms of an integer matrix. SIAM Journal on Computing 8, 499–
507 (1979)
11. Kannan, R.: Minkowski’s convex body theorem and integer programming. Mathe-
matics of Operations Research 12(3), 415–440 (1987)
12. Karmarkar, N., Karp, R.M.: An efficient approximation scheme for the one-
dimensional bin packing problem. In: Proceedings FOCS, pp. 312–320 (1982)
13. Khachiyan, L.G.: A polynomial algorithm in linear programming. Dokl. Akad.
Nauk SSSR 244, 1093–1096 (1979); English translation: Soviet Math. Dokl. 20,
191–194 (1979)
14. Lenstra, A.K., Lenstra Jr., H.W., Lovász, L.: Factoring Polynomials with rational
coefficients. Math. Ann. 261, 515–534 (1982)
15. Lenstra Jr., H.W.: Integer programming with a fixed number of variables. Mathe-
matics of Operations Research 8(4), 538–548 (1983)
16. Lovász, L.: Complexity of algorithms (1998)
17. Marcotte, O.: The cutting stock problem and integer rounding. Mathematical Pro-
gramming 33, 82–92 (1985)
18. McCormick, S.T., Smallwood, S.R., Spieksma, F.C.R.: A polynomial algorithm for
multiprocessor scheduling with two job lengths. Math. Op. Res. 26, 31–49 (2001)
19. Orlin, J.B.: A polynomial algorithm for integer programming covering problems
satisfying the integer round-up property. Mathematical Programming 22, 231–235
(1982)
20. Papadimitriou, C.H., Steiglitz, K.: Combinatorial optimization: Algorithms and
complexity. Prentice-Hall, Inc., Englewood Cliffs (1982)
21. Schrijver, A.: Theory of Linear and Integer Programming. John Wiley, Chichester
(1986)
On the Rank of Cutting-Plane Proof Systems

Sebastian Pokutta¹ and Andreas S. Schulz²

¹ Technische Universität Darmstadt, Germany
[email protected]
² Massachusetts Institute of Technology, USA
[email protected]

Abstract. We introduce a natural abstraction of propositional proof


systems that are based on cutting planes. This new class of proof sys-
tems includes well-known operators such as Gomory-Chvátal cuts, lift-
and-project cuts, Sherali-Adams cuts (for a fixed hierarchy level d), and
split cuts. The rank of such a proof system corresponds to the num-
ber of rounds needed to show the nonexistence of integral solutions. We
exhibit a family of polytopes without integral points contained in the n-
dimensional 0/1-cube that has rank Ω(n/ log n) for any proof system in
our class. In fact, we show that whenever a specific cutting-plane based
proof system has (maximal) rank n on a particular family of instances,
then any cutting-plane proof system in our class has rank Ω(n/ log n) for
this family. This shows that the rank complexity of worst-case instances
is intrinsic to the problem, and does not depend on specific cutting-plane
proof systems, except for log factors. We also construct a new cutting-
plane proof system that has worst-case rank O(n/ log n) for any poly-
tope without integral points, implying that the universal lower bound is
essentially tight.

Keywords: Cutting planes, proof systems, Gomory-Chvátal cuts, lift-


and-project cuts, split cuts.

1 Introduction
Cutting planes are a fundamental, theoretically and practically relevant tool in
combinatorial optimization and integer programming. Cutting planes help to
eliminate irrelevant fractional solutions from polyhedral relaxations while pre-
serving the feasibility of integer solutions. There are several well-known pro-
cedures to systematically derive valid inequalities for the integer hull PI of a
rational polyhedron P = {x ∈ Rn : Ax ≤ b} ⊆ [0, 1]n (see, e.g., [8, 9]). This
includes Gomory-Chvátal cuts [5, 17, 18, 19], lift-and-project cuts [1], Sherali-
Adams cuts [28], and the matrix cuts of Lovász and Schrijver [21], to name just
a few. Repeated application of these operators is guaranteed to yield a linear
description of the integer hull, and the question naturally arises of how many
rounds are, in fact, necessary. This gives rise to the notion of rank. For exam-
ple, it is known that the Gomory-Chvátal rank of a polytope contained in the
n-dimensional 0/1-cube is at most $O(n^2 \log n)$ [14], whereas the rank of all other


methods mentioned before is bounded above by n, which is known to be tight


(see, e.g., [7, 8]). These convexification procedures can also be viewed as propo-
sitional proof systems (e.g., [6, 12, 13]), each using its own rules to prove that a
system of linear inequalities with integer coefficients does not have a 0/1-solution.
While exponential lower bounds on the lengths of the proofs were obtained for specific systems (e.g., [3, 13, 25]), there is no general technique available that would work for all propositional proof systems (which would actually prove that NP ≠ co-NP). We formalize the concept of an “admissible” cutting-plane proof
NP = co-NP). We formalize the concept of an “admissible” cutting-plane proof
system (see Definition 1 below for details) and provide a generic framework that
comprises all proof systems based on cutting planes mentioned above and that
allows us to make general statements on the rank of these proof systems.

Our contribution. Our main contributions are as follows. The introduction of


admissible cutting-plane procedures exposes the commonalities of several well-
known operators and helps to explain several of their properties on a higher
level. It also allows us to uncover much deeper connections. In particular, in
the context of cutting-plane procedures as refutation systems in propositional
logic, we will show that if an arbitrary admissible cutting-plane procedure has
maximal rank n, then so does the Gomory-Chvátal procedure. In addition, the
rank of the matrix cuts of Lovász and Schrijver, Sherali and Adams, and Balas,
Ceria, and Cornuéjols and that of the split cut operator is at least n − 1. In
this sense, we show that some of the better known procedures are essentially the
weakest possible members in the family of admissible cutting-plane procedures.
However, we also provide a family of instances, i.e., polytopes P ⊆ [0, 1]n with
empty integer hull, for which the rank of any admissible cutting-plane procedure
is Ω(n/ log n). In fact, we show that the rank of any admissible cutting-plane
procedure is Ω(n/ log n) whenever there is an admissible cutting-plane proce-
dure that has maximal rank n. Last not least, we introduce a new cutting-plane
procedure whose rank is bounded by O(n/ log n), showing that the lower bound
is virtually tight. In the course of our proofs, we also exhibit several interest-
ing structural properties of integer-empty polytopes P ⊆ [0, 1]n with maximal
Gomory-Chvátal rank, maximal matrix cut rank, or maximal split rank.

Related work. A lower bound of n on the rank of the Gomory-Chvátal procedure for polytopes $P \subseteq [0,1]^n$ with $P_I = \emptyset$ was established in [6]. This was later used to provide a lower bound of $(1+\varepsilon)n$ on the Gomory-Chvátal rank of arbitrary polytopes $P \subseteq [0,1]^n$, showing that, in contrast to most other cutting-plane procedures, the Gomory-Chvátal procedure does not have an upper bound of n if $P_I \neq \emptyset$ [14]. The upper bound is known to be n if $P_I = \emptyset$ [2]. Lower
bounds of n for the matrix cut operators N0 , N , and N+ of Balas, Ceria, and
Cornuéjols [1], Sherali and Adams [28], and Lovász and Schrijver [21] were given
in [7, 10, 16]. Lower bounds for the split cut operator SC were obtained in
[11]. These operators (and some strengthenings thereof) have recently regained
attention [15, 20, 24], partly due to an interesting connection between the inap-
proximability of certain combinatorial optimization problems and the integrality
gaps of their (LP and SDP) relaxations. For example, in [27] it was shown that

the integrality gaps of the vertex cover and the max cut problems remain at least $2 - \varepsilon$ after $\varepsilon n$ rounds of the Sherali-Adams operator. A related result for the stronger Lovász-Schrijver operator established an integrality gap of $2 - \varepsilon$ after $\Omega(\sqrt{\log n/\log\log n})$ rounds [15]. In [26] it was shown that even for the stronger Lasserre hierarchy [20] one cannot expect to be able to prove the unsatisfiability of certain k-CSP formulas within $\Omega(n)$ rounds. As a result, a $\frac{7}{6} - \varepsilon$ integrality gap for the vertex cover problem after $\Omega(n)$ rounds of the Lasserre hierarchy
follows. In [4], the strength of the Sherali-Adams operator is studied in terms of
integrality gaps for well-known problems like max cut, vertex cover, and spars-
est cut, and in [22] integrality gaps for the fractional matching polytope, which
has Gomory-Chvátal rank 1, are provided, showing that although the match-
ing problem can be solved in polynomial time, it cannot be approximated well
with a small number of rounds of the Sherali-Adams operator. In addition, it
was shown that for certain tautologies that can be expressed in first-order logic,
the Lovász-Schrijver N+ rank can be constant, whereas the Sherali-Adams rank
grows poly-logarithmically [12].

Outline. Our work complements these results in a variety of ways. On the


one hand, we provide a basic framework (Section 2) that allows us to show
that in the case of polytopes P ⊆ [0, 1]n with PI = ∅ all admissible cutting-
plane procedures exhibit a similar behavior in terms of maximal rank, log factors
omitted (Section 4). On the other hand, we define a new cutting-plane procedure
that is optimal with respect to the lower bound, i.e., it establishes PI = ∅ in
O(n/ log n) rounds and thus outperforms well-known cutting-plane procedures
in terms of maximal rank (Section 5). We also derive sufficient and necessary
conditions on polytopes with maximal rank (Section 3).
Due to space limitations, we have to refer the reader to the literature (e.g.,
[8]) or to the full version of this paper for the definition of well-established
cutting-plane procedures.

2 Admissible Cutting-Plane Proof Systems

Let P = {x ∈ Rn : Ax ≤ b} be a rational polyhedron that is contained in the n-


dimensional 0/1-cube; i.e., we assume that A ∈ Zm×n , b ∈ Zm , and P ⊆ [0, 1]n .
We use ai to denote row i of A, and bi is the corresponding entry on the right-
hand side. The integer hull, PI , of P is the convex hull of all integer points in P ,
PI = conv(P ∩{0, 1}n). If F is a face of the n-dimensional unit cube, [0, 1]n , then
P ∩ F can be viewed as the set of those points in P for which certain coordinates
have been fixed to 0 or 1. We define ϕF (P ) as the projection of P onto the
space of variables that are not fixed by F . If P , Q ⊆ [0, 1]n are polytopes, we
say that P ∼ = Q if there exists a face F of the n-dimensional unit cube such that
ϕF (P ) = Q. Moreover, [n] = {1, 2, . . . , n}.
A cutting-plane procedure consists of an operator M that maps P to a closed
convex set M (P ), which we call the M -closure of P . Any linear inequality that
is valid for M (P ) is called an M -cut.

Definition 1. We say that a cutting-plane procedure is admissible if M has the


following properties:
(1) M strengthens P and keeps PI intact: PI ⊆ M (P ) ⊆ P .
(2) Preservation of inclusion: If P ⊆ Q, then M (P ) ⊆ M (Q), for all poly-
topes P, Q ⊆ [0, 1]n .
(3) Homogeneity: M (F ∩ P ) = F ∩ M (P ), for all faces F of [0, 1]n .
(4) Single coordinate rounding: If $x_i \le \varepsilon < 1$ (or $x_i \ge \varepsilon > 0$) is valid for P, then $x_i \le 0$ (or $x_i \ge 1$, respectively) is valid for M(P).
(5) Commuting with coordinate flips: Let τi : [0, 1]n → [0, 1]n with xi →
(1 − xi ) be a coordinate flip. Then τi (M (P )) = M (τi (P )).
(6) Short verification: There exists a polynomial p such that for any inequality
cx ≤ δ that is valid for M (P ) there is a set I ⊆ [m] with |I| ≤ p(n) such that
cx ≤ δ is valid for M ({x : ai x ≤ bi , i ∈ I}). We call p(n) the verification
degree of M .

Note that these conditions are quite natural and are indeed satisfied by all linear
cutting-plane procedures mentioned above; proofs and references are given in the
full version of this paper. Condition (1) ensures that M (P ) is a relaxation of PI
that is not worse than P itself. Condition (2) establishes the monotonicity of
the procedure; as any inequality that is valid for Q is also valid for P , the same
should hold for the corresponding M -cuts. Condition (3) states that the order in
which we fix certain variables to 0 or 1 and apply the operator should not matter.
Condition (4) makes sure that an admissible procedure is able to derive the most
basic conclusions, while Condition (5) makes certain that the natural symmetry
of the 0/1-cube is maintained. Finally, Condition (6) guarantees that admissible
cutting-plane procedures cannot be too powerful; otherwise even M (P ) = PI
would be included, and the family of admissible procedures would be too broad
to derive interesting results. Note also that (6) is akin to an independence of
irrelevant alternatives axiom.
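As a quick illustration of Condition (4) for a concrete operator (this example is ours, not from the text): under the Gomory-Chvátal procedure, if $x_i \le 0.7$ is valid for P, then, since the left-hand side has integer coefficients, rounding down the right-hand side yields the valid cut $x_i \le \lfloor 0.7 \rfloor = 0$ for the closure, exactly as Condition (4) demands.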
A cutting-plane operator can be applied iteratively; we define $M^{(i+1)}(P) := M(M^{(i)}(P))$.¹ For consistency, we let $M^{(1)}(P) := M(P)$ and $M^{(0)}(P) := P$. Obviously, $P_I \subseteq M^{(i+1)}(P) \subseteq M^{(i)}(P) \subseteq \cdots \subseteq M^{(1)}(P) \subseteq M^{(0)}(P) = P$. In general, it is not clear whether there exists a finite $k \in \mathbb{Z}_+$ such that $P_I = M^{(k)}(P)$.
However, we will see that for polytopes P ⊆ [0, 1]n with PI = ∅ this follows
from properties (3) and (4). In this sense, every admissible cutting-plane proce-
dure can be viewed as a system for proving the unsatisfiability of propositional
formulas in conjunctive normal form (which can be naturally represented as sys-
tems of integer inequalities), which is the setting considered here. The rank of
P with respect to M is the minimal k ∈ Z+ such that PI = M (k) (P ). We write
rkM (P ) = k (and drop the index M if it is clear from the context).
In the following, we put together some useful properties of admissible cutting-
plane procedures that follow almost directly from the definition. We define L∩M
as (L ∩ M )(P ) := L(P ) ∩ M (P ) for all polytopes P ⊆ [0, 1]n .
¹ To avoid technical difficulties, we assume that M(P) is again a rational polytope whenever we apply M repeatedly.

Lemma 2. Let L and M be admissible cutting-plane procedures. Then L ∩ M


is an admissible cutting-plane procedure.
In other words, admissible cutting-plane procedures are closed under intersec-
tion. Moreover, whenever M is admissible, rk(Q) ≤ rk(P ) for Q ⊆ P ⊆ [0, 1]n
with PI = ∅:
Lemma 3. Let M be admissible and consider polytopes Q ⊆ P ⊆ [0, 1]n . Then
rk(Q) ≤ rk(P ) if QI = PI .

2.1 Universal Upper Bounds


We will now show that there is a natural upper bound on rk(P ) for any admissible
cutting-plane procedure M whenever PI = ∅, and that this upper bound is
attained if and only if the rank of P ∩ F is maximal for all faces F of [0, 1]n . The
proof of the first result is similar to that for the Gomory-Chvátal procedure [2,
Lemma 3] and is omitted from this extended abstract.
Theorem 4. Let M be an admissible cutting-plane procedure and let P ⊆ [0, 1]n
be a polytope of dimension d with PI = ∅. If d = 0, then M (P ) = ∅. If d > 0,
then rk(P ) ≤ d.
The following lemma states that rk(P ) is “sandwiched” between the largest rank
of P intersected with a facet of the 0/1-cube and that number plus one.
Lemma 5. Let M be an admissible cutting-plane procedure and let $P \subseteq [0,1]^n$ be a polytope with $P_I = \emptyset$. Then $k \le \operatorname{rk}(P) \le k + 1$, where $k = \max_{(i,l) \in [n] \times \{0,1\}} \operatorname{rk}(P \cap \{x_i = l\})$. Moreover, if there exist $i \in [n]$ and $l \in \{0,1\}$ such that $\operatorname{rk}(P \cap \{x_i = l\}) < k$, then $\operatorname{rk}(P) = k$.
Proof. Clearly, k ≤ rk(P ), by Lemma 3. For the right-hand side of the inequality,
observe that M k (P ) ∩ {xi = l} = M k (P ∩ {xi = l}) = ∅ by Property (3) of
Definition 1. It follows that xi < 1 and xi > 0 are valid for M k (P ) for all i ∈ [n].
Hence xi ≤ 0 and xi ≥ 1 are valid for M k+1 (P ) for all i ∈ [n] and we can deduce
M k+1 (P ) = ∅, i.e., rk(P ) ≤ k + 1. It remains to prove that rk(P ) = k if there
exist i ∈ [n] and l ∈ {0, 1} such that rk(P ∩ {xi = l}) =: h < k. Without loss
of generality assume that l = 1; otherwise apply the corresponding coordinate
flip. Then M h (P ) ∩ {xi = l} = ∅ and thus xi < 1 is valid for M h (P ) and
as h < k we can derive that xi ≤ 0 is valid for M k (P ). It follows now that
M k (P ) = M k (P ) ∩ {xi = 0} = M k (P ∩ {xi = 0}) = ∅, which implies rk(P ) ≤ k;
the claim follows. 

Interestingly, one can show that rk(P ∩ {xi = l}) = k for all (i, l) ∈ [n] × {0, 1} is
not sufficient for rk(P ) = k + 1. We immediately obtain the following corollary:
Corollary 6. Let M be an admissible cutting-plane procedure and let P ⊆ [0, 1]n
be a polytope with PI = ∅. Then rk(P ) = n if and only if rk(P ∩ F ) = k for all
k-dimensional faces F of [0, 1]n with 1 ≤ k ≤ n.
Proof. One direction follows by induction from Lemma 5; the other direction is
trivial. □

3 Polytopes P ⊆ [0, 1]n with PI = ∅ and Maximal Rank


We will now study polytopes P ⊆ [0, 1]n with PI = ∅ and rkM (P ) = n, where
M is an admissible cutting-plane procedure. For lack of space, most proofs are
omitted from this section. They can be found in the full version of this paper.
We use e to denote the all-ones vector of appropriate dimension. For a face F of the 0/1-cube, we define $\frac12 e_F$ to be the point that is fixed to 0 or 1 according to F and equal to $\frac12$ on all other coordinates. Let $\operatorname{Int}(P)$ denote the interior of P, and $\operatorname{RInt}_F(P)$ the interior of $\varphi_F(P)$. We use $F_k$ to denote the set of all n-dimensional points that have exactly k coordinates equal to $\frac12$, while the other $n - k$ coordinates are 0 or 1. For all $n \in \mathbb{Z}_+$, the polytope $B_n$ is defined as
$$B_n := \Big\{ x \in [0,1]^n \;\Big|\; \sum_{i \in S} x_i + \sum_{i \in [n] \setminus S} (1 - x_i) \ge 1 \ \text{ for all } S \subseteq [n] \Big\}.$$
Note that $B_n$ contains no integer points, and if F is a k-dimensional face of $[0,1]^n$, then $B_n \cap F \cong B_k$. Moreover, $B_n = \operatorname{conv}(F_2)$.
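For a small sanity check of the definition (our example): for $n = 2$ the four inequalities read $x_1 + x_2 \ge 1$, $(1-x_1) + (1-x_2) \ge 1$, $x_1 + (1-x_2) \ge 1$, and $(1-x_1) + x_2 \ge 1$; the first two force $x_1 + x_2 = 1$ and the last two force $x_1 = x_2$, so $B_2 = \{(\frac12, \frac12)\}$, in accordance with $B_2 = \operatorname{conv}(F_2)$.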
Lemma 7. Let M be admissible. Then rk(Bn ) ≤ n − 1.

We first consider two-dimensional polytopes P ⊆ [0, 1]2 with maximal rank.

Theorem 8. Let M be admissible and let P ⊆ [0, 1]2 be a polytope with PI = ∅


so that rk(P) = 2. Then
(1) $\frac12 e \in \operatorname{Int}(P)$, and
(2) $P \cap \{x_i = l\} \neq \emptyset$ for all $(i, l) \in [2] \times \{0, 1\}$.

If M is the Gomory-Chvátal operator, $P \mapsto P'$, we can obtain the stronger statement that $P' = \{\frac12 e\}$. In general, $\frac12 e \notin M(P)$. More specifically, in [23] the following characterization of polytopes $P \subseteq [0,1]^n$ with $P_I = \emptyset$ and maximal Gomory-Chvátal rank $\operatorname{rk}_{GC}$ was shown:
Theorem 9. Let P ⊆ [0, 1]n be a polytope with PI = ∅. Then the following are
equivalent:
(1) $\operatorname{rk}_{GC}(P) = n$.
(2) $B_n = P'$.
(3) $F \cap P \neq \emptyset$ for all one-dimensional faces F of $[0,1]^n$.
We now prove a similar, but slightly weaker, version for generic admissible cutting-plane procedures. This weakening is a direct consequence of the fact that, in general, $\frac12 e \in \operatorname{Int}(P)$ fails to imply $\frac12 e \in M(P)$.
Theorem 10. Let M be admissible and let $P \subseteq [0,1]^n$ be a polytope with $P_I = \emptyset$ and rk(P) = n. Then
(1) $P \cap F \neq \emptyset$ for all one-dimensional faces F of $[0,1]^n$.
(2) For $v \in F_2$ and $F := \bigcap_{(i,l) \in [n] \times \{0,1\}:\, v_i = l \neq \frac12} \{x_i = l\}$, we have $v \in \operatorname{RInt}_F(P)$.
(3) $B_n \subseteq P$.

3.1 Results for Gomory-Chvátal Cuts, Matrix Cuts, and Split Cuts
We immediately obtain the following corollary from Theorem 10, which shows
that the Gomory-Chvátal procedure is, in some sense, weakest possible: When-
ever the rank of some admissible cutting-plane procedure is maximal, then so is
the Gomory-Chvátal rank. More precisely:
Corollary 11. Let M be admissible and let $P \subseteq [0,1]^n$ be a polytope with $P_I = \emptyset$ and $\operatorname{rk}_M(P) = n$. Then $\operatorname{rk}_{GC}(P) = n$.
Proof. By Theorem 10 (1) we have that $P \cap F \neq \emptyset$ for all one-dimensional faces F of $[0,1]^n$. With Theorem 9 we therefore obtain $\operatorname{rk}_{GC}(P) = n$. □

Note that Corollary 11 does not hold for polytopes $P \subseteq [0,1]^n$ with $P_I \neq \emptyset$: Let $P_n = \{x \in [0,1]^n \mid \sum_{i \in [n]} x_i \ge \frac12\}$. Then $(P_n)_I = \{x \in [0,1]^n \mid \sum_{i \in [n]} x_i \ge 1\} \neq \emptyset$. In [7, Section 3] it was shown that $\operatorname{rk}_{GC}(P_n) = 1$, but $\operatorname{rk}_{N_0}(P_n) = n$.
We can also derive a slightly weaker relation between the rank of matrix cuts,
split cuts, and other admissible cutting-plane procedures. First we will establish
lower bounds for the rank of Bn . The following result was provided in [7, Lemma
3.3] for matrix cuts and in [8, Lemma 6] for split cuts.
Lemma 12. Let P ⊆ [0, 1]n be a polytope and let Fk ⊆ P . Then Fk+1 ⊆ N+ (P )
and Fk+1 ⊆ SC(P ).
This yields:
Lemma 13. Let M ∈ {N0 , N, N+ , SC}. Then rkM (Bn ) = n − 1.
Proof. As Bn = conv(F2 ), Lemma 12 implies that Fn ⊆ M (n−2) (Bn ), and thus
$\operatorname{rk}(B_n) \ge n - 1$. Together with Lemma 7 it follows that $\operatorname{rk}(B_n) = n - 1$. □

We also obtain the following corollary that shows that the M -rank with M ∈
{N0 , N, N+ , SC} is at least n − 1 whenever it is n with respect to any other
admissible cutting-plane procedure.
Corollary 14. Let L be an admissible cutting-plane procedure, let M ∈
{N0 , N, N+ , SC}, and let P ⊆ [0, 1]n be a polytope with PI = ∅ and rkL (P ) = n.
Then rkM (P ) ≥ n − 1 and, if P is half-integral, then rkM (P ) = n.
Proof. If $\operatorname{rk}_L(P) = n$, then $B_n \subseteq P$ by Theorem 10 and $\operatorname{rk}_M(B_n) = n - 1$ by Lemma 13. So the first part follows from Lemma 3. In order to prove the second part, observe that $P \cap F \neq \emptyset$ for all one-dimensional faces F of $[0,1]^n$. Thus, as P is half-integral, $P \cap F = \{\frac12 e_F\}$ for all one-dimensional faces F, and, by Lemma 12, $\frac12 e_F \in M(P) \cap F$ for all two-dimensional faces F of $[0,1]^n$. Therefore $B_n \subseteq M(P)$. The claim now follows from Lemma 13. □
We will now consider the case where P ⊆ [0, 1]^n is half-integral with P_I = ∅ in detail. The polytope A_n ⊆ [0, 1]^n is defined by

    A_n := {x ∈ [0, 1]^n | Σ_{i∈S} x_i + Σ_{i∈[n]\S} (1 − x_i) ≥ ½ for all S ⊆ [n]}.
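As a quick sanity check of this definition, the following brute-force sketch (ours) verifies for small n that A_n contains every edge midpoint of the cube but no 0/1 point:

    # Membership test for A_n (ours): check all 2^n inequalities directly.
    from itertools import combinations, product

    def in_A(x, tol=1e-9):
        n = len(x)
        subsets = (set(S) for k in range(n + 1) for S in combinations(range(n), k))
        return all(sum(x[i] if i in S else 1 - x[i] for i in range(n)) >= 0.5 - tol
                   for S in subsets)

    n = 3
    # no 0/1 point lies in A_n (S = {i : x_i = 0} yields a violated inequality) ...
    assert not any(in_A(v) for v in product((0, 1), repeat=n))
    # ... but every edge midpoint does: its 1/2-coordinate contributes 1/2 to every sum
    mids = [v[:i] + (0.5,) + v[i:] for i in range(n) for v in product((0, 1), repeat=n - 1)]
    assert all(in_A(m) for m in mids)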
Lemma 15. Let P ⊆ [0, 1]^n be a half-integral polytope with P_I = ∅. Then

    rk_M(P) = n if and only if P = A_n,

with M ∈ {GC, N_0, N, N_+, SC}.
Proof. It suffices to show that P = A_n if and only if rk_GC(P) = n. Let rk_GC(P) = n. By Theorem 10 we have that P ∩ F ≠ ∅ for all one-dimensional faces F of [0, 1]^n, and as P_I = ∅ and P is half-integral, it follows that P ∩ F = {½e_F} for every such face. Hence A_n ⊆ P and, since every half-integral polytope without integral points is contained in A_n, P = A_n. For the other direction, observe that if P = A_n, then P ∩ F ≠ ∅ for all one-dimensional faces F of [0, 1]^n and, by Theorem 9, we therefore have rk_GC(P) = n. □
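For n = 2 the uniqueness claim can be checked exhaustively; a small sketch (ours): the candidate vertices of a half-integral polytope in [0, 1]² without integral points are the four edge midpoints and the center, and any vertex set meeting all four edges must contain all four midpoints, i.e., span A_2.

    # Exhaustive n = 2 check (ours) of the uniqueness in Lemma 15.
    from itertools import combinations

    mids = [(0, 0.5), (0.5, 0), (1, 0.5), (0.5, 1)]
    cands = mids + [(0.5, 0.5)]

    def meets_all_edges(V):
        # conv(V) meets edge {x_i = l} iff some candidate vertex lies on it
        return all(any(v[i] == l for v in V) for i in (0, 1) for l in (0, 1))

    good = [V for r in range(1, 6) for V in combinations(cands, r) if meets_all_edges(V)]
    assert good and all(set(mids) <= set(V) for V in good)  # conv(V) = A_2 in each case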

Hence, in the case of half-integral polytopes without integral points there is
exactly one polytope that realizes the maximal rank for the classical operators.
Combining Corollary 11 and Lemma 15, we obtain:
Corollary 16. Let P ⊆ [0, 1]^n be a half-integral polytope with P_I = ∅, and let M be an admissible cutting-plane procedure. Then rk_M(P) = n implies P = A_n.
For half-integral polytopes P ⊆ [0, 1]² with P_I = ∅, the matrix-cut operators, the split-cut operator, and the Gomory-Chvátal procedure are actually identical.

Lemma 17. Let P ⊆ [0, 1]² be a half-integral polytope with P_I = ∅. Then N_0(P) = N(P) = N_+(P) = SC(P) = P′.
Note that Lemma 15 and Lemma 17 are in strong contrast to the case where P ⊆ [0, 1]^n is a half-integral polytope with P_I ≠ ∅: In the remark after Corollary 11, the polytope P_n = {x ∈ [0, 1]^n | Σ_{i∈[n]} x_i ≥ ½} has rk_GC(P_n) = 1 but rk_N0(P_n) = n, as shown in [7, Theorem 3.1]. On the other hand, the polytope P = conv({(0, 0), (1, 0), (½, 1)}) ⊆ [0, 1]² has rk_N0(P) = 1, but rk_GC(P) = 2, as P is half-integral and ½e ∈ Int(P).

4 A Universal Lower Bound


We now establish a universal lower bound on the rank of admissible cutting-
plane procedures. Our approach makes use of inequalities as certificates for non-
membership:
Definition 18. Let cx ≤ δ with c ∈ Z^n and δ ∈ Z be an inequality. The violation set V(c, δ) := {x ∈ {0, 1}^n : cx > δ} is the set of 0/1 points for which cx ≤ δ serves as a certificate of infeasibility.
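For small n, violation sets can simply be tabulated; a minimal sketch (ours), directly from the definition:

    # Enumerate V(c, delta) = {x in {0,1}^n : cx > delta} by brute force (ours).
    from itertools import product

    def violation_set(c, delta):
        n = len(c)
        return {x for x in product((0, 1), repeat=n)
                if sum(ci * xi for ci, xi in zip(c, x)) > delta}

    # x1 + x2 + x3 <= 0 certifies infeasibility of every 0/1 point except the origin:
    assert len(violation_set((1, 1, 1), 0)) == 7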
The following observation is an essential building block in establishing the lower
bound:
Lemma 19. Let M be an admissible cutting-plane procedure, and let P ⊆ [0, 1]^n be a polytope. Let cx ≤ δ with c ∈ Z^n and δ ∈ Z be a valid inequality for M(P) whose certificate of M(P)-validity depends only on {c_i x ≤ δ_i : i ∈ I}, where I is an index set and c_i x ≤ δ_i with c_i ∈ Z^n and δ_i ∈ Z is valid for P, for all i ∈ I. Then V(c, δ) ⊆ ∪_{i∈I} V(c_i, δ_i).
Proof. The proof is by contradiction. Suppose there is x_0 ∈ {0, 1}^n such that x_0 ∈ V(c, δ) \ ∪_{i∈I} V(c_i, δ_i). We define Q := [0, 1]^n ∩ ∩_{i∈I} {x : c_i x ≤ δ_i}. Note that x_0 ∈ Q_I. On the other hand, by Property (6) of Definition 1, cx ≤ δ is valid for M(Q) as well. Thus x_0 ∉ M(Q), as cx ≤ δ is valid for M(Q) and x_0 ∈ V(c, δ). But then Q_I ⊈ M(Q) and therefore M is not admissible, a contradiction. □
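A concrete toy instance (ours): a Gomory-Chvátal-style combination can only certify infeasibility of 0/1 points that some source inequality already cut off.

    # The cut x1 + x2 <= 1, obtained from 2x1 + 2x2 <= 3 by scaling and rounding,
    # has a violation set nested in that of its source, as Lemma 19 predicts (ours).
    from itertools import product

    def viol(c, d, n):
        return {x for x in product((0, 1), repeat=n)
                if sum(ci * xi for ci, xi in zip(c, x)) > d}

    assert viol((1, 1), 1, 2) <= viol((2, 2), 3, 2)  # both equal {(1, 1)}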


This lemma can be interpreted as follows: Together, a set of inequalities c_i x ≤ δ_i certifies that a certain set of 0/1 points is not contained in P. The cutting-plane procedure combines these inequalities into a new one, cx ≤ δ, that certifies that a (hopefully large) subset of this set of 0/1 points is not contained in P. The fact that we will exploit in order to establish a lower bound is that an admissible cutting-plane procedure can access at most a polynomial number of inequalities in the derivation of a single new inequality. If we now had a polytope P ⊆ [0, 1]^n with |V(a, β)| small for all inequalities ax ≤ β in a linear description of P, we could estimate how many rounds it takes to generate an inequality cx ≤ δ with V(c, δ) = {0, 1}^n. The following observation characterizes P_I = ∅ in terms of a violation set V(c, δ).

Lemma 20. Let P ⊆ [0, 1]^n be a polytope. Then P_I = ∅ if and only if there exists an inequality cx ≤ δ valid for P_I with V(c, δ) = {0, 1}^n.

Proof. Clearly, if there exists an inequality cx ≤ δ valid for P_I with V(c, δ) = {0, 1}^n, then P_I = ∅. For the other direction, ex ≤ −1 is valid for P_I = ∅, and V(e, −1) = {0, 1}^n. □


Next we establish an upper bound on the growth of the size of V (c, δ).

Lemma 21. Let M be admissible with verification degree p(n). Further, let P = {x : Ax ≤ b} ⊆ [0, 1]^n be a polytope with P_I = ∅ and define k := max_{i∈[m]} |V(a_i, b_i)|. If cx ≤ δ has been derived by M from Ax ≤ b within ℓ rounds, then |V(c, δ)| ≤ p(n)^ℓ k.

Proof. The proof is by induction on the number ℓ of rounds. For ℓ = 1, cx ≤ δ can be derived with the help of at most p(n) inequalities {a_i x ≤ b_i} from the original system Ax ≤ b. By Lemma 19, it follows that V(c, δ) ⊆ ∪_i V(a_i, b_i) and thus |V(c, δ)| ≤ Σ_i |V(a_i, b_i)| ≤ p(n) k. Now consider the case ℓ > 1. The derivation of cx ≤ δ involves at most p(n) inequalities {c_i x ≤ δ_i}, each of which has been derived in at most ℓ − 1 rounds. By Lemma 19, it follows that V(c, δ) ⊆ ∪_i V(c_i, δ_i) and thus |V(c, δ)| ≤ Σ_i |V(c_i, δ_i)| ≤ p(n) (p(n)^(ℓ−1) k) ≤ p(n)^ℓ k. □
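Unrolling this recursion already hints at the lower bound to come: to grow a violation set from constant size to all 2^n points with polynomially bounded fan-in p(n), roughly n/log n rounds are unavoidable. A back-of-the-envelope sketch (ours, with illustrative numbers):

    # Solve p(n)^L * k >= 2^n for the number of rounds L (ours).
    from math import ceil, log2

    def min_rounds(n, k, p_of_n):
        return ceil((n - log2(k)) / log2(p_of_n))

    print(min_rounds(100, 2, 100 ** 2))  # -> 8; the bound grows like n / log n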

We are ready to prove a universal lower bound on the rank of admissible cutting-plane procedures:

Theorem 22. Let k ∈ Z_+ be fixed, and let M be admissible with verification degree p(n). Further, let P = {x : Ax ≤ b} ⊆ [0, 1]^n be a polytope with P_I = ∅ such that P ∩ F ≠ ∅ for all k-dimensional faces F of [0, 1]^n. Then, for all n ≥ 2k, rk(P) ∈ Ω(n/log n).
Proof. We will first show that if P ∩ F ≠ ∅ for all k-dimensional faces F of [0, 1]^n and cx ≤ δ is a valid inequality for P, then cx ≤ δ can cut off at most (2n)^k 0/1 points, i.e., |V(c, δ)| ≤ (2n)^k. Without loss of generality, we may assume that c ≥ 0 and that c_i ≥ c_j whenever i ≤ j; otherwise we can apply coordinate flips and variable permutations. Define l := min{j ∈ [n] : Σ_{i=1}^{j} c_i > δ}. Suppose l ≤ n − k. Define F := ∩_{i=1}^{n−k} {x_i = 1}. Observe that dim(F) = k and cx > δ for all x ∈ F, as l ≤ n − k. Thus P ∩ F = ∅, which contradicts our assumption that P ∩ F ≠ ∅ for all k-dimensional faces F of [0, 1]^n. Therefore, k ≥ n − l + 1.
By the choice of l, every 0/1 point x_0 cut off by cx ≤ δ has to have at least l coordinates equal to 1. The number ζ of 0/1 points of dimension n with this property is bounded by

    ζ ≤ 2^(n−l) (n choose l) ≤ 2^k (n choose n−l) ≤ 2^k (n choose k−1) ≤ 2^k n^k ≤ (2n)^k.

Note that the third inequality holds as k ≤ n/2, by assumption. It follows that |V(c, δ)| ≤ (2n)^k.
As we have seen, any inequality πx ≤ π_0 valid for P can cut off at most (2n)^k 0/1 points. In order to prove that P_I = ∅, we have to derive an infeasibility certificate cx ≤ δ with V(c, δ) = {0, 1}^n, by Lemma 20. Thus, |V(c, δ)| = 2^n is a necessary condition for cx ≤ δ to be such a certificate. If cx ≤ δ is derived in ℓ rounds by M from Ax ≤ b then, by Lemma 21, we have that |V(c, δ)| ≤ p(n)^ℓ (2n)^k. Hence, ℓ ∈ Ω(n/log n) and, therefore, rk(P) ∈ Ω(n/log n). □
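The counting step is easy to sanity-check numerically; a minimal sketch (ours), with l = n − k + 1 as in the extremal case of the proof:

    # Verify  #{x in {0,1}^n : at least l ones} <= 2^(n-l) * C(n, l) <= (2n)^k  (ours).
    from math import comb

    for n, k in [(10, 2), (12, 3), (20, 4)]:
        l = n - k + 1
        exact = sum(comb(n, m) for m in range(l, n + 1))
        assert exact <= 2 ** (n - l) * comb(n, l) <= (2 * n) ** k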


Note that the result can easily be generalized to non-fixed k if k grows slowly enough as a function of n, e.g., k ∈ O(log n). Theorem 22 implies that, in contrast to the case where P_I ≠ ∅, when dealing with polytopes with P_I = ∅ the property of having high/maximal rank is universal, i.e., it is a property of the polytope and not of the particular cutting-plane procedure used. We immediately obtain the following corollary:
obtain the following corollary:

Corollary 23. Let M be admissible. Then rk(B_n) ∈ Ω(n/log n) and rk(A_n) ∈ Ω(n/log n).

Proof. It is sufficient to observe that A_n ∩ F ≠ ∅ for all one-dimensional faces F of [0, 1]^n and B_n ∩ F ≠ ∅ for all two-dimensional faces F of [0, 1]^n. The claim then follows from Theorem 22. □


For k ∈ N, it is also easy to see that ½e ∈ M^k(B_n) whenever M^k(B_n) ≠ ∅. This is because B_n is symmetric with respect to coordinate flips and thus ½e is obtained by averaging over all points in M^k(B_n). The next corollary relates all cutting-plane procedures in terms of maximal rank:

Corollary 24. Let P ⊆ [0, 1]^n be a polytope with P_I = ∅ and let L, M be two admissible cutting-plane procedures. If rk_L(P) = n, then rk_M(P) ∈ Ω(n/log n).
Proof. If rk_L(P) = n, then, by Theorem 10, we have that P ∩ F ≠ ∅ for all one-dimensional faces F of [0, 1]^n. The claim now follows from Theorem 22. □


In this sense, modulo log-factors, all admissible cutting-plane procedures are of similar strength, at least as far as proving 0/1-infeasibility of a system of linear inequalities is concerned.

5 A Rank-Optimal Cutting-Plane Procedure

Traditional convexification procedures such as Gomory-Chvátal or lift-and-project have worst-case rank n, and thus one might wonder whether the lower bound of Ω(n/log n) in Theorem 22 is tight. We will now construct a new, admissible cutting-plane procedure that is asymptotically optimal with respect to this bound.

Definition 25. Let P ⊆ [0, 1]^n be a polytope. The cutting-plane procedure “+” is defined as follows. Let J̃ ⊆ [n] with |J̃| ≤ ⌈log n⌉ and let I ⊆ Ĩ ⊆ [n] with Ĩ ∩ J̃ = ∅. If there exists ε > 0 such that

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J̃\J} (1 − x_i) ≥ ε

is valid for P for all J ⊆ J̃, then we add the inequality Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1, and we call this inequality a “+-cut.” Furthermore, we define P^+ to be the set of points in P that satisfy all +-cuts.
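To make the mechanics concrete, here is a rough separation sketch (entirely ours, not part of the paper; it assumes P is handed over explicitly as {x ∈ [0, 1]^n : Ax ≤ b} with a numpy matrix A, and it uses scipy's LP solver): a candidate +-cut is accepted once the minimum of the left-hand side over P is strictly positive for every J ⊆ J̃; a uniform ε > 0 then exists because there are only finitely many sets J.

    # LP-based check (ours) of the condition in Definition 25 for a candidate +-cut;
    # I, I_tilde, J_tilde are index sets with I <= I_tilde and I_tilde, J_tilde disjoint.
    from itertools import combinations
    import numpy as np
    from scipy.optimize import linprog

    def plus_cut_valid(A, b, I, I_tilde, J_tilde, tol=1e-7):
        n = A.shape[1]
        for r in range(len(J_tilde) + 1):
            for J in map(set, combinations(J_tilde, r)):
                c, const = np.zeros(n), 0.0
                for i in set(I) | J:                                    # terms x_i
                    c[i] += 1.0
                for i in (set(I_tilde) - set(I)) | (set(J_tilde) - J):  # terms 1 - x_i
                    c[i] -= 1.0
                    const += 1.0
                res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * n)
                if not res.success or res.fun + const <= tol:
                    return False  # some J admits a point of P with value near 0
        return True  # sum_{i in I} x_i + sum_{i in I_tilde\I} (1 - x_i) >= 1 may be added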

Let us first prove that +-cuts are indeed valid, i.e., they do not cut off any integral points contained in P. At the same time, the proof of the following lemma helps to establish that the +-operator satisfies Property (6) of Definition 1.

Lemma 26. Let P ⊆ [0, 1]^n be a polytope. Every +-cut is valid for P_I.

Proof. For J̃ ⊆ [n] with |J̃| ≤ ⌈log n⌉ and I ⊆ Ĩ ⊆ [n] with Ĩ ∩ J̃ = ∅, let Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 be the corresponding +-cut. Using Farkas' Lemma and Carathéodory's Theorem, we can verify in polynomial time that

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J̃\J} (1 − x_i) ≥ ε

is valid for P for all J ⊆ J̃. Note that we have at most 2^⌈log n⌉ ∈ O(n) of these initial inequalities.
Now we round up all right-hand sides to 1, which leaves us with inequalities that are valid for P_I. By induction, we can verify that

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J_0\J} (1 − x_i) ≥ 1

is valid for P_I with J_0 = J̃ \ {i_0}, i_0 ∈ J̃, and J ⊆ J_0. For this, consider

      ½ [ Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J_0\J} (1 − x_i) + x_{i_0} ≥ 1 ]
    + ½ [ Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J_0\J} (1 − x_i) + (1 − x_{i_0}) ≥ 1 ]
    =     Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J_0\J} (1 − x_i) ≥ ½.

We can again round up the right-hand side and iteratively repeat this process until |J_0| = 0. □


The “+”-operator is indeed admissible. In addition to (6), properties (1), (2), (4), and (5) clearly hold. It remains to prove (3). Let F be a k-dimensional face of [0, 1]^n and let P ⊆ [0, 1]^n be a polytope. Without loss of generality, we can assume that F fixes the last n − k coordinates to 0. Clearly, (P ∩ F)^+ ⊆ P^+ ∩ F. For the other direction, let Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 be a +-cut derived for P ∩ F with I ⊆ Ĩ ⊆ [n]. Then there exists ε > 0 such that

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J̃\J} (1 − x_i) ≥ ε

is valid for P ∩ F with J ⊆ J̃ ⊆ [n], |J̃| ≤ ⌈log n⌉, and Ĩ ∩ J̃ = ∅. By Farkas' Lemma, there exists τ ≥ 1 such that

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J̃\J} (1 − x_i) + τ (Σ_{k+1≤i≤n} x_i) ≥ ε

with J ⊆ J̃ ⊆ [n], |J̃| ≤ ⌈log n⌉, and Ĩ ∩ J̃ = ∅ is valid for P. Hence, so is the weaker inequality

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J̃\J} (1 − x_i) + Σ_{k+1≤i≤n} x_i ≥ ε/τ.

By Definition 25,

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{k+1≤i≤n} x_i ≥ 1

is valid for P^+. Restricting it to the face F, we get that

    Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1

is valid for P^+ ∩ F; thus, property (3) holds.


In the following we will show that, for any given polytope P ⊆ [0, 1]^n with P_I = ∅, rk_+(P) ∈ O(n/log n). This is a direct consequence of the following lemma; we use P^(k) to denote the k-th closure of the “+”-operator.

Lemma 27. Let P ⊆ [0, 1]^n be a polytope with P_I = ∅. Then Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 with I ⊆ Ĩ ⊆ [n] and |Ĩ| ≥ n − k⌈log n⌉ is valid for P^(k+1).

Proof. The proof is by induction on k. Let k = 0. As P_I = ∅, there exists ε > 0 such that Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ ε is valid for P for all I ⊆ Ĩ = [n]. Thus Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 is valid for P^+ for all I ⊆ Ĩ = [n]. Consider now k ≥ 1. Then Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 is valid for P^(k) for all I ⊆ Ĩ ⊆ [n] with |Ĩ| ≥ n − (k − 1)⌈log n⌉. Now consider I ⊆ Ĩ ⊆ [n] with n − k⌈log n⌉ ≤ |Ĩ| < n − (k − 1)⌈log n⌉. Pick J̃ ⊆ [n] such that |J̃| ≤ ⌈log n⌉, Ĩ ∩ J̃ = ∅, and |Ĩ ∪ J̃| ≥ n − (k − 1)⌈log n⌉. Then for all J ⊆ J̃ we have that Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) + Σ_{i∈J} x_i + Σ_{i∈J̃\J} (1 − x_i) ≥ 1 is valid for P^(k) by the induction hypothesis. We may conclude that Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 is valid for P^(k+1) by Definition 25. □


We are ready to establish an upper bound on the rank of the “+”-operator:

Theorem 28. Let P ⊆ [0, 1]^n be a polytope with P_I = ∅. Then rk_+(P) ∈ O(n/log n).

Proof. It suffices to derive the inequalities x_i ≥ 1 and x_i ≤ 0. By Lemma 27 we have that Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 with I ⊆ Ĩ ⊆ [n] and |Ĩ| ≥ n − k⌈log n⌉ is valid for P^(k+1). Thus Σ_{i∈I} x_i + Σ_{i∈Ĩ\I} (1 − x_i) ≥ 1 with I ⊆ Ĩ = {i} is valid for P^(k+1) whenever k ≥ (n − 1)/⌈log n⌉. Observe that for I = {i} and I = ∅ we obtain that x_i ≥ 1 and x_i ≤ 0 are valid for P^(k+1), respectively; hence P^(k+1) = ∅ and rk_+(P) ∈ O(n/log n). □
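To get a feel for the numbers (ours, simply plugging into the bound from the proof): the “+”-operator certifies 0/1-infeasibility in roughly n/log n rounds, in contrast to the worst-case rank n of the classical operators.

    # Rank bound of Theorem 28 evaluated for concrete n (ours).
    from math import ceil, log2

    for n in (64, 1024, 2 ** 20):
        print(n, ceil((n - 1) / ceil(log2(n))) + 1)  # 64 -> 12, 1024 -> 104, 2^20 -> 52430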


References
[1] Balas, E., Ceria, S., Cornuéjols, G.: A lift-and-project cutting plane algorithm for
mixed 0-1 programs. Mathematical Programming 58, 295–324 (1993)
[2] Bockmayr, A., Eisenbrand, F., Hartmann, M., Schulz, A.: On the Chvátal rank of
polytopes in the 0/1 cube. Discrete Applied Mathematics 98, 21–27 (1999)
[3] Bonet, M., Pitassi, T., Raz, R.: Lower bounds for cutting planes proofs with small
coefficients. In: Proceedings of the 27th Annual ACM Symposium on Theory of
Computing, pp. 575–584 (1995)
[4] Charikar, M., Makarychev, K., Makarychev, Y.: Integrality gaps for Sherali-Adams
relaxations. In: Proceedings of the 41st Annual ACM Symposium on Theory of
Computing, pp. 283–292 (2009)
[5] Chvátal, V.: Edmonds polytopes and a hierarchy of combinatorial problems. Dis-
crete Mathematics 4, 305–337 (1973)
[6] Chvátal, V., Cook, W., Hartmann, M.: On cutting-plane proofs in combinatorial
optimization. Linear Algebra and Its Applications 114, 455–499 (1989)
[7] Cook, W., Dash, S.: On the matrix-cut rank of polyhedra. Mathematics of Oper-
ations Research 26, 19–30 (2001)
[8] Cornuéjols, G.: Valid inequalities for mixed integer linear programs. Mathematical
Programming 112, 3–44 (2008)
[9] Cornuéjols, G., Li, Y.: Elementary closures for integer programs. Operations Re-
search Letters 28, 1–8 (2001)
[10] Cornuéjols, G., Li, Y.: A connection between cutting plane theory and the geom-
etry of numbers. Mathematical Programming 93, 123–127 (2002)
[11] Cornuéjols, G., Li, Y.: On the rank of mixed 0,1 polyhedra. Mathematical Pro-
gramming 91, 391–397 (2002)
[12] Dantchev, S.: Rank complexity gap for Lovász-Schrijver and Sherali-Adams proof
systems. In: Proceedings of the 39th Annual ACM Symposium on Theory of Com-
puting, pp. 311–317 (2007)
[13] Dash, S.: An exponential lower bound on the length of some classes of branch-
and-cut proofs. Mathematics of Operations Research 30, 678–700 (2005)
[14] Eisenbrand, F., Schulz, A.: Bounds on the Chvátal rank of polytopes in the 0/1-
cube. Combinatorica 23, 245–261 (2003)
[15] Georgiou, K., Magen, A., Pitassi, T., Tourlakis, I.: Integrality gaps of 2 − o(1) for ver-
tex cover SDPs in the Lovász-Schrijver hierarchy. In: Proceedings of the 48th Annual
IEEE Symposium on Foundations of Computer Science, pp. 702–712 (2007)
[16] Goemans, M., Tunçel, L.: When does the positive semidefiniteness constraint help
in lifting procedures? Mathematics of Operations Research 26, 796–815 (2001)
[17] Gomory, R.: Outline of an algorithm for integer solutions to linear programs.
Bulletin of the American Mathematical Society 64, 275–278 (1958)
[18] Gomory, R.: Solving linear programming problems in integers. In: Bellman, R.,
Hall, M. (eds.) Proceedings of Symposia in Applied Mathematics X, pp. 211–215.
American Mathematical Society, Providence (1960)
[19] Gomory, R.: An algorithm for integer solutions to linear programs. In: Recent Ad-
vances in Mathematical Programming, pp. 269–302. McGraw-Hill, New York (1963)
[20] Lasserre, J.: An explicit exact SDP relaxation for nonlinear 0-1 programs. In:
Aardal, K., Gerards, B. (eds.) IPCO 2001. LNCS, vol. 2081, pp. 293–303. Springer,
Heidelberg (2001)
[21] Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization.
SIAM Journal on Optimization 1, 166–190 (1991)
[22] Mathieu, C., Sinclair, A.: Sherali-Adams relaxations of the matching polytope. In:
Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp.
293–302 (2009)
[23] Pokutta, S., Schulz, A.: A note on 0/1 polytopes without integral points and with
maximal rank (2009) (preprint)
[24] Pokutta, S., Schulz, A.: On the connection of the Sherali-Adams closure and border
bases (2009) (submitted)
[25] Pudlák, P.: On the complexity of propositional calculus. In: Sets and Proofs, In-
vited Papers from Logic Colloquium ’97, pp. 197–218. Cambridge University Press,
Cambridge (1999)
[26] Schoenebeck, G.: Linear level Lasserre lower bounds for certain k-CSPs. In: Pro-
ceedings of the 49th Annual IEEE Symposium on Foundations of Computer Sci-
ence, pp. 593–602 (2008)
[27] Schoenebeck, G., Trevisan, L., Tulsiani, M.: Tight integrality gaps for Lovász-
Schrijver LP relaxations of vertex cover and max cut. In: Proceedings of the 39th
Annual ACM Symposium on Theory of Computing, pp. 302–310 (2007)
[28] Sherali, H., Adams, W.: A hierarchy of relaxations between the continuous and
convex representations for zero-one programming problems. SIAM Journal on Dis-
crete Mathematics 3, 411–430 (1990)
Author Index

Aggarwal, Ankit 149
Amaldi, Edoardo 397
Anand, L. 149
Andersen, Kent 57
Bansal, Manisha 149
Bansal, Nikhil 110, 369
Basu, Amitabh 85
Benabbas, Siavosh 299
Bérczi, Kristóf 43
Bhaskar, Umang 313
Bienstock, Daniel 1, 29
Bley, Andreas 205
Boros, Endre 341
Buchbinder, Niv 163
Buchheim, Christoph 285
Byrka, Jaroslaw 244
Campelo, Manoel 85
Caprara, Alberto 285
Chakrabarty, Deeparnab 355, 383
Conforti, Michele 85
Cornuéjols, Gérard 85
Dey, Santanu S. 327, 424
Elbassioni, Khaled 341
Epstein, Leah 230
Fiorini, Samuel 191
Fleischer, Lisa 313
Fukunaga, Takuro 15
Garg, Naveen 149
Grant, Elyot 355
Gupta, Neelima 149
Gupta, Shubham 149
Gurvich, Vladimir 341
Hajiaghayi, MohammadTaghi 71
Hemmecke, Raymond 219
Huang, Chien-Chung 313
Iuliano, Claudio 397
Jain, Kamal 163
Jain, Surabhi 149
Jansen, Klaus 438
Johnson, Ellis L. 124
Joret, Gwenaël 191
Kaibel, Volker 135, 177
Khandekar, Rohit 71, 110
Könemann, Jochen 110, 355, 383
Köppe, Matthias 219
Kortsarz, Guy 71
Korula, Nitish 369
Lau, Lap Chi 96
Letchford, Adam N. 258
Levin, Asaf 230
Lodi, Andrea 285, 424
Loos, Andreas 177
Luedtke, James 271
Magen, Avner 299
Makino, Kazuhisa 341
Marchetti-Spaccamela, Alberto 230
Megow, Nicole 230
Mestre, Julián 230
Nagarajan, Viswanath 110, 369
Neto, Jose 205
Nutov, Zeev 71
Pashkovich, Kanstantsin 135
Peis, Britta 110
Pietropaoli, Ugo 191
Pokutta, Sebastian 450
Pritchard, David 383
Rizzi, Romeo 397
Schulz, Andreas S. 450
Singh, Mohit 163
Sitters, René 411
Skutella, Martin 230
Solis-Oba, Roberto 438
Srinivasan, Aravind 244, 369
Stougie, Leen 230
Swamy, Chaitanya 244
Theis, Dirk Oliver 135
Tramontani, Andrea 424
Tyber, Steve 124
Végh, László A. 43
Vielma, Juan Pablo 327
Weismantel, Robert 57, 219
Wolsey, Laurence A. 424
Yung, Chun Kong 96
Zambelli, Giacomo 85
Zuckerberg, Mark 1
