Lecture Notes in Computer Science 1610: Edited by G. Goos, J. Hartmanis and J. van Leeuwen




Berlin
Heidelberg
New York
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Gérard Cornuéjols Rainer E. Burkard
Gerhard J. Woeginger (Eds.)

Integer Programming
and Combinatorial
Optimization
7th International IPCO Conference
Graz, Austria, June 9-11, 1999
Proceedings

Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors

Gérard Cornuéjols
GSIA, Carnegie Mellon University
Schenley Park, Pittsburgh, PA 15213, USA
E-mail: [email protected]

Rainer E. Burkard
Gerhard J. Woeginger
Institut für Mathematik, Technische Universität Graz
Steyrergasse 30, A-8010 Graz, Austria
E-mail: {burkard,gwoegi}@opt.math.tu-graz.ac.at

Cataloging-in-Publication data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme


Integer programming and combinatorial optimization :
proceedings / 7th International IPCO Conference, Graz, Austria, June
9 - 11, 1999. Gérard Cornuéjols . . . (ed.). - Berlin ; Heidelberg ; New
York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ;
Tokyo : Springer, 1999
(Lecture notes in computer science ; Vol. 1610)
ISBN 3-540-66019-4

CR Subject Classification (1998): G.1.6, G.2.1, F.2.2

ISSN 0302-9743
ISBN 3-540-66019-4 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1999
Printed in Germany
Typesetting: Camera-ready by author
SPIN: 10705123 06/3142 – 5 4 3 2 1 0 Printed on acid-free paper
Preface

This volume contains the papers selected for presentation at IPCO VII, the
Seventh Conference on Integer Programming and Combinatorial Optimization,
Graz, Austria, June 9–11, 1999. This meeting is a forum for researchers and prac-
titioners working on various aspects of integer programming and combinatorial
optimization. The aim is to present recent developments in theory, computa-
tion, and applications of integer programming and combinatorial optimization.
Topics include, but are not limited to: approximation algorithms, branch and
bound algorithms, computational biology, computational complexity, computa-
tional geometry, cutting plane algorithms, diophantine equations, geometry of
numbers, graph and network algorithms, integer programming, matroids and
submodular functions, on-line algorithms, polyhedral combinatorics, scheduling
theory and algorithms, and semidefinite programs.
IPCO was established in 1988 when the first IPCO program committee was
formed. IPCO I took place in Waterloo (Canada) in 1990, IPCO II was held in
Pittsburgh (USA) in 1992, IPCO III in Erice (Italy) 1993, IPCO IV in Copen-
hagen (Denmark) 1995, IPCO V in Vancouver (Canada) 1996, and IPCO VI in
Houston (USA) 1998. IPCO is held every year in which no MPS (Mathematical
Programming Society) International Symposium takes place: 1990, 1992, 1993,
1995, 1996, 1998, 1999, 2001, 2002, 2004, 2005, 2007, 2008, ... Since the MPS
meeting is triennial, IPCO conferences are held twice in every three-year period.
As a rule, in even years IPCO is held somewhere in North America, and in
odd years it is held somewhere in Europe.
In response to the call for papers for IPCO’99, the program committee re-
ceived 99 submissions, indicating a strong and growing interest in the conference.
The program committee met on January 10 and January 11, 1999, in Oberwol-
fach (Germany) and selected 33 contributed papers for inclusion in the scientific
program of IPCO’99. The selection was based on originality and quality, and
reflects many of the current directions in integer programming and optimization
research. The overall quality of the submissions was extremely high. As a result,
many excellent papers could not be chosen.
We thank all the referees who helped us in evaluating the submitted papers:
Karen Aardal, Norbert Ascheuer, Peter Auer, Imre Bárány, Therese Biedl, Hans
Bodlaender, Andreas Brandstädt, Dan Brown, Peter Brucker, Alberto Caprara,
Eranda Çela, Sebastian Ceria, Chandra Chekuri, Joseph Cheriyan, Fabian Chudak,
William H. Cunningham, Jesús De Loera, Friedrich Eisenbrand, Matteo
Fischetti, Michel Goemans, Albert Gräf, Jens Gustedt, Leslie Hall, Christoph
Helmberg, Winfried Hochstättler, Stan van Hoesel, Han Hoogeveen, Mark Jer-
rum, Olaf Jahn, Michael Jünger, Howard Karloff, Samir Khuller, Bettina Klinz,
Dieter Kratsch, Monique Laurent, Jan Karel Lenstra, Martin Loebl, Alexander
Martin, Ross McConnell, S. Tom McCormick, Petra Mutzel, Michael Naatz, Karl
Nachtigall, John Noga, Andreas Nolte, Alessandro Panconesi, Chris Potts,
Maurice Queyranne, Jörg Rambau, R. Ravi, Gerhard Reinelt, Franz Rendl, Günter
Rote, Juan José Salazar, Rüdiger Schultz, Andreas S. Schulz, Petra Schuurman,
András Sebő, Jay Sethuraman, Martin Skutella, Frits Spieksma, Angelika Steger,
Cliff Stein, Mechthild Stoer, Frederik Stork, Leen Stougie, Éva Tardos, Gottfried
Tinhofer, Zsolt Tuza, Marc Uetz, Vijay Vazirani, Albert Wagelmans, Dorothea
Wagner, Robert Weismantel, David Williamson, Laurence Wolsey, Günter M.
Ziegler, and Uwe Zimmermann. This list of referees is as complete as we could
make it, and we apologize for any omissions or errors.
The organizing committee for IPCO’99 essentially consisted of Eranda Çela,
Bettina Klinz, and Gerhard Woeginger. IPCO’99 was conducted in coopera-
tion with the Mathematical Programming Society (MPS), and it was sponsored
by the Austrian Ministry of Science, by Graz University of Technology, by the
Province of Styria, and by the City of Graz.

March 1999 Gérard Cornuéjols


Rainer E. Burkard
Gerhard J. Woeginger

IPCO VII Program Committee


Gérard Cornuéjols (Chair), Carnegie Mellon University
Rainer E. Burkard, TU Graz
Ravi Kannan, Yale University
Rolf H. Moehring, TU Berlin
Manfred Padberg, New York University
David B. Shmoys, Cornell University
Paolo Toth, University of Bologna
Gerhard J. Woeginger, TU Graz
Table of Contents

Market Split and Basis Reduction: Towards a Solution of
the Cornuéjols-Dawande Instances
K. Aardal, R.E. Bixby, C.A.J. Hurkens, A.K. Lenstra, and J.W. Smeltink

Approximation Algorithms for Maximum Coverage and Max Cut
with Given Sizes of Parts
A.A. Ageev and M.I. Sviridenko

Solving the Convex Cost Integer Dual Network Flow Problem
R.K. Ahuja, D.S. Hochbaum, and J.B. Orlin

Some Structural and Algorithmic Properties of the Maximum
Feasible Subsystem Problem
E. Amaldi, M.E. Pfetsch, and L.E. Trotter, Jr.

Valid Inequalities for Problems with Additive Variable Upper Bounds
A. Atamtürk, G.L. Nemhauser, and M.W.P. Savelsbergh

A Min-Max Theorem on Feedback Vertex Sets
M. Cai, X. Deng, and W. Zang

On the Separation of Maximally Violated mod-k Cuts
A. Caprara, M. Fischetti, and A.N. Letchford

Improved Approximation Algorithms for Capacitated Facility
Location Problems
F.A. Chudak and D.P. Williamson

Optimal 3-Terminal Cuts and Linear Programming
W.H. Cunningham and L. Tang

Semidefinite Programming Methods for the Symmetric Traveling
Salesman Problem
D. Cvetković, M. Čangalović, and V. Kovačević-Vujčić

Bounds on the Chvátal Rank of Polytopes in the 0/1-Cube
F. Eisenbrand and A.S. Schulz

Universally Maximum Flow with Piecewise-Constant Capacities
L. Fleischer

Critical Extreme Points of the 2-Edge Connected Spanning
Subgraph Polytope
J. Fonlupt and A.R. Mahjoub

An Orientation Theorem with Parity Conditions
A. Frank, T. Jordán, and Z. Szigeti

Parity Constrained k-Edge-Connected Orientations
A. Frank and Z. Király

Approximation Algorithms for MAX 4-SAT and Rounding
Procedures for Semidefinite Programs
E. Halperin and U. Zwick

On the Chvátal Rank of Certain Inequalities
M. Hartmann, M. Queyranne, and Y. Wang

The Square-Free 2-Factor Problem in Bipartite Graphs
D. Hartvigsen

The m-Cost ATSP
C. Helmberg

A Strongly Polynomial Cut Canceling Algorithm for the
Submodular Flow Problem
S. Iwata, S.T. McCormick, and M. Shigeno

Edge-Splitting Problems with Demands
T. Jordán

Integral Polyhedra Associated with Certain Submodular Functions
Defined on 012-Vectors
K. Kashiwabara, M. Nakamura, and T. Takabatake

Optimal Compaction of Orthogonal Grid Drawings
G.W. Klau and P. Mutzel

On the Number of Iterations for Dantzig-Wolfe Optimization and
Packing-Covering Approximation Algorithms
P. Klein and N. Young

Experimental Evaluation of Approximation Algorithms for
Single-Source Unsplittable Flow
S.G. Kolliopoulos and C. Stein

Approximation Algorithms for a Directed Network Design Problem
V. Melkonian and É. Tardos

Optimizing over All Combinatorial Embeddings of a Planar Graph
P. Mutzel and R. Weiskircher

A Fast Algorithm for Computing Minimum 3-Way and 4-Way Cuts
H. Nagamochi and T. Ibaraki

Scheduling Two Machines with Release Times
J. Noga and S. Seiden

An Introduction to Empty Lattice Simplices
A. Sebő

On Optimal Ear-Decompositions of Graphs
Z. Szigeti

Gale-Shapley Stable Marriage Problem Revisited: Strategic Issues
and Applications
C.-P. Teo, J. Sethuraman, and W.-P. Tan

Vertex-Disjoint Packing of Two Steiner Trees: Polyhedra and
Branch-and-Cut
E. Uchoa and M. Poggi de Aragão

Author Index


Market Split and Basis Reduction:
Towards a Solution of the
Cornuéjols-Dawande Instances

Karen Aardal*¹, Robert E. Bixby², Cor A.J. Hurkens**³, Arjen K. Lenstra⁴,
and Job W. Smeltink***¹

¹ Department of Computer Science, Utrecht University
{aardal,job}@cs.uu.nl
² Department of Computational and Applied Mathematics, Rice University
[email protected]
³ Department of Mathematics and Computing Science,
Eindhoven University of Technology
[email protected]
⁴ Emerging Technology, Citibank N.A.
[email protected]

Abstract. At the IPCO VI conference Cornuéjols and Dawande proposed
a set of 0-1 linear programming instances that proved to be very
hard to solve by traditional methods, and in particular by linear program-
ming based branch-and-bound. They offered these market split instances
as a challenge to the integer programming community. The market split
problem can be formulated as a system of linear diophantine equations
in 0-1 variables.
In our study we use the algorithm of Aardal, Hurkens, and Lenstra (1998)
based on lattice basis reduction. This algorithm is not restricted to deal
with market split instances only but is a general method for solving
systems of linear diophantine equations with bounds on the variables.
We show computational results from solving both feasibility and opti-
mization versions of the market split instances with up to 7 equations
and 60 variables, and discuss various branching strategies and their ef-
fect on the number of nodes enumerated. To our knowledge, the largest
feasibility and optimization instances solved before have 6 equations and
50 variables, and 4 equations and 30 variables respectively.
* Research partially supported by the ESPRIT Long Term Research Project nr. 20244
(Project ALCOM-IT: Algorithms and Complexity in Information Technology), by
the project TMR-DONET nr. ERB FMRX-CT98-0202, both of the European Community,
and by NSF through the Center for Research on Parallel Computation, Rice
University, under Cooperative Agreement No. CCR-9120008.
** Research partially supported by the project TMR-DONET nr. ERB FMRX-CT98-0202
of the European Community.
*** Research partially supported by the ESPRIT Long Term Research Project nr. 20244
(Project ALCOM-IT: Algorithms and Complexity in Information Technology), and
by the project TMR-DONET nr. ERB FMRX-CT98-0202, both of the European
Community.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 1–16, 1999.
© Springer-Verlag Berlin Heidelberg 1999

We also present a probabilistic analysis describing how to compute the
probability of generating infeasible market split instances. The formula
used by Cornuéjols and Dawande tends to produce relatively many feasible
instances for sizes larger than 5 equations and 40 variables.

1 Introduction and Problem Description


The feasibility version of the market split problem is described as follows. A
company with two divisions supplies retailers with several products. The goal
is to allocate each retailer to one of the divisions such that division 1 controls
$100c_i\%$, $0 \le c_i \le 1$, of the market for product $i$, and division 2 controls
$(100 - 100c_i)\%$. There are $n$ retailers and $m \le n$ products. Let $a_{ij}$ be the demand
of retailer $j$ for product $i$, and let $d_i$ be determined as $\lfloor c_i d'_i \rfloor$, where $d'_i$ is the
total amount of product $i$ that is supplied to the retailers. The decision variable
$x_j$ takes value 1 if retailer $j$ is allocated to division 1 and 0 otherwise. The
question is: “does there exist an allocation of the retailers to the divisions such
that the desired market split is obtained?” One can formulate this problem
mathematically as follows:

\[ \text{FP: does there exist a vector } x \in \mathbb{Z}^n : Ax = d,\ 0 \le x \le 1? \tag{1} \]

Let $X = \{x \in \{0,1\}^n : \sum_{j=1}^{n} a_{ij} x_j = d_i,\ 1 \le i \le m\}$. Problem FP is NP-hard
due to the bounds on the variables. The algorithm that we use was developed
for the more general problem:

\[ \text{does there exist a vector } x \in \mathbb{Z}^n : Ax = d,\ l \le x \le u? \tag{2} \]

We assume that $A$ is an integral $m \times n$ matrix, where $m \le n$, $d$ is an integral
$m$-vector, and $l$ and $u$ are integral $n$-vectors. We denote the $i$th row of the matrix
$A$ by $a_i$. Without loss of generality we assume that $\gcd(a_{i1}, a_{i2}, \ldots, a_{in}) = 1$ for
$1 \le i \le m$, and that $A$ has full row rank.
In the optimization version of the market split problem we want to find the
minimum slack, positive or negative, that needs to be added to the diophantine
equations in order to make the system feasible:

\[ \text{OPT: } \min\Big\{ \sum_{i=1}^{m} (s_i + w_i) \ \text{s.t.}\ (x, s, w) \in X_S \Big\}, \tag{3} \]

where $X_S = \{(x, s, w) : \sum_{j=1}^{n} a_{ij} x_j + s_i - w_i = d_i,\ 1 \le i \le m,\ x \in
\{0,1\}^n,\ s, w \in \mathbb{Z}^m_+\}$.
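
As a small illustration of (3), the sketch below evaluates OPT by brute force on a toy instance. It relies on the observation that, for a fixed 0-1 vector $x$, the cheapest nonnegative integers $s_i, w_i$ with $a_i x + s_i - w_i = d_i$ satisfy $s_i + w_i = |d_i - a_i x|$. The function name `min_slack` and the 2 × 4 data are assumptions made for this example only; full enumeration is feasible only for very small $n$ and is not the method used in the paper.

```python
from itertools import product

def min_slack(A, d):
    """Brute-force value of OPT (3): minimise sum_i |d_i - a_i x| over x in {0,1}^n.

    Returns (best_slack, best_x). Exponential in n, so only for toy instances.
    """
    best = None
    for x in product((0, 1), repeat=len(A[0])):
        slack = sum(abs(d_i - sum(a * x_j for a, x_j in zip(row, x)))
                    for row, d_i in zip(A, d))
        if best is None or slack < best[0]:
            best = (slack, x)
    return best

# Hypothetical toy instance with d_i = floor(sum_j a_ij / 2).
A = [[3, 5, 7, 9],
     [4, 6, 2, 8]]
d = [sum(row) // 2 for row in A]   # [12, 10]
slack, x = min_slack(A, d)
print(slack, x)
```

A slack of zero would mean that the feasibility problem (1) has a solution; a positive value certifies infeasibility of this toy instance.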
The Cornuéjols-Dawande instances [2] of the market split problem were generated
such that they were hard for linear programming based branch-and-bound,
and they appear to be hard for several other methods as well. The
input was generated as follows. Let $n = 10(m - 1)$ and let the coefficients
$a_{ij}$ be integer numbers drawn uniformly and independently from the interval
$[0, D - 1]$, where $D = 100$. The right-hand-side coefficients are computed as
$d_i = \lfloor \frac{1}{2} \sum_{j=1}^{n} a_{ij} \rfloor$, $1 \le i \le m$. This corresponds to a market split where $c_i = \frac{1}{2}$
for $1 \le i \le m$. Cornuéjols and Dawande [2] argued that with this choice of
data most of the instances of the feasibility problem (1) are infeasible, which
implies that the optimization variant (3) has an objective value greater than
zero. If branch-and-bound is used to solve OPT (3), then, due to the symmetry
of the input, the value of the LP-relaxation remains at zero even after many
variables have been fixed. Cornuéjols and Dawande observed that branch-and-bound
needed to evaluate about $2^{\alpha n}$ nodes, where
$\alpha$ typically takes values between 0.6 and 0.7.
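
The generation rule described above can be sketched in a few lines. The function name `market_split_instance` and the `seed` parameter are assumptions introduced for this illustration; the rule itself ($n = 10(m-1)$, coefficients uniform on $[0, D-1]$, $d_i = \lfloor \frac{1}{2}\sum_j a_{ij} \rfloor$) is taken from the text.

```python
import random

def market_split_instance(m, D=100, seed=None):
    """Generate (A, d) following the rule described in the text.

    n = 10 * (m - 1); a_ij uniform integers in [0, D - 1];
    d_i = floor(sum_j a_ij / 2). The name and the seed argument are
    illustrative assumptions, not part of the original study.
    """
    rng = random.Random(seed)
    n = 10 * (m - 1)
    A = [[rng.randrange(D) for _ in range(n)] for _ in range(m)]
    d = [sum(row) // 2 for row in A]
    return A, d

A, d = market_split_instance(4, seed=0)
print(len(A), len(A[0]), d)
```

For $m = 4$ this yields a $4 \times 30$ instance, the largest size the text reports as solvable by plain LP-based branch-and-bound on formulation (1).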
The algorithm we use in our study is described briefly in Section 2. The
algorithm was developed by Aardal, Hurkens, and Lenstra [1] for solving a system
of linear diophantine equations with bounds on the variables, such as problem
(2), and is based on Lovász’ lattice basis reduction algorithm as described by
Lenstra, Lenstra, and Lovász [6]. Aardal et al. motivate their choice of basis
reduction as the main ingredient of their algorithm by arguing that one can
interpret problem (2) as checking whether there exists a short integral vector
satisfying the system Ax = d. Given the lattice, the basis reduction algorithm
finds a basis spanning that lattice such that the basis consists of short, nearly
orthogonal vectors. Hence, a lattice is chosen that seems particularly useful for
problem (2). An initial basis that spans the given lattice is derived, and the
basis reduction algorithm is applied to this basis. The parameters of the initial
basis are chosen such that the reduced basis contains one vector $x_d$ satisfying
$Ax_d = d$, and $n - m$ linearly independent vectors $x_0$ satisfying $Ax_0 = 0$. Due to
the basis reduction algorithm all these vectors are relatively short. If the vector
$x_d$ satisfies the bounds, then the algorithm terminates, and if not, one observes
that $A(x_d + \lambda x_0) = d$ for any integer multiplier $\lambda$ and any vector $x_0$ such that
$Ax_0 = 0$. Hence, one can branch on integer linear combinations of vectors $x_0$
satisfying $Ax_0 = 0$ in order to obtain either a vector satisfying the diophantine
equations as well as the lower and upper bounds, or a proof that no such vector
exists.
In our computational study we solve both feasibility and optimization ver-
sions of the market split problem. The optimization version can be solved by
a slightly adapted version of the Aardal-Hurkens-Lenstra algorithm. We have
solved instances with up to 7 equations and 60 variables. To our knowledge,
the largest feasibility instances solved so far had 6 constraints and 50 variables,
and the largest optimization instances had 4 constraints and 30 variables. These
results were reported by Cornuéjols and Dawande [2]. Our computational expe-
rience is presented in Section 3.
When performing the computational study we observed that the larger the
instances became, the more often feasible instances were generated. This mo-
tivated us to analyse the expected number of solutions for instances generated
according to Cornuéjols and Dawande. Our conclusion is that for a given value
of $m > 4$, one needs to generate slightly fewer variables than is given by the
expression $n = 10(m - 1)$ (keeping all other parameters the same). We present our
analysis together with numerical support in Section 4.

2 An Outline of the Algorithm


Here we give a summary of the algorithm developed by Aardal, Hurkens, and
Lenstra [1] to solve problem (2). They also give a brief overview of the basis
reduction algorithm and the use of basis reduction in integer programming. For a
detailed description of the basis reduction algorithm we refer to Lenstra, Lenstra,
and Lovász [6].
The main idea behind the algorithm is to use an integer relaxation of the set
$X = \{x \in \mathbb{Z}^n : Ax = d,\ l \le x \le u\}$. The relaxation that Aardal et al. consider
is $X_{IR} = \{x \in \mathbb{Z}^n : Ax = d\}$. Whether $X_{IR}$ is empty can be determined
in polynomial time. Aardal et al. rewrite the set $X_{IR}$ as follows:

\[ X_{IR} = \{x \in \mathbb{Z}^n : x = x_d + X_0 \lambda,\ \lambda \in \mathbb{Z}^{n-m}\}, \tag{4} \]

where $x_d \in \mathbb{Z}^n$ satisfies $Ax_d = d$, and where $X_0$ is an integral $n \times (n - m)$
matrix such that the columns $x_0^j$, $1 \le j \le n - m$, of $X_0$ are linearly independent
and such that each column $x_0^j$ satisfies $Ax_0^j = 0$. Note that reformulation (4) is
not unique. Expression (4) states that any integer vector $x$ satisfying $Ax = d$
can be described as a vector $x_d$, satisfying $Ax_d = d$, plus an integer linear
combination of vectors that form a basis for the lattice $\{x \in \mathbb{Z}^n : Ax = 0\}$. Of
course this observation also holds for the integer vectors $x$ satisfying the bound
constraints, if such vectors exist. Aardal et al. argue that if we are able to find
a vector $x_d$ that is reasonably short, then this vector will hopefully satisfy the
bound constraints. If that is the case, one is done, and if not one needs to check
whether there exists an integer linear combination of the columns of $X_0$, $X_0 \lambda$,
such that $x = x_d + X_0 \lambda$ satisfies the bounds.
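
To make reformulation (4) concrete, the following sketch checks it on a one-equation toy system $x_1 + 2x_2 + 3x_3 + 4x_4 = 5$. The particular $x_d$ and the three kernel vectors were picked by hand for this example (they are assumptions, not output of the authors' code); the loop confirms that every $x = x_d + X_0\lambda$ satisfies $Ax = d$ and collects those that also respect $0 \le x \le 1$.

```python
from itertools import product

A = [[1, 2, 3, 4]]
d = [5]
x_d = [1, 0, 0, 1]                      # hand-picked: 1 + 4 = 5
# Columns of X_0, stored as rows; each satisfies A x = 0 (hand-picked basis
# of the kernel lattice for this toy example).
X0_cols = [[2, -1, 0, 0], [1, 1, -1, 0], [0, 2, 0, -1]]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def point(lam):
    """x = x_d + X_0 * lam."""
    return [x_d[i] + sum(l * col[i] for l, col in zip(lam, X0_cols))
            for i in range(len(x_d))]

solutions = []
for lam in product(range(-3, 4), repeat=len(X0_cols)):
    x = point(lam)
    assert matvec(A, x) == d            # holds for every integer lam
    if all(0 <= xi <= 1 for xi in x):   # bound constraints of (1)
        solutions.append(x)

print(sorted(solutions))
```

Here $x_d$ already satisfies the bounds, so in this toy case no search over $\lambda$ would be needed; the enumeration only shows that the remaining 0-1 solution is reached by an integer combination of the kernel vectors.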
One way of obtaining the vectors $x_d$ and $x_0^j$, $1 \le j \le n - m$, is by using the
Hermite normal form of the matrix $A$, see Schrijver [11], and Aardal et al. [1].
The numbers in the Hermite normal form, however, tend to be relatively large,
whereas we want the vector $x_d$ to contain small elements. Moreover, having the
other numbers in the computation large may cause numerical problems depending
on which branching strategy we apply. Aardal et al. therefore chose to use
lattice basis reduction to derive the vectors $x_d$ and $x_0^j$, $1 \le j \le n - m$.
Let $I^{(p)}$ denote the $p$-dimensional identity matrix, and let $0^{(p \times q)}$ denote
the $(p \times q)$-matrix consisting of only zeros. Let $N_1$ and $N_2$ be positive
integral numbers. Consider the following linearly independent column vectors
$B = (b_j)_{1 \le j \le n+1}$:

\[ B = \begin{pmatrix} I^{(n)} & 0^{(n \times 1)} \\ 0^{(1 \times n)} & N_1 \\ N_2 A & -N_2 d \end{pmatrix}. \tag{5} \]

The vectors of $B$ span the lattice $L \subset \mathbb{R}^{n+m+1}$ that contains vectors of the form

\[ (x^T, N_1 y, N_2(a_1 x - d_1 y), \ldots, N_2(a_m x - d_m y))^T = B \begin{pmatrix} x \\ y \end{pmatrix}, \tag{6} \]

where $y$ is a variable associated with the right-hand-side vector $d$.

Proposition 1. The integer vector $x_d$ satisfies $Ax_d = d$ if and only if the
vector

\[ (x_d^T, N_1, 0^{(1 \times m)})^T = B \begin{pmatrix} x_d \\ 1 \end{pmatrix} \tag{7} \]

belongs to the lattice $L$, and the integer vector $x_0$ satisfies $Ax_0 = 0$ if and only
if the vector

\[ (x_0^T, 0, 0^{(1 \times m)})^T = B \begin{pmatrix} x_0 \\ 0 \end{pmatrix} \tag{8} \]

belongs to the lattice $L$.


Aardal, Hurkens, and Lenstra [1] proved that if there exists an integer vector
$x$ satisfying the system $Ax = d$, and if the numbers $N_1$ and $N_2$ are chosen
appropriately, i.e., large enough with respect to the input and relative to each
other, then the first $n - m + 1$ columns of the reduced basis that is obtained by
applying the basis reduction algorithm to $B$ will be of the following form:

\[ \begin{pmatrix} X_0^{(n \times (n-m))} & x_d \\ 0^{(1 \times (n-m))} & N_1 \\ 0^{(m \times (n-m))} & 0 \end{pmatrix}, \tag{9} \]

where $X_0 = (x_0^1, \ldots, x_0^{n-m})$. Due to Proposition 1 we can conclude that $x_d$
satisfies $Ax_d = d$, and that $Ax_0^j = 0$, $1 \le j \le n - m$.
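
The construction (5) and the shape (9) of the reduced basis can be tried out on a toy system. The sketch below is a textbook LLL reduction in exact rational arithmetic (parameter 3/4), written for readability rather than speed; it is not the LiDIA routine used in the paper. The values $N_1 = 1000$ and $N_2 = 10^6$ are assumptions that are comfortably large for this tiny example, and a reduced vector may come out with $-N_1$ in the marker coordinate, in which case its sign is flipped.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Exact Gram-Schmidt: returns orthogonalised vectors and mu coefficients."""
    ortho = []
    mu = [[Fraction(0)] * len(basis) for _ in basis]
    for i, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = dot(b, ortho[j]) / dot(ortho[j], ortho[j])
            v = [vi - mu[i][j] * oj for vi, oj in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu

def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL on a list of integer row vectors (simple, not optimised)."""
    basis = [list(b) for b in basis]
    k = 1
    while k < len(basis):
        for j in range(k - 1, -1, -1):          # size-reduce b_k
            _, mu = gram_schmidt(basis)
            q = round(mu[k][j])
            if q:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
        ortho, mu = gram_schmidt(basis)
        if dot(ortho[k], ortho[k]) >= (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1]):
            k += 1                               # Lovasz condition holds
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            k = max(k - 1, 1)
    return basis

def initial_basis(A, d, N1, N2):
    """Columns of B in (5), stored as rows."""
    m, n = len(A), len(A[0])
    rows = [[int(i == j) for i in range(n)] + [0] + [N2 * A[i][j] for i in range(m)]
            for j in range(n)]
    rows.append([0] * n + [N1] + [-N2 * d_i for d_i in d])
    return rows

A, d = [[1, 2, 3, 4]], [5]
n, m = 4, 1
N1, N2 = 1000, 10**6                             # assumed "large enough" here
reduced = lll(initial_basis(A, d, N1, N2))
kernel = [v[:n] for v in reduced[:n - m]]        # candidate columns of X_0
last = reduced[n - m]
sign = 1 if last[n] > 0 else -1
x_d = [sign * c for c in last[:n]]
print(kernel, x_d)
```

The first $n - m$ reduced vectors have zeros in the last $m + 1$ coordinates, so their leading parts solve $Ax = 0$, and the next vector carries $\pm N_1$ in coordinate $n + 1$, so its leading part (up to sign) solves $Ax = d$.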
Problem (2) can now be formulated equivalently as (cf. (4)):

\[ \text{does there exist a vector } \lambda \in \mathbb{Z}^{n-m} \text{ s.t. } l - x_d \le X_0 \lambda \le u - x_d? \tag{10} \]

Unless $x_d$ satisfies the bound constraints, one needs to branch on the variables
$\lambda_j$ in order to check whether the polytope $P = \{\lambda \in \mathbb{Z}^{n-m} : l - x_d \le X_0 \lambda \le
u - x_d\}$ is empty. The basis reduction algorithm runs in polynomial time. If one
wants an overall algorithm that runs in polynomial time for a fixed number of
variables, one needs to apply the algorithms of H.W. Lenstra, Jr. [7] or of Lovász
and Scarf [9]. Otherwise, one can, for instance, apply integral branching on
the unit vectors in $\lambda$-space or linear programming based branch-and-bound. By
integral branching in the $\lambda$-space we mean the following. Assume we are at node
$k$ of the search tree. Take any unit vector $e_j$, i.e., the $j$th vector of the
$(n - m)$-dimensional identity matrix, that has not yet been considered at the predecessors
of $k$. Measure the width of the polytope $P$ in this direction, i.e., determine
$u_k = \max\{e_j^T \lambda : \lambda \in P \cap \{\lambda_j\text{'s fixed at predecessors of } k\}\}$ and
$l_k = \min\{e_j^T \lambda : \lambda \in P \cap \{\lambda_j\text{'s fixed at predecessors of } k\}\}$.
Create $\lfloor u_k \rfloor - \lceil l_k \rceil + 1$ subproblems at
node $k$ of the search tree by fixing $\lambda_j$ to the values $\lceil l_k \rceil, \ldots, \lfloor u_k \rfloor$. The question is
now in which order we should choose the unit vectors in our branching scheme.
One alternative is to just take them in any predetermined order, and another is
to determine, at node $k$, which unit vector yields the smallest value of $\lfloor u_k \rfloor - \lceil l_k \rceil$.
This branching scheme is similar to the scheme proposed by Lovász and Scarf [9]
except that we in general are not sure whether the number of branches created
at each level is bounded by some constant depending only on the dimension.
What we hope for, and what seems to be the case given our computational
results, is that $\lfloor u_k \rfloor - \lceil l_k \rceil$ is small at most nodes of the tree. A natural question
in the context of branching is whether we may hope that linear programming
based branch-and-bound is more efficient on the polytope $P$ as compared to the
polytope $X$. As can be observed in Section 3 we typically reduce the number of
nodes if we branch on $P$ instead of $X$ by several orders of magnitude. One way
of explaining this is that we obtain a scaling effect by going from description (2)
of our problem to description (10), see Aardal et al. [1].
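
The integral branching scheme on unit vectors can be sketched as a depth-first search. In the paper the bounds $u_k$ and $l_k$ come from linear programs; to keep this example dependency-free, the sketch instead obtains them by Fourier-Motzkin elimination, which is exact but only practical for tiny systems. The function names, the fixed branching order $\lambda_1, \lambda_2, \ldots$, and the toy data are assumptions made for illustration, and the code assumes the polytope is bounded (which holds when $X_0$ has full column rank and the bounds are finite).

```python
from fractions import Fraction
from math import ceil, floor

def eliminate(cons, var):
    """Fourier-Motzkin step: project `var` out of constraints (coeffs, rhs), coeffs.lam <= rhs."""
    pos = [c for c in cons if c[0][var] > 0]
    neg = [c for c in cons if c[0][var] < 0]
    out = [c for c in cons if c[0][var] == 0]
    for cp, rp in pos:
        for cn, rn in neg:
            a, b = cp[var], -cn[var]             # both positive multipliers
            out.append(([b * x + a * y for x, y in zip(cp, cn)], b * rp + a * rn))
    return out

def var_range(cons, j, nvars):
    """Exact (l_k, u_k) for lambda_j over the current polytope, or None if empty."""
    for v in range(nvars):
        if v != j:
            cons = eliminate(cons, v)
    lo = hi = None
    for coeffs, rhs in cons:
        c = coeffs[j]
        if c > 0:
            hi = Fraction(rhs, c) if hi is None else min(hi, Fraction(rhs, c))
        elif c < 0:
            lo = Fraction(rhs, c) if lo is None else max(lo, Fraction(rhs, c))
        elif rhs < 0:
            return None                          # constraint 0 <= rhs is violated
    if lo is None or hi is None or lo > hi:      # bounded polytope assumed
        return None
    return lo, hi

def search(cons, nvars, j=0, lam=()):
    """Fix lambda_j to ceil(l_k), ..., floor(u_k) and recurse (fixed variable order)."""
    if j == nvars:
        return list(lam)
    rng = var_range(cons, j, nvars)
    if rng is None:
        return None
    for value in range(ceil(rng[0]), floor(rng[1]) + 1):
        fixed = [([0 if v == j else c for v, c in enumerate(coeffs)],
                  rhs - coeffs[j] * value) for coeffs, rhs in cons]
        result = search(fixed, nvars, j + 1, lam + (value,))
        if result is not None:
            return result
    return None

def solve(x_d, X0_cols, lower, upper):
    """Return some x = x_d + X_0*lam with lower <= x <= upper, or None."""
    cons = []
    for i, (xi, lo, hi) in enumerate(zip(x_d, lower, upper)):
        row = [col[i] for col in X0_cols]
        cons.append((row, hi - xi))                  # (X_0 lam)_i <= u_i - x_d_i
        cons.append(([-c for c in row], xi - lo))    # (X_0 lam)_i >= l_i - x_d_i
    lam = search(cons, len(X0_cols))
    if lam is None:
        return None
    return [xi + sum(l * col[i] for l, col in zip(lam, X0_cols))
            for i, xi in enumerate(x_d)]

X0_cols = [[2, -1, 0, 0], [1, 1, -1, 0], [0, 2, 0, -1]]   # kernel of (1, 2, 3, 4)
print(solve([1, 0, 0, 1], X0_cols, [0] * 4, [1] * 4))      # right-hand side 5
print(solve([11, 0, 0, 0], X0_cols, [0] * 4, [1] * 4))     # right-hand side 11
```

Choosing, at each node, the variable with the smallest $\lfloor u_k \rfloor - \lceil l_k \rceil$ instead of the next one in a fixed order would correspond to the "thinnest" strategy discussed in the text, at the cost of one range computation per unfixed variable.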

Table 1. Results for the feasibility version

                    LP B&B on λ_j      thinnest e_j       e_j, j = (n − m), ..., 1
Inst.  m   n  type  # nodes  time (s)  # nodes  time (s)  # nodes  time (s)
M1     5  40  N       7,203        40    1,723       652    3,107        58
M2     5  40  N      10,488        58    3,839     1,335    8,252       118
M3     5  40  Y       5,808        30    1,398       558    2,155        39
M4     5  40  N      16,484        84    5,893     1,685   14,100       175
M5     5  40  N      17,182        94    3,027     1,163    5,376        85
M6     5  40  N      11,500        62    2,762     1,010    5,322        87
M7     5  40  N      16,666        88    4,025     1,391    9,710       123
M8     5  40  N       7,483        42    2,386       899    4,310        68
M9     5  40  N       6,393        36    1,674       660    3,115        50
M10    5  40  N      17,206        90    3,791     1,319    7,860       115
M11    6  50  N     413,386     3,690   88,619    53,713  152,399     3,532
M12    6  50  Y      72,093       516   14,456     9,078    8,479       229
M13    6  50  N     375,654     3,080   86,505    51,626  141,259     3,596
M14    6  50  N     381,813     2,984  108,725    75,794  204,367     4,193
M15    6  50  Y     125,250       960   69,981    37,774  129,402     2,700
M16    6  50  N     114,215       932   40,274    24,969   79,130     1,639
M17    7  60  N     108,154     1,228   36,288    36,023   73,877     2,095

3 Computational Experience
3.1 The Feasibility Version
We solved 17 instances of problem FP (1) reformulated as problem (10). Three
of the instances were feasible and 14 infeasible. The input was generated as de-
scribed in Section 1. The instances M16 and M17 are the instances “markshare1”
and “markshare2” of MIPLIB [10].

In our computational study we wanted to determine the effect of the reformulation
of problem (1) to problem (10). Therefore, we solved formulation (10) by
linear programming based branch-and-bound as it was used by Cornuéjols and
Dawande [2] on formulation (1). Given that we consider an integer relaxation
(4) rather than a linear relaxation when determining (10), we also wanted to
investigate the effect of maintaining the integral representation by applying
integral branching. This explains the choice of the three branching strategies that
we considered in our study: linear programming based branch-and-bound on the
variables $\lambda_j$, branching on the unit vector in $\lambda$-space that yields the smallest
value of $\lfloor u_k \rfloor - \lceil l_k \rceil$ at node $k$ of the search tree as described in Section 2, and
branching on the unit vectors in the predetermined order $j = n - m, \ldots, 1$, see
Section 2. An example of an enumeration tree derived by the third strategy for
an instance of size $4 \times 30$ is shown in Figure 1. Notice that few branches are
created at each node of the tree, and that no branching occurs below level eight.

[Figure 1: drawing of the enumeration tree, 112 nodes in total. Nodes per level,
from the root downwards: 1, 3, 6, 9, 13, 15, 17, 17, 12, 9, 7, 3.]

Fig. 1. Enumeration tree for integer branching $e_j$, $j = (n - m), \ldots, 1$

The information in Table 1 is interpreted as follows. In the first three columns,
“Instance”, “m”, and “n”, the instance names and the dimension of the instances
are given. A “Y” in column “type” means that the instance is feasible,
and an “N” that it is not feasible. Note that it is not known a priori whether
the instances are feasible, but it is established by our algorithm. In the following
six columns the number of nodes and the computing times are given for the
three branching strategies. The computing times for the strategies “thinnest $e_j$”
and “$e_j$, $j = (n - m), \ldots, 1$”, are given in seconds on a Sun Ultra Enterprise
2 with two 168 MHz Ultra Sparc processors (our implementation is sequential),
SpecInt95 6.34, SpecFp95 9.33. The linear programming branch-and-bound computations
were carried out on an Alphaserver 4100 5/400 with four 400 MHz
21164 Alpha Processors (sequential implementation) SpecInt95 12.1, SpecFp95
17.2. The times reported for this strategy are the actual times obtained with
the Alphaserver multiplied by a factor of 2, in order to make it easier to compare
the times of the three strategies. The basis reduction in the algorithm by
Aardal, Hurkens, and Lenstra is done using LiDIA, a library for computational
number theory [8]. The average computing time for the Aardal-Hurkens-Lenstra
algorithm was for the three instance sizes 1.6, 3.1, and 4.8 seconds respectively.
These computations were all carried out on the Sun Ultra Enterprise 2. For the
LP-based branch-and-bound on the variables λj we used CPLEX 6.5 [4], and for
the other two strategies we use the enumeration scheme developed by Verweij
[12]. The linear programming subproblems that arise in these strategies when
determining $\lfloor u_k \rfloor$ and $\lceil l_k \rceil$ are solved using CPLEX version 6.0.1 [3].
An important conclusion of the experiments is that the reformulation itself
is essential. Cornuéjols and Dawande [2] could only solve instances up to size
4 × 30 using CPLEX version 4.0.3. We also applied CPLEX versions 6.0 and
6.5 to the initial problem formulation (1) and observed a similar behaviour,
whereas CPLEX produced very good results on the reformulated problem (10).
Cornuéjols and Dawande did solve feasibility instances of size 6 × 50 by a group
relaxation approach. Their computing times are a factor of 3-10 slower than
ours. No previous computational results for instances of size 7 × 60 have, to our
knowledge, been reported.
If we consider the number of nodes that we enumerate in the three strategies,
we note that branching on the thinnest unit vector (in λ-space) yields the
fewest nodes for all instances except instance M12, which is a feasible
instance. We do, however, need to determine the unit vector that yields the
thinnest direction at each node of the tree, which explains the longer computing
times. Branching on unit vectors in the predetermined order j = n − m, . . . , 1,
also requires fewer nodes for most instances than the linear programming based
branching on the variables λj . In terms of computing times, linear program-
ming based branch-and-bound is for most instances the fastest strategy, but
does not differ too much from the times needed for branching on unit vectors
ej , j = n−m, . . . , 1. This indicates that integer branching is an interesting strat-
egy, in particular if we can find reasonably good branching directions quickly,
as in the third strategy. In our case it seems as if the unit vectors in λ-space
yield thin branching directions. To investigate this we applied the generalized
basis reduction algorithm of Lovász and Scarf [9] to our polytope P . The re-
duced basis vectors yielded thinner directions than the strategy “thinnest ej ”
in only about 6% of the cases for the instances of size 5 × 40. This implies that
the unit vectors in λ-space, in some order, basically form a reduced basis in the
Lovász-Scarf sense. The computations involved in determining a Lovász-Scarf
reduced basis are fairly time consuming. For a problem of dimension 5 × 40,
at the root node of the tree, one has to solve at least 100 linear programs to
determine the basis. At each level of the tree the number of linear programs
solved at each node decreases as the dimension of the subproblems decreases.
If the unit basis generated bad search directions, then a heuristic version
of the Lovász-Scarf algorithm would be a possibility.
3.2 The Optimization Version

The algorithm by Aardal, Hurkens, and Lenstra [1] was primarily designed to
solve feasibility problems, but can with simple adaptations be used to solve the
optimization version (3) of the market split instances as well. Below, we report on
the results obtained by using three different strategies to solve the optimization
version. All strategies are based on linear programming based branch-and-bound.

Strategy 1: Here, we solve a sequence of feasibility problems. We start with the
feasibility version (10). If the instance is infeasible, then we proceed by considering
the following sequence of feasibility problems for v = 1, 2, . . . until a feasible
solution is found:

do there exist vectors $x \in \{0,1\}^n$, $s, w \in \mathbb{Z}^m_{\ge 0}$ such that

\[
\sum_{j=1}^{n} a_{ij} x_j + s_i - w_i = d_i, \quad 1 \le i \le m, \qquad \sum_{i=1}^{m} (s_i + w_i) = v\,? \tag{11}
\]

These feasibility problems are then reformulated as problems of type (10) using
the algorithm of Aardal et al. For each of these feasibility problems we apply
linear programming based branch-and-bound on the variables $\lambda_j$. Here, we also
investigate the influence of the choice of objective function on the search that
CPLEX is performing. In Strategy 1 we use the objective function $\max \sum_{j=1}^{n-m} \lambda_j$.
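To make the sequence of problems (11) concrete, the following is a minimal brute-force sketch, not the authors' method: it enumerates all 0/1 vectors instead of reformulating and calling an LP-based solver, so it is only usable for tiny instances. The function names `total_slack` and `smallest_feasible_v` are hypothetical. For a fixed x the smallest attainable value of the sum of the s_i + w_i is the sum of the absolute row residuals, so the first v for which (11) is feasible equals the minimum of that quantity over x.

```python
from itertools import product

def total_slack(A, d, x):
    """Sum over rows i of |sum_j a_ij x_j - d_i|, the least sum_i (s_i + w_i) for this x."""
    return sum(abs(sum(a * xj for a, xj in zip(row, x)) - di)
               for row, di in zip(A, d))

def smallest_feasible_v(A, d):
    """Try v = 0, 1, 2, ... and return the first v, with a witness x, for which
    problem (11) has a solution. Brute force over all 0/1 vectors (illustration only)."""
    n = len(A[0])
    slack_of = {x: total_slack(A, d, x) for x in product((0, 1), repeat=n)}
    v = 0
    while True:
        for x, slack in slack_of.items():
            if slack == v:
                return v, x
        v += 1

# Hypothetical toy instances: the first is feasible (v = 0), the second is not.
print(smallest_feasible_v([[1, 2, 3]], [3]))
print(smallest_feasible_v([[2, 4]], [3]))
```

The case v = 0 corresponds to the feasibility version (10); infeasible instances need at least one further round.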

Strategy 2: This is the same as Strategy 1 except that the objective function is a
perturbation of the objective function zero. Here we sketch the principle of the
perturbation. What is basically done to construct the perturbed objective func-
tion is to perturb the variables of the linear programming dual as follows. Notice
that the number of constraints in the linear relaxation of the reformulation (10)
of the feasibility problem (11) is p = 2n + 2m; we have 2n constraints corre-
sponding to the upper and lower bounds on the x-variables, and 2m constraints
corresponding to the nonnegativity requirements on the slack variables si and
$w_i$, $1 \le i \le m$. Let $\varepsilon = 10^{-6}$ and let, for $i = 1, \dots, p$, $Z_i$ be drawn uniformly
and independently from the interval [0, 1]. Let $\delta_i = \varepsilon Z_i$. If the dual variable
$y_i \le 0$ in the original formulation we let $y_i \le \delta_i$, and if $y_i \ge 0$ we let $y_i \ge -\delta_i$.
For $y_i$ such that $y_i \le \delta_i$, make the substitution $Y_i = y_i - \delta_i$, and for $y_i \ge -\delta_i$
we substitute $y_i$ by $Y_i = y_i + \delta_i$. This substitution implies a perturbation of the
primal objective function.

Strategy 3: Here we view the problem as an optimization problem directly, which
implies that only one problem is solved instead of a sequence of problems as in
Strategies 1 and 2. We extract the expressions of the slack variables $s_i$ and $w_i$ in
terms of the variables $\lambda_j$ and minimize the sum of the slack variables expressed
in the $\lambda_j$'s.
For all computations we used CPLEX version 6.5. The computations were
made on an Alphaserver 4100 5/400 as described in the previous subsection.
Table 2. Results for the optimization version

                   Strat. 1               Strat. 2               Strat. 3
Inst.     # nodes  time (s)      # nodes  time (s)      # nodes  time (s)

M1         20,022        57       13,699        38       53,267       180
M2          6,451        44       75,174       222      109,498       281
M3          5,484        15        5,230        15       41,518       118
M4         36,847       107       75,985       211      176,437       456
M5         32,880        93       16,379        45      271,208       705
M6         35,710       105       21,277        62      220,603       875
M7         64,090       180       88,160       254      396,416     1,261
M8         33,937        99       35,471       103      122,526       383
M9          9,910        29       19,083        56      200,987       625
M10        28,402        76      146,224       436       99,502       299

M11     1,165,498     6,018    1,728,646     7,811    6,158,407    29,486
M12        53,273       214       73,801       283    5,101,843    22,002
M13       810,496       505    1,410,840     6,607    6,057,528    26,333
M14       384,882     1,505    1,117,107     4,748    7,861,402    34,083
M15        96,470       384       17,007        60    9,558,521    37,960

M16       282,665     1,190      319,029     1,248       71,253       348
M17       567,837     3,767    1,823,915    11,597    3,425,941    27,168

From the results in Table 2 we can conclude that instances of sizes up to 7×60 are
relatively easy to solve to optimality after using the reformulation of the problem
implied by the algorithm of Aardal, Hurkens, and Lenstra [1]. This represents a
large improvement over earlier results, where the largest optimization instances
had dimension 4×30, see Cornuéjols and Dawande [2]. If we consider the number
of nodes that we enumerate when applying linear programming based branch-and-bound
on the variables $\lambda_j$, we observe that this number is significantly
smaller than the number $2^{\alpha n}$ for $\alpha$ between 0.6 and 0.7 that Cornuéjols and
Dawande observed when applying branch-and-bound on the $x_j$-variables. For
instances of size 4 × 30 they enumerated between $10^6$ and $2 \times 10^6$ nodes. For the
same number of enumeration nodes we can solve instances of more than twice
that size. We can also observe that solving the reformulated optimization version
(Strategy 3) instead of a sequence of feasibility problems (Strategies 1 and 2) is
more time consuming in most cases. One reason is that the optimum objective
value is small, so the number of problems we need to solve in Strategies 1 and 2
is small, but in case of the infeasible instances greater than one. If one expects
the optimum value to be large and no reasonable bounds are known, then it is
probably better to consider Strategy 3.
In addition to the instances we report on here, we generated another five
instances of size 7 × 60. All these instances were feasible, so we decided not to
report on the results of the computations here. For the size 6 × 50 we also had
to generate quite a few instances to obtain infeasible ones. This motivated us
to investigate whether the relation n = 10(m − 1), as Cornuéjols and Dawande
suggested, is actually likely to produce infeasible instances if we keep all other
parameters as they suggested. Our probabilistic analysis is presented in the next
section.

4 The Expected Number of Solutions


Here we derive an expression for the expected number of solutions for problem
formulation (1), given that the coefficients $a_{ij}$ are generated uniformly and
independently from the set $\{0, \dots, D-1\}$, and that $d_i = \lfloor p \sum_{j=1}^{n} a_{ij} \rfloor$, cf. Section
1. Cornuéjols and Dawande [2] use $D = 100$ and $p = \frac{1}{2}$.

4.1 The Probability that a Subset Induces a Solution


Consider a subset $S \subseteq \{1, 2, \dots, n\}$, and let $x_j = 1$ if $j \in S$, and $x_j = 0$
otherwise. We compute the probability that $\sum_{j \in S} a_{ij} = \lfloor p \sum_{j=1}^{n} a_{ij} \rfloor$, in which
case the vector $x$ as defined above satisfies $Ax = d$ for row $i$. Define a random
variable $Z_i(S) = \sum_{j \in S} a_{ij} - \lfloor p \sum_{j=1}^{n} a_{ij} \rfloor$ denoting the difference between the
left-hand side and the right-hand side of row $i$. The probability that we want
to compute is therefore $\Pr[Z_i(S) = 0]$. Let random variables $Y_i(S)$ and $U_i$ be
defined as

\[
Y_i(S) = \sum_{j \in S} a_{ij} - p \sum_{j=1}^{n} a_{ij} = \sum_{j \in S} (1-p)\, a_{ij} - \sum_{j \notin S} p\, a_{ij}, \tag{12}
\]

and $U_i = p \sum_{j=1}^{n} a_{ij} - \lfloor p \sum_{j=1}^{n} a_{ij} \rfloor$. Hence, we can write $Z_i(S) = Y_i(S) + U_i$.
For any rational fraction $p = P/Q$ ($P, Q \in \mathbb{N}$, $\gcd(P, Q) = 1$), we have
$Y_i(S) \in \frac{1}{Q}\mathbb{Z}$ and $U_i \in \frac{1}{Q}\mathbb{Z} \cap [0, 1)$. Since $Y_i(S) + U_i \equiv 0 \pmod 1$, we have

\[
\Pr[Z_i(S) = 0] = \Pr[Y_i(S) = -U_i] = \sum_{k=0}^{Q-1} \Pr\left[Y_i(S) = \frac{-k}{Q}\right]. \tag{13}
\]

We can compute this probability exactly using the probability generating func-
tion of Yi (S), see Section 4.2, or give an approximation using the normal dis-
tribution as described in Section 4.3. In either case, we obtain an expression
Pr[Zi (S) = 0] = q(n, D, |S|), i.e., the probability that x induced by S defines a
solution for (Ax)i = di depends on n, D and the size of S only. The probability
that S induces a solution for $Ax = d$ is given by $q(n, D, |S|)^m$. The expected
number of solutions is derived by summing over all subsets S, i.e.,

\[
E[\#\text{solutions}] = \sum_{S \subset \{1,\dots,n\}} q(n, D, |S|)^m = \sum_{s=0}^{n} \binom{n}{s}\, q(n, D, s)^m. \tag{14}
\]
4.2 The Probability Generating Function


The probability generating function of $a_{ij}$ is given by

\[
G_{a_{ij}}(x) = E[x^{a_{ij}}] = \sum_{i=0}^{D-1} \frac{1}{D}\, x^i = \frac{1}{D}\left(1 + x + \dots + x^{D-1}\right) = \frac{1}{D}\,\frac{x^D - 1}{x - 1}. \tag{15}
\]

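Similarly, the probability generating function of $Y_i(S)$ is given by

\begin{align}
G_{Y_i(S)}(x) &= \prod_{j \in S} \frac{1}{D}\,\frac{x^{(1-p)D} - 1}{x^{(1-p)} - 1} \; \prod_{j \notin S} \frac{1}{D}\,\frac{x^{(-p)D} - 1}{x^{(-p)} - 1} \tag{16} \\
&= \left(\frac{1}{D}\right)^{n} \left(\frac{x^{(1-p)D} - 1}{x^{(1-p)} - 1}\right)^{|S|} \left(\frac{x^{(-p)D} - 1}{x^{(-p)} - 1}\right)^{n-|S|} \tag{17} \\
&= \frac{1}{D^n}\,\frac{1}{x^{p(D-1)(n-|S|)}} \left(\frac{x^{(1-p)D} - 1}{x^{(1-p)} - 1}\right)^{|S|} \left(\frac{x^{pD} - 1}{x^{p} - 1}\right)^{n-|S|}. \tag{18}
\end{align}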
For rational $p = P/Q$, we would like to expand expression (18) to $\sum_j c_j x^{j/Q}$,
where $c_j$ denotes the probability $\Pr[Y_i(S) = \frac{j}{Q}]$. To compute $\Pr[Z_i(S) = 0]$, we
only need to evaluate the coefficients $c_j$ for $j = 0, -1, \dots, -Q+1$. For $p = \frac{1}{2}$,
expression (18) is equal to

\[
\frac{1}{D^n}\,\frac{1}{x^{(D-1)(n-|S|)/2}} \left(\frac{x^{D/2} - 1}{x^{1/2} - 1}\right)^{n}. \tag{19}
\]
The major factors in (18) and (19) are of the form $\left(\frac{y^D - 1}{y - 1}\right)^T$, which is equal
to $\sum_{j=0}^{\infty} a_j y^j$ with

\[
a_j = \sum_{k=0}^{\min\{T, \lfloor j/D \rfloor\}} (-1)^k \binom{T}{k} \binom{j - Dk + T - 1}{j - Dk}. \tag{20}
\]

For $p = \frac{1}{2}$, we take $T = n$ and $y = x^{1/2}$, to obtain

\[
\Pr[Z_i(S) = 0] = q(n, D, |S|) = \frac{1}{D^n}\left(a_{(D-1)(n-|S|)} + a_{(D-1)(n-|S|)-1}\right). \tag{21}
\]

For $p \ne \frac{1}{2}$, we use (20) to compute the coefficients of each factor in expression (18)
and derive the $c_j$-coefficients by convolution of the two power series
obtained.
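Formulas (20), (21) and (14) translate almost directly into code for the case p = 1/2. The following is a minimal sketch in exact rational arithmetic; the function names `coeff`, `q_half` and `expected_solutions` are hypothetical and not taken from the paper, and the sketch assumes T ≥ 1.

```python
from fractions import Fraction
from math import comb

def coeff(j, D, T):
    """Coefficient a_j of y^j in ((y^D - 1)/(y - 1))^T, equation (20); zero for j < 0."""
    if j < 0:
        return 0
    return sum((-1) ** k * comb(T, k) * comb(j - D * k + T - 1, j - D * k)
               for k in range(min(T, j // D) + 1))

def q_half(n, D, s):
    """q(n, D, s) for p = 1/2, equation (21), as an exact fraction."""
    top = (D - 1) * (n - s)
    return Fraction(coeff(top, D, n) + coeff(top - 1, D, n), D ** n)

def expected_solutions(n, m, D=100):
    """Expected number of solutions for p = 1/2, equation (14)."""
    return sum(comb(n, s) * q_half(n, D, s) ** m for s in range(n + 1))

# Example: the setting n = 10(m - 1) with m = 6 and D = 100.
print(float(expected_solutions(50, 6)))
```

Exact fractions avoid cancellation in the alternating sum (20); the result is converted to a float only for display and can be compared with the "gen" columns of Table 3.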

4.3 An Approximation Using the Normal Distribution


We can also approximate $q(n, D, |S|)$ using the normal distribution. Each of the
coefficients $a_{ij}$ has expectation $\frac{1}{2}(D - 1)$ and variance $\frac{1}{12}(D^2 - 1)$. Since they are
drawn independently, we obtain $E[Y_i(S)] = \frac{1}{2}(D - 1)(|S| - np)$ and $\mathrm{Var}[Y_i(S)] =
\frac{1}{12}(D^2 - 1)(|S|(1 - 2p) + p^2 n)$. Note that for $p = \frac{1}{2}$, the variance reduces to
$(D^2 - 1)n/48$. For rational $p = P/Q$, the probability that subset S induces a
solution for row i is given by $\Pr[Z_i(S) = 0] = \Pr[1/(2Q) - 1 \le Y_i(S) \le 1/(2Q)]$.
We can approximate this expression, using the Central Limit Theorem [5], by
the normal distribution giving

\[
\Pr\left[\frac{1}{2Q} - 1 < Y_i(S) < \frac{1}{2Q}\right] \approx \frac{1}{\sqrt{2\pi}} \int_{\alpha}^{\beta} \exp\left(-\tfrac{1}{2} u^2\right) du, \tag{22}
\]

with

\[
\alpha = \frac{\frac{1}{2Q} - 1 - E[Y_i(S)]}{\sqrt{\mathrm{Var}[Y_i(S)]}}, \quad \text{and} \quad \beta = \frac{\frac{1}{2Q} - E[Y_i(S)]}{\sqrt{\mathrm{Var}[Y_i(S)]}}. \tag{23}
\]
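The approximation (22)–(23) can be evaluated with the error function from the standard library. The sketch below is illustrative only; the names `normal_cdf`, `q_normal` and `expected_solutions_normal` are hypothetical, and the default arguments assume the setting D = 100, p = 1/2 used by Cornuéjols and Dawande.

```python
from math import comb, erf, sqrt

def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def q_normal(n, D, s, P=1, Q=2):
    """Normal approximation of q(n, D, s) for p = P/Q, equations (22)-(23)."""
    p = P / Q
    mean = 0.5 * (D - 1) * (s - n * p)
    var = (D * D - 1) / 12.0 * (s * (1 - 2 * p) + p * p * n)
    sd = sqrt(var)
    alpha = (1 / (2 * Q) - 1 - mean) / sd
    beta = (1 / (2 * Q) - mean) / sd
    return normal_cdf(beta) - normal_cdf(alpha)

def expected_solutions_normal(n, m, D=100, P=1, Q=2):
    """Approximate expected number of solutions: equation (14) with q from (22)."""
    return sum(comb(n, s) * q_normal(n, D, s, P, Q) ** m for s in range(n + 1))

# Example: m = 6, n = 50, to be compared with the "approx" column of Table 3.
print(expected_solutions_normal(50, 6))
```

Subtracting two distribution-function values loses relative accuracy far in the tails, but those terms contribute negligibly to the sum in (14).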

4.4 The Probability of Generating Infeasible Instances

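Finally, we approximate the probability of drawing an infeasible instance. Here,
we neglect the dependency between two distinct subsets, each not providing a
solution. We further use $\log(1 + x) = x + O(x^2)$, finding

\begin{align}
\Pr[\#\text{solutions} = 0] &\approx \prod_{S \subset \{1,\dots,n\}} \Pr[S \text{ is no solution}] \tag{24} \\
&= \exp\Bigl(\log \prod_{S \subset \{1,\dots,n\}} \bigl(1 - q(n, D, |S|)^m\bigr)\Bigr) \tag{25} \\
&= \exp\Bigl(\sum_{S \subset \{1,\dots,n\}} \log\bigl(1 - q(n, D, |S|)^m\bigr)\Bigr) \tag{26} \\
&\approx \exp\Bigl(\sum_{S \subset \{1,\dots,n\}} -q(n, D, |S|)^m\Bigr) \tag{27} \\
&= \exp\bigl(-E[\#\text{solutions}]\bigr). \tag{28}
\end{align}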
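As a small worked illustration of (28), the snippet below plugs in the exact ("gen") expected numbers of solutions that Table 3 reports along the relation n = 10(m − 1). The dictionary names are hypothetical; the numerical inputs are copied from the table.

```python
from math import exp

# "gen" values from Table 3 at n = 10(m - 1), keyed by m.
expected_at_rule = {4: 0.1913, 5: 0.4345, 6: 0.9144, 7: 1.7978, 8: 3.3274}

# Approximation (28): Pr[#solutions = 0] is roughly exp(-E[#solutions]).
prob_infeasible = {m: exp(-e) for m, e in expected_at_rule.items()}

for m, pr in prob_infeasible.items():
    print(f"m = {m}, n = {10 * (m - 1)}: Pr[infeasible] is roughly {pr:.2f}")
```

Under this approximation the chance of drawing an infeasible instance shrinks steadily as m grows along n = 10(m − 1).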

4.5 Computational Results

We have computed the expected number of solutions using probability generating
functions (gen), and using an approximation by the normal distribution (approx)
for several values of m and n. The results are presented in Table 3. In our
computations we use $p = \frac{1}{2}$ and $D = 100$. The horizontal lines in the table
indicate the relation n = 10(m − 1) as proposed by Cornuéjols and Dawande.
We notice that the value obtained by the approximation overestimates the exact
value with a relative error of at most 5.2 %.
For a given value of m the table shows that the expected number of so-
lutions, and therefore the probability of drawing a feasible instance, increases
rapidly when n increases, see also Figure 2. In particular, we observe that for
m ≥ 6, the expected number of solutions using the relation n = 10(m − 1) is
greater than 0.9. This confirms our experience with the instances we drew for our
computational experiments reported on in Section 3. If one wants to generate
infeasible instances for m ≥ 6 with high probability, then one needs to generate
slightly fewer columns for a given value of m than the relation n = 10(m − 1)
indicates.

Table 3. The expected number of solutions computed exactly using the prob-
ability generating function (gen) and approximated by the normal distribution
(approx)
n | m = 4 | m = 5 | m = 6 | m = 7 | m = 8
n | gen approx | gen approx | gen approx | gen approx | gen approx
20 0.0004 0.0004 2.3091e-6 2.3692e-6 1.3071e-8 1.3519e-8 7.5214e-11 7.8457e-11 4.3865e-13 4.6155e-13
21 0.0007 0.0008 4.0743e-6 4.1700e-6 2.2299e-8 2.2964e-8 1.2316e-10 1.2759e-10 6.8386e-13 7.1256e-13
22 0.0014 0.0014 7.2921e-6 7.4623e-6 3.9302e-8 4.0507e-8 2.1514e-10 2.2339e-10 1.1927e-12 1.2479e-12
23 0.0025 0.0026 0.00001 0.00001 6.8320e-8 7.0205e-8 3.6171e-10 3.7380e-10 1.9270e-12 2.0023e-12
24 0.0046 0.0047 0.00002 0.00002 1.2125e-7 1.2461e-7 6.3449e-10 6.5644e-10 3.3604e-12 3.5006e-12
25 0.0086 0.0087 0.00004 0.00004 2.1378e-7 2.1924e-7 1.0881e-9 1.1219e-9 5.5765e-12 5.7801e-12
26 0.0159 0.0161 0.00008 0.00008 3.8208e-7 3.9177e-7 1.9192e-9 1.9797e-9 9.7509e-12 1.0121e-11
27 0.0295 0.0298 0.0001 0.0001 6.8118e-7 6.9740e-7 3.3415e-9 3.4385e-9 1.6516e-11 1.7079e-11
28 0.0548 0.0555 0.0003 0.0003 1.2257e-6 1.2544e-6 5.9298e-9 6.1020e-9 2.9002e-11 3.0013e-11
29 0.1023 0.1035 0.0005 0.0005 2.2050e-6 2.2540e-6 1.0449e-8 1.0733e-8 4.9914e-11 5.1513e-11
30 0.1913 0.1936 0.0009 0.0009 3.9929e-6 4.0798e-6 1.8657e-8 1.9159e-8 8.8096e-11 9.0939e-11
31 0.3585 0.3626 0.0016 0.0016 7.2371e-6 7.3878e-6 3.3199e-8 3.4045e-8 1.5357e-10 1.5819e-10
32 0.6732 0.6808 0.0030 0.0030 0.00001 0.00001 5.9626e-8 6.1124e-8 2.7250e-10 2.8068e-10
33 1.2668 1.2805 0.0055 0.0055 0.00002 0.00002 1.0696e-7 1.0953e-7 4.8000e-10 4.9361e-10
34 2.3879 2.4131 0.0102 0.0103 0.00004 0.00004 1.9319e-7 1.9774e-7 8.5633e-10 8.8043e-10
35 4.5090 4.5552 0.0189 0.0192 0.00008 0.00008 3.4895e-7 3.5685e-7 1.5215e-9 1.5623e-9
36 8.5279 8.6128 0.0353 0.0358 0.0001 0.0001 6.3353e-7 6.4759e-7 2.7288e-9 2.8010e-9
37 16.1533 16.3098 0.0659 0.0668 0.0003 0.0003 1.1512e-6 1.1758e-6 4.8844e-9 5.0085e-9
38 30.6410 30.9297 0.1234 0.1250 0.0005 0.0005 2.1000e-6 2.1440e-6 8.8038e-9 9.0240e-9
39 58.2021 58.7363 0.2314 0.2343 0.0009 0.0009 3.8357e-6 3.9137e-6 1.5859e-8 1.6241e-8
40 110.6973 111.6878 0.4345 0.4400 0.0017 0.0018 7.0282e-6 7.1681e-6 2.8719e-8 2.9400e-8
41 210.7999 212.6397 0.8174 0.8274 0.0032 0.0033 0.00001 0.00001 5.2021e-8 5.3216e-8
42 401.8960 405.3196 1.5399 1.5582 0.0060 0.0061 0.00002 0.00002 9.4624e-8 9.6756e-8
43 767.0835 773.4649 2.9050 2.9388 0.0112 0.0114 0.00004 0.00004 1.7224e-7 1.7601e-7
44 1465.6670 1477.5812 5.4876 5.5499 0.0209 0.0212 0.00008 0.00008 3.1459e-7 3.2134e-7
45 2803.3082 2825.5860 10.3793 10.4945 0.0391 0.0397 0.0001 0.0002 5.7517e-7 5.8721e-7
46 5366.9806 5408.6988 19.6556 19.8690 0.0733 0.0743 0.0003 0.0003 1.0545e-6 1.0761e-6
47 1.0285e4 1.0363e4 37.2658 37.6617 0.1375 0.1393 0.0005 0.0005 1.9356e-6 1.9744e-6
48 1.9726e4 1.9874e4 70.7327 71.4684 0.2582 0.2617 0.0010 0.0010 3.5613e-6 3.6313e-6
49 3.7868e4 3.8145e4 134.3989 135.7680 0.4856 0.4920 0.0018 0.0018 6.5607e-6 6.6867e-6
50 7.2753e4 7.3275e4 255.6339 258.1855 0.9144 0.9262 0.0033 0.0034 0.00001 0.00001
51 1.3989e5 1.4087e5 486.7099 491.4721 1.7238 1.7457 0.0062 0.0062 0.00002 0.00002
52 2.6918e5 2.7103e5 927.5451 936.4449 3.2537 3.2940 0.0116 0.0117 0.00004 0.00004
53 5.1834e5 5.2184e5 1769.2812 1785.9346 6.1478 6.2225 0.0216 0.0220 0.00008 0.00008
54 9.9884e5 1.0054e6 3377.8566 3409.0581 11.6287 11.7673 0.0405 0.0411 0.0001 0.0001
55 1.9261e6 1.9386e6 6454.3702 6512.8979 22.0181 22.2758 0.0761 0.0772 0.0003 0.0003
56 3.7165e6 3.7402e6 1.2343e4 1.2453e4 41.7308 42.2104 0.1429 0.1449 0.0005 0.0005
57 7.1757e6 7.2207e6 2.3622e4 2.3830e4 79.1668 80.0606 0.2687 0.2724 0.0009 0.0009
58 1.3863e7 1.3948e7 4.5245e4 4.5635e4 150.3239 151.9916 0.5058 0.5127 0.0017 0.0017
59 2.6799e7 2.6961e7 8.6723e4 8.7457e4 285.6905 288.8057 0.9531 0.9658 0.0032 0.0033
60 5.1835e7 5.2143e7 1.6634e5 1.6772e5 543.4188 549.2448 1.7978 1.8214 0.0060 0.0061
61 1.0031e8 1.0090e8 3.1928e5 3.2189e5 1034.5026 1045.4103 3.3943 3.4383 0.0112 0.0114
62 1.9424e8 1.9535e8 6.1324e5 6.1817e5 1970.9501 1991.3942 6.4148 6.4966 0.0211 0.0214
63 3.7629e8 3.7843e8 1.1786e6 1.1879e6 3757.9874 3796.3444 12.1341 12.2862 0.0395 0.0401
64 7.2936e8 7.3343e8 2.2666e6 2.2842e6 7170.6817 7242.7199 22.9726 23.2561 0.0733 0.0754
65 1.4144e9 1.4221e9 4.3616e6 4.3951e6 1.3692e4 1.3828e4 43.5290 44.0578 0.1397 0.1417
66 2.7440e9 2.7589e9 8.3980e6 8.4614e6 2.6164e4 2.6419e4 82.5476 83.5352 0.2629 0.2666
67 5.3262e9 5.3545e9 1.6179e7 1.6299e7 5.0029e4 5.0511e4 156.6664 158.5126 0.4952 0.5021
68 1.0343e10 1.0396e10 3.1186e7 3.1412e7 9.5728e4 9.6634e4 297.5662 301.0208 0.9337 0.9465
69 2.0092e10 2.0196e10 6.0146e7 6.0581e7 1.8328e5 1.8499e5 565.6096 572.0801 1.7619 1.7858
70 3.9050e10 3.9248e10 1.1606e8 1.1688e8 3.5114e5 3.5437e5 1075.8876 1088.0185 3.3274 3.3720
71 7.5923e10 7.6305e10 2.2406e8 2.2563e8 6.7315e5 6.7925e5 2047.9738 2070.7375 6.2893 6.3723
72 1.4767e11 1.4840e11 4.3279e8 4.3578e8 1.2912e6 1.3027e6 3901.0477 3943.8021 11.8969 12.0517
73 2.8734e11 2.8875e11 8.3636e8 8.4207e8 2.4781e6 2.4999e6 7435.8186 7516.1883 22.5215 22.8106
74 5.5932e11 5.6202e11 1.6170e9 1.6278e9 4.7588e6 4.8001e6 1.4183e4 1.4334e4 42.6664 43.2066
75 1.0891e12 1.0943e12 3.1277e9 3.1484e9 9.1434e6 9.2218e6 2.7068e4 2.7354e4 80.8888 81.8993
76 2.1215e12 2.1314e12 6.0523e9 6.0920e9 1.7577e7 1.7725e7 5.1694e4 5.2231e4 153.4610 155.3527
77 4.1339e12 4.1531e12 1.1717e10 1.1792e10 3.3807e7 3.4089e7 9.8781e4 9.9795e4 291.3437 294.8880
78 8.0580e12 8.0948e12 2.2693e10 2.2837e10 6.5057e7 6.5593e7 1.8887e5 1.9078e5 553.4833 560.1297
79 1.5712e13 1.5783e13 4.3968e10 4.4245e10 1.2525e8 1.2627e8 3.6133e5 3.6494e5 1052.1712 1064.6447
80 3.0646e13 3.0782e13 8.5223e10 8.5754e10 2.4126e8 2.4319e8 6.9164e5 6.9847e5 2001.4513 2024.8798
[Figure: two plots of the expected number of solutions (ExpHits) against n and m.]

Fig. 2. The expected number of solutions for m = 4, . . . , 8, truncated at 10

Acknowledgements

We would like to thank Bram Verweij for his assistance in implementing our
integral branching algorithm using his enumeration scheme [12]. We also want
to thank David Applegate and Bill Cook for their many useful comments on our
work and for allowing us to use their DEC Alphaservers.

References
1. K. Aardal, C. Hurkens, A. K. Lenstra (1998). Solving a system of diophantine
equation with lower and upper bounds on the variables. Research report UU-CS-
1998-36, Department of Computer Science, Utrecht University.
2. G. Cornuéjols, M. Dawande (1998). A class of hard small 0-1 programs. In: R. E.
Bixby, E. A. Boyd, R. Z. Ríos-Mercado (eds.) Integer Programming and Combinatorial
Optimization, 6th International IPCO Conference. Lecture Notes in Computer
Science 1412, pp 284–293, Springer-Verlag, Berlin Heidelberg.
3. CPLEX 6.0 Documentation Supplement (1998). ILOG Inc., CPLEX Division, In-
cline Village NV.
4. CPLEX 6.5 Documentation Supplement (1999). ILOG Inc., CPLEX Division, In-
cline Village NV.
5. G. R. Grimmett, D. R. Stirzaker (1982). Probability and Random Processes, Oxford
University Press, Oxford.
6. A. K. Lenstra, H. W. Lenstra, Jr., L. Lovász (1982). Factoring polynomials with
rational coefficients. Mathematische Annalen 261, 515–534.
7. H. W. Lenstra, Jr. (1983). Integer programming with a fixed number of variables.
Mathematics of Operations Research 8, 538–548.
8. LiDIA – A library for computational number theory. TH Darmstadt / Universität
des Saarlandes, Fachbereich Informatik, Institut für Theoretische Informatik.
http://www.informatik.th-darmstadt.de/pub/TI/LiDIA
9. L. Lovász, H. E. Scarf (1992). The generalized basis reduction algorithm. Mathematics
of Operations Research 17, 751–764.
10. MIPLIB. http://www.caam.rice.edu/~bixby/miplib/miplib3.html
11. A. Schrijver (1986). Theory of Integer and Linear Programming. Wiley, Chichester.
12. A. M. Verweij (1998). The UHFCO Library. Department of Computer Science,
Utrecht University.
Approximation Algorithms for Maximum
Coverage and Max Cut with Given Sizes of
Parts*

Alexander A. Ageev and Maxim I. Sviridenko

Sobolev Institute of Mathematics


pr. Koptyuga 4, 630090, Novosibirsk, Russia
{ageev,svir}@math.nsc.ru

Abstract. In this paper we demonstrate a general method of designing
constant-factor approximation algorithms for some discrete optimization
problems with cardinality constraints. The core of the method is a simple
deterministic (“pipage”) procedure of rounding of linear relaxations. By
using the method we design a $(1 - (1 - 1/k)^k)$-approximation algorithm
for the maximum coverage problem where k is the maximum size of
the subsets that are covered, and a 1/2-approximation algorithm for
the maximum cut problem with given sizes of parts in the vertex set
bipartition. The performance guarantee of the former improves on that
of the well-known $(1 - e^{-1})$-greedy algorithm due to Cornuejols, Fisher
and Nemhauser in each case of bounded k. The latter is, to the best of
our knowledge, the first constant-factor algorithm for that version of the
maximum cut problem.

1 Introduction
It is a fact of the present day that rounding of linear relaxations is one of the
most effective techniques in designing approximation algorithms with proven
worst-case bounds for discrete optimization problems. The quality characteris-
tics of a rounding-based algorithm are highly dependent on the choice of an
integer program reformulation and a rounding scheme. When applying popular
random roundings one encounters the additional problem of derandomization.
This problem may prove to be extremely difficult or quite intractable. For exam-
ple, the widely known derandomization method of conditional probabilities [1]
succeeds, as is easily seen, only under some very special conditions; in particu-
lar, if the relaxation has a subset of active variables that determine the optimal
values of the remaining ones and the optimization problem with respect to these
active variables is unconstrained. If one adds even a single cardinality constraint
connecting the active variables, the method fails. In this paper we present a
simple deterministic (“pipage”) rounding method to tackle some problems of
this sort. So far the method has proved applicable to two well-known
problems. By using it we design a $(1 - (1 - 1/k)^k)$-approximation algorithm for
the maximum coverage problem where k is the maximum size of the subsets,
and a 1/2-approximation algorithm for the maximum cut problem with given
sizes of parts in the vertex set bipartition. The performance guarantee of the
former improves on that of the well-known $(1 - e^{-1})$-greedy algorithm due to
Cornuejols, Fisher and Nemhauser [2] in each case of bounded k. The latter is,
to the best of our knowledge, the first constant-factor algorithm for that version
of the maximum cut problem. In Sections 4 and 5 we show that both algorithms
can be extended to apply to more general problems, namely the maximum k-cut
problem with given sizes of parts and the maximum coverage problem with a
knapsack constraint, with preservation of the same performance guarantees in
both cases.

* This research was partially supported by the Russian Foundation for Basic Research,
grant 97-01-00890.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 17–30, 1999.
© Springer-Verlag Berlin Heidelberg 1999

To elucidate the key ideas behind the method we describe it informally under
some general assumptions (of course, these are far from most general).
Assume that the problem under consideration can be formulated as the following
nonlinear binary program:

\begin{align}
\max\ & F(x_1, \dots, x_n) \tag{1} \\
\text{s. t.}\ & \sum_{i=1}^{n} x_i = p \tag{2} \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n \tag{3}
\end{align}

where p is a positive integer, $F(x_1, \dots, x_n)$ is a function defined on the rational
points of the n-dimensional cube $[0, 1]^n$ and computable in polynomial
time. Assume further that one can associate with $F(x_1, \dots, x_n)$ another function
$L(x_1, \dots, x_n)$ which is defined and polynomially computable on the same
set, coincides with $F(x_1, \dots, x_n)$ on binary vectors, and the problem of maximizing
$L(x_1, \dots, x_n)$ over all rational points of $[0, 1]^n$ subject to (2) is polynomially
solvable. We shall call such a problem a nice relaxation. In our algorithms each
nice relaxation will be in fact a slight reformulation of a linear program. Assume
that additionally the following properties hold:
(A) there exists $C > 0$ such that $F(x_1, \dots, x_n) \ge C L(x_1, \dots, x_n)$ for each
$x \in [0, 1]^n$;
(B) the function $\varphi(\varepsilon, x, i, j) = F(x_1, \dots, x_i + \varepsilon, \dots, x_j - \varepsilon, \dots, x_n)$ is convex
with respect to $\varepsilon \in [-\min\{x_i, 1 - x_j\}, \min\{1 - x_i, x_j\}]$ for each pair of indices i
and j and each $x \in [0, 1]^n$.
We claim that under the above assumptions one can find in polynomial time
a feasible solution $\tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_n)$ to the problem (1)–(3), satisfying $F(\tilde{x}) \ge C F(x^*)$,
where $x^*$ is an optimal solution to (1)–(3). Indeed, let $x'$ be an optimal solution
to the nice relaxation. If the vector $x'$ is not binary, then due to (2)
it has at least two different components $x'_i$ and $x'_j$ with values lying strictly
between 0 and 1. By property (B), $\varphi(\varepsilon, x', i, j) \ge F(x')$ either for $\varepsilon = \min\{1 - x'_i, x'_j\}$
or for $\varepsilon = -\min\{x'_i, 1 - x'_j\}$, since a convex function attains its maximum
over an interval at one of the endpoints. Thus we obtain a new feasible solution
$x'' = (x'_1, \dots, x'_i + \varepsilon, \dots, x'_j - \varepsilon, \dots, x'_n)$ with a smaller number of noninteger
components and such that $F(x'') \ge F(x')$. After repeating this “pipage” step at
most n times we arrive at a binary feasible solution $\tilde{x}$ with $F(\tilde{x}) \ge F(x') \ge C L(x') \ge
C L(x^*) = C F(x^*)$, as required. Since an optimal solution to the nice relaxation
can be found in polynomial time, the overall running time of the above described
C-approximation algorithm is polynomially bounded.
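The rounding loop described above can be sketched as follows. This is a minimal illustration under the stated assumptions (integer coordinate sum, property (B)), not the authors' implementation; the function name `pipage_round`, the toy instance and the helper `coverage` are hypothetical. Exact fractions are used so that the test "strictly between 0 and 1" is reliable.

```python
from fractions import Fraction

def pipage_round(F, x):
    """Round x (entries in [0, 1], integer sum) to a 0/1 vector without decreasing F,
    assuming F is convex along every direction e_i - e_j (property (B))."""
    x = [Fraction(v) for v in x]
    while True:
        fractional = [i for i, v in enumerate(x) if 0 < v < 1]
        if len(fractional) < 2:   # an integer sum rules out exactly one fractional entry
            return x
        i, j = fractional[:2]
        candidates = []
        # The two endpoints of the feasible interval for epsilon.
        for eps in (min(1 - x[i], x[j]), -min(x[i], 1 - x[j])):
            y = list(x)
            y[i] += eps
            y[j] -= eps
            candidates.append(y)
        x = max(candidates, key=F)   # each step makes x_i or x_j integral

# Hypothetical toy instance: three unit-weight sets over the ground set {0, 1, 2, 3}.
sets = [{0, 1}, {1, 2}, {2, 3}]
weights = [1, 1, 1]

def coverage(x):
    """A function that is convex along e_i - e_j: sum_j w_j (1 - prod_{i in S_j} (1 - x_i))."""
    total = Fraction(0)
    for S, w in zip(sets, weights):
        miss = Fraction(1)
        for i in S:
            miss *= 1 - x[i]
        total += w * (1 - miss)
    return total

start = [Fraction(1, 2)] * 4
rounded = pipage_round(coverage, start)
print(rounded, coverage(start), coverage(rounded))
```

Each pass through the loop fixes at least one coordinate at 0 or 1, so at most n passes are needed, in line with the argument in the text.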

2 The Maximum Coverage Problem

In the maximum coverage problem (MCP for short), given a family F = {Sj :
j ∈ J} of subsets of a set I = {1, . . . , n} with associated nonnegative weights
wj and a positive integer p, it is required to find a subset X ⊆ I with |X| = p so
as to maximize the total weight of the sets in F having nonempty intersections
with X. The polynomial-time solvability of MCP clearly implies that of the set
cover problem and so it is NP-hard. In a sense MCP can be treated as an inverse
of the set cover problem and like the latter has numerous applications (see,
e. g. [10]). It is well known that a simple greedy algorithm solves MCP approximately
within a factor of $1 - (1 - 1/p)^p$ of the optimum (Cornuejols, Fisher and
Nemhauser [2]). Feige [4] proves that no polynomial algorithm can have a better
performance guarantee provided that P ≠ NP. Another result concerning MCP is
due to Cornuejols, Nemhauser and Wolsey [3] who prove that the greedy algorithm
almost always finds an optimal solution to MCP in the case of two-element
sets. We show below that MCP can be solved in polynomial time approximately
within a factor of $1 - (1 - 1/k)^k$ of the optimum, where $k = \max\{|S_j| : j \in J\}$.
Although $1 - (1 - 1/k)^k$, like $1 - (1 - 1/p)^p$, can be arbitrarily close to $1 - e^{-1}$,
the parameter k looks more interesting: for each fixed k (k = 2, 3, . . . ) MCP
still remains NP-hard. E. g., in the case when k = 2, which is the converse of
the vertex cover problem, the performance guarantee of the greedy algorithm
has the same value of $1 - e^{-1}$ [3], whereas our algorithm finds a solution within
a factor of 3/4 of the optimum. Ultimately, the performance guarantee of our
algorithm beats the performance guarantee of the greedy algorithm in each case
of bounded k and coincides with that when k is unbounded. Note also that our
result is similar in a sense to the well-known result [9] that the set cover problem
can be approximated in polynomial time within a factor of r of the optimum,
where r is the maximum number of sets containing an element.
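For comparison with the greedy algorithm mentioned above, here is a minimal sketch of one common way to state it for MCP: repeatedly add the element that hits the largest total weight of sets not yet hit. The function names and the toy instance are hypothetical, elements are numbered from 0 rather than 1, and p is assumed not to exceed n.

```python
def covered_weight(sets, weights, X):
    """Total weight of the sets having a nonempty intersection with X."""
    return sum(w for S, w in zip(sets, weights) if S & X)

def greedy_max_coverage(sets, weights, n, p):
    """Pick p elements of {0, ..., n-1}, each time maximizing the weight newly covered."""
    chosen = set()
    for _ in range(p):
        def gain(i):
            # Weight of sets containing i that are not yet hit by the chosen elements.
            return sum(w for S, w in zip(sets, weights)
                       if i in S and not (S & chosen))
        best = max((i for i in range(n) if i not in chosen), key=gain)
        chosen.add(best)
    return chosen

# Hypothetical toy instance with two-element sets (k = 2).
sets = [{0, 1}, {1, 2}, {2, 3}]
weights = [1, 1, 1]
X = greedy_max_coverage(sets, weights, n=4, p=2)
print(X, covered_weight(sets, weights, X))
```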
Let $J = \{1, \dots, m\}$. MCP can be equivalently reformulated as a constrained
version of MAX SAT over variables $y_1, \dots, y_n$ with m clauses $C_1, \dots, C_m$ such
that $C_j$ is the collection of $y_i$ with $i \in S_j$ and has weight $w_j$. It is required to
assign “true” values to exactly p variables so as to maximize the total weight of
satisfied clauses. Furthermore, analogously to MAX SAT (see, e. g. [6]), MCP
can be stated as the following integer program:

\begin{align}
\max\ & \sum_{j=1}^{m} w_j z_j \tag{4} \\
\text{s. t.}\ & \sum_{i \in S_j} x_i \ge z_j, \quad j = 1, \dots, m, \tag{5} \\
& \sum_{i=1}^{n} x_i = p, \tag{6} \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n, \tag{7} \\
& 0 \le z_i \le 1, \quad i = 1, \dots, m. \tag{8}
\end{align}

It is easy to see that the relation “$x_i = 1$ if $i \in X$, and $x_i = 0$ otherwise”
establishes a 1-1 correspondence between the optimal sets in MCP and the optimal
solutions to (4)–(8). Note that the variables $x_i$ determine the optimal
values of $z_j$ in any optimal solution and represent active variables in the sense
above. Moreover, it is clear that MCP is equivalent to maximizing the function
$F(x) = \sum_{j=1}^{m} w_j \bigl(1 - \prod_{i \in S_j} (1 - x_i)\bigr)$ over all binary vectors x satisfying (6).
Observe also that the objective function (4) can be replaced by the function

\[
L(x) = \sum_{j=1}^{m} w_j \min\Bigl\{1, \sum_{i \in S_j} x_i\Bigr\}
\]

of the active variables $x_1, \dots, x_n$, thus providing a nice relaxation.
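The two functions are easy to evaluate directly. The sketch below (hypothetical names `F_cov`, `L_cov` and a toy instance, with elements numbered from 0) checks on a small example that they coincide on binary vectors and shows how they differ at a fractional point.

```python
from fractions import Fraction
from itertools import product

def F_cov(x, sets, weights):
    """F(x) = sum_j w_j (1 - prod_{i in S_j} (1 - x_i))."""
    total = Fraction(0)
    for S, w in zip(sets, weights):
        miss = Fraction(1)
        for i in S:
            miss *= 1 - Fraction(x[i])
        total += w * (1 - miss)
    return total

def L_cov(x, sets, weights):
    """L(x) = sum_j w_j min{1, sum_{i in S_j} x_i}."""
    return sum(w * min(Fraction(1), sum(Fraction(x[i]) for i in S))
               for S, w in zip(sets, weights))

# Hypothetical toy instance: three unit-weight two-element sets.
sets = [{0, 1}, {1, 2}, {2, 3}]
weights = [1, 1, 1]

agree = all(F_cov(x, sets, weights) == L_cov(x, sets, weights)
            for x in product((0, 1), repeat=4))
half = [Fraction(1, 2)] * 4
print(agree, F_cov(half, sets, weights), L_cov(half, sets, weights))
```

At the all-halves point F equals 9/4 and L equals 3, so the ratio is exactly 3/4, the value of $1 - (1 - 1/k)^k$ for k = 2.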


We now show that the functions F and L just defined satisfy properties (A)
and (B).
Property (A) holds with $C = (1 - (1 - 1/k)^k)$ where $k = \max\{|S_j| : j \in J\}$,
which is implied by the following inequality (used first by Goemans and
Williamson [6] in a similar context):

\[
1 - \prod_{i=1}^{k} (1 - y_i) \ge \bigl(1 - (1 - 1/k)^k\bigr) \min\Bigl\{1, \sum_{i=1}^{k} y_i\Bigr\}, \tag{9}
\]

valid for all $0 \le y_i \le 1$, $i = 1, \dots, k$. To make the paper self-contained we derive
it below. By using the arithmetic-geometric mean inequality we have that

\[
1 - \prod_{i=1}^{k} (1 - y_i) \ge 1 - \Bigl(1 - \frac{z}{k}\Bigr)^{k},
\]

where $z = \min\{1, \sum_{i=1}^{k} y_i\}$. Since the function $g(z) = 1 - (1 - z/k)^k$ is concave
on the segment [0, 1] and $g(0) = 0$, $g(1) = 1 - (1 - 1/k)^k$, we finally obtain

\[
g(z) \ge \bigl(1 - (1 - 1/k)^k\bigr) z,
\]

as desired.
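Inequality (9) can be spot-checked numerically. The sketch below evaluates both sides in exact arithmetic on a uniform grid of points in $[0, 1]^k$; the function name `check_inequality_9` and the grid resolution `steps` are hypothetical choices. A grid check is an illustration, not a proof.

```python
from fractions import Fraction
from itertools import product

def check_inequality_9(k, steps=4):
    """Return True if (9) holds at every grid point y in {0, 1/steps, ..., 1}^k."""
    C = 1 - (1 - Fraction(1, k)) ** k
    grid = [Fraction(t, steps) for t in range(steps + 1)]
    for y in product(grid, repeat=k):
        miss = Fraction(1)
        for yi in y:
            miss *= 1 - yi
        if 1 - miss < C * min(Fraction(1), sum(y)):
            return False
    return True

print([check_inequality_9(k) for k in (1, 2, 3, 4)])
```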
To check property (B) it suffices to observe that in this case the function
$\varphi(\varepsilon, x, i, j)$ is convex because it is a quadratic polynomial in $\varepsilon$, whose leading
coefficient is nonnegative for each pair of indices i and j and each $x \in [0, 1]^n$.
Thus by specializing the general scheme described in the introduction we obtain
a $(1 - (1 - 1/k)^k)$-approximation algorithm for the maximum coverage problem.
We now demonstrate that the integrality gap of (4)–(8) can be arbitrarily
close to (1 − (1 − 1/k)k ) and thus the rounding scheme described above is best
Approximation Algorithms for Maximum Coverage and Max Cut 21

possible for the integer program (4)–(8). Set n = kp, wj = 1 for all j and let F be
the collection of all subsets of {1, . . . , n} with cardinality k. Then, by symmetry,
any binary vector with exactly p units maximizes L(x) subject to (6)–(7) and
so the optimal value of this problem is equal to L∗ = Cnk − Cn−p k
. On the other
hand, the vector with all components equal to 1/k provides an optimal solution
of weight L0 = Cnk to the linear relaxation in which the objective is to maximize
L(x) subject to (6) and 0 ≤ xi ≤ 1 for all i. Now it is easy to derive an upper
bound on the ratio
L∗ Cnk − Cn−p
k
=
L0 Cnk
(n − p)! k!(n − k)!
=1−
k!(n − p − k)! n!
 n − p  n − p − 1   n − p − k + 1 
=1− ...
n n−1 n−k+1
 n − p  n − p − 1   n − p − k + 1 
≤1− ...
n n n
 1  1 1  1 2  1 k + 1
=1− 1− 1− − 1− − ... 1 − −
k k n k n k n
 1 k + 1 k
≤1− 1− − ,
k n
which tends to (1 − (1 − 1/k)k ) when k is fixed and n → ∞.
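The convergence of this ratio can be observed directly. The sketch below is an illustration only; the function names `gap_ratio` and `limit` are assumptions, not notation from the paper. It evaluates the exact ratio of the integral optimum to the fractional optimum for the instance with n = kp and all k-subsets, and compares it with 1 − (1 − 1/k)^k.

```python
from math import comb


def gap_ratio(k, p):
    """Exact ratio (C(n,k) - C(n-p,k)) / C(n,k) for the instance with n = k*p."""
    n = k * p
    integral_opt = comb(n, k) - comb(n - p, k)  # k-subsets hit by a fixed p-subset
    fractional_opt = comb(n, k)                 # every k-subset fully covered at x_i = 1/k
    return integral_opt / fractional_opt


def limit(k):
    """The claimed limit 1 - (1 - 1/k)^k."""
    return 1 - (1 - 1 / k) ** k


for p in (2, 10, 100, 1000):
    print(p, gap_ratio(3, p), limit(3))
```

For fixed k the printed ratio decreases towards the limit as p (and hence n) grows, in line with the bound derived above.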
Remark 1. The algorithm and the argument above can be easily adapted to
yield the same performance guarantee in the case of the more general problem
in which the constraint (6) is replaced by the constraints
\[
\sum_{i \in I_t} x_i = p_t, \quad t = 1, \dots, r,
\]
where $\{I_t : t = 1, \dots, r\}$ is a partition of the ground set $I$ and $p_t$ are positive
integers. It can be shown, on the other hand, that the worst-case ratio of the
straightforward extension of the greedy algorithm cannot be lower bounded by
any absolute constant. So, our algorithm is the only algorithm with constant
performance guarantee among those known for this generalization.

Remark 2. It can be easily observed that from the very beginning (and with
the same ultimate result) we could consider objective functions of the following
more general form:
\[
F(x) = \sum_{j=1}^{m} w_j \Bigl(1 - \prod_{i \in S_j} (1 - x_i)\Bigr) + \sum_{t=1}^{l} u_t \Bigl(1 - \prod_{i \in R_t} x_i\Bigr),
\]
where $S_j$ and $R_t$ are arbitrary subsets of $\{1, \dots, n\}$, and $u_t$, $w_j$ are nonnegative
weights. The problem with such objective functions can be reformulated as the
constrained MAX SAT in which each clause either contains no negations or
contains nothing but negations.

3 The Maximum Cut Problem with Given Sizes of Parts


Let G be a complete undirected graph with vertex set V (G) = {1, . . . , n}. Any
nonempty vertex subset X ⊆ V (G) determines a cut δ(X) of G which, by def-
inition, consists of the set of edges having exactly one end in X. Assume that
to each edge e = ij of G is assigned a nonnegative weight wij . In the maximum
cut problem with given sizes of parts (MCGS for short), given a complete graph
G, nonnegative edge weights wij and a positive integer p ≤ n/2, it is required
to find a cut δ(X) having maximum total weight over all cuts with |X| = p.
In the special case of p = n/2, also known as the max bisection problem, an
approximate solution of weight within a factor of 1/2 of the optimum can be
found in polynomial time by a simple random sampling algorithm. However,
natural extensions of this algorithm to the case of arbitrary p do not admit
any fixed constant performance guarantee. Frieze and Jerrum [5] prove that the
max bisection problem can be approximately solved within a factor of 0.65 of
the optimum by a randomized rounding of an optimal solution to a semidefi-
nite program but their algorithm does not admit straightforward extensions. We
present an approximation algorithm which finds a feasible solution to MCGS of
weight within a factor of 1/2 of the optimum in the case of arbitrary p.
Observe first that MCGS is equivalent to maximizing the function
\[
F(x) = \sum_{i<j} w_{ij} (x_i + x_j - 2 x_i x_j)
\]
over all binary vectors $x$ with exactly $p$ units. Notice that the function $F$
has property (B) for the same reason as in the above section. Consider
the following integer program:
\[
\begin{aligned}
\max\ & \sum_{i<j} w_{ij} z_{ij} && (10)\\
\text{s. t.}\ & z_{ij} \le x_i + x_j, \quad i < j, && (11)\\
& z_{ij} \le 2 - x_i - x_j, \quad i < j, && (12)\\
& \sum_{i \in V(G)} x_i = p, && (13)\\
& x_i, z_{ij} \in \{0, 1\}, \quad i \in V(G),\ i < j. && (14)
\end{aligned}
\]
It is easy to check that $X^* \subseteq V(G)$ provides an optimal cut in MCGS if and
only if the vector $x^*$ defined by $x^*_i = 1$ if $i \in X^*$ and $x^*_i = 0$ otherwise, is an optimal
solution to (10)–(14). Note that again, variables $x_i$ can be considered as active
in the sense that the program (10)–(14) is in fact equivalent to
\[
\begin{aligned}
\max\ & L(x) = \sum_{i<j} w_{ij} \min\{x_i + x_j,\ 2 - x_i - x_j\} && (15)\\
\text{s. t.}\ & \sum_{i \in V(G)} x_i = p, && (16)\\
& x_i \in \{0, 1\}, \quad i \in V(G). && (17)
\end{aligned}
\]


Here we take as a nice relaxation the continuous relaxation of (15)–(17) (in which
(17) is replaced by $0 \le x_i \le 1$ for all $i \in V(G)$). Let $x$ be an optimal solution to
the nice relaxation. We claim that $F(x) \ge \frac{1}{2} L(x)$. Indeed, it suffices to check
that
\[
x_i + x_j - 2 x_i x_j \ge \tfrac{1}{2} \min\{x_i + x_j,\ 2 - x_i - x_j\} \qquad (18)
\]
for all pairs $i, j$ with $i < j$. Fix some $i < j$ and assume first that $x_i + x_j \le 1$.
Then (18) is equivalent to $x_i + x_j \ge 4 x_i x_j$, which follows from
\[
x_i + x_j \ge (x_i + x_j)^2 \ge 4 x_i x_j.
\]
The case $x_i + x_j \ge 1$ is symmetric to the above, as is easily shown
by the substitution $x_i = 1 - y_i$, $x_j = 1 - y_j$. Thus property (A) holds with
$C = 1/2$ and we obtain a 1/2-approximation algorithm for MCGS.
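The rounding for MCGS can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: it takes a fractional vector x whose entries sum to an integer p (in the algorithm this would be an optimal solution of the relaxation, which is not computed here), and repeatedly shifts weight between two fractional coordinates to whichever endpoint does not decrease F. The function names, the tolerance `tol` and the weight matrix `W` are hypothetical.

```python
def cut_F(x, w):
    """F(x) = sum_{i<j} w_ij * (x_i + x_j - 2 x_i x_j)."""
    n = len(x)
    return sum(w[i][j] * (x[i] + x[j] - 2 * x[i] * x[j])
               for i in range(n) for j in range(i + 1, n))


def cut_L(x, w):
    """L(x) = sum_{i<j} w_ij * min(x_i + x_j, 2 - x_i - x_j)."""
    n = len(x)
    return sum(w[i][j] * min(x[i] + x[j], 2 - x[i] - x[j])
               for i in range(n) for j in range(i + 1, n))


def pipage_round(x, w, tol=1e-9):
    """Round x (entries in [0, 1], integer sum) to a 0/1 vector with the same sum."""
    x = list(x)
    while True:
        # Snap near-integral entries so rounding noise is not treated as fractional.
        x = [0.0 if v < tol else 1.0 if v > 1 - tol else v for v in x]
        frac = [i for i, v in enumerate(x) if 0.0 < v < 1.0]
        if len(frac) < 2:  # with an integer sum, no single fractional entry remains
            return [int(round(v)) for v in x]
        i, j = frac[0], frac[1]
        lo = -min(x[i], 1 - x[j])  # most negative feasible shift
        hi = min(1 - x[i], x[j])   # most positive feasible shift
        candidates = []
        for eps in (lo, hi):
            y = list(x)
            y[i] += eps
            y[j] -= eps
            candidates.append(y)
        # F is convex along this direction, so the better endpoint is at least F(x).
        x = max(candidates, key=lambda y: cut_F(y, w))


# Hypothetical 4-vertex instance; only entries w[i][j] with i < j are read.
W = [[0, 3, 1, 2],
     [0, 0, 4, 1],
     [0, 0, 0, 5],
     [0, 0, 0, 0]]
x_frac = [0.5, 0.5, 0.5, 0.5]  # feasible for p = 2
x_int = pipage_round(x_frac, W)
print(x_int, cut_F(x_int, W), cut_L(x_frac, W))
```

Each pass makes at least one more coordinate integral, so the loop runs at most n times; the final cut weight is at least F of the starting point and hence at least half of L of the starting point.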
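Consider an instance of MCGS in which $2p \le n$, $w_{ij} = 1$ if both $i$ and $j$
lie in a fixed subset of $V(G)$ having cardinality $2p$, and $w_{ij} = 0$ otherwise. The
optimal value of this instance is obviously equal to $p^2$, whereas the optimal value
of the linear relaxation is $p(2p - 1)$: every term of (15) is at most 1, and setting
$x_i = 1/2$ on the fixed subset and $x_i = 0$ elsewhere makes each of its $p(2p - 1)$
unit-weight edges contribute exactly 1. Hence the ratio of the two values is
\[
\frac{p^2}{p(2p - 1)} = \frac{p}{2p - 1} = \frac{1}{2 - 1/p},
\]
which can be arbitrarily close to 1/2. Thus, again, our rounding scheme is best
possible for the integer program (10)–(14).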

4 The Maximum k-Cut Problem with Given Sizes of Parts

In this section we illustrate how the “pipage” technique can be extended to apply
to a natural generalization of MCGS.
In the maximum k-cut problem with given sizes of parts (k-MCGS), given a
complete undirected graph G with V (G) = {1, . . . , n}, nonnegative edge weights
wij , and a collection of nonnegative numbers {p1 , . . . , pk }, it is required to find a
partition {P1 , . . . , Pk } of V (G) with |Pt | = pt for each t, maximizing the function
\[
\sum_{r<s} \sum_{i \in P_r} \sum_{j \in P_s} w_{ij}.
\]

Two similar problems have been considered earlier by Guttmann-Beck and
Hassin in [7] and [8]. The minimization version of k-MCGS, which is also NP-hard,
is studied in [8]. No constant-factor algorithm for this problem is known.
Guttmann-Beck and Hassin present a polynomial-time 3-approximation algorithm
for the special case when the edge weights satisfy the triangle inequality and
the number of parts is fixed. The other paper [7] considers the min-sum
p-clustering problem which can be treated as a complement to k-MCGS. The
authors suggest a 2-approximation algorithm for the similar special case of this
problem.
We now present an approximation algorithm which produces a feasible
solution to k-MCGS of weight within a factor of 1/2 of the optimum. The problem
can be reformulated as the following nonlinear integer program:
\[
\begin{aligned}
\max\ & F(x) = \sum_{i<j} w_{ij} \Bigl(1 - \sum_{t=1}^{k} x_{it} x_{jt}\Bigr) && (19)\\
\text{s. t.}\ & \sum_{i=1}^{n} x_{it} = p_t, \quad t = 1, \dots, k, && (20)\\
& \sum_{t=1}^{k} x_{it} = 1, \quad i \in V(G), && (21)\\
& x_{it} \in \{0, 1\}, \quad i \in V(G). && (22)
\end{aligned}
\]
It is easy to check that $x^* = (x^*_{it})$ is an optimal solution to (19)–(22) if and only if
$\{P_1, \dots, P_k\}$ with $P_t = \{i : x^*_{it} = 1\}$ for each $t$, is an optimal partition in
k-MCGS.
Consider now the following linear integer program also equivalent to k-MCGS
and virtually generalizing (10)–(14):
\[
\begin{aligned}
\max\ & \frac{1}{2} \sum_{i<j} w_{ij} \sum_{t=1}^{k} z_{ij}^{t} && (23)\\
\text{s. t.}\ & z_{ij}^{t} \le x_{it} + x_{jt}, \quad i < j,\ t = 1, \dots, k, && (24)\\
& z_{ij}^{t} \le 2 - x_{it} - x_{jt}, \quad i < j,\ t = 1, \dots, k, && (25)\\
& \sum_{i=1}^{n} x_{it} = p_t, \quad t = 1, \dots, k, && (26)\\
& \sum_{t=1}^{k} x_{it} = 1, \quad i \in V(G), && (27)\\
& x_{it}, z_{ij}^{t} \in \{0, 1\}, \quad i, j \in V(G),\ t = 1, \dots, k. && (28)
\end{aligned}
\]
As in the previous section the objective function (23) can be replaced by
\[
L(x) = \frac{1}{2} \sum_{i<j} w_{ij} \sum_{t=1}^{k} \min\{x_{it} + x_{jt},\ 2 - x_{it} - x_{jt}\}.
\]

We claim that $F(x) \ge \frac{1}{2} L(x)$ for all feasible $x = (x_{it})$. Fix a pair of indices $i$
and $j$ with $i < j$. It suffices to show that
\[
1 - \sum_{t=1}^{k} x_{it} x_{jt} \ge \frac{1}{4} \sum_{t=1}^{k} \min\{x_{it} + x_{jt},\ 2 - x_{it} - x_{jt}\}. \qquad (29)
\]
Indeed, we have already proved in the above section (see (18)) that for each $t$
\[
x_{it} + x_{jt} - 2 x_{it} x_{jt} \ge \tfrac{1}{2} \min\{x_{it} + x_{jt},\ 2 - x_{it} - x_{jt}\}. \qquad (30)
\]
Adding together (30) over $t$ from 1 to $k$ and taking into account (27) we obtain
that
\[
2 - 2 \sum_{t=1}^{k} x_{it} x_{jt} \ge \frac{1}{2} \sum_{t=1}^{k} \min\{x_{it} + x_{jt},\ 2 - x_{it} - x_{jt}\},
\]
which implies (29).
We now describe the “pipage” step. Let $x$ be an optimal solution to the linear
relaxation of (23)–(28) (that is, with (28) replaced by $0 \le x_{jt} \le 1$, $0 \le z_{ij}^{t} \le 1$).
Define the bipartite graph $H$ with the bipartition $(\{1, \dots, n\}, \{1, \dots, k\})$ so
that $jt \in E(H)$ if and only if $x_{jt}$ is fractional. Note that (26) and (27) imply
that each vertex of $H$ is either isolated or has degree at least 2. Assume that $x$
has fractional components. Since $H$ is bipartite it follows that $H$ has a circuit
$C$ of even length. Let $M_1$ and $M_2$ be the matchings of $H$ whose union is the
circuit $C$. Define a new solution $x(\varepsilon)$ by the following rule: if $jt$ is not an edge
of $C$, then $x_{jt}(\varepsilon)$ coincides with $x_{jt}$; otherwise, $x_{jt}(\varepsilon) = x_{jt} + \varepsilon$ if $jt \in M_1$, and
$x_{jt}(\varepsilon) = x_{jt} - \varepsilon$ if $jt \in M_2$.
By definition $x(\varepsilon)$ is a feasible solution to the linear relaxation of (23)–(28)
for all $\varepsilon \in [-\varepsilon_1, \varepsilon_2]$ where
\[
\varepsilon_1 = \min\Bigl\{\min_{jt \in M_1} x_{jt},\ \min_{jt \in M_2} (1 - x_{jt})\Bigr\}
\]
and
\[
\varepsilon_2 = \min\Bigl\{\min_{jt \in M_1} (1 - x_{jt}),\ \min_{jt \in M_2} x_{jt}\Bigr\}.
\]
Moreover, $F(x(\varepsilon))$ is a quadratic polynomial in $\varepsilon$ with a nonnegative main
coefficient and therefore
\[
F(x(\varepsilon^*)) \ge F(x) \ge \tfrac{1}{2} L(x)
\]
for some $\varepsilon^* \in \{-\varepsilon_1, \varepsilon_2\}$. The new solution $x(\varepsilon^*)$, being feasible for the linear
relaxation of (23)–(28), has a smaller number of fractional components. Set
$x' = x(\varepsilon^*)$ and, if $x'$ has fractional components, apply to $x'$ the above described
“pipage” step and so on. Ultimately, after at most $nk$ steps, we arrive at a
solution $\tilde{x}$ which is feasible for (19)–(22) and satisfies
\[
F(\tilde{x}) \ge \tfrac{1}{2} L(x) \ge \tfrac{1}{2} F^*,
\]
where $F^*$ is the optimal value of (19)–(22) (and of the original instance of
k-MCGS). Thus we have designed a 1/2-approximation algorithm for k-MCGS.
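A single “pipage” step along a given circuit can be sketched as follows. This is an illustration under assumptions: the circuit is supplied by the caller as its two matchings `m1` and `m2` (lists of (vertex, part) pairs), finding the circuit in H is not shown, and the function names and the example data are hypothetical.

```python
def kcut_F(x, w):
    """F(x) = sum_{i<j} w_ij * (1 - sum_t x_it * x_jt)."""
    n, k = len(x), len(x[0])
    return sum(w[i][j] * (1 - sum(x[i][t] * x[j][t] for t in range(k)))
               for i in range(n) for j in range(i + 1, n))


def pipage_step(x, w, m1, m2):
    """Shift x along a circuit of H split into matchings m1 (+eps) and m2 (-eps)."""
    eps1 = min(min(x[j][t] for j, t in m1), min(1 - x[j][t] for j, t in m2))
    eps2 = min(min(1 - x[j][t] for j, t in m1), min(x[j][t] for j, t in m2))

    def shifted(eps):
        y = [row[:] for row in x]
        for j, t in m1:
            y[j][t] += eps
        for j, t in m2:
            y[j][t] -= eps
        return y

    # F(x(eps)) is convex in eps, so one of the two endpoints is at least F(x).
    return max((shifted(-eps1), shifted(eps2)), key=lambda y: kcut_F(y, w))


# Hypothetical instance: 4 vertices, k = 2 parts of size 2, all entries fractional.
W = [[0, 3, 1, 2],
     [0, 0, 4, 1],
     [0, 0, 0, 5],
     [0, 0, 0, 0]]
X = [[0.5, 0.5] for _ in range(4)]
# Circuit: vertex 0 - part 0 - vertex 1 - part 1 - vertex 0.
M1 = [(0, 0), (1, 1)]
M2 = [(1, 0), (0, 1)]
Y = pipage_step(X, W, M1, M2)
print(Y, kcut_F(X, W), kcut_F(Y, W))
```

Because every vertex of the circuit is incident to one edge of each matching, the row sums (27) and column sums (26) are unchanged by the shift, while at least one entry on the circuit becomes 0 or 1.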

5 The Maximum Coverage Problem with a Knapsack Constraint

In this section we show that the “pipage” rounding of linear relaxations in
conjunction with the partial enumeration method developed by Sahni [12] for the
knapsack problem can yield good constant-factor approximation algorithms even
in the case of a knapsack constraint.
The maximum coverage problem with a knapsack constraint (MCKP) is,
given a family $F = \{S_j : j \in J\}$ of subsets of a set $I = \{1, \dots, n\}$ with
associated nonnegative weights $w_j$, nonnegative costs $c_i$ of the elements $i \in I$, and a
positive integer $B$, to find a subset $X \subseteq I$ with $\sum_{i \in X} c_i \le B$ so as to maximize
the total weight of the sets in $F$ having nonempty intersections with $X$. MCKP
includes both MCP and the knapsack problem as special cases. Wolsey [13] appears
to be the first who succeeded in constructing a constant-factor approximation
algorithm for MCKP, even in a more general setting with an arbitrary nondecreasing
submodular objective function. His algorithm is of greedy type and has performance
guarantee of $1 - e^{-\beta} \approx 0.35$ where $\beta$ is the root of the equation $e^{\beta} = 2 - \beta$. Recently,
Khuller, Moss and Naor [11] have designed a $(1 - e^{-1})$-approximation algorithm
for MCKP by combining the partial enumeration method of Sahni for the knapsack
problem [12] with a simple greedy procedure. In this section we present a
$(1 - (1 - 1/k)^k)$-approximation algorithm for MCKP where $k$ is the maximum size
of sets in the instance. Our algorithm exploits the same idea of partial enumeration
but, instead of finding a greedy solution, solves a linear relaxation and then
rounds the fractional solution by a slightly more general “pipage” procedure.
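Generalizing (4)–(8) rewrite MCKP as the following integer program:
\[
\begin{aligned}
\max\ & \sum_{j=1}^{m} w_j z_j && (31)\\
\text{s. t.}\ & \sum_{i \in S_j} x_i \ge z_j, \quad j \in J, && (32)\\
& \sum_{i=1}^{n} c_i x_i \le B, && (33)\\
& 0 \le x_i, z_j \le 1, \quad i \in I,\ j \in J, && (34)\\
& x_i \in \{0, 1\}, \quad i \in I. && (35)
\end{aligned}
\]
Note that one can exclude variables $z_j$ by rewriting (31)–(35) as the following
equivalent nonlinear program:
\[
\max\ F(x) = \sum_{j=1}^{m} w_j \Bigl(1 - \prod_{i \in S_j} (1 - x_i)\Bigr) \qquad (36)
\]
subject to (33)–(35).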
Set $k = \max\{|S_j| : j \in J\}$. Denote by $IP[I_0, I_1]$ and $LP[I_0, I_1]$ the integer
program (31)–(35) and its linear relaxation (31)–(34) respectively, subject to the
additional constraints: $x_i = 0$ for $i \in I_0$ and $x_i = 1$ for $i \in I_1$ where $I_0$ and $I_1$
are disjoint subsets of $I$. By a solution to $IP[I_0, I_1]$ and $LP[I_0, I_1]$ we shall mean
only a vector $x$, since the optimal values of $z_j$ are trivially computed if $x$ is fixed.
We first describe an auxiliary algorithm A which finds a feasible solution
$x^A$ to the linear program $LP[I_0, I_1]$. The algorithm is divided into two phases.
The first phase consists in finding an optimal solution $x^{LP}$ to $LP[I_0, I_1]$ by
applying one of the known polynomial-time algorithms. The second phase of A
transforms $x^{LP}$ into $x^A$ through a series of “pipage” steps. We now describe the
general step. Set $x^A \leftarrow x^{LP}$. If at most one component of $x^A$ is fractional, stop.
Otherwise, choose two indices $i'$ and $i''$ such that $0 < x^A_{i'} < 1$ and $0 < x^A_{i''} < 1$.
Set $x^A_{i'}(\varepsilon) \leftarrow x^A_{i'} + \varepsilon$, $x^A_{i''}(\varepsilon) \leftarrow x^A_{i''} - \varepsilon c_{i'}/c_{i''}$ and $x^A_k(\varepsilon) \leftarrow x^A_k$ for all $k \ne i', i''$.
Find an endpoint $\varepsilon^*$ of the interval
\[
\Bigl[-\min\Bigl\{x_{i'},\ (1 - x_{i''}) \frac{c_{i''}}{c_{i'}}\Bigr\},\ \min\Bigl\{1 - x_{i'},\ x_{i''} \frac{c_{i''}}{c_{i'}}\Bigr\}\Bigr]
\]
such that $F(x(\varepsilon^*)) \ge F(x^A)$. Set $x^A \leftarrow x(\varepsilon^*)$. Go to the next step.
The correctness of the second phase follows from the fact that the vector $x(\varepsilon)$
is feasible for each $\varepsilon$ in the above interval, and from the earlier observation (see
Section 2) that the function $F(x(\varepsilon))$ is convex.
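The general step of the second phase can be sketched as below. This is a hedged illustration, not the authors' code: it assumes strictly positive costs, takes the two fractional indices from the caller, and uses a made-up instance; `coverage_F` and `knapsack_pipage_step` are hypothetical names.

```python
from math import prod


def coverage_F(x, sets, weights):
    """F(x) = sum_j w_j * (1 - prod_{i in S_j} (1 - x_i)), as in (36)."""
    return sum(w * (1.0 - prod(1.0 - x[i] for i in s)) for s, w in zip(sets, weights))


def knapsack_pipage_step(x, costs, sets, weights, i1, i2):
    """One step on fractional indices i1, i2; the total cost sum_i c_i x_i is kept fixed."""
    ratio = costs[i2] / costs[i1]            # assumes costs[i1] > 0 and costs[i2] > 0
    lo = -min(x[i1], (1 - x[i2]) * ratio)    # left endpoint of the feasible interval
    hi = min(1 - x[i1], x[i2] * ratio)       # right endpoint of the feasible interval
    best = None
    for eps in (lo, hi):
        y = list(x)
        y[i1] = x[i1] + eps
        y[i2] = x[i2] - eps * costs[i1] / costs[i2]
        if best is None or coverage_F(y, sets, weights) > coverage_F(best, sets, weights):
            best = y
    return best


# Hypothetical instance with three elements and three sets.
SETS = [(0, 1), (1, 2), (0, 2)]
WEIGHTS = [2.0, 3.0, 1.0]
COSTS = [2.0, 3.0, 4.0]
x = [0.5, 0.4, 0.0]
y = knapsack_pipage_step(x, COSTS, SETS, WEIGHTS, 0, 1)
print(y, coverage_F(x, SETS, WEIGHTS), coverage_F(y, SETS, WEIGHTS))
```

At either endpoint one of the two chosen components reaches 0 or 1, and convexity of F along the shift direction guarantees that the better endpoint does not lose objective value.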
Each “pipage” step of the second phase of A reduces the number of fractional
components of the current vector $x^A$ by at least one. So, finally, A outputs an
“almost integral” feasible vector $x^A$ having at most one fractional component.
By construction, $F(x^A) \ge F(x^{LP})$. By (9), it follows that
\[
F(x^A) \ge F(x^{LP}) \ge \sum_{j \in J_1} w_j + \bigl(1 - (1 - 1/k)^k\bigr) \sum_{j \in J \setminus J_1} w_j \min\Bigl\{1, \sum_{i \in S_j} x^{LP}_i\Bigr\} \qquad (37)
\]
where here and henceforth $J_1 = \{j : S_j \cap I_1 \ne \emptyset\}$.


We now present a description of the whole algorithm.
Input: An instance of the integer program (31)–(35);
Output: A feasible solution x to the instance;

Among all feasible solutions x to the instance satisfying Σ_{i∈I} x_i ≤ 3,
by complete enumeration find a solution x′ of maximum weight;
x ← x′;
for all I_1 ⊂ I such that |I_1| = 4 and Σ_{i∈I_1} c_i ≤ B do
begin
    I_0 ← ∅;
    t ← 0;
    while t = 0 do
    begin
        apply A to LP[I_0, I_1];
        if all x^A_i are integral
        then begin t ← 1; x̂ ← x^A end
        otherwise
        begin
            find i′ such that x^A_{i′} is fractional;
            x̂_{i′} ← 0;
            for all i ≠ i′ do x̂_i ← x^A_i;
            I_0 ← I_0 ∪ {i′}
        end
        if F(x̂) > F(x) then x ← x̂
    end
end
We now prove that the performance guarantee of the described algorithm is
indeed $1 - (1 - 1/k)^k$.
Let X ∗ be an optimal set of the given instance of MCKP. Denote by x∗ the
incidence vector of X ∗ . Recall that x∗ is an optimal solution to the equivalent
nonlinear program (36), (33)–(35). If |X ∗ | ≤ 3, then the output of the algorithm
is an optimal solution. So we may assume that |X ∗ | ≥ 4. W.l.o.g. we may also
assume that the set I is ordered in such a way that X ∗ = {1, . . . , |X ∗ |} and
for each i ∈ X ∗ , the element i covers the sets in F not covered by the elements
1, . . . , i − 1 of the maximum total weight among j ∈ X ∗ \ {1, . . . , i − 1}.
Consider now that iteration of the algorithm at which $I_1 = \{1, 2, 3, 4\}$. At
this iteration the algorithm runs through $q$ steps, $q \le n - 4$. At step $t$ it calls the
algorithm A to find an “almost integral” feasible solution $x^t = x^A$ to $IP[I_0^t, I_1]$
where $I_0^t = \{i_1, \dots, i_{t-1}\}$. If all components of $x^t$ are integral then $t = q$ and
the iteration ends. Otherwise the vector $x^t$ is replaced by the integral vector
$\hat{x}^t$ which coincides with $x^t$ in its integral components and equals zero in its
single fractional component indexed by $i_t$. If the weight of $\hat{x}^t$ exceeds that of the
record solution, the latter is updated. Then the algorithm sets $I_0^{t+1} = I_0^t \cup \{i_t\}$
and goes to the next step. Thus at the iteration the algorithm finds a series of
feasible integral solutions $\hat{x}^1, \dots, \hat{x}^q$ to (36), (33)–(35) or, equivalently, subsets
$\hat{X}_1, \dots, \hat{X}_q \subseteq I$ satisfying $\hat{X}_t \supseteq I_1$ and $\hat{X}_t \cap I_0^t = \emptyset$ where $I_0^t = \{i_1, \dots, i_{t-1}\}$,
$t = 1, \dots, q$.
Assume first that $X^* \cap I_0^q = \emptyset$. Then $x^*$ is an optimal solution to $IP[I_0^q, I_1]$.
Recall that $\hat{x}^q = x^q$ and the latter is obtained from the optimal solution to
$LP[I_0^q, I_1]$ by the “pipage” process. By (37) it follows that the solution $\hat{x}^q$, being
not better than the output solution, has weight within a factor of $1 - (1 - 1/k)^k$
of the optimum. Thus, in this case we are done.
Assume now that $X^* \cap I_0^q \ne \emptyset$ and let $I_0^{s+1}$ be the first set in the series
$I_0^1 = \emptyset, \dots, I_0^q$ having nonempty intersection with $X^*$. In other words, $i_s$ is the
first index in the series $i_1, \dots, i_{q-1}$ lying in $X^*$. We claim then that
\[
F(\hat{x}^s) \ge \bigl(1 - (1 - 1/k)^k\bigr) F(x^*). \qquad (38)
\]

Since the weight of $\hat{x}^s$ does not exceed the weight of the output vector of the
algorithm this would prove that $1 - (1 - 1/k)^k$ is the performance guarantee
of the algorithm.
In the following argument for brevity we shall use alternately the sets and
their incidence vectors as the arguments of $F$.
Indeed, the function $F$ can be also treated as the set function $F(X)$ defined on
all subsets $X \subseteq I$. It is well known that $F(X)$ is submodular and, consequently,
has the property that
\[
F(X \cup \{i\}) - F(X) \ge F(X \cup Y \cup \{i\}) - F(X \cup Y), \qquad (39)
\]
for all $X, Y \subseteq I$ and $i \in I$. Let $i \in I$ and $Y \supseteq I_1$. Then
\[
\begin{aligned}
\tfrac{1}{4} F(I_1) = \tfrac{1}{4} F(\{1, 2, 3, 4\})
&= \tfrac{1}{4} \Bigl( F(\{1, 2, 3, 4\}) - F(\{1, 2, 3\}) + F(\{1, 2, 3\}) - F(\{1, 2\}) \\
&\qquad + F(\{1, 2\}) - F(\{1\}) + F(\{1\}) - F(\emptyset) \Bigr) \\
&\ge \tfrac{1}{4} \Bigl( F(\{1, 2, 3, i\}) - F(\{1, 2, 3\}) + F(\{1, 2, i\}) - F(\{1, 2\}) \\
&\qquad + F(\{1, i\}) - F(\{1\}) + F(\{i\}) - F(\emptyset) \Bigr) && \text{(by the choice of } I_1\text{)} \\
&\ge F(Y \cup \{i\}) - F(Y) && \text{(by (39)).}
\end{aligned}
\]
Thus, we have proved that for any $Y \supseteq I_1$ and any $i \in I$,
\[
\tfrac{1}{4} F(I_1) \ge F(Y \cup \{i\}) - F(Y). \qquad (40)
\]

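Recall that the integral vector $\hat{x}^s$ is obtained from an “almost integral” vector
$x^s$ returned by the algorithm A, by replacing with zero its single fractional
component $x^s_{i_s}$. It follows that
\[
F(\hat{X}_s \cup \{i_s\}) \ge F(x^s). \qquad (41)
\]
Let $x^{LP}, z^{LP}$ denote an optimal solution to $LP[I_0^s, I_1]$. Using (37), (40) and (41)
we finally obtain
\[
\begin{aligned}
F(\hat{x}^s) &= F(\hat{X}_s) \\
&= F(\hat{X}_s \cup \{i_s\}) - \bigl(F(\hat{X}_s \cup \{i_s\}) - F(\hat{X}_s)\bigr) \\
&\ge F(\hat{X}_s \cup \{i_s\}) - \tfrac{1}{4} F(I_1) && \text{(by (40))} \\
&\ge F(x^s) - \tfrac{1}{4} F(I_1) && \text{(by (41))} \\
&= \sum_{j \in J_1} w_j + \sum_{j \in J \setminus J_1} w_j \Bigl(1 - \prod_{i \in S_j} (1 - x^s_i)\Bigr) - \tfrac{1}{4} \sum_{j \in J_1} w_j \\
&\ge \tfrac{3}{4} \sum_{j \in J_1} w_j + \bigl(1 - (1 - 1/k)^k\bigr) \sum_{j \in J \setminus J_1} w_j \min\Bigl\{1, \sum_{i \in S_j} x^{LP}_i\Bigr\} && \text{(by (37))} \\
&\ge \bigl(1 - (1 - 1/k)^k\bigr) \Bigl( \sum_{j \in J_1} w_j + \sum_{j \in J \setminus J_1} w_j \min\Bigl\{1, \sum_{i \in S_j} x^{LP}_i\Bigr\} \Bigr) \\
&\ge \bigl(1 - (1 - 1/k)^k\bigr) \Bigl( \sum_{j \in J_1} w_j + \sum_{j \in J \setminus J_1} w_j \min\Bigl\{1, \sum_{i \in S_j} x^*_i\Bigr\} \Bigr) && \text{(by the choice of } s\text{)} \\
&= \bigl(1 - (1 - 1/k)^k\bigr) F(x^*).
\end{aligned}
\]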

Acknowledgement
The authors wish to thank Refael Hassin for helpful comments and for pointing
out some references.

References
1. N. Alon and J. H. Spencer, The probabilistic method, John Wiley and Sons, New
York, 1992.
2. G. Cornuejols, M. L. Fisher and G. L. Nemhauser, Location of bank accounts to
optimize float: an analytic study of exact and approximate algorithms, Management
Science 23 (1977) 789–810.
3. G. Cornuejols, G. L. Nemhauser and L. A. Wolsey, Worst-case and probabilistic
analysis of algorithms for a location problem, Operations Research 28 (1980) 847–
858.
4. U. Feige, A threshold of ln n for approximating set cover, J. of ACM. 45 (1998)
634–652.
5. A. Frieze and M. Jerrum, Improved approximation algorithms for MAX k-CUT
and MAX BISECTION, Algorithmica 18 (1997) 67–81.
6. M. X. Goemans and D. P. Williamson, New 3/4-approximation algorithms for
MAX SAT, SIAM J. Discrete Math. 7 (1994) 656–666.
7. N. Guttmann-Beck and R. Hassin, Approximation algorithms for min-sum p-
clustering, Discrete Appl. Math. 89 (1998) 125–142.
8. N. Guttmann-Beck and R. Hassin, Approximation algorithms for minimum K-cut,
to appear in Algorithmica.
9. D. S. Hochbaum, Approximation algorithms for the set covering and vertex cover
problems, SIAM J. on Computing 11 (1982) 555–556.
10. D. S. Hochbaum, Approximating covering and packing problems: Set Cover, Vertex
Cover, Independent Set, and related problems, in: D. S. Hochbaum, ed.,
Approximation algorithms for NP-hard problems (PWS Publishing Company, New York,
1997) 94–143.
11. S. Khuller, A. Moss and J. Naor, The budgeted maximum coverage problem, to
appear in Inform. Proc. Letters.
12. S. Sahni, Approximate algorithms for the 0–1 knapsack problem, J. of ACM 22
(1975) 115–124.
13. L. A. Wolsey, Maximizing real-valued submodular functions: primal and dual
heuristics for location problems, Math. Oper. Res. 7 (1982) 410–425.
Solving the Convex Cost Integer Dual Network Flow
Problem

Ravindra K. Ahuja¹, Dorit S. Hochbaum², and James B. Orlin³

¹ Industrial & Systems Engineering,
University of Florida, Gainesville, FL 32611, USA
[email protected]
² Department of IE & OR and Haas School of Management,
University of California, Berkeley, CA 94720, USA
[email protected]
³ Sloan School of Management, MIT,
Cambridge, MA 02139, USA
[email protected]

Abstract. In this paper, we consider a convex optimization problem where the
objective function is the sum of separable convex functions, the constraints are
similar to those arising in the dual of a minimum cost flow problem (that is, of
the form π(i) - π(j) ≤ wij), and the variables are required to take integer values
within a specified range bounded by an integer U. Let m denote the number of
constraints and (n+m) denote the number of variables. We call this problem the
convex cost integer dual network flow problem. In this paper, we develop
network flow based algorithms to solve the convex cost integer dual network
flow problem efficiently. We show that using the Lagrangian relaxation
technique, the convex cost integer dual network flow problem can be reduced
to a convex cost primal network flow problem where each cost function is a
piecewise linear convex function with integer slopes. We next show that the
cost scaling algorithm for the minimum cost flow problem can be adapted to
solve the convex cost integer dual network flow problem in O(nm log(n²/m)
log(nU)) time. This algorithm improves the best currently available algorithm
and is also likely to yield algorithms with attractive empirical performance.

1 Introduction

In this paper, we consider the following integer programming problem with convex
costs:
\[
\begin{aligned}
\text{Minimize}\quad & \sum_{(i,j) \in Q} \bar{F}_{ij}(w_{ij}) + \sum_{i \in P} \bar{B}_i(\mu_i) && (1a)\\
\text{subject to}\quad & \mu_i - \mu_j = w_{ij} \quad \text{for all } (i, j) \in Q, && (1b)\\
& l_{ij} \le w_{ij} \le u_{ij} \quad \text{for all } (i, j) \in Q, && (1c)\\
& l_i \le \mu_i \le u_i \quad \text{for all } i \in P, && (1d)\\
& w_{ij} \text{ integer for all } (i, j) \in Q, \text{ and } \mu_i \text{ integer for all } i \in P, && (1e)
\end{aligned}
\]

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 31-44, 1999.
© Springer-Verlag Berlin Heidelberg 1999

where $P = \{1, 2, \dots, n\}$ is a set of $n$ numbers, $Q \subseteq P \times P$, and $l_i$, $u_i$, $l_{ij}$, and $u_{ij}$ are
specified integers. Let $m = |Q|$. In this problem, the $\mu_i$’s and $w_{ij}$’s are decision variables,
$\bar{F}_{ij}(w_{ij})$ is a convex function for every $(i, j) \in Q$, and $\bar{B}_i(\mu_i)$ is also a convex
function of $\mu_i$ for every $i \in P$. Let $U = \max[\max\{u_{ij} - l_{ij} : (i, j) \in Q\}, \max\{u_i - l_i : i \in P\}]$.
We call problem (1) the convex cost integer dual network flow problem (or simply
the dual network flow problem), because the constraints in (1b) are the constraints in
the dual of the minimum cost flow problem. (As a matter of fact, the constraints
arising in the dual of the minimum cost flow problem are of the form µi - µj ≤ wij. We
will show in Section 5 that the constraints of the form µi - µj ≤ wij can be transformed
to the constraints µi - µj = wij.) The dual network flow problem and its special cases
arise in various application settings, including multi-product multistage
production/inventory systems, project scheduling, machine-scheduling with
precedence constraints, dial-a-ride transit problems, inverse optimization, graph
drawing of rectangles, and isotone regression. We describe several applications of the
dual network flow problem in Section 6.
Since each function $\bar{F}_{ij}(w_{ij})$ is a convex function of the decision variable $w_{ij}$,
which is required to take integer values, we can replace $\bar{F}_{ij}(w_{ij})$ by a piecewise linear
convex function $F_{ij}(w_{ij})$ with integer breakpoints (that is, we set $F_{ij}(w_{ij}) = \bar{F}_{ij}(w_{ij})$ for
each integer $w_{ij}$ in the range $[l_{ij}, u_{ij}]$ and make $F_{ij}(w_{ij})$ linear between every two
consecutive integer values of $w_{ij}$). We perform the same transformation for each
function $\bar{B}_i(\mu_i)$ to obtain a piecewise linear convex function $B_i(\mu_i)$. Now consider
the following problem where we drop the integrality constraints on decision variables:
\[
\begin{aligned}
\text{Minimize}\quad & \sum_{(i,j) \in Q} F_{ij}(w_{ij}) + \sum_{i \in P} B_i(\mu_i) && (2a)\\
\text{subject to}\quad & \mu_i - \mu_j = w_{ij} \quad \text{for all } (i, j) \in Q, && (2b)\\
& l_{ij} \le w_{ij} \le u_{ij} \quad \text{for all } (i, j) \in Q, && (2c)\\
& l_i \le \mu_i \le u_i \quad \text{for all } i \in P. && (2d)
\end{aligned}
\]
It is well known that there always exists an optimal solution of (2) that is integer.
It is also well known that (2) can further be transformed to a linear programming
problem by introducing a separate variable for each linear segment in the
functions $F_{ij}(w_{ij})$ and $B_i(\mu_i)$. This gives us a linear programming problem with
O(nU + mU) decision variables, which can hence be solved in pseudo-polynomial time.
In this paper we develop an O(nm log(n²/m) log(nU)) algorithm to solve the dual
network flow problem. Rockafellar [1984] used a Lagrangian relaxation approach for
solving the dual network flow problem. He showed that the Lagrangian multiplier
problem is a minimum convex cost network flow problem. We use his transformation
as the starting point for our approach. We solve the Lagrangian multiplier problem by
using an adaptation of the preflow-push cost-scaling algorithm for the minimum cost
flow problem due to Goldberg and Tarjan [1987] and show that a solution to the
Lagrangian multiplier problem can be used to solve the dual network flow problem.
Our paper makes the following contributions: (i) We show that the minimum
convex cost network flow problem obtained through the Lagrangian relaxation
technique has piecewise linear convex cost functions with integer slopes. Integrality
of the slopes allows us to solve this problem using Goldberg and Tarjan’s cost scaling
algorithm for the minimum cost flow problem. (The minimum convex cost network
flow problem with general cost structure cannot be solved by Goldberg and Tarjan’s
cost scaling algorithm.) (ii) We use another special property of the cost function to
implement the cost scaling algorithm so that its running time is the same as for
solving the linear cost minimum cost flow problem. The running time of our
algorithm is O(nm log(n²/m) log(nU)), which is the best running time currently
available for the dual network flow problem. According to
McCormick [1998], the cancel and tighten algorithm due to Karzanov and
McCormick [1997] can be used to solve the convex cost integer dual network flow
problem in O(nm log n log(nU)) time. Our algorithm improves on this cancel and
tighten approach for all network densities.

2 Transformation to a Network Flow Problem

We use the Lagrangian relaxation technique of Rockafellar [1984] to solve the dual
network flow problem. He showed that the Lagrangian multiplier problem is a
minimum convex cost network flow problem. We review his approach here and
include some additional results. Our approach differs from his approach in a couple
of aspects. First of all, we are focused on finding an optimal integral solution.
Second, we are focused on developing formally efficient algorithms. We also include
this material in this paper because our notation and basic network terminology are
substantially different from that of Rockafellar.
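As per Rockafellar [1984], we dualize the constraints (2b) using the vector x,
obtaining the following Lagrangian subproblem:
\[
\begin{aligned}
\text{Minimize}\quad & \sum_{(i,j) \in Q} F_{ij}(w_{ij}) + \sum_{i \in P} B_i(\mu_i) - \sum_{(i,j) \in Q} (w_{ij} + \mu_j - \mu_i) x_{ij} && (3a)\\
\text{subject to}\quad & l_{ij} \le w_{ij} \le u_{ij} \quad \text{for all } (i, j) \in Q, && (3b)\\
& l_i \le \mu_i \le u_i \quad \text{for all } i \in P. && (3c)
\end{aligned}
\]
Notice that
\[
\sum_{(i,j) \in Q} (\mu_j - \mu_i) x_{ij} = \sum_{i \in P} \mu_i \Bigl( \sum_{\{j : (j,i) \in Q\}} x_{ji} - \sum_{\{j : (i,j) \in Q\}} x_{ij} \Bigr) = \sum_{i \in P} \mu_i x_{i0}, \qquad (4)
\]
where $x_{i0} = \sum_{\{j : (j,i) \in Q\}} x_{ji} - \sum_{\{j : (i,j) \in Q\}} x_{ij}$ for each $i \in P$. Substituting (4) into (3)
yields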
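\[
\begin{aligned}
L(x) = \min\quad & \sum_{(i,j) \in Q} \{F_{ij}(w_{ij}) - x_{ij} w_{ij}\} + \sum_{i \in P} \{B_i(\mu_i) - x_{i0} \mu_i\} && (5a)\\
\text{subject to}\quad & x_{i0} = \sum_{\{j : (j,i) \in Q\}} x_{ji} - \sum_{\{j : (i,j) \in Q\}} x_{ij} \quad \text{for all } i \in P, && (5b)\\
& l_{ij} \le w_{ij} \le u_{ij} \quad \text{for all } (i, j) \in Q, && (5c)\\
& l_i \le \mu_i \le u_i \quad \text{for all } i \in P. && (5d)
\end{aligned}
\]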
Since each Fij(wij) is a piecewise linear convex function with integer breakpoints,
Fij(wij) - xijwij is also a piecewise linear convex function with integer breakpoints.
Similarly, each Bi(µi) - xi0µi is also a piecewise linear convex function with integer
breakpoints. For a given value of x, each term in the first and second summation is a
convex separable function and can be optimized separately. Consequently, in order to
determine L(x), we need to determine the following:
(i) for each (i, j) ∈ Q the value of wij, lij ≤ wij ≤ uij, for which Fij(wij) - xijwij attains its
lowest value;
(ii) for each i ∈ P the value of µi, li ≤ µi ≤ ui, for which Bi(µi) - xi0µi attains its lowest value.
Any piecewise linear convex function with integer breakpoints defined over an
interval [α, β] with α and β integers always has an integer optimal solution.
Therefore, we can find the minimum value of Fij(wij) - xijwij by performing binary
search over the integers in the interval [lij, uij], which requires O(log U) time.
Similarly, we can determine the minimum value of Bi(µi) - xi0µi in O(log U) time.
Hence, we need a total of O((n + m) log U) time to determine L(x).
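The binary search mentioned above can be sketched as follows. This is a minimal illustration, assuming the cost is available as a Python callable that is convex on the integers of the given range; the names `argmin_convex` and `lagrangian_term` and the quadratic example cost are hypothetical.

```python
def argmin_convex(f, lo, hi):
    """Return an integer minimizer of a convex function f over lo..hi (inclusive)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) <= f(mid + 1):
            hi = mid          # the forward difference at mid is >= 0: a minimizer is <= mid
        else:
            lo = mid + 1      # f still decreases at mid: every minimizer is >= mid + 1
    return lo


def lagrangian_term(F, x, lower, upper):
    """min { F(w) - x*w : lower <= w <= upper, w integer } and a minimizing w."""
    w = argmin_convex(lambda v: F(v) - x * v, lower, upper)
    return F(w) - x * w, w


# Hypothetical convex cost F(w) = (w - 3)^2 on [0, 10] with multiplier x = 4.
value, w_star = lagrangian_term(lambda w: (w - 3) ** 2, 4, 0, 10)
print(value, w_star)
```

Each iteration halves the search range, so one term costs O(log U) evaluations of the cost function, and summing over all arcs and nodes gives the O((n + m) log U) bound stated above.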
We have thus shown that for a specified value of x, the Lagrangian subproblem can
be efficiently solved. We now focus on solving the Lagrangian Multiplier Problem,
which is to determine the value of x for which the Lagrangian subproblem attains the
highest objective function value. The Lagrangian multiplier problem is to determine
x* such that
L(x*) = max L(x) (6a)

subject to

xi0 = Σ{j:(j,i)∈Q} xji - Σ{j:(i,j)∈Q} xij for each i ∈ P, (6b)


where L(x) is defined as in (5a). The following well-known theorem for convex
optimization establishes a connection between the Lagrangian multiplier and the dual
network flow problem.
Theorem 1. Let x* be an optimal solution of the Lagrangian multiplier problem (6).
Then L(x*) equals the optimal objective function value of the dual network flow
problem (2).
Proof: The dual network flow problem (2) can be transformed to a linear
programming problem by introducing a separate variable for each linear segment in
the cost functions Fij(wij) and Bi(µi). Then, Theorem 1 follows from a well known
result in the theory of Lagrangian relaxation (see, for example, Ahuja, Magnanti and
Orlin [1993], Chapter 16). ♦
We will now focus on solving the Lagrangian multiplier problem (6). We will
show that the Lagrangian multiplier problem can be transformed into a network flow
problem with nonlinear arc costs. We define a directed network G = (N, A) with the
node set N and the arc set A. The node set N contains a node i for each element i ∈ P
and an extra node, node 0. The arc set A contains an arc (i, j) for each (i, j) ∈ Q and
an arc (i, 0) for each i ∈ P. For each (i, 0), i ∈ P, we let wi0 = µi, li0 = li, ui0 = ui, and
Fi0(wi0) = Bi(µi). Let us define the function Hij(xij) for each arc (i, j) in the following
manner:
Hij(xij) = min{Fij(wij) - xijwij : lij ≤ wij ≤ uij}. (7)

In terms of this notation, the Lagrangian multiplier problem (6) can be restated as

Maximize Σ(i, j)∈A Hij(xij) (8a)

subject to

Σ{j:(j,i)∈A} xji - Σ{j:(i,j)∈A} xij =0 for all i ∈ N, (8b)


which is a network flow problem with nonlinear cost functions and with no upper or
lower bounds on arc flows. In the next section, we will study properties of the
function Hij(xij).

3 Properties of the Function Hij(xij)

In this section, we show that for any arc (i, j), Hij(xij) is a piecewise linear concave
function with at most (uij – lij + 1) linear segments and the slopes of these segments
are integers between –uij and –lij. We shall use these properties in the next section to
develop an efficient cost scaling algorithm to solve the Lagrangian multiplier
problem. Let bij(θ) = Fij(θ +1) – Fij(θ) for lij ≤ θ ≤ uij – 1, θ integral. The fact that
Fij(θ) is a convex function of θ can be used to prove the following results:
Property 1. bij(θ) ≤ bij(θ + 1) ≤ bij(θ + 2) ≤ …… .
The following theorem obtains the expression for Hij(xij):
Theorem 2. The function Hij(xij) = min{Fij(wij) – wijxij : lij ≤ wij ≤ uij and wij integer}
is a piecewise linear concave function of xij, and is described in the following manner:

\[
H_{ij}(x_{ij}) =
\begin{cases}
F_{ij}(l_{ij}) - l_{ij}\,x_{ij} & \text{if } -\infty \le x_{ij} \le b_{ij}(l_{ij}) \\
F_{ij}(l_{ij}+1) - (l_{ij}+1)\,x_{ij} & \text{if } b_{ij}(l_{ij}) \le x_{ij} \le b_{ij}(l_{ij}+1) \\
\quad\vdots \\
F_{ij}(\theta) - \theta\,x_{ij} & \text{if } b_{ij}(\theta-1) \le x_{ij} \le b_{ij}(\theta) \\
\quad\vdots \\
F_{ij}(u_{ij}) - u_{ij}\,x_{ij} & \text{if } b_{ij}(u_{ij}-1) \le x_{ij} \le \infty
\end{cases}
\tag{9}
\]

Proof Sketch: The function Hij(xij) is defined as Hij(xij) = min{Fij(wij) - xijwij : lij ≤
wij ≤ uij and integer}, or Hij(xij) = min{Fij(θ) - θxij : lij ≤ θ ≤ uij and integer}. The
function Hij(xij) is the lower envelope of the (uij – lij +1) linear functions of xij, and
hence is a piecewise linear concave function. Using Property 1 it can be shown that
each line segment Fij(θ) - θxij is represented in the lower envelope Hij(xij), and it
starts at bij(θ-1) and ends at bij(θ). ♦
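The description of Hij(xij) in Theorem 2 can be checked numerically on small instances. The sketch below assumes a convex Fij given as a Python callable on the integers lij, ..., uij; the function names are hypothetical. One function evaluates the lower envelope by brute force, the other locates the active segment through the breakpoints bij(θ) = Fij(θ + 1) – Fij(θ).

```python
def H_bruteforce(F, l, u, x):
    """Lower envelope of the lines F(w) - w*x, w = l, ..., u."""
    return min(F(w) - w * x for w in range(l, u + 1))


def H_breakpoints(F, l, u, x):
    """Evaluate H via Theorem 2: use the segment theta with b(theta-1) <= x <= b(theta)."""
    theta = l
    # b(theta) = F(theta+1) - F(theta) is nondecreasing by Property 1, so the
    # first theta with b(theta) >= x identifies the active segment.
    while theta < u and F(theta + 1) - F(theta) < x:
        theta += 1
    return F(theta) - theta * x
```

Both functions should agree for every x when F is convex; the second one makes the slope –θ of the active segment explicit.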
We have so far transformed the Lagrangian multiplier problem into a network flow
problem with a concave objective to be maximized. We can alternatively restate this problem as

Minimize Σ(i, j)∈A Cij(xij) (10a)

subject to

Σ{j:(j,i)∈A} xji - Σ{j:(i,j)∈A} xij =0 for all i ∈ N, (10b)


where Cij(xij) = -Hij(xij). Hence the slopes of the linear segments in Cij(xij) are the negatives
of the corresponding slopes in Hij(xij). Since Hij(xij) is a piecewise linear concave
function, Cij(xij) is a piecewise linear convex function. The problem (10) is a convex
cost flow problem without lower and upper bounds on arc flows. However, the
algorithm described in Section 4 requires lower and upper bounds on arc flows. It can
be shown that there exists an optimal solution of (10) such that xij satisfies –M ≤ xij ≤
M for each (i, j) ∈ A, where M is a sufficiently large number. We shall henceforth
assume that the flow x also satisfies the following constraint:
-M ≤ xij ≤ M for all (i, j) ∈ A. (10c)

4 Cost Scaling Algorithm

The problem (10) is a convex cost flow problem in which the cost associated with the
flow on an arc (i, j) is a piecewise linear convex function containing at most U linear
segments. It is well known that such problems can be transformed to a minimum cost
flow problem by introducing an arc for every linear segment in the cost function (see,
for example, Ahuja, Magnanti and Orlin [1993, Chapter 14]). In the case that the
number of breakpoints for each cost function is bounded by a constant, one can solve
the resulting minimum cost flow problem efficiently; otherwise this approach yields
an exponential-time algorithm. The algorithm by Hochbaum and Shanthikumar [1990]
solves this problem in time polynomial in terms of the log of the number of
breakpoints. In this paper, we develop another algorithm for solving the minimum
convex cost network flow problem which has better time complexity.
Our algorithm is an adaptation of the cost scaling algorithm due to Goldberg and
Tarjan [1987]. The running time of our algorithm for the dual network flow problem
is O(nm log(n^2/m) log(nU)), which is comparable to the time taken by the Goldberg-
Tarjan (GT) algorithm for the minimum cost flow problem. We refer to our algorithm
as the convex cost scaling algorithm to differentiate it from GT’s linear cost scaling
algorithm. Our subsequent discussion requires some familiarity with the linear cost
scaling algorithm and we refer the readers to the paper by Goldberg and Tarjan [1987]
or the book of Ahuja, Magnanti and Orlin [1993] for a description of this algorithm.
The cost scaling algorithm maintains a pseudoflow at each step. A pseudoflow x is
any function x: A → R satisfying -M ≤ xij ≤ M for all (i, j) ∈ A. For any pseudoflow
x, we define the imbalance of node i as

e(i) = Σ{j:(j,i)∈A} xji – Σ{j:(i,j)∈A} xij for all i ∈ N. (11)
If e(i) > 0 for some node i, we refer to node i as an excess node and refer to e(i) as
the excess of node i. We refer to a pseudoflow x with e(i) = 0 for all i ∈ N as a flow.
The cost scaling algorithm also maintains a value π(i) for each node i ∈ N. We refer
to the vector π as a vector of node potentials.
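As a small illustration of definition (11), the following sketch computes node imbalances for a pseudoflow stored as a dictionary keyed by arcs. The data layout and the function names are assumptions made for this example only.

```python
def imbalances(nodes, flow):
    """Return e(i) = inflow - outflow for every node, as in (11)."""
    e = {i: 0 for i in nodes}
    for (i, j), x_ij in flow.items():
        e[j] += x_ij  # arc (i, j) brings x_ij units into node j
        e[i] -= x_ij  # and takes x_ij units out of node i
    return e


def excess_nodes(nodes, flow):
    """Nodes with strictly positive imbalance."""
    return [i for i, v in imbalances(nodes, flow).items() if v > 0]
```

A pseudoflow is a flow exactly when every value returned by `imbalances` is zero. Negative arc flows are allowed, in line with the bounds –M ≤ xij ≤ M.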
The cost scaling algorithm proceeds by constructing and manipulating the residual
network G(x) defined as follows with respect to a pseudoflow x. For each (i, j) ∈ A,
the residual network G(x) contains two arcs (i, j) and (j, i). The cost of sending yij
units of flow in arc (i, j) is Cij(xij + yij) – Cij(xij); that is, it is the incremental cost of
increasing the flow in (i, j) by yij units. Notice that yij can be positive as well as
negative. We set the cost of arc (i, j) in the residual network as cij = Cij(xij + 1) –
Cij(xij) and set the cost of arc (j, i) as cji = Cij(xij – 1) – Cij(xij). Thus cij is the
marginal cost of increasing the flow in arc (i, j) and cji is the marginal cost of
decreasing flow in arc (i, j). By Theorem 2, both cij and cji are integer valued. The
following result is well known for convex cost flows (see for example, Rockafellar
[1984]).
Theorem 3. A flow x is an optimal flow of (10) if and only if G(x) contains no
negative cost directed cycle.
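The optimality test of Theorem 3 can be sketched with a Bellman-Ford pass over the residual network. This is only an illustration under simplifying assumptions: flows are integers, residual costs are the unit marginal costs cij and cji defined above, the bounds (10c) are ignored, and each Cij is supplied as a Python callable. The function name is hypothetical.

```python
def has_negative_cycle(nodes, arcs, C, x):
    """Detect a negative cost directed cycle in the residual network G(x)."""
    residual = []
    for (i, j) in arcs:
        c = C[(i, j)]
        residual.append((i, j, c(x[(i, j)] + 1) - c(x[(i, j)])))  # cost c_ij
        residual.append((j, i, c(x[(i, j)] - 1) - c(x[(i, j)])))  # cost c_ji
    # Starting every label at 0 acts like a virtual source joined to all nodes.
    dist = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for (u, v, cost) in residual:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            return False  # labels converged, so no negative cycle exists
    return True  # still improving after |N| passes
```

For a balanced x, a return value of False corresponds to the optimality condition of Theorem 3 under the stated assumptions.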
For a given residual network G(x) and a set of node potentials π, we define the
reduced cost of an arc (i, j) as cπij = cij - π(i) + π(j). Just like its linear counterpart,
the convex cost scaling algorithm relies on the concept of approximate optimality. A
flow or a pseudoflow x is said to be ε-optimal for some ε ≥ 0 if for some node
potentials π, the pair (x, π) satisfies the following ε-optimality conditions:
cπij ≥ - ε for every arc (i, j) in G(x). (12)

We call an arc (i, j) admissible if - ε ≤ cπij < 0. We shall use the following result
in our algorithm. This result uses the fact that each arc in G(x) has an integer cost.
Lemma 1. (Bertsekas [1979]). If ε < 1/n, then any ε-optimal feasible flow of (10) is
an optimal flow.
We next define the residual capacity rij of an arc (i, j) in G(x) with respect to a flow
x and a set of node potentials π. The residual capacity rij of an arc (i, j) in G(x) is the
additional flow we need to send on the arc (i, j) so that the reduced costs of both the
arcs (i, j) and (j, i) become nonnegative, that is, if one sends rij units of flow in (i, j)
and then recomputes the residual network, then cπij ≥ 0 and cπji ≥ 0. Alternatively, the
residual capacity rij of an arc (i, j) in G(x) is the flow yij that minimizes (Cij(xij + yij):
yij ≥ 0). The following lemma obtains a formula for rij.

Lemma 2. Let θ = π(i) - π(j). Suppose that (i, j) is an arc with cπij < 0. Then, rij =
bij(θ) - xij, if θ ≤ uij – 1; otherwise rij = M - xij.
Proof Sketch: If θ ≤ uij – 1, then sending rij units of flow makes the flow on this arc
equal to bij(θ). Using Theorem 2, it can be shown that both the reduced costs cπij and
cπji are then nonnegative. If θ > uij – 1, then we augment M - xij units of flow, which
eliminates the arc (i, j) from the residual network and makes cπji nonnegative. ♦

The convex cost scaling algorithm treats ε as a parameter and iteratively obtains ε-
optimal flows for successively smaller values of ε. Initially, ε = U. The algorithm
performs cost scaling phases by repeatedly applying an improve-approximation
procedure that transforms an ε-optimal flow into an ε/2-optimal flow. After log(nU)
+ 1 scaling phases, ε < 1/n and the algorithm terminates with an optimal flow. Figure
1 gives the algorithm description.

algorithm cost scaling;


begin
π := 0; x := 0; ε := U;
while ε ≥ 1/n do
begin
improve-approximation(ε, x, π);
ε := ε/2;
end;
x is an optimal flow for (10);
end;

(a)
procedure improve-approximation(ε, x, π);
begin
for each arc (i, j) in G(x) do
if cπij < 0 then send rij amount of flow on arc (i, j);
while the network contains an excess node do
begin
select an excess node i;
if G(x) contains an admissible arc (i, j) then
push δ := min{e(i), rij} units of flow from node i to node j
else π(i) := π(i) + ε/2;
end;
end;

(b)

Fig. 1. The convex cost scaling algorithm.
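The outer loop of Fig. 1(a) can be traced in isolation to see how many scaling phases it executes. The sketch below is only a counter: it omits improve-approximation entirely and uses exact rational arithmetic so that the test ε ≥ 1/n is not affected by rounding. The function name is hypothetical.

```python
from fractions import Fraction


def scaling_phases(n, U):
    """Count the iterations of the while-loop in Fig. 1(a); return (phases, final eps)."""
    eps = Fraction(U)
    phases = 0
    while eps >= Fraction(1, n):
        phases += 1  # one call to improve-approximation would happen here
        eps /= 2
    return phases, eps
```

The loop runs for k = 0, 1, ... as long as 2^k ≤ nU, so it performs ⌊log2(nU)⌋ + 1 phases and stops with ε < 1/n, which is the condition required by Lemma 1.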

In the algorithm, we call the operation of sending δ = min{e(i), rij} units of flow
over the admissible arc (i, j) a push. If δ = rij, we refer to the push as saturating, and as
nonsaturating otherwise. We also refer to the operation of increasing the potential of
node i from π(i) to π(i) + ε/2 as a relabel operation. The purpose of a relabel
operation is to create new admissible arcs emanating from this node.
Goldberg and Tarjan had observed that their linear cost scaling algorithm could
be extended to treat convex cost flows; moreover, the bounds on the number of pushes
per scaling phase were the same as for the linear cost flow problem. Tarjan [1998]
communicated this to one of the co-authors of this paper. The algorithm gives an ε–
optimal flow at the end of the ε-scaling phase. However, for the general convex cost
functions, the algorithm is not guaranteed to give an optimal flow no matter how
small ε is. But in our case, the cost functions have integer slopes, and by Lemma 1 the algorithm
gives an optimal flow if ε < 1/n. The running time analysis of the cost scaling algorithm
for the convex cost case does not appear in the published literature; hence we give a
brief sketch of this analysis here.
Theorem 4. The convex cost scaling algorithm correctly solves the dual network flow
problem in O(nm log(n^2/m) log(nU)) time.
Proof Sketch: The running time of O(nm log(n^2/m) log(nU)) for the linear cost
scaling algorithm relies on the following two critical facts: (i) in a scaling phase any
node is relabeled O(n) times; and (ii) in between two consecutive saturating pushes of
any arc (i, j) node i must be relabeled at least once. These two facts hold for the
convex cost scaling algorithm too. The bound on the number of non-saturating pushes
also works as before. The dynamic tree data structure also works as it does for the
linear cost case. All we need for it to work is that the residual capacities should
behave well in the following sense: if the residual capacity is rij, and if we send k
units of flow in the arc, then the residual capacity reduces to rij – k. This result follows
from the manner in which we define the residual capacities. Hence the convex cost
scaling algorithm runs in O(nm log(n^2/m)) time per scaling phase and in O(nm
log(n^2/m) log(nU)) time overall. ♦
The cost scaling algorithm upon termination gives an optimal flow x* and the
optimal node potentials π*. Both x* and π* may be non-integer. Since the
objective function in the convex cost flow problem (10) is piecewise linear with integer
slopes, it follows that there always exist integer optimal node potentials π. To determine
those, we construct G(x*) and solve a shortest path problem to determine the shortest
path distance d(i) from node 0 to every other node i ∈ N. Then π(i) = -d(i) for each
i ∈ N gives an integer optimal set of node potentials for the problem (10). Now recall
that Cij(xij) = -Hij(xij) for each (i, j) ∈ A. This implies that µ(i) = -π(i) = d(i) for
each i ∈ N gives optimal dual variables for (8), and these µ(i) together with
wij = µ(i) – µ(j) for each (i, j) ∈ A give an optimal solution of the dual network flow
problem (2).

5 Generalizations of the Dual Network Flow Problem

In our formulation of the dual network flow problem we have assumed that the
constraints µi - µj = wij are in the equality form. We will show in this section that the
constraints of the forms µi - µj ≤ wij can be transformed to the equality form; hence
there is no loss of generality by restricting the constraints to the equality form.
Suppose that we wish to solve the following problem:
Minimize Σ(i, j)∈Q Fij(wij) + Σi∈P Bi(µi) (13a)
subject to
µi - µj ≤ wij for all (i, j) ∈ Q, (13b)
lij ≤ wij ≤ uij for all (i, j) ∈ Q, (13c)
li ≤ µi ≤ ui for all i ∈ P. (13d)
Let w*ij denote the value of wij for which Fij(wij) is minimum. In case there are
multiple values for which Fij(wij) is minimum, choose the minimum such value. Let
us define the function Eij(wij) in the following manner:

\[
E_{ij}(w_{ij}) =
\begin{cases}
F_{ij}(w^*_{ij}) & \text{if } w_{ij} \le w^*_{ij} \\
F_{ij}(w_{ij}) & \text{if } w_{ij} > w^*_{ij}
\end{cases}
\tag{14}
\]

Now consider the following problem:


Minimize Σ(i, j)∈Q Eij(wij) + Σi∈P Bi(µi) (15a)
subject to
µi - µj = wij for all (i, j) ∈ Q, (15b)
lij ≤ wij ≤ uij for all (i, j) ∈ Q, (15c)
li ≤ µi ≤ ui for all i ∈ P. (15d)

The following lemma establishes a relationship between optimal solutions of (13) and
(15).
Lemma 3. For every optimal solution (w, µ) of (13), there is an optimal solution
(ŵ, µ) of (15) of the same cost, and the converse also holds.
Proof Sketch: Consider an optimal solution (w, µ) of (13). We can show that the
solution (ŵ, µ) with ŵij = min{µi – µj, wij} is an optimal solution of (15) with
the same cost. Similarly, it can be shown that if (ŵ, µ̂) is an optimal solution of (15),
then the solution (w, µ̂) constructed as wij = max{µ̂i – µ̂j, ŵij} is an optimal
solution of (13). ♦
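The construction of Eij in (14) and the idea behind Lemma 3 can be illustrated on a single arc. The sketch assumes a convex Fij given as a Python callable on the integers lij, ..., uij; `make_E` and `best_cost_inequality_form` are hypothetical names used only here.

```python
def make_E(F, l, u):
    """Return (E, w_star) for a convex F on the integers l..u, following (14)."""
    # min() keeps the first minimizer it meets, so ties resolve to the smallest w.
    w_star = min(range(l, u + 1), key=F)

    def E(w):
        return F(w_star) if w <= w_star else F(w)

    return E, w_star


def best_cost_inequality_form(F, l, u, diff):
    """Cheapest F(w) over integers w in [l, u] with diff <= w, as in (13b)-(13c)."""
    return min(F(w) for w in range(max(l, diff), u + 1))
```

For a fixed difference µi – µj inside [lij, uij], the cheapest feasible wij in the inequality form costs exactly Eij(µi – µj), which is why replacing Fij by Eij lets the constraint be written as an equality.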

6 Applications of the Dual Network Flow Problem


The dual network flow problem and its special cases arise in many application
settings. Roundy [1986] formulates a lot-sizing problem in a multi-product, multi-
stage, production/inventory system as a dual network flow problem. Boros and
Shamir [1991] describe an application of the dual network flow problem in solving a
quadratic cost machine scheduling problem. Several multi-facility location problems
have constraints of the form (1b)–(1d) (see, for example, Ahuja, Magnanti and Orlin
[1993, Chapter 19]). The convex cost version of these location problems will be dual
network flow problems. We describe next in detail three applications of dual network
flow problems, some of which we have encountered in our previous research.

Application 1: Inverse Spanning Tree Problem with Convex Costs

Consider an undirected network G = (N, A) with the node set N and the arc set A. Let
n = |N| and m = |A|. We assume that N = {1, 2, ... , n} and A = {a1, a2, ..., am}. Let cj
denote the cost of the arc aj. In the inverse spanning tree problem we are given a
spanning tree T0 of G which may or may not be a minimum spanning tree of G and
we wish to perturb the arc cost vector c to d so that T0 becomes a minimum spanning
tree with d as the cost vector and Σ_{j=1}^{m} Fj(dj – cj) is minimum, where each Fj(dj –
cj) is a convex function of dj. Sokkalingam, Ahuja and Orlin [1999], and Ahuja and
Orlin [1998] have studied special cases of the inverse spanning tree problem with cost
functions Σ_{j=1}^{m} Fj|dj – cj| and max{|dj – cj| : 1 ≤ j ≤ m}.
We assume without any loss of generality that T0 = {a1, a2, ... , an-1}. We refer to
the arcs in T0 as tree arcs and the arcs not in T0 as nontree arcs. In the given
spanning tree T0, there is a unique path between any two nodes; we denote by W[aj]
the set of tree arcs contained between the two endpoints of the arc aj. It is well known
(see, for example, Ahuja, Magnanti and Orlin [1993]) that T0 is a minimum spanning
tree with respect to the arc cost vector d if and only if
di ≤ dj for each ai ∈ W[aj] and for each j = n, n+1, ..., m. (17)
We can convert the inequalities in (17) into equations by introducing nonnegative
slack variables wij. Let P = {1, 2, … , m} and Q = {(i, j) : ai ∈ W[aj], j = n, n+1, … , m}.
Then, in this notation, the inverse spanning tree problem can be formulated as the
following optimization problem:
Minimize Σi∈P Fi(di – ci) (18a)
subject to
dj – di = wij for each (i, j) ∈ Q, (18b)
wij ≥ 0 for each (i, j) ∈ Q. (18c)
This problem is clearly an instance of the dual network flow problem.
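Condition (17) is easy to test directly for a candidate cost vector d. The sketch below assumes nodes numbered 1, ..., n, arcs stored in a dictionary that maps an arc index to its two endpoints, and a set of tree arc indices; these conventions and the function names are hypothetical.

```python
def tree_path_arcs(n_nodes, tree_arcs, s, t):
    """Indices of the tree arcs on the unique s-t path of the spanning tree."""
    adj = {v: [] for v in range(1, n_nodes + 1)}
    for idx, (p, q) in tree_arcs.items():
        adj[p].append((q, idx))
        adj[q].append((p, idx))
    parent = {s: (None, None)}  # node -> (predecessor, arc used to reach it)
    stack = [s]
    while stack:
        v = stack.pop()
        for w, idx in adj[v]:
            if w not in parent:
                parent[w] = (v, idx)
                stack.append(w)
    path, v = [], t
    while parent[v][0] is not None:
        path.append(parent[v][1])
        v = parent[v][0]
    return path


def satisfies_condition_17(n_nodes, arcs, tree_idx, d):
    """True if d_i <= d_j for every nontree arc a_j and every tree arc a_i in W[a_j]."""
    tree_arcs = {k: arcs[k] for k in tree_idx}
    for j, (p, q) in arcs.items():
        if j in tree_idx:
            continue
        if any(d[i] > d[j] for i in tree_path_arcs(n_nodes, tree_arcs, p, q)):
            return False
    return True
```

The pairs (i, j) examined inside `satisfies_condition_17` are exactly the tree arc and nontree arc pairs that give rise to the constraints of (18).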

Application 2: Time-Cost Tradeoff Problem in Project Scheduling

Project scheduling is an important management tool used regularly in numerous
industries. We will show that the time-cost tradeoff problem, an important problem in
project scheduling, can be formulated as a dual network flow problem. We refer the
reader to the book by Elmaghraby [1978] for a comprehensive discussion of the
applications of network models in project scheduling.
We can envision a project as a directed graph G = (N, A) where the arc set A
represents the jobs of the project and the node set N represents events, denoting the
beginning and ending of jobs. Different jobs in the network have precedence
relations. We require that all jobs directed into any node must be completed before
any job directed out of the node begins. We let node s designate the beginning of the
project and node t designate the ending of the project.
Normally, in project scheduling we assume that the job completion time is fixed;
however, we here consider a more general setting where job completion times are
variable. Let tij denote the completion time of job (i, j). We allow tij to vary in the
range [αij, βij] and associate a convex cost function Fij(tij) which captures the cost of
completing the job for different completion times in the range [αij, βij]. We also have
another set of decision variables υi’s denoting event times; the variable υi denotes the
time when jobs emanating from node i can begin. The time-cost tradeoff problem is
to determine the minimum cost of completing the project in a specified time period T.
This problem can be mathematically stated as follows:
Minimize Σ(i,j)∈A Fij(tij) (19a)
subject to
υt - υs ≤ T (19b)
υj - υi ≥ tij for all (i, j) ∈ A, (19c)
αij ≤ tij ≤ βij for all (i, j) ∈ A. (19d)
Observe that constraint (19b) captures the fact that the project must be
completed within the time period T, and the constraints (19c) imply that for every arc
(i, j) ∈ A, event j can occur only at least tij time units after the occurrence of event i.
With the inequality constraints handled as in Section 5, the formulation (19) is an
instance of the dual network flow problem.
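For very small projects, formulation (19) can be explored by exhaustive search, which helps to see what the constraints mean. The sketch below is a brute-force illustration and not the algorithm of this paper. It assumes integer job durations, a topological order of the events whose first and last entries are the start and end events, and cost functions given as Python callables; all names are hypothetical.

```python
from itertools import product


def project_length(order, jobs, t):
    """Earliest time of the last event when each job (i, j) takes t[(i, j)]."""
    start = {v: 0 for v in order}
    for v in order:  # events are processed in topological order
        for (i, j) in jobs:
            if i == v:
                start[j] = max(start[j], start[i] + t[(i, j)])
    return start[order[-1]]


def cheapest_schedule(order, jobs, bounds, cost, T):
    """Enumerate integer durations within bounds; return (cost, durations) or None."""
    best = None
    ranges = [range(bounds[job][0], bounds[job][1] + 1) for job in jobs]
    for durations in product(*ranges):
        t = dict(zip(jobs, durations))
        if project_length(order, jobs, t) <= T:  # constraints (19b) and (19c)
            c = sum(cost[job](t[job]) for job in jobs)
            if best is None or c < best[0]:
                best = (c, t)
    return best
```

Setting every event time to its earliest value is enough to test feasibility, because any event times satisfying (19c) are at least as large as the longest-path values.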
Application 3: Isotonic Regression Problem

The isotonic regression problem can be defined as follows. Given a set A = {a1, a2,
… , an} ∈ R^n, find X = {x1, x2, … , xn} ∈ R^n so as to
Minimize Σ_{j=1}^{n} Bj(xj – aj) (21a)
subject to
xj ≤ xj+1 for all j = 1, 2, … , n-1, (21b)
lj ≤ xj ≤ uj for all j = 1, 2, … , n, (21c)
where Bj(xj – aj) is a convex function for every j, 1 ≤ j ≤ n. The isotonic regression
problem arises in statistics, production planning, and inventory control (see for
example, Barlow et al. [1972] and Robertson et al. [1988]). As an application of the
isotonic regression, consider a fuel tank where fuel is being consumed at a slow rate
and measurements of the fuel tank are being taken at different points in time.
Suppose these measurements are a1, a2, …, an. Due to the errors in the measurements,
these numbers may not be in non-increasing order despite the fact that the true
amounts of fuel remaining in the tank are non-increasing. However, we need to
determine these measurements as accurately as possible. One possible way to
accomplish this could be to perturb these numbers to x1 ≥ x2 ≥ … ≥ xn so that the cost
of perturbation given by Σ_{j=1}^{n} Bj(xj – aj) is minimum, where each Bj(xj – aj) is a
convex function. We can transform this problem to the isotonic regression problem
by replacing xj’s by their negatives.
If we define P = {1, 2, … , n} and Q = {(j, j+1) : j = 1, 2, … , n-1} and require that
xj must be integer, then the isotonic regression problem can be cast as a dual network
flow problem. However, it is a very special case of the dual network flow problem
and more efficient algorithms can be developed to solve it compared to the dual
network flow problem. Ahuja and Orlin [1998] recently developed an O(n log U)
algorithm to solve the isotonic regression problem.
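A simple dynamic program solves small integer instances of (21) and makes the structure of the problem concrete. It is a sketch only, with running time proportional to n times the number of candidate values, and it is not the O(n log U) algorithm cited above. For brevity it assumes one common range [lo, hi] for all variables instead of the individual bounds lj and uj; `B` is a list of convex penalty functions and all names are hypothetical.

```python
def isotonic_regression(a, B, lo, hi):
    """Minimize sum_j B[j](x_j - a_j) over integers lo <= x_1 <= ... <= x_n <= hi."""
    values = list(range(lo, hi + 1))
    # best[k]: cheapest cost of the prefix seen so far whose last entry is values[k]
    best = [B[0](v - a[0]) for v in values]
    choices = []
    for j in range(1, len(a)):
        prefix_min, prefix_arg = [], []
        m, m_idx = float("inf"), -1
        for k, c in enumerate(best):
            if c < m:
                m, m_idx = c, k
            prefix_min.append(m)   # cheapest predecessor with value <= values[k]
            prefix_arg.append(m_idx)
        choices.append(prefix_arg)
        best = [prefix_min[k] + B[j](values[k] - a[j]) for k in range(len(values))]
    k = min(range(len(values)), key=best.__getitem__)
    cost, x = best[k], [values[k]]
    for prefix_arg in reversed(choices):  # trace the optimal choices backwards
        k = prefix_arg[k]
        x.append(values[k])
    x.reverse()
    return cost, x
```

The non-increasing variant from the fuel tank example can be handled by negating the data and the candidate range, as described above.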

References

Ahuja, R. K., T. L. Magnanti, and J. B. Orlin. 1993. Network Flows: Theory,
Algorithms, and Applications, Prentice Hall, NJ.
Ahuja, R. K., and J. B. Orlin. 1997. A fast algorithm for the bipartite node weighted
matching on path graphs with application to the inverse spanning tree problem.
Working Paper, Sloan School of Management, MIT, Cambridge, MA.
Barlow, R. E., D. J. Bartholomew, D. J. Bremner, and H. D. Brunk. 1972. Statistical
Inference under Order Restrictions. Wiley, New York.
Bertsekas, D. P. 1979. A distributed algorithm for the assignment problem. Working
Paper, Laboratory for Information and Decision Sciences, MIT, Cambridge, MA.
Boros, E., and R. Shamir. 1991. Convex separable minimization over partial order
constraints. Report No. RRR 27-91, RUTCOR, Rutgers University, New
Brunswick, NJ.
Elmaghraby, S. E. 1978. Activity Networks: Project Planning and Control by Network
Models. Wiley-Interscience, New York.
Hochbaum, D.S., and J.G. Shanthikumar. 1990. Convex separable optimization is not
much harder than linear optimization. Journal of the ACM 37, 843-862.
Karzanov, A. V., and S. T. McCormick. 1997. Polynomial methods for separable
convex optimization in unimodular linear spaces with applications. SIAM Journal
on Computing 4, 1245-1275.
McCormick, S. T. 1998. Personal communications.
Robertson, T., F. T. Wright, and R. L. Dykstra. 1988. Order Restricted Statistical
Inference. John Wiley & Sons, New York.
Rockafellar, R. T. 1984. Network Flows and Monotropic Optimization. Wiley-
Interscience, New York.
Roundy, R. 1986. A 98% effective lot-sizing rule for a multi-product, multi-stage,
production-/inventory system. Mathematics of Operations Research 11, 699-727.
Sokkalingam, P. T., R. K. Ahuja, and J. B. Orlin. 1999. Solving inverse spanning
tree problems through network flow techniques. Operations Research, March-April
Issue.
Tarjan, R. E. 1998. Personal Communications.
Some Structural and Algorithmic Properties of
the Maximum Feasible Subsystem Problem

Edoardo Amaldi¹, Marc E. Pfetsch², and Leslie E. Trotter, Jr.¹

¹ School of OR&IE, Rhodes Hall, Cornell University, Ithaca, NY 14853, USA
{amaldi,ltrotter}@cs.cornell.edu
² Department of Mathematics, MA 7–1, TU Berlin, Germany
[email protected]

Abstract. We consider the problem Max FS: For a given infeasible
linear system, determine a largest feasible subsystem. This problem has
interesting applications in linear programming as well as in fields such
as machine learning and statistical discriminant analysis. Max FS is
NP-hard and also difficult to approximate. In this paper we examine
structural and algorithmic properties of Max FS and of irreducible
infeasible subsystems (IISs), which are intrinsically related, since one must
delete at least one constraint from each IIS to attain feasibility. In
particular, we establish: (i) that finding a smallest cardinality IIS is NP-hard
as well as very difficult to approximate; (ii) a new simplex decomposition
characterization of IISs; (iii) that for a given clutter, realizability as the
IIS family for an infeasible linear system subsumes the Steinitz problem
for polytopes; (iv) some results on the feasible subsystem polytope whose
vertices are incidence vectors of feasible subsystems of a given infeasible
system.

1 Introduction
We consider the following combinatorial problem related to infeasible linear
inequality systems.
Max FS: Given an infeasible system Σ : {Ax ≤ b} with A ∈ ℝ^{m×n} and
b ∈ ℝ^m, find a feasible subsystem containing as many inequalities as possible.
This problem has several interesting applications in various fields such as
statistical discriminant analysis, machine learning and linear programming (see
[2, 26, 22] and the references therein). In the latter case, it arises when the
LP formulation phase yields infeasible models and one wishes to diagnose and
resolve infeasibility by deleting as few constraints as possible, which is the
complementary version of Max FS [19, 27, 12]. In most situations this cannot be
done by inspection and the need for effective algorithmic tools has become more
acute with the considerable increase in model size. In fact, Max FS turns out to
be NP-hard [10] and it does not admit a polynomial time approximation scheme
unless P = NP [3]. The above complementary version, in which the goal is to
delete as few inequalities as possible in order to achieve feasibility, is equivalent
to Max FS when solved to optimality, but is much harder to approximate than
Max FS [5, 4].

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 45–59, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Not surprisingly, minimal infeasible subsystems, first discussed in Motzkin’s
thesis [25], play a key role in the study of Max FS. A subsystem Σ′ of Σ is
an irreducible infeasible subsystem (IIS) when Σ′ is infeasible, but every proper
subsystem of Σ′ is feasible. In order to help the modeler resolve infeasibility of
large linear inequality systems, attention was first devoted to the problem of
identifying IISs with a small and possibly minimum number of inequalities [19];
see [14, 13] for several heuristics, now available in commercial solvers such as
CPLEX and MINOS [11]. Clearly, when there are many overlapping IISs, this
does not provide enough information to repair the original system. To achieve
feasibility, one must delete an inequality from each IIS. If all IISs were known,
the complementary version of Max FS could be formulated as the following
covering problem [17].
Min IIS Cover: Given an infeasible system Σ : {Ax ≤ b} with A ∈ ℝ^{m×n} and
b ∈ ℝ^m and the set C of all its IISs, minimize Σ_{i=1}^{m} yi subject to Σ_{i∈C} yi ≥ 1
∀ C ∈ C, yi ∈ {0, 1}, 1 ≤ i ≤ m.
Note that |C| can grow exponentially with m and n [10].
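The relation between IISs and Min IIS Cover can be tried out on toy systems in a single variable, where feasibility reduces to comparing bounds. The sketch below enumerates subsets exhaustively, so it is exponential in m and meant only as an illustration. Each inequality a·x ≤ b is stored as an integer pair (a, b); all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def feasible_1d(rows):
    """Decide feasibility of a system of inequalities a*x <= b in one real variable x."""
    lo, hi = None, None
    for a, b in rows:
        if a == 0:
            if b < 0:
                return False      # 0 <= b is violated
        elif a > 0:
            bound = Fraction(b, a)  # x <= b/a
            hi = bound if hi is None else min(hi, bound)
        else:
            bound = Fraction(b, a)  # x >= b/a
            lo = bound if lo is None else max(lo, bound)
    return lo is None or hi is None or lo <= hi


def all_iis(system):
    """All irreducible infeasible subsystems, as tuples of row indices."""
    found = []
    for size in range(1, len(system) + 1):
        for S in combinations(range(len(system)), size):
            if any(set(T) <= set(S) for T in found):
                continue  # S contains a smaller infeasible subsystem
            if not feasible_1d([system[i] for i in S]):
                found.append(S)
    return found


def min_iis_cover(system):
    """A smallest set of rows that meets every IIS."""
    iis = all_iis(system)
    for size in range(len(system) + 1):
        for R in combinations(range(len(system)), size):
            if all(set(R) & set(C) for C in iis):
                return set(R)
```

Removing the rows returned by `min_iis_cover` leaves a feasible subsystem of maximum size, which is the complementary view of Max FS described above.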
An exact algorithm based on a partial cover formulation is proposed in [26, 27]
and heuristics are described in [22, 12]; a collection of infeasible LPs is maintained
in the Netlib library. In [29, 30] the class of hypergraphs representing the IISs of
infeasible systems is studied and it is shown that in some special cases Max FS
and Min IIS Cover can be solved in polynomial time in the number of IISs.
In this paper we investigate some structural and algorithmic properties of
IISs and of the polytope defined by the convex hull of incidence vectors of
feasible subsystems of a given infeasible system. It is worth noting that, although
Max FS with 0–1 variables can be easily shown to admit as a special case
the graphical problem of finding a maximum independent node set, it has a
different structure when the variables are real-valued. Recent work on problems
related to Max FS and IISs includes, for instance, Håstad’s breakthrough [20]
which bridges the approximability gap for Max FS on GF(p), as well as the
investigation of the problems of determining minimum or minimal witnesses of
infeasibility in network flows [1].
Below we denote the ith row of the matrix A ∈ ℝ^{m×n} by ai ∈ ℝ^n, 1 ≤ i ≤ m;
for S ⊆ [m] := {1, . . . , m}, AS denotes the |S|×n matrix consisting of the rows of
A indexed by S. By identifying the ith inequality of the system Σ (i.e., ai x ≤ bi)
with index i itself, [m] may also refer to Σ.

2 Irreducible Infeasible Subsystems

First we briefly recall the main known structural results regarding IISs. For
notational simplicity, we use the same A and b, with A ∈ ℝ^{m×n} and b ∈ ℝ^m,
to denote either the original system Σ or one of its IISs.
The known characterizations of IISs are based on the following version of
the Farkas Lemma. For any system Σ : {Ax ≤ b}, either Ax ≤ b is feasible or
∃ y ∈ ℝ^m, y ≥ 0, such that yA = 0 and yb < 0, but not both.
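Checking a proposed infeasibility certificate is a matter of three tests: y ≥ 0, yA = 0 and yb < 0. The following sketch assumes A is given as a list of rows with integer or rational entries; the function name is hypothetical.

```python
def is_farkas_certificate(A, b, y):
    """True if y >= 0, yA = 0 and yb < 0, which proves that Ax <= b is infeasible."""
    m, n = len(A), len(A[0])
    if any(yi < 0 for yi in y):
        return False
    yA = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
    yb = sum(y[i] * b[i] for i in range(m))
    return all(v == 0 for v in yA) and yb < 0
```

For instance, the system x ≤ 0, –x ≤ –1 is certified infeasible by y = (1, 1): adding the two inequalities gives 0 ≤ –1.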
Theorem 1 (Motzkin [25], Fan [16]). The system Σ : {Ax ≤ b} with A, b
as above is an IIS if and only if rank(A) = m − 1 and ∃ y ∈ ℝ^m, y > 0, such
that yA = 0 and yb < 0.

The rank condition obviously implies that m ≤ n + 1.


Now let Σ : {Ax ≤ b} be an infeasible system which is not necessarily an IIS.
The following result relates the IISs of Σ to the vertices of a given alternative
polyhedron. Recall that the support of a vector is the set of indices of its nonzero
components.

Theorem 2 (Gleeson and Ryan [17]). Let Σ : {Ax ≤ b} be an infeasible
system with A, b as above. Then the IISs of Σ are in one-to-one correspondence
with the supports of the vertices of the polyhedron

P := {y ∈ ℝ^m | yA = 0, yb ≤ −1, y ≥ 0} .

The inequality in the alternative system can obviously be replaced by the
equation yb = −1. Note that, by using the transformation into Karmarkar’s standard
form, any polytope can be expressed as {y ∈ ℝ^m | yA = 0, y𝟙 = 1, y ≥ 0} for
an appropriate matrix A, where 𝟙 denotes the all-ones vector. Theorem 2 can also
be stated in terms of rays [27] and elementary vectors [18].
Definition 1. An elementary vector of a subspace L ⊆ ℝ^m is a nonzero vector
y that has a minimal number of nonzero components (when expressed with respect
to the standard basis of ℝ^m). In other words, if x ∈ L and supp(x) ⊂ supp(y)
then x = 0, where supp(y) denotes the support of y.

Corollary 1 (Greenberg [18]). Let Σ : Ax ≤ b, A ∈ ℝ^{m×n}, b ∈ ℝ^m be an
infeasible system. Then S ⊆ [m] corresponds to an IIS of Σ if and only if there
exists an elementary vector y in the subspace L := {y ∈ ℝ^m | yA = 0} with
yb < 0, y ≥ 0 such that S = supp(y).

The following result establishes an interesting geometric property of the
polyhedra obtained by deleting any inequality from an IIS.

Theorem 3 (Motzkin [25]). Let Σ : {Ax ≤ b} be an IIS and let σ ∈ Σ be an
arbitrary inequality of Σ. Then the polyhedron corresponding to Σ \ σ, i.e., the
subsystem obtained by removal of σ, is an affine convex cone.

2.1 Minimum Cardinality IISs

We now determine the complexity status of the following problem for which
heuristics have been proposed in [14, 13, 26, 27].
Min IIS: Given an infeasible system Σ : {Ax ≤ b} as above, find a minimum
cardinality IIS.
To settle the issue left open in [19, 14, 27], we prove that Min IIS is not only
NP-hard to solve optimally but also hard to approximate. Note that, where
DTIME(T(m)) denotes the class of problems solvable in time T(m), the
assumption NP ⊈ DTIME(m^{polylog m}) is stronger than NP ⊈ P, but the inclusion
NP ⊆ DTIME(m^{polylog m}) is also believed to be extremely unlikely. Results that
hold under such an assumption are often referred to as almost NP-hard.
Theorem 4. Assuming P ≠ NP, no polynomial time algorithm is guaranteed
to yield an IIS whose cardinality is at most c times larger than the minimum
one, for any constant c ≥ 1. Assuming NP ⊈ DTIME(m^{polylog m}), Min IIS
cannot be approximated within a factor 2^{log^{1−ε} m}, for any ε > 0, where m is the
number of inequalities.
Proof. We proceed by reduction from the following problem: Given a feasible
linear system Dz = d, with D ∈ ℝ^{m′×n′} and d ∈ ℝ^{m′}, find a solution z
satisfying all equations with as few nonzero components as possible. In [4] we establish
that it is (almost) NP-hard to approximate this problem within the same type
of factors, but with m replaced by n, the number of variables. Note that the
above nonconstant factor grows faster than any polylogarithmic function, but
slower than any polynomial one.
For each instance of the latter problem with an optimal solution containing
$s$ nonzero components, we construct a particular instance of Min IIS with a
minimum cardinality IIS containing $s + 1$ inequalities. Given any instance $(D, d)$,
consider the system
$$\begin{pmatrix} D & -D & -d \end{pmatrix} \begin{pmatrix} z^+ \\ z^- \\ z_0 \end{pmatrix} = 0, \quad \begin{pmatrix} 0^t & 0^t & -1 \end{pmatrix} \begin{pmatrix} z^+ \\ z^- \\ z_0 \end{pmatrix} < 0, \quad z^+, z^- \ge 0, \ z_0 \ge 0. \tag{1}$$
Since the strict inequality implies $z_0 > 0$, the system $Dz = d$ has a solution with
$s$ nonzero components if and only if (1) has one with $s + 1$ nonzero components.
Now, applying Corollary 1, (1) has such a solution if and only if the system
$$\begin{pmatrix} D^t \\ -D^t \\ -d^t \end{pmatrix} x \le \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix} \tag{2}$$
has an IIS of cardinality $s + 1$. Since (2) is the alternative system of (1), the
Farkas Lemma implies that exactly one of these is feasible; as (1) is feasible, (2)
must be infeasible. Thus (2) is a particular instance of Min IIS with $m = 2n' + 1$
inequalities in $n = m'$ variables.
Given that the polynomial time reduction preserves the objective function
modulo an additive unit constant, we obtain the same type of non-approximability
factors for Min IIS. □
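The construction of system (2) in the proof is purely mechanical. A minimal sketch, assuming the data $(D, d)$ are given as nested Python lists; the function name `build_min_iis_instance` and the example data are illustrative only.

```python
def build_min_iis_instance(D, d):
    """Rows of system (2): D^t x <= 0, -D^t x <= 0, -d^t x <= -1."""
    m_prime, n_prime = len(D), len(D[0])
    # Transpose D: one row of D^t per variable z_j of the original problem.
    Dt = [[D[i][j] for i in range(m_prime)] for j in range(n_prime)]
    A = Dt + [[-v for v in row] for row in Dt] + [[-v for v in d]]
    b = [0] * (2 * n_prime) + [-1]
    return A, b

# Hypothetical instance with m' = 2 equations and n' = 3 variables.
D = [[1, 1, 0],
     [0, 1, 1]]
d = [1, 1]
A, b = build_min_iis_instance(D, d)
print(len(A), len(A[0]))
```

Here $z = (0, 1, 0)$ solves $Dz = d$ with $s = 1$ nonzero component. Correspondingly, rows 2 and 7 of the constructed system read $x_1 + x_2 \le 0$ and $-x_1 - x_2 \le -1$, an IIS of cardinality $s + 1 = 2$.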
Note that for the similar (but not directly related) problem of determining
minimum witnesses of infeasibility in network flows, NP-hardness is established
in [1].

2.2 IIS Simplex Decomposition

Here we provide a new geometric characterization of IISs. For $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$,
let $A_i := A_{[m] \setminus \{i\}}$ and $b_i := b_{[m] \setminus \{i\}}$ denote the $(m-1) \times n$ submatrix and,
respectively, the $(m-1)$-dimensional vector obtained by removing the $i$th row
of $A$ and $i$th component of $b$. The following result strengthens the initial part of
Theorem 1.

Lemma 1. For any IIS {Ax ≤ b}, Ai has linearly independent rows, ∀ i; i.e.,
rank(Ai ) = m − 1.

Proof. According to Theorem 1, there exists a $y > 0$ such that $yA = 0$ and
$yb = -1$ (by scaling $yb < 0$). Suppose some proper subset of rows is linearly
dependent; i.e., $\exists z \ne 0$ such that $zA = 0$, $zb \ge 0$ (without loss of generality) and
some $z_k = 0$.
If some $z_i > 0$, consider $(y - \varepsilon z)A = 0$, $(y - \varepsilon z)b \le -1$, where $\varepsilon =
\min\{y_i / z_i \mid 1 \le i \le m,\ z_i > 0\}$ (and $y$ is as above). Then $y - \varepsilon z \ge 0$,
the $i$th component of $y - \varepsilon z$ is 0 for an index $i$ attaining the minimum, and the
Farkas Lemma contradicts minimality of the system ($y - \varepsilon z$ fulfills the requirements).
If all $z_i \le 0$, then $-z \ge 0$, $-zA = 0$ and $-zb \le 0$; so setting $y = -z$ in
the Farkas Lemma leads to a contradiction of minimality, provided $-zb < 0$. If
$-zb = 0$, then $(y + \varepsilon z)A = 0$, $(y + \varepsilon z)b = -1$, with $\varepsilon = \min\{y_i / (-z_i) \mid 1 \le i \le
m,\ -z_i > 0\}$ leads to a contradiction as above. □

It is interesting to note that this lemma together with Theorem 1 imply that an
infeasible system {Ax ≤ b} is an IIS if and only if rank(Ai ) = m − 1 for all i,
1 ≤ i ≤ m.
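The rank condition in this remark is easy to test numerically. The sketch below is an illustration under stated assumptions: it uses exact Gaussian elimination over the rationals, the names `rank` and `satisfies_rank_condition` are hypothetical, and it checks only the rank condition, so infeasibility of the system has to be established separately.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    cols = len(M[0]) if M else 0
    for c in range(cols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [u - f * v for u, v in zip(M[i], M[r])]
        r += 1
    return r

def satisfies_rank_condition(A):
    """True if deleting any single row of A leaves m - 1 independent rows."""
    m = len(A)
    return all(rank(A[:i] + A[i + 1:]) == m - 1 for i in range(m))

# Hypothetical IIS: x1 <= 0, x2 <= 0, -x1 - x2 <= -1.
iis_rows = [[1, 0], [0, 1], [-1, -1]]
# Hypothetical infeasible system that is not an IIS: x1 <= 0, -x1 <= -1, x2 <= 0.
non_iis_rows = [[1, 0], [-1, 0], [0, 1]]
print(satisfies_rank_condition(iis_rows), satisfies_rank_condition(non_iis_rows))
```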
We thus have the following simplex decomposition result for IISs.

Theorem 5. The system $\{Ax \le b\}$ is an IIS if and only if $\{x \in \mathbb{R}^n \mid Ax \ge b\} = L + Q$,
where $L$ is the lineality subspace $\{x \in \mathbb{R}^n \mid Ax = 0\}$ and $Q$ is
an $(m-1)$-simplex with vertices determined by maximal proper subsystems of
$\{Ax = b\}$; namely, each vertex of $Q$ is a solution for a subsystem $\{A_i x = b_i\}$,
$1 \le i \le m$.

Proof. (⇒) To see feasibility of $\{Ax \ge b\}$, delete constraint $a_i x \ge b_i$ to get the
equality system $\{A_i x = b_i\}$. By Lemma 1, this system has a solution, say $x^i$,
and we must have $a_i x^i > b_i$, else $x^i$ satisfies $\{Ax \le b\}$. Applying the polyhedral
resolution theorem, $P := \{x \in \mathbb{R}^n \mid Ax \ge b\} \ne \emptyset$ can be written as $P = K + Q$,
where $K = \{x \in \mathbb{R}^n \mid Ax \ge 0\}$ is its recession cone and $Q \subseteq P$ is a polytope
generated by representatives of its minimal nonempty faces.
If $x$ satisfies $Ax \ge 0$ and $a_i x > 0$ for row $a_i$ then $x^i - \varepsilon x$ satisfies $A(x^i - \varepsilon x) \le b$
for sufficiently large $\varepsilon > 0$ and the original system $\{Ax \le b\}$ would be feasible.
Therefore we must have that each $a_i x = 0$ for $1 \le i \le m$, $x \in K$ and we get
that in fact $K = L := \{x \in \mathbb{R}^n \mid Ax = 0\}$.
For Q, minimal nonempty faces of P are given by changing a maximal set of
inequalities into equalities (all but one relation). Thus the vectors xi obtained
by solving $\{A_i x = b_i\}$ determine $Q$; i.e., $Q = \mathrm{conv}(\{x^1, \dots, x^m\})$. For $A \in
\mathbb{R}^{m \times n}$, $Q$ is the $(m-1)$-simplex generated by the $m$ points $\{x^1, \dots, x^m\}$. To
see that the $x^i$ generate an $(m-1)$-simplex, we must only show that they
are affinely independent. But if $x^i$ is affinely dependent on the other $x^j$, then
$x^i = \sum_{j \ne i} \lambda_j x^j$ with $\sum_{j \ne i} \lambda_j = 1$. Thus we have $a_i x^i > b_i$, but also
$a_i x^i = a_i (\sum_{j \ne i} \lambda_j x^j) = \sum_{j \ne i} \lambda_j (a_i x^j) = \sum_{j \ne i} \lambda_j b_i = b_i$, which is a contradiction.
(⇐) If the system $\{Ax \le b\}$ is infeasible, then the minimality is obvious, because
the simplex conditions on $Q$ imply that every proper subsystem has an equality
solution.
To show that $\{Ax \le b\}$ is infeasible, assume for the sake of contradiction
that $\hat{x} \in \{x \in \mathbb{R}^n \mid Ax \le b\} \ne \emptyset$ and $\hat{x}$ satisfies a maximal number of these
relations at equality. Let $a_i \hat{x} < b_i$ and note that for $x^i$ defined as above, we have
$a_i x^i > b_i$. Thus we can set $\lambda = (a_i x^i - b_i) / (a_i x^i - a_i \hat{x})$ and have $0 < \lambda < 1$,
so that $a_i (\lambda \hat{x} + (1 - \lambda) x^i) = b_i$. But then at $\lambda \hat{x} + (1 - \lambda) x^i$ more relations of
$\{Ax \le b\}$ hold at equality than at $\hat{x}$, contradicting the choice of $\hat{x}$. □

According to the above proof, we can take the $x^i$'s as the representatives
of the minimal nonempty faces of $\{Ax \ge b\}$ that lie in $L^\perp$; i.e., $Q \subset L^\perp$. By
Lemma 1, we know that $\{x \in \mathbb{R}^n \mid A_i x = b_i\} = x^i + L$, where $L$ is the lineality
space of the linear system $\{Ax \ge b\}$.
It is worth observing that Theorem 5 handles the following special cases.
If m = 1, then A has only one row, say {Ax ≤ b} = {0x ≤ −1}. Thus L = {x ∈
IRn | 0x = 0} = IRn and {x ∈ IRn | 0x ≥ −1} = IRn + {0} = L + Q = L.
If m = n + 1, then A has n + 1 rows. Assuming A to be of full column rank,
L = {x ∈ IRn | Ax = 0} = {0} and Q = conv({x1 , . . . , xn+1 }) is an n–simplex
and {x ∈ IRn | Ax ≥ b} = {0} + Q.
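For the case $m = n + 1$ with $A$ of full column rank, the vertices of $Q$ can be computed directly by solving the $n \times n$ systems $A_i x = b_i$. The following is a minimal sketch under that assumption, using exact Gauss-Jordan elimination; the names `solve` and `simplex_vertices` and the example system are hypothetical.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square nonsingular system M x = rhs by Gauss-Jordan elimination."""
    n = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if T[i][c] != 0)
        T[c], T[pivot] = T[pivot], T[c]
        T[c] = [v / T[c][c] for v in T[c]]
        for i in range(n):
            if i != c and T[i][c] != 0:
                f = T[i][c]
                T[i] = [u - f * v for u, v in zip(T[i], T[c])]
    return [T[i][n] for i in range(n)]

def simplex_vertices(A, b):
    """For an IIS with m = n + 1 rows, return x^1, ..., x^m with A_i x^i = b_i."""
    m = len(A)
    return [solve(A[:i] + A[i + 1:], b[:i] + b[i + 1:]) for i in range(m)]

# Hypothetical IIS in the plane: x1 <= 0, x2 <= 0, -x1 - x2 <= -1.
A = [[1, 0], [0, 1], [-1, -1]]
b = [0, 0, -1]
vertices = simplex_vertices(A, b)
print(vertices)
```

For this example the three points are $(1, 0)$, $(0, 1)$ and $(0, 0)$, the vertices of the triangle $\{x \mid Ax \ge b\}$, and each $x^i$ violates exactly the $i$th inequality of $\{Ax \le b\}$.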

3 IIS-Hypergraphs

Consider for any infeasible system the following hypergraph.

Definition 2. Given an infeasible system $\Sigma : \{Ax \le b\}$ with $A \in \mathbb{R}^{m \times n}$ and
$b \in \mathbb{R}^m$, $H = (V, E)$ is the IIS-hypergraph of $\Sigma$ if
i. the nodes in $V$ are in one-to-one correspondence with the inequalities of $\Sigma$,
ii. the hyperedges in $E$ are in one-to-one correspondence with the IISs of $\Sigma$ and
each hyperedge contains precisely the nodes associated to the inequalities
contained in the corresponding IIS.

Investigations on the structure of IIS-hypergraphs began with [29, 30]. In
particular, it was shown that IIS-hypergraphs do not share many properties
with other known classes of hypergraphs generalizing bipartite graphs. Indeed,
IIS-hypergraphs (with no trivial IISs of cardinality 1) just turn out to be bi-
colourable; i.e., their nodes can be partitioned into two subsets so that neither
subset contains a hyperedge. Note, however, that there is more structure for
IIS-hypergraphs than simply bicolourability, as there will generally exist many
different bipartitions into two feasible subsystems [29, 18].
In IIS-hypergraph terminology, Min IIS Cover amounts to finding a minimum
cardinality transversal, i.e., a subset of nodes having nonempty intersection
with every hyperedge. The special structure of the IIS-hypergraphs accounts for
the fact that the greedy algorithm is guaranteed to find a minimum transversal
for those with nondegenerate alternative polyhedra [30] (a subclass of uniform
hypergraphs) while the problem is NP-hard even for simple graphs, i.e., for
2-uniform hypergraphs.
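Both notions, minimum transversal and bicolourability, can be illustrated by exhaustive search on a very small hypergraph. The sketch below is not the greedy algorithm mentioned above; it simply enumerates subsets. The hyperedges $\{1, 4\}$ and $\{1, 2, 3\}$ are the IISs of the hypothetical system $x_1 \le 0$, $x_2 \le 0$, $-x_1 - x_2 \le -1$, $-x_1 \le -1$, and the function names are illustrative.

```python
from itertools import combinations

def minimum_transversal(nodes, hyperedges):
    """Smallest node set meeting every hyperedge (exhaustive search)."""
    for size in range(len(nodes) + 1):
        for T in combinations(sorted(nodes), size):
            if all(set(T) & e for e in hyperedges):
                return set(T)
    return None

def bicolouring(nodes, hyperedges):
    """Return (V1, V2) such that neither part contains a hyperedge, or None."""
    nodes = sorted(nodes)
    for size in range(len(nodes) + 1):
        for part in combinations(nodes, size):
            V1 = set(part)
            V2 = set(nodes) - V1
            if not any(e <= V1 or e <= V2 for e in hyperedges):
                return V1, V2
    return None

nodes = {1, 2, 3, 4}
hyperedges = [{1, 4}, {1, 2, 3}]
print(minimum_transversal(nodes, hyperedges), bicolouring(nodes, hyperedges))
```

Removing inequality 1 destroys both IISs, and the bipartition $\{1\}$, $\{2, 3, 4\}$ splits the system into two feasible subsystems.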
Here we address the fundamental problem of recognizing IIS-hypergraphs.
For the definitions of a poset and (face) lattice see, e.g., [31].
Let $E$ be a finite set and $F$ a clutter on $E$. The poset $L(F) = (S, \le)$ can be
constructed as follows. $S \subseteq 2^E$ and the relation "$\le$" on $S$ is the set inclusion.
A subset $U$ of $E$ is in $S$ if $U$ is the intersection of elements of $F$. The element
$\hat{1} := \bigcup \{T \mid T \in F\}$ is also in $S$. Notice that the zero $\hat{0} := \bigcap \{T \mid T \in F\}$ is always in $S$
and is possibly the empty set. Then $L(F)$ is a lattice with the meet defined by
intersection. Note that the size of $L(F)$ can be exponential in the size of $F$.
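The construction of $L(F)$ can be sketched in a few lines. This illustration enumerates all subfamilies of $F$, so it is exponential in $|F|$, in line with the remark on the size of $L(F)$; the function name `clutter_lattice` and the example clutter are hypothetical.

```python
from itertools import combinations

def clutter_lattice(F):
    """Elements of L(F): all intersections of members of F, plus their union."""
    F = [frozenset(s) for s in F]
    elements = {frozenset().union(*F)}  # the top element, union of all members
    for k in range(1, len(F) + 1):
        for sub in combinations(F, k):
            elements.add(frozenset.intersection(*sub))
    return elements

# Hypothetical clutter on E = {1, 2, 3, 4}.
F = [{1, 2}, {2, 3}, {3, 4}]
lattice = clutter_lattice(F)
for element in sorted(lattice, key=lambda s: (len(s), sorted(s))):
    print(sorted(element))
```

For this clutter the zero element is the empty set, since $\{1, 2\} \cap \{3, 4\} = \emptyset$.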
The face lattice of a polytope $P$ is its set of faces, ordered by inclusion, with
the meet defined by intersection. It is well known (see, e.g., [31]) that the face
lattice of $P$ has a rank function $r(\cdot)$ satisfying $r(F) = \dim(F) + 1$ for any face $F$
(the empty face has dimension $-1$ and rank 0), and is both atomic and coatomic.
Two polytopes with isomorphic face lattices are combinatorially equivalent.
Let $R$ denote either $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{A}$ (the real algebraic numbers over $\mathbb{Q}$) or $\mathbb{R}$.
IIS Realizability problem for $R$: Given a clutter $C$ over a finite ground set
of cardinality $m$, does there exist an infeasible linear system $\{Ax \le b\}$, with
$A \in R^{m \times n}$ and $b \in R^m$, such that the sets in $C$ index the IISs of this system?
In the above definition, infeasibility is meant with respect to $\mathbb{R}$ and $n$ is free. If
such a system exists, the clutter $C$ is IIS-realizable. The IIS Realizability problem
is obviously equivalent to that of recognizing IIS-hypergraphs. In the sequel we
also consider the restricted version of the IIS Realizability problem in which the
right-hand side of the linear system is fixed, namely, in which $b = -\mathbb{1}$.
Steinitz problem for $R$: Given a lattice $L$, does there exist a polytope $P \subset \mathbb{R}^d$
with vertices in $R^d$ such that the face lattice of $P$ is isomorphic to $L$?
If the answer is affirmative, $L$ is realizable as a polytope. In this case $d$ can be
assumed to be the dimension of $L$. $P$ can be given either as a (complete) list of
vertices or facets. See [9] for related material.

Theorem 6. The IIS Realizability problem is at least as hard as the Steinitz
problem.

Proof. We show that for any instance of the Steinitz problem we can construct
in polynomial time a special instance of the above-mentioned restricted IIS Re-
alizability problem such that the answer to the first instance is affirmative if
and only if the answer to the second instance is affirmative. Since face lattices
of polytopes need to be ranked as well as atomic and these properties can be
checked in polynomial time, we focus attention on this type of lattices.
Given an arbitrary instance of the Steinitz problem defined by a ranked
atomic lattice $L$, we construct the following special instance of the restricted
IIS Realizability problem with $b = -\mathbb{1}$. Suppose $L$ contains $k$ atoms and $m$
coatoms. Label arbitrarily the coatoms with the sets $\{1\}, \dots, \{m\}$ and the atoms
with the sets $C_1, \dots, C_k$, where $C_i$ includes all the elements in the labels of the
coatoms that contain the corresponding atom. Define $C = \{C_1, \dots, C_k\}$ and
$\overline{C} := \{\overline{C}_1, \dots, \overline{C}_k\}$, where $\overline{C}_i = \{1, \dots, m\} \setminus C_i$. Thus the arbitrary choices of
the labeling just correspond to a permutation of coordinates and hence do not
change the structure.
If the original instance of the Steinitz problem has a positive answer, there
exists a polytope $P$ such that $L$ is the face lattice of $P$. According to the remark
following Theorem 2, this polytope can be expressed in the special form
$P = \{y \in \mathbb{R}^{m'} \mid yA = 0,\ y\mathbb{1} = 1,\ y \ge 0\}$, with $A \in \mathbb{R}^{m' \times n}$ and suitable $m'$, $n$.
Hence $m' > m$, the number of facets, and $\{Ax \le -\mathbb{1}\}$ is the infeasible system
associated to $P$.
Since the face lattice of a polytope is coatomic, each face of $P$ can be identified
with the set of facets it is contained in. If these sets corresponding to all faces
are ordered by set inclusion, one obtains a lattice $L'$ which is anti-isomorphic to
the face lattice of $P$. The meet is defined by intersection. It is easy to see that
the lattice $L(C)$ is isomorphic to $L'$. The atoms correspond to the facets of $P$
and the coatoms to its vertices.
By construction, each set $C_i$ (atom of $L$) corresponds to a vertex $v^i$ of $P$. All
facets of $P$ are defined by inequalities of the form $y_i \ge 0$. Up to relabeling of the
coatoms in the definition of $C$, the facet defined by $y_i \ge 0$ can be identified with
$\{i\}$. Thus $C_i = \{j \in \{1, \dots, m\} \mid v^i_j = 0\}$ and $\overline{C}_i$ is the support of the vertex $v^i$.
By Theorem 2, each $\overline{C}_i$ corresponds to an IIS of the associated infeasible system
$\{Ax \le -\mathbb{1}\}$ and hence $\overline{C}$ is IIS-realizable with the restricted type of right-hand
side and with a polytope as alternative polyhedron.
Conversely, suppose that the corresponding instance of the restricted IIS
Realizability problem with $b = -\mathbb{1}$ defined by $\overline{C}$ has a positive answer and
consider the alternative polyhedron $P = \{y \in \mathbb{R}^m \mid yA = 0,\ y\mathbb{1} = 1,\ y \ge 0\}$
with $A \in \mathbb{R}^{m \times n}$. As seen above, each $\overline{C}_i$ corresponds to the support of a
vertex of $P$ and each $C_i$ corresponds to the set of facets that this vertex lies
on, i.e., $L(C)$ is anti-isomorphic to the face lattice of $P$. Now the vertex-facet
incidence information encoded in $C$ and the fact that $L$ is atomic imply the
whole structure of the lattice $L$. Therefore $L(C)$ is anti-isomorphic to $L$ and
hence $P$ is a realization of $L$. □

Given polynomials $f_1, \dots, f_r, g_1, \dots, g_s, h_1, \dots, h_t \in \mathbb{Z}[x_1, \dots, x_l]$, the problem
to decide whether the polynomial system $f_1 = \dots = f_r = 0$, $g_1 \ge 0, \dots, g_s \ge 0$,
$h_1 > 0, \dots, h_t > 0$ has a solution in $R^l = \mathbb{A}^l$ is called the Existential theory
of the reals (ETR). ETR is polynomial time equivalent to the Steinitz problem
for 4-Polytopes over $\mathbb{A}$ [28]. (All polytopes realizable over $\mathbb{R}$ are realizable over
$\mathbb{A}$.) Moreover, ETR is polynomial time equivalent to the Steinitz problem for
$d$-Polytopes with $d + 4$ vertices over $\mathbb{A}$ [24]. Since ETR is easily verified to be
NP-hard, the same is valid for the general Steinitz problem (over $\mathbb{A}$) and for the
IIS Realizability problem.
According to Theorem 2.7 of [9], for $R = \mathbb{Q}$ or $\mathbb{A}$, deciding whether an
arbitrary polynomial $f \in \mathbb{Z}[x_1, \dots, x_l]$ has zeros in $R^l$, where $l$ is a positive
integer, is equivalent to solving the Steinitz problem for $R$. For $R = \mathbb{Q}$, it is not even
clear whether the Steinitz problem (and therefore the IIS Realizability problem)
is decidable since finding roots in $R = \mathbb{Q}$ of a single polynomial $f \in \mathbb{Z}[x_1, \dots, x_l]$
is the unsolved rational version of Hilbert's 10th problem. By the well known
theorem of Matiyasevich, there does not exist an algorithm for deciding whether
$f$ has roots in $\mathbb{Z}$. By the quantifier elimination result of Tarski, the problem
is decidable for $R = \mathbb{A}$. Note that, unlike $\mathbb{R}$, $\mathbb{A}$ admits a finite representation.
For $R = \mathbb{A}$, it is unknown whether the Steinitz problem is in NP. See [23, 8] and
references therein for this and related issues.

4 Feasible Subsystem (FS) Polytope

Consider an infeasible system Σ : {Ax ≤ b} and let [m] = {1, . . . , m} be the set
of indices of all inequalities in Σ. If I denotes the set of all feasible subsystems
of Σ, ([m], I) is clearly an independence system and its set of circuits C(I)
corresponds to the set of all IISs. We denote by PF S the polytope generated by
the convex hull of all the incidence vectors of feasible subsystems.
Let us first briefly recall some definitions and facts about independence system
polytopes. To any independence system $(E, \mathcal{I})$ with the family of circuits
denoted by $\mathcal{C}(\mathcal{I})$ we can associate the polytope $P(\mathcal{I}) = P(\mathcal{C}(\mathcal{I})) = \mathrm{conv}(\{y \in
\{0, 1\}^{|E|} \mid y$ is the incidence vector of an $I \in \mathcal{I}\})$. The rank function is defined
by $r(S) = \max\{|I| \mid I \subseteq S,\ I \in \mathcal{I}\}$ for all $S \subseteq E$. For any $S \subseteq E$, the rank
inequality for $S$ is $\sum_{e \in S} y_e \le r(S)$, which is clearly valid for $P(\mathcal{I})$. A subset
$S \subseteq E$ is closed if $r(S \cup \{t\}) \ge r(S) + 1$ for all $t \in E - S$ and nonseparable if
$r(S) < r(T) + r(S - T)$ for all $T \subset S$, $T \ne \emptyset$. For any set $S \subseteq E$, $S$ must be
closed and nonseparable for the corresponding rank inequality to define a facet
of $P(\mathcal{I})$. These conditions generally are only necessary, but sufficient conditions
can be stated using the following concept [21]. For $S \subseteq E$, the critical graph
$G_S(\mathcal{I}) = (S, F)$ is defined as follows: $(e, e') \in F$, for $e, e' \in S$, if and only if
there exists an independent set $I$ such that $I \subseteq S$, $|I| = r(S)$ and $e \in I$, $e' \notin I$,
$I - e + e' \in \mathcal{I}$. It is shown in [21] that if $S$ is a closed subset of $E$ and the critical
graph $G_S(\mathcal{I})$ of $\mathcal{I}$ on $S$ is connected, then the corresponding rank inequality
induces a facet of the polytope $P(\mathcal{I})$. See references in [15].
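The rank function and the closedness condition recalled above can be evaluated by enumeration when the ground set is small. A minimal sketch, assuming the independence system is given by its list of circuits; the function names and the example (circuits $\{1, 2, 3\}$ and $\{1, 4\}$ on $E = \{1, 2, 3, 4\}$) are hypothetical.

```python
from itertools import combinations

def is_independent(I, circuits):
    """A set is independent if it contains no circuit."""
    return not any(C <= I for C in circuits)

def rank(S, circuits):
    """r(S): size of a largest independent subset of S (exhaustive search)."""
    S = sorted(S)
    for size in range(len(S), -1, -1):
        if any(is_independent(set(I), circuits) for I in combinations(S, size)):
            return size

def is_closed(S, E, circuits):
    """Closed in the sense used above: r(S + t) >= r(S) + 1 for all t in E - S."""
    return all(rank(S | {t}, circuits) >= rank(S, circuits) + 1 for t in E - S)

E = {1, 2, 3, 4}
circuits = [frozenset({1, 2, 3}), frozenset({1, 4})]
print(rank({1, 2, 3}, circuits), is_closed({1, 2, 3}, E, circuits))
```

In this example both circuits are closed, whereas the independent set $\{1, 2\}$ is not, because adding element 3 does not increase the rank.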

4.1 Rank-Facets of the FS Polytope

As $P_{FS}$ is an independence system polytope, it is full-dimensional if and only
if there are no trivially infeasible inequalities in $\Sigma$. The inequalities $y_i \ge 0$ are
facet defining for all $1 \le i \le m$. Moreover, it is easy to verify that for each $i$ the
inequality $y_i \le 1$ defines a facet of $P_{FS}$ if and only if there is no IIS of cardinality
2 that includes $i$ and $P_{FS}$ is full-dimensional.
In fact, Parker [26] began an investigation of the polytope associated to the
Min IIS Cover problem, considering it as a special case of the general set
covering polytope (see references in [15]). Since there is a simple correspondence
between set covering polytopes and the complementary independence system
polytopes [21], the results in [26] can be translated so that they apply to $P_{FS}$.
Let $S$ be an arbitrary IIS of $\Sigma$, $A_S x \le b_S$ be its corresponding subsystem,
and $\sum_{i \in S} y_i \le r(S) = |S| - 1$ the corresponding (rank) IIS-inequality. Since
the complementary covering inequality $\sum_{i \in C} y_i \ge 1$ induced by every IIS $C$ is
proved to be facet defining in [26], we have:
Theorem 7. The IIS-inequality arising from any IIS defines a facet of PF S .
We give here a geometric proof (based on the above-mentioned sufficient con-
ditions [21]), which is simpler than that of [26] and which provides additional
insight into the IIS structure.
Proof. It is easy to verify that IIS-inequalities are valid for PF S . Since the critical
graph corresponding to any IIS is clearly connected (in fact, a complete graph),
we just need to show that every IIS is closed.
a) First consider the case of maximal IISs, i.e. with |S| = n + 1.

[Figure: the affine cones $K_1$, $K_2$, $K_3$ arising from a maximal IIS, together with the points $x^1$, $x^2$, $x^3$, the directions $d_1$, $d_2$, $d_3$, and the point $\hat{x}$.]

For each $i \in S$, consider the unique $x^i = A_{S \setminus \{i\}}^{-1} b_{S \setminus \{i\}}$. By the proof of
Theorem 5, we know that $x^1, \dots, x^{n+1}$ are affinely independent. If $d_i := (x^i - \hat{x})$
for all $i$, $1 \le i \le n + 1$, where $\hat{x} := \frac{1}{n+1} \sum_{i=1}^{n+1} x^i$, $d_1, \dots, d_{n+1}$ are also affinely
independent. Clearly $\sum_{i=1}^{n+1} d_i = 0$ and the $d_i$'s generate $\mathbb{R}^n$. Since each $x^i$
satisfies exactly $n$ of the $n + 1$ inequalities in $S$ with equality and $a_i x^i > b_i$
(otherwise $S$ would be feasible), we have $\hat{x} \in \{x \in \mathbb{R}^n \mid A_S x \ge b_S\}$, i.e., $\hat{x}$
satisfies the reversed inequalities of the IIS. In fact, $\hat{x}$ is an interior point of the
above "reversed" polyhedron.
According to Theorem 3, deleting any inequality from an IIS yields a feasible
subsystem that defines an affine cone. For maximal IISs, we have $n + 1$ affine
cones $K_i := x^i + K_i'$, where $K_i' = \{x \in \mathbb{R}^n \mid A_{S \setminus \{i\}} x \le 0\}$ for $1 \le i \le n + 1$. Note
that the ray generated by $d_i$ and passing through $x^i$, i.e., $R_i := \{x \in \mathbb{R}^n \mid x =
x^i + \alpha d_i,\ \alpha \ge 0\}$, is contained in $K_i$ because we have

$$A_{S \setminus \{i\}} (\alpha d_i) = \alpha A_{S \setminus \{i\}} (x^i - \hat{x}) = \alpha (b_{S \setminus \{i\}} - A_{S \setminus \{i\}} \hat{x}) \le 0,$$

where we used the fact that $A_{S \setminus \{i\}} \hat{x} \ge b_{S \setminus \{i\}}$. Now consider an arbitrary inequality
$\tilde{a} x \le \tilde{b}$ with $\tilde{a} \ne 0$. We will verify that $H := \{x \in \mathbb{R}^n \mid \tilde{a} x \le \tilde{b}\}$ has
a nonempty intersection with at least one of the $K_i$'s, $1 \le i \le n + 1$. Thus, for
any $t \in E - S$ we have $\mathrm{rank}(S \cup \{t\}) = \mathrm{rank}(S) + 1 = n + 1$, which means that
the IIS defined by $S$ is closed.
Since $d_1, \dots, d_{n+1}$ generate $\mathbb{R}^n$ and $\sum_{i=1}^{n+1} d_i = 0$, we have $\sum_{i=1}^{n+1} \tilde{a} d_i =
\tilde{a} (\sum_{i=1}^{n+1} d_i) = 0$ and therefore $\tilde{a} \ne 0$ implies that we cannot have $\tilde{a} d_i =
0$ $\forall i$, $1 \le i \le n + 1$. Thus there exists at least one $i$ such that $\tilde{a} d_i < 0$.
But this implies that $R_i \cap H \ne \emptyset$. In other words, $K_i \cap H \ne \emptyset$ and this proves
the theorem for maximal IISs.
b) The result can be easily extended to non-maximal IISs, i.e., with $|S| < n + 1$.
From Theorem 5 we know that $P := \{x \in \mathbb{R}^n \mid Ax \ge b\} = L + Q$ with $Q \subseteq L^\perp$.
Since $P$ is full-dimensional ($\hat{x}$ is an interior point), $n = \dim(P) = \dim(L) +
\dim(Q)$ and $\dim(Q) = \mathrm{rank}(A_S) = |S| - 1 < n$ imply that $\dim(L) \ge 1$.
Two cases can arise:
i) If the above-mentioned $\tilde{a}$ is in $\mathrm{lin}(\{a_1, \dots, a_m\}) = L^\perp$, the linear hull of the
rows of $A$, then since $\dim(L^\perp) = \dim(Q)$, we can apply the above result to $L^\perp$.
ii) If $\tilde{a} \notin \mathrm{lin}(\{a_1, \dots, a_m\}) = L^\perp$, then the projection of $H^= := \{x \in \mathbb{R}^n \mid \tilde{a} x =
\tilde{b}\}$ onto $L$ yields the whole $L$ and therefore $H = \{x \in \mathbb{R}^n \mid \tilde{a} x \le \tilde{b}\}$ must
have a nonempty intersection with all the cones corresponding to the maximal
consistent subsystems of $\{A_S x \le b_S\}$. □
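The quantities used in part a) of the proof can be computed for a small maximal IIS. The sketch below assumes the hypothetical planar IIS $x_1 \le 0$, $x_2 \le 0$, $-x_1 - x_2 \le -1$, whose points $x^i$ are $(1, 0)$, $(0, 1)$, $(0, 0)$; it forms $\hat{x}$ and the directions $d_i$, and checks for a few sample vectors $\tilde{a} \ne 0$ that some $\tilde{a} d_i$ is negative. The function name `centroid_and_directions` is illustrative.

```python
from fractions import Fraction

def centroid_and_directions(points):
    """Return x_hat (the average of the points) and d_i = x^i - x_hat."""
    k, n = len(points), len(points[0])
    x_hat = [sum(Fraction(p[j]) for p in points) / k for j in range(n)]
    dirs = [[Fraction(p[j]) - x_hat[j] for j in range(n)] for p in points]
    return x_hat, dirs

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 0], [0, 1], [-1, -1]]
b = [0, 0, -1]
points = [[1, 0], [0, 1], [0, 0]]  # x^1, x^2, x^3 for this IIS

x_hat, dirs = centroid_and_directions(points)
# x_hat satisfies the reversed inequalities strictly.
strictly_inside = all(dot(row, x_hat) > beta for row, beta in zip(A, b))
# For each sample a_tilde != 0, some direction d_i has a_tilde . d_i < 0.
samples = [[1, 0], [0, -1], [2, 3]]
has_descent_ray = [min(dot(a, d) for d in dirs) < 0 for a in samples]
print(x_hat, strictly_inside, has_descent_ray)
```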
It is worth noting that closedness of every IIS makes PF S quite special among
all independence system polyhedra, since the circuits of a general independence
system need not be closed.
The separation problem for IIS-inequalities is defined as follows: Given an
infeasible system Σ and an arbitrary vector y ∈ IRm , show that y satisfies all
IIS-inequalities or find at least one violated by y.
In view of the trivial valid inequalities, we can assume that y ∈ [0, 1]m . Moreover,
we may assume with no loss of generality, that the nonzero components of y
correspond to an infeasible subsystem of Σ.

Proposition 1. The separation problem for IIS-inequalities is N P -hard.

Proof. We proceed by polynomial time reduction from the decision version of
the Min IIS problem, which is NP-hard according to Theorem 4. Given an
infeasible system $\Sigma : \{Ax \le b\}$ with $m$ inequalities, $n$ variables and a positive
integer $K$ with $1 \le K \le n + 1$, does it have an IIS of cardinality at most $K$?
Let (A, b) and K define an arbitrary instance of the above decision problem.
Consider the particular instance of the separation problem given by the same
infeasible system together with the vector y such that yi = 1 − 1/(K + 1) for all
i, 1 ≤ i ≤ m.
Suppose that $\Sigma$ has an IIS of cardinality at most $K$ which is indexed by the
set $S$. Then the corresponding IIS-inequality $\sum_{i \in S} y_i \le |S| - 1$ is violated by
the vector $y$ because
$$\sum_{i \in S} y_i = \sum_{i \in S} \left(1 - \frac{1}{K+1}\right) = |S| - \frac{|S|}{K+1} > |S| - 1,$$
where the strict inequality is implied by $|S| \le K$. Thus the vector $y$ can be
separated from $P_{FS}$.
Conversely, if there exists an IIS-inequality violated by $y$, then
$$\sum_{i \in S} y_i = |S| - \frac{|S|}{K+1} > |S| - 1$$
implies that the cardinality of the IIS defined by $S$ is at most $K$.
Therefore, the original infeasible system $\Sigma$ has an IIS of cardinality at most
$K$ if and only if some IIS-inequality is violated by the given vector $y$. □
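The arithmetic at the heart of this reduction can be checked directly: with $y_i = 1 - 1/(K+1)$ for all $i$, the IIS-inequality of an IIS $S$ is violated exactly when $|S| \le K$. A minimal sketch in exact arithmetic, where the function name `iis_inequality_violated` is illustrative.

```python
from fractions import Fraction

def iis_inequality_violated(size, K):
    """Is sum_{i in S} y_i <= |S| - 1 violated when y_i = 1 - 1/(K+1), |S| = size?"""
    y = 1 - Fraction(1, K + 1)
    return size * y > size - 1

# Violation occurs exactly for IISs of cardinality at most K.
for K in range(1, 4):
    print(K, [s for s in range(1, 7) if iis_inequality_violated(s, K)])
```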

In [21] the concept of generalized antiwebs, which includes as special cases
generalized cliques, generalized odd holes and generalized antiholes, is introduced.
Necessary and sufficient conditions are also established for the corresponding
rank inequalities to define facets of the associated independence system
polytope.
Let $m, t, q$ be integers such that $2 \le q \le t \le m$, let $E = \{e_1, \dots, e_m\}$ be a finite
set, and define for each $i \in M = \{1, \dots, m\}$ the subset $E^i = \{e_i, \dots, e_{i+t-1}\}$
(where the indices are taken modulo $m$) formed by $t$ consecutive elements of $E$.
An $(m, t, q)$-generalized antiweb on $E$ is the independence system having the following
family of subsets of $E$ as circuits:

$$AW(m, t, q) = \{C \subseteq E \mid C \subseteq E^i \text{ for some } i \in M,\ |C| = q\}.$$

As mentioned in [21], $AW(m, t, q)$ corresponds to generalized cliques when $m = t$,
to generalized odd holes when $q = t$ and $t$ does not divide $m$, and to generalized
antiholes when $m = qt + 1$. The rank inequality induced by a generalized antiweb,
$\sum_{i \in E} y_i \le \lfloor m(q-1)/t \rfloor$, defines a canonical facet of the independence system
polytope $P(AW(m, t, q))$ if and only if $m = t$ or $t$ does not divide $m(q-1)$ [21].
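The definition of $AW(m, t, q)$ translates directly into a short enumeration. In the sketch below the elements $e_1, \dots, e_m$ are represented by the integers $1, \dots, m$; the function names are hypothetical, and the facet test merely restates the condition quoted from [21].

```python
from itertools import combinations

def antiweb_circuits(m, t, q):
    """Circuits of AW(m, t, q): all q-subsets of some window E^i of t consecutive elements."""
    circuits = set()
    for i in range(m):
        window = [(i + k) % m + 1 for k in range(t)]  # E^i, indices taken modulo m
        for C in combinations(window, q):
            circuits.add(frozenset(C))
    return circuits

def rank_bound(m, t, q):
    """Right-hand side floor(m (q - 1) / t) of the rank inequality."""
    return (m * (q - 1)) // t

def defines_canonical_facet(m, t, q):
    """Condition quoted from [21]: m = t, or t does not divide m (q - 1)."""
    return m == t or (m * (q - 1)) % t != 0

# AW(5, 2, 2) is the edge set of a 5-cycle (a generalized odd hole).
print(sorted(sorted(C) for C in antiweb_circuits(5, 2, 2)))
print(rank_bound(5, 2, 2), defines_canonical_facet(5, 2, 2))
```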

In the case of PF S , the ground set is the set of indices of inequalities in the
infeasible system Σ under consideration.

Proposition 2. No facets of $P_{FS}$ are induced by generalized cliques other than
simple IISs (i.e., $m = t = q$).

Proof. We invoke the following result (Proposition 3.15 of [21]). For any $S \subseteq E$,
let $\mathcal{C}_S = \{C \in \mathcal{C} \mid C \subseteq S\}$ denote the family of circuits of the independence
system induced by $(E, \mathcal{I})$ on $S$. Then the rank inequality $\sum_{e \in S} y_e \le r(S)$ induces
a facet of $P(\mathcal{C})$ if and only if $S$ is closed and the rank inequality induces a facet
of $P(\mathcal{C}_S)$. Hence it suffices to consider the case $S = E$ and $\mathcal{C}_S = AW(m, t, q)$.
It is easy to verify that the only $(m, t, q)$-generalized antiwebs that can arise
in IIS-hypergraphs are those with $q = t$. Suppose that $q < t$ and consider $E^1$,
an arbitrary circuit $C \in AW(m, t, q)$ with $C \subseteq E^1$ and an arbitrary element
$e \in E^1 \setminus C$. By definition of $AW(m, t, q)$, any $q$-subset of $E^1$ is a circuit. This
must be true in particular for all subsets containing $e$ and $q - 1$ elements of $C$.
But then $C$ cannot be closed because $r(C \cup \{e\}) = r(C)$ and thus we have a
contradiction to the fact that all IISs are closed (Theorem 7). Hence the only
generalized cliques that can arise are those with $m = t = q$, that is, in which the
whole ground set $E$ is an IIS. □

The generalized antiwebs which are not ruled out by the above proof, i.e.,
$AW(m, t, q)$ with $q = t$, clearly correspond to simple circular sequences of IISs
of cardinality $t$ given by the subsets $E^i$, $i \in M$, of the definition. For $t = q = 2$, it
is easy to see that the only possible cases that can arise as induced hypergraphs
of IIS-hypergraphs are those with m = 4 and m = 2. In fact, we conjecture that
no other (m, t, q)-generalized antiwebs can occur besides the cases m = t = q
with q ≥ 2, m = 4 and t = q = 2 as well as the trivial cases in which q = 1.
In this respect it is interesting to note that the remark following Theorem 5
implies that the lineality spaces L associated to all the IISs E i , i ∈ M , in any
given generalized antiweb are identical. Therefore we can assume that they are
all maximal IISs contained in L⊥ and exploit the special geometric structure
of such IISs revealed by the proof of Theorem 7. An intermediate step would
then be to show that no sequence of more than 3 such successive IISs E i can
occur without other additional IISs involving t nonsequential elements. In the
case m = 5 and t = 2, this observation is clearly valid.
Besides settling the above-mentioned issue, we are investigating other rank
and non-rank facets of PF S . For rank facets, it is also of interest to consider the
extent to which the sufficient condition involving connectedness of the critical
graph could also be necessary. By enumerating all independence systems on at
most 6 elements, we have verified that all cases with rank facets different from
IIS-inequalities and with a nonconnected critical graph occur in independence
systems which cannot be realized as PF S .
For non-rank facets, we can specialize some known facet classes for general
independence system polytopes and set covering polytopes, e.g., the class of all
facets with (0, 1, 2)-valued coefficients characterized in [7]. A simple example of a $P_{FS}$
polytope with such a non-rank facet is as follows. The original system contains
six inequalities in three variables. In addition to the rank inequalities defined
by the five maximal IISs ({3456}, {2345}, {1346}, {1246}, {1245}) and to the
trivial (0, 1)-bounding inequalities, the single additional constraint $x_1 + x_2 +
x_3 + 2x_4 + x_5 + x_6 \le 5$ is required to provide the full description.
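Validity and tightness of the additional constraint can be confirmed by enumerating all 64 subsets of the six inequalities. The sketch below rests on an assumption not stated explicitly in the text, namely that the five listed sets are the only IISs of the system, so that a subsystem is feasible exactly when it contains none of them. The function name `feasible_subsystems` is illustrative.

```python
from itertools import combinations

# The five maximal IISs listed in the example (assumed here to be all IISs).
IIS = [{3, 4, 5, 6}, {2, 3, 4, 5}, {1, 3, 4, 6}, {1, 2, 4, 6}, {1, 2, 4, 5}]
# Coefficients of the constraint x1 + x2 + x3 + 2 x4 + x5 + x6 <= 5.
weights = {1: 1, 2: 1, 3: 1, 4: 2, 5: 1, 6: 1}

def feasible_subsystems(ground, circuits):
    """Yield every subset of the ground set that contains no circuit."""
    ground = sorted(ground)
    for size in range(len(ground) + 1):
        for I in combinations(ground, size):
            if not any(C <= set(I) for C in circuits):
                yield set(I)

values = [sum(weights[i] for i in I) for I in feasible_subsystems(range(1, 7), IIS)]
best = max(values)
print(best, values.count(best))
```

Under the stated assumption the maximum left-hand side over all feasible subsystems is 5, so the constraint is valid and is attained with equality, for instance by the subsystems $\{1, 2, 3, 5, 6\}$ and $\{1, 2, 3, 4\}$.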
We have also constructed numerous examples of facets of $P_{FS}$ having coefficients
larger than 2. These examples come from full descriptions of small-to-medium
size problems which we have analyzed using the software PORTA.
Acknowledgement The authors would like to thank G. M. Ziegler for helpful
discussions regarding the material of Section 3.

References
[1] C. C. Aggarwal, R. K. Ahuja, J. Hao, and J. B. Orlin, Diagnosing infeasi-
bilities in network flow problems, Mathematical Programming, 81 (1998), pp. 263–
280.
[2] E. Amaldi, From finding maximum feasible subsystems of linear systems to feed-
forward neural network design, PhD thesis, Dep. of Mathematics, EPF-Lausanne,
1994.
[3] E. Amaldi and V. Kann, The complexity and approximability of finding maxi-
mum feasible subsystems of linear relations, Theoretical Comput. Sci., 147 (1995),
pp. 181–210.
[4] E. Amaldi and V. Kann, On the approximability of minimizing nonzero variables
or unsatisfied relations in linear systems, Theoretical Comput. Sci., 209 (1998), pp. 237–260.
[5] S. Arora, L. Babai, J. Stern, and Z. Sweedyk, The hardness of approximate
optima in lattices, codes, and systems of linear equations, J. Comput. Syst. Sci.,
54 (1997), pp. 317–331.
[6] A. Bachem and M. Grötschel, New aspects of polyhedral theory, in Optimiza-
tion and Operations Research, A. Bachem, ed., Modern Applied Mathematics,
North Holland, 1982, ch. I.2, pp. 51 – 106.
[7] E. Balas and S. M. Ng, On the set covering polytope: All the facets with coeffi-
cients in {0,1,2}, Mathematical Programming, 43 (1989), pp. 57–69.
[8] L. Blum, F. Cucker, M. Shub, and S. Smale, Complexity and Real Compu-
tation, Springer-Verlag, 1997.
[9] J. Bokowski and B. Sturmfels, Computational Synthetic Geometry, no. 1355
in Lecture Notes in Mathematics, Springer-Verlag, 1989.
[10] N. Chakravarti, Some results concerning post-infeasibility analysis, Eur. J.
Oper. Res., 73 (1994), pp. 139–143.
[11] J. Chinneck, Computer codes for the analysis of infeasible linear programs, J.
Oper. Res. Soc., 47 (1996), pp. 61–72.
[12] J. Chinneck, An effective polynomial-time heuristic for the minimum-cardinality
IIS set-covering problem, Annals of Mathematics and Artificial Intelligence, 17 (1996),
pp. 127–144.
[13] J. Chinneck, Feasibility and viability, in Advances in Sensitivity Analysis and Parametric
Programming, T. Gál and H. Greenberg, eds., Kluwer Academic Publishers, 1997.
[14] J. Chinneck and E. Dravnieks, Locating minimal infeasible constraint sets in
linear programs, ORSA Journal on Computing, 3 (1991), pp. 157–168.
[15] M. Dell’Amico, F. Maffioli, and S. Martello, Annotated Bibliographies in
Combinatorial Optimization, John Wiley, 1997.
[16] K. Fan, On systems of linear inequalities, in Linear Inequalities and Related
Systems, H. W. Kuhn and A. W. Tucker, eds., no. 38 in Annals of Mathematical
Studies, Princeton University Press, NJ, 1956, pp. 99–156.
[17] J. Gleeson and J. Ryan, Identifying minimally infeasible subsystems of inequal-
ities, ORSA Journal on Computing, 2 (1990), pp. 61–63.
[18] H. J. Greenberg, Consistency, redundancy, and implied equalities in linear
systems, Annals of Mathematics and Artificial Intelligence, 17 (1996), pp. 37–83.
[19] H. J. Greenberg and F. H. Murphy, Approaches to diagnosing infeasible linear
programs, ORSA Journal on Computing, 3 (1991), pp. 253–261.
[20] J. Håstad, Some optimal inapproximability results, in Proc. Twenty-ninth Ann.
ACM Symp. Theory of Comp., ACM, 1997, pp. 1–10.
[21] M. Laurent, A generalization of antiwebs to independence systems and their
canonical facets, Mathematical Programming, 45 (1989), pp. 97–108.
[22] O. L. Mangasarian, Misclassification minimization, J. of Global Optimization,
5 (1994), pp. 309–323.
[23] B. Mishra, Computational real algebraic geometry, in Handbook of Discrete and
Computational Geometry, J. Goodman and J. O’Rourke, eds., CRC Press, 1997,
ch. 29.
[24] N. E. Mnëv, The universality theorems on the classification problem of configura-
tion varieties and convex polytopes varieties, in Topology and Geometry – Rohlin
Seminar, O. Y. Viro, ed., no. 1346 in Lecture Notes in Mathematics, Springer-
Verlag, 1988, pp. 527 – 543.
[25] T. S. Motzkin, Beiträge zur Theorie der Linearen Ungleichungen, PhD thesis,
Basel, 1933.
[26] M. Parker, A set covering approach to infeasibility analysis of linear program-
ming problems and related issues, PhD thesis, Dep. of Mathematics, University of
Colorado at Denver, 1995.
[27] M. Parker and J. Ryan, Finding the minimum weight IIS cover of an infeasible
system of linear inequalities, Annals of Mathematics and Artificial Intelligence,
17 (1996), pp. 107–126.
[28] J. Richter-Gebert, Realization Spaces of Polytopes, no. 1643 in Lecture Notes
in Mathematics, Springer-Verlag, 1996.
[29] J. Ryan, Transversals of IIS-hypergraphs, Congressus Numerantium, 81 (1991),
pp. 17–22.
[30] , IIS-hypergraphs, SIAM J. Disc. Math., 9 (1996), pp. 643–653.
[31] G. M. Ziegler, Lectures on Polytopes, Springer-Verlag, 1994.
Valid Inequalities for Problems with Additive
Variable Upper Bounds*

Alper Atamtürk (1), George L. Nemhauser (2), and Martin W. P. Savelsbergh (2)

(1) Department of Industrial Engineering and Operations Research,
University of California, Berkeley, CA 94720-1777, USA
[email protected]
(2) School of Industrial and Systems Engineering,
Georgia Institute of Technology, Atlanta, GA 30332-0205, USA
{gnemhaus}{mwps}@isye.gatech.edu

Abstract. We study the facial structure of a polyhedron associated with
the single node relaxation of network flow problems with additive variable
upper bounds. This type of structure arises, for example, in network
design/expansion problems and in production planning problems with setup
times. We first derive two classes of valid inequalities for this polyhedron
and give the conditions under which they are facet-defining. Then we
generalize our results through sequence independent lifting of valid
inequalities for lower-dimensional projections. Our computational experience
with large network expansion problems indicates that these inequalities are
very effective in improving the quality of the linear programming relaxations.

1 Introduction

The single node fixed-charge flow polyhedron, studied by Padberg et al. [9] and
Van Roy and Wolsey [12], arises as an important relaxation of many 0-1 mixed
integer programming problems with fixed charges, including lot-sizing problems
[4,10] and capacitated facility location problems [1]. The valid inequalities de-
rived for the single node fixed-charge flow polyhedron have proven to be effective
for solving these types of problems. Here we study a generalization of the sin-
gle node fixed-charge flow polyhedron that arises as a relaxation of network flow
problems with additive variable upper bounds, such as network design/expansion
problems and production planning problems with setup times. We derive several
classes of strong valid inequalities for this polyhedron. Our computational ex-
perience with network expansion problems indicates that these inequalities are
very effective in improving the quality of the linear programming relaxations.
* This research is supported, in part, by NSF Grant DMI-9700285 to the Georgia
Institute of Technology.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 60–72, 1999.
© Springer-Verlag Berlin Heidelberg 1999

In a network design problem, given a network and demands on the nodes, we
are interested in installing capacities on the edges of the network so that the total
cost of flows and capacity installation is minimized. If some of the edges already
have positive capacities, the problem is called a network expansion problem.
In many applications capacity is available in discrete quantities and has a cost
structure that exhibits economies of scale [5,11]. The constraints of the mixed
integer programming formulation of the network expansion problem for a single
node are
\[ \sum_{i \in M^+} y_i - \sum_{i \in M^-} y_i \le b \tag{1} \]
\[ y_i \le u_i + \sum_{j \in N(i)} a_{ij} x_j, \quad i \in M. \tag{2} \]

Inequality (1) is the balance constraint of a node with inflow ($M^+$) and
outflow ($M^-$) edges and demand $b$. The continuous variable $y_i$ represents the
flow on edge $i$, $i \in M = M^+ \cup M^-$. Inequalities (2) are the additive variable upper
bound (AVUB) constraints on the flow variables. $N(i)$ is the index set of binary
variables $x_j$ representing the availability of resources that increase the capacity
of edge $i$ that has capacity $u_i$. For multi-commodity network expansion problems,
it is possible to arrive at this single commodity relaxation by aggregating the
balance constraints and the flow variables over commodities for a single node.
Additive variable upper bounds generalize the simple variable upper bounds
in three respects. First, several binary variables additively increase the upper
bound on the continuous variable. Second, the continuous variable is not
necessarily restricted to zero when its additive variable bounds are zero. Third, we
allow an overlap of additive variable upper bound variables, i.e. $N(i) \cap N(k) \neq \emptyset$
for $i, k \in M$. This situation typically occurs when capacities are installed on
subsets of edges such as on cycles of the network (rings). Note that a simple
variable upper bound constraint $y_i \le u_i x_i$ is a special case of (2). We also point
out that a variable lower bound constraint $l_i x_i \le y_i$ can be put into a simple
form of AVUB, $\bar{y}_i \le (u_i - l_i) + l_i \bar{x}_i$, after complementing the binary variable $x_i$
and the continuous variable $y_i$, assuming that it has a finite upper bound $u_i$.
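The complementation step can be written out explicitly. The following short derivation is a sketch that assumes the substitutions $\bar{x}_i = 1 - x_i$ and $\bar{y}_i = u_i - y_i$; the text names the complementation but does not spell out these formulas.

```latex
\begin{align*}
l_i x_i \le y_i
  &\iff l_i (1 - \bar{x}_i) \le u_i - \bar{y}_i \\
  &\iff \bar{y}_i \le (u_i - l_i) + l_i \bar{x}_i .
\end{align*}
```

Under the same substitution the bounds $0 \le y_i \le u_i$ become $0 \le \bar{y}_i \le u_i$, so the complemented flow variable is again nonnegative.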
Multi-item production planning problems with setup times have the following
constraints as part of their MIP formulations
\[ \sum_{t' \le t} y_{it'} \ge d_{it}, \quad \forall i, t \tag{3} \]
\[ \sum_{i} y_{it} \le u_t - \sum_{i} a_i x_{it}, \quad \forall t \tag{4} \]

where $d_{it}$ denotes the demand for item $i$ in period $t$, $u_t$ the total production
capacity in period $t$ and $a_i$ the setup time required for item $i$ if the machine is
set up for this item. Aggregating the demand constraints (3) and the production
variables $y_{it}$ for each period, we arrive at the same structure as in (1)-(2).
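One possible reading of this aggregation is sketched below. The symbols $Y_t$, $D_t$ and $\bar{x}_{it}$ are introduced here for illustration only and do not appear in the original text.

```latex
\begin{align*}
Y_t &:= \sum_i y_{it}, \qquad D_t := \sum_i d_{it}, \qquad \bar{x}_{it} := 1 - x_{it}, \\
\text{(3) summed over } i: \quad & -\sum_{t' \le t} Y_{t'} \le -D_t, \\
\text{(4):} \quad & Y_t \le \Bigl( u_t - \sum_i a_i \Bigr) + \sum_i a_i \bar{x}_{it} .
\end{align*}
```

Under these assumptions the first line has the form of the balance constraint (1) with no inflow edges, the periods $1, \ldots, t$ as outflow edges and $b = -D_t$, while the second line is an AVUB constraint of the form (2) in the complemented setup variables.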
In the next section we introduce four classes of valid inequalities for

\[ P = \{ (x, y) \in \mathbb{B}^n \times \mathbb{R}^m_+ : \text{subject to (1) and (2)} \} \]

and give conditions under which these inequalities are facet-defining for conv(P ).
In Section 3 we present a summary of computational results on the use of the
new inequalities in a branch-and-cut algorithm for network expansion problems.

2 Valid Inequalities
The proofs of the results given in the sequel are abbreviated or omitted due to
space considerations. For detailed proofs and for further results and explanations,
the reader is referred to Atamtürk [2].
Let $N = \{1, 2, \ldots, n\}$ be the index set of binary variables and $N(S)$ be
the subset of $N$ appearing in the additive variable upper bound constraints
associated with $S \subseteq M = \{1, 2, \ldots, m\}$. For notational simplicity we use $N(i)$
for $N(\{i\})$. We define $u(S) = \sum_{i \in S} \bigl( u_i + \sum_{j \in N(i)} a_{ij} \bigr)$ for $S \subseteq M$ and
$a_j(S) = \sum_{i \in S} a_{ij}$ for $j \in N(S)$. Again for notational simplicity we use $u(i)$ for $u(\{i\})$.
Throughout we make the following assumptions on the data of the model:
(A.1) $a_{ij} > 0$ for all $i \in M$, $j \in N(i)$.
(A.2) $u(i) > 0$ for all $i \in M$.
(A.3) $b + u(M^-) > 0$.
(A.4) $u(i) - a_{ij} \ge 0$ for all $i \in M$, $j \in N(i)$.
(A.5) $b + u(M^-) - a_j(M^-) \ge 0$ for all $j \in N$.
Assumptions (A.2)–(A.5) are made without loss of generality. If $u(i) < 0$ or
$b + u(M^-) < 0$, then $P = \emptyset$. If $u(i) = 0$ ($b + u(M^-) = 0$), then $y_i = 0$ ($y_i = 0$, $i \in M^+$)
in every feasible solution and can be eliminated. Similarly, if $u(i) - a_{ij} < 0$
or $b + u(M^-) - a_j(M^-) < 0$, then $x_j = 1$ in every feasible solution and can be
eliminated. Note that given (A.1), if $N(i) \neq \emptyset$ for all $i \in M$, then (A.4) implies
(A.2) and (A.5) implies (A.3). Assumption (A.1) is made for convenience. Results
presented in the sequel can easily be generalized to the case with $a_{ij} < 0$. Note
that, for a particular $j \in N$, if $a_{ij} < 0$ for all $i \in M$, then $x_j$ can be complemented
to satisfy (A.1). If there is no overlap of additive variable upper bounds, i.e.
$N(i) \cap N(k) = \emptyset$ for all distinct $i, k \in M$, then $M(j) = \{i \in M : j \in N(i)\}$ is a
singleton for all $j \in N$ and (A.1) can be satisfied by complementing the binary
variables when $a_{ij} < 0$.
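The assumptions (A.1)–(A.5) are easy to test on concrete data before any inequality is generated. The following is a minimal sketch, not the authors' code: the function name `check_assumptions` and the dictionary layout (`u[i]` for the base capacity, `a[i][j]` for the coefficient of $x_j$ on edge $i$) are assumptions made for illustration.

```python
def check_assumptions(u, a, b, M_plus, M_minus):
    """Report which of (A.1)-(A.5) hold for a single-node AVUB instance.

    u[i]    : base capacity u_i of edge i            (hypothetical layout)
    a[i][j] : coefficient a_ij for every j in N(i)   (hypothetical layout)
    """
    M = list(M_plus) + list(M_minus)
    N = sorted({j for i in M for j in a[i]})

    def u_of(S):
        # u(S): capacity of S when every binary variable equals 1
        return sum(u[i] + sum(a[i].values()) for i in S)

    def a_of(j, S):
        # a_j(S): sum of a_ij over i in S (0 where j is not in N(i))
        return sum(a[i].get(j, 0.0) for i in S)

    return {
        "A.1": all(c > 0 for i in M for c in a[i].values()),
        "A.2": all(u_of([i]) > 0 for i in M),
        "A.3": b + u_of(M_minus) > 0,
        "A.4": all(u_of([i]) - c >= 0 for i in M for c in a[i].values()),
        "A.5": all(b + u_of(M_minus) - a_of(j, M_minus) >= 0 for j in N),
    }


# Made-up instance: two inflow edges (0, 1) and two outflow edges (2, 3).
u = {0: 2.0, 1: 1.0, 2: 1.0, 3: 0.0}
a = {0: {0: 3.0}, 1: {0: 2.0, 1: 4.0}, 2: {2: 5.0}, 3: {1: 2.0}}
print(check_assumptions(u, a, 9.0, [0, 1], [2, 3]))
```

An instance that fails (A.2)–(A.5) would first be reduced by the eliminations described above rather than rejected.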
Proposition 1. Conv(P ) is full-dimensional.

2.1 Additive Flow Cover Inequalities


For $C^+ \subseteq M^+$ and $C^- \subseteq M^-$, $(C^+, C^-)$ is said to be a flow cover if
$\lambda = u(C^+) - b - u(C^-) > 0$. For a flow cover $(C^+, C^-)$, let $L^- \subseteq M^- \setminus C^-$ be such
that $\gamma = \sum_{i \in L^-} u_i < \lambda$ and $K = M^- \setminus (C^- \cup L^-)$. Then, the additive flow cover
inequality is
\begin{align*}
& \sum_{i \in C^+} y_i + \sum_{j \in N(C^+)} (a_j(C^+) - \lambda + \gamma)^+ (1 - x_j) \\
& \quad - \sum_{j \in N(L^-)} \min\{a_j(L^-), \lambda - \gamma\} x_j - \sum_{i \in K} y_i \le b + u(C^-) + \gamma. \tag{5}
\end{align*}

Proposition 2. The additive flow cover inequality (5) is valid for P .

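Proof. Let $(\bar{x}, \bar{y}) \in P$ and $T = \{j \in N : \bar{x}_j = 0\}$. Also define
$N(C^+)^+ = \{j \in N(C^+) : a_j(C^+) > \lambda - \gamma\}$,
$N(L^-)^+ = \{j \in N(L^-) : a_j(L^-) > \lambda - \gamma\}$, and
$N(L^-)^- = N(L^-) \setminus N(L^-)^+$. For $(\bar{x}, \bar{y})$ the left hand side of (5), lhs, equals
\[ \sum_{i \in C^+} \bar{y}_i + \sum_{j \in N(C^+) \cap T} (a_j(C^+) - \lambda + \gamma)^+ - \sum_{j \in N(L^-) \setminus T} \min\{a_j(L^-), \lambda - \gamma\} - \sum_{i \in K} \bar{y}_i. \]

If $(N(C^+)^+ \cap T) \cup (N(L^-)^+ \setminus T) = \emptyset$, then
\[ \mathrm{lhs} \le \sum_{i \in C^+} \bar{y}_i - \sum_{j \in N(L^-)^- \setminus T} a_j(L^-) - \sum_{i \in K} \bar{y}_i \le b + u(C^-) + \gamma. \]
To see that the second inequality is valid, observe that
$\sum_{i \in C^+} y_i - \sum_{i \in L^-} y_i - \sum_{i \in K} y_i \le b + u(C^-)$ is valid for $P$ and
$\sum_{i \in L^-} y_i \le u(L^-) - \sum_{j \in N(L^-)^+} a_j(L^-) - \sum_{j \in N(L^-)^- \cap T} a_j(L^-)$
is valid for $(\bar{x}, \bar{y})$ since $N(L^-)^+ \subseteq T$. Adding these two
inequalities gives the result. Now, suppose $(N(C^+)^+ \cap T) \cup (N(L^-)^+ \setminus T) \neq \emptyset$.
Then
\begin{align*}
\mathrm{lhs} &\le u(C^+) - \sum_{j \in N(C^+) \cap T} a_j(C^+) + \sum_{j \in N(C^+)^+ \cap T} a_j(C^+) \\
&\quad + \sum_{j \in N(C^+)^+ \cap T} (\gamma - \lambda) - \sum_{j \in N(L^-)^+ \setminus T} (\lambda - \gamma) - \sum_{j \in N(L^-)^- \setminus T} a_j(L^-) \\
&\le u(C^+) - \lambda + \gamma - (\lambda - \gamma) \bigl[ |N(C^+)^+ \cap T| + |N(L^-)^+ \setminus T| - 1 \bigr] \\
&\le b + u(C^-) + \gamma. \qquad \square
\end{align*}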
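Proposition 2 can be sanity-checked numerically on a small instance. The sketch below is illustrative only: the data, the function names and the sampling scheme are assumptions, not part of the paper. It evaluates both sides of (5) at randomly sampled feasible points for every binary vector and reports the largest observed value of lhs minus rhs, which should never be positive.

```python
import itertools
import random


def cap_sum(u, a, S):
    """u(S): total capacity of the edges in S when every x_j equals 1."""
    return sum(u[i] + sum(a[i].values()) for i in S)


def coef(a, S, j):
    """a_j(S): sum of a_ij over the edges i in S (0 where j is not in N(i))."""
    return sum(a[i].get(j, 0.0) for i in S)


def binaries(a, S):
    """N(S): binary variables appearing in the AVUB constraints of S."""
    return sorted({j for i in S for j in a[i]})


def flow_cover(u, a, b, C_plus, C_minus, L_minus, K, x, y):
    """Return (lhs, rhs) of the additive flow cover inequality (5) at (x, y)."""
    lam = cap_sum(u, a, C_plus) - b - cap_sum(u, a, C_minus)
    gam = sum(u[i] for i in L_minus)
    assert lam > 0 and gam < lam, "need a flow cover with gamma < lambda"
    lhs = sum(y[i] for i in C_plus) - sum(y[i] for i in K)
    lhs += sum(max(coef(a, C_plus, j) - lam + gam, 0.0) * (1 - x[j])
               for j in binaries(a, C_plus))
    lhs -= sum(min(coef(a, L_minus, j), lam - gam) * x[j]
               for j in binaries(a, L_minus))
    return lhs, b + cap_sum(u, a, C_minus) + gam


def random_point(u, a, b, M_plus, M_minus, x, rng):
    """Sample a feasible y for fixed x (assumes b >= 0 and capacities >= 0)."""
    cap = {i: u[i] + sum(c * x[j] for j, c in a[i].items()) for i in u}
    y = {i: rng.choice((0.0, cap[i], rng.uniform(0.0, cap[i]))) for i in u}
    inflow = sum(y[i] for i in M_plus)
    outflow = sum(y[i] for i in M_minus)
    if inflow - outflow > b:  # scale the inflow down until (1) holds
        scale = (b + outflow) / inflow
        for i in M_plus:
            y[i] *= scale
    return y


# Made-up instance: inflow edges 0, 1; outflow edges 2, 3; binaries 0, 1, 2.
u = {0: 2.0, 1: 1.0, 2: 1.0, 3: 0.0}
a = {0: {0: 3.0}, 1: {0: 2.0, 1: 4.0}, 2: {2: 5.0}, 3: {1: 2.0}}
b, M_plus, M_minus = 9.0, [0, 1], [2, 3]
cover = dict(C_plus=[0, 1], C_minus=[], L_minus=[2], K=[3])


def max_violation(trials=300, seed=0):
    """Largest lhs - rhs seen over all binary vectors and sampled flows."""
    rng = random.Random(seed)
    worst = float("-inf")
    for bits in itertools.product((0, 1), repeat=3):
        x = dict(enumerate(bits))
        for _ in range(trials):
            y = random_point(u, a, b, M_plus, M_minus, x, rng)
            lhs, rhs = flow_cover(u, a, b, x=x, y=y, **cover)
            worst = max(worst, lhs - rhs)
    return worst


print(max_violation())
```

For this data $\lambda = 3$ and $\gamma = 1$, and the point with all binaries at 1 and flows $(5, 7, 6, 0)$ satisfies (5) with equality. Sampling cannot prove validity; it only guards against transcription errors in the coefficients.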

Remark 1. For the single node fixed-charge flow model, where (2) is replaced
with $y_i \le u_i x_i$, the additive flow cover inequality reduces to the flow cover
inequality [12]
\[ \sum_{i \in C^+} y_i + \sum_{i \in C^+} (u_i - \lambda)^+ (1 - x_i) - \sum_{i \in L^-} \min\{u_i, \lambda\} x_i - \sum_{i \in K} y_i \le b + \sum_{i \in C^-} u_i. \]
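The reduction in Remark 1 can be traced term by term. The sketch below assumes that the fixed-charge bound $y_i \le u_i x_i$ is read as an AVUB constraint with zero base capacity, $N(i) = \{i\}$ and $a_{ii} = u_i$; the hatted symbol $\hat{u}_i$ for the base capacity is introduced here only to avoid a clash with the fixed-charge $u_i$.

```latex
\begin{align*}
\hat{u}_i = 0,\; N(i) = \{i\},\; a_{ii} = u_i
  \;&\Longrightarrow\; u(S) = \sum_{i \in S} u_i, \quad a_j(C^+) = u_j, \quad a_j(L^-) = u_j, \\
\gamma = \sum_{i \in L^-} \hat{u}_i = 0
  \;&\Longrightarrow\; (a_j(C^+) - \lambda + \gamma)^+ = (u_j - \lambda)^+, \quad
     \min\{a_j(L^-), \lambda - \gamma\} = \min\{u_j, \lambda\}, \\
b + u(C^-) + \gamma &= b + \sum_{i \in C^-} u_i .
\end{align*}
```

Since $N(C^+) = C^+$ and $N(L^-) = L^-$ under this reading, substituting these values into (5) gives the flow cover inequality displayed in the remark.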

Proposition 3. The additive flow cover inequality (5) is facet-defining for
$\mathrm{conv}(P)$ if the following five conditions are satisfied.

1. $C^- = \emptyset$,
2. $\max_{j \in N(C^+)} a_j(C^+) > \lambda - \gamma$,
3. $a_j(L^-) > \lambda - \gamma$ for some $j \in N(i)$ for all $i \in L^-$ with $u_i = 0$,
4. $u_i \ge 0$ for all $i \in L^-$,
5. $N(L^-) \cap N(M \setminus L^-) = \emptyset$.

Note that if there is no overlap of additive variable upper bounds among
continuous variables, then Condition 5 is trivially satisfied.

2.2 Additive Flow Packing Inequalities


Next we give the second class of valid inequalities for $P$. For $C^+ \subseteq M^+$ and
$C^- \subseteq M^-$, $(C^+, C^-)$ is said to be a flow packing if $\mu = -\lambda = b + u(C^-) - u(C^+) > 0$.
For a flow packing $(C^+, C^-)$, let $L^+ \subseteq M^+ \setminus C^+$ be such that $\gamma = \sum_{i \in L^+} u_i < \mu$
and $K = M^- \setminus C^-$. Then the additive flow packing inequality is
\begin{align*}
& \sum_{i \in C^+ \cup L^+} y_i - \sum_{j \in N(L^+)} \min\{a_j(L^+), \mu - \gamma\} x_j \\
& \quad + \sum_{j \in N(C^-)} (a_j(C^-) - \mu + \gamma)^+ (1 - x_j) - \sum_{i \in K} y_i \le u(C^+) + \gamma. \tag{6}
\end{align*}

Proposition 4. The additive flow packing inequality (6) is valid for P .


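Proof. Let $(\bar{x}, \bar{y}) \in P$ and $T = \{j \in N : \bar{x}_j = 0\}$. Also let
$N(C^-)^+ = \{j \in N(C^-) : a_j(C^-) > \mu - \gamma\}$,
$N(L^+)^+ = \{j \in N(L^+) : a_j(L^+) > \mu - \gamma\}$, and
$N(L^+)^- = N(L^+) \setminus N(L^+)^+$. For $(\bar{x}, \bar{y})$ the left hand side of (6), lhs, equals
\[ \sum_{i \in C^+ \cup L^+} \bar{y}_i - \sum_{j \in N(L^+) \setminus T} \min\{a_j(L^+), \mu - \gamma\} + \sum_{j \in N(C^-)^+ \cap T} (a_j(C^-) - \mu + \gamma) - \sum_{i \in K} \bar{y}_i. \]

If $(N(L^+)^+ \setminus T) \cup (N(C^-)^+ \cap T) = \emptyset$, then
\[ \mathrm{lhs} = \sum_{i \in C^+ \cup L^+} \bar{y}_i - \sum_{j \in N(L^+)^- \setminus T} a_j(L^+) - \sum_{i \in K} \bar{y}_i \le u(C^+) + \gamma. \]
Otherwise,
\begin{align*}
\mathrm{lhs} &\le b + u(C^-) - \sum_{j \in N(C^-) \cap T} a_j(C^-) - \sum_{j \in N(L^+)^+ \setminus T} (\mu - \gamma) \\
&\quad - \sum_{j \in N(L^+)^- \setminus T} a_j(L^+) + \sum_{j \in N(C^-)^+ \cap T} (a_j(C^-) - \mu + \gamma) \\
&\le b + u(C^-) - \mu + \gamma - (\mu - \gamma) \bigl[ |N(L^+)^+ \setminus T| + |N(C^-)^+ \cap T| - 1 \bigr] \\
&\le u(C^+) + \gamma. \qquad \square
\end{align*}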
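As with the flow cover case, inequality (6) can be checked on a small example. The following sketch is illustrative and not taken from the paper: the instance, the function names and the coarse grid of flow values are assumptions. It enumerates every binary vector and every feasible grid point and returns the largest value of lhs minus rhs.

```python
import itertools


def cap_sum(u, a, S):
    """u(S): total capacity of the edges in S when every x_j equals 1."""
    return sum(u[i] + sum(a[i].values()) for i in S)


def coef(a, S, j):
    """a_j(S): sum of a_ij over the edges i in S (0 where j is not in N(i))."""
    return sum(a[i].get(j, 0.0) for i in S)


def binaries(a, S):
    """N(S): binary variables appearing in the AVUB constraints of S."""
    return sorted({j for i in S for j in a[i]})


def flow_packing(u, a, b, C_plus, C_minus, L_plus, K, x, y):
    """Return (lhs, rhs) of the additive flow packing inequality (6) at (x, y)."""
    mu = b + cap_sum(u, a, C_minus) - cap_sum(u, a, C_plus)
    gam = sum(u[i] for i in L_plus)
    assert mu > 0 and gam < mu, "need a flow packing with gamma < mu"
    lhs = sum(y[i] for i in list(C_plus) + list(L_plus)) - sum(y[i] for i in K)
    lhs -= sum(min(coef(a, L_plus, j), mu - gam) * x[j]
               for j in binaries(a, L_plus))
    lhs += sum(max(coef(a, C_minus, j) - mu + gam, 0.0) * (1 - x[j])
               for j in binaries(a, C_minus))
    return lhs, cap_sum(u, a, C_plus) + gam


# Made-up instance: inflow edges 0, 1; outflow edges 2, 3; binaries 0, 1, 2.
u = {0: 2.0, 1: 1.0, 2: 1.0, 3: 0.0}
a = {0: {0: 3.0}, 1: {0: 2.0, 1: 4.0}, 2: {2: 5.0}, 3: {1: 2.0}}
b, M_plus, M_minus = 2.0, [0, 1], [2, 3]
packing = dict(C_plus=[0], C_minus=[2], L_plus=[1], K=[3])


def max_violation_on_grid():
    """Largest lhs - rhs over all x and all feasible y on a 3-level grid."""
    worst = float("-inf")
    edges = sorted(u)
    for bits in itertools.product((0, 1), repeat=3):
        x = dict(enumerate(bits))
        cap = {i: u[i] + sum(c * x[j] for j, c in a[i].items()) for i in edges}
        levels = [(0.0, cap[i] / 2, cap[i]) for i in edges]
        for values in itertools.product(*levels):
            y = dict(zip(edges, values))
            balance = sum(y[i] for i in M_plus) - sum(y[i] for i in M_minus)
            if balance > b:  # skip points violating the balance constraint (1)
                continue
            lhs, rhs = flow_packing(u, a, b, x=x, y=y, **packing)
            worst = max(worst, lhs - rhs)
    return worst


print(max_violation_on_grid())
```

Here $\mu = 3$ and $\gamma = 1$; the point with all binaries at 0 and flows $(2, 1, 1, 0)$ lies on the grid and satisfies (6) with equality. A grid is a deliberately crude choice that keeps the check deterministic.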

Remark 2. For the single node fixed-charge flow model, the additive flow packing
inequality reduces to the flow packing inequality [3]
\[ \sum_{i \in C^+ \cup L^+} y_i - \sum_{i \in L^+} \min\{u_i, \mu\} x_i + \sum_{i \in C^-} (u_i - \mu)^+ (1 - x_i) - \sum_{i \in K} y_i \le \sum_{i \in C^+} u_i. \]

Proposition 5. The additive flow packing inequality (6) is facet-defining for
$\mathrm{conv}(P)$ if the following five conditions are satisfied.
1. $C^+ = \emptyset$,
2. $u(M^-) + b > a_j(C^-) > \mu - \gamma$ for some $j \in N(C^-)$,
3. $a_j(L^+) > \mu - \gamma$ for some $j \in N(i)$ for all $i \in L^+$ with $u_i = 0$,
4. $u_i \ge 0$ for all $i \in L^+$,
5. $N(L^+) \cap N(M \setminus L^+) = N(C^-) \cap N(K) = \emptyset$.

2.3 Generalized Additive Flow Cover Inequalities


In order to derive more general classes of valid inequalities for $P$, we fix a subset
of the binary variables to zero, derive a valid inequality for the resulting
projection, and then lift this inequality with the variables that are fixed to zero. More
precisely, let $F \subseteq N$ and consider the projection
\[ P_F = \Bigl\{ (x, y) \in \mathbb{B}^{n - |F|} \times \mathbb{R}^m_+ : \sum_{i \in M^+} y_i - \sum_{i \in M^-} y_i \le b, \;\; y_i \le u_i + \sum_{j \in N(i) \setminus F} a_{ij} x_j, \; i \in M \Bigr\} \]
of $P$ obtained by fixing $x_j = 0$ for all $j \in F$. We assume that $\mathrm{conv}(P_F)$ is
full-dimensional.
For $S \subseteq M$, let $\bar{u}(S) = \sum_{i \in S} \bigl( u_i + \sum_{j \in N(i) \setminus F} a_{ij} \bigr)$. Let $C^+ \subseteq M^+$ and
$C^- \subseteq M^-$ such that $\lambda = \bar{u}(C^+) - b - \bar{u}(C^-) > 0$ and $L^- \subseteq M^- \setminus C^-$ such that
$\gamma = \sum_{i \in L^-} u_i < \lambda$. Then, from Section 2.1 we have the following valid additive
flow cover inequality for $P_F$
\begin{align*}
& \sum_{i \in C^+} y_i + \sum_{j \in N(C^+) \setminus F} (a_j(C^+) - \lambda + \gamma)^+ (1 - x_j) \\
& \quad - \sum_{j \in N(L^-) \setminus F} \min\{a_j(L^-), \lambda - \gamma\} x_j - \sum_{i \in K} y_i \le b + \bar{u}(C^-) + \gamma. \tag{7}
\end{align*}

Note that inequality (7) is not necessarily valid for $P$. We assume that
the conditions of Proposition 3 are satisfied and hence (7) is facet-defining for
$\mathrm{conv}(P_F)$. In order to derive a generalized additive flow cover inequality for
$P$, we lift (7) in two phases. In the first phase we lift the inequality with the
variables in $N(L^- \cup C^-) \cap F$. Then in the second phase we lift the resulting
inequality with the variables in $N(C^+) \cap F$. When lifting the variables in phases,
for convenience, we make the following assumption:
(A.6) $(N(C^+) \cap F) \cap (N(L^- \cup C^-) \cap F) = \emptyset$.
Even if (A.6) is not satisfied, the lifted inequality is still valid for $P$, but it
may not be facet-defining for $\mathrm{conv}(P)$. Now, let (7) be lifted first with variable
$x_l$, $l \in N(L^- \cup C^-) \cap F$. Then the lifting coefficient associated with $x_l$ is equal
to
\begin{align*}
f(a_l(L^- \cup C^-)) = b + \bar{u}(C^-) + \gamma - \max_{(x, y) \in P_{F \setminus l},\, x_l = 1} \Bigl\{ & \sum_{i \in C^+} y_i + \sum_{j \in N(C^+) \setminus F} (a_j(C^+) - \lambda + \gamma)^+ (1 - x_j) \\
& - \sum_{j \in N(L^-) \setminus F} \min\{a_j(L^-), \lambda - \gamma\} x_j - \sum_{i \in K} y_i \Bigr\}.
\end{align*}

Since (7) satisfies the conditions of Proposition 3, it follows that $\bar{u}(C^+) > \lambda - \gamma$
or equivalently $b + \bar{u}(C^-) + \gamma > 0$. Then the lifting problem has an optimal
solution such that $y_i = 0$ for all $i \in (M^+ \setminus C^+) \cup K$. Let $(\bar{x}, \bar{y})$ be such an optimal
solution and let $S = \{j \in N(C^+) \setminus F : \bar{x}_j = 0\}$ and $T = \{j \in N(L^-) \setminus F : \bar{x}_j = 1\}$.
Clearly, we may assume that $S \subseteq \{j \in N(C^+) \setminus F : a_j(C^+) > \lambda - \gamma\}$ and
$T \subseteq \{j \in N(L^-) \setminus F : a_j(L^-) > \lambda - \gamma\}$; otherwise we can obtain a solution with
the same or better objective value by considering a subset of $S$ or $T$ satisfying
these conditions. There are two cases to consider when determining the value of
$f(a_l(L^- \cup C^-))$ depending on how $\sum_{i \in C^+} y_i$ is bounded in an optimal solution.
We analyze $f(a_l(L^- \cup C^-))$ separately for each case.
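Case 1: $\lambda - \gamma \le \sum_{j \in S} a_j(C^+) + \sum_{j \in T} a_j(L^-) + a_l(L^- \cup C^-)$.
\begin{align*}
f(a_l(L^- \cup C^-)) &= b + \bar{u}(C^-) + \gamma - \Bigl[ \bar{u}(C^+) - \sum_{j \in S} a_j(C^+) + \sum_{j \in S} (a_j(C^+) - \lambda + \gamma) - \sum_{j \in T} (\lambda - \gamma) \Bigr] \\
&= (|S \cup T| - 1)(\lambda - \gamma).
\end{align*}
Case 2: $\lambda - \gamma > \sum_{j \in S} a_j(C^+) + \sum_{j \in T} a_j(L^-) + a_l(L^- \cup C^-)$.
\begin{align*}
f(a_l(L^- \cup C^-)) &= b + \bar{u}(C^-) + \gamma - \Bigl[ b + \bar{u}(C^-) + \gamma + \sum_{j \in T} a_j(L^-) + a_l(L^- \cup C^-) \\
&\qquad + \sum_{j \in S} (a_j(C^+) - \lambda + \gamma) - \sum_{j \in T} (\lambda - \gamma) \Bigr] \\
&= |S \cup T| (\lambda - \gamma) - \sum_{j \in S} a_j(C^+) - \sum_{j \in T} a_j(L^-) - a_l(L^- \cup C^-).
\end{align*}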

Observe that in Case 2, i.e. if the balance constraint is tight, $S = T = \emptyset$
since $a_l(L^- \cup C^-) \ge 0$ and by assumption $a_j(C^+) > \lambda - \gamma$ for all $j \in S$ and
$a_j(L^-) > \lambda - \gamma$ for all $j \in T$. Also in Case 1, $f(a_l(L^- \cup C^-))$ is minimized when
$S = T = \emptyset$. Then, we conclude that $f(a_l(L^- \cup C^-)) = -\min\{a_l(L^- \cup C^-), \lambda - \gamma\}$.
It is easy to see that $f$ is superadditive on $\mathbb{R}_-$, which implies that the lifting
is sequence independent, that is the lifting function $f$ remains unchanged as
the projected variables in $N(L^- \cup C^-) \cap F$ are introduced to inequality (7)
sequentially [7,13]. Therefore,
\begin{align*}
& \sum_{i \in C^+} y_i + \sum_{j \in N(C^+) \setminus F} (a_j(C^+) - \lambda + \gamma)^+ (1 - x_j) - \sum_{j \in N(L^- \cup C^-) \cap F} \min\{a_j(L^- \cup C^-), \lambda - \gamma\} x_j \\
& \quad - \sum_{j \in N(L^-) \setminus F} \min\{a_j(L^-), \lambda - \gamma\} x_j - \sum_{i \in K} y_i \le b + \bar{u}(C^-) + \gamma \tag{8}
\end{align*}
is a valid inequality for $P_{N(C^+) \cap F}$.

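In the second phase, we lift inequality (8) with $x_l$, $l \in N(C^+) \cap F$. The lifting
coefficient of $x_l$ equals
\begin{align*}
g(a_l(C^+)) = b + \bar{u}(C^-) + \gamma - \max_{(x, y) \in P_{(N(C^+) \cap F) \setminus l},\, x_l = 1} \Bigl\{ & \sum_{i \in C^+} y_i + \sum_{j \in N(C^+) \setminus F} (a_j(C^+) - \lambda + \gamma)^+ (1 - x_j) \\
& - \sum_{j \in N(L^-) \setminus F} \min\{a_j(L^-), \lambda - \gamma\} x_j \\
& - \sum_{j \in N(L^- \cup C^-) \cap F} \min\{a_j(L^- \cup C^-), \lambda - \gamma\} x_j - \sum_{i \in K} y_i \Bigr\}.
\end{align*}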

The lifting problem has an optimal solution such that $y_i = 0$ for all
$i \in (M^+ \setminus C^+) \cup K$, $x_j = 1$ for all $j \in N(C^+) \setminus F$ such that $a_j(C^+) \le \lambda - \gamma$, $x_j = 0$
for all $j \in N(L^- \cup C^-) \cap F$ such that $a_j(L^- \cup C^-) \le \lambda - \gamma$, and $x_j = 0$ for all
$j \in N(L^-) \setminus F$ such that $a_j(L^-) \le \lambda - \gamma$. Let $(\bar{x}, \bar{y})$ be such an optimal solution.
Let $R = \{j \in N(L^- \cup C^-) \cap F : \bar{x}_j = 1\}$, $S = \{j \in N(C^+) \setminus F : \bar{x}_j = 0\}$, and
$T = \{j \in N(L^-) \setminus F : \bar{x}_j = 1\}$. Again, there are two cases when determining
the value of $g(a_l(C^+))$ depending on how $\sum_{i \in C^+} y_i$ is bounded in an optimal
solution.
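Case 1: $\lambda - \gamma \le \sum_{j \in S} a_j(C^+) + \sum_{j \in R} a_j(L^- \cup C^-) + \sum_{j \in T} a_j(L^-) - a_l(C^+)$.
\begin{align*}
g(a_l(C^+)) &= b + \bar{u}(C^-) + \gamma - \Bigl[ \bar{u}(C^+) - \sum_{j \in S} a_j(C^+) + a_l(C^+) + \sum_{j \in S} (a_j(C^+) - \lambda + \gamma) - \sum_{j \in R \cup T} (\lambda - \gamma) \Bigr] \\
&= (|R \cup S \cup T| - 1)(\lambda - \gamma) - a_l(C^+).
\end{align*}
Case 2: $\lambda - \gamma > \sum_{j \in S} a_j(C^+) + \sum_{j \in R} a_j(L^- \cup C^-) + \sum_{j \in T} a_j(L^-) - a_l(C^+)$.
\begin{align*}
g(a_l(C^+)) &= b + \bar{u}(C^-) + \gamma - \Bigl[ b + \bar{u}(C^-) + \gamma + \sum_{j \in R} a_j(L^- \cup C^-) + \sum_{j \in T} a_j(L^-) \\
&\qquad + \sum_{j \in S} (a_j(C^+) - \lambda + \gamma) - \sum_{j \in R \cup T} (\lambda - \gamma) \Bigr] \\
&= |R \cup S \cup T| (\lambda - \gamma) - \sum_{j \in S} a_j(C^+) - \sum_{j \in R} a_j(L^- \cup C^-) - \sum_{j \in T} a_j(L^-).
\end{align*}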

Now, let
\[ v_j = \begin{cases} a_j(C^+), & \text{if } j \in N(C^+) \setminus F, \\ a_j(L^- \cup C^-), & \text{if } j \in N(L^- \cup C^-) \cap F, \\ a_j(L^-), & \text{if } j \in N(L^-) \setminus F, \end{cases} \]
\[ x'_j = \begin{cases} 1 - x_j, & \text{if } j \in N(C^+) \setminus F, \\ x_j, & \text{if } j \in N(L^-) \cup (N(C^-) \cap F), \end{cases} \]
and $\{j_1, j_2, \ldots, j_r\} = \{j \in (N(C^+) \setminus F) \cup N(L^-) \cup (N(C^-) \cap F) : v_j > \lambda - \gamma\}$
such that $v_{j_k} \ge v_{j_{k+1}}$ for $k = 1, 2, \ldots, r - 1$. We also define the partial sums
$w_0 = 0$, $w_k = \sum_{i=1}^{k} v_{j_i}$ for $k = 1, 2, \ldots, r$.
It is not hard to show that there is a monotone optimal solution to the
lifting problem. That is, there exists an optimal solution such that $\bar{x}'_{j_k} \ge \bar{x}'_{j_{k+1}}$
for $k = 1, 2, \ldots, r - 1$. Therefore $g(a_l(C^+))$ can be expressed in a closed form as
follows:
\[ g(a_l(C^+)) = \begin{cases} k(\lambda - \gamma) - a_l(C^+), & w_k < a_l(C^+) \le w_{k+1} - \lambda + \gamma, \;\; k = 0, 1, \ldots, r - 1, \\ k(\lambda - \gamma) - w_k, & w_k - \lambda + \gamma < a_l(C^+) \le w_k, \;\; k = 1, 2, \ldots, r, \\ r(\lambda - \gamma) - w_r, & w_r < a_l(C^+). \end{cases} \]

It can be shown that $g$ is superadditive on $\mathbb{R}_-$, which implies that the lifting
function $g$ remains unchanged as the projected variables in $N(C^+) \cap F$ are
introduced to inequality (8) sequentially [7,13]. Hence we have the following
result.

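Proposition 6. The generalized additive flow cover inequality
\begin{align*}
& \sum_{i \in C^+} y_i + \sum_{j \in N(C^+) \setminus F} (a_j(C^+) - \lambda + \gamma)^+ (1 - x_j) + \sum_{j \in N(C^+) \cap F} \alpha_j x_j \\
& \quad - \sum_{j \in N(L^- \cup C^-) \cap F} \min\{a_j(L^- \cup C^-), \lambda - \gamma\} x_j - \sum_{j \in N(L^-) \setminus F} \min\{a_j(L^-), \lambda - \gamma\} x_j - \sum_{i \in K} y_i \le b + \bar{u}(C^-) + \gamma \tag{9}
\end{align*}
with
\[ \alpha_j = \begin{cases} k(\lambda - \gamma) - a_j(C^+), & w_k < a_j(C^+) \le w_{k+1} - \lambda + \gamma, \;\; k = 0, 1, \ldots, r - 1, \\ k(\lambda - \gamma) - w_k, & w_k - \lambda + \gamma < a_j(C^+) \le w_k, \;\; k = 1, 2, \ldots, r, \\ r(\lambda - \gamma) - w_r, & w_r < a_j(C^+), \end{cases} \]
is valid for $P$.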
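The closed form for $\alpha_j$ is straightforward to evaluate once the values $v_j$ are known. The sketch below is a direct transcription of the three cases; the function name `lifting_coefficient` and its argument layout are assumptions made for illustration, and the function expects a strictly positive argument, as guaranteed by (A.1).

```python
def lifting_coefficient(a, v, c):
    """Evaluate the closed-form coefficient alpha_j of Proposition 6.

    a : a_j(C^+) of the variable being lifted (must be > 0)
    v : the values v_j of the other variables, in any order
    c : lambda - gamma (must be > 0)
    """
    if a <= 0 or c <= 0:
        raise ValueError("expected a > 0 and lambda - gamma > 0")
    # Keep only the v_j exceeding lambda - gamma, sorted in nonincreasing order.
    big = sorted((val for val in v if val > c), reverse=True)
    # Partial sums w_0 = 0, w_k = v_{j_1} + ... + v_{j_k}.
    w = [0.0]
    for val in big:
        w.append(w[-1] + val)
    r = len(big)
    for k in range(r):              # first case, k = 0, ..., r - 1
        if w[k] < a <= w[k + 1] - c:
            return k * c - a
    for k in range(1, r + 1):       # second case, k = 1, ..., r
        if w[k] - c < a <= w[k]:
            return k * c - w[k]
    return r * c - w[r]             # third case, a > w_r


# Made-up data: lambda - gamma = 2, so only 5 and 4 exceed the threshold.
for a in (1.0, 4.0, 6.0, 8.0, 10.0):
    print(a, lifting_coefficient(a, [4.0, 1.5, 5.0], 2.0))
```

With these numbers the partial sums are $w = (0, 5, 9)$, and the coefficient decreases linearly on $(0, 3]$ and $(5, 7]$ and is constant elsewhere. The same routine gives the coefficients of Proposition 8 when $\lambda$ is replaced by $\mu$ and $a_j(C^+)$ by $a_j(C^-)$, provided the $v_j$ are defined analogously.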

Proposition 7. The generalized additive flow cover inequality (9) is
facet-defining for $\mathrm{conv}(P)$ if (7) is facet-defining for $\mathrm{conv}(P_F)$.

2.4 Generalized Additive Flow Packing Inequalities


Here we generalize the additive flow packing inequalities with the same approach
taken in Section 2.3 for the additive flow cover inequalities. Consider the
projection $P_F$ of $P$ introduced in Section 2.3. Let $C^+ \subseteq M^+$ and $C^- \subseteq M^-$ such that
$\mu = b + \bar{u}(C^-) - \bar{u}(C^+) > 0$ and $L^+ \subseteq M^+ \setminus C^+$ such that $\gamma = \sum_{i \in L^+} u_i < \mu$.
Then from Section 2.2 we have the following valid additive flow packing
inequality for $P_F$
\begin{align*}
& \sum_{i \in C^+ \cup L^+} y_i - \sum_{j \in N(L^+) \setminus F} \min\{a_j(L^+), \mu - \gamma\} x_j \\
& \quad + \sum_{j \in N(C^-) \setminus F} (a_j(C^-) - \mu + \gamma)^+ (1 - x_j) - \sum_{i \in K} y_i \le \bar{u}(C^+) + \gamma. \tag{10}
\end{align*}

We assume that the conditions of Proposition 5 are satisfied and hence (10)
is facet-defining for $\mathrm{conv}(P_F)$. To introduce the variables in $F$ into inequality
(10), we lift (10) in two phases. First we lift the inequality with the variables in
$N(C^+ \cup L^+) \cap F$. Then in the second phase we lift the resulting inequality with
the variables in $N(C^-) \cap F$. When employing this two phase lifting procedure, for
convenience, we assume that
(A.7) $(N(L^+ \cup C^+) \cap F) \cap (N(C^-) \cap F) = \emptyset$.
The lifting of inequality (10) proceeds similarly to the lifting of inequality (7).
Therefore, we only give the final result here.

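Proposition 8. The generalized additive flow packing inequality
\begin{align*}
& \sum_{i \in C^+ \cup L^+} y_i - \sum_{j \in N(L^+) \setminus F} \min\{a_j(L^+), \mu - \gamma\} x_j - \sum_{j \in N(C^+ \cup L^+) \cap F} \min\{a_j(C^+ \cup L^+), \mu - \gamma\} x_j \\
& \quad + \sum_{j \in N(C^-) \cap F} \alpha_j x_j + \sum_{j \in N(C^-) \setminus F} (a_j(C^-) - \mu + \gamma)^+ (1 - x_j) - \sum_{i \in K} y_i \le \bar{u}(C^+) + \gamma \tag{11}
\end{align*}
with
\[ \alpha_j = \begin{cases} k(\mu - \gamma) - a_j(C^-), & w_k < a_j(C^-) \le w_{k+1} - \mu + \gamma, \;\; k = 0, 1, \ldots, r - 1, \\ k(\mu - \gamma) - w_k, & w_k - \mu + \gamma < a_j(C^-) \le w_k, \;\; k = 1, 2, \ldots, r, \\ r(\mu - \gamma) - w_r, & w_r < a_j(C^-), \end{cases} \]
is valid for $P$.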

Proposition 9. The generalized additive flow packing inequality (11) is
facet-defining for $\mathrm{conv}(P)$ if (10) is facet-defining for $\mathrm{conv}(P_F)$.

3 Computational Results
In this section, we present our computational results on solving network ex-
pansion problems with a branch-and-cut algorithm. We implemented heuristic
separation algorithms for the generalized additive flow cover and flow packing
inequalities for the single node relaxation of the problem. We also used the lifted
cover inequalities [6] for surrogate 0-1 knapsack relaxations of the single node
relaxation, where the continuous flow variables are replaced with either their
0-1 additive variable upper bound variables or with their lower bounds. The
branch-and-cut algorithm was implemented with MINTO [8] (version 3.0) using
CPLEX as LP solver (version 6.0). All of the experiments were performed on a
SUN Ultra 10 workstation with a one hour CPU time limit and a 100,000 nodes
search tree size limit.
We present a summary of two experiments. The first experiment is performed
to test the effectiveness of the cuts in solving a set of randomly generated net-
work expansion problems with 20 vertices and 70% edge density. The instances
were solved using MINTO first with its default settings and then with the above
mentioned cutting planes generated throughout the search tree. In Table 1, we
report the number of AVUB variables per flow variable (avubs) and the average
values for the LP relaxation at the root node of the search tree (zroot), the
best lower bound (zlb) and the best upper bound (zub) on the optimal value at
termination, the percentage gap between zlb and zub (endgap), the number of
generalized additive flow cover cuts (gafcov), generalized additive flow packing
cuts (gafpack), surrogate knapsack cover cuts (skcov) added, the number of
nodes evaluated (nodes), and the CPU time elapsed in seconds (time) for five
random instances. While none of the problems could be solved to optimality
without adding the cuts within 100,000 nodes, all of the problems were solved
easily when the cuts were added. We note that MINTO does not generate any
flow cover inequalities for these problems, since it does not recognize that
additive variable upper bounds can be relaxed to simple variable upper bounds.
Observe that the addition of the cuts improves the lower bounds as well as the
upper bounds significantly, which leads to much smaller search trees and overall
solution times. Table 1 clearly shows the effectiveness of the cuts.

Table 1. Effectiveness of cuts: 20 vertices.

              avubs  zroot    zlb    zub  endgap  gafcov  gafpack  skcov    nodes  time
without cuts      2   9.49  10.22  16.80   39.00       0        0      0  100,000  1386
                  4  12.74  16.99  25.60   33.86       0        0      0  100,000  1018
                  8   2.77  11.51  58.40   79.15       0        0      0  100,000  1859
with cuts         2  15.30  15.40  15.40    0.00      46       42     15        6     1
                  4  23.73  25.20  25.20    0.00      55       33     47       73     5
                  8  36.36  38.40  38.40    0.00      47       34     41      121     9

In the next experiment, we solved larger instances of the network expansion
problem with 20% edge density to find out the sizes of instances that can be
solved with the branch-and-cut algorithm. The results of this experiment are
summarized in Table 2, where we present the number of AVUB variables per
flow variable (avubs), the average values for the percentage difference between
the initial LP relaxation and zub (initgap), the percentage difference between
the LP relaxation after the cuts are added at the root node and zub (rootgap),
in addition to endgap, gafcov, gafpack, skcov, nodes, and time for five random
instances with 50, 100 and 150 vertices. We note that these problems are much
larger than the ones for which computations are provided in the literature [5,11].
Although all of the instances with 50 vertices could be solved to optimality, for
the larger instances the gap between the best lower bound and the best upper
bound could not be completely closed for most of the problems with 4 or 8 avubs
within an hour of CPU time. Nevertheless, the improvement in LP relaxations
is significant, ranging between 50% and 98%.

Table 2. Performance of the branch-and-cut algorithm.

vertices  avubs  initgap  rootgap  endgap  gafcov  gafpack  skcov  nodes  time
      50      1    25.09     1.05    0.00      48       56     16     24     2
              2    20.12     2.20    0.00     115      102     38    110    11
              4    36.34     3.31    0.00     192      139    106    721    49
              8    92.06     3.72    0.00     416      216    164   7712   960
     100      1    14.80     0.94    0.00     182      148     41     44    39
              2    12.27     4.44    0.00     590      324    191   1236   961
              4    38.04     3.74    2.37     707      382    448   2437  2171
              8    92.75     9.15    8.75     828      408    509   1717  3600
     150      1    16.56     0.29    0.00     480      307     57    346   586
              2    10.81     5.39    4.34     586      395    318    813  2965
              4    43.69    12.22   12.22     862      527    672    711  3600
              8    93.08    15.02   15.02     659      449    503    477  3600
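The stated range of improvement in the LP relaxations can be recomputed from Table 2. The sketch below assumes that the improvement is measured as the relative reduction of the gap, (initgap − rootgap) / initgap; the text does not give the formula, but this reading is consistent with the quoted range. The names `TABLE2`, `gap_closed` and `improvement_range` are introduced here for illustration.

```python
# (vertices, avubs) -> (initgap, rootgap), copied from Table 2.
TABLE2 = {
    (50, 1): (25.09, 1.05), (50, 2): (20.12, 2.20),
    (50, 4): (36.34, 3.31), (50, 8): (92.06, 3.72),
    (100, 1): (14.80, 0.94), (100, 2): (12.27, 4.44),
    (100, 4): (38.04, 3.74), (100, 8): (92.75, 9.15),
    (150, 1): (16.56, 0.29), (150, 2): (10.81, 5.39),
    (150, 4): (43.69, 12.22), (150, 8): (93.08, 15.02),
}


def gap_closed(initgap, rootgap):
    """Percentage of the initial gap removed by the root-node cuts (assumed measure)."""
    return 100.0 * (initgap - rootgap) / initgap


def improvement_range(table=TABLE2):
    """Smallest and largest relative gap reduction over all table rows."""
    values = [gap_closed(init, root) for init, root in table.values()]
    return min(values), max(values)


lo, hi = improvement_range()
print(f"gap closed at the root: {lo:.1f}% to {hi:.1f}%")
```

Under this assumption the smallest reduction occurs for 150 vertices with 2 avubs and the largest for 150 vertices with 1 avub.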

For most of the unsolved problems, the best lower bound and the best upper
bound were found at the root node in a few minutes; no improvement in the gap
was observed later in the search tree. For instance nexp.100.8.5 the value of
the initial LP relaxation was 18.23. After adding 916 cuts in 42 rounds the root
LP relaxation improved to 220.81, which was in fact the best lower bound found
in the search tree, in 617 seconds. The best upper bound 259 was again found at
the root node by a simple heuristic which installs the least cost integral capacity
feasible for the flow on each edge provided by the LP relaxation. Therefore, for
the unsolved problems it is likely that the actual duality gaps of the improved
LP relaxations are much smaller.
More detailed experiments to compare the relative effectiveness of the differ-
ent classes of cuts revealed that the generalized additive flow cover inequalities
were the most effective, and that the lifted surrogate knapsack inequalities were
more effective than the generalized additive flow packing inequalities. However,
the use of all three classes of cuts delivered the best performance in most cases.
From these computational results, we conclude that the valid inequalities derived
from the single node relaxations are very effective in improving the LP bounds
for network design/expansion problems.

References
1. K. Aardal, Y. Pochet, and L. A. Wolsey. Capacitated facility location: Valid
inequalities and facets. Mathematics of Operations Research, 20:562–582, 1995.
2. A. Atamtürk. Conflict graphs and flow models for mixed-integer linear optimization
problems. PhD thesis, ISyE, Georgia Institute of Technology, Atlanta, USA, 1998.
3. A. Atamtürk. Flow packing facets of the single node fixed-charge flow polytope.
Technical report, IEOR, University of California at Berkeley, 1998.
4. I. Barany, T. J. Van Roy, and L. A. Wolsey. Uncapacitated lot sizing: The convex
hull of solutions. Mathematical Programming Study, 22:32–43, 1984.
5. D. Bienstock and O. Günlük. Capacitated network design - Polyhedral structure
and computation. INFORMS Journal on Computing, 8:243–259, 1996.
6. Z. Gu, G. L. Nemhauser, and M. W. P. Savelsbergh. Lifted knapsack covers inequal-
ities for 0-1 integer programs: Computation. Technical Report LEC-94-9, Georgia
Institute of Technology, Atlanta GA, 1994. (to appear in INFORMS Journal on
Computing).
7. Z. Gu, G. L. Nemhauser, and M. W. P. Savelsbergh. Sequence independent lifting.
Technical Report LEC-95-08, Georgia Institute of Technology, Atlanta, 1995.
8. G. L. Nemhauser, M. W. P. Savelsbergh, and G. S. Sigismondi. MINTO, a Mixed
INTeger Optimizer. Operations Research Letters, 15:47–58, 1994.
9. M. W. Padberg, T. J. Van Roy, and L. A. Wolsey. Valid linear inequalities for
fixed charge problems. Operations Research, 32:842–861, 1984.
10. Y. Pochet. Valid inequalities and separation for capacitated economic lot sizing.
Operations Research Letters, 7:109–115, 1988.
11. M. Stoer and G. Dahl. A polyhedral approach to multicommodity survivable
network. Numerische Mathematik, 68:149–167, 1994.
12. T. J. Van Roy and L. A. Wolsey. Valid inequalities for mixed 0-1 programs. Discrete
Applied Mathematics, 14:199–213, 1986.
13. L. A. Wolsey. Valid inequalities and superadditivity for 0/1 integer programs.
Mathematics of Operations Research, 2:66–77, 1977.
A Min-Max Theorem on Feedback Vertex Sets
(Preliminary Version)

Mao-cheng Cai (1, *, ∗∗), Xiaotie Deng (2, **) and Wenan Zang (3, ***)

(1) Institute of Systems Science, Academia Sinica
Beijing 100080, P. R. China
[email protected]
(2) Department of Computer Science, City University of Hong Kong
Hong Kong, P. R. China
[email protected]
(3) Department of Mathematics, The University of Hong Kong
Hong Kong, P. R. China
[email protected]

Abstract. We establish a necessary and sufficient condition for the linear
system $\{x : Hx \ge e, x \ge 0\}$ associated with a bipartite tournament
to be TDI, where $H$ is the cycle-vertex incidence matrix and $e$ is the
all-one vector. The consequence is a min-max relation on packing and
covering cycles, together with strongly polynomial time algorithms for
the feedback vertex set problem and the cycle packing problem on the
corresponding bipartite tournaments. In addition, we show that the
feedback vertex set problem on general bipartite tournaments is NP-complete
and approximable within 3.5 based on the max-min theorem.

Key words. feedback vertex set, bipartite tournament, totally dual in-
tegrality, min-max relation, approximation algorithm.

AMS subject classification. 68Q25, 68R10.

1 Introduction

The basic theme of polyhedral combinatorics is the application of linear
programming duality theory to combinatorial problems, the end product of which
is often a combinatorial min-max result. In addition to its aesthetic value and
theoretical interest, a combinatorial min-max relation usually leads to algo-
rithmic solvability of the corresponding optimization problem. The model of
the totally dual integral (abbreviated TDI) systems proposed by Edmonds and
Giles [2,3] serves as a general framework for establishing min-max relations for
* Research partially supported by the National Natural Science Foundation of China.
** Research supported in part by a RGC CERG grant and a SRG grant of City University of Hong Kong.
*** Supported in part by RGC grant 338/024/0009.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 73–86, 1999.
© Springer-Verlag Berlin Heidelberg 1999

various combinatorial problems, such as Fulkerson's optimum arborescence theorem,
the Lucchesi-Younger theorem, Edmonds' matroid intersection theorem, and
Edmonds' matching polyhedron theorem [7]; it is also closely related to many
other models for the same purpose [6].
A rational linear system $\{x : Ax \ge b,\ x \ge 0\}$ is called totally dual integral if
the maximum $\max\{y^T b \mid y^T A \le c^T,\ y \ge 0\}$ has an integral optimum solution $y$
for every integral vector $c$ for which the maximum is finite. As shown by Edmonds
and Giles [2], if $\{Ax \ge b,\ x \ge 0\}$ is TDI, $b$ is integral, and
$\min\{c^T x \mid Ax \ge b,\ x \ge 0\}$ is finite, then it has an integral optimal solution $x$. Giles and Pulleyblank
showed that any rational polyhedron P has a TDI system P = {x : Ax ≤ b}
representation with A integral. Moreover, b can be chosen to be integral if and
only if P is integral [4]. Usually, a polyhedron can be defined by different systems
of linear inequalities. It was proved by Schrijver [8] that if a rational polyhedron
P is of full dimension, then P has a unique minimal TDI system representation
P = {x : Ax ≤ b} with A and b integral if and only if P is integral.
Many combinatorial problems involve naturally defined classes of polyhe-
dra, for instance, the vertex cover problem of graphs. Let A be the edge-vertex
incidence matrix of a graph G. Then {x : Ax ≥ e, x ∈ {0, 1}} is the collec-
tion of vertex cover sets. A classical theorem asserts that the linear relaxation
{x : Ax ≥ e, x ≥ 0} is TDI if and only if G is bipartite. In a recent work [1], we
investigated the feedback vertex set problem on tournaments using a similar ap-
proach. Given a digraph D = (V, E) with weight on each vertex, a subset S ⊂ V
is called a feedback vertex set if V(C) ∩ S ≠ ∅ for any (directed) cycle C in D.
The problem of finding a feedback vertex set with the minimum total weight
is called the feedback vertex set problem. With H the triangle-vertex incidence
matrix of a tournament, we established in [1] a necessary and sufficient condition
for {x : Hx ≥ e, x ≥ 0} to be TDI. This allowed us to obtain a 2.5-approximation
algorithm (based on the subgraph removal technique) for the minimum feedback
vertex set problem on tournaments, improving a previously known algorithm with
a performance guarantee of three by Speckenmeyer [9].
We are interested in extending this approach to other problems. In this work,
we study the feedback vertex set problem on bipartite tournaments, where a
bipartite tournament is an orientation of a complete bipartite graph. We prove
that a linear system associated with the minimum feedback vertex set problem on
a bipartite tournament T is TDI if and only if T contains no F1 nor F2 (see Figure
1). We also give strongly polynomial time algorithms for the feedback vertex
set problem and the cycle packing problem on these bipartite tournaments. In
comparison with the previous work [1] for tournaments, this work requires deeper
insight into the mathematical structure of bipartite tournaments. In fact, it
would be a formidable task (if not impossible) to adapt the proof in the previous
work to the more complicated problem studied here, and the present proof is
much more concise and mathematically easier to understand.
[Figure 1: Two Forbidden Bipartite Subtournaments, F1 (on vertices p1, ..., p6) and F2 (on vertices p1, ..., p7).]

In Section 2, we introduce notations and give a structural description of
bipartite tournaments with no F1 nor F2. In Section 3, we proceed to investigate
the maximum cycle-packing problem on these bipartite tournaments and prove
that the packing problem has an integral optimum solution for any nonnegative
integral weight function w on vertices. In Section 4, we establish a TDI system
associated with the cycle-covering problem, which yields an integral optimum
solution to the LP relaxation for the minimum cycle-covering problem on these
bipartite tournaments. As a result, we obtain a min-max theorem: the
cycle-packing number equals the cycle-covering number for any bipartite tournament
with no F1 nor F2. In addition, we present strongly polynomial time algorithms
for the cycle-packing and cycle-covering problems. In Section 5, we show the
NP-completeness of the feedback vertex set problem on general bipartite tournaments. Thus
it is natural to consider the approximation problem. Clearly, a 4-approximation
algorithm for this problem can be obtained by the primal-dual method [5]. Based
on the TDI system, we shall be able to improve the ratio from 4 to 3.5. This
exhibits another example of applying the duality structures of linear programs
to approximation algorithms. In Section 6, we conclude this paper with remarks
and discussion.

2 Preliminaries

We consider a bipartite tournament T = (V, E; w), with a weight function on
the vertices w : V → R₊ = {x : x ≥ 0}. Let (u, v) denote the arc of T from
vertex u to vertex v, let N⁻(v) = {u ∈ V | (u, v) ∈ E} and N⁺(v) = {u ∈
V | (v, u) ∈ E}, and let d(u, v) stand for the distance from u to v. Without loss of
generality, let us assume henceforward that T under consideration is strongly
connected for otherwise we may consider its components separately. Clearly, in
bipartite tournaments, S ⊆ V is a feedback vertex set if and only if S intersects
every (directed) cycle of length four, denoted by C4 . Thus the feedback vertex
set problem is actually the C4 -covering problem. Similarly, the cycle packing
problem is actually the C4 -packing problem. We shall focus on C4 ’s in place of
cycles in the remainder of the paper.
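
As an illustration of the reduction to C4's, the following minimal sketch compares the two conditions on one small instance. It assumes the digraph is given as a vertex collection and a set of arcs; the function names are hypothetical, and the example arcs are those of the component S_j from Section 5, which the text describes as isomorphic to F1.

```python
from itertools import combinations, permutations

def directed_4_cycles(vertices, arcs):
    """All directed 4-cycles, each written starting at its smallest vertex."""
    cycles = set()
    for a, b, c, d in permutations(sorted(vertices), 4):
        # Requiring a to be the smallest vertex reports each cycle once.
        if a == min(a, b, c, d) and {(a, b), (b, c), (c, d), (d, a)} <= arcs:
            cycles.add((a, b, c, d))
    return cycles

def is_acyclic(vertices, arcs):
    """Repeatedly delete vertices without in-arcs; a cycle exists iff this gets stuck."""
    remaining = set(vertices)
    while remaining:
        sources = {v for v in remaining
                   if not any((u, v) in arcs for u in remaining)}
        if not sources:
            return False
        remaining -= sources
    return True

def hits_all_c4(S, vertices, arcs):
    """True if S meets every directed 4-cycle."""
    return all(set(cycle) & set(S) for cycle in directed_4_cycles(vertices, arcs))

# Example instance (arcs taken from the description of S_j in Section 5).
VERTICES = [1, 2, 3, 4, 5, 6]
ARCS = {(1, 2), (3, 4), (5, 6), (2, 3), (2, 5), (4, 1), (4, 5), (6, 1), (6, 3)}

def agree_on_all_subsets(vertices, arcs):
    """Check 'S hits all C4's' against 'T - S is acyclic' for every subset S."""
    for r in range(len(vertices) + 1):
        for S in combinations(vertices, r):
            rest = [v for v in vertices if v not in S]
            if hits_all_c4(S, vertices, arcs) != is_acyclic(rest, arcs):
                return False
    return True
```

On this instance the two tests agree for every vertex subset, which is what the equivalence predicts; the brute-force enumeration is only meant for small examples.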
More formally, a C4-packing in T is a family of C4's (repetition is allowed)
in T such that each vertex v is contained in at most w(v) of the C4's in this family. A
maximum C4-packing in T is a C4-packing in T with the largest size. The
C4-packing number of T is the size of a maximum C4-packing in T. A C4-covering
in T is a vertex set S ⊆ V that intersects each C4 in T. The size of S, denoted
by w(S), is $\sum_{v \in S} w(v)$. A minimum C4-covering in T is a C4-covering with the
smallest size; the C4-covering number of T is the size of a minimum C4-covering
in T.
The C4 -covering number of T is always greater than or equal to the C4 -
packing number of T . The situation in which the packing and covering numbers
are equal is particularly interesting. We point out that equality does not neces-
sarily hold on general bipartite tournaments: both F1 and F2 have C4 -packing
number of 1 and C4 -covering number of 2. We shall demonstrate, based on the
above-mentioned TDI system, that actually F1 and F2 are the only obstructions
for our problem: if a bipartite tournament T = (V, E) contains no F1 nor F2 ,
then the C4 -packing number of T equals the C4 -covering number of T .
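
The claim about F1 can be checked by brute force. The sketch below assumes unit weights and uses the arc set of S_j from Section 5 (stated there to be isomorphic to F1, with x_j^h ↔ p_{2h} and y_j^h ↔ p_{2h−1}); the helper names are hypothetical.

```python
from itertools import combinations, permutations

# Arcs of F1 on p1..p6, read off from the description of S_j in Section 5.
F1_VERTICES = [1, 2, 3, 4, 5, 6]
F1_ARCS = {(1, 2), (3, 4), (5, 6), (2, 3), (2, 5), (4, 1), (4, 5), (6, 1), (6, 3)}

def c4_vertex_sets(vertices, arcs):
    """Vertex sets of all directed 4-cycles."""
    found = set()
    for a, b, c, d in permutations(vertices, 4):
        if {(a, b), (b, c), (c, d), (d, a)} <= arcs:
            found.add(frozenset((a, b, c, d)))
    return found

def covering_number(vertices, cycles):
    """Smallest number of vertices meeting every cycle (unit weights)."""
    for size in range(len(vertices) + 1):
        for S in combinations(vertices, size):
            if all(cycle & set(S) for cycle in cycles):
                return size

def packing_number(cycles):
    """Largest number of pairwise vertex-disjoint cycles (unit weights)."""
    cycles = list(cycles)
    for size in range(len(cycles), 0, -1):
        for family in combinations(cycles, size):
            if all(not (c1 & c2) for c1, c2 in combinations(family, 2)):
                return size
    return 0

F1_CYCLES = c4_vertex_sets(F1_VERTICES, F1_ARCS)
```

With these inputs the sketch finds three C4's, a packing number of 1 and a covering number of 2, in line with the values quoted above for F1.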

Let us now present a structural description of the bipartite tournaments with
no F1 nor F2, which will be used repeatedly later.

Lemma 2.1 Let $T = (V, E)$ be a strongly connected bipartite tournament with
no subdigraph isomorphic to $F_1$ nor $F_2$. Then the vertex set $V$ can be
partitioned into $V_1, V_2, \ldots, V_k$ for some $k$ with $4 \le k \le |V|$, which have the following
properties:

(a) For $i = 1, 2, \ldots, k$, $V_i$ is independent;
(b) For $i = 1, 2, \ldots, k$, $V_i$ admits a linear order $\prec$ such that for any arc $(u, v) \in E$, we have
• $(u, x) \in E$ whenever $v \prec x$ and
• $(x, v) \in E$ whenever $x \prec u$.
(c) There is no arc between $V_i$ and $V_{i+2j}$ for $1 \le i \le k-2$ and $1 \le j \le \lfloor (k-i)/2 \rfloor$;
each arc between $V_i$ and $V_{i+1+2j}$ is directed from $V_i$ to $V_{i+1+2j}$ for $1 \le i \le k-3$
and $1 \le j \le \lfloor (k-1-i)/2 \rfloor$.

Proof. Let us reserve the symbol $w$ for a vertex in $T$ with minimum indegree
throughout the proof. For $i = 0, 1, 2, \ldots$, define

$V_{i+1} := \{v \in V \mid d(v, w) = i\}.$

Let $k$ stand for the largest subscript with $V_k \ne \emptyset$. Then $V_1, V_2, \ldots, V_k$ form a
partition of $V$ as $T$ is strongly connected. Note that each $V_i$ is independent for
$T$ is bipartite. In addition, $k \ge 4$ for $N^+(w) \ne \emptyset$.
We shall show that $V_1, V_2, \ldots, V_k$ have the desired properties. For this
purpose, observe that
(2.1) For each $v \in V_i$ with $2 \le i \le k$, $N^+(v) \cap V_{i-1} \ne \emptyset$ by the definition of
$V_i$.
(2.2) For each $v \in V_i$ with $i = 2, 3$, $N^-(v) \cap V_{i+1} \ne \emptyset$, since otherwise, if
$v \in V_2$, then $v$ would be a source of $T$, contradicting the strong connectivity of
$T$; if $v \in V_3$, then $d^-(v) < d^-(w)$ for $T$ is bipartite, contradicting the choice of
$w$.
(2.3) There is no arc between $V_i$ and $V_{i+2j}$ for $1 \le i \le k$ and $1 \le j \le \lfloor (k-i)/2 \rfloor$
as $T$ is bipartite.
(2.4) Each arc between $V_i$ and $V_{i+1+2j}$ is directed from $V_i$ to $V_{i+1+2j}$ for
$1 \le i \le k-3$ by the definition of $V_{i+1+2j}$, $j \ge 1$.
Property (c) thus follows. To prove that V1 , V2 , . . . , Vk enjoy properties (a)
and (b), we make the following observations.
(2.5) There is no C4 in [Vi ∪ Vi+1 ] for i = 1, 2, . . . , k − 1.
Assume the contrary: some [Vi ∪ Vi+1 ] contains a C4 . Let i be such smallest
subscript and let v1 v2 v3 v4 v1 be a C4 with v1 , v3 ∈ Vi and v2 , v4 ∈ Vi+1 . Then
i ≥ 2. Now let us distinguish between two cases.
Case 1. i = 2. If there exists v5 ∈ V4 such that (v5 , v2 ), (v5 , v4 ) ∈ E, then
{v1 , . . . , v5 , w} induces an F1 in T with vertex correspondence v1 ↔ p2 , v2 ↔
p3 , v3 ↔ p4 , v4 ↔ p1 , v5 ↔ p6 , w ↔ p5 , a contradiction. Thus, by (2.2), there
exist two vertices v5 , v6 ∈ V4 such that (v5 , v2 ), (v6 , v4 ), (v2 , v6 ), (v4 , v5 ) ∈ E. So
{v1 , . . . , v6 , w} induces an F2 in T with correspondence v1 ↔ p1 , v2 ↔ p2 , v3 ↔
p5 , v4 ↔ p6 , v5 ↔ p7 , v6 ↔ p3 , w ↔ p4 , again a contradiction.
Case 2. $i \ge 3$. First observe that there exists $v \in V_{i-1}$ such that $(v_1, v), (v_3, v) \in E$.
(Indeed, by (2.1) there exist $v, v' \in V_{i-1}$ such that $(v_1, v), (v_3, v') \in E$. Thus
either $(v_1, v') \in E$ or $(v_3, v) \in E$ for otherwise $v_1 v v_3 v' v_1$ is a $C_4$ in $[V_{i-1} \cup V_i]$,
contradicting the choice of $i$.) Now (2.1) guarantees the existence of $u$ in $V_{i-2}$
such that $(v, u) \in E$. In view of (2.4), $(u, v_2), (u, v_4) \in E$. So $[\{v_1, v_2, v_3, v_4, v, u\}]$
is isomorphic to $F_1$ with correspondence $v_1 \leftrightarrow p_2$, $v_2 \leftrightarrow p_3$, $v_3 \leftrightarrow p_4$, $v_4 \leftrightarrow p_1$,
$v \leftrightarrow p_5$, $u \leftrightarrow p_6$, a contradiction.
(2.6) There is no C4 in [Vi ∪ Vi+1 ∪ Vi+2 ] for i = 1, 2, . . . , k − 2.
Assume the contrary: some [Vi ∪ Vi+1 ∪ Vi+2 ] contains a C4 . Let i be such
smallest subscript and let v1 v2 v3 v4 v1 be a C4 in [Vi ∪ Vi+1 ∪ Vi+2 ]. Then, by
(2.5), we may assume v1 ∈ Vi , v2 , v4 ∈ Vi+1 and v3 ∈ Vi+2 . In view of (2.1), there
exists v5 ∈ Vi such that (v2 , v5 ) ∈ E. Thus (v4 , v5 ) ∈ E for otherwise v1 v2 v5 v4 v1
would be a C4 in [Vi ∪ Vi+1 ], contradicting (2.5). Now we consider two cases.
In case i = 2, (2.2) guarantees the existence of v6 ∈ Vi+2 with (v6 , v2 ) ∈ E. In
view of (2.5), (v6 , v4 ) ∈ E. By (2.4), (w, v6 ), (w, v3 ) ∈ E. Thus [{w, v1 , . . . , v6 }] is
isomorphic to F2 with correspondence v1 ↔ p1 , v2 ↔ p4 , v3 ↔ p3 , v4 ↔ p6 , v5 ↔
p7 , v6 ↔ p5 , w ↔ p2 , a contradiction.
In case i ≥ 3, it can be shown similarly that there exist v6 ∈ Vi−1 and v7 ∈
Vi−2 with (v1 , v6 ), (v5 , v6 ), (v6 , v3 ), (v6 , v7 ), (v7 , v2 ), (v7 , v4 ) ∈ E. Thus {v1 ,. . ., v7 }
induces an F2 with the correspondence v1 ↔ p1 , v2 ↔ p4 , v3 ↔ p3 , v4 ↔ p6 , v5 ↔
p7 , v6 ↔ p2 , v7 ↔ p5 , a contradiction.
Now let us introduce a partial order ≺ on each Vi as follows. For any u, v ∈ Vi ,
define u ≺ v if d(u, v) = 2, in other words, there exists a directed path from u
to v with length 2. Then ≺ is well defined. To justify it, note that at most one
of u ≺ v and v ≺ u can occur, since otherwise u and v would be on a C4 in
one of [Vi−1 ∪ Vi ], [Vi ∪ Vi+1 ], and [Vi−1 ∪ Vi ∪ Vi+1 ], contradicting (2.5) or (2.6).
Moreover, v ≺ v will never occur for any vertex v.
(2.7) For any x, y, z ∈ Vi, if x ≺ y and y ≺ z, then x ≺ z.
Suppose x ≺ y is defined by path xuy and y ≺ z defined by path yvz. Then
u ≠ v since (u, y), (y, v) ∈ E. It follows that (x, v) ∈ E for otherwise xuyvx
would be a C4 in one of [Vi−1 ∪ Vi], [Vi ∪ Vi+1], and [Vi−1 ∪ Vi ∪ Vi+1], a
contradiction. Thus path xvz yields x ≺ z.

Let us now extend the partial order ≺ to a linear order on the whole Vi as
follows: for any two incomparable (according to ≺) vertices u, v ∈ Vi , assign an
arbitrary order between u and v. Then ≺ is the desired linear order.
(2.8) For any arc (u, v) ∈ E, we have (u, x) ∈ E whenever v ≺ x and
(x, v) ∈ E whenever x ≺ u.
Otherwise, we have path xuv, implying x ≺ v in the former case and have
path uvx, implying u ≺ x in the latter, a contradiction.
The proof is complete. □
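
The layering used in the proof above can be sketched with a breadth-first search along reversed arcs. This is an illustrative sketch only: it assumes the tournament is strongly connected and given as a vertex list and an arc collection, breaks ties for the minimum-indegree vertex by list order, and uses a hypothetical function name.

```python
from collections import deque

def distance_layers(vertices, arcs):
    """Return (w, layers) where layers[i] holds the vertices v with d(v, w) = i,
    i.e. layers[i] plays the role of V_{i+1} in Lemma 2.1."""
    indegree = {v: 0 for v in vertices}
    predecessors = {v: [] for v in vertices}
    for u, v in arcs:
        indegree[v] += 1
        predecessors[v].append(u)
    # w is a vertex of minimum indegree (first one in list order on ties).
    w = min(vertices, key=lambda v: indegree[v])
    # BFS over reversed arcs yields the distance from each vertex TO w.
    dist = {w: 0}
    queue = deque([w])
    while queue:
        v = queue.popleft()
        for u in predecessors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    layers = [[] for _ in range(max(dist.values()) + 1)]
    for v in vertices:
        layers[dist[v]].append(v)  # raises KeyError if T is not strongly connected
    return w, layers
```

For a single directed 4-cycle a → b → c → d → a, the sketch picks w = a and returns the four singleton layers {a}, {d}, {c}, {b}, so k = 4 as the lemma requires.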

Corollary 2.1 Let bipartite tournament $T = (V, E)$ and $\{V_1, V_2, \ldots, V_k\}$, a
partition of $V$, be as described in Lemma 2.1. If $uvxyu$ is a $C_4$ in $T$, then there
exists a subscript $i$ with $1 \le i \le k-3$ such that $u \in V_{i+3}$, $v \in V_{i+2}$, $x \in V_{i+1}$,
and $y \in V_i$ (renaming $u$, $v$, $x$ and $y$ if necessary).
Proof. Since the distance between any two vertices on cycle $uvxyu$ is at most
three, it follows from the definition of $V_i$ that $u$, $v$, $x$ and $y$ are located in at most
four consecutive sets, say $V_i \cup V_{i+1} \cup V_{i+2} \cup V_{i+3}$. On the other hand, by (2.5)
and (2.6) any three consecutive sets contain no $C_4$. So the desired statement
follows. □

Lemma 2.2 Let $T = (V, E)$ be a strongly connected bipartite tournament.
Then either one of $F_1$ and $F_2$ in $T$ or a partition $\{V_1, V_2, \ldots, V_k\}$ of $V$ as
described in Lemma 2.1 can be found in time $O(|V|^2)$.
Proof. First note that the vertex $w$ and the partition $\{V_1, V_2, \ldots, V_k\}$ can be
determined by the breadth-first search in time $O(|V|^2)$.
In order to establish a linear order for each $V_i$, let us set $X_2 = V_3$, $X_k = V_{k-1}$,
and $X_i = V_{i-1} \cup V_{i+1}$ for $3 \le i \le k-1$. Then the order of each $V_i$ will be
determined based on $X_i$. Since $V_1$ is a singleton, its order is trivial. Suppose we
have determined the order of $V_1 \cup V_2 \cup \cdots \cup V_{i-1}$; let us proceed to the order of
$V_i$, where $i \ge 2$.
Set $V_{i,1} = V_i$ and $\ell_i = 1$. Let $P_i = \{V_{i,1}, V_{i,2}, \ldots, V_{i,\ell_i}\}$ be an ordered partition
of $V_i$ with $V_{i,1} \prec V_{i,2} \prec \ldots \prec V_{i,\ell_i}$. We scan the vertices in $X_i$ successively;
suppose $v \in X_i$ is the vertex in our consideration.
(2.9) If there exist $x \in V_{i,h}$ and $y \in V_{i,j}$ such that
• $h < j$,
• $x \in N^+(v)$, and
• $y \in N^-(v)$,
then there exists a $C_4$ containing $v$, $x$ and $y$ in one of $[V_{i-1} \cup V_i]$, $[V_i \cup V_{i+1}]$ and
$[V_{i-1} \cup V_i \cup V_{i+1}]$. Following the proof of (2.5) and (2.6), we can thus output an $F_1$
or $F_2$ in time $O(|V|)$ and stop; else,
(2.10) If there exists $h$, $1 \le h \le \ell_i$, such that
• $V_{i,h} \cap N^-(v) \ne \emptyset$,
• $V_{i,h} \cap N^+(v) \ne \emptyset$,
• $V_{i,j} \subset N^-(v)$ for all $j < h$, and
• $V_{i,j} \subset N^+(v)$ for all $j > h$,
then set $V_{i,h_1} = V_{i,h} \cap N^-(v)$, $V_{i,h_2} = V_{i,h} \cap N^+(v)$. Replace $V_{i,h}$ by $\{V_{i,h_1}, V_{i,h_2}\}$
in $P_i$ with $V_{i,h_1} \prec V_{i,h_2}$, and replace $\ell_i$ by $\ell_i + 1$. (Since
$N^-(v) \cap V_i = V_{i,h_1} \cup \bigcup_{j=1}^{h-1} V_{i,j}$ and $N^+(v) \cap V_i = V_{i,h_2} \cup \bigcup_{j=h+1}^{\ell_i} V_{i,j}$,
for each $x \in V_{i,h_1} \cup \bigcup_{j=1}^{h-1} V_{i,j}$ and each $y \in V_{i,h_2} \cup \bigcup_{j=h+1}^{\ell_i} V_{i,j}$,
there is a directed path $xvy$. So $x \prec y$ and thus the original
order of $P_i$ is preserved after the replacement of $V_{i,h}$.) We then scan the next
vertex in $X_i$ and repeat the process until no vertex is unscanned.
Since each $v \in X_i$ can be scanned in time $O(|V|)$, the total time complexity
is $\sum_{i=2}^{k} O(|V||X_i|) + O(|V|^2) = O(|V|^2)$. □
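
One scanning step of this procedure, covering the test (2.9) and the split (2.10), might be sketched as follows. The sketch assumes the current ordered partition of V_i is a list of disjoint sets and that every vertex of V_i outside N⁻(v) is an out-neighbour of v (which holds in a bipartite tournament for v in X_i); the function name and the use of None as a failure signal are hypothetical.

```python
def refine(blocks, in_neighbours):
    """Refine the ordered partition `blocks` of V_i using a scanned vertex v.

    blocks        -- list of disjoint sets, smallest block first
    in_neighbours -- the set N^-(v) ∩ V_i; all other vertices of V_i count as N^+(v)

    Returns the refined ordered partition, or None if condition (2.9) holds,
    i.e. an out-neighbour of v sits in a strictly earlier block than an in-neighbour.
    """
    refined = []
    seen_out_neighbour = False
    for block in blocks:
        ins = block & in_neighbours
        outs = block - in_neighbours
        if ins and seen_out_neighbour:
            return None  # (2.9): a forbidden subdigraph can be extracted
        if ins:
            refined.append(ins)   # in-neighbours precede out-neighbours, as in (2.10)
        if outs:
            refined.append(outs)
            seen_out_neighbour = True
    return refined
```

Each call is linear in |V_i|, consistent with the O(|V|) cost per scanned vertex used in the complexity bound; extracting the actual F1 or F2 in the failure case is not part of this sketch.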

3 Optimal Cycle Packings


Let $\mathcal{C}_4$ be the set of all $C_4$'s in $T$ and let $H$ be the incidence matrix of $\mathcal{C}_4$ whose
rows and columns are indexed by $\mathcal{C}_4$ and $V$, respectively, such that $H_{C,v} = 1$ if
$v \in C$ and $0$ otherwise for each $C \in \mathcal{C}_4$ and $v \in V$.
In this section, the bipartite tournament $T = (V, E)$ is confined to one with
no $F_1$ nor $F_2$. Let us proceed to investigate the (fractional) $C_4$-packing problem

$\max\{y^T e_m \mid y^T H \le w^T,\ y \ge 0\}$   (1)

where $m = |\mathcal{C}_4|$, $n = |V|$, $e_m$ is the all-one column vector of size $m$, and $w$ is in
$\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x \ge 0\}$.
Without loss of generality, we may assume that T is strongly connected. Since
T contains no F1 nor F2 , V admits a partition {V1 , V2 , . . . , Vk } as described in
Lemma 2.1. Let $D$ denote the digraph obtained from $T$ by removing all arcs
from $V_i$ to $V_j$ for any $i < j$, let $P_4$ denote a (directed) path with 4 vertices in
$D$, and let $\mathcal{P}_4$ be the set of all $P_4$'s in $D$. Then it follows from Corollary 2.1
that there is a one-to-one correspondence between $\mathcal{C}_4$ of $T$ and $\mathcal{P}_4$ of $D$. Hence
$\mathcal{C}_4$ and $\mathcal{P}_4$ have the same incidence matrix, and the $C_4$-packing problem on $T$
is equivalent to the $P_4$-packing on $D$. For convenience, we consider the latter
problem instead of the former.
Let ≺ be the linear order on each Vi as defined in Lemma 2.1. Recall that
the order does not apply to any two vertices in distinct Vi ’s. Let us now fill this
gap and extend ≺ to the whole vertex-set V of D.
(3.1) Define u ≺ v for any u ∈ Vi and v ∈ Vj with i < j.
Note that if v1 v2 v3 v4 is a P4 in D then, according to (3.1), we have v4 ≺
v3 ≺ v2 ≺ v1 . The order ≺ on V leads to a lexicographic order on P4 as follows.
(3.2) Let $Q_1 = u_1 u_2 u_3 u_4$ and $Q_2 = v_1 v_2 v_3 v_4$ be two $P_4$'s in $D$. Define
$Q_1 \prec Q_2$ if $u_j \prec v_j$ for the largest subscript $j$ with $u_j \ne v_j$.
Based on (3.2), we can further derive a lexicographic order on the packing
polytope $\{y \in \mathbb{R}^m_+ \mid y^T H \le w^T\}$. For this purpose, we assume that the rows of
the incidence matrix $H$ are arranged in the increasing order of $P_4$'s.
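
The order (3.2) on paths amounts to comparing the vertex sequences from the last position backwards. A minimal sketch, assuming each vertex carries a rank (layer index, position within its layer) that realises the order ≺ of (3.1); the rank table and function names are hypothetical.

```python
def path_key(path, rank):
    """Sort key for a path u1 u2 u3 u4: ranks read from u4 back to u1, so that
    tuple comparison is decided at the largest subscript where two paths differ."""
    return tuple(rank[v] for v in reversed(path))

def precedes(q1, q2, rank):
    """True if q1 ≺ q2 in the sense of (3.2)."""
    return path_key(q1, rank) < path_key(q2, rank)

# Hypothetical ranks: (layer index, position inside the layer).
RANK = {"a": (1, 0), "a2": (1, 1), "d": (2, 0), "c": (3, 0), "b": (4, 0), "b2": (4, 1)}
```

Sorting all P4's by `path_key` gives the row order assumed for H, and the order (3.3) on packings is then ordinary lexicographic comparison of the vectors y in that row order.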
(3.3) Let $y = (y_1, y_2, \ldots, y_m)^T$ and $y' = (y'_1, y'_2, \ldots, y'_m)^T$ be two fractional
$P_4$-packings in $D$, that is, both $y$ and $y'$ are in $\{y \in \mathbb{R}^m_+ \mid y^T H \le w^T\}$. Define
$y \prec y'$ if $y_j < y'_j$ for the smallest subscript $j$ with $y_j \ne y'_j$.
(3.4) Two directed paths $P = u_1 u_2 u_3 u_4$ and $P' = u'_1 u'_2 u'_3 u'_4$ are said to be
crossing if $P \prec P'$ and if there exist two vertices $u_i, u'_j$, $1 \le i, j \le 4$, and some
subscript $\ell$ such that $u'_j \prec u_i$ and $u_i, u'_j \in V_\ell$.
For the above crossing pair, in view of Corollary 2.1, $P$ is contained in
$[\bigcup_{h=s}^{s+3} V_h]$ for some $s \le k-3$, $P'$ is contained in $[\bigcup_{h=t}^{t+3} V_h]$ for some $t$ with
$s \le t \le s+3$, and each $V_h$ contains at least one and at most two vertices of $P$ and
$P'$, where $s \le h \le t+3$. Let $v_h$ and $v'_h$ denote the vertices in $V_h \cap (V(P) \cup V(P'))$
with $v_h \preceq v'_h$ ($v_h = v'_h$ if $V_h$ contains only one vertex of $P$ and $P'$). Define

$P \wedge P' = v_{s+3} v_{s+2} v_{s+1} v_s$ and $P \vee P' = v'_{t+3} v'_{t+2} v'_{t+1} v'_t$.

Then we have
(3.5) $P \wedge P', P \vee P' \in \mathcal{P}_4$ and $P \wedge P' \prec P \prec P' \prec P \vee P'$.
Indeed, for each subscript $h$ between $s+1$ and $s+3$, if neither $P$ nor $P'$
contains $(v_h, v_{h-1})$, then $v_h \prec v'_h$ and $(v'_h, v_{h-1})$ is an arc of $P$ or $P'$. Thus by
Lemma 2.1 (b), $(v_h, v_{h-1}) \in E$. It follows that $P \wedge P' \in \mathcal{P}_4$. Similarly, $P \vee P' \in \mathcal{P}_4$.
Since $u'_j \in P \wedge P'$ and $u_i \in P \vee P'$, we have $P \wedge P' \prec P \prec P' \prec P \vee P'$.
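
The vertex selection behind P ∧ P′ and P ∨ P′ can be sketched as below. The sketch only picks the smaller (respectively larger) vertex in each layer; it does not verify that the results are paths of D, which is what (3.5) establishes. It assumes both paths are listed from the highest layer to the lowest, that `layer` and `rank` are lookup tables for the layer index and the position inside the layer, and that s ≤ t ≤ s + 3; all names are hypothetical.

```python
def meet_and_join(p, p_prime, layer, rank):
    """Return (P ∧ P', P ∨ P') as vertex tuples, highest layer first."""
    by_layer = {}
    for v in list(p) + list(p_prime):
        by_layer.setdefault(layer[v], set()).add(v)
    s = min(layer[v] for v in p)        # lowest layer touched by P
    t = min(layer[v] for v in p_prime)  # lowest layer touched by P'
    # P ∧ P' takes the smaller vertex in layers s+3, ..., s.
    meet = tuple(min(by_layer[h], key=rank.get) for h in range(s + 3, s - 1, -1))
    # P ∨ P' takes the larger vertex in layers t+3, ..., t.
    join = tuple(max(by_layer[h], key=rank.get) for h in range(t + 3, t - 1, -1))
    return meet, join

# Hypothetical crossing pair living in layers 1..4.
LAYER = {"b1": 4, "b2": 4, "c1": 3, "c2": 3, "d1": 2, "a1": 1, "a2": 1}
RANK = {"b1": 0, "b2": 1, "c1": 0, "c2": 1, "d1": 0, "a1": 0, "a2": 1}
P = ("b1", "c2", "d1", "a1")
P_PRIME = ("b2", "c1", "d1", "a2")
```

In this example c1 from P′ precedes c2 from P inside the same layer, so the pair is crossing; the meet collects the smaller vertices and the join the larger ones, and together they use each vertex exactly as often as P and P′ do.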

Lemma 3.1 Let $T = (V, E)$ be a bipartite tournament with vertex-weight $w \ge 0$
and with no subdigraph isomorphic to $F_1$ nor $F_2$. Then the lexicographically
largest packing $\hat y \in \{y \in \mathbb{R}^m_+ \mid y^T H \le w^T\}$ is an optimum solution to (1).

Proof. Assume the contrary: $\bar y^T e_m > \hat y^T e_m$ for any optimum solution $\bar y$ to (1).
For convenience, choose $\bar y$ to be the lexicographically largest among all the
optimum solutions to (1). Since $\hat y$ is the lexicographically largest, there exists a
directed path $Q^* = v_4 v_3 v_2 v_1$ such that $\hat y(Q^*) > \bar y(Q^*)$ and $\hat y(Q) = \bar y(Q)$ for all
$Q \in \mathcal{P}_4$ with $Q \prec Q^*$. Let $i$ be the subscript with $v_j \in V_{i-1+j}$ for each $1 \le j \le 4$
hereafter.
(3.6) Set $\mathcal{R} = \{Q \in \mathcal{P}_4 \mid \bar y(Q) > 0,\ Q \succ Q^*,\ V(Q) \cap V(Q^*) \ne \emptyset\}$. Then $\mathcal{R} \ne \emptyset$.
Otherwise, let $\tilde y$ be the vector obtained from $\bar y$ by replacing $\bar y(Q^*)$ with $\hat y(Q^*)$.
Then $\tilde y^T H \le w^T$ and $\tilde y^T e_m > \bar y^T e_m$, a contradiction.
(3.7) No $Q \in \mathcal{R}$ contains any vertex in $\{v \in V_{i-1+j} \mid v \prec v_j\}$ for any $1 \le j \le 4$.
Assume the contrary: some $Q' \in \mathcal{R}$ contains a vertex $v^* \in V_{i-1+j}$ with
$v^* \prec v_j$. Then $Q'$ and $Q^*$ form a crossing pair (recall (3.4)). By (3.5), $Q' \wedge Q^* \in \mathcal{P}_4$
and $Q' \wedge Q^* \prec Q^*$. Set $\delta = \min\{\bar y(Q'), \hat y(Q^*)\}$ and define

\[
\tilde y(Q) =
\begin{cases}
\hat y(Q) + \delta & \text{if } Q = Q' \wedge Q^*, \\
\hat y(Q) & \text{if } Q \prec Q^* \text{ and } Q \ne Q' \wedge Q^*, \\
0 & \text{otherwise.}
\end{cases}
\]

Then $\tilde y$ is feasible to (1) with $\hat y \prec \tilde y$, contradicting the definition of $\hat y$.


(3.8) No path $Q' \in \mathcal{R}$ satisfies $V(Q') \cap V(Q^*) \supseteq V(Q) \cap V(Q^*)$ for all $Q \in \mathcal{R}$.
Otherwise, define

\[
\tilde y(Q) =
\begin{cases}
\bar y(Q) + \delta & \text{if } Q = Q^*, \\
\bar y(Q) - \delta & \text{if } Q = Q', \\
\bar y(Q) & \text{otherwise,}
\end{cases}
\]

where $\delta = \min\{\bar y(Q'), \hat y(Q^*) - \bar y(Q^*)\}$. It is easy to see that $\tilde y$ is also an optimum
solution to (1). Since $\bar y \prec \tilde y$, we reach a contradiction. Hence
(3.9) There exist two vertices $v_h, v_j \in Q^*$ and two paths $R, R' \in \mathcal{R}$ such that
$v_h \in R$, $v_j \notin R$, $v_j \in R'$, $v_h \notin R'$.
Let us show that $R$ and $R'$ are crossing. Since $Q^* \prec R$ and $Q^* \prec R'$, neither
$R$ nor $R'$ contains any vertex in $V_r$ for any $r < i$. Without loss of generality, we
may assume $h < j$. Then $R$ must contain a vertex $u \in V_{i-1+j}$. From (3.7) and
$v_j \in V_{i-1+j}$, it follows that $v_j \prec u$. Thus $R' \prec R$ for otherwise they are crossing.
Similarly, $R'$ must contain a vertex $u' \in V_{i-1+h}$. By (3.7), we have $v_h \prec u'$,
implying that $R$ and $R'$ are crossing. Now set $\delta = \min\{\bar y(R), \bar y(R')\}$ and define

\[
\tilde y(Q) =
\begin{cases}
\bar y(Q) + \delta & \text{if } Q = R \wedge R' \text{ or } R \vee R', \\
\bar y(Q) - \delta & \text{if } Q = R \text{ or } R', \\
\bar y(Q) & \text{otherwise.}
\end{cases}
\]

Then it is easy to verify that $\tilde y \ge 0$, $\tilde y^T H = \bar y^T H$, and $\tilde y^T e_m = \bar y^T e_m$. Hence $\tilde y$
is an optimum solution to (1) with $\bar y \prec \tilde y$, contradicting the choice of $\bar y$.
This completes the proof. □
Lemma 3.2 In addition to the hypothesis of Lemma 3.1, if the weight $w$ is
integral, then the lexicographically largest packing $\bar y \in \{y \in \mathbb{R}^m_+ \mid y^T H \le w^T\}$ is
integral.
Proof. According to the statement of Lemma 3.1, the lexicographically largest
packing is optimum to (1). Based on this observation, we can come up with a
greedy algorithmic proof of the present statement as follows.
At the current step, let $i \le k-3$ be the smallest subscript with $V_i \ne \emptyset$
and let $v_j^*$ be the smallest vertex in $V_j$ with respect to the linear order $\prec$ as
defined in Lemma 2.1. If $(v_{j+1}^*, v_j^*) \notin E$ for $j = i$, $i+1$, or $i+2$, then remove
$v_j^*$ from $D$; else, set $Q^* = v_{i+3}^* v_{i+2}^* v_{i+1}^* v_i^*$, $y(Q^*) = \min\{w(v_j^*) \mid i \le j \le i+3\}$,
$w(v_j^*) := w(v_j^*) - y(Q^*)$, $i \le j \le i+3$, and remove all the vertices $v_j^*$ with
$w(v_j^*) = 0$ from $D$, $i \le j \le i+3$. Repeat the process.
Since $y(Q^*)$ is integral for each iteration, the solution is integral. In view of
Lemma 3.1, it is also optimum.
By Lemma 2.2, it takes $O(|V|^2)$ time to construct $D$. Since at least one
vertex is removed from $D$ in each iteration of the present algorithm, and each
iteration takes time $O(|V|)$, the total time complexity of our algorithm is $O(|V|^2)$. □
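
The greedy procedure in the proof can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: `layers[j]` lists V_{j+1} in increasing order of ≺, `arcs` contains the arcs of D (from a layer to the one just below it), weights are nonnegative integers, and the function name is hypothetical. The text does not spell out what to do when one of V_{i+1}, V_{i+2}, V_{i+3} is already empty; the sketch then discards the smallest vertex of V_i, since no P4 can end in V_i in that situation.

```python
def greedy_p4_packing(layers, arcs, weight):
    """Return a dict mapping each chosen path (top layer first) to its multiplicity."""
    layers = [list(layer) for layer in layers]  # work on copies
    w = dict(weight)
    packing = {}
    k = len(layers)
    while True:
        # i = smallest index of a non-empty layer; stop when fewer than 4 layers remain.
        i = next((j for j in range(k) if layers[j]), None)
        if i is None or i > k - 4:
            break
        window = range(i, i + 4)
        if any(not layers[j] for j in window):
            layers[i].pop(0)  # assumption: no P4 can use V_i any more
            continue
        star = [layers[j][0] for j in window]  # star[0] = v*_i, ..., star[3] = v*_{i+3}
        # First j with (v*_{j+1}, v*_j) not an arc, if any.
        missing = next((j for j in range(3) if (star[j + 1], star[j]) not in arcs), None)
        if missing is not None:
            layers[i + missing].pop(0)  # remove v*_j from D
            continue
        path = tuple(reversed(star))    # Q* = v*_{i+3} v*_{i+2} v*_{i+1} v*_i
        amount = min(w[v] for v in star)
        packing[path] = packing.get(path, 0) + amount
        for offset, v in enumerate(star):
            w[v] -= amount
            if w[v] == 0:
                layers[i + offset].pop(0)  # exhausted vertices leave D
    return packing
```

Every pass through the loop deletes at least one vertex, which mirrors the counting argument behind the O(|V|²) bound; the list operations used here are not tuned to achieve that bound exactly.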
Combining Lemmas 3.1 and 3.2, we get the following result.
Theorem 3.1 Let $T = (V, E)$ be a bipartite tournament with integral
vertex-weight $w \ge 0$ and with no subdigraph isomorphic to $F_1$ nor $F_2$. Then the
lexicographically largest packing $\bar y \in \{y \in \mathbb{R}^m_+ \mid y^T H \le w^T\}$ is an integral optimum
solution to (1).

4 The Min-Max Relation


Now let us turn to investigate the (fractional) $C_4$-covering problem
$\min\{w^T x \mid x \ge 0;\ Hx \ge e_m\}$, where $w \in \mathbb{Z}^n_+$.

Theorem 4.1 Let $H$ be the $\mathcal{C}_4 \times V$ incidence matrix of a bipartite tournament
$T$. Then the linear system $\{x \in \mathbb{R}^n_+ \mid Hx \ge e_m\}$ is TDI if and only if $T$ contains
no subdigraph isomorphic to $F_1$ nor $F_2$.
Proof. By definition, $\{x \in \mathbb{R}^n_+ \mid Hx \ge e_m\}$ is TDI if and only if (1) has an
integral optimal solution for any nonnegative integral vector $w$. The sufficiency
follows directly from Theorem 3.1. Let us now justify the necessity. Suppose the
contrary: $T$ contains $F_i$, where $i = 1$ or $2$. Let $w$ be such that $w(v) = 1$ if $v$ is
a vertex in $F_i$ and $0$ otherwise. Then (1) has no integral optimal solution with
respect to $w$. To justify it, observe that
(a) $F_1$ contains three $C_4$'s in total; each vertex in $F_1$ is on exactly two $C_4$'s.
(b) $F_2$ contains six $C_4$'s in total; each vertex in $F_2$ is on at most five $C_4$'s.
Now set $y(C_4) = 1/2$ (resp. $1/5$) for each $C_4$ in $F_1$ (resp. in $F_2$) and $0$ otherwise;
then $y$ is a feasible solution to (1) with $y^T e_m > 1$. On the other hand, for any integral solution
$\tilde y$ to (1), $\tilde y^T e_m$ is the number of some vertex-disjoint $C_4$'s in $F_i$,
which is at most 1. Thus (1) has no integral optimal solution with respect to $w$,
a contradiction. □
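
The arithmetic for F1 in this proof can be replayed exactly with rational numbers. The sketch below assumes the three C4's of F1 have the vertex sets obtained from the arc description of S_j in Section 5 (an assumption about the labelling, since the figure is not reproduced here); the function name is hypothetical.

```python
from fractions import Fraction

# Vertex sets of the three C4's of F1 on p1..p6 (derived from S_j in Section 5).
F1_CYCLES = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

def packing_value(cycles, y, weight):
    """Return the value y^T e_m if y is feasible for (1), and None otherwise."""
    if any(value < 0 for value in y):
        return None
    for v, cap in weight.items():
        load = sum(value for value, cycle in zip(y, cycles) if v in cycle)
        if load > cap:  # constraint y^T H <= w^T violated at vertex v
            return None
    return sum(y)

UNIT_WEIGHT = {v: 1 for v in range(1, 7)}
HALF = [Fraction(1, 2)] * 3
```

With y = 1/2 on every cycle each vertex carries load exactly 1 and the value is 3/2, whereas any 0/1 choice of two or more cycles overloads a vertex. For F2 the corresponding count from (b) is 6 · 1/5 = 6/5 > 1.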

Edmonds and Giles [2] proved that if $\{Ax \ge b,\ x \ge 0\}$ is a TDI system, $b$ is
integral, and the minimum in the LP-duality equation

$\min\{c^T x \mid Ax \ge b,\ x \ge 0\} = \max\{y^T b \mid y^T A \le c^T,\ y \ge 0\}$

is finite, then the minimum has an integral optimum solution. Based on this
theorem, Theorem 4.1 and Theorem 3.1, we can instantly establish the following
min-max result.

Theorem 4.2 Let $T = (V, E)$ be a bipartite tournament with integral vertex
weight $w \ge 0$ and with no subdigraph isomorphic to $F_1$ nor $F_2$. Then the
$C_4$-packing number of $T$ equals the $C_4$-covering number of $T$.

Let $T$ be a bipartite tournament with no $F_1$ nor $F_2$. As shown in the proof
of Lemma 3.2, a maximum $C_4$-packing in $T$ can be obtained by a greedy
algorithm in time $O(|V|^2)$. Let us point out that a minimum $C_4$-covering in $T$
can be obtained in time $O(|V|^3)$ using the maximum $C_4$-packing algorithm as a
subroutine.
To start with, set $C = \emptyset$ and let $D$ be the digraph as constructed in (3.1).
At the current step, let $i \le k-3$ be the smallest subscript with $V_i \ne \emptyset$ and
let $v_j^*$ be the smallest vertex in $V_j$ with respect to the linear order $\prec$ as defined
in Lemma 2.1. If $(v_{j+1}^*, v_j^*) \notin E$ for some $j = i$, $i+1$, or $i+2$, then remove
$v_j^*$ from $D$; else, apply the maximum $P_4$-packing algorithm to $D$ and to each
$D - \{v_j^*\}$ for $i \le j \le i+3$ to find optimum solutions, denoted by $\bar y(D)$ and
$\bar y(D - \{v_j^*\})$, respectively. Let $v_j^*$, with $i \le j \le i+3$, be a vertex satisfying
$e^T \bar y(D - \{v_j^*\}) + w(v_j^*) = e^T \bar y(D)$. Set $C = C \cup \{v_j^*\}$ and $D = D - \{v_j^*\}$. Repeat
the process.
Since it takes $O(|V|^2)$ time to find the desired $v_j^*$ by the maximum $P_4$-packing
algorithm and at least one vertex is removed at each iteration, the total time
complexity of the algorithm is $O(|V|^3)$.

5 General Bipartite Tournaments

The min-max relation together with the above minimum cycle covering algorithm
leads to a 3.5-approximation algorithm for the feedback vertex set problem on
general bipartite tournaments, which relies on "eliminating" the problematic
subdigraphs, $F_1$ and $F_2$, from $T$.
Given a bipartite tournament $T = (V, E)$ such that each vertex $v \in V$ is
associated with a positive integer $w(v)$, by Lemma 2.2 we can find an $F_1$ or $F_2$,
or a partition $\{V_1, V_2, \ldots, V_k\}$ of $V$ as described in Lemma 2.1, in time $O(|V|^2)$.
Set $C' = \emptyset$. If an $F_j$, where $j = 1$ or $2$, is output, then set $\delta = \min\{w(v) \mid v \in V(F_j)\}$,
$w(v) = w(v) - \delta$ for all $v \in V(F_j)$, $C_0 = \{v \in V(F_j) \mid w(v) = 0\}$,
$C' = C' \cup C_0$ and $T = T - C_0$, and repeat; else, construct the digraph $D$ as described in (3.1)
and apply the minimum $P_4$-covering algorithm to $D$ to get a minimum
$P_4$-covering $C''$ for $D$. Then $C' \cup C''$ is a $C_4$-covering of $T$. It is easy to see that the
performance guarantee of the algorithm is 3.5.
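
One elimination step of this scheme can be sketched as below. The function name and data layout are hypothetical; detecting the forbidden subdigraph itself (Lemma 2.2) is outside the sketch.

```python
def eliminate_forbidden(weight, forbidden_vertices):
    """One weight-reduction step on the vertex set of a detected F1 or F2.

    Returns (delta, new_weight, zeroed), where `zeroed` is the set C_0 of
    vertices whose weight dropped to zero and which join the cover C'.
    """
    delta = min(weight[v] for v in forbidden_vertices)
    new_weight = dict(weight)
    for v in forbidden_vertices:
        new_weight[v] -= delta
    zeroed = {v for v in forbidden_vertices if new_weight[v] == 0}
    return delta, new_weight, zeroed
```

One plausible reading of the factor 3.5, offered here as an interpretation rather than as the authors' argument: a step on F2 lowers the total weight by at most 7δ, while any feedback vertex set must contain at least two vertices of F2 (its covering number, as noted in Section 2) and so loses at least 2δ, giving a ratio of 7/2; for F1 the same count gives 6/2 = 3.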
On the other hand, the problem is NP-complete in general.

Theorem 5.1 The feedback vertex set problem (given c > 0 whether there is a
feedback vertex set of size at most c) on bipartite tournaments is NP-complete
and approximable within 3.5.

Proof. The approximation ratio follows from the above argument. Let us show
the NP-completeness. Obviously, the problem is in NP. To prove the assertion,
it suffices to reduce the 3-SATISFIABILITY problem (3SAT) to the feedback
vertex set problem on bipartite tournaments. Let U = {u1 , u2 , . . . , un } be the
set of variables and let C = {c1 , c2 , . . . , cm } be the set of clauses in an arbitrary
instance of 3SAT . We aim to construct a bipartite tournament T = (V, E) such
that T has a feedback vertex set of size n + 2m if and only if C is satisfiable.
The construction consists of several components: truth-setting components,
satisfaction testing components, and membership components, which are aug-
mented by some additional arcs so that the resulting digraph is a bipartite
tournament.

• For each variable $u_i \in U$, there is a truth-setting component $T_i = (V_i, E_i)$
with $V_i = \{u_i, \bar u_i, a_i, a'_i\}$ and $E_i = \{(u_i, a_i), (a_i, \bar u_i), (\bar u_i, a'_i), (a'_i, u_i)\}$. Note
that $T_i$ is a directed cycle of length four.
• For each clause $c_j \in C$, there is a satisfaction testing component $S_j = (V'_j, E'_j)$ with

$V'_j = \{x_j^1, x_j^2, x_j^3, y_j^1, y_j^2, y_j^3\}$,
$E'_j = \{(y_j^h, x_j^h) \mid h = 1, 2, 3\} \cup \{(x_j^h, y_j^k) \mid 1 \le h \ne k \le 3\}$.

Notice that $S_j$ is isomorphic to the forbidden digraph $F_1$ with the
correspondence $x_j^h \leftrightarrow p_{2h}$, $y_j^h \leftrightarrow p_{2h-1}$ for $h = 1, 2, 3$.
• For each clause $c_j \in C$, let $z_j^1$, $z_j^2$, and $z_j^3$ denote the three literals in $c_j$.
For each literal $z_j^i$, there is a membership component $M_j^i = (\hat V_j^i, \hat E_j^i)$ with
$\hat V_j^i = \{p_j^i, q_j^i\}$ and $\hat E_j^i = \{(p_j^i, q_j^i)\}$.
The following two sets will form the bipartition of the desired bipartite
tournament.

$W = \{u_i, \bar u_i \in V_i \mid 1 \le i \le n\} \cup \{q_j^i \in \hat V_j^i \mid 1 \le i \le 3,\ 1 \le j \le m\} \cup \{y_j^i \in V'_j \mid 1 \le i \le 3,\ 1 \le j \le m\}$,
$B = \{a_i, a'_i \in V_i \mid 1 \le i \le n\} \cup \{p_j^i \in \hat V_j^i \mid 1 \le i \le 3,\ 1 \le j \le m\} \cup \{x_j^i \in V'_j \mid 1 \le i \le 3,\ 1 \le j \le m\}$.

For convenience, write

$V^* = \bigcup_{i=1}^{n} V_i$, $V' = \bigcup_{j=1}^{m} V'_j$, $\hat V = \bigcup_{j=1}^{m} \bigcup_{i=1}^{3} \hat V_j^i$.

Now let us proceed to the construction of the remaining arc-set.

• First add arc-set
$\{(W \cap V_i, B \cap V_j), (B \cap V_i, W \cap V_j) \mid 1 \le i < j \le n\}$
$\cup\ \{(W \cap V'_i, B \cap V'_j), (B \cap V'_i, W \cap V'_j) \mid 1 \le i < j \le m\}$
$\cup\ \{(p_j^i, q_k^h), (q_j^i, p_k^h) \mid 1 \le j < k \le m,\ 1 \le i \le 3,\ 1 \le h \le 3\}$
$\cup\ \{(p_j^i, q_j^h), (q_j^i, p_j^h) \mid 1 \le j \le m,\ 1 \le i < h \le 3\}$.
• Then add arc-set
$\{(W \cap V^*, B \cap (V' \cup \hat V))\} \cup \{(B \cap V^*, W \cap (V' \cup \hat V))\}$
$\cup\ \{(W \cap \hat V, B \cap V')\} \cup \{(B \cap \hat V, W \cap V')\}$.
• Finally, for each clause $c_j$ and each literal $z_j^i \in c_j$, reverse the arc $(z_j^i, x_j^i)$,
that is, if $z_j^i = u_k$ for some $k$, then replace $(u_k, x_j^i)$ by $(x_j^i, u_k)$; if $z_j^i = \bar u_k$ for
some $k$, then replace $(\bar u_k, x_j^i)$ by $(x_j^i, \bar u_k)$.
The construction is completed. It is easy to see that the construction can be
accomplished in polynomial time and the resulting digraph is a bipartite tour-
nament with 12m + 4n vertices.
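
The construction can be sketched in code as follows. The vertex encoding (tuples such as `("u", i)` or `("x", j, h)`), the literal encoding `(k, positive)` and the function name are all hypothetical. The sketch reads a pair such as (W ∩ V_i, B ∩ V_j) as "all arcs from W ∩ V_i to B ∩ V_j" and realises the first two arc-sets through a single ordering of the components: variable components first, then membership components ordered by (j, i), then clause components.

```python
from itertools import product

def build_tournament(n, clauses):
    """Return (side, arcs) for variables 1..n and clauses given as 3-tuples of
    literals (k, positive), where positive=False stands for the negated variable."""
    side, group, arcs = {}, {}, set()

    def add(vertex, colour, key):
        side[vertex] = colour   # "W" or "B"
        group[vertex] = key     # position of the vertex's component in the ordering

    for i in range(1, n + 1):   # truth-setting components T_i
        for name, colour in (("u", "W"), ("nu", "W"), ("a", "B"), ("a'", "B")):
            add((name, i), colour, (0, i))
        arcs |= {(("u", i), ("a", i)), (("a", i), ("nu", i)),
                 (("nu", i), ("a'", i)), (("a'", i), ("u", i))}
    for j in range(1, len(clauses) + 1):
        for h in (1, 2, 3):
            add(("p", j, h), "B", (1, j, h))    # membership component M_j^h
            add(("q", j, h), "W", (1, j, h))
            add(("x", j, h), "B", (2, j))       # satisfaction testing component S_j
            add(("y", j, h), "W", (2, j))
            arcs.add((("p", j, h), ("q", j, h)))
            arcs.add((("y", j, h), ("x", j, h)))
            arcs |= {(("x", j, h), ("y", j, k)) for k in (1, 2, 3) if k != h}
    # Between different components: arcs go from the earlier to the later component.
    for u, v in product(side, repeat=2):
        if side[u] != side[v] and group[u] < group[v]:
            arcs.add((u, v))
    # Finally reverse the arc (z_j^h, x_j^h) for every literal occurrence.
    for j, clause in enumerate(clauses, start=1):
        for h, (k, positive) in enumerate(clause, start=1):
            z = ("u" if positive else "nu", k)
            arcs.remove((z, ("x", j, h)))
            arcs.add((("x", j, h), z))
    return side, arcs
```

For one clause over three variables this yields 24 = 12m + 4n vertices split evenly between W and B, with exactly one arc between each W-B pair, and the 4-cycle x_j^h z_j^h p_j^h q_j^h used later in the proof is present.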
Let us show that $T$ has a feedback vertex set of size $n + 2m$ if and only if
$C$ is satisfiable. Our proof heavily relies on the following observation: if $B^*$ is a
feedback vertex set of $T$ with $n + 2m$ vertices, then $B^*$ contains exactly one
vertex from each $T_i$ and exactly two vertices from each $S_j$ since $T_i$'s and $S_j$'s
are pairwise vertex disjoint.
Sufficiency. Suppose $T$ has a feedback vertex set of size $n + 2m$. Let $B^*$ be such a feedback vertex set that minimizes $|B^* \cap \{a_i, a'_i \mid 1 \le i \le n\}|$. Then $B^* \cap \{a_i, a'_i \mid 1 \le i \le n\} = \emptyset$. To justify this, assume the contrary: $b_i \in B^*$ for some $b_i = a_i$ or $a'_i$. Set $B' = (B^* \setminus \{b_i\}) \cup \{u_i\}$. Then by the assumption on $B^*$, there exists a directed cycle $C'$ with $C' \cap B' = \emptyset$. Thus $b_i \in C'$ and $u_i \notin C'$. Since $u_i a_i \bar{u}_i a'_i u_i$ is the unique directed cycle containing $b_i$ in $T[V^* \cup \hat{V}]$, $C'$ contains some vertex in $V'$. From the construction of $T$, it follows that $C'$ contains an arc $(x_j^h, z_j^h)$, where $z_j^h \in c_j$ and $z_j^h = u_k$ or $\bar{u}_k$ for some $k$, whence $x_j^h z_j^h p_j^h q_j^h x_j^h$ is disjoint from $B^*$, contradicting the definition of $B^*$.
So $|B^* \cap \{u_i, \bar{u}_i\}| = 1$ for all $1 \le i \le n$, and we can thus obtain a truth assignment $\tau : U \to \{true, false\}$ by setting $\tau(u_i) = true$ if $u_i \in B^*$ and $\tau(u_i) = false$ if $\bar{u}_i \in B^*$, $i = 1, 2, \ldots, n$. It remains to show that each clause $c_j$ is satisfied by $\tau$. Indeed, for each satisfaction testing component $S_j$, at least one of $x_j^1$, $x_j^2$, and $x_j^3$, say $x_j^h$, is outside $B^*$. So the literal $z_j^h \in B^*$ since $B^*$ intersects the cycle $x_j^h z_j^h p_j^h q_j^h x_j^h$. Thus $\tau(z_j^h) = true$, in other words, $c_j$ is satisfied.
Necessity. Suppose that $\tau : U \to \{true, false\}$ is a satisfying truth assignment for $C$. Then there exists at least one true literal in each clause. We choose one (denote it by $z_j^{h_j}$) from each clause $c_j$ such that $\tau(z_j^{h_j}) = true$. Set

$B^* = \{v_i \in \{u_i, \bar{u}_i\} \mid \tau(v_i) = true,\ 1 \le i \le n\} \cup \{x_j^i \ne x_j^{h_j} \mid 1 \le i \le 3;\ 1 \le j \le m\}$.

Clearly, $|B^*| = n + 2m$. Moreover, each directed cycle in $T[V^* \cup \hat{V}]$ or in $T[V']$ is covered by $B^*$. Note that any other directed cycle contains some arc, say $(x_j^i, z_j^i)$, from $V'$ to $V^*$. If $x_j^i \notin B^*$ then, by the definition of $B^*$, $\tau(z_j^i) = true$, implying $z_j^i \in B^*$. Thus $B^*$ is a feedback vertex set. The proof is complete. □

6 Concluding Remarks
We generalize the approach developed in our previous work [1] to the feedback vertex set problem on bipartite tournaments (which is shown to be NP-complete). The new structural characterization is of independent interest, and the proof is much simplified. The TDI characterization yields a 3.5-approximation algorithm for the feedback vertex set problem on general bipartite tournaments when combined with the subgraph removal technique. We are still interested in knowing whether this method of applying the TDI characterization can be extended to a wider range of combinatorial optimization problems and would like to pursue this direction further.

References
1. M. Cai, X. Deng, and W. Zang, A TDI System and Its Application to Approxima-
tion Algorithm, Proc. 39th IEEE Symposium on Foundations of Computer Science,
Palo Alto, 1998, pp. 227-231.
2. J. Edmonds and R. Giles, A Min-max Relation for Submodular Functions on Graphs, Annals of Discrete Mathematics 1 (1977), 185-204.
3. J. Edmonds and R. Giles, Total Dual Integrality of Linear Systems, Progress in
Combinatorial Optimization (ed. W. R. Pulleyblank), Academic Press, 1984, pp.
117-131.
4. R. Giles, and W.R. Pulleyblank, Total Dual Integrality and Integral Polyhedra,
Linear Algebra Appli. 25 (1979), 191-196.
5. M. X. Goemans and D. P. Williamson, The Primal-Dual Method for Approxima-
tion Algorithms and Its Application to Network Design Problems, in: Approxima-
tion Algorithms for N P -Hard Problems (ed. D.S. Hochbaum), PWS Publishing
Company, 1997, pp. 144-191.
6. A. Schrijver, Total Dual Integrality from Directed Graphs, Crossing Families and
Sub- and Supermodular Functions, Progress in Combinatorial Optimization (ed.
W. R. Pulleyblank), Academic Press, 1984, pp. 315-362.
7. A. Schrijver, Polyhedral Combinatorics, in Handbook of Combinatorics (eds. R.L. Graham, M. Grötschel, and L. Lovász), Elsevier Science B.V., Amsterdam, 1995, pp. 1649-1704.
8. A. Schrijver, On Total Dual Integrality, Linear Algebra Appli. 38 (1981), 27-32.
9. E. Speckenmeyer, On Feedback Problems in Digraphs, in: Lecture Notes in Com-
puter Science 411, Springer-Verlag, 1989, pp. 218-231.
On the Separation of
Maximally Violated mod-k Cuts

Alberto Caprara¹, Matteo Fischetti², and Adam N. Letchford³

¹ DEIS, University of Bologna, viale Risorgimento 2, 40136 Bologna, Italy
[email protected]
² DEI, University of Padova, via Gradenigo 6/A, 35131 Padova, Italy
[email protected]
³ Dept. of Mgt. Science, Lancaster University, Lancaster LA1 4YW, United Kingdom
[email protected]

Abstract. Separation is of fundamental importance in cutting-plane based techniques for Integer Linear Programming (ILP). In recent decades, a considerable research effort has been devoted to the definition of effective separation procedures for families of well-structured cuts. In this paper we address the separation of Chvátal rank-1 inequalities in the context of general ILP’s of the form $\min\{c^T x : Ax \le b,\ x \text{ integer}\}$, where A is an m × n integer matrix and b an m-dimensional integer vector. In particular, for any given integer k we study mod-k cuts of the form $\lambda^T A x \le \lfloor \lambda^T b \rfloor$ for any $\lambda \in \{0, 1/k, \ldots, (k-1)/k\}^m$ such that $\lambda^T A$ is integer. Following the line of research recently proposed for mod-2 cuts by Applegate, Bixby, Chvátal and Cook [1] and Fleischer and Tardos [16], we restrict to maximally violated cuts, i.e., to inequalities which are violated by (k − 1)/k by the given fractional point. We show that, for any given k, such a separation requires O(mn min{m, n}) time. Applications to the TSP are discussed. In particular, for any given k, we propose an $O(|V|^2 |E^*|)$-time exact separation algorithm for mod-k cuts which are maximally violated by a given fractional TSP solution with support graph $G^* = (V, E^*)$. This implies that we can identify a maximally violated TSP cut whenever a maximally violated (extended) comb inequality exists. Finally, specific classes of (sometimes new) facet-defining mod-k cuts for the TSP are analyzed.

1 Introduction

Separation is of fundamental importance in cutting-plane based techniques for Integer Linear Programming (ILP). In recent decades, a considerable research effort has been devoted to the definition of effective separation procedures for families of well-structured cuts. This line of research was originated by the pioneering work of Dantzig, Fulkerson and Johnson [12] on the Traveling Salesman Problem (TSP) and led to the very successful branch-and-cut approach introduced by Padberg and Rinaldi [24]. Most of the known methods have been originally proposed for the TSP, a prototype in combinatorial optimization and integer programming.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 87–98, 1999.
© Springer-Verlag Berlin Heidelberg 1999

In spite of the large research effort, however, polynomial-time exact separa-
tion procedures are known for only a few classes of facet-defining TSP cuts. In
particular, no efficient separation procedure is known at present for the famous
class of comb inequalities [19]. The only exact method is due to Carr [7], and
requires $O(n^{2t+3})$ time for separation of comb inequalities with t teeth on a
graph of n nodes. Recently, Letchford [21] proposed an $O(|V|^3)$-time separation
procedure for a superclass of comb inequalities, applicable when the fractional
point to be separated has a planar support.
Applegate, Bixby, Chvátal and Cook [1] recently suggested concentrating on
maximally violated combs, i.e., on comb inequalities which are violated by 1/2
by the given fractional point x∗ to be separated. This is motivated by the fact
that maximally violated combs exhibit a very strong combinatorial structure,
which can be exploited for separation. Their approach is heuristic in nature,
and is based on the solution of a suitably-defined system of mod-2 congru-
ences. Following this approach, Fleischer and Tardos [16] were able to design
an $O(|V|^2 \log |V|)$-time exact separation procedure for maximally violated comb inequalities for the case where the support graph $G^* = (V, E^*)$ of the fractional point $x^*$ is planar.
It is well known that comb inequalities can be obtained by adding-up and
rounding a convenient set of TSP degree equations and subtour elimination constraints weighted by 1/2, i.e., they are $\{0, \frac{1}{2}\}$-cuts in the terminology of Caprara and Fischetti [6]. These authors studied $\{0, \frac{1}{2}\}$-cuts in the context of general
ILP’s. They showed that the associated separation problem is equivalent to
the problem of finding a minimum-weight member of a binary clutter, i.e., a
minimum-weight {0, 1}-vector satisfying a certain set of mod-2 congruences.
This problem is NP-hard in general, as it subsumes the max-cut problem as
a special case.
In this paper we address the separation of Chvátal rank-1 inequalities in the
context of general ILP’s of the form $\min\{c^T x : Ax \le b,\ x \text{ integer}\}$, where A is an m × n integer matrix and b an m-dimensional integer vector. In particular, for any given integer k we study mod-k cuts of the form $\lambda^T A x \le \lfloor \lambda^T b \rfloor$ for any $\lambda \in \{0, 1/k, \ldots, (k-1)/k\}^m$ such that $\lambda^T A$ is integer. We show that, for any given k, separation of maximally violated mod-k cuts requires O(mn min{m, n}) time as it is equivalent to finding a {0, 1, . . . , k − 1}-vector satisfying a certain set of mod-k congruences. We also discuss the separation of maximally violated mod-k cuts in the context of the TSP. In particular, we show how to separate efficiently maximally violated members of a family of cuts that properly contains comb inequalities. Interestingly, this family contains facet-inducing cuts which are not comb inequalities. We also show how to reduce from $O(|V|^2)$ to $O(|V|)$ the number of tight constraints to be considered in the mod-k congruence system, where |V| is the number of nodes of the underlying graph. We investigate specific classes of (sometimes new) mod-k facet-defining cuts for the TSP and then give some concluding comments.

2 Maximally Violated mod-k Cuts

Given an m × n integer matrix A and an m-dimensional integer vector b, let $P := \{x \in \mathbb{R}^n : Ax \le b\}$, $P_I := \mathrm{conv}\{x \in \mathbb{Z}^n : Ax \le b\}$, and assume $P_I \ne P$. A Chvátal-Gomory cut is a valid inequality for $P_I$ of the form $\lambda^T A x \le \lfloor \lambda^T b \rfloor$, where the multiplier vector $\lambda \in \mathbb{R}^m_+$ is such that $\lambda^T A \in \mathbb{Z}^n$, and $\lfloor \cdot \rfloor$ denotes lower integer part. In this paper we address cuts which can be obtained through multiplier vectors λ belonging to $\{0, 1/k, \ldots, (k-1)/k\}^m$ for any given integer k ≥ 2. We call them mod-k cuts, as their validity relies on mod-k rounding arguments. Note that mod-2 cuts are in fact the $\{0, \frac{1}{2}\}$-cuts studied in Caprara and Fischetti [6].
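
As a small illustration of the definition above, the following Python sketch computes the Chvátal-Gomory cut for a given multiplier vector in exact rational arithmetic. The function name `chvatal_gomory_cut` and the three-constraint example system are assumptions introduced here for illustration only; they do not come from the paper.

```python
from fractions import Fraction
from math import floor


def chvatal_gomory_cut(A, b, lam):
    """Return (coefficients, right-hand side) of the cut lam^T A x <= floor(lam^T b).

    A is a list of m integer rows, b a list of m integers, and lam a list of m
    nonnegative Fractions. A ValueError is raised if lam^T A is not integral.
    """
    m, n = len(A), len(A[0])
    coeffs = [sum(lam[i] * A[i][j] for i in range(m)) for j in range(n)]
    if any(c.denominator != 1 for c in coeffs):
        raise ValueError("lam^T A is not an integer vector")
    rhs = floor(sum(lam[i] * b[i] for i in range(m)))
    return [int(c) for c in coeffs], rhs


# Hypothetical example system: x1 + x2 <= 1, x2 + x3 <= 1, x1 + x3 <= 1.
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
b = [1, 1, 1]
lam = [Fraction(1, 2)] * 3            # a mod-2 multiplier vector (k = 2)

cut_coeffs, cut_rhs = chvatal_gomory_cut(A, b, lam)   # x1 + x2 + x3 <= 1

# The fractional point x* = (1/2, 1/2, 1/2) lies in P and violates the cut.
x_star = [Fraction(1, 2)] * 3
violation = sum(c * x for c, x in zip(cut_coeffs, x_star)) - cut_rhs
print(cut_coeffs, cut_rhs, violation)  # [1, 1, 1] 1 1/2
```

In this example the violation equals (k − 1)/k = 1/2 for k = 2, the largest value a mod-2 cut can reach.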
Any Chvátal-Gomory cut is a mod-k cut for some integer k > 0, as it is well known that undominated Chvátal-Gomory cuts only arise for $\lambda \in [0, 1)^m$, since replacing any $\lambda_i$ by its fractional part $\lambda_i - \lfloor \lambda_i \rfloor$ always leads to an equivalent or stronger cut. Moreover, λ can always be assumed to be rational, i.e., an integer k > 0 exists such that kλ is integer. Indeed, for any given $\lambda \in \mathbb{R}^m_+$ with $\alpha^T := \lambda^T A \in \mathbb{Z}^n$ one can obtain an equivalent (or better) multiplier vector $\tilde{\lambda}$ by solving the linear program $\min\{\tilde{\lambda}^T b : \tilde{\lambda}^T A = \alpha^T,\ \tilde{\lambda} \ge 0\}$, whose basic solutions are of the form $\tilde{\lambda} = (B^{-1}\alpha, 0)$ for some basis B of $A^T$. Hence $\det(B) \cdot \tilde{\lambda}$ is integer, as claimed.
We are interested in the following separation problem, in its optimization
version:

mod-k SEP: Given $x^* \in P$, find $\lambda \in \{0, 1/k, \ldots, (k-1)/k\}^m$ such that $\lambda^T A \in \mathbb{Z}^n$, and $\lfloor \lambda^T b \rfloor - \lambda^T A x^*$ is a minimum.

Following [6], this problem can equivalently be restated in terms of the integer multiplier vector $\mu := k\lambda \in \{0, 1, \ldots, k-1\}^m$. For any given $z \in \mathbb{Z}$ and $k \in \mathbb{Z}_+$, let $z \bmod k := z - \lfloor z/k \rfloor k$. As is customary, the notation $a \equiv b \pmod{k}$ stands for $a \bmod k = b \bmod k$. Given an integer matrix $Q = (q_{ij})$ and $k \in \mathbb{Z}_+$, let $\bar{Q} = (\bar{q}_{ij}) := Q \bmod k$ denote the mod-k support of Q, where $\bar{q}_{ij} := q_{ij} \bmod k$ for all i, j. Then, mod-k SEP is equivalent to the following optimization problem.

mod-k SEP: Given $x^* \in P$ and the associated slack vector $s^* := b - Ax^* \ge 0$, solve

$\delta^* := \min\ (s^{*T} \mu - \theta)$    (1)

subject to

$A^T \mu \equiv 0 \pmod{k}$    (2)
$b^T \mu \equiv \theta \pmod{k}$    (3)
$\mu \in \{0, 1, \ldots, k-1\}^m$    (4)
$\theta \in \{1, \ldots, k-1\}$.    (5)

By construction, $(s^{*T} \mu - \theta)/k$ gives the slack of the mod-k cut $\lambda^T A x \le \lfloor \lambda^T b \rfloor$ for $\lambda := \mu/k$, computed with respect to the given point $x^*$. Hence, there exists a mod-k cut violated by $x^*$ if and only if the minimum $\delta^*$ in (1) is strictly less than 0. Observe that $s^* \ge 0$ and $\theta \le k - 1$ imply $\delta^* \ge 1 - k$, i.e., no mod-k cut can be violated by more than (k − 1)/k. This bound is attained for θ = k − 1, when the mod-k congruence system (2)–(4) has a solution μ with $\mu_i = 0$ whenever $s_i^* > 0$. In this case, the resulting mod-k cut is said to be maximally violated.
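
To make problem (1)–(5) concrete, the sketch below solves it by plain enumeration of all multiplier vectors μ. This is only an illustration for very small m (the enumeration is exponential) and is not the method of the paper; the function name `mod_k_sep_bruteforce` and the example system are assumptions made here.

```python
from fractions import Fraction
from itertools import product


def mod_k_sep_bruteforce(A, b, x_star, k):
    """Enumerate mu in {0,...,k-1}^m and return (delta, mu, theta) minimizing (1).

    theta is determined by mu through congruence (3); vectors with theta = 0
    are skipped because of (5). Returns None if no feasible pair exists.
    """
    m, n = len(A), len(A[0])
    slack = [b[i] - sum(A[i][j] * x_star[j] for j in range(n)) for i in range(m)]
    best = None
    for mu in product(range(k), repeat=m):
        # Congruence (2): every component of A^T mu must vanish modulo k.
        if any(sum(A[i][j] * mu[i] for i in range(m)) % k for j in range(n)):
            continue
        theta = sum(b[i] * mu[i] for i in range(m)) % k
        if theta == 0:
            continue
        delta = sum(slack[i] * mu[i] for i in range(m)) - theta
        if best is None or delta < best[0]:
            best = (delta, mu, theta)
    return best


# Hypothetical example: x1 + x2 <= 1, x2 + x3 <= 1, x1 + x3 <= 1, x* = (1/2, 1/2, 1/2).
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
b = [1, 1, 1]
x_star = [Fraction(1, 2)] * 3

result = mod_k_sep_bruteforce(A, b, x_star, 2)
print(result)
```

Here the optimum is δ* = 1 − k = −1, attained by μ = (1, 1, 1) with θ = 1, so the associated mod-2 cut is maximally violated.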
Even for k = 2, mod-k SEP is NP-hard as it is equivalent to finding a
minimum-weight member of a binary clutter [6]. However, finding a maximally
violated mod-k cut amounts to finding any feasible solution of the congruence
system (2)–(4) after having fixed θ = k − 1 and having removed all the rows of
(A, b) associated with a strictly positive slack $s_i^*$. For any k prime this solution, if any exists, can be found in O(mn min{m, n}) time by standard Gaussian elimination in GF(k).
For k nonprime the integers modulo k do not form a field, hence standard Gaussian elimination cannot be performed. On the other hand, there exists an O(mn min{m, n})-time algorithm to find a solution of the mod-k congruence system (2)–(4), if one exists, even for k nonprime, provided a prime factorization of k is known; see, e.g., Cohen [11].
The above considerations lead to the following result.
Theorem 1. For any given k, maximally violated mod-k cuts can be found in
O(mn min{m, n}) time, provided a prime factorization of k is known.
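
For k prime, the procedure described above can be sketched as follows: keep only the rows of (A, b) that are tight at x*, fix θ = k − 1, and solve the resulting congruence system by Gaussian elimination over GF(k). The code is a minimal sketch under these assumptions; the names `solve_mod_prime` and `maximally_violated_mod_k_cut` are hypothetical, the nonprime case treated via Cohen [11] is not covered, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
from fractions import Fraction


def solve_mod_prime(M, r, p):
    """Return one solution y of M y = r over GF(p), p prime, or None if none exists."""
    rows, cols = len(M), len(M[0])
    aug = [[v % p for v in row] + [rv % p] for row, rv in zip(M, r)]
    pivots, rank = [], 0
    for c in range(cols):
        pr = next((i for i in range(rank, rows) if aug[i][c]), None)
        if pr is None:
            continue                      # no pivot: column c is a free variable
        aug[rank], aug[pr] = aug[pr], aug[rank]
        inv = pow(aug[rank][c], -1, p)    # modular inverse of the pivot entry
        aug[rank] = [(v * inv) % p for v in aug[rank]]
        for i in range(rows):
            if i != rank and aug[i][c]:
                f = aug[i][c]
                aug[i] = [(vi - f * vr) % p for vi, vr in zip(aug[i], aug[rank])]
        pivots.append(c)
        rank += 1
        if rank == rows:
            break
    if any(row[-1] for row in aug[rank:]):
        return None                       # a row reads 0 = nonzero: inconsistent
    y = [0] * cols                        # free variables are set to 0
    for i, c in enumerate(pivots):
        y[c] = aug[i][-1]
    return y


def maximally_violated_mod_k_cut(A, b, x_star, k):
    """Return mu in {0,...,k-1}^m defining a maximally violated mod-k cut, or None."""
    m, n = len(A), len(A[0])
    tight = [i for i in range(m)
             if sum(A[i][j] * x_star[j] for j in range(n)) == b[i]]
    if not tight:
        return None
    # One congruence per column of A (right-hand side 0) plus one for b (k - 1).
    M = [[A[i][j] for i in tight] for j in range(n)] + [[b[i] for i in tight]]
    y = solve_mod_prime(M, [0] * n + [k - 1], k)
    if y is None:
        return None
    mu = [0] * m
    for i, v in zip(tight, y):
        mu[i] = v
    return mu


A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]     # hypothetical example system
b = [1, 1, 1]
x_star = [Fraction(1, 2)] * 3
print(maximally_violated_mod_k_cut(A, b, x_star, 2))  # [1, 1, 1]
print(maximally_violated_mod_k_cut(A, b, x_star, 3))  # None
```

Exact `Fraction` arithmetic is used for x* so that tightness can be tested with equality; with floating-point input a tolerance would be needed instead.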
It is worth noting that mod-k SEP with µi = 0 whenever s∗i > 0 can be solved
efficiently by fixing θ to any value in {1, . . . , k − 1}. We call the corresponding
solutions of (2)–(5) totally tight mod-k cuts. The following theorem shows that,
for k prime, the existence of a totally tight mod-k cut implies the existence of a
maximally violated mod-k cut.
Theorem 2. For any k prime, a maximally violated mod-k cut exists if and
only if a totally tight mod-k cut exists.

Proof. One direction is trivial, as a maximally violated mod-k cut is also a totally tight mod-k cut. Assume now that a totally tight mod-k cut exists, associated with a vector (μ, θ) satisfying (2)–(5) and such that $\mu_i = 0$ for all $s_i^* > 0$. If θ ≠ k − 1 and k is prime, μ can always be scaled by a factor $w \in \{2, \ldots, k-1\}$ such that $A^T w\mu \equiv 0 \pmod{k}$ and $b^T w\mu \equiv k - 1 \pmod{k}$; it suffices to take $w \equiv (k-1)\theta^{-1} \pmod{k}$, where $\theta^{-1}$ is the inverse of θ in GF(k).
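
The scaling step in the proof can be sketched in a few lines of Python. The function name `scale_to_maximally_violated` and the numerical example are assumptions made here for illustration; the scaled multipliers are reduced modulo k so that they stay in {0, . . . , k − 1}, which changes none of the congruences.

```python
def scale_to_maximally_violated(mu, theta, k):
    """Scale a totally tight pair (mu, theta) so that theta becomes k - 1.

    Assumes k is prime and theta is in {1,...,k-1}; requires Python 3.8+
    for the modular inverse computed by pow(theta, -1, k).
    """
    w = ((k - 1) * pow(theta, -1, k)) % k
    scaled_mu = [(w * mu_i) % k for mu_i in mu]
    return w, scaled_mu, (w * theta) % k


# Hypothetical example with k = 5: the inverse of theta = 2 is 3, so w = 4 * 3 mod 5 = 2.
w, scaled_mu, scaled_theta = scale_to_maximally_violated([1, 3, 0], 2, 5)
print(w, scaled_mu, scaled_theta)  # 2 [2, 1, 0] 4
```

Entries with $\mu_i = 0$ remain zero after scaling, so the scaled vector still uses tight rows only.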

Note that Theorem 2 cannot be extended to the case of k nonprime.


Of course, not all maximally violated mod-k cuts are guaranteed to be facet
defining for PI . In particular, a cut is not facet defining whenever it is associated
with a nonminimal solution µ of the congruence system (2)–(4), where θ has
been fixed to k − 1 (barring the case of equivalent formulations of the same
facet-defining cut). Indeed, the inequality associated with any solution µ̃ ≤ µ is
violated whenever the one associated with µ is. Hence one is motivated in finding
maximally violated mod-k cuts which are associated with minimal solutions. This
can be done with no extra computational effort for k prime since, for any fixed
θ, all basic solutions to (2)–(4) are minimal by construction. Unfortunately, the
algorithm for k nonprime does not guarantee finding a minimal solution. On the
other hand, the following result holds.
Theorem 3. If there exists a maximally violated mod-k cut for some k nonprime, a maximally violated mod-ℓ cut exists also for every ℓ which is a prime factor of k.

Proof. First of all, observe that $Qy \equiv d \pmod{k}$ implies $Qy \equiv d \pmod{\ell}$ for each prime factor ℓ of k. Hence, given a solution (μ, θ) of (2)–(5) with θ = k − 1, the vector (μ, θ) mod ℓ yields a totally tight mod-ℓ cut, as θ mod ℓ = (k − 1) mod ℓ ≠ 0. The claim then follows from Theorem 2.

It is then natural to concentrate on the separation of maximally violated mod-k cuts for some k prime. For several important problems these cuts define facets of $P_I$, as shown for the TSP in Section 4.

3 Separation of Maximally Violated mod-k Cuts for the TSP

The TSP polytope is defined as the convex hull of the characteristic vectors of
all the Hamiltonian cycles of a given complete undirected graph G = (V, E). For
any S ⊆ V , let δ(S) denote the set of the edges with exactly one end node in S,
and E(S) denote the set of the edges with both end nodes in S. Moreover, for
any $A, B \subseteq V$ we write $E(A : B)$ for $\delta(A) \cap \delta(B)$. As is customary, for singleton node sets we write v instead of {v}. For any real function $x : E \to \mathbb{R}$ and for any $Q \subseteq E$, let $x(Q) := \sum_{e \in Q} x_e$.
A widely-used TSP formulation is based on the following constraints, called
degree equations, subtour elimination constraints (SEC’s), and nonnegativity
constraints, respectively:

x(δ(v)) = 2, for all v ∈ V (6)


x(E(S)) ≤ |S| − 1, for all S ⊂ V, |S| ≥ 2 (7)
−xe ≤ 0, for all e ∈ E. (8)

We next address the separation of maximally violated mod-k cuts that can be obtained from (6)–(8). Given a point $x^* \in \mathbb{R}^E$ satisfying (6)–(8), we call tight any node set S with $x^*(E(S)) = |S| - 1$. It is well known that only $O(|V|^2)$ tight sets exist, which can be represented by an $O(|V|)$-sized data structure called cactus tree [13]. A cactus tree associated with $x^*$ can be found efficiently in $O(|E^*||V| \log(|V|^2/|E^*|))$ time, where $E^* := \{e \in E : x^*_e > 0\}$ is the support of $x^*$; see [15] and also [20]. Moreover, we next show that only $O(|V|)$ tight sets need be considered explicitly in the separation of maximally violated mod-k cuts.
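
The notion of a tight set can be illustrated with a brute-force sketch that checks constraints (6)–(7) on a small fractional point and lists all tight sets. This enumeration is exponential in |V| and is not the cactus-tree construction referred to above; the function names and the 6-node example point (two triangles with edge values 1/2, joined by three edges of value 1) are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations


def x_inside(x, S):
    """Return x(E(S)): the total value of edges with both end nodes in S."""
    S = set(S)
    return sum((val for (u, v), val in x.items() if u in S and v in S), Fraction(0))


def tight_sets(x, nodes):
    """Verify the degree equations and SEC's for x, and return all tight sets."""
    nodes = list(nodes)
    for v in nodes:
        if sum(val for e, val in x.items() if v in e) != 2:
            raise ValueError(f"degree equation violated at node {v}")
    tight = []
    for size in range(2, len(nodes)):          # proper subsets with |S| >= 2
        for S in combinations(nodes, size):
            inside = x_inside(x, S)
            if inside > size - 1:
                raise ValueError(f"SEC violated on {S}")
            if inside == size - 1:
                tight.append(frozenset(S))
    return tight


half = Fraction(1, 2)
x = {(0, 1): half, (1, 2): half, (0, 2): half,      # first triangle
     (3, 4): half, (4, 5): half, (3, 5): half,      # second triangle
     (0, 3): Fraction(1), (1, 4): Fraction(1), (2, 5): Fraction(1)}

tight = tight_sets(x, range(6))
print(len(tight))  # 12
```

For this point the tight sets are the three pairs joined by an edge of value 1, their three complements, and the six sets obtained by deleting a single node.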
Fig. 1. A fractional point x∗ and one of its necklaces.

Applegate, Bixby, Chvátal and Cook [1] and Fleischer and Tardos [16] showed
that tight sets can be arranged in necklaces. A necklace of size q ≥ 3 is a partition
of V into a cyclic sequence of tight sets B1 , . . . , Bq called beads; see Figure 1 for
an illustration. To simplify notation, the subscripts in B1 , . . . , Bq are intended
modulo q, i.e., Bi = Bi+hq for all integer h. Beads in a necklace satisfy:
(i) Bi ∪ Bi+1 ∪ . . . ∪ Bi+t is a tight set for all i = 1, . . . , q and t = 0, . . . , q − 2,
(ii) x∗ (E(Bi : Bj )) is equal to 1 if j ∈ {i + 1, i − 1}, and 0 otherwise.
A pair (Bi , Bi+1 ) of consecutive beads in a necklace is called a domino. We
allow for degenerate necklaces with q = 2 beads, in which x∗ (E(B1 : B2 )) = 2.
Degenerate necklaces have no dominoes.
Given $x^*$ satisfying (6)–(8), one can find in time $O(|E^*||V| \log(|V|^2/|E^*|))$ a family $F(x^*)$ of $O(|V|)$ necklaces with the property that every tight set is the union of consecutive beads in a necklace of the family. The next theorem shows that the columns in the congruence system (2)–(4) corresponding to tight SEC’s are linearly dependent, in GF(k), on a set of columns associated with degree equations, tight nonnegativity constraints, and tight SEC’s corresponding to beads and dominoes in $F(x^*)$.
Theorem 4. If any TSP mod-k cut is maximally violated by x∗ , then there
exists a maximally violated mod-k cut whose Chvátal-Gomory derivation uses
SEC’s associated with beads and dominoes (only) of necklaces of F (x∗ ).
Proof. Let S be any tight set whose SEC is used in the Chvátal-Gomory deriva-
tion of some maximally violated mod-k cut. By the properties of F (x∗ ), S is the
union of consecutive beads B1 , . . . , Bt of a certain necklace B1 , . . . , Bq in F (x∗ ),
1 ≤ t ≤ q − 1. If t ≤ 2, then S is either a bead or a domino, and there is nothing
to prove. Assume then t ≥ 3, as in Figure 2, and add together:
Fig. 2. Illustration for the proof of Theorem 4.

– the SEC on $B_1 \cup B_2 \cup \ldots \cup B_{t-2}$,
– the SEC on $B_{t-1}$ multiplied by k − 1,
– the SEC on $B_t$,
– the degree equations on every $v \in B_{t-1}$,
– the nonnegativity inequalities $-x_e \le 0$ for every $e \in E(B_{t-1} : B_{t+1} \cup \ldots \cup B_q)$,
– the nonnegativity inequalities $-x_e \le 0$ multiplied by k − 1 for every $e \in E(B_t : B_1 \cup \ldots \cup B_{t-2})$.
This gives the following inequality:

$\alpha^T x := x(E(S)) + k\,x(E(B_{t-1})) - k\,x(E(B_t : B_1 \cup \ldots \cup B_{t-2})) \le \alpha_0 := |S| + k|B_{t-1}| - k - 1.$

All the inequalities used in the combination are tight at x∗ . Moreover, all the
coefficients in αT x ≤ α0 are identical, modulo k, to the coefficients of the SEC
x(E(S)) ≤ |S| − 1. So we can use the inequalities in the derivation of αT x ≤ α0
in place of the original SEC to obtain a (different) maximally violated mod-k
cut. Applying this procedure recursively yields the result.

As an immediate consequence, one has


Theorem 5. For any given k, maximally violated mod-k cuts for the TSP can be found in $O(|E^*||V|^2)$ time, i.e., in $O(|V|^4)$ time in the worst case.

Proof. Theorem 1 gives an O(mn min{m, n})-time separation algorithm, where m is the number of tight constraints (6)–(7), and $n = |E^*| = O(|V|^2)$ is the number of nonzero components in $x^*$. By virtue of Theorem 4, only $O(|V|)$ tight sets need be considered explicitly, hence $m = O(|V|)$ and the claim follows.

The practical efficiency of the mod-k separation algorithm can be improved even further, as it turns out that one can always disregard all dominoes except one (arbitrarily chosen) in each necklace. This is expressed in the following theorem, whose proof is contained in the full paper.
Theorem 6. Let B contain, for each necklace in F (x∗ ), all beads and a single
(arbitrary) domino. If any TSP mod-k cut is maximally violated by x∗ , then
there exists a maximally violated mod-k cut whose Chvátal-Gomory derivation
uses SEC’s associated with sets S ∈ B only.
In the full paper we will also derive similar results for the Asymmetric TSP
polytope.

4 Specific Classes of mod-k Cuts for the TSP

In this section we analyze specific classes of facet-defining mod-k cuts for the
Symmetric TSP. We also briefly mention some analogous results for the Asym-
metric TSP which will be presented in detail in the full paper.
We first address mod-2 cuts that can be obtained from (6)–(8). A well known
class of such cuts is that of comb inequalities, as introduced by Edmonds [14] in
the context of matching theory, and extended by Chvátal [10] and by Grötschel
and Padberg [17,18] for the TSP. Comb inequalities are defined as follows. We
are given a handle set $H \subset V$ and t ≥ 3, t odd, tooth sets $T_1, \ldots, T_t \subset V$ such that $T_i \cap H \ne \emptyset$ and $T_i \setminus H \ne \emptyset$ hold for any i = 1, . . . , t. The comb inequality associated with $H, T_1, \ldots, T_t$ reads:

$x(E(H)) + \sum_{i=1}^{t} x(E(T_i)) \le |H| + \sum_{i=1}^{t} (|T_i| - 1) - \frac{t+1}{2}.$    (9)

The simplest case of comb inequalities arises for |Ti | = 2 for i = 1, . . . , t, leading
to the Edmonds’ 2-matching constraints. It is well known that comb inequalities
define facets of the TSP polytope [19]. Also well known is that comb inequalities
are mod-2 cuts.
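
As a worked instance of inequality (9), the sketch below evaluates a 2-matching constraint (a comb with |T_i| = 2) at a small fractional point. The function names and the 6-node example (two triangles with edge values 1/2, joined by three edges of value 1) are assumptions made here for illustration.

```python
from fractions import Fraction


def x_inside(x, S):
    """Return x(E(S)): the total value of edges with both end nodes in S."""
    S = set(S)
    return sum((val for (u, v), val in x.items() if u in S and v in S), Fraction(0))


def comb_violation(x, handle, teeth):
    """Return left-hand side minus right-hand side of comb inequality (9).

    A positive value means that x violates the inequality by that amount.
    """
    t = len(teeth)
    lhs = x_inside(x, handle) + sum(x_inside(x, T) for T in teeth)
    rhs = len(handle) + sum(len(T) - 1 for T in teeth) - Fraction(t + 1, 2)
    return lhs - rhs


half = Fraction(1, 2)
x = {(0, 1): half, (1, 2): half, (0, 2): half,      # handle triangle
     (3, 4): half, (4, 5): half, (3, 5): half,      # opposite triangle
     (0, 3): Fraction(1), (1, 4): Fraction(1), (2, 5): Fraction(1)}

violation = comb_violation(x, handle={0, 1, 2}, teeth=[{0, 3}, {1, 4}, {2, 5}])
print(violation)  # 1/2
```

The left-hand side is 3/2 + 3 = 9/2 and the right-hand side is 3 + 3 − 2 = 4, so the point violates the comb by 1/2, i.e., maximally for a mod-2 cut.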
As already mentioned, no polynomial-time exact separation algorithm for
comb inequalities is known at present. A heuristic scheme for maximally violated
comb inequalities has been recently proposed by Applegate, Bixby, Chvátal and
Cook [1], and elaborated by Fleischer and Tardos [16] to give a polynomial-time
exact method for the case of x∗ with planar support. Here, comb separation is
viewed as the problem of “building-up” a comb structure starting with a given
set of dominoes. The interested reader is referred to [1] and [16] for a detailed
description of the method.
Fig. 3. (a) The support graph of a simple extended comb inequality; all the
drawn edges, as well as the edges in E(H), have coefficient 1. (b) A mod-2
derivation, obtained by combining the degree equations on the black nodes and
the SEC’s on the sets drawn in continuous line (the nonnegativity inequalities
used in the derivation are not indicated).

Theorem 5 puts comb separation in a different light, in that it allows for efficient exact separation of maximally violated members of the family of mod-2 cuts which contains, among others, comb inequalities.
One may wonder whether comb inequalities are the only TSP facet-defining
mod-2 cuts with respect to formulation (6)–(8). This is not the case; in particular,
we address in the full paper the facet-defining extended comb inequalities of
Naddef and Rinaldi [22] (see Figure 3 for an illustration) and prove the following

Theorem 7. Extended comb inequalities are facet-defining TSP mod-2 cuts.


Extended comb inequalities can be derived from 2-matching constraints by
means of two general lifting operations, called edge-cloning and 0-node lifting.
These operations have been studied by Naddef and Rinaldi [23] who proved that,
under mild assumptions, they preserve the facet-defining property of the original
inequality. Interestingly, at least for the case of extended comb inequalities both
operations do not increase the Chvátal rank [25] of the starting inequality, and
also preserve the property of being a mod-2 cut. One may wonder whether this
property is true in general. An answer to this question will be given in the full
paper, where we study the two operations in the more general context of the
Asymmetric TSP.
A family N of sets $S_1, \ldots, S_k \subseteq V$ is called nested (or laminar) if, for all i, j,
$S_i \cap S_j \ne \emptyset$ implies $S_i \subseteq S_j$ or $S_j \subseteq S_i$. The node sets associated with SEC’s
with nonzero multipliers in the Chvátal-Gomory derivation of an extended comb
inequality define a nested family N with nesting degree not greater than 2, in the
sense that N does not contain 3 subsets S1 ⊂ S2 ⊂ S3 . Actually, it is easy to show
that any mod-k cut can be derived by only using SEC’s associated with subsets
defining a nested family. Interestingly, there are mod-2 facet-defining TSP cuts
whose Chvátal-Gomory derivation involves SEC’s with nesting level greater than 2. Here is an example. Consider the fractional point $x^*$ of Figure 4(a). It is not hard to check by complete enumeration that $x^*$ maximally violates no extended comb inequality. However, $x^*$ maximally violates the mod-2 cut whose derivation is illustrated in Figure 4(b). It can be shown that this inequality is facet defining for the TSP polytope when |V| ≥ 12.

Fig. 4. (a) A fractional point $x^*$ which violates maximally no extended comb inequality. (b) The derivation of a mod-2 cut which is maximally violated by $x^*$.

Examples of facet-defining mod-3 cuts for the TSP are the Christof, Jünger
and Reinelt [9] NEW1 inequality along with a generalization which we will in-
troduce in the full paper.
Also in the full paper we give examples of facet-defining mod-k cuts for
the Asymmetric TSP. These include, for k = 2, a generalization of the source-
destination inequalities of Balas and Fischetti [5], for k = 3, generalizations of
the NEW1 inequality and the C3 inequalities of Grötschel and Padberg [19]
and, finally, for arbitrary k, generalizations of the $D_k^+$ and $D_k^-$ inequalities of
Grötschel and Padberg [19].

5 Concluding Remarks

Recent developments in cutting-plane algorithms, such as the work of Balas, Ceria and Cornuéjols [2,3] and Balas, Ceria, Cornuéjols and Natraj [4] on lift-and-project (disjunctive) cuts and Gomory cuts, put the emphasis on the separation of large classes of inequalities which are not given explicitly. The approach developed in this paper provides still another tool for tackling hard problems.
Future theoretical research should be devoted to the study of the structure
of undominated (facet-defining) mod-k TSP cuts. One should also address mod-
k cuts for other combinatorial problems. Furthermore, the practical use of the
separation methods herein proposed should be investigated.
References
1. D. Applegate, R. Bixby, V. Chvátal, W. Cook (1995). Finding cuts in the TSP (A preliminary report). Technical Report 95–05, DIMACS, Rutgers University, New Brunswick, NJ.
2. E. Balas, S. Ceria, G. Cornuéjols (1993). A lift-and-project cutting plane algorithm
for mixed 0-1 programs. Math. Program. (A) 58, 295–324.
3. E. Balas, S. Ceria, G. Cornuéjols (1996). Mixed 0-1 programming by lift-and-
project in a branch-and-cut framework. Management Sci. 42, 1229–1246.
4. E. Balas, S. Ceria, G. Cornuéjols, N. Natraj (1996). Gomory cuts revisited. Oper.
Res. Lett. 19, 1–9.
5. E. Balas, M. Fischetti (1993). A lifting procedure for the asymmetric traveling
salesman polytope and a large new class of facets. Math. Program. (A) 58, 325–
352.
6. A. Caprara, M. Fischetti (1996). $\{0, \frac{1}{2}\}$-Chvátal-Gomory cuts. Math. Program. (A) 74, 221–235.
7. R. Carr (1995). Separating clique tree and bipartition inequalities in polynomial
time. E. Balas, J. Clausen (eds.). Integer Programming and Combinatorial Op-
timization 4, Lecture Notes in Computer Science, 920, Berlin. Springer-Verlag,
40–49.
8. S. Ceria, G. Cornuéjols, M. Dawande (1995). Combining and strengthening Go-
mory cuts. E. Balas, J. Clausen (eds.). Integer Programming and Combinatorial
Optimization 4, Lecture Notes in Computer Science, 920, Berlin. Springer-Verlag,
438–451.
9. T. Christof, M. Jünger, G. Reinelt (1991). A complete description of the traveling
salesman polytope on 8 nodes. Oper. Res. Lett. 10, 497–500.
10. V. Chvátal (1973). Edmonds polytopes and weakly Hamiltonian graphs. Math.
Program. 5, 29–40.
11. H. Cohen (1995). A Course in Computational Algebraic Number Theory, Springer-
Verlag, Berlin.
12. G. Dantzig, D. Fulkerson, S. Johnson (1954). Solution of a large scale traveling-
salesman problem. Oper. Res. 2, 393–410.
13. E.A. Dinitz, A.V. Karzanov, M.V. Lomonosov (1976). On the structure of a family of minimal weighted cuts in a graph. A.A. Fridman (ed.) Studies in Discrete Optimization, Moscow Nauka, 290–306 (in Russian).
14. J. Edmonds (1965). Maximum matching and a polyhedron with 0,1-vertices. J.
Res. Natl. Bureau of Standards 69, 125–130.
15. L. Fleischer (1998). Building the chain and cactus representations of all minimum
cuts from Hao-Orlin in same asymptotic run time. R. Bixby, E. Boyd, R. Rios
Mercado (eds.). Integer Programming and Combinatorial Optimization 6, Lecture
Notes in Computer Science, Berlin. Springer-Verlag.
16. L. Fleischer, É. Tardos (1996). Separating maximally violated comb inequalities
in planar graphs. W. Cunningham, S. McCormick, M. Queyranne (eds.). Integer
Programming and Combinatorial Optimization 5, Lecture Notes in Computer Sci-
ence, 1084, Berlin. Springer-Verlag, 475–489. Revised version to appear in Math.
Oper. Res.
17. M. Grötschel, M. Padberg (1979). On the symmetric traveling salesman problem I:
Inequalities. Math. Program. 16, 265–280.
18. M. Grötschel, M. Padberg (1979). On the symmetric traveling salesman problem II:
lifting theorems and facets. Math. Program. 16, 281–302.
19. M. Grötschel, M. Padberg (1985). Polyhedral theory. E. Lawler, J. Lenstra, A. Rinnooy Kan, D. Shmoys (eds.). The Traveling Salesman Problem, John Wiley & Sons, Chichester, 251–305.
20. D. Karger, C. Stein (1996). A new approach to the minimum cut problem. J. ACM
43, 601–640.
21. A.N. Letchford (1998). Separating a superclass of comb inequalities in planar
graphs. Technical Report, Dept. of Man. Science, The Management School, Lan-
caster University, 1998.
22. D. Naddef, G. Rinaldi (1988). The symmetric traveling salesman polytope: New
facets from the graphical relaxation. Technical Report 248, IASI-CNR, Rome.
23. D. Naddef, G. Rinaldi (1993). The graphical relaxation: A new framework for the
symmetric traveling salesman polytope. Math. Program. (A) 58, 53–88.
24. M. Padberg, G. Rinaldi (1991). A branch and cut algorithm for the resolution of
large-scale symmetric traveling salesman problems. SIAM Rev. 33, 60–100.
25. A. Schrijver (1986). Theory of Linear and Integer Programming, John Wiley &
Sons, New York.
Improved Approximation Algorithms for
Capacitated Facility Location Problems

Fabián A. Chudak1 and David P. Williamson2


1 IBM T.J. Watson Research Center, Room 36-241, P.O. Box 218,
Yorktown Heights, NY, 10598. [email protected]
2 IBM T.J. Watson Research Center, Room 33-219, P.O. Box 218,
Yorktown Heights, NY, 10598. [email protected]
http://www.research.ibm.com/people/w/williamson

Abstract. In a recent surprising result, Korupolu, Plaxton, and Rajaraman
[10,11] showed that a simple local search heuristic for the capacitated
facility location problem (CFLP) in which the service costs obey
the triangle inequality produces a solution in polynomial time which is
within a factor of 8 + ε of the value of an optimal solution. By simplifying
their analysis, we are able to show that the same heuristic produces
a solution which is within a factor of 6(1 + ε) of the value of an optimal
solution. Our simplified analysis uses the supermodularity of the
cost function of the problem and the integrality of the transshipment
polyhedron.
Additionally, we consider the variant of the CFLP in which one may
open multiple copies of any facility. Using ideas from the analysis of the
local search heuristic, we show how to turn any α-approximation algo-
rithm for this variant into one which, at an additional cost of twice the
optimum of the standard CFLP, opens at most one additional copy of any
facility. This allows us to transform a recent 3-approximation algorithm
of Chudak and Shmoys [5] that opens many additional copies of facilities
into a polynomial-time algorithm which only opens one additional copy
and has cost no more than five times the value of the standard CFLP.

1 Introduction

We consider the capacitated facility location problem (CFLP). In this problem,
we are given a set of facilities F and a set of clients D. Each client j ∈ D
has a demand dj that must be serviced by one or more open facilities. There
is a cost fi for opening facility i ∈ F , and it costs cij for facility i to service
one unit of demand from client j. We call the first type of cost facility cost
and the second service cost. Furthermore, no facility may service more than
U units of demand. We wish to service all clients at minimum total cost. The
capacitated facility location problem and variations of it have been well-studied
in the literature (see, for example, the book of Mirchandani and Francis [14])
and arise in practice (see, for example, the paper of Barahona and Jensen [3] for
an instance of a parts warehousing problem from IBM).

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 99–113, 1999.
© Springer-Verlag Berlin Heidelberg 1999

The CFLP is NP-hard even in the case that U = ∞, sometimes called the
uncapacitated facility location problem (UFLP) [7]. Thus we turn our attention
to approximation algorithms. We say we have an α-approximation algorithm for
the CFLP if the algorithm runs in polynomial time and returns a solution of
value no more than α times the value of an optimal solution. The value α is
sometimes called the performance guarantee of the algorithm.
It is possible to express any instance of the well-known set cover problem as
an instance of the UFLP of the same cost, which implies that unless P = NP,
there is no approximation algorithm for the UFLP with performance guarantee
better than c ln |D|, where c is some constant [13,8,16,1]. Thus we turn to special
cases of the CFLP. In particular, we assume that for any k, l ∈ F ∪ D a service
cost ckl is defined, and the service costs are symmetric and obey the triangle
inequality. This is a natural assumption, since service costs are often associated
with the distance between points in Euclidean space representing facilities and
clients. From now on, when we refer to the CFLP or UFLP, we refer to this
metric case.
Recently, Korupolu, Plaxton, and Rajaraman (KPR) gave the first approx-
imation algorithm for the CFLP with constant performance guarantee [10,11].
Surprisingly, Korupolu et al. show that a simple local search heuristic is guar-
anteed to run in polynomial time and to terminate with a solution of value no
more than (8 + ε) times optimum, for any ε > 0. The central contribution of our
paper is to simplify and improve their analysis of the heuristic, showing that it
is a 6(1 + ε)-approximation algorithm for the CFLP. Although our proof follows
theirs closely at many points, we show that some case distinctions (e.g. “cheap”
versus “expensive” facilities) are unnecessary and some proofs can be simplified
and strengthened by using standard tools from mathematical programming. For
example, using the supermodularity of the cost function of the CFLP reduces a
six and a half page proof to a half page, and using the notion of a transshipment
problem and the integrality of its polyhedron allows us to get rid of the extra-
neous concept of a “refined β-allocation,” which in turn leads to the improved
performance guarantee.
We are also able to use a concept translated from KPR to get an improved
approximation algorithm for a variant of the CFLP. The variant we consider is
the one in which a solution may open up to k copies of facility i, each at cost
fi and having capacity U , and we denote this problem the k-CFLP (so that the
ordinary CFLP is the same as the 1-CFLP). Shmoys, Tardos, and Aardal [17] give
a polynomial-time algorithm for the 7/2-CFLP which produces a solution of value
no more than 7 times the optimal value of the 1-CFLP. Chudak and Shmoys [6],
building on previous work [4,5] for the UFLP, give a 3-approximation algorithm
for the ∞-CFLP. Here we show how to take any solution for the ∞-CFLP and
produce from it a solution for the 2-CFLP adding cost no more than twice the
optimal value of the 1-CFLP. Thus by using the Chudak-Shmoys algorithm, we
are able to produce solutions in polynomial time for the 2-CFLP of cost no more
than 5 times the optimal value of the 1-CFLP, improving the previous result of
Shmoys et al. [17].
The recent work on approximation algorithms for facility location problems
was started by the paper of Shmoys, Tardos, and Aardal [17], who gave a 3.16-
approximation algorithm for the UFLP, the first approximation algorithm for
this problem with a constant performance guarantee. Guha and Khuller [9] then
gave a 2.41-approximation algorithm for the UFLP, which was followed by a
1.74-approximation algorithm due to Chudak and Shmoys [4,5]. Unlike the local
search algorithm of KPR, the algorithms of these papers are based on techniques
for deterministically and randomly rounding solutions to linear programming re-
laxations of the UFLP. Additionally, an observation of Sviridenko [18], combined
with a result of Guha and Khuller [9] implies that no approximation algorithm
for the UFLP with performance guarantee 1.46 is possible, unless P = NP.
The rest of the paper is structured as follows. We begin in Section 2, where
we introduce the local search algorithm of KPR, define some notation, and prove
some preliminary lemmas. We then, in Section 3, define the concept of a “swap
graph”, analogous to the concept of the β-allocation problem in KPR, and show
how it leads to our algorithm for the 2-CFLP. Finally, we show how to obtain an
improved analysis of the local search algorithm using the swap graph in Section
4.

2 The Local Search Algorithm


2.1 Preliminaries
In this section, we define some notation and give some preliminary lemmas that
will be needed in subsequent discussion. Given a set S ⊆ F of facilities to open,
it is easy to determine the minimum service costs for that set of facilities by
solving the following transportation problem: for each facility i ∈ S we have a
supply node i with supply U , and for each client j ∈ D we have a demand node
j with demand dj ; the unit shipping cost from i to j is cij . When discussing
the k-CFLP, we will let S be a multiset of facilities (here l copies of i ∈ F in
S corresponds to opening l facilities at location i). We let x(S, i, j) denote the
amount of demand of client j serviced by facility i in the solution given by S. We
will denote the overall cost of the location problem given by opening the facilities
in S by c(S). Furthermore, we let cf(S) denote the facility costs of the solution
S (i.e., $c_f(S) = \sum_{i \in S} f_i$) and cs(S) denote the service costs of the solution S
(i.e., $c_s(S) = \sum_{i \in F,\, j \in D} c_{ij}\, x(S, i, j)$). Let S∗ denote the set of facilities opened
by some optimal solution; it will always be a solution to the 1-CFLP and hence
not a multiset. Let n = |F|.
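The transportation problem that defines the service cost can be solved with any min-cost flow routine. The following is only a minimal sketch, assuming integer demands and a textbook successive-shortest-path method with Bellman-Ford; the function name `service_cost`, its argument layout, and the tiny instance are illustrative assumptions, not part of the paper.

```python
from math import inf

def service_cost(open_facilities, demands, cost, U):
    """Return (c_s(S), x) for the open facilities S, or (inf, {}) if infeasible.

    open_facilities: list of facility ids; repeat an id to open several copies
    demands: dict client -> integer demand d_j
    cost: dict (facility, client) -> unit service cost c_ij
    U: capacity of every open copy
    """
    facs, clients = list(open_facilities), list(demands)
    m, total = len(facs), sum(demands.values())
    src, snk = 0, m + len(clients) + 1
    graph = [[] for _ in range(snk + 1)]  # node -> indices of outgoing edges
    to, cap, wt = [], [], []              # edge e and its reverse e ^ 1 are stored in pairs

    def add_edge(u, v, c, w):
        graph[u].append(len(to)); to.append(v); cap.append(c); wt.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); wt.append(-w)

    ship = {}  # (copy index, client index) -> index of the shipping edge
    for a, i in enumerate(facs):
        add_edge(src, 1 + a, U, 0)                  # supply U at each open copy
        for b, j in enumerate(clients):
            ship[a, b] = len(to)
            add_edge(1 + a, 1 + m + b, total, cost[i, j])
    for b, j in enumerate(clients):
        add_edge(1 + m + b, snk, demands[j], 0)     # demand d_j at each client

    flow = value = 0
    while flow < total:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist, prev = [inf] * len(graph), [None] * len(graph)
        dist[src] = 0
        for _ in range(len(graph) - 1):
            for u in range(len(graph)):
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + wt[e] < dist[to[e]]:
                        dist[to[e]], prev[to[e]] = dist[u] + wt[e], e
        if dist[snk] == inf:
            return inf, {}                          # not enough capacity in S
        push, v = total - flow, snk
        while v != src:                             # bottleneck along the path
            push, v = min(push, cap[prev[v]]), to[prev[v] ^ 1]
        v = snk
        while v != src:                             # augment along the path
            e = prev[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        value += push * dist[snk]

    x = {}
    for (a, b), e in ship.items():
        if cap[e ^ 1] > 0:                          # reverse capacity = units shipped
            key = (facs[a], clients[b])
            x[key] = x.get(key, 0) + cap[e ^ 1]
    return value, x

# Hypothetical instance: two facilities of capacity U = 2, two clients.
demands = {'u': 2, 'v': 1}
cost = {('A', 'u'): 1, ('A', 'v'): 1, ('B', 'u'): 3, ('B', 'v'): 2}
print(service_cost(['A', 'B'], demands, cost, 2))
```

Passing a list with repeated ids (for example `['A', 'A']`) models the multiset case used for the k-CFLP; adding the facility costs of the listed copies to the returned value gives c(S).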
The local search algorithm given by Korupolu et al. for the CFLP is the
following: given a current solution S, perform any one of three types of operations
that improve the value of the solution by at least c(S)/p(n, ε), where p(n, ε) is
a suitably chosen polynomial in n and 1/ε, and continue doing so until none of
these operations results in an improvement of at least that much. The operations
are: adding a facility i ∈ F − S to S (i.e., S ← S + i); dropping a facility
i ∈ S (i.e., S ← S − i); or swapping a facility i ∈ S for a facility i′ ∈ F − S
(i.e., S ← S − i + i′). We call any operation that improves the solution by at
least c(S)/p(n, ε) an admissible operation; thus the algorithm runs until there
are no more admissible operations. This heuristic runs in polynomial time, as
Korupolu et al. argued: start with some arbitrary feasible solution (for instance,
setting S = F). Since in each step the value of the solution improves by a factor
of (1 − 1/p(n, ε)), after p(n, ε) operations the value of the solution will have
improved by a constant factor. Since the value of the solution can't be smaller than
c(S∗), after O(p(n, ε) log(c(F)/c(S∗))) operations the algorithm will terminate.
Each local search step can be implemented in polynomial time, and O(p(n, ε) log c(F))
is a polynomial in the input size, so overall the algorithm takes polynomial time.
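The add/drop/swap loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the cost oracle `cost` is supplied by the caller, the best available move is taken at each step (the algorithm allows any admissible move), p(n, ε) is set to 8n/ε, and the demonstration oracle `uncapacitated_cost` uses the U = ∞ special case only to keep the example short.

```python
from math import inf

def local_search(facilities, cost, eps):
    """Return a set S for which no add, drop or swap is admissible.

    cost: oracle mapping a frozenset S of open facilities to c(S)
    eps:  accuracy parameter; p(n, eps) = 8n/eps is assumed here
    """
    facilities = list(facilities)
    p = 8 * len(facilities) / eps
    S = frozenset(facilities)                      # arbitrary feasible start: S = F
    while True:
        current = cost(S)
        opened = [i for i in facilities if i in S]
        closed = [i for i in facilities if i not in S]
        moves = [S | {i} for i in closed]                               # add
        moves += [S - {i} for i in opened]                              # drop
        moves += [(S - {i}) | {i2} for i in opened for i2 in closed]    # swap
        best = min(moves, key=cost, default=S)
        # Admissible means: the move improves the cost by at least c(S)/p(n, eps).
        if cost(best) >= current or current - cost(best) < current / p:
            return S
        S = best

def uncapacitated_cost(S, f, d, c):
    """Demo oracle (assumption): facility costs plus nearest-facility service costs."""
    if not S:
        return inf
    return sum(f[i] for i in S) + sum(d[j] * min(c[i, j] for i in S) for j in d)

# Hypothetical instance: facilities a, b, c and clients x, y placed on a line.
pos = {'a': 0, 'x': 1, 'b': 3, 'y': 5, 'c': 6}
f = {'a': 10, 'b': 1, 'c': 10}
d = {'x': 1, 'y': 1}
c = {(i, j): abs(pos[i] - pos[j]) for i in f for j in d}
S = local_search(f, lambda T: uncapacitated_cost(T, f, d, c), eps=0.5)
print(sorted(S), uncapacitated_cost(S, f, d, c))
```

For the capacitated problem the oracle would instead solve the transportation problem for S and add the facility costs.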
We now turn to proving some preliminary lemmas. These lemmas use the
fact that the cost function c is supermodular; that is, if A, B ⊆ F , we have that

c(A) + c(B) ≤ c(A ∩ B) + c(A ∪ B).

(See Babayev [2], Propositions 3.3 and 3.4 of Nemhauser, Wolsey, and Fisher
[15].) In particular, cs is supermodular, while cf is modular (that is, cf (A) +
cf (B) = cf (A ∩ B) + cf (A ∪ B)). We will use the fact that supermodularity holds
even for multisets.
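On very small instances the supermodular inequality can be checked by brute force. The sketch below is illustrative only: the helper `is_supermodular`, the line instance, and the use of the uncapacitated service cost (U = ∞, with infinite cost for the empty set) are assumptions made to keep the example self-contained.

```python
from itertools import combinations
from math import inf

def is_supermodular(ground, f, tol=1e-9):
    """Check f(A) + f(B) <= f(A & B) + f(A | B) for all subsets A, B (exponential time)."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    return all(f(A) + f(B) <= f(A & B) + f(A | B) + tol
               for A in subsets for B in subsets)

# Hypothetical instance: points on a line, unit demands, uncapacitated service cost.
pos = {'a': 0, 'x': 1, 'b': 3, 'y': 5, 'c': 6}
facilities, demand = ['a', 'b', 'c'], {'x': 1, 'y': 1}

def cs(S):
    if not S:
        return inf            # with nothing open, no client can be served
    return sum(dj * min(abs(pos[i] - pos[j]) for i in S) for j, dj in demand.items())

print(is_supermodular(facilities, cs))
```

A function such as `lambda S: -len(S) ** 2` fails the same check, which gives a quick sanity test of the helper.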
We show the following three lemmas:

Lemma 2.1. If c(S) ≥ (1 + ε)c(S∗) and S ⊆ S∗, then there is an admissible add
operation.

Lemma 2.2. If c(S) ≥ (1 + ε)c(S∗) and S ⊇ S∗, then there is an admissible drop
operation.

Lemma 2.3 (KPR [11], Lemma 9.3). If there is no admissible add operation,
then cs(S) ≤ c(S∗) + nc(S)/p(n, ε).

In addition, in Section 4, we show the following theorem.

Theorem 2.4. If neither S ⊆ S∗ nor S ⊇ S∗ and there are no admissible drops
or swaps, then

$$c_f(S - S^*) \le 3c_f(S^* - S) + 2c_s(S) + 2c_s(S^*) + \frac{n\,c(S)}{p(n,\epsilon)}.$$

2.2 The Main Theorem

Before proving the lemmas, we show how they lead to the 6(1 + ε)-approximation
algorithm for the CFLP.

Theorem 2.5. If there are no admissible operations, then

$$c(S) \le 6(1+\epsilon)\,c(S^*).$$

Proof. If there are no admissible operations and if S ⊆ S∗ or S ⊇ S∗, then by
Lemmas 2.1 and 2.2 we know that c(S) ≤ (1 + ε)c(S∗). If there are no admissible
operations and neither S ⊆ S∗ nor S ⊇ S∗ then

$$c_f(S - S^*) \le 3c_f(S^* - S) + 2c_s(S) + 2c_s(S^*) + \frac{n\,c(S)}{p(n,\epsilon)},$$

by Theorem 2.4. Adding cf(S ∩ S∗) + cs(S) to both sides, we obtain

$$\begin{aligned}
c(S) &\le 2c_f(S^* - S) + c_f(S^*) + 3c_s(S) + 2c_s(S^*) + \frac{n\,c(S)}{p(n,\epsilon)} \\
     &\le 3c(S^*) + 3c(S^*) + \frac{4n\,c(S)}{p(n,\epsilon)},
\end{aligned}$$

using Lemma 2.3. Then

$$c(S)\left(1 - \frac{4n}{p(n,\epsilon)}\right) \le 6c(S^*),$$

or

$$c(S) \le \frac{6}{1 - \frac{4n}{p(n,\epsilon)}}\, c(S^*).$$

This gives that c(S) ≤ 6(1 + ε)c(S∗) for p(n, ε) ≥ 8n/ε and ε < 1. □
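The final step of the proof can be spelled out as follows. This worked check is added for illustration and uses only the stated conditions p(n, ε) ≥ 8n/ε and 0 < ε < 1.

```latex
% p(n,\epsilon) \ge 8n/\epsilon gives 4n/p(n,\epsilon) \le \epsilon/2, hence
\[
  \frac{6}{1 - \frac{4n}{p(n,\epsilon)}} \;\le\; \frac{6}{1 - \epsilon/2}.
\]
% It remains to see that 1/(1 - \epsilon/2) \le 1 + \epsilon:
\[
  (1+\epsilon)\Bigl(1 - \frac{\epsilon}{2}\Bigr)
  \;=\; 1 + \frac{\epsilon}{2}\,(1 - \epsilon) \;\ge\; 1
  \qquad \text{for } 0 < \epsilon \le 1 .
\]
```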

2.3 Proofs of Preliminary Lemmas


We start by proving somewhat more general forms of Lemmas 2.1 and 2.2, and
deriving those Lemmas as corollaries.

Lemma 2.6. Let f : 2^V → ℝ be any supermodular function. If S ⊂ S∗ ⊆ V and
f(S) ≥ γf(S∗), then there exists u ∈ S∗ − S such that

$$f(S + u) - f(S) \le \frac{1}{|S^* - S|}\left(\frac{1}{\gamma} - 1\right) f(S).$$

Proof. Let W = S∗ − S = {u1, u2, ..., uk}. Let Wi = {u1, ..., ui}. The statement
certainly holds if |S∗ − S| = 1, so assume that |S∗ − S| ≥ 2. Then by the
supermodularity of f we know that:

$$\begin{aligned}
f(S + W_{k-1}) + f(S + u_k) &\le f(S^*) + f(S) \\
f(S + W_{k-2}) + f(S + u_{k-1}) &\le f(S + W_{k-1}) + f(S) \\
&\;\;\vdots \\
f(S + W_2) + f(S + u_3) &\le f(S + W_3) + f(S) \\
f(S + u_1) + f(S + u_2) &\le f(S + W_2) + f(S).
\end{aligned}$$

Summing the inequalities and subtracting $\sum_{i=2}^{k-1} f(S + W_i)$ from both sides, we
obtain

$$\sum_{i=1}^{k} f(S + u_i) \le f(S^*) + (k - 1) f(S),$$

so that there exists some i such that

$$f(S + u_i) - f(S) \le \frac{1}{k}\bigl(f(S^*) - f(S)\bigr) \le \frac{1}{k}\left(\frac{1}{\gamma} - 1\right) f(S).$$
□

Proof of Lemma 2.1: It follows from Lemma 2.6 that if c(S) ≥ (1 + ε)c(S∗),
then there exists an add operation that changes the cost by no more than
$\frac{1}{n}\left(\frac{1}{1+\epsilon} - 1\right) c(S) \le -c(S)/p(n,\epsilon)$, for $p(n,\epsilon) \ge \frac{n(1+\epsilon)}{\epsilon}$.
So there is an admissible add operation. □
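The threshold on p(n, ε) in this proof follows from a one-line computation, written out here as an added illustration. It applies Lemma 2.6 with γ = 1 + ε and uses |S∗ − S| ≤ n together with the fact that the bracketed factor is negative.

```latex
\[
  \frac{1}{|S^* - S|}\Bigl(\frac{1}{1+\epsilon} - 1\Bigr) c(S)
  \;\le\; \frac{1}{n}\Bigl(\frac{1}{1+\epsilon} - 1\Bigr) c(S)
  \;=\; -\frac{\epsilon}{n(1+\epsilon)}\, c(S),
\]
\[
  -\frac{\epsilon}{n(1+\epsilon)}\, c(S) \;\le\; -\frac{c(S)}{p(n,\epsilon)}
  \quad\Longleftrightarrow\quad
  p(n,\epsilon) \;\ge\; \frac{n(1+\epsilon)}{\epsilon}.
\]
% Example: n = 10 and \epsilon = 1/2 require p(n,\epsilon) \ge 30.
```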
Proof of Lemma 2.3: By modifying the last few lines of the proof of Lemma
2.6, it follows that if f(S∗) ≤ f(S) − β, then there exists a ui ∈ S∗ − S such that
f(S + ui) − f(S) ≤ −β/|S∗ − S|. Suppose it is the case that cs(S) > c(S∗) + nc(S)/p(n, ε).
Then by adding cf(S) to the left-hand side, cf(S − S∗) to the right-hand side,
and observing that cs(S ∪ S∗) ≤ cs(S∗), we have that c(S) > c(S∗ ∪ S) + nc(S)/p(n, ε).
Setting β = nc(S)/p(n, ε) and applying the above gives us that there is an admissible
add operation, proving the lemma. □

Lemma 2.7. Let f : 2^V → ℝ be any supermodular function. If S ⊃ S∗ and
f(S) ≥ γf(S∗) for γ ≥ 1, then there exists u ∈ S − S∗ such that

$$f(S - u) - f(S) \le \frac{1}{|S - S^*|}\left(\frac{1}{\gamma} - 1\right) f(S).$$

Proof. The proof of Lemma 2.7 is similar to that of Lemma 2.6, and so we omit
it. □

Proof of Lemma 2.2: It follows from Lemma 2.7 that if c(S) ≥ (1 + ε)c(S∗),
then there exists a drop operation that changes the cost by no more than
$\frac{1}{n}\left(\frac{1}{1+\epsilon} - 1\right) c(S) \le -c(S)/p(n,\epsilon)$, for $p(n,\epsilon) \ge \frac{n(1+\epsilon)}{\epsilon}$.
So there is an admissible drop operation. □
Note then that we need p(n, ε) ≥ n(1 + ε)/ε (from the proofs of Lemmas 2.1 and
2.2) and p(n, ε) ≥ 8n/ε (from the proof of Theorem 2.5). Thus p(n, ε) = 8n/ε
is sufficient (assuming ε < 1).

3 Path Decompositions and the Swap Graph


3.1 A Path Decomposition
In this section, we define a path decomposition and a concept called the swap
graph which will be useful in both of our results. The path decomposition is more
or less equivalent to the “difference graph” of Korupolu et al. [11] Appendix B,
while the swap graph roughly corresponds to their “β-allocation problem”. The
path decomposition is useful in comparing the value of our current solution with
the optimal solution. The swap graph will be used in the analysis of the local
search algorithm (in the proof of Theorem 2.4) and will be used in the algorithm
and analysis of our result for the 2-CFLP.
To obtain the path decomposition, we start with some current solution S and
the optimal solution S ∗ . We construct the following directed graph: we include
a node j for each client j ∈ D, and a node i for each facility i ∈ S ∪ S ∗ . We
include an arc (j, i) of weight w(j, i) = x(S ∗ , i, j) for all i ∈ S ∗ , j ∈ D when
x(S ∗ , i, j) > 0, and an arc (i, j) of weight w(i, j) = x(S, i, j) for all i ∈ S, j ∈ D
when x(S, i, j) > 0. Observe that by the properties of x, the total weight of all
arcs incoming to a node j for j ∈ D is dj , as is the total weight of all outgoing
arcs. The total weight of arcs incoming to any node i for i ∈ S∗ is at most U,
and the total weight of arcs going out of any node i for i ∈ S is also at most U.
Furthermore, notice that, summing over all arcs, $\sum c_{ij}\, w(i, j) = c_s(S^*) + c_s(S)$.
By standard path-stripping arguments, we can decompose this graph into
a set of paths $\mathcal{P}$ and cycles. We ignore the cycles; the paths start at nodes in
S and end at nodes in S∗. Let the weight of a path P be denoted w(P), and,
overloading notation somewhat, let c(P) denote its cost ($c(P) = \sum_{(i,j) \in P} c_{ij}$).
Then $\sum_{P \in \mathcal{P}} c(P)\, w(P) \le c_s(S) + c_s(S^*)$. For any subset of paths $\mathcal{P}' \subseteq \mathcal{P}$, let
$\mathcal{P}'(A, \cdot)$ denote the set of paths in $\mathcal{P}'$ starting at nodes i ∈ A for A ⊆ S, let
$\mathcal{P}'(\cdot, B)$ denote the set of paths ending at nodes i′ ∈ B for B ⊆ S∗. Then
$\mathcal{P}'(A, B)$ denotes the set of paths in $\mathcal{P}'$ from i ∈ A ⊆ S to i′ ∈ B ⊆ S∗. Also,
let $w(\mathcal{P}') = \sum_{P \in \mathcal{P}'} w(P)$, and $\mathrm{val}(\mathcal{P}') = \sum_{P \in \mathcal{P}'} w(P)\, c(P)$. Thus, for instance,
$\mathrm{val}(\mathcal{P}) \le c_s(S) + c_s(S^*)$.
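Path stripping itself is a short routine. The sketch below is one possible version under stated assumptions: arcs are given as a dictionary of positive weights, the name `strip_paths` and the small instance are hypothetical, and weights are expected to be integers or well-conditioned floats (a tolerance `tol` guards the comparisons).

```python
from collections import defaultdict

def strip_paths(arcs, tol=1e-12):
    """Decompose arc weights into weighted paths; cycles are cancelled and dropped.

    arcs: dict (u, v) -> positive weight. Returns a list of (path, weight) pairs,
    where each path runs from a node with net outflow to a node with net inflow.
    """
    rem = {a: x for a, x in arcs.items() if x > tol}   # remaining arc weights
    excess = defaultdict(float)                        # outflow minus inflow
    for (u, v), x in rem.items():
        excess[u] += x
        excess[v] -= x

    def subtract(nodes, amount):
        for a in zip(nodes, nodes[1:]):
            rem[a] -= amount
            if rem[a] <= tol:
                del rem[a]

    paths = []
    for s in [u for u in excess if excess[u] > tol]:
        while excess[s] > tol:
            walk, seen = [s], {s: 0}
            while excess[walk[-1]] >= -tol:            # stop at a node with net inflow
                u = walk[-1]
                v = next(y for (x, y) in rem if x == u)    # any remaining arc out of u
                if v in seen:                          # closed a cycle: cancel it, restart
                    cycle = walk[seen[v]:] + [v]
                    subtract(cycle, min(rem[a] for a in zip(cycle, cycle[1:])))
                    break
                seen[v] = len(walk)
                walk.append(v)
            else:                                      # walk ended at a sink: record the path
                t = walk[-1]
                amount = min(min(rem[a] for a in zip(walk, walk[1:])),
                             excess[s], -excess[t])
                subtract(walk, amount)
                excess[s] -= amount
                excess[t] += amount
                paths.append((tuple(walk), amount))
    return paths

# Hypothetical instance: a is in S only, b in S* only, c in both; clients j and k.
arcs = {('a', 'j'): 1, ('j', 'c'): 1, ('c', 'k'): 2, ('k', 'c'): 1, ('k', 'b'): 1}
print(strip_paths(arcs))
```

In this instance the walk from a first closes the cycle between c and k, which is cancelled, and then yields a single path from a to b of weight 1.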

3.2 The Swap Graph


The swap graph simply corresponds to a transshipment problem from a specified
subset S′ of nodes of a current solution S (possibly a multiset) to a subset of
facilities F′ ⊂ F. We place demands of 1 on the nodes of S′ and integer supplies
on the facilities F′, and set the cost of an edge ĉkl from k ∈ S′ to l ∈ F − S
to be U ckl + fl − fk. When using a swap graph, we use the path decomposition
to prove that a fractional solution of some value β exists to the transshipment
problem. Then by the integrality of the transshipment polyhedra, we know that
there exists an integral solution to the transshipment problem of cost no more
than β such that one unit of flow is shipped from each node in S′ to exactly one
node in F′.
We then observe that in the integral solution to the transshipment problem,
each unit of flow from k ∈ S′ to l ∈ F′ corresponds to a swap operation in our
current solution that can be performed while increasing the cost of the current
solution by no more than ĉkl: each unit of demand assigned from client j to
k ∈ S′ in the current solution can be assigned to l ∈ F′ at a change in cost of

$$c_{lj} - c_{kj} \le c_{lk} + c_{kj} - c_{kj} \le c_{kl}.$$

There are at most U units of demand assigned to k ∈ S′, so the total change
in cost of transferring the demand assigned to k to l is at most U ckl, and the
change in cost of closing facility k and opening facility l is fl − fk. Thus the
overall cost of performing the swap is at most U ckl + fl − fk = ĉkl.
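As a small numerical illustration of this bound, the sketch below compares the swap-graph edge cost with the exact change in cost of moving all demand from k to l. The helper names and the line instance are assumptions introduced for the example.

```python
def swap_cost_bound(k, l, U, c, f):
    """Swap-graph edge cost: U * c_kl + f_l - f_k."""
    return U * c[k, l] + f[l] - f[k]

def actual_swap_change(k, l, assigned, c, f):
    """Exact change in cost when every unit served by k is moved to l.

    assigned: dict client -> units of demand currently served by k
    """
    moved = sum(x * (c[l, j] - c[k, j]) for j, x in assigned.items())
    return moved + f[l] - f[k]

# Hypothetical metric instance: facilities k, l and clients j1, j2 on a line.
pos = {'k': 0, 'j1': 1, 'l': 3, 'j2': 5}
c = {(u, v): abs(pos[u] - pos[v]) for u in pos for v in pos}
f = {'k': 4, 'l': 1}
assigned = {'j1': 1, 'j2': 1}          # k is used at full capacity U = 2

bound = swap_cost_bound('k', 'l', 2, c, f)
change = actual_swap_change('k', 'l', assigned, c, f)
print(change, bound)
```

The bound is loose here because one client is closer to l than to k; the triangle inequality only guarantees that the exact change never exceeds it.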

3.3 An Algorithm for the 2-CFLP


To illustrate the use of the swap graph, we give an algorithm such that given
any solution to the k-CFLP, for k > 2, the algorithm returns a solution to the
2-CFLP at additional cost no more than twice the cost of an optimal solution
to the 1-CFLP. Given a solution S to the k-CFLP (a multiset), the algorithm
works as follows. As long as there exists an add operation that reduces the cost,
we add facilities from F − S to S one at a time. Let S̃ be the solution when there
are no longer add operations that improve the cost of the solution. We then solve
a transshipment problem via the swap graph between nodes in S′ ⊆ S̃ and F,
where S′ consists of all the copies of facilities that are used at full capacity; let
S̃1 = S̃ − S′ be the remaining copies of facilities in S̃. We put demands of 1
on the nodes in S′, and supplies of 1 on the nodes in F, so that we have the
following problem:

$$\begin{aligned}
\min\ & \sum_{k \in S',\, l \in F} \hat{c}_{kl}\, x_{kl} \\
\text{subject to}\ & \sum_{l \in F} x_{kl} = 1 && \forall k \in S' \\
& \sum_{k \in S'} x_{kl} \le 1 && \forall l \in F \\
& x_{kl} \ge 0 && \forall k \in S',\ l \in F.
\end{aligned}$$

Given the integral solution xkl to the transshipment problem, whenever xkl = 1,
we obtain a new solution Ŝ by swapping k ∈ S′ for l ∈ F at change in cost ĉkl.
It is easy to verify that in solution Ŝ we open at most 2 facilities for each i ∈ F,
one possibly from the assignment problem, and one from S̃1, so that we have a
solution to the 2-CFLP.
Certainly the algorithm runs in polynomial time. We can now prove that this
algorithm does not increase the cost of the original solution by much.
Theorem 3.1. The algorithm above, given a solution S to the k-CFLP, produces
a solution Ŝ to the 2-CFLP at additional cost at most twice the optimal value of a
solution to the 1-CFLP.
Proof. We start with the solution S and apply add operations, each of which
does not increase the cost of the solution. Given the solution S̃ (after we have
applied all add operations to S that improve the cost of the solution), let P be
the path decomposition giving paths from facilities in S̃ to an optimal 1-CFLP
solution S ∗ . We use the path decomposition to give a fractional solution to the
transshipment problem of cost no more than
$$c_s(\tilde{S}) + c_s(S^*) - c_f(S') + c_f(S^*) \le c_s(\tilde{S}) + c(S^*).$$

By a small variation of Lemma 2.3, we know that we must have cs(S̃) ≤ c(S∗).
Since the cost of the solution Ŝ obtained after swapping is at most
c(S̃) plus the cost of the solution to the transshipment problem, we know that

$$c(\hat{S}) \le c(\tilde{S}) + 2c(S^*).$$

To obtain a feasible fractional solution x̃ to the transshipment problem, we
set x̃kl to be 1/U times the total weight of paths from k ∈ S′ to l ∈ S∗ (that
is, $\tilde{x}_{kl} = w(\mathcal{P}(k, l))/U$). Clearly x̃ is a feasible solution for the transshipment
problem, since the total weight of paths leaving any k ∈ S′ is U, and the total
weight of paths entering any l ∈ S∗ is at most U. The cost of the solution x̃ is

$$\begin{aligned}
\sum_{k \in S',\, l \in S^*} \hat{c}_{kl}\, \tilde{x}_{kl}
&= \sum_{k \in S',\, l \in S^*} (U c_{kl} - f_k + f_l)\,\frac{1}{U} \sum_{P \in \mathcal{P}(k,l)} w(P) \\
&\le \sum_{k \in S',\, l \in S^*}\ \sum_{P \in \mathcal{P}(k,l)} w(P)\, c_{kl} - c_f(S') + c_f(S^*) \\
&\le \sum_{P \in \mathcal{P}} c(P)\, w(P) - c_f(S') + c_f(S^*) \\
&\le c_s(\tilde{S}) + c_s(S^*) - c_f(S') + c_f(S^*),
\end{aligned}$$

where the inequality ckl ≤ c(P) follows from the triangle inequality. □

Corollary 3.2. There is a polynomial-time algorithm that finds a solution to the
2-CFLP of cost at most 5 times the optimal value of a 1-CFLP solution.

Proof. We apply the 3-approximation algorithm of Chudak and Shmoys [5] for
the ∞-CFLP to obtain our initial solution S. Since the cost of the optimal
solution for ∞-CFLP is at most the cost of the optimal solution for the 1-CFLP,
the corollary follows. □

4 Analysis of the Local Search Algorithm

We now use the path decomposition and swap graph tools from the previous
section to complete our analysis of the local search algorithm, and prove Theorem
2.4. The lemmas we derive below are roughly similar to those of Korupolu et al.
[11]: Lemma 4.2 corresponds to their Claim 9.7, and Lemma 4.3 to their Claims
9.8 and 9.9. However, we do not need an analogue of their “refined β-allocation”,
which gives us an improvement in the analysis in Lemma 4.2.
Let S be a solution meeting the conditions of Theorem 2.4, and let P be the
path decomposition for S and an optimal solution S ∗ . We will be particularly
interested in three subsets of paths from P. The first set is the set of all paths
from nodes in S − S∗ to S ∩ S∗ (if any); we call these the transfer paths and
denote them $\mathcal{T} = \mathcal{P}(S - S^*, S \cap S^*)$. The basic idea of these paths in the proof is
that for any path $P \in \mathcal{T}$, we claim we can transfer w(P) of the demand assigned
to the start node of the path to the end node of the path at a cost of c(P)w(P)
without violating the capacity constraints. We establish this claim later.
The next subset of paths of interest is the set of all paths from S − S∗ to
S∗ − S; we call these the swap paths and denote them $\mathcal{S} = \mathcal{P}(S - S^*, S^* - S)$.
We use the swap paths to get a fractional feasible solution for a transshipment
problem from S − S ∗ to S ∗ − S in the swap graph, and get an integral solution
of swaps whose cost is a simple expression in terms of cs (S), cs (S ∗ ), cf (S − S ∗ ),
and cf (S ∗ − S). Thus if no swap can be performed that improves the cost of the
current solution by a certain amount, this implies a bound on cf (S − S ∗ ).
This idea does not quite work as stated because the weight of swap paths
from i ∈ S −S ∗ could be quite small. Thus, as in Korupolu et al. [11], we split the
nodes of S −S ∗ into two types: heavy nodes H such that the weight of paths from
any i ∈ H to S ∗ − S is at least U/2 (i.e., H = {i ∈ S − S ∗ : w(S(i, ·)) ≥ U/2}),
and light nodes (all the rest: L = S − S ∗ − H). We will be able to set up
a transshipment problem for the nodes in H, which will give us a bound on
cf (H). To get a bound on cf (L), we will have to set up a transshipment problem
in a different manner and use the observation that we can transfer the demand
assigned from one light node to another light node without violating capacity
constraints.
To build towards our proof of Theorem 2.4, we now formalize the statements
above in a series of lemmas.

Lemma 4.1. Weight w(T (i, ·)) of the demand assigned to facility i in the current
assignment can be transferred to other nodes in S at a cost increase of at most
val(T (i, ·)).

Proof. To prove the lemma, consider a path $P \in \mathcal{T}(i, \cdot)$, with start node i and end
node i′. We observe that the first edge (i, j) in path P corresponds to a demand
w(P) assigned to i by client j in the current assignment. We reassign this demand
to i′ ∈ S ∩ S∗; the increase in cost is at most $(c_{i'j} - c_{ij})\, w(P) \le c(P)\, w(P)$ by
the triangle inequality. We now must show that such a reassignment does not
violate the capacity constraints at i′. To see this, observe that by the properties
of path-stripping, the total weight of paths incoming to any node i′ ∈ S∗ ∩ S
is the difference between the total weight of arcs coming into node i′ and the
total weight of arcs going out of node i′. Since the total weight of arcs coming
into node i′ corresponds to the total amount of demand assigned to i′ by the
optimal solution, and the total weight of arcs going out of node i′ corresponds
to the total amount of demand assigned to i′ by the current solution, and the
optimal solution must be feasible, we can increase the demand serviced by i′ by
this difference and still remain feasible. □

Lemma 4.2. If there is not an admissible swap operation, then

$$c_f(H) \le 2c_f(S^* - S) + 2\,\mathrm{val}(\mathcal{S}(H, \cdot)) + \mathrm{val}(\mathcal{T}(H, \cdot)) + \frac{|H|\, c(S)}{p(n,\epsilon)}.$$

Proof. As suggested in the exposition, we set up a transshipment problem from
H to S∗ − S, as follows:

$$\begin{aligned}
\min\ & \sum_{k \in H,\, l \in S^* - S} \hat{c}_{kl}\, x_{kl} \\
\text{subject to}\ & \sum_{l \in S^* - S} x_{kl} = 1 && \forall k \in H \\
& \sum_{k \in H} x_{kl} \le 2 && \forall l \in S^* - S \\
& x_{kl} \ge 0 && \forall k \in H,\ l \in S^* - S.
\end{aligned}$$

We claim that we can give a fractional solution to the transshipment problem
of cost no more than $2\,\mathrm{val}(\mathcal{S}(H, \cdot)) + 2c_f(S^* - S) - c_f(H)$. Thus there exists an
integral solution of no greater cost. Given an integral solution x, when xkl = 1,
we can swap facility k ∈ H for l ∈ S∗ − S and transfer the demand $w(\mathcal{S}(k, \cdot))$
assigned to k at change in cost at most ĉkl. By Lemma 4.1, we can transfer the
remaining demand $w(\mathcal{T}(k, \cdot))$ assigned to k to nodes in S ∩ S∗ at change in cost
at most $\mathrm{val}(\mathcal{T}(k, \cdot))$. By the hypothesis of the lemma, we know that any swap
for a facility results in a change in cost of at least −c(S)/p(n, ε). Summing over
all swaps for k ∈ H given by the solution to the transshipment problem, we have
that

$$2\,\mathrm{val}(\mathcal{S}(H, \cdot)) + 2c_f(S^* - S) - c_f(H) + \mathrm{val}(\mathcal{T}(H, \cdot)) \ge -\frac{|H|\, c(S)}{p(n,\epsilon)}.$$

Rearranging terms gives us the lemma.
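To complete the proof, we give a fractional solution x̃ for this transshipment
problem by setting

$$\tilde{x}_{kl} = \frac{w(\mathcal{S}(k, l))}{w(\mathcal{S}(k, \cdot))}.$$

Certainly the constraints $\sum_{l \in S^* - S} \tilde{x}_{kl} = 1$ are obeyed for all k ∈ H. The
constraints $\sum_{k \in H} \tilde{x}_{kl} \le 2$ are also obeyed since

$$\sum_{k \in H} \tilde{x}_{kl} = \sum_{k \in H} \frac{w(\mathcal{S}(k, l))}{w(\mathcal{S}(k, \cdot))} \le \sum_{k \in H} \frac{w(\mathcal{S}(k, l))}{U/2} \le 2,$$

where the first inequality follows by the definition of H and the second since
the total weight of paths adjacent to any node is at most U. The cost of this
fractional solution is

$$\begin{aligned}
\sum_{k \in H,\, l \in S^* - S} \hat{c}_{kl}\, \tilde{x}_{kl}
&= \sum_{k \in H,\, l \in S^* - S} (U c_{kl} + f_l - f_k)\, \frac{w(\mathcal{S}(k, l))}{w(\mathcal{S}(k, \cdot))} \\
&\le \sum_{k \in H,\, l \in S^* - S} \left[ (U c_{kl} + f_l)\, \frac{w(\mathcal{S}(k, l))}{U/2} - f_k\, \frac{w(\mathcal{S}(k, l))}{w(\mathcal{S}(k, \cdot))} \right] \\
&\le \sum_{k \in H,\, l \in S^* - S} 2 c_{kl}\, w(\mathcal{S}(k, l)) + 2c_f(S^* - S) - c_f(H) \\
&\le 2\,\mathrm{val}(\mathcal{S}(H, \cdot)) + 2c_f(S^* - S) - c_f(H).
\end{aligned}$$
□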
Lemma 4.3 (KPR [11], Claims 9.8 and 9.9). If there are no admissible drop
and swap operations, then

$$c_f(L) \le c_f(S^* - S) + 2\,\mathrm{val}(\mathcal{T}(L, \cdot)) + 2\,\mathrm{val}(\mathcal{S}(L, \cdot)) + \frac{|L|\, c(S)}{p(n,\epsilon)}.$$

Proof. The proof of this lemma is similar to the proof of the previous lemma,
although here we will have to set up a transshipment problem to capture both
swap and drop operations. One difficulty with translating the previous proof to
this case is ensuring that one can find a feasible fractional solution such that
each facility in F − S is in no more than a small constant number of swap/drop
operations. We do this by choosing exactly one "primary" facility k in L that
can be swapped for a facility l in F − S; i.e., xkl > 0 for exactly one k ∈ L. We
make a careful choice of this facility k so that, for any other facility i for which
we might otherwise make a fractional assignment xil > 0, we can drop i
and reassign its demand to k, the primary facility of l, at not much more cost.
We do this by setting up a transshipment problem from L to (F − S) ∪ L, in
which we set the costs

$$\hat{c}_{kl} = \begin{cases}
w(\mathcal{S}(k, \cdot))\, c_{kl} + f_l - f_k & \text{for } l \in F - S, \\
w(\mathcal{S}(k, \cdot))\,(c_{kl} + \theta(l)) - f_k & \text{for } l \in L,\ l \ne k, \\
\infty & \text{for } l = k,
\end{cases}$$

where θ(l) for l ∈ L is the cost per unit demand for making U/2 units of capacity
available at node l, either via the unused capacity at l or transferring demand
via the paths $\mathcal{T}(l, \cdot)$.¹ Note that since l ∈ L, $w(\mathcal{S}(l, \cdot)) \le U/2$, and thus the
unused capacity at node l plus $w(\mathcal{T}(l, \cdot))$ is at least U/2. Thus
$\frac{U}{2}\,\theta(l) \le \mathrm{val}(\mathcal{T}(l, \cdot))$. The transshipment problem is then:

$$\begin{aligned}
\min\ & \sum_{k \in L,\, l \in (F - S) \cup L} \hat{c}_{kl}\, x_{kl} \\
\text{subject to}\ & \sum_{l \in (F - S) \cup L} x_{kl} = 1 && \forall k \in L \\
& \sum_{k \in L} x_{kl} \le 1 && \forall l \in F - S \\
& x_{kl} \ge 0 && \forall k \in L,\ l \in (F - S) \cup L.
\end{aligned}$$

We claim that we can give a fractional solution to the transshipment problem
of cost no greater than $2\,\mathrm{val}(\mathcal{S}(L, \cdot)) + \mathrm{val}(\mathcal{T}(L, \cdot)) - c_f(L) + c_f(S^* - S)$. Thus
there exists an integral solution of no greater cost. Given an integral solution x,
when xkl = 1 for k ∈ L, l ∈ F − S, we can swap facility k ∈ L for l ∈ F − S
and transfer the demand $w(\mathcal{S}(k, \cdot))$ assigned to k at change in cost at most ĉkl.
By Lemma 4.1, we can transfer the remaining demand $w(\mathcal{T}(k, \cdot))$ assigned to
k to nodes in S ∩ S∗ at change in cost at most $\mathrm{val}(\mathcal{T}(k, \cdot))$. When xki = 1 for
k ∈ L, i ∈ L, k ≠ i, we can drop facility k from S and transfer the demand
$w(\mathcal{S}(k, \cdot))$ assigned to k to i at change in cost at most
$\hat{c}_{ki} = w(\mathcal{S}(k, \cdot))\,(c_{ki} + \theta(i)) - f_k$, as the first term
covers transferring these units of demand to i and transferring the same
amount of demand from i to nodes in S ∩ S∗. By Lemma 4.1, we can transfer the
remaining demand $w(\mathcal{T}(k, \cdot))$ assigned to k to nodes in S ∩ S∗ at change in cost
at most $\mathrm{val}(\mathcal{T}(k, \cdot))$. By the hypothesis of the lemma, we know that any swap or
drop of a facility results in a change in cost of at least −c(S)/p(n, ε). Summing
over all swaps and drops for k ∈ L given by the solution to the transshipment
problem, we have that

$$2\,\mathrm{val}(\mathcal{S}(L, \cdot)) + 2\,\mathrm{val}(\mathcal{T}(L, \cdot)) - c_f(L) + c_f(S^* - S) \ge -\frac{|L|\, c(S)}{p(n,\epsilon)}.$$

Rearranging terms gives us the lemma.

¹ The same cost function, including the definition of θ, was used by Korupolu et al.
[11].
To complete the proof, we give a fractional solution x̃ for this transshipment
problem. For each l ∈ S∗ − S we find k ∈ L that minimizes ckl + θ(k) and
designate k as the primary node π(l) for l. We then set x̃kl as follows. For each
l ∈ S∗ − S, if k is the primary node for l, we set x̃kl = w(S(k, l))/w(S(k, ·)),
otherwise x̃kl = 0. For each i ∈ L, we set

\[
\tilde{x}_{ki} = \sum_{l \in S^* - S:\; i = \pi(l),\; k \ne \pi(l)} \frac{w(S(k,l))}{w(S(k,\cdot))}.
\]

This solution is feasible since certainly Σ_{l∈F} x̃kl = 1 for all k ∈ L. Also, since
for at most one k ∈ L is x̃kl > 0 for l ∈ F − S, Σ_{k∈L} x̃kl ≤ 1. Observe that
when x̃ki > 0 for k ∈ L, i = π(l), l ∈ S∗ − S, then

\[
\begin{aligned}
\hat{c}_{ki} &= w(S(k,\cdot))\,(c_{ki} + \theta(i)) - f_k \\
 &\le w(S(k,\cdot))\,(c_{kl} + c_{il} + \theta(i)) - f_k \\
 &\le w(S(k,\cdot))\,(2c_{kl} + \theta(k)) - f_k,
\end{aligned}
\]

since cil + θ(i) ≤ ckl + θ(k) by the definition of primary nodes. Then the cost of
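this fractional solution is

\[
\begin{aligned}
\sum_{k \in L,\, l \in F} \hat{c}_{kl}\,\tilde{x}_{kl}
 &= \sum_{\substack{k \in L,\ l \in S^*-S \\ k = \pi(l)}} \hat{c}_{kl}\,\frac{w(S(k,l))}{w(S(k,\cdot))}
  + \sum_{k \in L,\, i \in L}\ \sum_{\substack{l \in S^*-S: \\ i = \pi(l),\ k \ne \pi(l)}} \hat{c}_{ki}\,\frac{w(S(k,l))}{w(S(k,\cdot))} \\
 &\le \sum_{\substack{k \in L,\ l \in S^*-S \\ k = \pi(l)}} \hat{c}_{kl}\,\frac{w(S(k,l))}{w(S(k,\cdot))}
  + \sum_{\substack{k \in L,\ l \in S^*-S \\ i = \pi(l),\ k \ne i}} \hat{c}_{ki}\,\frac{w(S(k,l))}{w(S(k,\cdot))} \\
 &\le \sum_{\substack{k \in L,\ l \in S^*-S \\ k = \pi(l)}} \frac{w(S(k,l))}{w(S(k,\cdot))}\,\bigl[w(S(k,\cdot))\,c_{kl} + f_l - f_k\bigr] \\
 &\qquad + \sum_{\substack{k \in L,\ l \in S^*-S \\ k \ne \pi(l)}} \frac{w(S(k,l))}{w(S(k,\cdot))}\,\bigl[w(S(k,\cdot))\,(2c_{kl} + \theta(k)) - f_k\bigr] \\
 &\le \sum_{k \in L,\ l \in S^*-S} \frac{w(S(k,l))}{w(S(k,\cdot))}\,\bigl[w(S(k,\cdot))\,(2c_{kl} + \theta(k)) - f_k\bigr] + c_f(S^* - S) \\
 &\le \sum_{k \in L,\ l \in S^*-S} 2c_{kl}\,w(S(k,l)) + \sum_{k \in L} \mathrm{val}(T(k,\cdot)) - c_f(L) + c_f(S^* - S) \\
 &\le 2\,\mathrm{val}(S(L,\cdot)) + \mathrm{val}(T(L,\cdot)) - c_f(L) + c_f(S^* - S).
\end{aligned}
\]

□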

Combining Lemmas 4.2 and 4.3 gives Theorem 2.4.

References

1. S. Arora and M. Sudan. Improved low-degree testing and its applications. In
Proceedings of the 29th ACM Symposium on Theory of Computing, pages 485–495,
1997.
2. Dj. A. Babayev. Comments on the note of Frieze. Mathematical Programming
7:249–252, 1974.
3. F. Barahona and D. Jensen. Plant location with minimum inventory. Mathematical
Programming 83:101–111, 1998.
4. F. Chudak. Improved approximation algorithms for uncapacitated facility location.
In Proceedings of the 6th IPCO Conference, pages 180–194, 1998.
5. F. Chudak and D.B. Shmoys. Improved approximation algorithms for the
uncapacitated facility location problem. In preparation.
6. F. Chudak and D.B. Shmoys. Improved approximation algorithms for a capacitated
facility location problem. In Proceedings of the 10th Annual ACM-SIAM Symposium
on Discrete Algorithms, pages 875–876, 1999.
7. G. Cornuéjols, G. Nemhauser, and L. Wolsey. The uncapacitated facility location
problem. In P. Mirchandani and R. Francis, editors, Discrete Location Theory, pages
119–171. John Wiley and Sons, Inc., New York, 1990.
8. U. Feige. A threshold of ln n for approximating set-cover. In Proceedings of the 28th
ACM Symposium on Theory of Computing, pages 314–318, 1996.
9. S. Guha and S. Khuller. Greedy strikes back: improved facility location algorithms.
In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms,
pages 649–657, 1998.
10. M. Korupolu, C. Plaxton, and R. Rajaraman. Analysis of a local search heuristic for
facility location problems. In Proceedings of the 9th Annual ACM-SIAM Symposium
on Discrete Algorithms, pages 1–10, 1998.
11. M. Korupolu, C. Plaxton, and R. Rajaraman. Analysis of a local search heuristic for
facility location problems. Technical Report 98-30, DIMACS, June 1998. Available
from dimacs.rutgers.edu/TechnicalReports/1998.html.
12. E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart,
and Winston, New York, 1976.
13. C. Lund and M. Yannakakis. On the hardness of approximating minimization
problems. JACM 41:960–981, 1994.
14. P. Mirchandani and R. Francis, eds. Discrete Location Theory. John Wiley and
Sons, Inc., New York, 1990.
15. G.L. Nemhauser, L.A. Wolsey, and M.L. Fisher. An analysis of approximations for
maximizing submodular set functions – I. Mathematical Programming 14:265–294,
1978.
16. R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-
constant error-probability PCP characterization of NP. In Proceedings of the 29th
ACM Symposium on Theory of Computing, pages 475–484, 1997.
17. D. Shmoys, É. Tardos, and K. Aardal. Approximation algorithms for facility
location problems. In Proceedings of the 29th ACM Symposium on Theory of Computing,
pages 265–274, 1997.
18. M. Sviridenko, July, 1998. Personal communication.
Optimal 3-Terminal Cuts and Linear
Programming

William H. Cunningham¹ and Lawrence Tang²

¹ Department of Combinatorics & Optimization, University of Waterloo, Waterloo,
ON, Canada, N2L 3G1
² Department of Mathematics, University of British Columbia, Vancouver, BC,
Canada V6T 1Y8

Abstract. Given an undirected graph G = (V, E) and three specified
terminal nodes t1, t2, t3, a 3-cut is a subset A of E such that no two
terminals are in the same component of G\A. If a non-negative edge
weight ce is specified for each e ∈ E, the optimal 3-cut problem is to
find a 3-cut of minimum total weight. This problem is NP-hard, and in
fact, is max-SNP-hard. An approximation algorithm having performance
guarantee 7/6 has recently been given by Călinescu, Karloff, and Rabani.
It is based on a certain linear programming relaxation, for which it is
shown that the optimal 3-cut has weight at most 7/6 times the optimal LP
value. It is proved here that 7/6 can be improved to 12/11, and that this is
best possible. As a consequence, we obtain an approximation algorithm
for the optimal 3-cut problem having performance guarantee 12/11.

1 Introduction

Given an undirected graph G = (V, E) and k specified terminal nodes t1, . . . , tk,
a k-cut is a subset A of E such that no two terminals are in the same component
of G\A. If a non-negative edge-weight ce is specified for each e ∈ E, the optimal
k-cut problem is to find a k-cut of minimum total weight. This problem was shown
by Dahlhaus, Johnson, Papadimitriou, Seymour, and Yannakakis [5] to be NP-hard
for k ≥ 3. (Of course, it is solvable in polynomial time if k = 2.) They also
gave a simple polynomial-time algorithm having performance guarantee 2(k−1)/k,
that is, it is guaranteed to deliver a k-cut of weight at most 2(k−1)/k times the
minimum weight of a k-cut. Later, in [6], the same authors showed that for
k ≥ 3 the problem is max-SNP-hard, which implies that, assuming P ≠ NP, there
exists a positive ε such that the problem has no polynomial-time approximation
algorithm with performance guarantee 1 + ε.
The present paper concentrates on the optimal 3-cut problem. From the
above remarks, it follows that this problem is max-SNP-hard, and the
approximation algorithm of [6] has a performance guarantee of 4/3. Recently, Călinescu,
Karloff, and Rabani [1] gave an approximation algorithm having a performance
guarantee of 7/6. We give a further improvement that is based on their approach.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 114–125, 1999.
© Springer-Verlag Berlin Heidelberg 1999

Chopra and Rao [3] and Cunningham [4] investigated linear programming
relaxations of the 3-cut problem, showing results on classes of facets and
separation algorithms. Here are the two simplest relaxations. (By a T-path we mean the
edge-set of a path joining two of the terminals. By a wye we mean the edge-set
of a tree having exactly three nodes of degree one, each of which is a terminal.
For a set A, a subset B of A, and a vector z ∈ R^A, z(B) denotes Σ_{j∈B} zj.)

(LP 1)
\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{e \in E} c_e x_e & \\
\text{subject to} & x(P) \ge 1, & P \text{ a } T\text{-path} \\
 & x_e \ge 0, & e \in E.
\end{array}
\]

(LP 2)
\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{e \in E} c_e x_e & \\
\text{subject to} & x(P) \ge 1, & P \text{ a } T\text{-path} \\
 & x(Y) \ge 2, & Y \text{ a wye} \\
 & x_e \ge 0, & e \in E.
\end{array}
\]

It follows from some simple observations about shortest paths, and the
equivalence of optimization and separation, that both problems can be solved in
polynomial time. It was proved in [4] that the approximation algorithm of [5] delivers
a 3-cut of value at most 4/3 times the optimal value of (LP 1). (In particular, the
minimum weight of a 3-cut is at most 4/3 times the optimal value of (LP 1).) It
was conjectured that the minimum weight of a 3-cut is at most 16/15 times the
optimal value of (LP 2). The examples in Figure 1 (from [4]) show that this
conjecture, if true, is best possible. In both examples, the values of a feasible solution
x of (LP 2) are shown in the figure. The weights ce are all 2 for the example on
the left. For the one on the right they are 1 for the edges of the interior triangle,
and 2 for the other edges. In both cases the minimum 3-cut value is 8, but the
given feasible solution of (LP 2) has value 7.5.
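
The shortest-path observation can be made concrete for (LP 1). The following is a minimal sketch, not taken from [4] or [5], of a separation routine: given a candidate x, it runs Dijkstra's algorithm between each pair of terminals and reports a T-path of x-length less than 1 if one exists. The function names, the adjacency-list representation, and the tolerance `tol` are our own assumptions for illustration; the wye constraints of (LP 2) are not handled here.

```python
import heapq
from itertools import combinations

def shortest_path(adj, x, source, target):
    """Dijkstra with edge lengths x; returns (length, list of edges on the path)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v in adj[u]:
            nd = d + x[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), None
    path, node = [], target
    while node != source:
        path.append(frozenset((prev[node], node)))
        node = prev[node]
    return dist[target], path[::-1]

def separate_lp1(adj, x, terminals, tol=1e-9):
    """Return a T-path P with x(P) < 1, or None if x satisfies all T-path constraints."""
    for s, t in combinations(terminals, 2):
        length, path = shortest_path(adj, x, s, t)
        if length < 1 - tol:
            return path
    return None

# Hypothetical toy instance: a star with centre "v" and the three terminals as leaves.
terminals = ["t1", "t2", "t3"]
star = {"t1": ["v"], "t2": ["v"], "t3": ["v"], "v": terminals}
half = {frozenset((t, "v")): 0.5 for t in terminals}
print(separate_lp1(star, half, terminals))  # no violated T-path constraint
```

On this star, x = 1/2 on every edge satisfies every T-path constraint, yet the single wye has x(Y) = 3/2 < 2, which illustrates why (LP 2) can be strictly tighter than (LP 1).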

[Figure omitted: two example graphs, each with terminals t1, t2, t3; the edge
labels (1/2 and 1/4) give the values of a feasible solution x of (LP 2).]

Fig. 1. Bad examples for (LP 2)

Recently, Călinescu, Karloff, and Rabani [1] gave a new linear programming
relaxation. Although their approach applies to any number k of terminals, we
continue to restrict attention to the case when k = 3. They need to assume that G
is a complete graph. (Of course, if any missing edges are added with weight zero,
the resulting 3-cut problem is equivalent to the given one, so this assumption
is not limiting.) The relaxation is based on the following observations. First,
every minimal 3-cut is of the form β(R1, R2, R3), where ti ∈ Ri for all i. Here,
where R is a family of disjoint subsets of V, β(R) denotes the set of all edges
of G joining nodes in different members of the family. Since c ≥ 0, there is an
optimal 3-cut of this form. Second, the incidence vector x of a minimal 3-cut
is a kind of distance function, that is, it defines a function d(v, w) = xvw on
pairs of nodes of G which is non-negative, symmetric, and satisfies the triangle
inequality. Finally, with respect to d the distance between any two terminals
is 1, and the sum of the distances from any node v to the terminals is 2. The
resulting linear-programming relaxation is:
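(LP 3)
\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{e \in E} c_e x_e & \\
\text{subject to} & x_{vw} = 1, & v, w \in T,\ v \ne w \\
 & \displaystyle\sum_{v \in T} x_{vw} = 2, & w \in V \\
 & x_{uv} + x_{vw} - x_{uw} \ge 0, & u, v, w \in V \\
 & x_e \ge 0, & e \in E.
\end{array}
\]
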
This relaxation is at least as tight as (LP 2). To see this, suppose that (after
adding missing edges to make G complete) we have a feasible solution x
to (LP 3). Then for any path P of G joining u to v, x(P) ≥ xuv, by applying
the triangle inequality. It follows that x(P) ≥ 1 for any T-path P. Moreover,
any wye Y is the disjoint union of paths P1, P2, P3 from some node v to the
terminals. It follows that x(Y) ≥ Σ_{w∈T} xvw = 2. Thus every feasible solution
of (LP 3) gives a feasible solution of (LP 2) having the same objective value. The
first example of Figure 1 shows that the optimal value of (LP 3) can be strictly
greater than the optimal value of (LP 2). On the other hand, the second example
shows that there is no hope to prove in general that the minimum weight of
a 3-cut is less than 16/15 times the optimal value of (LP 3).
It was proved in [1] that the minimum weight of a 3-cut is at most 7/6 times
the optimal value of (LP 3). As a consequence, an approximation algorithm for
the optimal 3-cut problem having a performance guarantee of 7/6 was derived. (It
is clear that (LP 3) can be solved in polynomial time, since it is of polynomial
size.) However, it was left open whether this result could be strengthened; the
second example of Figure 1 shows an example for which the minimum weight of
a 3-cut can be as large as 16/15 times the optimal value of (LP 3), and this is
the worst example given in [1]. (To see that x of that example does extend to a
feasible solution of (LP 3), we simply define x on each missing edge uv to be the
minimum length, with respect to lengths xe, of a path from u to v.)
In this paper it is shown that the minimum weight of a 3-cut is at most 12/11
times the optimal value of (LP 3), and that this is best possible. (This result has
been obtained independently by Karger, Klein, Stein, Thorup, and Young [7].)
As a consequence we obtain an approximation algorithm for the optimal 3-cut
problem having a performance guarantee of 12/11.

2 Triangle Embeddings
Călinescu, Karloff, and Rabani [1] introduced an extremely useful geometric
relaxation, which they showed was equivalent to the linear-programming
relaxation (LP 3). Let △ denote the convex hull of the three elementary vectors
e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1) in R³. By a triangle embedding
of G we mean a mapping y from V into △ such that y(ti) = ei for i = 1, 2, 3.
A triangle embedding y determines a vector x ∈ R^E as follows. For each edge
uv, let xuv be one-half the L1 distance from y(u) to y(v). It is easy to see
that this x is a feasible solution to (LP 3). Conversely, a feasible solution x
of (LP 3) determines a triangle embedding y as follows. For each node v, let
y(v) = (1 − x_{t1 v}, 1 − x_{t2 v}, 1 − x_{t3 v}).
Given a triangle embedding y we can obtain x as above, and then use x to
obtain a triangle embedding y′. It is easy to see that y = y′. It is not true, on
the other hand, that every feasible solution of (LP 3) arises in this way from a
triangle-embedding. However, it is “almost true”. The following result is implicit
in [1], and we include a proof for completeness.
Theorem 1. Let x be a feasible solution of (LP 3), let y be the triangle
embedding determined by x and let x′ be the feasible solution of (LP 3) determined by
y. Then x′ ≤ x, and if x is an optimal solution of (LP 3), so is x′.
Proof. First, observe that the second statement is a consequence of the first and
the non-negativity of c. Now let uv ∈ E. Both y(u) and y(v) have
component-sum 1. Therefore, y(u) − y(v) has component-sum zero, and so one-half of the
L1 distance between y(u) and y(v) is the sum of the non-negative components
of y(u) − y(v). Hence we may assume, perhaps by interchanging u with v and
relabelling the terminals, that one-half of the L1 distance between y(u) and y(v)
is the sum of the first two components of y(u) − y(v). Therefore, using
x_{ut1} + x_{ut2} + x_{ut3} = 2 and the corresponding equation for v,

\[
\begin{aligned}
\tfrac{1}{2}\,\lVert y(u) - y(v) \rVert_1 &= y_1(u) - y_1(v) + y_2(u) - y_2(v) \\
 &= 1 - x_{ut_1} - (1 - x_{vt_1}) + 1 - x_{ut_2} - (1 - x_{vt_2}) \\
 &= (2 - x_{vt_3}) - (2 - x_{ut_3}) \\
 &= x_{ut_3} - x_{vt_3} \\
 &\le x_{uv},
\end{aligned}
\]

where the last step is the triangle inequality, as required. □
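
The two maps between triangle embeddings and solutions of (LP 3) are easy to experiment with. The sketch below is only an illustration under our own naming (`x_from_embedding`, `embedding_from_x`, `is_lp3_feasible` and the five-node instance are hypothetical); it uses exact rational arithmetic to check that the x obtained from an embedding is feasible for (LP 3) and that mapping back recovers the same embedding.

```python
from fractions import Fraction
from itertools import combinations, permutations

def x_from_embedding(y):
    """x_uv = one-half the L1 distance between y(u) and y(v), for every pair of nodes."""
    return {frozenset((u, v)): Fraction(sum(abs(a - b) for a, b in zip(y[u], y[v]))) / 2
            for u, v in combinations(y, 2)}

def _d(x, u, v):
    """Distance lookup with d(u, u) = 0."""
    return Fraction(0) if u == v else x[frozenset((u, v))]

def embedding_from_x(x, nodes, terminals):
    """y(v) = (1 - x_{t1 v}, 1 - x_{t2 v}, 1 - x_{t3 v})."""
    return {v: tuple(1 - _d(x, t, v) for t in terminals) for v in nodes}

def is_lp3_feasible(x, nodes, terminals):
    """Check the equality and triangle-inequality constraints of (LP 3)."""
    ok_terminals = all(_d(x, s, t) == 1 for s, t in combinations(terminals, 2))
    ok_sums = all(sum(_d(x, t, v) for t in terminals) == 2 for v in nodes)
    ok_triangle = all(_d(x, u, v) + _d(x, v, w) >= _d(x, u, w)
                      for u, v, w in permutations(nodes, 3))
    return ok_terminals and ok_sums and ok_triangle

# Hypothetical instance: three terminals and two further nodes a, b.
third, half = Fraction(1, 3), Fraction(1, 2)
terminals = ["t1", "t2", "t3"]
y = {"t1": (1, 0, 0), "t2": (0, 1, 0), "t3": (0, 0, 1),
     "a": (third, third, third), "b": (half, half, 0)}
x = x_from_embedding(y)
print(is_lp3_feasible(x, list(y), terminals))
print(embedding_from_x(x, list(y), terminals) == y)
```

Going in the other direction, from an arbitrary feasible x to y and back to x′, may strictly decrease some components, which is the content of Theorem 1.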
The approximation algorithm of Călinescu, Karloff, and Rabani uses the
following ideas. Suppose that (LP 3) is solved, and an optimal solution x∗ that
arises from a triangle embedding is found. For a number α between 0 and 1 that
is different from x∗rv for every v ∈ V and r ∈ T, and an ordering r, s, t of T,
define Rr = {v ∈ V : x∗rv < α}, Rs = {v ∈ V\Rr : x∗sv < α}, Rt = V\(Rr ∪ Rs).
We call the 3-cut β(Rr, Rs, Rt) uniform (with respect to this x∗). It is easy to
see that there are O(n) uniform 3-cuts. The algorithm of [1] simply chooses the
uniform 3-cut having minimum weight. It is proved to have weight at most 7/6
times the minimum weight of a 3-cut.
We consider a slight generalization of the notion of uniform 3-cut. Let α, α′
be two numbers chosen as α was above, and let r, s, t be an ordering of T. Define
Rr = {v ∈ V : x∗rv < α}, Rs = {v ∈ V\Rr : x∗sv < α′}, Rt = V\(Rr ∪ Rs). We
call the 3-cut β(Rr, Rs, Rt) flat (with respect to this x∗). Clearly, every uniform
3-cut is flat. It is easy to see that there are O(n²) flat 3-cuts. Our approximation
algorithm simply chooses the flat 3-cut having minimum weight. We will show
that it has weight at most 12/11 times the weight of an optimal 3-cut. This result
is based on a tight analysis of the bound for the optimal 3-cut problem given by
(LP 3).
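
The enumeration of flat 3-cuts can be sketched as follows. This is an illustrative brute-force version under assumptions of our own: the names `thresholds`, `cut_weight` and `best_flat_cut`, the dictionary `dist[r][v]` holding x∗rv, and the toy instance are all hypothetical. For each terminal it takes one representative threshold from every interval between consecutive distinct distance values, which yields the same node sets as any admissible choice of α and α′.

```python
from fractions import Fraction
from itertools import permutations

def thresholds(values):
    """One representative in each open interval between consecutive values in [0, 1]."""
    pts = sorted({0, 1} | {v for v in values if 0 < v < 1})
    return [(a + b) / 2 for a, b in zip(pts, pts[1:])]

def cut_weight(parts, c):
    """Total weight of the edges joining different parts."""
    side = {v: i for i, part in enumerate(parts) for v in part}
    return sum(w for (u, v), w in c.items() if side[u] != side[v])

def best_flat_cut(nodes, terminals, dist, c):
    """Minimum-weight flat 3-cut; dist[r][v] plays the role of x*_{rv}."""
    best = None
    for r, s, t in permutations(terminals):
        for alpha in thresholds(dist[r].values()):
            Rr = {v for v in nodes if dist[r][v] < alpha}
            for alpha2 in thresholds(dist[s].values()):
                Rs = {v for v in nodes if v not in Rr and dist[s][v] < alpha2}
                Rt = set(nodes) - Rr - Rs
                w = cut_weight((Rr, Rs, Rt), c)
                if best is None or w < best[0]:
                    best = (w, (Rr, Rs, Rt))
    return best

# Hypothetical toy instance: one extra node "a" embedded at (1/2, 1/2, 0).
half = Fraction(1, 2)
terminals = ["t1", "t2", "t3"]
nodes = terminals + ["a"]
dist = {"t1": {"t1": 0, "t2": 1, "t3": 1, "a": half},
        "t2": {"t1": 1, "t2": 0, "t3": 1, "a": half},
        "t3": {"t1": 1, "t2": 1, "t3": 0, "a": 1}}
c = {("t1", "a"): 1, ("t2", "a"): 3, ("t3", "a"): 1}
weight, parts = best_flat_cut(nodes, terminals, dist, c)
print(weight, parts)
```

Because the distance between two terminals is 1 and every threshold is below 1, each of the three sets contains exactly one terminal, so every candidate is indeed a 3-cut. This version recomputes each cut weight from scratch and makes no attempt at efficiency.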

3 Linear Programming Again


It is easy to check that if the optimal value of (LP 3) is zero, then there is a 3-cut
of weight zero. Therefore, we may assume that the optimal value is positive. So
our problem may be restated as finding the best upper bound, over all choices
of G and c, for the minimum weight of a 3-cut divided by the optimal value
of (LP 3). By multiplying c by an appropriate positive number, we may assume
that the minimum weight of a 3-cut is 1. It is now more convenient to prove
the best lower bound on the value of (LP 3). Surprisingly, we can use a different
linear programming problem to do this.
Assume that G is fixed, and that an optimal solution x∗ of (LP 3) is also
fixed. Then the problem of finding the worst optimal value can be stated as:
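(P)
\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{e \in E} c_e x^*_e & \\
\text{subject to} & c(S) \ge 1, & S \text{ a 3-cut} \\
 & c_e \ge 0, & e \in E.
\end{array}
\]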

Note that the variables are the weights ce! It may seem that the hypothesis that
G and x∗ are known is very strong, but it turns out that we can assume that
there are not many choices for them. First, we may assume that x∗ is rational,
since it is an optimal solution of a linear-programming problem having rational
data. Therefore, there exists a positive integer q such that qx∗ is integer-valued.
Second, we may assume that x∗ arises from a triangle-embedding y∗, and it
is easy to see that qy∗ is integral, as well. Therefore, we can think of y∗ as
embedding the nodes of G into a finite subset △q of △, consisting of those
points y ∈ △ for which qy is integral. We define the planar graph Gq = (△q, Eq)
by uv ∈ Eq if and only if the L1 distance between u and v is 2/q. Figure 2 shows
G9; the numbers there are explained later. For nodes u, v of Gq, we denote by
dq(u, v) the least number of edges of a path in Gq from u to v. (It is easy to see
that dq(u, v) = q/2 times the L1 distance from u to v.)
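
The graph Gq is small enough to build explicitly. The following sketch is our own illustration (the function names and the integer representation are assumptions): a point y of △q is stored as the integer triple qy, so that two points are adjacent exactly when one coordinate increases by 1 and another decreases by 1. A breadth-first search then confirms, for a small q, that dq(u, v) equals q/2 times the L1 distance.

```python
from collections import deque

def build_gq(q):
    """Adjacency lists of G_q; a node is an integer triple (a, b, c) with a + b + c = q."""
    pts = [(a, b, q - a - b) for a in range(q + 1) for b in range(q + 1 - a)]
    adj = {p: [] for p in pts}
    for p in pts:
        for i in range(3):
            for j in range(3):
                if i != j:
                    nb = list(p)
                    nb[i] += 1
                    nb[j] -= 1
                    nb = tuple(nb)
                    if nb in adj:  # rejects points outside the triangle
                        adj[p].append(nb)
    return adj

def bfs_distances(adj, source):
    """Least number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def half_l1(u, v):
    """q/2 times the L1 distance of the underlying points, computed on integer triples."""
    return sum(abs(a - b) for a, b in zip(u, v)) // 2

adj = build_gq(9)
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(len(adj), num_edges)
ok = all(bfs_distances(adj, u)[v] == half_l1(u, v) for u in adj for v in adj)
print(ok)
```

For q = 9 this gives (q + 1)(q + 2)/2 = 55 nodes and 3q(q + 1)/2 = 135 edges.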

Theorem 2. Let G, c be a 3-cut instance, let x∗ be a rational-valued optimal
solution of (LP 3), with corresponding triangle-embedding y∗, and let q be a
positive integer such that qx∗ is integral. Then there is a 3-cut instance on graph Ĝ
with nodeset △q and edge-weights ĉ such that:
(a) x̂ defined by q x̂uv = dq(u, v) for all uv ∈ E is a feasible solution of (LP 3)
(for Ĝ, ĉ), and ĉx̂ ≤ cx∗;
(b) The optimal 3-cut value for Ĝ, ĉ is at least that for G, c;
(c) ĉe = 0 for all e ∉ Eq;
(d) For every flat 3-cut of Ĝ with respect to x̂, there is a flat 3-cut of G with
respect to x∗ having no larger weight.

[Figure omitted: the graph G9, with the numbers 10, 8, 2, and 0 marked next to
some of its edges.]

Fig. 2. G9

Proof. We use the mapping y∗ from V to △q, and we assume that x∗ arises
from y∗. Suppose that two nodes u, v of G are mapped to the same point of △q
by y∗. Form G′ by identifying u with v and, where multiple edges are formed,
replacing the pair by a single edge whose weight is their sum. Then every 3-cut
of G′ determines a 3-cut of G having the same weight, so the minimum weight of
a 3-cut of G′ is at least the minimum weight of a 3-cut of G. Moreover, x∗ also
determines a triangle-embedding of G′, so there is a feasible solution of (LP 3)
for G′ having value cx∗. Finally, every flat cut of G′ gives a flat cut of G of the
same weight. Thus the theorem is true for G if it is true for G′, and so we may
assume that y∗ is one-to-one.
Now suppose that y∗ is not onto, that is, that there is an element z of △q
such that y∗(v) ≠ z for all v ∈ V. We can form a graph G′ from G by adding
a node v and an edge uv of weight zero for every u ∈ V. It is easy to see that
the minimum weight of a 3-cut of G′ is the same as that of G. Also, if we map
the new node to z, we get a triangle embedding of G′, and it corresponds to a
feasible solution of (LP 3) on G′ having value equal to cx∗. Finally, every flat
cut of G′ corresponds to a flat cut of G of the same weight. So the theorem is
true for G if it is true for G′. It follows that we may assume that y∗ is onto.
Therefore, we may assume that V = △q, and that y∗ is the identity mapping.
Now suppose that there exists uv ∈ E\Eq, such that cuv = ε > 0. Let P be
the edge-set of a path in Gq from u to v such that |P| = dq(u, v). Decrease cuv
to zero, and increase ce by ε for all e ∈ P. We denote the new c by c′. Then,
since every 3-cut using uv uses an edge from P, the minimum weight of a 3-cut
with respect to c′ is not less than that with respect to c. (Similarly, every flat
3-cut has value with respect to c′ not less than that with respect to c.) Now
c′x∗ = cx∗ − εdq(u, v)/q + εdq(u, v)/q = cx∗, since x∗uv = dq(u, v)/q and
x∗e = 1/q for each e ∈ P. This argument can be repeated as long
as there is such an edge uv. □
It is a consequence of the above theorem that it is enough to study the
3-cut problem on graphs Gq with x∗e = 1/q for all e ∈ Eq. (That is, to obtain the
best bound on the ratio of the optimal weight of a 3-cut to the optimal value
of (LP 3), it suffices to analyze such graphs and weights.) In particular, for each
positive integer q, we are interested in the optimal value of the following linear
programming problem.

(Pq)
\[
\begin{array}{lll}
\text{minimize} & \displaystyle\frac{1}{q}\sum_{e \in E_q} c_e & \\
\text{subject to} & c(S) \ge 1, & S \text{ a 3-cut of } G_q \\
 & c_e \ge 0, & e \in E_q.
\end{array}
\]

The dual problem is

(Dq)
\[
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{S} z_S & \\
\text{subject to} & \displaystyle\sum_{S:\, e \in S} z_S \le \frac{1}{q}, & e \in E_q \\
 & z_S \ge 0, & S \text{ a 3-cut of } G_q.
\end{array}
\]

We actually solved these problems numerically for several values of q, and then
were able to find solutions for general q.
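Theorem 3. For q ≥ 4 the optimal value of (Pq) and of (Dq) is equal to

\[
f(q) =
\begin{cases}
\dfrac{11}{12} + \dfrac{1}{12(q+1)}, & \text{if } q \equiv 0 \bmod 3, \\[2ex]
\dfrac{11}{12} + \dfrac{1}{12q}, & \text{if } q \equiv 1 \bmod 3, \\[2ex]
\dfrac{11}{12} + \dfrac{1}{12q} - \dfrac{1}{12q^2}, & \text{if } q \equiv 2 \bmod 3.
\end{cases}
\]

Moreover, there is an optimal solution of (Dq) for which zS is positive only if S
is a flat 3-cut.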
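
To get a feeling for the values in Theorem 3, the function f can be evaluated exactly. The sketch below is only a numerical illustration (the function name `f` mirrors the theorem; everything else is our own choice): it checks the value f(9) = 37/40 that reappears in Section 5, and that f(q) stays above 11/12 while approaching it, so that 1/f(q) stays below 12/11 while approaching it.

```python
from fractions import Fraction

def f(q):
    """Optimal value of (P_q) and (D_q) as stated in Theorem 3 (q >= 4)."""
    base = Fraction(11, 12)
    if q % 3 == 0:
        return base + Fraction(1, 12 * (q + 1))
    if q % 3 == 1:
        return base + Fraction(1, 12 * q)
    return base + Fraction(1, 12 * q) - Fraction(1, 12 * q * q)

print(f(9))  # 37/40
for q in (4, 5, 6, 30, 300):
    print(q, f(q), float(1 / f(q)))
```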
It is easy to see that Theorems 2 and 3 have the following consequence. This
result has been proved independently by Karger et al. [7], whose approach is
somewhat different, but also uses a linear programming analysis of
triangle-embedding.
Theorem 4. For any 3-cut instance, the minimum weight of a 3-cut is at most
12/11 times the optimal value of (LP 3), and the constant 12/11 is best possible. □

4 An Improved Approximation Algorithm


Algorithm
1. Find a rational-valued optimal solution x∗ of (LP 3).
2. Find the triangle embedding y ∗ determined by x∗ .
3. Return the flat 3-cut of minimum weight.
As pointed out before, the first step can be performed in polynomial time. The
polynomial-time algorithms for linear programming can be modified to return a
rational-valued optimal solution, and one of polynomial size. The second step is easy.
So is the third step, using the observation made earlier that there are only O(n²)
flat 3-cuts of G.
Theorem 5. The above algorithm returns a 3-cut of weight at most 12/11 times
the minimum weight of a 3-cut.
Proof. We may assume that the optimal value of a 3-cut is 1, so it is enough
to prove that the algorithm delivers a 3-cut of weight at most 12/11. Let x∗ be a
rational-valued optimal solution for (LP 3), and let q be a common denominator
for the components of x∗, such that q is a multiple of 3. Consider an optimal
solution z∗ of (Dq) as given by Theorem 3. Then

\[
\frac{12}{11} \sum_{S} z^*_S \ge 1,
\]

and z∗S > 0 only if S is a flat 3-cut. Therefore

\[
\begin{aligned}
\min_{S:\, z^*_S > 0} c(S) &\le \sum_{S} \frac{12}{11}\, z^*_S\, c(S) \\
 &= \frac{12}{11} \sum_{S} z^*_S\, c(S) \\
 &= \frac{12}{11} \sum_{e \in E} c_e \sum_{S:\, e \in S} z^*_S \\
 &\le \frac{12}{11} \sum_{e \in E} c_e x^*_e \\
 &\le \frac{12}{11}.
\end{aligned}
\]

□

5 Proof of Theorem 3
To prove Theorem 3, it is enough to give feasible solutions of (Pq) and of (Dq)
having objective value f(q). For simplicity, we will actually do something weaker.
For the case when q ≡ 0 mod 3, we give a feasible solution of (Pq) having
objective value f(q), and a feasible solution to (Dq) using only variables corresponding
to flat 3-cuts having objective value 11/12. Although this does not quite prove
Theorem 3, it is enough to prove Theorems 4 and 5, since a common denominator
for the components of x∗ can always be chosen to be a multiple of 3.
First, we describe our feasible solution to (Pq). Consider Figure 2, which shows
G9. Let c′e be the number next to edge e, or 1 if no number appears. It is easy to
see that the minimum value of a 3-cut is 40, so c = c′/40 is a feasible solution to
(P9). Its objective value is the sum of the components of c divided by 9, which
is 37/40.
Here is the general construction (when q is a multiple of 3) for an optimal
solution of (Pq). If q = 3m, divide △q into three “corner triangles” of side m
together with the “middle hexagon”. Put c′e = 3m + 1 for all edges incident with
the terminals. Put c′e = 2m + 2 for all other edges on the boundary of △q. Put
c′e = m − 1 for each edge e in a corner triangle that is parallel to an outside edge
and distance 1 from it. Put c′e = 1 for all other edges in the middle hexagon
(including its boundary). Put c′e = 0 for all other edges.
It is easy to convince oneself that the minimum weight of a 3-cut with respect
to c′ is 4(3m + 1), and hence that c = c′/(4(3m + 1)) is a feasible solution to (Pq).
Here is a sketch of a proof. (The ideas come, essentially, from the result of
Dahlhaus, et al. [5], showing that there is a polynomial-time algorithm to solve
the optimal multiterminal cut problem when G is planar and the number of
terminals is fixed.) Any minimal 3-cut of Gq has the form β(R1 , R2 , R3 ). There
are two kinds of such 3-cuts, corresponding to the case in which there is a pair
i, j for which there is no edge joining a node in Ri to a node in Rj , and the
one where this is not true. The minimum value of a 3-cut of the first type is
simply the sum of the weights of two cuts, each separating a terminal from the
other two. In the case of Gq with c′ described above, to show that any such cut
has weight at least 4(3m + 1), it is enough to show (due to the symmetry of
c′) that any cut separating one terminal from the other two has weight at least
2(3m + 1). This is done by exhibiting an appropriate flow of this value from one
terminal to the other two.
The second type of 3-cut corresponds to the union of three paths in the
planar dual of Gq , such that the three paths begin at the same face triangle
and end with edges that are on different sides of the outside face. Finding a
minimum-weight such 3-cut can be accomplished by, for each choice of the face
triangle, solving a shortest path problem. Therefore, to show that any 3-cut of
the second type has c′-weight at least 4(3m + 1), one shows that, for each choice
of face triangle, there is an appropriate “potential” on the faces of Gq .
To compute the objective value of this feasible solution of (Pq), note that there
are 6 edges e having c′e = 3m + 1, 3(3m − 2) edges e having c′e = 2m + 2, 6(m − 1)
edges e having c′e = m − 1, and 9m² edges e having c′e = 1. From this we get
that the total c′-weight of all the edges is 3m(11m + 4). To obtain the objective
value of the resulting c in (Pq), we divide by 4(3m + 1)(3m), and this gives
(11m + 4)/(4(3m + 1)) = f(q) for q = 3m.
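
The arithmetic in this count is easy to check mechanically. The following sketch is our own verification aid (the function name `primal_objective` is hypothetical): it sums the four edge classes listed above, which gives 3m(11m + 4), divides by 4(3m + 1)(3m), and compares the result with the value 11/12 + 1/(12(q + 1)) that Theorem 3 states for q = 3m.

```python
from fractions import Fraction

def primal_objective(m):
    """Objective value in (P_q), q = 3m, of c = c'/(4(3m + 1)) for the weights c' above."""
    q = 3 * m
    # (number of edges, weight c'_e) for each class of edges with non-zero weight
    edge_classes = [(6, 3 * m + 1),
                    (3 * (3 * m - 2), 2 * m + 2),
                    (6 * (m - 1), m - 1),
                    (9 * m * m, 1)]
    total = sum(count * weight for count, weight in edge_classes)
    assert total == 3 * m * (11 * m + 4)
    return Fraction(total, 4 * (3 * m + 1) * q)

for m in range(1, 6):
    q = 3 * m
    print(q, primal_objective(m), Fraction(11, 12) + Fraction(1, 12 * (q + 1)))
```

For m = 3 the total c′-weight is 333 and the objective value is 333/360 = 37/40, matching the value obtained from Figure 2.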
Now we need to show a feasible solution of (Dq) having objective value 11/12.
This requires a weighting of the flat 3-cuts of Gq. We assign positive dual
variables to two kinds of 3-cuts. For each integer j, 1 ≤ j ≤ m, and each choice of two
terminals r, s we consider the (uniform) 3-cut β(Rr(j), Rs(j), V\(Rr(j) ∪ Rs(j)))
where, for t = r, s, Rt(j) = {v ∈ Vq : dq(t, v) < j}. There are 3m such 3-cuts S,
and for each of them we set zS = 1/(4q). Notice that these variables contribute to the
left-hand side of the main constraint of (Dq) only for certain edges, namely, those
that are contained in the corner triangles and are parallel to one of the two sides
of △ that meet at that corner. For each of these edges, the total contribution is
exactly 1/(2q).

[Figure omitted: the face triangles of G9 contained in the middle hexagon,
labelled with weights 5, 3, and 1.]

Fig. 3. Feasible solution of (D9)

The weights assigned to the second type of flat cut are determined by a
weighting of the face triangles of Gq that are contained in the middle hexagon.
See Figure 3, where such a weighting of the face triangles is indicated for G9 . Let
us use the term row in the following technical sense. It is defined by a straight
line through the centre of a face triangle and parallel to one of its three sides.
When we speak of the face triangles in the row, we mean all of the face triangles
that are intersected by the line. When we speak of the edges in the row, we
mean all of the edges that are intersected by the line. Notice that in the figure,
the sum of the weights of the face triangles in each row is the same, namely 35.
It is obvious how to extend this pattern to find a weighting with this property
for any q = 3m. Then the sum of the weights of the face triangles in any row is
4m² − 1.
Given a face triangle, consider the set of all edges in the three rows containing
the triangle. It is possible to choose two flat 3-cuts of Gq whose union is this
set, and whose intersection is a single edge, or is the set of edges of the face
triangle. (There is more than one way to do this.) For each of these two 3-cuts,
assign a weight equal to the weight of the triangle divided by 2q(4m² − 1). (Note
that a 3-cut S may be assigned weight by two different face triangles; these
weights are added to form the variable zS.) Now consider the constraint of (Dq)
corresponding to an edge e. The contribution of the variables just defined to the
left-hand side of the constraint is at most the sum of the weights of the face
triangles in rows containing the edge. If the edge is in the middle hexagon, or is in
a corner triangle and is not parallel to one of the edges incident with the corner,
then it gets contributions from triangles in two different rows, and otherwise,
it gets contributions from triangles in one row. Therefore, the contribution for
the first type of edge is at most (4m² − 1)/((4m² − 1)q) = 1/q. For the second type
of edge the total contribution is at most half this, that is, at most 1/(2q). But the
second group of edges consists precisely of the ones that get a contribution from
the dual variables assigned to the uniform 3-cuts, and that contribution is 1/(2q).
So the total contribution of all of the dual variables to the left-hand side of the
constraint of (Dq) corresponding to any edge e is at most 1/q, so we have defined
a feasible solution of (Dq).
Now the objective value of this solution can be computed as follows. There
are 3m variables corresponding to uniform 3-cuts, each given value 1/(4q). Therefore,
the contribution to the objective function of variables of this type is 3m/(12m) = 1/4.
The contribution of the other variables is the sum, over the 2m horizontal
rows in the middle hexagon, of the total weight of a row divided by q(4m² − 1).
Therefore, it is

\[
\frac{2m(4m^2 - 1)}{q(4m^2 - 1)} = \frac{2}{3}.
\]

Therefore, the objective value of our feasible solution to (Dq) is

\[
\frac{1}{4} + \frac{2}{3} = \frac{11}{12}.
\]

6 Remarks

Since the constant 12/11 is best possible in Theorem 4, it is natural to ask whether
it is best possible in Theorem 5. Note, however, that the family of examples
that we use to show the tightness of the LP bound all have the property that
there is a flat 3-cut that is optimal. Therefore, these examples are not at all bad
for the approximation algorithm. However, it seems likely that 12/11 is indeed best
possible in Theorem 5. For several values of q Kevin Cheung [2] has constructed
examples in which the optimal solution of (LP 3) has denominator q, and the
approximation algorithm returns a 3-cut of value at least 1/f(q) times the optimal
value of (LP 3). Actually, his examples seem to be the first that show that our
approximation algorithm does not always return an optimal solution. In fact,
no such example seems to have been known even for the simpler algorithm of
Călinescu et al. [1].

All of the results of Călinescu et al. [1] quoted above for k = 3 are special cases
of their results for general k. They give a linear-programming relaxation that
generalizes (LP 3), and a corresponding generalization of the notion of
triangle-embedding, an embedding into a (k − 1)-dimensional simplex in which the
terminals are mapped to the extreme points. They show that the optimal value of a
k-cut is at most (3k−2)/(2k) times the optimal value of this linear-programming
problem. As a result, they obtain an approximation algorithm for the optimal k-cut
problem having performance guarantee (3k−2)/(2k). The recent paper [7], which has
most of our results for k = 3, also has results for k > 3, improving the bounds
given by [1]. For example, [7] gives bounds of 1.1539 for k = 4 and 1.3438 for
all k > 6. The problem of giving a tight analysis for k > 3, as we now have for
k = 3, remains open.
Acknowledgment. We are grateful to Gruia Călinescu, Joseph Cheriyan, Kevin
Cheung, and Levent Tunçel for conversations about this work.

References
1. G. Călinescu, H. Karloff, and Y. Rabani, “An improved approximation algorithm
for MULTIWAY CUT”, Proceedings of Symposium on Theory of Computing, ACM,
1998.
2. Kevin Cheung, private communication, 1999.
3. S. Chopra and M.R. Rao, “On the multiway cut polyhedron”, Networks 21(1991),
51–89.
4. W.H. Cunningham, “The optimal multiterminal cut problem”, in: C. Monma and
F. Hwang (eds.), Reliability of Computer and Communications Networks, American
Math. Soc., 1991, pp. 105–120.
5. E. Dahlhaus, D. Johnson, C. Papadimitriou, P. Seymour, and M. Yannakakis, “The
Complexity of multiway cuts”, extended abstract, 1983.
6. E. Dahlhaus, D. Johnson, C. Papadimitriou, P. Seymour, and M. Yannakakis, “The
Complexity of multiterminal cuts”, SIAM J. Computing, 23(1994), 864–894.
7. D. Karger, P. Klein, C. Stein, M. Thorup, and N. Young, “Rounding algorithms
for a geometric embedding of minimum multiway cut,” Proceedings of Symposium
on Theory of Computing, ACM, 1999, to appear.
Semidefinite Programming Methods for the
Symmetric Traveling Salesman Problem

Dragoš Cvetković¹, Mirjana Čangalović², and Vera Kovačević-Vujčić²

¹ Faculty of Electrical Engineering, University of Belgrade
² Faculty of Organizational Sciences, University of Belgrade

Abstract. In this paper the symmetric traveling salesman problem
(STSP) is modeled as a problem of discrete semidefinite programming.
A class of semidefinite relaxations of the STSP model is defined and two
variants of a branch-and-bound technique based on this class of
relaxations are proposed. The results of preliminary numerical experiments
with randomly generated problems are reported.

Keywords. Semidefinite programming, Traveling salesman problem,
Branch-and-bound methods.

1 Introduction
Semidefinite programming (SDP) has many applications to various classes of
optimization problems (see e.g. [33]). In particular, there is a growing interest in
the application of SDP to combinatorial optimization, where it is used in order to
get satisfactory bounds on the optimal objective function value (see [15], [31] for
a survey). Some examples are the recently introduced semidefinite relaxations for the
max-cut problem (Goemans, Williamson [16]), the graph colouring problem (Karger,
Motwani, Sudan [20]) and the traveling salesman problem (Cvetković, Čangalović,
Kovačević-Vujčić [7], [8]). It is the purpose of this paper to investigate the power
of semidefinite relaxations for the traveling salesman problem in a branch-and-bound
framework.
The traveling salesman problem (TSP) is one of the best-known NP-hard
combinatorial optimization problems. There is an extensive literature on both
theoretical and practical aspects of TSP. The most important theoretical results
on TSP can be found in [24] (see also [4], [9]). A large number of both exact
algorithms and heuristics for TSP have been proposed; for a review we refer to
Laporte [22], [23]. We shall mention here only the most important approaches for
finding an exact solution of the symmetric traveling salesman problem (STSP).
Two classical relaxations of STSP have been extensively discussed in the literature.
The first exploits the fact that the cost of an optimal STSP-tour cannot be less
than that of a shortest 1-tree. Several algorithms of branch-and-bound type are
based on this relaxation, first proposed by Christofides [3]. The basic algorithm
was developed by Held and Karp [19] and further improved by Helbig-Hansen
and Krarup [18], Smith and Thompson [32], Volgenant and Jonker [34],
Gavish and Srikanth [14] and, more recently, Carpaneto, Fischetti and Toth [2].

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 126–136, 1999.
© Springer-Verlag Berlin Heidelberg 1999
The second relaxation is the linear programming relaxation of the 2-matching
problem corresponding to STSP. This relaxation has been embedded in various
optimization algorithms by gradually introducing violated subtour elimination
constraints, integrality constraints, other types of valid inequalities, etc. Dantzig,
Fulkerson and Johnson were the first to propose such an algorithm [11], which
was followed by Martin [25], Miliotis [26], [27], Land [21], Crowder and Padberg
[5], Padberg and Hong [28], Padberg and Rinaldi [29], [30], and Grötschel and
Holland [17], among others. In this paper we propose a new class of
branch-and-bound algorithms for STSP which is based on semidefinite relaxations.
The paper is organized as follows. In Section 2 we prove that STSP can
be modeled as a problem of discrete semidefinite programming. The model is
based on the notion of the Laplacian of graphs. A class of semidefinite relaxations
of the discrete STSP model is presented and its relation to 1-tree and 2-matching
relaxations is discussed. At the end of Section 2 a new type of cutting planes for
the linear programming relaxation of STSP which are motivated by the structure
of the semidefinite relaxation model is introduced. In Section 3 two variants of
a branch-and-bound method based on the semidefinite relaxation are proposed
and preliminary numerical results are reported.

2 A Class of Semidefinite Relaxations for STSP

Semidefinite relaxations developed here are based on the following results of
Fiedler related to the Laplacian of graphs and algebraic connectivity (see [12],
[13]).
Let G = (V, E) be an undirected simple graph, where V = {1, . . . , n} is the
set of vertices and E is the set of edges. The Laplacian L(G) of graph G is a
symmetric matrix defined as L(G) = D(G) − A(G), where D(G) is the diagonal
matrix with vertex degrees on the diagonal and A(G) is the adjacency matrix of
G.
The matrix L(G) is positive semidefinite. If λ1 ≤ . . . ≤ λn are eigenvalues of
L(G) then λ1 = 0 with the corresponding eigenvector e = (1, . . . , 1). All other
eigenvalues have eigenvectors which belong to the set
\[
S = \Big\{ x = (x_1, \dots, x_n) \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} x_i = 0,\ \sum_{i=1}^{n} x_i^2 = 1 \Big\}.
\]

According to Fiedler, the second smallest eigenvalue λ2 of L(G) is called the
algebraic connectivity of G and denoted by a(G). In [13] the following results
are proved:

Theorem 1. The algebraic connectivity a(G) has the properties:

(i) $a(G) = \min_{x \in S} x^T L(G) x$
(ii) a(G) ≥ 0, a(G) > 0 if and only if G is connected.
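
Property (i) can be illustrated numerically. The sketch below is not part of the original development; it assumes the standard identity $x^T L(G) x = \sum_{\{i,j\} \in E} (x_i - x_j)^2$ for an unweighted graph, and the helper names `quad_form` and `in_S` are chosen here for illustration only. For a disconnected graph it exhibits a vector x ∈ S on which the quadratic form vanishes, which by (i) gives a(G) = 0, in agreement with (ii).

```python
import math

def quad_form(edges, x):
    """x^T L(G) x, computed as the sum over edges of (x_i - x_j)^2."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

def in_S(x, tol=1e-12):
    """Membership in S: coordinates sum to 0 and squares sum to 1."""
    return abs(sum(x)) < tol and abs(sum(t * t for t in x) - 1.0) < tol

# Two disjoint triangles on {0,1,2} and {3,4,5}: a disconnected graph.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
# A single circuit through all six vertices: a connected graph.
hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

# x is constant on each triangle, has zero sum and unit norm, so x is in S.
c = 1.0 / math.sqrt(6.0)
x = [c, c, c, -c, -c, -c]

value_disconnected = quad_form(two_triangles, x)  # no edge joins the two parts
value_connected = quad_form(hexagon, x)           # two edges join the two parts
```

Since the quadratic form is nonnegative and the first value is zero, the minimum over S is attained at x for the disconnected graph. For the circuit the same x only gives an upper bound on a(G), not the minimum itself.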
Fiedler shows that the notion of the Laplacian and the algebraic connectivity
can be generalized to graphs with positively weighted edges.
A C-edge-weighted graph GC = (V, E, C) is defined by graph G = (V, E)
and a symmetric nonnegative weight matrix C such that cij > 0 if and only if
{i, j} ∈ E. Now the Laplacian L(GC ) is defined as L(GC ) = diag(r1 , . . . rn ) − C,
where ri is the sum of the i-th row of C. The Laplacian L(GC ) has
characteristics similar to those of L(G). Namely, it is symmetric and positive
semidefinite, with the smallest eigenvalue λ1 = 0 and the corresponding eigenvector e. As before, the
algebraic connectivity a(GC ) is the second smallest eigenvalue of L(GC ), which
enjoys similar properties to those in Theorem 1.
Theorem 2 (M. Fiedler [13]). The generalized algebraic connectivity a(GC )
has the following properties:
(i) $a(G_C) = \min_{x \in S} x^T L(G_C) x$
(ii) a(GC ) ≥ 0, a(GC ) > 0 if and only if GC is connected.
In the sequel we shall assume that G = (V, E) is a complete undirected
graph, where, as before, V = {1, . . . , n} is the set of vertices and E is the set
of edges. To each edge {i, j} ∈ E a distance (cost) dij is associated such that
the distance matrix D = [dij ]n×n is symmetric and dii = 0, i = 1, . . . , n. Now
the symmetric traveling salesman problem (STSP) can be formulated as follows:
find a Hamiltonian circuit of G with minimal cost.
Algebraic connectivity of a Hamiltonian circuit is well known in the theory
of graph spectra (see e.g. [10]). The Laplacian of a circuit with n vertices has
the spectrum
2 − 2 cos(2πj/n), j = 1, . . . , n
and the second smallest eigenvalue is obtained for j = 1 and j = n − 1, i.e. λ2 =
λ3 = 2 − 2 cos(2π/n). This value will be denoted by hn , i.e. hn = 2 − 2 cos(2π/n).
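
The spectrum above can be checked directly. The following is a small sketch under the assumption (a standard fact for circulant matrices, not stated in the text) that the vectors with components cos(2πjk/n), k = 0, . . . , n − 1, are eigenvectors of the circuit Laplacian; the function names are illustrative.

```python
import math

def cycle_laplacian(n):
    """Laplacian of the circuit 0-1-...-(n-1)-0 as a list of rows (n >= 3)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 2.0
        L[i][(i + 1) % n] -= 1.0
        L[i][(i - 1) % n] -= 1.0
    return L

def h(n):
    """h_n = 2 - 2 cos(2 pi / n), the algebraic connectivity of a circuit."""
    return 2.0 - 2.0 * math.cos(2.0 * math.pi / n)

def residual(L, j):
    """Max-norm of L v - lambda_j v for v_k = cos(2 pi j k / n)."""
    n = len(L)
    lam = 2.0 - 2.0 * math.cos(2.0 * math.pi * j / n)
    v = [math.cos(2.0 * math.pi * j * k / n) for k in range(n)]
    Lv = [sum(L[r][c] * v[c] for c in range(n)) for r in range(n)]
    return max(abs(Lv[r] - lam * v[r]) for r in range(n))

n = 8
L = cycle_laplacian(n)
# Largest residual over j = 1, ..., n; it should vanish up to rounding.
worst = max(residual(L, j) for j in range(1, n + 1))
```

The case j = n gives the all-ones vector with eigenvalue 0, and j = 1 gives the value h_n used in the sequel.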
The next theorem, which gives a basis for the discrete semidefinite program-
ming model of STSP, has been proved in [8] as a consequence of a more general
result. For the sake of completeness we supply here a self-contained proof.
Theorem 3. Let H be a spanning subgraph of G such that d(i) = 2, i = 1, . . . , n,
where d(i) is the degree of vertex i with respect to H, and let L(H) = [lij ]n×n
be the corresponding Laplacian. Let α and β be real parameters such that α >
hn /n, 0 < β ≤ hn . Then H is a Hamiltonian circuit if and only if the matrix
X = L(H) + αJ − βI is positive semidefinite, where J is the n × n matrix with
all entries equal to one and I is the unit matrix of order n.
Proof. Let 0 = λ1 ≤ λ2 ≤ . . . ≤ λn be the eigenvalues of L = L(H) and let x1 = e
and xi ∈ S, i = 2, . . . , n, be the corresponding eigenvectors which form a basis
for $\mathbb{R}^n$. It is easy to check that J has two eigenvalues: 0, with multiplicity n − 1
and the corresponding eigenvectors x2 , . . . , xn , and n with e as its eigenvector.
Therefore
\[
Xe = (L + \alpha J - \beta I)e = (\alpha n - \beta)e,
\]
\[
Xx_i = (L + \alpha J - \beta I)x_i = (\lambda_i - \beta)x_i, \quad i = 2, \dots, n,
\]
which means that αn − β and λi − β, i = 2, . . . , n are eigenvalues of X with
eigenvectors e, x2 , . . . , xn , respectively.
The conditions of Theorem 3 guarantee that H is a 2-matching, i.e. it is either
a Hamiltonian circuit or a collection of at least two disjoint subcircuits. In the
first case λ2 = hn , while in the second, according to Theorem 1, λ2 = 0. As
α > hn /n in both cases it follows that αn − β > λ2 − β, i.e. the smallest
eigenvalue of X is equal to λ2 − β.
Suppose that H is a Hamiltonian circuit. Then β ≤ hn implies λ2 − β =
hn −β ≥ 0, i.e. matrix X is positive semidefinite. Suppose now that X is positive
semidefinite. Then λ2 −β ≥ 0 and β > 0 imply λ2 = a(H) > 0 and by Theorem 1
it follows that H is a connected 2-matching, i.e. a Hamiltonian circuit. □
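
Theorem 3 can be checked on a small instance. The sketch below assumes NumPy is available and uses n = 6, α = 1 and β = h_n; these parameter values and the helper names are choices made here for illustration. It compares a Hamiltonian circuit with a 2-matching consisting of two disjoint triangles.

```python
import numpy as np

def laplacian_of_cycles(cycles, n):
    """Laplacian of a 2-regular graph given as a list of vertex cycles."""
    L = np.zeros((n, n))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            L[a, b] -= 1.0
            L[b, a] -= 1.0
            L[a, a] += 1.0
            L[b, b] += 1.0
    return L

def min_eig_X(L, alpha, beta):
    """Smallest eigenvalue of X = L + alpha*J - beta*I."""
    n = L.shape[0]
    X = L + alpha * np.ones((n, n)) - beta * np.eye(n)
    return np.linalg.eigvalsh(X).min()

n = 6
h_n = 2.0 - 2.0 * np.cos(2.0 * np.pi / n)
alpha, beta = 1.0, h_n          # alpha > h_n / n and 0 < beta <= h_n

tour = laplacian_of_cycles([[0, 1, 2, 3, 4, 5]], n)
subtours = laplacian_of_cycles([[0, 1, 2], [3, 4, 5]], n)

min_tour = min_eig_X(tour, alpha, beta)          # lambda_2 - beta = 0
min_subtours = min_eig_X(subtours, alpha, beta)  # 0 - beta = -h_n
```

Up to rounding, the smallest eigenvalue of X is 0 for the circuit, so X is positive semidefinite, and −h_n for the two triangles, so X is not.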

It follows from Theorem 3 that a spanning subgraph H of G is a Hamiltonian
circuit if and only if its Laplacian L(H) satisfies the following conditions:

lii = 2, i = 1, . . . , n (1)

X = L(H) + αJ − βI is positive semidefinite, α > hn /n, 0 < β ≤ hn (2)


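Starting from (1) and (2) the following discrete semidefinite programming
model of STSP can be defined:
\[
\text{minimize}\quad F(X) = \sum_{i=1}^{n}\sum_{j=1}^{n}\left(-\frac{1}{2}d_{ij}\right)x_{ij} + \frac{\alpha}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij} \tag{3}
\]
subject to
\[
x_{ii} = 2 + \alpha - \beta, \quad i = 1, \dots, n \tag{4}
\]
\[
\sum_{j=1}^{n} x_{ij} = n\alpha - \beta, \quad i = 1, \dots, n \tag{5}
\]
\[
x_{ij} \in \{\alpha - 1, \alpha\}, \quad i, j = 1, \dots, n,\ i < j \tag{6}
\]
\[
X \ge 0 \tag{7}
\]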
where X ≥ 0 denotes that the matrix $X = [x_{ij}]_{n \times n}$ is symmetric and positive
semidefinite and α and β are chosen according to Theorem 3. Matrix L = X +
βI − αJ represents the Laplacian of a Hamiltonian circuit if and only if X
satisfies (4)-(7). Indeed, constraints (4)-(6) ensure that L has the form of a
Laplacian with diagonal entries equal to 2, while condition (7) guarantees that
L corresponds to a Hamiltonian circuit. Therefore, if X∗ is an optimal solution
of problem (3)-(7) then L∗ = X∗ + βI − αJ is the Laplacian of an optimal
Hamiltonian circuit of G with the objective function value
\[
\sum_{i=1}^{n}\sum_{j=1}^{n}\left(-\frac{1}{2}d_{ij}\right) l^*_{ij} = F(X^*).
\]
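
The identity between F(X) and the length of the corresponding circuit can be verified on a toy instance. In the sketch below the 5 × 5 distance matrix, the tour and the helper names are invented for illustration; only the formulas for F and for X = L(H) + αJ − βI come from the text.

```python
import math

def tour_to_X(tour, alpha, beta):
    """X = L(H) + alpha*J - beta*I for the circuit H visiting `tour` in order."""
    n = len(tour)
    X = [[alpha] * n for _ in range(n)]      # off-diagonal non-edges: alpha
    for i in range(n):
        X[i][i] = 2.0 + alpha - beta         # diagonal entries, constraint (4)
    for a, b in zip(tour, tour[1:] + tour[:1]):
        X[a][b] = X[b][a] = alpha - 1.0      # circuit edges: alpha - 1
    return X

def F(X, d, alpha):
    """Objective (3): sum of (-d_ij/2) x_ij plus (alpha/2) times sum of d_ij."""
    n = len(X)
    s1 = sum(-0.5 * d[i][j] * X[i][j] for i in range(n) for j in range(n))
    s2 = 0.5 * alpha * sum(d[i][j] for i in range(n) for j in range(n))
    return s1 + s2

def tour_length(tour, d):
    """Total distance along the closed tour."""
    return sum(d[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

# Hypothetical symmetric distance matrix with zero diagonal.
d = [[0, 3, 5, 8, 2],
     [3, 0, 4, 7, 6],
     [5, 4, 0, 1, 9],
     [8, 7, 1, 0, 3],
     [2, 6, 9, 3, 0]]
n = len(d)
alpha = 1.0
beta = 2.0 - 2.0 * math.cos(2.0 * math.pi / n)   # beta = h_n

tour = [0, 2, 4, 1, 3]
X = tour_to_X(tour, alpha, beta)
objective = F(X, d, alpha)
length = tour_length(tour, d)
row_sums_ok = all(abs(sum(row) - (n * alpha - beta)) < 1e-12 for row in X)
```

Because d_ii = 0, the diagonal of X does not contribute to F, and each circuit edge is counted twice with weight 1/2, so `objective` and `length` agree. The flag `row_sums_ok` confirms constraint (5) for this X.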
A natural semidefinite relaxation of the traveling salesman problem is ob-
tained when discrete conditions (6) are replaced by inequality conditions:
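\[
\text{minimize}\quad F(X) \tag{8}
\]
subject to
\[
x_{ii} = 2 + \alpha - \beta, \quad i = 1, \dots, n \tag{9}
\]
\[
\sum_{j=1}^{n} x_{ij} = n\alpha - \beta, \quad i = 1, \dots, n \tag{10}
\]
\[
\alpha - 1 \le x_{ij} \le \alpha, \quad i, j = 1, \dots, n,\ i < j \tag{11}
\]
\[
X \ge 0 \tag{12}
\]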
It is easy to see that the relaxation (8)-(12) can be expressed in the standard
form of an SDP problem. Indeed, constraint (9) can be represented as Ai ◦ X =
2 + α − β, where ◦ is the Frobenius product and Ai is a symmetric n × n matrix
with 1 at the position (i, i) and all other entries equal to 0. Similarly, condition
(10) is equivalent to Bi ◦ X = 2(nα − β), where Bi has 2 at the position (i, i)
while all the remaining elements of the i-th row and the i-th column are equal
to 1, and all the other entries are zero. Finally, condition (11) can be expressed
as 2(α − 1) ≤ Cij ◦ X ≤ 2α, where Cij has 1 at the positions (i, j) and (j, i) and
zero otherwise. Since SDP problem (8)-(12) depends on parameters α and β it
represents a class of semidefinite relaxations of TSP. In the sequel, members of
this class will be referred to as SDP relaxations.
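
The constraint matrices Ai, Bi and Cij described above can be written out explicitly. The following sketch builds them as plain lists and evaluates the Frobenius products on the matrix X of a Hamiltonian circuit; the function names, the choice n = 5 and the particular tour are assumptions made for illustration.

```python
import math

def frobenius(A, X):
    """Frobenius product: sum over all i, j of a_ij * x_ij."""
    n = len(X)
    return sum(A[i][j] * X[i][j] for i in range(n) for j in range(n))

def A_mat(i, n):
    """1 at position (i, i), zero elsewhere: encodes constraint (9)."""
    M = [[0.0] * n for _ in range(n)]
    M[i][i] = 1.0
    return M

def B_mat(i, n):
    """2 at (i, i), 1 in the rest of row i and column i: encodes (10)."""
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        M[i][j] = M[j][i] = 1.0
    M[i][i] = 2.0
    return M

def C_mat(i, j, n):
    """1 at (i, j) and (j, i), zero elsewhere: encodes (11)."""
    M = [[0.0] * n for _ in range(n)]
    M[i][j] = M[j][i] = 1.0
    return M

def tour_matrix(tour, alpha, beta):
    """X = L(H) + alpha*J - beta*I for the circuit H visiting `tour` in order."""
    n = len(tour)
    X = [[alpha] * n for _ in range(n)]
    for i in range(n):
        X[i][i] = 2.0 + alpha - beta
    for a, b in zip(tour, tour[1:] + tour[:1]):
        X[a][b] = X[b][a] = alpha - 1.0
    return X

n = 5
alpha = 1.0
beta = 2.0 - 2.0 * math.cos(2.0 * math.pi / n)   # beta = h_n
X = tour_matrix([0, 2, 4, 1, 3], alpha, beta)

ok_9 = all(abs(frobenius(A_mat(i, n), X) - (2 + alpha - beta)) < 1e-12
           for i in range(n))
ok_10 = all(abs(frobenius(B_mat(i, n), X) - 2 * (n * alpha - beta)) < 1e-12
            for i in range(n))
ok_11 = all(2 * (alpha - 1) <= frobenius(C_mat(i, j, n), X) <= 2 * alpha
            for i in range(n) for j in range(i + 1, n))
```

The factor 2 in (10) and (11) arises because each off-diagonal entry of the symmetric matrix X is picked up twice by Bi and Cij.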
Let us denote by D and D◦ the feasible set of problem (8)-(12) and its relative
interior. For each X ∈ D the corresponding Laplacian L = X + βI − αJ can
be interpreted as the Laplacian of the weighted graph GL = (V, EL , CL ), where
EL = {{i, j} ∈ E | lij < 0} and CL = 2I − L. If α and β satisfy the conditions
of Theorem 3 then, using similar arguments as in the proof of Theorem 3, it
can be shown that X ≥ 0 is equivalent to a(GL ) ≥ β (see also [8]). Hence, by
Theorem 2 graph GL is connected. It immediately follows that 2-matchings with
disjoint subcircuits cannot correspond to any X in D.
It is easy to see that $D^\circ \ne \emptyset$. Indeed, if e.g.
$\hat{L} = \left(2 + \frac{2}{n-1}\right) I - \frac{2}{n-1} J$ then
\[
\hat{X} = \hat{L} + \alpha J - \beta I = \left(2 + \frac{2}{n-1} - \beta\right) I + \left(\alpha - \frac{2}{n-1}\right) J
\]
has the eigenvalues $2 + \frac{2}{n-1} - \beta$ with the multiplicity n − 1 and nα − β with the multiplicity 1.
Since nα − β > 0 and $2 + \frac{2}{n-1} - \beta \ge 2 + \frac{2}{n-1} - h_n > 0$ for n ≥ 4, it follows
that $\hat{X} \in D^\circ$, n ≥ 4.
For β < hn matrices X which correspond to Laplacians of Hamiltonian cir-
cuits are in D◦ , while for β = hn these matrices belong to D \ D◦ . It is clear
that the best relaxation is obtained for β = hn . For that reason in numerical
experiments reported in Section 3 parameter β is always chosen to be equal to
hn . Concerning the parameter α, it is always sufficient to choose α = 1.
The semidefinite relaxation (8)-(12) is substantially different from the existing
STSP relaxations. It should be pointed out that it cannot be compared
theoretically with either the 2-matching or the 1-tree relaxation. Indeed, if we consider STSP
model (3)-(7) it is easy to see that X which corresponds to the Laplacian of a
2-matching satisfies (4)-(6) but need not satisfy (7). In the case of a 1-tree, the
condition (4) is relaxed, while (5), (6) and (7) hold (see [8]). Preliminary
numerical experiments on randomly generated problems with 10 ≤ n ≤ 20, which
are reported in [8], indicate that the SDP relaxation gives considerably better lower
bounds than both the 1-tree and the 2-matching relaxations.
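SDP model (8)-(12) can be expressed in terms of Laplacians in the following
equivalent way:
\[
\text{minimize}\quad \Phi(L) = \sum_{i=1}^{n}\sum_{j=1}^{n}\left(-\frac{1}{2}d_{ij}\right) l_{ij} \tag{13}
\]
subject to
\[
l_{ii} = 2, \quad i = 1, \dots, n \tag{14}
\]
\[
\sum_{j=1}^{n} l_{ij} = 0, \quad i = 1, \dots, n \tag{15}
\]
\[
-1 \le l_{ij} \le 0, \quad i, j = 1, \dots, n,\ i < j \tag{16}
\]
\[
\lambda_2(L) \ge \beta \tag{17}
\]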
where $L = [l_{ij}]_{n \times n}$ is a symmetric matrix with second smallest eigenvalue λ2 (L).
The existing branch-and-cut approaches to STSP start from the linear relaxation
of the 2-matching problem defined by (13)-(16) and form a search tree by
branching on fractional variables. At each node of the tree several types of cutting
planes (subtour elimination constraints, 2-matching inequalities, comb
inequalities, etc.) are introduced in order to limit the growth of the search tree. None
of these explicitly takes into account the nonlinear constraint (17). In the sequel we
shall discuss the possibility of introducing a new type of cutting plane based on
(17). Suppose that in (17) β is chosen to be equal to hn and let us denote by V
and W the feasible sets of problems (13)-(16) and (13)-(17), respectively. It
is clear that V \ W ≠ ∅. The following theorem can be proved.
Theorem 4. Let L∗ ∈ V \ W and let s ∈ S be an eigenvector corresponding to
λ2 (L∗ ). Then
(i) $L \circ ss^T \ge h_n$ for each L ∈ W
(ii) The hyperplane $L \circ ss^T = h_n$ is supporting for the set W .


Proof. (i) For each L ∈ W , according to Theorem 2, we have
\[
\lambda_2(L) = \min_{x \in S} x^T L x = \min_{x \in S} L \circ xx^T \ge h_n.
\]
As s ∈ S then $L \circ ss^T \ge h_n$.
(ii) It is sufficient to prove that the hyperplane $L \circ ss^T = h_n$ and the set
W have nonempty intersection. We shall construct a point in the intersection of
the form $L_\gamma = \gamma L^* + (1 - \gamma)\hat{L}$, where
$\hat{L} = \left(2 + \frac{2}{n-1}\right) I - \frac{2}{n-1} J$. The matrix
$\hat{L}$ has the eigenvalue $2 + \frac{2}{n-1}$ with multiplicity n − 1 and each x ∈ S as its
eigenvector, and 0 with multiplicity 1. Since $\lambda_2(\hat{L}) = 2 + \frac{2}{n-1} > h_n$ it follows
that $\hat{L} \in \operatorname{ri} W$.
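Let us prove that for each γ ∈ (0, 1) the vector s is an eigenvector
corresponding to $\lambda_2(L_\gamma)$. Indeed, for each x ∈ S
\[
L_\gamma \circ xx^T = \gamma L^* \circ xx^T + (1 - \gamma)\hat{L} \circ xx^T \ge \gamma L^* \circ ss^T + (1 - \gamma)\hat{L} \circ ss^T = L_\gamma \circ ss^T.
\]
Moreover,
\[
\lambda_2(L_\gamma) = L_\gamma \circ ss^T = \gamma \lambda_2(L^*) + (1 - \gamma)\left(2 + \frac{2}{n-1}\right).
\]
For $\gamma = \left(2 + \frac{2}{n-1} - h_n\right) / \left(2 + \frac{2}{n-1} - \lambda_2(L^*)\right)$ we have
$\lambda_2(L_\gamma) = L_\gamma \circ ss^T = h_n$. □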

Suppose that L∗ is an optimal solution obtained at a node of the search tree
in some existing branch-and-cut procedure. If λ2 (L∗ ) < hn and s ∈ S is a
corresponding eigenvector then
\[
L \circ ss^T \ge h_n \tag{18}
\]
is a cutting plane inequality which could be added to the current relaxation
problem. Indeed, $\lambda_2(L^*) = L^* \circ ss^T < h_n$ and, according to (i) of Theorem 4,
$L(H) \circ ss^T \ge h_n$ for each L(H) which represents the Laplacian of a Hamiltonian
circuit H with n vertices. Let us note that semidefinite relaxation (13)-(17)
includes through (17) all valid inequalities of the type (18), which, according to
(ii) of Theorem 4, are defined by supporting hyperplanes of the feasible set.
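
A small instance shows how inequality (18) separates a 2-matching from the Hamiltonian circuits. The sketch below takes n = 6 and lets L∗ be the Laplacian of two disjoint triangles; this instance, the eigenvector s and the helper names are chosen here for illustration. It uses the identity $L \circ ss^T = s^T L s = \sum_{\{i,j\} \in E} (s_i - s_j)^2$ for unweighted graphs and checks (18) only on Laplacians of Hamiltonian circuits, not on all of W.

```python
import math
from itertools import permutations

def quad_form(edges, s):
    """Frobenius product of L with s s^T, as a sum of (s_i - s_j)^2 over edges."""
    return sum((s[i] - s[j]) ** 2 for i, j in edges)

def circuit_edges(order):
    """Edges of the Hamiltonian circuit visiting the vertices in `order`."""
    return list(zip(order, order[1:] + order[:1]))

n = 6
h_n = 2.0 - 2.0 * math.cos(2.0 * math.pi / n)

# L*: two disjoint triangles. It satisfies (14)-(16) but lambda_2(L*) = 0 < h_n.
subtours = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]

# s lies in S and is an eigenvector of L* for the eigenvalue lambda_2(L*) = 0.
c = 1.0 / math.sqrt(n)
s = [c, c, c, -c, -c, -c]

lhs_at_L_star = quad_form(subtours, s)     # left-hand side of (18) at L*

# Smallest left-hand side of (18) over all Hamiltonian circuits through vertex 0.
min_lhs_over_tours = min(
    quad_form(circuit_edges((0,) + p), s) for p in permutations(range(1, n))
)
```

The cut is violated by L∗, since the left-hand side is 0 there, while every Hamiltonian circuit crosses between the two triangles at least twice and therefore satisfies it with left-hand side at least 4/3.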

3 Branch and Bound Algorithms


The goal of our numerical experiments was to give a first insight into the power
of SDP as a relaxation in a branch and bound framework for STSP.
We have implemented two branch and bound algorithms with the SDP re-
laxation (with α = 1, β = hn ) and one with the 1-tree relaxation. The last
one was implemented to check the correctness of the results. All algorithms are
based on the general branch and bound scheme as described in [24]. We used
a FORTRAN implementation of the branch and bound shell from the package
TSP-SOLVER [6], [9]. An initial upper bound was obtained in all cases by the
3-optimal heuristic. The depth first search was used to select the next subprob-
lem.
The two branch and bound algorithms differ only in their branching rules:
Algorithm 1. At the first vertex of degree greater than 2 in the weighted graph
representing the SDP solution an edge is excluded in each son;
Algorithm 2. The first non-integer entry of the SDP solution matrix is replaced
in the sons by 0 and 1 respectively.
For solving the SDP relaxation tasks we used the CSDP 2.2 software package
developed by Borchers [1] in the C language. Inequality conditions (11) were handled
by adding $n^2 - n$ slack variables, each represented by a 1 × 1 block as accepted by
the software.
Our numerical experiments included 55 randomly generated STSP instances
of dimension 10 ≤ n ≤ 20 already treated in [8]. Entries of the distance matrix
are uniformly distributed in the range from 1 to 999. The experiments were
performed on an Alpha 800 5/400 computer. In a time sharing system it took no
more than 1 minute real time to get a solution of an SDP relaxation task related
to an STSP instance of dimension n ≤ 20. Computational results are presented
in Table 1.

Table 1.

1   2   3          4     5     6    7    8        9
10  1   1680.9950  1681  1681  1    /    1        /
10  2   2777.9920  2778  2778  1    /    1        /
10  3   1626.1300  1714  1630  7    5    5        3
10  4   2058.9950  2059  2059  1    /    1        /
10  5   2672.4910  2801  2713  24   18   11(1)    6
11  1   2884.0000  2884  2884  1    /    1        /
11  2   2258.4940  2283  2283  10   7    7        4
11  3   1565.0000  1565  1565  1    /    1        /
11  4   1226.8920  1229  1229  7    5    5        3
11  5   1999.0000  2019  2019  4    3    3        2
12  1   2962.0000  2962  2962  1    /    1        /
12  2   2416.0000  2416  2416  1    /    1        /
12  3   1267.0010  1267  1267  1    /    1        /
12  4   2434.0000  2434  2434  1    /    1        /
12  5   1981.7260  2021  2021  15   11   5        3
13  1   1742.0000  1742  1742  1    /    1        /
13  2   2064.4350  2072  2072  49   35   7(2)     4
13  3   1786.0010  1786  1786  1    /    1        /
13  4   2650.3250  2688  2686  10   7    11(1)    6
13  5   2458.0000  2458  2458  1    /    1        /
14  1   1503.0000  1838  1503  1    /    1        /
14  2   2269.0000  2269  2269  1    /    1        /
14  3   1985.5090  2091  2091  351  242  57(51)   29
14  4   2170.5680  2173  2173  11   8    5        3
14  5   2000.0000  2000  2000  1    /    1        /
15  1   1548.0010  1926  1548  1    /    1        /
15  2   1415.0000  1415  1415  1    /    1        /
15  3   1813.0000  2082  1849  4    3    3        2
15  4   2455.6730  2471  2471  8    6    7(2)     4
15  5   1749.0000  1749  1749  1    /    1        /
16  1   2579.0000  2579  2579  1    /    1        /
16  2   2189.0000  2189  2189  1    /    1        /
16  3   2147.9210  2247  2181  54   38   51(20)   26
16  4   1447.7110  1473  1473  11   8    7(1)     4
16  5   2595.0010  2896  2595  1    /    1        /
17  1   1183.9970  1184  1184  1    /    1        /
17  2   2606.9930  2607  2607  1    /    1        /
17  3   1664.9950  1665  1665  1    /    1        /
17  4   1568.4940  1579  1576  13   9    9        5
17  5   2192.2420  2233  2208  53   36   5(1)*    3
18  1   2606.5870  2676  2651  *         59(1)    30
18  2   2273.2220  2356  2275  21   15   11       6
18  3   1562.0080  1673  1562  1    /    1        /
18  4   2490.0000  2490  2490  1    /    1        /
18  5   1815.9110  1994  1824  *         17       9
19  1   1223.9960  1420  1224  1    /    1        /
19  2   2039.9930  2073  2068  89   60   19(3)*   10
19  3   1417.9960  1418  1418  1    /    1        /
19  4   1897.4480  2121  1926  *         29(5)    15
19  5   2015.5880  2166  2035  *         15(5)    8
20  1   1953.2990  2250  2011  *         71(19)   36
20  2   2410.2790  2501  2420  452  309  7(3)     4
20  3   2585.1740  2680  2589  4    3    13(2)    7
20  4   1758.4360  1784  1777  20   14   15(2)    8
20  5   1817.7610  1838  1838  50   34   27(1)    14

The columns in Table 1 contain the following data:

1. Dimension of TSP (the number of cities);
2. Instance identification number;
3. Value of the SDP relaxation (lower bound);
4. Value obtained by the 3-optimal heuristic (upper bound);
5. Length of the optimal solution;
6. The number of solved relaxation tasks in Algorithm 1; an asterisk indicates
that this number is greater than 1000;
7. The number of killed subproblems in Algorithm 1;
8. The number of solved relaxation tasks in Algorithm 2;
9. The number of killed subproblems in Algorithm 2.
It can be seen from Table 1 that Algorithm 2 is superior to Algorithm 1
with respect to the cardinality of the search tree. However, in the case of
Algorithm 2 certain numerical instabilities occurred in solving some SDP relaxation
tasks (CSDP output code 6). Numbers in parentheses in column 8 indicate the
number of such tasks. In all of these cases the solutions of SDP problems were
reached, but instead of the usual 6-8 correct significant digits, only 3-5 correct digits
were obtained. Insufficient accuracy influenced the search procedure only for 2
instances, denoted by an asterisk in column 8, when Algorithm 2 failed to generate
an optimal solution.
The software we have implemented for the experiments presented in this paper
is far from being optimized. Improvements could include more sophisticated
branching rules and additional heuristics for choosing the next subproblem, as
well as a special purpose code adapted to the structure of SDP relaxations with
stabilization mechanisms for handling numerical instabilities.

References
1. Borchers, B.: CSDP, A C Library for Semidefinite Programming. Optimization
Methods and Software (to appear)
2. Carpaneto G., Fischetti M., Toth P.: New Lower Bounds for the Symmetric Trav-
elling Salesman Problem. Math. Program. 45 (1989) 233–254.
3. Christofides N.: The Shortest Hamiltonian Chain of a Graph. SIAM J. Appl. Math.
19 (1970) 689–696.
4. Cook, W., Cunningham, W., Pulleyblank, W., Schrijver, A.: Combinatorial Opti-
mization. John Wiley & Sons, New York Chichester Weinheim Brisbane Singapore
Toronto (1998)
5. Crowder H., Padberg M.W.: Solving Large-Scale Symmetric Travelling Salesman
Problems to Optimality. Management Sci. 26 (1980) 495–509
6. Cvetković, D., Čangalović, M., Dimitrijević, V., Kraus, L., Milosavljević, M., Simić,
S.: TSP-SOLVER - A Programming Package for the Traveling Salesman Problem.
Univ. Beograd, Publ. Elektrotehn. Fak. Ser. Mat., 1 (1990) 41–47
7. Cvetković, D., Čangalović, M., Kovačević-Vujčić, V.: Semidefinite Programming
and Traveling Salesman Problem. In: Petrović, R., Radojević, D. (eds.): Proceed-
ings of Yugoslav Symposium on Operations Research. Herceg Novi, Yugoslavia
(1998) 239–242
8. Cvetković, D., Čangalović, M., Kovačević-Vujčić, V.: Semidefinite Relaxations of
Travelling Salesman Problem. (to appear)
9. Cvetković, D., Dimitrijević, V., Milosavljević, M.: Variations on the Travelling
Salesman Theme. Libra Produkt, Belgrade (1996)
10. Cvetković, D., Doob, M., Sachs, H.: Spectra of Graphs. 3rd edn. Johann Ambrosius
Barth, Heidelberg Leipzig (1995)
11. Dantzig G.B., Fulkerson D.R., Johnson S.M.: Solution of a Large-Scale Traveling
Salesman Problem. Operations Research 2 (1954) 393–410
12. Fiedler M.: Algebraic Connectivity of Graphs. Czechoslovak Math. J. 23 (1973)
298–305
13. Fiedler, M.: Laplacian of Graphs and Algebraic Connectivity. In: Combinatorics
and Graph Theory, Vol. 25, Banach center publications, PWN-Polish scientific
publishers Warsaw (1989) 57–70
14. Gavish B., Srikanth K.N.: An Optimal Solution Method for Large-Scale Multiple
Travelling Salesman Problems. Operations Research 34 (1986) 698–717
15. Goemans, M.: Semidefinite Programming in Combinatorial Optimization. Math.
Program. 79 (1997) 143–161
16. Goemans M.X., Williamson D.P.: Improved Approximation Algorithms for
Maximum Cut and Satisfiability Problems Using Semidefinite Programming. J. ACM
42 (1995) 1115–1145
17. Grötschel M., Holland O.: Solution of Large-Scale Symmetric Travelling Salesman
Problems. Math. Program. 51 (1991) 141–202
18. Helbig-Hansen K., Krarup J.: Improvements of the Held-Karp Algorithm for the
Symmetric Traveling Salesman Problem. Math. Program. 7 (1974) 87–96
19. Held M., Karp R.M.: The Travelling Salesman Problem and Minimum Spanning
Trees. Part II, Math. Program. 1 (1971) 6–25
20. Karger D., Motwani R., Sudan M.: Approximate Graph Coloring by Semidefinite
Programming. J. ACM 45 (1998) 246–265
21. Land A.H.: The Solution of Some 100-City Travelling Salesman Problems. Working
Paper. London School of Economics (1979)
22. Laporte, G.: The Traveling Salesman Problem: An Overview of Exact and Approx-
imate Algorithms. European J. Operational Research 59 (1992) 231–247
23. Laporte G.: Exact Algorithms for the Traveling Salesman Problem and the Vehicle
Routing Problem. Les Cahiers du GERAD G-98-37 July (1998)
24. Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G., Shmoys, D.B.: The Traveling
Salesman Problem. John Wiley & Sons, Chichester New York Brisbane Toronto
Singapore (1985)
25. Martin G.T.: Solving the Travelling Salesman Problem by Integer Programming.
Working Paper. CEIR, New York (1966)
26. Miliotis P.: Integer Programming Approaches to the Travelling Salesman Problem.
Math. Program. 10 (1976) 367–378
27. Miliotis P.: Using Cutting Planes to Solve the Symmetric Travelling Salesman
Problem. Math. Program. 15 (1978) 177–188
28. Padberg M.W., Hong S.: On the Symmetric Travelling Salesman Problem: A Com-
putational Study. Math. Program. Study 12 (1980) 78–107
29. Padberg M.W., Rinaldi G.: Optimization of a 532-City Symmetric Traveling Sales-
man Problem by Branch and Cut. Operations Research Letters 6 (1987) 1–7
30. Padberg M.W., Rinaldi G.: A Branch-and-Cut Algorithm for the Resolution of
Large Scale Symmetric Traveling Salesman Problems. SIAM Review 33 (1991)
66–100
31. Rendl, F.: Semidefinite Programming and Combinatorial Optimization. Technical
Report Woe-19, TU Graz, Austria December (1997)
32. Smith T.H.C., Thompson G.L.: A LIFO Implicit Enumeration Search Algorithm
for the Symmetric Traveling Salesman Problem Using Held and Karp’s 1-Tree
Relaxation. Annals Disc. Math. 1 (1977) 479–493
33. Vandenberghe, L., Boyd, S.: Semidefinite Programming. SIAM Review 38 (1996)
49–95
34. Volgenant T., Jonker R.: A Branch and Bound Algorithm for the Symmetric
Traveling Salesman Problem Based on the 1-Tree Relaxation. European J. Operational
Research 9 (1982) 83–89
Bounds on the Chvátal Rank of Polytopes in the
0/1-Cube

Friedrich Eisenbrand¹ and Andreas S. Schulz²

¹ Max-Planck-Institut für Informatik, Im Stadtwald, D-66123 Saarbrücken, Germany,
[email protected]
² MIT, Sloan School of Management and Operations Research Center, E53-361,
Cambridge, MA 02139, USA, [email protected]

Abstract. Gomory’s and Chvátal’s cutting-plane procedure proves
recursively the validity of linear inequalities for the integer hull of a given
polyhedron. The number of rounds needed to obtain all valid
inequalities is known as the Chvátal rank of the polyhedron. It is well-known
that the Chvátal rank can be arbitrarily large, even if the polyhedron is
bounded, if it is of dimension 2, and if its integer hull is a 0/1-polytope.
We prove that the Chvátal rank of polyhedra featured in common
relaxations of many combinatorial optimization problems is rather small; in
fact, the rank of any polytope contained in the n-dimensional 0/1-cube
is at most $3n^2 \lg n$. This improves upon a recent result of Bockmayr et
al. [6] who obtained an upper bound of $O(n^3 \lg n)$.
Moreover, we refine this result by showing that the rank of any polytope
in the 0/1-cube that is defined by inequalities with small coefficients is
O(n). The latter observation explains why for most cutting planes de-
rived in polyhedral studies of several popular combinatorial optimization
problems only linear growth has been observed (see, e.g., [13]); the coeffi-
cients of the corresponding inequalities are usually small. Similar results
were only known for monotone polyhedra before.
Finally, we provide a family of polytopes contained in the 0/1-cube the
Chvátal rank of which is at least $(1 + \varepsilon)n$ for some $\varepsilon > 0$; the best known
lower bound was n.

1 Introduction
Chvátal [11] established cutting-plane proofs as a way to certify certain prop-
erties of combinatorial problems, e.g., to testify that there are no k pairwise
non-adjacent nodes in a given graph, that there is no acyclic subdigraph with k
arcs in a given digraph, or that there is no tour of length at most k in a pre-
scribed instance of the traveling salesperson problem. In this paper we discuss
the length of such proofs. Let us first recall the notion of a cutting-plane proof.
A sequence of inequalities
\[
c_1 x \le \delta_1,\ c_2 x \le \delta_2,\ \dots,\ c_m x \le \delta_m \tag{1}
\]
is called a cutting-plane proof of $c x \le \delta$ from a given system of linear inequalities
$A x \le b$, if $c_1, \dots, c_m$ are integral, $c_m = c$, $\delta_m = \delta$, and if $c_i x \le \delta_i'$ is a

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 137–150, 1999.
© Springer-Verlag Berlin Heidelberg 1999
nonnegative linear combination of $A x \le b$, $c_1 x \le \delta_1, \dots, c_{i-1} x \le \delta_{i-1}$ for some
$\delta_i'$ with $\lfloor \delta_i' \rfloor \le \delta_i$. Obviously, if there is a cutting-plane proof of $c x \le \delta$ from
$A x \le b$ then every integer solution to $A x \le b$ must satisfy $c x \le \delta$. Chvátal
[11] showed that the converse holds as well. That is, if all integer points in a
nonempty polytope $\{x \in \mathbb{R}^n : Ax \le b\}$ satisfy an inequality $c x \le \delta$, for some
$c \in \mathbb{Z}^n$, then there is a cutting-plane proof of $c x \le \delta$ from $A x \le b$. Schrijver
extended this result to rational polyhedra [36].
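
A single step of such a proof can be sketched in a few lines. The example below is an illustration under assumptions made here: the system consists of the three edge constraints of the stable set problem on a triangle, and the function name `chvatal_cut` is hypothetical. It forms a nonnegative combination of the inequalities, checks that the resulting left-hand side is integral, and rounds the right-hand side down.

```python
from fractions import Fraction
from math import floor

def chvatal_cut(A, b, multipliers):
    """Combine the rows of A x <= b with nonnegative multipliers, then round down.

    Returns (c, delta) with c integral and delta = floor(delta'), where
    c x <= delta' is the combined inequality.
    """
    if any(u < 0 for u in multipliers):
        raise ValueError("multipliers must be nonnegative")
    n = len(A[0])
    c = [sum(u * row[k] for u, row in zip(multipliers, A)) for k in range(n)]
    delta_prime = sum(u * bi for u, bi in zip(multipliers, b))
    if any(Fraction(ck).denominator != 1 for ck in c):
        raise ValueError("combined left-hand side must be integral")
    return [int(ck) for ck in c], floor(delta_prime)

# Edge constraints x_i + x_j <= 1 for the three edges of a triangle.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
b = [1, 1, 1]

# Multiplier 1/2 on each row gives x_1 + x_2 + x_3 <= 3/2 before rounding.
c, delta = chvatal_cut(A, b, [Fraction(1, 2)] * 3)
```

Here the cut obtained is x1 + x2 + x3 ≤ 1, which is valid for every integer solution of the system but not for the fractional point (1/2, 1/2, 1/2).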
In a way, the sequential order of the inequalities in (1) obscures the (recursive)
structure of the cutting-plane proof; it is better revealed by a directed graph
with vertices 0, 1, 2, . . . , m, in which an arc goes from node i to node j iff the
i-th inequality has a positive coefficient in the linear combination of the j-th
inequality. Here, 0 serves as a representative for any inequality in $A x \le b$. The
number of arcs in a longest simple path terminating at a node i is usually referred
to as the depth of the i-th inequality $c_i x \le \delta_i$ w.r.t. the cutting-plane proof. The
depth of the m-th inequality is called the depth of the proof, whereas m is the
so-called length of the cutting-plane proof. We also say that an inequality $c x \le \delta$
has depth (at most) d relative to a polyhedron $\{x : A x \le b\}$ if it has a cutting-
plane proof from $A x \le b$ of depth less than or equal to d. The following theorem
clarifies the relation between the depth and the length of a cutting-plane proof.
It resembles very much the relation between the height and the number of nodes
of a recursion tree where every interior node has at most degree n. It can be
proved with the help of Farkas’ Lemma.

Theorem 1 (Chvátal, Cook, and Hartmann [13]). Let $A \in \mathbb{Z}^{m \times n}$ and
$b \in \mathbb{Z}^m$, let $A x \le b$ have an integer solution, and let $c x \le \delta$ have depth at most
d relative to $A x \le b$. Then there is a cutting-plane proof of $c x \le \delta$ from $A x \le b$
of length at most $(n^{d+1} - 1)/(n - 1)$.
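
The bound in Theorem 1 is the number of nodes of a complete tree of height d in which every interior node has n children, i.e. $1 + n + \cdots + n^d$. The following Python sketch (the function name `length_bound` is an assumption made here for illustration) evaluates it exactly with integer arithmetic; it requires $n \ge 2$ because the closed form divides by $n - 1$.

```python
def length_bound(n, d):
    """Return (n**(d+1) - 1) / (n - 1), the length bound of Theorem 1.

    The quotient is exact because it equals the geometric sum
    1 + n + n**2 + ... + n**d.
    """
    if n < 2 or d < 0:
        raise ValueError("need n >= 2 and d >= 0")
    return (n ** (d + 1) - 1) // (n - 1)


# The closed form agrees with the explicit geometric sum.
example = length_bound(3, 2)  # 1 + 3 + 9
```

For fixed dimension n the bound grows exponentially in the depth d, which is why polynomial depth bounds only yield exponential length bounds.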

Gomory-Chvátal cutting-planes have gained importance for at least three
reasons. First, the cutting-plane method is a (theoretical) tool to obtain a linear
description of the integer hull of a polyhedron. In fact, as mentioned above,
any valid inequality for the integer hull has a cutting-plane proof from
the defining system of the polyhedron. The Chvátal rank of this polyhedron is
the smallest number d such that all inequalities valid for its integer hull have
depth at most d relative to the defining system. Hence, if we later state lower
and upper bounds for the depth of inequalities they immediately apply to the
Chvátal rank of the corresponding polyhedron as well. Second, despite the early
disappointments with Gomory’s cutting-plane method [21, 22], it is of practical
relevance. On the one hand, it has stimulated to a certain extent the search
for problem-specific cutting planes, which became the basis of a separate branch
of combinatorial optimization, namely polyhedral combinatorics (see, e.g., [33,
23, 35]). On the other hand, Balas et al. [2] successfully incorporated Gomory’s
mixed integer cuts within a Branch-and-Cut framework. Third, since cutting-
plane theory implies that certain implications in integer linear programming
have cutting-plane proofs, it is of particular importance in mathematical logic
and complexity theory. It is a fundamental problem whether there exists a proof
Bounds on the Chvátal Rank of Polytopes in the 0/1-Cube 139

system for propositional logic in which every tautology has a short proof. Here,
the length of the proof is measured by the total number of symbols in it and short
means polynomial in the length of the tautology. This question is equivalent to
whether or not NP equals co-NP. Cook, Coullard, and Turán [14] were the first to
consider cutting-plane proofs as a propositional proof system. In particular, they
pointed out that the cutting-plane proof system is a strengthening of resolution
proofs. Since the work of Haken [25] exponential lower bounds are known for
the latter. Results of Chvátal, Cook, and Hartmann [13], of Bonet, Pitassi, and
Raz [7], of Impagliazzo, Pitassi, and Urquhart [30], and of Pudlák [34] imply
exponential lower bounds on the length of cutting-plane proofs as well. On the
other hand, there is no upper bound on the length of cutting-plane proofs in
terms of the dimension of the corresponding polyhedron as the following well-
known example shows. The Chvátal rank of the polytope defined by
$-t x_1 + x_2 \le 1$
$t x_1 + x_2 \le t + 1$
$x_1 \le 1$
$x_1, x_2 \ge 0$
grows with t. Here, t is an arbitrary positive number. This fact is rather
counterintuitive since the corresponding integer hull is a 0/1-polytope, i.e., all its vertices
have components 0 or 1 only. That is, for any 0/1-polytope there is a simple
certificate of the validity of an inequality $c x \le \delta$: just list all, at most $2^n$, possible
assignments of 0/1-values to the variables. One of our main results helps to
meet the natural expectation. We give a polynomial bound in the dimension for
the Chvátal rank of any polytope contained in the 0/1-cube. Then, Theorem 1
implies the existence of exponentially long cutting-plane proofs, matching the
known exponential lower bounds.
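
The two-dimensional example above can be explored numerically. The sketch below is illustrative only: the helper names are assumptions, and t is restricted to positive integers so that the integer points can be enumerated. It checks that the fractional vertex $(1/2,\ 1 + t/2)$ lies in the polytope while the only integer points are the four vertices of the unit square, so the gap that the cutting planes have to close grows with t.

```python
from fractions import Fraction


def in_polytope(x1, x2, t):
    """Membership test for -t*x1 + x2 <= 1, t*x1 + x2 <= t + 1, x1 <= 1, x >= 0."""
    return (-t * x1 + x2 <= 1 and t * x1 + x2 <= t + 1
            and x1 <= 1 and x1 >= 0 and x2 >= 0)


def apex(t):
    """Intersection point of the two slanted constraints."""
    return Fraction(1, 2), Fraction(t, 2) + 1


def integer_points(t):
    """All integer points of the polytope (t a positive integer)."""
    # x1 is forced into {0, 1}; x2 can never exceed t + 1.
    return {(x1, x2) for x1 in (0, 1) for x2 in range(t + 3)
            if in_polytope(x1, x2, t)}


unit_square = {(0, 0), (0, 1), (1, 0), (1, 1)}
checks = [(integer_points(t) == unit_square, in_polytope(*apex(t), t))
          for t in (1, 2, 5, 10)]
```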
In polyhedral combinatorics, it has been quite common to consider the depth
of a class of inequalities if not as an indicator of quality at least as a measure
of its complexity. Hartmann, Queyranne, and Wang [29] give conditions under
which an inequality has depth at most 1 and use them to establish that sev-
eral classes of inequalities for the traveling salesperson polytopes have depth at
least 2, as was claimed before in [3, 8, 9, 10, 18, 20, 24]. However, it follows
from a recent result in [16] that deciding whether a given inequality $c x \le \delta$
has depth at least 2 can in general not be done in polynomial time, unless
P = NP. Chvátal, Cook, and Hartmann [13] (see also [27]) answered questions
and proved conjectures of Schrijver, of Barahona, Grötschel, and Mahjoub [4], of
Jünger, of Chvátal [12], and of Grötschel and Pulleyblank [24] on the behavior
of the depth of certain inequalities relative to popular relaxations of the stable
set polytope, the bipartite-subgraph polytope, the acyclic-subdigraph polytope,
and the traveling salesperson polytope, resp. They obtained similar results for
the set-covering and the set-partitioning polytope, the knapsack polytope, and
the maximum-cut polytope, and so did Schulz [38] for the transitive packing,
the clique partitioning, and the interval order polytope. The observed increase
of the depth was never faster than a linear function of the dimension; we prove

that this indeed has to be the case as the depth of any inequality with coeffi-
cients bounded by a constant is O(n), relative to a polytope in the 0/1-cube.
Naturally, most polytopes associated with combinatorial optimization problems
are 0/1-polytopes.

Main Results. We present two new upper bounds on the depth of inequalities
relative to polytopes in the 0/1-cube. For notational convenience, let $P$ be any
polytope contained in the 0/1-cube, i.e., $P \subseteq [0, 1]^n$, and let $c x \le \delta$, $c \in \mathbb{Z}^n$, be
an arbitrary inequality valid for the integer hull $P_I$ of $P$.
We prove first that the depth of $c x \le \delta$ relative to $P$ is at most
$2(n^2 + n \lg \|c\|_\infty)$. This yields an $O(n^2 \lg n)$ bound on the Chvátal rank of $P$ since
any 0/1-polytope $P_I$ can be represented by a system of inequalities $A x \le b$
with $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$ such that each absolute value of an entry in $A$ is
bounded by $n^{n/2}$. Note that the latter bound is sharp, i.e., there exist 0/1-
polytopes with facets for which any inducing inequality $a x \le \beta$, $a \in \mathbb{Z}^n$, satisfies
$\|a\|_\infty \in \Omega(n^{n/2})$ [1].
Second, we show that the depth of $c x \le \delta$ relative to $P$ is no more than
$\|c\|_1 + n$. A similar result was only known for monotone polyhedra [13]. In fact,
we present a reduction to the monotone case that is of interest in its own right
because of the smooth interplay of unimodular transformations and rounding
operations. The second bound gives an asymptotic improvement by a factor of n
over the first bound if the components of $c$ are bounded by a constant.
Third, we construct a family of polytopes in the n-dimensional 0/1-cube
whose Chvátal rank is at least $(1 + \varepsilon)n$, for some $\varepsilon > 0$. In other words, if $r(n)$
denotes the maximum Chvátal rank over all polytopes that are contained in
$[0, 1]^n$, then it is one outcome of our study that this function behaves as follows:

$(1 + \varepsilon)n \le r(n) \le 3 n^2 \lg n$.

Finally, we also show that the number of inequalities in any linear description
of a polytope $P \subseteq [0, 1]^n$ with empty integer hull is exponential in n, whenever
there is an inequality of depth n.

Related Work. Via a geometric argument, Bockmayr and Eisenbrand [5] derived
the first polynomial upper bound of $6 n^3 \lg n$ on the Chvátal rank of polytopes in
the n-dimensional 0/1-cube. Subsequently, Schulz [39] and Hartmann [28]
independently obtained both a considerably simpler proof and a slightly better bound
of $n^2 \lg(n^{n/2})$, by using bit-scaling. The reader is referred to the joint journal
version of their papers [6], where the authors actually show that the depth of any
inequality $c x \le \delta$, $c \in \mathbb{Z}^n$, which is valid for $P_I$ is at most $n^2 \lg \|c\|_\infty$, relative to
$P$. For monotone polytopes $P$, Chvátal, Cook, and Hartmann [13] showed that
the depth of any inequality $c x \le \delta$ that is valid for $P_I$ is at most $\|c\|_1$.
Moreover, they also identified polytopes stemming from relaxations of combinatorial
optimization problems that have Chvátal rank at least n.
Finally, our study of $r(n)$ can also be seen as a continuation of the
investigation of combinatorial properties of 0/1-polytopes, like their diameter [32],
their number of facets [19], their number of vertices in a 2-dimensional projection
[31], or their feature of admitting polynomial-time simplex-type algorithms for
optimization [40].

The paper is organized as follows. We start with some preliminaries and
introduce some notation in Section 2. We also show that any linear description of
a polytope in the 0/1-cube that has empty integer hull and Chvátal rank n needs
to contain at least $2^n$ inequalities. In Section 3, we prove the $O(n^2 \lg n)$ upper
bound on the Chvátal rank of polytopes in the 0/1-cube. Then, in Section 4, we
utilize unimodular transformations as a key tool to derive an $O(n)$ bound on the
depth of inequalities with small coefficients, relative to polytopes in the 0/1-cube.
Finally, we present the new lower bound on the Chvátal rank in Section 5.

2 Preliminaries

A polyhedron $P$ is a set of points of the form $P = \{x \in \mathbb{R}^n \mid A x \le b\}$, for some
matrix $A \in \mathbb{R}^{m \times n}$ and some vector $b \in \mathbb{R}^m$. The polyhedron is rational if both $A$
and $b$ can be chosen to be rational. If $P$ is bounded, then $P$ is called a polytope.
The integer hull $P_I$ of a polyhedron $P$ is the convex hull of the integer points
in $P$.
The half space $H = (c x \le \delta)$ is the set $\{x \in \mathbb{R}^n \mid c x \le \delta\}$, for some non-zero
vector $c \in \mathbb{Q}^n$. It is called valid for a subset $S$ of $\mathbb{R}^n$, if $S \subseteq H$. Sometimes we
also say that the inequality $c x \le \delta$ is valid for $S$. If the components of $c$ are
relatively prime integers, i.e., $c \in \mathbb{Z}^n$ and $\gcd(c) = 1$, then $H_I = (c x \le \lfloor \delta \rfloor)$,
where $\lfloor \delta \rfloor$ is the largest integer less than or equal to $\delta$. The elementary
closure of a polyhedron $P$ is the set

$P' = \bigcap_{H \supseteq P} H_I$,

where the intersection ranges over all rational half spaces containing $P$. We refer
to an application of the $'$ operation as one iteration of the Gomory-Chvátal
procedure. If we set $P^{(0)} = P$ and $P^{(i+1)} = (P^{(i)})'$, for $i \ge 0$, then the Chvátal rank
of $P$ is the smallest number $t$ such that $P^{(t)} = P_I$. The depth of an inequality
$c x \le \delta$ with respect to $P$ is the smallest $k$ such that $c x \le \delta$ is valid for $P^{(k)}$.
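
The passage from a half space $H$ to $H_I$ can be sketched in Python for an integral, not necessarily primitive, coefficient vector. This is a minimal sketch under assumptions: the function name `integer_halfspace` is hypothetical, and the input vector is assumed to be integral already (a rational vector would first have to be scaled). The vector is divided by the gcd of its entries so that the components become relatively prime, and the scaled right-hand side is rounded down.

```python
from fractions import Fraction
from functools import reduce
from math import floor, gcd


def integer_halfspace(c, delta):
    """Return (c', floor(delta')) describing H_I for H = (c x <= delta).

    c     -- nonzero list of integers
    delta -- integer or Fraction
    """
    g = reduce(gcd, (abs(ci) for ci in c))
    if g == 0:
        raise ValueError("c must be nonzero")
    primitive = [ci // g for ci in c]      # exact, since g divides every entry
    return primitive, floor(Fraction(delta) / g)


# 2*x1 + 4*x2 <= 7 is the same half space as x1 + 2*x2 <= 7/2.
rounded = integer_halfspace([2, 4], 7)
```

Dividing by the gcd first matters: rounding $2 x_1 + 4 x_2 \le 7$ directly would leave the right-hand side at 7, whereas the primitive form yields the stronger $x_1 + 2 x_2 \le 3$.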
Let $P \subseteq \mathbb{R}^n$ be a polyhedron. A polyhedron $Q$ with $Q \supseteq P$ is called a
weakening of $P$, if $Q_I = P_I$. If $c x \le \delta$ is valid for $P_I$, then the depth of this
inequality with respect to $Q$ is an upper bound on the depth of this inequality
with respect to $P$. It is easy to see that each polytope $P \subseteq [0, 1]^n$ has a rational
weakening in the 0/1-cube.
The following important lemma can be found in [37, p. 340]. (For a very
nice treatment, see also [15, Lemma 6.33].) It allows one to use induction on the
dimension of the polyhedra considered and provides the key for the termination
of the Gomory-Chvátal procedure, which was shown by Schrijver for rational
polyhedra in [36].

Lemma 1. Let $F$ be a face of a rational polyhedron $P$. Then $F' = P' \cap F$.

Lemma 1 yields the following upper bound on the Chvátal rank of rational
polytopes in the 0/1-cube with empty integer hull (see [6] for details).
Lemma 2. Let $P \subseteq [0, 1]^n$ be a d-dimensional rational polytope in the 0/1-cube
with $P_I = \emptyset$. If $d = 0$, then $P' = \emptyset$; if $d > 0$, then $P^{(d)} = \emptyset$.
Thus, if $c x \le \delta$ is valid for a rational polytope $P \subseteq [0, 1]^n$ and $c x \le \delta - 1$ is
valid for $P_I$, then $c x \le \delta - 1$ is valid for $P^{(n)}$.
With these methods at hand one can prove the following result due to
Hartmann [27].
Lemma 3. If $P \subseteq [0, 1]^n$ is a polytope and $\sum_{i \in I} x_i - \sum_{j \in J} x_j \le r$ is valid for
$P_I$ for some subsets $I$ and $J$ of $\{1, \ldots, n\}$, then this inequality has depth at most
$n^2$ with respect to $P$.
A side-product of our result in Section 4.3 is a reduction of this bound to $2n$.
Chvátal, Cook, and Hartmann [13, p. 481] provided the following family of
rational polytopes in the 0/1-cube with empty integer hull and Chvátal rank n:

$P_n = \{ x \in \mathbb{R}^n \mid \sum_{j \in J} x_j + \sum_{j \notin J} (1 - x_j) \ge \tfrac{1}{2}, \text{ for all } J \subseteq \{1, \ldots, n\} \}$. (2)

The polytopes in this example have exponentially many inequalities, and this
indeed has to be the case.
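
For small n the family (2) can be checked by brute force. The Python sketch below is illustrative only (the function name `in_Pn` and the choice n = 3 are assumptions); it enumerates all $2^n$ index sets $J$, encoded as 0/1 masks, and tests the defining inequalities with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def in_Pn(x):
    """Test whether x satisfies all 2**n inequalities defining P_n in (2)."""
    n = len(x)
    for mask in product((0, 1), repeat=n):      # mask[j] == 1 means j in J
        lhs = sum(x[j] if mask[j] else 1 - x[j] for j in range(n))
        if lhs < Fraction(1, 2):
            return False
    return True


n = 3
half = tuple(Fraction(1, 2) for _ in range(n))
zero_one_points_inside = [p for p in product((0, 1), repeat=n) if in_Pn(p)]
edge_midpoint = (Fraction(1, 2), 0, 0)
```

Every 0/1-point is cut off by exactly the inequality whose set $J$ consists of its zero coordinates, the centre $\tfrac12 e$ survives, and so does the midpoint of each edge of the cube, which matches the role one-dimensional faces play in the proof of Proposition 1.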
Proposition 1. Let $P \subseteq [0, 1]^n$ be a polytope in the 0/1-cube with $P_I = \emptyset$ and
rank(P) = n. Any inequality description of $P$ has at least $2^n$ inequalities.
Proof. For a polytope $P \subseteq \mathbb{R}^n$ and for some $i \in \{1, \ldots, n\}$ and $\ell \in \{0, 1\}$ let
$P_i^\ell \subseteq \mathbb{R}^{n-1}$ be the polytope defined by
$P_i^\ell = \{x \in [0, 1]^{n-1} \mid (x_1, \ldots, x_{i-1}, \ell, x_{i+1}, \ldots, x_n)^T \in P\}$.
Notice that, if $P$ is contained in a facet $(x_i = \ell)$ of $[0, 1]^n$ for some $\ell \in \{0, 1\}$
and some $i \in \{1, \ldots, n\}$, then the Chvátal rank of $P$ is the Chvátal rank of $P_i^\ell$.
We will prove now that any one-dimensional face $F_1$ of the cube satisfies
$F_1 \cap P \ne \emptyset$. We proceed by induction on n.
If n = 1, this is definitely true since $P$ is not empty and since $F_1$ is the cube
itself. For n > 1, observe that any one-dimensional face $F_1$ of the cube lies in a
facet $(x_i = \ell)$ of the cube, for some $\ell \in \{0, 1\}$ and for some $i \in \{1, \ldots, n\}$. Since
$P$ has Chvátal rank n it follows that $\tilde{P} = (x_i = \ell) \cap P$ has Chvátal rank n − 1.
If the Chvátal rank of $\tilde{P}$ were less than that, $P$ would vanish after n − 1 steps.
It follows by induction that $(F_1)_i^\ell \cap \tilde{P}_i^\ell \ne \emptyset$, thus $F_1 \cap P \ne \emptyset$.
Now, each 0/1-point has to be cut off from $P$ by some inequality, as $P_I = \emptyset$.
If an inequality $c x \le \delta$ cuts off two different 0/1-points simultaneously, then it
must also cut off a 1-dimensional face of $[0, 1]^n$. Because of our previous
observation this is not possible, and hence there is at least one inequality for each
0/1-point which cuts off only this point. Since there are $2^n$ different 0/1-points
in the cube, the claim follows. □

We close this section by introducing some further notation. The $\ell_\infty$-norm
$\|c\|_\infty$ of a vector $c \in \mathbb{R}^n$ is the largest absolute value of its entries,
$\|c\|_\infty = \max\{|c_i| \mid i = 1, \ldots, n\}$. The $\ell_1$-norm $\|c\|_1$ of $c$ is the sum
$\|c\|_1 = \sum_{i=1}^{n} |c_i|$. We define the function $\lg : \mathbb{N} \to \mathbb{N}$ as

$\lg n = \begin{cases} 1 & \text{if } n = 0, \\ 1 + \lfloor \log_2(n) \rfloor & \text{if } n > 0, \end{cases}$

where $\lfloor y \rfloor$ denotes the largest integer smaller than or equal to $y$. Note that $\lg n$
is the number of bits in the binary representation of n. For a vector $x \in \mathbb{R}^n$, $\lfloor x \rfloor$
denotes the vector obtained by component-wise application of $\lfloor \cdot \rfloor$.
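
Since $\lg n$ is the bit length of n, it can be computed exactly without floating-point logarithms. The following Python sketch (the function names are chosen here for illustration) relies on the built-in `int.bit_length`, which returns $1 + \lfloor \log_2 n \rfloor$ for every positive integer n, and also shows the two norms.

```python
def lg(n):
    """Number of bits in the binary representation of n, with lg(0) = 1."""
    if n < 0:
        raise ValueError("lg is only defined on the natural numbers")
    return 1 if n == 0 else n.bit_length()


def norm_inf(c):
    """Largest absolute value of an entry of c."""
    return max(abs(ci) for ci in c)


def norm_1(c):
    """Sum of the absolute values of the entries of c."""
    return sum(abs(ci) for ci in c)


values = [lg(n) for n in (0, 1, 2, 7, 8)]
```

Halving a vector, as in the bit-scaling argument of Section 3, reduces $\lg \|c\|_\infty$ by exactly one whenever $\|c\|_\infty \ge 2$.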

3 A New Upper Bound on the Chvátal Rank

We call a vector $c$ saturated with respect to a polytope $P$, if
$\max\{c x \mid x \in P\} = \max\{c x \mid x \in P_I\}$. If $A x \le b$ is an inequality description of $P_I$, then
$P = P_I$ if and only if each row vector of $A$ is saturated w.r.t. $P$. In [6], it is
shown that an integral vector $c \in \mathbb{Z}^n$ is saturated after at most $n^2 \lg \|c\|_\infty$ steps
of the Gomory-Chvátal procedure. Since each 0/1-polytope has a representation
$A x \le b$ with $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$ such that each absolute value of an entry in
$A$ is bounded by $n^{n/2}$ (see, e.g., [33]), the known bound of $O(n^3 \lg n)$ follows.
One drawback in this proof is that faces of $P$ which do not contain 0/1-points
are taken to have worst case behavior n. The following observation is crucial to
derive a better bound.
Lemma 4. Let $c x \le \alpha$ be valid for $P_I$ and $c x \le \gamma$ be valid for $P$, where $\alpha \le \gamma$,
$\alpha, \gamma \in \mathbb{Z}$ and $c \in \mathbb{Z}^n$. If, for each $\beta \in \mathbb{R}$, $\beta > \alpha$, the polytope $F_\beta = P \cap (c x = \beta)$
does not intersect with two opposite facets of the 0/1-cube, then the depth of
$c x \le \alpha$ is at most $2(\gamma - \alpha)$.

Proof. Notice that $F_\beta' = \emptyset$ for each $\beta > \alpha$. The proof is by induction on $\gamma - \alpha$.
If $\alpha = \gamma$, there is nothing to prove. So let $\gamma - \alpha > 0$. Since $F_\gamma' = \emptyset$, Lemma 1
implies that $c x \le \gamma - \varepsilon$ is valid for $P'$ for some $\varepsilon > 0$ and thus the inequality
$c x \le \gamma - 1$ is valid for $P^{(2)}$. □

Proposition 2. Let $P$ be a rational polytope in the n-dimensional 0/1-cube.
Any integral vector $c \in \mathbb{Z}^n$ is saturated w.r.t. $P^{(t)}$, for any $t \ge 2(n^2 + n \lg \|c\|_\infty)$.

Proof. We can assume that $c \ge 0$ holds and that $P_I \ne \emptyset$. (It is shown in [6] that
polytopes with empty integer hull have Chvátal rank at most n.) The proof is
by induction on n and $\lg \|c\|_\infty$. The claim holds for n = 1, 2 since the Chvátal
rank of a polytope in the 1- or 2-dimensional 0/1-cube is at most 4.
So let n > 2. If $\lg \|c\|_\infty = 1$, then the claim follows, e.g., from Theorem 3
below. So let $\lg \|c\|_\infty > 1$. Write $c = 2 c_1 + c_2$, where $c_1 = \lfloor c/2 \rfloor$ and $c_2 \in \{0, 1\}^n$.
By induction, it takes at most $2(n^2 + n \lg \|c_1\|_\infty) = 2(n^2 + n \lg \|c\|_\infty) - 2n$

iterations of the Gomory-Chvátal procedure until $c_1$ is saturated. Let
$k = 2(n^2 + n \lg \|c\|_\infty) - 2n$.
Let $\alpha = \max\{c x \mid x \in P_I\}$ and $\gamma = \max\{c x \mid x \in P^{(k)}\}$. The integrality gap
$\gamma - \alpha$ is at most n. This can be seen as follows. Choose $\hat{x} \in P^{(k)}$ with $c \hat{x} = \gamma$
and let $x_I \in P_I$ satisfy $c_1 x_I = \max\{c_1 x \mid x \in P^{(k)}\}$. One can choose $x_I$ out of
$P_I$ since $c_1$ is saturated w.r.t. $P^{(k)}$. It follows that
$\gamma - \alpha \le c(\hat{x} - x_I) = 2 c_1 (\hat{x} - x_I) + c_2 (\hat{x} - x_I) \le n$.
Consider now an arbitrary fixing of an arbitrary variable $x_i$ to a specific value
$\ell$, $\ell \in \{0, 1\}$. The result is the polytope
$P_i^\ell = \{x \in [0, 1]^{n-1} \mid (x_1, \ldots, x_{i-1}, \ell, x_{i+1}, \ldots, x_n)^T \in P\}$
in the (n − 1)-dimensional 0/1-cube for which, by the induction hypothesis, the
vector $\tilde{c}_i = (c_1, \ldots, c_{i-1}, c_{i+1}, \ldots, c_n)$ is saturated after at most
$2((n - 1)^2 + (n - 1) \lg \|\tilde{c}_i\|_\infty) \le 2(n^2 + n \lg \|c\|_\infty) - 2n$
iterations.
It follows that
$\alpha - \ell c_i \ge \max\{\tilde{c}_i x \mid x \in (P_i^\ell)^{(k)}\} = \max\{\tilde{c}_i x \mid x \in (P_i^\ell)_I\}$.
If $\beta > \alpha$, then $(c x = \beta) \cap P^{(k)}$ cannot intersect with a facet of the cube, since a
point in $(c x = \beta) \cap P^{(k)} \cap (x_i = \ell)$, $\ell \in \{0, 1\}$, has to satisfy $c x \le \alpha$.
With Lemma 4, after 2n more iterations of the Gomory-Chvátal procedure,
$c$ is saturated, which altogether happens after $2(n^2 + n \lg \|c\|_\infty)$ iterations. □
We conclude this section with a new upper bound on the Chvátal rank.
Theorem 2. The Chvátal rank of a polytope in the n-dimensional 0/1-cube is
$O(n^2 \log n)$.
Proof. Each polytope $Q$ in the 0/1-cube has a rational weakening $P$. The integral
0/1-polytope $P_I$ can be described by a system of integral inequalities
$P_I = \{x \in \mathbb{R}^n \mid A x \le b\}$ with $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$ such that each absolute value of an entry
in $A$ is bounded by $n^{n/2}$. We estimate the number of Gomory-Chvátal steps until
all row-vectors of $A$ are saturated. Proposition 2 implies that those row-vectors
are saturated after at most $2(n^2 + n \lg n^{n/2}) \le 3 n^2 \lg n$ steps. □

4 A Different Upper Bound on the Depth


In this section we show that any inequality $c x \le \delta$, which is valid for the integer
hull of a polytope $P$ in the n-dimensional 0/1-cube, has depth at most $n + \|c\|_1$
w.r.t. $P$.
We start by recalling some useful properties of monotone polyhedra, then
prove that the Gomory-Chvátal operation is compatible with unimodular
transformations, and finally reduce the general case to the depth of inequalities
over monotone polytopes via a special unimodular transformation.

4.1 Monotone Polyhedra

A nonempty polyhedron $P \subseteq \mathbb{R}^n_{\ge 0}$ is called monotone if $x \in P$ and $0 \le y \le x$
imply $y \in P$. Hammer, Johnson, and Peled [26] observed that a polyhedron $P$
is monotone if and only if $P$ can be described by a system $x \ge 0$, $A x \le b$ with
$A, b \ge 0$.
The next statements are proved in [27] and [13, p. 494]. We include a proof
of Lemma 6 for the sake of completeness.

Lemma 5. If $P$ is a monotone polyhedron, then $P'$ is monotone as well.

Lemma 6. Let $P$ be a monotone polytope in the 0/1-cube and let $w x \le \delta$,
$w \in \mathbb{Z}^n$, be valid for $P_I$. Then $w x \le \delta$ has depth at most $\|w\|_1 - \delta$.

Proof. The proof is by induction on $\|w\|_1$. If $\|w\|_1 = 0$, the claim follows trivially.
W.l.o.g., we can assume that $w \ge 0$ holds. Let $\gamma = \max\{w x \mid x \in P\}$ and let
$J = \{j \mid w_j > 0\}$. If $\max\{\sum_{j \in J} x_j \mid x \in P\} = |J|$, then, since $P$ is monotone, $\hat{x}$
with

$\hat{x}_i = \begin{cases} 1 & \text{if } i \in J, \\ 0 & \text{otherwise} \end{cases}$

is in $P$. Also $w \hat{x} = \gamma$ must hold. So $\gamma = \delta$ and the claim follows trivially. If
$\max\{\sum_{j \in J} x_j \mid x \in P\} < |J|$, then $\sum_{j \in J} x_j \le |J| - 1$ has depth at most 1. If
$\|w\|_1 = 1$ this also implies the claim, so assume $\|w\|_1 \ge 2$. By induction the
valid inequalities $w x - x_j \le \delta$, $j \in J$, have depth at most $\|w\|_1 - \delta - 1$. Adding
up the inequalities $w x - x_j \le \delta$, $j \in J$, and $\sum_{j \in J} x_j \le |J| - 1$ yields

$w x \le \delta + (|J| - 1)/|J|$.

Rounding down yields $w x \le \delta$ and the claim follows. □
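
The final summation in the proof of Lemma 6 can be written out step by step. The following derivation is a sketch that assumes, as the rounding step suggests, that $\delta$ is an integer; it adds the $|J|$ inequalities $w x - x_j \le \delta$ to the single inequality $\sum_{j \in J} x_j \le |J| - 1$ and divides by $|J|$.

```latex
\begin{align*}
\sum_{j \in J} \bigl( w x - x_j \bigr) + \sum_{j \in J} x_j
    &\le |J|\,\delta + \bigl( |J| - 1 \bigr), \\
|J|\, w x &\le |J|\,\delta + |J| - 1, \\
w x &\le \delta + \frac{|J| - 1}{|J|} \;<\; \delta + 1 .
\end{align*}
```

Because $w$ is integral and the right-hand side lies strictly between $\delta$ and $\delta + 1$, one further rounding gives $w x \le \lfloor \delta + (|J| - 1)/|J| \rfloor = \delta$.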

4.2 Unimodular Transformations

Unimodular transformations and in particular switching operations will play a
crucial role to relate the Chvátal rank of arbitrary polytopes in the 0/1-cube to
the Chvátal rank of monotone polytopes. In this section, we show that
unimodular transformations and the Gomory-Chvátal operation commute.
A unimodular transformation is a mapping

$u : \mathbb{R}^n \to \mathbb{R}^n$, $x \mapsto U x + v$,

where $U \in \mathbb{Z}^{n \times n}$ is a unimodular matrix, i.e., $\det(U) = \pm 1$, and $v \in \mathbb{Z}^n$.
Note that $u$ is a bijection. Its inverse is the unimodular transformation
$u^{-1}(x) = U^{-1} x - U^{-1} v$. Since $U^{-1} \in \mathbb{Z}^{n \times n}$, $u$ is also a bijection of $\mathbb{Z}^n$.

Consider the rational halfspace $(c x \le \delta)$, $c \in \mathbb{Z}^n$, $\delta \in \mathbb{Q}$. The set $u(c x \le \delta)$
is the rational halfspace

$\{x \in \mathbb{R}^n \mid c\, u^{-1}(x) \le \delta\} = \{x \in \mathbb{R}^n \mid c U^{-1} x \le \delta + c U^{-1} v\} = (c U^{-1} x \le \delta + c U^{-1} v)$.

Notice that the vector $c U^{-1}$ is also integral. Let $S$ be some subset of $\mathbb{R}^n$. It
follows that $(c x \le \delta) \supseteq S$ if and only if $(c U^{-1} x \le \delta + c U^{-1} v) \supseteq u(S)$.
Consider now the first elementary closure $P'$ of some polyhedron $P$,

$P' = \bigcap_{(c x \le \delta) \supseteq P,\ c \in \mathbb{Z}^n} (c x \le \lfloor \delta \rfloor)$.

It follows that

$u(P') = \bigcap_{(c x \le \delta) \supseteq P,\ c \in \mathbb{Z}^n} (c U^{-1} x \le \lfloor \delta \rfloor + c U^{-1} v)$.

From this one can derive the next lemma.

Lemma 7. Let $P$ be a polyhedron and $u$ be a unimodular transformation. Then

$u(P') = (u(P))'$.

Corollary 1. Let $P \subseteq \mathbb{R}^n$ be a polyhedron and let $c x \le \delta$ be a valid inequality
for $P_I$. Let $u$ be a unimodular transformation. The inequality $c x \le \delta$ is valid for
$P^{(k)}$ if and only if $u(c x \le \delta)$ is valid for $(u(P))^{(k)}$.

The i-th switching operation is the unimodular transformation

$\pi_i : \mathbb{R}^n \to \mathbb{R}^n$, $(x_1, \ldots, x_n) \mapsto (x_1, \ldots, x_{i-1}, 1 - x_i, x_{i+1}, \ldots, x_n)$.

It has a representation

$\pi_i : \mathbb{R}^n \to \mathbb{R}^n$, $x \mapsto U x + e_i$,

where $U$ coincides with the identity matrix $I_n$ except for $U_{(i,i)}$ which is $-1$. Note
that the switching operation is a bijection of $[0, 1]^n$. For the set $(c x \le \delta)$ one
has $\pi_i(c x \le \delta) = (\tilde{c} x \le \delta - c_i)$. Here $\tilde{c}$ coincides with $c$ except for a change of
sign in the i-th component.
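
The effect of switching on points and on inequalities can be sketched in Python. The helper names below are assumptions made for illustration, and indices are 0-based, unlike the 1-based indices of the text. Switching coordinate i maps $(c x \le \delta)$ to $(\tilde{c} x \le \delta - c_i)$, and repeating this for every negative coefficient yields a nonnegative coefficient vector.

```python
from itertools import product


def switch_point(x, i):
    """Apply the i-th switching operation to the point x."""
    y = list(x)
    y[i] = 1 - y[i]
    return tuple(y)


def switch_inequality(c, delta, i):
    """Image of the half space (c x <= delta) under the i-th switching."""
    c_new = list(c)
    c_new[i] = -c_new[i]            # sign change in component i
    return c_new, delta - c[i]      # right-hand side becomes delta - c_i


def make_nonnegative(c, delta):
    """Switch every coordinate with a negative coefficient."""
    c, switched = list(c), []
    for i in range(len(c)):
        if c[i] < 0:
            c, delta = switch_inequality(c, delta, i)
            switched.append(i)
    return c, delta, switched


c0, d0 = [2, -3, 1], 1
c1, d1, switched = make_nonnegative(c0, d0)
```

For the hypothetical inequality $2 x_1 - 3 x_2 + x_3 \le 1$ the result is $2 x_1 + 3 x_2 + x_3 \le 4$, and a 0/1-point satisfies the original inequality exactly when its switched image satisfies the new one.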

4.3 The Reduction to Monotone Weakenings

If one wants to examine the depth of a particular inequality with respect to a
polytope $P \subseteq [0, 1]^n$, one can apply a series of switching operations until all
its coefficients become nonnegative. An inequality with nonnegative coefficients
defines a (fractional) 0/1-knapsack polytope $K$. The depth of this inequality
with respect to the convex hull of $P$ and $K$ is then an upper bound on the depth
with respect to $P$. We will show that $\operatorname{conv}(P, K)^{(n)}$ has a monotone weakening
in the 0/1-cube.

Lemma 8. Let $P \subseteq [0, 1]^n$ be a polytope in the 0/1-cube, with $P_I = K_I$, where
$K = \{x \mid c x \le \delta,\ 0 \le x \le 1\}$ and $c \ge 0$. Then, $P^{(n)}$ has a rational, monotone
weakening $Q$ in the 0/1-cube.

Proof. We can assume that $P$ is rational. Let $\hat{x}$ be a 0/1-point which is not
contained in $P$, i.e., $c \hat{x} > \delta$. Let $I = \{i \mid \hat{x}_i = 1\}$. The inequality $\sum_{i \in I} x_i \le |I|$ is
valid for the cube and thus for $P$. Since $c \ge 0$, the corresponding face
$F = \{x \mid \sum_{i \in I} x_i = |I|,\ x \in P\}$ of $P$ does not contain any 0/1-points. Lemma 2 implies
that $\sum_{i \in I} x_i \le |I| - 1$ is valid for $P^{(n)}$.
Thus, for each 0/1-point $\hat{x}$ which is not in $P$, there exists a nonnegative
rational inequality $a_{\hat{x}} x \le \gamma_{\hat{x}}$ which is valid for $P^{(n)}$ and which cuts $\hat{x}$ off. Thus

$0 \le x_i \le 1$, $i \in \{1, \ldots, n\}$,
$a_{\hat{x}} x \le \gamma_{\hat{x}}$, $\hat{x} \in \{0, 1\}^n$, $\hat{x} \notin P$

is the desired weakening. □

Theorem 3. Let $P \subseteq [0, 1]^n$, $P \ne \emptyset$, be a polytope in the 0/1-cube
and let $c x \le \delta$ be a valid inequality for $P_I$ with $c \in \mathbb{Z}^n$. Then $c x \le \delta$ has depth
at most $n + \|c\|_1$ with respect to $P$.

Proof. One can assume that $c$ is nonnegative, since one can apply a series of
switching operations. Notice that this can change the right-hand side $\delta$, but in
the end $\delta$ has to be nonnegative since $P \ne \emptyset$. Let $K = \{x \in [0, 1]^n \mid c x \le \delta\}$
and consider the polytope $Q = \operatorname{conv}(K, P)$. The inequality $c x \le \delta$ is valid for
$Q_I$ and the depth of $c x \le \delta$ with respect to $P$ is at most the depth of $c x \le \delta$
with respect to $Q$. By Lemma 8, $Q^{(n)}$ has a monotone weakening $S$. The depth
of $c x \le \delta$ with respect to $Q^{(n)}$ is at most the depth of $c x \le \delta$ with respect to
$S$. But it follows from Lemma 6 that the depth of $c x \le \delta$ with respect to $S$ is
at most $\|c\|_1 - \delta \le \|c\|_1$. □

5 A New Lower Bound on the Chvátal Rank

To the best of the authors' knowledge, no example of a polytope $P$ in the
n-dimensional 0/1-cube with rank(P) > n has been provided in the literature so
far. We now show that $r(n) \ge (1 + \varepsilon)n$, for infinitely many n, where $\varepsilon > 0$.

The construction relies on the lower bound result for the fractional stable-set
polytope due to Chvátal, Cook, and Hartmann [13].
Let $G = (V, E)$ be a graph on n vertices, $\mathcal{C}$ be the family of all cliques of
$G$, and let $Q \subseteq \mathbb{R}^n$ be the fractional stable set polytope of $G$ defined by the
inequalities

$x(C) \le 1$ for all $C \in \mathcal{C}$,
$x_v \ge 0$ for all $v \in V$. (3)

Let $e$ be the vector of all ones. The following lemma is proved in [13, Proof
of Lemma 3.1].
Lemma 9. Let $k < s$ be positive integers and let $G$ be a graph with n vertices
such that every subgraph of $G$ with $s$ vertices is k-colorable. If $P$ is a polyhedron
that contains $Q_I$ and the point $u = \tfrac{1}{k} e$, then $P^{(j)}$ contains the point
$x^j = \left( \tfrac{s}{s+k} \right)^j u$.
Let $\alpha(G)$ be the size of the largest independent subset of the nodes of $G$. It
follows that $e x \le \alpha(G)$ is valid for $Q_I$. One has

$e x^j = \frac{n}{k} \left( \frac{s}{s+k} \right)^j \ge \frac{n}{k}\, e^{-jk/s}$,

and thus $x^j$ does not satisfy the inequality $e x \le \alpha(G)$ for all $j < (s/k) \ln \frac{n}{k \alpha(G)}$.
Erdös proved in [17] that for every positive $t$ there exist a positive integer
$c$, a positive number $\delta$ and arbitrarily large graphs $G$ with n vertices, $cn$ edges,
$\alpha(G) < tn$ and every subgraph of $G$ with at most $\delta n$ vertices is 3-colorable.
One wants that $\ln \frac{n}{k \alpha(G)} > 1$ and that $s/k$ grows linearly, so by choosing some
$t < 1/(3e)$, $k = 3$ and $s = \lfloor \delta n \rfloor$ one has that $x^j$ does not satisfy the inequality
$e x \le \alpha(G)$ for all $j < s/k$.
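
The estimate behind this argument can be checked numerically. The Python sketch below is illustrative only: the function names and the sample values n = 1000, k = 3, s = 100, α = 100 are assumptions and do not come from a concrete graph. It evaluates $\frac{n}{k} (\frac{s}{s+k})^j$, the exponential lower bound $\frac{n}{k} e^{-jk/s}$ (which follows from $1 + k/s \le e^{k/s}$), and the number $(s/k) \ln \frac{n}{k\alpha}$ of rounds below which the point still violates $e x \le \alpha$.

```python
import math


def e_dot_xj(n, k, s, j):
    """Sum of the coordinates of (s/(s+k))**j * (1/k) * e in dimension n."""
    return (n / k) * (s / (s + k)) ** j


def exp_lower_bound(n, k, s, j):
    """Lower bound (n/k) * exp(-j*k/s) on e_dot_xj."""
    return (n / k) * math.exp(-j * k / s)


def violation_horizon(n, k, s, alpha):
    """(s/k) * ln(n / (k*alpha)); the bound exceeds alpha for smaller j."""
    return (s / k) * math.log(n / (k * alpha))


# Hypothetical sample parameters, chosen only to exercise the formulas.
n, k, s, alpha = 1000, 3, 100, 100
horizon = violation_horizon(n, k, s, alpha)
still_violated = all(e_dot_xj(n, k, s, j) > alpha
                     for j in range(math.ceil(horizon)))
```

With $k$ fixed and $s$ proportional to n, the horizon grows linearly in n, which is the source of the additional $\Omega(n)$ rounds in the lower bound.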
We now give the construction. Let $P$ be the convex hull of $P_n$, as defined
in (2), and $Q$. $P_n \subseteq P$ contributes to the fact that
$\tfrac{1}{2} e$ is in $P^{(n-1)}$ [13, Lemma 7.2]. Thus $x^0 = \tfrac{1}{3} e$ is in $P^{(n-1)}$. Since the
integer hull of $P$ is $Q_I$, it follows from the above discussion that the depth of
$e x \le \alpha(G)$ with respect to $P^{(n-1)}$ is $\Omega(n)$. Thus the depth of $e x \le \alpha(G)$ is at
least $(n - 1) + \Omega(n) \ge (1 + \varepsilon)n$ for infinitely many n, where $\varepsilon > 0$.

Acknowledgments
The authors are grateful to Alexander Bockmayr, Volker Priebe, and Günter
Ziegler for helpful comments on an earlier version of this paper.

References
[1] N. Alon and V. H. Vu. Anti-Hadamard matrices, coin weighing, threshold gates,
and indecomposable hypergraphs. Journal of Combinatorial Theory, 79A:133–
160, 1997.

[2] E. Balas, S. Ceria, G. Cornuéjols, and N. R. Natraj. Gomory cuts revisited.
Operations Research Letters, 19:1–9, 1996.
[3] E. Balas and M. J. Saltzman. Facets of the three-index assignment polytope.
Discrete Applied Mathematics, 23:201–229, 1989.
[4] F. Barahona, M. Grötschel, and A. R. Mahjoub. Facets of the bipartite subgraph
polytope. Mathematics of Operations Research, 10:340–358, 1985.
[5] A. Bockmayr and F. Eisenbrand. On the Chvátal rank of polytopes in the
0/1 cube. Research Report MPI-I-97-2-009, Max-Planck-Institut für Informatik,
September 1997.
[6] A. Bockmayr, F. Eisenbrand, M. E. Hartmann, and A. S. Schulz. On the Chvátal
rank of polytopes in the 0/1 cube. Technical Report 616, Technical University of
Berlin, Department of Mathematics, December 1998.
[7] M. Bonet, T. Pitassi, and R. Raz. Lower bounds for cutting planes proofs with
small coefficients. Journal of Symbolic Logic, 62:708–728, 1997.
[8] S. C. Boyd and W. H. Cunningham. Small travelling salesman polytopes. Math-
ematics of Operations Research, 16:259–271, 1991.
[9] S. C. Boyd, W. H. Cunningham, M. Queyranne, and Y. Wang. Ladders for
travelling salesmen. SIAM Journal on Optimization, 5:408–420, 1995.
[10] S. C. Boyd and W. R. Pulleyblank. Optimizing over the subtour polytope of the
travelling salesman problem. Mathematical Programming, 49:163–187, 1991.
[11] V. Chvátal. Edmonds polytopes and a hierarchy of combinatorial problems. Dis-
crete Mathematics, 4:305–337, 1973.
[12] V. Chvátal. Flip-flops in hypohamiltonian graphs. Canadian Mathematical Bul-
letin, 16:33–41, 1973.
[13] V. Chvátal, W. Cook, and M. E. Hartmann. On cutting-plane proofs in com-
binatorial optimization. Linear Algebra and its Applications, 114/115:455–499,
1989.
[14] W. Cook, C. R. Coullard, and Gy. Turán. On the complexity of cutting plane
proofs. Discrete Applied Mathematics, 18:25–38, 1987.
[15] W. Cook, W. H. Cunningham, W. R. Pulleyblank, and A. Schrijver. Combinato-
rial Optimization. John Wiley, 1998.
[16] F. Eisenbrand. A note on the membership problem for the first elementary closure
of a polyhedron. Technical Report 605, Technical University of Berlin, Department
of Mathematics, November 1998. To appear in Combinatorica.
[17] P. Erdös. On circuits and subgraphs of chromatic graphs. Mathematika, 9:170–
175, 1962.
[18] M. Fischetti. Three facet lifting theorems for the asymmetric traveling salesman
polytope. In E. Balas, G. Cornuéjols, and R. Kannan, editors, Integer
Programming and Combinatorial Optimization, pages 260–273. Proceedings of the 2nd
IPCO Conference, 1992.
[19] T. Fleiner, V. Kaibel, and G. Rote. Upper bounds on the maximal number of
facets of 0/1-polytopes. Technical Report 98-327, University of Cologne, Depart-
ment of Computer Science, 1998. To appear in European Journal of Combina-
torics.
[20] R. Giles and L. E. Trotter. On stable set polyhedra for $K_{1,3}$-free graphs. Journal
of Combinatorial Theory, 31:313–326, 1981.
[21] R. E. Gomory. Outline of an algorithm for integer solutions to linear programs.
Bulletin of the American Mathematical Society, 64:275–278, 1958.
[22] R. E. Gomory. An algorithm for integer solutions to linear programs. In R. L.
Graves and P. Wolfe, editors, Recent Advances in Mathematical Programming,
pages 269–302. McGraw-Hill, 1963.

[23] M. Grötschel and M. W. Padberg. Polyhedral theory. In E. L. Lawler, J. K.
Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, editors, The Traveling Salesman
Problem: A Guided Tour of Combinatorial Optimization, pages 251–305. John
Wiley, 1985.
[24] M. Grötschel and W. R. Pulleyblank. Clique tree inequalities and the symmetric
travelling salesman problem. Mathematics of Operations Research, 11:537–569,
1986.
[25] A. Haken. The intractability of resolution. Theoretical Computer Science, 39:297–
308, 1985.
[26] P. L. Hammer, E. Johnson, and U. N. Peled. Facets of regular 0-1 polytopes.
Mathematical Programming, 8:179–206, 1975.
[27] M. E. Hartmann. Cutting planes and the complexity of the integer hull. Technical
Report 819, School of Operations Research and Industrial Engineering, Cornell
University, September 1988.
[28] M. E. Hartmann. Personal communication, March 1998.
[29] M. E. Hartmann, M. Queyranne, and Y. Wang. On the Chvátal rank of certain
inequalities. This volume, 1999.
[30] R. Impagliazzo, T. Pitassi, and A. Urquhart. Upper and lower bound for tree-like
cutting plane proofs. In Proc. Logic in Computer Science, LICS’94, Paris, 1994.
[31] U. H. Kortenkamp, J. Richter-Gebert, A. Sarangarajan, and G. M. Ziegler.
Extremal properties of 0/1-polytopes. Discrete and Computational Geometry,
17:439–448, 1997.
[32] D. Naddef. The Hirsch conjecture is true for (0,1)-polytopes. Mathematical Pro-
gramming, 45:109–110, 1989.
[33] M. W. Padberg and M. Grötschel. Polyhedral computations. In E. L. Lawler,
J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, editors, The Traveling
Salesman Problem: A Guided Tour of Combinatorial Optimization, pages 307–360.
John Wiley, 1985.
[34] P. Pudlák. Lower bounds for resolution and cutting plane proofs and monotone
computations. Journal of Symbolic Logic, 62:981–988, 1997.
[35] W. R. Pulleyblank. Polyhedral combinatorics. In G. L. Nemhauser, A. H. G.
Rinnooy Kan, and M. J. Todd, editors, Optimization, Volume 1 of Handbooks
in Operations Research and Management Science, Chapter V, pages 371–446.
Elsevier, 1989.
[36] A. Schrijver. On cutting planes. Annals of Discrete Mathematics, 9:291–296, 1980.
[37] A. Schrijver. Theory of Linear and Integer Programming. John Wiley, 1986.
[38] A. S. Schulz. Polytopes and Scheduling. PhD thesis, Technical University of Berlin,
Berlin, Germany, 1996.
[39] A. S. Schulz. A simple proof that the Chvátal rank of polytopes in the 0/1-cube
is small. Unpublished manuscript, September 1997.
[40] A. S. Schulz, R. Weismantel, and G. M. Ziegler. An optimization problem is ten
problems. In preparation.
Universally Maximum Flow with
Piecewise-Constant Capacities

Lisa Fleischer

Department of Industrial Engineering and Operations Research
Columbia University, New York, NY 10027
[email protected]

Abstract. The maximum dynamic flow problem generalizes the standard
maximum flow problem by introducing time. The object is to send
as much flow from source to sink in T time units as possible, where ca-
pacities are interpreted as an upper bound on the rate of flow entering
an arc. A related problem is the universally maximum flow, which is to
send a flow from source to sink that maximizes the amount of flow ar-
riving at the sink by time t simultaneously for all t ≤ T . We consider a
further generalization of this problem that allows arc and node capacities
to change over time. In particular, given a network with arc and node
capacities that are piecewise constant functions of time with at most
k breakpoints, and a time bound T , we show how to compute a flow
that maximizes the amount of flow reaching the sink in all time intervals
(0, t] simultaneously for all 0 < t ≤ T, in $O(k^2 mn \log(kn^2/m))$ time. The
best previous algorithm requires O(nk) maximum flow computations on
a network with (m + n)k arcs and nk nodes.

1 Introduction
In the 1960’s, Ford and Fulkerson introduced dynamic network flows to include
time in the standard network flow model. Since then, dynamic network flows
have been used widely to model network-structured, decision-making problems
over time: problems in electronic communication, production and distribution,
economic planning, cash flow, job scheduling, and transportation. For examples,
see the surveys of Aronson [4] and Powell, et al. [20].
The maximum dynamic flow problem generalizes the standard maximum flow
problem by introducing time. A standard network consists of a set of nodes V
and a set of arcs E which is a subset of V × V . The capacity function from
the arcs to the real numbers bounds the amount of flow allowed on each arc. In
a dynamic network, the capacity function u limits the rate of flow into an arc.
In addition, a dynamic network has a transit-time vector ϱ associated with the
arcs. The transit time of an arc is the amount of time it takes for flow to travel
from one end of the arc to the other.
Ford and Fulkerson [7] consider the dynamic maximum flow problem: given
a dynamic network with a specified source and sink, determine the maximum
amount of flow that can be sent from source to sink in T time units. They show

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 151–165, 1999.
© Springer-Verlag Berlin Heidelberg 1999
that the problem can be solved in polynomial time by using information obtained
from a minimum cost flow computation in a related network of comparable size.
A generalization of dynamic maximum flows, a universally maximum flow is a
flow that simultaneously maximizes the amount of flow reaching a specified sink
node d, sent from a specified source node s, by time t, for all 0 < t ≤ T . Wilkin-
son [24] and Minieka [14] showed that such a flow exists, and they both provide
algorithms to solve this problem, but these do not run in polynomial time. There
is no known polynomial time algorithm that solves the universally maximum flow
problem. It is not even known if the optimal solution is polynomial in the size
of the input. Hoppe and Tardos [11,12] present an $O(m\epsilon^{-1}(m + n \log n) \log U)$
algorithm that computes a dynamic flow with the property that the quantity of
flow reaching the sink by time t is within $(1 - \epsilon)$ of the maximum possible for
all 0 ≤ t ≤ T.
Another generalization of the maximum dynamic flow problem is the problem
with multiple sources and sinks. The dynamic transshipment problem is a single
commodity flow problem that asks to send specified supplies located at source
nodes to satisfy specified demands at sink nodes within a given time bound T .
A universally quickest transshipment is a dynamic flow that satisfies as much
demand as possible in every interval of time (0, t] for 0 < t ≤ T . If there is
more than one sink, a universally quickest transshipment may not exist, so the
universally quickest transshipment problem refers to a dynamic transshipment
problem with one sink only.
One interesting special case of dynamic flow problems is when all transit
times are zero. In this setting, the universally maximum flow problem is solved
by sending flow at the rate of a maximum flow in the static network at every
moment of time in [0, T ]. (The static network is the interpretation of the dy-
namic network as a standard network. Transit times are ignored, and capacities
are interpreted as bounds on the total flow allowed on an arc.) A universally
quickest transshipment with zero transit times has a more complicated struc-
ture. This problem models the situation when the supplies and demands exceed
network capacity, and it is necessary to send flow over time, in rounds. Unlike
standard flow problems with multiple sources and sinks, this problem cannot
be modeled as an equivalent s−t dynamic maximum flow problem. Such a
transformation does not work here because the arc capacities are
upper bounds on the rate of flow per time period, and do not correspond to
the total amount of flow an arc may carry throughout an interval. The dynamic
transshipment problem with zero transit times is discussed in [6,10,15,23]. Hajek
and Ogier [10] describe the first polynomial time algorithm to find a universally
quickest transshipment for a dynamic network with all zero transit times. Their
algorithm uses n maximum flow computations on the underlying static network.
This is improved in [6] with an algorithm that solves this problem in the same
asymptotic time as a preflow-push maximum flow algorithm.
All of the above mentioned problems may be generalized to allow storage
of flow at the nodes. In this setting, there may be node storage capacities that
limit the total excess flow allowed at a node. Flow deficit is not allowed. In
the above mentioned problems, even when node excess is allowed, there always
exists an optimal solution that does not use any node storage, and the algorithms
mentioned above find such a solution. For the universally quickest transshipment
with piecewise-constant capacity functions, the subject of this paper, this will
not be the case: node storage will be used in an optimal solution, and the amount
of available storage will affect the solution.
Many other generalizations of this problem have been considered in the literature;
however, few of them are known to be polynomial-time
solvable. One exception is the integer dynamic transshipment problem with general
transit times, for which Hoppe and Tardos [13] give the only known polynomial
time algorithm. Their algorithm repeatedly calls an
oracle to minimize submodular functions. Orlin [17] provides polynomial time
algorithms for some infinite horizon problems, where the objective function is
related to average throughput. Burkard, et al. [5] find the minimum feasible time
to send specified flow from source to sink faster than incorporating the Ford-
Fulkerson algorithm in a binary search framework. Most other work has focussed
on characterizing the structure of problems with more general capacity functions,
and determining when optimal solutions exist [1,2,3,18,19,21,22]. None of these
problems are known to be polynomial time solvable. Among the earlier work,
Anderson, Nash, and Philpott [2] consider the problem of finding a maximum
flow in a network with zero transit times, and with capacities and node storage
that are much more general functions of time, and develop a duality theory for
continuous network flows.
We consider the generalization of the zero transit time, universally maximum
flow problem that allows both arc capacities and node storage capacities to be
piecewise-constant functions on [0, T ] with at most k breakpoints. As with the
general universally maximum flow problem, it is not clear that a universally
maximum flow exists in these circumstances. Ogier [16] proves it does, and provides
a polynomial time algorithm that finds a universally maximum flow with at
most kn breakpoints. These breakpoints can be computed with nk maximum
flow computations on a network with nk vertices and (m + n)k arcs. After the
breakpoints are computed, additional maximum flow computations on the same
network are used to calculate the flow between two breakpoints. Thus the to-
tal run time of Ogier’s algorithm is determined by the time to solve O(nk)
maximum flow problems on a network with nk vertices and (n + m)k arcs. As
Ogier [16] demonstrates, this problem also generalizes the universally quickest
transshipment problem.
The main contributions of this paper are to a) recognize that these problems
can be solved by solving a parametric maximum flow problem on a suitably de-
fined graph and b) generalize the parametric maximum flow algorithm of Gallo,
Grigoriadis, and Tarjan [8] to fit the needs of this problem. The end result is
that all the computations described in [16] can be performed in the same asymp-
totic time as one preflow-push maximum flow computation on a network with
nk vertices and (n + m)k arcs. This improves the previous strongly polynomial
run time by a factor of O(nk).

1.1 The Model


The universally maximum flow problem with piecewise constant capacity functions
(UMFP) is defined on a network with n nodes and m arcs, along with a
nonnegative, left-continuous, piecewise-constant capacity function
$u : (E \cup V) \times (0, T] \to \mathbb{Z}_+ \cup \{0\}$. When referring to the capacity of a particular arc e or a
particular node i at a particular time t, we use the notation $u_e(t)$ or $u_i(t)$. The
vector of all capacities at time t is simply denoted u(t), the vector of just arc
capacities is denoted $u_E(t)$, and the vector of node capacities is denoted $u_V(t)$.
For A ⊂ V, we use the notation $u_{A,\bar{A}}(t)$ to denote $\sum_{i \in A, j \in \bar{A}} u_{ij}(t)$, and similarly
for a flow function x(t).
For convenience of notation, we assume that the breakpoints of the capacity
functions are exactly one unit apart. If this is not the case, we can scale time
between each breakpoint to create an equivalent problem with this property.
That is, if the time between two breakpoints is τ , then we create an equivalent
problem by changing this to 1 and multiplying all capacities in this interval by
τ . Any flow that solves this modified problem can be transformed back to a
solution to the original problem by dividing the rate of flow sent in the new time
interval by τ and sending it over an interval that is τ times longer. In such a
manner, we scale T so that it equals k. Thus, for the remainder of the paper,
we assume our time bound is k, all k breakpoints are in the set {1, 2, . . . , k}, the
domain of u is (E ∪ V) × (0, k], and u(τ) = u(⌈τ⌉) for all 0 ≤ τ ≤ k.
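
The rescaling described above can be illustrated with a short sketch. It assumes that the breakpoints are given as increasing times ending at T and that arc capacities are rates that are constant on each interval. The function names `rescale_to_unit_intervals` and `rate_in_original_time` are hypothetical (they do not come from the paper), exact rationals are used to avoid rounding, and node storage capacities are left out of the sketch.

```python
from fractions import Fraction


def rescale_to_unit_intervals(breakpoints, arc_rates):
    """Stretch every interval between breakpoints to length 1.

    breakpoints: increasing times t_1 < ... < t_k = T (t_0 = 0 is implied).
    arc_rates[r][e]: rate capacity of arc e on the r-th interval (0-based).
    Returns the original interval lengths and the scaled rate capacities.
    """
    lengths, scaled, previous = [], [], Fraction(0)
    for t, rates in zip(breakpoints, arc_rates):
        length = Fraction(t) - previous  # this is tau in the text
        lengths.append(length)
        # Multiplying the rate by tau keeps rate * duration unchanged.
        scaled.append({arc: length * rate for arc, rate in rates.items()})
        previous = Fraction(t)
    return lengths, scaled


def rate_in_original_time(scaled_rate, length):
    """Map a flow rate of the unit-interval problem back to the original interval."""
    return scaled_rate / length


lengths, scaled = rescale_to_unit_intervals(
    [2, Fraction(5, 2), 4],
    [{"sd": 3}, {"sd": 4}, {"sd": 2}],
)
print(lengths)  # interval lengths 2, 1/2, 3/2
print(scaled)   # scaled capacities 6, 2, 3 on arc "sd"
```

In each interval the product of rate and duration is the same before and after scaling, which is why a solution of the scaled problem can be mapped back by dividing its rates by τ.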
Define x to be a feasible dynamic flow if $x_e$ is a Lebesgue-measurable function
on (0, k] for all e ∈ E, x obeys the antisymmetry constraints $x_e = -x_{\bar{e}}$ for
all e ∈ E (where $\bar{e}$ is the reverse of arc e) and the arc capacity constraints
$x_e(t) \le u_e(t)$ for all e ∈ E and all t ∈ (0, k]; and the excess at each node, defined
by

$$p_j(\tau) := \int_0^\tau \sum_{i \in V} x_{ij}(t)\,dt,$$

obeys the node capacity constraint $0 \le p_j(t) \le u_j(t)$ for all j ∈ V − {s, d}, and
all 0 ≤ t ≤ k. Denote the set of feasible dynamic flows by D. The value of flow x
at time τ, denoted $v_\tau(x)$, is the total flow reaching the sink by time τ. That is,

$$v_\tau(x) := \int_0^\tau x_{N,d}(t)\,dt.$$

In this paper, we will use a series of related time-expanded graphs. A time-expanded
network of N is a directed graph that contains a copy of N for each
time unit, and holdover arcs from a copy of a node at time θ to the copy of the
same node at time θ + 1 for θ ∈ Z. See Figure 1. Let $G_\tau$ be the time-expanded
graph on k + 1 copies of N, labeled $N_0, \ldots, N_k$. Let i(θ) be the copy of vertex
i in copy $N_\theta$. Assign the arcs in $N_\theta$ capacities $u_E(\theta)$ for θ ∈ {0, ..., ⌈τ⌉}, and 0
otherwise. Assign holdover arcs leaving $N_\theta$ capacities equal to $u_V(\theta + 1)$ for θ = 0
to ⌊τ⌋, and zero otherwise. By our scaling assumption, there are exactly k + 1
unique graphs $G_\tau$, one for each θ ∈ {0, 1, ..., k}, such that for all 0 ≤ τ ≤ k,
$G_\tau = G_{\lceil\tau\rceil}$.

[Figure 1: time-expanded graphs G_1, G_1.5, and H_1.5 over copies Θ = 0, 1, 2, 3 of the initial network G with source s, sink d, and intermediate node i. Legend: arc capacity u = u(Θ), u = 1/2 u(Θ), or u = 0.]

Fig. 1. Examples of time-expanded graphs Gτ and Hτ for τ = 1, 1.5.

1.2 Optimality Conditions

A τ-maximum flow is a feasible dynamic flow x that maximizes $v_\tau(x)$. As in
the static flow setting, we can define a notion of dynamic cut that proves the
optimality of a τ -maximum flow. Dynamic cuts can be interpreted as cuts in
a modified time-expanded network of N . A dynamic cut is a function C that
maps [0, k] to subsets of vertices of the network N such that C(α) = C(β) for
all α, β ∈ (θ, θ + 1], θ ∈ {0, 1, . . . , k − 1}, and C(t) ∩ {s, d} = {d} for all t ∈ (0, k].
Note that this use of cut is the complement of what is traditionally defined as
a cut. Let $\mathcal{C} := \{C \mid C \text{ is a dynamic cut}\}$. Since the range of a dynamic cut is
a finite set, and, over the domain [0, k], each dynamic cut can change only k
times (each is uniform over each interval (θ, θ + 1]), $\mathcal{C}$ is a finite set. The value
of dynamic cut C at time τ is determined by arc capacities (the first term) and
node capacities (the second term) and is expressed as

$$w_\tau(C) = \int_0^\tau u_{\overline{C(t)},\,C(t)}(t)\,dt + \sum_{\theta=1}^{\lceil\tau\rceil-1} \sum_{j \in C(\theta+1) \cap \overline{C(\theta)}} u_j(\theta). \tag{1}$$

A dynamic cut can be interpreted as a set of arcs entering a cut in the time-
expanded network. This will be made more explicit following Theorem 1. This
theorem is comparable to strong duality in standard network flows. The proof
of even the weak duality case is a little more involved than the corresponding
static version and is omitted. A slightly weaker statement is proved by Anderson,
Nash, and Philpott [1], but for more general capacity functions.

Theorem 1 (Ogier [16]). For all τ, $\min_{C \in \mathcal{C}} w_\tau(C) = \max_{x \in D} v_\tau(x)$.

Corollary 1. If θ ∈ {1, 2, ..., k}, then a maximum flow in the time-expanded
network $G_\theta$ corresponds to a θ-maximum flow of the same value, where the flow
on arc e(θ) in Gθ is precisely the flow on this arc in the θ-maximum flow over
the interval (θ, θ + 1].
Proof. A maximum flow in Gθ gives rise to a minimum cut Rθ that contains
all copies of the sink node in Gθ , and no copies of the source node. Letting
$C(\tau) = C(\lceil\tau\rceil) := N_{\lceil\tau\rceil} \cap R_\theta$ for all τ ∈ (0, k], we have that C is a dynamic cut
of value equal to the maximum flow in Gθ , and hence equal to the value of the
corresponding dynamic flow described in the corollary.
We extend the series of time-expanded graphs Gτ to a more continuous
version Hτ , so that we may relate τ -maximum flows to τ -minimum cuts for
τ ∉ {1, 2, ..., k}. For each real 0 ≤ τ ≤ k, we define $H_\tau$ to be a graph on the
same vertex set and support as $G_\tau$, but with a slightly different capacity function.
In $H_\tau$ the capacities of the arcs in $N_{\lfloor\tau\rfloor+1}$ are multiplied by τ − ⌊τ⌋, which
is always less than one. The capacities of all other arcs equal the capacities of
the arcs in Gτ . Denote the capacity function of Hτ by w̃τ . Note that Hθ = Gθ
for all θ ∈ {0, 1, . . . , k}. Hτ and Gτ are depicted in Figure 1.
The value of a dynamic cut at time τ corresponds to the value of the arcs
entering the sink side of a cut in Hτ . We make this explicit as follows. Denote
by Vτ∗ the vertex set of Hτ (and Gτ ). Since the vertex sets of each Hτ and Gτ
are the same for all τ , we will denote this V ∗ when the context is clear. A cut
in Hτ is a set R ⊂ V ∗ such that R contains all copies of the sink node, and no
copies of the source. That is, if $s_\theta$ is the copy of s and $d_\theta$ is the copy of d in $N_\theta$
for θ ∈ {0, 1, ..., k}, then $R \cap \bigcup_{\theta=0}^{k} \{s_\theta, d_\theta\} = \bigcup_{\theta=0}^{k} \{d_\theta\}$. See Figure 2.
For a set R ⊂ V ∗ , define w̃τ (R) to be the sum of capacities of arcs in Hτ
entering R. We can equate each dynamic cut C ∈ C with a cut R such that
wτ (C) = w̃τ (R) by setting R ∩ Nθ = C(θ) for θ ∈ {0, 1, . . . , k}. Define R(θ) :=
R∩Nθ for θ ∈ {0, 1, . . . , k}. The following corollary follows easily from the above
discussions.
Corollary 2. $\max_{x \in D} v_\tau(x) = \min_{R \subset V^*} \tilde{w}_\tau(R)$ and a maximum flow in $H_\tau$
yields a corresponding τ -maximum flow, where the correspondence is as in Corol-
lary 1.
By the submodularity of cut functions in a graph, the non-empty intersection
of minimum cuts is a minimum cut. Thus, there is a minimum cut in Hτ with
smallest sink-side. Let Rτ be the minimum cut such that Rτ ⊆ R for all R such
that w̃τ (R) = w̃τ (Rτ ). For examples, see Figure 2.
Lemma 1 (Ogier [16]). For all 0 ≤ τ < σ ≤ k, Rτ ⊆ Rσ , and Rτ is a left
continuous function of τ .
Let W be the set of breakpoints of $R_\tau$ as a function of τ. That is, W is the
set of values τ such that $R_\tau \neq R_{\tau^+}$, where $R_{\tau^+}$ indicates the limit of $R_\sigma$ as
σ ↓ τ . Ogier’s main contribution is to prove that there is an optimal solution
to UMFP that is uniform on the intervals between successive breakpoints of W,
and that such an optimal flow can be defined piecemeal by special τ-maximum
flows [16].

[Figure 2: time-expanded graph G_3 over copies Θ = 0, 1, 2, 3 with arc capacities, a 2.5-maximum flow x^2.5 (arcs marked x = 1 or x = 1/2), and the nested cuts R_2, R_2+ = R_2.5, and R_2.5+ = R_3.]

Fig. 2. Top left: The time expanded graph Gk for k = 3. Bottom left: Cuts Rτ
for τ = 2, 2.5, 3. Top right: A τ-maximum flow xτ for τ = 2.5.

Let $x^\tau$ be a τ-maximum flow that is constant on each interval (θ − 1, θ] for
θ ∈ {1, 2, ..., ⌊τ⌋} and also constant on interval (⌊τ⌋, τ]. In addition, the excess
function $p^\tau$ of $x^\tau$ satisfies $p^\tau_j(\tau) = 0$ for all j ∈ V − {s, d}. That such a flow exists
is implied by Corollary 2: Compute a maximum flow in $H_\tau$. This flow saturates
the arcs entering $R_\tau$, hence has the value of a τ-maximum flow. It does not use
any node storage arc leaving N(⌈τ⌉), hence in the dynamic setting, completes
by time τ. (See also Figure 2.) Define $x^0$ to be the dynamic flow such that for
all t, $f = x^0(t)$ is a maximum flow for static network N with capacities u(t),
sources $R_t(\lceil t \rceil)$, and sink d. Theorem 2 below extends the static flow concept of
complementary slackness to UMFP.
For each node j of the dynamic network N, define

$$q_j(t) := \begin{cases} \max\{\tau \mid \tau \ge t,\ j(\lceil t \rceil) \in R_\tau\}, & \text{if this set is non-empty,} \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$

In words, $q_j(t)$ is the largest value of τ for which there is no path from j(⌈t⌉) to
d(⌈τ⌉) in the residual network of a τ-maximum flow. Thus if $q_i(t) > q_j(t)$ then
there is a τ such that for all $x^\sigma$, σ > τ, if i(⌈t⌉) has a residual path to d(⌈σ⌉),
then j(⌈t⌉) has a residual path to d(⌈σ⌉), and there is some such σ for which
i(⌈t⌉) does not have a residual path to d(⌈σ⌉), when j(⌈t⌉) does. Since $R_\tau$ is
left-continuous, qj (t) is well-defined.

Theorem 2 (Ogier [16]). Flow $x^*$, defined by

$$x^*_{ij}(t) = \begin{cases} x^\tau_{ij}(t), & \text{if } q_i(t) = q_j(t) = \tau, \\ u_{ij}(t), & \text{if } q_i(t) > q_j(t), \\ -u_{ij}(t), & \text{if } q_i(t) < q_j(t), \end{cases}$$

is a τ-maximum flow for all τ ∈ (0, T].
The proof of this theorem is technical. The intuition is that if, for t ∈ (θ−1, θ],
there is a σ-maximum flow with σ ≥ t such that node i(θ) is on the source side
of the cut defined by this flow, and j(θ) is on the sink side, then the flow on the
arc (i, j) should be at capacity in this interval. If i(θ) and j(θ) are always on the
same side of the cut corresponding to any τ -maximum flow, then the flow on
the arc at time t should be determined by the τ -maximum flow with the latest
completion time, $x^{q_j(\theta)}$, if this time $q_j(\theta)$ is after t; and if not, then the flow
should be determined by the maximum flow on $N_\theta$, which is $x^0$.
If W is known, then $x^0$ can be computed with |W| maximum flows in a
network the same size as N, and $x^*$ can be computed with an additional |W|
maximum flows in each Hτ , for all τ ∈ W . Ogier describes a method to compute
W in a piecewise fashion, computing W ∩ (θ, θ + 1] for θ = 0, . . . , k − 1. Define
the minimum cut function κ by setting κ(τ ) equal to the value of the minimum
cut in Hτ .
Lemma 2. Within the domain (θ, θ + 1] for θ ∈ {0, . . . , k − 1}, κ is concave and
piecewise linear with at most n − 2 breakpoints.
Proof. For τ ∈ (θ, θ + 1], θ ∈ {0, ..., k − 1}, κ(τ) is the pointwise minimum of
a set of linear functions with positive slope: all capacities are linear, increasing
functions of τ on the unit interval; hence, the values of all cuts are also linear,
increasing functions of τ. This implies that κ is concave. Since $R_\tau$ is increasing
in τ, changes in the slope of the cut function imply an increase in $|R_\tau \cap N_{\theta+1}|$.
This together with the fact that $d_{\theta+1} \in R_\tau$ and $s_{\theta+1} \notin R_\tau$ implies that κ(τ) has
at most n − 2 breakpoints in a unit interval.
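
The concavity step in the proof can be written out as a short derivation. The symbols $a_R$ and $b_R$ are shorthand introduced here, not notation from the paper: $b_R$ stands for the total capacity of the arcs of $N_{\theta+1}$ entering the cut R, and $a_R$ for the capacity of all other arcs entering R, which is constant on the interval.

```latex
% For a fixed cut R and \tau \in (\theta, \theta+1]:
\tilde{w}_\tau(R) = a_R + (\tau - \theta)\, b_R, \qquad b_R \ge 0.
% Hence, for \tau_1, \tau_2 \in (\theta, \theta+1] and \lambda \in [0,1]:
\begin{aligned}
\kappa(\lambda\tau_1 + (1-\lambda)\tau_2)
  &= \min_R \bigl[\lambda\,\tilde{w}_{\tau_1}(R) + (1-\lambda)\,\tilde{w}_{\tau_2}(R)\bigr] \\
  &\ge \lambda \min_R \tilde{w}_{\tau_1}(R) + (1-\lambda)\min_R \tilde{w}_{\tau_2}(R)
   = \lambda\,\kappa(\tau_1) + (1-\lambda)\,\kappa(\tau_2).
\end{aligned}
```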
Ogier [16] computes each subset W ∩ (θ, θ + 1] by finding τ such that $R_{\theta^+}(\tau) = R_{\theta+1}(\tau)$,
computing $R_\tau$, and recursing on the subintervals (θ, τ] and (τ, θ + 1]
until the interval does not properly contain elements of W. Since κ is concave in
the interval (θ, θ + 1], the τ that satisfies $R_{\theta^+}(\tau) = R_{\theta+1}(\tau)$ lies in the interval.
Lemma 2 implies that |W| is at most nk. Ogier uses one maximum flow computation
on a graph with at most nk vertices at each recursive step, yielding
an $O(n^4 k^4)$ algorithm to compute W. Substituting the fastest strongly polynomial
maximum flow algorithm, the preflow-push algorithm of Goldberg and
Tarjan [9], Ogier’s approach leads to an $O(k^3 mn^2 \log(kn^2/m))$ algorithm to
compute $x^*$.
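
The recursive search for breakpoints of a concave, piecewise-linear function can be sketched as follows. This is only an illustration of the interval-splitting idea under simplifying assumptions: each cut is represented by a line (slope, intercept), and the maximum flow computation at a given τ is replaced by a toy oracle `lowest_line` that scans an explicit list of lines. All function names are hypothetical.

```python
from fractions import Fraction


def value(line, t):
    slope, intercept = line
    return slope * t + intercept


def lowest_line(lines, t):
    """Toy oracle standing in for one maximum flow computation: a minimizing line at t."""
    return min(lines, key=lambda ln: value(ln, t))


def breakpoints_between(oracle, left, right, found):
    """left and right are lines known to be active at the two ends of the interval."""
    if left == right:
        return
    # The two lines cross where slope_l * t + b_l = slope_r * t + b_r.
    t = Fraction(right[1] - left[1]) / Fraction(left[0] - right[0])
    mid = oracle(t)
    if value(mid, t) == value(left, t):
        found.append(t)  # nothing dips below the crossing point: t is a breakpoint
        return
    # A new line lies strictly below the crossing: split the interval at t.
    breakpoints_between(oracle, left, mid, found)
    breakpoints_between(oracle, mid, right, found)


def breakpoints(lines, lo, hi):
    def oracle(t):
        return lowest_line(lines, t)

    found = []
    breakpoints_between(oracle, oracle(lo), oracle(hi), found)
    return sorted(found)


# Lower envelope of 3t, 1 + t, and 8/5 on the interval [0, 1].
lines = [(Fraction(3), Fraction(0)), (Fraction(1), Fraction(1)), (Fraction(0), Fraction(8, 5))]
print(breakpoints(lines, Fraction(0), Fraction(1)))
```

Each recursive call either reports a breakpoint or discovers a line that was not known before, so the number of oracle calls is proportional to the number of breakpoints.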

1.3 Parametric Maximum Flows


Our main contribution is to generalize the parametric maximum flow algorithm
of Gallo, Grigoriadis, and Tarjan [8] to speed up the computation of W and
the $x^\tau$, τ ∈ W, as defined in the previous section. In this section, we review
the parametric maximum flow algorithm and the preflow-push maximum flow
algorithm on which it is based.
Gallo, Grigoriadis, and Tarjan [8] present several algorithms based on a para-
metric preflow algorithm. Given a graph G on n vertices and m arcs such that
capacities of arcs leaving the source are nondecreasing functions of θ, capacities
of arcs entering the sink are nonincreasing functions of θ, and all other capaci-
ties are constant, the parametric preflow algorithm finds a maximum flow and
a minimum cut for each value of θ in an increasing sequence $\theta_1 < \ldots < \theta_k$. The
sequence may be computed on-line.
The parametric preflow algorithm is a parameterized version of the Gold-
berg and Tarjan [9] preflow-push algorithm for computing maximum flows. The
preflow-push algorithm maintains at all times a feasible preflow and a valid labeling.
A feasible preflow is a flow f satisfying arc capacity constraints $f_{ij} \le u_{ij}$
and relaxed node conservation constraints $\sum_{i \in V} f_{ij} \ge 0$ for all j ∈ V − {s}. The
quantity $\sum_{i \in V} f_{ij}$, when strictly positive, is called the excess at node j. A valid
labeling l for preflow f is a function from the vertices to the nonnegative integers
satisfying l(s) = n, l(d) = 0, l(i) ≥ 0, and l(i) ≤ l(j) + 1 for all (i, j) such that
$f_{ij} < u_{ij}$.
The preflow-push algorithm typically starts with the preflow defined by saturating
all arcs leaving the source, and the labeling l(i) = 0 for all i ≠ s.
Throughout the course of the algorithm, labels of nodes may increase but never
decrease. The algorithm terminates when no nodes besides the source or sink
contain excess. The entire analysis of the algorithm depends on the fact that the
labels can only increase, and that at all times there is a path of residual arcs
from any vertex to either the source or the sink, implying that no node has label
greater than 2n. The bound on the run time of the algorithm is determined by
bounding the number of times each node is relabeled.
The analysis of this algorithm remains valid as long as it starts with any
feasible preflow and corresponding valid labeling. There is nothing that requires
the algorithm to start with the stated preflow and labeling. This fact is exploited
by Gallo, et al. in their generalization of this algorithm to solve the parametric
maximum flow problem.
The parametric preflow algorithm starts with capacities determined by the
smallest value of the parameter θ, i.e., $\theta_1$, and computes a maximum flow. Arc
capacities are then increased to the next largest value of θ. If there are arcs from
s to nodes with label less than l(s), the flow on these arcs is increased to meet
the new capacities. The flow on arcs entering the sink is decreased, if necessary,
to meet the new capacities, increasing the excess at the nodes adjacent to the
sink. This results in a new, feasible preflow for which the previous labeling is
still valid, since all saturated arcs in the previous preflow remain saturated in
the new preflow. Thus the preflow-push algorithm will compute a maximum flow
in this network with updated capacities.
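
The invariants that make this restart possible can be checked on a toy instance. The sketch below does not run the preflow-push algorithm itself. It only tests the definitions of a feasible preflow and a valid labeling, before and after raising a source-arc capacity as described above. Flows are stored antisymmetrically in a dictionary, and all function names are hypothetical.

```python
def excess(flow, nodes, j):
    """Net flow into node j (flow is antisymmetric: flow[i, j] == -flow[j, i])."""
    return sum(flow.get((i, j), 0) for i in nodes)


def is_feasible_preflow(cap, flow, nodes, s):
    within_capacity = all(flow.get(a, 0) <= cap.get(a, 0) for a in set(cap) | set(flow))
    return within_capacity and all(excess(flow, nodes, j) >= 0 for j in nodes if j != s)


def is_valid_labeling(cap, flow, label, nodes, s, d):
    if label[s] != len(nodes) or label[d] != 0:
        return False
    # Along every residual arc (i, j) the label may drop by at most one.
    return all(
        label[i] <= label[j] + 1
        for i in nodes
        for j in nodes
        if flow.get((i, j), 0) < cap.get((i, j), 0)
    )


def raise_source_capacities(cap, flow, label, s, new_caps):
    """Parametric update: raise capacities of arcs (s, j), saturating where l(j) < l(s)."""
    for j, c in new_caps.items():
        cap[(s, j)] = c
        if label[j] < label[s]:
            flow[(s, j)], flow[(j, s)] = c, -c


nodes = ["s", "a", "d"]
cap = {("s", "a"): 2, ("a", "d"): 1}
flow = {("s", "a"): 2, ("a", "s"): -2}  # initial preflow: saturate the arcs leaving s
label = {"s": 3, "a": 0, "d": 0}
before = (is_feasible_preflow(cap, flow, nodes, "s"),
          is_valid_labeling(cap, flow, label, nodes, "s", "d"))
raise_source_capacities(cap, flow, label, "s", {"a": 5})
after = (is_feasible_preflow(cap, flow, nodes, "s"),
         is_valid_labeling(cap, flow, label, nodes, "s", "d"))
print(before, after)
```

Because the arc (s, a) stays saturated after the update, no new residual arc leaves s, and the old labels remain valid without any decrease.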
The same bound on the run time of the preflow-push algorithm is also a bound
on the run time of the parametric preflow-push algorithm: $O(nm \log(n^2/m))$
time [8,9] (assuming the number of values of θ is not more than $m \log(n^2/m)$).
This is because the bound on the run-time of the preflow-push algorithm depends
only on the number of times a node is relabeled, node labels never decrease in
either algorithm, and all node labels are ≤ 2n.
Using this parametric preflow algorithm, Gallo, Grigoriadis, and Tarjan [8]
describe an algorithm that finds all breakpoints of κ(θ), the minimum cut func-
tion of graph G, in the same asymptotic time. Like Ogier’s algorithm, this al-
gorithm relies on the concavity of κ. Once the breakpoints are found, the para-
metric preflow algorithm can be invoked again to compute the maximum flows
and minimum cuts corresponding to these breakpoints.

2 Solving UMFP: A Generalized Parametric Maximum Flow Algorithm

In this section, we discuss the main contribution of this paper which is a gen-
eralization of the parametric maximum flow algorithm and the breakpoint al-
gorithm of Gallo, Grigoriadis, and Tarjan [8] to solve the universally maximum
flow problem with piecewise constant capacities. Our generalization of [8] en-
ables us to reduce the time needed to compute the set of breakpoints of the
universally maximum dynamic flow W , and the τ -maximum flows xτ , τ ∈ W
that are necessary to compute the optimal flow x∗ as detailed in Section 1.2.
We require $O(k^2 nm \log(kn^2/m))$ time to do this. Since $x^0$ can be computed
in $O(knm \log(n^2/m))$ time using k calls to Goldberg and Tarjan’s push-relabel
maximum flow algorithm [9], this implies that the universally maximum dynamic
flow $x^*$ can also be computed in $O(k^2 nm \log(kn^2/m))$ time, which improves the
algorithm of Ogier [16] by a factor of O(kn).
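
As a quick check of the stated improvement, the ratio of the bound quoted in Section 1.2 for Ogier's approach to the bound obtained here can be computed directly:

```latex
\frac{k^3 m n^2 \log(kn^2/m)}{k^2 n m \log(kn^2/m)} = kn.
```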
Our algorithm integrates the work of Gallo, et al. [8] into the framework
of the Ogier algorithm. In Step 1, we use the parametric preflow algorithm of
Gallo, et al. to compute the minimum cuts Rθ and θ-maximum flows xθ in Hθ
for θ = 1, . . . , k. In Step 2, we generalize the breakpoint algorithm of Gallo, et
al. to compute the minimum cuts Rτ and corresponding maximum flows xτ for
all τ ∈ (θ − 1, θ] ∩ W, for each θ = 1, ..., k.
We consider a parametric flow problem based on the graphs Hτ . Instead of
considering the graphs Hτ for τ ∈ (0, k] as separate graphs, we consider one
graph H on the same vertex set but with parameterized capacities, so that the
capacities of arcs of H at time τ equal the capacities of Hτ . That is, H(τ ) = Hτ .
More precisely, using Nθ to denote the θth copy of network N in H, an arc in
Nθ has capacity 0 for 0 ≤ t ≤ θ − 1, capacity (t − θ + 1)ue (θ) for θ − 1 < t ≤ θ,
and capacity ue (θ) for θ < t ≤ k. An arc from Nθ to Nθ+1 has capacity 0 for
0 ≤ t ≤ θ and capacity uj (θ) for θ < t ≤ k.
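
The parameterized capacities can be made concrete with a small sketch. It builds the capacities of H(t) exactly as stated in the paragraph above and evaluates the minimum cut value with a plain augmenting-path maximum flow on exact rationals (not the preflow-push algorithm used in the paper). Assumptions of the sketch: copy $N_0$ carries no capacity and is omitted, all source copies and all sink copies are tied to a super source and a super sink, and the names `build_H`, `max_flow_value`, and `kappa` are hypothetical.

```python
from collections import deque
from fractions import Fraction


def build_H(arcs, u_arc, u_node, k, t):
    """Arc capacities of H(t); vertices are pairs (node, theta) for theta = 1..k."""
    cap = {}
    for theta in range(1, k + 1):
        # Arcs inside N_theta: 0 until theta - 1, linear ramp on (theta - 1, theta], then full.
        ramp = min(max(t - (theta - 1), 0), 1)
        for (i, j) in arcs:
            cap[((i, theta), (j, theta))] = ramp * u_arc[theta][(i, j)]
        # Holdover arcs N_theta -> N_theta+1: closed until time theta, then u_j(theta).
        if theta < k:
            for j, storage in u_node[theta].items():
                cap[((j, theta), (j, theta + 1))] = storage if t > theta else 0
    return cap


def max_flow_value(cap, sources, sinks):
    """Shortest augmenting paths (Edmonds-Karp) from all sources to all sinks."""
    S, D = "super-source", "super-sink"
    res = {}

    def add(a, b, c):
        res.setdefault(a, {})
        res.setdefault(b, {})
        res[a][b] = res[a].get(b, 0) + c
        res[b].setdefault(a, 0)

    for (a, b), c in cap.items():
        add(a, b, Fraction(c))
    big = sum(Fraction(c) for c in cap.values()) + 1
    for s in sources:
        add(S, s, big)
    for d in sinks:
        add(d, D, big)

    value = Fraction(0)
    while True:
        parent = {S: None}
        queue = deque([S])
        while queue and D not in parent:
            a = queue.popleft()
            for b, c in res[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if D not in parent:
            return value
        path, b = [], D
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        delta = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= delta
            res[b][a] += delta
        value += delta


def kappa(arcs, u_arc, u_node, k, t, s="s", d="d"):
    cap = build_H(arcs, u_arc, u_node, k, t)
    copies = range(1, k + 1)
    return max_flow_value(cap, [(s, th) for th in copies], [(d, th) for th in copies])


arcs = [("s", "i"), ("i", "d")]
u_arc = {1: {("s", "i"): 3, ("i", "d"): 1}, 2: {("s", "i"): 0, ("i", "d"): 2}}
u_node = {1: {"i": 1}}  # storage at node i carried from copy 1 to copy 2
for t in (Fraction(1), Fraction(5, 4), Fraction(3, 2), Fraction(2)):
    print(t, kappa(arcs, u_arc, u_node, 2, t))
```

On this three-node instance the minimum cut value grows linearly on (1, 3/2] and is constant on (3/2, 2], so it is concave on the unit interval with a single breakpoint, in line with the bound of n − 2 from Lemma 2.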
For the correctness and speed of their algorithm, Gallo et al. require that
all arcs with parameterized capacities either leave the source or enter the sink,
and this does not hold for H. However, the Gallo et al. requirements as stated
in Section 1.3 are merely sufficient conditions. The following conditions are also
sufficient, but more general [8].

1. After increasing θ, it must be possible to adjust the flow so that it is a feasible
preflow and modify the labeling so that it remains valid without decreasing
any labels.
2. The minimum cut function κ(τ ) must be concave.

The second item is necessary for us to find the breakpoints W efficiently using a
modified version of the Goldberg and Tarjan maximum flow algorithm and holds
true here by Lemma 2. The first item is necessary to compute all corresponding
maximum flows and minimum cuts in the same asymptotic time as one maximum flow computation; this
is a requirement of the parametric preflow algorithm as discussed in Section 1.3.
We establish below (in Step 2) how to satisfy the first condition.
The two steps of our algorithm are detailed below. Figure 4 briefly summa-
rizes the algorithm.
Step 1: Computing Rθ and xθ for θ ∈ {0, 1, . . . , k}. To compute the
θ-maximum flows xθ , θ ∈ {0, . . . , k}, we construct a parametric flow problem
on a modified Gk (= Hk ). Recall that si is the copy of the source in Ni , and
di is the corresponding sink. We introduce a super source sS with infinite ca-
pacity arcs (sS , si ) for each i ∈ {0, . . . , k}, a super sink dS with arcs (di , dS ) for
each i ∈ {0, . . . , k} with capacity function that is infinite when i ≤ θ and zero
otherwise, and call this parametric network Ĝk . (See Figure 3.) We then solve
the parametric flow problem in Ĝk to find the maximum flows corresponding
to each parameter θ ∈ {0, 1, . . . , k}. These maximum flows are the flows xθ for
θ ∈ {0, 1, . . . , k}. This is because, by Corollary 1, they correspond to cuts in
H(θ) ≡ Hθ of the same value; and, as a cut in Hθ , they keep this value even
when all sinks di that are not in the original sink side of the cut are moved to
the sink side: If di is not in the sink side of the cut for parameter θ in Ĝk , the
capacity of (di , dS ) for parameter θ must be finite, and is therefore zero. Thus
the capacity of all arcs entering di in Hθ is also zero, and di can be moved to
the sink side of the cut.
To solve this parametric flow problem, we reverse Ĝk by reversing the di-
rection of every arc in Ĝk , and apply the algorithm in [8] to this network. This
parametric flow problem is of the form that is solved in [8]: all arcs that vary
with θ are leaving the source, and the capacities are all increasing functions of θ.
Thus, xθ and Rθ for all θ ∈ {0, . . . , k} can be computed in the same asymptotic
time as one maximum flow computation in the graph Ĝk : O(k²nm log(kn²/m))
time.
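
The arcs that Step 1 adds to Gk can be written down explicitly. The following is a minimal sketch, not the implementation of [8]: the node labels "sS", "dS", "s0", "d0", ... and the helper names super_arcs, reversed_arcs and is_nondecreasing are assumptions introduced here for illustration. It builds the added arcs for a given θ, reverses them, and checks that every parameterized capacity is nondecreasing in θ, which is the property the reversed network needs.

```python
import math


def super_arcs(k, theta):
    """Arcs added to G_k at parameter theta, as {(tail, head): capacity}.

    Labels such as "sS" and "d2" are hypothetical names for the super
    source, the super sink, and the copies s_i, d_i of source and sink.
    """
    arcs = {}
    for i in range(k + 1):
        arcs[("sS", f"s{i}")] = math.inf                       # always infinite
        arcs[(f"d{i}", "dS")] = math.inf if i <= theta else 0  # opens once i <= theta
    return arcs


def reversed_arcs(arcs):
    """Reverse the direction of every arc, keeping its capacity."""
    return {(head, tail): cap for (tail, head), cap in arcs.items()}


def is_nondecreasing(k):
    """Check that each added capacity is nondecreasing over theta = 0..k."""
    for arc in super_arcs(k, 0):
        caps = [super_arcs(k, theta)[arc] for theta in range(k + 1)]
        if any(a > b for a, b in zip(caps, caps[1:])):
            return False
    return True
```

After reversal, the only arcs whose capacity depends on θ are the arcs (dS, di), which all leave the node dS that plays the role of the source in the reversed network.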
Step 2: Computing Rτ and xτ for all τ ∈ W . To find the elements
of W , and the corresponding τ -maximum flows within interval (θ − 1, θ], we
generalize the version of the parametric maximum flow algorithm [8] that finds
all breakpoints of the minimum cut function κ(θ). We start with Rθ−1 and Rθ .
The corresponding graphs are H(θ − 1) and H(θ). As τ increases from θ − 1 to
θ, the capacities of the arcs in H ∩ Nθ increase linearly from 0 to uE (θ). Because
the change in the capacity of the arcs is linear, the change in the minimum cut
function is piecewise linear. By Lemma 2, this minimum cut function is also
concave; thus it remains to show how to satisfy Condition 1 of [8].

[Figure 3: the parametric network Ĝk for k = 3. The super source sS is joined to Gk by arcs of capacity ∞, and the arcs into the super sink dS carry capacities u1, u2, u3, where ui(θ) = 0 for θ ∈ [0, i) and ui(θ) = ∞ for θ ∈ [i, 3].]

Fig. 3. Ĝk for k = 3.

Step 1: Input: Gk = (Vk , Ek )
Output: θ-maximum flows xθ and cuts Rθ for θ = 1, 2, . . . , k.
V̂ = Vk ∪ {sS , dS }.
Ê = Ek ∪ {(sS , si ) | i = 0, 1, . . . , k} ∪ {(di , dS ) | i = 0, 1, . . . , k},
with ûe (θ) = ue (θ) for e ∈ Ek ,
ûe (θ) = ∞ for e adjacent to sS ,
ûe (θ) = ∞ for e = (di , dS ) when i ≤ θ,
ûe (θ) = 0 for e = (di , dS ) when i > θ.
Ĝ = (V̂ , Ê) with capacity function û.
Ĝ^R = the reverse network of Ĝ.
(x1 , x2 , . . . , xk , R1 , R2 , . . . , Rk ) = GGT(Ĝ^R , θ = 1, 2, . . . , k).

Step 2: Input: R0 = {d0 , d1 , . . . , dk }, R1 , R2 , . . . , Rk .
Output: W and all τ-maximum flows xτ for τ ∈ W .
For θ = 1, 2, . . . , k,
Modified GGT(Rθ − Rθ−1 ) returns W ∩ (θ − 1, θ] and Rτi+1 − Rτi
for all τi ∈ W ∩ (θ − 1, θ], τi < τi+1 .
For τ1 < τ2 < · · · < τn ∈ W ∩ (θ − 1, θ],
xτi restricted to Rτi+1 − Rτi = max flow on (Rτi+1 − Rτi ) ∪ {v1 = R̄τi+1 , v2 = Rτi }

Fig. 4. The two step algorithm to compute the τ-maximum flows needed for
constructing a universally maximum flow.

Modified GGT. We run an iteration of the breakpoint algorithm in the
reverse network H^R of H, restricted to the time interval (θ − 1, θ] for some θ ∈
{1, 2, . . . , k}. After computing the first maximum flow, we increase the current
value of θ by ε, thus increasing the capacity of arcs in Nθ by ε. (Here, ε is
determined by the breakpoint algorithm in [8].) To adjust the preflow, we then
increase the current flow in Nθ by ε xij (θ)/uij (θ) for each arc (i(θ), j(θ)). Recall that
xij (θ) equals the flow on arc (i, j) in Nθ . Since there are no arcs with positive
capacity that enter Nθ in H^R , there is no node in Nθ that receives flow from
outside Nθ . Hence this adjustment of the preflow maintains non-negative excess
within Nθ , and the adjusted preflow is feasible. Since this update keeps previously
saturated arcs saturated and empty arcs empty, the current labeling is still valid,
and we have satisfied Condition 1.
The breakpoint algorithm is repeated for each unit interval of the form
(θ − 1, θ], θ = 1, . . . , k in [0, k]. This could take k parametric maximum flow
computations in H^R , i.e. k times O(k²nm log(kn²/m)) time. To speed this up,
we use the fact that we have already computed Rθ for all θ ∈ {1, . . . , k} in Step
1, and that, by Lemma 1, Rθ−1 ⊆ Rτ ⊆ Rθ , for all θ − 1 ≤ τ ≤ θ. Thus, in
searching for breakpoints of κ(τ ) in the interval (θ − 1, θ], we can restrict the
computations performed for each unit interval to the subgraph of H^R on the
vertex set (Rθ − Rθ−1 ) ∪ {v1 , v2 }, where v1 is the vertex resulting from the con-
traction of all vertices in R̄θ (the complement of Rθ ) and v2 is the vertex resulting
from the contraction of Rθ−1 . Along with the breakpoints τ ∈ W ∩ (θ − 1, θ] the
algorithm also returns the Rτ implicitly as Rτ − Rθ−1 .
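
The restriction to a contracted subgraph can be illustrated as follows. This is a sketch under stated assumptions, not the procedure of [8]: arcs are given as a dictionary {(tail, head): capacity}, the function name contract is hypothetical, and v1 is taken to stand for everything outside Rθ while v2 stands for Rθ−1. Parallel arcs created by the contraction are merged by adding capacities, and self-loops are dropped.

```python
def contract(arcs, r_prev, r_cur):
    """Contract r_prev into "v2" and the complement of r_cur into "v1".

    arcs   : {(tail, head): capacity}
    r_prev : vertex set playing the role of R_{theta-1}
    r_cur  : vertex set playing the role of R_theta (r_prev is a subset)
    """
    def image(v):
        if v in r_prev:
            return "v2"
        if v not in r_cur:
            return "v1"
        return v  # vertices of r_cur - r_prev are kept as they are

    contracted = {}
    for (tail, head), cap in arcs.items():
        a, b = image(tail), image(head)
        if a == b:
            continue  # arc inside a contracted set: becomes a self-loop
        contracted[(a, b)] = contracted.get((a, b), 0) + cap
    return contracted
```

Only the vertices of Rθ − Rθ−1 survive individually, so the k restricted computations together touch each vertex of H^R once, apart from the two contracted vertices.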
We also need to compute the τ-maximum flows xτ for all τ ∈ W . We don't
actually need to know the flow on every arc at every time step for each xτ .
By Theorem 2, it is sufficient to determine the values xτij (t) for t and τ such
that qi (t) = qj (t) = τ . Since xτ is constant on unit intervals by definition (see
Section 1.2), the definition of qj (t) in (2) implies that it is necessary to compute
xτij (t) only for t ≤ τ where i(⌈t⌉), j(⌈t⌉) ∉ Rτ and i(⌈t⌉), j(⌈t⌉) ∈ Rσ = Rτ+ . By
Corollary 2, this flow corresponds to a maximum flow in Hτ , and the particular
flow amounts we require are on arcs contained in the subgraph of H^R induced
by the vertices in Rσ − Rτ . In order to compute xτ restricted to this space-
network interval, we must also specify the flow on all arcs entering and leaving
this interval, i.e. all arcs adjacent to but not contained in Rσ and Rτ . The flow
on such arcs that are adjacent to Rτ in the relevant time interval is determined
by definition of Rτ : all arcs entering Rτ must be saturated in xτ , and all arcs
leaving Rτ must be empty in xτ . The flow on the remainder of these arcs, i.e.
those adjacent to Rσ , is similarly determined by citing one additional lemma of
Ogier's.

Lemma 3 (Ogier [16]). For each τ ∈ (0, T ), Rτ+ is a τ-minimum cut.

In particular, for consecutive breakpoints τ < σ in W , Rτ+ = Rσ , so Rσ is also
a τ-minimum cut. Thus the above observations for arcs entering and leaving Rτ
also hold for Rσ , and now we can compute xτ on H^R restricted to Rσ \ Rτ .

Theorem 3. A universally maximum flow with piecewise-constant capacities
can be constructed in O(k²mn log(kn²/m)) time.
Proof. We have shown above that Step 1 takes O(k²mn log(kn²/m)) time. To
bound the run time of Step 2, we note that performing maximum flow computa-
tions on k graphs that together contain a total of nk vertices and mk + nk arcs
is bounded by the time to compute a maximum flow on a single graph with nk
vertices and (m + n)k arcs. Thus the total run time is O(k²nm log(kn²/m)).

Acknowledgments
I am grateful to Éva Tardos and Kevin Wayne for providing helpful comments
on earlier drafts of this paper.

References
1. E. J. Anderson and P. Nash. Linear Programming in Infinite-Dimensional Spaces.
John Wiley & Sons, 1987.
2. E. J. Anderson, P. Nash, and A. B. Philpott. A class of continuous network flow
problems. Mathematics of Operations Research, 7:501–14, 1982.
3. E. J. Anderson and A. B. Philpott. A continuous-time network simplex algorithm.
Networks, 19:395–425, 1989.
4. J. E. Aronson. A survey of dynamic network flows. Annals of Operations Research,
20:1–66, 1989.
5. R. E. Burkard, K. Dlaska, and B. Klinz. The quickest flow problem. ZOR Methods
and Models of Operations Research, 37(1):31–58, 1993.
6. L. Fleischer. Faster algorithms for the quickest transshipment problem with zero
transit times. In Proceedings of the Ninth Annual ACM/SIAM Symposium on
Discrete Algorithms, pages 147–156, 1998. Submitted to SIAM Journal on Opti-
mization.
7. L. R. Ford and D. R. Fulkerson. Flows in Networks. Princeton University Press,
1962.
8. G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow
algorithm and applications. SIAM J. Comput., 18(1):30–55, 1989.
9. A. V. Goldberg and R. E. Tarjan. A new approach to the maximum flow problem.
Journal of the ACM, 35:921–940, 1988.
10. B. Hajek and R. G. Ogier. Optimal dynamic routing in communication networks
with continuous traffic. Networks, 14:457–487, 1984.
11. B. Hoppe. Efficient Dynamic Network Flow Algorithms. PhD thesis, Cornell Uni-
versity, June 1995. Department of Computer Science Technical Report TR95-1524.
12. B. Hoppe and É. Tardos. Polynomial time algorithms for some evacuation prob-
lems. In Proc. of 5th Annual ACM-SIAM Symp. on Discrete Algorithms, pages
433–441, 1994.
13. B. Hoppe and É. Tardos. The quickest transshipment problem. In Proc. of 6th
Annual ACM-SIAM Symp. on Discrete Algorithms, pages 512–521, 1995.
14. E. Minieka. Maximal, lexicographic, and dynamic network flows. Operations Re-
search, 21:517–527, 1973.
15. F. H. Moss and A. Segall. An optimal control approach to dynamic routing in
networks. IEEE Transactions on Automatic Control, 27(2):329–339, 1982.
16. R. G. Ogier. Minimum-delay routing in continuous-time dynamic networks with
piecewise-constant capacities. Networks, 18:303–318, 1988.
17. J. B. Orlin. Minimum convex cost dynamic network flows. Mathematics of Oper-
ations Research, 9(2):190–207, 1984.
18. A. B. Philpott. Continuous-time flows in networks. Mathematics of Operations
Research, 15(4):640–661, November 1990.
19. A. B. Philpott and M. Craddock. An adaptive discretization method for a class of
continuous network programs. Networks, 26:1–11, 1995.
20. W. B. Powell, P. Jaillet, and A. Odoni. Stochastic and dynamic networks and
routing. In M. O. Ball, T. L. Magnanti, C. L. Monma, and G. L. Nemhauser,
editors, Handbooks in Operations Research and Management Science: Networks.
Elsevier Science Publishers B. V., 1995.
21. M. C. Pullan. An algorithm for a class of continuous linear programs. SIAM J.
Control and Optimization, 31(6):1558–1577, November 1993.
22. M. C. Pullan. A study of general dynamic network programs with arc time-delays.
SIAM Journal on Optimization, 7:889–912, 1997.
23. G. I. Stassinopoulos and P. Konstantopoulos. Optimal congestion control in single
destination networks. IEEE Transactions on Communications, 33(8):792–800, 1985.
24. W. L. Wilkinson. An algorithm for universal maximal dynamic flows in a network.
Operations Research, 19:1602–1612, 1971.
Critical Extreme Points of the 2-Edge
Connected Spanning Subgraph Polytope

Jean Fonlupt1 and Ali Ridha Mahjoub2


1
Équipe de Combinatoire, UF9 921, Université Pierre et Marie Curie,
4 place Jussieu, 75252 Paris Cedex 05 France
[email protected]
2
LIMOS, Université de Clermont II, Complexe Scientifique des Cézeaux,
63177 Aubière Cedex, France
[email protected]

Abstract. In this paper we study the extreme points of the polytope
P(G), the linear relaxation of the 2-edge connected spanning subgraph
polytope of a graph G. We introduce a partial ordering on the extreme
points of P(G) and give necessary conditions for a non-integer extreme
point of P(G) to be minimal with respect to that ordering. We show
that, if x̄ is a non-integer minimal extreme point of P(G), then G and x̄
can be reduced, by means of some reduction operations, to a graph G′
and an extreme point x̄′ of P(G′) where G′ and x̄′ satisfy some simple
properties. As a consequence we obtain a characterization of the per-
fectly 2-edge connected graphs, the graphs for which the polytope P(G)
is integral.

Keywords: Polytope, cut, 2-edge connected graph, critical extreme point.

1 Introduction
A graph G = (V, E) is called 2-edge connected if for every pair of nodes (u, v)
there are at least two edge-disjoint paths between u and v. Given a graph
G = (V, E) and a weight function w which associates to each edge e a weight
w(e), the 2-edge connected subgraph problem (TECSP) consists of finding a 2-edge
connected subgraph H = (V, F) of G, spanning all the nodes of G and such that
∑_{e∈F} w(e) is minimum. This problem arises in the design of reliable transporta-
tion and communication networks [23], [24]. It is NP-hard in general. It has been
shown to be polynomially solvable on series-parallel graphs [26] and Halin graphs [25].
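
To make the definition concrete, the TECSP can be solved by exhaustive search on very small instances. The sketch below is only an illustration of the problem statement (it takes exponential time); the function names and the representation of edges as triples (u, v, weight) on nodes 0, ..., n − 1 are assumptions made here. It uses the standard fact that a graph is 2-edge connected exactly when it is connected and remains connected after the removal of any single edge.

```python
from itertools import combinations


def connected(n, edges):
    """True if the multigraph on nodes 0..n-1 with the given edges is connected."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def two_edge_connected(n, edges):
    """Connected, and still connected after deleting any one edge."""
    if not connected(n, edges):
        return False
    return all(connected(n, edges[:i] + edges[i + 1:]) for i in range(len(edges)))


def tecsp_brute_force(n, weighted_edges):
    """Return (cost, edge indices) of a minimum-weight 2-edge connected
    spanning subgraph, or None if there is none. Exponential time."""
    best = None
    m = len(weighted_edges)
    for size in range(n, m + 1):  # such a subgraph has at least n edges
        for subset in combinations(range(m), size):
            edges = [weighted_edges[i][:2] for i in subset]
            if two_edge_connected(n, edges):
                cost = sum(weighted_edges[i][2] for i in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best
```

For instance, on the complete graph with four nodes where the 4-cycle 0-1-2-3 has unit weights and the two diagonals have weight 5, the search returns the 4-cycle.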
Given a graph G = (V, E) and an edge subset F ⊆ E, the 0−1 vector x^F
of ℝ^E such that x^F(e) = 1 if e ∈ F and x^F(e) = 0 if e ∈ E \ F is called the
incidence vector of F. The convex hull of the incidence vectors of the edge sets
of the 2-edge connected spanning subgraphs of G, denoted by TECP(G), is called the
2-edge connected spanning subgraph polytope of G.
Let G = (V, E) be a graph. Given b : E → ℝ and F a subset of E, b(F) will
denote ∑_{e∈F} b(e). For W ⊆ V we let W̄ = V \ W. If W ⊂ V is a node subset of
G, then the set of edges that have only one node in W is called a cut and denoted
by δ(W). (Note that δ(W) = δ(W̄).) If W = {v} for some node v ∈ V, then we
write δ(v) for δ({v}); δ(v) will be called a node cut. An edge cutset F ⊆ E of G
is a set of edges such that F = δ(S) for some non-empty set S ⊂ V. Given two
vectors w, x ∈ ℝ^E, we let ⟨w, x⟩ = ∑_{e∈E} w(e)x(e).

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO'99, LNCS 1610, pp. 166–182, 1999.
© Springer-Verlag Berlin Heidelberg 1999

If x^F is the incidence vector of the edge set F of a 2-edge connected spanning
subgraph of G, then x^F is a feasible solution of the polytope P(G) defined by

\[
P(G) = \left\{ x \in \mathbb{R}^E :
\begin{array}{lll}
x(e) \ge 0 & \forall\, e \in E, & (1.1)\\
x(e) \le 1 & \forall\, e \in E, & (1.2)\\
x(\delta(S)) \ge 2 & \forall\, S \subset V,\ S \ne \emptyset & (1.3)
\end{array}
\right\}.
\]

Conversely, any integer solution of P(G) is the incidence vector of the edge set
of a 2-edge connected spanning subgraph of G. Inequalities (1.1) and (1.2) are called trivial
constraints and constraints (1.3) are called cut constraints.
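
Membership in P(G) can be tested directly from (1.1)-(1.3) when the graph is tiny. The following sketch enumerates all node subsets, so it is exponential and meant only to illustrate the definition; the names cut_value and in_P, and the representation of x as a dictionary keyed by edges (u, v) of a simple graph, are assumptions of this example. Exact rational arithmetic avoids rounding issues when a cut is tight.

```python
from fractions import Fraction
from itertools import combinations


def cut_value(x, S):
    """x(δ(S)): total value of the edges with exactly one endnode in S."""
    return sum(val for (u, v), val in x.items() if (u in S) != (v in S))


def in_P(nodes, x):
    """Check the trivial constraints (1.1), (1.2) and all cut constraints (1.3)."""
    if any(val < 0 or val > 1 for val in x.values()):
        return False
    nodes = list(nodes)
    for size in range(1, len(nodes)):  # non-empty proper subsets S
        for S in combinations(nodes, size):
            if cut_value(x, set(S)) < 2:
                return False
    return True


half = Fraction(1, 2)
# K4 seen as a wheel: triangle 0-1-2 with value 1/2, spokes to node 3 with value 1.
wheel = {(0, 1): half, (1, 2): half, (0, 2): half,
         (0, 3): Fraction(1), (1, 3): Fraction(1), (2, 3): Fraction(1)}
# All six edges of K4 at 1/2: every node cut has value 3/2, so (1.3) fails.
all_half = {e: half for e in wheel}
```

Here in_P(range(4), wheel) holds while in_P(range(4), all_half) does not.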
The polytope P (G) is a relaxation of the TECP(G). It is also a relaxation of
the subtour polytope of the traveling salesman problem, the set of all the solu-
tions of the system given by inequalities (1.1)-(1.3) together with the constraints
x(δ(v)) = 2 for all v ∈ V. Thus minimizing ⟨w, x⟩ over P(G) may provide a
good lower bound for both the TECSP and the traveling salesman problem (see
[12], [21]).
Using network flows [8] [9], one can compute in polynomial time a minimum
cut in a weighted undirected graph. Hence the separation problem for inequalities
(1.3) (i.e. the problem that consists of deciding whether a given solution ȳ ∈
ℝ^{|E|} satisfies inequalities (1.3) and, if not, finding an inequality which is violated
by ȳ) can be solved in polynomial time. This implies by the ellipsoid method
[13] that the TECSP can be solved in polynomial time on graphs G for which
TECP(G) = P (G). Mahjoub [20] called these graphs perfectly 2-edge connected
graphs (perfectly-TEC). Thus an interesting question would be to characterize
these graphs. In [19], Mahjoub showed that series-parallel graphs are perfectly-
TEC. In [20] he described sufficient conditions for a graph to be perfectly-TEC.
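
The polynomial-time separation of the cut constraints can be sketched with a textbook augmenting-path maximum flow routine. This is an illustrative sketch only: it is not the method of [8] or [9], and the function names min_st_cut and separate are assumptions. It uses the fact that a minimum cut of the whole graph separates a fixed node s from some other node t, so |V| − 1 minimum s-t cut computations suffice. The input ȳ is assumed to be non-negative and given as a dictionary keyed by edges (u, v).

```python
from collections import deque
from fractions import Fraction


def min_st_cut(nodes, y, s, t):
    """Return (value, S) for a minimum s-t cut of the undirected graph
    with edge capacities y, using shortest augmenting paths."""
    cap = {u: {} for u in nodes}
    for (u, v), c in y.items():  # an undirected edge gives capacity both ways
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v][u] = cap[v].get(u, 0) + c
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth-first search in the residual graph
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, set(parent)  # nodes reachable from s form the cut side
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        value += bottleneck


def separate(nodes, y):
    """Return a node set S with y(δ(S)) < 2, or None if (1.3) holds for y."""
    nodes = list(nodes)
    s = nodes[0]
    best = None
    for t in nodes[1:]:
        value, S = min_st_cut(nodes, y, s, t)
        if best is None or value < best[0]:
            best = (value, S)
    if best is not None and best[0] < 2:
        return best[1]
    return None
```

When the routine returns a set S, the inequality x(δ(S)) ≥ 2 is a cut constraint violated by ȳ; when it returns None, ȳ satisfies all of (1.3).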
In [10], Fonlupt and Naddef studied the graphs for which the polyhedron
given by the inequalities (1.1) and (1.3) is the convex hull of the tours of G. (Here
a tour is a cycle going through each node at least once.) This is in connection with
the graphical traveling salesman problem. They gave a characterization of these
graphs in terms of excluded minors. (A minor of a graph G is a graph obtained
from G by deletions and contractions of edges.) A natural question that may arise
here is whether or not one can obtain a similar characterization for perfectly-
TEC graphs. The answer to this question is, unfortunately, in the negative. If we
add the constraints x(e) ≤ 1 for all e ∈ E, the approach developed by Fonlupt
and Naddef [10] would not be appropriate. In fact, consider a non perfectly-TEC
graph G (for instance a complete graph on four nodes or more). Subdivide G
by inserting a new node of degree 2 on each edge and let G′ = (V′, E′) be the
resulting graph. Clearly, each edge e of G′ belongs to a 2-edge cutset. So x(e) = 1
for all e ∈ E′ is the unique solution of P(G′) and hence G′ is perfectly-TEC.
However, the graph G, which is a minor of G′, is not.
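
The subdivision argument is easy to check on the complete graph with four nodes. In the sketch below, the helper names subdivide and degree and the labels ("mid", i) for the inserted nodes are assumptions made for illustration. After subdivision every edge is incident to a node of degree 2, so it lies in a node cut with exactly two edges; the cut constraint for that node together with x(e) ≤ 1 then forces both edges to take value 1.

```python
from itertools import combinations


def subdivide(edges):
    """Insert a new node of degree 2 on every edge (u, v)."""
    new_edges = []
    for idx, (u, v) in enumerate(edges):
        w = ("mid", idx)  # hypothetical label for the inserted node
        new_edges += [(u, w), (w, v)]
    return new_edges


def degree(edges, node):
    """Number of edge ends at the given node."""
    return sum((u == node) + (v == node) for u, v in edges)


k4 = list(combinations(range(4), 2))  # the six edges of K4
g_prime = subdivide(k4)
# True when every edge of the subdivided graph touches a node of degree 2.
forced = all(degree(g_prime, u) == 2 or degree(g_prime, v) == 2
             for u, v in g_prime)
```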
In this paper we study the fractional extreme points of the polytope P(G).
We introduce an ordering on the extreme points of P(G) and give necessary
conditions for a non-integer extreme point of P(G) to be minimal with respect
to that ordering. We show that if x̄ is a minimal non-integer extreme point of
P(G), then G and x̄ can be reduced, by means of some reduction operations, to a
graph G′ and an extreme point x̄′ of P(G′), where G′ and x̄′ satisfy some simple
properties. As a consequence, we obtain a characterization of perfectly-TEC
graphs.
The polytope TECP(G) has received much attention in the last few years.
Grötschel and Monma [14] and Grötschel, Monma and Stoer [15] [16] [17] [18]
consider a more general model related to the design of minimum survivable net-
works. They discuss polyhedral aspects of this model, devise cutting plane
algorithms, and present computational results. A complete survey of
that model can be found in Stoer [24]. In [3] [4] Chopra studies the minimum k-
edge connected spanning subgraph problem when multiple copies of an edge may
be used. In [2], Barahona and Mahjoub characterize the polytope TECP(G) for
the class of Halin graphs. Baïou and Mahjoub [1] characterize the Steiner 2-edge
connected subgraph polytope for series-parallel graphs.
Related work can be found in [5] [6] [7]. In [5] Cornuéjols, Fonlupt and Naddef
study the TECSP when multiple copies of an edge may be used. They showed
that when the graph is series-parallel, the associated polyhedron is completely
described by inequalities (1.1) and (1.3). In [6] [7] Coullard et al. study the
polytope, the extreme points of which are the edge-sets of the Steiner 2-node
connected subgraphs of G.
This paper is organized as follows. In the next section we give more notation
and definitions and present some preliminary results. In Section 3 we study an
ordering on the extreme points of P (G). In Section 4 we discuss some structural
properties of the extreme points of P (G). In Section 5 we prove our main result.

2 Notation, Definitions, and Preliminary Results

We consider finite, undirected and loopless 2-edge connected graphs, which may
have multiple edges. We denote a graph by G = (V, E) where V is the node set
and E is the edge set of G. If e is an edge with endnodes u and v, then we write
e = uv. If u is the endnode of an edge e, we say that u (resp. e) is incident to e
(resp. u). If W ⊆ V , we denote by G(W ) the subgraph of G induced by W and
by E(W ) the set of edges having both endnodes in W .
Given a solution x ∈ ℝ^E of P(G) we denote by E0(x), E1(x), Ef(x) the sets
of edges e such that x(e) = 0, x(e) = 1, 0 < x(e) < 1, respectively. A cut δ(S)
is said to be tight for x if x(δ(S)) = 2. A node v ∈ V is called tight for x if the
cut δ(v) is tight for x. We will denote by τ(x) the set of tight cuts for x. A cut
δ(W) will be called proper if |W| > 2 and |W̄| > 2. Given a polyhedron P, we
denote by dim(P) the dimension of P.
Let G = (V, E) be a graph. Two cuts δ(W1) and δ(W2) of G are said to be
crossing if W1 ∩ W2 ≠ ∅, W1 ⊄ W2, W2 ⊄ W1 and V \ (W1 ∪ W2) ≠ ∅. A
family of cuts {δ(W1), . . . , δ(Wk)} is said to be laminar if δ(W1), . . . , δ(Wk) are
pairwise non-crossing.
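
The two definitions translate directly into set operations. In this small sketch the function names crossing and laminar are assumptions, each cut δ(W) is represented by its node set W, and "⊄" is read as "is not a subset of", so that two equal sets do not cross.

```python
from itertools import combinations


def crossing(W1, W2, V):
    """True if the cuts δ(W1) and δ(W2) of a graph with node set V cross."""
    W1, W2, V = set(W1), set(W2), set(V)
    return (bool(W1 & W2)            # W1 ∩ W2 is non-empty
            and not W1 <= W2         # W1 is not contained in W2
            and not W2 <= W1         # W2 is not contained in W1
            and bool(V - (W1 | W2)))  # some node lies outside W1 ∪ W2


def laminar(family, V):
    """True if the cuts given by the node sets in family are pairwise non-crossing."""
    return not any(crossing(A, B, V) for A, B in combinations(family, 2))
```

For V = {0, ..., 4}, the sets {0, 1} and {1, 2} cross, whereas {0, 1, 2} and {2, 3, 4} do not, because their union is all of V.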

If x ∈ ℝ^E is a solution of P(G) and δ(W) is a cut tight for x, we let

τ(W, x) = {δ(S) ∈ τ(x) | S ⊂ W},
τ(W̄, x) = {δ(S) ∈ τ(x) | S ⊂ W̄}.

In [5] Cornuéjols, Fonlupt and Naddef showed that if x is an extreme point of
P(G) and δ(W) is a cut tight for x, then x can be characterized by a system of
equations whose proper tight cuts do not cross δ(W).
Lemma 2.1. [5] Let x be an extreme point of P(G) and δ(W) a cut of G
tight for x. Then x is the unique solution of the system

\[
(2.1)\qquad
\left\{
\begin{array}{ll}
x(e) = 0 & \forall\, e \in E_0(x),\\
x(e) = 1 & \forall\, e \in E_1(x),\\
x(\delta(W)) = 2, & \\
x(\delta(S)) = 2 & \forall\, \delta(S) \in \tau(W, x),\\
x(\delta(S)) = 2 & \forall\, \delta(S) \in \tau(\overline{W}, x).
\end{array}
\right.
\]

This lemma implies that any tight cut that crosses δ(W) induces a constraint
which is redundant with respect to system (2.1). Note that system (2.1) may
contain redundant equalities.
If x is a vector of ℝ^F where F ⊆ E and T is a subset of F, then xT
will denote the restriction of x to T. If S ⊂ V, we let Γ(S) = E(S) ∪ δ(S). If
W1, W2 ⊂ V, (W1, W2) will denote the set of edges between W1 and W2.
Throughout this paper x̄ will denote a non-integer extreme point of P(G).
All the structures (polytopes, affine subspaces, . . .) that will be considered in the
sequel will refer to x̄. So in the rest of the paper we will write τ(W), τ(W̄)
instead of τ(W, x̄), τ(W̄, x̄).
Let δ(W) be a cut of G tight for x̄. Let L0 be the affine subspace of ℝ^{Γ(W)}
given by the constraints

\[
L_0 = \left\{ x \in \mathbb{R}^{\Gamma(W)} :
\begin{array}{ll}
x(e) = 0 & \forall\, e \in E_0(\bar{x}) \cap \Gamma(W),\\
x(e) = 1 & \forall\, e \in E_1(\bar{x}) \cap \Gamma(W)
\end{array}
\right\}.
\]

Let P(W) ⊆ L0 be the polytope given by

P(W) = {x ∈ L0 | 0 ≤ x(e) ≤ 1 ∀ e ∈ Γ(W),
x(δ(S)) ≥ 2 ∀ S ⊂ W, S ≠ ∅}.

And let L(W) be the affine subspace of L0 defined as

L(W) = {x ∈ L0 | x(δ(S)) = 2 ∀ δ(S) ∈ τ(W)}.

In a similar way we define P(W̄) and L(W̄). Let Lp(W) (resp. Lp(W̄)) be the
projection of L(W) (resp. L(W̄)) onto ℝ^{δ(W)}. Let

H(W) = {x ∈ ℝ^{δ(W)} | x(δ(W)) = 2},
H⁺(W) = {x ∈ ℝ^{δ(W)} | x(δ(W)) ≥ 2}.

We have the following lemma, which is nothing but a reformulation of Lemma 2.1.
Lemma 2.2.
i) Lp(W) ∩ Lp(W̄) ∩ H(W) = x̄δ(W).
ii) The maps L(W) −→ Lp(W) and L(W̄) −→ Lp(W̄) are bijections.

3 Critical Extreme Points of P (G)

In this section we introduce the concept of critical extreme points and give our
main result.
Suppose that x̄ is a non-integer extreme point of P(G). Let x̄′ be the solution
obtained by replacing some (but at least one) non-integer components of x̄ by 0
or 1 (and keeping all the other components of x̄ unchanged). Assume that x̄′ is
feasible for P(G). Then x̄′ can be written as a convex combination of extreme
points of P(G). If ȳ is such an extreme point, then we shall say that x̄ dominates
ȳ and we shall write x̄ ≻ ȳ. Note that if x̄′ is itself an extreme point of P(G),
then ȳ = x̄′. Also note that Ef(ȳ) ⊂ Ef(x̄) and if x̄(e) = 0 or 1 for some edge
e ∈ E, then ȳ(e) = x̄(e). Moreover, if a cut δ(W) is tight for both x̄ and x̄′,
then δ(W) is also tight for ȳ. There may, however, exist cuts which are tight for x̄
(resp. ȳ) but not tight for ȳ (resp. x̄).
It is clear that the relation ≻ defines a partial ordering on the extreme points
of P(G). The minimal elements of this ordering (i.e. the extreme points x for
which there is no extreme point y such that x ≻ y) correspond to the integer
extreme points of P(G). In what follows we define, in a recursive way, a rank
function on the set of extreme points of P(G).
The minimal extreme points of P(G) will be called extreme points of rank 0.
An extreme point x of P(G) will be said to be of rank k, for fixed k, if x dominates
only extreme points of rank ≤ k − 1 and if it dominates at least one extreme point
of P(G) of rank k − 1.
In [20], Mahjoub introduced the following operations:

θ1 : Delete an edge.
θ2 : Contract an edge both of whose endnodes are of degree two.
θ3 : Contract a node subset W such that G(W) is 2-edge connected.

He proved the following result:

Theorem 3.1. [20] If G = (V, E) is perfectly-TEC and G′ is a graph obtained
by repeated applications of the operations θ1, θ2, θ3, then G′ is perfectly-TEC.
Now we are going to consider somewhat similar operations but defined with
respect to a given solution x̄ of P(G).

θ′1 : Delete an edge e with x̄(e) = 0.
θ′2 : Contract an edge e one of whose endnodes is of degree 2.
θ′3 : Contract a node subset W such that G(W) is 2-edge connected and
x̄(e) = 1 for all e ∈ E(W).

Starting from a graph G = (V, E) and a solution x̄ of P(G) and applying
operations θ′1, θ′2, θ′3 we obtain a reduced graph G′ = (V′, E′) and a solution
x̄′ ∈ P(G′). The following lemmas establish the relation between x̄ and x̄′. The
proof of the first lemma is omitted because it is similar to that of Theorem 3.1.
Lemma 3.2. x̄ is an extreme point of P(G) if and only if x̄′ is an extreme
point of P(G′).
Lemma 3.3. x̄ is an extreme point of P(G) of rank 1 if and only if x̄′ is an
extreme point of P(G′) of rank 1.
Proof. Suppose that x̄ is an extreme point of P(G) of rank 1. By Lemma
3.2, x̄′ is an extreme point of P(G′). If x̄′ is not of rank 1, there must exist a
fractional extreme point x̄″ of P(G′) that is dominated by x̄′. Now let ȳ be the
solution such that

\[
\bar{y}(e) =
\begin{cases}
\bar{x}''(e) & \text{if } e \in E',\\
1 & \text{if } e \in E_1(\bar{x}) \setminus E',\\
0 & \text{if } e \in E_0(\bar{x}) \setminus E'.
\end{cases}
\]

By Lemma 3.2, ȳ is an extreme point of P(G). Moreover, x̄ dominates ȳ. As ȳ
is fractional, this is a contradiction. The converse can be proved similarly.
An extreme point of P(G) will be called critical if it is of rank 1 and if
none of the operations θ′1, θ′2, θ′3 can be applied to it. Thus by Lemma 3.3, the
characterization of the extreme points of rank 1 reduces to that of the critical
extreme points of P(G). In the rest of the paper we restrict ourselves to this
type of extreme points.
Definition 3.4. Given a graph G = (V, E) and a solution x ∈ P (G), the
pair (G, x) will be called a basic pair if the following hold:

1. V = V1 ∪ V2 with V1 ∩ V2 = ∅,
E = E1 ∪ E2 with E1 ∩ E2 = ∅ ,
G1 = (V1 , E1 ) is an odd cycle,
G2 = (V1 ∪ V2, E2) is a forest whose set of pendant nodes is V1,
and such that all the nodes of V2 have degree at least 3.
2. x(e) = 1/2 for e ∈ E1 and x(e) = 1 for e ∈ E2.
3. For any proper cut δ(W ) of G, x(δ(W )) > 2.
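
As an illustration of Definition 3.4, consider a wheel whose rim is an odd cycle with value 1/2 on each edge and whose spokes have value 1. The sketch below is a brute-force check, under the assumptions that the helper names wheel_pair, cut_value and check_cut_conditions are ours and that the graph is simple. It verifies the cut constraints of P(G) and condition 3 by enumerating all node subsets; it does not verify the structural condition 1, which holds for a wheel by construction.

```python
from fractions import Fraction
from itertools import combinations


def wheel_pair(n):
    """Wheel with rim nodes 0..n-1 (n odd, n >= 3) and hub "h".

    Rim edges get value 1/2 (the odd cycle G1), spokes get value 1
    (the forest G2 is a star centred at the hub).
    """
    x = {(i, (i + 1) % n): Fraction(1, 2) for i in range(n)}
    x.update({(i, "h"): Fraction(1) for i in range(n)})
    return list(range(n)) + ["h"], x


def cut_value(x, S):
    """x(δ(S)): total value of the edges with exactly one endnode in S."""
    return sum(val for (u, v), val in x.items() if (u in S) != (v in S))


def check_cut_conditions(nodes, x):
    """True if x(δ(W)) >= 2 for every cut and x(δ(W)) > 2 for every proper cut."""
    n = len(nodes)
    for size in range(1, n):
        for W in combinations(nodes, size):
            val = cut_value(x, set(W))
            proper = size > 2 and n - size > 2  # |W| > 2 and |V \ W| > 2
            if val < 2 or (proper and val <= 2):
                return False
    return True
```

For the wheel with five rim nodes the check succeeds, and the five rim nodes are exactly the tight nodes. By contrast, a cycle on six nodes with all values equal to 1 fails condition 3, since three consecutive nodes define a proper cut of value 2.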

Figure 1 shows some examples of basic pairs. Figure 1 (a) is a wheel, Figure
1 (b) shows an example where one component of the forest is an edge with two
pendant nodes in V1. In Figure 1 (c) the forest is not a star. If (G, x̄) is a basic
pair, then we will say that G is a basic graph and x̄ is a basic point.
Lemma 3.5. If (G, x̄) is a basic pair, then x̄ is an extreme point of P(G).
Proof. Easy.

We state now the main result of the paper:

Theorem 3.6. If x̄ is a critical extreme point of P(G), then (G, x̄) is a basic
pair.

[Figure 1: three basic pairs, shown in panels (a), (b) and (c); each edge is labelled with its value, 1/2 or 1.]

Fig. 1. Basic pairs

Let Ω be the class of graphs G such that G can be obtained from a basic
graph by replacing some edges of E2 by paths of length two. Note that the graphs
of Ω are not perfectly-TEC. In fact, let G̃ = (Ṽ, Ẽ) and x̃ ∈ ℝ^Ẽ form a basic
pair and G = (V, E) a graph of Ω obtained from G̃ by inserting new nodes on
some edges of the forest of G̃. Let x ∈ ℝ^E be the solution of P(G) such that
x(e) = x̃(e) if e ∈ Ẽ and x(e) = 1 if e ∈ E \ Ẽ. Since, by Lemma 3.5, x̃ is an
extreme point of P(G̃), x is a fractional extreme point of P(G), and thus G is
not perfectly-TEC.
A consequence of Theorem 3.6 is the following.
Corollary 3.7. A graph G is perfectly-TEC if and only if G is not reducible
to a graph of Ω by means of the operations θ1, θ2, θ3.
Proof. Assume that G = (V, E) reduces to a graph G′ = (V′, E′) ∈ Ω by
means of the operations θ1, θ2, θ3. By the remark above, G′ is not perfectly-TEC,
and thus, by Theorem 3.1, neither is G.
Conversely, if x̄ is a fractional extreme point of P(G), then there must exist
an extreme point ȳ of P(G) of rank 1. By Lemma 3.3 together with Theorem
3.6, G and ȳ can be reduced by operations θ′1, θ′2, θ′3 to a basic pair (G′, ȳ′). If
instead of applying θ′2 we apply θ2, we obtain a graph of Ω.
The proof of Theorem 3.6 will be given in Section 5. In what follows we are
going to establish some structural properties of the critical extreme points of
P(G). These properties will be useful in the sequel.

4 Structural Properties of Critical Extreme Points

Let x̄ be a critical extreme point of P(G). We have the following lemmas; for
the proofs that are omitted, see [11].
Lemma 4.1.
i) x̄(e) > 0 for all e ∈ E.
ii) G does not contain nodes of degree 2.
iii) If W ⊆ V is a node set such that G(W) is 2-edge connected, then there exists
an edge f in E(W) with 0 < x̄(f) < 1.
iv) If δ(W) is a cut of G tight for x̄, then both G(W) and G(W̄) are connected.
Lemma 4.2. Every edge f ∈ Ef(x̄) belongs to at least two tight cuts of
system (2.1).
Lemma 4.3. Let δ(W ) be a cut tight for x̄ with W = {u, v}. Then
i) u and v are linked by exactly one edge, say f ,
ii) u and v are tight and x̄(f ) = 1,
iii) δ(W ) ⊆ Ef (x̄).
Remark 4.4. The converse of Lemma 4.3 also holds. That is, if u, v ∈ V
are two nodes satisfying i) and ii) of Lemma 4.3, then the cut δ(W ) is tight for
x̄, where W = {u, v}. This implies that the constraint x̄(δ(W )) = 2 is redundant
with respect to the equations x̄(δ(u)) = 2, x̄(δ(v)) = 2 and x̄(f ) = 1.
Lemma 4.5. Let δ(W) be a proper cut tight for x̄. If G(W) and G(W̄) are
2-edge connected, then for every two edges f, g ∈ δ(W), x̄(f) + x̄(g) ≤ 1.
Lemma 4.6. Let δ(W) be a cut tight for x̄. Then
i) δ(W) contains at least three edges,
ii) if δ(W) ∩ E1(x̄) ≠ ∅, then either δ(W) or δ(W̄) is a node cut.
Proof. i) If δ(W) is a 2-edge cutset, then we can show in a similar way as a
result in [20] that either W or W̄ is reduced to a single node. But this implies
that G contains a node of degree 2, which contradicts Lemma 4.1 ii).
ii) Suppose that δ(W) contains an edge e0 ∈ E1(x̄). By Lemma 4.3 iii), |W| ≠
2 ≠ |W̄|. So let us assume that δ(W) is a proper cut. We claim that G(W) and
G(W̄) are both 2-edge connected. Indeed, since δ(W) is tight for x̄, by Lemma
4.1 iv), G(W) and G(W̄) are both connected. Now suppose, for instance, that
G(W) is not 2-edge connected. Then there exists a partition W1, W2 of W with
(W1, W2) = {f} for some f ∈ E. W.l.o.g. we may suppose that e0 ∈ (W1, W̄). We
have x̄(δ(W)) = x̄(δ(W1)) + x̄(δ(W2)) − 2x̄(f) = 2. As x̄(δ(W1)) ≥ 2, x̄(δ(W2)) ≥
2 and x̄(f) ≤ 1, it follows that x̄(f) = 1 and δ(W1) and δ(W2) are both tight
for x̄. Hence δ(W1) = {e0, f}, contradicting i).
Consequently, both graphs G(W) and G(W̄) are 2-edge connected. Now let e1
be an edge of δ(W) \ {e0}. As x̄(e1) > 0, we have that x̄(e0) + x̄(e1) > 1. Since
δ(W) is proper, this contradicts Lemma 4.5.
Lemma 4.7. If δ(W) is a proper cut tight for x̄, then
i) E(W) ∩ Ef(x̄) ≠ ∅, E(W̄) ∩ Ef(x̄) ≠ ∅,
ii) G(W) and G(W̄) are both 2-edge connected,
iii) x̄(e) < 1 for all e ∈ δ(W),
iv) |δ(W)| ≥ 4,
v) P(W) ∩ H(W), P(W) ∩ H⁺(W), P(W̄) ∩ H(W) and P(W̄) ∩ H⁺(W) are
integer polyhedra.
Proof. i) Assume for instance that E(W) ∩ Ef(x̄) = ∅. Hence G(W) can-
not contain an induced 2-edge connected subgraph. For otherwise, G would be
reducible by θ′3 to a smaller graph, which contradicts the fact that x̄ is crit-
ical. In consequence, by Lemma 4.1 iv), it follows that G(W) is a tree. Let
u and v be two pendant nodes of this tree. As x̄(δ(u)) ≥ 2, it follows that
x̄(δ(u) ∩ δ(W)) ≥ 1. Similarly we have x̄(δ(v) ∩ δ(W)) ≥ 1. Since δ(W) is tight
for x̄, this implies that these inequalities are satisfied with equality and that
δ(W) = (δ(u) ∩ δ(W)) ∪ (δ(v) ∩ δ(W)). Thus u and v are the only pendant
nodes of G(W) and, in consequence, G(W) is a path P. Moreover, as G does
not contain nodes of degree 2, P must consist of a single edge. Thus W = {u, v},
which contradicts the fact that δ(W) is a proper cut.
ii) Suppose for instance that G(W ) is not 2-edge connected. As by Lemma
4.1 iv) G(W ) is connected, there exists a partition W1 , W2 of W and an edge f ∈
E(W ) such that (W1 , W2 ) = {f }. We have x̄(δ(W )) = x̄(δ(W1 )) + x̄(δ(W2 )) −
2x̄(f ) = 2. As x̄(δ(W1 )) ≥ 2, x̄(δ(W2 )) ≥ 2 and x̄(f ) ≤ 1, it follows that
x̄(f ) = 1 and δ(W1 ) and δ(W2 ) are both tight for x̄. By Lemma 4.6 ii) this
implies that |W1| = |W2| = 1. Hence |W| = 2, a contradiction.
iii) If x̄(e0 ) = 1 for some e0 ∈ δ(W ), as x̄(e) > 0 for all e ∈ δ(W ), by ii)
together with Lemma 4.5 it follows that x̄(e) = 0 for all e ∈ δ(W ) \ {e0 }. Thus
x̄(δ(W )) < 2, which is impossible.
iv) This is a consequence of ii), iii), Lemma 4.5 and the fact that δ(W ) is
tight for x̄.
v) Suppose, for instance, that P(W) ∩ H⁺(W) is not integer and let y ∈ ℝ^{Γ(W)}
be a fractional extreme point of P(W) ∩ H⁺(W). Let x′ ∈ ℝ^E be the solution
such that
\[
x'(e) = \begin{cases} y(e) & \text{if } e \in \Gamma(W), \\ 1 & \text{if } e \in E(\overline{W}). \end{cases}
\]
Since by ii) G(W̄) is 2-edge connected, x′ is a solution of P(G). Moreover, as x′
is the unique solution of the system given by the constraints defining y together
with x(e) = 1 for all e ∈ E(W̄), x′ is an extreme point of P(G). Furthermore,
as E(W̄) ∩ Ef(x̄) ≠ ∅ we have that x̄ ≻ x′. As x′ is fractional this contradicts
the fact that x̄ is critical.
Thus P(W) ∩ H⁺(W) is integer. Since P(W) ∩ H(W) is a face of P(W) ∩
H⁺(W), it is also integer. The proof for P(W̄) ∩ H(W) and P(W̄) ∩ H⁺(W) is
similar.

5 Proof of Theorem 3.6


In this section we shall prove our main result. To this end, we first establish
the following result, whose proof uses elements of convex analysis. For the basic
definitions the reader is referred to Rockafellar [22].
Proposition 5.1. If x̄ is critical, then the set of proper tight cuts for x̄ is
empty.
Proof. Assume the contrary and let δ(S) be a proper cut tight for x̄. We
may suppose that |S| ≤ |S̄| and that |S| is minimum, that is |S| ≤ min(|S′|, |S̄′|)
for any proper cut δ(S′) tight for x̄. We have the following claims.
Claim 1.
i) x̄(e) = 1/2 for all e ∈ Ef(x̄),
ii) |δ(Z)| = 4 for every proper cut δ(Z) tight for x̄.
Proof. See [11].
Now let δ(W) be a proper tight cut for x̄. Then by Claim 1 there are four
edges e1, ..., e4 such that δ(W) = {e1, e2, e3, e4}. As a consequence, the
subspace H(W) = {x ∈ ℝ^{δ(W)} | Σ_{i=1,...,4} x(ei) = 2} contains six points with two
components equal to 1 and two components equal to zero. Let x1, ..., x6 be
those points. As x̄(ei) = 1/2 for i = 1, ..., 4, by a suitable ordering we can assume
that x̄_{δ(W)} = 1/2 (xi + xi+3) for i = 1, 2, 3. Let us denote by Li the line (subspace
of dimension 1) generated by the points xi, xi+3 for i = 1, 2, 3 and by Pli the
plane (subspace of dimension 2) generated by the lines Lj, Lk; j ≠ i ≠ k.
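The six points and their pairing can be checked by direct enumeration. The following is a minimal sketch, not taken from the paper; the function names are hypothetical and the edges e1, ..., e4 are simply identified with coordinates 0, ..., 3. It lists the 0/1 vectors with exactly two ones, groups each with its componentwise complement, and confirms that every such pair has the all-1/2 vector as its midpoint.

```python
from fractions import Fraction
from itertools import combinations


def two_of_four_points(n=4, ones=2):
    """All 0/1 vectors of length n with exactly `ones` entries equal to 1."""
    return [tuple(1 if i in support else 0 for i in range(n))
            for support in combinations(range(n), ones)]


def complementary_pairs(points):
    """Group the points into pairs (p, q) with q = 1 - p componentwise."""
    pairs, seen = [], set()
    for p in points:
        q = tuple(1 - c for c in p)
        if p not in seen:
            pairs.append((p, q))
            seen.update((p, q))
    return pairs


points = two_of_four_points()
pairs = complementary_pairs(points)

# The restriction of the fractional point to the cut: 1/2 on each of the 4 edges.
half = (Fraction(1, 2),) * 4
midpoints = [tuple(Fraction(a + b, 2) for a, b in zip(p, q)) for p, q in pairs]
```

Each pair plays the role of (xi, xi+3) in the text, so the three pairs correspond to the three lines L1, L2, L3 through the fractional point.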
Claim 2.
i) A line Li, i = 1, 2, 3 cannot belong to both Lp(W) and Lp(W̄).
ii) Lp(W) ∩ H(W) (Lp(W̄) ∩ H(W)) is either a line Li or a plane Pli, for some
i ∈ {1, 2, 3}.
Proof. i) As the lines Li , i = 1, 2, 3 are contained in H(W ), the statement
follows from Lemma 2.2 i).
ii) The polyhedron P(W) ∩ L(W) ∩ H(W) is a face of the polyhedron P(W) ∩
H⁺(W) and is therefore integer. Let M(W) be the projection of P(W) ∩ L(W) ∩
H(W) onto ℝ^{δ(W)}. Hence M(W) is integer and M(W) ⊆ (Lp(W) ∩ H(W)).
On the other hand, P(W) ∩ L(W) ∩ H(W) is the smallest face of P(W) ∩
H⁺(W) containing x̄_{Γ(W)}. As x̄_{δ(W)} ∈ M(W) and x̄_{δ(W)} is fractional, M(W)
must contain at least two points xi, xi+3 for some i ∈ {1, 2, 3} (for otherwise
x̄_{δ(W)} could not be written as a convex combination of integer extreme points
of M(W)). Thus Li ⊂ M(W) and hence Li ⊂ (Lp(W) ∩ H(W)). Similarly there
is a line Lj, j ≠ i, such that Lj ⊂ (Lp(W̄) ∩ H(W)).
Furthermore, M(W) cannot contain the six points x1, ..., x6. For otherwise one
would have Lp(W) ∩ H(W) = H(W), which implies, by Lemma 2.2 i), that
Lp(W̄) ∩ H(W) = {x̄_{δ(W)}}. But this contradicts the fact that Lj ⊂ (Lp(W̄) ∩
H(W)). Thus M(W) has either two or four points among x1, ..., x6, implying
that either Lp(W) ∩ H(W) = Li or Lp(W) ∩ H(W) = Pli.
Similarly the statement holds for Lp(W̄) ∩ H(W).
Using Claim 2 we can show the following
Claim 3. If dim (Lp (W ) ∩ H(W )) = 2, then δ(W ) does not cross any tight
cut for x̄.
Proof. See [11].
By Claim 1, it follows that any cut δ(W) (proper or not) tight for x̄ contains
either three or four edges. Thus dim(Lp(W) ∩ H(W)) ≤ 3. In what follows we
will be mainly interested in the cases where the description of Lp(W) ∩ H(W)
contains either one or two nontrivial equations.
If Lp(W) ∩ H(W) contains only one nontrivial equation, say ⟨a, x⟩ = b, then
this constraint corresponds to the cut equation x(δ(W)) = 2. If we consider the
solution y ∈ ℝ^{Γ(W)} such that y(e) = x̄(e) + ε, for ε > 0, if e is a fractional
edge of δ(W) and y(e) = 1 if not, then y is a solution of P(W). Moreover the
projection of y onto ℝ^{δ(W)}, y_{δ(W)}, verifies ⟨a, x⟩ > b. In this case we will say
that ⟨a, x⟩ = b is relaxable.
Now suppose that the description of Lp(W) ∩ H(W) is given by two nontrivial
constraints
⟨a1, x⟩ = b1,
⟨a2, x⟩ = b2.
Note that in this case Lp(W) ∩ H(W) is a plane.
A constraint ⟨ai, x⟩ = bi, i ∈ {1, 2} is said to be relaxable if there exists an
affine subspace L′(W) ⊂ ℝ^{Γ(W)} containing L(W) and a solution y ∈ P(W) ∩
L′(W) whose projection onto ℝ^{δ(W)} verifies
⟨ai, x⟩ > bi,
⟨aj, x⟩ = bj,
where {j} = {1, 2} \ {i}.


Now suppose that δ(W) is a proper cut tight for x̄. Suppose that the cuts
of τ(W) form a laminar family. A cut δ(T) of τ(W) is said to be maximal with
respect to W if there is no tight cut δ(Z) of τ(W) with T ⊂ Z ⊂ W. Let
δ(W1), ..., δ(Wr) be the tight cuts maximal with respect to W. Let
\[ E' = \bigcup_{1 \le i \le r} \delta(W_i) \cup \delta(W). \]
In what follows we will denote by Lq(W) the projection of L(W) onto ℝ^{E′}.
Claim 4. i) Lp(W) is the projection of Lq(W) onto ℝ^{δ(W)}.
ii) A solution x ∈ ℝ^{E′} belongs to Lq(W) if and only if x_{δ(Wi)} ∈ Lp(Wi) ∩ H(Wi)
for i = 1, ..., r.
Proof. See [11].
By Claim 4, if Lq(W) is given by a system of the form
Lq(W) = {x ∈ ℝ^{E′} | ⟨Ai, x⟩ = bi; i ∈ I},
where I is a set of indices, then there exists a partition I1, ..., Ir of I such that
Lp(Wi) ∩ H(Wi) = {x ∈ ℝ^{Γ(Wi)} | ⟨Aj, x⟩ = bj; j ∈ Ii},
for i = 1, ..., r. Let ⟨Ai0, x⟩ = bi0 be an equation with i0 ∈ I. Suppose that
⟨Ai0, x⟩ = bi0 is relaxable. Consider the affine subspace
L′q(W) = {x ∈ ℝ^{Γ(W)} | ⟨Ai, x⟩ = bi, i ∈ I \ {i0}}.
We have that Lq(W) ⊂ L′q(W). Thus there is an affine subspace L′(W) ⊂ ℝ^{Γ(W)}
containing L(W) whose projection onto ℝ^{E′} is L′q(W). Moreover there is a point
y ∈ P(W) ∩ L′(W) whose projection onto ℝ^{E′} verifies
⟨Ai0, y⟩ > bi0,
⟨Ai, y⟩ = bi, i ∈ I \ {i0}.
We say that the relaxation of ⟨Ai0, x⟩ = bi0 is valid if at least one of such
points satisfies the constraint x(δ(W)) ≥ 2.
Remark 5.2. If Lq(W) ⊄ H(W), then y can be chosen so that y(δ(W)) = 2,
and thus the relaxation of ⟨Ai0, x⟩ = bi0 is valid.
Now suppose that Lp(W) ∩ H(W) is a plane. So w.l.o.g. we may suppose that
Lp(W) ∩ H(W) = Pl1 and therefore, a description of Lp(W) ∩ H(W) is given by
system (5.3). The cut δ(W) is said to be good for W if dim(Lp(W) ∩ H(W)) = 2
and if in the description given by (5.3), at least one of the equations has a valid
relaxation.
Now we turn to the crucial point in the proof.
Claim 5. All the proper cuts tight for x̄ are good.
Proof. Suppose for instance that δ(W) is not good for W. We may suppose
that |W | is minimum, that is, all the proper tight cuts δ(Z) with Z ⊂ W are
good for Z. Consequently, by Claim 3, these cuts form a laminar family. Let
δ(W1 ), . . . , δ(Wr ) be the maximal tight cuts of W . Let E 0 and Lq (W ) be as
defined in Claim 4. In what follows we are going to give a description of the affine
subspace Lq (W ). For this, let us first note that, by Claim 1, either |δ(Wi )| = 3
or |δ(Wi )| = 4, for i = 1, . . . , r.
If |Wi| = 1 and δ(Wi) = {e1, e2, e3, e4} (resp. δ(Wi) = {e1, e2, e3} with
e3 ∈ E1(x̄)), then δ(Wi) produces in Lq(W) the constraint:
x(e1) + x(e2) + x(e3) + x(e4) = 2, (5.4)
(resp. x(e1) + x(e2) = 1). (5.5)
Note here that Lp(Wi) ∩ H(Wi) is given by equation (5.4) (resp. (5.5) together
with x(e3) = 1), and thus (5.4) (resp. (5.5)) is relaxable.
If Wi = {u, v} for some nodes u, v ∈ V , by Lemma 4.3 i) we have that
uv ∈ E, x̄(uv) = 1 and δ(u) and δ(v) are both tight for x̄. This implies that
x̄(δ(Wi )) = 2 is redundant with respect to x̄(δ(u)) = 2, x̄(δ(v)) = 2 and x̄(uv) =
1. And thus δ(Wi ) produces two equations of type (5.5).
If δ(Wi ) is a proper cut, then by our minimality assumption, δ(Wi ) is a
good cut, and by Claim 1 ii), |δ(Wi )| = 4. Moreover if, for instance, δ(Wi ) =
{e1 , e2 , e3 , e4 }, we may suppose, w.l.o.g. that the plane Lp (Wi ) ∩ H(Wi ) is given
by system (5.3), and that at least one of the constraints of that system has a
valid relaxation.
Let Ax = b be the system given by the constraints of type (5.3), (5.4), (5.5).
Let k be the number of constraints of this system. Let E* = E′ ∩ Ef(x̄). Let
L*q(W) be the projection of L(W) onto ℝ^{E*}. Note that the projection of L*q(W)
onto ℝ^{δ(W)} is Lp(W). Also note that the maps L(W) → Lq(W) → L*q(W) →
Lp(W) are bijections. By Claim 4 it follows that
L*q(W) = {x ∈ ℝ^{E*}; Ax = b}.

Now remark that an edge e ∈ E* \ δ(W) belongs to at most two maximal
tight cuts δ(Wi), and an edge e of δ(W) belongs to at most one maximal tight
cut δ(Wi). Thus x(e) appears in at most two rows (resp. one row) of Ax = b, if
e ∈ E* \ δ(W) (resp. e ∈ δ(W)). In what follows we are going to define connected
components of A in a similar way as for graphs. We say that two edges f and g
of E* (or the corresponding columns of A) are connected if
1) there exists a sequence f1, ..., fq (the fi may be the same) with f1 = f
and fq = g such that x(fi) and x(fi+1) appear in a row j of Ax = b for i =
1, ..., q − 1, and
2) the set {f1, ..., fq} is maximal with respect to 1).
Let E1 be a connected component of the column set of A, and let I1 be the set of
indices i ∈ {1, ..., k} such that some x(f) with f ∈ E1 appears in the equation
⟨Ai, x⟩ = bi. (The submatrix A^{E1}_{I1} is a block of A.) A point y ∈ ℝ^{E*} is a
solution of Ax = b if and only if y_{E1} is a solution of the system A^{E1}_{I1} x = b_{I1} and
y_{E*\E1} is a solution of A^{E2}_{I2} x = b_{I2}, where E2 = E* \ E1 and I2 = {1, ..., k} \ I1.

We claim that E1 ∩ δ(W) ≠ ∅. Indeed, as by Lemma 4.7 v) P(W) ∩ H(W) is
integer and therefore so is L*q(W) ∩ H(W), the system {Ax = b, x(δ(W)) = 2}
has an integer solution, y say. Suppose that E1 ∩ δ(W) = ∅. Define x′ ∈ ℝ^{E′} as
x′_{E1} = y_{E1} and x′_{E′\E1} = x̄_{E′\E1}. Clearly, x′ ∈ Lq(W). Since x̄_{E1} is fractional,
we have that x′ ≠ x̄_{E′}. Moreover, x′ and x̄_{E′} have the same projection x̄_{δ(W)}
onto ℝ^{δ(W)}. But this contradicts the fact that the projection Lq(W) → Lp(W)
is a bijection.
A block A^{E1}_{I1} of A will be called active if the projection of the affine subspace
{x ∈ ℝ^{E*} | ⟨Ai, x⟩ = bi; i ∈ I1} onto ℝ^{δ(W)} is contained in a hyperplane
of the form
\[ H = \{x \in \mathbb{R}^{\delta(W)} \mid \sum_{i=1,\dots,4} c(e_i)\,x(e_i) = \beta\}. \]
In other words, a block A^{E1}_{I1} is active if an equation of the form
Σ_{i=1,...,4} c(ei)x(ei) = β is redundant with respect to that block. In that case
we will say that the hyperplane H is produced by A^{E1}_{I1}.
Since the equation Σ_{i=1,...,4} c(ei)x(ei) = β can be obtained as a linear combination
of the equations ⟨Ai, x⟩ = bi, i ∈ I1, there must exist a vector u = (u1, ..., u|I1|)
such that
\[ \sum_{i=1,\dots,4} c(e_i)\,x(e_i) - \beta = \sum_{i \in I_1} u_i\,(\langle A_i, x\rangle - b_i). \]
As u ≠ 0, we may suppose, w.l.o.g., that ui0 = 1 for some i0 ∈ I1. As
Σ_{i∈I1} ui Ai(f) = 0 for every f ∈ E1 \ δ(W), for every row i ∈ I1 one should
have either ui = +1 or ui = −1 in such a way that if an edge f of E1 \ δ(W)
appears in two rows i and j, then ui + uj = 0. Moreover, starting from ui0 = +1,
the coefficients ui, i ∈ I1 \ {i0} are determined in a unique manner. As c(ei) ≠ 0
for at least one edge ei ∈ δ(W), this implies that the rows i ∈ I1 are linearly
independent and that at most one hyperplane may be produced by an active block.
Let t = |E1 ∩ δ(W)| and L1 = {x ∈ ℝ^{E1∪δ(W)} | ⟨Ai, x⟩ = bi, i ∈ I1}.
As the map Lq(W) → Lp(W) is a bijection, it follows that so is the projection of
L1 onto ℝ^{δ(W)}. Since the projection of L1 onto ℝ^{δ(W)} is a hyperplane,
it follows that dim(L1) = 3. Also, as A^{E1}_{I1} x = b_{I1} is a non-redundant system, it
follows that |E1 ∪ δ(W)| = |I1| + 3. Hence
|E1| = |I1| + t − 1. (5.6)
Let l be the number of constraints i ∈ I1 of type (5.4). (Note that for this type
of constraints we have bi = 2.) We then have
Σ_{i∈I1} ⟨Ai, x̄⟩ = |I1| + l. (5.7)
On the other hand, as x̄(e) = 1/2 for all e ∈ Ef(x̄), and every column corresponding
to an edge of E1 \ δ(W) (resp. E1 ∩ δ(W)) contains exactly two 1's (resp.
one 1), we have
Σ_{i∈I1} ⟨Ai, x̄⟩ = |E1| − t + t/2 = |E1| − t/2. (5.8)
Combining (5.6)-(5.8) we get |I1| + l = |I1| + t − 1 − t/2, that is, l = t/2 − 1.
Thus t is even, and therefore either t = 2
(and l = 0) or t = 4 (and l = 1). As Lp(W) is the projection of Lq(W) onto
ℝ^{δ(W)}, each hyperplane in the description of Lp(W) is produced by one active
block. We distinguish three cases:
Case 1) There is only one active block A^{E1}_{I1} with |E1 ∩ δ(W)| = 2, and thus
dim(Lp(W)) = 3.
Case 2) There is only one active block A^{E1}_{I1} with |E1 ∩ δ(W)| = 4. We claim
that in this case E1 = E* and I1 = {1, ..., k}. In fact, if there is a further
block A^{E2}_{I2}, then as E1 ∩ δ(W) ≠ ∅ ≠ E2 ∩ δ(W) there must exist an edge
g ∈ E1 ∩ E2 ∩ δ(W), which is impossible. Note that here there is a constraint of
type (5.4) (l = 1) and, as in Case 1), dim(Lp(W)) = 3.
Case 3) There are two active blocks A^{E1}_{I1}, A^{E2}_{I2} with I1 ∪ I2 = {1, ..., k}, E1 ∪
E2 = E*, |E1 ∩ δ(W)| = |E2 ∩ δ(W)| = 2 and dim(Lp(W)) = 2. Note that in
this case, there are no constraints of type (5.4) (l = 0).
If Case 1 holds, then Lp(W) ∩ H(W) is a plane, which, by Claim 2, may be
supposed to be given by system (5.3). Thus there are two scalars λ, µ ∈ ℝ such
that
\[
\begin{aligned}
L_p(W) &= \{x \in \mathbb{R}^{\delta(W)} \mid \lambda(x(e_1)+x(e_2)-1) + \mu(x(e_3)+x(e_4)-1) = 0\} \\
       &= \{x \in \mathbb{R}^{\delta(W)} \mid \sum_{i=1,\dots,4} c(e_i)\,x(e_i) = \beta\}.
\end{aligned}
\]
As |E1 ∩ δ(W)| = 2, two coefficients among c(e1), ..., c(e4) must be equal to zero.
In consequence, we may assume, w.l.o.g., that Lp(W) = {x ∈ ℝ^{δ(W)} | x(e1) +
x(e2) = 1}. As Lp(W) ∩ H(W) ⊂ Lp(W), there must exist a solution y′ ∈
Lp(W) \ H(W). Moreover, y′ can be chosen so that y′(e3) + y′(e4) > 1 and
y′(e1) + y′(e2) = 1. Hence y′ is the projection of a point y of P(W) ∩ L′(W) onto
ℝ^{δ(W)} where L′(W) is an affine subspace containing L(W). As y(δ(W)) ≥ 2, the
constraint x(e3) + x(e4) = 1 has a valid relaxation. But this implies that δ(W)
is good, a contradiction.
Now let us examine the two remaining cases. For this, one can first show that
k + l is even. And as either l = 0 or l = 1, Ax = b contains at least one constraint
of type (5.5) or (5.3). So we may suppose, for instance, that the first constraint
of Ax = b is of the form ⟨A1, x⟩ = 1 and is relaxable. Thus there exists a
solution x* ∈ (P(W) ∩ L′(W)), where L′(W) is an affine subspace containing
L(W), such that
⟨A1, x*⟩ > 1,
⟨Ai, x*⟩ = bi, i = 2, ..., k.
If Case 2 holds, then dim(Lp(W)) = 3 and thus Lp(W) ⊄ H(W). For otherwise,
we would have Lp(W) = H(W) and, in consequence, Lp(W) ∩ H(W) is neither
a line nor a plane, which contradicts Claim 2 ii).
Consequently, the equation x(δ(W)) = 2 is non-redundant with respect to the
system Ax = b. By Remark 5.2, x* can be chosen so that x*(δ(W)) = 2. Moreover,
as the polyhedron P(W) ∩ L(W) ∩ H(W) is integer as well as its faces,
there exists an integer solution ȳ ∈ ℝ^{E*} such that
⟨A1, ȳ⟩ > 1,
⟨Ai, ȳ⟩ = bi, i = 2, ..., k,
Σ_{i=1,...,4} ȳ(ei) = 2.
As the first constraint is of type (5.3) or (5.5), we have ⟨A1, ȳ⟩ = 2. Thus
\[ \sum_{i=1,\dots,k} \langle A_i, \bar{y}\rangle = k + l + 1 = 2\Big(\sum_{f \in E^* \setminus \delta(W)} \bar{y}(f)\Big) + \sum_{e \in \delta(W)} \bar{y}(e). \]
But the right hand side of the second equation is even, whereas k + l + 1 is odd,
a contradiction.
Now consider Case 3. If Lp(W) ⊄ H(W), then one can get a contradiction in
a similar way as before. So suppose that Lp(W) ⊂ H(W). As dim(Lp(W)) = 2,
Lp(W) is a plane and thus may be supposed to be defined by system (5.3). Let
A^{E1}_{I1} be the block that produces the equation x(e1) + x(e2) = 1. Thus there is a
vector u such that
\[ \sum_{i \in I_1} u_i \langle A_i, x\rangle = x(e_1) + x(e_2), \qquad \sum_{i \in I_1} u_i b_i = 1. \]
Note that bi = 1 for all i ∈ I1 (since l = 0). Let I1⁺ (resp. I1⁻) be the set of
rows for which the coefficient ui is equal to +1 (resp. −1). We have Σ_{i∈I1} ui =
|I1⁺| − |I1⁻| = 1. We can define in a similar way I2⁺ and I2⁻ for the second block
A^{E2}_{I2}. Thus |I1⁺| + |I2⁺| = 2 + |I1⁻| + |I2⁻|. Note that all the constraints of Ax = b
are of type (5.3) and (5.5) and thus the number of relaxable rows is greater than
or equal to the number of rows which are not relaxable. Thus there exists an
equation ⟨Ai0, x⟩ = bi0, i0 ∈ I1⁺ ∪ I2⁺ which is relaxable. W.l.o.g. we may
assume that i0 ∈ I2⁺ and ⟨Ai0, x⟩ = bi0 is the constraint x(e3) + x(e4) = 1. So
there is a solution y′ ∈ ℝ^{E*} such that
⟨Ai0, y′⟩ > bi0,
⟨Ai, y′⟩ = bi, i = 1, ..., k, i ≠ i0.
Hence y′(e1) + y′(e2) = 1 and y′(e3) + y′(e4) > 1. Moreover y′ is the projection of a
point y of P(W) ∩ L′(W) where L′(W) is an affine subspace containing L(W). Since
y(δ(W)) ≥ 2, this implies that δ(W) is a good cut. But this is a contradiction
and our claim is proved.
Now as δ(S) is a proper cut tight for x̄, by Claim 5, δ(S) is good for S and
for S̄. Thus dim(Lp (S)∩H(S)) = 2 and dim(Lp (S̄)∩H(S)) = 2. In consequence,
the affine space Lp (S) ∩ Lp (S̄) ∩ H(S) contains one of the three lines L1 , L2 , L3 .
But this contradicts Lemma 2.2 i), which finishes the proof of the proposition.
Since x̄ is critical, x̄(e) > 0 for all e ∈ E. Let Vf(x̄) be the subset of nodes
incident to at least one edge of Ef(x̄). From Lemma 4.2 together with Proposition
5.1, it follows that every node of Vf(x̄) is tight for x̄. Let Gf = (Vf(x̄), Ef(x̄)).
We claim that Gf does not contain pendant nodes. In fact, assume the contrary
and let v0 be a pendant node of Gf. Let f0 be the edge of Gf incident to v0.
Since x̄(δ(v0)) ≥ 2, v0 must be incident to at least two edges of E1(x̄). Since
x̄(f0) > 0, this implies that v0 is not tight, a contradiction. Moreover Gf cannot
contain even cycles. In fact, if Gf contains an even cycle (e1, e2, ..., ek), then
the solution x̄′ ∈ ℝ^E such that
\[
\bar{x}'(e) = \begin{cases} \bar{x}(e) + \epsilon & \text{if } e = e_i,\ i = 1, 3, \dots, k-1, \\ \bar{x}(e) - \epsilon & \text{if } e = e_i,\ i = 2, 4, \dots, k, \\ \bar{x}(e) & \text{otherwise,} \end{cases}
\]
where ε is a sufficiently small scalar, would be a solution of system (2.1) different
from x̄, which is impossible.
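The effect of the alternating perturbation on the node constraints can be illustrated on a toy instance. The sketch below is an illustration only, with hypothetical names and a hypothetical 4-cycle; it checks that adding and subtracting ε alternately along an even cycle leaves every node sum x(δ(v)) unchanged. It does not verify the remaining constraints of system (2.1), which depend on the structure established earlier in the paper.

```python
from fractions import Fraction


def perturb_even_cycle(x, cycle_edges, eps):
    """Add eps on odd-position edges and subtract it on even-position edges."""
    if len(cycle_edges) % 2 != 0:
        raise ValueError("the cycle must have an even number of edges")
    y = dict(x)
    for idx, e in enumerate(cycle_edges):
        y[e] += eps if idx % 2 == 0 else -eps
    return y


def node_sums(x):
    """Sum of the edge values around each node, i.e. x(delta(v))."""
    sums = {}
    for (u, v), val in x.items():
        sums[u] = sums.get(u, 0) + val
        sums[v] = sums.get(v, 0) + val
    return sums


# Hypothetical example: a 4-cycle whose edges all carry the value 1/2.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
x_bar = {e: Fraction(1, 2) for e in cycle}
y = perturb_even_cycle(x_bar, cycle, Fraction(1, 10))
```

Consecutive cycle edges share a node and receive opposite signs, so the two changes cancel at every node; the even length is what makes the signs also cancel at the node where the cycle closes.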
Thus Gf does not contain even cycles. Now, using b-matchings, it is not
hard to see that the components of Gf consist of odd cycles. We claim that Gf
consists of only one odd cycle. Indeed, suppose that Gf contains two odd cycles
C¹ and C². Consider the solution x̃ defined as
\[
\tilde{x}(e) = \begin{cases} \tfrac{1}{2} & \text{if } e \in C^1, \\ 1 & \text{if } e \in E \setminus C^1. \end{cases}
\]
Obviously, x̃ ∈ P(G). Moreover x̃ is an extreme point of P(G) such that x̄ ≻ x̃.
But this contradicts the fact that x̄ is critical.
Consequently, Gf consists of only one odd cycle, say C. Note that every node
of C is incident to exactly one edge of E1(x̄). Also note that, as x̄ is critical, the
graph induced by E1(x̄) contains neither 2-edge connected subgraphs
nor nodes of degree 2. Thus this graph is a forest without nodes of degree 2
and whose pendant nodes are the nodes of C. Hence G verifies condition 1 of
Definition 3.4. Moreover it is easy to see that x̄ is given by x̄(e) = 1/2 for all
e ∈ C and x̄(e) = 1 for all e ∈ E \ C. As G does not contain a proper tight cut,
we have that x̄ also satisfies conditions 2 and 3 of Definition 3.4. This implies
that (G, x̄) is a basic pair and our proof is complete.

References
1. Baïou, M., Mahjoub, A.R.: Steiner 2-edge connected subgraphs polytopes on
series parallel graphs. SIAM Journal on Discrete Mathematics 10 (1997) 505-514
2. Barahona, F., Mahjoub, A.R.: On two-connected subgraphs polytopes. Discrete
Mathematics 17 (1995) 19-34
3. Chopra, S.: Polyhedra of the equivalent subgraph problem and some edge
connectivity problems. SIAM Journal on Discrete Mathematics 5 (1992) 321-337
4. Chopra, S.: The k-edge connected spanning subgraph polyhedron. SIAM Journal
on Discrete Mathematics 7 (1994) 245-259
5. Cornuéjols, G., Fonlupt, J., Naddef, D.: The traveling salesman problem on a
graph and some related integer polyhedra. Mathematical Programming 33 (1985)
1-27
6. Coullard, R., Rais, A., Rardin, R.L., Wagner, D.K.: The 2-connected Steiner
subgraph polytope for series-parallel graphs. Report No. CC-91-23, School of
Industrial Engineering, Purdue University (1991)
7. Coullard, R., Rais, A., Rardin, R.L., Wagner, D.K.: The dominant of the
2-connected Steiner subgraph polytope for W4-free graphs. Discrete Applied
Mathematics 66 (1996) 33-43
8. Dinits, E.A.: Algorithm for solution of a problem of maximum flow in a network
unit with power estimation. Soviet Math. Dokl. 11 (1970) 1277-1280
9. Edmonds, J., Karp, R.M.: Theoretical improvement in algorithm efficiency for
network flow problems. J. of Assoc. Comp. Mach. 19 (1972) 254-264
10. Fonlupt, J., Naddef, D.: The traveling salesman problem in graphs with some
excluded minors. Mathematical Programming 53 (1992) 147-172
11. Fonlupt, J., Mahjoub, A.R.: Critical extreme points of the 2-edge connected
spanning subgraph polytope. Preprint (1998)
12. Goemans, M.X., Bertsimas, D.J.: Survivable networks, linear programming and
the parsimonious property. Mathematical Programming 60 (1993) 145-166
13. Grötschel, M., Lovász, L., Schrijver, A.: The ellipsoid method and its consequences
in combinatorial optimization. Combinatorica 1 (1981) 70-89
14. Grötschel, M., Monma, C.: Integer polyhedra arising from certain network design
problems with connectivity constraints. SIAM Journal on Discrete Mathematics 3
(1990) 502-523
15. Grötschel, M., Monma, C., Stoer, M.: Facets for polyhedra arising in the design
of communication networks with low-connectivity constraints. SIAM Journal on
Optimization 2 (1992) 474-504
16. Grötschel, M., Monma, C., Stoer, M.: Polyhedral approaches to network
survivability. In: Roberts, F., Hwang, F., Monma, C. (eds.): Reliability of Computer and
Communication Networks, Vol. 5, Series in Discrete Mathematics and Computer
Science AMS/ACM (1991) 121-141
17. Grötschel, M., Monma, C., Stoer, M.: Computational results with a cutting
plane algorithm for designing communication networks with low-connectivity
constraints. Operations Research 40/2 (1992) 309-330
18. Grötschel, M., Monma, C., Stoer, M.: Polyhedral and computational investigations
for designing communication networks with high survivability requirements.
Operations Research 43/6 (1995) 1012-1024
19. Mahjoub, A.R.: Two-edge connected spanning subgraphs and polyhedra.
Mathematical Programming 64 (1994) 199-208
20. Mahjoub, A.R.: On perfectly 2-edge connected graphs. Discrete Mathematics 170
(1997) 153-172
21. Monma, C., Munson, B.S., Pulleyblank, W.R.: Minimum-weight two connected
spanning networks. Mathematical Programming 46 (1990) 153-171
22. Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
23. Steiglitz, K., Weiner, P., Kleitman, D.J.: The design of minimum cost survivable
networks. IEEE Transactions on Circuit Theory 16 (1969) 455-460
24. Stoer, M.: Design of survivable networks. Lecture Notes in Mathematics Vol. 1531,
Springer, Berlin (1992)
25. Winter, P.: Generalized Steiner Problem in Halin Graphs. Proceedings of the 12th
International Symposium on Mathematical Programming, MIT (1985)
26. Winter, P.: Generalized Steiner Problem in series-parallel networks. Journal of
Algorithms 7 (1986) 549-566
An Orientation Theorem with Parity Conditions

András Frank¹*, Tibor Jordán²**, and Zoltán Szigeti³

¹ Department of Operations Research, Eötvös University,
H-1088 Rákóczi út 5., Budapest, Hungary, and
Ericsson Traffic Laboratory, H-1037 Laborc u. 1, Budapest, Hungary.
[email protected]
² BRICS†, Department of Computer Science, University of Aarhus,
Ny Munkegade, building 540, DK-8000 Aarhus C, Denmark.
[email protected]
³ Equipe Combinatoire, Université Paris VI,
4, place Jussieu, 75252 Paris, France.
[email protected]

Abstract. Given a graph G = (V, E) and a set T ⊆ V, an orientation
of G is called T-odd if precisely the vertices of T get odd in-degree.
We give a good characterization for the existence of a T-odd orientation
for which there exist k edge-disjoint spanning arborescences rooted at a
prespecified set of k roots. Our result implies Nash-Williams’ theorem
on covering the edges of a graph by k forests and a (generalization of a)
theorem due to Nebeský on upper embeddable graphs.

1 Introduction

Problems involving parity or connectivity conditions (for example matching
theory or the theory of flows) form two major areas of combinatorial optimization.
In some cases the two areas overlap. An example where parity and connectivity
conditions are imposed simultaneously is the following theorem of Nebeský.

Theorem 1. [9] A connected graph G = (V, E) has a spanning tree F for which
each connected component of G − E(F ) has an even number of edges if and only
if
|A| ≥ c(G − A) + b(G − A) − 1 (1)
holds for every A ⊆ E, where c(G − A) denotes the number of connected
components of G − A and b(G − A) denotes the number of those components D of
G − A for which |V(D)| + |E(D)| − 1 is odd. □

* Supported by the Hungarian National Foundation for Scientific Research Grant
OTKA T029772.
** Supported in part by the Danish Natural Science Research Council, grant no. 28808.
† Basic Research in Computer Science, Centre of the Danish National Research
Foundation.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 183–190, 1999.
© Springer-Verlag Berlin Heidelberg 1999
By a result of Jungerman [7] and Xuong [11], a graph G is upper embeddable
if and only if it has a spanning tree F so that each connected component of
G − E(F ) has an even number of edges. Thus Nebeský’s theorem gives a good
characterization for upper embeddable graphs. For more details on connections
to topology see [6] or [9].
Theorem 1 can be reformulated in terms of orientations of G. Let G = (V, E)
be a connected undirected graph and let T ⊆ V . An orientation of G is called
T -odd if precisely the vertices of T get odd in-degree. It is easy to see that G
has a T -odd orientation if and only if |E| + |T | is even. From this observation
we obtain the following:

Proposition 1. Let G = (V, E) be a connected graph for which |V| + |E| − 1
is even. Then G has a spanning tree F for which each connected component of
G − E(F) has an even number of edges if and only if for every r ∈ V there exists
a (V − r)-odd orientation of G which contains a spanning arborescence rooted at
r. □
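The parity observation used before Proposition 1 (a connected graph has a T-odd orientation exactly when |E| + |T| is even) can be checked by brute force on a small instance. The following is a minimal sketch with hypothetical function names and a hypothetical example graph, a triangle with one pendant edge; it enumerates all orientations and compares the odd in-degree sets that occur with the sets T of the right parity.

```python
from itertools import combinations, product


def odd_indegree_sets(n, edges):
    """Odd in-degree sets realised by some orientation of the given graph."""
    realised = set()
    for choice in product((0, 1), repeat=len(edges)):
        indeg = [0] * n
        for (u, v), c in zip(edges, choice):
            # c == 1 orients the edge u -> v, c == 0 orients it v -> u.
            indeg[v if c else u] += 1
        realised.add(frozenset(v for v in range(n) if indeg[v] % 2 == 1))
    return realised


def sets_with_matching_parity(n, m):
    """All vertex subsets T with m + |T| even."""
    return {frozenset(T)
            for size in range(n + 1)
            for T in combinations(range(n), size)
            if (m + size) % 2 == 0}


# Hypothetical example: triangle 0-1-2 with a pendant edge 2-3 (connected, |E| = 4).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
realised = odd_indegree_sets(4, edges)
expected = sets_with_matching_parity(4, len(edges))
```

The enumeration is exponential in |E| and is meant only as a sanity check on tiny graphs.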

Motivated by Theorem 1 and Proposition 1, our goal in this paper is to
investigate more general problems concerning orientations of undirected graphs
simultaneously satisfying connectivity and parity requirements. Namely, given
an undirected graph G = (V, E), T ⊆ V and k ≥ 0, our main result gives a
necessary and sufficient condition for the existence of a T -odd orientation of
G which contains k edge-disjoint spanning arborescences rooted at a given set
of k roots. This good characterization generalizes Theorem 1 and at the same
time slightly simplifies condition (1). Furthermore, it implies Nash-Williams’
theorem on covering the edges of a graph by k forests as well. These corollaries
are discussed in Section 3. Note that a related problem on k-edge-connected
T -odd orientations is solved in [5].
Our proof employs the proof method which was developed independently
by Gallai and Anderson and which was first used to show an elegant proof for
Tutte’s theorem on perfect matchings of graphs, see [1]. In our case the weaker
result the proof hinges on (which is Hall’s theorem in the previously mentioned
proof for Tutte’s result) is an orientation theorem of the first author (Theorem
4 below).

Let R = {r1, ..., rk} be a multiset of vertices of G (that is, the elements of R
are not necessarily pairwise distinct). By T ⊕ R we mean ((T ⊕ r1) ⊕ ...) ⊕ rk, where
⊕ denotes the symmetric difference. For some X ⊆ V the subgraph induced by
X is denoted by G[X]. The number of edges in G[X] is denoted by i(X). For
a partition P = {V1 , ..., Vt } of V with t elements the set of edges connecting
different elements of P is denoted by E(P). We set e(P) = |E(P)|. A subset F
of edges of a directed graph D is a spanning arborescence rooted at r if F forms
a spanning tree and each vertex has in-degree one except r. The in-degree of a
set X ⊆ V in a directed graph D = (V, E) is denoted by ρ(X). The following
well-known result is due to Edmonds.

Theorem 2. [3] Let R be a multiset of vertices of size k in a directed graph
D = (V, E). Then D contains k edge-disjoint spanning arborescences rooted at
R if and only if ρ(X) ≥ k − |X ∩ R| for every X ⊆ V. □
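The cut condition of Theorem 2 can be tested directly on small digraphs. The sketch below is a brute-force illustration with hypothetical names, not an efficient algorithm; it assumes that the condition is meant for nonempty sets X only (for X = ∅ it would fail trivially whenever k > 0), and it counts |X ∩ R| with multiplicity, following the multiset convention stated in the text.

```python
from itertools import combinations


def in_degree(X, arcs):
    """Number of arcs (u, v) entering the vertex set X."""
    return sum(1 for u, v in arcs if u not in X and v in X)


def edmonds_condition(n, arcs, roots):
    """Check rho(X) >= k - |X ∩ R| for every nonempty X (assumption: X nonempty)."""
    k = len(roots)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            X = set(subset)
            roots_in_X = sum(1 for r in roots if r in X)  # multiset count
            if in_degree(X, arcs) < k - roots_in_X:
                return False
    return True


# Hypothetical examples on the vertex set {0, 1, 2}.
cycle = [(0, 1), (1, 2), (2, 0)]   # directed triangle
in_star = [(0, 1), (2, 1)]         # vertex 2 has no entering arc
```

For the directed triangle with the single root 0 the condition holds, matching the arborescence 0 → 1 → 2; for `in_star` it fails at X = {2}, since no arc enters vertex 2.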

The following result is due to Frank.

Theorem 3. [4, Theorem 2.1] Let H = (V, E) be a graph, and let g : V → Z+
be a function. Then there exists an orientation of H whose in-degree function ρ
satisfies ρ(v) = g(v) for every v ∈ V if and only if the following two conditions
hold.
g(V) = |E| (2)
g(X) ≥ i(X) for every ∅ ≠ X ⊆ V. (3) □
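Conditions (2) and (3) are easy to compare with exhaustive search on a small graph. The following sketch uses hypothetical function names and a triangle as a hypothetical example; it implements the two conditions literally and, separately, looks for an orientation with the prescribed in-degrees by enumeration, so that the two answers can be compared.

```python
from itertools import combinations, product


def indegree_conditions(n, edges, g):
    """Conditions (2) and (3): g(V) = |E| and g(X) >= i(X) for nonempty X."""
    if sum(g) != len(edges):
        return False
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            X = set(subset)
            induced = sum(1 for u, v in edges if u in X and v in X)  # i(X)
            if sum(g[v] for v in subset) < induced:
                return False
    return True


def orientation_exists(n, edges, g):
    """Brute force: is there an orientation with in-degree vector exactly g?"""
    for choice in product((0, 1), repeat=len(edges)):
        indeg = [0] * n
        for (u, v), c in zip(edges, choice):
            indeg[v if c else u] += 1
        if indeg == list(g):
            return True
    return False


# Hypothetical example: the triangle on vertices 0, 1, 2.
triangle = [(0, 1), (1, 2), (2, 0)]
```

On the triangle, g = (1, 1, 1) and g = (2, 1, 0) pass both checks, while g = (3, 0, 0) fails both, because X = {1, 2} induces one edge but has g(X) = 0.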

We shall rely on the following orientation theorem, which is easy to prove
from Theorem 3 and Theorem 2.

Theorem 4. Let H = (V, E) be a graph, let R′ = {r′1, ..., r′k} be a multiset of k
vertices of H and let g : V → Z+ be a function. Then there exists an orientation
of H whose in-degree function ρ satisfies ρ(v) = g(v) for every v ∈ V and for
which there exist k edge-disjoint spanning arborescences with roots {r′1, ..., r′k}
if and only if (2) and the following condition hold.
g(X) ≥ i(X) + k − |X ∩ R′| for every ∅ ≠ X ⊆ V. (4) □

Note that R′ is a multiset in Theorem 4, hence by |X ∩ R′| in (4) we mean
|{r′i ∈ R′ : r′i ∈ X, i = 1, ..., k}|. This convention will be used later on, whenever
we take the intersection (or union) with a multiset.
Given G = (V, E), T ⊆ V , k ∈ Z+ and a partition P = {V1 , ..., Vt } of V , an
element Vj (1 ≤ j ≤ t) is called odd if |Vj ∩ T | − i(Vj ) − k is odd, otherwise Vj is
even. The number of odd elements of P is denoted by sG (P, T, k) (where some
parameters may be omitted if they are clear from the context). Our main result
is the following.

Theorem 5. Let G = (V, E) be a graph, T ⊆ V and let k ≥ 0 be an integer.
For a multiset of k vertices R = {r1, ..., rk} of V there exists a T ⊕ R-odd
orientation of G for which there exist k edge-disjoint spanning arborescences
with roots {r1 , . . . , rk } if and only if

e(P) ≥ k(t − 1) + s(P, T ) (5)

holds for every partition P = {V1 , . . . Vt } of V .
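Condition (5) can be evaluated mechanically on small graphs by running over all partitions of V. The sketch below is a brute-force illustration with hypothetical names; it computes e(P) and the number s(P, T, k) of odd elements as defined above and tests inequality (5) for every partition. It checks the condition only and does not construct the orientation.

```python
def partitions(items):
    """Yield every partition of a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def induced_edges(block, edges):
    """i(X): number of edges with both ends in the block."""
    members = set(block)
    return sum(1 for u, v in edges if u in members and v in members)


def odd_elements(part, edges, T, k):
    """s(P, T, k): blocks Vj with |Vj ∩ T| - i(Vj) - k odd."""
    return sum(1 for block in part
               if (len(set(block) & T) - induced_edges(block, edges) - k) % 2 == 1)


def condition_5(n, edges, T, k):
    """Check e(P) >= k(t - 1) + s(P, T) for every partition P of {0, ..., n-1}."""
    for part in partitions(list(range(n))):
        t = len(part)
        crossing = len(edges) - sum(induced_edges(b, edges) for b in part)  # e(P)
        if crossing < k * (t - 1) + odd_elements(part, edges, T, k):
            return False
    return True


# Hypothetical example: the triangle on vertices 0, 1, 2 with k = 1.
triangle = [(0, 1), (1, 2), (2, 0)]
```

For the triangle with k = 1, T = {0, 1} satisfies (5); with R = {2} this matches the directed cycle 2 → 0 → 1 → 2, in which every vertex has in-degree one. For T = ∅ the partition into singletons violates (5).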


2 The Proof of Theorem 5

Proof. (of Theorem 5) To see the necessity of condition (5), consider an orien-
tation of G with the required properties and some partition P = {V1 , ..., Vt } of
V . The following fact is easy to observe.

Proposition 2. For every T ⊕ R-odd orientation of G and for every Vj (1 ≤
j ≤ t) we have |Vj ∩ T| − i(Vj) − k ≡ ρ(Vj) − (k − |Vj ∩ R|) (mod 2).

Proof. Since the orientation is T ⊕ R-odd, we obtain ρ(Vj) + i(Vj) = Σ_{v∈Vj} ρ(v) ≡
|Vj ∩ (T ⊕ R)| ≡ |Vj ∩ T| − |Vj ∩ R| (mod 2) and the proposition follows. □

Since there exist k edge-disjoint arborescences rooted at vertices of R, ρ(Vj) −
(k − |Vj ∩ R|) ≥ 0 for each Vj. If this number is odd (or, equivalently by
Proposition 2, if Vj is odd) then it is at least one. This yields
\[
e(\mathcal{P}) = \sum_{V_j \in \mathcal{P}} \rho(V_j)
= \sum_{V_j \in \mathcal{P}} \big(\rho(V_j) - (k - |V_j \cap R|)\big) + kt - |V \cap R|
\ge s(\mathcal{P}, T) + kt - k = k(t-1) + s(\mathcal{P}, T),
\]
hence the necessity follows.
In what follows we prove that (5) is sufficient. An orientation is called good
if the directed graph obtained contains k edge-disjoint spanning arborescences
rooted at R. Let us suppose that the statement of the theorem does not hold
and let us take a counter-example (that is, a graph G = (V, E) with T , R and k,
for which (5) holds but no good T ⊕ R-odd orientation exists) for which |V | + |E|
is as small as possible.

Proposition 3. e(P) ≡ k(t − 1) + s(P, T) (mod 2) for every partition P.

Proof. By choosing P0 = {V} in (5) we obtain that |T| − |E| − k is even.
This implies s(P, T) ≡ Σ_{j=1}^{t} (|Vj ∩ T| − i(Vj) − k) = |T| − (|E| − e(P)) − kt =
|T| − |E| − k + e(P) − k(t − 1) ≡ e(P) − k(t − 1) (mod 2). □

We call a partition P of V consisting of t elements tight if
e(P) = k(t − 1) + s(P, T) and t ≥ 2.

Lemma 1. There exists a tight partition of V .

Proof. Let ab be an arbitrary edge of G. Focus on the graph G′ = G − ab and
the modified set T′ = T ⊕ b. If there were a good T′ ⊕ R-odd orientation of
G′ then adding the arc ab would provide a good T ⊕ R-odd orientation of G,
which is impossible. Thus, by the minimality of G, there exists a partition P
of V consisting of t elements violating (5) in G′, that is, using Proposition 3,
e_G′(P) + 2 ≤ k(t − 1) + s_G′(P, T′). Clearly, t ≥ 2 holds.
For the same partition in G we have e_G(P) ≤ e_G′(P) + 1 and also s_G(P, T) ≥
s_G′(P, T′) − 1, since adding the edge ab and replacing T ⊕ b by T may change
the parity of at most one element of the partition. Thus k(t − 1) + s_G(P, T) ≥
k(t − 1) + s_G′(P, T′) − 1 ≥ e_G′(P) + 2 − 1 ≥ e_G(P) ≥ k(t − 1) + s_G(P, T), hence
P is tight in G and the lemma follows. □

Let us fix a tight partition P = {V1, ..., Vt} in G for which t is maximal.
Denote the number of odd elements of P by s.

Lemma 2. Every element Vj of P (1 ≤ j ≤ t) has the following property:
(a) if Vj is even, then (5) holds in G[Vj], with respect to T ∩ Vj,
(b) if Vj is odd, then for each vertex v ∈ Vj, (5) holds in G[Vj], with respect to
(T ∩ Vj) ⊕ v.

Proof. We handle the two cases simultaneously. Suppose that there exists a
partition P′ of t′ elements in G[Vj] violating (5) (with respect to T ∩ Vj in case
(a) or with respect to (T ∩ Vj) ⊕ v, for some v ∈ Vj, in case (b)). By Proposition 3
this implies k(t′ − 1) + s′ ≥ e(P′) + 2, where s′ denotes the number of odd elements
of P′. Consider the partition P″ = (P − Vj) ∪ P′, consisting of t″ elements of
which s″ are odd. Clearly, e(P″) = e(P) + e(P′) and t″ = t + t′ − 1. Furthermore,
s″ ≥ s + s′ − 2, since the parity of at most two elements may be changed (these
are Vj, only in case (b), and the element in P′ which contains the vertex v,
only in case (b) again). Since (5) holds for P″ by the assumption of the theorem,
we have k(t″ − 1) + s″ ≤ e(P″) = e(P) + e(P′) ≤ k(t − 1) + s + k(t′ − 1) + s′ − 2 =
k((t + t′ − 1) − 1) + s + s′ − 2 ≤ k(t″ − 1) + s″. Thus P″ is a tight partition with
t″ > t, which contradicts the choice of P. □

Let us denote by H the graph obtained from G by contracting each element Vj of
P into a single vertex vj (1 ≤ j ≤ t). Let R′ = {r′1, ..., r′k} denote the
multiset of vertices of H corresponding to the vertices of R in G (that is, every
root in some Vj yields a new root vj). Furthermore, let A denote those vertices
of H which correspond to odd elements of P and let B = V(H) − A. Note that
since P is tight, we have

|E(H)| = e(P) = k(t − 1) + s(P).    (6)

Now define the following function g on the vertex set of H:

g(vj) = k + 1 − |Vj ∩ R|   if vj ∈ A,
g(vj) = k − |Vj ∩ R|       otherwise.

Lemma 3. There exists an orientation of H whose in-degree function is g and
which contains k edge-disjoint spanning arborescences with roots {r′1, ..., r′k}.

Proof. To prove the lemma we have to verify that the two conditions (2) and
(4) of Theorem 4 are satisfied. First we can see that g(V(H)) = g(A) + g(B) =
s(k + 1) + (t − s)k − k = k(t − 1) + s = |E(H)| by the definition of g and by (6).
Thus (2) is satisfied.
To verify (4), let us choose an arbitrary non-empty subset X of V(H). Let
us define the partition P* of V(G) by P* := {Vj : vj ∈ V(H) − X} ∪ {⋃(Vj : vj ∈ X)}.
Then P* has t* = t − |X| + 1 elements and the number of its odd elements s* is
at least s − |X ∩ A|. Applying (5) for P*, it follows that k(t* − 1) + s* ≤ e(P*).
Hence k((t − |X| + 1) − 1) + s − |X ∩ A| ≤ k(t* − 1) + s* ≤ e(P*) = e(P) − i(X) =
k(t − 1) + s − i(X). From this it follows that i(X) + k ≤ k|X| + |X ∩ A|. Therefore
i(X) + k − |X ∩ R′| ≤ k|X| + |X ∩ A| − |X ∩ R′| = |X ∩ A|(k + 1) + |X ∩ B|k − |X ∩ R′| =
g(X ∩ A) + g(X ∩ B) = g(X), proving that (4) is also satisfied. □

Let us fix an orientation of H whose in-degree function is ρ_H = g and which
contains a set F of k edge-disjoint spanning arborescences {F1, ..., Fk} with roots
{r′1, ..., r′k}. Such an orientation exists by Lemma 3. Observe that this orientation
of H corresponds to a partial orientation of G (namely, an orientation of the edges
of E(P)).
For any vertex vj of H there are g(vj) arcs entering vj. If Vj is even then
each arc entering vj belongs to some arborescence in F. If Vj is odd then each
arc entering vj except exactly one belongs to some arborescence of F, by the
definition of g.
For an arbitrary Vj ∈ P let us denote by R_j^H the multiset of those vertices
in Vj which are the heads of the arcs of this partial orientation entering Vj and
belonging to some arborescence in F. By the definition of g, we have |R_j^H| =
k − |Vj ∩ R|. Let Rj = (Vj ∩ R) ∪ R_j^H. Note that |Rj| = |Vj ∩ R| + |R_j^H| = k.
Furthermore, if Vj is odd then let us denote by aj the vertex in Vj which is the
head of the unique arc entering Vj and not belonging to any arborescence in F.
Let Tj = T ∩ Vj if Vj is even and let Tj = (T ∩ Vj) ⊕ aj if Vj is odd.
By the minimality of G and since |Vj| < |V(G)| for each 1 ≤ j ≤ t, Lemma 2
implies that for each j there exists a Tj ⊕ Rj-odd orientation of G[Vj] which
contains k edge-disjoint spanning arborescences with roots in Rj. Combining these
orientations of the subgraphs induced by the elements of P and the orientation
of E(P) obtained earlier, we get an orientation of G. This orientation is clearly
a good T ⊕ R-odd orientation of G, contradicting our assumption on G. This
contradiction proves the theorem. □

3 Corollaries
As we reformulated Theorem 1 in terms of odd orientations and spanning ar-
borescences, we can similarly reformulate Theorem 5 in terms of even compo-
nents and spanning trees.

Theorem 6. A graph G = (V, E) has k edge-disjoint spanning trees F1, ..., Fk
so that each connected component of G − ⋃_{i=1}^{k} E(Fi) has an even number of edges
if and only if
e(P) ≥ k(t − 1) + s    (7)
holds for each partition P = {V1, ..., Vt} of V, where s is the number of those
elements of P for which i(Vj) + k(|Vj| − 1) is odd.

Proof. As we observed, G has an orientation for which the in-degree of every vertex
is even if and only if each connected component of G contains an even number of
edges. Thus the desired spanning trees exist in G if and only if G has a T ⊕ R-odd
orientation which contains k edge-disjoint r-arborescences, where T = V if k is
odd, T = ∅ if k is even, and R = {r1, ..., rk}, ri = r (i = 1, ..., k) for an arbitrary
r ∈ V. Based on this fact, Theorem 5 proves the theorem (by observing that (5)
specializes to (7) due to the special choice of T). □

The special case k = 1 of Theorem 6 corresponds to Theorem 1. Since (1)
implies (7) if k = 1, Theorem 6 applies and we obtain a slightly simplified
version of Nebeský's result. Note also that our main result provides a proof of
a different nature for Theorem 1 by using Theorem 4. Actually, Nebeský proved the
defect form of the previous result, showing a min-max equality for the minimum
number of components with an odd number of edges of G − E(F) among all possible
spanning trees F of G (hence characterizing the maximum genus of the graph).
The next corollary we prove is Nash-Williams’ classical theorem on forest
covers.
Corollary 1. [8] The edges of a graph G = (V, E) can be covered by k forests if
and only if
i(X) ≤ k(|X| − 1)    (8)
holds for every ∅ ≠ X ⊆ V.
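
Condition (8) is easy to test exhaustively on small graphs. The sketch below is only an illustration under stated assumptions: the function name forest_cover_condition is hypothetical, the graph is given as a list of vertex pairs, and all 2^|V| − 1 non-empty subsets are enumerated, so it is not an efficient algorithm.

```python
from itertools import combinations


def forest_cover_condition(vertices, edges, k):
    """Check i(X) <= k(|X| - 1) for every non-empty subset X of V."""
    vertices = list(vertices)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            X = set(subset)
            # i(X): number of edges with both end-vertices in X.
            i_X = sum(1 for u, v in edges if u in X and v in X)
            if i_X > k * (size - 1):
                return False
    return True
```

For the complete graph on four vertices the condition fails for k = 1 (six edges exceed 1·(4 − 1)) and holds for k = 2.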
Proof. We consider the sufficiency of the condition. Let G = (V, E) be a graph
for which (8) holds. The first claim is that we can add new edges to G until the
number of edges equals k(|V| − 1) without destroying (8). To see this, observe that
a new edge e = xy (which may be parallel to some other edges already present
in G) cannot be added if and only if x, y ∈ Z for some Z ⊆ V
with i(Z) = k(|Z| − 1). Such a set, satisfying (8) with equality, will be called full.
It is well-known that the function i : 2^V → Z+ is supermodular. Therefore for
two intersecting full sets Z and W we have k(|Z| − 1) + k(|W| − 1) = i(Z) + i(W) ≤
i(Z ∩ W) + i(Z ∪ W) ≤ k(|Z ∩ W| − 1) + k(|Z ∪ W| − 1) = k(|Z| − 1) + k(|W| − 1).
Thus equality holds everywhere, and the sets Z ∩ W and Z ∪ W are also full.
Now let F be a maximal full set (we may assume F ≠ V) and e = xy for some
pair x ∈ F, y ∈ V − F. If we destroyed (8) by adding e, we would have a full set
F′ in G with x, y ∈ F′ intersecting F, hence F ∪ F′ would also be full by our previous
observation. This contradicts the maximality of F.
Thus in the rest of the proof we may assume that |E| = k(|V| − 1). We claim
that there exist k edge-disjoint spanning trees in G. The existence of these trees
immediately implies that G can be covered by k forests because |E| = k(|V| − 1).
By Theorem 6, it is enough to prove that (7) holds in G. Let P = {V1, ..., Vt}
be a partition of V and let V1, ..., Vs denote the odd elements of P (with respect
to k). Observe that for an odd element Vj the parity of i(Vj) and k(|Vj| − 1) must
be different (this holds for even k and for odd k as well), hence these numbers
cannot be equal. Thus we can count as follows:
e(P) = |E| − Σ i(Vi) = k(|V| − 1) − Σ(i(Vi) : Vi is even) − Σ(i(Vj) : Vj is odd)
≥ k(|V| − 1) − Σ(k(|Vi| − 1) : Vi is even) − Σ(k(|Vj| − 1) − 1 : Vj is odd)
= k(|V| − 1) − k|V| + kt + s = k(t − 1) + s,
as required. □

4 Remarks

The existence of a spanning tree of Theorem 1 with the required properties
can be formulated as a linear matroid parity problem, hence it can be obtained
from Lovász's characterization concerning the existence of a perfect matroid-
matching [10] as well. (Although it is not an easy task to deduce Nebeský's
result.) The reduction to a certain co-graphic matroid-parity problem was shown
by Furst et al. [6]. Based on this, they developed a polynomial algorithm for the
corresponding optimization problem. A similar reduction, where the matroid is
the dual of the sum of k graphic matroids, works in the more general case of
Theorem 6, too. However, from an algorithmic point of view, such a reduction is
not satisfactory, since it is not known how to represent the matroid in question.
Finally we remark that Proposition 1 was also observed independently by
Chevalier et al. [2].

References
1. I. Anderson, Perfect matchings of a graph, J. Combin. Theory Ser. B, 10 (1971),
183-186.
2. O. Chevalier, F. Jaeger, C. Payan and N.H. Xuong, Odd rooted orientations and
upper-embeddable graphs, Annals of Discrete Mathematics 17, (1983) 177-181.
3. J. Edmonds, Edge-disjoint branchings, in: R. Rustin (Ed.), Combinatorial Algo-
rithms, Academic Press, (1973) 91-96.
4. A. Frank, Orientations of graphs and submodular flows, Congr. Numer. 113 (1996),
111-142.
5. A. Frank, Z. Király, Parity constrained k-edge-connected orientations, Proc.
Seventh Conference on Integer Programming and Combinatorial Optimization
(IPCO), Graz, 1999. LNCS, Springer, this issue.
6. M.L. Furst, J.L. Gross, and L.A. McGeoch, Finding a maximum genus graph
imbedding, J. of the ACM, Vol. 35, No. 3, July 1988, 523-534.
7. M. Jungerman, A characterization of upper embeddable graphs, Trans. Amer.
Math. Soc. 241 (1978), 401-406.
8. C. St. J. A. Nash-Williams, Edge-disjoint spanning trees of finite graphs, J. London
Math. Soc. 36 (1961), 445-450.
9. L. Nebeský, A new characterization of the maximum genus of a graph, Czechoslo-
vak Mathematical Journal, 31 (106) 1981, 604-613.
10. L. Lovász, Selecting independent lines from a family of lines in a space, Acta Sci.
Univ. Szeged 42, 1980, 121-131.
11. N.H. Xuong, How to determine the maximum genus of a graph, J. Combin. Theory
Ser. B 26 (1979), 217-225.
Parity Constrained k-Edge-Connected Orientations*

András Frank¹ and Zoltán Király²

¹ Department of Operations Research, Eötvös University, Rákóczi út 5, Budapest,
Hungary, H-1088 and Ericsson Traffic Laboratory, Laborc u. 1, Budapest,
Hungary, H-1037. [email protected]
² Department of Computer Science, Eötvös University, Rákóczi út 5, Budapest,
Hungary, H-1088. [email protected]

Abstract. Parity (matching theory) and connectivity (network flows)
are two main branches of combinatorial optimization. In an attempt to
understand better their interrelation, we study a problem where both
parity and connectivity requirements are imposed. The main result is
a characterization of undirected graphs G = (V, E) having a k-edge-
connected T -odd orientation for every subset T ⊆ V with |E| + |T | even.
(T -odd orientation: the in-degree of v is odd precisely if v is in T .) As
a corollary, we obtain that every (2k + 2)-edge-connected graph with
|V | + |E| even has a k-edge-connected orientation in which the in-degree
of every node is odd. Along the way, a structural characterization will
be given for digraphs with a root-node s having k + 1 edge-disjoint paths
from s to every node and k edge-disjoint paths from every node to s.

1 Introduction

The notion of parity plays an important role in describing combinatorial
structures. The prime example is W. Tutte's theorem [T47] on the existence of a
perfect matching of a graph. Later, the notion of "odd components" was
extended and used by W. Mader [M78] in his disjoint A-paths theorem, by R.
Giles [G82] in describing matching-forests, by L. Nebeský [N81] in determining
the maximum genus, and by W. Cunningham and J. Geelen [CG97] in characterizing
optimal path-matchings. L. Lovász' [L80] general framework on matroid
parity (as its name already suggests) also relies on odd components. Sometimes
parity comes in already with the problem formulation. Lovász [L70], for example,
considered the existence of subgraphs with parity prescription on the degree of
nodes. The theory of T-joins describes several problems of this type.
Another large class of combinatorial optimization problems concerns connectivity
properties of graphs; in particular, the roles of cuts, partitions, trees, paths,
and flows are especially well studied.
* Research supported by the Hungarian National Foundation for Scientific Research
Grant, OTKA T17580 and OTKA F014919. Part of the research was conducted
while the first-named author was visiting EPFL (Lausanne, Switzerland, 1998).

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO'99, LNCS 1610, pp. 191–201, 1999.
© Springer-Verlag Berlin Heidelberg 1999

In some cases the two areas overlap (for example, when we are interested
in finding paths or subgraphs of given properties and the parity comes in the
characterization). For example, Seymour’s theorem [S81] on minimum T -joins
implies a result on the edge-disjoint paths problem in planar graphs. In [FST84]
some informal analogy was pointed out between results on parity and on connec-
tivity, but in order to understand better the relationship of these two big aspects
of combinatorial optimization, it is desirable to explore further problems where
both parity and connectivity requirements are imposed. For example, Nebeský
provided a characterization of graphs having an orientation in which every node
is reachable from a given node by a directed path and the in-degree of every
node is odd. Recently, [FJS98] extended this result to the case where, beside the
parity constraints, the existence of k edge-disjoint paths was required from the
specific node to every other one.
The goal of the present paper is to provide a new contribution to this broader
picture. The main result is about orientations of undirected graphs simultane-
ously satisfying connectivity and parity requirements. The following concepts of
connectivity will be used.

Let l be a positive integer. A digraph D = (V, A) is l-edge-connected
(l-ec, for short) if the in-degree ϱ(X) = ϱ_D(X) of X, that is, the number
of edges entering X, is at least l for every non-empty proper subset X of V.
The out-degree δ(X) = δ_D(X) is the number of edges leaving X, that is,
δ(X) = ϱ(V − X). The 1-ec digraphs are called strongly connected. A digraph
D is said to be l⁻-edge-connected (l⁻-ec) if it has a node s, called root node,
so that ϱ(X) ≥ l for every subset X with ∅ ⊂ X ⊆ V − s, and ϱ(X) ≥ l − 1 for
every subset X with s ∈ X ⊂ V. When the role of the root is emphasized, we
say that D is l⁻-ec with respect to s. Throughout the root-node will be denoted
by s. Note that by reorienting the edges of a directed path from s to another
node t of an l⁻-ec digraph one obtains an l⁻-ec digraph with respect to root t.
Define a set-function p_l as follows. Let p_l(∅) := p_l(V) := 0 and

p_l(X) := l       if ∅ ⊂ X ⊆ V − s,
p_l(X) := l − 1   if s ∈ X ⊂ V.    (1.1)

By the definition, D is l⁻-ec if and only if ϱ(X) ≥ p_l(X) holds for every X ⊆ V.
By Menger's theorem the l⁻-edge-connectivity of D is equivalent to requiring
that D has l edge-disjoint paths from s to every node and l − 1 edge-disjoint
paths from every node to s.
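
The cut condition ϱ(X) ≥ p_l(X) can be verified directly on small digraphs. The following Python sketch is illustrative only: the names in_degree and is_l_minus_ec are hypothetical, arcs are assumed to be (tail, head) pairs with parallel arcs allowed, and every non-empty proper subset is enumerated, which takes exponential time.

```python
from itertools import combinations


def in_degree(arcs, X):
    """Number of arcs entering X (tail outside X, head inside X)."""
    X = set(X)
    return sum(1 for tail, head in arcs if tail not in X and head in X)


def is_l_minus_ec(nodes, arcs, l, s):
    """Check that the in-degree of X is at least p_l(X) for all proper subsets X."""
    nodes = list(nodes)
    for size in range(1, len(nodes)):
        for X in combinations(nodes, size):
            required = l - 1 if s in X else l  # p_l(X) as in (1.1)
            if in_degree(arcs, X) < required:
                return False
    return True
```

For instance, the digraph on nodes 0 and 1 with two arcs from 0 to 1 and one arc from 1 to 0 passes the test for l = 2 with root 0, but not with root 1.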

An undirected graph G = (V, E) is l-edge-connected (l-ec) if the number
d(X) of edges connecting any non-empty proper subset X of V and its complement
V − X is at least l. G is called 2l⁻-edge-connected (2l⁻-ec) if

e_G(F) ≥ lt − 1    (1.2a)

for every partition F := {V1, V2, ..., Vt} of V    (1.2b)

where e_G(F) denotes the number of edges connecting distinct parts of F. Throughout
we assume that every partition has at least two classes and that none of its
classes is empty.
Note that a 2l-ec graph is automatically 2l⁻-ec since e_G(F) = (Σ_i d(Vi))/2 ≥
(2l)t/2 = lt, that is, (1.2) is satisfied. Also, a 2l⁻-ec graph G is (2l − 1)-ec since
(1.2), when specialized to t = 2, requires every cut of G to have at least 2l − 1
edges. In other words, 2l⁻-edge-connectivity is somewhere between (2l − 1)-ec
and 2l-ec.
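
As an illustration of definition (1.2), the following Python sketch checks the partition condition by enumerating all partitions with at least two classes. It is a brute-force sketch under stated assumptions, not an algorithm from the paper: the names set_partitions and is_2l_minus_ec are hypothetical, and edges are given as vertex pairs with parallel edges allowed.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of classes."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for idx in range(len(partition)):
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]
        yield [[first]] + partition


def is_2l_minus_ec(vertices, edges, l):
    """Check e_G(F) >= l*t - 1 for every partition F with t >= 2 classes."""
    for partition in set_partitions(list(vertices)):
        t = len(partition)
        if t < 2:
            continue  # partitions are assumed to have at least two classes
        class_of = {v: idx for idx, cls in enumerate(partition) for v in cls}
        crossing = sum(1 for u, v in edges if class_of[u] != class_of[v])
        if crossing < l * t - 1:
            return False
    return True
```

Two nodes joined by three parallel edges satisfy the condition for l = 2 (the only partition has t = 2 and needs 3 edges), while two parallel edges do not suffice.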

Let T be a subset of V. We call T G-even if |T| + |E| is even. An orientation
of G is called T-odd if the in-degree of a node v is odd precisely if v ∈ T. It is
easy to prove that a connected graph has a T-odd orientation if and only if T is
G-even. (Namely, if an orientation is not yet T-odd, then there are at least two
bad nodes. Let P be a path in the undirected sense that connects two bad nodes.
By reversing the orientation of all of the edges of P, we obtain an orientation
having two fewer bad nodes.) Note that if we subdivide each edge of G by a
new node and let T′ denote the union of T and the set of subdividing nodes,
then there is a one-to-one correspondence between T-odd orientations of G and
T′-joins of the subdivided graph G′. (A T′-join is a subgraph of G′ in which a
node v is of odd degree precisely if v belongs to T′.)
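
The path-reversal argument above translates into a simple procedure. The Python sketch below is one possible rendering, assuming a connected graph given as a list of vertex pairs; the function name t_odd_orientation and the use of breadth-first search to find the connecting path are choices made here for illustration.

```python
from collections import deque


def t_odd_orientation(nodes, edges, T):
    """Return one arc (tail, head) per edge so that in-degrees are odd exactly on T."""
    nodes, T = list(nodes), set(T)
    if (len(T) + len(edges)) % 2 != 0:
        raise ValueError("T is not G-even")
    arcs = [list(e) for e in edges]  # arbitrary initial orientation u -> v
    adj = {v: [] for v in nodes}
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    def bad_nodes():
        indeg = {v: 0 for v in nodes}
        for _, head in arcs:
            indeg[head] += 1
        return [v for v in nodes if (indeg[v] % 2 == 1) != (v in T)]

    bad = bad_nodes()
    while bad:
        start, goal = bad[0], bad[1]  # the number of bad nodes is even
        parent = {start: None}
        queue = deque([start])
        while queue:  # breadth-first search ignoring orientations
            x = queue.popleft()
            for y, idx in adj[x]:
                if y not in parent:
                    parent[y] = (x, idx)
                    queue.append(y)
        if goal not in parent:
            raise ValueError("graph is not connected")
        node = goal
        while parent[node] is not None:  # reverse every edge of the path
            prev, idx = parent[node]
            arcs[idx].reverse()
            node = prev
        bad = bad_nodes()
    return [tuple(a) for a in arcs]
```

Each round repairs exactly two bad nodes, because reversing a simple path changes the in-degree parity only at its two end-nodes.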

As a main result, we will prove that an undirected graph G admits a k-ec
T-odd orientation for every (!) G-even subset T of nodes if and only if G
is (2k + 2)⁻-ec. This is a co-NP characterization. An NP-characterization will
also be derived by showing how these graphs can be built up from one node by
repeated applications of two elementary graph-operations. It will also be shown
that an undirected graph G has an l⁻-ec orientation if and only if G is 2l⁻-ec.
Finally, a structural characterization of l⁻-ec digraphs will be given and used
for constructing all 2l⁻-ec undirected graphs.
A remaining important open problem of the area, which has motivated the
present investigations, is finding a characterization of graphs having a k-edge-
connected orientation in which the in-degree of every node is of specified parity.
No answer is known even in the special case k = 1.
The organization of the rest of the paper is as follows. The present section
is completed by listing some definitions and notation. In Section 2 we formulate
the new and the auxiliary results. Section 3 includes a characterization of l⁻-ec
digraphs. The last section contains the proof of the main theorem.

For two sets X and Y, X − Y denotes the set of elements of X which are
not in Y, and X ⊕ Y := (X − Y) ∪ (Y − X) denotes the symmetric difference.
A one-element set is called singleton. We often will not distinguish between a
singleton and its element. In particular, the in-degree of a singleton {x} will be
denoted by ϱ(x) rather than ϱ({x}). For a set X and an element r, we denote
X ∪ {r} by X + r.
For a directed or undirected graph G, let i_G(X) = i(X) denote the number
of edges having both end-nodes in X. Let d(X, Y) (respectively, d̄(X, Y)) denote
the number of edges connecting a node of X − Y and a node of Y − X (a node
of X ∩ Y and a node of V − (X ∪ Y)). Simple calculation yields the following
identities for the in-degree function ϱ of a digraph G:

ϱ(X ∩ Y) + ϱ(X ∪ Y) = ϱ(X) + ϱ(Y) − d(X, Y),    (1.3a)

ϱ(X − Y) + ϱ(Y − X) = ϱ(X) + ϱ(Y) − d̄(X, Y) − [ϱ(X ∩ Y) − δ(X ∩ Y)].    (1.3b)

Let f be an edge and r a node of G. Then G − f and G − r denote, respectively,
the (di-)graphs arising from G by deleting edge f or node r.
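
Identities (1.3a) and (1.3b) can be sanity-checked numerically. The Python sketch below is an illustration with hypothetical helper names (rho, delta, between, identities_hold); arcs are assumed to be (tail, head) pairs, and d and d̄ count arcs in either direction between the two node sets named in the text.

```python
def rho(arcs, X):
    """In-degree of X: arcs with tail outside X and head inside X."""
    return sum(1 for u, v in arcs if u not in X and v in X)


def delta(arcs, X):
    """Out-degree of X: arcs with tail inside X and head outside X."""
    return sum(1 for u, v in arcs if u in X and v not in X)


def between(arcs, P, Q):
    """Arcs (in either direction) joining a node of P and a node of Q."""
    return sum(1 for u, v in arcs if (u in P and v in Q) or (u in Q and v in P))


def identities_hold(nodes, arcs, X, Y):
    """Evaluate both sides of (1.3a) and (1.3b) for the given X and Y."""
    V, X, Y = set(nodes), set(X), set(Y)
    d = between(arcs, X - Y, Y - X)
    d_bar = between(arcs, X & Y, V - (X | Y))
    ok_a = rho(arcs, X & Y) + rho(arcs, X | Y) == rho(arcs, X) + rho(arcs, Y) - d
    ok_b = rho(arcs, X - Y) + rho(arcs, Y - X) == (
        rho(arcs, X) + rho(arcs, Y) - d_bar
        - (rho(arcs, X & Y) - delta(arcs, X & Y))
    )
    return ok_a and ok_b
```

Both identities follow by classifying each arc according to which of the four regions X − Y, Y − X, X ∩ Y and V − (X ∪ Y) contain its tail and head; the function merely evaluates the two sides.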

Two subsets X and Y of node-set V are called intersecting if none of sets
X − Y, Y − X, X ∩ Y is empty. If, in addition, V − (X ∪ Y) is non-empty, then X
and Y are crossing. A family of subsets containing no two crossing (respectively,
intersecting) sets is called cross-free (laminar).
Let p be a non-negative, integer-valued set-function on V for which p(∅) =
p(V) = 0. Function p is called crossing supermodular if p(X) + p(Y) ≤
p(X ∩ Y) + p(X ∪ Y) holds for every pair of crossing subsets X, Y of V. When this
inequality is required only for crossing sets X, Y with p(X) > 0 and p(Y) > 0,
we speak of positively crossing supermodular functions. A set-function p is
called monotone decreasing if p(X) ≥ p(Y) holds whenever ∅ ≠ X ⊆ Y.
For a number x, let x⁺ := max(0, x). For a function m : V → R and subset
X ⊆ V we will use the notation m(X) := Σ(m(v) : v ∈ X).

2 Results: New and Old


Our main result is as follows.
Theorem 2.1 Let G = (V, E) be an undirected graph with n ≥ 1 nodes and let
k be a positive integer. The following conditions are equivalent.

(1) For every G-even subset T ⊆ V, G has a k-edge-connected T-odd orientation,
(2) G is (2k + 2)⁻-edge-connected, that is,
e_G(F) ≥ (k + 1)t − 1    (2.1a)
for every partition F := {V1, V2, ..., Vt} of V,    (2.1b)
(3) G can be constructed from a node by a sequence of the following two opera-
tions:
(i) add an undirected edge connecting two (not necessarily distinct) existing
nodes,
(ii) choose a subset F of k (distinct) existing edges, subdivide each element
of F by a new node, identify the k new nodes into one, denoted by r, and
finally connect r with an arbitrary existing node u (that may or may not
be an end-node of a subdivided edge).
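
The two operations in (3) can be sketched on a multigraph stored as a list of edges. The Python code below is an illustration under stated assumptions: the function names add_edge and pinch are hypothetical, edges are identified by their position in the list, and the new node r is supplied by the caller.

```python
def add_edge(edges, u, v):
    """Operation (i): add an edge joining two existing nodes (u == v gives a loop)."""
    return edges + [(u, v)]


def pinch(edges, chosen, r, u):
    """Operation (ii): subdivide the chosen edges, identify the subdividing
    nodes into the single new node r, and connect r with the existing node u.

    `chosen` lists the positions (in `edges`) of the k distinct edges of F.
    """
    chosen = list(chosen)
    if len(set(chosen)) != len(chosen):
        raise ValueError("the chosen edges must be distinct")
    result = []
    for idx, (a, b) in enumerate(edges):
        if idx in chosen:
            # The subdivided edge a-b becomes the two edges a-r and r-b.
            result.append((a, r))
            result.append((r, b))
        else:
            result.append((a, b))
    result.append((r, u))
    return result
```

With k = 1, starting from the single node 0, adding a loop at 0 and then applying operation (ii) to that loop with u = 0 yields two nodes joined by three parallel edges; the new node always has degree 2k + 1.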

Since any l-ec graph is l⁻-ec, we have the following corollary.

Corollary. A (2k + 2)-edge-connected undirected graph G = (V, E) with |E| + |V|
even has a k-edge-connected orientation so that the in-degree of every node is
odd. □

We do not know any simple proof of this result even in the special case of
k = 1. W. Mader [M82] proved a structural characterization of l-ec digraphs.
We are going to show an analogous result for l⁻-ec digraphs which will be used
in the proof of Theorem 2.1 but may perhaps be interesting for its own sake, as
well.

Theorem 2.2 Let D = (V, A) be a digraph and let l ≥ 2 be an integer. Then
D is l⁻-edge-connected if and only if D can be constructed from a node by a
sequence of the following two operations:
(j) add a directed edge connecting two (not necessarily distinct) existing nodes,
(jj) choose a set F of l − 1 (distinct) existing edges, subdivide each element of F
by a new node, identify the l − 1 new nodes into one, denoted by r, and finally
add a directed edge from an old node u (that may or may not be an end-node of
a subdivided edge) to r.

C.St.J.A. Nash-Williams [N60] proved that an undirected graph G has an
l-edge-connected orientation if and only if G is 2l-edge-connected. We will need
the following auxiliary result.

Theorem 2.3 An undirected graph G has an l⁻-edge-connected orientation if
and only if G is 2l⁻-edge-connected.

This result turns out to be an easy consequence of the following theorem.

Theorem 2.4 [F80] Let G = (V, E) be an undirected graph. Suppose that p is
a non-negative integer-valued crossing supermodular set-function on V for which
p(∅) = p(V) = 0. Then there exists an orientation of G for which ϱ(X) ≥ p(X)
holds for every X ⊆ V if and only if both

e_G(F) ≥ Σ_i p(Vi)    (2.2a)

and

e_G(F) ≥ Σ_i p(V − Vi)    (2.2b)

hold for every partition F = {V1, ..., Vt} of V. If, in addition, p is monotone
decreasing, then already (2.2a) is sufficient. □

In the proof of Theorem 2.2 we will need another auxiliary result.

Theorem 2.5 [F94] Let U be a ground-set, p a non-negative, integer-valued
positively crossing supermodular set-function on U for which p(∅) = p(U) = 0.
Let m_i, m_o be two non-negative integer-valued functions on U for which m_i(U) =
m_o(U). There exists a digraph H = (U, F) for which
ϱ_H(X) ≥ p(X) for every X ⊆ U    (2.3)
and
ϱ_H(v) = m_i(v), δ_H(v) = m_o(v) for every v ∈ U    (2.4)
if and only if
m_i(X) ≥ p(X) for every X ⊆ U    (2.5a)
and
m_o(U − X) ≥ p(X) for every X ⊆ U.    (2.5b)
In Theorem 2.1 property (2) may be viewed as a co-NP characterization while
(3) is an NP-characterization of property (1). The following result provides two
other equivalent characterizations.
Theorem 2.6 Let G = (V, E) be an undirected graph with n ≥ 1 nodes and let
k be a positive integer. The following conditions are equivalent.
(1) For every G-even subset T of V , G has a k-edge-connected T -odd orientation,
(4) G has an orientation which is (k + 1)⁻-edge-connected,
(5) G − J contains k + 1 edge-disjoint spanning trees for every choice of a k-
element subset J of edges.

Proof. By Theorem 2.1, (1) is equivalent to (2) which in turn, by Theorem
2.3, is equivalent to (4). (5) clearly implies (2). To see that (2) implies (5), we
apply Tutte's theorem on disjoint spanning trees [T61] which asserts that a graph
G = (V, E) contains k′ edge-disjoint spanning trees if and only if e_G(F) ≥ k′t − k′
holds for every partition F := {V1, V2, ..., Vt} of V. Applying this result to
k′ = k + 1 and noting that (k + 1)t − 1 = (k + 1)(t − 1) + k, one can observe that
(2) in Theorem 2.1 is equivalent to (5). □
We remark that (5) can be derived directly from (3) without invoking Tutte’s
theorem. It is tempting to try to find a direct, short proof of the equivalence of
(1) and (4), or at least one of the two opposite implications. We did not succeed
even in the special case k = 1.

3 l⁻-Edge-Connected Digraphs
Let D = (V, A) be a digraph which is l⁻-ec with respect to root-node s, that is,
ϱ(X) ≥ p_l(X) for every X ⊂ V where p_l is defined in (1.1). We call a set X tight
if ϱ(X) = p_l(X). A node r of D and the set {r} as well will be called special if
ϱ(r) = l = δ(r) + 1. (Since δ(s) ≥ l, s is not special.)
Lemma 3.1 Suppose that a digraph D = (V, A) with |V| ≥ 2 is l⁻-ec where
l ≥ 2. Then there is an edge f = ur of D which does not enter any non-special
tight set.

Proof. We need some preparatory claims.

Claim A For crossing sets X, Y, one has p_l(X) + p_l(Y) = p_l(X ∩ Y) + p_l(X ∪ Y)
and p_l(X) + p_l(Y) ≤ p_l(X − Y) + p_l(Y − X). □
Claim B The intersection of two crossing tight sets X, Y is not special.
Proof. By (1.3b) we have ϱ(X) + ϱ(Y) = p_l(X) + p_l(Y) ≤ p_l(X − Y) + p_l(Y − X) ≤
ϱ(X − Y) + ϱ(Y − X) = ϱ(X) + ϱ(Y) − d̄(X, Y) − [ϱ(X ∩ Y) − δ(X ∩ Y)] from
which ϱ(X ∩ Y) = δ(X ∩ Y) follows and hence X ∩ Y cannot be special. □
Claim C For two crossing tight sets X, Y, both X ∩ Y and X ∪ Y are tight.
Moreover, d(X, Y) = 0 holds.
Proof. By (1.3a) we have ϱ(X) + ϱ(Y) = p_l(X) + p_l(Y) = p_l(X ∩ Y) + p_l(X ∪ Y) ≤
ϱ(X ∩ Y) + ϱ(X ∪ Y) = ϱ(X) + ϱ(Y) − d(X, Y) from which equality holds
everywhere, and the claim follows. □
Let us turn to the proof of the lemma and suppose indirectly that there is
a family T of (distinct) non-special tight sets so that every edge of D enters a
member of T, and assume that T maximizes Σ(|X|² : X ∈ T).
Claim D T contains no two crossing members.
Proof. Suppose indirectly that X and Y are two crossing members of T. By
Claim B, X ∩ Y is not special. By Claim C, X ∩ Y and X ∪ Y are tight. Hence
T′ := (T − {X, Y}) ∪ {X ∩ Y, X ∪ Y} is a family of non-special tight sets. Since
d(X, Y) = 0, every edge of D enters a member of T′, as well. However this
contradicts the maximal choice of T since |X|² + |Y|² < |X ∩ Y|² + |X ∪ Y|². □
Let K := {X ∈ T : s ∉ X} and L := {V − X : X ∈ T, s ∈ X}. Then K
contains no special set, ϱ(X) = l for every X ∈ K, δ(X) = l − 1 for every X ∈ L.
Let C denote the union of K and L in the sense that if X is a set belonging to
both K and L, then C includes two copies of X. Now C is a laminar family of
subsets of V − s, and every edge e of D is covered by C in the sense that e
enters a member of K or leaves a member of L. Let us choose families K and L
so as to satisfy all these properties and so that Σ(|X| : X ∈ C) is minimum.
Claim E There is no node v ∈ V − s for which {v} ∈ K and {v} ∈ L.
Proof. {v} ∈ L implies δ(v) = l − 1. {v} ∈ K implies ϱ(v) = l, that is, v would be
special, contradicting the assumption on K. □
We distinguish two cases.
Case 1 Every member of C is a singleton.

Let K₀ = {v : {v} ∈ K} denote the set of nodes whose singletons belong to K.
Since D is strongly connected and |V| ≥ 2, there
is an edge e = st leaving s. Edge e cannot leave any member of L since these
members are subsets of V − s. Therefore e must enter a member of K, that is,
K₀ is non-empty. By the strong connectivity of D, there is an edge e′ leaving K₀.
By Claim E, no element of K₀, as a singleton, is a member of L, and hence edge
e′ neither enters a member of K nor leaves a member of L, contradicting the
assumption. Therefore Case 1 cannot occur.

Case 2 There is a non-singleton member Z of C.

Suppose that Z is minimal with respect to containment.

Claim F Z induces a strongly connected digraph.
Proof. Suppose indirectly that there is a subset ∅ ⊂ X ⊂ Z so that there is no
edge in D from X to Z − X. If Z ∈ K, then replace Z in K by Z − X. Since
l ≤ ϱ(Z − X) ≤ ϱ(Z) = l we have ϱ(Z − X) = l. Furthermore, Z − X cannot be
special since every edge entering Z enters Z − X as well and hence every edge
entering X leaves Z − X from which l ≤ ϱ(X) ≤ δ(Z − X).
If Z ∈ L, then replace Z in L by X. In both cases we obtain a laminar
family satisfying the requirements for C and this contradicts the minimal choice
of C. □
Subcase 2.1 Z ∈ K.

There must be an element v of Z for which {v} ∉ K, for otherwise Z can
be left out from K. We claim that {v} ∉ K for every v ∈ Z. For otherwise
X := {x ∈ Z : {x} ∈ K} is a non-empty, proper subset of Z, so by Claim F
there is an edge e = xy with x ∈ X, y ∈ Z − X, and then e cannot be covered
by C (using that, by Claim E, {x} is not in L).
It follows that every edge uv with u, v ∈ Z leaves a member of L which is a
singleton, by the minimal choice of Z, and hence, by Claim F, {v} is in L for
every v ∈ Z. Then we have l = ϱ(Z) = Σ(ϱ(v) : v ∈ Z) − i(Z) ≥ l|Z| − i(Z)
and l − 1 ≤ δ(Z) = Σ(δ(v) : v ∈ Z) − i(Z) = (l − 1)|Z| − i(Z) from which
(l − 1)(|Z| − 1) ≥ i(Z) ≥ l(|Z| − 1), a contradiction. Therefore Subcase 2.1
cannot occur.
Subcase 2.2 Z ∈ L.

There must be an element v of Z for which {v} ∉ L, for otherwise Z can
be left out from L. We claim that {v} ∉ L for every v ∈ Z. For otherwise
X := {x ∈ Z : {x} ∈ L} is a non-empty, proper subset of Z, so by Claim F there
is an edge e = yx with x ∈ X, y ∈ Z − X, and then e cannot be covered by C
(using that, by Claim E, {x} is not in K).
It follows that every edge uv with u, v ∈ Z must enter a member of K, which
is a singleton, by the minimal choice of Z, and hence, by Claim F, {v} is in K
for every v ∈ Z. Therefore, as K contains no special members, no element of Z
is special, from which δ(v) ≥ l holds for every v ∈ Z.
We have l − 1 = δ(Z) = Σ(δ(v) : v ∈ Z) − i(Z) ≥ l|Z| − i(Z) and
l ≤ %(Z) = Σ(%(v) : v ∈ Z) − i(Z) = l|Z| − i(Z) from which l − 1 ≥ l, a
contradiction. Therefore Subcase 2.2 cannot occur either, and this completes
the proof of Lemma 3.1. □
Proof of Theorem 2.2. It is easy to see that both operations (j) and (jj) given in
the theorem preserve l⁻-edge-connectivity.
To see the converse, suppose that D is l⁻-ec. If D has no edges, then s is the
only node of D. Suppose now that A is non-empty and assume by induction
that every l⁻-ec digraph having fewer edges than D is constructible in the
sense that it can be constructed as described in the theorem.
If D has an edge f so that D′ = D − f is l⁻-ec, then D′ is constructible and
then we obtain D from D′ by adding back f , that is, by operation (j). Therefore
we may assume that the deletion of any edge destroys l⁻-edge-connectivity.
By Lemma 3.1 there is an edge f = ur of D so that r is special and so
that %′(X) ≥ pl (X) for every subset X ⊆ V distinct from {r}, where %′ denotes
the in-degree function of the digraph D′ := D − f. Since r is special, we have
%′(r) = l − 1 = δ′(r), where δ′ is the out-degree function of D′.
Let D1 = (U, A1 ) be the digraph arising from D by deleting r (where U :=
V − r), and let %1 denote the in-degree function of D1 . Let mi (u) (respectively,
mo (u)) denote the number of parallel edges in D′ from r to u (from u to r). From
%′(r) = δ′(r) we have mo (U ) = mi (U ). Let p(X) := (pl (X) − %1 (X))+ (X ⊂ U )
and p(∅) := p(U ) := 0. Since both pl and −%1 are crossing supermodular, p is
positively crossing supermodular.
We claim that (2.5a) holds. Indeed, for every ∅ ⊂ X ⊂ U one has pl (X) ≤
%′(X) = %1 (X) + mi (X) from which p(X) = (pl (X) − %1 (X))+ ≤ mi (X), which
is (2.5a).
We claim that (2.5b) holds, as well. Indeed, for every ∅ ⊂ X ⊂ U we have
pl (X) = pl (X + r) ≤ %′(X + r) = %1 (X) + mo (U − X) from which p(X) =
(pl (X) − %1 (X))+ ≤ mo (U − X), which is (2.5b).
By Theorem 2.5, there exists a digraph H = (U, F ) satisfying (2.3) and (2.4).
It follows from (2.3) and from the definition of p that the digraph D1 + H :=
(U, A1 ∪ F ) is l⁻-ec. Then D1 + H is constructible by induction, as |A1 ∪ F | =
|A| − (2l − 1) + (l − 1) < |A|. By (2.4), D arises from D1 + H by applying operation
(jj) with the choice F , proving that D is also constructible. □
Remark.
The proof of Theorem 2.5 in [F94] is algorithmic and gives rise to a combina-
torial strongly polynomial algorithm if an oracle for handling p is available. We
applied Theorem 2.5 to a function p defined by p(X) := (pl (X) − %1 (X))+ and
in this case the oracle can indeed be constructed via a network flow algorithm.
Hence we can find in polynomial time a digraph H = (U, F ) for which D1 + H
is l⁻-ec. By applying this method at most |A| times one can find the sequence
of operations (j) and (jj) guaranteed by Theorem 2.2.

4 Proof of Theorem 2.1

(1) → (2). Let F := {V1 , . . . , Vt } be a partition of V . For j = 2, . . . , t choose
an element tj of Vj if k + i(Vj ) is even. Furthermore, if the number of chosen
elements plus |E| is odd, then choose an element t1 of V1 . Let T be the set of
chosen elements. Then T is G-even, and by (1), G has a k-ec T -odd orientation.
For every j = 2, . . . , t, %(Vj ) ≥ k, and we claim that equality cannot occur.
Indeed, if %(Vj ) = k, then k + i(Vj ) = %(Vj ) + i(Vj ) = Σ(%(v) : v ∈ Vj ) ≡ |Vj ∩ T | (mod
2), so k + i(Vj ) + |Vj ∩ T | would be even, contradicting the definition of T .
Therefore, for every j = 2, . . . , t we have %(Vj ) ≥ k + 1 and also %(V1 ) ≥ k. Hence
eG (F ) = Σ_{j=1}^{t} %(Vj ) ≥ (k + 1)(t − 1) + k = (k + 1)t − 1, that is, (2.1a) holds.
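
The choice of T in the step (1) → (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: i(X) is taken to count the edges with both ends in X, "G-even" is read as |T| + |E| being even (which is how the proof uses it), and the function name choose_T and the rule of picking the smallest element of a class are hypothetical conveniences, not part of the paper.

```python
def choose_T(partition, edges, k):
    """Build the set T used in the step (1) -> (2).

    partition: list of disjoint vertex sets [V_1, ..., V_t]
    edges:     list of undirected edges (u, v)
    k:         the connectivity parameter
    Assumption: i(X) is the number of edges induced by X.
    """
    def induced(part):
        return sum(1 for u, v in edges if u in part and v in part)

    T = set()
    # For j = 2, ..., t pick an element of V_j when k + i(V_j) is even.
    for part in partition[1:]:
        if (k + induced(part)) % 2 == 0:
            T.add(min(part))  # any element would do; min is arbitrary
    # Repair the global parity with an element of V_1 if necessary.
    if (len(T) + len(edges)) % 2 == 1:
        T.add(min(partition[0]))
    return T


if __name__ == "__main__":
    parts = [{0, 1}, {2}, {3, 4}]
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(choose_T(parts, path, 2))
```

By construction, k + i(Vj ) + |Vj ∩ T | is odd for every j ≥ 2, which is exactly the parity fact the proof relies on.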

(2) → (3). Let s be any node of G. By Theorem 2.3, G has a (k + 1)⁻-ec
orientation, denoted by D = (V, A). By Theorem 2.2, D can be constructed
from s by a sequence of operations (j) and (jj). The corresponding sequence of
operations (i) and (ii) provides G.

(3) → (1). We use induction on the number of edges. There is nothing to prove
if G has no edges so suppose that E is non-empty. Let T be a G-even subset of
V . Let G′ denote the graph from which G is obtained by one of the operations
(i) and (ii). By induction, we may assume that G′ has a k-ec T′-odd orientation
for every G′-even set T′.
Suppose first that G arises from G′ by adding a new edge f = xy. Let
T′ := T ⊕ {y}. Clearly, T′ is G′-even. By induction, there exists a k-ec T′-odd
orientation of G′. By orienting edge f from x to y, we obtain a k-ec T -odd
orientation of G.
Second, suppose that G arises from G′ by operation (ii). In case r ∈ T , define
T′ := T − r if k is even and T′ := (T − r) ⊕ {u} if k is odd. In case r ∉ T , define
T′ := T ⊕ {u} if k is even and T′ := T if k is odd. Then T′ is G′-even and, by
induction, G′ has a k-ec T′-odd orientation. Orient the undirected edge ur from
u to r if either k is even and r ∈ T or else k is odd and r ∉ T . Otherwise, that
is, if either k is odd and r ∈ T or else k is even and r ∉ T , orient ur from r to u.
Recall that F denotes the subset of edges of G′ used in Operation (jj). If an
element xy of F gets orientation from x to y, then
let the two corresponding edges of G be oriented from x to r and from r to y,
respectively. Obviously, the resulting orientation of G is k-ec and T -odd. □
Remark.
Condition (2) in Theorem 2.1 shows that the property in (1) is in co-NP.
Indeed, if (2) fails to hold for a partition, then a G-even subset T can be con-
structed (as was proved in (1) → (2)) for which no k-ec T -odd orientation exists.
For a given graph G the question whether there is a partition F violating
(2.1a) or G can be constructed as described in condition (3) of Theorem 2.1 can
be decided algorithmically as follows. The proof of Theorem 2.3 described in
[F80] is algorithmic, and gives rise to a combinatorial strongly polynomial time
algorithm in the special case p = pl for finding either an l⁻-ec orientation of G or
else a partition violating (1.2a) (which is equivalent to (2.1a) for l = k + 1). As
we have mentioned at the end of Section 3, finding a sequence of operations (j)
and (jj) to build D, and hence a sequence of operations (i) and (ii) to build G,
can also be done in polynomial time.
Naturally, this is just a rough proof of the existence of a combinatorial poly-
nomial time algorithm for finding either a partition of V violating (2.1a) or else a
sequence of operations (i) and (ii) to build G, and leaves room for improvements
to get a decent bound on the complexity.

References
[CG97] W. Cunningham and J. Geelen: The optimal path-matching problem,
Combinatorica, Vol. 17, No. 3 (1997) pp. 315-338.
[F80] A. Frank: On the orientation of graphs, J. Combinatorial Theory, Ser. B,
Vol. 28, No. 3 (1980) pp. 251-261.
[FST84] A. Frank, A. Sebő and É. Tardos: Covering directed and odd cuts,
Mathematical Programming Studies 22 (1984) pp. 99-112.
[FJS98] A. Frank, T. Jordán and Z. Szigeti: An orientation theorem with parity
constraints, to appear in IPCO ’99.
[F94] A. Frank: Connectivity augmentation problems in network design, in:
Mathematical Programming: State of the Art 1994, eds. J.R. Birge and K.G. Murty,
The University of Michigan, pp. 34-63.
[G82] R. Giles: Optimum matching forests, I-II-III, Mathematical Programming,
22 (1982) pp. 1-51.
[L70] L. Lovász: Subgraphs with prescribed valencies, J. Combin. Theory, 8 (1970)
pp. 391-416.
[L80] L. Lovász: Selecting independent lines from a family of lines in a space, Acta
Sci. Univ. Szeged, 43 (1980) pp. 121-131.
[M78] W. Mader: Über die Maximalzahl kantendisjunkter A-Wege, Archiv der
Mathematik (Basel) 30 (1978) pp. 325-336.
[M82] W. Mader: Konstruktion aller n-fach kantenzusammenhängenden Digraphen,
Europ. J. Combinatorics 3 (1982) pp. 63-67.
[N60] C.St.J.A. Nash-Williams: On orientations, connectivity, and odd vertex
pairings in finite graphs, Canad. J. Math. 12 (1960) pp. 555-567.
[N81] L. Nebeský: A new characterization of the maximum genus of a graph,
Czechoslovak Mathematical Journal, 31 (106), (1981) pp. 604-613.
[S81] P.D. Seymour: On odd cuts and plane multicommodity flows, Proceedings of
the London Math. Soc. 42 (1981) pp. 178-192.
[T61] W.T. Tutte: On the problem of decomposing a graph into n connected factors,
J. London Math. Soc. 36 (1961) pp. 221-230.
[T47] W.T. Tutte: The factorization of linear graphs, J. London Math. Soc. 22
(1947) pp. 107-111.
Approximation Algorithms for MAX 4-SAT and
Rounding Procedures for Semidefinite Programs

Eran Halperin and Uri Zwick

Department of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel.


{heran,zwick}@math.tau.ac.il

Abstract. Karloff and Zwick obtained recently an optimal 7/8-approximation
algorithm for MAX 3-SAT. In an attempt to see whether similar
methods can be used to obtain a 7/8-approximation algorithm for MAX
SAT, we consider the most natural generalization of MAX 3-SAT, namely
MAX 4-SAT. We present a semidefinite programming relaxation of MAX
4-SAT and a new family of rounding procedures that try to cope well
with clauses of various sizes. We study the potential, and the limitations,
of the relaxation and of the proposed family of rounding procedures us-
ing a combination of theoretical and experimental means. We select two
rounding procedures from the proposed family of rounding procedures.
Using the first rounding procedure we seem to obtain an almost opti-
mal 0.8721-approximation algorithm for MAX 4-SAT. Using the second
rounding procedure we seem to obtain an optimal 7/8-approximation al-
gorithm for satisfiable instances of MAX 4-SAT. On the other hand, we
show that no rounding procedure from the family considered can yield an
approximation algorithm for MAX 4-SAT whose performance guarantee
on all instances of the problem is greater than 0.8724.
Although most of this paper deals specifically with the MAX 4-SAT
problem, we believe that the new family of rounding procedures intro-
duced, and the methodology used in the design and in the analysis of the
various rounding procedures considered would have a much wider range
of applicability.

1 Introduction

MAX SAT is one of the most natural optimization problems. An instance of
MAX SAT in the Boolean variables x1 , . . . , xn is composed of a collection of
clauses. Each clause is the disjunction of an arbitrary number of literals. Each
literal is a variable, xi , or a negation, x̄i , of a variable. Each clause has a non-
negative weight w associated with it. The goal is to find a 0-1 assignment of
values to the Boolean variables x1 , . . . , xn so that the sum of the weights of the
satisfied clauses is maximized.
Following a long line of research by many authors, we now know that MAX
SAT is APX-hard (or MAX SNP-hard) [21,10,3,2,7,6,18] . This means that there
is a constant ε > 0 such that, assuming P ≠ NP, there is no polynomial time
approximation algorithm for MAX SAT with a performance guarantee of at

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 202–217, 1999.
© Springer-Verlag Berlin Heidelberg 1999
least 1 − ε on all instances of the problem. Approximation algorithms for MAX
SAT were designed by many authors, including [16,28,11,12,5,4]. The best per-
formance ratio known for the problem is currently 0.77 [4]. An approximation
algorithm for MAX SAT with a conjectured performance guarantee of 0.797 is
given in [31].
In a major breakthrough, Håstad [14] showed recently that no polynomial
time approximation algorithm for MAX SAT can have a performance guarantee
of more than 7/8, unless P=NP. Håstad shows, in fact, that no polynomial time
approximation algorithm for satisfiable instances of MAX {3}-SAT can have a
performance guarantee of more than 7/8. MAX {3}-SAT is the subproblem of
MAX SAT in which each clause is of size exactly three. An instance is satisfiable
if there is an assignment that satisfies all its clauses.
Karloff and Zwick [17] obtained recently an optimal 7/8-approximation algo-
rithm for MAX 3-SAT, the version of MAX SAT in which each clause is of size at
most three. (This claim appears in [17] as a conjecture. It has since been proved.)
Their algorithm uses semidefinite programming. A much simpler approximation
algorithm has a performance guarantee of 7/8 if all clauses are of size at least
three. If all clauses are of size at least three then a random assignment satisfies,
on the average, at least 7/8 of the clauses.
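
The 7/8 figure for random assignments is easy to confirm by brute force. The following minimal sketch (the function name satisfied_fraction is ours, introduced only for illustration) enumerates all assignments to the k distinct variables of a single clause and reports the fraction that satisfy it, namely 1 − 2^(−k).

```python
from fractions import Fraction
from itertools import product


def satisfied_fraction(k):
    """Fraction of the 2^k assignments that satisfy x_1 v ... v x_k."""
    total = 0
    good = 0
    for bits in product([0, 1], repeat=k):
        total += 1
        good += any(bits)  # the clause fails only on the all-zero assignment
    return Fraction(good, total)


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, satisfied_fraction(k))
```

Since the fraction only grows with k, a clause of size at least three is satisfied with probability at least 7/8, whereas clauses of size one or two fall below that bound.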
We thus have a performance guarantee of 7/8 for instances in which all
clauses are of size at most three, and for instances in which all clauses are of
size at least three. Can we get a performance guarantee of 7/8 for all instances
of MAX SAT? In an attempt to answer this question, we check the prospects
of obtaining a 7/8-approximation algorithm for MAX 4-SAT, the subproblem of
MAX SAT in which each clause is of size at most four. As it turns out, this is
already a challenging problem.
The 7/8-approximation algorithm for MAX 3-SAT starts by solving a semidef-
inite programming relaxation of the problem. It then rounds the solution of this
program using a random hyperplane passing through the origin. It is natural
to try to obtain a similar approximation algorithm for MAX 4-SAT. It is not
difficult, see Section 2, to obtain a semidefinite programming relaxation of MAX
4-SAT. It is again natural to try to round this solution using a random hyper-
plane. It turns out, however, that the performance guarantee of this algorithm
is only 0.845173. Although this is much better than all previous performance
guarantees for MAX 4-SAT, this guarantee is, unfortunately, below 7/8.
As the semidefinite programming relaxation of MAX 4-SAT is the strongest
relaxation of its kind (see again Section 2), it seems that a different rounding
procedure should be used. We describe, in Section 3, a new family of rounding
procedures. This family extends all the families of rounding procedures previ-
ously suggested for maximum satisfiability problems. The difficulty in developing
good rounding procedures for MAX 4-SAT is that rounding procedures that work
well for the short clauses, do not work so well for the longer clauses, and vice
versa. Rounding procedures from the new family try to work well on all clause
sizes simultaneously. We initially hoped that an appropriate rounding procedure
from this family could be used to obtain 7/8-approximation algorithms for MAX
4-SAT and perhaps even MAX SAT. It turns out, however, that the new family
falls just short of this mission. The experiments that we have made suggest that
a rounding procedure from the family, which we explicitly describe, can be used
to obtain a 0.8721-approximation algorithm for MAX 4-SAT. Unfortunately, no
rounding procedure from the family yields an approximation algorithm for MAX
4-SAT with a performance guarantee larger than 0.8724.
We have more success with MAX {2, 3, 4}-SAT, the version of MAX SAT in
which the clauses are of size two, three or four. We present a second rounding
procedure from the family that seems to yield an optimal 7/8-approximation
algorithm for MAX {2, 3, 4}-SAT. A 7/8-approximation algorithm for MAX
{2, 3, 4}-SAT yields immediately an optimal 7/8-approximation algorithm for
satisfiable instances of MAX 4-SAT, as clauses of size one can be easily elimi-
nated from satisfiable instances.
To determine the performance guarantee obtained using a given rounding
procedure R, or at least a lower bound on this ratio, we have to find the global
minimum of a function ratio R (v0 , v1 , v2 , v3 , v4 ), given a set of constraints on
the unit vectors v0 , v1 , . . . , v4 ∈ ℝ⁵. The function ratio R is a fairly complicated
function determined by the rounding procedure R. As five unit vectors are
determined, up to rotations, by the (5 choose 2) = 10 angles between them, the function
ratio R is actually a function of 10 real variables. Finding the global minimum
of ratio R analytically is a formidable task. In the course of our investigation we
experimented with hundreds of rounding procedures. Finding these minima ‘by
hand’ was not really an option. We have implemented a set of Matlab functions
that use numerical techniques to find these minima.
The discussion so far centered on the quality of the rounding procedures con-
sidered. We also consider the quality of the suggested semidefinite programming
relaxation itself. The integrality ratio of the MAX 4-SAT relaxation cannot be
more than 7/8, as it is also a relaxation of MAX 3-SAT. We also show that the
integrality ratio of the relaxation, considered as a relaxation of the problem MAX
{1, 4}-SAT, is at most 0.8753. The fact that this ratio is, at best, just above 7/8
is another indication of the difficulty of obtaining optimal 7/8-approximation
algorithms for MAX 4-SAT and MAX SAT. It may also indicate that a stronger
semidefinite programming relaxation would be needed to accomplish this goal.
The fact that numerical optimization techniques were used to compute the
performance guarantees of the algorithms means that we cannot claim the ex-
istence of a 0.8721-approximation algorithm for MAX 4-SAT, and of a 7/8-ap-
proximation algorithm for MAX {2, 3, 4}-SAT as theorems. We believe, however,
that it is possible to prove these claims analytically and promote them to the
status of theorems, as was eventually done with the optimal 7/8-approximation
algorithm for MAX 3-SAT. This would require, however, considerable effort. It
may make more sense, therefore, to look for an approximation algorithm that
seems to be a 7/8-approximation algorithm for MAX 4-SAT before proceeding
to this stage.
In addition to implementing a set of Matlab functions that try to find the per-
formance guarantee of a given rounding procedure from the family considered,
we have also implemented a set of functions that search for good rounding pro-
cedures. The whole project required about 3000 lines of code. The two rounding
procedures mentioned above, and several other interesting rounding procedures
mentioned in Section 5, were found automatically using this system, with some
manual help from the authors. The total running time used in the search for
good rounding procedures is measured in months.
We end this section with a short survey of related results. The 7/8-approxi-
mation algorithm for MAX 3-SAT is based on the MAX CUT approximation
algorithm of Goemans and Williamson [12]. A 0.931-approximation algorithm
for MAX 2-SAT was obtained by Feige and Goemans [9]. Asano [4] obtained
a 0.770-approximation algorithm for MAX SAT. Trevisan [25] obtained a
0.8-approximation algorithm for satisfiable MAX SAT instances. The last two results
are also the best published results for MAX 4-SAT.

2 Semidefinite Programming Relaxation of MAX 4-SAT

Karloff and Zwick [17] describe a canonical way of obtaining semidefinite pro-
gramming relaxations for any constraint satisfaction problem. We now describe
the canonical relaxation of MAX 4-SAT obtained using this approach.
Assume that x1 , . . . , xn are the variables of the MAX 4-SAT instance. We let
x0 = 0 and xn+i = x̄i , for 1 ≤ i ≤ n. The semidefinite program corresponding
to the instance has a variable unit vector vi , corresponding to each literal xi ,
and scalar variables zi , zij , zijk or zijkl corresponding to the clauses xi , xi ∨ xj ,
xi ∨ xj ∨ xk and xi ∨ xj ∨ xk ∨ xl of the instance, where 1 ≤ i, j, k, l ≤ 2n. Note
that all clauses, including those that contain negated literals, can be expressed
in this form. Clearly, we require vn+i = −vi , or vi · vn+i = −1, for 1 ≤ i ≤ n.
The objective of the semidefinite program is to maximize the function
\[ \sum_{i} w_i z_i + \sum_{i,j} w_{ij} z_{ij} + \sum_{i,j,k} w_{ijk} z_{ijk} + \sum_{i,j,k,l} w_{ijkl} z_{ijkl} , \]

where the wi ’s, wij ’s, wijk ’s and wijkl ’s are the non-negative weights of the
different clauses, subject to the following collection of constraints. For ease of
notation, we write down the constraints that correspond to the clauses x1 , x1 ∨x2 ,
x1 ∨x2 ∨x3 and x1 ∨x2 ∨x3 ∨x4 . The constraints corresponding to the other clauses
are easily obtained by plugging in the corresponding indices. The constraints
corresponding to x1 and x1 ∨ x2 are quite simple:
\[ z_1 = \frac{1 - v_0 \cdot v_1}{2} , \qquad z_{12} \le \frac{3 - v_0 \cdot v_1 - v_0 \cdot v_2 - v_1 \cdot v_2}{4} , \qquad z_{12} \le 1 . \]

The constraints corresponding to x1 ∨ x2 ∨ x3 are slightly more complicated:

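\[ z_{123} \le \frac{4 - (v_0 + v_1) \cdot (v_2 + v_3)}{4} , \qquad z_{123} \le \frac{4 - (v_0 + v_2) \cdot (v_1 + v_3)}{4} , \]
\[ z_{123} \le \frac{4 - (v_0 + v_3) \cdot (v_1 + v_2)}{4} , \qquad z_{123} \le 1 . \]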
It is not difficult to check that the first three constraints above are equivalent to
the requirement that
\[ z_{123} \le \frac{4 - (v_{i_0} \cdot v_{i_1} + v_{i_1} \cdot v_{i_2} + v_{i_2} \cdot v_{i_3} + v_{i_3} \cdot v_{i_0})}{4} , \]
for any permutation i0 , i1 , i2 , i3 on 0, 1, 2, 3. We will encounter similar constraints
for the 4-clauses. The constraints corresponding to x1 ∨ x2 ∨ x3 ∨ x4 are even
more complicated. For any permutation i0 , i1 , i2 , i3 , i4 on 0, 1, 2, 3, 4 we require:
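\[ z_{1234} \le \frac{5 - (v_{i_0} \cdot v_{i_1} + v_{i_1} \cdot v_{i_2} + v_{i_2} \cdot v_{i_3} + v_{i_3} \cdot v_{i_4} + v_{i_4} \cdot v_{i_0})}{4} , \]
\[ z_{1234} \le \frac{5 - (v_{i_0} + v_{i_4}) \cdot (v_{i_1} + v_{i_2} + v_{i_3}) + v_{i_0} \cdot v_{i_4}}{4} , \qquad z_{1234} \le 1 . \]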
The first line above contributes 12 different constraints, the second line con-
tributes 10 different constraints. Together with the constraint z1234 ≤ 1 we
get a total of 23 constraints per 4-clause. In addition, for every distinct 0 ≤
i1 , i2 , i3 , i4 , i5 ≤ 2n, we require
\[ \sum_{1 \le j < k \le 3} v_{i_j} \cdot v_{i_k} \ge -1 \qquad \text{and} \qquad \sum_{1 \le j < k \le 5} v_{i_j} \cdot v_{i_k} \ge -2 . \]

Although we do not consider here clauses of size larger than 4, we remark
that for any k, and for any permutation i0 , i1 , . . . , ik on 0, 1, . . . , k, z12...k ≤
((k + 1) − Σ_{j=0}^{k} v_{i_j} · v_{i_{j+1}} )/4, where the index j + 1 is interpreted modulo k + 1,
is a valid constraint in the semidefinite programming relaxation of MAX k-SAT.
It is not difficult to verify that all these constraints are satisfied by any valid
‘integral’ assignment to the vectors vi and the scalars zi , zij , zijk and zijkl , i.e.,
an assignment in which vi = (1, 0, . . . , 0) if xi = 1, and vi = (−1, 0, . . . , 0) if
xi = 0, and in which every z variable is set to 1 if its corresponding clause is sat-
isfied by the assignment x1 , x2 , . . . , xn , and to 0 otherwise. Thus, the presented
semidefinite program is indeed a relaxation of the MAX 4-SAT instance.
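
To make the constraints concrete, the following Python sketch evaluates the largest value allowed for the z variable of a clause x1 ∨ . . . ∨ xk (1 ≤ k ≤ 4), given the Gram matrix of v0 , . . . , vk . It only illustrates the per-clause constraints listed above; the names clause_upper_bound and integral_gram are hypothetical, and the extra constraints on triples and quintuples of vectors are not checked here.

```python
from itertools import permutations, product


def clause_upper_bound(G):
    """Largest z allowed for x_1 v ... v x_k by the per-clause constraints.

    G[i][j] is the inner product v_i . v_j of the unit vectors v_0, ..., v_k.
    """
    k = len(G) - 1
    if k not in (1, 2, 3, 4):
        raise ValueError("only clauses of size 1 to 4 are handled")
    if k == 1:
        return (1 - G[0][1]) / 2  # equality constraint for z_1
    if k == 2:
        return min(1.0, (3 - G[0][1] - G[0][2] - G[1][2]) / 4)
    bounds = [1.0]
    for p in permutations(range(k + 1)):
        # Cyclic constraint: sum of inner products around the cycle p.
        cycle = sum(G[p[j]][p[(j + 1) % (k + 1)]] for j in range(k + 1))
        bounds.append((k + 1 - cycle) / 4)
        if k == 4:
            # Second family of constraints for 4-clauses.
            a, b = p[0], p[4]
            cross = sum(G[a][m] + G[b][m] for m in p[1:4])
            bounds.append((5 - cross + G[a][b]) / 4)
    return min(bounds)


def integral_gram(bits):
    """Gram matrix of the 'integral' vectors: x_0 = 0, so v_0 = (-1, 0, ...)."""
    signs = [-1] + [1 if b else -1 for b in bits]
    return [[si * sj for sj in signs] for si in signs]


if __name__ == "__main__":
    for bits in product([0, 1], repeat=4):
        print(bits, clause_upper_bound(integral_gram(bits)))
```

On integral assignments the bound evaluates to 1 when the clause is satisfied and to 0 otherwise, in line with the claim that the program is a relaxation.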
The constraints of the above semidefinite program correspond to the facets of
a polyhedron corresponding to the Boolean function x1 ∨x2 ∨x3 ∨x4 . As explained
in [17], it is therefore the strongest semidefinite relaxation that considers the
clauses of the instance one by one. Stronger relaxations may be obtained by
considering several clauses at once.
The semidefinite programming relaxation of a MAX 4-SAT instance has n+1
unknown unit vectors v0 , v1 , . . . , vn (the vectors vn+i are just used as shorthands
for −vi ), O(n⁴) scalar variables and O(n⁵) constraints. An almost optimal solution
in which all unit vectors v0 , v1 , . . . , vn lie in ℝ^{n+1} can be found in polynomial
time ([1],[13],[19]).

3 Rounding Procedures
In this section we consider various procedures that can be used to round so-
lutions of semidefinite programming relaxations and examine the performance
guarantees that we get for MAX 4-SAT using them. We start with simple round-
ing procedures and then move on to more complicated ones. The new family of
rounding procedures is then presented in Section 3.5.

3.1 Rounding Using a Random Hyperplane


Following Goemans and Williamson [12], many semidefinite programming based
approximation algorithms round the solution to the semidefinite program using
a random hyperplane passing through the origin. A random hyperplane that
passes through the origin is chosen by choosing its normal vector r as a uniformly
distributed vector on the unit sphere (in ℝ^{n+1}). A vector vi is then rounded to 0
if vi and v0 fall on the same side of the random hyperplane, i.e., if sgn(r · vi ) =
sgn(r · v0 ), and to 1, otherwise. Note that the rounded values of the variables are
usually not independent. More specifically, if vi and vj are not perpendicular,
i.e., if vi · vj ≠ 0, then the rounded values of xi and xj are dependent.
Given a set of unit vectors V , we let probH (V ) denote the probability that
not all the vectors of V fall on the same side of a random hyperplane that passes
through the origin. It is not difficult to see that
\[ \mathrm{prob}_H(v_0, v_1) = \frac{\theta_{01}}{\pi} , \qquad \mathrm{prob}_H(v_0, v_1, v_2) = \frac{\theta_{01} + \theta_{02} + \theta_{12}}{2\pi} , \]
where θij = arccos(vi ·vj ) is the angle between vi and vj . Evaluating probH (v0 , v1 ,
v2 , v3 ) is more difficult. As noted in [17],
\[ \mathrm{prob}_H(v_0, v_1, v_2, v_3) = 1 - \frac{\mathrm{Vol}(\lambda_{01}, \lambda_{02}, \lambda_{12}, \lambda_{03}, \lambda_{13}, \lambda_{23})}{\pi^2} , \]
where (λ01 , λ02 , λ12 , λ03 , λ13 , λ23 ) = π − (θ23 , θ13 , θ03 , θ12 , θ02 , θ01 ) and Vol(λ01 ,
λ02 , λ12 , λ03 , λ13 , λ23 ) is the volume of a spherical tetrahedron with dihedral
angles λ01 , λ02 , λ12 , λ03 , λ13 , λ23 . Unfortunately, the volume function of spherical
tetrahedra seems to be a non-elementary function and numerical integration
should be used to evaluate it ([24],[8],[15],[27]).
It is not difficult to verify, using inclusion-exclusion, that
\[ \mathrm{prob}_H(v_0, v_1, v_2, v_3, v_4) = \frac{1}{2} \sum_{i<j<k<l} \mathrm{prob}_H(v_i, v_j, v_k, v_l) - \frac{1}{4} \sum_{i<j} \mathrm{prob}_H(v_i, v_j) . \]

Thus, the probability that certain five unit vectors are separated by a random
hyperplane can be expressed as a combination of probabilities of events that
involve at most four vectors and no further numerical integration is needed. More
generally, probH (v0 , . . . , vk ), for any even k, can be expressed as combinations of
probabilities involving at most k vectors. The same does not hold, unfortunately,
when k is odd.
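
The five-vector identity above can be checked combinatorially. For a fixed hyperplane each vector lands on one of two sides, so it suffices to verify the corresponding identity between indicator functions on all 2⁵ sign patterns; taking expectations over the random hyperplane then gives the formula. The sketch below does this check (the helper names are ours, not the paper's).

```python
from itertools import combinations, product


def separated(signs):
    """1 if the given signs are not all equal, i.e. the vectors are separated."""
    return int(len(set(signs)) > 1)


def identity_holds_for(signs):
    """Check 4*[five separated] == 2*sum over quadruples - sum over pairs."""
    five = separated(signs)
    quads = sum(separated([signs[i] for i in q])
                for q in combinations(range(5), 4))
    pairs = sum(separated([signs[i] for i in p])
                for p in combinations(range(5), 2))
    return 4 * five == 2 * quads - pairs


# Exhaustive check over all side patterns of five vectors.
all_ok = all(identity_holds_for(s) for s in product([-1, 1], repeat=5))

if __name__ == "__main__":
    print(all_ok)
```

The identity is multiplied by 4 so that the check uses integer arithmetic only.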
We let relax(v0 , v1 , . . . , vi ), where 1 ≤ i ≤ 4, denote the ‘relaxed’ value of
the clause x1 ∨ . . . ∨ xi , i.e., the maximal value to which the scalar z12...i can
be set, while still satisfying all the relevant constraints, given the unit vectors
v0 , v1 , . . . , vi . We let

ratio H (v0 , v1 , . . . , vi ) = probH (v0 , v1 , . . . , vi )/relax(v0 , v1 , . . . , vi ) .

Finally, for every 1 ≤ i ≤ 4, we let αi = min ratio H (v0 , v1 , . . . , vi ), where the
minimum is over all configurations of unit vectors that satisfy the constraints
described in the previous section, and α = min{α1 , α2 , α3 , α4 }. As follows from
straightforward arguments (see [12] or [17]), α is a lower bound on the performance
guarantee of the approximation algorithm for MAX 4-SAT that uses the
semidefinite relaxation and then rounds the solution using a random hyperplane.
It is shown in [12] that α1 = α2 ≃ 0.87856. It is shown in [17] that α3 = 7/8.
Unfortunately, it turns out that α4 ≃ 0.845173. The minimum is attained
when the angle between each pair of vectors among v0 , v1 , . . . , v4 is exactly
arccos(1/5) ≃ 1.369438. It is not difficult to check that when vi · vj = 1/5,
for every 0 ≤ i < j ≤ 4, all the inequalities on z1234 simplify to z1234 ≤ 1 and
therefore relax(v0 , v1 , v2 , v3 , v4 ) = 1.
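
The claim that every bound on z1234 collapses to 1 in this configuration is a short calculation, sketched here with exact rational arithmetic. The variable names are illustrative only: with all pairwise inner products equal to c = 1/5, the cyclic constraints involve five inner products, and the second family involves six inner products subtracted and one added.

```python
import math
from fractions import Fraction

c = Fraction(1, 5)  # common inner product v_i . v_j

# Cyclic constraints: (5 - sum of five inner products) / 4.
cycle_bound = (5 - 5 * c) / 4

# Second family: (5 - (v_a + v_b).(three others) + v_a . v_b) / 4.
pair_bound = (5 - 6 * c + c) / 4

# Angle between each pair of vectors.
angle = math.acos(1 / 5)

if __name__ == "__main__":
    print(cycle_bound, pair_bound, angle)
```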

3.2 Pre-rounding Rotations

Feige and Goemans [9] introduced the following variation of random hyperplane
rounding. Let f : [0, π] → [0, π] be a continuous function satisfying f (0) = 0 and
f (π − θ) = π − f (θ), for 0 ≤ θ ≤ π. Before rounding the vectors using a random
hyperplane, the vector vi is rotated into a new vector v′i , in the plane spanned
by v0 and vi , so that the angle θ′0i between v0 and v′i would be θ′0i = f (θ0i ). The
rotations of the vectors v1 , . . . , vn affect, of course, the angles between these
vectors. Let θ′ij be the angle between v′i and v′j . It is not difficult to see (see [9]),
that for i, j > 0, i ≠ j, we have

\[ \cos\theta'_{ij} = \cos\theta'_{0i} \cdot \cos\theta'_{0j} + \frac{\cos\theta_{ij} - \cos\theta_{0i} \cdot \cos\theta_{0j}}{\sin\theta_{0i} \cdot \sin\theta_{0j}} \cdot \sin\theta'_{0i} \cdot \sin\theta'_{0j} . \]

The vectors v0 , v′1 , . . . , v′n are then rounded using a random hyperplane. The
condition f (0) = 0 is required to ensure the continuity of the transformation
vi → v′i . The condition f (π − θ) = π − f (θ) ensures that unnegated and negated
literals are treated in the same manner.
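
The rotation formula translates directly into code. The sketch below is a minimal illustration, not the authors' implementation: rotated_cos is a hypothetical name, the sample function f(θ) = θ + 0.1 sin 2θ is an arbitrary choice that happens to satisfy f(0) = 0 and f(π − θ) = π − f(θ), and the degenerate case sin θ0i = 0 (vi parallel to v0) is simply rejected.

```python
import math


def rotated_cos(theta_0i, theta_0j, theta_ij, f):
    """Cosine of the angle between v_i' and v_j' after rotating by f.

    theta_0i, theta_0j: angles between v_0 and v_i, v_j before rotation
    theta_ij:           angle between v_i and v_j before rotation
    f:                  rotation function on [0, pi]
    """
    si, sj = math.sin(theta_0i), math.sin(theta_0j)
    if si == 0 or sj == 0:
        raise ValueError("v_i or v_j is parallel to v_0; formula undefined")
    new_0i, new_0j = f(theta_0i), f(theta_0j)
    # Component of cos(theta_ij) not explained by the directions along v_0.
    coeff = (math.cos(theta_ij)
             - math.cos(theta_0i) * math.cos(theta_0j)) / (si * sj)
    return (math.cos(new_0i) * math.cos(new_0j)
            + coeff * math.sin(new_0i) * math.sin(new_0j))


def sample_f(theta):
    """An arbitrary admissible rotation function (illustrative assumption)."""
    return theta + 0.1 * math.sin(2 * theta)


if __name__ == "__main__":
    print(rotated_cos(1.0, 2.0, 1.5, sample_f))
```

With the identity function in place of f the formula returns cos θij unchanged, and any admissible f leaves vectors perpendicular to v0 in place, since f(π/2) = π/2.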
Feige and Goemans [9] use rotations to obtain a 0.931-approximation algo-
rithm for MAX 2-SAT. Rotations are also used in [29] and [30]. Can we use
rotations to get a better approximation algorithm for MAX 4-SAT? The answer
is that rotations on their own help, but very little. Consider the configuration
v0 , v1 , v2 , v3 , v4 in which θ0i = π/2, for 1 ≤ i ≤ 4, and θij = arccos(1/3), for
1 ≤ i < j ≤ 4. For this configuration we get relax = 1 and ratio H ≃ 0.8503. As
every rotation function f must satisfy f (π/2) = π/2, rotations have no effect on
this configuration.
A different type of rotations was recently used by Nesterov [20] and Zwick [31].
These outer-rotations are used in [31] to obtain some improved approximation
algorithms for MAX SAT and MAX NAE-SAT. We were not able to use them,
however, to improve the results that we get in this paper for MAX 4-SAT.

3.3 Rounding the Vectors Independently

The semidefinite programming relaxation of MAX 4-SAT that we are using here
is stronger than the linear programming relaxation suggested by Goemans and
Williamson [11]. It is nonetheless interesting to consider an adaptation of the
rounding procedures used in [11] to the present context. The rounding procedures
of [11] are based on the randomized rounding technique of Raghavan and
Thompson [23],[22].
Let g : [0, π] → [0, π] be a continuous function such that g(π − θ) = π −
g(θ), for 0 ≤ θ ≤ π. Note again that we must have g(π/2) = π/2. We do not
require g(0) = 0 this time. The rounding procedure described here rounds each
vector independently. The variable xi is assigned the value 1 with probability
g(θ0i )/π, and the value 0 with the complementary probability. The probability
probI (v0 , v1 , . . . , vi ) that a clause x1 ∨ . . . ∨ xi is satisfied is now

\[ \mathrm{prob}_I(v_0, v_1, \ldots, v_i) = 1 - \prod_{j=1}^{i} \left( 1 - \frac{g(\theta_{0j})}{\pi} \right) . \]

Note that as each vector is rounded independently, the angles θij , where i, j > 0,
between the vectors, have no effect this time. It may be worthwhile to note
that the choice g(θ) = π/2, for every 0 ≤ θ ≤ π, corresponds to choosing
the assignment to the variables x1 , x2 , . . . , xn uniformly at random, a ‘rounding
procedure’ that yields a ratio of 7/8 for clauses of size 3 and 15/16 for clauses
of size 4. Goemans and Williamson [11] describe several functions g using which
a 3/4-approximation algorithm for MAX SAT may be obtained.
Independent rounding performs well for long clauses. It cannot yield a ratio
larger than 3/4, however, for clauses of size 2. To see this, consider the configura-
tion v0 , v1 , v2 in which θ01 = θ02 = π/2 and θ12 = π. We have relax(v0 , v1 , v2 ) = 1
and probI (v0 , v1 , v2 ) = 3/4, for any function g.
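
The satisfaction probability under independent rounding depends only on the angles θ0j , which makes it simple to evaluate. The following sketch (prob_independent is a name chosen here for illustration) computes it for a given function g and reproduces the values quoted above.

```python
import math


def prob_independent(thetas, g):
    """Probability that x_1 v ... v x_i is satisfied under independent rounding.

    thetas: the angles theta_01, ..., theta_0i between v_0 and v_1, ..., v_i
    g:      function on [0, pi]; x_j is set to 1 with probability g(theta)/pi
    """
    miss = 1.0
    for theta in thetas:
        miss *= 1 - g(theta) / math.pi  # probability that x_j is set to 0
    return 1 - miss


def uniform_g(theta):
    """g = pi/2 everywhere: a uniformly random assignment."""
    return math.pi / 2


if __name__ == "__main__":
    half = math.pi / 2
    print(prob_independent([half] * 3, uniform_g))
    print(prob_independent([half] * 4, uniform_g))
    print(prob_independent([half, half], lambda t: t))
```

The last call corresponds to the size-2 configuration with θ01 = θ02 = π/2: because every admissible g maps π/2 to π/2, the result is 3/4 regardless of the choice of g.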

3.4 Simple Combinations

We have seen that hyperplane rounding works well for short clauses and that in-
dependent rounding works well for long clauses. It is therefore natural to consider
a combination of the two.
Perhaps the most natural combination of hyperplane rounding and indepen-
dent rounding is the following. Let 0 ≤ ε ≤ 1. With probability 1 − ε round
the vectors using a random hyperplane. With probability ε choose a random
assignment. It turns out that the best choice of ε here is ε ≃ 0.086553. With this
value of ε, we get α1 = α2 = α4 ≃ 0.853150 while α3 = 7/8. Thus, we again get
a small improvement but we are still far from 7/8.
Instead of rounding all vectors using a random hyperplane, or choosing ran-
dom values to all variables, we can round some of the vectors using a random
hyperplane, and assign some of the variables random values. More precisely, we
choose one random hyperplane. Each vector is now rounded using this random
hyperplane with probability 1 − ε, or is assigned a random value with proba-
bility ε. The decisions for the different variables are made independently. Letting
ε ≃ 0.073609, we get α1 = α2 = α4 ≃ 0.856994, while α3 ≃ 0.874496. This is
again slightly better but still far from 7/8.
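
The second combination, with one shared hyperplane and an independent coin per vector, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the name `mixed_round` is hypothetical, vectors are plain lists of floats, and the convention that x_i = 1 exactly when v_i is separated from v0 is inferred from the way prob_H is used in the text.

```python
import random

def mixed_round(v0, vectors, eps, rng):
    """Round each vector with one shared random hyperplane (probability
    1 - eps) or to a uniformly random bit (probability eps)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # A Gaussian vector gives a uniformly random hyperplane through the origin.
    normal = [rng.gauss(0.0, 1.0) for _ in range(len(v0))]
    side0 = dot(normal, v0) >= 0
    assignment = []
    for v in vectors:
        if rng.random() < eps:
            assignment.append(rng.randint(0, 1))       # random value
        else:
            # x_i = 1 exactly when v_i lies on the other side from v0
            assignment.append(int((dot(normal, v) >= 0) != side0))
    return assignment

rng = random.Random(0)
v0 = [1.0, 0.0, 0.0]
print(mixed_round(v0, [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]], 0.073609, rng))
```

Setting `eps` to 0 recovers pure hyperplane rounding, and setting it to 1 recovers a uniformly random assignment.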

3.5 More Complicated Combinations

Simple combinations of hyperplane rounding and independent rounding yield
modest improvements. Can we get more substantial improvements by using
more sophisticated combinations? To answer this question we introduce the
following family of rounding procedures. The new family seems to include all
the natural combinations of the rounding procedures mentioned above.
Each rounding procedure in the new family is characterized by three con-
tinuous functions f, g : [0, π] → [0, π] and ε : [0, π] → [0, 1]. The function f is
used for rotating the vectors before rounding them using a random hyperplane,
as described in Section 3.2. The function g is used to round the vectors inde-
pendently, as described in Section 3.3. The function ε is used to decide which of
the two roundings should be used. The decision is made independently for each
vector, depending on the angle between it and v0 . The function ε : [0, π] → [0, 1]
is a continuous function satisfying ε(π − θ) = ε(θ), a condition that ensures that
negated and unnegated literals are treated in the same manner. The vector vi is
rounded using a random hyperplane, shared by all the vectors rounded using a
random hyperplane, with probability 1 − ε(θ0i ), and is rounded independently,
with probability ε(θ0i ). Vectors rounded using the shared hyperplane are rotated
before the rounding. Let v′1 , v′2 , . . . , v′n be the vectors obtained by rotating the
vectors v1 , v2 , . . . , vn , as specified by the rotation function f . The probability
that a clause x1 ∨ x2 ∨ . . . ∨ xi is satisfied by the assignment produced by this
combined rounding procedure is given by the following expression:
\[
\mathrm{prob}_C(v_0, v_1, \ldots, v_i) \;=\; 1 - \sum_{S} \mathrm{pr}(S) \cdot \bigl(1 - \mathrm{prob}_H(v'(S))\bigr) \cdot \bigl(1 - \mathrm{prob}_I(v(\bar{S}))\bigr)
\]

where

\[
\mathrm{pr}(S) \;=\; \prod_{j \in S} \bigl(1 - \varepsilon(\theta_{0j})\bigr) \cdot \prod_{j \notin S} \varepsilon(\theta_{0j}) ,
\]

\[
v'(S) = \{v_0\} \cup \{\, v'_j \mid j \in S \,\} , \qquad v(\bar{S}) = \{v_0\} \cup \{\, v_j \mid j \notin S \,\} ,
\]

and where S ranges over all subsets of {1, 2, . . . , i}. Recall that probH (u1 , u2 , . . . ,
uk ) is the probability that the set of vectors u1 , u2 , . . . , uk is separated by a
random hyperplane, and that probI (v0 , u1 , . . . , uk ) is the probability that at
least one of the vectors u1 , u2 , . . . , uk is assigned the value 1 when all these
vectors are rounded independently using the function g.
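
The sum over subsets S can be evaluated by direct enumeration. The sketch below is only an illustration of the formula: `prob_combined` is a hypothetical name, and the two probabilities prob_H and prob_I are passed in as caller-supplied functions of an index set, because computing prob_H for several vectors requires the spherical volume computations discussed elsewhere in the paper. The one-variable example assumes that the rotated vector makes angle f(θ) with v0 and uses the standard fact that a random hyperplane separates two vectors at angle φ with probability φ/π; the numeric values 1.2, 0.5 and 0.25 are made up for the example.

```python
import math
from itertools import combinations

def prob_combined(thetas, eps, prob_h, prob_i):
    """Evaluate 1 - sum_S pr(S) * (1 - prob_H(v'(S))) * (1 - prob_I(v(S-bar))).

    thetas : angles theta_0j between v0 and the clause's vectors
    eps    : function theta -> probability of rounding independently
    prob_h : function(index set S) -> prob_H for the rotated vectors in S
    prob_i : function(index set T) -> prob_I for the vectors in T
    """
    n = len(thetas)
    everything = frozenset(range(n))
    total = 0.0
    for size in range(n + 1):
        for chosen in combinations(range(n), size):
            S = frozenset(chosen)
            pr = 1.0
            for j in range(n):
                e = eps(thetas[j])
                pr *= (1.0 - e) if j in S else e
            total += pr * (1.0 - prob_h(S)) * (1.0 - prob_i(everything - S))
    return 1.0 - total

# One-variable clause with illustrative values f(theta) = 1.2, g(theta) = 0.5.
p = prob_combined(
    [1.0],
    eps=lambda theta: 0.25,
    prob_h=lambda S: 1.2 / math.pi if S else 0.0,
    prob_i=lambda T: 0.5 / math.pi if T else 0.0,
)
print(p)
```

For a clause of size 1 the formula collapses to (1 − ε)·f(θ)/π + ε·g(θ)/π, which is what the example computes.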
We have made some experiments with an even wider family of rounding pro-
cedures but we were not able to improve on the results obtained using rounding
procedures selected from the family described here. More details will be given
in the full version of the paper.
Can we select a rounding procedure from the proposed family of rounding
procedures using which we can get an optimal, or an almost optimal, approxi-
mation algorithm for MAX 4-SAT?

4 The Search for Good Rounding Procedures


The new family of rounding procedures defined in the previous section is huge.
How can we expect to select the best, or almost the best, rounding procedure
from this family? As it turns out, although each rounding procedure is defined
by three continuous functions f, g and ε, most of the values of these functions
do not matter much. What really matter are the values of these functions at
several ‘important’ angles. We therefore restrict ourselves to rounding procedures
defined by piecewise linear functions f, g and ε with a relatively small number of
bends. By placing these bends at the ‘important’ angles, we can find, as we shall
see, a rounding procedure which is close to being the best rounding procedure
from this family.
More specifically, we consider functions obtained by connecting k given points
(x1 , y1 ), (x2 , y2 ), . . . , (xk , yk ) by straight line segments, where x1 = 0 and xk =
π/2. For f we also require y1 = 0 and yk = π/2. For g we also require yk = π/2.
The values of the functions f, g and ε for π/2 < θ ≤ π are determined by the
conditions f (π − θ) = π − f (θ), g(π − θ) = π − g(θ) and ε(π − θ) = ε(θ). We
usually worked with k ≤ 5.
For a given value of k we are now faced with a very difficult optimization
problem in 6k−9 real variables, the variables being the x and y coordinates of the
points through which the functions f, g and ε are required to pass. The objective
is to maximize α(C(f, g, ε)), the performance guarantee obtained by using the
rounding procedure defined by the functions f, g and ε that pass through the
points. Recall that evaluating α(C(f, g, ε)) for a given set of functions f, g and ε
is already a difficult task that requires finding the global minimum of a rather
complicated function of 10 real variables.
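
The count of 6k − 9 variables can be recovered from the constraints stated above. Each function is given by k points, that is, 2k coordinates. For f four of them are fixed (x1, y1, xk, yk), for g three (x1, xk, yk), and for ε two (x1, xk), so that

\[
\underbrace{(2k-4)}_{f} \;+\; \underbrace{(2k-3)}_{g} \;+\; \underbrace{(2k-2)}_{\varepsilon} \;=\; 6k-9 ,
\]

which gives, for example, 21 real variables for k = 5.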
We have written a Matlab program, called opt_fun, that tries to find a close
to optimal rounding procedure that uses functions specified using at most k
points. This is quite a non-trivial task and, as mentioned in the introduction,
it required about 3000 lines of code, in addition to the sophisticated numerical
optimization routines of Matlab’s optimization toolbox.
Although numerical methods were used to evaluate the performance guaran-
tees of the different rounding procedures, we believe that the 0.8721 and 7/8 per-
formance ratios claimed for the two rounding procedures that will be described
shortly are the correct performance ratios. There is, in fact, a completely me-
chanical way of generating a (long and tedious) rigorous proof of these claims.
As mentioned in the introduction, we believe that it would be more fruitful to
look for an algorithm that seems to achieve a performance ratio of 7/8 before
taking on the task of producing rigorous proofs. We believe that the use of nu-
merical methods would be inevitable in the search for optimal algorithms for
MAX 4-SAT and MAX SAT, at least using the current techniques.

5 Almost Optimal or Optimal Approximation Algorithms


We now present some optimal or close to optimal approximation algorithms
obtained using rounding procedures from the new family of rounding procedures.

          f                          g                          ε

( 0        , 0        )   ( 0        , 0        )   ( 0        , 0.250000 )
( 0.777843 , 1.210627 )   ( 0.750000 , 0        )   ( 0.744611 , 0.357201 )
( 1.038994 , 1.445975 )   ( 1.072646 , 0        )   ( 1.039987 , 0.255183 )
( 1.248362 , 1.394099 )   ( 1.248697 , 0.872552 )   ( 1.072689 , 0.222928 )
( π/2      , π/2      )   ( π/2      , π/2      )   ( π/2      , 0.131681 )

Fig. 1. The rounding procedure that seems to yield a 0.8721-approximation
algorithm for MAX 4-SAT

          f                          g                          ε

( 0        , 0        )   ( 0        , 0.550000 )   ( 0        , 0.650000 )
( 1.394245 , 1.544705 )   ( 1.155432 , 1.154866 )   ( 0.413021 , 0.163085 )
( π/2      , π/2      )   ( 1.394111 , 0.931661 )   ( π/2      , 0.160924 )
                          ( π/2      , π/2      )

Fig. 2. The rounding procedure that seems to yield an optimal 7/8-
approximation algorithm for MAX {2, 3, 4}-SAT.

5.1 MAX 4-SAT


Using the semidefinite programming relaxation of Section 2 and the rounding
procedure defined by the three piecewise linear functions passing through the
points given in Figure 1 we seem to obtain a 0.8721-approximation algorithm
for MAX 4-SAT, or more specifically, an algorithm with α1 ≃ α2 ≃ α3 ≃ α4 ≃
0.8721. As we shall see in Section 6, this is essentially the best approximation
ratio that we can obtain using a rounding procedure from the family considered.
It is interesting to note that g(θ) = 0 for 0 ≤ θ ≤ 1.072646 and that 0.13 ≤
ε(θ) ≤ 0.36 for 0 ≤ θ ≤ π. This means that if the angle θ0i between vi and v0
is less than about π/3, then with a probability of about 1/4, the variable xi is
assigned the value 0, without any further consideration of the angle θ0i . It is also
interesting to note that the function f (θ) is not monotone.
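
The observations about Figure 1 can be checked by building the three piecewise linear functions from the listed points and extending them to (π/2, π] with the symmetry conditions of Section 4. This is a small sketch, not the authors' opt_fun program; the helper name `interpolate` and the constant names are introduced here for illustration.

```python
import math
from bisect import bisect_right

HALF_PI = math.pi / 2

# Points from Fig. 1, as (angle, value) pairs on [0, pi/2].
F_POINTS = [(0.0, 0.0), (0.777843, 1.210627), (1.038994, 1.445975),
            (1.248362, 1.394099), (HALF_PI, HALF_PI)]
G_POINTS = [(0.0, 0.0), (0.750000, 0.0), (1.072646, 0.0),
            (1.248697, 0.872552), (HALF_PI, HALF_PI)]
EPS_POINTS = [(0.0, 0.250000), (0.744611, 0.357201), (1.039987, 0.255183),
              (1.072689, 0.222928), (HALF_PI, 0.131681)]

def interpolate(points, theta):
    """Piecewise linear interpolation through the given points."""
    xs = [x for x, _ in points]
    k = min(max(bisect_right(xs, theta) - 1, 0), len(points) - 2)
    (x0, y0), (x1, y1) = points[k], points[k + 1]
    return y0 + (y1 - y0) * (theta - x0) / (x1 - x0)

def f(theta):
    # f(pi - theta) = pi - f(theta) extends f beyond pi/2
    if theta <= HALF_PI:
        return interpolate(F_POINTS, theta)
    return math.pi - interpolate(F_POINTS, math.pi - theta)

def g(theta):
    # g(pi - theta) = pi - g(theta)
    if theta <= HALF_PI:
        return interpolate(G_POINTS, theta)
    return math.pi - interpolate(G_POINTS, math.pi - theta)

def eps(theta):
    # eps(pi - theta) = eps(theta)
    return interpolate(EPS_POINTS, min(theta, math.pi - theta))

grid = [i * math.pi / 1000 for i in range(1001)]
eps_range = (min(map(eps, grid)), max(map(eps, grid)))
f_is_monotone = all(f(a) <= f(b) for a, b in zip(grid, grid[1:]))
print(g(1.0), eps_range, f_is_monotone)
```

On this grid, g vanishes at θ = 1.0, ε stays within [0.13, 0.36], and f fails to be monotone because its value drops between the third and fourth points of its column.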

5.2 MAX {2,3,4}-SAT


Using the semidefinite programming relaxation of Section 2 and the rounding
procedure defined by the three piecewise linear functions passing through the
points given in Figure 2 we believe we obtain a 7/8-approximation algorithm for
MAX {2, 3, 4}-SAT. We get, in fact, an approximation algorithm for MAX 4-SAT
with α2 ≃ 0.8751, α3 = 7/8, α4 ≃ 0.8755 but with α1 ≃ 0.8352. It is interesting
to note the non-monotonicity of the function g(θ) and the fact that only one
intermediate point is needed for f (θ) and ε(θ) and only two intermediate points
are needed for g(θ).
A 7/8-approximation algorithm for MAX {2, 3, 4}-SAT is of course optimal
as a ratio better than 7/8 cannot be obtained even for MAX {3}-SAT, which is
a subproblem of MAX {2, 3, 4}-SAT.

5.3 MAX 3-SAT


The optimal 7/8-approximation algorithm for MAX 3-SAT presented in [17] has
α1 = α2 ≃ 0.87856 and α3 = 7/8. Using pre-rounding rotations we can obtain an
approximation algorithm for MAX 3-SAT with α1 = α2 ≃ 0.9197 and α3 = 7/8.
This algorithm would perform better than the algorithm of [17] on instances
in which some of the contribution to the optimal value of their semidefinite
programming relaxation comes from clauses of size one or two. The details of
this algorithm will be given in the full version of the paper.

5.4 MAX 2-SAT


Feige and Goemans [9] obtained an approximation algorithm for MAX 2-SAT
with α1 ≃ 0.976 and α2 ≃ 0.931. Although we cannot improve α2 , the perfor-
mance ratio on clauses of size two, we can obtain, using pre-rounding rotations,
an approximation algorithm for MAX 2-SAT with α1 ≃ 0.983 and α2 ≃ 0.931.
The details of this algorithm will be given in the full version of the paper.

6 Limitations of Current Rounding Procedures


We presented above a rounding procedure using which we seem to get a 0.8721-
approximation algorithm for MAX 4-SAT. This is extremely close to 7/8. Could
it be that by searching a little bit harder, or perhaps allowing more bends, we
could find a rounding procedure from the family defined in Section 3.5 using
which we could obtain an optimal 7/8-approximation algorithm for MAX 4-
SAT? Unfortunately, the answer is no. We show in this section that the rounding
procedure described in Section 5.1 is close to being the best rounding procedure
of the family considered.
Let θij , for 0 ≤ i < j ≤ 4, be the angles between the five unit vectors
v0 , v1 , v2 , v3 , v4 . Let cij = cos θij . It is not difficult to check that if
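\[
\begin{aligned}
c_{12} &= \frac{1 + c_{01} + c_{02} - 2c_{03} - 2c_{04}}{3} , & c_{13} &= \frac{1 + c_{01} - 2c_{02} + c_{03} - 2c_{04}}{3} , \\
c_{23} &= \frac{1 - 2c_{01} + c_{02} + c_{03} - 2c_{04}}{3} , & c_{14} &= \frac{1 + c_{01} - 2c_{02} - 2c_{03} + c_{04}}{3} , \\
c_{24} &= \frac{1 - 2c_{01} + c_{02} - 2c_{03} + c_{04}}{3} , & c_{34} &= \frac{1 - 2c_{01} - 2c_{02} + c_{03} + c_{04}}{3}
\end{aligned}
\]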

then relax(v0 , v1 , v2 , v3 , v4 ) = 1.
Let 0 < θ1 < θ2 ≤ π/2 be two angles. Consider the configuration (v0 , v1 ) in
which θ01 = π − θ1 , and the two configurations (v0 , v_1^1 , v_2^1 , v_3^1 , v_4^1 ) and
(v0 , v_1^2 , v_2^2 , v_3^2 , v_4^2 ) in which

\[
(\theta^1_{01}, \theta^1_{02}, \theta^1_{03}, \theta^1_{04}) = (\theta_1, \theta_1, \theta_1, \pi - \theta_2) , \qquad
(\theta^2_{01}, \theta^2_{02}, \theta^2_{03}, \theta^2_{04}) = (\theta_2, \theta_2, \theta_2, \theta_2)
\]

and in which the angles θ^i_jk , for 1 ≤ j < k ≤ 4, are determined according to
the relations above so that relax(v0 , v_1^i , v_2^i , v_3^i , v_4^i ) = 1, for i = 1, 2. It is not
difficult to check that

\[
\theta^1_{12} = \theta^1_{13} = \theta^1_{23} = \arccos\Bigl(\frac{1 + 2\cos\theta_2}{3}\Bigr) , \qquad
\theta^1_{14} = \theta^1_{24} = \theta^1_{34} = \arccos\Bigl(\frac{1 - 3\cos\theta_1 - \cos\theta_2}{3}\Bigr)
\]

and that

\[
\theta^2_{jk} = \arccos\Bigl(\frac{1 - 2\cos\theta_2}{3}\Bigr) , \qquad \text{for } 1 \le j < k \le 4 .
\]
Assume that the configurations (v0 , v_1^i , v_2^i , v_3^i , v_4^i ), for i = 1, 2, are feasible. For
every rounding procedure C we have

\[
\alpha(C) \;\le\; \min\bigl\{\, \mathrm{ratio}_C(v_0, v_1),\; \mathrm{ratio}_C(v_0, v^1_1, v^1_2, v^1_3, v^1_4),\; \mathrm{ratio}_C(v_0, v^2_1, v^2_2, v^2_3, v^2_4) \,\bigr\} .
\]

As the only angles between v0 and the other vectors in these three configurations
are θ1 , θ2 , π − θ1 and π − θ2 , and as f (π − θ) = π − f (θ), g(π − θ) = π − g(θ) and
ε(π − θ) = ε(θ), we get that for every rounding procedure from our family, the
three ratios ratio_C (v0 , v1 ), ratio_C (v0 , v_1^1 , v_2^1 , v_3^1 , v_4^1 ) and ratio_C (v0 , v_1^2 , v_2^2 , v_3^2 , v_4^2 )
depend only on the six parameters f (θ1 ), f (θ2 ), g(θ1 ), g(θ2 ), ε(θ1 ) and ε(θ2 ).
Take θ1 = 0.95 and θ2 = arccos(1/5) ≃ 1.369438. It is possible to check
that the resulting two configurations (v0 , v_1^1 , v_2^1 , v_3^1 , v_4^1 ) and (v0 , v_1^2 , v_2^2 , v_3^2 , v_4^2 ) are
feasible. The choice of the six parameters that maximizes the minimum ratio of
the three configurations, found again using numerical optimization, is:

f (θ1 ) ≃ 1.410756 ,   g(θ1 ) ≃ 0 ,          ε(θ1 ) ≃ 0.309376
f (θ2 ) ≃ 1.448494 ,   g(θ2 ) ≃ 1.233821 ,   ε(θ2 ) ≃ 0.122906

With this choice of parameters, the three ratios evaluate to about 0.8724. No
rounding procedure from the family can therefore attain a ratio of more than
0.8724 simultaneously on these three specific configurations. No rounding proce-
dure from the family can therefore yield a performance ratio greater than 0.8724
for MAX 4-SAT, even if the functions f, g and ε are not piecewise linear.
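
The angle formulas used in this section can be checked numerically from the six relations that force relax = 1. The sketch below is illustrative only; `pairwise_cosines` and `gram_is_positive_definite` are hypothetical names. The second function tests just one necessary ingredient of feasibility, namely that unit vectors with the prescribed inner products exist. The notion of feasibility used in the text may include further constraints from Section 2 that are not checked here.

```python
import math

def pairwise_cosines(c0):
    """c0[j-1] = cos(theta_0j) for j = 1..4; returns {(j, k): c_jk} using
    c_jk = (1 + c_0j + c_0k - 2 * (sum of the other two c_0's)) / 3."""
    total = sum(c0)
    c = {}
    for j in range(4):
        for k in range(j + 1, 4):
            others = total - c0[j] - c0[k]
            c[(j + 1, k + 1)] = (1 + c0[j] + c0[k] - 2 * others) / 3
    return c

def gram_is_positive_definite(c0, c, tol=1e-12):
    """Cholesky test on the 5x5 Gram matrix of (v0, v1, ..., v4)."""
    n = 5
    G = [[1.0] * n for _ in range(n)]
    for j in range(1, n):
        G[0][j] = G[j][0] = c0[j - 1]
        for k in range(j + 1, n):
            G[j][k] = G[k][j] = c[(j, k)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= tol:
                    return False        # non-positive pivot
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

theta1, theta2 = 0.95, math.acos(1 / 5)
c0_first = [math.cos(theta1)] * 3 + [math.cos(math.pi - theta2)]
c0_second = [math.cos(theta2)] * 4
c_first = pairwise_cosines(c0_first)
c_second = pairwise_cosines(c0_second)
print(c_first[(1, 2)], c_first[(1, 4)], c_second[(1, 2)])
print(gram_is_positive_definite(c0_first, c_first),
      gram_is_positive_definite(c0_second, c_second))
```

A positive definite Gram matrix guarantees that five unit vectors with these pairwise angles exist.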

7 The Quality of the Semidefinite Programming Relaxation

Let I be an instance of MAX 4-SAT. Let opt(I) be the value of the optimal
assignment for this instance. Let opt∗ (I) be the value of the optimal solution
of the canonical semidefinite programming relaxation of the instance given in
Section 2. Clearly opt(I) ≤ opt∗ (I) for every instance I. The integrality ratio
of the relaxation is defined to be γ = inf_I opt(I)/opt∗ (I), where the infimum is
taken over all the instances.
In Section 3, when we analyzed the performance of different rounding proce-
dures, we compared the value, or rather the expected value, of the assignment
produced by a rounding procedure to opt∗ (I), the optimal value of the semidef-
inite programming relaxation. It is not difficult to see that any lower bound α
on the performance ratio of a rounding procedure obtained in this way would
satisfy α ≤ γ. Thus, the rounding procedure of Section 5.1 seems to imply that
γ ≥ 0.8721. In this section we describe upper bounds on the integrality ratio γ,
thereby obtaining upper bounds on the performance ratios that can be obtained
by any approximation algorithm that uses the relaxation of Section 2, at least
using the type of analysis used in Section 3.
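
The inequality α ≤ γ follows from a short chain of inequalities, spelled out here for completeness. Write E[C(I)] for the expected value of the assignment produced by the rounding procedure on instance I (a notation introduced only for this remark), and assume opt∗(I) > 0. Since no assignment has value greater than opt(I),

\[
\alpha \cdot \mathrm{opt}^*(I) \;\le\; \mathbf{E}[C(I)] \;\le\; \mathrm{opt}(I)
\quad\Longrightarrow\quad
\frac{\mathrm{opt}(I)}{\mathrm{opt}^*(I)} \;\ge\; \alpha \ \text{ for every instance } I
\quad\Longrightarrow\quad
\gamma \;\ge\; \alpha .
\]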
It is shown in [17] that the integrality ratio of the canonical semidefinite
programming relaxation of MAX 3-SAT is exactly γ3 = 7/8. As the canonical
relaxations of MAX 3-SAT and MAX 4-SAT coincide on instances of MAX 3-
SAT, we get that γ = γ4 ≤ 7/8.
We can show that the integrality ratio of the canonical relaxation of MAX
4-SAT, given in Section 2, is at most 0.8753, even when restricted to instances
of MAX {1, 4}-SAT, i.e., to instances of MAX 4-SAT in which all clauses are
of size 1 or 4. Though this upper bound does not preclude the possibility of
obtaining an optimal 7/8-approximation algorithm for MAX 4-SAT using the
canonical semidefinite programming relaxation of the problem, the closeness of
this upper bound to 7/8 does indicate that it will not be easy, even if clauses
of length 3 are not present. It may be necessary to consider stronger relaxations
of MAX 4-SAT, e.g., relaxations obtained by considering several clauses of the
instance at once.

8 Concluding Remarks
We have come frustratingly close to obtaining an optimal 7/8-approximation al-
gorithm for MAX 4-SAT. We have seen that devising a 7/8-approximation algo-
rithm for MAX {1, 4}-SAT is already a challenging problem. Note that Håstad’s
7/8 upper bound for MAX 3-SAT and MAX 4-SAT does not apply to MAX
{1, 4}-SAT, as clauses of length three are not allowed in this problem. A gadget
(see [26]) supplied by Greg Sorkin shows that no polynomial time approximation
algorithm for MAX {1, 4}-SAT can have a performance ratio greater than 9/10,
unless P=NP.
We believe that optimal 7/8-approximation algorithms for MAX 4-SAT and
MAX SAT do exist. The fact that we have come so close to obtaining such
algorithms may in fact be seen as cause for optimism. There is still a possibility
that simple extensions of ideas laid out here could be used to achieve this goal. If
this fails, it may be necessary to attack the problems from a more global point of
view. Note that the analysis carried out here was very local in nature. We only
considered one clause of the instance at a time. As a result we only obtained
lower bounds on the performance ratios of the algorithms considered. It may
even be the case that the algorithms from the family of algorithms considered
here do give a performance ratio of 7/8 for MAX 4-SAT although a more global
analysis is required to show it.
We also hope that MAX 4-SAT would turn out to be the last barrier on the
road to an optimal approximation algorithm for MAX SAT. The almost optimal
algorithms for MAX 4-SAT presented here may be used to obtain an almost
optimal algorithm for MAX SAT. We have not yet worked out the exact bounds
that we can get for MAX SAT as we still hope to get an optimal algorithm for
MAX 4-SAT before proceeding with MAX SAT.
Finally, a word on our methodology. Our work is a bit unusual as we use
experimental and numerical means to obtain theoretical results. We think that
the nature of the problems that we are trying to solve calls for this approach.
No one can rule out, of course, the possibility that some clever new ideas would
dispense with most of the technical difficulties that we are facing here. Until
that happens, however, we see no alternative to the current techniques. The use
of experimental and numerical means does not mean that we have to give up
the rigor of the results. Once we obtain the ‘right’ result, we can devote
efforts to proving it rigorously, possibly using automated means.

References
1. F. Alizadeh. Interior point methods in semidefinite programming with applications
to combinatorial optimization. SIAM Journal on Optimization, 5:13–51, 1995.
2. S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. Proof verification and
the hardness of approximation problems. Journal of the ACM, 45:501–555, 1998.
3. S. Arora and S. Safra. Probabilistic checking of proofs: A new characterization
of NP. Journal of the ACM, 45:70–122, 1998.
4. T. Asano. Approximation algorithms for MAX SAT: Yannakakis vs. Goemans-
Williamson. In Proceedings of the 5th Israel Symposium on Theory and Computing
Systems, Ramat Gan, Israel, pages 24–37, 1997.
5. T. Asano, T. Ono, and T. Hirata. Approximation algorithms for the maximum
satisfiability problem. Nordic Journal of Computing, 3:388–404, 1996.
6. M. Bellare, O. Goldreich, and M. Sudan. Free bits, PCPs, and
nonapproximability—towards tight results. SIAM Journal on Computing, 27:804–
915, 1998.
7. M. Bellare, S. Goldwasser, C. Lund, and A. Russell. Efficient probabilistically
checkable proofs and applications to approximation. In Proceedings of the 25th
Annual ACM Symposium on Theory of Computing, San Diego, California, pages
294–304, 1993. See Errata in STOC’94.
8. H.S.M. Coxeter. The functions of Schläfli and Lobatschefsky. Quarterly Journal
of Mathematics (Oxford), 6:13–29, 1935.
9. U. Feige and M.X. Goemans. Approximating the value of two prover proof systems,
with applications to MAX-2SAT and MAX-DICUT. In Proceedings of the 3rd
Israel Symposium on Theory and Computing Systems, Tel Aviv, Israel, pages 182–
189, 1995.
10. U. Feige, S. Goldwasser, L. Lovász, S. Safra, and M. Szegedy. Interactive proofs
and the hardness of approximating cliques. Journal of the ACM, 43:268–292, 1996.
11. M.X. Goemans and D.P. Williamson. New 3/4-approximation algorithms for the
maximum satisfiability problem. SIAM Journal on Discrete Mathematics, 7:656–
666, 1994.
12. M.X. Goemans and D.P. Williamson. Improved approximation algorithms for max-
imum cut and satisfiability problems using semidefinite programming. Journal of
the ACM, 42:1115–1145, 1995.
13. M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinato-
rial Optimization. Springer Verlag, 1993. Second corrected edition.
14. J. Håstad. Some optimal inapproximability results. In Proceedings of the 29th
Annual ACM Symposium on Theory of Computing, El Paso, Texas, pages 1–10,
1997. Full version available as E-CCC Report number TR97-037.
15. W.Y. Hsiang. On infinitesimal symmetrization and volume formula for spherical or
hyperbolic tetrahedrons. Quarterly Journal of Mathematics (Oxford), 39:463–468,
1988.
16. D.S. Johnson. Approximation algorithms for combinatorial problems. Journal of
Computer and System Sciences, 9:256–278, 1974.
17. H. Karloff and U. Zwick. A 7/8-approximation algorithm for MAX 3SAT? In
Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer
Science, Miami Beach, Florida, pages 406–415, 1997.
18. S. Khanna, R. Motwani, M. Sudan, and U. Vazirani. On syntactic versus com-
putational views of approximability. In Proceedings of the 35th Annual IEEE
Symposium on Foundations of Computer Science, Santa Fe, New Mexico, pages
819–830, 1994.
19. Y. Nesterov and A. Nemirovskii. Interior Point Polynomial Methods in Convex
Programming. SIAM, 1994.
20. Y. E. Nesterov. Semidefinite relaxation and nonconvex quadratic optimization.
Optimization Methods and Software, 9:141–160, 1998.
21. C.H. Papadimitriou and M. Yannakakis. Optimization, approximation, and com-
plexity classes. Journal of Computer and System Sciences, 43:425–440, 1991.
22. P. Raghavan. Probabilistic construction of deterministic algorithms: Approximat-
ing packing integer programs. Journal of Computer and System Sciences, 37:130–
143, 1988.
23. P. Raghavan and C. Thompson. Randomized rounding: A technique for provably
good algorithms and algorithmic proofs. Combinatorica, 7:365–374, 1987.
24. L. Schläfli. On the multiple integral ∫ⁿ dx dy . . . dz, whose limits are p1 = a1 x +
b1 y + . . . + h1 z > 0, p2 > 0, . . . , pn > 0, and x² + y² + . . . + z² < 1. Quarterly
Journal of Mathematics (Oxford), 2:269–300, 1858. Continued in Vol. 3 (1860),
pp. 54–68 and pp. 97–108.
25. L. Trevisan. Approximating satisfiable satisfiability problems. In Proceedings of
the 5th European Symposium on Algorithms, Graz, Austria, pages 472–485, 1997.
26. L. Trevisan, G.B. Sorkin, M. Sudan, and D.P. Williamson. Gadgets, approxi-
mation, and linear programming (extended abstract). In Proceedings of the 37th
Annual IEEE Symposium on Foundations of Computer Science, Burlington, Ver-
mont, pages 617–626, 1996.
27. E.B. Vinberg. Volumes of non-Euclidean polyhedra. Russian Math. Surveys, 48:15–
45, 1993.
28. M. Yannakakis. On the approximation of maximum satisfiability. Journal of Al-
gorithms, 17:475–502, 1994.
29. U. Zwick. Approximation algorithms for constraint satisfaction problems involving
at most three variables per constraint. In Proceedings of the 9th Annual ACM-
SIAM Symposium on Discrete Algorithms, San Francisco, California, pages 201–
210, 1998.
30. U. Zwick. Finding almost-satisfying assignments. In Proceedings of the 30th Annual
ACM Symposium on Theory of Computing, Dallas, Texas, pages 551–560, 1998.
31. U. Zwick. Outward rotations: a tool for rounding solutions of semidefinite program-
ming relaxations, with applications to max cut and other problems. In Proceedings
of the 31st Annual ACM Symposium on Theory of Computing, Atlanta, Georgia,
1999. To appear.
On the Chvátal Rank of Certain Inequalities

Mark Hartmann¹, Maurice Queyranne²,⋆, and Yaoguang Wang³

¹ University of North Carolina, Chapel Hill, N.C., U.S.A. 27599
[email protected]
² University of British Columbia, Vancouver, B.C., Canada V6T 1Z2
[email protected]
³ PeopleSoft Inc., San Mateo, CA, U.S.A. 94404
Yaoguang [email protected]

Abstract. The Chvátal rank of an inequality ax ≤ b with integral com-
ponents and valid for the integral hull of a polyhedron P , is the minimum
number of rounds of Gomory-Chvátal cutting planes needed to obtain
the given inequality. The Chvátal rank is at most one if b is the integral
part of the optimum value z(a) of the linear program max{ax : x ∈ P }.
We show that, contrary to what was stated or implied by other authors,
the converse to the latter statement, namely, the Chvátal rank is at least
two if b is less than the integral part of z(a), is not true in general. We
establish simple conditions for which this implication is valid, and ap-
ply these conditions to several classes of facet-inducing inequalities for
travelling salesman polytopes.

1 Introduction
We consider the problem of deciding whether the Chvátal rank of an inequality,
relative to a given polyhedron, is greater than one or not. Formally, given a
finite set E, we let ℝ^E denote the space of vectors with real-valued components
indexed by E, and ℤ^E the set of all vectors in ℝ^E all of whose components
are integer. The integral part ⌊y⌋ of a real number y is the largest integer less
than or equal to y. Given a polyhedron P in ℝ^E , let P_I ≐ conv(P ∩ ℤ^E ) be
its integral hull, that is, the convex hull of the integer points in P . Following
Chvátal et al. [11], let

\[
P' \;\doteq\; \bigl\{\, x \in P : ax \le a_0 \ \text{whenever}\ a \in \mathbb{Z}^E,\ a_0 \in \mathbb{Z}\ \text{and}\ \max\{ax : x \in P\} < a_0 + 1 \,\bigr\} .
\]

So P_I ⊆ P′ ⊆ P , and we may think of P′ as being derived from P by a single
round of adding cutting planes ax ≤ a_0 ≐ ⌊b⌋ derived from any valid inequality
ax ≤ b for P with integral left hand side vector a. Defining P^(0) ≐ P and
P^(j) ≐ (P^(j−1))′ for all positive integers j, Schrijver [25] shows that, when P is
a rational polyhedron, each P^(j) thus defined is itself a rational polyhedron and
there exists a nonnegative integer r such that P^(r) = P_I . (Chvátal [10] shows
this when P is bounded.) Consider any inequality ax ≤ a_0 with a ∈ ℤ^E and
a_0 ∈ ℤ, which is valid for P_I . Its Chvátal rank r_P (a, a_0 ) is the smallest j such
that ax ≤ a_0 is valid for P^(j) . Thus r_P (a, a_0 ) = 0 iff the inequality ax ≤ a_0 is
valid for P ; r_P (a, a_0 ) = 1 iff it is valid for P′ but not for P ; and r_P (a, a_0 ) ≥ 2
iff it is valid for P_I but not for P′ .

⋆ Research supported by grants from the Natural Sciences and Engineering Research
Council (NSERC) of Canada

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 218–233, 1999.
© Springer-Verlag Berlin Heidelberg 1999

The Chvátal rank of an inequality is often considered as a measure of its
“complexity” relative to the formulation defining polyhedron P . Rank-one in-
equalities may be derived by a generally straightforward rounding argument. In
contrast, inequalities of rank two or greater require a more complicated proof,
involving several stages of rounding. In this paper, we consider the problem of
determining whether or not the Chvátal rank of a given inequality is at least two,
as considered, among others, in [2], [18] and [4], [5], [6], [16] and [20] for sev-
eral classes of inequalities for travelling salesman polytopes. See also Chvátal
et al. [11] for references to a number of results and conjectures regarding lower
bounds on the Chvátal rank of inequalities and polyhedra related to various
combinatorial optimization problems and more recently Bockmayr et al. [3] and
Eisenbrand and Schulz [15] for polynomial upper bounds on the Chvátal rank of
0–1 polytopes. See also Caprara and Fischetti [7], Caprara et al. [8] and Eisen-
brand [14] regarding the separation problem for inequalities with Chvátal rank
one.
A related notion of disjunctive rank was introduced in the seminal work of
Balas et al. [1]; see also [9]. Their lift-and-project approach, which does not rely
on rounding, applies to polyhedra whose integer variables are restricted to be
0-1 valued. It does not apply (directly) to general integer variables; it is, in this
sense, less general than the Gomory-Chvátal approach. On the other hand, it
applies directly to mixed 0-1 programs, thus allowing continuous variables. It
also leads to interesting mathematical and computational properties. We outline
below some of those properties that are most directly related to the present
paper. Let K ⊆ ℝ^N be a polyhedron and E ⊆ N be the set of 0-1 variables. For
S ⊆ E and any assignment x̄_S ∈ {0, 1}^S of 0-1 values to the variables in S, let
K_S (x̄_S ) = K ∩ {x ∈ ℝ^N : x_j = x̄_j ∀j ∈ S}. Let K_S = ⋃{K_S (x̄_S ) : x̄_S ∈ {0, 1}^S },
so we have K = K_∅ ⊇ K_S ⊇ K_E , and K_E is the set of all feasible mixed 0-1
solutions associated with K and E. An inequality ax ≤ a_0 , valid for K_E , has
disjunctive rank r if r is the smallest integer such that there exists a subset
S of E with |S| = r and ax ≤ a_0 valid for K_S . It follows immediately that
the disjunctive rank of any such inequality is at most |E|, the number of 0-
1 variables. This is in sharp contrast with the fact that the Chvátal rank of
certain inequalities can be unbounded, even for polytopes with two variables and
integral hull contained in {0, 1}^2 (e.g., [23], page 227). Eisenbrand and Schulz
[15] show that, if P = K is contained in the unit cube, then the Chvátal rank
of any inequality, valid for its integral hull P_I = K_N , is O(|N|^2 log |N|); this
upper bound is still much larger than the corresponding bound of |N | for the
disjunctive rank. Balas et al. also show that the disjunctive and Chvátal ranks of
an inequality are “incomparable”; that is, for certain inequalities, the disjunctive
rank is larger than their Chvátal rank, whereas the converse holds for other
inequalities. An important computational property of the disjunctive rank is
that, for any fixed r, it can be decided in polynomial time whether or not the
disjunctive rank of a given inequality ax ≤ a_0 is at most r. Indeed, enumerate
all (|E| choose r) = O(|E|^r ) subsets S ⊆ E with |S| = r; for any such S, the inequality is
valid for K_S if and only if it is valid for all 2^r polyhedra K_S (x̄_S ) with x̄_S ∈ {0, 1}^S
(if K_S (x̄_S ) = ∅ then the inequality is trivially valid); since r is fixed, validity
for K_S can thus be verified by solving a fixed number 2^r of linear programs
max{ax : x ∈ K_S (x̄_S )}, each about the same size as the linear system defining
K; this yields the desired polynomial time algorithm. In contrast, it follows
from a result of Eisenbrand [14] and the equivalence between separation and
optimization (e.g., [23]), that it is NP-hard to decide whether or not the Chvátal
rank of an inequality is at most one. In this paper we present sufficient conditions
for a negative answer to this decision problem.
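
The enumeration argument for the disjunctive rank can be sketched in a few lines. The code below is an illustration under explicit assumptions, not an algorithm taken from the cited works: `disjunctive_rank_at_most` is a hypothetical name, and the linear programs are delegated to a caller-supplied oracle `lp_max`, which receives a dictionary fixing some 0-1 variables and returns max{ax : x ∈ K_S(x̄_S)}, or None when that polyhedron is empty. The toy oracle handles only the made-up example K = {x ∈ [0, 1]^2 : x1 + x2 ≤ 3/2} with a = (1, 1).

```python
from itertools import combinations, product

def disjunctive_rank_at_most(r, E, a0, lp_max):
    """True if ax <= a0 is valid for K_S for some S within E with |S| = r."""
    for S in combinations(E, r):
        valid = True
        for bits in product((0, 1), repeat=r):      # all 2^r assignments
            value = lp_max(dict(zip(S, bits)))
            if value is not None and value > a0:    # empty pieces are fine
                valid = False
                break
        if valid:
            return True
    return False

def toy_lp_max(fixing):
    """max{x1 + x2} over K = {x in [0,1]^2 : x1 + x2 <= 3/2} with the
    variables in `fixing` held at the given 0-1 values."""
    fixed_sum = sum(fixing.values())
    if fixed_sum > 1.5:
        return None                 # fixed values already violate the bound
    free = 2 - len(fixing)          # each free variable contributes at most 1
    return min(fixed_sum + free, 1.5)

# Inequality x1 + x2 <= 1, tested for r = 0, 1, 2.
ranks = [disjunctive_rank_at_most(r, (1, 2), 1, toy_lp_max) for r in range(3)]
print(ranks)
```

In the toy example the inequality x1 + x2 ≤ 1 is not valid after fixing zero or one variable, but becomes valid once both variables are fixed, so its disjunctive rank is 2.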
For a ∈ ℤ^E and a_0 ∈ ℤ, a straightforward method for proving that the
Chvátal rank r_P (a, a_0 ) is zero or one relies on the optimal value z_P (a) of the
linear programming problem

\[
z_P(a) \;\doteq\; \max\{ax : x \in P\}
\]

as follows. If a_0 ≥ z_P (a) then r_P (a, a_0 ) = 0. If ⌊z_P (a)⌋ = a_0 < z_P (a) then
r_P (a, a_0 ) = 1. A number of researchers have inferred from these elementary
observations that the following implication must hold:

\[
a_0 < \lfloor z_P(a) \rfloor \;\Longrightarrow\; r_P(a, a_0) \ge 2 . \tag{1}
\]

This is stated explicitly in Lemma 3.1 in [6] and is used in Theorem 3.3 therein to
prove that the Chvátal rank of certain clique tree inequalities for the symmetric
Travelling Salesman Problem (TSP) is at least two, where P is the corresponding
subtour polytope (see Sect. 3 below for details). It is used by Giles and Trotter
[18] (p. 321) to disprove a conjecture of Edmonds that each non-trivial facet of
the stable set polytope of a K_{1,3}-free graph has Chvátal rank one, where P is
described by the non-negativity and clique constraints. It is also used by Balas
and Saltzman in Proposition 3.8 in [2] to show that the Chvátal rank of
inequalities induced by certain cliques of the complete tri-partite graph K_{n,n,n}
is at least (in fact, exactly) two, where P is the linear programming relaxation
of the (axial) three-index assignment problem. This result is also implicit in
related statements in [4] (p. 265) for the envelope inequality for the symmetric
TSP, and in [16] (p. 263) for a class of T -lifted comb inequalities for the
asymmetric TSP.
Example 1 below shows that implication (1) may fail if the inequality is not
properly scaled.

Example 1: Let $P := \{x = (x_1, x_2) \in \mathbb{R}^2 : 2x_1 + 3x_2 \le 11,\ x \ge 0\}$. The
inequality $ax \le a_0$ with $a = (3, 3)$ and $a_0 = 15$ is facet-inducing for
$P_I = \mathrm{conv}\{(0, 0), (0, 3), (1, 3), (4, 1), (5, 0)\}$. We have $z_P(a) = a(5\tfrac12, 0) = 16\tfrac12$, so
$a_0 < \lfloor z_P(a) \rfloor$. Yet $r_P(a, a_0) = 1$, since $3x_1 + 3x_2 \le 15$ is equivalent to $x_1 + x_2 \le 5$,
which is valid for $P'$ since $z_P(1, 1) = 5\tfrac12$. □

We should therefore assume that the components of $a$ are relatively prime
integers. Example 2 below, however, shows that this assumption is still not
sufficient for the validity of implication (1).

Example 2: With polyhedron $P$ of Example 1, now let $a = (3, 1)$ and $a_0 = 15$.
Inequality $ax \le a_0$ is valid for $P_I$, with equality attained for $x = (5, 0)$. We have
$z_P(a) = 16\tfrac12$, so here again $a_0 < \lfloor z_P(a) \rfloor$. Yet $r_P(a, a_0) = 1$, since $3x_1 + x_2 \le 15$
is implied by $x_1 + x_2 \le 5$ and $x_1 \le 5$ (the latter inequality is valid for $P'$ since
$z_P(1, 0) = 5\tfrac12$). □
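The values of $z_P(a)$ quoted in Examples 1 and 2 can be checked mechanically. The following Python sketch is illustrative only: the helper names `vertices` and `z` are hypothetical, and it assumes a bounded two-dimensional polyhedron given by inequalities of the form `a1*x1 + a2*x2 <= b`, so that the maximum of a linear function is attained at a vertex obtained by intersecting two boundary lines.

```python
from fractions import Fraction
from itertools import combinations
from math import floor

def vertices(constraints):
    """Vertices of {x in R^2 : a1*x1 + a2*x2 <= b for each (a1, a2, b)}."""
    pts = set()
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines never meet in a single point
        # Cramer's rule for the intersection of the two boundary lines.
        x = (Fraction(b * c2 - a2 * d, det), Fraction(a1 * d - b * c1, det))
        if all(p * x[0] + q * x[1] <= r for p, q, r in constraints):
            pts.add(x)
    return pts

def z(constraints, a):
    """max{ax : x in P}, assuming P is a nonempty bounded polygon."""
    return max(a[0] * v[0] + a[1] * v[1] for v in vertices(constraints))

# P of Example 1: 2*x1 + 3*x2 <= 11 and x >= 0 (written as -x_i <= 0).
P = [(2, 3, 11), (-1, 0, 0), (0, -1, 0)]

for a, a0 in [((3, 3), 15), ((3, 1), 15)]:
    za = z(P, a)
    # The hypothesis a0 < floor(z_P(a)) of implication (1) holds in both examples.
    print(a, za, a0 < floor(za))

# Both x1 + x2 <= 5 and x1 <= 5 are rank-one cuts, since floor(11/2) = 5.
print(floor(z(P, (1, 1))), floor(z(P, (1, 0))))
```

Exact rational arithmetic is used so that the floor operation is not affected by rounding.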
Note that the inequality 3x1 + x2 ≤ 15 in Example 2, not being facet-
inducing for PI , is implied by other inequalities. This allowed us to obtain a
large enough integrality gap zP (a) − bzP (a)c while retaining relatively prime
components. Because most inequalities of interest in combinatorial optimization
are facet-inducing for the integral hull PI , we shall restrict attention to these
cases. Corollary 3 herein will show that the preceding two conditions on an in-
equality (that is, with relatively prime components, and facet-inducing for PI )
are sufficient for implication (1) to hold, when PI is full dimensional.
Example 3 below, however, shows that this need not be the case when PI is
not full dimensional. (Another, more complicated example using a TSP facet-
inducing inequality is given in [5].)
Example 3: Polyhedron $P := \{x = (x_1, x_2) \in \mathbb{R}^2 : 2x_1 + 3x_2 = 11,\ x \ge 0\}$ is
derived from that of Example 1 above by turning the first inequality constraint
into an equation. The inequality $ax \le a_0$ with $a = (1, 0)$ and $a_0 = 4$ has relatively
prime components, and is facet-inducing for $P_I = \mathrm{conv}\{(1, 3), (4, 1)\}$. We have
$z_P(a) = a(5\tfrac12, 0) = 5\tfrac12$, so $a_0 < \lfloor z_P(a) \rfloor$. Yet $r_P(a, a_0) = 1$, since $x_1 \le 4$ is implied
by inequalities $-2x_1 - 3x_2 \le -11$ and $x_1 + x_2 \le 5$, which are valid for $P'$. □
In this paper, we give simple sufficient conditions for implication (1) to hold.
We then use these conditions to present correct proofs that the inequalities from
[2], [4], [6], [16] and [18] cited above have Chvátal rank at least two, where P
is the corresponding relaxation. Some of these conditions were stated without
proof and used in [5] to prove that all ladder inequalities have Chvátal rank two
for the symmetric TSP.
The content of the present paper is as follows. In Sect. 2, we define a sufficient
condition, called reduced integral form, for implication (1) to hold. This result
implies in particular that (1) holds if PI is full dimensional and inequality ax ≤
a0 is facet-inducing for PI and has relatively prime integer components. We also
present a relatively simple sufficient condition for an inequality to be in reduced
integral form. In Sections 3 and 4, we consider some classes of inequalities for
the symmetric and asymmetric TSP, respectively. We show how to apply this
sufficient condition and prove that several classes of TSP inequalities, including
those referenced above, have Chvátal rank at least two relative to the subtour
polytope.

2 Reduced Integral Forms and B-Canonical Forms


A system of linear equations $Cx = d$ represents the equality system for a
polyhedron $P \subseteq \mathbb{R}^E$ iff all equations valid for all $x \in P$ are linear combinations of the
equations in $Cx = d$, and if it is minimal for this property. Recall that all systems
$Cx = d$ representing the equality system for $P$ have the same number of
equations, which we denote by $m(P)$. The dimension of $P$ is $\dim(P) := |E| - m(P)$.
Other polyhedral concepts and results used below are found in [23].
An integral polyhedron is a rational polyhedron in $\mathbb{R}^E$ with at least one
extreme point, and such that all its extreme points are integral. An inequality
$ax \le a_0$ is a reduced integral form for integral polyhedron $Q$ if
(i) it is facet-inducing for $Q$;
(ii) all its components $a_e$ ($e \in E$) are integers, that is, $a \in \mathbb{Z}^E$; and
(iii) there exists $z \in \mathbb{Z}^E$ with $az = 1$ and $Cz = 0$, where $Cx = d$ is any linear
system representing the equality system for $Q$.
Remark. It follows immediately from condition (iii) that an inequality $ax \le a_0$
in reduced integral form has relatively prime integral components. In this case,
we can interpret the rounding down of the right-hand side $b$ to $\lfloor b \rfloor$ as moving
the supporting hyperplane to $P$ towards $Q$ until it first hits an integral point.
Condition (iii) ensures that at least one such point lies in the affine space of $Q$
(this can be shown starting from an integer point in the facet of $Q$ determined
by $ax \le a_0$).
Our first result relates reduced integral forms to condition (1), which is the
nontrivial half of equivalence (2) below (see Theorem 2.2.3 of Hartmann [21]).

Theorem 1 Let $ax \le a_0$ be a reduced integral form for integral polyhedron $Q$.
The equivalence
\[ r_P(a, a_0) \le 1 \iff a_0 \ge \lfloor z_P(a) \rfloor \tag{2} \]
holds for any rational polyhedron $P$ such that $m(Q) = m(P)$ and
$\mathrm{conv}(P \cap \mathbb{Z}^E) = Q$.

Proof. Assume that $ax \le a_0$ is a reduced integral form for $Q$, so that $a_0 \in \mathbb{Z}$
and $Q = \mathrm{conv}(P \cap \mathbb{Z}^E)$. The implication $a_0 \ge \lfloor z_P(a) \rfloor \Rightarrow r_P(a, a_0) \le 1$
follows directly from the definition of the Chvátal rank. To prove the converse
implication, we assume that $r_P(a, a_0) \le 1$ and aim to prove that $z_P(a) < a_0 + 1$.
Since $ax \le a_0$ is valid for $P'$ and $P'$ is a rational polyhedron, the inequality
$ax \le a_0$ is implied by the inequalities and equations in a minimal linear system
defining $P'$. Since $Q \subseteq P' \subseteq P$ and $m(Q) = m(P)$, we may assume that $Cx = d$
is the equality system for $P$, $P'$ and $Q$. Therefore, $a = \sum_k \mu_k c^k + \nu C$ and
$a_0 \ge \sum_k \mu_k c^k_0 + \nu d$, where each inequality $c^k x \le c^k_0$ is facet-inducing for $P'$, has
integral components and satisfies $z_P(c^k) < c^k_0 + 1$; all $\mu_k \in \mathbb{R}$ satisfy $\mu_k \ge 0$;
and $\nu \in \mathbb{R}^{m(Q)}$. Since $ax \le a_0$ is facet-inducing for $Q$, and all inequalities
$c^k x \le c^k_0$ are valid for $Q$, it follows that at least one of these inequalities, say,
$c^1 x \le c^1_0$, induces the same facet of $Q$ as $ax \le a_0$. Therefore, there exist a vector
$\lambda \in \mathbb{R}^{m(Q)}$ and a scalar $s > 0$ such that $(a, a_0) = s(c^1, c^1_0) + \lambda(C, d)$. If $z$ is the
“certificate” from condition (iii), then

\[ 1 = az = (s c^1 + \lambda C) z = s(c^1 z) + \lambda(Cz) = s(c^1 z), \]

which implies that $s \le 1$ since $c^1 z$ is a (positive) integer. Since $z_P(c^1) < c^1_0 + 1$
and $0 < s \le 1$, we have

\[ z_P(a) = z_P(s c^1 + \lambda C) = s\, z_P(c^1) + \lambda d < s(c^1_0 + 1) + \lambda d = a_0 + s \le a_0 + 1 . \]

The proof is complete. □

Example 4: This example shows that we cannot drop the requirement that
$m(P) = m(Q)$. Let
\[ P := \{x = (x_1, x_2) \in \mathbb{R}^2 : 4x_1 + 3x_2 \le 4,\ x_1 + x_2 \ge 1,\ 4x_1 + x_2 \ge 0\} . \]
The inequality $ax \le a_0$ with $a = (0, 1)$ and $a_0 = 1$ is facet-inducing for
$P_I = \mathrm{conv}\{(1, 0), (0, 1)\}$, whose equality set is $x_1 + x_2 = 1$, and is in reduced integral
form as shown by the “certificate” $z = (-1, 1)$. We have $z_P(a) = 2$, so
$a_0 < \lfloor z_P(a) \rfloor$. Yet $r_P(a, a_0) = 1$, since $x_2 \le 1$ is implied by $x_1 + x_2 \le 1$ and $-x_1 \le 0$
(which are valid for $P'$ since $z_P(1, 1) = \tfrac32$ and $z_P(-1, 0) = \tfrac12$). □
We now turn to the existence and construction of reduced integral forms.

Theorem 2 Let $Q$ be an integral polyhedron and let $P$ be any rational polyhedron
such that $m(P) = m(Q)$ and $\mathrm{conv}(P \cap \mathbb{Z}^E) = Q$. If the inequality $ax \le a_0$
induces a facet $F$ of $Q$, then there exists an inequality $a'x \le a'_0$ in reduced
integral form which induces $F$. Moreover, if it is possible to optimize over $P$ in
polynomial time, then $a'x \le a'_0$ can be computed in polynomial time.

Proof. Without loss of generality we may assume that $(a, a_0) \in \mathbb{Z}^E \times \mathbb{Z}$. If
condition (iii) is not satisfied, then let $n$ be the smallest positive integer for
which there exists $z \in \mathbb{Z}^E$ with $az = n$ and $Cz = 0$, where $Cx = d$ is any linear
system representing the equality system for $Q$. Since $ax \le a_0$ is facet-inducing,
$a$ is linearly independent of the rows of $C$, so that matrix
\[ A = \begin{pmatrix} a \\ C \end{pmatrix} \]
has full row rank. Theorem 5.2 of Schrijver [26] implies that there exists $z \in \mathbb{Z}^E$
with $az = n$ and $Cz = 0$ if and only if $nH^{-1}e$ is integral, where $[H|0]$ is the
Hermite normal form of $A$ and $e$ is the first unit vector. By choice of $n$, the
components of $nH^{-1}e$ are relatively prime integers, so there exists $w \in \mathbb{Z}^{m(Q)+1}$
such that $w(nH^{-1}e) = 1$. This implies that $wH^{-1} = (\tfrac1n, \lambda)$ for some $\lambda \in \mathbb{R}^{m(Q)}$.
Theorem 5.2 of [26] also implies that $A = [H|0]U$ for some unimodular (and
hence integral) matrix $U$, and thus
\[ a' := \tfrac1n a + \lambda C = wH^{-1}A = wH^{-1}[H|0]U = wU \]
is integral. If $z \in \mathbb{Z}^E$ has $az = n$ and $Cz = 0$, then $a'z = \tfrac1n az + \lambda Cz = 1$. Hence
$a'x \le a'_0 := \tfrac1n a_0 + \lambda d$ is a reduced integral form for $ax \le a_0$.
Edmonds et al. [13] show how to compute a matrix $C$ and vector $d$ for which
$Cx = d$ is the equality system for $P$ in polynomial time, provided it is possible
to optimize over $P$ in polynomial time. Domich et al. [12] show how to compute
the Hermite normal form $[H|0]$ of $A$ in polynomial time. Hence $H^{-1}e$ can be
determined in polynomial time, by solving $Hx = e$. Theorem 5.1a of [26] implies
that $w \in \mathbb{Z}^{m(Q)+1}$ such that $w(nH^{-1}e) = 1$ can be determined in polynomial
time using the Euclidean algorithm. Since $\lambda \in \mathbb{R}^{m(Q)}$ can be determined in
polynomial time by solving $yH = w$, a reduced integral form $a'x \le a'_0$ can be
computed in polynomial time. □

The following example illustrates Theorem 2 and its proof.


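Example 5: With polyhedron $P$ of Example 3, let $Q = P_I$ and $F = \{(4, 1)\}$.
Here $a = (1, 0)$ and so the matrix
\[ A = \begin{pmatrix} a \\ C \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 2 & 3 \end{pmatrix} . \]

The Hermite normal form of $A$ is
\[ H = \begin{pmatrix} 1 & 0 \\ -1 & 3 \end{pmatrix} \implies H^{-1} = \begin{pmatrix} 1 & 0 \\ \tfrac13 & \tfrac13 \end{pmatrix} . \]

Here $n = 3$ and hence $w_1$ and $w_2$ must solve $3w_1 + w_2 = 1$. If we let $w = (0, 1)$
then $\lambda = \tfrac13$ and $a' = \tfrac13(1, 0) + \tfrac13(2, 3) = (1, 1)$. Thus $x_1 + x_2 \le \tfrac13(4) + \tfrac13(11) = 5$ is
an inequality in reduced integral form which induces $F$. Other choices of $w$ lead
to the reduced integral forms $5x_1 + 7x_2 \le 27$ and $-x_1 - 2x_2 \le -6$. □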
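The arithmetic of Example 5 can be reproduced with exact fractions. The sketch below is illustrative: the function name `reduced_form` is hypothetical, the matrix $H^{-1}$ and the value $n = 3$ are copied from the example rather than computed, and the code assumes an equality system with a single row, so that $\lambda$ is a scalar.

```python
from fractions import Fraction as F

def reduced_form(a, a0, C, d, H_inv, n, w):
    """Return (a', a'_0) = ((1/n) a + lam C, (1/n) a0 + lam d) for a one-row system Cx = d."""
    k = len(w)
    # Row vector w * H^{-1}; the proof of Theorem 2 shows it equals (1/n, lam).
    row = [sum(w[i] * H_inv[i][j] for i in range(k)) for j in range(k)]
    if row[0] != F(1, n):
        raise ValueError("w does not satisfy w (n H^{-1} e) = 1")
    lam = row[1]
    a_new = tuple(F(1, n) * ai + lam * ci for ai, ci in zip(a, C))
    a0_new = F(1, n) * a0 + lam * d
    return a_new, a0_new

a, a0 = (1, 0), 4   # the facet-inducing inequality x1 <= 4
C, d = (2, 3), 11   # the equality system 2*x1 + 3*x2 = 11
H_inv = [[F(1), F(0)], [F(1, 3), F(1, 3)]]
n = 3

# Three integer solutions of 3*w1 + w2 = 1 and the inequalities they produce.
for w in [(0, 1), (-2, 7), (1, -2)]:
    print(w, reduced_form(a, a0, C, d, H_inv, n, w))
```

Each choice of $w$ changes $\lambda$ by an integer, which adds an integer multiple of the equation $2x_1 + 3x_2 = 11$ to the inequality.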
Combining Theorems 1 and 2, we obtain:

Corollary 3 Let $Q$ be any integral polyhedron and let $P$ be any rational
polyhedron such that $\mathrm{conv}(P \cap \mathbb{Z}^E) = Q$.
(A) If $m(P) = m(Q)$ then for every facet $F$ of $Q$, there exists an inequality
$ax \le a_0$, inducing facet $F$ and with integral components, such that equivalence (2)
holds.
(B) If $Q$ is full dimensional then every facet-inducing inequality $ax \le a_0$ for $Q$,
scaled so its components are relatively prime integers, verifies equivalence (2).

Proof. For (B), recall that, since $Q$ is a rational polyhedron, every facet-inducing
inequality for $Q$ may be scaled so its components are relatively prime integers.
Since $m(Q) = 0$, condition (iii) then holds trivially so the inequality is in reduced
integral form. The rest of the proof follows directly from Theorems 1 and 2. □

Thus, to prove that the Chvátal rank $r_P(a, a_0)$ of a facet-inducing inequality
$ax \le a_0$ for a full-dimensional polyhedron $Q$ is at least two, it suffices to scale
the inequality so the components of $a$ are relatively prime integers, and then to
find a point $\bar x \in P$ such that $a\bar x \ge a_0 + 1$, where $P$ is any rational polyhedron
such that $\mathrm{conv}(P \cap \mathbb{Z}^E) = Q$. In Giles and Trotter [18], the stable-set polytope
is full dimensional and the facet-inducing inequality in question has coefficients
of 1, 2 and 3, so equivalence (2) holds by Corollary 3(B).
We now consider the case where Q is not full-dimensional. This case is im-
portant for many combinatorial optimization problems, such as the travelling
salesman problems discussed in the next sections. Although the proof of Theo-
rem 2 is constructive, it may not be easy to apply directly. We present below
a simple condition for obtaining a reduced integral form when Q is not full-
dimensional.
Consider any $B \subseteq E$ such that $|B| = m(Q)$ and the $|B| \times |B|$ submatrix $C_B$
of $C$ with columns indexed by $B$ is nonsingular. Subset $B$ (and by extension,
matrix $C_B$) is called a basis of the equality system. Defining $N := E \setminus B$, we may
write $C = (C_B, C_N)$ and, for any vector $a \in \mathbb{R}^E$, $a = (a_B, a_N)$. An inequality
$ax \le a_0$ is in B-canonical form if $a_B = 0$. It is in integral B-canonical form if the
components of $a_N$ are relatively prime integers. Note that, for any given basis $B$
of the equality system for a rational polyhedron $Q$, every facet of $Q$ is induced by
a unique inequality in integral B-canonical form. (Indeed, the equations $a_B = 0$
define the facet-inducing inequality $ax \le a_0$ uniquely, up to a positive multiple;
this multiple is determined by the requirement that the components of $a$ be
relatively prime integers.)
Define a basis $B$ of $Cx = d$ to be dual-slack integral if the matrix $C_B^{-1} C_N$
is integer. Note that this definition is equivalent to requiring that the vector
$\bar c := c - c_B C_B^{-1} C$ be integral for every integral $c \in \mathbb{Z}^E$. The vector $\bar c$ may thus be
interpreted as that of basic dual slack variables (or reduced costs) for the linear
programming problem $\max\{cx : Cx = d,\ x \ge 0\}$. Note also that the property
of being dual-slack integral is independent of scaling and, more generally, of the
linear system used to represent the given equality system. (Indeed, if $C'x = d'$
represents the same system as $Cx = d$, then $(C', d') = M(C, d)$ where $M$ is a
nonsingular matrix; it follows that $c - c_B (C'_B)^{-1} C' = \bar c$.)

Proposition 4 Let $B$ be a dual-slack integral basis of the equality system of an
integral polyhedron $Q$. Then the integral B-canonical form of any facet-inducing
inequality is a reduced integral form.

Proof. Let $B$ be a dual-slack integral basis of the equality system for integral
polyhedron $Q$. Let $ax \le a_0$ be a facet-inducing inequality in integral B-canonical
form. We only need to verify condition (iii). Since the components of $a_N$ are
relatively prime, there exists $z_N \in \mathbb{Z}^N$ such that $a_N z_N = 1$. Define
$z_B = -C_B^{-1} C_N z_N$ so that $Cz = C_B z_B + C_N z_N = 0$. Then $z$ is the “certificate” in
condition (iii). □
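The proof of Proposition 4 is constructive, and the construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a procedure given in the paper: the function names `ext_gcd`, `bezout` and `certificate` are hypothetical, the matrix `M` stands for $C_B^{-1} C_N$ and is supplied by the caller, and $z_N$ is obtained from the extended Euclidean algorithm.

```python
from fractions import Fraction

def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b) >= 0."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def bezout(coeffs):
    """Return (g, z) with sum(c * z_c) == g == gcd of all coefficients."""
    g, z = 0, []
    for c in coeffs:
        g, s, t = ext_gcd(g, c)
        z = [s * zi for zi in z] + [t]
    return g, z

def certificate(a_N, M):
    """Return (z_B, z_N) with a_N z_N = 1 and z_B = -M z_N, where M = C_B^{-1} C_N."""
    g, z_N = bezout(a_N)
    if g != 1:
        raise ValueError("components of a_N are not relatively prime")
    if any(Fraction(m).denominator != 1 for row in M for m in row):
        raise ValueError("basis is not dual-slack integral")
    z_B = [-sum(int(m) * zj for m, zj in zip(row, z_N)) for row in M]
    return z_B, z_N

# Equality system x1 + x2 = 1 with B = {x1}, N = {x2}, so M = (1), and a = (0, 1).
print(certificate([1], [[1]]))
```

For the equality system $x_1 + x_2 = 1$ and $a = (0, 1)$ this yields $z_B = (-1)$ and $z_N = (1)$, that is, the certificate $z = (-1, 1)$ quoted in Example 4.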

Remark. This result can also be interpreted in terms of the Hermite normal
form of the matrix $A = (A_B, A_N)$. If $ax \le a_0$ is in B-canonical form for a
dual-slack integral basis $B$, then $C_B^{-1} C_N$ is integral and $a_B = 0$ so integer multiples
of the columns of $A_B$ can be added to the columns of $A_N$ to zero out $C_N$. But
then since the components of $a_N$ are relatively prime, the Hermite normal form
of $A$ will be $[H|0]$, where
\[ H = \begin{pmatrix} 1 & 0 \\ 0 & H_B \end{pmatrix} \]
and $H_B$ is the Hermite normal form of $C_B$.
Thus if $P$ is a rational polyhedron with the same equality set as an integral
polyhedron $Q$ with $\mathrm{conv}(P \cap \mathbb{Z}^E) = Q$, to prove that the Chvátal rank
$r_P(a, a_0)$ of a facet-inducing inequality $ax \le a_0$ with relatively prime integer
components is at least two, it suffices to exhibit a dual-slack integral basis
comprised only of edges $e$ with $a_e = 0$, and a point $x^* \in P$ with
$ax^* \ge a_0 + 1$. In Balas and Saltzman [2], the (axial) three-index assignment polytope
is not full dimensional, but the facet-inducing inequality induced by the clique
{(3, 3, 3), (2, 2, 3), (2, 3, 2), (3, 2, 2)} of Kn,n,n is in B-canonical form with respect
to the dual-slack integral basis described in the proof of Lemma 3.1 ibid. Hence
equivalence (2) holds for this class of inequalities.
Further examples of integral B-canonical forms for which Proposition 4 ap-
plies will be presented in the following sections, for integral polytopes related
to symmetric and asymmetric travelling salesman problems, respectively. Note,
however, that dual-slack integral bases need not exist (see, e.g., Example 3), so
Proposition 4 will not always apply.
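The claim about Example 3 can be checked directly, because its equality system has the single row $C = (2, 3)$: a basis is then a single column $j$ with $C_j \ne 0$, and $C_B^{-1} C_N$ consists of the ratios $C_k / C_j$ for $k \ne j$. The following Python sketch (the function name is hypothetical, and the one-row restriction is an assumption made only for this illustration) lists the columns that give a dual-slack integral basis.

```python
from fractions import Fraction

def dual_slack_integral_bases(row):
    """For a one-row system cx = d, list the columns j whose basis {j} is dual-slack integral."""
    good = []
    for j, cj in enumerate(row):
        if cj == 0:
            continue  # a zero column is singular, hence not a basis
        ratios = [Fraction(ck, cj) for k, ck in enumerate(row) if k != j]
        if all(r.denominator == 1 for r in ratios):
            good.append(j)
    return good

print(dual_slack_integral_bases([2, 3]))  # Example 3: neither 3/2 nor 2/3 is an integer
print(dual_slack_integral_bases([1, 1]))  # the system x1 + x2 = 1 of Example 4
```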

3 Inequalities for Symmetric Travelling Salesman Polytopes

In this section, we consider the symmetric travelling salesman (STS) problem
on the $n$-node complete graph $K_n = (V, E)$. The subtour polytope is
\[ P := \{x \in \mathbb{R}^E : Cx = d,\ x \ge 0 \text{ and } x(\delta(S)) \ge 2 \text{ for all } S \subset V \text{ with } 2 \le |S| \le n - 1\} \tag{3} \]
where $C$ is the node-edge incidence matrix of $K_n$, $d$ is a column vector of 2's, and
$\delta(S)$ denotes the set of all edges with exactly one endpoint in $S$. The symmetric
travelling salesman polytope is $Q := \mathrm{conv}(P \cap \mathbb{Z}^E)$. Since $P = Q$ for $n \le 5$ (see,
e.g., [19]), we assume $n \ge 6$.
Recall that the n degree constraints x(δ(v)) = 2 are linearly independent and
represent the equality system Cx = d for both P and Q.
The following proposition characterizes the dual-slack integral bases of C.

Proposition 5 A subset $B \subseteq E$ defines a dual-slack integral basis if and only
if the subgraph $G_B := (V, B)$ of $K_n$ is connected, contains exactly one cycle, and
this cycle is odd.

Proof. Recall that, since graph $K_n$ is connected and contains some odd cycles,
$B$ is a basis of its incidence matrix $C$ if and only if every connected component
of $G_B$ contains exactly one cycle and every such cycle is odd (see, e.g., [22]).
Furthermore, let $D(v)$ denote the connected component of $G_B$ containing node $v$,
and let $u^v \in \mathbb{R}^V$ denote the $v$-th unit vector (where $u^v_i = 1$ if $i = v$, and zero
otherwise). Then the unique solution $y^v$ to $C_B y^v = u^v$ is obtained as follows:
$y^v_e$ alternates between values $+1$ and $-1$ for $e$ on the path from node $v$ to the
odd cycle in $D(v)$; $y^v_e$ alternates between values in $\{\tfrac12, -\tfrac12\}$ for $e$ on this odd
cycle; and $y^v_e = 0$ for all other edges. Consider a column $c_{ij} = u^i + u^j$ in $C_N$,
so the unique solution to $C_B y = c_{ij}$ is $y^{ij} := y^i + y^j$. If $D(i) \ne D(j)$, then $y^{ij}$
has a fractional component for every edge in the odd cycles of $D(i)$ and $D(j)$.
Otherwise, $D(i) = D(j)$ and, since $y^{ij}_e \equiv \tfrac12 + \tfrac12 \equiv 0$ modulo 1 for every edge $e$
in the odd cycle of $D(i)$, all $y^{ij}_e \in \{0, \pm 1, \pm 2\}$. The proposition follows. □
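The graph condition of Proposition 5 is easy to test by computer. The Python sketch below is one possible way to do so, under the assumption that nodes are numbered `0..n-1`; the function name is hypothetical. It relies on two elementary facts: a connected graph with $n$ nodes and $n$ edges contains exactly one cycle, and that cycle is odd exactly when the graph is not bipartite.

```python
from collections import deque

def is_dual_slack_integral_sts(n, B):
    """Test the condition of Proposition 5 for an edge list B on nodes 0..n-1."""
    edges = {frozenset(e) for e in B}
    if len(edges) != n or any(len(e) != 2 for e in edges):
        return False  # a basis needs n distinct edges and no loops
    adj = {v: [] for v in range(n)}
    for e in edges:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)
    # Breadth-first 2-colouring: a colour clash reveals an odd cycle.
    color = {0: 0}
    queue = deque([0])
    odd_cycle = False
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]
                queue.append(v)
            elif color[v] == color[u]:
                odd_cycle = True
    connected = len(color) == n
    return connected and odd_cycle

# A triangle with a pendant edge satisfies the condition.
print(is_dual_slack_integral_sts(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))
# Two disjoint triangles form a basis of C, but not a dual-slack integral one.
print(is_dual_slack_integral_sts(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]))
```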

This proposition allows us to apply Proposition 4 and Theorem 1 so as to
prove that certain classes of facet-inducing inequalities for the symmetric
travelling salesman polytope have Chvátal rank at least two, relative to the subtour
polytope. This approach was used in [5] to show that ladder inequalities have
Chvátal rank at least (in fact, exactly) two, relative to the subtour polytope. We
now give two additional examples validating Chvátal rank statements mentioned
in the Introduction.
First, consider the class of clique tree inequalities. A clique tree is a collection
of two types of subsets of V , teeth and handles, satisfying certain properties; see,
e.g., [6], [19], [20] or [23] for details and a definition of the corresponding STS
inequality. Clique trees with no handle have a single tooth, and the correspond-
ing STS inequalities are equivalent to constraints used in the definition of the
subtour polytope P . Clique trees with one handle are called combs. Clique tree
inequalities were introduced and proved to be facet-inducing for the STS poly-
tope by Grötschel and Pulleyblank [20]. They were further analyzed by Boyd
and Pulleyblank [6]. Chvátal et al. [11] construct a class of clique trees with
any positive number h of handles, for which they prove a lower bound f (h) on
the Chvátal rank, growing almost linearly with h. It takes, however, at least 29
handles (and therefore at least 146 nodes) to get f (h) > 1, that is, to prove
by their approach that the Chvátal rank is at least two, relative to the subtour
polytope. We use Proposition 5 and the results from Sect. 2 above in the proof
of the following result, which was first announced without proof in [20] (p. 568),
and then stated as Theorem 3.3 and partly proven in [6].

Theorem 6 A clique tree inequality has Chvátal rank one, relative to the STS
subtour polytope, if and only if it is a comb.

Proof. Theorem 3.2 in [6] states that a clique tree has Chvátal rank zero if and
only if it consists of a single tooth. Theorem 2.2 ibid. implies that the Chvátal
rank is one if the clique tree is a comb. So, we only need to prove that every
clique tree with at least two handles has Chvátal rank at least two. Let $H$ denote
the union of all the handles, and let $T_1$ and $T_2$ be any two distinct teeth. There
exist two nodes $t_1 \in T_1 \setminus H$ and $t_2 \in T_2 \setminus H$, and a node $h \in V \setminus (T_1 \cup T_2)$. Define

\[ B = \{t_1 v : v \in V \setminus T_1\} \cup \{t_2 v : v \in T_1\} \cup \{t_2 h\} . \]

Thus, $B$ forms a dual-slack integral basis, with odd cycle $(t_1, t_2, h)$. The clique
tree inequality $ax \le a_0$, as defined in equation (2.1) ibid., is in integral
B-canonical form. Hence, by Proposition 4, it is a reduced integral form. By
Theorem 2.2 ibid., there exists $x^* \in P$ with $ax^* \ge a_0 + 1$. The result then follows
by applying Proposition 4 and Theorem 1. □

Now consider the case $n = 7$ and the envelope inequality $ax \le 8$, as given on
p. 261 and in Figure 1 of [4]. Therein, the inequality is shown to be facet-inducing
(note that this result only holds for $n = 7$). A dual-slack integral basis $B$, for
which the inequality is in integral B-canonical form, is

\[ B = \{(3, 4), (3, 5), (3, 6), (3, 7), (2, 4), (1, 5), (5, 6)\} . \]

A solution $x^*$ in $P$ with $ax^* = 9$ is given on page 265, ibid. It then follows that
the envelope inequality for $n = 7$ is of Chvátal rank at least two, relative to the
subtour polytope, as stated in [4].
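That the edge set $B$ above is dual-slack integral can also be confirmed algebraically, straight from the definition: for every nonbasic edge $ij$, the solution of $C_B y = c_{ij}$ must be integral. The Python sketch below does this with exact rational Gauss-Jordan elimination. It is an illustration only; the names `solve` and `dual_slack_integral` are hypothetical, and the code assumes that `B` is indeed a basis (a singular $C_B$ would make the pivot search fail).

```python
from fractions import Fraction
from itertools import combinations

def solve(M, rhs):
    """Solve the nonsingular square system M y = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [x / pivot for x in A[col]]
        for r in range(n):
            factor = A[r][col]
            if r != col and factor != 0:
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

def dual_slack_integral(nodes, B):
    """Check that C_B^{-1} c_ij is integral for every edge ij outside the basis B."""
    nodes = list(nodes)
    C_B = [[1 if v in e else 0 for e in B] for v in nodes]  # node-edge incidence
    for i, j in combinations(nodes, 2):
        if (i, j) in B or (j, i) in B:
            continue
        y = solve(C_B, [1 if v in (i, j) else 0 for v in nodes])
        if any(c.denominator != 1 for c in y):
            return False
    return True

ENVELOPE_B = [(3, 4), (3, 5), (3, 6), (3, 7), (2, 4), (1, 5), (5, 6)]
print(dual_slack_integral(range(1, 8), ENVELOPE_B))
```

The same conclusion follows from the graph condition: $(V, B)$ is connected and its only cycle is the triangle on nodes 3, 5 and 6.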
Such results may easily be extended to other classes of STS facet-inducing
inequalities. This is left to the interested reader.

4 Inequalities for Asymmetric Travelling Salesman Polytopes


We now turn to the asymmetric travelling salesman (ATS) problem on the
$n$-node complete digraph $D_n = (V, A)$. The subtour polytope is now
\[ P := \{x \in \mathbb{R}^A : Cx = d,\ x(\delta^+(S)) \ge 1 \text{ and } x(\delta^-(S)) \ge 1 \text{ for all } S \subset V \text{ with } 2 \le |S| \le n - 1\} \tag{4} \]
where $C$ is the node-arc incidence matrix of $D_n$, $d$ is a column vector of 1's,
and $\delta^+(S)$ (resp., $\delta^-(S)$) denotes the set of all arcs with tail (resp., head) in $S$
and head (resp., tail) in $V \setminus S$. The asymmetric travelling salesman polytope is
$Q := \mathrm{conv}(P \cap \mathbb{Z}^A)$. We assume $n \ge 5$ (see [19] for the cases $n \le 4$). Recall (ibid.)
that any $2n - 1$ of the $2n$ degree constraints $x(\delta^+(v)) = x(\delta^-(v)) = 1$ are linearly
independent and represent the equality system for both $P$ and $Q$. When $x \in \mathbb{R}^A$
satisfies these equations, the constraints $x(\delta^+(S)) \ge 1$ and $x(\delta^-(V \setminus S)) \ge 1$ are
equivalent.
Recall also that the constraint matrix $C$ is totally unimodular. Any basis $B$
of $C$ is therefore dual-slack integral. Bases of $C$ may be described as follows.
Construct a complete bipartite graph $K_{n,n} := (V^1 \cup V^2, \tilde A)$ where $V^1$ and $V^2$
are two copies of $V$, and $\tilde A$ consists of all edges $i^1 j^2$ with $i^1 \in V^1$ and $j^2 \in V^2$.
To any subset $B \subseteq A$, associate the subset $\tilde B \subseteq \tilde A$ where $ij \in B$ if and only if
$i^1 j^2 \in \tilde B$. Then $B$ is a basis of $C$ if and only if $\tilde B$ defines a spanning tree of $K_{n,n}$
(see, e.g., [22]).


The next proposition relates dual-slack integral bases for symmetric travelling
salesman (STS) polytopes to those for the ATS polytope on the same node set V .

Proposition 7 Let $B' = T' \cup \{i_0 j_0\}$ be a basis of the equality system for the
symmetric travelling salesman polytope on $n$ nodes, where $T'$ is a spanning tree
of $K_n$. Let $T := \{ij, ji : ij \in T'\} \subset A$. Then $B := T \cup \{i_0 j_0\}$ is a (dual-slack
integral) basis of the equality set of the ATS polytope.

Proof. The tree $T'$ in $K_n$ is a bipartite graph, and induces a partition $(S, \bar S)$
of the node set $V$, where $ij \in T'$ implies that $i$ and $j$ are not in the same
subset $S$ or $\bar S$. The arc set $\tilde T \subset \tilde A$ thus induces a forest in $K_{n,n}$ comprised of
exactly two connected components: one spanning the node subset $S^1 \cup \bar S^2$ and
the other one spanning its complement $S^2 \cup \bar S^1$. Since the edge $i_0 j_0$ creates an
odd cycle with $T'$, the arc $i_0^1 j_0^2$ connects these two subtrees into a spanning tree
$(V^1 \cup V^2, \tilde B)$ in $K_{n,n}$. The result follows. □
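The construction of Proposition 7 and the spanning-tree characterization of ATS bases can both be tried on a small instance. The following Python sketch is illustrative: the function names are hypothetical, arcs are represented as `(tail, head)` pairs, and the copy of node $v$ in $V^1$ (resp. $V^2$) is represented by the pair `(v, 1)` (resp. `(v, 2)`). Acyclicity is tested with a union-find structure; together with the edge count $2n - 1$ it implies that the edges form a spanning tree of $K_{n,n}$.

```python
def ats_basis_from_sts(tree_edges, extra_edge):
    """B = {ij, ji : ij in T'} together with the arc i0 j0, as a set of (tail, head) arcs."""
    arcs = set()
    for i, j in tree_edges:
        arcs.add((i, j))
        arcs.add((j, i))
    arcs.add(tuple(extra_edge))
    return arcs

def is_spanning_tree_of_knn(nodes, arcs):
    """Check that the edges i^1 j^2, one per arc ij, form a spanning tree on V^1 and V^2."""
    nodes = list(nodes)
    if len(arcs) != 2 * len(nodes) - 1:
        return False
    parent = {(v, side): (v, side) for v in nodes for side in (1, 2)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in arcs:
        ri, rj = find((i, 1)), find((j, 2))
        if ri == rj:
            return False  # this edge would close a cycle
        parent[ri] = rj
    return True

tree = [(1, 2), (2, 3), (3, 4)]  # a spanning tree T' of K_4
# Edge 13 closes the odd cycle 1-2-3, so T' with 13 is an STS basis.
print(is_spanning_tree_of_knn(range(1, 5), ats_basis_from_sts(tree, (1, 3))))
# Edge 14 closes an even cycle; the resulting arc set is not an ATS basis.
print(is_spanning_tree_of_knn(range(1, 5), ats_basis_from_sts(tree, (1, 4))))
```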

We may first apply Proposition 7, in the spirit of the work in [24], to symmetric
ATS inequalities. A symmetric ATS inequality $ax \le a_0$ is one that satisfies
$a_{ij} = a_{ji}$ for all $ij \in A$. The corresponding STS inequality $a'y \le a_0$ is defined
by $a'_{ij} = a_{ij}$ for all $i, j \in V$ (with $i < j$). Recall that a symmetric inequality is
valid for the ATS polytope if and only if the corresponding STS inequality is valid
for the STS polytope.

Corollary 8 Let $ax \le a_0$ be a symmetric, facet-inducing ATS inequality with
relatively prime components, and such that the corresponding STS inequality
$a'y \le a_0$ is in integral $B'$-canonical form relative to a dual-slack integral basis $B'$ for
the STS polytope, and there exists $y^*$ in the STS subtour polytope such that
$a'y^* \ge a_0 + 1$. Then $ax \le a_0$ is of Chvátal rank at least two, relative to the ATS
subtour polytope.

Proof. By Proposition 7, we may construct a (dual-slack integral) basis $B$ for $Q$
such that $ax \le a_0$ is in integral B-canonical form. The point $x^* \in \mathbb{R}^A$ defined
by $x^*_{ij} = x^*_{ji} := \tfrac12 y^*_{ij}$ for all $ij \in E$ satisfies $x^* \in P$ and $ax^* \ge a_0 + 1$. The result then
follows by applying Proposition 4 and Theorem 1. □
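The passage from an STS point to an ATS point used in this proof is a simple halving, which the following Python sketch illustrates. The function names and the 5-node example are assumptions made for the illustration; the code checks only the degree equations and the equality of the two left-hand sides, not the subtour elimination constraints.

```python
from fractions import Fraction

def symmetrize(y):
    """Map an STS point y (dict: edge (i, j) -> value) to x with x_ij = x_ji = y_ij / 2."""
    x = {}
    for (i, j), val in y.items():
        x[(i, j)] = x[(j, i)] = Fraction(val) / 2
    return x

def out_degree(x, v):
    return sum(val for (i, _), val in x.items() if i == v)

def in_degree(x, v):
    return sum(val for (_, j), val in x.items() if j == v)

def lhs_ats(a_sts, x):
    """Left-hand side ax of the symmetric ATS inequality with a_ij = a_ji = a'_ij."""
    return sum(c * (x.get((i, j), 0) + x.get((j, i), 0)) for (i, j), c in a_sts.items())

# Incidence vector of the tour 1-2-3-4-5-1; every node has degree 2.
y = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (1, 5): 1}
x = symmetrize(y)
print(all(out_degree(x, v) == 1 and in_degree(x, v) == 1 for v in range(1, 6)))

# With arbitrary symmetric coefficients a', the values a x and a' y coincide.
a_sts = {(1, 2): 2, (2, 3): 1, (1, 5): 3}
print(lhs_ats(a_sts, x) == sum(c * y.get(e, 0) for e, c in a_sts.items()))
```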

Note that the assumption that the symmetric inequality $ax \le a_0$ be ATS
facet-inducing implies that the corresponding STS inequality $a'y \le a_0$ is also
facet-inducing [17] for the STS polytope. The other assumptions in Corollary 8
then imply that $a'y \le a_0$ also has Chvátal rank at least two, relative to the STS
subtour polytope.
Using Theorem 6 above and Fischetti’s result [17] that all clique trees are
ATS facet-inducing for n ≥ 7, Corollary 8 implies:

Corollary 9 A clique tree inequality has Chvátal rank one, relative to the ATS
subtour polytope, if and only if it is a comb.

Remark. Corollary 9 actually implies the assertion in Theorem 6 that every
clique tree inequality with at least two handles has Chvátal rank at least two
with respect to the STS subtour polytope. In fact, if $ax \le a_0$ is a symmetric
ATS inequality, then the Chvátal rank of the corresponding STS inequality with
respect to the STS subtour polytope is an upper bound on the Chvátal rank
of $ax \le a_0$ with respect to the ATS subtour polytope (see Lemma 2.2.1 of
Hartmann [21]).
Proposition 7 may also apply to asymmetric ATS inequalities. We conclude
this paper by investigating an asymmetric lifting method due to Fischetti [16].
Let $ax \le a_0$ be a valid inequality for $Q$ and assume that:
(a) its components are nonnegative integers;
(b) there exist two distinct isolated nodes $p$ and $q$, i.e., such that $a_{hi} = a_{ih} = 0$
for all $i \in V$ and $h \in \{p, q\}$; and
(c) there exists a node $w \in V \setminus \{p, q\}$ such that

\[ \max\{a_{iw} + a_{wj} - a_{ij} : i, j \in V \setminus \{w\}\} = 1 \]

and the restriction

\[ a^w x := \sum (a_{ij} x_{ij} : ij \in A \setminus \delta(w)) \le a_0 - 1 \]

of $ax \le a_0$ to $V \setminus \{w\}$ is valid and induces a nonempty face of the ATS
polytope on the node set $V \setminus \{w\}$.
Applying T-lifting to $ax \le a_0$ yields the following T-lifted inequality:
\[ \beta x := ax + x_{pw} + x_{wq} + x_{pq} \le \beta_0 := a_0 + 1 . \tag{5} \]
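The coefficients of the T-lifted inequality (5) can be assembled mechanically from those of $ax \le a_0$. The Python sketch below is illustrative only: the function names are hypothetical, coefficients are stored in a dictionary keyed by arcs `(i, j)` with missing arcs read as zero, and the code checks only the isolation condition (b), not validity or condition (c).

```python
def is_isolated(a, h):
    """Condition (b): every arc entering or leaving node h has coefficient zero."""
    return all(c == 0 for (i, j), c in a.items() if h in (i, j))

def t_lift(a, a0, p, q, w):
    """Return (beta, beta0) with beta x = ax + x_pw + x_wq + x_pq and beta0 = a0 + 1."""
    if p == q or not (is_isolated(a, p) and is_isolated(a, q)):
        raise ValueError("p and q must be two distinct isolated nodes")
    beta = dict(a)
    for arc in [(p, w), (w, q), (p, q)]:
        beta[arc] = beta.get(arc, 0) + 1
    return beta, a0 + 1

# Toy coefficients on nodes 1, 2 with isolated nodes p = 8 and q = 9 (illustrative only).
a = {(1, 2): 1, (2, 1): 1}
beta, beta0 = t_lift(a, 1, p=8, q=9, w=1)
print(sorted(beta.items()), beta0)
```

Note that the lifting is not symmetric: the arcs $wp$, $qw$ and $qp$ keep coefficient zero.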

We now give a sufficient condition for the T-lifted inequality to have Chvátal
rank at least two, as was claimed for an example in [16]. We shall use the following
notation in the rest of this paper. Let $\tilde V := V \setminus \{p, q\}$. We use tildes to denote
any object associated with $\tilde V$. Thus $\tilde A$ is the associated arc set; $\tilde P$ and $\tilde Q$ are the
subtour polytope and ATS polytope, respectively, defined on $\tilde V$; $\tilde a \tilde x \le a_0$ is the
restriction to $\mathbb{R}^{\tilde A}$ of $ax \le a_0$; and $\tilde\delta(w) := \tilde\delta^+(w) \cup \tilde\delta^-(w)$ is the set of all arcs
in $\tilde A$ incident with $w$.

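Theorem 10 Assume that the T-lifted inequality $\beta x \le \beta_0$ defined in (5) above
is facet-inducing for the ATS polytope $Q$. Assume also that there exist $\tilde x \in \tilde P$
and $\xi \in \mathbb{R}^{\tilde\delta(w)}$ such that $0 \le \xi \le \tilde x$,
\[ \sum_{i \ne w} \xi_{wi} = \sum_{i \ne w} \xi_{iw} = \tfrac12 \quad \text{and} \quad \sum_{i \ne w} \tilde a_{wi}(\tilde x_{wi} - \xi_{wi}) + \sum_{i \ne w} \tilde a_{iw}(\tilde x_{iw} - \xi_{iw}) \le \tilde a \tilde x - a_0 - \tfrac12 . \]

Then $\beta x \le \beta_0$ has Chvátal rank at least two.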

Proof. Since $\beta x \le \beta_0$ is facet-inducing with relatively prime integer components,
we only need to exhibit a dual-slack integral basis $B$ such that $\beta_e = 0$ for all $e \in B$,
and a point $x^* \in P$ with $\beta x^* \ge a_0 + 2$. Choose $r \in \tilde V \setminus \{w\}$ and define
\[ B_w := \{pv, vp : v \in \tilde V \setminus \{w\}\} \cup \{qr, rq, qp\} . \]
Then $B_w$ satisfies the conditions of Proposition 7 relative to the node set $V \setminus \{w\}$,
and therefore defines a basis of the corresponding STS polytope. Adding the two
arcs $wp \in \delta^+(w)$ and $qw \in \delta^-(w)$ thus yields a (dual-slack integral) basis
$B := B_w \cup \{wp, qw\}$ for $Q$, for which $\beta x \le \beta_0$ is in integral B-canonical form. Define
$x^* \in \mathbb{R}^A$ as follows:
\[ x^*_e := \begin{cases}
\tilde x_e & \text{for } e \in \tilde A \setminus \tilde\delta(w); \\
\tilde x_e - \xi_e & \text{for } e \in \tilde\delta(w); \\
\xi_{iw} & \text{for } e = ip \text{ with } i \in \tilde V \setminus \{w\}; \\
\xi_{wi} & \text{for } e = qi \text{ with } i \in \tilde V \setminus \{w\}; \\
\tfrac12 & \text{for } e \in \{pw, wq, pq\}; \text{ and} \\
0 & \text{for } e \in \{wp, qw, qp\}.
\end{cases} \]
Since $x^*$ satisfies the degree constraints $Cx^* = d$ and
\[ \beta x^* = \tilde a \tilde x + \sum_{i \ne w} \tilde a_{wi}(\tilde x_{wi} - \xi_{wi}) + \sum_{i \ne w} \tilde a_{iw}(\tilde x_{iw} - \xi_{iw}) + \tfrac32 \ge a_0 + 2 , \]
we only need to show that $x^*$ satisfies, for all nonempty $S \subset V$, the subtour
elimination constraint $x^*(\delta^+(S)) \ge 1$. First, it is straightforward to verify that
these constraints are satisfied for all nonempty $S \subset \{w, p, q\}$ and $S \supseteq \tilde V \setminus \{w\}$.
Next, for disjoint node sets $S$ and $T$, let $x(S : T) := \sum \{x_{ij} : i \in S, j \in T\}$ for all
$x \in \mathbb{R}^A$. For any $\emptyset \ne T \subset \tilde V \setminus \{w\}$, let $\bar T := \tilde V \setminus (T \cup \{w\})$, so $V = T \cup \bar T \cup \{w, p, q\}$
and these three subsets are disjoint. We have
\[ 1 \le \tilde x(T : \bar T \cup \{w\}) = x^*(T : \bar T \cup \{w, p, q\}) < x^*(T \cup \{q\} : \bar T \cup \{w, p\}) \le x^*(T \cup \{p, q\} : \bar T \cup \{w\}) \]
where the strict inequality follows from $x^*(T : q) = 0$ and $x^*(q : p) = \tfrac12$,
and the last inequality from $x^*(T : p) \le \tfrac12 = x^*(p : w)$. Hence the subtour
elimination constraints $x^*(\delta^+(S)) \ge 1$ are satisfied for $S = T$; $S = T \cup \{q\}$;
and $S = T \cup \{p, q\}$. Since $x^*(T : p) < 1 = x^*(p : w) + x^*(p : q)$, we have
$x^*(\delta^+(T \cup \{p\})) \ge 1$. By exchanging $T$ and $\bar T$, and also $p$ and $q$, it follows
similarly that $x^*(\delta^+(S)) = x^*(\delta^-(S)) \ge 1$ holds for $S = T \cup \{w\}$; $S = T \cup \{w, p\}$;
$S = T \cup \{w, q, p\}$; and $S = T \cup \{w, q\}$. The proof is complete. □

We now apply T-lifting to any clique tree inequality. In a clique tree, a
tooth is pendent if it intersects exactly one handle, and nonpendent otherwise,
i.e., if it intersects at least two handles. Let $H$ denote the union of all handles.
The following proposition gives two sufficient conditions for a T -lifted clique tree
inequality to have Chvátal rank at least 2, relative to the ATS subtour polytope.
We refer to [16] for a discussion of cases where the T -lifted clique tree inequality
is facet-inducing for the ATS polytope.

Proposition 11 A T-lifted clique tree inequality (5) has Chvátal rank at least 2,
relative to the ATS subtour polytope, if it is facet-inducing for the ATS polytope
and either
(a) it has at least one handle, and node $w$ is in $T \cap H$ where $T$ is a pendent
tooth; or
(b) it has at least two nonpendent teeth.

Proof. By Theorem 10, we only need to construct the required $\tilde x$ and $\xi$. For
case (a), recall that, if a clique tree has at least one handle, then there exists
$\tilde y$ in the STS subtour polytope such that $a'\tilde y - a_0 \ge \tfrac12$. Further, from step (6)
in the proof of Theorem 2.2 in [6] (p. 170), for every $w$ as described in (a),
we can construct such a $\tilde y$ with $\tilde y_{wk} = \tilde y_{wl} = \tfrac12$ for two edges $wk$ and $wl$ with
$a'_{wk} = a'_{wl} = 0$. Then $\tilde x$ defined by $\tilde x_{ij} = \tilde x_{ji} = \tilde y_{ij}$ for all $ij \in E$, and $\xi$ defined
by $\xi_{ij} = \tfrac14$ if $ij \in \{wk, kw, wl, lw\}$, and 0 otherwise, satisfy the conditions of
Theorem 10. The proof is complete for case (a). Now consider case (b). The
solution $\tilde y$ constructed in the proof of Theorem 2.2 in [6] satisfies $a'\tilde y - a_0 \ge \tfrac32$.
If $w \in T \setminus H$ where $T$ is a nonpendent tooth which intersects exactly two handles,
then $\tilde y_e = 1$ for two edges $wk$ and $wl$ with $k, l \in T$; and 0 for all other edges (see
step (3) in the proof of Theorem 2.2 ibid.). With $\tilde x$ as defined above, let the only
nonzero components of $\xi$ be $\xi_{wk} = \xi_{kw} = \tfrac12$. For every choice of $w$ other than an
isolated node, that in (a) or that just discussed, we can apply the construction
in the proof of Theorem 2.2 in [6] so that the solution $\tilde y$ satisfies $\tilde y_{wk} = \tilde y_{wl} = \tfrac12$
for two nodes $k, l$ in the same handle as $w$, or in the same nonpendent tooth $T$
(intersecting at least three handles) as $w$. With $\tilde x$ as defined above, let the only
nonzero components of $\xi$ be $\xi_{wk} = \xi_{kw} = \xi_{wl} = \xi_{lw} = \tfrac14$. Finally, note that
$a_{ij} = a'_{ij} = 1$ for all arcs $ij \in A$ such that $\xi_{ij} > 0$, so $\xi$ satisfies all the requisite
conditions. The proof is complete. □

In particular, case (a) of Proposition 11 proves a statement in [16] (p. 263)
that a T-lifted comb, where node w is in a tooth and not in the handle, has
Chvátal rank at least two, relative to the ATS polytope.

Acknowledgements. The authors would like to thank Professor William H.
Cunningham, of the University of Waterloo, for his support and his
encouragement to complete this work, and Professor Sebastian Ceria, of Columbia
University, for information regarding the disjunctive rank of inequalities.

References
1. E. Balas, S. Ceria and G. Cornuéjols (1993), “A lift-and-project cutting plane
algorithm for mixed 0-1 programs,” Mathematical Programming 58, 295–324.
2. E. Balas and M.J. Saltzman (1989), “Facets of the three-index assignment poly-
tope,” Discrete Applied Mathematics 23, 201–229.
3. A. Bockmayr, F. Eisenbrand, M. Hartmann and A.S. Schulz (to appear), “On
the Chvátal rank of polytopes in the 0/1 cube,” to appear in Discrete Applied
Mathematics.
4. S.C. Boyd and W.H. Cunningham (1991), “Small travelling salesman polytopes,”
Mathematics of Operations Research 16, 259–271.
5. S.C. Boyd, W.H. Cunningham, M. Queyranne and Y. Wang (1995), “Ladders for
travelling salesmen,” SIAM Journal on Optimization 5, 408–420.
6. S.C. Boyd and W.R. Pulleyblank (1991), “Optimizing over the subtour polytope
of the travelling salesman problem,” Mathematical Programming 49, 163–187.
7. A. Caprara and M. Fischetti (1996), “0-1/2 Chvátal-Gomory cuts,” Mathematical
Programming 74, 221–236.
8. A. Caprara, M. Fischetti and A.N. Letchford (1997), “On the separation of max-
imally violated mod-k cuts,” DEIS-OR Technical Report OR-97-8, Dipartimento
di Elettronica, Informatica e Sistemistica Università degli Studi di Bologna, Italy,
May 1997.
9. S. Ceria (1993), “Lift-and-project methods for mixed 0-1 programs,” Ph.D. dis-
sertation, Carnegie-Mellon University.
10. V. Chvátal (1973), “Edmonds polytopes and a hierarchy of combinatorial prob-
lems,” Discrete Mathematics 4, 305–337.
11. V. Chvátal, W. Cook and M. Hartmann (1989), “On cutting-plane proofs in
combinatorial optimization,” Linear Algebra and Its Applications 114/115, 455–
499.
12. P.D. Domich, R. Kannan and L.E. Trotter, Jr. (1987), “Hermite normal form
computation using modulo determinant arithmetic,” Mathematics of Operations
Research 12, 50–59.
13. J. Edmonds, L. Lovász and W.R. Pulleyblank (1982), “Brick decompositions and
the matching rank of graphs,” Combinatorica 2, 247–274.
14. F. Eisenbrand (1998), “A note on the membership problem for the elementary
closure of a polyhedron,” Preprint TU-Berlin No. 605/1998, Fachbereich Mathe-
matik, Technische Universität Berlin, Germany, November 1998.
15. F. Eisenbrand and A.S. Schulz (1999), “ Bounds on the Chvátal Rank of Polytopes
in the 0/1-Cube”, this Volume.
16. M. Fischetti (1992), “Three facet-lifting theorems for the asymmetric traveling
salesman polytope”, pp. 260–273 in E. Balas, G. Cornuéjols and R. Kannan, eds,
Integer Programming and Combinatorial Optimization (Proceedings of the IPCO2
Conference), G.S.I.A, Carnegie Mellon University.
17. M. Fischetti (1995), “Clique tree inequalities define facets of the asymmetric
travelling salesman polytope,” Discrete Applied Mathematics 56, 9–18.
18. R. Giles and L.E. Trotter, Jr. (1981), “On stable set polyhedra for K1,3 -free
graphs,” Journal of Combinatorial Theory Ser. B 31, 313–326.
19. M. Grötschel and M.W. Padberg (1985), “Polyhedral theory,” pp. 251–305 in:
E.L. Lawler et al., eds., The Travelling Salesman Problem, Wiley, New York.
20. M. Grötschel and W.R. Pulleyblank (1986), “Clique tree inequalities and the
symmetric travelling salesman problem,” Mathematics of Operations Research 11,
537–569.
21. M. Hartmann (1988), “Cutting planes and the complexity of the integer hull,”
Ph.D. thesis, Cornell University.
22. E.L. Johnson (1965), “Programming in networks and graphs”, Research Report
ORC 65-1, Operations Research Center, University of California–Berkeley.
23. G.L. Nemhauser and L.A. Wolsey (1988), Integer and Combinatorial Optimiza-
tion, Wiley, New York.
24. M. Queyranne and Y. Wang (1995), “Symmetric inequalities and their composi-
tion for asymmetric travelling salesman polytopes,” Mathematics of Operations
Research 20, 838–863.
25. A. Schrijver (1980), “On cutting planes,” in: M. Deza and I.G. Rosenberg, eds.,
Combinatorics 79 Part II, Ann. Discrete Math. 9, 291–296.
26. A. Schrijver (1986), Theory of Linear and Integer Programming, Wiley, Chich-
ester.
The Square-Free 2-Factor Problem in Bipartite
Graphs

David Hartvigsen

College of Business Administration
University of Notre Dame
P.O. Box 399
Notre Dame, IN 46556-0399
[email protected]

Abstract. The 2-factor problem is to find, in an undirected graph, a
spanning subgraph whose components are all simple cycles; hence it is
a relaxation of the problem of finding a Hamilton tour in a graph. In
this paper we study, in bipartite graphs, a problem of intermediate dif-
ficulty: the problem of finding a 2-factor that contains no 4-cycles. We
introduce a polynomial time algorithm for this problem; we also present
an “augmenting path” theorem, a polyhedral characterization, and a
“Tutte-type” characterization of the bipartite graphs that contain such
2-factors.

1 Introduction
A 2-factor in a simple undirected graph is a spanning subgraph whose compo-
nents are all simple cycles (i.e., the degrees of all nodes in the subgraph are 2). A
k-restricted 2-factor, for k ≥ 2 and integer, is a 2-factor with no cycles of length
k or less. Let us denote by Pk the problem of finding a k-restricted 2-factor in
a graph. Thus P2 is the problem of finding a 2-factor in a graph and Pk , for
n/2 ≤ k ≤ n − 1 (where n is the number of nodes in the graph), is the problem
of finding a Hamilton cycle in a graph.
The basic problem P2 has been extensively studied. A polynomial
algorithm exists due to a reduction (see Tutte [11]) to the classical matching
problem (see Edmonds [6]). Also well-known is a “good” characterization of those
graphs that have a 2-factor (due to Tutte [10]) and a polyhedral characterization
of the convex hull of incidence vectors of 2-factors (which can also be obtained
using Tutte’s reduction and Edmonds’ polyhedral result for matchings [5]). For
a more complete history of this problem see Schrijver [8].
Of course a polynomial time algorithm for Pk , for n/2 ≤ k ≤ n−1, is unlikely
since the problem of deciding if a graph has a Hamilton tour is NP-complete. In
fact, Papadimitriou (see [2]) showed that deciding if a graph has a k-restricted
2-factor for k ≥ 5 is NP-complete. A polynomial time algorithm for P3 was given
in [7]. This leaves P4, whose status is open.
In this note we consider the problem P4 on bipartite graphs, which we call
the bipartite square-free 2-factor problem. (The idea of studying this problem

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 234–241, 1999.
© Springer-Verlag Berlin Heidelberg 1999
was suggested to the author by Cunningham and Geelen [3].) Observe that
for bipartite graphs the problems Pk of interest are for k ≥ 2 and even. We
introduce here a polynomial time algorithm for the bipartite square-free 2-factor
problem. The algorithm uses techniques similar to those used for the problem
P3 on general graphs, but the resulting algorithm is simpler. We also present in
this paper, for the bipartite square-free 2-factor problem, three theorems that
are the appropriate versions of the following classical results for matchings in
graphs:

– Tutte’s [9] characterization of the graphs that contain perfect matchings;
– Berge’s [1] “augmenting path” characterization of maximum matchings in
graphs;
– Edmonds’ [5] polyhedral characterization of matchings in graphs.

Our three theorems follow from the validity of our algorithm. We present our
Tutte-type theorem in Section 2. We introduce a slightly more general version
of the k-restricted 2-factor problem in Section 3, in terms of which we state
our remaining results. In particular, we present our “augmenting path” theorem
in Section 4 and our polyhedral theorem in Section 5. We introduce some of
the ideas of the algorithm in Section 6. We contrast all of our results with the
comparable results for the problem P2 on bipartite graphs.
Before ending this section let us briefly mention two related classes of prob-
lems. First, consider the problems Pk , for k ≥ 6, on bipartite graphs. The status
of these problems is open. Second, consider the “weighted” versions of the prob-
lems Pk . That is, given a graph with nonnegative weights, let weighted-Pk be the
problem of finding a k-restricted 2-factor such that the sum of the weights of its
edges is a maximum. Vornberger [12] showed that weighted-P4 is NP-hard. The
status of weighted-P3 remains open. Cunningham and Wang [4] have studied
the closely related polytopes given by the convex hull of incidence vectors of
k-restricted 2-factors.

2 Tutte-Type Theorems
In this section we present a theorem that characterizes those bipartite graphs
that have a 2-factor and a theorem that characterizes those bipartite graphs that
have a square-free 2-factor. The first theorem is essentially a bipartite version of
Tutte’s 2-factor theorem (see [10]). The necessity of the condition is straightfor-
ward to establish. The sufficiency follows from a max-min theorem of Edmonds
(see [5]). The sufficiency also follows from the proof of the validity of the first
algorithm we present in Section 6.
For a graph H = (V, E), let A(H) denote the number of nodes of H that
have degree 0; let B(H) denote the number of nodes of H that have degree 1.
For S ⊆ V , we let H[S] denote the subgraph of H induced by S.

Theorem 1. A bipartite graph G = (V, E) has a 2-factor if and only if for every
subset $V' \subseteq V$, $2|V'| \ge 2A(G[V \setminus V']) + B(G[V \setminus V'])$.
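
The condition of Theorem 1 can be checked by brute force on very small graphs. The following is only an illustrative sketch, not part of the paper: the function names and the edge-list representation are assumptions made here, and the enumeration over all subsets V' is exponential, so it is meant for hand-sized examples only.

```python
from itertools import combinations


def tutte_type_condition(nodes, edges):
    """Check 2|V'| >= 2A(G[V - V']) + B(G[V - V']) for every subset V' of V."""
    nodes = list(nodes)
    for r in range(len(nodes) + 1):
        for removed in combinations(nodes, r):
            keep = set(nodes) - set(removed)
            # Degrees in the subgraph induced by the remaining nodes.
            deg = {v: 0 for v in keep}
            for u, v in edges:
                if u in keep and v in keep:
                    deg[u] += 1
                    deg[v] += 1
            a = sum(1 for d in deg.values() if d == 0)  # A(H): degree-0 nodes
            b = sum(1 for d in deg.values() if d == 1)  # B(H): degree-1 nodes
            if 2 * r < 2 * a + b:
                return False
    return True


def has_two_factor(nodes, edges):
    """Brute-force test: some set of |V| edges gives every node degree 2."""
    nodes = list(nodes)
    for chosen in combinations(edges, len(nodes)):
        deg = {v: 0 for v in nodes}
        for u, v in chosen:
            deg[u] += 1
            deg[v] += 1
        if all(d == 2 for d in deg.values()):
            return True
    return False


# Hypothetical examples: a 4-cycle (has a 2-factor) and a 3-node path (has none).
square = (["a", "b", "x", "y"], [("a", "x"), ("x", "b"), ("b", "y"), ("y", "a")])
path = (["a", "x", "b"], [("a", "x"), ("x", "b")])
```

On the 3-node path the choice V' = ∅ already violates the condition, since both endnodes have degree 1.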
With a few more definitions we can present a variant of this result that holds
for square-free 2-factors. Again, let H = (V, E) be a graph. We refer to a cycle in
H of length 4 as a square. Let C(H) denote the number of isolated squares of H
(i.e., those squares all of whose nodes have degree 2 in H); and let D(H) denote
the number of squares of H for which two nonadjacent nodes have degree 2 (in
H) but at least one of the other two nodes has degree > 2 (in H). The necessity
of the condition in the following theorem is straightforward to establish. The
sufficiency appears to be considerably more difficult; it follows from the validity
of the second algorithm we introduce in Section 6.
Theorem 2. A bipartite graph G = (V, E) has a square-free 2-factor if and only
if for every subset $V' \subseteq V$,
$2|V'| \ge 2A(G[V \setminus V']) + B(G[V \setminus V']) + 2C(G[V \setminus V']) + D(G[V \setminus V'])$.

3 More General Problems


We introduce in this section two problems that are slightly more general than
the problems P2 and P4 in bipartite graphs. These are the problems we consider
in the remainder of this note. In particular, these are the problems that our
algorithms in Section 6 actually solve. Nevertheless, our algorithms also solve P2
and P4 in bipartite graphs.
Consider a simple undirected graph G. A simple 2-matching in G is a sub-
graph of G whose components are simple paths and cycles (i.e., the degrees of all
nodes in the subgraph are 1 or 2). The cardinality of a simple 2-matching is its
number of edges. A square-free simple 2-matching is a simple 2-matching that
contains no cycles of length 4. We subsequently consider in this note the fol-
lowing two problems on bipartite graphs: finding a maximum cardinality simple
2-matching and finding a maximum cardinality square-free simple 2-matching.
Note that, as we claimed above, an algorithm that finds a maximum cardinality
(square-free) simple 2-matching also determines if a graph has a (square-free)
2-factor, since a (square-free) simple 2-matching is a (square-free) 2-factor
exactly when it has |V| edges.
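
As a small illustration of these definitions, the sketch below tests whether an edge set is a simple 2-matching and whether it is a 2-factor. The function names and the representation of edges as node pairs are assumptions made here for illustration only.

```python
from collections import Counter


def m_degrees(matching):
    """Number of matching edges incident with each node."""
    deg = Counter()
    for u, v in matching:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_simple_2_matching(graph_edges, matching):
    """Every edge is a distinct edge of G and no node meets more than two of them."""
    g = {frozenset(e) for e in graph_edges}
    m = [frozenset(e) for e in matching]
    return (len(set(m)) == len(m)
            and all(e in g for e in m)
            and all(d <= 2 for d in m_degrees(matching).values()))


def is_two_factor(nodes, graph_edges, matching):
    """A simple 2-matching with |V| edges gives every node degree exactly 2."""
    return (is_simple_2_matching(graph_edges, matching)
            and len(matching) == len(list(nodes)))


# Hypothetical example: the 4-cycle a-x-b-y-a.
nodes = ["a", "b", "x", "y"]
g_edges = [("a", "x"), ("x", "b"), ("b", "y"), ("y", "a")]
```

Here all four edges form a 2-factor, while any three of them form a simple 2-matching that is not a 2-factor.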

4 Augmenting Path Theorems


Our algorithms for finding maximum cardinality simple 2-matchings in a bipar-
tite graph are similar in form to Edmonds’ algorithm for finding a maximum
cardinality matching in a graph (see [6]). Edmonds’ algorithm builds up such a
matching incrementally by identifying “augmenting paths.” In this section we
define types of “augmenting paths” for our two problems and state two theorems
that characterize maximum cardinality simple 2-matchings for our problems in
terms of these “augmenting paths.” These theorems are in the spirit of Berge’s
theorem [1] for matchings.
Consider a graph G and a simple 2-matching M . A node in G is called
saturated if it is incident in G with two edges of M ; if it is incident with one
edge or no edge of M it is called deficient.
An augmenting path with respect to M is a simple path (no repeated nodes)
with the following properties: the edges are alternately in and out of M ; the
endedges are not in M ; and the endnodes are deficient. Hence if G has an aug-
menting path, then a simple 2-matching with cardinality one greater than |M |
can be obtained by interchanging the edges in and out of M along this path.
The following theorem shows that we have the converse as well. This result fol-
lows from the validity of our algorithm for finding maximum cardinality simple
2-matchings in bipartite graphs. It also follows from Berge’s [1] augmenting path
theorem for matchings with the reduction of Tutte [11].

Theorem 3. A simple 2-matching M in a bipartite graph G has maximum
cardinality if and only if G contains no augmenting path with respect to M .
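
The definition of an augmenting path and the interchange step can be sketched as follows. This is a minimal illustration under assumptions made here: a path is given as a list of nodes, edges are stored as frozensets, the path is assumed to use only edges of G, and all function names are hypothetical.

```python
def edge(u, v):
    """Undirected edge as a frozenset so that (u, v) and (v, u) coincide."""
    return frozenset((u, v))


def m_degree(matching, v):
    """Number of edges of the matching incident with node v."""
    return sum(1 for e in matching if v in e)


def is_augmenting_path(path, matching):
    """Simple path, edges alternately out of and in M, endedges not in M,
    and both endnodes deficient (fewer than two incident edges of M)."""
    if len(path) < 2 or len(set(path)) != len(path):
        return False
    edges = [edge(a, b) for a, b in zip(path, path[1:])]
    if len(edges) % 2 == 0:  # both endedges outside M forces odd length
        return False
    for i, e in enumerate(edges):
        if (e in matching) != (i % 2 == 1):
            return False
    return m_degree(matching, path[0]) < 2 and m_degree(matching, path[-1]) < 2


def augment(path, matching):
    """Interchange the edges in and out of M along the path."""
    return set(matching) ^ {edge(a, b) for a, b in zip(path, path[1:])}


# Hypothetical example: path a-b-c-d with only the middle edge in M.
M = {edge("b", "c")}
P = ["a", "b", "c", "d"]
```

Augmenting along P replaces the single edge bc by the two edges ab and cd, so the cardinality grows by one.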

We next consider the case of square-free simple 2-matchings. Let M denote
a square-free simple 2-matching in G. An augmenting structure with respect to
M is a simple path (no repeated nodes) P together with a collection (possibly
empty) of cycles C1 , . . . , Cp with the following properties:

– P, C1 , . . . , Cp are pairwise edge-disjoint (they may share nodes);
– For P : the edges are alternately in and out of M ; the endedges are not in
M ; and the endnodes are deficient.
– For each Ci : the edges are alternately in and out of M ;
– Interchanging the edges of P, C1 , . . . , Cp in and out of M results in a square-
free simple 2-matching in G.

Hence if G has an augmenting structure, then a square-free simple 2-matching
with cardinality one greater than |M | can be obtained by interchanging the edges
in and out of M in this structure. The following theorem shows that we have
the converse as well. The proof of this result follows from the validity of our
algorithm for finding maximum cardinality square-free simple 2-matchings in
bipartite graphs. The example following the theorem shows why the definition
of augmenting paths is not sufficient for the square-free case.

Theorem 4. A square-free simple 2-matching M in a bipartite graph G has
maximum cardinality if and only if G contains no augmenting structure with
respect to M .

Example. Consider the square-free simple 2-matching M in Figure 1. Bold lines
are edges in M ; non-bold lines are edges not in M . Let P denote the path
with the six nodes a - f and let C1 and C2 denote the cycles bounding the
regions with the corresponding labels. Observe that P is the only augmenting
path in this graph, yet if P alone is used to increase |M |, then a square is
created in the resulting matching. However, |M | can be increased by one by
using the augmenting structure P , C1, and C2.
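
The phenomenon in this example, that interchanging along an augmenting path alone may create a square, can be reproduced on a much smaller graph. The sketch below does not use the graph of Figure 1; the 4-node graph and the function name are assumptions made here for illustration.

```python
def contains_square(matching):
    """True if some component of a simple 2-matching is a cycle of length 4.

    Assumes all degrees are at most 2, so a component with 4 nodes and
    4 edges must be a 4-cycle.
    """
    adj = {}
    for u, v in matching:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # collect the connected component of start
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        n_edges = sum(len(adj[v]) for v in comp) // 2
        if len(comp) == 4 and n_edges == 4:
            return True
    return False


# Hypothetical graph: the square p-q-r-s with M a path on three of its edges.
M = [("p", "q"), ("q", "r"), ("r", "s")]
# The single edge sp is an augmenting path (s and p are deficient), but
# adding it closes the square.
M_after = M + [("s", "p")]
```

In this tiny graph no augmenting structure exists, so M is already a maximum cardinality square-free simple 2-matching even though an augmenting path is present.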
Fig. 1. [Graph on the nodes a, b, c, d, e, f, g, h; C1 and C2 label two regions bounded by cycles.]

5 Polyhedral Descriptions
In this section we present polyhedral results for our two problems. Let G = (V, E)
be a graph. For v ∈ V , let δ(v) denote the edges of G incident with v and let
$x \in \mathbb{R}^E$. For a simple 2-matching M , x is called an incidence vector for M if
$x_e = 1$ when $e \in M$ and $x_e = 0$ when $e \notin M$. For S ⊆ E, we let x(S) denote
$\sum_{e \in S} x_e$.
Consider the following linear program (P).

$$
\begin{array}{lll}
\max & x(E) & \\
\text{s.t.} & x(\delta(v)) \le 2 & \text{for all } v \in V \\
 & 0 \le x_e \le 1 & \text{for all } e \in E
\end{array}
$$
Observe that integral solutions to (P ) are incidence vectors for simple 2-
matchings and hence an optimal integral solution is a maximum cardinality
simple 2-matching. When the graph is bipartite, it is well known that the
constraint matrix for (P) is totally unimodular; hence all the extreme points of the
polyhedron are incidence vectors for simple 2-matchings. In particular, we have
the following result.
Theorem 5. For any bipartite graph G, (P ) has an integer optimal solution
(which is an incidence vector of a maximum cardinality simple 2-matching).
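Next, consider the following linear program (P′).

$$
\begin{array}{lll}
\max & x(E) & \\
\text{s.t.} & x(\delta(v)) \le 2 & \text{for all } v \in V \\
 & x(S) \le 3 & \text{for all edge sets } S \text{ of squares} \\
 & 0 \le x_e \le 1 & \text{for all } e \in E
\end{array}
$$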
Observe that integral solutions to (P′) are incidence vectors for square-free
simple 2-matchings and hence an optimal integral solution is a maximum
cardinality square-free simple 2-matching. However, the constraint matrix for (P′) is
not totally unimodular, in general.

Theorem 6. For any bipartite graph G, (P′) has an integer optimal solution
(which is a maximum cardinality square-free simple 2-matching).

This theorem can be proved from the structure available at the end of the
algorithm. In particular, we can produce an integral feasible primal solution and
a half-integral feasible dual solution that satisfy complementary slackness.
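
To make the complementary slackness argument concrete, the dual of the square-constrained program can be written out as below. The dual variable names $y_v$, $z_S$ and $w_e$ are not from the paper; they are introduced here only for illustration, with $S$ ranging over the edge sets of squares.

```latex
\begin{align*}
\min\ & \sum_{v \in V} 2\,y_v + \sum_{S} 3\,z_S + \sum_{e \in E} w_e \\
\text{s.t.}\ & y_u + y_v + \sum_{S \ni e} z_S + w_e \ \ge\ 1
      && \text{for all } e = uv \in E, \\
 & y_v \ge 0,\quad z_S \ge 0,\quad w_e \ge 0 .
\end{align*}
% Complementary slackness for a primal feasible x and a dual feasible (y, z, w):
%   x_e > 0  implies  y_u + y_v + \sum_{S \ni e} z_S + w_e = 1   (e = uv),
%   y_v > 0  implies  x(\delta(v)) = 2,
%   z_S > 0  implies  x(S) = 3,
%   w_e > 0  implies  x_e = 1.
```

A primal and dual pair satisfying these conditions is optimal for both programs, which is how an integral primal solution certifies the theorem.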

6 The Algorithms

We begin this section with a complete statement of the algorithm for finding
a maximum cardinality simple 2-matching in a bipartite graph. This algorithm
can be obtained using Tutte’s reduction [11] to the matching problem and then
applying Edmonds’ matching algorithm [6]. We present this algorithm in the
same form that is used for the square-free algorithm so that we may discuss
some of the features of the square-free algorithm later in this section.
In the course of the algorithm for finding a maximum cardinality simple
2-matching in a bipartite graph, we construct a subgraph S of G called an
alternating structure. We next describe some properties of this structure that
are maintained throughout the algorithm.
The nodes of S are either odd or even and have the following properties.

– Odd nodes are incident with two edges of M (zero, one, or two of these can
be in S); the other endnodes of these edges are even nodes.
– Even nodes may be saturated or deficient.
– If an even node is saturated, then exactly one of these two edges of M must
be in S and the other endnode of this edge must be odd.

S is partitioned into the union of node disjoint trees. These trees have the
following properties:

– Each tree has a unique node called its root.
– The set of roots is the same as the set of deficient nodes.
– All roots are even nodes.

Consider an arbitrary path in such a tree that contains the tree’s root as an
endnode. Then this path has the following properties:

– The nodes of the path are alternately even and odd.
– The edges of the path are alternately in and not in M such that the edge
incident with the root is not in M .
Algorithm:
Input: Bipartite graph G.
Output: A maximum cardinality simple 2-matching of G.
Step 0: Let M be any simple 2-matching of G.
Step 1: If M is a 2-factor, done. Otherwise, let S be the set of deficient nodes
   of G, where these nodes are the roots of S and are even.
Step 2: If every edge of G that is incident with an even node is either an
   edge of M or has an odd node as its other endnode, then M has maximum
   cardinality; done. Otherwise, select an edge j ∉ M joining an even node v
   to a node w that is not an odd node.
   Case 1: w ∉ S. Go to Step 3.
   Case 2: w is an even node. Go to Step 4.
Step 3: [Grow the structure] w ∉ S.
   w must be incident with two edges in M : wu1 and wu2 (otherwise w is
   deficient, hence a root in S). Neither u1 nor u2 is odd, since then w would
   be even. Thus each ui is either even or ∉ S. Make w odd and add it and the
   edge vw to the tree that contains v. If ui ∉ S, then make it even and add it
   and the edge wui to the tree that contains v. If ui is even, do not change its
   status. Go to Step 2.
Step 4: [Augment the matching] w is an even node.
   Observe that v and w must be in different trees (otherwise there exists an
   odd cycle). Consider the path consisting of the path from v to the root of
   its tree, the path from w to the root of its tree, plus the edge vw. The edges
   along this path are alternately in and not in M with the endedges not in M
   and the endnodes are deficient (i.e., it is an augmenting path). Interchange
   edges in and out of the matching along this path. Go to Step 1.
End.

The validity of this algorithm can be proved using arguments similar to those
used by Edmonds [6] to prove the validity of his matching algorithm.
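
For experimentation, a maximum cardinality simple 2-matching in a small bipartite graph can also be computed by repeatedly searching for augmenting paths, as justified by Theorem 3. The sketch below is not the alternating-forest algorithm stated above: it swaps in a plain depth-first search started from deficient nodes on one side of the bipartition. The function name, the input format, and the assumption that node names on the two sides are distinct are all choices made here for illustration.

```python
def max_simple_2_matching(left, edges):
    """Return a maximum cardinality simple 2-matching as a set of (u, w) pairs.

    left  -- nodes on one side of the bipartition
    edges -- pairs (u, w) with u in left and w on the other side
    """
    adj = {u: [] for u in left}
    deg = {u: 0 for u in left}  # number of matching edges at each node
    for u, w in edges:
        adj[u].append(w)
        deg.setdefault(w, 0)
    M = set()

    def search(u, seen, path):
        """DFS for an alternating path from u to a deficient right node."""
        seen.add(u)
        for w in adj[u]:
            if (u, w) in M or w in seen:
                continue
            seen.add(w)
            path.append((u, w))  # edge not in M
            if deg[w] < 2:  # deficient endnode reached
                return True
            for u2, w2 in list(M):  # continue along an edge of M at w
                if w2 != w or u2 in seen:
                    continue
                path.append((u2, w))
                if search(u2, seen, path):
                    return True
                path.pop()
            path.pop()
        return False

    improved = True
    while improved:
        improved = False
        for u in left:
            if deg[u] < 2:
                path = []
                if search(u, set(), path):
                    for e in path:  # interchange edges in and out of M
                        M ^= {e}
                    deg[u] += 1
                    deg[path[-1][1]] += 1
                    improved = True
    return M


# Hypothetical example: the search must reroute a-x to a-y to fit c-x.
example = max_simple_2_matching(
    ["a", "b", "c"], [("a", "x"), ("b", "x"), ("c", "x"), ("a", "y")])
```

The loop stops once no deficient node on the left side starts an augmenting path; since every augmenting path in a bipartite graph has one endnode on each side, this is the stopping criterion of Theorem 3.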
We next introduce some of the key ideas of our square-free version of the
above algorithm by considering an example. To begin, consider an application
of the above algorithm to the graph in Figure 1. Let us begin in Step 0 with
the matching M illustrated: bold lines are edges in M ; non-bold lines are edges
not in M . Observe that nodes a and f are the only deficient nodes, hence the
algorithm starts in Step 1 by making these two nodes into even nodes and the
roots of two trees in S. The algorithm may next select the edge ab in Step 2.
It would then go to Step 3 and make b an odd node, and c and g even nodes.
It would add the edges ab, bc, and bg to the tree containing a. The algorithm
may next select the edge cd in Step 2. As above it would then make d an odd
node, and e and h even nodes. It would add the edges cd, de, and dh to the tree
containing a. The algorithm may next select the edge ef in Step 2. This then
calls for augmenting the matching in Step 4. However, if we were to do so, we
would create a square in the matching. The way in which this problem is dealt
with in the square-free version of the algorithm is as follows: When we consider
edge cd, we make only h an even node. Thus we only look to grow the tree from
node h. As we continue to grow the tree, we eventually discover the cycles C1
and C2, and incorporate them into the structure. When this is accomplished, we
make e an even node and add the edge de to the tree containing a. Now when
we consider the edge ef , we may augment by using the augmenting structure
P , C1, C2 as described in the example in Section 4. In order to handle the
search for these augmenting structures, the definition of alternating structure
becomes more complex. Several new types of nodes are added as well as a more
complex tree structure. Also, as in the classical matching algorithm of Edmonds
[6], certain subgraphs (in this case squares) may be shrunk in the course of the
algorithm.

Acknowledgement. I would like to thank Gerard Cornuejols for some helpful
discussions of this work and for his major role in developing with me in my
doctoral thesis the techniques used in this paper.

References
1. Berge, C., Two theorems in graph theory. Proceedings of the National Academy of
Sciences (U.S.A.) 43 (1957) 842-844.
2. Cornuejols, G. and W.R. Pulleyblank. A matching problem with side conditions.
Disc. Math. 29 (1980) 135-159.
3. Cunningham, W.H. and J. Geelen. Personal communication (1997).
4. Cunningham, W.H. and Y. Wang. Restricted 2-factor polytopes. Working paper,
Dept. of Comb. and Opt., University of Waterloo (Feb. 1997).
5. Edmonds, J., Maximum matching and a polyhedron with 0,1 vertices. J. Res. Nat.
Bur. Standards Sect. B 69 (1965) 73-77.
6. Edmonds, J., Paths, trees, and flowers. Canad. J. Math. 17 (1965) 449-467.
7. Hartvigsen, D. Extensions of Matching Theory. Ph.D. Thesis, Carnegie-Mellon
University (1984).
8. Schrijver, A., Min-max results in combinatorial optimization. in Mathematical Pro-
gramming, the State of the Art: Bonn, 1982, Eds.: A. Bachem, M. Grotschel, and
B. Korte, Springer-Verlag, Berlin (1983) 439-500.
9. Tutte, W.T., The factorization of linear graphs. J. London Math. Soc. 22 (1947)
107-111.
10. Tutte, W.T., The factors of graphs. Canad. J. Math. 4 (1952) 314-328.
11. Tutte, W.T., A short proof of the factor theorem for finite graphs. Canad. J. Math.
6 (1954) 347-352.
12. Vornberger, O., Easy and hard cycle covers. Preprint, Universitat Paderborn
(1980).
The m-Cost ATSP

Christoph Helmberg

Konrad-Zuse-Zentrum Berlin
Takustraße 7, 14195 Berlin, Germany
[email protected], http://www.zib.de/helmberg

Abstract. Although the m-ATSP (or multi traveling salesman problem)
is well known for its importance in scheduling and vehicle routing, it has,
to the best of our knowledge, never been studied polyhedrally, i.e., it has
always been transformed to the standard ATSP. This transformation is
valid only if the cost of an arc from node i to node j is the same for all
machines. In many practical applications this is not the case: machines
produce with different speeds and require different (usually sequence
dependent) setup times. We present first results of a polyhedral analysis
of the m-ATSP in full generality. For this we exploit the tight relation
between the subproblem for one machine and the prize collecting travel-
ing salesman problem. We show that, for m ≥ 3 machines, all facets of
the one machine subproblem also define facets of the m-ATSP polytope.
In particular the inequalities corresponding to the subtour elimination
constraints in the one machine subproblems are facet defining for m-
ATSP for m ≥ 2 and can be separated in polynomial time. Furthermore,
they imply the subtour elimination constraints for the ATSP-problem
obtained via the standard transformation for identical machines. In ad-
dition, we identify a new class of facet defining inequalities of the one
machine subproblem, that are also facet defining for m-ATSP for m ≥ 2.
To illustrate the efficacy of the approach we present numerical results
for a scheduling problem with non-identical machines, arising in the pro-
duction of gift wrap at Herlitz PBS AG.

1 Introduction
For Herlitz PBS AG, Berlin, we are developing a software package for the fol-
lowing scheduling problem. Gift wrap has to be printed on two non-identical
printing machines. The gift wrap is printed in up to six colors on various kinds
of materials in various widths. The colors may differ considerably from one gift
wrap to the next. The machines differ in speed and capabilities; not all jobs can
be processed on both machines. Setup times for the machines are quite large in
comparison to printing time. They depend strongly on whether material, width,
or colors have to be changed, so in particular on the sequence of the jobs. The
task is to find an assignment of the jobs to the machines and an ordering on
these machines such that the last job is finished as soon as possible, i.e., min-
imize the makespan for a scheduling problem on two (non-identical) machines
with sequence dependent set-up times.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 242–258, 1999.
© Springer-Verlag Berlin Heidelberg 1999
An obvious mathematical model for this problem is the asymmetric multi
traveling salesman problem with separate arc costs for each salesman. Although
there is considerable literature on the TSP/ATSP [10,13,14,15,3,11,7] as well as
on the m-ATSP for vehicle routing (see [5] and references therein) it seems that
the m-ATSP problem in full generality has never been studied from a polyhedral
point of view. Existing work on the m-ATSP relies on the standard transforma-
tion to ATSP (see e.g. [16] or Section 6) which assumes that all salesmen need
the same time for the same route. The general case of non-identical salesmen
should also be of importance in vehicle routing where usually not all vehicles in
a car park have the same capabilities or fuel consumption.
In this paper we give a precise definition of what we call the m-Cost ATSP and
present a ‘canonical’ integer programming formulation. This formulation includes
an exponential class of inequalities, which we call the v0-cut inequalities. They
may be interpreted as a kind of conditional subtour elimination constraints and
are, in a certain sense, equivalent to the inequalities introduced in [1], Theorem
2.5, for the Prize Collecting TSP (PCTSP) (see also [8,2,9]). The PCTSP is
tightly related to the one machine subproblem of the m-Cost ATSP and we
will clarify this relation in Section 4. Another variant, the generalized traveling
salesman problem [6], also allows for skipping certain jobs but differs in that the
jobs are partitioned into sets and at least one job from each set has to be visited
by the tour.
The main results of the paper are: Facets of the one machine subproblem
define facets of the m-Cost ATSP polytope for m ≥ 3; the non-negativity con-
straints and the v0-cut inequalities are facet defining for the m-Cost ATSP poly-
tope for m ≥ 2, they can be separated in polynomial time, and they imply the
subtour elimination constraints if the standard transformation for identical ma-
chines from m-ATSP to ATSP is applied. In addition, we identify a new class of
so called nested conflict inequalities that are facet defining for the polytope of
the one machine subproblem and the m-Cost ATSP polytope for m ≥ 2.
For our real-world instances the linear relaxation based on the v0-cut inequalities
yields, on average, a relative gap of 0.2% with respect to the best
solution known. In comparison, the relaxation using subtour elimination con-
straints on the transformed problem exhibits an average relative gap of 0.5%, so
this approach reduces the gap by 60%.
The paper is organized as follows. Section 2 introduces the necessary nota-
tion and gives the problem definition. In Section 3 we present an integer pro-
gramming formulation of the m-Cost ATSP and determine the dimension of the
m-Cost ATSP polytope. Section 4 is devoted to the one machine subproblem,
its relation to the PCTSP, and the separation algorithm. In Section 5 the facet
defining inequalities of the subproblem are shown to be facet defining for the
m-Cost ATSP polytope. In Section 6 we prove that the v0-cut inequalities im-
ply the subtour elimination constraints for the transformed problem. Finally, in
Section 7 numerical results are presented for the scheduling problem at Herlitz
PBS AG.
2 Notation and Problem Formulation

Let D = (V, A) be a digraph on node set V and arc set A without multiple arcs
but possibly with loops. An arc is an ordered pair (v, w) of nodes v, w ∈ V ,
v is called tail, w head of (v, w). Most of the time we will simply write vw
instead of (v, w). For a subset S ⊆ V , δ − (S) = {vw ∈ A : v ∈ V \ S, w ∈ S} is
the set of arcs of A that have their head in S and their tail in V \ S. Similarly,
δ + (S) = {vw ∈ A : v ∈ S, w ∈ V \ S} is the set of arcs of A having tail in S and
head in V \ S. By definition, loops are not contained in any set δ − (S) or δ + (S).
For a node v ∈ V we will write δ − (v) instead of δ − ({v}). For an arc set  ⊂ A,
V (Â) = {v ∈ V : vw ∈ Â or wv ∈ Â} is the set of nodes that are tail or head of
an arc in Â. For a node set S ⊂ V , A(S) = {vw ∈ A : v ∈ S and w ∈ S} is the
set of arcs having both head and tail in S. For a vector $x \in \mathbb{Q}^{|A|}$ the sum of the
weights over an arc set  ⊂ A is denoted by $x(\hat{A}) = \sum_{a \in \hat{A}} x_a$.
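The cut notation above can be made concrete with a few set comprehensions. The following is only an illustrative sketch: the function names and the tiny example digraph are assumptions, not part of the paper.

```python
def delta_out(arcs, S):
    """delta^+(S): arcs with tail in S and head outside S (loops never qualify)."""
    S = set(S)
    return {(v, w) for (v, w) in arcs if v in S and w not in S}

def delta_in(arcs, S):
    """delta^-(S): arcs with head in S and tail outside S."""
    S = set(S)
    return {(v, w) for (v, w) in arcs if v not in S and w in S}

def inner_arcs(arcs, S):
    """A(S): arcs with both tail and head in S."""
    S = set(S)
    return {(v, w) for (v, w) in arcs if v in S and w in S}

def nodes_of(arc_subset):
    """V(A^): nodes that are tail or head of some arc in the subset."""
    return {u for arc in arc_subset for u in arc}

def weight(x, arc_subset):
    """x(A^): sum of the weights x_a over the arcs of the subset."""
    return sum(x.get(a, 0) for a in arc_subset)

# Hypothetical example: complete digraph on {0, 1, 2, 3} plus the loop (0, 0).
V = [0, 1, 2, 3]
A = {(v, w) for v in V for w in V if v != w} | {(0, 0)}
x = {(0, 1): 1, (1, 2): 1, (2, 0): 1}   # incidence vector of the dicycle 0 -> 1 -> 2 -> 0
S = {1, 2}
print(sorted(delta_out(A, S)), weight(x, delta_out(A, S)))
```

For the dicycle used here exactly one arc enters S and one arc leaves it, which is the flow-conservation pattern exploited later in the formulation.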
A dicycle is a set of k > 0 arcs C = {v1 v2 , v2 v3 , . . . , vk v1 } ⊆ A with distinct
vertices v1 , . . . , vk ∈ V . The number of arcs k is referred to as the length of the
dicycle. A tour is a dicycle of length |V |. A loop is a dicycle of length 1.
Now consider the problem of scheduling n jobs on m distinct machines. Let
J = {1, . . . , n} denote the set of jobs and M = {1, . . . , m} the set of machines.
In order to model all possible schedules we introduce for each machine k ∈ M
a digraph $D_k^0 = (V_k, A_k)$ with Vk = J ∪ {0} and arc set Ak = {(0, 0)k } ∪
{(i, j)k : i, j ∈ Vk , i ≠ j} (node 0 allows us to model the initial and final state of each
machine as well as idle costs). We will write ijk instead of (i, j)k . Usually we
will regard the nodes in Vk and Vi as the same objects. Sometimes, however, we
will have to distinguish between nodes in Vk and Vi ; we then do so by adding a
subscript, e.g., 0k for 0 ∈ Vk . For S ⊆ Vk the subscript in the symbols δk− (S) and
δk+ (S) will indicate on which arc set we are working. We will frequently need the
union of all arc sets, which we denote by $\mathcal{A} = \bigcup_{k=1}^{m} A_k$. With each arc ijk ∈ $\mathcal{A}$
we associate a cost or weight cijk .
A feasible solution AF = C1 ∪ . . . ∪ Cm is a set of m + n arcs that is the union
of m dicycles Ck ⊆ Ak for k ∈ M , such that 0 ∈ Vk (Ck ) for all k ∈ M and for
each v ∈ J there is a k ∈ M with v ∈ Vk (Ck ). The set of feasible solutions is

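$$\mathcal{F}_n^m = \Bigl\{ C_1 \cup \dots \cup C_m \;:\; \sum_{k=1}^{m} |C_k| = n + m,\ \ \forall k \in M:\ C_k \subseteq A_k \text{ is a dicycle with } 0 \in V_k(C_k),\ \ \forall v \in J:\ \exists k \in M:\ v \in V_k(C_k) \Bigr\}.$$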

One proves easily that each job is assigned uniquely.


Proposition 1. Let AF = C1 ∪ . . . ∪ Cm ∈ Fnm , then for v ∈ J there is a unique
k ∈ M with v ∈ V (Ck ).
An AF ∈ Fnm may be interpreted in the following way: if ijk ∈ AF and i, j ∈ J
then job j has to be processed directly after job i on machine k. If i = 0 and
j ∈ J, then j is the first job to be processed on machine k. If i ∈ J and j = 0
then i is the last job to be processed on machine k. If i = j = 0 then machine k
does not process any jobs.
The optimization problem we will deal with is to find
(m-Cost ATSP) min {c(AF ) : AF ∈ Fnm } .
It is not difficult to show that (m-Cost ATSP) is NP-complete.
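To make the interpretation of a feasible solution tangible, the sketch below builds the arc set from one job sequence per machine and solves a tiny instance by enumeration. It is a minimal illustration only: the helper names and the cost data in `example_costs` are hypothetical, and brute force is of course not the method studied in the paper.

```python
from itertools import permutations, product

def arcs_from_sequences(sequences):
    """sequences[k-1] lists the jobs of machine k in processing order.
    Returns the arc set as triples (i, j, k); an empty sequence gives the idle loop (0, 0, k)."""
    arcs = set()
    for k, seq in enumerate(sequences, start=1):
        path = [0] + list(seq) + [0]
        arcs.update((i, j, k) for i, j in zip(path, path[1:]))
    return arcs

def all_feasible(n, m):
    """Enumerate the feasible set: assign every job to a machine, then order the jobs per machine."""
    jobs = range(1, n + 1)
    for assignment in product(range(m), repeat=n):
        groups = [[j for j in jobs if assignment[j - 1] == k] for k in range(m)]
        for orders in product(*(permutations(g) for g in groups)):
            yield arcs_from_sequences(orders)

def cost(c, arcs):
    """c(A_F): total cost of the selected arcs."""
    return sum(c[a] for a in arcs)

def solve_brute_force(n, m, c):
    """Return a cheapest feasible arc set; only sensible for very small n and m."""
    return min(all_feasible(n, m), key=lambda arcs: cost(c, arcs))

def example_costs(n, m):
    """Hypothetical cost data, machine dependent, purely for illustration."""
    c = {(0, 0, k): 10 for k in range(1, m + 1)}
    for k in range(1, m + 1):
        for i in range(n + 1):
            for j in range(n + 1):
                if i != j:
                    c[(i, j, k)] = 1 + (2 * i + 3 * j + k) % 7
    return c

c = example_costs(3, 2)
best = solve_brute_force(3, 2, c)
print(sorted(best), cost(c, best))
```

Every arc set produced this way has exactly n + m arcs, matching the definition of a feasible solution.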

3 The m-Cost ATSP Polytope


We will approach the problem from a polyhedral point of view. To this end
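define, for $A \subseteq \mathcal{A}$, the incidence vector $x^A \in \{0,1\}^{|\mathcal{A}|}$ by

$$x^A_{ijk} = \begin{cases} 1, & \text{if } ijk \in A, \\ 0, & \text{otherwise.} \end{cases}$$

The m-Cost ATSP polytope is defined as the convex hull of the incidence vectors
of all feasible solutions, i.e.,

$$P_n^m = \operatorname{conv}\{ x^A : A \in \mathcal{F}_n^m \}.$$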
In order to study this polytope consider the following set of constraints
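$$\begin{aligned}
x_{00k} + x(\delta_k^+(0)) &= 1 && k \in M && (1)\\
\sum_{k \in M} x(\delta_k^+(j)) &= 1 && j \in J && (2)\\
x(\delta_k^-(j)) &= x(\delta_k^+(j)) && j \in J,\ k \in M && (3)\\
x(\delta_k^+(S)) &\ge x(\delta_k^+(v)) && k \in M,\ S \subseteq J,\ |S| \ge 2,\ v \in S && (4)\\
x_{ijk} &\in \{0, 1\} && ijk \in \mathcal{A}. && (5)
\end{aligned}$$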
The constraints of (1) ensure that for each node 0k , k ∈ M , one arc is selected.
In (2) we require that for each job j ∈ J one outgoing arc of j is selected among
all such arcs on all machines. Constraints (3) are flow conservation constraints.
Constraints (4) are a kind of conditional subtour elimination constraints that
ensure that any flow through vk in graph k reaches node 0k in this graph (note
that 0 ∉ J), i.e., the flow through vk has to cross the cut from S to Vk \ S.
Therefore, we call constraints (4) v0-cut inequalities.
Lemma 1. An incidence vector $x^A$ of $A \subseteq \mathcal{A}$ is a feasible solution of (1)–(5) if
and only if $A \in \mathcal{F}_n^m$.
Proof. The if part is easy to check, so consider the only if part. Let x be a
solution of (1)–(5). Then, by the flow conservation constraints (3), x is the
incidence vector of a union of r dicycles C1 , . . . , Cr and each Ci ⊆ Aki for
some ki ∈ M . Because of (1) there is exactly one dicycle Ci ⊆ Ak for each
k ∈ M with 0 ∈ Vk (Ci ). Likewise, by (2), for each v ∈ J there is exactly one
dicycle Ci with v ∈ Vki (Ci ). Furthermore, for each Ci we must have 0 ∈ Vki (Ci ),
because otherwise S = Vki (Ci ) would lead to a violated v0-cut inequality (4) for
k = ki and arbitrary v ∈ S. Thus, r = m and the cycles C1 , . . . , Cm fulfill the
requirements for $\mathcal{F}_n^m$. ⊓⊔
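Lemma 1 can be checked mechanically on small examples. The sketch below tests constraints (1)–(4) for the 0/1 incidence vector of a given arc set, enumerating all subsets S for (4); this exponential enumeration is an assumption made for brevity and is only meant for tiny n. The function name and the encoding of arcs as triples (i, j, k) are likewise illustrative.

```python
from itertools import combinations

def satisfies_formulation(A, n, m):
    """Return True if the incidence vector of arc set A satisfies (1)-(4).
    A contains triples (i, j, k) with i, j in {0, ..., n} and machine k in {1, ..., m}."""
    J = range(1, n + 1)
    M = range(1, m + 1)

    def out(S, k):  # x(delta_k^+(S))
        return sum(1 for (i, j, kk) in A if kk == k and i in S and j not in S)

    def inn(S, k):  # x(delta_k^-(S))
        return sum(1 for (i, j, kk) in A if kk == k and i not in S and j in S)

    # (1): node 0 of every machine selects the idle loop or exactly one outgoing arc.
    if any(((0, 0, k) in A) + out({0}, k) != 1 for k in M):
        return False
    # (2): every job has exactly one outgoing arc over all machines.
    if any(sum(out({j}, k) for k in M) != 1 for j in J):
        return False
    # (3): flow conservation for every job on every machine.
    if any(inn({j}, k) != out({j}, k) for k in M for j in J):
        return False
    # (4): v0-cut inequalities, checked by brute force over all S with |S| >= 2.
    for k in M:
        for size in range(2, n + 1):
            for S in map(set, combinations(J, size)):
                cut = out(S, k)
                if any(cut < out({v}, k) for v in S):
                    return False
    return True

feasible = {(0, 1, 1), (1, 2, 1), (2, 0, 1), (0, 3, 2), (3, 0, 2)}
with_subtour = {(0, 0, 1), (0, 3, 2), (3, 0, 2), (1, 2, 1), (2, 1, 1)}
print(satisfies_formulation(feasible, 3, 2), satisfies_formulation(with_subtour, 3, 2))
```

The second arc set satisfies (1)–(3) but contains the dicycle 1, 2 on machine 1 that avoids node 0, so the v0-cut inequality for S = {1, 2} rejects it.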

So Pnm is the convex hull of the feasible solutions of (1)–(5). In order to deter-
mine the dimension of Pnm we investigate equations (1)–(3) which we collect, for
convenience, in a matrix B and a right hand side vector b,

Bx = b.

Lemma 2. The coefficient matrix $B \in \{-1, 0, 1\}^{(m+n+mn) \times |\mathcal{A}|}$ corresponding
to (1)–(3) has full row rank m + n + mn.

Proof. Consider the sets of column indices

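$$\begin{aligned}
F_1 &= \{0i1,\ i01 : i \in J\} \cup \{121\},\\
F_k &= \{00k\} \cup \{0ik : i \in J\} \quad \text{for } k \in M \setminus \{1\},\\
F &= \bigcup_{k \in M} F_k .
\end{aligned}$$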

We claim that the columns of B corresponding to F are linearly independent.


The linear independence of Fh for h ∈ M \ {1} with respect to F \ Fh
follows from the fact that the variables are the only arcs in F that are in the
support of the equation with k = h in (1) and of the n equations with k = h
in (3). A proper arrangement of the submatrix of B consisting of these rows
and the columns corresponding to Fh yields an upper triangular matrix with all
diagonal entries equal to one. Thus the columns corresponding to Fh are linearly
independent.
In the case of F1 variables 0i1 and i01 are both in the support of (3) for
k = 1 and j = i but only i01 is in (2) for j = i, so they are linearly independent
with respect to these rows and the rows do not contain further variables in
F \ {121}. Now observe that column 001 is the linear combination (of columns)
021 + 101 − 121. Variable 001 is only in the support of the equation with k = 1
in (1) and its column is therefore not in the span of the columns of F \ {121}.
We conclude that the columns of F are linearly independent. ⊓⊔

To complement this upper bound on the dimension of Pnm with a lower bound
we will need a standard construction of linearly independent tours for the ATSP.

Theorem 1. (see [10]) Let DV be the complete digraph on nodes V = {0, . . . , n}


with n ≥ 2. There exist n(n − 1) tours in DV that are linearly independent with
respect to the arcs not incident to 0.

Proof. Follows directly from [10], proof 1 of Theorem 7 and Theorem 20. ⊓⊔

Now we are ready to determine the dimension of Pnm .

Theorem 2. $\dim(P_n^m) = mn^2 - n$ for n ≥ 3 and m ≥ 2.

Proof. It follows from Lemma 2 that $\dim(P_n^m) \le mn^2 - n$. To prove equality let
$a^T x = a_0$ be a valid equation for $P_n^m$. We will show that it is linearly dependent
with respect to Bx = b. By the proof of Lemma 2 the columns of B corresponding
to the arcs in $F = \bigcup_{k \in M} F_k$ are linearly independent, so we can compute
$\lambda \in \mathbb{Q}^{n+m+nm}$ such that $a_{ijk} = (\lambda^T B)_{ijk}$ for ijk ∈ F. The equation
$d^T x = (a^T - \lambda^T B)x = a_0 - \lambda^T b$ is again valid for $P_n^m$ and $d_{ijk} = 0$ for ijk ∈ F. We will show
that all other coefficients $d_{ijk}$ have to be zero as well.
First observe that on the support $A_1 \setminus \{001\}$ d has exactly n(n − 1) − 1
‘unknowns’. Employing the n(n − 1) linearly independent tours of Theorem 1,
construct linearly independent feasible solutions $A^h \in \mathcal{F}_n^m$ for h = 1, . . . , n(n − 1)
that use these tours on machine 1 such that the $x^{A^h}$ are linearly independent on
the index set {ij1 : i ≠ j, i, j ∈ J}. There exists an ĥ, 1 ≤ ĥ ≤ n(n − 1), such that
the n(n − 1) − 1 vectors $x^{A^h} - x^{A^{\hat h}}$ (h ≠ ĥ) are linearly independent on the index
set {ij1 : i ≠ j, i, j ∈ J} \ {121}. These difference vectors have no support outside
$A_1 \setminus \{001\}$ and satisfy $d^T (x^{A^h} - x^{A^{\hat h}}) = 0$. Collect the n(n − 1) − 1 difference
vectors as columns in a matrix X. Since $d^T X = 0$ and $d_{ij1} = 0$ for ij1 ∈ F1 we
conclude that $d_{ij1} = 0$ for all $ij1 \in A_1 \setminus \{001\}$ by the linear independence of the
columns of X.
We proceed to show that $d_{ij2} = 0$ for all $ij2 \in A_2$. The following two arc sets
$A^1, A^2 \in \mathcal{F}_n^m$ correspond to a tour on machine 1 and to the same tour shortened
by ‘moving’ node n to machine 2,

$$\begin{aligned}
A^1 &= \{i(i+1)1 : i = 0, \dots, n-1\} \cup \{n01\} \cup \{00k : 2 \le k \in M\},\\
A^2 &= \{i(i+1)1 : i = 0, \dots, n-2\} \cup \{(n-1)01\} \cup \{0n2,\ n02\} \cup \{00k : 2 < k \in M\}.
\end{aligned}$$

Since $0 = d^T (x^{A^1} - x^{A^2}) = -d_{n02}$ (in the support of $x^{A^1} - x^{A^2}$ only $d_{n02}$ has
not yet been shown to be zero) we obtain $d_{n02} = 0$. Moving any other vertex i ∈ J
from the tour on machine 1 to machine 2 yields $d_{i02} = 0$. Now it is easy to obtain
$d_{ij2} = 0$ for i ≠ j and i, j ∈ J by the dicycles {0i2, ij2, j02} (here we use n ≥ 3
as otherwise $d_{001}$ would have to enter). So $d_{ij2} = 0$ for $ij2 \in A_2$. The same steps
for k > 2 lead to $d_{ijk} = 0$ for $ijk \in A_k$ for 2 ≤ k ∈ M.
It remains to show that $d_{001} = 0$. This is easily achieved by taking the
difference vector between $x^{A^1}$ and an incidence vector of a feasible arc set
$A^3 \in \mathcal{F}_n^m$ with $001 \in A^3$ (e.g., let $A^3$ contain a tour on machine 2). ⊓⊔

Remark 1. It can be shown that for n = 2 and m ≥ 2 the dimension is $mn^2 - n - 1$,
the additional affine constraint being

$$\sum_{ijk \in \{01k,\,10k,\,12k,\,21k \,:\, k \in M\}} x_{ijk} = 2.$$
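For very small instances the dimension statements can be cross-checked numerically: enumerate all feasible solutions, subtract one incidence vector from the others, and compute the rank of the differences exactly. The sketch below does this with rational arithmetic; the function names are illustrative assumptions and the enumeration is only practical for tiny n and m.

```python
from fractions import Fraction
from itertools import permutations, product

def feasible_solutions(n, m):
    """Yield every feasible solution as a frozenset of arcs (i, j, k)."""
    jobs = range(1, n + 1)
    for assignment in product(range(1, m + 1), repeat=n):
        groups = [[j for j in jobs if assignment[j - 1] == k] for k in range(1, m + 1)]
        for orders in product(*(permutations(g) for g in groups)):
            arcs = set()
            for k, order in enumerate(orders, start=1):
                path = (0,) + order + (0,)   # an empty order yields the idle loop (0, 0, k)
                arcs.update((i, j, k) for i, j in zip(path, path[1:]))
            yield frozenset(arcs)

def rank(rows):
    """Rank of a rational matrix by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            if factor:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def dimension(n, m):
    """Affine dimension of the convex hull of all incidence vectors."""
    sols = list(feasible_solutions(n, m))
    arcs = sorted({a for s in sols for a in s})
    vecs = [[1 if a in s else 0 for a in arcs] for s in sols]
    diffs = [[a - b for a, b in zip(v, vecs[0])] for v in vecs[1:]]
    return rank(diffs)

for n, m in [(2, 2), (3, 2)]:
    print(n, m, dimension(n, m), m * n * n - n)
```

The printed pairs let one compare the computed dimension with $mn^2 - n$; for n = 2 the value is one smaller, in line with Remark 1.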

Before we turn to the facets of Pnm we study a tightly related problem.



4 The One-Machine Subproblem

Consider the subproblem on one machine where this machine is not required
to process all jobs, but may process any subset of jobs, or even choose the idle
loop. We model this problem on a digraph Ds = (Vs , As ) with Vs = {0} ∪ J and
As = {(0, 0)} ∪ {(i, j) : i, j ∈ Vs , i ≠ j}. The feasible set is

$$\mathcal{F}_n^s = \{ C : C \subseteq A_s \text{ is a dicycle with } 0 \in V_s(C) \}.$$

The corresponding polytope defined by the incidence vectors of the arc sets in
the feasible set is

$$P_n^s = \operatorname{conv}\{ x^A : A \in \mathcal{F}_n^s \}.$$

We can model the incidence vectors by the following constraints,

$$\begin{aligned}
x_{00} + x(\delta^+(0)) &= 1 && && (6)\\
x(\delta^-(j)) &= x(\delta^+(j)) && j \in J && (7)\\
x(\delta^+(S)) &\ge x(\delta^+(v)) && S \subseteq J,\ |S| \ge 2,\ v \in S && (8)\\
x_{ij} &\in \{0, 1\} && \forall\, ij \in A_s. && (9)
\end{aligned}$$

Lemma 3. An incidence vector xA of A ⊂ As is a feasible solution of (6)–(9)


if and only if A ∈ Fns .

Proof. The if part is easy to check, so consider the only if part. Let x be a
solution of (6)–(9). Then, by the flow conservation constraints (7), x is the
incidence vector of a union of r dicycles C1 , . . . , Cr . Because of (6) exactly one
dicycle of these, w.l.o.g. C1 , satisfies 0 ∈ V (C1 ). Furthermore, for all Ci we must
have 0 ∈ V (Ci ), because otherwise S = V (Ci ) would lead to a violated v0-cut
inequality (8) with arbitrary v ∈ S. Thus r = 1 and the cycle C1 fulfills the
requirements for $\mathcal{F}_n^s$. ⊓⊔

Pns is therefore the convex hull of the feasible solutions of (6)–(9). We determine
the dimension of Pns by the same steps as in Section 3. We collect equations (6)
and (7) in a matrix Bs and a right hand side vector bs , Bs x = bs .
Lemma 4. The coefficient matrix $B_s \in \{-1, 0, 1\}^{(n+1) \times |A_s|}$ corresponding to
(6) and (7) has full row rank n + 1.

Proof. The set of column indices

Fs = {00} ∪ {0i : i ∈ J}

corresponds to the sets Fk for k > 1 of the proof of Lemma 2. The proof can be
completed analogously to the proof there. ⊓⊔

Theorem 3. $\dim(P_n^s) = n^2$.

Proof. $\dim(P_n^s) \le n^2$ follows from Lemma 4. We construct $n^2 + 1$ feasible
dicycles,

C00 = {00}
Ci0 = {0i, i0}            i ∈ J
Cij = {0i, ij, j0}        i, j ∈ J, i ≠ j.

To show the linear independence of the incidence vectors $x^{C_{ij}}$ arrange them
as rows in a matrix B. Let F be the set of indices indicated in bold above,
F = {00} ∪ {i0 : i ∈ J} ∪ {ij : i, j ∈ J, i ≠ j}, and consider the submatrix
$B_s \in \{0, 1\}^{|F| \times |F|}$ consisting of the columns corresponding to F . Reorder rows and
columns of Bs simultaneously arranging rows and columns corresponding to
{ij : i, j ∈ J, i ≠ j} in front, followed by rows and columns {00} ∪ {i0 : i ∈ J}.
The reordered matrix is upper triangular with all diagonal elements equal to
one. Thus the vectors are linearly independent. ⊓⊔
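The triangular structure used in this proof is easy to verify by machine. The sketch below builds the $n^2 + 1$ dicycles together with their identifying arcs, orders rows and columns as described (arcs ij first, then 00 and the arcs i0), and checks that the resulting square matrix is upper triangular with unit diagonal. The function names are illustrative assumptions.

```python
def proof_dicycles(n):
    """Pairs (identifying arc, dicycle) in the order used in the proof of Theorem 3."""
    J = range(1, n + 1)
    rows = [((i, j), {(0, i), (i, j), (j, 0)}) for i in J for j in J if i != j]
    rows.append(((0, 0), {(0, 0)}))
    rows += [((i, 0), {(0, i), (i, 0)}) for i in J]
    return rows

def is_unit_upper_triangular(n):
    """Check the submatrix on the identifying arcs for the ordering above."""
    rows = proof_dicycles(n)
    cols = [arc for arc, _ in rows]
    matrix = [[1 if arc in cycle else 0 for arc in cols] for _, cycle in rows]
    size = len(matrix)
    unit_diagonal = all(matrix[r][r] == 1 for r in range(size))
    zero_below = all(matrix[r][c] == 0 for r in range(size) for c in range(r))
    return unit_diagonal and zero_below

for n in range(1, 6):
    print(n, len(proof_dicycles(n)), is_unit_upper_triangular(n))
```

A unit upper triangular submatrix has determinant one, so the $n^2 + 1$ incidence vectors are linearly independent, which is exactly what the dimension argument needs.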

Theorem 4. For ij ∈ As , xij ≥ 0 defines a facet of Pns if n ≥ 3.


Proof. For 00 or ij ∈ As with i, j ∈ J, i ≠ j we have to drop dicycle Cij from
the dicycles defined in the proof of Theorem 3 to obtain $n^2$ linearly independent
dicycles satisfying xij = 0. (Indeed, these are facets for n ≥ 1.)
For 0v ∈ As with v ∈ J we assume w.l.o.g. that v = 1. Then the $n^2$ feasible
dicycles

C00 = {00}
C0i = {0i, i0}            i ∈ J \ {1}
Cij = {0i, ij, j0}        i ∈ J \ {1}, j ∈ J, i ≠ j
C1j = {0i, i1, 1j, j0}    j ∈ J \ {1} with some i ∈ J \ {1, j}

are again easily seen to satisfy x01 = 0 and to be linearly independent (proceed
as in the proof of Theorem 3). A similar construction shows that xv0 ≥ 0 is facet
defining for v ∈ J. ⊓⊔
The following corollary will be especially useful in relating Pns with Pnm .
Corollary 1. Any facet defining inequality $a^T x \le a_0$ of $P_n^s$ not equivalent to
$x_{00} \ge 0$ satisfies $a^T x^{C_{00}} = a_0$ where C00 = {00}.

Proof. Let $a^T x \le a_0$ be facet defining with $C_{00} \notin F_a = \{A \in \mathcal{F}_n^s : a^T x^A = a_0\}$.
Since A = C00 is the only set $A \in \mathcal{F}_n^s$ satisfying $x^A_{00} = 1$ it follows that
$F_a \subset \{A \in \mathcal{F}_n^s : x^A_{00} = 0\}$, so $a^T x \le a_0$ is equivalent to the facet $x_{00} \ge 0$. ⊓⊔

Theorem 5. For S ⊆ J, |S| ≥ 2, v ∈ S inequalities x(δ + (S)) ≥ x(δ + (v)) are


facet defining for Pns .
Proof. W.l.o.g. let S = {1, . . . , h} with 1 < h ≤ n and v = 1. For convenience,
let $a^T x \ge 0$ denote the inequality x(δ + (S)) ≥ x(δ + (1)), and S1 = S \ {1}. We list
$n^2$ feasible dicycles whose incidence vectors are linearly independent and satisfy
$a^T x^{C_i} = 0$:

1. C00 = {00}; the ‘idle’ loop.


2. Ci0 = {0i, i0} for i ∈ J \ S1 ; n − h + 1 dicycles.
3. Cij = {0i, ij, j0} for i, j ∈ J \ S1 , i 6= j; (n − h + 1)(n − h) dicycles.
4. Ci1 = {0i, i1, 10} for i ∈ S1 ; h − 1 dicycles.
5. Cij = {0i, ij, j1, 10} for i ∈ J \ {1}, j ∈ S \ {1, i}; (n − 2)(h − 1) dicycles.
6. Cij = {01, 1i, ij, j0} for i ∈ S1 , j ∈ J \ S; (n − h)(h − 1) dicycles.
7. Ci0 = {01, 1i, i0} for i ∈ S1 ; h − 1 dicycles.
8. C1i = {01, 1i, i(i + 1), (i + 1)0} for i ∈ {2, . . . , h − 1}; h − 2 dicycles.
To see the linear independence, collect the incidence vectors of the dicycles as
rows in a matrix. Arranging the rows corresponding to 1–6 in reverse order
and the columns of the variables indicated in bold ‘in front on the diagonal’
yields a submatrix that is upper triangular with each diagonal element equal to 1.
On the other hand the submatrix corresponding to the rows of 7 and 8 is
regular on the support indicated in bold (on this support the submatrix is the
edge-node incidence matrix of an undirected path). The only other type of rows
having these variables in their support is in 6, but the rows of 6 are already
determined uniquely by their support in bold, which is not in the support of any
other row. So 6–8 are linearly independent on their support in bold and none
of the variables in this support appear in any other row; therefore the incidence
vectors of the cycles 1–8 are linearly independent. ⊓⊔
It is to be expected that the following theorem has already been proved for the
PCTSP, but we could not find an explicit reference so far.
Theorem 6. For $x \in \mathbb{Q}^{n(n+1)+1}$, x ≥ 0 satisfying (7), inequalities (8) can be
separated in polynomial time.
Proof. For each node v ∈ J construct a digraph $D_v = (V_s, A_v)$ where
$A_v = \{ij : i \in J,\ j \in V_s \setminus \{v\}\}$ and define capacities $c^v_{ij} = x_{ij}$ for $ij \in A_v$. Solve, in
polynomial time, a v0-max flow problem for these capacities to obtain a min-cut
set $\delta^+(S_v)$ with $v \in S_v \subset J$. If $x(\delta^+(S_v)) < x(\delta^+(v))$ then a
violated inequality has been found, otherwise (8) is satisfied for all S ⊂ J with
v ∈ S. ⊓⊔
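The separation argument translates directly into a small routine: for each job v, drop the arcs leaving 0 and the arcs entering v, compute a maximum v-0 flow, and compare the min-cut value with x(δ+(v)). The sketch below uses a plain Edmonds-Karp max-flow written from scratch; this choice, the tolerance `eps`, and all function names are assumptions for illustration and are not taken from the paper's implementation.

```python
from collections import deque

def max_flow_min_cut(cap, s, t, eps=1e-9):
    """Edmonds-Karp on a dict {(u, v): capacity}; returns (flow value, source side of a min cut)."""
    res = {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)
    adj = {}
    for (u, v) in res:
        adj.setdefault(u, []).append(v)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and res[(u, v)] > eps:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)      # nodes still reachable from s form the cut side S_v
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[a] for a in path)
        for (u, v) in path:
            res[(u, v)] -= push
            res[(v, u)] += push
        flow += push

def separate_v0_cuts(x, n, eps=1e-9):
    """Return pairs (v, S) with x(delta^+(S)) < x(delta^+(v)); x maps arcs (i, j) to values >= 0."""
    violated = []
    for v in range(1, n + 1):
        cap = {(i, j): val for (i, j), val in x.items()
               if i != 0 and j != v and i != j and val > 0}
        out_v = sum(val for (i, j), val in x.items() if i == v and j != v)
        cut_value, S = max_flow_min_cut(cap, v, 0, eps)
        if cut_value < out_v - eps:
            violated.append((v, S))
    return violated

# Hypothetical point: dicycle 0 -> 3 -> 0 plus the subtour 1 -> 2 -> 1 that avoids node 0.
x_bad = {(0, 3): 1, (3, 0): 1, (1, 2): 1, (2, 1): 1}
print(separate_v0_cuts(x_bad, 3))
```

A returned set S always contains v and never contains 0; if it were just {v} the cut value would equal x(δ+(v)), so every reported violation automatically has |S| ≥ 2.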
In order to show the relation between Pns and the polytope P0 for the PCTSP
as defined in [1,2] we introduce loop variables xjj (called yj in [2]) and add to
(6)–(9) the constraints

xjj + x(δ − (j)) = 1 j ∈ J. (10)

The polyhedron Pns (n) corresponding to the convex hull of the points satisfying
constraints (6)–(9) and (10) is not exactly P0 of [2], but the only differences are
that now the “long” cycle is required to pass through node 0 and that this long
cycle may also be of length 1 as opposed to at least two. This close relationship
allows us to copy the lifting Theorem 3.1 of [2] more or less verbatim. Denote by
Pns (k) the polytope obtained from the ATSP polytope P (with respect to the
complete digraph on nodes J ∪ {0}) by introducing the first k loop variables xjj
for j = 0, . . . , k.

Theorem 7. Let $a^T x \le a_0$ be any facet defining inequality for the ATSP
polytope P with respect to the complete digraph on Vs . For k = 0, . . . , n define
$b_{j_k} = a_0 - z(P_n^s(k))$, where

$$z(P_n^s(k)) = \max \Bigl\{ a^T x + \sum_{i=0}^{k-1} b_{j_i} x_{j_i j_i} \;:\; x \in P_n^s(k),\ x_{j_k j_k} = 1 \Bigr\}.$$

Then the inequality $a^T x + \sum_{i=0}^{k-1} b_{j_i} x_{j_i j_i} \le a_0$ is valid and facet defining for
$P_n^s(k)$.
Proof. As in [2], but with P0 (k) replaced with $P_n^s(k)$. ⊓⊔
After having determined the lifting coefficients we eliminate variables xjj for
j ∈ J by using (10) in order to obtain the lifted inequality for Pns .
For example the v0-cut inequalities can be seen to be liftings of the subtour
elimination constraints for 0 ∉ S ⊂ Vs , |S| ≥ 2, with respect to the sequence
j1 , . . . , j|S| = v ∈ S and afterwards for j ∈ Vs \ S. Here, we need the direct proof
of Theorem 5 in order to arrive at Theorem 11.
It is not yet clear for which other classes of facet defining inequalities of the
ATSP polytope it is possible to determine the lifting coefficients explicitly as in
[2], because the special role of node 0 does not allow these results to be used without
some careful checking.
For the following class of facet defining inequalities we do not know whether
there exists a corresponding class in the ATSP or PCTSP literature.
Theorem 8. Let S0 ⊂ S1 ⊂ . . . ⊂ Sk = J be a nested sequence of proper subsets
with k ≥ 2 and let t0 ∈ S0 , ti ∈ Si \ Si−1 for i = 1, . . . , k. For

$$A_c = \bigcup_{i=1,\dots,k} \bigl[ \delta^+(S_{i-1}) \setminus \{ v t_i : v \in S_{i-1} \} \bigr]$$

the nested conflict inequality

$$\sum_{i=1}^{k} x_{t_i t_0} - \sum_{ij \in A_c} x_{ij} \le 0 \qquad (11)$$

is facet defining for $P_n^s$ if n ≥ 3.


Proof. Validity: Any dicycle through 0 contains at most one of the arcs ti t0 with
1 ≤ i ≤ k. If it does so for some i then it must include an arc from the cut
set δ + (Si−1 ) \ {vti : v ∈ Si−1 } in order to return from the set Si−1 to 0 without
visiting ti once more.
Facet defining: We exhibit n2 linearly independent dicycles through 0 that
satisfy (11) with equality. In order to simplify notation we will only give the
sequence of vertices of each dicycle, consecutive multiple appearances of the same
vertex should be interpreted as a single appearance, a sequence tj , tj+1 , . . . , tk
is regarded as empty if j > k. Furthermore, we will use ∆i = Si \ Si−1 \ {ti } for
1 ≤ i ≤ k and ∆0 = S0 \ {t0 } (these sets may be empty).

1. The ‘idle’ loop: 0, 0


2. v ∈ ∆i , w ∈ ∆j , 0 ≤ i < j ≤ k: 0, tj , t0 , t1 , . . . , ti , v, w, tj+1 , tj+2 , . . . , tk , 0
3. v ∈ ∆i , w ∈ ∆j , k ≥ i ≥ j ≥ 0, v ≠ w: 0, v, w, tj+1 , . . . , tk , 0
4. v ∈ ∆i , w = tj , 0 ≤ i < j − 1 ≤ k − 1: 0, tj−1 , t0 , t1 , . . . , ti , v, tj , tj+1 , . . . , tk , 0
5. v = ti , w ∈ ∆j , 0 ≤ i < j ≤ k: 0, tj , t0 , t1 , . . . , ti , w, tj+1 , tj+2 , . . . , tk , 0
6. v = t1 , w ∈ ∆0 : 0, tk , t0 , t1 , w, 0
7. v ∈ ∆i , w = 0, 0 ≤ i ≤ k − 1: 0, tk , t0 , t1 , . . . , ti , v, 0
8. v = ti , w ∈ ∆j , k ≥ i ≥ j ≥ 0, i > 1: 0, ti , w, t1 , t0 , 0
9. v = ti , w ∈ ∆i , 0 ≤ i ≤ 1: 0, ti , w, ti+1 , ti+2 , . . . , tk , 0
10. v ∈ ∆i , w = tj , k ≥ i ≥ 0, i + 1 ≥ j, k ≥ j ≥ 0: 0, v, tj , tj+1 , . . . , tk , 0
11. v ∈ ∆k , w = 0: 0, v, 0
12. v = ti , w = 0, 1 ≤ i ≤ k − 1: 0, ti+1 , t0 , t1 , . . . , ti , 0
13. v = ti , w = tj , 0 ≤ i < j − 1 ≤ k − 1: 0, ti+1 , t0 , t1 , . . . , ti , tj , tj+1 , . . . , tk , 0
14. v = ti , w = ti+1 , 0 ≤ i ≤ k − 1: 0, ti , ti+1 , . . . , tk , 0
15. v = ti , w = tj , k ≥ i > j ≥ 0: 0, ti , tj , t0 , 0
16. v = tk , w = 0: 0, tk , 0
For each edge vw ∈ As \ {01, . . . , 0n, t0 0} (in total $n^2$ edges) the list provides an
associated dicycle via v and w as specified above. Observe that an identifying
edge in this list does not appear in dicycles listed after the specification of its
associated dicycle (an appropriate internal order can be defined for 14 and 15).
Thus, by arranging the incidence vectors of the dicycles as rows and the edges as
columns according to this ordering the resulting matrix is upper triangular with
ones on the main diagonal and therefore the vectors are linearly independent. ⊓⊔
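The validity part of the proof can be sanity-checked by enumerating all dicycles through 0 for a small n and evaluating the left-hand side of (11). The sketch below does this for two hand-picked nested sequences; the helper names and the chosen sets are assumptions for illustration, and the enumeration says nothing about the facet property beyond counting tight dicycles.

```python
from itertools import combinations, permutations

def dicycles_through_0(n):
    """All dicycles through node 0 on {0, ..., n}, as vertex sequences starting at 0."""
    yield (0,)                                    # the idle loop 00
    for size in range(1, n + 1):
        for subset in combinations(range(1, n + 1), size):
            for order in permutations(subset):
                yield (0,) + order

def arcs_of(seq):
    """Arc set of the dicycle visiting seq cyclically; (0,) gives the loop (0, 0)."""
    return set(zip(seq, seq[1:] + seq[:1]))

def conflict_arcs(S_chain, t, n):
    """A_c for nested sets S_0, ..., S_k = J and nodes t_0, ..., t_k."""
    V = set(range(n + 1))
    Ac = set()
    for i in range(1, len(S_chain)):
        S_prev = S_chain[i - 1]
        Ac |= {(v, w) for v in S_prev for w in V - S_prev if w != t[i]}
    return Ac

def lhs(seq, t, Ac):
    """Left-hand side of (11) for the incidence vector of the dicycle seq."""
    arcs = arcs_of(seq)
    return sum((ti, t[0]) in arcs for ti in t[1:]) - len(arcs & Ac)

# Hypothetical instance with n = 3, k = 2: S_0 = {1}, S_1 = {1, 2}, S_2 = J.
chain3, t3 = [{1}, {1, 2}, {1, 2, 3}], [1, 2, 3]
Ac3 = conflict_arcs(chain3, t3, 3)
values3 = [lhs(seq, t3, Ac3) for seq in dicycles_through_0(3)]
print(sorted(Ac3), max(values3), values3.count(0))
```

On this instance no dicycle exceeds zero, and the number of dicycles attaining zero is at least the $n^2$ tight points a facet requires.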

5 Facets of the m-Cost ATSP Polytope


Theorem 9. For ijk ∈ A, xijk ≥ 0 defines a facet of Pnm if n ≥ 3 and m ≥ 2.

Proof. Without loss of generality let k = 2.


For 002 or ij2 ∈ $\mathcal{A}$ with i, j ∈ J we first construct n(n − 1) tours on machine 1
as in the proof of Theorem 2. Then we construct $n^2$ solutions corresponding to
the dicycles given in the proof of Theorem 4, where these dicycles are used
on machine 2, whereas the remaining nodes of J are covered by dicycles on
machine 1. Finally we construct a tour on machine 2 without using the current
arc in question, which is the first solution containing 001 in its support. If m > 2
further linearly independent solutions can be constructed by using the dicycles of
the proof of Theorem 3 on these machines and by completing them on machine 1.
For indices 0v2 we show this, w.l.o.g., for v = 1. The proof is analogous; it
uses the corresponding dicycles of Theorem 4. The final tour for exploiting 001
has the form {0n2} ∪ {i(i − 1)2 : i ∈ J}. Since for n = 3 this tour does not have
common support with other tours of Theorem 4 with respect to their support
indicated in bold, it is linearly dependent only with respect to dicycles that do
not have 001 in their support. ⊓⊔

Theorem 10. For m ≥ 3 and n ≥ 3 all facet defining inequalities âT x ≤ â0
of Pns give rise to facet defining inequalities aT x ≤ a0 for Pnm by identifying As
with Ah for some h ∈ M , i.e., by setting aijh = âij for ij ∈ As , aijk = 0 for
ijk ∈ A \ Ah , and a0 = â0 .

Proof. W.l.o.g. let h = m. It follows from Theorem 9 that $x_{00m} \ge 0$ is facet
defining. So let $\hat{a}^T y \le \hat{a}_0$ be a facet defining inequality of $P_n^s$ not equivalent
to $x_{00} \ge 0$. Then Corollary 1 guarantees that any $x^{A^{m-1}}$ with $A^{m-1} \in \mathcal{F}_n^{m-1}$
is easily extended to a solution $x^{A^m}$ with $A^m \in \mathcal{F}_n^m$ satisfying $a^T x^{A^m} = a_0$
by setting $A^m = A^{m-1} \cup \{00m\}$. By Theorem 2 this ensures the existence of
$(m-1)n^2 - n + 1$ solutions that are linearly independent on the support outside
$A_m$. Since $\hat{a}^T x \le \hat{a}_0$ is a facet of $P_n^s$ we can construct further $n^2 - 1$ linearly
independent solutions with respect to the support in $A_m \setminus \{00m\}$ by completing
the corresponding solutions of $\mathcal{F}_n^s$ on machine 1 to solutions of $\mathcal{F}_n^m$. ⊓⊔

We do not know whether the theorem also holds for m = 2. The main
difficulty is that one has to construct an additional solution that exploits the
variable 001. We can do this for the v0-cut and the nested conflict inequalities.

Theorem 11. The v0-cut inequalities (4) are facet defining for Pnm for m ≥ 2
and n ≥ 3.

Proof. By Theorem 10 we may concentrate on the case m = 2. W.l.o.g. consider
in (4) the case k = 2, S = {1, . . . , h} with 1 < h ≤ n and v = 1. For convenience,
let $a^T x \ge 0$ denote the inequality $x(\delta_2^+(S)) \ge x(\delta_2^+(1))$, and S1 = S \ {1}.
We use n(n − 1) tours on machine 1 as in the proof of Theorem 2 and
$n^2 - 1$ additional solutions corresponding to the dicycles defined in the proof
of Theorem 5 on machine 2, completed to full solutions on machine 1. These
$2n^2 - n - 1$ solutions are linearly independent and satisfy $a^T x = 0$. If n > 3 then
a tour on machine 2 suffices, since all other solutions have $x_{001} = 0$. If n = 3 then
consider the tour C = {022, 212, 132, 302}. Eliminate in its incidence vector the
support in 212 by subtracting an incidence vector of type 4 (in combination with
the type 2 incidence vector {012, 102}) and eliminate the support in 132 as well
as 302 by subtracting an incidence vector of either type 3 or 7; then all relevant
support on machine 2 has been eliminated while 001 is still in the support. So
the tour is linearly independent with respect to all other solutions. ⊓⊔

Theorem 12. For $x \in \mathbb{Q}^{|\mathcal{A}|}$, x ≥ 0 satisfying (3), the v0-cut inequalities (4)
can be separated in polynomial time.

Proof. Follows directly from Theorem 6. ⊓⊔

Theorem 13. The nested conflict inequalities (11) are facet defining for Pnm
for m ≥ 2 and n ≥ 3.

Proof. By Theorem 10 we may concentrate on the case m = 2. Let k, Si , ti , ∆i
be defined as in Theorem 8 and its proof and identify the edges, w.l.o.g., with
the edges of machine 2. Denote by $h_i = |\Delta_i|$ and let $\Delta_i = \{v_1^i, \dots, v_{h_i}^i\}$; the
sets may be empty. Consider the tour on machine 2 specified by the sequence of
vertices

$$0,\ v_1^k, v_2^k, \dots, v_{h_k}^k,\ v_1^{k-1}, \dots, v_{h_{k-1}}^{k-1},\ v_1^{k-2}, \dots, v_{h_0}^0,\ t_1, t_2, \dots, t_k,\ t_0,\ 0.$$

On the support $\mathcal{A} \setminus A_1 \setminus \{012, \dots, 0n2, t_0 02\}$ the incidence vector (with respect
to $\mathcal{A}$) of this tour is a linear combination of incidence vectors (with respect to
$\mathcal{A}$) of dicycles from 3, 10, 14, and the dicycle 0, tk , t0 , 0 of 15 as specified in the
proof of Theorem 8. None of these dicycles are tours on machine 2, so 001 is not
in the support of any of these incidence vectors. However, 001 is in the support
of the new tour. Thus, by the linear independence of the dicycles of the proof
of Theorem 8, its incidence vector is linearly independent with respect to the
n(n − 1) tours on machine 1 of the proof of Theorem 2 and the $n^2 - 1$ dicycles (the
idle dicycle is covered by the tours on machine 1) of the proof of Theorem 8. ⊓⊔

6 Subtour Elimination Constraints

We consider the standard transformation from m-ATSP (with identical
machines) to standard ATSP (see e.g. [16]). To this end let D = (V, B) be a digraph
with $V = \{0_1, \dots, 0_m\} \cup J$ and

$$B = \{0_k i,\ i 0_k : k \in M,\ i \in J\} \cup \{ij : i \neq j,\ i, j \in J\} \cup \{0_k 0_{k+1} : k \in M \setminus \{m\}\} \cup \{0_m 0_1\}.$$

We map vectors $x \in \mathbb{Q}^{|\mathcal{A}|}$ to $Tx = y \in \mathbb{Q}^{|B|}$ by

$$\begin{aligned}
y_{0_k i} &= x_{0ik} && i \in J,\ k \in M\\
y_{i 0_1} &= x_{i0m} && i \in J\\
y_{i 0_k} &= x_{i0(k-1)} && i \in J,\ k \in \{2, \dots, m\}\\
y_{ij} &= \sum_{k \in M} x_{ijk} && i \neq j,\ i, j \in J\\
y_{0_k 0_{k+1}} &= x_{00k} && k \in \{1, \dots, m-1\}\\
y_{0_m 0_1} &= x_{00m}
\end{aligned}$$

It is well known that any feasible solution $x^A$ with $A \in \mathcal{F}_n^m$ maps to $y^B = T x^A$
with B being a tour in D. Probably the most useful inequalities for the general
ATSP are the subtour elimination constraints, usually stated in the form

y(B(S)) ≤ |S| − 1 for S ⊂ V, 2 ≤ |S| ≤ |V | − 2.
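For integral solutions the map T simply reroutes every arc that returns to node 0 on machine k to the next depot copy. The sketch below applies this to an arc set and checks that the image is a single tour in D; encoding the depot copy $0_k$ as the pair `('0', k)` and the helper names are assumptions made for this illustration.

```python
def transform(A, m):
    """Image of an arc set A (triples (i, j, k)) under T, as a list of arcs of D.
    The depot copy 0_k is encoded as the pair ('0', k)."""
    def depot(k):
        return ('0', k)
    B = []
    for (i, j, k) in A:
        nxt = k + 1 if k < m else 1              # arcs into 0 on machine k lead to 0_{k+1}
        if i == 0 and j == 0:
            B.append((depot(k), depot(nxt)))     # idle loop becomes the arc 0_k 0_{k+1}
        elif i == 0:
            B.append((depot(k), j))
        elif j == 0:
            B.append((i, depot(nxt)))
        else:
            B.append((i, j))
    return B

def is_tour(B, nodes):
    """True if the arcs in B form one dicycle visiting every node exactly once."""
    succ = dict(B)
    if len(B) != len(nodes) or len(succ) != len(B) or set(succ) != set(nodes):
        return False
    start = nodes[0]
    seen, v = set(), start
    while v not in seen:
        seen.add(v)
        v = succ[v]
    return v == start and len(seen) == len(nodes)

m, n = 2, 3
nodes = [('0', k) for k in range(1, m + 1)] + list(range(1, n + 1))
A_feasible = {(0, 1, 1), (1, 2, 1), (2, 0, 1), (0, 3, 2), (3, 0, 2)}
print(is_tour(transform(A_feasible, m), nodes))
```

An arc set with a dicycle that avoids node 0 maps to several disjoint cycles in D, which is what the subtour elimination constraints exclude.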

In order to show that T x satisfies all subtour elimination constraints if x satisfies


all v0-cut inequalities, we need the following observation.

Proposition 2. Let $x \in \mathbb{Q}^{|\mathcal{A}|}$, x ≥ 0, satisfy constraints (1) to (3). If for some
S ⊂ V , 2 ≤ |S| ≤ |V | − 2, there exist h, k ∈ M , h ≠ k such that $0_h \in S$ and
$0_k \notin S$ then the subtour elimination constraint corresponding to S cannot be
violated for y = T x.

Proof. W.l.o.g. assume h = 1, k = 2 and let $x \in \mathbb{Q}^{|\mathcal{A}|}$, x ≥ 0. Decompose the flow
through h into cycles C1 to Cr . In the transformed graph each of these cycles
corresponds to a path from $0_1$ to $0_2$ and must therefore cross $\delta^+(S)$. Therefore,
$1 = x_{001} + x(\delta_1^+(0)) \le y(\delta^+(S))$, which implies, because of
$y(\delta^+(S)) + y(B(S)) = \sum_{v \in S} y(\delta^+(v)) = |S|$, that y(B(S)) ≤ |S| − 1. ⊓⊔

Theorem 14. Let $x \in \mathbb{Q}^{|\mathcal{A}|}$, x ≥ 0, satisfy constraints (1) to (4), let y = T x,
and let S ⊂ V with 2 ≤ |S| ≤ |V | − 2. Then y(B(S)) ≤ |S| − 1.

Proof. Because of Proposition 2 we may assume $0_k \notin S$ for k ∈ M . First observe
that

$$\sum_{v \in S} \bigl[ x(\delta_k^+(S)) - x(\delta_k^+(v)) \bigr] = |S|\, x(\delta_k^+(S)) - x(\delta_k^+(S)) - x(A_k(S)).$$

By summing constraints (4) over all k ∈ M and v ∈ S we obtain

$$\begin{aligned}
0 &\le \sum_{k \in M} \sum_{v \in S} \bigl[ x(\delta_k^+(S)) - x(\delta_k^+(v)) \bigr]\\
&= \sum_{k \in M} \bigl[ |S|\, x(\delta_k^+(S)) - x(\delta_k^+(S)) - x(A_k(S)) \bigr]\\
&= (|S| - 1) \sum_{k \in M} x(\delta_k^+(S)) - \sum_{k \in M} x(A_k(S))\\
&= (|S| - 1)\, y(\delta^+(S)) - y(B(S))\\
&= (|S| - 1)\, y(\delta^+(S)) + (|S| - 1) \bigl[ y(B(S)) - y(B(S)) \bigr] - y(B(S))\\
&= (|S| - 1) \sum_{v \in S} y(\delta^+(v)) - |S|\, y(B(S))\\
&= (|S| - 1) \sum_{v \in S} 1 - |S|\, y(B(S)) = (|S| - 1)|S| - |S|\, y(B(S)).
\end{aligned}$$

It follows that y(B(S)) ≤ |S| − 1 and the proof is complete. ⊓⊔

7 Numerical Results

The scheduling problem at Herlitz PBS AG asks for a solution minimizing the
makespan over two non-identical machines. Conflicts arising from jobs using the
same printing cylinders are avoided by additional side constraints that require
that these jobs be executed on the same machine. The problem is thus not a
pure m-Cost ATSP, but the basic structure is the same. Our data instances
are constructed by randomly choosing jobs (without repetitions) from a set of
roughly 150 jobs provided by Herlitz PBS AG. The cost coefficients are computed
according to data and rather involved rules determined in cooperation with
experts at Herlitz PBS AG. The cost of one arc cijk includes the setup time
from job i to job j and the printing time of job j on machine k.
Our computational experiments were performed on a Sun Ultra 10 with a 299
MHz SUNW,UltraSPARC-IIi CPU and with CPLEX 6.0 as LP-solver; computation
times are given in seconds. Table 1 shows detailed results for 10 random
instances of 60 jobs, the typical number of jobs processed in one week.

Table 1. 10 instances of 60 jobs each

 n   feas   v0-cut    (%)     secs   subtour   (%)     subt+Dk   (%)
60   9420   9406.8   (0.14)   19.6   9387.42  (0.35)   9387.42  (0.35)
60   9382   9366.93  (0.16)   10.5   9345.37  (0.39)   9345.48  (0.39)
60   9519   9481.32  (0.4)     6.8   9425.49  (0.98)   9425.49  (0.98)
60   9723   9707.24  (0.16)   12.2   9674.58  (0.5)    9674.58  (0.5)
60   9910   9900.59  (0.1)    31.4   9859.52  (0.51)   9860.24  (0.5)
60   9421   9411.46  (0.1)     8.5   9395.92  (0.27)   9395.92  (0.27)
60   9984   9967.58  (0.16)   14.4   9914.74  (0.69)   9915.24  (0.69)
60   9557   9542.84  (0.15)   17.9   9517.14  (0.42)   9517.14  (0.42)
60   9263   9261.73  (0.01)    7.4   9238.88  (0.26)   9238.88  (0.26)
60   9373   9343.13  (0.32)   23.9   9294.81  (0.83)   9294.81  (0.83)

Column n gives the size of the instance, feas displays the best integral
solution we know, and v0-cut shows the optimal value of the relaxation based on the
v0-cut inequalities; in parentheses we give the relative gap 1 − (v0-cut/feas) in percent. secs
is the number of seconds the code needed to solve this relaxation, including
separation. We compare the value of relaxation v0-cut to the relaxations obtained by
separating the subtour inequalities on the transformed problem and to the
relaxation combining subtour and $D_k^+/D_k^-$ inequalities (for the definition of $D_k^+/D_k^-$
inequalities see [10], for a separation heuristic [7]). With respect to the latter
relaxations the v0-cut inequalities close the gap by roughly 60%. Optimal
solutions to the subtour relaxations without v0-cut inequalities would typically
exhibit several subtours on the one machine subproblems. There is still much
room for improvement in the separation of the v0-cut inequalities, so it should
be possible to reduce the computation time in more sophisticated
implementations. It is surprising that the $D_k^+/D_k^-$ inequalities are not very effective, quite
contrary to the experience in usual ATSP-problems.
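The gap columns of Table 1 can be recomputed from the objective values. The short sketch below does so for the v0-cut and subtour relaxations and averages them; the variable names are illustrative, and the data are simply the feas, v0-cut and subtour columns of Table 1.

```python
# (feas, v0-cut bound, subtour bound) for the ten 60-job instances of Table 1
rows = [
    (9420, 9406.80, 9387.42), (9382, 9366.93, 9345.37), (9519, 9481.32, 9425.49),
    (9723, 9707.24, 9674.58), (9910, 9900.59, 9859.52), (9421, 9411.46, 9395.92),
    (9984, 9967.58, 9914.74), (9557, 9542.84, 9517.14), (9263, 9261.73, 9238.88),
    (9373, 9343.13, 9294.81),
]

def relative_gap_percent(feas, bound):
    """Relative gap 1 - bound/feas, expressed in percent."""
    return 100.0 * (1.0 - bound / feas)

v0_gaps = [relative_gap_percent(f, v) for f, v, _ in rows]
subtour_gaps = [relative_gap_percent(f, s) for f, _, s in rows]
avg_v0 = sum(v0_gaps) / len(v0_gaps)
avg_subtour = sum(subtour_gaps) / len(subtour_gaps)
gap_closed = 1.0 - avg_v0 / avg_subtour    # share of the subtour gap removed by the v0-cuts

print(f"average v0-cut gap  : {avg_v0:.2f}%")
print(f"average subtour gap : {avg_subtour:.2f}%")
print(f"gap closed          : {100 * gap_closed:.0f}%")
```

The two averages agree with the n = 60 row of Table 2 after rounding to two decimals.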
In Table 2 we summarize results on instances with 40 to 100 jobs. For each
problem size we generated 10 instances; the table displays the average relative
gap of the relaxations as well as the average computation time. In the application
in question the error in the data is certainly significantly larger than 0.2%, so
there is no immediate need to improve these solutions further. On the other
hand, most fractional solutions we investigated so far exhibited violated nested
conflict inequalities, so separation heuristics for this new class of inequalities
might help to improve the performance.

Table 2. Relative gap and computation time, averaged over 10 instances

  n    #   v0-cut %     secs   subtour %   subt+Dk %
 40   10     0.22       2.84     0.54        0.54
 50   10     0.15       9.57     0.51        0.51
 60   10     0.17      15.25     0.52        0.52
 70   10     0.16      38.13     0.52        0.52
 80   10     0.18      55.19     0.49        0.48
 90   10     0.23      82.75     0.54        0.54
100   10     0.19     135.91     0.44        0.44

I thank Norbert Ascheuer for many helpful discussions and pointers to the
literature and software, Oleg Mänz for helping in the implementation of the
v0-cut separator, and Thorsten Koch for making available his min-cut code.

References
1. E. Balas. The prize collecting traveling salesman problem. Networks, 19:621–636,
1989.
2. E. Balas. The prize collecting traveling salesman problem: II. polyhedral results.
Networks, 25:199–216, 1995.
3. E. Balas and M. Fischetti. A lifting procedure for the asymmetric traveling sales-
man polytope and a large new class of facets. Mathematical Programming, 58:325–
352, 1993.
4. M. O. Ball, T. L. Magnanti, C. L. Monma, and G. L. Nemhauser, editors. Network
Routing, volume 8 of Handbooks in Operations Research and Management Science.
Elsevier Sci. B.V., Amsterdam, 1995.
5. J. Desrosiers, Y. Dumas, M. M. Solomon, and F. Soumis. Time Constrained Rout-
ing and Scheduling, chapter 2, pages 35–139. Volume 8 of Ball et al. [4], 1995.
6. M. Fischetti, J. Salazar González, and P. Toth. A branch-and-cut algorithm for the
generalized travelling salesman problem. Technical report, University of Padova,
1994.
7. M. Fischetti and P. Toth. A polyhedral approach to the asymmetric traveling
salesman problem. Management Science, 43:1520–1536, 1997.
8. M. Fischetti and P. Toth. An additive approach for the optimal solution of the
prize-collecting travelling salesman problem. In B. Golden and A. Assad, editors,
Vehicle Routing: Methods and Studies, pages 319–343. Elsevier Science Publishers
B.V. (North-Holland), 1998.
9. M. Gendreau, G. Laporte, and F. Semet. A branch-and-cut algorithm for the
undirected selective traveling salesman problem. Networks, 32:263–273, 1998.
10. M. Grötschel and M. Padberg. Polyhedral theory. In Lawler et al. [12], chapter 8.
11. M. Jünger, G. Reinelt, and G. Rinaldi. The traveling salesman problem. In M. Ball,
T. Magnanti, C. Monma, and G. Nemhauser, editors, Network Models, volume 7
of Handbooks in Operations Research and Management Science, chapter 4, pages
225–330. North Holland, 1995.
12. E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, editors.
The Traveling Salesman Problem. John Wiley & Sons Ltd, Chichester, 1985.
13. M. Padberg and M. Grötschel. Polyhedral computations. In Lawler et al. [12],
chapter 9.
14. M. Padberg and G. Rinaldi. Facet identification for the symmetric traveling sales-
man polytope. Mathematical Programming, 47:219–257, 1990.
15. M. Padberg and G. Rinaldi. A branch and cut algorithm for the resolution of
large-scale symmetric traveling salesman problems. SIAM Review, 33:60–100, 1991.
16. G. Reinelt. The traveling salesman - Computational solutions for TSP applications.
Number 840 in Lecture Notes in Computer Science. Springer-Verlag, 1994.
A Strongly Polynomial Cut Canceling Algorithm
for the Submodular Flow Problem

Satoru Iwata^{1,*}, S. Thomas McCormick^{2,**}, and Maiko Shigeno^{3,***}

1 Graduate School of Engineering Sciences
Osaka University, Osaka 560, Japan
[email protected]
2 Faculty of Commerce and Business Administration
University of British Columbia, BC V6T 1Z2, Canada
[email protected]
3 Institute of Policy and Planning Sciences
University of Tsukuba, Ibaraki 305, Japan
[email protected]

Abstract. This paper presents a new strongly polynomial cut canceling
algorithm for minimum cost submodular flow. The algorithm is a
generalization of our similar cut canceling algorithm for ordinary
min-cost flow. The advantage of cut canceling over cycle canceling is that
cut canceling seems to generalize to other problems more readily than
cycle canceling. The algorithm scales a relaxed optimality parameter,
and creates a second, inner relaxation that is a kind of submodular max
flow problem. The outer relaxation uses a novel technique for relaxing
the submodular constraints that allows our previous proof techniques to
work. The algorithm uses the min cuts from the max flow subproblem
as the relaxed most positive cuts it chooses to cancel. We show that this
algorithm needs to cancel only $O(n^3)$ cuts per scaling phase, where n is
the number of nodes. Furthermore, we also show how to slightly modify
this algorithm to get a strongly polynomial running time.

1 Introduction

A fundamental problem in combinatorial optimization is min-cost network flow
(MCF). It can be modeled as a linear program with guaranteed integral optimal
solutions (with integral data), and many polynomial and strongly polynomial
algorithms for it exist (see Ahuja, Magnanti, and Orlin [1] for background).
Researchers in mathematical programming have developed a series of extensions
of MCF having integral optimal solutions (see Schrijver [28]).
* Research supported by a Grant-in-Aid for Scientific Research from Ministry of
Education, Science, Sports and Culture of Japan.
** Research supported by an NSERC Operating Grant; part of this research was done
during visits to RIMS at Kyoto University and SORIE at Cornell University.
*** Research supported by a Grant-in-Aid for Scientific Research from Ministry of
Education, Science, Sports and Culture of Japan.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 259–272, 1999.
© Springer-Verlag Berlin Heidelberg 1999
We have long been interested in finding a generic (strongly) polynomial
algorithm for such problems, i.e., one that is easily extended from the MCF case to
more general cases. A natural class of generic algorithms to consider is the class of
(primal) cycle canceling algorithms and (dual) cut canceling algorithms. These
are natural since they take improving steps that come from the linear algebra of
the constraints and whose cost depends only on the local effect on the objective
value. It is (in theory) easy to figure out what these objects should look like for
more general models than MCF.
A natural first step in applying such a research program is to consider the
case of submodular flow (SF). This problem very much resembles ordinary MCF,
except that the usual conservation constraints have been relaxed into constraints
on the total violation of conservation on any node subset. SF was shown to enjoy
integral optimal solutions by Edmonds and Giles [4]. Early algorithmic
contributions came from [3,6,8]. Our algorithm uses many ideas from these papers. The
first strongly polynomial algorithm for SF is due to Frank and Tardos [7], who
generalized the strongly polynomial MCF algorithm by Tardos [30] to a fairly
wide class of combinatorial optimization problems. A more direct generalization
of the Tardos algorithm was presented by Fujishige, Röck, and Zimmermann [12]
with the aid of the tree-projection method by Fujishige [9].
Unfortunately it has been surprisingly difficult to extend known MCF cycle
and cut canceling algorithms to SF. One of the most attractive MCF canceling
algorithms is the min mean cycle canceling algorithm of Goldberg and Tarjan [14]
(and its dual, the max mean cut canceling algorithm of Ervolina and McCormick
[5]). Cui and Fujishige [2] were able to show finiteness for this algorithm, but no
polynomial bound. McCormick, Ervolina, and Zhou [23] show that it is unlikely
that a straightforward generalization of either min mean cycle or max mean cut
canceling is going to be polynomial for SF. Recently, Wallacher and Zimmermann
[31] devised a weakly polynomial cycle canceling algorithm for SF based on the
min ratio approach by Zimmermann [32]. This algorithm is provably not strongly
polynomial.
Another possibility is to cancel cycles or cuts which maximize the viola-
tion of complementary slackness with respect to a relaxed optimality parameter,
the relaxed min/max canceling algorithms. These algorithms allow for a simpler
computation of the cycle or cut to cancel, but otherwise enjoy the relatively
simple analysis of min mean cycle/max mean cut canceling. These algorithms
are developed and analyzed in [29].
The same authors found a way to generalize relaxed min cycle canceling to
SF [18], which led to nearly the fastest weakly polynomial running time for SF.
This algorithm leads to the same strongly polynomial bound as the fastest known
one by Fujishige, Röck, and Zimmermann [12], and it can also be extended to
SF with separable convex costs. However, it seems unlikely that it would extend
to more general cases like M-convex SF, see Murota [24,25].
This difficulty led us to consider instead extending relaxed max cut canceling
to SF. This paper will show how to modify relaxed max cut canceling in a
non-trivial way so that it will generalize to both a weakly and a strongly polynomial
algorithm for SF.
The only other polynomial cut canceling algorithm we know for SF is the most
helpful total cut canceling algorithm of [21]. However that algorithm is provably
not strongly polynomial, and it needs an oracle to compute exchange capacities
for a derived submodular function which appears to be achievable only through
the ellipsoid algorithm, even if we have a non-ellipsoid oracle for the original
submodular function. By contrast, the present algorithm can compute exchange
capacities in derived networks using only an oracle for the original function, and
is the first dual strongly polynomial algorithm for SF that we know of. Our
algorithm also appears to be the first strongly polynomial algorithm (primal or
dual) for SF that avoids scaling or rounding the data. Furthermore, it seems
very likely to us that the present algorithm can be further generalized
so as to apply to the M-convex case, unlike its cycle canceling cousin.

2 Submodular Flow
An instance of submodular flow looks much like an instance of MCF. We are
given a directed graph G = (N, A) with node set N of cardinality n and arc set
A of cardinality m. We are also given lower and upper bounds $\ell, u \in \mathbb{R}^A$ on the
arcs, and costs $c \in \mathbb{R}^A$ on the arcs.
We need some notation to talk about relaxed conservation. If $w \in \mathbb{R}^X$ and
$Y \subseteq X$, then we abbreviate $\sum_{y \in Y} w_y$ by $w(Y)$ as usual. For node subset S,
define $\Delta^+ S$ as $\{i \to j \in A \mid i \in S,\ j \notin S\}$, $\Delta^- S$ as $\{i \to j \in A \mid i \notin S,\ j \in S\}$,
and $\Delta S = \Delta^+ S \cup \Delta^- S$. We say that arc a crosses S if $a \in \Delta S$. If ϕ is a flow on
the arcs (i.e., $\varphi \in \mathbb{R}^A$), the boundary of ϕ, $\partial\varphi(S) \equiv \varphi(\Delta^+ S) - \varphi(\Delta^- S)$, is the net
flow out of set S. Note that boundary is modular, i.e., $\partial\varphi(S) = \sum_{i \in S} \partial\varphi(\{i\})$.
In the sequel we will consider several auxiliary networks whose arc sets are
supersets of A, so we will often subscript ∂ and ∆ by the set of arcs we want
them to include at that point.
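
As a small illustration of the boundary notation, the following sketch computes ∂ϕ(S) on a hypothetical four-node network and checks modularity by enumerating all subsets. The node labels, arcs, and flow values are made up for the example and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical example data: arcs are (tail, head) pairs carrying a flow value.
nodes = [0, 1, 2, 3]
flow = {(0, 1): 2.0, (1, 2): 1.5, (2, 0): 0.5, (1, 3): 1.0, (3, 2): 1.0}

def boundary(S):
    """Net flow out of S: flow on arcs leaving S minus flow on arcs entering S."""
    S = set(S)
    out_flow = sum(v for (i, j), v in flow.items() if i in S and j not in S)
    in_flow = sum(v for (i, j), v in flow.items() if i not in S and j in S)
    return out_flow - in_flow

def all_subsets(items):
    """Yield every subset of items as a tuple."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield combo

# Modularity: the boundary of a set equals the sum of its nodes' boundaries.
modular = all(
    abs(boundary(S) - sum(boundary([i]) for i in S)) < 1e-9
    for S in all_subsets(nodes)
)
```

Arcs with both ends inside S cancel in the sum of node boundaries, which is why the check succeeds for every subset.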
Usual MCF conservation requires that ∂ϕ({i}) = 0 for all i ∈ N , which
implies that ∂ϕ(S) = 0 for all S ⊆ N . From this perspective it is natural to
relax this constraint to $\partial\varphi(S) \le f(S)$ for some set function $f : 2^N \to \mathbb{R}$. A way
to get integral optimal solutions with such a constraint is to require that f be
submodular, i.e., that it satisfy

f (S) + f (T ) ≥ f (S ∪ T ) + f (S ∩ T )

for all S, T ⊆ N . For a submodular function f with f (∅) = 0, the base polyhedron
B(f ) is defined by

$$B(f) = \{x \mid x \in \mathbb{R}^N,\ x(N) = f(N),\ \forall S \subseteq N : x(S) \le f(S)\}.$$

A vector in B(f ) is called a base. Since ∂ϕ(∅) = ∂ϕ(N ) = 0 for any flow ϕ,
requiring that f (N ) = 0 and ∂ϕ(S) ≤ f (S) for all S ⊆ N is equivalent to
requiring that ∂ϕ belong to B(f ).
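
The definitions of submodularity and of the base polyhedron can be checked by brute force on a tiny ground set. The sketch below is only illustrative: the function f(S) = |S|·|N − S| is an assumed example (it is submodular with f(∅) = f(N) = 0), and the exponential enumeration is meant for intuition, not for use inside an algorithm.

```python
from itertools import combinations

N = frozenset(range(4))

def subsets(ground):
    """Yield every subset of ground as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def f(S):
    # Assumed example of a submodular function with f(empty) = f(N) = 0.
    return len(S) * (len(N) - len(S))

def is_submodular(func):
    """Check f(S) + f(T) >= f(S | T) + f(S & T) for all pairs of subsets."""
    return all(
        func(S) + func(T) >= func(S | T) + func(S & T)
        for S in subsets(N) for T in subsets(N)
    )

def in_base_polyhedron(x, func, tol=1e-9):
    """x maps node -> value; test x(N) = f(N) and x(S) <= f(S) for all S."""
    if abs(sum(x[i] for i in N) - func(N)) > tol:
        return False
    return all(sum(x[i] for i in S) <= func(S) + tol for S in subsets(N))
```

For instance, the vector x = (3, 1, −1, −3) is a base for this f, while (4, 0, 0, −4) violates x({0}) ≤ f({0}) = 3.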
Now if $f : 2^N \to \mathbb{R}$ is a submodular function with $f(\emptyset) = f(N) = 0$, then
the submodular flow problem is:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{a \in A} c_a \varphi_a \\
\text{subject to}\quad & \ell_a \le \varphi_a \le u_a \quad (a \in A), \\
& \partial\varphi \in B(f).
\end{aligned}
\tag{1}
$$

This is a linear program with an exponential number of constraints.


Technically, the dual problem to (1) should have an exponential number
of dual variables, one for each subset of N. However it is possible ([10]) and
much more convenient to reduce these down to dual variables on nodes, or node
potentials $\pi \in \mathbb{R}^N$. For arc $a = i \to j$, we define the reduced cost on a by
$c^\pi_a = c_a + \pi_i - \pi_j$.
We need some further concepts to characterize optimality for (1). If ϕ is a
flow with boundary x = ∂ϕ such that x ∈ B(f ), then we say that subset S is
ϕ-tight, or x-tight, or just tight if x(S) (= ∂ϕ(S)) = f (S). Submodularity implies
that the union and intersection of tight sets are tight.
If π is a set of node potentials with distinct values $v_1 > v_2 > \dots > v_h$, then
the kth level set of π is $L^\pi_k \equiv \{i \in N \mid \pi_i \ge v_k\}$, $k = 1, 2, \dots, h$. It will be
convenient to let $L^\pi_0 = \emptyset$. Note that $\emptyset = L^\pi_0 \subset L^\pi_1 \subset L^\pi_2 \subset \dots \subset L^\pi_h = N$.
The “submodular part” of the optimality condition requires that an optimal
ϕ and π satisfy that each $L^\pi_k$ is ϕ-tight. There are two equivalent ways to express
this. The first is to require that ∂ϕ maximize the objective $\pi^T x$ over $x \in B(f)$.
The second requires defining

$$f^\pi(S) = \sum_{k=1}^{h} \{f((S \cap L^\pi_k) \cup L^\pi_{k-1}) - f(L^\pi_{k-1})\} \qquad (S \subseteq N).$$

We note that $f^\pi$ is submodular, and that $f^\pi \le f$. It is easy to see that if S
nests with each $L^\pi_k$ (either $S \subseteq L^\pi_k$ or $L^\pi_k \subseteq S$), then $f^\pi(S) = f(S)$. Also, x is in
$B(f^\pi)$ if and only if $x \in B(f)$ and every $L^\pi_k$ is x-tight for f [3]. Thus the second
equivalence is to require that $\partial\varphi \in B(f^\pi)$.
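
To make the level sets and the derived function concrete, here is a minimal sketch that builds the level sets of a potential vector and evaluates the sum defining the derived function directly. The submodular function and the potentials are assumed example data; the two final checks correspond to the stated facts that the derived function is dominated by f and agrees with f on sets that nest with every level set (here tested on the level sets themselves).

```python
from itertools import combinations

N = frozenset(range(4))

def f(S):
    # Assumed example of a submodular function with f(empty) = f(N) = 0.
    return len(S) * (len(N) - len(S))

def subsets(ground):
    """Yield every subset of ground as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def level_sets(pi):
    """Return [L_0, L_1, ..., L_h] with L_0 empty and L_k = {i : pi_i >= v_k}."""
    values = sorted(set(pi.values()), reverse=True)  # v_1 > v_2 > ... > v_h
    return [frozenset()] + [frozenset(i for i in pi if pi[i] >= v) for v in values]

def f_pi(S, pi):
    """Evaluate the sum over k of f((S & L_k) | L_{k-1}) - f(L_{k-1})."""
    L = level_sets(pi)
    return sum(f((S & L[k]) | L[k - 1]) - f(L[k - 1]) for k in range(1, len(L)))

pi = {0: 2.0, 1: 2.0, 2: 1.0, 3: 0.0}  # hypothetical potentials
dominated = all(f_pi(S, pi) <= f(S) for S in subsets(N))
agrees_on_level_sets = all(f_pi(L, pi) == f(L) for L in level_sets(pi))
```

On this data the set {2} does not nest with the level set {0, 1}, and the derived value drops to −1 while f({2}) = 3.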
Theorem 1 ([10, Theorem 5.3]). A submodular flow ϕ is optimal if and only
if there exists a node potential π : N → R such that

$$c^\pi_a > 0 \Rightarrow \varphi_a = \ell_a, \qquad c^\pi_a < 0 \Rightarrow \varphi_a = u_a,$$

and $\partial\varphi \in B(f^\pi)$.
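
The arc part of Theorem 1 is a complementary slackness test on reduced costs. The sketch below checks only that part (it does not test the condition on the boundary), using the reduced cost convention defined earlier in this section. All function and variable names are illustrative assumptions.

```python
def reduced_cost(cost, pi, arc):
    """Reduced cost of arc i -> j: c_a + pi_i - pi_j."""
    i, j = arc
    return cost[arc] + pi[i] - pi[j]

def satisfies_arc_conditions(flow, cost, lower, upper, pi, tol=1e-9):
    """Positive reduced cost forces flow = lower; negative forces flow = upper."""
    for arc in cost:
        rc = reduced_cost(cost, pi, arc)
        if rc > tol and abs(flow[arc] - lower[arc]) > tol:
            return False
        if rc < -tol and abs(flow[arc] - upper[arc]) > tol:
            return False
    return True

# Hypothetical three-arc example.
cost = {(0, 1): 3.0, (1, 2): -1.0, (2, 0): 0.0}
lower = {a: 0.0 for a in cost}
upper = {a: 5.0 for a in cost}
pi = {0: 0.0, 1: 1.0, 2: 0.0}
good_flow = {(0, 1): 0.0, (1, 2): 2.0, (2, 0): 2.0}
bad_flow = {(0, 1): 1.0, (1, 2): 2.0, (2, 0): 2.0}
```

With these potentials only arc 0 → 1 has nonzero reduced cost (it equals 2), so its flow must sit at the lower bound; the other two arcs are free within their bounds.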

3 Dual Approximate Optimality


We first cover the details of how to check node potentials π for optimality. We
then show how to adapt checking exact optimality to checking a new kind of
approximate optimality. This checking routine will form the core of our cut
canceling algorithm, as it will produce the cuts that we cancel. For this paper
a cut is just a subset S of N. We cancel a cut by increasing $\pi_i$ by some step
length β for $i \in S$, which has the effect of increasing $c^\pi_a$ on $\Delta^+ S$, decreasing $c^\pi_a$
on $\Delta^- S$, and changing the level sets of π.
Given a node potential π, we define modified bounds $\ell^\pi$ and $u^\pi$ as follows. If
$c^\pi_a < 0$ then $\ell^\pi_a = u^\pi_a = u_a$; if $c^\pi_a = 0$ then $\ell^\pi_a = \ell_a$ and $u^\pi_a = u_a$; if $c^\pi_a > 0$ then
$\ell^\pi_a = u^\pi_a = \ell_a$. Then Theorem 1 implies that π is optimal if and only if there is
a feasible flow in the network $G^\pi$ with bounds $\ell^\pi$, $u^\pi$, and $f^\pi$.
For the proof of correctness of our algorithm we will need some details of
how feasibility of $G^\pi$ is checked.
We use a variant of an algorithm of Frank [6] to check feasibility of $G^\pi$. It
starts with any initial base $x \in B(f^\pi)$ and any initial ϕ satisfying $\ell^\pi$ and $u^\pi$.
We now define a residual network on N. If $\varphi_{ij} > \ell^\pi_{ij}$, then we have a backward
residual arc $j \to i$ with residual capacity $r_{ij} = \varphi_{ij} - \ell^\pi_{ij}$; if $\varphi_{ij} < u^\pi_{ij}$, then we
have a forward residual arc $i \to j$ with $r_{ji} = u^\pi_{ij} - \varphi_{ij}$. To deal with relaxed
conservation, we have jumping arcs: For every $i, j \in N$ with $i \ne j$ and $x \in B(f^\pi)$,
define

$$\tilde r(x, i, j) = \max\{\alpha \mid x + \alpha\chi_i - \alpha\chi_j \in B(f^\pi)\};$$

thus $\tilde r(x, i, j) > 0$ means that there is no x-tight set containing i but not j. Make
a jumping arc $j \to i$ with capacity $\tilde r(x, i, j)$ whenever $\tilde r(x, i, j) > 0$. Note that S
is x-tight if and only if there are no jumping arcs w.r.t. x entering S.
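
For intuition, an exchange capacity can be evaluated by brute force on a tiny ground set, using the fact that only sets containing i but not j constrain the step α, so the maximum equals the smallest slack func(S) − x(S) over such sets. The sketch below is an assumption-laden illustration, not the oracle the paper relies on: `func` stands for whichever submodular function the capacity is taken with respect to, and the example function and base are made up.

```python
from itertools import combinations

N = frozenset(range(3))

def f(S):
    # Assumed example of a submodular function with f(empty) = f(N) = 0.
    return len(S) * (len(N) - len(S))

def subsets(ground):
    """Yield every subset of ground as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def exchange_capacity(x, i, j, func):
    """Largest alpha keeping x + alpha*chi_i - alpha*chi_j in B(func)."""
    return min(func(S) - sum(x[k] for k in S)
               for S in subsets(N) if i in S and j not in S)

def jumping_arcs(x, func):
    """Map each jumping arc (j, i), read as j -> i, to its positive capacity."""
    arcs = {}
    for i in N:
        for j in N:
            if i != j:
                cap = exchange_capacity(x, i, j, func)
                if cap > 0:
                    arcs[(j, i)] = cap
    return arcs

x = {0: 2, 1: 0, 2: -2}  # a base of B(f) for the example f
arcs = jumping_arcs(x, f)
```

Here the tight sets are ∅, {0}, {0, 1} and N, and indeed no jumping arc enters any of them, while the arc 0 → 1 enters the non-tight set {1}.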
The Feasibility Algorithm finds directed residual paths from $N^+ = \{i \in N \mid x_i > \partial\varphi(\{i\})\}$
to $N^- = \{i \in N \mid x_i < \partial\varphi(\{i\})\}$ with a minimum number of
arcs in the residual network. On each path it augments flow ϕ on the residual
arcs, and modifies x as per the jumping arcs, which monotonically reduces the
difference of x and ∂ϕ. By using a lexicographic selection rule, the algorithm
terminates in finite time. At termination, either x coincides with ∂ϕ, which
implies the optimality of π; or there is no directed path from $N^+$ to $N^-$.
In this last case, define $T \subseteq N$ as the set of nodes from which $N^-$ is reachable
by directed residual paths. No jumping arcs enter T, so it must be tight for the
final x, and it must contain all i with $x_i < \partial\varphi(\{i\})$. Furthermore, we have
$\varphi(\Delta^+ T) = \ell^\pi(\Delta^+ T)$ and $\varphi(\Delta^- T) = u^\pi(\Delta^- T)$. Thus we get

$$V^\pi(T) \equiv \ell^\pi(\Delta^+_A T) - u^\pi(\Delta^-_A T) - f^\pi(T) = \partial\varphi(T) - x(T) > 0.$$

We call a node subset S with $V^\pi(S) > 0$ a positive cut. Similar reasoning shows
that for any other $S \subseteq N$, we have $V^\pi(S) \le \partial\varphi(S) - x(S) \le \partial\varphi(T) - x(T) = V^\pi(T)$,
proving that T is a most positive cut. Intuitively, $V^\pi(T)$ measures how
far away from optimality π is. We summarize as follows:
Lemma 1. Node potentials π are optimal if and only if there are no positive
cuts w.r.t. π. When π is not optimal, the output of the Feasibility Algorithm is
a most positive cut.
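
The cut value can be illustrated by exhaustive search on a tiny instance. The sketch below takes the modified bounds and the derived set function as given inputs and enumerates all proper nonempty subsets; this exponential search is a stand-in for illustration only and is not the Feasibility Algorithm. The example data (three nodes, one arc pinned at flow 3, and a derived function that is identically zero) are assumptions.

```python
from itertools import combinations

def cut_value(S, lo, up, f_pi):
    """Lower bounds on arcs leaving S, minus upper bounds on arcs entering S,
    minus the derived set function evaluated at S."""
    S = frozenset(S)
    leaving = sum(lo[a] for a in lo if a[0] in S and a[1] not in S)
    entering = sum(up[a] for a in up if a[0] not in S and a[1] in S)
    return leaving - entering - f_pi(S)

def most_positive_cut(nodes, lo, up, f_pi):
    """Brute-force search; returns (empty set, 0) if no positive cut exists."""
    best, best_val = frozenset(), 0
    for r in range(1, len(nodes)):
        for combo in combinations(nodes, r):
            val = cut_value(combo, lo, up, f_pi)
            if val > best_val:
                best, best_val = frozenset(combo), val
    return best, best_val

# Hypothetical modified bounds: arc 0 -> 1 is pinned at 3, the others are free.
nodes = [0, 1, 2]
lo = {(0, 1): 3, (1, 2): 0, (2, 0): 0}
up = {(0, 1): 3, (1, 2): 2, (2, 0): 1}
zero = lambda S: 0  # derived function of ordinary conservation
T, value = most_positive_cut(nodes, lo, up, zero)
```

In this instance three units must leave node 0 but at most one unit can return to it, so {0} is a positive cut with value 2.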
We denote the running time of the Feasibility Algorithm by FA. As usual, we
assume that we have an oracle to compute exchange capacities, and denote its
running time by h. Computing an exchange capacity is a submodular function
minimization on a distributive lattice, which can be done via the ellipsoid method
in (strongly) polynomial time [15]. Cunningham and Frank [3, Theorem 7] show
how to derive an oracle for computing exchange capacities for $f^\pi$ from an oracle
for f with the same running time. Fujishige and Zhang [11] show how to extend
the push-relabel algorithm of Goldberg and Tarjan [13] to get FA = $O(n^3 h)$.
Unfortunately, even for min-cost flow, an example of Hassin [16] shows that
it may be necessary to cancel an exponential number of most positive cuts to
achieve optimality. In [29] we get around this difficulty for MCF by relaxing the
optimality conditions by a parameter δ > 0, and then applying scaling to δ. It
then turns out that only a polynomial number of relaxed most positive cuts need
to be canceled for a given value of the parameter, and only a polynomial number
of scaled subproblems need to be solved.
Until now it has been difficult to find a relaxation in the submodular flow case
that would allow the same analysis to apply. Here we introduce a new relaxation
that works. We relax the modified bounds on the arcs in A by widening them
by δ just as in [5]. Define $\ell^{\pi,\delta}_a = \ell^\pi_a - \delta$ and $u^{\pi,\delta}_a = u^\pi_a + \delta$ for every arc $a \in A$.
We also need to relax the submodular bounds on conservation by some
function of δ. Define $f^{\pi,\delta}(S) = f^\pi(S) + \delta|S| \cdot |N - S|$. This relaxation is closely
related to [17]. Since $|S| \cdot |N - S|$ is submodular, $f^{\pi,\delta}$ is submodular.
We can now define the relaxed cut value of S as

$$V^{\pi,\delta}(S) = \ell^{\pi,\delta}(\Delta^+_A S) - u^{\pi,\delta}(\Delta^-_A S) - f^{\pi,\delta}(S).$$

We say that node potential π is δ-optimal if $V^{\pi,\delta}(S) \le 0$ holds for every cut S.
Thus π is 0-optimal if and only if it is optimal.
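
The relaxed cut value and the δ-optimality test can be sketched by brute force as follows. The modified bounds and the derived function are assumed inputs, the data are a made-up three-node instance, and the exhaustive enumeration is for illustration only.

```python
from itertools import combinations

def relaxed_cut_value(S, nodes, lo, up, f_pi, delta):
    """Cut value with arc bounds widened by delta and the set function
    increased by delta * |S| * |N - S|."""
    S = frozenset(S)
    leaving = sum(lo[a] - delta for a in lo if a[0] in S and a[1] not in S)
    entering = sum(up[a] + delta for a in up if a[0] not in S and a[1] in S)
    relaxed_f = f_pi(S) + delta * len(S) * (len(nodes) - len(S))
    return leaving - entering - relaxed_f

def is_delta_optimal(nodes, lo, up, f_pi, delta, tol=1e-9):
    """True if no cut has positive relaxed value (exhaustive check)."""
    return all(
        relaxed_cut_value(combo, nodes, lo, up, f_pi, delta) <= tol
        for r in range(len(nodes) + 1)
        for combo in combinations(nodes, r)
    )

# Hypothetical modified bounds and a zero derived function.
nodes = [0, 1, 2]
lo = {(0, 1): 3, (1, 2): 0, (2, 0): 0}
up = {(0, 1): 3, (1, 2): 2, (2, 0): 1}
zero = lambda S: 0
```

On this instance the cut {0} has unrelaxed value 2, two crossing arcs, and |S|·|N − S| = 2, so its relaxed value is 2 − 4δ and the potentials are δ-optimal exactly when δ ≥ 0.5.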
To check if π is δ-optimal, we need to see if there is a feasible flow in the
network with bounds $\ell^{\pi,\delta}$, $u^{\pi,\delta}$, and $f^{\pi,\delta}$. It turns out to be more convenient to
move the δ relaxation off $f^{\pi,\delta}$ onto a new set of arcs. Define $\hat G^\pi = (N, A \cup E)$ to
be the directed graph obtained from G by adding $E = \{i \to j \mid i, j \in N,\ i \ne j\}$
to the arc set. Extend the bounds $\ell^\pi$, $u^\pi$, $\ell^{\pi,\delta}$ and $u^{\pi,\delta}$ from A to E by setting
$\ell^\pi_e = u^\pi_e = 0$, $\ell^{\pi,\delta}_e = 0$ and $u^{\pi,\delta}_e = \delta$ for every $e \in E$. For convenience set
$I = A \cup E$ and $m' = |A \cup E| = m + n(n - 1)$.
Now checking $\ell^{\pi,\delta}$, $u^{\pi,\delta}$ and $f^{\pi,\delta}$ for feasibility on G (with arc set A) is
equivalent to checking $\ell^{\pi,\delta}$, $u^{\pi,\delta}$, and $f^\pi$ for feasibility on $\hat G^\pi$ (with arc set
$I = A \cup E$). Also, the relaxed cut value on G can be re-expressed as

$$V^{\pi,\delta}(S) = \ell^{\pi,\delta}(\Delta^+_I S) - u^{\pi,\delta}(\Delta^-_I S) - f^\pi(S).$$

If there is a feasible submodular flow in $\hat G^\pi$, then π is δ-optimal. Otherwise, the
Feasibility Algorithm gives us a node subset T that maximizes $V^{\pi,\delta}(S)$, a relaxed
most positive cut or δ-MPC. When π is not δ-optimal, the Feasibility Algorithm
also gives us an optimal “max flow” ϕ on $\hat G^\pi$ and a base x in $B(f^\pi)$ such that
$x_i \le \partial_I\varphi(\{i\})$ for $i \in T$ and $x_i \ge \partial_I\varphi(\{i\})$ for $i \notin T$.
For Section 5 we will need to consider max mean cuts. The mean value of
node subset S is

$$\bar V^\pi(S) \equiv \frac{V^\pi(S)}{|\Delta_A S| + |S| \cdot |N - S|}.$$

A max mean cut attains the maximum in $\delta(\pi) \equiv \max_S \bar V^\pi(S)$. By standard
LP duality arguments δ(π) also equals the minimum δ such that there is a
feasible flow in $\hat G^\pi$ with bounds $\ell^{\pi,\delta}$, $u^{\pi,\delta}$, and $f^\pi$. Define U to be the maximum
absolute value of any $\ell_a$, $u_a$, or $f(\{i\})$. A max mean cut can be computed
using $O\left(\min\left\{m',\ \frac{\log(nU)}{1 + \log\log(nU) - \log\log n}\right\}\right)$ calls to the Feasibility Algorithm in the
framework of Newton’s Algorithm, see Ervolina and McCormick [22] or Radzik
[26].
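
The link between the mean value of a cut and δ-optimality can also be seen directly from the definitions, without invoking duality. The following short derivation is a sketch based only on the definitions of the relaxed bounds and of the relaxed set function given in Section 3; it assumes S is a proper nonempty subset, so that the denominator is positive.

```latex
\begin{align*}
V^{\pi,\delta}(S)
  &= \sum_{a \in \Delta^+_A S} (\ell^\pi_a - \delta)
   - \sum_{a \in \Delta^-_A S} (u^\pi_a + \delta)
   - f^\pi(S) - \delta\,|S|\cdot|N - S| \\
  &= V^\pi(S) - \delta\,\bigl(|\Delta_A S| + |S|\cdot|N - S|\bigr).
\end{align*}
% Hence V^{\pi,\delta}(S) \le 0 holds exactly when \bar V^\pi(S) \le \delta,
% so \pi is \delta-optimal if and only if \delta \ge \max_S \bar V^\pi(S) = \delta(\pi).
```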

4 Cut Cancellation

We will start out with δ large and drive δ towards zero, since π is 0-optimal if
and only if it is optimal. The next lemma says that δ need not start out too big,
and need not end up too small. Its proof is similar to [5, Lemma 5.1].

Lemma 2. Suppose that ℓ, u, and f are integral. Then any node potentials π
are 2U-optimal. Moreover, when $\delta < 1/m'$, any δ-optimal node potentials are
optimal.

Our relaxed δ-MPC Canceling algorithm will start with δ = 2U and will
execute scaling phases, where each phase first sets δ := δ/2. The input to a
phase will be a 2δ-optimal set of node potentials from the previous phase, and
its output will be a δ-optimal set of node potentials. Lemma 2 says that after
O(log(nU)) scaling phases we have $\delta < 1/m'$, and we are optimal. Within a
scaling phase we use the Feasibility Algorithm to find a δ-MPC T. We then
cancel T by adding a constant step length β to $\pi_i$ for each node in T to get π′.
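
The number of scaling phases follows from simple halving. As a rough sketch, the helper below (a hypothetical function, not part of the paper) starts from δ = 2U, halves δ at the start of each phase, and stops once δ has dropped below 1/m′ with m′ = m + n(n − 1); the actual work inside a phase is omitted.

```python
def count_scaling_phases(U, n, m):
    """Count halvings of delta, starting at 2U, until delta < 1/m'."""
    m_prime = m + n * (n - 1)   # arcs of A plus all ordered node pairs in E
    delta = 2.0 * U
    phases = 0
    while delta >= 1.0 / m_prime:
        delta /= 2.0            # each phase first sets delta := delta / 2
        phases += 1             # (cut cancellations for this delta go here)
    return phases
```

Since δ shrinks geometrically from 2U to below 1/m′, the count is about log2(2U·m′), which is O(log(nU)) when m is polynomial in n.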
In ordinary min-cost flow we choose the step length β based on the first arc
in A whose reduced cost hits zero (as long as the associated flow is within the
bounds, see [29, Figure 2]). This bound on β is

$$\eta \equiv \min\left\{
\begin{array}{l}
\min\{|c^\pi_a| \mid a \in \Delta^+_A T,\ c^\pi_a < 0,\ \varphi_a \ge \ell_a\} \\
\min\{c^\pi_a \mid a \in \Delta^-_A T,\ c^\pi_a > 0,\ \varphi_a \le u_a\}
\end{array}
\right\}.$$

(Note that optimality of T implies that every arc a with $\varphi_a \ge \ell_a$ has negative
reduced cost, and every arc a with $\varphi_a \le u_a$ has positive reduced cost.) We say
that an arc achieving the min for η determines η.
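
The bound η can be computed by a single pass over the arcs crossing T. The sketch below follows the two inner minima in the definition of η; the function name, the dictionary-based inputs, and the example data are illustrative assumptions, and the function returns infinity when no arc qualifies.

```python
import math

def step_bound_eta(T, reduced_cost, flow, lower, upper):
    """Smallest |reduced cost| over qualifying arcs leaving or entering T."""
    T = set(T)
    candidates = []
    for a, rc in reduced_cost.items():
        i, j = a
        # Arc leaving T with negative reduced cost and flow at or above l_a.
        if i in T and j not in T and rc < 0 and flow[a] >= lower[a]:
            candidates.append(-rc)
        # Arc entering T with positive reduced cost and flow at or below u_a.
        if i not in T and j in T and rc > 0 and flow[a] <= upper[a]:
            candidates.append(rc)
    return min(candidates, default=math.inf)

# Hypothetical data for the cut T = {0}.
rc = {(0, 1): -2.0, (2, 0): 3.0, (0, 2): -1.0}
flow = {(0, 1): 1.0, (2, 0): 1.0, (0, 2): -1.0}
lower = {a: 0.0 for a in rc}
upper = {a: 4.0 for a in rc}
eta = step_bound_eta({0}, rc, flow, lower, upper)
```

In the example the arc 0 → 2 is skipped because its flow lies below its lower bound, so η is determined by the arc 0 → 1.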
Here we must also worry about the level set structure of π changing during
the cancel. We need to increase flows on jumping arcs leaving T so that the $L^{\pi'}_k$
will be tight. We try to do this by decreasing flow on arcs of $\Delta^-_E T$, all of which
have flow δ since T is tight. If the exchange capacity of such an arc is at most δ
we can do this. Otherwise, this E-arc will determine β, and the large exchange
capacity will allow us to show that the algorithm makes sufficient progress.
These considerations lead to the subroutine AdjustFlow below. It takes as
input the optimal max flow ϕ from the Feasibility Algorithm, and its associated
base $x \in B(f^\pi)$. It computes an update $\psi \in \mathbb{R}^E$ to ϕ and base x′ so that x′ will
be the base associated with ϕ′ ≡ ϕ − ψ. Note that in AdjustFlow we always
compute $\tilde r(x, i, j)$ w.r.t. f, never w.r.t. $f^\pi$, so that jumping arcs are w.r.t. f, not
$f^\pi$. We call an arc that determines β a determining arc.

Algorithm AdjustFlow:
begin
    ψ_e := 0 for e ∈ E;
    H := {j → i | i ∈ T, j ∈ N − T, 0 < π_j − π_i < η};
    while H ≠ ∅ do
    begin
        x′ := x − ∂_E ψ;
        select e = j → i ∈ H with minimum π_j − π_i;
        set H := H − {e};
        if r̃(x′, i, j) < δ then ψ_e := r̃(x′, i, j);
        else return β := π_j − π_i [β is determined by jumping arc j → i];
    end
    return β := η [β is determined by the A-arc determining η];
end.
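
The AdjustFlow subroutine can be sketched in Python as follows. This is a hedged transcription of the pseudocode, not the authors' implementation: `exchange_capacity` is a hypothetical oracle callable `(x, i, j) -> float` supplied by the caller, nodes are assumed to be comparable labels (for example integers), and the base x′ is maintained incrementally, using the fact that subtracting the boundary of ψ for the arc j → i adds ψ to x′ at i and removes it at j.

```python
def adjust_flow(T, nodes, pi, x, eta, delta, exchange_capacity):
    """Return (beta, psi, x_new) following the AdjustFlow pseudocode."""
    T = set(T)
    psi = {}                     # flow updates on E-arcs (j, i), read as j -> i
    x_new = dict(x)              # plays the role of x' = x - boundary_E(psi)
    # Candidate E-arcs j -> i entering T with potential gap strictly in (0, eta).
    H = sorted(
        (pi[j] - pi[i], j, i)
        for i in T for j in nodes
        if j not in T and 0 < pi[j] - pi[i] < eta
    )
    for gap, j, i in H:          # process arcs in order of increasing gap
        cap = exchange_capacity(x_new, i, j)
        if cap < delta:
            psi[(j, i)] = cap
            x_new[i] += cap      # pushing psi on j -> i raises x' at i
            x_new[j] -= cap      # and lowers it at j
        else:
            return gap, psi, x_new   # beta determined by jumping arc j -> i
    return eta, psi, x_new           # beta determined by the A-arc behind eta

# Usage with a stub oracle that always reports capacity 0.5 (an assumption).
stub = lambda x, i, j: 0.5
pi = {0: 0.0, 1: 1.0, 2: 3.0}
x0 = {0: 0.0, 1: 0.0, 2: 0.0}
beta_a, psi_a, x_a = adjust_flow({0}, [0, 1, 2], pi, x0, 2.0, 1.0, stub)
beta_b, psi_b, x_b = adjust_flow({0}, [0, 1, 2], pi, x0, 2.0, 0.25, stub)
```

In the first call the only candidate arc 1 → 0 has capacity below δ, so it is saturated and β = η; in the second call the capacity is at least δ, so that arc becomes the determining arc and β equals its potential gap.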
The full description of canceling a cut is now: Use the Feasibility Algorithm
to find δ-MPC T, max flow ϕ, and base x. Run AdjustFlow to modify ϕ to
ϕ′ = ϕ − ψ and x to $x' = x - \partial_E\psi$, and to choose β. Finally, increase $\pi_i$ by β for all
$i \in T$. A scaling phase cancels δ-MPCs in this manner until π is δ-optimal. The
next lemma shows that canceling a δ-MPC cannot increase the δ-MPC value,
and that $V^{\pi,\delta}(S)$ will decrease by at least δ under some circumstances.

Lemma 3. Suppose we cancel a δ-MPC T for π to get a node potential π′. Then
$V^{\pi',\delta}(S) \le V^{\pi,\delta}(T)$ holds for any cut S.

Proof. It is easy to show similarly to [29, Lemma 5.3] that $\ell^{\pi',\delta}_a \le \varphi_a \le u^{\pi',\delta}_a$
holds for every $a \in A$. As a result of AdjustFlow, we obtain a flow $\psi \in \mathbb{R}^E$
with $0 \le \psi_e < \delta$ for $e \in E$. Put $\varphi'_a = \varphi_a$ for $a \in A$ and $\varphi'_e = \varphi_e - \psi_e$ for $e \in E$.
To prove that ϕ′ is feasible for $B(f^{\pi'})$ we need:
Claim: $x' = x - \partial_E\psi \in B(f^{\pi'})$.
Proof: If the initial H = ∅ in AdjustFlow (i.e., $\pi_j > \pi_i$ implies that $\pi_j \ge \pi_i + \eta$),
then ψ ≡ 0 and so x′ = x. In this case it can be shown that
$f^{\pi'}(S) = f^\pi(S \cap T) + f^\pi(S \cup T) - f^\pi(T)$. Since $x(T) = x'(T) = f^\pi(T)$, we have
$x'(S) = x'(S \cap T) + x'(S \cup T) - x'(T) \le f^\pi(S \cap T) + f^\pi(S \cup T) - f^\pi(T) = f^{\pi'}(S)$.
Suppose instead that the initial H is non-empty. Here we know at least that
$x' \in B(f)$ (since $\psi_{ji}$ is never bigger than $\tilde r(x', i, j)$). Denote $T \cap (L^\pi_k - L^\pi_{k-1})$ by
$T_k$, and $(L^\pi_k - L^\pi_{k-1}) - T$ by $\bar T_k$, so that the $T_k$ partition T, and the $\bar T_k$ partition
N − T. Then a typical level set of π′ looks like
$L' = (\bigcup_{k=1}^{q} T_k) \cup (\bigcup_{k=1}^{p} \bar T_k) = L^\pi_p \cup (T \cap L^\pi_q)$ for some $0 \le p < q \le h$.
Now both T and $L^\pi_i$ are ϕ-tight for $f^\pi$, so their union and intersection are
both ϕ-tight for $f^\pi$. By the same reasoning,
$\hat T_i \equiv (T \cap L^\pi_i) \cup L^\pi_{i-1} = (\bigcup_{k=1}^{i} T_k) \cup (\bigcup_{k=1}^{i-1} \bar T_k)$
is also ϕ-tight for $f^\pi$. Since every arc of H starts in some $\bar T_j$ and ends
in some $T_i$ with j < i, no arc of H crosses any $\hat T_i$, so each $\hat T_i$ is also ϕ′-tight for
$f^\pi$. Each $\hat T_i$ nests with the $L^\pi_k$, so we have $f^\pi(\hat T_i) = f(\hat T_i)$, so each $\hat T_i$ is in fact
ϕ′-tight for f. This implies that the only possible jumping arcs entering level set
L′ of π′ are those from $\bar T_j$ to $T_i$ for $p < j < i \le q$. But these jumping arcs all
belong to H and were removed by AdjustFlow. Thus L′ is ϕ′-tight for f, and
so $x' \in B(f^{\pi'})$.
Since $\psi_e > 0$ implies $\varphi_e = \delta$, every $e \in E$ satisfies $0 \le \varphi'_e \le \delta$. Therefore we
have

$$
\begin{aligned}
V^{\pi',\delta}(S) &= \ell^{\pi',\delta}(\Delta^+_I S) - u^{\pi',\delta}(\Delta^-_I S) - f^{\pi'}(S) && \text{(def'n of } V^{\pi',\delta}(S)\text{)} \\
&\le \partial_I\varphi'(S) - x'(S) && \text{(feas. of } \varphi',\ x' \in B(f^{\pi'})\text{)} \\
&= \partial_I\varphi(S) - x(S) && \text{(def'n of } \varphi', x'\text{)} \\
&\le \partial_I\varphi(T) - x(T) && \text{(}T\text{ a } \delta\text{-MPC)} \\
&= \ell^{\pi,\delta}(\Delta^+_I T) - u^{\pi,\delta}(\Delta^-_I T) - f^\pi(T) && \text{(}T\text{ is } \varphi\text{-tight for } f^\pi\text{)} \\
&= V^{\pi,\delta}(T). && \text{(def'n of } V^{\pi,\delta}(T)\text{)}
\end{aligned}
$$
□

Corollary 1. Suppose we cancel a δ-MPC T for π to get a node potential π′. If
the determining arc $a \in A \cup E$ crosses S, then we have $V^{\pi',\delta}(S) \le V^{\pi,\delta}(T) - \delta$.

Proof. If the determining arc $a \in A$, then
$\ell^{\pi',\delta}_a + \delta = \ell_a \le \varphi_a \le u_a = u^{\pi',\delta}_a - \delta$, and hence
$\ell^{\pi',\delta}(\Delta^+_I S) - u^{\pi',\delta}(\Delta^-_I S) \le \partial_I\varphi'(S) - \delta$. If $a \in \Delta^+_E S$, then
$\varphi_a = \delta = \ell^{\pi',\delta}_a + \delta$, and hence
$\ell^{\pi',\delta}(\Delta^+_I S) - u^{\pi',\delta}(\Delta^-_I S) \le \partial_I\varphi'(S) - \delta$. If $a \in \Delta^-_E S$,
then $f(S) - x'(S) \ge \delta$, and hence $f^{\pi'}(S) \ge x'(S) + \delta$. In either case we have
$V^{\pi',\delta}(S) \le \partial_I\varphi'(S) - x'(S) - \delta$, which implies $V^{\pi',\delta}(S) \le V^{\pi,\delta}(T) - \delta$. □
Note that we have two different notions of “closeness to optimality” in this
algorithm. At the outer level of scaling phases we drive δ towards zero (and so
π towards optimality), and inside a phase we drive the δ-MPC value towards
zero (and so π towards δ-optimality). Observe that the difference between ∂ϕ
and x in the Feasibility Algorithm is an inner relaxation of the condition that
π is δ-optimal, in that we keep a flow satisfying the bounds `π,δ and uπ,δ and a
base x in B(f π ). As the phase progresses the difference is driven to zero, until
the final flow proves δ-optimality of the final π.
We can now prove the key lemma in the convergence proof of δ-MPC Can-
celing, which shows that we must be in a position to apply Lemma 3 reasonably
often.

Lemma 4. After at most n cut cancellations, the value of a δ-MPC decreases
by at least δ.

Proof. Suppose that the first cancellation has initial node potentials $\pi^0$ and
cancels δ-MPC $T^1$ to get $\pi^1$, the next cancels $T^2$ to get $\pi^2$, . . . , and the nth
cancels $T^n$ to get $\pi^n$. Each cancellation makes at least one arc into a determining
arc. Consider the subgraph of these determining arcs. If a cut creates a
determining arc whose ends are in the same connected component of this subgraph, then
this cut must be crossed by a determining arc from an earlier cut. We can avoid
this only if each new determining arc strictly decreases the number of connected
components in the subgraph. This can happen at most n − 1 times, so it must
happen at least once within n iterations.
Let k be an iteration where $T^k$ shares a determining arc with an earlier cut
$T^h$. By Corollary 1 applied to $T = T^h$ and $S = T^k$, we have
$V^{\pi^{h-1},\delta}(T^h) \ge V^{\pi^h,\delta}(T^k) + \delta$. Note that the δ-MPC value at iteration i is
$V^{\pi^{i-1},\delta}(T^i)$. If $V^{\pi^h,\delta}(T^k) \ge V^{\pi^{k-1},\delta}(T^k)$, Lemma 3 says that

$$V^{\pi^0,\delta}(T^1) \ge V^{\pi^{h-1},\delta}(T^h) \ge V^{\pi^h,\delta}(T^k) + \delta \ge V^{\pi^{k-1},\delta}(T^k) + \delta \ge V^{\pi^{n-1},\delta}(T^n) + \delta.$$

If instead $V^{\pi^h,\delta}(T^k) < V^{\pi^{k-1},\delta}(T^k)$, then let p be the latest iteration between
h and k with $V^{\pi^{p-1},\delta}(T^k) < V^{\pi^p,\delta}(T^k)$. The only way for the value of $T^k$ to
increase like this is if there is an arc $a \in A$ with $c^{\pi^{p-1}}_a = 0$ that crosses $T^k$ in the
reverse orientation to its orientation in $T^p$, or $f^{\pi^{p-1},\delta}(T^k) > f^{\pi^p,\delta}(T^k)$ holds. Let
$\varphi^p$ be a flow proving optimality of $T^p$. If $a \in \Delta^+_A T^p \cap \Delta^-_A T^k$ ($a \in \Delta^-_A T^p \cap \Delta^+_A T^k$)
then we have $u^{\pi^p}_a = \varphi^p_a + 2\delta \ge \varphi^p_a + \delta$ ($\ell^{\pi^p}_a = \varphi^p_a - 2\delta \le \varphi^p_a - \delta$). When
$f^{\pi^{p-1},\delta}(T^k) > f^{\pi^p,\delta}(T^k)$ holds, there exist $i, j \in N$ such that $i \in T^k \setminus T^p$,
$j \in T^p \setminus T^k$. It follows from $i \to j \in \Delta^-_I T^p$ and $j \to i \in \Delta^+_I T^p$ that
$\varphi^p_e = \delta = \ell^{\pi^p,\delta}_e + \delta$ for $e = i \to j$ and $\varphi^p_{e'} = 0 = u^{\pi^p,\delta}_{e'} - \delta$ for $e' = j \to i$. In either case,
we have $\ell^{\pi^p,\delta}(\Delta^+_I T^k) - u^{\pi^p,\delta}(\Delta^-_I T^k) \le \partial_I\varphi^p(T^k) - \delta$. Thus the equations of the proof
of Lemma 3 apply, showing that $V^{\pi^{p-1},\delta}(T^p) - \delta \ge V^{\pi^p,\delta}(T^k)$. Then Lemma 3
and the choice of p say that

$$V^{\pi^0,\delta}(T^1) \ge V^{\pi^{p-1},\delta}(T^p) \ge V^{\pi^p,\delta}(T^k) + \delta \ge V^{\pi^{k-1},\delta}(T^k) + \delta \ge V^{\pi^{n-1},\delta}(T^n) + \delta.$$
□
u

Lemma 5. The value of every δ-MPC in a scaling phase is at most m′δ.

Proof. Let ϕ prove the 2δ-optimality of the initial π in a phase, so that ϕ
violates the bounds $\ell^\pi$ and $u^\pi$ by at most 2δ on every arc of A ∪ E, and $\varphi_{si} = 0$
for all i. To get an initial flow ϕ′ to begin the next phase that violates the bounds
by at most δ, we need to change ϕ by at most δ on each arc of A ∪ E. Then if
we set $\varphi'_{si} = \delta|\Delta^+_I\{i\}|$ for all i, ϕ′ will satisfy $f^\pi$ also. The sum $\sum_{i \in N} \varphi'_{si}$ is at
most m′δ, and this is an upper bound on the value of the first δ-MPC in this
phase. By Lemma 3, it is then also a bound on the value of every δ-MPC in the
phase. □

Putting Lemmas 4 and 5 together yields our first bound on the running time
of δ-MPC Canceling:

Theorem 2. A scaling phase of δ-MPC Canceling cancels at most m′n δ-MPCs.
Thus the running time of δ-MPC Canceling is $O(n^3 \log(nU)\,\mathrm{FA}) = O(n^6 h \log(nU))$.

Proof. Lemma 5 shows that the δ-MPC value of the first cut in a phase is
at most m′δ. It takes at most n iterations to reduce this by δ, so there are at
most $m'n = O(n^3)$ iterations per phase. The time per iteration is dominated by
computing a δ-MPC, which is O(FA). The number of phases is O(log(nU)) by
Lemma 2. □
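
The arithmetic behind the bounds in Theorem 2 can be spelled out as follows. This is a sketch under the assumption, not stated explicitly in the text, that G has no parallel arcs, so that m ≤ n(n − 1); it combines the per-phase count with the number of phases from Lemma 2 and the bound FA = O(n^3 h) quoted in Section 3.

```latex
\begin{align*}
m' n &= \bigl(m + n(n-1)\bigr)\, n \;\le\; 2n^2(n-1) \;=\; O(n^3)
      && \text{(cancellations per phase)} \\
\text{total time} &= O(\log(nU)) \cdot O(n^3) \cdot \mathrm{FA}
      \;=\; O\bigl(n^3 \log(nU)\,\mathrm{FA}\bigr)
      && \text{(phases} \times \text{cancellations} \times \text{cost each)} \\
  &= O\bigl(n^3 \log(nU) \cdot n^3 h\bigr) \;=\; O\bigl(n^6 h \log(nU)\bigr)
      && \text{(using } \mathrm{FA} = O(n^3 h)\text{)}
\end{align*}
```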
5 A Strongly Polynomial Bound

We first prove the following lemma. Such a result was first proved by Tardos [30]
for MCF, and this version is an extension of dual versions by Fujishige [9] and
Ervolina and McCormick [5].
Lemma 6. Suppose that flow ϕ′ proves that π′ is δ′-optimal. If arc $a \in A$
satisfies $\varphi'_a < u_a - (m' + 1)\delta'$, then all optimal π∗ have $c^{\pi^*}_a \ge 0$. If arc $a \in A$
satisfies $\varphi'_a > \ell_a + (m' + 1)\delta'$, then all optimal π∗ have $c^{\pi^*}_a \le 0$. If
$\partial_I\varphi'(T') > f^{\pi'}(T') + (m'n/2)\delta'$ for some π′ and some $T' \subseteq N$, then there is an E-arc
$i \to j$ leaving some level set of π′ such that all optimal π∗ have $\pi^*_i \le \pi^*_j$.
Proof. Note that $\varphi'$ proving $\delta'$-optimality of $\pi'$ implies that there is no jumping
arc entering any $L^{\pi'}_k$ for any k. Since $x' \equiv \partial_I\varphi' \in B(f^{\pi'})$, this says that $x'$ is
a $\pi'$-maximum base of B(f).
Now change $\varphi'$, a flow on A ∪ E feasible for $\ell^{\pi',\delta'}$ and $u^{\pi',\delta'}$, into flow $\hat\varphi$ on just
A, feasible for $\ell^{\pi'}$ and $u^{\pi'}$, by getting rid of $\varphi'_e$ for e ∈ E, and by changing $\varphi'_a$ by
at most $\delta'$ on each a ∈ A. Note that this $\hat\varphi$ will not in general satisfy $\partial_A\hat\varphi \in B(f)$,
but that $\hat\varphi$ otherwise satisfies all optimality conditions: it satisfies the bounds
and is complementary slack with $\pi'$. Define $N^+ = \{i \in N \mid x'_i > \partial_A\hat\varphi(\{i\})\}$ and
$N^- = \{i \in N \mid x'_i < \partial_A\hat\varphi(\{i\})\}$, so that

$x'(N^+) \le m'\delta' + \partial_A\hat\varphi(N^+)$. (2)

We now apply the successive shortest path algorithm for submodular flow
of Fujishige [8] starting from $\hat\varphi$. (As originally stated, this algorithm is finite
only for integer data, but the lexicographic techniques of Schönsleben [27] and
Lawler and Martel [20] show that it can be made finite for any data.) This
algorithm looks for an augmenting path from a node $i \in N^+$ to a node $j \in N^-$,
where residual capacities on A-arcs come from $\hat\varphi$, and residual capacities on
jumping arcs come from $x'$. It chooses a shortest augmenting path (using the
current reduced costs as lengths; such a path can be shown to always exist) and
augments flow along the path, updating $\pi'$ by the shortest path distances, and $x'$
as per the jumping arcs. This update maintains the properties that the current
$\hat\varphi$ satisfies the bounds and is complementary slack with the current $\pi'$, and the
current $x'$ belongs to B(f). The algorithm terminates with optimal flow $\varphi^*$ once
the boundary of the current $\hat\varphi$ coincides with the current $x'$. By (2), the total
amount of flow pushed by this algorithm is at most $m'\delta'$.
This implies that for each a ∈ A, $\hat\varphi_a$ differs from $\varphi^*_a$ by at most $m'\delta'$, so $\varphi'_a$
differs from $\varphi^*_a$ by at most $(m'+1)\delta'$. In particular, if $\varphi'_a < u_a - (m'+1)\delta'$, then
$\varphi^*_a < u_a$, implying that $c^{\pi^*}_a \ge 0$, and similarly for $\varphi'_a > \ell_a + (m'+1)\delta'$.
Suppose that P is an augmenting path chosen by the algorithm, and that
flow is augmented by amount $\tau_P$ along P. Since P has at most n/2 jumping
arcs, the boundary of any S ⊆ N changes by at most $(n/2)\tau_P$ due to P. Since
$\sum_P \tau_P \le m'\delta'$ by (2), the total change in $\partial_A\hat\varphi(S)$ during the algorithm is at most
$(m'n/2)\delta'$. Since $|\partial_I\varphi'(S) - \partial_A\hat\varphi(S)| \le m'\delta'$, the total change in $\partial_I\varphi'(S)$ is at
most $(m'n/2)\delta'$. Thus $\partial_I\varphi'(T^0) > f^{\pi^0}(T^0) + (m'n/2)\delta'$ implies that $\partial_A\varphi^*(T^0) > f^{\pi^0}(T^0)$.
This says that some level set of $\pi^0$ is not $\varphi^*$-tight. This implies that
there is some E-arc i → j with $\pi^0_i > \pi^0_j$ but $\pi^*_i \le \pi^*_j$. □

We now modify our algorithm a little bit. We divide our scaling phases into
blocks of $\log_2(m'+1)$ phases each. At the beginning of each block we compute a
max mean cut $T^0$ with mean value $\delta^0 = \delta(\pi^0)$ and cancel $T^0$ (including calling
AdjustFlow). This ensures that our current flow is $\delta^0$-optimal, so we set $\delta = \delta^0$
and start the block of scaling phases. It will turn out that only $2m'$ blocks are
sufficient to attain optimality.

Theorem 3. This modified version of the algorithm takes $O(n^5 \log n\,\mathrm{FA}) = O(n^8 h \log n)$ time.

Proof. Let $T^0$ be the max mean cut canceled at the beginning of the block,
with associated node potential $\pi^0$, flow $\varphi^0$, and mean value $\delta^0$. Complementary
slackness for $T^0$ implies that $\varphi^0_a = u^{\pi^0}_a + \delta^0$ for all $a \in \Delta^-_I T^0$, that $\varphi^0_a = \ell^{\pi^0}_a - \delta^0$
for all $a \in \Delta^+_A T^0$, that $\varphi^0_a = \ell^{\pi^0}_a$ for all $a \in \Delta^+_I T^0$, and that $\partial_I\varphi^0(T^0) = f^{\pi^0}(T^0)$.
Let $\pi'$, $\varphi'$, and $\delta'$ be the similar values after the last phase in the block. Since
each scaling phase cuts δ in half, we have $\delta' < \delta^0/(m'+1)$.
Subtracting $\varphi'(\Delta^+_I T^0) - \varphi'(\Delta^-_I T^0) - \partial_I\varphi'(T^0) = 0$ from $V^{\pi^0}(T^0)$ yields

$$(|\Delta_A T^0| + |T^0| \cdot |N - T^0|)\,\delta^0 = V^{\pi^0}(T^0)
= (\ell^{\pi^0} - \varphi')(\Delta^+_I T^0) + (\varphi' - u^{\pi^0})(\Delta^-_I T^0) + (\partial_I\varphi'(T^0) - f^{\pi^0}(T^0)). \qquad (3)$$

Now apply Lemma 6 to $\varphi'$ and $\pi'$. If the term for arc a of $(\ell^{\pi^0} - \varphi')(\Delta^+_A T^0)$ is at
least $\delta^0 > (m'+1)\delta'$, then we must have that $\ell^{\pi^0}_a = u_a$ and so $\varphi'_a < u_a - (m'+1)\delta'$,
and we can conclude that $c^{\pi^*}_a \ge 0$. But each a in $\Delta^+_A T^0$ had $c^{\pi^0}_a < 0$, so this is
a new sign constraint on $c^{\pi}$. The case for terms of $(\varphi' - u^{\pi^0})(\Delta^-_A T^0)$ is similar.
Suppose instead that all the terms in the $\Delta^+_A T^0$ and $\Delta^-_A T^0$ sums of (3)
are at most $\delta^0$. The total of all the E-arc terms is at most $|T^0| \cdot |N - T^0|\,\delta'$.
Therefore the only possibility left to achieve the large left-hand side of (3) is
to have $\partial_I\varphi'(T^0) > f^{\pi^0}(T^0) + (m'n/2)\delta'$. Lemma 6 says that in this case there
must be a jumping arc i → j leaving some level set of $\pi^0$ such that all optimal
$\pi^*$ have $\pi^*_i \le \pi^*_j$. Since $\pi^0_i$ was larger than $\pi^0_j$, this is a new sign restriction on
π.

In either case each block imposes a new sign restriction on $c^{\pi}_a$ for some I-arc
a. At most $2m'$ such sign restrictions can be imposed before $\pi^*$ is completely
determined, so after at most $2m' = O(n^2)$ blocks we must be optimal. Each
block requires $\log(m'+1) = O(\log n)$ scaling phases. The proof of Theorem 2
shows that each scaling phase costs $O(n^3\,\mathrm{FA})$ time exclusive of computing the
max mean cut. The time to compute a max mean cut is $O(m'\,\mathrm{FA})$, which is not
a bottleneck. □
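The final count of this proof can be collected into one line. This is a restatement of the three bounds quoted in the proof, not a new result; the ceiling on the logarithm is an assumption made here so that the number of phases per block is an integer.

```latex
\underbrace{2m'}_{\text{blocks}}
\cdot \underbrace{\lceil \log_2(m'+1) \rceil}_{\text{phases per block}}
\cdot \underbrace{O(n^{3}\,\mathrm{FA})}_{\text{time per phase}}
= O(n^{2}) \cdot O(\log n) \cdot O(n^{3}\,\mathrm{FA})
= O\bigl(n^{5} \log n\,\mathrm{FA}\bigr),
\qquad
\underbrace{2m' \cdot O(m'\,\mathrm{FA})}_{\text{max mean cuts}} = O(n^{4}\,\mathrm{FA}).
```

The second term is the total cost of the max mean cut computations over all blocks, which is dominated by the first.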

6 Directions for Further Research

These techniques should lead to a (strongly) polynomial maximum mean cut can-
celing algorithm for the submodular flow problem. It should be very straightfor-
ward to extend this algorithm to the separable convex submodular flow problem
as was done for its cycle canceling cousin. We are also optimistic about extend-
ing it to the M-convex cost submodular flow problem [24,25]. If we can do this,
then we would have a unified approach to a variety of optimization problems
including separable convex cost optimization in totally unimodular spaces [19].

Acknowledgment

We heartily thank Lisa Fleischer for many valuable conversations about this
paper and helpful comments on previous drafts of it.

References
1. R. K. Ahuja, T. L. Magnanti, and J. B. Orlin: Network Flows — Theory, Algo-
rithms, and Applications, Prentice Hall, 1993.
2. W. Cui and S. Fujishige: A primal algorithm for the submodular flow problem with
minimum-mean cycle selection, J. Oper. Res. Soc. Japan, 31 (1988), 431–440.
3. W. H. Cunningham and A. Frank: A primal-dual algorithm for submodular flows,
Math. Oper. Res., 10 (1985), 251–262.
4. J. Edmonds and R. Giles: A min-max relation for submodular functions on graphs,
Ann. Discrete Math., 1 (1977), 185–204.
5. T. R. Ervolina and S. T. McCormick: Two strongly polynomial cut canceling algo-
rithms for minimum cost network flow, Discrete Appl. Math., 46 (1993), 133–165.
6. A. Frank: Finding feasible vectors of Edmonds-Giles polyhedra, J. Combinatorial
Theory, B36 (1984), 221–239.
7. A. Frank and É. Tardos: An application of simultaneous Diophantine approxima-
tion in combinatorial optimization, Combinatorica, 7 (1987), 49–65.
8. S. Fujishige: Algorithms for solving the independent-flow problems, J. Oper. Res.
Soc. Japan, 21 (1978), 189–204.
9. S. Fujishige: Capacity-rounding algorithm for the minimum-cost circulation prob-
lem: A dual framework of the Tardos algorithm, Math. Programming, 35 (1986),
298–309.
10. S. Fujishige: Submodular Functions and Optimization, North-Holland , 1991.
11. S. Fujishige and X. Zhang: New algorithms for the intersection problem of sub-
modular systems, Japan J. Indust. Appl. Math., 9 (1992), 369–382.
12. S. Fujishige, H. Röck, and U. Zimmermann: A strongly polynomial algorithm for
minimum cost submodular flow problems, Math. Oper. Res., 14 (1989), 60–69.
13. A. V. Goldberg and R. E. Tarjan: A new approach to the maximum flow problem,
J. ACM, 35 (1988), 921–940.
14. A. V. Goldberg and R. E. Tarjan: Finding minimum-cost circulations by canceling
negative cycles, J. ACM, 36 (1989), 873–886.
15. M. Grötschel, L. Lovász, and A. Schrijver: Geometric Algorithms and Combinato-
rial Optimization, Springer-Verlag, 1988.
16. R. Hassin: Algorithm for the minimum cost circulation problem based on maxi-
mizing the mean improvement, Oper. Res. Lett., 12 (1992), 227–233.
17. S. Iwata: A capacity scaling algorithm for convex cost submodular flows,
Math. Programming, 76 (1997), 299–308.
18. S. Iwata, S. T. McCormick, and M. Shigeno: A faster algorithm for minimum cost
submodular flows, Proceedings of the Ninth Annual ACM-SIAM Symposium on
Discrete Algorithms (1998), 167–174.
19. A. V. Karzanov and S. T. McCormick: Polynomial methods for separable convex
optimization in unimodular linear spaces with applications, SIAM J. Comput., 26
(1997), 1245–1275.
20. E. L. Lawler and C. U. Martel: Computing maximal polymatroidal network flows,
Math. Oper. Res., 7 (1982), 334–347.
21. S. T. McCormick and T. R. Ervolina: Cancelling most helpful total submodular
cuts for submodular flow, Integer Programming and Combinatorial Optimization
(Proceedings of the Third IPCO Conference), G. Rinaldi and L. A. Wolsey eds.
(1993), 343–353.
22. S. T. McCormick and T. R. Ervolina: Computing maximum mean cuts, Discrete
Appl. Math., 52 (1994), 53–70.
23. S. T. McCormick, T. R. Ervolina and B. Zhou: Mean canceling algorithms for
general linear programs and why they (probably) don’t work for submodular flow,
UBC Faculty of Commerce Working Paper 94-MSC-011 (1994).
24. K. Murota: Discrete convex analysis, Math. Programming, 83 (1998), 313–371.
25. K. Murota: Submodular flow problem with a nonseparable cost function, Combi-
natorica, to appear.
26. T. Radzik: Newton’s method for fractional combinatorial optimization, Proceed-
ings of the 33rd IEEE Annual Symposium on Foundations of Computer Science
(1992), 659–669; see also: Parametric flows, weighted means of cuts, and fractional
combinatorial optimization, Complexity in Numerical Optimization, P. Pardalos,
ed. (World Scientific, 1993), 351–386.
27. P. Schönsleben: Ganzzahlige Polymatroid-Intersektions Algorithmen, Dissertation,
Eidgenössische Technische Hochschule Zürich, 1980.
28. A. Schrijver: Total dual integrality from directed graphs, crossing families, and
sub- and supermodular functions, Progress in Combinatorial Optimization, W. R.
Pulleyblank, ed. (Academic Press, 1984), 315–361.
29. M. Shigeno, S. Iwata, and S. T. McCormick: Relaxed most negative cycle and
most positive cut canceling algorithms for minimum cost flow, Math. Oper. Res.,
submitted.
30. É. Tardos: A strongly polynomial minimum cost circulation algorithm, Combina-
torica, 5 (1985), 247–255.
31. C. Wallacher and U. Zimmermann: A polynomial cycle canceling algorithm for
submodular flows, Math. Programming, to appear.
32. U. Zimmermann: Negative circuits for flows and submodular flows, Discrete Appl.
Math., 36 (1992), 179–189.
Edge-Splitting Problems with Demands

Tibor Jordán*

BRICS**, Department of Computer Science


University of Aarhus
Ny Munkegade, building 540
8000 Aarhus C, Denmark
[email protected]

Abstract. Splitting off two edges su, sv in a graph G means deleting
su, sv and adding a new edge uv. Let G = (V + s, E) be k-edge-connected
in V (k ≥ 2) and let d(s) be even. Lovász [8] proved that the edges
incident to s can be split off in pairs in such a way that the resulting
graph remains k-edge-connected. In this paper we investigate the
existence of such complete splitting sequences when the set of split edges
has to satisfy some additional requirements. We prove structural
properties of the set of those pairs u, v of neighbours of s for which splitting
off su, sv destroys k-edge-connectivity. This leads to a new method for
solving problems of this type.
By applying this new approach first we obtain a short proof for a re-
cent result of Nagamochi and Eades [9] on planarity-preserving complete
splitting sequences.
Then we apply our structural result to prove the following: let G and
H be two graphs on the same set V + s of vertices and suppose that
their sets of edges incident to s coincide. Let G (H) be k-edge-connected
(l-edge-connected, respectively) in V and let d(s) be even. Then there
exists a pair su, sv which is allowed to split off in both graphs simultane-
ously provided d(s) ≥ 6. If k and l are both even then such a pair exists
for arbitrary even d(s). Using this result and the polymatroid intersec-
tion theorem we give a polynomial algorithm for the problem of simul-
taneously augmenting the edge-connectivity of two graphs by adding a
smallest (common) set of new edges.

1 Introduction
Edge-splitting is a well-known and useful method to solve graph problems which
involve certain edge-connectivity properties. Splitting off two edges su, sv means
deleting su, sv and adding a new edge uv. Such an operation may reduce the local
edge-connectivity between some pairs of vertices and may decrease the global
* Part of this work was done while the author visited the Department of Applied
Mathematics and Physics, Kyoto University, supported by the Monbusho International
Scientific Research Program no. 09044160.
** Basic Research in Computer Science, Centre of the Danish National Research
Foundation.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 273–288, 1999.
c Springer-Verlag Berlin Heidelberg 1999
edge-connectivity of the graph. The essence of the edge splitting method is to


find pairs of edges which can be split off maintaining certain edge-connectivity
properties of the graph. If such a good pair exists then one may reduce the
problem to a smaller graph which can lead to inductive proofs. Another typical
application is the edge-connectivity augmentation problem where splitting off is
an important subroutine in some polynomial algorithms. This connection will
be discussed in detail in Section 5.
Let G = (V + s, E) be a graph which is k-edge-connected in V , that is,
d(X) ≥ k holds for the degree of every ∅ ≠ X ⊂ V. Suppose that d(s) is even
and k ≥ 2. Lovász [8] proved that for every edge su of G there exists an edge
sv for which splitting off the pair su, sv maintains k-edge-connectivity in V . We
call such a pair admissible. By repeated applications of this theorem we can see
that all the edges incident to s can be split off in pairs in such a way that the
resulting graph (on vertex set V ) is k-edge-connected. Such a splitting sequence
which isolates s (and preserves k-edge-connectivity in V ) is called a complete
(admissible) splitting at s.
This result gives no information about the structure of the subgraph (V, F )
induced by the set F of new edges that we obtain by a complete admissible
splitting (except the degree-sequence of its vertices, which is clearly uniquely
determined by G and is the same for every complete splitting). Recent problems
in edge-connectivity augmentation gave rise to edge-splitting problems where
the goal is to find a complete admissible splitting for which the subgraph of the
new edges satisfies some additional requirement.
The goal of this paper is to develop a new method for this kind of edge
splitting problem, to prove new structural results and to give a polynomial
algorithm for a new optimization problem.
The basic idea is to define the non-admissibility graph B(s) of G = (V + s, E)
on the set of neighbours of s by connecting two vertices x, y if and only if the pair
sx, sy is not admissible. We prove that B(s) is 2-edge-connected if and only if
B(s) is a cycle, k is odd and G has a special structure which we call round. This
structural property turns out to be essential in several edge-splitting problems.
Suppose that the additional requirement that the split edges have to meet can be
given by defining a demand graph D(s) on the neighbours of s at every iteration
and demanding that only admissible pairs su, sv (u ≠ v) with uv ∈ E(D(s)) are
allowed to split. Then it is clear that such an allowed pair exists if and only if
D(s) is not a subgraph of B(s). Our method is to compare the structure of the
graphs B(s) and D(s) that may occur at some iteration and show the existence of
a complete admissible splitting satisfying the additional requirement by showing
that D(s) can never be the subgraph of the corresponding B(s).
As a first application, we show how this new method leads to simple proofs
of previous results due to Nagamochi and Eades [9] on planarity-preserving com-
plete admissible splittings and some results of Bang-Jensen, Gabow, Jordán and
Szigeti [1] on partition-constrained complete admissible splittings. Then we use
our structural results to prove the following new “intersection theorem” for ad-
missible splittings: let two graphs G = (V + s, E) and H = (V + s, K) be given,
for which the two sets of edges incident to s in G and H coincide. Let G and H
be k- and l-edge-connected in V , respectively. Then there exists a pair of edges
su, sv which is admissible in G and H (with respect to k and l, respectively)
simultaneously, provided d(s) ≥ 6. This property may not hold if d(s) = 4 but
assuming k and l are both even we prove that a simultaneously admissible pair
always exists, showing that a simultaneously admissible complete splitting exists
as well.
Using these splitting results and the g-polymatroid intersection theorem we
give a min-max theorem and a polynomial algorithm for the simultaneous
edge-connectivity augmentation problem. In this problem two graphs G' = (V, E'),
H' = (V, K') and two integers k and l are given and the goal is to find a
smallest set of edges whose addition makes G' (and H') k-edge-connected
(l-edge-connected, respectively) simultaneously. Our algorithm finds an optimal
solution if k and l are both even. In the remaining cases the size of the solution
does not exceed the optimum by more than one.

1.1 Definitions and Notation

Graphs in this paper are undirected and may contain parallel edges. Let G =
(V, E) be a graph. A subpartition of V is a collection of pairwise disjoint subsets
of V . The subgraph of G induced by a subset X of vertices is denoted by G[X].
A set consisting of a single vertex v is simply denoted by v. An edge joining
vertices x and y is denoted by xy. Sometimes xy will refer to an arbitrary copy
of the parallel edges between x and y but this will not cause any confusion.
Adding or deleting an edge e from a graph G is often denoted by G + e or G − e,
respectively.
For X, Y ⊆ V , d(X, Y ) denotes the number of edges with one endvertex
in X − Y and the other in Y − X. We define the degree of a subset X as
d(X) = d(X, V − X). For example d(v) denotes the degree of vertex v. The
degree-function of a graph G' is denoted by d'. The set of neighbours of v (or
v-neighbours, for short), that is, the set of vertices adjacent to vertex v, is denoted
by N (v). A graph G = (V, E) is k-edge-connected if

d(X) ≥ k for all ∅ ≠ X ⊂ V. (1)
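Condition (1) can be checked directly on small instances. The following is a minimal brute-force sketch, assuming a graph is given as a list of vertex pairs (parallel edges repeated); the function names `d` and `is_k_edge_connected` are illustrative choices, and the enumeration is exponential in |V|, so it is meant only for experimenting with the definitions.

```python
from itertools import combinations

def d(edges, X, Y):
    """d(X, Y): number of edges with one end in X - Y and the other in Y - X."""
    X, Y = set(X), set(Y)
    a, b = X - Y, Y - X
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def is_k_edge_connected(vertices, edges, k):
    """Brute-force test of (1): d(X) >= k for every nonempty proper subset X."""
    V = set(vertices)
    for size in range(1, len(V)):
        for X in combinations(sorted(V), size):
            if d(edges, X, V - set(X)) < k:
                return False
    return True

# A 4-cycle: every cut has at least two edges, and each vertex has degree 2.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_k_edge_connected(range(4), cycle, 2))  # True
print(is_k_edge_connected(range(4), cycle, 3))  # False
```

Doubling every edge of the cycle doubles every cut, which gives a quick sanity check with parallel edges.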

The operation splitting off a pair of edges sv, st at a vertex s means replacing
sv, st by a new edge vt. If v = t then the resulting loop is deleted. We use $G_{v,t}$
to denote the graph obtained after splitting off the edges sv, st in G (the vertex
s will always be clear from the context). A complete splitting at a vertex s (with
even degree) is a sequence of d(s)/2 splittings of pairs of edges incident to s.
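The splitting-off operation is easy to state on an edge multiset. The sketch below is one possible encoding, assuming edges are stored in a `Counter` keyed by sorted endpoint pairs; `edge` and `split_off` are hypothetical helper names, not notation from the paper.

```python
from collections import Counter

def edge(u, v):
    """Canonical key for an undirected edge (endpoints in sorted order)."""
    return (u, v) if u <= v else (v, u)

def split_off(edges, s, v, t):
    """Return the edge multiset of G_{v,t}: drop one copy each of sv and st, add vt.

    `edges` maps canonical edge keys to multiplicities and is not modified.
    If v == t, two parallel copies of sv are removed and the loop is discarded.
    """
    g = Counter(edges)
    for x in (v, t):
        key = edge(s, x)
        if g[key] == 0:
            raise ValueError(f"no edge {key} left to split off")
        g[key] -= 1
    if v != t:
        g[edge(v, t)] += 1
    return +g  # unary plus drops entries whose count fell to zero

G = Counter({edge("s", "a"): 1, edge("s", "b"): 1, edge("a", "b"): 1})
print(split_off(G, "s", "a", "b"))  # Counter({('a', 'b'): 2})
```

Each call lowers d(s) by two, so a complete splitting at s corresponds to d(s)/2 successive calls.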

2 Preliminaries

This section contains some basic results. The degree function satisfies the fol-
lowing well-known equalities.
Proposition 1. Let H = (V, E) be a graph. For arbitrary subsets X, Y ⊆ V ,

d(X) + d(Y ) = d(X ∩ Y ) + d(X ∪ Y ) + 2d(X, Y ), (2)

d(X) + d(Y ) = d(X − Y ) + d(Y − X) + 2d(X ∩ Y, V − (X ∪ Y )). (3)
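Identities (2) and (3) can be checked numerically by enumerating all pairs of subsets of a small multigraph. The sketch below does this on a random example; the seed, the graph size and the helper names are arbitrary assumptions made for the illustration, and the check is of course not a proof.

```python
import random
from itertools import combinations

def d(edges, X, Y):
    """d(X, Y): number of edges with one end in X - Y and the other in Y - X."""
    X, Y = set(X), set(Y)
    a, b = X - Y, Y - X
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def identities_hold(vertices, edges):
    """Check (2) and (3) of Proposition 1 for every pair of subsets X, Y."""
    V = set(vertices)

    def deg(X):
        return d(edges, X, V - X)

    subsets = [set(c) for r in range(len(V) + 1) for c in combinations(sorted(V), r)]
    for X in subsets:
        for Y in subsets:
            lhs = deg(X) + deg(Y)
            eq2 = deg(X & Y) + deg(X | Y) + 2 * d(edges, X, Y)
            eq3 = deg(X - Y) + deg(Y - X) + 2 * d(edges, X & Y, V - (X | Y))
            if lhs != eq2 or lhs != eq3:
                return False
    return True

rng = random.Random(0)
V = range(5)
edges = [tuple(rng.sample(V, 2)) for _ in range(12)]  # loopless, may contain parallel edges
print(identities_hold(V, edges))  # True
```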

In the rest of this section let s be a specified vertex of a graph G = (V + s, E)
with degree function d such that d(s) is even and (1) holds. Saying (1) holds in
such a graph G means it holds for all ∅ ≠ X ⊂ V. A set ∅ ≠ X ⊂ V is called
dangerous if d(X) ≤ k + 1 and d(s, X) ≥ 2. (Notice that in the standard definition
of dangerous sets one does not require property d(s, X) ≥ 2.) A set ∅ ≠ X ⊂ V
is critical if d(X) = k. Two sets X, Y ⊆ V are crossing if X − Y, Y − X, X ∩ Y
and V − (X ∪ Y) are all nonempty. Again these definitions refer only to subsets
of V. Edges sv, st form an admissible pair in G if $G_{v,t}$ still satisfies (1). It is easy
to see that sv, st is not admissible if and only if some dangerous set contains
both t and v.
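The equivalence between non-admissibility and common dangerous sets can be tested by brute force. The sketch below compares the two characterisations on a small example chosen here for illustration (a 4-cycle on V with s joined once to every vertex, and k = 3); the graph encoding and all function names are assumptions of this sketch.

```python
from itertools import combinations

def d(edges, X, Y):
    """d(X, Y): number of edges with one end in X - Y and the other in Y - X."""
    X, Y = set(X), set(Y)
    a, b = X - Y, Y - X
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def proper_subsets(V):
    """All nonempty proper subsets of V."""
    V = sorted(V)
    for size in range(1, len(V)):
        for X in combinations(V, size):
            yield set(X)

def satisfies_1(V, s, edges, k):
    """Condition (1) in G = (V + s, E): d(X) >= k for every nonempty proper X of V."""
    everything = set(V) | {s}
    return all(d(edges, X, everything - X) >= k for X in proper_subsets(V))

def split_off(edges, s, v, t):
    """Edge list of G_{v,t}; edges at s are assumed to be stored as (s, x) or (x, s)."""
    g = list(edges)
    for x in (v, t):
        g.remove((s, x) if (s, x) in g else (x, s))
    if v != t:
        g.append((v, t))
    return g

def admissible(V, s, edges, k, v, t):
    """Direct definition: G_{v,t} still satisfies (1)."""
    return satisfies_1(V, s, split_off(edges, s, v, t), k)

def share_dangerous_set(V, s, edges, k, v, t):
    """True if some dangerous set (d(X) <= k+1, d(s, X) >= 2) contains both v and t."""
    everything = set(V) | {s}
    return any(
        {v, t} <= X and d(edges, X, everything - X) <= k + 1 and d(edges, {s}, X) >= 2
        for X in proper_subsets(V)
    )

V, s, k = range(4), "s", 3
edges = [(0, 1), (1, 2), (2, 3), (3, 0)] + [(s, i) for i in V]
bad = [p for p in combinations(V, 2) if not admissible(V, s, edges, k, *p)]
print(bad)  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```

In this example the non-admissible pairs are exactly the pairs of consecutive cycle vertices, each of which forms a dangerous set of degree k + 1 = 4.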
The statements of the following two lemmas can be proved by standard methods
using Proposition 1. Most of them are well-known and appeared explicitly
or implicitly in [5] (or later in [1]). The proofs are omitted.

Lemma 1. (a) A maximal dangerous set does not cross any critical set.
(b) If X is dangerous then d(s, V − X) ≥ d(s, X).
(c) If k is even then two maximal dangerous sets X, Y which are crossing
have d(s, X ∩ Y) = 0. □

Let t be a neighbour of s. A dangerous set X with t ∈ X is called a t-


dangerous set.

Lemma 2. Let v be an s-neighbour. Then exactly one of the following holds:
(o) The pair sv, su is admissible for every edge su ≠ sv.
(i) There exists a unique maximal v-dangerous set X.
(ii) There exist precisely two maximal v-dangerous sets X, Y. In this case
k is odd and we have d(X) = d(Y) = k + 1, d(X − Y) = d(Y − X) = k,
d(X ∩ Y) = k, d(X ∪ Y) = k + 2, d(X ∩ Y, V + s − (X ∪ Y)) = 1, d(s, X − Y) ≥ 1,
d(s, Y − X) ≥ 1, and d(X ∩ Y, X − Y) = d(X ∩ Y, Y − X) = (k − 1)/2. □

An s-neighbour v for which alternative (i) (resp. (ii)) holds is called nor-
mal (special, respectively). Note that if k is even then there exist no special
s-neighbours by Lemma 2.
The previous lemmas include all ingredients of Frank’s proof [5] for the next
splitting off theorem due to Lovász.

Theorem 1. [8] Suppose that (1) holds in G = (V + s, E), k ≥ 2, d(s) is even
and |N(s)| ≥ 2. Then for every edge st there exists an edge su (t ≠ u) such that
the pair st, su is admissible. □
3 The Structure of Non-admissibility Graphs


In this section let G = (V +s, E) be given which satisfies (1) with respect to some
k ≥ 2. First we introduce the non-admissibility graph B(s) = (N (s), E(B(s)))
of G (with respect to s) as follows. Let the vertices of B(s) correspond to the
neighbours of s and let two vertices u, v, u ≠ v of B(s) be adjacent if and only if
the pair su, sv is not admissible in G. Notice that while G may contain multiple
edges B(s) is always a simple graph. B(s) may consist of one vertex only. By
definition, two edges su, sv (u ≠ v) form an admissible pair in G if and only if
uv ∈ E(B̄(s)), that is, uv is an edge of the complement of B(s).
This notion will be useful in problems where we search for admissible pairs
sx, sy for which the new edge xy obtained by splitting off sx and sy (x ≠ y)
satisfies some extra property Π. For example, one may ask for an admissible
split which does not create parallel edges or whose end vertices are both in a
given set W of vertices, etc. Suppose that this extra property can be given by
a demand graph D(s) = (N (s), E(D(s))) defined on the set of neighbours of s
in such a way that xy satisfies Π if and only if xy ∈ E(D(s)). By definition,
a demand graph is simple. In the previous examples such a D(s) exists: it is
the complement of the subgraph of G induced by the s-neighbours in the first
case and the union of a complete graph on W and some isolated vertices in the
second case. If Π is given by a demand graph D(s) then a split satisfying Π will
be called a D(s)-split, or simply a D-split. It is clear from the definitions that
a D(s)-split exists if and only if D(s) and B̄(s) have a common edge. In other
words, a D(s)-split does not exist if and only if D(s) is a subgraph of B(s).
Thus by showing that D(s) cannot be the subgraph of B(s) we can guarantee
the existence of an admissible split satisfying property Π. In order to use this
kind of argument, we prove that B(s) satisfies certain structural properties.
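The subgraph criterion above amounts to a set difference on edge sets. The following minimal sketch assumes that D(s) and B(s) are given as sets of unordered pairs on the same vertex set N(s); the helper names and the two small example graphs are hypothetical.

```python
def pair(x, y):
    """Unordered pair representing an edge of a simple graph."""
    return frozenset((x, y))

def d_split_pairs(demand_edges, nonadmissible_edges):
    """Edges of D(s) that are not edges of B(s): the admissible pairs satisfying the demand."""
    return set(demand_edges) - set(nonadmissible_edges)

def d_split_exists(demand_edges, nonadmissible_edges):
    """A D(s)-split exists iff D(s) is not a subgraph of B(s)."""
    return bool(d_split_pairs(demand_edges, nonadmissible_edges))

# B(s): two disjoint edges ab and cd (the shape Theorem 2 predicts for even k).
B = {pair("a", "b"), pair("c", "d")}
D_path = {pair("a", "b"), pair("b", "c")}   # connected demand graph
D_inside = {pair("a", "b")}                 # demand graph contained in B(s)

print(d_split_exists(D_path, B))    # True, via the pair b, c
print(d_split_exists(D_inside, B))  # False
```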
In what follows assume that d(s) is even and |N (s)| ≥ 2. The definition of
B(s) and Theorem 1 imply the following.
Proposition 2. B(s) has no vertex which is adjacent to all the other vertices
of B(s). □
Now we characterize B(s) in the case when k is even. In this case the structure
of B(s) is particularly simple and can be described easily.
Theorem 2. Suppose that G = (V + s, E) satisfies (1) and k is even. Then
B(s) is the disjoint union of (at least two) complete graphs.
Proof. By Lemma 2 each s-neighbour is either isolated in B(s) or is normal.
Hence if su, sv and su, sw (v ≠ w) are both non-admissible then $v, w \in X_u$
follows, where $X_u$ is the unique maximal u-dangerous set. This implies that
the pair sv, sw is also non-admissible. By this transitivity property B(s) is the
union of pairwise disjoint complete graphs. B(s) itself cannot be complete by
Proposition 2. □
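Theorem 2 can be observed on a small instance. The sketch below builds B(s) from the dangerous sets (using the characterisation of non-admissible pairs from Section 2) and checks that every connected component is complete; the example graph with k = 2 and all helper names are assumptions chosen for this illustration.

```python
from itertools import combinations

def d(edges, X, Y):
    """d(X, Y): number of edges with one end in X - Y and the other in Y - X."""
    X, Y = set(X), set(Y)
    a, b = X - Y, Y - X
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def nonadmissibility_graph(V, s, edges, k):
    """Return (N(s), E(B(s))): pairs of distinct s-neighbours in a common dangerous set."""
    V = sorted(V)
    everything = set(V) | {s}
    neighbours = [v for v in V if d(edges, {s}, {v}) > 0]
    dangerous = [
        set(X)
        for size in range(1, len(V))
        for X in combinations(V, size)
        if d(edges, X, everything - set(X)) <= k + 1 and d(edges, {s}, X) >= 2
    ]
    B = {frozenset(p) for p in combinations(neighbours, 2)
         if any(set(p) <= X for X in dangerous)}
    return neighbours, B

def components(nodes, B):
    """Connected components of the simple graph (nodes, B)."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(y for y in nodes if frozenset((x, y)) in B)
        seen |= comp
        comps.append(comp)
    return comps

def is_disjoint_union_of_cliques(nodes, B):
    """True if every two vertices in the same component are adjacent."""
    return all(frozenset(p) in B
               for comp in components(nodes, B)
               for p in combinations(sorted(comp), 2))

# k = 2: edges ab and cd on V, and s joined once to each of a, b, c, d.
V, s, k = "abcd", "s", 2
edges = [("a", "b"), ("c", "d")] + [(s, v) for v in V]
nodes, B = nonadmissibility_graph(V, s, edges, k)
print(sorted(sorted(e) for e in B))             # [['a', 'b'], ['c', 'd']]
print(is_disjoint_union_of_cliques(nodes, B))   # True
```

Here the dangerous sets are {a, b} and {c, d}, so B(s) consists of two disjoint copies of K2, in line with the theorem.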
It can be seen that every graph consisting of (at least two) disjoint complete
graphs can be obtained as a non-admissibility graph. Theorem 2 implies that if
k is even then B(s) is disconnected. Thus for any connected demand graph D(s)
there exists a D(s)-split. This simple observation, which illustrates our proof
method, will be used later.
Now let us focus on the case when k ≥ 3 is odd. In this case we do not give a
complete characterization of the non-admissibility graphs but characterize those
graphs G for which B(s) is 2-edge-connected. To state our result we need some
definitions. In a cyclic partition $\mathcal{X} = (X_0, \ldots, X_{t-1})$ of V the t partition classes
$\{X_0, \ldots, X_{t-1}\}$ are cyclically ordered. Thus we use the convention $X_t = X_0$, and
so on. In a cyclic partition two classes $X_i$ and $X_j$ are neighbouring if |j − i| = 1 and
non-neighbouring otherwise. We say that G' = (V', E') is a $C^p_l$-graph for some
p ≥ 3 and some even l ≥ 2 if there exists a cyclic partition $\mathcal{Y} = (Y_0, \ldots, Y_{p-1})$
of V' for which $d'(Y_i) = l$ (0 ≤ i ≤ p − 1) and $d(Y_i, Y_j) = l/2$ for each pair
$Y_i, Y_j$ of neighbouring classes of $\mathcal{Y}$ (which implies $d(Y_{i'}, Y_{j'}) = 0$ for each pair of
non-neighbouring classes $Y_{i'}, Y_{j'}$). A cyclic partition of G' with these properties
is called uniform.
Let G = (V + s, E) satisfy (1) for some odd k ≥ 3. Such a G is called round
(from vertex s) if G − s is a $C^{d(s)}_{k-1}$-graph. Note that by (1) this implies that
$d(s, V_i) = 1$ for each class $V_i$ (0 ≤ i ≤ d(s) − 1) of a uniform partition $\mathcal{V}$ of G − s.
The following lemma will not be used in the proof of the main result of this
section but will be used later in the applications. We omit the proof.
Lemma 3. Let G = (V + s, E) satisfy (1) for some odd k ≥ 3. Suppose that
G is round from s and let $\mathcal{V} = (V_0, \ldots, V_r)$ be a uniform partition of V, where
r = d(s) − 1. Then
(a) G − s is (k − 1)-edge-connected and for every X ⊂ V with $d_{G-s}(X) = k - 1$
either $X \subseteq V_i$ or $V - X \subseteq V_i$ holds for some 0 ≤ i ≤ r or $X = \bigcup_{i}^{i+j} V_i$ for some
0 ≤ i ≤ r, 1 ≤ j ≤ r − 1.
(b) For any set I of r edges which induces a spanning tree on N(s) the graph
(G − s) + I is k-edge-connected.
(c) The uniform partition of G − s is unique.
(d) B(s) is a cycle on d(s) vertices (which follows the ordering of $\mathcal{V}$). □
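Lemma 3(d) can be illustrated on the simplest round graphs. In the sketch below G − s is a cycle on n vertices (a $C^n_2$-graph with singleton classes), s is joined once to every vertex, and k = 3; this family, the brute-force construction of B(s) from dangerous sets, and the helper names are all assumptions made for the example.

```python
from itertools import combinations

def d(edges, X, Y):
    """d(X, Y): number of edges with one end in X - Y and the other in Y - X."""
    X, Y = set(X), set(Y)
    a, b = X - Y, Y - X
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def nonadmissibility_graph(V, s, edges, k):
    """Return (N(s), E(B(s))): pairs of distinct s-neighbours in a common dangerous set."""
    V = sorted(V)
    everything = set(V) | {s}
    neighbours = [v for v in V if d(edges, {s}, {v}) > 0]
    dangerous = [
        set(X)
        for size in range(1, len(V))
        for X in combinations(V, size)
        if d(edges, X, everything - set(X)) <= k + 1 and d(edges, {s}, X) >= 2
    ]
    B = {frozenset(p) for p in combinations(neighbours, 2)
         if any(set(p) <= X for X in dangerous)}
    return neighbours, B

def is_cycle(nodes, B):
    """True if (nodes, B) is a single cycle: 2-regular and connected."""
    if len(nodes) < 3 or any(sum(1 for e in B if v in e) != 2 for v in nodes):
        return False
    seen, stack = set(), [nodes[0]]
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        stack.extend(y for e in B if x in e for y in e if y != x)
    return len(seen) == len(nodes)

def round_example(n, s="s"):
    """Cycle on 0..n-1 plus one edge from s to every vertex (round for k = 3, n even)."""
    return [(i, (i + 1) % n) for i in range(n)] + [(s, i) for i in range(n)]

n, k = 6, 3
nodes, B = nonadmissibility_graph(range(n), "s", round_example(n), k)
print(is_cycle(nodes, B))  # True
print(B == {frozenset((i, (i + 1) % n)) for i in range(n)})  # True
```

In this family the dangerous sets are exactly the pairs of consecutive cycle vertices, so B(s) follows the cyclic ordering of the partition classes.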
The main structural result is the following.
Theorem 3. Suppose that G = (V + s, E) satisfies (1), d(s) is even, |N (s)| ≥ 2
and k ≥ 3 is odd. Then B(s) is 2-edge-connected if and only if G is round from
s and d(s) ≥ 4.
Proof. First suppose that G is round from s and d(s) ≥ 4. Lemma 3(d) shows
B(s) is a cycle and hence B(s) is 2-edge-connected.
In what follows we prove the other direction and assume that B(s) is 2-edge-
connected. If there is no special s-neighbour then the proof of Theorem 2 shows
that B(s) is disconnected. Thus there exists at least one special s-neighbour.
Lemma 4. Suppose that t is special and let X and Y be the two maximal t-
dangerous sets. Let u be a special s-neighbour in Y − X. Let Y and Z denote
the two maximal u-dangerous sets in G and let su, sv be a non-admissible pair
with v ∈ Z − Y . Then Z ∩ X = ∅, Y = (X ∩ Y ) ∪ (Z ∩ Y ) and d(s, Y ) = 2.
Proof. Notice that for every special vertex u' ∈ Y − X one of the two maximal
u'-dangerous sets must be equal to Y, thus Y is indeed one of the two maximal
u-dangerous sets.
First suppose v ∈ X − Y. Since X and Y are the only maximal t-dangerous
sets we have t ∉ Z. This shows that v is also special and the two maximal
v-dangerous sets are X and Z. Lemma 2 shows d(Z) = k + 1 and d(X − Y) =
d(Y − X) = k. Hence by (3), applied to Z and X − Y and to Z and Y − X,
we obtain X − Y ⊂ Z and Y − X ⊂ Z. By Lemma 2 d(X ∩ Y) = k. Hence
Z ∩ X ∩ Y = ∅ by Lemma 1, provided Z ∪ (X ∩ Y) ≠ V. Moreover, Z ∪ (X ∩ Y) = V
implies d(s, Z) ≥ d(s) − 1 > d(s, V − Z), using d(s) ≥ 4. Since Z is dangerous,
this contradicts Lemma 1(b). Thus we conclude Z ∩ X ∩ Y = ∅.
The above verified properties of Z and Lemma 2 imply that k + 1 = d(Z) ≥
d(X ∩ Y, Z) + d(s, Z) ≥ d(X ∩ Y, X − Y) + d(X ∩ Y, Y − X) + 2 = k − 1 + 2 = k + 1.
This shows that d(s, Z) = 2 and hence d(s, Z ∪ (X ∩ Y)) = 3 holds. We also get
d(Z, V − Z − (X ∩ Y)) = 0 and by Lemma 2 we have d(X ∩ Y, V − Z − (X ∩ Y)) = 0
as well. Therefore d(Z ∪ (X ∩ Y), V − Z − (X ∩ Y)) = 0 and hence
d(s, Z ∪ (X ∩ Y)) = d(Z ∪ X ∪ Y) = 3 ≤ k. This shows Z ∪ X ∪ Y is dangerous,
contradicting the maximality of X.
Thus we may assume that v ∈ V − (X ∪ Y). By Lemma 2 we have d(Z) = k + 1,
d(X − Y) = d(Y − X) = d(X ∩ Y) = k and there exists an s-neighbour w ∈ X − Y.
Clearly, t ∉ Z and by the previous argument we may assume w ∉ Z. We claim
that Z ∩ (X − Y) = ∅. Indeed, otherwise Z and X − Y would cross (observe that
t ∉ Z ∪ (X − Y)), contradicting Lemma 1(a). We claim that Z ∩ (X ∩ Y) = ∅
holds as well. This claim follows similarly: since w ∉ Z ∪ (X ∩ Y), Lemma 1(a)
leads to a contradiction.
A third application of Lemma 1(a) shows Y − X ⊂ Z. To see this observe
that t ∉ Z ∪ (Y − X) and hence (Y − X) − Z ≠ ∅ would imply that Z and Y − X
cross, a contradiction.
Summarizing the previous observations we obtain Z ∩ X = ∅ and Y = (X ∩
Y) ∪ (Z ∩ Y). By Lemma 2 this implies d(s, Y) = 2. This proves the lemma. □

From Lemma 2 we can see that for every normal s-neighbour v the v-
neighbours in B(s) induce a complete subgraph of B(s) and for every special
s-neighbour t the t-neighbours in B(s) can be divided into two nonempty parts
(depending on whether a t-neighbour belongs to X − Y or Y − X for the two
maximal t-dangerous sets X and Y ), such that each of them induces a complete
subgraph of B(s). By Lemma 4 we obtain that there are no edges between these
two complete subgraphs of B(s) and hence this bipartition of the t-neighbours
in B(s) into complete subgraphs is unique for every special s-neighbour t.
To deduce further properties of B(s) let us fix a special s-neighbour t and take
an s-neighbour r (r ≠ t) which is not adjacent to t in B(s) (that is, for which
st, sr is admissible). Such an r exists by Proposition 2. Let the two maximal
t-dangerous sets be X and Y. Since B(s) is 2-edge-connected, there exist two
edge-disjoint paths $P_1$, $P_2$ from t to r in B(s). Without loss of generality we
may assume that for the first edge uv of $P_1$ which leaves X ∪ Y (that is, for
which u ∈ X ∪ Y and v ∉ X ∪ Y) we have u ∈ Y − X. Clearly, u is a special
s-neighbour (for otherwise t and v, which are both neighbours of u, would be


adjacent in B(s)) and one of the two maximal u-dangerous sets is Y . Thus by
Lemma 4 we get d(s, Y ) = 2.
Therefore u is the only neighbour of t in B(s) in Y − X and hence P2 must
contain a t-neighbour in B(s) which corresponds to an s-neighbour in X − Y .
Let w be the last vertex of P2 in X − Y . As above, we obtain that w is special
and d(s, X) = 2. Hence t has degree two in B(s) and both neighbours of t in
B(s) are special. This argument is valid for each special s-neighbour.
Therefore, since B(s) is 2-edge-connected, each s-neighbour must be special
and B(s) must be a cycle. Since each s-neighbour is special, there are no parallel
edges incident to s. This shows that B(s) has d(s) vertices. Let $v_0, \ldots, v_{d(s)-1}$
denote the vertices of $B(s) = C_{d(s)}$, following the cyclic ordering. Let $V_i =
X^1_{v_i} \cap X^2_{v_i}$ (0 ≤ i ≤ d(s) − 1), where $X^1_{v_i}$ and $X^2_{v_i}$ are the two maximal
$v_i$-dangerous sets in G. Now by Lemma 2(ii) and Lemma 4 it is easy to see that
$(V_0, \ldots, V_{d(s)-1})$ is a uniform partition of G. This implies that G is round from
s, as required. □

4 Applications
In this section we apply Theorems 2 and 3 for proving the existence of complete
admissible splittings (or properties of maximal admissible splitting sequences)
when the set of split edges has to satisfy some additional requirement.

4.1 Edge-Splitting Preserving Planarity


The following theorem is due to Nagamochi and Eades. The new proof we present
here is substantially shorter.
Theorem 4. [9] Let G = (V + s, E) be a planar graph satisfying (1), where k is
even or k = 3. Then there exists a complete admissible splitting at s for which
the resulting graph is planar.
Proof. First suppose that we are given a fixed planar embedding of G. Notice
that this embedding uniquely determines a cyclic ordering C of (the edges inci-
dent to s and hence) the neighbours of s. Clearly, splitting off a pair su, sv for
which u and v are consecutive in this cyclic ordering preserves planarity (and a
planar embedding of the resulting graph can easily be obtained without reem-
bedding G − {su, sv}). Thus to see that a complete admissible splitting exists
at s which preserves planarity it is enough to prove that (*) there exists an
admissible splitting su, sv for which u and v are consecutive in C.
By repeated applications of (*) we obtain a complete admissible splitting
which preserves planarity (and the embedding of G − s as well). The existence
of a consecutive admissible splitting can be formulated in terms of a demand
graph. We may assume |N (s)| ≥ 3. Let C be a cycle defined on the neighbours
of s following the cyclic ordering C. Clearly, a consecutive admissible splitting
exists if and only if there exists a D(s)-split for the choice D(s) := C. If k is even
then Theorem 2 and the fact that D(s) is connected (while the non-admissibility
graph B(s) is disconnected) show that (*) holds and hence the proof of Theorem
4 is complete in this case. (Note that during the process of iteratively splitting
off consecutive admissible pairs the demand cycle C has to be modified whenever
s loses some neighbour w by splitting off the last copy of the edges sw.)
Now consider the case k = 3. The above argument and Theorem 3 show
(using the fact that D(s) is 2-edge-connected) that by splitting off consecutive
admissible pairs as long as possible either we find a complete admissible splitting
which preserves planarity or we get stuck in a graph G′ which is round from s
and for which B_{G′}(s) = D_{G′}(s) holds. In the latter case we need to reembed some
parts of G′ in order to complete the splitting sequence. We may assume that s
is inside one of the bounded faces F of G′ − s. Let V_0, ..., V_{2m−1} be the uniform
partition of V in G′ − s (where 2m := d_{G′}(s)) and let v_i be the neighbour of s in
V_i (0 ≤ i ≤ 2m − 1). By Lemma 3 we can easily see that adding the edges v_0 v_m
and v_i v_{2m−i} (1 ≤ i ≤ m − 1) to G′ − s results in a 3-edge-connected graph G″.
Observe that G″ is obtained from G′ by a complete admissible splitting and all
these edges can be added within F in such a way that in the resulting embedding
of G″ every edge crossing involves the edge v_0 v_m. To avoid these edge crossings
in G″ we “flip” V_0 and/or V_m, that is, reembed the subgraphs induced by V_0 and
V_m in such a way that after the flippings both v_0 and v_m occur on the boundary
of the unbounded face. Since G′ is round and k = 3 it is easy to see that this
can be done. Then we can connect v_0 and v_m within the unbounded face and
obtain a planar embedding of the resulting graph. Thus the proof of Theorem 4
is complete. □

The theorem does not hold if k ≥ 5 is odd, see [9]. Note that the above proof
implies that the graphs obtained by a maximal planarity preserving admissible
splitting sequence are round for every odd k ≥ 5. Moreover, it shows that if
k = 3 then at most two “flippings” are sufficient.

4.2 Edge-Splitting with Partition Constraints

Let G = (V + s, E) be a graph for which (1) holds and d(s) is even and let
P = {P1 , P2 , . . . , Pr }, 2 ≤ r ≤ |V | be a prescribed partition of V . In order to
solve a more general partition-constrained augmentation problem, Bang-Jensen,
Gabow, Jordán and Szigeti [1] investigated the existence of complete admissible
splittings at s for which each split edge connects two distinct elements of P.
They proved that if k is even and an obvious necessary condition holds (namely,
d(s, Pi ) ≤ d(s)/2 for every Pi ) then such a complete admissible splitting exists
and for odd k they characterized those graphs for which such a complete splitting
does not exist.
This partition-constrained edge-splitting problem can also be formulated as
an edge-splitting problem with demands. Here the demand graph is a complete
multipartite graph, which is either 2-edge-connected or is a star. Thus Theorems
2 and 3 can be applied and for several lemmas from [1] somewhat shorter proofs
can be obtained. We omit these proofs and note that, as we will point out, the
partition-constrained edge-splitting problem turns out to be a special case of the
‘simultaneous edge-splitting problem’ that we discuss in detail in Section 5.

5 Simultaneous Edge-Splitting and Edge-Connectivity Augmentation
In this section we deal with the following optimization problem: let G = (V, E)
and H = (V, K) be two graphs on the same set V of vertices and let k, l be
positive integers. Find a smallest set F of new edges for which Ĝ = (V, E + F )
is k-edge-connected and Ĥ = (V, K + F ) is l-edge-connected. Let us call this the
simultaneous edge-connectivity augmentation problem. We give a polynomial
algorithm which finds an optimal solution if both k and l are even and finds a
solution whose size is at most one more than the optimum otherwise. One of the
two main parts of our solution is a new splitting theorem which we prove using
Theorems 2 and 3.
If G = H (and hence wlog k = l) then the problem reduces to finding a
smallest set F of edges for which Ĝ = (V, E + F ) is k-edge-connected. This is the
well-solved k-edge-connectivity augmentation problem. For this problem there
exist several polynomial algorithms to find an optimal solution. One approach,
which is due to Cai and Sun [2] (simplified and extended later by Frank [5]),
divides the problem into two parts: first it extends G by adding a new vertex
s and a smallest set F′ of edges incident to s such that |F′| is even and G′ =
(V + s, E + F′) satisfies (1) with respect to k. Then in the second part, using
Theorem 1, it finds a complete admissible splitting from s in G′. The resulting
set of split edges will be an optimal solution for the augmentation problem.
Our goal is to extend this approach to the simultaneous augmentation
problem. To do this we have to extend both parts: we need an algorithm which finds a
smallest F′ incident to s for which G′ = (V + s, E + F′) and H′ = (V + s, K + F′)
simultaneously satisfy (1) with respect to k and l, respectively, and then we have
to prove that there exists a complete splitting at s which is simultaneously
admissible in G′ and H′.
Both of these extended problems have interesting new properties. While a
smallest F′ can be found by a greedy deletion procedure in the k-edge-connectivity
augmentation problem, this is not the case in the simultaneous augmentation
problem. Moreover, a complete splitting at s which is admissible in G′ and
H′ does not always exist. (To see this let V = {a, b, c, d}, E = {ac, bd}, K =
{ab, bc, cd, da}, F′ = {sa, sb, sc, sd} and let k = 2, l = 3.) However, as we shall
see, a smallest F′ can be found in polynomial time by solving an appropriate
submodular flow problem. Furthermore, if k and l are both even then the required
simultaneous complete admissible splitting does exist (and an “almost complete”
splitting sequence can be found otherwise).
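
The small example in the parenthesis above can be checked by brute force. The following Python sketch is only an illustration and not part of the paper's method: it assumes that a complete splitting at s is admissible exactly when the graph left on V after deleting s is k-edge-connected (respectively l-edge-connected), and it tests this by enumerating all cuts. The helper names cut_size and edge_connected are hypothetical.

```python
from itertools import combinations

def cut_size(edges, X):
    """Number of edges (with multiplicity) having exactly one end in X."""
    return sum((u in X) != (v in X) for u, v in edges)

def edge_connected(vertices, edges, k):
    """True if every nonempty proper subset of `vertices` has cut size >= k."""
    vs = list(vertices)
    return all(cut_size(edges, set(X)) >= k
               for size in range(1, len(vs))
               for X in combinations(vs, size))

V = ["a", "b", "c", "d"]
E = [("a", "c"), ("b", "d")]                          # edges of G
K = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]  # edges of H
k, l = 2, 3

# The three ways to pair up the four edges sa, sb, sc, sd at s.
pairings = [
    [("a", "b"), ("c", "d")],
    [("a", "c"), ("b", "d")],
    [("a", "d"), ("b", "c")],
]

results = []
for split in pairings:
    ok_G = edge_connected(V, E + split, k)  # split edges added to G - s
    ok_H = edge_connected(V, K + split, l)  # same split edges added to H - s
    results.append((ok_G, ok_H))
    print(split, "admissible in G:", ok_G, "admissible in H:", ok_H)
```

Under these assumptions each pairing works for exactly one of the two graphs, so no complete splitting is admissible in both.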

5.1 Simultaneous Edge-Splitting


Let us start with the solution of the splitting problem. Let G = (V + s, E + F )
and H = (V + s, K + F ) be given which satisfy (1) with respect to k and l,
respectively, and let |F | be even, where F denotes the set of edges incident to
s. We say that a pair su, sv is legal if it is admissible in G as well as in H. A
complete splitting sequence at s is legal if the resulting graphs (after deleting s)
satisfy (1) with respect to k and l, respectively. Let d(s) := dG (s) = dH (s).
Theorem 5. If d(s) ≥ 6 then there exists a legal pair su, sv. If k and l are both
even then there exists a complete legal splitting at s.
Proof. The property of being legal can be formulated in terms of a demand graph
D(s). Namely, a pair su, sv (u ≠ v) is legal if and only if su, sv is a D(s)-split
in G with respect to D(s) = B̄H (s). Thus the existence of a legal pair follows
if we show that B̄H (s) cannot be a subgraph of BG (s). Let D := B̄H (s) and
A := BG (s). We may assume |N (s)| ≥ 4. (Otherwise, by Proposition 2, BH (s)
has an isolated vertex and hence D has a vertex which is connected to all the
other vertices. A has no such vertex.)
First suppose that k and l are both even. In this case Theorem 2 shows that
D is connected (since it is the complement of a disconnected graph) and A is
disconnected. This implies that a legal pair exists for arbitrary even d(s). Hence
a complete legal splitting exists as well. Now suppose that k is odd and l is
even. As above, we can see that D, which is a complete multipartite graph by
Theorem 2, is either 2-edge-connected (and is not a cycle) or contains a vertex
which is connected to all the other vertices or is a four-cycle. In the first two
cases Theorem 3 and Proposition 2 show that D cannot be a subgraph of A.
In the last case A is 2-edge-connected if no legal split exists. Theorem 3 shows
that this may happen only if A = C4 and G is round. In that case there are no
parallel edges incident to s and d(s) = 4. This proves the theorem when one of
k and l is even.
Finally, suppose that k and l are both odd. In this case the proof is more
complicated. First we prove the following.
Lemma 5. Let H = (V + s, E) satisfy (1) with respect to some odd k ≥ 3. Let
B := BH (s) be the non-admissibility graph of H and let D := B̄. Then one of
the following holds:
(i) D is 2-edge-connected and D is not a cycle,
(ii) D has a vertex which is adjacent to all the other vertices,
(iii) D = C4 ,
(iv) B arises from a complete graph Km (m ≥ 2) by attaching a path of
length two to some vertex of Km ,
(v) B = C4 .
Proof. If B is 2-edge-connected then Theorem 3 gives B = Cl for some even
l ≥ 4. If l = 4 then (v) holds, otherwise (i) holds. Hence we may assume that B
is not 2-edge-connected. In what follows suppose that neither (i) nor (ii) holds.
Let S := N (s) denote the set of vertices of B and D.
Case I: B is disconnected.
Since (ii) does not hold, S has a bipartition S = X ∪ Y , X ∩ Y = ∅, for
which there are no edges from X to Y in B and |X|, |Y| ≥ 2. Let p := |X| and
r := |Y|. Now D contains a spanning complete bipartite graph K_{p,r} and hence
D is 2-edge-connected. If p ≥ 3 or r ≥ 3 then (i) holds. Otherwise p = r = 2 and,
since (i) does not hold, D = C4 follows,
showing (iii) holds.
Case II: B is connected (and has at least one cut-edge).
First consider the subcase when there exists an X ⊂ S with |X|, |S − X| ≥ 2
and dB (X) = 1. Let p := |X| and r := |S − X| and let e = xv, v ∈ X, be
the unique edge leaving X in B. Now D contains a spanning complete bipartite
graph minus one edge. Therefore (i) holds if p, r ≥ 3 and hence we may assume
that at least one of p and r is equal to 2. If p = r = 2 then D = P4 , showing
(iv) holds. In the rest of the investigation of this subcase assume p = 2, r ≥ 3
holds. Let X = {v, w}. Since B is connected, the edge vw is present in B. Now
D − x is 2-edge-connected, thus if dD (x) ≥ 2 then (i) holds. Hence dD (x) = 1
which means x is adjacent to every vertex y ∈ S − X − {x} in B. Since v is not
adjacent to any vertex in S − X − {x} we can see that x is special and S − X
induces a complete graph in B by Lemma 2 and Lemma 4. Notice that v is also
special since its neighbours x and w are not adjacent. Thus B can be obtained
from the complete graph B[S − X] by attaching xv and vw. This gives (iv).
Now consider the subcase when one of the components we obtain by deleting
an arbitrary cut-edge from B is a singleton vertex. This means B arises from a
2-edge-connected subgraph M (possibly M is a single vertex) by attaching some
leaves, i.e. vertices of degree one. If two leaves aq, bq are adjacent to the same
vertex q ∈ M then q has three pairwise non-adjacent neighbours in B (using
|S| ≥ 4). Thus the set of neighbours of q in B cannot be the union of two complete
graphs, contradicting Lemma 2. Thus each vertex q ∈ M has at most one leaf
adjacent to it. If the total number of leaves is at least 3 then this property implies
that D contains a spanning complete bipartite graph Kp,r , (p, r ≥ 3) minus
some independent edges, where one of the color classes, in addition, induces a
complete graph. This shows (i) holds. Thus we may assume that the number of
leaves is at most two. Let xu be a leaf (and let yv be the other leaf, if it exists)
with u, v ∈ M . Now |M | ≥ 2 holds and as we remarked, M is 2-edge-connected.
By Lemma 2 u is special. If u is not adjacent to each of the vertices in M
then there exist z, w ∈ M with uz, zw ∈ E(B) and uw ∉ E(B). Now Lemma
4, applied to the two maximal u-dangerous sets and z shows dB (u) = 2 and
dM (u) = 1, contradicting the 2-edge-connectivity of M . Thus u is adjacent to
each vertex in M . This contradicts Proposition 2 if ux is the unique leaf, hence
the other leaf yv exists. Now Lemma 4, applied to the two maximal u-dangerous
sets and v shows M = {u, v}, contradicting the 2-edge-connectivity of M . This
contradiction proves the lemma. □
Recall that to prove the theorem we need to show that D is not a subgraph
of A provided d(s) ≥ 6. Thus we are done if (i),(ii) or (v) holds by Theorem 3
and Proposition 2.
Suppose that D = C4 and D is a subgraph of A. Then A is 2-edge-connected
and by Theorem 3 we have A = C4 and d(s) = 4. This settles case (iii). Now
assume (iv) holds. If m = 2 then D = P4 (i.e. D is a path on four vertices).
In this case if A contains D then by Proposition 2 either A = C4 , in which
case d(s) = 4 follows, or A = P4 . Observe that Lemma 4 implies that the two
inner vertices of A are special in G. Now B = D̄ is a P4 as well. Hence the
two inner vertices of B (which are disjoint from the inner vertices of A) are
special in H. From this it follows that there are no parallel edges incident to s
and d(s) = 4. Furthermore, B = P4 implies that there exists a dangerous set
X in H which contains the two inner vertices x, y of B. It is easy to see that
V − X is also dangerous in H and contains the other two neighbours u, v of s,
which correspond to the two (non-adjacent) vertices of B with degree one. Thus
u and v should also be adjacent in B, a contradiction. This solves case (iv) when
m = 2.
Finally assume that (iv) holds with m ≥ 3 and D is a subgraph of A. Using
the notation of Lemma 5 we obtain that the edge vw is not present in A by
Proposition 2. Hence all the vertices in S − X − {x} are special in G by Lemma
2. If w is normal in G then A[S−X] is complete. Thus each vertex u in S−X−{x}
is adjacent to all the other vertices in A, contradicting Proposition 2. (Now such
a u exists since r ≥ 3.) Thus w is special in G. By Lemma 2 and Lemma 4
S − X can be partitioned into two non-empty complete subgraphs J and L in
A according to the two maximal w-dangerous sets J′ and L′ in G. Without loss
of generality assume that x ∈ J. Let u ∈ L. Now Lemma 4, applied to L′, u and
v gives L = {u}. If there exists a vertex y ∈ J − x then Lemma 4, applied to
J′, y and v gives J = {y}, a contradiction. Thus we conclude that s has four
neighbours {v, w, u, x} only, contradicting m ≥ 3. This completes the proof of
the theorem. □
From the above proof we can easily deduce the following.
Corollary 1. Suppose that k or l is odd, d(s) = 4 and there exists no legal
pair. Then BG (s) = C4 or BH (s) = C4 and hence at least one of G and H is
round. □
Note that a complete splitting sequence which is simultaneously admissible
in three (or more) graphs does not necessarily exist, even if each of the edge-
connectivity values is even. We also remark that the partition-constrained split-
ting problem can be reduced to a simultaneous edge-splitting problem where at
least one of k and l is even. To see this suppose that an instance of the partition-
constrained splitting problem is given as in the beginning of Section 4.2. Let
dm := maxi {dG (s, Pi )} in G = (V + s, E + F ) and let S := NG (s). Build graph
H = (S + x + s, K + F ) as follows. For each set S ∩ Pi in G let the corresponding
set in H induce a (2dm )-edge-connected graph (say, a complete graph with suf-
ficiently many parallel edges or a singleton). The edges incident to s in G and
H coincide. Then from vertex x of H add 2dm − dG (s, Pi ) parallel edges to some
vertex of S ∩ Pi (1 ≤ i ≤ r). Now H satisfies (1) with respect to l := 2dm . It can
be seen that a complete admissible splitting satisfying the partition-constraints
in G exists if and only if there exists a complete legal splitting in the pair G, H.
(Now the sets of vertices of G and H may be different. However, it is easy to
see that the assumption V (G) = V (H) is not essential in the simultaneous edge-
splitting problem.) This shows that characterizing the pairs G, H for which a
complete legal splitting does not exist (even if one of k and l is even) is at least
as difficult as the solution of the partition-constrained problem [1].

5.2 Simultaneous Edge-Connectivity Augmentation


Let G = (V, E) and H = (V, K) be given for which (1) holds with respect to
k and l, respectively. First we show how to find a smallest F′ for which the
extended graphs G′ = (V + s, E + F′) and H′ = (V + s, K + F′) simultaneously
satisfy (1) with respect to k and l, respectively. As we will see, this problem
turns out to be a special submodular flow problem and hence can be solved in
polynomial time (see e.g. Cunningham and Frank [3]).
Let V be a finite ground-set and let p : 2^V → Z ∪ {−∞} be an integer-valued
function for which p(∅) = 0. We call p fully supermodular if p(X) + p(Y) ≤
p(X ∩ Y) + p(X ∪ Y) holds for every X, Y ⊆ V. If p is fully supermodular and
monotone increasing then C(p) := {x ∈ R^V : x ≥ 0, x(A) ≥ p(A) for every A ⊆
V} is called a contra-polymatroid. A function p : 2^V → Z ∪ {−∞} is skew
supermodular if for every X, Y ⊆ V either the above supermodular inequality
holds or p(X) + p(Y) ≤ p(X − Y) + p(Y − X). The next result is due to Frank.
Theorem 6. [5] Let p be a skew supermodular function. Then C(p) is a contra-
polymatroid whose unique (monotone, fully supermodular) defining function p̄ is
given by

    p̄(X) := max{ Σ_{i=1}^{t} p(X_i) : {X_1, ..., X_t} is a subpartition of X }.  □

Given a graph G = (V, E) and k ∈ Z+ let pG : 2^V → Z be defined by
pG(X) := k − dG(X) if ∅ ≠ X ≠ V, and pG(∅) = pG(V) = 0. This function
pG is skew-supermodular. Following [5], we say that a vector z : V → Z+
is an augmentation vector of G if z(X) ≥ pG(X) for every X ⊆ V. Observe
that G′ = (V + s, E + F) satisfies (1) with respect to k if and only if z(v) :=
dF(v) (v ∈ V) is an augmentation vector. Hence by Theorem 6 the problem
of finding a smallest F can be solved by finding an integer valued element of
the contra-polymatroid C(pG) for which z(V) is minimum. This can be done
by a greedy algorithm. Similarly, F′ is simultaneously good for G and H if and
only if z(X) ≥ max{pG(X), pH(X)} for every X ⊆ V, where z(v) := d_{F′}(v)
(v ∈ V). Let us call such a z a common augmentation vector of G and H.
Observe that finding a common augmentation vector is equivalent to finding
an integer valued z ∈ C(pG ) ∩ C(pH ) for which z(V ) is minimum. By the g-
polymatroid intersection theorem (see Frank and Tardos [7] and [4]) the system
S(pG, pH) = {x ∈ R^V : x ≥ 0, x(A) ≥ max{pG(A), pH(A)} for every A ⊆ V} is a submodular
flow system and hence we can find an integer valued x ∈ S(pG , pH ) minimizing
x(V ) in polynomial time.
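
To make the notion of a common augmentation vector concrete, here is a brute-force Python sketch on the four-vertex example used earlier in this section (E = {ac, bd}, K = {ab, bc, cd, da}, k = 2, l = 3). It is an illustration only: it enumerates all subsets instead of solving a submodular flow problem, the function names are hypothetical, and restricting the search to entries between 0 and max(k, l) is an assumption that is sufficient for this toy instance.

```python
from itertools import combinations, product

def cut_size(edges, X):
    """Number of edges with exactly one end in X."""
    return sum((u in X) != (v in X) for u, v in edges)

def p(edges, V, X, k):
    """Deficiency function: k - d(X) for nonempty proper X, and 0 otherwise."""
    X = set(X)
    if not X or X == set(V):
        return 0
    return k - cut_size(edges, X)

def is_common_augmentation_vector(z, V, E, K, k, l):
    """Check z(X) >= max{p_G(X), p_H(X)} for every subset X of V."""
    for size in range(len(V) + 1):
        for X in combinations(V, size):
            need = max(p(E, V, X, k), p(K, V, X, l))
            if sum(z[v] for v in X) < need:
                return False
    return True

V = ["a", "b", "c", "d"]
E = [("a", "c"), ("b", "d")]
K = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
k, l = 2, 3

# Toy search box: each entry of z ranges over 0..max(k, l).
bound = max(k, l)
candidates = (dict(zip(V, vals))
              for vals in product(range(bound + 1), repeat=len(V)))
feasible = [z for z in candidates
            if is_common_augmentation_vector(z, V, E, K, k, l)]
best = min(feasible, key=lambda z: sum(z.values()))
print(best, sum(best.values()))
```

Here every vertex has degree 2 in H, so each singleton forces z(v) ≥ 1, and the all-ones vector turns out to be feasible; this matches the set F′ = {sa, sb, sc, sd} of the example.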
Summarizing the previous results we obtain the following algorithm for the
simultaneous edge-connectivity augmentation problem. Let G = (V, E) and H =
(V, K) (satisfying (1) with respect to k and l, resp.) be the pair of input graphs.
(Step 1) Find a common augmentation vector for G and H for which z(V ) is as
small as possible.
(Step 2) Add a new vertex s to each of G and H and z(v) parallel edges from s
to v for every v ∈ V . If z(V ) is odd then add one more edge sw for some w ∈ V .
(Step 3) Find a maximal legal splitting sequence S at s in the resulting pair
of graphs. If S is complete, let the solution F consist of the set of split edges.
Otherwise splitting off S results in a pair of graphs G′, H′ for which the degree
of s is four. In this case delete s and add a (common) set I of three properly
chosen edges to G′ − s and H′ − s. Let the solution F be the union of the split
edges and I.
The following theorem shows the correctness of the above algorithm and
proves that the solution set F is (almost) optimal. Let us define

    Φ_{k,l}(G, H) = max{ Σ_{i=1}^{r} (k − d_G(X_i)) + Σ_{i=r+1}^{t} (l − d_H(X_i)) :
                         {X_1, ..., X_t} is a subpartition of V; 0 ≤ r ≤ t }.

The size of a smallest simultaneous augmenting set for G and H (with respect
to k and l, resp.) is denoted by OPT_{k,l}(G, H).

Theorem 7. ⌈Φ_{k,l}(G, H)/2⌉ ≤ OPT_{k,l}(G, H) ≤ ⌈Φ_{k,l}(G, H)/2⌉ + 1. If k and l
are both even then OPT_{k,l}(G, H) = ⌈Φ_{k,l}(G, H)/2⌉ holds.

Proof. It is easy to see that ⌈Φ_{k,l}(G, H)/2⌉ ≤ OPT_{k,l}(G, H) holds. We will show
that the above algorithm results in a simultaneous augmenting set F with size
at most ⌈Φ_{k,l}(G, H)/2⌉ + 1 (and with size ⌈Φ_{k,l}(G, H)/2⌉, if k and l are both
even). From the g-polymatroid intersection theorem and our remarks on common
augmentation vectors it can be verified that for the vector z that we obtain in
Step 1 of the above algorithm we have z(V) = Φ_{k,l}(G, H) and hence we have
2⌈Φ_{k,l}(G, H)/2⌉ edges incident to s at the end of Step 2. By Theorem 5 we can
find a maximal sequence of legal splittings in Step 3 which is either complete or
results in a pair of graphs G′, H′, where the degree of s is four. In the former case
the set F of split edges, which is clearly a feasible simultaneous augmenting set,
has size ⌈Φ_{k,l}(G, H)/2⌉, and hence is optimal. If k and l are both even then such
a complete legal splitting always exists, proving OPT_{k,l}(G, H) = ⌈Φ_{k,l}(G, H)/2⌉.
In the latter case by Corollary 1 one of G′ and H′, say G′, is round. Now there
exists a complete admissible splitting in H′ by Theorem 1. Let e = uv, f = xy
be the two edges obtained by such a complete splitting. Let g = vx. By Lemma
3(b) adding the edge set I := {e, f, g} to G′ − s yields a k-edge-connected graph.
Thus the set of edges F which is the union of the edges obtained by the maximal
legal splitting sequence and the edge set I is a simultaneous augmenting set. Now
|F| = ⌈Φ_{k,l}(G, H)/2⌉ + 1, as required. □

There are examples showing that OPT_{k,l}(G, H) = ⌈Φ_{k,l}(G, H)/2⌉ + 1 may hold if one of k and l is odd.
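
The bounds of Theorem 7 can be checked by brute force on a tiny instance. The Python sketch below is an illustration under stated assumptions, not the paper's algorithm: it reuses the four-vertex example of this section (E = {ac, bd}, K = {ab, bc, cd, da}, k = 2, l = 3), it takes the blocks of a subpartition to be nonempty proper subsets of V, it lets each block contribute the larger of its two deficiencies (which corresponds to choosing r and the order of the blocks freely), and it allows parallel new edges. The function names phi and opt are hypothetical.

```python
from itertools import combinations, combinations_with_replacement, product
from math import ceil

def cut_size(edges, X):
    """Number of edges with exactly one end in X."""
    return sum((u in X) != (v in X) for u, v in edges)

def edge_connected(V, edges, k):
    """True if every nonempty proper subset of V has cut size >= k."""
    return all(cut_size(edges, set(X)) >= k
               for size in range(1, len(V))
               for X in combinations(V, size))

def phi(V, E, K, k, l):
    """Brute-force value of Phi over all subpartitions of V."""
    n, best = len(V), 0
    # Label 0 means "vertex unused"; label j > 0 puts the vertex in block j.
    for labels in product(range(n + 1), repeat=n):
        blocks = {}
        for v, lab in zip(V, labels):
            if lab:
                blocks.setdefault(lab, set()).add(v)
        if any(len(B) == n for B in blocks.values()):
            continue  # assumption: a block must be a proper subset of V
        total = sum(max(k - cut_size(E, B), l - cut_size(K, B))
                    for B in blocks.values())
        best = max(best, total)
    return best

def opt(V, E, K, k, l):
    """Smallest number of new edges making G k- and H l-edge-connected."""
    pairs = list(combinations(V, 2))
    size = 0
    while True:
        for F in combinations_with_replacement(pairs, size):
            F = list(F)
            if edge_connected(V, E + F, k) and edge_connected(V, K + F, l):
                return size
        size += 1

V = ["a", "b", "c", "d"]
E = [("a", "c"), ("b", "d")]
K = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
phi_value = phi(V, E, K, 2, 3)
opt_value = opt(V, E, K, 2, 3)
print(phi_value, opt_value)
```

On this instance the lower bound ⌈Φ/2⌉ is not attained, which is consistent with the fact that one of k and l is odd here.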


It is easy to see that the above algorithm can be implemented in polynomial
time. (As we pointed out, Step 1 is a submodular flow problem and hence can
be solved in polynomial time. One approach to solve Step 3 efficiently is using
max-flow computations to check whether a pair of edges is legal or not.) We
omit the algorithmic details from this version.
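
As a rough illustration of the max-flow check mentioned above, the following Python sketch tests whether a pair su, sv is legal. It is a sketch under assumptions, not the omitted implementation: condition (1) is taken to mean that every two vertices other than s are joined by at least k edge-disjoint paths, edges at s are stored as tuples (s, v), and local edge-connectivity is computed with plain BFS augmenting paths. All function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def local_edge_connectivity(edges, x, y):
    """Maximum number of edge-disjoint x-y paths in an undirected multigraph,
    found by repeated BFS augmentation on unit capacities."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {x: None}
        queue = deque([x])
        while queue and y not in parent:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if y not in parent:
            return flow
        w = y
        while parent[w] is not None:  # push one unit along the path found
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1

def is_admissible(edges, s, u, v, k):
    """Replace su, sv by uv and test that all pairs of vertices other than s
    remain k-edge-connected."""
    new_edges = list(edges)
    new_edges.remove((s, u))
    new_edges.remove((s, v))
    new_edges.append((u, v))
    others = sorted({w for e in edges for w in e if w != s})
    return all(local_edge_connectivity(new_edges, x, y) >= k
               for x, y in combinations(others, 2))

def legal_pairs(edges_G, edges_H, s, k, l):
    """All pairs of distinct neighbours u, v of s with su, sv legal."""
    nbrs = sorted({w for e in edges_G if s in e for w in e if w != s})
    return [(u, v) for u, v in combinations(nbrs, 2)
            if is_admissible(edges_G, s, u, v, k)
            and is_admissible(edges_H, s, u, v, l)]

F = [("s", "a"), ("s", "b"), ("s", "c"), ("s", "d")]
G_edges = [("a", "c"), ("b", "d")] + F
H_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")] + F
pairs = legal_pairs(G_edges, H_edges, "s", 2, 3)
print(pairs)
```

The instance is the four-vertex example from the beginning of Section 5, where d(s) = 4, so the absence of a legal pair does not conflict with Theorem 5.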
References
1. J. Bang-Jensen, H.N. Gabow, T. Jordán and Z. Szigeti, Edge-connectivity aug-
mentation with partition constraints, Proc. 9th Annual ACM-SIAM Symposium
on Discrete Algorithms (SODA) 1998, pp. 306-315. To appear in SIAM J. Discrete
Mathematics.
2. G.R. Cai and Y.G. Sun, The minimum augmentation of any graph to a k-edge-
connected graph, Networks 19 (1989), 151–172.
3. W.H. Cunningham, A. Frank, A primal-dual algorithm for submodular flows,
Mathematics of Operations Research, 10 (1985) 251-262.
4. J. Edmonds, R. Giles, A min-max relation for submodular functions on graphs,
Annals of Discrete Mathematics 1 (1977) 185-204.
5. A. Frank, Augmenting graphs to meet edge–connectivity requirements, SIAM J.
Discrete Mathematics, 5 (1992) 22–53.
6. A. Frank, Connectivity augmentation problems in network design, in: Mathemat-
ical Programming: State of the Art 1994, (Eds. J.R. Birge and K.G. Murty), The
University of Michigan, Ann Arbor, MI, 34-63, 1994.
7. A. Frank and É. Tardos, Generalized polymatroids and submodular flows, Math-
ematical Programming 42, 1988, pp 489-563.
8. L. Lovász, Combinatorial Problems and Exercises, North-Holland, Amsterdam,
1979.
9. H. Nagamochi and P. Eades, Edge-splitting and edge-connectivity augmentation
in planar graphs, in: R.E. Bixby, E.A. Boyd, and R.Z. Rios-Mercado (Eds.), IPCO VI,
LNCS 1412, Springer-Verlag, Berlin Heidelberg, 1998, pp. 96-111.
Integral Polyhedra Associated with Certain
Submodular Functions Defined on 012-Vectors

Kenji Kashiwabara, Masataka Nakamura, and Takashi Takabatake

Graduate Division of International and Interdisciplinary Studies,
University of Tokyo
Komaba, Meguro-ku, Tokyo 153-8902, Japan
{kashiwa,nakamura,takashi}@klee.c.u-tokyo.ac.jp

Abstract. A new class of polyhedra, named greedy-type polyhedra, is
introduced. This class contains polyhedra associated with submodular
set functions. Greedy-type polyhedra are associated with submodular
functions defined on 012-vectors and have 012-vectors as normal vectors
of their facets. The face structure of greedy-type polyhedra is described
with maximal chains of a certain partial order defined on 012-vectors.
Integrality of polyhedra associated with integral greedy-type functions is
shown through total dual integrality of the systems of inequalities defin-
ing polyhedra. Then a dual algorithm maximizing linear functions over
these polyhedra is proposed. It is shown that feasible outputs of certain
bipartite networks with gain make greedy-type polyhedra. A separation
theorem for greedy-type functions is also proved.

1 Introduction
J. Edmonds and R. Giles introduced the notion of total dual integrality of sys-
tems of inequalities [1]. Several combinatorial optimization problems can be for-
mulated as linear programming problems, the sets of whose feasible solutions
are described by totally dual integral systems of inequalities [9], [10]. Such sys-
tems of linear inequalities have in common a part derived from submodular or
supermodular set functions. Linear inequalities defined by set functions deter-
mine hyperplanes whose normal vectors are 01-vectors. Thus facets of feasible
polyhedra for such problems have only 01-vectors as their normal vectors.
K. Murota, in his theory of “discrete convex analysis” [7], introduced func-
tions on integer lattice points, called M-convex and L-convex functions. An L-
convex function is a generalization of a submodular set function and satisfies
submodularity on integer lattice points:

    f(p) + f(q) ≥ f(p ∨ q) + f(p ∧ q)   for any p, q ∈ ℤ^S.    (1)

He proved duality theorems for L- and M- convex functions based on the dis-
crete separation theorem by A. Frank [2] and showed that submodularity may
be regarded as a discrete version of convexity.
While L-convex functions satisfy (1), the submodularity of L-convex functions
is essentially that of set functions. (It corresponds to the fact that the effective

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 289–303, 1999.
© Springer-Verlag Berlin Heidelberg 1999

domain of an M-convex function makes a polyhedron, normal vectors of whose
facets are 01-vectors. See [7].) We would like to investigate polyhedra reflecting
submodularity on integer lattice points, facets of which polyhedra may have “non
01-vectors” as their normal vectors. And then we would like to discuss total dual
integrality and convexity of such polyhedra. In the first step, we study polyhedra
described with submodular functions defined on 012-vectors, which make totally
dual integral systems of inequalities.

2 Preliminaries

Throughout this paper, S denotes a nonempty finite set. For a set X ⊆ S, the symbol
χ^X denotes the characteristic vector of X. For an element e ∈ S, the
characteristic vector of e is denoted by χ^e. For a vector p ∈ ℝ^S, the symbol
supp(p) denotes {e ∈ S | p(e) ≠ 0} and supp^k(p) denotes {e ∈ S | p(e) = k}
for any real k. For vectors p, q ∈ ℝ^S, the vectors s and t defined by s(e) :=
max{p(e), q(e)} and t(e) := min{p(e), q(e)} for e ∈ S are denoted by p ∨ q and
p ∧ q, respectively. The 1-norm of p ∈ ℝ^S is Σ_{e∈S} |p(e)| and is denoted by ||p||.
The symbol ℤ_+ (ℝ_+) denotes the set of nonnegative integers (reals,
respectively).
A chain of a partially ordered set (S, ⪯) is a subset of S which is totally
ordered with ⪯. A chain is maximal if it is a proper subset of no chain. We write
(p_1, . . . , p_m) for a chain {p_1, . . . , p_m} with p_i ⪯ p_{i+1} for i ∈ {1, . . . , m − 1}.
The support function of a convex set C ⊆ ℝ^n is the function δ*_C : ℝ^n → ℝ
defined by δ*_C(x) := sup{x^T y | y ∈ C}. Any support function f is known to be
positively homogeneous [8], that is, f(kp) = kf(p) holds for any positive real k.
A set C ⊆ ℝ^n is called down-monotone if for each y ∈ C all vectors x ∈ ℝ^n
with x ≤ y are in C.
A system of inequalities Ax ≤ b, where A is an m × n-matrix and b is an
m-dimensional vector, is totally dual integral (TDI) if (i) A and b are rational and
(ii) for each integral vector c such that the LP problem max{c^T x | Ax ≤ b} has
a finite optimal value, its dual LP problem min{y^T b | y^T A = c^T, y ≥ 0} has an
integral optimal solution. The system is box TDI if the system

    Ax ≤ b;  l ≤ x ≤ u

is TDI for each pair of rational vectors l and u [1]. It is proved in [1] that if
a system of inequalities Ax ≤ b is TDI with an integral vector b, then {x ∈
ℝ^n | Ax ≤ b} is an integral polyhedron.
A set function f : 2^S → ℝ is submodular if it satisfies f(X) + f(Y) ≥
f(X ∪ Y) + f(X ∩ Y) for any X, Y ⊆ S. We assume f(∅) = 0 in the following.
A polyhedron P(f) is associated with a submodular set function f on 2^S
with f(∅) = 0 by P(f) = {x ∈ ℝ^S | Σ_{e∈X} x(e) ≤ f(X) (X ⊆ S)}. The system
defining such a polyhedron is known to be TDI and box TDI [1].
A submodular set function f : 2^S → ℝ is extended to a convex function f̂
on ℝ^S_+ by f̂(p) := max{p^T x | x ∈ P(f)} (p ∈ ℝ^S_+) in [6]. This extended function
f̂ is called the Lovász extension of f. It is proved in [7] that g : ℤ^S → ℤ is the
restriction of the Lovász extension of an integer-valued submodular set function
to ℤ^S if and only if g satisfies the following properties:

(L0) g is positively homogeneous;
(L1) g(p) + g(q) ≥ g(p ∨ q) + g(p ∧ q) for any p, q ∈ ℤ^S;
(L2) There exists an integer r such that g(p + 𝟙) = g(p) + r for any p ∈ ℤ^S,
     where 𝟙 denotes the all-ones vector.

It is obvious from (L2) that the values of the Lovász extension for any nonnegative
vectors are nonnegative combinations of values for 01-vectors.
A function g : ℤ^S → ℤ ∪ {+∞} is called L-convex if g satisfies (L1) and (L2)
and if g(p) ∈ ℤ for some p ∈ ℤ^S [7]. The class of L-convex functions thereby
contains the restriction of the Lovász extension to ℤ^S of any integer-valued
submodular set function.
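
As a small numerical illustration of (L0), (L1) and (L2), the following Python sketch evaluates the Lovász extension of one submodular set function on nonnegative integer vectors. It rests on assumptions that are not taken from this paper: the value max{p^T x | x ∈ P(f)} is computed with the classical greedy formula (sort the elements by decreasing p and sum p(e) times the marginal value of f), the set function f(X) = min(|X|, 2) on a three-element ground set is a hypothetical example, and the properties are only tested on the grid {0, 1, 2}^S.

```python
from itertools import product

def lovasz_extension(f, p):
    """Greedy evaluation of the Lovász extension at a nonnegative vector p:
    visit elements by decreasing p(e) and add p(e) times the marginal gain."""
    order = sorted(range(len(p)), key=lambda e: -p[e])
    value, prefix, prev = 0, set(), 0
    for e in order:
        prefix.add(e)
        cur = f(frozenset(prefix))
        value += p[e] * (cur - prev)
        prev = cur
    return value

def f(X):
    """Hypothetical example: rank function of a uniform matroid on 3 elements."""
    return min(len(X), 2)

n = 3
grid = list(product(range(3), repeat=n))  # the vectors in {0, 1, 2}^S
g = lambda p: lovasz_extension(f, p)

join = lambda p, q: tuple(max(a, b) for a, b in zip(p, q))
meet = lambda p, q: tuple(min(a, b) for a, b in zip(p, q))

# (L0) on the grid: doubling a vector doubles the value.
l0_ok = all(g(tuple(2 * a for a in p)) == 2 * g(p) for p in grid)
# (L1) on the grid: lattice submodularity.
l1_ok = all(g(p) + g(q) >= g(join(p, q)) + g(meet(p, q))
            for p in grid for q in grid)
# (L2) on the grid: adding the all-ones vector adds r = f(S).
r = f(frozenset(range(n)))
l2_ok = all(g(tuple(a + 1 for a in p)) == g(p) + r for p in grid)
print(l0_ok, l1_ok, l2_ok, g((2, 1, 0)))
```

For instance, at p = (2, 1, 0) the greedy order visits the elements with marginal values 1, 1 and 0, which gives 2·1 + 1·1 + 0·0 = 3.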

3 Definition of Greedy-Type Polyhedra

In this section we introduce a class of polyhedra, named greedy-type polyhedra,
defined with functions on 012-vectors.

Definition 1 (Greedy-type function). Let S be a nonempty finite set. We
call a function f : {0, 1, 2}^S → ℝ greedy-type on {0, 1, 2}^S if it satisfies the
following properties:

(G1) f(2p) = 2f(p) for every p ∈ {0, 1}^S;
(G2) f(p) + f(q) ≥ f(p ∨ q) + f(p ∧ q) for any p, q ∈ {0, 1, 2}^S;
(G3) f(χ^X + χ^Y) + f(χ^X + χ^Z) ≥ f(χ^Y + χ^Z) + f(2χ^X) for any X, Y, Z ⊆ S
     with X ⊊ Y ⊊ Z.

In what follows, we refer to the respective properties in Definition 1 simply as
(G1), (G2), and (G3).
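For small S, the three properties can be verified exhaustively. The sketch below is an illustration, not part of the paper: the name `is_greedy_type` and the dictionary encoding are assumptions. The example data are the two-element function given later in Remark 3, with the values at (2, 0), (0, 2), and (2, 2) filled in from (G1).

```python
from itertools import product


def is_greedy_type(f, n):
    """Brute-force test of (G1), (G2), (G3) for a dict f on {0,1,2}^n."""
    cube = list(product((0, 1, 2), repeat=n))
    sets = list(product((0, 1), repeat=n))           # characteristic vectors
    join = lambda p, q: tuple(map(max, p, q))
    meet = lambda p, q: tuple(map(min, p, q))
    add = lambda p, q: tuple(a + b for a, b in zip(p, q))
    proper = lambda a, b: a != b and all(u <= v for u, v in zip(a, b))

    g1 = all(f[tuple(2 * v for v in p)] == 2 * f[p] for p in sets)
    g2 = all(f[p] + f[q] >= f[join(p, q)] + f[meet(p, q)]
             for p in cube for q in cube)
    g3 = all(f[add(X, Y)] + f[add(X, Z)] >= f[add(Y, Z)] + f[add(X, X)]
             for X in sets for Y in sets for Z in sets
             if proper(X, Y) and proper(Y, Z))
    return g1, g2, g3


# Example from Remark 3; the last three values follow from (G1).
f_example = {(0, 0): 0, (1, 0): 10, (0, 1): 9, (1, 1): 14,
             (2, 1): 22, (1, 2): 22,
             (2, 0): 20, (0, 2): 18, (2, 2): 28}
```

The function returns one Boolean per property, so a failing property can be identified directly.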

Remark 2. Property (G1) can be regarded as positive homogeneity and
implies f(0) = 0. Property (G2) is submodularity. Property (G3) implies
f(p) ≤ f(χ^{supp(p)}) + f(χ^{supp_2(p)}) as its special case. This inequality shows that values
of greedy-type functions for non 01-vectors possess non-redundant information,
while those of the Lovász extensions of submodular set functions do not.

Remark 3. Restrictions of the Lovász extensions of submodular set functions to
{0, 1, 2}^S satisfy (G1), (G2), and (G3). They are thereby greedy-type functions.
L-convex functions satisfy (G2) but, in general, neither (G1) nor (G3), so
restrictions of L-convex functions to {0, 1, 2}^S need not be greedy-type.
On the other hand, an integral greedy-type function may not be the restric-
tion of any L-convex function. An example of such functions, the cardinality of
whose support is two, is defined by f ((0, 0)) := 0, f ((1, 0)) := 10, f ((0, 1)) := 9,
292 Kenji Kashiwabara, Masataka Nakamura, and Takashi Takabatake

f((1, 1)) := 14, f((2, 1)) := 22, f((1, 2)) := 22, with the remaining values
determined by (G1). This f fails (L2) and is therefore not the restriction of any
L-convex function. (This function f is not the restriction of any L♮-convex
function introduced in [4], either.)
Functions on {0, ±1}^S, such as bisubmodular functions and functions
determining ternary semimodular polyhedra, correspond to certain functions on
{0, 1, 2}^S with a 1 to 1 mapping between {0, ±1} and {0, 1, 2}. (See [3] for
definitions of bisubmodular functions and ternary semimodular polyhedra.) Functions
on {0, 1, 2}^S corresponding to bisubmodular functions do not satisfy (G2) with
any mapping. On the other hand, there exists a mapping with which functions
on {0, 1, 2}^S corresponding to those on {0, ±1}^S determining ternary semimodular
polyhedra satisfy (G2). But the greedy-type function f described in the previous
paragraph also fails the necessary condition for functions determining ternary
semimodular polyhedra.
The focus of this paper is to study polyhedra associated with greedy-type
functions in the following way.
Definition 4 (Greedy-type polyhedra). A polyhedron P ⊆ ℝ^S is called
a greedy-type polyhedron if there exists a greedy-type function f on {0, 1, 2}^S such that
P = {x ∈ ℝ^S | p^T x ≤ f(p) (p ∈ {0, 1, 2}^S)}. Then the polyhedron P is also
denoted by P(f), where f is such a greedy-type function.
We give examples of such polyhedra and polyhedra associated with submod-
ular set functions in Fig. 1.

[Figure 1, two panels: "A greedy-type polyhedron" and "The polyhedron associated
with a submodular set function"]

Fig. 1. A greedy-type polyhedron and the polyhedron associated with a
submodular set function

4 Face Structure of Greedy-Type Polyhedra


In this section, the face structure of greedy-type polyhedra is described with
maximal chains of a poset defined on a subset of 012-vectors. We first make
several definitions necessary for the definition of this poset.

We write Ir(S)_2 for {0, 1, 2}^S \ {0, 2}^S, which is the set of “irreducible” vectors
in {0, 1, 2}^S.
Now let S consist of n elements s_1, ..., s_n. A permutation σ of {1, ..., n}
gives a total order ≤_σ on S such that s_σ(i) ≤_σ s_σ(j) for every i, j with i ≤ j. We
write ≤_σ for this total order. For k_1 ∈ {0, 1, ..., n}, we write vec_σ(k_1) for the
characteristic vector of the set {s_σ(i) ∈ S | 1 ≤ i ≤ k_1}. The symbol vec_σ(k_2, k_1)
denotes vec_σ(k_2) + vec_σ(k_1) in what follows. We sometimes omit σ when there
is no danger of confusion.
A vector p ∈ {0, 1, 2}^S is simply non-increasing under ≤_σ if p(s_σ(i)) ≥
p(s_σ(j)) for any i, j ∈ {1, ..., n} with i ≤ j.
With these definitions, we introduce a partial order, the set of whose maximal
chains is the main issue of this section.
Definition 5 (Partial order ≼_2). We define a partial order ≼_2 on Ir(S)_2 so
that p ≼_2 q if both p and q are simply non-increasing under some total order on
S and if
(a) p ≤ q and supp(p) = supp(q), or,
(b) p ≤ q and supp(p) ⊆ supp_2(q).

The check for well-definedness of this partial order is easy and left to the reader.
Here we describe maximal chains of the poset (Ir(S)_2, ≼_2).
Lemma 6. Let S be a nonempty set which consists of n elements 1, ..., n. Then
a sequence of vectors (p_1, ..., p_k) in Ir(S)_2 is a maximal chain of the partial
order ≼_2 if and only if
(I) the number k of vectors equals n, and,
(II) there exists a total order ≤_σ on S such that
  (i) p_n = vec_σ(n − 1, n), and,
  (ii) p_i is either vec_σ(i − 1) + χ^{supp(p_{i+1})} or vec_σ(i − 1, i) for i = 1, ..., n − 1.
Proof. Obviously a sequence of n vectors (p_1, ..., p_n) satisfying (I) and (II)
makes a chain of the partial order. From Definition 5, two vectors in a chain
must have different numbers of 2's. Thus this chain is maximal.
We now prove the “only if” part. It is obvious that any chain of ≼_2 consists
of at most n vectors. Let q_1, q_2 be arbitrary vectors in Ir(S)_2 with q_1 ≼_2 q_2 and
|supp_2(q_1)| + 1 < |supp_2(q_2)|. Then, for any e ∈ supp_2(q_2) \ supp_2(q_1), the vector
q_2 − χ^e is between q_1 and q_2 with respect to ≼_2. Thus any maximal chain consists
of exactly n vectors. The inclusion relationship of the sets supp_2(p_i) determines one unique
total order ≤_σ on S under which all p_i are simply non-increasing. It is obvious
that (i) and (ii) must be satisfied for this total order. □

Corollary 7. Any maximal chain of (Ir(S)_2, ≼_2), where S consists of n
elements, contains exactly n elements. Let ≤_σ be the total order on S under which
all vectors of a maximal chain (p_1, ..., p_n) are simply non-increasing. Then p_k
coincides with vec_σ(k − 1, j) for some j with k ≤ j ≤ n and p_i = vec_σ(i − 1, j)
holds for any i with k ≤ i ≤ j.

(2, 2, 1) (2, 1, 2) (1, 2, 2)

(2, 1, 0) (2, 1, 1) (2, 0, 1) (1, 2, 0) (1, 2, 1) (0, 2, 1) (1, 0, 2) (1, 1, 2) (0, 1, 2)

(1, 0, 0) (1, 1, 0) (1, 0, 1) (1, 1, 1) (0, 1, 0) (0, 0, 1) (0, 1, 1)

Fig. 2. Maximal chains of (Ir(S)_2, ≼_2) for |S| = 3

Figure 2 shows the set of maximal chains of the partial order ≼_2, where |S|
is three.
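The characterization in Lemma 6 gives a direct way to enumerate maximal chains: fix a permutation, start from p_n = vec(n − 1, n), and choose one of the two options in (ii) at each step downwards. The following sketch follows that recipe; the function names and the labels "A" and "B" for the two options are assumptions of this illustration.

```python
from itertools import permutations, product


def vec(order, k, n):
    """Characteristic vector of the first k elements of `order`."""
    v = [0] * n
    for s in order[:k]:
        v[s] = 1
    return v


def maximal_chains(n):
    """Enumerate the chains described by conditions (I) and (II) of Lemma 6."""
    chains = set()
    for order in permutations(range(n)):
        for choices in product("AB", repeat=n - 1):
            chain = [None] * (n + 1)                 # 1-based positions
            chain[n] = tuple(a + b for a, b in
                             zip(vec(order, n - 1, n), vec(order, n, n)))
            for i in range(n - 1, 0, -1):
                base = vec(order, i - 1, n)
                if choices[i - 1] == "A":            # vec(i-1) + chi^{supp(p_{i+1})}
                    top = [1 if x > 0 else 0 for x in chain[i + 1]]
                else:                                # vec(i-1, i)
                    top = vec(order, i, n)
                chain[i] = tuple(a + b for a, b in zip(base, top))
            chains.add(tuple(chain[1:]))
    return chains
```

For n = 3 the vectors appearing in these chains are exactly the 19 irreducible vectors listed in Fig. 2.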
A maximal chain (p_1, ..., p_n) of the poset (Ir(S)_2, ≼_2) together with a
function f on Ir(S)_2 determines a point in ℝ^S as the intersection of hyperplanes
p_i^T x = f(p_i) (i = 1, ..., n). We state this fact as the following lemma. We
call this point the point corresponding to the maximal chain in what follows.

Lemma 8. Let (p_1, ..., p_n) be a maximal chain of a poset (Ir(S)_2, ≼_2), where
S is a finite set with n elements. Then the set of vectors {p_1, ..., p_n} is linearly
independent.

Proof. We may assume, without loss of generality, that S = {1, ..., n} and that
supp_2(p_i) = {1, ..., i − 1} for i ≥ 2. We prove the lemma by
showing that the n × n-matrix (p_n, ..., p_1)^T is nonsingular.
Let p_1 be vec(j). By Corollary 7, p_i = vec(i − 1, j) for every i = 1, ..., j.
Then χ^i = p_{i+1} − p_i for i = 1, ..., j − 1 and χ^j = 2p_1 − p_j. The proof is completed
if j = n. Otherwise we can reduce the size of the matrix and repeat the same
procedure to prove the lemma. □

We would like to show the fact that any point corresponding to a maximal
chain belongs to P (f ) if f is a greedy-type function. Thus such points are vertices
of P (f ). The following two lemmas are used to prove this fact.

Lemma 9. Let f be a function from a subset of ℝ^S to ℝ, where S consists of
n elements. Let p_1, ..., p_n be linearly independent vectors in dom(f), that is,
P = (p_1, ..., p_n)^T is a nonsingular n × n-matrix. Let q be an arbitrary vector
in dom(f). Let a ∈ ℝ^S denote (P^{−1})^T q. The intersection of hyperplanes p_i^T x =
f(p_i) (i = 1, ..., n), which is a point, belongs to the half space q^T x ≤ f(q) if and
only if f(q) ≥ Σ_{i=1}^n a(i) f(p_i).
Proof. Let b be the n dimensional vector (f(p_1), ..., f(p_n))^T. The intersection
of the n hyperplanes is P^{−1} b ∈ ℝ^S. Our claim follows from
q^T (P^{−1} b) = (q^T P^{−1}) b = a^T b = Σ_{i=1}^n a(i) f(p_i). □

Lemma 10. Let f be a greedy-type function on {0, 1, 2}^S and let p, q be
vectors in {0, 1, 2}^S such that either supp_1(p) \ supp(q) or supp_1(q) \ supp(p) is not
empty. Then f(p) + f(q) ≥ f((p + q) ∧ 2χ^S) + f(p + q − (p + q) ∧ 2χ^S) holds.

Proof. If supp_1(p) ∩ supp_1(q) = ∅, the inequality of the lemma is nothing but
that of (G2). So we may assume that supp_1(p) ∩ supp_1(q) is not empty. Let r
denote 2χ^{supp_2(p∨q)}. By submodularity, we have

f(p) + f(q) + f(r) ≥ f(p ∨ q) + f(p ∧ q) + f(r)
                   ≥ f(p ∨ q) + f((p ∧ q) ∨ r) + f((p ∧ q) ∧ r). (2)

Let X, Y, Z denote supp(r), supp((p ∧ q) ∨ r), supp(p ∨ q), respectively. Then
(p ∧ q) ∨ r = χ^X + χ^Y and p ∨ q = χ^X + χ^Z. By the assumption that
supp_1(p) ∩ supp_1(q) ≠ ∅, we have X ⊊ Y. Since either supp_1(p) \ supp(q) or
supp_1(q) \ supp(p) is not empty, we have Y ⊊ Z. By (G3), we have

f(p ∨ q) + f((p ∧ q) ∨ r) = f(χ^X + χ^Y) + f(χ^X + χ^Z)
                          ≥ f(χ^Y + χ^Z) + f(2χ^X)
                          = f((p + q) ∧ 2χ^S) + f(r). (3)

Since (p ∧ q) ∧ r = p + q − (p + q) ∧ 2χ^S, the inequality of the lemma follows
from (2) and (3). □

Now we have reached the point to state the main theorem of this section,
which ensures that every maximal chain of the poset (Ir(S)_2, ≼_2) has its
corresponding vertex on P(f) for any greedy-type function f.

Theorem 11. Let f be a greedy-type function whose support is S. Then its
associated polyhedron P(f) contains any point corresponding to a maximal chain
of (Ir(S)_2, ≼_2).

Proof. Let n be |S|. Choose an arbitrary maximal chain (p_1, ..., p_n). Let x ∈ ℝ^S
be the vertex corresponding to the chain. Let ≤_σ denote the total order on
S under which p_1, ..., p_n are simply non-increasing. We omit the symbol σ
for simplicity. Let P denote the matrix (p_1, ..., p_n)^T and b denote the vector
(f(p_1), ..., f(p_n))^T. In this proof, lst(p) denotes the last coordinate that is not
0, more precisely, min{i ∈ {0, 1, ..., n} | p(j) = 0 for every j > i}. The symbol
lst_2(p) denotes min{i ∈ {0, 1, ..., n} | p(j) < 2 for every j > i}.
This theorem is true if x satisfies the linear inequality

p^T x ≤ f(p) (4)

for every p ∈ {0, 1, 2}^S. By Lemma 9, (4) is equivalent to

f(p) ≥ Σ_{i=1}^n a(i) f(p_i), (5)

where a is the only vector in ℝ^S with p = Σ_{i=1}^n a(i) p_i. If p + p_k = Σ_i m_i q_i holds
for (i) some vector p_k in the maximal chain, (ii) some nonnegative integers m_i,
and (iii) some vectors q_i for which (5) is shown, then the following inequality

f(p) + f(p_k) ≥ Σ_i m_i f(q_i) (6)

implies (5). We thereby prove this theorem by showing (5) or (6) for every
p ∈ {0, 1, 2}^S.
In this proof we call a vector p ∈ {0, 1, 2}^S hole-less under ≤_σ if χ^{supp(p)} =
vec_σ(k_1) for some k_1 ∈ {0, 1, ..., n}. We say that a vector has a hole if it is not
a hole-less vector.
We first show these inequalities for vectors simply non-increasing under
≤_σ, then for hole-less vectors in {0, 1, 2}^S, and lastly for arbitrary vectors in
{0, 1, 2}^S. It should be noted that (5) holds if the vector p belongs to the chain.
We show (5) or (6) for a simply non-increasing vector p, using induction on
lst(p). Inequality (5) is trivial if lst(p) = 0, that is, p = 0. Suppose that (5)
holds for any simply non-increasing vector p ∈ {0, 1, 2}^S with lst(p) ≤ k − 1. We
examine whether (5) or (6) holds for a vector p ∈ {0, 1, 2}^S with lst(p) = k.
We first assume that p_k = vec(k − 1, k). We also assume that neither p
nor (1/2)p belongs to the chain, since (5) is trivial otherwise. By Corollary 7, this
assumption implies that supp(p_1) ⊊ {1, ..., k}. Let p_l be the last vector in the
chain with supp(p_l) ⊊ {1, ..., k}. We see from Corollary 7 that p_l = vec(l − 1, l),
p_{l+1} = vec(l, k), and supp_2(p) ⊆ {1, ..., l − 1} hold. Then p + p_l = p − χ^{{l,...,k}} +
p_{l+1} holds. By Lemma 10, we have f(p) + f(p_l) ≥ f(p_{l+1}) + f(p − χ^{{l,...,k}}). Since
p − χ^{{l,...,k}} is a simply non-increasing vector with lst(p − χ^{{l,...,k}}) < k, (6) is
proved.
Now we assume that p_k = vec(k − 1, j) for some j > k. Then p + p_k =
p − χ^k + p_{k+1} holds. By Lemma 10, we have f(p) + f(p_k) ≥ f(p − χ^k) + f(p_{k+1}).
Since p − χ^k is a simply non-increasing vector with lst(p − χ^k) < k, we have
shown (6).
We show (6) for a hole-less vector p in {0, 1, 2}^S, using induction on ||p||. If
||p|| ≤ 2, then p is a simply non-increasing vector, for which we have already
shown (5). Suppose that (5) holds for every hole-less vector p ∈ {0, 1, 2}^S with
||p|| ≤ k − 1. We examine whether (6) holds for a hole-less vector p ∈ {0, 1, 2}^S
with ||p|| = k. Let l denote lst_2(p).
We may assume l ≥ 2 since otherwise p is a simply non-increasing vector.
Obviously p ∨ p_l is simply non-increasing, and p ∧ p_l is a hole-less vector with
||p ∧ p_l|| ≤ k − 1. Since f(p) + f(p_l) ≥ f(p ∨ p_l) + f(p ∧ p_l) follows from (G2),
we see that (6) holds for such p.
We show (6) for an arbitrary vector p in {0, 1, 2}^S, using induction on ||p||.
If ||p|| = 0, then p equals 0 and (6) holds for such p. Suppose that (6) holds for
every p ∈ {0, 1, 2}^S with ||p|| ≤ k − 1. We examine whether (6) holds for a vector
p ∈ {0, 1, 2}^S with ||p|| = k.
Let l be “the last coordinate of holes”, more precisely, max{i ∈ S \ {n} | p(i) =
0 and p(i + 1) ≠ 0}. We may assume that p has a hole and then l ≥ 1 holds. Then
(p + p_l) ∧ 2χ^S is a hole-less vector and ||p + p_l − (p + p_l) ∧ 2χ^S|| ≤ k − 1. Since p(l) = 0
and p_l(l) = 1, we have f(p) + f(p_l) ≥ f((p + p_l) ∧ 2χ^S) + f(p + p_l − (p + p_l) ∧ 2χ^S)
by Lemma 10. Thus (6) is shown. □

5 Integrality of Greedy-Type Polyhedra

In this section, we discuss integrality of greedy-type polyhedra. We prove the
total dual integrality of the system of linear inequalities:

p^T x ≤ f(p) (p ∈ {0, 1, 2}^S) (7)

for a rational greedy-type function f defined on {0, 1, 2}^S.

Lemma 12. Every vector c ∈ ℝ^S_+ is a nonnegative combination of vectors in a
certain maximal chain of the poset (Ir(S)_2, ≼_2). Moreover if the vector is integral,
the coefficients may be taken from ℤ_+.

Proof. Let the set S be {1, ..., n}. Considering permutations of S, we only have
to show this lemma for a simply non-increasing vector c. The lemma is proved
as a corollary of the claim stated in the following paragraph.
Let j be some integer in {1, ..., n} and let c be a simply non-increasing
vector. Let l denote min{i ∈ {j, ..., n} | c(i′) = 0 (i′ > i)}. If both c(i) = c(j)
for i = j + 1, ..., l and c(i) = 0 for i = l + 1, ..., n are satisfied, there exists
a maximal chain (p_1, ..., p_n) of (Ir(S)_2, ≼_2) such that (i) c is a nonnegative
combination of p_1, ..., p_j and (ii) supp(p_j) = {1, ..., l}. Moreover if c is integral,
the coefficients may be taken from ℤ_+.
We use induction on j to prove this claim. If j = 1, the claim is trivial.
Suppose that the claim holds for any integer j with j ≤ k. We now show the
claim for j = k + 1.
We consider the following two cases according to the vector c. If c(k + 1) ≤
c(k) − c(k + 1), let c′ denote the vector c − c(k + 1)vec(k, l). Then c′ is nonnegative,
non-increasing, and c′(i) = 0 for i = k + 1, ..., n. If c(k + 1) > c(k) − c(k + 1), let c′
denote c − (c(k) − c(k + 1))vec(k, l). Then c′ is nonnegative, non-increasing,
and c′(i) = c′(k) for i = k, ..., l and c′(i) = 0 for i = l + 1, ..., n.
In either case, by the assumption of induction, c′ is represented as a nonnegative
combination of vectors p_1, ..., p_k, which make a chain of (Ir(S)_2, ≼_2). Thus
c can be represented as a nonnegative combination of p_1, ..., p_k, vec(k, l). Since
either supp(p_k) ⊆ supp_2(vec(k, l)) or supp(p_k) = supp(vec(k, l)) holds, vectors
p_1, ..., p_k, vec(k, l) also make a chain.
If c is integral, the coefficient for vec(k, l) is an integer and the other
coefficients are also integral since the vector c′ is also integral.
Lemma 12 follows from the claim for j = n. □

Corollary 13. For every vertex of a greedy-type polyhedron in ℝ^S, there exists
a maximal chain of (Ir(S)_2, ≼_2) corresponding to the vertex.

Theorem 14. Let f be a rational greedy-type function defined on {0, 1, 2}^S,
where S is a nonempty finite set. Then the system of linear inequalities

p^T x ≤ f(p) (p ∈ {0, 1, 2}^S)

is totally dual integral.

Proof. The LP problem max{c^T x | p^T x ≤ f(p) (p ∈ {0, 1, 2}^S)} has a finite
optimal value if and only if c is nonnegative. By Lemma 12, a nonnegative
integral vector c_0 is a nonnegative integral combination of vectors in some
maximal chain of the poset (Ir(S)_2, ≼_2). Thus the coefficients of the combination
make an integral feasible solution of the dual problem min{y^T b | yA = c_0, y ≥ 0},
where b is the appropriate vector composed of the values f(p). Let y_0 denote this
feasible solution and let x_0 denote the vertex on P(f) corresponding to the chain.
It is easy to show y_0^T b = c_0^T x_0. By the duality theorem of linear programming,
we see that this integral feasible solution of the dual problem is an optimal
solution. Thus the system of inequalities is TDI. □

Corollary 15. Let f be an integral greedy-type function defined on {0, 1, 2}^S,
where S is a nonempty finite set. Then its associated polyhedron P(f) is integral.

Remark 16. Such a system of linear inequalities need not be box TDI. Let f be a
greedy-type function derived from the polyhedron {x ∈ ℝ^{{s1,s2,s3}} | x(s1) ≤ 2,
x(s2) ≤ 4, x(s3) ≤ 1, 2x(s1) + x(s2) + 2x(s3) ≤ 6}. Adding inequalities
corresponding to the box (0, 0, 0)^T ≤ x ≤ (2, 4, 1)^T to the system of linear inequalities
p^T x ≤ f(p) (p ∈ {0, 1, 2}^{{s1,s2,s3}}) makes a system of inequalities which is not
TDI. For example, the objective vector (3, 2, 1)^T is maximized by the vertex
(1, 4, 0), while (3, 2, 1)^T is not any nonnegative integral combination of vectors
(2, 1, 2)^T, (0, 0, −1)^T, (0, 1, 0)^T, which correspond to the vertex (1, 4, 0). (See
Fig. 3.)
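The non-integrality claimed in the remark can be checked by solving for the coefficients directly. The symbols a, b, c below are introduced only for this worked computation; since the three vectors are linearly independent, the solution is unique.

```latex
\[
a\begin{pmatrix}2\\1\\2\end{pmatrix}
+ b\begin{pmatrix}0\\0\\-1\end{pmatrix}
+ c\begin{pmatrix}0\\1\\0\end{pmatrix}
= \begin{pmatrix}3\\2\\1\end{pmatrix}
\;\Longrightarrow\;
a = \tfrac{3}{2},\qquad
c = 2 - a = \tfrac{1}{2},\qquad
b = 2a - 1 = 2 .
\]
```

The coefficients are nonnegative, but a and c are not integers, so no nonnegative integral combination exists.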

At the end of this section, we propose a dual greedy algorithm which solves
the LP problem max{c^T x | p^T x ≤ f(p) (p ∈ {0, 1, 2}^S)} for a greedy-type
function f : {0, 1, 2}^S → ℝ and a nonnegative objective vector c ∈ ℝ^S_+. Readers
may easily see that Algorithm 17 gives a representation of the objective vector
as a nonnegative linear combination of vectors in a maximal chain. Then the
validity of Algorithm 17 follows from the duality theorem of linear programming
and Theorem 11. (It is also proved in [11].)

Algorithm 17 (Dual greedy algorithm for greedy-type functions).

Input A greedy-type function f : {0, 1, 2}^S → ℝ, where S is a finite set with n
  elements. A nonnegative objective vector c ∈ ℝ^S_+.
Output The maximum value of c^T x among x ∈ P(f). A vertex x ∈ P(f) which
  gives the maximum value.
Step1 Sort the support S = {1, 2, ..., n} so that c(i) ≥ c(i + 1) (1 ≤ i ≤ n − 1).

[Figure 3: polytope drawn in the axes s1, s2, s3, with labelled points (0,4,1),
(2,0,1), (1,4,0), (2,2,0), (2,4,−1) and the facet (2,1,2)x ≤ 6]

Fig. 3. The polytope corresponding to the system of inequalities in Remark 16

Step2 Let l, y be n dimensional vectors. Initialize c_{n+1} := c, v_max := 0.
Step3 For i := n, n − 1, ..., 2, 1, repeat:
    y(i) := min{c_{i+1}(i − 1) − c_{i+1}(i), c_{i+1}(i)},
    l(i) := i if c_{i+1}(i + 1) = 0, and l(i) := l(i + 1) otherwise,
    p_i := vec(i − 1, l(i)),
    c_i := c_{i+1} − y(i) p_i,
    v_max := v_max + y(i) f(p_i),
  where c_{n+1}(n + 1) and c_2(0) are considered to be 0 and +∞, respectively.
End Return the value v_max and the vector P^{−1} b as the outputs, where P is the
  n × n-matrix with p_i^T as its ith row, and b is the n dimensional vector with
  f(p_i) as its ith component, for i = 1, ..., n.
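The steps of Algorithm 17 can be sketched in Python as follows. This is an illustrative transcription under stated assumptions, not the authors' implementation: f is passed as a dictionary keyed by tuples in the original element order, exact arithmetic uses `fractions.Fraction`, and the helper `solve` (a plain Gauss-Jordan elimination) stands in for computing P^{−1} b. The example function is the two-element greedy-type function of Remark 3, with values at (2, 0), (0, 2), (2, 2) filled in from (G1).

```python
from fractions import Fraction


def solve(rows, rhs):
    """Solve the square system rows * x = rhs exactly (Gauss-Jordan)."""
    n = len(rows)
    m = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def dual_greedy(f, c):
    """Run Steps 1-3 and End of Algorithm 17; return (vmax, vertex, chain, y)."""
    n = len(c)
    order = sorted(range(n), key=lambda s: -c[s])        # Step 1
    cur = [Fraction(c[s]) for s in order]                # Step 2: c_{n+1}
    vmax, chain, y, l_prev = Fraction(0), [None] * n, [None] * n, n
    for i in range(n, 0, -1):                            # Step 3 (1-based i)
        # c(0) is treated as +infinity, so y(1) is just the remaining c(1).
        yi = cur[i - 1] if i == 1 else min(cur[i - 2] - cur[i - 1], cur[i - 1])
        # c(n+1) is treated as 0, so l(n) = n.
        l_i = i if (i == n or cur[i] == 0) else l_prev
        p_sorted = [2] * (i - 1) + [1] * (l_i - i + 1) + [0] * (n - l_i)
        cur = [a - yi * b for a, b in zip(cur, p_sorted)]
        p = [0] * n
        for pos, s in enumerate(order):                  # back to original labels
            p[s] = p_sorted[pos]
        chain[i - 1], y[i - 1], l_prev = tuple(p), yi, l_i
        vmax += yi * f[chain[i - 1]]
    vertex = solve(chain, [f[p] for p in chain])         # End: x = P^{-1} b
    return vmax, vertex, chain, y


# Example data: the function of Remark 3 (last three values via (G1)).
f_example = {(0, 0): 0, (1, 0): 10, (0, 1): 9, (1, 1): 14,
             (2, 1): 22, (1, 2): 22, (2, 0): 20, (0, 2): 18, (2, 2): 28}
```

The returned coefficients y and chain vectors give the representation of c as a nonnegative combination of a maximal chain, which is the dual solution used in the proof of Theorem 14.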

6 An Example and Separation Theorem


We mention here some bipartite network flow with gain whose feasible flows
make a greedy-type polyhedron. A separation theorem for greedy-type functions
is also shown in this section.
Definition 18 (Bipartite network with {1, 2}-gain). Let D = (S^+, S; A) be
a simple bipartite digraph such that A = {(s, t) | s ∈ S^+, t ∈ S}. The members
of S^+ are called sources and those of S sinks. Let an upper capacity function
c : A → ℝ, a supply function b : S^+ → ℝ, and a gain function
α : A → {1, 2} be given. We call such a network a bipartite network with {1, 2}-gain and
write N = ((S^+, S; A), c, b, α) for this network.
Given a bipartite network with {1, 2}-gain N = ((S^+, S; A), c, b, α), we define
feasible flows of this network as follows. A bipartite flow with {1, 2}-gain (or
simply flow) of N is a function ϕ : A → ℝ. We define the generalized boundary
(or simply boundary) of a flow ϕ to be a function ∂ϕ : S^+ ∪ S → ℝ defined by
∂ϕ(v) := Σ_{a∈δ^+ v} ϕ(a) − Σ_{a∈δ^− v} α(a)ϕ(a) for every v ∈ S^+ ∪ S. A flow is feasible
if ϕ(a) ≤ c(a) for every a ∈ A and if ∂ϕ(v) ≤ b(v) for every v ∈ S^+. We call
−∂ϕ|_S ∈ ℝ^S an output of N if ϕ is a feasible flow of N.


We call the set of all outputs in ℝ^S of a bipartite network with {1, 2}-gain the
output polyhedron of the network. We call restrictions of the support functions
of polyhedra to {0, 1, 2}^S the {0, 1, 2}-support functions of the polyhedra.
From now on, we would like to show that the output polyhedron is a
greedy-type polyhedron. It is not difficult to prove this statement for bipartite
networks with {1, 2}-gain whose source sets consist of one element.
Lemma 19. Let P be the output polyhedron of a bipartite network with {1, 2}-gain
N = (({s}, S; A), c, b, α), whose source set consists of the single element s.
Let f be the {0, 1, 2}-support function of P. Then f is a greedy-type function
and P(f) = P holds.
Proof. Let S be {1, ..., n}. Let γ be the n dimensional vector defined by γ(i) =
2/α(i) for i ∈ S. Obviously P = {x ∈ ℝ^S | γ^T x ≤ 2b, x(i) ≤ α(i)c(i) for any i ∈
S} holds.
We first show that f is greedy-type, that is, f satisfies (G1), (G2), and (G3).
Since f is a restriction of the support function, it satisfies (G1).
If b ≥ Σ_i c(i), then (α(1)c(1), ..., α(n)c(n)) is the only vertex of P. Since any
value of the {0, 1, 2}-support function is attained by this vertex, the inequalities
of (G2) and (G3) hold with equalities. We now assume b < Σ_i c(i). In this case,
P has exactly n vertices, each of which corresponds to the vector set

{γ} ∪ {χ^j ∈ {0, 1}^S | j ∈ S \ {i}}

for i = 1, ..., n.
For an arbitrary vector p in {0, 1, 2}^S, let argmin(p, γ) denote an element l
of S that satisfies p(l)/γ(l) ≤ p(i)/γ(i) for any i ∈ S. (If there exists more than
one such element, any choice is possible for the following argument.) We
see that f(p) is attained by the vertex corresponding to vectors in

{γ} ∪ {χ^j ∈ {0, 1}^S | j ∈ S \ {argmin(p, γ)}},

and we see that f(p) is equal to

cntrb(p) f(γ) + Σ_{j ∈ S\{argmin(p,γ)}} [p(j) − cntrb(p) γ(j)] f(χ^j),

where we write cntrb(p) for p(argmin(p, γ)) / γ(argmin(p, γ)) in this proof.
Then we have

f(p) + f(q) − f(p ∨ q) − f(p ∧ q) =
(cntrb(p ∨ q) + cntrb(p ∧ q) − cntrb(p) − cntrb(q)) [Σ_{j∈S} γ(j) f(χ^j) − f(γ)].

It is clear that Σ_{j∈S} γ(j) f(χ^j) − f(γ) ≥ 0 and cntrb(p ∨ q) + cntrb(p ∧ q) ≥
cntrb(p) + cntrb(q) hold. Thus (G2) is proved and (G3) follows from a similar
argument.
Since the normal vectors of the facets of P are 012-vectors, we see that P = P(f)
holds. □

Now all we have to prove is that the sum of greedy-type polyhedra is the
polyhedron associated with the sum of those functions, which is also a greedy-
type function.

Lemma 20. Let S be a nonempty finite set. Let f and g be greedy-type functions
from {0, 1, 2}^S to ℝ. Then the vector sum of P(f) and P(g) coincides with
P(f + g).

Proof. Let S contain n elements. Since it is obvious that P(f + g) ⊇ P(f) + P(g),
all we have to show is that P(f + g) ⊆ P(f) + P(g).
To prove our claim, we show that δ*_{P(f+g)}(p) ≤ δ*_{P(f)+P(g)}(p) holds for any
vector p ∈ ℝ^S. This inequality is trivial if p ∉ ℝ^S_+. (Then both sides are equal
to +∞.) Let q be an arbitrary vector in ℝ^S_+. By Lemma 12, q is a nonnegative
combination of vectors p_1, ..., p_n that compose a maximal chain of (Ir(S)_2, ≼_2).
Let P denote the n × n-matrix that has p_i as its ith row vector for i = 1, ..., n.
Then q^T = y^T P for some nonnegative vector y ∈ ℝ^n_+. Let b_1 (b_2) be the n
dimensional vector that has f(p_i) (g(p_i)) as its ith component, respectively. Since
the point corresponding to the chain belongs to P(f + g), we have δ*_{P(f+g)}(q) =
y^T (b_1 + b_2) by the duality theorem of linear programming. Since f and g are
greedy-type, P(f) contains P^{−1} b_1 and P(g) contains P^{−1} b_2. The polyhedron
P(f) + P(g) thereby contains P^{−1}(b_1 + b_2). Then δ*_{P(f)+P(g)}(q) ≥ q^T P^{−1}(b_1 +
b_2) = y^T (b_1 + b_2) = δ*_{P(f+g)}(q), which is what we wanted to show. □

As a corollary of Lemma 19 and Lemma 20, we have the following theorem.

Theorem 21. Let P be the output polyhedron of a bipartite network with {1, 2}-
gain and let f be the {0, 1, 2}-support function of P . Then f is a greedy-type
function and P (f ) = P holds.

From Lemma 20, we have Theorem 22, which is a kind of separation theorem.

Theorem 22. Let S be a nonempty finite set. Let f and g be functions from
{0, 1, 2}^S to ℝ such that f and −g are greedy-type. If f(p) ≥ g(p) holds for every
p ∈ {0, 1, 2}^S, there exists a vector h ∈ ℝ^S which satisfies f(p) ≥ h^T p ≥ g(p)
for every p ∈ {0, 1, 2}^S.

Proof. We have P(f − g) = P(f) + P(−g) by Lemma 20. Since f ≥ g, P(f − g)
contains the zero vector and so does P(f) + P(−g). It means that there exists
a vector x ∈ ℝ^S such that x ∈ P(f) and −x ∈ P(−g). Then f(p) ≥ x^T p ≥ g(p)
holds for every p ∈ {0, 1, 2}^S. □

Remark 23. Property (G3) is essential for Theorem 22. We give two functions
defined on {0, 1, 2}^S that satisfy (G1) and (G2) and that are not separated by
any linear function.
Let f be the {0, 1, 2}-support function of the polytope

{x ∈ ℝ^{{s1,s2,s3}} | x(s1) ≤ 2, x(s2) ≤ 4, x(s3) ≤ 1, (2, 1, 2)x ≤ 6, (2, 1, 0)x ≤ 6}

and let f′ be that of

{x ∈ ℝ^{{s1,s2,s3}} | x(s1) ≤ 2, x(s2) ≤ 1, x(s3) ≤ 4, (2, 2, 1)x ≤ 6, (2, 0, 1)x ≤ 6}.

(P(f) is the minimum down-monotone polyhedron that includes the nonnegative
part of the polytope in Fig. 3. P(f′) is obtained from P(f) by exchanging the
s2-axis and the s3-axis.)
Let g be the function on {0, 1, 2}^{{s1,s2,s3}} defined by g(p) = (3.5, 1, 4)p − f′(p)
for p ∈ {0, 1, 2}^{{s1,s2,s3}}. Then f ≥ g holds. Both f and −g satisfy (G1)
and (G2) but not (G3). For example, f((2, 1, 1)) + f((2, 1, 0)) = 12 is less than
f((2, 2, 1)) + f((2, 0, 0)) = 14. (See Table 1 for values of f, f′, g.)
Since f and g give the same value for three vectors (2, 0, 1)^T, (2, 1, 1)^T, and
(2, 1, 2)^T, the only possible vector for h is (2.5, 1, 0)^T. But then f((1, 0, 0)) is less
than h^T (1, 0, 0)^T.
Property (G3) is also essential for Lemma 20. The vertex (3.5, 1, 4) of P(f +
f′) is not contained in P(f) + P(f′). (The value of the support function of P(f) for
(4, 1, 2)^T is 10 and that of the support function of P(f′) is 12, while (3.5, 1, 4)(4, 1, 2)^T
is 23.)
The reason why f and g cannot be separated is that the face structures of
the polyhedra for f and g 0 have no common refinement such that normal vectors
of its facets are 012-vectors. Property (G3) is necessary for the face structure of
greedy-type polyhedra to have such a common refinement. See [5] for relationship
between face structures and separation theorem.

Table 1. Values of functions in Remark 23

p        f(p)  f′(p)  g(p)     p        f(p)  f′(p)  g(p)     p        f(p)  f′(p)  g(p)

(0,0,0)   0     0     0        (1,1,1)   5     5     3.5      (2,1,1)   6     6     6
(1,0,0)   2     2     1.5      (2,1,0)   6     5     3        (1,2,1)   9     6     3.5
(0,1,0)   4     1     0        (1,2,0)   9     4     1.5      (1,1,2)   6     9     3.5
(0,0,1)   1     4     0        (0,2,1)   9     6     0        (2,2,1)  10     6     7
(1,1,0)   5     3     1.5      (0,1,2)   6     9     0        (1,2,2)  10    10     3.5
(0,1,1)   5     5     0        (1,0,2)   4     9     2.5      (2,1,2)   6    10     6
(1,0,1)   3     5     2.5      (2,0,1)   5     6     5
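The entries of Table 1 can be cross-checked mechanically. The sketch below is an illustration only: the dictionary names are assumptions, and the data are copied from the table. It recomputes g(p) = (3.5, 1, 4)p − f′(p) and exposes the quantities used in Remark 23.

```python
# Values from Table 1, keyed by p = (p(s1), p(s2), p(s3)).
F = {(0,0,0): 0, (1,0,0): 2, (0,1,0): 4, (0,0,1): 1, (1,1,0): 5,
     (0,1,1): 5, (1,0,1): 3, (1,1,1): 5, (2,1,0): 6, (1,2,0): 9,
     (0,2,1): 9, (0,1,2): 6, (1,0,2): 4, (2,0,1): 5, (2,1,1): 6,
     (1,2,1): 9, (1,1,2): 6, (2,2,1): 10, (1,2,2): 10, (2,1,2): 6}
FP = {(0,0,0): 0, (1,0,0): 2, (0,1,0): 1, (0,0,1): 4, (1,1,0): 3,
      (0,1,1): 5, (1,0,1): 5, (1,1,1): 5, (2,1,0): 5, (1,2,0): 4,
      (0,2,1): 6, (0,1,2): 9, (1,0,2): 9, (2,0,1): 6, (2,1,1): 6,
      (1,2,1): 6, (1,1,2): 9, (2,2,1): 6, (1,2,2): 10, (2,1,2): 10}
G_TABLE = {(0,0,0): 0, (1,0,0): 1.5, (0,1,0): 0, (0,0,1): 0, (1,1,0): 1.5,
           (0,1,1): 0, (1,0,1): 2.5, (1,1,1): 3.5, (2,1,0): 3, (1,2,0): 1.5,
           (0,2,1): 0, (0,1,2): 0, (1,0,2): 2.5, (2,0,1): 5, (2,1,1): 6,
           (1,2,1): 3.5, (1,1,2): 3.5, (2,2,1): 7, (1,2,2): 3.5, (2,1,2): 6}


def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


# Recompute g from its definition g(p) = (3.5, 1, 4) p - f'(p).
g = {p: dot((3.5, 1, 4), p) - FP[p] for p in FP}

# The only candidate separating vector identified in Remark 23.
h = (2.5, 1, 0)

# Left- and right-hand sides of the (G3) instance with
# X = {s1}, Y = {s1, s2}, Z = {s1, s2, s3}; f((2,0,0)) = 2 f((1,0,0)) by (G1).
g3_lhs = F[(2, 1, 0)] + F[(2, 1, 1)]
g3_rhs = F[(2, 2, 1)] + 2 * F[(1, 0, 0)]
```

With these data, f dominates g everywhere in the table, h matches both functions on the three tight vectors, and h^T(1, 0, 0)^T exceeds f((1, 0, 0)).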

References
[1] Edmonds, J., Giles, R.: A min-max relation for submodular functions on graphs.
Annals of Discrete Mathematics 1 (1977) 185–204
[2] Frank, A.: An algorithm for submodular functions on graphs. Annals of Discrete
Mathematics 16 (1982) 97–120
[3] Fujishige, S.: Submodular Functions and Optimization. Annals of Discrete
Mathematics 47 (1991)
[4] Fujishige, S., Murota, K.: On the relationship between L-convex functions and
submodular integrally convex functions. RIMS Preprint 1152, Research Institute
for Mathematical Sciences, Kyoto University (1997)
[5] Kashiwabara, K.: Set Functions and Polyhedra. PhD thesis, Tokyo Institute of
Technology (1998)
[6] Lovász, L.: Submodular functions and convexity. In: Grötschel, M., Bachem, A.,
Korte, B. (eds.), Mathematical Programming: The State of the Art.
Springer-Verlag, Berlin (1983) 235–257
[7] Murota, K.: Discrete convex analysis. Mathematical Programming 83 (1998)
313–371
[8] Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
[9] Schrijver, A.: Total dual integrality from directed graphs, crossing families, and
sub- and supermodular functions. In: Pulleyblank, W.R. (ed.), Progress in
Combinatorial Optimization. Academic Press (1984) 315–361
[10] Schrijver, A.: Theory of Linear and Integer Programming. Wiley (1986)
[11] Takabatake, T.: Generalizations of Submodular Functions and Delta-Matroids.
PhD thesis, University of Tokyo (1998)
Optimal Compaction of
Orthogonal Grid Drawings*
(Extended Abstract)

Gunnar W. Klau and Petra Mutzel

Max–Planck–Institut für Informatik


Im Stadtwald, D–66123 Saarbrücken, Germany
{guwek, mutzel}@mpi-sb.mpg.de

Abstract. We consider the two–dimensional compaction problem for


orthogonal grid drawings in which the task is to alter the coordinates of
the vertices and edge segments while preserving the shape of the drawing
so that the total edge length is minimized. The problem is closely related
to two–dimensional compaction in VLSI design and has been shown to
be NP–hard.
We characterize the set of feasible solutions for the two–dimensional com-
paction problem in terms of paths in the so–called constraint graphs in
x– and y–direction. Similar graphs (known as layout graphs) have already
been used for one–dimensional compaction in VLSI design, but this is the
first time that a direct connection between these graphs is established.
Given the pair of constraint graphs, the two–dimensional compaction
task can be viewed as extending these graphs by new arcs so that cer-
tain conditions are satisfied and the total edge length is minimized. We
can recognize those instances having only one such extension; for these
cases we solve the compaction problem in polynomial time.
We transform the geometrical problem into a graph–theoretical one and
formulate it as an integer linear program. Our computational experi-
ments show that the new approach works well in practice.

1 Introduction

The compaction problem has been one of the challenging tasks in VLSI design.
The goal is to minimize the area or total edge length of the circuit layout while
mainly preserving its shape. In graph drawing the compaction problem also plays
an important role. Orthogonal grid drawings, in particular the ones produced by
the algorithms based on bend minimization (e.g., [16, 6, 10]), suffer from missing
compaction algorithms. In orthogonal grid drawings every edge is represented
as a chain of horizontal and vertical segments; moreover, the vertices and bends
are placed on grid points. Two orthogonal drawings have the same shape if one
can be obtained from the other by modifying the lengths of the horizontal and
* This work is partially supported by the Bundesministerium für Bildung,
Wissenschaft, Forschung und Technologie (No. 03–MU7MP1–4).

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 304–319, 1999.
© Springer-Verlag Berlin Heidelberg 1999

vertical segments without changing the angles formed by them. The orthogonal
drawing standard is particularly suitable for Entity–Relationship diagrams,
state charts, and PERT diagrams, with applications in databases, CASE tools,
algorithm engineering, workflow management, and many more.
So far, in graph drawing only heuristics have been used for compacting or-
thogonal grid drawings. Tamassia [16] suggested “refining” the shape of the
drawing into one with rectangular faces by introducing artificial edges. If all the
faces are rectangles, the compaction problem can be solved in polynomial time
using minimum–cost network flow algorithms. In general, however, the solution
is far from the optimal solution for the original graph without the artificial edges
(see also Sect. 4). Other heuristics are based on the idea of iteratively fixing the
x– and y–coordinates followed by a one–dimensional compaction step. In one–
dimensional compaction, the goal is to minimize the width or height of the layout
while preserving the coordinates of the fixed dimension. It can be solved in polynomial
time using the so–called layout graphs. The differences between the methods are
illustrated in Fig. 1. It shows the output of four different compaction methods
for a given graph with fixed shape. The big vertices are internally represented
as a face that must have rectangular shape. Figure 1(a) illustrates the method
proposed in [16]. Figures 1(b) and 1(c) show the output of two one–dimensional
compaction strategies applied to the drawing in Fig. 1(a). Both methods use the
underlying layout graphs; the first algorithm is based on longest–path computations,
the second on computing maximum flows. Figure 1(d) shows an optimally
compacted drawing computed by our algorithm.
The compaction problems in VLSI design and in graph drawing are closely
related but are not the same. Most versions of the compaction problem in VLSI
design and the compaction problem in graph drawing are proven to be NP–hard
[11, 14]. Research in VLSI design has concentrated on one–dimensional
methods; only two papers have suggested optimal algorithms for two–dimensio-
nal compaction (see [15, 8]). The idea is based on a (0, 1)–non–linear integer
programming formulation of the problem and solving it via branch–and–bound.
Unfortunately, the methods have not been efficient enough for practical use.
Lengauer states the following: “The difficulty of two–dimensional compaction
lies in determining how the two dimensions of the layout must interact to minimize
the area” [11, p. 581]. The two–dimensional compaction method presented
in this paper attacks exactly this point. We provide a necessary and sufficient
condition for all feasible solutions of a given instance of the compaction problem.
This condition is based on existing paths in the so–called constraint graphs in
x– and y–direction. These graphs are similar to the layout graphs known from
one–dimensional compaction methods. The layout graphs, however, are based
on visibility properties whereas the constraint graphs arise from the shape of
the drawing.
Let us describe our idea more precisely: The two constraint graphs Dh and
Dv specify the shape of the given orthogonal drawing. We characterize exactly
those extensions of the constraint graphs which belong to feasible orthogonal
grid drawings. The task is to extend the given constraint graphs to a complete

[Figure 1: four drawings: (a) original method, (b) longest paths, (c) flow, (d) optimal.]

Fig. 1. The result of four different compaction algorithms.

pair of constraint graphs by adding new arcs for which the necessary and suffi-
cient condition is satisfied and the total edge–length of the layout is minimized.
Hence, we have transformed the geometrical problem into a graph–theoretical
one. Furthermore, we can detect constructively those instances having only one
complete extension. For these cases, we can solve the compaction problem in
polynomial time.
We formulate the resulting problem as an integer linear program that can be
solved via branch–and–cut or branch–and–bound methods. Our computational
results on a benchmark set of 11,582 graphs [4] have shown that we are able
to solve the two–dimensional compaction problem for all the instances in short
computation time. Furthermore, they have shown that often it is worthwhile
looking for the optimally compacted drawing. The total edge lengths have been
improved by up to 37.0% and 65.4% as compared to iterated one–dimensional
compaction and the method proposed in [16].

In Sect. 2 we provide the characterization of the set of feasible solutions. The
formulation of the compaction problem in the form of an integer linear program
is given in Sect. 3. We describe a realization of our approach and corresponding
computational experiments on a set of benchmark graphs in Sect. 4. Finally,
Sect. 5 contains the conclusions and our future plans. In this version of the
paper we have omitted some proofs due to space limitations. They can be found
in [9].

2 A Characterization of Feasible Solutions

In this section we describe the transformation of the compaction problem into
a graph–theoretical problem. After giving some basic definitions we introduce
the notion of shape descriptions and present some of their properties. As a main
result of this section, we establish a one–to–one correspondence between shape
descriptions satisfying a certain property and feasible orthogonal grid drawings.

2.1 Definitions and Notations

In an orthogonal (grid) drawing Γ of a graph G the vertices are placed on mutually
distinct integer grid points and the edges on mutually line–distinct paths in
the grid connecting their endpoints. We call Γ simple if the number of bends and
crossings in Γ is zero. Each orthogonal grid drawing can be transformed into a
simple orthogonal grid drawing. The straightforward transformation consists of
replacing each crossing and bend by a virtual vertex. Figure 2 gives an example.

Fig. 2. An orthogonal grid drawing and its simple counterpart.

The shape of a simple drawing is given by the angles inside the faces, i.e.,
the angles occurring at consecutive edges of a face cycle. Note that the notion
of shape induces a partitioning of drawings in equivalence classes. Often, the
shape of an orthogonal drawing is given by a so–called orthogonal representation
H. Formally, for a simple orthogonal drawing, H is a function from the set of
faces F to clockwise ordered lists of tuples (er , ar ) where er is an edge, and ar is

the angle formed with the following edge inside the appropriate face. Using the
definitions, we can specify the compaction problem:
Definition 1. The compaction problem for orthogonal drawings (cod) is stated
by the following: Given a simple orthogonal grid drawing Γ with orthogonal representation
H, find a drawing Γ′ with orthogonal representation H of minimum
total edge length.
The compaction problem (cod) has been shown to be NP–hard [14]. So far,
in practice, one–dimensional graph–based compaction methods have been used.
The results, however, are often unsatisfactory. Figure 3(a) shows an orthogonal drawing
with total edge length 2k+5 which cannot be improved by one–dimensional com-
paction methods. The reason for this lies in the fact that one–dimensional com-
paction methods are based on visibility properties. An exact two–dimensional
compaction method computes Fig. 3(b) in which the total edge length is only
k + 6.
Let Γ be a simple orthogonal grid drawing of a graph G = (V, E). It induces a
partition of the set of edges E into the horizontal set Eh and the vertical set Ev .
A horizontal (resp. vertical) subsegment in Γ is a connected component in (V, Eh )
(resp. (V, Ev )). If the component is maximally connected it is also referred to
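as a segment. We denote the sets of horizontal and vertical subsegments by $\mathcal{S}_h$
and $\mathcal{S}_v$, resp., and the sets of segments by $S_h \subseteq \mathcal{S}_h$, $S_v \subseteq \mathcal{S}_v$. We further define
$\mathcal{S} = \mathcal{S}_h \cup \mathcal{S}_v$ and $S = S_h \cup S_v$. The following observations are of interest (see also
Fig. 4).
1. Each edge is a subsegment, i.e., $E_h \subseteq \mathcal{S}_h$, $E_v \subseteq \mathcal{S}_v$.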
2. Each vertex v belongs to one horizontal and one vertical segment, denoted
by hor(v) and vert(v).
3. Each subsegment s is contained in exactly one segment, referred to as seg(s).
4. The limits of a subsegment s are given as follows: Let vl , vr , vb , and vt be
the leftmost, rightmost, bottommost, and topmost vertices on s. Then l(s) =
vert(vl ), r(s) = vert(vr ), b(s) = hor(vb ), and t(s) = hor(vt ).
The following lemma implies that the total number of segments is 2|V | − |E|.
Lemma 1. (proof omitted) Let Γ be a simple orthogonal grid drawing of a graph
G = (V, Eh ∪ Ev ). Then |Sh | = |V | − |Eh | and |Sv | = |V | − |Ev |.
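
Lemma 1 can be checked numerically on the small drawing of Fig. 4. The following is a minimal sketch, assuming the vertex and edge sets are read off that figure; the helper name count_components is introduced here for illustration only. It counts the maximal connected components of (V, Eh) and (V, Ev), which are exactly the horizontal and vertical segments.

```python
def count_components(vertices, edges):
    """Number of connected components of (vertices, edges), via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        parent[find(u)] = find(w)
    return len({find(v) for v in vertices})


# Vertex and edge sets read off the drawing in Fig. 4.
V = ["v1", "v2", "v3", "v4", "v5"]
E_h = [("v2", "v5"), ("v3", "v4")]
E_v = [("v1", "v2"), ("v2", "v3"), ("v4", "v5")]

num_h = count_components(V, E_h)  # number of horizontal segments
num_v = count_components(V, E_v)  # number of vertical segments
print(num_h, num_v, num_h + num_v == 2 * len(V) - len(E_h) - len(E_v))
```

For this instance the counts agree with |V| − |Eh| and |V| − |Ev|, and with the sets Sh and Sv listed in Fig. 4.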

[Figure 3: two drawings (a) and (b); a brace marks a chain of k edges.]

Fig. 3. A drawing generated with (a) an iterative one–dimensional and (b) an
optimal compaction method.

[Figure 4: drawing with vertices v1, ..., v5, horizontal segments s1, s2, s3 and vertical segments s4, s5.]

Sh = {s1 , s2 , s3 }    Sv = {s4 , s5 }

i   V (si )          E(si )                    l(si )  r(si )  b(si )  t(si )
1   {v1 }            {}                        s4      s4      s1      s1
2   {v2 , v5 }       {(v2 , v5 )}              s4      s5      s2      s2
3   {v3 , v4 }       {(v3 , v4 )}              s4      s5      s3      s3
4   {v1 , v2 , v3 }  {(v1 , v2 ), (v2 , v3 )}  s4      s4      s1      s3
5   {v4 , v5 }       {(v4 , v5 )}              s5      s5      s2      s3

Fig. 4. Segments of a simple orthogonal grid drawing and its limits.

2.2 Shape Descriptions and Their Properties

Let G = (V, E) be a graph with simple orthogonal representation H and segments
Sh ∪ Sv . A shape description of H is a tuple σ = ⟨Dh , Dv ⟩ of so–called
constraint graphs. Both graphs are directed and defined as Dh = (Sv , Ah ) and
Dv = (Sh , Av ). Thus, each node in Dh and Dv is a segment and the arc sets are
given by

Ah = {(l(e), r(e)) | e ∈ Eh } and Av = {(b(e), t(e)) | e ∈ Ev } .

The two digraphs characterize known relationships between the segments that
must hold in any drawing of the graph because of its shape properties. Let
a = (si , sj ) be an arc in Ah ∪ Av . If a ∈ Av then the horizontal segment si
must be placed at least one grid unit below segment sj . For vertical segments
the arc a ∈ Ah expresses the fact that si must be to the left of sj . Each arc
is defined by at least one edge in E. Clearly, each vertical edge determines the
relative position of two horizontal segments and vice versa. Figure 5 illustrates
the shape description of a simple orthogonal grid drawing.
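
The definition of the arc sets can be traced on the drawing of Fig. 4. The sketch below assumes that each edge is stored as an ordered pair pointing left to right (horizontal) or bottom to top (vertical), and that the maps hor and vert are taken from the table in Fig. 4; the function name shape_description is a label chosen for this illustration.

```python
# Segment membership of each vertex, as listed in the table of Fig. 4.
hor = {"v1": "s1", "v2": "s2", "v5": "s2", "v3": "s3", "v4": "s3"}
vert = {"v1": "s4", "v2": "s4", "v3": "s4", "v4": "s5", "v5": "s5"}

# Edges oriented left-to-right and bottom-to-top (an assumption on the input format).
E_h = [("v2", "v5"), ("v3", "v4")]
E_v = [("v1", "v2"), ("v2", "v3"), ("v5", "v4")]


def shape_description(E_h, E_v, hor, vert):
    """Return the arc sets (A_h, A_v) of the two constraint graphs."""
    # A horizontal edge relates the vertical segments of its end vertices: (l(e), r(e)).
    A_h = {(vert[u], vert[w]) for u, w in E_h}
    # A vertical edge relates the horizontal segments of its end vertices: (b(e), t(e)).
    A_v = {(hor[u], hor[w]) for u, w in E_v}
    return A_h, A_v


A_h, A_v = shape_description(E_h, E_v, hor, vert)
print(sorted(A_h), sorted(A_v))
```

Both horizontal edges induce the same arc (s4, s5), which illustrates that an arc may be defined by more than one edge.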

[Figure 5: drawing with segments s1, ..., s9 and the arcs of its constraint graphs.]

Fig. 5. A simple orthogonal drawing (dotted) and its shape description.




For two vertices v and w we use the notation $v \xrightarrow{*} w$ if there is a path from
v to w. Shape descriptions have the following property:

Lemma 2. Let σ = ⟨(Sv , Ah ), (Sh , Av )⟩ be a shape description. For every subsegment
$s \in \mathcal{S}_h \cup \mathcal{S}_v$ the paths $l(s) \xrightarrow{*} r(s)$ and $b(s) \xrightarrow{*} t(s)$ are contained in
Ah ∪ Av .

Proof. We prove the lemma for horizontal subsegments. The proof for vertical
subsegments is similar. By definition, the lemma holds for edges. Let s be a
horizontal subsegment consisting of k consecutive edges e1 , . . . , ek . With l(e1 ) =
l(s), r(ek ) = r(s) and (l(ei ), r(ei )) ∈ Ah and b(s) = t(s) = seg(s) the result
follows. □

Definition 2. Let $(s_i, s_j) \in \mathcal{S} \times \mathcal{S}$ be a pair of subsegments. We call the pair
separated if and only if one of the following conditions holds:
1. $r(s_i) \xrightarrow{*} l(s_j)$
2. $r(s_j) \xrightarrow{*} l(s_i)$
3. $t(s_j) \xrightarrow{*} b(s_i)$
4. $t(s_i) \xrightarrow{*} b(s_j)$

In an orthogonal drawing at least one of the four conditions must be satisfied for
any such pair (si , sj ). The following two observations show that we only need to
consider separated segments of opposite direction (proofs omitted).
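
The four conditions of Definition 2 are reachability tests in the two constraint graphs. The following sketch uses the limits and arc sets of Fig. 4; it assumes that a path consists of at least one arc (the extended abstract does not spell this out), and the function names reachable and separated are chosen here for illustration.

```python
def reachable(arcs, src, dst):
    """True if a directed path with at least one arc leads from src to dst."""
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, []).append(b)
    stack, seen = list(succ.get(src, [])), set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, []))
    return False


def separated(si, sj, limits, arcs_h, arcs_v):
    """Check the four conditions of Definition 2 for the pair (si, sj)."""
    l_i, r_i, b_i, t_i = limits[si]
    l_j, r_j, b_j, t_j = limits[sj]
    return (reachable(arcs_h, r_i, l_j)      # 1. si lies left of sj
            or reachable(arcs_h, r_j, l_i)   # 2. sj lies left of si
            or reachable(arcs_v, t_j, b_i)   # 3. sj lies below si
            or reachable(arcs_v, t_i, b_j))  # 4. si lies below sj


# Limits (l, r, b, t) and arc sets taken from Fig. 4.
limits = {"s1": ("s4", "s4", "s1", "s1"),
          "s3": ("s4", "s5", "s3", "s3"),
          "s5": ("s5", "s5", "s2", "s3")}
A_h = {("s4", "s5")}
A_v = {("s1", "s2"), ("s2", "s3")}

print(separated("s1", "s5", limits, A_h, A_v))  # condition 1 applies
print(separated("s1", "s3", limits, A_h, A_v))  # condition 4 applies
```

Without any arcs no condition can hold, so the same pairs are reported as not separated when both arc sets are empty.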

Observation 1. Let $(s_i, s_j) \in \mathcal{S} \times \mathcal{S}$ be a pair of subsegments. If (seg(si ), seg(sj ))
is separated then (si , sj ) is separated.

Observation 2. Assume that the arcs between the segments form an acyclic
graph. Then the following is true: All segment pairs are separated if and only if
all segment pairs of opposite directions are separated.

The following lemma shows that we can restrict our focus to separated seg-
ments that share a common face. For a face f , we write S(f ) for the segments
containing the horizontal and vertical edges on the boundary of f .

Lemma 3. All segment pairs are separated if and only if for every face f the
segment pairs (si , sj ) ∈ S(f ) × S(f ) are separated.

Proof (Sketch). The forward direction follows immediately because S(f ) ⊆ S
for every face f . To show the other direction observe that the relative position
of each pair of segments (si , sj ) ∈ S(f ) × S(f ) is fixed for any face f . Thus in
a drawing respecting the placement of the underlying constraint graphs there
are no crossings between segments of a face. Given that property we can show
that there are no crossings at all in the drawing. The separation of all
segment pairs in the underlying constraint graphs follows. □

2.3 Complete Extensions of Shape Descriptions

We will see next that any shape description σ = ⟨(Sv , Ah ), (Sh , Av )⟩ can be
extended so that the resulting constraint graphs correspond to a feasible orthog-
onal planar drawing. We give a characterization of these complete extensions in
terms of properties of their constraint graphs.

Definition 3. A complete extension of a shape description σ = ⟨(Sv , Ah ),
(Sh , Av )⟩ is a tuple τ = ⟨(Sv , Bh ), (Sh , Bv )⟩ with the following properties:

1. Ah ⊆ Bh , Av ⊆ Bv .
2. Bh and Bv are acyclic.
3. Every segment pair is separated.

The following theorem characterizes the set of feasible solutions for the com-
paction problem.

Theorem 1. For any simple orthogonal drawing with shape description σ =
⟨(Sv , Ah ), (Sh , Av )⟩ there exists a complete extension τ = ⟨(Sv , Bh ), (Sh , Bv )⟩ of
σ and vice versa: Any complete extension τ = ⟨(Sv , Bh ), (Sh , Bv )⟩ of a shape
description σ = ⟨(Sv , Ah ), (Sh , Av )⟩ corresponds to a simple orthogonal drawing
with shape description σ.

Proof. To prove the first part of the theorem, we consider a simple orthogonal
grid drawing Γ with shape description σ = ⟨(Sv , Ah ), (Sh , Av )⟩. Let c(si ) denote
the fixed coordinate for segment si ∈ Sh ∪ Sv . We construct a complete extension
τ = ⟨(Sv , Bh ), (Sh , Bv )⟩ for σ as follows: Bh = {(si , sj ) ∈ Sv × Sv | c(si ) < c(sj )},
i.e., we insert an arc from every vertical segment si to each vertical segment lying
to the right of si . Similarly, we construct the set Bv . Clearly, we have Ah ⊆ Bh
and Av ⊆ Bv . We show the completeness by contradiction: Assume first that
there is some pair (si , sj ) which is not separated. According to the construction
this is only possible if the segments cross in Γ , which is a contradiction. Now
assume that there is a cycle in one of the arc sets. Again, the construction of Bh
and Bv forbids this case. Hence τ is a complete extension of σ.
We give a constructive proof for the second part of the theorem by specifying
a simple orthogonal grid drawing for the complete extension τ . To accomplish
this task we need to assign lengths to the segments. A length assignment for a
complete extension of a shape description τ = ⟨(Sv , Bh ), (Sh , Bv )⟩ is a function
c : Sv ∪ Sh → $\mathbb{N}$ with the property (si , sj ) ∈ Bh ∪ Bv ⇒ c(si ) < c(sj ). Given τ ,
such a function can be computed using any topological sorting algorithm in the
acyclic graphs in τ , e.g., longest paths or maximum flow algorithms in the dual
graph. For a fixed length assignment, the following simple and straightforward
method assigns coordinates to the vertices. Let $x \in \mathbb{N}^V$ and $y \in \mathbb{N}^V$ be the
coordinate vectors. Then simply setting xv = c(vert(v)) and yv = c(hor(v)) for
every vertex v ∈ V results in a correct grid drawing. The following points have
to be verified:

1. All edges have positive integer length.
The length of a horizontal edge e ∈ Eh is given by c(r(e)) − c(l(e)). We know
that both values are integer and according to Lemma 2 that (l(e), r(e)) ∈ Bh
and thus c(r(e)) > c(l(e)). A similar argument applies to vertical edges.
2. Γ maps each circuit in G into a rectilinear polygon.
For a face f let Γ (f ) be the geometric representation of vertices and edges
belonging to f . It is sufficient to show that Γ (f ) is a rectilinear polygon for
each face f in G. Every vertex v on the boundary of f is placed according to
the segments hor(v) and vert(v). Two consecutive vertices v and w on the
boundary of f either share the same horizontal or the same vertical segment
(since they are linked by an edge). Thus, either xv = xw or yv = yw .
3. No subsegments cross.
Otherwise, assume that there are two such segments si and sj that cross.
Then c(r(si )) ≥ c(l(sj )), c(r(sj )) ≥ c(l(si )), c(t(sj )) ≥ c(b(si )) and c(t(si )) ≥
c(b(sj )) (see also Fig. 6). This is a contradiction to the completeness of τ . □
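
The constructive part of the proof can be sketched in a few lines. The example below is an illustration under assumptions: it takes the arc sets of Fig. 4 as the sets Bh and Bv of a complete extension, computes a length assignment by longest-path layering along a topological order (one of the options named in the proof), and sets xv = c(vert(v)) and yv = c(hor(v)). Coordinates start at 0 here; the function name length_assignment is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def length_assignment(nodes, arcs):
    """Longest-path layering: sources get 0, others 1 + max over predecessors."""
    preds = {s: set() for s in nodes}
    for a, b in arcs:
        preds[b].add(a)
    c = {}
    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the arc set contains a cycle.
    for s in TopologicalSorter(preds).static_order():
        c[s] = 1 + max((c[p] for p in preds[s]), default=-1)
    return c


# Arc sets of Fig. 4, treated here as a complete extension (an assumption).
B_h = {("s4", "s5")}
B_v = {("s1", "s2"), ("s2", "s3")}
hor = {"v1": "s1", "v2": "s2", "v5": "s2", "v3": "s3", "v4": "s3"}
vert = {"v1": "s4", "v2": "s4", "v3": "s4", "v4": "s5", "v5": "s5"}
edges = [("v2", "v5"), ("v3", "v4"), ("v1", "v2"), ("v2", "v3"), ("v4", "v5")]

c = {**length_assignment(["s4", "s5"], B_h),
     **length_assignment(["s1", "s2", "s3"], B_v)}
coords = {v: (c[vert[v]], c[hor[v]]) for v in hor}
total = sum(abs(coords[u][0] - coords[w][0]) + abs(coords[u][1] - coords[w][1])
            for u, w in edges)
print(coords, total)
```

Every arc (si, sj) ends up with c(si) < c(sj), so each edge receives a positive integer length, as required by point 1 of the proof.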

We have transformed the compaction problem into a graph–theoretical problem.
Our new task is to find a complete extension of a given shape description
σ that minimizes the total edge length. If a shape description already satisfies
the conditions of a complete extension (see Fig. 7(a)), the compaction problem
can be solved optimally in polynomial time: The resulting problem is the dual
of a maximum flow problem (see also Observation 3). Sometimes the shape description
is not complete but can be extended in only one way (see
Fig. 7(b)). In these cases, too, the compaction problem is easy to solve. But
in most cases it is not clear how to extend the shape description since there are
many different possibilities (see Fig. 7(c)).
Observe that the preprocessing phase of our algorithm described at the be-
ginning of Sect. 4 tests in polynomial time if there is a unique completion. In
that case, our algorithm computes an optimal drawing in polynomial time.

3 An ILP Formulation for the Compaction Problem


The characterization given in the previous section can be used to obtain an in-
teger linear programming formulation for the compaction problem cod. Let
Γ be a simple orthogonal grid drawing of a graph G = (V, Eh ∪ Ev ) with

[Figure 6: two pairs of crossing subsegments si and sj with their limits l, r, b, t marked.]

Fig. 6. Crossing subsegments of opposite (left) and same (right) direction.



(a) (b) (c)

Fig. 7. Three types of shape descriptions. Dotted lines show the orthogonal grid
drawings, thin arrows arcs in shape descriptions and thick gray arcs possible
completions.

orthogonal representation H and let σ = ⟨(Sv , Ah ), (Sh , Av )⟩ be the corresponding
shape description. The set of feasible solutions for this instance of
cod can now be written as C(σ) = {τ | τ is a complete extension of σ}. Let
C = Sh × Sh ∪ Sv × Sv be the set of possible arcs in the digraphs of σ. $\mathbb{Q}^{|C|}$ is
the vector space whose elements are indexed with numbers corresponding to the
members of C. For a complete extension τ = ⟨(Sv , Bh ), (Sh , Bv )⟩ of σ we define
an element $x^\tau \in \mathbb{Q}^{|C|}$ in the following way: $x^\tau_{ij} = 1$ if (si , sj ) ∈ Bh ∪ Bv and
$x^\tau_{ij} = 0$ otherwise. We use these vectors to characterize the Compaction Polytope
$P_{\mathrm{cod}} = \operatorname{conv}\{x^\tau \in \mathbb{Q}^{|C|} \mid \tau \in C(\sigma)\}$.
In order to determine the minimum total edge length over all feasible points
in Pcod we introduce a vector $c \in \mathbb{Q}^{|S_h \cup S_v|}$ to encode the length assignment and
give an integer linear programming formulation for the compaction problem cod.
Let M be a very large number (for the choice of M see Lemma 5). The ILP for
the compaction problem is the following:

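\begin{align*}
\text{(ILP)}\quad \min\ & \sum_{e \in E_h} \bigl(c_{r(e)} - c_{l(e)}\bigr) + \sum_{e \in E_v} \bigl(c_{t(e)} - c_{b(e)}\bigr) \quad \text{subject to} \\
& x_{ij} = 1 && \forall (s_i, s_j) \in A_h \cup A_v \tag{1.1} \\
& x_{r(i),l(j)} + x_{r(j),l(i)} + x_{t(j),b(i)} + x_{t(i),b(j)} \ge 1 && \forall (s_i, s_j) \in S \times S \tag{1.2} \\
& c_i - c_j + (M + 1)\, x_{ij} \le M && \forall (s_i, s_j) \in C \tag{1.3} \\
& x_{ij} \in \{0, 1\} && \forall (s_i, s_j) \in C \tag{1.4}
\end{align*}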

The objective function expresses the total edge length in a drawing for G.
Note that the formulation also captures the related problem of minimizing the
length of the longest edge in a drawing. In this case, the constraints $c_{r(e)} - c_{l(e)} \le l_{\max}$
or $c_{t(e)} - c_{b(e)} \le l_{\max}$ must be added for each edge e and the objective
function must be substituted by $\min l_{\max}$. Furthermore, it is possible to give

each edge an individual weight in the objective function. In this manner, edges
with higher values are considered more important and will preferably be assigned
a shorter length.
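
Written out in the notation of (ILP), the two variants just mentioned might look as follows. This is a sketch only: the weights $w_e \ge 0$ are a symbol introduced here for illustration, since the text does not name them.

```latex
% Longest-edge variant: l_max bounds the length of every edge.
\begin{align*}
\min\ & l_{\max} \\
\text{subject to}\ & c_{r(e)} - c_{l(e)} \le l_{\max} && \forall e \in E_h, \\
& c_{t(e)} - c_{b(e)} \le l_{\max} && \forall e \in E_v, \\
& \text{(1.1)--(1.4)}.
\end{align*}

% Weighted variant: w_e is a hypothetical nonnegative weight of edge e.
\[
\min \sum_{e \in E_h} w_e \bigl(c_{r(e)} - c_{l(e)}\bigr)
   + \sum_{e \in E_v} w_e \bigl(c_{t(e)} - c_{b(e)}\bigr)
\quad \text{subject to (1.1)--(1.4)}.
\]
```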
We first give an informal motivation of the three different types of constraints
and then show that any feasible solution of the ILP formulation indeed corre-
sponds to an orthogonal grid drawing.
(1.1) Shape constraints. We are looking for an extension of shape description σ.
Since any extension must contain the arc sets of σ, the appropriate entries
of x must be set to 1.
(1.2) Completeness constraints. This set of constraints guarantees completeness
of the extension. The respective inequalities model Definition 3.3.
(1.3) Consistency constraints. The vector c corresponds to the length assignment
and thus must fulfill the property (si , sj ) ∈ Bh ∪ Bv ⇒ c(si ) < c(sj ). If
xij = 1, then the inequality reads cj − ci ≥ 1, realizing the property for the
arc (i, j). The case xij = 0 yields ci − cj ≤ M which is true if M is set to
the maximum of the width and the height of Γ .
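
The effect of the constant M in the consistency constraints can be seen on a two-segment example. The sketch below checks only constraints (1.1) and (1.3) for a candidate pair (x, c); the completeness constraints (1.2) are left out for brevity. The function name violated and the data layout are assumptions made for this illustration; the segments s4 and s5 and the arc (s4, s5) come from Fig. 4, and M = 3 follows the bound max{|Sh|, |Sv|} of Lemma 5.

```python
def violated(x, c, shape_arcs, M):
    """List the shape (1.1) and consistency (1.3) constraints that (x, c) violates.

    x maps every ordered segment pair (i, j) to 0 or 1,
    c maps every segment to its coordinate.
    """
    bad = []
    for arc in shape_arcs:
        if x.get(arc, 0) != 1:                       # (1.1): x_ij = 1 on given arcs
            bad.append(("1.1", arc))
    for (i, j), x_ij in x.items():
        if c[i] - c[j] + (M + 1) * x_ij > M:         # (1.3): big-M consistency
            bad.append(("1.3", (i, j)))
    return bad


A_h = {("s4", "s5")}
x = {("s4", "s5"): 1, ("s5", "s4"): 0}
M = 3

good = violated(x, {"s4": 0, "s5": 1}, A_h, M)   # s4 strictly left of s5
equal = violated(x, {"s4": 1, "s5": 1}, A_h, M)  # same coordinate: arc not respected
print(good, equal)
```

With x_ij = 1 the inequality forces c_j − c_i ≥ 1, while with x_ij = 0 it only requires c_i − c_j ≤ M and is therefore inactive for any drawing of width and height at most M.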
The following observation shows that we do not need to require integrality
for the variable vector c. The subsequent lemma motivates the fact that no
additional constraints forbidding the cycles are necessary.
Observation 3. Let $(x, c_f)$ with $x \in \{0, 1\}^{|C|}$ and $c_f \in \mathbb{Q}^{|S_h \cup S_v|}$ be a feasible
solution for (ILP) and let $z_f$ be the value of the objective function. Then there is
a feasible solution (x, c) with $c \in \mathbb{N}^{|S_h \cup S_v|}$ and objective function value $z \le z_f$.
Proof. Since x is part of a feasible solution, its components must be either zero
or one. Then (ILP) reads as the dual of a maximum flow problem with integer
capacities. It follows that there is an optimal integer solution [2].

Lemma 4. Let (x, c) be a feasible solution for (ILP) and let Dh and Dv be the
digraphs corresponding to x. Then Dh and Dv are acyclic.
Proof. Assume without loss of generality that there is a cycle $(s_{i_1}, s_{i_2}, \dots, s_{i_k})$ of
length k in Dh . This implies $x_{i_1,i_2} = x_{i_2,i_3} = \dots = x_{i_k,i_1} = 1$ and the appropriate
consistency constraints read
\begin{align*}
c_{i_1} - c_{i_2} + (M + 1) &\le M \\
c_{i_2} - c_{i_3} + (M + 1) &\le M \\
&\;\;\vdots \\
c_{i_k} - c_{i_1} + (M + 1) &\le M .
\end{align*}
Summed up, the c–terms cancel and this yields k · (M + 1) ≤ k · M , a contradiction. □

Theorem 2. For each feasible solution (x, c) of (ILP) for a shape description
σ, there is a simple orthogonal grid drawing Γ whose shape corresponds to σ
and vice versa. The total edge length of Γ is equal to the value of the objective
function.

Proof. For the first part of the proof, let x and c be the solution vectors of (ILP).
According to Observation 3 we can assume that both vectors are integer. Vector
x describes a complete extension τ of σ; the extension property is guaranteed
by constraints 1.1, completeness by constraints 1.2 and acyclicity according to
Lemma 4. The consistency constraints 1.3 require c to be a length assignment
for τ . The result follows with Theorem 1.
Again, we give a constructive proof for the second part: Theorem 1 allows us
to use Γ for the construction of a complete extension τ for σ. Setting ci to the
fixed coordinate of segment si and xij to 1 if the arc (si , sj ) is contained in τ and
to 0 otherwise, results in a feasible solution for the ILP. Evidently, the bounds
and integrality requirements for c and x are not violated and Constraints 1.1
and 1.2 are clearly satisfied. To show that the consistency constraints hold, we
consider two arbitrary vertical segments si and sj . Two cases may occur: Assume
first that si is to the left of sj . Then the corresponding constraint reads cj − ci ≥ 1
which is true since cj > ci and the values are integer. Now suppose that si is
not to the left of sj . In this case the constraint becomes ci −cj ≤ M which is true
for a sufficiently large M . A similar discussion applies to horizontal segments.
Obviously, the value of the objective function is equal to the total edge length
in both directions of the proof. □

The integer linear program can be solved via a branch–and–cut or branch–and–bound
algorithm. We now discuss the choice of the constant M .

Lemma 5. The value max{|Sh |, |Sv |} is a sufficient choice for M .

Proof. Note that any optimal drawing Γ has width w ≤ |Sv | and height h ≤ |Sh |,
otherwise a one–dimensional compaction step could be applied. M has to be big
enough to “disable” Constraints 1.3 if the corresponding entry of x is zero; it
has to be an upper bound for the distance between any pair of segments. Setting
it to the size of the larger of the two sets fulfills this requirement. □

4 Implementation and Computational Results

4.1 Solving the ILP

Our implementation, which we will refer to as opt throughout this section, solves
the integer linear program described in Sect. 3. It is realized as a compaction
module inside the AGD–library [3, 1] and is written in C++ using LEDA [13, 12].
In a preprocessing phase, the given shape description σ is completed as far as
possible. Starting with a list L of all segment pairs, we remove a pair (si , sj ) from
L if it either fulfills the definition of separation or there is only one possibility
to meet this requirement. In the latter case we insert the appropriate arc and
continue the iteration. The process stops if no such pair can be found in L. If the
list is empty at the end of the preprocessing phase, we can solve the compaction
problem in polynomial time by optimizing over the integral polyhedron given by
the corresponding inequalities.
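
The preprocessing loop can be sketched as follows. This is one possible reading, not the authors' implementation: "only one possibility" is interpreted here as exactly one of the four arcs of Definition 2 being insertable without closing a cycle, and paths are assumed to contain at least one arc. All names (preprocess, reachable) and the tiny instance with segments a, b, w, c, u, d are hypothetical.

```python
def reachable(arcs, src, dst):
    """True if a directed path with at least one arc leads from src to dst."""
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, []).append(b)
    stack, seen = list(succ.get(src, [])), set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, []))
    return False


def preprocess(pairs, limits, A_h, A_v):
    """Complete the shape description as far as the choices are forced."""
    B_h, B_v, L = set(A_h), set(A_v), list(pairs)
    changed = True
    while changed:
        changed = False
        for si, sj in list(L):
            l_i, r_i, b_i, t_i = limits[si]
            l_j, r_j, b_j, t_j = limits[sj]
            options = [(B_h, r_i, l_j), (B_h, r_j, l_i),
                       (B_v, t_j, b_i), (B_v, t_i, b_j)]
            if any(reachable(B, a, b) for B, a, b in options):
                L.remove((si, sj))            # pair is already separated
                changed = True
                continue
            # Arcs that can still be inserted without closing a cycle.
            free = [(B, a, b) for B, a, b in options
                    if a != b and not reachable(B, b, a)]
            if len(free) == 1:                # forced choice: insert the arc
                B, a, b = free[0]
                B.add((a, b))
                L.remove((si, sj))
                changed = True
    return B_h, B_v, L


# Hypothetical instance: horizontal segment u spans from a to b and lies between
# c and d; vertical segment w spans from c to d and lies to the right of a.
limits = {"u": ("a", "b", "u", "u"), "w": ("w", "w", "c", "d")}
A_h = {("a", "b"), ("a", "w")}
A_v = {("c", "u"), ("u", "d")}
B_h, B_v, rest = preprocess([("u", "w")], limits, A_h, A_v)
print(sorted(B_h), rest)
```

In this instance three of the four arcs would close a cycle, so the arc (b, w) is forced and the list ends up empty, which is the situation in which the text says the problem can be solved in polynomial time.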

Otherwise, let σ⁺ be the extension of σ resulting from the preprocessing
phase. opt first computes a corrupted layout for σ⁺ in polynomial time. Then
it checks which non–separated pairs in L indeed induce a violation of the drawing
and adds the corresponding completeness and consistency inequalities. The
resulting integer linear program is solved with the Mixed Integer Solver from
CPLEX. The process of checking, adding inequalities and solving the ILP may
have to be repeated because new pairs in L can cause a violation. Currently
we are realizing a second implementation within the branch–and–cut framework
ABACUS [7].
For the initial construction of L we exploit Lemma 3 and Observation 2. We
only have to add pairs of segments which share a common face and which are of
opposite direction. By decreasing the size of L in this way we could reduce the
search space and speed up the computation remarkably.

4.2 Computational Results


We compare the results of our method to the results achieved by two other
compaction methods: orig is an implementation of the original method proposed
in [16]. It divides all the faces of the drawing into sub–faces of rectangular shape
and assigns consistent edge lengths in linear time. 1dim is an optimal one–
dimensional compaction algorithm. It first calls orig to get an initial drawing
and then iteratively runs a visibility–based compaction, alternating the direction
in each run. It stops if no further one–dimensional improvement is possible.
The algorithms orig, 1dim, and opt have been tested on a large test set.
This set, collected by the group of G. Di Battista, contains 11,582 graphs repre-
senting data from real–world applications [4]. For our experimental study, each
of the graphs has been planarized by computing a planar subgraph and rein-
serting the edges, representing each crossing by a virtual vertex. After fixing the
planar embedding for every graph we computed its shape using an extension
of Tamassia’s bend minimizing algorithm [10]. The resulting graphs with fixed
shape served as input for the three compaction algorithms. We compared the
total edge length and the area of the smallest enclosing rectangle of the draw-
ings produced by orig, 1dim, and opt. Furthermore, we recorded their running
times.
All the examples could be solved to optimality on a SUN Enterprise 10000.
For all instances, the running times of orig and 1dim have been below 0.05
and 0.43 seconds, respectively. opt could solve the vast majority of instances
(95.5%) in less than one second, few graphs needed more than five seconds (1.1%)
and only 29 graphs needed more than one minute. The longest running time,
however, was 68 minutes.
The average improvement of the total edge length computed by opt over the
whole test set of graphs was 2.4% compared to 1dim and 21.0% compared to
orig. Just looking at hard instances where opt needed more than two seconds of
running time we obtain average improvements of 7.5% and 36.1%, resp. Figures 8
and 9 show this fact in more detail: The x–axis denotes the size of the graphs,
the y–axis shows the improvement of total edge length in percent with respect to

1dim and orig. We computed the minimal, maximal and average improvement
for the graphs of the same size. The average improvement values are quite independent
of the graph size, and the minimum and maximum values converge
to them with increasing number of vertices. Note that opt yields in some cases
improvements of more than 30% in comparison to the previously best strategy;
Fig. 1 in Sect. 1 shows the drawings for such a case (here, the improvement is
28%). For the area, the data look similar with the restriction that in few cases
the values produced by 1dim are slightly better than those from opt; short
edge length does not necessarily lead to low area consumption. The average area
improvements compared to 1dim and orig are 2.7% and 29.3%, however.
In general, we could make the following observations: Instances of the com-
paction problem cod divide into easy and hard problems, depending on the
structure of the corresponding graphs. On the one hand, we are able to solve
some randomly generated instances of biconnected planar graphs with 1,000
vertices in less than five seconds. In these cases, however, the improvement com-
pared to the results computed by 1dim is small. On the other hand, graphs
containing tree–like structures have proven hard to compact since their
number of fundamentally different drawings is in general very high. For these
cases, however, the improvement is much higher.

5 Conclusions and Future Plans

We have introduced the constraint graphs describing the shape of a simple or-
thogonal grid drawing. Furthermore, we have established a direct connection
between these graphs by defining complete extensions of the constraint graphs
that satisfy certain connectivity requirements in both graphs. We have shown
that complete extensions characterize the set of feasible drawings with the given
shape. For a given complete extension we can solve the compaction problem cod
in polynomial time. The graph–theoretical characterization allows us to formu-
late cod as an integer linear program. The preprocessing phase of our algorithm
detects those instances having only one complete extension and, for them, the
optimal algorithm runs in polynomial time. Our experiments show that the re-
sulting ILP can be solved within short computation time for instances as big as
1,000 vertices.
There are still open questions concerning the two-dimensional compaction
problem: We are not satisfied with having the ‘big M’ in our integer linear
programming formulation. Our future plans include the extensive study of the
0/1-polytope Pcod that characterizes the set of complete extensions. We hope
that the results will lead to a running time improvement of our branch-and-cut
algorithm. Independently, Bridgeman et al. [5] have developed an approach based
on the upward planar properties of the layout graphs. We want to investigate a
possible integration of their ideas. Furthermore, we plan to adapt our method
to variations of the two-dimensional compaction problem.
318 Gunnar W. Klau and Petra Mutzel

[Plot: improvement in % (vertical axis, 0 to 40) against number of vertices
(horizontal axis, 50 to 300).]

Fig. 8. Quality of opt compared to 1dim.

[Plot: improvement in % (vertical axis, 0 to 70) against number of vertices
(horizontal axis, 50 to 300).]

Fig. 9. Quality of opt compared to orig.


Optimal Compaction of Orthogonal Grid Drawings 319

References
[1] AGD. AGD User Manual. Max-Planck-Institut Saarbrücken, Universität Halle,
Universität Köln, 1998. http://www.mpi-sb.mpg.de/AGD.
[2] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms,
and Applications. Prentice Hall, Englewood Cliffs, NJ, 1993.
[3] D. Alberts, C. Gutwenger, P. Mutzel, and S. Näher. AGD–Library: A library of
algorithms for graph drawing. In G. Italiano and S. Orlando, editors, WAE ’97
(Proc. on the Workshop on Algorithm Engineering), Venice, Italy, Sept. 11-13,
1997. http://www.dsi.unive.it/~wae97.
[4] G. D. Battista, A. Garg, G. Liotta, R. Tamassia, E. Tassinari, and F. Vargiu.
An experimental comparison of four graph drawing algorithms. CGTA: Compu-
tational Geometry: Theory and Applications, 7:303 – 316, 1997.
[5] S. Bridgeman, G. Di Battista, W. Didimo, G. Liotta, R. Tamassia, and L. Vis-
mara. Turn–regularity and optimal area drawings of orthogonal representations.
Technical report, Dipartimento di Informatica e Automazione, Università degli
Studi di Roma Tre, 1999. To appear.
[6] U. Fößmeier and M. Kaufmann. Drawing high degree graphs with low bend
numbers. In F. J. Brandenburg, editor, Graph Drawing (Proc. GD ’95), volume
1027 of Lecture Notes in Computer Science, pages 254–266. Springer-Verlag, 1996.
[7] M. Jünger and S. Thienel. Introduction to ABACUS - A Branch-And-CUt System.
Operations Research Letters, 22:83–95, March 1998.
[8] G. Kedem and H. Watanabe. Graph optimization techniques for IC–layout and
compaction. IEEE Transact. Comp.-Aided Design of Integrated Circuits and Sys-
tems, CAD-3 (1):12–20, 1984.
[9] G. W. Klau and P. Mutzel. Optimal compaction of orthogonal grid draw-
ings. Technical Report MPI–I–98–1–031, Max–Planck–Institut für Informatik,
Saarbrücken, December 1998.
[10] G. W. Klau and P. Mutzel. Quasi–orthogonal drawing of planar graphs. Technical
Report MPI–I–98–1–013, Max–Planck–Institut für Informatik, Saarbrücken, May
1998.
[11] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout. John Wiley
& Sons, New York, 1990.
[12] K. Mehlhorn, S. Näher, M. Seel, and C. Uhrig. LEDA Manual Version
3.7.1. Technical report, Max-Planck-Institut für Informatik, 1998.
http://www.mpi-sb.mpg.de/LEDA.
[13] K. Mehlhorn and S. Näher. LEDA: A platform for combinatorial and geometric
computing. Communications of the ACM, 38(1):96–102, 1995.
[14] M. Patrignani. On the complexity of orthogonal compaction. Technical Report
RT–DIA–39–99, Dipartimento di Informatica e Automazione, Università degli
Studi di Roma Tre, January 1999.
[15] M. Schlag, Y.-Z. Liao, and C. K. Wong. An algorithm for optimal two–dimensional
compaction of VLSI layouts. Integration, the VLSI Journal, 1:179–209, 1983.
[16] R. Tamassia. On embedding a graph in the grid with the minimum number of
bends. SIAM J. Comput., 16(3):421–444, 1987.
On the Number of Iterations for Dantzig-Wolfe
Optimization and Packing-Covering
Approximation Algorithms

Philip Klein¹,* and Neal Young²,**

¹ Brown University, Providence, RI, USA; [email protected]
² Dartmouth College, Hanover, NH, USA; [email protected]

1 Introduction

We start with definitions given by Plotkin, Shmoys, and Tardos [16]. Given
A ∈ ℝ^{m×n}, b ∈ ℝ^m, and a polytope P ⊆ ℝ^n, the fractional packing problem is
to find an x ∈ P such that Ax ≤ b, if such an x exists. An ε-approximate
solution to this problem is an x ∈ P such that Ax ≤ (1 + ε)b. An ε-relaxed
decision procedure always finds an ε-approximate solution if an exact solution
exists.
A Dantzig-Wolfe-type algorithm for a fractional packing problem x ∈
P, Ax ≤ b is an algorithm that accesses P only by queries to P of the following
form: “given a vector c, what is an x ∈ P minimizing c · x?”
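For the simplex, which is the polytope used in the lower-bound construction of
this paper, such a query has a particularly simple answer: a linear function is
minimized at a vertex, i.e. at a unit vector. The following minimal sketch
illustrates the query interface only; the names `simplex_oracle` and `dot` are
illustrative assumptions, not notation from the paper.

```python
def simplex_oracle(c):
    """Answer the query "which x in P minimizes c . x?" when P is the simplex
    {x >= 0, sum(x) = 1}.  A linear function attains its minimum over the
    simplex at a vertex, so we return the unit vector of a smallest entry of c
    (ties are broken by the lowest index)."""
    j = min(range(len(c)), key=lambda k: c[k])
    return [1.0 if k == j else 0.0 for k in range(len(c))]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

A Dantzig-Wolfe-type algorithm would see P only through calls such as
`simplex_oracle(c)`; it never inspects the description of P directly.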
There are Dantzig-Wolfe-type ε-relaxed decision procedures (e.g. [16]) that
require O(ρ ε⁻² log m) queries to P, where ρ is the width of the problem
instance, defined as follows:
\[ \rho(A, P) = \max_{x \in P} \max_{i} \frac{A_i \cdot x}{b_i}, \]
where A_i denotes the i-th row of A.
In this paper we give a natural probability distribution of fractional packing
instances such that, for an instance chosen at random, with probability 1 − o(1)
any Dantzig-Wolfe-type ε-relaxed procedure must make at least Ω(ρ ε⁻² log m)
queries to P. This lower bound matches the aforementioned upper bound,
providing evidence that the unfortunate linear dependence of the running times
of these algorithms on the width and on ε⁻² is an inherent aspect of the
Dantzig-Wolfe approach.
The specific probability distribution we study here is as follows. Given m and
ρ, let A be a random {0, 1}-matrix with m rows and n = √m columns, where
each entry of A has probability 1/ρ of being 1. Let P be the n-simplex, and let
b be the m-vector whose every entry is some v, where v is as small as possible
so that Ax ≤ b for some x ∈ P.
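The construction of A and the width of an instance over the simplex can be
sketched in a few lines. This is an illustration under stated assumptions: the
function names are hypothetical, √m is rounded to the nearest integer when m is
not a perfect square, and the vector b is taken as given (computing the
smallest feasible v requires solving a linear program, which is omitted here).

```python
import math
import random


def random_packing_matrix(m, rho, seed=0):
    """Random {0,1} matrix with m rows and about sqrt(m) columns whose entries
    are 1 independently with probability 1/rho.  Rounding sqrt(m) to the
    nearest integer is an assumption for non-square m."""
    rng = random.Random(seed)
    n = max(1, round(math.sqrt(m)))
    return [[1 if rng.random() < 1.0 / rho else 0 for _ in range(n)]
            for _ in range(m)]


def simplex_width(A, b):
    """Width max_{x in P} max_i A_i . x / b_i when P is the simplex.  Each
    ratio is linear in x, so the maximum is attained at a vertex (a unit
    vector) and equals the largest ratio A[i][j] / b[i]."""
    return max(A[i][j] / b[i] for i in range(len(A)) for j in range(len(A[0])))
```

For example, `simplex_width([[1, 0], [0, 1]], [0.5, 0.5])` evaluates the width
of a 2 × 2 instance in which every entry of b equals 1/2.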
The class of Dantzig-Wolfe-type algorithms encompasses algorithms and
algorithmic methods that have been actively studied from the 1950s through the
present, including:
– an algorithm by Ford and Fulkerson for multicommodity flow [14],
– Dantzig-Wolfe decomposition (generalized linear programming) [4],
– Benders’ decomposition [3],
– the Lagrangean relaxation method developed by Held and Karp and applied
to obtaining lower bounds for the traveling salesman problem [9, 10],
– the multicommodity flow approximation algorithms of Shahrokhi and
Matula [18], of Klein et al. [13], and of Leighton et al. [15],
– the covering and packing approximation algorithms of Plotkin, Shmoys, and
Tardos [16] and the approximation algorithms of Grigoriadis and Khachiyan [8]
for block-angular convex programs, and many subsequent works (e.g. [20, 6]).

* Research supported by NSF Grant CCR-9700146.
** Research supported by NSF Career award CCR-9720664.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 320–327, 1999.
© Springer-Verlag Berlin Heidelberg 1999

In a later section we discuss some of the history of the above algorithms and
methods and how they relate to the fractional packing problem studied here.
To prove the lower bound we use a probabilistic, discrepancy-theory argu-
ment to characterize the values of random m × s zero-sum games when s is much
smaller than m. From the point of view proposed in [20], where fractional pack-
ing algorithms are derived using randomized rounding (and in particular the
Chernoff bound), the intuition for the lower bound here is that it comes from
the fact that the Chernoff bound is essentially tight.
Some of the multicommodity flow algorithms, and subsequently the algorithms
of Plotkin, Shmoys, and Tardos and of Grigoriadis and Khachiyan, use a more
general model than the one described above. This model assumes the polytope
P is the cross-product P = P^1 × · · · × P^k of k polytopes. In this model, each
iteration involves optimizing a linear function over one of the polytopes P^i.
It is straightforward to extend our lower bound to this model by making
A block-diagonal, thus forcing each subproblem to be solved independently. In
this general case, the lower bound shows that the number of iterations must be
Ω(ε⁻² (Σ_i ρ_i) log m), where ρ_i is the width of P^i. This lower bound is also
tight within a constant factor, as it matches the upper bounds of Plotkin,
Shmoys, and Tardos.
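The block-diagonal arrangement mentioned above can be illustrated as follows.
This is a small sketch with a hypothetical helper name; it shows only how
independent constraint blocks are laid out so that no row couples the variables
of different blocks.

```python
def block_diagonal(blocks):
    """Place the given matrices (lists of rows) along the diagonal of one
    larger matrix and fill the remaining entries with zeros."""
    total_cols = sum(len(block[0]) for block in blocks)
    result = []
    offset = 0  # index of the first column belonging to the current block
    for block in blocks:
        width = len(block[0])
        for row in block:
            result.append([0] * offset + list(row)
                          + [0] * (total_cols - offset - width))
        offset += width
    return result
```

Each row of the result touches the columns of exactly one block, so the packing
constraints of one subproblem say nothing about the variables of another.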

Previous Lower Bounds. In 1977, Khachiyan [12] proved an Ω(ε⁻¹) lower
bound on the number of iterations to achieve an error of ε.
In 1994, Grigoriadis and Khachiyan proved an Ω(m) lower bound on the
number of iterations to achieve a relative error of ε = 1. They did not consider
the dependence of the number of iterations on ε for smaller values of ε.
Freund and Schapire [5], in an independent work in the context of learning
theory, prove a lower bound on the net “regret” of any adaptive strategy for
playing repeated zero-sum games against an adversary. This result is related to,
but different from, the result proved here.

2 Proof of Main Result


For any m-row n-column matrix A, define the value of A (considered as a
two-player zero-sum matrix game) to be
\[ V(A) \doteq \min_{x} \max_{1 \le i \le m} A_i \cdot x, \]
where A_i denotes the i-th row of A and x ranges over the n-vectors with
non-negative entries summing to 1.

Theorem 1. For m ∈ ℕ, n = Θ(m^{0.5}), and p ∈ (0, 1/2), let A be a random
{0, 1} m × n matrix with i.i.d. entries, each being 1 with probability p. Let
ε̄ > 0. With probability 1 − o(1),
– V(A) = Ω(p), and
– for s ≤ min{(ln m)/(p ε̄²), m^{0.5−δ}} (where δ > 0 is fixed), every m × s
submatrix B of A satisfies
\[ V(B) > (1 + c\bar\varepsilon)\, V(A), \]
where c is a constant depending on δ.

Our main result follows as a corollary.

Corollary 1. Let m ∈ ℕ, ρ > 2, and ε > 0 be given such that
ρ ε⁻² = O(m^{0.5−δ}) for some constant δ > 0.
For p = 1/ρ and n = m^{0.5}, let A be a random {0, 1} m × n matrix as in the
theorem. Let b denote the m-element vector whose every element is V(A). Let
P = {x ∈ ℝ^n : x ≥ 0, Σ_i x_i = 1} be the n-simplex.
Then with probability 1 − o(1), the fractional packing problem instance x ∈
P, Ax ≤ b has width O(ρ), and any Dantzig-Wolfe-type ε-relaxed decision
procedure requires at least Ω(ρ ε⁻² log m) queries to P when given the instance
as input.

Assuming the theorem for a moment, we prove the corollary. Suppose that
the matrix A indeed has the two properties that hold with probability 1 − o(1)
according to the theorem. It follows from the definition of V(A) that there
exists x* ∈ P such that Ax* ≤ b. That is, there exists a (non-approximate)
solution to the fractional packing problem.
To bound the width, let x̄ be any vector in P. By definition of P and A, for
any row A_j of A we have A_j · x̄ ≤ 1. On the other hand, from the theorem we
know that V(A) = Ω(p) = Ω(1/ρ). Since b_j = V(A), it follows that A_j · x̄/b_j is
O(ρ). Since this is true for every j and x̄ ∈ P, this bounds the width.
Now consider any Dantzig-Wolfe-type ε-relaxed decision procedure. Suppose
for a contradiction that it makes no more than s ≤ ρ (cε)⁻² ln m calls to the
oracle that optimizes over P. In each of these calls, the oracle returns a
vertex of P, i.e. a vector of the form

(0, 0, . . . , 0, 1, 0, . . . , 0, 0)

Let S be the set of vertices returned, and let P(S) be the convex hull of these
vertices. Every vector in P(S) has at most s non-zero entries, for its only
non-zero entries can occur in positions for which there is a vector in S having
a 1 in that position. Hence, by the theorem with ε̄ = ε/c, there is no vector
x ∈ P(S) that satisfies Ax ≤ (1 + ε)b.
Consider running the same algorithm on the fractional packing problem Ax ≤
b, x ∈ P(S), i.e. with P(S) replacing P. The procedure makes all the same
queries to P as before, and receives all the same answers, and hence must give
the same output, namely that an ε-approximate solution exists. This is an
incorrect output, which contradicts the definition of a relaxed decision
procedure.

3 Proof of Theorem 1
For any m-row n-column matrix A, define the value of A (considered as a
two-player zero-sum matrix game) to be
\[ V(A) = \min_{x} \max_{1 \le i \le m} A_i \cdot x, \]
where A_i denotes the i-th row of A and x ranges over the n-vectors with
non-negative entries summing to 1.
Before we give the proof of Theorem 1, we introduce some simple tools for
reasoning about V (X) for a random {0, 1} matrix X.
By the definition of V, V(X) is at most the maximum, over all rows, of the
average of the row’s entries. Suppose each entry in X is 1 with probability q,
and within any row of X the entries are independent. Then for any δ with
0 < δ < 1, a standard Chernoff bound implies that the probability that a given
row’s average exceeds (1 + δ)q is exp(−Θ(δ² q n_X)), where n_X is the number of
columns of X. Thus, by a naive union bound,
Pr[V(X) ≥ (1 + δ)q] ≤ m_X exp(−Θ(δ² q n_X)), where m_X is the number of rows of
X. For convenience we rewrite this bound as follows. For any q ∈ [0, 1] and
β ∈ (0, 1], assuming m_X/β → ∞,
\[ \Pr[V(X) \ge (1+\delta)q] = o(\beta) \quad \text{for some } \delta = O\!\left(\sqrt{\frac{\ln(m_X/\beta)}{q\, n_X}}\right). \tag{1} \]
We use an analogous lower bound on V(X). By von Neumann’s Min-Max Theorem,
\[ V(X) = \max_{y} \min_{i} X'_i \cdot y \]
(where X' denotes the transpose of X). Thus, reasoning similarly, if within any
column of X (instead of any row) the entries are independent,
\[ \Pr[V(X) \le (1-\delta)q] = o(\beta) \quad \text{for some } \delta = O\!\left(\sqrt{\frac{\ln(n_X/\beta)}{q\, m_X}}\right), \tag{2} \]
assuming n_X/β → ∞. We will refer to (1) and (2) as the naive upper and lower
bounds on V(X), respectively.
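The deterministic part of both naive bounds is easy to state in code: playing
the uniform distribution over columns bounds V(X) from above by the largest row
average, and, by the min-max theorem, playing the uniform distribution over
rows bounds V(X) from below by the smallest column average. The sketch below
computes these two quantities; the function name is an illustrative assumption,
and the probabilistic statements (1) and (2) are not simulated.

```python
def naive_value_bounds(X):
    """Return (lower, upper) with lower <= V(X) <= upper.

    upper: the column player mixes uniformly over the columns, so V(X) is at
           most the largest row average.
    lower: by the min-max theorem the row player may mix uniformly over the
           rows, so V(X) is at least the smallest column average."""
    m, n = len(X), len(X[0])
    upper = max(sum(row) / n for row in X)
    lower = min(sum(X[i][j] for i in range(m)) / m for j in range(n))
    return lower, upper
```

For the 2 × 2 identity matrix both bounds coincide at 1/2, which is the value
of that game.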
Proof of Theorem 1.
The naive lower bound on V(A) shows that
\[ \Pr[V(A) \le p(1-\delta_0)] = o(1) \quad \text{for some } \delta_0 = O\!\left(\sqrt{\frac{\ln n}{p\, m}}\right) = o(1). \tag{3} \]
Thus, V(A) = Ω(p) with probability 1 − o(1).
Let s = min{(ln m)/(p ε̄²), m^{0.5−δ}}. Assume without loss of generality that
s = (ln m)/(p ε̄²) (by increasing ε̄ if necessary).
We will show that with probability 1 − o(1) any m × s submatrix B of A has
value
\[ V(B) > (1 + c\bar\varepsilon)\, V(A). \tag{4} \]
The definition of value implies that V(B') ≥ V(B) for any m × s' submatrix B'
of B (where s' ≤ s). Thus we obtain (4) for such submatrices B' as well.
For any of the m rows of B, the expected value of the average of the s entries
is p. We will show that at least r = s² ln n of the rows have a higher than
average number of ones, and by focusing on these rows we will show that V(B) is
likely to be significantly higher than V(A).
For appropriately chosen δ_1, the probability that a given row of B has at
least ℓ = (1 + δ_1)ps ones is at least
\[ \binom{s}{\ell} p^{\ell} (1-p)^{s-\ell} = \exp\bigl(-O(\delta_1^2\, p s)\bigr). \]
(That is, the Chernoff bound is essentially tight here up to constant factors
in the exponent.) Call any such row good and let G denote the number of good
rows. In particular, choosing some
\[ \delta_1 = \Omega\!\left(\sqrt{\frac{\ln(m/r)}{p s}}\right) = \Omega\!\left(\sqrt{\frac{\ln m}{p s}}\right) = \Omega(\bar\varepsilon), \]
the probability that any given row is good is at least 2r/m and the expectation
of G is at least 2r. Since G is a sum of independent {0, 1} random
variables, Pr[G < r] < exp(−r/8).
By the choice of r, this is o(1/n^s), so with probability 1 − o(1/n^s), B has
at least r good rows.
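The tail bound on G can be checked with one standard form of the multiplicative
Chernoff bound. The derivation below is a sketch: the symbol μ = E[G] and the
particular form of the bound are choices made here, r is read as s² ln n, and
the last line assumes that s grows with m.

```latex
\begin{align*}
\Pr[G < r] &\le \Pr\bigl[G \le (1-\tfrac12)\,\mu\bigr]
  && \text{since } \mu \ge 2r \\
&\le \exp\bigl(-(\tfrac12)^2 \mu / 2\bigr) = \exp(-\mu/8)
  && \text{Chernoff bound, lower tail} \\
&\le \exp(-r/4) \le \exp(-r/8), \\
\exp(-r/8) &= n^{-s^2/8} = o\bigl(n^{-s}\bigr)
  && \text{for } r = s^2 \ln n,\ s \to \infty .
\end{align*}
```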
Suppose this is indeed the case and select any r good rows. Let C be the r × s
submatrix of B formed by the chosen rows. In any column of C, the entries are
independent and by symmetry each has probability at least p(1 + δ_1) of being 1.
Applying the naive lower bound (2) to V(C), we find
\[ \Pr[V(C) \le p(1+\delta_1)(1-\delta_2)] = o(1/n^s) \quad \text{for some } \delta_2 = O\!\left(\sqrt{\frac{s \ln n}{p\, r}}\right). \tag{5} \]
By the choice of r, δ_2 = o(δ_1). Thus (1 + δ_1)(1 − δ_2) = 1 + Ω(δ_1). Since
V(B) ≥ V(C), we find that, for any m × s submatrix B, V(B) ≥ p(1 + Ω(δ_1)) with
probability 1 − o(1/n^s).
Since there are at most (n choose s) ≤ n^s distinct m × s submatrices B of A,
the probability that all of them have value at least p(1 + Ω(δ_1)) is 1 − o(1).
Finally, applying the naive upper bound to V(A) shows that
\[ \Pr[V(A) \ge p(1+\delta_3)] = o(1) \quad \text{for some } \delta_3 = O\!\left(\sqrt{\frac{\ln m}{p\, n}}\right). \tag{6} \]
Since δ_3 = o(δ_1), the result follows. □
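The two comparisons between error terms used in the proof can be verified
directly from the definitions. The following is a sketch under the assumptions
r = s² ln n, n = Θ(m^{0.5}) and s ≤ m^{0.5−δ}, as in Theorem 1 and its proof.

```latex
\begin{align*}
\delta_2 &= O\!\left(\sqrt{\frac{s \ln n}{p\, r}}\right)
          = O\!\left(\sqrt{\frac{1}{p s}}\right), \qquad
\delta_1 = \Omega\!\left(\sqrt{\frac{\ln m}{p s}}\right)
\;\Longrightarrow\;
\frac{\delta_2}{\delta_1} = O\!\left(\frac{1}{\sqrt{\ln m}}\right) \to 0, \\
\delta_3 &= O\!\left(\sqrt{\frac{\ln m}{p\, n}}\right)
\;\Longrightarrow\;
\frac{\delta_3}{\delta_1} = O\!\left(\sqrt{\frac{s}{n}}\right)
          = O\!\left(m^{-\delta/2}\right) \to 0 .
\end{align*}
```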

4 Historical Discussion
Historically, there are three lines of research within what we might call the
Dantzig-Wolfe model. One line of work began with a method proposed by Ford
and Fulkerson for computing multicommodity flow. Dantzig and Wolfe noticed
that this method was not specific to multicommodity flow; they suggested de-
composing an arbitrary linear program into two sets of constraints, writing it
as
min{cx : x ≥ 0, Ax ≥ b, x ∈ P } (7)
and solving the linear program by an iterative procedure: each iteration in-
volves optimizing over the polytope P . This approach, now called Dantzig-Wolfe
decomposition, is especially useful when P can be written as a cross-product
P1 × · · · × Pk , for in this case minimization over P can be accomplished by
minimizing separately over each Pi . Often, for example, distinct Pi ’s constrain
disjoint subsets of variables. In practice, this method tends to require many it-
erations to obtain a solution with value optimum or nearly optimum, often too
many to be useful.

Lagrangean Relaxation
A second line of research is represented by the work of Held and Karp [9, 10].
In 1970 they proposed a method for estimating the minimum cost of a
traveling-salesman tour. Their method was based on the concept of a 1-tree,
which is a slight variant of a spanning tree. They proposed two ways to
calculate this estimate; one involved formulating the estimate as the solution
to the mathematical program
\[ \max_{u} \Bigl( ub + \min_{x \in P} (c - uA)\,x \Bigr) \tag{8} \]
where P is the polytope whose vertices are the 1-trees. They suggested an
iterative method to find an optimal or near-optimal solution: Given some
initial assignment to u, find a minimum-cost 1-tree with respect to the
edge-costs c − uA. Next, update the node-prices u based on the degrees of the
nodes in the 1-tree found. Find a min-cost 1-tree with respect to the modified
costs, update the node-prices accordingly, and so on.
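The iteration just described can be sketched as follows. This is an
illustrative textbook-style variant, not Held and Karp's original procedure:
node prices are written as π = −u, so that the modified cost of edge {i, j} is
c_ij + π_i + π_j and the bound is the modified 1-tree weight minus 2 Σ π_i; the
diminishing step size `step / (k + 1)` and all function names are assumptions.
The cost matrix is assumed symmetric with at least three vertices.

```python
def min_one_tree(cost, pi):
    """Minimum 1-tree under modified costs cost[i][j] + pi[i] + pi[j].

    Vertex 0 is the special vertex: build a minimum spanning tree on vertices
    1..n-1 with Prim's algorithm, then attach vertex 0 by its two cheapest
    modified edges.  Returns (modified weight, list of vertex degrees)."""
    n = len(cost)

    def w(i, j):
        return cost[i][j] + pi[i] + pi[j]

    degree = [0] * n
    total = 0.0
    # best[v] = (cheapest modified edge weight from the tree to v, its endpoint)
    best = {v: (w(1, v), 1) for v in range(2, n)}
    while best:
        v = min(best, key=lambda k: best[k][0])
        weight, parent = best.pop(v)
        total += weight
        degree[v] += 1
        degree[parent] += 1
        for u in best:
            if w(v, u) < best[u][0]:
                best[u] = (w(v, u), v)
    for v in sorted(range(1, n), key=lambda k: w(0, k))[:2]:
        total += w(0, v)
        degree[0] += 1
        degree[v] += 1
    return total, degree


def held_karp_bound(cost, iterations=50, step=1.0):
    """Best lower bound on the optimal tour cost found by the price updates."""
    pi = [0.0] * len(cost)
    best_bound = float("-inf")
    for k in range(iterations):
        weight, degree = min_one_tree(cost, pi)
        best_bound = max(best_bound, weight - 2.0 * sum(pi))
        if all(d == 2 for d in degree):
            break  # the 1-tree is a tour, so the bound is tight
        t = step / (k + 1)
        # Raise the price of nodes with degree above 2, lower it below 2.
        pi = [p + t * (d - 2) for p, d in zip(pi, degree)]
    return best_bound
```

Every tour is a 1-tree in which each node has degree 2, so for any prices the
returned quantity is a valid lower bound on the optimal tour cost; the only
access to the polytope of 1-trees is the call to `min_one_tree`.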
Like Dantzig and Wolfe’s method, this method’s only dependence on the
polytope P is via repeatedly optimizing over it. In the case of Held and Karp’s
estimate, optimizing over P amounts to finding a minimum-cost spanning tree.
Their method of obtaining an estimate for the solution to a discrete-optimization
problem came to be known as Lagrangean relaxation, and has been applied to
a variety of other problems.
Held and Karp’s method for finding the optimal or near-optimal solution
to (8) turns out to be the subgradient method, which dates back to the early
sixties. Under certain conditions this method can be shown to converge in the
limit, but, like Dantzig and Wolfe’s method, it can be rather slow. (One author
refers to “the correct combination of artistic expertise and luck” [19] needed
to make progress in subgradient optimization.)

Fractional Packing and Covering


The third line of research, unlike the first two, provided guaranteed convergence
rates. Shahrokhi and Matula [18] gave an approximation algorithm for a special
case of multicommodity flow. Their algorithm was improved and generalized by
Klein, Plotkin, Stein, and Tardos [13], Leighton et al. [15], and others. Plotkin,
Shmoys, and Tardos [16] noticed that the technique could be generalized to apply
to the problem of finding an element of the set

{x : Ax ≤ b, x ∈ P } (9)

where P is a convex set and A is a matrix such that Ax ≥ 0 for every x ∈ P. In
particular, as discussed in the introduction, they gave an ε-relaxed decision
procedure that required O(ρ ε⁻² log m) queries to P, where ρ is the width of
the problem instance.
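To make the flavor of such procedures concrete, the following is a simplified
exponential-weights sketch over the simplex. It is written in the spirit of the
cited algorithms but is not the procedure of [16] or [8]: the weight update,
the fixed parameter `eps`, the caller-supplied iteration count and the function
name are all assumptions made for illustration, and no iteration bound is
claimed for it.

```python
import math


def packing_by_exponential_weights(A, b, eps, iterations):
    """Average `iterations` oracle answers over the simplex, choosing each
    query vector from exponential row weights.
    Returns (x, max_i A_i . x / b_i)."""
    m, n = len(A), len(A[0])
    load = [0.0] * m   # load[i] = sum over returned columns j of A[i][j] / b[i]
    counts = [0] * n   # how often each simplex vertex was returned
    for _ in range(iterations):
        top = max(load)  # subtract the maximum to avoid overflow in exp
        y = [math.exp(eps * (load[i] - top)) for i in range(m)]
        # Query vector c = y^T A with rows scaled by 1/b_i; over the simplex
        # the oracle returns the unit vector of a smallest entry of c.
        c = [sum(y[i] * A[i][j] / b[i] for i in range(m)) for j in range(n)]
        j = min(range(n), key=lambda k: c[k])
        counts[j] += 1
        for i in range(m):
            load[i] += A[i][j] / b[i]
    x = [cnt / iterations for cnt in counts]
    return x, max(load) / iterations
```

The second return value is the factor by which the averaged solution violates
Ax ≤ b; a value of at most 1 + ε would make x an ε-approximate solution. Note
that the polytope is touched only through the vertex selection in each round,
which is what places the sketch in the Dantzig-Wolfe model.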
A similar result was obtained independently by Grigoriadis and Khachiyan [8].
Many subsequent algorithms (e.g. [20, 6]) built on these results. Furthermore,
many applications for these results have been proposed.
The method of Plotkin, Shmoys, and Tardos and of Grigoriadis and Khachiyan
improves on Dantzig-Wolfe decomposition and subgradient optimization in that it
does not require artistry to achieve convergence, and it is effective for
reasonably large values of ε. However, for small ε the method is frustratingly
slow. Might there be an algorithm in the Dantzig-Wolfe model that converges
more quickly?
Our aim in this paper has been to address this question, and to provide
evidence that the answer is no. However, our lower bound technique is incapable
of proving a lower bound that is superlinear in m, the number of rows of A.
The reason is that for any m-row matrix A, there is an m-column submatrix B
such that V(B) = V(A). This raises the question of whether there is a
Dantzig-Wolfe-type method that requires a number of iterations polynomial in m
but subquadratic in 1/ε.

References
[1] B. Awerbuch and T. Leighton. A simple local-control approximation algorithm
for multicommodity flow. In Proc. of the 34th IEEE Annual Symp. on Foundation
of Computer Science, pages 459–468, 1993.
[2] B. Awerbuch and T. Leighton. Improved approximation algorithms for the multi-
commodity flow problem and local competitive routing in dynamic networks. In
Proc. of the 26th Ann. ACM Symp. on Theory of Computing, pages 487–495, 1994.
[3] J. F. Benders. Partitioning procedures for solving mixed-variables programming
problems. Numerische Mathematik, 4:238–252, 1962.
[4] G. B. Dantzig and P. Wolfe. Decomposition principle for linear programs. Oper-
ations Res., 8:101–111, 1960.
[5] Y. Freund and R. Schapire. Adaptive game playing using multiplicative weights.
J. Games and Economic Behavior, to appear.
[6] N. Garg and J. Könemann. Faster and simpler algorithms for multicommodity
flow and other fractional packing problems. In Proc. of the 39th Annual Symp.
on Foundations of Computer Science, pages 300–309, 1998.
[7] A. V. Goldberg. A natural randomization strategy for multicommodity flow and
related problems. Information Processing Letters, 42:249–256, 1992.
[8] M. D. Grigoriadis and L. G. Khachiyan. A sublinear-time randomized approx-
imation algorithm for matrix games. Technical Report LCSR-TR-222, Rutgers
University Computer Science Department, New Brunswick, NJ, April 1994.
[9] M. Held and R. M. Karp. The traveling salesman problem and minimum spanning
trees. Operations Research, 18:1138–1162, 1970.
[10] M. Held and R. M. Karp. The traveling salesman problem and minimum spanning
trees: Part II. Mathematical Programming, 1:6–25, 1971.
[11] D. Karger and S. Plotkin. Adding multiple cost constraints to combinatorial
optimization problems, with applications to multicommodity flows. In Proc. of
the 27th Ann. ACM Symp. on Theory of Computing, pages 18–25, 1995.
[12] L. G. Khachiyan. Convergence rate of the game processes for solving matrix
games. Zh. Vychisl. Mat. and Mat. Fiz., 17:1421–1431, 1977. English translation
in USSR Comput. Math and Math. Phys., 17:78–88, 1978.
[13] P. Klein, S. Plotkin, C. Stein, and E. Tardos. Faster approximation algorithms
for the unit capacity concurrent flow problem with applications to routing and
finding sparse cuts. SIAM Journal on Computing, 23(3):466–487, June 1994.
[14] L. R. Ford and D. R. Fulkerson. A suggested computation for maximal multicom-
modity network flow. Management Sci., 5:97–101, 1958.
[15] T. Leighton, F. Makedon, S. Plotkin, C. Stein, É. Tardos, and S. Tragoudas.
Fast approximation algorithms for multicommodity flow problems. Journal of
Computer and System Sciences, 50(2):228, 1995.
[16] S. Plotkin, D. Shmoys, and É. Tardos. Fast approximation algorithms for frac-
tional packing and covering problems. Mathematics of Operations Research,
20:257–301, 1995.
[17] T. Radzik. Fast deterministic approximation for the multicommodity flow
problem. In Proc. of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms,
pages 486–492, 1995.
[18] F. Shahrokhi and D. W. Matula. The maximum concurrent flow problem. JACM,
37:318–334, 1990.
[19] J. F. Shapiro. A survey of lagrangean techniques for discrete optimization. Annals
of Discrete Mathematics, 5:113–138, 1979.
[20] N. E. Young. Randomized rounding without solving the linear program. In Proc.
of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 170–
178, 1995.
Experimental Evaluation of Approximation
Algorithms for Single-Source Unsplittable Flow

Stavros G. Kolliopoulos¹,* and Clifford Stein²,**

¹ NEC Research Institute, Inc., 4 Independence Way, Princeton, NJ 08540
[email protected]
² Dartmouth College, Department of Computer Science, Hanover, NH 03755-3510
[email protected]

Abstract. In the single-source unsplittable flow problem, we are given
a network G, a source vertex s and k commodities with sinks t_i and
real-valued demands ρ_i, 1 ≤ i ≤ k. We seek to route the demand ρ_i of
each commodity i along a single s-t_i flow path, so that the total flow
routed across any edge e is bounded by the edge capacity c_e. This NP-hard
problem combines the difficulty of bin-packing with routing through
an arbitrary graph and has many interesting and important variations.
In this paper we initiate the experimental evaluation of approximation
algorithms for unsplittable flow problems. We examine the quality of
approximation achieved by several algorithms for finding a solution with
near-optimal congestion. In the process we analyze theoretically a new
algorithm and report on the practical relevance of heuristics based on
minimum-cost flow. The experimental results demonstrate practical
performance that is better than the theoretical guarantees for all algorithms
tested. Moreover, modifications to the algorithms to achieve better
theoretical results translate to improvements in practice as well.

1 Introduction
In the single-source unsplittable flow problem (Ufp), we are given a network
G = (V, E, u), a source vertex s and a set of k commodities with sinks
t_1, . . . , t_k and associated real-valued demands ρ_1, . . . , ρ_k. We seek to
route the demand ρ_i of each commodity i along a single s-t_i flow path, so that
the total flow routed across any edge e is bounded by the edge capacity u_e.
See Figure 1 for an example instance. The minimum edge capacity is assumed to
have value at least max_i ρ_i.
The requirement of routing each commodity on a single path bears resemblance
to integral multicommodity flow and in particular to the multiple-source
edge-disjoint path problem. For the latter problem, a large amount of work
exists either for solving exactly interesting special cases (e.g. [10], [33],
[32]) or for approximating, with limited success, various objective functions
(e.g. [31], [11], [21], [22], [19]). Shedding light on single-source
unsplittable flow may lead to a better understanding of multiple-source
disjoint-path problems. From a different perspective, Ufp can be approached as
a variant of standard, single-commodity, maximum flow and generalizes
single-source edge-disjoint paths. Ufp is also important as a unifying
framework for a variety of scheduling and load balancing problems [20]. It can
be used to model bin-packing [20] and certain scheduling problems on parallel
machines [27]. Another application for Ufp is in virtual-circuit routing, where
the source represents a node wishing to multicast data to selected sinks. The
conceptual difficulty in Ufp arises from combining both packing constraints due
to the existence of capacities and path selection through a graph of arbitrary
topology.

* Part of this work was performed at the Department of Computer Science, Dartmouth
College while partially supported by NSF Career Award CCR-9624828.
** Research partly supported by NSF Career Award CCR-9624828.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 328–344, 1999.
© Springer-Verlag Berlin Heidelberg 1999

[Figure 1: two panels, “Input” and “Unsplittable solution”. Each panel shows the
source s and sink vertices labeled with demands 1 and 1/2; in the solution
panel, the legend distinguishes paths carrying 1/2 flow units from paths
carrying 1 flow unit.]

Fig. 1. Example Ufp instance. Numbers on the vertices denote demands; all
edge capacities are equal to 1.

The feasibility question for Ufp, i.e., whether all commodities can be routed
unsplittably, is strongly NP-complete [19]. Thus research has focused on
efficient algorithms to obtain approximate solutions. For a minimization (resp.
maximization) problem, a ρ-approximation algorithm, ρ > 1 (resp. ρ < 1), outputs
in polynomial time a solution with objective value at most (resp. at least) ρ
times the optimum. Different optimization versions can be defined, such as
minimizing congestion, maximizing the routable demand, and minimizing the
number of rounds [19]. In this work we focus on the congestion metric. The
reader is referred to [20,24,7] for theoretical work on the other metrics.
Given a flow f, define the utilization z_e of edge e as the ratio f_e/u_e. We
seek an unsplittable flow f satisfying all demands which minimizes congestion,
i.e. the quantity max_e {z_e, 1}. That is, the minimum congestion is the
smallest number α ≥ 1 such that, if we multiplied all capacities by α, f would
satisfy the capacity constraints.
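The congestion of a given unsplittable routing can be computed directly from
this definition. The sketch below is illustrative: the representation of a
routing as one list of edges per commodity, the dictionary of capacities and
the function name are assumptions, not data structures from the paper.

```python
def congestion(paths, demands, capacity):
    """paths[i] is the list of edges (u, v) used by commodity i, demands[i] its
    demand, and capacity[(u, v)] the capacity of edge (u, v).  Returns the
    congestion max(1, max_e f_e / u_e) of the unsplittable routing."""
    flow = {}
    for path, demand in zip(paths, demands):
        for edge in path:
            flow[edge] = flow.get(edge, 0.0) + demand
    worst = max((flow[e] / capacity[e] for e in flow), default=0.0)
    return max(1.0, worst)
```

For instance, routing demands 1 and 1/2 through a shared unit-capacity edge
gives utilization 1.5 on that edge, so the congestion of the routing is 1.5.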
In this paper we initiate the experimental study of approximation algorithms
for the unsplittable flow problem. Our study examines the quality of the
effective approximation achieved by different algorithms, how this quality is
affected by input structure, and the effect of heuristics on approximation and
running time. The experimental results indicate performance which is typically
better than the theoretical guarantees. Moreover, modifications to the
algorithms to achieve better theoretical results translate to improvements in
practice as well. The main focus of our work is to examine the quality of the
approximation, and we have not extensively optimized our codes for speed. Thus
we have largely opted for maintaining the relative simplicity and modularity
that we believe is one of the advantages of the algorithms we test. However,
keeping the running time low allows the testing of larger instances. Therefore
we report on wall clock running times, at a granularity of seconds, as an
indication of the time complexity in practice. Our codes typically run much
faster than multicommodity flow codes on similar size instances [28,12,30] but
slower than maximum flow codes [5], with which they have a comparable
theoretical running time [24].
We chose to focus our experimentation on the congestion metric for a variety
of reasons. The approximation ratios and integrality gaps known for congestion
are in general better than those known for the other metrics [19,24,7].
Moreover, congestion is relevant for maximizing routed demand in the following
sense: an unsplittable solution with congestion λ provides a guarantee that a
1/λ fraction of each demand can be routed while respecting capacities. Finally,
the congestion metric has been extensively studied theoretically in the setting
of the randomized rounding technique [31] and for its connections to
multicommodity flow and cuts (e.g. [26,17,25,18]).
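The scaling argument behind the 1/λ guarantee takes one line. In the sketch
below, f′ denotes the flow obtained by sending ρ_i/λ instead of ρ_i along the
path chosen for each commodity i; the symbol f′ is introduced here for
illustration.

```latex
\[
  z_e = \frac{f_e}{u_e} \le \lambda \ \text{ for all } e
  \quad\Longrightarrow\quad
  f'_e = \frac{f_e}{\lambda} \le u_e \ \text{ for all } e ,
\]
```

so the scaled routing respects every capacity while each commodity still uses
a single path.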

Known theoretical performance guarantees. Kleinberg [19,20] was the first to
consider Ufp as a distinct problem and to give (large) constant-factor
approximation algorithms. The authors obtained Partition, a simple
4-approximation algorithm [24], and a 3.33 bound for a more elaborate version
of Partition. Very recently, Dinitz, Garg, and Goemans [7] obtained an
algorithm with a performance guarantee of 1 + max_i {ρ_i} ≤ 2. This new
algorithm matches the integrality gap known for the problem.

New algorithms and heuristics. We provide a new algorithm, called
H Partition, and prove that it is a 3-approximation algorithm. We outline the
ideas behind our algorithms in Section 3. For both Partition and H Partition
we test several codes, each implementing different heuristics. All heuristics
are designed so that they do not affect the theoretical performance guarantee
of the underlying base algorithm (4 and 3 for Partition and H Partition,
respectively). The heuristics fall into two major categories. The cost
heuristic invokes minimum-cost flow subroutines where the original algorithm
would invoke maximum flow to compute partial unsplittable solutions. The cost
of an edge e is defined as an appropriate increasing function of the current
congestion on e. The sparsification heuristic discards, at an early stage of
the execution, the edges carrying zero flow in the initial solution to the
maximum flow relaxation. The motivation behind the sparsification heuristic is
to decrease the running time and to observe how different versions of the
algorithms behave in practice, since the zero edges are not exploited in the
theoretical analysis. Finally, we test codes which combine both the
sparsification and the cost heuristic.

Test data. We test our codes on five main families of networks. For each family we
have generated an extensive set of instances by varying appropriate parameters.
We test mostly on synthetic data adapted from previous experimental work for
multicommodity flow [28,12] and cuts [3] together with data from some new
generators. The instances produced are designed to accommodate a variety of
path and cut characteristics. Establishing a library of interesting data generators
for disjoint-path and unsplittable flow problems is an important task in its own
right given the practical significance of this class of problems. We believe our
work makes a first contribution in this direction. We attempted to obtain access
to real-world data from telecommunication and VLSI applications, but were
unsuccessful due to their proprietary nature. However one of the input families
we test (LA networks) is adapted from synthetic data, used in the restoration
capacity study of Deng et al. [6], which was created based on actual traffic
forecasts on realistic networks.

Experimental Results. In summary, the experimental results indicate, for both
algorithms Partition and H Partition, typical performance which is better
than the theoretical guarantees. Theoretical improvements translate to practical
improvements as well since algorithm H Partition achieved significantly better
approximations than algorithm Partition on the instances we examined. We
observed that the sparsification heuristic can drastically reduce the running time
without much adverse effect on congestion. Finally we show that running the cost
heuristic can provide additional small improvement on congestion at the expense
of an increase in running time.

2 Preliminaries

A single-source unsplittable flow problem (Ufp) (G, s, T ) is defined on a capac-
itated network G = (V, E, u) where a subset $T = \{t_1, \ldots, t_k\}$ of V is the set of
sinks; every edge e has a capacity $u_e$; each sink $t_i$ has a demand $\rho_i$. The pair
$(t_i, \rho_i)$ defines the commodity i. To avoid cumbersome notation, we occasionally
use the set T to denote the set of commodities. We assume in our definition
that each vertex in V hosts at most one sink; the input network can always be
transformed to fulfill this requirement. A source vertex s ∈ V is specified. We
denote by $\rho_{\max}$, $\rho_{\min}$ the maximum and minimum demand values for the prob-
lem of interest. For any I ⊆ T, a routing is a set of $s$-$t_{i_j}$ paths $P_j$, 1 ≤ j ≤ |I|,
used to route $\rho_{i_j}$ amount of flow from s to $t_{i_j}$ for each $t_{i_j} \in I$. Given a routing g,
the flow $g_e$ through edge e is equal to $\sum_{P_i \in g \mid P_i \ni e} \rho_i$. For clarity we will refer to a
standard maximum flow as fractional maximum flow. We make two assumptions
in our treatment of Ufp. First, we assume that there is a feasible fractional flow
solution. This assumption is introduced for simplicity; our approximation guar-
antees hold even without this assumption. The second assumption is a balance
332 Stavros G. Kolliopoulos and Clifford Stein

assumption that states that the maximum demand is less than or equal to the
minimum edge capacity, i.e., a demand can be routed on any edge. This assump-
tion is partly justified by the splitting of data into small size packets in actual
networks so that no commodity has arbitrarily large demand. Even without the
balance assumption a constant factor approximation is possible [24]. We will refer
to instances that satisfy the balance assumption as balanced, and ones which
violate it as unbalanced, or more precisely, ones in which the maximum demand
is ρ times the minimum capacity as ρ-unbalanced. Unless otherwise stated, we
use n, m to denote |V |, |E| respectively.
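The edge-flow definition above can be illustrated with a minimal sketch. It assumes a routing is stored as one vertex list per commodity, and it takes congestion to be the largest ratio of routed flow to capacity over the edges used, which is our reading of the metric; the function and variable names are hypothetical and not taken from the authors' codes.

```python
from fractions import Fraction


def edge_flows(routing, demands):
    """Return the flow g_e on every edge used by the routing.

    routing maps a commodity to its path, given as a list of vertices
    starting at the source; demands maps a commodity to its demand rho_i.
    """
    flows = {}
    for commodity, path in routing.items():
        # Each path contributes the full demand to every edge it traverses.
        for edge in zip(path, path[1:]):
            flows[edge] = flows.get(edge, 0) + demands[commodity]
    return flows


def congestion(flows, capacities):
    """Largest ratio g_e / u_e over the edges that carry flow (assumed metric)."""
    return max(flows[e] / capacities[e] for e in flows)


routing = {1: ["s", "a", "t1"], 2: ["s", "a", "t2"]}
demands = {1: Fraction(1, 2), 2: Fraction(1, 4)}
flows = edge_flows(routing, demands)
print(congestion(flows, {e: Fraction(1, 2) for e in flows}))
```

Exact rationals are used so that ratios such as 3/4 over 1/2 are not affected by floating-point rounding.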
The following theorem is an easy consequence of the well-known augmenting
path algorithm ([9], [8]) and will be of use.
Theorem 1. Let (G = (V, E, u), s, T ) be a Ufp on an arbitrary network G with
all demands ρi , 1 ≤ i ≤ k, equal to the same value σ, and edge capacities ue
equal to integral multiples of σ for all e ∈ E. If there is a fractional flow of value
f, then there is an unsplittable flow of value at least f. This unsplittable flow can
be found in time O(km) where k is the number of sinks in T .
In the same way a routing corresponds to an edge flow, an edge flow can be
represented as a path flow. Our algorithms will make use of the well-known
flow decomposition theorem [1]. Given problem (G, s, T ) let a flow solution f be
represented as an edge flow, i.e. an assignment of flow values to edges. Then one
can find, in O(nm) time, a representation of f as a path and cycle flow, where
the paths join s to sinks in T . In the remainder of the paper, we will assume
that the cycles are deleted from the output of a flow decomposition.
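A minimal sketch of flow decomposition follows, assuming the edge flow is already acyclic, obeys flow conservation, and is stored as a dictionary from directed edges to exact rational amounts. This is a plain path-peeling routine for illustration, not the maximum-flow-based routine used in the implementations, and it does not attain the O(nm) bound; all names are hypothetical.

```python
from fractions import Fraction


def decompose(flow, source):
    """Split an acyclic single-source edge flow into (path, amount) pairs."""
    residual = {e: f for e, f in flow.items() if f > 0}
    # absorbed[v] = inflow minus outflow, i.e. the flow that ends at v.
    absorbed = {}
    for (u, v), f in residual.items():
        absorbed[v] = absorbed.get(v, 0) + f
        absorbed[u] = absorbed.get(u, 0) - f
    absorbed.pop(source, None)  # the source supplies flow and absorbs none
    paths = []
    while any(u == source for (u, _v) in residual):
        path, v, amount = [source], source, None
        # Walk forward until reaching a vertex that still absorbs flow.
        while absorbed.get(v, 0) <= 0:
            w, f = next((y, f) for (x, y), f in residual.items() if x == v)
            amount = f if amount is None else min(amount, f)
            path.append(w)
            v = w
        amount = min(amount, absorbed[v])
        # Peel the path off the residual flow.
        for e in zip(path, path[1:]):
            residual[e] -= amount
            if residual[e] == 0:
                del residual[e]
        absorbed[v] -= amount
        paths.append((path, amount))
    return paths


example = {("s", "a"): Fraction(3, 4), ("a", "t1"): Fraction(1, 2), ("a", "t2"): Fraction(1, 4)}
for path, amount in decompose(example, "s"):
    print(path, amount)
```

Each peeled path either empties an edge or exhausts the amount absorbed at its endpoint, so the loop terminates on acyclic inputs.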

3 The Two Basic Algorithms

We present first briefly the 4-approximation algorithm Partition. Subsequently
we present and analyze the 3-approximation algorithm H Partition. Recall
that in our theoretical exposition we assume that in an Ufp demands lie in the
interval (0, 1] and capacities are at least 1. In particular let 1/D be the minimum
demand value for real D ≥ 1.
Partition works by partitioning the original Ufp into subproblems, each
containing only demands in a fixed range. We use an initial maximum-flow solu-
tion to allocate capacity to different subproblems. Capacities in the subproblems
are arbitrary and may violate the balance assumption. The subproblems are
separately solved by subroutine Interval Routing which finds low-congestion
solutions; the partial solutions are then combined by superimposing the flow
paths obtained in the respective routings on the original network G. The combi-
nation step will have a subadditive effect on the total congestion and thus a
near-optimal global solution is achieved. Subroutine Interval Routing deals
with a subproblem Π that has demands in the range $(1/2^i, 1/2^{i-1}]$ as follows.
Let f be a feasible fractional solution to Π. Round all demands up to $1/2^{i-1}$
and call Π′ the resulting Ufp instance. To obtain a feasible fractional solution
for Π′, it suffices to multiply the flow $f_e$ and the capacity $u_e$ on each edge e
by a factor of at most 2. Further round all capacities up to the closest multiple
of $1/2^{i-1}$ by adding at most $1/2^{i-1}$. The new edge capacities are now at most
$2u_e + 1/2^{i-1}$. By Theorem 1 an unsplittable routing for Π′ can now be found in
polynomial time. Algorithm Partition is described formally in Fig. 2. Observe
that an edge e which is assigned capacity $u^i_e = 0$ at Step 2 may still be used by
subroutine Interval Routing on $G_i$ to route up to ai amount of flow.
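The capacity adjustment performed inside Interval Routing can be sketched as follows, assuming capacities are exact rationals keyed by edge. The helper name and data layout are hypothetical; the sketch only shows the doubling and rounding step, not the routing itself.

```python
from fractions import Fraction
from math import ceil


def round_subproblem(capacities, i):
    """Scale and round capacities for demands in (1/2**i, 1/2**(i-1)].

    Returns the rounded demand value 1/2**(i-1) and the new capacities:
    each capacity is doubled, then rounded up to a multiple of that value.
    """
    unit = Fraction(1, 2 ** (i - 1))
    rounded = {}
    for edge, u in capacities.items():
        doubled = 2 * Fraction(u)
        # Rounding up adds less than one unit, so the result is < 2u + unit.
        rounded[edge] = ceil(doubled / unit) * unit
    return unit, rounded


unit, caps = round_subproblem({("s", "a"): Fraction(3, 8)}, 2)
print(unit, caps)
```

After this step all demands equal the unit and all capacities are integral multiples of it, which is the setting required by Theorem 1.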

Step 1. Find a feasible fractional solution f.
Step 2. Define a partition of the [1/D, 1] interval into $\xi = \lfloor \log D \rfloor + 1$ consec-
utive subintervals $(1/2^{\lfloor \log D \rfloor + 1}, 1/2^{\lfloor \log D \rfloor}]$, $(1/2^{\lfloor \log D \rfloor}, 1/2^{\lfloor \log D \rfloor - 1}]$, . . . , $(1/2, 1]$.
Construct ξ copies of G where the set of sinks in $G_i$, 1 ≤ i ≤ ξ, is the subset
$T_i$ of T with demands in the interval $(1/2^i, 1/2^{i-1}]$. Using flow decomposition de-
termine for every edge e the amount $u^i_e$ of $u_e$ used by f to route flow to sinks in
$T_i$. Set the capacity of edge e in $G_i$ to $u^i_e$.
Step 3. Invoke Interval Routing on each $G_i$, 1 ≤ i ≤ ξ, to obtain an unsplittable
routing $g^i$. Output routing g, the union of the path sets $g^i$, 1 ≤ i ≤ ξ.

Fig. 2. Algorithm Partition(G = (V, E, u), s, T ).
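The grouping of sinks in Step 2 can be sketched as below, assuming demands are exact rationals in (0, 1]. The function names are hypothetical; the sketch covers only the assignment of each sink to its interval index i, not the capacity allocation.

```python
from fractions import Fraction


def demand_class(rho):
    """Return the index i >= 1 with 1/2**i < rho <= 1/2**(i-1)."""
    if not 0 < rho <= 1:
        raise ValueError("demand must lie in (0, 1]")
    i = 1
    while rho <= Fraction(1, 2 ** i):
        i += 1
    return i


def partition_sinks(demands):
    """Group sinks by demand class; demands maps sink -> demand."""
    classes = {}
    for sink, rho in demands.items():
        classes.setdefault(demand_class(rho), []).append(sink)
    return classes


demands = {"t1": Fraction(1), "t2": Fraction(1, 2), "t3": Fraction(3, 8), "t4": Fraction(1, 8)}
print(partition_sinks(demands))
```

Note that a demand exactly equal to a power of 1/2 falls at the closed upper end of its interval, so a demand of 1/2 belongs to class 2, not class 1.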

Theorem 2. [24] Given an Ufp Π, Algorithm Partition finds a
4-approximation for congestion in polynomial time.

The running time is dominated by the time to perform O(log n) maximum
flow computations and to perform flow decomposition. The fastest maximum
flow algorithms currently have running time $O(nm \log_{m/(n \log n)} n)$ [13] and
$O(\min(n^{2/3}, m^{1/2})\, m \log(n^2/m) \log U)$ [14] when the capacities are integers in the
range [1, . . . , U ]. See [23] for exact computations of running times. In our imple-
mentations, we use a maximum flow routine repeatedly in order to perform flow
decomposition.

3.1 A 3-Approximation Algorithm


An improved approximation algorithm can be obtained by combining the par-
titioning idea above with a more careful treatment of the subproblems. At the
first publication of our results [24] we achieved a (3.33 + o(1))-approximation.
Subsequently we improved our scheme to obtain a 3 ratio; prior to this improve-
ment, a 2-approximation for congestion was independently obtained by Dinitz,
Garg and Goemans [7]. See [23] for a detailed exposition.
We give first without proof an application of the Interval Routing sub-
routine.
Lemma 1. Let Π = (G, s, T ) be a Ufp in which all demands have value $1/2^x$
for x ∈ N, and all capacities are multiples of $1/2^{x+1}$, and let f be a fractional
flow solution such that the flow $f_e$ through any edge is a multiple of $1/2^{x+1}$. We
can find in polynomial time an unsplittable routing g where the flow $g_e$ through
an edge e is at most $f_e + 1/2^{x+1}$.
Algorithm Partition finds a routing for each subproblem by scaling up sub-
problem capacities to ensure they conform to the requirements of Lemma 1. The
new algorithm operates in phases, during which the number of distinct paths
used to route flow for a particular commodity is progressively decreased. Call
granularity the minimum amount of flow routed along a single path to any com-
modity, which has not been routed yet unsplittably. Algorithm Partition starts
with a fractional solution of arbitrary granularity and essentially “in parallel”
for all subproblems, rounds this fractional solution to an unsplittable one. In
the new algorithm, we delay the rounding process in the sense that we keep
computing a fractional solution of successively increasing granularity. The gran-
ularity will always be a power of 1/2 and this unit will be doubled after each
iteration. Once the granularity surpasses $1/2^x$, for some x, we guarantee that all
commodities with demands $1/2^x$ or less have already been routed unsplittably.
Every iteration improves the granularity at the expense of increasing the con-
gestion. The method has the advantage that by Lemma 1, a fractional solution
of granularity $1/2^x$ requires only a $1/2^x$ additive offset to current capacities to
find a new flow of granularity $1/2^{x-1}$. Therefore, if the algorithm starts with
a fractional solution of granularity $1/2^{\lambda}$, for some potentially large λ, the total
increase to an edge capacity from then on would be at most $\sum_{j=1}^{\lambda} 1/2^j < 1$.
Expressing the granularity in powers of 1/2 requires an initial rounding of the
demand values; this rounding will force us to scale capacities by at most a factor
of 2. We first provide an algorithm for the case in which demand values are
powers of 1/2. The algorithm for the general case will then follow easily.
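The bound on the total additive increase is a finite geometric sum. As a short worked check (standard arithmetic, not part of the original exposition):

```latex
\[
  \sum_{j=1}^{\lambda} \frac{1}{2^{j}}
  \;=\; \frac{1}{2}\cdot\frac{1 - (1/2)^{\lambda}}{1 - 1/2}
  \;=\; 1 - \frac{1}{2^{\lambda}}
  \;<\; 1 .
\]
```

So however fine the initial granularity is, the offsets accumulated over all doubling steps stay below one unit of capacity per edge.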
Theorem 3. Let Π = (G = (V, E, u), s, T ) be a Ufp, in which all demand
values are of the form $2^{-\nu}$ with ν ∈ N. There is an algorithm, 2H Partition,
which obtains, in polynomial time, an unsplittable routing such that the flow
through any edge e is at most $z u_e + \rho_{\max} - \rho_{\min}$, where z is the optimum congestion.
Remark: We first obtained a relative (1 + ρmax − ρmin )-approximation for the
case with demand values in {1/2, 1} in [24]. Dinitz, Garg and Goemans [7]
first extended that theorem to arbitrary powers of 1/2. Their derivation uses a
different approach.

Proof. We describe the algorithm 2H Partition. The crux of the algorithm is
a relaxed decision procedure that addresses the following question. Is there an
unsplittable flow after scaling all the input capacities by x? The procedure will
either return no or will output a solution where the capacity of edge e is at most
$x u_e + \rho_{\max} - \rho_{\min}$. Embedding the relaxed decision procedure in a binary search
completes the algorithm. See [16] for background on relaxed decision procedures.
The following claim will be used to show the correctness of the procedure.
Claim. Given an Ufp Π with capacity function r let Π′ be the modified version
of Π in which every edge capacity $r_e$ has been rounded down to the closest
multiple $r'_e$ of $1/2^{\lambda}$. If the optimum congestion is 1 for Π, the optimum congestion
is 1 for Π′ as well.

Proof of Claim. Let G be the network with capacity function r and G′ be the
network with rounded capacities r′. The amount $r_e - r'_e$ of capacity will be unused
by any optimal unsplittable flow in G. The reason is that the sum $\sum_{i \in S} 1/2^i$,
for any finite multiset S ⊂ N, is equal to a multiple of $1/2^{i_0}$, $i_0 = \max_{i \in S}\{i\}$,
that is, a multiple of the smallest term in the sum.
We describe now the relaxed decision procedure. Let u′ denote the scaled ca-
pacity function xu. Let $\rho_{\min} = 1/2^{\lambda}$. The relaxed decision procedure has at most
$\lambda - \log(\rho_{\max}^{-1}) + 1$ iterations. During the first iteration, split every commodity in
T, called an original commodity, with demand $1/2^x$, 0 ≤ x ≤ λ, into $2^{\lambda - x}$ virtual
commodities each with demand $1/2^{\lambda}$. Call $T_1$ the resulting set of commodities.
Round down all capacities to the closest multiple of $1/2^{\lambda}$. By Theorem 1 we can
find a fractional flow solution $f^0 \equiv g^1$, for $T_1$, in which the flow through any edge
is a multiple of $1/2^{\lambda}$. If this solution does not satisfy all demands the decision
procedure returns no; by Claim 3.1 no unsplittable solution is possible without
increasing the capacities u′. Otherwise the procedure continues as follows.
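The splitting performed in the first iteration can be sketched as follows, assuming each original commodity is described by the exponent x of its demand $1/2^x$. The names are hypothetical, and the sketch materializes every virtual commodity explicitly, so it corresponds to the pseudopolynomial presentation of the procedure.

```python
from fractions import Fraction


def split_into_virtual(exponents):
    """Split commodities with demand 1/2**x into unit virtual commodities.

    exponents maps an original commodity to x. Returns lam, the largest
    exponent (so the minimum demand is 1/2**lam), and a map from each
    original commodity to its list of virtual demands, each 1/2**lam.
    """
    lam = max(exponents.values())
    unit = Fraction(1, 2 ** lam)
    virtual = {c: [unit] * 2 ** (lam - x) for c, x in exponents.items()}
    return lam, virtual


lam, virtual = split_into_virtual({"a": 0, "b": 2, "c": 3})
print(lam, {c: len(v) for c, v in virtual.items()})
```

The virtual demands of each original commodity sum back to its original demand, which is what allows them to be paired up again in later iterations.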
Observe that $f^0$ is a fractional solution for the set of original commodities
as well, a solution with granularity $1/2^{\lambda}$. Let $u^j_e$ denote the total flow through
an edge after iteration j. The flow $u^1_e$ is trivially $f^0_e \le u'_e \le u_e$. At iteration
i > 1, we extract from $g^{i-1}$, using flow decomposition, the set of flow paths $f^{i-1}$
used to route the set $S_{i-1}$ of virtual commodities in $T_{i-1}$ which correspond to
original commodities in T with demand $1/2^{\lambda-i+1}$ or more. Pair up the virtual
commodities in $S_{i-1}$ corresponding to the same original commodity; this is pos-
sible since the demand of an original commodity is inductively an even multiple
of the demands of the virtual commodities. Call the resulting set of virtual com-
modities $T_i$. $T_i$ contains virtual commodities with demand $1/2^{\lambda-i+1}$. The set of
flow paths $f^{i-1}$ is a fractional solution for the commodity set $T_i$. By Lemma 1,
we can find an unsplittable routing $g^i$ for $T_i$ such that the flow $g^i_e$ through an
edge e is at most $f^{i-1}_e + 1/2^{\lambda-i+2}$. Now the granularity has been doubled to
$1/2^{\lambda-i+1}$. A crucial observation is that from this quantity, $g^i_e$, only an amount
of at most $1/2^{\lambda-i+2}$ corresponds to new flow, added to e during iteration i. It is
easy to see that the algorithm maintains the following two invariants.
INV1: After iteration i, all original commodities with demands $1/2^{\lambda-i+1}$ or
less have been routed unsplittably.
INV2: After iteration i > 1, the total flow $u^i_e$ through edge e, which is due to
all i first iterations, satisfies the relation $u^i_e \le u^{i-1}_e + 1/2^{\lambda-i+2}$.
Given that $u^1_e \le u'_e$ the total flow through e used to route unsplittably all com-
modities in T is at most

$$u'_e + \sum_{i=2}^{\lambda - \log(\rho_{\max}^{-1}) + 1} 1/2^{\lambda-i+2} = x u_e + \rho_{\max} - 1/2^{\lambda} = x u_e + \rho_{\max} - \rho_{\min}.$$
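The value of the sum in the bound above can be checked directly. In this short derivation the exponent μ, defined by $\rho_{\max} = 1/2^{\mu}$, is notation introduced here for convenience and does not appear in the original text; the substitution is $j = \lambda - i + 2$.

```latex
\[
  \sum_{i=2}^{\lambda-\mu+1} \frac{1}{2^{\lambda-i+2}}
  \;=\; \sum_{j=\mu+1}^{\lambda} \frac{1}{2^{j}}
  \;=\; \frac{1}{2^{\mu}} - \frac{1}{2^{\lambda}}
  \;=\; \rho_{\max} - \rho_{\min}.
\]
```

The first term (i = 2) contributes $1/2^{\lambda}$ and the last term contributes $1/2^{\mu+1}$, so the sum is a geometric series running between the minimum demand and half the maximum demand.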

Therefore the relaxed decision procedure returns the claimed guarantee. For
the running time, we observe that in each of the $l = \lambda - \log(\rho_{\max}^{-1}) + 1$ iterations,
it appears that the number of virtual commodities could be as large as $k 2^l$,
where k is the number of the original commodities. There are two approaches
to ensure polynomiality of our method. One is to consider only demands of
value $1/2^{\lambda} > 1/n^d$ for some d > 1. It is known [24] that we can route the very
small demands while affecting the approximation ratio by an o(1) factor. The
other approach relies on the observation that at iteration i, the set $S_{i-1}$ of flow
paths yielded by the flow decomposition has size O(km) since it corresponds to
a fractional solution for a subset of the original commodities. In other words,
virtual commodities in $S_{i-1}$ corresponding to the same original commodity have
potentially been routed along the same path in $g^i$. Hence, by appropriately
merging virtual commodities corresponding to the same original commodity, and
increasing the granularity at different rates for distinct original commodities, we
can always run an iteration on a polynomial-size set of virtual commodities.
We opted to present a “pseudopolynomial” version of the decision procedure for
clarity. □

Theorem 4. Let Π = (G = (V, E, u), s, T ) be a Ufp. There is an algorithm,
H Partition, which obtains in polynomial time a $\min\{3 - \rho_{\min},\, 2 + 2\rho_{\max} - \rho_{\min}\}$
approximation for congestion.

Proof. We describe the algorithm H Partition. Round up every demand to its
closest power of 1/2. Call the resulting Ufp Π′ with sink set T′. Multiply all
capacities by at most 2. Then there is a fractional feasible solution f′ to Π′.
Let u′ be the capacity function in Π′. The purpose of the rounding is to be
able to express later on the granularity as a power of 1/2. The remainder of
the proof consists of finding an unsplittable solution to Π′; this solution can be
efficiently converted to an unsplittable solution to Π without further capacity
increase. Such a solution to Π′ can be found by the algorithm 2H Partition
from Theorem 3. Observe that the optimum congestion for Π′ is at most z, the
optimum congestion for Π. 2H Partition will route through edge e flow that
is at most

$$z u'_e + \rho'_{\max} - \rho'_{\min} \le 2 z u_e + \rho'_{\max} - \rho'_{\min},$$

where $\rho'_{\max}$ and $\rho'_{\min}$ denote the maximum and minimum demand values in Π′.
But $\rho'_{\max} < 2\rho_{\max}$, $\rho'_{\max} \le 1$, and $\rho'_{\min} \ge \rho_{\min}$ from the construction of Π′.
Therefore $\rho'_{\max} - \rho'_{\min} \le 1 - \rho_{\min}$ and $\rho'_{\max} - \rho'_{\min} \le 2\rho_{\max} - \rho_{\min}$. Dividing the
upper bound on the flow by $z u_e \ge 1$ yields the claimed guarantees on congestion.
□
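The initial demand rounding in H Partition can be sketched as below, assuming demands are exact rationals in (0, 1]. The function name is hypothetical; the sketch illustrates only the rounding and the two properties of the rounded demand used in the proof.

```python
from fractions import Fraction


def round_up_to_power_of_half(rho):
    """Smallest value 1/2**x with x >= 0 that is at least rho."""
    if not 0 < rho <= 1:
        raise ValueError("demand must lie in (0, 1]")
    power = Fraction(1)
    # Keep halving while the next smaller power still covers rho.
    while power / 2 >= rho:
        power /= 2
    return power


for rho in (Fraction(1), Fraction(3, 8), Fraction(1, 5)):
    rounded = round_up_to_power_of_half(rho)
    # The rounded demand is at least rho and strictly less than 2 * rho.
    print(rho, rounded, rho <= rounded < 2 * rho)
```

Because the rounded demand is less than twice the original, doubling the capacities is enough to keep a fractional solution feasible.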

4 Heuristics Used

The heuristics we designed fall into two major categories. We tested them both
individually and in combination with each other.
The theoretical analysis of Partition and H Partition is not affected if,
after finding the initial solution to the maximum flow relaxation, edges carrying
zero flow, called zero edges, are discarded. The sparsification heuristic imple-
ments this observation. The motivation behind the sparsification heuristic is to
decrease the running time and also to observe how the theoretical versions of the
algorithms behave in practice, since the zero edges are not exploited in the the-
oretical analysis. On random graph instances, where maximum flow solutions
typically use a small fraction of the edges, the sparsification heuristic resulted
in drastic reductions of the running time. We observed that the approximation
guarantee typically degrades by a small amount when a large number of edges
is discarded. However, the reduction in running time and space requirements
makes the sparsification heuristic attractive for large instances. We also tested
an intermediate approach where a randomly chosen fraction of the zero edges
(on the average between 30 and 50 percent) is retained.
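A minimal sketch of the sparsification step follows, assuming the initial fractional solution is available as a dictionary from edges to flow values. The function name, the `keep_probability` parameter and the seeding scheme are illustrative assumptions and are not taken from the authors' C codes.

```python
import random


def sparsify(edges, flow, keep_probability=0.0, seed=0):
    """Keep edges carrying flow; keep each zero edge with the given probability.

    keep_probability = 0 discards all zero edges (full sparsification);
    a value such as 0.3 gives an intermediate variant.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    kept = []
    for e in edges:
        # The coin is tossed only for zero edges.
        if flow.get(e, 0) > 0 or rng.random() < keep_probability:
            kept.append(e)
    return kept


edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
flow = {("s", "a"): 1, ("a", "t"): 1}
print(sparsify(edges, flow))
```

With `keep_probability=1.0` the function returns the input edge list unchanged, which corresponds to running without sparsification.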
The cost heuristic invokes a minimum-cost flow subroutine when the original
algorithm would invoke maximum flow to compute partial unsplittable solutions.
In algorithm Partition we use minimum-cost flow at Step 3 to find the partial
unsplittable solution to each subproblem. In algorithm H Partition minimum-
cost flow is used to compute at iteration i the unsplittable routing $g^i$ for the set $T_i$
of virtual commodities (cf. proof of Theorem 3). That is, we use a minimum-cost
flow routine to successively improve the granularity of the fractional solution.
The cost of an edge e is defined as an appropriate increasing function of the cur-
rent utilization of e. Let λ denote the current congestion. For robustness we use
the ratio $z_e/\lambda$ in the calculation of the cost c(e). We tested using functions that
depend either linearly or exponentially on $z_e/\lambda$. The objective was to discour-
age the repeated use of edges which were already heavily utilized. Exponential
cost functions tend to penalize heavily the edge with the maximum utilization
in a given set of edges. Their theoretical success in multicommodity flow work
[34,18] motivated their use in our setting. In particular we let $c_L(e) = \beta_1 z_e/\lambda$
and $c_E(e) = 2^{\beta_2 z_e/\lambda}$ and tested both functions for different values of $\beta_1$, $\beta_2$. Ob-
serve that an edge with utilization 0 so far will incur a cost of 1 under $c_E$, while
0 seems a more natural cost assignment for this case. However, experiments sug-
gested that the $c_E$ function behaved marginally better than $c_E - 1$ by imposing
limited moderation on the introduction of new edges in the solution.
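The two cost functions can be written down directly. The sketch below assumes that $z_e$ is the current utilization of edge e and that λ is the current congestion, as the surrounding text suggests; the function names are hypothetical.

```python
def linear_cost(z_e, lam, beta1):
    """c_L(e) = beta1 * z_e / lam."""
    return beta1 * z_e / lam


def exponential_cost(z_e, lam, beta2):
    """c_E(e) = 2 ** (beta2 * z_e / lam)."""
    return 2 ** (beta2 * z_e / lam)


# An unused edge costs 0 under the linear function and 1 under the
# exponential one; the most heavily used edge (z_e == lam) costs the most.
print(linear_cost(0.0, 1.5, 1000), exponential_cost(0.0, 1.5, 10))
print(linear_cost(1.5, 1.5, 1000), exponential_cost(1.5, 1.5, 10))
```

Dividing by λ normalizes utilization to the range [0, 1], so the same coefficients remain meaningful as congestion grows during a run.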

5 Experimental Framework

Software and hardware resources. We conducted our experiments on a 400 MHz
dual Pentium Pro PC with 500 MB of RAM and 130 MB swap space. None of
the executions we report on here exceeded the system RAM or used the second
system processor. The operating system was Linux, version 9. Our programs were
written in C and compiled using gcc, version 2.7.2, with the -O3 optimization
option.

Codes tested. For the implementation of the maximum flow and minimum cost
flow subroutines of our algorithms we used the codes described in [5] and [4]
respectively. We also used the Cherkassky-Goldberg code to implement the flow
decomposition step as described in Section 3. We assume integral capacities and
demands in the unsplittable flow input. We test the following codes:

h prf: this is algorithm Partition without sparsification.
h prf2: this is algorithm Partition with intermediate sparsification. Zero cost
edges are sampled with probability .3 and if the coin toss fails they are
discarded.
h prf3: this is algorithm Partition with all the zero edges discarded.
3al: this is algorithm H Partition without sparsification. We implement a
simplified version of algorithm H Partition with theoretical guarantee 3
(in contrast to $3 - \rho_{\min}$ in Theorem 4).
3al2: this is algorithm H Partition with all the zero edges discarded.
c/h prf10y: version of h prf with the cost heuristic. Linear cost function with
slope $\beta_1 = 10^y$.
c/h prfexpy: version of h prf with the cost heuristic. Exponential cost function
with coefficient $\beta_2 = y$.
c3/3al10y: version of 3al with the cost heuristic. Linear cost function with
slope $\beta_1 = 10^y$.
c3/3alexpy: version of 3al with the cost heuristic. Exponential cost function
with coefficient $\beta_2 = y$.
c3 2/3al210y: version of 3al2 with the cost heuristic. Linear cost function with
slope $\beta_1 = 10^y$.
c3 2/3al2expy: version of 3al2 with the cost heuristic. Exponential cost func-
tion with coefficient $\beta_2 = y$.

Input classes. We generated data from four input classes. For each class we
generated a variety of instances varying different parameters. The generators
use randomness to produce different instances for the same parameter values.
To make our experiments repeatable the seed of the pseudorandom generator is
an input parameter for all generators. If no seed is given, a fixed default value is
chosen. We used the default seed in generating all inputs. The four classes used
follow.

genrmf. This is adapted from the GENRMF generator of Goldfarb and Grigo-
riadis [15,2]. The input parameters are a b c1 c2 k d. The generated net-
work has b frames (grids) of size a × a, for a total of a ∗ a ∗ b vertices. In each
frame each vertex is connected with its neighbors in all four directions. In
addition, the vertices of a frame are connected one-to-one with the vertices
of the next frame via a random permutation of those vertices. The source is
the lower left vertex of the first frame. Vertices become sinks with probabil-
ity 1/k and their demand is chosen uniformly at random from the interval
[1, d]. The capacities are randomly chosen integers from (c1, c2) in the case
of interframe edges, and (1, c2 ∗ a ∗ a) for the in-frame edges.
noigen. This is adapted from the noigen generator used in [3,29] for minimum-
cut experimentation. The input parameters are n d t p k. The network has
n nodes and an edge between a pair of vertices is included with probability
1/d. Vertices are randomly distributed among t components. Capacities are
chosen uniformly from a prespecified range [l, 2l] in the case of intercomponent
edges and from [pl, 2pl] for intracomponent edges, p being a positive
integer. Only vertices belonging to one of the t−1 components not containing
the source can become sinks, each with probability 1/k. The desired effect of
the construction is for commodities to contend for the light intercomponent
cuts. Demand for commodities is chosen uniformly from the range [1, 2l].
rangen. This generates a random graph G(n, p) with input parameters n p c1
c2 k d, where n is the number of nodes, p is the edge probability, capacities
are in the range (c1, c2), k is the number of commodities and demands are
in the range (0, d).
satgen. satgen first generates a random graph G(n, p) as in rangen and then
uses the following procedure to designate commodities. Two vertices s and t
are picked from G and maximum flow is computed from s to t. Let v be the
value of the flow. New nodes corresponding to sinks are incrementally added
each connected only to t and with a randomly chosen demand value. The
process stops when the total demand reaches v, the value of the minimum s-t
cut or when the number of added commodities reaches the input parameter
k, typically given as a crude upper bound.
LA networks. This is a synthetic family of networks generated by Deng et al.
[6] as part of the AT&T capacity restoration project. Although this project
does not deal with unsplittable flow, the data readily gives rise to multiple-
source unsplittable flow instances. The networks are derived from apply-
ing planarity-preserving edge contractions to a street map of Los Angeles.
Pairs of terminals are placed on vertices following an exponential distribu-
tion arising from actual traffic forecasts. We transformed the instances to
single-source by picking the largest degree vertex as the source.

i                 2           6           12          25          50          100
program name      cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
3al                1.64    1   2.37    1   2.14    1   1.59    1   1.18    1   1.23    3
3al2               2.50    0   2.51    0   2.41    0   1.47    1   1.25    1   1.19    2
c/h prf103         1.49    0   2.91    0   2.51    0   1.85    1   1.67    1   1.55    2
c/h prfexp10       1.49    0   2.17    0   2.15    1   1.85    1   1.44    1   1.59    2
c3/3al103          2.56    0   2.29    1   1.96    2   1.46    2   1.25    4   1.16    7
c3/3alexp10        1.49    0   1.86    1   1.74    1   1.81    2   1.27    2   1.18    3
c3 2/3al2103       1.49    0   2.51    0   1.76    1   1.92    1   1.57    2   1.44    2
c3 2/3al2exp10     1.49    0   2.13    0   2.04    1   1.59    1   1.32    1   1.31    2
h prf              2.00    0   2.47    0   2.51    0   2.15    0   1.50    0   1.45    1
h prf2             2.00    0   2.47    0   2.51    0   2.14    0   1.50    1   1.41    2
h prf3             2.00    0   2.75    0   2.20    0   1.74    0   1.36    0   1.46    1
Table 1. family noi commodities: noigen 1000 1 2 10 i; 7981 edges; 2-unbalanced
family; i is the expected percentage of sinks in the non-source component.

i                 2           4           8           16          32
program name      cong. time  cong. time  cong. time  cong. time  cong. time
3al                1.18    1   1.12    2   1.09    3   1.04    3   1.06    3
3al2               1.25    1   1.15    1   1.09    2   1.08    1   1.05    1
c/h prf103         1.67    1   1.47    2   1.33    2   1.24    2   1.24    2
c/h prfexp10       1.44    1   1.40    1   1.26    1   1.29    1   1.25    2
c3/3al103          1.25    4   1.22    6   1.13    7   1.12    5   1.07    6
c3/3alexp10        1.27    2   1.12    3   1.11    3   1.12    3   1.11    4
c3 2/3al2103       1.57    2   1.24    1   1.09    2   1.14    2   1.10    1
c3 2/3al2exp10     1.32    1   1.16    2   1.15    2   1.13    2   1.12    2
h prf              1.50    1   1.42    1   1.30    1   1.22    1   1.24    2
h prf2             1.55    1   1.40    1   1.22    2   1.23    2   1.18    2
h prf3             1.36    0   1.22    1   1.38    0   1.28    1   1.20    1
Table 2. family noi components: noigen 1000 1 i 10 50; 7981 edges; 2-unbalanced
family; i is the number of components.

i                 32          64          128         256         512         1024
program name      cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
3al                5.33    0   7.00    1   8.00    2   8.50   10   7.50   42   8.00  185
3al2               7.50    0   5.50    0   6.00    1   7.00    4   7.50   22   9.00   93
c/h prf103         7.00    0   7.50    0   6.50    0   7.50    4   8.00   16   8.00   70
c/h prfexp10       7.50    0   4.50    0   7.00    1   7.00    4   7.50   16   8.00   70
c3/3al103          8.00    0   4.50    1   8.00    2   7.50   12   7.50   63   7.50  278
c3/3alexp10        7.50    0   7.00    1   7.50    3   7.50   12   7.50   55   7.50  243
c3 2/3al2103       7.50    0   4.50    0   7.00    1   7.50    5   7.50   22   7.50   95
c3 2/3al2exp10     7.50    0   4.50    0   7.00    1   7.50    5   7.50   22   8.00   95
h prf              7.50    0   7.00    0   6.00    1   9.50    3   7.50   16   8.50   70
h prf2             7.50    0   7.00    0   7.50    1  10.00    4  10.50   16   8.50   70
h prf3             5.33    0   7.00    0   9.50    0   7.50    0   7.50    3   8.00   11
Table 3. family ran dense: rangen -a i*2 -b 30 -c1 1 -c2 16 -k i -d 16; 1228, 4890,
19733, 78454, 313922, 1257086 edges; 16-unbalanced family; i*2 is the number
of vertices.
i                 2           5           10          20          50          70
program name      cong. time  cong. time  cong. time  cong. time  cong. time  cong. time
3al                1.88    7   1.91   22   1.65   44   1.35   94   1.16  259   1.14  372
3al2               1.74    4   1.97   13   1.55   24   1.31   49   1.18  143   1.14  201
c/h prf103         2.41    6   2.49   13   2.10   27   1.81   63   1.66  178   1.65  234
c/h prfexp10       2.67    6   2.43   11   2.09   22   1.79   53   1.74  149   1.71  201
c3/3al103          1.85   29   1.88   56   1.63   88   1.42  145   1.20  272   1.18  337
c3/3alexp10        1.80   21   1.96   30   1.64   42   1.46   70   1.24  175   1.18  235
c3 2/3al2103       1.80    5   1.97   18   1.68   31   1.47   58   1.26  150   1.20  207
c3 2/3al2exp10     1.92    8   2.00   20   1.67   31   1.37   54   1.21  144   1.22  196
h prf              2.52    5   2.46   14   2.22   29   1.86   69   1.66  199   1.63  250
h prf2             2.65    6   2.77   16   2.15   33   1.82   76   1.70  218   1.68  257
h prf3             2.43    3   2.49    7   2.20   16   1.80   36   1.75  108   1.67  147
Table 4. family rmf commDem: genrmf -a 10 -b 64 -c1 64 -c2 128 -k i -d 128;
29340 edges; 2-unbalanced family; i is the expected percentage of sinks in the
vertices.

6 Experimental Results
In this section we give an overview of experimental results. We experimented
on 13 different families and present data on several representative ones. Tables
1-6 provide detailed experimental results. The running time is given in seconds.
The congestion achieved by algorithm Partition, on a balanced input, was
typically less than 2.3 without any heuristics. The largest value observed was
2.89. Similarly for H Partition the congestion, on a balanced input, was typi-
cally less than 1.65 and always at most 1.75. The theoretically better algorithm
H Partition does better in practice as well. To test the robustness of the ap-
proximation guarantee we relaxed, on several instances, the balance assumption
allowing the maximum demand to be twice the minimum capacity or more, see
Tables 1-5. The approximation guarantee was not significantly affected. Even in
the extreme case when the balance assumption was violated by a factor of 16, as
in Table 3, the best code achieved a congestion of 7.5. Performance on the smallest
instances of the LA family (not shown here) was well within the typical values
reported above. On larger LA instances, the exponential distribution generated
an excessive number of commodities (on the order of 30,000 and up) for our
codes to handle in the time frames we considered (a few minutes).
The sparsification heuristic proved beneficial in terms of efficiency as it typ-
ically cut the running time by half. One would expect that it would incur a
degradation in congestion as it discards potentially useful capacity. This turned
out not to be true in our tests. There is no clear winner in our experiments
between h prf and h prf3 or between 3al and 3al2. One possible explanation
is that in all cases the same maximum flow code is running in subroutines, which
would tend to route flow in a similar fashion.

                     i = 2       i = 4       i = 16      i = 64
program name       cong.  time cong.  time cong.  time cong.  time
3al                 2.26     0  2.18     0  1.48     9  1.21   214
3al2                1.82     0  2.72     0  1.46     4  1.20   102
c/h prf103          2.45     0  3.10     0  1.88     5  1.68   140
c/h prfexp10        2.45     0  3.01     0  1.98     4  1.73   117
c3/3al103           1.42     0  2.03     1  1.53    13  1.24   225
c3/3alexp10         1.72     0  2.11     1  1.57     7  1.22   145
c3 2/3al2103        1.50     0  2.40     1  1.70     5  1.32   113
c3 2/3al2exp10      1.61     0  2.17     0  1.59     5  1.27   112
h prf               1.85     0  3.56     0  2.05     6  1.72   154
h prf2              1.85     0  3.56     0  1.94     6  1.75   171
h prf3              2.38     0  3.12     0  1.92     3  1.72    87
Table 5. family rmf depthDem: genrmf -a 10 -b i -c1 64 -c2 128 -k 40 -d 128;
820, 1740, 7260, 29340 edges; 2-unbalanced family; i is the number of frames.

In contrast, the cost heuristic achieved modest improvements with respect to
the theoretical versions of the algorithms. The gains were more tangible when
the cost heuristic was applied to H Partition which solves subproblems in an
incremental fashion using information from the subproblems solved previously.
On instances where 3al or 3al2 achieved a congestion of around 1.7, the gains
were of the order of 5-10%, depending on the actual cost function used. On
instances where H Partition did really well, achieving a congestion close to 1, the
cost heuristic offered no improvement or even worsened the approximation.
Typical values of the cost function coefficients that achieved the best results
were β1 = 10^3, 10^4 for the linear case and β2 = 10 or 25 for the exponential case.
In this abstract we report results for β1 = 10^3 and β2 = 10. The exponential
function performed slightly better than the linear function, with the gap closing
as the network size or the number of commodities increased, in which case the
actual numerical values for costs became less important.

                     i = 1       i = 2       i = 6       i = 12      i = 25      i = 50
program name       cong.  time cong.  time cong.  time cong.  time cong.  time cong.  time
3al                 1.55     1  1.56     1  1.67     5  1.64    11  1.64    27  1.70    64
3al2                1.45     0  1.67     1  1.78     3  1.57     6  1.67    15  1.75    35
c/h prf103          1.80     1  2.11     0  2.22     2  2.15     4  2.33    11  2.33    26
c/h prfexp10        1.92     0  2.11     1  2.11     2  2.15     5  2.33    11  2.33    24
c3/3al103           1.44     1  1.64     2  1.44     7  1.67    14  1.56    30  1.56   130
c3/3alexp10         1.30     1  1.33     2  1.56     7  1.57    14  1.56    34  1.56    78
c3 2/3al2103        1.15     1  1.40     1  1.44     3  1.56     6  1.56    15  1.56    34
c3 2/3al2exp10      1.27     0  1.44     1  1.44     3  1.56     6  1.56    14  1.57    34
h prf               1.92     0  2.11     1  2.11     2  2.33     4  2.22    11  2.33    27
h prf2              1.92     0  2.22     0  2.33     2  2.56     4  2.60    12  2.89    28
h prf3              2.00     0  2.20     1  2.18     1  2.33     1  2.22     3  2.42     9
Table 6. family sat density: satgen -a 1000 -b i -c1 8 -c2 16 -k 10000 -d 8;
9967, 20076, 59977, 120081, 250379, 500828 edges; 22, 61, 138, 281, 682, 1350
commodities; i is the expected percentage of pairs of vertices joined by an edge.

In summary, algorithm H Partition achieved significantly better approximations
than algorithm Partition on the instances we examined. The sparsification
heuristic can drastically reduce the running time. Running the cost
heuristic can provide additional improvement on congestion at the expense of
an increase in running time.

7 Future Work

There are several directions to pursue in future work. The main task would be
to implement the new 2-approximation algorithm of Dinitz et al. [7] and see
how it behaves in practice. Testing on data with real network characteristics is
important if such instances can be made available. Designing new generators
that push the algorithms to meet their theoretical guarantees will enhance our
understanding of the problem. Finally, a problem of minimizing makespan on
parallel machines [27] reduces to single-source unsplittable flow on a two-level
graph with simple topology. It would be interesting to see how the unsplittable
flow algorithms behave on actual instances of this problem.
Acknowledgement. Thanks to Aravind Srinivasan for useful discussions and
for motivating this project. An early version of a G(n, p) generator was written
by Albert Lim. We also wish to thank Michel Goemans for sending us a copy of
[7]. We are grateful to Jeff Westbrook for giving us access to the LA networks.
The first author thanks Peter Blicher and Satish Rao for useful discussions.

References
1. R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network flows: Theory, Algorithms
and Applications. Prentice Hall, 1993.
2. Tamas Badics. genrmf. ftp://dimacs.rutgers.edu/pub/netflow/generators/network/genrmf/,
1991.
3. C. Chekuri, A. V. Goldberg, D. Karger, M. Levine, and C. Stein. Experimental
study of minimum cut algorithms. In Proc. of 8th SODA, 324–333, 1997.
4. B. V. Cherkassky and A. V. Goldberg. An efficient implementation of a scaling
minimum-cost flow algorithm. Journal of Algorithms, 22:1–29, 1997.
5. B. V. Cherkassky and A. V. Goldberg. On implementing the push-relabel method
for the maximum flow problem. Algorithmica, 19:390–410, 1997.
6. M. Deng, D. F. Lynch, S. J. Phillips, and J. R. Westbrook. Algorithms for restora-
tion planning in a telecommunications network. In ALENEX99, 1999.
7. Y. Dinitz, N. Garg, and M. Goemans. On the single-source unsplittable flow
problem. In Proc. of 39th FOCS, 290–299, November 1998.
8. P. Elias, A. Feinstein, and C. E. Shannon. Note on maximum flow through a
network. IRE Transactions on Information Theory, IT-2:117–119, 1956.
9. L. R. Ford and D. R. Fulkerson. Maximal flow through a network. Canad. J.
Math., 8:399–404, 1956.
10. A. Frank. Packing paths, cuts and circuits - a survey. In B. Korte, L. Lovász,
H. J. Prömel, and A. Schrijver, editors, Paths, Flows and VLSI-Layout, 49–100.
Springer-Verlag, Berlin, 1990.
11. N. Garg, V. Vazirani, and M. Yannakakis. Primal-dual approximation algorithms
for integral flow and multicut in trees. Algorithmica, 18:3–20, 1997.
12. A. V. Goldberg, J. D. Oldham, S. A. Plotkin, and C. Stein. An implementation
of a combinatorial approximation algorithm for minimum-cost multicommodity
flow. In Proc. of the 6th Conference on Integer Programming and Combinatorial
Optimization, 338–352, 1998. Published as Lecture Notes in Computer Science
1412, Springer-Verlag.
13. A. V. Goldberg and R. E. Tarjan. A new approach to the maximum flow problem.
Journal of the ACM, 35:921–940, 1988.
14. A.V. Goldberg and S. Rao. Beyond the flow decomposition barrier. In Proc. of
the 38th Annual Symposium on Foundations of Computer Science, 2–11, 1997.
15. D. Goldfarb and M. Grigoriadis. A computational comparison of the Dinic and
Network Simplex methods for maximum flow. Annals of Operations Research,
13:83–123, 1988.
16. D. S. Hochbaum and D. B. Shmoys. Using dual approximation algorithms for
scheduling problems: theoretical and practical results. Journal of the ACM, 34:144–
162, 1987.
17. P. Klein, A. Agrawal, R. Ravi, and S. Rao. Approximation through multicom-
modity flow. In Proc. of the 31st Annual Symposium on Foundations of Computer
Science, 726–737, 1990.
18. P. Klein, S. A. Plotkin, C. Stein, and É. Tardos. Faster approximation algorithms
for the unit capacity concurrent flow problem with applications to routing and
finding sparse cuts. SIAM Journal on Computing, 23(3):466–487, June 1994.
19. J. M. Kleinberg. Approximation algorithms for disjoint paths problems. PhD thesis,
MIT, Cambridge, MA, May 1996.
20. J. M. Kleinberg. Single-source unsplittable flow. In Proc. of the 37th Annual
Symposium on Foundations of Computer Science, 68–77, October 1996.
21. J. M. Kleinberg and É. Tardos. Approximations for the disjoint paths problem in
high-diameter planar networks. In Proc. of the 27th Annual ACM Symposium on
Theory of Computing, 26–35, 1995.
22. J. M. Kleinberg and É. Tardos. Disjoint paths in densely-embedded graphs. In
Proc. of the 36th Annual Symposium on Foundations of Computer Science, 52–61,
1995.
23. S. G. Kolliopoulos. Exact and Approximation Algorithms for Network Flow and
Disjoint-Path Problems. PhD thesis, Dartmouth College, Hanover, NH, August
1998.
24. S. G. Kolliopoulos and C. Stein. Improved approximation algorithms for unsplit-
table flow problems. In Proc. of the 38th Annual Symposium on Foundations of
Computer Science, 426–435, 1997.
25. T. Leighton, F. Makedon, S. Plotkin, C. Stein, É. Tardos, and S. Tragoudas. Fast
approximation algorithms for multicommodity flow problems. Journal of Computer
and System Sciences, 50:228–243, 1995.
26. T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform
multicommodity flow problems with applications to approximation algorithms. In
Proc. of the 29th Annual Symposium on Foundations of Computer Science, 422–
431, 1988.
27. J. K. Lenstra, D. B. Shmoys, and É. Tardos. Approximation algorithms for schedul-
ing unrelated parallel machines. Mathematical Programming, 46:259–271, 1990.
28. T. Leong, P. Shor, and C. Stein. Implementation of a combinatorial multicommod-
ity flow algorithm. In D. S. Johnson and C. C. McGeoch, editors, DIMACS Series
in Discrete Mathematics and Theoretical Computer Science: Volume 12, Network
Flows and Matching. October 1991.
29. Hiroshi Nagamochi, Tadashi Ono, and Toshihide Ibaraki. Implementing an efficient
minimum capacity cut algorithm. Mathematical Programming, 67:325–341, 1994.
30. CPLEX Optimization, Inc. Using the CPLEX callable library. Software Manual,
1995.
31. P. Raghavan and C. D. Thompson. Randomized rounding: a technique for provably
good algorithms and algorithmic proofs. Combinatorica, 7:365–374, 1987.
32. N. Robertson and P. D. Seymour. Outline of a disjoint paths algorithm. In B. Ko-
rte, L. Lovász, H. J. Prömel, and A. Schrijver, editors, Paths, Flows and VLSI-
Layout. Springer-Verlag, Berlin, 1990.
33. A. Schrijver. Homotopic routing methods. In B. Korte, L. Lovász, H. J. Prömel,
and A. Schrijver, editors, Paths, Flows and VLSI-Layout. Springer-Verlag, Berlin,
1990.
34. F. Shahrokhi and D. W. Matula. The maximum concurrent flow problem. Journal
of the ACM, 37:318 – 334, 1990.
Approximation Algorithms for a Directed
Network Design Problem

Vardges Melkonian* and Éva Tardos**

Cornell University, Ithaca, NY 14853


[email protected], [email protected]

Abstract. We present a 2-approximation algorithm for a class of directed
network design problems. The network design problem is to find a
minimum cost subgraph such that for each vertex set S there are at least
f (S) arcs leaving the set S. In the last 10 years general techniques have
been developed for designing approximation algorithms for undirected
network design problems. Recently, Kamal Jain gave a 2-approximation
algorithm for the case when the function f is weakly supermodular. There
has been very little progress made on directed network design problems.
The main techniques used for the undirected problems do not extend to
the directed case.
András Frank has shown that in a special case when the function f
is intersecting supermodular the problem can be solved optimally. In
this paper, we use this result to get a 2-approximation algorithm for
a more general case when f is crossing supermodular. We also extend
Jain’s techniques to directed problems. We prove that if the function f
is crossing supermodular, then any basic solution of the LP relaxation
of our problem contains at least one variable with value greater than or equal
to 1/4.

1 Introduction
We consider the following network design problem for directed networks. Given a
directed graph with nonnegative costs on the arcs find a minimum cost subgraph
where the number of arcs leaving set S is at least f (S) for all subsets S. Formally,
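given a directed graph $G = (V, E)$ and a requirement function $f : 2^V \to \mathbb{Z}$, the
network design problem is the following integer program:
\[
\begin{aligned}
\text{minimize} \quad & \sum_{e \in E} c_e x_e \\
\text{subject to} \quad & \sum_{e \in \delta_G(S)} x_e \ge f(S), && \text{for each } S \subseteq V, \\
& x_e \in \{0, 1\}, && \text{for each } e \in E,
\end{aligned}
\tag{1}
\]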


* Research partially supported by NSF grant CCR-9700163.
** Research partially supported by NSF grant CCR-9700163, and ONR grant N00014-98-1-0589.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 345–360, 1999.
© Springer-Verlag Berlin Heidelberg 1999

where $\delta_G(S)$ denotes the set of arcs leaving $S$. For simplicity of notation we will
use $x(\delta_G(S))$ to denote $\sum_{e \in \delta_G(S)} x_e$.
The following are some special cases of interest. When $f(S) = k$ for all
$\emptyset \ne S \subset V$ the problem is that of finding a minimum cost k-connected subgraph.
The directed Steiner tree problem is to find the minimum cost directed tree
rooted at $r$ that contains a subset of vertices $D \subseteq V$. This problem is a network
design problem where $f(S) = 1$ if $r \notin S$ and $S \cap D \ne \emptyset$, and $f(S) = 0$ otherwise.
Both of these special cases are known to be NP-complete. In fact, the directed
Steiner tree problem contains the Set Cover problem as a special case, and hence
no polynomial time algorithm can achieve an approximation better than O(log n)
unless P=NP [10].
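
To make the two special cases concrete, the following Python sketch encodes both requirement functions and checks a candidate arc set against every cut constraint of (1) by brute force. The helper names (`k_connected_requirement`, `steiner_requirement`, `is_feasible`) and the three-vertex example are illustrative assumptions, and enumerating all subsets is only practical for very small V.

```python
from itertools import combinations

def all_subsets(vertices):
    """Yield every subset of `vertices` as a frozenset (exponentially many)."""
    vs = sorted(vertices)
    for size in range(len(vs) + 1):
        for combo in combinations(vs, size):
            yield frozenset(combo)

def k_connected_requirement(vertices, k):
    """f(S) = k for every nonempty proper subset S, and 0 for the empty set and V."""
    full = frozenset(vertices)
    return lambda S: k if S and S != full else 0

def steiner_requirement(root, terminals):
    """f(S) = 1 if the root is outside S and S contains a terminal, else 0."""
    terminals = frozenset(terminals)
    return lambda S: 1 if root not in S and S & terminals else 0

def arcs_leaving(arcs, S):
    """Number of chosen arcs with tail in S and head outside S."""
    return sum(1 for (u, v) in arcs if u in S and v not in S)

def is_feasible(vertices, arcs, f):
    """Check the cut constraint of (1) for every subset S of the vertices."""
    return all(arcs_leaving(arcs, S) >= f(S) for S in all_subsets(vertices))

V = {0, 1, 2}
cycle = [(0, 1), (1, 2), (2, 0)]
path = [(0, 1), (1, 2)]
to_root = [(2, 1), (1, 0)]

print(is_feasible(V, cycle, k_connected_requirement(V, 1)))    # directed cycle
print(is_feasible(V, path, k_connected_requirement(V, 1)))     # directed path
print(is_feasible(V, to_root, steiner_requirement(0, {2})))    # terminal 2, root 0
```

With this orientation of the constraints (arcs leaving S), the Steiner requirement asks that every terminal can reach the root.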
In the last 10 years there has been significant progress in designing approximation
algorithms for undirected network design problems [1,5,12,4,8], the analog
of this problem where the graph G is undirected. General techniques have been
developed for the undirected case, e.g., primal-dual algorithms [1,5,12,4]. Re-
cently, Kamal Jain gave a 2-approximation algorithm for the undirected case
when the function f is weakly supermodular. The algorithm is based on a new
technique: Jain proved that in any basic solution to the linear programming
relaxation of the problem there is a variable whose value is at least a half.
There has been very little progress made on directed network design problems.
The main general technique used for approximating undirected network design
problems, the primal-dual method, does not extend to the directed case. Charikar
et al. [2] gave the only nontrivial approximation algorithm for the directed Steiner
tree problem. Their method provides an $O(n^{\epsilon})$-approximation algorithm for any
fixed $\epsilon > 0$. Khuller and Vishkin [9] gave a simple 2-approximation algorithm for
the special case of the k-connected subgraph problem in undirected graphs. A
similar method also works for the directed case.
In this paper, we consider the case when f is crossing supermodular, i.e., for
every $A, B \subseteq V$ such that $A \cap B \ne \emptyset$ and $A \cup B \ne V$ we have that

\[ f(A) + f(B) \le f(A \cap B) + f(A \cup B). \tag{2} \]

Note that the k-connected subgraph problem is defined by the function $f(S) = k$
for all $\emptyset \ne S \subset V$, which is crossing supermodular. Hence the network design
problem with a crossing supermodular requirement function f is still NP-hard.
However, the function defining the directed Steiner tree problem is not crossing
supermodular.
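
Both claims can be checked by exhaustive search on a small ground set. The sketch below is only an illustration: the function `crossing_violation` and the four-vertex example with root `r` and terminals `a`, `b` are assumptions made here, not part of the paper.

```python
from itertools import combinations

def nonempty_proper_subsets(vertices):
    """Yield every subset S with S nonempty and S != V, as a frozenset."""
    vs = sorted(vertices)
    for size in range(1, len(vs)):
        for combo in combinations(vs, size):
            yield frozenset(combo)

def crossing_violation(vertices, f):
    """Return a pair (A, B) violating inequality (2), or None if none exists.

    Only pairs with A & B nonempty and A | B != V are tested, so the empty
    set and V itself never need to be considered.
    """
    full = frozenset(vertices)
    sets = list(nonempty_proper_subsets(vertices))
    for A in sets:
        for B in sets:
            if A & B and (A | B) != full:
                if f(A) + f(B) > f(A & B) + f(A | B):
                    return A, B
    return None

V = {"r", "a", "b", "c"}
terminals = frozenset({"a", "b"})

k_connected = lambda S: 2                                   # f(S) = k with k = 2
steiner = lambda S: 1 if "r" not in S and S & terminals else 0

print(crossing_violation(V, k_connected))   # no violating pair
print(crossing_violation(V, steiner))       # some violating pair (A, B)
```

One violating pair for the Steiner function is A = {a, c}, B = {b, c}: both sets have requirement 1, their intersection {c} has requirement 0, and their union {a, b, c} has requirement 1.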
The paper contains two main results: a 2-approximation algorithm for the
integer program, and a description of the basic solutions of the LP-relaxation.
The network design problem with a crossing supermodular requirement func-
tion f is a natural extension of the case of intersecting supermodular function
considered by Frank [11], i.e., when the inequality (2) holds whenever A and B
are intersecting. Frank [11] has shown that this special case can be solved opti-
mally. However, when f is crossing supermodular the problem is NP-hard and
no approximation results are known for this general case. In this paper, we combine
Frank’s result and the k-connected subgraph approximation of Khuller and
Vishkin [9] to obtain a 2-approximation algorithm for the crossing supermodular
case.
In the second part of the paper, we describe the basic solutions of the LP
relaxation of our problem. Extending Jain’s technique [8] to the case of directed
graphs, we show that in any basic feasible solution of the LP relaxation (where
we replace the xe ∈ {0, 1} constraints by 0 ≤ xe ≤ 1 for all arcs e), there is at
least one variable with value at least a quarter. Using this result, we can obtain a
4-approximation algorithm for the problem by iteratively rounding all variables
above a quarter to 1.
An interesting special case of crossing supermodular function arises when
the requirement of a subset S is merely a function of its cardinality |S|, i.e.,
f (S) = g(|S|). It is easy to show that requiring f to be crossing supermodular is
equivalent to requiring g to be a convex function. In particular, this includes
the case when f (S) = k for all sets. For this latter problem, the best known
approximation factor is 2 [9]. Another example of the crossing supermodular
function is f (S) = max(|S|, k). Note that sets S that contain almost all nodes
have a very large value in this function. A more interesting function for the
context of network design would be the function f (S) = min(|S|, k, |V \ S|).
However, the minimum of convex functions is not convex, and this function is not
crossing supermodular. It is a very interesting open problem whether these techniques
extend to the class of weakly supermodular functions that were considered by
Jain in the undirected case. Note that this more general case includes the Steiner
tree problem, and hence we cannot hope for a constant approximation algorithm.
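
The two cardinality-based examples can be tested numerically. The sketch below is an illustration under stated assumptions: the ground set is {0, ..., n-1} with n = 5 and k = 2, convexity of g is checked only on the sizes 1, ..., n-1 that can occur in a crossing pair, and the function names are hypothetical.

```python
from itertools import combinations

def is_convex_on(g, lo, hi):
    """Check g(i-1) + g(i+1) >= 2*g(i) for every integer i with lo < i < hi."""
    return all(g(i - 1) + g(i + 1) >= 2 * g(i) for i in range(lo + 1, hi))

def is_crossing_supermodular_by_size(n, g):
    """Brute-force check of inequality (2) for f(S) = g(|S|) on V = {0, ..., n-1}."""
    full = frozenset(range(n))
    sets = [frozenset(c) for size in range(1, n)
            for c in combinations(range(n), size)]
    return all(
        g(len(A)) + g(len(B)) <= g(len(A & B)) + g(len(A | B))
        for A in sets for B in sets
        if A & B and (A | B) != full
    )

n, k = 5, 2
g_max = lambda s: max(s, k)          # f(S) = max(|S|, k)
g_min = lambda s: min(s, k, n - s)   # f(S) = min(|S|, k, |V \ S|)

for name, g in [("max", g_max), ("min", g_min)]:
    print(name, is_convex_on(g, 1, n - 1), is_crossing_supermodular_by_size(n, g))
```

For the second function the pair A = {0, 1}, B = {1, 2} already fails: the left side of (2) is 2 + 2 while the right side is 1 + 2.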
The rest of the paper is organized as follows. In the next section we give the
2-approximation algorithm. The main theorem stating that all basic solutions to
the linear programming relaxation of (1) have a variable that is at least a quarter
is given in Section 3. In Section 4 we sketch the 4-approximation algorithm that
is based on this theorem. This part of the paper is analogous to Jain’s paper [8]
for the undirected case.

2 The 2-Approximation Algorithm

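We consider the following network design problem:

\[
\begin{aligned}
\text{minimize} \quad & \sum_{e \in E} c_e x_e \\
\text{subject to} \quad & \sum_{e \in \delta_G(S)} x_e \ge f(S), && \text{for each } S \in \rho, \\
& x_e \in \{0, 1\}, && \text{for each } e \in E,
\end{aligned}
\tag{3}
\]

where $\delta_G(S)$ denotes the set of the arcs leaving $S$, and $f(S)$ is a crossing
supermodular function on the set system $\rho \subseteq 2^V$.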

Frank [11] has considered the special case of this problem when f is inter-
secting supermodular. He showed that in this case, the LP relaxation is totally
dual integral (TDI), and gave a direct combinatorial algorithm to solve the prob-
lem. The case of crossing supermodular function is NP-hard (as it contains the
k-connected subgraph problem as a special case). Here we use Frank’s theorem
to get a simple 2-approximation algorithm.
Let r be a vertex of G. Divide all the sets in ρ into two groups as follows:

\[ \rho_1 = \{A \in \rho : r \notin A\}; \tag{4} \]

\[ \rho_2 = \{A \in \rho : r \in A\}. \tag{5} \]
The idea is that using Frank’s technique we can solve the network design
problem separately for ρ1 and ρ2 , and then combine the solutions to get a 2-
approximation for the original problem.

Lemma 1. For the set systems ρ1 and ρ2 , the problem (3) can be solved opti-
mally.

Proof. For any $S \in \rho_1$, define the requirement function as $f_1(S) = f(S)$. Then
the problem for the first collection $\rho_1$ is
\[
\begin{aligned}
\text{minimize} \quad & \sum_{e \in E} c_e x_e \\
\text{subject to} \quad & \sum_{e \in \delta_G(S)} x_e \ge f_1(S), && \text{for each } S \in \rho_1, \\
& x_e \in \{0, 1\}, && \text{for each } e \in E.
\end{aligned}
\tag{6}
\]
Note that $f_1$ is intersecting supermodular on the set family $\rho_1$. For any
$S_1, S_2 \in \rho_1$, we have $r \notin S_1 \cup S_2$, and so $S_1 \cup S_2 \ne V$. This together with
crossing supermodularity of f implies that the requirement function $f_1(S)$ is
intersecting supermodular on $\rho_1$, and hence the LP relaxation of (6) is TDI, and
the integer program can be solved optimally.
We will use a similar idea for $\rho_2$. This is not an intersecting set system. To
be able to apply Schrijver’s theorem we will consider the reverse of the graph G
defined as $G^r = (V, E^r)$ where $E^r = \{i \to j : j \to i \in E\}$. If $e = i \to j \in E$,
then for $e^r = j \to i \in E^r$ define $c_{e^r} = c_e$.
Note that the network design problem with the function $f_2(S) = f(S)$ for
$S \in \rho_2$ is equivalent to the problem on the reverse graph $G^r$ with requirement
function $f_2^r(V \setminus S) = f_2(S)$ for $S \in \rho_2$. Let $\rho_2^r = \{A \subseteq V : V \setminus A \in \rho_2\}$ and
$f_2^r(S) = f(V \setminus S)$ for any $S \in \rho_2^r$. Then the second subproblem is
\[
\begin{aligned}
\text{minimize} \quad & \sum_{e \in E^r} c_e x_e \\
\text{subject to} \quad & \sum_{e \in \delta_{G^r}(S)} x_e \ge f_2^r(S), && \text{for each } S \in \rho_2^r, \\
& x_e \in \{0, 1\}, && \text{for each } e \in E^r.
\end{aligned}
\tag{7}
\]

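Note that $f_2^r$ is intersecting supermodular on the set system $\rho_2^r$. Write
$\bar{S} = V \setminus S$ and suppose $S_1, S_2 \in \rho_2^r$ are such that $S_1 \cap S_2 \ne \emptyset$. Then
$\bar{S}_1 \cup \bar{S}_2 = \overline{S_1 \cap S_2} \ne V$, and we have
that $r \in \bar{S}_1 \cap \bar{S}_2 \ne \emptyset$, so the sets $\bar{S}_1$ and $\bar{S}_2$ are crossing, and from the crossing
supermodularity of f we get
\[
\begin{aligned}
f_2^r(S_1) + f_2^r(S_2) &= f(\bar{S}_1) + f(\bar{S}_2) \le f(\bar{S}_1 \cap \bar{S}_2) + f(\bar{S}_1 \cup \bar{S}_2) \\
&= f(\overline{S_1 \cup S_2}) + f(\overline{S_1 \cap S_2}) = f_2^r(S_1 \cup S_2) + f_2^r(S_1 \cap S_2).
\end{aligned}
\]
That is, $f_2^r(S)$ is intersecting supermodular on $\rho_2^r$, and hence the linear programming
relaxation of (7) is TDI, and the integer program can be solved optimally. □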
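
The graph reversal used in the proof rests on a simple correspondence: an arc leaves S in G exactly when its reverse leaves V \ S in the reverse graph. A minimal Python check of this fact, on a small arc list chosen here purely for illustration, could look as follows.

```python
from itertools import combinations

def reverse_graph(arcs):
    """Reverse every arc i -> j into j -> i."""
    return [(v, u) for (u, v) in arcs]

def arcs_leaving(arcs, S):
    """Arcs with tail in S and head outside S."""
    return [(u, v) for (u, v) in arcs if u in S and v not in S]

V = frozenset(range(4))
E = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]   # hypothetical example graph
E_rev = reverse_graph(E)
r = 0

# rho_2: all proper subsets containing r (here rho is taken to be all proper subsets).
rho2 = [frozenset(c) | {r} for size in range(len(V) - 1)
        for c in combinations(sorted(V - {r}), size)]

correspondence_holds = all(
    sorted(reverse_graph(arcs_leaving(E, S))) == sorted(arcs_leaving(E_rev, V - S))
    for S in rho2
)
complements_avoid_r = all(r not in (V - S) for S in rho2)
print(correspondence_holds, complements_avoid_r)
```

The second check reflects why the complemented system behaves like the first subproblem: none of its sets contains r, so no union of two of them can be all of V.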
We have the following simple algorithm: solve (6) and (7) optimally, and
return the union of the optimal arc sets of the two problems as a solution to (3).

Theorem 1. The algorithm given above returns a solution to (3) which is within
a factor of 2 of the optimal.

Proof. Let $x^*$ be an optimal solution to (3); $\tilde{x}$ and $\hat{x}$ be optimal solutions to (6)
and (7) respectively. Since $x^*$ is a feasible solution for both (6) and (7), we have
\[
2 \sum_{e \in E} c_e x^*_e = \sum_{e \in E} c_e x^*_e + \sum_{e \in E} c_e x^*_e \ge \sum_{e \in E} c_e \tilde{x}_e + \sum_{e \in E} c_e \hat{x}_e. \tag{8}
\]
Every set of $\rho$ belongs to $\rho_1$ or to $\rho_2$, so combining the optimal arc sets of (6)
and (7) we will get a solution for (3)
of cost at most $\sum_{e \in E} c_e \tilde{x}_e + \sum_{e \in E} c_e \hat{x}_e$, which is within a factor of 2 of the
optimal. □
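
The algorithm can be tried out on a toy instance. In the sketch below an exhaustive search stands in for Frank's exact algorithm, which is not implemented here; the second subproblem is solved directly on G over the sets containing r, which is equivalent to (7) by the reversal argument. The instance (three vertices, f(S) = 1 on all nonempty proper subsets) and all function names are assumptions for illustration, and the search assumes each subproblem is feasible.

```python
from itertools import combinations

def nonempty_proper_subsets(vertices):
    vs = sorted(vertices)
    for size in range(1, len(vs)):
        for combo in combinations(vs, size):
            yield frozenset(combo)

def arcs_leaving(arcs, S):
    return sum(1 for (u, v) in arcs if u in S and v not in S)

def is_feasible(arcs, sets, f):
    return all(arcs_leaving(arcs, S) >= f(S) for S in sets)

def cheapest_feasible(cost, sets, f):
    """Exhaustive search over arc subsets; a stand-in for an exact solver."""
    arcs = sorted(cost)
    best, best_cost = None, float("inf")
    for size in range(len(arcs) + 1):
        for chosen in combinations(arcs, size):
            total = sum(cost[a] for a in chosen)
            if total < best_cost and is_feasible(chosen, sets, f):
                best, best_cost = set(chosen), total
    return best, best_cost

def two_approximation(vertices, cost, f, r):
    """Solve the subproblems for rho_1 and rho_2 and return the union of the arc sets."""
    rho = list(nonempty_proper_subsets(vertices))
    rho1 = [S for S in rho if r not in S]
    rho2 = [S for S in rho if r in S]
    x1, _ = cheapest_feasible(cost, rho1, f)
    x2, _ = cheapest_feasible(cost, rho2, f)
    union = x1 | x2
    return union, sum(cost[a] for a in union)

V = {0, 1, 2}
cost = {(0, 1): 1, (1, 2): 1, (2, 0): 1, (1, 0): 2, (2, 1): 2, (0, 2): 2}
f = lambda S: 1
rho = list(nonempty_proper_subsets(V))

solution, solution_cost = two_approximation(V, cost, f, r=0)
_, optimum_cost = cheapest_feasible(cost, rho, f)
print(sorted(solution), solution_cost, optimum_cost)
```

On larger instances the exhaustive search is hopeless; it serves only to confirm feasibility of the union and the factor-2 bound on an example small enough to enumerate.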

3 The Description of the Basic Solutions


Consider the LP relaxation of the main problem (1).
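\[
\begin{aligned}
\text{minimize} \quad & \sum_{e \in E} c_e x_e \\
\text{subject to} \quad & x(\delta_G(S)) \ge f(S), && \text{for each } S \subseteq V, \\
& 0 \le x_e \le 1, && \text{for each } e \in E.
\end{aligned}
\tag{9}
\]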

The second main result of the paper is the following property of the basic
solutions of (9).

Theorem 2. If $f(S)$ is crossing supermodular then in any basic solution of (9),
$x_e \ge 1/4$ for at least one arc e.

The rest of this section gives a proof to this theorem. The outline of the
proof is analogous to Jain’s [8] proof. Consider a basic solution x to the linear
program.

– First we may assume without loss of generality that xe > 0 for all arcs e.
To see this simply delete all the arcs from the graph that have xe = 0. Also
assume xe < 1 for all arcs e; in the opposite case the theorem obviously
holds.
– If x is a basic solution with m non-zero variables, then there must be m
linearly independent inequalities in the problem that are satisfied with equality.
Each inequality corresponds to a set S, and those satisfied with equality
correspond to tight sets, i.e., sets S such that x(δG (S)) = f (S). We use the
well-known structure of tight sets to show that there are m linearly indepen-
dent such equations that correspond to a cross-free family of m tight sets.
(A family of sets is cross-free if for all pairs of sets A and B in the family one
of the four sets A \ B, B \ A, A ∩ B or the complement of A ∪ B is empty.)
– We will use the fact that a cross-free family of sets can be naturally repre-
sented by a forest.
– We will prove the theorem by contradiction. Assume that the statement is
not true, and all variables have value xe < 1/4. We consider any subgraph
of the forest; using induction on the size of the subgraph we show that the
k sets in this family have more than 2k endpoints. Applying this result to
the whole forest we get that the m tight sets of the cross-free family have
more than 2m endpoints. This is a contradiction since m arcs can have only
2m endpoints.

The hard part of the proof is the last step in this outline. While Jain can
establish the same contradiction based on the assumption that all variables have
xe < 1/2, we will need the stronger assumption that xe < 1/4 to reach a con-
tradiction.
We start the proof by discussing the structure of tight sets. Call a set S
tight if x(δG (S)) = f (S). Two sets A and B are called intersecting if A ∩ B,
A \ B, B \ A are all nonempty. A family of sets is laminar if no two sets in it are
intersecting, i.e., every pair of sets is either disjoint or one is contained in the
other.
Two sets are called crossing if they are intersecting and A∪B is not the whole
set V . A family of sets is cross-free if no two sets in the family are crossing. The
key lemma for network design problems with crossing supermodular requirement
function is that crossing tight sets have tight intersection and union, and the rows
corresponding to these four sets in the constraint matrix are linearly dependent.
Let AG (S) denote the row corresponding to the set S in the constraint matrix
of (1).
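
The set-theoretic notions just defined translate directly into predicates on Python frozensets. The sketch below is illustrative only; the function names and the six-element example family are assumptions, chosen so that the family is cross-free without being laminar.

```python
def intersecting(A, B):
    """A and B are intersecting if A & B, A - B and B - A are all nonempty."""
    return bool(A & B) and bool(A - B) and bool(B - A)

def crossing(A, B, universe):
    """A and B are crossing if they are intersecting and A | B is not all of V."""
    return intersecting(A, B) and (A | B) != universe

def is_laminar(family):
    return not any(intersecting(A, B) for A in family for B in family)

def is_cross_free(family, universe):
    return not any(crossing(A, B, universe) for A in family for B in family)

universe = frozenset(range(1, 7))
family = [
    frozenset({1}),
    frozenset({1, 2}),
    frozenset({4, 5, 6}),
    frozenset({1, 2, 3, 4, 6}),
]

print(is_cross_free(family, universe))   # the last two sets cover V, so they do not cross
print(is_laminar(family))                # but they are intersecting
print(crossing(frozenset({1, 2}), frozenset({2, 3}), universe))
```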

Lemma 2. If two crossing sets A and B are tight then their intersection and
union are also tight, and if xe > 0 for all arcs then AG (A) + AG (B) = AG (A ∪
B) + AG (A ∩ B).
Proof. First observe that the function $x(\delta_G(S))$ is submodular, and that equality
holds if and only if there are no positive weight arcs crossing from A to B or
from B to A, i.e., if $x(\delta_G(A, B)) = x(\delta_G(B, A)) = 0$, where $\delta_G(A, B)$ denotes
the arcs that are leaving A and entering B.
Next consider the chain of inequalities, using the submodularity of
$x(\delta_G(\cdot))$, the crossing supermodularity of f and the fact that A and B are tight.
\[
\begin{aligned}
f(A \cup B) + f(A \cap B) &\ge f(A) + f(B) = x(\delta_G(A)) + x(\delta_G(B)) \\
&\ge x(\delta_G(A \cup B)) + x(\delta_G(A \cap B)) \ge f(A \cup B) + f(A \cap B).
\end{aligned}
\]
This implies that both $A \cap B$ and $A \cup B$ are tight and $x(\delta_G(A, B)) =
x(\delta_G(B, A)) = 0$. By our assumption that $x_e > 0$ on all arcs e, this implies
that $\delta_G(A, B)$ and $\delta_G(B, A)$ are both empty, and so $A_G(A) + A_G(B) =
A_G(A \cup B) + A_G(A \cap B)$. □
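
The submodularity used in the proof comes from an exact counting identity: the two sides differ precisely by the weight of the arcs running between A \ B and B \ A. The following sketch verifies that identity on a random integer-weighted complete digraph; reading δ_G(A, B) as the arcs from A \ B to B \ A, the seed, and the weight range are assumptions made here for illustration.

```python
import random
from itertools import combinations

def cut_value(x, S):
    """x(delta(S)): total weight of arcs with tail in S and head outside S."""
    return sum(w for (u, v), w in x.items() if u in S and v not in S)

def between(x, A, B):
    """Total weight of arcs with tail in A minus B and head in B minus A."""
    return sum(w for (u, v), w in x.items() if u in A - B and v in B - A)

random.seed(0)
V = range(5)
x = {(u, v): random.randint(0, 3) for u in V for v in V if u != v}
subsets = [frozenset(c) for size in range(len(V) + 1)
           for c in combinations(V, size)]

identity_holds = all(
    cut_value(x, A) + cut_value(x, B)
    == cut_value(x, A | B) + cut_value(x, A & B)
    + between(x, A, B) + between(x, B, A)
    for A in subsets for B in subsets
)
print(identity_holds)
```

Since the two correction terms are nonnegative, the identity gives the submodular inequality, with equality exactly when no positive-weight arc runs between A \ B and B \ A.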
The main tool for the proof is a cross-free family of |E(G)| linearly indepen-
dent tight sets.
Lemma 3. For any basic solution x that has xe > 0 for all arcs e, there exists
a cross-free family Q of tight subsets of V satisfying the following:
– |Q| = |E(G)|.
– The vectors AG (S), S ∈ Q are independent.
– For every set S ∈ Q, f (S) ≥ 1.
Proof. For any basic solution x that has xe > 0 for all arcs e there must be a
set of |E(G)| tight sets so that the vectors AG (S) corresponding to the sets S
are linearly independent.
Let Q be the family of all tight sets S. We use the technique of [7] to uncross
sets in Q and to create the family of tight sets in the lemma. Note that the rank
of the set of vectors AG (S) is |E(G)|. Suppose there are two crossing tight sets
A and B in the family. We can delete either of A or B from our family and add
both the union and the intersection. This step maintains the properties that
– all the sets in our family are tight,
– the rank of the set of vectors AG (S) for S ∈ Q is |E(G)|.
This is true, as the union and intersection are tight by Lemma 2, and by the
same lemma the vector of the deleted element B can be expressed as a linear
combination of the other three as AG (B) = AG (A ∪ B) + AG (A ∩ B) − AG (A).
In [7] it was shown that a polynomial sequence of such steps will result in a
family whose sets are cross-free. Now we can delete dependent elements to get
the family of sets satisfying the first two properties stated in the lemma.
To see the last property observe that the assumption that xe > 0 for all arcs
e implies that all the tight sets S that have a nonzero vector AG (S) must have
f (S) ≥ 1. □

Following the notation of András Frank we will represent a cross-free family
as a laminar family with two kinds of sets: round and square sets. Take any
element v of V (G). Let S ∈ Q be a round set if $v \notin S$ and S be a square set if
v ∈ S. It is easy to see that by representing all square sets by their complements
we get a laminar family. This laminar family will provide the natural forest
representation for our cross-free family.

Lemma 4. For any cross-free family Q, the round sets R ∈ Q together with
the complements V \ S of the square sets S ∈ Q form a laminar family.

The laminar family as given by Lemma 4 has the following forest representation.
Let $R_1, \ldots, R_k$ be the round sets and $S_1, \ldots, S_l$ be the square
sets of a cross-free family Q. Consider the corresponding laminar family
$L = \{R_1, \ldots, R_k, \bar{S}_1, \ldots, \bar{S}_l\}$. Define the forest F as follows. The node set of F is L,
and there is an arc from U ∈ L to W ∈ L if W is the smallest subset in L containing
U . See the figure below for an example of a laminar family L obtained
from a cross-free family Q and the tree representation of L.

Fig. 1. A laminar family with tree representation
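
The construction of L and of the forest F can be sketched in a few lines of Python. The example family, the choice v = 6, and the function names are illustrative assumptions; the sketch also assumes that the sets of L are pairwise distinct, so that they can serve as dictionary keys.

```python
def to_laminar(family, universe, v):
    """Keep round sets (v not in S); replace square sets (v in S) by complements."""
    return [S if v not in S else universe - S for S in family]

def forest_parents(laminar):
    """Map each set U to the smallest set of the family strictly containing U.

    Roots of the forest are mapped to None.
    """
    parents = {}
    for U in laminar:
        supersets = [W for W in laminar if U < W]
        parents[U] = min(supersets, key=len) if supersets else None
    return parents

universe = frozenset(range(1, 7))
Q = [
    frozenset({1}),               # round
    frozenset({1, 2}),            # round
    frozenset({4, 5, 6}),         # square, complement {1, 2, 3}
    frozenset({1, 2, 3, 4, 6}),   # square, complement {5}
]

L = to_laminar(Q, universe, v=6)
parents = forest_parents(L)
for U in L:
    print(sorted(U), "->", sorted(parents[U]) if parents[U] else None)
```

In a laminar family the sets strictly containing a given set form a chain, so the smallest one is well defined and `min(..., key=len)` picks it.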

Now let’s give some intuition for our main proof. Lemma 3 says that |Q| =
|E(G)|, that is,

There are exactly twice as many arc endpoints of G as subsets of Q. (10)

The idea of the proof comes from the following observation. Assuming that
the statement of Theorem 2 is not true, that is, xe < 1/4 for every e ∈ E(G),
distribute the endpoints such that each subset of Q gets at least 2 endpoints
and some subsets get more than 2 endpoints. This will contradict (10). How do we
find this kind of distribution? The concept of incidence defined below gives the
necessary connection between endpoints and subsets.
We say that an arc e = i → j of G = (V, E) leaves a subset U ∈ Q if i ∈ U
and j ∈ Ū. Consider an arc e = i → j of G = (V, E). We will define which sets
the head and the tail of this arc are incident to. If a node of the graph is the head or
the tail of many arcs, then each head and tail at the same node may be incident
to different sets. If U ∈ L is the smallest round subset of L such that e is leaving
U then the tail of e at i is called an endpoint incident to U ; when no round
subset satisfies the above condition then the tail of e at i is called incident to
W ∈ L if W is the smallest square subset containing both i and j. Similarly, if
U ∈ L is the smallest square subset of L such that e is leaving Ū then the head
of e at j is called an endpoint incident to U ; when no square subset satisfies
the above condition then the head of the arc e at j is called incident to W ∈ L
if W is the smallest round subset containing both i and j. For example, in the
following picture, the tail at i is incident to S, the head at j is incident to R, the
head at w is incident to P , and the tail at u is not incident to any of the given
subsets.

Fig. 2. Incidence between endpoints and subsets

Note that by definition each arc endpoint of G is incident to at most one
subset of Q. Thus, in our distribution each subset can naturally get those endpoints
which are incident to it. If a subset gets more than 2 endpoints, we will
reallocate the “surplus” endpoints to some other subset “lacking” endpoints so
that the minimum “quota” of 2 is provided for every subset.
How do we do the reallocation? Here we can use the directed forest F defined
above. That is, start allocating endpoints to the subsets which correspond to
the leaves of the forest. If a subset has surplus endpoints, reallocate them to
its parent; call this process “pumping” the surplus endpoints up the tree. If
at the end of the “pumping up” process every non-root node gets 2 endpoints
and at least one root node gets more than 2 endpoints, then we get a
contradiction to (10).
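
As a rough illustration of the bookkeeping only (not of the actual proof, which needs the refined tree structures introduced next), the pumping process can be sketched in Python. The forest is given by parent pointers, each node starts with the number of endpoints incident to it, and every node keeps a quota of 2 and passes the rest upward. The function name `pump_up`, the node labels, and the endpoint counts are all hypothetical.

```python
def pump_up(parents, incident, quota=2):
    """Pump surplus endpoints from the leaves toward the roots.

    parents:  dict mapping each node to its parent, or None for a root.
    incident: dict mapping each node to the number of endpoints incident to it.
    Returns the final allocation, or None if some node cannot meet the quota.
    """
    def depth(node):
        d = 0
        while parents[node] is not None:
            node = parents[node]
            d += 1
        return d

    received = dict(incident)
    # Deeper nodes are processed first, so children are settled before parents.
    for node in sorted(parents, key=depth, reverse=True):
        if received[node] < quota:
            return None
        parent = parents[node]
        if parent is not None:
            received[parent] += received[node] - quota
            received[node] = quota
    return received

parents = {"a": "c", "b": "c", "c": None}

balanced = pump_up(parents, {"a": 3, "b": 2, "c": 1})
surplus = pump_up(parents, {"a": 5, "b": 5, "c": 0})
starved = pump_up(parents, {"a": 1, "b": 5, "c": 0})
print(balanced, surplus, starved)
```

In the second call the root ends with 6 endpoints while both children hold 2. If the forest came from the cross-free family of Lemma 3, such an outcome would mean more than twice as many endpoints as sets, which is the contradiction with (10) that the proof aims for.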
Obviously, the mathematical technique to accomplish this process of pumping
up is induction. That is, achieve the above allocation first for smaller subtrees
and using that achieve the allocation for larger subtrees. Ideally, we would like
to prove that for any rooted subtree it is possible to allocate the endpoints such
that each node gets 2 endpoints and the root gets at least some specified number
k which is strictly greater than 2. Unfortunately, this kind of simple statement
doesn’t allow us to apply the induction successfully. The point is that the roots
of some tree structures can always get more than k endpoints; and if we use this
loose estimate for those “surplus-rich” nodes then this might cause some of their
ancestors to lack even the minimum quota of 2. That is why we need to define
some tree structures and for each of them state a different k as the minimum
number of endpoints that the root of that particular subtree can get.
Next we define the necessary tree structures. See the figures for examples.

Fig. 3. Tree structures: uniform-chain, chain-merger, chameleon

A chain is a subtree of nodes, all of which are unary except maybe the tail
(the lowest node in the chain). Note that a node of any arity is itself a chain.
A chain with a non-unary tail is called the survival chain of its head (the
highest node in the chain). So if a node is not unary then its survival chain is
the node itself.
A chain of nodes having the same shape and requirement 1 is called a uniform-
chain.
A chain-merger is a union of a round-shape uniform-chain and a square-
shape uniform-chain such that the head of one of the chains is a child of a node
of the other chain.
A subtree is called chameleon if
– the root has one child;
– the survival chain of the root consists of nodes of both shapes;
– the tail of the survival chain has two children.
The rest of this section is the following main lemma which completes the
proof of the theorem.

Lemma 5. Suppose every arc takes value strictly less than 1/4. Then for any
rooted subtree of the forest, we can distribute the endpoints contained in it such
that every vertex gets at least 2 endpoints and the root gets at least
– 5 if the subtree is a uniform-chain;
– 6 if the subtree is a chameleon;
– 7 if the subtree is a chain-merger;
– 10 if the root has at least 3 children;
– 8 otherwise.

Proof. By induction on the height of the subtree.

First consider a leaf R. If the requirement of R is 1 then it is a uniform-chain
and needs at least 5 endpoints allocated to it (hereafter, this allocated number
will be called the label). On the other hand, since all the variables have values less
than 1/4, there are at least 5 arcs leaving R. Since R has no children, this
implies that there are at least 5 endpoints incident to it, so R gets label at least
5. If the requirement of R is greater than 1 then the same argument shows that
it can get label at least 10.
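
For the leaf with requirement 1, the counting step can be written out as follows. This is a sketch that assumes only the constraint x(δG (R)) ≥ f (R) of the linear program and the hypothesis that every arc value is below 1/4:

\[
1 = f(R) \le x(\delta_G(R)) = \sum_{e \in \delta_G(R)} x_e < \tfrac{1}{4}\,|\delta_G(R)|
\quad\Longrightarrow\quad |\delta_G(R)| > 4, \text{ i.e. } |\delta_G(R)| \ge 5 .
\]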
For the subtrees having more than one node let’s consider cases dependent
on the number of children of the root R. Hereafter, without loss of generality we
will assume that R is a round node; the other case is symmetric.

Case 1: If R has at least four children, by induction it gets label at least 4 · 3 = 12
since each child has excess of at least 3.

Case 2: Suppose R has three children: S1 , S2 , S3 . If at least one of its children
has label ≥ 6 then R will get at least 3 + 3 + 4 = 10. Consider the case when
all three children have labels 5, i.e., by induction the subtrees of all three are
uniform-chains. Let’s show that in this case R has an endpoint incident to it
(and therefore gets label ≥ 3 + 3 + 3 + 1 = 10). Consider cases dependent on the
shapes of the children.

1. If the subtrees of S1 , S2 , S3 are round-shaped uniform-chains then an arc
leaving one of the Si ’s has its head in R. Otherwise, we would have AG (R) =
AG (S1 )+AG (S2 )+AG (S3 ) which contradicts the independence of the vectors
AG (R), AG (S1 ), AG (S2 ), AG (S3 ).
2. Suppose S1 and S2 are round-shaped and S3 is square-shaped. Unlike the
previous case, here some arcs which start from S1 and S2 might enter S3
without creating endpoints for R.

Fig. 4. Case 2.2

If R has no endpoints incident to it then f (R) < f (S1 ) + f (S2 ) = 1 + 1 = 2
and hence we must have f (R) = 1. Thus, the sum of the values of the arcs
which leave S1 and S2 and enter S3 is equal to f (S1 ) + f (S2 ) − f (R) = 1.
Hence, the requirement of 1 of S̄3 is completely satisfied by arcs of this kind.
As a result, AG (R) = AG (S1 ) + AG (S2 ) − AG (S̄3 ), which contradicts the
independence.
3. Suppose S1 is round-shaped and S2 , S3 are square-shaped. If there is no
endpoint incident to R then all the arcs leaving S1 should also leave R to
satisfy the requirement of R which can be only 1; but this means AG (R) =
AG (S1 ) contradicting the independence.
4. If S1 , S2 , S3 are square-shaped then R should have endpoints to satisfy its
positive requirement.

Case 3: Suppose R has two children, S1 and S2 . Then R needs label 7 if its
subtree is a chain-merger and label 8 otherwise. Let’s consider cases in that
order.

1. The subtree of R is a chain-merger if and only if the subtrees of S1 and S2
are uniform-chains of different shapes. In this case, by the argument used in
case 2.3, R has an endpoint incident to it, so it gets label at least 3+3+1=7.
2. If none of the subtrees of S1 and S2 is a uniform-chain then R gets at least
4+4=8. R gets at least 8=5+3 also in the case when at least one of its
children has label ≥ 7.
3. The remaining hard case is when the subtree of S1 is a uniform-chain with
label 5 and the subtree of S2 is a chameleon with label 6. Let’s show that R
has an endpoint incident to it and so gets label at least 3+4+1=8. Assume
that R doesn’t have endpoints. Consider 2 cases:
(a) Suppose S1 is round-shaped. Let T be the highest round-shaped node
in the survival chain of S2 (can be S2 itself). If an arc leaving T doesn’t
leave R then its head is incident to R; so all the arcs leaving T leave
also R. Since f (S1 ) = 1, f (R) ∈ Z and there is no endpoint incident
to R then either f (R) = f (T ) + 1 or f (R) = f (T ). In the first case,
all the arcs leaving S1 should leave also R (see figure), and AG (R) =
AG (S1 ) + AG (T ); in the second case no arc leaves both S1 and R, and
AG (R) = AG (T ); so the independence is violated in both cases.

Fig. 5. Two children cases: case (a) when f(R)=f(T)+1, and case (b)


(b) Suppose S1 is square-shaped. Let T be the highest round-shaped node in
the survival chain of S2 . There is an arc leaving both T and S̄1 , otherwise
we would have AG (R) = AG (T ). In that case, all the arcs leaving S̄1
should leave also T because f (S̄1 ) = 1 and f (R) ∈ Z (see figure). Thus,
AG (T ) = AG (S̄1 ) + AG (R) contradicting the independence.

Case 4: Suppose R has one child, S. Consider cases dependent on the structure
of the subtrees of R and S.

1. Suppose the subtree of R is a round-shaped uniform-chain, and hence R
needs label 5. Then the subtree of S is also a uniform-chain. The independence
of AG (R) and AG (S) (along with the integral requirement of R) implies
that R should have at least 2 endpoints incident to it (see figure); thus, R
gets label ≥ 3 + 2 = 5.

Fig. 6. One child cases: case 4.1 and case 4.2(b)

2. Suppose the subtree of R is a chameleon, and hence R needs label 6. Consider
3 cases.
(a) The subtree of S is a chameleon. Then the survival chain of S contains
a round node T . The independence of AG (R) and AG (T ) implies that
R should have at least 2 endpoints incident to it; thus, R gets label
≥ 4 + 2 = 6.
(b) The subtree τ of S is a chain-merger. Then τ has a round node T such
that every other round node of τ is contained in T . The independence
of AG (R) and AG (T ) implies that R should have at least 1 endpoint
incident to it (see figure); thus, R gets label ≥ 5 + 1 = 6.
(c) In all other cases, S has label at least 8, so R gets at least 6.
3. Suppose the subtree of R is a chain-merger, and hence R needs label 7.
Consider 2 cases.
(a) The subtree of S is a chain-merger itself. Then R and S are round-shaped,
and the independence of AG (R) and AG (S) implies that R should have
at least 2 endpoints incident to it. Thus, R gets label ≥ 5 + 2 = 7.
(b) The subtree of S is a square-shaped uniform-chain. Since R contains only
square-shaped subsets, it should have at least 5 endpoints to satisfy its
requirement; thus, R gets ≥ 3 + 5 = 8.
4. Suppose the subtree of R is none of the 3 structures considered above, and
hence R needs label 8. Then the subtree of S can’t be a uniform-chain or
chameleon, and has label at least 7. Consider 2 cases.
(a) The survival chain of S contains a round-shaped subset T . Then the
independence of AG (R) and AG (T ) implies that R should have at least
2 endpoints incident to it. Thus, R gets label ≥ 7 + 2 = 9.
(b) All the nodes in the survival chain of S are square-shaped. Then its tail T
has at least 3 children and label ≥ 10. Based on the same independence
argument, any node in the survival-chain, including S, also gets label
≥ 10. Thus, R gets at least 8.
□

4 The 4-Approximation Algorithm via Iterative Rounding

The idea of the algorithm is to iteratively round the solutions of the linear
programs derived from the main IP formulation as described below.
Based on Theorem 2, if we include the arcs with value at least a quarter in
the solution of the main integer program (1) then the factor that we lose in the
cost is at most 4. These arcs might not form a feasible solution for (1) yet. We
reduce the LP to reflect the set of edges included in the solution, and apply the
method recursively.
Formally, let x∗ be an optimal basic solution of (9) and E1/4+ be the set of
the arcs which take value at least 1/4 in x∗ . Let Eres = E − E1/4+ . Consider
the residual graph Gres = (V, Eres ) and the corresponding residual LP:
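\[
\begin{aligned}
\text{minimize} \quad & \sum_{e \in E_{res}} c_e x_e \\
\text{subject to} \quad & x(\delta_{G_{res}}(S)) \ge f(S) - |E_{1/4+} \cap \delta_G(S)| && \text{for each } S \subseteq V, \\
& 0 \le x_e \le 1 && \text{for each } e \in E_{res}.
\end{aligned}
\tag{11}
\]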

This residual program has the same structure as (9); the difference is that the
graph G and the requirements f (S) are reduced respectively to Gres and f (S) −
|E1/4+ ∩ δG (S)| considering that the arcs from E1/4+ are already included in the
integral solution. Theorem 2 can be applied to (11) if the reduced requirement
function is crossing supermodular which is shown next.
A function f : 2^V → Z is called submodular if −f is supermodular, i.e., if for
all A, B ⊆ V , f (A)+f (B) ≥ f (A∩B)+f (A∪B). The functions |δG (.)| and, more
generally, x(δG (.)) for any nonnegative vector x are the most classical examples of
submodular functions. The requirement function in the residual problem is the
difference of a crossing supermodular function f and such a submodular function,
so it is also crossing supermodular.
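
As a small illustration of the reduced requirement, the following Python sketch evaluates f (S) − |E1/4+ ∩ δG (S)| for a given set. It assumes that δG (S) denotes the arcs leaving S (tail in S, head outside); the helper names and the constant requirement function are hypothetical.

```python
def delta_out(arcs, S):
    """Arcs (u, v) that leave S: tail inside S, head outside."""
    return [(u, v) for (u, v) in arcs if u in S and v not in S]


def residual_requirement(f, S, picked_arcs):
    """f(S) minus the number of already-picked arcs leaving S."""
    return f(S) - len(delta_out(picked_arcs, S))


f = lambda S: 2                      # hypothetical requirement function
picked = [(1, 2), (2, 3), (3, 1)]    # stands in for E_{1/4+}
print(residual_requirement(f, {1}, picked))     # one picked arc leaves {1}
print(residual_requirement(f, {1, 2}, picked))  # only (2, 3) leaves {1, 2}
```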

Theorem 3. Let G̃ = (V, Ẽ) be a subgraph of the directed graph G = (V, E). If
f : 2^V → Z is a crossing supermodular function then f (S) − |δG̃ (S)| is also a
crossing supermodular function.

Summarizing, we have the following high-level description of our algorithm:


– Find an optimal basic solution to LP (9).
– Include all the arcs with values 1/4 or more in the solution of (1).
– Delete all the arcs included in the solution from the graph, and solve the
residual problem recursively.
The algorithm terminates when there are no positive requirements left.
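
The control flow of this description can be sketched as follows. The sketch is not a full implementation: `solve_lp` and `requirements_met` are hypothetical callbacks standing in for an LP solver that returns a basic optimal solution of the residual program and for the test that no positive requirement is left, and the toy values below are invented, not actual LP solutions.

```python
def iterative_rounding(arcs, solve_lp, requirements_met, threshold=0.25):
    """Repeatedly solve the residual LP and keep arcs with value >= threshold."""
    chosen = set()
    remaining = set(arcs)
    while not requirements_met(chosen):
        x = solve_lp(remaining, chosen)  # maps each remaining arc to its value
        picked = {e for e in remaining if x.get(e, 0.0) >= threshold}
        if not picked:
            raise RuntimeError("no arc reached the threshold")
        chosen |= picked
        remaining -= picked
    return chosen


def toy_solve_lp(remaining, chosen):
    # Invented values for illustration only.
    values = {"a": 0.5, "b": 0.1, "c": 0.1} if not chosen else {"b": 0.3, "c": 0.0}
    return {e: values.get(e, 0.0) for e in remaining}


result = iterative_rounding(["a", "b", "c"], toy_solve_lp,
                            lambda chosen: len(chosen) >= 2)
print(sorted(result))
```

The guard that raises when no arc reaches the threshold is a defensive choice of this sketch; for a basic optimal solution, Theorem 2 is what guarantees that some arc has value at least 1/4.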
The method requires solving the linear program (9). Note that (9) has a
constraint for each subset S. Using the ellipsoid method Grötschel, Lovász and
Schrijver [6] proved that such a linear program can be solved in polynomial time
if there is a polynomial time separation subroutine, i.e., a method that for a given
vector x can find a subset S such that x(δG (S)) < f (S) if such a set exists. Note
that a violated set exists if and only if the minimum minS (x(δG (S)) − f (S)) is
negative. The function x(δG (S))−f (S) is crossing submodular. Grötschel, Lovász
and Schrijver [6] designed a polynomial time algorithm to minimize submodular
functions. Note that a crossing submodular function is (fully) submodular if
restricted to the sets {S : r ∈ S, v ∉ S} for any nodes r ≠ v. We can obtain the
minimum of a crossing submodular function by selecting a node r and computing
the minimum separately over the sets {S : r ∈ S, v ∉ S} and {S : r ∉ S, v ∈ S}
for all nodes v ≠ r.
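
To make the role of the separation subroutine concrete, here is a brute-force stand-in in Python. It enumerates all proper nonempty subsets, so it takes exponential time and is not the polynomial method referred to above; it only shows what the subroutine must return. It assumes that δG (S) denotes the arcs leaving S, and the function name and the example data are hypothetical.

```python
from itertools import combinations


def find_violated_set(nodes, x, f, tol=1e-9):
    """Return a set S minimizing x(delta(S)) - f(S) if that minimum is negative.

    x maps arcs (u, v) to values; returns None when no constraint is violated.
    """
    nodes = list(nodes)
    best, best_val = None, -tol
    for k in range(1, len(nodes)):
        for subset in combinations(nodes, k):
            S = frozenset(subset)
            leaving = sum(val for (u, v), val in x.items()
                          if u in S and v not in S)
            if leaving - f(S) < best_val:
                best, best_val = S, leaving - f(S)
    return best


x = {(1, 2): 0.5, (2, 3): 1.0, (3, 1): 1.0}
violated = find_violated_set([1, 2, 3], x, lambda S: 1)
print(violated)  # the arcs leaving {1} only carry value 0.5
```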

Theorem 4. The iterative algorithm given above returns a solution for (1)
which is within a factor of 4 of the optimal.

Proof. We prove by induction that given a basic solution x∗ to (9) the method
finds a feasible solution to the integer program (1) of cost at most 4cx∗ . Consider
an iteration of the method. We add the arcs E1/4+ to the integer solution. The
cost of these arcs is at most 4 times the corresponding part of the fractional
solution x∗ , i.e., $c(E_{1/4+}) \le 4 \sum_{e \in E_{1/4+}} c_e x^*_e$. A feasible solution to the residual
problem can be obtained by projecting the current solution x∗ to the residual
arcs. Using purification we obtain a basic solution to the residual linear program
of cost at most the cost of x∗ restricted to the arcs Eres . We recursively apply
the method to find an integer solution to the residual problem of cost at most 4
times this cost, i.e., at most $4 \sum_{e \in E_{res}} c_e x^*_e$. This proves the theorem. □
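
Putting the two bounds of the proof together gives the claimed factor. The chain below is a sketch that assumes nonnegative costs and uses the fact that E is the disjoint union of E1/4+ and Eres :

\[
c(E_{1/4+}) + 4 \sum_{e \in E_{res}} c_e x^*_e
\;\le\; 4 \sum_{e \in E_{1/4+}} c_e x^*_e + 4 \sum_{e \in E_{res}} c_e x^*_e
\;=\; 4 \sum_{e \in E} c_e x^*_e \;=\; 4\,c x^* .
\]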
The algorithm stated above assumes that we solve a linear program in every
iteration. However, as the proof shows, it suffices to solve the linear program
(9) once, and use purification to obtain a basic solution to the residual problems
in subsequent iterations.

References
1. A. Agrawal, P. Klein, and R. Ravi. When trees collide: An approximation algorithm
for generalized Steiner tree problems on networks. In Proceedings of the 23rd ACM
Symposium on Theory of Computing, pages 134–144, 1991.
2. M. Charikar, C. Chekuri, T. Cheung, Z. Dai, A. Goel, S. Guha, and M. Li.
Approximation algorithms for directed Steiner problems. In Proceedings of the 9th
Annual Symposium on Discrete Algorithms, pages 192–200, 1998.
3. J. Edmonds. Edge-disjoint branchings. In R. Rustin, editor, Combinatorial
Algorithms, pages 91–96. Academic Press, New York, 1973.
4. M. X. Goemans, A. Goldberg, S. Plotkin, D. Shmoys, E. Tardos, and D. P.
Williamson. Approximation algorithms for network design problems. In
Proceedings of the 5th Annual Symposium on Discrete Algorithms, pages 223–232,
1994.
5. M. X. Goemans and D. P. Williamson. A general approximation technique for
constrained forest problem. SIAM Journal on Computing, 24:296–317, 1995.
6. M. Grötschel, L. Lovász, and A. Schrijver. Geometric algorithms and combinatorial
optimization. Springer-Verlag, 1988.
7. C. A. Hurkens, L. Lovász, A. Schrijver, and E. Tardos. How to tidy up your
set-system? In Proceedings of Colloquia Mathematica Societatis János Bolyai 52.
Combinatorics, pages 309–314, 1987.
8. K. Jain. A factor 2 approximation algorithm for the generalized Steiner network
problem. In Proceedings of the 39th Annual Symposium on the Foundation of
Computer Science, pages 448–457, 1998.
9. S. Khuller and U. Vishkin. Biconnectivity approximations and graph carvings.
Journal of the Association of Computing Machinery, 41:214–235, 1994.
10. R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-
constant error-probability PCP characterization of NP. In Proceedings of the 29th
Annual ACM Symposium on the Theory of Computing, pages 475–484, 1997.
11. A. Frank. Kernel systems of directed graphs. Acta Sci. Math (Szeged), 41:63–76,
1979.
12. D. P. Williamson, M. X. Goemans, M. Mihail, and V. V. Vazirani. A primal-dual
approximation algorithm for generalized Steiner network problems. Combinatorica,
15:435–454, 1995.
Optimizing over All Combinatorial Embeddings
of a Planar Graph
(Extended Abstract)

Petra Mutzel and René Weiskircher

Max–Planck–Institut für Informatik, Saarbrücken


[email protected], [email protected]

Abstract. We study the problem of optimizing over the set of all com-
binatorial embeddings of a given planar graph. Our objective function
prefers certain cycles of G as face cycles in the embedding. The motiva-
tion for studying this problem arises in graph drawing, where the chosen
embedding has an important influence on the aesthetics of the drawing.
We characterize the set of all possible embeddings of a given biconnected
planar graph G by means of a system of linear inequalities with {0, 1}-
variables corresponding to the set of those cycles in G which can appear
in a combinatorial embedding. This system of linear inequalities can be
constructed recursively using SPQR-trees and a new splitting operation.
Our computational results on two benchmark sets of graphs are surpris-
ing: The number of variables and constraints seems to grow only linearly
with the size of the graphs although the number of embeddings grows
exponentially. For all tested graphs (up to 500 vertices) and linear objec-
tive functions, the resulting integer linear programs could be generated
within 10 minutes and solved within two seconds on a Sun Enterprise
10000 using CPLEX.

1 Introduction
A graph is called planar when it admits a drawing into the plane without edge-
crossings. There are infinitely many different drawings for every planar graph,
but they can be divided into a finite number of equivalence classes. We call two
planar drawings of the same graph equivalent when the sequence of the edges in
clockwise order around each node is the same in both drawings. The equivalence
classes of planar drawings are called combinatorial embeddings. A combinatorial
embedding also defines the set of cycles in the graph that bound faces in a planar
drawing.
The complexity of embedding planar graphs has been studied by various
authors in the literature [4, 3, 5]. E.g., Bienstock and Monma have given poly-
nomial time algorithms for computing an embedding of a planar graph that
minimizes various distance functions to the outer face [4]. Moreover, they have
shown that computing an embedding that minimizes the diameter of the dual
graph is NP-hard.

Petra Mutzel: partially supported by DFG-Grant Mu 1129/3-1, Forschungsschwerpunkt “Effiziente
Algorithmen für diskrete Probleme und ihre Anwendungen”
René Weiskircher: supported by the Graduiertenkolleg “Effizienz und Komplexität von Algorithmen
und Rechenanlagen”

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 361–376, 1999.
© Springer-Verlag Berlin Heidelberg 1999

In this paper we deal with the following optimization problem concerned with
embeddings: Given a planar graph and a cost function on the cycles of the graph,
find an embedding Π such that the sum of the costs of the cycles that appear as
face cycles in Π is minimized. When choosing the cost 1 for all cycles of length
greater than or equal to five and 0 for all other cycles, the problem is NP-hard [13].
Our motivation to study this optimization problem and in particular its inte-
ger linear programming formulation arises in graph drawing. Most algorithms for
drawing planar graphs need not only the graph as input but also a combinatorial
embedding. The aesthetic properties of the drawing often change dramatically
when a different embedding is chosen.

Fig. 1. The impact of the chosen planar embedding on the drawing: panels (a) and (b)

Figure 1 shows two different drawings of the same graph that were generated
using the bend minimization algorithm by Tamassia [12]. The algorithm used
different combinatorial embeddings as input. Drawing 1(a) has 13 bends while
drawing 1(b) has only 7 bends. It makes sense to look for the embedding that
will produce the best drawing. Our original motivation has been the following.
In graph drawing it is often desirable to optimize some cost function over all
possible embeddings in a planar graph. In general these optimization problems
are NP-hard [9]. For example: The number of bends in an orthogonal planar
drawing highly depends on the chosen planar embedding. In the planarization
method, the number of crossings highly depends on the chosen embedding when
the deleted edges are reinserted into a planar drawing of the rest-graph. Both
problems can be formulated as flow problems in the geometric dual graph. A
flow between vertices in the geometric dual graph corresponds to a flow between
adjacent face cycles in the primal graph. Once we have characterized the set of all
feasible embeddings (via an integer linear formulation on the variables associated
with each cycle), we can use this in an ILP-formulation for the corresponding flow
problem. Here, the variables consist of ‘flow variables’ and ‘embedding variables’.
This paper introduces an integer linear program whose set of feasible solu-
tions corresponds to the set of all possible combinatorial embeddings of a given
biconnected planar graph. One way of constructing such an integer linear pro-
gram is by using the fact that every combinatorial embedding corresponds to a
2-fold complete set of cycles (see MacLane [11]). The variables in such a program
are all simple cycles in the graph; the constraints guarantee that the chosen sub-
set of all simple cycles is complete and that no edge of the graph appears in
more than two simple cycles of the subset.
We have chosen another way of formulating the problem. The advantage
of our formulation is that we only introduce variables for those simple cycles
that form the boundary of a face in at least one combinatorial embedding of
the graph, thus reducing the number of variables tremendously. Furthermore,
the constraints are derived using the structure of the graph. We achieve this
by constructing the program recursively using a data structure called SPQR-
tree suggested by Di Battista and Tamassia ([1]) for the on-line maintenance
of triconnected components. The static variant of this problem was studied in
[10]. SPQR-trees can be used to code and enumerate all possible combinatorial
embeddings of a biconnected planar graph. Furthermore we introduce a new
splitting operation which enables us to construct the linear description recur-
sively.
Our computational results on two benchmark sets of graphs have been quite
surprising. We expected that the size of the linear system would grow exponentially
with the size of the graph. Surprisingly, we could only observe a linear growth.
However, the time for generating the system grows sub-exponentially; but for
practical instances it is still reasonable. For a graph with 500 vertices and 10^19
different combinatorial embeddings the construction of the ILP took about 10
minutes. Very surprising was the fact that the solution of the generated ILPs
took only up to 2 seconds using CPLEX.
Section 2 gives a brief description of the data structure SPQR-tree. In Sec-
tion 3 we describe the recursive construction of the linear constraint system using
a new splitting operation. Our computational results are described in Section 4.

2 SPQR-Trees

In this section, we give a brief description of the SPQR-tree data structure for
biconnected planar graphs. A cut vertex of a graph G = (V, E) is a vertex whose
removal increases the number of connected components. A connected graph that
has no cut vertex is called biconnected. A set of two vertices whose removal
increases the number of connected components is called a separation pair; a
connected graph without a separation pair is called triconnected.
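
These definitions can be checked directly on small graphs. The following Python sketch is a brute-force illustration, not an efficient algorithm: it removes every vertex set of the given size and compares component counts. The function names are ours.

```python
from itertools import combinations


def count_components(nodes, edges):
    """Number of connected components of the graph induced on `nodes`."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


def separators(nodes, edges, size):
    """Vertex sets of the given size whose removal increases the component count."""
    base = count_components(nodes, edges)
    return [set(c) for c in combinations(sorted(nodes), size)
            if count_components(set(nodes) - set(c), edges) > base]


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(separators(range(4), cycle, 1))  # cut vertices of the 4-cycle
print(separators(range(4), cycle, 2))  # separation pairs of the 4-cycle
```

With `size=1` the function lists cut vertices, with `size=2` separation pairs; the 4-cycle is biconnected but not triconnected.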
SPQR-trees have been suggested by Di Battista and Tamassia ([1]). They
represent a decomposition of a planar biconnected graph according to its split
pairs. A split pair is a pair of nodes in the graph that is either connected by
an edge or has the property that its removal increases the number of connected
components. The split components of a split pair p are the maximal subgraphs of
the original graph, for which p is not a split pair. When a split pair p is connected
by an edge, one of the split components consists just of this edge together with
the incident nodes while the other one is the original graph without the edge.
The construction of the SPQR-tree works recursively. At every node v of the
tree, we split the graph into smaller edge-disjoint subgraphs. We add an edge to
each of them to make sure that they are biconnected and continue by computing
their SPQR-tree and making the resulting trees the subtrees of the node used
for the splitting. Every node of the SPQR-tree has two associated graphs:
– The skeleton of the node defined by a split pair p is a simplified version of the
whole graph where the split-components of p are replaced by single edges.
– The pertinent graph of a node v is the subgraph of the original graph that
is represented by the subtree rooted at v.
The two nodes of the split pair p that define a node v are called the poles
of v. For the recursive decomposition, a new edge between the poles is added
to the pertinent graph of a node which results in a biconnected graph that may
have multiple edges. The SPQR-tree has four different types of nodes that are
defined by the structure and number of the split components of its poles va and
vb :
1. Q-node: The pertinent graph of the node is just the single edge e = {va , vb }.
The skeleton consists of the two poles that are connected by two edges. One
of the edges represents the edge e and the other one the rest of the graph.
2. S-node: The pertinent graph of the node has at least one cut vertex (a node
whose removal increases the number of connected components). When we
have the cut vertices v1 , v2 to vk , they then split the pertinent graph into
the components G1 , G2 to Gk+1 . In the skeleton of the node, G1 to Gk+1
are replaced by single edges and the edge between the poles is added. The
decomposition continues with the subgraphs Gi , where the poles are vi and
vi+1 . Figure 2(a) shows the pertinent graph of an S-node together with the
skeleton.
3. P-node: va and vb in the pertinent graph have more than one split component,
G1 to Gk . In the skeleton, each Gi is replaced by a single edge and
the edge between the poles is added. The decomposition continues with the
subgraphs Gi , where the poles are again va and vb . Figure 2(b) shows the
pertinent graph of a P-node with 3 split components and its skeleton.
4. R-node: None of the other cases is applicable, so the pertinent graph is
biconnected. The poles va and vb are not a split pair of the pertinent graph.
In this case, the decomposition depends on the maximal split pairs of the
pertinent graph with respect to the pair {va , vb }. A split pair {v1 , v2 } is
maximal with respect to {va , vb }, if for every other split pair {v1′ , v2′ }, there
is a split component that includes the nodes v1 , v2 , va and vb . For each
maximal split pair p with respect to {va , vb }, we define a subgraph Gp of
the original graph as the union of all the split-components of p that do not
include va and vb . In the skeleton, each subgraph Gp is replaced by a single
edge and the edge between the poles is added. The decomposition proceeds
with the subgraphs defined by the maximal split pairs (see Fig. 2(c)).

Fig. 2. Pertinent graphs and skeletons of the different node types of an SPQR-tree: (a) S-node, (b) P-node, (c) R-node

The SPQR-tree of a biconnected planar graph G where one edge is marked
(the so-called reference edge) is constructed in the following way:
1. Remove the reference edge and consider the end-nodes of it as the poles of
the remaining graph G′ . Depending on the structure of G′ and the number
of split components of the poles, choose the type of the new node v (S, P, R
or Q).
2. Compute the subgraphs G1 to Gk as defined above for the different cases
and add an edge between the poles of each of the subgraphs.
3. Compute the SPQR-trees T1 to Tk for the subgraphs where the added edge
is the reference edge and make the roots of these trees the sons of v.
When we have completed this recursive construction, we create a new Q-node
representing the reference edge of G and make it the root of the whole SPQR-
tree by making the old root a son of the Q-node. This construction implies that
all leaves of the tree are Q-nodes and all inner nodes are S-, P-, or R-nodes.
Figure 3 shows a biconnected planar graph and its SPQR-tree where the edge
{1, 2} was chosen as the reference edge.

Fig. 3. A biconnected planar graph and its SPQR-tree

When we see the SPQR-tree as an unrooted tree, we get the same tree no
matter what edge of the graph was marked as the reference edge. The skeletons
of the nodes are also independent of the choice of the reference edge. Thus,
we can define a unique SPQR-tree for each biconnected planar graph. Another
important property of these trees is that their size (including the skeletons) is
linear in the size of the original graph and they can be constructed in linear time
([1]).
As described in [1], SPQR-trees can be used to represent all combinatorial
embeddings of a biconnected planar graph. This is done by choosing embeddings
for the skeletons of the nodes in the tree. The skeletons of S- and Q-nodes are
simple cycles, so they have only one embedding. The skeletons of R-nodes are
always triconnected graphs. In most publications, combinatorial embeddings are
defined in such a way, that only one combinatorial embedding for a triconnected
planar graph exists (note that a combinatorial embedding does not fix the outer
face of a drawing which realizes the embedding). Our definition distinguishes
between two combinatorial embeddings which are mirror-images of each other
(the order of the edges around each node in clockwise order is reversed in the
second embedding). When the skeleton of a P-node has k edges, there are (k−1)!
different embeddings of its skeleton.
Every combinatorial embedding of the original graph defines a unique com-
binatorial embedding for each skeleton of a node in the SPQR-tree. Conversely,
when we define an embedding for each skeleton of a node in the SPQR-tree, we
define a unique embedding for the original graph. The reason for this fact is that
each skeleton is a simplified version of the original graph where the split com-
ponents of some split pair are replaced by single edges. Thus, if the SPQR-tree
of G has r R-nodes and the P-nodes P1 to Pk where the skeleton of Pi has Li
edges, then the number of combinatorial embeddings of G is exactly

\[ 2^r \prod_{i=1}^{k} (L_i - 1)! . \]
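
The count can be evaluated with a short Python sketch. It assumes, as the preceding paragraph states, that the skeleton embeddings are chosen independently: two mirror-image embeddings per R-node and (Li − 1)! embeddings per P-node with Li skeleton edges, so the individual counts multiply. The function name and the sample values are hypothetical.

```python
from math import factorial, prod


def count_embeddings(num_r_nodes, p_skeleton_sizes):
    """2^r times the product of (L_i - 1)! over all P-node skeletons."""
    return 2 ** num_r_nodes * prod(factorial(L - 1) for L in p_skeleton_sizes)


# Two R-nodes and two P-nodes whose skeletons have 3 and 4 edges.
print(count_embeddings(2, [3, 4]))  # 2^2 * 2! * 3!
```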

Because the embeddings of the R- and P-nodes determine the embedding of
the graph, we call these nodes the decision nodes of the SPQR-tree. In [2], the
fact that SPQR-trees can be used to enumerate all combinatorial embeddings of
a biconnected planar graph was used to devise a branch-and-bound algorithm for
finding a planar embedding and an outer face for a graph such that the drawing
computed by Tamassia’s algorithm has the minimum number of bends among
all possible orthogonal drawings of the graph.

3 Recursive Construction of the Integer Linear Program

3.1 The Variables of the Integer Linear Program

The skeletons of P-nodes are multi-graphs, so they have multiple edges between
the same pair of nodes. Because we want to talk about directed cycles, we can be
much more precise when we are dealing with bidirected graphs. A directed graph
is called bidirected if there exists a bijective function r : E → E such that for
every edge e = (v, w) with r(e) = eR we have eR = (w, v) and r(eR ) = e. We can
turn an undirected graph into a bidirected graph by replacing each undirected
edge by two directed edges that go in opposite directions. The undirected graph
G that can be transformed in this way to get the bidirected graph G′ is called
the underlying graph of G′ .
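
A minimal Python sketch of this transformation follows. Arcs are stored by index so that parallel edges, as they occur in P-node skeletons, stay distinguishable; the representation and the name `bidirect` are our own choices.

```python
def bidirect(undirected_edges):
    """Replace every undirected edge by two opposite arcs.

    Returns (arcs, rev): arcs[i] is a pair (v, w) and rev[i] is the index of
    the reversed arc, so rev plays the role of the bijection r.
    """
    arcs, rev = [], []
    for v, w in undirected_edges:
        i = len(arcs)
        arcs.append((v, w))
        arcs.append((w, v))
        rev.extend([i + 1, i])
    return arcs, rev


# A multigraph with two parallel edges between a and b.
arcs, rev = bidirect([("a", "b"), ("a", "b"), ("b", "c")])
print(arcs)
print(rev)
```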
A directed cycle in the bidirected graph G = (V, E) is a sequence of edges of
the following form: c = ((v1 , v2 ), (v2 , v3 ), . . . , (vk , v1 )) = (e1 , e2 , . . . , ek ) with the
properties that every node of the graph is contained in at most two edges of c
and if k = 2, then e1 ≠ e2 holds. We say a planar drawing of a bidirected graph
is the drawing of the underlying graph, so the embeddings of a bidirected graph
are identical with the embeddings of the underlying graph.
A face cycle in a combinatorial embedding of a bidirected planar graph is a
directed cycle of the graph, such that in any planar drawing that realizes the
embedding, the left side of the cycle is empty. Note that the number of face
cycles of a planar biconnected graph is m − n + 2 where m is the number of
edges in the graph and n the number of nodes.
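The face cycles of a given embedding can be traced mechanically. The sketch below assumes a simple graph (no parallel edges) and assumes that the embedding is given as a rotation system listing the neighbours of each node in counter-clockwise order; the function name `face_cycles` is illustrative. Under these assumptions, leaving each node towards the predecessor of the node we came from yields cycles whose left side is empty; with the opposite reading direction one obtains the mirror image.

```python
def face_cycles(rotation):
    """Trace the face cycles of an embedding given as a rotation system.

    rotation[v] lists the neighbours of v in cyclic (assumed counter-clockwise)
    order. Each face cycle is returned as a list of directed edges.
    """
    succ = {}
    for v, nbrs in rotation.items():
        for i, u in enumerate(nbrs):
            # after arriving at v via (u, v), continue with the neighbour
            # that precedes u in the rotation at v
            succ[(u, v)] = (v, nbrs[(i - 1) % len(nbrs)])
    faces, seen = [], set()
    for start in succ:
        if start in seen:
            continue
        cycle, e = [], start
        while e not in seen:
            seen.add(e)
            cycle.append(e)
            e = succ[e]
        faces.append(cycle)
    return faces

# K4 with node 3 drawn inside the triangle 0, 1, 2.
k4 = {0: [1, 3, 2], 1: [2, 3, 0], 2: [0, 3, 1], 3: [0, 1, 2]}
faces = face_cycles(k4)
print(len(faces))  # m - n + 2 = 6 - 4 + 2
```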
Now we are ready to construct an integer linear program (ILP) in which the
feasible solutions correspond to the combinatorial embeddings of a biconnected
planar bidirected graph. The variables of the program are binary and they corre-
spond to directed cycles in the graph. As objective function, we can choose any
linear function on the directed cycles of the graph. With every cycle c we asso-
ciate a binary variable xc . In a feasible solution of the integer linear program, a
variable xc has value 1 if the associated cycle is a face cycle in the represented
368 Petra Mutzel and René Weiskircher

embedding and 0 otherwise. To keep the number of variables as small as possible,
we only introduce variables for those cycles that are a face cycle in at least one
combinatorial embedding of the graph.

3.2 Splitting an SPQR-Tree

We use a recursive approach to construct the variables and constraints of the ILP.
Therefore, we need an operation that constructs a number of smaller problems
out of our original problem such that we can use the variables and constraints
computed for the smaller problems to compute the ILP for the original problem.
This is done by splitting the SPQR-tree at some decision-node v. Let e be an
edge incident to v whose other endpoint is not a Q-node. Deleting e splits the
tree into two trees T1 and T2 . We add a new edge with a Q-node attached to
both trees to replace the deleted edge and thus ensure that T1 and T2 become
complete SPQR-trees again. The edges corresponding to the new Q-nodes are
called split edges. For edges incident to v whose other endpoint is a Q-node,
no splitting is necessary. Doing this for each edge incident to v results in d + 1
smaller SPQR-trees, called the split-trees of v, where d is the number of inner
nodes adjacent to v. This splitting process is shown in Fig. 4. Since the new
trees are SPQR-trees, they represent planar biconnected graphs which are called
the split graphs of v. We will show how to compute the ILP for the original graph
using the ILPs computed for the split graphs.
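The splitting operation acts only on the tree structure, so it can be sketched without any graph data. The following is a simplified illustration: the tree is assumed to be given as an adjacency dictionary together with a dictionary of node types, skeletons are ignored, and the names `split_at` and `"Q-split"` are hypothetical.

```python
def split_at(adj, kind, v):
    """Split a tree at decision node v.

    adj  : dict mapping each tree node to the set of its neighbours
    kind : dict mapping each tree node to 'S', 'P', 'R' or 'Q'
    Returns a list of (adjacency, kind) pairs; the first entry is the
    split-tree that contains v.
    """
    center_adj = {x: set(ns) for x, ns in adj.items()}
    center_kind = dict(kind)
    others = []
    for u in [n for n in adj[v] if kind[n] != "Q"]:
        # nodes reachable from u once the tree edge {v, u} is deleted
        comp, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y != v and y not in comp:
                    comp.add(y)
                    stack.append(y)
        sub_adj = {x: set(adj[x]) & comp for x in comp}
        sub_kind = {x: kind[x] for x in comp}
        q_sub = ("Q-split", u, v)          # new Q-node replacing the deleted edge
        sub_adj[q_sub] = {u}
        sub_adj[u].add(q_sub)
        sub_kind[q_sub] = "Q"
        others.append((sub_adj, sub_kind))
        # cut the subtree out of the center tree and close it with a Q-node
        for x in comp:
            del center_adj[x]
            del center_kind[x]
        q_center = ("Q-split", v, u)
        center_adj[v].discard(u)
        center_adj[v].add(q_center)
        center_adj[q_center] = {v}
        center_kind[q_center] = "Q"
    return [(center_adj, center_kind)] + others

adj = {
    "v": {"a", "b", "q0"}, "a": {"v", "q1", "q2"}, "b": {"v", "c", "q3", "q4"},
    "c": {"b", "q5", "q6"}, "q0": {"v"}, "q1": {"a"}, "q2": {"a"},
    "q3": {"b"}, "q4": {"b"}, "q5": {"c"}, "q6": {"c"},
}
kind = {"v": "P", "a": "R", "b": "S", "c": "R"}
kind.update({q: "Q" for q in adj if q.startswith("q")})
trees = split_at(adj, kind, "v")
print(len(trees))  # d + 1 with d = 2 inner neighbours of v
```

In this example the original tree has three decision nodes (v, a, c), while every split-tree has exactly one.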

[Figure: an SPQR-tree with inner node v and attached subtrees T1, T2, T3, shown before and after the split]

Fig. 4. Splitting an SPQR-tree at an inner node

As we have seen, the number and type of decision-nodes in the SPQR-tree of
a graph determine the number of combinatorial embeddings. The subproblems
we generated by splitting the tree either have only one decision-node or at least
one fewer than the original problem.

3.3 The Integer Linear Program for SPQR-Trees with One Inner Node

We observe that a graph whose SPQR-tree has only one inner node is isomorphic
to the skeleton of this inner node. So the split-tree of v which includes v, called
the center split-tree of v, represents a graph which is isomorphic to the whole
graph.
The ILPs for SPQR-trees with only one inner node are defined as follows:

– S-node: When the only inner node of the SPQR-tree is an S-node, the whole
graph is a simple cycle. Thus it has two directed cycles and both are face-
cycles in the only combinatorial embedding of the graph. So the ILP consists
of two variables, both of which must be equal to one.
– R-node: In this case, the whole graph is triconnected. According to our
definition of planar embedding, every triconnected graph has exactly two
embeddings, which are mirror-images of each other. When the graph has m
edges and n nodes, we have k = 2(m − n + 2) variables and two feasible
solutions. The constraints are given by the convex hull of the points in k-
dimensional space that correspond to the two solutions.
– P-node: The whole graph consists only of two nodes connected by k edges
with k ≥ 3. Every directed cycle in the graph is a face cycle in at least one
embedding of the graph, so the number of variables is equal to the number
of directed cycles in the graph. The number of cycles is l = 2·C(k, 2) = k(k − 1),
where C(k, 2) is the binomial coefficient, because we
always get an undirected cycle by pairing two edges and, since we are talking
about directed cycles, we get twice the number of pairs of edges. As already
mentioned, the number of embeddings is (k − 1)!. The constraints are given
as the convex hull of the points in l-dimensional space that represent these
embeddings.
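For the P-node case, the variables and the feasible points can be enumerated explicitly. The sketch below is illustrative only: a directed cycle is encoded as an ordered pair (a, b) of edge indices, meaning that edge a is traversed in one direction and edge b in the other, and each face between two cyclically consecutive edges is assigned one of its two directed cycles by an assumed orientation convention. The name `p_node_embeddings` is hypothetical.

```python
from itertools import permutations
from math import comb, factorial

def p_node_embeddings(k):
    """Enumerate directed cycles and incidence vectors for k parallel edges."""
    # every ordered pair of distinct edges is one directed cycle
    cycles = [(a, b) for a in range(k) for b in range(k) if a != b]
    index = {c: i for i, c in enumerate(cycles)}
    vectors = set()
    # fix edge 0 and permute the rest: (k - 1)! cyclic orders
    for rest in permutations(range(1, k)):
        order = (0,) + rest
        chi = [0] * len(cycles)
        for i in range(k):
            # the face between two cyclically consecutive edges
            chi[index[(order[i], order[(i + 1) % k])]] = 1
        vectors.add(tuple(chi))
    return cycles, vectors

cycles, vectors = p_node_embeddings(4)
print(len(cycles), len(vectors))
```

Each incidence vector has exactly k ones, which agrees with the face count m − n + 2 = k − 2 + 2 for this graph.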

3.4 Construction of the ILP for SPQR-Trees with More than One Inner Node

We now define how to construct the ILP of an SPQR-tree T from the ILPs of the
split-trees of a decision node v of T . Let G be the graph that corresponds to T
and T1 , . . . , Tk the split-trees of v representing the graphs G1 to Gk . We assume
that T1 is the center split-tree of v. Now we consider the directed cycles of G.
We can distinguish two types:

1. Local cycles are cycles of G that also appear in one of the graphs G1 , . . . , Gk .
2. Global cycles of G are not contained in any of the Gi .

Every split-tree of v except the center split-tree represents a subgraph of the original
graph G with one additional edge (the split edge corresponding to the added
Q-node). The graph that corresponds to the center split-tree may have more
than one split edge. Note that the number of split edges in this graph is not
necessarily equal to the degree of v, because v may have been connected to Q-
nodes in the original tree. For every split edge e, we define a subgraph expand(e)
of the original graph G, which is represented by e. The two nodes connected by
a split edge always form a split pair p of G. When e belongs to the graph Gi
represented by the split-tree Ti , then expand(e) is the union of all the split
components of G that share only the nodes of p and no edge with Gi .
For every directed cycle c in a graph Gi represented by a split-tree, we define
the set R(c) of represented cycles in the original graph. A cycle c′ of G is in
R(c) when it can be constructed from c by replacing every split edge e = (v, w)
in c by a simple path in expand(e) from v to w.
The variables of the ILPs of the split-trees that represent local cycles will also
be variables of the ILP of the original graph G. But we will also have variables
that correspond to global cycles of G. A global cycle c in G will get a variable
in the ILP, when the following conditions are met:

1. There is a variable x_{c1} in the ILP of T1 with c ∈ R(c1).
2. For every split-tree Ti with 2 ≤ i ≤ k where c has at least one edge in Gi,
there is a variable x_{ci} in the ILP of Ti such that c ∈ R(ci).

So far we have defined all the variables for the integer linear program of G.
The set C of all constraints of the ILP of T is given by

C = Cl ∪ Cc ∪ CG .

First we define the set Cl which is the set of lifted constraints of T. Each of the
graphs G1, ..., Gk is a simplified version of the original graph G. They can be
generated from G by replacing some split components of one or more split pairs
by single edges. When we have a constraint that is valid for a split graph, a
weaker version of this constraint is still valid for the original graph. The process
of generating these new constraints is called lifting because we introduce new
variables that cause the constraint to describe a higher dimensional half-space
or hyperplane. Let

\[ \sum_{j=1}^{l} a_j x_{c_j} \doteq R \]

be a constraint in a split-tree, where ≐ ∈ {≤, ≥, =}, and let X be the set of all
variables of T. Then the lifted constraint for the tree T is the following:

\[ \sum_{j=1}^{l} a_j \sum_{c \,:\, c \in R(c_j) \cap X} x_c \doteq R \]

We define Cl as the set of lifted constraints of all the split-trees. The number of
constraints in Cl is the sum of the numbers of constraints of all split-trees.
The set Cc is the set of choice constraints. For a cycle c in Gi , which includes
a split edge, we have |R(c)| > 1. All the cycles in R(c) share either at least one
directed edge or they pass a split graph of the split node in the same direction.
Therefore, only one of the cycles in R(c) can be a face cycle in any combinatorial
embedding of G (proof omitted). For each variable xc in a split tree with |R(c)| >
1 we have therefore one constraint that has the following form:

\[ \sum_{c' \in R(c) \,\wedge\, x_{c'} \in X} x_{c'} \le 1 \]
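The lifting step and the choice constraints described above are purely mechanical once the sets R(c) are known. The following sketch assumes that R is available as a dictionary from cycles of a split graph to lists of cycles of G, that constraints are stored as (coefficients, sense, right-hand side) triples, and that cycles are identified by arbitrary hashable labels; all names are illustrative.

```python
def lift(constraint, represented, variables):
    """Lift a split-graph constraint to the variables of the original graph.

    constraint  : (coeffs, sense, rhs) with coeffs mapping cycle -> a_j
    represented : maps each split-graph cycle c_j to the cycles in R(c_j)
    variables   : set of cycles of G that have a variable in the ILP
    """
    coeffs, sense, rhs = constraint
    lifted = {}
    for c_j, a_j in coeffs.items():
        for c in represented[c_j]:
            if c in variables:                 # c in R(c_j) with x_c in X
                lifted[c] = lifted.get(c, 0) + a_j
    return lifted, sense, rhs

def choice_constraints(represented, variables):
    """One constraint sum x_c' <= 1 for every cycle c with |R(c)| > 1."""
    result = []
    for c, cycles in represented.items():
        if len(cycles) > 1:
            coeffs = {c2: 1 for c2 in cycles if c2 in variables}
            result.append((coeffs, "<=", 1))
    return result

# Cycle "a" contains no split edge; cycle "b" represents three cycles of G,
# of which only g1 and g2 have variables.
R = {"a": ["a"], "b": ["g1", "g2", "g3"]}
X = {"a", "g1", "g2"}
print(lift(({"a": 1, "b": 2}, "=", 1), R, X))
print(choice_constraints(R, X))
```

A cycle without split edges represents only itself, so its coefficient is carried over unchanged.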

The set CG consists of only one constraint, called the center graph constraint.
Let F be the number of face cycles in a combinatorial embedding of G1, $\mathcal{C}_g$ the
set of all global cycles c in G and $\mathcal{C}_l$ the set of all local cycles c in G1; then this
constraint is:

\[ \sum_{c \in (\mathcal{C}_g \cup \mathcal{C}_l) \cap \mathcal{C}} x_c = F \]

This constraint is valid, because we can produce every drawing D of G by re-
placing all split edges in a drawing D1 of G1 with the drawings of subgraphs
of G. For each face cycle in D1, there will be a face cycle in D that is either
identical to the one in D1 (if it was a local cycle) or is a global cycle. This defines
the ILP for any biconnected planar graph.

3.5 Correctness of the ILP

Theorem 1. Every feasible solution of the generated ILP corresponds to a com-
binatorial embedding of the given biconnected planar graph G and vice versa:
every combinatorial embedding of G corresponds to a feasible solution for the
generated ILP.

Because the proof of the theorem is quite complex and the space is limited,
we can only give a sketch of the proof. The proof is split into three lemmas.

Lemma 1. Let G be a biconnected planar graph and let T be its SPQR-tree.
Let µ be a decision node in T with degree d, let T1, ..., Td′ with d′ ≤ d be the split
trees of µ (T1 is the center split tree) and G1, ..., Gd′ the associated split graphs.
Every combinatorial embedding Γ of G defines a unique embedding for each Gi.
Conversely, if we fix a combinatorial embedding Γi for each Gi, we have
defined a unique embedding for G.

Proof: (Sketch) To show the first part of the lemma, we start with a drawing
Z of G that realizes embedding Γ. When G′i is the graph Gi without its split
edge, we get a drawing Z1 of G1 by replacing in Z the drawings of the G′i with
2 ≤ i ≤ d′ with drawings of single edges that are drawn inside the area of the
plane formerly occupied by the drawing of G′i. We can show that each drawing of
G1 that we construct in this way realizes the same embedding Γ1. We construct
a planar drawing of each Gi with 2 ≤ i ≤ d′ by deleting all nodes and edges
from Z that are not contained in Gi and drawing the split edge into the area
of the plane that was formerly occupied by the drawing of a path in G between
the poles of Gi not inside Gi. Again we can show that all drawings produced in
this way realize the same embedding Γi.
To show the second part of the lemma, we start with special planar drawings
Zi of the Gi that realize the embeddings Γi . We assume that Z1 is a straight line
drawing (such a drawing always exists [8]) and that each Zi with 2 ≤ i ≤ d′ is
drawn inside an ellipse with the split edge on the outer face and the poles drawn
as the vertices on the major axis of the ellipse. Then we can construct a drawing
of G by replacing the drawings of the straight edges in Z1 by the drawings Zi
of the Gi from which the split edges have been deleted. We can show that every
drawing Z we construct in this way realizes the same embedding Γ of G.

To prove the main theorem, we first have to define the incidence vector of a
combinatorial embedding. Let C be the set of all directed cycles in the graph
that are face cycles in at least one combinatorial embedding of the graph. Then
the incidence vector of an embedding Γ is given as a vector in {0, 1}^{|C|} where
the components representing the face cycles in Γ have value one and all other
components have value zero.

Lemma 2. Let Γ = {c1, c2, ..., ck} be a combinatorial embedding of the bicon-
nected planar graph G. Then the incidence vector χΓ satisfies all constraints of
the ILP we defined.

Proof: (Sketch) We prove the lemma by induction on the number n of deci-
sion nodes in the SPQR-tree T of G. The value χ(c) is the value of the component
in χ associated with the cycle c. We don’t consider the case n = 0, because G is
a simple cycle in this case and has only one combinatorial embedding.

1. n = 1:
No splitting of the SPQR-tree is necessary, the ILP is defined directly by T .
The variables are defined as the set of all directed cycles that are face cycles
in at least one combinatorial embedding of G. Since the constraints of the
ILP are defined as the convex hull of all incidence vectors of combinatorial
embeddings of G, χΓ satisfies all constraints of the ILP.
2. n > 1:
From the previous lemma we know that Γ uniquely defines embeddings Γi
with incidence vectors χi for the split graphs Gi. We will use the induction
hypothesis to show that χΓ satisfies all lifted constraints. We know that the choice
constraints are satisfied by χΓ because in any embedding there can be only
one cycle passing a certain split pair in the same direction. When lifting
constraints, we replace certain variables by the sum of new variables and the
choice constraints guarantee that this sum is either 0 or 1. Using this fact
and the construction of the χi from χΓ , we can show that the sums of the
values of the new variables are always equal to the value of the old variable.
Therefore, all lifted constraints are satisfied.
To see that the center graph constraint is satisfied, we observe that any
embedding of the skeleton of the split node has F faces. We can construct
any embedding of G from an embedding Γ1 of this skeleton by replacing
edges by subgraphs. The faces in Γ that are global cycles are represented
by faces in Γ1 and the faces that are local cycles in G are also faces in Γ1 .
Therefore the center graph constraint is also satisfied.

Lemma 3. Let G be a biconnected planar graph and χ ∈ {0, 1}^{|C|} a vector satis-
fying all constraints of the ILP. Then χ is the incidence vector of a combinatorial
embedding Γ of G.

Proof: Again, we use induction on the number n of decision nodes in the SPQR-
tree T of G and we disregard the case n = 0.
1. n = 1:
As in the previous lemma, our claim holds by definition of the ILP.
2. n > 1:
The proof works in two stages: First we construct vectors χi for each split
graph from χ and prove that these vectors satisfy the ILPs for the Gi, and
are therefore incidence vectors of embeddings Γi of the Gi by the induction hypothesis.
In the second stage, we use the Γi to construct an embedding Γ for G and
show that χ is the incidence vector of Γ .
The construction of the χi works as follows: When x is a variable in the
ILP of Gi and the corresponding cycle is contained in G, then x gets the
value of the corresponding variable in χ. Otherwise, we define the value of
x as the sum of the values of all variables in χ whose cycles are represented
by the cycle of x. This value is either 0 or 1 because χ satisfies the choice
constraints.
Because χ satisfies the lifted constraints, the χi must satisfy the original con-
straints and by the induction hypothesis we know that each χi represents an embed-
ding Γi of Gi. Using these embeddings for the split graphs, we can construct
an embedding Γ for G as in Lemma 1.
To show that χ is the incidence vector of Γ , we define χΓ as the incidence
vector of Γ and show that χ and χΓ are identical. By construction of Γ and
χΓ , the components in χΓ and χ corresponding to local cycles must be equal.
The number of global cycles whose variable in χ has value 1 must be equal
to the number of faces in Γ consisting of global cycles. This is guaranteed
by the center graph constraint. Using the fact that for every face cycle in Γ1
there must be a represented cycle in G whose component in χ and in χΓ is
1, we can show that both vectors agree on the values of the variables of the
global cycles, and thus must be identical.


4 Computational Results
In our computational experiments, we tried to get statistical data about the size
of the integer linear program and the times needed to compute it. Our implementation
works for biconnected planar graphs with maximal degree four, since we
are interested in improving orthogonal planar drawings. First we used a benchmark
set of 11491 practical graphs collected by the group around G. Di Battista
in Rome ([6]). We have transformed these graphs into biconnected planar graphs
with maximal degree four using planarization, planar augmentation, and the ring
approach described in [7]. This is a commonly used approach for obtaining orthogonal
drawings with a small number of bends [7]. The obtained graphs have up to 160
vertices; only some of them had more than 100 different combinatorial embeddings.
The maximum number of embeddings for any of the graphs was 5000.

Fig. 5. Generation time for the ILP (seconds, plotted against the number of nodes)

Fig. 6. Number of embeddings (logarithmic scale, plotted against the number of nodes)

Fig. 7. Number of constraints (plotted against the number of nodes)

Fig. 8. Number of variables (plotted against the number of nodes)

Fig. 9. Solution time (seconds, plotted against the number of nodes)

The times for generating the ILPs were below one minute, and the ILPs were
quite small. CPLEX was able to solve all of them very quickly.
In order to study the limits of our method, we started test runs on extremely
difficult graphs. We used the random graph generator developed by the group
around G. Di Battista in Rome that creates biconnected planar graphs with
maximal degree four with an extremely high number of embeddings (see [2] for
detailed information). We generated graphs with the number of nodes ranging
from 25 to 500, proceeding in steps of 25 nodes and generating 10 random graphs
for each number of nodes. For each of the 200 graphs, we generated the ILP and
measured the time needed to do this. The times are shown in Fig. 5. They
grow sub-exponentially and the maximum time needed was 10 minutes on a Sun
Enterprise 10000.
The number of embeddings of each graph is shown in Fig. 6. They grow
exponentially with the number of nodes, so we used a logarithmic scale for the
y-axis. There was one graph with more than 10^19 combinatorial embeddings.
These numbers were computed by counting the number of R- and P-nodes in
the SPQR-tree of each graph. Each R-node doubles the number of combinatorial
embeddings while each P-node multiplies it by 2 or 6 depending on the number
of edges in its skeleton. Figures 7 and 8 show the number of constraints and
variables in each ILP. Surprisingly, both of them grow linearly with the number
of nodes. The largest ILP has about 2500 constraints and 1000 variables.
To test how difficult it is to optimize over the ILPs, we have chosen 10 random
objective functions for each ILP with integer coefficients between 0 and 100 and
computed a maximal integer solution using the mixed integer solver of CPLEX.
Figure 9 shows the maximum time needed for any of the 10 objective functions.
The computation time always stayed below 2 seconds.
Our future goal will be to extend our formulation such that each solution will
not only represent a combinatorial embedding but an orthogonal drawing of the
graph. This will give us a chance to find drawings with the minimum number of
bends or drawings with fewer crossings. Of course, this will make the solution of
the ILP much more difficult.

Acknowledgments. We thank the group of G. Di Battista in Rome for giving
us the opportunity to use their implementation of SPQR-trees in GDToolkit, a
software library that is part of the ESPRIT ALCOM-IT project (work package
1.2), and to use their graph generator.

References
[1] G. Di Battista and R. Tamassia. On-line planarity testing. SIAM Journal on
Computing, 25(5):956–997, October 1996.
[2] P. Bertolazzi, G. Di Battista, and W. Didimo. Computing orthogonal draw-
ings with the minimum number of bends. Lecture Notes in Computer Science,
1272:331–344, 1998.
[3] D. Bienstock and C. L. Monma. Optimal enclosing regions in planar graphs.
Networks, 19(1):79–94, 1989.
[4] D. Bienstock and C. L. Monma. On the complexity of embedding planar graphs
to minimize certain distance measures. Algorithmica, 5(1):93–109, 1990.
[5] J. Cai. Counting embeddings of planar graphs using DFS trees. SIAM Journal
on Discrete Mathematics, 6(3):335–352, 1993.
[6] G. Di Battista, A. Garg, G. Liotta, R. Tamassia, E. Tassinari, and F. Vargiu.
An experimental comparison of four graph drawing algorithms. Comput. Geom.
Theory Appl., 7:303–326, 1997.
[7] P. Eades and P. Mutzel. Algorithms and theory of computation handbook, chapter
9 Graph drawing algorithms. CRC Press, 1999.
[8] I. Fary. On straight line representation of planar graphs. Acta Sci. Math. (Szeged),
11:229–233, 1948.
[9] A. Garg and R. Tamassia. On the computational complexity of upward and
rectilinear planarity testing. Lecture Notes in Computer Science, 894:286–297,
1995.
[10] J. E. Hopcroft and R. E. Tarjan. Dividing a graph into triconnected components.
SIAM Journal on Computing, 2(3):135–158, August 1973.
[11] S. MacLane. A combinatorial condition for planar graphs. Fundamenta Mathe-
maticae, 28:22–32, 1937.
[12] R. Tamassia. On embedding a graph in the grid with the minimum number of
bends. SIAM Journal on Computing, 16(3):421–444, 1987.
[13] G. J. Woeginger. personal communications, July 1998.

A Fast Algorithm for Computing Minimum
3-Way and 4-Way Cuts*

Hiroshi Nagamochi and Toshihide Ibaraki

Kyoto University, Kyoto, Japan 606-8501
{naga, ibaraki}@kuamp.kyoto-u.ac.jp

Abstract. For an edge-weighted graph G with n vertices and m edges,
we present a new deterministic algorithm for computing a minimum
k-way cut for k = 3, 4. The algorithm runs in O(n^{k−2}(nF(n, m) +
C_2(n, m) + n^2)) = O(mn^k log(n^2/m)) time for k = 3, 4, where F(n, m)
and C_2(n, m) denote respectively the time bounds required to solve
the maximum flow problem and the minimum 2-way cut problem in
G. The bound for k = 3 matches the current best deterministic
bound Õ(mn^3) for weighted graphs, but improves the bound Õ(mn^3)
to O(n(nF(n, m) + C_2(n, m) + n^2)) = O(min{mn^{8/3}, m^{3/2}n^2}) for un-
weighted graphs. The bound Õ(mn^4) for k = 4 improves the previous
best randomized bound Õ(n^6) (for m = o(n^2)). The algorithm is then
generalized to the problem of finding a minimum 3-way cut in a sym-
metric submodular system.

1 Introduction

Let G = (V, E) stand for an undirected graph with a set V of vertices and a
set E of edges being weighted by non-negative real numbers, and let n and m
denote the numbers of vertices and edges, respectively. For an integer k ≥ 2, a
k-way cut is a partition {V1 , V2 , . . . , Vk } of V consisting of k non-empty subsets.
The problem of partitioning V into k non-empty subsets so as to minimize the
weight sum of the edges between different subsets is called the minimum k-way
cut problem. The problem has several important applications such as cutting
planes in the traveling salesman problem [26], VLSI design [21], task allocation
in distributed computing systems [20] and network reliability. The 2-way cut
problem (i.e., the problem of computing the edge-connectivity) can be solved
by Õ(nm) time deterministic algorithms [7,22] and by Õ(n^2) and Õ(m) time
randomized algorithms [13,14,15]. For an unweighted planar graph, Hochbaum
and Shmoys [9] proved that the minimum 3-way cut problem can be solved in
O(n^2) time. However, the complexity status of the problem for general k ≥
3 in an arbitrary graph G has been open for several years. Goldschmidt and
Hochbaum proved that the problem is NP-hard if k is an input parameter [6]. In
the same article [6], they presented an O(n^{k^2/2−3k/2+4} F(n, m)) time algorithm
for solving the minimum k-way cut problem, where F(n, m) denotes the time
required to find a minimum (s, t)-cut (i.e., a minimum 2-way cut that separates
two specified vertices s and t) in an edge-weighted graph with n vertices and
m edges, which can be obtained by applying a maximum flow algorithm. This
running time is polynomial for any fixed k. Afterwards, Karger and Stein [15]
proposed a randomized algorithm that solves the minimum k-way cut problem
with high probability in O(n^{2(k−1)} (log n)^3) time. For general k, a deterministic
O(n^{2k−3} m) time algorithm for the minimum k-way cut problem is claimed in
[11] (where no full proof is available).
For k = 3, Kapoor [12] and Kamidoi et al. [10] showed that the problem can
be solved in O(n^3 F(n, m)) time, which was then improved to Õ(mn^3) by Burlet
and Goldschmidt [1]. For k = 4, Kamidoi et al. [10] gave an O(n^4 F(n, m)) =
Õ(mn^5) time algorithm.

* This research was partially supported by the Scientific Grant-in-Aid from Ministry of
Education, Science, Sports and Culture of Japan, and the subsidy from the Inamori
Foundation.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 377–390, 1999.
© Springer-Verlag Berlin Heidelberg 1999

Let us call a non-empty and proper subset X of V a cut. Clearly, if we can
identify the first cut V1 in a minimum k-way cut {V1, ..., Vk}, then the remaining
cuts V2, ..., Vk can be obtained by solving the minimum (k − 1)-way cut problem
in the graph induced by V − V1. For k = 3, Burlet and Goldschmidt [1] succeeded
in characterizing a set of O(n^2) cuts which contains at least one such
cut V1. Thus, by solving O(n^2) minimum (k − 1)-way cut problems, a minimum
k-way cut can be computed. They showed that, given a minimum 2-way cut
{X, V − X} in G, such V1 is a cut whose weight is less than 4/3 of the weight
of a minimum 2-way cut in G or in the induced subgraphs G[X] and G[V − X].
Since it is known [24] that there are O(n^2) cuts with weight less than 4/3 of the
weight of a minimum 2-way cut, and all those cuts can be enumerated in Õ(mn^3)
time, their approach yields an Õ(mn^3) time minimum 3-way cut algorithm.
In this paper, we consider the minimum k-way cut problem for k = 3, 4, and
give a new characterization of a set of O(n) such candidate cuts for the
first cut V1, on the basis of the submodularity of cut functions. We also show that
those O(n) cuts can be obtained in O(n^2 F(n, m)) time by using Vazirani and
Yannakakis’s algorithm for enumerating small cuts [28]. Therefore, we can find
a minimum 3-way cut in O(n^2 F(n, m) + nC_2(n, m)) = O(mn^3 log(n^2/m)) time
and a minimum 4-way cut in O(n^3 F(n, m) + n^2 C_2(n, m)) = O(mn^4 log(n^2/m))
time, where C_2(n, m) denotes the time required to find a minimum 2-way cut
in an edge-weighted graph with n vertices and m edges. The bound for k = 3
matches the current best deterministic bound Õ(mn^3) for weighted graphs, but
improves the bound Õ(mn^3) to O(min{mn^{8/3}, m^{3/2}n^2}) for unweighted graphs
(since F(n, m) = O(min{mn^{2/3}, m^{3/2}}) is known for unweighted graphs [2,12]).
The bound Õ(mn^4) for k = 4 improves the previous best randomized bound
Õ(n^6) (for m = o(n^2)). In the case of an edge-weighted planar graph G, we also
show that the algorithm can be implemented to run in O(n^3) time for k = 3
and in O(n^4) time for k = 4, respectively. The algorithm is then generalized to
the problem of finding a minimum 3-way cut in a symmetric submodular system.
In the next section, we review some basic results of cuts and symmetric
submodular functions. In section 3, we present a new algorithm for computing
minimum 3-way and 4-way cuts in an edge-weighted graph. In section 4, we
extend the algorithm to the problem of finding a minimum 3-way cut in a sym-
metric submodular system, and analyze its running time for the case in which
the symmetric submodular function is a cut function of a hypergraph. Finally
in section 5, we make some remarks on our approach in this paper.

2 Preliminaries

2.1 Notations and Definitions

A singleton set {x} may be simply written as x, and “⊂” implies proper
inclusion while “⊆” means “⊂” or “=”. For a finite set V, a cut is defined as
a non-empty and proper subset X of V. For two disjoint subsets S, T ⊂ V, we
say that a cut X separates S and T if S ⊆ X ⊆ V − T or T ⊆ X ⊆ V − S holds.
A cut X intersects another cut Y if X − Y ≠ ∅, Y − X ≠ ∅ and X ∩ Y ≠ ∅ hold,
and X crosses Y if, in addition, V − (X ∪ Y) ≠ ∅ holds. A set 𝒳 of cuts of V is
called non-crossing if no two cuts X, Y ∈ 𝒳 cross each other (it is possible that
𝒳 contains a pair of intersecting cuts).
The following observation plays a key role in analyzing the time bound of
our algorithm.

Lemma 1. Let V be a non-empty finite set, and 𝒳 be a non-crossing family of cuts in V
such that, for each subset X ∈ 𝒳, its complement V − X does not belong to 𝒳.
Then |𝒳| ≤ 2|V| − 3.

Proof. Choose an arbitrary vertex r ∈ V as a reference point, and assume without
loss of generality that no cut X ∈ 𝒳 contains r (replace X with
V − X if necessary). Now for any two cuts X, Y ∈ 𝒳, r ∈ V − (X ∪ Y) ≠ ∅.
Thus, 𝒳 is a non-intersecting (laminar) family of non-empty subsets of V − r. Then it is
not difficult to see that |𝒳| ≤ 2|V − r| − 1 = 2|V| − 3. □
A set function f on a ground set V is a function f : 2^V → ℝ, where ℝ is
the set of real numbers. A set function f is called submodular if it satisfies the
following inequality

f (X) + f (Y ) ≥ f (X ∩ Y ) + f (X ∪ Y ) for every pair of subsets X, Y ⊆ V . (2.1)

An f is called symmetric if

f (X) = f (V − X) for all subsets X ⊆ V. (2.2)

For a symmetric and submodular set function f , it holds

f (X) + f (Y ) ≥ f (X − Y ) + f (Y − X) for every pair of subsets X, Y ⊆ V . (2.3)

A pair (V, f) of a finite set V and a set function f on V is called a system. It is
called a symmetric submodular system if f is symmetric and submodular.
Given a system (V, f ), we define the minimum k-way cut problem as follows.
A k-way cut is a partition π = {V1 , V2 , . . . , Vk } of V consisting of k non-empty
subsets. The weight ωf (π) of a k-way cut π is defined by

\[ \omega_f(\pi) = \frac{1}{2}\bigl(f(V_1) + f(V_2) + \cdots + f(V_k)\bigr). \tag{2.4} \]
A k-way cut is called minimum if it has the minimum weight among all k-way
cuts in (V, f ).
Let G = (V, E) be an undirected graph with a set V of vertices and a set
E of edges weighted by non-negative reals. For a non-empty subset X ⊆ V , let
G[X] denote the graph induced from G by X. For a subset X ⊂ V , its cut value,
denoted by c(X), is defined to be the sum of weights of edges between X and
V − X, where c(∅) and c(V ) are defined to be 0. This set function c on V is
called the cut function of G. The cut function c is symmetric and submodular,
as easily verified. For a k-way cut π = {V1 , V2 , . . . , Vk } of V , its weight in G is
defined by
\[ \omega_c(\pi) = \frac{1}{2}\bigl(c(V_1) + c(V_2) + \cdots + c(V_k)\bigr), \]
which equals the total weight of the edges between different subsets in π.
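The definitions above can be checked by brute force on a small graph. The sketch below is illustrative: edges are assumed to be given as (u, v, weight) triples, and the function names are hypothetical. It evaluates the cut function c, compares the k-way cut weight with the weight of the edges running between different subsets, and tests symmetry together with inequalities (2.1) and (2.3) over all pairs of subsets.

```python
from itertools import combinations

def cut_value(edges, X):
    """c(X): total weight of the edges with exactly one endpoint in X."""
    X = set(X)
    return sum(w for u, v, w in edges if (u in X) != (v in X))

def kway_weight(edges, parts):
    """Half the sum of the cut values of the subsets of the partition."""
    return sum(cut_value(edges, part) for part in parts) / 2

def crossing_weight(edges, parts):
    """Total weight of the edges whose endpoints lie in different subsets."""
    block = {v: i for i, part in enumerate(parts) for v in part}
    return sum(w for u, v, w in edges if block[u] != block[v])

def all_subsets(V):
    V = list(V)
    return [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]

def is_symmetric_submodular(edges, V):
    V = set(V)
    subsets = all_subsets(V)
    for X in subsets:
        if cut_value(edges, X) != cut_value(edges, V - X):        # (2.2)
            return False
        for Y in subsets:
            lhs = cut_value(edges, X) + cut_value(edges, Y)
            if lhs < cut_value(edges, X & Y) + cut_value(edges, X | Y):   # (2.1)
                return False
            if lhs < cut_value(edges, X - Y) + cut_value(edges, Y - X):   # (2.3)
                return False
    return True

edges = [(0, 1, 3), (1, 2, 1), (2, 3, 4), (0, 3, 2), (0, 2, 5)]
partition = [{0}, {1}, {2, 3}]
print(kway_weight(edges, partition), crossing_weight(edges, partition))
print(is_symmetric_submodular(edges, range(4)))
```

Each edge between two different subsets is counted in exactly two of the cut values, which is why the factor 1/2 appears.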

2.2 Enumerating All 2-Way Cuts

In [28], Vazirani and Yannakakis presented an algorithm that finds all the 2-way
cuts in G in the order of non-decreasing weights. The algorithm finds the next
2-way cut by solving at most 2n − 3 maximum flow problems. We describe this
result in a slightly more general way in terms of set functions. This algorithm
will be used in Section 3 to obtain minimum 3-way and 4-way cuts.

Theorem 2. [28] For a system (V, f) with n = |V|, 2-way cuts in (V, f) can
be enumerated in the order of non-decreasing weights with O(nFf) time delay
between two consecutive outputs, where Ff is the time required to find a minimum
2-way cut that separates two specified disjoint subsets S, T ⊂ V in (V, f).

Proof. Suppose that, for two specified disjoint subsets S, T ⊂ V, there is a
procedure A(S, T) for finding a minimum 2-way cut that separates S and T
in Ff time. Let V = {v1, . . . , vn}. To represent a set of 2-way cuts, we use a
p-dimensional {0, 1}-vector µ, where p satisfies 1 ≤ p ≤ n and µ(i) denotes
the i-th entry of vector µ. We assume that µ(1) = 1 for all the vectors µ in
the subsequent discussion. A p-dimensional {0, 1}-vector µ gives Sµ = {vi | i ∈
{1, . . . , p}, µ(i) = 1} and Tµ = {vi | i ∈ {1, . . . , p}, µ(i) = 0} (i.e., p = |Sµ ∪ Tµ|
holds), and represents the set C(µ) of all 2-way cuts {X, V − X} such that

Sµ ⊆ X ⊆ V − Tµ

(i.e., those separating Sµ and Tµ). In particular, C(µ1) with a 1-dimensional
vector µ1 represents all 2-way cuts {X, V − X} (with v1 ∈ X), while C(µn) with
an n-dimensional vector µn consists of a single 2-way cut {Sµ, Tµ}, where Sµ
and Tµ are specified by the vector µ = µn.
We first note that, if we remove the single cut in C(µn) represented by an
n-dimensional vector µn from the entire set C(µ1) of 2-way cuts, the set of
remaining 2-way cuts can be represented by the union of (n − 1) sets C(µ_n^p),
p = 2, 3, . . . , n, such that µ_n^p is the p-dimensional vector defined by

$$\mu_n^p(i) = \begin{cases} \mu_n(i) & \text{if } 1 \le i \le p-1, \\ 1 - \mu_n(i) & \text{if } i = p. \end{cases}$$

(Note that there is at most one vector µ_n^p in which all entries are 1.)
Given a p-dimensional vector µ, where 2 ≤ p ≤ n, we can find a minimum
2-way cut {Y, V − Y } over all the 2-way cuts in C(µ) in Ff time by applying
procedure A(Sµ, Tµ) if Tµ ≠ ∅, i.e., µ contains at least one 0-valued entry (recall
that v1 ∈ Sµ ); otherwise (if Tµ = ∅, i.e., µ(i) = 1 for all i ∈ {1, . . . , p}), such a
minimum 2-way cut {Y, V − Y } can be found by applying procedure A(Sµ , T )
at most (n − 2) times by choosing T = {vi } for all vi ∈ V − Sµ . Let µ∗ (µ)
denote the n-dimensional {0, 1}-vector that represents the minimum 2-way cut
{Y, V − Y } ∈ C(µ) obtained by this procedure, and let f(µ) be its weight.
With these notations, an algorithm for enumerating all 2-way cuts in the
order of non-decreasing weights can now be described. Initially we compute
µ∗ (µ1 ) and f(µ1 ) for the 1-dimensional vector µ1 with µ1 (1) = 1. Let Q := {µ1 }.
Then we can enumerate 2-way cuts one after another in the non-decreasing order
by repeatedly executing the next procedure B.

Procedure B
Choose a vector α ∈ Q with the minimum weight f(α); µn := µ∗(α);
Output µn; Q := Q − {α};
Let a be the dimension of α, and for each β = µ_n^p, p = a + 1, a + 2, . . . , n,
compute µ∗(β) and f(β), and Q := Q ∪ {β}.

Notice that after deleting α from Q, we do not have to add the vectors
µ_n^1, . . . , µ_n^a to Q at this point (since these vectors have already been
generated and added to Q). It is not difficult to see that procedure B correctly
finds the next smallest 2-way cut. Procedure B computes at most n − 1 vectors
µ∗(β), each of which can be obtained by applying procedure A once, except for
the case where all entries of β are 1 (in this case µ∗(β) is computed by applying
procedure A at most n − 2 times). By maintaining Q in a heap keyed by f(α),
a vector α ∈ Q with the minimum f(α) can be obtained in
O(log |Q|) = O(log(2^n)) = O(n) time. Thus the time delay to find the next smallest
2-way cut is O((n − 2 + n − 1)(Ff + n)) = O(nFf) (assuming Ff = Ω(n)). □
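The partitioning scheme behind Procedure B can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: vertices are indexed 0, ..., n − 1, a 2-way cut is encoded as a full {0,1}-vector whose first entry is 1, and the helper `min_cut_in_class` is a hypothetical brute-force stand-in for procedure A (it takes exponential time, so the delay bound of Theorem 2 does not apply to this sketch). All function names are assumptions made for the example.

```python
import heapq
from itertools import product


def min_cut_in_class(prefix, n, weight):
    """Brute-force stand-in for procedure A: the lightest cut extending prefix.

    Returns (weight, vector), or None if no 2-way cut extends the prefix.
    """
    best = None
    for tail in product((0, 1), repeat=n - len(prefix)):
        vec = prefix + tail
        if all(vec):  # X = V is not a 2-way cut
            continue
        w = weight(vec)
        if best is None or w < best[0]:
            best = (w, vec)
    return best


def enumerate_two_way_cuts(n, weight):
    """Yield (weight, vector) for all 2-way cuts in non-decreasing weight order."""
    heap = []
    first = min_cut_in_class((1,), n, weight)
    if first is not None:
        heapq.heappush(heap, (first[0], first[1], (1,)))
    while heap:
        w, vec, prefix = heapq.heappop(heap)
        yield w, vec
        # Split C(prefix) minus {vec} into the classes that agree with vec on
        # the first p - 1 entries and differ from it in entry p.
        for p in range(len(prefix) + 1, n + 1):
            child = vec[:p - 1] + (1 - vec[p - 1],)
            best = min_cut_in_class(child, n, weight)
            if best is not None:
                heapq.heappush(heap, (best[0], best[1], child))


def graph_cut_weight(edges):
    """Weight function of a graph given as (u, v, weight) triples."""
    def weight(vec):
        return sum(w for u, v, w in edges if vec[u] != vec[v])
    return weight
```

For example, `list(enumerate_two_way_cuts(4, graph_cut_weight(edges)))` lists the 2^3 − 1 = 7 cuts of a 4-vertex graph, lightest first. Replacing `min_cut_in_class` by a maximum flow computation would recover the polynomial delay discussed above.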

It is known that a non-empty and proper subset X that minimizes g(X)
in a submodular system (V, g) can be found in polynomial time by using the
ellipsoid method [5]. In particular, the problem of finding a minimum 2-way cut
which separates two specified disjoint subsets S, T ⊂ V in (V, f) can be solved
in polynomial time, since the problem is reduced to finding a minimum 2-way
cut in the system (V′, g) such that V′ = V − (S ∪ T) and g : 2^{V′} → ℝ is the
submodular function defined by g(X) = f(X ∪ S) − f(S).
For the cut function c in an edge-weighted graph G, it is well known that
a minimum 2-way cut that separates specified two disjoint subsets S, T ⊂ V
can be found by solving a single maximum flow problem. Therefore Ff =
O(mn log(n^2/m)) holds if we use the maximum flow algorithm of [4], where
n and m are the numbers of vertices and edges in G, respectively. In the planar
graph case, we can enumerate cuts more efficiently by making use of the
shortest path algorithm in dual graphs.
Call a simple path between s and t an s, t-path, and let S(n, m) denote the
time to compute a shortest s, t-path in an edge-weighted graph G with n vertices
and m edges.

Lemma 3. For an edge-weighted graph G = (V, E) with n = |V| and m = |E|,
cycles in G can be enumerated in the order of non-decreasing lengths, with
O(S(n, m)) time delay between two consecutive outputs, where the first cycle can
be found in O(mS(n, m)) time.

Proof. It is known [17] that s, t-paths can be enumerated in the order of
non-decreasing lengths with O(S(n, m)) time delay between two consecutive outputs.
Based on this, we enumerate cycles in G in the order of non-decreasing lengths
as follows. Let E = {e_1 = (s_1, t_1), e_2 = (s_2, t_2), . . . , e_m = (s_m, t_m)}. Then the
set S of all cycles in G can be partitioned into m sets S_i, i = 1, . . . , m, of cycles,
where S_i is the set of cycles C ⊆ E such that e_i ∈ C ⊆ E − {e_1, . . . , e_{i−1}}. Also,
a cycle C ∈ S_i is obtained by combining an s_i, t_i-path P ⊆ E − {e_1, . . . , e_i}
(which does not use e_i itself) and the edge e_i = (s_i, t_i), and vice versa. The
shortest cycle in G can be obtained as a cycle with the minimum length among
P_i^* ∪ {e_i}, i = 1, . . . , m, where P_i^* is the shortest s_i, t_i-path in the graph
(V, E − {e_1, . . . , e_i}). If P_j^* ∪ {e_j} is chosen as a cycle of the minimum length,
then we compute the next shortest s_j, t_j-path P in the graph
(V, E − {e_1, . . . , e_j}), and update P_j^* by this P. Then the second shortest cycle
in G can be obtained as a cycle with the minimum length among
P_i^* ∪ {e_i}, i = 1, . . . , m. Thus, by updating P_j^* by the next shortest s_j, t_j-path
P after P_j^* ∪ {e_j} is chosen, we can repeatedly enumerate cycles in the order of
non-decreasing lengths, with O(S(n, m)) time delay. □
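The first step of this enumeration (finding a shortest cycle) can be sketched in Python as follows. The sketch is an illustration under stated assumptions, not the authors' code: vertices are numbered 0, ..., n − 1, edges are (s, t, length) triples with non-negative lengths, Dijkstra's algorithm is used as the shortest path routine, and for each edge e_i the path is searched among the edges after e_i in the list, so that adding e_i to the path closes a cycle. The function names are hypothetical.

```python
import heapq


def shortest_path_length(n, edges, s, t):
    """Length of a shortest s,t-path using only the given undirected edges."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist[t]


def shortest_cycle_length(n, edges):
    """Minimum over i of length(e_i) + shortest s_i,t_i-path avoiding e_1..e_i."""
    best = float("inf")
    for i, (s, t, w) in enumerate(edges):
        later_edges = edges[i + 1:]
        best = min(best, w + shortest_path_length(n, later_edges, s, t))
    return best
```

The function returns infinity for an acyclic graph. Enumerating further cycles would additionally require a next-shortest-path routine such as the one of [17], which is not reproduced here.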

Corollary 4. For an edge-weighted planar graph G = (V, E) with n = |V|,
2-way cuts in G can be enumerated in the order of non-decreasing weights with
O(n) time delay between two consecutive outputs, where the first 2-way cut can
be found in O(n^2) time.

Proof. It is known that a 2-way cut in a planar graph G = (V, E) corresponds
to a cycle in its dual graph G∗ = (V∗, E), where G and G∗ have the same edge
set E. Thus, it suffices to enumerate cycles in G∗ in the order of non-decreasing
lengths, where the length of a cycle C ⊆ E in G∗ is the sum of weights of edges
in C. By Lemma 3 (see Appendix), cycles in G∗ can be enumerated in the order
of non-decreasing lengths with O(S(|V∗|, |E|)) time delay. For a planar graph
G∗, S(|V∗|, |E|) = O(|V∗| + |E|) = O(|V|) is known [8]. □

3 Minimum 3-Way and 4-Way Cuts


Let π = {V1 , . . . , Vk } be a minimum k-way cut in a system (V, f ). Suppose that
the first cut V1 in π is known. Then we need to compute a minimum k-way
cut {V1 , . . . , Vk } in (V, f ) under the restriction that V1 is fixed. In the case of a
graph, the problem can be reduced to finding a minimum (k − 1)-way cut in the
induced subgraph G[V − V1 ]. For k = 3, 4, we now show that such a cut V1 can
be obtained from among O(n) number of cuts.

Theorem 5. For a symmetric submodular system (V, f ) with n = |V | ≥ 4, let

{X1 , V − X1 }, {X2 , V − X2 }, . . . , {Xr , V − Xr }

be the r smallest 2-way cuts in the order of non-decreasing weights, where
r is the smallest integer such that Xr crosses some Xq (1 ≤ q < r). Let us denote
Y1 = Xq − Xr , Y2 = Xq ∩ Xr , Y3 = Xr − Xq , and Y4 = V − (Xq ∪ Xr ). Then:
(i) There is a minimum 3-way cut {V1 , V2 , V3 } of (V, f ) such that V1 = Xi or
V1 = V − Xi for some i ∈ {1, 2, . . . , r − 1}, or {V1 , V2 , V3 } = {Yj , Yj+1 , V −
(Yj ∪ Yj+1 )} for some j ∈ {1, 2, 3, 4} (where Yj+1 = Y1 for j = 4).
(ii) There is a minimum 4-way cut {V1 , . . . , V4 } of (V, f ) such that V1 = Xi or
V1 = V − Xi for some i ∈ {1, 2, . . . , r − 1}, or {V1 , . . . , V4 } = {Y1 , Y2 , Y3 , Y4 }.

Proof. We have f(X1) ≤ f(X2) ≤ · · · ≤ f(Xr) by assumption.
(i) Let a minimum 3-way cut in (V, f) be denoted by {V1, V2, V3} with f(V1) ≤
min{f(V2), f(V3)}. If f(V1) < f(Xr), then the 2-way cut {V1, V − V1} precedes
{Xr, V − Xr} in the enumeration, which implies that V1 = Xi or V1 = V − Xi for
some i < r. Therefore assume f(V1) ≥ f(Xr). Thus, ωf({V1, V2, V3}) ≥ (3/2) f(Xr)
holds. Now the cuts Xq and Xr cross each other, and f(Xq) ≤ f(Xr) holds by
q < r. By (2.3), we have

f(Xq ) + f(Xr ) ≥ f(Xq − Xr ) + f(Xr − Xq ).

Hence at least one of f(Xq − Xr) and f(Xr − Xq) is less than or equal to
max{f(Xq), f(Xr)} (= f(Xr)). Thus,

f (Yi ) ≤ f (Xr ) holds for i = 1 or i = 3.

Assume that f(Xh − Xj ) ≤ f(Xr ) holds for {h, j} = {q, r}. Similarly, we obtain
min{f(Xq ∩ Xr ), f(Xq ∪ Xr )} ≤ f(Xr ) by (2.1). Thus,

f (Yi ) ≤ f (Xr ) holds for i = 2 or i = 4

(note that f(Xq ∪ Xr) = f(V − (Xq ∪ Xr))). Therefore, there is an index j ∈
{1, 2, 3, 4} such that f(Y_j) ≤ f(Xr) and f(Y_{j+1}) ≤ f(Xr) (where Y_{j+1} means
Y_1 for j = 4). Clearly, f(V − (Y_j ∪ Y_{j+1})) is equal to f(Xq) or f(Xr). Therefore,
for the 3-way cut {Y_j, Y_{j+1}, V − (Y_j ∪ Y_{j+1})}, we have

$$\omega_f(\{Y_j, Y_{j+1}, V - (Y_j \cup Y_{j+1})\}) \le \tfrac{3}{2} f(X_r) \le \omega_f(\{V_1, V_2, V_3\}).$$

This implies that {Y_j, Y_{j+1}, V − (Y_j ∪ Y_{j+1})} is also a minimum 3-way cut.
(ii) Let a minimum 4-way cut in (V, f) be denoted by {V1, V2, V3, V4} with f(V1) ≤
min{f(V2), f(V3), f(V4)}. Assume that f(V1) ≥ f(Xr) holds, since otherwise (i.e.,
f(V1) < f(Xr)), we are done. Thus, 2f(Xr) ≤ 2f(V1) ≤ ωf({V1, V2, V3, V4}).
From inequalities (2.1) - (2.3), we then obtain

$$2f(X_r) \ge f(X_q) + f(X_r) \ge \tfrac{1}{2}\bigl[f(X_q - X_r) + f(X_r - X_q) + f(X_q \cap X_r) + f(V - (X_q \cup X_r))\bigr].$$

Therefore, we have (1/2)[f(Xq − Xr) + f(Xr − Xq) + f(Xq ∩ Xr) + f(V − (Xq ∪ Xr))] ≤
f(Xq) + f(Xr) ≤ 2f(Xr) ≤ ωf({V1, V2, V3, V4}), indicating that {Xq − Xr, Xr −
Xq, Xq ∩ Xr, V − (Xq ∪ Xr)} is also a minimum 4-way cut in (V, f). □
Now we are ready to describe a new algorithm for computing minimum 3-way
and 4-way cuts in an edge-weighted graph G. In the algorithm, minimum 2-way
cuts will be stored in X in the non-decreasing order, until X becomes crossing,
and an O(n) number of k-way cuts will be stored in C, from which a k-way cut
of the minimum weight is chosen.

MULTIWAY
Input: An edge-weighted graph G = (V, E) with |V | ≥ 4 and an integer k ∈ {3, 4}.
Output: A minimum k-way cut π.
1 X := C := ∅; i := 1;
2 while X is non-crossing do
3 Find the i-th minimum 2-way cut {Xi , V − Xi } in G;
4 X := X ∪ {Xi };
5 if |V −Xi | ≥ k−1 then find a minimum (k − 1)-way cut {Z1 , . . . , Zk−1 }
in the induced subgraph G[V − Xi ], and add the k-way cut
{Xi , Z1 , . . . , Zk−1 } to C;
6 if |Xi | ≥ k − 1 then find a minimum (k − 1)-way cut {Z′1 , . . . , Z′k−1 }
in the induced subgraph G[Xi ], and add the k-way cut
{Z′1 , . . . , Z′k−1 , V − Xi } to C;
7 i := i + 1
8 end; /* while */
9 Let Xr be the last cut added to X , and choose a cut Xq ∈ X that crosses
Xr , where we denote Y1 = Xq − Xr , Y2 = Xq ∩ Xr , Y3 = Xr − Xq , and
Y4 = V − (Xq ∪ Xr );
10 if k = 3 then add to C 3-way cuts {Yj , Yj+1 , V − (Yj ∪ Yj+1 )},
j ∈ {1, 2, 3, 4} (where Yj+1 = Y1 for j = 4);
11 if k = 4 then add 4-way cut {Y1 , Y2 , Y3 , Y4 } to C;
12 Output a k-way cut π in C with the minimum weight.
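
The control flow of MULTIWAY can be sketched in Python on small graphs. The sketch below is an illustration under stated assumptions, not the authors' implementation: the enumeration of Theorem 2 and the minimum (k − 1)-way cut subroutine are both replaced by brute force (so the running time is exponential), two sets X and Y are taken to cross when X ∩ Y, X − Y, Y − X and V − (X ∪ Y) are all non-empty, and every function name is hypothetical.

```python
from itertools import combinations


def cut(edges, x):
    """Cut value c(X) for edges given as (u, v, weight) triples."""
    return sum(w for u, v, w in edges if (u in x) != (v in x))


def weight(edges, parts):
    """Weight of a k-way cut: half the sum of the cut values of its parts."""
    return sum(cut(edges, p) for p in parts) / 2


def partitions(items, k):
    """Yield every partition of the list `items` into k non-empty blocks."""
    if k == 1:
        if items:
            yield [frozenset(items)]
        return
    if len(items) < k:
        return
    first, rest = items[0], items[1:]
    for r in range(len(rest) - k + 2):
        for extra in combinations(rest, r):
            block = frozenset((first,) + extra)
            remaining = [v for v in rest if v not in block]
            for tail in partitions(remaining, k - 1):
                yield [block] + tail


def brute_min_kway(vertices, edges, k):
    """Brute-force minimum k-way cut of the subgraph induced by `vertices`."""
    sub = [(u, v, w) for u, v, w in edges if u in vertices and v in vertices]
    return min(partitions(sorted(vertices), k), key=lambda p: weight(sub, p))


def crosses(x, y, v):
    """Assumed definition: all four regions of the Venn diagram are non-empty."""
    return all((x & y, x - y, y - x, v - (x | y)))


def multiway(vertices, edges, k):
    v = frozenset(vertices)
    vs = sorted(v)
    # Stand-in for Theorem 2: all 2-way cuts {X, V - X}, sorted by weight.
    sides = [frozenset((vs[0],) + c)
             for r in range(len(vs) - 1) for c in combinations(vs[1:], r)]
    sides.sort(key=lambda x: cut(edges, x))
    family, candidates = [], []
    for x in sides:
        for side in (x, v - x):  # lines 5 and 6
            if len(v - side) >= k - 1:
                candidates.append([side] + brute_min_kway(v - side, edges, k - 1))
        partner = next((y for y in family if crosses(x, y, v)), None)
        family.append(x)  # line 4
        if partner is not None:  # lines 9 to 11
            ys = [partner - x, partner & x, x - partner, v - (partner | x)]
            if k == 3:
                for j in range(4):
                    a, b = ys[j], ys[(j + 1) % 4]
                    candidates.append([a, b, v - (a | b)])
            else:
                candidates.append(ys)
            break
    return min(candidates, key=lambda p: weight(edges, p))  # line 12
```

Calling `multiway(range(6), edges, 3)` on a 6-vertex graph returns a list of three vertex sets; its weight can be compared with that of `brute_min_kway(frozenset(range(6)), edges, 3)` as a sanity check.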

The correctness of algorithm MULTIWAY follows from Theorem 5. To analyze
the running time of MULTIWAY, let Ck(n, m) (k ≥ 2) denote the time
required to find a minimum k-way cut in an edge-weighted graph with n vertices
and m edges.
By Lemma 1, the number r of iterations of the while-loop is at most 2n −
2 = O(n). Therefore, lines 5 and 6 in MULTIWAY require O(n C_{k−1}(n, m))
time in total. The total time required to check whether X is crossing or not is
O(rn^2) = O(n^3). By Theorem 2, r minimum 2-way cuts can be enumerated in
the non-decreasing order in O(rnF(n, m)) = O(n^2 F(n, m)) time, where F(n, m)
denotes the time required to find a minimum (s, t)-cut (i.e., a minimum 2-way
cut that separates two specified vertices s and t) in an edge-weighted graph with
n vertices and m edges. Summing up these, we have

$$C_3(n, m) = O(n^2 F(n, m) + n C_2(n, m) + n^3),$$

$$C_4(n, m) = O(n^2 F(n, m) + n C_3(n, m) + n^3) = O(n^3 F(n, m) + n^2 C_2(n, m) + n^4),$$

establishing the following result.

Theorem 6. For an edge-weighted graph G = (V, E), where n = |V| ≥ 4 and
m = |E|, a minimum 3-way cut and a minimum 4-way cut can be computed
in O(n^2 F(n, m) + n C2(n, m) + n^3) and O(n^3 F(n, m) + n^2 C2(n, m) + n^4) time,
respectively. □
Currently, F(n, m) = O(mn log(n^2/m)) [4] and O(min{n^{2/3}, m^{1/2}}
m log(n^2/m) log U) (where U is the maximum edge weight if weights are all
integers) [3] are among the fastest, and C2(n, m) = O(mn log(n^2/m)) [7] is
known for weighted graphs. Substituting these into Theorem 6, our bounds become

C3(n, m) = O(mn^3 log(n^2/m)) or O(min{n^{2/3}, m^{1/2}} mn^2 log(n^2/m) log U),

C4(n, m) = O(mn^4 log(n^2/m)) or O(min{n^{2/3}, m^{1/2}} mn^3 log(n^2/m) log U).
For unweighted graphs, we have C3(n, m) = O(min{mn^{2/3}, m^{3/2}} n^2) and
C4(n, m) = O(min{mn^{2/3}, m^{3/2}} n^3), since F(n, m) = O(min{mn^{2/3}, m^{3/2}}) is
known for unweighted graphs [2,12]. For a planar graph G, we can obtain better
bounds by applying Corollary 4.

Corollary 7. For an edge-weighted planar graph G = (V, E), where n = |V| ≥ 4,
a minimum 3-way cut and a minimum 4-way cut can be computed in O(n^3)
and O(n^4) time, respectively. □

4 3-Way Cuts in Symmetric Submodular Systems


4.1 General Case
Let (V, f) be a symmetric submodular system. Let us consider whether the
algorithm MULTIWAY can be extended to find a minimum k-way cut in (V, f)
for k = 3, 4. In the case of a graph G = (V, E), if the first cut V1 in a minimum
k-way cut {V1, . . . , Vk} is known, then the remaining task of finding cuts V2, . . . , Vk
can be reduced to computing a minimum (k − 1)-way cut in the induced subgraph
G[V − V1]. However, even if the first cut V1 in a minimum k-way cut {V1, . . . , Vk}
is known in (V, f), finding the rest of cuts V2, . . . , Vk may not be directly reduced
to the minimum (k − 1)-way cut problem in a certain symmetric submodular
system (V − V1, f′) (because we do not have any concept corresponding to the
induced subgraph G[V − V1] in a symmetric submodular system (V, f)). Thus,
we need to find a procedure to compute a minimum k-way cut {V1, . . . , Vk} in
(V, f) under the restriction that V1 is fixed. In what follows, we show that such
a procedure can be constructed for k = 3 (but the case of k = 4 is still open).
For this, we first define a set function f′ on V′ = V − V1 from the cut V1.
Lemma 8. For a symmetric submodular system (V, f) and a subset V1 ⊂ V,
the set function f′ on V′ = V − V1, defined by

$$f'(X) = f(X) + f(V' - X) - f(V'), \qquad X \subseteq V',$$

is symmetric and submodular.
Proof. Clearly f′ is symmetric by definition. For any X, Y ⊆ V′, we obtain

$$\begin{aligned} f'(X) + f'(Y) &= f(X) + f(V' - X) - f(V') + f(Y) + f(V' - Y) - f(V') \\ &\ge f(X \cap Y) + f(X \cup Y) + f(V' - (X \cap Y)) + f(V' - (X \cup Y)) - 2f(V') \\ &= f'(X \cap Y) + f'(X \cup Y), \end{aligned}$$

where the inequality follows from the submodularity of f. □
Note that for a 3-way cut π = {V1, V2, V3} in (V, f), its weight ωf(π) is given by

$$\tfrac{1}{2}\bigl(f(V_1) + f(V_2) + f(V - V_1 - V_2)\bigr) = f(V_1) + \tfrac{1}{2}\bigl(f(V_2) + f(V' - V_2) - f(V_1)\bigr) = f(V_1) + \tfrac{1}{2} f'(V_2)$$

for V′ = V − V1, where the last equality uses f(V′) = f(V1), which holds by the
symmetry of f. Therefore, a minimum 3-way cut {V1, V2, V3}
in (V, f) for a specified V1 can be obtained by solving the minimum 2-way cut
problem in the system (V′, f′).
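Lemma 8 and the weight identity above can be checked numerically on a small instance. The following Python sketch is only an illustration: it takes f to be the cut function of a hypothetical 5-vertex graph (`EDGES` and all names are assumptions made for this example) and verifies the claims by brute force.

```python
from itertools import combinations

# Hypothetical symmetric submodular function: the cut function of a small graph.
EDGES = [(0, 1, 3), (1, 2, 1), (2, 3, 2), (0, 3, 4), (0, 2, 5), (3, 4, 2), (1, 4, 6)]
V = frozenset(range(5))


def f(x):
    """Cut function of the example graph."""
    return sum(w for u, v, w in EDGES if (u in x) != (v in x))


def make_f_prime(v1):
    """Return V' = V - V1 and f'(X) = f(X) + f(V' - X) - f(V')."""
    v_prime = V - v1

    def f_prime(x):
        return f(x) + f(v_prime - x) - f(v_prime)

    return v_prime, f_prime


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def check_lemma8(v1):
    """Brute-force test that f' is symmetric and submodular on V'."""
    v_prime, fp = make_f_prime(v1)
    subs = subsets(v_prime)
    symmetric = all(fp(x) == fp(v_prime - x) for x in subs)
    submodular = all(fp(x) + fp(y) >= fp(x & y) + fp(x | y)
                     for x in subs for y in subs)
    return symmetric and submodular


def three_way_weight(v1, v2):
    """Weight of the 3-way cut {V1, V2, V - V1 - V2} as defined in (2.4)."""
    v3 = V - v1 - v2
    return (f(v1) + f(v2) + f(v3)) / 2
```

With `v1 = frozenset({0})`, `check_lemma8(v1)` holds, and `three_way_weight(v1, v2)` agrees with `f(v1) + f_prime(v2) / 2` for every non-empty proper subset `v2` of V′.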
For a symmetric submodular system (V, f), it is known that a minimum 2-way
cut can be computed in O(n^3 Tf) time
[27,23], where Tf is the time to evaluate the function value f(X) for a given
subset X ⊆ V.
Now we are ready to extend the algorithm MULTIWAY so as to find a
minimum 3-way cut in (V, f). By Theorem 2, we can enumerate 2-way cuts of
(V, f) in the order of non-decreasing weights with O(nFf) time delay, where Ff
is the time required to find a minimum 2-way cut that separates two specified
disjoint subsets S, T ⊂ V in (V, f). Thus, we can enumerate the first r minimum
2-way cuts of (V, f) in O(rnFf) = O(n^2 Ff) time, using the result r = O(n)
of Lemma 1. For each of the O(n) candidates for V1 obtained in this way, we can
compute a minimum 3-way cut {V1, V2, V3} in (V, f) under the restriction that
V1 is fixed by solving a minimum 2-way cut problem in (V′, f′), which takes
O(n^3 Tf) time per candidate and O(n^4 Tf) time in total. Summarizing the above,
we have the following result.
Theorem 9. For a symmetric and submodular system (V, f) with n = |V| ≥ 3,
a minimum 3-way cut can be computed in O(n^2 Ff + n^4 Tf) time, where Ff is the
time required to find a minimum 2-way cut that separates two specified disjoint
subsets S, T ⊂ V in (V, f), and Tf is the time to evaluate the function value
f(X) for a given subset X ⊆ V. □

4.2 Cut Function of a Hypergraph


Now we elaborate upon the result of Theorem 9, assuming that f is the cut
function c of a hypergraph. Let H = (V, E) be an edge-weighted hypergraph
with a vertex set V and a hyper-edge set E, where a hyper-edge e is defined as
a non-empty subset of V. For a hyper-edge e ∈ E, V(e) denotes the set of end
vertices of e (i.e., the vertices incident to e). A cut function c in H is defined by

$$c(X) = \sum \{ w(e) \mid V(e) \cap X \neq \emptyset,\ V(e) \cap (V - X) \neq \emptyset \}, \qquad X \subseteq V,$$

where w(e) is the weight of edge e. It is easy to observe that the cut function
c in a hypergraph H is also symmetric and submodular. We define the weight
ωc (π) of a k-way cut {V1 , . . . , Vk } also by (2.4).
Remark: The weight ωc(π) of a minimum 2-way cut π is equal to the minimum
weight sum of edges whose removal makes the hypergraph disconnected. This is
the same as the case of the cut function in a graph. For k ≥ 3, however, ωc(π)
of a minimum k-way cut π = {V1, . . . , Vk} may not equal the minimum weight
sum γH of edges whose removal generates at least k components, where

$$\gamma_H = \min\{\gamma_H(\pi) \mid k\text{-way cut } \pi\}, \qquad (4.5)$$

$$\gamma_H(\pi) = \sum \{ w(e) \mid e \in E,\ V(e) \not\subseteq V_i \text{ for all } V_i \in \pi \}.$$

This is different from the case of a graph, because a hyper-edge e may contribute
more than its weight w(e) to ωc(π) if e is incident to more than two subsets
in π. □
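
The gap between ωc(π) and γH(π) described in the remark can be seen on a one-edge example. The Python sketch below is an illustration only: hyper-edges are represented as (vertex set, weight) pairs, and the function names are hypothetical.

```python
def hyper_cut(hyperedges, x):
    """c(X): total weight of hyper-edges meeting both X and V - X."""
    return sum(w for e, w in hyperedges if e & x and e - x)


def omega(hyperedges, parts):
    """omega_c(pi): half the sum of the cut values of the parts, as in (2.4)."""
    return sum(hyper_cut(hyperedges, p) for p in parts) / 2


def gamma(hyperedges, parts):
    """gamma_H(pi): weight of hyper-edges not contained in a single part."""
    return sum(w for e, w in hyperedges if not any(e <= p for p in parts))


# A single hyper-edge {1, 2, 3} of weight 2, split across three parts.
H = [(frozenset({1, 2, 3}), 2)]
three_parts = [frozenset({1}), frozenset({2}), frozenset({3})]
two_parts = [frozenset({1}), frozenset({2, 3})]
```

Here `omega(H, three_parts)` is 3 while `gamma(H, three_parts)` is 2, since the hyper-edge is counted once for each of the three parts it meets and then halved. For `two_parts` both quantities equal 2.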

Let H = (V, E) be a hypergraph with |E| ≥ |V|. Given two specified vertices
s, t ∈ V, it is known [19] that a minimum 2-way cut separating s and t in H can
be computed by solving a single maximum flow problem in the auxiliary digraph
with |V| + 2|E| vertices and 2dH + 2|E| edges, where

$$d_H = \sum_{e \in E} |V(e)|.$$

Thus, a minimum 2-way cut separating two disjoint subsets S, T ⊂ V can be
computed in Fc = F(|V| + 2|E|, 2dH + 2|E|) = Õ(|E|dH) time by computing a
minimum 2-way cut separating s and t in the hypergraph obtained from H by
contracting S and T into single vertices s and t, respectively.
For a given subset X ⊂ V′ = V − V1, the function value f′(X) = c(X) +
c(V′ − X) − c(V1) can be evaluated in Tc = O(n + dH) time. Hence, by Theorem 9,
we can compute a minimum 3-way cut in O(|V|^2 Fc + |V|^4 Tc) = Õ(|V|^2 |E| dH +
|V|^4 (|V| + dH)) = Õ((|V|^4 + |V|^2 |E|) dH) time.
The running time can be slightly improved as follows. Notice that for a 3-way
cut π = {V1, V2, V3} in H = (V, E), its weight ωc(π) is written as

$$\omega_c(\pi) = \sum \{ w(e) \mid e \in E,\ |\{i \mid V_i \cap V(e) \neq \emptyset\}| = 2 \} + \frac{3}{2} \sum \{ w(e) \mid e \in E,\ |\{i \mid V_i \cap V(e) \neq \emptyset\}| = 3 \}.$$

For a given cut V1, let us define an edge-weighted hypergraph H[V1] = (V′, E′)
induced from H by V1 by

$$V' = V - V_1 \quad \text{and} \quad E' = \{ e \in E \mid V(e) \cap (V - V_1) \neq \emptyset \},$$

where for each hyper-edge e ∈ E′, the set V′(e) of its end vertices and its weight
w′(e) are redefined by V′(e) = V(e) − V1 and

$$w'(e) = \begin{cases} w(e) & \text{if } V'(e) = V(e) \text{ (i.e., } V(e) \cap V_1 = \emptyset), \\ w(e)/2 & \text{if } V'(e) \subset V(e) \text{ (i.e., } V(e) \cap V_1 \neq \emptyset). \end{cases}$$

Hence, for a 2-way cut π′ = {V2, V3} in H[V1] = (V′, E′), its weight

$$\omega_{c'}(\pi') = \sum \{ w'(e) \mid e \in E',\ V'(e) \cap V_2 \neq \emptyset \neq V'(e) \cap V_3 \}$$

(where c′ denotes the cut function in H[V1]) is equal to

$$\sum \{ w(e) \mid e \in E',\ V(e) \cap V_2 \neq \emptyset \neq V(e) \cap V_3,\ V(e) \cap V_1 = \emptyset \} + \sum \{ w(e)/2 \mid e \in E',\ V(e) \cap V_2 \neq \emptyset,\ V(e) \cap V_3 \neq \emptyset,\ V(e) \cap V_1 \neq \emptyset \}.$$

Thus, c(V1) + ωc′(π′) = ωc(π) holds for a 3-way cut π = {V1, V2, V3} and the cut
function c in H. Therefore, for a fixed cut V1, we can find a minimum 3-way cut
{V1, V2, V3} in H by computing a minimum 2-way cut {V2, V3} in H[V1]. It is
known that a minimum 2-way cut in a hypergraph H = (V, E) can be obtained
in O(|V|dH + |V|^2 log |V|) time [18]. Since we need to solve O(n)
minimum 2-way cut problems in our algorithm, the time complexity becomes
O(|V|^2 F(|V| + |E|, dH) + |V|^2 (dH + |V| log |V|)) = Õ(|V|^2 |E| dH), assuming |E| ≥
|V|.
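
The construction of the induced hypergraph H[V1] and the identity relating c(V1), the weight of {V2, V3} in H[V1], and the weight of {V1, V2, V3} in H can be checked on a small example. The Python sketch below is illustrative only: hyper-edges are (vertex set, weight) pairs, the example hypergraph is made up, and the function names are hypothetical.

```python
def hyper_cut(hyperedges, x):
    """c(X): total weight of hyper-edges meeting both X and its complement."""
    return sum(w for e, w in hyperedges if e & x and e - x)


def omega(hyperedges, parts):
    """Weight of a k-way cut: half the sum of the cut values of its parts."""
    return sum(hyper_cut(hyperedges, p) for p in parts) / 2


def induced(hyperedges, v1):
    """H[V1]: drop V1 from every hyper-edge and halve the weight of those
    hyper-edges that were incident to V1; hyper-edges inside V1 disappear."""
    result = []
    for e, w in hyperedges:
        rest = e - v1
        if rest:
            result.append((rest, w if rest == e else w / 2))
    return result


# Hypothetical example hypergraph on the vertices 1..5.
H = [(frozenset({1, 2, 3}), 2), (frozenset({1, 4}), 3), (frozenset({2, 3}), 1),
     (frozenset({3, 4, 5}), 4), (frozenset({2, 5}), 6), (frozenset({1, 2, 4}), 2)]
V1, V2, V3 = frozenset({1}), frozenset({2, 3}), frozenset({4, 5})
```

For this example, `hyper_cut(H, V1)` is 7, `omega(induced(H, V1), [V2, V3])` is 11, and `omega(H, [V1, V2, V3])` is 18, in agreement with the identity.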

Corollary 10. For an edge-weighted hypergraph H = (V, E), a 3-way cut π
minimizing ωc(π) can be computed in Õ(n^2 m dH) time, where n = |V|, m =
|E| ≥ n, dH = Σ_{e∈E} |V(e)| and c is the cut function in H. □

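As remarked above, the weight ωc(π) of a minimum 3-way cut π in
a hypergraph H = (V, E) may not give the γH of (4.5). However, for any 3-way
cut π, it holds that

$$\gamma_H(\pi) \le \omega_c(\pi) \le \tfrac{3}{2}\,\gamma_H(\pi),$$

since every hyper-edge counted in γH(π) is incident to two or three subsets in π.
Therefore, the minimum value of ωc(π) over all 3-way cuts π is at most 1.5 times γH,
and a 3-way cut π minimizing ωc(π) satisfies γH(π) ≤ 1.5 γH.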

Corollary 11. For an edge-weighted hypergraph H = (V, E), there is a
1.5-approximation algorithm for the problem of minimizing the sum of weights of
hyper-edges whose removal leaves at least 3 components. The algorithm runs in
Õ(n^2 m dH) time, where n = |V|, m = |E| and dH = Σ_{e∈E} |V(e)| ≥ n. □

For k = 4, it is not known how to find a minimum 4-way cut {V1, . . . , V4} in
a symmetric submodular system (or in a hypergraph) under the condition that
V1 is fixed. By using the above argument, we can find a minimum 4-way cut
in a hypergraph H = (V, E) if every hyper-edge e has at most three incident
vertices (i.e., 2 ≤ |V(e)| ≤ 3 for all e ∈ E). This is because if |V(e)| ≤ 3 for all
e ∈ E, then ωc(π) = c(V1) + ωc′(π′) holds for the weight ωc(π) of a 4-way cut π =
{V1, V2, V3, V4} in H and the weight ωc′(π′) of a 3-way cut π′ = {V2, V3, V4} in the
induced hypergraph H[V1] as defined above. The algorithm runs in Õ(|V|^2 Fc +
|V|(|V|^2 |E| dH)) = Õ(|V|^3 |E| dH) time. Also, in this class of hypergraphs, a
4-way cut π that minimizes the weight ωc(π) is a 1.5-approximation to the problem
of finding a 4-way cut π that minimizes γH(π).

Corollary 12. Let H = (V, E) be an edge-weighted hypergraph such that every
hyper-edge e ∈ E has at most three incident vertices, and let n = |V|, m =
|E| ≥ n and dH = Σ_{e∈E} |V(e)|. Then a 4-way cut π that minimizes ωc(π) can be
computed in Õ(n^3 m dH) time, and its γH(π) value is at most 1.5 times the minimum
weight sum γH of hyper-edges whose removal leaves at least 4 components in H.
□

5 Concluding Remark

In this paper, we proposed an algorithm for computing minimum 3-way and
4-way cuts in a graph, based on the enumeration algorithm of 2-way cuts in the
non-decreasing order. The algorithm is then extended to the problem of finding
a minimum 3-way cut in a symmetric submodular system. As a result of this,
the minimum 3-way cut problem in a hypergraph can be solved in polynomial
time. It is left for future work how to extend the algorithm to the case of k ≥
5 in graphs and the case of k ≥ 4 in symmetric submodular systems (or in
hypergraphs). Very recently, we proved that the minimum k-way cuts in graphs
can be computed in O(mn^k log(n^2/m)) time for k = 5, 6 [25].

Acknowledgments
This research was partially supported by the Scientific Grant-in-Aid from
Ministry of Education, Science, Sports and Culture of Japan, and the subsidy
from the Inamori Foundation. We would like to thank Professor Naoki Katoh
and Professor Satoru Iwata for their valuable comments.

References
1. M. Burlet and O. Goldschmidt, A new and improved algorithm for the 3-cut problem,
Operations Research Letters, vol.21, (1997), pp. 225–227.
2. S. Even and R. E. Tarjan, Network flow and testing graph connectivity, SIAM J.
Computing, vol.4, (1975), pp. 507–518.
3. A. V. Goldberg and S. Rao, Beyond the flow decomposition barrier, Proc. 38th IEEE
Annual Symp. on Foundations of Computer Science, (1997), pp. 2–11.
4. A. V. Goldberg and R. E. Tarjan, A new approach to the maximum flow problem,
J. ACM, vol.35, (1988), pp. 921–940.
5. M. Grötschel, L. Lovász and A. Schrijver, Geometric Algorithms and Combinatorial
Optimization, Springer, Berlin (1988).
6. O. Goldschmidt and D. S. Hochbaum, A polynomial algorithm for the k-cut problem
for fixed k, Mathematics of Operations Research, vol.19, (1994), pp. 24–37.
7. J. Hao and J. Orlin, A faster algorithm for finding the minimum cut in a directed
graph, J. Algorithms, vol. 17, (1994), pp. 424–446.
8. M. R. Henzinger, P. Klein, S. Rao and D. Williamson, Faster shortest-path
algorithms for planar graphs, J. Comp. Syst. Sc., vol. 53, (1997), pp. 2-23.
9. D. S. Hochbaum and D. B. Shmoys, An O(|V |2 ) algorithm for the planar 3-cut
problem, SIAM J. Algebraic Discrete Methods, vol.6, (1985), pp. 707–712.
10. Y. Kamidoi, S. Wakabayashi and N. Yoshida, Faster algorithms for finding a min-
imum k-way cut in a weighted graph, Proc. IEEE International Symposium on Cir-
cuits and Systems, (1997), pp. 1009–1012.
11. Y. Kamidoi, S. Wakabayashi and N. Yoshida, A new approach to the minimum
k-way partition problem for weighted graphs, Technical Report of Inst. Electron.
Inform. Comm. Eng., COMP97-25, (1997), pp. 25–32.
12. S. Kapoor, On minimum 3-cuts and approximating k-cuts using cut trees, Lecture
Notes in Computer Science 1084, Springer-Verlag, (1996), pp. 132–146.
13. D. R. Karger, Minimum cuts in near-linear time, Proceedings 28th ACM Sympo-
sium on Theory of Computing, (1996), pp. 56–63.
14. D. R. Karger and C. Stein, An Õ(n2 ) algorithm for minimum cuts, Proceedings
25th ACM Symposium on Theory of Computing, (1993), pp. 757–765.
15. D. R. Karger and C. Stein, A new approach to the minimum cut problems, J. ACM,
vol.43, no.4, (1996), pp. 601–640.
16. A. V. Karzanov, O nakhozhdenii maksimal’nogo potoka v setyakh spetsial’nogo vida
i nekotorykh prilozheniyakh, In Mathematicheskie Voprosy Upravleniya Proizvod-
stvom, volume 5, Moscow State University Press, Moscow, (1973). In Russian; title
translation: On finding maximum flows in networks with special structure and some
applications.
17. N. Katoh, T. Ibaraki and H. Mine, An efficient algorithm for K shortest simple
paths, Networks, vol.12, (1982), pp. 411–427.
18. R. Klimmek and F. Wagner, A simple hypergraph min cut algorithm, Technical Re-
port B 96-02, Department of Mathematics and Computer Science, Freie Universität
Berlin (1996).
19. E. L. Lawler, Cutsets and partitions of hypergraphs, Networks, vol.3, (1973), pp.
275–285.
20. C. H. Lee, M. Kim and C. I. Park, An efficient k-way graph partitioning algo-
rithm for task allocation in parallel computing systems, Proc. IEEE Int. Conf. on
Computer-Aided Design, (1990), pp. 748–751.
21. T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, Wiley (1990).
22. H. Nagamochi and T. Ibaraki, Computing the edge-connectivity of multigraphs and
capacitated graphs, SIAM J. Discrete Mathematics, vol.5, (1992), pp. 54-66.
23. H. Nagamochi and T. Ibaraki, A note on minimizing submodular functions, Infor-
mation Processing Letters, vol. 67, (1998), pp.239–244.
24. H. Nagamochi, K. Nishimura and T. Ibaraki, Computing all small cuts in undirected
networks, SIAM J. Discrete Mathematics, vol. 10, (1997), pp. 469–481.
25. H. Nagamochi, S. Katayama and T. Ibaraki, A faster algorithm for computing
minimum 5-way and 6-way cuts in graphs, Technical Report, Department of Applied
Mathematics and Physics, Kyoto University, 1999.
26. Pulleyblank, Presentation at SIAM Meeting on Optimization, MIT, Boston, MA
(1982).
27. M. Queyranne, Minimizing symmetric submodular functions, Mathematical Pro-
gramming, 82 (1998), pp. 3–12.
28. V. V. Vazirani and M. Yannakakis, Suboptimal cuts: Their enumeration, weight,
and number, Lecture Notes in Computer Science, 623, Springer-Verlag, (1992), pp.
366–377.
Scheduling Two Machines with Release Times

John Noga¹ and Steve Seiden²

¹ Technische Universität Graz, Institut für Mathematik B, Steyrergasse 30/ 2. Stock, A-8010 Graz, Austria
² Max-Planck-Institut für Informatik, Im Stadtwald, D-66123 Saarbrücken, Germany

Abstract. We present a deterministic online algorithm for scheduling
two parallel machines when jobs arrive over time and show that it is
(5 − √5)/2 ≈ 1.38198-competitive. The best previously known algorithm
is 3/2-competitive. Our upper bound matches a previously known lower
bound, and thus our algorithm has the best possible competitive ratio.
We also present a lower bound of 1.21207 on the competitive ratio of any
randomized online algorithm for any number of machines. This improves
a previous result of 4 − 2√2 ≈ 1.17157.

1 Introduction
We consider one of the most basic scheduling problems: scheduling parallel ma-
chines when jobs arrive over time with the objective of minimizing the makespan.
This problem is formulated as follows: There are m machines and n jobs. Each
job j has a release time r_j and a processing time p_j. An algorithm must assign
each job to a machine and fix a start time. No machine can run more than one
job at a time and no job may start prior to being released. For job j let s_j be
the start time in the algorithm's schedule. We define the completion time for
job j to be c_j = s_j + p_j. The makespan is max_j c_j. The algorithm's goal is to
minimize the makespan.
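The problem statement above can be made concrete with a short Python sketch that validates a schedule and computes its makespan. The data layout (dictionaries keyed by job name) and the function name are assumptions made for this illustration, not notation from the paper.

```python
def makespan(jobs, schedule):
    """Return max_j c_j for a feasible schedule, or raise ValueError.

    jobs:     {job: (release_time, processing_time)}
    schedule: {job: (machine, start_time)}
    """
    intervals = {}
    for job, (release, processing) in jobs.items():
        machine, start = schedule[job]
        if start < release:
            raise ValueError(f"job {job} starts before its release time")
        intervals.setdefault(machine, []).append((start, start + processing))
    for machine, spans in intervals.items():
        spans.sort()
        # Consecutive jobs on one machine must not overlap in time.
        for (_, end), (next_start, _) in zip(spans, spans[1:]):
            if next_start < end:
                raise ValueError(f"machine {machine} runs two jobs at once")
    return max(end for spans in intervals.values() for _, end in spans)
```

For instance, with jobs a = (0, 3), b = (0, 2), c = (1, 2), running a on machine 0 from time 0, b on machine 1 from time 0 and c on machine 1 from time 2 gives completion times 3, 2 and 4, hence a makespan of 4.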
We study an online version of this problem, where jobs are completely un-
known until their release times. In contrast, in the offline version all jobs are
known in advance. Since it is in general impossible to solve the problem optimally
online, we consider algorithms which approximate the best possible solution.
Competitive analysis is a type of worst case analysis where the performance
of an online algorithm is compared to that of the optimal offline algorithm. This
approach to analyzing online problems was initiated by Sleator and Tarjan, who
used it to analyze the List Update problem [17]. The term competitive analysis
originated in [12]. For a given job set σ, let costA (σ) be the cost incurred by
an algorithm A on σ. Let cost(σ) be the cost of the optimal schedule for σ. A
scheduling algorithm A is ρ-competitive if

costA (σ) ≤ ρ · cost(σ),

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 391–399, 1999.
© Springer-Verlag Berlin Heidelberg 1999

for all job sets σ. The competitive ratio of A is the infimum of the set of values
ρ for which A is ρ-competitive. Our goal is to find the algorithm with smallest
possible competitive ratio.
A great deal of work has been done on parallel machine scheduling when
jobs are presented in a list and must be immediately scheduled prior to any
processing [1, 2, 3, 4, 5, 7, 8, 9, 11, 14, 15]. On the other hand, very little
work has been done on the version which we address here, despite the fact
that this version of the problem seems more realistic. The known results are
as follows: Hall and Shmoys show that the List algorithm of Graham is 2-
competitive for all m [9, 10]. In [16], Shmoys, Wein and Williamson present a
general online algorithm for scheduling with release dates. This algorithm uses as
a subroutine an offline algorithm for the given problem. If the offline algorithm is
an α-approximation, then the online algorithm is 2α-competitive. They also show
a lower bound of 10/9 on the competitive ratio for online scheduling of parallel
machines with release times. The LPT algorithm starts the job with largest
processing time whenever a machine becomes available. Chen and Vestjens show
that LPT is 3/2-competitive for all m [6]. These same authors also show a lower
bound of 3 − ϕ ≈ 1.38197 for m = 2 and a general lower bound of 1.34729, where
ϕ = (1 + √5)/2 ≈ 1.61803 is the golden ratio. Stougie and Vestjens [18] show a
randomized lower bound of 4 − 2√2 ≈ 1.17157 which is valid for all m.
We present an algorithm for scheduling two parallel machines, called Sleepy,
and show that it is (3−ϕ)-competitive, and thus has the best possible competitive
ratio. We also show a randomized lower bound of
\[ 1 + \max_{0 \le u < 1} \frac{u}{1 - 2\ln(1-u)} > 1.21207. \]

Finally, we show that any distribution over two deterministic algorithms is at
best 5/4-competitive. It is our hope that these results arouse interest in the general
m-machine problem.

2 The Sleepy Algorithm


We say that a job is available if it has been released, but not yet scheduled. A
machine that is not processing a job is called idle. Define

\[ \alpha = 2 - \varphi = \frac{3 - \sqrt{5}}{2} \approx 0.38197. \]
Our algorithm, which we call Sleepy, is quite simple. If both machines are
processing jobs or there are no available jobs then we have no choice but to wait.
So, assume that there is at least one idle machine and at least one available job. If
both machines are idle then start the available job with largest processing time.
If at time t, one machine is idle and the other machine is running job j then
start the available job with largest processing time if and only if t ≥ sj + α · pj .
With this algorithm a machine can be idle for two reasons. If a machine is idle
because t < sj + α · pj we say the machine is sleeping. If a machine is idle because
Scheduling Two Machines with Release Times 393

there are no available jobs we say that the machine is waiting. If a machine is
waiting we call the next job released tardy.
We claim that Sleepy is (1+α)-competitive. By considering two jobs of size 1
released at time 0 we see that Sleepy is no better than (1 + α)-competitive.
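
The rule described above can be sketched as a small event-driven simulation. This is only an illustrative sketch and not code from the paper: the function name sleepy_makespan, the representation of a job as a (release time, processing time) pair, and the use of floating-point time are assumptions made here.

```python
import math

ALPHA = (3 - math.sqrt(5)) / 2  # alpha = 2 - phi, as defined in the text


def sleepy_makespan(jobs, alpha=ALPHA):
    """Simulate Sleepy on two machines; jobs is a list of (release, processing) pairs."""
    pending = sorted(jobs)   # not yet released, ordered by release time
    available = []           # processing times of released but unscheduled jobs
    running = []             # (start, processing) of jobs currently on a machine
    t, makespan = 0.0, 0.0
    while pending or available or running:
        # Release every job whose release time has arrived.
        while pending and pending[0][0] <= t:
            available.append(pending.pop(0)[1])
        # Drop jobs that have completed by time t.
        running = [(s, p) for (s, p) in running if s + p > t]
        # Start jobs for as long as the Sleepy rule allows it.
        while available and len(running) < 2:
            if running and t < running[0][0] + alpha * running[0][1]:
                break        # the idle machine is sleeping
            p = max(available)
            available.remove(p)
            running.append((t, p))
            makespan = max(makespan, t + p)
        # Advance to the next event: a release, a completion, or a wake-up.
        events = [r for (r, _) in pending[:1]]
        events += [s + p for (s, p) in running]
        if available and len(running) == 1:
            events.append(running[0][0] + alpha * running[0][1])
        future = [e for e in events if e > t]
        if not future:
            break
        t = min(future)
    return makespan


if __name__ == "__main__":
    # Two unit jobs released at time 0: the second one starts at time alpha.
    print(sleepy_makespan([(0, 1), (0, 1)]), 1 + ALPHA)
```

On the instance of two unit jobs released at time 0 the sketch keeps the second machine asleep until time α, so its makespan is 1 + α while the optimal makespan is 1.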

3 Analysis

We will show, by contradiction, that no set of jobs with optimal offline makespan
1 causes Sleepy to have a makespan more than 1 + α. This suffices, since we
can rescale any nonempty job set to have optimal offline makespan 1.
Without loss of generality, we identify jobs with integers 1, . . . , n so that
c1 ≤ c2 ≤ · · · ≤ cn . Assume that ϱ = {(pi , ri ) : i = 1, . . . , n} is a minimum size set of
jobs with optimal offline makespan 1 which causes Sleepy to have a makespan
cn > 1 + α. Let A be the machine which runs job n and B be the other machine.
The remainder of the analysis consists of verifying a number of claims. Intu-
itively, what we want to show is: 1) the last three jobs to be released are the last
three jobs completed by Sleepy, 2) two of the last three jobs have processing
time greater than α and the third has processing time greater than 1 − α, and
3) replacing the last three jobs with one job results in a smaller counterexample.

Claim (Chen and Vestjens [6]). Without loss of generality, there is no time
period during which Sleepy has both machines idle.

Proof. Assume both machines are idle during (t1 , t2 ) and that there is no time
later than t2 when both machines are idle. If there is a job with release time prior
to t2 then removing this job results in a set with fewer jobs, the same optimal
offline cost, and the same cost for Sleepy. If no job is released before t2 then
ϱ′ = {(pi /(1 − t2 ), (ri − t2 )/(1 − t2 )) : i = 1, . . . , n} is a job set with optimal makespan 1,
greater cost for Sleepy, and no time period when Sleepy has both machines
idle. Note ϱ′ is simply ϱ with all release times decreased by t2 and all jobs rescaled
so that the optimal cost remains 1.

Claim. The last job on machine B completes at time cn−1 < 1.

Proof. Since there is no period when Sleepy has both machines idle, the last
job on machine B must complete at or after sn and so cn−1 ≥ sn .
Let j be the last tardy job and k be the job running when the last tardy
job is released. If there are no tardy jobs let rj = ck = 0. At time rj the
optimal algorithm may or may not have completed all jobs released before rj .
Let p (possibly 0) be the amount of processing remaining to be completed by
the optimal algorithm at time rj on jobs released before rj .
Since ϱ is a minimum size counterexample, ck ≤ (1 + α)(rj + p). Note that
between time rj and cn the processing done by Sleepy is exactly the processing
done by the optimal algorithm plus ck − rj − p.
It follows from the previous claim that there is a set of jobs $\sigma_1, \ldots, \sigma_\ell$ such
that $s_{\sigma_1} \le c_k < c_{\sigma_1}$, $s_{\sigma_i} \le s_{\sigma_{i+1}}$, and $c_{\sigma_\ell} = c_{n-1}$.

Consider the time period (ck , cσ1 ). During this period neither machine is
waiting. So, at least (2 − α)(cσ1 − ck ) processing is completed by Sleepy. Sim-
ilarly, during the time period (cσi , cσi+1 ) at least (2 − α)(cσi+1 − cσi ) processing
is completed by Sleepy for i = 1, 2, . . . , ℓ − 1. Therefore, during the period
(ck , cn−1 ), Sleepy completes at least (2 − α)(cn−1 − ck ) processing.
Since the optimal algorithm can process at most 2(1 − rj ) after rj ,

2(ck − rj ) + (2 − α)(cn−1 − ck ) + (cn − cn−1 ) − (ck − rj − p) ≤ 2(1 − rj ).

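Therefore,
\[ (1 - \alpha)(c_{n-1} - c_k) \le 2 - r_j - p - c_n, \]
and further, using 1/(1 − α) = 2 − α, cn > 1 + α, (2 − α)(1 − α) = 1 and ck ≤ (1 + α)(rj + p),
\begin{align*}
c_{n-1} &\le \frac{2 - r_j - p - c_n}{1 - \alpha} + c_k \\
&= (2 - \alpha)(2 - r_j - p - c_n) + c_k \\
&< (2 - \alpha)(1 - \alpha - r_j - p) + c_k \\
&= 1 - (2 - \alpha)(r_j + p) + c_k \\
&\le 1 - (1 - 2\alpha)(r_j + p) \\
&\le 1.
\end{align*}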

The previous claim implies that pn ≥ cn − cn−1 > α.
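
The analysis repeatedly uses a few identities that follow from α = 2 − ϕ, such as 1/(1 − α) = 2 − α and (2 − α)α = 1 − α. The following floating-point sanity check is a sketch added for illustration, not part of the original argument; the dictionary name identities is an assumption made here.

```python
import math

phi = (1 + math.sqrt(5)) / 2   # golden ratio
alpha = 2 - phi                # the constant used by Sleepy

# Each entry pairs the left-hand side with the right-hand side of an identity.
identities = {
    "1/(1-alpha) = 2-alpha": (1 / (1 - alpha), 2 - alpha),
    "(2-alpha)(1-alpha) = 1": ((2 - alpha) * (1 - alpha), 1.0),
    "(2-alpha)alpha = 1-alpha": ((2 - alpha) * alpha, 1 - alpha),
    "alpha(1-alpha) = 1-2alpha": (alpha * (1 - alpha), 1 - 2 * alpha),
    "1+alpha = 3-phi": (1 + alpha, 3 - phi),
}

for name, (lhs, rhs) in identities.items():
    print(f"{name}: {math.isclose(lhs, rhs, rel_tol=1e-12)}")
```

All of these reduce to the defining equation ϕ² = ϕ + 1, equivalently α² − 3α + 1 = 0.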


Claim. sn > sn−1 + α · pn−1 .

Proof. If sn ≤ sn−1 then removing job n − 1 results in a smaller set with no
greater offline cost and no less cost for Sleepy, a contradiction. From the
definition of Sleepy, sn ≥ sn−1 + α · pn−1 . Suppose that this is satisfied with
equality. If pn > pn−1 then rn ≥ sn−1 . Therefore, we have

cn = sn + pn = sn−1 + α · pn−1 + pn ≤ α · pn−1 + rn + pn ≤ 1 + α.

On the other hand, if pn ≤ pn−1 then

cn = sn + pn ≤ sn−1 + α · pn−1 + pn−1 ≤ cn−1 + α · pn−1 ≤ 1 + α.

Claim. cn−2 = sn .

Proof. Since sn > sn−1 + α · pn−1 , either cn−2 = sn or machine A is idle
immediately before time sn . In the latter case, we have sn = rn and therefore
cn = sn + pn = rn + pn ≤ 1, which is a contradiction.

Claim. pn−1 ≥ cn−2 − sn−1 > α.

Proof. It is easily seen that pn−1 ≥ cn−2 − sn−1 . If cn−2 − sn−1 ≤ α then
pn−1 = (cn−1 − cn−2 ) + (cn−2 − sn−1 ) ≤ (cn−1 − cn−2 ) + α < pn . Therefore
pn−1 − cn−1 + cn−2 ≤ α and further rn ≥ sn−1 . This means cn = sn−1 + pn−1 +
pn − (cn−1 − cn−2 ) ≤ rn + pn + α ≤ 1 + α, a contradiction.

Claim. pn−2 > α.


Proof. If pn−2 ≤ α then pn−2 < pn . So, rn ≥ sn−2 . This means cn = sn−2 +
pn−2 + pn ≤ rn + α + pn ≤ 1 + α, a contradiction.
Of the jobs n − 1 and n − 2, let j be the job which starts first and k be the other.
Claim. pj > 1 − α.
Proof. We have pj = cj − sj ≥ cn−2 − sj . Certainly, sj + α · pj ≤ sk . So,
(1 − α)(cn−2 − sj ) ≥ cn−2 − sj − α · pj ≥ cn−2 − sk . Therefore,

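\begin{align*}
p_j &\ge c_{n-2} - s_j \\
&\ge \frac{c_{n-2} - s_k}{1 - \alpha} \\
&= (2 - \alpha)(c_{n-2} - \max\{s_{n-1}, s_{n-2}\}) \\
&= (2 - \alpha)\min\{c_{n-2} - s_{n-1},\, p_{n-2}\} \\
&> (2 - \alpha)\alpha \\
&= 1 - \alpha.
\end{align*}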

Claim. sj + α · pj = sk .
Proof. No job other than n, n − 1, and n − 2 requires more than α processing
time. Note that jobs k and n must go on the same machine in the optimal
schedule. This means either k or n must be released before time 1 − 2α. Suppose
job i has si ≤ sj and ci ≥ sj + α · pj . Then si + α · pi ≤ sj and pi ≥ α · pi +
α · pj ≥ α · pi + α(1 − α). This implies that pi ≥ α, a contradiction. Therefore,
any job that starts before job j must complete before time sj + α · pj : Since
α · pj > α(1 − α) = 1 − 2α, we have sk = sj + α · pj .
Claim. min{rn , rn−1 , rn−2 } > maxi≤n−3 {si }.
Proof. Since jobs n, n − 1, and n − 2 are larger than all other jobs and start later,
they must have been released after maxi≤n−3 {si }.
Claim. pn + pk − pj ≥ 0.
Proof. We have

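\begin{align*}
p_n + p_k - p_j &= p_n + c_k - s_k - (c_j - s_j) \\
&= p_n + c_k - c_j - \alpha \cdot p_j \\
&\ge p_n + c_k - c_j - \alpha \\
&= c_n - c_{n-1} + c_{n-1} - c_{n-2} + c_k - c_j - \alpha \\
&> c_{n-1} - c_{n-2} + c_k - c_j \\
&\ge 0.
\end{align*}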

Let ϱ′ be the set of jobs {(pi , ri ) : i = 1, . . . , n − 3} ∪ {(pn + pk − pj , min{rn , rn−1 , rn−2 })}.
This set is valid by the preceding claim.

Claim. The optimal cost of ϱ′ is no more than 1 − pj .

Proof. Since we can assume that on each of the machines individually the optimal
algorithm orders its jobs by release times, this set can be processed identically
to ϱ for the first n − 3 jobs. The new job (pn + pk − pj , min{rn , rn−1 , rn−2 }) runs
on the machine which would have run k. This schedule has cost no more than
1 − pj .

Claim. Sleepy’s cost on ϱ′ is at least cn − (1 + α)pj .

Proof. The first n − 3 jobs are served identically. The final job starts at time sj .
The makespan is at least sj + pn + pk − pj = sk + pk + pn − (1 + α)pj ≥ cn − (1 + α)pj .

If we rescale ϱ′ to have optimal cost 1 then we have our contradiction (a smaller
counterexample). We therefore have:
Theorem 1. Sleepy is (1 + α)-competitive for α = 2 − ϕ.

4 Randomized Lower Bounds

First we show a lower bound which improves on the previous result of Stougie
and Vestjens [18, 19]:
Theorem 2. No randomized algorithm for m ≥ 2 machines is ρ-competitive
with ρ less than
\[ 1 + \max_{0 \le u < 1} \frac{u}{1 - 2\ln(1-u)} > 1.21207. \]

Proof. We make use of the von Neumann/Yao principle [21, 20]. Let 0 ≤ u < 1 be
a real constant. We show a distribution over job sets such that the expectation
of the competitive ratio of all deterministic algorithms is at least 1 + u/(1 −
2 ln(1 − u)). Let
\[ q = \frac{1}{1 - 2\ln(1-u)}, \qquad p(y) = \frac{1}{(y-1)\ln(1-u)}. \]

We now define the distribution. In all cases, we give two jobs of size 1 at
time 0. With probability q, we give no further jobs. With probability 1 − q, we
pick y at random from the interval [0, u] using the probability density p(y) and
release m − 1 jobs of size 2 − y at time y. Note that q ≥ 0 and that $\int_0^u p(y)\,dy = 1$,
and therefore this is a valid distribution.
We now analyze the expected competitive ratio incurred by a deterministic
online algorithm on this distribution. Let x be the time at which the algorithm
would start the second of the two size one jobs, given that it never receives the
jobs of size 2 − y. If the third job is not given, the algorithm’s cost is 1 + x while
the optimal offline cost is 1. If the jobs of size 2 − y are given at time y ≤ x,
then the algorithm’s cost is at least 2 and this is also the optimal offline cost.
If the jobs of size 2 − y are given at time y > x, then the algorithm’s cost is at

least 3 − y while the optimal offline cost is again 2. First consider x ≥ u. In this
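case, the expected competitive ratio is
\begin{align*}
E_\sigma\!\left[\frac{\mathrm{cost}_A(\sigma)}{\mathrm{cost}(\sigma)}\right]
&\ge q(1 + x) + (1 - q)\int_0^u p(y)\,dy \\
&\ge \frac{1 + u}{1 - 2\ln(1-u)} + \frac{-2\ln(1-u)}{1 - 2\ln(1-u)} \int_0^u \frac{dy}{(y-1)\ln(1-u)} \\
&= 1 + \frac{u}{1 - 2\ln(1-u)}.
\end{align*}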

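Now consider x < u. In this case we have
\begin{align*}
E_\sigma\!\left[\frac{\mathrm{cost}_A(\sigma)}{\mathrm{cost}(\sigma)}\right]
&\ge q(1 + x) + (1 - q)\left( \int_0^x p(y)\,dy + \int_x^u p(y)\,\frac{3 - y}{2}\,dy \right) \\
&= \frac{1 + x}{1 - 2\ln(1-u)} + \frac{-2}{1 - 2\ln(1-u)} \left( \int_0^x \frac{dy}{y - 1} + \int_x^u \frac{3 - y}{2}\,\frac{dy}{y - 1} \right) \\
&= \frac{1 + x}{1 - 2\ln(1-u)} + \frac{-2}{1 - 2\ln(1-u)} \left( \ln(1 - u) - \frac{u}{2} + \frac{x}{2} \right) \\
&= 1 + \frac{u}{1 - 2\ln(1-u)}.
\end{align*}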

Since the adversary may choose u, the theorem follows. Choosing u = 0.575854
yields a bound of at least 1.21207.
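
The choice of u can be reproduced numerically. The sketch below is an illustration added here, not the authors' computation: it maximizes the bound by golden-section search, and the names bound and maximize, the search interval [0, 0.999] and the iteration count are assumptions. The search relies on the bound being unimodal on [0, 1), which holds because the numerator of its derivative, 1 − 2 ln(1 − u) − 2u/(1 − u), is strictly decreasing in u.

```python
import math


def bound(u):
    """Lower bound 1 + u / (1 - 2 ln(1 - u)) for 0 <= u < 1."""
    return 1 + u / (1 - 2 * math.log(1 - u))


def maximize(f, lo, hi, iters=200):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # the maximizer lies in [a, d]
        else:
            a = c   # the maximizer lies in [c, b]
    return (a + b) / 2


u_star = maximize(bound, 0.0, 0.999)

if __name__ == "__main__":
    print(f"u* = {u_star:.6f}, bound = {bound(u_star):.6f}")
```

The maximizer found this way should agree with the value u = 0.575854 quoted in the proof up to the precision of the search.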

One natural class of randomized online algorithms is the class of barely random
algorithms [13]. A barely random online algorithm is one which is a distribution
over a constant number of deterministic algorithms. Such algorithms are often
easier to analyze than general randomized ones. We show a lower bound for any
barely random algorithm which is a distribution over two deterministic algo-
rithms:

Theorem 3. No distribution over two deterministic online algorithms is ρ-competitive with ρ less than 5/4.

Proof. Suppose there is an algorithm which is (5/4 − ε)-competitive for ε > 0. We
give the algorithm two jobs of size 1 at time 0. With some probability p, the
algorithm starts the second job at time t1 , while with probability 1 − p, it starts
it at time t2 . Without loss of generality, t1 < t2 . First, we must have
\[ 1 + p t_1 + (1 - p) t_2 < \frac{5}{4}, \]
and therefore
\[ p > \frac{4t_2 - 1}{4(t_2 - t_1)}. \]

We also conclude that t1 < 1/4. Now suppose we give the algorithm m − 1 jobs of
size 2 − t1 − δ at time t1 + δ. Then its competitive ratio is at least
\[ \frac{p(3 - t_1 - \delta) + (1 - p)2}{2} = \frac{1 - t_1 - \delta}{2}\,p + 1 > \frac{(1 - t_1 - \delta)(4t_2 - 1)}{8(t_2 - t_1)} + 1. \]
Since the competitive ratio is less than 5/4, the right-hand side is less than 5/4, and therefore
\[ t_2 < \frac{1 - 3t_1 - \delta}{2 - 4t_1 - 4\delta}. \]
Since this is true for all δ > 0 we have
\[ t_2 < \frac{1 - 3t_1}{2 - 4t_1}. \]
If instead we give m − 1 jobs of size 2 − t2 − δ at time t2 + δ, then the competitive
ratio is at least
\[ \frac{3 - t_2 - \delta}{2} > \frac{5 - 9t_1 - 2\delta + 4t_1\delta}{4 - 8t_1}. \]
Again, this holds for all δ > 0 and so we have
\[ \frac{5 - 9t_1}{4 - 8t_1} \le \frac{5}{4} - \varepsilon, \]
which is impossible for 0 ≤ t1 < 1/4.
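
The last step holds because the excess of (5 − 9t1)/(4 − 8t1) over 5/4 equals t1/(4 − 8t1), which is nonnegative for 0 ≤ t1 < 1/4. The following exact-arithmetic check on a grid is a sketch added for illustration; the name final_ratio and the grid spacing are arbitrary choices made here.

```python
from fractions import Fraction


def final_ratio(t1):
    """Left-hand side (5 - 9 t1) / (4 - 8 t1) of the last inequality."""
    return (5 - 9 * t1) / (4 - 8 * t1)


# Exact rational grid on [0, 1/4); the spacing 1/400 is an arbitrary choice.
grid = [Fraction(i, 400) for i in range(100)]

# The excess over 5/4 equals t1 / (4 - 8 t1) at every grid point.
excess_matches = all(
    final_ratio(t) - Fraction(5, 4) == t / (4 - 8 * t) for t in grid
)
# Hence the ratio never drops below 5/4 on the grid.
never_below = all(final_ratio(t) >= Fraction(5, 4) for t in grid)

if __name__ == "__main__":
    print(excess_matches, never_below)
```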

5 Conclusions
We have shown a deterministic online algorithm for scheduling two machines
with release times with the best possible competitive ratio. Further, we have im-
proved the lower bound on the competitive ratio of randomized online algorithms
for this problem.
We feel there are a number of interesting and related open questions. First,
is there a way to generalize Sleepy to an arbitrary number of machines and
have a competitive ratio smaller than LPT? Or the even stronger result, for
a fixed value of ε > 0 is there an algorithm which is (3/2 − ε)-competitive
for all m? Second, does randomization actually reduce the competitive ratio?
It seems like randomization should help to reduce the competitive ratio a great
deal. However, the best known randomized algorithms are actually deterministic
(Sleepy for m = 2 and LPT for m > 2). Third, how much does the competitive
ratio decrease if restarts are allowed? In many real world situations a job can be
killed and restarted later with only the loss of processing already completed.

6 Acknowledgement
This research has been supported by the START program Y43-MAT of the Aus-
trian Ministry of Science. The authors would like to thank Gerhard Woeginger
for suggesting the problem.

References
[1] Albers, S. Better bounds for online scheduling. In Proceedings of the 29th ACM
Symposium on Theory of Computing (1997), pp. 130–139.
[2] Bartal, Y., Fiat, A., Karloff, H., and Vohra, R. New algorithms for an
ancient scheduling problem. Journal of Computer and System Sciences 51, 3 (Dec
1995), 359–366.
[3] Bartal, Y., Karloff, H., and Rabani, Y. A better lower bound for on-line
scheduling. Information Processing Letters 50, 3 (May 1994), 113–116.
[4] Chen, B., van Vliet, A., and Woeginger, G. A lower bound for randomized
on-line scheduling algorithms. Information Processing Letters 51, 5 (Sep 1994),
219–222.
[5] Chen, B., van Vliet, A., and Woeginger, G. New lower and upper bounds
for on-line scheduling. Operations Research Letters 16, 4 (Nov 1994), 221–230.
[6] Chen, B., and Vestjens, A. Scheduling on identical machines: How good is
LPT in an online setting? Operations Research Letters 21, 4 (Nov 1997), 165–169.
[7] Faigle, U., Kern, W., and Turán, G. On the performance of on-line algorithms
for partition problems. Acta Cybernetica 9, 2 (1989), 107–119.
[8] Galambos, G., and Woeginger, G. An online scheduling heuristic with better
worst case ratio than Graham’s list scheduling. SIAM Journal on Computing 22,
2 (Apr 1993), 349–355.
[9] Graham, R. L. Bounds for certain multiprocessing anomalies. Bell Systems
Technical Journal 45 (1966), 1563–1581.
[10] Hall, L., and Shmoys, D. Approximation schemes for constrained schedul-
ing problems. In Proceedings of the 30th IEEE Symposium on Foundations of
Computer Science (1989), pp. 134–139.
[11] Karger, D., Phillips, S., and Torng, E. A better algorithm for an ancient
scheduling problem. Journal of Algorithms 20, 2 (Mar 1996), 400–430.
[12] Karlin, A., Manasse, M., Rudolph, L., and Sleator, D. Competitive snoopy
caching. Algorithmica 3, 1 (1988), 79–119.
[13] Reingold, N., Westbrook, J., and Sleator, D. Randomized competitive
algorithms for the list update problem. Algorithmica 11, 1 (Jan 1994), 15–32.
[14] Seiden, S. S. Randomized algorithms for that ancient scheduling problem. In
Proceedings of the 5th Workshop on Algorithms and Data Structures (Aug 1997),
pp. 210–223.
[15] Sgall, J. A lower bound for randomized on-line multiprocessor scheduling. In-
formation Processing Letters 63, 1 (Jul 1997), 51–55.
[16] Shmoys, D., Wein, J., and Williamson, D. Scheduling parallel machines on-
line. In Proceedings of the 32nd Symposium on Foundations of Computer Science
(Oct 1991), pp. 131–140.
[17] Sleator, D., and Tarjan, R. Amortized efficiency of list update and paging
rules. Communications of the ACM 28, 2 (Feb 1985), 202–208.
[18] Stougie, L., and Vestjens, A. P. A. Randomized on-line scheduling: How low
can’t you go? Manuscript, 1997.
[19] Vestjens, A. P. A. On-line Machine Scheduling. PhD thesis, Eindhoven Uni-
versity of Technology, The Netherlands, 1997.
[20] Von Neumann, J., and Morgenstern, O. Theory of games and economic
behavior, 1st ed. Princeton University Press, 1944.
[21] Yao, A. C. Probabilistic computations: Toward a unified measure of complexity.
In Proceedings of the 18th IEEE Symposium on Foundations of Computer Science
(1977), pp. 222–227.
An Introduction to Empty Lattice Simplices

András Sebő*

CNRS, Laboratoire Leibniz-IMAG, Grenoble, France


http://www-leibniz.imag.fr/DMD/OPTICOMB

Abstract. We study simplices whose vertices lie on a lattice and have no
other lattice points. Such ‘empty lattice simplices’ come up in the theory
of integer programming, and in some combinatorial problems. They have
been investigated in various contexts and under varying terminology by
Reeve, White, Scarf, Kannan and Lovász, Reznick, Kantor, Haase and
Ziegler, etc.
Can the ‘emptiness’ of lattice simplices be ‘well-characterized’ ? Is their
‘lattice-width’ small ? Do the integer points of the parallelepiped they
generate have a particular structure ?
The ‘good characterization’ of empty lattice simplices turns out to be open
in general ! We provide a polynomial algorithm for deciding when a
given integer ‘knapsack’ or ‘partition’ lattice simplex is empty. More
generally, we ask for a characterization of linear inequalities satisfied by
the lattice points of a lattice parallelepiped. We state a conjecture about
such inequalities, prove it for n ≤ 4, and deduce several variants of
classical results of Reeve, White and Scarf characterizing the emptiness
of small dimensional lattice simplices. For instance, a three dimensional
integer simplex is empty if and only if all its faces have width 1. Seemingly
different characterizations can be easily proved from one another using
the Hermite normal form.
In fixed dimension the width of polytopes can be computed in polyno-
mial time (see the simple integer programming formulation of Haase and
Ziegler). We prove that it is already NP-complete to decide whether the
width of a very special class of integer simplices is 1, and we also pro-
vide for every n ≥ 3 a simple example of n-dimensional empty integer
simplices of width n − 2, improving on earlier bounds.

1 Introduction

Let $V \subseteq \mathbb{R}^n$ ($n \in \mathbb{N}$) be a finite set. A polytope is a set of the form
\[ \mathrm{conv}(V) := \Big\{ \sum_{v \in V} \lambda_v v : \lambda_v \ge 0 \ (v \in V),\ \sum_{v \in V} \lambda_v = 1 \Big\}. \]
If V is linearly independent, then S := conv(V ∪ {0}) is called a simplex. (It will be assumed that $0 \in \mathbb{R}^n$ is one of the vertices
of S.) The (linear or affine) rank r(S) is the linear rank of V . A polytope is said
to be integer if its vertices are integer vectors. More generally, fixing an arbitrary
‘lattice’ L, a lattice polytope is a polytope whose vertices are in L. The results
we are proving in this paper do hold for arbitrary lattices, but this general case
* Research developed partly during a visit to the Research Institute for Mathematical
Sciences, Kyoto University.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 400–414, 1999.
c Springer-Verlag Berlin Heidelberg 1999

can always be obviously reduced to $L = \mathbb{Z}^n$. Therefore we will not care about
more general lattices.
A set of the form $\mathrm{cone}(V) := \{\sum_{v \in V} \lambda_v v : \lambda_v \ge 0\}$ is a cone; if V is linearly
independent, the cone is called simplicial.
We refer to Schrijver [13] for basic facts about polytopes, cones and other
notions of polyhedral combinatorics, as well as for standard notations such as
the affine or linear hull of vectors, etc.
Let us call a polytope P empty, if it is integer, and denoting the set
of its vertices by V , $(P \cap \mathbb{Z}^n) \setminus V = \emptyset$. (The definition is similar for an arbitrary
lattice L instead of $\mathbb{Z}^n$.) Empty lattice polytopes have been studied over the past
four decades; let us mention as landmarks Reeve [11], White [15], Reznick [10],
Scarf [12] and Haase, Ziegler [5]. (However, there is no unified terminology about
them: the terms range from ‘lattice-point-free lattice polytopes’ to ‘elementary
or fundamental lattice polytopes’.) The latter paper is devoted to the volume
and the width of empty lattice simplices.
In this paper we study the structure of empty lattice simplices, the correlation
of emptiness and the width of lattice simplices, including the computational com-
plexity of both. (Note that deciding whether a (not necessarily integer) simplex
contains an integer point is trivially NP-complete, since the set of feasible solutions
of the knapsack problem $K := \{x \in \mathbb{R}^n : ax = b,\ x \ge 0\}$ ($a \in \mathbb{N}^n$, $b \in \mathbb{N}$)
is a simplex. However, in Section 2 we will decide whether a knapsack lattice
simplex is empty in polynomial time.)
Deciding whether a lattice simplex is empty is the simplest of a possible range
of problems that can be formulated as follows: given linearly independent integer
vectors $a_1, \ldots, a_n \in \mathbb{Z}^n$, is there an integer vector v in the cone they generate such
that the uniquely determined coefficients $\lambda_1, \ldots, \lambda_n \in \mathbb{Q}_+$ for which $\lambda_1 a_1 + \ldots + \lambda_n a_n = v$
satisfy some given linear inequalities? The problem investigated here
corresponds to the inequality $\lambda_1 + \ldots + \lambda_n \le 1$. Another interesting variant is the
existence of an integer point where the λi (i = 1, . . . , n) satisfy some lower and
upper bounds. For instance the ‘Lonely Runner Problem’ (or ‘View Obstruction
Problem’) for velocity vector $v \in \mathbb{R}^n$ (see [4]) can be restated as the problem
of the existence of an integer vector with 1/(n + 1) ≤ λi ≤ n/(n + 1) for all
i = 1, . . . , n in cone(e1 , . . . , en , (v, d)), where the ei (i = 1, . . . , n) are unit vectors
and d is the least common multiple of all the vectors vi + vj , (i, j = 1, . . . , n).
The width W(S) in $\mathbb{R}^n$ of a set $S \subseteq \mathbb{R}^n$ is the minimum of $\max\{w^T(x - y) : x, y \in S\}$
over all vectors $w \in \mathbb{Z}^n \setminus \{0\}$. If the rank of S is r < n, then the
width of S in $\mathbb{R}^n$ is 0, and it is more interesting to speak about its width in
aff(S) = lin(S), defined as the minimum of $\max\{w^T(x - y) : x, y \in S\}$ over all
vectors $w \in \mathbb{Z}^n$ not orthogonal to lin(S). In short, the width of S will mean its
width in aff(S).
If $0 \notin V \subseteq \mathbb{Z}^n$, and V is linearly independent, then define
\[ \mathrm{par}(V) := \Big\{ x \in \mathbb{Z}^n : x = \sum_{v \in V} \lambda_v v;\ 1 > \lambda_v \ge 0 \ (v \in V) \Big\} \]
and call it a parallelepiped. If in
addition |V | = n, then | par(V )| = det(V ), where det(V ) denotes the absolute
value of the determinant of the matrix whose rows are the elements of V (see

for instance Cassels [3], or for an elementary proof see Sebő [14]). In particular,
par(V ) = {0} if and only if det(V ) = 1.
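
The identity |par(V)| = det(V) can be checked by brute force on small examples. The sketch below is an illustration added here, not taken from the cited proofs; the helper names det, coefficients and parallelepiped_points are assumptions, and exact rational arithmetic is used so that the test 0 ≤ λ < 1 is not affected by rounding.

```python
from fractions import Fraction
from itertools import product


def det(rows):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return sign * result


def coefficients(V, x):
    """Solve x = sum_i lam_i * V[i] by Cramer's rule (the V[i] are the rows)."""
    d = det(V)
    return [det(V[:i] + [list(x)] + V[i + 1:]) / d for i in range(len(V))]


def parallelepiped_points(V):
    """Integer points x = sum_i lam_i * V[i] with 0 <= lam_i < 1."""
    n = len(V)
    # Bounding box of the half-open parallelepiped, coordinate by coordinate.
    lo = [sum(min(v[j], 0) for v in V) for j in range(n)]
    hi = [sum(max(v[j], 0) for v in V) for j in range(n)]
    box = product(*(range(lo[j], hi[j] + 1) for j in range(n)))
    return [x for x in box if all(0 <= lam < 1 for lam in coefficients(V, x))]


if __name__ == "__main__":
    V = [[2, 1], [1, 3]]
    print(len(parallelepiped_points(V)), abs(det(V)))
```

For V = {(2, 1), (1, 3)} the enumeration finds the five points (0, 0), (1, 1), (1, 2), (2, 2), (2, 3), matching the determinant 5.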
The problem of deciding the emptiness of parallelepipeds generated by integer
vectors is therefore easy. On the other hand the same problem is still open for
simplices, and that is the topic of this paper.
If n ≤ 3, it is well-known [15], that the width of an empty integer simplex is
1 (the reverse is even easier – for both directions see Corollary 4.5 below).
The following example shows that in general the width of empty integer
simplices can be large. The problem might have been overlooked before: the
simple construction below is apparently the first explicit example of empty integer
simplices of arbitrarily high width. The best result known so far was Kantor’s
non-constructive proof [8] for the existence of integer simplices of width n/e.
For a survey concerning previous results on the correlation of the width and the
volume of empty lattice simplices see Haase, Ziegler [5].
It is easy to construct integer simplices of width n without integer points
at all (not even the vertices), for instance by the following well-known example
[6]: for arbitrary $\varepsilon \in \mathbb{R}$, $\varepsilon > 0$, the simplex
$\{x \in \mathbb{R}^n : \sum_{i=1}^n x_i \le n - \varepsilon/2,\ x_i \ge \varepsilon/2 \text{ for all } i = 1, \ldots, n\}$
has width n − ε and no integer point. The vertices of this
simplex have exactly one big coordinate. We define an integer simplex which is
‘closest possible’ to this one:
Let $k \in \mathbb{N}$ (the best choice will be k = n − 2). $S_n(k) := \mathrm{conv}(s_0, s_1, \ldots, s_n)$,
$s_0 := 0$, $s_1 := (1, k, 0, \ldots, 0)$, $s_2 := (0, 1, k, 0, \ldots, 0)$, . . ., $s_{n-1} := (0, \ldots, 0, 1, k)$,
$s_n := (k, 0, \ldots, 0, 1)$. Let $s_{i,j}$ denote the j-th coordinate of $s_i$, that is, $s_{i,j} = 0$
if $|i - j| \ge 2$, $s_{i,i} = 1$, $s_{i,i+1} = k$ ($i, j \in \{1, \ldots, n\}$). The notation i + 1 is
understood mod n.

(1.1) The width of Sn (k) is k, unless both k = 1 and n is even.


Indeed, since $0 \in S_n(k)$, the width is at least k if and only if for arbitrary
nonzero integer n-dimensional w there exists $i \in \{1, \ldots, n\}$ so that $|w^T s_i| \ge k$.
The polytope $S_n(k)$ is a simplex if $s_1, \ldots, s_n$ are linearly independent, which is
automatic if $W(S_n(k)) = k > 0$ holds.
Let $w \in \mathbb{Z}^n$, $w \ne 0$. If $w_i = 0$ for some $i \in \{1, \ldots, n\}$, then there also exists
one such that $w_i = 0$ and $w_{i+1} \ne 0$. But then $|w^T s_i| \ge s_{i,i+1} = k$, and we are
done. So suppose w has full support. If there exists an i = 1, ..., n such that
$|w_i| < |w_{i+1}|$, then $|w^T s_i| = |w_i + w_{i+1} k| \ge |w_{i+1} k| - |w_i| > (k - 1)|w_i| \ge k - 1$,
and we are done again.
Hence, we can suppose that $|w_i| \ge |w_{i+1}|$ holds for all i = 1, ..., n. But then
the equality follows throughout, and we can suppose $|w_i| = 1$ for all i = 1, ..., n.
If there exists an $i \in \{1, \ldots, n\}$ so that $w_i$, $w_{i+1}$ have the same sign, then
$w^T s_i = k + 1$. If the signs of the $w_i$ are cyclically alternating, then $w^T s_i$ is also
alternating between (k − 1) and −(k − 1) and then $w^T(s_i - s_{i+1}) = 2k - 2$. In
this case n is even, and 2k − 2 ≥ k unless k = 1.
So the width is at least k, unless n is even, and k = 1. Choosing w to be any
of the n unit vectors we see that the width of $S_n(k)$ is k. □

(1.2) If k + 1 < n, then Sn (k) is an empty integer simplex.


Indeed, for a contradiction, let $z = (z_1, \ldots, z_n) \in S_n(k)$ be an integer vector
different from $s_i$ (i = 1, ..., n), $z = \sum_{i=0}^n \lambda_i s_i$, $\lambda_i \in \mathbb{R}_+$, $\sum_{i=0}^n \lambda_i = 1$.
Claim: $\lambda_i > 0$ for all i = 1, . . . , n.
Indeed, if not, there exists an $i \in \{1, \ldots, n\}$ so that $\lambda_i = 0$, $\lambda_{i+1} \ne 0$. Then
$z_{i+1} = \lambda_{i+1}$. Since $\lambda_{i+1} > 0$ by assumption, and $\lambda_{i+1} < 1$ since z is different
from $s_{i+1}$, we have: $z_{i+1}$ is not integer, contradicting the integrality of z. The
claim is proved.
The claim immediately implies z > 0 and hence z ≥ 1. Therefore $\sum_{i=1}^n z_i \ge n$.
On the other hand, for all the vertices x of $S_n(k)$, $\sum_{i=1}^n x_i \le k + 1$. Thus, if
k + 1 < n, then $z \notin S_n(k)$. □
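
Both (1.1) and (1.2) can be checked by brute force for small n and k. The sketch below is an illustration added here, not part of the paper: the helper names simplex_vertices, solve, is_empty and bounded_width are assumptions. Note that bounded_width only minimizes over directions w with entries bounded by a chosen constant, so in general it yields an upper bound on the width rather than the width itself.

```python
from fractions import Fraction
from itertools import product


def simplex_vertices(n, k):
    """Nonzero vertices s_1..s_n of S_n(k): s_i has 1 at i and k at i+1 (mod n)."""
    rows = []
    for i in range(n):
        s = [0] * n
        s[i] = 1
        s[(i + 1) % n] = k
        rows.append(s)
    return rows


def solve(rows, z):
    """Return lam with sum_i lam[i] * rows[i] == z, or None if rows are dependent."""
    n = len(rows)
    # Augmented system: column i of the coefficient matrix is rows[i].
    a = [[Fraction(rows[i][j]) for i in range(n)] + [Fraction(z[j])] for j in range(n)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return None
        a[c], a[pivot] = a[pivot], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                a[r] = [v - a[r][c] * w for v, w in zip(a[r], a[c])]
    return [a[i][n] for i in range(n)]


def is_empty(n, k):
    """Brute-force check that S_n(k) has no integer points besides its vertices."""
    s = simplex_vertices(n, k)
    if solve(s, (0,) * n) is None:
        raise ValueError("s_1, ..., s_n are linearly dependent")
    vertices = {tuple(v) for v in s} | {(0,) * n}
    for z in product(range(k + 1), repeat=n):   # S_n(k) lies in the box [0, k]^n
        if z in vertices:
            continue
        lam = solve(s, z)
        if all(l >= 0 for l in lam) and sum(lam) <= 1:
            return False
    return True


def bounded_width(n, k, bound):
    """Minimum of max_i w.s_i - min_i w.s_i over nonzero integer w with |w_j| <= bound."""
    pts = simplex_vertices(n, k) + [[0] * n]
    best = None
    for w in product(range(-bound, bound + 1), repeat=n):
        if not any(w):
            continue
        vals = [sum(a * b for a, b in zip(w, p)) for p in pts]
        spread = max(vals) - min(vals)
        best = spread if best is None else min(best, spread)
    return best


if __name__ == "__main__":
    print(is_empty(4, 2), bounded_width(4, 2, 2))
```

For n = 4 and k = 2 the condition k + 1 < n holds and the enumeration finds no extra integer point, while for n = 3 and k = 2 the point (1, 1, 1) = (s1 + s2 + s3)/3 lies in the simplex, so the condition k + 1 < n cannot simply be dropped.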
Hence, for n ≥ 3, Sn (n − 2) is an integer simplex of width n − 2 by (1.1), and
is empty by (1.2). The volume (determinant) of Sn (n − 2) is also high among
empty simplices in $\mathbb{R}^n$. This example is not best possible: for small n there exist
empty integer simplices of larger width, see [5]. Moreover Imre Bárány pointed
out that in the above example, for odd n, the facet of Sn (k) not containing 0
has still the same width as Sn (k), providing an empty integer simplex of width
n − 1 if n ≥ 4 is even; Bárány also noticed that the volume can be increased by
a constant factor by changing one entry in the example. However, it still seems
to be reasonable to think that the maximum width of an empty integer simplex
$S \subseteq \mathbb{R}^n$ is n + constant. Kannan and Lovász [6] implies that the width of an
arbitrary empty integer polytope is at most $O(n^2)$; in [1] this is improved to
$O(n^{3/2})$, where for empty integer simplices $O(n \log n)$ is also proved.
The main question we are interested in is the following:
Question: Is there a simple ‘good-characterization’ theorem or even a polyno-
mial algorithm deciding the emptiness of integer simplices ? How are the empti-
ness and the width correlated ? Is there any relation between their computational
complexity ?
It is surprising that the literature does not make any claim about these
problems in general. In the present paper we state some questions and provide
some simple answers whenever we can. Unfortunately we do not yet know what is
the complexity of deciding the emptiness of integer simplices, nor the complexity
of computing the width of empty integer simplices !
In Section 2 we describe a good-characterization theorem and polynomial
algorithm deciding if an integer knapsack (or ‘partition’) polytope is empty.
In Section 3 and Section 4 we show a possible good certificate for certain
lattice simplices to be empty. As a consequence, a simple new proof is provided
for facts well-known from [15], [12].
Deciding whether a lattice polytope is empty can be trivially reduced to
the existence of an integer point in a slightly perturbed polytope. The result
of Section 2 suggests that a reduction in the opposite direction could be more
difficult, and is impossible in some particular cases.
This reduction implies that in fixed dimension it can be decided in polyno-
mial time whether a lattice polytope is empty by Lenstra [9]. More generally,

Barvinok [2] developed a polynomial algorithm for counting the number of
integer points in polytopes when the dimension is fixed.
We do not care about how the simplices are given, since the constraints can
be computed from the vertices in polynomial time, and vice versa.
In Section 5 we explore the complexity of computing the width of simplices.

2 Knapsack Simplices

If $a, b \in \mathbb{Z}$, lcm(a, b) denotes the least common multiple of a and b, that is, the
smallest nonnegative number which is both a multiple of a and of b.

Theorem 2.1 Let $K := \{x \in \mathbb{R}^n : a^T x = b,\ x \ge 0\}$ ($a \in \mathbb{N}^n$, $b \in \mathbb{N}$) be
an integer polytope. Then K is empty, if and only if $\mathrm{lcm}(a_i, a_j) = b$ for all
$i \ne j \in \{1, \ldots, n\}$.

Proof. Clearly, K is an (n − 1)-dimensional simplex whose vertices are $v_i := k_i e_i$,
where $k_i := b/a_i$. Since K is an integer polytope, $k_i \in \mathbb{Z}$, that is, $a_i \mid b$
(i = 1, . . . , n).
Let us realize that K contains no integer points besides the vertices, if and
only if $\gcd(k_i, k_j) = 1$ for all $i \ne j \in \{1, \ldots, n\}$.
Indeed, if say $\gcd(k_1, k_2) = d > 1$ then $(1/d)v_1 + ((d - 1)/d)v_2$ is an integer
point. Conversely, suppose that there exists an integer vector $w = \sum_{i=1}^n \lambda_i v_i$
($\lambda_i \ge 0$, i = 1, . . . , n), $\sum_{i=1}^n \lambda_i = 1$, different from the vertices, and say $0 < \lambda_1 < 1$. Let $\lambda_i := p_i/q_i$, where $p_i$, $q_i$
are relatively prime nonzero integers whenever $\lambda_i \ne 0$. Since $\lambda_2 + \ldots + \lambda_n = 1 - \lambda_1$,
an arbitrary prime factor p of $q_1$ occurs also in $q_i$ for some $i \in \{2, \ldots, n\}$. Since
$w_i = \lambda_i k_i$ is integer, the denominator of $\lambda_i$ divides $k_i$, that is, $p \mid k_1$, $p \mid k_i$, proving
$\gcd(k_1, k_i) \ge p > 1$.
Now it only remains to notice that for any i, j = 1, . . . , n, $\gcd(k_i, k_j) = 1$ if
and only if $\mathrm{lcm}(a_i, a_j) = \mathrm{lcm}(b/k_i, b/k_j) = b$. □
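
The criterion of Theorem 2.1 translates directly into a polynomial-time test. The sketch below is an illustration added here: the function names are assumptions, and the brute-force enumeration is included only to cross-check the criterion on tiny instances.

```python
from itertools import combinations, product
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def knapsack_simplex_is_empty(a, b):
    """Criterion of Theorem 2.1; requires every a_i to divide b (K integer)."""
    if any(b % ai for ai in a):
        raise ValueError("K is not an integer polytope: some a_i does not divide b")
    return all(lcm(ai, aj) == b for ai, aj in combinations(a, 2))


def brute_force_is_empty(a, b):
    """Enumerate all nonnegative integer solutions of a.x = b and compare with the vertices."""
    n = len(a)
    vertices = {tuple(b // ai if j == i else 0 for j in range(n)) for i, ai in enumerate(a)}
    ranges = [range(b // ai + 1) for ai in a]
    solutions = {x for x in product(*ranges) if sum(ai * xi for ai, xi in zip(a, x)) == b}
    return solutions == vertices


if __name__ == "__main__":
    for a, b in [((6, 10, 15), 30), ((2, 3), 6), ((1, 1), 2)]:
        print(a, b, knapsack_simplex_is_empty(a, b), brute_force_is_empty(a, b))
```

For a = (6, 10, 15) and b = 30 the vertices are 5e1, 3e2, 2e3 with pairwise coprime multipliers, so K is empty; for a = (1, 1) and b = 2 the point (1, 1) lies on the edge between the two vertices.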
This assertion does not generalize; the simplex K is quite special:

Corollary 2.2 The integer knapsack polytope K := {x ∈ ℝ^n : a^T x = b, x ≥ 0}
(a ∈ ℕ^n, b ∈ ℕ) is empty if and only if all its two-dimensional faces are empty.
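
The criterion of Theorem 2.1 is easy to test on small instances. The following Python sketch is only an illustration and not part of the original text: the function names are hypothetical, and it assumes that every a_i divides b, so that K is an integer polytope. It compares the lcm criterion with a brute-force enumeration of the lattice points of K.

```python
from itertools import combinations
from math import gcd


def lcm(x, y):
    return x * y // gcd(x, y)


def is_empty_by_lcm(a, b):
    """Criterion of Theorem 2.1: every pair of coefficients has lcm equal to b."""
    return all(lcm(ai, aj) == b for ai, aj in combinations(a, 2))


def lattice_points(a, b):
    """All nonnegative integer solutions x of a^T x = b, by brute force."""
    def rec(i, remaining):
        if i == len(a) - 1:
            # Last coordinate is forced; it must be an integer.
            if remaining % a[i] == 0:
                yield (remaining // a[i],)
            return
        for xi in range(remaining // a[i] + 1):
            for tail in rec(i + 1, remaining - xi * a[i]):
                yield (xi,) + tail
    return list(rec(0, b))


def is_empty_by_enumeration(a, b):
    """K is empty when its only lattice points are the n vertices (b/a_i) e_i.

    Assumption: each a_i divides b, so all n vertices are lattice points.
    """
    return len(lattice_points(a, b)) == len(a)
```

For instance, a = (6, 10, 15) with b = 30 passes both tests, while a = (2, 2) with b = 4 fails both because of the lattice point (1, 1).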

3 Parallelepiped Structure and Jump Coefficients

In this section we state two lemmas that will be needed in the sequel. The
first collects some facts about the structure of parallelepipeds. The second is a
result of number-theoretic character. The structure provided by the first raises
the problem solved by the second.
A unimodular transformation is a linear transformation defined by an integer
matrix whose determinant is ±1. An equivalent definition: a unimodular
transformation is the composition of a finite number of reflections
f_i(x) := (x_1, ..., −x_i, ..., x_n) and sums
f_{i,j}(x) := (x_1, ..., x_{i−1}, x_i + x_j, x_{i+1}, ..., x_n)
(i, j = 1, ..., n, i ≠ j). The equivalence of the two definitions is easy to prove
using the Hermite normal form (see the definition in [13]).
We will use unimodular transformations of a set of vectors V by putting
them into a matrix M as rows, and then using column operations to determine
the Hermite normal form M′ of M. Then the rows of M′ can be considered to
provide an ‘isomorphic’ representation of V.
The residue of x ∈ ℝ mod d ∈ ℕ will be denoted by mod(x, d), 0 ≤
mod(x, d) < d.
Let V := {v_1, ..., v_n} ⊆ ℤ^n be a basis of ℝ^n, and d := det(v_1, ..., v_n).
By Cramer’s rule every integer vector is a linear combination of V with
coefficients that are integer multiples of 1/d. For x ∈ ℝ^n the coefficient vector
λ = (λ_1, ..., λ_n) ∈ ℝ^n defined by the unique combination
x = (λ_1 v_1 + ... + λ_n v_n)/d will be called the V-coefficient vector of x. In
other words λ = d V^{−1} x (where V denotes the n × n matrix whose i-th column
is v_i). If x ∈ ℤ^n then all V-coefficients of x are integer.
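
For n = 2 the V-coefficients can be written down directly from Cramer’s rule. The Python sketch below is a small illustration under stated assumptions, not part of the original text: the function names are hypothetical, the dimension is fixed to 2, and det(v1, v2) is assumed to be positive.

```python
def v_coefficients_2d(v1, v2, x):
    """Return (d, (lam1, lam2)) with x = (lam1*v1 + lam2*v2) / d, by Cramer's rule."""
    d = v1[0] * v2[1] - v1[1] * v2[0]       # d = det(v1, v2), assumed positive
    lam1 = x[0] * v2[1] - x[1] * v2[0]      # det(x, v2)
    lam2 = v1[0] * x[1] - v1[1] * x[0]      # det(v1, x)
    return d, (lam1, lam2)


def reduce_into_par(coeffs, d):
    """Reduce integer coefficients mod d, i.e. translate by a combination of V."""
    return tuple(c % d for c in coeffs)
```

With v1 = (1, 0), v2 = (1, 2) and x = (3, 5) one gets d = 2 and coefficients (1, 5), since (1·v1 + 5·v2)/2 = (3, 5); reducing mod 2 gives (1, 1), the coefficients of the point (1, 1).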
Clearly, V -coefficient vectors are unchanged by linear transformations, and
the width is unchanged under unimodular transformations. (The inverse of a
unimodular transformation is also unimodular. Unimodular transformations can
be considered to be the ‘isomorphisms’ of polytopes with respect to their integer
vectors.)
A par(V)-coefficient vector is a vector λ ∈ ℝ^n which is the V-coefficient
vector of some x ∈ par(V). We will often exploit the fact that parallelepipeds are
symmetric objects: if x ∈ par(v_1, ..., v_n), then v_1 + ... + v_n − x ∈ par(v_1, ..., v_n).
In other words, if (λ_1, ..., λ_n) ∈ ℕ^n is a par(V)-coefficient vector, then
(d − λ_1, ..., d − λ_n) is also one (extensively used in [10] and [14]).
We will use par(V )-coefficients and some basic facts in the same way as we did
in [14]. These have been similarly used in books, for n = 3 in White’s paper [15],
or in Reznick [10] through ‘barycentric coordinates’, whose similarity was kindly
pointed out to the author by Jean-Michel Kantor.
An important part of the literature is in fact involved more generally with
the number of integer points in three (or higher) dimensional simplices. Our
ultimate goal is to find only simple general ‘good certificates’ (or polynomial
algorithms) for deciding when this number is zero, and par(V )-coefficients seem
to be helpful to achieve this task. We are able to achieve only much less: we treat
small dimensional cases in a simple new way, bringing to the surface some more
general facts and conjectures.
If x ∈ ℝ^n, then clearly, there exists a unique integer combination p of the
vectors of V so that x + p ∈ par(V). If x ∈ ℤ^n and λ ∈ ℤ^n is the V-coefficient
vector of x, then the V-coefficient vector of x + p is
(mod(λ_1, d), ..., mod(λ_n, d)). The par(V)-coefficient vectors form a group
G = G(V) with respect to mod d addition. This is the factor-group of the additive
group of integer vectors with respect to the subgroup generated by V; moreover,
the following well-known facts hold:
Lemma 3.1 Let V := {v_1, ..., v_n} ⊆ ℤ^n. Then:
(a) par(V \ v_n) = {0} if and only if there exists a unimodular transformation (and
possibly permutation of the coordinates) such that v_i := e_i (i = 1, ..., n − 1),
v_n = (a_1, ..., a_{n−1}, d), where d = det(v_1, ..., v_n) and 0 < a_i < d for all
i = 1, ..., n − 1.
(b) If par(V \ v_n) = {0}, then G(V) is a cyclic group.
(c) If par(V \ v_n) = {0} then par(V \ v_i) = {0} for some i ∈ {1, ..., n − 1} if
and only if in (a) gcd(a_i, d) = 1.
Indeed, let the rows of a matrix M be the vectors v_i ∈ ℤ^n (i = 1, ..., n)
and consider the Hermite normal form of (the column lattice of) M. If
par(V \ v_n) = {0}, then deleting the last row and column we get an
(n − 1) × (n − 1) identity matrix, and (a) follows. Now (b) is easy, since
par(V) = {0} ∪ {⌈(i/d)(a_1, ..., a_{n−1}, d)⌉ : i = 1, ..., d − 1}, that is, the
par(V)-coefficients are equal to {mod(ih, d) : i = 1, ..., d − 1}, where
h := (d − a_1, ..., d − a_{n−1}, 1).
Statement (c) is also easy, since gcd(a_i, d) = gcd(d − a_i, d) > 1 if and only if
there exists j ∈ ℕ, 0 < j < d, so that the i-th coordinate of mod(jh, d) is 0.
Statement (b) is very useful for proving properties for the whole parallelepiped
P: one can generate the V := {v_1, ..., v_n}-coefficients of all the d − 1 nonzero
points of P by taking the mod d multiples of the V-coefficient vector of some
generator h = (h_1, ..., h_n) ∈ P (cf. [10] or [14]). If the polytope we are
considering is described in terms of inequalities satisfied by a function of the
par(V)-coefficients, then it is useful to understand how the par(V)-coefficient
vector mod((i + 1)h, d) changes compared to mod(ih, d).
For instance, for the simplex S := conv(0, v_1, ..., v_n) to be empty means exactly
that the sum of the V-coefficients of any nonzero vector in P is strictly greater
than d. For any coordinate 0 < a ≤ d − 1 of h one simply has
mod((i + 1)a, d) = mod(ia, d) + a, unless the interval (mod(ia, d), mod(ia, d) + a]
contains a multiple of d, that is, unless mod(ia, d) + a ≥ d. In this latter case
mod((i + 1)a, d) = mod(ia, d) + a − d, and we will say that i is a jump-coefficient
of a mod d.
Hence mod d inequalities can be treated as ordinary inequalities, corrected
by keeping track of the jump-coefficients. We will need only the following simply
stated Lemma 3.2, which relates containment relations between the sets of
jump-coefficients to divisibility relations.
We say that i ∈ {1, ..., d − 1} is a jump-coefficient of a ∈ ℕ mod d (1 ≤ a < d),
if ⌊(i + 1)a/d⌋ > ⌊ia/d⌋ (equivalently, if mod((i + 1)a, d) < mod(ia, d)). If
a = 1, then J_a(d) = ∅, and if a ≥ 2, the set of jump-coefficients of a mod d is
(∗) J_a(d) := {⌊id/a⌋ : i = 1, ..., a − 1}.
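
The two descriptions of the jump-coefficients can be compared numerically. The following Python sketch is an illustration only, with hypothetical function names. It assumes gcd(a, d) = 1 (as in Lemma 3.2 below), and it restricts the floor test to i ≤ d − 2: at i = d − 1 the floor test is satisfied by every a, whereas formula (∗) produces only a − 1 values, so this reading of the index range is an assumption of the sketch.

```python
from math import gcd


def jumps_by_definition(a, d):
    """All i in {1, ..., d-2} with floor((i+1)a/d) > floor(ia/d)."""
    return {i for i in range(1, d - 1) if (i + 1) * a // d > i * a // d}


def jumps_by_formula(a, d):
    """Formula (*): { floor(i*d/a) : i = 1, ..., a-1 }."""
    return {i * d // a for i in range(1, a)}


def formula_matches_definition(a, d):
    """Compare both descriptions; only meaningful here when gcd(a, d) = 1."""
    if gcd(a, d) != 1:
        raise ValueError("the comparison assumes gcd(a, d) = 1")
    return jumps_by_definition(a, d) == jumps_by_formula(a, d)
```

For d = 7 and a = 3 both functions return {2, 4}, and together with the set {1, 3, 5} obtained for d − a = 4 they cover 1, ..., 5 without overlap.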
Let us illustrate our goal and the statement we want to prove on the easy
example of a = 2 and d odd: let us show that J_a(d) ⊆ J_b(d) if and only if b is
even. Indeed, 2 has just one jump-coefficient, ⌊d/2⌋. So J_a(d) ⊆ J_b(d) if and only
if ⌊d/2⌋ ∈ J_b, that is, if and only if the interval (⌊d/2⌋b, (⌊d/2⌋ + 1)b] contains a
multiple of d. It does contain a multiple of d/2, namely b · d/2, and since
b/2 < d/2 this is the only multiple of d/2 it contains. But b · d/2 is a multiple of d
if and only if b is even, as claimed.
Lemma 3.2 states a generalization of this statement for arbitrary a. Let us
first visualise the statement, and (∗) – a basic tool in the proof :
Let d ∈ ℕ be arbitrary, and 0 < a, b < d. The points A := {id/a : i =
1, ..., a − 1} divide the interval [0, d] into a equal parts; define
B := {id/b : i = 1, ..., b − 1} analogously. Each of the parts defined by A has
length bigger than 1, so the points of A lie in different intervals (i, i + 1]. Now
(∗) means exactly that
(∗∗) i ∈ J_a if and only if the interval (i, i + 1] contains an element of A.
If b > a, then clearly, there is an interval (i, i + 1] containing a point of B
and not containing a point of A. If b is a multiple of a, then obviously, B ⊇ A.
If d − a is a multiple of d − b, then again B ⊇ A is easy to prove (see the Fact
below).
If a, b ≤ d/2, then d − b | d − a cannot hold. The following lemma states that
under this condition the above remark can be reversed: if a does not divide b,
then A \ B ≠ ∅. If a ≤ d/2 and b > d/2, then J_a(d) ⊆ J_b(d) can hold without
any of a | b or d − b | d − a being true (see example below).
Let us first note the following statement that will be frequently used in the
sequel:
Fact: {J_a(d), J_{d−a}(d)} is a bipartition of {1, ..., d − 1}. (Easy.)
The following lemma looks like a simple and quite basic statement:

Lemma 3.2 Let d, a, b ∈ ℕ, 0 < a, b < d/2, gcd(a, d) = gcd(b, d) = 1. If
J_a(d) ⊆ J_b(d), then a | b.

Proof. Let a, b ∈ ℕ, 0 < a, b < d/2, J_a(d) ⊆ J_b(d). Then a ≤ b. Suppose
a does not divide b, and let us show J_a \ J_b ≠ ∅. We have then 2 ≤ a < b,
and J_a \ J_b ≠ ∅ means exactly the existence of k ∈ {1, ..., a − 1} such that
⌊kd/a⌋ ∉ J_b (see (∗)).
Claim: Let k ∈ {1, ..., a − 1}. Then ⌊kd/a⌋ ∉ J_b if and only if both

    mod(kd, a) / mod(kb, a) < d/b   and   (a − mod(kd, a)) / (a − mod(kb, a)) < d/b

hold.

This statement looks somewhat intimidating, but we will see that it expresses
exactly what ⌊kd/a⌋ ∉ J_b means if one exploits (∗) for b, and (∗∗) for a:
Indeed, ⌊kd/a⌋ ∉ J_b if and only if for all i = 1, ..., b − 1:
id/b ∉ (⌊kd/a⌋, ⌊kd/a⌋ + 1]. Let us realize that instead of checking this condition
for all i, it is enough to check it for those i for which id/b has a
chance of being in (⌊kd/a⌋, ⌊kd/a⌋ + 1]:
Since kd/a = (kb/a)(d/b), and hence ⌊kb/a⌋(d/b) ≤ kd/a ≤ ⌈kb/a⌉(d/b), there are
only two such possibilities for i: ⌊kb/a⌋ and ⌈kb/a⌉. In other words,
⌊kd/a⌋ ∉ J_b if and only if

    ⌊kb/a⌋(d/b) < ⌊kd/a⌋   and   ⌈kb/a⌉(d/b) > 1 + ⌊kd/a⌋.

Subtract from these inequalities the equality (kb/a)(d/b) = kd/a, and apply
⌊p/q⌋ = (p − mod(p, q))/q:

    −(mod(kb, a)/a)(d/b) < −mod(kd, a)/a,

which is the first inequality of the claim, and

    ((a − mod(kb, a))/a)(d/b) > 1 − mod(kd, a)/a,

which is the second inequality. The claim is proved.
Let g := gcd(a, b). The values mod(ib, a) (i ∈ {0, 1, ..., a − 1}) are from the
set {jg : j = 0, ..., (a/g) − 1}, and each number in this set is taken g times.
Depending on whether a/g is even or odd, a/2 or (a − g)/2 is in this set.
So there exist g different values of i for which a/2 − g/2 ≤ mod(ib, a) ≤ a/2,
and since for all of these g values mod(id, a) is different (because of gcd(a, d) = 1),
for at least one of them mod(id, a) ≤ a − g. For this i we have
mod(id, a)/mod(ib, a) ≤ (a − g)/(a/2 − g/2) = 2, and since a − mod(ib, a) ≥ a/2,
(a − mod(id, a))/(a − mod(ib, a)) ≤ 2 holds as well. Since
d/b > 2 by assumption, the condition of the claim is satisfied and we conclude
⌊id/a⌋ ∉ J_b. □

Corollary 3.3 Let d, a, b ∈ ℕ, d/2 < a, b < d, gcd(a, d) = gcd(b, d) = 1. If
J_a(d) ⊆ J_b(d), then d − b | d − a.

Indeed, according to the Fact stated before the Lemma, J_a(d) ⊆ J_b(d) implies
J_{d−b}(d) ⊆ J_{d−a}(d), and clearly, 0 < d − b, d − a < d/2,
gcd(d − b, d) = gcd(d − a, d) = 1. Hence the Lemma can be applied to d − b, d − a
and it establishes d − b | d − a.
The following example shows that the Lemma or its corollary are not
necessarily true if a < d/2, b > d/2, even if the condition of the lemma is
‘asymptotically true’ (lim d/b = 2 if k → ∞): let k ∈ ℕ, k ≥ 3; d := 6k − 1, a = 3,
b = 3k + 4.
Then J_a = {2k − 1, 4k − 1} ⊆ J_b. Let us check for instance 2k − 1 ∈ J_b:
(2k − 1)b = 6k^2 + 5k − 4 = (6k − 1)k + 6k − 4 ≡ 6k − 4 mod 6k − 1. Since
3k + 4 > (6k − 1) − (6k − 4) = 3, 2k − 1 is a jump-coefficient of b = 3k + 4 mod
d.
For k = 2 we do not get a real example: a = 3, b = 10, d = 11; Ja ⊆ Jb
is true, and 3 is not a divisor of 10, but the corollary applies to d − a = 8 and
d − b = 1. One gets the smallest example with Ja ⊆ Jb and neither a|b nor
d − b | d − a by substituting k = 3: then a = 3, b = 13, d = 17.
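
The example above, and Lemma 3.2 itself for small d, can be checked by a short computation. The Python sketch below is an illustration with hypothetical function names; it takes formula (∗) as the definition of J_a(d), which is assumed to be appropriate when gcd(a, d) = 1.

```python
from math import gcd


def jump_set(a, d):
    """J_a(d) via formula (*); used here only when gcd(a, d) = 1."""
    return {i * d // a for i in range(1, a)}


def containment_report(a, b, d):
    """Return (J_a is a subset of J_b, a divides b, d - b divides d - a)."""
    return (jump_set(a, d) <= jump_set(b, d),
            b % a == 0,
            (d - a) % (d - b) == 0)


def lemma_counterexamples(d):
    """Pairs (a, b), 0 < a, b < d/2, coprime to d, with J_a in J_b but a not dividing b."""
    small = [x for x in range(1, (d + 1) // 2) if gcd(x, d) == 1]
    return [(a, b) for a in small for b in small
            if jump_set(a, d) <= jump_set(b, d) and b % a != 0]
```

Calling containment_report(3, 13, 17) reports the containment together with the failure of both divisibility relations, while containment_report(3, 10, 11) shows the containment with d − b | d − a, matching the discussion of k = 3 and k = 2.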

4 A Polynomial Certificate

In the introduction we formulated a somewhat more general question than the
emptiness of lattice simplices: given a linearly independent set V := {v_1, ..., v_n}
of integer vectors, is there a par(V)-coefficient vector whose (weighted) sum
is smaller than a pre-given value? If all the weights are 1, and the pre-given
value is d := det(V), we get back the problem of the nonemptiness of
conv(0, v_1, ..., v_n).
For integer weights 0 ≤ a_1, ..., a_n ≤ d − 1, the congruence
Σ_{i=1}^n a_i λ_i ≡ 0 mod d (λ_1, ..., λ_n ∈ ℕ) implies Σ_{i=1}^n a_i λ_i ≥ d, unless
a_i λ_i = 0 for all i = 1, ..., n. If the congruence holds for all
(λ_1, ..., λ_n) ∈ par(V), then the inequality also holds, except if λ_i = 0 for all
i = 1, ..., n for which a_i > 0. We suppose that this exception does not occur.
(This is automatically the case if all proper faces of par(V) are empty, or if
a_i > 0 for all i = 1, ..., n.) Then, in order to certify the validity of the above
inequality for the entire par(V), one only has to check the congruence to hold for
a generating set.
Such inequalities (induced exactly by the ‘orthogonal’ space to G(V) mod d)
can then be combined in the usual way of linear (or integer) programming, in
order to yield new inequalities. We do not have an example where this procedure
would not provide a ‘short’ certificate for a lattice simplex to be empty, or more
generally, for the inequality Σ_{i=1}^n λ_i/d > k (k ∈ ℕ) to hold.
By the symmetry of the parallelepiped, the maximum of k for which such an
inequality can be satisfied is k = n/2.
For this extreme case (which occurs in the open cases of the integer Caratheo-
dory property – and slightly changes for odd n) we conjecture that the simplest
possible ‘good certificate’ can work:

Conjecture 4.1 Let e ∈ {0, 1}, e ≡ n mod 2, and let V = {v_1, ..., v_n} ⊆ ℤ^n
be linearly independent. Then Σ_{i=1}^n λ_i ≥ ⌊n/2⌋d + 1 − e holds for every
λ = (λ_1, ..., λ_n) ∈ G(V), if and only if there exists α = (α_1, ..., α_n) ∈ G(V),
gcd(α_i, d) = 1 (i = 1, ..., n), and a set P of ⌊n/2⌋ disjoint pairs in {1, ..., n}
such that for each pair {p, q} ∈ P: α_p + α_q = d.

The sufficiency of the condition is easy: α is a generator; for all λ ∈ G(V),
λ > 0; then 0 < λ_p + λ_q < 2d for every {p, q} ∈ P, and d | λ_p + λ_q, so
λ_p + λ_q = d for every λ ∈ G(V); Σ_{i=1}^n λ_i ≥ |P|d + 1 − e = ⌊n/2⌋d + 1 − e
follows.
Conversely, if Σ_{i=1}^n λ_i ≥ ⌊n/2⌋d + 1 − e for every λ ∈ G(V), then
par(V \ v_i) = {0} (i = 1, ..., n) is obvious: if say par(V \ v_n) ≠ {0}, then by
symmetry, there exists λ ∈ par(V \ v_n), λ ≠ 0, with Σ_{i=1}^n λ_i/d ≤ (n − 1)/2, a
contradiction. So G(V) is cyclic. In order to prove the conjecture, we have to
prove that Σ_{i=1}^n λ_i ≥ ⌊n/2⌋d + 1 − e for every λ = (λ_1, ..., λ_n) ∈ G(V)
implies λ_p + λ_q = d for all par(V)-coefficient vectors, or equivalently, for a
generator.
We show this Conjecture in the special case n ≤ 4, when it is equivalent to a
celebrated result of White [15]. Instead of using White’s theorem, we provide a
new, simpler proof based on Lemma 3.2. We hope that these results will be also
useful in more general situations.
An integer simplex S ⊆ ℝ^2 is empty if and only if its two nonzero vertices
form an empty parallelepiped, that is, if and only if they define a unimodular
matrix. (This is trivial from the Hermite normal form, or just by the symmetry
of parallelepipeds.)
Let now n = 3, and let A, B, C ∈ ℤ^3 be the nonzero vertices of the simplex
S ⊆ ℝ^3. It follows from the result on n ≤ 2 applied to the facets of S after
applying Lemma 3.1, that G(V) is cyclic, and then the input of the problem
can be given in a shorter and more symmetric form: we suppose that the input
consists of only three numbers, the V := {A, B, C}-coordinates of a generator
(α, β, γ) of G(V) (instead of the vectors A, B, C themselves). White’s theorem
(see [15] or [10]) can be stated as follows:
Theorem 4.2 Let S ⊆ ℝ^3 be an integer simplex with vertices O = 0 ∈ ℝ^3, and
A, B, C linearly independent vectors, d := det(A, B, C), V := {A, B, C}. The
following statements are equivalent:

(i) par(A, B) = par(B, C) = par(A, C) = {0}, and there exists a par(V)-
coefficient vector (α, β, γ) such that gcd(α, d) = gcd(β, d) = gcd(γ, d) = 1,
moreover the sum of two of α, β, γ is equal to d, and the third is equal to 1.
(ii) par(A, B) = par(B, C) = par(A, C) = {0}, and for every par(V)-coefficient
vector (α, β, γ), after possible permutation of the coordinates
mod(α, d) + mod(β, d) = d (i = 1, ..., d − 1).
(iii) S is empty.

Proof. If (i) holds, then (α, β, γ) ∈ G(V) is a generator, so (ii) also holds. If
(ii) holds, then for all x ∈ par(A, B, C) the sum of the first two V-coefficients is
divisible by d, and it follows that it is equal to d; since par(A, B) = {0}, the third
V-coordinate of x is nonzero, so the sum of the V-coefficients is greater than d,
which means exactly that x ∉ S. So S is empty.
The main part of the proof is (iii) implies (i). We follow the proof of [14]
Theorem 2.2:
Let S ⊆ ℝ^3 be an empty integer simplex. Then every face of S is also integer
and empty. Therefore (since the faces are two-dimensional) par(V′) = {0} for
every proper subset V′ ⊂ V. Now by Lemma 3.1, the group G(V) is cyclic. Let
the par(V)-coefficient vector (α, β, γ) be a generator.
Claim 1: d < mod(iα, d) + mod(iβ, d) + mod(iγ, d) < 2d (i = 1, . . . , d − 1).
Indeed, since S is empty, mod(iα, d) + mod(iβ, d) + mod(iγ, d) > d, and
(d − mod(iα, d)) + d − mod(iβ, d) + d − mod(iγ, d) > d, for all i = 1, . . . , d − 1.
Claim 2: There exists a generator (α, β, γ) ∈ G(V) such that α + β + γ = d + 1.
Note that mod(iα, d) + mod(iβ, d) + mod(iγ, d) is mod d different for different
i ∈ {1, ..., d − 1}, because if for j, k, 0 ≤ j < k ≤ d − 1 the values are equal
mod d, then for i = k − j the expression would be divisible by d, contradicting
Claim 1. So {mod(iα, d) + mod(iβ, d) + mod(iγ, d) : i = 1, ..., d − 1} is the same as
the set {d + 1, ..., 2d − 1}; in particular there exists k ∈ {1, ..., d − 1} such that
mod(kα, d) + mod(kβ, d) + mod(kγ, d) = d + 1. Then clearly, mod(ikα, d) +
mod(ikβ, d) + mod(ikγ, d) = d + i, so (kα, kβ, kγ) is also a generator, and the
claim is proved.
Choose now (α, β, γ) as in Claim 2.
Claim 3. Each i = 1, . . . , d − 1 is jump-coefficient of exactly one of α, β and γ.
Indeed, because of Claim 1 we have mod((i + 1)α, d) + mod((i + 1)β, d) +
mod((i + 1)γ, d) = mod(iα, d) + mod(iβ, d) + mod(iγ, d) + α + β + γ − d and the
claim follows.
Fix now the notation so that α ≥ β ≥ γ. If we apply Claim 3 to i = 1 we get
that α > d/2, β < d/2, γ < d/2.
Hence Lemma 3.2 can be applied to d − α, β, γ: Claim 3 means J_β ∩ J_α = ∅,
J_γ ∩ J_α = ∅, whence, because of the Fact noticed before Lemma 3.2,
J_β, J_γ ⊆ J_{d−α}. So by Lemma 3.2 β, γ | d − α. If both β and γ are proper
divisors, then they are both at most half of d − α, that is, β + γ ≤ d − α,
contradicting α + β + γ = d + 1.
So β = d − α, and then γ = 1, finishing the proof of the theorem. □
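
Claims 1 to 3 can be observed on a concrete generator of the form (α, d − α, 1). The Python sketch below is an illustration only: the function names are hypothetical, G(V) is assumed to be cyclic and generated by the given triple, and the jump test is evaluated for i ≤ d − 2 (an assumption about the intended index range, since at i = d − 1 every coordinate jumps).

```python
def residue_sums(generator, d):
    """Sum of the residues mod(i*g, d) over the coordinates g, for i = 1, ..., d-1."""
    return [sum(i * g % d for g in generator) for i in range(1, d)]


def is_empty_by_coefficients(generator, d):
    """Emptiness test: every nonzero par(V)-coefficient vector has sum > d."""
    return all(s > d for s in residue_sums(generator, d))


def jump_owners(generator, d):
    """For each i = 1, ..., d-2, the coordinates g that have a jump at i."""
    return [[g for g in generator if (i + 1) * g // d > i * g // d]
            for i in range(1, d - 1)]
```

For d = 7 and the generator (4, 3, 1) the residue sums are 8, 9, ..., 13, that is, d + i for i = 1, ..., 6, and every i ≤ 5 is a jump of exactly one coordinate.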
Note that the above proof contains most of the proof of [14] Theorem 2.2
and more: the key statement of that proof is the existence of a par(V)-coefficient
vector (α, β, γ) such that α + β + γ = d + 1. The above proof sharpens that
statement by adding that γ = 1 in this par(V)-coefficient vector. This additional
fact implies some simplifications for proving the main result of [14]. Here we omit
these; we prefer to deduce the following, in fact equivalent, versions of
Theorem 4.2 ([11], [15], [12]).
It turns out that the various versions are the same modulo row permutations in
the Hermite normal form of a 3 × 3 matrix that has two different types of rows:

Corollary 4.3 S = conv(0, A, B, C) ⊆ ℝ^3 (A, B, C ∈ ℤ^3) is empty if and only
if the Hermite normal form of one of the (six) 3 × 3 matrices M whose rows are
A, B, C in some order has rows (1, 0, 0), (0, 1, 0), (a, d − a, d) (gcd(a, d) = 1).

Proof. The if part is obvious, since for an arbitrary par(V )-coefficient vector
(α, β, γ): α + β = d and γ > 0, so α/d + β/d + γ/d > 1.
The only if part is an obvious consequence of the theorem: let the two par-
ticular rows mentioned in Theorem 4.2 (i) be the first two rows of a matrix M ,
and let the remaining row be the third. Then M is a 3 × 3 matrix, and with
the notation of Theorem 4.2, the rows of the Hermite normal form of M are
the following : the first two rows are (1, 0, 0), (0, 1, 0) since the parallelepiped
generated by any two nonzero extreme rays is {0}; the third is (d − α, d − β, d).
Since by Theorem 4.2 α + β = d, the statement follows. □
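
The ‘if’ direction of Corollary 4.3 can be confirmed by brute force for small d. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the simplex is taken directly in the normal form conv(0, e1, e2, (a, d − a, d)) with 0 < a < d, and lattice points are found by scanning the bounding box.

```python
from fractions import Fraction
from itertools import product
from math import gcd


def lattice_points_in_simplex(a, d):
    """Integer points of conv(0, e1, e2, (a, d - a, d)), found by brute force."""
    points = []
    for x, y, z in product(range(a + 2), range(d - a + 2), range(d + 1)):
        # Weights of (x, y, z) on e1, e2 and (a, d - a, d); the rest goes to 0.
        l3 = Fraction(z, d)
        l1 = x - a * l3
        l2 = y - (d - a) * l3
        if l1 >= 0 and l2 >= 0 and l1 + l2 + l3 <= 1:
            points.append((x, y, z))
    return points


def is_empty(a, d):
    """True when the four vertices are the only integer points."""
    return len(lattice_points_in_simplex(a, d)) == 4
```

For example, is_empty(2, 5) holds, whereas for a = 2, d = 4 (where gcd(a, d) = 2) the point (1, 1, 2) lies on the edge from 0 to (2, 2, 4).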

Corollary 4.4 S = conv(0, A, B, C) ⊆ ℝ^3 (A, B, C ∈ ℤ^3) is empty if and only
if the Hermite normal form of one of the (six) 3 × 3 matrices M whose rows are
A, B, C in some order has rows (1, 0, 0), (0, 1, 0) and (1, b, d) (gcd(b, d) = 1).

Proof. Again, the if part is obvious, because in an arbitrary par(V)-coefficient
vector the sum of the first and third coordinate is d, and the second coordinate
is positive.
The proof of the only if part is also the same as that of the previous corollary,
with the only difference that now we let the two particular rows mentioned in
Theorem 4.2 (i) be the first and third row of M. □

Corollary 4.5 The integer simplex S ⊆ ℝ^n, n ≤ 3, is empty if and only if
W(P) = 1 for all faces P of S (that is, for P = S and for every facet or edge P
of S).

Proof. If n ≤ 2 the statement is obvious (see above). The only if part follows
from the previous corollary: after applying a unimodular transformation, the
three nonzero vertices of S will be (1, 0, 0), (0, 1, 0) and (1, b, d). The vector
w := (1, 0, 0) shows that the width of S is 1.
Conversely, suppose W(P) = 1 for every face P of the integer simplex S.
Suppose indirectly that v ∈ S ∩ ℤ^n is not a vertex of S. Because of the statement
for n ≤ 2, v does not lie on a face P ⊂ S, P ≠ S. For a vector w ∈ ℤ^n
defining the width, min{w^T x : x ∈ S} < w^T v < max{w^T x : x ∈ S}, because
the vectors achieving the minimum or the maximum constitute a face of S. But
then W(S) = max{w^T x : x ∈ S} − min{w^T x : x ∈ S} =
(max{w^T x : x ∈ S} − w^T v) + (w^T v − min{w^T x : x ∈ S}) ≥ 1 + 1 = 2, a
contradiction. □

Corollary 4.6 Conjecture 4.1 is true for n ≤ 4.

Proof. For n ≤ 3 the statement is a consequence of Theorem 4.2.
Let n = 4. As noticed after Conjecture 4.1, we only have to prove that
Σ_{i=1}^n λ_i ≥ 2d for every λ ∈ G(V) implies that for some generator α (possibly
after permuting the coordinates) α_1 + α_2 = d, α_3 + α_4 = d.
Let α ∈ G(V) be a generator, and suppose without loss of generality α_4 =
d − 1. (This holds after multiplying α by a number relatively prime to d.) Then
Claim 2 and 3 of the proof of Theorem 4.2 hold with the choice α := α_1, β := α_2,
γ := α_3, and the proof can be finished in the same way. □
This argument can be copied for proving that Conjecture 4.1 holds true for
n = 2k − 1 if and only if it is true for n = 2k.

5 Computing the Width of Polytopes

In spite of some counterexamples (see Section 1), the width and the number
of lattice points of a lattice simplex are correlated, and some of the remarks
above are about this relation. It is interesting to note that the complexity of
computing these two numbers seems to show some analogy: it is hard to compute
the number of integer points of a polytope, but according to a recent result of
Barvinok [2] this problem is polynomially solvable if the dimension is bounded;
we show below that computing the width of quite particular simplices is already
NP-hard; however, there is a simple algorithm that finds the width in polynomial
time if the dimension is fixed. The proofs are quite easy:

Theorem 5.1 Let a ∈ ℤ^n. It is NP-complete to decide whether the width of
conv(e_1, ..., e_n, a) is at most 1.

Proof. The defined problem is clearly in NP. We reduce PARTITION to this
problem. Let the input of a PARTITION problem be a vector a ∈ ℤ^n, where
−a_n := Σ_{i=1}^{n−1} a_i / 2, and we can suppose without loss of generality a_i > 3
(i = 1, ..., n − 1). (We multiply by 4 an arbitrary instance of PARTITION. The
question of PARTITION is whether there exists a subset I ⊆ {1, ..., n − 1} such
that Σ_{i∈I} a_i = −a_n, and this is unchanged under multiplication of all the data
by a scalar.)
It is easy to see that the width of P := conv(e_1, ..., e_n, a) is at most 1 if and
only if the defined PARTITION problem has a solution.
Indeed, if I ⊆ {1, ..., n − 1} is a solution, then for the vector w ∈ ℤ^n defined
by w_i := 1 if i ∈ I ∪ {n}, and w_i := 0 if i ∉ I ∪ {n}, we have w^T e_i ≤ 1
(i = 1, ..., n), w^T a = 0.
Conversely, let W(P) ≤ 1, and let w ∈ ℤ^n be the vector defining the width.
Then |w^T x − w^T y| ≤ 1, x, y ∈ {e_1, ..., e_n} means that the value of w_i (i =
1, ..., n) is one of two consecutive integers, and these are nonnegative without
loss of generality: for some k ∈ ℕ and K ⊆ {1, ..., n}, w_i := k + 1 if i ∈ K
and w_i := k if i ∉ K; but then w^T a = k Σ_{i=1}^n a_i + Σ_{i∈K} a_i. Since w defines
the width, |w^T a| ∈ {k, k + 1}, and an easy computation shows that this is possible
only if k = 0 and K \ {n} is a solution of the partition problem. □
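
The easy direction of the reduction can be made concrete. The Python sketch below is an illustration only: the function names are hypothetical, the PARTITION instance is solved by exhaustive search (exponential, for tiny inputs only), and indices are 0-based, so the last coordinate of a plays the role of a_n.

```python
from itertools import combinations


def partition_subset(weights):
    """Brute-force search for an index set whose weights sum to half the total."""
    total = sum(weights)
    if total % 2:
        return None
    for size in range(len(weights) + 1):
        for subset in combinations(range(len(weights)), size):
            if 2 * sum(weights[i] for i in subset) == total:
                return set(subset)
    return None


def width_certificate(weights):
    """Return (w, spread) for a = weights + [-sum/2], or None if no partition exists.

    w is the 0/1 vector of the proof: 1 on the partition set and on the last
    coordinate. spread is max - min of w^T x over the vertices e_1..e_n and a.
    """
    chosen = partition_subset(weights)
    if chosen is None:
        return None
    a = list(weights) + [-sum(weights) // 2]
    n = len(a)
    w = [1 if (i in chosen or i == n - 1) else 0 for i in range(n)]
    # w^T e_i is simply w_i; the last value is w^T a.
    values = w + [sum(wi * ai for wi, ai in zip(w, a))]
    return w, max(values) - min(values)
```

For the weights (4, 8, 12) the search finds the set consisting of the weight 12, so a = (4, 8, 12, −12), w = (0, 0, 1, 1), and w takes only the values 0 and 1 on the vertices; for (4, 8, 16) no partition exists and no certificate of this form is produced.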
Let us consider a polytope P := {x ∈ ℝ^n : Ax ≤ b} where A is a matrix
with n columns. Denote the i-th row of A by a_i, and the corresponding right
hand side by b_i. For simplicity (and without loss of generality) suppose that
P is full dimensional. If v is a vertex of P, denote C_v := cone(a_i : a_i^T v = b_i)
and let X_{u,v} denote the set of extreme rays of the cone C_u ∩ −C_v, each
represented by a vector x = (x_1, ..., x_n) ∈ ℤ^n with gcd(x_1, ..., x_n) = 1. Let

    Z_{u,v} := ⋃ {conv(X ∪ {0}) : X ⊆ X_{u,v}, X is a basis of ℝ^n} ∩ ℤ^n.

Theorem 5.2 W(P) = min w^T(u − v) over all pairs of vertices u, v of P, and
all w ∈ Z_{u,v}.

Proof. By definition, the width of P is the minimum of max{c^T(u − v) : u, v ∈
P} over all vectors c ∈ ℤ^n \ {0}. Let w be the minimizing vector. It follows
from the theory of linear programming that u, v can be chosen to be vertices of
P which maximize and respectively minimize the linear objective function w^T x;
moreover, by the duality theorem of linear programming, w is then a nonnegative
combination of vectors in C_u and also a nonnegative combination of vectors in
−C_v; that is, w ∈ C_u ∩ −C_v. Fix now u and v to be these two vertices of P. All
that remains to be proved is:
Claim. For some linearly independent subset X ⊆ X_{u,v}, w ∈ conv(X ∪ {0}) ∩ ℤ^n.
Indeed, since w ∈ C_u ∩ −C_v, we know by Carathéodory’s theorem that
w = Σ_{x∈X} λ_x x for some linearly independent X ⊆ X_{u,v} and λ_x ≥ 0 (x ∈ X). In
order to prove the claim, we have to prove that Σ_{x∈X} λ_x ≤ 1. If not:
w^T(u − v) = (Σ_{x∈X} λ_x x)^T(u − v) ≥ (Σ_{x∈X} λ_x) min_{x∈X} x^T(u − v)
> min_{x∈X} x^T(u − v).
Let this minimum be achieved in x_0 ∈ X, so we proved w^T(u − v) > x_0^T(u − v).
Since x_0 ∈ C_u ∩ −C_v, max{x_0^T(u′ − v′) : u′, v′ ∈ P} is achieved for u′ = u and
v′ = v. But then w^T(u − v) > x_0^T(u − v) contradicts the definition of w. □
Theorem 5.2 provides a polynomial algorithm for computing the width if n is
fixed: all the extreme rays and facets of C_u ∩ −C_v can be computed in polynomial
time; since |X_{u,v}| is bounded by a polynomial, it contains a polynomial number of
subsets X of fixed size n; for every X we solve a mixed integer program searching
for an integer point in conv(X ∪ {0}) but not in X ∪ {0}. Mixed integer programs
can be solved in polynomial time by Lenstra [9], see also Schrijver [13], page 260.
Haase and Ziegler [5] present W (S) where S is a lattice simplex as the optimal
value of a direct integer linear program. Their method is much simpler, and
probably quicker. The finite set of points (‘finite basis’) provided by Theorem 5.2
can be useful for presenting a finite list of vectors that include a width-defining
w ∈ ZZn , for arbitrary polytopes.
The negative results of the paper do not exclude that the emptiness of integer
simplices, and the width of empty integer simplices are decidable in polynomial
time. The positive results show some relations between these notions, involving
both complexity and bounds.
Acknowledgment: I am thankful to Jean-Michel Kantor for introducing me
to the notions and the references of the subject; to Imre Bárány and Bernd
Sturmfels for further helpful discussions.

References
1. W. Banaszczyk, A.E. Litvak, A. Pajor, S.J Szarek, The flatness theorem for
non-symmetric convex bodies via the local theory of Banach spaces, Preprint
1998.
2. A. Barvinok, A polynomial time algorithm for counting integral points in
polyhedra when the dimension is fixed, Math. Oper. Res., 19 (1994), 769–779.
3. J.W.S. Cassels, An Introduction to the Geometry of Numbers, Springer, Berlin,
1959.
4. W. Bienia, L. Goddyn, P. Gvozdjak, A. Sebő, M. Tarsi, Flows, View Obstruc-
tions, and the Lonely Runner, J. Comb. Theory/B, Vol 72, No 1, 1998.
5. C. Haase, G. Ziegler, On the maximal width of empty lattice simplices, preprint,
July 1998/January 1999, 10 pages, European Journal of Combinatorics, to ap-
pear.
6. R. Kannan, L. Lovász, Covering minima and lattice-point-free convex bodies,
Annals of Mathematics, 128 (1988), 577-602.
7. L. Lovász and M. D. Plummer, Matching Theory, Akadémiai Kiadó, Budapest,
1986.
8. J-M. Kantor, On the width of lattice-free simplices, Compositio Mathematica,
1999.
9. H.W. Lenstra Jr., Integer programming with a fixed number of variables, Math-
ematics of Operations Research, 8, (1983), 538–548.
10. B. Reznick, Lattice point simplices, Discrete Mathematics, 60, 1986, 219–242.
11. J.E.Reeve, On the volume of lattice polyhedra, Proc. London Math. Soc. (3) 7
(1957), 378–395.
12. H. Scarf, Integral polyhedra in three space, Math. Oper. Res., 10 (1985), 403–438.
13. A. Schrijver, Theory of Linear and Integer Programming, Wiley, Chichester,
1986.
14. A. Sebő, Hilbert bases, Caratheodory’s theorem and Combinatorial Optimiza-
tion, IPCO1, (R. Kannan and W. Pulleyblank eds), University of Waterloo Press,
Waterloo 1990, 431–456.
15. G. K. White, Lattice tetrahedra, Canadian J. Math. 16 (1964), 389–396.
On Optimal Ear-Decompositions of Graphs

Zoltán Szigeti

Equipe Combinatoire, Université Paris 6, 75252 Paris, Cedex 05, France


[email protected]

Abstract. This paper can be considered as a continuation of a paper
[7] of the author. We consider optimal ear-decompositions of graphs that
contain as few even ears as possible. The ear matroid of a graph was
introduced in [7] via optimal ear-decompositions. Here we give a simple
description of the blocks of the ear matroid of a graph. The second goal
of this paper is to point out how the structural result in [7] easily implies
the Tight Cut Lemma of Edmonds, Lovász and Pulleyblank. Moreover,
we propose the investigation of a new class of graphs that generalizes
matching-covered graphs. A graph is called ϕ-covered if each edge may
lie on an even ear of an optimal ear-decomposition. Several theorems on
matching-covered graphs will be generalized for ϕ-covered graphs.

1 Introduction
This paper is concerned with matchings, matroids and ear-decompositions. The
results presented here are closely related to the ones in [7]. As in [7], we focus
our attention on ear-decompositions (called optimal) that have minimum num-
ber ϕ(G) of even ears. A. Frank showed in [3] how an optimal ear-decomposition
can be constructed in polynomial time for any 2-edge-connected graph. For an
application we refer the reader to [1], where we gave a 17/12-approximation algo-
rithm for the minimum 2-edge-connected spanning subgraph problem. ϕ(G) can
also be interpreted as the minimum size of an edge-set (called critical making)
whose contraction leaves a factor-critical graph, because, by a result of Lovász
(see Theorem 1), ϕ(G) = 0 if and only if G is factor-critical. It was shown in
[7] (see also [8]) that the minimal critical making edge sets form the bases of a
matroid. We call this matroid the ear-matroid of the graph, and it will be denoted
by M_G. For a better understanding of this matroid we give here a simple
description of the blocks of MG . As we shall see, two edges belong to the same
block of the ear-matroid if and only if there exists an optimal ear-decomposition
so that these two edges are contained in the same even ear. Let us denote by
D(G) the graph obtained from G by deleting each edge which is always in an
odd ear in all optimal ear-decompositions of G. We shall show that the edge sets
of the blocks of this graph D(G) coincide with the blocks of MG .
Let us turn our attention to matching theory. Matching-covered graphs and
saturated graphs have ϕ(G) = 1. By generalizing the Cathedral Theorem of
Lovász [6] on saturated graphs we proved in [7] a structure theorem on graphs
with ϕ(G) = 1. As a new application of that result we present here a simple

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 415–428, 1999.
© Springer-Verlag Berlin Heidelberg 1999
proof for the Tight Cut Lemma due to Edmonds, Lovász and Pulleyblank [2].
Edmonds et al. determined in [2] the dimension of the perfect matching polytope
of an arbitrary graph. They used the so-called brick decomposition and the most
complicated part was to show that the result is true for bricks. In this case the
problem is equivalent to saying that no brick contains a non-trivial tight cut.
This is the Tight Cut Lemma. Their proof for this lemma contains some linear
programming arguments, the description of the perfect matching polytope, the
uncrossing technique and some graph theoretic arguments. Here we provide a
purely graph theoretic proof.
Let us call a graph ϕ-covered if for each edge e there exists an optimal
ear-decomposition so that e is contained in an even ear. We shall generalize
matching-covered graphs via the following fact.
(×) A graph G is matching-covered if and only if G/e is factor-critical for each
edge e of G, in other words, G is ϕ-covered and ϕ(G) = 1.
Thus ϕ-covered graphs can be considered as a generalization of
matching-covered graphs. We shall extend several results on matching-covered
graphs to ϕ-covered graphs. For example, Little's theorem combined with a result
of Lovász and Plummer states that if e and f are two arbitrary edges of a
matching-covered graph G, then G has an optimal ear-decomposition such that the
first ear is even and it contains e and f. This is also true for ϕ-covered
graphs. Lovász and Plummer [6] proved that each matching-covered graph has a
2-graded ear-decomposition. For a simple proof of this theorem see [9]. This
result may also be generalized to ϕ-covered graphs.

2 Definitions and Preliminaries

Let G = (V, E) be a (2-edge-connected) graph. An ear-decomposition of G is
a sequence G_0, G_1, ..., G_n = G of subgraphs of G where G_0 is a vertex and
each G_i arises from G_{i−1} by adding a path P_i for which the two end-vertices
(they are not necessarily distinct) belong to G_{i−1} while the inner vertices of
P_i do not. This means that the graph G can be written in the following form:
G = P_1 + P_2 + ... + P_n, where the paths P_i are called the ears of this
decomposition.
Note that we allow closed ears, for example the first ear (starting ear) is always a
cycle. An ear is odd (resp. even) if its length is odd (resp. even). If each ear is odd
then we say that it is an odd ear-decomposition. For 2-edge-connected graphs
ϕ(G) is defined to be the minimum number of even ears in an ear-decomposition
of G. An ear-decomposition is said to be optimal if it has exactly ϕ(G) even ears.
A graph G is called factor-critical if for each vertex v ∈ V (G), G − v has a
perfect matching.
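
The definition of an ear-decomposition can be illustrated by a minimal Python sketch. It is not part of the original paper: the function name count_even_ears and the representation of an ear as a vertex sequence are assumptions made for illustration. The sketch only checks the end-vertex and inner-vertex conditions and counts the even ears, so for a valid decomposition the returned value is an upper bound on ϕ(G); it does not compute ϕ(G).

```python
def count_even_ears(ears):
    """Check a candidate ear-decomposition and return its number of even ears.

    Each ear is a vertex sequence [v0, v1, ..., vt]; its length is t, the
    number of its edges. G_0 is the single vertex ears[0][0]. Every ear must
    start and end in the subgraph built so far, and its inner vertices must
    be new and pairwise distinct. (Hypothetical helper, illustration only.)
    """
    if not ears:
        raise ValueError("an ear-decomposition needs at least one ear")
    built = {ears[0][0]}              # vertex set of G_0
    even = 0
    for ear in ears:
        if len(ear) < 2:
            raise ValueError("an ear needs at least one edge")
        inner = ear[1:-1]
        if ear[0] not in built or ear[-1] not in built:
            raise ValueError("end-vertices must lie in the previous subgraph")
        if len(set(inner)) != len(inner) or built & set(inner):
            raise ValueError("inner vertices must be new and distinct")
        built |= set(inner)           # G_i = G_{i-1} + P_i
        if (len(ear) - 1) % 2 == 0:   # even number of edges
            even += 1
    return even


# The 4-cycle is a single closed even ear; the 5-cycle is a single odd ear.
c4_even = count_even_ears([[0, 1, 2, 3, 0]])
c5_even = count_even_ears([[0, 1, 2, 3, 4, 0]])
# One ear-decomposition of K4: a triangle, a path of length 2, a single edge.
k4_even = count_even_ears([[0, 1, 2, 0], [0, 3, 1], [2, 3]])
```

Under these assumptions the 5-cycle has a decomposition without even ears, in line with the fact that odd cycles are factor-critical.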

Theorem 1. (Lovász [5])
a.) A graph G is factor-critical if and only if ϕ(G) = 0.
b.) Let H be a connected subgraph of a graph G. If H and G/H are
factor-critical, then G is factor-critical. □
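
For small graphs, factor-criticality can be tested directly from the definition. The following brute-force sketch is an illustration only and is not taken from the paper; the names has_perfect_matching, is_factor_critical and cycle are hypothetical, and the running time is exponential.

```python
def has_perfect_matching(vertices, edges):
    """Brute force: does the subgraph induced by `vertices` have a perfect matching?"""
    vertices = set(vertices)
    if not vertices:
        return True                       # the empty graph is perfectly matched
    if len(vertices) % 2 == 1:
        return False
    v = next(iter(vertices))              # v must be matched to some neighbour w
    for a, b in edges:
        if a != b and v in (a, b):
            w = b if a == v else a
            if w in vertices and has_perfect_matching(vertices - {v, w}, edges):
                return True
    return False


def is_factor_critical(vertices, edges):
    """G is factor-critical if G - v has a perfect matching for every vertex v."""
    vertices = set(vertices)
    return all(has_perfect_matching(vertices - {v}, edges) for v in vertices)


def cycle(n):
    """Edge list of the cycle on vertices 0, ..., n-1 (hypothetical helper)."""
    return [(i, (i + 1) % n) for i in range(n)]


c5_critical = is_factor_critical(range(5), cycle(5))   # odd cycle: one odd ear
c4_critical = is_factor_critical(range(4), cycle(4))   # even order, so not critical
```

The 5-cycle consists of a single odd ear, so the positive answer agrees with Theorem 1/a.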
We say that an edge set of a graph G is critical making if its contraction leaves
a factor-critical graph. An edge of a graph G is allowed if it lies in some perfect
matching of G. If G has a perfect matching then N (G) denotes the subgraph of
G induced by the allowed edges of G. A connected graph G is matching covered
if each edge of G is allowed. The following theorem is obtained by combining the
result of Little [4] and a result of Lovász and Plummer [6].

Theorem 2. Let e and f be two arbitrary edges of a matching-covered graph
G. Then G has an ear-decomposition such that only the first ear is even and it
contains e and f. □
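
The notions of allowed edge and matching-covered graph can be checked by brute force on small examples. The sketch below is not from the paper; the helper names are hypothetical and each edge is assumed to be listed once as a pair of vertices.

```python
def has_perfect_matching(vertices, edges):
    """Brute force perfect-matching test on the subgraph induced by `vertices`."""
    vertices = set(vertices)
    if not vertices:
        return True
    if len(vertices) % 2 == 1:
        return False
    v = next(iter(vertices))
    for a, b in edges:
        if a != b and v in (a, b):
            w = b if a == v else a
            if w in vertices and has_perfect_matching(vertices - {v, w}, edges):
                return True
    return False


def allowed_edges(vertices, edges):
    """An edge ab is allowed if G - a - b still has a perfect matching."""
    vertices = set(vertices)
    return [(a, b) for a, b in edges
            if a != b and has_perfect_matching(vertices - {a, b}, edges)]


def is_connected(vertices, edges):
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for a, b in edges:
            if x in (a, b):
                y = b if a == x else a
                if y in vertices and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == vertices


def is_matching_covered(vertices, edges):
    """Connected and every edge allowed."""
    return is_connected(vertices, edges) and \
        len(allowed_edges(vertices, edges)) == len(edges)


c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
c4_with_chord = c4 + [(0, 2)]
# The chord 02 lies in no perfect matching, because 1 and 3 are not adjacent.
chord_allowed = (0, 2) in allowed_edges(range(4), c4_with_chord)
```

In this example N(G) for the 4-cycle with a chord is the 4-cycle itself.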
We call a graph almost critical if ϕ(G) = 1. By Theorem 2, matching covered
graphs are almost critical. We mention that if G is almost critical then each
critical making edge of G is allowed. Let G be an arbitrary graph. If X ⊆ V (G),
then the subgraph of G induced by X is denoted by G[X]. The graph obtained
from G by contracting an edge e of G will be denoted by G/e. By the subdivision
of an edge e we mean the operation which subdivides e by a new vertex. The
new graph will be denoted by G × e. We say that an edge e of G is ϕ-extreme
if ϕ(G/e) = ϕ(G) − 1, and more generally, an edge set F of G is ϕ-extreme if
ϕ(G/F) = ϕ(G) − |F|. We proved in Lemma 1.1 in [8] that for a 2-edge-connected
graph G,
ϕ(G × F) = ϕ(G/F) if F is a forest in G. (1)
A graph G is called ϕ-covered if each edge of G is ϕ-extreme. For a graph G, we
denote by D(G) the graph on V(G) whose edges are exactly the ϕ-extreme edges
of G. The ear matroid M(G) of a graph G was introduced in [8]. Its independent
sets are the ϕ-extreme edge sets of G and its bases are exactly the minimum
critical making edge sets. The set of bases of M(G) will be denoted by B(G). The
blocks of a graph G are defined to be the maximal 2-vertex-connected subgraphs
of G.
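
For an almost critical graph G, Theorem 1/a shows that an edge e is ϕ-extreme exactly when G/e is factor-critical, that is, when e is critical making. The following sketch, which is an illustration and not the author's method, contracts an edge and tests this on two small graphs. The function names and the label used for the contracted vertex are assumptions; loops and parallel edges created by the contraction are dropped, which does not affect perfect matchings.

```python
def has_perfect_matching(vertices, edges):
    vertices = set(vertices)
    if not vertices:
        return True
    if len(vertices) % 2 == 1:
        return False
    v = next(iter(vertices))
    for a, b in edges:
        if a != b and v in (a, b):
            w = b if a == v else a
            if w in vertices and has_perfect_matching(vertices - {v, w}, edges):
                return True
    return False


def is_factor_critical(vertices, edges):
    vertices = set(vertices)
    return all(has_perfect_matching(vertices - {v}, edges) for v in vertices)


def contract(vertices, edges, e):
    """Return G/e as (vertex set, edge list) with loops and parallel edges removed."""
    a, b = e
    merged = ("merged", a, b)             # hypothetical label of the new vertex

    def rename(x):
        return merged if x in (a, b) else x

    new_vertices = {rename(x) for x in vertices}
    new_edges = {frozenset((rename(x), rename(y)))
                 for x, y in edges if rename(x) != rename(y)}
    return new_vertices, [tuple(pair) for pair in new_edges]


def critical_making_edges(vertices, edges):
    """Edges e for which G/e is factor-critical."""
    return [e for e in edges if is_factor_critical(*contract(vertices, edges, e))]


c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
c4_with_chord = c4 + [(0, 2)]
critical_in_c4 = critical_making_edges(range(4), c4)
critical_with_chord = critical_making_edges(range(4), c4_with_chord)
```

Contracting any edge of the 4-cycle leaves a triangle, so every edge is critical making, as Fact (×) predicts for a matching-covered graph. In the second graph the chord is not critical making, so only the four cycle edges are returned.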
A connected component H of a graph G is called odd (even) if |V(H)| is odd
(even). A subgraph H of G is called nice if G − V(H) has a perfect matching.
Let H be a graph with a perfect matching. For X ⊆ V(H), c_o(H − X) denotes
the number of odd components of H − X. By Tutte's Theorem, c_o(H − X) ≤ |X|
for all X ⊆ V(H). A vertex set X is called a barrier if c_o(H − X) = |X|.
A non-empty barrier X of H is said to be a strong barrier if H − X
has no even components, each of the odd components is factor-critical and the
bipartite graph H_X obtained from H by deleting the edges spanned by X and by
contracting each factor-critical component to a single vertex is matching covered.
Let G = (V, E) be a graph and assume that the subgraph H of G induced by
U ⊆ V has a strong barrier X. Then H is said to be a strong subgraph of G with
strong barrier X if X separates U − X from V − U or if U = V.
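
The barrier condition c_o(H − X) = |X| is easy to evaluate on small graphs. The sketch below is an illustration only; the function names odd_components and barriers are hypothetical, and all vertex subsets are enumerated, so it is meant for tiny examples.

```python
from itertools import combinations


def odd_components(vertices, edges, removed):
    """Number of odd connected components of H - X, where X = removed."""
    remaining = set(vertices) - set(removed)
    seen, odd = set(), 0
    for start in remaining:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for a, b in edges:
                if x in (a, b):
                    y = b if a == x else a
                    if y in remaining and y not in component:
                        component.add(y)
                        stack.append(y)
        seen |= component
        odd += len(component) % 2
    return odd


def barriers(vertices, edges):
    """All vertex sets X with c_o(H - X) = |X|, in order of increasing size."""
    vertices = list(vertices)
    return [set(X)
            for r in range(len(vertices) + 1)
            for X in combinations(vertices, r)
            if odd_components(vertices, edges, X) == r]


c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
c4_big_barriers = [X for X in barriers(range(4), c4) if len(X) >= 2]
k4_big_barriers = [X for X in barriers(range(4), k4) if len(X) >= 2]
```

In the 4-cycle the two colour classes are barriers of size two, while K4 has no barrier with two or more vertices.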

Theorem 3. (Frank [3])
a.) A graph has a strong subgraph if and only if it is not factor-critical.
b.) Let H be a strong subgraph of a graph G. Then ϕ(H) = 1 and ϕ(G/H) =
ϕ(G) − 1. □
It is easy to see that (∗) if H is a strong subgraph (with strong barrier X)
of an almost critical graph G, then G − (V(H) − X) contains no critical making
edge of G. The following lemma generalizes (∗).

Lemma 1. Let H be a strong subgraph with strong barrier X of a graph G.
Then an edge e of G is ϕ-extreme in G if and only if e is ϕ-extreme either in H
or in G/H. That is, E(D(G)) = E(D(H)) ∪ E(D(G/H)).

Proof. To see the if part let e be a ϕ-extreme edge either in H or in G/H. By
Theorem 3/b, there exists an edge f in H and an edge set F in G/H of size
ϕ(G) − 1 so that H/f and (G/H)/F are factor-critical, and by the fact that the
ϕ-extreme edge sets are the independent sets of the ear matroid we may suppose
that F' := F ∪ f contains e. Then, by Theorem 1/b, G/F' is factor-critical and
since |F'| = ϕ(G) it follows that F' ∈ B(G), that is, e is ϕ-extreme in G.
The other direction will be derived from the following simple fact. Let e be
a ϕ-extreme edge in G.

Proposition 1. There exists B_e ∈ B(G) with e ∈ B_e and |B_e ∩ E(G[V(H)])| = 1.

Proof. If at least one of the two end vertices of e is contained in one of the odd
components of H − X, then let us denote this component by K, otherwise let K
be an arbitrary odd component of H − X. Let f be a ϕ-extreme edge in H which
connects K to X; such an edge exists by Theorem 4.3 in [3]. The proposition is
true for f (see the other direction above), thus there exists a base B_f ∈ B(G)
so that f ∈ B_f and |B_f ∩ E(G[V(H)])| = 1. The edge e is ϕ-extreme in G, that
is, it is independent in M(G). Thus it can be extended to a base B_e ∈ B(G)
using elements of B_f. We show that B_e has the required properties. Clearly,
e ∈ B_e, and |B_e ∩ E(G[V(H)])| ≤ 2 by construction. Let us denote by X' (by H')
the vertex set (subgraph) corresponding to X (H) in G/B_e. G/B_e is
factor-critical because B_e ∈ B(G), whence it trivially follows that
c_o(H' − X') < |X'|. Thus, by construction,
|X| − 1 = c_o(H − X) − 1 ≤ c_o(H' − X') ≤ |X'| − 1 ≤ |X| − 1.
Thus c_o(H' − X') = c_o(H − X) − 1 and |X'| = |X|. It follows that
|B_e ∩ E(G[V(H)])| = 1. □

Let D_e = B_e − E(G[V(H)]). Let G' := G/D_e. Then, by the previous
proposition, ϕ(G') = 1. We claim that H is a strong subgraph in G'. Otherwise,
if X changes, that is, |X| decreases, then the corresponding set X' violates
Tutte's condition in G', a contradiction because, as was mentioned in [3], any
graph Q with ϕ(Q) = 1 has a perfect matching.
First suppose that e ∈ E(G[V(H)]). Then, by (∗) and Lemma 2.5 in [7],
the lemma is true. Now suppose that e ∈ E(G/H). By Theorem 3/b, G'/H
is factor-critical. Since (G/H)/D_e = G'/H and |D_e| = ϕ(G) − 1 = ϕ(G/H),
e ∈ D_e ∈ B(G/H), that is, e is ϕ-extreme in G/H. □

The first two statements of the following theorem were proved in [7]. Since
the fourth statement is trivial, we prove here only the third one.
Theorem 4. Let G be an almost critical graph. Then
(4.1) E(D(G)) coincides with the edge set of one of the connected components
of N(G). This component will be denoted by B(G).
(4.2) B(G) is matching-covered and G/B(G) is factor-critical.
(4.3) V(B(G)) = ⋂ {V(H) : H is a strong subgraph in G}.
(4.4) Let u, v ∈ V(B(G)) and let T ⊂ V(G) be a set separating u and v. Then
δ(T) contains an allowed edge of G.

Proof. Let us denote by U the vertex set on the right hand side in Theorem
(4.3). By (∗), V(B(G)) ⊆ U. To show the other direction let v ∈ U. Let e be an
allowed edge of G incident to v. Let G' := G/e and let us denote by u the vertex
of G' corresponding to the two end vertices of e. G' − u has a perfect matching
because e is an allowed edge of G. If G' is not factor-critical, then it is easy to
see (using Lemma 1.8 in [8]) that G' contains a strong subgraph H so that H
does not contain u. It follows that H is a strong subgraph of G not containing
v, which is a contradiction since v ∈ U. Thus G' is factor-critical, that is, e is a
critical making edge of G, thus v ∈ V(B(G)), and we are done. □

3 The Ear Matroid


The blocks of a matroid N (G) are defined by an equivalence relation. For two
elements e and f of N (G), e ∼ f if there exists a circuit in the matroid containing
them. Observe that there exists a circuit in M(G) containing both e and f if
and only if there exists a base B ∈ B(G) containing e so that B − e + f is a base
again. This is an equivalence relation and the blocks of N (G) are the equivalence
classes of ∼ . There is a close relation between the circuits of M(G) and the
even ears of G.
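
The base-exchange characterization of the relation ∼ translates into a short procedure when the bases of a matroid are listed explicitly. The following sketch is an illustration under that assumption and is not taken from the paper; matroid_blocks is a hypothetical name, and a union-find structure is used to collect the equivalence classes.

```python
def matroid_blocks(ground, bases):
    """Partition `ground` into blocks, given an explicit list of the bases."""
    bases = {frozenset(b) for b in bases}
    parent = {x: x for x in ground}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    for base in bases:
        for e in base:
            for f in ground:
                # e ~ f if e lies in a base B and B - e + f is a base again
                if f not in base and (base - {e}) | {f} in bases:
                    parent[find(e)] = find(f)

    blocks = {}
    for x in ground:
        blocks.setdefault(find(x), set()).add(x)
    return {frozenset(block) for block in blocks.values()}


# Rank one on two elements: a and b can be exchanged, so there is one block.
one_block = matroid_blocks("ab", [{"a"}, {"b"}])
# c lies in every base and cannot be exchanged, so it forms its own block.
two_blocks = matroid_blocks("abc", [{"a", "c"}, {"b", "c"}])
```

Elements that belong to every base or to no base end up in singleton blocks, because no exchange involves them.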

Lemma 2. The following are equivalent for the ear matroid M(G) of a 2-edge-
connected graph G. Let e and f be two edges of G.
a.) there exists an optimal ear-decomposition of G so that the first ear is even
and it contains e and f .
b.) e and f are in the same block of M(G).

Proof. a.) ⇒ b.) If e and f together can be in an even ear P of an optimal
ear-decomposition then choosing one edge from each even ear (from P let e be
chosen) we obtain the desired base.
b.) ⇒ a.) Let B be a base containing e so that B − e + f is a base again.
Let G' := G × (B − e). Then, by (1), ϕ(G') = ϕ(G/(B − e)) = 1 and
e, f ∈ E(D(G')). By Theorem (4.2), B(G') is matching covered, so by Theorem 2,
B(G') satisfies Lemma 2/a. By Theorem (4.2), G'/B(G') is factor-critical, so by
Theorem 1/a, it has an odd ear-decomposition. It follows that G' satisfies
Lemma 2/a. Obviously, the ear-decomposition obtained provides the desired
ear-decomposition of G. □

Theorem 5. Let G be a 2-vertex-connected ϕ-covered graph. Then the ear
matroid M(G) has one block.
Proof. We prove the theorem by induction on ϕ(G). If ϕ(G) = 1, G is
matching-covered, and then by Theorem 2 and Lemma 2, the theorem is true. In
the rest of the proof we suppose that ϕ(G) ≥ 2. Let H be a strong subgraph of G
with strong barrier X. By Lemma 1, H and G/H are ϕ-covered, and, by Theorem
3/b, ϕ(H) = 1 and ϕ(G/H) = ϕ(G) − 1. Let G1 be an arbitrary block of G/H.
i.) Let e1 and e2 be two arbitrary edges of H. Let B ∈ B(G/H). Then
(G/H)/B is factor-critical. H/e1 and H/e2 are factor-critical because ϕ(H) =
1 and H is ϕ-covered. Let B' := B + e1. Then, by Theorem 1/b, G/B' and
G/(B' − e1 + e2) are factor-critical, that is, B' and B' − e1 + e2 are in B(G),
hence e1 and e2 belong to the same block of M(G).
ii.) Let e1 and e2 be two arbitrary edges of G1 . By induction, e1 and e2
belong to the same block of M(G1 ), thus there exists a base B ∈ B(G1 ) so that
e1 ∈ B and B − e1 + e2 ∈ B(G1 ). For each block Gi of G/H different from G1
let Bi ∈ B(Gi ). Furthermore, let f ∈ E(H). Finally, let D := B ∪ (∪Bi ) + f.
Note that |D| = ϕ(G). Then, by Theorem 1/b, G/D and G/(D − e1 + e2 ) are
factor-critical, that is D and D − e1 + e2 ∈ B(G). Hence e1 and e2 belong to the
same block of M(G).
iii.) Let e1 and f1 be two edges of G1 so that the corresponding two edges in G
are incident to two different vertices u and v of X. By the 2-vertex-connectivity
of G, such edges exist. Let e2 and f2 be two edges of H incident to u and v
respectively. By i.), ii.) and Lemma 2, there exists an optimal ear-decomposition
P_1 + P_2 + ... + P_k (P'_1 + P'_2 + ... + P'_l) of H (of G1) so that e2 and f2
(e1 and f1) belong to the starting even ear. Furthermore, let
P''_1 + P''_2 + ... + P''_m be an optimal ear-decomposition of G/H/G1 so that
the first ear contains the vertex corresponding to the contracted vertex set.
Using these ear-decompositions we provide an optimal ear-decomposition of G so
that the starting even ear will contain e1 and e2. u and v divide P_1 (which is
an even ear) into two paths D1 and D2. Suppose D1 contains e2. It is easy to see
that

Proposition 2. D1 and D2 are of even length. □

Consider the following ear-decomposition of G:
(D1 + P'_1) + D2 + P_2 + ... + P_k + P'_2 + ... + P'_l + P''_1 + P''_2 + ... + P''_m.
It is clear that this is an optimal ear-decomposition of G, the first ear is
even and it contains e1 and e2, so e1 and e2 belong to the same block of M(G).
i.), ii.) and iii.) imply the theorem. □

The following result generalizes Theorem (4.2) and gives some information
about the structure of D(G) for an arbitrary 2-edge-connected graph G.

Theorem 6. Let us denote by G1, ..., Gk the blocks of D(G). Then
a.) the graph S(G) := ((G/G1)/...)/Gk is factor-critical,
b.) ϕ(G) = Σ_{i=1}^{k} ϕ(Gi),
c.) ϕ(G/Gi) = ϕ(G) − ϕ(Gi) (i = 1, ..., k),
d.) Gi is ϕ-covered (i = 1, ..., k).
Proof. We prove the theorem by induction on ϕ(G). First suppose that ϕ(G) = 1.
Then Theorem (4.2) proves a.), and Theorem (4.2) and Fact (×) in the
Introduction imply b.), c.) and d.).
Now suppose that ϕ(G) ≥ 2. Let H be a strong subgraph of G with strong
barrier X. By Theorem 3/b, H is almost critical, and by Theorem (4.2), B(H) is
matching-covered. Since B(H) is 2-vertex-connected, by Lemma 1, it is included
in some Gi, say G1. Consider the graph G' := G/B(H). Since H/B(H) is
factor-critical by Theorem (4.2), ϕ(G') = ϕ(G/H) and E(D(G/H)) = E(D(G')). By
Lemma 1, E(D(G)) − E(D(H)) = E(D(G/H)), so E(D(G')) = E(D(G)) −
E(D(H)). Thus the blocks G'_1, ..., G'_l of D(G') are exactly the blocks of
G1/B(H) and G2, ..., Gk. By Theorem 3/b, ϕ(G') = ϕ(G/H) = ϕ(G) − 1, thus, by
the induction hypothesis, the theorem is true for G'.
Proposition 3. B(H) is a strong subgraph of G1 .
Proof. For an edge e ∈ E(H), e is ϕ-extreme in H if and only if e is ϕ-extreme in
G by Lemma 1, thus deleting the non ϕ-extreme edges of G from H the resulting
graph is exactly D(H). The factor-critical components of H − X correspond
to odd components of B(H) − X because the graph B(H) is nice in H by
Theorem (4.1). Thus X is a barrier of B(H). Let Y be a maximal barrier of
B(H) including X. Then, since B(H) is matching-covered by Theorem (4.2), Y
is a strong barrier of B(H). Since X separates the factor-critical components of
H − X from G − V(H) in G, Y separates B(H) − Y from G1 − V(B(H)). It
follows that B(H) is a strong subgraph of G1 with strong barrier Y. □

a.) Since S(G) = S(G') (in the second case we contracted G1 in two steps,
namely first B(H) and then the blocks of G1/B(H)), the statement follows
from the induction hypothesis.
b.) By Proposition 3 and Theorem 3/b, ϕ(G1/B(H)) = ϕ(G1) − 1. Hence
ϕ(G) = ϕ(G') + 1 = Σ_{i=1}^{l} ϕ(G'_i) + 1 = ϕ(G1) − 1 + Σ_{i=2}^{k} ϕ(Gi) + 1
= Σ_{i=1}^{k} ϕ(Gi).
c.) By Theorem 1/b, ϕ(G) ≤ ϕ(G/Gi) + ϕ(Gi) and
ϕ(G/Gi) ≤ ϕ((G/Gi)/∪_{j≠i} Gj) + Σ_{j≠i} ϕ(Gj). By adding them, and using that
ϕ((G/Gi)/∪_{j≠i} Gj) = 0 by a.), and Σ_{j=1}^{k} ϕ(Gj) = ϕ(G) by b.), we have
the following: ϕ(G) ≤ Σ_{j=1}^{k} ϕ(Gj) = ϕ(G). Thus equality holds everywhere,
hence ϕ(G) = ϕ(G/Gi) + ϕ(Gi), as we claimed.
d.) For i ≥ 2 the statement follows from the induction hypothesis. For G1 it
follows from the induction hypothesis, Proposition 3 and from Lemma 1. □

Theorem 7. The edge sets of the blocks of D(G) and the blocks of M(G) coin-
cide.
Proof. (a) Let e and f be two edges of G so that they belong to the same block
of M(G). By Lemma 2, there exists an optimal ear-decomposition of G so that
the starting even ear C contains e and f. Since every edge of C is ϕ-extreme,
the edges of this cycle belong to the same block of D(G).
(b) Let e and f be two edges of G so that they belong to the same block of
D(G), say to G1 . By Theorem 6/d, G1 is ϕ-covered, thus, by Theorem 5 and
Lemma 2, there exists an optimal ear-decomposition of G1 so that the starting
even ear contains e and f . By Theorem 6/c, ϕ(G/G1 ) = ϕ(G) − ϕ(G1 ), so this
ear-decomposition can be extended to an optimal ear-decomposition of G so that
the starting even ear contains e and f . Then by Lemma 2, e and f belong to
the same block of M(G). □

4 The Tight Cut Lemma


We consider only simple graphs in this section. A graph G is called bicritical
if for each pair of vertices u and v of G the graph G − u − v has a perfect
matching, in other words, it is matching-covered and saturated. A brick is a
3-vertex-connected bicritical graph.
Proposition 4. [6] Let G be a graph with a perfect matching. Then G is
bicritical if and only if G contains no barrier of size at least two. □
For a subset S of vertices, S̄ denotes the complement V(G) − S and δ(S)
denotes the set of edges leaving S. δ(S) is a cut, and a cut δ(S) is called odd
if |S| is odd. An odd cut δ(S) is trivial if |S| = 1 or |S̄| = 1, otherwise
non-trivial. An odd cut is called tight if it contains exactly one edge of each
perfect matching. For a cut δ(S), a perfect matching M is called S-fat if
|δ(S) ∩ M| ≥ 2. Note that if δ(S) is an odd cut and M is an S-fat perfect
matching, then |δ(S) ∩ M| ≥ 3.
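
Tightness of a cut can be checked on small graphs by enumerating all perfect matchings. The sketch below is an illustration only and plays no role in the proof that follows; the function names are hypothetical and each edge is assumed to be listed once. The two examples are the 6-cycle, which is matching-covered but not a brick, and the triangular prism, a well-known brick.

```python
def perfect_matchings(vertices, edges):
    """Yield every perfect matching of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    if not vertices:
        yield []
        return
    v = next(iter(vertices))              # v must be matched in every matching
    for a, b in edges:
        if a != b and v in (a, b):
            w = b if a == v else a
            if w in vertices:
                for rest in perfect_matchings(vertices - {v, w}, edges):
                    yield [(a, b)] + rest


def cut_edges(edges, S):
    """The cut delta(S): edges with exactly one end-vertex in S."""
    S = set(S)
    return [(a, b) for a, b in edges if (a in S) != (b in S)]


def fat_matching(vertices, edges, S):
    """Return some S-fat perfect matching, or None if there is none."""
    cut = set(cut_edges(edges, S))
    for M in perfect_matchings(vertices, edges):
        if len(cut & set(M)) >= 2:
            return M
    return None


def is_tight_cut(vertices, edges, S):
    """Every perfect matching uses exactly one edge of delta(S)."""
    cut = set(cut_edges(edges, S))
    return all(len(cut & set(M)) == 1 for M in perfect_matchings(vertices, edges))


c6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
prism = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4), (2, 5)]
S = {0, 1, 2}
c6_tight = is_tight_cut(range(6), c6, S)
prism_fat = fat_matching(range(6), prism, S)
```

In the 6-cycle the non-trivial odd cut δ(S) is tight. In the prism the three edges joining the two triangles form an S-fat perfect matching with exactly three cut edges, in agreement with the parity remark above.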
The aim of this section is to present a simple proof for the so-called Tight
Cut Lemma of Edmonds, Lovász, Pulleyblank [2].
Theorem 8. Let δ(S) be a non-trivial odd cut in a brick G. Then there exists an
S-fat perfect matching in G. In other words, a brick does not contain non-trivial
tight cuts.
Proof. Suppose there exists a non-trivial tight cut δ(S). Then G[S] and G[S̄] are
connected, otherwise an S-fat perfect matching can easily be found. Let us choose
S so that δ(S) is a non-trivial tight cut and |S| is minimal.
Two cases will be distinguished. Either there exists a vertex v ∈ V(G) so that
|δ(v) ∩ δ(S)| ≥ 2 or for each edge uv of G, δ(v) ∩ δ(S) = u and δ(u) ∩ δ(S) = v.
In the first case we will not use the minimality of S, thus we may suppose that
v ∈ S̄.
CASE 1. Since G[S̄] is connected, v has at least one neighbor in S̄. Let us
denote the edges incident to v not leaving S̄ by e1 = vu1, ..., ek = vuk. Then by
assumption, 1 ≤ k ≤ d(v) − 2. Let G' = G − e1 − ... − ek. The following lemma
contains the main steps in Case 1.
Lemma 3. Let v be an arbitrary vertex of a brick G. Let ei = vui (1 ≤ i ≤ k) be
k edges incident to v, where 1 ≤ k ≤ d(v) − 2. Let G' := G − e1 − ... − ek. Then
(3.1) G' is almost critical and v ∈ V(B(G')).
(3.2) For some 1 ≤ i ≤ k, ui ∈ V(B(G')).
Proof. G − v is factor-critical because G is bicritical. It follows that each edge of
G' incident to v is critical making. Thus G' is almost critical and v ∈ V(B(G')).
We prove (3.2) by induction on k. If k = 1, then we may change the role of
v and u1, and the statement follows from (3.1). Suppose (3.2) is true for each
1 ≤ l ≤ k − 1 but not for k. By (3.1) and Theorem (4.3), it follows that there
exists a strong subgraph H of G' with strong barrier X so that uk belongs to
an even component C of G' − X.

Proposition 5. v is in an odd component F1 of G' − X and |X| ≠ 1.

Proof. If v ∈ X, then X is a barrier in G, so by Proposition 4, X = v and since
G is 3-vertex-connected there is no even component of G − X, a contradiction.
If v was in an even component of G' − X, then X ∪ v is a barrier in G', and
also in G, which contradicts Proposition 4. Thus v is in an odd component F1 of
G' − X.
If |X| = 1, then, since v has at least two neighbors in G', v has a neighbor
in F1, thus |V(F1)| ≥ 3. It follows that G − (X ∪ v) has at least two connected
components (C and F1 − v). This is a contradiction because |X ∪ v| = 2 and G
is a brick. □

Proposition 6. Let R := {ui : ui ∈ V(H) − X − V(F1)} and let T be the union
of those connected components of N(G') which contain the vertices in R. Then
(6.1) 1 ≤ |R| ≤ k − 1 and (6.2) T ⊆ V(H) − X − V(F1).

Proof. There exists 1 ≤ j ≤ k − 1 so that uj belongs to an odd component of
H − X different from F1, because by Proposition 5 and Proposition 4, X is not
a barrier in G. Since by assumption |R| ≤ k − 1, (6.1) is proved.
(6.2) follows from the facts that each connected component of T contains a
vertex in V(H) − X − V(F1) (some ui), T ∩ X = ∅ (since by assumption and by
Theorem (4.1), T ∩ V(B(G')) = ∅ and by Theorem 4.6/a in [3], X ⊆ V(B(G'))),
and X is a cutset in G'. □

Let G'' := G − {ei : ui ∈ R}. Using (6.1), the induction hypothesis implies
that there exists uj ∈ R so that uj ∈ V(B(G'')).

Proposition 7. There exists an allowed edge of G' leaving T.

Proof. There exists an allowed edge f = st of G'' so that t ∈ T, s ∉ T by Theorem
(4.4) using that v ∉ T (since v ∈ V(B(G')) by (3.2) and T ∩ V(B(G')) = ∅),
v ∈ V(B(G'')) by (3.1) and uj ∈ T ∩ V(B(G'')). Let M1 be a perfect matching
of G'' which contains f.
To prove the proposition we show that f is an allowed edge of G'. There
exists an odd component F* ≠ F1 of G' − X which contains t by (6.2). By
adding some edges to G' between F1 and some even components of G' − X, the
odd components of G' − X different from F1 do not change. Consequently, X
is a barrier in G'' and F* is an odd component of G'' − X. Thus there exists
exactly one edge g of M1 leaving F*. g is an allowed edge in G' by Theorem
4.6/b in [3], Lemma 2.6 in [7] and Theorem (4.1). Let M2 be a perfect matching
of G' containing g. g is the only edge of M2 leaving F* because X is a barrier in
G'. Thus M1(F*) ∪ M2(G' − V(F*)) ∪ g is a perfect matching of G' containing
f. □
Proposition 7 gives a contradiction because, by the definition of T, there is
no allowed edge leaving T in G'. □

Let T := S ∪ v. By (3.1) G' is almost critical and v ∈ V(B(G')), and by (3.2), for
some neighbor ui of v, ui ∈ V(B(G')), thus by Theorem (4.4), δ(T) contains an
allowed edge f of G'. Let M be a perfect matching of G' containing f. Since v
has no neighbor in S̄ in G', M matches v to a vertex in S. Hence M is an S-fat
perfect matching in G, and we are done in Case 1.
CASE 2. The following proposition is the only place where we need the mini-
mality of S.
Proposition 8. G[S − u] is connected for each edge uv with u ∈ S, v ∈ S̄.
Proof. Suppose not, that is, there exists an edge uv with u ∈ S, v ∈ S̄ and a
partition of S − u into two sets S1 ≠ ∅ ≠ S2 so that there is no edge between
S1 and S2. First suppose that |S1| and |S2| are odd. Then an arbitrary perfect
matching of G containing uv is S-fat. Now suppose that |S1| and |S2| are even.
Then S' := S1 ∪ v is a non-trivial odd cut and |S'| < |S|. Then, by the minimality
of S, there exists an S'-fat perfect matching M of G. Since |δ(S') ∩ M| ≥ 3 and
there is no edge between S1 and S2, M is also S-fat. □

Proposition 9. G[S̄ − v] is connected for some edge uv with u ∈ S, v ∈ S̄.

Proof. Let us consider the blocks of G[S̄]. If it has only one block, then G[S̄ − v]
is connected for each vertex v. If it has more blocks, then, because of the tree
structure of the blocks, there exists a block B that contains only one cut vertex.
G contains no cut vertex so there exists a vertex v ∈ B so that v has a neighbor
u in S and B − v is connected. In both cases the proposition is proven. □

By Proposition 9, by Proposition 8 and since we are in Case 2, there exists
an edge uv so that u ∈ S, v ∈ S̄, v has only one neighbor in S (namely u) and u
has only one neighbor in S̄ (namely v), and G[S − u] and G[S̄ − v] are connected.
From now on, the minimality of S will not be used, that is, S and S̄ play the
same role.
Let G'' := G − u − v. Clearly,
(∗∗) G'' has a perfect matching but there exists no allowed edge of G'' between
S − u and S̄ − v.
Proposition 10. Let H be a strong subgraph of G'' with strong barrier X. Then
(10.1) (S − u) ∩ V(H) ≠ ∅ and (S̄ − v) ∩ V(H) ≠ ∅.
(10.2) Either X ⊆ (S − u) or X ⊆ (S̄ − v).
Proof. If say (S − u) ∩ V(H) = ∅, then X ∪ v is a barrier of G because all
neighbors of u are in S ∪ v. Since |X ∪ v| ≥ 2 this contradicts Proposition 4.
By Theorem 3/b, H is almost critical. Thus, by Theorem 4.3/b in [3], X ⊆
V(B(H)), so by Theorem (4.1) X belongs to a connected component of N(H).
By (∗∗), G'' has a perfect matching, thus X belongs to a connected component
of N(G''). Then, by (∗∗), X is disjoint either from S − u or from S̄ − v, which
was to be proved. □

Proposition 11. G'' is almost critical.

Proof. By (∗∗), G'' has a perfect matching. If G'' is not almost critical, then by
Corollary 4.8 in [3], there exist two disjoint strong subgraphs H1 and H2 in G''
with strong barriers X1 and X2. By Proposition (10.2), we may suppose that
X1 ⊆ (S̄ − v). By Proposition (10.1), (S − u) ∩ V(H1) ≠ ∅, thus (S − u) ⊂
V(H1) because G[S − u] is connected. Hence (S − u) ∩ V(H2) = ∅, contradicting
Proposition (10.1). □

Proposition 12. (S − u) ∩ V(B(G'')) ≠ ∅ and (S̄ − v) ∩ V(B(G'')) ≠ ∅.

Proof. Suppose (S − u) ∩ V(B(G'')) = ∅. We claim that there exists a strong
subgraph H of G'' such that (S − u) ∩ V(H) = ∅. This will give a contradiction
because of Proposition (10.1). Let w ∈ S − u. By Proposition 11, G'' is almost
critical. Since, by assumption, w ∉ V(B(G'')), it follows by Theorem (4.3) that
there exists a strong subgraph H with strong barrier X so that w ∉ V(H),
that is, w belongs to an even component C of G'' − X. By Theorem 4.3/b in [3],
X ⊆ V(B(G'')), thus X ∩ (S − u) = ∅. Since G[S − u] is connected, S − u ⊆ C. □

By Proposition 11, Theorem (4.4) and Proposition 12 there exists an allowed
edge f of G'' between S − u and S̄ − v. This contradicts (∗∗). □

5 ϕ-Covered Graphs
The aim of this section is to extend earlier results on matching-covered graphs of
Lovász and Plummer [6] for ϕ-covered graphs. First we prove a technical lemma.

Lemma 4. Let e be an edge of a ϕ-covered graph G and assume that ϕ(G) ≥ 2.
Then there exists a strong subgraph H of G so that e ∈ E(G/H).

Proof. First suppose that G has a perfect matching. Then, by Corollary 4.8
in [3], G has two vertex disjoint strong subgraphs. Clearly, for one of them
e ∈ E(G/H). Secondly, suppose that G has no perfect matching. Let X be a
maximal vertex set for which c_o(G − X) > |X|. Then each component of G − X
is factor-critical and, by assumption, X ≠ ∅.
i.) If a component F contains an end vertex of e, then by Lemma 1.8 in [8], G
has a strong subgraph H so that V(H) ⊆ V(G) − V(F), so we are done.
ii.) Otherwise, by Lemma 1.8 in [8], G has a strong subgraph H with strong
barrier Y ⊆ X so that each component of H − Y is a component of G − X. We
claim that e ∈ E(G/H). If not, then the two end vertices u and v of e belong
to Y because we are in ii.). By Lemma 1, H is ϕ-covered and e is ϕ-extreme in
H. Thus by Theorem 3/b and Fact (×), H is matching-covered, that is, H/e is
factor-critical. However, c_o((H/e) − (Y/e)) = |Y| > |Y/e|, so obviously, H/e is
not factor-critical, a contradiction. □

The following theorem generalizes Theorem 2 for ϕ-covered graphs. It is
equivalent to Theorem 5 by Lemma 2.

Theorem 9. Let G be a 2-vertex-connected ϕ-covered graph. Then for every
pair of edges {e, f} there exists an optimal ear-decomposition of G so that the
first ear is even and it contains e and f. □

Lovász and Plummer [6] proved that each matching-covered graph has a
2-graded ear-decomposition. This result may also be generalized for ϕ-covered
graphs. A sequence (G0 , G1 , ..., Gm ) of subgraphs of G is a generalized 2-graded
ear-decomposition of G if G0 is a vertex, Gm = G, for every i = 0, ..., m − 1 :
Gi+1 is ϕ-covered, Gi+1 is obtained from Gi by adding at most two disjoint
paths (ears) which are openly disjoint from Gi but their end-vertices belong to
Gi , if we add two ears then both are of odd length, and ϕ(Gi ) ≤ ϕ(Gi+1 ). This is
the natural extension of the original definition of Lovász and Plummer. Indeed,
if G is matching-covered then ϕ(G) = 1, thus the first ear will be even and all
the other ears will be odd.

Theorem 10. Let e be an arbitrary edge of a ϕ-covered graph G. Then G has
a generalized 2-graded ear-decomposition so that the starting ear contains e.

Proof. We prove the theorem by induction on |V(G)|. Let H be a strong
subgraph of G with strong barrier X. By Lemma 1, H and G/H are ϕ-covered.
By Theorem 3/b and Fact (×), H is matching-covered. Let us denote by v the
vertex of G/H corresponding to H.
Case 1. e ∈ E(H). By the result of Lovász and Plummer [6], H has a
2-graded ear-decomposition (G_0, G_1, ..., G_k) so that the starting ear contains e.
Let e' be an edge of G/H incident to v. By induction G/H has a 2-graded
ear-decomposition (G'_0, G'_1, ..., G'_l) so that the starting ear contains e'. Let
G''_i = G_i if 0 ≤ i ≤ k and let G''_i be the graph obtained from G'_{i−k−1} by
replacing the vertex v by G_k if k + 1 ≤ i ≤ k + l + 1. We show that
(G''_0, G''_1, ..., G''_{k+l+1}) is the desired generalized 2-graded
ear-decomposition. The starting ear contains e, in each step we added at most
two ears, when two ears were added then they were of odd length,
ϕ(G''_i) ≤ ϕ(G''_{i+1}) and finally by Lemma 1, each subgraph G''_i is
ϕ-covered.
Case 2. e ∉ E(H). Let us denote by Q the block of G/H which contains e.
Since G/H is ϕ-covered, Q is also ϕ-covered. By induction Q has a generalized
2-graded ear-decomposition (G0 , G1 , ..., Gk ) so that the starting ear contains e.
Let Gj be the first subgraph of Q which contains v and let a and b the two
edges of Gj incident to v. G/H/Q is also ϕ-covered, since a graph is ϕ-covered if
and only if each block of it is ϕ-covered. By induction G/H/Q has a generalized
2-graded ear-decomposition (G∗0 , G∗1 , ..., G∗p ) so that the starting ear contains an
edge incident to v.
i.) First suppose that a and b are incident to the same vertex u of X in G. Let c be
an edge of H incident to u. H has a 2-graded ear-decomposition (G'_0, G'_1, ..., G'_l)
so that the starting ear contains c. Let G''_i = G_i if 0 ≤ i ≤ j, let G''_i be the graph
obtained from G'_{i−j−1} by replacing the vertex u by G_j if j + 1 ≤ i ≤ j + l + 1, let
G''_i be the graph obtained from G_{i−l−1} by replacing the vertex v by G''_{j+l+1} if
j + l + 2 ≤ i ≤ k + l + 1 and finally let G''_i be the graph obtained from G*_{i−k−l−1}
by replacing the vertex v by G''_{k+l+1} if k + l + 2 ≤ i ≤ k + l + p + 1. As above,
it is easy to see that (G''_0, G''_1, ..., G''_{k+l+p+1}) is the desired generalized 2-graded
ear-decomposition.
ii.) Secondly, suppose that a and b are incident to different vertices u and w of
X in G. Let c and d be edges of H incident to u and w respectively. By Theorem
5.4.2 in [6] and Theorem 2, H has a 2-graded ear-decomposition (G'_0, G'_1, ..., G'_l)
so that the starting ear P_1 contains c and d. u and w divide P_1 (which is an
even ear) into two paths D_1 and D_2. By Proposition 2, D_1 and D_2 are of even
length. Let G''_i = G_i if 0 ≤ i ≤ j − 1, let G''_j be the graph obtained from G_j by
replacing the vertex v by D_1, let G''_{j+1} be the graph obtained from G_j by replacing
the vertex v by P_1, let G''_i be the graph obtained from G'_{i−j−1} by replacing P_1 by
G''_{j+1} if j + 2 ≤ i ≤ j + l + 1, let G''_i be the graph obtained from G_{i−l−1} by
replacing the vertex v by G''_{j+l+1} if j + l + 2 ≤ i ≤ k + l + 1 and finally, as above,
let G''_i be the graph obtained from G*_{i−k−l−1} by replacing the vertex v by G''_{k+l+1}
if k + l + 2 ≤ i ≤ k + l + p + 1. It is easy to see that (G''_0, G''_1, ..., G''_{k+l+p+1})
is the desired generalized 2-graded ear-decomposition. In this case, however, in order
to see that each subgraph G''_i (j + 2 ≤ i ≤ j + l + 1) is ϕ-covered we have to use
that the subgraph of G''_i corresponding to G'_{i−j−1} of H constructed so far is a
strong subgraph of G''_i. □

The two ear theorem on matching-covered graphs follows easily from the
following: if the addition of some new edge set F to a matching-covered graph
G results in a matching-covered graph then we can choose at most two edges
of F so that by adding them to G the graph obtained is matching-covered. The
next theorem is the natural generalization of this. However, we cannot prove
Theorem 10 using this result.
Theorem 11. Let F := {e_1, ..., e_k} ⊂ E(Ḡ) be a set of non-edges of a ϕ-covered
graph G. Suppose that G + F is ϕ-covered and ϕ(G) = ϕ(G + F). Then there
exist i ≤ j so that G + e_i + e_j is ϕ-covered.
Proof. We prove the theorem by induction on ϕ(G). If ϕ(G) = 1, then G
is matching-covered by Fact (×), so by the theorem of Lovász and Plummer
(Lemma 5.4.5 in [6]), we are done. In the following we suppose that ϕ(G) ≥ 2.
Let F' ⊆ F be a minimal set in F so that G' := G + F' is ϕ-covered. We claim
that |F'| ≤ 2. Suppose that |F'| ≥ 3 and let e be an edge of F'. By Lemma 4,
there exists a strong subgraph H of G' so that e ∈ E(G'/H). By Lemma 1, H
and G'/H are ϕ-covered. Let E_1 := E(H) ∩ F' and E_2 := E(G'/H) ∩ F'. Then
E_1 ∪ E_2 = F' and E_2 ≠ ∅.
First suppose that E_1 = ∅. Then H is also a strong subgraph of G, so
by Lemma 1, G/H is ϕ-covered. By Theorem 3, ϕ(G/H) = ϕ(G) − 1, thus by
induction for G/H and F', there exists F'' ⊆ F' so that |F''| ≤ 2 and (G/H) + F''
is ϕ-covered. By Lemma 1, G + F'' is ϕ-covered, and we are done.
Secondly, suppose that E_1 ≠ ∅. We claim that each edge of G + E_1 is
ϕ-extreme. Clearly, each edge of G is ϕ-extreme in G + E_1. Furthermore, each edge
of E_1 is ϕ-extreme in H, so by Lemma 1, they are ϕ-extreme in G + E_1. Since
E_1 ⊂ F', this contradicts the minimality of F'. □

Example. We give an example which shows that the above theorem is not true
if we delete the condition that ϕ(G) = ϕ(G + F). Let G be the graph obtained
from a star on four vertices by duplicating each edge and let F be the three
edges in E(Ḡ). Then G and G + F are ϕ-covered but for every nonempty proper
subset F' of F, G + F' is not ϕ-covered. Note that ϕ(G) = 3 and ϕ(G + F) = 1.
Acknowledgement. This research was done while the author visited the Research
Institute for Discrete Mathematics, Lennéstrasse 2, 53113 Bonn, Germany,
supported by an Alexander von Humboldt fellowship. All the results presented in
this paper can be found in [10] or in [11].

References
1. J. Cheriyan, A. Sebő, Z. Szigeti, An improved approximation algorithm for the
minimum 2-edge-connected spanning subgraph problem, Proceedings of IPCO6,
eds.: R.E. Bixby, E.A. Boyd, R.Z. Rios-Mercado, (1998) 126-136.
2. J. Edmonds, L. Lovász, W. R. Pulleyblank, Brick decompositions and the matching
rank of graphs, Combinatorica 2 (1982) 247-274.
3. A. Frank, Conservative weightings and ear-decompositions of graphs, Combinator-
ica 13 (1) (1993) 65-81.
4. C.H.C. Little, A theorem on connected graphs in which every edge belongs to a
1-factor, J. Austral. Math. Soc. 18 (1974) 450-452.
5. L. Lovász, A note on factor-critical graphs, Studia Sci. Math. Hungar. 7 (1972)
279-280.
6. L. Lovász, M. D. Plummer, “Matching Theory,” North Holland, Amsterdam,
(1986).
7. Z. Szigeti, On Lovász’s Cathedral Theorem, Proceedings of IPCO3, eds.: G. Ri-
naldi, L.A. Wolsey, (1993) 413-423.
8. Z. Szigeti, On a matroid defined by ear-decompositions of graphs, Combinatorica
16 (2) (1996) 233-241.
9. Z. Szigeti, On the two ear theorem of matching-covered graphs, Journal of Com-
binatorial Theory, Series B 74 (1998) 104-109.
10. Z. Szigeti, Perfect matchings versus odd cuts, Technical Report No. 98865 of Re-
search Institute for Discrete Mathematics, Bonn, 1998.
11. Z. Szigeti, On generalizations of matching-covered graphs, Technical Report of
Research Institute for Discrete Mathematics, Bonn, 1998.
Gale-Shapley Stable Marriage Problem
Revisited: Strategic Issues and Applications
(Extended Abstract)

Chung-Piaw Teo¹*, Jay Sethuraman²**, and Wee-Peng Tan¹

¹ Department of Decision Sciences, Faculty of Business Administration,
National University of Singapore
[email protected]
² Operations Research Center, MIT, Cambridge, MA 02139
[email protected]

1 Introduction

This paper is motivated by a study of the mechanism used to assign primary
school students in Singapore to secondary schools. The assignment process
requires that primary school students submit a rank ordered list of six schools
to the Ministry of Education. Students are then assigned to secondary schools
based on their preferences, with priority going to those with the highest exami-
nation scores on the Primary School Leaving Examination (PSLE). The current
matching mechanism is plagued by several problems, and a satisfactory resolu-
tion of these problems necessitates the use of a stable matching mechanism. In
fact, the student-optimal and school-optimal matching mechanisms of Gale and
Shapley [2] are natural candidates.
Stable matching problems were first studied by Gale and Shapley [2]. In a
stable marriage problem we have two finite sets of players, conveniently called the
set of men (M ) and the set of women (W ). We assume that every member of each
set has strict preferences over the members of the opposite sex. In the rejection
model, the preference list of a player is allowed to be incomplete in the sense that
players have the option of declaring some of the members of the opposite sex as
unacceptable; in the Gale-Shapley model we assume that preference lists of the
players are complete. A matching is just a one-to-one mapping between the two
sexes; in the rejection model, we also include the possibility that a player may be
unmatched, i.e. the player’s assigned partner in the matching is himself/herself.
The matchings of interest to us are those with the crucial stability property,
defined as follows: A matching µ is said to be unstable if there is a man-woman
pair, who both prefer each other to their (current) assigned partners in µ; this
pair is said to block the matching µ, and is called a blocking pair for µ. A
stable matching is a matching that is not unstable. The significance of stability
is best highlighted by a system where acceptance of the proposed matching is

* Research supported by a NUS Research grant RP 3970021.
** Research supported by an IBM Fellowship.

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 429–438, 1999.
© Springer-Verlag Berlin Heidelberg 1999

voluntary. In such a setting, an unstable matching cannot be expected to remain
intact, as the blocking pair(s) would soon discover that they could both improve
their match by joint action; the man and the woman involved in a blocking pair
could just “divorce” their respective partners and “elope.” To put this model
in context, the students play the role of “women” and the secondary schools
play the role of the “men.” Observe that there is a crucial difference between
the stable marriage model as described here, and the problem faced by the
primary school students in Singapore; in the latter, many “women” (students)
can be assigned to the same “man” (secondary school), whereas in the former we
require that the matching be one-to-one. In what follows, we shall restrict our
attention to the one-to-one marriage model; nevertheless, the questions studied
here, and the ideas involved in their resolution are relevant to the many-to-one
marriage model as well.
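
The stability definition above can be checked mechanically. The following is a minimal Python sketch, assuming complete preference lists stored as dictionaries (most preferred first) and a one-to-one matching given as a man-to-woman dictionary. The function name blocking_pairs and the two-by-two instance are illustrative choices, not taken from the paper.

```python
def blocking_pairs(matching, men_prefs, women_prefs):
    """Return every (man, woman) pair that blocks `matching`.

    `matching` maps each man to his partner; preference lists are complete
    and ordered from most to least preferred (assumed representation).
    """
    husband = {w: m for m, w in matching.items()}
    pairs = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break  # the remaining women rank below m's current partner
            # m prefers w to his partner; check whether w prefers m to hers
            w_list = women_prefs[w]
            if w_list.index(m) < w_list.index(husband[w]):
                pairs.append((m, w))
    return pairs


# Hypothetical instance: both men prefer w1, both women prefer m1.
men_prefs = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women_prefs = {"w1": ["m1", "m2"], "w2": ["m1", "m2"]}

unstable = {"m1": "w2", "m2": "w1"}
stable = {"m1": "w1", "m2": "w2"}

print(blocking_pairs(unstable, men_prefs, women_prefs))  # [('m1', 'w1')]
print(blocking_pairs(stable, men_prefs, women_prefs))    # []
```

A matching is stable exactly when the returned list is empty.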

One of the main difficulties with using the men-optimal stable matching
mechanism is that it is manipulable by the women: some women can intentionally
misrepresent their preferences to obtain a better stable partner. Such (strategic)
questions have been studied for the stable marriage problem by mathematical
economists and game theorists; essentially, this approach seeks to understand
and quantify the potential gains of a deceitful player. Roth [7] proved that when
the men-optimal stable-matching mechanism is used, a man will never be better
off by falsifying his preferences, if all the other people in the problem reveal their
true preferences. Falsifying preferences will at best result in the (original) match
that he obtains when he reveals his true preferences. Gale and Sotomayor [4]
showed that when the man-propose algorithm is used, a woman w can still force
the algorithm to match her to her women-optimal partner, denoted by µ(w), by
falsifying her preference list. The optimal cheating strategy for woman w is to
declare men who rank below µ(w) as unacceptable. Indeed, the cheating strat-
egy proposed by Gale and Sotomayor is optimal in the sense that it enables the
women to obtain their women-optimal stable partner even when the man-propose
mechanism is being adopted. Prior to this study, we know of no analogous results
when the women are required to submit complete preference lists. This question
is especially relevant to the Singapore school-admissions problem: All assign-
ments are done via the centralized posting exercise, and no student is allowed
to approach the schools privately for admission purposes. In fact, in the current
system, students not assigned to a school on their list are assigned to schools
according to some pre-determined criterion set by the Ministry. Thus effectively
this is a matching system where the students are not allowed to remain “single”
at the end of the posting exercise. To understand whether the stable matching
mechanism is a viable alternative, we first need to know whether there is any
incentive for the students to falsify their preferences, so that they can increase
their chances of being assigned to “better” schools. The one-to-one version of
this problem is exactly the question studied in this paper: in the stable marriage
model with complete preferences, with the men-optimal matching mechanism,
is there an incentive for the women to cheat? If so, what is the optimal cheating
strategy for a woman? To our knowledge, the only result known about this prob-
lem is an example due to Josh Benaloh (cf. Gusfield and Irving [5]), in which
the women lie by permuting their preference lists, and still manage to force the
men-optimal matching mechanism to return the women-optimal solution.

2 Optimal Cheating in the Stable Marriage Problem


Before we derive the optimal cheating strategy we consider a (simpler) question:
Suppose woman w is allowed to reject proposals. Is it possible for woman w to
identify her women-optimal partner by observing the sequence of proposals in
the men-propose algorithm? Somewhat surprisingly, the answer is yes! Our algo-
rithm to compute the optimal cheating strategy is motivated by the observation
that if a woman simply rejects all the proposals made to her, then the best (ac-
cording to her true preference list) man among those who have proposed to her
is her women-optimal partner. Hence by rejecting all her proposals, a woman
can extract information about her best possible partner. Our algorithm for the
optimal cheating strategy builds on this insight: the deceitful woman rejects as
many proposals as possible, while remaining engaged to some man who proposed
earlier in the algorithm. Using a backtracking scheme, the deceitful woman can
use the matching mechanism repeatedly to find her optimal cheating strategy.

2.1 Finding Your Optimal Partner


We first describe algorithm OP —an algorithm to compute the women-optimal
partner for w using the man-propose mechanism. (Recall that we do this under
the assumption that woman w is allowed to remain single.)
Algorithm OP

1. Run the man-propose algorithm, and reject all proposals made to w. At the
end, w and a man, say m, will remain single.
2. Among all the men who proposed to w in Step 1, let the best man (according
to w) be m1 .
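
The two steps of Algorithm OP can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: preference lists are dictionaries ordered from most to least preferred, the function name women_optimal_partner is hypothetical, and the data are the five-by-five instance of Example 1 in Section 2.2.

```python
def women_optimal_partner(men_prefs, women_prefs, w):
    """Sketch of Algorithm OP: woman w rejects every proposal (Step 1) and
    then picks the best man among those who proposed to her (Step 2)."""
    rank = {x: {m: i for i, m in enumerate(p)} for x, p in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # index of m's next proposal
    husband = {}
    proposed_to_w = []
    free = list(men_prefs)
    while free:
        m = free.pop()
        if next_choice[m] == len(men_prefs[m]):
            continue  # m was rejected by every woman and remains single
        x = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if x == w:
            proposed_to_w.append(m)
            free.append(m)  # w rejects, so m stays free
            continue
        cur = husband.get(x)
        if cur is None:
            husband[x] = m
        elif rank[x][m] < rank[x][cur]:
            husband[x] = m
            free.append(cur)
        else:
            free.append(m)
    # Only comparisons among actual proposers use w's true preferences.
    return min(proposed_to_w, key=lambda m: rank[w][m])


# Instance of Example 1 (lists ordered from most to least preferred).
MEN = {1: [2, 3, 4, 5, 1], 2: [3, 4, 5, 1, 2], 3: [5, 1, 4, 2, 3],
       4: [3, 1, 2, 4, 5], 5: [1, 5, 2, 3, 4]}
WOMEN = {1: [1, 2, 3, 5, 4], 2: [2, 1, 4, 5, 3], 3: [3, 2, 5, 1, 4],
         4: [4, 5, 1, 2, 3], 5: [5, 1, 2, 3, 4]}

print(women_optimal_partner(MEN, WOMEN, 1))  # 1
```

For woman 1 the sketch returns man 1, which agrees with the women-optimal partner reported for her in the discussion of Example 1.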

Theorem 1. m1 is the women-optimal partner for w.

Proof: Let µ(w) denote the women-optimal partner for w. We modify w’s pref-
erence list by inserting the option to remain single in the list, immediately after
µ(w). (We declare all men that are inferior to µ(w) as unacceptable to w.) Con-
sequently, in the man-propose algorithm, all proposals inferior to µ(w) will be
rejected. Nevertheless, since there exists a stable matching with w matched to
µ(w), our modification does not destroy this solution. It is also well known that
the set of people who are single is the same for all stable matchings (cf. Roth
and Sotomayor [8], pg. 42). Thus, w must be matched in all stable matchings
with the modified preference list. The men-optimal matching for this modified
preference list must match w to µ(w). In particular, µ(w) must have proposed to
w during the execution of the man-propose algorithm. Note that until µ(w) pro-
poses to w, the man-propose algorithm for the modified list runs exactly in the
same way as in Step 1 of OP . The difference is that Step 1 of OP will reject the
proposal from µ(w), while the man-propose algorithm for the modified list will
accept the proposal from µ(w), as w prefers µ(w) to being single. Hence, clearly
µ(w) is among those who proposed to w in Step 1 of OP , and so m1 ≥w µ(w).
Suppose m1 >w µ(w). Consider the modified list in which we place the option
of remaining single immediately after m1 . We run the man-propose algorithm
with this modified list. Again, until m1 proposes to w, the algorithm runs ex-
actly the same as in Step 1 of OP , after which the algorithm returns a stable
partner for w who is at least as good as m1 . This gives rise to a contradiction
as we assumed µ(w) to be the best stable partner for w.

Observe that under this approach, the true preference list of w is only used
to compare the men who have proposed to w. We do not need to know her exact
preference list; we only need to know which man is the best among a given set of
men, according to w. Hence the information set needed here to find the women-
optimal partner of w is much less than that needed when the woman-propose
algorithm is used. This is useful for the construction of the cheating strategy as
the information on the “optimal” preference list is not given a priori and is to
be determined.

2.2 Cheating Your Way to a Better Marriage


Observe that the preceding procedure only works when woman w is allowed
to remain single throughout the matching process, so that she can reject any
proposal made to her in the algorithm. Suppose we do not give the woman an
option to declare any man as unacceptable. How do we determine her best stable
partner? This is essentially a restatement of our original question: who is the
best stable partner woman w can have when the man-propose algorithm is used
and when she can lie only by permuting her preference list?
A natural extension of Algorithm OP is for woman w to: (i) accept a proposal
first, and then reject all future proposals. (ii) From the list of men who proposed
to w but were rejected, find her most preferred partner; repeat the Gale-Shapley
algorithm until the stage when this man proposes to her. (iii) Reverse the earlier
decision and accept the proposal from this most preferred partner, and continue
the Gale-Shapley algorithm by rejecting all future proposals. (iv) Repeat (ii)
and (iii) until the woman cannot find a better partner from all other propos-
als. Unfortunately, this elegant strategy does not always yield the best stable
partner a woman can achieve under our model. The reason is that this greedy
improvement technique does not allow for the possibility of rejecting the cur-
rent best partner, in the hope that this rejection will trigger a proposal from
a better would-be partner. Our algorithm in this paper does precisely that. Let
P(w) = {m_1, m_2, ..., m_n} be the true preference list of woman w, and let
P (m, w) be a preference list for w that returns m as her men-optimal partner.
Our algorithm constructs P (m, w) iteratively, and consists of the following steps:
1. Run the man-propose algorithm with the true preference list P (w) for woman
w. Keep track of all men who propose to w. Let the men-optimal partner for
w be m, and let P (m, w) be the true preference list P (w).
2. Suppose mj proposed to w in the Gale-Shapley algorithm. By moving mj to
the front of the list P (m, w), we obtain a preference list for w such that the
men-optimal partner will be mj . Let P (mj , w) be this altered list. We say
that mj is a potential partner for w.
3. Repeat step 2 for every man who proposed to woman w in the algorithm;
after this, we say that we have exhausted man m, the men-optimal partner
obtained with the preference list P (m, w).
4. If a potential partner for w, say man u, has not been exhausted, run the
Gale-Shapley algorithm with P (u, w) as the preference list of w. Identify
new possible partners and their associated preference lists, and classify man
u as exhausted.
5. Repeat Step 4 until all possible partners of w are exhausted. Let N denote
the set of all possible (and hence exhausted) partners for w.
6. Among the men in N let ma be the man woman w prefers most. Then
P (ma , w) is an optimal cheating strategy for w.
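
Steps 1 to 6 above can be sketched in Python as follows. This is a minimal illustration, assuming dictionary-based preference lists (most preferred first); the names gale_shapley and optimal_cheating_list are hypothetical, and the truthful run is simply repeated inside the loop to keep the code short. The data are those of Example 1.

```python
def gale_shapley(men_prefs, women_prefs):
    """Man-propose algorithm with complete lists.

    Returns (husband, proposers): husband[w] is w's final partner and
    proposers[w] lists every man who proposed to w during the run."""
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    nxt = {m: 0 for m in men_prefs}
    husband, proposers = {}, {w: [] for w in women_prefs}
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][nxt[m]]
        nxt[m] += 1
        proposers[w].append(m)
        cur = husband.get(w)
        if cur is None or rank[w][m] < rank[w][cur]:
            husband[w] = m
            if cur is not None:
                free.append(cur)
        else:
            free.append(m)
    return husband, proposers


def optimal_cheating_list(men_prefs, women_prefs, w):
    """Return (best partner, announced list) for woman w."""
    true_list = women_prefs[w]

    def run(announced):
        prefs = dict(women_prefs)
        prefs[w] = announced
        husband, proposers = gale_shapley(men_prefs, prefs)
        return husband[w], proposers[w]

    # Step 1: the truthful run gives the first potential partner m and P(m, w).
    partner, _ = run(true_list)
    lists = {partner: list(true_list)}  # P(u, w) for each potential partner u
    pending, exhausted = [partner], set()
    while pending:  # Steps 4 and 5
        u = pending.pop()
        _, proposers = run(lists[u])
        for mj in proposers:  # Steps 2 and 3: move each new proposer to the front
            if mj not in lists:
                lists[mj] = [mj] + [m for m in lists[u] if m != mj]
                pending.append(mj)
        exhausted.add(u)
    best = min(exhausted, key=true_list.index)  # Step 6
    return best, lists[best]


MEN = {1: [2, 3, 4, 5, 1], 2: [3, 4, 5, 1, 2], 3: [5, 1, 4, 2, 3],
       4: [3, 1, 2, 4, 5], 5: [1, 5, 2, 3, 4]}
WOMEN = {1: [1, 2, 3, 5, 4], 2: [2, 1, 4, 5, 3], 3: [3, 2, 5, 1, 4],
         4: [4, 5, 1, 2, 3], 5: [5, 1, 2, 3, 4]}

print(optimal_cheating_list(MEN, WOMEN, 1))  # (3, [3, 4, 1, 2, 5])
```

Each pass through the loop exhausts one potential partner, so the sketch performs at most n + 1 runs of the man-propose algorithm per woman (one more than the bound in the text, because of the repeated truthful run).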

The men in the set N at the end of the algorithm have the following crucial
properties:

– For each man m in N , there is an associated preference list for w such that the
Gale-Shapley algorithm returns m as the men-optimal partner for w with
this list.
– All other proposals in the course of the Gale-Shapley algorithm come from
other men in N . (Otherwise, there will be some possible partners who are
not exhausted.)

With each run of the Gale-Shapley algorithm, we exhaust a possible partner,
and so we need at most n runs of the Gale-Shapley algorithm before termination.

Theorem 2. π = P (ma , w) is an optimal strategy for woman w.

Proof: (by contradiction) We use the convention that r(m) = k if man m is the
k-th man on woman w’s list. Let π* = {m_{p_1}, m_{p_2}, ..., m_{p_n}} be the preference list
that gives rise to the best stable partner for w under the man-propose algorithm.
Let this man be denoted by m_{p_b}, and let woman w strictly prefer m_{p_b} to m_a
(under her true preference list). Recall that we use r(m) to represent the rank
of m under the true preferences of w; by our assumption, r(m_{p_b}) < r(m_a),
i.e., m_{p_b} is ranked higher than m_a. Observe that the order of the men who
do not propose to woman w is irrelevant and does not affect the outcome of
the Gale-Shapley algorithm. Furthermore, men of rank higher than r(m_{p_b}) do
not get to propose to w, otherwise we can cheat further and improve on the
best partner for w, contradicting the optimality of π*. Thus we can arbitrarily
alter the order of these men, without affecting the outcome. Without loss of
generality, we may assume that 1 = r(m_{p_1}) < 2 = r(m_{p_2}) < ... < q = r(m_{p_b}).
Since r(m_{p_b}) < r(m_a), m_a will appear anywhere after m_{p_b} in π*: thus, m_a can
appear in any position from m_{p_{b+1}} to m_{p_n}.
Now, we modify π* such that all men who (numerically) rank lower than m_a
but higher than m_{p_b} (under true preferences) are put in order according to their
ranks. This is accomplished by moving all these men before m_a in π*. With that
alteration, we obtain a new list π̃ = {m_{q_1}, m_{q_2}, ..., m_{q_n}} such that:

(i) 1 = r(m_{q_1}) < 2 = r(m_{q_2}) < ... < s = r(m_{q_s}).
(ii) m_{q_1} = m_{p_1}, ..., m_{q_b} = m_{p_b}, where the position of those men who rank higher
than m_{p_b} is unchanged.
(iii) r(m_a) = s + 1, m_a ∈ {m_{q_{s+1}}, m_{q_{s+2}}, ..., m_{q_n}}.
(iv) The men in the set {m_{q_{s+1}}, m_{q_{s+2}}, ..., m_{q_n}} retain their relative position
with respect to one another under π*.

Note that the men-optimal partner of w under π̃ cannot come from the set
{m_{q_{s+1}}, m_{q_{s+2}}, ..., m_{q_n}}. Otherwise, since the set of men who proposed in the
course of the algorithm must come from {m_{q_{s+1}}, m_{q_{s+2}}, ..., m_{q_n}}, and since the
preference list π* retains the relative order of the men in this set, the same
partner would be obtained under π*. This leads to a contradiction as π* is
supposed to return a better partner for w. Hence, we can see that under π̃, we
already get a better partner than under π.
Now, since the preference list π returns m_a with r(m_a) = s + 1, we may
conclude that the set N (obtained from the final stage of the algorithm) does not
contain any man of rank smaller than s + 1. Thus N ⊆ {m_{q_{s+1}}, m_{q_{s+2}}, ..., m_{q_n}}.
Suppose m_{q_{s+1}}, m_{q_{s+2}}, ..., m_{q_w} do not belong to the set N, and m_{q_{w+1}} is the
first man after m_{q_s} who belongs to the set N. By construction of N, there
exists a permutation π̂ with m_{q_{w+1}} as the stable partner for w under the
men-optimal matching mechanism. Furthermore, all of those who propose to w in
the course of the algorithm are in N, and hence they are no better than m_a to
w. Furthermore, all proposals come from men in {m_{q_{w+1}}, m_{q_{w+2}}, ..., m_{q_n}}, since
N ⊆ {m_{q_{s+1}}, m_{q_{s+2}}, ..., m_{q_n}}.
By altering the order of those who did not propose to w, we may assume that
π̂ is of the form {m_{q_1}, m_{q_2}, ..., m_{q_{s−1}}, m_{q_s}, ..., m_{q_w}, m_{q_{w+1}}, ...}, where the first
q_w + 1 men in the list are identical to those in π̃. But the men-optimal stable
solution obtained using π̂ must also be stable under π̃, since w is matched to m_{q_{w+1}},
and the set of men she strictly prefers to m_{q_{w+1}} is identical in both π̂ and π̃.
This is a contradiction as π̃ is supposed to return a men-optimal solution better
than m_a. Thus π* does not exist, and so π is optimal and m_a is the best stable
partner w can get by permuting her preference list.

We now present an example to illustrate how our heuristic works.


Example 1: Consider the following stable marriage problem:

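True Preferences of the Men     True Preferences of the Women
Man 1: 2 3 4 5 1                Woman 1: 1 2 3 5 4
Man 2: 3 4 5 1 2                Woman 2: 2 1 4 5 3
Man 3: 5 1 4 2 3                Woman 3: 3 2 5 1 4
Man 4: 3 1 2 4 5                Woman 4: 4 5 1 2 3
Man 5: 1 5 2 3 4                Woman 5: 5 1 2 3 4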

We construct the optimal cheating strategy for woman 1.

– Step 1: Run Gale-Shapley with the true preference list for woman 1; her men-
optimal partner is man 5. Man 4 is the only other man who proposes to her
during the Gale-Shapley algorithm. So P (man5, woman1) = (1, 2, 3, 5, 4).
– Step 2-3: Man 4 is moved to the head of woman 1’s preference list; i.e.,
P (man4, woman1) = (4, 1, 2, 3, 5). Man 5 is exhausted, and man 4 is a
potential partner.
– Step 4: As man 4 is not yet exhausted, we run the Gale-Shapley algorithm
with P (man4, woman1) as the preference list for woman 1. Man 4 will be
exhausted after this, and man 3 is identified as a new possible partner, with
P (man3, woman1) = (3, 4, 1, 2, 5).
– Repeat Step 4: As man 3 is not yet exhausted, we run Gale-Shapley with
P (man3, woman1) as the preference list for woman 1. Man 3 will be ex-
hausted after this. No new possible partner is found, and so the algorithm
terminates.

Example 1 shows that woman 1 could cheat and get a partner better than the
men-optimal solution. However, her women-optimal partner in this case turns
out to be man 1. Hence Example 1 also shows that woman 1 cannot always assure
herself of getting the women-optimal partner through cheating, in contrast to
the case when rejection is allowed in the cheating strategy.

3 Strategic Issues in the Gale-Shapley Problem


By requiring the women to submit complete preference lists, we are clearly re-
stricting their strategic options, and thus many of the strong structural results
known for the model with rejection may not hold in this model. This is good
news, for it reduces the incentive for a woman to cheat. In the rest of this section,
we present some examples to show that the strategic behaviour of the women
can be very different under the models with and without rejection.

3.1 The Best Possible Partners (Obtained from Cheating) May Not
Be Women-Optimal
In the two-sided matching model with rejection, it is not difficult to see that the
women can always force the man-propose algorithm to return the women-optimal
solution (e.g. each woman rejects all those who are inferior to her women-optimal
partner). In our model, where rejection is forbidden, the influence of the women
is far less, even if they collude. A simple example is when each woman is ranked
first by exactly one man. In this case, there is no conflict among the men, and
in the men-optimal solution, each man is matched to the woman he ranks first.
(This situation arises whenever each man ranks his men-optimal partner as his
first choice.) In this case, the algorithm will terminate with the men-optimal
solution, regardless of how the women rank the men in their lists. So ruling out
the strategic option of remaining single for the women significantly affects their
ability to change the outcome of the game by cheating.
By repeating the above analysis for all the other women in Example 1, we
conclude that the best possible partners for women 1, 2, 3, 4, and 5 are,
respectively, men 3, 1, 2, 4, and 3. An interesting observation is that woman 5
cannot benefit by cheating alone (she can only get her men-optimal partner no
matter how she cheats). However, if woman 1 cheats using the preference list
(3, 4, 1, 2, 5), woman 5 will also benefit by being matched to man 5, who is first
in her list.
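
The effect of woman 1's cheating on woman 5 can be reproduced with a short Python sketch. It assumes the instance of Example 1 with dictionary-based preference lists (most preferred first); the function name gale_shapley is an illustrative choice.

```python
def gale_shapley(men_prefs, women_prefs):
    """Man-propose algorithm with complete lists; returns husband[w] for each w."""
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    nxt = {m: 0 for m in men_prefs}
    husband = {}
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][nxt[m]]
        nxt[m] += 1
        cur = husband.get(w)
        if cur is None or rank[w][m] < rank[w][cur]:
            husband[w] = m
            if cur is not None:
                free.append(cur)
        else:
            free.append(m)
    return husband


MEN = {1: [2, 3, 4, 5, 1], 2: [3, 4, 5, 1, 2], 3: [5, 1, 4, 2, 3],
       4: [3, 1, 2, 4, 5], 5: [1, 5, 2, 3, 4]}
WOMEN = {1: [1, 2, 3, 5, 4], 2: [2, 1, 4, 5, 3], 3: [3, 2, 5, 1, 4],
         4: [4, 5, 1, 2, 3], 5: [5, 1, 2, 3, 4]}

truthful = gale_shapley(MEN, WOMEN)

# Woman 1 announces (3, 4, 1, 2, 5); everyone else stays truthful.
announced = dict(WOMEN)
announced[1] = [3, 4, 1, 2, 5]
after_cheating = gale_shapley(MEN, announced)

for w in sorted(WOMEN):
    print(w, truthful[w], after_cheating[w])
```

In the truthful run women 1 and 5 are matched to men 5 and 3; after woman 1 cheats they are matched to men 3 and 5, while the partners of women 2, 3, and 4 are unchanged.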

3.2 Multiple Strategic Equilibria

Suppose each woman w announces a preference list π(w). The set of strategies
{π(1), π(2), . . . , π(n)} is said to be in strategic equilibrium if none of the women
has an incentive to deviate unilaterally from this announced strategy. It is easy
to see that if a woman benefits from announcing a different permutation list
(instead of her true preference list), then no other woman is made worse off,
i.e. every other woman will get a partner who is at least as good as her
men-optimal partner (cf. Roth and Sotomayor [8]).

Theorem 3. If a single woman can benefit by cheating, then the game has mul-
tiple strategic equilibria.

Proof: A strategic equilibrium can be constructed by repeating the proposed
cheating algorithm iteratively, improving the partner for some woman at each
iteration. (Notice that the partner of a woman at the end of iteration j is at
least as good as her partner at the beginning of the iteration j.) The algorithm
will thus terminate at a strategic equilibrium, where at least one woman will
be matched to someone whom she (strictly) prefers to her men-optimal part-
ner. Another strategic equilibrium is obtained if each woman w announces a
list of the form {m1 , m2 , . . . , mn }, with m1 being her men-optimal partner and
m2 , m3 , . . . , mn in the same (relative) order as in her true preference list. Clearly
the man-propose algorithm will match woman w to m1 , since moving m1 to the
front of w’s preference list does not affect the sequence of proposals in the man-
propose algorithm. No woman can benefit from cheating, as all other women
are already matched to their announced first-ranked partner. Thus we have con-
structed two strategic equilibria.
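
The second equilibrium in the proof can be checked by brute force on a small instance. The sketch below is an illustration under stated assumptions: it uses the data of Example 1, dictionary-based preference lists (most preferred first), and hypothetical helper names (gale_shapley, can_gain). It tries every permutation a single woman could announce while the others keep their announced lists.

```python
from itertools import permutations


def gale_shapley(men_prefs, women_prefs):
    """Man-propose algorithm with complete lists; returns husband[w] for each w."""
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    nxt = {m: 0 for m in men_prefs}
    husband = {}
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][nxt[m]]
        nxt[m] += 1
        cur = husband.get(w)
        if cur is None or rank[w][m] < rank[w][cur]:
            husband[w] = m
            if cur is not None:
                free.append(cur)
        else:
            free.append(m)
    return husband


MEN = {1: [2, 3, 4, 5, 1], 2: [3, 4, 5, 1, 2], 3: [5, 1, 4, 2, 3],
       4: [3, 1, 2, 4, 5], 5: [1, 5, 2, 3, 4]}
WOMEN = {1: [1, 2, 3, 5, 4], 2: [2, 1, 4, 5, 3], 3: [3, 2, 5, 1, 4],
         4: [4, 5, 1, 2, 3], 5: [5, 1, 2, 3, 4]}

men_optimal = gale_shapley(MEN, WOMEN)

# Each woman announces her men-optimal partner first, the rest in true order.
announced = {w: [men_optimal[w]] + [m for m in p if m != men_optimal[w]]
             for w, p in WOMEN.items()}
same_outcome = gale_shapley(MEN, announced) == men_optimal


def can_gain(w):
    """True if some unilateral permutation gives w a truly better partner."""
    base = WOMEN[w].index(men_optimal[w])
    for perm in permutations(WOMEN[w]):
        profile = dict(announced)
        profile[w] = list(perm)
        if WOMEN[w].index(gale_shapley(MEN, profile)[w]) < base:
            return True
    return False


is_equilibrium = not any(can_gain(w) for w in WOMEN)
print(same_outcome, is_equilibrium)  # True True
```

The check enumerates n! lists per woman, so it is only practical for very small n; it is meant as a sanity check of the argument, not as an algorithm.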
3.3 Does It Pay To Cheat?

Roth [7] shows that under the man-propose mechanism, the men have no incen-
tives to alter their true preference lists. In the rejection model, however, Gale
and Sotomayor [3] show that a woman has an incentive to cheat as long as she
has at least two distinct stable partners. Pittel [6] shows that the average number
of stable solutions is asymptotic to n log(n)/e, and that, with high probability,
the ranks of the women-optimal and men-optimal partners for a woman are,
respectively, log(n) and n/log(n). Thus in typical instances of the stable marriage
game under the rejection model, most of the women will not reveal their true
preference lists.
Many researchers have argued that the troubling implications from these
studies are not relevant in practical stable marriage games, as the model assumes
that the women have full knowledge of each individual’s preference list and the
set of all the players in the game. For the model we consider, it is natural to
ask whether it pays (as in the rejection model) for a typical woman to solicit
information about the preferences of all other participants in the game. We run
the cheating algorithm on 1000 instances, generated uniformly at random, for
n = 8 and the number of women who benefited from cheating is tabulated in
Table 1.

Number of women who benefited    0   1   2   3   4   5   6   7   8
Number of observations         740 151  82  19   7   1   0   0   0

Table 1

Interestingly, the number of women who can gain from cheating is surprisingly
low. In fact, in 74% of the instances, the men-optimal solution is their only
option, no matter how they cheat. The average percentage of women who benefit
from cheating is merely 5.06%.
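
The two summary figures quoted above follow directly from Table 1. The short Python sketch below recomputes them; the variable names are illustrative.

```python
# Table 1: number of women who benefited -> number of instances (n = 8).
observations = {0: 740, 1: 151, 2: 82, 3: 19, 4: 7, 5: 1, 6: 0, 7: 0, 8: 0}
n = 8

instances = sum(observations.values())                       # 1000
beneficiaries = sum(k * c for k, c in observations.items())  # 405
share_no_gain = 100 * observations[0] / instances            # 74.0
average_pct = 100 * beneficiaries / (instances * n)          # 5.0625

print(instances, beneficiaries, share_no_gain, average_pct)
```

The 405 beneficiaries among 8000 women give 5.0625%, which rounds to the 5.06% stated in the text.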
To look at the typical strategic behaviour on larger instances of the stable
marriage problem, we run the heuristic on 1000 random instances for n = 100.
The cumulative plot is shown in Figure 1. In particular, in more than 60% of
the instances at most 10 women (out of 100) benefited from cheating, and in
more than 96% of the instances at most 20 women benefited from cheating.
The average percentage of women who benefited from cheating is 9.52%. Thus, the
chances that a typical woman can benefit from acquiring complete information
(i.e., knowing the preferences of the other players) is pretty slim in our model.
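
An experiment of this kind can be sketched in Python as follows. This is not the authors' code: the random generator, the seed, the instance count, and all function names are assumptions, and a woman is counted as benefiting when the cheating algorithm of Section 2.2, applied to her alone, finds a partner she strictly prefers to her men-optimal partner. The resulting counts will therefore differ from Table 1.

```python
import random
from collections import Counter


def gale_shapley(men_prefs, women_prefs):
    """Man-propose algorithm; returns (husband, proposers) per woman."""
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    nxt = {m: 0 for m in men_prefs}
    husband, proposers = {}, {w: [] for w in women_prefs}
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][nxt[m]]
        nxt[m] += 1
        proposers[w].append(m)
        cur = husband.get(w)
        if cur is None or rank[w][m] < rank[w][cur]:
            husband[w] = m
            if cur is not None:
                free.append(cur)
        else:
            free.append(m)
    return husband, proposers


def best_partner_by_cheating(men_prefs, women_prefs, w):
    """Return (men-optimal partner, best partner reachable by permuting)."""
    true_list = women_prefs[w]
    husband, _ = gale_shapley(men_prefs, women_prefs)
    lists = {husband[w]: list(true_list)}  # P(u, w) for each potential partner
    pending = [husband[w]]
    while pending:
        u = pending.pop()
        prefs = dict(women_prefs)
        prefs[w] = lists[u]
        _, proposers = gale_shapley(men_prefs, prefs)
        for mj in proposers[w]:
            if mj not in lists:
                lists[mj] = [mj] + [m for m in lists[u] if m != mj]
                pending.append(mj)
    return husband[w], min(lists, key=true_list.index)


def count_beneficiaries(men_prefs, women_prefs):
    """Number of women who can strictly gain by cheating alone."""
    return sum(1 for w in women_prefs
               if len(set(best_partner_by_cheating(men_prefs, women_prefs, w))) == 2)


def random_instance(n, rng):
    """Uniformly random complete preference lists for n men and n women."""
    people = list(range(n))
    men = {m: rng.sample(people, n) for m in people}
    women = {w: rng.sample(people, n) for w in people}
    return men, women


def experiment(n, instances, seed=0):
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(instances):
        hist[count_beneficiaries(*random_instance(n, rng))] += 1
    return hist


hist = experiment(n=8, instances=200)
for k in sorted(hist):
    print(k, hist[k])
```

Since each woman needs at most n + 1 runs of the man-propose algorithm in this sketch, one instance costs at most n(n + 1) runs, which is consistent with the remark that large instances are computationally demanding.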
We have repeated the above experiment for large instances of the Gale-
Shapley model. Due to computational requirements, we can only run the ex-
periment on 100 random instances of the problem with 500 men and women.
Again the insights obtained from the 100 by 100 cases carry over: the number
of women who benefited from cheating is again not more than 10% of the to-
tal number of the women involved. In fact, the average was close to 6% of the
women population in the problem. This suggests that the number of women who
can benefit from cheating in the Gale-Shapley model with n women grows at a
rate which is slower than a linear function of n. However, detailed probabilistic
438 Chung-Piaw Teo, Jay Sethuraman, and Wee-Peng Tan

Fig. 1. Benefits of cheating: cumulative plot for n = 100 (horizontal axis: number of people who benefit from cheating, 0 to 100; vertical axis: number of instances, out of 1000)

analysis of this phenomenon is a challenging problem that is beyond the scope
of the present paper.
A piece of practical advice for all women in the stable marriage game using the men-
optimal matching mechanism: “don’t bother to cheat!”

References
1. Dubins, L.E., D.A. Freedman (1981). Machiavelli and the Gale-Shapley Algorithm.
American Mathematical Monthly 88 485-494.
2. Gale, D., L.S. Shapley (1962). College Admissions and the Stability of Marriage.
American Mathematical Monthly 69 9-15.
3. Gale, D., M. Sotomayor (1985a). Some Remarks on the Stable Matching Problem.
Discrete Applied Mathematics 11 223-232.
4. Gale, D., M. Sotomayor (1985b). Ms Machiavelli and the Stable Matching Problem.
American Mathematical Monthly 92 261-268.
5. Gusfield, D., R.W. Irving (1989). The Stable Marriage Problem: Structure and
Algorithms, MIT Press, Massachusetts.
6. Pittel, B. (1989). The Average Number of Stable Matchings. SIAM Journal on
Discrete Mathematics 530-549.
7. Roth, A.E. (1982). The Economics of Matching: Stability and Incentives.
Mathematics of Operations Research 7 617-628.
8. Roth, A.E., M. Sotomayor (1991). Two-sided Matching: A Study in Game-theoretic
Modelling and Analysis, Cambridge University Press, Cambridge.
Vertex-Disjoint Packing of Two Steiner Trees:
Polyhedra and Branch-and-Cut

Eduardo Uchoa and Marcus Poggi de Aragão

Departamento de Informática - Pontifícia Universidade Católica do Rio de Janeiro
Rua Marquês de São Vicente, 225 - Rio de Janeiro RJ, 22453-900, Brasil.
[email protected], [email protected]

Abstract. Consider the problem of routing the electrical connections
among two large terminal sets in circuit layout. A realistic model for
this problem is given by the vertex-disjoint packing of two Steiner trees
(2VPST), which is known to be NP-complete. This work presents an
investigation of the 2VPST polyhedron. The main idea is to start from
facet-defining inequalities for a vertex-weighted Steiner tree polyhedron.
Some of these inequalities are proven to also define facets of the packing
polyhedron, while others are lifted to derive important new families
of inequalities, including proven facets. Separation algorithms are also
presented. The resulting branch-and-cut code performs very well
and is capable of solving problems on grid graphs with up to
10000 vertices and 5000 terminals in a few minutes.

1 Introduction
The Vertex-disjoint Packing of Steiner Trees problem (VPST) is the following:
given a connected graph G = (V, E) and K disjoint vertex-sets N1 , . . . , NK ,
called terminal sets, find disjoint vertex-sets T1 , . . . , TK such that, for each k ∈
{1, . . . , K}, Nk ⊆ Tk and G[Tk ] (the subgraph induced by Tk ) is connected.
Deciding whether a VPST instance is feasible is NP-complete and is often
a very challenging task in practice. VPST is widely applicable to the layout of printed
boards or chips, as it accurately models the routing of electrical
connections among the circuit components (see Korte et al. [3] or Lengauer [4]).
This work studies the case where K = 2, the Vertex-Disjoint Packing of
Two Steiner Trees (2VPST). This restricted problem is still NP-complete [3].
Although most VPST applications involve more than two terminal sets, many
problems in circuit layout have only two large terminal sets, for instance the
ground and the Vcc terminals. Finding a solution for these two large sets may
represent a major step in solving the whole routing problem. In fact, it is usual in
chip layout to route the ground and Vcc connections over special layers, separated
from the remaining connections. This particular routing problem leads to the
2VPST.
We address an optimization version of the 2VPST: find two disjoint vertex-
sets T1 and T2 , such that G[T1 ] and G[T2 ] are connected, minimizing |N1 \ T1 | +
|N2 \ T2 |. In other words, we look for a solution leaving the minimum possible

G. Cornuéjols, R.E. Burkard, G.J. Woeginger (Eds.): IPCO’99, LNCS 1610, pp. 439–452, 1999.
© Springer-Verlag Berlin Heidelberg 1999

number of unconnected terminals. Weights can also be set in order to
specify which terminals have priority in being connected. These features may
be very useful for a practitioner and suggest an iterative approach to layout:
if the 2VPST instance is not feasible, a best possible solution gives valuable
information on how to make small changes in terminal placement to achieve
feasibility.
Previous work somewhat related to ours is that of Grötschel et al. [1,2],
where the Edge-disjoint Packing of Steiner Trees (EPST) is studied. EPST gives an
alternative model for the routing problem in circuit layout. Their branch-and-
cut code could solve switchbox instances (G is a grid graph and all terminals
are placed on the outer face) with up to 345 vertices, 24 terminal sets, and 68
terminals.

2 Formulation
For each vertex i ∈ V define a binary variable xi , where xi = 1 if i belongs to
T1 and xi = 0 if i belongs to T2 . This choice of variables obliges every vertex to
belong either to T1 or T2 . Of course, terminals of N1 in T2 and terminals of N2
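in T1 are considered unconnected. A formulation for 2VPST follows.

\[
(\mathrm{F})\quad
\begin{array}{lll}
\text{Min} & z = -\sum_{i \in N_1} w_i + \sum_{i \in V} w_i x_i & (1) \\[4pt]
\text{s.t.} & \sum_{i \in S} x_i - \sum_{i \in R} x_i \le 1 \qquad \forall S \subseteq V,\ |S| = 2;\ \forall R \in \Delta(S) & (2) \\[4pt]
 & -\sum_{i \in S} x_i + \sum_{i \in R} x_i \le |R| - 1 \qquad \forall S \subseteq V,\ |S| = 2;\ \forall R \in \Delta(S) & (3) \\[4pt]
 & x_i \in \{0, 1\} \qquad \forall i \in V & (4)
\end{array}
\]
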

Weight wi must be negative if i ∈ N1, positive if i ∈ N2 and 0 otherwise. The
minimum possible number of unconnected terminals matches z∗ when wi = −1
for all i ∈ N1 and wi = 1 for all i ∈ N2. In any case the 2VPST instance is
feasible iff z∗ = 0. The symbol ∆(S) denotes the set of all minimal separators of
S. A vertex-set R is a separator of S if any path joining the pair of vertices in S
has at least one vertex in R. Constraints F.2 say: if both vertices in S are in T1,
then at least one vertex in any separator of S must also be in T1. Constraints
F.3 are complementary to F.2: if both vertices in S are in T2, then at least one
vertex in any separator of S must be in T2.
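To make the objective (1) concrete, the following Python sketch evaluates a 0/1 assignment x on a tiny instance. It is an illustration only: the graph, the helper names `is_connected` and `evaluate`, and the data layout (adjacency lists in a dict) are our assumptions, and instead of enumerating constraints (2) and (3) it checks directly that G[T1] and G[T2] are connected, which is what those constraints enforce.

```python
from collections import deque

def is_connected(adj, vertices):
    """True if the subgraph induced by `vertices` is connected (the empty set counts as connected)."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == vertices

def evaluate(adj, n1, w, x):
    """Return (is_solution, z) for a 0/1 assignment x, with z computed as in objective (1)."""
    t1 = {v for v in adj if x[v] == 1}
    t2 = set(adj) - t1
    is_solution = is_connected(adj, t1) and is_connected(adj, t2)
    z = -sum(w[i] for i in n1) + sum(w[i] * x[i] for i in adj)
    return is_solution, z

# Hypothetical instance: the path 1 - 2 - 3 - 4 with N1 = {1, 3} and N2 = {2, 4}.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
n1, n2 = {1, 3}, {2, 4}
w = {1: -1, 3: -1, 2: 1, 4: 1}          # unit weights: -1 on N1, +1 on N2
x = {1: 1, 2: 1, 3: 1, 4: 0}            # T1 = {1, 2, 3}, T2 = {4}
print(evaluate(adj, n1, w, x))
```

On this instance the assignment is a solution with z = 1: terminal 2 belongs to N2 but lies in T1, so exactly one terminal is left unconnected.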

3 Polyhedral Investigations
Many combinatorial optimization problems can be naturally decomposed into
simpler subproblems. The 2VPST decomposes into two Steiner-like subproblems.
There are cases where a good polyhedral description of the subproblems yields
a good description of the main polyhedron. On the other hand, sometimes even
a complete subproblem description yields only a poor description of the main
polyhedron. Experiments with column generation showed that this is indeed the
case for 2VPST [6]. In these situations, there is a need to also look for joint
inequalities, i.e. inequalities which are not valid for any subproblem polyhedron
but are valid for the main polyhedron.
Definition 1. Given a graph G = (V, E), a v-tree of G is a vertex-set T ⊆ V
such that G[T ] is connected. A complementary v-tree of G is a vertex-set T ⊆ V
such that G[V \ T ] is connected. A double v-tree of G is a v-tree T of G that is
also a complementary v-tree of G. (G being clear, we say v-tree or double v-tree
for short).
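Definition 1 translates directly into connectivity tests. The sketch below is a minimal illustration in Python; the function names and the two small graphs are hypothetical, and the empty set is treated as connected (a convention consistent with the use of the empty v-tree later in the paper).

```python
def is_connected(adj, vertices):
    """True if the subgraph induced by `vertices` is connected (empty set counts as connected)."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices

def is_v_tree(adj, t):
    return is_connected(adj, t)

def is_complementary_v_tree(adj, t):
    return is_connected(adj, set(adj) - set(t))

def is_double_v_tree(adj, t):
    return is_v_tree(adj, t) and is_complementary_v_tree(adj, t)

# Hypothetical graphs: a 4-cycle a - b - c - d - a and a path a - b - c.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

print(is_double_v_tree(cycle, {"a", "b"}))   # both sides of the cycle are connected
print(is_v_tree(cycle, {"a", "c"}))          # a and c are not adjacent
print(is_double_v_tree(path, {"b"}))         # removing b disconnects the path
```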

Definition 2. Let χU be the incidence vector of a vertex-set U. Define P(G) as
the convex hull of vectors χT such that T is a double v-tree of G. Define P1(G)
as the convex hull of vectors χT such that T is a v-tree of G and P2(G) as the
convex hull of vectors χT such that T is a complementary v-tree of G.
The vectors defining polyhedron P correspond to the solutions of formulation
F, so P is associated with 2VPST. Polyhedra P1 and P2 are associated with a Vertex
Weighted Steiner Tree problem: given a graph with positive and negative vertex
weights, find a tree with maximum/minimum total vertex weight. Note that P1
and P2 are complementary polyhedra and that every valid inequality for one can
be converted into a valid inequality for the other by variable complementation,
i.e. replacing x by 1 − x. For instance, inequalities F.2 are valid for P1 and the
complementary inequalities F.3 are valid for P2. Therefore whenever we talk
about polyhedron P1, it is implicit that similar results also hold for P2.
Our idea is to use subproblem polyhedra P1 and P2 as a starting point for
investigating main polyhedra P . We show that: (i) some inequalities that define
facets of P1 or P2 also define facets of P and (ii) some of these inequalities which
do not define facets of P can be lifted to derive new joint inequalities.
Theorem 1. P (G), P1 (G) and P2 (G) are full-dimensional.
Proof. Let A be a spanning tree of G, rooted at an arbitrary vertex. For each
i ∈ V, the vertex-set Ui containing i and its descendants in A is a double v-tree,
so χUi ∈ P(G). As χ∅ ∈ P(G), there are |V| + 1 affinely independent vectors in
P(G). As P(G) ⊆ P1(G) ∩ P2(G), the result follows. □
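The construction in this proof can be checked numerically on a small graph. The Python sketch below is an illustration under our own assumptions (a hypothetical 5-vertex graph, a BFS spanning tree, exact rank computation with fractions): it builds the sets Ui, verifies that each is a double v-tree, and confirms that their incidence vectors are linearly independent, which together with the zero vector χ∅ gives |V| + 1 affinely independent points.

```python
from collections import deque
from fractions import Fraction

def is_connected(adj, vertices):
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices

def descendant_sets(adj, root):
    """U[i] = i together with its descendants in a BFS spanning tree rooted at `root`."""
    parent, order, queue = {root: None}, [root], deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)
    U = {v: {v} for v in order}
    for v in reversed(order):            # children are merged before their parents
        if parent[v] is not None:
            U[parent[v]] |= U[v]
    return U

def rank(rows):
    """Rank of a matrix by exact Gaussian elimination."""
    rows = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r

# Hypothetical graph: a 4-cycle 1 - 2 - 3 - 4 - 1 with a pendant vertex 5 attached to 4.
adj = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1, 5], 5: [4]}
U = descendant_sets(adj, 1)
vertices = sorted(adj)
vectors = [[1 if v in U[i] else 0 for v in vertices] for i in vertices]
print(U, rank(vectors))
```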

3.1 The p-Separator Facets of P1


Definition 3. Let S be a vertex-set, where |S| ≥ 2. A vertex-set R is a separator
of S if any path joining a pair of vertices in S has at least one vertex in R. A
separator R of S is proper if S ∩ R = ∅ and minimal if no other separator of S
is contained in R. The symbol ∆(S) indicates the set of all proper and minimal
separators of S. Given a separator R of S and a vertex i ∈ V, define $S_i^R$ as the
maximum subset of vertices in S that can belong to a v-tree Ti containing i and
such that Ti ∩ R ⊆ {i}. Note that a separator R of S is minimal iff $|S_i^R| \ge 2$ for
each i ∈ R.
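The sets $S_i^R$ of Definition 3 can be computed by a graph search from i that is not allowed to enter any other vertex of R. The following Python sketch illustrates this on a hypothetical six-vertex graph (the graphs G1 and G2 of the paper are not reproduced here); the function names are ours, and minimality is tested with the characterization stated at the end of the definition.

```python
def reachable(adj, start, blocked):
    """Vertices reachable from `start` by paths that never enter `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                stack.append(v)
    return seen

def s_sub(adj, S, R, i):
    """S_i^R: vertices of S in the largest v-tree containing i that meets R at most in i."""
    return set(S) & reachable(adj, i, set(R) - {i})

def is_separator(adj, S, R):
    """R separates S if each vertex of S outside R is alone in its component once R is removed."""
    return all(len(s_sub(adj, S, R, s)) == 1 for s in S if s not in R)

def is_proper(S, R):
    return not (set(S) & set(R))

def is_minimal(adj, S, R):
    """Minimality test from Definition 3: |S_i^R| >= 2 for every i in R."""
    return is_separator(adj, S, R) and all(len(s_sub(adj, S, R, i)) >= 2 for i in R)

# Hypothetical graph: a and b hang on e, c and d hang on f, and e - f is an edge.
H = {"a": ["e"], "b": ["e"], "c": ["f"], "d": ["f"],
     "e": ["a", "b", "f"], "f": ["c", "d", "e"]}
S, R = {"a", "b", "c", "d"}, {"e", "f"}
print(s_sub(H, S, R, "e"), s_sub(H, S, R, "f"), is_minimal(H, S, R))
```

Here $S_e^R$ = {a, b} and $S_f^R$ = {c, d}, so R is a proper minimal separator of S, whereas {e} alone is not a separator because c and d remain connected through f.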

Fig. 1. Graphs for examples (G1 and G2)

For example, in graph G1 (Fig. 1) taking S = {a, c, h}, R = {b, e, k, i}
is a proper minimal separator of S, $S_b^R$ = {a, c, h}, $S_e^R$ = {a, h}, $S_j^R$ = {a}
and $S_g^R$ = ∅. The following theorem suggests that the concept of separator
is fundamental to understanding the facial structure of P1. The results of this
subsection are proved in the appendix.

Theorem 2. Every facet of P1(G) can be defined by a trivial inequality (−xi ≤ 0
or xi ≤ 1) or by an inequality of the form $\sum_{i \in S} \beta_i x_i - \sum_{i \in R} \alpha_i x_i \le 1$, where β
and α are vectors of positive numbers and R is a proper separator of S.

We define a family of potentially facet-defining inequalities.

Definition 4. Let S be a vertex-set, where |S| = p ≥ 2, and R ∈ ∆(S). The
following inequalities are called p-separators, where each αi is an integer with
$\alpha_i \ge |S_i^R| - 1$:
\[ \sum_{i \in S} x_i - \sum_{i \in R} \alpha_i x_i \le 1 \]

Constraints F.2 are 2-separators. In graph G1, xa + xf − xb − xe − xj ≤ 1 is an
example of such an inequality. In the same graph, xa + xf + xg − xb − 2xe − 2xk ≤ 1
is a 3-separator valid for P1(G1). Sometimes taking $\alpha_i = |S_i^R| - 1$ for all i ∈ R
does not yield a valid inequality for P1. For instance, in graph G2 (Fig. 1) taking
S = {a, b, c, d} and R = {e, f}, the 4-separator $\sum_{i \in S} x_i - x_e - x_f \le 1$ is not valid
for P1(G2). But the 4-separators $\sum_{i \in S} x_i - 2x_e - x_f \le 1$ and $\sum_{i \in S} x_i - x_e - 2x_f \le 1$
are valid and facet-defining.

Theorem 3. For each S and each R ∈ ∆(S), there is at least one p-separator
inequality which is valid and facet-defining for P1 (G).

Corollary 1. If the p-separator $\sum_{i \in S} x_i - \sum_{i \in R} (|S_i^R| - 1) x_i \le 1$ is valid for
P1(G), then it defines a facet of P1(G).

The above theorem does not tell how to calculate the α coefficients of a p-
separator facet. But given sets S and R ∈ ∆(S), the coefficients of a p-separator
valid for P1 can be computed by the following procedure:
Procedure Compute α
  M ← ∅;
  For each i ∈ R {
    If (exists j ∈ M such that S_i^R ∩ S_j^R = ∅)
      Then { αi ← |S_i^R|; }
      Else { αi ← |S_i^R| − 1; M ← M ∪ {i}; }
  }

As an example, take graph G2 with S = {a, b, c, d} and R = {e, f}. If we pick
vertices in order (e, f), the procedure yields αe = 1 and αf = 2. If the order is
reversed, it yields αf = 1 and αe = 2. Note that if p ≤ 3, the procedure always
yields $\alpha_i = |S_i^R| - 1$ for all i ∈ R. Then, by Corollary 1, all such 2-separators
and 3-separators define facets of P1(G). The above procedure is not optimal: if
p ≥ 4, the resulting p-separators are not always facet-defining.
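The procedure can be sketched in Python as follows. This is an illustration under stated assumptions: graph G2 is not reproduced in this text, so the code uses a hypothetical graph H chosen so that the sets of S reachable from e and from f are {a, b} and {c, d}, which reproduces the coefficients quoted above; the helper names and the brute-force validity check over all connected vertex subsets are ours.

```python
from itertools import combinations

def reachable(adj, start, blocked):
    """Vertices reachable from `start` by paths that never enter `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                stack.append(v)
    return seen

def s_sub(adj, S, R, i):
    """S_i^R as in Definition 3."""
    return set(S) & reachable(adj, i, set(R) - {i})

def compute_alpha(adj, S, R_order):
    """Procedure Compute alpha, processing the vertices of R in the given order."""
    R, M, alpha = set(R_order), [], {}
    for i in R_order:
        s_i = s_sub(adj, S, R, i)
        if any(not (s_i & s_sub(adj, S, R, j)) for j in M):
            alpha[i] = len(s_i)
        else:
            alpha[i] = len(s_i) - 1
            M.append(i)
    return alpha

def is_valid_for_v_trees(adj, S, alpha):
    """Brute force: test the p-separator on the incidence vector of every nonempty v-tree."""
    vertices = list(adj)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            T = set(subset)
            if reachable(adj, subset[0], set(adj) - T) == T:   # T is connected
                if len(T & set(S)) - sum(alpha.get(v, 0) for v in T) > 1:
                    return False
    return True

# Hypothetical stand-in for G2: a, b attached to e; c, d attached to f; edge e - f.
H = {"a": ["e"], "b": ["e"], "c": ["f"], "d": ["f"],
     "e": ["a", "b", "f"], "f": ["c", "d", "e"]}
S = {"a", "b", "c", "d"}
print(compute_alpha(H, S, ["e", "f"]), compute_alpha(H, S, ["f", "e"]))
```

On this graph the two orders give (αe, αf) = (1, 2) and (2, 1), and the brute-force check confirms that both inequalities hold on every v-tree, while the choice αe = αf = 1 is violated by the v-tree containing all six vertices.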

3.2 From P1 to P : p-Separators, Lifted p-Separators, and p-Crosses


If a certain p-separator defines a facet of P1 (G), it may also define a facet of
P (G), so it is interesting to investigate when it happens. Also interesting is to
investigate when this does not happen: in this case there is room for lifting the
p-separator, producing a new joint inequality.
Definition 5. The inequalities valid for P (G) that can be obtained by lifting α
coefficients of a p-separator facet of P1 (G) or P2 (G) are called lifted p-separators.
The following definitions and lemma (proved in the appendix) are needed
for stating and proving the main results of this section.
Definition 6. Let T be a double v-tree of G and t a vertex in T . A sequence of
nested double v-trees from t to T is any sequence of double v-trees T1 , . . . , T|T |
such that {t} = T1 ⊂ . . . ⊂ T|T | = T . (In other words, for each i, 1 < i ≤ |T |,
Ti−1 may be obtained from Ti by removing a vertex distinct from t).

Lemma 1. Suppose G is biconnected. Given a double v-tree T of G and t ∈ T,
it is always possible to construct a sequence of nested double v-trees from t to T.

Definition 7. A separator R of S = {s1, ..., sp} induces the following structure
in G. Graph G[V \ R] is split into a number of connected components: those
containing exactly one vertex in S are called A-components and those with no
vertices in S are called B-components. Define Ai as the vertex-set of the A-
component containing si (when R is not proper some A-components do not exist).
The vertex-sets corresponding to B-components are numbered arbitrarily as
B1, ..., Bq. Note that $|S_i^R| = 0$ if i belongs to a B-component and $|S_i^R| = 1$ if i
belongs to an A-component.
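The structure of Definition 7 is obtained by deleting R and classifying the remaining connected components by how many vertices of S they contain. The Python sketch below is illustrative only; the graph and the function names are hypothetical.

```python
def components(adj, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    removed, seen, comps = set(removed), set(), []
    for v in adj:
        if v in removed or v in seen:
            continue
        comp, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for nb in adj[u]:
                if nb not in removed and nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
        seen |= comp
        comps.append(comp)
    return comps

def induced_structure(adj, S, R):
    """Return (A, B): A maps each s in S outside R to its A-component, B lists the B-components."""
    a_components, b_components = {}, []
    for comp in components(adj, R):
        terminals = comp & set(S)
        if len(terminals) > 1:
            raise ValueError("R is not a separator of S")
        if terminals:
            a_components[terminals.pop()] = comp
        else:
            b_components.append(comp)
    return a_components, b_components

# Hypothetical graph: a (with an extra neighbour h) and b attached to e,
# c and d attached to f, and g adjacent to both e and f.
adj = {"a": ["e", "h"], "h": ["a"], "b": ["e"], "c": ["f"], "d": ["f"],
       "e": ["a", "b", "f", "g"], "f": ["c", "d", "e", "g"], "g": ["e", "f"]}
A, B = induced_structure(adj, {"a", "b", "c", "d"}, {"e", "f"})
print(A, B)
```

In this example the A-component of a is {a, h}, the other A-components are singletons, and {g} is the only B-component.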

Theorem 4. Suppose G is biconnected. Let $\sum_{i \in S} x_i - \sum_{i \in R} (|S_i^R| - 1) x_i \le 1$ be
a valid p-separator, where R induces a structure without B-components in which
every A-component is a double v-tree. If for every i ∈ R there is a double v-tree
Ui containing $\{i\} \cup S_i^R$ such that Ui ∩ R = {i}, this inequality defines a facet of
P(G).

Proof. Let $a^t x \le a_0$ be a valid inequality for P(G) such that
\[ F = \Big\{x \in P(G) \;\Big|\; \sum_{i \in S} x_i - \sum_{i \in R} (|S_i^R| - 1) x_i = 1\Big\} \subseteq F_a = \{x \in P(G) \mid a^t x = a_0\}. \]
We show that $a^t x \le a_0$ is a scalar multiple of this p-separator. Consider the
structure induced by R. For every A-component Aj, applying Lemma 1, construct
a sequence of nested double v-trees from sj to Aj. For each double v-tree
T in this sequence, χT ∈ F. So $a_{s_j} = a_0$ and $a_i = 0$ for every i ∈ Aj with i ≠ sj. For
every vertex i ∈ R, χUi ∈ F, where Ui is the double v-tree given by hypothesis.
So $a_i = -(|S_i^R| - 1) a_0$. □

For example, in graph G1, the 2-separator xa + xf − xb − xe − xj ≤ 1, the
3-separator xa + xf + xg − xb − 2xe − 2xk ≤ 1 and the 4-separator xa + xc + xg +
xh − 2xb − 2xe − xi − 2xk ≤ 1 define facets of P(G1). The converse of Theorem
4 does not hold even if p = 3, so we need some additional conditions to be able
to lift α coefficients.
Theorem 5. Suppose G is biconnected. Let $\sum_{i \in S} x_i - \sum_{i \in R} (|S_i^R| - 1) x_i \le 1$
be a valid p-separator, where R induces a structure without B-components in
which every A-component is a double v-tree. Suppose that there is a vertex i ∈ R
not satisfying the conditions of Theorem 4. If $|S_i^R| = p$, the coefficient αi can be lifted
by at least one unit.

Proof. We show that for every double v-tree T containing i,
$\gamma_T = |S \cap T| - \sum_{k \in R \cap T} \alpha_k \le 0$. If T contains $S_i^R$, $|S \cap T| = |S_i^R|$. As T must contain another
vertex j of R, $\gamma_T \le |S_i^R| - (|S_i^R| - 1) - (|S_j^R| - 1) = 2 - |S_j^R| \le 0$. If T does not
contain $S_i^R$, $\gamma_T \le (|S_i^R| - 1) - (|S_i^R| - 1) \le 0$. □

If p = 2, the above results can be collected as follows:


Corollary 2. Suppose G is biconnected. Let $\sum_{i \in S} x_i - \sum_{i \in R} x_i \le 1$ be a 2-
separator, where R induces a structure without B-components. This inequality
defines a facet of P(G) iff for every i ∈ R, there is a double v-tree Ui containing
{i, s1, s2} such that Ui ∩ R = {i}.

Proof. As R is minimal, every vertex in R is adjacent to both A1 and A2. So A1
and A2 are double v-trees. So the “if” part follows from Theorem 4. For each
i ∈ R, $|S_i^R| = 2$. So the “only if” part follows from Theorem 5. □

For example, in graph G1, the 2-separator xa + xc − xc − xf − xe − xh − xl ≤ 1
does not define a facet of P(G1); by Corollary 2, αe or αh can be lifted to 0. The
3-separator xa + xc + xg − xb − 2xe − 2xk ≤ 1 also does not define a facet of P(G1);
by Theorem 5, αe can be lifted to 1.
In graph G3 (Fig. 2), the 2-separator xa + xb − xd − xe − xf ≤ 1 is not facet-
defining, although it satisfies the conditions of Corollary 2, except that R induces
a structure with B-components. This 2-separator is dominated by the lifted 3-
separator xa + xb + xc − xd − xe − xf ≤ 1 (from the 3-separator xa + xb + xc − 2xd − 2xe −
2xf ≤ 1). In graph G4 (Fig. 2), the 3-separator xa + xc + xe − xb − xd − xf − xh ≤ 1
is not facet-defining, although it satisfies the conditions of Theorem 4, except
that the A-component formed by {c, g} is not a double v-tree. This 3-separator
is dominated by xa + xc + xg + xe − xb − xd − xf − xh ≤ 1.

Fig. 2. Graphs for examples (G3 and G4)

The concept of lifted p-separators acts as a convenient classifying device,
permitting us to view many joint inequalities in a single framework, even though
Theorem 5 does not cover all cases in which a p-separator facet of P1(G) can be
lifted. It is certainly possible to improve these results, but difficulties appear.
For instance, when p ≥ 4 it is not even clear how to calculate the original α
coefficients of a p-separator facet of P1(G). Also, it is usual that several such
coefficients can be lifted at once. Conditions for this “multilifting” are more
complicated. Most of all, it is cumbersome to design separation procedures for
lifted p-separators starting from the original p-separator.
We give a direct characterization of an important subfamily of lifted p-
separators that permitted us to overcome such difficulties. These inequalities
turned out to be essential to good branch-and-cut performance.

Definition 8. Let S = {s1, ..., sp} and R = {r1, ..., rp} be two disjoint vertex-
sets with p ≥ 2. Suppose that for any integers a, b, c, d in the range {1, ..., p},
with a ≤ b < c ≤ d, there are no two vertex-disjoint paths joining sa to sc and rb
to rd (in other words, the vertices in every path from rb to rd define a separator
of {sa, sc} and vice-versa). Then the following inequality is called a p-cross:
\[ \sum_{i \in S} x_i - \sum_{i \in R} x_i \le 1 \]
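On very small graphs the condition of Definition 8 can be tested by brute force. The Python sketch below enumerates the simple paths between sa and sc and checks whether rb and rd stay connected once a path is removed. It is exponential and meant only as an illustration on toy graphs, not as a substitute for a polynomial two-disjoint-paths algorithm; the graphs and function names are hypothetical.

```python
def simple_paths(adj, src, dst):
    """Yield every simple path from src to dst as a list of vertices."""
    stack = [(src, [src])]
    while stack:
        u, path = stack.pop()
        if u == dst:
            yield path
            continue
        for v in adj[u]:
            if v not in path:
                stack.append((v, path + [v]))

def connected_avoiding(adj, src, dst, banned):
    """True if dst can be reached from src without using any vertex in `banned`."""
    if src in banned or dst in banned:
        return False
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in adj[u]:
            if v not in seen and v not in banned:
                seen.add(v)
                stack.append(v)
    return False

def has_disjoint_paths(adj, s1, s2, r1, r2):
    """True if some s1-s2 path and some r1-r2 path share no vertex."""
    return any(connected_avoiding(adj, r1, r2, set(p)) for p in simple_paths(adj, s1, s2))

def is_p_cross(adj, S, R):
    """Check Definition 8 for ordered lists S = [s1..sp] and R = [r1..rp] (0-based indices)."""
    p = len(S)
    if len(R) != p or p < 2 or set(S) & set(R):
        raise ValueError("S and R must be disjoint lists of equal length p >= 2")
    for a in range(p):
        for b in range(a, p):
            for c in range(b + 1, p):
                for d in range(c, p):
                    if has_disjoint_paths(adj, S[a], S[c], R[b], R[d]):
                        return False
    return True

# Hypothetical graphs: the 4-cycle s1 - r1 - s2 - r2 - s1, and the complete graph K4.
cycle = {"s1": ["r1", "r2"], "r1": ["s1", "s2"], "s2": ["r1", "r2"], "r2": ["s2", "s1"]}
k4 = {v: [u for u in cycle if u != v] for v in cycle}
print(is_p_cross(cycle, ["s1", "s2"], ["r1", "r2"]), is_p_cross(k4, ["s1", "s2"], ["r1", "r2"]))
```

On the 4-cycle every s1-s2 path uses r1 or r2, so the pairs cross and the inequality is a 2-cross; in K4 the edges s1-s2 and r1-r2 are disjoint paths, so it is not.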

In graph G1, xe + xj − xd − xg ≤ 1 is a 2-cross and xa + xc + xl + xj − xb −
xi − xk − xd ≤ 1 is a 4-cross. Most p-crosses are lifted p-separators; it suffices
to “complete” R until a proper minimal separator of S is found. For instance,
the 3-cross xa + xc + xk − xb − xl − xd ≤ 1 is a lifting of the 3-separator xa + xc + xk − 2xb −
xf − xl − xd ≤ 1, a facet of P1(G1), or of −xb − xl − xd + xa + xc + 2xe + xf + xk ≤ 4,
a facet of P2(G1).

Theorem 6. The p-crosses are valid for P.

Proof. By induction on p. The 2-crosses are clearly valid. Suppose that
$\sum_{i=1}^{p} x_{s_i} - \sum_{i=1}^{p} x_{r_i} \le 1$ is a p-cross with p ≥ 3. By the induction hypothesis, the (p − 1)-crosses
$\sum_{i=1}^{p-1} x_{s_i} - \sum_{i=1}^{p-1} x_{r_i} \le 1$ and $\sum_{i=2}^{p} x_{s_i} - \sum_{i=2}^{p} x_{r_i} \le 1$ are valid. Summing these
inequalities with the 2-cross $x_{s_1} + x_{s_p} - x_{r_1} - x_{r_p} \le 1$, dividing the result by 2 and
rounding down the right-hand side, the desired p-cross is obtained. □
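The rounding step of this proof can be written out explicitly. In the sum of the three inequalities every variable appears exactly twice (for p ≥ 3, $x_{s_1}$ comes from the first and third inequalities, $x_{s_p}$ from the second and third, and every other $x_{s_i}$ from the first and second; likewise for the $x_{r_i}$), so, as a worked sketch of the argument:

```latex
\[
\begin{aligned}
\sum_{i=1}^{p-1} x_{s_i} - \sum_{i=1}^{p-1} x_{r_i} &\le 1 \\
\sum_{i=2}^{p} x_{s_i} - \sum_{i=2}^{p} x_{r_i} &\le 1 \\
x_{s_1} + x_{s_p} - x_{r_1} - x_{r_p} &\le 1 \\
\Longrightarrow\quad 2\sum_{i=1}^{p} x_{s_i} - 2\sum_{i=1}^{p} x_{r_i} &\le 3 \\
\Longrightarrow\quad \sum_{i=1}^{p} x_{s_i} - \sum_{i=1}^{p} x_{r_i} &\le \left\lfloor \tfrac{3}{2} \right\rfloor = 1
\end{aligned}
\]
```

The last step is legitimate because the left-hand side is an integer at every incidence vector of a double v-tree, and P is the convex hull of these vectors.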

Theorem 7. If G is biconnected, every 2-cross defines a facet of P (G).

Proof. Let $x_{s_1} + x_{s_2} - x_{r_1} - x_{r_2} \le 1$ be a 2-cross and $a^t x \le a_0$ be a valid
inequality for P(G) such that
\[ F = \{x \in P(G) \mid x_{s_1} + x_{s_2} - x_{r_1} - x_{r_2} = 1\} \subseteq F_a = \{x \in P(G) \mid a^t x = a_0\}. \]

We show that $a^t x \le a_0$ is a scalar multiple of this 2-cross. As G is biconnected,
there are two paths joining r1 to r2 that are vertex-disjoint except at the endpoints. Let
I1 and I2 be the sets of vertices of any two such paths. As I1 and I2 are separators
of S, it is not possible to have S ⊆ I1 or S ⊆ I2.
Case (a): I1 or I2 is a proper separator of S.
Suppose I1 is a proper separator of S and consider the induced structure. As
G is biconnected, each component is adjacent to at least two vertices in I1. We
assert that at least one of the two A-components is adjacent to both r1 and r2. If
this is not true, the path corresponding to I2 must pass through a B-component
and I1 would not be a separator of S. Suppose A2 is an A-component adjacent
to r1 and r2.
Subcase (a.1): A1 is adjacent to a vertex in I1′ = I1 \ {r1, r2}. In this case,
C1 = A1 ∪ I1′ ∪ {Bk | Bk is adjacent to I1′} and C2 = A2 are double v-trees.
Construct a sequence of nested double v-trees from s1 to C1 and from s2 to
C2. For each T in these sequences, χT ∈ F. So as1 = as2 = a0 and ai = 0 if
i ∈ (C1 ∪ C2) \ S. As χC1∪C2∪{r1} and χC1∪C2∪{r2} ∈ F, ar1 = ar2 = −a0. Each
component Bk not adjacent to I1′ must be adjacent to both r1 and r2. Let j be a
vertex in Bk adjacent to r1. Construct a sequence of nested double v-trees from
j to Bk. For each T in this sequence UT = C1 ∪ C2 ∪ {r1} ∪ T is also a double
v-tree and χUT ∈ F. So ai = 0, for all i ∈ Bk.
Subcase (a.2): A1 is not adjacent to I1′. In this case, A1 is adjacent to r1
and r2. Let C1 = A1, C2 = A2 and proceed similarly to subcase a.1, concluding that
as1 = as2 = a0, ai = 0 if i ∈ (C1 ∪ C2) \ S and ar1 = ar2 = −a0. Now define C3 as
I1′ ∪ {Bk | Bk is adjacent to I1′}. Let j be a vertex in I1′ adjacent to r1. Construct
a sequence of nested double v-trees from j to C3. For each T in this sequence
UT = C1 ∪ C2 ∪ {r1} ∪ T is also a double v-tree and χUT ∈ F. So ai = 0, for each
i ∈ C3. B-components adjacent to both r1 and r2 are treated in a similar way.
Case (b): I1 and I2 are not proper separators of S.
Suppose I1 contains s1 and I2 contains s2. Consider the structure induced
by separator I1. As I2 contains s2, A2 is adjacent to r1 and r2. Let C1 = I1′ ∪
{Bk | Bk is adjacent to I1′}, C2 = A2 and proceed similarly to subcase a.1. B-
components adjacent to both r1 and r2 are also treated as in a.1. □

Using similar techniques it is possible to prove that, under mild conditions,
p-crosses with p ≥ 3 also define facets of P(G). For instance, supposing G is
biconnected and planar, all p-crosses are facet-defining.

4 Separation Procedures

The 2-separators can be separated by applying a min-cut algorithm over a
suitable network. When p ≥ 3, procedures to find violated p-separator
inequalities are harder to devise. One heuristic procedure to find such a p-separator
is the successive application of the 2-separator procedure. Note that if p ≥ 4
there is the additional complication of calculating proper α coefficients.
The p-crosses can be separated in polynomial time over any graph G, for
fixed p. This follows from Definition 8, since there exists an O(|V| · |E|) algorithm
to determine if it is possible to connect two given pairs of vertices by vertex-
disjoint paths (Shiloach [7]). However, far better p-cross separation schemes may
exist if the particular topological structure of G is considered. If G is planar and
supposing that no 2-separator is violated, a violated p-cross only occurs when
the sequence s1, r1, ..., sp, rp occurs in this cyclic order on some face f of G. By
this property, p-cross inequalities can be separated in O(|V|) over planar graphs,
by dynamic programming.
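The paper does not describe the network used for 2-separator separation. One standard possibility, sketched below in Python purely as an assumption, is vertex splitting: each vertex v becomes an arc v_in → v_out whose capacity is the current LP value of x at v, edges become arcs of infinite capacity, and a maximum flow between the two vertices of S gives a minimum-weight separator R. The inequality is violated when x_s + x_t minus the weight of R exceeds 1. The function names, the simple Edmonds-Karp implementation and the toy data are all ours; the returned set may contain zero-weight vertices and would have to be pruned to a minimal separator.

```python
from collections import deque

INF = float("inf")

def min_weight_separator(adj, weight, s, t):
    """Minimum-weight vertex set (excluding s and t) separating s from t, via max-flow."""
    if t in adj[s]:
        return INF, None                  # adjacent vertices have no proper separator
    cap = {}

    def add_arc(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)           # residual (reverse) arc

    for v in adj:
        add_arc((v, "in"), (v, "out"), INF if v in (s, t) else weight[v])
        for u in adj[v]:
            add_arc((v, "out"), (u, "in"), INF)

    source, sink, flow = (s, "out"), (t, "in"), 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:        # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        bottleneck, v = INF, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Cut vertices: entry node reachable in the residual network, exit node not.
    cut = {v for v in adj if (v, "in") in parent and (v, "out") not in parent}
    return flow, cut

def separate_2_separator(adj, x, s, t, eps=1e-9):
    """Return a separator R with x_s + x_t - x(R) > 1, or None if no such R exists."""
    cut_weight, R = min_weight_separator(adj, x, s, t)
    if R is None or x[s] + x[t] - cut_weight <= 1 + eps:
        return None
    return R

# Hypothetical 4-cycle s - a - t - b - s with a fractional point x.
adj = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
x_violated = {"s": 1.0, "t": 1.0, "a": 0.25, "b": 0.25}
x_ok = {"s": 1.0, "t": 1.0, "a": 0.5, "b": 0.5}
print(separate_2_separator(adj, x_violated, "s", "t"), separate_2_separator(adj, x_ok, "s", "t"))
```

A full separation routine would call this for every non-adjacent pair of vertices; a heuristic would restrict the pairs to those with large LP values.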

5 Computational Results

The branch-and-cut for 2VPST was implemented in C++ using the ABACUS
framework [8] and CPLEX 3.0 LP solver, running on a Sun ULTRA workstation.
The code used 2-separators, 3-separators (heuristically separated) and p-crosses.
In all experiments described in Table 1, G is a square grid graph (grid graphs
are typical in circuit layout applications). Two kinds of instances were generated:

– Unweighted instances with |N1| = |N2| = |V|/4. The terminals in each of
  these sets are randomly chosen among the vertices of G. Vertices in N1 have
  weight -1 and vertices in N2 have weight 1. These instances are referenced
  as u n Inst, where n = |V| and Inst is the instance number. The value z∗
  gives the number of unconnected terminals in an optimal solution.
– Weighted instances. For each unweighted instance we generated a corresponding
  weighted instance, by picking an integer wi from a uniform distribution
  in the range [−10, −1] if i ∈ N1 and from [1, 10] if i ∈ N2. The
  value z∗ gives the sum of the absolute weights of the unconnected terminals
  in an optimal solution. Weighted instances are referenced as w n Inst.

Table 1. Branch-and-Cut Results

Instance     LBr    z∗  Nd  LPs     T    Instance     LBr    z∗  Nd  LPs     T
u1600 1     45.7    46   4   96    20    w1600 1    222.0   222   2   35     9
u1600 2     40.0    40   2   40     8    w1600 2    218.0   218   1   25     5
u2500 1     46.0    46   2   56    33    w2500 1    233.0   233   2   37    14
u2500 2     55.0    55   3   49    32    w2500 2    227.5   228   3   39    17
u3600 1     76.5    77   2   45    41    w3600 1    364.0   364   2   48    33
u3600 2     74.0    74   4   57    50    w3600 2    311.0   311   2   74    54
u4900 1     99.1   100   4  116   170    w4900 1    420.0   420   2   77   130
u4900 2     92.0    92   3   66   108    w4900 2    400.0   400   3   53    90
u6400 1    104.0   104   3  126   443    w6400 1    436.0   436   3  106   295
u6400 2    127.0   128   4  123   322    w6400 2    534.0   534   3  128   367
u8100 1    131.5   132   4  217  1207    w8100 1    549.5   550   4  113   563
u8100 2    115.0   115   6  125   713    w8100 2    467.0   467   3   82   264
u10000 1   159.8   161   8  120   676    w10000 1   685.0   685   3   60   345
u10000 2   151.5   152   4  157   991    w10000 2   632.5   634   7  116   621

In Table 1, the LBr column presents the lower bound obtained at the root
node, z∗ is the value of the optimal integer solution, Nd is the number of nodes
explored in the branch-and-bound tree, LPs is the total number of solved LPs
and T is the total CPU time in seconds spent by the algorithm. Note that the
LBr bounds include p-crosses and 3-separators. The lower bounds given by the
linear relaxation of formulation F alone are rather weak; for example, in instance
u1600 1 it is 8.7. Figure 3 shows an optimal solution of instance u1600 1: the
black boxes represent the terminals in N1, the empty boxes are the terminals in
N2, the full lines are edges of a spanning tree for G[T1] and the dotted lines a
spanning tree for G[T2].
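One way to read the LBr column: all weights in these instances are integers, so z is an integer for every 0/1 solution and the root bound can be rounded up before comparing it with z∗. The short Python sketch below applies this to a few rows copied from Table 1; the rounding is our observation and the function name is hypothetical, not part of the reported experiments.

```python
import math

# (instance, LBr, z*) copied from Table 1.
rows = [
    ("u1600 1", 45.7, 46),
    ("u4900 1", 99.1, 100),
    ("u6400 2", 127.0, 128),
    ("u10000 1", 159.8, 161),
    ("w10000 2", 632.5, 634),
]

def rounded_gap(lbr, z_opt):
    """Gap between z* and the root lower bound rounded up to the next integer."""
    return z_opt - math.ceil(lbr)

gaps = {name: rounded_gap(lbr, z_opt) for name, lbr, z_opt in rows}
print(gaps)
```

For these rows the rounded root bound is either equal to z∗ or one unit below it, compared with the bound of 8.7 from formulation F alone on instance u1600 1.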
The code turned out to be robust under changes in terminal placement, terminal
density and weight distribution. In fact, we used the same set of parameters
for solving all instances in Table 1, weighted and unweighted. An interesting
experiment is to generate a random terminal placement avoiding the most obvious
causes of infeasibility: terminals on the outer face, two pairs of opposite terminals
crossed over the same grid square and terminals blocked by their neighbors.
An optimal solution of such an instance (called t1600 1) with 800 terminals and
unitary weights is shown in Fig. 4. The unconnected terminals are at coordinates
(4,33), (8,21), (33,36) and (34,15). This solution was obtained in 21 seconds.

Fig. 3. An optimal solution of instance u1600 1, z∗ = 46 (40 × 40 grid, rows and columns numbered 1 to 40).

Appendix

Proof (Theorem 2). Suppose that a facet F of P1(G) is defined by $a^t x \le a_0$.
The right-hand side a0 must be non-negative, otherwise this inequality would
not be valid for the empty v-tree. So, w.l.o.g. we can suppose that a0 = 0 or
a0 = 1. Let S be the vertex-set corresponding to positive coefficients in a and R
the vertex-set corresponding to negative coefficients.
Case (a): a0 = 0. The set S must be empty, otherwise $a^t x \le 0$ would not
be valid for the v-tree formed by a single vertex in S. The set R must be a
singleton {u}, since if there are other vertices in R, $a^t x \le 0$ would be dominated
by −xu ≤ 0. So F can be defined by this trivial inequality.
Case (b): a0 = 1. S cannot be empty. If S is a singleton {u}, F can be
defined by the trivial inequality xu ≤ 1. Suppose |S| ≥ 2. Graph G[V \ R] is
split into a number of connected components. No two vertices in S can belong
to the same such component, since if u and w are two vertices in S in the same
component then $a^t x \le 1$ would be a linear combination of the valid inequalities
\[ \sum_{i \in S \setminus \{u,w\}} a_i x_i + (a_u + a_w) x_u + \sum_{i \in R} a_i x_i \le 1 \]
and
\[ \sum_{i \in S \setminus \{u,w\}} a_i x_i + (a_u + a_w) x_w + \sum_{i \in R} a_i x_i \le 1. \]
Therefore R is a proper separator of S. □

Fig. 4. An optimal solution of instance t1600 1, z∗ = 4 (40 × 40 grid, rows and columns numbered 1 to 40).

The following lemma and definition are used in the proof of Theorem 3.

Lemma 2. (Sequential Maximum Lifting Lemma, pages 261-263 of [5]) Let P
be a polyhedron in $\mathbb{R}^n$. Let N be the set of indices {1, ..., n} and Z0 a subset
of N. Define $P^{Z_0}$ as the polyhedron $P \cap \{x \in \mathbb{R}^n \mid x_i = 0,\ i \in Z_0\}$. Suppose
$\sum_{i \in N \setminus Z_0} a_i x_i \le a_0$ is a valid inequality for $P^{Z_0}$ defining a face of dimension d.
Let $\sum_{i \in N \setminus Z_0} a_i x_i + \sum_{i \in Z_0} m_i x_i \le a_0$ be any inequality obtained by a sequential
maximum lifting of one coefficient in Z0 at a time (in any order). This lifted
inequality defines a face of P of dimension d + |Z0|, and if the original inequality
defines a facet of $P^{Z_0}$, the lifted inequality defines a facet of P.

Definition 9. Let T be a v-tree of G and t a vertex in T. A sequence of nested
v-trees from t to T is any sequence of v-trees T1, ..., T|T| such that {t} = T1 ⊂
... ⊂ T|T| = T. Note that, given a v-tree T and t ∈ T, it is always possible to construct
such a sequence.
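For plain v-trees (as opposed to the double v-trees of Lemma 1, which need biconnectivity), one simple way to build such a sequence is to add the vertices of T in breadth-first order from t: every prefix of a BFS order induces a connected subgraph. The Python sketch below illustrates this; the function name and the small graph are hypothetical.

```python
from collections import deque

def nested_v_trees(adj, tree, t):
    """Return v-trees T1 = {t}, ..., T|T| = tree, each obtained by adding one vertex."""
    tree = set(tree)
    order, seen, queue = [], {t}, deque([t])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v in tree and v not in seen:
                seen.add(v)
                queue.append(v)
    if seen != tree:
        raise ValueError("`tree` must be a v-tree containing t")
    # Each vertex enters after its BFS parent, so every prefix is connected.
    return [set(order[:k]) for k in range(1, len(order) + 1)]

# Hypothetical graph; T = {1, 2, 3, 4} is a v-tree and vertex 5 lies outside it.
adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
sequence = nested_v_trees(adj, {1, 2, 3, 4}, 2)
print(sequence)
```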

Proof (Theorem 3). Consider the structure induced by R (see Definition 7).
Inequality $\sum_{i \in S} x_i \le 1$ is valid for $P_1(G)^{Z_0}$, where $Z_0 = R \cup (\cup_{k=1}^{q} B_k)$. Define
$F = \{x \in P_1(G)^{Z_0} \mid \sum_{i \in S} x_i = 1\}$. For each j ∈ {1, ..., p}, Aj is a v-tree.
Construct a sequence of nested v-trees from sj to Aj. For every T in these
sequences, χT ∈ F. So there are $\sum_{j=1}^{p} |A_j|$ affinely independent vectors in F and
$\sum_{i \in S} x_i \le 1$ defines a facet of $P_1(G)^{Z_0}$.

Number the vertices of R from r1 to r|R|. Define R0 = ∅ and Rj as the set
{r1, ..., rj}. For each k ∈ {1, ..., q}, define f(k) as the smallest number of a
vertex in R adjacent to B-component Bk. Define Wj = Rj ∪ {Bk | f(k) ≤ j} and
Zj = Z0 \ Wj. We show that for each j ∈ {0, ..., |R|}, there is an inequality
$\sum_{i \in S} x_i - \sum_{i \in R_j} \alpha_i x_i \le 1$, where each αi is integer and satisfies $\alpha_i \ge |S_i^R| - 1$,
defining a facet of $P_1(G)^{Z_j}$.

This is true when j = 0. Suppose 1 ≤ j ≤ |R|. Pick the inequality corresponding
to j − 1. First lift the coefficient of rj. As there exists a v-tree Tj such that
Tj ∩ R = {rj}, $T_j \cap S = S_j^R$ and $T_j \cap (\cup_{k=1}^{q} B_k) = \emptyset$, $\alpha_j \ge |S_j^R| - 1$. As the lifting
is maximum, αj is integral and there exists a v-tree $U \subseteq (\cup_{k=1}^{p} A_k \cup W_{j-1} \cup \{r_j\})$
such that rj ∈ U and $|S \cap U| - \sum_{i \in R_j \cap U} \alpha_i = 1$. For each B-component Bk
such that f(k) = j, take a vertex t ∈ Bk adjacent to rj. Construct a sequence
of nested v-trees from t to Bk. Make a maximum lifting of the coefficients of
vertices in Bk in the same order these vertices are introduced in the sequence.
Since U ∪ T is a v-tree for every T in the sequence, all such coefficients must be
zero. Repeating this procedure until j = |R|, a p-separator defining a facet of
P1(G) is obtained. □

Proof (Correctness of procedure Calculate α). Let $U$ be a v-tree and let $Q = R \cap U$. We must show that $|S \cap U| - \sum_{i \in Q} \alpha_i \le 1$. Assume that $Q$ is not empty, otherwise this inequality is clearly true. Sort the vertices in $Q$ as $q_1, \ldots, q_{|Q|}$, using the same order in which these vertices were considered in procedure Calculate α. Define $Q_i$ as the set $\{q_1, \ldots, q_i\}$ and $S_{Q_i}^R$ as $\cup_{j \in Q_i} S_j^R$. We assert that for each $i \in \{1, \ldots, |Q|\}$, $\gamma_i = |S_{Q_i}^R \cap U| - \sum_{j \in Q_i} \alpha_j \le 1$. This is true when $i = 1$. Suppose $1 < i \le |Q|$. If $\alpha_i = |S_i^R|$, then $\gamma_i \le \gamma_{i-1} \le 1$. Suppose $\alpha_i = |S_i^R| - 1$. The procedure assures that if there exists a vertex $j \in Q_{i-1}$ with $\alpha_j = |S_j^R| - 1$, then $S_i^R \cap S_j^R \ne \emptyset$, and $\gamma_i \le \gamma_{i-1} \le 1$. If there is no such vertex, $\gamma_{i-1} \le 0$, so $\gamma_i \le 1$. As $|S \cap U| - \sum_{i \in Q} \alpha_i \le \gamma_{|Q|} \le 1$, the result follows. □

Proof (Lemma 1). If $T = \{t\}$ the result is immediate. Otherwise, suppose that a partial sequence $T = T_{|T|} \supset \ldots \supset T_i$, $|T| \ge i > 1$, is already formed. Vertices in double v-tree $T_i$, different from $t$, but adjacent to a vertex in $V \setminus T_i$ will be called candidate vertices (if $T = V$, all vertices in $V - t$ are candidates). As $G$ is biconnected, there is at least one candidate vertex. We show that it is always possible to choose a candidate to be removed from $T_i$ to form double v-tree $T_{i-1}$.
Let $v_0$ be a candidate. If $T_i - v_0$ is a v-tree, $v_0$ is chosen. Otherwise $G[T_i - v_0]$ is separated into $j \ge 2$ connected components with vertex-sets $C_1^1, \ldots, C_j^1$. As $T_i$ is a v-tree, at least one vertex in each $C_j^1$ is adjacent to $v_0$. As $G$ is biconnected, at least one vertex in each $C_j^1$ is adjacent to $V \setminus T_i$. Adjust notation for $C_1^1$ to be a set not containing $t$. Let $v_1$ be a candidate vertex in $C_1^1$. $G[C_1^1 - v_1]$ is separated into $j \ge 0$ connected components with vertex-sets $C_1^2, \ldots, C_j^2$. As $C_1^1$ is a v-tree, at least one vertex in each $C_j^2$ is adjacent to $v_1$. As $G$ is biconnected, at least one vertex in each $C_j^2$ is adjacent to $v_0$ or to $V \setminus T_i$. If at least one vertex in each $C_j^2$ is adjacent to $v_0$ (or $j = 0$), $v_1$ is chosen. Otherwise, adjust notation for $C_1^2$ to be a set containing a candidate vertex.
By repeating this procedure, we get a sequence $v_1, \ldots, v_k$ of candidate vertices contained in smaller sets $C_1^k$. If each component of $G[C_1^k - v_k]$ is adjacent to $T_i \setminus C_1^k$, $v_k$ is the chosen candidate and $T_i - v_k$ is a v-tree. In the worst case, the procedure is repeated until $C_1^k = \{v_k\}$. □
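
The peeling argument of this proof can be illustrated with a short Python sketch. It rests on two assumptions that are not restated in this section: that a v-tree is a vertex set inducing a connected subgraph, and that a double v-tree is a vertex set $T$ for which both $G[T]$ and $G[V \setminus T]$ are connected (with an empty complement allowed). The names `is_connected` and `peel_double_v_tree` are hypothetical. The sketch also does not follow the component-descent of the proof: it simply tries every candidate vertex and keeps the first one whose removal leaves a connected set, which the lemma guarantees to exist when $G$ is biconnected.

```python
def is_connected(adj, vertices):
    """True if the subgraph induced by `vertices` is connected (or empty)."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def peel_double_v_tree(adj, tree, t):
    """Return nested sets [T_1, ..., T_|T|] with T_1 = {t} and T_|T| = tree.

    adj maps every vertex of G to the *set* of its neighbours. `tree` is
    assumed to be a double v-tree containing t, and G is assumed biconnected.
    """
    all_vertices = set(adj)
    current = set(tree)
    sequence = [frozenset(current)]
    while len(current) > 1:
        outside = all_vertices - current
        # Candidates: vertices other than t adjacent to V \ T_i
        # (every vertex other than t when T_i = V).
        candidates = [v for v in current
                      if v != t and (not outside or adj[v] & outside)]
        for v in candidates:
            # A candidate is adjacent to the complement, so moving it there keeps
            # the complement connected; only G[T_i - v] needs to be checked.
            if is_connected(adj, current - {v}):
                current.remove(v)
                break
        else:
            raise ValueError("no removable candidate; is G biconnected?")
        sequence.append(frozenset(current))
    sequence.reverse()
    return sequence
```

Each step tests at most $|T_i|$ candidates with one graph search apiece, so the sketch trades the structure of the proof for brevity.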

Acknowledgments: The first author was supported by CNPq Bolsa de Doutorado. The second author was partially supported by CNPq Bolsa de Produtividade em Pesquisa 300475/93-4.

References
1. M. Grötschel, A. Martin and R. Weismantel, Packing Steiner Trees: polyhedral in-
vestigations, Mathematical Programming, Vol. 72, 101-123, 1996.
2. M. Grötschel, A. Martin and R. Weismantel, Packing Steiner Trees: a cutting plane
algorithm and computational results, Mathematical Programming, Vol. 72, 125-145,
1996.
3. B. Korte, H. Prömel and A. Steger, Steiner Trees in VLSI-Layout, in B. Korte, L.
Lovász, H. Prömel, A. Schrijver (Eds.) “Paths, Flows and VLSI-Layout”, Springer-
Verlag, Berlin, 1989.
4. T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, Wiley, Chich-
ester, 1990.
5. G. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization, Wiley, New
York, 1988.
6. M. Poggi de Aragão and E. Uchoa, The γ-Connected Assignment Problem, To
appear in European Journal of Operational Research, 1998.
7. Y. Shiloach, A Polynomial Solution to the Undirected Two Paths Problem, Journal
of the ACM, Vol. 27, No.3, 445-456, 1980.
8. S. Thienel, ABACUS - A Branch-And-Cut System, PhD thesis, Universität zu Köln,
1995.
Author Index

Aardal, K. 1
Ageev, A.A. 17
Ahuja, R.K. 31
Amaldi, E. 45
Atamtürk, A. 60
Bixby, R.E. 1
Cai, M. 73
Caprara, A. 87
Chudak, F.A. 99
Cunningham, W.H. 114
Cvetković, D. 126
Čangalović, M. 126
Deng, X. 73
Eisenbrand, F. 137
Fischetti, M. 87
Fleischer, L. 151
Fonlupt, J. 166
Frank, A. 183, 191
Halperin, E. 202
Hartmann, M. 218
Hartvigsen, D. 234
Helmberg, C. 242
Hochbaum, D.S. 31
Hurkens, C.A.J. 1
Ibaraki, T. 377
Iwata, S. 259
Jordán, T. 183, 273
Kashiwabara, K. 289
Király, Z. 191
Klau, G.W. 304
Klein, P. 320
Kolliopoulos, S.G. 328
Kovačević-Vujčić, V. 126
Lenstra, A.K. 1
Letchford, A.N. 87
McCormick, S.T. 259
Mahjoub, A.R. 166
Melkonian, V. 345
Mutzel, P. 304, 361
Nagamochi, H. 377
Nakamura, M. 289
Nemhauser, G.L. 60
Noga, J. 391
Orlin, J.B. 31
Pfetsch, M.E. 45
Poggi de Aragão, M. 439
Queyranne, M. 218
Savelsbergh, M.W.P. 60
Schulz, A.S. 137
Sebő, A. 400
Seiden, S. 391
Sethuraman, J. 429
Shigeno, M. 259
Smeltink, J.W. 1
Stein, C. 328
Sviridenko, M.I. 17
Szigeti, Z. 183, 415
Takabatake, T. 289
Tan, W.-P. 429
Tang, L. 114
Tardos, É. 345
Teo, C.-P. 429
Trotter, L.E., Jr. 45
Uchoa, E. 439
Wang, Y. 218
Weiskircher, R. 361
Williamson, D.P. 99
Young, N. 320
Zang, W. 73
Zwick, U. 202
