
Sven O. Krumke

Integer Programming
Polyhedra and Algorithms

Draft: March 6, 2017


These course notes are based on my lectures »Integer Programming: Polyhedral Theory« and »Integer Programming: Algorithms« at the University of Kaiserslautern. I would be happy to receive feedback, in particular suggestions for improvement and notifications of typos and other errors.

Sven O. Krumke
[email protected]



Contents

1 Introduction
1.1 A First Look at Integer Linear Programs
1.2 Integer Linear Programs
1.3 Notes of Caution
1.4 Examples
1.5 Literature
1.6 Acknowledgements

2 Basics
2.1 Notation
2.2 Convex Hulls
2.3 Polyhedra and Formulations
2.4 Cones
2.5 Linear Programming
2.6 Agenda

I Polyhedral Theory

3 Polyhedra and Integer Programs
3.1 Valid Inequalities and Faces of Polyhedra
3.2 Inner and Interior Points
3.3 The Fundamental Theorem of Linear Inequalities
3.4 The Decomposition Theorem
3.5 Dimension
3.6 Extreme Points
3.7 Facets
3.8 The Characteristic Cone and Extreme Rays
3.9 Most IPs are Linear Programs

4 Integrality of Polyhedra
4.1 Equivalent Definitions of Integrality
4.2 Matchings and Integral Polyhedra I
4.3 Total Unimodularity
4.4 Conditions for Total Unimodularity
4.5 Applications of Unimodularity: Network Flows
4.6 Matchings and Integral Polyhedra II
4.7 Total Dual Integrality
4.8 Submodularity and Matroids

II Algorithms

5 Basics about Problems and Complexity
5.1 Encoding Schemes, Problems and Instances
5.2 The Classes P and NP
5.3 The Complexity of Integer Programming
5.4 Optimization and Separation

6 Relaxations and Bounds
6.1 Optimality and Relaxations
6.2 Combinatorial Relaxations
6.3 Lagrangian Relaxation
6.4 Duality

7 Dynamic Programming
7.1 Shortest Paths Revisited
7.2 Knapsack Problems

8 Branch and Bound
8.1 Divide and Conquer
8.2 Pruning Enumeration Trees
8.3 LP-Based Branch and Bound: An Example
8.4 Techniques for LP-Based Branch and Bound
8.4.1 Reoptimization
8.4.2 Other Issues

9 Cutting Planes
9.1 Cutting-Plane Proofs
9.2 A Geometric Approach to Cutting Planes: The Chvátal Rank
9.3 Cutting-Plane Algorithms
9.4 Gomory's Cutting-Plane Algorithm
9.5 Mixed Integer Cuts
9.6 Structured Inequalities
9.6.1 Knapsack and Cover Inequalities
9.6.2 Lifting of Cover Inequalities
9.6.3 Separation of Cover Inequalities
9.6.4 The Set-Packing Polytope

10 Column Generation
10.1 Dantzig-Wolfe Decomposition
10.1.1 A Quick Review of Simplex Pricing
10.1.2 Pricing in the Dantzig-Wolfe Decomposition
10.2 Dantzig-Wolfe Reformulation of Integer Programs
10.2.1 Solving the Master Linear Program
10.2.2 The Cutting-Stock Problem
10.2.3 The Balanced k-Cut Problem
10.2.4 Strength of the Master Linear Program and Relations to Lagrangian Duality
10.2.5 Getting an Integral Solution

11 More About Lagrangian Duality
11.1 Convexity and Subgradient Optimization
11.2 Subgradient Optimization for the Lagrangian Dual
11.3 Lagrangian Heuristics and Variable Fixing

A Notation
A.1 Basics
A.2 Sets and Multisets
A.3 Analysis and Linear Algebra
A.4 Growth of Functions
A.5 Particular Functions
A.6 Probability Theory
A.7 Graph Theory
A.8 Theory of Computation

B Symbols

Bibliography



1 Introduction
Linear Programs can be used to model a large number of problems arising in practice. A standard form of a Linear Program is

(LP)  max  c^T x      (1.1a)
           Ax ≤ b     (1.1b)
           x ≥ 0,     (1.1c)

where c ∈ R^n, b ∈ R^m are given vectors and A ∈ R^{m×n} is a matrix. The focus of these lecture notes is to study extensions of Linear Programs where we are given additional integrality conditions on all or some of the variables.
Problems with integrality constraints on the variables arise in a variety of applications. For instance, if we want to place facilities, it makes sense to require the number of facilities to be an integer (it is not clear what it means to build 2.28 fire stations). Also, one can frequently model decisions as 0-1-variables: the variable is zero if we make a negative decision and one otherwise.

1.1 A First Look at Integer Linear Programs


Simple Ex is an exchange student at the University of Kaiserslautern. He has just completed
his studies and is on his way home. Therefore, he has packed his six suitcases, whose
weights sum up to exactly 20 kg, which is the total weight allowed by Knapsack Airlines.
Hand baggage is currently forbidden due to security reasons.
Just before Simple Ex is about to leave, he meets Cutty Plain, the best friend of his girlfriend, who gives him another suitcase of weight 8 kg. A short phone call with his girlfriend makes it clear that under no circumstances may he leave this suitcase in Kaiserslautern. This means that instead of 20 kg he now has only 12 kg left for the rest of his luggage.
Time is short and there is no way of repacking. So, he must leave some luggage in Kaiserslautern. But what? Simple Ex remembers his lecture on Linear and Network Optimization. Is there a way to solve this problem by a Linear Program?
He first notes that the different suitcases represent different values for him. He compiles Table 1.1 and sets up a Linear Program:
max  7x1 + 5x2 + 9x3 + 4x4 + 2x5 + 1x6           (1.2a)
     4x1 + 3x2 + 6x3 + 3x4 + 3x5 + 2x6 ≤ 12      (1.2b)
     0 ≤ xi ≤ 1,  i = 1, . . . , 6.              (1.2c)

Suitcase     1     2     3     4     5     6
Weight     4 kg  3 kg  6 kg  3 kg  3 kg  2 kg
Score        7     5     9     4     2     1

Table 1.1: The suitcases of Simple Ex with weights and profits

The optimum solution is given by

x1 = 1, x2 = 1, x3 = 5/6, x4 = x5 = x6 = 0

with value 19.5. There is a problem, however. Simple Ex cannot split Suitcase 3 (or pack it in a fractional way). As Cutty Plain points out, the "real" problem to be solved is one where the variables attain only values in {0, 1}:

max  7x1 + 5x2 + 9x3 + 4x4 + 2x5 + 1x6           (1.3a)
     4x1 + 3x2 + 6x3 + 3x4 + 3x5 + 2x6 ≤ 12      (1.3b)
     xi ∈ {0, 1},  i = 1, . . . , 6.             (1.3c)

Simple Ex is not convinced that there is a real difficulty (except for the cursed suitcase of Cutty). If he packs the first two suitcases, then the third one will have to stay in Kaiserslautern (the variable is "rounded" to 0). This gives a profit of 12. The remaining 5 kg can be filled perfectly with Suitcases 4 and 6, which yields exactly 12 kg and a profit of 17. Simple Ex is happy. This should be the optimum solution, shouldn't it?
Cutty Plain does not look convinced. In fact, a long time ago, she attended a lecture on Integer Programming. It is clear that 19.5 is an upper bound for the profit. Since we must pack suitcases completely, the optimum value of (1.3) must be an integer, so we can even state that 19 is an upper bound for the optimum profit. Cutty points out that the Linear Program (1.2) can be improved: as noted before, one cannot pack the first three suitcases altogether, since their total weight is 13 kg. So we may add the inequality

x1 + x2 + x3 ≤ 2     (1.4)

to the Linear Program without affecting the optimum integral solution! The new (fractional) optimum solution is

x1 = 1, x2 = 0, x3 = 1, x4 = 2/3, x5 = x6 = 0,

which gives a profit of 56/3 ≈ 18.67. This shows that even 19 cannot be reached as optimum profit.
Simple Ex finds this interesting. He realizes that he can add a couple more inequalities in the same way. This gives the new Linear Program:

max  7x1 + 5x2 + 9x3 + 4x4 + 2x5 + 1x6           (1.5a)
     4x1 + 3x2 + 6x3 + 3x4 + 3x5 + 2x6 ≤ 12      (1.5b)
     x1 + x2 + x3 ≤ 2                            (1.5c)
     x1 + x3 + x4 ≤ 2                            (1.5d)
     0 ≤ xi ≤ 1,  i = 1, . . . , 6.              (1.5e)

Solving this problem yields the optimum solution

x1 = 0, x2 = 1, x3 = 1, x4 = 1, x5 = x6 = 0

with value 18. This is better than the solution provided by Simple Ex earlier. Moreover, this must be an optimum solution of (1.3), since it is integral.
Simple Ex wonders whether this can be done all the time and Cutty Plain tells him that she does not remember exactly. Thus, Simple Ex cancels his flight home and decides to stay one more semester to attend a course on Integer Programming.




1.2 Integer Linear Programs


We first state the general form of a Mixed Integer Program as it will be used throughout
these notes:

Definition 1.1 (Mixed Integer Linear Program (MIP))
A Mixed Integer Linear Program (MIP) is given by vectors c ∈ R^n, b ∈ R^m, a matrix A ∈ R^{m×n} and a number p ∈ {0, . . . , n}. The goal of the problem is to find a vector x ∈ R^n solving the following optimization problem:

(MIP)  max  c^T x               (1.6a)
            Ax ≤ b              (1.6b)
            x ≥ 0               (1.6c)
            x ∈ Z^p × R^{n−p}.  (1.6d)

If p = 0, then there are no integrality constraints at all, so we obtain the Linear Program (1.1). On the other hand, if p = n, then all variables are required to be integral. In this case, we speak of an Integer Linear Program (IP), which we note again for later reference:

(IP)   max  c^T x     (1.7a)
            Ax ≤ b    (1.7b)
            x ≥ 0     (1.7c)
            x ∈ Z^n   (1.7d)

If in an (IP) all variables are restricted to values from the set B = {0, 1}, we have a 0-1 Integer Linear Program or Binary Integer Linear Program:

(BIP)  max  c^T x     (1.8a)
            Ax ≤ b    (1.8b)
            x ∈ B^n   (1.8c)

Most of the time we will be concerned with Integer Programs (IP) and Binary Integer Programs (BIP).

Example 1.2
Consider the following Integer Linear Program:

max  x + y          (1.9a)
     2y − 3x ≤ 2    (1.9b)
     x + y ≤ 5      (1.9c)
     1 ≤ x ≤ 3      (1.9d)
     1 ≤ y ≤ 3      (1.9e)
     x, y ∈ Z       (1.9f)

The feasible region S of the problem is depicted in Figure 1.1. It consists of the integral points emphasized in red, namely

S = {(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2), (2, 3)}.



Figure 1.1: Example of the feasible region of an integer program.

Example 1.3 (Knapsack Problem)


A climber is preparing for an expedition to Mount Optimization. His equipment consists of n items, where each item i has a profit p_i ∈ Z_+ and a weight w_i ∈ Z_+. The climber knows that he will be able to carry items of total weight at most b ∈ Z_+. He would like to pack his knapsack in such a way that he gets the largest possible profit without exceeding the weight limit.
We can formulate the KNAPSACK problem as a (BIP). Define a decision variable x_i, i = 1, . . . , n, with the following meaning:

x_i = 1 if item i gets packed into the knapsack, and x_i = 0 otherwise.

Then, KNAPSACK becomes the (BIP)

max  ∑_{i=1}^n p_i x_i          (1.10a)
     ∑_{i=1}^n w_i x_i ≤ b      (1.10b)
     x ∈ B^n                    (1.10c)
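KNAPSACK is treated by dynamic programming in Chapter 7; as a small preview, here is a minimal Python sketch (an illustration, not part of the notes) of one classical capacity-indexed recursion, applied to the suitcase instance from Table 1.1:

    def knapsack(profits, weights, b):
        # best[c] = maximum profit achievable with weight budget c
        best = [0] * (b + 1)
        for p, w in zip(profits, weights):
            # traverse capacities downwards so each item is used at most once
            for c in range(b, w - 1, -1):
                best[c] = max(best[c], best[c - w] + p)
        return best[b]

    print(knapsack([7, 5, 9, 4, 2, 1], [4, 3, 6, 3, 3, 2], 12))   # 18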

In the preceding example we have essentially identified a subset of the possible items with a 0-1-vector. Given a ground set E, we can associate with each subset F ⊆ E an incidence vector χ^F ∈ R^E by setting

χ^F_e = 1 if e ∈ F, and χ^F_e = 0 if e ∉ F.

Then, we can identify the vector χ^F with the set F and vice versa. We will see numerous examples of this identification in the rest of these lecture notes; it appears frequently in the context of combinatorial optimization problems.




Definition 1.4 (Combinatorial Optimization Problem)
A combinatorial optimization problem is given by a finite ground set N, a weight function c : N → R and a family F ⊆ 2^N of feasible subsets of N. The goal is to solve

(COP)  min { ∑_{j∈S} c(j) : S ∈ F }.     (1.11)

Thus, we can write KNAPSACK also as a (COP) by using

N := {1, . . . , n}
F := { S ⊆ N : ∑_{i∈S} w_i ≤ b }

and c(i) := −p_i. We will see later that there is a close connection between combinatorial optimization problems (as stated in (1.11)) and Integer Linear Programs.

1.3 Notes of Caution

Integer Linear Programs are qualitatively different from Linear Programs in a number of
aspects. Recall that from the Fundamental Theorem of Linear Programming we know that,
if the Linear Program (1.1) is feasible and bounded, it has an optimal solution.



Figure 1.2: An IP without optimal solution.




Now, consider the following Integer Linear Program:

max  −√2 x + y        (1.12a)
     −√2 x + y ≤ 0    (1.12b)
     x ≥ 1            (1.12c)
     y ≥ 0            (1.12d)
     x, y ∈ Z         (1.12e)

The feasible region S of the IP (1.12) is depicted in Figure 1.2. The set of feasible solutions is nonempty (for instance (1, 0) is a feasible point) and, by the constraint −√2 x + y ≤ 0, the objective is also bounded from above on S. However, the IP does not have an optimal solution!
To see this, observe that from the constraint −√2 x + y ≤ 0 we have y/x ≤ √2 and, since we know that √2 is irrational, −√2 x + y < 0 for any x, y ∈ Z with x ≥ 1. On the other hand, for integral x, the quantity −√2 x + ⌊√2 x⌋ gets arbitrarily close to 0.

Remark 1.5 An IP with irrational input data can be feasible and bounded but may still not
have an optimal solution.

The reason why the IP (1.12) does not have an optimal solution lies in the fact that we have
used irrational data to specify the problem. We will see later that under the assumption of
rational data any feasible and bounded MIP must have an optimal solution. We stress again
that for standard Linear Programs no assumption about the input data is needed.
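Numerically one can watch the supremum being approached. The following Python sketch (an illustration, not part of the notes; floating-point arithmetic is precise enough for these magnitudes) prints feasible points (x, ⌊√2 x⌋) whose objective value creeps toward 0 without ever reaching it:

    import math

    best = -math.inf
    for x in range(1, 10**6):
        y = math.floor(math.sqrt(2) * x)   # feasible: y >= 0 and -sqrt(2)*x + y <= 0
        value = -math.sqrt(2) * x + y      # always negative, but creeps toward 0
        if value > best:
            best = value
            print(x, y, value)

The records occur at x = 1, 5, 29, 169, . . . , the denominators of the continued-fraction convergents of √2.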
At first sight it might sound like a reasonable idea to simply drop the integrality constraints in an IP and to "round" the corresponding solution. But, in general, this is not a good idea for the following reasons:

• The rounded solution may be infeasible (see Figure 1.3(a)), or

• the rounded solution may be feasible but far from the optimum solution (see Figure 1.3(b)).


Figure 1.3: Simply rounding a solution of the LP-relaxation to an IP may give infeasible or
very bad solutions.

1.4 Examples
In this section we give various examples of integer programming problems.




Example 1.6 (Assignment Problem)


Suppose that we are given n tasks and n people who are available for carrying out the tasks. Each person can carry out exactly one job, and there is a cost c_ij if person i serves job j. How should we assign the jobs to the persons in order to minimize the overall cost?
We first introduce binary decision variables x_ij with the following meaning:

x_ij = 1 if person i carries out job j, and x_ij = 0 otherwise.

Given such a binary vector x, the number of persons assigned to job j is exactly ∑_{i=1}^n x_ij. Thus, the requirement that each job gets served by exactly one person can be enforced by the following constraint:

∑_{i=1}^n x_ij = 1,  for j = 1, . . . , n.

Similarly, we can ensure that each person does exactly one job by having the following constraint:

∑_{j=1}^n x_ij = 1,  for i = 1, . . . , n.

We obtain the following BIP:

min  ∑_{i=1}^n ∑_{j=1}^n c_ij x_ij
     ∑_{i=1}^n x_ij = 1   for j = 1, . . . , n
     ∑_{j=1}^n x_ij = 1   for i = 1, . . . , n
     x ∈ B^{n²}
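As an aside (not from the notes): the assignment problem is efficiently solvable, and SciPy ships a dedicated solver for it. A minimal sketch on a hypothetical 3 × 3 cost matrix:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical cost matrix: c[i, j] = cost if person i serves job j.
    c = np.array([[4, 1, 3],
                  [2, 0, 5],
                  [3, 2, 2]])

    persons, jobs = linear_sum_assignment(c)   # minimum-cost perfect assignment
    print(list(zip(persons, jobs)), c[persons, jobs].sum())   # e.g. [(0, 1), (1, 0), (2, 2)], cost 5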

Example 1.7 (Uncapacitated Facility Location Problem)


In the uncapacitated facility location problem (UFL) we are given a set of potential depots M = {1, . . . , m} and a set N = {1, . . . , n} of clients. Opening a depot at site j involves a fixed cost f_j. Serving client i by a depot at location j costs c_ij units of money. The goal of the UFL is to decide at which positions to open depots and how to serve all clients so as to minimize the overall cost.
We can model the cost c_ij which arises if client i is served by a facility at j with the help of binary variables x_ij similar to the assignment problem:

x_ij = 1 if client i is served by a facility at j, and x_ij = 0 otherwise.

The fixed cost f_j which arises if we open a facility at j can be handled by binary variables y_j, where

y_j = 1 if a facility at j is opened, and y_j = 0 otherwise.




Following the ideas of the assignment problem, we obtain the following BIP:

min  ∑_{j=1}^m f_j y_j + ∑_{j=1}^m ∑_{i=1}^n c_ij x_ij     (1.13a)
     ∑_{j=1}^m x_ij = 1   for i = 1, . . . , n             (1.13b)
     x ∈ B^{nm}, y ∈ B^m                                   (1.13c)

As in the assignment problem, constraint (1.13b) enforces that each client is served. But our formulation is not complete yet! The current constraints allow a client i to be served by a facility at j which is not opened, that is, where y_j = 0. How can we ensure that clients are only served by open facilities?
One option is to add the nm constraints x_ij ≤ y_j for all i, j to (1.13). Then, if y_j = 0, we must have x_ij = 0 for all i. This is what we want. Hence, the UFL can be formulated as follows:

min  ∑_{j=1}^m f_j y_j + ∑_{j=1}^m ∑_{i=1}^n c_ij x_ij     (1.14a)
     ∑_{j=1}^m x_ij = 1   for i = 1, . . . , n             (1.14b)
     x_ij ≤ y_j   for i = 1, . . . , n and j = 1, . . . , m (1.14c)
     x ∈ B^{nm}, y ∈ B^m                                   (1.14d)

A potential drawback of the formulation (1.14) is that it contains a large number of constraints. We can formulate the condition that clients are served only by open facilities in another way. Observe that ∑_{i=1}^n x_ij is the number of clients assigned to facility j. Since this number is an integer between 0 and n, the condition ∑_{i=1}^n x_ij ≤ n y_j can also be used to prohibit clients being served by a facility which is not open. If y_j = 0, then ∑_{i=1}^n x_ij must also be zero. If y_j = 1, we have the constraint ∑_{i=1}^n x_ij ≤ n, which is always satisfied since there is a total of n clients. This gives us the alternative formulation of the UFL:

min  ∑_{j=1}^m f_j y_j + ∑_{j=1}^m ∑_{i=1}^n c_ij x_ij     (1.15a)
     ∑_{j=1}^m x_ij = 1   for i = 1, . . . , n             (1.15b)
     ∑_{i=1}^n x_ij ≤ n y_j   for j = 1, . . . , m         (1.15c)
     x ∈ B^{nm}, y ∈ B^m                                   (1.15d)

Are there any differences between the two formulations as far as solvability is concerned? We will explore this question later in greater detail. ⊳

Example 1.8 (Minimum Spanning Tree Problem)


A spanning tree in an undirected graph G = (V, E) is a subgraph T = (V, F) of G which is connected and does not contain cycles. Given a cost function c : E → R_+ on the edges of G, the minimum spanning tree problem (MST-Problem) asks to find a spanning tree T = (V, F) of minimum weight c(F).




We choose binary indicator variables x_e for the edges in E with the meaning that x_e = 1 if and only if e is included in the spanning tree. The objective function ∑_{e∈E} c_e x_e is now clear. But how can we formulate the requirement that the set of edges chosen forms a spanning tree?
A cut in an undirected graph G is a partition S ∪ S̄ = V, S ∩ S̄ = ∅ of the vertex set. We denote by δ(S) the set of edges in the cut, that is, the set of edges which have exactly one endpoint in S. It is easy to see that a subset F ⊆ E of the edges forms a connected spanning subgraph (V, F) if and only if F ∩ δ(S) ≠ ∅ for all subsets S with ∅ ⊂ S ⊂ V. Hence, we can formulate the requirement that the subset of edges chosen forms a connected spanning subgraph by having the constraint ∑_{e∈δ(S)} x_e ≥ 1 for each such subset. This gives the following BIP:

min  ∑_{e∈E} c_e x_e                            (1.16a)
     ∑_{e∈δ(S)} x_e ≥ 1   for all ∅ ⊂ S ⊂ V     (1.16b)
     ∑_{e∈E} x_e = n − 1                        (1.16c)
     x ∈ B^E                                    (1.16d)

The constraint ∑_{e∈E} x_e = n − 1 ensures that our subgraph contains exactly n − 1 edges, which, together with the requirement that it is connected, implies that it forms a spanning tree.
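There are exponentially many constraints (1.16b), but for tiny graphs they can simply be enumerated. A small Python sketch (an illustration, not from the notes) checking whether an edge set satisfies all cut constraints:

    from itertools import combinations

    def satisfies_cut_constraints(n, F):
        # F ∩ δ(S) must be nonempty for every S with ∅ ⊂ S ⊂ V = {0, ..., n-1}
        for size in range(1, n):
            for S in map(set, combinations(range(n), size)):
                if not any((u in S) != (v in S) for u, v in F):
                    return False
        return True

    # A path on four vertices is a spanning tree; dropping an edge violates a cut.
    print(satisfies_cut_constraints(4, [(0, 1), (1, 2), (2, 3)]))   # True
    print(satisfies_cut_constraints(4, [(0, 1), (2, 3)]))           # False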

Example 1.9 (Traveling Salesman Problem)


In the traveling salesman problem (TSP) a salesman must visit each of n given cities V = {1, . . . , n} exactly once and then return to his starting point. The distance between city i and j is c_ij. The salesman wants to find a tour of minimum length.
The TSP can be modeled as a graph problem by considering a complete directed graph G = (V, A), that is, a graph with A = V × V, and assigning a cost c(i, j) to every arc a = (i, j). A tour is then a cycle in G which touches every node in V exactly once.
We formulate the TSP as a BIP. We have binary variables x_ij with the following meaning:

x_ij = 1 if the salesman goes from city i directly to city j, and x_ij = 0 otherwise.

Then, the total length of the tour taken by the salesman is ∑_{i,j} c_ij x_ij. In a feasible tour, the salesman enters each city exactly once and also leaves each city exactly once. Thus, we have the constraints:

∑_{j : j≠i} x_ji = 1   for i = 1, . . . , n
∑_{j : j≠i} x_ij = 1   for i = 1, . . . , n.

However, the constraints specified so far do not ensure that a binary solution indeed forms a tour.
Consider the situation depicted in Figure 1.4. We are given five cities and for each city there is exactly one incoming and one outgoing arc. However, the solution is not a tour but a collection of directed cycles called subtours.



Figure 1.4: Subtours are possible in the TSP if no additional constraints are added.

To eliminate subtours we have to add more constraints. One possible way of doing so is to use the so-called subtour elimination constraints. The underlying idea is as follows. Let ∅ ≠ S ⊂ V be a subset of the cities. A feasible tour (on the whole set of cities) must leave S for some vertex outside of S. Hence, the number of arcs that have both endpoints in S can be at most |S| − 1. On the other hand, a subtour which is a directed cycle on a set ∅ ≠ S ⊂ V has exactly |S| arcs with both endpoints in S. We obtain the following BIP for the TSP:

min  ∑_{i=1}^n ∑_{j=1}^n c_ij x_ij                        (1.17a)
     ∑_{j : j≠i} x_ij = 1   for i = 1, . . . , n          (1.17b)
     ∑_{i : i≠j} x_ij = 1   for j = 1, . . . , n          (1.17c)
     ∑_{i∈S} ∑_{j∈S} x_ij ≤ |S| − 1   for all ∅ ⊂ S ⊂ V   (1.17d)
     x ∈ B^{n(n−1)}                                       (1.17e)

As mentioned before, the constraints (1.17d) are called subtour elimination constraints.
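Given a candidate solution satisfying (1.17b) and (1.17c), violated subtours are easy to detect: follow the successors and collect the directed cycles. A Python sketch (an illustration, not from the notes; the 5-city solution is hypothetical):

    def subtours(successor):
        # successor[i] is the unique city j with x_ij = 1
        unvisited, cycles = set(successor), []
        while unvisited:
            cycle, city = [], next(iter(unvisited))
            while city in unvisited:
                unvisited.remove(city)
                cycle.append(city)
                city = successor[city]
            cycles.append(cycle)
        return cycles

    # A solution as in Figure 1.4: every city has one incoming and one outgoing arc,
    # yet the result is two subtours rather than a single tour.
    print(subtours({1: 2, 2: 1, 3: 4, 4: 5, 5: 3}))   # [[1, 2], [3, 4, 5]]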

Example 1.10 (Set-Covering, Set-Packing and Set-Partitioning Problem)
Let U be a finite ground set and F ⊆ 2^U be a collection of subsets of U. There is a cost/benefit c_f associated with every set f ∈ F. In the set covering (set packing, set partitioning) problem we wish to find a subcollection of the sets in F such that each element in U is covered at least once (at most once, exactly once). The goal in the set covering and set partitioning problem is to minimize the cost of the chosen sets, whereas in the set packing problem we wish to maximize the benefit.
We can formulate each of these problems as a BIP by the following approach. We choose binary variables x_f, f ∈ F, with the meaning that

x_f = 1 if set f is chosen to be in the selection, and x_f = 0 otherwise.

Let A = (a_uf) be the |U| × |F|-matrix which reflects the element-set containment relations, that is,

a_uf := 1 if element u is contained in set f, and a_uf := 0 otherwise.

Then, the constraint that every element is covered at least once (at most once, exactly once) can be expressed by the linear constraints Ax ≥ 1 (Ax ≤ 1, Ax = 1), where 1 = (1, . . . , 1) ∈ R^U is the vector consisting of all ones. ⊳
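For a tiny instance, the set covering variant can be solved by brute force directly from this formulation. A Python sketch with a hypothetical instance (an illustration, not from the notes):

    from itertools import combinations

    U = {1, 2, 3, 4}                                              # ground set
    F = {"f1": {1, 2}, "f2": {2, 3}, "f3": {3, 4}, "f4": {1, 4}}  # hypothetical sets
    c = {"f1": 3, "f2": 2, "f3": 3, "f4": 2}                      # costs

    best = min(
        (sum(c[f] for f in sel), sel)
        for r in range(len(F) + 1)
        for sel in combinations(sorted(F), r)
        if set().union(*(F[f] for f in sel)) == U                 # every element covered
    )
    print(best)   # (4, ('f2', 'f4'))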

1.5 Literature
These notes build upon my earlier notes [Kru04, Kru06]. The first version [Kru04] followed
more closely the book by Laurence Wolsey [Wol98], the second introduced the notation
which is still present in the current version.
Classical books about Linear and Integer Programming are the books of George Nemhauser
and Laurence Wolsey [NW99] and Alexander Schrijver [Sch86]. You can also find a lot of
useful stuff in the books [CCPS98, Sch03, GLS88] which are mainly about combinatorial
optimization. Section 5.3 discusses issues of complexity. A classical book about the theory
of computation is the book by Garey and Johnson [GJ79]. More books from this area
are [Pap94, BDG88, BDG90].

1.6 Acknowledgements
I wish to thank all students of the lecture »Optimization II: Integer Programming« at
the Technical University of Kaiserslautern for their comments and questions. Particular
thanks go to all students who actively reported typos and provided constructive criticism:
Oliver Bachtler, Tim Bergner, Robert Dorbritz, Ulrich Dürholz, Christoph Hertrich, Tatjana Kowalew, Erika Lind, Wiredu Sampson and Phillip Süß. Needless to say, all remaining errors and typos are solely my fault!



2 Basics

In this chapter we introduce some of the basic concepts that will be useful for the study of
integer programming problems.

2.1 Notation

Let A ∈ R^{m×n} be a matrix with row index set M = {1, . . . , m} and column index set N = {1, . . . , n}. We write

A = (a_ij)_{i=1,...,m; j=1,...,n}
A_{·,j} = (a_1j, . . . , a_mj)^T : the jth column of A
A_{i,·} = (a_i1, . . . , a_in) : the ith row of A.

For subsets I ⊆ M and J ⊆ N we denote by

A_{I,J} := (a_ij)_{i∈I, j∈J}

the submatrix of A formed by the corresponding indices. We also set

A_{·,J} := A_{M,J}
A_{I,·} := A_{I,N}.

For a subset X ⊆ R^n we denote by

lin(X) := { x = ∑_{i=1}^k λ_i v_i : λ_i ∈ R and v_1, . . . , v_k ∈ X }

the linear hull of X.



2.2 Convex Hulls


Definition 2.1 (Convex Hull)
Given a set X ⊆ R^n, the convex hull of X, denoted by conv(X), is defined to be the set of all convex combinations of vectors from X, that is,

conv(X) := { x = ∑_{i=1}^k λ_i v_i : λ_i ≥ 0, ∑_{i=1}^k λ_i = 1 and v_1, . . . , v_k ∈ X }.

Suppose that X ⊆ R^n is some set, for instance the set of incidence vectors of all spanning trees of a given graph (cf. Example 1.8). Suppose that we wish to find a vector x ∈ X maximizing c^T x.
If x = ∑_{i=1}^k λ_i v_i ∈ conv(X) is a convex combination of the vectors v_1, . . . , v_k ∈ X, then

c^T x = ∑_{i=1}^k λ_i c^T v_i ≤ max{ c^T v_i : i = 1, . . . , k }.

Hence, we have that

max{ c^T x : x ∈ X } = max{ c^T x : x ∈ conv(X) }

for any set X ⊆ R^n.

Observation 2.2 Let X ⊆ R^n be any set and c ∈ R^n be any vector. Then

max{ c^T x : x ∈ X } = max{ c^T x : x ∈ conv(X) }.     (2.1)

Proof: See above. ✷

Observation 2.2 may seem of little use, since we have replaced a discrete finite problem (the left-hand side of (2.1)) by a continuous one (the right-hand side of (2.1)). However, in many cases conv(X) has a nice structure that we can exploit in order to solve the problem. It turns out that "most of the time" conv(X) is a polyhedron { x : Ax ≤ b } (see Section 2.3) and that the problem max{ c^T x : x ∈ conv(X) } is a Linear Program.

Example 2.3
We return to the IP given in Example 1.2:

max  x + y
     2y − 3x ≤ 2
     x + y ≤ 5
     1 ≤ x ≤ 3
     1 ≤ y ≤ 3
     x, y ∈ Z

We have already noted that the set X of feasible solutions for the IP is

X = {(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2), (2, 3)}.

Observe that the constraint y ≤ 3 is actually superfluous, but that is not our main concern right now. What is more important is the fact that we obtain the same feasible set if we add



Figure 2.1: The addition of the new constraint x − y ≥ −1 (shown as the red line) leads to the same feasible set.

the constraint x − y ≥ −1, as shown in Figure 2.1. Moreover, the convex hull conv(X) of all feasible solutions for the IP is described by the following inequalities:

2y − 3x ≤ 2
x + y ≤ 5
−x + y ≤ 1
1 ≤ x ≤ 3
1 ≤ y ≤ 3

Observation 2.2 now implies that instead of solving the original IP we can also solve the Linear Program

max  x + y
     2y − 3x ≤ 2
     x + y ≤ 5
     −x + y ≤ 1
     1 ≤ x ≤ 3
     1 ≤ y ≤ 3

that is, a standard Linear Program without integrality constraints. ⊳
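One can watch this happen with an LP solver. The sketch below (an illustration, not from the notes) maximizes x + y over the ideal formulation with SciPy; since linprog minimizes, the objective is negated. The optimal face here is the edge between (2, 3) and (3, 2), and a simplex-based solver typically reports one of these two integral vertices:

    from scipy.optimize import linprog

    # max x + y  ==  min -(x + y), variables ordered (x, y)
    res = linprog(
        c=[-1, -1],
        A_ub=[[-3, 2],    # 2y - 3x <= 2
              [ 1, 1],    #  x + y  <= 5
              [-1, 1]],   # -x + y  <= 1
        b_ub=[2, 5, 1],
        bounds=[(1, 3), (1, 3)],
    )
    print(res.x, -res.fun)   # e.g. [3. 2.] (or [2. 3.]) with value 5.0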

In the above example we reduced the solution of an IP to solving a standard Linear Program.
We will see later that in principle this reduction is always possible (provided the data of
the IP is rational). However, there is a catch! The mentioned reduction might lead to an
exponential increase in the problem size. Sometimes we might still overcome this problem
(see Section 5.4).




2.3 Polyhedra and Formulations


Definition 2.4 (Polyhedron, polytope)
A polyhedron is a subset of R^n described by a finite set of linear inequalities, that is, a polyhedron is of the form

P(A, b) := { x ∈ R^n : Ax ≤ b },     (2.2)

where A is an m × n-matrix and b ∈ R^m is a vector. The polyhedron P is a rational polyhedron if A and b can be chosen to be rational. A bounded polyhedron is called a polytope.

In Section 1.4 we have seen a number of examples of Integer Linear Programs and we have spoken rather informally of a formulation of a problem. We now formalize the term formulation:

Definition 2.5 (Formulation)
A polyhedron P ⊆ R^n is a formulation for a set X ⊆ Z^p × R^{n−p} if X = P ∩ (Z^p × R^{n−p}).

It is clear that in general there are infinitely many formulations for a set X. This naturally raises the question about "good" and "not so good" formulations.
We start with an easy example which provides the intuition for how to judge formulations. Consider again the set X = {(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2), (2, 3)} ⊂ R² from Examples 1.2 and 2.3. Figure 2.2 shows our two known formulations P1 and P2 together with a third one P3.
 

P1 = { (x, y) :  2y − 3x ≤ 2,
                 x + y ≤ 5,
                 1 ≤ x ≤ 3,
                 1 ≤ y ≤ 3 }

P2 = { (x, y) :  2y − 3x ≤ 2,
                 x + y ≤ 5,
                 −x + y ≤ 1,
                 1 ≤ x ≤ 3,
                 1 ≤ y ≤ 3 }

Intuitively, we would rate P2 much higher than P1 or P3. In fact, P2 is an ideal formulation since, as we have seen in Example 2.3, we can simply solve a Linear Program over P2: the optimal solution will be an extreme point, which is a point from X.
Definition 2.6 (Better and ideal formulations)
Given a set X ⊆ R^n and two formulations P1 and P2 for X, we say that P1 is better than P2 if P1 ⊂ P2.
A formulation P for X is called ideal if P = conv(X).

We will see later that the above definition is one of the keys to solving IPs.
Example 2.7
In Example 1.7 we have seen two possibilities to formulate the Uncapacitated Facility Location Problem (UFL):

min { ∑_{j=1}^m f_j y_j + ∑_{j=1}^m ∑_{i=1}^n c_ij x_ij : (x, y) ∈ P1, x ∈ B^{nm}, y ∈ B^m }

and

min { ∑_{j=1}^m f_j y_j + ∑_{j=1}^m ∑_{i=1}^n c_ij x_ij : (x, y) ∈ P2, x ∈ B^{nm}, y ∈ B^m },




Figure 2.2: Different formulations for the integral set X = {(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2), (2, 3)}

where

P1 = { (x, y) : ∑_{j=1}^m x_ij = 1 for i = 1, . . . , n;  x_ij ≤ y_j for i = 1, . . . , n and j = 1, . . . , m }

P2 = { (x, y) : ∑_{j=1}^m x_ij = 1 for i = 1, . . . , n;  ∑_{i=1}^n x_ij ≤ n y_j for j = 1, . . . , m }

We claim that P1 is a better formulation than P2 (for n, m > 1). If (x, y) ∈ P1, then x_ij ≤ y_j for all i and j. Summing these constraints over i gives us ∑_{i=1}^n x_ij ≤ n y_j, so (x, y) ∈ P2. Hence we have P1 ⊆ P2. We now show that P1 ≠ P2, thus proving that P1 is a better formulation.
We assume for simplicity that n/m = k is an integer. The argument can be extended to the case that m does not divide n by some technicalities. We partition the clients into m groups, each of which contains exactly k clients. The first group will be served by a (fractional) facility at site 1, the second group by a (fractional) facility at site 2 and so on. More precisely, we set

x_ij = 1   for i = k(j − 1) + 1, . . . , k(j − 1) + k and j = 1, . . . , m

and x_ij = 0 otherwise. We also set y_j = k/n for j = 1, . . . , m.
Fix j. By construction ∑_{i=1}^n x_ij = k = (k/n)·n = n y_j. Hence, the point (x, y) just constructed is contained in P2. On the other hand, (x, y) ∉ P1, since x_ij = 1 > k/n = y_j for the clients i served by facility j. ⊳
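This construction is easy to verify numerically. A Python sketch (an illustration, not from the notes) with the hypothetical sizes n = 4, m = 2, hence k = 2 and y_j = 1/2:

    n, m = 4, 2
    k = n // m
    y = [k / n] * m                                    # y_j = 1/2
    x = [[1 if k * j <= i < k * (j + 1) else 0         # client i -> facility j
          for j in range(m)] for i in range(n)]

    print(all(sum(row) == 1 for row in x))                                    # each client served: True
    print(all(sum(x[i][j] for i in range(n)) <= n * y[j] for j in range(m)))  # (x, y) in P2: True
    print(all(x[i][j] <= y[j] for i in range(n) for j in range(m)))           # (x, y) in P1: False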

2.4 Cones
Definition 2.8 (Cone, polyhedral cone)
A nonempty set C ⊆ R^n is called a (convex) cone if for all x, y ∈ C and all λ, µ ≥ 0 we have

λx + µy ∈ C.

A cone is called polyhedral if there exists a matrix A such that

C = { x : Ax ≤ 0 }.

Thus, polyhedral cones are also polyhedra, albeit special ones. We will see that they play an important role in the representation of polyhedra.
Suppose that we have a set S ⊆ R^n. Then, the set

cone(S) := { λ_1 x_1 + · · · + λ_k x_k : x_i ∈ S, λ_i ≥ 0 for i = 1, . . . , k }

is obviously a cone. We call this the cone generated by S. It is easy to see that this is the smallest cone containing S.

2.5 Linear Programming


We briefly recall the following fundamental results from Linear Programming which we will use in these notes. For proofs, we refer to standard books about Linear Programming such as [Sch86, Chv83].

Theorem 2.9 (Duality Theorem of Linear Programming) Let A be an m × n-matrix, b ∈ R^m and c ∈ R^n. Define the polyhedra P = { x : Ax ≤ b } and Q = { y : A^T y = c, y ≥ 0 }.

(i) If x ∈ P and y ∈ Q, then c^T x ≤ b^T y. (weak duality)

(ii) In fact, we have

max{ c^T x : x ∈ P } = min{ b^T y : y ∈ Q },     (2.3)

provided that both sets P and Q are nonempty. (strong duality) ✷

Theorem 2.10 (Complementary Slackness Theorem) Let x* be a feasible solution of max{ c^T x : Ax ≤ b } and y* be a feasible solution of min{ b^T y : A^T y = c, y ≥ 0 }. Then x* and y* are optimal solutions for the maximization problem and minimization problem, respectively, if and only if they satisfy the complementary slackness conditions:

for each i = 1, . . . , m, either y*_i = 0 or A_{i,·} x* = b_i.     (2.4)

Theorem 2.11 (Farkas' Lemma) The set { x : Ax = b, x ≥ 0 } is nonempty if and only if there is no vector y such that A^T y ≥ 0 and b^T y < 0. ✷
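A numerical illustration of Theorem 2.9 (not from the notes): solve a small primal max{ c^T x : Ax ≤ b } and its dual min{ b^T y : A^T y = c, y ≥ 0 } with SciPy and compare the optimal values. The instance is hypothetical.

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1, 0], [0, 1], [1, 1]])
    b = np.array([3, 2, 4])
    c = np.array([1, 2])

    # Primal: max c^T x s.t. Ax <= b (x free); linprog minimizes, so negate c.
    primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)
    # Dual: min b^T y s.t. A^T y = c, y >= 0.
    dual = linprog(b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 3)
    print(-primal.fun, dual.fun)   # 6.0 6.0 -- equal, as (2.3) promises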

2.6 Agenda

These lecture notes consist of two main parts. The goals of Part I are as follows:

(i) Prove that for any rational polyhedron P = P(A, b) = { x : Ax ≤ b } and X = P ∩ Z^n, the set conv(X) is again a rational polyhedron.

(ii) Use the fact that max{ c^T x : x ∈ X } = max{ c^T x : x ∈ conv(X) } (see Observation 2.2) and (i) to show that the latter problem can be solved by means of Linear Programming. This is accomplished by showing that an optimum solution will always be found at an extreme point of the polyhedron conv(X) (which we show to be a point in X).

(iii) Give tools to derive good formulations.


Part I

Polyhedral Theory

3 Polyhedra and Integer Programs
3.1 Valid Inequalities and Faces of Polyhedra

Definition 3.1 (Valid Inequality)
Let w ∈ R^n and t ∈ R. We say that the inequality w^T x ≤ t is valid for a set S ⊆ R^n if

S ⊆ { x : w^T x ≤ t }.

Sometimes we write (w, t) as shorthand for the inequality w^T x ≤ t.

Definition 3.2 (Face)
Let P ⊆ R^n be a polyhedron. The set F ⊆ P is called a face of P if there is a valid inequality (w, t) for P such that

F = { x ∈ P : w^T x = t }.

If F ≠ ∅, we say that (w, t) supports the face F and call { x : w^T x = t } the corresponding supporting hyperplane. If F ≠ ∅ and F ≠ P, then we call F a nontrivial or proper face.

Observe that any face of P(A, b) has the form

F = { x : Ax ≤ b, w^T x ≤ t, −w^T x ≤ −t },

which shows that any face of a polyhedron is again a polyhedron.

Example 3.3
We consider the polyhedron P ⊆ R², which is defined by the inequalities

x1 + x2 ≤ 2      (3.1a)
x1 ≤ 1           (3.1b)
x1, x2 ≥ 0.      (3.1c)

We have P = P(A, b) with

A = (  1   1 )        ( 2 )
    (  1   0 )   b =  ( 1 )
    ( −1   0 )        ( 0 )
    (  0  −1 )        ( 0 )

[Figure 3.1 shows P together with the lines x1 + x2 = 2, 2x1 + x2 = 3, 3x1 + x2 = 4 and x1 = 5/2, as well as the faces F1 and F2.]

Figure 3.1: Polyhedron for Example 3.3

The line segment F1 from (0, 2) to (1, 1) is a face of P, since x1 + x2 ≤ 2 is a valid inequality and

F1 = P ∩ { x ∈ R² : x1 + x2 = 2 }.

The singleton F2 = {(1, 1)} is another face of P, since

F2 = P ∩ { x ∈ R² : 2x1 + x2 = 3 }
F2 = P ∩ { x ∈ R² : 3x1 + x2 = 4 }.

Both inequalities 2x1 + x2 ≤ 3 and 3x1 + x2 ≤ 4 induce the same face of P. In particular, this shows that the same face can be induced by completely different inequalities.
The inequalities x1 + x2 ≤ 2, 2x1 + x2 ≤ 3 and 3x1 + x2 ≤ 4 induce nonempty faces of P. They support F1 and F2, respectively.
In contrast, the valid inequality x1 ≤ 5/2 has

F3 = P ∩ { x ∈ R² : x1 = 5/2 } = ∅,

and thus x1 ≤ 5/2 does not give a supporting hyperplane of P. ⊳
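The equality sets used in the definitions below are easy to compute. A Python sketch (an illustration, not from the notes) that reports, for a point of this P, which of the four inequalities are tight:

    import numpy as np

    A = np.array([[1, 1], [1, 0], [-1, 0], [0, -1]], dtype=float)
    b = np.array([2, 1, 0, 0], dtype=float)

    def tight_rows(x):
        # indices i with A[i] @ x == b[i], i.e. the equality set of {x}
        return [i for i in range(len(b)) if np.isclose(A[i] @ x, b[i])]

    print(tight_rows(np.array([0.5, 0.5])))   # []     -- an interior point
    print(tight_rows(np.array([0.0, 2.0])))   # [0, 2] -- a point of the face F1
    print(tight_rows(np.array([1.0, 1.0])))   # [0, 1] -- the vertex F2 = (1, 1)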



Remark 3.4
(i) Any polyhedron P ⊆ R^n is a face of itself, since P = P ∩ { x ∈ R^n : 0^T x = 0 }.
(ii) ∅ is a face of any polyhedron P ⊆ R^n, since ∅ = P ∩ { x ∈ R^n : 0^T x = 1 }.
(iii) If F = P ∩ { x ∈ R^n : c^T x = γ } is a nontrivial face of P ⊆ R^n, then c ≠ 0, since otherwise we are either in case (i) or (ii) above.

Let us consider Example 3.3 once more. Face F1 can be obtained by turning inequality (3.1a) into an equality:



 
F1 = { x ∈ R² : x1 + x2 = 2, x1 ≤ 1, x1, x2 ≥ 0 }

Likewise, F2 can be obtained by making both (3.1a) and (3.1b) equalities:

F2 = { x ∈ R² : x1 + x2 = 2, x1 = 1, x1, x2 ≥ 0 }

Let P = P(A, b) ⊆ R^n be a polyhedron and M be the index set of the rows of A. For a subset I ⊆ M we consider the set

fa(I) := { x ∈ P : A_{I,·} x = b_I }.     (3.2)

Since any x ∈ P satisfies A_{I,·} x ≤ b_I, we get, by summing up the rows of (3.2) for

c^T := ∑_{i∈I} A_{i,·}  and  γ := ∑_{i∈I} b_i,

a valid inequality c^T x ≤ γ for P. For all x ∈ P \ fa(I) there is at least one i ∈ I such that A_{i,·} x < b_i. Thus c^T x < γ for all x ∈ P \ fa(I) and

fa(I) = { x ∈ P : c^T x = γ }

is a face of P.

Definition 3.5 (Face induced by index set)


The set fa(I) defined in (3.2) is called the face of P induced by I.

In Example 3.3 we have F1 = fa({1}) and F2 = fa({1, 2}). The following theorem shows that
in fact all faces of a polyhedron can be obtained this way.

Theorem 3.6 Let P = P(A, b) ⊆ R^n be a (nonempty) polyhedron and M be the index set of the rows of A. The set F ⊆ R^n with F ≠ ∅ is a face of P if and only if F = fa(I) = { x ∈ P : A_{I,·} x = b_I } for a subset I ⊆ M.

Proof: We have already seen that fa(I) is a face of P for any I ⊆ M. Assume conversely that F = P ∩ { x ∈ R^n : c^T x = t } is a face of P. Then, F is precisely the set of optimal solutions of the Linear Program

max{ c^T x : Ax ≤ b }     (3.3)

(here we need the assumption that F ≠ ∅). By the Duality Theorem of Linear Programming (Theorem 2.9), the dual Linear Program for (3.3),

min{ b^T y : A^T y = c, y ≥ 0 },

also has an optimal solution y* which satisfies b^T y* = t. Let I := { i : y*_i > 0 }. By complementary slackness (Theorem 2.10) the optimal solutions of (3.3) are precisely those x ∈ P with A_{i,·} x = b_i for i ∈ I. This gives us F = fa(I). ✷

This result implies the following consequence:




Corollary 3.7 Every polyhedron has only a finite number of faces.

Proof: There is only a finite number of subsets I ⊆ M = {1, . . . , m}. ✷

We can also look at the binding equations for subsets of polyhedra.

Definition 3.8 (Equality set) Let P = P(A, b) ⊆ R^n be a polyhedron. For S ⊆ P we call

eq(S) := { i ∈ M : A_{i,·} x = b_i for all x ∈ S }

the equality set of S.

Clearly, for subsets S, S ′ of a polyhedron P = P(A, b) with S ⊆ S ′ we have eq(S) ⊇ eq(S ′ ).


Thus, if S ⊆ P is a subset of P, then any face F of P which contains S must satisfy eq(F) ⊆
eq(S). On the other hand, fa(eq(S)) is a face of P containing S. Thus, we have the following
observation:

Observation 3.9 Let P = P(A, b) ⊆ Rn be a polyhedron and S ⊆ P be a nonempty subset


of P. The smallest face of P which contains S is fa(eq(S)).

Corollary 3.10 (i) The polyhedron P = P(A, b) does not have any proper face if and
only if eq(P) = M, that is, if and only if P is an affine subspace P = {x : Ax = b}.
(ii) If Ax̄ < b, then x̄ is not contained in any proper face of P.

Proof:

(i) Immediate from the characterization of faces in Theorem 3.6.

(ii) If Ax̄ < b, then eq({x̄}) = ∅ and fa(∅) = P. ✷

3.2 Inner and Interior Points


Definition 3.11 (Inner point, interior point)
The vector x̄ ∈ P = P(A, b) is called an inner point if eq({x̄}) = eq(P). We call x̄ ∈ P an interior point if Ax̄ < b.

By Corollary 3.10(ii) neither an interior point nor an inner point is contained in any proper face.

Lemma 3.12 Let P = P(A, b) be a nonempty polyhedron. Then, the set of inner points of P is nonempty.

Proof: Let M = {1, . . . , m} be the index set of the rows of A, I := eq(P) and J := M \ I. If J = ∅, that is, if I = M, then by Corollary 3.10(i) the polyhedron P does not have any proper face and any point in P is an inner point.
If J ≠ ∅, then for any j ∈ J we can find an x^j ∈ P such that Ax^j ≤ b and A_{j,·} x^j < b_j. Since P is convex, the vector y defined as

y := (1/|J|) ∑_{j∈J} x^j

(which is a convex combination of the x^j, j ∈ J) is contained in P. Then, A_{J,·} y < b_J and A_{I,·} y = b_I. So, eq({y}) = eq(P) and the claim follows. ✷




3.3 The Fundamental Theorem of Linear Inequalities


Theorem 3.13 (Fundamental Theorem of Linear Inequalities) Suppose that a_1, . . . , a_k ∈ R^n and b ∈ R^n, and let t := rank(a_1, . . . , a_k, b). Then exactly one of the following statements holds:

(i) b is a nonnegative linear combination of linearly independent vectors from a_1, . . . , a_k, i.e., b ∈ cone({a_{i_1}, . . . , a_{i_t}}) for linearly independent vectors a_{i_1}, . . . , a_{i_t}.

(ii) There is a vector c ≠ 0 such that the hyperplane { x : c^T x = 0 } contains t − 1 linearly independent vectors from a_1, . . . , a_k, and c^T a_i ≥ 0 for i = 1, . . . , k and c^T b < 0.

If all the a_i and b are rational, then c can be chosen to be a rational vector.

Proof: First, observe that at most one of the two cases can occur, since otherwise

0 > c^T b = ∑_{i=1}^k λ_i c^T a_i ≥ 0   (with all λ_i ≥ 0 and all c^T a_i ≥ 0),

which is a contradiction.
We now prove that at least one of the cases occurs. If b is not in the subspace U spanned by a_1, . . . , a_k, then we can write b = b_1 + b_2, where b_1 ∈ U and b_2 ≠ 0 is in the orthogonal complement of U. The vector −b_2 then satisfies −b_2^T a_i = 0 for i = 1, . . . , k and −b_2^T b = −‖b_2‖₂² < 0. In this case we are done.
Thus for the remainder of the proof we can assume that b is contained in U, which means that t := rank(a_1, . . . , a_k, b) = rank(a_1, . . . , a_k). We can also assume without loss of generality that the vectors a_1, . . . , a_k span the whole of R^n, i.e., that t = n. The argument is as follows: if t < n, we consider an isomorphism ϕ : U → R^t, which is described by a matrix T ∈ R^{t×n}. If all the a_i are rational, then T is also a rational matrix. If we prove the theorem for the vectors ϕ(a_1), . . . , ϕ(a_k) and ϕ(b), we have one of the following situations:

(i) ϕ(b) = λ_{i_1} ϕ(a_{i_1}) + · · · + λ_{i_t} ϕ(a_{i_t}) = ϕ(λ_{i_1} a_{i_1} + · · · + λ_{i_t} a_{i_t}) for linearly independent vectors ϕ(a_{i_j}). Then, since ϕ is an isomorphism, we have b = λ_{i_1} a_{i_1} + · · · + λ_{i_t} a_{i_t} and we are done.

(ii) There is a vector c ≠ 0 such that the hyperplane { x : c^T x = 0 } contains t − 1 linearly independent vectors from ϕ(a_1), . . . , ϕ(a_k), and c^T ϕ(a_i) ≥ 0 for i = 1, . . . , k and c^T ϕ(b) < 0. Since ϕ(x) = Tx for any vector x ∈ U, we have that c^T Tx = (T^T c)^T x, and the vector T^T c is as required.

So, for the rest of the proof we assume that the vectors a_1, . . . , a_k span R^n. We choose linearly independent vectors a_{i_1}, . . . , a_{i_t} and set D = {a_{i_1}, . . . , a_{i_t}}. We then apply the following procedure:

1. Write b = λ_{i_1} a_{i_1} + · · · + λ_{i_t} a_{i_t}.

2. If λ_i ≥ 0 for all i, then we are in case (i).

3. Otherwise, choose the smallest h such that λ_h < 0. We let H be the subspace spanned by D \ {a_h}. Then H is a hyperplane and, thus, there exists a vector c such that H = { x : c^T x = 0 }. We normalize c such that c^T a_h = 1, and hence

λ_h = c^T b < 0.     (3.4)

(Observe that in the case of rational a_i the vector c can be chosen to be rational: use the Gram-Schmidt orthogonalization applied to D and a_h (where a_h is the last vector to be handled). This yields a rational orthogonal basis of R^n, and H is the orthogonal complement of the last vector in this basis.)

4. If c^T a_i ≥ 0 for i = 1, . . . , k, then we are in case (ii).

5. Otherwise, there is an s such that c^T a_s < 0. Choose the smallest such s. We set D := (D \ {a_h}) ∪ {a_s} and continue with the first step. (Observe that the new set D again consists of linearly independent vectors, since a_s cannot be a linear combination of the vectors in D \ {a_h}: we have c^T a_s < 0, whereas c^T x = 0 for every x in the span of D \ {a_h}.)

The proof is complete if we can show that the above process must terminate. Suppose that this is not the case. Then, since there are only finitely many choices for the set D, we must have D_p = D_{p+m} for some p and m, where D_l denotes the set D at the beginning of iteration l.
Let r be the highest index such that the vector a_r has been removed in one of the iterations p, p + 1, . . . , p + m, and suppose that this happens in iteration ℓ. Then a_r must also have been added in another iteration q with ℓ ≤ q < p + m. By construction we have:

D_ℓ ∩ {a_{r+1}, . . . , a_k} = D_q ∩ {a_{r+1}, . . . , a_k}.     (3.5)

Let D_ℓ = {a_{i_1}, . . . , a_{i_t}}, b = λ_{i_1} a_{i_1} + · · · + λ_{i_t} a_{i_t}, and let c̄ be the vector found in iteration q. Then, we have:

• λ_{i_j} ≥ 0 for i_j < r, since r was the smallest index h in iteration ℓ such that λ_h < 0, and c̄^T a_{i_j} ≥ 0 for i_j < r, since r was the smallest index s in iteration q such that c̄^T a_s < 0. This gives us

λ_{i_j} c̄^T a_{i_j} ≥ 0 for i_j < r.     (3.6)

• λ_{i_j} < 0 and c̄^T a_{i_j} < 0 for i_j = r, since a_r was removed in iteration ℓ and added in iteration q:

λ_{i_j} c̄^T a_{i_j} > 0 for i_j = r.     (3.7)

• c̄^T a_{i_j} = 0 for i_j > r, by (3.5) and the way we constructed c̄:

λ_{i_j} c̄^T a_{i_j} = 0 for i_j > r.     (3.8)

Using (3.6), (3.7) and (3.8) yields

0 > c̄^T b = c̄^T (λ_{i_1} a_{i_1} + · · · + λ_{i_t} a_{i_t}) = λ_{i_1} c̄^T a_{i_1} + · · · + λ_{i_t} c̄^T a_{i_t} > 0,

where the first inequality is (3.4). This is a contradiction. ✷

3.4 The Decomposition Theorem


Recall that in Section 2.4 we called

cone(S) := {λ1 x1 + · · · + λk xk : xi ∈ S, λi > 0 for i = 1, . . . , k}

the cone generated by the set S. We call a cone C finitely generated, if there is a finite set S
such that C = cone(S).




Theorem 3.14 (Theorem of Weyl-Minkowski-Farkas) A convex cone C is polyhedral if and only if it is finitely generated.
If C is a rational polyhedral cone, then it is generated by a finite set of integral vectors, and each cone generated by rational vectors is a rational polyhedron.

Proof: Suppose that x_1, . . . , x_k are vectors in R^n. We must prove that cone({x_1, . . . , x_k}) is polyhedral. To this end, assume in the first step that x_1, . . . , x_k span R^n.
By the Fundamental Theorem of Linear Inequalities (Theorem 3.13) we have: y ∈ cone({x_1, . . . , x_k}) if and only if c^T y ≥ 0 for all hyperplanes { x : c^T x = 0 } containing n − 1 linearly independent vectors from x_1, . . . , x_k and such that c^T x_i ≥ 0 for i = 1, . . . , k. In other words, y ∈ cone({x_1, . . . , x_k}) if and only if y is contained in every halfspace { x : c^T x ≥ 0 } such that { x : c^T x = 0 } contains n − 1 linearly independent vectors from x_1, . . . , x_k and such that c^T x_i ≥ 0 for i = 1, . . . , k. There are only finitely many such halfspaces (since there are only finitely many ways to select the n − 1 linearly independent vectors). So cone({x_1, . . . , x_k}) is the solution set of a finite number of inequalities, i.e., a polyhedron. Observe that in the case of rational vectors x_i each c is rational (see the Fundamental Theorem of Linear Inequalities) and, hence, the cone is a rational polyhedron.
If x_1, . . . , x_k do not span R^n, we follow the above construction after extending each halfspace H of the subspace U spanned by x_1, . . . , x_k to a halfspace H' of R^n such that H' ∩ U = H.
We now prove the other direction. To this end, assume that

C = { x : Ax ≤ 0 } = { x : A_{i,·} x ≤ 0, i = 1, . . . , m }.

We have already shown that each finitely generated cone is polyhedral, so

cone({A_{1,·}^T, . . . , A_{m,·}^T}) = { x : Bx ≤ 0 } = { x : B_{j,·} x ≤ 0, j = 1, . . . , p }     (3.9)

(in the case of a rational matrix A, by our proof above all the entries of B can also be chosen to be rational). We are done if we can show that C = cone({B_{1,·}^T, . . . , B_{p,·}^T}).
By (3.9) we have A_{i,·} B_{j,·}^T ≤ 0 for all i and all j. This gives us B_{j,·}^T ∈ C for all j and hence cone({B_{1,·}^T, . . . , B_{p,·}^T}) ⊆ C.
Assume that there exists some y ∈ C \ cone({B_{1,·}^T, . . . , B_{p,·}^T}). Then, by the Fundamental Theorem of Linear Inequalities, there exists some c ≠ 0 such that c^T y < 0 and c^T B_{j,·}^T ≥ 0 for j = 1, . . . , p. By (3.9) we have −c ∈ cone({A_{1,·}^T, . . . , A_{m,·}^T}), say −c^T = ∑_{i=1}^m λ_i A_{i,·} for nonnegative scalars λ_i.
If x ∈ C, then we have A_{i,·} x ≤ 0 for i = 1, . . . , m and hence also

0 ≥ ∑_{i=1}^m λ_i A_{i,·} x = −c^T x.

Thus, it follows that (−c)^T x ≤ 0 for all x ∈ C. But this is a contradiction to the fact that y ∈ C and (−c)^T y > 0.
If C is a rational cone, then we have seen above that C is generated by rational vectors B_{1,·}^T, . . . , B_{p,·}^T. Since we can multiply the entries of each rational vector by any positive number without changing the generated cone, it follows that these vectors can in fact be chosen to be integral. ✷

Theorem 3.15 (Decomposition Theorem for Polyhedra) A set P ⊆ R^n is a polyhedron if and only if there exists a finite set X of points such that

P = conv(X) + C     (3.10)

for some polyhedral cone C. If P is rational, then the points in X can be chosen to be rational and C is a rational cone. Conversely, if X is a finite set of rational vectors and C is a rational cone, then P := conv(X) + C is a rational polyhedron.

Proof: Suppose that P = P(A, b) = { x : Ax ≤ b } is a polyhedron. We consider the polyhedral cone

C' := { (x, λ) ∈ R^{n+1} : Ax − λb ≤ 0, λ ≥ 0 }.     (3.11)

By Theorem 3.14 this cone C' is generated by finitely many vectors (x_1, λ_1), . . . , (x_k, λ_k). Without loss of generality we can assume that each λ_i is either 0 or 1 (since the cone generated by some vectors does not change if we multiply them by positive scalars). Observe that, since λ ≥ 0 for all (x, λ) ∈ C', we have λ_i ≥ 0 for i = 1, . . . , k. We set

Q := conv({x_i : λ_i > 0}) = conv({x_i : λ_i = 1})
C := cone({x_i : λ_i = 0}).

We then have:

x ∈ P ⇔ Ax ≤ b
      ⇔ (x, 1) ∈ C'
      ⇔ (x, 1) ∈ cone({(x_1, λ_1), . . . , (x_k, λ_k)})
      ⇔ x ∈ Q + C.

Assume now conversely that P = conv(X) + C for a finite set X = {x_1, . . . , x_k} and a polyhedral cone C. By Theorem 3.14, the cone C is finitely generated, i.e., C = cone({y_1, . . . , y_p}). We have

x ∈ P ⇔ x = ∑_{i=1}^k λ_i x_i + ∑_{j=1}^p µ_j y_j for some λ_i ≥ 0, µ_j ≥ 0 with ∑_{i=1}^k λ_i = 1
      ⇔ (x, 1) ∈ cone({(x_1, 1), . . . , (x_k, 1), (y_1, 0), . . . , (y_p, 0)}).

By Theorem 3.14 the last cone in the above calculation is polyhedral, i.e., there is a matrix (A b) such that

cone({(x_1, 1), . . . , (x_k, 1), (y_1, 0), . . . , (y_p, 0)}) = { (x, λ) : (A b)(x, λ)^T ≤ 0 } = { (x, λ) : Ax + λb ≤ 0 }.

Thus

x ∈ P ⇔ (x, 1) ∈ cone({(x_1, 1), . . . , (x_k, 1), (y_1, 0), . . . , (y_p, 0)}) ⇔ Ax ≤ −b.

Thus, P is a polyhedron. ✷

Corollary 3.16 (Characterization of polytopes) A set P ⊆ Rn is a polytope if and only


if it is the convex hull of finitely many points.




Proof: Let P be a polytope. By the Decomposition Theorem we have P = conv(X) + C


for a finite set X and a polyhedral cone C. Since P is bounded we must have C = {0} and
P = conv(X).
Conversely, if P = conv(X), then P is a polyhedron (again by the Decomposition Theorem)
and also bounded, thus a polytope. ✷

Example 3.17
A stable set (or independent set) in an undirected graph G = (V, E) is a subset S ⊆ V of
the vertices such that none of the vertices in S are joined by an edge. We can formulate the
problem of finding a stable set of maximum cardinality as an IP:
max Σ_{v∈V} xv (3.12a)
xu + xv 6 1 for all edges (u, v) ∈ E (3.12b)
xv > 0 for all vertices v ∈ V (3.12c)
xv 6 1 for all vertices v ∈ V (3.12d)
xv ∈ Z for all vertices v ∈ V (3.12e)
As an application of Corollary 3.16 we consider the so-called stable-set polytope STAB(G),
which is defined as the convex hull of the incidence vectors of stable sets in an undirected
graph G:

STAB(G) = conv({ x ∈ BV : x is an incidence vector of a stable set in G }). (3.13)
By Corollary 3.16, STAB(G) is a polytope and we can solve (3.12) by Linear Programming
over a polytope {x : Ax 6 b}. Of course, the issue is how to find A and b! ⊳
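For very small graphs the vertices of STAB(G) can at least be listed by brute force. The
following sketch (added for illustration; the function name and the encoding of G as vertex
and edge lists are our own choices, not from the text) enumerates the incidence vectors whose
convex hull is STAB(G):

```python
from itertools import combinations

def stable_set_incidence_vectors(vertices, edges):
    """Enumerate incidence vectors of all stable sets of G = (V, E).

    By Corollary 3.16, STAB(G) is the polytope conv of these 0/1-vectors."""
    edge_set = {frozenset(e) for e in edges}
    vectors = []
    for k in range(len(vertices) + 1):
        for S in combinations(vertices, k):
            # S is stable iff every edge has at least one endpoint outside S
            if all(frozenset(e) - set(S) for e in edge_set):
                vectors.append(tuple(int(v in S) for v in vertices))
    return vectors

# Example: a triangle; the maximum stable set has size 1.
print(stable_set_incidence_vectors([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))
```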

3.5 Dimension
Intuitively the notion of dimension seems clear by considering the degrees of freedom we
have in moving within a given polyhedron (cf. Figure 3.2).

Definition 3.18 (Affine Combination, affine independence, affine hull)
An affine combination of the vectors v1 , . . . , vk ∈ Rn is a linear combination x = Σ_{i=1}^{k} λi vi
such that Σ_{i=1}^{k} λi = 1.
Given a set X ⊆ Rn , the affine hull of X, denoted by aff(X), is defined to be the set of all
affine combinations of vectors from X, that is

aff(X) := { x = Σ_{i=1}^{k} λi vi : Σ_{i=1}^{k} λi = 1 and v1 , . . . , vk ∈ X }.

The vectors v1 , . . . , vk ∈ Rn are called affinely independent, if Σ_{i=1}^{k} λi vi = 0 and
Σ_{i=1}^{k} λi = 0 implies that λ1 = λ2 = · · · = λk = 0.
Lemma 3.19 The following statements are equivalent:

(i) The vectors v1 , . . . , vk ∈ Rn are affinely independent.

(ii) The vectors v2 − v1 , . . . , vk − v1 ∈ Rn are linearly independent.

(iii) The vectors (v1 , 1), . . . , (vk , 1) ∈ Rn+1 are linearly independent.
[Figure 3.2: Examples of polyhedra with various dimensions — (a) polyhedra P0 , P1 , P2 in R2 ; (b) polyhedra Q0 , Q1 , Q2 , Q3 in R3 .]
Proof:
(i)⇔(ii): Suppose Σ_{i=2}^{k} λi (vi − v1 ) = 0. If we set λ1 := − Σ_{i=2}^{k} λi , this gives us
Σ_{i=1}^{k} λi vi = 0 and Σ_{i=1}^{k} λi = 0. Thus, from the affine independence it follows that
λ1 = · · · = λk = 0.
Assume conversely that v2 − v1 , . . . , vk − v1 are linearly independent and Σ_{i=1}^{k} λi vi = 0
with Σ_{i=1}^{k} λi = 0. Then λ1 = − Σ_{i=2}^{k} λi , which gives Σ_{i=2}^{k} λi (vi − v1 ) = 0. The
linear independence of v2 − v1 , . . . , vk − v1 implies λ2 = · · · = λk = 0 which in turn
also gives λ1 = 0.
(i)⇔(iii): This follows immediately from

Σ_{i=1}^{k} λi (vi , 1) = 0 ⇔ ( Σ_{i=1}^{k} λi vi = 0 and Σ_{i=1}^{k} λi = 0 ).

This completes the proof. ✷
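The rank criterion in (ii) is also the easiest way to test affine independence numerically. A
small sketch (added for illustration; it assumes numpy is available):

```python
import numpy as np

def affinely_independent(vectors):
    """Check affine independence via Lemma 3.19(ii):
    v1, ..., vk are affinely independent iff the differences
    v2 - v1, ..., vk - v1 are linearly independent."""
    V = np.asarray(vectors, dtype=float)
    k = len(V)
    if k <= 1:
        return True  # a single point is affinely independent
    diffs = V[1:] - V[0]
    return np.linalg.matrix_rank(diffs) == k - 1

# (2,0), (1,1), (2,2) from Example 3.21 below are affinely independent:
print(affinely_independent([(2, 0), (1, 1), (2, 2)]))  # True
```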

Definition 3.20 (Dimension of a polyhedron, full-dimensional polyhedron)


The dimension dim P of a polyhedron P ⊆ Rn is one less than the maximum number of
affinely independent vectors in P. We set dim ∅ = −1. If dim P = n, then we call P full-
dimensional.
Example 3.21
Consider the polyhedron P ⊆ R2 defined by the following inequalities (see Figure 3.3):

x 6 2 (3.14a)
x + y 6 4 (3.14b)
x + 2y 6 10 (3.14c)
x + 2y 6 6 (3.14d)
x + y > 2 (3.14e)
x, y > 0 (3.14f)
The polyhedron P is full dimensional, since (2, 0), (1, 1) and (2, 2) are three affinely inde-
pendent vectors. ⊳

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


3.5 Dimension 31

[Figure 3.3: A full-dimensional polyhedron in R2 ; the extreme point (2, 2) is marked.]
Example 3.22
Consider the stable set polytope from Example 3.17. Let P be the polytope determined
by the inequalities in (3.12). We claim that P is full dimensional. To see this, con-
sider the n unit vectors ei = (0, . . . , 0, 1, 0, . . . , 0)T , i = 1, . . . , n and e0 := (0, . . . , 0)T . Then
e0 , e1 , . . . , en are affinely independent and, thus, dim P = n. ⊳

Theorem 3.23 (Dimension Theorem) Let F 6= ∅ be a face of the polyhedron P(A, b) ⊆


Rn . Then we have
dim F = n − rank Aeq(F),· .

Proof: By Linear Algebra we know that

dim Rn = n = rank Aeq(F),· + dim kern Aeq(F),· .

Thus, the theorem follows, if we can show that dim kern(Aeq(F),· ) = dim F. We abbreviate
I := eq(F) and set r := dim kern AI,· , s := dim F.
“r > s”: Select s + 1 affinely independent vectors x0 , x1 , . . . , xs ∈ F. Then, by Lemma 3.19,
x1 − x0 , . . . , xs − x0 are linearly independent vectors and AI,· (xj − x0 ) = bI − bI = 0
for j = 1, . . . , s. Thus, the dimension of kern AI,· is at least s.
“s > r”: Since we have assumed that F 6= ∅, we have s = dim F > 0. Thus, in the sequel
we can assume that r > 0 since otherwise there is nothing left to prove.
By Lemma 3.12 there exists an inner point x̄ of F which satisfies eq({x̄}) = eq(F) = I.
Thus, for J := M \ I we have

AI,· x̄ = bI and AJ,· x̄ < bJ .

Let {x1 , . . . , xr } be a basis of kern AI,· . Then, since AJ,· x̄ < bJ we can find ε > 0 such
that AJ,· (x̄ + εxk ) < bJ and AI,· (x̄ + εxk ) = bI for k = 1, . . . , r. Thus, x̄ + εxk ∈ F
for k = 1, . . . , r.
The vectors εx1 , . . . , εxr are linearly independent and, by Lemma 3.19, x̄, εx1 + x̄, . . . , εxr + x̄
form a set of r + 1 affinely independent vectors in F which implies dim F > r. ✷
Corollary 3.24 Let P = P(A, b) ⊆ Rn be a nonempty polyhedron. Then:

(i) dim P = n − rank Aeq(P),· .

(ii) P is full dimensional if eq(P) = ∅.

(iii) P is full dimensional if P contains an interior point.

(iv) If F is a proper face of P, then 0 6 dim F 6 dim P − 1.
Proof:

(i) Use Theorem 3.23 with F = P.

(ii) Immediate from (i).

(iii) If P has an interior point, then eq(P) = ∅.

(iv) Let I := eq(P), j ∈ eq(F) \ I and J := eq(P) ∪ {j}. We show that Aj,· is linearly
independent of the rows in AI,· . This shows that rank Aeq(F),· > rank AJ,· = rank AI,· + 1
and by the Dimension Theorem we have dim F 6 dim P − 1.
Assume that Aj,· = Σ_{i∈I} λi Ai,· . Take x̄ ∈ F arbitrary, then

bj = Aj,· x̄ = Σ_{i∈I} λi Ai,· x̄ = Σ_{i∈I} λi bi .

Since j ∉ eq(P), there is x ∈ P such that Aj,· x < bj . But since I = eq(P), every x ∈ P
satisfies AI,· x = bI , so by the above we have

bj > Aj,· x = Σ_{i∈I} λi Ai,· x = Σ_{i∈I} λi bi = bj ,

which is a contradiction. ✷
We derive another important consequence of the Dimension Theorem about the facial
structure of polyhedra:
Theorem 3.25 (Hoffman and Kruskal) Let P = P(A, b) ⊆ Rn be a polyhedron. Then a
nonempty set F ⊆ P is an inclusionwise minimal face of P if and only if F = {x : AI,· x = bI }
for some index set I ⊆ M with rank AI,· = rank A.

Proof: “⇒”: Let F be a minimal nonempty face of P. Then, by Theorem 3.6 and Observa-
tion 3.9 we have F = fa(I), where I = eq(F). Thus, for J := M \ I we have

F = {x : AI,· x = bI , AJ,· x 6 bJ } . (3.15)

We claim that F = G, where


G = {x : AI,· x = bI } . (3.16)
By (3.15) we have F ⊆ G. Suppose that there exists y ∈ G \ F. Then, there exists j ∈ J such
that
AI,· y = bI , Aj,· y > bj . (3.17)

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


3.6 Extreme Points 33

Let x̄ be any inner point of F, which exists by Lemma 3.12. We consider for τ ∈ R the point

z(τ) = x̄ + τ(y − x̄) = (1 − τ)x̄ + τy.

Observe that AI,· z(τ) = (1 − τ)AI,· x̄ + τAI,· y = (1 − τ)bI + τbI = bI , since x̄ ∈ F and y
satisfies (3.17). Moreover, AJ,· z(0) = AJ,· x̄ < bJ , since J ⊆ M \ I.
Since Aj,· y > bj we can find τ ∈ R and j0 ∈ J such that Aj0 ,· z(τ) = bj0 and AJ,· z(τ) 6 bJ .
Then, τ 6= 0 and

F ′ := { x ∈ P : AI,· x = bI , Aj0 ,· x = bj0 }

is a face which is properly contained in F (note that x̄ ∈ F \ F ′ ). This contradicts the choice
of F as inclusionwise minimal. Hence, F can be represented as in (3.16).
It remains to prove that rank AI,· = rank A. If rank AI,· < rank A, then there exists an index
j ∈ J = M \ I, such that Aj,· is not a linear combination of the rows in AI,· . Then, we can
find a vector w 6= 0 such that AI,· w = 0 and Aj,· w > 0. For θ > 0 appropriately chosen
the vector y := x̄ + θw satisfies (3.17) and as above we can construct a proper face F ′ of F
contradicting the minimality of F.
“⇐”: If F = {x : AI,· x = bI }, then F is an affine subspace and Corollary 3.10(i) shows that F
does not have any proper face. By assumption F ⊆ P and thus F = {x : AI,· x = bI , AJ,· x 6 bJ }
is a minimal face of P. ✷

Corollary 3.26 All minimal nonempty faces of a polyhedron P = P(A, b) have the same
dimension, namely n − rank A. ✷

Corollary 3.27 Let P = P(A, b) ⊆ Rn be a nonempty polyhedron and rank(A) = n − k.


Then P has a face of dimension k and does not have a proper face of lower dimension.

Proof: Let F be any nonempty face of P. Then, rank Aeq(F),· 6 rank A = n − k and thus by
the Dimension Theorem (Theorem 3.23) it follows that dim(F) > n − (n − k) = k. Thus,
any nonempty face of P has dimension at least k.
On the other hand, by Corollary 3.26 any inclusionwise minimal nonempty face of P has
dimension n − rank A = n − (n − k) = k. Thus, P has in fact faces of dimension k. ✷

There will be certain types of faces which are of particular interest:

• extreme points (vertices),

• facets.

We will also be interested in extreme rays, which are directions in which we can “move
towards infinity” inside a polyhedron.
In the next section we discuss extreme points and their meaning for optimization. Sec-
tion 3.7 deals with facets and their importance in describing polyhedra by means of in-
equalities.

3.6 Extreme Points


Definition 3.28 (Extreme point, pointed polyhedron)
The point x̄ ∈ P = P(A, b) is called an extreme point of P, if x̄ = λx + (1 − λ)y for some
x, y ∈ P and 0 < λ < 1 implies that x = y = x̄.
A polyhedron P = P(A, b) is pointed, if it has at least one extreme point.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


34 Polyhedra and Integer Programs

Example 3.29
Consider the polyhedron from Example 3.21. The point (2, 2) is an extreme point of the
polyhedron. ⊳

Theorem 3.30 (Characterization of extreme points) Let P = P(A, b) ⊆ Rn be a poly-


hedron and x̄ ∈ P. Then, the following statements are equivalent:

(i) {x̄} is a zero-dimensional face of P.

(ii) There exists a vector c ∈ R n such that x̄ is the unique optimal solution of the Linear
Program max cT x : x ∈ P .

(iii) x̄ is an extreme point of P.

(iv) rank Aeq({x̄}),· = n.

Proof: “(i)⇒(ii)”:
 Since {x̄} is a face of P, there exists a valid inequality wT x 6 t such
that {x̄} = x ∈ P : wT x = t . Thus, x̄ is the unique optimum of the Linear Program with
objective c := w.

“(ii)⇒(iii)”: Let x̄ be the unique optimum solution of max cT x : x ∈ P . If x̄ = λx + (1 −
λ)y for some x, y ∈ P, (x, y) 6= (x̄, x̄) and 0 < λ < 1, then we have

cT x̄ = λcT x + (1 − λ)cT y 6 λcT x̄ + (1 − λ)cT x̄ = cT x̄.

Thus, we can conclude that cT x̄ = cT x = cT y which contradicts the uniqueness of x̄ as


optimal solution.
“(iii)⇒(iv)”: Let I := eq({x̄}). If rank AI,· < n, there exists y ∈ Rn \ {0} such that AI,· y = 0.
Then, for sufficiently small ε > 0 we have x ′ := x̄ + εy ∈ P and y ′ := x̄ − εy ∈ P (since
Aj,· x̄ < bj for all j ∉ I). But then, x̄ = (1/2)x ′ + (1/2)y ′ which contradicts the assumption
that x̄ is an extreme point.
“(iv)⇒(i)”: Let I := eq({x̄}). By (iv), the system AI,· x = bI has a unique solution which
must be x̄ (since AI,· x̄ = bI by construction of I). Hence

{x̄} = {x : AI,· x = bI } = {x ∈ P : AI,· x = bI }

and by Theorem 3.6 {x̄} is a zero-dimensional face of P. ✷
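Criterion (iv) is easy to check computationally: collect the constraints tight at x̄ and test
whether their coefficient rows have full column rank. A small sketch (added for illustration;
function name, tolerance handling and the use of numpy are our own choices):

```python
import numpy as np

def is_extreme_point(A, b, x, tol=1e-9):
    """Test Theorem 3.30(iv): x in P(A, b) is an extreme point
    iff the rows of A that are tight at x have rank n."""
    A, b, x = np.asarray(A, float), np.asarray(b, float), np.asarray(x, float)
    if np.any(A @ x > b + tol):
        return False  # x is not even feasible
    tight = np.isclose(A @ x, b, atol=tol)
    return np.linalg.matrix_rank(A[tight]) == len(x)

# (2, 2) from Example 3.21: tight rows x + y <= 4 and x + 2y <= 6 have rank 2.
A = [[1, 0], [1, 1], [1, 2], [1, 2], [-1, -1], [-1, 0], [0, -1]]
b = [2, 4, 10, 6, -2, 0, 0]
print(is_extreme_point(A, b, [2, 2]))  # True
```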

The result of the previous theorem has interesting consequences for optimization. Consider
the Linear Program


max cT x : x ∈ P , (3.18)

where P is a pointed polyhedron (that is, it has extreme points). Since by Corollary 3.26
all minimal proper faces of P have the same dimension, it follows that the minimal proper
faces of P are of the form {x̄}, where x̄ is an extreme point of P. Suppose that P 6= ∅ and
cT x is bounded on P. We know that there exists an optimal solution x∗ ∈ P. The set of
optimal solutions of (3.18) is a face


F = x ∈ P : cT x = cT x∗

which contains a minimal nonempty face F ′ ⊆ F. Thus, we have the following corollary:

Corollary 3.31 If the polyhedron P is pointed and the Linear Program (3.18) has optimal
solutions, it has an optimal solution which is also an extreme point of P.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


3.6 Extreme Points 35

Another important consequence of the characterization of extreme points in Theorem 3.30
is the following:

Corollary 3.32 Every polyhedron has only a finite number of extreme points.

Proof: By the preceding theorem, every extreme point is a face. By Corollary 3.7, there is
only a finite number of faces. ✷

Let us now return to the Linear Program (3.18) which we assume to have an optimal so-
lution. We also assume that P is pointed, so that the assumptions of Corollary 3.31 are
satisfied. By the Theorem of Hoffman and Kruskal (Theorem 3.25) every extreme point x̄
of P is the solution of a subsystem

AI,· x = bI , where rank AI,· = n.

Thus, we could obtain an optimal solution of (3.18) by “brute force”, if we simply consider
all subsets I ⊆ M with |I| = n, test if rank AI,· = n (this can be done by Gaussian elimina-
tion) and solve AI,· x = bI . We then choose the best of the feasible solutions obtained this
way. This gives us a finite algorithm for (3.18). Of course, the Simplex Method provides a
more sophisticated way of solving (3.18).
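The following sketch (added for illustration; it assumes numpy is available, and it is of
course exponential in n) implements this brute-force scheme for small instances:

```python
import numpy as np
from itertools import combinations

def brute_force_lp(A, b, c, tol=1e-9):
    """Solve max{c^T x : Ax <= b} over a pointed polyhedron by
    enumerating all n x n subsystems A_I x = b_I of full rank
    and keeping the best feasible solution obtained."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    m, n = A.shape
    best, best_val = None, -np.inf
    for I in combinations(range(m), n):
        AI = A[list(I)]
        if np.linalg.matrix_rank(AI) < n:
            continue
        x = np.linalg.solve(AI, b[list(I)])
        if np.all(A @ x <= b + tol) and c @ x > best_val:
            best, best_val = x, c @ x
    return best, best_val

# Example 3.21: maximize x + y; the optimum (2, 2) has value 4.
A = [[1, 0], [1, 1], [1, 2], [1, 2], [-1, -1], [-1, 0], [0, -1]]
b = [2, 4, 10, 6, -2, 0, 0]
print(brute_force_lp(A, b, [1, 1]))
```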
Let us now derive conditions which ensure that a given polyhedron is pointed.

Corollary 3.33 A nonempty polyhedron P = P(A, b) ⊆ Rn is pointed if and only if


rank A = n.

Proof: By Corollary 3.26 the minimal nonempty faces of P are of dimension 0 if and only
if rank A = n. ✷

Corollary 3.34 Any nonempty polytope is pointed.

Proof: Let P = P(A, b) be a polytope and x ∈ P be arbitrary. By Corollary 3.33 it suffices


to show that rank A = n. If rank A < n, then we can find y ∈ Rn with y 6= 0 such that
Ay = 0. But then x + θy ∈ P for all θ ∈ R which contradicts the assumption that P is
bounded. ✷

Corollary 3.35 Any nonempty polyhedron P ⊆ Rn


+ is pointed.

Proof: If P = P(A, b) ⊆ Rn
+ , we can write P alternatively as
   
A b
P= x: x6 = P(Ā, b̄).
−I 0
 
A
Since rank Ā = rank = n, we see again that the minimal faces of P are extreme
−I
points. ✷

On the other hand, Theorem 3.30(ii) is a formal statement of the intuition that by optimizing
with the help of a suitable vector over a polyhedron we can “single out” every extreme
point. We now derive a stronger result for rational polyhedra:

Theorem 3.36 Let P = P(A, b) be a rational polyhedron and let x̄ ∈ P be an extreme point
of P.There exists an integral vector c ∈ Zn such that x̄ is the unique optimal solution of
max cT x : x ∈ P .

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


36 Polyhedra and Integer Programs

Proof: Let I := eq({x̄}) and M := {1, . . . , m} be the index set of the rows of A. Consider
the vector c̄ := Σ_{i∈I} ATi,· . Since all the Ai,· are rational, we can find a θ > 0 such that
c := θc̄ ∈ Zn is integral. Since fa(I) = {x̄} (cf. Observation 3.9), for every x ∈ P with x 6= x̄
there is at least one i ∈ I such that Ai,· x < bi . Thus, for all x ∈ P \ {x̄} we have

cT x = θ Σ_{i∈I} Ai,· x < θ Σ_{i∈I} bi = θ c̄T x̄ = cT x̄.

This proves the claim. ✷
Consider the polyhedron

P= (A, b) := {x : Ax = b, x > 0} ,

where A is an m × n matrix. A basis of A is an index set B ⊆ {1, . . . , n} with |B| = m
such that the square matrix A·,B formed by the columns from B is nonsingular. The basic
solution corresponding to B is the vector (xB , xN ) with xB = A−1·,B b and xN = 0, where
N = {1, . . . , n} \ B. The basic solution is called feasible, if it is contained in P= (A, b).
The following theorem is a well-known result from Linear Programming:

Theorem 3.37 Let P = P= (A, b) = {x : Ax = b, x > 0} and x̄ ∈ P, where A is an m × n


matrix of rank m. Then, x̄ is an extreme point of P if and only if x̄ is a basic feasible
solution for some basis B.

Proof: Suppose that x̄ is a basic solution for B and x̄ = λx + (1 − λ)y for some x, y ∈ P.
It follows that xN = yN = 0. Thus xB = yB = A−1 ·,B b = x̄B . Thus, x̄ is an extreme point
of P.
Assume now conversely that x̄ is an extreme point of P. Let B := {i : x̄i > 0}. We claim that
the matrix A·,B consists of linearly independent colums. Indeed, if A·,B yB = 0 for some
yB 6= 0, then for small ε > 0 we have xB ± εyB > 0. Hence, if we set N := {1, . . . , m} \ B
and y = (yB , yB ) we have x̄ ± εy ∈ P and hence we can write x as a convex combination
x = 12 (x̄ + εy) + 12 (x̄ − εy) contradicting the fact that x̄ is an extreme point.
Since AB has linearly independent columns, it follows that |B| 6 m. Since rank A = m we
can augment B to a basis B ′ . Then, x̄ is the basic solution for B ′ . ✷
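For a concrete view of basic solutions, the following sketch (added for illustration; names
and the tiny instance are our own choices) computes the basic solution for a given basis B
and checks feasibility:

```python
import numpy as np

def basic_solution(A, b, B):
    """Compute the basic solution for basis B (Theorem 3.37):
    x_B = A_{.,B}^{-1} b and x_N = 0. Returns (x, feasible)."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    m, n = A.shape
    x = np.zeros(n)
    x[list(B)] = np.linalg.solve(A[:, list(B)], b)
    return x, bool(np.all(x >= -1e-9))

# x1 + x2 + s = 2 (a single equality constraint, m = 1):
A, b = [[1.0, 1.0, 1.0]], [2.0]
print(basic_solution(A, b, B=[0]))  # (array([2., 0., 0.]), True)
```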

We close this section by deriving structural results for polytopes. We need one auxiliary
result:

Lemma 3.38 Let X ⊂ Rn be a finite set and v ∈ Rn \ conv(X). There exists an inequality
that separates v from conv(X), that is, there exist w ∈ Rn and t ∈ R such that wT x 6 t
for all x ∈ conv(X) and wT v > t.

Proof: Let X = {x1 , . . . , xk }. Since v ∈


/ conv(X), the system

k
X
λk xk = v
i=1
k
X
λk = 1
i=1
λi > 0 for i = 1, . . . , k

does not have a solution. By Farkas’ Lemma (Theorem 2.11), there exists a vector (y, z) ∈
Rn+1 such that

yT xi + z > 0 for i = 1, . . . , k,
yT v + z < 0.
If we choose w := −y and t := z we have wT xi 6 t for i = 1, . . . , k and wT v > t.
If x = Σ_{i=1}^{k} λi xi is a convex combination of the xi , then as in Section 2.2 we have:

wT x = Σ_{i=1}^{k} λi wT xi 6 max{wT xi : i = 1, . . . , k} 6 t.

Thus, wT x 6 t for all x ∈ conv(X). ✷

Theorem 3.39 A polytope is equal to the convex hull of its extreme points.

Proof: The claim is trivial, if the polytope is empty. Thus, let P = P(A, b) be a nonempty
polytope. Let X = {x1 , . . . , xk } be the extreme points of P (which exist by Corollary 3.34
and whose number is finite by Corollary 3.32). Since P is convex and x1 , . . . , xk ∈ P,
we have conv(X) ⊆ P. We must show that conv(X) ⊇ P. Assume that there exists v ∈
P \ conv(X). Then, by Lemma (3.38) we can find an inequality wT x 6 t such that wT x 6 t
T
 xT∈ conv(X)
for all but w v > t. Since P is bounded and nonempty, the Linear Program
max w x : x ∈ P has a finite solution value t∗ ∈ R. Since v ∈ P we have t∗ > t. Thus,
none of the extreme points of P is an optimal solution, which is impossible by Corol-
lary 3.31. ✷

Corollary 3.40 If X ⊆ Rn is a finite nonempty set of rational vectors, then conv(X) is a


rational polytope.

Proof: Immediately from Theorem 3.39 and Corollary 3.16 ✷

The results of Corollary 3.16 and Corollary 3.40 are the two major driving forces behind
polyhedral combinatorics. Let X ⊆ Rn be a nonempty finite set, for instance, let X be the
set of incidence vectors of stable sets of a given graph G as in the above example. Then,
by the preceding theorem we can represent conv(X) as a pointed polytope:
conv(X) = P = P(A, b) = {x : Ax 6 b} .
Since P is bounded and nonempty, for any given c ∈ Rn the Linear Program



max cT x : x ∈ P = max cT x : x ∈ conv(X) (3.19)
 T
has a finite value which by Observation 2.2 coincides with max c x : x ∈ X . By Corol-
lary 3.31 an optimal solution of (3.19) will always be obtained at an extreme point of P,
which must be a point in X itself. So, if we solve the Linear Program (3.19) we can also
solve the problem of maximizing cT x over the discrete set X.

3.7 Facets
In the preceding section we proved that for a finite set X ⊆ Rn its convex hull conv(X) is
always a polytope and thus has a representation
conv(X) = P(A, b) = {x : Ax 6 b} .
This motivates the questions which inequalities are actually needed in order to describe a
polytope, or more general, to describe a polyhedron.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


38 Polyhedra and Integer Programs

Definition 3.41 (Facet)
A nontrivial face F of the polyhedron P = P(A, b) is called a facet of P, if F is not strictly
contained in any proper face of P.
Example 3.42
Consider again the polyhedron from Example 3.21. The inequality x 6 3 is valid for P, and
of course all inequalities from (3.14) are valid as well. Moreover, the inequality x + 2y 6 6
defines a facet, since (0, 3) and (2, 2) are two affinely independent points of P satisfying it
with equality. On the other hand, the inequality x + y 6 4 defines a face that consists only of
the point (2, 2). ⊳

Theorem 3.43 (Characterization of facets) Let P = P(A, b) ⊆ Rn be a polyhedron and


F be a face of P. Then, the following statements are equivalent:

(i) F is a facet of P.
(ii) rank Aeq(F),· = rank Aeq(P),· + 1
(iii) dim F = dim P − 1.

Proof: The equivalence of (ii) and (iii) is an immediate consequence of the Dimension
Theorem (Theorem 3.23).
“(i)⇒(iii)”: Suppose that F is a facet but k = dim F < dim P − 1. By the equivalence of (ii)
and (iii) we have rank AI,· > rank Aeq(P),· + 1, where I = eq(F). Chose U ⊆ I such that
for J := I \ U we have rank AJ,· = rank AI,· − 1. Then fa(J) is a face which contains F and
which has dimension k + 1 6 dim P − 1. So fa(J) is a proper face of P containing F which
contradicts the maximality of F.
“(iii)⇒(i)”: Suppose that G is any proper face of P which strictly contains F. Then F is
a proper face of G and by Corollary 3.24(iv) applied to F and P ′ = G we get dim F 6
dim G − 1 which together with dim F = dim P − 1 gives dim G = dim P. But then, again by
Corollary 3.24(iv), G can not be a proper face of P. ✷

Example 3.44
As an application of Theorem 3.43 we consider again the stable-set polytope, which we
have seen to be full-dimensional in Example 3.17.
For any v ∈ V, the inequality xv > 0 defines a facet of STAB(G), since the n − 1 unit
vectors with ones at places other than position v and the zero vector form a set of n affinely
independent vectors from STAB(G) which all satisfy the inequality as equality. ⊳

As a consequence of the previous theorem we show that for any facet of a polyhedron
P = P(A, b) there is at least one inequality in Ax 6 b inducing the facet:

Corollary 3.45 Let P = P(A, b) ⊆ Rn be a polyhedron and F be a facet of P. Then, there
exists an index j ∈ M \ eq(P) such that

F = { x ∈ P : Aj,· x = bj }. (3.20)

Proof: Let I = eq(P). Choose any j ∈ eq(F) \ I and set J := I ∪ {j}. Still J ⊆ eq(F), since
I ⊆ eq(F) and j ∈ eq(F). Thus, F ⊆ fa(J) ⊂ P (we have fa(J) 6= P since any inner point x̄
of P has Aj,· x̄ < bj since j ∈ eq(F) \ I) and by the maximality of F we have F = fa(J). So,

F = fa(J) = {x ∈ P : AJ,· x = bJ }

= x ∈ P : AI,· x = bI , Aj,· x = bj

= x ∈ P : Aj,· x = bj ,

where the last equality follows from I = eq(P). ✷

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


3.7 Facets 39

The above corollary shows that, if for a polyhedron P we know A and b such that P =
P(A, b), then all facets of P are of the form (3.20).

Definition 3.46 (Redundant constraint, irredundant system)


Let P = P(A, b) be a polyhedron and I = eq(P). The constraint Ai,· x 6 bi is called redun-
dant with respect to Ax 6 b, if P(A, b) = P(AM\{i},· , bM\{i} ), that is, if we can remove
the inequality without changing the solution set.
We call Ax 6 b irredundant or minimal, if it does not contain a redundant constraint.

Observe that removing a redundant constraint may make other redundant constraints irre-
dundant.
The following theorem shows that in order to describe a polyhedron we need an inequality
for each of its facets and that, conversely, a list of all facet defining inequalities suffices.

Theorem 3.47 (Facets are necessary and sufficient to describe a polyhedron) Let P =
P(A, b) be a polyhedron with equality set I = eq(P) and J := M \ I. Suppose that no in-
equality in AJ,· x 6 bJ is redundant. Then, there is a one-to-one correspondence between
the facets of P and the inequalities in AJ,· x 6 bJ :
For each row Aj,· of AJ,· the inequality Aj,· x 6 bj defines a distinct facet of P. Conversely,
for each facet F of P there exists exactly one inequality in AJ,· x 6 bJ which induces F.

Proof: Let F be a facet of P. Then, by Corollary 3.45 there exists j ∈ J such that

F = x ∈ P : Aj,· x = bj . (3.21)

Thus, each facet is represented by an inequality in AJ,· x 6 bJ .


Moreover, if F1 and F2 are facets induced by rows j1 ∈ J and j2 ∈ J with j1 6= j2 , then we
must have F1 6= F2 , since eq(Fi ) = eq(P) ∪ {ji } for i = 1, 2 as we will show now. To do
this, it suffices to construct a point z such that AI,· z = bI , Aji ,· z = bji and AJ ′ ,· z < bJ ′ for
J ′ = M \ (I ∪ {ji }).
We proceed similar as in Theorem 3.25: Let x̄ be any inner point of P which exists by
Lemma 3.12. Then AI,· x̄ = bI , Aji ,· x̄ < bji and AJ ′ ,· x̄ < bJ ′ . Since inequality ji is not
redundant, there exists some y such that AI,· y = bI , Aji ,· y > bji and AJ ′ ,· ȳ < bJ ′ . Then,
as in the proof of Theorem 3.25, a suitable convex combination z of x̄ and y is as required.
Conversely, consider any inequality Aj,· x 6 bj where j ∈ J. We must show that the face F
given in (3.21) is a facet. Clearly, F 6= P, since j ∈ eq(F) \ eq(P). So dim F 6 dim P − 1. We
are done, if we can show that eq(F) = eq(P) ∪ {j}, since then rank Aeq(F),· 6 rank Aeq(P),· +
1 which gives dim F > dim P − 1 and Theorem 3.43 proves that F is a facet.
Take any inner point x̄ of P. This point satisfies

AI,· x̄ = bI and AJ,· x̄ < bJ .

Let J ′ := J \ {j} and M ′ := M \ {j}. Since Aj,· x 6 bj is not redundant in Ax 6 b, we have


P(A, b) ⊂ P(AM ′ ,· , bM ′ ). Thus, there exists y such that

AI,· y = bI , AJ ′ ,· y 6 bJ ′ and Aj,· y > bj .

Consider z = λy + (1 − λ)x̄. Then for an appropriate choice of λ ∈ (0, 1) we have

AI,· z = bI , AJ ′ ,· z < bJ ′ , and Aj,· z = bj .

Thus, z ∈ F and eq(F) = eq(P) ∪ {j} as required. ✷
Corollary 3.48 Each proper face of a polyhedron P is the intersection of facets of P.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


40 Polyhedra and Integer Programs

Proof: We may assume that P = P(A, b) is an irredundant representation of P. Let
K := eq(P). By Theorem 3.6, for each proper face F there is a nonempty set I ⊆ M \ K such
that

F = {x ∈ P : AI,· x = bI } = ∩_{j∈I} { x ∈ P : Aj,· x = bj },

where by Theorem 3.47 each of the sets in the intersection above defines a facet. ✷

Corollary 3.49 Any defining system for a polyhedron must contain a distinct facet-inducing
inequality for each of its facets. ✷

Lemma 3.50 Let P = P(A, b) ⊆ Rn with n > 1 with I = eq(P) and let F = x ∈ P : wT x = t
be a proper face of P. Then, the following statements are equivalent:

(i) F is a facet of P.
(ii) If cT x = γ for all x ∈ F, then cT is a linear combination of wT and the rows in AI,· .

Proof: “(i)⇒(ii)”: We can write F = x ∈ P : wT x = t, cT x = γ , so we have for J := eq(F)
by Theorem 3.43 and the Dimension Theorem
 
AI,·
dim P = n − rank AI,· = 1 + dim F 6 n − rank  wT  + 1.
cT
Thus  
AI,·
rank  wT  6 rank AI,· + 1.
cT
 
  AI,·
AI,·
Since F is a proper face, we have rank = rank AI,· +1 which means that rank  wT  =
wT
cT
 
AI,·
rank . So, cT is a linear combination of wT and the rows in AI,· .
wT
 
AI,·
“(ii)⇒(i)”: Let J = eq(F). By assumption, rank AJ,· = rank = rank AI,· + 1. So
wT
dim F = dim P − 1 by the Dimension Theorem and by Theorem 3.43 we get that F is a facet.

Suppose that the polyhedron P = P(A, b) is of full dimension. Then, for I := eq(P) we
have rank AI,· = 0 and we obtain the following corollary:

Corollary 3.51 Let P = P(A, b) ⊆ Rn with n > 1 be full-dimensional let F = x ∈ P : wT x = t
be a proper face of P. Then, the following statements are equivalent:

(i) F is a facet of P.
c w
(ii) If cT x = γ for all x ∈ F, then
 
γ is a scalar multiple of t .

Proof: The fact that (ii) implies (i) is trivial by the previous Lemma 3.50. Conversely, if F is
a facet, then by Lemma 3.50 above, cT is a “linear combination” of wT , that is, c = λw is a
scalar multiple of w. Now, since for all x ∈ F we have wT x = t and γ = cT x = λwT x = λt,
the claim follows. ✷

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


3.7 Facets 41

Example 3.52
We consider the 0/1-Knapsack polytope

PKNAPSACK := PKNAPSACK (N, a, b) := conv({ x ∈ BN : Σ_{j∈N} aj xj 6 b }),
which we have seen a couple of times in these lecture notes (for instance in Example 1.3).
Here, the aj are nonnegative coefficients and b > 0. We use the general index set N instead
of N = {1, . . . , n} to emphasize that a knapsack constraint Σ_{j∈N} aj xj 6 b might occur as
a constraint in a larger integer program and might not involve all variables.
In all that follows, we assume that aj 6 b for j ∈ N, since aj > b implies that xj = 0 for
all x ∈ PKNAPSACK (N, a, b). Under this assumption PKNAPSACK (N, a, b) is full-dimensional,
since χ∅ and χ{j} (j ∈ N) form a set of n + 1 affinely independent vectors in PKNAPSACK (N, a, b).
Each inequality xj > 0 for j ∈ N is valid for PKNAPSACK (N, a, b). Moreover, each of these
nonnegativity constraints defines a facet of PKNAPSACK (N, a, b), since χ∅ and χ{i} (i ∈
N \ {j}) form a set of n affinely independent vectors that satisfy the inequality as equality.
In the sequel we will search for more facets of PKNAPSACK (N, a, b) and, less ambitiously, for
more valid inequalities.
A set C ⊆ N is called a cover, if Σ_{j∈C} aj > b. The cover is called a minimal cover, if
C \ {j} is not a cover for all j ∈ C.
Each cover C gives us a valid inequality Σ_{j∈C} xj 6 |C| − 1 for PKNAPSACK (if you do not
see this immediately, the proof will be given below). It turns out that this inequality is quite
strong, provided that the cover is minimal.
By Observation 2.2 it suffices to show that the inequality is valid for the knapsack set

X := { x ∈ BN : Σ_{j∈N} aj xj 6 b }.
Suppose that x ∈ X does not satisfy the cover inequality. We have that x = χS is the
incidence vector of a set S ⊆ N. By assumption we have

|C| − 1 < Σ_{j∈C} xj = Σ_{j∈C∩S} 1 = |C ∩ S|.

So |C ∩ S| = |C| and consequently C ⊆ S. Thus,

Σ_{j∈N} aj xj > Σ_{j∈C} aj xj = Σ_{j∈C∩S} aj = Σ_{j∈C} aj > b,

which contradicts the fact that x ∈ X.


We claim that the cover inequality defines a facet of PK NAPSACK (C, a, b), if the cover is
minimal. Suppose that there is a facet-defining inequality cT x 6 δ such that
 
 X 
F := x ∈ PK NAPSACK (C, a, b) : xj = |C| − 1
 
j∈C


⊆ Fc := x ∈ PK NAPSACK (C, a, b) : cT x = δ .

We will show that cT x 6 δ is a nonnegative scalar multiple of the cover inequality.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


42 Polyhedra and Integer Programs

For i ∈ C consider the set Ci := C \ {i}. Since C is minimal, Ci is not a cover. Consequently,
each of the |C| incidence vectors χCi ∈ BC is contained in F, so χCi ∈ Fc for
i ∈ C. Thus, for i 6= j we have

0 = cT χCi − cT χCj = cT (χCi − χCj ) = cj − ci .

Hence we have ci = γ for all i ∈ C and cT x 6 δ is of the form

γ Σ_{j∈C} xj 6 δ.

Fix i ∈ C. Then, by F ⊆ Fc we have


X
δ = cT χCi = γ xj
j∈Ci

and we get X
xj = |C| − 1,
j∈Ci

so δ = γ(|C|−1) and cT x 6 δ must be a nonnegative scalar multiple of the cover inequality.


Corollary 3.53 A full-dimensional polyhedron has a unique (up to positive scalar multi-
ples) irredundant defining system.

Proof: Let Ax 6 b be an irredundant defining system. Since P is full-dimensional, we


have Aeq(P),· = ∅. By Theorem 3.47 there is a one-to-one correspondence between the
inequalities in Ax 6 b and the facets of P. By Corollary 3.51 two valid inequalities for P
which induce the same facet are scalar multiples of each other. ✷

3.8 The Characteristic Cone and Extreme Rays


In the previous section we have learned that each polyhedron can be represented by its
facets. In this section we learn another representation of a polyhedron which is via its
extreme points and extreme rays.

Definition 3.54 (Characteristic cone, (extreme) ray)


Let P be a polyhedron. Then, its characteristic cone or recession cone char. cone(P) is
defined to be:
char. cone(P) := {r : x + r ∈ P for all x ∈ P} .
We call any r ∈ char. cone(P) with r 6= 0 a ray of P. A ray r of P is called an extreme ray
if there do not exist rays r1 , r2 of P, r1 6= θr2 for any θ ∈ R+ such that r = λr1 + (1 − λ)r2
for some λ ∈ (0, 1).

In other words, char. cone(P) is the set of all directions y in which we can go from all
x ∈ P without leaving P. This justifies the name “ray” for all vectors in char. cone(P).
Since P 6= 0 implies that 0 ∈ char. cone(P) and P = ∅ implies char. cone(P) = Rn , we
have that char. cone(P) 6= ∅.

Lemma 3.55 Let P = P(A, b) be a nonempty polyhedron. Then

char. cone(P) = {x : Ax 6 0} .

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


3.8 The Characteristic Cone and Extreme Rays 43

Proof: If Ay 6 0, then for all x ∈ P we have A(x + y) = Ax + Ay 6 Ax 6 b, so x + y ∈ P.
Thus y ∈ char. cone(P).
Conversely, if y ∈ char. cone(P), then Ai,· y 6 0 whenever there exists an x ∈ P such that
Ai,· x = bi (since Ai,· x + Ai,· y = Ai,· (x + y) 6 bi = Ai,· x). Let I := { i : Ai,· x = bi for
some x ∈ P } and let J := { j : Aj,· x < bj for all x ∈ P } be the set of all other indices. We are
done, if we can show that AJ,· y 6 0J .
If Aj,· y > 0 for some j ∈ J, take an inner point x̄ ∈ P and consider z = x̄ + λy for λ > 0.
Then, by choosing λ > 0 appropriately, we can find j ′ ∈ J such that Aj ′ ,· z = bj ′ , AJ,· z 6 bJ
and AI,· z 6 bI , which contradicts the fact that j ′ ∈ J. ✷
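As a small illustration (an example added here): for the polyhedron
P = { (x, y) ∈ R2 : −x 6 0, −y 6 0, −x − y 6 −1 } used as an illustration after the
Decomposition Theorem, Lemma 3.55 gives char. cone(P) = { (x, y) : −x 6 0, −y 6 0 } = R2+ —
every nonnegative direction can be added to any point of P without leaving P, while a
direction with a negative component eventually violates x > 0 or y > 0.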

The characteristic cone of a polyhedron P(A, b) is itself a polyhedron f(P) = P(A, 0), albeit
a very special one. For instance, char. cone(P) has at most one extreme point, namely
the vector 0. To see this, assume that r 6= 0 is an extreme point of char. cone(P). Then,
Ar 6 0 and from r 6= 0 we have r 6= 21 r ∈ char. cone(P) and r 6= 32 r ∈ char. cone(P). But
then r = 21 ( 12 r) + 12 ( 23 r) is a convex combination of two distinct points in char. cone(P)
contradicting the assumption that r is an extreme point of char. cone(P).
Together with Corollary 3.33 we have:

Observation 3.56 Let P = P(A, b) 6= ∅. Then, 0 ∈ char. cone(P) and 0 is the only potential
extreme point of char. cone(P). The zero vector is an extreme point of char. cone(P) if and
only if rank A = n.

Lemma 3.57 (Properties of the characteristic cone) Let P = P(A, b) be a nonempty


polyhedron and char. cone(P) = {x : Ax 6 0} be its characteristic cone. Then, the fol-
lowing statements hold:

(i) y ∈ char. cone(P) if and only if there exists some x ∈ P such that x + λy ∈ P for all
λ > 0;

(ii) P + char. cone(P) = P;

(iii) P is bounded if and only if char. cone(P) = {0};

(iv) If P = Q+C where Q is a polytope and C is a polyhedral cone, then C = char. cone(P).

Proof:

(i) Let y ∈ char. cone(P). Then Ay 6 0, so for for any λ > 0 we have A(λy) = λAy 6 0
and, hence λy ∈ char. cone(P). Choose any x ∈ P, then A(x + λy) 6 Ax 6 b for all
λ > 0.
Assume conversely that x ′ ∈ P and y are such that x ′ + λy ∈ P for all λ > 0. We show
that Ay 6 0, which implies that y ∈ char. cone(P). We have A(x ′ + λy) 6 b for all
λ > 0 if and only if Ax ′ + λAy 6 b for all λ > 0. Assume that there is an index i
such that (Ay)i > 0. Then (note that (Ax ′ )i is constant):

(Ax ′ )i + λ(Ay)i → ∞ for λ → ∞,

which contradicts the fact that (Ax ′ )i + λ(Ay)i 6 bi .

(ii) Let x ∈ P, then x = x + 0 ∈ P + char. cone(P), so P ⊆ P + char. cone(P). On the


other hand for x + y ∈ P + char. cone(P) with x ∈ P and y ∈ char. cone(P) we have
x + y ∈ P by definition of the characteristic cone.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


44 Polyhedra and Integer Programs

(iii) Let P be bounded. Then there is an M such that ‖x‖ < M for all x ∈ P. Assume
that there exists y ∈ char. cone(P) with y 6= 0. Then ‖x + λy‖ → ∞ for λ → ∞,
contradicting the boundedness of P.
Now assume conversely that char. cone(P) = {0} and assume that P is unbounded.
Due to the Decomposition Theorem (Theorem 3.15) we can write P = Q ′ + C ′ for a
polytope Q ′ and a polyhedral cone C ′ . If P is unbounded, we can conclude that there
is a c ∈ C ′ with c 6= 0 (otherwise P = Q ′ and the polytope Q ′ is clearly bounded).
For all λ > 0 also λc is in the cone C ′ and each x ∈ Q ′ satisfies x = x + 0 ∈ P since
0 ∈ C ′ . We get:

x + λc ∈ P for all λ > 0.

By part (i) we have 0 6= c ∈ char. cone(P), which is a contradiction.

(iv) Let P = Q + C, where C = {x : Bx 6 0} and y ∈ char. cone(P). We show that y ∈ C.


Let x ∈ P. Since y ∈ char. cone(P), we have x + ny ∈ P = Q + C for all n ∈ N, say
x + ny = qn + cn for some qn ∈ Q and cn ∈ C. We have

1 1 1 1 1
By = Bqn + Bcn − Bx 6 Bqn − Bx.
n n |{z} n n n
60

Let ε > 0. Choose n such that: k n1 Bqn − n1 Bxk < ε. Then By < ε. Since ε > 0 was
arbitrary, we get By 6 0 and, thus, y ∈ C.
Conversely, let y ∈ C. We have to show that y ∈ char. cone(P), i.e., p + y ∈ P for all
p ∈ P. Writing p = x + c with x ∈ Q and c ∈ C, we have

p + y = x + c + y = x + (c + y) ∈ Q + C = P,

since c + y ∈ C (because B(c + y) = Bc + By 6 0, as both c and y are in C). ✷

Recall the Decomposition Theorem (Theorem 3.15): Any polyhedron P can be written as
P = Q + C, where Q is a polytope and C is a polyhedral cone. By Lemma 3.57 (iv) we see
that inevitably we must have C = char. cone(P):

Observation 3.58 Any polyhedron P can be written as P = Q + char. cone(P), where Q is


a polytope.

Theorem 3.59 (Characterization of extreme rays) Let P = P(A, b) ⊆ Rn be a nonempty
pointed polyhedron. Then, the following statements are equivalent:

(i) r is an extreme ray of P.

(ii) {θr : θ ∈ R+ } is a one-dimensional face of char. cone(P) = {x : Ax 6 0}.

(iii) r ∈ char. cone(P)\{0} and for I := {i : Ai,· r = 0} = eq({r}) we have rank AI,· = n−1.

Proof: Let I := {i : Ai,· r = 0}.
“(i)⇒(ii)”: Let F be the smallest face of char. cone(P) containing the set {θr : θ ∈ R+ }. By
Observation 3.9 we have

F = {x ∈ char. cone(P) : AI,· x = 0I }
and eq(F) = I. If dim F > 1, then the Dimension Theorem tells us that rank AI,· < n − 1.
Thus, the solution set of AI,· x = 0I contains a vector r1 which is linearly independent
from r. For sufficiently small ε > 0 we have r ± εr1 ∈ char. cone(P), since AI,· r = 0I ,
AI,· r1 = 0I and AM\I,· r < 0. But then r = (1/2)(r + εr1 ) + (1/2)(r − εr1 ), contradicting the
fact that r is an extreme ray.
So dim F = 1. Since r 6= 0, the unbounded set {θr : θ ∈ R+ } which is contained in F has also
dimension 1, hence F = {θr : θ ∈ R+ } is a one-dimensional face of char. cone(P). Note that F
cannot contain −θr for any θ > 0: otherwise we could write r = (1/2)(3r) + (1/2)(−r), and r
would not be an extreme ray.
“(ii)⇒(iii)”: The Dimension Theorem applied to char. cone(P) implies that rank AI,· =
n − 1. Since {θr : θ ∈ R+ } has dimension 1 it follows that r 6= 0.
“(iii)⇒(i)”: By Linear Algebra, the solution set of AI,· x = 0I is one-dimensional. Since
AI,· r = 0I and r 6= 0, for any y with AI,· y = 0I we have y = θr for some θ ∈ R.
Suppose that r = λr1 + (1 − λ)r2 for some r1 , r2 ∈ char. cone(P), λ ∈ (0, 1). Then

0I = AI,· r = λ AI,· r1 +(1 − λ) AI,· r2 6 0I


| {z } | {z }
60I 60I

implies that AI,· rj = 0 for j = 1, 2. So rj = θj rj for appropriate scalars, and both rays r1
and r2 must be scalar multiples of each other. Observe that it follows that θj > 0 for j = 1, 2,
since as above we could otherwise conclude that 0 is not an extreme point of char. cone(P).

Corollary 3.60 Every nonempty pointed polyhedron P has only a finite number of extreme
rays. Moreover, char. cone(P) is finitely generated by the extreme rays of P.

Proof: By the preceding theorem, every extreme ray is determined by a subsystem of


rank n − 1, of which there are only finitely many. ✷

Combining this result with Corollary 3.32 we have:

Corollary 3.61 Every pointed polyhedron has only a finite number of extreme points and
extreme rays.

Consider once more the Linear Program




max cT x : x ∈ P , (3.22)

where P is a pointed polyhedron. We showed in Corollary 3.31, that if the Linear Pro-
gram (3.18) has optimal solutions, it has an optimal solution which is also an extreme point
of P. What happens, if the Linear Program (3.22) is unbounded?

Theorem 3.62 Let P = P(A, b) be a pointed nonempty polyhedron.

(i) If the Linear Program (3.22) has optimal solutions, it has an optimal solution which
is also an extreme point of P.

(ii) If the Linear Program (3.22) is unbounded, then there exists an extreme ray r of P
such that cT r > 0.

Proof: Statement (i) is a restatement of Corollary 3.31. So, we only need to prove (ii). By
the Duality Theorem of Linear Programming (Theorem 2.9) the set

{ y : AT y = c, y > 0 }

(which is the feasible set for the dual to (3.22)) must be empty. By Farkas’ Lemma (Theorem
2.11), there exists r such that Ar > 0 and cT r < 0. Let r ′ := −r, then Ar ′ 6 0 and
cT r ′ > 0. In particular, r ′ is a ray of P.
We now consider the Linear Program

max{ cT x : Ax 6 0, cT x 6 1 } = max{ cT x : x ∈ P ′ }. (3.23)

Since rank A = n, it follows that P ′ is pointed. Moreover, P ′ 6= ∅ since r ′ /cT r ′ ∈ P ′ .
Finally, the constraint cT x 6 1 ensures that (3.23) is bounded. In fact, the optimum value
of (3.23) is 1, since this value is achieved for the vector r ′ /cT r ′ .
By (i) an optimal solution of (3.23) is attained at an extreme point of P ′ ⊆ char. cone(P),
say at r∗ ∈ char. cone(P). So
cT r∗ = 1. (3.24)

Let I = {i : Ai,· r∗ = 0}. By the Dimension Theorem we have rank (AI,· ; cT ) = n (since {r∗ }
is a zero-dimensional face of the polyhedron P ′ by Theorem 3.30).
If rank AI,· = n − 1, then by Theorem 3.59 we have that r∗ is an extreme ray of P as needed.
If rank AI,· = n, then r∗ 6= 0 is an extreme point of char. cone(P) by Theorem 3.30, which
is a contradiction to Observation 3.56. ✷

Now let P = P(A, b) be a rational pointed polyhedron where A and b are already choosen
to have rational entries. By Theorem 3.30 the extreme points of P are the 0-dimensional
faces. By the Theorem of Hoffmann and Kruskal (Theorem 3.25) each extreme point {x̄}
is the unique solution of a subsystem AI,· x = bI . Since A and b are rational, it follows
that each extreme point has also rational entries (Gaussian elimination applied to the linear
system AI,· x = bI does not leave the rationals). Similarly, by Theorem 3.59 an extreme
ray r is determined by a system AI,· r = 0, where rank A = n − 1. So again, r must be
rational. This gives us the following observation:

Observation 3.63 The extreme points and extreme rays of a pointed rational polyhedron
are rational vectors.

We have shown that every (pointed) polyhedron P as P = Q + char. cone(P).


P can be written
Thus, any point x ∈ P can be written as x = k∈K λk x̄k + j∈J µj rj , where the x̄k are
the extreme points of Q and the rj are the extreme rays of P. We can, in fact, prove that it
suffices to choose the Pof P in the first sum. To see this, assume that x ∈ P is
Pextreme points
not of the form x = k∈K λk xk + j∈J µj rj , where the xk are the extreme points and the
rj are the extreme rays of P. Then, the following system is unsolvabe:
X X
λk xk + µj rj = x
k∈K j∈J
X
λk = 1
k∈K
λ, µ > 0.

Then, by Farkas’ Lemma (Theorem 2.11), there exists (w, θ) such that

wT x + θ < 0
wT xk + +θ = 0, k ∈ K
wT rj = 0, j ∈ J,

but this means that wT x < −θ = wT xk for all extreme points of P. Since wT rj = 0, the
LP of maximizing the linear objective w over P has an optimum solution which can not be
at an extreme point which is a contradiction to Theorem 3.62.

3.9 Most IPs are Linear Programs


This section is dedicated to establishing the important fact that if P is a rational (pointed)
polyhedron, then
PI := conv(P ∩ Zn )
is again a rational polyhedron. Clearly, we always have PI ⊆ P, since P is convex.

Lemma 3.64 For any rational polyhedral cone C = {x : Ax 6 0} we have

CI = C.

Proof: By the Theorem of Weyl-Minkowski-Farkas (Theorem 3.14), C is generated by
finitely many integral vectors, say C = cone({a1 , . . . , ap }). Let x ∈ C, then x = Σ_{i=1}^{p} µi ai
for nonnegative scalar values µi . In order to prove that x ∈ CI it suffices to show that
µx ∈ CI for some µ > 1, since x is on the line segment between 0 and µx and CI is
convex.
Choose k ∈ N such that k > Σ_{j=1}^{p} µj =: s. Then µ := k/s > 1. Observe that kai ∈ CI for
all i, since ai ∈ Zn , so kai ∈ Zn and by the fact that C is a cone we also have kai ∈ C,
so kai ∈ C ∩ Zn ⊆ CI . Now

µx = (k/s) x = (1/s) Σ_{i=1}^{p} µi (kai ),

where the coefficients µi /s are nonnegative and sum to 1, so µx ∈ CI as desired. ✷

Theorem 3.65 (Most IPs are LPs) Let P = P(A, b) be a rational pointed polyhedron,
then PI = conv(P ∩Zn ) is a rational polyhedon. If PI 6= ∅, then char. cone(P) = char. cone(PI ).

Proof: Let P be a rational polyhedron, then we have P = Q + char. cone(P) (see Observa-
tion 3.58). By Theorem 3.15 C := char. cone(P) is a rational cone which by Theorem 3.14
is generated by finitely many integral vectors, say char. cone(P) = cone{y1 , . . . , yk }. We let
 k 
X
R := µi yi : 0 6 µi 6 1 for i = 1, . . . , k . (3.25)
i=1

Then R is a bounded set, and so is Q + R (since Q is bounded, too). Thus, (Q + R) ∩ Zn


consists of finitely many points and so by Corollary 3.16

(Q + R)I = conv((Q + R) ∩ Zn )

is a polytope. By the Decomposition Theorem 3.15 we are done, if we can prove that
PI = (Q + R)I + C.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


48 Polyhedra and Integer Programs

“PI ⊆ (Q + R)I + C”: Suppose that x ∈ P ∩ Zn , then x = q + c where q ∈ Q and c ∈ C.
We have c = Σ_{i=1}^{k} λi yi for some λi > 0 and, hence,

c = Σ_{i=1}^{k} ⌊λi ⌋ yi + Σ_{i=1}^{k} (λi − ⌊λi ⌋) yi = c ′ + r, where c ′ ∈ C ∩ Zn and r ∈ R

(note that λi − ⌊λi ⌋ ∈ [0, 1]). It follows that x = q + c = (q + r) + c ′ . Note that
q + r = x − c ′ ∈ Zn , since x ∈ Zn and c ′ ∈ Zn . Hence x ∈ (Q + R)I + C.
We have shown that any integral point in P is contained in (Q + R)I + C. Since
(Q + R)I + C is a polyhedron and, hence, convex, also conv(P ∩ Zn ) is contained in
(Q + R)I + C.

“(Q + R)I + C ⊆ PI ”: We have

Lemma 3.64
(Q + R)I + C ⊆ PI + C = PI + CI ⊆ (P + C)I = PI .

In order to see that PI + CI ⊆ (P + C)I , consider any point x = p + c for p ∈ PI
and c ∈ CI . Then p = Σ_i λi xi for integral xi ∈ P, λi > 0 and Σ_i λi = 1. Similarly,
c = Σ_j µj cj is a convex combination of integral vectors cj ∈ C ∩ Zn . Then
Σ_i Σ_j λi µj = 1 and Σ_i Σ_j λi µj (xi + cj ) = p + c, so x = p + c ∈ (P + C)I . This
completes the proof. ✷

The result of Theorem 3.65 has far reaching consequences: Any Integer Program

max cT x
Ax 6 b
x ∈ Zn

with a rational matrix A and a rational vector b can be written as a Linear Program

max cT x
Āx 6 b̄,

provided P(A, b) is pointed. Of course, the issue is how to find Ā and b̄!
It should be noted that the above result can be extended rather easily to mixed integer
programs (and also to non-pointed polyhedra). Moreover, as a byproduct of the proof we
obtain the following observation:

Observation 3.66 Let P = P(A, b) be a pointed polyhedron with rational A and b. If


PI 6= ∅, then the extreme rays of P and PI coincide.

Theorem 3.67 Let P = P(A, b) be a pointed polyhedron with rational A and b such that
PI is nonempty. Let c ∈ Rn be arbitrary. We consider the two optimization problems:


(IP) zIP = max cT x : x ∈ P ∩ Zn


(LP) zLP = max cT x : x ∈ PI .

Then, the following statements hold for X = P ∩ Zn :

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


3.9 Most IPs are Linear Programs 49

(i) The objective value of IP is bounded from above if and only if the objective value of
LP is bounded from above.

(ii) If LP has a bounded optimal value, then it has an optimal solution (namely, an ex-
treme point of conv(X)), that is an optimal solution to IP.

(iii) If x∗ is an optimal solution to IP, then x∗ is also an optimal solution to LP.

Proof: Let X = P ∩ Zn . Since X ⊆ conv(X) = PI it trivially follows that zLP > zIP .

(i) If zIP = +∞ it follows that zLP = +∞. On the other hand, if zLP = +∞, there is
an integral point x0 ∈ conv(X) and an integral extreme ray r of conv(X) such that
x0 + µr ∈ conv(X) for all µ > 0 and cT r > 0. Thus, x0 + µr ∈ X for all µ ∈ N. Thus,
we also have zIP = +∞.

(ii) By Theorem 3.65 we know that PI = conv(X) is a rational polyhedron. Hence, if


LP has an optimal solution, there exists also an optimal solution which is an extreme
point of conv(X), say x0 . But then x0 ∈ X and zIP > cT x0 = zLP > zIP . Hence, x0 is
also an optimal solution for IP.

(iii) Since x∗ ∈ X ⊆ conv(X) = PI , the point x∗ is also feasible for LP. The claim now
follows from (ii) and (iii).

Theorems 3.65 and 3.67 are particularly interesting in conjunction with the polynomial
time equivalence of separation and optimization (see Theorem 5.24 later on). A general
method for showing that an Integer Linear Program max{ cT x : x ∈ X } with X = P(A, b) ∩
Zn can be solved in polynomial time is as follows:

(i) Find a description of PI = conv(X), that is, conv(X) = P ′ = {x : A ′ x 6 b ′ }.

(ii) Give a polynomial time separation algorithm for P ′ .

(iii) Apply Theorem 5.24.

Although this procedure usually does not yield algorithms that are the most efficient in
practice, the equivalence of optimization and separation should be viewed as a guide for
searching for more efficient algorithms. In fact, for the vast majority of problems that were
first shown to be solvable in polynomial time by the method outlined above, later
algorithms were developed that are faster both in theory and practice.

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


4 Integrality of Polyhedra

In this chapter we study properties of polyhedra P which ensure that the Linear Program
max{ cT x : x ∈ P } has optimal integral solutions.

Definition 4.1 (Integral Polyhedron)


A polyhedron P is called integral if every nonempty face of P contains an integral point.

Informally speaking, if we are optimizing over an integral polyhedron we get integrality for
free: the set of optimal solutions of z = max{ cT x : x ∈ P } is a face F = { x ∈ P : cT x = z }
of P, and, if each face contains an integral point, then there is also an optimal solution
which is integral. In other words, for integral polyhedra we have

max{ cT x : x ∈ P } = max{ cT x : x ∈ P ∩ Zn }. (4.1)

Thus, the IP on the right hand side of (4.1) can be solved by solving the Linear Program on
the left hand side of (4.1).
A large part of the study of polyhedral methods for combinatorial optimization problems
was motivated by a theorem of Edmonds on matchings in graphs. A matching in an undirected
graph G = (V, E) is a set M ⊆ E of edges such that none of the edges in M share
a common endpoint and M does not contain a loop. Given a matching M we say that
a vertex v ∈ V is M-covered if some edge in M is incident with v. Otherwise, we call
v M-exposed. Observe that the number of M-exposed nodes is precisely |V| − 2|M|. A
matching M is called perfect, if all vertices in G are M-covered. We define:

PM(G) := { χM ∈ BE : M is a perfect matching in G } (4.2)

to be the set of incidence vectors of perfect matchings in G.

We will show in the next section that a polyhedron P is integral if and only if P = conv(P ∩
Zn ). Edmonds’ Theorem can be stated as follows:

Theorem 4.2 (Perfect Matching Polytope Theorem) For any graph G = (V, E) without
loops, the convex hull conv(PM(G)) of the perfect matchings in G (i.e. each vertex v ∈ V
is incident to exactly one edge of the matching) is identical to the set of solutions of the
following linear system:

x(δ(v)) = 1 for all v ∈ V (4.3a)


x(δ(S)) > 1 for all S ⊆ V, 3 6 |S| 6 |V| − 3 odd (4.3b)
xe > 0 for all e ∈ E. (4.3c)
Proof: See Theorem 4.23. ✷

Observe that any integral solution of (4.3) is a perfect matching. Thus, if P denotes the
polyhedron defined by (4.3), then by the equivalence shown in the next section the Perfect
Matching Polytope Theorem states that P is integral and P = conv(PM(G)). Edmonds’
result is very strong, since it gives us an explicit description of conv(PM(G)).
4.1 Equivalent Definitions of Integrality


We are now going to give some equivalent definitions of integrality which will turn out to
be quite useful later.

Theorem 4.3 Let P = P(A, b) be a pointed rational polyhedron. Then, the following state-
ments are equivalent:

(i) P is an integral polyhedron.



(ii) The LP max cT x : x ∈ P has an optimal integral solution for all c ∈ Rn where the
value is finite.

(iii) The LP max cT x : x ∈ P has an optimal integral solution for all c ∈ Zn where the
value is finite.

(iv) The value zLP = max cT x : x ∈ P is integral for all c ∈ Zn where the value is
finite.
(v) P = conv(P ∩ Zn ).

Proof: We first show the equivalence of statements (i)-(iv):

(i)⇒(ii) The set of optimal solutions of the LP is a face of P. Since every face contains an
integral point, there is an integral optimal solution.
(ii)⇒(iii) trivial.
(iii)⇒(iv) trivial.
(iv)⇒(i) Suppose that (i) is false and let x0 be an extreme point which by assumption is
not integral, say component x0j is fractional. By Theorem 3.36 there exists a vector
c ∈ Zn such that x0 is the unique optimal solution of max{ cT x : x ∈ P }. Since x0 is
the unique optimal solution, we can find a large ω ∈ N such that x0 is also optimal
for the objective vector c̄ := c + (1/ω) ej , where ej is the jth unit vector. Clearly, x0
must then also be optimal for the objective vector c̃ := ωc̄ = ωc + ej . Now we have

c̃T x0 − ωcT x0 = (ωcT x0 + eTj x0 ) − ωcT x0 = eTj x0 = x0j .

Hence, at least one of the two values c̃T x0 and cT x0 must be fractional, which contradicts
(iv).

We complete the proof of the theorem by showing two implications:

(i)⇒(v) Since P is convex, we have conv(P ∩ Zn ) ⊆ P. PThus, the claimPfollows if we can


show that P ⊆ conv(P ∩ Zn ). Let v ∈ P, then v = k∈K λk xk + j∈J µj rj , where
the xk are the extreme j
P points ofkP and the r arenthe extreme rays of P. By (i) every
k
x is integral, thus k∈K λk x ∈ conv(P ∩ Z ). Since by Observation 3.66 the
extreme rays of P and conv(P ∩ Zn ) are the same, we get that v ∈ conv(P ∩ Zn ).

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


4.2 Matchings and Integral Polyhedra I 53

(v)⇒(iii) Let c ∈ Zn be an integral vector. Since by assumption conv(P ∩ Zn ) = P, the LP
max{ cT x : x ∈ P } has an optimal solution in P ∩ Zn (if x = Σ_i λi xi ∈ conv(P ∩ Zn )
is a convex combination of points in P ∩ Zn , then cT x 6 max_i cT xi , cf. Observation
2.2).

This shows the theorem. ✷

Recall that each minimal nonempty face of P(A, b) is an extreme point if and only if
rank(A) = n (Corollary 3.26). Thus, we have the following result:

Observation 4.4 A nonempty polyhedron P = P(A, b) with rank(A) = n is integral if and


only if all of its extreme points are integral. ✷

Moreover, if P(A, b) ⊆ Rn
+ is nonempty, then rank(A) = n. Hence, we also have the
following corollary:

Corollary 4.5 A nonempty polyhedron P ⊆ Rn


+ is integral if and only if all of its extreme
points are integral. ✷

4.2 Matchings and Integral Polyhedra I


As mentioned before, a lot of the interest in integral polyhedra and their applications in
combinatorial optimization was fueled by results on the matching polytope. As a warmup
we are going to prove a weaker form of the perfect matching polytope theorem due to
Birkhoff.
A graph G = (V, E) is called bipartite, if there is a partition V = A ∪ B, A ∩ B = ∅ of the
vertex set such that every edge e is of the form e = (a, b) with a ∈ A and b ∈ B.

Lemma 4.6 A graph G = (V, E) is bipartite if and only if it does not contain an odd cycle.

Proof: Let G = (V, E) be bipartite with bipartition V = A ∪ B. Assume for the sake of a
contradiction that C = (v1 , v2 , . . . , v2k−1 , v2k = v1 ) is an odd cycle in G. We can assume
that v1 ∈ A. Then (v1 , v2 ) ∈ E implies that v2 ∈ B. Now (v2 , v3 ) ∈ E implies v3 ∈ A.
Continuing we get that v2i−1 ∈ A and v2i ∈ B for i = 1, 2, . . .. But since v1 = v2k we have
v1 ∈ A ∩ B = ∅, which is a contradiction.
Assume conversely that G = (V, E) does not contain an odd cycle. Since it suffices to show
that any connected component of G is bipartite, we can assume without loss of generality
that G is connected.
Choose r ∈ V arbitrarily. Since G is connected, the shortest path distance d(v) from r to
every v ∈ V is finite. We let

A = {v ∈ V : d(v) is even} and B = {v ∈ V : d(v) is odd}.
This gives us a partition of V with r ∈ A. We claim that all edges are between A and B.
Let (u, v) ∈ E and suppose that u, v ∈ A (the case u, v ∈ B is analogous). Clearly,
|d(u) − d(v)| 6 1, which gives us that d(u) = d(v) = 2k for some k. Let p = r, v1 , . . . , v2k = v
and q = r, u1 , . . . , u2k = u be shortest paths from r to v and u, respectively. The paths might
share some common parts. Let vi and uj be maximal with the property that vi = uj and the
paths vi+1 , . . . , v and uj+1 , . . . , u are node disjoint. Observe that we must have i = j since
otherwise one of the paths could not be shortest. But then vi , vi+1 , . . . , v2k = v, u = u2k ,
u2k−1 , . . . , ui = vi is a cycle of odd length, which is a contradiction. ✷
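The distance argument in the proof translates directly into an algorithm: a BFS 2-coloring
detects whether a graph is bipartite. A small sketch (added for illustration; the function
name and graph encoding are our own choices):

```python
from collections import deque

def is_bipartite(vertices, edges):
    """BFS 2-coloring following the proof of Lemma 4.6: color each
    vertex by the parity of its BFS distance from a root and check
    that no edge joins two vertices of the same color."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = {}
    for root in vertices:          # handle each connected component
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False   # odd cycle found
    return True

print(is_bipartite([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))  # True
print(is_bipartite([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))             # False
```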

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


54 Integrality of Polyhedra

Theorem 4.7 (Birkhoff’s Theorem) Let G be a bipartite graph. Then, conv(PM(G)) =
P, where P is the polytope described by the following linear system:

x(δ(v)) = 1 for all v ∈ V (4.4a)
xe > 0 for all e ∈ E. (4.4b)

In particular, P is integral.

Proof: Clearly conv(PM(G)) ⊆ P. To show that conv(PM(G)) = P it suffices to prove that all extreme points of P are integral, since P is pointed (due to the fact that P ⊆ R^E_+ or, alternatively, due to the fact that P is a polytope). Let x be any extreme point of P. Assume for the sake of a contradiction that x is fractional. Define Ẽ := {e ∈ E : 0 < x_e < 1} to be the set of “fractional edges”. Since x(δ(v)) = 1 for any v ∈ V, we can conclude that any vertex that has an edge from Ẽ incident with it is in fact incident to at least two such edges from Ẽ. Thus, Ẽ contains an even cycle C (by Lemma 4.6 the graph G does not contain any odd cycle). Let y be a vector which is alternatingly ±1 for the edges in C and zero for all other edges. For small ε > 0 we have x ± εy ∈ P. But then, x cannot be an extreme point. ✷

Observe that we can view the assignment problem (see Example 1.6) as the problem of
finding a minimum cost perfect matching on a complete bipartite graph. Thus, Birkhoff’s
theorem shows that we can solve the assignment problem by solving a Linear Program.

Remark 4.8 The concept of total unimodularity derived in the next section will enable us
to give an alternative proof of Birkhoff’s Theorem.

4.3 Total Unimodularity


Proving that a given polyhedron is integral is usually a difficult task. In this section we
derive some conditions under which the polyhedron

P=(A, b) = { x : Ax = b, x ≥ 0 }

is integral for every integral right hand side b.


As a motivation for the following definition of total unimodularity, consider the Linear Program

(LP) max { c^T x : Ax = b, x ≥ 0 }, (4.5)

where rank A = m. From Linear Programming theory we know that if (4.5) has a feasible (optimal) solution, it also has a feasible (optimal) basic solution, that is, a solution of the form x = (x_B, x_N), where x_B = A_{·,B}^{-1} b and x_N = 0, and A_{·,B} is an m × m nonsingular submatrix of A indexed by the columns in B ⊆ {1, . . . , n}, |B| = m. Here, N = {1, . . . , n} \ B. Given such a basic solution x = (x_B, x_N) we have by Cramer’s rule:

x_i = det(B_i) / det(A_{·,B}) for i ∈ B,

where B_i is the matrix obtained from A_{·,B} by replacing the ith column by the vector b. Hence, we conclude that if det(A_{·,B}) = ±1, then each entry of x_B will be integral (provided b is integral as well).


Definition 4.9 (Unimodular matrix, totally unimodular matrix)
Let A be an m × n matrix with full row rank. The matrix A is called unimodular if all entries of A are integral and each nonsingular m × m submatrix of A has determinant ±1. The matrix A is called totally unimodular, if each square submatrix of A has determinant ±1 or 0.

Since every entry of a matrix forms itself a square submatrix, it follows that for a totally
unimodular matrix A every entry must be either ±1 or 0.
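Since Definition 4.9 quantifies over all square submatrices, total unimodularity of small matrices can be verified by brute force. The following Python sketch is only a toy (the enumeration is exponential, and the function names are ours); it computes all subdeterminants exactly over the integers:

from itertools import combinations

def det(M):
    # Exact integer determinant by Laplace expansion along the first row;
    # fine for the tiny matrices we enumerate here.
    k = len(M)
    if k == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(k))

def is_totally_unimodular(A):
    # Definition 4.9: every square submatrix has determinant in {-1, 0, +1}.
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det([[A[i][j] for j in cols] for i in rows]) not in (-1, 0, 1):
                    return False
    return True

# Node-edge incidence matrix of a path on three vertices: TU.
print(is_totally_unimodular([[1, 0], [1, 1], [0, 1]]))    # True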

Observation 4.10 (i) A is totally unimodular if and only if A^T is totally unimodular.

(ii) A is totally unimodular if and only if (A, I) is unimodular (if and only if (A, I) is totally unimodular).

(iii) A is totally unimodular if and only if the matrix with row blocks A, −A, I and −I (stacked on top of each other) is totally unimodular.

We now show that a Linear Program with a (totally) unimodular matrix always has an
integral optimal solution provided the optimum is finite. Thus, by Theorem 4.3 we get that
the corresponding polyhedron must be integral.

Theorem 4.11 Let A be an m × n matrix with integer entries and linearly independent
rows. The polyhedron {x ∈ Rn : Ax = b, x > 0} is integral for all b ∈ Zm if and only if A
is unimodular.

Proof: Suppose that A is unimodular and b ∈ Z^m is an integral vector. By Corollary 4.5 it suffices to show that all extreme points of {x : Ax = b, x ≥ 0} are integral. Let x̄ be such an extreme point. Since A has full row rank, there exists a basis B ⊆ {1, . . . , n}, |B| = m, such that x̄_B = A_{·,B}^{-1} b and x̄_N = 0. Since A is unimodular, we have det(A_{·,B}) = ±1, and by Cramer’s rule we can conclude that x̄ is integral.
Assume conversely that {x : Ax = b, x ≥ 0} is integral for every integral vector b. Let B be a basis of A. We must show that det(A_{·,B}) = ±1. Consider for j ∈ {1, . . . , m} and some z ∈ Z^m the vector w_j := A_{·,B} z + e_j, where e_j is the jth unit vector. Then A_{·,B}^{-1} w_j = z + A_{·,B}^{-1} e_j. We can choose z ∈ Z^m such that A_{·,B}^{-1} w_j ≥ 0 for all j = 1, . . . , m; then A_{·,B}^{-1} w_j is the vector of basis variables of an extreme point of {x : Ax = w_j, x ≥ 0}, which by assumption is integral. Thus from z ∈ Z^m and A_{·,B}^{-1} w_j ∈ Z^m we can conclude that A_{·,B}^{-1} e_j is integral for all j, which gives that A_{·,B}^{-1} must be integral. Thus, it follows that det(A_{·,B}^{-1}) = 1/det(A_{·,B}) is integral. On the other hand, det(A_{·,B}) is also integral by the integrality of A. Hence, det(A_{·,B}) = ±1 as required. ✷

We use the result of the previous theorem to show the corresponding result for the polyhe-
dron {x : Ax 6 b, x > 0}.

Corollary 4.12 (Integrality-Theorem of Hoffman and Kruskal) Let A be an m × n matrix with integer entries. The matrix A is totally unimodular if and only if the polyhedron P(b) := {x : Ax ≤ b, x ≥ 0} is integral for all b ∈ Z^m.

Proof: The proof is very similar to that of Theorem 4.11. It is easy to see that x is an extreme point of P(b) if and only if (x, y) is an extreme point of

{(x, y) : Ax + Iy = b, x, y ≥ 0},


where y is uniquely determined by y = b − Ax. Thus, if A is totally unimodular, then (A, I) is (totally) unimodular and integrality of P(b) follows directly from Theorem 4.11.

Assume now conversely that P(b) is integral for each integral b. Let A_1 be a k × k submatrix of A and let

Ā := ( A_1  0
       A_2  I_{m−k} )

be the m × m submatrix of (A, I) generated from A_1 by taking the appropriate m − k unit vectors from I. Notice that Ā is nonsingular if and only if A_1 is. Consider the vectors w_j := Āz + e_j, j = 1, . . . , m. Then Ā^{-1} w_j = z + Ā^{-1} e_j. We can choose z ∈ Z^m such that Ā^{-1} w_j ≥ 0 for all j = 1, . . . , m. Hence, z + Ā^{-1} e_j is the vector of basis variables of an extreme point of P(w_j), which by assumption is integral. Again, we can conclude that Ā^{-1} is integral and so must be A_1^{-1}. By the same argument as in Theorem 4.11 we get that det(A_1) ∈ {+1, −1}. ✷

The Integrality-Theorem of Hoffman and Kruskal in conjunction with Observation 4.10 yields more characterizations of totally unimodular matrices.

Corollary 4.13 Let A be an integral matrix. Then the following statements hold:

(a) A is totally unimodular if and only if the polyhedron {x : a ≤ Ax ≤ b, l ≤ x ≤ u} is integral for all integral a, b, l, u.

(b) A is totally unimodular if and only if the polyhedron {x : Ax = b, 0 ≤ x ≤ u} is integral for all integral b, u.

4.4 Conditions for Total Unimodularity

In this section we derive sufficient conditions for a matrix to be totally unimodular.

Theorem 4.14 Let A be any m × n matrix with entries taken from {0, +1, −1} with the property that any column contains at most two nonzero entries. Suppose also that there exists a partition M_1 ∪ M_2 = {1, . . . , m} of the rows of A such that every column j with two nonzero entries satisfies Σ_{i∈M_1} a_ij = Σ_{i∈M_2} a_ij. Then, A is totally unimodular.

Proof: Suppose for the sake of a contradiction that A is not totally unimodular. Let B be a smallest square submatrix such that det(B) ∉ {0, +1, −1}. B cannot contain any column with at most one nonzero entry: a zero column would give det(B) = 0, and expanding the determinant along a column with a single nonzero entry would yield a smaller submatrix with determinant outside {0, +1, −1}, contradicting the minimality of B. Thus, any column of B contains exactly two nonzero entries. By the assumptions of the theorem, adding the rows in B that are in M_1 and subtracting those that are in M_2 gives the zero vector, thus det(B) = 0, a contradiction! ✷
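The hypothesis of Theorem 4.14 is easy to check mechanically once a candidate row partition is given. A Python sketch (function name and example data are ours):

def satisfies_partition_condition(A, M1):
    # Check the hypothesis of Theorem 4.14 for the row partition (M1, M2):
    # all entries in {0, +1, -1}, at most two nonzeros per column, and for
    # every column with two nonzeros the M1-rows and the M2-rows sum to
    # the same value.  If this returns True, A is totally unimodular.
    M2 = set(range(len(A))) - set(M1)
    for col in zip(*A):
        if any(a not in (0, 1, -1) for a in col):
            return False
        nonzeros = [i for i, a in enumerate(col) if a != 0]
        if len(nonzeros) > 2:
            return False
        if len(nonzeros) == 2:
            if sum(col[i] for i in M1) != sum(col[i] for i in M2):
                return False
    return True

# Incidence matrix of a bipartite graph with parts {row 0} and {rows 1, 2}:
A = [[1, 1],
     [1, 0],
     [0, 1]]
print(satisfies_partition_condition(A, M1={0}))    # True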


Example 4.15
Consider the LP-relaxation of the assignment problem.
min Σ_{i=1}^n Σ_{j=1}^n c_ij x_ij (4.6a)
Σ_{i=1}^n x_ij = 1 for j = 1, . . . , n (4.6b)
Σ_{j=1}^n x_ij = 1 for i = 1, . . . , n (4.6c)
0 ≤ x ≤ 1. (4.6d)

We can write the constraints (4.6b) and (4.6c) as Ax = 1, where A is the node-edge inci-
dence matrix of the complete bipartite graph G = (V, E) (Figure 4.1 shows the situation
for n = 3). The rows of A correspond to the vertices and the columns to the edges. The
column corresponding to edge (u, v) has exactly two ones, one at the row for u and one at
the row for v. The fact that G is bipartite with bipartition V = A ∪ B gives us a partition of the rows such that the conditions of Theorem 4.14 are satisfied (with M_1 = A and M_2 = B). Hence, A is totally unimodular.

 
[Figure 4.1: (a) the constraint system Ax = 1 for n = 3: the 6 × 9 node-edge incidence matrix applied to the variable vector (x_11, x_12, . . . , x_33)^T; (b) the complete bipartite graph K_{3,3} with parts {1, 2, 3} and {1′, 2′, 3′}.]

Figure 4.1: The matrix of the assignment problem as the node-edge incidence matrix of a complete bipartite graph.

We derive some other useful consequences of Theorem 4.14:

Theorem 4.16 Let A be any m × n matrix with entries taken from {0, +1, −1} with the
property that any column contains at most one +1 and at most one −1. Then A is totally
unimodular.

Proof: The fact that A is totally unimodular follows from Theorem 4.14 with M1 =
{1, . . . , m} and M2 = ∅. ✷

The node-arc incidence matrix of a directed network G = (V, A) is the n × m matrix M(A) = (m_{x,a}) such that

m_{x,a} = +1 if a = (i, j) and x = j,
m_{x,a} = −1 if a = (i, j) and x = i,
m_{x,a} = 0 otherwise.
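A minimal Python sketch (names are ours) that builds the node-arc incidence matrix from an arc list, following exactly the sign convention above:

def incidence_matrix(num_nodes, arcs):
    # Column a has a -1 at the tail i and a +1 at the head j of arc a = (i, j).
    M = [[0] * len(arcs) for _ in range(num_nodes)]
    for a, (i, j) in enumerate(arcs):
        M[i][a] = -1       # arc a leaves node i ...
        M[j][a] = +1       # ... and enters node j
    return M

# A small digraph on the nodes 0, 1, 2:
for row in incidence_matrix(3, [(0, 1), (1, 2), (0, 2)]):
    print(row)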


The minimum cost flow problem can be stated as the following Linear Program:

min Σ_{(i,j)∈A} c(i, j) x(i, j) (4.7a)
Σ_{j:(j,i)∈A} x(j, i) − Σ_{j:(i,j)∈A} x(i, j) = b(i) for all i ∈ V (4.7b)
0 ≤ x(i, j) ≤ u(i, j) for all (i, j) ∈ A (4.7c)

By using the node-arc incidence matrix M = M(A), we can rewrite (4.7) as:

min { c^T x : Mx = b, 0 ≤ x ≤ u }, (4.8)

where b is the vector of all required demands.

Corollary 4.17 The node-arc incidence matrix of a directed network is totally unimodular.

Proof: The claim follows immediately from Theorem 4.16. ✷

We close this section with one more sufficient condition for total unimodularity.

Theorem 4.18 (Consecutive ones Theorem) Let A be any m × n matrix with entries from {0, 1} and the property that the rows of A can be permuted in such a way that in every column the 1s appear consecutively. Then, A is totally unimodular.

Proof: Let B be a square submatrix of A. Without loss of generality we can assume that the rows of A (and thus also of B) are already permuted in such a way that the ones appear consecutively. Let b_1^T, . . . , b_k^T be the rows of B. Consider the matrix B′ with rows b_1^T − b_2^T, b_2^T − b_3^T, . . . , b_{k−1}^T − b_k^T, b_k^T. Since B′ arises from B by subtracting rows, the determinant of B′ is the same as that of B.
Any column of B′ contains at most two nonzero entries, one of which is a −1 (in the row just before the rows where the ones in this column start) and a +1 (at the row where the ones in this column end). By Theorem 4.16, B′ is totally unimodular; in particular det(B′) = det(B) ∈ {0, +1, −1}. ✷

4.5 Applications of Unimodularity: Network Flows


We have seen above that if M is the node-arc incidence matrix of a directed graph, then the
polyhedron {x : Mx = b, 0 6 x 6 u} is integral for all integral b and u. In particular, for
integral b and u we have strong duality between the IP


max { c^T x : Mx = b, 0 ≤ x ≤ u, x ∈ Z^n } (4.9)

and the dual of the LP-relaxation

min { b^T z + u^T y : M^T z + y ≥ c, y ≥ 0 }. (4.10)

Moreover, if the vector c is integral, then by total unimodularity the LP (4.10) always has an integral optimal solution.
As an application of the strong duality of the problems (4.9) and (4.10) we will establish
the Max-Flow-Min-Cut-Theorem.


Definition 4.19 (Cut in a directed graph, forward and backward part)


Let G = (V, A) be a directed graph and S ∪ T = V a partition of the node set V. We call
(S, T ) the cut induced by S and T . We also denote by

δ+ (S) := { (i, j) ∈ A : i ∈ S and j ∈ T }


δ− (S) := { (j, i) ∈ A : j ∈ T and i ∈ S }

the forward part and the backward part of the cut. The cut (S, T ) is an (s, t)-cut if s ∈ S
and t ∈ T .
If u : A → R≥0 is a capacity function defined on the arcs of the network G = (V, A) and (S, T) is a cut, then the capacity of the cut is defined to be the sum of the capacities of its forward part:

u(δ+(S)) := Σ_{(i,j)∈δ+(S)} u(i, j).

Figure 4.2 shows an example of a cut and its forward and backward part.

[Figure 4.2: three copies of a directed graph on the nodes s, 1, . . . , 6, t: (a) an (s, t)-cut (S, T) with the arcs in δ+(S) ∪ δ−(S) shown dashed; (b) the forward part δ+(S) shown dashed; (c) the backward part δ−(S) shown dashed.]

Figure 4.2: A cut (S, T) in a directed graph and its forward part δ+(S) and backward part δ−(S).

Let f : A → R+ be any function defined on the arcs of G = (V, A). For a node i ∈ V we define

excess_f(i) := Σ_{a∈δ−(i)} f(a) − Σ_{a∈δ+(i)} f(a), (4.11)


the excess of i with respect to f. The first term in (4.11) corresponds to the inflow into i,
the second term is the outflow out of i.
We call f an (s, t)-flow, if excess_f(i) = 0 for all i ≠ s, t, excess_f(s) ≤ 0 and excess_f(t) ≥ 0. The value val(f) := excess_f(t) = −excess_f(s) is called the value of the flow. For any (s, t)-cut (S, T) we then have:

val(f) = −excess_f(s) = −Σ_{i∈S} excess_f(i) = Σ_{i∈S} ( Σ_{(i,j)∈A} f(i, j) − Σ_{(j,i)∈A} f(j, i) ). (4.12)

If for an arc (x, y) both nodes x and y are contained in S, then the term f(x, y) appears twice in the sum (4.12), once with a positive and once with a negative sign. Hence, (4.12) reduces to

val(f) = Σ_{a∈δ+(S)} f(a) − Σ_{a∈δ−(S)} f(a). (4.13)

We call a flow f feasible if 0 ≤ f(a) ≤ u(a) for all a ∈ A. For such a feasible flow f, we get from (4.13):

val(f) = Σ_{a∈δ+(S)} f(a) − Σ_{a∈δ−(S)} f(a) ≤ Σ_{a∈δ+(S)} u(a) = u(δ+(S)).

Thus, the value val(f) of the flow is bounded from above by the capacity u(δ+(S)) of the cut. We have proved the following lemma:

Lemma 4.20 Let f be a feasible (s, t)-flow and (S, T ) an (s, t)-cut. Then:

val(f) 6 u(δ+ (S)).

Since f and (S, T) are arbitrary, we deduce that:

max { val(f) : f is an (s, t)-flow in G } ≤ min { u(δ+(S)) : (S, T) is an (s, t)-cut in G }. (4.14)
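Lemma 4.20 and (4.14) can be checked numerically on toy data. The following Python sketch (the network, the flow and the function names are all invented for illustration) computes val(f) via (4.13) and the capacity u(δ+(S)) for every (s, t)-cut of a four-node network with s = 0 and t = 3:

# Arc capacities and a feasible (s, t)-flow of value 3 (toy data).
u = {(0, 1): 2, (0, 2): 2, (1, 3): 1, (2, 3): 2, (1, 2): 1}
f = {(0, 1): 2, (0, 2): 1, (1, 3): 1, (2, 3): 2, (1, 2): 1}

def val(f, S):
    # Flow value via (4.13): flow on the forward part minus backward part.
    fwd = sum(f[a] for a in f if a[0] in S and a[1] not in S)
    bwd = sum(f[a] for a in f if a[1] in S and a[0] not in S)
    return fwd - bwd

def capacity(u, S):
    # Capacity u(delta+(S)) of the forward part of the cut.
    return sum(u[a] for a in u if a[0] in S and a[1] not in S)

for S in [{0}, {0, 1}, {0, 2}, {0, 1, 2}]:
    print(sorted(S), val(f, S), "<=", capacity(u, S))
# The cut S = {0, 1, 2} is tight: val(f) = 3 = u(delta+(S)), so by
# Lemma 4.20 the flow is maximum and this cut is minimum.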

We are now ready to prove the famous Max-Flow-Min-Cut-Theorem of Ford and Fulker-
son:

Theorem 4.21 (Max-Flow-Min-Cut-Theorem) Let G = (V, A) be a flow network with capacities u : A → R+. Then the value of a maximum feasible (s, t)-flow equals the minimum capacity of an (s, t)-cut.

Proof: We add a backward arc (t, s) to G. Call the resulting graph G ′ = (V, A ′ ), where
A ′ = A ∪ {(t, s)}. Then, we can write the maximum flow problem as the Linear Program

z = max {xts : Mx = 0, 0 6 x 6 u} , (4.15)

where M is the node-arc incidence matrix of G ′ and u(t, s) = +∞. We know that M is
totally unimodular from Corollary 4.17. So, (4.15) has an optimal integral solution value
for all integral capacities u. By Linear Programming duality we have:


max { x_ts : Mx = 0, 0 ≤ x ≤ u } = min { u^T y : M^T z + y ≥ χ^(t,s), y ≥ 0 },


where χ^(t,s) is the vector in R^{A′} which has a one at entry (t, s) and zero at all other entries. We unfold the dual, which gives:

w = min Σ_{(i,j)∈A} u_ij y_ij (4.16a)
z_i − z_j + y_ij ≥ 0 for all (i, j) ∈ A (4.16b)
z_t − z_s ≥ 1 (4.16c)
y_ij ≥ 0 for all (i, j) ∈ A (4.16d)

There are various ways to see that (4.16) has an optimum solution which is also integral,
for instance:

• The constraint matrix of (4.16) is of the form (M^T, I) and, from the total unimodularity of M, it follows that (M^T, I) is also totally unimodular. In particular, (4.16) has an integral optimal solution for every integral right hand side (and our right hand side is integral!).
• The polyhedron of the LP (4.15) is integral by total unimodularity. Thus, it has
an optimum integer value for all integral capacities (the objective is also integral).
Hence, by LP-duality (4.16) has an optimum integral value for all integral objec-
tives (which are the capacities). Hence, by Theorem 4.3 the polyhedron of (4.16) is
integral and (4.16) has an optimum integer solution.

Let (y*, z*) be such an integral optimal solution of (4.16). Observe that replacing z* by z* − α·1 for some α ∈ R (shifting all node potentials by the same constant) does not change anything, so we may assume without loss of generality that z*_s = 0.
Since (y*, z*) is integral, the sets S and T defined by

S := {v ∈ V : z*_v ≤ 0}
T := {v ∈ V : z*_v ≥ 1}

partition V and induce an (s, t)-cut (S, T). Then,


w = Σ_{(i,j)∈A} u_ij y*_ij ≥ Σ_{(i,j)∈δ+(S)} u_ij y*_ij ≥ Σ_{(i,j)∈δ+(S)} u_ij (z*_j − z*_i) ≥ u(δ+(S)),

where the second inequality uses (4.16b) and the last one holds because z*_j ≥ 1 and z*_i ≤ 0 for every arc (i, j) ∈ δ+(S).

Thus, the optimum value w of the dual (4.16) which by strong duality equals the maximum
flow value is at least the capacity u(δ+ (S)) of the cut (S, T ). By Lemma 4.20 it now follows
that (S, T ) must be a minimum cut and the claim of the theorem is proved. ✷

4.6 Matchings and Integral Polyhedra II


Birkhoff’s theorem provided a complete description of conv(PM(G)) in the case where
the graph G = (V, E) was bipartite. In general, the conditions in (4.4) do not suffice to
ensure integrality of every extreme point of the corresponding polytope. Let FPM(G) (the
fractional perfect matching polytope) denote the polytope defined by (4.4). Consider the
case where the graph G contains an odd cycle of length 3 (cf. Figure 4.3).
The vector x̃ with x̃e1 = x̃e2 = x̃e3 = x̃e5 = x̃e6 = x̃e7 = 1/2 and x̃e4 = 0 is contained
in FPM(G). However, x̃ is not a convex combination of incidence vectors of perfect match-
ings of G, since {e3 , e4 , e5 } is the only perfect matching in G. However, the fractional
matching polytope FPM(G) still has an interesting structure, as the following theorem
shows:


[Figure 4.3: a graph on six vertices in which the edges e_1, e_2, e_3, e_5, e_6, e_7 carry the value x̃_e = 1/2 and the edge e_4 carries the value x̃_{e_4} = 0.]

Figure 4.3: In an odd cycle, the blossom inequalities are necessary to ensure integrality of all extreme points.

Theorem 4.22 (Fractional Perfect Matching Polytope Theorem) Let G = (V, E) be a graph and x ∈ FPM(G). Then, x is an extreme point of FPM(G) if and only if x_e ∈ {0, 1/2, 1} for all e ∈ E and the edges e for which x_e = 1/2 form node disjoint odd cycles.

Proof: Suppose that x̃ is a half-integral solution satisfying the conditions stated in the theorem. Define the vector w ∈ R^E by w_e = −1 if x̃_e = 0 and w_e = 0 if x̃_e > 0. Consider the face F = { x ∈ FPM(G) : w^T x = 0 }. Clearly, x̃ ∈ F. We claim that F = {x̃}, which shows that x̃ is an extreme point.
For every x ∈ F we have

0 = w^T x = − Σ_{e∈E : x̃_e = 0} x_e,

where every summand x_e is nonnegative.

Thus x_e = 0 for all edges e with x̃_e = 0. Now consider an edge e where x̃_e = 1/2. By assumption, this edge lies on an odd cycle C. It is easy to see that the values of x on the cycle must be alternatingly θ and 1 − θ, since x(δ(v)) = 1 for all v ∈ V (see Figure 4.4). Because the cycle is odd, going around it once forces θ = 1 − θ; hence the only chance that x ∈ FPM(G) is θ = 1/2, and thus x = x̃.

[Figure 4.4: an odd cycle on five vertices with x̃_e = 1/2 on every edge; along the cycle the values x_e alternate between θ and 1 − θ.]

Figure 4.4: If x̃ satisfies the conditions of Theorem 4.22, it is the only point contained in the face F = { x ∈ FPM(G) : w^T x = 0 }.

Assume conversely that x̃ is an extreme point of FPM(G). We first show that x̃ is half-integral. By Theorem 3.36 there is an integral vector c such that x̃ is the unique solution of max { c^T x : x ∈ FPM(G) }.
Construct a bipartite graph H = (VH , EH ) from G by replacing each node v ∈ V by two
nodes v ′ , v ′′ and replacing each edge e = (u, v) by two edges e ′ = (u ′ , v ′′ ) and e ′′ =
(v ′ , u ′′ ) (see Figure 4.5 for an illustration). We extend the weight function c : E → R to
EH by setting c(u ′ , v ′′ ) = c(v ′ , u ′′ ) = c(u, v).


[Figure 4.5: (a) the original graph G; (b) the bipartite graph H on the vertices 1′, . . . , 4′ and 1′′, . . . , 4′′.]

Figure 4.5: Construction of the bipartite graph H in the proof of Theorem 4.22.

Observe that, if x ∈ FPM(G), then x′ defined by x′_{u′,v′′} := x′_{v′,u′′} := x_{uv} is a vector in FPM(H) of twice the objective function value of x. Conversely, if x′ ∈ FPM(H), then x_{uv} := (1/2)(x′_{u′v′′} + x′_{u′′v′}) is a vector in FPM(G) of half of the objective function value of x′.
By Birkhoff’s Theorem (Theorem 4.7), the problem

max { c^T x_H : x_H ∈ FPM(H) }

has an integral optimal solution x*_H. Using the correspondence x_{uv} = (1/2)(x*_{u′v′′} + x*_{u′′v′}) we obtain a half-integral optimal solution to

max { c^T x : x ∈ FPM(G) }.

Since x̃ was the unique optimal solution to this problem, it follows that x̃ must be half-
integral.
If x̃ is half-integral, it follows that the edges {e : x̃_e = 1/2} must form node disjoint cycles (every node that meets a half-integral edge meets exactly two of them). As in the proof of Birkhoff’s Theorem, none of these cycles can be even, since otherwise x̃ would not be an extreme point. ✷

With the help of the previous result, we can now prove the Perfect Matching Polytope
Theorem, which we restate here for convenience.

Theorem 4.23 (Perfect Matching Polytope Theorem) For any graph G = (V, E) without
loops, the convex hull conv(PM(G)) of the perfect matchings in G is identical to the set of
solutions of the following linear system:
x(δ(v)) = 1 for all v ∈ V (4.17a)
x(δ(S)) > 1 for all S ⊆ V, 3 6 |S| 6 |V| − 3 odd (4.17b)
xe > 0 for all e ∈ E. (4.17c)
The inequalities (4.17b) are called blossom inequalities.

Proof: We show the claim by induction on the number |V| of vertices of the graph G =
(V, E). If |V| 6 2, then the claim is trivial. So, assume that |V| > 2 and the claim holds for
all graphs with fewer vertices.


Let P be the polyhedron defined by the inequalities (4.17) and let x ′ ∈ P be any extreme
point of P. Since conv(PM(G)) ⊆ P, the claim of the theorem follows if we can show that
x ′ ∈ PM(G). Since {x ′ } is a minimal face of P, by Theorem 3.6 there exists a subset E ′ ⊆ E
of the edges and a family S ′ of odd subsets S ⊆ V such that x ′ is the unique solution to:

x(δ(v)) = 1 for all v ∈ V (4.18a)


x(δ(S)) = 1 for all S ∈ S ′ (4.18b)
xe = 0 for all e ∈ E ′ . (4.18c)

Case 1: S ′ = ∅.
In this case, x ′ is an extreme point of FPM(G). By Theorem 4.22, x ′ is half-integral and the
fractional edges form node-disjoint odd cycles. On the other hand, x ′ satisfies the blossom
inequalities (4.17b) which means that x ′ can not contain any fractional entries. Thus, x ′
corresponds to a perfect matching.
Case 2: S ′ 6= ∅.
Fix S ∈ S ′ , by definition we have x ′ (δ(S)) = 1. Notice that |S| is odd. The complement S̄ :=
V \ S need not be of odd cardinality. However, if S̄ is even, then V contains an odd number
of vertices and no vector in P satisfies the blossom inequality 0 = x(∅) = x(δ(V)) > 1. In
this case, P is empty and we have P ⊆ conv(PM(G)). Thus, we will assume for the rest of
the proof that both sets, S and S̄ are of odd cardinality.

Let G_S and G_S̄ be the graphs obtained from G by shrinking S and S̄ = V \ S to a single node (see Figure 4.6). Let x^S and x^S̄ be the restriction of x′ to the edges of G_S and G_S̄, respectively. By construction, x^i(δ(S)) = x^i(δ(S̄)) = 1 for i = S, S̄.

[Figure 4.6: the graph G with the odd set S marked; shrinking S yields G_S, and shrinking S̄ = V \ S yields G_S̄.]

Figure 4.6: Graphs G_S and G_S̄ obtained from G by shrinking the odd sets S and V \ S in the proof of Theorem 4.23.

It is easy to see that x^S and x^S̄ satisfy the constraints (4.17) with respect to G_S and G_S̄, respectively. Since 3 ≤ |S| ≤ |V| − 3, both graphs have fewer vertices than G. Thus, by the induction hypothesis, we have x^i ∈ conv(PM(G_i)) for i = S, S̄. Hence, we can write x^i as


convex combinations of perfect matchings of G_i:

x^S = (1/k) Σ_{j=1}^k χ^{M^S_j} (4.19)

x^S̄ = (1/k) Σ_{j=1}^k χ^{M^S̄_j} (4.20)

Here, we have assumed without loss of generality that in both convex combinations the number of vectors used is the same, namely k. Also, we have assumed a special form of the convex combination, which can be justified as follows: x′ is an extreme point of P and thus rational. This implies that the x^i, i = S, S̄, are also rational. Since all λ_j are rational, any convex combination Σ_j λ_j y^j can be written over a common denominator k as Σ_j (μ_j/k) y^j, where all μ_j are integral. Repeating vector y^j exactly μ_j times, we get the form (1/k) Σ_j z^j.

For e ∈ δ(S), the number of indices j such that e ∈ M^S_j is k x^S_e = k x′_e = k x^S̄_e, which is the same as the number of indices j such that e ∈ M^S̄_j. Thus, we can order the matchings so that M^S_j and M^S̄_j share an edge in δ(S) (any M^i_j, i = S, S̄, has exactly one edge from δ(S)). Then, M_j := M^S_j ∪ M^S̄_j is a perfect matching of G, since every vertex in G is matched and no vertex has more than one edge incident with it. We have:

x′ = (1/k) Σ_{j=1}^k χ^{M_j}. (4.21)

Since each M_j is a perfect matching of G, we see from (4.21) that x′ is a convex combination of perfect matchings of G. Since x′ is an extreme point, it follows that x′ must be a perfect matching itself. ✷

4.7 Total Dual Integrality


Another concept for proving integrality of a polyhedron is that of total dual integrality.

Definition 4.24 (Totally dual integral system)
A rational linear system Ax ≤ b is totally dual integral (TDI), if for each integral vector c such that

z_LP = max { c^T x : Ax ≤ b }

is finite, the dual

min { b^T y : A^T y = c, y ≥ 0 }

has an integral optimal solution.

Theorem 4.25 If Ax ≤ b is TDI and b is integral, then the polyhedron P = {x : Ax ≤ b} is integral.

Proof: If Ax ≤ b is TDI and b is integral, then the dual Linear Program

min { b^T y : A^T y = c, y ≥ 0 }


has an optimal integral objective value if it is finite (the optimal vector y* is integral and b is integral by assumption, so b^T y* is also integral). By LP-duality, we see that the value

z_LP = max { c^T x : Ax ≤ b }

is integral for all c ∈ Z^n for which the value is finite. By Theorem 4.3(iv), the polyhedron P is integral. ✷

Example 4.26
Let G = (A ∪ B, E) be a complete bipartite graph. Suppose we wish to solve a generalized form of STABLE SET on G in which we are allowed to pick a vertex more than once. Given weights w_v for the vertices and c_ab for the edges (a, b), we want to solve the following Linear Program:

max Σ_{v∈V} w_v x_v (4.22a)
x_a + x_b ≤ c_ab for all (a, b) ∈ E (4.22b)

We claim that the system of inequalities x_a + x_b ≤ c_ab, (a, b) ∈ E, is TDI. The dual of (4.22) is given by:

min Σ_{(a,b)∈E} c_ab y_ab (4.23a)
Σ_{b∈B} y_ab = w_a for all a ∈ A (4.23b)
Σ_{a∈A} y_ab = w_b for all b ∈ B (4.23c)
y_ab ≥ 0 for all (a, b) ∈ E. (4.23d)

The constraint matrix of (4.23) is the constraint matrix of the assignment problem, which
we have already shown to be totally unimodular (see Example 4.15). Thus, if the weight
vector w is integral, then the dual (4.23) has an optimal integral solution (if it is feasible).

It should be noted that the condition “and b is integral” in the previous theorem is cru-
cial. It can be shown that for any rational system Ax 6 b there is an integer ω such that
(1/ω)Ax 6 (1/ω)b is TDI. Hence, the fact that a system is TDI does not yet tell us any-
thing useful about the structure of the corresponding polyhedron.
We have seen that if we find a TDI system with integral right hand side, then the corre-
sponding polyhedron is integral. The next theorem shows that the converse is also true: if
our polyhedron is integral, then we can also find a TDI system with integral right hand side
defining it.

Theorem 4.27 Let P be a rational polyhedron. Then, there exists a TDI system Ax 6 b
with A integral such that P = P(A, b). Moreover, if P is an integral polyhedron, then b can
be chosen to be integral.

Proof: Let P = {x ∈ R^n : Mx ≤ d} be a rational polyhedron. If P = ∅, then the claim is trivial. Thus, we assume from now on that P is nonempty. Since we can scale the rows of M by multiplying with arbitrary scalars, we can assume without loss of generality that M has integer entries. We also assume that the system Mx ≤ d does not contain any redundant rows. Thus, for any row m_i^T of M there is an x ∈ P with m_i^T x = d_i.
Let S = { s ∈ Z^n : s = M^T y, 0 ≤ y ≤ 1 } be the set of integral vectors which can be written as nonnegative linear combinations of the rows of M where no coefficient is larger than one.


Since y comes from a bounded domain and S contains only integral points, it follows that S is finite. For s ∈ S we define

z(s) := max { s^T x : x ∈ P }.

Observe that, if s ∈ S, say s = Σ_{i=1}^m y_i m_i, and x ∈ P, then m_i^T x ≤ d_i, which means y_i m_i^T x ≤ y_i d_i for i = 1, . . . , m, from which we get that

s^T x = Σ_{i=1}^m y_i m_i^T x ≤ Σ_{i=1}^m y_i d_i ≤ Σ_{i=1}^m |d_i|.

Thus, s^T x is bounded on P and z(s) < +∞ for all s ∈ S. Moreover, the inequality s^T x ≤ z(s) is valid for P. We define the system Ax ≤ b to consist of all inequalities s^T x ≤ z(s) with s ∈ S.
Every row m_i^T of M is a vector in S (by assumption M is integral, and m_i^T is a trivial such combination, with coefficient one for itself and zero for all other rows).
Furthermore, since we have only added valid inequalities to the system, it follows that

P = {x : Ax 6 b} . (4.24)

If P is integral, then by Theorem 4.3 the value z(s) is integral for each s ∈ S, so the system Ax ≤ b has an integral right hand side. The only thing that remains to show is that Ax ≤ b is TDI.
Let c be an integral vector such that z_LP = max { c^T x : Ax ≤ b } is finite. We have to construct an optimal integral solution to the dual

min { b^T y : A^T y = c, y ≥ 0 }. (4.25)

We have

z_LP = max { c^T x : Ax ≤ b }
    = max { c^T x : x ∈ P } (by (4.24))
    = max { c^T x : Mx ≤ d }
    = min { d^T y : M^T y = c, y ≥ 0 } (by LP-duality). (4.26)

Let y* be an optimal solution for the problem in (4.26) and consider the vector s̄ = M^T(y* − ⌊y*⌋). Observe that y* − ⌊y*⌋ has entries in [0, 1]. Moreover, s̄ is integral, since s̄ = M^T y* − M^T ⌊y*⌋ = c − M^T ⌊y*⌋ and c, M^T and ⌊y*⌋ are all integral. Thus, s̄ ∈ S. Now,

z(s̄) = max { s̄^T x : x ∈ P }
     = min { d^T y : M^T y = s̄, y ≥ 0 } (by LP-duality). (4.27)

The vector y* − ⌊y*⌋ is feasible for (4.27) by construction. If v is feasible for (4.27), then v + ⌊y*⌋ is feasible for (4.26). Thus, it follows easily that y* − ⌊y*⌋ is optimal for (4.27). This implies z(s̄) = d^T(y* − ⌊y*⌋), or

z_LP = d^T y* = z(s̄) + d^T ⌊y*⌋. (4.28)


Consider the integral vector ȳ defined as ⌊y*⌋ for the dual variables corresponding to rows in M, one for the dual variable corresponding to the constraint s̄^T x ≤ z(s̄), and zero everywhere else. Clearly, ȳ ≥ 0. Moreover,

A^T ȳ = Σ_{s∈S} ȳ_s s = M^T ⌊y*⌋ + 1 · s̄ = M^T ⌊y*⌋ + M^T(y* − ⌊y*⌋) = M^T y* = c.

Hence, ȳ is feasible for (4.25). Furthermore, by (4.28),

b^T ȳ = z(s̄) + d^T ⌊y*⌋ = z_LP.

Thus, ȳ is an optimal integral solution for the dual (4.25). ✷

4.8 Submodularity and Matroids


In this section we apply our results about TDI systems to prove integrality for a class of
important polyhedra.
Definition 4.28 (Submodular function)
Let N be a finite set. A function f : 2N → R is called submodular, if
f(A) + f(B) > f(A ∩ B) + f(A ∪ B) for all A, B ⊆ N. (4.29)
The function is called nondecreasing if
f(A) 6 f(B) for all A, B ⊆ N with A ⊆ B. (4.30)

Usually we will not be given f “explicitly”, that is, by a listing of all the 2|N| pairs
(A, f(A)). Rather, we will have access to f via an “oracle”, that is, given A we can compute
f(A) by a call to the oracle.

Example 4.29
(i) The function f(A) = |A| is nondecreasing and submodular.
(ii) Let G = (V, E) be an undirected graph with edge weights u : E → R+. The function f : 2^V → R+ defined by f(A) := Σ_{e∈δ(A)} u(e) is submodular but not necessarily nondecreasing.
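Submodularity of the cut function from Example 4.29(ii) can be verified by brute force on a tiny graph. A Python sketch (all names and data are ours):

from itertools import chain, combinations

def subsets(N):
    return chain.from_iterable(combinations(N, k) for k in range(len(N) + 1))

def cut_value(A, edges, u):
    # f(A) = total weight of the edges with exactly one endpoint in A.
    return sum(u[e] for e in edges if (e[0] in A) != (e[1] in A))

N = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]
u = {(0, 1): 1, (1, 2): 2, (0, 2): 3}   # made-up edge weights

f = lambda A: cut_value(A, edges, u)
subs = [set(s) for s in subsets(N)]
print(all(f(A) + f(B) >= f(A | B) + f(A & B)
          for A in subs for B in subs))   # True: (4.29) holds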

Definition 4.30 (Submodular polyhedron, submodular optimization problem)
Let f be submodular and nondecreasing. The submodular polyhedron associated with f is

P(f) := { x ∈ R^n_+ : Σ_{j∈S} x_j ≤ f(S) for all S ⊆ N }. (4.31)

The submodular optimization problem is to optimize a linear objective function over P(f):

max { c^T x : x ∈ P(f) }.

Observe that by the polynomial time equivalence of optimization and separation (see Sec-
tion 5.4) we can solve the submodular optimization problem in polynomial time if we can
solve the corresponding separation problem in polynomial time: We index the polyhedra
by the pair (f, N), and it is easy to see that this class is proper.
We now consider the simple Greedy algorithm for the submodular optimization problem
described in Algorithm 4.1. The surprising result proved in the following theorem is that
the Greedy algorithm in fact solves the submodular optimization problem. But the result is
even stronger:


Algorithm 4.1 Greedy algorithm for the submodular optimization problem.


G REEDY-S UBMODULAR
1 Sort the variables such that c1 > c2 > · · · > ck > 0 > ck+1 > · · · > cn .
2 Set xi := f(Si ) − f(Si−1 ) for i = 1, . . . , k and xi = 0 for i = k + 1, . . . , n, where Si =
{1, . . . , i} and S0 = ∅.
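A direct Python transcription of Algorithm 4.1 might look as follows; the oracle interface and the toy example are our choices (sorting indices by weight corresponds to relabelling the variables as in step 1):

def greedy_submodular(c, f):
    # Algorithm 4.1: f is an oracle for a nondecreasing submodular function
    # with f(emptyset) = 0; variables with nonpositive weight get x_i = 0.
    order = sorted(range(len(c)), key=lambda j: -c[j])
    x = [0] * len(c)
    S = []
    for j in order:
        if c[j] <= 0:
            break
        x[j] = f(S + [j]) - f(S)      # x_i := f(S_i) - f(S_{i-1})
        S.append(j)
    return x

# Toy oracle: a modular function f(S) = sum of fixed weights, which is in
# particular submodular and nondecreasing.
w = [2, 1, 3]
print(greedy_submodular([5, -1, 4], lambda S: sum(w[i] for i in S)))
# -> [2, 0, 3]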

Theorem 4.31 Let f be a submodular and nondecreasing function with f(∅) = 0, and let c : N → R be an arbitrary weight vector.

(i) The Greedy algorithm solves the submodular optimization problem for maximizing
cT x over P(f).

(ii) The system (4.31) is TDI.

(iii) For integral valued f, the polyhedron P(f) is integral.

Proof:
(i) Since f is nondecreasing, we have x_i = f(S_i) − f(S_{i−1}) ≥ 0 for i = 1, . . . , k. Let S ⊆ N. We have to show that Σ_{j∈S} x_j ≤ f(S). By the submodularity of f we have for j ∈ S (apply (4.29) with A = S_j ∩ S and B = S_{j−1}, so that A ∪ B = S_j and A ∩ B = S_{j−1} ∩ S):

f(S_j ∩ S) + f(S_{j−1}) ≥ f(S_j) + f(S_{j−1} ∩ S),

which is equivalent to

f(S_j) − f(S_{j−1}) ≤ f(S_j ∩ S) − f(S_{j−1} ∩ S). (4.32)

Thus,

Σ_{j∈S} x_j = Σ_{j∈S∩S_k} ( f(S_j) − f(S_{j−1}) )
  ≤ Σ_{j∈S∩S_k} ( f(S_j ∩ S) − f(S_{j−1} ∩ S) ) (by (4.32))
  ≤ Σ_{j∈S_k} ( f(S_j ∩ S) − f(S_{j−1} ∩ S) ) (f nondecreasing, so every added term is nonnegative)
  = f(S_k ∩ S) − f(∅)
  ≤ f(S).

Thus, the vector x computed by the Greedy algorithm is in fact contained in P(f). Its solution value is

c^T x = Σ_{i=1}^k c_i ( f(S_i) − f(S_{i−1}) ). (4.33)

We now consider the Linear Programming dual of the submodular optimization problem:

w_D = min Σ_{S⊆N} f(S) y_S
      Σ_{S : j∈S} y_S ≥ c_j for all j ∈ N
      y_S ≥ 0 for all S ⊆ N.


If we can show that c^T x = w_D, then it follows that x is optimal. Construct a vector y by y_{S_i} = c_i − c_{i+1} for i = 1, . . . , k − 1, y_{S_k} = c_k and y_S = 0 for all other sets S ⊆ N. Since we have sorted the variables such that c_1 ≥ c_2 ≥ · · · ≥ c_k ≥ 0 ≥ c_{k+1} ≥ · · · ≥ c_n, it follows that y has only nonnegative entries.
For j = 1, . . . , k we have

Σ_{S : j∈S} y_S ≥ Σ_{i=j}^k y_{S_i} = Σ_{i=j}^{k−1} (c_i − c_{i+1}) + c_k = c_j.

On the other hand, for j = k + 1, . . . , n,

Σ_{S : j∈S} y_S ≥ 0 ≥ c_j.

Hence, y is feasible for the dual. The objective function value for y is:

Σ_{S⊆N} f(S) y_S = Σ_{i=1}^k f(S_i) y_{S_i} = Σ_{i=1}^{k−1} f(S_i)(c_i − c_{i+1}) + f(S_k) c_k
  = Σ_{i=1}^k ( f(S_i) − f(S_{i−1}) ) c_i
  = c^T x,

where the last equality stems from (4.33). Thus, y must be optimal for the dual and
x optimal for the primal.
(ii) The proof of statement (ii) follows from the observation that, if c is integral, then the
optimal vector y constructed is integral.
(iii) Follows from (ii) and Theorem 4.25. ✷

An important class of submodular optimization problems is induced by special submodular functions, namely the rank functions of matroids.

Definition 4.32 (Independence system, matroid)


Let N be a finite set and I ⊆ 2N . The pair (N, I) is called an independence system, if A ∈ I
and B ⊆ A implies that B ∈ I. The sets in I are called independent sets.
The independence system is a matroid if for each A ∈ I and B ∈ I with |B| > |A| there exists
a ∈ B \ A with A ∪ {a} ∈ I.
Given a matroid (N, I), its rank function r : 2N → N is defined by

r(A) := max {|I| : I ⊆ A and I ∈ I} .

Observe that r(A) ≤ |A| for any A ⊆ N and r(A) = |A| if and only if A ∈ I. Thus, we could alternatively specify a matroid (N, I) also by (N, r).

Lemma 4.33 The rank function of a matroid is submodular and nondecreasing.

Proof: The fact that r is nondecreasing is trivial. Let A, B ⊆ N. We must prove that

r(A) + r(B) > r(A ∪ B) + r(A ∩ B).

File: –sourcefile– Revision: –revision– Date: 2017/03/06 –time–GMT


4.8 Submodularity and Matroids 71

Let X ⊆ A ∪ B be independent with |X| = r(A ∪ B) and Y ⊆ A ∩ B be independent with |Y| = r(A ∩ B). Since X and Y are both independent, as long as |Y| < |X| we can, by the matroid exchange property, add an element from X to Y without losing independence. Continuing this procedure, we find an independent set X′ with |X′| = |X| and Y ⊆ X′ by construction. Hence, we can assume that Y ⊆ X. Now,
r(A) + r(B) > |X ∩ A| + |X ∩ B|
= |X ∩ (A ∩ B)| + |X ∩ (A ∪ B)|
> |Y| + |X|
= r(A ∩ B) + r(A ∪ B).
This shows the claim. ✷
Example 4.34 (Matrix matroid)
Let A be an m × n matrix with columns a_1, . . . , a_n. Set N := {1, . . . , n} and define the family I by the condition that S ∈ I if and only if the vectors {a_i : i ∈ S} are linearly independent. Then (N, I) is an independence system. By the Steinitz exchange theorem from linear algebra, we know that (N, I) is in fact also a matroid. ⊳

Example 4.35
Let E = {1, 3, 5, 9, 11} and F := { A ⊆ E : Σ_{e∈A} e ≤ 20 }. Then, (E, F) is an independence system but not a matroid.
The fact that (E, F) is an independence system follows from the property that, if B ⊆ A ∈ F, then Σ_{e∈B} e ≤ Σ_{e∈A} e ≤ 20.
Now consider B := {9, 11} ∈ F and A := {1, 3, 5, 9} ∈ F, where |B| < |A|. However, there is no element in A \ B that can be added to B without losing independence. ⊳

Definition 4.36 (Tree, forest)
Let G = (V, E) be a graph. A forest in G is a subgraph (V, E′) which does not contain a cycle. A tree is a forest which is connected (i.e., which contains a path between any two vertices).

Example 4.37 (Graphic matroid)


Let G = (V, E) be a graph. We consider the pair (E, F), where
F := {T ⊆ E : (V, T ) is a forest} .
Clearly, (E, F) is an independence system, since by deleting edges from a forest, we obtain
again a forest. We show that the system is also a matroid. If (V, T ) is a forest, then it
follows easily by induction on |T | that (V, T ) has exactly |V| − |T | connected components.
Let A ∈ F and B ∈ F with |B| > |A|. Let C_1, . . . , C_k be the connected components of (V, A), where k = |V| − |A|. Since (V, B) has fewer connected components, there must be an edge e ∈ B \ A whose endpoints lie in different components of (V, A). Thus A ∪ {e} is also a forest. ⊳

Lemma 4.38 Let G = (V, E) be a graph and (N, I) be the associated graphic matroid.
Then the rank function r is given by r(A) = |V| − comp(V, A), where comp(V, A) denotes
the number of connected components of the graph (V, A).

Proof: Let A ⊆ E. It is easy to see that in a matroid all maximal independent subsets of A have the same cardinality. Let C_1, . . . , C_k be the connected components of (V, A). For i = 1, . . . , k we can find a spanning tree of C_i. The union of these spanning trees is a forest with Σ_{i=1}^k (|C_i| − 1) = |V| − k edges, all of which are in A. Thus, r(A) ≥ |V| − k.
Assume now that F ⊆ A is an independent set. Then, F can contain only edges of the components C_1, . . . , C_k. Since F is a forest, the restriction of F to any C_i is also a forest, which implies that |F ∩ C_i| ≤ |C_i| − 1. Thus, we have |F| ≤ |V| − k and hence r(A) ≤ |V| − k. ✷
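Lemma 4.38 immediately tells us how to evaluate the rank oracle of a graphic matroid: count connected components, for instance with a union-find structure. A Python sketch (names are ours):

def graphic_rank(V, A):
    # r(A) = |V| - comp(V, A)  (Lemma 4.38); components via union-find.
    parent = {v: v for v in V}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for (u, w) in A:
        parent[find(u)] = find(w)
    comp = len({find(v) for v in V})
    return len(V) - comp

# Triangle on {1, 2, 3} plus the isolated vertex 4: rank 2.
print(graphic_rank({1, 2, 3, 4}, [(1, 2), (2, 3), (1, 3)]))    # 2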


Definition 4.39 (Matroid optimization problem)
Given a matroid (N, I) and a weight function c : N → R, the matroid optimization problem is to find a set A ∈ I maximizing c(A) = Σ_{a∈A} c(a).

By Lemma 4.33 the polyhedron

P(N, I) := { x ∈ R^n_+ : Σ_{j∈S} x_j ≤ r(S) for all S ⊆ N }

is a submodular polyhedron. Moreover, by the integrality of the rank function and Theorem 4.31(iii), P(N, I) is integral. Finally, since r({j}) ≤ 1 for all j ∈ N, it follows that 0 ≤ x ≤ 1 for all x ∈ P(N, I). Thus, in fact we have:

P(N, I) = conv { x ∈ B^n : Σ_{j∈S} x_j ≤ r(S) for all S ⊆ N }. (4.34)

With equation (4.34) it is easy to see that the matroid optimization problem reduces to
the submodular optimization problem. By Theorem 4.31(i) the Greedy algorithm finds an
optimal (integral) solution and thus solves the matroid optimization problem.
It is worthwhile to have a closer look at the Greedy algorithm in the special case of a submodular polyhedron induced by a matroid. Let again S_i = {1, . . . , i} and S_0 = ∅. Since r(S_i) − r(S_{i−1}) ∈ {0, 1}, the algorithm works as follows (a concrete sketch for the graphic matroid follows the list):

(i) Sort the variables such that c1 > c2 > · · · > ck > 0 > ck+1 > · · · > cn .

(ii) Start with S = ∅.


(iii) For i = 1, . . . , k, if S ∪ {i} ∈ I, then set S := S ∪ {i}.
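For the graphic matroid the independence test S ∪ {i} ∈ I in step (iii) is a cycle check, so the Greedy algorithm specializes to Kruskal's algorithm for a maximum weight forest. A Python sketch under these assumptions (names and data are ours):

def max_weight_forest(V, edges, c):
    # Matroid greedy on the graphic matroid: scan the edges of positive
    # weight in order of decreasing weight and keep an edge iff it joins
    # two distinct components (i.e. keeps the edge set a forest).
    parent = {v: v for v in V}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    forest = []
    for e in sorted((e for e in edges if c[e] > 0), key=lambda e: -c[e]):
        ru, rv = find(e[0]), find(e[1])
        if ru != rv:                  # S + {e} is independent
            parent[ru] = rv
            forest.append(e)
    return forest

c = {(1, 2): 4, (2, 3): 3, (1, 3): 2, (3, 4): -1}
print(max_weight_forest({1, 2, 3, 4}, list(c), c))    # [(1, 2), (2, 3)]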

Part II

Algorithms
5 Basics about Problems and Complexity
5.1 Encoding Schemes, Problems and Instances
We briefly review the basics from the theory of computation as far as they are relevant in
our context. Details can be found for instance in the book of Garey and Johnson [GJ79].
Informally, a decision problem is a problem that can be answered by “yes” or ”no”. As an
example consider the problem of asking whether a given graph G = (V, E) has a Hamilto-
nian Tour. Such a problem is characterized by the inputs for which the answer is “yes”. To
make this statement more formal we need to define exactly what we mean by “input”, in
particular if we speak about complexity in a few moments.

[Figure 5.1: the undirected graph on the vertices 1, . . . , 4 with edges (1, 2), (1, 3), (2, 3) and (3, 4) that is encoded in the string below.]

Figure 5.1: Example of an undirected graph.

Let Σ be a finite set of cardinality at least two. The set Σ is called the alphabet. By Σ∗
we denote the set of all finite strings with letters from Σ. The size |x| of a string x ∈ Σ∗ is
defined to be the number of characters in it. For example, the undirected graph depicted
in Figure 5.1 can be encoded as the following string over an appropriate alphabet which
contains all the necessary letters:

({001, 010, 011, 100}, {(001, 010), (001, 011), (010, 011), (011, 100)})

Here, we have encoded the vertices as binary numbers.


A (decision) problem is a subset Π ⊆ Σ∗ . To decide Π means, given x ∈ Σ∗ to decide
whether x ∈ Π or not. The string x is called the input of the problem. One speaks of an
instance of the problem if one asks for a concrete input x whether x belongs to Π or not.

Example 5.1 (Stable Set Problem)


A stable set (or independent set) in an undirected graph G = (V, E) is a subset S ⊆ V of the vertices such that none of the vertices in S are joined by an edge. The set of instances of the stable set problem (STABLE SET) consists of all words (a, b) such that:

• a encodes an undirected graph G = (V, E)

• b encodes an integer k with 0 6 k 6 |V|

• G contains a stable set of size at least k.

Alternatively we could also say that an instance of STABLE SET is given by an undirected graph G = (V, E) and an integer k, and the problem is to decide whether G has a stable set of size at least k. ⊳

Classical complexity theory expresses the running time of an algorithm in terms of the size
of the input, which is intended to measure the amount of data necessary to describe an
instance of a problem. Naturally, there are many ways to encode a problem as words over
an alphabet Σ. We assume the following standard encoding:

• Integers are encoded in binary. The standard binary representation of an integer n uses ⌈log2(|n| + 1)⌉ + 1 bits. We need an additional bit to encode the sign. Hence, the encoding length of an integer n is ⟨n⟩ = ⌈log2(|n| + 1)⌉ + 2.

• Any rational number r has a unique representation r = p/q where p ∈ Z, q ∈ N and p and q do not have a common divisor other than 1. The encoding length of r is ⟨r⟩ := ⟨p⟩ + ⟨q⟩.

• The encoding length of a vector x = (x_1, . . . , x_n)^T ∈ Q^n is ⟨x⟩ := Σ_{i=1}^n ⟨x_i⟩.

• The encoding length of a matrix A = (a_ij) ∈ Q^{m×n} is ⟨A⟩ := Σ_{i,j} ⟨a_ij⟩.

• Graphs are encoded by means of their adjacency matrices, incidence matrices or


adjacency lists.
The adjacency matrix of a graph G = (V, E) with n := |V| nodes is the n × n matrix A(G) ∈ B^{n×n} with a_ij = 1 if (v_i, v_j) ∈ E and a_ij = 0 if (v_i, v_j) ∉ E. The incidence matrix of G is the n × m matrix I(G) with rows corresponding to the vertices of G and columns representing the arcs/edges of G. The column corresponding to arc (v_i, v_j) has a −1 at row i and a +1 at row j. All other entries are zero. Since we have already specified the encoding lengths of matrices, the size of a graph encoding is now defined if one of the matrix representations is used.
The adjacency list representation consists of the number n of vertices and the num-
ber m of edges plus a vector Adj of n lists, one for each vertex. The list Adj[u]
contains all v ∈ V such that (u, v) ∈ E. Figure 5.2 shows an example of such a rep-
resentation for a directed graph. The size of the adjacency list representation of a
graph is hni + hmi + m + n for directed graphs and hni + hmi + 2m + n for undi-
rected graphs.
One might wonder why we have allowed different types of encodings for graphs.
The answer is that for the question of polynomial time solvability it does in fact not
matter which representation we use, since they are all polynomially related in size.

Example 5.2
An instance (G = (V, E), k) of STABLE SET has size ⟨n⟩ + ⟨m⟩ + n + 2m + ⟨k⟩ if we use the adjacency list representation. ⊳


[Figure 5.2: a directed graph together with the table of its adjacency lists Adj[v]; for instance, Adj[5] and Adj[6] are empty (NULL).]

Figure 5.2: Adjacency list representation of a directed graph.

Example 5.3
Suppose we are given a MIP

(MIP) max cT x
Ax 6 b
x>0
x ∈ Zp × Rn−p .
where all data A, b, c is integral. Then, the encoding size of an instance of the decision version of the MIP which asks whether there exists a feasible solution x with c^T x ≥ k is given by

⟨A⟩ + ⟨b⟩ + ⟨c⟩ + ⟨k⟩ + n + p.
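These encoding lengths are straightforward to compute; the following Python sketch (function names are ours) follows exactly the conventions above. Note that ⌈log2(|n| + 1)⌉ equals the bit length of |n|, which Python provides exactly:

from fractions import Fraction

def size_int(n):
    # <n> = ceil(log2(|n| + 1)) + 2  (value bits plus one sign bit).
    return abs(n).bit_length() + 2

def size_rational(r):
    r = Fraction(r)                    # canonical form p/q, gcd(p, q) = 1
    return size_int(r.numerator) + size_int(r.denominator)

def size_vector(x):
    return sum(size_rational(xi) for xi in x)

print(size_int(5))                     # 5
print(size_rational(Fraction(3, 7)))   # 9
print(size_vector([1, -2, 3]))         # 20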

The running time of an algorithm on a specific input is defined to be the sum of times taken
by each instruction executed. The worst case time complexity or simply time complexity of
an algorithm is the function T (n) which is the maximum running time taken over all inputs
of size n (cf. [AHU74, GJ79, GLS88]). An algorithm is a polynomial time algorithm if
T (n) 6 p(n) for some polynomial p.

5.2 The Classes P and NP


Definition 5.4 (Complexity Class P)
The class P consists of all problems which can be decided in polynomial time on a deter-
ministic Turing machine.

Example 5.5
The decision version of the maximum flow problem, that is, to decide whether the following IP has a solution of value greater than a given flow value F,

max Σ_{(i,t)∈A} f(i, t) − Σ_{(t,j)∈A} f(t, j)
Σ_{(j,i)∈A} f(j, i) − Σ_{(i,j)∈A} f(i, j) = 0 for all i ∈ V \ {s, t}
0 ≤ f(i, j) ≤ u(i, j) for all (i, j) ∈ A


is in P, since for instance the Edmonds-Karp algorithm solves it in time O(nm^2), which is polynomial in the input size.¹ ⊳

Definition 5.6 (Complexity Class NP)


The class NP consists of all problems Π ⊆ Σ∗ such that there is a problem Π ′ ∈ P and a
polynomial p such that for each x ∈ Σ∗ the following property holds:

x ∈ Π if and only if there exists y ∈ Σ∗ with |y| 6 p(|x|) and (x, y) ∈ Π ′ .

The word y is called a certificate for x.

It is clear from the definition of NP and P that P ⊆ NP.

Example 5.7 (Stable Set Problem (continued))
STABLE SET is in NP, since if a graph G = (V, E) has a stable set S of size k we can simply use S as a certificate. Clearly it can be checked in polynomial time that S is in fact a stable set in G of size at least k. ⊳

To obtain a classification of “easy” and “difficult” problems we introduce the concept of a


polynomial time reduction:

Definition 5.8 (Polynomial Time Reduction)


A polynomial time reduction of a problem Π to a problem Π ′ is a polynomial time com-
putable function f with the property that x ∈ Π if and only if f(x) ∈ Π ′ .
We say that Π is polynomial time reducible to Π ′ if there exists a polynomial time reduction
from Π to Π ′ . In this case we write Π ∝ Π ′ .

Intuitively, if Π ∝ Π ′ , then Π could be called “easier” than Π ′ , since we can reduce the
solvability of Π to that of Π ′ . In fact, we have the following important observation:

Observation 5.9 If Π ∝ Π ′ and Π ′ ∈ P, then Π ∈ P.

Definition 5.10 (Vertex Cover)
Let G = (V, E) be an undirected graph. A vertex cover in G is a subset C ⊆ V of the vertices of G such that for each edge e ∈ E at least one endpoint of e is in C.

Example 5.11
An instance of the vertex cover problem (VC) consists of a graph G and an integer k. The question posed is whether there exists a vertex cover of size at most k in G.
We claim that VC ∝ STABLE SET. To see this, observe that C is a vertex cover in G if and only if V \ C is a stable set. This immediately leads to a polynomial time reduction. ⊳

Example 5.12
A clique C in a graph G = (V, E) is a subset of the nodes such that every pair of nodes in C is connected by an edge. The clique problem (CLIQUE) asks whether a given graph G contains a clique of size at least a given number k.
Since C is a clique in G if and only if C is a stable set in the complement graph Ḡ, a polynomial time reduction from CLIQUE to STABLE SET is obtained by computing Ḡ. ⊳
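Both reductions are nothing more than polynomial time transformations of the input. A Python sketch (function names are ours) of the two transformations:

def complement(V, E):
    # CLIQUE -> STABLE SET (Example 5.12): build the complement graph,
    # which clearly takes polynomial time.
    E = {frozenset(e) for e in E}
    return [(u, v) for u in V for v in V
            if u < v and frozenset((u, v)) not in E]

def vc_to_stableset(V, E, k):
    # VC -> STABLE SET (Example 5.11): G has a vertex cover of size <= k
    # iff G has a stable set of size >= |V| - k.
    return (V, E, len(V) - k)

print(complement([1, 2, 3], [(1, 2)]))    # [(1, 3), (2, 3)]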

We are now ready to define “hard” problems:


¹There are actually faster algorithms, such as the FIFO-Preflow-Push algorithm of Goldberg and Tarjan, which runs in time O(n^3). This time can even be reduced to O(nm log(n^2/m)) by the use of sophisticated data structures.


Definition 5.13 (NP-complete problem)


A problem Π ′ ∈ NP is called NP-complete if Π ∝ Π ′ for every problem Π ∈ NP.

By the fact that the composition of polynomial time reductions is again a polynomial time
reduction, we have the following useful observation:

Observation 5.14 Suppose that Π is NP-complete. Let Π ′ ∈ NP with the property that
Π ∝ Π ′ . Then, Π ′ is also NP-complete.

From Observation 5.9 we also have the following important fact:

Observation 5.15 Suppose that Π is NP-complete and also Π ∈ P. Then, P = NP.

Most problems encountered in these lecture notes are optimization problems rather than
decision problems. Clearly, to any minimization problem (maximization problem) we can
associate a corresponding decision problem that asks whether there exists a feasible solu-
tion of cost at most (at least) a given value.

Definition 5.16 (NP-hard optimization problem)
An optimization problem whose corresponding decision problem is NP-complete is called NP-hard.

Definition 5.13 raises the question whether there exist NP-complete problems. This ques-
tion was settled in the affirmative in the seminal work by Cook [Coo71] and Karp [Kar72].
An instance of the Satisfiability Problem (SAT) is given by a finite number n of Boolean variables X_1, . . . , X_n and a finite number m of clauses

C_j = L_{i_1} ∨ L_{i_2} ∨ · · · ∨ L_{i_j},

where L_{i_l} ∈ {X_{i_l}, ¬X_{i_l}} is a literal, that is, a variable or its negation. Given an instance of SAT, the question posed is whether there exists an assignment of truth values to the variables such that all clauses are satisfied.

Theorem 5.17 (Cook [Coo71]) SAT is NP-complete.

Karp [Kar72] showed a number of elementary graph problems to be NP-complete, among them STABLE SET, CLIQUE, VC and the TSP. Since then the list of NP-complete problems has been growing enormously (see the book of Garey and Johnson [GJ79] for an extensive list of NP-complete problems).

Theorem 5.18 (Karp [Kar72]) The following problems are all NP-complete:

• STABLE SET
• CLIQUE
• VC
• TSP
• KNAPSACK

We have already remarked above that P ⊆ NP. By Observation 5.15 we would have P =
NP if we find a polynomial time algorithm for a single NP-complete problem. Despite
great efforts to date no one has succeeded in doing so. It is widely believed that NP-
completeness, or, more general, NP-hardness of a problem is a certificate of intractability.


5.3 The Complexity of Integer Programming


We now consider the complexity of Integer Linear Programming. To this end we first
address the case of binary variables.

(BIP) max cT x
Ax 6 b
x>0
x ∈ Bn

Theorem 5.19 Binary Integer Programming (BIP) is NP-hard.

Proof: In order to show that the decision version of BIP (here also called BIP for simplicity)
is NP-complete we have to prove two things:

(i) BIP is in NP.

(ii) There is an NP-complete problem Π such that Π is polynomial time reducible to BIP.

The fact that BIP is contained in NP is easy to see. Suppose that the instance (A, b, c, k) has a feasible solution x* with c^T x* ≥ k. The encoding size of x* ∈ B^n is n (since x* is a binary vector with n entries). Since we can check in polynomial time that x* is in fact a feasible solution, by checking all the constraints and the objective function value, this gives us a certificate that (A, b, c, k) is a “yes”-instance.
In order to prove the completeness of BIP, we reduce the satisfiability problem SAT to BIP. Suppose that we are given an instance (X_1, . . . , X_n, C_1, . . . , C_m) of SAT. For a clause C_j we denote by C_j^+ and C_j^− the index sets of positive and negative literals in C_j, respectively. We now formulate the following BIP:

max 0
Σ_{i∈C_j^+} x_i + Σ_{i∈C_j^−} (1 − x_i) ≥ 1 for j = 1, . . . , m (5.1)
x_i ∈ {0, 1} for i = 1, . . . , n

It is easy to see that the BIP can be set up in polynomial time given the instance of SAT. Consider a clause C_j. If we have a truth assignment to the variables in C_j that satisfies C_j, then the corresponding setting of binary values to the variables in the BIP will satisfy the constraint (5.1), and vice versa. Hence it follows that the BIP has a feasible solution if and only if the given instance of SAT is satisfiable. ✷
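The transformation in the proof is easily made concrete. The following Python sketch produces the constraints (5.1) in the form "coefficient vector, right hand side"; the encoding of a clause as a list of signed integers (+i for X_i, -i for its negation, as in the common DIMACS format) and the function name are our choices:

def sat_to_bip(n, clauses):
    # Constraint (5.1) for clause C_j,
    #   sum_{i in C_j^+} x_i + sum_{i in C_j^-} (1 - x_i) >= 1,
    # rewritten with the variables on the left:
    #   sum_{C_j^+} x_i - sum_{C_j^-} x_i >= 1 - |C_j^-|.
    # (We assume each variable occurs at most once per clause.)
    rows = []
    for clause in clauses:
        coeff = {abs(l): (1 if l > 0 else -1) for l in clause}
        rhs = 1 - sum(1 for l in clause if l < 0)
        rows.append((coeff, rhs))      # meaning: coeff . x >= rhs
    return rows

# (X1 or not X2) and (not X1 or X2 or X3):
print(sat_to_bip(3, [[1, -2], [-1, 2, 3]]))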

Integer Linear Programming and Mixed Integer Programming are generalizations of Binary Integer Programming. Hence, Binary Integer Programming can be reduced to both problems in polynomial time. Can we conclude now that both IP and MIP are NP-complete? We cannot, since we still have to show that in case of a “yes”-instance there exists a certificate of polynomial size. This was not a problem for the BIP, since the certificate just involved n binary values, but for general IP and MIP we have to bound the entries of a solution vector x by a polynomial in ⟨A⟩, ⟨b⟩ and ⟨c⟩. This is somewhat technical and we refer the reader to [Sch86, NW99].

Theorem 5.20 Solving MIP and IP is NP-hard. ✷


Thus, solving general Integer Linear Programs is a hard job. Nevertheless, proving a prob-
lem to be hard does not make the problem disappear. Thus, in the remainder of these notes
we seek to explore the structure of specific problems which sometimes makes it possible to
solve them in polynomial time or at least much more efficiently than brute-force enumera-
tion.

5.4 Optimization and Separation


We have seen a number of problems which had an exponential number of constraints (e.g.
the formulation of the T SP in Example 1.9, the formulation of the minimum spanning
tree problem in Example 1.8). Having such a large number of constraints in a Linear
Programming problem does not defeat the existence of a polynomial time algorithm, at
least not if we do not write down the LP explicitly. We will now make this statement
precise.
To this end, we need a framework for describing linear systems that may be very large
(relative to the size of the combinatorial problem that we want to solve). For instance, we
do not want to list all the 2n inequalities for the LP-relaxation of the MST-problem.

Definition 5.21 (Separation problem over an implicitly given polyhedron)
Given a bounded rational polyhedron P ⊂ Rn and a rational vector v ∈ Rn, either conclude
that v ∈ P or, if not, find a rational vector w ∈ Rn such that wT x < wT v for all x ∈ P.

Definition 5.22 (Optimization problem over an implicitly given polyhedron)
Given a bounded rational polyhedron P ⊂ Rn and a rational objective vector c, either find
x∗ ∈ P maximizing cT x over P or conclude that P is empty.

A famous theorem of Grötschel, Lovász and Schrijver [GLS88] says that the separation
problem is polynomial time solvable if and only if the corresponding optimization problem
is. To make this statement more precise, we need a bit of notation.
We consider classes of polyhedra P = {Po : o ∈ O}, where O is some collection of objects.
For instance, O can be the collection of all graphs, and for a fixed o ∈ O, Po is the convex
hull of all incidence vectors of spanning trees of o. The class P of polyhedra is called
proper, if for each o ∈ O we can compute in polynomial time (with respect to the size of o)
positive integers no and so such that Po ⊂ Rno and Po can be described by a linear system
of inequalities each of which has encoding size at most so .

Example 5.23
We consider the LP-relaxation of the IP-formulation for the MST-problem from Exam-
ple 1.8, that is, the LP obtained by dropping the integrality constraints:
min ∑_{e∈E} ce xe    (5.2a)
∑_{e∈δ(S)} xe ≥ 1    for all ∅ ⊂ S ⊂ V    (5.2b)
∑_{e∈E} xe = n − 1    (5.2c)
0 ≤ x ≤ 1    (5.2d)

The class of polyhedra associated with (5.2) is proper. Given the graph (object) o = G =
(V, E), the dimension in which the polytope Po lives is m = |E|, the number of edges of G.


Clearly, m can be computed in polynomial time. Moreover each of the inequalities (5.2b)
has encoding size at most m + 1, since we need to specify at most m variables to sum up.
Each of the inequalities (5.2d) also has encoding size at most m + 1. ⊳

We say that the separation problem is polynomial time solvable for a proper class of poly-
hedra P, if there exists an algorithm that solves the separation problem for each Po ∈ P in
time polynomial in the size of o and the given rational vector v ∈ Rno . The optimization
problem over P is polynomial time solvable, if there exists a polynomial time algorithm for
solving any instance (Po , c) of the optimization problem, where o ∈ O and c is a rational
vector in Rno .
The exact statement of the theorem of Grötschel, Lovász and Schrijver [GLS88] is as fol-
lows:

Theorem 5.24 For any proper class of polyhedra, the optimization problem is polynomial
time solvable if and only if the separation problem is polynomial time solvable.

The proof is beyond the scope of these lecture notes, and we refer the reader to [GLS88,
NW99] for details. We close this section with an example to give a flavor of the application
of this result.
Example 5.25
Consider again the LP-relaxation of the IP-formulation for the MST-problem. We have
already seen that the class of polytopes associated with this problem is proper. We show
that we can solve the separation problem in polynomial time. To this end, let v ∈ RE be a
vector. We have to check whether v ∈ Po and, if not, find a violated inequality.
As a first step, we check the 2|E| constraints 0 ≤ ve ≤ 1 for all e ∈ E. Clearly, we can do
this in polynomial time. If we find a violated inequality, we are already done. The more
difficult part is to check the 2^n − 2 constraints ∑_{e∈δ(S)} ve ≥ 1.
Consider the graph G = (V, E) with edge weights given by v, that is, edge e has weight ve.
Then, the constraints ∑_{e∈δ(S)} ve ≥ 1 for ∅ ⊂ S ⊂ V say that with respect to this weighting,
G must not have a cut of weight less than 1. Thus, we solve a minimum cut problem in G
with edge weights v. If the minimum cut has weight at least 1, we know that v satisfies all
constraints. In the other case, if S is one side of the minimum cut, which has weight less
than 1, then ∑_{e∈δ(S)} ve ≥ 1 is a violated inequality. Since we can solve the minimum cut
problem in polynomial time, we can solve the separation problem in polynomial time, too.
By the result in Theorem 5.24, it now follows that we can solve the optimization prob-
lem (5.2) in polynomial time. ⊳
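For concreteness, the separation routine just described could look as follows. This sketch is an addition to these notes, not part of the original text; it assumes the networkx package, assumes the point v to be separated is stored in the edge attribute "v" of a connected graph, and uses the Stoer–Wagner algorithm for the global minimum cut.

    import networkx as nx

    def separate(G):
        """Separation oracle for the LP (5.2) (sketch).

        G: connected undirected networkx graph; the edge attribute "v"
        holds the point to be separated. Returns a violated constraint
        or None if v satisfies (5.2b) and (5.2d)."""
        # The 2|E| bound constraints 0 <= v_e <= 1 are checked directly.
        for i, j, data in G.edges(data=True):
            if not (0 <= data["v"] <= 1):
                return ("bound violated", (i, j))
        # The exponentially many cut constraints are separated by a
        # single global minimum cut computation.
        cut_value, (S, _) = nx.stoer_wagner(G, weight="v")
        if cut_value < 1:
            return ("cut constraint violated", set(S))
        return None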

6 Relaxations and Bounds
6.1 Optimality and Relaxations
Suppose that we are given an IP

z = max { cT x : x ∈ X },

where X = {x : Ax ≤ b, x ∈ Zn}, and a vector x∗ ∈ X which is a candidate for optimality.
Is there a way we can prove that x∗ is optimal? In the realm of Linear Programming the
Duality Theorem provides a nice characterization of optimality. Similarly, the concept of
complementary slackness is useful for (continuous) Linear Programs.
Unfortunately, there are no such nice characterizations of optimality for Integer Linear
Programs. However, there is a simple concept that sometimes suffices to prove optimality,
namely the concept of bounding with the help of upper and lower bounds. If we are given
a lower bound z̲ ≤ z and an upper bound z ≤ z̄ and we know that z̄ − z̲ ≤ ε for some ε > 0,
then we have enclosed the optimal objective function value within an accuracy of ε.
Moreover, if z̄ = z̲, then we have found the optimal value.
In particular, consider the case mentioned at the beginning of this section, where we had a
candidate x∗ for optimality. Then, z̲ = cT x∗ is a lower bound for the optimal solution value z.
If we are able to find an upper bound z̄ such that cT x∗ = z̄, we have shown that x∗ is an
optimal solution.
In the case of maximization (minimization) problems lower (upper) bounds are usually
called primal bounds, whereas upper (lower) bounds are referred to as dual bounds.
In the sequel we consider the case of a maximization problem

max { cT x : x ∈ X ⊆ Rn };

the case of a minimization problem is analogous.

Primal bounds Every feasible solution x ∈ X provides a lower bound z̲ = cT x for the
optimal objective function value z. This is the only known way to establish primal
bounds and the reason behind the name: every primal bound goes with a feasible
solution for the (primal) problem.
Sometimes, it is hard to find a feasible solution. However, sometimes we can find
a feasible solution (and thus a lower bound) almost trivially. For instance, in the
traveling salesman problem (see Example 1.9), any permutation of the cities gives a
feasible solution.

Dual bounds In order to find upper bounds one usually replaces the optimization problem
by a simpler problem whose optimal value is at least as large as z. For the
“easier” problem one either optimizes over a larger set or one replaces the objective
function by a version with larger value everywhere.

The most useful concept for proving dual bounds is that of a relaxation:

Definition 6.1 (Relaxation)
Consider the following two optimization problems:

(IP)   z = max { cT x : x ∈ X ⊆ Rn }
(RP)   zRP = max { f(x) : x ∈ T ⊆ Rn }

The problem RP is called a relaxation of IP if X ⊆ T and f(x) ≥ cT x for all x ∈ X.

Clearly, if RP is a relaxation of IP, then zRP ≥ z. We collect this property and a few more
easy but useful observations:

Observation 6.2 Let RP be a relaxation of IP. Then, the following properties hold:

(i) zRP ≥ z

(ii) If RP is infeasible, then IP is also infeasible.

(iii) Suppose that x∗ is an optimal solution to RP. If x∗ ∈ X and f(x∗) = cT x∗, then x∗ is
also optimal for IP.

One of the most useful relaxations of an integer program is the Linear Programming
relaxation:

Definition 6.3 (Linear Programming relaxation)
The Linear Programming relaxation of the IP max { cT x : x ∈ P ∩ Zn } with formulation
P = {x : Ax ≤ b} is the Linear Program zLP = max { cT x : x ∈ P }.

Example 6.4
Consider the following slight modification of the IP from Example 1.2 (the only modifica-
tion is in the objective function):

z = max −2x1 + 5x2
    2x2 − 3x1 ≤ 2
    x1 + x2 ≤ 5
    1 ≤ x1 ≤ 3
    1 ≤ x2 ≤ 3
    x1, x2 ∈ Z

A primal bound is obtained by observing that (2, 3) is feasible for the IP, which gives z̲ =
−2·2 + 5·3 = 11 ≤ z. We obtain a dual bound by solving the LP-relaxation. This relaxation
has an optimal solution (4/3, 3) with value zLP = 37/3. Hence, we have an upper bound
z̄ = 37/3 ≥ z. But we can improve this upper bound slightly. Observe that for integral
x1, x2 the objective function value is also integral. Hence, if z ≤ 37/3, we also have z ≤
⌊37/3⌋ = 12 (in fact, z = 11). ⊳



Figure 6.1: Feasible region for the LP-relaxation in Example 6.4 and feasible set of the IP.

Example 6.5 (Knapsack Problem)
Consider the Linear Programming relaxation of the Knapsack Problem (KNAPSACK) from
Example 1.3.

max ∑_{i=1}^{n} ci xi
∑_{i=1}^{n} ai xi ≤ b
0 ≤ xi ≤ 1    for i = 1, . . . , n.

In this relaxation we are not required to either pack an item i completely or not at all;
rather, we are allowed to pack an arbitrary fraction 0 ≤ xi ≤ 1 of the item into the knapsack.
Although we could solve the LP-relaxation by standard Linear Programming methods, this
is in fact overkill! Suppose that we sort the items into nonincreasing order according to
their “bang-for-the-buck” ratio ci/ai. Without loss of generality we assume now that
c1/a1 ≥ c2/a2 ≥ · · · ≥ cn/an. Let i0 be the largest index such that ∑_{i=1}^{i0} ai ≤ b. We pack
all items 1, . . . , i0 completely (xi = 1 for i = 1, . . . , i0) and fill the remaining empty space
s = b − ∑_{i=1}^{i0} ai with s/a_{i0+1} units of item i0 + 1 (x_{i0+1} = s/a_{i0+1}). All other items stay
completely out of the knapsack. It is easy to see that this in fact yields an optimal
(fractional) packing. ⊳
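As a small added illustration (not part of the original text), the greedy procedure just described can be written in a few lines; the function name and data layout (profits c, sizes a, capacity b as plain Python lists and numbers) are ad hoc.

    def fractional_knapsack(c, a, b):
        """Solve the LP-relaxation of KNAPSACK greedily.

        Items are considered in nonincreasing order of c_i / a_i;
        returns the optimal fractional solution x and its value."""
        n = len(c)
        order = sorted(range(n), key=lambda i: c[i] / a[i], reverse=True)
        x = [0.0] * n
        value, remaining = 0.0, b
        for i in order:
            if a[i] <= remaining:            # item fits completely
                x[i] = 1.0
                remaining -= a[i]
                value += c[i]
            else:                            # fill the rest fractionally
                x[i] = remaining / a[i]
                value += c[i] * x[i]
                break
        return x, value

    # Example: three items, knapsack size 4
    print(fractional_knapsack([6, 5, 4], [2, 2, 3], 4))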

We turn to the relation between formulations and the quality of Linear Programming relax-
ations:

Lemma 6.6 Let P1 and P2 be formulations for the set X ⊆ Rn, where P1 is better
than P2. Consider the integer program zIP = max { cT x : x ∈ X } and denote by
z_i^LP = max { cT x : x ∈ Pi } for i = 1, 2 the values of the associated Linear Programming
relaxations. Then we have

zIP ≤ z_1^LP ≤ z_2^LP

for all vectors c ∈ Rn.

Proof: The result immediately follows from the fact that X ⊆ P1 ⊆ P2 . ✷


Example 6.7
By using the formulation given in Example 6.4 we obtained an upper bound of 12 for the
optimal value of the IP. If we use the ideal formulation

z = max −2x1 + 5x2
    x1 + x2 ≤ 5
    −x1 + x2 ≤ 1
    1 ≤ x1 ≤ 3
    1 ≤ x2 ≤ 3
    x1, x2 ∈ Z

then we obtain the optimal solution for the relaxation (2, 3) with objective function value 11.
Thus, (2, 3) is an optimal solution for the original IP. ⊳

6.2 Combinatorial Relaxations


Sometimes the relaxation of a problem is a combinatorial optimization problem. In this
case we speak of a combinatorial relaxation. We have already seen an example of such a
problem in Example 6.5, where the LP-relaxation of KNAPSACK turned out to be solvable
by combinatorial methods. Combinatorial relaxations are particularly nice if we can solve
them in polynomial time.

Example 6.8 (Traveling Salesman Problem)
Consider the TSP from Example 1.9.

min ∑_{i=1}^{n} ∑_{j=1}^{n} cij xij
∑_{j: j≠i} xij = 1    for i = 1, . . . , n
∑_{i: i≠j} xij = 1    for j = 1, . . . , n
∑_{i∈S} ∑_{j∈S} xij ≤ |S| − 1    for all ∅ ⊂ S ⊂ V    (6.1)
x ∈ B^{n(n−1)}

Observe that if we drop the subtour elimination constraints (6.1) we have in fact an assignment
problem. An assignment in a directed graph G = (V, A) is a subset T ⊆ A of the
arcs such that each vertex v ∈ V has exactly one outgoing arc and exactly one incoming
arc. If we interpret the TSP in the graph theoretic setting as we already did in Example 1.9
we get:

zTSP = min { ∑_{(i,j)∈T} cij : T ⊆ A forms a tour }
     ≥ min { ∑_{(i,j)∈T} cij : T ⊆ A forms an assignment }


Assignments in directed graphs are the analogue of matchings in undirected graphs.
Matchings will play an important role throughout these lecture notes, one reason being
that a theorem by Edmonds about perfect matching polyhedra sparked the interest in
polyhedral combinatorics.
We remark here that we can decide in polynomial time whether a given graph has a perfect
matching. We can also compute a perfect matching of minimum weight in polynomial
time.
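To give a concrete flavor, here is a small sketch (an added illustration, not from the original text) that computes the assignment lower bound for a TSP distance matrix using SciPy's linear_sum_assignment; the diagonal is blocked by a large constant since an assignment relaxing a tour must not use arcs (i, i).

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assignment_bound(c):
        """Lower bound z_AP <= z_TSP from the assignment relaxation.

        c: (n x n) distance matrix; diagonal entries are ignored."""
        cost = np.array(c, dtype=float)
        n = len(cost)
        big = cost.max() * n + 1.0      # large enough to block the diagonal
        np.fill_diagonal(cost, big)
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].sum()

    c = [[0, 2, 9, 10],
         [1, 0, 6, 4],
         [15, 7, 0, 8],
         [6, 3, 12, 0]]
    print(assignment_bound(c))  # a lower bound on the optimal tour length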
Example 6.9 (Symmetric Traveling Salesman Problem)
If in the TSP all distances are symmetric, that is cij = cji for all i, j, one speaks of a symmet-
ric traveling salesman problem. This problem is best modelled on a complete undirected
graph G = (V, E). Each tour corresponds to a Hamiltonian cycle in G, that is, a cycle which
touches each vertex exactly once.
Observe that every TSP-tour (i.e., a Hamiltonian cycle) also contains a spanning tree: If
we remove one edge of the tour we obtain a path, a somewhat degenerate spanning
tree. Hence, the problem of finding a minimum spanning tree in G = (V, E) with edge
weights cij is a relaxation of the symmetric TSP:

zTSP ≥ zMST.

Recall that a minimum spanning tree can be computed in polynomial time, for instance by
Kruskal's algorithm. ⊳

The relaxation in the previous example can actually be used to prove an improved lower
bound on the length of an optimal TSP-tour. Suppose that we are given an instance of the
symmetric TSP where the distances satisfy the triangle inequality, that is, we have

cij ≤ cik + ckj    for all i, j, k.

Let T be a minimum spanning tree in G = (V, E) and let O denote the subset of the vertices
which have an odd degree in T. Observe that O contains an even number of vertices, since
the degrees in T sum up to 2(n − 1), which is even. Build a complete auxiliary
graph H = (O, EO) where the weight of an edge (u, v) coincides with the corresponding
edge weight in G. Since H contains an even number of vertices and is complete, there exists
a perfect matching M in H. We can compute a perfect matching with minimum weight in H
in polynomial time.
Lemma 6.10 The total weight c(M) = ∑_{e∈M} ce of the minimum weight perfect matching
in H is at most zTSP/2.

Proof: Let O = (v1, . . . , v2k) be the sequence of odd degree vertices in T in the order in
which they are visited by the optimum TSP-tour T∗. Consider the following two perfect
matchings in H:

M1 = {(v1, v2), (v3, v4), . . . , (v2k−1, v2k)}
M2 = {(v2, v3), (v4, v5), . . . , (v2k, v1)}

Figure 6.2 shows an illustration of the matchings M1 and M2. By the triangle inequality
we have c(T∗) ≥ c(M1) + c(M2), so that at least one of the two matchings has weight at
most c(T∗)/2. Thus, the minimum weight perfect matching must have weight at most
c(T∗)/2 ≤ zTSP/2. ✷

Consider the graph Ḡ = (V, T ∪ M) obtained by adding the edges of the minimum weight
perfect matching M to the spanning tree T (see Figure 6.3(c)). Then, by construction any


Figure 6.2: The two matchings constructed in Lemma 6.10 from the optimal TSP-tour
((a) the TSP-tour; (b) the two matchings). Matching M1 contains all the solid edges, M2
all the dashed ones.

node in Ḡ has even degree. Since Ḡ is also connected (by the fact that it contains a spanning
tree), it follows that Ḡ is Eulerian (see e.g. [Har72, AMO93]), that is, it contains a cycle W
which traverses every edge exactly once. We have that

c(W) = ∑_{e∈W} c(e) = c(T) + c(M) ≤ zTSP + (1/2) zTSP = (3/2) zTSP.    (6.2)

In other words, (2/3) c(W) ≤ zTSP, that is, (2/3) c(W) is a lower bound for the optimal TSP-tour.
Moreover, we can convert the cycle W into a feasible tour T′ by “shortcutting” the Eulerian
cycle: we start to follow W at node 1. If we have reached some node i, we continue to the
first subsequent node in W which we have not visited yet. Due to the triangle inequality,
the tour obtained this way has weight at most that of W, which by (6.2) is at most (3/2) zTSP.
The algorithm we just described is known as Christofides' algorithm. Figure 6.3 shows an
example of the execution. Let T′ be the tour found by the algorithm. By (6.2) we know that

(2/3) c(T′) ≤ zTSP ≤ c(T′).

We summarize our results:

Theorem 6.11 Given any instance of the symmetric TSP with triangle inequality, the
algorithm of Christofides returns a tour of length at most (3/2) zTSP. ✷

Christofides' algorithm falls into the class of approximation algorithms. These are algorithms
with provable worst-case performance guarantees.
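The following sketch (added for illustration; it is not part of the original notes and assumes the networkx package and a complete graph G with edge attribute "weight") traces exactly the steps just described: minimum spanning tree, minimum weight perfect matching on the odd-degree vertices, Eulerian circuit, and shortcutting.

    import networkx as nx

    def christofides(G):
        """Christofides' algorithm on a complete graph with triangle
        inequality; edge attribute "weight" holds the distances."""
        T = nx.minimum_spanning_tree(G, weight="weight")
        odd = [v for v in T.nodes() if T.degree(v) % 2 == 1]
        # Minimum weight perfect matching on the odd-degree vertices,
        # obtained as a maximum weight matching with negated weights.
        H = nx.Graph()
        for i, u in enumerate(odd):
            for v in odd[i + 1:]:
                H.add_edge(u, v, neg=-G[u][v]["weight"])
        M = nx.max_weight_matching(H, maxcardinality=True, weight="neg")
        # The union of tree and matching is connected with even degrees,
        # hence Eulerian; shortcut the Eulerian circuit to a tour.
        multi = nx.MultiGraph(T)
        multi.add_edges_from(M)
        tour, seen = [], set()
        for u, _ in nx.eulerian_circuit(multi):
            if u not in seen:
                seen.add(u)
                tour.append(u)
        return tour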

Example 6.12 (1-tree relaxation for the symmetric TSP)
Another relaxation of the symmetric TSP is obtained by the following observation: Every
tour consists of two edges incident to node 1 and a path through the nodes {2, . . . , n} (in some
order). Observe that every path is also a tree. Call a subgraph T of G = (V, E) a 1-tree if
T consists of two edges incident on node 1 and the edges of a tree on the nodes {2, . . . , n}.


Figure 6.3: Example of the execution of Christofides' algorithm for the symmetric TSP
with triangle inequality: (a) the minimum spanning tree T; (b) the minimum weight perfect
matching M on the vertices of odd degree in T; (c) the Eulerian graph Ḡ = (V, T ∪ M);
(d) the approximate tour T′ obtained by shortcutting.


We can now derive the following estimate:

zTSP = min { ∑_{e∈T} ce : T is a tour }
     ≥ min { ∑_{e∈T} ce : T is a 1-tree }    (6.3)
     =: z1-Tree

The problem in (6.3) is called the 1-tree relaxation of the (symmetric) TSP. Observe that
we can find an optimal 1-tree easily in polynomial time: Let e, f ∈ E with e ≠ f be the two
lightest edges incident with node 1 and let H = G \ {1} be the graph obtained from G by
removing vertex 1. Then, the weight of an optimal 1-tree is ce + cf + MST(H), where
MST(H) is the weight of a minimum spanning tree in H. ⊳
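A sketch of this computation (an added illustration, not part of the original notes; networkx and edge attribute "weight" assumed):

    import networkx as nx

    def one_tree_bound(G, root=1):
        """Weight of a minimum 1-tree of G: the two lightest edges at
        `root` plus a minimum spanning tree on the remaining vertices."""
        incident = sorted(G.edges(root, data="weight"), key=lambda e: e[2])
        e, f = incident[0], incident[1]      # two lightest edges at root
        H = G.copy()
        H.remove_node(root)
        mst = nx.minimum_spanning_tree(H, weight="weight")
        return e[2] + f[2] + mst.size(weight="weight")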

6.3 Lagrangian Relaxation

The shortest path problem consists of finding a path of minimum weight between given
vertices s and t in a directed (or undirected) graph G. Here, we consider the directed
version, that is, we are given a directed graph G = (V, A) with arc weights c : A → R+ .
We set up binary variables x(i, j) for (i, j) ∈ A where x(i, j) = 1 if the path from s to t uses
arc (i, j). Then, the shortest path problem (which is a special case of the minimum cost
flow problem) can be written as the following IP:
min ∑_{(i,j)∈A} c(i, j) x(i, j)

∑_{j:(j,i)∈A} x(j, i) − ∑_{j:(i,j)∈A} x(i, j) = 1 if i = t,  −1 if i = s,  0 otherwise    (6.4)

x ∈ BA

The mass balance constraints (6.4) ensure that any feasible solution contains a path from s
to t. Now, the shortest path problem can be solved very efficiently by combinatorial algo-
rithms. For instance, Dijkstra’s algorithm implemented with Fibonacci-heaps runs in time
O(m + n log n), where n = |V| and m = |A|. In other words, the above IP is “easy” to
solve.
Now, consider the addition of the following constraint to the IP: in addition to the arc
weights c : A → R+ we are also given a second set of arc costs d : A → R+ . Given a
budget B we wish to find a shortest (s, t)-path with respect to the c-weights subject to the
constraint that the d-cost of the path is at most B. This problem is known as the resource
constrained shortest path problem (RCSP).

Lemma 6.13 The RCSP is NP-hard.

Proof: We show the claim by providing a polynomial time reduction from KNAPSACK,
which is known to be NP-complete. Given an instance of KNAPSACK with items
{1, . . . , n} we construct a directed acyclic graph for the RCSP as depicted in Figure 6.4. Let
Z := max{ci : i = 1, . . . , n}. For each i = 1, . . . , n there are two arcs ending at node i. One
represents the action that item i is packed and has c-weight Z − ci ≥ 0 and d-weight ai.
The other arc corresponds to the case when item i is not packed into the knapsack and has


c-weight Z and d-weight 0. We set the budget in the RCSP to be D := b, where b is the
bound specified in the instance of KNAPSACK. It is now easy to see that there is a path P
from s to t of c-weight at most nZ − u and d-weight at most b if and only if we can pack
items of total profit at least u into the knapsack of size b. ✷

Figure 6.4: Graph used in the proof of NP-hardness of the RCSP.


Apparently, the addition of the constraint ∑_{(i,j)∈A} d(i, j) x(i, j) ≤ B makes our life much
harder! Essentially, we are in the situation where we are given an integer program of the
following form:

z = max cT x    (6.5a)
    Dx ≤ d    (6.5b)
    x ∈ X,    (6.5c)

where X = {x : Ax ≤ b, x ∈ Zn} and Dx ≤ d are some “complicating constraints” (D is a
k × n matrix and d ∈ Rk).
Of course, we can easily obtain a relaxation of (6.5) by simply dropping the complicating
constraints Dx ≤ d. However, this might give us very bad bounds. Consider for instance
the RCSP depicted in Figure 6.5. If we respect the budget constraint ∑_{a∈A} da xa ≤ B,
then the length of the shortest path from s to t is Ω. Dropping the constraint allows the
shortest path consisting of arc a1, which has length 1.

Figure 6.5: A simple RCSP where dropping the constraint ∑_{a∈A} da xa ≤ B leads to a
weak lower bound: the two parallel arcs from s to t have ca1 = 1, da1 = B + 1 and
ca2 = Ω, da2 = 1.

An alternative to dropping the constraints is to incorporate them into the objective function
with the help of Lagrange multipliers:

Definition 6.14 (Lagrangian Relaxation)
For a vector u ∈ Rm+ the Lagrangian relaxation IP(u) for the IP (6.5) is defined as the
following optimization problem:

(IP(u))    z(u) = max cT x + uT (d − Dx)    (6.6a)
                  x ∈ X    (6.6b)


We note that there is no “unique” Lagrangian relaxation, since we are free to include or
exclude some constraints by the definition of the set X. The relaxation IP(u) as defined
above is usually called the relaxation obtained by relaxing the constraints Dx 6 d. The
vector u > 0 is referred to as the price, dual variable or Lagrangian multiplier associated
with the constraints Dx 6 d.

Lemma 6.15 For any u ≥ 0, the problem IP(u) is a relaxation of the IP (6.5).

Proof: The fact that any feasible solution to the IP is also feasible for IP(u) is trivial:
X ⊇ {x : Dx ≤ d, x ∈ X}. If x is feasible for the IP, then Dx ≤ d, i.e., d − Dx ≥ 0. Since
u ≥ 0 we have uT (d − Dx) ≥ 0 and thus cT x + uT (d − Dx) ≥ cT x as required. ✷

Consider again the RCSP. What happens if we relax the budget constraint

∑_{(i,j)∈A} d(i, j) x(i, j) ≤ B?

For u ∈ R+ the Lagrangian relaxation is

min ∑_{(i,j)∈A} c(i, j) x(i, j) + u ( ∑_{(i,j)∈A} d(i, j) x(i, j) − B )

∑_{j:(j,i)∈A} x(j, i) − ∑_{j:(i,j)∈A} x(i, j) = 1 if i = t,  −1 if i = s,  0 otherwise

x ∈ BA.

If we rearrange the terms in the objective function, we obtain

∑_{(i,j)∈A} (c(i, j) + u d(i, j)) x(i, j) − uB.

In other words, IP(u) asks for a path P from s to t minimizing ∑_{a∈P} (ca + u da) − uB.
If u ≥ 0 is fixed, this is the same as minimizing ∑_{a∈P} (ca + u da). Hence, IP(u) is a
(standard) shortest path problem with weights c + ud on the arcs.
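For illustration (an addition to the text), evaluating z(u) for a fixed multiplier u then amounts to a single Dijkstra run on the reweighted arcs; a minimal sketch with an ad hoc data layout:

    import heapq

    def lagrangian_value(arcs, s, t, B, u):
        """z(u) for the RCSP: shortest (s,t)-path w.r.t. c + u*d, minus u*B.

        arcs: dict mapping (i, j) to (c_ij, d_ij) with c, d >= 0."""
        adj = {}
        for (i, j), (c, d) in arcs.items():
            adj.setdefault(i, []).append((j, c + u * d))
        dist, heap = {s: 0.0}, [(0.0, s)]
        while heap:                          # plain Dijkstra
            du, v = heapq.heappop(heap)
            if du > dist.get(v, float("inf")):
                continue
            for w, length in adj.get(v, []):
                if du + length < dist.get(w, float("inf")):
                    dist[w] = du + length
                    heapq.heappush(heap, (dist[w], w))
        return dist[t] - u * B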
We have seen that IP(u) is a relaxation of IP for every u ≥ 0; in particular, z(u) ≥ zIP.
In order to get the best upper bound for zIP we can optimize over u, that is, we solve the
Lagrangian Dual Problem:

(LD)    wLD = min { z(u) : u ≥ 0 }.    (6.7)

Lemma 6.16 Let u ≥ 0 and x(u) be an optimal solution for IP(u). Suppose that the
following two conditions hold:

(i) Dx(u) ≤ d (that is, x(u) is feasible for IP);

(ii) (Dx(u))i = di whenever ui > 0 (that is, x(u) is complementary to u).

Then, x(u) is an optimal solution for the original IP.


Proof: Since x(u) is feasible for IP, we have cT x(u) ≤ z. On the other hand,

z ≤ cT x(u) + uT (d − Dx(u))    (by Lemma 6.15)
  = cT x(u) − ∑_{i=1}^{m} ui (Dx(u) − d)i = cT x(u),

since each term ui (Dx(u) − d)i vanishes by assumption (ii). Thus, cT x(u) = z and x(u) is
optimal as claimed. ✷

Observe that if we dualize some equality constraints Dx = d, then the Lagrange multi-
plier u is unrestricted in sign. In this case, the Lagrangian dual becomes:

(LD) wLD = min {z(u) : u ∈ Rm } . (6.8)

Moreover, in this case, assumption (ii) of Lemma 6.16 is automatically satisfied. Hence, if
x(u) is feasible for the IP, then x(u) is also optimal.
What kind of bounds can we expect from LD? The following theorem addresses this ques-
tion:

Theorem 6.17 Let X = {x : Ax ≤ b, x ∈ Zn}, where A and b are rational, and let z denote
the optimal value of the IP (6.5):

z = max { cT x : Dx ≤ d, x ∈ X }

Then for the value of the Lagrangian dual as defined in (6.7) we have:

wLD = max { cT x : Dx ≤ d, x ∈ conv(X) } ≥ z.

Proof: We have:

wLD = min { z(u) : u ≥ 0 }
    = min_{u≥0} max { cT x + uT (d − Dx) : x ∈ X }
    = min_{u≥0} max { (c − DT u)T x + uT d : x ∈ X }
    = min_{u≥0} max { (c − DT u)T x + uT d : x ∈ conv(X) }
    = min_{u≥0} max { cT x + uT (d − Dx) : x ∈ conv(X) },

where the fourth equality follows from the fact that the objective function is linear. If
X = ∅, then conv(X) = ∅ and wLD = −∞ since the inner maximum is taken over the
empty set for every u. Since z = −∞, the result holds in this case.
If X ≠ ∅, then by Theorem 3.65 conv(X) is a rational polyhedron. Let xk, k ∈ K, be the extreme
points and rj, j ∈ J, be the extreme rays of conv(X). Fix u ≥ 0. We have

z(u) = max { cT x + uT (d − Dx) : x ∈ conv(X) }
     = +∞ if (cT − uT D) rj > 0 for some j ∈ J, and otherwise
     = cT xk + uT (d − Dxk) for some k ∈ K.


Since for the minimization of z(u) it suffices to consider the case of finite z(u), we have

wLD = min_{u≥0: (cT − uT D) rj ≤ 0 for all j∈J}  max_{k∈K} ( cT xk + uT (d − Dxk) )

We can restate the last problem as follows:

wLD = min t    (6.9a)
      t + (Dxk − d)T u ≥ cT xk    for k ∈ K    (6.9b)
      (Drj)T u ≥ cT rj    for j ∈ J    (6.9c)
      u ∈ Rm+, t ∈ R    (6.9d)

Observe that the problem (6.9) is a Linear Program. Hence by Linear Programming duality
(cf. Theorem 2.9) we get:
wLD = max ∑_{k∈K} αk cT xk + ∑_{j∈J} βj cT rj
      ∑_{k∈K} αk = 1
      ∑_{k∈K} αk (Dxk − d) + ∑_{j∈J} βj Drj ≤ 0
      αk, βj ≥ 0    for k ∈ K, j ∈ J

Rearranging terms, this leads to:

wLD = max cT ( ∑_{k∈K} αk xk + ∑_{j∈J} βj rj )
      ∑_{k∈K} αk = 1
      D ( ∑_{k∈K} αk xk + ∑_{j∈J} βj rj ) ≤ d ( ∑_{k∈K} αk )
      αk, βj ≥ 0    for k ∈ K, j ∈ J
Since any point of the form ∑_{k∈K} αk xk + ∑_{j∈J} βj rj with ∑_{k∈K} αk = 1 and
αk, βj ≥ 0 is in conv(X) (cf. Minkowski's Theorem), this gives us:

wLD = max { cT x : x ∈ conv(X), Dx ≤ d }.

This completes the proof. ✷

Theorem 6.17 tells us exactly how strong the Lagrangian dual is. Under some circumstances,
the bounds provided are no better than those of the LP-relaxation: More precisely, if
conv(X) = {x : Ax ≤ b}, then by Theorem 6.17, wLD = max { cT x : Ax ≤ b, Dx ≤ d }, and
wLD is exactly the value of the LP-relaxation of (6.5):

zLP = max { cT x : Ax ≤ b, Dx ≤ d }

However, since

conv(X) = conv({x ∈ Zn : Ax ≤ b}) ⊆ {x ∈ Rn : Ax ≤ b},


we always have

z ≤ wLD ≤ zLP,

and in most cases we will have wLD < zLP. In Chapter 11 we will learn more about
Lagrangian duality. In particular, we will be concerned with the question of how to solve
the Lagrangian dual.

Example 6.18 (Lagrangian relaxation for the symmetric TSP)
The symmetric TSP can be restated as the following Integer Linear Program:

zTSP = min ∑_{e∈E} ce xe    (6.10a)
       ∑_{e∈δ(i)} xe = 2    for all i ∈ V    (6.10b)
       ∑_{e∈E(S)} xe ≤ |S| − 1    for all S with 2 ≤ |S| ≤ |V| − 1    (6.10c)
       x ∈ BE    (6.10d)

Here δ(i) denotes the set of edges incident with node i and E(S) is the set of edges that
have both endpoints in S.
Let x be any feasible solution to the LP-relaxation of (6.10). Let S ⊂ V with 2 ≤ |S| ≤ |V| − 1
and S̄ = V \ S. We denote by (S, S̄) the set of edges which have one endpoint in S and the
other one in S̄. Due to the constraints (6.10b) we have:

|S| = (1/2) ∑_{i∈S} ∑_{e∈δ(i)} xe.

Hence,

|S| − ∑_{e∈E(S)} xe = (1/2) ∑_{i∈S} ∑_{e∈δ(i)} xe − ∑_{e∈E(S)} xe = (1/2) ∑_{e∈(S,S̄)} xe.    (6.11)

By the analogous calculation we get

|S̄| − ∑_{e∈E(S̄)} xe = (1/2) ∑_{e∈(S,S̄)} xe.    (6.12)

From (6.11) and (6.12) we conclude that ∑_{e∈E(S)} xe ≤ |S| − 1 if and only if
∑_{e∈E(S̄)} xe ≤ |S̄| − 1. This calculation shows that in fact half of the subtour elimination
constraints (6.10c) are redundant. As a consequence, we can drop all subtour constraints
with 1 ∈ S. This gives us the equivalent integer program:
zTSP = min ∑_{e∈E} ce xe    (6.13a)
       ∑_{e∈δ(i)} xe = 2    for all i ∈ V    (6.13b)
       ∑_{e∈E(S)} xe ≤ |S| − 1    for all S with 2 ≤ |S| ≤ |V| − 1, 1 ∉ S    (6.13c)
       x ∈ BE    (6.13d)


If we sum up all the constraints (6.13b) we get 2 ∑_{e∈E} xe = 2n, since every edge is
counted exactly twice. Now comes our Lagrangian relaxation: We dualize all degree
constraints (6.13b) in (6.13) but leave the degree constraint for node 1. We also add the
constraint ∑_{e∈E} xe = n. This gives us the following relaxation:

z(u) = min ∑_{e=(i,j)∈E} (cij − ui − uj) xe + 2 ∑_{i∈V} ui    (6.14a)
       ∑_{e∈δ(1)} xe = 2    (6.14b)
       ∑_{e∈E(S)} xe ≤ |S| − 1    for all S with 2 ≤ |S| ≤ |V| − 1, 1 ∉ S    (6.14c)
       ∑_{e∈E} xe = n    (6.14d)
       x ∈ BE    (6.14e)

Let us have a closer look at the relaxation (6.14). Every feasible solution T has two edges
incident with node 1 by constraint (6.14b). By constraint (6.14d), the solution must consist
of exactly n edges and hence contains a cycle. By constraints (6.14c), T cannot contain a
cycle that avoids node 1. Thus, if we remove one edge incident to 1, then T is cycle
free and must contain a spanning tree on the vertices {2, . . . , n}. To summarize, the feasible
solutions for (6.14) are exactly the 1-trees. We have already seen in Example 6.12 that we
can compute a minimum weight 1-tree in polynomial time. Hence, the relaxation (6.14) is
particularly interesting. ⊳

6.4 Duality
As mentioned at the beginning of this chapter, for Linear Programs the Duality Theorem
is a convenient way of proving optimality or, more generally, of proving upper and lower
bounds for an optimization problem. We are now going to introduce the concept of duality
in a more general sense.

Definition 6.19 ((Weak) dual pair of optimization problems)
The two optimization problems

(P) z = max {c(x) : x ∈ X}
(D) w = min {d(y) : y ∈ Y}

form a weak dual pair if c(x) ≤ d(y) for all x ∈ X and all y ∈ Y. If additionally z = w,
then we say that the problems form a strong dual pair.

By the Duality Theorem of Linear Programming, the two problems

(P) max { cT x : Ax ≤ b }
(D) min { bT y : AT y = c, y ≥ 0 }

form a strong dual pair.

Example 6.20 (Matching and vertex cover)
Given an undirected graph G = (V, E), the following two problems form a weak dual pair:

max {|M| : M ⊆ E is a matching in G}
min {|C| : C ⊆ V is a vertex cover in G}


In fact, let M ⊆ E be a matching in G and C ⊆ V a vertex cover. Since M is a matching,
the edges in M do not share a common endpoint. Thus, since C must contain at least one
endpoint of each e ∈ M (and all of the 2|M| endpoints are distinct as we have just seen),
we get |C| ≥ |M|.
Do the two problems form a strong dual pair? The answer is no: Figure 6.6 shows an
example of a graph where the maximum size matching has cardinality 1, but the minimum
vertex cover has size 2. After a moment's thought, it would be very surprising if the two
problems formed a strong dual pair: the vertex cover problem is NP-hard, whereas
the matching problem is solvable in polynomial time. ⊳

Figure 6.6: A triangle on the vertices 1, 2, 3 with edges e1 = (1, 2), e2 = (2, 3), e3 = (1, 3):
the maximum matching has size 1, but the minimum vertex cover has size 2.
Lemma 6.21 The Integer Linear Program max { cT x : Ax ≤ b, x ∈ Zn } and the Linear
Program min { bT y : AT y = c, y ∈ Rm+ } form a weak dual pair. Moreover, also the two
Integer Linear Programs

max { cT x : Ax ≤ b, x ∈ Zn }
min { bT y : AT y = c, y ∈ Zm+ }

form a weak dual pair.

Proof: For any feasible x̄, ȳ we have

cT x̄ ≤ max { cT x : Ax ≤ b, x ∈ Zn }
     ≤ max { cT x : Ax ≤ b, x ∈ Rn }
     = min { bT y : AT y = c, y ∈ Rm+ }
     ≤ min { bT y : AT y = c, y ∈ Zm+ } ≤ bT ȳ,

where the equality follows from the Duality Theorem of Linear Programming. ✷

Example 6.22 (Matching and vertex cover (continued))
We formulate the maximum cardinality matching problem as an integer program:

z = max ∑_{e∈E} xe    (6.15a)
    ∑_{e∈δ(v)} xe ≤ 1    for all v ∈ V    (6.15b)
    x ∈ BE    (6.15c)


Let zLP be the value of the LP-relaxation of (6.15) obtained by dropping the integrality
constraints. The dual of the LP-relaxation is

wLP = min ∑_{v∈V} yv    (6.16a)
      yu + yv ≥ 1    for all e = (u, v) ∈ E    (6.16b)
      y ≥ 0.    (6.16c)

If we add integrality constraints to (6.16), we obtain the vertex cover problem. Let w denote
the optimal value of this problem. We then have z ≤ zLP = wLP ≤ w. This is an alternative
proof of the fact that the matching problem and the vertex cover problem form a weak dual
pair.
Consider again the graph depicted in Figure 6.6. We have already seen that for this graph
z = 1 and w = 2. Moreover, the Linear Programming relaxation is

max xe1 + xe2 + xe3
    xe1 + xe2 ≤ 1
    xe1 + xe3 ≤ 1
    xe2 + xe3 ≤ 1
    xe1, xe2, xe3 ≥ 0

The vector xe1 = xe2 = xe3 = 1/2 is feasible for this relaxation with objective function
value 3/2. The dual of the LP is

min y1 + y2 + y3
    y1 + y2 ≥ 1
    y2 + y3 ≥ 1
    y1 + y3 ≥ 1
    y1, y2, y3 ≥ 0

so that y1 = y2 = y3 = 1/2 is feasible for the dual. Hence we have zLP = wLP = 3/2. ⊳
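As a quick numerical check (an addition for illustration, not part of the original notes), both LPs can be solved with SciPy; the sketch below verifies zLP = 3/2 for the primal.

    import numpy as np
    from scipy.optimize import linprog

    # Primal: max x1 + x2 + x3 s.t. pairwise sums <= 1, x >= 0
    # (linprog minimizes, so the objective is negated).
    A = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1]])
    res = linprog(c=[-1, -1, -1], A_ub=A, b_ub=[1, 1, 1],
                  bounds=[(0, None)] * 3, method="highs")
    print(-res.fun)  # prints 1.5 = zLP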

7 Dynamic Programming
7.1 Shortest Paths Revisited
Suppose that we are given a directed graph G = (V, A) with arc lengths c : A → R+ and we
wish to compute shortest paths from a distinguished source node s to all other vertices v ∈
V. Let d(v) denote the shortest path distance from s to v with respect to c.
Suppose the shortest path from s to v passes through some intermediate node u (see Fig-
ure 7.1 for an illustration). What can we say about the paths Psu and Puv from s to u and
from u to v? Clearly, Psu must be a shortest (s, u)-path since otherwise we could replace
it by a shorter one which together with Puv would form a shorter path from s to v. The
same arguments show that Puv must be a shortest (u, v)-path.
The above observations are known as the Bellman principle of optimality: any subpath
(any partial solution) of a shortest path (of an optimal solution) must be a shortest path (an
optimal solution of a subproblem) itself.
Let v ∈ V \ {s}. If we apply the Bellman principle to all predecessors u of v, that is, to all
u such that (u, v) ∈ A we get the following recurrences for the shortest path distances:

d(v) = min_{u:(u,v)∈A} ( d(u) + c(u, v) ).    (7.1)

Equation (7.1) states that, if we know the shortest path distance d(u) from s to all prede-
cessors u of v, then we can easily compute the shortest path distance d(v). Unfortunately,

Figure 7.1: A shortest path from s to v that passes through the intermediate node u. The
subpath from s to u must be a shortest (s, u)-path.

it may be the case that in order to compute d(u) we also need d(v), because v in turn is a
predecessor of u. So, for general graphs, (7.1) does not yet give us a method for solving
the shortest path problem.
There is, however, one important class of graphs, where (7.1) can be used such as to obtain
a very efficient algorithm.

Definition 7.1 (Directed acyclic graph (DAG))
A directed acyclic graph (DAG) is a graph that does not contain a directed cycle.

DAGs allow a special ordering of the vertices defined below:

Definition 7.2 (Topological sorting)
Let G = (V, A) be a directed graph. A bijection f : V → {1, . . . , n} is called a topological
sorting of G if for all (u, v) ∈ A we have f(u) < f(v).

The proof of the following fact is left as an exercise:

Theorem 7.3 Let G = (V, A) be a directed graph. Then, G has a topological sorting if
and only if G is acyclic. If G is acyclic, the topological sorting can be computed in linear
time.

Proof: Exercise. ✷

Suppose that G is a DAG and that f is a topological sorting (which by the previous theorem
can be obtained in linear time O(n + m), where n = |V| and m = |A|). For the sake of
a simpler notation we assume that the vertices are already numbered v1 , . . . , vn according
to the topological sorting, that is, f(vi ) = i. We can now use (7.1) by computing d(vi ) in
order of increasing i = 1, . . . , n. In order to compute d(vi ) we need values d(vj ) where
j < i and these have already been computed. Thus, in case of a DAG (7.1) leads to an
algorithm which can be seen to run in total time O(n + m).
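A sketch of this DAG algorithm (an added illustration, not from the original notes); vertices are assumed to be numbered 0, . . . , n − 1 in topological order, with arcs given as an adjacency list.

    def dag_shortest_paths(succ, s):
        """Shortest path distances from s in a DAG via recursion (7.1).

        succ[u] is a list of (v, c_uv) pairs with u < v (topological
        numbering), so d(v) is final once all predecessors are done."""
        n = len(succ)
        INF = float("inf")
        d = [INF] * n
        d[s] = 0.0
        for u in range(n):               # process in topological order
            if d[u] == INF:
                continue
            for v, c in succ[u]:
                d[v] = min(d[v], d[u] + c)
        return d

    # Example: 0 -> 1 (2), 0 -> 2 (4), 1 -> 2 (1)
    print(dag_shortest_paths([[(1, 2.0), (2, 4.0)], [(2, 1.0)], []], 0))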
In our algorithm above we have calculated the solution value of a problem recursively
from the optimal values of slightly different problems. This method is known as dynamic
programming. We will see in this chapter that this method can be used to obtain optimal
solutions to many problems. As a first step we go back to the shortest path problem but
we now consider the case of general graphs. As we have already remarked above, the
recursion (7.1) is not directly useful in this case. We need to somehow enforce an ordering
on the vertices such as to get a usable recursion.
Let dk(v) be the length of a shortest path from s to v using at most k arcs. Then, we have

dk(v) = min { dk−1(v),  min_{u:(u,v)∈A} ( dk−1(u) + c(u, v) ) }.    (7.2)

Now, the above recurrence (7.2) gives us a way to compute the shortest path distances in
time O((n + m)n). We first compute d1 (v) for all v which is easy. Once we know all
values dk−1 , we can use (7.2) to compute dk (v) for all v ∈ V in time O(n + m) (every arc
has to be considered exactly once). Since the range of k is from 1 to n − 1 (every path with
at least n arcs contains a cycle and thus cannot be shortest), we get the promised running
time of O(n(n + m)) = O(n^2 + nm).
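Again as an added illustration, recursion (7.2) translates directly into code (this is essentially the Bellman–Ford algorithm); arcs are assumed to be given as a list of (u, v, c) triples.

    def bellman_ford(n, arcs, s):
        """Distances d(v) from s via recursion (7.2); runs in O(n*m)."""
        INF = float("inf")
        d = [INF] * n
        d[s] = 0.0
        for _ in range(n - 1):            # k = 1, ..., n-1
            d_new = d[:]                  # d_k computed from d_{k-1}
            for u, v, c in arcs:
                if d[u] + c < d_new[v]:
                    d_new[v] = d[u] + c
            d = d_new
        return d

    print(bellman_ford(3, [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 4.0)], 0))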
The general scheme of a dynamic programming algorithm is the following: We compute a
solution value recursively from the values of modified problems. The problems for which we
compute solutions are referred to as states; the order in which we compute them is organized
in stages. We can imagine a dynamic programming algorithm as filling in the
values of a table which is indexed by the states. In a stage we compute one table entry from
other table entries which have been computed earlier.


7.2 Knapsack Problems

Consider the KNAPSACK problem (see Example 1.3):
z = max ∑_{i=1}^{n} ci xi    (7.3a)
    ∑_{i=1}^{n} ai xi ≤ b    (7.3b)
    x ∈ Bn    (7.3c)

Let us set up a dynamic programming algorithm for KNAPSACK. The states are subproblems
Pk(λ) of the form

(Pk(λ))    fk(λ) = max ∑_{i=1}^{k} ci xi    (7.4a)
                  ∑_{i=1}^{k} ai xi ≤ λ    (7.4b)
                  x ∈ Bk    (7.4c)
More verbosely, the problem Pk (λ) consists of getting the maximum profit from items
1, . . . , k where the size of the knapsack is λ. We have z = fn (b), thus we get the optimal
value once all fk (λ) for k = 1, . . . , n, λ = 0, . . . , b are known.
We need a recursion to calculate the fk (λ). Consider an optimal solution x∗ for Pk (λ).

• If it does not use item k, then (x∗1 , . . . , x∗k−1 ) is an optimal solution for Pk−1 (λ), that
is, fk (λ) = fk−1 (λ).
• If the solution uses k, then it uses weight at most λ − ak from items 1, . . . , k − 1. By
the arguments used for the shortest path problem, we get that (x∗1 , . . . , x∗k−1 ) must be
an optimal solution for Pk−1 (λ − ak ), that is, fk (λ) = fk−1 (λ − ak ) + ck .

Combining the above two cases yields the recursion:


fk (λ) = max {fk−1 (λ), fk−1 (λ − ak ) + ck } . (7.5)
Thus, once we know all values fk−1 we can compute each value fk (λ) in constant time by
just inspecting two states. Figure 7.2 illustrates the computation of the values fk (λ). They
are imagined to be stored in a b × n-table. The kth column of the table is computed with
the help of the entries in the (k − 1)st column.
The recursion (7.5) gives a dynamic programming algorithm for KNAPSACK which runs in
total time O(nb) (it is easy to actually set the binary variables according to the results of
the recursion). Observe that this running time is not polynomial, since the encoding length
of an instance of KNAPSACK is only O(n + n log amax + n log cmax + log b).
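For illustration (an addition to the text), recursion (7.5) in code:

    def knapsack(c, a, b):
        """0/1 knapsack by dynamic programming, recursion (7.5).

        f[k][lam] = best profit using items 1..k with capacity lam;
        runs in time O(n*b) for integer sizes a and capacity b."""
        n = len(c)
        f = [[0] * (b + 1) for _ in range(n + 1)]
        for k in range(1, n + 1):
            for lam in range(b + 1):
                f[k][lam] = f[k - 1][lam]                    # item k unused
                if a[k - 1] <= lam:                          # item k used
                    f[k][lam] = max(f[k][lam],
                                    f[k - 1][lam - a[k - 1]] + c[k - 1])
        return f[n][b]

    print(knapsack([6, 5, 4], [2, 2, 3], 4))  # -> 11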
Let us now consider a generalized version of KNAPSACK, where we are allowed to pack
more than one copy of an item. The Integer Knapsack Problem (INTEGER KNAPSACK) is:

z = max ∑_{i=1}^{n} ci xi    (7.6a)
    ∑_{i=1}^{n} ai xi ≤ b    (7.6b)
    x ∈ Zn+    (7.6c)


Figure 7.2: Computation of the values fk(λ) in the dynamic programming algorithm for
KNAPSACK: the entry fk(λ) in column k is computed from the entries fk−1(λ) and
fk−1(λ − ak) in column k − 1.

Analogous to KNAPSACK we can define

(Pk(λ))    gk(λ) = max ∑_{i=1}^{k} ci xi    (7.7a)
                  ∑_{i=1}^{k} ai xi ≤ λ    (7.7b)
                  x ∈ Zk+    (7.7c)

As before, gn(b) gives us the optimal value for the whole problem. We need a recursion
to compute the gk(λ), k = 1, . . . , n, λ = 0, . . . , b.
Let x∗ be an optimal solution for Pk(λ). Suppose that x∗k = t for some integer t ∈ Z+.
Then, x∗ uses space at most λ − t·ak for the items 1, . . . , k − 1. It follows that (x∗1, . . . , x∗k−1)
must be an optimal solution for Pk−1(λ − t·ak). These observations give us the following
recursion:

gk(λ) = max_{t=0,...,⌊λ/ak⌋} { ck·t + gk−1(λ − t·ak) }.    (7.8)

Notice that ⌊λ/ak⌋ ≤ b, since ak is a positive integer. Thus (7.8) shows how to compute gk(λ) in
time O(b). This yields an overall time of O(nb^2) for the nb values gk(λ).
Let us now be a bit smarter and derive a dynamic programming algorithm with a better
running time. The key is to accomplish the computation of gk(λ) by inspecting only a
constant number (namely two) of previously computed values instead of Θ(b) as in (7.8).
Again, let x∗ be an optimal solution for Pk(λ).

• If x∗k = 0, then gk(λ) = gk−1(λ).

• If x∗k = t + 1 for some t ∈ Z+, then (x∗1, . . . , x∗k−1, x∗k − 1) must be an optimal solution
for Pk(λ − ak).

The above two cases can be combined into the new recursion

gk (λ) = max {gk−1 (λ), gk (λ − ak ) + ck } . (7.9)

Observe that the recursion allows us to compute all gk(λ) if we compute them in the order
k = 1, . . . , n and, for fixed k, λ = 0, . . . , b. The total running time is O(nb), which is the
same as for KNAPSACK.
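A sketch of recursion (7.9), added here for illustration:

    def integer_knapsack(c, a, b):
        """Integer knapsack by dynamic programming, recursion (7.9).

        g[k][lam] allows arbitrarily many copies of each item; note the
        lookup g[k][lam - a_k] in the *same* row k."""
        n = len(c)
        g = [[0] * (b + 1) for _ in range(n + 1)]
        for k in range(1, n + 1):
            for lam in range(b + 1):
                g[k][lam] = g[k - 1][lam]                    # no copy of item k
                if a[k - 1] <= lam:                          # one more copy of k
                    g[k][lam] = max(g[k][lam],
                                    g[k][lam - a[k - 1]] + c[k - 1])
        return g[n][b]

    print(integer_knapsack([6, 5, 4], [2, 2, 3], 4))  # -> 12 (two copies of item 1)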

8 Branch and Bound
8.1 Divide and Conquer
Suppose that we would like to solve the problem

z = max { cT x : x ∈ S }.

If the above problem is hard to solve, we might try to break it into smaller problems which
are easier to solve.

Observation 8.1 If S = S1 ∪ · · · ∪ Sk and zi = max { cT x : x ∈ Si } for i = 1, . . . , k, then
z = max { zi : i = 1, . . . , k }.

Although Observation 8.1 is (almost) trivial, it is the key to branch and bound methods
which turn out to be effective for many integer programming problems.
The recursive decomposition of S into smaller problems can be represented by an enumer-
ation tree. Suppose that S ⊆ B3 . We first divide S into S0 and S1 , where

S0 = {x ∈ S : x1 = 0}
S1 = {x ∈ S : x1 = 1}

The sets S0 and S1 will be recursively divided into smaller sets as shown in the enumeration
tree in Figure 8.1.
As a second example consider the tours of a TSP on four cities. Let S denote the set of
all tours. We first divide S into three pieces, S12, S13 and S14, where S1i ⊂ S is the set of
all tours which from city 1 go directly to city i. Each set S1i in turn is divided into two pieces
S1i,ij, j ≠ 1, i, where S1i,ij is the set of all tours in S1i which leave city i for city j. The
recursive division is depicted in the enumeration tree in Figure 8.2. The leaves of the tree
correspond to the (4 − 1)! = 3! = 6 possible permutations of {1, . . . , 4} which have 1 at the
first place.

8.2 Pruning Enumeration Trees
It is clear that a complete enumeration soon becomes hopeless even for problems of
medium size. For instance, in the TSP the complete enumeration tree would have (n − 1)!

Figure 8.1: Enumeration tree for a set S ⊆ B3. The root S is split on x1 into S0 and S1;
splitting further on x2 and x3 yields the eight leaves S000, . . . , S111.

Figure 8.2: Enumeration tree for the TSP on four cities: the first level consists of S12, S13,
S14, and the leaves are S12,23, S12,24, S13,32, S13,34, S14,42, S14,43.


leaves and thus for n = 60 would be larger than the estimated number of atoms in the
universe (about 10^80).
The key to efficient algorithms is to “cut off useless parts” of an enumeration tree. This is
where bounds (cf. Chapter 6) come in. The overall scheme obtained is also referred to as
implicit enumeration, since most of the time not all of the enumeration tree is completely
unfolded.

Observation 8.2 Let S = S1 ∪ · · · ∪ Sk and zi = max { cT x : x ∈ Si } for i = 1, . . . , k, so
that z = max { zi : i = 1, . . . , k }. Suppose that we are given bounds z̲i and z̄i for i = 1, . . . , k with

z̲i ≤ zi ≤ z̄i    for i = 1, . . . , k.

Then for z = max { cT x : x ∈ S } it is true that:

max { z̲i : i = 1, . . . , k } ≤ z ≤ max { z̄i : i = 1, . . . , k }

The above observation enables us to deduce rules for pruning the enumeration tree.
Observation 8.3 (Pruning by optimality) If z̲i = z̄i for some i, then the optimal solution
value for the ith subproblem zi = max { cT x : x ∈ Si } is known and there is no need to
subdivide Si into smaller pieces.

As an example consider the situation depicted in Figure 8.3. The set S is divided into S1
and S2 with the upper and lower bounds as shown.

Figure 8.3: Pruning by optimality. Left: S with z̄ = 30, z̲ = 10 is split into S1 (z̄1 = z̲1 = 20)
and S2 (z̄2 = 25, z̲2 = 15). Right: the branch at S1 is pruned and the bounds of S improve
to z̄ = 25, z̲ = 20.

Since z̲1 = z̄1, there is no need to further explore the branch rooted at S1. So, this branch
can be pruned by optimality. Moreover, Observation 8.2 allows us to get new lower and
upper bounds for z, as shown in the figure.
Observation 8.4 (Pruning by bound) If z̄i < z̲j for some i ≠ j, then the optimal solution
for the ith subproblem zi = max { cT x : x ∈ Si } can never be an optimal solution of the
whole problem. Thus, there is no need to subdivide Si into smaller pieces. Note that if
z̄i < z̲j, then in particular z̲j > −∞, which in turn implies that Sj ≠ ∅.

Consider the example in Figure 8.4. Since z̄1 = 20 < 21 = z̲2, we can conclude that the
branch rooted at S1 does not contain an optimal solution: any solution in this branch has
objective function value at most 20, whereas the branch rooted at S2 contains solutions of
value at least 21. Again, we can stop exploring the branch rooted at S1.

Figure 8.4: Pruning by bound. Left: S with z̄ = 30, z̲ = 10 is split into S1 (z̄1 = 20, z̲1 = 15)
and S2 (z̄2 = 25, z̲2 = 21). Right: the branch at S1 is pruned and the bounds of S improve
to z̄ = 25, z̲ = 21.

We list one more generic reason which makes the exploration of a branch unnecessary.


Observation 8.5 (Pruning by infeasibility) If Si = ∅, then there is no need to subdivide Si
into smaller pieces, either.

The next section will make clear how the case of infeasibility can occur in a branch and
bound system. We close this section by showing an example where no pruning is possible.
Consider the partial enumeration tree in Figure 8.5, where again we have divided the set S
into the sets S1 and S2 with the bounds as shown.

Figure 8.5: No pruning possible. The set S (z̄ = 30, z̲ = 10) is split into S1 (z̄1 = 20, z̲1 = 15)
and S2 (z̄2 = 25, z̲2 = 17); the bounds of S improve to z̄ = 25, z̲ = 17, but both subtrees
must still be explored.

Although the lower and upper bounds for the subproblems allow us to get better bounds for
the whole problem, we still have to explore both subtrees.

8.3 LP-Based Branch and Bound: An Example
The most common way to solve integer programs is to use an implicit enumeration scheme
based on Linear Programming. Upper bounds are provided by LP-relaxations and branch-
ing is done by adding new constraints. We illustrate this procedure for a small example.
Figure 8.6 shows the evolution of the corresponding branch-and-bound-tree.
Consider the integer program

z = max 4x1 − x2    (8.1a)
    3x1 − 2x2 + x3 = 14    (8.1b)
    x2 + x4 = 3    (8.1c)
    2x1 − 2x2 + x5 = 3    (8.1d)
    x ∈ Z5+    (8.1e)

Bounding The first step is to get upper and lower bounds for the optimal value z. An
upper bound is obtained by solving the LP-relaxation of (8.1). This LP has the optimal
solution x̃ = (4.5, 3, 6.5, 0, 0). Hence, we get the upper bound z̄ = 15. Since we do not have
any feasible solution yet, by convention we use z̲ = −∞ as a lower bound.

Branching In the current situation (see Figure 8.6(a)) we have z̲ < z̄, so we have to
branch, that is, we have to split the feasible region S. A common way to achieve this is to
take an integer variable xj that is basic but not integral and set:

S1 = S ∩ { x : xj ≤ ⌊x̃j⌋ }
S2 = S ∩ { x : xj ≥ ⌈x̃j⌉ }

Clearly, S = S1 ∪ S2 and S1 ∩ S2 = ∅. Observe that due to the above choice of branching,
the current optimal basic solution x̃ is not feasible for either subproblem. If there is no
degeneracy (i.e., no alternative optimal LP solutions), then we get max { z̄1, z̄2 } < z̄, which
means that our upper bound decreases (improves).
In the concrete example we choose variable x1, where x̃1 = 4.5, and set S1 = S ∩ {x : x1 ≤ 4}
and S2 = S ∩ {x : x1 ≥ 5}.


Figure 8.6: Evolution of the branch-and-bound tree for the example; the active nodes are
shown in white. (a) Initial tree: S with z̄ = 15, z̲ = −∞. (b) Node S2 is pruned by
infeasibility. (c) After the LP-relaxation of S1 is solved, we get the new upper bound
z̄ = z̄1 = 13.5. (d) Branching on variable x2 leads to the two new subproblems S11
and S12. (e) Problem S12 has an integral optimal solution, so we can prune it by optimality;
this also gives the first finite lower bound z̲ = 13. (f) Problem S11 can be pruned by bound,
since z̄11 = 12 < 13 = z̲.


Choosing an active node The list of active problems (nodes) to be examined is now
S1 , S2 . Suppose we choose S2 to be explored. Again, the first step is to obtain an upper
bound by solving the LP-relaxation:

z̄2 = max 4x1 − x2    (8.2a)
     3x1 − 2x2 + x3 = 14    (8.2b)
     x2 + x4 = 3    (8.2c)
     2x1 − 2x2 + x5 = 3    (8.2d)
     x1 ≥ 5    (8.2e)
     x ≥ 0    (8.2f)

It turns out that (8.2) is infeasible, so S2 can be pruned by infeasibility (see Figure 8.6(b)).

Choosing an active node The only active node at the moment is S1 . So, we choose S1 to
be explored. We obtain an upper bound z̄1 by solving the LP-relaxation

z̄1 = max 4x1 − x2    (8.3a)
     3x1 − 2x2 + x3 = 14    (8.3b)
     x2 + x4 = 3    (8.3c)
     2x1 − 2x2 + x5 = 3    (8.3d)
     x1 ≤ 4    (8.3e)
     x ≥ 0.    (8.3f)

An optimal solution for (8.3) is x̃1 = (4, 2.5, 7, 4, 0) with objective function value 13.5. This
allows us to update our bounds as shown in Figure 8.6(c).

This time we choose to branch on variable x2, since its value in x̃1 is 2.5. The two subproblems
are defined on the sets S11 = S1 ∩ {x : x2 ≤ 2} and S12 = S1 ∩ {x : x2 ≥ 3}, see Figure 8.6(d).

Choosing an active node The list of active nodes is S11 and S12 . We choose S12 and
solve the LP-relaxation

z̄12 = max 4x1 − x2    (8.4a)
      3x1 − 2x2 + x3 = 14    (8.4b)
      x2 + x4 = 3    (8.4c)
      2x1 − 2x2 + x5 = 3    (8.4d)
      x1 ≤ 4    (8.4e)
      x2 ≥ 3    (8.4f)
      x ≥ 0.    (8.4g)

An optimal solution of (8.4) is x̃12 = (4, 3, 8, 1, 0) with objective function value 13. As
the solution is integral, we can prune S12 by optimality. Moreover, we can also update our
bounds as shown in Figure 8.6(e).


Choosing an active node At the moment we only have one active node: S11 . The corre-
sponding LP-relaxation is

z̄11 = max 4x1 − x2    (8.5a)
      3x1 − 2x2 + x3 = 14    (8.5b)
      x2 + x4 = 3    (8.5c)
      2x1 − 2x2 + x5 = 3    (8.5d)
      x1 ≤ 4    (8.5e)
      x2 ≤ 2    (8.5f)
      x ≥ 0,    (8.5g)

which has x̃11 = (3.5, 2, 7.5, 1, 0) as an optimal solution. Hence, z̄11 = 12, which is strictly
smaller than z̲ = 13. This means that S11 can be pruned by bound.
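To connect the pieces, here is a compact sketch (an illustration, not from the original notes) of LP-based branch and bound for KNAPSACK: dual bounds come from the greedy fractional relaxation of Example 6.5, branching fixes the next variable to 0 or 1, and the pruning rules are those of Section 8.2. All names and the data layout are ad hoc.

    def knapsack_branch_and_bound(c, a, b):
        """Branch and bound for max sum c_i x_i, sum a_i x_i <= b, x binary."""
        n = len(c)
        # Sort by nonincreasing c_i/a_i so the LP bound is the greedy bound.
        order = sorted(range(n), key=lambda i: c[i] / a[i], reverse=True)
        c = [c[i] for i in order]
        a = [a[i] for i in order]

        def lp_bound(k, cap):
            """Greedy fractional bound for items k..n-1 with capacity cap."""
            bound = 0.0
            for i in range(k, n):
                if a[i] <= cap:
                    cap -= a[i]
                    bound += c[i]
                else:
                    return bound + c[i] * cap / a[i]
            return bound

        best = 0                                 # primal (lower) bound
        stack = [(0, b, 0)]                      # (next item, capacity, value)
        while stack:                             # depth-first search
            k, cap, val = stack.pop()
            if val + lp_bound(k, cap) <= best + 1e-9:
                continue                         # pruning by bound
            if k == n:
                best = max(best, val)            # leaf: feasible solution
                continue
            stack.append((k + 1, cap, val))      # branch x_k = 0
            if a[k] <= cap:                      # branch x_k = 1 (if feasible)
                stack.append((k + 1, cap - a[k], val + c[k]))
        return best

    print(knapsack_branch_and_bound([6, 5, 4], [2, 2, 3], 4))  # -> 11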

8.4 Techniques for LP-based Branch and Bound

The previous section contained an example of an execution of an LP-based branch and
bound algorithm. While the basic outline of such an algorithm should be clear, there are a
few issues that need to be taken into account in order to obtain the best possible efficiency.

8.4.1 Reoptimization

Consider the situation in the example from the previous section when we wanted to explore
node S1. We had already solved the initial LP for S. The only difference between LP(S)
and LP(S1) is the presence of the constraint x1 ≤ 4 in LP(S1). How can we solve LP(S1)
without starting from scratch?

The problem LP(S) is of the form max { cT x : Ax = b, x ≥ 0 }. An optimal basis for this
LP is B = {x1, x2, x3} with solution x∗B = (4.5, 3, 6.5) and x∗N = (0, 0). We can parametrize
any solution x = (xB, xN) for LP(S) by the nonbasic variables:

xB := A_B^{-1} b − A_B^{-1} AN xN = x∗B − A_B^{-1} AN xN

Let ĀN := A_B^{-1} AN, c̄N^T := cB^T A_B^{-1} AN − cN^T and b̄ := A_B^{-1} b. Multiplying the
constraints Ax = b by A_B^{-1} gives the optimal basis representation of the LP:

max cB^T x∗B − c̄N^T xN    (8.6a)
    xB + ĀN xN = b̄    (8.6b)
    x ≥ 0    (8.6c)

Recall that a basis B is called dual feasible if c̄N ≥ 0. The optimal primal basis is also
dual feasible. We refer to [Sch86, NW99] for details on Linear Programming.
In the branch and bound scheme, we add a new constraint xi ≤ t (or xi ≥ t) for some
basic variable xi ∈ B. Let us make this constraint an equality constraint by adding a slack
variable s ≥ 0. The new constraints are xi + s = t, s ≥ 0. By (8.6) we can express xi in
terms of the nonbasic variables: xi = (b̄ − ĀN xN)i. Hence xi + s = t becomes

(−ĀN xN)i + s = t − b̄i.


Adding this constraint to (8.6) gives us the following new LP that we must solve:

max cB^T x∗B − c̄N^T xN    (8.7a)
    xB + ĀN xN = b̄    (8.7b)
    (−ĀN xN)i + s = t − b̄i    (8.7c)
    x ≥ 0, s ≥ 0    (8.7d)

The set B ∪ {s} forms a dual feasible basis for (8.7), so we can start to solve (8.7) by the
dual Simplex method starting with the situation as given.
Let us illustrate the method for the concrete situation of our example. We have

         ( 3  −2  1 )                 (  0    1    1/2 )
    AB = ( 0   1  0 )     A_B^{-1} =  (  0    1    0   )
         ( 2  −2  0 )                 (  1   −1   −3/2 )

and thus (8.6) amounts to

max 15 − 3x4 − 2x5
    x1 + x4 + (1/2) x5 = 9/2
    x2 + x4 = 3
    x3 − x4 − (3/2) x5 = 13/2
    x1, x2, x3, x4, x5 ≥ 0

If we add a slack variable s, then the new constraint x1 ≤ 4 becomes x1 + s = 4, s ≥ 0.
Since x1 = 9/2 − x4 − (1/2) x5, we can rewrite this constraint in terms of the nonbasic
variables as:

−x4 − (1/2) x5 + s = −1/2.    (8.8)
This gives us the dual feasible representation of LP(S1):

max 15 − 3x4 − 2x5
    x1 + x4 + (1/2) x5 = 9/2
    x2 + x4 = 3
    x3 − x4 − (3/2) x5 = 13/2
    −x4 − (1/2) x5 + s = −1/2
    x1, x2, x3, x4, x5, s ≥ 0

We can now do dual Simplex pivots to find the optimal solution of LP(S1 ).

8.4.2 Other Issues

Storing the tree In practice we do not need to store the whole branch and bound tree. It
suffices to store the active nodes.

Bounding Upper bounds are obtained by means of the LP-relaxations. Cutting planes
(see the following chapter) can be used to strengthen the bounds. Lower bounds can be
obtained by means of heuristics and approximation algorithms.


Branching In our example we branched on a variable that was fractional in the optimal
solution for the LP-relaxation. In general, there will be more than one such variable.
A choice that has proved to be efficient in practice is to branch on a most fractional
variable, that is, to choose a variable j which maximizes min{fj, 1 − fj}, where
fj = xj − ⌊xj⌋ (see the sketch below). There are other rules based on estimating the
cost of forcing a variable to become integral. We refer to [NW99] for more details.

Choosing an active node In our example we just chose an arbitrary active node to be ex-
plored. In practice there are two antagonistic arguments about how to choose an active
node:

• The tree can only be pruned significantly if there is a good primal feasible
solution available which gives a good lower bound. This suggests to apply a
depth-first search strategy. Doing so, in each step a single new constraint is
added and we can reoptimize easily as shown in Section 8.4.1.
• On the other hand, one would like to minimize the total number of nodes eval-
uated in the tree. This suggests to choose an active node with the best upper
bound. Such a strategy is known as best-first search.

In practice, one usually employs a mixture of depth-first search and best-first search.
After an initial depth-first search which leads to a feasible solution one switches to a
combined best-first and depth-first strategy.

Preprocessing An important technique for obtaining efficient algorithms is to use prepro-


cessing which includes

• tightening of bounds (e.g. by cutting-planes, see the following chapter)


• removing of redundant variables
• variable fixing

We demonstrate preprocessing techniques for a small example. Consider the LP

max 2x1 + x2 − x3
5x1 − 2x2 + 8x3 6 15
8x1 + 3x2 − x3 > 9
x1 + x2 + x3 6 6
0 6 x1 6 3
0 6 x2 6 1
1 6 x3

Tightening of constraints

Suppose we take the first constraint and isolate variable x1 . Then we get

5x1 6 15 + 2x2 − 8x3


6 15 + 2 · 1 − 8 · 1
= 9,

where we have used the bounds on the variables x2 and x3 . This gives us the bound

x1 6 9/5.


Similarly, taking the first constraint and isolating variable x3 results in:

8x3 6 15 + 2x2 − 5x1


6 15 + 2 · 1 − 5 · 0
= 17.

Our new bound for x3 is:

x3 6 17/8.

By similar operations we get the new bound

x1 > 7/8,

and now using all the new bounds in the first inequality gives:

x3 6 101/64.

Plugging the new bounds into the third constraint gives:

x1 + x2 + x3 6 9/5 + 1 + 101/64 < 6.
So, the third constraint is superfluous and can be dropped. Our LP has reduced to the
following problem:

max 2x1 + x2 − x3
5x1 − 2x2 + 8x3 6 15
8x1 + 3x2 − x3 > 9
7/8 6 x1 6 9/5
0 6 x2 6 1
1 6 x3 6 101/64

Variable Fixing

Consider variable x2 . Increasing variable x2 makes all constraints less tight. Since x2 has
a positive coefficient in the objective function it will be as large as possible in an optimal
solution, that is, it will be equal to its upper bound of 1:

x2 = 1.

The same conclusion could be obtained by considering the LP dual. One can see that the
dual variable corresponding to the constraint x2 6 1 must be positive in order to achieve
feasibility. By complementary slackness this means that x2 = 1 in any optimal solution.
Decreasing the value of x3 makes all constraints less tight, too. The coefficient of x3 in the
objective is negative, so x3 can be set to its lower bound.
After all the preprocessing, our initial problem has reduced to the following simple LP:

max 2x1
7/8 6 x1 6 9/5

We formalize the ideas from our example above:


Observation 8.6 Consider the set

S = { x : a0 x0 + ∑_{j=1}^{n} aj xj 6 b, lj 6 xj 6 uj for j = 1, . . . , n } .

The following statements hold:

Bounds on variables If a0 > 0, then

x0 6 ( b − ∑_{j:aj>0} aj lj − ∑_{j:aj<0} aj uj ) / a0 ,

and, if a0 < 0, then

x0 > ( b − ∑_{j:aj>0} aj lj − ∑_{j:aj<0} aj uj ) / a0 .

Redundancy The constraint a0 x0 + ∑_{j=1}^{n} aj xj 6 b is redundant, if

∑_{j:aj>0} aj uj + ∑_{j:aj<0} aj lj 6 b.

Infeasibility The set S is empty, if

∑_{j:aj>0} aj lj + ∑_{j:aj<0} aj uj > b.

Variable Fixing For a maximization problem of the form max{cT x : Ax 6 b, l 6 x 6 u},
if aij > 0 for all i = 1, . . . , m and cj < 0, then xj = lj in an optimal solution. Con-
versely, if aij 6 0 for all i = 1, . . . , m and cj > 0, then xj = uj in an optimal solution.

For integer programming problems, the preprocessing can go further. If xj ∈ Z and the
bounds lj 6 xj 6 uj are not integer, then we can tighten them to

⌈lj ⌉ 6 xj 6 ⌊uj ⌋.

We will explore this fact in greater depth in the following chapter on cutting-planes.
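The rules of Observation 8.6, together with the integer rounding step just mentioned, translate directly into code. The following Python sketch is our own illustration (it assumes finite bounds on all variables of the row and applies one pass to a single 6-constraint):

```python
from math import ceil, floor

def preprocess_row(a, b, l, u, integer=None):
    """One pass of Observation 8.6 on a row  sum_j a[j]*x[j] <= b  with
    finite bounds l[j] <= x[j] <= u[j].  Returns (l, u, redundant,
    infeasible); 'integer' marks variables whose bounds may be rounded."""
    n = len(a)
    lhs_min = sum(a[j] * (l[j] if a[j] > 0 else u[j]) for j in range(n))
    lhs_max = sum(a[j] * (u[j] if a[j] > 0 else l[j]) for j in range(n))
    if lhs_min > b:
        return l, u, False, True                # infeasibility test
    if lhs_max <= b:
        return l, u, True, False                # redundancy test
    for j in range(n):                          # bound tightening
        if a[j] == 0:
            continue
        rest = lhs_min - a[j] * (l[j] if a[j] > 0 else u[j])
        if a[j] > 0:
            u[j] = min(u[j], (b - rest) / a[j])
        else:
            l[j] = max(l[j], (b - rest) / a[j])
        if integer and integer[j]:
            l[j], u[j] = ceil(l[j]), floor(u[j])  # integer rounding
    return l, u, False, False

# First constraint of the example, 5x1 - 2x2 + 8x3 <= 15; a large stand-in
# replaces the missing upper bound of x3:
l, u, red, inf = preprocess_row([5, -2, 8], 15, [0, 0, 1], [3, 1, 10**6])
print(l, u)   # u1 is tightened to 9/5 and u3 to 17/8, as computed above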

9 Cutting Planes

Let P = {x ∈ Rn+ : Ax 6 b} be a rational polyhedron and PI = conv(P ∩ Zn). We have
seen in Section 3.9 that PI is a rational polyhedron and that we can solve the integer pro-
gram

max{cT x : x ∈ P ∩ Zn} (9.1)

by solving the Linear Program

max{cT x : x ∈ PI}. (9.2)

In this chapter we will be concerned with the question of how to find a linear description of PI
(or an adequate superset of PI) which enables us to solve (9.2) and (9.1).
Recall that by Theorem 3.47, in order to describe a polyhedron we need exactly its facets.

9.1 Cutting-Plane Proofs


Suppose that we are about to solve an integer program

max{cT x : Ax 6 b, x ∈ Zn}.

Let P = {x ∈ Rn : Ax 6 b} and X = P ∩Zn . If we want to establish optimality of a solution


(or at least provide an upper bound) this task is equivalent to proving that cT x 6 t is valid
for all points in X. Without the integrality constraints we could prove the validity of the
inequality by means of a variant of Farkas’ Lemma (cf. Theorem 2.11):

Lemma 9.1 (Farkas’ Lemma (Variant)) Let P = {x ∈ Rn : Ax 6 b} 6= ∅. The following
statements are equivalent:

(i) The inequality cT x 6 t is valid for P.

(ii) There exists y > 0 such that AT y = c and bT y 6 t.



Proof: By Linear Programming duality, we have max{cT x : x ∈ P} 6 t if and only if
min{bT y : AT y = c, y > 0} 6 t. So, cT x 6 t is valid for P 6= ∅ if and only if there exists
y > 0 such that AT y = c and bT y 6 t. ✷

As a consequence of the above lemma, if an inequality cT x 6 t is valid for P = {x : Ax 6 b},


then we can derive the validity of the inequality. Namely, we can find y > 0 such that for
c = AT y and t ′ = yT b the inequality cT x 6 t ′ is valid for P and t ′ 6 t. This clearly
implies that cT x 6 t is valid.
How can we prove validity in the presence of integrality constraints? Let us start with an
example. Consider the following linear system
2x1 + 3x2 6 27 (9.3a)
2x1 − 2x2 6 7 (9.3b)
−6x1 − 2x2 6 −9 (9.3c)
−2x1 − 6x2 6 −11 (9.3d)
−6x1 + 8x2 6 21 (9.3e)
Figure 9.1 shows the polytope P defined by the inequalities in (9.3) together with the convex
hull of the points from X = P ∩ Z2 .

Figure 9.1: Example of a polytope and its integer hull.

As can be seen from Figure 9.1, the inequality x2 6 5 is valid for X = P ∩ Z2 . However, we
can not use Farkas’ Lemma to prove this fact from the linear system (9.3), since the point
(9/2, 6) ∈ P has second coordinate 6.
Suppose we multiply the last inequality (9.3e) of the system (9.3) by 1/2. This gives us the
valid inequality
−3x1 + 4x2 6 21/2. (9.4)
For any integral vector (x1 , x2 ) ∈ X the left hand side of (9.4) will be integral, so we can
round down the right hand side of (9.4) to obtain the valid inequality (for X):
−3x1 + 4x2 6 10. (9.5)
We now multiply the first inequality (9.3a) by 3 and (9.5) by 2 and add those inequalities.
This gives us a new valid inequality:
17x2 6 101.


Dividing this inequality by 17 and rounding down the resulting right hand side gives us the
valid inequality x2 6 5.
The procedure used in the example above is called a cutting plane proof. Suppose that our
system Ax 6 b is formed by the inequalities

aTi x 6 bi , i = 1, . . . , m (9.6)

and let P = {x : Ax 6 b}. Let y ∈ Rm+ and set

c := AT y = ∑_{i=1}^{m} yi ai,
t := bT y = ∑_{i=1}^{m} yi bi.

As we have already seen, every point in P satisfies cT x 6 t. But we can say more. If c is
integral, then for every integral vector in P the quantity cT x is integral, so it satisfies the
stronger inequality
cT x 6 ⌊t⌋. (9.7)
The inequality (9.7) is called a Gomory-Chvátal cutting plane. The term “cutting plane”
stems from the fact that (9.7) cuts off part of the polyhedron P but not any of the integral
vectors in P.

Definition 9.2 (Cutting-Plane Proof)


Let Ax 6 b be a system of linear inequalities and cT x 6 t be an inequality. A sequence of
linear inequalities
cT1 x 6 t1 , cT2 x 6 t2 , . . . , cTk x 6 tk
is called a cutting-plane proof of cT x 6 t (from Ax 6 b), if each of the vectors c1 , . . . , ck is
integral, ck = c, tk = t, and if for each i = 1, . . . , k the following statement holds: cTi x 6 ti′
is a nonnegative linear combination of the inequalities Ax 6 b, cT1 x 6 t1 , . . . , cTi−1 x 6 ti−1
for some ti′ with ⌊ti′ ⌋ 6 ti .

Clearly, if cT x 6 t has a cutting-plane proof from Ax 6 b, then cT x 6 t is valid for each


integral solution of Ax 6 b. Moreover, a cutting plane proof is a clean way to show that
the inequality cT x 6 t is valid for all integral vectors in a polyhedron.

Example 9.3 (Matching Polytope)


The matching polytope M(G) of a graph G = (V, E) is defined as the convex hull of all
incidence vectors of matchings in G. It is equal to the set of solutions of

x(δ(v)) 6 1 for all v ∈ V
x ∈ BE

Alternatively, if we let P denote the polytope obtained by replacing x ∈ BE by 0 6 x, then


M(G) = PI .
Let S ⊆ V be a set of nodes of odd cardinality. As the edges of a matching do not share an
endpoint, the number of edges of a matching having both endpoints in S is at most (|S| − 1)/2.
Thus,

x(γ(S)) 6 (|S| − 1)/2 (9.8)
is a valid inequality for M(G) = PI . Here, γ(S) denotes the set of edges which have both
endpoints in S. We now give a cutting-plane proof of (9.8).


For v ∈ S take the inequality x(δ(v)) 6 1 with weight 1/2 and sum up the resulting |S| in-
equalities. This gives:

x(γ(S)) + (1/2) x(δ(S)) 6 |S|/2. (9.9)

For each e ∈ δ(S) we take the inequality −xe 6 0 with weight 1/2 and add it to (9.9). This
gives us:

x(γ(S)) 6 |S|/2. (9.10)

Rounding down the right hand side of (9.10) yields the desired result (9.8). ⊳

In the sequel we are going to show that cutting-plane proofs are always possible, provided
P is a polytope.

Theorem 9.4 Let P = {x : Ax 6 b} be a rational polytope and let cT x 6 t be a valid in-


equality for X = P ∩ Zn , where c is integral. Then, there exists a cutting-plane proof of
cT x 6 t ′ from Ax 6 b for some t ′ 6 t.

We will prove Theorem 9.4 by means of a special case (Theorem 9.6). We need another
useful equivalent form of Farkas’ Lemma:

Theorem 9.5 (Farkas’ Lemma for inequalities) The system Ax 6 b has a solution x if
and only if there is no vector y > 0 such that yT A = 0 and yT b < 0.

Proof: See standard textbooks about Linear Programming, e.g. [Sch86]. ✷

From this variant of Farkas’ Lemma we see that P = {x : Ax 6 b} is empty if and only
if we can derive a contradiction 0T x 6 −1 from the system Ax 6 b by means of taking
a nonnegative linear combination of the inequalities. The following theorem gives the
analogous statement for integral systems:

Theorem 9.6 Let P = {x : Ax 6 b} be a rational polytope and X = P ∩ Zn be empty. Then


there exists a cutting-plane proof of 0T x 6 −1 from Ax 6 b.

Before we embark upon the proofs (with the help of a technical lemma), let us take another
look at Gomory-Chvátal cutting-planes. By Farkas’ Lemma, we can derive any valid
inequality cT x 6 t (or a stronger version) for a polytope P = {x : Ax 6 b} by using a non-
negative linear combination of the inequalities. In view of this fact, we can define Gomory-
Chvátal cutting-planes also directly in terms of the polyhedron P: we just take a valid
inequality cT x 6 t for P with c integral which induces a nonempty face and round down t
to obtain the cutting plane cT x 6 ⌊t⌋.
The proof of Theorems 9.4 and 9.6 is via induction on the dimension of the polytope. The
following lemma allows us to translate a cutting-plane proof on a face F to a proof on the
entire polytope P.

Lemma 9.7 Let P = {x : Ax 6 b} be a rational polytope and F be a face of P. If cT x 6 ⌊t⌋


is a Gomory-Chvátal cutting-plane for F, then there exists a Gomory-Chvátal cutting-plane
c̄T x 6 ⌊t̄⌋ for P such that

F ∩ {x : c̄T x 6 ⌊t̄⌋} = F ∩ {x : cT x 6 ⌊t⌋}. (9.11)


Proof: By Theorem 3.6 we can represent the polyhedron P and the face F by:

P = {x : A′x 6 b′, A′′x 6 b′′} and F = {x : A′x 6 b′, A′′x = b′′},

where A′′ and b′′ are integral. Let t∗ = max{cT x : x ∈ F}. Since cT x 6 t is valid for F
we must have t > t∗. We can assume without loss of generality that t = t∗, since replacing
t by t∗ on the right hand side of (9.11) does not change anything. So, the following system
does not have a solution:
does not have a solution:

A ′x 6 b ′
A ′′ x 6 b ′′
−A ′′ x 6 −b ′′
cT x > t

By Farkas’ Lemma there exist vectors y ′ > 0, y ′′ such that

(y ′ )T A ′ + (y ′′ )T A ′′ = cT
(y ′ )T b ′ + (y ′′ )T b ′′ = t.

This looks like a Gomory-Chvátal cutting-plane cT x 6 ⌊t∗ ⌋ for P with the exception that
y ′′ is not necessarily nonnegative. However, the vector y ′′ − ⌊y ′′ ⌋ is nonnegative and it
turns out that replacing y ′′ by this vector will work. Let

c̄T := (y ′ )T A ′ + (y ′′ − ⌊y ′′ ⌋)T A ′′ = cT − (⌊y ′′ ⌋)T A ′′


t̄ := (y ′ )T b ′ + (y ′′ − ⌊y ′′ ⌋)T b ′′ = t − (⌊y ′′ ⌋)T b ′′ .

Observe that c̄ is integral, since c is integral, A ′′ is integral and ⌊y ′′ ⌋ is integral. The


inequality c̄T x 6 t̄ is a valid inequality for P, since we have taken a nonnegative linear
combination of the constraints. Now, we have

⌊t⌋ = ⌊t̄ + (⌊y′′⌋)T b′′⌋ = ⌊t̄⌋ + (⌊y′′⌋)T b′′, (9.12)

where the last equality follows from the fact that ⌊y ′′ ⌋ and b ′′ are integral. This gives us:


F ∩ {x : c̄T x 6 ⌊t̄⌋}
= F ∩ {x : c̄T x 6 ⌊t̄⌋, A′′x = b′′}
= F ∩ {x : c̄T x 6 ⌊t̄⌋, (⌊y′′⌋)T A′′x = (⌊y′′⌋)T b′′}
= F ∩ {x : cT x 6 ⌊t̄⌋ + (⌊y′′⌋)T b′′, (⌊y′′⌋)T A′′x = (⌊y′′⌋)T b′′}
= F ∩ {x : cT x 6 ⌊t⌋}.

This completes the proof. ✷

Proof of Theorem 9.6 We use induction on the dimension of P. If dim(P) ∈ {−1, 0}, then
the claim obviously holds. So, let us assume that dim(P) > 1 and that the claim holds for
all polytopes of smaller dimension.
Let cT x 6 δ with c integral be an inequality which induces a proper face of P. Then, by
Farkas’ Lemma, we can derive the inequality cT x 6 δ from Ax 6 b and cT x 6 ⌊δ⌋ is a
Gomory-Chvátal cutting-plane for P. Let


P̄ := {x ∈ P : cT x 6 ⌊δ⌋}


be the polytope obtained from P by applying the Gomory-Chvátal cut cT x 6 ⌊δ⌋.


Case 1: P̄ = ∅:
By Farkas’ Lemma we can derive the inequality 0T x 6 −1 from the inequality system
Ax 6 b, cT x 6 ⌊δ⌋ which defines P̄. Since cT x 6 ⌊δ⌋ was a Gomory-Chvátal cutting-plane
for P (and thus was derived itself from Ax 6 b) this means, we can derive the contradiction
from Ax 6 b.
Case 2: P̄ 6= ∅:
Define the face F of P̄ by

F := {x ∈ P̄ : cT x = ⌊δ⌋} = {x ∈ P : cT x = ⌊δ⌋}.

If δ is integral, then F is a proper face of P, so dim(F) < dim(P) in this case. If δ is not
integral, then P contains points which do not satisfy cT x = ⌊δ⌋ and so also in this case we
have dim(F) < dim(P).
By the induction hypothesis, there is a cutting-plane proof of 0T x 6 −1 for F, that is, from
the system Ax 6 b, cT x = ⌊δ⌋. By Lemma 9.7 there is a cutting-plane proof from Ax 6 b,
cT x 6 ⌊δ⌋ for an inequality wT x 6 d such that

∅ = F ∩ {x : 0T x 6 −1} = F ∩ {x : wT x 6 ⌊d⌋}.

We have

∅ = F ∩ {x : wT x 6 ⌊d⌋} = P̄ ∩ {x : cT x = ⌊δ⌋, wT x 6 ⌊d⌋}. (9.13)

Let us restate our result so far: We have shown that there is a cutting plane proof from
Ax 6 b, cT x 6 ⌊δ⌋ for an inequality wT x 6 d which satisfies (9.13).
Thus, the following linear system does not have a solution:

Ax 6 b (9.14a)
T
c x 6 ⌊δ⌋ (9.14b)
T
−c x 6 −⌊δ⌋ (9.14c)
T
w x 6 ⌊d⌋. (9.14d)

By Farkas’ Lemma for inequalities there exist y, λ1 , λ2 , µ > 0 such that

yT A + λ1 cT − λ2 cT + µwT = 0 (9.15a)
T
y b + λ1 ⌊δ⌋ − λ2 ⌊δ⌋ + µ⌊d⌋ < 0. (9.15b)

If λ2 = 0, then (9.15) means that already the system obtained from (9.14) by dropping
cT x > ⌊δ⌋ does not have a solution, that is


∅ = {x : Ax 6 b, cT x 6 ⌊δ⌋, wT x 6 ⌊d⌋}.

So, by Farkas’ Lemma we can derive 0T x 6 −1 from this system which consists completely
of Gomory-Chvátal cutting-planes for P.
So, it suffices to handle the case that λ2 > 0. In this case, we can divide both lines in (9.15)
by λ2 and get that there exist y ′ > 0, λ ′ > 0 and µ ′ > 0 such that

(y′)T A + λ′ cT + µ′ wT = cT (9.16a)
(y′)T b + λ′ ⌊δ⌋ + µ′ ⌊d⌋ = θ < ⌊δ⌋. (9.16b)


Now, (9.16) states that we can derive an inequality cT x 6 θ from Ax 6 b, cT x 6 ⌊δ⌋,


wT x 6 ⌊d⌋ with θ < ⌊δ⌋. Since all the inequalities in the system were Gomory-Chvátal
cutting-planes this implies that
cT x 6 ⌊δ⌋ − τ for some τ ∈ Z, τ > 1 (9.17)
is a Gomory-Chvátal cutting-plane for P.

Since P is bounded, the value z = min{cT x : x ∈ P} is finite. If we continue as above,
starting with P̄ = {x ∈ P : cT x 6 ⌊δ⌋ − τ}, at some point we will obtain a cutting-plane
proof of some cT x 6 t where t < z so that P ∩ {x : cT x 6 t} = ∅. Then, by Farkas’
Lemma we will be able to derive 0T x 6 −1 from Ax 6 b, cT x 6 t. ✷

Proof of Theorem 9.4 Case 1: P ∩ Zn = ∅
By Theorem 9.6 there is a cutting-plane proof of 0T x 6 −1 from Ax 6 b. Since P is
bounded, ℓ := max{cT x : x ∈ P} is finite. By Farkas’ Lemma, we can derive cT x 6 ℓ and
thus we have the Gomory-Chvátal cutting plane cT x 6 ⌊ℓ⌋. Adding an appropriate multiple
of 0T x 6 −1 to cT x 6 ⌊ℓ⌋ gives an inequality cT x 6 t′ for some t′ 6 t which yields the
required cutting-plane proof.
Case 2: P ∩ Zn 6= ∅
Again, let ℓ := max{cT x : x ∈ P}, which is finite, and define P̄ := {x ∈ P : cT x 6 ⌊ℓ⌋}, that
is, P̄ is the polytope obtained by applying the Gomory-Chvátal cutting-plane cT x 6 ⌊ℓ⌋
to P.
If ⌊ℓ⌋ 6 t we already have a cutting-plane proof of an inequality with the desired properties.
So, assume that ⌊ℓ⌋ > t. Consider the face

F = {x ∈ P̄ : cT x = ⌊ℓ⌋}

of P̄. Since cT x 6 t is valid for all integral points in P and by assumption t < ⌊ℓ⌋, the
face F can not contain any integral point. By Theorem 9.6 there is a cutting-plane proof of
0T x 6 −1 from Ax 6 b, cT x = ⌊ℓ⌋. We now use Lemma 9.7 as in the proof of Theorem 9.6.
The lemma shows that there exists a cutting-plane proof of some inequality wT x 6 ⌊d⌋
from Ax 6 b, cT x 6 ⌊ℓ⌋ such that P̄ ∩ {x : cT x = ⌊ℓ⌋, wT x 6 ⌊d⌋} = ∅.
By using the same arguments as in the proof of Theorem 9.6 it follows that there is a
cutting-plane proof of an inequality cT x 6 ⌊ℓ⌋ − τ for some τ ∈ Z, τ > 1 from Ax 6 b.
Continuing this way, we finally get an inequality cT x 6 t′ with t′ 6 t. ✷

9.2 A Geometric Approach to Cutting Planes: The Chvátal Rank
Let P = {x : Ax 6 b} be a rational polyhedron and PI := conv(P ∩ Zn ). Suppose we want
to find a linear description of PI . One approach is to add valid inequalities step by step,
obtaining tighter and tighter approximations of PI .
We have already seen that, if cT x 6 δ is a valid inequality for P with c integral, then cT x 6
⌊δ⌋ is valid for PI. If cT x = δ is a supporting hyperplane of P, that is, P ∩ {x : cT x = δ}
is a proper face of P, then cT x 6 ⌊δ⌋ is a Gomory-Chvátal cutting-plane. Otherwise, the
inequality cT x 6 δ is dominated by that of a supporting hyperplane. In any case, we have

PI ⊆ {x ∈ Rn : cT x 6 ⌊δ⌋} (9.18)

for any valid inequality cT x 6 δ for P where c is integral. This suggests taking the inter-
section of all sets of the form (9.18) as an approximation to PI.


Definition 9.8 Let P be a rational polyhedron. Then, P′ is defined as

P′ := ⋂ {x ∈ Rn : cT x 6 ⌊δ⌋}, (9.19)

where the intersection ranges over all integral c such that cT x 6 δ is valid for P.

Observe that (9.19) is the same as taking the intersection over all Gomory-Chvátal cutting-
planes for P. It is not a priori clear that P ′ is a polyhedron, since there is an infinite number
of cuts.

Theorem 9.9 Let P be a rational polyhedron. Then P ′ as defined in (9.19) is also a rational
polyhedron.

Proof: If P = ∅ the claim is trivial. So let P 6= ∅. By Theorem 4.27 there is a TDI-system


Ax 6 b with integral A such that P = {x : Ax 6 b}. We claim that

P ′ = {x ∈ Rn : Ax 6 ⌊b⌋} . (9.20)

From this the claim follows, since the set on the right hand side of (9.20) is a rational
polyhedron (A and ⌊b⌋ are integral, and there are only finitely many constraints).
Since every row aTi x 6 bi of Ax 6 b is a valid inequality for P it follows that P ′ ⊆
{x ∈ Rn : Ax 6 ⌊b⌋}. So, it suffices to show that the set on the right hand side of (9.20) is
contained in P ′ .

Let cT x = δ be a supporting hyperplane of P with c integral, P ⊆ {x : cT x 6 δ}. By Linear
Programming duality we have

δ = max{cT x : x ∈ P} = min{bT y : AT y = c, y > 0}. (9.21)

Since the system Ax 6 b is TDI and c is integral, the minimization problem in (9.21) has
an optimal solution y∗ which is integral.
Let x ∈ {x : Ax 6 ⌊b⌋}.

cT x = (AT y∗ )T x (since y∗ is feasible for the problem in (9.21))


= (y∗ )T (Ax)
6 (y∗ )T ⌊b⌋ (since Ax 6 ⌊b⌋ and y∗ > 0)
= ⌊(y∗ )T ⌊b⌋⌋ (since y∗ and ⌊b⌋ are integral)
6 ⌊(y∗ )T b⌋ (since ⌊b⌋ 6 b and y∗ > 0)
= ⌊δ⌋ (by the optimality of y∗ for (9.21)).

Thus, we have

{x : Ax 6 ⌊b⌋} ⊆ {x : cT x 6 ⌊δ⌋}.

Since cT x = δ was an arbitrary supporting hyperplane, we get that

{x : Ax 6 ⌊b⌋} ⊆ ⋂_{c,δ} {x : cT x 6 ⌊δ⌋} = P′

as required. ✷

We have obtained P ′ from P by taking all Gomory-Chvátal cuts for P as a first wave.
Given that P ′ is a rational polyhedron, we can take as a second wave all Gomory-Chvátal


cuts for P′. Continuing this procedure gives us better and better approximations of PI. We
let

P(0) := P
P(i) := (P(i−1))′ for i > 1.

This gives us a sequence of polyhedra

P = P(0) ⊃ P(1) ⊃ P(2) ⊃ · · · ⊃ PI,

which are generated by the waves of cuts.


We know that PI is a rational polyhedron (given that P is one) and by Theorem 9.4 each
of the finitely many valid inequalities for PI will be generated by the waves of Gomory-
Chvátal cuts. Thus, we can restate the result Theorem 9.4 in terms of the polyhedra P(i) as
follows:

Theorem 9.10 Let P be a rational polytope. Then we have P(k) = PI for some k ∈ N. ✷

Definition 9.11 (Chvátal rank)


Let P be a rational polytope. The Chvátal rank of P is defined to be the smallest integer k
such that P(k) = PI .

9.3 Cutting-Plane Algorithms


Cutting-plane proofs are usually employed to prove the validity of some classes of inequal-
ities. These valid inequalities can then be used in a cutting-plane algorithm.
Suppose that we want to solve the integer program


z∗ = max{cT x : x ∈ P ∩ Zn}

and that we know a family F of valid inequalities for PI = conv(P ∩ Zn ). Usually, F will
not contain a complete description of PI since either such a description is not known or
we do not know how to separate over F efficiently. The general idea of a cutting-plane
algorithm is as follows:


• We find an optimal solution x∗ for the Linear Program max cT x : x ∈ P . This can
be done by any Linear Programming algorithm (possibly a solver that is available
only as a black-box).

• If x∗ is integral, we already have an optimal solution to the IP and we can terminate.

• Otherwise, we search our family (or families) of valid inequalities for inequalities
which are violated by x∗ , that is, wT x∗ > d where wT x 6 d is valid for PI .

• We add the inequalities found to our LP-relaxation and resolve to find a new optimal
solution x∗∗ of the improved formulation. This procedure is continued.

• If we are fortunate (or if F contains a complete description of PI ), we terminate


with an optimal integral solution. We say “fortunate”, since if F is not a complete
description of PI , this depends on the objective function and our family F.


• If we are not so lucky, we still have gained something. Namely, we have found a new
formulation for our initial problem which is better than the original one (since we
have cut off some non-integral points). The formulation obtained upon termination
gives an upper bound z̄ for the optimal objective function value z∗ which is no worse
than the initial one (and usually is much better). We can now use z̄ in a branch and
bound algorithm.

Algorithm 9.1 gives a generic cutting-plane algorithm along the lines of the above discus-
sion. The technique of using improved upper bounds from a cutting-plane algorithm in a
branch and bound system is usually referred to as branch-and-cut (cf. the comments about
preprocessing in Section 8.4.2).

Algorithm 9.1 Generic cutting-plane algorithm


GENERIC-CUTTING-PLANE
Input: An integer program max{cT x : x ∈ P, x ∈ Zn};
a family F of valid inequalities for PI = conv(P ∩ Zn)
1 repeat
2 Solve the Linear Program max{cT x : x ∈ P}. Let x∗ be an optimal solution.
3 if x∗ is integral then
4 An optimal solution to the integer program has been found. stop.
5 else
6 Solve the separation problem for F, that is, try to find an inequality wT x 6 d in F
such that wT x∗ > d.
7 if such an inequality wT x 6 d cutting off x∗ was found then
8 Add the inequality to the system, that is, set P := P ∩ {x : wT x 6 d}.
9 else
10 We do not have an optimal solution yet. However, we have a better formulation
for the original problem. stop.
11 end if
12 end if
13 until forever

It is clear that the efficiency of a cutting-plane algorithm depends on the availability of


constraints that give good upper bounds. In view of Theorem 3.47 the only inequalities (or
cuts) we need are those that induce facets. Thus, one is usually interested in finding (by
means of mathematical methods) as many facet-inducing inequalities as possible.
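The control flow of Algorithm 9.1 fits in a few lines. The sketch below is an assumption-laden skeleton: solve_lp, separate and is_integral stand in for an LP solver, a separation routine for F and an integrality test; none of these interfaces is specified by the notes.

```python
def generic_cutting_plane(solve_lp, separate, is_integral):
    """Skeleton of Algorithm 9.1.  solve_lp(cuts) returns an optimal
    solution x* of the LP relaxation with all inequalities in 'cuts'
    added; separate(x) returns a violated valid inequality (w, d) with
    w^T x > d, or None; is_integral tests integrality of x*."""
    cuts = []
    while True:
        x = solve_lp(cuts)
        if is_integral(x):
            return x, cuts, "optimal"          # line 4 of Algorithm 9.1
        violated = separate(x)
        if violated is None:
            return x, cuts, "no more cuts"     # line 10: better formulation
        cuts.append(violated)                  # line 8: add the cut
```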

9.4 Gomory’s Cutting-Plane Algorithm


In this section we assume that the integer program which we want to solve is given in the
following form:

max cT x (9.22a)
Ax = b (9.22b)
x>0 (9.22c)
x ∈ Zn (9.22d)

where A is an integral m × n-matrix with rank(A) = m and b is an integral vector in Zm .


As usual we let P = {x ∈ Rn : Ax = b, x > 0} and PI = conv(P ∩ Zn ).
Any integer program with rational data can be brought into this form by elementary trans-
formations (see textbooks about Linear Programming [Sch86, Lue84, NW99] where those


methods are used for Linear Programs): if xj is not sign restricted, we replace xj by two
new variables xj = xj+ − xj− where xj+, xj− > 0. Any inequality aiT x 6 bi can be trans-
formed into an equality by introducing a slack variable s > 0 which yields aiT x + s = bi.
Since A and b are integral, the new slack variable will also be an integral variable.
Suppose that we solve the LP-relaxation
max cT x (9.23a)
Ax = b (9.23b)
x>0 (9.23c)

of (9.22) by means of the Simplex method (basically, one could also use an interior point
method; the key point is that in the sequel we need an optimal basis).


Recall that a basis for (9.23) is an index set B ⊆ {1, . . . , n} with |B| = m such that the
corresponding submatrix AB of A is nonsingular. The basis is termed feasible if xB :=
A−1
B b > 0. Clearly, in this case (xB , xN ) with xN := 0 is a feasible solution of (9.23). It is
a well known fact that (9.23) has an optimal solution if and only if there is an optimal basic
solution [Lue84, CCPS98, Sch86].
Suppose that we are given an optimal basis B and a corresponding optimal basic solution x∗
for (9.23). As in Section 8.4.1 we can parametrize any solution x of Ax = b by the nonbasic
variables:
xB = A−1 −1
B b − AB AN xN =: b̄ − ĀN xN (9.24)
This gives the equivalent statement of the problem (9.23) in the basis representation:
max cTB x∗B − c̄TN xN (9.25a)
xB + ĀN xN = b̄ (9.25b)
x>0 (9.25c)
If x∗ is integral, then x∗ is an optimal solution for our integer program (9.22). Otherwise,
there is a basic variable x∗i which has a fractional value, that is, x∗i = b̄i is not integral. We
will now use the equation in (9.24) which defines x∗i to derive a valid inequality for PI. We
will then show that the inequality derived is in fact a Gomory-Chvátal cutting-plane.
Let ĀN = (āij). Any feasible solution x of the integer program (9.22) satisfies (9.24). So,
we have

xi = b̄i − ∑_{j∈N} āij xj ∈ Z (9.26)
−⌊b̄i⌋ ∈ Z (9.27)
∑_{j∈N} ⌊āij⌋ xj ∈ Z. (9.28)

Adding (9.26), (9.27) and (9.28) results in:

(b̄i − ⌊b̄i⌋) − ∑_{j∈N} (āij − ⌊āij⌋) xj ∈ Z, (9.29)

where the first term lies in (0, 1) and the sum is nonnegative. Since 0 < (b̄i − ⌊b̄i⌋) < 1 and
∑_{j∈N} (āij − ⌊āij⌋) xj > 0, the value on the left hand side of (9.29) can only be integral, if
it is nonpositive, that is, we must have

(b̄i − ⌊b̄i⌋) − ∑_{j∈N} (āij − ⌊āij⌋) xj 6 0


for every x ∈ PI. Thus, the following inequality is valid for PI:

∑_{j∈N} (āij − ⌊āij⌋) xj > (b̄i − ⌊b̄i⌋). (9.30)

Moreover, the inequality (9.30) is violated by the current basic solution x∗, since x∗N = 0
(which means that the left hand side of (9.30) is zero) and x∗i = b̄i is fractional, so that
(b̄i − ⌊b̄i⌋) = (x∗i − ⌊x∗i⌋) > 0.
As promised, we are now going to show that (9.30) is in fact a Gomory-Chvátal cutting-
plane. By (9.24) the inequality

xi + ∑_{j∈N} āij xj 6 b̄i (9.31)

is valid for P. Since P ⊆ Rn+, we have ∑_{j∈N} ⌊āij⌋ xj 6 ∑_{j∈N} āij xj and the inequality

xi + ∑_{j∈N} ⌊āij⌋ xj 6 b̄i (9.32)

must also be valid for P. In fact, since the basic solution x∗ for the basis B satisfies (9.31)
and (9.32) with equality, the inequalities (9.31) and (9.32) both induce supporting hyper-
planes. Observe that all coefficients in (9.32) are integral. Thus,

xi + ∑_{j∈N} ⌊āij⌋ xj 6 ⌊b̄i⌋ (9.33)

is a Gomory-Chvátal cutting-plane. We can now use (9.24) to rewrite (9.33), that is, to
eliminate xi (this corresponds to taking a nonnegative linear combination of (9.33) and the
appropriate inequality stemming from the equality (9.24)). This yields (9.30), so (9.30)
is (a scalar multiple of) a Gomory-Chvátal cutting-plane. It is important to notice that
the difference between the left-hand side and the right-hand side of the Gomory-Chvátal
cutting-plane (9.33), hence also of (9.30), is integral when x is integral. Thus, if (9.30) is
rewritten using a slack variable s > 0 as

∑_{j∈N} (āij − ⌊āij⌋) xj − s = (b̄i − ⌊b̄i⌋), (9.34)

then this slack variable s will also be a nonnegative integer variable. Gomory’s cutting
plane algorithm is summarized in Algorithm 9.2.
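Computing the cut (9.30) from a tableau row is a one-liner once fractional parts are available. The following sketch (illustrative Python with exact fractions; not part of the original notes) reproduces the first cut of the example below:

```python
from fractions import Fraction as F

def frac(q):
    """Fractional part q - floor(q) of a Fraction (works for negative q)."""
    return q - (q.numerator // q.denominator)

def gomory_cut(row, rhs):
    """Cut (9.30) from a tableau row  x_i + sum_j row[j]*x_j = rhs  whose
    right hand side is fractional: returns the coefficients and right hand
    side of  sum_j (a_ij - floor(a_ij)) x_j >= rhs - floor(rhs)."""
    f0 = frac(rhs)
    assert f0 != 0, "basic variable is already integral"
    return {j: frac(a) for j, a in row.items()}, f0

# Row of the example below:  x1 + (1/7)x3 + (2/7)x4 = 20/7
print(gomory_cut({"x3": F(1, 7), "x4": F(2, 7)}, F(20, 7)))
# -> ({'x3': Fraction(1, 7), 'x4': Fraction(2, 7)}, Fraction(6, 7)),
#    i.e. the cut (1/7)x3 + (2/7)x4 >= 6/7
```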

Example 9.12
We consider the following integer program:

max 4x1 −x2


7x1 −2x2 6 14
x2 6 3
2x1 −2x2 6 3
x1 , x2 > 0
x1 , x2 ∈ Z

The feasible points and the polyhedron P described by the inequalities above are depicted
in Figure 9.2 (a).


Algorithm 9.2 Gomory’s cutting-plane algorithm


G OMORY-C UTTING -P LANE 
Input: An integer program max cT x : Ax = b, x > 0, x ∈ Zn
1 repeat 
2 Solve the current LP-relaxation max cT x : Ax = b, x > 0 . Let x∗ be an optimal
basic solution.
3 if x∗ is integral then
4 An optimal solution to the integer program has been found. stop.
5 else
6 Choose one of the basis integer variables which is fractional in the optimal LP-
solution, say xi = b̄i . This variable is parametrized as follows:
X
xi = b̄i − āij xj
j∈N

7 Generate the Gomory-Chvátal cut


X
(āij − ⌊āij ⌋)xj > (b̄i − ⌊b̄i ⌋)
j∈N

and add it to the LP-formulation by means of a new nonnegative integer slack


variable s:
X
(āij − ⌊āij ⌋)xj − s = (b̄i − ⌊b̄i ⌋)
j∈N
s>0
s∈Z

8 end if
9 until forever


Figure 9.2: Example problem for Gomory’s algorithm. Thick solid lines indicate the poly-
tope described by the inequalities, the pink shaded region is the convex hull of the integral
points (red) which are feasible. (a) Inequalities leading to the polytope shown with thick
lines together with the convex hull of the integral points. (b) The first cut produced by
Gomory’s algorithm is x1 6 2 (shown in red). (c) The next cut produced is x1 − x2 6 1
(shown in red); after this cut the algorithm terminates with the optimum solution (2, 1).


Adding slack variables gives the following LP-problem:


max 4x1 −x2
7x1 −2x2 +x3 = 14
x2 +x4 = 3
2x1 −2x2 +x5 = 3
x1 , x2 , x3 , x4 , x5 > 0
x1 , x2 , x3 , x4 , x5 ∈ Z
An optimal basis for the corresponding LP-relaxation is B = {1, 2, 5} with

     ( 7 −2  0 )
AB = ( 0  1  0 )   and   x∗ = (20/7, 3, 0, 0, 23/7)T.
     ( 2 −2  1 )

The optimal basis representation (9.25) is given by:

max 59/7 − (4/7)x3 − (1/7)x4
x1 + (1/7)x3 + (2/7)x4 = 20/7 (9.35a)
x2 + x4 = 3
−(2/7)x3 + (10/7)x4 + x5 = 23/7
x1, x2, x3, x4, x5 > 0 (9.35b)
x1, x2, x3, x4, x5 ∈ Z (9.35c)
The variable x∗1 is fractional, x∗1 = 20/7. From the first line of (9.35) we have:

x1 + (1/7)x3 + (2/7)x4 = 20/7.

The Gomory-Chvátal cut (9.30) generated for x1 is

(1/7 − ⌊1/7⌋)x3 + (2/7 − ⌊2/7⌋)x4 > (20/7 − ⌊20/7⌋)
⇔ (1/7)x3 + (2/7)x4 > 6/7

Thus, the following new constraint will be added to the LP-formulation:

(1/7)x3 + (2/7)x4 − s = 6/7,

where s > 0, s ∈ Z is a new integer nonnegative slack variable. Before we continue, let us
look at the inequality in terms of the original variables, that is, without the slack variables.
We have x3 = 14 − 7x1 + 2x2 and x4 = 3 − x2. Substituting we get the cutting-plane

(1/7)(14 − 7x1 + 2x2) + (2/7)(3 − x2) > 6/7
⇔ x1 6 2.

The cutting-plane x1 6 2 is shown in Figure 9.2(b).
Reoptimization of the new LP leads to the following optimal basis representation:

max 15/2 − (1/2)x5 − 3s
x1 + s = 2
x2 − (1/2)x5 + s = 1/2
x3 − x5 − 5s = 1
x4 + (1/2)x5 − s = 5/2
x1, x2, x3, x4, x5, s > 0
x1, x2, x3, x4, x5, s ∈ Z


The optimal solution x∗ = (2, 1/2, 1, 5/2, 0) is still fractional. We choose the basic variable x2,
which is fractional, to generate a new cut. We have

x2 − (1/2)x5 + s = 1/2,

and so the new cut is

(−1/2 − ⌊−1/2⌋)x5 > (1/2 − ⌊1/2⌋)
⇔ (1/2)x5 > 1/2

(observe that (−1/2 − ⌊−1/2⌋) = 1/2, since ⌊−1/2⌋ = −1). We introduce a new slack variable t >
0, t ∈ Z and add the following constraint:

(1/2)x5 − t = 1/2.

Again, we can translate the new cut (1/2)x5 > 1/2 in terms of the original variables. Since
x5 = 3 − 2x1 + 2x2, it amounts to

(1/2)(3 − 2x1 + 2x2) > 1/2
⇔ x1 − x2 6 1.

The new cutting-plane x1 − x2 6 1 is shown in Figure 9.2(c).


After reoptimization we obtain the following situation:

max 7 − 3s − t
x1 + s = 2
x2 + s − t = 1
x3 − 5s − 2t = 2
x4 − s + t = 2
x5 − 2t = 1
x1, x2, x3, x4, x5, s, t > 0
x1, x2, x3, x4, x5, s, t ∈ Z

The optimal basic solution is integral, thus it is also an optimal solution for the original
integer program: (x1, x2) = (2, 1) constitutes an optimal solution of our original integer
program. ⊳

One can show that Gomory’s cutting-plane algorithm always terminates after a finite num-
ber of steps, provided the cuts are chosen appropriately. The proof of the following theorem
is beyond the scope of these lecture notes. We refer the reader to [Sch86, NW99].

Theorem 9.13 Suppose that Gomory’s algorithm is implemented in the following way:

(i) We use the lexicographic Simplex algorithm for solving the LPs.

(ii) We always derive the Gomory-Chvátal cut from the first Simplex row in which the
basic variable is fractional.

Then, Gomory’s cutting-plane algorithm terminates after a finite number of steps with an
optimal integer solution. ✷


9.5 Mixed Integer Cuts


In this section we consider the situation of mixed integer programs

(MIP) max cT x (9.36a)
Ax = b (9.36b)
x > 0 (9.36c)
x ∈ Zp × Rn−p. (9.36d)

In this case, the approach taken so far does not work. In a nutshell, the basis of the Gomory-
Chvátal cuts was the fact that, if X = {y ∈ Z : y 6 b}, then y 6 ⌊b⌋ is valid for X. More
precisely, we saw that all Gomory-Chvátal cuts for PI = conv(P ∩ Zn) are of the form cT x 6 ⌊δ⌋
where cT x 6 δ is a supporting hyperplane of P with integral c. If x is not required to be
integral, we may not round down the right hand side of cT x 6 δ to obtain a valid inequality
for PI.
The approach taken in Gomory’s cutting-plane algorithm from Section 9.4 does not work
either, since for instance

1/3 + (1/3)x1 − 2x2 ∈ Z

with x1 ∈ Z+ and x2 ∈ R+ has a larger solution set than

1/3 + (1/3)x1 ∈ Z.

Thus, we can not derive the validity of (9.30) (since we can not assume that the coefficients
of the fractional variables are nonnegative), which forms the basis of Gomory’s algorithm.
The key to obtaining cuts for mixed integer programs is the following disjunctive argument:

Lemma 9.14 Let P1 and P2 be polyhedra in Rn+ and let (a(i))T x 6 αi be valid for Pi, i =
1, 2. Then, for any vector c ∈ Rn satisfying c 6 min(a(1), a(2)) componentwise and δ >
max(α1, α2) the inequality

cT x 6 δ

is valid for X = P1 ∪ P2 and conv(X).

Proof: Let x ∈ X, then x ∈ P1 or x ∈ P2. If x ∈ Pi, then

cT x = ∑_{j=1}^{n} cj xj 6 ∑_{j=1}^{n} a(i)j xj 6 αi 6 δ,

where the first inequality follows from c 6 a(i) and x > 0. ✷

Let us go back to the situation in Gomory’s algorithm. We solve the LP-relaxation

(LP) max cT x (9.37a)
Ax = b (9.37b)
x > 0 (9.37c)

of (9.36) and obtain an optimal basic solution x∗. As in Section 9.4 we parametrize the
solutions of (9.37) by means of the nonbasic variables:

x∗B = AB^{-1} b − AB^{-1} AN x∗N =: b̄ − ĀN x∗N


Let xi be an integer variable. Then, any feasible solution to the MIP (9.36) satisfies:

b̄i − ∑_{j∈N} āij xj ∈ Z, (9.38)

since the quantity on the left hand side of (9.38) is the value of variable xi. Let N+ :=
{j ∈ N : āij > 0} and N− := N \ N+. Also, for a shorter notation we set f0 := (b̄i − ⌊b̄i⌋) ∈
[0, 1). Equation (9.38) is equivalent to

∑_{j∈N} āij xj = f0 + k for some k ∈ Z. (9.39)

If k > 0, then the quantity on the left hand side of (9.39) is at least f0; if k 6 −1, then it is
at most f0 − 1. Accordingly, we distinguish between two cases:

Case 1: ∑_{j∈N} āij xj > f0 > 0: In this case, we get from ∑_{j∈N+} āij xj > ∑_{j∈N} āij xj
the inequality:

∑_{j∈N+} āij xj > f0. (9.40)

Case 2: ∑_{j∈N} āij xj 6 f0 − 1 < 0: Then, ∑_{j∈N−} āij xj 6 ∑_{j∈N} āij xj 6 f0 − 1, which is
equivalent to

−(f0/(1 − f0)) ∑_{j∈N−} āij xj > f0. (9.41)

We split PI into two parts, PI = P1 ∪ P2, where

P1 := PI ∩ {x : ∑_{j∈N} āij xj > 0}
P2 := PI ∩ {x : ∑_{j∈N} āij xj < 0}.

Inequality (9.40) is valid for P1, while inequality (9.41) is valid for P2. We apply Lemma 9.14
to get an inequality πT x > π0. If j ∈ N+, the coefficient for xj is πj = max{āij, 0} = āij.
If j ∈ N−, the coefficient for xj is πj = max{0, −(f0/(1 − f0)) āij} = −(f0/(1 − f0)) āij. Thus, we obtain
the following inequality which is valid for PI:

∑_{j∈N+} āij xj − (f0/(1 − f0)) ∑_{j∈N−} āij xj > f0. (9.42)

We can strengthen (9.42) by the following technique: The derivation of (9.42) remains
valid even if we add integer multiples of integer variables to the left hand side of (9.39): if
π is an integral vector, then

∑_{j: xj is an integer variable} πj xj + ∑_{j∈N} āij xj = f0 + k for some k ∈ Z. (9.39’)

Thus, we can achieve that every integer variable is in one of the two sets M+ := {j : āij + πj > 0}
or M− := {j : āij + πj < 0}. If j ∈ M+, then the coefficient of xj in the new version
πT x > π0 of (9.42) is āij + πj, so the smallest coefficient we can achieve is fj := (āij − ⌊āij⌋). If
j ∈ M−, then the coefficient of xj is −(f0/(1 − f0))(āij + πj) and the smallest value we can achieve


is −(f0/(1 − f0))(fj − 1) = f0(1 − fj)/(1 − f0). In summary, the smallest coefficient we can achieve for an
integer variable is

min{fj, f0(1 − fj)/(1 − f0)}. (9.43)

The minimum in (9.43) is fj if and only if fj 6 f0. This leads to Gomory’s mixed integer
cut:

∑_{j: xj integer variable, fj 6 f0} fj xj + ∑_{j: xj integer variable, fj > f0} (f0(1 − fj)/(1 − f0)) xj
+ ∑_{j∈N+, xj no integer variable} āij xj − (f0/(1 − f0)) ∑_{j∈N−, xj no integer variable} āij xj > f0. (9.44)

Similar to Theorem 9.13 it can be shown that an appropriate choice of cutting-planes (9.44)
leads to an algorithm which solves the MIP (9.36) in a finite number of steps.
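The coefficients of (9.44) can be computed directly from a tableau row. The sketch below is our own illustration (the toy row in the usage example is invented for demonstration):

```python
from fractions import Fraction as F

def gmi_cut(row, rhs, is_integer):
    """Gomory mixed integer cut (9.44) from the tableau row of an integer
    basic variable:  x_i + sum_j row[j]*x_j = rhs, rhs fractional.
    is_integer[j] says whether the nonbasic x_j is an integer variable.
    Returns (pi, f0) for the valid inequality  sum_j pi[j]*x_j >= f0."""
    frac = lambda q: q - (q.numerator // q.denominator)
    f0 = frac(rhs)
    assert 0 < f0 < 1
    pi = {}
    for j, a in row.items():
        if is_integer[j]:
            fj = frac(a)
            pi[j] = fj if fj <= f0 else f0 * (1 - fj) / (1 - f0)
        else:
            pi[j] = a if a > 0 else -f0 * a / (1 - f0)
    return pi, f0

# Toy row: x_i + (7/4)x1 - (3/4)x2 + (1/2)x3 = 9/4 with x1, x2 integer
pi, f0 = gmi_cut({1: F(7, 4), 2: F(-3, 4), 3: F(1, 2)},
                 F(9, 4), {1: True, 2: True, 3: False})
print(pi, f0)
# -> {1: Fraction(1, 12), 2: Fraction(1, 4), 3: Fraction(1, 2)} and 1/4
```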

9.6 Structured Inequalities


In the previous sections of this chapter we have derived valid inequalities for general integer
and mixed-integer programs. Sometimes focussing on a single constraint (or a small subset
of the constraints) can reveal that a particular problem has a useful “local structure”. In this
section we will explore such local structures in order to derive strong inequalities.

9.6.1 Knapsack and Cover Inequalities


We consider the 0/1-Knapsack polytope

PKNAPSACK := PKNAPSACK(N, a, b) := conv{x ∈ BN : ∑_{j∈N} aj xj 6 b},

which we have seen a couple of times in these lecture notes (for instance in Example 1.3).
Here, the aj are nonnegative coefficients and b > 0. We use the general index set N instead
of N = {1, . . . , n} to emphasize that a knapsack constraint ∑_{j∈N} aj xj 6 b might occur as
a constraint in a larger integer program and might not involve all variables.
In all that follows, we assume that aj 6 b for j ∈ N, since aj > b implies that xj = 0 for
all x ∈ PKNAPSACK(N, a, b). Under this assumption PKNAPSACK(N, a, b) is full-dimensional,
since χ∅ and χ{j} (j ∈ N) form a set of n + 1 affinely independent vectors in PKNAPSACK(N, a, b).
Each inequality xj > 0 for j ∈ N is valid for PKNAPSACK(N, a, b). Moreover, each of these
nonnegativity constraints defines a facet of PKNAPSACK(N, a, b), since χ∅ and χ{i} (i ∈
N \ {j}) form a set of n affinely independent vectors that satisfy the inequality at equality.
In the sequel we will search for more facets of PKNAPSACK(N, a, b) and, less ambitiously, for
more valid inequalities.

Definition 9.15 (Cover, minimal cover)


A set C ⊆ N is called a cover, if

∑_{j∈C} aj > b.

The cover is called a minimal cover, if C \ {j} is not a cover for all j ∈ C.


Each cover C gives us a valid inequality ∑_{j∈C} xj 6 |C| − 1 for PKNAPSACK (if you do not
see this immediately, the proof will be given in the following theorem). It turns out that
this inequality is quite strong, provided that the cover is minimal.

Example 9.16
Consider the knapsack set

X = {x ∈ B7 : 11x1 + 6x2 + 6x3 + 5x4 + 5x5 + 4x6 + x7 6 19}.

Three covers are C1 = {1, 2, 6}, C2 = {3, 4, 5, 6} and C3 = {1, 2, 5, 6} so we have the cover
inequalities:
x1 +x2 +x6 6 2
x3 +x4 +x5 +x6 6 3
x1 +x2 +x5 +x6 6 3
The cover C3 is not minimal, since C1 ⊂ C3 is also a cover. ⊳

Theorem 9.17 Let C ⊆ N be a cover. Then, the cover inequality

∑_{j∈C} xj 6 |C| − 1

is valid for PKNAPSACK(N, a, b). Moreover, if C is minimal, then the cover inequality defines
a facet of PKNAPSACK(C, a, b).

Proof: By Observation 2.2 it suffices to show that the inequality is valid for the knapsack
set

X := {x ∈ BN : ∑_{j∈N} aj xj 6 b}.

Suppose that x ∈ X does not satisfy the cover inequality. We have that x = χS is the
incidence vector of a set S ⊆ N. By assumption we have

|C| − 1 < ∑_{j∈C} xj = ∑_{j∈C∩S} xj = |C ∩ S|.

So |C ∩ S| = |C| and consequently C ⊆ S. Thus,

∑_{j∈N} aj xj > ∑_{j∈C} aj xj = ∑_{j∈C∩S} aj = ∑_{j∈C} aj > b,

which contradicts the fact that x ∈ X.


It remains to show that the cover inequality defines a facet of PKNAPSACK(C, a, b), if the
cover is minimal. Suppose that there is a facet-defining inequality cT x 6 δ such that

{x ∈ PKNAPSACK(C, a, b) : ∑_{j∈C} xj = |C| − 1} (9.45)
⊆ Fc := {x ∈ PKNAPSACK(C, a, b) : cT x = δ}.

We will show that cT x 6 δ is a nonnegative scalar multiple of the cover inequality.


For i ∈ C consider the set Ci := C \ {i}. Since C is minimal, Ci is not a cover. Conse-
quently, each of the |C| incidence vectors χCi ∈ BC is contained in the set on the left hand
side of (9.45), so χCi ∈ Fc for i ∈ C. Thus, for i 6= j we have

0 = cT χCi − cT χCj = cT (χCi − χCj ) = ci − cj .


Hence we have ci = γ for i ∈ C and cT x 6 δ is of the form

γ ∑_{j∈C} xj 6 δ. (9.46)

Fix i ∈ C. Then, by (9.46) we have

cT χCi = γ ∑_{j∈Ci} 1 = γ |Ci| = δ

and, since |Ci| = |C| − 1,

δ = γ(|C| − 1),

so cT x 6 δ must be a nonnegative scalar multiple of the cover inequality. ✷


The proof technique above is a general tool to show that an inequality defines a facet of a
full-dimensional polyhedron:

Observation 9.18 (Proof technique 1 for facets) Suppose that P = conv(X) ⊆ Rn is a
full dimensional polyhedron and πT x 6 π0 is a valid inequality for P. In order to show
that πT x 6 π0 defines a facet of P, it suffices to accomplish the following steps:

(i) Select t > n points x1, . . . , xt ∈ X with πT xi = π0 and suppose that all these points
lie on a generic hyperplane cT x = δ.

(ii) Solve the linear equation system

∑_{j=1}^{n} cj xij = δ for i = 1, . . . , t (9.47)

in the n + 1 unknowns (c, δ).


(iii) If the only solution of (9.47) is (c, δ) = γ(π, π0 ) for some γ 6= 0, then the inequality
πT x 6 π0 defines a facet of P.

In the proof of Theorem 9.17 we were dealing with a polytope PKNAPSACK(C, a, b) ⊂ RC
and we chose |C| points χCi (i ∈ C) satisfying the cover inequality with equality.

Let us return to the cover inequalities. We have shown that for a minimal cover C the cover
inequality ∑_{j∈C} xj 6 |C| − 1 defines a facet of PKNAPSACK(C, a, b). However, this does not
necessarily mean that the inequality is also facet-defining for PKNAPSACK(N, a, b). Observe
that there is a simple way to strengthen the basic cover inequalities:

Lemma 9.19 Let C be a cover for X = {x ∈ BN : ∑_{j∈N} aj xj 6 b}. We define the ex-
tended cover E(C) by

E(C) := C ∪ {j ∈ N : aj > ai for all i ∈ C}.

The extended cover inequality

∑_{j∈E(C)} xj 6 |C| − 1

is valid for PKNAPSACK(N, a, b).


Proof: Along the same lines as the validity of the cover inequality in Theorem 9.17. ✷

Example 9.20 (Continued)


In the knapsack set of Example 9.16 the extended cover inequality for C = {3, 4, 5, 6} is

x1 + x2 + x3 + x4 + x5 + x6 6 3.

So, the cover inequality x3 +x4 +x5 +x6 6 3 is dominated by the extended cover inequality.
On the other hand, the extended cover inequality in turn is dominated by the inequality
2x1 + x2 + x3 + x4 + x5 + x6 6 3, so it can not be facet-defining (cf. Theorem 3.47). ⊳

We have just seen that even extended cover inequalities might not give us a facet of the
knapsack polytope. Nevertheless, under some circumstances they are facet-defining:

Theorem 9.21 Let a1 > a2 > · · · > an and C = {j1 , . . . , jr } with j1 < j2 < · · · < jr be a
minimal cover. Suppose that at least one of the following conditions is satisfied:

(i) C = N

(ii) E(C) = N and (C \ {j1, j2}) ∪ {1} is not a cover.

(iii) C = E(C) and (C \ {j1}) ∪ {p} is not a cover, where p = min {j : j ∈ N \ E(C)}.

(iv) C ⊂ E(C) ⊂ N and (C \ {j1, j2}) ∪ {1} is not a cover and (C \ {j1}) ∪ {p} is not a cover,
where p = min {j : j ∈ N \ E(C)}.
p = min {j : j ∈ N \ E(C)}.

Then the extended cover inequality defines a facet.

Proof: We construct n affinely independent vectors in X that satisfy the extended cover
inequality at equality. Then, it follows that the proper face induced by the inequality has
dimension at least n − 1, which means that it constitutes a facet.
We use the incidence vectors of the following subsets of N:

(i) the |C| sets Ci := C \ {ji } for ji ∈ C.

(ii) the |E(C) \ C| sets Ck′ := (C \ {j1, j2}) ∪ {k} for k ∈ E(C) \ C. Observe that |Ck′ ∩
E(C)| = |C| − 1 and that Ck′ is not a cover by the assumptions of the theorem.

(iii) the |N \ E(C)| sets C̄j := (C \ {j1}) ∪ {j} for j ∈ N \ E(C); again |E(C) ∩ C̄j| = |C| − 1
and C̄j is not a cover by the assumptions of the theorem.

It is straightforward to verify that the n vectors constructed above are in fact affinely inde-
pendent. ✷

On the way to proving Theorem 9.21 we saw another technique to prove that an inequality
is facet defining, which for obvious reasons is called the direct method:

Observation 9.22 (Proof technique 2 for facets) In order to show that πT x 6 π0 defines
a facet of P, it suffices to present dim(P) affinely independent vectors in P that satisfy
πT x 6 π0 at equality.

Usually we are in the situation that P = conv(X) and we will be able to exploit the combi-
natorial structure of X. Let us diverge for a moment and illustrate this one more time for
the matching polytope.


Example 9.23
Let G = (V, E) be an undirected graph and M(G) be the convex hull of the incidence
vectors of all matchings of G. Then, all vectors in M(G) satisfy the following inequalities:

x(δ(v)) 6 1 for all v ∈ V (9.48a)
x(γ(T)) 6 (|T| − 1)/2 for all T ⊆ V, |T| > 3 odd (9.48b)
xe > 0 for all e ∈ E. (9.48c)

Inequalities (9.48a) and (9.48c) are obvious and the inequalities (9.48b) have been shown
to be valid in Example 9.3.
The polytope M(G) is clearly full-dimensional. Moreover, each inequality xe ′ > 0 defines
a facet of M(G). To see this, take the |E| incidence vectors of the matchings ∅ and {e}
where e ∈ E \ {e ′ } which are clearly independent and satisfy the inequality at equality.

9.6.2 Lifting of Cover Inequalities


We return from our short excursion to matchings to the extended cover inequalities. In
Example 9.20 we saw that the extended cover inequality x1 + x2 + x3 + x4 + x5 + x6 6 3
is dominated by the inequality 2x1 + x2 + x3 + x4 + x5 + x6 6 3. How could we possibly
derive the latter inequality?
Consider the cover inequality for the cover C = {3, 4, 5, 6}

x3 + x4 + x5 + x6 6 3

which is valid for our knapsack set

X = {x ∈ B7 : 11x1 + 6x2 + 6x3 + 5x4 + 5x5 + 4x6 + x7 6 19}

from Example 9.16. We may also say that the cover inequality is valid for the set

X′ := {x ∈ B4 : 6x3 + 5x4 + 5x5 + 4x6 6 19},

which is formed by the variables in C. Since the cover is minimal, by Theorem 9.17 the
cover inequality defines a facet of conv(X ′ ), so it is as strong as possible. We would like to
transfer the inequality and its strength to the higher dimensional set conv(X).
As a first step, let us determine a coefficient β1 such that the inequality

β1 x1 + x3 + x4 + x5 + x6 6 3 (9.49)

is valid for

X′′ := {x ∈ B5 : 11x1 + 6x3 + 5x4 + 5x5 + 4x6 6 19}.
In a second step, we will choose β1 as large as possible, making the inequality as strong as
possible.
For all x ∈ X ′′ with x1 = 0, the inequality (9.49) is valid for all values of β1 . If x1 = 1,
then (9.49) is valid if and only if

β1 + x3 + x4 + x5 + x6 6 3

is valid for all x ∈ B4 satisfying

6x3 + 5x4 + 5x5 + 4x6 6 19 − 11 = 8.


Thus, (9.49) is valid if and only if

β1 + max{x3 + x4 + x5 + x6 : x ∈ B4 and 6x3 + 5x4 + 5x5 + 4x6 6 8} 6 3.

This is equivalent to saying that β1 6 3 − z1, where

z1 = max{x3 + x4 + x5 + x6 : x ∈ B4 and 6x3 + 5x4 + 5x5 + 4x6 6 8}. (9.50)

The problem in (9.50) is itself a KNAPSACK problem. However, the objective function is
particularly simple and in our example we can see easily that z1 = 1 (we have z1 > 1 since
(1, 0, 0, 0) is feasible for the problem; on the other hand, no two items fit into the knapsack
of size 8). Thus, (9.49) is valid for all values β1 6 3 − 1 = 2. Setting β1 = 2 gives the
strongest inequality.
The technique that we have seen above is called lifting: we “lift” a lower-dimensional
(facet-defining) inequality to a higher-dimensional polyhedron. The fact that this lifting is
possible gives another justification for studying “local structures” in integer programs such
as knapsack inequalities.
In our example we have lifted the cover inequality one dimension. In order to lift the
inequality to the whole polyhedron, we need to solve a more general problem. Namely, we
wish to find the best possible values βj for j ∈ N \ C such that the inequality

∑_{j∈N\C} βj xj + ∑_{j∈C} xj 6 |C| − 1

is valid for X = {x ∈ BN : ∑_{j∈N} aj xj 6 b}. The procedure in Algorithm 9.3 accom-
plishes this task.

Example 9.24 (Continued)
We return to the knapsack set of Examples 9.16 and 9.20. Take the minimal cover C =
{3, 4, 5, 6}

x3 + x4 + x5 + x6 6 3

and set j1 = 1, j2 = 2 and j3 = 7. We have already calculated the value β1 = 2. For
βj2 = β2, the coefficient for x2 in the lifted inequality, we need to solve the following
instance of KNAPSACK:

z2 = max 2x1 + x3 + x4 + x5 + x6
11x1 + 6x3 + 5x4 + 5x5 + 4x6 6 19 − 6 = 13
x ∈ B5

It is easy to see that z2 = 2, so we have βj2 = β2 = 3 − 2 = 1.
Finally, for βj3 = β7, we must solve

z7 = max 2x1 + x2 + x3 + x4 + x5 + x6
11x1 + 6x2 + 6x3 + 5x4 + 5x5 + 4x6 6 19 − 1 = 18
x ∈ B6

Here, the problem gets a little bit more involved. It can be seen that z7 = 3, so the coefficient
β7 for x7 in the lifted cover inequality is β7 = 3 − 3 = 0. We finish with the inequality:

2x1 + x2 + x3 + x4 + x5 + x6 6 3.


Algorithm 9.3 Algorithm to lift cover inequalities.
LIFT-COVER
Input: The data N, a, b for a knapsack set X = {x ∈ BN : ∑_{j∈N} aj xj 6 b}, a
minimal cover C
Output: Values βj for j ∈ N \ C such that

∑_{j∈N\C} βj xj + ∑_{j∈C} xj 6 |C| − 1

is valid for X
1 Let j1, . . . , jr be an ordering of N \ C.
2 for t = 1, . . . , r do
3 The valid inequality

∑_{i=1}^{t−1} βji xji + ∑_{j∈C} xj 6 |C| − 1

has been obtained so far.
4 To calculate the largest value βjt for which

βjt xjt + ∑_{i=1}^{t−1} βji xji + ∑_{j∈C} xj 6 |C| − 1

is valid, solve the following KNAPSACK problem:

zt = max ∑_{i=1}^{t−1} βji xji + ∑_{j∈C} xj
∑_{i=1}^{t−1} aji xji + ∑_{j∈C} aj xj 6 b − ajt
x ∈ B|C|+t−1

5 Set βjt := |C| − 1 − zt.
6 end for
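Algorithm 9.3 is easy to realize when the knapsack subproblems are solved exactly, e.g. by dynamic programming. The following Python sketch is our own illustration (it assumes nonnegative integer weights and uses 0-based indices) and reproduces the lifting of Example 9.24:

```python
def knapsack_max(values, weights, cap):
    """Exact 0/1 knapsack by dynamic programming over loads."""
    best = {0: 0}
    for v, w in zip(values, weights):
        for load, val in list(best.items()):       # snapshot: each item once
            if load + w <= cap and val + v > best.get(load + w, -1):
                best[load + w] = val + v
    return max(best.values())

def lift_cover(N, a, b, C):
    """Sequential lifting (Algorithm 9.3) of the minimal-cover inequality
    sum_{j in C} x_j <= |C| - 1; returns the coefficients beta_j."""
    beta, coef, lifted = {}, {j: 1 for j in C}, list(C)
    for jt in (j for j in N if j not in C):
        zt = knapsack_max([coef[j] for j in lifted],
                          [a[j] for j in lifted], b - a[jt])
        beta[jt] = coef[jt] = len(C) - 1 - zt
        lifted.append(jt)
    return beta

# Example 9.24 (0-based): a = (11,6,6,5,5,4,1), b = 19, C = {x3,x4,x5,x6}
print(lift_cover(range(7), [11, 6, 6, 5, 5, 4, 1], 19, [2, 3, 4, 5]))
# -> {0: 2, 1: 1, 6: 0}, i.e. 2x1 + x2 + x3 + x4 + x5 + x6 <= 3
```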


We now prove that the lifting technique, in a more general setting, provides us with a tool to
derive facet-defining inequalities:

Theorem 9.25 Suppose that X ⊆ Bn and let Xδ = X ∩ {x ∈ Bn : x1 = δ} for δ ∈ {0, 1}.

(i) Suppose that the inequality

∑_{j=2}^{n} πj xj 6 π0 (9.51)

is valid for X0. If X1 = ∅, then x1 6 0 is valid for X. If X1 6= ∅, then the inequality

β1 x1 + ∑_{j=2}^{n} πj xj 6 π0 (9.52)

is valid for X if β1 6 π0 − z, where

z = max{∑_{j=2}^{n} πj xj : x ∈ X1}.

Moreover, if β1 = π0 − z and (9.51) defines a face of dimension k of conv(X0), then
the lifted inequality (9.52) defines a face of dimension k + 1 of conv(X). In particular,
if (9.51) is facet-defining for conv(X0), then (9.52) is facet-defining for conv(X).

(ii) Suppose that the inequality (9.51) is valid for X1. If X0 = ∅, then x1 > 1 is valid
for X. If X0 6= ∅, then

γ1 x1 + ∑_{j=2}^{n} πj xj 6 π0 + γ1 (9.53)

is valid for X if γ1 > z′ − π0, where

z′ = max{∑_{j=2}^{n} πj xj : x ∈ X0}.

Moreover, if γ1 = z′ − π0 and (9.51) defines a face of dimension k of conv(X1), then
the lifted inequality (9.53) defines a face of dimension k + 1 of conv(X). In particular,
if (9.51) is facet-defining for conv(X1), then (9.53) is facet-defining for conv(X).

Proof: We only prove the first part of the theorem. The second part can be proved along
the same lines.
We first show that the lifted inequality (9.52) is valid for X for all β1 6 π0 − z. We have
X = X0 ∪ X1. If x ∈ X0, then

β1 x1 + ∑_{j=2}^{n} πj xj = ∑_{j=2}^{n} πj xj 6 π0.

If x ∈ X1, then

β1 x1 + ∑_{j=2}^{n} πj xj = β1 + ∑_{j=2}^{n} πj xj 6 β1 + z 6 (π0 − z) + z = π0

by definition of z. Thus, the validity follows.
If (9.51) defines a k-dimensional face of conv(X0), then there are k + 1 affinely independent
vectors x̄i, i = 1, . . . , k + 1, that satisfy (9.51) at equality. Every one of those vectors has
x1 = 0 and also satisfies (9.52) at equality. Choose x∗ ∈ X1 such that z = ∑_{j=2}^{n} πj x∗j.
If β1 = π0 − z, then x∗ satisfies (9.52) also at equality. Moreover, x∗ must be affinely
independent from all the vectors x̄i, since the first component of x∗ is 1 while all the
vectors x̄i have first component 0. Thus, we have found k + 2 affinely independent vectors
satisfying (9.52) at equality and it follows that the face induced has dimension k + 1. ✷

Theorem 9.25 can be used iteratively as in our lifting procedure (Algorithm 9.3): Given
N1 ⊂ N = {1, . . . , n} and an inequality Σ_{j∈N1} πj xj ≤ π0 which is valid for

    X ∩ { x ∈ B^n : xj = 0 for j ∈ N \ N1 },

we can lift one variable at a time to obtain a valid inequality

    Σ_{j∈N\N1} βj xj + Σ_{j∈N1} πj xj ≤ π0                        (9.54)

for X. In general, the coefficients βj in (9.54) depend on the order in which the variables
are lifted. The corresponding lifting procedure is a straightforward generalization of our
lifting procedure for the cover inequalities. From Theorem 9.25 we obtain the following
corollary:

Corollary 9.26 Let C be a minimal cover for X = { x ∈ B^N : Σ_{j∈N} aj xj ≤ b }. The
lifting procedure in Algorithm 9.3 determines a facet-defining inequality for conv(X). ✷

9.6.3 Separation of Cover Inequalities


Since, in general, there are exponentially many cover inequalities, it is not feasible to add
them all at once. Thus, we proceed as in the generic cutting-plane algorithm from Section 9.3,
where our family F of valid inequalities is the set of cover inequalities.
Suppose that we have solved a relaxation of the K NAPSACK problem resulting in an opti-
mum fractional solution x∗ . Our task is now to find a cover inequality violated by x∗ or to
certify that x∗ satisfies all of them.
Let us rewrite a cover inequality:

    Σ_{j∈C} xj ≤ |C| − 1  ⇔  |C| − Σ_{j∈C} xj ≥ 1  ⇔  Σ_{j∈C} (1 − xj) ≥ 1.

So, we would like to know whether there is a cover C such that

    Σ_{j∈C} (1 − x∗j) < 1.

Let us define w∗j := 1 − x∗j for j = 1, . . . , n. With this notation, the task of finding a violated
cover inequality is equivalent to solving the following optimization problem:
    v := min  Σ_{j=1}^n w∗j yj                                   (9.55a)
              Σ_{j=1}^n aj yj > b                                (9.55b)
              yj ∈ {0, 1},  j = 1, . . . , n.                    (9.55c)

Thus, x∗ violates a cover inequality if and only if v < 1. Note that the problem (9.55) is
not an integer program in the ordinary sense, since it contains a strict inequality. But, if we




Day          Job 1   Job 2   Job 3   Job 4   Job 5   Job 6
Monday         X               X                       X
Tuesday        X       X
Wednesday              X       X               X
Thursday                                               X
Friday                                 X       X       X

Table 9.1: Cleaning requests for Dilbert’s company. A checkmark indicates a request for
that particular day of the week.

assume that the aj and b are integral, then the condition Σ_{j=1}^n aj yj > b is tantamount
to Σ_{j=1}^n aj yj ≥ b + 1. Hence, we can check for a violated cover inequality by solving:

    v := min  Σ_{j=1}^n w∗j yj                                   (9.56a)
              Σ_{j=1}^n aj yj ≥ b + 1                            (9.56b)
              yj ∈ {0, 1},  j = 1, . . . , n.                    (9.56c)

But, the problem (9.56) is another KNAPSACK problem. So, what is the point of this
approach? In order to find a violated cover inequality, we need to solve a KNAPSACK
problem, which is what we started with.
However, typically most of the x∗j will be 0 or 1, which means that most of the w∗j will
also be 0 or 1. Thus, problem (9.56) can usually be solved very quickly.
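As an illustration, the following Python sketch solves the separation problem (9.56) by the standard 0/1-knapsack dynamic program over the (integral) capacity b + 1. The names and data layout are our own choices, not part of the text.

```python
import math

def separate_cover(a, b, xstar):
    """Search for a cover C with sum_{j in C} (1 - x*_j) < 1, cf. (9.56).

    a: dict item -> integral weight a_j; b: integral capacity;
    xstar: dict item -> fractional LP value x*_j.
    Returns a violated cover as a list, or None if none exists."""
    T = b + 1                            # a cover must satisfy sum a_j >= b + 1
    dp = [(math.inf, None)] * (T + 1)    # dp[w]: cheapest item set of weight >= w
    dp[0] = (0.0, ())
    for j, aj in a.items():
        wj = 1.0 - xstar[j]              # cost w*_j = 1 - x*_j
        new = dp[:]
        for w in range(T + 1):
            cost, items = dp[w]
            if cost < math.inf:
                w2 = min(T, w + aj)      # saturate the weight at T
                if cost + wj < new[w2][0]:
                    new[w2] = (cost + wj, items + (j,))
        dp = new
    v, cover = dp[T]
    return list(cover) if v < 1 else None
```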

9.6.4 The Set-Packing Polytope


Integer and mixed integer programs often contain inequalities that have all coefficients
from B = {0, 1}. In particular, many applications require logical inequalities of the form
Σ_{j∈N} xj ≤ 1 (packing constraint: at most one of the js is chosen) or Σ_{j∈N} xj ≥ 1
(covering constraint: at least one of the js is picked). This motivates the study of packing
and covering problems, cf. Example 1.10.

Example 9.27
Dilbert is running a small cleaning business. In fact, he is the only cleaner available in his
company. He offers a service from Monday to Friday and, in order to attract customers, he
is offering a flat rate, which means that he will charge a unit price for the service no matter
how many days a week he will do the cleaning. Six companies have called and requested
service for various days as shown in Table 9.1.
Dilbert can do at most one cleaning job per day and he must either accept a cleaning request
completely or reject it. Thus, in order to maximize the number of jobs accepted, he writes
an integer program. For each cleaning job j ∈ J = {1, . . . , 6} he sets up a binary variable xj
which is 1, if the job is accepted. The problem now becomes:
    max  Σ_{j∈J} xj                                              (9.57a)
         Σ_{j∈J: j requests day d} xj ≤ 1   for all days d       (9.57b)
         xj ∈ {0, 1}                        for all j ∈ J        (9.57c)




Inequality (9.57b) states the constraint that on each day Dilbert can do at most one job.
If we solve the LP-relaxation obtained by dropping the integrality constraints (9.57c) we
obtain the optimum solution:

x∗1 = x∗2 = x∗3 = 1/2, x∗4 = 1, x∗5 = x∗6 = 0.

The optimum (fractional) solution value is 2.5. Since the objective value of any integral
solution is itself integral, the best integer solution can have value at most 2. ⊳

Definition 9.28 (Set-Packing, Set-Covering, and Set-Partitioning Problems)


Let A ∈ B^{m×n} be an m × n-matrix with entries from B = {0, 1} and c ∈ R^n. The integer
problems

    max { cT x : Ax ≤ 1, x ∈ B^n }
    min { cT x : Ax ≥ 1, x ∈ B^n }
    max { cT x : Ax = 1, x ∈ B^n }

are called the set-packing problem, the set-covering problem and the set-partitioning
problem, respectively.

Example 9.29 (Example 9.27 continued)


The cleaning optimization problem of Example 9.27 is given in matrix form as follows:

    max  (1, 1, 1, 1, 1, 1)T x

         [ 1 0 1 0 0 1 ]
         [ 1 1 0 0 0 0 ]
         [ 0 1 1 0 1 0 ] x ≤ 1
         [ 0 0 0 0 0 1 ]
         [ 0 0 0 1 1 1 ]

         x ∈ {0, 1}^6

In this section we restrict ourselves to the set-packing problem and the set-packing poly-
tope:
PPACKING(A) := conv { x ∈ B^n : Ax ≤ 1 }.
For the set-packing problem, there is a nice graph-theoretic interpretation of feasible so-
lutions. Given the matrix A, define an undirected graph G(A) as follows: the vertices
of G(A) correspond to the columns of A. There is an edge between i and j if there is a
common nonzero entry in columns i and j. The graph G(A) is called the conflict graph or
intersection graph.
Recall that a stable set (or independent set) in an undirected graph G = (V, E) is a subset
S ⊆ V of the vertex set such that no two vertices in S are connected by an edge. Obviously,
each feasible binary vector for the set-packing problem corresponds to a stable set in G(A).
Conversely, each stable set in G(A) gives a feasible solution for the set-packing problem.
Thus, we have a one-to-one correspondence and it follows that

PPACKING(A) = conv { x ∈ B^n : xu + xv ≤ 1 for all edges [u, v] of G(A) }.

In other words, PPACKING (A) is the stable-set polytope STAB(G(A)) of G(A). If G is a


graph, then the incidence vectors of the n + 1 sets ∅ and {v}, v ∈ V, are all affinely
independent and contained in STAB(G), whence STAB(G) has full dimension.
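Constructing the conflict graph from A is a routine computation. The following sketch is our own helper (not part of the text); run on the matrix of Example 9.29 with columns indexed from 0, it reproduces the edge set used in Example 9.30 below.

```python
def conflict_graph(A):
    """Return the edges [i, j] of G(A): columns i and j of the 0/1-matrix A
    share a nonzero entry in some row."""
    n = len(A[0])
    edges = set()
    for row in A:
        support = [j for j in range(n) if row[j] != 0]
        for p in range(len(support)):
            for q in range(p + 1, len(support)):
                edges.add((support[p], support[q]))
    return edges

A = [[1, 0, 1, 0, 0, 1],
     [1, 1, 0, 0, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [0, 0, 0, 0, 0, 1],
     [0, 0, 0, 1, 1, 1]]
print(sorted(conflict_graph(A)))
# 0-based edges corresponding to (9.59b)-(9.59k):
# [(0, 1), (0, 2), (0, 5), (1, 2), (1, 4), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
```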




We know from Theorem 4.14 that the node-edge incidence matrix of a bipartite graph is
totally unimodular (see also Example 4.15). Thus, if G(A) is bipartite, then by the The-
orem of Hoffman and Kruskal (Corollary 4.12) we have that PPACKING(A) is completely
described by the linear system:

    xi + xj ≤ 1   for all edges [i, j] of G(A)                   (9.58a)
    x ≥ 0.                                                       (9.58b)

We also know that for a general graph, the system (9.58) does not suffice to describe the
convex hull of its stable sets, here PPACKING (A).

Example 9.30 (Example 9.27 continued)


Let us look at the stable set problem for Dilbert’s cleaning. The conflict graph has 6 ver-
tices. The problem becomes:
    max  Σ_{i=1}^6 xi                                            (9.59a)
         x1 + x2 ≤ 1                                             (9.59b)
         x1 + x3 ≤ 1                                             (9.59c)
         x1 + x6 ≤ 1                                             (9.59d)
         x2 + x3 ≤ 1                                             (9.59e)
         x2 + x5 ≤ 1                                             (9.59f)
         x3 + x5 ≤ 1                                             (9.59g)
         x3 + x6 ≤ 1                                             (9.59h)
         x4 + x5 ≤ 1                                             (9.59i)
         x4 + x6 ≤ 1                                             (9.59j)
         x5 + x6 ≤ 1                                             (9.59k)
         x ≥ 0                                                   (9.59l)
         x ∈ Z^6                                                 (9.59m)

If we solve the LP-relaxation obtained by dropping the integrality constraints (9.59m) we


obtain the solution:

x∗1 = x∗2 = x∗3 = x∗4 = x∗5 = x∗6 = 1/2.

The optimum (fractional) solution value is 3. Recall that in Example 9.27 we had obtained
a value of 2.5 for the LP-relaxation of the original problem which we argued was larger
than the optimum integral value. So, the value 3 is even larger.
Didn’t we just argue that studying the set packing polytope and the stable set polytope is
equivalent? How can it happen that we obtain a weaker bound via the stable set polytope?
The answer is hidden in the fact that we looked at the LP-relaxations of both formulations.
The polytopes, being defined as convex hulls of the integral vectors, are nevertheless the
same. ⊳

Let us address the fact that the LP-relaxation in Examples 9.27 and 9.30 turned out to be
of different strength. How can we possibly strengthen the formulation of the stable set
polytope?

Theorem 9.31 Let Q be a clique in G. The clique inequality

    Σ_{i∈Q} xi ≤ 1




is valid for STAB(G). The above inequality defines a facet of STAB(G) if and only if Q is
a maximal clique, that is, a clique which is maximal with respect to inclusion.

Proof: The validity of the inequality is immediate. Assume that Q is maximal. We find
n affinely independent vectors that satisfy x(Q) ≤ 1 with equality. For v ∈ Q, we take the
incidence vector of {v}. For u ∉ Q, we choose a node v ∈ Q which is adjacent to u. Such a
node exists, since Q is maximal. We add the incidence vector of {u, v} to our set. In total we
have n vectors which satisfy x(Q) ≤ 1 with equality. They are clearly affinely independent.
Assume conversely that Q is not maximal. So, there is a clique Q′ ⊃ Q, Q′ ≠ Q. The
clique inequality x(Q′) ≤ 1 dominates x(Q) ≤ 1, so x(Q) ≤ 1 is not necessary in the
description of STAB(G) and x(Q) ≤ 1 cannot define a facet. ✷

The above example shows that for a general (i.e., non-bipartite) graph, the system (9.58)
does not suffice to describe the convex hull of its stable sets, here PPACKING(A). A graph is
bipartite if and only if it does not contain an odd cycle (see Lemma 4.6). Another class of
inequalities, the so-called odd-cycle inequalities, can be derived by considering odd cycles
(those were the culprits destroying bipartiteness). Suppose that C is an odd cycle in a
graph G. Then, any stable set can contain at most ⌊|C|/2⌋ = (|C| − 1)/2 vertices of C. This
gives rise to the so-called odd-cycle inequality:

    Σ_{i∈C} xi ≤ (|C| − 1)/2,

which is valid for the stable set polytope STAB(G).

Theorem 9.32 Let C be an odd cycle in G. The odd-cycle inequality

    Σ_{i∈C} xi ≤ (|C| − 1)/2

is valid for STAB(G). The above inequality defines a facet of STAB(G) if and only if C is
an odd hole, that is, a cycle without chords.

Proof: Any stable set x can contain at most every second vertex from C, thus x(C) ≤
(|C| − 1)/2 since |C| is odd. So, the odd-cycle inequality is valid for STAB(G).
Suppose that C is an odd hole with V(C) = {0, 1, . . . , k − 1}, k ∈ N odd, and let cT x ≤ δ
be a facet-defining inequality with

    FC = { x ∈ STAB(G) : Σ_{i∈C} xi = (|C| − 1)/2 }
       ⊆ Fc = { x ∈ STAB(G) : cT x = δ }.

Fix i ∈ C and consider the two stable sets

S1 = {i + 2, i + 4, . . . , i − 3, i}
S2 = {i + 2, i + 4, . . . , i − 3, i − 1}

where all indices are taken modulo k (see Figure 9.3 for an illustration). Then, χSi ∈ FC ⊆
Fc , so we have
0 = cT χS1 − cT χS2 = cT (χS1 − χS2 ) = ci − ci−1 .




[Figure: (a) The stable set S1 = {i + 2, i + 4, . . . , i − 3, i} in the odd cycle C.
 (b) The stable set S2 = {i + 2, i + 4, . . . , i − 3, i − 1} in the odd cycle C.]

Figure 9.3: Construction of the stable sets S1 and S2 in the proof of Theorem 9.32: Here,
node i = 0 is the anchor point of the stable sets. The stable sets are indicated by the black
nodes.


Figure 9.4: If C is an odd cycle with chords, then there is an odd hole H contained in C
(indicated by the thick lines).




Since we can choose i ∈ C arbitrarily, this implies that ci = γ for all i ∈ C for some γ ∈ R.
As in the proof of Theorem 9.17 we can now conclude that cT x ≤ δ is a positive scalar
multiple of the odd-hole inequality (observe that we used proof technique 1 for facets).
Finally, suppose that C is a cycle with at least one chord. We can find an odd hole H that is
contained in C (see Figure 9.4). The odd-cycle inequality

    Σ_{i∈H} xi ≤ (|H| − 1)/2                                     (9.60)

is also valid for STAB(G). Let W := V(C) \ V(H) be the set of nodes of C which do not
lie on H. Since |C| and |H| are both odd, |W| = |C| − |H| is even, and the nodes of W form
a path along the cycle C through all the nodes which are not on H. Hence, any stable set
in G can contain at most |W|/2 vertices from W. Thus, the inequality

    Σ_{i∈W} xi ≤ |W|/2                                           (9.61)

is valid for STAB(G). We have |H| + |W| = |C|, and thus (|H| − 1)/2 + |W|/2 = (|C| − 1)/2.
The odd-cycle inequality x(C) ≤ (|C| − 1)/2 is therefore a nonnegative linear combination
of the valid inequalities (9.60) and (9.61) (and there is a vector satisfying both inequalities
as equalities). Thus

    { x ∈ STAB(G) : x(C) ≤ (|C| − 1)/2 } ⊂ { x ∈ STAB(G) : x(H) ≤ (|H| − 1)/2 }.

Hence, x(C) ≤ (|C| − 1)/2 cannot induce a facet. ✷



10
Column Generation

One of the recurring ideas in optimization is that of decomposition. The idea of a decom-
position method is to remove some of the variables from a problem and handle them in a
master problem. The resulting subproblems are often easier and more efficient to solve.
The decomposition method iterates back and forth between the master problem and the sub-
problem(s), exchanging information in order to solve the overall problem to optimality.

10.1 Dantzig-Wolfe Decomposition


We start with the Dantzig-Wolfe Decomposition for Linear Programs. Suppose that we are
given the following Linear Program:
    max  cT x                                                    (10.1a)
         A1 x ≤ b1                                               (10.1b)
         A2 x ≤ b2                                               (10.1c)

where Ai is an mi × n-matrix. Consider the polyhedron

    P2 = { x : A2 x ≤ b2 }.

From the Decomposition Theorem 3.15 and Corollary 3.60 we know that any point x ∈ P2
is of the form

    x = Σ_{k∈K} λk xk + Σ_{j∈J} µj rj                            (10.2)

with Σ_{k∈K} λk = 1, λk ≥ 0 for k ∈ K, and µj ≥ 0 for j ∈ J. The vectors xk and rj are the
extreme points and extreme rays of P2. Using (10.2) in (10.1) yields:

    max  cT ( Σ_{k∈K} λk xk + Σ_{j∈J} µj rj )                    (10.3a)
         A1 ( Σ_{k∈K} λk xk + Σ_{j∈J} µj rj ) ≤ b1               (10.3b)
         Σ_{k∈K} λk = 1                                          (10.3c)
         λ ∈ R^K_+, µ ∈ R^J_+                                    (10.3d)

Rearranging terms, we see that (10.3) is equivalent to the following Linear Program:

    max  Σ_{k∈K} (cT xk) λk + Σ_{j∈J} (cT rj) µj                 (10.4a)
         Σ_{k∈K} (A1 xk) λk + Σ_{j∈J} (A1 rj) µj ≤ b1            (10.4b)
         Σ_{k∈K} λk = 1                                          (10.4c)
         λ ∈ R^K_+, µ ∈ R^J_+                                    (10.4d)

Let us compare the initial formulation (10.1) and the equivalent one (10.4):

    Formulation   number of variables   number of constraints
    (10.1)        n                     m1 + m2
    (10.4)        |K| + |J|             m1 + 1

In the transition from (10.1) to (10.4) we have decreased the number of constraints by
m2 − 1, but we have increased the number of variables from n to |K| + |J|, which is usually
much larger than n (as an example that |K| + |J| can be exponentially larger than n, consider
the unit cube {x ∈ R^n : 0 ≤ x ≤ 1}, which has 2n constraints and 2^n extreme points). Thus,
the reformulation may seem like a bad move. The crucial point is that we can still apply the
Simplex method to (10.4) without actually writing down the huge formulation.
Let us abbreviate (10.4) (after the introduction of slack variables) by

    max { wT η : Dη = d, η ≥ 0 },                                (10.5)

where D ∈ R^{(m1+1)×(|K|+|J|)} and d ∈ R^{m1+1}.


The Simplex method iterates from basis to basis in order to find an optimal solution. A
basis of (10.5) is a set B ⊆ {1, . . . , |K| + |J|} with |B| = m1 + 1 such that the corresponding
square submatrix DB of D is nonsingular. Observe that such a basis is smaller (namely
by m2 − 1 variables) than a basis of the original formulation (10.1). In particular, the
matrix DB ∈ R^{(m1+1)×(m1+1)} is much smaller than a basis matrix for (10.1), which is an
(m1 + m2) × (m1 + m2)-matrix. In any basic solution of (10.5), given by ηB := DB^{-1} d
and ηN := 0, only a very small number of variables (m1 + 1 out of |K| + |J|) can be nonzero.
It is easy to see that a basic solution (ηB, ηN) of (10.5) is optimal if and only if the vector
y defined by yT DB = wTB satisfies wTN − yT DN ≤ 0. We only outline the basics and
refer to standard textbooks on Linear Programming for details, e.g. [Lue84, CCPS98]. The
key observation is that in the Simplex method the only operation that uses all columns of
the system (10.5) is this pricing operation, which checks whether the reduced costs of the
nonbasic variables are nonpositive.

10.1.1 A Quick Review of Simplex Pricing


Recall that the Linear Programming dual to (10.5) is given by:

    min { dT y : DT y ≥ w }.                                     (10.6)

The pricing step works as follows. Given a basis B and corresponding basic solution
η = (ηB, ηN) = (DB^{-1} d, 0), the Simplex method solves yT DB = wTB to obtain y. If
wTN − yT DN ≤ 0, then y is feasible for the dual (10.6) and we have an optimal solution
for (10.5), since

    wT η = wTB ηB = yT DB ηB = yT d = dT y,




and for any pair (η′, y′) of feasible solutions for (P) and (D), respectively, we have

    wT η′ ≤ (DT y′)T η′ = (y′)T Dη′ = dT y′.

If, on the other hand, wi − yT D·,i > 0 for some i ∈ N, we can improve the solution by
adding variable ηi to the basis and throwing out another index. We first express the new
variable in terms of the old basis, that is, we solve DB z = D·,i. Let η(ε) be defined by
ηB(ε) := ηB − εz, ηi(ε) := ε and zero for all other variables. Then,

    wT η(ε) = wTB (ηB − εz) + wi ε
            = wTB ηB + ε (wi − wTB z)
            = wTB ηB + ε (wi − yT DB z)
            = wTB ηB + ε (wi − yT D·,i),

where wi − yT D·,i > 0. Thus, for ε > 0 the new solution η(ε) is better than the old one η.
The Simplex method now chooses the largest possible value of ε such that η(ε) is feasible,
that is, η(ε) ≥ 0. This operation will make one of the old basic variables j ∈ B become
zero. The selection of j is usually called the ratio test, since j is an index in B such that
zj > 0 and j minimizes the ratio ηj/zj over all j ∈ B having zj > 0.
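For concreteness, one pricing-plus-ratio-test step can be written down with numpy as follows. This is only a dense toy sketch with our own naming and tolerances, not the sparse revised Simplex implementation one would use in practice.

```python
import numpy as np

def pricing_and_ratio_test(D, w, d, B):
    """One pricing step for max{w^T eta : D eta = d, eta >= 0} with basis B.

    Returns ('optimal', y) if all reduced costs are nonpositive, otherwise
    ('enter', i, z, theta): entering index i, direction z and step length."""
    DB = D[:, B]
    eta_B = np.linalg.solve(DB, d)              # basic solution eta_B = DB^{-1} d
    y = np.linalg.solve(DB.T, w[B])             # solve y^T DB = w_B^T
    N = [j for j in range(D.shape[1]) if j not in B]
    reduced = w[N] - D[:, N].T @ y              # reduced costs w_N - y^T D_N
    if np.all(reduced <= 1e-9):
        return ("optimal", y)
    i = N[int(np.argmax(reduced))]              # entering variable
    z = np.linalg.solve(DB, D[:, i])            # express column i in the basis
    ratios = [eta_B[k] / z[k] for k in range(len(B)) if z[k] > 1e-9]
    theta = min(ratios) if ratios else np.inf   # ratio test; inf means unbounded
    return ("enter", i, z, theta)
```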

10.1.2 Pricing in the Dantzig-Wolfe Decomposition


In our particular situation we do not want to explicitly iterate over all entries of the vector
wN − yT DN in order to check for nonpositivity. Rather, the pricing can be accomplished
by solving the following Linear Program:

    ζ = max  (cT − ȳT A1) x − y_{m1+1}                           (10.7a)
             A2 x ≤ b2                                           (10.7b)

where ȳ is the vector composed of the first m1 components of y, y_{m1+1} is its last
component, and y satisfies yT DB = wTB. The following cases can occur:

Case 1: We have ζ > 0 in (10.7).

    Then, the problem (10.7) has an optimal solution x∗ with (cT − ȳT A1) x∗ > y_{m1+1}.
    In this case, x∗ = xk for some k ∈ K is an extreme point. Let D·,k be the kth column
    of D. The reduced cost of the associated variable λk is given by

        wk − yT D·,k = cT xk − yT (A1 xk, 1)T = cT xk − ȳT A1 xk − y_{m1+1} > 0.

    Thus, λk will be the variable entering the basis in the Simplex step (or, in other
    words, (A1 xk, 1)T will be the column entering the Simplex tableau).
Case 2: The problem (10.7) is unbounded.

    Then, there is an extreme ray rj for some j ∈ J with (cT − ȳT A1) rj > 0. The reduced
    cost of the associated variable µj is given by

        w_{|K|+j} − yT D·,|K|+j = cT rj − yT (A1 rj, 0)T = cT rj − ȳT A1 rj > 0.

    So, µj will be the variable entering the basis, or (A1 rj, 0)T will be the column
    entering the Simplex tableau.




Case 3: We have ζ ≤ 0 in (10.7).

    Then, the problem (10.7) has an optimal solution x∗ with (cT − ȳT A1) x∗ ≤ y_{m1+1}.
    By the same arguments as in Case 1 and Case 2 it follows that wi − yT D·,i ≤ 0 for
    all i ∈ K ∪ J, which shows that the current basic solution is an optimal solution
    of (10.4).

We have decomposed the original problem (10.1) into two problems: the master prob-
lem (10.4) and the pricing problem (10.7). The method works with a feasible solution
for the master problem (10.4) and generates columns of the constraint matrix on demand.
Thus, the method is also called a column generation method.
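As a sketch, the case distinction above can be coded with scipy's LP solver on hypothetical small dense data (all names are ours). `linprog` reports status 3 for an unbounded problem, which signals Case 2, although extracting the actual extreme ray then requires extra work; we also assume A2 x ≤ b2 is feasible.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_wolfe_pricing(c, A1, A2, b2, y):
    """Solve the pricing problem (10.7) for a given dual vector y.

    y = (ybar, y_conv): ybar are the duals of (10.4b), y_conv the dual
    of the convexity constraint (10.4c)."""
    ybar, y_conv = y[:-1], y[-1]
    obj = c - A1.T @ ybar                 # maximize (c^T - ybar^T A1) x
    res = linprog(-obj, A_ub=A2, b_ub=b2, bounds=(None, None))
    if res.status == 3:                   # unbounded: Case 2, a ray prices out
        return ("ray",)
    zeta = -res.fun - y_conv
    if zeta > 1e-9:                       # Case 1: extreme point x* enters
        return ("enter", res.x)
    return ("optimal",)                   # Case 3: current basis is optimal
```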

10.2 Dantzig-Wolfe Reformulation of Integer Programs

We now turn to integer programs. Decomposition methods are particularly promising if the
solution set of the IP which we want to solve has a “decomposable structure”. In particular,
we will be interested in integer programs where the constraints take on the following form:

A1 x1 +A2 x2 + . . . Ak xk = b
D1 x1 6 d1
D2 x2 6 d2 (10.8)
.. .. ..
. . .
DK xK 6 dK

If the IP has the form above, then the sets




n
Xk = xk ∈ Z+k : Dk xk 6 dk

PK k k
are independent except for the linking constraint k=1 A x = b. Our integer program
which we want to solve is:
 K K

X X
z = max (ck )T xk : Ak xk = b, xk ∈ Xk for k = 1, . . . , K . (10.9)
k=1 k=1

In the sequel we assume that each set Xk contains a large but finite number of points:

    Xk = { xk,t : t = 1, . . . , Tk }.                           (10.10)

An extension of the method also works for unbounded sets Xk, but the presentation is even
more technical since we also need to handle extreme rays. In the Dantzig-Wolfe decompo-
sition in Section 10.1 we reformulated the problem to be solved by writing every point as
a convex combination of its extreme points (in case of bounded sets there are no extreme
rays). We will adopt this idea for the integer case.

Using (10.10) we can write:

    Xk = { xk ∈ R^{nk} : xk = Σ_{t=1}^{Tk} λk,t xk,t, Σ_{t=1}^{Tk} λk,t = 1,
           λk,t ∈ {0, 1} for t = 1, . . . , Tk }.




Substituting into (10.9) gives an equivalent integer program, the IP master problem:

    (IPM)  z = max  Σ_{k=1}^K Σ_{t=1}^{Tk} ((ck)T xk,t) λk,t     (10.11a)
                    Σ_{k=1}^K Σ_{t=1}^{Tk} (Ak xk,t) λk,t = b    (10.11b)
                    Σ_{t=1}^{Tk} λk,t = 1   for k = 1, . . . , K               (10.11c)
                    λk,t ∈ {0, 1}   for k = 1, . . . , K and t = 1, . . . , Tk (10.11d)

10.2.1 Solving the Master Linear Program

In order to solve the IP master problem (10.11) we first solve its Linear Programming
relaxation, which is given by

    (LPM)  zLPM = max  Σ_{k=1}^K Σ_{t=1}^{Tk} ((ck)T xk,t) λk,t  (10.12a)
                       Σ_{k=1}^K Σ_{t=1}^{Tk} (Ak xk,t) λk,t = b (10.12b)
                       Σ_{t=1}^{Tk} λk,t = 1   for k = 1, . . . , K            (10.12c)
                       λk,t ≥ 0   for k = 1, . . . , K and t = 1, . . . , Tk   (10.12d)

The method to solve (10.12) is the same we used for the reformulation in Section 10.1: a
column generation technique allows us to solve (10.12) without writing down the whole
Linear Program at once. We always work with a small subset of the columns.
Observe that for every xk,t ∈ Xk the program (10.12) has a column with objective
coefficient (ck)T xk,t and constraint column

    ( Ak xk,t )
    (   ek    ),

where ek = (0, . . . , 0, 1, 0, . . . , 0)T ∈ R^K is the kth unit vector: the entries Ak xk,t
belong to the linking constraints (10.12b) and ek to the convexity constraints (10.12c).
The constraint matrix of (10.12) is of the form

    [ A1 x1,1 · · · A1 x1,T1 | A2 x2,1 · · · A2 x2,T2 | · · · | AK xK,1 · · · AK xK,TK ]
    [ 1       · · · 1        |                        |       |                       ]
    [                        | 1       · · · 1        |       |                       ]
    [                        |                        | · · · |                       ]
    [                        |                        |       | 1       · · · 1       ],




while the right-hand side has the form

    b̄ = ( b ; 1 ),

where 1 ∈ R^K denotes the all-one vector.

Let πi, i = 1, . . . , m, be the dual variables associated with the linking constraints (10.12b)
and µk, k = 1, . . . , K, be the dual variables corresponding to the constraints (10.12c). Con-
straints (10.12c) are also called convexity constraints. The Linear Programming dual
of (10.12) is given by:

    (DLPM)  min  Σ_{i=1}^m bi πi + Σ_{k=1}^K µk                  (10.13a)
                 πT (Ak xk) + µk ≥ (ck)T xk   for all xk ∈ Xk, k = 1, . . . , K  (10.13b)

Suppose that we have a subset of the columns which contains at least one column for
each k = 1, . . . , K such that the following Restricted Linear Programming Master Problem
is feasible:

    (RLPM)  z̄RLPM = max  c̄T λ̄                                 (10.14a)
                          Ā λ̄ = b̄                             (10.14b)
                          λ̄ ≥ 0                                 (10.14c)

The matrix Ā is a submatrix of the constraint matrix, λ̄ denotes the restricted set of variables
and c̄ is the corresponding restriction of the cost vector to those variables. Let λ̄∗ be an
optimal solution of (10.14) and (π, µ) ∈ R^m × R^K be a corresponding dual solution (which
is optimal for the Linear Programming dual of (10.14)).
Clearly, any feasible solution to the (RLPM) (10.14) is feasible for (LPM) (10.12), so we
get

    z̄RLPM = c̄T λ̄∗ = Σ_{i=1}^m πi bi + Σ_{k=1}^K µk ≤ zLPM,

while z ≤ zLPM holds since (LPM) is a relaxation.

The pricing step is essentially the same as in Section 10.1. We need to check whether for
each x ∈ Xk the reduced cost is nonpositive, that is, whether (ck)T x − πT Ak x − µk ≤ 0
(equivalently, this means to check whether (π, µ) is feasible for the Linear Programming
dual (DLPM) (10.13) of (LPM) (10.12)). Instead of going through the reduced costs one
by one, we solve K optimization problems. While in Section 10.1.2 the corresponding op-
timization problem was a Linear Program (10.7), here we have to solve an integer program
for each k = 1, . . . , K:

    ζk = max  ((ck)T − πT Ak) x − µk                             (10.15a)
              x ∈ Xk = { xk ∈ Z^{nk}_+ : Dk xk ≤ dk }            (10.15b)

Two cases can occur, ζk ≤ 0 and ζk > 0 (the case that (10.15) is unbounded is impossible
since we have assumed that Xk is a bounded set).

Case 1: ζk > 0 for some k ∈ {1, . . . , K}

    Let x̃k be the optimal solution of (10.15) for this value of k. As in Section 10.1.2
    it follows that x̃k is an extreme point of conv(Xk), say x̃k = xk,t. The column corre-
    sponding to the variable λk,t has a positive reduced cost. We introduce a new column
    with objective coefficient (ck)T xk,t and constraint entries (Ak xk,t ; ek). This leads
    to a new Restricted Linear Programming Master Problem which can be reoptimized
    easily by Simplex steps.




Case 2: ζk ≤ 0 for k = 1, . . . , K

    In this case, the dual solution (π, µ) which we obtained for the dual of (RLPM) (10.14)
    is also feasible for the Linear Programming dual (DLPM) (10.13) of (LPM) (10.12)
    and we have

        zLPM ≤ Σ_{i=1}^m πi bi + Σ_{k=1}^K µk = c̄T λ̄∗ = z̄RLPM ≤ zLPM.

    Thus λ̄∗ is optimal for (LPM) (10.12) and we can terminate.

We can derive an upper bound for zLPM during the run of the algorithm by using Linear
Programming duality. By definition, we have

    ζk ≥ ((ck)T − πT Ak) xk − µk   for all xk ∈ Xk, k = 1, . . . , K.

Let ζ = (ζ1, . . . , ζK). Then, it follows that (π, µ + ζ) is feasible for the dual (DLPM) (10.13).
Thus, by Linear Programming duality we get

    zLPM ≤ Σ_{i=1}^m πi bi + Σ_{k=1}^K µk + Σ_{k=1}^K ζk.        (10.16)

Finally, we note that there is an alternative stopping criterion to the one we have already
derived in Case 2 above. Let (x̄1, . . . , x̄K) be the K solutions of (10.15), so that
ζk = ((ck)T − πT Ak) x̄k − µk. Then

    Σ_{k=1}^K (ck)T x̄k = Σ_{k=1}^K πT Ak x̄k + Σ_{k=1}^K µk + Σ_{k=1}^K ζk.   (10.17)

So, if (x̄1, . . . , x̄K) satisfies the linking constraint Σ_{k=1}^K Ak x̄k = b, then we get
from (10.17) that

    Σ_{k=1}^K (ck)T x̄k = πT b + Σ_{k=1}^K µk + Σ_{k=1}^K ζk.    (10.18)

The quantity on the right-hand side of (10.18) is exactly the upper bound (10.16) for
the optimal value zLPM of the Linear Programming Master. Thus, we can conclude that
(x̄1, . . . , x̄K) is optimal for (LPM).

10.2.2 The Cutting-Stock Problem


We illustrate the process of column generation for the so-called cutting-stock problem.
Imagine that you work in a paper mill and you have a number of rolls of paper of fixed
width waiting to be cut, yet different customers want different numbers of rolls of various-
sized widths. How are you going to cut the rolls so that you minimize the waste (amount
of left-overs)?
More specifically, assume that the width of your paper rolls is W > 0 and customer i has
ordered bi rolls of width wi for i = 1, . . . , m. In all what follows we assume that wi ≤ W
for all i (orders of width greater than W cannot be satisfied and it is trivial to detect them at
the beginning). Let K be an upper bound on the number of rolls that we need to process all
orders, e.g. K = Σ_{i=1}^m bi. Using the variables

    xk0 := 1 if roll k ∈ {1, . . . , K} is used, and 0 otherwise
    xki := number of rolls of width wi cut out of roll k




we can formulate the cutting-stock problem as an integer linear program:

    zIP = min  Σ_{k=1}^K xk0                                     (10.19a)
               Σ_{k=1}^K xki ≥ bi,         i = 1, . . . , m      (10.19b)
               Σ_{i=1}^m wi xki ≤ W xk0,   k = 1, . . . , K      (10.19c)
               xk0 ∈ {0, 1},               k = 1, . . . , K      (10.19d)
               xki ∈ Z+,    i = 1, . . . , m, k = 1, . . . , K   (10.19e)

Assume that W = 20 and you have the following orders for paper:

    i   width wi   number of orders bi
    1   9          511
    2   8          301
    3   7          263
    4   6          383

Then Σi bi = 1458. If we apply a branch-and-bound technique (see Chapter 8), the first
step would be to solve the LP-relaxation of (10.19). It turns out that the optimum solution
value of this LP is 557.3. The optimum value zIP equals 601, which means that the LP-
relaxation is rather weak.
Let us define a pattern to be a vector p = (ap1, . . . , apm) ∈ Z^m_+ which indicates that api
rolls of width wi should be cut from a roll. A pattern p is feasible if Σ_{i=1}^m api wi ≤ W.
By P we denote the set of all feasible patterns. We introduce the new variables λp ∈ Z+
for p ∈ P, where λp denotes the number of rolls of width W which are cut according to
pattern p. Then, we can also formulate the cutting-stock problem alternatively as

    zIP = min  Σ_{p∈P} λp                                        (10.20a)
               Σ_{p∈P} api λp ≥ bi,   i = 1, . . . , m           (10.20b)
               λp ≥ 0,                p ∈ P                      (10.20c)
               λp ∈ Z,                p ∈ P.                     (10.20d)

Of course, the formulation (10.20) contains a huge number of variables. Again, the cor-
responding Linear Programming relaxation is obtained by dropping the integrality con-
straints (10.20d), and we denote by zLP its value:

    (LPM)  zLP = min  Σ_{p∈P} λp                                 (10.21a)
                      Σ_{p∈P} api λp ≥ bi,   i = 1, . . . , m    (10.21b)
                      λp ≥ 0,                p ∈ P               (10.21c)

As in the previous section, let us first solve the LP-relaxation of the LP-master problem.
We start with a subset P′ ⊆ P of the columns such that the Restricted Linear Programming
Master Problem

    (RLPM)  z′ = min  Σ_{p∈P′} λp                                (10.22a)
                      Σ_{p∈P′} api λp ≥ bi,   i = 1, . . . , m   (10.22b)
                      λp ≥ 0,                 p ∈ P′             (10.22c)

is feasible. Such a subset P ′ can be found easily. Let us show this for our example.
We have four widths which we need to cut: 9, 8, 7, 6. If we cut only rolls of width 9,
then we can cut two rolls out of any roll of width 20. So, a pattern would be (2, 0, 0, 0)T.
Similarly, for the widths 8 and 7 we obtain the patterns (0, 2, 0, 0)T and (0, 0, 2, 0)T. Finally,
for width 6 we can cut three rolls, which gives us the pattern (0, 0, 0, 3)T. So, the initial
restricted LP-master is

    min  λ1 + λ2 + λ3 + λ4                                       (10.23a)

         [ 2 0 0 0 ] [ λ1 ]   [ 511 ]
         [ 0 2 0 0 ] [ λ2 ] ≥ [ 301 ]                            (10.23b)
         [ 0 0 2 0 ] [ λ3 ]   [ 263 ]
         [ 0 0 0 3 ] [ λ4 ]   [ 383 ]

         λ ≥ 0.                                                  (10.23c)

An optimum solution for this problem is

λ∗ = (255.5, 150.5, 131.5, 127.6̄)T .

In order to emulate the pricing step of the Simplex method, we need to look at the LP-dual
of (10.21), which is

    (DLPM)  max  Σ_{i=1}^m bi πi                                 (10.24a)
                 Σ_{i=1}^m api πi ≤ 1,   p ∈ P                   (10.24b)
                 π ≥ 0.                                          (10.24c)

The restricted dual corresponding to (10.22) is

    (RDLPM)  max  Σ_{i=1}^m bi πi                                (10.25a)
                  Σ_{i=1}^m api πi ≤ 1,   p ∈ P′                 (10.25b)
                  π ≥ 0.                                         (10.25c)

We need to look at the reduced costs of all patterns p and check whether they are nonneg-
ative. The reduced cost of a pattern p is given by

    c̄p = 1 − Σ_{i=1}^m api πi.                                  (10.26)

Again, observe that checking whether c̄p ≥ 0 for all patterns p is the same as checking
whether the optimum solution π for the restricted primal is feasible for the dual master
(DLPM).




In our example, the restricted dual has the optimum solution

    π = (0.5, 0.5, 0.5, 0.3̄)T

and the reduced cost of a pattern (a1, a2, a3, a4) is given by 1 − (1/2)a1 − (1/2)a2 −
(1/2)a3 − (1/3)a4. Any feasible pattern with negative reduced cost thus satisfies

    9a1 + 8a2 + 7a3 + 6a4 ≤ 20
    (1/2)a1 + (1/2)a2 + (1/2)a3 + (1/3)a4 > 1.

We can check whether such a pattern exists by solving

    max  (1/2)a1 + (1/2)a2 + (1/2)a3 + (1/3)a4                   (10.27a)
         9a1 + 8a2 + 7a3 + 6a4 ≤ 20                              (10.27b)
         a ∈ Z^4_+                                               (10.27c)

If the optimum solution of (10.27) is greater than 1, we have found a pattern with negative
reduced cost. If the optimum solution is at most 1, then there is no pattern with negative
reduced cost. Problem (10.27) is a KNAPSACK problem! That this is the case is no
coincidence of our example.
For the cutting-stock problem, using (10.26), a pattern with minimum reduced cost is ob-
tained by solving the KNAPSACK problem:

    max  Σ_{i=1}^m πi ai                                         (10.28a)
         Σ_{i=1}^m wi ai ≤ W                                     (10.28b)
         a ∈ Z^m_+.                                              (10.28c)

Thus, we can continue and obtain an optimum solution of (LPM) (10.21). For the cutting-
stock problem and many other problems, this optimum solution is quite useful, and the
relaxation (LPM) is much stronger than the first relaxation. We will examine the exact
strength in Section 10.2.4.
For the moment, let us have a closer look at the optimum solution for our example. It is
given by the following patterns with corresponding values:

    (2, 0, 0, 0)T   255.5                                        (10.29a)
    (0, 2, 0, 0)T   87.625                                       (10.29b)
    (0, 0, 2, 1)T   131.5                                        (10.29c)
    (0, 1, 0, 2)T   125.75                                       (10.29d)

The solution value is zLP = 600.375.

What can we learn from this solution? By the fact that (LPM) is a relaxation of (IPM) we
get zIP ≥ zLP = 600.375. Since the optimum solution of (IPM) must have an integral value,
we can even round up and get

    zIP ≥ 601.                                                   (10.30)

Suppose now that we round up our values from (10.29). This gives us the vector x′ :=
(256, 88, 132, 126). Clearly, x′ must be feasible for (IPM), since all demands are satisfied



by the rounding-process. The solution value of x ′ is 256 + 88 + 132 + 126 = 602. So,
by (10.30) we have

zIP ∈ {601, 602}.

Thus, our solution found by rounding is off only by an additive constant of 1!

This rounding can be done the same way for general instances of the cutting-stock problem.
If we have an optimum solution x of (LPM), rounding up each entry of x gives a feasible
solution and increases the cost by at most m, where m is the number of different widths.
We get

    zLP ≤ zIP ≤ zLP + m.                                         (10.31)

The estimate zIP ≤ zLP + m in (10.31) is pessimistic. In our example, the rounded solution
was off by only 1 from the optimum, and not by 4.

We can also round down the entries of the optimum LP-solution, thereby obtaining an
infeasible solution. For the example we get the solution

    (2, 0, 0, 0)T   255                                          (10.32a)
    (0, 2, 0, 0)T   87                                           (10.32b)
    (0, 0, 2, 1)T   131                                          (10.32c)
    (0, 1, 0, 2)T   125,                                         (10.32d)

of value 598 and we are missing 1 roll of width 9, 2 rolls of width 8, 1 roll of width 7 and
2 rolls of width 6. If we cut each of the additional three patterns

(1, 1, 0, 0)T , (0, 1, 1, 0)T , (0, 0, 0, 2)T

once, we obtain a feasible solution of value 598 + 3 = 601 which must be optimal! Of
course, in general such a solution is not that easy to see.

10.2.3 The Balanced k-Cut Problem

In this section we investigate another application of column generation applied to integer


programs. Suppose that we are given an undirected graph G = (V, E) with nonnegative
weights w : E → R+ on the edges. Suppose that we are given an integer k and we wish
to divide the graph into k subgraphs of equal size such that the total weight of the edges
between different subgraphs is as small as possible. For simplicity we assume that n/k is
an integer (this can be enforced by adding isolated vertices), such that each block of the
partition gets n/k nodes.

A straightforward integer programming formulation of the balanced k-cut problem is as


follows: We introduce binary variables xv,i for v ∈ V and i = 1, . . . , k, where xv,i = 1 if
and only if node v gets assigned to block i. We also introduce binary variables ye which
are 1 if and only if e connects vertices in two different blocks.




    min  Σ_{e∈E} we ye                                           (10.33)
         Σ_{i=1}^k xv,i = 1      for all v ∈ V
         Σ_{v∈V} xv,i = n/k      for i = 1, . . . , k
         xu,i − xv,i ≤ yuv       for all [u, v] ∈ E
         xv,i − xu,i ≤ yuv       for all [u, v] ∈ E
         xv,i ∈ {0, 1}           for all v ∈ V and i = 1, . . . , k
         ye ∈ {0, 1}             for all e ∈ E

The problem with the formulation (10.33) is that its LP-relaxation is very weak! The op-
timum value of the LP-relaxation is typically 0 or very close to 0 which means that a
branch-and-bound-based IP-solver may find a good (or optimum) solution quickly but then
spend enormous time to prove optimality. Another problem is the symmetry in (10.33).
If one permutes the k blocks of the optimum partition, then this yields another optimum
solution. Thus, there are always at least k! optimum solutions and this also means that
branch-and-bound will have to investigate many branches of the enumeration tree.

We now reformulate the problem by means of Dantzig-Wolfe reformulation. Instead of
minimizing the weight of the edges crossing the cut we can also maximize the weights of
the blocks. Let us denote by B the collection of all subsets of V of size n/k. For B ∈ B
we denote by wB := Σ_{e∈E(G[B])} we the weight of the edges with both endpoints in B.
Then the balanced k-cut problem can be stated as

    max  Σ_{B∈B} wB xB                                           (10.34a)
         Σ_{B∈B: v∈B} xB = 1   for all v ∈ V                     (10.34b)
         xB ∈ {0, 1}           for all B ∈ B                     (10.34c)

Here, we have used a binary variable xB which is 1 if and only if we choose the subset B
as one block in the partition. The corresponding LP-relaxation is

    max  Σ_{B∈B} wB xB                                           (10.35a)
         Σ_{B∈B: v∈B} xB = 1   for all v ∈ V                     (10.35b)
         xB ≥ 0                for all B ∈ B                     (10.35c)

and the dual is given by

    min  Σ_{v∈V} πv                                              (10.36a)
         Σ_{v∈B} πv ≥ wB       for all B ∈ B                     (10.36b)




Suppose that we are given a subselection B′ ⊆ B such that the restricted LP-master

    max  Σ_{B∈B′} wB xB                                          (10.37a)
         Σ_{B∈B′: v∈B} xB = 1   for all v ∈ V                    (10.37b)
         xB ≥ 0                 for all B ∈ B′                   (10.37c)

is feasible. Such a collection can be found easily: just group the first n/k nodes into the
first block, the next n/k nodes into the second block and so on. Let x∗ be an optimum
solution of (10.37) and π∗ be an optimum solution of the corresponding restricted dual

    min  Σ_{v∈V} πv                                              (10.38a)
         Σ_{v∈B} πv ≥ wB        for all B ∈ B′                   (10.38b)

The pricing problem is then to decide whether π∗ is feasible for (10.36), that is, to decide
whether there exists B ∈ B \ B′ such that

    Σ_{v∈B} π∗v < wB.                                            (10.39)

Problem (10.39) can be restated as follows: Is there a selection B of exactly n/k nodes such
that the weight of the edges with both endpoints in B minus the π∗-weights of the nodes
in B is strictly positive? This problem can be formulated as an integer linear program:

    max  − Σ_{v∈V} π∗v xv + Σ_{e∈E} we ye                        (10.40a)
         Σ_{v∈V} xv = n/k                                        (10.40b)
         yuv ≤ xu               for all [u, v] ∈ E               (10.40c)
         yuv ≤ xv               for all [u, v] ∈ E               (10.40d)
         xv ∈ {0, 1}            for all v ∈ V                    (10.40e)
         ye ∈ {0, 1}            for all e ∈ E                    (10.40f)
The value of (10.40) lets us then decide optimality of x∗ for the LP-master. If the value is
at most 0, then the reduced cost of any block B is nonpositive and we can stop. If the value
is larger than 0 and attained for some block B, then we add the variable xB to the restricted
master (10.37) and continue.
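For small graphs, the pricing condition (10.39) can also be checked by plain enumeration over all blocks B of size n/k instead of solving the integer program (10.40). The following brute-force sketch uses our own names and data layout; a real implementation would hand (10.40) to an IP solver.

```python
from itertools import combinations

def price_block(nodes, edge_weights, pi, size):
    """Search for a block B with |B| = n/k violating (10.39), i.e. with
    w(E(G[B])) - sum_{v in B} pi_v > 0.

    edge_weights: dict (u, v) -> w_e. Returns a violating block or None."""
    best_val, best_B = 0.0, None
    for B in combinations(nodes, size):
        S = set(B)
        wB = sum(we for (u, v), we in edge_weights.items()
                 if u in S and v in S)
        val = wB - sum(pi[v] for v in B)
        if val > best_val:
            best_val, best_B = val, S
    return best_B   # add the column x_B to (10.37), or stop if None
```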
It turns out that the formulation (10.34) is much stronger than (10.33) and leads to much
better solution times in practice, even though we have to solve an integer program in each
iteration of the emulated Simplex method.

10.2.4 Strength of the Master Linear Program and Relations to Lagrangean Duality

We first investigate what kind of bounds the Master Linear Program will provide us with.

Theorem 10.1 The optimal value zLPM of the Master Linear Program (10.12) satisfies:

    zLPM = max { Σ_{k=1}^K (ck)T xk : Σ_{k=1}^K Ak xk = b,
                 xk ∈ conv(Xk) for k = 1, . . . , K }.




Proof: We obtain the Master Linear Program (10.12) by substituting xk = Σ_{t=1}^{Tk} λk,t xk,t,
Σ_{t=1}^{Tk} λk,t = 1 and λk,t ≥ 0 for t = 1, . . . , Tk. This is equivalent to substituting
xk ∈ conv(Xk). ✷

The form of the integer program (10.9) under study suggests an alternative approach to
solving the problem. We could dualize the linking constraints to obtain the Lagrangean
dual (see Section 6.3)

    wLD = min_{u∈R^m} L(u),

where

    L(u) = max { Σ_{k=1}^K (ck)T xk + uT (b − Σ_{k=1}^K Ak xk) : xk ∈ Xk for k = 1, . . . , K }
         = max { Σ_{k=1}^K ((ck)T − uT Ak) xk + uT b : xk ∈ Xk for k = 1, . . . , K }

Observe that the calculation of L(u) decomposes automatically into K independent sub-
problems:

    L(u) = uT b + Σ_{k=1}^K max { ((ck)T − uT Ak) xk : xk ∈ Xk }.

Theorem 6.17 tells us that

    wLD = max { Σ_{k=1}^K (ck)T xk : Σ_{k=1}^K Ak xk = b,
                xk ∈ conv(Xk) for k = 1, . . . , K }.

Comparing this result with Theorem 10.1 gives us the following corollary:

Corollary 10.2 The value of the Linear Programming Master and the Lagrangean dual
obtained by dualizing the linking constraints coincide:

zLPM = wLD .

10.2.5 Getting an Integral Solution

We have shown how to solve the Linear Programming Master and also shown that it pro-
vides the same bounds as a Lagrangean dual. If at the end of the column generation process
the optimal solution λ̄∗ of (LPM) (10.12) is integral, we have an optimal solution for the
integer program (10.11) which was our original problem. However, if λ̄∗ is fractional,
then (10.11) is not yet solved. We know that zLPM = wLD ≥ z. This gives us at least an
upper bound for the optimal value and suggests using this bound in a branch-and-bound
algorithm.
Recall that in Section 9.3 we combined a branch-and-bound algorithm with a cutting-plane
approach. The result was a branch-and-cut algorithm. Similarly, we will now combine
branch-and-bound methods and column generation which gives us what is known as a
branch-and-price algorithm. In this section we restrict the presentation to 0-1-problems
(“binary problems”).




The integer program which we want to solve is:

    z = max { Σ_{k=1}^K (ck)T xk : Σ_{k=1}^K Ak xk = b, xk ∈ Xk for k = 1, . . . , K },  (10.41)

where

    Xk = { xk ∈ B^{nk} : Dk xk ≤ dk }

for k = 1, . . . , K. As in Section 10.2 we reformulate the problem. Observe that in the binary
case each set Xk is automatically bounded, Xk = { xk,t : t = 1, . . . , Tk }, so that

    Xk = { xk ∈ R^{nk} : xk = Σ_{t=1}^{Tk} λk,t xk,t, Σ_{t=1}^{Tk} λk,t = 1,
           λk,t ∈ {0, 1} for t = 1, . . . , Tk },

and substituting into (10.41) gives the IP master problem:

    (IPM)  z = max  Σ_{k=1}^K Σ_{t=1}^{Tk} ((ck)T xk,t) λk,t     (10.42a)
                    Σ_{k=1}^K Σ_{t=1}^{Tk} (Ak xk,t) λk,t = b    (10.42b)
                    Σ_{t=1}^{Tk} λk,t = 1   for k = 1, . . . , K               (10.42c)
                    λk,t ∈ {0, 1}   for k = 1, . . . , K and t = 1, . . . , Tk (10.42d)

Let λ̄ be an optimal solution to the LP-relaxation (LPM) of (10.42), so that
zLPM = Σ_{k=1}^K Σ_{t=1}^{Tk} ((ck)T xk,t) λ̄k,t, where

    (LPM)  zLPM = max  Σ_{k=1}^K Σ_{t=1}^{Tk} ((ck)T xk,t) λk,t  (10.43a)
                       Σ_{k=1}^K Σ_{t=1}^{Tk} (Ak xk,t) λk,t = b (10.43b)
                       Σ_{t=1}^{Tk} λk,t = 1   for k = 1, . . . , K            (10.43c)
                       λk,t ≥ 0   for k = 1, . . . , K and t = 1, . . . , Tk   (10.43d)

Let x̄k = Σ_{t=1}^{Tk} λ̄k,t xk,t and x̄ = (x̄1, . . . , x̄K). Since all the xk,t ∈ Xk are different
binary vectors, it follows that x̄k ∈ B^{nk} if and only if all λ̄k,t are integers.
So, if λ̄ is integral, then x̄ is an optimal solution for (10.41). Otherwise, there is k0 such
that x̄k0 ∉ B^{nk0}, i.e., there is a component j0 such that x̄^{k0}_{j0} ∉ B. So, like in the
branch-and-bound scheme in Chapter 8 we can branch on the fractional variable x̄^{k0}_{j0}:
We split the set S of all feasible solutions into S = S0 ∪ S1, where

    S0 = S ∩ { x : x^{k0}_{j0} = 0 }
    S1 = S ∩ { x : x^{k0}_{j0} = 1 }.




This is illustrated in Figure 10.1(a). We could also branch on the column variables and
split S into S = S0′ ∪ S1′, where

    S0′ = S ∩ { λ : λk1,t1 = 0 }
    S1′ = S ∩ { λ : λk1,t1 = 1 },

and λk1,t1 is a fractional variable in the optimal solution λ̄ of the Linear Programming
Master (see Figure 10.1(b)). However, branching on the column variables leads to a highly
unbalanced tree for the following reason: the branch with λk1,t1 = 0 excludes just one col-
umn, namely the t1th feasible solution of the k1st subproblem. Hence, usually branching
on the original variables is more desirable, since it leads to more balanced enumeration
trees.

[Figure 10.1: Branching for 0-1 column generation. (a) Branching on the original
variables: S is split into S0 (x^{k0}_{j0} = 0) and S1 (x^{k0}_{j0} = 1). (b) Branching
on the column variables: S is split into S0′ (λk1,t1 = 0) and S1′ (λk1,t1 = 1).]

We return to the situation where we branch on an original variable x^{k0}_{j0}. We have

    x^{k0}_{j0} = Σ_{t=1}^{Tk0} λk0,t x^{k0,t}_{j0}.

Recall that x^{k0,t}_{j0} ∈ B, since each vector xk0,t ∈ B^{nk0}.
If we require that x^{k0}_{j0} = 0, this implies that λk0,t = 0 for all t with x^{k0,t}_{j0} = 1.
Similarly, if we require that x^{k0}_{j0} = 1, this implies that λk0,t = 0 for all t with
x^{k0,t}_{j0} = 0.
In summary, if we require in the process of branching that x^{k0}_{j0} = δ for some fixed
δ ∈ {0, 1}, then λk0,t > 0 is possible only for those t such that x^{k0,t}_{j0} = δ. Thus, the
Master Problem at node Sδ = S ∩ { x : x^{k0}_{j0} = δ } is given by:

    (IPM)(Sδ)  max  Σ_{k≠k0} Σ_{t=1}^{Tk} ((ck)T xk,t) λk,t
                    + Σ_{t: x^{k0,t}_{j0}=δ} ((ck0)T xk0,t) λk0,t             (10.44a)

                    Σ_{k≠k0} Σ_{t=1}^{Tk} (Ak xk,t) λk,t
                    + Σ_{t: x^{k0,t}_{j0}=δ} (Ak0 xk0,t) λk0,t = b            (10.44b)

                    Σ_{t=1}^{Tk} λk,t = 1   for k ≠ k0                        (10.44c)

                    λk,t ∈ {0, 1}   for k = 1, . . . , K and t = 1, . . . , Tk (10.44d)




The new problems (10.44) have the same form as the original Master Problem (10.42). The
only difference is that some columns, namely all those with x^{k0,t}_{j0} ≠ δ, are excluded.
This means in particular that for k ≠ k0 each subproblem

    ζk = max  ((ck)T − πT Ak) x − µk                             (10.45a)
              x ∈ Xk = { xk ∈ B^{nk} : Dk xk ≤ dk }              (10.45b)

remains unchanged. For k = k0 and δ ∈ {0, 1} the k0th subproblem is

    ζk0 = max  ((ck0)T − πT Ak0) x − µk0                         (10.46a)
               x ∈ Xk0 ∩ { x : x_{j0} = δ }.                     (10.46b)



11
More About Lagrangian Duality
Consider the integer program:

    z = max { cT x : Dx ≤ d, x ∈ X }                             (11.1)

with

    X = { x ∈ Z^n : Ax ≤ b }                                     (11.2)

and Dx ≤ d are some “complicating constraints” (D is a k × n matrix and d ∈ R^k). In
Section 6.3 we introduced the Lagrangian Relaxation

    IP(u)   z(u) = max { cT x + uT (d − Dx) : x ∈ X }            (11.3)

for fixed u ≥ 0 and showed that IP(u) is a relaxation for (11.1) (Lemma 6.15). We then
considered the Lagrangian Dual for (11.1)

    (LD)    wLD = min { z(u) : u ≥ 0 }.                          (11.4)

Clearly, wLD ≥ z, and Theorem 6.17 gave us precise information about the relation between
wLD and z, namely,

    wLD = max { cT x : Dx ≤ d, x ∈ conv(X) }.

In this chapter we will learn more about Lagrangian duality. We will investigate how we
can use information from a Lagrangian Relaxation to determine values for the variables
in the original problem (variable fixing) and we will also sketch how to actually solve the
Lagrangian Dual.

11.1 Convexity and Subgradient Optimization


In the proof of Theorem 6.17 we derived two Linear Programming formulation for solving
the Lagrangian dual (11.4):

wLD = min t (11.5a)


k T T k
t + (Dx − d) u > c x for k ∈ K (11.5b)
j T T j
(Dr ) u > c r for j ∈ J (11.5c)
u ∈ Rm
+ ,t ∈ R (11.5d)

and

    wLD = max  cT ( Σ_{k∈K} αk xk + Σ_{j∈J} βj rj )              (11.6a)
               Σ_{k∈K} αk = 1                                    (11.6b)
               D ( Σ_{k∈K} αk xk + Σ_{j∈J} βj rj ) ≤ d ( Σ_{k∈K} αk )   (11.6c)
               αk, βj ≥ 0   for k ∈ K, j ∈ J.                    (11.6d)

Here, xk, k ∈ K, are the extreme points and rj, j ∈ J, are the extreme rays of conv(X), where
X is defined in (11.2). Formulation (11.5) contains a huge number of constraints, so a
practically efficient solution method will have to use a cutting-plane algorithm. Formula-
tion (11.6) contains a huge number of variables; in fact, it is a Dantzig-Wolfe reformula-
tion, and we may solve it by column generation (see Chapter 10).
In this section we describe an alternative approach to solve the Lagrangian dual. This
approach is comparatively simple, easy to implement and exploits the fact that the func-
tion z(u), defined in (11.3), is convex and piecewise linear (we will prove this fact in a
minute).

Definition 11.1 (Convex Function)


A function f : Rn → R is convex, if

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for all x, y ∈ Rn and all λ ∈ [0, 1].

The definition above simply states that for any point λx + (1 − λ)y on the segment joining
x and y the function value f(λx + (1 − λ)y) lies below the segment connecting f(x) and
f(y). Figure 11.1 provides an illustration for the one-dimensional case.

[Figure 11.1: A convex function. The value f(λx + (1 − λ)y) lies below the chord
value λf(x) + (1 − λ)f(y) of the segment connecting f(x) and f(y).]

Convex functions have the nice property that local minima are also global minima:




Definition 11.2 (Local and global minimum)


The point x∗ ∈ R^n is a local minimum of f : R^n → R if there is δ > 0 such that f(x) ≥
f(x∗) for all x with ‖x − x∗‖ ≤ δ. The point x∗ is a global minimum if f(x) ≥ f(x∗) for all
x ∈ R^n.

Lemma 11.3 Let x∗ be a local minimum of the convex function f : Rn → R. Then, x∗ is


also a global minimum.

Proof: See Homework. ✷

It can be shown that a differentiable function f : R^n → R is convex if and only if

    f(x) ≥ f(x∗) + ∇f(x∗)T (x − x∗)

for all x, x∗ ∈ R^n, see e.g. [Lue84]. Here, ∇f(x∗) is the gradient of f at x∗. So x∗ is a local
(and by Lemma 11.3 also a global) minimizer of a convex differentiable function f : R^n →
R if and only if ∇f(x∗) = 0. The notion of a subgradient is meant as an extension to the
nondifferentiable (and possibly non-smooth) case:

Definition 11.4 (Subgradient, subdifferential)


Let f : R^n → R be convex. The vector s ∈ R^n is a subgradient of f at x∗ if

    f(x) ≥ f(x∗) + sT (x − x∗)

for all x ∈ R^n. The subdifferential ∂f(x) of f at x is the set of all subgradients of f at x.
For example, for f(x) = |x| on R we have ∂f(0) = [−1, 1].

Lemma 11.5 Let f : Rn → R be convex. Then, x∗ is an optimal solution of min {f(x) : x ∈ Rn }


if and only if 0 ∈ ∂f(x∗ ).

Proof: We have

    0 ∈ ∂f(x∗)  ⇔  f(x) ≥ f(x∗) + 0T (x − x∗) for all x ∈ R^n
                ⇔  f(x) ≥ f(x∗) for all x ∈ R^n.                 ✷
The subgradient algorithm (see Algorithm 11.1) is designed to solve problems of the form

min {f(x) : x ∈ Rn } ,

where f is convex. At any step, it chooses an arbitrary subgradient and moves into that
direction. Of course, the question arises how to get a subgradient, which subgradient to
choose and how to select the steplengths θk .

Algorithm 11.1 Subgradient algorithm for minimizing a convex function f : R^n → R.

SUBGRADIENT-ALG(f)
 1  Choose a starting point x0 ∈ R^n and set k := 0.
 2  Choose any subgradient s ∈ ∂f(xk).
 3  while s ≠ 0 do
 4      Set xk+1 := xk − θk s for some θk > 0.
 5      Set k := k + 1.
 6      Choose any subgradient s ∈ ∂f(xk).
 7  end while
 8  xk is an optimal solution (Lemma 11.5). stop.

We will not elaborate on this in the general setting and restrict ourselves to the special case
of the Lagrangian dual.




11.2 Subgradient Optimization for the Lagrangian Dual


Before we start to apply subgradient descent methods to the Lagrangian dual, we will
first prove that the function we wish to minimize actually possesses the desired structural
properties.

Lemma 11.6 Let f1 , . . . , fm : Rn → R be convex functions and

f(x) = max {fi (x) : i = 1, . . . , m}

be the pointwise maximum of the fi . Then, f is convex.

Proof: Let x, y ∈ Rn and λ ∈ [0, 1]. For i = 1, . . . , m we have from the convexity of fi

fi(λx + (1 − λ)y) ≤ λfi(x) + (1 − λ)fi(y).

Thus,

f(λx + (1 − λ)y) = max {fi (λx + (1 − λ)y) : i = 1, . . . , m}


6 max {λfi (x) + (1 − λ)fi (y) : i = 1, . . . , m}
6 λ max {fi (x) : i = 1, . . . , m} + (1 − λ) max {fi (y) : i = 1, . . . , m}
= λf(x) + (1 − λ)f(y).

Thus, f is convex. ✷

Theorem 11.7 The function z(u) defined in (11.3) is piecewise linear and convex on the
domain over which it is finite.

Proof: Let xk, k ∈ K, be the extreme points and rj, j ∈ J, be the extreme rays of conv(X).
Fix u ≥ 0. We have (see also Theorem 6.17):

    z(u) = max { cT x + uT (d − Dx) : x ∈ conv(X) }
         = +∞, if (cT − uT D) rj > 0 for some j ∈ J;
         = cT xk + uT (d − Dxk) for some k ∈ K, otherwise.

So, z(u) is finite if and only if u is contained in the polyhedron

    Q := { u ∈ R^m_+ : uT Drj ≥ cT rj for all j ∈ J }.

If u ∈ Q, then

    z(u) = uT d + max { (cT − uT D) xk : k ∈ K }

is the maximum of a finite set of affine functions fk(u) = uT d + (cT − uT D) xk, (k ∈ K).
Since affine functions are convex, the convexity of z(u) follows from Lemma 11.6. ✷

The above theorem shows that the function occuring in the Lagrangian dual is a particular
convex function: it is piecewise linear. This property enables us to derive subgradients
easily:

Lemma 11.8 Let ū ≥ 0 and x(ū) be an optimal solution of the Lagrangian relaxation
IP(ū) given in (11.3). Then, the vector d − Dx(ū) is a subgradient of z(u) at ū.




Proof: For any u ≥ 0 we have

    z(u) = max { cT x + uT (d − Dx) : x ∈ X }
         ≥ cT x(ū) + uT (d − Dx(ū))
         = cT x(ū) + ūT (d − Dx(ū)) + (u − ū)T (d − Dx(ū))
         = z(ū) + (u − ū)T (d − Dx(ū))
         = z(ū) + (d − Dx(ū))T (u − ū).

This proves the claim. ✷

Algorithm 11.2 Subgradient algorithm for solving the Lagrangian dual.

SUBGRADIENT-LD
 1  Choose a starting dual vector u0 ≥ 0 and set k := 0.
 2  repeat
 3      Solve the Lagrangian Problem:

            IP(uk)   z(uk) = max { cT x + (uk)T (d − Dx) : x ∈ X }

        and let x(uk) be its optimal solution.
 4      Set s := d − Dx(uk); then s is a subgradient of z(u) at uk (Lemma 11.8).
 5      Set uk+1 := max{uk − θk s, 0} (componentwise) for some θk > 0.
 6      Set k := k + 1.
 7  until s = 0
 8  uk−1 is an optimal solution. stop.
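In code, Algorithm 11.2 might look as follows. This is a generic numpy sketch with our own names: `solve_ip` stands for whatever oracle solves the Lagrangian relaxation IP(u), and since s = 0 need not ever occur, we stop after a fixed number of iterations and keep the best bound seen.

```python
import numpy as np

def subgradient_ld(c, D, d, solve_ip, u0, theta, iters=1000):
    """Subgradient method for w_LD = min{ z(u) : u >= 0 } (Algorithm 11.2).

    solve_ip(u) must return an optimal solution x(u) of IP(u);
    theta(k) returns the step length theta_k."""
    u = np.asarray(u0, dtype=float)
    best = np.inf
    for k in range(iters):
        x = solve_ip(u)
        s = d - D @ x                         # subgradient of z at u (Lemma 11.8)
        z_u = c @ x + u @ s                   # z(u) = c^T x(u) + u^T (d - D x(u))
        best = min(best, z_u)                 # each z(u) is an upper bound on z
        if np.allclose(s, 0.0):
            return u, best                    # 0 in subdifferential: u is optimal
        u = np.maximum(u - theta(k) * s, 0.0) # project back onto u >= 0
    return u, best

# e.g. theta = lambda k: 1.0 / (k + 1) satisfies Theorem 11.9(i)
```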

We state the following theorem without proof:

Theorem 11.9 Let (θk) be the sequence of step lengths chosen by Algorithm 11.2.

 (i)   If limk→∞ θk = 0 and Σ_{k=0}^∞ θk = ∞, then limk→∞ z(uk) = wLD.
 (ii)  If θk = θ0 ρ^k for some 0 < ρ < 1, then limk→∞ z(uk) = wLD if θ0 and ρ are
       sufficiently large.
 (iii) If w̄ ≥ wLD and θk = εk (z(uk) − w̄)/‖d − Dx(uk)‖ with 0 < εk < 2, then
       limk→∞ z(uk) = wLD, or the algorithm finds uk with w̄ ≥ z(uk) ≥ wLD for some
       finite k.

11.3 Lagrangian Heuristics and Variable Fixing


Suppose that we solve the Lagrangian dual by some algorithm, e.g. by the subgradient
method described in the previous section. It makes sense to hope that once the multipliers
uk approach the optimal solution (of the dual), we may possibly extract some information
about the original IP (11.1). In particular, we would hope that x(uk ), the optimal solution
of the problem 11.3 is «close » to an optimal solution of (11.1).
In this section we examine this idea for a particular example, the set-covering problem (see
also Example 1.10). In this problem we are given a finite ground set U and a collection
F ⊆ 2N of subsets of N. There is a cost cf associated with every set f ∈ F. We wish to find




a subcollection of the sets in F of minimum cost such that each element of N is covered at
least once:

    min { Σ_{f∈F} cf xf : Σ_{f∈F} aif xf ≥ 1 for i ∈ N, x ∈ B^F }.   (11.7)

Here, as in Example 1.10,

    aif := 1 if element i is contained in set f, and 0 otherwise.

Let us consider the Lagrangian relaxation of (11.7) where we dualize all covering con-
straints. For u ≥ 0 we have:

    z(u) = min { Σ_{f∈F} cf xf + Σ_{i∈N} ui (1 − Σ_{f∈F} aif xf) : x ∈ B^F }
         = min { Σ_{i∈N} ui + Σ_{f∈F} (cf − Σ_{i∈N} ui aif) xf : x ∈ B^F }   (11.8)

Observe that the problem (11.8) is trivial to solve. Let

    F+ := { f ∈ F : cf − Σ_{i∈N} ui aif > 0 }
    F− := { f ∈ F : cf − Σ_{i∈N} ui aif < 0 }
    F0 := { f ∈ F : cf − Σ_{i∈N} ui aif = 0 }.

We set xf := 1 if f ∈ F− and xf := 0 otherwise, i.e., if f ∈ F+ ∪ F0. So, given u ≥ 0, we
can easily calculate x(u) and the corresponding optimal value z(u).
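Evaluating z(u) and x(u) is a single pass over the sets. A sketch with our own data layout (cover[f] is the set of elements i with aif = 1):

```python
def solve_lagrangean_sc(cost, cover, u):
    """Solve the Lagrangean relaxation (11.8) of set covering.

    cost: dict f -> c_f; cover: dict f -> set of elements covered by f;
    u: dict i -> multiplier u_i >= 0. Returns z(u) and x(u)."""
    x = {}
    z = sum(u.values())
    for f, cf in cost.items():
        reduced = cf - sum(u[i] for i in cover[f])   # c_f - sum_i u_i a_if
        x[f] = 1 if reduced < 0 else 0               # f in F^- gets x_f = 1
        z += min(reduced, 0.0)
    return z, x
```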
Given x(u) ∈ B^F, we can use this vector to obtain a feasible solution x^H of (11.7): we
drop all rows i ∈ N with Σ_{f∈F} a_{if} x(u)_f ≥ 1 and solve the remaining smaller set-covering
problem by a greedy heuristic (always choose a set which minimizes the ratio of its cost to
the number of newly covered elements). If y* is the heuristic solution, then
x^H := x(u) + y* is a feasible solution to (11.7).
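A minimal sketch of this repair step, using the same hypothetical data layout as in the
previous snippet (the helper name repair is ours): it greedily completes x(u) until every
element of the ground set is covered.

    def repair(cost, cover, x, ground_set):
        # elements not yet covered by the sets chosen in x = x(u)
        uncovered = set(ground_set) - {i for f in x for i in cover[f]}
        y = set()
        while uncovered:                      # assumes the instance is feasible
            # greedy rule: minimize cost per newly covered element
            g = min((f for f in cost if cover[f] & uncovered),
                    key=lambda f: cost[f] / len(cover[f] & uncovered))
            y.add(g)
            uncovered -= cover[g]
        return x | y                          # x^H combines x(u) and y*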
Once we are given x^H, we can use the dual information in the multipliers u for variable
fixing. Suppose that x̄ is a feasible solution for (11.7) with c^T x̄ < c^T x^H. Since we are
dealing with a relaxation, we have

    Σ_{i∈N} u_i + Σ_{f∈F} (c_f − Σ_{i∈N} u_i a_{if}) x̄_f ≤ c^T x̄ < c^T x^H.      (11.9)

Let f ∈ F^+ and suppose that

    Σ_{i∈N} u_i + Σ_{g∈F^−} (c_g − Σ_{i∈N} u_i a_{ig}) + (c_f − Σ_{i∈N} u_i a_{if}) ≥ c^T x^H.

The left-hand side above is the smallest value that the left-hand side of (11.9) can attain
once x̄_f = 1, so (11.9) can only hold if x̄_f = 0.

Similarly, let f ∈ F^− and suppose that

    Σ_{i∈N} u_i + Σ_{g∈F^−\{f}} (c_g − Σ_{i∈N} u_i a_{ig}) ≥ c^T x^H.

Then, for (11.9) to hold, we must have x̄_f = 1.
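Both fixing tests translate directly into code. The sketch below (same hypothetical data
layout as above; z_H stands for the incumbent value c^T x^H) returns the variables that any
solution better than x^H must set to 0 and to 1, respectively.

    def fix_variables(cost, cover, u, z_H):
        reduced = {f: cost[f] - sum(u[i] for i in cover[f]) for f in cost}
        # smallest possible left-hand side of (11.9): all sets of F^- at value 1
        base = sum(u.values()) + sum(r for r in reduced.values() if r < 0)
        # f in F^+: forcing x_f = 1 raises this bound by reduced[f] > 0
        fix_to_0 = {f for f, r in reduced.items() if r > 0 and base + r >= z_H}
        # f in F^-: forcing x_f = 0 raises this bound by -reduced[f] > 0
        fix_to_1 = {f for f, r in reduced.items() if r < 0 and base - r >= z_H}
        return fix_to_0, fix_to_1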



A Notation
This chapter is intended mainly as a reference for the notation used in these lecture notes
and the foundations this work relies on. We assume that the reader is familiar with ele-
mentary graph theory, graph algorithmic concepts, and combinatorial optimization as well
as with basic results from complexity theory. For detailed reviews we refer the reader to
monographs and textbooks which are listed at the end of this chapter.

A.1 Basics
By R (Q, Z, N) we denote the set of real (rational, integral, natural) numbers. The set
N of natural numbers does not contain zero. R^+_0 (Q^+_0, Z^+_0) denotes the nonnegative real
(rational, integral) numbers.
For x ∈ R^+, the floor and ceiling of x are defined by ⌊x⌋ := max{ n ∈ N ∪ {0} :
n ≤ x } and ⌈x⌉ := min{ n ∈ N : n ≥ x }.
By 2S we denote the power set of a set S, which is the set of all subsets of set S (including
the empty set ∅ and S itself).

A.2 Sets and Multisets


A multiset Y over a ground set U, denoted by Y ❁ U, can be defined as a mapping Y : U →
N, where for u ∈ U the number Y(u) denotes the multiplicity of u in Y. We write u ∈ Y if
Y(u) ≥ 1. If Y ❁ U, then X ❁ Y denotes a multiset over the ground set { u ∈ U : Y(u) > 0 }.
If Y ❁ U and Z ❁ U are multisets over the same ground set U, then we denote by Y +Z their
multiset union, by Y − Z their multiset difference and by Y ∩ Z their multiset intersection,
defined for u ∈ U by

(Y + Z)(u) = Y(u) + Z(u)


(Y − Z)(u) = max{Y(u) − Z(u), 0}
(Y ∩ Z)(u) = min{Y(u), Z(u)}.

The multiset Y ❁ U is a subset of the multiset Z ❁ U, denoted by Y ⊆ Z, if Y(u) ≤ Z(u)
for all u ∈ U. For a weight function c : U → R the weight of a multiset Y ❁ U is defined
by c(Y) := Σ_{u∈U} c(u)Y(u). We denote the cardinality of a multiset Y ❁ U by |Y| :=
Σ_{u∈U} Y(u).
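In Python, collections.Counter provides exactly this multiset arithmetic; the following
lines (an illustration with made-up elements, not part of the formal development) mirror
the definitions above.

    from collections import Counter

    Y = Counter({'a': 2, 'b': 1})               # multiset Y with Y(a) = 2, Y(b) = 1
    Z = Counter({'a': 1, 'c': 3})

    union        = Y + Z                        # (Y + Z)(u) = Y(u) + Z(u)
    difference   = Y - Z                        # (Y - Z)(u) = max{Y(u) - Z(u), 0}
    intersection = Y & Z                        # (Y ∩ Z)(u) = min{Y(u), Z(u)}

    c = {'a': 1.5, 'b': 2.0}                    # weight function c : U -> R
    weight      = sum(c[u] * Y[u] for u in Y)   # c(Y) = sum_u c(u) Y(u)
    cardinality = sum(Y.values())               # |Y| = sum_u Y(u)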

Any (standard) set can be viewed as a multiset with elements of multiplicity 0 and 1. If X
and Y are two standard sets with X ⊆ Y and X ≠ Y, then X is a proper subset of Y, denoted
by X ⊂ Y. Two subsets X1 ⊆ Y, X2 ⊆ Y of a standard set Y form a partition of Y, if
Y = X1 ∪ X2 and X1 ∩ X2 = ∅.

A.3 Analysis and Linear Algebra


Reference books: [Rud76]

A metric space (X, d) consists of a nonempty set X and a distance function or metric d : X ×
X → R^+ which satisfies the following three conditions:

(i) d(x, y) > 0 if x ≠ y; d(x, x) = 0;

(ii) d(x, y) = d(y, x);

(iii) d(x, y) ≤ d(x, z) + d(z, y) for all z ∈ X.

Inequality (iii) is called the triangle inequality. An example of a metric space is the
set R^p endowed with the Euclidean metric, which for vectors x = (x_1, . . . , x_p) ∈ R^p and
y = (y_1, . . . , y_p) ∈ R^p is defined by

    d(x, y) := ( Σ_{i=1}^p (x_i − y_i)² )^{1/2}.

This metric space is usually referred to as the Euclidean space.


A path in a metric space (X, d) is a continuous function γ : [0, 1] → X. The path γ is called
rectifiable if, for all dissections 0 = t_0 < t_1 < · · · < t_k = 1 of the interval [0, 1], the sum

    Σ_{i=1}^k d(γ(t_i), γ(t_{i−1}))

is bounded from above. The supremum of these sums, taken over all dissections, is then
referred to as the length of the path γ.

A.4 Growth of Functions


Reference books: [CLR90, AHU74]

Let g be a function from N to N. The set O(g(n)) contains all those functions f : N → N
with the property that there exist constants c > 0 and n₀ ∈ N such that f(n) ≤ c · g(n) for
all n ≥ n₀. A function f belongs to the set Ω(g(n)) if and only if g(n) ∈ O(f(n)). The
notation f(n) ∈ Θ(g(n)) means that f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)). Finally, we
write f(n) ∈ o(g(n)) if lim_{n→∞} f(n)/g(n) = 0.
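For example, f(n) = 3n² + 5n ∈ O(n²), witnessed by c = 4 and n₀ = 5, since 3n² + 5n ≤ 4n²
for all n ≥ 5; as also n² ∈ O(3n² + 5n), we even have 3n² + 5n ∈ Θ(n²).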

A.5 Particular Functions


We use log_a to denote the logarithm function to the base a. We omit the base in the
case a = 2 for the sake of convenience. By ln n we denote the natural logarithm of a
positive number n, that is, ln n := log_e n.




A.6 Probability Theory

Reference books: [Fel68, Fel71, MR95]

A probability space (Ω, F, Pr) consists of a σ-field (Ω, F) with a probability measure Pr
defined on it. When specifying a probability space, F may be omitted, which means that
the σ-field referred to is (Ω, 2^Ω).

In these lecture notes we are mainly concerned with the case that Ω is either the set of real
numbers R or an interval contained in R. In this context a density function is a non-
negative function p : R → R^+ whose integral, extended over the real numbers, is unity,
that is, ∫_{−∞}^{+∞} p(x) dx = 1. The density corresponds to the probability measure Pr, which
satisfies

    Pr[ x ∈ (−∞, t] ] = ∫_{−∞}^{t} p(x) dx.
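For example, the uniform density on [0, 1], given by p(x) = 1 for x ∈ [0, 1] and p(x) = 0
otherwise, yields Pr[ x ∈ (−∞, t] ] = t for every t ∈ [0, 1].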

A.7 Graph Theory

Reference books: [Har72, AMO93]

A mixed graph G = (V, E, R) consists of a set V of vertices (or nodes), a set E of undirected
edges, and a multiset R of directed arcs. We usually denote by n := |V|, m_E := |E| and
m_R := |R| the number of vertices, edges, and arcs in G, respectively. Throughout these notes
we assume that V, E, and R are all finite. If R = ∅, we briefly write G = (V, E) and call G
an undirected graph (or simply a graph) with vertex set V and edge set E. If E = ∅, we refer
to G = (V, R) as a directed graph with vertex set V and arc (multi-)set R.




Each undirected edge is an unordered pair [u, v] of distinct vertices u ≠ v. The edge [u, v]
is said to be incident to the vertices u and v. Each arc is an ordered pair (u, v) of vertices
which is incident to both u and v. We refer to vertex u as the source of arc (u, v) and to
vertex v as its target. The arc (u, v) emanates from vertex u and terminates at vertex v.
The arc (u, v) is an outgoing arc of node u and an incoming arc of vertex v. We call two
vertices adjacent if there is an edge or an arc which is incident with both of them.
Two arcs are called parallel arcs if they refer to copies of the same element (u, v) in the
multiset R. Arcs (u, v) and (v, u) are termed anti-parallel or inverse. We write (u, v)^{−1} :=
(v, u) to denote an inverse arc to (u, v). For a set R of arcs we denote by R^{−1} the set
R^{−1} := { r^{−1} : r ∈ R }.
Let G = (V, E, R) be a mixed graph. A graph H = (V_H, E_H, R_H) is a subgraph of G if
V_H ⊆ V, E_H ⊆ E and R_H ⊆ R. For a multiset X ❁ E + R we denote by G[X] the subgraph
of G induced by X, that is, the subgraph of G consisting of the arcs and edges in X together
with their incident vertices. A subgraph of G induced by a vertex set X ⊆ V is a subgraph
with node set X containing all those edges and arcs from G which have both endpoints
in X.
For v ∈ V we let R_v be the set of arcs in R emanating from v. The outdegree of a vertex v
in G, denoted by deg^+_G(v), equals the number of arcs in G leaving v. Similarly, the indegree
deg^−_G(v) is defined to be the number of arcs entering v. If X ❁ R, we briefly write deg^+_X(v)
and deg^−_X(v) instead of deg^+_{G[X]}(v) and deg^−_{G[X]}(v). The degree of a vertex v in an undi-
rected graph G = (V, E) is defined to be the number of edges incident with v.
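As a small illustration (the representation below is our own choice, not a convention of
these notes), a mixed graph can be stored as plain Python collections, and the degree
notions can then be read off directly:

    V = {1, 2, 3}
    E = [frozenset({1, 2})]                   # one undirected edge [1, 2]
    R = [(1, 3), (2, 3), (1, 3)]              # directed arcs; parallel arcs allowed

    def outdeg(v):                            # deg^+(v): number of arcs leaving v
        return sum(1 for (u, w) in R if u == v)

    def indeg(v):                             # deg^-(v): number of arcs entering v
        return sum(1 for (u, w) in R if w == v)

    def degree(v):                            # degree of v in (V, E): incident edges
        return sum(1 for e in E if v in e)

    assert outdeg(1) == 2 and indeg(3) == 3 and degree(2) == 1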
A subset C of the vertices of an undirected graph G = (V, E) such that every pair of vertices
in C is adjacent is called a clique of size |C| in the graph G. A graph G whose vertex set forms
a clique is said to be a complete graph.
A path P in an undirected graph G = (V, E) is defined to be an alternating sequence
P = (v_1, e_1, v_2, . . . , e_k, v_{k+1}) of nodes v_i ∈ V and edges e_i ∈ E, where for each triple
(v_i, e_i, v_{i+1}) we have e_i = [v_i, v_{i+1}]. We equivalently use the alternative notations P =
(v_1, v_2, . . . , v_{k+1}) and P = (e_1, e_2, . . . , e_k) when the meaning is clear. For directed graphs
G = (V, R), edges are replaced by arcs, and we require r_i = (v_i, v_{i+1}) and r_i ∈ R ∪ R^{−1}
for each triple. If the stronger condition r_i ∈ R holds, the path is called directed.
For mixed graphs, we define a walk, which may traverse both edges and directed arcs. An
oriented walk is a "directed version" of a walk in the sense that for any two consecutive
vertices v_i and v_{i+1} we require that the connecting element x_i is either an undirected edge
[v_i, v_{i+1}] between v_i and v_{i+1} or a directed arc (v_i, v_{i+1}) from v_i to v_{i+1}.
If all nodes of the path or walk are pairwise distinct (without considering the pair
v_1, v_{k+1}), the path or walk is called simple. A path or walk with coincident start and
endpoint is closed. A closed and simple path or walk is a cycle. An Eulerian cycle in a
directed graph G = (V, R) is a directed cycle which contains (traverses) every arc from R
exactly once. The directed graph G is called Eulerian if it contains an Eulerian cycle. A
Hamiltonian path (Hamiltonian cycle) is a simple path (cycle) which touches every vertex
in a directed (or undirected) graph.
A mixed graph G = (V, E, R) is connected (strongly connected) if for every pair of ver-
tices u, v ∈ V with u ≠ v there is a walk (oriented walk) from u to v in G. A (strongly)
connected subgraph of G which is maximal with respect to set inclusion is called a (strongly)
connected component of G.
A tree is a connected graph that contains no cycle. A node in a tree is called a leaf if its
degree equals 1, and an inner node otherwise. A spanning tree of a graph G is a tree which
has the same vertex set as G.




A Steiner tree with respect to a subset K of the vertices of an undirected graph G is a tree
which is a subgraph of G and whose vertex set includes K. The vertices in K are called
terminals.
A directed in-tree rooted at o ∈ V is a subgraph of a directed graph H = (V, A) which is a
tree and which has the property that for each v ∈ V it contains a directed path from v to o.
Additional definitions to the basic ones presented above will be given in the respective
contexts.

A.8 Theory of Computation


Reference books: [GJ79, Pap94, GLS88, CLR90]

Model of Computation
The Turing machine [GJ79] is the classical model of computation that was used to define
the computational complexity of algorithms. However, for practical purposes it is considerably
more convenient to use a different model. In the random access machine or RAM model
[Pap94, MR95] we have a machine which consists of an infinite array of registers, each
capable of containing an arbitrarily large integer, possibly negative. The machine is capa-
ble of performing the following types of operations involving registers and main memory:
input-output operations, memory-register transfers, indirect addressing, arithmetic opera-
tions and branching. The arithmetic operations permitted are addition, subtraction, mul-
tiplication and division of numbers. Moreover, the RAM can compare two numbers and
evaluate the square root of a positive number.
There are two types of RAM models used in the literature. In the log-cost RAM the execution
time of each instruction takes time proportional to the encoding length, i.e. proportional to
the logarithm of the size of its operands, whereas in the unit-cost RAM each instruction can
be accomplished in one time step. A log-cost RAM is equivalent to the Turing machine
under a polynomial time simulation [Pap94]. In contrast, in general there is no polynomial
simulation for a unit-cost RAM, since in this model we can compute large integers too
quickly by using multiplication. However, if the encoding lengths of the operands occur-
ring during the run of an algorithm on a unit-cost RAM are bounded by a polynomial in
the encoding length of the input, a polynomial time algorithm on the unit-cost RAM will
transform into a polynomial time algorithm on a Turing machine [GLS88, Pap94]. This
argument remains valid in the case of nondeterministic programs.
For convenience, we will use the general unit-cost RAM to analyze the running time of
our algorithms. This does not change the essence of our results, because the algorithms in
which we are interested involve only operations on numbers that are not significantly larger
than those in the input.

Computational Complexity
Classical complexity theory expresses the running time of an algorithm in terms of the
“size” of the input, which is intended to measure the amount of data necessary to describe
an instance of a problem. The running time of an algorithm on a specific input is defined to
be the sum of times taken by each instruction executed. The worst case time complexity or
simply time complexity of an algorithm is the function T (n) which is the maximum running
time taken over all inputs of size n (cf. [AHU74, GJ79, GLS88]).
An alphabet Σ is a nonempty set of characters. By Σ∗ we denote the set of all strings
over Σ including the empty word. We will assume that every problem Π has an (encoding




independent) associated function length : DΠ → N, which is polynomially related to the


input lengths that would result from a “reasonable encoding scheme”. Here, DΠ ⊆ Σ∗
is the set of instances of the problem Π, expressed as words over the alphabet Σ. For a
more formal treatment of the input length and also of the notion of a “reasonable encoding
scheme” we refer to [GJ79].
A decision problem is a problem where each instance has only one of two outcomes from
the set {yes, no}. For a nondecreasing function f : N → N the deterministic time complexity
class DTIME(f(n)) consists of the decision problems for which there exists a deterministic
Turing machine deciding the problem in O(f(n)) time. Its nondeterministic counterpart
NTIME(f(n)) is defined analogously. The most important complexity classes with respect
to these notes are

    P := ⋃_{k=1}^∞ DTIME(n^k)    and    NP := ⋃_{k=1}^∞ NTIME(n^k).

Suppose we are given two decision problems Π and Π ′ . A polynomial time transformation
is an algorithm t which, given an encoded instance I of Π, produces in polynomial time an
encoded instance t(I) of Π ′ such that the following holds: For every instance I of Π, the
answer to Π is “yes” if and only if the answer to the transformation t(I) (as an instance
of Π ′ ) is “yes”. A decision problem Π is called NP-complete if Π ∈ NP and every other
decision problem in NP can be transformed to Π in polynomial time.
To tackle optimization problems as well, rather than just decision problems, it is useful to extend
the notion of a transformation between problems. Informally, a polynomial time Turing
reduction (or just Turing reduction) from a problem Π to a problem Π ′ is an algorithm
alg, which solves Π by using a hypothetical subroutine alg’ for solving Π ′ such that
if alg’ were a polynomial time algorithm for Π ′ , then alg would be a polynomial time
algorithm for Π. More precisely, a polynomial time Turing reduction from Π to Π ′ is a
deterministic polynomial time oracle Turing machine (with oracle Π ′ ) solving Π.
An optimization problem Π is called NP-hard (“at least as difficult as any problem in NP”),
if there is an NP-complete decision problem Π ′ such that Π ′ can be Turing reduced to Π.
Results from complexity theory (see e.g. [GJ79]) show that such an NP-hard optimization
problem cannot be solved in polynomial time unless P = NP.



B Symbols
∅ the empty set
Z the set of integers, that is, Z = {. . . , −2, −1, 0, 1, 2, . . . }
Z+ the set of nonnegative integers Z+ = {0, 1, 2, . . . }
N the set of natural numbers, N = Z+
Q the set of rational numbers
Q+ the set of nonnegative rational numbers
R the set of real numbers
R+ the set of nonnegative real numbers
2^A the set of subsets of the set A
|A| the cardinality of the (multi-) set A
A⊆B A is a (not necessarily proper) subset of B
A⊂B A is a proper (multi-) subset of B
Y❁U Y is a multiset over the ground set U
f(n) ∈ O(g(n)) f grows at most as fast as g (see page 174)
f(n) ∈ Ω(g(n)) f grows at least as fast as g (see page 174)
f(n) ∈ Θ(g(n)) f grows exactly as fast as g (see page 174)
f(n) ∈ o(g(n)) f grows more slowly than g (see page 174)
loga logarithm to the basis of a
log logarithm to the basis of 2
ln natural logarithm (basis e)
G = (V, E) undirected graph with vertex set V and edge set E
G = (V, A) directed graph with vertex set V and arc set A
δ(S) set of edges that have exactly one endpoint in S
x(S) x(S) = Σ_{s∈S} x_s
Bibliography

[Ada96] S. Adams, The Dilbert principle, HarperCollins, New York, 1996.


[AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of com-
puter algorithms, Addison-Wesley Publishing Company, Inc., Reading, Mas-
sachusetts, 1974.
[AMO93] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network flows, Prentice Hall,
Englewood Cliffs, New Jersey, 1993.
[BDG88] J. L. Balcázar, J. Díaz, and J. Gabarró, Structural complexity I, EATCS mono-
graphs on theoretical computer science, Springer, 1988.
[BDG90] J. L. Balcázar, J. Díaz, and J. Gabarró, Structural complexity II, EATCS monographs on theoretical computer
science, Springer, 1990.
[CCPS98] W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, and A. Schrijver, Combi-
natorial optimization, Wiley Interscience Series in Discrete Mathematics and
Optimization, John Wiley & Sons, 1998.
[Chv83] V. Chvátal, Linear programming, W. H. Freeman and Company, 1983.
[CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to algorithms,
MIT Press, 1990.
[Coo71] S. A. Cook, The complexity of theorem-proving procedures, Proceedings of the
3rd Annual ACM Symposium on the Theory of Computing, 1971, pp. 151–158.
[Fel68] W. Feller, An introduction to probability theory and its applications, 3 ed.,
vol. 1, John Wiley & Sons, Inc., 1968.
[Fel71] W. Feller, An introduction to probability theory and its applications, 2 ed., vol. 2,
John Wiley & Sons, Inc., 1971.
[GJ79] M. R. Garey and D. S. Johnson, Computers and intractability (a guide to the
theory of NP-completeness), W.H. Freeman and Company, New York, 1979.
[GLS88] M. Grötschel, L. Lovász, and A. Schrijver, Geometric algorithms and combi-
natorial optimization, Springer-Verlag, Berlin Heidelberg, 1988.
[Har72] F. Harary, Graph theory, Addison-Wesley Publishing Company, Inc., 1972.

[Kar72] R. M. Karp, Reducibility among combinatorial problems, Complexity of Com-


puter Computations (R. E. Miller and J. W. Thatcher, eds.), Plenum Press, New
York, 1972, pp. 85–103.

[Kru04] S. O. Krumke, Integer programming, Lecture notes,


http://optimierung.mathematik.uni-kl.de/~krumke, 2004.

[Kru06] S. O. Krumke, Integer programming, Lecture notes,


http://optimierung.mathematik.uni-kl.de/~krumke, 2006.
[Lue84] D. G. Luenberger, Linear and nonlinear programming, 2 ed., Addison-Wesley,
1984.

[MR95] R. Motwani and P. Raghavan, Randomized algorithms, Cambridge University


Press, 1995.

[NW99] G. L. Nemhauser and L. A. Wolsey, Integer and combinatorial optimization,


Wiley-Interscience series in discrete mathematics and optimization, John Wiley
& Sons, 1999.

[Pap94] C. M. Papadimitriou, Computational complexity, Addison-Wesley Publishing


Company, Inc., Reading, Massachusetts, 1994.

[Rud76] W. Rudin, Principles of mathematical analysis, 3 ed., McGraw Hill, 1976.

[Sch86] A. Schrijver, Theory of linear and integer programming, John Wiley & Sons,
1986.

[Sch03] A. Schrijver, Combinatorial optimization: Polyhedra and efficiency, Springer,


2003.

[Wol98] L. A. Wolsey, Integer programming, Wiley-Interscience series in discrete math-


ematics and optimization, John Wiley & Sons, 1998.
