Integer Optimization
Stefan Weltge
1. A more specific name would have been "Linear Integer Optimization".
2. We would like to thank Volker Kaibel for providing his great lecture notes, which have been the main source for designing this course. In fact, several parts are direct translations of parts from his lecture notes.
3. This course assumes solid knowledge of Linear Algebra and the basics of Linear Optimization.
Contents

1 Introduction
  1.1 Setting (Lecture 1)
  1.2 Examples
  1.3 Relation to other problems (Lecture 2)
  1.4 P and NP
  1.5 Duality (Lecture 3)

4 Integer hulls
  4.1 Refresher on polyhedra (Lecture 9)
  4.2 Integer points in polyhedra
  4.3 Integer hull (Lecture 10)
  4.4 Refresher on faces of polyhedra
  4.5 Solving IPs via convexification (Lecture 11)

5 Integer polyhedra
  5.1 Basics
  5.2 Total unimodularity (Lecture 12)
  5.3 Incidence matrices
  5.4 Integer-valued polyhedra (Lecture 13)
  5.5 Total dual integrality (Lectures 14 & 15)

6 Chvátal-Gomory cuts
  6.1 Idea (Lecture 16)
  6.2 Chvátal-Gomory closure (Lecture 17)

7 Split cuts
  7.1 Split disjunctions (Lecture 18)
  7.2 Split closure (Lecture 19)
  7.3 Split cuts
Chapter 1

Introduction
1.2 Examples
Integer programs are among the most common models to mathematically de-
scribe a wide variety of problems in operations research and related fields. This
is due to their high expressive power, which made them the tool of choice for
numerous real-world optimization problems, and also led to a large ecosystem of
commercial solvers and modeling languages supporting integer programs. In this
section, we want to illustrate with some first examples how various optimization
tasks can be easily formulated as integer programs. During the exercises, you
will learn how to derive such formulations on your own and how to solve them
using software tools.
Pandemic restrictions
The r-value (basic reproduction number) is an important measure to understand the spread of a pandemic. Let us suppose that due to many restrictions we reached an r-value of 0.6. We now want to drop some restrictions but still keep an r-value of at most 0.9. To this end, we may open some facilities. We assume that opening a facility type has a certain value to the society and results in a certain increase of the r-value (and that these effects are independent among the facility types), see the table below. Our goal is to find a subset of facility types whose openings do not result in an r-value above 0.9 and whose total value is as large as possible.
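Under these assumptions, this is a binary program of knapsack type. Writing v_j for the societal value and r_j for the r-value increase of facility type j (hypothetical symbols, since the concrete numbers from the table are not reproduced here), a possible formulation is

max { Σ_{j=1}^n v_j x_j : 0.6 + Σ_{j=1}^n r_j x_j ≤ 0.9, x ∈ {0, 1}^n }.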
Stable sets
Let G = (V, E) be an undirected graph. A subset S ⊆ V of nodes is called a
stable set (or independent set) if no two nodes in S are adjacent. Our task is
to find a stable set of maximum cardinality. (We might think of a situation in
which we want to perform as many actions as possible, but some of which are
conflicting with each other. A simple example is inviting friends to a party by
avoiding people who dislike each other.) A possible formulation is the following.
max { Σ_{v∈V} x_v : x_v + x_w ≤ 1 for all {v, w} ∈ E, x ∈ {0, 1}^V }
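Such a model can be passed to a solver almost verbatim. The following is a minimal sketch (assuming the third-party PuLP package and its bundled solver are available; the 4-cycle graph is hypothetical example data):

    import pulp

    V = [1, 2, 3, 4]
    E = [(1, 2), (2, 3), (3, 4), (4, 1)]  # hypothetical example: a 4-cycle

    prob = pulp.LpProblem("max_stable_set", pulp.LpMaximize)
    x = {v: pulp.LpVariable(f"x_{v}", cat="Binary") for v in V}
    prob += pulp.lpSum(x[v] for v in V)   # objective: size of the stable set
    for v, w in E:
        prob += x[v] + x[w] <= 1          # adjacent nodes cannot both be chosen
    prob.solve()
    print([v for v in V if x[v].value() == 1])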
Sudoku
Suppose we are given some entries of a (9 × 9)-matrix that we have to complete to a Sudoku matrix, say (i_1, j_1, a_1), …, (i_k, j_k, a_k), where a_ℓ is the given entry at row i_ℓ and column j_ℓ. How do we determine the missing entries (or decide that the given entries cannot be completed to a Sudoku matrix) using integer programming?
For this problem, it seems natural to use an integer variable with range
{1, 2, . . . , 9} for every entry of the final matrix. However, one quickly realizes
that such variables alone do not allow us to formulate the requirements of a
Sudoku matrix using linear constraints. Again, we will make use of binary vari-
ables. To this end, for every entry i, j of the Sudoku matrix, we will introduce
binary variables xi,j,1 , xi,j,2 , . . . , xi,j,9 where we think of xi,j,t = 1 if and only if
the (i, j)-entry is filled with number t.
We now want to determine a solution x subject to the following constraints:

Σ_{t=1}^9 x_{i,j,t} = 1   for all i, j ∈ [9],
Σ_{j=1}^9 x_{i,j,t} = 1   for all i, t ∈ [9],
Σ_{i=1}^9 x_{i,j,t} = 1   for all j, t ∈ [9],
Σ_{i=p+1}^{p+3} Σ_{j=q+1}^{q+3} x_{i,j,t} = 1   for all t ∈ [9], p, q ∈ {0, 3, 6}.
Note that the above problem does not contain an objective function. Prob-
lems of this type are called feasibility problems.
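To illustrate how compactly such feasibility problems can be generated, here is a sketch of the Sudoku constraints in the same hedged PuLP style as before (the handling of the given entries is indicated in the final comment):

    import pulp

    N, blocks = range(1, 10), (0, 3, 6)
    prob = pulp.LpProblem("sudoku")  # feasibility problem: no objective needed
    x = {(i, j, t): pulp.LpVariable(f"x_{i}_{j}_{t}", cat="Binary")
         for i in N for j in N for t in N}
    for i in N:
        for j in N:
            prob += pulp.lpSum(x[i, j, t] for t in N) == 1   # one number per cell
    for i in N:
        for t in N:
            prob += pulp.lpSum(x[i, j, t] for j in N) == 1   # t appears once in row i
    for j in N:
        for t in N:
            prob += pulp.lpSum(x[i, j, t] for i in N) == 1   # t appears once in column j
    for t in N:
        for p in blocks:
            for q in blocks:
                prob += pulp.lpSum(x[i, j, t]
                                   for i in range(p + 1, p + 4)
                                   for j in range(q + 1, q + 4)) == 1  # t once per block
    # a given entry (i_l, j_l, a_l) would be fixed via: prob += x[i_l, j_l, a_l] == 1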
Facility location
Finally, let us consider the facility location problem. We are given n facilities that we may open/build in order to ship products. Our m customers have demands D_1, …, D_m > 0. For each i ∈ [m], j ∈ [n], the cost of shipping one unit from facility j to customer i is given by c_{i,j} > 0. Moreover, for each j ∈ [n] we are given a fixed cost f_j > 0 for opening facility j and a capacity u_j > 0 on the total number of units that facility j can ship.
We want to find a subset of facilities to be opened such that we can serve
the customers’ demands at minimum total cost. A possible integer program to
solve this problem is the following.
min  Σ_{j=1}^n f_j y_j + Σ_{i=1}^m Σ_{j=1}^n c_{i,j} x_{i,j}

s.t.  Σ_{j=1}^n x_{i,j} = D_i   for all i ∈ [m]
      Σ_{i=1}^m x_{i,j} ≤ u_j y_j   for all j ∈ [n]
      x_{i,j} ≥ 0   for all i ∈ [m], j ∈ [n]
      y_j ∈ {0, 1}   for all j ∈ [n]
Note that the above problem is a mixed integer program, in which only a
proper subset of the variables is required to be integer.
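A hedged sketch of this mixed integer program in the same style (with small hypothetical data; the x variables are continuous, the y variables binary):

    import pulp

    m, n = 3, 2
    D = [20, 30, 25]              # customer demands D_i
    f = [100, 80]                 # opening costs f_j
    u = [60, 60]                  # capacities u_j
    c = [[2, 3], [1, 4], [3, 1]]  # unit shipping costs c_{i,j}

    prob = pulp.LpProblem("facility_location", pulp.LpMinimize)
    y = [pulp.LpVariable(f"y_{j}", cat="Binary") for j in range(n)]
    x = [[pulp.LpVariable(f"x_{i}_{j}", lowBound=0) for j in range(n)] for i in range(m)]
    prob += (pulp.lpSum(f[j] * y[j] for j in range(n))
             + pulp.lpSum(c[i][j] * x[i][j] for i in range(m) for j in range(n)))
    for i in range(m):
        prob += pulp.lpSum(x[i][j] for j in range(n)) == D[i]         # serve demand i
    for j in range(n):
        prob += pulp.lpSum(x[i][j] for i in range(m)) <= u[j] * y[j]  # ship only if open
    prob.solve()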
Given integers a, b ∈ Z_{≥1}, consider the integer program

min { ax + by : ax + by ≥ 1, x, y ∈ Z }.   (1.1)

We claim that in order to solve the above integer program, some (basic) number theory is required. To this end, recall that the greatest common divisor gcd(a, b) is the largest positive integer that divides both a and b.
Proposition 1.1. The optimal value of (1.1) equals gcd(a, b).
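For instance, for a = 4, b = 6 the optimal value is gcd(4, 6) = 2, attained at (x, y) = (−1, 1). In fact, an optimal solution of (1.1) can be computed directly with the extended Euclidean algorithm, which produces Bézout coefficients x, y with ax + by = gcd(a, b); a minimal sketch:

    def extended_gcd(a, b):
        # returns (g, x, y) with a*x + b*y = g = gcd(a, b)
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        # b*x + (a % b)*y = g  and  a % b = a - (a // b) * b
        return g, y, x - (a // b) * y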
We have seen that integer programs can be easily used to model a variety of
problems. This comes at a price: Solving general integer programs is not easy!
From the theoretical point of view, this statement can be formalized in a partic-
ularly strong way. To this end, it will be helpful to focus on decision problems
instead of optimization problems. For example, instead of finding a solution x ∈ Z^n with Ax ≤ b of maximum objective value c^T x, we may ask whether there exists a solution x ∈ Z^n with Ax ≤ b such that c^T x ≥ γ for some given γ. In
many cases, optimization problems can be efficiently reduced to a series of such
decision problems (at least in theory).
Let us consider some decision problems that are closely related to integer programs:

1. Given A ∈ Q^{m×n}, b ∈ Q^m, is there some x ∈ R^n with Ax = b? (linear equations)
2. Given A ∈ Q^{m×n}, b ∈ Q^m, is there some x ∈ R^n with Ax ≤ b? (linear inequalities)
3. Given A ∈ Q^{m×n}, b ∈ Q^m, is there some x ∈ Z^n with Ax = b? (linear diophantine equations)
4. Given A ∈ Q^{m×n}, b ∈ Q^m, is there some x ∈ Z^n with Ax ≤ b? (linear diophantine inequalities)
In Linear Algebra, you have learned that the first problem can be solved
“efficiently”. In the next section we will describe a way to formalize what this
means. There also exist efficient algorithms for the second problem (linear
programming). For instance, the simplex algorithm typically runs very fast
in practice. Unfortunately, there are some (mostly artificial) instances that
cause bad behavior of the simplex algorithm. However, there also exist more
sophisticated methods that will always run “fast”, and we will briefly study them
at some point in the course.
How about the diophantine versions of the above problems? Somewhat surprisingly, it is also possible to solve linear diophantine equations efficiently,
which is one of the first main results of this course.
This is all good news. How about linear diophantine inequalities, i.e., in-
teger programs? So far, we have not found algorithms that perform well on
all instances. In fact, there are even strong indications that such algorithms
simply do not exist! The main reason is that integer programs cover many
other problems that are regarded as theoretically hard, such as the stable set
problem.
1.4 P and NP
In this part, we would like to understand how the notion of an efficient algorithm
can be formalized and how different problems can be compared in terms of
difficulty. Such questions are at the core of computational complexity theory,
a very fundamental branch of theoretical computer science. While it can be
entirely understood within purely mathematical terms, we will only have time to
mention the most important concepts that form the basis of complexity theory,
and still stay quite informal. A standard reference on the topics covered below
is the beautiful book
• Computers and Intractability by M. Garey, D. Johnson, 1979.
A more modern reference that covers many exciting recent developments in the
field is the book
• Computational Complexity: A Modern Approach by S. Arora, B. Barak,
2007.
Input sizes: Clearly, the running time of an algorithm may depend on the
size of its input. For instance, the problem of deciding whether a given positive
integer is prime may require more time (= elementary operations) for a large
number than for a small one.
This raises an important point: How shall we measure the size of an input? For many types of inputs there is a common sense of how to define their size, often also called the encoding length. The encoding length can be thought of as a natural way to represent an object in a computer using bits, for example:
Definition 1.2. Given a rational number z = p/q with p ∈ Z, q ∈ Z_{≥1}, gcd(p, q) = 1, the encoding length of z is defined as ⟨z⟩ := 1 + ⌈log₂(|p| + 1)⌉ + ⌈log₂(q)⌉.
Definition 1.3. The encoding length of a rational matrix A ∈ Q^{m×n} is defined as ⟨A⟩ := mn + Σ_{i=1}^m Σ_{j=1}^n ⟨A_{i,j}⟩.
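These definitions translate directly into code; a small sketch using exact rationals:

    import math
    from fractions import Fraction

    def enc_rational(z):
        # <z> = 1 + ceil(log2(|p| + 1)) + ceil(log2(q)) for z = p/q in lowest terms
        z = Fraction(z)
        p, q = z.numerator, z.denominator
        return 1 + math.ceil(math.log2(abs(p) + 1)) + math.ceil(math.log2(q))

    def enc_matrix(A):
        # <A> = m*n plus the sum of the encoding lengths of all entries
        m, n = len(A), len(A[0])
        return m * n + sum(enc_rational(a) for row in A for a in row)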
The class NP: However, the fourth problem (feasibility of integer programming) belongs to another important class of decision problems, called NP. It is important to mention that NP does not stand for "non-polynomial"! (It stands for "nondeterministic polynomial", which might still be quite misleading.)
Decision problems in NP admit algorithms with the following property. For
every “yes” input x, there exists some nice extra input y whose encoding length
can be polynomially bounded in the encoding length of x such that if the algo-
rithm is given both x and y it can verify in polynomial time that y is indeed a
certificate for the fact that x is a "yes" input. In other words, for "yes" inputs x there is a rather short proof y that can be efficiently checked, showing that x is indeed a "yes" input. However, for "no" inputs such proofs are not required
to exist. Moreover, the algorithm is not required to find the extra input y itself.
For example, we will see that (the decision version of) integer programming
is in NP: A “yes” instance is a diophantine system of linear inequalities that has
a solution. Clearly, the solution itself may be regarded as the extra input y,
and it can be easily checked that y is indeed an integer solution that satisfies all
constraints. A little catch is that we have to be careful about the encoding length
that y requires, but we will see later that this is indeed not an issue. Recall
that the definition of NP does not require the existence of a short certificate y in
the case of a "no" instance. In fact, for "no" instances of diophantine systems of
linear inequalities we are not aware of simple proofs certifying their infeasibility.
Let us remark that it is easy to see that P is a subset of NP: Polynomial-time
algorithms for problems in P do not even need any certificate y to determine
whether x is a “yes” input.
However, the class NP contains thousands of nice problems for which no
polynomial-time algorithms are known, including the traveling salesman prob-
lem, the satisfiability problem, or the clique problem.
Take-away: Clearly, this is bad news and shows the limitations of what we can
expect. However, at this point one should recall that the notion of a polynomial-
time algorithm is very restrictive: It has to apply to all inputs and assumes
answers to be always correct. In contrast, it is not clear whether the inputs
we are dealing with in practice will cause worst-case behaviors.

1.5 Duality
We close this introductory chapter with some remarks about duality, a central
concept in the theory of linear programming. To recall, suppose we are given a
linear program
max { c^T x : Ax ≤ b },
and let us assume that it is feasible and bounded, i.e., the optimum is attained.
The corresponding dual linear program is given by
min { b^T y : A^T y = c, y ≥ 0 }.
The simple notion of weak duality states that every feasible solution y to the
dual problem provides an upper bound on the optimum value of the primal
problem: Indeed, given a feasible solution x for the primal problem, we see that
c^T x = (A^T y)^T x = y^T Ax ≤ y^T b = b^T y
holds. Notice that the dual variables y simply assign a nonnegative weight to
each inequality of the original system such that the corresponding combination
yields an inequality of the form c^T x ≤ γ that holds for all feasible x. The purpose of the dual linear program is to find the best upper bound γ that can be proven in this way. Strong linear programming duality states that if we are given an optimal solution x to the primal, then we can always certify its optimality with the above simple strategy, i.e.,

max { c^T x : Ax ≤ b } = min { b^T y : A^T y = c, y ≥ 0 }.
At this point, we may ask whether a corresponding duality theorem also holds
for integer programs. The most natural problems to consider are
max { c^T x : Ax ≤ b, x ∈ Z^n }
and
min { b^T y : A^T y = c, y ≥ 0, y ∈ Z^m }.
As for the case of linear programming, we immediately obtain weak duality:
max{c^T x : Ax ≤ b, x ∈ Z^n} ≤ max{c^T x : Ax ≤ b}
  = min{b^T y : A^T y = c, y ≥ 0}
  ≤ min{b^T y : A^T y = c, y ≥ 0, y ∈ Z^m}.
However, strong duality does not hold for general integer programs: For example, in the case A = (2), b = (1), c = (1), the two above inequalities are strict: the primal integer program has optimal value 0, both linear programs have value 1/2, and the dual integer program is even infeasible.
Finally, we would like to mention that this is not a singular example. In
fact, for most (difficult) integer programs one should not expect strong duality
to hold. Otherwise, the concept of duality would yield a simple way of certifying
that a given solution is optimal. However, this relates to one of the issues about integer programming being in NP raised in the previous section: If there exists
a feasible integer solution x∗ of objective value at least γ, there is a short proof
certifying this fact (which is essentially x∗ itself, we will get back to this later).
However, we do not know simple proofs showing that there is no solution of
value at least γ +ε (certifying the optimality of x∗ ). Moreover, again complexity
theory indicates that such proofs may simply not exist.
Chapter 2

Linear Equations and Inequalities
We see that x2 is uniquely determined by the other variables and the last equal-
ity. In this sense, we may ignore the second column and fourth row from now
on. Choosing (2, 4) as the next pivot element, we obtain
1 0 1 0 5 | 7
3 0 1 1 11 | 16
2 0 2 0 10 | 14 .
8 1 2 1 24 | 36
Again, we see that x4 is determined by the remaining variables and the second
equation. Thus, we may also ignore the fourth column and the second row from
now on. Our final pivot element will be (1, 1), we obtain the matrix
1 0 1 0 5 | 7
3 0 1 1 11 | 16
0 0 0 0 0 | 0 .
8 1 2 1 24 | 36
Again, we see that x1 is determined by the remaining variables and the first
equation. The last remaining equation is trivially feasible, since its right-hand
side is zero. Thus, we may assign any values to the remaining variables x3 and
x5 and obtain a solution for the original system.
To understand whether the above algorithm runs in polynomial time, let
us consider the following, slightly more general algorithm (in which we should
think of A already as the augmented matrix).
Algorithm 2.1 (Gaussian elimination). Given a matrix A ∈ Q^{m×n}, we create matrices A^(1), …, A^(m) ∈ Q^{m×n} as follows. We set A^(1) ← A and initialize M ← [m] and N ← [n]. For each k = 2, …, m we set A^(k) ← A^(k−1) and perform the following steps.

1. If A^(k)_{M,N} = 0, we set A^(k+1) ← · · · ← A^(m) ← A^(k) and stop.

2. Pick any (p, q) ∈ M × N with A^(k)_{p,q} ≠ 0.

3. Set A^(k)_{i,N} ← A^(k)_{i,N} − (A^(k)_{i,q} / A^(k)_{p,q}) · A^(k)_{p,N} for every i ∈ M \ {p}.

4. Update M ← M \ {p} and N ← N \ {q}.
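A minimal sketch of Algorithm 2.1 over exact rationals (assuming the input is already the augmented matrix, as discussed above):

    from fractions import Fraction

    def gaussian_elimination(A):
        A = [[Fraction(a) for a in row] for row in A]
        m, n = len(A), len(A[0])
        M, N = set(range(m)), set(range(n))
        for _ in range(m - 1):
            pivots = [(p, q) for p in M for q in N if A[p][q] != 0]
            if not pivots:
                break                      # step 1: A_{M,N} = 0, stop
            p, q = pivots[0]               # step 2: pick a pivot
            for i in M - {p}:              # step 3: eliminate column q in rows M \ {p}
                factor = A[i][q] / A[p][q]
                for j in N:
                    A[i][j] -= factor * A[p][j]
            M.discard(p); N.discard(q)     # step 4: shrink the index sets
        return A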
Note that for each k, step 3 performs at most O(mn) arithmetic operations.
We see that the total number of arithmetic operations is O(m2 n). Thus, the
number of arithmetic operations of the above algorithm is polynomial in the
input size. However, what about the encoding lengths of the numbers arising in the matrices A^(1), …, A^(m)? The following two statements show that the encoding lengths of all these numbers can indeed be polynomially bounded in the encoding length of the input matrix.
Lemma 2.2. For every i, k ∈ [m] and j ∈ [n], the entry A^(k)_{i,j} is zero or (up to sign) the quotient of the determinants of two square submatrices of A.
Proof. We proceed by induction on k = 1, . . . , m. The statement is trivial for
k = 1. Let k ≥ 2 and consider i ∈ [m], j ∈ [n]. Let M, N denote the respective
sets at the beginning of iteration k, and (p, q) ∈ M × N the respective pivot.
Note that if (i, j) ∉ M × N, then A^(k)_{i,j} = A^(k−1)_{i,j} and we are done. The same holds if i = p. Moreover, after step 3, we have A^(k)_{i,q} = 0 for all i ∈ M \ {p}. Thus, it suffices to consider the case (i, j) ∈ (M \ {p}) × (N \ {q}).
Let M̄ = [m] \ M and N̄ = [n] \ N . The crucial observation is now that
A^(k)_{M̄,∗} and A^(k)_{M̄∪{i},∗}

arose from

A_{M̄,∗} and A_{M̄∪{i},∗}

by adding multiples of rows to other rows, respectively. In particular, this means that

det A^(k)_{M̄,N̄} = det A_{M̄,N̄}  and  det A^(k)_{M̄∪{i},N̄∪{j}} = det A_{M̄∪{i},N̄∪{j}}

hold. Finally, let us compute det A^(k)_{M̄∪{i},N̄∪{j}} by expanding along the ith row. Since A^(k)_{i,N̄} = 0, we obtain

det A^(k)_{M̄∪{i},N̄∪{j}} = A^(k)_{i,j} · det A^(k)_{M̄,N̄},
and

⟨A⟩ = k² + Σ_{(i,j)∈[k]²} (1 + ⌈log₂(|p_{i,j}| + 1)⌉ + ⌈log₂(q_{i,j})⌉).
Using the Leibniz formula for the determinant, we see that q divides the product of all q_{i,j} and hence

q ≤ Π_{(i,j)∈[k]²} q_{i,j}.   (2.2)
In particular, we obtain

⌈log₂(q)⌉ ≤ Σ_{(i,j)∈[k]²} ⌈log₂(q_{i,j})⌉ ≤ ⟨A⟩,   (2.3)

and hence

⌈log₂(|p| + 1)⌉ ≤ 1 + ⌈log₂(|p|)⌉ ≤ 1 + Σ_{(i,j)∈[k]²} (⌈log₂(|p_{i,j}| + 1)⌉ + ⌈log₂(q_{i,j})⌉) ≤ ⟨A⟩ − 1.
Plugging (2.3) and the latter inequality into (2.1) yields the claim.
We conclude:
Theorem 2.4. Algorithm 2.1 (Gaussian elimination) runs in polynomial time.
Proof. Suppose that (x, y) satisfies (2.5). For every x′ with Ax′ ≤ b we have

c^T x′ = (A^T y)^T x′ = y^T Ax′ ≤ y^T b = c^T x,

and hence x is indeed an optimal solution.
Conversely, if x is an optimal solution, then strong LP duality asserts that
there exists some y such that (x, y) satisfy (2.5).
In what follows, we assume that we are given A ∈ Q^{m×n}, b ∈ Q^m and we want to determine whether the polyhedron
P = P ≤ (A, b) := {x ∈ Rn : Ax ≤ b}
is nonempty. While P might be unbounded, the next lemma states that we may
restrict ourselves to searching feasible points within a certain radius.
Lemma 2.6. There exists a polynomial p such that the following holds for every A ∈ Q^{m×n}, b ∈ Q^m: If P = P^≤(A, b) is nonempty, then so is P ∩ [−δ, δ]^n, where δ := 2^{p(⟨A⟩+⟨b⟩)}.
Proof. Suppose that P is nonempty. Consider any inclusion-wise minimal face F of P. We know that F = {x ∈ R^n : A′x = b′} for some A′, b′ arising from A, b by deleting some rows. In the previous section, we have discussed that there exists a polynomial p such that Gaussian elimination can be used to compute a point x ∈ F in time at most p(⟨A′⟩ + ⟨b′⟩). We may assume that p is monotonically increasing. Note that we must have

⟨x⟩ ≤ p(⟨A′⟩ + ⟨b′⟩) ≤ p(⟨A⟩ + ⟨b⟩),

which implies ‖x‖_∞ ≤ 2^{p(⟨A⟩+⟨b⟩)}.
Thus, by possibly adding inequalities −δ ≤ x_i ≤ δ, we may assume that P ⊆ [−δ, δ]^n, where δ is as in the above lemma. Setting R := √n · δ, this implies that

P ⊆ B(R),   (2.6)

where we use the notation B(R) := {x ∈ R^n : ‖x‖₂ ≤ R} (the ambient dimension n will always be clear from the context). Note that log₂(R) is polynomially bounded in the encoding lengths of A, b.
In what follows, we will also assume that if P is nonempty, then it contains x₀ + B(r) for some (unknown) point x₀ ∈ R^n and some r > 0 such that log₂(1/r) is also polynomially bounded in the encoding lengths of A, b, i.e.,

x₀ + B(r) ⊆ P or P = ∅.   (2.7)

For instance, this requirement can be achieved by carefully increasing b without changing the (non)emptiness of P, but we will not have time to comment on this in detail.
A famous way to quickly detect a point in P or conclude that P is empty,
is given by the ellipsoid method. It was first studied by Naum Shor, Arkadi
Nemirovski, and David Yudin in the 1970s. In 1979, Leonid Khachiyan showed
how this method can be used to obtain the first polynomial-time algorithm in
linear programming, which was a stunning theoretical breakthrough.
As the name suggests, the ellipsoid method relies on the concept of ellipsoids,
which are simply linear transformations of Euclidean balls. We will make use
of the equivalent definition.
Proof. Consider the case that P is nonempty, in which we know that P contains
a Euclidean ball of radius r and hence vol(P ) ≥ vol(B(r)). Suppose that the
ellipsoid method has not found a point in P after k iterations. Let E1 , E2 , . . . , Ek
denote the ellipsoids computed in step 5 of each iteration, and set E0 = B(R).
By Lemma 2.9 we have vol(E_i) ≤ e^{−1/(2n+2)} · vol(E_{i−1}) for all i ∈ [k]. Recall that P ⊆ E_k and hence

vol(B(r)) ≤ vol(P) ≤ vol(E_k) ≤ e^{−k/(2n+2)} · vol(B(R)).

Thus, we obtain

e^{−k/(2n+2)} ≥ vol(B(r)) / vol(B(R)) = (r/R)^n,

which implies

k ≤ n(2n + 2) · ln(R/r).
Let us close this section with a few remarks. It turns out that the ellipsoid method does not perform well in practice, and hence it is rather of theoretical interest. However, its theoretical implications are very strong. For instance, the ellipsoid method never requires an explicit list of inequalities describing P but only an oracle for deciding whether a point is contained in P and, if not, providing a violated inequality. This allows one to use the ellipsoid method in situations where P is described by a huge number of inequalities, or even an infinite number of inequalities, and hence it is applicable to general convex optimization problems.
Another famous class of algorithms for solving linear programs are interior point methods, studied by Yurii Nesterov and Arkadi Nemirovski. In 1984, Narendra Karmarkar showed that interior point methods can also be used to solve linear programs in polynomial time.
Chapter 3

Linear Diophantine Equations
In what follows, we often want to understand whether two sets of vectors gen-
erate the same lattice, or whether a point lies within a lattice. These tasks are
particularly simple if the respective matrices have a special form:
Definition 3.5 (Hermite normal form). A matrix A ∈ Rm×n with m ≤ n has
Hermite normal form (HNF) if the following holds.
1. Ai,j ≥ 0 for all i, j
2. Ai,j = 0 for all j > i
3. Ai,i > 0 for all i
4. Ai,j < Ai,i for all j < i
In other words, a matrix has HNF if every entry is nonnegative (1), it is a lower triangular matrix (2) with strictly positive diagonal entries (3) that are strictly larger than all other entries in their row (4). As an example, the matrix

(  4    0   0   0    0   0 )
(  0    π   0   0    0   0 )
(  e    √2  5   0    0   0 )
( 2.8   2   0   3.1  0   0 )

has HNF.
Note that if A ∈ Rm×n has HNF, then rank(A) = m. Moreover, Λ(A) is a
lattice and the nonzero columns of A form a lattice basis of Λ(A).
Theorem 3.6. For any two matrices A, A′ ∈ R^{m×n} in HNF we have

Λ(A) = Λ(A′) ⇐⇒ A = A′.

Proof. The implication "⇐" is trivial. For "⇒", suppose that Λ(A) = Λ(A′) but A ≠ A′. Choose i ∈ [m] minimal such that A_{i,j} ≠ A′_{i,j} for some j, fix such a j, and assume without loss of generality that A′_{i,i} ≤ A_{i,i}.
Since A′_{∗,j} ∈ Λ(A) and due to the triangular shape (2) of A there exist coefficients λ₁, …, λ_i ∈ Z with

A′_{[i],j} = Σ_{ℓ=1}^{i} λ_ℓ A_{[i],ℓ}   (3.1)

and

A′_{[i−1],j} = Σ_{ℓ=1}^{i−1} λ_ℓ A_{[i−1],ℓ}.   (3.2)
First, suppose that j = i. By the triangular shape of A′ we have A′_{[i−1],i} = 0, and hence (3.2) together with the triangular shape of A yields λ₁ = · · · = λ_{i−1} = 0. In this case, (3.1) yields A′_{i,i} = λ_i A_{i,i}. However, recall that we have A′_{i,i} ≤ A_{i,i} and A′_{i,i} ≠ A_{i,i}, which yields λ_i = A′_{i,i} / A_{i,i} < 1. The nonnegativity (by (1)) and integrality of λ_i implies λ_i = 0 and hence A′_{i,i} = 0, a contradiction to (3).
Hence, we must have j < i. By the minimality of i, we have A0[i−1],` = A[i−1],`
for all `. Again by the triangular shape of A and (3.2) we see that
λ_ℓ = 1 if ℓ = j,  and  λ_ℓ = 0 if ℓ ∈ [i − 1] \ {j}.
Recall that we assumed A_{i,j} ≠ A′_{i,j} and hence λ_i ≠ 0. If λ_i > 0, then A′_{i,j} ≥ A_{i,i} ≥ A′_{i,i}, a contradiction to (4). If λ_i < 0, then A_{i,j} = A′_{i,j} − λ_i A_{i,i} ≥ A_{i,i}, again a contradiction to (4).
Thus, if two matrices have HNF, then it is easy to tell if they generate the
same lattice. Next, we will see that every rational lattice is generated by a
matrix in HNF. To see this, we will start with any lattice basis and try to put
it into HNF using simple operations that do not affect the lattice.
Definition 3.7 (Elementary column operations). The following operations on
a matrix are called elementary column operations (ECOs).
1. interchanging two columns
2. multiplication of a column by −1
3. addition of an integer multiple of a column to another column
Let us collect a few simple observations about ECOs. First, for every ECO
ω and matrices A, B we have
Moreover, it turns out that ECOs do not affect lattices, or, more generally:
Lemma 3.8. For every ECO ω and every matrix A we have

Λ(ω(A)) = Λ(A).
Proof. It is clear that Λ(ω(A)) ⊆ Λ(A). It remains to observe that there exists
an ECO ω −1 that is inverse to ω, and hence
(Such a sequence exists since the sum can only take nonnegative integer values.)
By (3.5) and rank(A) = m, we must have D1,1 > 0, and by (3.6) we even
have D1,2 = D1,3 = · · · = D1,ñ = 0. (If D1,j > 0, we may subtract the j-th
column of D from the first column and possibly reorder the columns of D to
contradict (3.6).) Thus, we see that one can iteratively apply a sequence of
ECOs to A to obtain a matrix (B 0), where B is lower triangular with B_{i,i} > 0 for all i.
To obtain nonnegativity and diagonal dominance, we perform the following two steps for each row i = 2, …, m (in this order) and every j = 1, …, i − 1:

1. Determine µ, r ∈ Z with B_{i,j} = µB_{i,i} + r, where 0 ≤ r < B_{i,i}.
2. Add (−µ) times the ith column of B to the jth column of B.
(1/2) · D_{1,i}, and add (−µ) times the jth column to the ith column. Otherwise, we simply add (−1) times the jth column to the ith column. Note that in each
such step, we decrease a positive entry D1,i by at least a factor of 2. Thus,
the number of iterations required to achieve (3.5) and (3.6) is polynomial in the
encoding length of A.
However, recall that to obtain a polynomial-time algorithm, we also need to
take care of the numbers involved in that process. With some further modifi-
cation of the above algorithm, it is possible to obtain a true polynomial-time
algorithm for computing the Hermite normal form:
Theorem 3.10 (Kannan, Bachem 1979). There is a polynomial-time algorithm
for computing the Hermite normal form of any rational matrix.
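For illustration, here is a minimal sketch of an HNF computation via ECOs for an integer matrix of full row rank. It is correct but deliberately naive: unlike the Kannan–Bachem algorithm, it does not control the size of intermediate numbers.

    def hermite_normal_form(A):
        A = [row[:] for row in A]  # integer matrix, full row rank assumed
        m, n = len(A), len(A[0])
        for i in range(m):
            for j in range(i + 1, n):          # clear entries right of the diagonal
                while A[i][j] != 0:            # Euclidean algorithm via ECOs
                    if A[i][i] == 0 or abs(A[i][j]) < abs(A[i][i]):
                        for r in range(m):     # ECO 1: interchange columns i and j
                            A[r][i], A[r][j] = A[r][j], A[r][i]
                    q = A[i][j] // A[i][i]
                    for r in range(m):         # ECO 3: add -q times column i to column j
                        A[r][j] -= q * A[r][i]
            if A[i][i] < 0:
                for r in range(m):             # ECO 2: multiply column i by -1
                    A[r][i] = -A[r][i]
            for j in range(i):                 # reduce entries left of the diagonal
                q = A[i][j] // A[i][i]         # afterwards 0 <= A[i][j] < A[i][i]
                for r in range(m):
                    A[r][j] -= q * A[r][i]
        return A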
Corollary 3.11. Given a rational matrix A (of any rank), we have that Λ(A)
is a lattice. Moreover, we can compute a lattice basis of Λ(A) in polynomial
time.
Proof. Given A ∈ Q^{m×n} with rank(A) = k, using Gaussian elimination we may compute a rational matrix Ã ∈ Q^{k×n} with rank(Ã) = k and a matrix V ∈ Q^{m×k} such that A = V Ã.

By Theorem 3.10, we may compute a lattice basis B̃ of Λ(Ã). Then B := V B̃ is a lattice basis of Λ(A) since

Λ(B) = Λ(V B̃) = V Λ(B̃) = V Λ(Ã) = Λ(V Ã) = Λ(A).
In the previous section, we have seen that the Hermite normal form of a ratio-
nal matrix can be obtained by ECOs. It turns out that ECOs correspond to
multiplications of very particular matrices, which will be helpful for our under-
standing of the solution space of linear diophantine systems.
Definition 3.12 (Unimodular matrix). A square matrix U is called unimodular
if all entries are integer and | det(U )| = 1.
Theorem 3.13. For every invertible integer matrix U ∈ Zn×n , the following
statements are equivalent.
1. U is unimodular
2. U −1 ∈ Zn×n
3. U −1 is unimodular
4. U · Zn = Zn
5. Λ(U ) = Zn
6. The Hermite normal form of U is the identity matrix.
7. U arises from the identity matrix by a sequence of ECOs.
Proof. 1 =⇒ 2: Recall that Cramer's rule states that for an invertible matrix A, the solution of Ax = b is given by x_i = det(A_i)/det(A), where A_i arises from A by replacing its ith column with b. The claim follows by applying Cramer's rule to the matrix equation UX = I and using the fact that the determinant of an integer matrix is an integer.
2 =⇒ 3: This holds since 1 = det(U ) det(U −1 ) and the fact that det(U ),
det(U −1 ) are both nonzero integers.
3 =⇒ 4: We have Zn = (U U −1 )Zn = U (U −1 Zn ) ⊆ U Zn ⊆ Zn .
4 =⇒ 5: Note that Λ(U ) = U Zn .
5 =⇒ 6: This follows from the uniqueness of the Hermite normal form, see
Theorem 3.9.
6 =⇒ 7: This follows again from Theorem 3.9 and the fact that every ECO
has an inverse ECO.
7 =⇒ 1: The application of an ECO does not change the absolute value of the
determinant.
Corollary 3.14. For every rational matrix A ∈ Qm×n with rank(A) = m there
is a unimodular matrix U ∈ Zn×n such that AU is the Hermite normal form of
A. Moreover, U can be found in polynomial time.
Before we describe the above algorithm, let us mention that the above theorem yields an elegant characterization of the feasibility of linear Diophantine systems. In case there is no solution, the vector y can be seen as a simple certificate showing that the system is indeed infeasible: If there was a solution x ∈ Z^n with Ax = b, then y^T b = y^T Ax ∈ Z, contradicting y^T b ∉ Z. Note that a similar characterization holds for systems of linear equations: If Ax = b has no solution in R^n, then there exists some y ∈ R^m such that y^T A = 0 but y^T b = 1, which is also known as the Fredholm alternative.
In case the system has a solution, we see that the solution space is again
similar to what we know for linear systems.
Proof of Theorem 3.15. First, we perform Gaussian elimination to see whether Ax = b has any solution x ∈ R^n. If not, we may obtain some y ∈ Q^m such that y^T A = 0 and y^T b = 1/2. If it has a solution, we may determine redundant rows whose removal does not change the solution space, and hence we can assume that rank(A) = m.
By Corollary 3.14, in polynomial time we can compute a unimodular matrix U ∈ Z^{n×n} such that AU is in Hermite normal form. Recall that AU = (B 0) for some invertible matrix B ∈ Q^{m×m}.

For any x ∈ R^n let (z′, z″) = z := U^{−1}x with z′ ∈ R^m, z″ ∈ R^{n−m}. We have:

Ax = b, x ∈ Z^n
⇐⇒ AUz = b, z ∈ Z^n
⇐⇒ (B 0)(z′, z″) = b, z′ ∈ Z^m, z″ ∈ Z^{n−m}
⇐⇒ z′ = B^{−1}b, B^{−1}b ∈ Z^m, z″ ∈ Z^{n−m}.
If B^{−1}b ∉ Z^m, pick some i ∈ [m] with (B^{−1}b)_i ∉ Z and set y^T := e_i^T B^{−1}. Then

y^T A = y^T (B 0) U^{−1} = (e_i^T 0) U^{−1} = (U^{−1})_{i,∗} ∈ Z^n

and

y^T b = (B^{−1}b)_i ∉ Z.
Otherwise, we have B^{−1}b ∈ Z^m. Writing U = (U′ U″) with U′ ∈ Z^{n×m} and U″ ∈ Z^{n×(n−m)}, we may choose x^(0) := U′B^{−1}b and x^(1), …, x^(t) as the columns of U″ (where t = n − m = n − rank(A)).
Since we have already obtained some understanding of lattices, let us use this op-
portunity to visit an extremely elegant and powerful result in finite-dimensional
convexity, Minkowski’s convex body theorem. To this end, we need the following
notions.
where γi = αi0 − αi . Note that |γi | < 1 for all i and that v − v 0 ∈ Λ. Since
b1 , . . . , bn is a lattice basis of Λ, the numbers γi must be integers and hence are
all zero. We conclude that αi = αi0 for all i and hence y 0 = y, v 0 = v.
Su = {y ∈ Π : y + u ∈ S}.
Note that
Su + u = ((Π + u) ∩ S)
and hence the previous corollary implies that the translates Su + u cover S
without overlapping. Thus, we have
Σ_{u∈Λ} vol(S_u) = Σ_{u∈Λ} vol(S_u + u) = vol(S) > det(Λ) = vol(Π).
Since S_u ⊆ Π for all u ∈ Λ, not all S_u can be pairwise disjoint (otherwise we would have Σ_{u∈Λ} vol(S_u) ≤ vol(Π)). Thus, there exist u, u′ ∈ Λ, u ≠ u′ with S_u ∩ S_{u′} ≠ ∅. Let z ∈ S_u ∩ S_{u′}. Then x := z + u ∈ S and y := z + u′ ∈ S and x − y = u − u′ ∈ Λ, as claimed.
We are ready to state and prove Minkowski’s theorem.
Proof. First, let us argue that it suffices to prove the claim for prime numbers
n. To see this, suppose that n = pq for smaller integers p, q for which we have
already found integers x1 , x2 , x3 , x4 and y1 , y2 , y3 , y4 with p = x21 + x22 + x23 + x24 ,
q = y12 + y22 + y32 + y42 . Then it is straightforward (but tedious) to check that
n = n21 + n22 + n23 + n24 , where:
n₁ = x₁y₁ − x₂y₂ − x₃y₃ − x₄y₄,
n₂ = x₁y₂ + x₂y₁ + x₃y₄ − x₄y₃,
n₃ = x₁y₃ − x₂y₄ + x₃y₁ + x₄y₂,
n₄ = x₁y₄ + x₂y₃ − x₃y₂ + x₄y₁.
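A quick numerical sanity check of this four-square identity (a throwaway verification sketch):

    import random

    xs = [random.randint(-9, 9) for _ in range(4)]
    ys = [random.randint(-9, 9) for _ in range(4)]
    x1, x2, x3, x4 = xs
    y1, y2, y3, y4 = ys
    n1 = x1*y1 - x2*y2 - x3*y3 - x4*y4
    n2 = x1*y2 + x2*y1 + x3*y4 - x4*y3
    n3 = x1*y3 - x2*y4 + x3*y1 + x4*y2
    n4 = x1*y4 + x2*y3 - x3*y2 + x4*y1
    assert n1**2 + n2**2 + n3**2 + n4**2 == sum(x*x for x in xs) * sum(y*y for y in ys)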
Assuming that n is prime, let us show that there are integers α, β such that
α2 + β 2 + 1 ≡ 0 mod n. If n = 2, then choose α = 0, β = 1. Otherwise,
n is odd. We claim that the numbers α2 with 0 ≤ α < n/2 have distinct
residues modulo n: Suppose α12 ≡ α22 mod n for some 0 ≤ α1 , α2 < n/2. Then
(α1 −α2 )(α1 +α2 ) ≡ 0 mod n and hence n divides α1 −α2 or α1 +α2 . However,
the absolute value of both numbers is strictly smaller than n and hence α1 = α2 .
Similarly, we see that the numbers −1 − β² with 0 ≤ β < n/2 also have distinct residues modulo n. Since there are (n + 1)/2 such numbers α² and (n + 1)/2 such numbers −1 − β², together these are n + 1 numbers while there are only n residues; hence some α² and some −1 − β² share a residue, i.e., α² + β² + 1 ≡ 0 mod n.
Thus, there exist integers α, β with α² + β² + 1 ≡ 0 mod n. Consider the lattice

Λ := {x ∈ Z⁴ : x₃ ≡ αx₁ + βx₂ mod n, x₄ ≡ βx₁ − αx₂ mod n}.

In the exercises, we will see that det(Λ) = n² holds. Moreover, the 4-dimensional open ball B = {x ∈ R⁴ : ‖x‖₂² < 2n} satisfies

vol(B) = (π²/2)(2n)² = 2π²n² > 16n² = 2⁴ det(Λ).

Thus, by Minkowski's theorem there is a point x ∈ Λ with 0 < x₁² + x₂² + x₃² + x₄² < 2n. Note that

x₁² + x₂² + x₃² + x₄² ≡ (1 + α² + β²)(x₁² + x₂²) ≡ 0 mod n.

Thus, x₁² + x₂² + x₃² + x₄² is divisible by n and hence it must be equal to n.
Chapter 4
Integer hulls
Now that we have seen that linear programming and linear Diophantine systems
can be solved in polynomial time, we turn our full attention to integer programs.
A central aspect of our studies will be to understand the structure of integer
points in polyhedra. To put the upcoming results in context, we begin with the
following refresher on the basics of polyhedra (some of which we have already
implicitly used in earlier parts).
Definition 4.1 (Polyhedron). A polyhedron is a set of the form P ≤ (A, b) :=
{x ∈ Rn : Ax ≤ b} for some A ∈ Rm×n , b ∈ Rm . If A and b are rational, we
say that P ≤ (A, b) is a rational polyhedron.
While the above definition is based on outer descriptions of polyhedra by
means of linear inequalities, polyhedra can be also specified by inner descrip-
tions, see Theorem 4.8.
Definition 4.2 (Polytope). A polytope is a bounded polyhedron.
Definition 4.3 (Convex hull). The convex hull of a set X ⊆ Rn is the smallest
convex set that contains X, and is denoted by conv(X).
Definition 4.4 (Cone). A nonempty set C ⊆ Rn is a cone if for every x ∈ C
the ray {λx : λ ≥ 0} is also contained in C.
Definition 4.5 (Convex conic hull). The convex conic hull of a set Y ⊆ Rn is
the smallest convex cone that contains Y , and is denoted by ccone(Y ).
Definition 4.6 (Convex and conic combinations). Given z₁, …, z_k ∈ R^n and λ₁, …, λ_k ∈ R, we say that the linear combination λ₁z₁ + · · · + λ_k z_k is a convex combination if λ₁, …, λ_k ≥ 0 and λ₁ + · · · + λ_k = 1, and a conic combination if λ₁, …, λ_k ≥ 0.

Every nonempty polyhedron P = P^≤(A, b) = conv(X) + ccone(Y) satisfies

char(P) = P^≤(A, 0) = ccone(Y).
Theorem 4.14. For every rational polyhedron P there exist finite sets X̃, Ỹ ⊆
Zn such that
P ∩ Zn = X̃ + Λ+ (Ỹ ).
If P ∩ Zn 6= ∅, we have
char(P ) = ccone(Ỹ ).
Moreover, if P = P ≤ (A, b) for rational A, b, then X̃, Ỹ can be chosen such that
the encoding length of every vector in X̃ ∪ Ỹ is polynomially bounded in the
encoding lengths of A, b.
Proof. By Theorem 4.8 there exist finite sets X, Y ⊆ Q^n such that the encoding length of every vector in X ∪ Y is polynomially bounded in ⟨A, b⟩. For every y ∈ Y let α(y) ∈ Z_{≥1} denote the least common multiple of all denominators of the entries in y. Setting

Ỹ := {α(y)y : y ∈ Y},

we see that ccone(Ỹ) = ccone(Y) = char(P) and that the encoding length of every vector in Ỹ is polynomially bounded in ⟨A, b⟩. Let us also define

X̃ := (conv(X) + Q) ∩ Z^n,
where
Q := { Σ_{y∈Y′} λ_y y : Y′ ⊆ Ỹ, |Y′| ≤ n, 0 ≤ λ_y ≤ 1 for all y ∈ Y′ }.
First, note that conv(X) + Q is a bounded set and hence X̃ is finite. Second, every point z ∈ X̃ is integer and satisfies

‖z‖_∞ ≤ max{‖x‖_∞ : x ∈ X} + n · max{‖y‖_∞ : y ∈ Ỹ}.

Thus, the encoding length of every point in X̃ is also polynomially bounded in ⟨A, b⟩.
It remains to show that P ∩Zn = X̃ +Λ+ (Ỹ ) holds. To this end, first observe
that Q ⊆ char(P ), implying conv(X) + Q ⊆ P and hence X̃ ⊆ P . Thus, we
obtain
X̃ + Λ+ (Ỹ ) ⊆ P + Λ+ (Ỹ ) ⊆ P + ccone(Ỹ ) = P + char(P ) = P.
For the reverse inclusion, let z ∈ P ∩ Zn . Using
P = conv(X) + ccone(Y ) = conv(X) + ccone(Ỹ )
we see that there exist x ∈ conv(X) and v ∈ ccone(Ỹ ) with z = x + v. By
Carathéodory's theorem, there is a set Y′ ⊆ Ỹ with |Y′| ≤ n together with λ_y ≥ 0 for all y ∈ Y′ such that

v = Σ_{y∈Y′} λ_y y.
Writing

z = x + v = x + Σ_{y∈Y′} (λ_y − ⌊λ_y⌋) y + Σ_{y∈Y′} ⌊λ_y⌋ y =: x + r + ṽ,
Proof. We have already discussed that the decision version of integer program-
ming is NP-hard, and hence it remains to show that it is in NP. Suppose we are
given A ∈ Qm×n , b ∈ Qm that corresponds to a yes-instance, i.e., P := P ≤ (A, b)
contains an integer point. By Theorem 4.14 there exist finite sets X, Y ⊆ Zn
such that
P ∩ Zn = X + Λ+ (Y )
and the encoding length of every vector in X ∪ Y is polynomially bounded
in the encoding lengths of A, b. Thus, we may pick any vector x ∈ X as a
certificate showing that P contains an integer point: x itself is such a point, has
polynomially bounded encoding length, and it can be efficiently checked that x
satisfies all constraints defining P .
Definition 4.16 (Hilbert basis). A Hilbert basis of a set C ⊆ Rn is a finite set
H ⊆ Zn such that C ∩ Zn = Λ+ (H).
C ∩ Zn = X + Λ+ (Y ).
In the previous part, we have seen that integer points in rational polyhedra can
be nicely parametrized. This allowed us to see that if a rational polyhedron
contains an integer point, we will also find one with small encoding length,
showing that the decision version of integer programming is in NP. Another
important consequence is that the convex hull of all integer points in a rational
polyhedron is again a rational polyhedron.
Before we prove the above statement, let us mention that if P is not rational, then PI need not be a polyhedron (see the exercises). Of course, the first part of the above statement becomes trivial if P is a polytope, in which case PI is the convex hull of a finite number of integer points, and hence it is clear that PI is a rational polytope. In the proof of Theorem 4.19 we make use of the following lemma.
Lemma 4.20. For all finite sets X, Y ⊆ Rn we have
where the latter is just a linear program. However, how do we find a feasible
integer point attaining the optimum value? To understand this, let us recall
some basic facts about faces of polyhedra.
Definition 4.21. For a ∈ Rn , β ∈ R, we set H ≤ (a, β) := {x ∈ Rn : a| x ≤ β}
and H = (a, β) := {x ∈ Rn : a| x = β}.
Definition 4.22 (Face). A face of a polyhedron P ⊆ Rn is a set of the form
F = P ∩ H = (a, β), where a ∈ Rn and β ∈ R are such that P ⊆ H ≤ (a, β). We
say that F is defined by the inequality a| x ≤ β.
Proposition 4.23. Let F be a face of a polyhedron P = P^≤(A, b) = conv(X) + ccone(Y), where A ∈ R^{m×n}, b ∈ R^m, and X, Y ⊆ R^n are finite.
1. F is a polyhedron.
yields

F = {x ∈ P : A_{i,∗} x = b_i for all i ∈ eq(F)}

and F = P ∩ H^=(a, β), where

a = Σ_{i∈eq(F)} A_{i,∗}  and  β = Σ_{i∈eq(F)} b_i.
Chapter 5

Integer polyhedra
5.1 Basics
This part is devoted to the study of situations in which the polyhedron P that
we are given is already equal to its integer hull, i.e., P = PI . While this is
clearly not satisfied in general, we will see that this special situation still occurs
in several interesting settings. Moreover, we will derive techniques to see that
P = PI indeed holds in these cases.
Definition 5.1 (Integer polyhedra). We say that a polyhedron P is integer if
it satisfies P = PI .
Theorem 5.2. A rational polyhedron P is integer if and only if every minimal
face of P contains at least one integer point. In particular, pointed rational
polyhedra (and all polytopes) are integer if and only if all their vertices are
integer.
Proof. If P is integer, then Theorem 4.31 shows that every nonempty face con-
tains an integer point.
Conversely, suppose that every minimal face of P contains an integer point.
Let X ⊆ P ∩ Zn contain an integer point from every minimal face of P . By
Remark 4.30, we obtain P = conv(X) + char(P ). If PI = ∅, then X = ∅ and
hence P = ∅. Otherwise, we have char(PI ) = char(P ) by Theorem 4.19, and
hence
P ⊆ conv(X) + char(PI ) ⊆ PI + char(PI ) = PI .
If A is totally unimodular, then so is each of the following matrices: A^T and −A; the matrices (A −A), (A I), (A −I) obtained by appending columns; and the matrices obtained by stacking A on top of −A, I, or −I.
P₊^≤(A, b) := {x ∈ R^n : Ax ≤ b, x ≥ 0}
P^=(A, b) := {x ∈ R^n : Ax = b}
P₊^=(A, b) := {x ∈ R^n : Ax = b, x ≥ 0}
[ℓ, u] := {x ∈ R^n : ℓ ≤ x ≤ u}
max{c^T x : Ax ≤ b, x ∈ Z^n} = min{b^T y : A^T y = c, y ≥ 0, y ∈ Z^m}
max{c^T x : Ax ≤ b} = min{b^T y : A^T y = c, y ≥ 0, y ∈ Z^m}.
We stress that the above criterion must be met for every subset J ⊆ [n].
Since A is totally unimodular if and only if A| is totally unimodular, the row
version of the statement holds as well:
Theorem 5.10 (Ghouila-Houri, 1962 (row version)). A matrix A ∈ {−1, 0, +1}^{m×n} is totally unimodular if and only if for every subset I ⊆ [m] there exists a partition I = I₊ ∪ I₋, I₊ ∩ I₋ = ∅ with

Σ_{i∈I₊} A_{i,∗} − Σ_{i∈I₋} A_{i,∗} ∈ {−1, 0, +1}^n.
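For small matrices, total unimodularity can also be checked by brute force directly from the definition (every square submatrix has determinant in {−1, 0, 1}); a sketch, exponential in the matrix size and only meant for experimenting with small examples:

    from itertools import combinations

    def det(M):
        # integer determinant by Laplace expansion (fine for tiny matrices)
        if len(M) == 1:
            return M[0][0]
        return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
                   for c in range(len(M)))

    def is_totally_unimodular(A):
        m, n = len(A), len(A[0])
        for k in range(1, min(m, n) + 1):
            for rows in combinations(range(m), k):
                for cols in combinations(range(n), k):
                    if det([[A[i][j] for j in cols] for i in rows]) not in (-1, 0, 1):
                        return False
        return True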
and hence

A_{∗,J}(2z − 1) ∈ {−1, 0, 1}^m.

We claim that

Ay ∈ {e_n, −e_n}   (5.1)

holds. Note that this implies (A^{−1})_{∗,n} ∈ {y, −y} ⊆ {−1, 0, +1}^n, as claimed.
Σ_{(v,w)∈A : v∈S*, w∉S*} c_{(v,w)}.
However, this does not work for nonbipartite graphs. For instance, if G is a cycle on three nodes with edge weights c ≡ 1, then the above LP has an optimal value of 3/2, whereas the maximum cardinality of a matching is only 1.
In fact, for nonbipartite graphs more inequalities are needed to describe their
matching polytope.
Another application of Corollary 5.18 is the following well-known result con-
cerning the relation between matchings and vertex covers in bipartite graphs.
Recall that a set of nodes C ⊆ V is a vertex cover of an undirected graph
G = (V, E) if every edge in E is incident to at least one node in C.
= max{1^T x : I(G)x ≤ 1, x ≥ 0, x ∈ Z^E}

α = min{1^T y : I(G)^T y ≥ 1, y ≥ 0, y ∈ Z^V}
= min{ Σ_{v∈V} y_v : y_v + y_w ≥ 1 for all {v, w} ∈ E, y ≥ 0, y ∈ Z^V }
= min{ Σ_{v∈V} y_v : y_v + y_w ≥ 1 for all {v, w} ∈ E, y ∈ {0, 1}^V },
F ∩ Z^n = ∅,

{x ∈ Z^n : c^T x = δ} = ∅.   (5.4)

and hence x ∈ H^≤(c, δ). Moreover, since y > 0, we see that c^T x = δ holds if and only if A_{i,∗}x = b_i for all i ∈ I = eq(F). In other words, we have x ∈ H^=(c, δ) if and only if x ∈ F.
Proof of Thm. 5.21. Let P be a rational polyhedron. Suppose first that P is integer and c ∈ Z^n with

δ := max{c^T x : x ∈ P} ∉ {−∞, +∞}.

Then the face

F := {x ∈ P : c^T x = δ} ≠ ∅

contains an integer point z (by Theorem 4.31, every nonempty face of an integer polyhedron does), and hence

δ = c^T z ∈ Z.

Suppose now that P is not integer. Again by Theorem 5.2, there exists a minimal face F of P with F ∩ Z^n = ∅. By the previous lemma, F is defined by a rational linear inequality c^T x ≤ δ with

{x ∈ Z^n : c^T x = δ} = ∅

and

δ = max{c^T x : x ∈ P}.
We have seen that total unimodularity is a useful notion to explain the inte-
grality of certain polyhedra. However, many integer polytopes appearing in
combinatorial optimization are not described by totally unimodular matrices.
In this section, we study the more general notion of total dual integrality (TDI),
which turns out to be applicable to all integer polyhedra and can be seen as a
unifying tool to explain many combinatorial min-max theorems (such as max-
flow-min-cut or König’s theorem discussed earlier). In fact, many such theorems
can be phrased as “a certain linear program and its dual each have integer op-
timal solutions”.
for all c ∈ Z^n,

max{c^T x : Ax ≤ b, x ∈ Z^n} = min{b^T y : A^T y = c, y ≥ 0, y ∈ Z^m}.

max{c^T x : Ax ≤ b, x ∈ Z^n} = max{c^T x : Ax ≤ b, x ∈ R^n}
  = min{b^T y : A^T y = c, y ≥ 0, y ∈ R^m}
  = min{b^T y : A^T y = c, y ≥ 0, y ∈ Z^m}

holds, where the equalities follow from the integrality of P, LP duality, and total dual integrality, respectively.
Remark 5.26. Let Ax ≤ b be totally dual integral and σ ∈ Z_{≥1}. In general, σA·x ≤ σb is not totally dual integral. However, (1/σ)A·x ≤ (1/σ)b is totally dual integral.
Remark 5.27. For every rational system Ax ≤ b there is a rational number
τ > 0 such that τ A · x ≤ τ b is totally dual integral.
Proof. Let η₁ ∈ Z_{≥1} be such that η₁A is integer and let η₂ denote a common integer multiple of all subdeterminants of η₁A. Define τ := η₁/η₂.

Let c ∈ Z^n be such that max{c^T x : τAx ≤ τb} ∉ {−∞, ∞}. Consider an optimal vertex solution y* of

min{τ b^T y : τ A^T y = c, y ≥ 0},

which corresponds to an optimal vertex solution of

min{η₁ b^T z : η₁ A^T z = c, z ≥ 0}.
F ⊆ arg max{c^T x : x ∈ P}.

and hence

c = A^T y* = Σ_{i∈eq(F)} y*_i A_{i,∗} ∈ Λ₊({A_{i,∗} : i ∈ eq(F)}).
For the converse direction of the proof, assume now that for every minimal face F of P^≤(A, b), the vectors A_{i,∗}, i ∈ eq(F), form a Hilbert basis of ccone({A_{i,∗} : i ∈ eq(F)}). Let c ∈ Z^n with η := max{c^T x : x ∈ P^≤(A, b)} < ∞, let

F* ⊆ {x ∈ P^≤(A, b) : c^T x = η} ≠ ∅

be a minimal face of P^≤(A, b), and pick any point x* ∈ F*. By LP duality, there is a vector y* ∈ R^m_{≥0} with A^T y* = c. Note that A_{i,∗}x* < b_i for all i ∉ eq(F*) and hence, by complementary slackness, we even have

y*_i = 0 for all i ∉ eq(F*). In particular, c = A^T y* ∈ ccone({A_{i,∗} : i ∈ eq(F*)}) ∩ Z^n,
and hence there exist λ_i ∈ Z_{≥0}, i ∈ eq(F*), with c = Σ_{i∈eq(F*)} λ_i A_{i,∗}. Filling up with zeros, this means that there is a vector λ ∈ Z^m_{≥0} with
A^T λ = c

and

b^T λ = Σ_{i∈eq(F*)} b_i λ_i = Σ_{i∈eq(F*)} (A_{i,∗}x*) λ_i = (Σ_{i∈eq(F*)} λ_i A_{i,∗}) x* = c^T x*,
Corollary 5.30. The rows of A ∈ Zm×n form a Hilbert basis of ccone({A1,∗ , . . . , Am,∗ })
if and only if Ax ≤ 0 is totally dual integral.
Proof. The claim follows immediately from Theorem 5.28 by observing that
lineal(P ≤ (A, 0)) = kern(A) is the only minimal face of the polyhedral cone
P ≤ (A, 0).
N_F(P) ∩ Z^n = Λ₊(H(F))

and

N_F(P) = ccone(H(F)).
For every F ∈ F, pick any point x(F) ∈ F with x(F) ∈ Z^n if F ∩ Z^n ≠ ∅. We claim that the system Ax ≤ b defined by

max{a^T x : x ∈ P} = a^T x(F)

and hence

a^T x ≤ a^T x(F)
Chapter 6

Chvátal-Gomory cuts
In the previous chapter, we have studied the idealistic case of integer programs
defined by an integer polyhedron, in which case the integer program reduces to a
linear program. In practice, however, we are usually confronted with polyhedra
that are not integer and whose integer hulls are difficult to compute. One
very effective technique for dealing with such integer programs is to refine the
formulation by (iteratively) adding so-called cuts. Consider an IP
max{c^T x : Ax ≤ b, x ∈ Z^n}.

max{c^T x : Ax ≤ b}

P := P^≤(A, b).

a^T x* = β = max{a^T x : x ∈ P}.

a ∈ Z^n
is still valid for PI and hence we may add it to our integer program without
changing the feasible region (of integer points). Note also that the inequal-
ity (6.1) is violated by x∗ and hence it is “cut off”. In view of this, the inequal-
ity (6.1) is called a cut. More specifically, cuts that are derived in the above way
are called Chvátal-Gomory cuts, named after Vašek Chvátal and Ralph Gomory.
During this course, we will discuss several ways to derive cuts and how to deal
with them algorithmically. Before we do this, let us have a closer look at cuts
that are derived in the above way.
Let us start with the following summary.
Observation 6.1. Let P ⊆ R^n be a polyhedron and a ∈ Z^n, β ∈ R with P ⊆ H^≤(a, β). Then we have

P_I ⊆ (H^≤(a, β))_I ⊆ H^≤(a, ⌊β⌋) ⊆ H^≤(a, β).
Proof. The inclusion ⊆ is clear. For the reverse inclusion, note that we have

(H^≤(a, ⌊β⌋))_I ⊆ (H^≤(a, β))_I

and hence it remains to show that H^≤(a, ⌊β⌋) is integer. By Theorem 5.2 it suffices to show that every minimal face of H^≤(a, ⌊β⌋) contains an integer point. However, the only minimal face is F := H^=(a, ⌊β⌋). Recall that

Λ({a₁, …, a_n}) = gcd(a₁, …, a_n)·Z = Z

and hence there is some x ∈ Z^n with a^T x = ⌊β⌋. In particular, we have x ∈ F ∩ Z^n.
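In computational terms, a Chvátal-Gomory cut is obtained from nonnegative multipliers y for which a := A^T y is integral: the cut is a^T x ≤ ⌊y^T b⌋. A minimal sketch:

    import math
    from fractions import Fraction

    def chvatal_gomory_cut(A, b, y):
        # assumes y >= 0 componentwise and that a = A^T y is integral
        m, n = len(A), len(A[0])
        a = [sum(Fraction(y[i]) * A[i][j] for i in range(m)) for j in range(n)]
        assert all(coef.denominator == 1 for coef in a), "A^T y must be integral"
        beta = sum(Fraction(y[i]) * b[i] for i in range(m))
        return [int(coef) for coef in a], math.floor(beta)  # cut: a^T x <= floor(y^T b)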
Definition 6.3. For n ∈ Z_{≥1} let

H_n := { H^≤(a, β) : a ∈ Z^n \ {0}, β ∈ Q }.
With this definition, several questions arise. It is clear that the Chvátal-
Gomory closure is a closed convex set, but is it also a polyhedron? How does
it relate to the integer hull? Is it easy to compute it? Let us address these
questions step by step.
Remark 6.5. For every rational polyhedron P we have PI ⊆ P 0 ⊆ P . If P1 , P2
are two polyhedra with P1 ⊆ P2 , then P10 ⊆ P20 .
Theorem 6.6. If Ax ≤ b is totally dual integral with A ∈ Z^{m×n} and b ∈ Q^m, then

(P^≤(A, b))′ = P^≤(A, ⌊b⌋),

where ⌊b⌋ = (⌊b₁⌋, …, ⌊b_m⌋).
Proof. The inclusion ⊆ is clear. It remains to show that for every c ∈ Z^n and δ ∈ Q with P^≤(A, b) ⊆ H^≤(c, δ) we also have P^≤(A, ⌊b⌋) ⊆ H^≤(c, ⌊δ⌋). We may assume that P^≤(A, b) ≠ ∅, otherwise we have (P^≤(A, b))′ = ∅ = P^≤(A, ⌊b⌋). Since Ax ≤ b is totally dual integral, there is some y ∈ Z^m_{≥0} with A^T y = c and
As we will see in the exercises, the Chvátal-Gomory closure may be still far
from the integer hull. However, nothing prevents us from applying the closure
multiple times. Clearly, the hope is that we reach the integer hull at some point.
Fortunately, this will be the case.
Definition 6.8. For a polyhedron P let

P^(0) := P

and

P^(t) := (P^(t−1))′
Theorem 6.10. For every rational polyhedron P there exists some t ∈ Z≥0
such that PI = P (t) .
F (t) = P (t) ∩ F
where the second equality follows from the induction hypothesis for t − 1, and
the fourth equality from the induction hypothesis for t = 1.
To show the claim for t = 1, let us pick a TDI system Ax ≤ b with

P′ = P^≤(A, ⌊b⌋).

F = {x ∈ P : a^T x = β},  a ∈ Z^n, β ∈ Z.

Ax ≤ b,  a^T x ≤ β,  −a^T x ≤ −β   (6.2)

is totally dual integral. Note that this yields (again using Theorem 6.6)

To show that (6.2) is totally dual integral, we use Theorem 5.28: We show that for every face G ≠ ∅ of F the set

a ∈ ccone(A),

and hence

A ∪ {a}

which means

x = x̃ + λ(−a) ∈ Λ₊(A ∪ {−a, a}).
Lemma 6.12. For every rational polyhedron P ⊆ R^n and t ∈ Z_{≥0} we have:

1. If U ∈ Z^{n×n} is totally unimodular, then P^(t) = P_I ⇐⇒ (UP)^(t) = (UP)_I.

2. If y ∈ Z^n, then P^(t) = P_I ⇐⇒ (y + P)^(t) = (y + P)_I.

3. If P = {0_k} × Q for a polyhedron Q ⊆ R^{n−k}, then P^(t) = P_I ⇐⇒ Q^(t) = Q_I.

These statements easily follow from the definitions of the integer hull and the Chvátal-Gomory closure, respectively.
Proof of Theorem 6.10. We prove the claim by induction on d = dim(P). If d = −1, then P = ∅ and P^(0) = ∅ = P_I. For d = 0, we have P = {v} for some v ∈ Q^n. If v ∈ Z^n, then P^(0) = {v} = P_I. Otherwise, there is some i ∈ [n] with v_i ∉ Z. The (contradicting) inequalities x_i ≤ ⌊v_i⌋ and −x_i ≤ ⌊−v_i⌋ are Chvátal-Gomory cuts for P, and hence P^(1) = ∅ = P_I.

Now let d ≥ 1 and suppose first that aff(P) contains no integer point. Then there is a rational equation

c^T x = δ with c ∈ Z^n, δ ∈ Q \ Z

that is valid for aff(P), and hence also for P. The contradicting inequalities c^T x ≤ ⌊δ⌋ and −c^T x ≤ ⌊−δ⌋ then show P^(1) = ∅ = P_I.
aff(U^{−1}P) = U^{−1} aff(P) = {U^{−1}x : Cx = 0, x ∈ R^n}
  = {y ∈ R^n : CUy = 0}
  = {y ∈ R^n : (B 0)y = 0}
  = {0_{n−d}} × R^d.

Thus,

U^{−1}P = {0_{n−d}} × Q
for some rational polyhedron Q ⊆ Rd with dim(Q) = d. By Lemma 6.12.1
and 6.12.3 we have P (t) = PI if and only if Q(t) = QI , and hence we indeed
may assume d = n.
Since P_I is a rational polyhedron, it suffices to show that for every c ∈ Z^n, δ ∈ Z with P_I ⊆ H^≤(c, δ) there is some r ∈ Z_{≥0} such that

P^(r) ⊆ H^≤(c, δ).   (6.3)

If P_I = ∅, it suffices to consider c = 0 and δ = −1. Otherwise, we have char(P_I) = char(P) and hence we may always assume sup{c^T x : x ∈ P} < ∞. In particular, we see that

γ := min{δ′ ∈ Z : δ′ ≥ δ, P^(r) ⊆ H^≤(c, δ′) for some r}

always exists. Let r be such that

P^(r) ⊆ H^≤(c, γ)

and consider the face

F := P^(r) ∩ H^=(c, γ)
of P^(r). If F ∩ Z^n ≠ ∅, then γ = δ and (6.3) is satisfied. Otherwise, we have F_I = ∅ and since dim(F) < n = d the induction hypothesis asserts that there is some s ∈ Z_{≥0} with

∅ = F_I = F^(s) = (P^(r))^(s) ∩ F = P^(r+s) ∩ F,
where the third equality is due to Lemma 6.11 and since F is a face of P^(r). Thus, we have c^T x < γ for all x ∈ P^(r+s). Since P^(r+s) is a polyhedron, there exists some γ′ ∈ Q, γ′ < γ such that c^T x ≤ γ′ for all x ∈ P^(r+s). This means that

P^(r+s+1) ⊆ H^≤(c, γ − 1).

By the minimality of γ, we must have γ − 1 < δ. Since γ, δ ∈ Z this implies γ ≤ δ and hence

P^(r) ⊆ H^≤(c, γ) ⊆ H^≤(c, δ).
We remark that the above theory does not extend to non-rational polyhedra. For example, if

P = {(x, y) ∈ R² : y ≤ √2·x, x ≥ 0},

we see that

P = P^(0) = P^(1) = P^(2) = · · · ⊋ P_I.

(Recall that in this case P_I is not even a polyhedron.) For

Q = {(x, y) ∈ R² : y ≤ √2·x, x ≥ 1}

we have

Q = Q^(0) ⊋ Q^(1) ⊋ Q^(2) ⊋ Q^(3) ⊋ · · · .

However, Dadush, Dey, Vielma (2013) showed that the Chvátal-Gomory closure of any compact convex set is a rational polytope, and hence our theory also applies to such sets.
For a positive example, let us recall that the matching polytope of a bipartite graph G = (V, E) is described by

P(G) = {x ∈ R^E_{≥0} : x(δ(v)) ≤ 1 for all v ∈ V}.

Here, for a vector x ∈ R^E and a set F ⊆ E we use the notation x(F) := Σ_{e∈F} x_e. In the exercises we have seen that this description is unfortunately not sufficient in the case that G is nonbipartite. Observe that every vector x ∈ P(G) satisfies

x(E(U)) ≤ |U|/2

for every subset U ⊆ V. This means that the inequalities

x(E(U)) ≤ (|U| − 1)/2 for all U ⊆ V, |U| odd   (6.4)

are valid for P(G)′. As shown in a seminal paper by Jack Edmonds, it turns out that adding these inequalities is sufficient for obtaining the matching polytope, i.e., the matching polytope of a general graph G = (V, E) is given by

{x ∈ R^E_{≥0} : x(δ(v)) ≤ 1 for all v ∈ V, x(E(U)) ≤ (|U| − 1)/2 for all U ⊆ V, |U| odd}.
This example is very exceptional. For most polytopes that arise in combinatorial optimization we have to apply several rounds of the Chvátal-Gomory closure to reach the integer hull. An interesting result yielding a bound on the number of such iterations is due to Eisenbrand & Schulz (2003): For every rational
of such iterations is due to Eisenbrand & Schulz (2003): For every rational
polytope P ⊆ [0, 1]n , we have P (r) = PI for some r = O(n2 log n). Moreover,
Rothvoß and Sanità (2016) showed that some of these polytopes need r = Ω(n2 )
rounds.
To conclude this section, let us observe that Chvátal-Gomory cuts yield a
nice proof system:
Corollary 6.13 (Chvátal-Gomory proof system). Let Ax ≤ b be any rational system of linear inequalities and let c^T x ≤ δ be any rational inequality that is valid for X := P^≤(A, b) ∩ Z^n. Then (a rescaling of) c^T x ≤ δ can be obtained by a finite number of steps of the following procedure:
Chapter 7

Split cuts
In the previous chapter we have seen that Chvátal-Gomory cuts provide a systematic way to derive nontrivial inequalities that are valid for the integer hull of a polyhedron P ⊆ R^n. In particular, we have seen that it is easy to derive Chvátal-Gomory cuts that separate a given fractional vertex x* of P from P_I. (Finding an inequality separating an arbitrary point in P \ P_I from P_I is a much harder task.) Such routines are central components of modern integer programming solvers since most cuts are in fact generated to separate fractional vertices, and Chvátal-Gomory cuts turn out to be a good starting point. However, in the last decades a large number of different families of cuts have been studied, and many of them are implemented in modern solvers. A nice example to illustrate other ways cuts can be derived is the family of split cuts, which are generalizations of Chvátal-Gomory cuts.
The idea behind split cuts is based on the following simple "disjunction": If π ∈ Z^n and π₀ ∈ Z, then every integer point z ∈ Z^n satisfies π^T z ≤ π₀ or π^T z ≥ π₀ + 1. In particular, every integer point in P is contained in P ∩ H^≤(π, π₀) or P ∩ H^≥(π, π₀ + 1). This motivates the following definition.

Definition 7.1 (Split). For a polyhedron P ⊆ R^n and (π, π₀) ∈ Z^n × Z, we define

split(P, π, π₀) := conv( (P ∩ H^≤(π, π₀)) ∪ (P ∩ H^≥(π, π₀ + 1)) ),

and by the above discussion we immediately obtain

P_I ⊆ split(P, π, π₀).
Note that for every fractional vertex x*, we can easily find a pair (π, π₀) ∈ Z^n × Z such that split(P, π, π₀) does not contain x*:

Observation 7.3. Let x* be a fractional vertex of a polyhedron P ⊆ R^n, i.e., x*_j ∉ Z for some j ∈ [n]. Setting π := e_j and π₀ := ⌊x*_j⌋ we have

x* ∉ split(P, π, π₀).
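Finding such a pair is a one-liner; a small sketch:

    import math

    def split_from_fractional_vertex(x_star):
        # Observation 7.3: pi = e_j, pi0 = floor(x*_j) for a fractional coordinate j
        for j, v in enumerate(x_star):
            if v != math.floor(v):
                pi = [0] * len(x_star)
                pi[j] = 1
                return pi, math.floor(v)
        return None  # x* is integral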
Let us try to get a better idea of split(P, π, π₀). To this end, first recall the definition of a Chvátal-Gomory cut π^T x ≤ π₀, which arises from an inequality π^T x ≤ β that is valid for P by setting π₀ = ⌊β⌋. In this case, we see that P ∩ H^≥(π, π₀ + 1) = ∅ and hence split(P, π, π₀) = P ∩ H^≤(π, π₀).
In general, however, for (π, π0) ∈ Z^n × Z, split(P, π, π0) is the convex hull of the union of the two (possibly both nonempty) polyhedra P ∩ H^≤(π, π0) and P ∩ H^≥(π, π0 + 1). We will see that split(P, π, π0) is still a polyhedron. Thus, if x* ∉ split(P, π, π0), then there must be a linear inequality valid for split(P, π, π0) but violated by x*. Such an inequality is called a split cut.
Unfortunately, the polyhedron split(P, π, π0 ) is typically more complicated
than P : For instance, the number of facets of split(P, π, π0 ) may be exponential
in the number of facets of P and hence determining an inequality description of
split(P, π, π0 ) may not be practical. However, split(P, π, π0 ) has the following
interesting description.

Theorem 7.4. Let P = P^≤(A, b) ⊆ R^n and (π, π0) ∈ Z^n × Z such that P ∩ H^≤(π, π0) and P ∩ H^≥(π, π0 + 1) are both nonempty. Then split(P, π, π0) is equal to the set of all x ∈ R^n for which there exist x^(0), x^(1) ∈ R^n and λ0, λ1 ∈ R with
x = x^(0) + x^(1),  λ0 + λ1 = 1,  λ0, λ1 ≥ 0,
Ax^(0) ≤ λ0 b,  π^⊺x^(0) ≤ λ0 π0,    (7.1)
Ax^(1) ≤ λ1 b,  π^⊺x^(1) ≥ λ1 (π0 + 1).

Proof. To see the inclusion “⊆”, let x ∈ split(P, π, π0). Note that we can write
x = λ0 y^(0) + λ1 y^(1),
where
y^(0) ∈ P ∩ H^≤(π, π0) and y^(1) ∈ P ∩ H^≥(π, π0 + 1),
and
λ0 + λ1 = 1, λ0, λ1 ≥ 0.
Setting
x^(i) := λi y^(i)
for i = 0, 1, we see that (x, x^(0), x^(1), λ0, λ1) satisfies all constraints defining the set on the right-hand side of (7.1).
For the inclusion “⊇”, let (x, x^(0), x^(1), λ0, λ1) satisfy the constraints on the right-hand side of (7.1). Suppose first that λ0 = 0, in which case λ1 = 1. Then Ax^(0) ≤ 0 and π^⊺x^(0) ≤ 0, i.e., x^(0) is a recession direction of the nonempty polyhedron P ∩ H^≤(π, π0). Fixing any y ∈ P ∩ H^≤(π, π0), the points (1 − 1/t)x^(1) + (1/t)(y + t·x^(0)) belong to split(P, π, π0) for all t ≥ 1 and converge to x^(1) + x^(0) = x, so x ∈ split(P, π, π0) since polyhedra are closed. The case λ1 = 0 is symmetric. If λ0, λ1 > 0, then y^(i) := x^(i)/λi satisfies y^(0) ∈ P ∩ H^≤(π, π0) and y^(1) ∈ P ∩ H^≥(π, π0 + 1), and hence x = λ0 y^(0) + λ1 y^(1) ∈ split(P, π, π0).
This description will help us to compute split cuts explicitly. Before we do this, let us mention that the above theorem yields a way to
express split(P, π, π0 ) as the projection of a higher-dimensional polyhedron that
may have significantly fewer facets than split(P, π, π0 ) itself. Such a description
is called an extended formulation and it turns out that many polytopes that
arise in discrete optimization can be described by extended formulations that
are significantly smaller than descriptions in the original space.
Definition 7.5 (Split closure). The split closure of a rational polyhedron P ⊆ R^n is defined as
scl(P) := ⋂_{(π,π0) ∈ Z^n × Z} split(P, π, π0).

Remark 7.6. For every rational polyhedron P, the set scl(P) is again a rational polyhedron and we have
P_I ⊆ scl(P) ⊆ P′ ⊆ P.

Proof. See Section 5.1 in the book Integer Programming by Conforti, Cornuéjols, Zambelli.

Setting scl^(0)(P) := P, we iterate the split closure via
scl^(t)(P) := scl(scl^(t−1)(P))
for t = 1, 2, . . . .
Corollary 7.9. For every rational polyhedron P there is some t ∈ Z_≥0 with scl^(t)(P) = P_I.
Proof. Follows directly from Remark 7.6 and Theorem 6.10.
Theorem 7.10. Let P = P^≤(A, b) with A ∈ Q^{m×n}, b ∈ Q^m, let (π, π0) ∈ Z^n × Z, and let x* ∉ split(P, π, π0). Then there exist
β, δ ∈ R^m_≥0 and γ, ε ≥ 0
such that
A^⊺β + γπ = A^⊺δ − επ =: α
and
α^⊺x* > max{b^⊺β + π0 γ, b^⊺δ − (π0 + 1)ε} =: ζ.
Moreover, the inequality α^⊺x ≤ ζ is valid for all x ∈ split(P, π, π0).
Proof. By Theorem 7.4, x* ∈ split(P, π, π0) if and only if the system
Ax^(0) ≤ λ0 b,  π^⊺x^(0) ≤ λ0 π0,
Ax^(1) ≤ λ1 b,  π^⊺x^(1) ≥ λ1(π0 + 1),
x^(0) + x^(1) = x*,  λ0 + λ1 = 1,  λ0, λ1 ≥ 0
has a solution. If the system is infeasible, by Farkas’ lemma, there exist multipliers
α ∈ R^n, β ∈ R^m_≥0, γ ≥ 0, δ ∈ R^m_≥0, ε ≥ 0, ζ ∈ R, η ≥ 0, θ ≥ 0
such that
−α + A^⊺β + γπ = 0
−α + A^⊺δ − επ = 0
−b^⊺β − π0 γ + ζ − η = 0
−b^⊺δ + (π0 + 1)ε + ζ − θ = 0
(−x*)^⊺α + ζ < 0.
The first two equations yield α = A^⊺β + γπ = A^⊺δ − επ, the third and fourth yield ζ ≥ b^⊺β + π0γ and ζ ≥ b^⊺δ − (π0 + 1)ε, and the last one yields α^⊺x* > ζ. Validity of α^⊺x ≤ max{b^⊺β + π0γ, b^⊺δ − (π0 + 1)ε} on P ∩ H^≤(π, π0) and on P ∩ H^≥(π, π0 + 1), and hence on split(P, π, π0), follows directly by multiplying the defining inequalities with the respective multipliers.
We remark that the above method only works if we know a pair (π, π0 )
such that the given point x∗ is not contained in split(P, π, π0 ). While it is easy
to identify such a pair (π, π0 ) for a fractional vertex, it is NP-hard to decide
whether a general point x∗ is contained in scl(P ).
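To make the computation of split cuts concrete, the following sketch sets up the auxiliary LP with scipy's linprog: it searches for multipliers β, δ, γ, ε as in Theorem 7.10, maximizing the violation α^⊺x* − ζ. The normalization constraint bounding the multipliers is our own addition (needed to keep the LP bounded); everything else follows the theorem.

import numpy as np
from scipy.optimize import linprog

def split_cut(A, b, pi, pi0, x_star):
    """Search for a split cut alpha^T x <= zeta that is valid for
    split(P, pi, pi0) with P = {x : Ax <= b} and violated by x_star.
    Variables: beta, delta in R^m_{>=0}, gamma, eps >= 0, zeta free."""
    m, n = A.shape
    Z_m, Z_n = np.zeros(m), np.zeros(n)
    # objective: maximize alpha^T x* - zeta, where alpha = A^T beta + gamma*pi
    c = np.concatenate([-(A @ x_star), Z_m, [-(pi @ x_star)], [0.0], [1.0]])
    # equality: A^T beta + gamma*pi = A^T delta - eps*pi (the common value alpha)
    A_eq = np.hstack([A.T, -A.T, pi.reshape(-1, 1), pi.reshape(-1, 1),
                      Z_n.reshape(-1, 1)])
    # zeta dominates both right-hand sides; last row is our normalization
    A_ub = np.vstack([
        np.concatenate([b, Z_m, [pi0], [0.0], [-1.0]]),
        np.concatenate([Z_m, b, [0.0], [-(pi0 + 1)], [-1.0]]),
        np.concatenate([np.ones(m), np.ones(m), [1.0, 1.0, 0.0]]),
    ])
    res = linprog(c, A_ub=A_ub, b_ub=[0.0, 0.0, 1.0], A_eq=A_eq, b_eq=Z_n,
                  bounds=[(0, None)] * (2 * m + 2) + [(None, None)],
                  method="highs")
    if res.status != 0 or -res.fun <= 1e-9:
        return None                     # x_star lies in split(P, pi, pi0)
    beta, gamma = res.x[:m], res.x[2 * m]
    alpha = A.T @ beta + gamma * pi
    return alpha, res.x[-1]             # cut: alpha^T x <= zeta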
Recall that the above method requires solving an auxiliary LP, which can be
done in polynomial time but still requires some computational effort, of course.
If we solve the LP relaxation using the simplex method, there is an even more
efficient way for obtaining a split cut, which is based on the following fact.
Definition 7.12. For α ∈ R let f(α) := α − ⌊α⌋ denote the fractional part of α.
Theorem 7.13. Let a ∈ R^n, β ∈ R \ Z and
P = {x ∈ R^n : a^⊺x = β, x ≥ 0}.
Define π ∈ Z^n via
π_j := ⌊a_j⌋ if f(a_j) ≤ f(β),   π_j := ⌈a_j⌉ if f(a_j) > f(β),
and g ∈ R^n via
g_j := f(a_j)/f(β) if f(a_j) ≤ f(β),   g_j := (1 − f(a_j))/(1 − f(β)) if f(a_j) > f(β).
Then the inequality g^⊺x ≥ 1 is valid for split(P, π, π0), where π0 := ⌊β⌋.

Proof. Set J_≤ := {j ∈ [n] : f(a_j) ≤ f(β)} and J_> := {j ∈ [n] : f(a_j) > f(β)}, and consider
Π0 := {x ∈ P : π^⊺x ≤ π0},   Π1 := {x ∈ P : π^⊺x ≥ π0 + 1}.
We have to show that g^⊺x ≥ 1 holds for all x ∈ Π0 ∪ Π1. For all x ∈ P we have
β = a^⊺x = Σ_{j∈J_≤} a_j x_j + Σ_{j∈J_>} a_j x_j
= Σ_{j∈J_≤} (π_j + f(a_j)) x_j + Σ_{j∈J_>} (π_j + f(a_j) − 1) x_j
= π^⊺x + Σ_{j∈J_≤} f(a_j) x_j + Σ_{j∈J_>} (f(a_j) − 1) x_j.    (7.3)
Now let x ∈ Π0. Then π^⊺x ≤ π0 = ⌊β⌋, and (7.3) yields
Σ_{j∈J_≤} f(a_j) x_j + Σ_{j∈J_>} (f(a_j) − 1) x_j = β − π^⊺x ≥ β − ⌊β⌋ = f(β).
Dividing by f(β) > 0, we obtain
1 ≤ Σ_{j∈J_≤} (f(a_j)/f(β)) x_j + Σ_{j∈J_>} ((f(a_j) − 1)/f(β)) x_j ≤ g^⊺x,
where the last inequality uses x ≥ 0 together with f(a_j)/f(β) = g_j for j ∈ J_≤ and (f(a_j) − 1)/f(β) < 0 ≤ g_j for j ∈ J_>.
Similarly, for x ∈ Π1 we have π^⊺x ≥ π0 + 1, and (7.3) yields
Σ_{j∈J_≤} f(a_j) x_j + Σ_{j∈J_>} (f(a_j) − 1) x_j = β − π^⊺x ≤ β − ⌊β⌋ − 1 = f(β) − 1.
Dividing by f(β) − 1 < 0 (which flips the inequality), we obtain
1 ≤ Σ_{j∈J_≤} (f(a_j)/(f(β) − 1)) x_j + Σ_{j∈J_>} ((f(a_j) − 1)/(f(β) − 1)) x_j ≤ g^⊺x,
where the last inequality uses f(a_j)/(f(β) − 1) ≤ 0 ≤ g_j for j ∈ J_≤ and (f(a_j) − 1)/(f(β) − 1) = g_j for j ∈ J_>.
Cuts of the above form are called Gomory Mixed Integer Cuts, often abbre-
viated as GMI cuts. The term “Mixed Integer” refers to the setting where only
a subset of the variables is required to be integer, in which case the above cuts
can be easily generalized. GMI cuts are implemented in every modern integer
programming solver and are considered among the most effective cuts in practice. On the one hand, they empirically provide very strong cuts. On the other hand, they can be directly read off from a simplex tableau without solving an auxiliary LP:
To this end, recall that the simplex method is designed for LPs whose feasible
region is of the form
P = {x ∈ Rn : Ax = b, x ≥ 0},
and that a basic solution x* with basis B ⊆ [n] and nonbasic indices N := [n] \ B satisfies x*_j = 0 for all j ∈ N. Suppose that x*_i ∉ Z for some basic variable i ∈ B. The corresponding row of the simplex tableau is an equation of the form
Σ_j ā_j x_j = β with ā_i = 1, ā_j = 0 for all j ∈ B \ {i}, and β := x*_i ∉ Z,    (7.4)
which is satisfied by all feasible x. Applying Theorem 7.13 to this equation (together with x ≥ 0) yields the cut
g^⊺x ≥ 1
from Theorem 7.13. To see that this inequality is violated by x*, observe that every basic variable has an integer coefficient in (7.4). This implies g_j = 0 for all j ∈ B and hence
g^⊺x* = Σ_{j∈B} g_j x*_j + Σ_{j∈N} g_j x*_j = 0 < 1,
since g_j = 0 for all j ∈ B and x*_j = 0 for all j ∈ N.
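As a quick illustration (our sketch, not part of the notes), the cut of Theorem 7.13 can be computed directly from a tableau row:

import math

def gmi_cut(a, beta):
    """Gomory mixed-integer / split cut g^T x >= 1 from a row a^T x = beta,
    x >= 0, with fractional right-hand side beta (Theorem 7.13)."""
    f = lambda t: t - math.floor(t)          # fractional part
    f0 = f(beta)
    assert f0 > 1e-9, "right-hand side must be fractional"
    return [f(aj) / f0 if f(aj) <= f0 else (1 - f(aj)) / (1 - f0) for aj in a]

# Tableau row x1 + 0.3*x2 + 2.6*x3 = 1.5 (basic variable x1 with value 1.5):
print(gmi_cut([1.0, 0.3, 2.6], 1.5))         # [0.0, 0.6, 0.8]
# The basic variable gets coefficient 0, the nonbasic ones are 0 at x*,
# so g^T x* = 0 < 1: the cut separates x*.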
Chapter 8

Branch & Cut
8. L ← f(x)
9. P ← {Π′′ ∈ P : u(Π′′) > L}   (“prune”)

At any point during the execution, the current bounds satisfy
L ≤ OPT ≤ U.
Branching In practice, there are many degrees of freedom to tune this algo-
rithm. Recall that we have created subproblems by simple “variable branching”.
To obtain a balanced tree, it is an important question which variables to branch
on (which index i to choose) and there are many clever algorithms to estimate
the effect of a branching. Moreover, there is no reason to restrict ourselves
to simple variable branching. Recall the idea behind split cuts: Any vector
π ∈ Zn with π | x∗ ∈/ Z creates two natural subproblems that exclude x∗ . How-
ever, selecting a good vector π different from unit vectors is quite challenging
in practice: One issue is that the search tree may become very unbalanced.
As an example, suppose all variables are binary and we have decided to create subproblems using the inequalities Σ_{i∈I} x_i ≤ 0 and Σ_{i∈I} x_i ≥ 1. For the problem defined by Σ_{i∈I} x_i ≤ 0 we immediately know that all variables indexed by I have to be zero, and hence can be eliminated from the problem, reducing the problem size. For the problem defined by Σ_{i∈I} x_i ≥ 1 we only know that at least one of the variables indexed by I has to be one, but we don't know which one. Thus, the latter problem does not seem to correspond to a great restriction.
Solving relaxations  In practice, the LP relaxations of the subproblems are typically solved with the dual simplex method, which can be warm-started from the basis of the parent problem; sometimes interior point algorithms are used for this task. Moreover, presolving techniques that reduce the problem size in advance are an important ingredient.
Good relaxations As a final comment in this section, let us recall that the
performance of the branch & bound algorithm crucially depends on the quality of
the bounds we obtain during execution. While there are many ways to improve
heuristics to obtain good primal bounds, there seems to be little freedom in improving the dual bounds. What else can we do other than solve the LP
relaxation? A simple answer to this question is: start with a good LP relaxation.
That is, start with a system Ax ≤ b such that P ≤ (A, b) is close to its integer
hull. This is not a simple task (especially for NP-hard problems), but we will
see some good practices in the exercises.
For illustration, the log of a modern solver may report how many cutting planes of each family were generated, e.g.:

Cutting planes:
  Gomory: 9
  MIR: 180
  Zero half: 253
  RLT: 604
Chapter 9

Integer programming in fixed dimension

After this short excursion into practice, let us turn to some more beautiful theoretical results for the rest of this course. The first one is a landmark result
by Hendrik Lenstra. Recall that it is NP-hard to solve general integer programs.
Lenstra proved that integer programs with a fixed (constant) number of variables
can be solved in polynomial time.
Theorem 9.1 (Lenstra 1981). For every fixed n, there is a polynomial-time
algorithm to solve integer programs max{c| x : Ax ≤ b, x ∈ Zn }.
As usual, here we measure the running time proportional to the encoding
lengths of c, A, b. Let us note that the above result is trivial if all variables are
required to be binary, i.e., x ∈ {0, 1}n . In this case, we can simply enumerate all
2n points in {0, 1}n (which is a constant number if we regard n as a constant)
and return the best feasible point. However, solving general (nonbinary) integer
programs is already nontrivial for n = 2. Recall that we can model the compu-
tation of the greatest common divisor of two numbers with a 2-variable integer
program, and the fact that the greatest common divisor can be computed in
polynomial time requires at least some thinking.1
It turns out that Lenstra's algorithm relies on two mathematical facts that are beautiful on their own. In this chapter, we will first collect these results in
a rather independent fashion. Once they are established, Lenstra’s algorithm
follows quite naturally.
We begin with a central result in convex geometry, namely that every convex
body can be nicely sandwiched between two concentric ellipsoids.
1 It is always a nice exercise to remind yourself why the Euclidean algorithm runs in poly-
nomial time.
Definition 9.2 (Convex body). A convex body is a compact (i.e., bounded and
closed) convex set in Rn with nonempty interior.
Theorem 9.3. For every convex body K ⊆ Rn there exists an ellipsoid E with
center z such that
E ⊆ K ⊆ n(E − z) + z.
Note that the ellipsoid n(E − z) + z arises from E by scaling it by a factor of n around its center z. To phrase the above statement differently, recall that
our original definition of an ellipsoid E was based on a positive definite matrix
D ∈ Rn×n and a vector z ∈ Rn (the center of E), where E = E(z, D) := {x ∈
Rn : (x − z)| D(x − z) ≤ 1}. An alternative definition is that ellipsoids are
precisely the images of Euclidean unit balls under invertible affine linear maps:
Lemma 9.4. A set E ⊆ Rn is an ellipsoid if and only if there exists a nonsin-
gular matrix A ∈ Rn×n and a vector z ∈ Rn with
E = AB_n + z,
where B_n := {x ∈ R^n : ‖x‖ ≤ 1} denotes the Euclidean unit ball.

Corollary 9.5. For every convex body K ⊆ R^n there exists an invertible affine linear map τ : R^n → R^n with
B_n ⊆ τ(K) ⊆ nB_n.
Proof. By Theorem 9.3 and Lemma 9.4 there exists a nonsingular matrix A ∈
Rn×n and a vector z ∈ Rn with
ABn + z ⊆ K ⊆ nABn + z
Define τ to be the invertible affine linear map with τ (x) = A−1 (x − z). The
claim follows since τ (ABn + z) = Bn and τ (nABn + z) = nBn .
It turns out that an ellipsoid with the above properties can be obtained by
taking the maximum-volume ellipsoid:
Theorem 9.6 (Löwner-John ellipsoid). Let K be a convex body. Among all
ellipsoids contained in K there is a unique one with maximum volume.
We will first prove the above result and then show that the respective el-
lipsoid satisfies Theorem 9.3. To this end, let us collect a few further insights
about ellipsoids.
Lemma 9.7. Let E = E(z, D) = {x ∈ R^n : (x − z)^⊺D(x − z) ≤ 1} be an ellipsoid with positive definite D ∈ R^{n×n} and center z ∈ R^n. The volume of E is given by
vol(E) = vol(B_n)/√det(D).
Proof. Exercises.
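Since the proof is left as an exercise, here is a quick numerical sanity check of the formula (our sketch; the matrix D, the center z, and the sampling box are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
D = np.array([[2.0, 0.5], [0.5, 1.0]])        # positive definite
z = np.array([1.0, -1.0])

# Monte Carlo estimate of vol({x : (x-z)^T D (x-z) <= 1}) over a bounding box
pts = rng.uniform(-3, 3, size=(200_000, 2)) + z
inside = np.einsum('ij,jk,ik->i', pts - z, D, pts - z) <= 1
estimate = inside.mean() * 6.0 ** 2

exact = np.pi / np.sqrt(np.linalg.det(D))     # vol(B_2) = pi
print(estimate, exact)                        # both approx 2.37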
The next statement shows that the matrix in Lemma 9.4 can be chosen to
be positive definite.
Lemma 9.8. A set E ⊆ Rn is an ellipsoid if and only if there exists a positive
definite matrix A ∈ Rn×n and a vector z ∈ Rn with E = ABn + z.
Proof. Let A ∈ Rn×n be nonsingular and z ∈ Rn such that E = ABn + z. The
polar decomposition of A yields a positive definite matrix A0 ∈ Rn×n and an
orthogonal matrix U ∈ Rn×n with A = A0 U . The claim follows since U Bn =
Bn .
Towards the proof of Theorem 9.6, let us first rule out that two maximum-volume ellipsoids are translates of each other.
Lemma 9.9. Let E ⊆ Rn be an ellipsoid and let t ∈ Rn \ {0}. Then there exists
an ellipsoid E 0 ⊆ conv(E ∪ (E + t)) with vol(E 0 ) > vol(E).
Proof. We may assume that ‖t‖ = 1. Let us call v1 := t and set
ε := 1/(2·max{|v1^⊺x| : x ∈ E}) > 0.
Pick vectors v2, …, vn ∈ R^n such that v1, …, vn are an orthonormal basis of R^n. Consider the linear map f : R^n → R^n defined via
f(Σ_{i=1}^n αi vi) := (1 + ε)α1 v1 + Σ_{i=2}^n αi vi.
Then E′ := f(E) + (1/2)t is an ellipsoid with vol(E′) = (1 + ε)·vol(E) > vol(E). Moreover, for every x ∈ E with α1 := v1^⊺x, setting
λ := 1/2 − εα1 = 1/2 − εv1^⊺x ∈ [0, 1]
we obtain
f(x) + (1/2)t = x + (εα1 + 1/2)t = λx + (1 − λ)(x + t) ∈ conv(E ∪ (E + t)),
and hence E′ ⊆ conv(E ∪ (E + t)).
The final ingredient for Theorem 9.6 is the fact that the determinant is “log-
concave” over the cone of positive definite matrices, i.e., log det(·) is a concave
function over the set of positive definite matrices.
Lemma 9.10. For positive definite matrices A, B ∈ R^{n×n} we have
det(½(A + B)) ≥ √(det(A) det(B)),
with equality if and only if A = B.

Proof. Write A = U² for a positive definite U ∈ R^{n×n}, set B′ := U^{−1}BU^{−1}, and let μ1, …, μn > 0 denote the eigenvalues of B′. Then
det(½(A + B)) = det(U)² det(½U^{−1}(A + B)U^{−1}) = det(U)² det(½(I + B′)) = det(U)² Π_{i=1}^n (1 + μi)/2.
Since (1 + μi)/2 ≥ √μi for all i (AM-GM), with equality if and only if μi = 1, we obtain
det(½(A + B)) ≥ det(U)² √det(B′) = √(det(A) det(B)),
with equality if and only if B′ = I, i.e., A = B.
Thus, we have det(½(A1 + A2)) = √(det(A1) det(A2)), and hence Lemma 9.10 yields A1 = A2.
where the last equality holds since ui ∈ ∂Bn. Since Bn ⊆ K and ui ∈ ∂Bn ∩ ∂K we must have
ui^⊺x ≤ 1 for all x ∈ K.    (9.3)
(Since ui is in the boundary ∂K, we can choose a supporting hyperplane H of K at ui. Since Bn ⊆ K and ui ∈ ∂Bn, we see that H is also a supporting hyperplane of Bn at ui. This hyperplane is unique and has the above form.)
Let E ⊆ K be an ellipsoid with
E = {x ∈ R^n : Σ_{i=1}^n ((x − z)^⊺vi / αi)² ≤ 1}
as in the previous lemma. We have to show that Π_{i=1}^n αi ≤ 1. It is straightforward to check that, for every i ∈ [m], the point
R^n ∋ xi := z + (Σ_{j=1}^n αj²(ui^⊺vj)²)^{−1/2} Σ_{j=1}^n αj²(ui^⊺vj) vj
lies in E ⊆ K.
Let us now suppose that (9.1) does not hold. Then we have
(I, 0) ∉ cone({(uu^⊺, u) : u ∈ ∂Bn ∩ ∂K}).
(Note that for λ ≤ 0 we have ⟨(I, 0), (uu^⊺, u)⟩ = ‖u‖² = 1 > 0 ≥ λn = ⟨(I, 0), λ(I, 0)⟩.) By the separation theorem there exists a symmetric matrix H ∈ R^{n×n} and a vector h ∈ R^n with
tr(H) = ⟨(H, h), (I, 0)⟩ ≤ 0    (9.4)
and
u^⊺Hu + h^⊺u = ⟨(H, h), (uu^⊺, u)⟩ > 0    (9.5)
for all u ∈ ∂Bn ∩ ∂K.
Since H is symmetric, we know that Qε := I + εH is positive definite for small enough ε > 0. Let zε := −(ε/2)Qε^{−1}h and consider the ellipsoid
Eε := {x ∈ R^n : (x − zε)^⊺Qε(x − zε) ≤ 1}.
Note that
(x − zε)^⊺Qε(x − zε) = x^⊺Qε x − 2zε^⊺Qε x + zε^⊺Qε zε
= ‖x‖² + ε(x^⊺Hx + h^⊺x) + zε^⊺Qε zε
≥ ‖x‖² + ε(x^⊺Hx + h^⊺x).
= n‖x‖ − ‖x‖²,
Actually, the following strengthening of Theorem 9.3 can be given for sym-
metric convex bodies:
Theorem 9.13. For every centrally symmetric (i.e., K = −K) convex body
K ⊆ Rn there exists an ellipsoid E with center z such that
E ⊆ K ⊆ √n(E − z) + z.
Proof. Let x ∈ K and recall that we have u^⊺x ≤ 1 for all u ∈ ∂Bn ∩ ∂K. Since −x ∈ K, we also have u^⊺x ≥ −1 and hence |u^⊺x| ≤ 1. Thus, choosing u1, …, um and c1, …, cm as in Theorem 9.12 we obtain
‖x‖² = Σ_{i=1}^m ci (ui^⊺x)² ≤ Σ_{i=1}^m ci = n.
Hence ‖x‖ ≤ √n for every x ∈ K, i.e., K ⊆ √n·Bn.
The Flatness theorem asserts that any convex body without lattice points is contained between two such slices that are not far apart.
Theorem 9.17 (Khintchin, 1948). There exists a function f : Z≥1 → R that
satisfies the following. For every full-dimensional lattice Λ ⊆ Rn and every
convex body K ⊆ Rn with K ∩ Λ = ∅ there exists some v ∈ Λ∗ \ {0} with
max{v^⊺x : x ∈ K} − min{v^⊺x : x ∈ K} ≤ f(n).
Note that once the above result has been established for the special case
Λ = Zn , it holds for all lattices. However, it will be actually more convenient
for us to work with general lattices and hence stick to the general case.
The best possible bound f(n) is not known yet, but it is known that there exist constants c1, c2, c3 > 0 such that
c1·n ≤ f(n) ≤ c2·n^{4/3} log^{c3} n
holds for all n.²
In this course, we will establish the following bound, which is again absolutely
sufficient for our applications.
Theorem 9.18 (Lagarias, Lenstra, Schnorr 1990). The function f can be cho-
sen as f (n) := n5/2 .
We begin our proof with a simple bound on the length of the shortest nonzero
lattice vector.
Lemma 9.19. For every full-dimensional lattice Λ ⊆ R^n there exists some x ∈ Λ \ {0} with ‖x‖ ≤ √n·|det(Λ)|^{1/n}.

Proof. Let Λ = Λ(A), where A ∈ R^{n×n}. Consider the cube
C := [−|det(A)|^{1/n}, |det(A)|^{1/n}]^n
and note that vol(C) = 2^n·|det(A)|. Since C is compact, Minkowski's theorem yields a point x ∈ C ∩ Λ \ {0}. The claim follows since ‖x‖ = ‖x‖₂ ≤ √n·‖x‖∞ ≤ √n·|det(A)|^{1/n}.
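For a toy example (our sketch, not part of the notes), one can compare the bound of Lemma 9.19 with the true shortest vector, found by brute force over small integer combinations:

import itertools
import numpy as np

def shortest_vector_brute_force(A, radius=3):
    """Shortest nonzero vector of Lambda(A) among small integer combinations
    (brute force; only meant to illustrate Lemma 9.19 in tiny dimensions)."""
    n = A.shape[1]
    best = None
    for coeffs in itertools.product(range(-radius, radius + 1), repeat=n):
        if any(coeffs):
            v = A @ np.array(coeffs)
            if best is None or np.linalg.norm(v) < np.linalg.norm(best):
                best = v
    return best

A = np.array([[3.0, 1.0], [0.0, 2.0]])        # lattice basis, det = 6
bound = np.sqrt(2) * abs(np.linalg.det(A)) ** (1 / 2)
print(np.linalg.norm(shortest_vector_brute_force(A)), "<=", bound)
# 2.236... <= 3.464...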
We need to introduce two further quantities related to a lattice.
Definition 9.20 (Packing and covering radii). The packing radius of a full-dimensional lattice Λ is defined as
pack(Λ) := (1/2)·min{‖x‖ : x ∈ Λ \ {0}}.
The covering radius of Λ is given by
cov(Λ) := min{ρ > 0 : for all x ∈ R^n there is some z ∈ Λ with ‖x − z‖ ≤ ρ}.
2 The upper bound is due to Rudelson (2000). It is believed that f (n) grows only linearly
in n. We recently showed that f (n) ≥ 2n − o(n). Can you beat this?
Lemma 9.21. For every full-dimensional lattice Λ ⊆ R^n we have
cov(Λ) · pack(Λ*) ≤ n^{3/2}/4.
Before we prove the above result, let us demonstrate that it easily leads to
the Flatness theorem. Let us first prove the Flatness theorem for ellipsoids. In
this proof, we will see that the covering and packing radii appear naturally.
Proposition 9.22. If Λ ⊆ R^n is a full-dimensional lattice and E ⊆ R^n an ellipsoid with E ∩ Λ = ∅, then there exists some v ∈ Λ* \ {0} with
max{v^⊺x : x ∈ E} − min{v^⊺x : x ∈ E} ≤ n^{3/2}.
Proof. Let E = ABn + z for some invertible matrix A. Consider the lattice Λ̄ := A^{−1}Λ and note that we have
∅ = A^{−1}(E ∩ Λ) = (Bn + A^{−1}z) ∩ Λ̄.
In other words, we see that 1 ≤ dist(A^{−1}z, Λ̄) ≤ cov(Λ̄) holds. By Lemma 9.21 this implies
pack(Λ̄*) ≤ n^{3/2}/(4·cov(Λ̄)) ≤ n^{3/2}/4.
This means that there exists a vector u ∈ Λ̄* \ {0} with ‖u‖ ≤ (1/2)n^{3/2}. Setting v := A^{−⊺}u we see that v ∈ Λ* since v^⊺w = (A^{−⊺}u)^⊺w = u^⊺A^{−1}w ∈ Z holds for every w ∈ Λ. Note that
max{v^⊺x : x ∈ E} − min{v^⊺x : x ∈ E}
= max{v^⊺(Ay + z) : y ∈ Bn} − min{v^⊺(Ay + z) : y ∈ Bn}
= max{v^⊺Ay : y ∈ Bn} − min{v^⊺Ay : y ∈ Bn}
= max{u^⊺y : y ∈ Bn} − min{u^⊺y : y ∈ Bn}
= u^⊺(u/‖u‖) − u^⊺(−u/‖u‖) = 2‖u‖ ≤ n^{3/2}.
Proof of Theorem 9.18. Let K ⊆ R^n be a convex body with K ∩ Λ = ∅. By Theorem 9.3 there exists an ellipsoid E with center z such that E ⊆ K ⊆ n(E − z) + z. Since E ∩ Λ = ∅, Proposition 9.22 yields some v ∈ Λ* \ {0} whose width over E is at most n^{3/2}. We obtain
max{v^⊺x : x ∈ K} − min{v^⊺x : x ∈ K}
≤ max{v^⊺x : x ∈ n(E − z) + z} − min{v^⊺x : x ∈ n(E − z) + z}
= n · (max{v^⊺x : x ∈ E} − min{v^⊺x : x ∈ E}) ≤ n^{5/2}.
Lemma 9.23. Let Λ ⊆ R^d be a lattice of rank n and let v1, …, vk ∈ Λ be linearly independent. Then Λ has a lattice basis b1, …, bn with
span(v1, …, vk) = span(b1, …, bk).

Proof. Let V ∈ R^{d×k} denote the matrix whose columns are v1, …, vk, and let B′ ∈ R^{d×n} denote any lattice basis of Λ. Note that there exists an integer matrix W ∈ Z^{n×k} with B′W = V and rank(W) = k. By Corollary 3.14, there is a unimodular matrix U ∈ Z^{n×n} such that W^⊺U is in Hermite normal form. Recall that B := B′U^{−⊺} is also a lattice basis of Λ. The space spanned by the first k columns of B is equal to
Proof of Lemma 9.21. We use induction on n; the case n = 1 is easy to check. Let u ∈ Λ \ {0} with ‖u‖ = 2·pack(Λ), let W := span(u)^⊥ with orthogonal projection π : R^n → W, and let τ : W → R^{n−1} be a linear isometry. By Lemma 9.23 we may choose a lattice basis b1, …, bn of Λ with b1 = u. Then
Λ1 := Λ(τ(π(b1)), …, τ(π(bn))) = Λ(τ(π(b2)), …, τ(π(bn)))
is a full-dimensional lattice.
First, we claim that
pack(Λ*) ≤ pack(Λ1*)    (9.7)
holds. To this end, let a ∈ Λ1* with ‖a‖ = 2·pack(Λ1*). Consider the vector b ∈ W with τ(b) = a. For every z ∈ Λ we have
b^⊺z = b^⊺π(z) = τ(b)^⊺τ(π(z)) = a^⊺τ(π(z)) ∈ Z,
since b ∈ W and τ is a linear isometry. Hence b ∈ Λ* \ {0} and 2·pack(Λ*) ≤ ‖b‖ = ‖a‖ = 2·pack(Λ1*), which proves (9.7).
Second, we claim that
cov(Λ)² ≤ cov(Λ1)² + ‖u‖²/4 = cov(Λ1)² + pack(Λ)²    (9.8)
holds. Note that this implies
cov(Λ)² pack(Λ*)² ≤ (cov(Λ1)² + pack(Λ)²) pack(Λ*)²    (by (9.8))
≤ cov(Λ1)² pack(Λ1*)² + pack(Λ)² pack(Λ*)²    (by (9.7))
≤ (n − 1)³/16 + n²/16    (induction; Lemma 9.19 applied to Λ and Λ*)
≤ n³/16    (for n ≥ 2)
and hence cov(Λ) · pack(Λ*) ≤ n^{3/2}/4.
Thus, it remains to prove (9.8). We have to show that for every x ∈ R^n there exists some z ∈ Λ with ‖x − z‖² ≤ cov(Λ1)² + ‖u‖²/4. To this end, let x ∈ R^n and set y := τ(π(x)). Since Λ1 is full-dimensional, there is some v ∈ Λ1 with
‖y − v‖ ≤ cov(Λ1).
Pick w ∈ Λ with τ(π(w)) = v and set x′ := x − π(x) + τ^{−1}(v). Since x′ − w ∈ span(u), we can choose z := w + ku ∈ Λ with k ∈ Z such that ‖z − x′‖ ≤ ‖u‖/2. We obtain
‖z − x‖² = ‖z − x′ + x′ − x‖²
= ‖z − x′‖² + 2(z − x′)^⊺(x′ − x) + ‖x′ − x‖²
= ‖z − x′‖² + ‖x′ − x‖²    (since z − x′ ∈ span(u) and x′ − x = τ^{−1}(v) − π(x) ∈ W)
= ‖z − x′‖² + ‖v − τ(π(x))‖² ≤ ‖u‖²/4 + cov(Λ1)².
Let us recall Theorem 9.1, which states that for every fixed n, there is a
polynomial-time algorithm to solve integer programs max{c| x : Ax ≤ b, x ∈
Zn }. Such an algorithm was given by Lenstra in 1981, which we want to present
here. To this end, let n be fixed from now on.
Deciding feasibility
Given A ∈ Qm×n , b ∈ Qm , it suffices to describe a polynomial-time algorithm
for deciding whether P ∩ Z^n ≠ ∅ holds, where P := P^≤(A, b). Again using
Theorem 4.14, by possibly intersecting P with a box, we may assume that P
is bounded. Next, by solving some auxiliary linear programs, we can efficiently
determine the affine hull of P . If dim(P ) = d < n, we follow a strategy used
in the proof of Theorem 6.10: By possibly translating P , we may assume that
0 ∈ aff(P). Then there exists a matrix C ∈ Z^{(n−d)×n} of full row rank with aff(P) = {x ∈ R^n : Cx = 0}. By Corollary 3.14 there is a unimodular matrix U ∈ Z^{n×n} such that
CU = [B 0]
for some nonsingular matrix B ∈ Z^{(n−d)×(n−d)}. Since x ↦ U^{−1}x maps Z^n bijectively onto Z^n, we obtain
U^{−1}P ∩ Z^n = {U^{−1}x : Ax ≤ b, x ∈ Z^n}
= {U^{−1}x : Ax ≤ b, x ∈ aff(P), x ∈ Z^n}
= {U^{−1}x : Ax ≤ b, Cx = 0, x ∈ Z^n}
= {y : AUy ≤ b, CUy = 0, y ∈ Z^n}
= {y : AUy ≤ b, [B 0]y = 0, y ∈ Z^n}
= {y : AUy ≤ b, y1 = · · · = y_{n−d} = 0, y ∈ Z^n}.
Note that determining whether the latter set is nonempty is an integer feasibility
problem on only d < n variables, and hence we can assume that this can be
solved in polynomial time.
Thus, it remains to consider the case where P is a full-dimensional polytope, which is the most interesting case. In the first step, we run one of our polynomial-time algorithms to determine an ellipsoid E := TBn + t with T ∈ Q^{n×n}, t ∈ Q^n and
TBn + t ⊆ P ⊆ f(n)·TBn + t.    (9.9)
In the second step, we employ the Flatness theorem. To this end, consider the lattice Λ := T^{−1}Z^n and let u ∈ Λ* \ {0}. In the proof of Proposition 9.22 we have seen that the nonzero vector v := T^{−⊺}u ∈ T^{−⊺}T^⊺Z^n = Z^n satisfies
max{v^⊺x : x ∈ E} − min{v^⊺x : x ∈ E} = 2‖u‖,
and hence max{v^⊺x : x ∈ P} − min{v^⊺x : x ∈ P} ≤ 2f(n)‖u‖ by (9.9). Since v is integral, P ∩ Z^n ≠ ∅ if and only if one of the subproblems
{x ∈ Z^n : Ax ≤ b, v^⊺x = k},    k ∈ {α, α + 1, …, β},
is feasible, where α := ⌈min{v^⊺x : x ∈ P}⌉ and β := ⌊max{v^⊺x : x ∈ P}⌋ can be computed by solving two linear programs. We may assume that we can solve each such subproblem in polynomial time since it is lower-dimensional (see above). Recall that the number of such subproblems is at most 2f(n)‖u‖ + 2. Let u* denote a shortest nonzero vector
in Λ∗ . We have not discussed yet how to compute such a vector, and in fact
we will only describe a method to compute a nonzero vector u ∈ Λ∗ \ {0} such
that kuk ≤ g(n)ku∗ k for some universal constant g(n) ≥ 1. Let us run this
method to obtain such a u. If ‖u‖ ≤ (1/2)g(n)n^{3/2}, the number of subproblems is bounded by a constant (only depending on n), and we are done. Otherwise, we know that ‖u*‖ > (1/2)n^{3/2} (the ellipsoid E is not “flat”). In this case, the Flatness theorem (for ellipsoids, Proposition 9.22) states that E must contain an integer point, and so does P.
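Summarizing, the recursion can be sketched as follows. This is only a schematic outline: rounding_ellipsoid, approx_shortest_vector, lp_extrema, and feasible_on_slice are hypothetical placeholder names for the subroutines discussed above and are not implemented here.

import math
import numpy as np

def rounding_ellipsoid(A, b): ...        # hypothetical: T, t with (9.9)
def approx_shortest_vector(basis): ...   # hypothetical: ||u|| <= g(n)*||u*||
def lp_extrema(A, b, v): ...             # hypothetical: min/max of v^T x over P
def feasible_on_slice(A, b, v, k): ...   # hypothetical: recurse on {v^T x = k}

def lenstra_feasible(A, b, f_n, g_n):
    """Schematic sketch of Lenstra's test for P ∩ Z^n ≠ ∅ when
    P = {x : Ax <= b} is a full-dimensional polytope."""
    n = A.shape[1]
    T, t = rounding_ellipsoid(A, b)      # E = T*B_n + t, cf. (9.9)
    u = approx_shortest_vector(T.T)      # u in the dual lattice T^T Z^n
    if np.linalg.norm(u) > 0.5 * g_n * n ** 1.5:
        # then ||u*|| > n^{3/2}/2, so by Proposition 9.22 the ellipsoid E
        # (and hence P) contains an integer point
        return True
    v = np.rint(np.linalg.solve(T.T, u)).astype(int)  # v = T^{-T}u in Z^n \ {0}
    lo, hi = lp_extrema(A, b, v)
    # flat direction: recurse on at most 2*f(n)*||u|| + 2 lower-dimensional slices
    return any(feasible_on_slice(A, b, v, k)
               for k in range(math.ceil(lo), math.floor(hi) + 1))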
As we have seen in the previous section, the only missing ingredient for Lenstra’s
algorithm is an algorithm for finding a “short” nonzero vector in a given lattice.
Definition 9.27 (LLL-reduced basis). Let b1, …, bn be a basis of a lattice Λ with Gram-Schmidt vectors g1, …, gn and Gram-Schmidt coefficients μi,j := bi^⊺gj/‖gj‖². The basis is called LLL-reduced if
• |μi,j| ≤ 1/2 for all 1 ≤ j < i ≤ n, and
• (3/4 − μ²_{i+1,i})‖gi‖² ≤ ‖g_{i+1}‖² for i = 1, 2, …, n − 1.

Let us first show that an LLL-reduced basis has the property mentioned above.

Lemma 9.28. If b1, …, bn is an LLL-reduced basis of a lattice Λ, then
‖b1‖ ≤ 2^{(n−1)/2}·‖x‖ for all x ∈ Λ \ {0}.

The LLL-algorithm proceeds as follows. Given a lattice basis b1, …, bn:
1. Compute the Gram-Schmidt vectors g1, …, gn and the coefficients μi,j.
2. (Size reduction) Modify b1, …, bn, without changing g1, …, gn, such that |μi,j| ≤ 1/2 for all 1 ≤ j < i ≤ n (see Lemma 9.30 below).
3. If (3/4 − μ²_{j,j−1})‖g_{j−1}‖² > ‖gj‖² for some j, swap b_{j−1} and bj and go to step 1.
4. Return b1, …, bn.
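A compact (and deliberately inefficient) Python sketch of these four steps may look as follows; it recomputes the Gram-Schmidt data from scratch in every round, which keeps it close to the description above but far from an optimized implementation:

import numpy as np

def gram_schmidt(B):
    """Gram-Schmidt vectors (columns of G) and coefficients mu for the columns of B."""
    n = B.shape[1]
    G = B.astype(float).copy()
    mu = np.eye(n)
    for i in range(n):
        for j in range(i):
            mu[i, j] = (B[:, i] @ G[:, j]) / (G[:, j] @ G[:, j])
            G[:, i] = G[:, i] - mu[i, j] * G[:, j]
    return G, mu

def lll(B, delta=0.75):
    """Sketch of the LLL-algorithm; the basis is given by the columns of B."""
    B = np.array(B, dtype=float)
    n = B.shape[1]
    while True:
        # step 2: size reduction, ensures |mu_{i,j}| <= 1/2
        for i in range(n):
            for j in reversed(range(i)):
                _, mu = gram_schmidt(B)
                B[:, i] -= round(mu[i, j]) * B[:, j]
        G, mu = gram_schmidt(B)
        # step 3: swap if the exchange condition is violated, else return (step 4)
        for i in range(n - 1):
            if (delta - mu[i + 1, i] ** 2) * (G[:, i] @ G[:, i]) > G[:, i + 1] @ G[:, i + 1]:
                B[:, [i, i + 1]] = B[:, [i + 1, i]]
                break
        else:
            return B

# basis vectors as columns; an LLL-reduced basis of the same lattice is returned,
# e.g. with columns (0, 1, 0), (1, 0, 1), (-1, 0, 2)
print(lll(np.array([[1, -1, 3], [1, 0, 5], [1, 2, 6]])))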
Proposition 9.29. If the LLL-algorithm stops for an input basis b1 , . . . , bn , it
returns an LLL-reduced basis for Λ(b1 , . . . , bn ).
Proof. Note that throughout the algorithm we only perform elementary column
operations on [b1 . . . bn ] and hence the resulting vectors are still a lattice basis
for Λ(b1 , . . . , bn ). Moreover, the second step does not change the Gram-Schmidt
vectors of b1 , . . . , bn . Thus, it is clear that the returned vectors indeed satisfy
the conditions of an LLL-reduced basis.
Lemma 9.30. Given a lattice basis b1, …, bn with Gram-Schmidt vectors g1, …, gn, we can efficiently compute a basis of the same lattice that has the same Gram-Schmidt vectors g1, …, gn and whose Gram-Schmidt coefficients satisfy |μi,j| ≤ 1/2 for all 1 ≤ j < i ≤ n.

Proof. Let G = [g1 … gn] and B = [b1 … bn] and let R := G^{−1}B. Note that
B = GR and hence R is an upper triangular matrix with Ri,i = 1 for all i.
By subtracting integer multiples of columns from subsequent columns we can
efficiently find an upper triangular integer matrix U with Ui,i = 1 for all i such
that RU is again upper triangular with (RU)i,i = 1 for all i and |(RU)i,j| ≤ 1/2 for all i ≠ j.
We claim that we can replace b1 , . . . , bn by the columns of BU . To this
end, note that the columns of BU also arise from b1 , . . . , bn by subtracting
integer multiples of columns from subsequent columns. In particular, g1 , . . . , gn
are still the Gram-Schmidt vectors of the columns of BU . Let us denote the
Gram-Schmidt coefficients of the columns of BU by µ̃i,j . Since BU = (GR)U =
G(RU ), we see that |µ̃i,j | = |(RU )j,i | ≤ 1/2 for all 1 ≤ j < i ≤ n.
Theorem 9.31. The number of iterations of the LLL-algorithm is polynomial
in the encoding lengths of b1 , . . . , bn ∈ Qn .
Proof. We claim that it suffices to analyze the number of iterations for in-
teger vectors b1, …, bn. Indeed, if λ ≠ 0, then λg1, …, λgn are the Gram-
Schmidt vectors of λb1 , . . . , λbn . Moreover, if b1 , . . . , bn is LLL-reduced, then so
is λb1 , . . . , λbn . Finally, the steps of the LLL-algorithm on inputs b1 , . . . , bn and
λb1 , . . . , λbn are identical.
Thus, we may assume that b1 , . . . , bn are integer vectors. Notice that,
throughout the LLL algorithm, b1 , . . . , bn stay integer vectors. For any regular
matrix A ∈ Z^{n×n} and j ∈ [n] let
Vj(A) := det(Āj^⊺Āj) ∈ Z_≥1,
where Āj consists of the first j columns of A, and set Φ(A) := Π_{j=1}^n Vj(A).
Note that the encoding length of Φ(A) is polynomially bounded in the encoding length of A. Thus, since Φ(·) ≥ 1 always holds and step 2 does not change the Gram-Schmidt vectors, it suffices to show that after every swap operation of the LLL-algorithm we have
Φ([b′1 … b′n]) ≤ (3/4)·Φ([b1 … bn]),
where b′1, …, b′n arise from b1, …, bn by swapping the respective vectors.
To this end, suppose that the LLL-algorithm swaps the vectors bj−1 and bj
in some iteration. Let B and B 0 denote the matrices consisting of the basis
CHAPTER 9. INTEGER PROGRAMMING IN FIXED DIMENSION 87
vectors before and after the swap, respectively. Note that Vℓ(B) = Vℓ(B′) for all ℓ ≠ j − 1 and hence
Φ(B′)/Φ(B) = V_{j−1}(B′)/V_{j−1}(B).    (9.11)
Let g′1, …, g′n denote the Gram-Schmidt vectors of B′. Clearly, we have gℓ = g′ℓ for all ℓ ∈ [j − 2]. Moreover, we have g′_{j−1} = gj + μ_{j,j−1}g_{j−1} (why?). Since the LLL-algorithm swapped b_{j−1} and bj, we must have
(3/4 − μ²_{j,j−1})‖g_{j−1}‖² > ‖gj‖²,
or, equivalently,
(3/4)‖g_{j−1}‖² > ‖gj‖² + ‖μ_{j,j−1}g_{j−1}‖² = ‖gj + μ_{j,j−1}g_{j−1}‖².    (9.12)