
Integer Optimization¹ ² ³

Stefan Weltge

Winter term 2021/22


Technical University Munich

Last Update: February 4, 2022

¹ A more specific name would have been “Linear Integer Optimization”.
² We would like to thank Volker Kaibel for providing his great lecture notes, which have been the main source for designing this course. In fact, several parts are direct translations of parts from his lecture notes.
³ This course assumes solid knowledge of Linear Algebra and the basics of Linear Optimization.
Contents

1 Introduction
  1.1 Setting (Lecture 1)
  1.2 Examples
  1.3 Relation to other problems (Lecture 2)
  1.4 P and NP
  1.5 Duality (Lecture 3)

2 Linear equations and inequalities
  2.1 Linear equations
  2.2 Linear programming (Lecture 4)

3 Linear Diophantine equations
  3.1 Lattices (Lecture 5)
  3.2 Hermite normal form (Lecture 6)
  3.3 Unimodular matrices (Lecture 7)
  3.4 Solving linear Diophantine systems
  3.5 Minkowski’s theorem (Lecture 8)

4 Integer hulls
  4.1 Refresher on polyhedra (Lecture 9)
  4.2 Integer points in polyhedra
  4.3 Integer hull (Lecture 10)
  4.4 Refresher on faces of polyhedra
  4.5 Solving IPs via convexification (Lecture 11)

5 Integer polyhedra
  5.1 Basics
  5.2 Total unimodularity (Lecture 12)
  5.3 Incidence matrices
  5.4 Integer-valued polyhedra (Lecture 13)
  5.5 Total dual integrality (Lectures 14 & 15)

6 Chvátal-Gomory cuts
  6.1 Idea (Lecture 16)
  6.2 Chvátal-Gomory closure (Lecture 17)

7 Split cuts
  7.1 Split disjunctions (Lecture 18)
  7.2 Split closure (Lecture 19)
  7.3 Split cuts

8 Branch & cut
  8.1 Branch & bound (Lecture 20)
  8.2 Branch & bound for integer programs
  8.3 Branch & cut
  8.4 Solver output

9 Integer programming in fixed dimension
  9.1 Lenstra’s result (Lecture 21)
  9.2 Approximating convex bodies by ellipsoids (Lectures 22 & 23)
  9.3 Flatness theorem (Lecture 24)
  9.4 Lenstra’s algorithm (Lecture 25)
  9.5 Shortest vector problem and the LLL-algorithm (Lecture 26)
Chapter 1

Introduction

1.1 Setting (Lecture 1)

During this course, we will consider (finite-dimensional) linear optimization problems with the further requirement that certain variables are only allowed to take integer values. There exist several names for this problem class, including

• (linear) integer optimization problems
• (linear) mixed-integer optimization problems
• (linear) integer programs
• (linear) mixed-integer programs.

Here, “program” is not meant to be a computer program, but refers to a rather old synonym for an optimization problem. The term “linear” is often dropped, although, of course, there also exist nonlinear optimization problems with integrality requirements. The latter problems are usually much harder and are beyond the scope of this course.
Since integer program is the most frequently used name for the problems we
consider, we will also adopt this notation. You can see that this is in line with
the titles of the two beautiful books
• Theory of Linear and Integer Programming by A. Schrijver, 1998
• Integer Programming by M. Conforti, G. Cornuéjols, G. Zambelli, 2014

that we recommend for background material and further references.


General (pure) integer programs are of the form

    max / min  c^T x  subject to  Ax ≤ b,  x ∈ Z^n,

where c ∈ R^n, A ∈ R^{m×n}, and b ∈ R^m, and ≤ is meant entry-wise. Recall that this form also allows us to implicitly include linear equalities. Several authors also refer to problems of the form

    max / min  c^T x  subject to  Cx = d,  x ≥ 0,  x ∈ Z^n


as integer programs in “standard form”. Note that in both settings we require all variables to be integer, which is the case that we will mostly focus on.
Finally, let us mention that, in many applications, integer variables are used to model yes/no decisions, in which case constraints 0 ≤ x_i ≤ 1, x_i ∈ Z are imposed. In this case, we will also simply write x_i ∈ {0, 1} and call such variables binary. An integer program in which all variables are binary is written as

    max / min  c^T x  subject to  Ax ≤ b,  x ∈ {0, 1}^n,

and often called a binary integer program.

1.2 Examples
Integer programs are among the most common models to mathematically de-
scribe a wide variety of problems in operations research and related fields. This
is due to their high expressive power, which made them the tool of choice for
numerous real-world optimization problems, and also led to a large ecosystem of
commercial solvers and modeling languages supporting integer programs. In this
section, we want to illustrate with some first examples how various optimization
tasks can be easily formulated as integer programs. During the exercises, you
will learn how to derive such formulations on your own and how to solve them
using software tools.

Pandemic restrictions
The r-value (basic reproduction number) is an important measure to understand the spread of a pandemic. Let us suppose that due to many restrictions we reached an r-value of 0.6. We now want to drop some restrictions but still keep an r-value of at most 0.9. To this end, we may open some facilities. We assume that opening a facility type has a certain value to the society and results in a certain increase of the r-value (and that these effects are independent among the facility types), see the table below. Our goal is to find a subset of facility types whose openings do not result in an r-value above 0.9 and whose total value is largest possible.

         facility type      ∆r     value
    1    kindergartens      0.07     8
    2    schools            0.12    10
    3    universities       0.11     7
    4    restaurants        0.08     5
    5    cinemas            0.06     4
    6    soccer stadiums    0.09     2
    7    discos             0.16     3

However, we have to take care of the following implications.

1. If we open universities, we also have to open schools and kindergartens.



2. If we open restaurants and cinemas, we also have to open soccer stadiums.

A valid formulation of the problem as an integer program is:

    max  8x_1 + 10x_2 + 7x_3 + 5x_4 + 4x_5 + 2x_6 + 3x_7  subject to
         0.07x_1 + 0.12x_2 + 0.11x_3 + 0.08x_4 + 0.06x_5 + 0.09x_6 + 0.16x_7 ≤ 0.3,
         x_3 ≤ x_1,
         x_3 ≤ x_2,
         x_4 + x_5 ≤ x_6 + 1,
         x_1, . . . , x_7 ∈ {0, 1}
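Since the model has only seven binary variables, the formulation can be sanity-checked by enumerating all 2⁷ assignments; a minimal Python sketch (ours), with the data taken from the table above:

    from itertools import product

    values = [8, 10, 7, 5, 4, 2, 3]                      # societal values from the table
    deltas = [0.07, 0.12, 0.11, 0.08, 0.06, 0.09, 0.16]  # increases of the r-value

    best, best_x = -1, None
    for x in product([0, 1], repeat=7):
        x1, x2, x3, x4, x5, x6, x7 = x
        if sum(d * xi for d, xi in zip(deltas, x)) > 0.3 + 1e-9:
            continue                                 # r-value would exceed 0.9
        if x3 > x1 or x3 > x2 or x4 + x5 > x6 + 1:
            continue                                 # implications 1 and 2 violated
        val = sum(v * xi for v, xi in zip(values, x))
        if val > best:
            best, best_x = val, x
    print(best, best_x)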

Stable sets
Let G = (V, E) be an undirected graph. A subset S ⊆ V of nodes is called a
stable set (or independent set) if no two nodes in S are adjacent. Our task is
to find a stable set of maximum cardinality. (We might think of a situation in which we want to perform as many actions as possible, but some of which conflict with each other. A simple example is inviting friends to a party while avoiding pairs of people who dislike each other.) A possible formulation is the following.
    max  Σ_{v∈V} x_v  :  x_v + x_w ≤ 1 for all {v, w} ∈ E,
                         x ∈ {0, 1}^V
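For small graphs, this formulation can be sanity-checked against brute-force enumeration; a minimal sketch (ours, with exponential running time, of course):

    from itertools import combinations

    def max_stable_set(nodes, edges):
        """Return a maximum-cardinality stable set by enumeration."""
        edges = {frozenset(e) for e in edges}
        for size in range(len(nodes), -1, -1):
            for S in combinations(nodes, size):
                if all(frozenset((v, w)) not in edges
                       for v, w in combinations(S, 2)):
                    return set(S)

    # 5-cycle: every maximum stable set has size 2
    print(max_stable_set([1, 2, 3, 4, 5],
                         [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]))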

Sudoku
Suppose we are given some entries of a (9 × 9)-matrix that we have to complete to a Sudoku matrix, say (i_1, j_1, a_1), . . . , (i_k, j_k, a_k), where a_ℓ is the given entry at row i_ℓ and column j_ℓ. How do we determine the missing entries (or decide that the given entries cannot be completed to a Sudoku matrix) using integer programming?
For this problem, it seems natural to use an integer variable with range
{1, 2, . . . , 9} for every entry of the final matrix. However, one quickly realizes
that such variables alone do not allow us to formulate the requirements of a
Sudoku matrix using linear constraints. Again, we will make use of binary vari-
ables. To this end, for every entry i, j of the Sudoku matrix, we will introduce
binary variables xi,j,1 , xi,j,2 , . . . , xi,j,9 where we think of xi,j,t = 1 if and only if
the (i, j)-entry is filled with number t.
We now want to determine a solution x subject to the following constraints:

    Σ_{t=1}^{9} x_{i,j,t} = 1   for all i, j ∈ [9],
    Σ_{j=1}^{9} x_{i,j,t} = 1   for all i, t ∈ [9],
    Σ_{i=1}^{9} x_{i,j,t} = 1   for all j, t ∈ [9],
    Σ_{i=p+1}^{p+3} Σ_{j=q+1}^{q+3} x_{i,j,t} = 1   for all t ∈ [9], p, q ∈ {0, 3, 6},
    x_{i_ℓ,j_ℓ,a_ℓ} = 1   for all ℓ ∈ [k].

Note that the above problem does not contain an objective function. Problems of this type are called feasibility problems.
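For illustration, the four constraint families translate mechanically into code. The following sketch (ours) merely generates each equality constraint as the list of variable indices (i, j, t) whose sum must equal 1; the resulting lists could then be handed to any solver:

    # Each inner list L encodes the constraint sum_{(i,j,t) in L} x_{i,j,t} = 1.
    def sudoku_constraints():
        R = range(1, 10)
        cons = []
        cons += [[(i, j, t) for t in R] for i in R for j in R]   # one digit per cell
        cons += [[(i, j, t) for j in R] for i in R for t in R]   # t once per row
        cons += [[(i, j, t) for i in R] for j in R for t in R]   # t once per column
        cons += [[(p + a, q + b, t) for a in (1, 2, 3) for b in (1, 2, 3)]
                 for t in R for p in (0, 3, 6) for q in (0, 3, 6)]  # t once per box
        return cons

    print(len(sudoku_constraints()))   # 4 * 81 = 324 constraints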

Facility location
Finally, let us consider the facility location problem. We are given n facilities that we may open/build in order to ship products. Our m customers have demands D_1, . . . , D_m > 0. For each i ∈ [m], j ∈ [n], the cost of shipping one unit from facility j to customer i is given by c_{i,j} > 0. Moreover, for each j ∈ [n] we are given a fixed cost f_j > 0 for opening facility j and a total capacity u_j > 0 of units that facility j can ship in total.
We want to find a subset of facilities to be opened such that we can serve
the customers’ demands at minimum total cost. A possible integer program to
solve this problem is the following.

    min  Σ_{j=1}^{n} f_j y_j + Σ_{i=1}^{m} Σ_{j=1}^{n} c_{i,j} x_{i,j}
         Σ_{j=1}^{n} x_{i,j} = D_i       for all i ∈ [m]
         Σ_{i=1}^{m} x_{i,j} ≤ u_j y_j   for all j ∈ [n]
         x_{i,j} ≥ 0                     for all i ∈ [m], j ∈ [n]
         y_j ∈ {0, 1}                    for all j ∈ [n]

Note that the above problem is a mixed integer program, in which only a
proper subset of the variables is required to be integer.

Greatest common divisor


Our last example refers to the arguably smallest nontrivial integer program. For integers a, b ∈ Z, not both zero, consider the two-variable integer program

    min  ax + by  :  ax + by ≥ 1,  x, y ∈ Z.    (1.1)

We claim that in order to solve the above integer program, some (basic) number theory is required. To this end, recall that the greatest common divisor gcd(a, b) is the largest positive integer that divides both a and b.
Proposition 1.1. The optimal value of (1.1) equals gcd(a, b).

Proof. Let x, y ∈ Z denote an optimal solution to (1.1) and set d := ax + by. Since g := gcd(a, b) divides a and b, it also divides d and hence g ≤ d.
We claim that d is a common divisor of a and b, and hence d ≤ g, in which case we are done. For the sake of contradiction, suppose that d does not divide a (the case that d does not divide b works analogously). By division with remainder we can write a = kd + r, where k ∈ Z and r ∈ {1, . . . , d − 1}. Setting x′ := 1 − kx ∈ Z and y′ := −ky ∈ Z we obtain

    ax′ + by′ = a − kax − kby = a − k(ax + by) = a − kd = r,

which contradicts the optimality of x, y (since 1 ≤ r < d).
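In particular, an optimal solution of (1.1) can be computed with the extended Euclidean algorithm, which returns x, y ∈ Z with ax + by = gcd(a, b); a minimal sketch (function name ours):

    def extended_gcd(a, b):
        """Return (g, x, y) with g = gcd(a, b) = a*x + b*y (a, b not both zero)."""
        if b == 0:
            return (abs(a), 1 if a >= 0 else -1, 0)
        g, x, y = extended_gcd(b, a % b)
        return (g, y, x - (a // b) * y)

    g, x, y = extended_gcd(12, 42)
    print(g, 12 * x + 42 * y)   # 6 6: (x, y) is an optimal solution to (1.1)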

1.3 Relation to other problems (Lecture 2)

We have seen that integer programs can easily be used to model a variety of problems. This comes at a price: Solving general integer programs is not easy! From the theoretical point of view, this statement can be formalized in a particularly strong way. To this end, it will be helpful to focus on decision problems instead of optimization problems. For example, instead of finding a solution x ∈ Z^n with Ax ≤ b of maximum objective c^T x, we may ask whether there exists a solution x ∈ Z^n with Ax ≤ b such that c^T x ≥ γ for some given γ. In many cases, optimization problems can be efficiently reduced to a series of such decision problems (at least in theory).
Let us consider some decision problems that are closely related to integer
programs.

• Linear equations: Given a rational system of linear equations, decide whether there exists a solution.
• Linear inequalities: Given a rational system of linear inequalities, decide whether there exists a solution. (This is the feasibility version of linear programming.)
• Linear Diophantine equations: Given a rational system of linear equations, decide whether there exists an integer solution.
• Linear Diophantine inequalities: Given a rational system of linear inequalities, decide whether there exists an integer solution. (This is the feasibility version of integer programming.)

In Linear Algebra, you have learned that the first problem can be solved
“efficiently”. In the next section we will describe a way to formalize what this
means. There also exist efficient algorithms for the second problem (linear
programming). For instance, the simplex algorithm typically runs very fast
in practice. Unfortunately, there are some (mostly artificial) instances that
cause a bad behavior of the simplex algorithm. However, there also exist more
sophisticated methods that will always run “fast”, and we will briefly study them
at some point in the course.
How about the Diophantine versions of the above problems? Somewhat surprisingly, it is also possible to solve linear Diophantine equations efficiently, which is one of the first main results of this course.
This is all good news. How about linear Diophantine inequalities, i.e., integer programs? So far, no algorithms are known that perform well on all instances. In fact, there are even strong indications that such algorithms simply do not exist! The main reason is that integer programs capture many other problems that are regarded as theoretically hard, such as the stable set problem.

1.4 P and NP
In this part, we would like to understand how the notion of an efficient algorithm
can be formalized and how different problems can be compared in terms of
difficulty. Such questions are at the core of computational complexity theory,
a very fundamental branch of theoretical computer science. While it can be
entirely understood within purely mathematical terms, we will only have time to
mention the most important concepts that form the basis of complexity theory,
and still stay quite informal. A standard reference on the topics covered below
is the beautiful book
• Computers and Intractability by M. Garey, D. Johnson, 1979.
A more modern reference that covers many exciting recent developments in the
field is the book
• Computational Complexity: A Modern Approach by S. Arora, B. Barak,
2007.

Problems, algorithms, elementary operations: We will think of a decision problem as a set of expected inputs that are partitioned into “yes” and “no” instances, where inputs can be anything: numbers, matrices, graphs, etc. For
example, a decision problem might expect rational square matrices and regards
a matrix as a “yes” input if and only if it is invertible. An algorithm for such a
problem is a list of deterministic instructions that can deal with any expected
input and will output the correct answer after a finite number of steps. Each
instruction is only allowed to be an “elementary operation”. Given an input,
the running time of an algorithm is the total number of elementary operations
before an answer is returned.
We stress again that all such concepts can be mathematically formalized.
For instance, the notion of an algorithm is nicely covered by the concept of a
Turing machine, for which the running time is a well-defined number.

We should think of an elementary operation as an extremely limited operation. For instance, even adding two integers on a computer requires several
operations with single digits – similar to those that we have been taught in
school. Computing the addition of two single digits (not numbers!) is a good
example of an elementary operation.

Input sizes: Clearly, the running time of an algorithm may depend on the
size of its input. For instance, the problem of deciding whether a given positive
integer is prime may require more time (= elementary operations) for a large
number than for a small one.
This raises an important point: How shall we measure the size of an input? For many types of inputs there is a common sense of how to define the size, often also called the encoding length. The encoding length can be thought of as a natural way to represent an object in a computer using bits, for example:

Definition 1.2. Given a rational number z = p/q with p ∈ Z, q ∈ Z_{≥1}, gcd(p, q) = 1, the encoding length of z is defined as ⟨z⟩ := 1 + ⌈log₂(|p| + 1)⌉ + ⌈log₂(q)⌉.

Definition 1.3. The encoding length of a rational matrix A ∈ Q^{m×n} is defined as ⟨A⟩ := mn + Σ_{i=1}^{m} Σ_{j=1}^{n} ⟨A_{i,j}⟩.
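Definitions 1.2 and 1.3 translate directly into code; a minimal sketch (function names ours) using Python’s exact rationals:

    from fractions import Fraction
    from math import ceil, log2

    def enc(z):
        """Encoding length <z> of a rational number z (Definition 1.2)."""
        z = Fraction(z)                      # already in lowest terms, q >= 1
        p, q = z.numerator, z.denominator
        # for huge numbers, exact bit-length arithmetic would be safer than float log2
        return 1 + ceil(log2(abs(p) + 1)) + ceil(log2(q))

    def enc_matrix(A):
        """Encoding length <A> of a rational m x n matrix (Definition 1.3)."""
        m, n = len(A), len(A[0])
        return m * n + sum(enc(A[i][j]) for i in range(m) for j in range(n))

    print(enc(Fraction(3, 4)), enc_matrix([[1, Fraction(1, 2)], [0, 3]]))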

The class P: Having agreed on a definition of running time (= number of elementary operations), we would like to distinguish between rather naive algorithms, such as total enumeration, and algorithms that act in a considerably faster way. A very common and robust notion is the following: We say that an algorithm is a polynomial-time algorithm if there exists a polynomial p such that for every input x, the number of elementary operations of the algorithm is at most p(⟨x⟩), where ⟨x⟩ is the encoding length of x.
All of the previously mentioned “efficient” algorithms are in fact polynomial-
time algorithms. However, for problems such as integer programming or the
stable set problem, no polynomial-time algorithms are known.
The class of decision problems that admit polynomial-time algorithms is
simply called P. So, in other words, the first three problems from the previous
section belong to P, whereas it is not known whether the fourth problem belongs
to P.

The class NP: However, the fourth problem (feasibility of integer programming) belongs to another important class of decision problems, called NP. It is important to mention that NP does not stand for “non-polynomial”! (It stands for “nondeterministic polynomial”, which may still be quite misleading.) Decision problems in NP admit algorithms with the following property: For every “yes” input x, there exists some extra input y, whose encoding length can be polynomially bounded in the encoding length of x, such that if the algorithm is given both x and y, it can verify in polynomial time that y is indeed a certificate for the fact that x is a “yes” input. In other words, for “yes” inputs x there is a rather short proof y, which can be checked efficiently, showing that x is indeed a “yes” input. However, for “no” inputs such proofs are not required to exist. Moreover, the algorithm is not required to find the extra input y itself.
For example, we will see that (the decision version of) integer programming is in NP: A “yes” instance is a Diophantine system of linear inequalities that has a solution. Clearly, the solution itself may be regarded as the extra input y,
and it can be easily checked that y is indeed an integer solution that satisfies all
constraints. A little catch is that we have to be careful about the encoding length
that y requires, but we will see later that this is indeed not an issue. Recall
that the definition of NP does not require the existence of a short certificate y in
the case of a “no” instance. In fact, for “no” instances of a Diophantine system of linear inequalities we are not aware of simple proofs certifying their infeasibility.
Let us remark that it is easy to see that P is a subset of NP: Polynomial-time
algorithms for problems in P do not even need any certificate y to determine
whether x is a “yes” input.
However, the class NP contains thousands of nice problems for which no
polynomial-time algorithms are known, including the traveling salesman prob-
lem, the satisfiability problem, or the clique problem.

P vs. NP: The question whether P is a proper subset of NP is regarded as one of the most fundamental open problems in mathematics. It is commonly believed that the answer is affirmative, i.e., P ≠ NP. Clearly, if we could prove that there is no polynomial-time algorithm for integer programming, then this would imply P ≠ NP. However, very surprisingly, the converse holds as well: If P ≠ NP, then there can also be no polynomial-time algorithm for integer programming.

NP-hard, NP-complete: The reason is based on the magic of the class NP itself: It turns out that there exist single decision problems that capture the
hardness of all problems in NP simultaneously. Such problems are called NP-
hard and can be informally described as follows: A problem A is NP-hard if
every problem B in NP can be transformed into a special case of problem A, in
polynomial time. Thus, every algorithm for problem A can be used for problem
B: Take an input of problem B, transform it to an input for problem A, run
the algorithm for problem A, and return the answer. In this case, we would say
that B can be efficiently reduced to A. Such a process is called a (polynomial-
time) reduction. For instance, we have seen that the stable set problem can be
efficiently reduced to integer programming.
Note that an NP-hard problem is not required to be contained in NP; it might be significantly “harder” than problems in NP. However, it is in some sense at least as hard as every problem in NP.
The magic still continues at this point: There even exist NP-hard problems
that are contained in NP! Such problems are called NP-complete. Surprisingly,
today we know thousands of problems that are NP-complete, and one of them
is... integer programming! (Another one is the stable set problem.) This means that a polynomial-time algorithm for integer programming would yield polynomial-time algorithms for all problems in NP, which is regarded as very unlikely.

Take-away: Clearly, this is bad news and shows the limitations of what we can expect. However, at this point one should recall that the notion of a polynomial-time algorithm is very restrictive: It has to apply to all inputs and requires answers to be always correct. In contrast, it is not clear whether the inputs we are dealing with in practice will cause worst-case behaviors. Moreover, in many situations it is sufficient to find a feasible solution whose objective value is close to optimal rather than exactly equal to it. Generally, this suggests that we should still try to do our best to get a very good understanding of integer programming (or your favorite NP-hard problem) to circumvent the seeming barriers of efficient computations. In fact, during this course we will learn about several exciting structural insights about integer programming that come with their own mathematical beauty and usually also allow us to design algorithms that are very efficient in practice.

1.5 Duality (Lecture 3)

We close this introductory chapter with some remarks about duality, a central concept in the theory of linear programming. To recall, suppose we are given a linear program

    max  c^T x : Ax ≤ b,

and let us assume that it is feasible and bounded, i.e., the optimum is attained. The corresponding dual linear program is given by

    min  b^T y : A^T y = c, y ≥ 0.

The simple notion of weak duality states that every feasible solution y to the dual problem provides an upper bound on the optimum value of the primal problem: Indeed, given a feasible solution x for the primal problem, we see that

    c^T x = (A^T y)^T x = y^T Ax ≤ y^T b = b^T y

holds. Notice that the dual variables y simply assign a nonnegative weight to each inequality of the original system such that the corresponding combination yields an inequality of the form c^T x ≤ γ that holds for all feasible x. The purpose of the dual linear program is to find the best upper bound γ that can be proven in this way. Strong linear programming duality states that if we are given an optimal solution x to the primal, then we can always certify its optimality with the above simple strategy, i.e.,

    max{c^T x : Ax ≤ b} = min{b^T y : A^T y = c, y ≥ 0}.
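Strong duality can be checked numerically on small instances; the following sketch (ours, assuming SciPy is available) solves a primal-dual pair with scipy.optimize.linprog, which minimizes by convention, so the primal objective is negated:

    import numpy as np
    from scipy.optimize import linprog

    # Primal: max{c^T x : Ax <= b} (x free); dual: min{b^T y : A^T y = c, y >= 0}.
    A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    b = np.array([4.0, 3.0, 2.0])
    c = np.array([2.0, 1.0])

    primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)
    dual = linprog(b, A_eq=A.T, b_eq=c)      # default bounds give y >= 0
    print(-primal.fun, dual.fun)             # both equal the common optimum 7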

At this point, we may ask whether a corresponding duality theorem also holds for integer programs. The most natural problems to consider are

    max  c^T x : Ax ≤ b, x ∈ Z^n

and

    min  b^T y : A^T y = c, y ≥ 0, y ∈ Z^m.

As in the case of linear programming, we immediately obtain weak duality:

    max{c^T x : Ax ≤ b, x ∈ Z^n} ≤ max{c^T x : Ax ≤ b}
                                 = min{b^T y : A^T y = c, y ≥ 0}
                                 ≤ min{b^T y : A^T y = c, y ≥ 0, y ∈ Z^m}.

However, strong duality does not hold for general integer programs: For example, in the case A = (2), b = (1), c = (1), both of the above inequalities are strict: the integer primal has optimal value 0 (the largest integer x with 2x ≤ 1), both LP values equal 1/2, and the integer dual is infeasible since 2y = 1 has no integer solution.
Finally, we would like to mention that this is not a singular example. In fact, for most (difficult) integer programs one should not expect strong duality to hold. Otherwise, the concept of duality would yield a simple way of certifying that a given solution is optimal. However, this relates to the discussion of integer programming being in NP from the previous section: If there exists a feasible integer solution x* of objective value at least γ, there is a short proof certifying this fact (which is essentially x* itself; we will get back to this later). However, we do not know simple proofs showing that there is no solution of value at least γ + ε (certifying the optimality of x*). Moreover, again complexity theory indicates that such proofs may simply not exist.
Chapter 2

Linear equations and inequalities

2.1 Linear equations

In the previous chapter, we have mentioned that deciding whether a system of linear equations has a solution can be done in polynomial time. Having the Gaussian algorithm in mind, this does not seem surprising. However, to see that the Gaussian algorithm really yields a polynomial-time algorithm, a little care is necessary, which we will explain here.
Let us suppose that we are given a system Ax = b of linear equations. Re-
call that the Gaussian algorithm is performing row operations on the augmented
matrix Ā = [A; b] to bring it into a simple form from which it is easy to derive
a solution for Ax = b or conclude that no solution exists. A very basic imple-
mentation works as follows: We search for a nonzero entry in the first part of
Ā (all columns except the last one), often also called the pivot element. Given
a pivot element at position (p, q) (the “pivot”), we add multiples of row p to the
other rows of Ā to make all other entries in column q zero. Now, if there is any
solution x to the original system, then xq is uniquely determined by all other
variables by solving the pth row for xq . Thus, we may think of deleting row p
and column q from Ā and continue. At some point, we will end up with the
situation in which Ā consists of no rows or the first part of Ā is all-zero. In the
first case, the system has a solution by setting all remaining variables to arbi-
trary values (say, zero), which uniquely determines the values of all previously
deleted variables. In the second case, the same works if the last column of Ā is
also all-zero. However, if the last column is non-zero, we can conclude that no
solution exists.
For example, let us consider the system

    15x_1 + x_2 + 5x_3 + 3x_4 + 51x_5 = 75
    11x_1 + x_2 + 3x_3 + 2x_4 + 35x_5 = 52
    13x_1 + x_2 + 5x_3 + 2x_4 + 45x_5 = 66
     8x_1 + x_2 + 2x_3 +  x_4 + 24x_5 = 36

with augmented matrix

    [ 15  1  5  3  51 | 75 ]
    [ 11  1  3  2  35 | 52 ]
    [ 13  1  5  2  45 | 66 ]
    [  8  1  2  1  24 | 36 ].

Choosing the entry (4, 2) as the first pivot element, we obtain

    [ 7  0  3  2  27 | 39 ]
    [ 3  0  1  1  11 | 16 ]
    [ 5  0  3  1  21 | 30 ]
    [ 8  1  2  1  24 | 36 ].

We see that x_2 is uniquely determined by the other variables and the last equality. In this sense, we may ignore the second column and fourth row from now on. Choosing (2, 4) as the next pivot element, we obtain

    [ 1  0  1  0   5 |  7 ]
    [ 3  0  1  1  11 | 16 ]
    [ 2  0  2  0  10 | 14 ]
    [ 8  1  2  1  24 | 36 ].

Again, we see that x_4 is determined by the remaining variables and the second equation. Thus, we may also ignore the fourth column and the second row from now on. Our final pivot element will be (1, 1), and we obtain the matrix

    [ 1  0  1  0   5 |  7 ]
    [ 3  0  1  1  11 | 16 ]
    [ 0  0  0  0   0 |  0 ]
    [ 8  1  2  1  24 | 36 ].

Again, we see that x1 is determined by the remaining variables and the first
equation. The last remaining equation is trivially feasible, since its right-hand
side is zero. Thus, we may assign any values to the remaining variables x3 and
x5 and obtain a solution for the original system.
To understand whether the above algorithm runs in polynomial time, let
us consider the following, slightly more general algorithm (in which we should
think of A already as the augmented matrix).
Algorithm 2.1 (Gaussian elimination). Given a matrix A ∈ Q^{m×n}, we create matrices A^(1), . . . , A^(m) ∈ Q^{m×n} as follows. We set A^(1) ← A and initialize M ← [m] and N ← [n]. For each k = 2, . . . , m we consider A^(k) ← A^(k−1) and perform the following steps.

1. If A^(k)_{M,N} = 0, we set A^(k+1) ← · · · ← A^(m) ← A^(k) and stop.
2. Pick any (p, q) ∈ M × N with A^(k)_{p,q} ≠ 0.
3. A^(k)_{i,N} ← A^(k)_{i,N} − (A^(k)_{i,q} / A^(k)_{p,q}) · A^(k)_{p,N} for every i ∈ M \ {p}.
4. M ← M \ {p} and N ← N \ {q}.
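For illustration, here is a direct transcription of Algorithm 2.1 into exact rational arithmetic (a sketch of ours; Python’s Fraction keeps all intermediate numbers exact):

    from fractions import Fraction

    def gaussian_elimination(A):
        """A sketch of Algorithm 2.1: A is an m x n matrix given as a list of
        lists; returns the final matrix A^(m) and the pivots that were used."""
        A = [[Fraction(a) for a in row] for row in A]
        m, n = len(A), len(A[0])
        M, N, pivots = set(range(m)), set(range(n)), []
        for _ in range(m - 1):                      # k = 2, ..., m
            pq = next(((p, q) for p in M for q in N if A[p][q] != 0), None)
            if pq is None:                          # A_{M,N} = 0: stop
                break
            p, q = pq
            for i in M - {p}:                       # step 3
                factor = A[i][q] / A[p][q]
                for j in N:
                    A[i][j] -= factor * A[p][j]
            M -= {p}; N -= {q}                      # step 4
            pivots.append((p, q))
        return A, pivots

    # The worked example above (rows/columns indexed from 0):
    A, piv = gaussian_elimination([[15, 1, 5, 3, 51, 75], [11, 1, 3, 2, 35, 52],
                                   [13, 1, 5, 2, 45, 66], [8, 1, 2, 1, 24, 36]])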



Note that for each k, step 3 performs at most O(mn) arithmetic operations. We see that the total number of arithmetic operations is O(m²n). Thus, the number of arithmetic operations of the above algorithm is polynomial in the input size. However, what about the encoding lengths of the numbers arising in the matrices A^(1), . . . , A^(m)? The following two statements show that the encoding lengths of all these numbers can indeed be polynomially bounded in the encoding length of the input matrix.
Lemma 2.2. For every i, k ∈ [m] and j ∈ [n], the entry A^(k)_{i,j} is zero or (up to sign) the quotient of the determinants of two square submatrices of A.
Proof. We proceed by induction on k = 1, . . . , m. The statement is trivial for k = 1. Let k ≥ 2 and consider i ∈ [m], j ∈ [n]. Let M, N denote the respective sets at the beginning of iteration k, and (p, q) ∈ M × N the respective pivot. Note that if (i, j) ∉ M × N, then A^(k)_{i,j} = A^(k−1)_{i,j} and we are done. The same holds if i = p. Moreover, after step 3, we have A^(k)_{i,q} = 0 for all i ∈ M \ {p}. Thus, it suffices to consider the case (i, j) ∈ (M \ {p}) × (N \ {q}).
Let M̄ = [m] \ M and N̄ = [n] \ N. The crucial observation is now that

    A^(k)_{M̄,∗}  and  A^(k)_{M̄∪{i},∗}

arose from

    A_{M̄,∗}  and  A_{M̄∪{i},∗}

by adding multiples of rows to other rows, respectively. In particular, this means that

    det(A^(k)_{M̄,N̄}) = det(A_{M̄,N̄})  and  det(A^(k)_{M̄∪{i},N̄∪{j}}) = det(A_{M̄∪{i},N̄∪{j}})

hold. Finally, let us compute det(A^(k)_{M̄∪{i},N̄∪{j}}) by expanding along the row of index i. Since A^(k)_{i,N̄} = 0, we obtain

    det(A^(k)_{M̄∪{i},N̄∪{j}}) = ±A^(k)_{i,j} · det(A^(k)_{M̄,N̄}),

which yields the claim.


Lemma 2.3. For every rational matrix A ∈ Q^{k×k} we have ⟨det(A)⟩ ≤ 2⟨A⟩.

Proof. Let det(A) = p/q and A_{i,j} = p_{i,j}/q_{i,j} for all i, j, where p, p_{i,j} ∈ Z and q, q_{i,j} ∈ Z_{≥1} such that gcd(p, q) = 1 and gcd(p_{i,j}, q_{i,j}) = 1. We may assume that p ≠ 0, otherwise the claim is trivial. The relevant encoding lengths are

    ⟨det(A)⟩ = 1 + ⌈log₂(|p| + 1)⌉ + ⌈log₂(q)⌉    (2.1)

and

    ⟨A⟩ = k² + Σ_{(i,j)∈[k]²} (1 + ⌈log₂(|p_{i,j}| + 1)⌉ + ⌈log₂(q_{i,j})⌉).

Using the Leibniz formula for the determinant, we see that q divides the product of all q_{i,j} and hence

    q ≤ Π_{(i,j)∈[k]²} q_{i,j}.    (2.2)

In particular, we obtain

    ⌈log₂(q)⌉ ≤ Σ_{(i,j)∈[k]²} ⌈log₂(q_{i,j})⌉ ≤ ⟨A⟩.    (2.3)

Again using the Leibniz formula, we see that

    |det(A)| ≤ Σ_σ Π_{i=1}^{k} |p_{i,σ(i)}| ≤ Π_{(i,j)∈[k]²} (|p_{i,j}| + 1)    (2.4)

holds. From (2.2) and (2.4) we get

    |p| = q·|det(A)| ≤ Π_{(i,j)∈[k]²} (|p_{i,j}| + 1)·q_{i,j}

and hence

    ⌈log₂(|p| + 1)⌉ ≤ 1 + ⌈log₂(|p|)⌉ ≤ 1 + Σ_{(i,j)∈[k]²} (⌈log₂(|p_{i,j}| + 1)⌉ + ⌈log₂(q_{i,j})⌉) ≤ ⟨A⟩ − 1.

Plugging (2.3) and the latter inequality into (2.1) yields the claim.
We conclude:
Theorem 2.4. Algorithm 2.1 (Gaussian elimination) runs in polynomial time.

2.2 Linear programming (Lecture 4)

Next, we want to demonstrate that checking whether a system of inequalities has a solution – and computing such a solution – can also be done in polynomial time. We will also see that this can be turned into an efficient algorithm for linear programming (the optimization version). For the latter problem, the in practice by far most important algorithm is the simplex algorithm. While it (theoretically and practically) runs very fast on most instances, there exist instances for which the simplex algorithm requires an exponential (in the number of inequalities and variables) number of arithmetic operations.¹ Although understanding the behavior of the simplex algorithm is a very exciting topic, we will focus on another algorithm, which has polynomial running time even in the worst case.
To this end, it will be helpful to consider the feasibility version of linear
programming. One way to directly obtain an algorithm for the optimization
version is given by the following observation.
Lemma 2.5. Let c ∈ R^n, A ∈ R^{m×n}, and b ∈ R^m. For every point (x, y) satisfying the linear inequalities

    Ax ≤ b,  y ≥ 0,  A^T y = c,  c^T x = b^T y    (2.5)

we have that x is an optimal solution to max{c^T x′ : Ax′ ≤ b}. Conversely, for every optimal solution x to the latter problem there exists some y such that (x, y) satisfies (2.5).
¹ Actually, there are various different implementations of the simplex method, and instances with an exponential behavior have been found for most of them.



Proof. Suppose that (x, y) satisfies (2.5). For every x′ with Ax′ ≤ b we have

    c^T x′ = (A^T y)^T x′ = y^T Ax′ ≤ y^T b = c^T x,

and hence x is indeed an optimal solution. Conversely, if x is an optimal solution, then strong LP duality asserts that there exists some y such that (x, y) satisfies (2.5).
In what follows, we assume that we are given A ∈ Q^{m×n}, b ∈ Q^m and we want to determine whether the polyhedron

    P = P^≤(A, b) := {x ∈ R^n : Ax ≤ b}
is nonempty. While P might be unbounded, the next lemma states that we may
restrict ourselves to searching feasible points within a certain radius.
Lemma 2.6. There exists a polynomial p such that the following holds for every A ∈ Q^{m×n}, b ∈ Q^m: If P = P^≤(A, b) is nonempty, then so is P ∩ [−δ, δ]^n, where δ := 2^{p(⟨A⟩+⟨b⟩)}.

Proof. Suppose that P is nonempty. Consider any inclusion-wise minimal face F of P. We know that F = {x ∈ R^n : A′x = b′} for some A′, b′ arising from A, b by deleting some rows. In the previous section, we have discussed that there exists a polynomial p such that Gaussian elimination can be used to compute a point x ∈ F in time at most p(⟨A′⟩ + ⟨b′⟩). We may assume that p is monotonically increasing. Note that we must have

    ⟨x⟩ ≤ p(⟨A′⟩ + ⟨b′⟩) ≤ p(⟨A⟩ + ⟨b⟩),

which implies ‖x‖∞ ≤ 2^{p(⟨A⟩+⟨b⟩)}.
Thus, by possibly adding inequalities −δ ≤ x_i ≤ δ, we may assume that P ⊆ [−δ, δ]^n, where δ is as in the above lemma. Setting R := √n · δ, this implies that

    P ⊆ B(R),    (2.6)

where we use the notation B(R) := {x ∈ R^n : ‖x‖₂ ≤ R} (the ambient dimension n will always be clear from the context). Note that log₂(R) is polynomially bounded in the encoding lengths of A, b.
In what follows, we will also assume that if P is nonempty, then it contains x₀ + B(r) for some (unknown) point x₀ ∈ R^n and some r > 0 such that log₂(1/r) is also polynomially bounded in the encoding lengths of A, b, i.e.,

    x₀ + B(r) ⊆ P  or  P = ∅.    (2.7)

For instance, this requirement can be achieved by carefully increasing b without changing the (non)emptiness of P, but we will not have time to comment on this in detail.
A famous way to quickly detect a point in P or conclude that P is empty is given by the ellipsoid method. It was first studied by Naum Shor, Arkadi Nemirovski, and David Yudin in the 1970s. In 1979, Leonid Khachiyan showed how this method can be used to obtain the first polynomial-time algorithm for linear programming, which was a stunning theoretical breakthrough.
As the name suggests, the ellipsoid method relies on the concept of ellipsoids, which are simply linear transformations of Euclidean balls. We will make use of the following equivalent definition.

Definition 2.7 (Ellipsoid). A set E ⊆ R^n is called an ellipsoid if there exists a vector z ∈ R^n and a positive definite matrix D ∈ R^{n×n} such that

    E = E(z, D) := {x ∈ R^n : (x − z)^T D^{-1} (x − z) ≤ 1}.

The vector z is called the center of E.


Algorithm 2.8 (Ideal ellipsoid method). Given an inequality description of a polytope P ⊆ R^n and R > 0 with P ⊆ B(R), we perform the following sequence of steps.

1. E ← B(R)
2. z ← center of E
3. if z ∈ P, return z
4. otherwise, let a ∈ R^n \ {0} such that max_{x∈P} a^T x ≤ a^T z
5. E ← minimum volume ellipsoid containing {x ∈ E : a^T x ≤ a^T z}
6. go to step 2
The above method starts with an ellipsoid that definitely contains P. In every iteration, it is checked whether the center z of the current ellipsoid is a point in P. If this is the case, we return z and are done. Otherwise, we can find an inequality a^T x ≤ β that is valid for all x ∈ P but violated by z. In particular, we know that P ⊆ {x ∈ E : a^T x ≤ a^T z}. Thus, we see that the ellipsoid computed in step 5 still contains P, and we repeat the process.
It can be shown that every full-dimensional compact convex set has a unique minimum volume ellipsoid containing it. Such ellipsoids are called Löwner-John ellipsoids, see also Section 9.2. In our case, the minimum volume ellipsoid can be described by an explicit formula.
described by some explicit formula.
Lemma 2.9. Let E = E(z, D) be an ellipsoid and a ∈ R^n \ {0}. The minimum volume ellipsoid containing {x ∈ E : a^T x ≤ a^T z} is given by E′ = E(z′, D′), where

    z′ = z − (1 / (n + 1)) · Da / √(a^T Da)

and

    D′ = (n² / (n² − 1)) · (D − (2 / ((n + 1) · a^T Da)) · Da(Da)^T).

Moreover, we have

    vol(E′) / vol(E) < e^{-1/(2n+2)}.
Proof. See, e.g., Schrijver’s book, Theorem 13.1.
The above lemma not only gives an explicit way to perform step 5 of Algorithm 2.8; it also shows that, in each iteration, the ellipsoid becomes smaller in terms of volume.
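For illustration, the formulas of Lemma 2.9 translate directly into a numerical update step; the following numpy sketch (function name ours, floating-point arithmetic, n ≥ 2) performs one iteration and shows the volume decrease on the unit disk:

    import numpy as np

    def ellipsoid_step(z, D, a):
        """One update of Lemma 2.9 (n >= 2): returns (z', D') for the minimum
        volume ellipsoid containing {x in E(z, D) : a^T x <= a^T z}."""
        n = len(z)
        Da = D @ a
        aDa = float(a @ Da)
        z_new = z - Da / ((n + 1) * np.sqrt(aDa))
        D_new = (n * n / (n * n - 1.0)) * \
            (D - (2.0 / ((n + 1) * aDa)) * np.outer(Da, Da))
        return z_new, D_new

    # Cutting the unit disk with the half-space x_1 <= 0:
    z, D = ellipsoid_step(np.zeros(2), np.eye(2), np.array([1.0, 0.0]))
    print(z, np.linalg.det(D))   # center (-1/3, 0); det = 16/27 < 1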
Theorem 2.10. Let P ⊆ R^n satisfy (2.6) and (2.7). If the ideal ellipsoid method has not found a point in P after n(2n + 2) ln(R/r) iterations, then P = ∅.
CHAPTER 2. LINEAR EQUATIONS AND INEQUALITIES 19

Proof. Consider the case that P is nonempty, in which we know that P contains a Euclidean ball of radius r and hence vol(P) ≥ vol(B(r)). Suppose that the ellipsoid method has not found a point in P after k iterations. Let E₁, E₂, . . . , E_k denote the ellipsoids computed in step 5 of each iteration, and set E₀ = B(R). By Lemma 2.9 we have vol(E_i) ≤ e^{-1/(2n+2)} vol(E_{i−1}) for all i ∈ [k]. Recall that P ⊆ E_k and hence

    vol(B(r)) ≤ vol(P) ≤ vol(E_k) ≤ e^{-1/(2n+2)} vol(E_{k−1})
                                  ≤ e^{-2/(2n+2)} vol(E_{k−2})
                                  ...
                                  ≤ e^{-k/(2n+2)} vol(E₀)
                                  = e^{-k/(2n+2)} vol(B(R)).

Thus, we obtain

    e^{-k/(2n+2)} ≥ vol(B(r)) / vol(B(R)) = (r/R)^n,

which implies

    k ≤ n(2n + 2) ln(R/r).

Thus, running the ideal ellipsoid method for at most T = ⌊n(2n + 2) ln(R/r)⌋ + 1 iterations, we can decide whether P is nonempty. Recall that we have assumed that log₂(R) and log₂(1/r) are polynomially bounded in the encoding lengths of A, b, and hence the number of iterations is polynomial in the encoding length of our input.
However, there is still a serious issue with the ideal ellipsoid method: In
the computations of the next ellipsoids, square roots appear and so we may not
perform that step within rational arithmetic. It is possible to overcome this
issue by carefully rounding the entries of D and z in each iteration, in a way
that still preserves a similar decrease in volume. To summarize, it is possible
to turn the above ideas into a true polynomial-time algorithm for solving linear
programs.
Theorem 2.11 (Khachiyan 1979). There is a polynomial-time algorithm that, given A ∈ Q^{m×n}, b ∈ Q^m, returns a vector x ∈ Q^n satisfying Ax ≤ b or concludes that no such vector exists.

Let us close this section with a few remarks. It turns out that the ellipsoid method does not perform well in practice, and hence is rather of theoretical interest. However, its theoretical implications are very strong. For instance, the ellipsoid method never requires an explicit list of inequalities describing P but only an oracle for deciding whether a point is contained in P and, if not, providing a violated inequality. This allows one to use the ellipsoid method in situations where P is described by a huge number of inequalities, or even an infinite number of inequalities, and hence it is applicable to general convex optimization problems.
Another famous class of algorithms for solving linear programs are interior point methods, studied by Yurii Nesterov and Arkadi Nemirovski. In 1984, Narendra Karmarkar showed that interior point methods can also be used to obtain polynomial-time algorithms for linear programming. In contrast to ellipsoid methods, interior point methods perform very well in practice, in some cases even better than the simplex method.
Chapter 3

Linear Diophantine equations

3.1 Lattices (Lecture 5)

In this chapter, we consider the problem of finding integer solutions to systems of linear equations, and we will see that this problem can also be solved in polynomial time. Given a matrix A ∈ Q^{m×n} and a vector b ∈ Q^m, we want to determine whether there exists some x ∈ Z^n such that Ax = b. In other words, we want to find a linear combination with integer coefficients of the columns of A that is equal to b. In the first part of this chapter, we try to get a better understanding of the space of vectors that can be written as linear combinations with integer coefficients of a finite set of vectors.
Definition 3.1 (Lattice, lattice basis). For a finite set X ⊆ R^n, we set

    Λ(X) := { Σ_{x∈X} λ_x x : λ_x ∈ Z for all x ∈ X }.

If X is R-linearly independent, then Λ := Λ(X) is called a lattice. In the latter case, we say that X is a lattice basis of Λ.
Definition 3.2. For a matrix A ∈ R^{m×n}, we set Λ(A) := Λ({A_{∗,1}, . . . , A_{∗,n}}).
Note that Ax = b has an integer solution if and only if b ∈ Λ(A). A natural example of a lattice is Λ = Z^n, which has the standard unit vectors as a lattice basis. Clearly, there also exist lattices different from Z^n. Some of them are proper subsets of Z^n, but some of them look completely different. Note that a lattice may have many lattice bases: For instance, another lattice basis for Z² is given by {(2, 1)^T, (1, 1)^T}.
Let us also mention that Λ(X) is not necessarily a lattice if X is linearly dependent, e.g., if X = {1, √2} ⊆ R. In other cases, Λ(X) may be a lattice but none of the points from X belongs to a basis of Λ(X); e.g., if X = {3, 5}, then we have Λ(X) = Z.
Definition 3.3 (Linear hull). The linear hull lin(X) of a subset X ⊆ Rn is the
smallest linear subspace of Rn that contains X.
Note that if B is a lattice basis of Λ, then lin(B) = lin(Λ). In particular, we
see that all lattice bases of Λ have the same cardinality.


Definition 3.4 (Dimension of a lattice). The dimension of a lattice Λ is dim(Λ) := dim(lin(Λ)).
Note that for every matrix A, we have dim(Λ(A)) = rank(A).

3.2 Hermite normal form (Lecture 6)

In what follows, we often want to understand whether two sets of vectors gen-
erate the same lattice, or whether a point lies within a lattice. These tasks are
particularly simple if the respective matrices have a special form:
Definition 3.5 (Hermite normal form). A matrix A ∈ R^{m×n} with m ≤ n has Hermite normal form (HNF) if the following holds.

1. A_{i,j} ≥ 0 for all i, j
2. A_{i,j} = 0 for all j > i
3. A_{i,i} > 0 for all i
4. A_{i,j} < A_{i,i} for all j < i
In other words, a matrix has HNF if every entry is nonnegative (1), it is a lower triangular matrix (2) with strictly positive diagonal entries (3) that are strictly larger than all other entries in their row (4). As an example, the matrix

    [  4    0   0    0    0   0 ]
    [  0    π   0    0    0   0 ]
    [  e    1   5    0    0   0 ]
    [ 2.8   √2  0   3.1   0   0 ]

has HNF.
Note that if A ∈ Rm×n has HNF, then rank(A) = m. Moreover, Λ(A) is a
lattice and the nonzero columns of A form a lattice basis of Λ(A).
Theorem 3.6. For any two matrices A, A′ ∈ R^{m×n} in HNF we have

    Λ(A) = Λ(A′)  ⟺  A = A′.

Proof. The implication ⟸ is clear. Let A, A′ ∈ R^{m×n} be matrices in HNF with Λ(A) = Λ(A′). For the sake of contradiction, suppose that A ≠ A′. Let us choose i ∈ [m] minimal such that there is some j ∈ [i] with

    A_{i,j} ≠ A′_{i,j}.

Without loss of generality, we may assume that A′_{i,i} ≤ A_{i,i}. Since A′_{∗,j} ∈ Λ(A) and due to the triangular shape (2) of A there exist coefficients λ₁, . . . , λ_i ∈ Z with

    A′_{[i],j} = Σ_{ℓ=1}^{i} λ_ℓ A_{[i],ℓ}    (3.1)

and

    A′_{[i−1],j} = Σ_{ℓ=1}^{i−1} λ_ℓ A_{[i−1],ℓ}.    (3.2)

If j = i, then the linear independence of A_{[i−1],1}, . . . , A_{[i−1],i−1} and A′_{[i−1],i} = 0 together with (3.2) would imply

    λ₁ = · · · = λ_{i−1} = 0.

In this case, (3.1) yields A′_{i,i} = λ_i A_{i,i}. However, since A_{i,j} ≠ A′_{i,j} and j = i, we even have A′_{i,i} < A_{i,i}, which yields λ_i = A′_{i,i} / A_{i,i} < 1. The nonnegativity (by (1)) and integrality of λ_i implies λ_i = 0 and hence A′_{i,i} = 0, a contradiction to (3).
Hence, we must have j < i. By the minimality of i, we have A′_{[i−1],ℓ} = A_{[i−1],ℓ} for all ℓ. Again by the triangular shape of A and (3.2) we see that

    λ_ℓ = 1 if ℓ = j,  and  λ_ℓ = 0 if ℓ ∈ [i − 1] \ {j}

holds. Thus, by (3.1), we obtain

    A′_{i,j} = A_{i,j} + λ_i A_{i,i}.

Recall that we assumed A_{i,j} ≠ A′_{i,j} and hence λ_i ≠ 0. If λ_i > 0, then A′_{i,j} ≥ A_{i,i} ≥ A′_{i,i}, a contradiction to (4). If λ_i < 0, then A_{i,j} = A′_{i,j} − λ_i A_{i,i} ≥ A_{i,i}, again a contradiction to (4).
Thus, if two matrices have HNF, then it is easy to tell if they generate the
same lattice. Next, we will see that every rational lattice is generated by a
matrix in HNF. To see this, we will start with any lattice basis and try to put
it into HNF using simple operations that do not affect the lattice.
Definition 3.7 (Elementary column operations). The following operations on
a matrix are called elementary column operations (ECOs).
1. interchanging two columns
2. multiplication of a column by −1
3. addition of an integer multiple of a column to another column
Let us collect a few simple observations about ECOs. First, for every ECO
ω and matrices A, B we have

ω(A · B) = A · ω(B). (3.3)

Second, if λ ∈ R, then we always have

ω(λB) = λ · ω(B). (3.4)

Moreover, it turns out that ECOs do not affect lattices, or, more generally:

Observation 3.8. If ω is an ECO on a matrix A, then we have Λ(ω(A)) = Λ(A).

Proof. It is clear that Λ(ω(A)) ⊆ Λ(A). It remains to observe that there exists an ECO ω⁻¹ that is inverse to ω, and hence

    Λ(A) = Λ(ω⁻¹(ω(A))) ⊆ Λ(ω(A)),

which yields the claim.


Theorem 3.9. For every rational matrix A ∈ Q^{m×n} with rank(A) = m there exists a sequence of ECOs such that the resulting matrix has HNF. This matrix is unique and is called the Hermite normal form of A.

Proof. Suppose that A′ and A″ have HNF and both arise from A by applying a sequence of ECOs. By Observation 3.8 we see that Λ(A′) = Λ(A″) = Λ(A) holds, and hence Theorem 3.6 yields A′ = A″.
Thus, it remains to show the existence of a sequence of ECOs resulting in a matrix that has HNF. By (3.4), we may assume that A has only integer entries (multiplying a matrix in HNF with a positive scalar yields again a matrix in HNF).
Suppose we have already performed some ECOs to obtain a matrix of the form

    [ B  0 ]
    [ C  D ],

where B is a lower triangular matrix with B_{i,i} > 0 for all i and D ∈ Z^{(m−n+ñ)×ñ}. (At the beginning, we have ñ = n and B is a (0 × 0)-matrix.)

Step (∗): Among all finite sequences of ECOs that transform the first row of D such that

    D_{1,1} ≥ D_{1,2} ≥ · · · ≥ D_{1,ñ} ≥ 0,    (3.5)

apply one for which

    Σ_{j=1}^{ñ} D_{1,j} is minimal.    (3.6)

(Such a sequence exists since the sum can only take nonnegative integer values.) By (3.5) and rank(A) = m, we must have D_{1,1} > 0, and by (3.6) we even have D_{1,2} = D_{1,3} = · · · = D_{1,ñ} = 0. (If D_{1,j} > 0 for some j ≥ 2, we may subtract the j-th column of D from the first column and possibly reorder the columns of D to contradict (3.6).) Thus, we see that one can iteratively apply a sequence of ECOs to A to obtain a matrix

    [ B  0 ],

where B is lower triangular with B_{i,i} > 0 for all i.
To obtain nonnegativity and diagonal dominance, we perform the following two steps for each row i = 2, . . . , m (in this order) and every j = 1, . . . , i − 1:

1. Determine µ, r ∈ Z with B_{i,j} = µB_{i,i} + r, where 0 ≤ r < B_{i,i}.
2. Add (−µ) times the ith column of B to the jth column of B.

We remark that Step (∗) may be replaced by the following algorithm to achieve (3.5) and (3.6). Suppose that D_{1,i} ≥ D_{1,j} > 0 with i ≠ j. If D_{1,j} ≤ ½·D_{1,i}, then we may write D_{1,i} = µD_{1,j} + r with µ, r ∈ Z and 0 ≤ r < D_{1,j} ≤ ½·D_{1,i}, and add (−µ) times the jth column to the ith column. Otherwise, we simply add (−1) times the jth column to the ith column. Note that in each such step, we decrease a positive entry D_{1,i} by at least a factor of 2. Thus, the number of iterations required to achieve (3.5) and (3.6) is polynomial in the encoding length of A.
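As the remark suggests, the ECO-based procedure is easy to turn into working code; the following naive Python sketch (function name ours; this is not the polynomially-bounded Kannan-Bachem algorithm, so intermediate entries may grow) computes the HNF of an integer matrix of full row rank:

    def hermite_normal_form(A):
        """Bring an integer m x n matrix of full row rank into HNF using
        elementary column operations (naive version; entries may grow)."""
        A = [row[:] for row in A]
        m, n = len(A), len(A[0])
        for i in range(m):
            while any(A[i][j] != 0 for j in range(i + 1, n)):
                # move a nonzero entry of smallest absolute value to column i
                j = min((j for j in range(i, n) if A[i][j] != 0),
                        key=lambda j: abs(A[i][j]))
                for r in range(m):
                    A[r][i], A[r][j] = A[r][j], A[r][i]
                if A[i][i] < 0:
                    for r in range(m):
                        A[r][i] = -A[r][i]
                for j in range(i + 1, n):          # reduce entries to the right
                    mu = A[i][j] // A[i][i]
                    for r in range(m):
                        A[r][j] -= mu * A[r][i]
            if A[i][i] < 0:
                for r in range(m):
                    A[r][i] = -A[r][i]
            for j in range(i):                     # enforce 0 <= A[i][j] < A[i][i]
                mu = A[i][j] // A[i][i]
                for r in range(m):
                    A[r][j] -= mu * A[r][i]
        return A

    print(hermite_normal_form([[3, 5]]))   # [[1, 0]]: indeed Lambda({3, 5}) = Z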
However, recall that to obtain a polynomial-time algorithm, we also need to
take care of the numbers involved in that process. With some further modifi-
cation of the above algorithm, it is possible to obtain a true polynomial-time
algorithm for computing the Hermite normal form:
Theorem 3.10 (Kannan, Bachem 1979). There is a polynomial-time algorithm
for computing the Hermite normal form of any rational matrix.
Corollary 3.11. Given a rational matrix A (of any rank), we have that Λ(A)
is a lattice. Moreover, we can compute a lattice basis of Λ(A) in polynomial
time.
Proof. Given A ∈ Q^{m×n} with rank(A) = k, using Gaussian elimination we may compute a rational matrix Ã ∈ Q^{k×n} with rank(Ã) = k and a matrix V ∈ Q^{m×k} such that A = V·Ã.
By Theorem 3.10, we may compute a lattice basis B̃ of Λ(Ã). Then B := V·B̃ is a lattice basis of Λ(A) since

    Λ(A) = Λ(V·Ã) = V·Λ(Ã) = V·Λ(B̃) = Λ(V·B̃) = Λ(B).

3.3 Unimodular matrices (Lecture 7)

In the previous section, we have seen that the Hermite normal form of a ratio-
nal matrix can be obtained by ECOs. It turns out that ECOs correspond to
multiplications of very particular matrices, which will be helpful for our under-
standing of the solution space of linear diophantine systems.
Definition 3.12 (Unimodular matrix). A square matrix U is called unimodular
if all entries are integer and | det(U )| = 1.
Theorem 3.13. For every invertible integer matrix U ∈ Z^{n×n}, the following statements are equivalent.

1. U is unimodular.
2. U^{-1} ∈ Z^{n×n}.
3. U^{-1} is unimodular.
4. U·Z^n = Z^n.
5. Λ(U) = Z^n.
6. The Hermite normal form of U is the identity matrix.
7. U arises from the identity matrix by a sequence of ECOs.

Proof. 1 ⟹ 2: Recall that Cramer’s rule states that for an invertible matrix A, the solution of Ax = b is given by x_i = det(A_i)/det(A), where A_i arises from A by replacing its ith column with b. The claim follows by applying Cramer’s rule to the matrix equation UX = I and using the fact that the determinant of an integer matrix is an integer.
2 ⟹ 3: This holds since 1 = det(U)·det(U^{-1}) and the fact that det(U), det(U^{-1}) are both nonzero integers.
3 ⟹ 4: We have Z^n = (UU^{-1})Z^n = U(U^{-1}Z^n) ⊆ U·Z^n ⊆ Z^n.
4 ⟹ 5: Note that Λ(U) = U·Z^n.
5 ⟹ 6: This follows from the uniqueness of the Hermite normal form, see Theorem 3.9.
6 ⟹ 7: This follows again from Theorem 3.9 and the fact that every ECO has an inverse ECO.
7 ⟹ 1: The application of an ECO does not change the absolute value of the determinant.
Corollary 3.14. For every rational matrix A ∈ Q^{m×n} with rank(A) = m there is a unimodular matrix U ∈ Z^{n×n} such that AU is the Hermite normal form of A. Moreover, U can be found in polynomial time.

Proof. By Theorem 3.10 we can compute the Hermite normal form H of A, where we may record the sequence ω of ECOs that transforms A into H. Letting U := ω(I), which can be computed in polynomial time by applying the same ECOs to the identity matrix, we see that U is a unimodular matrix (by the previous theorem) and that

    H = ω(A) = ω(A·I) = A·ω(I) = AU

holds, using (3.3).

3.4 Solving linear Diophantine systems

We are ready to present a polynomial-time algorithm for solving linear Diophantine systems. More concretely, we show the following:
Theorem 3.15. There is a polynomial-time algorithm that, given A ∈ Q^{m×n}, b ∈ Q^m, computes the following.

• If Ax = b has no solution x ∈ Z^n, it computes a vector y ∈ Q^m with y^T A ∈ Z^n but y^T b ∉ Z.
• Otherwise, it computes a vector x^(0) ∈ Z^n and linearly independent vectors x^(1), . . . , x^(t) ∈ Z^n with t = n − rank(A) such that

    {x ∈ Z^n : Ax = b} = x^(0) + Λ(x^(1), . . . , x^(t))
                       = { x^(0) + Σ_{j=1}^{t} λ_j x^(j) : λ₁, . . . , λ_t ∈ Z }.

Before we describe the above algorithm, let us mention that the above theorem yields an elegant characterization of the feasibility of linear Diophantine systems. In case there is no solution, the vector y can be seen as a simple certificate showing that the system is indeed infeasible: If there were a solution x ∈ Z^n with Ax = b, then y^T b = y^T Ax ∈ Z, contradicting y^T b ∉ Z. Note that a similar characterization holds for systems of linear equations: If Ax = b has no solution in R^n, then there exists some y ∈ R^m such that y^T A = 0 but y^T b = 1, which is also known as the Fredholm alternative.
In case the system has a solution, we see that the solution space is again similar to what we know for linear systems.
Proof of Theorem 3.15. First, we perform Gaussian elimination to see whether Ax = b has any solution x ∈ R^n. If not, we may obtain some y ∈ Q^m such that y^T A = 0 and y^T b = 1/2. If it has a solution, we may determine redundant rows whose removal does not change the solution space, and hence we can assume that rank(A) = m.
By Corollary 3.14, in polynomial time we can compute a unimodular matrix U ∈ Z^{n×n} such that AU is in Hermite normal form. Recall that AU = [B 0] for some invertible matrix B ∈ Q^{m×m}.
For any x ∈ R^n let z = (z′, z″) := U^{-1}x with z′ ∈ R^m, z″ ∈ R^{n−m}. We have:

    Ax = b, x ∈ Z^n
    ⟺ AUz = b, z ∈ Z^n
    ⟺ [B 0](z′, z″) = b, z′ ∈ Z^m, z″ ∈ Z^{n−m}
    ⟺ z′ = B^{-1}b, B^{-1}b ∈ Z^m, z″ ∈ Z^{n−m}.

We observe that if Ax = b has no integer solution, then there is some index i ∈ [m] with (B^{-1}b)_i ∉ Z. Choosing y as the ith row of B^{-1}, we obtain

    y^T A = y^T [B 0] U^{-1} = [e_i^T 0] U^{-1} = (U^{-1})_{i,∗} ∈ Z^n

and

    y^T b = (B^{-1}b)_i ∉ Z.

Otherwise, we have B^{-1}b ∈ Z^m, and writing U = [U′ U″] with U′ ∈ Z^{n×m}, U″ ∈ Z^{n×(n−m)} we see that

    {x ∈ Z^n : Ax = b} = { [U′ U″](z′, z″) : z′ = B^{-1}b, z″ ∈ Z^{n−m} }
                       = { U′B^{-1}b + U″z″ : z″ ∈ Z^{n−m} }

holds. Hence, we may choose x^(0) := U′B^{-1}b and x^(1), . . . , x^(t) as the columns of U″ (where t = n − m = n − rank(A)).
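The proof is constructive. A naive Python sketch (ours; again without the care needed for polynomial running time) records the unimodular matrix U by mirroring every column operation of the HNF sketch from Section 3.2 on an identity matrix; we assume an integer matrix A of full row rank and an integer right-hand side b for simplicity:

    from fractions import Fraction

    def solve_diophantine(A, b):
        """Sketch of Theorem 3.15: returns (x0, [x1, ..., xt]) describing the
        solution set x0 + Lambda(x1, ..., xt) of Ax = b over the integers,
        or None if no integer solution exists."""
        m, n = len(A), len(A[0])
        H = [row[:] for row in A]
        U = [[int(i == j) for j in range(n)] for i in range(n)]  # records ECOs

        def add_col(dst, src, mu):        # column_dst += mu * column_src
            for M in (H, U):
                for r in range(len(M)):
                    M[r][dst] += mu * M[r][src]

        def swap_col(i, j):
            for M in (H, U):
                for r in range(len(M)):
                    M[r][i], M[r][j] = M[r][j], M[r][i]

        def negate_col(j):
            for M in (H, U):
                for r in range(len(M)):
                    M[r][j] = -M[r][j]

        for i in range(m):                # bring H into the form [B 0]
            while any(H[i][j] != 0 for j in range(i + 1, n)):
                j = min((j for j in range(i, n) if H[i][j] != 0),
                        key=lambda j: abs(H[i][j]))
                swap_col(i, j)
                if H[i][i] < 0:
                    negate_col(i)
                for j in range(i + 1, n):
                    add_col(j, i, -(H[i][j] // H[i][i]))
            if H[i][i] < 0:
                negate_col(i)

        # forward substitution: z' = B^{-1} b must be integral
        z = [Fraction(0)] * m
        for i in range(m):
            z[i] = (Fraction(b[i]) - sum(H[i][k] * z[k] for k in range(i))) / H[i][i]
            if z[i].denominator != 1:
                return None               # no integer solution exists
        x0 = [sum(U[r][k] * int(z[k]) for k in range(m)) for r in range(n)]
        basis = [[U[r][k] for r in range(n)] for k in range(m, n)]  # columns of U''
        return x0, basis

    print(solve_diophantine([[2, 4]], [6]))   # ([3, 0], [[-2, 1]])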

3.5 Minkowski’s theorem (Lecture 8)

Since we have already obtained some understanding of lattices, let us use this opportunity to visit an extremely elegant and powerful result in finite-dimensional convexity, Minkowski’s convex body theorem. To this end, we need the following notions.

Definition 3.16 (Determinant of a lattice). Let Λ ⊆ Rn be a lattice with lattice


basis B ∈ Rn×n . The determinant of Λ is defined as det(Λ) = | det(B)|.
Lemma 3.17. The determinant of a lattice is well-defined, i.e., it does not
depend on the choice of B.
Proof. Let B, B′ denote lattice bases of Λ. Then there is a unimodular matrix
U with B′ = BU (why?). We obtain | det(B′)| = | det(BU )| = | det(B)| ·
| det(U )| = | det(B)|.
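
As a quick sanity check on a toy example (the bases below are our own), two
different bases of the same lattice give the same determinant:

# Two bases of the lattice Lambda = {x in Z^2 : x1 + x2 even}, as columns.
B1 = [[2, 1],
      [0, 1]]
B2 = [[2, 3],   # B2 = B1 * U for the unimodular U = [[1, 1], [0, 1]]
      [0, 1]]

det = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(abs(det(B1)), abs(det(B2)))  # 2 2 -- det(Lambda) is well-defined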
Definition 3.18 (Fundamental parallelepiped). A fundamental parallelepiped
of a lattice Λ ⊆ Rn is a set of the form

{λ1 b1 + · · · + λn bn : λ1 , . . . , λn ∈ [0, 1)} ,

where b1 , . . . , bn ∈ Rn is a lattice basis of Λ.


Observation 3.19. The determinant of a lattice is equal to the volume of any
of its fundamental parallelepipeds.

Lemma 3.20. Let Λ ⊆ Rn be a lattice and Π be a fundamental parallelepiped


of Λ. For every point x ∈ Rn there exist unique v ∈ Λ and y ∈ Π such that
x = v + y.
Proof. Suppose that Π is induced by the lattice basis b1 , . . . , bn of Λ. Since
b1 , . . . , bn is a basis of Rn , we can write x = ∑_{i=1}^n µi bi for some
µ1 , . . . , µn ∈ R. Let

    v = ∑_{i=1}^n ⌊µi⌋ bi   and   y = ∑_{i=1}^n (µi − ⌊µi⌋) bi .

Clearly, we have v ∈ Λ, y ∈ Π, and x = v + y.

Suppose there are two decompositions x = v + y = v′ + y′ with v, v′ ∈ Λ and

    y = ∑_{i=1}^n αi bi   and   y′ = ∑_{i=1}^n α′i bi ,

where 0 ≤ αi , α′i < 1 for i = 1, . . . , n. Then

    v − v′ = y′ − y = ∑_{i=1}^n γi bi ,

where γi = α′i − αi . Note that |γi | < 1 for all i and that v − v′ ∈ Λ. Since
b1 , . . . , bn is a lattice basis of Λ, the numbers γi must be integers and hence are
all zero. We conclude that αi = α′i for all i and hence y′ = y, v′ = v.

Corollary 3.21. Let Λ ⊆ Rn be a lattice and Π be a fundamental parallelepiped


of Λ. Then the sets {Π + v : v ∈ Λ} cover Rn without overlapping.
Before we get to Minkowski’s theorem, let us start with a result of Blichfeldt
(1914).

Theorem 3.22 (Blichfeldt’s theorem). Let Λ ⊆ Rn be a lattice and let S ⊆ Rn


be a set with vol(S) > det(Λ). Then there exist two points x, y ∈ S, x ≠ y such
that x − y ∈ Λ.

Proof. Let Π be a fundamental parallelepiped of Λ. For every lattice point
u ∈ Λ let us define the set

    Su = {y ∈ Π : y + u ∈ S}.

Note that Su + u = (Π + u) ∩ S and hence the previous corollary implies that
the translates Su + u cover S without overlapping. Thus, we have

    ∑_{u∈Λ} vol(Su ) = ∑_{u∈Λ} vol(Su + u) = vol(S) > det(Λ) = vol(Π).

Since Su ⊆ Π for all u ∈ Λ, not all Su can be pairwise disjoint (otherwise we
would have ∑_{u∈Λ} vol(Su ) ≤ vol(Π)). Thus, there exist u, u′ ∈ Λ, u ≠ u′ with
Su ∩ Su′ ≠ ∅. Let z ∈ Su ∩ Su′ . Then x := z + u ∈ S and y := z + u′ ∈ S and
x − y = u − u′ ∈ Λ, as claimed.
We are ready to state and prove Minkowski’s theorem.

Theorem 3.23 (Minkowski’s convex body theorem). Let Λ ⊆ Rn be a lattice


and C ⊆ Rn be a convex set such that vol(C) > 2n det(Λ), and C is symmetric
about the origin, i.e., C = −C. Then C contains a nonzero lattice point from
Λ. Furthermore, if C is compact, then the statement even holds if vol(C) ≥
2n det(Λ).

Proof. Letting K := (1/2)C we have vol(K) = 2−n vol(C) > det(Λ). By Blichfeldt’s
theorem, there exist x, y ∈ K, x ≠ y such that u := x − y ∈ Λ \ {0}. Note that
2x, 2y ∈ C and since C is symmetric about the origin we also have −2y ∈ C.
Since C is convex, we obtain

    u = x − y = (1/2)(2x) + (1/2)(−2y) ∈ C,
which yields the first part of the claim. The second part follows by standard
compactness arguments and the fact that lattices are discrete (which you proved
in the exercises).

We encourage the reader to convince herself that all assumptions in Minkowski’s


theorem are necessary. It turns out that Minkowski’s theorem has numerous ap-
plications, and we will also make use of it later during this course. With the
following example we would like to highlight its importance within the geome-
try of numbers, showing that it can be used to derive some seemingly unrelated
nontrivial results in number theory.
Theorem 3.24 (Lagrange’s four-square theorem (1770)). Every positive integer
is the sum of four square numbers. That is, for every n ∈ Z≥1 there exist
x1 , x2 , x3 , x4 ∈ Z such that n = x1² + x2² + x3² + x4² .

Proof. First, let us argue that it suffices to prove the claim for prime numbers
n. To see this, suppose that n = pq for smaller integers p, q for which we have
already found integers x1 , x2 , x3 , x4 and y1 , y2 , y3 , y4 with p = x1² + x2² + x3² + x4² ,
q = y1² + y2² + y3² + y4² . Then it is straightforward (but tedious) to check that
n = n1² + n2² + n3² + n4² (Euler’s four-square identity), where:

    n1 = x1 y1 − x2 y2 − x3 y3 − x4 y4
    n2 = x1 y2 + x2 y1 + x3 y4 − x4 y3
    n3 = x1 y3 − x2 y4 + x3 y1 + x4 y2
    n4 = x1 y4 + x2 y3 − x3 y2 + x4 y1

Assuming that n is prime, let us show that there are integers α, β such that
α² + β² + 1 ≡ 0 mod n. If n = 2, then choose α = 0, β = 1. Otherwise,
n is odd. We claim that the numbers α² with 0 ≤ α < n/2 have distinct
residues modulo n: Suppose α1² ≡ α2² mod n for some 0 ≤ α1 , α2 < n/2. Then
(α1 − α2 )(α1 + α2 ) ≡ 0 mod n and hence n divides α1 − α2 or α1 + α2 . However,
the absolute value of both numbers is strictly smaller than n and hence α1 = α2 .
Similarly, we see that the numbers −1 − β² with 0 ≤ β < n/2 also have distinct
residues modulo n. Since there are (n + 1)/2 such numbers α² and (n + 1)/2
such numbers −1 − β² , but only n residues modulo n, by the pigeonhole principle
some α² and some −1 − β² share a residue, which yields the claim.
Thus, there exist integers α, β with α² + β² + 1 ≡ 0 mod n. Consider the
lattice

    Λ = {x ∈ Z4 : x1 ≡ αx3 + βx4 mod n, x2 ≡ βx3 − αx4 mod n}.

In the exercises, we will see that det(Λ) = n² holds. Moreover, the 4-dimensional
open ball B = {x ∈ R4 : ‖x‖2² < 2n} satisfies

    vol(B) = 2n²π² > 16n² = 2⁴ det(Λ).

Thus, by Minkowski’s theorem there is a point x ∈ Λ with 0 < x1² + x2² + x3² + x4² <
2n. Note that

    x1² + x2² + x3² + x4² ≡ (α² + β² + 1)x3² + (α² + β² + 1)x4² ≡ 0 mod n.

Thus, x1² + x2² + x3² + x4² is divisible by n and hence it must be equal to n.
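
Both ingredients are easy to check numerically for small inputs. A naive sketch
(brute force, not the lattice argument itself; function names are ours):

import itertools, math

def four_squares(n):
    """Brute-force search for (x1, x2, x3, x4) with n = x1^2 + ... + x4^2,
    a direct check of Theorem 3.24."""
    r = math.isqrt(n)
    for x in itertools.product(range(r + 1), repeat=4):
        if sum(v * v for v in x) == n:
            return x

def alpha_beta(p):
    """Find alpha, beta with alpha^2 + beta^2 + 1 = 0 mod p for a prime p,
    via the counting argument from the proof above."""
    squares = {(a * a) % p: a for a in range(p // 2 + 1)}
    for b in range(p // 2 + 1):
        if (-1 - b * b) % p in squares:
            return squares[(-1 - b * b) % p], b

print(four_squares(310))  # (0, 2, 9, 15), since 4 + 81 + 225 = 310
print(alpha_beta(11))     # (3, 1), since 9 + 1 + 1 = 11 = 0 mod 11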
Chapter 4

Integer hulls

4.1 Refresher on polyhedra Lecture 9

Now that we have seen that linear programming and linear Diophantine systems
can be solved in polynomial time, we turn our full attention to integer programs.
A central aspect of our studies will be to understand the structure of integer
points in polyhedra. To put the upcoming results in context, we begin with the
following refresher on the basics of polyhedra (some of which we have already
implicitly used in earlier parts).
Definition 4.1 (Polyhedron). A polyhedron is a set of the form P ≤ (A, b) :=
{x ∈ Rn : Ax ≤ b} for some A ∈ Rm×n , b ∈ Rm . If A and b are rational, we
say that P ≤ (A, b) is a rational polyhedron.
While the above definition is based on outer descriptions of polyhedra by
means of linear inequalities, polyhedra can also be specified by inner descrip-
tions, see Theorem 4.8.
Definition 4.2 (Polytope). A polytope is a bounded polyhedron.
Definition 4.3 (Convex hull). The convex hull of a set X ⊆ Rn is the smallest
convex set that contains X, and is denoted by conv(X).
Definition 4.4 (Cone). A nonempty set C ⊆ Rn is a cone if for every x ∈ C
the ray {λx : λ ≥ 0} is also contained in C.
Definition 4.5 (Convex conic hull). The convex conic hull of a set Y ⊆ Rn is
the smallest convex cone that contains Y , and is denoted by ccone(Y ).
Definition 4.6 (Convex and conic combinations). Given z1 , . . . , zk ∈ Rn and
λ1 , . . . , λk ∈ R, we say that the linear combination

λ1 z1 + · · · + λk zk

is a conic combination of z1 , . . . , zk if λ1 , . . . , λk ≥ 0. We say that it is a convex


combination of z1 , . . . , zk if λ1 , . . . , λk ≥ 0 and λ1 + · · · + λk = 1.
Theorem 4.7 (Carathéodory). The convex hull of a set X ⊆ Rn is the set of
all convex combinations of at most n + 1 elements from X. The conic hull of a
set Y ⊆ Rn is the set of all conic combinations of at most n elements from Y .


Theorem 4.8 (Minkowski, Weyl). A set P is a polyhedron if and only if there


exist finite sets X, Y ⊆ Rn such that P = conv(X) + ccone(Y ).

Remark 4.9. In the above statement, P is rational if and only if X, Y can be


chosen to consist of rational vectors only.
Remark 4.10. There is a polynomial p such that the following holds.
• For every A ∈ Qm×n , b ∈ Qm there exist finite sets X, Y ⊆ Qn with
  P ≤ (A, b) = conv(X) + ccone(Y ) where the encoding length of every vector
  in X ∪ Y is at most p(⟨A, b⟩).
• For all finite sets X, Y ⊆ Qn there exist A ∈ Qm×n , b ∈ Qm with
  P ≤ (A, b) = conv(X) + ccone(Y ) where the encoding length of every entry
  in A, b is at most p(⟨X, Y ⟩max ), where ⟨X, Y ⟩max := max_{z∈X∪Y} ⟨z⟩.

Be careful: in the first part of the statement, the number of vectors in X ∪ Y
may be exponential in the size of A and b. Conversely, in the second part of
the statement, the number of rows in A, b may be exponential in the number of
vectors in X ∪ Y .
Proposition 4.11 (Characteristic cone). For every polyhedron P ≤ (A, b) =
conv(X) + ccone(Y ), the characteristic cone (of all unbounded directions)

char(P ) := {y ∈ Rn : x + λy ∈ P for all x ∈ P, λ ≥ 0}

satisfies
char(P ) = P ≤ (A, 0) = ccone(Y ).

Remark 4.12. A polyhedron P is bounded if and only if char(P ) = {0}.

4.2 Integer points in polyhedra


In this part, we will see that integer points in a rational polyhedron can be
parametrized in a similar way as in Theorem 4.8.
Definition 4.13. For a finite set Y ⊆ Rn we define

    Λ+ (Y ) := { ∑_{y∈Y} λy y : λy ∈ Z≥0 for all y ∈ Y }.

Theorem 4.14. For every rational polyhedron P there exist finite sets X̃, Ỹ ⊆
Zn such that
P ∩ Zn = X̃ + Λ+ (Ỹ ).
If P ∩ Zn ≠ ∅, we have
char(P ) = ccone(Ỹ ).
Moreover, if P = P ≤ (A, b) for rational A, b, then X̃, Ỹ can be chosen such that
the encoding length of every vector in X̃ ∪ Ỹ is polynomially bounded in the
encoding lengths of A, b.

Proof. By Theorem 4.8 (together with Remark 4.10) there exist finite sets
X, Y ⊆ Qn with P = conv(X) + ccone(Y ) such that the encoding length of
every vector in X ∪ Y is polynomially bounded in ⟨A, b⟩. For every
y ∈ Y let α(y) ∈ Z≥1 denote the least common multiple of all denominators of
the entries in y. Setting

    Ỹ := {α(y)y : y ∈ Y },

we see that ccone(Ỹ ) = ccone(Y ) = char(P ) and that the encoding length of
every vector in Ỹ is polynomially bounded in ⟨A, b⟩. Let us also define

    X̃ := (conv(X) + Q) ∩ Zn ,

where

    Q := { ∑_{y∈Y′} λy y : Y′ ⊆ Ỹ , |Y′| ≤ n, 0 ≤ λy ≤ 1 for all y ∈ Y′ }.

First, note that conv(X) + Q is a bounded set and hence X̃ is finite. Second,
every point z ∈ X̃ is integer and satisfies

    ‖z‖∞ ≤ max{‖x‖∞ : x ∈ X} + n · max{‖y‖∞ : y ∈ Ỹ }.

Thus, the encoding length of every point in X̃ is also polynomially bounded in
⟨A, b⟩.
It remains to show that P ∩Zn = X̃ +Λ+ (Ỹ ) holds. To this end, first observe
that Q ⊆ char(P ), implying conv(X) + Q ⊆ P and hence X̃ ⊆ P . Thus, we
obtain
X̃ + Λ+ (Ỹ ) ⊆ P + Λ+ (Ỹ ) ⊆ P + ccone(Ỹ ) = P + char(P ) = P.
For the reverse inclusion, let z ∈ P ∩ Zn . Using

    P = conv(X) + ccone(Y ) = conv(X) + ccone(Ỹ )

we see that there exist x ∈ conv(X) and v ∈ ccone(Ỹ ) with z = x + v. By
Carathéodory’s theorem, there is a set Y′ ⊆ Ỹ with |Y′| ≤ n together with
λy ≥ 0 for all y ∈ Y′ such that v = ∑_{y∈Y′} λy y. Writing

    z = x + v = x + r + ṽ,   where r := ∑_{y∈Y′} (λy − ⌊λy⌋)y and ṽ := ∑_{y∈Y′} ⌊λy⌋y,

we see that r ∈ Q and ṽ ∈ Λ+ (Ỹ ). Moreover, we have x + r ∈ conv(X) + Q and


x + r = z − ṽ ∈ Zn , and hence x + r ∈ X̃. Thus, we obtain z = (x + r) + ṽ ∈
X̃ + Λ+ (Ỹ ).
Note that if we are explicitly given X̃, Ỹ as in the above theorem, with
X̃ ≠ ∅, then given an objective c ∈ Rn it is easy to solve the integer program
max{c| x : x ∈ P ∩ Zn }. In fact, if c| y > 0 for some y ∈ Ỹ , then we know that
the optimum value is +∞. Otherwise, we see that max{c| x : x ∈ P ∩ Zn } =
max{c| x : x ∈ X̃} holds.
However, again the number of vectors in X̃ ∪ Ỹ can be exponential in the
size of A, b, and hence computing X̃, Ỹ explicitly is typically not possible.

Corollary 4.15. The decision version of integer programming is in NP, and is


hence NP-complete.

Proof. We have already discussed that the decision version of integer program-
ming is NP-hard, and hence it remains to show that it is in NP. Suppose we are
given A ∈ Qm×n , b ∈ Qm that corresponds to a yes-instance, i.e., P := P ≤ (A, b)
contains an integer point. By Theorem 4.14 there exist finite sets X, Y ⊆ Zn
such that
P ∩ Zn = X + Λ+ (Y )
and the encoding length of every vector in X ∪ Y is polynomially bounded
in the encoding lengths of A, b. Thus, we may pick any vector x ∈ X as a
certificate showing that P contains an integer point: x itself is such a point, has
polynomially bounded encoding length, and it can be efficiently checked that x
satisfies all constraints defining P .
Definition 4.16 (Hilbert basis). A Hilbert basis of a set C ⊆ Rn is a finite set
H ⊆ Zn such that C ∩ Zn = Λ+ (H).

Corollary 4.17. Every rational polyhedral cone1 has a Hilbert basis.


Proof. Let C ⊆ Rn be a rational polyhedral cone. By Theorem 4.14 there exist
finite sets X, Y ⊆ Zn such that

C ∩ Zn = X + Λ+ (Y ).

Note that since X ∪Y ⊆ C and C is a convex cone, we also have Λ+ (X ∪Y ) ⊆ C


and hence
C ∩ Zn = X + Λ+ (Y ) ⊆ Λ+ (X ∪ Y ) ⊆ C ∩ Zn ,
which shows C ∩ Zn = Λ+ (X ∪ Y ).
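
For pointed full-dimensional cones this proof is effective. The following sketch
computes a (not necessarily minimal) Hilbert basis of a two-dimensional cone by
enumerating the integer points in the parallelepiped spanned by the generators,
a standard construction (function name and toy generators are ours):

from fractions import Fraction

def hilbert_basis_2d(y1, y2):
    """Integer points in {l1*y1 + l2*y2 : 0 <= l1, l2 <= 1} for a pointed
    full-dimensional cone ccone{y1, y2} in the plane -- a (not necessarily
    minimal) Hilbert basis."""
    det = y1[0] * y2[1] - y1[1] * y2[0]
    assert det != 0, "generators must be linearly independent"
    xs = [0, y1[0], y2[0], y1[0] + y2[0]]
    ys = [0, y1[1], y2[1], y1[1] + y2[1]]
    pts = []
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            # solve l1*y1 + l2*y2 = (x, y) exactly via Cramer's rule
            l1 = Fraction(x * y2[1] - y * y2[0], det)
            l2 = Fraction(y1[0] * y - y1[1] * x, det)
            if 0 <= l1 <= 1 and 0 <= l2 <= 1 and (x, y) != (0, 0):
                pts.append((x, y))
    return pts

print(hilbert_basis_2d((1, 0), (1, 2)))  # [(1, 0), (1, 1), (1, 2), (2, 2)]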

4.3 Integer hull Lecture 10

In the previous part, we have seen that integer points in rational polyhedra can
be nicely parametrized. This allowed us to see that if a rational polyhedron
contains an integer point, we will also find one with small encoding length,
showing that the decision version of integer programming is in NP. Another
important consequence is that the convex hull of all integer points in a rational
polyhedron is again a rational polyhedron.

Definition 4.18 (Integer hull). The integer hull of a convex set C ⊆ Rn is


CI := conv(C ∩ Zn ).
Theorem 4.19. The integer hull PI of a rational polyhedron P is again a
rational polyhedron. If PI ≠ ∅, then we have char(P ) = char(PI ). Moreover,
if P = P ≤ (A, b) for some rational A, b, then there exist rational D, d such that
PI = P ≤ (D, d) where the encoding length of every entry in D, d is polynomially
bounded in ⟨A, b⟩.
1A set is a polyhedral cone if it is a polyhedron and a cone.

Before we prove the above statement, let us mention that if P is not rational,
PI is not a polyhedron in general (see the exercises). Of course, the first part
of the above statement becomes trivial if P is a polytope, in which case PI is
the convex hull of a finite number of integer points, and hence it is clear that
PI is a rational polytope. In the proof of Theorem 4.19 we make use of the
following lemma.
Lemma 4.20. For all finite sets X, Y ⊆ Rn we have

conv(X + Λ+ (Y )) = conv(X) + ccone(Y ).

Proof. See exercises.


Proof of Theorem 4.19. Let P = P ≤ (A, b) be a rational polyhedron with A, b
rational. We may assume that P ∩ Zn ≠ ∅. By Theorem 4.14 there are finite
sets X, Y ⊆ Zn such that P ∩ Zn = X + Λ+ (Y ) and char(P ) = ccone(Y ), where
the encoding length of every vector in X ∪ Y is polynomially bounded in ⟨A, b⟩.
By Lemma 4.20, we see that

    PI = conv(P ∩ Zn ) = conv(X + Λ+ (Y )) = conv(X) + ccone(Y )

is a rational polyhedron. By Proposition 4.11, we have char(PI ) = ccone(Y ) =
char(P ). Moreover, by Remark 4.10 we see that there exist rational D, d with
PI = P ≤ (D, d) such that the encoding length of every entry in D, d is polynomially
bounded in ⟨A, b⟩.

4.4 Refresher on faces of polyhedra


Suppose we are given an integer program max{c| x : x ∈ P ∩ Zn }, where P
is a rational polyhedron. We have seen that PI is a rational polyhedron and
hence, given an inequality description P ≤ (D, d) = PI of PI , we can determine
the optimum value via

    max{c| x : x ∈ P ∩ Zn } = max{c| x : x ∈ PI } = max{c| x : Dx ≤ d},

where the latter is just a linear program. However, how do we find a feasible
integer point attaining the optimum value? To understand this, let us recall
some basic facts about faces of polyhedra.
Definition 4.21. For a ∈ Rn , β ∈ R, we set H ≤ (a, β) := {x ∈ Rn : a| x ≤ β}
and H = (a, β) := {x ∈ Rn : a| x = β}.
Definition 4.22 (Face). A face of a polyhedron P ⊆ Rn is a set of the form
F = P ∩ H = (a, β), where a ∈ Rn and β ∈ R are such that P ⊆ H ≤ (a, β). We
say that F is defined by the inequality a| x ≤ β.
Proposition 4.23. Let F be a face of a polyhedron P = P ≤ (A, b) = conv(X) +
ccone(Y ), where A ∈ Rm×n , b ∈ Rm , and X, Y ⊆ Rn are finite.

1. F is a polyhedron.

2. The faces of F are those faces of P that are contained in F .



3. F = conv(X ∩ F ) + ccone(Y ∩ char(F )).


4. If F ≠ ∅, then setting

       eq(F ) := eqA,b (F ) := {i ∈ [m] : Ai,∗ x = bi for all x ∈ F }

   yields

       F = {x ∈ P : Ai,∗ x = bi for all i ∈ eq(F )}

   and F = P ∩ H = (a, β), where

       a = ∑_{i∈eq(F)} Ai,∗   and   β = ∑_{i∈eq(F)} bi

   and P ⊆ H ≤ (a, β).


5. For all I ⊆ [m], the set {x ∈ P : AI,∗ x = bI } is a face of P .
6. P has only finitely many faces.

Definition 4.24 (Lineality space). The lineality space of a polyhedron P ⊆ Rn


is
lineal(P ) := {z ∈ Rn : x + span{z} ⊆ P for all x ∈ P }.
Remark 4.25. For a polyhedron P = P ≤ (A, b) ≠ ∅ we have

lineal(P ) = kern(A) ⊆ P ≤ (A, 0) = char(P ).

Definition 4.26 (Minimal face). A minimal face of a polyhedron P is a nonempty


face of P that does not properly contain another nonempty face of P .
Remark 4.27. Let F be a minimal face of a polyhedron P = P ≤ (A, b) with
A ∈ Rm×n , b ∈ Rm .
1. For every x∗ ∈ F we have F = x∗ + lineal(P ).
2. F = {x ∈ Rn : Aeq(F ) x = beq(F ) }.
Definition 4.28 (Pointed polyhedra, vertices). We say that a polyhedron P is
pointed if lineal(P ) = {0}. In this case, the minimal faces of P consist of single
points. If {v} is a minimal face of P , then we call both {v} and v a vertex of
P.
Let us mention that if a polyhedron P ⊆ Rn is pointed, then it is nonempty
since lineal(∅) = Rn . Furthermore, every nonempty polytope P is pointed since
lineal(P ) ⊆ char(P ) = {0}.
Remark 4.29. Let P be a polyhedron and v ∈ P . The following statements are
equivalent.
1. v is a vertex of P .
2. {v} is a 0-dimensional face of P .
3. v is an extreme point of P , that is, v ∉ conv(P \ {v}).
4. v is not a convex combination of two points from P \ {v}.

Remark 4.30. Let P be a polyhedron and X ⊆ P . Then


P = conv(X) + char(P )
if and only if X contains a point from every minimal face of P . In particular: If
P is a pointed polyhedron, then the above equality holds if and only if X contains
all vertices from P .

4.5 Solving IPs via convexification Lecture 11

From our previous findings, it is possible to derive a rather direct approach to


solve integer programs. It makes use of the following fact.
Theorem 4.31. Let P be a rational polyhedron. Every nonempty face of PI
contains at least one integer point.
Proof. By Theorem 4.19, PI is a rational polyhedron. Let F be a nonempty
face of PI . By Proposition 4.23 there exist a ∈ Qn , β ∈ Q with
PI ⊆ H ≤ (a, β), F = PI ∩ H = (a, β). Note that we have

    β = max{a| x : x ∈ PI } = max{a| x : x ∈ P ∩ Zn },

and hence there is some z ∈ P ∩ Zn with a| z = β, and thus z ∈ F ∩ Zn .
Algorithm 4.32 (IPs via convexification). Given a rational polyhedron P ⊆ Rn
and objective c ∈ Qn , perform the following steps to solve the integer program
max{c| x : x ∈ P ∩ Zn }. (4.1)
1. Compute D ∈ Qm×n , d ∈ Qm with PI = P ≤ (D, d).
2. Compute ω = max{c| x : x ∈ PI }. If ω = −∞ (infeasible) or ω = ∞
(unbounded), stop.
3. Find an inclusionwise maximal set I ⊆ [m] with

    F := {x ∈ Rn : Dx ≤ d, DI x = dI , c| x = ω} ≠ ∅.

4. We see that F is a minimal face of the face of PI that is defined by c| x ≤ ω,


and hence F is a minimal face of PI .
5. By Remark 4.27, we have F = {x ∈ Rn : DI x = dI , c| x = ω}, and F is
an affine subspace.
6. By Theorem 4.31, we know that there exists an integer point in F . Any
such point is an optimal solution to (4.1).
7. Determine that point by solving the respective linear Diophantine system.
In general, the above procedure is not feasible for concrete problems, either in
practice or in theory, because the system Dx ≤ d can almost never be computed
efficiently and is often of prohibitive size even from a theoretical point of view.
In the sequel, however, we will see that there are interesting special cases in which
computing PI is not necessary at all because P = PI (integer polyhedra). After
that we will see how to use the described approach even if PI cannot be computed
exactly (cutting planes).
Chapter 5

Integer polyhedra

5.1 Basics
This part is devoted to the study of situations in which the polyhedron P that
we are given is already equal to its integer hull, i.e., P = PI . While this is
clearly not satisfied in general, we will see that this special situation still occurs
in several interesting settings. Moreover, we will derive techniques to see that
P = PI indeed holds in these cases.
Definition 5.1 (Integer polyhedra). We say that a polyhedron P is integer if
it satisfies P = PI .
Theorem 5.2. A rational polyhedron P is integer if and only if every minimal
face of P contains at least one integer point. In particular, pointed rational
polyhedra (and all polytopes) are integer if and only if all their vertices are
integer.
Proof. If P is integer, then Theorem 4.31 shows that every nonempty face con-
tains an integer point.
Conversely, suppose that every minimal face of P contains an integer point.
Let X ⊆ P ∩ Zn contain an integer point from every minimal face of P . By
Remark 4.30, we obtain P = conv(X) + char(P ). If PI = ∅, then X = ∅ and
hence P = ∅. Otherwise, we have char(PI ) = char(P ) by Theorem 4.19, and
hence
P ⊆ conv(X) + char(PI ) ⊆ PI + char(PI ) = PI .

5.2 Total unimodularity Lecture 12

How do we see that a polyhedron, given by an inequality description, is integer?


In general, this is a difficult question. In this part, we study the following notion
that yields a rather simple but powerful sufficient condition for a polyhedron
being integer.
Definition 5.3 (Total unimodularity). A matrix A ∈ Rm×n is totally unimod-
ular (TU) if
det(AI,J ) ∈ {−1, 0, +1}
holds for all I ⊆ [m], J ⊆ [n] with |I| = |J|.


Note that, in particular, all entries of a TU matrix are in {−1, 0, +1}.
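
Definition 5.3 can be checked directly on small matrices by enumerating all
square submatrices — exponential in the matrix size, but handy for experiments.
A minimal brute-force sketch (function names are ours; polynomial-time
algorithms exist, see the remarks at the end of this section):

from itertools import combinations

def det_int(M):
    """Determinant of a small integer matrix by Laplace expansion."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det_int([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def is_totally_unimodular(A):
    """Brute-force check of Definition 5.3: every square submatrix has
    determinant in {-1, 0, +1}."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for I in combinations(range(m), k):
            for J in combinations(range(n), k):
                if det_int([[A[i][j] for j in J] for i in I]) not in (-1, 0, 1):
                    return False
    return True

print(is_totally_unimodular([[1, 1], [-1, 1]]))       # False: det = 2
print(is_totally_unimodular([[1, 0, -1], [0, 1, 1]])) # True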


Theorem 5.4. If A ∈ Rm×n is totally unimodular, then P ≤ (A, b) is integer for
all b ∈ Zm .
Proof. Let b ∈ Zm be arbitrary. By Theorem 5.2, it suffices to show that every
minimal face F of P contains an integer point. Recall that for every such face
we have
F = {x ∈ Rn : Aeq(F ) x = beq(F ) }.
Choose a subset I ⊆ eq(F ) such that rank(AI ) = rank(Aeq(F) ) = |I|, and a
subset J ⊆ [n] with |J| = |I| such that AI,J is regular. Consider the unique
point x∗ ∈ Rn satisfying AI x∗ = bI , x∗[n]\J = 0 and note that x∗ ∈ F . By
Cramer’s rule, we obtain

    x∗j = det(Â(j)) / det(AI,J )

for all j ∈ J, where Â(j) arises from AI,J by replacing column j by the vector
bI . Since A and b are integer, we have det(Â(j)) ∈ Z. Since A is TU, we have
det(AI,J ) ∈ {−1, +1}, and hence x∗j ∈ Z. Thus, we see that x∗ ∈ F ∩ Zn
holds.
Let us remark that the integrality of b in the above statement is important,
see {x ∈ R : 1 · x ≤ 0.5}.
Remark 5.5. If A is totally unimodular, then the following matrices are also
totally unimodular:

    A| , −A, [A −A], [A I], [A −I],

as well as the vertically stacked matrices

    [A; −A], [A; I], [A; −I]

(where [M ; N ] denotes stacking M on top of N ).

Be careful: If A1 and A2 are totally unimodular, then [A1 A2 ] and [A1 ; A2 ]
are typically not totally unimodular! For example, [1 1] and [−1 1] are
totally unimodular, but we have

    det [ 1 1 ; −1 1 ] = 2.

Definition 5.6. For A ∈ Rm×n , b ∈ Rm , `, u ∈ Rn , we define the following


polyhedra.

P+≤ (A, b) := {x ∈ Rn : Ax ≤ b, x ≥ 0}
P = (A, b) := {x ∈ Rn : Ax = b}
P+= (A, b) := {x ∈ Rn : Ax = b, x ≥ 0}
[`, u] := {x ∈ Rn : ` ≤ x ≤ u}

Corollary 5.7. Let A ∈ Rm×n be totally unimodular and b ∈ Zm , `, u ∈ Zn .


Then the following polyhedra are integer:

P ≤ (A, b), P+≤ (A, b), P+= (A, b),


P ≤ (A, b) ∩ [`, u], P = (A, b) ∩ [`, u]

Remark 5.8. If A ∈ Rm×n is totally unimodular and b ∈ Zm , c ∈ Zn with
sup{c| x : Ax ≤ b} ∉ {−∞, ∞}, then we have the following strong duality:

    max{c| x : Ax ≤ b, x ∈ Zn } = min{b| y : A| y = c, y ≥ 0, y ∈ Zm }.

If b is not integer, we still have

    max{c| x : Ax ≤ b} = min{b| y : A| y = c, y ≥ 0, y ∈ Zm }.

We close this section by mentioning that it is possible to check whether a


given matrix is totally unimodular in polynomial time. However, this is far
from obvious. In 1980, Seymour gave a characterization of totally unimodular
matrices in terms of a decomposition theory of regular matroids, leading to
such a polynomial-time algorithm. A very efficient implementation was given
by Walter & Truemper (2013) based on work of Truemper (1990).
An extremely useful characterization of totally unimodular matrices is the
following.
Theorem 5.9 (Ghouila-Houri, 1962 (column version)). A matrix A ∈ {−1, 0, +1}m×n
is totally unimodular if and only if for every subset J ⊆ [n] there exists a parti-
tion J = J+ ∪ J− , J+ ∩ J− = ∅ with

    ∑_{j∈J+} A∗,j − ∑_{j∈J−} A∗,j ∈ {−1, 0, +1}m .

We stress that the above criterion must be met for every subset J ⊆ [n].
Since A is totally unimodular if and only if A| is totally unimodular, the row
version of the statement holds as well:
Theorem 5.10 (Ghouila-Houri, 1962 (row version)). A matrix A ∈ {−1, 0, +1}m×n
is totally unimodular if and only if for every subset I ⊆ [m] there exists a par-
tition I = I+ ∪ I− , I+ ∩ I− = ∅ with

    ∑_{i∈I+} Ai,∗ − ∑_{i∈I−} Ai,∗ ∈ {−1, 0, +1}n .

It suffices to prove the column version.
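
Before turning to the proof, the column criterion can be verified by brute force
on small examples (a sketch, exponential in n and for illustration only; the
function name is ours):

from itertools import combinations, product

def ghouila_houri_column(A):
    """Brute-force check of Theorem 5.9: for every subset J of columns,
    some +/-1 signing of the columns in J sums to a vector with all
    entries in {-1, 0, +1}."""
    m, n = len(A), len(A[0])
    for k in range(1, n + 1):
        for J in combinations(range(n), k):
            ok = any(
                all(abs(sum(s * A[i][j] for s, j in zip(signs, J))) <= 1
                    for i in range(m))
                for signs in product((1, -1), repeat=k))
            if not ok:
                return False
    return True

# The matrix from Remark 5.5 fails: J = {0, 1} admits no good signing.
print(ghouila_houri_column([[1, 1], [-1, 1]]))  # False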


Proof of Theorem 5.9. Suppose that A is totally unimodular, and let J ⊆ [n].
Consider the system

    ⌊(1/2)A∗,J 1⌋ ≤ A∗,J x ≤ ⌈(1/2)A∗,J 1⌉, x ≤ 1, x ≥ 0,

which has the solution x = (1/2)1. Since A is totally unimodular, the above system
(brought into ≤-form) has a totally unimodular coefficient matrix. By Theo-
rem 5.4, we see that the system also has an integer solution z ∈ ZJ . Note that
z ∈ {0, 1}J . Moreover, we have

    A∗,J z − (1/2)A∗,J 1 ∈ {−1/2, 0, +1/2}m ,

and hence

    A∗,J (2z − 1) ∈ {−1, 0, 1}m .

Thus, setting J+ := {j ∈ J : zj = 1} and J− := {j ∈ J : zj = 0}, we obtain

    ∑_{j∈J+} A∗,j − ∑_{j∈J−} A∗,j = A∗,J (2z − 1) ∈ {−1, 0, 1}m .

It remains to show that if A ∈ {−1, 0, 1}m×n satisfies the Ghouila-Houri criterion,
then A is totally unimodular. Note that it suffices to consider the case m = n.
We proceed by induction on n ≥ 1 and observe that the case n = 1 is trivial.
Assume n > 1. By the induction hypothesis, all proper square submatrices of A
are totally unimodular, and hence it remains to show det(A) ∈ {−1, 0, 1}. We
may assume that A is regular. By Cramer’s rule, we have

    (A−1 )i,j = (−1)i+j det(A[n]\{i},[n]\{j} ) / det(A)

for all i, j ∈ [n], and hence

    det(A)A−1 ∈ {−1, 0, 1}n×n .

In particular, we have

    x := det(A)(A−1 )∗,n ∈ {−1, 0, 1}n \ {0}.

To show that | det(A)| = 1, it thus suffices to show that

    (A−1 )∗,n ∈ {−1, 0, 1}n

holds. To this end, consider J := {j ∈ [n] : xj ≠ 0} ≠ ∅. Let J+ , J− be a
partition of J with

    ∑_{j∈J+} A∗,j − ∑_{j∈J−} A∗,j ∈ {−1, 0, 1}n ,

which exists by the Ghouila-Houri criterion. Consider the vector y ∈ {−1, 0, 1}n
defined via

    yj := 1 if j ∈ J+ ,   yj := −1 if j ∈ J− ,   yj := 0 if j ∉ J.

We claim that

    Ay ∈ {en , −en }                                        (5.1)

holds. Note that this implies (A−1 )∗,n ∈ {y, −y} ⊆ {−1, 0, +1}n , as claimed.
To show that (5.1) holds, first note that we have

    Ay ∈ {−1, 0, 1}n \ {0},                                 (5.2)

where membership in {−1, 0, 1}n is precisely the choice of the partition J+ , J− ,
and Ay ≠ 0 holds since A is regular and y ≠ 0. Note also that we have

    (Ay)[n−1] = A[n−1],∗ y = A[n−1],∗ y + A[n−1],∗ x = A[n−1],∗ (y + x),

where the second equality follows from Ax = det(A)en . Observe that x + y ∈
{−2, 0, 2}n holds, and so every entry of (Ay)[n−1] is an even number. By (5.2),
this implies (Ay)[n−1] = 0 and hence, again using (5.2), we obtain (5.1).
Corollary 5.11. If A ∈ {−1, 0, +1}m×n contains at most one +1 and at most
one −1 in every row (column), then A is totally unimodular.
Proof. The row version of the claim follows from the column version of Ghouila-
Houri by setting J+ = J and J− = ∅. The column version of the claim follows
analogously from the row version of Ghouila-Houri.

5.3 Incidence matrices


Let us collect some more examples of totally unimodular matrices together with
interesting applications.
Definition 5.12 (Incidence matrices of digraphs). The incidence matrix of a
directed graph D = (V, A) is the matrix I(D) ∈ {−1, 0, +1}V ×A with

    I(D)v,a = −1 if a = (v, w) for some w,
    I(D)v,a = +1 if a = (w, v) for some w,
    I(D)v,a = 0  otherwise.

Theorem 5.13. Incidence matrices of directed graphs are totally unimodular.


Proof. Follows directly from Corollary 5.11.
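
As a quick illustration (the helper name is ours), the incidence matrix of a
small digraph can be built as follows and fed to the brute-force TU checker
from the previous section:

def incidence_matrix(V, A):
    """Incidence matrix I(D) of a digraph D = (V, A) as in Definition 5.12:
    the column of arc a = (v, w) has a -1 at its tail v and a +1 at its
    head w."""
    idx = {v: i for i, v in enumerate(V)}
    M = [[0] * len(A) for _ in V]
    for j, (v, w) in enumerate(A):
        M[idx[v]][j] = -1
        M[idx[w]][j] = +1
    return M

# s -> u, u -> t, s -> t; by Theorem 5.13 this matrix is TU.
D = incidence_matrix(["s", "u", "t"], [("s", "u"), ("u", "t"), ("s", "t")])
print(D)  # [[-1, 0, -1], [1, -1, 0], [0, 1, 1]]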
As a first application, let us show that the famous Max-Flow-Min-Cut the-
orem is a simple consequence of what we have seen so far. To this end, let
D = (V, A) be a directed graph with capacities c ∈ RA≥0 and s, t ∈ V , s ≠ t.
The Max-Flow-Min-Cut theorem states that the maximum value of an s-t-flow
in the network (D, c) is equal to the minimum capacity of an s-t-cut in (D, c).
The nontrivial part of this statement is that if α∗ is the maximum value of an
s-t-flow, then there also exists an s-t-cut of capacity at most α∗ . Let us prove
this part here. To this end, first note that
α∗ = max{α : I(D)x = α(et − es ), 0 ≤ x ≤ c}.
By Remark 5.8, this is equal to
    α∗ = min{c| y : I(D)| z + y ≥ 0, zs − zt = 1, y ∈ ZA≥0 , z ∈ ZV }. (5.3)
Note that the inequalities in (5.3) read
y(v,w) ≥ zv − zw for all (v, w) ∈ A.
Let (y ∗ , z ∗ ) ∈ ZA≥0 × ZV be an optimal solution to (5.3) and consider

    S ∗ := {v ∈ V : zv∗ ≥ zt∗ + 1}.

Recall that zs∗ = zt∗ + 1 and hence s ∈ S ∗ , t ∉ S ∗ . Thus, S ∗ is an s-t-cut.
Recall that the capacity of S ∗ is equal to

    ∑_{(v,w)∈A : v∈S ∗ , w∉S ∗} c(v,w) .

Note that since z ∗ is integer, we have V \ S ∗ = {v ∈ V : zv∗ ≤ zt∗ }. Thus, for
every (v, w) ∈ A with v ∈ S ∗ and w ∉ S ∗ we have

    1 = (zt∗ + 1) − zt∗ ≤ zv∗ − zw∗ ≤ y∗(v,w) .

We obtain

    ∑_{(v,w)∈A : v∈S ∗ , w∉S ∗} c(v,w) ≤ ∑_{(v,w)∈A : v∈S ∗ , w∉S ∗} c(v,w) y∗(v,w)
                                     ≤ ∑_{(v,w)∈A} c(v,w) y∗(v,w) = c| y ∗ = α∗ ,

and hence S ∗ is a desired s-t-cut of capacity at most α∗ .



Definition 5.14 (Incidence matrices of undirected graphs). The incidence ma-
trix of an undirected graph G = (V, E) is the matrix I(G) ∈ {−1, 0, +1}V ×E
with

    I(G)v,e = 1 if v ∈ e   and   I(G)v,e = 0 otherwise.

While incidence matrices of directed graphs are always totally unimodular,


this is not true for undirected graphs. To this end, recall that an undirected
graph G = (V, E) is bipartite if we can partition V = U ∪ W , U ∩ W = ∅ such
that every edge in G contains one node from U and one node from W .

Theorem 5.15. The incidence matrix of an undirected graph G is totally uni-


modular if and only if G is bipartite.
Proof. Exercises (use Ghouila-Houri).
Definition 5.16 (Matching). A matching in an undirected graph G = (V, E)
is a subset of edges M ⊆ E with |δ(v) ∩ M | ≤ 1 for all v ∈ V , where δ(v) :=
{e ∈ E : v ∈ e}.
Definition 5.17 (Characteristic vector). The characteristic vector of a subset
M ⊆ T of a finite set T is the vector χ(M ) ∈ {0, 1}T with

    χ(M )e = 1 if e ∈ M   and   χ(M )e = 0 if e ∉ M .

Note that the characteristic vectors of matchings in an undirected graph


G = (V, E) are precisely the vectors x ∈ {0, 1}E satisfying I(G)x ≤ 1.

Corollary 5.18 (Matching polytopes of bipartite graphs). For every bipartite
graph G = (V, E) we have

    conv{χ(M ) : M is a matching in G} = {x ∈ RE : I(G)x ≤ 1, x ≥ 0}.

A direct consequence is the following: Given edge weights c, the maximum
weight of a matching in a bipartite graph can be directly computed by solving
the linear program

    max{c| x : I(G)x ≤ 1, x ≥ 0}
    = max{ ∑_{e∈E} ce xe : ∑_{e∈δ(v)} xe ≤ 1 for all v ∈ V, x ≥ 0 }.

However, this does not work for nonbipartite graphs. For instance, if G is
a cycle on three nodes with edge weights c ≡ 1, then the above LP has an
optimal value of 3/2, whereas the maximum cardinality of a matching is only 1.
In fact, for nonbipartite graphs more inequalities are needed to describe their
matching polytope.
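
This failure is easy to reproduce numerically, e.g. with SciPy’s LP solver (a
sketch assuming SciPy is available; the fractional optimum is unique here, so
any solver returns it):

import numpy as np
from scipy.optimize import linprog

# Triangle C3: vertices {0,1,2}, edges e01, e02, e12; rows of I = vertices.
I = np.array([[1, 1, 0],   # vertex 0 lies on e01 and e02
              [1, 0, 1],   # vertex 1 lies on e01 and e12
              [0, 1, 1]])  # vertex 2 lies on e02 and e12
res = linprog(c=-np.ones(3), A_ub=I, b_ub=np.ones(3), bounds=(0, None))
print(-res.fun, res.x)  # 1.5 [0.5 0.5 0.5] -- a fractional vertex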
Another application of Corollary 5.18 is the following well-known result con-
cerning the relation between matchings and vertex covers in bipartite graphs.
Recall that a set of nodes C ⊆ V is a vertex cover of an undirected graph
G = (V, E) if every edge in E is incident to at least one node in C.

Corollary 5.19 (König’s theorem). In every bipartite graph, the maximum


cardinality of a matching is equal to the minimum cardinality of a vertex cover.

Proof. Let G = (V, E) be a bipartite graph and let

    α := max{1| x : I(G)x ≤ 1, x ∈ {0, 1}E }
       = max{1| x : I(G)x ≤ 1, x ≥ 0, x ∈ ZE }

denote the largest cardinality of a matching in G. Since I(G) is totally unimod-
ular, by Remark 5.8, we have

    α = min{1| y : I(G)| y ≥ 1, y ≥ 0, y ∈ ZV }
      = min{ ∑_{v∈V} yv : yv + yw ≥ 1 for all {v, w} ∈ E, y ≥ 0, y ∈ ZV }
      = min{ ∑_{v∈V} yv : yv + yw ≥ 1 for all {v, w} ∈ E, y ∈ {0, 1}V },

which yields the claim.


We mention that many min-max relations in combinatorial optimization are
consequences of polyhedral arguments.

5.4 Integer-valued polyhedra Lecture 13

In this section, we provide another useful characterization of integer polyhedra.

Definition 5.20 (Integer-valued polyhedra). A polyhedron P ⊆ Rn is called


integer-valued if
max{c| x : x ∈ P } ∈ Z ∪ {−∞, +∞}
holds for all c ∈ Zn .
Theorem 5.21. A rational polyhedron is integer if and only if it is integer-
valued.

Lemma 5.22. If F is a minimal face of a rational polyhedron P ⊆ Rn with

F ∩ Zn = ∅,

then F is defined by a rational linear inequality c| x ≤ δ with

{x ∈ Zn : c| x = δ} = ∅. (5.4)

Proof. Let P = P ≤ (A, b) be a rational polyhedron. After rescaling A, b, we may


assume that A ∈ Zm×n and b ∈ Zm holds. Let F be a minimal face of P with
F ∩ Zn = ∅. Setting I := eq(F ), we know that F = {x ∈ Rn : AI x = bI }. By
Theorem 3.15, there is a vector y ∈ Qm with y | AI ∈ Zn and y | bI ∉ Z. Since
A and b are integer, by possibly adding an integer multiple of 1 to y, we may
assume y > 0. Setting c| := y | AI and δ := y | bI we see that (5.4) is satisfied.

It remains to show that P ⊆ H ≤ (c, δ) and F = P ∩ H = (c, δ). To this end,
let x ∈ P . Since y ≥ 0, we have

    c| x = ∑_{i∈I} yi Ai,∗ x ≤ ∑_{i∈I} yi bi = δ,

and hence x ∈ H ≤ (c, δ). Moreover, since y > 0, we see that c| x = δ holds if and
only if Ai,∗ x = bi for all i ∈ I = eq(F ). In other words, we have x ∈ H = (c, δ) if
and only if x ∈ F .
Proof of Thm. 5.21. Let P be a rational polyhedron. Suppose first that P is
integer and c ∈ Zn with

    δ := max{c| x : x ∈ P } ∉ {−∞, +∞}.

Thus, the inequality c| x ≤ δ is valid for P and defines a nonempty face

    F := {x ∈ P : c| x = δ} ≠ ∅

of P . By Theorem 5.2, F contains an integer point z and hence

δ = c| z ∈ Z.

Suppose now that P is not integer. Again by Theorem 5.2, there exists a
minimal face F of P with F ∩ Zn = ∅. By the previous lemma, F is defined by
a rational linear inequality c| x ≤ δ with

{x ∈ Zn : c| x = δ} = ∅.

Note that since F is nonempty we have

δ = max{c| x : x ∈ P }.

By possibly rescaling c, we may assume that c ∈ Zn and gcd(c1 , c2 , . . . , cn ) = 1.
In the exercises, we have seen that there exists an integer point x ∈ Zn with
c| x = 1, or, equivalently, that Λ(c1 , . . . , cn ) = Z holds. Since {x ∈ Zn : c| x =
δ} = ∅, this means that δ ∉ Z, and hence P is not integer-valued.

5.5 Total dual integrality Lect’s 14 & 15

We have seen that total unimodularity is a useful notion to explain the inte-
grality of certain polyhedra. However, many integer polytopes appearing in
combinatorial optimization are not described by totally unimodular matrices.
In this section, we study the more general notion of total dual integrality (TDI),
which turns out to be applicable to all integer polyhedra and can be seen as a
unifying tool to explain many combinatorial min-max theorems (such as max-
flow-min-cut or König’s theorem discussed earlier). In fact, many such theorems
can be phrased as “a certain linear program and its dual each have integer op-
timal solutions”.

Definition 5.23 (Total dual integrality). A rational system of linear inequali-


ties Ax ≤ b with A ∈ Qm×n , b ∈ Qm is totally dual integral (TDI) if

for all c ∈ Zn ,

the (dual) problem


min{b| y : A| y = c, y ≥ 0}
has an integer optimal solution (if the optimal value is finite).
Let us mention that being totally dual integral is a property of an inequality
system, not of a polyhedron. Moreover, observe that the definition makes no
assumption on integer solutions of the primal system. However, we will see that
they come for free.
Observation 5.24. If A is totally unimodular, then Ax ≤ b is totally dual
integral for all rational right-hand sides b.
Theorem 5.25. Let Ax ≤ b be totally dual integral with A ∈ Qm×n and integer
right-hand side b ∈ Zm . Then P ≤ (A, b) is integer.
Moreover, for all c ∈ Zn with sup{c| x : Ax ≤ b} ∉ {−∞, +∞} we have

    max{c| x : Ax ≤ b, x ∈ Zn } = min{b| y : A| y = c, y ≥ 0, y ∈ Zm }.

Proof. Using LP duality, we see that P ≤ (A, b) is integer-valued and hence it is


integer by Theorem 5.21. For any c ∈ Zn with sup{c| x : Ax ≤ b} ∉ {−∞, +∞}
we see that

max{c| x : Ax ≤ b, x ∈ Zn } = max{c| x : Ax ≤ b, x ∈ Rn }
= min{b| y : A| y = c, y ≥ 0, y ∈ Rm }
= min{b| y : A| y = c, y ≥ 0, y ∈ Zm }

holds, where the equalities follow from the integrality of P , LP duality, and
total dual integrality, respectively.
Remark 5.26. Let Ax ≤ b be totally dual integral and σ ∈ Z≥1 . In general,
σA · x ≤ σb is not totally dual integral. However, (1/σ)A · x ≤ (1/σ)b is totally
dual integral.
Remark 5.27. For every rational system Ax ≤ b there is a rational number
τ > 0 such that τ A · x ≤ τ b is totally dual integral.
Proof. Let η1 ∈ Z≥1 be such that η1 A is integer and let η2 denote a common
integer multiple of all subdeterminants of η1 A. Define τ := η1 /η2 .
Let c ∈ Zn such that max{c| x : τ Ax ≤ τ b} ∉ {−∞, ∞}. Consider an
optimal vertex solution y ∗ of

    min{τ b| y : τ A| y = c, y ≥ 0}

(which exists since the underlying polyhedron is pointed). Then z ∗ = (1/η2 )y ∗ is
an optimal vertex solution of

    min{η1 b| z : η1 A| z = c, z ≥ 0}.

By Cramer’s rule, every entry of z ∗ is a rational number whose denominator is
a subdeterminant of η1 A. Thus, y ∗ = η2 z ∗ is an integer vector.

Theorem 5.28. A system Ax ≤ b with integer matrix A and rational right-
hand side b is totally dual integral if and only if for every minimal face F of
P ≤ (A, b), the vectors

    Ai,∗ , i ∈ eq(F ),

form a Hilbert basis of the normal cone

    NF (P ≤ (A, b)) = ccone{Ai,∗ : i ∈ eq(F )}. (5.5)

Remark 5.29. If F is a face of a polyhedron P ⊆ Rn , then the normal cone


NF (P ) of F at P is the set of all c ∈ Rn for which

F ⊆ arg max{c| x : x ∈ P }.

The equality (5.5) is a consequence of LP duality and complementary slackness.


Proof of Theorem 5.28. Let A ∈ Zm×n and b ∈ Qm . Suppose that Ax ≤ b is
totally dual integral and F a minimal face of P ≤ (A, b). We have to show that

    ccone({Ai,∗ : i ∈ eq(F )}) ∩ Zn = Λ+ ({Ai,∗ : i ∈ eq(F )})

holds. The inclusion ⊇ is clear since A is integer. Let c ∈ ccone({Ai,∗ : i ∈
eq(F )}) ∩ Zn . Let x∗ ∈ F be arbitrary. Since c is in the normal cone of F , x∗ is
an optimal solution of max{c| x : Ax ≤ b}. Since Ax ≤ b is totally dual integral,
there is an optimal integer dual solution y ∗ ∈ Zm≥0 with A| y ∗ = c. Since F is a
minimal face of P ≤ (A, b), we have

    Ai,∗ x∗ < bi for all i ∉ eq(F ),

since otherwise x∗ would be contained in a proper nonempty face of F , contradicting
the minimality of F . By complementary slackness, we have

    yi∗ = 0 for all i ∉ eq(F )

and hence

    c = A| y ∗ = ∑_{i∈eq(F)} yi∗ Ai,∗ ∈ Λ+ ({Ai,∗ : i ∈ eq(F )}).

For the converse direction of the proof, assume now that for every minimal
face F of P ≤ (A, b), the vectors Ai,∗ , i ∈ eq(F ), form a Hilbert basis of

    ccone({Ai,∗ : i ∈ eq(F )}).

Let c ∈ Zn be such that η := sup{c| x : Ax ≤ b} ∉ {−∞, ∞}. Let

    F ∗ ⊆ {x ∈ P ≤ (A, b) : c| x = η} ≠ ∅

be a minimal face of P ≤ (A, b), and pick any point x∗ ∈ F ∗ . By LP duality, there
is an optimal dual solution y ∗ ∈ Rm≥0 with A| y ∗ = c. Note that Ai,∗ x∗ < bi for
all i ∉ eq(F ∗ ) and hence, by complementary slackness, we even have

    c ∈ ccone({Ai,∗ : i ∈ eq(F ∗ )}).


By our assumption, we even have

    c ∈ Λ+ ({Ai,∗ : i ∈ eq(F ∗ )})

and hence there exist λi ∈ Z≥0 , i ∈ eq(F ∗ ), with c = ∑_{i∈eq(F ∗ )} λi Ai,∗ . Filling
up with zeros, this means that there is a vector λ ∈ Zm≥0 with

    A| λ = c

and

    b| λ = ∑_{i∈eq(F ∗ )} bi λi = ∑_{i∈eq(F ∗ )} (Ai,∗ x∗ )λi = ( ∑_{i∈eq(F ∗ )} λi Ai,∗ ) x∗ = c| x∗ ,

and hence λ is an integer optimal dual solution.

Corollary 5.30. The rows of A ∈ Zm×n form a Hilbert basis of ccone({A1,∗ , . . . , Am,∗ })
if and only if Ax ≤ 0 is totally dual integral.
Proof. The claim follows immediately from Theorem 5.28 by observing that
lineal(P ≤ (A, 0)) = kern(A) is the only minimal face of the polyhedral cone
P ≤ (A, 0).

Theorem 5.31. For every rational polyhedron P ⊆ Rn there is a totally dual


integral system Ax ≤ b with A ∈ Zm×n and P = P ≤ (A, b).
The right-hand side b can be chosen to be integer if and only if P is integer.
Proof. We may assume that P ≠ ∅. Let F denote the set of all minimal faces
of P . For every F ∈ F, the normal cone NF (P ) is a rational polyhedral cone.
By Corollary 4.17, NF (P ) has a Hilbert basis H(F ) ⊆ Zn . Note that we have

NF (P ) ∩ Zn = Λ+ (H(F ))

and
NF (P ) = ccone(H(F )).
For every F ∈ F, pick any point x(F ) ∈ F with x(F ) ∈ Zn if F ∩ Zn ≠ ∅. We
claim that the system Ax ≤ b defined by

a| x ≤ a| x(F ) for all F ∈ F, a ∈ H(F )

is a totally dual integral system for P . Note that A is integer by construction.


Moreover, we see that if P is integer, by Theorem 5.2, every face contains an
integer point and hence b is integer. Conversely, once we have shown that Ax ≤ b
is totally dual integral, we recall that b can only be integer if P is integer, by
Theorem 5.25.
Let us first show that P = P ≤ (A, b) holds. The inclusion ⊆ is clear since for
every F ∈ F and a ∈ H(F ) ⊆ NF (P ) we have

max{a| x : x ∈ P } = a| x(F )

and hence
a| x ≤ a| x(F )

holds for all x ∈ P .


To see that P ≤ (A, b) ⊆ P holds, it suffices to show that every inequality
c| x ≤ δ that is valid for P is also valid for P ≤ (A, b). Let c| x ≤ δ be valid for P .

We may assume that δ = max{c| x : x ∈ P } and hence the inequality defines a


nonempty face of P . Let us pick a minimal face F ∗ inside this face. Note that
we have
c ∈ NF ∗ (P ) = ccone(H(F ∗ ))
as well as
δ = c| x(F ∗ ).
Thus, there exist λa ≥ 0, a ∈ H(F ∗ ), such that

    c = ∑_{a∈H(F ∗ )} λa a.

For every x ∈ P ≤ (A, b) we thus have

    c| x = ( ∑_{a∈H(F ∗ )} λa a )| x = ∑_{a∈H(F ∗ )} λa a| x ≤ ∑_{a∈H(F ∗ )} λa a| x(F ∗ )
         = ( ∑_{a∈H(F ∗ )} λa a )| x(F ∗ ) = c| x(F ∗ ) = δ.

It remains to show that Ax ≤ b is totally dual integral. By Theorem 5.28 we
only have to show that, for every minimal face F of P , the Hilbert basis H(F )
of NF (P ) is contained in

    {Ai,∗ : i ∈ eqA,b (F )}.

This holds since for all x ∈ F and all a ∈ H(F ) we have a| x = a| x(F ) (since
a ∈ NF (P ) and x(F ) ∈ F ).

Recall that we can efficiently test if a given matrix is totally unimodular.


Unfortunately, it is coNP-complete to decide whether a given system is totally
dual integral. Moreover, given an inequality description Ax ≤ b, it is coNP-
complete to decide whether P ≤ (A, b) is integer.
Chapter 6

Chvátal-Gomory cuts

6.1 Idea Lecture 16

In the previous chapter, we have studied the idealistic case of integer programs
defined by an integer polyhedron, in which case the integer program reduces to a
linear program. In practice, however, we are usually confronted with polyhedra
that are not integer and whose integer hulls are difficult to compute. One
very effective technique for dealing with such integer programs is to refine the
formulation by (iteratively) adding so-called cuts. Consider an IP

max{c| x : Ax ≤ b, x ∈ Zn }.

Let x∗ be an optimal solution for the LP

max{c| x : Ax ≤ b}

and assume that it is a vertex of

P := P ≤ (A, b).

If x∗ ∈ Zn , then it is an optimal solution to the IP and we are done. Since
this is very unlikely, let us assume that x∗ ∉ Zn . Recall that in this case
Lemma 5.22 states that we can compute a rational linear inequality a| x ≤ β
with {x ∈ Zn : a| x = β} = ∅ that defines x∗ , i.e.,

a| x∗ = β = max{a| x : x ∈ P }.

In particular, the inequality


a| x ≤ β
is valid for P . We may assume that

a ∈ Zn

and gcd(a1 , . . . , an ) = 1, in which case the requirement {x ∈ Zn : a| x = β} = ∅
is equivalent to β ∉ Z. A simple (but crucial) observation is now that the
inequality

    a| x ≤ ⌊β⌋ (6.1)


is still valid for PI and hence we may add it to our integer program without
changing the feasible region (of integer points). Note also that the inequal-
ity (6.1) is violated by x∗ , and hence x∗ is “cut off”. In view of this, the inequal-
ity (6.1) is called a cut. More specifically, cuts that are derived in the above way
are called Chvátal-Gomory cuts, named after Vašek Chvátal and Ralph Gomory.
During this course, we will discuss several ways to derive cuts and how to deal
with them algorithmically. Before we do this, let us have a closer look at cuts
that are derived in the above way.
Let us start with the following summary.
Observation 6.1. Let P ⊆ Rn be a polyhedron and a ∈ Zn , β ∈ R with
P ⊆ H ≤ (a, β). Then we have

    PI ⊆ (H ≤ (a, β))I ⊆ H ≤ (a, ⌊β⌋) ⊆ H ≤ (a, β).

Actually, if a is appropriately scaled and β is rational, then the above relation
can be simplified:

Lemma 6.2. For a ∈ Zn \ {0} with gcd(a1 , . . . , an ) = 1 and β ∈ Q we have

    (H ≤ (a, β))I = H ≤ (a, ⌊β⌋).


Proof. The inclusion ⊆ is clear. For the reverse inclusion, note that we have

    (H ≤ (a, ⌊β⌋))I ⊆ (H ≤ (a, β))I

and hence it remains to show that H ≤ (a, ⌊β⌋) is integer. By Theorem 5.2 it
suffices to show that every minimal face of H ≤ (a, ⌊β⌋) contains an integer point.
However, the only minimal face is F := H = (a, ⌊β⌋). Recall that

    Λ(a1 , . . . , an ) = gcd(a1 , . . . , an )Z = Z

and hence there is some x ∈ Zn with a| x = ⌊β⌋. In particular, we have x ∈
F ∩ Zn .
Definition 6.3. For n ∈ Z≥1 let

    Hn := {H ≤ (a, β) : a ∈ Zn \ {0}, β ∈ Q}

denote the set of all rational (closed affine) halfspaces in Rn .


Note that Lemma 6.2 yields a simple way to determine the integer hull of
any rational halfspace.
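
Concretely (a small sketch; Python ≥ 3.9 for the multi-argument gcd, and the
function name is ours): divide by the gcd of the left-hand side and round the
right-hand side down.

from math import gcd, floor
from fractions import Fraction

def halfspace_integer_hull(a, beta):
    """Integer hull of {x : a^T x <= beta} via Lemma 6.2, for a nonzero
    integer vector a and rational beta."""
    g = gcd(*(abs(ai) for ai in a))
    return [ai // g for ai in a], floor(Fraction(beta) / g)

print(halfspace_integer_hull([4, 6], 21))  # ([2, 3], 10): 2x + 3y <= 10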

6.2 Chvátal-Gomory closure Lecture 17

In this section, we want to understand the strength of Chvátal-Gomory (CG)
cuts. To this end, let us collect all Chvátal-Gomory cuts and add them to the
original description simultaneously.
Definition 6.4 (Chvátal-Gomory closure). For a polyhedron P ⊆ Rn we denote
by

    P ′ := ⋂_{H∈Hn : P ⊆H} HI

the Chvátal-Gomory closure of P .



With this definition, several questions arise. It is clear that the Chvátal-
Gomory closure is a closed convex set, but is it also a polyhedron? How does
it relate to the integer hull? Is it easy to compute it? Let us address these
questions step by step.
Remark 6.5. For every rational polyhedron P we have PI ⊆ P ′ ⊆ P . If P1 , P2
are two polyhedra with P1 ⊆ P2 , then P1′ ⊆ P2′ .
Theorem 6.6. If Ax ≤ b is totally dual integral with A ∈ Zm×n and b ∈ Qm ,
then

    (P ≤ (A, b))′ = P ≤ (A, ⌊b⌋),

where ⌊b⌋ = (⌊b1 ⌋, . . . , ⌊bm ⌋).

Proof. The inclusion ⊆ is clear. It remains to show that for every c ∈ Zn and
δ ∈ Q with P ≤ (A, b) ⊆ H ≤ (c, δ) we also have P ≤ (A, ⌊b⌋) ⊆ H ≤ (c, ⌊δ⌋). We may
assume that P ≤ (A, b) ≠ ∅, otherwise we have (P ≤ (A, b))′ = ∅ = P ≤ (A, ⌊b⌋)′ .
Since Ax ≤ b is totally dual integral, there is some y ∈ Zm≥0 with A| y = c and

    b| y = max{c| x : x ∈ P ≤ (A, b)} ≤ δ.

For every x ∈ P ≤ (A, ⌊b⌋) we obtain

    c| x = y | Ax ≤ y | ⌊b⌋ = ∑_{i=1}^m yi ⌊bi ⌋ = ⌊ ∑_{i=1}^m yi ⌊bi ⌋ ⌋ ≤ ⌊ ∑_{i=1}^m yi bi ⌋ ≤ ⌊δ⌋.

Corollary 6.7. The Chvátal-Gomory closure of a rational polyhedron is again


a rational polyhedron.
Proof. By Theorem 5.31, every rational polyhedron can be described by a TDI-
system with integer coefficient matrix, for which we can apply the previous
result.

As we will see in the exercises, the Chvátal-Gomory closure may still be far
from the integer hull. However, nothing prevents us from applying the closure
multiple times. Clearly, the hope is that we reach the integer hull at some point.
Fortunately, this will be the case.
Definition 6.8. For a polyhedron P let

    P (0) := P   and   P (t) := (P (t−1) )′   for all t ∈ Z≥1 .


Remark 6.9. For every polyhedron P we have

P = P (0) ⊇ P (1) ⊇ P (2) ⊇ · · · ⊇ PI .

Theorem 6.10. For every rational polyhedron P there exists some t ∈ Z≥0
such that PI = P (t) .

Lemma 6.11. If F is a face of a rational polyhedron P , then

F (t) = P (t) ∩ F

for all t ∈ Z≥0 .


Proof. Since the statement is trivial if P is empty, we may assume P ≠ ∅. It
suffices to show the claim for t = 1, since the general claim follows by induction:
for general t ≥ 1, setting F̃ := P (t−1) ∩ F , we see that F̃ is a face of P (t−1) and
hence

    F (t) = (F (t−1) )′ = (P (t−1) ∩ F )′ = (F̃ )′ = (P (t−1) )′ ∩ F̃
         = P (t) ∩ F̃ = P (t) ∩ P (t−1) ∩ F = P (t) ∩ F,

where the second equality follows from the induction hypothesis for t − 1, and
the fourth equality from the case t = 1 (applied to the face F̃ of P (t−1) ).
To show the claim for t = 1, let us pick a TDI system Ax ≤ b with

    P = P ≤ (A, b), A ∈ Zm×n , b ∈ Qm ,

which exists by Theorem 5.31. Recall that by Theorem 6.6 we have

    P ′ = P ≤ (A, ⌊b⌋).

Let a| x ≤ β be an inequality valid for P with

    F = {x ∈ P : a| x = β}, a ∈ Zn , β ∈ Z.

We will show that the system

    Ax ≤ b, a| x ≤ β, −a| x ≤ −β (6.2)

is totally dual integral. Note that this yields (again using Theorem 6.6, and
using ⌊β⌋ = β and ⌊−β⌋ = −β since β ∈ Z)

    F ′ = {x ∈ Rn : Ax ≤ ⌊b⌋, a| x ≤ ⌊β⌋, −a| x ≤ ⌊−β⌋}
       = {x ∈ Rn : Ax ≤ ⌊b⌋, a| x = β}
       = P ≤ (A, ⌊b⌋) ∩ F
       = P ′ ∩ F.

To show that (6.2) is totally dual integral, we use Theorem 5.28: We show that
for every face G ≠ ∅ of F the set

A ∪ {−a, a} with A := {Ai,∗ : i ∈ eqA,b (G)}

forms a Hilbert basis. Since Ax ≤ b is a totally dual integral system, A forms


a Hilbert basis (again by Theorem 5.28). Note that since a| x ≤ β defines the
face F and since eqA,b (F ) ⊆ eqA,b (G), we have

a ∈ ccone(A),

and hence
A ∪ {a}

is also a Hilbert basis. Let now

x ∈ ccone(A ∪ {−a, a}) ∩ Zn .

For sufficiently large λ ∈ Z≥0 we have

x̃ := x + λa ∈ ccone(A ∪ {a}) ∩ Zn = Λ+ (A ∪ {a}),

which means
x = x̃ + λ(−a) ∈ Λ+ (A ∪ {−a, a}).
Lemma 6.12. For every rational polyhedron P ⊆ Rn and t ∈ Z≥0 we have:
1. If U ∈ Zn×n is totally unimodular, then

P (t) = PI ⇐⇒ (U P )(t) = (U P )I .

2. For every y ∈ Zn we have

P (t) = PI ⇐⇒ (y + P )(t) = (y + P )I .

3. For every k ∈ Z≥0 we have

    P (t) = PI ⇐⇒ (P × {0k })(t) = (P × {0k })I .

Proof. Recall that U Zn = Zn = U −1 Zn holds for unimodular matrices U ∈


Zn×n . The claims follow from the identities

(U P )I = U PI , (y + P )I = y + PI , (P × {0k })I = PI × {0k }

and

(U P )′ = U P ′ , (y + P )′ = y + P ′ , (P × {0k })′ = P ′ × {0k },

which easily follow from the definitions of the integer hull and the Chvátal-
Gomory closure, respectively.
Proof of Theorem 6.10. We prove the claim by induction on d = dim(P ). If
d = −1, then P = ∅ and P (0) = ∅ = PI . For d = 0, we have P = {v} for some
v ∈ Qn . If v ∈ Zn , then P (0) = {v} = PI . Otherwise, there is some i ∈ [n] with
vi ∉ Z. The (contradicting) inequalities

    xi ≤ ⌊vi ⌋   and   −xi ≤ ⌊−vi ⌋

are valid for P ′ and hence P (1) = ∅ = PI .


Let now d ≥ 1. Let us argue that it suffices to consider the case d = n. If
aff(P ) contains no integer point, we may use our algorithm for solving linear
Diophantine systems (see Theorem 3.15) to obtain a linear equation

c| x = δ with c ∈ Zn , δ ∈ Q \ Z

that is valid for aff(P ), and hence also for P . The contradicting inequalities

    c| x ≤ ⌊δ⌋   and   −c| x ≤ ⌊−δ⌋



show that P ′ = ∅ and hence P (1) = ∅ = PI . Thus, by Lemma 6.12.2 we may


assume that
0 ∈ aff(P ).
Suppose that d < n. Then there exists a matrix
C ∈ Z(n−d)×n with rank(C) = n − d and aff(P ) = kern(C).
By Corollary 3.14 there is a unimodular matrix U ∈ Zn×n such that
    CU = [B 0]

is the Hermite normal form of C, where B ∈ Z≥0^{(n−d)×(n−d)} is regular.
Note that

    aff(U −1 P ) = U −1 aff(P ) = {U −1 x : Cx = 0, x ∈ Rn }
                = {y ∈ Rn : CU y = 0}
                = {y ∈ Rn : [B 0]y = 0}
                = {0n−d } × Rd .
Thus,
U −1 P = {0n−d } × Q
for some rational polyhedron Q ⊆ Rd with dim(Q) = d. By Lemma 6.12.1
and 6.12.3 we have P (t) = PI if and only if Q(t) = QI , and hence we indeed
may assume d = n.
Since PI is a rational polyhedron, it suffices to show that for every c ∈ Zn ,
δ ∈ Z with PI ⊆ H ≤ (c, δ) there is some r ∈ Z≥0 such that
P (r) ⊆ H ≤ (c, δ). (6.3)
If PI = ∅, it suffices to consider c = 0 and δ = −1. Otherwise, we have
char(PI ) = char(P ) and hence we may always assume sup{c| x : x ∈ P } < ∞.
In particular, we see that
    γ := min{δ′ ∈ Z : δ′ ≥ δ, P (r) ⊆ H ≤ (c, δ′) for some r}
always exists. Let r be such that
P (r) ⊆ H ≤ (c, γ)
and consider the face
F := P (r) ∩ H = (c, γ)
of P (r) . If F ∩ Zn ≠ ∅, then γ = δ and (6.3) is satisfied. Otherwise, we have
FI = ∅ and since dim(F ) < n = d the induction hypothesis asserts that there is
some s ∈ Z≥0 with
∅ = FI = F (s) = (P (r) )(s) ∩ F = P (r+s) ∩ F,
where the third equality is due to Lemma 6.11 and since F is a face of P (r) .
Thus, we have c| x < γ for all x ∈ P (r+s) . Since P (r+s) is a polyhedron, there
exists some γ′ ∈ Q, γ′ < γ such that c| x ≤ γ′ for all x ∈ P (r+s) . This means
that

    P (r+s+1) ⊆ H ≤ (c, γ − 1).
By the minimality of γ, we must have γ − 1 < δ. Since γ, δ ∈ Z this implies
γ ≤ δ and hence
P (r) ⊆ H ≤ (c, γ) ⊆ H ≤ (c, δ).

We remark that the above theory does not extend to non-rational polyhedra.
For example, for

    P = {(x, y) ∈ R2 : y ≤ √2·x, x ≥ 0}

we see that

    P = P (0) = P (1) = P (2) = · · · ⊋ PI .

(Recall that in this case PI is not even a polyhedron.) For

    Q = {(x, y) ∈ R2 : y ≤ √2·x, x ≥ 1}

we have

    Q = Q(0) ⊋ Q(1) ⊋ Q(2) ⊋ Q(3) ⊋ · · · .

However, Dadush, Dey & Vielma (2013) showed that the Chvátal-Gomory closure
of any compact convex set is a rational polytope, and hence our theory also
applies for such sets.
For a positive example, let us recall that the matching polytope of a bipartite
graph G = (V, E) is described by

    P (G) := {x ∈ RE : x(δ(v)) ≤ 1 for all v ∈ V, x ≥ 0}.

Here, for a vector x ∈ RE and a set F ⊆ E we use the notation x(F ) := ∑_{e∈F} xe .
In the exercises we have seen that this description is unfortunately not sufficient
in the case that G is nonbipartite. Observe that every vector x ∈ P (G) satisfies

    x(E(U )) ≤ |U |/2

for every subset U ⊆ V , where E(U ) denotes the set of edges with both endpoints
in U . This means that the inequalities

    x(E(U )) ≤ (|U | − 1)/2   for all U ⊆ V, |U | odd (6.4)

are Chvátal-Gomory cuts for P (G) and hence valid for P (G)′ . As shown in a
seminal paper by Jack Edmonds, it turns out that adding these inequalities is
sufficient for obtaining the matching polytope, i.e., the matching polytope of a
general graph G = (V, E) is given by

    P (G)′ = {x ∈ P (G) : x satisfies (6.4)}.

This example is very exceptional. For most polytopes that arise in combinatorial
optimization we have to apply several rounds of the Chvátal-Gomory closure to
reach the integer hull. An interesting result yielding a bound on the number
of such iterations is due to Eisenbrand & Schulz (2003): For every rational
polytope P ⊆ [0, 1]n , we have P (r) = PI for some r = O(n2 log n). Moreover,
Rothvoß & Sanità (2016) showed that some of these polytopes need r = Ω(n2 )
rounds.
To conclude this section, let us observe that Chvátal-Gomory cuts yield a
nice proof system:
Corollary 6.13 (Chvátal-Gomory proof system). Let Ax ≤ b be any rational
system of linear inequalities and let c| x ≤ δ be any rational inequality that is
valid for X := P ≤ (A, b) ∩ Zn . Then (a rescaling of) c| x ≤ δ can be obtained by
a finite number of steps of the following procedure (a sketch of one such step
follows the list):

1. Take a nonnegative combination of the inequalities in Ax ≤ b to obtain an
   inequality a| x ≤ β with integer left-hand side a.

2. Derive the strengthened inequality a| x ≤ ⌊β⌋ (which is valid for X).

3. Add this inequality to Ax ≤ b and repeat.
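
The following sketch implements one round of steps 1–2 (the function name and
the toy system are ours; exact rational arithmetic via fractions):

from fractions import Fraction
from math import floor

def cg_cut(A, b, y):
    """One Chvatal-Gomory step: given multipliers y >= 0 with y^T A
    integer, return the cut (y^T A) x <= floor(y^T b)."""
    assert all(yi >= 0 for yi in y)
    m, n = len(A), len(A[0])
    a = [sum(Fraction(y[i]) * A[i][j] for i in range(m)) for j in range(n)]
    assert all(aj.denominator == 1 for aj in a), "y^T A must be integer"
    beta = sum(Fraction(y[i]) * Fraction(b[i]) for i in range(m))
    return [int(aj) for aj in a], floor(beta)

# P = {x : x1 + x2 <= 3/2, -x1 + x2 <= 1}; y = (1/2, 1/2) combines the two
# rows to x2 <= 5/4, which is strengthened to the cut x2 <= 1.
print(cg_cut([[1, 1], [-1, 1]], [Fraction(3, 2), 1],
             [Fraction(1, 2), Fraction(1, 2)]))  # ([0, 1], 1)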
Chapter 7

Split cuts

7.1 Split disjunctions Lecture 18

In the previous chapter we have seen that Chvátal-Gomory cuts provide a sys-
tematic way to derive nontrivial inequalities that are valid for the integer hull
of a polyhedron P ⊆ Rn . In particular, we have seen that it is easy to de-
rive Chvátal-Gomory cuts that separate a given fractional vertex x∗ of P from
PI .1 Such routines are central components of modern integer programming
solvers since most cuts are in fact generated to separate fractional vertices, and
Chvátal-Gomory cuts turn out to be a good starting point. However, over the
past decades a large number of different families of cuts have been studied,
and many of them are implemented in modern solvers. A nice example to il-
lustrate other ways cuts can be derived is the family of split cuts, which are
generalizations of Chvátal-Gomory cuts.
The idea behind split cuts is based on the following simple “disjunction”.
If π ∈ Zn and π0 ∈ Z, then every integer point z ∈ Zn satisfies π | z ≤ π0 or
π | z ≥ π0 +1. In particular, every integer point in P is contained in P ∩H ≤ (π, π0 )
or P ∩ H ≥ (π, π0 + 1). This motivates the following definition.

Definition 7.1. For P ⊆ Rn and (π, π0 ) ∈ Zn+1 we define

split(P, π, π0 ) := conv((P ∩ H ≤ (π, π0 )) ∪ (P ∩ H ≥ (π, π0 + 1))).

Observation 7.2. For every polyhedron P ⊆ Rn and (π, π0 ) ∈ Zn+1 we have

PI ⊆ split(P, π, π0 ).

Note that for every fractional vertex x∗ , we can easily find a pair (π, π0 ) ∈ Zn+1
such that split(P, π, π0 ) does not contain x∗ :

Observation 7.3. Let x∗ be a fractional vertex of a polyhedron P ⊆ Rn , i.e.,
x∗j ∉ Z for some j ∈ [n]. Setting π := ej and π0 := ⌊x∗j ⌋ we have

x∗ ∉ split(P, π, π0 ).
1 Finding an inequality separating an arbitrary point in P \ PI from PI is a much harder
task.


Let us try to get a better idea of split(P, π, π0 ). To this end, first recall the
definition of a Chvátal-Gomory cut π | x ≤ π0 , which arises from an inequality
π | x ≤ β that is valid for P by setting π0 = ⌊β⌋. In this case, we see that
P ∩ H ≥ (π, π0 + 1) = ∅ and hence split(P, π, π0 ) = P ∩ H ≤ (π, π0 ).
In general, however, for (π, π0 ) ∈ Zn+1 , split(P, π, π0 ) is the convex hull of the
union of the two (nonempty) polyhedra P ∩ H ≤ (π, π0 ) and P ∩ H ≥ (π, π0 + 1).
We will see that split(P, π, π0 ) is still a polyhedron. Thus, if x∗ ∉ split(P, π, π0 ),
then there must be a linear inequality valid for split(P, π, π0 ) but violated by
x∗ . Such an inequality is called a split cut.
Unfortunately, the polyhedron split(P, π, π0 ) is typically more complicated
than P : For instance, the number of facets of split(P, π, π0 ) may be exponential
in the number of facets of P and hence determining an inequality description of
split(P, π, π0 ) may not be practical. However, split(P, π, π0 ) has the following
interesting description.

Theorem 7.4. For every polyhedron P = P ≤ (A, b) with A ∈ Rm×n , b ∈ Rm
and (π, π0 ) ∈ Zn+1 we have

split(P, π, π0 ) = {x ∈ Rn : x = x(0) + x(1) ,
                             Ax(0) ≤ λ0 b,
                             π | x(0) ≤ λ0 π0 ,
                             Ax(1) ≤ λ1 b,
                             π | x(1) ≥ λ1 (π0 + 1),
                             λ0 + λ1 = 1,
                             λ0 , λ1 ≥ 0,
                             x(0) , x(1) ∈ Rn }.        (7.1)

In particular, split(P, π, π0 ) is a linear projection of a polyhedron and hence a
polyhedron.
Proof. We avoid some technical details and provide a proof for the case of P
being a nonempty polytope, in which we have

char(P ) = P ≤ (A, 0) = {0}.        (7.2)

To see the inclusion “⊆”, let x ∈ split(P, π, π0 ). Note that we can write

x = λ0 y (0) + λ1 y (1) ,

where
y (0) ∈ P ∩ H ≤ (π, π0 ) and y (1) ∈ P ∩ H ≥ (π, π0 + 1),
and
λ0 + λ1 = 1, λ0 , λ1 ≥ 0.
Setting
x(i) := λi y (i)
for i = 0, 1, we see that (x, x(0) , x(1) , λ0 , λ1 ) satisfies all constraints defining the
set on the right-hand side of (7.1).
For the inclusion “⊇”, let (x, x(0) , x(1) , λ0 , λ1 ) satisfy the constraints on the
right-hand side of (7.1). Suppose first that λ0 = 0, in which case λ1 = 1.

Moreover, we have Ax(0) ≤ 0 and hence x(0) = 0 by (7.2). Thus, we obtain
x = x(1) , Ax = Ax(1) ≤ λ1 b = b and π | x = π | x(1) ≥ λ1 (π0 + 1) = π0 + 1, and
hence
x ∈ P ∩ H ≥ (π, π0 + 1) ⊆ split(P, π, π0 ).
The case λ1 = 0 follows analogously. We are left with the case 0 < λ0 , λ1 , in
which we have
x = λ0 y (0) + λ1 y (1) ,

where
y (0) = (1/λ0 ) x(0)   and   y (1) = (1/λ1 ) x(1) .

Note that we have

Ay (0) = (1/λ0 ) Ax(0) ≤ b   and   Ay (1) = (1/λ1 ) Ax(1) ≤ b

as well as

π | y (0) = (1/λ0 ) π | x(0) ≤ π0   and   π | y (1) = (1/λ1 ) π | x(1) ≥ π0 + 1.

Thus, we have y (0) ∈ P ∩ H ≤ (π, π0 ) and y (1) ∈ P ∩ H ≥ (π, π0 + 1) and hence
x ∈ split(P, π, π0 ).

We will see that the above description will help us to compute split cuts ex-
plicitly. Before we do this, let us mention that the above theorem yields a way to
express split(P, π, π0 ) as the projection of a higher-dimensional polyhedron that
may have significantly fewer facets than split(P, π, π0 ) itself. Such a description
is called an extended formulation and it turns out that many polytopes that
arise in discrete optimization can be described by extended formulations that
are significantly smaller than descriptions in the original space.
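
To illustrate how the extended formulation (7.1) can be used algorithmically,
the following Python sketch (our own illustrative code, assuming scipy is
available) decides whether a given point x∗ lies in split(P, π, π0 ) by checking
feasibility of (7.1) with a zero objective:

import numpy as np
from scipy.optimize import linprog

# Deciding whether x_star lies in split(P, pi, pi0) by checking feasibility of
# the extended formulation (7.1); variable order: x0 (n), x1 (n), lam0, lam1.
def in_split(A, b, pi, pi0, x_star):
    m, n = A.shape
    pi = np.asarray(pi, dtype=float)
    A_ub, b_ub = [], []
    for i in range(m):
        A_ub.append(np.r_[A[i], np.zeros(n), -b[i], 0.0]); b_ub.append(0.0)   # A x0 <= lam0*b
        A_ub.append(np.r_[np.zeros(n), A[i], 0.0, -b[i]]); b_ub.append(0.0)   # A x1 <= lam1*b
    A_ub.append(np.r_[pi, np.zeros(n), -float(pi0), 0.0]); b_ub.append(0.0)        # pi^T x0 <= lam0*pi0
    A_ub.append(np.r_[np.zeros(n), -pi, 0.0, float(pi0 + 1)]); b_ub.append(0.0)    # pi^T x1 >= lam1*(pi0+1)
    A_eq = [np.r_[np.eye(n)[j], np.eye(n)[j], 0.0, 0.0] for j in range(n)]         # x0 + x1 = x_star
    A_eq.append(np.r_[np.zeros(2 * n), 1.0, 1.0])                                  # lam0 + lam1 = 1
    b_eq = list(x_star) + [1.0]
    bounds = [(None, None)] * (2 * n) + [(0, None), (0, None)]
    res = linprog(np.zeros(2 * n + 2), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
    return res.status == 0            # feasible if and only if x_star lies in the split set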

7.2 Split closure Lecture 19

Let us briefly comment on a closure operator similar to the Chvátal-Gomory
closure.
Definition 7.5 (Split closure). For P ⊆ Rn let

scl(P ) := ⋂_{(π,π0 )∈Zn+1} split(P, π, π0 )

denote the split closure of P .

Remark 7.6. For every polyhedron P we have

PI ⊆ scl(P ) ⊆ P ′ ⊆ P.

Moreover, if P1 ⊆ P2 are two polyhedra, then scl(P1 ) ⊆ scl(P2 ).


Theorem 7.7. The split closure of a rational polyhedron is again a rational
polyhedron.

Proof. See Section 5.1 in the book Integer Programming by Conforti, Cornuéjols,
Zambelli.

Definition 7.8. For P ⊆ Rn we define scl(0) (P ) := P and

scl(t) (P ) := scl(scl(t−1) (P ))

for t = 1, 2, . . . .
Corollary 7.9. For every rational polyhedron P there is some t ∈ Z≥0 with
scl(t) (P ) = PI .
Proof. Follows directly from Remark 7.6 and Theorem 6.10.

Proposition 7.10. For every polytope P ⊆ [0, 1]n we have

split(· · · split(split(P, e1 , 0), e2 , 0) · · · , en , 0) = PI = conv(P ∩ {0, 1}n ).

In particular, we have scl(n) (P ) = PI .


Proof. Define P0 := P and Pi := split(Pi−1 , ei , 0) for i = 1, . . . , n. Note that
Pn = split(· · · split(split(P, e1 , 0), e2 , 0) · · · , en , 0). We will prove that for every
i ∈ {0, . . . , n}, the first i coordinates of every vertex of Pi are in {0, 1}. This
shows that every vertex of Pn is integer and hence Pn = PI .
We proceed by induction on i ≥ 0 and observe that the claim is trivial if
i = 0. Let i ≥ 1 and let v be a vertex of Pi . Recall that Pi is the convex hull of
F0 := Pi−1 ∩ H ≤ (ei , 0) and F1 := Pi−1 ∩ H ≥ (ei , 1). Since v is a vertex of Pi , we
must have v ∈ F0 or v ∈ F1 . Moreover, since Pi−1 ⊆ P ⊆ [0, 1]n , we see that F0
and F1 are both faces of Pi−1 . Thus, v is a vertex of a face of Pi−1 and hence
v is a vertex of Pi−1 . By the induction hypothesis, the first i − 1 coordinates of
v are in {0, 1}. Since v ∈ F0 ∪ F1 , we see that also vi ∈ {0, 1} holds.

7.3 Split cuts


In this part, we will consider two ways to obtain explicit split cuts. The first
one is based on the assumption that we are given an arbitrary point x∗ and a
pair (π, π0 ) for which we already know that x∗ ∉ split(P, π, π0 ). In this case, we
want to derive an inequality that is valid for split(P, π, π0 ) but violated by x∗ .
This can be done by a simple linear program:

Theorem 7.11. Let P = P ≤ (A, b) with A ∈ Rm×n , b ∈ Rm , π ∈ Zn , π0 ∈ Z
and x∗ ∈ Rn . We have x∗ ∉ split(P, π, π0 ) if and only if there exist

β, δ ∈ Rm≥0 ,   γ, ε ≥ 0

such that
A| β + γπ = A| δ − επ =: α
and
α| x∗ > max{b| β + π0 γ, b| δ − (π0 + 1)ε} =: ζ.

Moreover, the inequality α| x ≤ ζ is valid for all x ∈ split(P, π, π0 ).

Proof. By Theorem 7.4, the point x∗ is contained in split(P, π, π0 ) if and only
if the system

x(0) + x(1) = x∗ [−α]


Ax(0) + (−b)λ0 ≤ 0 [β ≥ 0]
| (0)
π x + (−π0 )λ0 ≤ 0 [γ ≥ 0]
(1)
Ax + (−b)λ1 ≤ 0 [δ ≥ 0]
| (1)
(−π) x + (π0 + 1)λ1 ≤ 0 [ε ≥ 0]
λ0 + λ1 = 1 [ζ]
−λ0 ≤ 0 [η ≥ 0]
−λ1 ≤ 0 [θ ≥ 0]

has a solution. If the system is infeasible, by Farkas’ lemma, there exist multi-
pliers

α ∈ Rn , β ∈ Rm≥0 , γ ≥ 0, δ ∈ Rm≥0 , ε ≥ 0, ζ ∈ R, η ≥ 0, θ ≥ 0

such that

−α + A| β + γπ = 0
−α + A| δ − επ = 0
−b| β − π0 γ + ζ − η = 0
−b| δ + (π0 + 1)ε + ζ − θ = 0
(−x∗ )| α + ζ < 0.

The first claim follows by eliminating η, θ. It remains to show that every x ∈
split(P, π, π0 ) satisfies α| x ≤ ζ. To this end, it suffices to show that α| x ≤ ζ is
valid for all x ∈ (P ∩ H ≤ (π, π0 )) ∪ (P ∩ H ≥ (π, π0 + 1)). If x ∈ P ∩ H ≤ (π, π0 ), then

α| x = (A| β + γπ)| x = β | Ax + γπ | x ≤ β | b + γπ0 ≤ ζ.

If x ∈ P ∩ H ≥ (π, π0 + 1), then

α| x = (A| δ − επ)| x = δ | Ax − επ | x ≤ δ | b − ε(π0 + 1) ≤ ζ.
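
The proof above is constructive: a violated split cut can be extracted from a
single auxiliary LP that maximizes the violation α| x∗ − ζ over the multipliers.
The following Python sketch does this (our own illustrative code; the last
inequality row is an ad-hoc normalization of the multipliers that keeps the LP
bounded):

import numpy as np
from scipy.optimize import linprog

# Computing a split cut alpha^T x <= zeta violated by x_star, following the
# proof of Theorem 7.11; variable order: alpha (n), beta (m), gamma, delta (m), eps, zeta.
def split_cut(A, b, pi, pi0, x_star):
    m, n = A.shape
    pi = np.asarray(pi, dtype=float)
    def var(*parts):
        return np.concatenate([np.atleast_1d(np.asarray(p, dtype=float)) for p in parts])
    A_eq, b_eq = [], []
    for j in range(n):
        # A^T beta + gamma*pi = alpha   and   A^T delta - eps*pi = alpha
        A_eq.append(var(-np.eye(n)[j], A[:, j], pi[j], np.zeros(m), 0.0, 0.0))
        A_eq.append(var(-np.eye(n)[j], np.zeros(m), 0.0, A[:, j], -pi[j], 0.0))
        b_eq += [0.0, 0.0]
    A_ub = [var(np.zeros(n), b, pi0, np.zeros(m), 0.0, -1.0),         # b^T beta + pi0*gamma <= zeta
            var(np.zeros(n), np.zeros(m), 0.0, b, -(pi0 + 1), -1.0),  # b^T delta - (pi0+1)*eps <= zeta
            var(np.zeros(n), np.ones(m), 1.0, np.ones(m), 1.0, 0.0)]  # normalization of the multipliers
    b_ub = [0.0, 0.0, 1.0]
    c = -var(x_star, np.zeros(m), 0.0, np.zeros(m), 0.0, -1.0)        # maximize alpha^T x* - zeta
    bounds = [(None, None)] * n + [(0, None)] * (2 * m + 2) + [(None, None)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
    if res.status != 0 or -res.fun <= 1e-9:
        return None                       # no violated cut: x_star lies in split(P, pi, pi0)
    return res.x[:n], res.x[-1]           # the split cut alpha^T x <= zeta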

We remark that the above method only works if we know a pair (π, π0 )
such that the given point x∗ is not contained in split(P, π, π0 ). While it is easy
to identify such a pair (π, π0 ) for a fractional vertex, it is NP-hard to decide
whether a general point x∗ is contained in scl(P ).
Recall that the above method requires solving an auxiliary LP, which can be
done in polynomial time but still requires some computational effort, of course.
If we solve the LP relaxation using the simplex method, there is an even more
efficient way for obtaining a split cut, which is based on the following fact.
Definition 7.12. For α ∈ R let f (α) := α − ⌊α⌋ denote the fractional part of α.
Theorem 7.13. Let a ∈ Rn , β ∈ R \ Z and

P = {x ∈ Rn : a| x = β, x ≥ 0}.

Define π ∈ Zn via

πj = ⌊aj ⌋ if f (aj ) ≤ f (β),   πj = ⌈aj ⌉ if f (aj ) > f (β),

and g ∈ Rn via

gj = f (aj )/f (β) if f (aj ) ≤ f (β),   gj = (1 − f (aj ))/(1 − f (β)) if f (aj ) > f (β).

Then the inequality
g| x ≥ 1
is valid for all x ∈ split(P, π, ⌊β⌋).
Proof. Set

J≤ := {j ∈ [n] : f (aj ) ≤ f (β)}, J> := {j ∈ [n] : f (aj ) > f (β)},

and
Π0 := {x ∈ P : π | x ≤ π0 }, Π1 := {x ∈ P : π | x ≥ π0 + 1},
where π0 := ⌊β⌋. We have to show that g | x ≥ 1 holds for all x ∈ Π0 ∪ Π1 . For
all x ∈ P we have

β = a| x = Σ_{j∈J≤} aj xj + Σ_{j∈J>} aj xj
        = Σ_{j∈J≤} (πj + f (aj ))xj + Σ_{j∈J>} (πj + f (aj ) − 1)xj
        = π | x + Σ_{j∈J≤} f (aj )xj + Σ_{j∈J>} (f (aj ) − 1)xj .        (7.3)

If x ∈ Π0 , then π | x ≤ ⌊β⌋ and (7.3) implies

f (β) = β − ⌊β⌋ ≤ β − π | x = Σ_{j∈J≤} f (aj )xj + Σ_{j∈J>} (f (aj ) − 1)xj .

Multiplying both sides with 1/f (β) > 0 yields

1 ≤ Σ_{j∈J≤} (f (aj )/f (β)) xj + Σ_{j∈J>} ((f (aj ) − 1)/f (β)) xj ≤ g | x,

where the last inequality holds since f (aj )/f (β) = gj for j ∈ J≤ , while
(f (aj ) − 1)/f (β) < 0 ≤ gj for j ∈ J> and xj ≥ 0.

If x ∈ Π1 , then π | x ≥ ⌊β⌋ + 1 and hence by (7.3) we have

f (β) − 1 = β − ⌊β⌋ − 1 ≥ β − π | x = Σ_{j∈J≤} f (aj )xj + Σ_{j∈J>} (f (aj ) − 1)xj .

Multiplying both sides with 1/(f (β) − 1) < 0 yields

1 ≤ Σ_{j∈J≤} (f (aj )/(f (β) − 1)) xj + Σ_{j∈J>} ((f (aj ) − 1)/(f (β) − 1)) xj ≤ g | x,

where now the first coefficients satisfy f (aj )/(f (β) − 1) ≤ 0 ≤ gj and the
second coefficients are equal to gj .

Cuts of the above form are called Gomory Mixed Integer Cuts, often abbre-
viated as GMI cuts. The term “Mixed Integer” refers to the setting where only
a subset of the variables is required to be integer, in which case the above cuts
can be easily generalized. GMI cuts are implemented in every modern integer
programming solver and are widely considered among the most effective cuts.
On the one hand, they empirically provide very strong cuts. On the other hand,
they can be directly read off from a simplex tableau without solving an auxiliary
LP: To this end, recall that the simplex method is designed for LPs whose feasible
region is of the form

P = {x ∈ Rn : Ax = b, x ≥ 0},

where A ∈ Rm×n , b ∈ Rm . It traverses vertices of P , which are given by
a set of “basic variables” indexed by some set B ⊆ [n]. Every basic variable
is uniquely determined by the “nonbasic variables” indexed by N := [n] \ B.
The corresponding vertex x∗ is then defined by setting x∗j = 0 for all j ∈ N
(which then determines x∗j for all j ∈ B). Thus, if x∗ is fractional, then there
is some i ∈ B with x∗i ∉ Z. Using the simplex method, one determines explicit
coefficients aj ∈ R (j ∈ N ) such that

xi + Σ_{j∈N} aj xj = x∗i        (7.4)

holds for all x ∈ P . At this point, consider the GMI cut

g| x ≥ 1

from Theorem 7.13, applied to the equation (7.4) with β = x∗i ∉ Z. To see that
this inequality is violated by x∗ , observe that every basic variable has an integer
coefficient in (7.4). This implies gj = 0 for all j ∈ B, and moreover x∗j = 0 for
all j ∈ N , and hence

g | x∗ = Σ_{j∈B} gj x∗j + Σ_{j∈N} gj x∗j = 0 < 1.
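
As a small illustration, the GMI coefficients of Theorem 7.13 can be computed
directly from a tableau row; the following Python sketch (our own illustrative
code) does exactly this:

import numpy as np

# GMI coefficients from a tableau row a^T x = beta with fractional beta,
# exactly as in Theorem 7.13.
def frac(v):
    return v - np.floor(v)

def gmi_cut(a, beta):
    a = np.asarray(a, dtype=float)
    f, f0 = frac(a), frac(beta)           # f0 must be nonzero here
    return np.where(f <= f0, f / f0, (1.0 - f) / (1.0 - f0))

# Example: tableau row x1 + 0.3 x2 + 1.8 x3 = 2.4 (x1 basic) yields the cut
# 0.75 x2 + (1/3) x3 >= 1, which the LP vertex (2.4, 0, 0) violates.
print(gmi_cut([1.0, 0.3, 1.8], 2.4))      # [0.   0.75  0.333...]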
Chapter 8

Branch & cut

8.1 Branch & bound Lecture 20

While we have mainly focused on theoretical aspects concerning integer pro-


grams, let us now take a look at how such problems are solved in practice.
Given their practical relevance, it may not be surprising that many professional
software tools for solving integer programs are available. At their core, they all
follow an algorithmic paradigm called “branch and bound”. This method was
first proposed by Ailsa Land and Alison Doig in 1960 and has become the most
commonly used tool for solving discrete or combinatorial optimization problems.
We first describe the paradigm in terms of a general optimization problem.
Suppose we are (implicitly) given a set Π and a function f : Π → R that we want
to maximize. The branch & bound paradigm refers to a search algorithm that
creates subproblems, which we will identify with subsets Π0 ⊆ Π. To make this
algorithm work, we need access to three important routines. Given a subproblem
Π0 ⊆ Π, we want to ...
• ... try finding a point x ∈ Π0 . (heuristic)
• ... compute a value u(Π0 ) ≥ max{f (x) : x ∈ Π0 }. (dual bound)
• ... divide Π0 into two (or more) sets Π1 , Π2 ⊆ Π0 with max{f (x) : x ∈
Π0 } = max{f (x) : x ∈ Π1 ∪ Π2 }. (branching decision)
All three components should be fast, since they are usually called many times.
A corresponding branch & bound algorithm looks as follows.
1. L ← −∞ (primal bound; value of currently best known solution)
2. U ← ∞ (dual bound; known upper bound on optimum value)
3. P ← {Π} (problems left)
4. while P ≠ ∅:
5.     U ← max{u(Π′ ) : Π′ ∈ P} (“bound”)
6.     pick some Π′ ∈ P (consider a subproblem)
7.     if heuristic finds a point x ∈ Π′ with f (x) > L:
8.         L ← f (x)
9.         P ← {Π′′ ∈ P : u(Π′′ ) > L} (“prune”)
10.    if u(Π′ ) > L: (subproblem may still contain optimal solution)
11.        create subproblems Π1 , Π2 ⊆ Π′ (branch)
12.        P ← (P \ {Π′ }) ∪ {Π1 , Π2 }
Denoting OPT = max{f (x) : x ∈ Π} we observe that within every step of
the above algorithm we have

L ≤ OPT ≤ U.

We remark that it is not necessary to update U in every execution of line 5 (and


clearly, there is no point in recomputing u(Π′′ ) for the same Π′′ repeatedly).
However, we would like to stress a great feature of a branch & bound algorithm:
At every execution step we are able to provide a guarantee of the form L ≤
OPT ≤ U to estimate the quality of the best solution found so far. Of course,
the quality of this bound depends on the progress but also on the quality of our
heuristic and dual bounds.
Let us comment on the three main ingredients (heuristic, dual bound, branch-
ing decision) and how they affect the solution process in general.
The heuristic can be actually seen as a collection of algorithms determining
a (preferably) good solution for a subproblem. In most cases, these algorithms
are required to be very fast and are usually nonoptimal. In fact, they often do
not even return a feasible solution. However, the better our heuristics are, the
faster we can increase L and get closer to OPT. On the one hand, this results
in a smaller gap L ≤ OPT ≤ U , which may suggest that we can stop earlier.
On the other hand, a larger value of L allows to prune subproblems earlier,
resulting in a smaller search space.
The dual bound u that we want to compute for each subproblem has a similar
effect. Again, we want to be able to determine u(Π′ ) quickly. Usually, u(Π′ )
will be larger than max{f (x) : x ∈ Π′ }, but the closer it is the better we can
estimate whether it’s worth looking at the subproblem.
Finally we want to be able to create “good” subproblems Π1 , Π2 . Observe
that the creation of subproblems creates a tree, whose root is the original prob-
lem Π and each subproblem Π′ is followed by its two children Π1 , Π2 . This
tree is called the branch & bound tree. Typically, it is desirable to create a
“balanced” tree whose nodes (subproblems) are of similar difficulty and pruned
in a rather uniform way.

8.2 Branch & bound for integer programs


How do we implement the above framework in order to solve an integer program
max{c| x : Ax ≤ b, x ∈ Zn }? Clearly, we consider the objective function f (x) =
c| x. We identify our original problem Π with the polyhedron P ≤ (A, b), and
keep in mind that we are only interested in the integer points in P ≤ (A, b).

Straightforward implementation Each subproblem Π′ will correspond to a
polyhedron Q ⊆ P ≤ (A, b). The dual bound can be quickly computed by solving
the linear programming relaxation u(Π′ ) = max{c| x : x ∈ Q}. Let x∗ ∈ Q
be an optimal solution to the LP. If x∗ is integer, it seems to be a perfect
choice for a heuristic solution. If not, we have x∗i ∉ Z for some index i. In
this case, it is natural to create two subproblems by considering the polyhedra
Q1 = {x ∈ Q : xi ≤ ⌊x∗i ⌋} and Q2 = {x ∈ Q : xi ≥ ⌈x∗i ⌉}. The heuristics may
use x∗ to find a good integer point in Q, for instance by rounding it. It can be
easily seen that this setup yields a finite algorithm for integer programming, at
least for bounded polyhedra.
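
A bare-bones version of this straightforward implementation fits in a few
lines; the following Python sketch (our own illustrative code, assuming scipy,
a bounded feasible region and no rounding heuristic; the prune step is applied
lazily, when a subproblem is popped) combines LP dual bounds and variable
branching:

import numpy as np
from scipy.optimize import linprog

# LP-based branch & bound for max{c^T x : Ax <= b, x integer} on a bounded region.
def branch_and_bound(c, A, b, bounds):
    best_val, best_x = -np.inf, None
    stack = [bounds]                                   # subproblems = per-variable bounds
    while stack:
        bnds = stack.pop()
        res = linprog(-np.asarray(c), A_ub=A, b_ub=b, bounds=bnds)
        if res.status != 0 or -res.fun <= best_val:
            continue                                   # infeasible or pruned by dual bound
        x = res.x
        i = np.argmax(np.abs(x - np.round(x)))         # most fractional variable
        if abs(x[i] - round(x[i])) < 1e-6:
            best_val, best_x = -res.fun, np.round(x)   # integral LP optimum: new incumbent
            continue
        lo, hi = bnds[i]
        left, right = list(bnds), list(bnds)           # variable branching on x_i
        left[i] = (lo, np.floor(x[i]))
        right[i] = (np.ceil(x[i]), hi)
        stack += [left, right]
    return best_val, best_x

# Example: max x1 + x2 s.t. 2x1 + x2 <= 4, x1 + 3x2 <= 6, 0 <= x <= 4, x integer.
val, x = branch_and_bound([1, 1], np.array([[2.0, 1.0], [1.0, 3.0]]),
                          np.array([4.0, 6.0]), [(0, 4), (0, 4)])
print(val, x)                                          # optimum value 2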

Branching In practice, there are many degrees of freedom to tune this algo-
rithm. Recall that we have created subproblems by simple “variable branching”.
To obtain a balanced tree, it is an important question which variables to branch
on (which index i to choose) and there are many clever algorithms to estimate
the effect of a branching. Moreover, there is no reason to restrict ourselves
to simple variable branching. Recall the idea behind split cuts: Any vector
π ∈ Zn with π | x∗ ∈/ Z creates two natural subproblems that exclude x∗ . How-
ever, selecting a good vector π different from unit vectors is quite challenging
in practice: One issue is that the search tree may become very unbalanced.
As an example, suppose all variables areP binary and wePhave decided to
create subproblems usingPthe inequalities i∈I xi ≤ 0 and i∈I xi ≥ 1. For
the problem defined by i∈I xi ≤ 0 we immediately know that all variables
indexed by I have to be zero, and hence can be eliminated P from the problem,
reducing the problem size. For the problem defined by i∈I xi ≥ 1 we only
know that at least one of the variables indexed by I has to be one, but we don’t
know which one. Thus, the latter problem does not seem to correspond to a
great restriction.

Warmstarts In any case, an important aspect of creating subproblems by
adding linear inequalities is the possibility of solving the linear programming
relaxation using “warmstarts”. That is, suppose we have solved max{c| x : x ∈
Q} and let Q′ be a polyhedron that arises from Q by adding a single linear
inequality. Clearly, the optimal (primal) solution to max{c| x : x ∈ Q} may
not be feasible for max{c| x : x ∈ Q′ }. However, the optimal dual solution for
the first problem is still feasible (though not necessarily optimal). Thus, we can
start from this dual solution and solve the dual problem instead of the primal
one. This can be done very efficiently using the simplex algorithm (resulting
in the “dual simplex” algorithm). In fact, in many cases solving the “root LP”
max{c| x : Ax ≤ b} may take a long time, but resolving the LP relaxations of
the subproblems only takes a few iterations.

Other components Regarding the heuristics, modern solvers have imple-


mented many algorithms to find a feasible solution, based on very interesting
ideas. However, we will not have time to cover them.
Finally, of course there are several other aspects on practical IP solving that
we do not touch in this class. For instance, solving the intermediate linear
programs is one of the main bottlenecks. Nowadays, very sophisticated imple-
mentations of the simplex algorithm (involving very fast linear algebra) but also
interior point algorithms are used for this task. Moreover, presolving techniques
that reduce the problem size in advance are an important ingredient.

Good relaxations As a final comment in this section, let us recall that the
performance of the branch & bound algorithm crucially depends on the quality of
the bounds we obtain during execution. While there are many ways to improve
heuristics to obtain good primal bounds, there seems to be not much freedom
about improving the dual bounds. What else can we do other than solve the LP
relaxation? A simple answer to this question is: start with a good LP relaxation.
That is, start with a system Ax ≤ b such that P ≤ (A, b) is close to its integer
hull. This is not a simple task (especially for NP-hard problems), but we will
see some good practices in the exercises.

8.3 Branch & cut


So far, we have missed one important ingredient for solving integer programs in
practice: cuts. In previous chapters we have discussed several ways to obtain
linear inequalities that are valid for the integer hull and that cut off given points.
Modern integer programming solvers make extensive use of this paradigm, which
can be easily incorporated into our branch & bound framework: Whenever a
subproblem (polyhedron) is considered, try to add cuts to obtain a strengthened
relaxation. This is particularly done at the initial LP (the root LP), but also at
intermediate LPs of the branch & bound tree. This extension of the branch &
bound algorithm is usually called a branch-and-cut algorithm.

8.4 Solver output


To illustrate several of the above discussed concepts it is always useful to look
at the log output of an integer programming solver, which looks similar across
different solvers. During class, we will discuss the following output, which is
generated by running our image segmentation example from the exercises on
img4.jpg.
Gurobi Optimizer version 9.1.2 build v9.1.2rc0 (mac64)
Thread count: 2 physical cores, 4 logical processors, using up to 4 threads
Optimize a model with 151209 rows, 47402 columns and 453625 nonzeros
Model fingerprint: 0x7cc2411d
Variable types: 37802 continuous, 9600 integer (9600 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [2e-12, 8e-02]
Bounds range [0e+00, 0e+00]
RHS range [2e+00, 2e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 13 rows and 4 columns
Presolve time: 0.64s
Presolved: 151196 rows, 47398 columns, 453588 nonzeros
Variable types: 0 continuous, 47398 integer (47398 binary)

Deterministic concurrent LP optimizer: primal and dual simplex


Showing first log only...

Root simplex log...

Iteration Objective Primal Inf. Dual Inf. Time


18538 1.6505447e+00 5.632130e+01 0.000000e+00 5s
27143 1.6505503e+00 9.287491e+01 0.000000e+00 10s
34784 1.6505545e+00 2.749886e+02 0.000000e+00 15s
Concurrent spin time: 0.01s

Solved with dual simplex

Root relaxation: objective 2.195599e+00, 62513 iterations, 17.23 seconds



Nodes | Current Node | Objective Bounds | Work


Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time

0 0 2.19560 0 9599 -0.00000 2.19560 - - 18s


H 0 0 0.4271054 2.19560 414% - 18s
0 0 2.16350 0 9849 0.42711 2.16350 407% - 22s
H 0 0 0.5316163 2.16350 307% - 22s
0 0 2.12687 0 9931 0.53162 2.12687 300% - 26s
0 0 2.12682 0 9932 0.53162 2.12682 300% - 27s
0 0 2.10057 0 10020 0.53162 2.10057 295% - 32s
H 0 0 0.5455696 2.10057 285% - 32s
H 0 0 0.6045253 2.10057 247% - 32s
0 0 2.09846 0 10017 0.60453 2.09846 247% - 32s
0 0 2.08109 0 10016 0.60453 2.08109 244% - 37s
H 0 0 0.6257337 2.08109 233% - 38s
0 0 2.07806 0 10018 0.62573 2.07806 232% - 38s
0 0 2.06886 0 10060 0.62573 2.06886 231% - 40s
H 0 0 0.6576363 2.06886 215% - 40s
0 0 2.06844 0 10068 0.65764 2.06844 215% - 41s
0 0 2.06525 0 10079 0.65764 2.06525 214% - 43s
H 0 0 0.7205576 2.06525 187% - 43s
H 0 0 0.7567742 2.06525 173% - 43s
0 0 2.06450 0 10080 0.75677 2.06450 173% - 43s
0 0 2.06335 0 10065 0.75677 2.06335 173% - 45s
0 0 2.06320 0 10074 0.75677 2.06320 173% - 46s
0 0 2.06309 0 10082 0.75677 2.06309 173% - 48s
H 0 0 0.8058161 2.06309 156% - 48s
0 0 2.06309 0 10082 0.80582 2.06309 156% - 49s
0 0 2.06293 0 10080 0.80582 2.06293 156% - 51s
0 0 2.06293 0 10080 0.80582 2.06293 156% - 51s
0 0 2.05896 0 10125 0.80582 2.05896 156% - 58s
0 0 2.05808 0 10105 0.80582 2.05808 155% - 59s
0 0 2.05618 0 10115 0.80582 2.05618 155% - 61s
0 0 2.05599 0 10143 0.80582 2.05599 155% - 62s
0 0 2.05536 0 10127 0.80582 2.05536 155% - 64s
0 0 2.05524 0 10138 0.80582 2.05524 155% - 65s
0 0 2.05508 0 10138 0.80582 2.05508 155% - 67s
0 0 2.05507 0 10141 0.80582 2.05507 155% - 67s
0 0 2.05507 0 10141 0.80582 2.05507 155% - 69s
H 0 0 0.8102670 2.05507 154% - 70s
0 0 2.05504 0 10146 0.81027 2.05504 154% - 72s
0 0 2.05500 0 10154 0.81027 2.05500 154% - 72s
0 0 2.05491 0 10153 0.81027 2.05491 154% - 75s
0 0 2.05491 0 10152 0.81027 2.05491 154% - 76s
H 0 0 0.9590914 2.05491 114% - 76s
H 0 0 1.0810808 2.05491 90.1% - 77s
H 0 2 1.1433983 2.05491 79.7% - 78s
0 2 2.05491 0 10152 1.14340 2.05491 79.7% - 78s
1 4 1.88895 1 10142 1.14340 2.05485 79.7% 6841 83s
3 6 1.76498 2 10147 1.14340 1.88890 65.2% 4523 87s
7 10 1.59969 3 10153 1.14340 1.88886 65.2% 2901 90s
13 16 1.43371 4 10154 1.14340 1.88886 65.2% 2460 95s
22 19 1.27104 6 10156 1.14340 1.88886 65.2% 1958 101s
26 18 cutoff 8 1.14340 1.88886 65.2% 1997 107s
H 28 18 1.6505431 1.88886 14.4% 1855 107s
29 16 cutoff 7 1.65054 1.76485 6.93% 1880 112s
33 14 cutoff 3 1.65054 1.76485 6.93% 1977 115s

Cutting planes:
Gomory: 9
MIR: 180
Zero half: 253
RLT: 604

Explored 40 nodes (165193 simplex iterations) in 117.97 seconds


Thread count was 4 (of 4 available processors)

Solution count 10: 1.65054 1.1434 1.08108 ... 0.625734

Optimal solution found (tolerance 1.00e-04)


Best objective 1.650543074860e+00, best bound 1.650543074860e+00, gap 0.0000%
Chapter 9

Integer programming in fixed dimension

9.1 Lenstra’s result Lecture 21

After this short excursion into practice, let us turn to some more beautiful
theoretical results at the end of this lecture. The first one is a landmark result
by Hendrik Lenstra. Recall that it is NP-hard to solve general integer programs.
Lenstra proved that integer programs with a fixed (constant) number of variables
can be solved in polynomial time.
Theorem 9.1 (Lenstra 1981). For every fixed n, there is a polynomial-time
algorithm to solve integer programs max{c| x : Ax ≤ b, x ∈ Zn }.
As usual, here the running time is measured in terms of the encoding
lengths of c, A, b. Let us note that the above result is trivial if all variables are
required to be binary, i.e., x ∈ {0, 1}n . In this case, we can simply enumerate all
2n points in {0, 1}n (which is a constant number if we regard n as a constant)
and return the best feasible point. However, solving general (nonbinary) integer
programs is already nontrivial for n = 2. Recall that we can model the compu-
tation of the greatest common divisor of two numbers with a 2-variable integer
program, and the fact that the greatest common divisor can be computed in
polynomial time requires at least some thinking.1
It turns out that Lenstra’s algorithm relies on two mathematical facts that
are beautiful on their own. In this chapter, we will first collect these results in
a rather independent fashion. Once they are established, Lenstra’s algorithm
follows quite naturally.

9.2 Approximating convex bodies by ellipsoids Lect’s 22 & 23

We begin with a central result in convex geometry, namely that every convex
body can be nicely sandwiched between two concentric ellipsoids.
1 It is always a nice exercise to remind yourself why the Euclidean algorithm runs in
polynomial time.


Definition 9.2 (Convex body). A convex body is a compact (i.e., bounded and
closed) convex set in Rn with nonempty interior.

Theorem 9.3. For every convex body K ⊆ Rn there exists an ellipsoid E with
center z such that
E ⊆ K ⊆ n(E − z) + z.
Note that the ellipsoid λ(E − z) + z arises from E by scaling it by a factor
of λ around its center z. To phrase the above statement differently, recall that
our original definition of an ellipsoid E was based on a positive definite matrix
D ∈ Rn×n and a vector z ∈ Rn (the center of E), where E = E(z, D) := {x ∈
Rn : (x − z)| D(x − z) ≤ 1}. An alternative definition is that ellipsoids are
precisely the images of Euclidean unit balls under invertible affine linear maps:
Lemma 9.4. A set E ⊆ Rn is an ellipsoid if and only if there exists a nonsin-
gular matrix A ∈ Rn×n and a vector z ∈ Rn with

E = ABn + z,

where Bn = {x ∈ Rn : kxk ≤ 1} is the standard unit ball in Rn .


Proof. Exercises.

Thus, an equivalent version of the above statement is the following.


Theorem 9.5. For every convex body K ⊆ Rn there exists an invertible affine
linear map τ : Rn → Rn such that

Bn ⊆ τ (K) ⊆ nBn .

Proof. By Theorem 9.3 and Lemma 9.4 there exists a nonsingular matrix A ∈
Rn×n and a vector z ∈ Rn with

ABn + z ⊆ K ⊆ nABn + z.

Define τ to be the invertible affine linear map with τ (x) = A−1 (x − z). The
claim follows since τ (ABn + z) = Bn and τ (nABn + z) = nBn .
It turns out that an ellipsoid with the above properties can be obtained by
taking the maximum-volume ellipsoid:
Theorem 9.6 (Löwner-John ellipsoid). Let K be a convex body. Among all
ellipsoids contained in K there is a unique one with maximum volume.
We will first prove the above result and then show that the respective el-
lipsoid satisfies Theorem 9.3. To this end, let us collect a few further insights
about ellipsoids.
Lemma 9.7. Let E = E(z, D) = {x ∈ Rn : (x − z)| D(x − z) ≤ 1} be an
ellipsoid with positive definite D ∈ Rn×n and center z ∈ Rn . The volume of E
is given by vol(E) = vol(Bn )/√det(D).

Proof. Exercises.

The next statement shows that the matrix in Lemma 9.4 can be chosen to
be positive definite.
Lemma 9.8. A set E ⊆ Rn is an ellipsoid if and only if there exists a positive
definite matrix A ∈ Rn×n and a vector z ∈ Rn with E = ABn + z.
Proof. Let A ∈ Rn×n be nonsingular and z ∈ Rn such that E = ABn + z. The
polar decomposition of A yields a positive definite matrix A0 ∈ Rn×n and an
orthogonal matrix U ∈ Rn×n with A = A0 U . The claim follows since U Bn =
Bn .
Towards the proof of Theorem 9.6, let us first consider the special case that
there cannot be two maximum-volume ellipsoids that are translates of each
other.
Lemma 9.9. Let E ⊆ Rn be an ellipsoid and let t ∈ Rn \ {0}. Then there exists
an ellipsoid E 0 ⊆ conv(E ∪ (E + t)) with vol(E 0 ) > vol(E).
Proof. We may assume that ktk = 1. Let us call v1 := t and set

ε := 1/(2 max{|v1| x| : x ∈ E}) > 0.

Pick vectors v2 , . . . , vn ∈ Rn such that v1 , . . . , vn are an orthonormal basis of
Rn . Consider the linear map f : Rn → Rn defined via

f (Σ_{i=1}^n αi vi ) = (1 + ε)α1 v1 + Σ_{i=2}^n αi vi .

Observe that the transformation matrix of f with respect to the basis v1 , . . . , vn
has determinant 1 + ε and hence vol(f (E)) = (1 + ε) vol(E). In particular, the
ellipsoid E ′ := f (E) + (1/2)t satisfies

vol(E ′ ) > vol(E).

It remains to show that E ′ ⊆ conv(E ∪ (E + t)). To this end, let y ∈ E ′ . Then
there exists some x ∈ E with y = f (x) + (1/2)t. Let α1 , . . . , αn ∈ R with
x = Σ_{i=1}^n αi vi . Defining

λ := 1/2 − εα1 = 1/2 − εv1| x ∈ [0, 1]

we obtain

y = f (x) + (1/2)t = x + εα1 v1 + (1/2)t = x + (1/2 − λ)t + (1/2)t
  = λx + (1 − λ)(x + t) ∈ conv(E ∪ (E + t)).

The final ingredient for Theorem 9.6 is the fact that the determinant is “log-
concave” over the cone of positive definite matrices, i.e., log det(·) is a concave
function over the set of positive definite matrices.

Lemma 9.10. For positive definite matrices A, B ∈ Rn×n we have

det((1/2)(A + B)) ≥ √(det(A) det(B)),

with equality if and only if A = B.



Proof. Let U ∈ Rn×n be a positive definite matrix with U 2 = A. The matrix
B ′ := U −1 BU −1 is positive definite with eigenvalues µ1 , . . . , µn > 0. We have

√(det(A) det(B)) = det(U )2 √(det(B ′ )) = det(U )2 Π_{i=1}^n √µi

as well as

det((1/2)(A + B)) = det(U )2 det((1/2)U −1 (A + B)U −1 )
                  = det(U )2 det((1/2)(I + B ′ )) = det(U )2 Π_{i=1}^n (1 + µi )/2.

The claim follows since

(1 + µi )/2 ≥ √(1 · µi ) = √µi

by the inequality of the arithmetic vs. geometric mean, where equality holds if
and only if 1 = µi . Note that in the case of µi = 1 for all i we have B ′ = I and
hence B = U 2 = A.
Proof of Theorem 9.6. We first show the existence of a maximum-volume ellip-
soid in K. By Lemma 9.8 we have to show that the function (A, z) 7→ det(A)
attains its maximum over the set

Σ := {(A, z) : A ∈ Rn×n positive definite, z ∈ Rn , ABn + z ⊆ K}.

To this end, consider the set

Σ̄ := {(A, z) : A ∈ Rn×n positive semidefinite, z ∈ Rn , ABn + z ⊆ K}.

Since K and the cone of positive semidefinite matrices are closed, so is Σ̄. More-
over, Σ̄ is bounded: Since K is bounded, there is some M > 0 such that
K ⊆ [−M, M ]n . For every (A, z) ∈ Σ̄ we clearly have z = A·0 + z ∈ K and
hence z ∈ [−M, M ]n . Moreover, we have

A∗,i = Aei + z − z ∈ K − z ⊆ [−2M, 2M ]n

for every column A∗,i of A.
In other words, Σ̄ is a compact set over which the continuous function
(A, z) 7→ det(A) attains its maximum. Since K is full-dimensional, it con-
tains a ball of positive volume and hence the maximum function value over Σ̄
is positive. Thus, if (A∗ , z ∗ ) is a maximizer over Σ̄, it satisfies det(A∗ ) > 0 and
hence A∗ is positive definite. We see that (A∗ , z ∗ ) is even a maximizer over Σ.
To show the uniqueness, let E1 , E2 ⊆ K be ellipsoids in K of maximum
volume. Again by Lemma 9.8 there are positive definite matrices A1 , A2 ∈ Rn×n
and vectors z1 , z2 ∈ Rn with Ei = Ai Bn + zi . Consider the ellipsoid

E := (1/2)(A1 + A2 )Bn + (1/2)(z1 + z2 ) = (1/2)E1 + (1/2)E2 ⊆ K.

By Lemma 9.10, the maximality of E1 , E2 implies

det(A1 ) ≥ det((1/2)(A1 + A2 )) ≥ √(det(A1 ) det(A2 )) = det(A1 ).

Thus, we have det((1/2)(A1 + A2 )) = √(det(A1 ) det(A2 )) and hence Lemma 9.10
yields A1 = A2 . Thus, we have E1 + (z2 − z1 ) = E2 ⊆ K, which implies z1 = z2
by Lemma 9.9 and hence E1 = E2 .
Let us now turn to the fact that Löwner-John ellipsoids satisfy Theorem 9.3.
Lemma 9.11. For every ellipsoid E ⊆ Rn with center z ∈ Rn there exist
α1 , . . . , αn > 0 and an orthonormal basis v1 , . . . , vn ∈ Rn such that

E = {x ∈ Rn : Σ_{i=1}^n ((x − z)| vi /αi )2 ≤ 1}.

In this case, we have vol(E) = vol(Bn ) · Π_{i=1}^n αi .
Proof. Exercises.
Our proof of Theorem 9.3 makes use of the following optimality criterion by
John (1948):
Theorem 9.12 (John 1948, Ball 1997). Let K ⊆ Rn be a convex body with
Bn ⊆ K. The maximum-volume ellipsoid of K is equal to Bn if and only if
there exist
u1 , . . . , um ∈ ∂Bn ∩ ∂K
and c1 , . . . , cm > 0 with

Σ_{i=1}^m ci ui = 0   and   Σ_{i=1}^m ci ui u|i = I.        (9.1)

Proof. Suppose first that (9.1) holds. Then we have

n = Σ_{j=1}^n (Σ_{i=1}^m ci ui u|i )j,j = Σ_{i=1}^m ci Σ_{j=1}^n (ui u|i )j,j = Σ_{i=1}^m ci kui k2 = Σ_{i=1}^m ci ,   (9.2)

where the last equality holds since ui ∈ ∂Bn . Since Bn ⊆ K and ui ∈ ∂Bn ∩ ∂K
we must have
u|i x ≤ 1 for all x ∈ K. (9.3)
(Since ui is in the boundary of ∂K we can choose a supporting hyperplane H
of K at ui . Since Bn ⊆ K and ui ∈ ∂Bn , we see that H is also a supporting
hyperplane of Bn at ui . This hyperplane is unique and has the above form.)
Let E ⊆ K be an ellipsoid with

E = {x ∈ Rn : Σ_{j=1}^n ((x − z)| vj /αj )2 ≤ 1}

as in the previous lemma. We have to show that Π_{i=1}^n αi ≤ 1. It is straightfor-
ward to check that, for every i ∈ [m], the point

xi := z + (Σ_{j=1}^n αj2 (u|i vj )2 )−1/2 Σ_{j=1}^n αj2 (u|i vj ) vj ∈ Rn

is contained in E. By (9.3) we have u|i xi ≤ 1 and hence

u|i z + (Σ_{j=1}^n αj2 (u|i vj )2 )1/2 ≤ 1.

Multiplying each inequality with ci and summing up yields (since Σi ci ui = 0)

Σ_{i=1}^m ci (Σ_{j=1}^n αj2 (u|i vj )2 )1/2 ≤ Σ_{i=1}^m ci = n,
where the last equality is due to (9.2). Since Σ_{i=1}^m ci (u|i x)2 = kxk2 holds for all
x and since v1 , . . . , vn form an orthonormal basis, we obtain

Σ_{i=1}^n αi = Σ_{j=1}^n αj Σ_{i=1}^m ci (u|i vj )2        (using Σ_{i=1}^m ci (u|i vj )2 = kvj k2 = 1)
            = Σ_{i=1}^m ci Σ_{j=1}^n αj (u|i vj )2
            ≤ Σ_{i=1}^m ci (Σ_{j=1}^n αj2 (u|i vj )2 )1/2 (Σ_{j=1}^n (u|i vj )2 )1/2        (Cauchy-Schwarz)
            = Σ_{i=1}^m ci (Σ_{j=1}^n αj2 (u|i vj )2 )1/2 ≤ n,

where the last step uses Σ_{j=1}^n (u|i vj )2 = kui k2 = 1 and the previous display. By
the inequality of the arithmetic vs. geometric mean we conclude

Π_{j=1}^n αj ≤ ((Σ_{j=1}^n αj )/n)n ≤ 1.

Let us now suppose that (9.1) does not hold. Then we have

λ(I, 0) ∉ conv{(uu| , u) : u ∈ ∂Bn ∩ ∂K}   for all λ ∈ R.

(Note that for λ ≤ 0 we have ⟨(I, 0), (uu| , u)⟩ = kuk2 = 1 > 0 ≥ λn =
⟨(I, 0), λ(I, 0)⟩.) By the separation theorem there exists a symmetric matrix
H ∈ Rn×n and a vector h ∈ Rn with

⟨H, I⟩ = ⟨(H, h), (I, 0)⟩ = 0        (9.4)

and
u| Hu + h| u = ⟨(H, h), (uu| , u)⟩ > 0        (9.5)

for all u ∈ ∂Bn ∩ ∂K.
Since H is symmetric, we know that Qε := I + εH is positive definite for
small enough ε > 0. Let zε := −(ε/2)Q−1ε h and consider the ellipsoid

Eε := {x ∈ Rn : (x − zε )| Qε (x − zε ) ≤ 1}.

Note that

(x − zε )| Qε (x − zε ) = x| Qε x − 2zε| Qε x + zε| Qε zε
                        = kxk2 + ε(x| Hx + h| x) + zε| Qε zε
                        ≥ kxk2 + ε(x| Hx + h| x).
By (9.5) we thus have


(u − zε )| Qε (u − zε ) > 1 (9.6)
for all u ∈ ∂Bn ∩ ∂K. Since Bn ⊆ K and ∂K is compact, we see that (9.6) is
even valid for all u ∈ ∂K if we choose ε > 0 small enough. In particular, we have
Eε ∩ ∂K = ∅ for all sufficiently small ε > 0. However, we have E0 = Bn ⊆ K.
Thus, for sufficiently small ε > 0 we see that Eε is still contained in K and
Eε ≠ Bn .
It remains to show vol(Eε ) ≥ vol(Bn ). Let λ1 , . . . , λn denote the eigenvalues
of Qε . By Lemma 9.7, arithmetic mean vs. geometric mean, and (9.4) we obtain

(vol(Bn )/vol(Eε ))2 = Π_{i=1}^n λi ≤ ((Σ_{i=1}^n λi )/n)n = (tr(I + εH)/n)n
                    = (⟨I, I + εH⟩/n)n = 1,

since ⟨I, H⟩ = 0 by (9.4).

Finally, we are ready to prove the main result of this section.

Proof of Theorem 9.3. By applying a suitable affine linear transformation to K,
we may assume that the maximum-volume ellipsoid in K is equal to Bn . Since
Bn ⊆ K we have u| x ≤ 1 for all u ∈ ∂Bn ∩ ∂K and all x ∈ K. Furthermore,
by Cauchy-Schwarz we have |u| x| ≤ kukkxk = kxk, and in particular u| x ≥
−kxk for all u ∈ ∂Bn and all x ∈ Rn . Let u1 , . . . , um and c1 , . . . , cm be as in
Theorem 9.12. Then, for every x ∈ K, we have

0 ≤ Σ_{i=1}^m ci (1 − u|i x)(u|i x + kxk)
  = (1 − kxk) Σ_{i=1}^m ci u|i x + kxk Σ_{i=1}^m ci − Σ_{i=1}^m ci (u|i x)2
  = nkxk − kxk2 ,

where we used Σi ci ui = 0, Σi ci = n (see the previous proof), and
Σi ci (u|i x)2 = kxk2 . That is, kxk2 ≤ nkxk, which implies kxk ≤ n.

Actually, the following strengthening of Theorem 9.3 can be given for sym-
metric convex bodies:

Theorem 9.13. For every centrally symmetric (i.e., K = −K) convex body
K ⊆ Rn there exists an ellipsoid E with center z such that

E ⊆ K ⊆ √n (E − z) + z.

Proof. Let x ∈ K and recall that we have u| x ≤ 1 for all u ∈ ∂Bn ∩ ∂K.
Since −x ∈ K, we also have u| x ≥ −1 and hence |u| x| ≤ 1. Thus, choosing
u1 , . . . , um and c1 , . . . , cm as in Theorem 9.12 we obtain

kxk2 = Σ_{i=1}^m ci (u|i x)2 ≤ Σ_{i=1}^m ci = n,

and hence kxk ≤ √n.

An important aspect that we have not discussed yet is how to compute
ellipsoids as in Theorem 9.3. Given a rational polytope, our representation of
its Löwner-John ellipsoid may involve irrational numbers and hence we have
some issues with computing it exactly. However, it can be efficiently
approximated:

Theorem 9.14. Given the inequality description Ax ≤ b of a rational full-
dimensional polytope P ⊆ Rn and ε > 0, we can compute an ellipsoid E ⊆ Rn
with center z and
E ⊆ P ⊆ (1 + ε)n(E − z) + z
in time polynomial in n, 1/ε and the encoding lengths of A, b.

Proof. See Schrijver’s book, Theorem 15.6.
In the exercises, we will discuss a very simple method to obtain an ellipsoid
E with E ⊆ P ⊆ f (n)(E − z) + z, where f (n) = O(n2 ). It turns out that any
factor f (n) that only depends on n is sufficient for our applications.

9.3 Flatness theorem Lecture 24

The second ingredient of Lenstra’s algorithm is the Flatness theorem, which is


concerned with the interaction of convex sets and lattices in a similar spirit
as Minkowski’s convex body theorem (Theorem 3.23). Recall that Minkowski’s
theorem states that every centrally symmetric convex body with volume at least
2n det(Λ) contains a nonzero lattice point. The Flatness theorem is a powerful
statement for convex bodies that are not necessarily centrally symmetric. Note
that we can easily create convex bodies of arbitrarily large volume that do not
contain any lattice point. However, when constructing such bodies it feels that
they must be somewhat “flat”. To formalize this, let us introduce the notion of
the dual lattice.
Definition 9.15 (Dual lattice). The dual lattice of a full-dimensional lattice
Λ ⊆ Rn is the set

Λ∗ = {y ∈ Rn : x| y ∈ Z for all x ∈ Λ}.

Here, we say that a lattice Λ ⊆ Rn is full-dimensional if dim(Λ) = n. Note


that we have (Zn )∗ = Zn .

Lemma 9.16. The dual lattice of a full-dimensional lattice is again a lattice.


Moreover, if Λ = Λ(A) for some invertible matrix A, then Λ∗ = Λ(A−| ).
Proof. Exercises.
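
As a quick numerical illustration of the second statement (our own snippet,
assuming numpy):

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # columns form a basis of a planar lattice
D = np.linalg.inv(A).T       # by Lemma 9.16, its columns generate the dual lattice
print(A.T @ D)               # identity matrix: all pairings b_i^T d_j are integer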

Given a full-dimensional lattice Λ and a vector v ∈ Λ∗ \ {0}, note that we
can partition Λ into the slices

Λ = ⋃_{k∈Z} {x ∈ Λ : v | x = k}.

The Flatness theorem asserts that any convex body without integer points is
contained between two such slices that are not far apart.
Theorem 9.17 (Khintchin, 1948). There exists a function f : Z≥1 → R that
satisfies the following. For every full-dimensional lattice Λ ⊆ Rn and every
convex body K ⊆ Rn with K ∩ Λ = ∅ there exists some v ∈ Λ∗ \ {0} with
max{v | x : x ∈ K} − min{v | x : x ∈ K} ≤ f (n).
Note that once the above result has been established for the special case
Λ = Zn , it holds for all lattices. However, it will be actually more convenient
for us to work with general lattices and hence stick to the general case.
The best possible bound f (n) is not known yet, but it is known that there
exist constants c1 , c2 , c3 > 0 such that
c1 n ≤ f (n) ≤ c2 n4/3 logc3 n
holds for all n.2
In this course, we will establish the following bound, which is again absolutely
sufficient for our applications.
Theorem 9.18 (Lagarias, Lenstra, Schnorr 1990). The function f can be cho-
sen as f (n) := n5/2 .
We begin our proof with a simple bound on the length of the shortest nonzero
lattice vector.

Lemma 9.19. For every full-dimensional lattice Λ ⊆ Rn there exists some
x ∈ Λ \ {0} with kxk ≤ √n · | det(Λ)|1/n .
Proof. Let Λ = Λ(A), where A ∈ Rn×n . Consider the cube

C := [−| det(A)|1/n , | det(A)|1/n ]n

and note that vol(C) = 2n | det(A)|. Since C is compact, Minkowski’s theorem
yields a point x ∈ C ∩ Λ \ {0}. The claim follows since kxk = kxk2 ≤ √n kxk∞ ≤
√n · | det(A)|1/n .
We need to introduce two further quantities related to a lattice.

Definition 9.20 (Packing and covering radii). The packing radius of a full-
dimensional lattice Λ is defined as

pack(Λ) = (1/2) min{kxk : x ∈ Λ \ {0}}.

The covering radius of Λ is given by

cov(Λ) = min{ρ > 0 : for all x ∈ Rn there is some z ∈ Λ with kx − zk ≤ ρ}.
2 The upper bound is due to Rudelson (2000). It is believed that f (n) grows only linearly
in n. We recently showed that f (n) ≥ 2n − o(n). Can you beat this?

Note that both quantities have simple geometric interpretations in terms of


lattice translates of balls: The packing radius is the largest radius of a ball such
that its lattice translates only intersect in their boundaries. The covering radius
is the smallest radius of a ball such that its lattice translates cover all Rn .
Depending on the lattice, the covering and packing radii may be arbitrarily
large. A key observation of this section is that if the covering radius of a lattice
is large, the packing radius of the dual lattice must be small:
Lemma 9.21. Every full-dimensional lattice Λ ⊆ Rn satisfies

cov(Λ) · pack(Λ∗ ) ≤ n3/2 /4.
Before we prove the above result, let us demonstrate that it easily leads to
the Flatness theorem. Let us first prove the Flatness theorem for ellipsoids. In
this proof, we will see that the covering and packing radii appear naturally.
Proposition 9.22. If Λ ⊆ Rn is a full-dimensional lattice and E ⊆ Rn an
ellipsoid with E ∩ Λ = ∅, then there exists some v ∈ Λ∗ \ {0} with

max{v | x : x ∈ E} − min{v | x : x ∈ E} ≤ n3/2 .

Proof. Let E = ABn + z for some invertible matrix A. Consider the lattice
Λ̄ := A−1 Λ and note that we have

(Bn + A−1 z) ∩ Λ̄ = (A−1 E) ∩ Λ̄ = A−1 (E ∩ Λ) = ∅.

In other words, we see that 1 ≤ dist(A−1 z, Λ̄) ≤ cov(Λ̄) holds. By Lemma 9.21
this implies
pack(Λ̄∗ ) ≤ n3/2 /(4 cov(Λ̄)) ≤ n3/2 /4.
This means that there exists a vector u ∈ Λ̄∗ with kuk ≤ (1/2)n3/2 . Setting
v := A−| u we see that v ∈ Λ∗ since v | z = (A−| u)| z = u| A−1 z ∈ Z holds for
every z ∈ Λ. Note that

max{v | x : x ∈ E} − min{v | x : x ∈ E}
  = max{v | (Ay + z) : y ∈ Bn } − min{v | (Ay + z) : y ∈ Bn }
  = max{v | Ay : y ∈ Bn } − min{v | Ay : y ∈ Bn }
  = max{u| y : y ∈ Bn } − min{u| y : y ∈ Bn }
  = u| (u/kuk) − u| (−u/kuk)
  = 2kuk ≤ n3/2 .

Proof of Theorem 9.18. Let Λ ⊆ Rn be a full-dimensional lattice and K ⊆ Rn
be a convex body with K ∩ Λ = ∅. By Theorem 9.3, there is an ellipsoid E with
center z such that
E ⊆ K ⊆ n(E − z) + z.

Note that E ∩ Λ ⊆ K ∩ Λ = ∅ and hence by Proposition 9.22 there is some
v ∈ Λ∗ \ {0} with

max{v | x : x ∈ E} − min{v | x : x ∈ E} ≤ n3/2 .



We obtain

max{v | x : x ∈ K} − min{v | x : x ∈ K}
≤ max{v | x : x ∈ n(E − z) + z} − min{v | x : x ∈ n(E − z) + z}
= n · (max{v | x : x ∈ E} − min{v | x : x ∈ E}) ≤ n5/2 .

The remainder of this section is devoted to the proof of Lemma 9.21. To


this end, we will make use of the following simple estimate.

Lemma 9.23. Every full-dimensional lattice Λ ⊆ Rn satisfies

pack(Λ) · pack(Λ∗ ) ≤ n/4.

Proof. Let Λ = Λ(A). By Lemma 9.19 we have pack(Λ) ≤ (1/2)√n · | det(A)|1/n
and pack(Λ∗ ) ≤ (1/2)√n · | det(A−| )|1/n . The claim follows since det(A−| ) =
det(A)−1 .
We will also make use of the following basic fact about bases of lattices.
Lemma 9.24. Let Λ be a lattice and v1 , . . . , vk ∈ Λ linearly independent lattice
vectors. Then there exists a lattice basis b1 , . . . , bn ∈ Λ of Λ with

span(v1 , . . . , vk ) = span(b1 , . . . , bk ).

Proof. Let V ∈ Rd×k denote the matrix whose columns are v1 , . . . , vk , and let
B ′ ∈ Rd×n denote any lattice basis of Λ. Note that there exists an integer
matrix W ∈ Zn×k with B ′ W = V and rank(W ) = k. By Corollary 3.14, there
is a unimodular matrix U ∈ Zn×n such that W | U is in Hermite normal form.
Recall that B := B ′ U −| is also a lattice basis of Λ. The space spanned by
the first k columns of B is equal to

span(Be1 , . . . , Bek ) = B span(e1 , . . . , ek ) = B span(U | W )
                      = span(BU | W ) = span(B ′ W ) = span(V ).

Proof of Lemma 9.21. We prove the claim by induction over n ≥ 1. If n = 1,
then Λ = {αm : m ∈ Z} and Λ∗ = {α−1 m : m ∈ Z} for some α > 0. Hence
cov(Λ) = α/2 and pack(Λ∗ ) = α−1 /2, so cov(Λ) · pack(Λ∗ ) = 1/4.
Suppose that n ≥ 2. Let u ∈ Λ with kuk = 2 pack(Λ), and let W ⊆ Rn
denote the orthogonal complement of u. Let π : Rn → W denote the orthogonal
projection onto W , and let τ : W → Rn−1 be any orthogonal transformation,
i.e., τ is a linear map with x| y = τ (x)| τ (y) for all x, y ∈ W . (Note that this
also implies kxk = kτ (x)k for all x ∈ W .)
We claim that Λ1 := τ (π(Λ)) is a full-dimensional lattice. By Lemma 9.24
there exists a lattice basis b1 , . . . , bn of Λ where b1 is a multiple of u, and hence
π(b1 ) = 0. Note that τ (π(b2 )), . . . , τ (π(bn )) are linearly independent and hence

Λ1 = Λ(τ (π(b1 )), . . . , τ (π(bn ))) = Λ(τ (π(b2 )), . . . , τ (π(bn )))

is a full-dimensional lattice.
First, we claim that
pack(Λ∗ ) ≤ pack(Λ∗1 ) (9.7)

holds. To this end, let a ∈ Λ∗1 with kak = 2 pack(Λ∗1 ). Consider the vector
b ∈ W with τ (b) = a. For every z ∈ Λ we have

b| z = b| π(z) = τ (b)| τ (π(z)) = a| τ (π(z)) ∈ Z

and hence b ∈ Λ∗ , which yields

2 pack(Λ∗ ) ≤ kbk = kτ (b)k = kak = 2 pack(Λ∗1 ).

Second, we claim that

cov(Λ)2 ≤ cov(Λ1 )2 + kuk2 /4 = cov(Λ1 )2 + pack(Λ)2        (9.8)
holds. Note that this implies

cov(Λ)2 pack(Λ∗ )2 ≤ cov(Λ1 )2 pack(Λ∗ )2 + pack(Λ)2 pack(Λ∗ )2
                   ≤ cov(Λ1 )2 pack(Λ∗1 )2 + n2 /16        (by (9.7) and Lemma 9.23)
                   ≤ (n − 1)3 /16 + n2 /16                 (induction)
                   ≤ n3 /16.                               (for n ≥ 2)
Thus, it remains to prove (9.8). We have to show that for every x ∈ Rn there
exists some z ∈ Λ with kx − zk2 ≤ cov(Λ1 )2 + kuk2 /4. To this end, let x ∈ Rn
and let y := τ (π(x)). Let v ∈ Λ1 be a closest lattice point to y in Λ1 , so

ky − vk ≤ cov(Λ1 ).

Let L denote the line through τ −1 (v) ∈ W parallel to u. Consider x′ := x −
π(x) + τ −1 (v) ∈ L. Since L intersects Λ in a (nonempty) set of equally spaced
points, each being distance kuk from the next, we can find a lattice point z ∈
Λ ∩ L with
kz − x′ k ≤ kuk/2.
We conclude

kz − xk2 = kz − x′ + x′ − xk2
         = kz − x′ k2 + 2(z − x′ )| (x′ − x) + kx′ − xk2
         = kz − x′ k2 + kx′ − xk2
         = kz − x′ k2 + kτ −1 (v) − π(x)k2
         = kz − x′ k2 + kv − τ (π(x))k2 ≤ kuk2 /4 + cov(Λ1 )2 ,

where the cross term vanishes since z − x′ ∈ span(u) and x′ − x = τ −1 (v) −
π(x) ∈ W , and τ (π(x)) = y.

9.4 Lenstra’s algorithm Lecture 25

Let us recall Theorem 9.1, which states that for every fixed n, there is a
polynomial-time algorithm to solve integer programs max{c| x : Ax ≤ b, x ∈
Zn }. Such an algorithm was given by Lenstra in 1981, which we want to present
here. To this end, let n be fixed from now on.

From feasibility to optimization

First, we claim that it suffices to give a polynomial-time algorithm for the fea-
sibility version of the problem: Given A′ ∈ Qm×n , b′ ∈ Qm , decide whether
P ≤ (A′ , b′ ) ∩ Zn ≠ ∅. Suppose we can efficiently solve the latter problem. Given
A ∈ Qm×n , b ∈ Qm and c ∈ Qn , we can determine OPT := max{c| x : Ax ≤
b, x ∈ Zn } as follows.
Clearly, we may assume that the integer program is feasible. By possibly
rescaling c, we may assume that c is integer, in which case OPT ∈ Z ∪ {∞}.
Recall Theorem 4.14, which states that there exist finite sets X, Y ⊆ Zn such
that P ∩ Zn = X + Λ+ (Y ), and the encoding length of every vector in X is
polynomially bounded in the encoding lengths of A, b. Thus, there is some
polynomial p (independent of A, b, c) such that kxk∞ ≤ 2p(L) for all x ∈ X,
where L is the total encoding length of A, b, c. Moreover, note that if OPT is
finite, then one of the points in X must be an optimal solution, say x∗ . This
means that we have

|OPT| = |c| x∗ | ≤ n · kck∞ · kx∗ k∞ ≤ 2L · 2L · 2p(L) = 2p(L)+2L

in this case. Thus, to determine whether OPT is finite, it suffices to check
whether {x ∈ Zn : Ax ≤ b, c| x ≥ 2p(L)+2L + 1} = ∅. Suppose now
we know that OPT is finite. Using binary search, we can determine OPT by at
most log2 (2p(L)+2L + 1) + 2 calls of the feasibility algorithm for sets of the form
{x ∈ Zn : Ax ≤ b, ℓ ≤ c| x ≤ u}. Note that the number of calls is polynomial
in the input size.
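
Schematically, this binary search looks as follows (a minimal Python sketch,
where feasible is a hypothetical oracle for the feasibility version restricted to
ℓ ≤ c| x ≤ u):

# feasible(lo, hi) is a hypothetical oracle deciding whether
# {x in Z^n : Ax <= b, lo <= c^T x <= hi} is nonempty.
def optimize_via_feasibility(feasible, lo, hi):
    # lo and hi must be valid bounds on OPT, e.g. +-2^(p(L)+2L) as above.
    if not feasible(lo, hi):
        return None                   # the integer program is infeasible
    while lo < hi:                    # invariant: OPT lies in [lo, hi]
        mid = (lo + hi + 1) // 2
        if feasible(mid, hi):
            lo = mid                  # some feasible point has value >= mid
        else:
            hi = mid - 1
    return lo                         # = OPT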

Deciding feasibility
Given A ∈ Qm×n , b ∈ Qm , it suffices to describe a polynomial-time algorithm
for deciding whether P ∩ Zn 6= ∅ holds, where P := P ≤ (A, b). Again using
Theorem 4.14, by possibly intersecting P with a box, we may assume that P
is bounded. Next, by solving some auxiliary linear programs, we can efficiently
determine the affine hull of P . If dim(P ) = d < n, we follow a strategy used
in the proof of Theorem 6.10: By possibly translating P , we may assume that
0 ∈ aff(P ). Then there exists a matrix

C ∈ Z(n−d)×n with rank(C) = n − d and aff(P ) = kern(C).

By Corollary 3.14 there is a unimodular matrix U ∈ Zn×n such that

CU = [B 0]

is the Hermite normal form of C, where B ∈ Z(n−d)×(n−d) is regular. Note that
P ∩ Zn ≠ ∅ if and only if U −1 P ∩ Zn ≠ ∅. Moreover, we have

U −1 P ∩ Zn = {U −1 x : Ax ≤ b, x ∈ Zn }
= {U −1 x : Ax ≤ b, x ∈ aff(P ), x ∈ Zn }
= {U −1 x : Ax ≤ b, Cx = 0, x ∈ Zn }
= {y : AU y ≤ b, CU y = 0, y ∈ Zn }
= {y : AU y ≤ b, [B 0]y = 0, y ∈ Zn }
= {y : AU y ≤ b, y1 = · · · = yn−d = 0, y ∈ Zn }.

Note that determining whether the latter set is nonempty is an integer feasibility
problem on only d < n variables, and hence we can assume that this can be
solved in polynomial time.
Thus, it remains to consider the case where P is a full-dimensional polytope,
which is the most interesting case. In the first step, we run one of our polynomial-
time algorithms to determine an ellipsoid E := T Bn + t, where T ∈ Qn×n , t ∈ Qn ,
with
T Bn + t ⊆ P ⊆ f (n)T Bn + t.        (9.9)
In the second step, we employ the Flatness theorem. To this end, consider the
lattice Λ := T −1 Zn and let u ∈ Λ∗ \ {0}. In the proof of Proposition 9.22 we
have seen that the nonzero vector v := T −| u ∈ T −| T | Zn = Zn satisfies

max{v | x : x ∈ E} − min{v | x : x ∈ E} ≤ 2kuk,

and hence by (9.9)

max{v | x : x ∈ P } − min{v | x : x ∈ P } ≤ 2f (n)kuk.

Clearly, given u, we can determine α := ⌈min{v | x : x ∈ P }⌉ and β :=
⌊max{v | x : x ∈ P }⌋ in polynomial time. Since v ∈ Zn \ {0}, our original
integer program is feasible if and only if one of the integer programs

{x ∈ Zn : Ax ≤ b, v | x = k},   k ∈ {α, α + 1, . . . , β}

is feasible. We may assume that we can solve each such subproblem in polyno-
mial time since it is lower-dimensional (see above). Recall that the number of
such subproblems is at most 2f (n)kuk + 2. Let u∗ denote a shortest nonzero
vector in Λ∗ . We have not discussed yet how to compute such a vector, and in
fact we will only describe a method to compute a nonzero vector u ∈ Λ∗ \ {0}
such that kuk ≤ g(n)ku∗ k for some factor g(n) ≥ 1 depending only on n. Let
us run this method to obtain such a u. If kuk ≤ (1/2)g(n)n3/2 , the number of
subproblems is bounded by a constant (only depending on n), and we are done.
Otherwise, we know that ku∗ k > (1/2)n3/2 (the ellipsoid E is not “flat”). In this
case, the Flatness theorem (for ellipsoids, Proposition 9.22) states that E must
contain an integer point, and so does P .

9.5 Shortest vector problem and the LLL-algorithm


Lecture 26

As we have seen in the previous section, the only missing ingredient for Lenstra’s
algorithm is an algorithm for finding a “short” nonzero vector in a given lattice.

Given a regular matrix A ∈ Qn×n , the problem of determining a nonzero vector


in Λ = Λ(A) of smallest (Euclidean) norm is called the shortest vector prob-
lem, which has numerous applications in other areas, such as number theory
and cryptography. For general n, the (decision version of the) shortest vector
problem is NP-hard.
However, in what follows we will describe a polynomial-time algorithm (for
general n) that, given A, determines a vector v ∈ Λ \ {0} such that kvk ≤
2(n−1)/2 kv ∗ k, where v ∗ is a shortest nonzero vector in Λ, which is sufficient to
run Lenstra’s algorithm in polynomial time. Such an algorithm was invented by
Arjen Lenstra, Hendrik Lenstra, and László Lovász in 1982, and is now called
the Lenstra–Lenstra–Lovász (LLL) lattice basis reduction algorithm.3
Before we present the LLL-algorithm, we need to recall the Gram-Schmidt
process: Given a basis b1 , . . . , bn ∈ Rn , the Gram-Schmidt process yields an
orthogonal basis g1 , . . . , gn ∈ Rn such that span(b1 , . . . , bi ) = span(g1 , . . . , gi )
for i = 1, . . . , n. Setting g1 = b1 , the remaining vectors are computed as follows.
If g1 , . . . , gi have been already computed, then gi+1 = bi+1 − u, where u is the
projection of bi+1 onto span(g1 , . . . , gi ). Equivalently, for each i ∈ [n] we have

gi = bi − Σ_{j=1}^{i−1} µi,j gj ,   where   µi,j = (b|i gj )/kgj k2 .

We will refer to the coefficients µi,j as the corresponding Gram-Schmidt
coefficients.
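
For later use, here is a direct Python transcription of the Gram-Schmidt
process together with the coefficients µi,j (our own helper, assuming numpy;
it is reused in the LLL sketch further below):

import numpy as np

# Gram-Schmidt orthogonalization (without normalization), returning the
# vectors g_i and the coefficients mu[i][j] used by the LLL algorithm.
def gram_schmidt(B):
    n = B.shape[1]                       # basis vectors are the columns of B
    G = B.astype(float).copy()
    mu = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            mu[i, j] = B[:, i] @ G[:, j] / (G[:, j] @ G[:, j])
            G[:, i] -= mu[i, j] * G[:, j]
    return G, mu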
Let us start with the following simple, but crucial observation.
Lemma 9.25. Let Λ ⊆ Rn be a lattice and let b1 , . . . , bn be any lattice basis
of Λ with Gram-Schmidt vectors g1 , . . . , gn . For every v ∈ Λ \ {0} we have
kvk ≥ min{kg1 k, . . . , kgn k}.

Proof. Let v = Σ_{i=1}^k λi bi , where λ1 , . . . , λk ∈ Z and λk ≠ 0. Denote W :=
span(b1 , . . . , bk−1 ) = span(g1 , . . . , gk−1 ) and note that

v − λk gk = Σ_{i=1}^{k−1} λi bi + λk (bk − gk ) ∈ W,

since bk − gk ∈ W .

Since gk is orthogonal to W , it is orthogonal to v − λk gk and hence

kvk2 = k(v − λk gk ) + λk gk k2 = k(v − λk gk )k2 + kλk gk k2 ≥ λ2k kgk k2 ≥ kgk k2 ,

where the last inequality holds since λk is a nonzero integer.

For a general lattice basis b1 , . . . , bn , the above lemma has no immediate


implication since the Gram-Schmidt vectors g1 , . . . , gn are not nicely related to
the lattice. However, recall that g1 = b1 . Thus, if none of the Gram-Schmidt
vectors g2 , . . . , gn is much shorter than b1 , then b1 itself is a good approximation
for the shortest vector problem. In fact, the LLL-algorithm will find a lattice
basis for which this is the case. To this end, it computes a basis with the
following property.
3 This algorithm is particularly mentioned in Lovász’s laudatio on the occasion of his receipt

of the Abel prize in 2021.



Definition 9.26 (LLL-reduced basis). Let $b_1, \dots, b_n \in \mathbb{R}^n$ be a basis and let $g_1, \dots, g_n$ denote the corresponding Gram-Schmidt basis with Gram-Schmidt coefficients $\mu_{i,j}$. We say that $b_1, \dots, b_n$ is LLL-reduced if

• $|\mu_{i,j}| \le \frac{1}{2}$ for all $1 \le j < i \le n$, and

• $\left(\frac{3}{4} - \mu_{i+1,i}^2\right) \|g_i\|^2 \le \|g_{i+1}\|^2$ for $i = 1, 2, \dots, n-1$.
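
Both conditions can be checked mechanically from the Gram-Schmidt data; the following small sketch (again with our own naming, reusing gram_schmidt from above and ignoring floating-point tolerance issues) does exactly that:

    def is_lll_reduced(B, delta=0.75):
        # Check the two conditions of Definition 9.26 for the columns of B;
        # delta = 3/4 is the parameter from the definition.
        G, mu = gram_schmidt(B)
        n = B.shape[1]
        size_reduced = all(abs(mu[i, j]) <= 0.5
                           for i in range(n) for j in range(i))
        lovasz = all((delta - mu[i + 1, i] ** 2) * (G[:, i] @ G[:, i])
                     <= G[:, i + 1] @ G[:, i + 1]
                     for i in range(n - 1))
        return size_reduced and lovasz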
Let us first show that an LLL-reduced basis has the property mentioned
above.

Proposition 9.27. Let $\Lambda \subseteq \mathbb{R}^n$ be a lattice and let $b_1, \dots, b_n$ be an LLL-reduced lattice basis of $\Lambda$. Then $\|b_1\| \le 2^{(n-1)/2}\, \|v^*\|$, where $v^*$ is a shortest nonzero vector in $\Lambda$.
Proof. The properties of an LLL-reduced basis directly imply
\[
  \|g_{i+1}\|^2 \ge \left(\tfrac{3}{4} - \mu_{i+1,i}^2\right) \|g_i\|^2 \ge \tfrac{1}{2}\, \|g_i\|^2
\]
for every $i \in [n-1]$, where the second inequality uses $|\mu_{i+1,i}| \le \tfrac{1}{2}$. In particular, we see that
\[
  \|g_i\|^2 \ge \left(\tfrac{1}{2}\right)^{i-1} \|g_1\|^2 \ge \left(\tfrac{1}{2}\right)^{n-1} \|g_1\|^2 = \left(\tfrac{1}{2}\right)^{n-1} \|b_1\|^2
\]
holds for all $i \in [n]$. By Lemma 9.25, every $v \in \Lambda \setminus \{0\}$ satisfies
\[
  \|v\| \ge \min\{\|g_i\| : i \in [n]\} \ge \left(\tfrac{1}{2}\right)^{(n-1)/2} \|b_1\|.
\]

Let us remark that the parameter $\frac{3}{4}$ in the definition of an LLL-reduced basis can be replaced by any number in $(\frac{1}{4}, 1)$. The choice $\frac{3}{4}$ is a convenient trade-off: parameters closer to $1$ improve the approximation guarantee of the previous proposition at the expense of a worse bound on the running time of the algorithm below.
A surprising fact is that every lattice has an LLL-reduced basis, and it can
be computed by the following simple algorithm.
Algorithm 9.28 (LLL-algorithm). Given a basis $b_1, \dots, b_n \in \mathbb{R}^n$, perform the following steps.

1. Compute the Gram-Schmidt vectors $g_1, \dots, g_n$ of $b_1, \dots, b_n$.

2. For each $i \in \{2, \dots, n\}$, subtract integer multiples of $b_1, \dots, b_{i-1}$ from $b_i$ such that $|\mu_{i,j}| \le \frac{1}{2}$ for $j = 1, \dots, i-1$.

3. If $\left(\frac{3}{4} - \mu_{i+1,i}^2\right) \|g_i\|^2 > \|g_{i+1}\|^2$ for some $i \in [n-1]$, then swap $b_i$ and $b_{i+1}$ and go back to Step 1.

4. Return $b_1, \dots, b_n$.
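
For concreteness, here is a sketch that implements Algorithm 9.28 literally, reusing gram_schmidt from above. It is deliberately unoptimized (the Gram-Schmidt data is recomputed from scratch in each pass), and it uses floating-point arithmetic where an exact implementation would work with rationals:

    def lll(B, delta=0.75):
        B = np.array(B, dtype=float)
        n = B.shape[1]
        while True:
            # Steps 1 and 2: size-reduce so that |mu[i, j]| <= 1/2.
            # Processing j downwards keeps already-reduced coefficients valid.
            for i in range(1, n):
                for j in range(i - 1, -1, -1):
                    _, mu = gram_schmidt(B)
                    B[:, i] -= round(mu[i, j]) * B[:, j]
            # Step 3: look for a violated exchange condition and swap.
            G, mu = gram_schmidt(B)
            for i in range(n - 1):
                if ((delta - mu[i + 1, i] ** 2) * (G[:, i] @ G[:, i])
                        > G[:, i + 1] @ G[:, i + 1]):
                    B[:, [i, i + 1]] = B[:, [i + 1, i]]
                    break
            else:
                return B  # Step 4: no swap needed, so B is LLL-reduced.

By Proposition 9.27, the first column of the returned matrix is then within a factor $2^{(n-1)/2}$ of a shortest nonzero vector of the lattice spanned by the input columns.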
Proposition 9.29. If the LLL-algorithm stops for an input basis $b_1, \dots, b_n$, it returns an LLL-reduced basis for $\Lambda(b_1, \dots, b_n)$.
Proof. Note that throughout the algorithm we only perform elementary column operations on $[b_1 \ldots b_n]$ that add integer multiples of one column to another or swap two columns, and hence the resulting vectors are still a lattice basis for $\Lambda(b_1, \dots, b_n)$. Moreover, the second step does not change the Gram-Schmidt vectors of $b_1, \dots, b_n$. Thus, when the algorithm returns in Step 4, the first condition of an LLL-reduced basis holds by Step 2, and the second holds since Step 3 found no violated index.

Proposition 9.30. Step 2 of the LLL-algorithm can be performed in polynomial time.

Proof. Let $G = [g_1 \ldots g_n]$ and $B = [b_1 \ldots b_n]$ and let $R = G^{-1} B$. Note that $B = GR$, and since the Gram-Schmidt process gives $b_i = g_i + \sum_{j<i} \mu_{i,j}\, g_j$, the matrix $R$ is upper triangular with $R_{i,i} = 1$ for all $i$ and $R_{j,i} = \mu_{i,j}$ for $j < i$. By subtracting integer multiples of columns from subsequent columns we can efficiently find an upper triangular integer matrix $U$ with $U_{i,i} = 1$ for all $i$ such that $RU$ is again upper triangular with $(RU)_{i,i} = 1$ for all $i$ and $|(RU)_{i,j}| \le 1/2$ for all $i \ne j$.
We claim that we can replace $b_1, \dots, b_n$ by the columns of $BU$. To this end, note that the columns of $BU$ also arise from $b_1, \dots, b_n$ by subtracting integer multiples of columns from subsequent columns. In particular, $g_1, \dots, g_n$ are still the Gram-Schmidt vectors of the columns of $BU$. Let us denote the Gram-Schmidt coefficients of the columns of $BU$ by $\tilde{\mu}_{i,j}$. Since $BU = (GR)U = G(RU)$, we see that $|\tilde{\mu}_{i,j}| = |(RU)_{j,i}| \le 1/2$ for all $1 \le j < i \le n$.
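
The column operations in this proof translate almost verbatim into code. The following sketch (our own naming, assuming the gram_schmidt function from above) performs Step 2 by clearing the strictly upper triangular part of $R$ down to absolute value at most $1/2$, mirroring every operation on $B$:

    def size_reduce(B):
        # B = G R with R upper triangular, unit diagonal, and
        # R[j, i] = mu_{i,j} for j < i. Clearing the off-diagonal entries
        # of R by column operations computes B U = G (R U) as in the proof.
        G, _ = gram_schmidt(B)
        B = np.array(B, dtype=float)
        R = np.linalg.solve(G, B)
        n = B.shape[1]
        for i in range(1, n):
            for j in range(i - 1, -1, -1):   # downwards, as before
                r = round(R[j, i])
                R[:, i] -= r * R[:, j]       # column operation on R ...
                B[:, i] -= r * B[:, j]       # ... mirrored on B
        return B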
Theorem 9.31. The number of iterations of the LLL-algorithm is polynomial in the encoding lengths of $b_1, \dots, b_n \in \mathbb{Q}^n$.
Proof. We claim that it suffices to analyze the number of iterations for integer vectors $b_1, \dots, b_n$: scaling a rational input by a common denominator $\lambda$ of all entries yields an integer input on which the algorithm behaves identically. Indeed, if $\lambda \ne 0$, then $\lambda g_1, \dots, \lambda g_n$ are the Gram-Schmidt vectors of $\lambda b_1, \dots, \lambda b_n$. Moreover, if $b_1, \dots, b_n$ is LLL-reduced, then so is $\lambda b_1, \dots, \lambda b_n$. Finally, the steps of the LLL-algorithm on inputs $b_1, \dots, b_n$ and $\lambda b_1, \dots, \lambda b_n$ are identical.
Thus, we may assume that $b_1, \dots, b_n$ are integer vectors. Notice that, throughout the LLL-algorithm, $b_1, \dots, b_n$ stay integer vectors. For any regular matrix $A \in \mathbb{Z}^{n \times n}$ and $j \in [n]$ let
\[
  V_j(A) := \det\left(A_{*,[j]}^\top \cdot A_{*,[j]}\right) \in \mathbb{Z}_{\ge 1}
\]

denote the squared $j$-dimensional volume of the parallelepiped spanned by the first $j$ columns of $A$. Note that if $G = [g_1 \ldots g_n]$ consists of the Gram-Schmidt vectors of the columns of $A$, then
\[
  V_j(A) = V_j(G) = \|g_1\|^2 \cdots \|g_j\|^2 \tag{9.10}
\]
for all $j \in [n]$. We consider the potential
\[
  \Phi(A) := \prod_{j=1}^{n} V_j(A) \in \mathbb{Z}_{\ge 1}.
\]

Note that the encoding length of $\Phi(A)$ is polynomially bounded in the encoding length of $A$, and that the first two steps of the algorithm leave the Gram-Schmidt vectors, and hence $\Phi$, unchanged. Thus, since $\Phi$ only takes integer values at least $1$, it suffices to show that after every swap operation of the LLL-algorithm we have
\[
  \Phi([b_1' \ldots b_n']) \le \tfrac{3}{4}\, \Phi([b_1 \ldots b_n]),
\]
where $b_1', \dots, b_n'$ arise from $b_1, \dots, b_n$ by swapping the respective vectors: the number of swaps is then at most logarithmic in the potential of the input basis.
To this end, suppose that the LLL-algorithm swaps the vectors $b_{j-1}$ and $b_j$ in some iteration. Let $B$ and $B'$ denote the matrices consisting of the basis vectors before and after the swap, respectively. Note that $V_\ell(B) = V_\ell(B')$ for all $\ell \ne j-1$, since for $\ell < j-1$ the first $\ell$ columns coincide and for $\ell \ge j$ they coincide up to order. Hence
\[
  \frac{\Phi(B')}{\Phi(B)} = \frac{V_{j-1}(B')}{V_{j-1}(B)}. \tag{9.11}
\]
Let $g_1', \dots, g_n'$ denote the Gram-Schmidt vectors of $B'$. Clearly, we have $g_\ell = g_\ell'$ for all $\ell \in [j-2]$. Moreover, we have $g_{j-1}' = g_j + \mu_{j,j-1}\, g_{j-1}$ (why?). Since the LLL-algorithm swapped $b_{j-1}$ and $b_j$, we must have
\[
  \left(\tfrac{3}{4} - \mu_{j,j-1}^2\right) \|g_{j-1}\|^2 > \|g_j\|^2,
\]
or, equivalently,
\[
  \tfrac{3}{4}\, \|g_{j-1}\|^2 > \|g_j\|^2 + \|\mu_{j,j-1}\, g_{j-1}\|^2 = \|g_j + \mu_{j,j-1}\, g_{j-1}\|^2, \tag{9.12}
\]
where the last equality uses that $g_j$ and $g_{j-1}$ are orthogonal.

Combining (9.12), (9.11), and (9.10) we conclude
\[
  \frac{\Phi(B')}{\Phi(B)} = \frac{\|g_j + \mu_{j,j-1}\, g_{j-1}\|^2}{\|g_{j-1}\|^2} < \frac{3}{4}.
\]
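
To see the potential argument in action, one can compute $\Phi$ directly; printing it before and after each swap in the lll sketch above exhibits the factor of at most $3/4$ per swap. A small helper (our own naming):

    def potential(B):
        # Phi(B) = prod_{j=1}^n V_j(B), where V_j(B) = det(B_j^T B_j) and
        # B_j consists of the first j columns of B; cf. (9.10).
        B = np.array(B, dtype=float)
        phi = 1.0
        for j in range(1, B.shape[1] + 1):
            Bj = B[:, :j]
            phi *= np.linalg.det(Bj.T @ Bj)
        return phi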
