Linopt Notes
WRITTEN BY
FABRICIO OLIVEIRA
ASSOCIATE PROFESSOR OF OPERATIONS RESEARCH
AALTO UNIVERSITY, SCHOOL OF SCIENCE
These notes comprise the compilation of lecture notes prepared for teaching Linear Optimisation
and Integer Optimisation at Aalto University, Department of Mathematics and Systems Analysis,
since 2017.
These notes are mostly based on a collection of classical references, cited throughout the text.
Over the years, errors and typos have been found and introduced as the notes evolve. It is likely
that some still exist. If so, please feel free to open an issue in the GitHub repository and report it
or make a pull request proposing a fix.
These notes have been prepared with the help of several students over the years. I am immensely
grateful for their support and dedication to this material and teaching of this course.
F.
Contents
1 Introduction
  1.1 What is optimisation?
    1.1.1 Mathematical programming and optimisation
    1.1.2 Types of mathematical optimisation models
  1.2 Linear programming applications
    1.2.1 Resource allocation
    1.2.2 Transportation problem
    1.2.3 Production planning (lot-sizing)
  1.3 The geometry of LPs - graphical method
    1.3.1 The graphical method
    1.3.2 Geometrical properties of LPs
  1.4 Exercises
Introduction
min. f (x)
s.t.: x ∈ X. (1.1)

That is, we would like to find the solution x that minimises the value of the objective function f ,
such that (s.t.) x belongs to the feasibility set X. In a general sense, problems like this can be
solved by employing the following strategy:
1. Analysing properties of the function f (x) under specific domains and deriving the conditions
that must be satisfied such that a point x is a candidate optimal point.
2. Applying numerical methods that iteratively search for points satisfying these conditions.
This idea is central to several knowledge domains and is often described with area-specific
nomenclature. Fields such as economics, engineering, statistics, machine learning, and, perhaps
more broadly, operations research are intensive users and developers of optimisation theory and
applications.
• the domain represents business rules or constraints: logical relations, design or engineering
limitations, requirements, and such;
• the objective function is a function that provides a measure of solution quality.
With these in mind, we can represent the decision problem as a mathematical programming model
of the form of (1.1) that can be solved using optimisation methods. From now on, we will refer
to this specific class of models as mathematical optimisation models or optimisation models for
short. We will also use the term solve the problem to refer to the task of finding optimal solutions
to optimisation models.
This book mostly focuses on the optimisation techniques employed to find optimal solutions for
these models. As we will see, depending on the nature of the functions that are used to formulate
the model, some methods might be more or less appropriate. Further complicating the issue, for
models of a given nature there might be alternative algorithms that can be employed, with no
general consensus on whether one method performs better than another. This is one of the aspects
that make optimisation so exciting and multifaceted. I hope this will make more sense as we
progress through the chapters.
1. Unconstrained models: in these, the set X = Rn and m = l = 0. These are prominent in,
e.g., machine learning and statistics applications, where f represents a measure of model
fitness or prediction error.
2. Linear programming (LP): presumes a linear objective function f (x) = c⊤x and affine
constraints g and h, i.e., of the form ai⊤x − bi, with ai ∈ Rn and bi ∈ R. Normally,
X = {x ∈ Rn | xj ≥ 0, ∀j ∈ [n]} enforces that the domain of the decision variables is the
nonnegative orthant.
3. Nonlinear programming (NLP): some or all of the functions f , g, and h are nonlinear.
4. Mixed-integer (linear) programming (MIP): consists of an LP in which some (or all, being
then simply integer programming) of the variables are constrained to be integers. In other
words, X ⊆ Rk × Zn−k . Very frequently, the integer variables are constrained to be binary
terms, i.e., xi ∈ {0, 1}, ∀i ∈ [n − k] and are meant to represent true-or-false or yes-or-no
conditions.
5. Mixed-integer nonlinear programming (MINLP): the intersection of MIPs and NLPs.
where c = [c1 , . . . , cn ]⊤ and x = [x1 , . . . , xn ]⊤ are n-sized vectors. Notice that c⊤x denotes the
inner (or dot) product. The transpose sign ⊤ is meant to reinforce that we see our vectors as
column vectors, unless otherwise stated.
Next, we need to define constraints that state the conditions for a plan to be valid. In this context,
a valid plan is a plan that does not utilise more than the amount of available resources bi , ∀i ∈ I.
This can be expressed as the collection (one for each i ∈ I) of affine (more often called, albeit
imprecisely, linear, as we will too) inequalities

s.t.: Σj∈J aij xj ≤ bi , ∀i ∈ I ⇒ Ax ≤ b,
where aij are the components of the m × n matrix A and b = [b1 , . . . , bm ]⊤. Furthermore, we also
must require that xj ≥ 0, ∀j ∈ J.
Combining the above, we obtain the generic formulation that will be used throughout this text to
represent linear programming models:
max. c⊤ x
s.t.: Ax ≤ b (1.2)
x ≥ 0.
Let us work on a more specific example that will be useful for illustrating some important concepts
related to the geometry of linear programming problems.
Let us consider a paint factory that produces exterior and interior paint from raw materials M1
and M2. The maximum demand for interior paint is 2 tons/day. Moreover, the amount of interior
paint produced cannot exceed that of exterior paint by more than 1 ton/day.
Our goal is to determine the optimal paint production plan. Table 1.1 summarises the data to
be considered. Notice the constraints that must be imposed to represent the daily availability of
the raw materials.
The paint factory problem is an example of a resource allocation problem. Perhaps one aspect that
is somewhat dissimilar is the constraint representing the production rules regarding the relative
amounts of exterior and interior paint. Notice, however, that this type of constraint also has the
same format as the more straightforward resource allocation constraints.
Let x1 be the amount produced of exterior paint (in tons) and x2 the amount of interior paint.
The complete model that optimises the daily production plan of the paint factory is:
max. z = 5x1 + 4x2
s.t.: 6x1 + 4x2 ≤ 24
x1 + 2x2 ≤ 6
x2 − x1 ≤ 1
x2 ≤ 2
x1 , x2 ≥ 0.
Notice that the paint factory model can also be compactly represented as in (1.2), where
c = [5, 4]⊤, x = [x1 , x2 ]⊤, b = [24, 6, 1, 2]⊤, and A is the 4 × 2 matrix with rows
[6, 4], [1, 2], [−1, 1], and [0, 1].
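To verify this numerically, the model can be handed to any off-the-shelf LP solver. Below is a minimal sketch in Python using scipy.optimize.linprog (the notes mention JuMP.jl as a modelling tool; Python and scipy are used here purely as an illustrative choice). Since linprog minimises by default, we negate the objective.

```python
import numpy as np
from scipy.optimize import linprog

# Paint factory data, as in (1.2): max c'x s.t. Ax <= b, x >= 0
c = np.array([5.0, 4.0])                   # profit coefficients
A = np.array([[6.0, 4.0],                  # raw material M1
              [1.0, 2.0],                  # raw material M2
              [-1.0, 1.0],                 # interior vs exterior limit
              [0.0, 1.0]])                 # interior paint demand cap
b = np.array([24.0, 6.0, 1.0, 2.0])

# linprog minimises, so we pass -c and negate the optimal value back
res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal plan x* and objective value z*
```

The solver returns x* = (3, 1.5) with z* = 21, matching the graphical solution discussed in Section 1.3.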
Figure 1.1: Schematic illustration of a network with two source nodes and three demand nodes
Clients
Factory NY Chicago Miami Capacity
Seattle 2.5 1.7 1.8 350
San Diego 3.5 1.8 1.4 600
Demands 325 300 275 -
Table 1.2: Problem data: unit transportation costs, demands and capacities
where cij is the unit transportation cost from i to j. The problem has two types of constraints
that must be observed, relating to the supply capacity and demand requirements. These can be
stated as the following linear constraints

Σj∈J xij ≤ Ci , ∀i ∈ I, (1.3)
Σi∈I xij ≥ Dj , ∀j ∈ J, (1.4)
where Ci is the production capacity of factory i ∈ I and Dj is the client demand j ∈ J. Notice
that the terms on the left-hand side in (1.3) account for the total production in each of the source
nodes i ∈ I. Analogously, in constraint (1.4), the term on the left-hand side accounts for the total
of the demand satisfied at the demand nodes j ∈ J.
Using an optimality argument, we can see that any solution for which, for some j ∈ J, Σi∈I xij > Dj
can be improved by making Σi∈I xij = Dj . This shows that, under these conditions, this constraint
will always be satisfied as an equality constraint instead and could be replaced as such.
The complete transportation model for the example above can be stated as

min. Σi∈I Σj∈J cij xij
s.t.: Σj∈J xij ≤ Ci , ∀i ∈ I
Σi∈I xij ≥ Dj , ∀j ∈ J
xij ≥ 0, ∀i ∈ I, ∀j ∈ J.
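As a numerical sketch, the instance in Table 1.2 can be solved by flattening the variables xij into a vector (again using scipy.optimize.linprog as an illustrative solver choice; the variable ordering is my own convention). Following the optimality argument above, the demand constraints are stated as equalities.

```python
import numpy as np
from scipy.optimize import linprog

# Transportation data from Table 1.2 (Seattle/San Diego -> NY/Chicago/Miami)
cost = np.array([[2.5, 1.7, 1.8],   # Seattle
                 [3.5, 1.8, 1.4]])  # San Diego
cap = [350, 600]                    # C_i, supply capacities
dem = [325, 300, 275]               # D_j, client demands

# Flatten x_ij row by row: x = [x_SE,NY, x_SE,CH, x_SE,MI, x_SD,NY, ...]
c = cost.flatten()

# Supply constraints: sum_j x_ij <= C_i
A_ub = np.zeros((2, 6))
for i in range(2):
    A_ub[i, 3 * i:3 * i + 3] = 1.0

# Demand constraints: sum_i x_ij = D_j (equality, by the optimality argument)
A_eq = np.zeros((3, 6))
for j in range(3):
    A_eq[j, [j, 3 + j]] = 1.0

res = linprog(c, A_ub=A_ub, b_ub=cap, A_eq=A_eq, b_eq=dem)
print(res.fun)  # minimum total transportation cost
```

The optimal plan ships Seattle's capacity mostly to NY (where its cost advantage is largest), with San Diego covering Miami and the remainder of Chicago.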
One interesting aspect of algebraic forms is that they allow us to represent the main structure
of the model while being independent of the instance being considered. For example, regardless
of whether the instance has 5 or 50 nodes, the algebraic formulation is the same, allowing for
detaching the problem instance (in our case, the 5-node network) from the model itself. Moreover,
most computational tools for mathematical programming modelling (hereinafter referred to simply
as modelling, such as JuMP.jl) empower the user to define the optimisation model using this
algebraic representation.
Algebraic forms are the main form in which we will specify optimisation models. This abstraction
is a peculiar aspect of mathematical programming and perhaps one of its main features: one must
formulate a model for each specific setting, which can be done in multiple ways, and the formulation
chosen might have consequences for how well an algorithm performs computationally. This is a
point we will discuss in more detail later on in this book.
Figure 1.2: A schematic representation of the lot-sizing problem, with production pt and inventory
kt at each period t = 1, . . . , T . Each node represents the material balance at each time period t.
A few points must be considered carefully when dealing with lot-sizing problems. First, one must
carefully consider the boundary conditions, that is, what the model decides at the final period
t = T and what the initial inventory (carried from t = 0) is. While the former will be seen by the
model as the “end of the world”, leading to the realisation that the optimal inventory level at the
final period must be zero, the latter might considerably influence how much production is needed
during the planning horizon. These must be observed and handled accordingly.
Figure 1.3: The feasible region of the paint factory problem (in Figure 1.3b), represented as the
intersection of the four closed half-spaces formed by each of the constraints (as shown in Figure
1.3a). Notice how the feasible region is a polyhedral set in R2, as there are two decision variables
(x1 and x2)
We can use this visual representation to find the optimal solution for the problem, that is, the point
(x1 , x2 ) within the feasible set such that the objective function value is maximal (recall that the
paint factory problem is a maximisation problem). For that, we must consider how the objective
function z = c⊤ x can be represented in the (x1 , x2 )-plane. Notice that the objective function
forms a hyperplane in (x1 , x2 , z) ⊂ R3 , of which we can plot level curves (i.e., projections) onto
the (x1 , x2 )-plane. Figure 1.4a shows the plotting of three level curves, for z = 5, 10, and 15.
This observation provides us with a simple graphical method to find the optimal solution of linear
problems. One must simply sweep the feasible region in the direction of the gradient
∇z = [∂z/∂x1 , ∂z/∂x2 ]⊤ = [5, 4]⊤ (or in its opposite direction, if minimising) until one last point
(or edge) of contact remains, meaning that the whole of the feasible region is behind that
furthermost level curve. Figure 1.4b illustrates the process of finding the optimal solution for the
paint factory problem.
Figure 1.4: Graphical representation of some of the level curves of the objective function z =
5x1 + 4x2 . Notice that the constant gradient vector ∇z = (5, 4)⊤ points to the direction in which
the level curves increase in value. The optimal point is represented by x∗ = (3, 1.5)⊤ with the
furthermost level curve being that associated with the value z ∗ = 21
The graphical method is important because it allows us to notice several key features that will be
used later on when we analyse a method that can search for optimal solutions for LP problems. The
first is related to the notion of active or inactive constraints. We say that a constraint is active at
a given point x if the constraint is satisfied as equality at the point x. For example, the constraints
6x1 + 4x2 ≤ 24 and x1 + 2x2 ≤ 6 are active at the optimum x∗ = (3, 1.5), since 6(3) + 4(1.5) = 24
and 3 + 2(1.5) = 6. An active constraint indicates that the resource (or requirement) represented
by that constraint is being fully depleted (or minimally satisfied).
Analogously, inactive constraints are constraints that are satisfied as strict inequalities at the
optimum. For example, the constraint −x1 + x2 ≤ 1 is inactive at the optimum, as −(3) + 1.5 < 1.
In this case, an inactive constraint represents a resource (or requirement) that is not fully depleted
(or is over-satisfied).
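Constraint activity can be checked numerically by evaluating the slack b − Ax at a candidate point; a small sketch for the paint factory problem (numpy used for illustration):

```python
import numpy as np

# Paint factory constraints Ax <= b and the optimal point x* = (3, 1.5)
A = np.array([[6.0, 4.0], [1.0, 2.0], [-1.0, 1.0], [0.0, 1.0]])
b = np.array([24.0, 6.0, 1.0, 2.0])
x_star = np.array([3.0, 1.5])

slack = b - A @ x_star          # zero slack means the constraint is active
active = np.isclose(slack, 0.0)
print(slack)   # [0.  0.  2.5 0.5]
print(active)  # the first two constraints are active at the optimum
```

The strictly positive slacks of the last two constraints confirm they are inactive at x*, exactly as argued above.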
1.4 Exercises
Consider the following network, where the supplies from the Austin and Buffalo nodes need to
meet the demands in Chicago, Denver, and Erie. The data for the problem is presented in Table
1.3.
[Figure: transportation network with plants Buffalo and Austin serving clients Chicago, Denver, and Erie]
Clients
Factory Chicago Denver Erie Capacity
Buffalo 4 9 8 25
Austin 10 7 9 600
Demands 15 12 13 -
Table 1.3: Problem data: unit transportation costs, demands and capacities
Solve the transportation problem, finding the minimum cost transportation plan.
The Finnish company Next has a logistics problem in which used oils serve as raw material for a
class of renewable diesel. The supply chain team needs to organise, at minimal cost, the
acquisition of two types of used oils (products p1 and p2) from three suppliers (supply nodes s1,
s2, and s3) to feed the lines of three of their used-oil processing factories (demand nodes d1, d2,
and d3). As the used oils are a byproduct of the suppliers' core activities, the only requirement
is that Next fetches the oil and pays the transportation costs alone.
The oils have specific conditioning and handling requirements, so transportation costs vary between
p1 and p2. Additionally, not all the routes (arcs) between suppliers and factories are available, as
some distances are not economically feasible. Table 1.4a shows the volume requirement of the two
types of oil at each supply and demand node, and Table 1.4b shows the transportation costs for
each oil type per L and the arc capacities. Arcs with “-” for costs are not available as transportation
routes.
node p1 p2
s1 80 400
s2 200 1500
s3 200 300
d1 60 300
d2 100 1000
d3 200 500
Table 1.4a: Oil volume requirements (L) at each supply and demand node

d1 d2 d3
p1/p2 (cap) p1/p2 (cap) p1/p2 (cap)
s1 5/- (∞) 5/18 (300) -/- (0)
s2 8/15 (300) 9/12 (700) 7/14 (600)
s3 -/- (0) 10/20 (∞) 8/- (∞)
Table 1.4b: Transportation costs per L and arc capacities
Find the optimal oil acquisition plan for Next, i.e., solve its transportation problem to the minimum
cost.
Consider a farmer who produces wheat, corn, and sugar beets on his 500 acres of land. During the
winter, the farmer wants to decide how much land to devote to each crop.
The farmer knows that at least 200 tons (T) of wheat and 240T of corn are needed for cattle feed.
These amounts can be raised on the farm or bought from a wholesaler. Any production in excess
of the feeding requirement would be sold. Over the last decade, mean selling prices have been $
170 and $ 150 per ton of wheat and corn, respectively. The purchase prices are 40 % more than
this due to the wholesaler’s margin and transportation costs. The planting costs per acre of wheat
and corn are $ 150 and $ 230, respectively.
Another profitable crop is sugar beet, which he expects to sell at $36/T; however, the European
Commission imposes a quota on sugar beet production. Any amount in excess of the quota can
be sold only at $10/T. The farmer’s quota for next year is 6000T. The planting cost per acre of
sugar beet is $ 260.
Based on past experience, the farmer knows that the mean yield on his land is roughly 2.5T, 3T,
and 20T per acre for wheat, corn, and sugar beets, respectively.
Based on the data, build a model to help the farmer allocate the farming area to each crop and
decide how much wheat, corn, and sugar beet to sell/buy, considering the following cases.
(a) The predictions are 100% accurate and the mean yields are the only realizations possible.
(b) There are three possible equiprobable scenarios (i.e., each one with probability 1/3): a good,
a fair, and a bad weather scenario. In the good weather scenario, the yield is 20% better than
expected, whereas in the bad weather scenario it is reduced by 20% of the mean yield. In
the fair weather scenario, the yield for each crop keeps the historical mean: 2.5T/acre,
3T/acre, and 20T/acre for wheat, corn, and sugar beets, respectively.
(c) What happens if we assume the same scenarios as in item (b), but with probabilities 25%, 25%,
and 50% for good, fair, and bad weather, respectively? How does the production plan change,
and why?
A factory makes seven products (PROD 1 to PROD 7) using the following machines: four grinders,
two vertical drills, three horizontal drills, one borer and one planer. Each product yields a certain
contribution to the profit (defined as $/unit selling price minus the cost of raw materials). These
quantities (in $/unit) together with the unit production times (hours) required on each process
are given in Table 1.5. A dash indicates that a product does not require a process. There are also
marketing demand limitations on each product each month. These are given in Table 1.6.
It is possible to store up to 100 of each product at a time at a cost of $0.5 per unit per month.
There are no stocks at present, but it is desired to have a stock of 50 of each type of product at
the end of June.
The factory works six days a week with two shifts of 8h each day. Assume that each month consists
of only 24 working days. Also, there are no penalties for unmet demands. What is the factory’s
production plan (how much of which product to make and when) in order to maximise the total
profit?
Suppose we have a set of pre-classified data points xi in Rn , divided into two sets based on their
classification. For example, the data could be information on patients and the two sets would
correspond to patients who have or do not have a certain disease. Note that unlike so far, x is now
data, not a decision variable. Denote the respective index sets by I1 and I2 . In order
to predict the class of a new point xnew , we want to infer a classification rule from the data.
Figure 1.5: Two sets of data points defined by two features, separated by a line a⊤x = b
Write a linear programming problem that finds the hyperplane a⊤ x = b such that if a⊤ xnew > b, the
point xnew is predicted to be in class 1, and if a⊤ xnew < b, the predicted class is 2. The hyperplane
should be optimal in the sense that it minimises the sum of absolute deviations |a⊤ xi − b| for the
misclassified points xi in the training data, that is, points on the wrong side of the hyperplane. In
Figure 1.5, any orange point xi , i ∈ I2 that is on the top/right of the line is on the wrong side and
thus accumulates the error, and similarly for blue points on the bottom/left of the line.
Chapter 2
Basics of Linear Algebra
Matrix inversion is the “kingpin” of linear (and nonlinear) optimisation. As we will see later on,
performing efficient matrix inversion operations (in reality, operations that are equivalent to matrix
inversion but that can exploit the matrix structure to be made more efficient from a computational
standpoint) is of utmost importance for developing a linear optimisation solver.
Another important concept is the notion of linear independence. We formally state when a collec-
tion of vectors is said to be linearly independent (or dependent) in Definition 2.2.
Definition 2.2 (Linearly independent vectors). The vectors {xi }i∈[k] with xi ∈ Rn , ∀i ∈ [k], are
linearly dependent if there exist real numbers {ai }i∈[k] with ai ≠ 0 for at least one i ∈ [k] such that

Σi∈[k] ai xi = 0;

otherwise, {xi }i∈[k] are linearly independent.
In essence, for a collection of vectors to be linearly independent, it must be so that none of the
vectors in the collection can be expressed as a linear combination (that is, multiplying the vectors
by nonzero scalars and adding them) of the others. Analogously, they are said to be linearly
dependent if one vector in the collection can be expressed as a linear combination of the others.
This is simpler to see in R2 . Two vectors are linearly independent if one cannot obtain one by
multiplying the other by a constant, which effectively means that they are not parallel. If the two
vectors are not parallel, then one of them must have a component in a direction that the other
Figure 2.1: Linearly independent (top) and dependent (bottom) vectors in R2 . Notice how, in the
bottom picture, any of the vectors can be obtained by appropriately scaling and adding the other
two
vector cannot “reach”. The same idea can be generalised to any n-dimensional space, which also
explains why one can have at most n linearly independent vectors in Rn . Figure 2.1 illustrates this idea.
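Numerically, linear independence can be tested by comparing the rank of the matrix formed by stacking the vectors against the number of vectors; a small sketch (numpy is an illustrative choice):

```python
import numpy as np

def linearly_independent(vectors):
    """Check linear independence by comparing the rank of the stacked
    matrix with the number of vectors (Definition 2.2)."""
    X = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(X) == len(vectors)

# Two non-parallel vectors in R^2: independent
print(linearly_independent([[1.0, 0.0], [1.0, 1.0]]))   # True
# One vector is a scalar multiple of the other: dependent
print(linearly_independent([[1.0, 2.0], [2.0, 4.0]]))   # False
# More than n vectors in R^n can never be independent
print(linearly_independent([[1, 0], [0, 1], [1, 1]]))   # False
```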
Theorem 2.3 summarises results that we will utilise in the upcoming developments. These are
classical results from linear algebra and the proof is left as an exercise.
Theorem 2.3 (Inverses, linear independence, and solving Ax = b). Let A be an m × m matrix.
Then, the following statements are equivalent:
1. A is invertible
2. A⊤ is invertible
3. the determinant of A is nonzero
4. the rows of A are linearly independent
5. the columns of A are linearly independent
6. for every b ∈ Rm , the system Ax = b has a unique solution
7. there exists some b ∈ Rm such that the system Ax = b has a unique solution.
Notice that Theorem 2.3 establishes important relationships between the geometry of the matrix
A (its rows and columns) and our ability to calculate its inverse A−1 and, consequently, to solve
the system Ax = b, whose solution is obtained as x = A−1 b. As we will see, solving linear systems
of equations is the most important operation in the simplex method.
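As an illustration, solving the system formed by the two constraints that are active at the paint factory optimum (Section 1.3) recovers that optimal vertex; a minimal sketch:

```python
import numpy as np

# The two constraints active at the paint factory optimum, as equalities:
# 6x1 + 4x2 = 24 and x1 + 2x2 = 6
A = np.array([[6.0, 4.0],
              [1.0, 2.0]])
b = np.array([24.0, 6.0])

# A is invertible (its rows are linearly independent), so Ax = b has a
# unique solution x = A^{-1} b; np.linalg.solve avoids forming A^{-1}
x = np.linalg.solve(A, b)
print(x)  # [3.  1.5] -- the optimal vertex x*
```

Note that np.linalg.solve factorises A rather than explicitly inverting it, in the spirit of the "operations equivalent to matrix inversion" mentioned at the start of this chapter.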
Recall that a nonempty set S ⊆ Rn is a subspace if it is closed under linear combinations of its
elements, that is, if S = {ax + by : x, y ∈ S; a, b ∈ R}.
A related concept is the notion of a span. A span of a collection of vectors {xi }i∈[k] ⊂ Rn is the
subspace of Rn formed by all linear combinations of such vectors, i.e.,

span(x1 , . . . , xk ) = {y = Σi∈[k] ai xi : ai ∈ R, i ∈ [k]}.
Notice how the two concepts are related: the span of a collection of vectors forms a subspace.
Therefore, a subspace can be characterised by the collection of vectors whose span forms it. In
other words, the span of a set of vectors is the subspace formed by all points we can represent by
some linear combination of these vectors.
The missing part in this is the notion of a basis. A basis of the subspace S ⊆ Rn is a collection of
vectors {xi }i∈[k] ⊂ Rn that are linearly independent and such that span(x1 , . . . , xk ) = S.
Notice that a basis is a “minimal” set of vectors that form a subspace. You can think of it in light
of the definition of linearly independent vectors (Definition 2.2); if a vector is linearly dependent
to the others, it is not needed for characterising the subspace that the vectors span since it can be
represented by a linear combination of the other vectors (and thus is in the subspace formed by
the span of the other vectors).
The above leads us to some important realisations:
1. All bases of a given subspace S have the same number of vectors; any extra vector would be
linearly dependent on those vectors that span S. In that case, we say that the subspace has size
(or dimension) k, the number of linearly independent vectors forming a basis of the subspace.
We can overload the notation dim(S) to represent the dimension of the subspace S.
2. If the subspace S ⊂ Rn is formed by a basis of size m < n, we say that S is a proper
subspace with dim(S) = m, because it is not the whole Rn itself, but is contained within
Rn . For example, two linearly independent vectors form (i.e., span) a hyperplane in R3 ; this
hyperplane is a proper subspace since dim(S) = m = 2 < 3 = n.
3. If a proper subspace has dimension m < n, then there are n − m directions in Rn that are
perpendicular to the subspace and to each other. That is, there are nonzero vectors ai that are
orthogonal to each other and to S or, equivalently, such that ai⊤x = 0 for every x ∈ S and
i = m + 1, . . . , n. Referring to R3 , if m = 2, then there is a third direction that is perpendicular
to (i.e., not in) S. Figure 2.2 can be used to illustrate this idea: notice how one can find a vector,
say x3 , that is perpendicular to S. This is because the whole space is R3 , but S has dimension
m = 2 (or dim(S) = 2).
Theorem 2.4 builds upon the previous points to guarantee the existence of bases and propose a
procedure to form them.
Theorem 2.4 (Forming bases from linearly independent vectors). Suppose that S = span(x1 , . . . , xk )
has dimension m ≤ k. Then
1. there exists a basis of S comprising m of the vectors x1 , . . . , xk ;
2. if x1 , . . . , xk′ are linearly independent for some k′ ≤ m, we can form a basis of S by starting
with x1 , . . . , xk′ and choosing m − k′ of the vectors xk′+1 , . . . , xk .
Proof. Notice that, if every vector xk′ +1 , . . . , xk can be expressed as a linear combination of
x1 , . . . xk′ , then every vector in S is also a linear combination of x1 , . . . , xk′ . Thus, x1 , . . . , xk′
[Figure 2.2: vectors x1 and x2 and the subspaces S they span]
form a basis of S, and m = k′. Otherwise, at least one of the vectors xk′+1 , . . . , xk is linearly
independent from x1 , . . . , xk′ . By picking one such vector, we now have k′ + 1 linearly independent
vectors among x1 , . . . , xk . Repeating this process m − k′ times, we end up with a basis for S.
Our interest in subspaces and bases spans (pun intended!) from their usefulness in explaining how
the simplex method works under a purely algebraic (as opposed to geometric) perspective. For
now, we can use the opportunity to define some “famous” subspaces which will often appear in
our derivations.
Let A be an m × n matrix. The column space of A is the subspace of Rm spanned by the n columns
of A (recall that each column has as many components as the number of rows and is thus an
m-dimensional vector). Likewise, the row space of A is the subspace of Rn spanned by the rows of
A. Finally, the null space of A, often denoted as null(A) = {x ∈ Rn : Ax = 0}, consists of the
vectors that are perpendicular to the row space of A.
One important notion related to these subspaces is their dimension. Both the row and the column
space have the same dimension, which is the rank of A. If A is full rank, then
rank(A) = min {m, n} .
Finally, the dimension of the null space of A is given by n − rank(A), which is in line with Theorem 2.4.
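A small numerical sketch of these notions, using numpy and scipy as illustrative tools:

```python
import numpy as np
from scipy.linalg import null_space

# A 2x3 matrix whose second row is a multiple of the first
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

r = np.linalg.matrix_rank(A)   # rank(A) = 1, so A is not full rank
N = null_space(A)              # basis of null(A), with n - rank(A) columns
print(r, N.shape)              # 1 (3, 2)

# every null-space vector is orthogonal to the rows of A
print(np.allclose(A @ N, 0.0))  # True
```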
Figure 2.3: The affine subspace S generated by x0 and null(a). Notice that all vectors in S,
exemplified by x1 and x2 , are perpendicular (i.e., have null dot product) to a
In linear programming, we typically have fewer constraints than variables, meaning that we will
normally have m < n. Now, assume that x0 ∈ Rn is such that Ax0 = b. Then, for any x satisfying
Ax = b, we have that
Ax = Ax0 = b ⇒ A(x − x0 ) = 0.
Thus, x ∈ S if and only if the vector (x − x0 ) belongs to null(A), the nullspace of A. Notice that
the feasible region S can be also defined as
S = {x + x0 : x ∈ null(A)} ,
being thus an affine subspace with dimension n − m, if A has m linearly independent rows (i.e.,
rank(A) = m). This will have important implications in the way we can define multiple bases for
S from the n vectors in the column space and what implications it has for the feasibility of the
whole problem. Figure 2.3 illustrates this concept for a single-row matrix a. For a multiple-row
matrix A, S becomes the intersection of multiple hyperplanes.
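The relation S = {x0 + x : x ∈ null(A)} can be checked numerically; a sketch with a single-row matrix (hypothetical data, numpy/scipy as illustrative tools):

```python
import numpy as np
from scipy.linalg import null_space

# A single constraint a'x = b in R^3: S is an affine subspace of dim n - m = 2
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])

x0 = np.linalg.lstsq(A, b, rcond=None)[0]  # one particular solution, Ax0 = b
N = null_space(A)                          # basis of null(A), shape (3, 2)

# any x = x0 + (vector in null(A)) also satisfies Ax = b
x = x0 + N @ np.array([2.0, -1.0])
print(np.allclose(A @ x, b))  # True
```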
Definition 2.5 (Polyhedral set). A polyhedral set is a set of the form

S = {x ∈ Rn : Ax ≥ b},

where A is an m × n matrix and b ∈ Rm .

One important thing to notice is that polyhedral sets, as defined in Definition 2.5, are formed by
the intersection of multiple half-spaces. Specifically, let {ai }i∈[m] be the rows of A. Then, the set
S can be described as

S = {x ∈ Rn : ai⊤x ≥ bi , ∀i ∈ [m]}, (2.3)
which represents exactly the intersection of the half-spaces ai⊤x ≥ bi . Furthermore, notice that the
hyperplanes ai⊤x = bi , ∀i ∈ [m], are the boundaries of the respective half-spaces, and thus each
describes one of the facets of the polyhedral set. Figure 2.4 illustrates a hyperplane forming two
half-spaces (also polyhedral sets) and how the intersection of five half-spaces forms a (bounded)
polyhedral set.
Figure 2.4: A hyperplane and its respective half-spaces (left) and the polyhedral set
{x ∈ R2 : ai⊤x ≥ bi , i = 1, . . . , 5} (right).
You might find authors referring to bounded polyhedral sets as polytopes. However, this is not used
consistently across references, sometimes with switched meanings (for example, using polytope to
refer to a set defined as in Definition 2.5 and using polyhedron to refer to a bounded version of
S). In this text, we will only use the term polyhedral set to refer to sets defined as in Definition
2.5 and use the term bounded whenever applicable.
Also, it may be useful to formally define some elements of polyhedral sets. For that, let us consider
a hyperplane H = {x ∈ Rn : a⊤x = b}, with a ∈ Rn and b ∈ R. Now assume that H is such that
the set F = H ∩ S is not empty and only contains points that are in the boundary of S. This
set is known as a face of a polyhedral set. If the face F has dimension zero, then F is called a
vertex. Analogously, if dim(F ) = 1, then F is called an edge. Finally, if dim(F ) = dim(S) − 1,
then F is called a facet. Notice that in R3 , facets and faces are the same whenever the face is not
an edge or a vertex.
Definition 2.6 (Convex set). A set S ⊆ Rn is convex if, for any x1 , x2 ∈ S and any λ ∈ [0, 1], we
have λx1 + (1 − λ)x2 ∈ S.

Definition 2.6 leads to a simple geometrical intuition: for a set to be convex, the line segment
connecting any two points within the set must lie within the set. This is illustrated in Figure 2.5.
Associated with the notion of convex sets are two important elements we will refer to later when
we discuss linear problems that embed integrality requirements. The first is the notion of a convex
combination, which is already contained in Definition 2.6, but can be generalised for an arbitrary
Figure 2.5: Two convex sets (left and middle) and one nonconvex set (right)
Figure 2.6: The convex hull of two points is the line segment connecting them (left); The convex
hull of three (centre) and six (right) points in R2
number of points. The second consists of convex hulls, which are the sets formed by all convex
combinations of the elements of a given set. As one might suspect, convex hulls are always convex
sets, regardless of whether the original set from which the points are drawn is convex or not.
These are formalised in Definition 2.7 and illustrated in Figure 2.6.
Definition 2.7 (Convex combinations and convex hulls). Let x1 , . . . , xk ∈ Rn and λ1 , . . . , λk ∈ R
be such that λi ≥ 0 for i ∈ [k] and Σi∈[k] λi = 1. Then
1. x = Σi∈[k] λi xi is a convex combination of x1 , . . . , xk ;
2. the convex hull of x1 , . . . , xk , denoted conv(x1 , . . . , xk ), is the set of all convex combinations
of x1 , . . . , xk .
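A small numerical sketch of Definition 2.7, with hypothetical points and multipliers (numpy as an illustrative tool):

```python
import numpy as np

# Three points in R^2 and convex multipliers (nonnegative, summing to one)
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
lam = np.array([0.2, 0.3, 0.5])
assert lam.min() >= 0 and np.isclose(lam.sum(), 1.0)

# The convex combination lies in conv(x1, x2, x3)
x = lam @ points
print(x)  # [0.3 0.5]
```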
We are now ready to state the result that guarantees the convexity of polyhedral sets of the form
S = {x ∈ Rn : Ax ≤ b} .
Theorem 2.8 (Convexity of polyhedral sets). The following statements are true:
1. The intersection of convex sets is convex.
2. Every polyhedral set is a convex set.
3. A convex combination of a finite number of elements of a convex set also belongs to that set.
4. The convex hull of a finite number of vectors is a convex set.

Proof. 1. Let S1 and S2 be convex sets, let x, y ∈ S1 ∩ S2 , and let λ ∈ [0, 1]. By the convexity of
S1 and S2 , λx + (1 − λ)y belongs to both sets, and thus to S1 ∩ S2 . The same argument extends
to arbitrary intersections.
2. Let a ∈ Rn and b ∈ R. Let x, y ∈ Rn , such that a⊤ x ≥ b and a⊤ y ≥ b. Let λ ∈ [0, 1]. Then
a⊤ (λx + (1 − λ)y) ≥ λb + (1 − λ)b = b, showing that half-spaces are convex. The result
follows from combining this with (1).
3. By induction. Let S be a convex set and assume that any convex combination of k elements x1 , . . . , xk ∈ S also belongs to S. Consider k + 1 elements x1 , . . . , xk+1 ∈ S and λ1 , . . . , λk+1 with λi ∈ [0, 1] for i ∈ [k + 1], ∑_{i=1}^{k+1} λi = 1, and λk+1 ̸= 1 (without loss of generality). Then

∑_{i=1}^{k+1} λi xi = λk+1 xk+1 + (1 − λk+1) ∑_{i=1}^{k} (λi / (1 − λk+1)) xi .    (2.4)

Notice that ∑_{i=1}^{k} λi / (1 − λk+1) = 1. Thus, using the induction hypothesis, ∑_{i=1}^{k} (λi / (1 − λk+1)) xi ∈ S. Considering that S is convex and using (2.4), we conclude that ∑_{i=1}^{k+1} λi xi ∈ S, completing the induction.
4. Let S = conv(x1 , . . . , xk ). Let y = ∑_{i=1}^{k} αi xi and z = ∑_{i=1}^{k} βi xi be such that y, z ∈ S, αi , βi ≥ 0, and ∑_{i=1}^{k} αi = ∑_{i=1}^{k} βi = 1. Let λ ∈ [0, 1]. Then

λy + (1 − λ)z = λ ∑_{i=1}^{k} αi xi + (1 − λ) ∑_{i=1}^{k} βi xi = ∑_{i=1}^{k} (λαi + (1 − λ)βi ) xi .    (2.5)

Since ∑_{i=1}^{k} (λαi + (1 − λ)βi ) = 1 and λαi + (1 − λ)βi ≥ 0 for i = 1, . . . , k, λy + (1 − λ)z is a convex combination of x1 , . . . , xk and, thus, λy + (1 − λ)z ∈ S, showing the convexity of S.
Figure 2.7 illustrates some of the statements in the proof. For example, the intersection of convex sets is always a convex set. One should notice, however, that the same does not apply to the union of convex sets. Notice that statement 2 proves that polyhedral sets as defined in Definition 2.5 are convex. Finally, the figure on the right illustrates the convex hull of four points as a convex polyhedral set containing the line segments connecting any two points within the set.
We will halt our discussion about convexity for now, as we have covered the key facts we will need to prove that the simplex method converges to an optimal point. The last missing piece is a simple yet very powerful result: it is the presence of convexity that allows us to conclude that a locally optimal solution returned by an optimisation algorithm applied to a linear programming problem is indeed globally optimal. It so happens that, in the context of linear programming, convexity is a given, since linear functions are convex by definition and the feasible region of a linear programming problem is also convex (as we have just shown in Theorem 2.8).
Theorem 2.9 (Global optimality for convex problems). Let f : Rn → R be a convex function, that is, f (λx1 + (1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 ) for all x1 , x2 ∈ Rn and all λ ∈ [0, 1], and let S ⊂ Rn be a convex set. Let
x∗ be an element of S. Suppose that x∗ is a local optimum for the problem of minimising f (x) over
S. That is, there exists some ϵ > 0 such that f (x∗ ) ≤ f (x) for all x ∈ S for which ∥x − x∗ ∥ ≤ ϵ.
Then, x∗ is globally optimal, meaning that f (x∗ ) ≤ f (x) for all x ∈ S.
Proof. Suppose, in order to derive a contradiction, that f (x) < f (x∗ ) for some x ∈ S. Using the convexity of f , we have that

f (λx + (1 − λ)x∗ ) ≤ λf (x) + (1 − λ)f (x∗ ) < λf (x∗ ) + (1 − λ)f (x∗ ) = f (x∗ ), ∀λ ∈ (0, 1].

We see that the strict inequality f (λx + (1 − λ)x∗ ) < f (x∗ ) holds at every point λx + (1 − λ)x∗ with λ ∈ (0, 1], including those with ∥λx + (1 − λ)x∗ − x∗ ∥ ≤ ϵ (obtained by taking λ small enough). This contradicts the local optimality of x∗ and proves that f (x) ≥ f (x∗ ) for all x ∈ S.
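Theorem 2.9 can be sanity-checked numerically: for a convex function, a local search started anywhere must end at the global minimum. A small sketch with an illustrative convex function f(x) = (x − 3)² and plain gradient descent:

```python
def f(x):
    # A convex function of one variable (illustrative choice)
    return (x - 3.0) ** 2

def grad(x):
    return 2.0 * (x - 3.0)

# Run gradient descent from several starting points.
# By Theorem 2.9, any local minimum of a convex function over a
# convex set is global, so all runs must agree on the minimiser.
endpoints = []
for x0 in (-10.0, 0.0, 8.0):
    x = x0
    for _ in range(500):
        x -= 0.1 * grad(x)  # step along the descent direction
    endpoints.append(x)

print(endpoints)  # all three runs end (numerically) at x* = 3
```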
Definition 2.10 (Vertex). Let P be a convex polyhedral set. The vector x ∈ P is a vertex of P if
there exists some c such that c⊤ x < c⊤ y for all y ∈ P with y ̸= x.
Definition 2.11 (Extreme points). Let P be a convex polyhedral set. The vector x ∈ P is an
extreme point of P if there are no two vectors y, z ∈ P (different than x) such that x = λy+(1−λ)z,
for any λ ∈ [0, 1].
Figure 2.8 provides an illustration of Definitions 2.10 and 2.11. Notice that the definition of a vertex involves an additional hyperplane that, once placed at a vertex point, strictly contains the whole polyhedral set in one of the half-spaces it defines, except for the vertex itself. On the other hand, the definition of an extreme point relies only on convex combinations of elements of the set itself.
Definition 2.10 also hints at an important consequence for linear programming problems. If x is a vertex of P with associated vector c, then every y ∈ P with y ̸= x lies in the open half-space c⊤ y > c⊤ x. This implies that c⊤ x ≤ c⊤ y, ∀y ∈ P , which is precisely the condition that x must satisfy to be the (unique) minimum for the problem min. {c⊤ x : x ∈ P }.
Now we focus on the description of active constraints from an algebraic standpoint. For that, let
us first generalise our setting by considering all possible types of linear constraints. That is, let us
[Figure 2.8: illustration of Definitions 2.10 and 2.11; a hyperplane {y : c⊤ y = c⊤ x} placed at a vertex x contains P in one of its half-spaces]
consider the convex polyhedral set P ⊂ Rn formed by the following set of inequalities and equalities:

a⊤i x ≥ bi , i ∈ M1 ,
a⊤i x ≤ bi , i ∈ M2 ,
a⊤i x = bi , i ∈ M3 .
Definition 2.12 (Active (or binding) constraints). If a vector x satisfies a⊤i x = bi for some i in M1 , M2 , or M3 , we say that the corresponding constraint is active (or binding) at x.
[Figure: a polyhedral set P ⊂ R3 , with points B, C, D, and E and their active constraints]
Theorem 2.13 establishes a link between a collection of active constraints forming a vertex and the possibility of describing that vertex through a basis of the subspace spanned by the vectors ai defining those constraints. This link is what will allow us to characterise vertices by their active constraints.
Theorem 2.13. Let x ∈ Rn and let I = {i : a⊤i x = bi } be the index set of the constraints active at x. Then, the following are equivalent:

1. There exist n vectors in the collection {ai }i∈I that are linearly independent;
2. span({ai }i∈I ) = Rn ; that is, every x ∈ Rn can be expressed as a linear combination of {ai }i∈I ;
3. The system of equations {a⊤i x = bi }i∈I has a unique solution.
Proof. Suppose that {ai }i∈I spans Rn , implying that the span({ai }i∈I ) has dimension n. By
Theorem 2.4 (part 1), n of these vectors form a basis for Rn and are, thus, linearly independent.
Moreover, they must span Rn and therefore every x ∈ Rn can be expressed as a combination of
{ai }i∈I . This connects (1) and (2).
Assume that the system of equations {a⊤i x = bi }i∈I has multiple solutions, say x1 and x2 . Then, the nonzero vector d = x1 − x2 satisfies a⊤i d = 0 for all i ∈ I. As d is orthogonal to every ai , i ∈ I, d cannot be expressed as a linear combination of {ai }i∈I and, thus, {ai }i∈I do not span Rn .

Conversely, if {ai }i∈I do not span Rn , choose a nonzero d ∈ Rn that is orthogonal to span({ai }i∈I ). If x satisfies {a⊤i x = bi }i∈I , then so does x + d, since a⊤i (x + d) = a⊤i x = bi for all i ∈ I, thus yielding multiple solutions. This connects (2) and (3).
Notice that Theorem 2.13 implies that there are (at least) n linearly independent constraint vectors ai active at x. This is the reason why we will refer to x, and to any vertex-forming solution, as a basic solution; among basic solutions, we will be interested in those that are feasible, i.e., those satisfying all constraints i ∈ M1 ∪ M2 ∪ M3 . Definition 2.14 provides a formal definition of these concepts.
Definition 2.14 (Basic feasible solution (BFS)). Consider a convex polyhedral set P ⊂ Rn defined by linear equality and inequality constraints, and let x ∈ Rn .

1. x is a basic solution if all equality constraints are active at x and, among the constraints active at x, n of them are linearly independent;
2. x is a basic feasible solution (BFS) if it is a basic solution that, in addition, satisfies all of the constraints (i.e., x ∈ P ).
Figure 2.10 provides an illustration of the notion of basic solutions and shows how only a subset of the basic solutions is feasible. As one might infer, these will be the points of interest in our future developments, as they are the candidates for optimal solutions.
We finalise by stating the main result of this chapter, which formally confirms the intuition we have developed so far: for convex polyhedral sets, the notions of vertex and extreme point coincide, and these points can be represented as basic feasible solutions. This is precisely the link that allows us to treat the feasible region of a linear programming problem under a purely algebraic characterisation of the candidates for optimal solutions, each described uniquely by a subset of the problem's constraints assumed to be active.
Theorem 2.15 (BFS, extreme points and vertices). Let P ⊂ Rn be a convex polyhedral set and let x ∈ P . Then, the following are equivalent:

1. x is a vertex;
2. x is an extreme point;
3. x is a basic feasible solution (BFS).

Proof. Let P = {x ∈ Rn : a⊤i x ≥ bi , i ∈ M1 ; a⊤i x = bi , i ∈ M2 }, and let I = {i ∈ M1 ∪ M2 : a⊤i x = bi }.
Figure 2.10: Points A to F are basic solutions; B,C,D, and E are BFS.
1. (Vertex ⇒ Extreme point) Suppose x is a vertex. Then, there exists some c ∈ Rn such that c⊤ x < c⊤ y for every y ∈ P with y ̸= x (cf. Definition 2.10). Take y, z ∈ P with y, z ̸= x. Thus c⊤ x < c⊤ y and c⊤ x < c⊤ z and, for λ ∈ [0, 1], c⊤ x < c⊤ (λy + (1 − λ)z), implying that x ̸= λy + (1 − λ)z; x is thus an extreme point (cf. Definition 2.11).
2. (Extreme point ⇒ BFS) We will prove the contrapositive instead1 . Suppose x ∈ P is not a BFS. Then, there are no n linearly independent vectors within {ai }i∈I and, thus, the vectors {ai }i∈I lie in a proper subspace of Rn . Let the nonzero vector d ∈ Rn be such that a⊤i d = 0 for all i ∈ I.

Let ϵ > 0, y = x + ϵd, and z = x − ϵd. Notice that a⊤i y = a⊤i z = bi for all i ∈ I. Moreover, for i ∉ I, we have a⊤i x > bi and, provided that ϵ is sufficiently small (such that ϵ|a⊤i d| < a⊤i x − bi for all i ∉ I), we have a⊤i y ≥ bi for all i ∉ I. Thus y ∈ P and, by a similar argument, z ∈ P . Now, by noticing that x = (1/2)y + (1/2)z, we see that x is not an extreme point.
3. (BFS ⇒ Vertex) Let x be a BFS and define c = ∑_{i∈I} ai . Then

c⊤ x = ∑_{i∈I} a⊤i x = ∑_{i∈I} bi .

Moreover, for any y ∈ P we have a⊤i y ≥ bi for all i ∈ I, and thus c⊤ y ≥ ∑_{i∈I} bi = c⊤ x, with equality only if a⊤i y = bi for all i ∈ I. Since x is a BFS, it is the unique solution of the system {a⊤i y = bi }i∈I (cf. Theorem 2.13), so equality implies y = x. Hence c⊤ x < c⊤ y for all y ∈ P with y ̸= x, making x a vertex (cf. Definition 2.10).
Some interesting insights emerge from the proof of Theorem 2.15, upon which we will build our next developments. Once the relationship between being a vertex/extreme point and being a BFS is established, x can be recovered as the unique solution of a system of linear equations, these equations being the constraints active at that vertex. This means that the list of all candidate points for an optimal solution can be obtained by simply looking at all possible combinations of n active constraints, discarding those that yield infeasible points. Consequently, the number of candidates for an optimal solution is finite and can be bounded by the binomial coefficient (m choose n), where m = |M1 ∪ M2 |.
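This combinatorial bound suggests a brute-force procedure: solve every n × n system obtained by activating n of the m constraints, discard singular choices, and keep the feasible points. A sketch on a small made-up polyhedron in R² (data is illustrative, not from the text):

```python
import itertools
import numpy as np

# A small illustrative polyhedron {x in R^2 : Ax <= b}
A = np.array([[ 1.0,  1.0],
              [ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0,  0.0],
              [ 0.0, -1.0]])
b = np.array([4.0, 3.0, 3.0, 0.0, 0.0])

m, n = A.shape
basic, feasible = [], []
for rows in itertools.combinations(range(m), n):
    sub_A = A[list(rows)]
    if abs(np.linalg.det(sub_A)) < 1e-9:
        continue                               # the n constraints are not LI
    x = np.linalg.solve(sub_A, b[list(rows)])  # unique point making them active
    basic.append(x)
    if np.all(A @ x <= b + 1e-9):              # keep only basic *feasible* solutions
        feasible.append(x)

print(len(basic), len(feasible))  # 8 candidate basic solutions, 5 feasible
```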
1 Consider two propositions A and B. Then, we have that A ⇒ B ≡ ¬B ⇒ ¬A. The latter is known as the contrapositive of the former.
2.4 Exercises
Exercise 2.1: Polyhedral sets [1]
Which of the following sets are polyhedral sets?
Note: the theorem is proved in the notes. Use this as an opportunity to revisit the proof carefully, and try to take as many steps as you can without consulting the text. This is a great exercise to help you internalise the proof and its importance in the context of this book. I strongly advise against blindly memorising it, as I suspect you will never (in my courses, at least) be requested to recite the proof literally.
1. A is invertible
2. A⊤ is invertible
3. The determinant of A is nonzero
4. The rows of A are linearly independent
5. The columns of A are linearly independent
6. For every b ∈ Rm , the linear system Ax = b has a unique solution
7. There exists some b ∈ Rm such that Ax = b has a unique solution.
2x1 + 3x2 + x3 = 12
x1 + 2x2 + 3x3 = 12
3x1 + x2 + 2x3 = 12
x1 + 2x2 + 4x3 = 6
2x1 + x2 + 5x3 = 6
x1 + 3x2 + 5x3 = 6
x1 + 2x2 + 4x3 = 5
2x1 + x2 + 5x3 = 4
x1 + 3x2 + 5x3 = 7
a⊤i x ≤ bi , i ∈ M2 ,
a⊤i x = bi , i ∈ M3 .
1. x is a vertex;
2. x is an extreme point;
3. x is a BFS;
max. 2x1 + x2
s.t.: 2x1 + 2x2 ≤ 9
2x1 − x2 ≤ 3
x1 − x2 ≤ 1
x1 ≤ 2.5
x2 ≤ 4
x1 , x2 ≥ 0.
Assess the following points relative to the polyhedron defined in R2 by this system: (i) identify which constraints are active at each point, and (ii) classify each point as a basic (feasible or non-feasible) solution or neither. Use Definitions 2.12 and 2.14 to check whether your classification is correct.
a) (1.5, 0)
b) (1, 0)
c) (2, 1)
d) (1.5, 3)
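To verify your classification, the active set of Definition 2.12 can be computed directly. The sketch below encodes the system above as Ax ≤ b (rows indexed 0 to 6, including the nonnegativity constraints) and reports, for each point, the indices of its active constraints and whether the point is feasible:

```python
import numpy as np

A = np.array([[ 2.0,  2.0],   # 2x1 + 2x2 <= 9
              [ 2.0, -1.0],   # 2x1 -  x2 <= 3
              [ 1.0, -1.0],   #  x1 -  x2 <= 1
              [ 1.0,  0.0],   #  x1       <= 2.5
              [ 0.0,  1.0],   #        x2 <= 4
              [-1.0,  0.0],   # -x1       <= 0  (x1 >= 0)
              [ 0.0, -1.0]])  #       -x2 <= 0  (x2 >= 0)
b = np.array([9.0, 3.0, 1.0, 2.5, 4.0, 0.0, 0.0])

def check(point, tol=1e-9):
    lhs = A @ np.asarray(point, dtype=float)
    active = np.flatnonzero(np.abs(lhs - b) < tol)  # Definition 2.12
    feasible = bool(np.all(lhs <= b + tol))
    return active.tolist(), feasible

for p in [(1.5, 0.0), (1.0, 0.0), (2.0, 1.0), (1.5, 3.0)]:
    print(p, check(p))
```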
Chapter 3
Basis, Extreme Points and Optimality in Linear Programming
where |u| represents the cardinality of the vector u. Another transformation that may be required consists of imposing the condition x ≥ 0. Let us assume that a polyhedral set P is such that (notice the absent nonnegativity condition)

P = {x ∈ Rn : Ax = b} .
Standard-form linear programs require all variables to be nonnegative. To achieve that in this case, we can simply include two auxiliary variables, say x+ and x− , with the same dimension as x, and reformulate P as

P = {(x+ , x− ) ∈ Rn × Rn : A(x+ − x− ) = b, x+ , x− ≥ 0} .
These transformations, as we will see, will be required for employing the simplex method to solve
linear programming problems with inequality constraints and, inevitably, will always render stan-
dard form linear programming problems with more variables than constraints, or m < n.
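Both transformations can be carried out mechanically. A sketch (with made-up data) of converting min c⊤x subject to Ax ≤ b, with x free, into standard form:

```python
import numpy as np

def to_standard_form(A, b, c):
    """Turn  min c'x  s.t. Ax <= b  (x free)  into standard form
    min c_s'y  s.t.  A_s y = b, y >= 0,  with  y = (x+, x-, s):
    x is split as x = x+ - x-, and one slack s_i is added per row."""
    m, _ = A.shape
    A_s = np.hstack([A, -A, np.eye(m)])
    c_s = np.concatenate([c, -c, np.zeros(m)])
    return A_s, b, c_s

# Illustrative data (not from the text)
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 1.0])

A_s, b_s, c_s = to_standard_form(A, b, c)
print(A_s.shape)  # (2, 6): m = 2 constraints, n = 6 variables, so m < n

# Any feasible x maps to a feasible y: take x = (0.5, 0.5),
# x+ = x, x- = 0, and s = b - Ax >= 0.
x = np.array([0.5, 0.5])
y = np.concatenate([x, np.zeros(2), b - A @ x])
print(np.allclose(A_s @ y, b_s))  # True
```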
The standard-form polyhedral set P always has, by definition, m active constraints because of its
equality constraints. To reach the total of n active constraints, n − m of the remaining constraints
xi ≥ 0, i = 1, . . . , n, must be made active, which can be done by selecting n − m of those to be
set as xi = 0. These n active constraints (the original m plus the n − m variables set to zero)
form a basic solution, as we have seen in the last chapter. If it happens that the m equalities can
hold while the constraints xi ≥ 0, i = 1, . . . , n, are satisfied, then we have a basic feasible solution
(BFS). Theorem 3.1 summarises this process, guaranteeing that the setting of n − m variables to
zero will render a basic solution.
Theorem 3.1 (Linear independence and basic solutions). Consider the constraints Ax = b and x ≥ 0, and assume that the m × n matrix A has m linearly independent (LI) rows, indexed by I = {1, . . . , m}. A vector x ∈ Rn is a basic solution if and only if Ax = b and there exist indices B(1), . . . , B(m) such that:

1. the columns AB(1) , . . . , AB(m) are linearly independent;
2. xj = 0 for all j ∉ {B(1), . . . , B(m)}.
Proof. Assume that (1) and (2) are satisfied. Then the active constraints xj = 0 for j ∉ {B(1), . . . , B(m)} and Ax = b imply that

∑_{i=1}^{m} AB(i) xB(i) = ∑_{j=1}^{n} Aj xj = Ax = b.

Since the columns {AB(i) }i∈I are LI, the values {xB(i) }i∈I are uniquely determined, and thus the system formed by the active constraints has a unique solution, implying that x is a basic solution (cf. Theorem 2.15).
Conversely, assume that x is a basic solution, and let xB(1) , . . . , xB(k) be the nonzero components of x. Thus, the system

∑_{i=1}^{n} Ai xi = b and xi = 0 for i ∉ {B(1), . . . , B(k)}

has a unique solution, and so does ∑_{i=1}^{k} AB(i) xB(i) = b, implying that the columns AB(1) , . . . , AB(k) are LI. Otherwise, there would be scalars λ1 , . . . , λk , not all zeros, for which ∑_{i=1}^{k} AB(i) λi = 0; this would imply that ∑_{i=1}^{k} AB(i) (xB(i) + λi ) = b, contradicting the uniqueness of x.
Since AB(1) , . . . , AB(k) are LI, k ≤ m. Also, since A has m LI rows, it must have m LI columns
spanning Rm . Using Theorem 2.4, we can obtain m − k additional columns AB(k+1) , . . . , AB(m) so
that AB(1) , . . . , AB(m) are LI.
Finally, since k ≤ m, {xj = 0}j∉{B(1),...,B(m)} ⊂ {xj = 0}j∉{B(1),...,B(k)} , satisfying (1) and (2).
The proof of Theorem 3.1 highlights an important aspect of the process of generating basic solutions. Notice that once we set n − m variables to zero, the system of equations forming P becomes uniquely determined, i.e.,

∑_{i=1}^{m} AB(i) xB(i) = ∑_{j=1}^{n} Aj xj = Ax = b.
You might have noticed that in the proof of Theorem 3.1 the focus shifted to the columns of A rather than its rows. The reason is that, when we solve the system Ax = b, what we are truly doing is finding the vector x of weights of the linear combination of the columns of A that yields the vector b. This creates an association between the columns of A and the components of x (i.e., the variables). Notice, however, that the columns of A are not the variables per se, as they have dimension m (while x ∈ Rn ).
One important interpretation of Theorem 3.1 is that we form bases for the column space of A by choosing m components of the n-dimensional vector x to be (potentially) nonzero. Since the rows of A are assumed LI, rank(A) = m, meaning that both the row and the column spaces of A have dimension m; the m LI columns selected thus form a basis for the column space of A, which is Rm itself. Finally, finding the vector x is the same as finding how the vector b can be expressed as a linear combination of that basis, which is always possible since the basis spans Rm and b ∈ Rm .
You will notice that, from here onwards, we will implicitly refer to the columns of A as variables (although we actually mean the weight associated with each column, represented by the respective component of x). Then, when we say that we are setting some n − m of the variables to zero, it means that we are ignoring the respective columns of A (the mapping between variables and columns being their indices: x1 referring to the first column, x2 to the second, and so forth), while using the remaining columns to form a (unique) combination that yields the vector b. The weights of this combination are precisely the solution x, which in turn represents the coordinates in Rn of the vertex formed by the n active constraints (the m equality constraints plus the n − m variables set to zero).
As we will see, this procedure will be at the core of the simplex method. Since we will often refer
to elements associated with this procedure, it will be useful to define some nomenclature.
We say that B = [AB(1) , . . . , AB(m) ] is a basis (or, perhaps more precisely, a basic matrix) with basic indices IB = {B(1), . . . , B(m)}. Consequently, we say that the variables xj , with j ∈ IB , are basic variables. Analogously, the variables chosen to be set to zero are the nonbasic variables xj , with j ∈ IN , where IN = J \ IB , with J = {1, . . . , n} being the index set of all variables (and of all columns of A).
Notice that the basic matrix B is invertible since its columns are LI (cf. Theorem 2.4). Thus, for xB = (xB(1) , . . . , xB(m) ), the unique solution of BxB = b is

xB = B −1 b, where B = [AB(1) , . . . , AB(m) ].
Let us consider the following numerical example, with the set P given by

P = {x ∈ R3 : x1 + x2 + 2x3 ≤ 8,
              x2 + 6x3 ≤ 12,
              x1 ≤ 4,
              x2 ≤ 6,
              x1 , x2 , x3 ≥ 0},    (3.1)

which can be written in standard form by adding slack variables {xi }i∈{4,...,7} , yielding

P = {x ∈ R7 : x1 + x2 + 2x3 + x4 = 8,
              x2 + 6x3 + x5 = 12,
              x1 + x6 = 4,
              x2 + x7 = 6,
              x1 , . . . , x7 ≥ 0}.    (3.2)
• Let IB = {4, 5, 6, 7}; in that case xB = (8, 12, 4, 6) and x = (0, 0, 0, 8, 12, 4, 6), which is a
basic feasible solution (BFS), as x ≥ 0.
• For IB = {3, 5, 6, 7}, xB = (4, −12, 4, 6) and x = (0, 0, 4, 0, −12, 4, 6), which is basic but not
feasible, since x5 < 0.
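These two basic solutions follow directly from xB = B⁻¹b; a numpy check using the data of (3.2):

```python
import numpy as np

# Constraint data of the standard-form set P in (3.2)
A = np.array([[1.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 6.0, 0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
b = np.array([8.0, 12.0, 4.0, 6.0])

def basic_solution(IB):
    """x with x_B = B^{-1} b and all nonbasic components zero.
    IB holds the 1-based variable indices used in the text."""
    cols = [i - 1 for i in IB]
    x = np.zeros(A.shape[1])
    x[cols] = np.linalg.solve(A[:, cols], b)
    return x

print(basic_solution([4, 5, 6, 7]))  # (0, 0, 0, 8, 12, 4, 6): basic and feasible
print(basic_solution([3, 5, 6, 7]))  # (0, 0, 4, 0, -12, 4, 6): basic, but x5 < 0
```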
Definition 3.2 (Adjacent basic solutions). Two basic solutions are adjacent if they share n − 1 LI
active constraints. Alternatively, two bases B1 and B2 are adjacent if all but one of their columns
are the same.
For example, consider the polyhedral set P defined in (3.2). Our first BFS was defined by making x1 = x2 = x3 = 0 (nonbasic index set IN = {1, 2, 3}); thus, our basis was IB = {4, 5, 6, 7}. An adjacent basis was then formed by replacing the basic variable x4 with the nonbasic variable x3 , rendering the new (infeasible) basis IB = {3, 5, 6, 7}.
Notice that the process of moving between adjacent bases has a simple geometrical interpretation. Since adjacent bases share all but one basic element, the two associated basic solutions must be connected by a line segment: in the example, the segment between (0, 8) and (4, 0) when projected onto (x3 , x4 ) ∈ R2 , or, equivalently, the segment between (0, 0, 0, 8, 12, 4, 6) and (0, 0, 4, 0, −12, 4, 6) in R7 . Notice how the coordinate x5 also changed value; this is necessary so that the movement is made along an edge of the polyhedral set. This will become clearer when we analyse the simplex method in further detail in Chapter 4.
Theorem 3.3 (Redundant equality constraints). Let P = {x ∈ Rn : a⊤i x = bi , i ∈ I; x ≥ 0}, with I = {1, . . . , m}, be a nonempty polyhedral set whose constraint matrix A (with rows a⊤i ) has rank k < m, and suppose that the rows ai1 , . . . , aik are linearly independent. Consider the set Q = {x ∈ Rn : a⊤ij x = bij , j = 1, . . . , k; x ≥ 0}. Then P = Q.
Proof. Assume, without loss of generality, that i1 = 1 and ik = k. Clearly, P ⊂ Q, since a solution
satisfying the constraints forming P also satisfies those forming Q.
As rank(A) = k, the rows a1 , . . . , ak form a basis of the row space of A, and any row a⊤i , i ∈ I, can be expressed as a⊤i = ∑_{j=1}^{k} λij a⊤j for some scalars λij ∈ R.

For y ∈ Q and i ∈ I, we have a⊤i y = ∑_{j=1}^{k} λij a⊤j y = ∑_{j=1}^{k} λij bj = bi , which implies that y ∈ P and, thus, Q ⊂ P . Consequently, P = Q.
Theorem 3.3 implies that any linear programming problem in standard form can be reduced to
an equivalent problem with linearly independent constraints. It turns out that, in practice, most
professional-grade solvers (i.e., software that implements solution methods and can be used to find
optimal solutions to mathematical programming models) have preprocessing routines to remove
redundant constraints. This means that the problem is automatically treated to become smaller
by not incorporating unnecessary constraints.
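As a toy illustration of such preprocessing, linearly dependent equality rows can be detected with a rank test (cf. Theorem 3.3); real solvers use far more sophisticated routines. The data below is made up:

```python
import numpy as np

# Equality system in which the third row is the sum of the first
# two, so the corresponding constraint is redundant
A = np.array([[1.0, 1.0, 2.0],
              [0.0, 1.0, 6.0],
              [1.0, 2.0, 8.0]])
b = np.array([8.0, 12.0, 20.0])

# Greedily keep only rows that increase the rank; by Theorem 3.3,
# the reduced system describes the same polyhedral set.
kept = []
for i in range(A.shape[0]):
    if np.linalg.matrix_rank(A[kept + [i]]) == len(kept) + 1:
        kept.append(i)

print(kept)  # [0, 1]: the third constraint can be dropped
```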
Degeneracy is somewhat related to the notion of redundant constraints. We say that a given vertex
is a degenerate basic solution if it is formed by the intersection of more than n active constraints
(in Rn ). Effectively, this means that more than n − m variables (i.e., some of the basic variables)
are set to zero, which is the main way to identify a degenerate BFS. Figure 3.1 illustrates a case
in which degeneracy is present.
Notice that, while in the figure on the left the constraint causing degeneracy is redundant, that is not the case in the figure on the right-hand side. That is, redundant constraints may cause degeneracy, but not all constraints causing degeneracy are redundant.
Figure 3.1: A is a degenerate basic solution, B and C are degenerate BFS, and D is a BFS.
As another example, in some cases, we may see that degeneracy is caused by the chosen represen-
tation of the problem. For example, consider the two equivalent sets:
The polyhedral set P2 is equivalent to P1 since x2 ≥ 0 is a redundant constraint. In that case, one
can see that, while the point (0, 0, 1) is not degenerate in P1 , it is in P2 , which illustrates the weak
but existent relationship between redundancy and degeneracy. This is illustrated in Figure 3.2.
[Figure 3.2: the sets P1 and P2 , with the points (0, 0, 1) and (1, 1, 0) marked]
In practice, degeneracy might cause issues related to the way we identify vertices. Because more
than n active constraints form the vertex, and yet, we identify vertices by groups of n constraints to
be active, it means that we might have a collection of adjacent bases that, in fact, are representing
the same (vertex) point in space, meaning that we might be “stuck” for a while in the same position
while scanning through adjacent bases. The numerical example below illustrates this phenomenon.
Let us consider again the example in (3.2), whose constraints Ax = b read

[ 1 1 2 1 0 0 0 ]       [ 8  ]
[ 0 1 6 0 1 0 0 ] x  =  [ 12 ]
[ 1 0 0 0 0 1 0 ]       [ 4  ]
[ 0 1 0 0 0 0 1 ]       [ 6  ]
• let IB = {1, 2, 3, 7}; this implies that x = (4, 0, 2, 0, 0, 0, 6). There are 4 zeros (instead of
n − m = 3) in x, which indicates degeneracy.
• Now, let IB = {1, 3, 4, 7}. This also implies that x = (4, 0, 2, 0, 0, 0, 6). The two bases are adjacent yet represent the same point in R7 .
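The two adjacent bases of this example can be verified to yield the same point, and degeneracy detected by counting zeros, reusing the data of (3.2):

```python
import numpy as np

# Same data as the set P in (3.2)
A = np.array([[1.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 6.0, 0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
b = np.array([8.0, 12.0, 4.0, 6.0])

def basic_solution(IB):
    cols = [i - 1 for i in IB]  # 1-based indices, as in the text
    x = np.zeros(A.shape[1])
    x[cols] = np.linalg.solve(A[:, cols], b)
    return x

xa = basic_solution([1, 2, 3, 7])
xb = basic_solution([1, 3, 4, 7])

print(np.allclose(xa, xb))               # True: adjacent bases, same point
print(int(np.sum(np.isclose(xa, 0.0))))  # 4 zeros > n - m = 3: degenerate
```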
As we will see, there are mechanisms that prevent the simplex method from becoming stuck on
such vertices forever, an issue that is referred to as cycling.
Before proceeding, we need the notion of a polyhedral set containing a line.

Definition 3.4 (Containing a line). A polyhedral set P ⊂ Rn contains a line if there exists a point x ∈ P and a nonzero direction d ∈ Rn such that x + λd ∈ P for all λ ∈ R.
Figure 3.3 illustrates the notion of containing a line and its relation to the existence of extreme points. Notice that if a set has a "corner", then the edges forming that corner prevent any direction (or line) from being fully contained within the set. Furthermore, said corner is an extreme point.
P
Q
Figure 3.3: P contains a line (left) and Q does not contain a line (right)
We are now ready to pose the result that utilises Definition 3.4 to provide the conditions for the
existence of extreme points.
Theorem 3.5 (Existence of extreme points). Let P = {x ∈ Rn : a⊤i x ≥ bi , i = 1, . . . , m} be a nonempty polyhedral set. Then, the following are equivalent:

1. P has at least one extreme point;
2. P does not contain a line;
3. there exist n vectors among a1 , . . . , am that are linearly independent.
Proof. We start with (2) ⇒ (1), i.e., if P does not contain a line, then it must have a basic feasible
solution and thus, cf. Theorem 2.15, an extreme point. Let x ∈ Rn be an element of P and let
I = {i : a⊤i x = bi }. If n of the vectors ai , i ∈ I, are linearly independent, then x is a basic feasible solution, cf. Definition 2.14.
If less than n vectors are linearly independent, then all vectors ai , i ∈ I, lie in a proper subspace
of Rn and, consequently, there exists a nonzero vector d ∈ Rn such that a⊤ i d = 0 for every i ∈ I.
Consider the line consisting of all points of the form y = x + λd, with λ ∈ R. For i ∈ I, we have that

a⊤i y = a⊤i x + λa⊤i d = a⊤i x = bi .
Thus, the active constraints remain active at all points on the line. Now, by assumption, the polyhedral set contains no lines, and thus, as λ varies, some constraint will eventually be violated. Let λ∗ be the value at which a new constraint j ∉ I first becomes active, i.e.,

a⊤j (x + λ∗ d) = bj .
We must show that aj is not a linear combination of {ai }i∈I . We know that a⊤j x ̸= bj , as j ∉ I, and that a⊤j (x + λ∗ d) = bj by the definition of λ∗ . This implies that a⊤j d ̸= 0. Contrasting this with the fact that a⊤i d = 0, ∀i ∈ I, we conclude that aj is linearly independent of {ai }i∈I .2
To conclude, we can see that by moving from x to x + λ∗ d, the number of linearly independent active constraints is increased by one. Repeating the same procedure as many times as needed, we will eventually reach a point having n linearly independent active constraints, which is thus a basic feasible solution, as we retain feasibility in the process.
To show that (1) ⇒ (3), we simply need to notice that if P has an extreme point x, then it is also
a basic feasible solution (cf. Theorem 2.15) and there are n active constraints corresponding to
{ai }i∈I linearly independent vectors.
Finally, let us show that (3) ⇒ (2) by contradiction. Assume, without loss of generality, that a1 , . . . , an are the n linearly independent vectors. Suppose that P contains a line x + λd with d ̸= 0, that is, a⊤i (x + λd) ≥ bi for all i = 1, . . . , m and all λ ∈ R. The only way this is possible is if a⊤i d = 0 for all i = 1, . . . , m; since a1 , . . . , an are linearly independent and span Rn , this implies that d = 0, reaching a contradiction that establishes that P does not contain a line.
It turns out that the feasible sets of linear programming problems in standard form do not contain a line, meaning that they always have at least one extreme point (and, consequently, a basic feasible solution, cf. Theorem 2.15). More generally, bounded polyhedral sets do not contain a line, and neither does the positive orthant.
We are now ready to state the result that proves the intuition we developed when analysing the plots in Chapter 1: if a polyhedral set has at least one extreme point and the problem has at least one optimal solution, then there must be an optimal solution that is an extreme point.

Theorem 3.6 (Optimality of extreme points). Consider the problem of minimising c⊤ x over a polyhedral set P . Suppose that P has at least one extreme point and that an optimal solution exists. Then, there exists an optimal solution that is an extreme point of P .

Proof (sketch). Let v be the optimal value and let Q = {x ∈ P : c⊤ x = v} be the polyhedral set of optimal solutions. Since Q ⊂ P and P contains no line (cf. Theorem 3.5), Q contains no line either, and thus has an extreme point. One can then verify that an extreme point of Q is also an extreme point of P and, being an element of Q, it is optimal.
2 To see that, notice that d is orthogonal to any linear combination of vectors {a }
i i∈I whilst it is not orthogonal
to aj and, thus, aj cannot be expressed as a linear combination of {ai }i∈I .
Theorem 3.6 is posed in a somewhat general way, which might be a source of confusion. First,
recall that in the example in Chapter 1, we considered the possibility of the objective function level
curve associated with the optimal value to be parallel to one of the edges of the feasible region,
meaning that instead of a single optimal solution (a vertex), we would observe a line segment
containing an infinite number of optimal solutions, of which exactly two would be extreme points.
This is important because we intend to design an algorithm that only inspects extreme points.
This discussion guarantees that, even for the cases in which a whole set of optimal solutions exists,
some elements in that set will be extreme points anyway and thus identifiable by our method.
In a more general case (with n > 2) it might be so that a whole facet of optimal solutions is obtained.
That is precisely the polyhedral set of all optimal solutions Q in the proof. This polyhedral set
will not contain a line and, therefore (cf. Theorem 3.5), have at least one extreme point.
Perhaps another important point is that the result is posed assuming a minimisation problem, but it naturally holds for maximisation problems as well. Maximising a function f (x) is the same as minimising −f (x), with the caveat that, although the optimal solution x∗ is the same in both cases, the optimal values are symmetric in sign (because of the additional minus sign introduced in the problem being originally maximised).
This very simple procedure happens to be the core idea of most optimisation methods. We will
concentrate on how to identify directions of improvement and, as a consequence of their absence,
how to identify optimality.
Starting from a point x ∈ P , we would like to move in a direction d that yields improvement while
maintaining feasibility. Definition 3.7 provides a formalisation of this idea.
Definition 3.7 (Feasible direction). Let x ∈ P , where P ⊂ Rn is a polyhedral set. A vector d ∈ Rn is a feasible direction at x if there exists θ > 0 for which x + θd ∈ P .
Figure 3.4 illustrates the concept. Notice that at extreme points, the relevant feasible directions
are those along the edges of the polyhedral set since those are the directions that can lead to other
extreme points.
Let us now devise a way of identifying feasible directions algebraically. For that, let A be an m × n matrix, let I = [m] and J = [n], and consider the problem

min. {c⊤ x : Ax = b, x ≥ 0} .
Let x be a basic feasible solution (BFS) with basis B = [AB(1) , . . . , AB(m) ]. Recall that the basic variables xB are given by xB = B −1 b and that the remaining nonbasic variables xN are such that xN = (xj )j∈IN = 0, with IN = J \ IB .
Moving to a neighbouring solution can be achieved by simply moving between adjacent bases,
which can be accomplished without a significant computational burden. This entails selecting a
nonbasic variable xj , with j ∈ IN , and increasing it to a positive value θ.
Equivalently, we can define a feasible direction d = (dN , dB ), where dN collects the components associated with nonbasic variables and dB those associated with basic variables, and move from the point x to the point x + θd. The components dN associated with the nonbasic variables are defined as

dj = 1 and dj ′ = 0, for all j ′ ∈ IN with j ′ ̸= j.

Notice that, geometrically, we are moving along a line in the dimension represented by the nonbasic variable xj .
Now, we must be careful to retain feasibility. For that, we must have A(x + θd) = b, implying that Ad = 0. This allows us to define the components dB of the direction d associated with the basic variables xj , with j ∈ IB , since

0 = Ad = ∑_{j=1}^{n} Aj dj = ∑_{i=1}^{m} AB(i) dB(i) + Aj = BdB + Aj ,

and thus dB = −B −1 Aj is the basic direction implied by the choice of the nonbasic variable xj , j ∈ IN , to become basic. The vector dB can be thought of as the adjustments that must be made in the values of the other basic variables to accommodate the entering variable while retaining feasibility.
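For the set (3.2), starting from the slack basis IB = {4, 5, 6, 7} and letting x3 enter, the basic direction dB = −B⁻¹Aj and the largest feasible step can be computed as follows:

```python
import numpy as np

# Data of the set P in (3.2)
A = np.array([[1.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 6.0, 0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
b = np.array([8.0, 12.0, 4.0, 6.0])

IB = [4, 5, 6, 7]                 # 1-based basic indices; here B is the identity
cols = [i - 1 for i in IB]
B = A[:, cols]
j = 3                             # nonbasic x3 chosen to enter the basis

dB = -np.linalg.solve(B, A[:, j - 1])   # dB = -B^{-1} A_j
xB = np.linalg.solve(B, b)              # current basic variable values

# Feasibility x_B + theta * dB >= 0 caps the step size theta
mask = dB < 0
theta_max = np.min(xB[mask] / -dB[mask])
print(dB)         # components (-2, -6, 0, 0): x4 and x5 shrink as x3 grows
print(theta_max)  # 2.0: x5 = 12 - 6*theta reaches zero first
```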
Figure 3.5 provides a schematic representation of this process, showing how the change between
adjacent basis can be seen as a movement between adjacent extreme points. Notice that it conveys
a schematic representation of a n = 5 dimensional problem, in which we ignore all the m dimensions
and concentrate on the n−m dimensional projection of the feasibility set. This implies that the only
constraints left are those associated with the nonnegativity of the variables x ≥ 0, each associated
with an edge of this alternative representation. Thus, when we set the n − m nonbasic variables to zero, we obtain one of the extreme points of this projected representation.
Figure 3.5: Projection of the feasible set onto the space of the nonbasic variables; the edges correspond to the constraints x1 = 0, . . . , x5 = 0, and A, B, and C are adjacent extreme points.
1. All the other nonbasic variables remain valued at zero, that is, xj ′ = 0 for j ′ ∈ IN \ {j}.
2. If x is a nondegenerate extreme point, then all xB > 0 and thus xB +θdB ≥ 0 for appropriately
small θ > 0.
3. If x is a degenerate extreme point, d might not be a feasible direction: if dB(i) < 0 and xB(i) = 0 for some B(i), then any θ > 0 makes xB(i) + θdB(i) < 0.
We will see later that we can devise a simple rule for choosing θ that guarantees the above is always observed. For now, let us put this discussion on hold and focus on how to choose which nonbasic variable xj , j ∈ IN , should become basic. A natural criterion is the rate of change in the objective function value when moving along the feasible direction d, given by
    c⊤d = cB⊤dB + cj = cj − cB⊤B−1Aj ,

where cB = [cB(1) , . . . , cB(m) ]. The quantity cj − cB⊤B−1Aj can be used, for example, in a greedy
fashion, meaning that we choose the nonbasic variable index j ∈ IN with greatest potential of
improvement.
First, let us formally define this quantity, which is known as the reduced cost.
Definition 3.8 (Reduced cost). Let x be a basic solution associated with the basis B and let
cB = [cB(1) , . . . , cB(m) ] be the objective function coefficients associated with the basis B. For each
nonbasic variable xj , with j ∈ IN , we define the reduced cost c̄j as

    c̄j = cj − cB⊤B−1Aj .
The name reduced cost refers to the fact that it quantifies a cost change in the reduced space of the basic variables. In effect, the reduced cost measures the change in the objective function caused by a unit increase in the nonbasic variable xj elected to become basic (the cj component) plus the associated change caused by the adjustment in the basic variable values required to retain feasibility (the −cB⊤B−1Aj component). Therefore, the reduced cost can be understood as the marginal change in the objective function value associated with each nonbasic variable.
Let us demonstrate this with a numerical example. Consider the following linear programming
problem
min. 2x1 + 1x2 + 3x3 + 2x4
s.t.: x1 + x2 + x3 + x4 = 2
2x1 + 3x3 + 4x4 = 2
x1 , x2 , x3 , x4 ≥ 0.
Let IB = {1, 2}, yielding

    B = [ 1  1
          2  0 ] .

Thus, x3 = x4 = 0 and x = (1, 1, 0, 0). The basic direction d3B for x3 is given by

    d3B = −B−1A3 = − [ 0   1/2
                       1  −1/2 ] [ 1
                                   3 ] = [ −3/2
                                            1/2 ] ,

which gives d3 = (−3/2, 1/2, 1, 0). Analogously, for x4 , we have

    d4B = −B−1A4 = − [ 0   1/2
                       1  −1/2 ] [ 1
                                   4 ] = [ −2
                                            1 ] .
Theorem 3.9 (Optimality conditions). Let x be the BFS associated with a basis B and let c̄ be the corresponding vector of reduced costs.
(1) If c̄ ≥ 0, then x is optimal.
(2) If x is optimal and nondegenerate, then c̄ ≥ 0.
Proof. To prove (1), assume that cj ≥ 0, let y be a feasible solution to P , and d = y − x. We have
that Ax = Ay = b and thus Ad = 0. Equivalently:
    BdB + Σ_{j∈IN} Aj dj = 0  ⇒  dB = − Σ_{j∈IN} B−1Aj dj ,

implying that

    c⊤d = cB⊤dB + Σ_{j∈IN} cj dj = Σ_{j∈IN} (cj − cB⊤B−1Aj ) dj = Σ_{j∈IN} c̄j dj .   (3.3)
Since xj = 0 and yj ≥ 0 for j ∈ IN , we have dj = yj − xj ≥ 0 and, as c̄j ≥ 0 by assumption, (3.3) yields

    c⊤d ≥ 0 ⇒ c⊤(y − x) ≥ 0 ⇒ c⊤y ≥ c⊤x,

i.e., x is optimal. To prove (2) by contradiction, assume that x is optimal with c̄j < 0 for some j ∈ IN . Since x is nondegenerate, we could improve on x by moving along the associated jth basic direction d, contradicting the optimality of x.
A couple of remarks are worth making at this point. First, notice that, in the presence of degeneracy, it might be that x is optimal with c̄j < 0 for some j ∈ IN . This is mostly caused by problems with multiple optima (i.e., the set of optimal solutions being neither empty nor a singleton). Luckily, the simplex method manages to get around this issue in an effective manner, as we will see in the next chapter. Another point to notice is that, if c̄j > 0 for all j ∈ IN , then x is the unique optimum. Analogously, if c̄ ≥ 0 with c̄j = 0 for some j ∈ IN , then moving in that direction causes no change in the objective function value, implying that both BFSs are “equally optimal” and that the problem has multiple optimal solutions.
3.3 Exercises
Exercise 3.1: Properties of basic solutions
Prove the following theorem:
Theorem (Linear independence and basic solutions). Consider the constraints Ax = b and x ≥ 0, and assume that A has m linearly independent rows, indexed by M = {1, . . . , m}. A vector x ∈ Rn is a basic solution if and only if Ax = b and there exist indices B(1), . . . , B(m) such that (1) the columns AB(1) , . . . , AB(m) are linearly independent, and (2) xj = 0 for every j ∉ {B(1), . . . , B(m)}.
Note: the theorem is proved in the notes. Use this as an opportunity to revisit the proof carefully,
and try to take as many steps without consulting the text as you can. This is a great exercise to
help you internalise the proof and its importance in the context of the material. I strongly advise
against blindly memorising it, as I suspect you will never (in my courses, at least) be requested to
recite the proof literally.
(a) Enumerate all basic solutions, and identify those that are basic feasible solutions.
(b) Draw the feasible region, and identify the extreme point associated with each basic feasible
solution.
(c) Consider a minimization problem with the cost vector c′ = (c1 , c2 , c3 , c4 ) = (−2, 12 , 0, 0).
Compute the basic directions and the corresponding reduced costs of the nonbasic variables
at the basic solution x′ = (3, 3, 0, 0) with x′B = (x1 , x2 ) and x′N = (x3 , x4 ); either verify that
x′ is optimal, or move along a basic direction which leads to a better solution.
max 2x1 + x2
s.t.
2x1 + 2x2 ≤ 9
2x1 − x2 ≤ 3
x1 − x2 ≤ 1
4x1 − 3x2 ≤ 5
x1 ≤ 2.5
x2 ≤ 4
x1 , x2 ≥ 0
(a) Plot the constraints and find the degenerate basic feasible solutions.
(b) What are the bases forming the degenerate solutions?
Chapter 4

The simplex method

Consider the linear programming problem in standard form

    (P ) : min. {c⊤x : Ax = b, x ≥ 0} .
Building upon the elements we defined in Chapter 3, employing the simplex method to solve P
consists of the following set of steps:
Moving along the feasible direction d towards x + θd (with scalar θ > 0) makes xj > 0 (i.e., j ∈ IN enters the basis) while reducing the objective value at a rate of c̄j . Thus, one should move as far as possible (say, take a step of length θ) along the direction d, which incurs an objective value change of θ(c⊤d) = θc̄j .
Moving as far along the feasible direction d as possible while observing that feasibility is retained
is equivalent to setting θ as
    θ = max {θ ≥ 0 : A(x + θd) = b, x + θd ≥ 0} .
Recall that, by construction of the feasible direction d, we have Ad = 0 and thus A(x + θd) = Ax = b. Therefore, the only feasibility condition that can be violated when setting θ too large is the nonnegativity of the variables, i.e., x + θd ≥ 0.
1 Notice that this can be assumed without loss of generality, by multiplying both sides by a constant, which does not alter the problem.
To prevent this from happening, all basic variables i ∈ IB for which the component di of the basic direction vector is negative must be guaranteed to retain

    xi + θdi ≥ 0  ⇒  θ ≤ −xi /di .

Therefore, the maximum step size is the largest θ that can be taken before the first component of xB turns zero. Or, more precisely put,

    θ = min_{i∈IB : dB(i) < 0} { −xB(i) /dB(i) } ,   (4.1)

where l denotes the index attaining this minimum, i.e., the index of the basic variable xB(l) that leaves the basis.
Notice that we only need to consider those basic variables whose direction components di , i ∈ IB , are negative. This is because, if di ≥ 0, then xi + θdi ≥ 0 holds for any value of θ > 0, meaning that the constraints associated with these basic variables (referring to the representation in Figure 3.5) do not limit the increase in value of the selected nonbasic variable. Notice that this can lead to a pathological case in which none of the constraints limits the increase in value of the nonbasic variable, indicating that the problem has an unbounded direction of decrease for the objective function. In this case, we say that the problem is unbounded.
Another important point is the assumption of a nondegenerate BFS. The nondegeneracy of the
BFS implies that xB(i) > 0, ∀i ∈ IB , and, thus, θ > 0. In the presence of degeneracy, one can infer
that the definition of the step size θ must be done more carefully.
Let us consider the numerical example we used in Chapter 3 with a generic objective function.
min. c1 x1 + c2 x2 + c3 x3 + c4 x4
s.t.: x1 + x2 + x3 + x4 = 2
2x1 + 3x3 + 4x4 = 2
x1 , x2 , x3 , x4 ≥ 0.
Let c = (2, 0, 0, 0) and IB = {1, 2}. The reduced cost of the nonbasic variable x3 is

    c̄3 = c3 − cB⊤B−1A3 = c3 + cB⊤dB = 0 + (2 × (−3/2) + 0 × 1/2) = −3,

where dB = [−3/2, 1/2]. As x3 increases in value, only x1 decreases, since d1 < 0. Therefore, the largest θ for which x1 ≥ 0 is −(x1 /d1 ) = 2/3. Notice that this is precisely the value that makes x1 = 0, i.e., nonbasic. The new basic variable is now x3 = 2/3, and the new (adjacent, as we will see next) extreme point is

    x̄ = x + θd = (1, 1, 0, 0) + (2/3)(−3/2, 1/2, 1, 0) = (0, 4/3, 2/3, 0).
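This pivot can be verified in a few lines. The snippet below takes the basic direction d for x3 (computed earlier) as given and reproduces the reduced cost, the ratio test, and the new point; the variable names are ours, for illustration only.

```python
from fractions import Fraction as F

# Current BFS x = (1, 1, 0, 0), basis {x1, x2}, and costs c = (2, 0, 0, 0).
x = [F(1), F(1), F(0), F(0)]
c = [F(2), F(0), F(0), F(0)]
basic = [0, 1]
# Full basic direction for x3 entering, with d_B = -B^{-1} A_3 = (-3/2, 1/2):
d = [F(-3, 2), F(1, 2), F(1), F(0)]

rc3 = sum(ci * di for ci, di in zip(c, d))            # reduced cost = c^T d
theta = min(-x[i] / d[i] for i in basic if d[i] < 0)  # ratio test (4.1)
x_new = [xi + theta * di for xi, di in zip(x, d)]

print(str(rc3), str(theta))         # -3 2/3
print([str(v) for v in x_new])      # ['0', '4/3', '2/3', '0']
```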
By moving to the BFS x̄ = x + θd, we are in fact moving from the basis B to an adjacent basis B̄, defined as

    B̄(i) = B(i), for i ∈ IB \ {l},    B̄(l) = j.

Notice that the new basis B̄ only has one pair of variables swapped between basic and nonbasic when compared against B. Analogously, the basic matrix associated with B̄ is given by

    B̄ = [ AB(1) . . . AB(l−1)  Aj  AB(l+1) . . . AB(m) ] ,

where the middle column indicates that the column AB(l) has been replaced with Aj .
Theorem 4.1 provides the technical result that formalises our developments so far.
Theorem 4.1 (Adjacent bases). Let Aj be the column of the matrix A associated with the selected nonbasic variable index j ∈ IN , and let l be defined as in (4.1), with AB(l) being its respective column in A. Then

(1) the columns AB(i) , i ≠ l, and Aj are linearly independent and, thus, B̄ is a basic matrix;

(2) the vector x̄ = x + θd is the BFS associated with the basis B̄.
Proof. To prove (1), suppose the opposite, i.e., that the columns of B̄ are linearly dependent, making the vectors B−1AB̄(i) not linearly independent. However, B−1AB̄(i) = B−1AB(i) for i ∈ IB \ {l}, and thus these are all unit vectors ei whose lth component is zero. Now, B−1Aj = −dB , whose component dB(l) ≠ 0, is linearly independent from the vectors B−1AB(i) . Thus, {AB̄(i) }i∈IB̄ = {AB(i) }i∈IB \{l} ∪ {Aj } are linearly independent, forming the contradiction.
We have finally compiled all the elements we need to state the simplex method pseudocode, which
is presented in Algorithm 1. One technical detail in the presentation of the algorithm is the use
of the auxiliary vector u. This allows for the precalculation of the components of dB = −B −1 Aj
(notice the changed sign) to test for unboundedness, that is, the lack of a constraint (and associated
basic variable) that can limit the increase of the chosen nonbasic variable.
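Algorithm 1 itself is not reproduced here, but the loop it describes can be sketched naively, solving dense linear systems at each iteration instead of the efficient updates discussed later. This is a minimal sketch under the assumption of nondegenerate BFSs (no anti-cycling rule); the helper names `solve` and `simplex` are ours.

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Solve the square system M v = rhs by Gauss-Jordan elimination (exact)."""
    n = len(M)
    T = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if T[r][col] != 0)
        T[col], T[piv] = T[piv], T[col]
        T[col] = [v / T[col][col] for v in T[col]]
        for r in range(n):
            if r != col and T[r][col] != 0:
                f = T[r][col]
                T[r] = [a - f * b for a, b in zip(T[r], T[col])]
    return [T[r][n] for r in range(n)]

def simplex(A, b, c, basis):
    """Naive simplex loop from a starting BFS (assumes nondegeneracy).
    Returns (optimal solution, basis), or None if the problem is unbounded."""
    m, n = len(A), len(A[0])
    while True:
        B = [[A[i][j] for j in basis] for i in range(m)]
        xB = solve(B, b)
        # Simplex multipliers p solve B^T p = c_B; reduced cost: c_j - p^T A_j.
        p = solve([list(row) for row in zip(*B)], [c[j] for j in basis])
        entering = next((j for j in range(n) if j not in basis
                         and c[j] - sum(p[i] * A[i][j] for i in range(m)) < 0),
                        None)
        if entering is None:              # all reduced costs >= 0: optimal
            x = [F(0)] * n
            for i, j in enumerate(basis):
                x[j] = xB[i]
            return x, basis
        u = solve(B, [A[i][entering] for i in range(m)])  # u = B^{-1} A_j = -d_B
        if all(ui <= 0 for ui in u):      # nothing limits the step: unbounded
            return None
        # Ratio test: the leaving row attains min x_B(i) / u_i over u_i > 0.
        l = min((i for i in range(m) if u[i] > 0), key=lambda i: xB[i] / u[i])
        basis[l] = entering

# The Chapter 3 example with c = (2, 0, 0, 0), starting from basis {x1, x2}:
A = [[F(1), F(1), F(1), F(1)], [F(2), F(0), F(3), F(4)]]
b = [F(2), F(2)]
c = [F(2), F(0), F(0), F(0)]
x, opt_basis = simplex(A, b, c, [0, 1])
print([str(v) for v in x])   # ['0', '4/3', '2/3', '0']
```

On this instance, the method performs the single pivot discussed above and stops at the optimal basis {x2 , x3 }.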
The last missing element is proving that Algorithm 1 eventually converges to an optimal solution,
should one exist. This is formally stated in Theorem 4.2.
Theorem 4.2 (Convergence of the simplex method). Assume that P has at least one feasible solution and that all BFSs are nondegenerate. Then, the simplex method terminates after a finite number of iterations, in one of the following states:

(1) an optimal basis B and an associated optimal BFS have been found;

(2) a direction d satisfying Ad = 0, d ≥ 0, and c⊤d < 0 has been found, and the optimal value is −∞.
Proof. If the condition in Line 2 of Algorithm 1 is not met, then B and the associated BFS are optimal, c.f. Theorem 3.9. Otherwise, if the condition in Line 4 is met, then d is such that Ad = 0 and d ≥ 0, implying that x + θd is feasible for all θ > 0, with an objective value change θc̄j → −∞ as θ → ∞.

Finally, notice that steps θ > 0 are taken along directions d satisfying c⊤d < 0. Thus, the value of successive solutions is strictly decreasing and no BFS can be visited twice. As there is a finite number of BFSs, the algorithm must eventually terminate.
the possibility of moving to the nondegenerate basis IB = {3, 4, 5}, i.e., y. Assuming that we
are minimising, any direction with a negative component in the y-axis direction will represent an
improvement in the objective function (notice the vector c, which is parallel to the y-axis and
points upwards). Thus, the method is only stalled by the degeneracy of x but does not cycle.
Figure 4.1: IN = {4, 5} for x; f (x5 > 0) and g (x4 > 0) are basic directions. Making IN = {2, 5} leads to new basic directions h (x4 > 0) and −g (x2 > 0).
• Greedy selection (or Dantzig’s rule): choose xj , j ∈ IN , with the largest |c̄j |. Prone to cycling.
• Index-based order (or Bland’s rule): choose xj , j ∈ IN , with the smallest index j. It prevents cycling but is computationally inefficient.
• Reduced cost pricing: calculate θ for all (or some) j ∈ IN and pick the one yielding the smallest (i.e., largest in magnitude with negative sign) θc̄j . Calculating the actual observed change for all nonbasic variables is too computationally expensive; partial pricing refers to only considering a subset of the nonbasic variables when calculating θc̄j .
• Devex 2 and steepest-edge 3 : most commonly used by modern implementations of the simplex
method, available in professional-grade solvers.
where the lth column AB(l) is precisely how the adjacent bases B and B̄ differ, with Aj replacing AB(l) in B̄.

We can devise an efficient manner to update B−1 into B̄−1 utilising elementary row operations. First, let us formally define the concept.
Definition 4.3 (Elementary row operations). Adding a constant multiple of one row to the same
or another row is called an elementary row operation.
Let D be a square matrix with a single nonzero entry Dij = β. Calculating QB = (I + D)B is then the same as multiplying the j th row of B by the scalar β and adding the resulting row to the ith row of B. Before we continue, let us utilise a numerical example to clarify this procedure. Let
    B = [ 1  2
          3  4
          5  6 ]

and suppose we would like to multiply the third row by 2 and have it then added to the first row. That means that D13 = 2 and that Q = I + D would be

    Q = [ 1  0  2
          0  1  0
          0  0  1 ] .
As a side note, we have that Q−1 exists since det(Q) = 1. Furthermore, a sequence of elementary row operations 1, 2, . . . , k can be represented as the product Q = Qk · · · Q2 Q1 , with Q1 applied first.
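The numerical example can be checked directly. The snippet below (plain Python, illustrative only) forms Q = I + D and verifies the effect of the elementary row operation on B.

```python
# B is 3x2; Q = I + D with the single nonzero entry D[0][2] = 2 encodes
# "multiply row 3 by 2 and add it to row 1".
B = [[1, 2],
     [3, 4],
     [5, 6]]
Q = [[1, 0, 2],
     [0, 1, 0],
     [0, 0, 1]]

QB = [[sum(Q[i][k] * B[k][j] for k in range(3)) for j in range(2)]
      for i in range(3)]
print(QB)   # [[11, 14], [3, 4], [5, 6]]
```

Only the first row of B changes: (1, 2) + 2 × (5, 6) = (11, 14).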
Going back to the purpose of updating B−1 into B̄−1 , notice the following. Since B−1B = I, each term B−1AB(i) is the ith unit vector ei (the ith column of the identity matrix). That is,

    B−1B̄ = [ e1 . . . el−1  u  el+1 . . . em ] ,

i.e., an identity matrix except for its lth column, which is replaced with u = B−1Aj . To turn this matrix into the identity (and thus turn B−1 into B̄−1 ), we can apply the following sequence of elementary row operations:
1. First, for each i ≠ l, multiply the lth row by −ui /ul and add it to the ith row. This replaces ui with zero for all i ≠ l.

2. Then, divide the lth row by ul . This replaces ul with one.
We can restate the simplex method in its revised form. This is presented in Algorithm 2.
Notice that in Algorithm 2, apart from the initialisation step, no linear systems are directly solved.
Instead, elementary row operations (ERO) are performed, leading to computational savings.
The key feature of the revised simplex method is a matter of representation and, thus, memory allocation savings. Algorithm 2 only requires keeping in memory a matrix of the form

    [ p⊤   | p⊤b ]
    [ B−1  | u   ]

which, after each series of elementary row operations, yields not only B̄−1 but also the updated simplex multipliers p⊤ and p⊤b = cB⊤B−1b = cB⊤xB , which represents the objective function value of the new basic feasible solution x̄ = [xB̄ , xN̄ ]. These savings will become obvious once we discuss the tabular (or non-revised) version of the simplex method.
Three main issues arise when considering the efficiency of implementations of the simplex method,
namely, matrix (re)inversion, representation in memory, and the use of matrix decomposition
strategies.
• Reinversion: localised updates such as EROs have the side effect of accumulating truncation
and round-off error. To correct this, solvers typically rely on periodically recalculating B −1 ,
which, although costly, can avoid numerical issues.
between each basic feasible solution. However, it is a helpful representation from a pedagogical
standpoint and will be useful for explaining further concepts in the upcoming chapters.
In contrast to the revised simplex method, rather than updating only B−1 , we consider the complete matrix

    B−1[A | b] = [ B−1A1 , . . . , B−1An | B−1b ] .
In tableau form, this information is arranged as

    [ c⊤ − cB⊤B−1A | −cB⊤B−1b ]        [ c̄1 · · · c̄n | −cB⊤xB ]
    [ B−1A         | B−1b     ]   ⇒    [ B−1A1 · · · B−1An | xB(1) , . . . , xB(m) ]

that is, the zeroth row holds the reduced costs and the negative of the current objective value, while the remaining rows hold B−1A and B−1b = xB .
In this representation, we say that the j th column associated with the nonbasic variable to become
basic is the pivot column u. Notice that, since the tableau exposes the reduced costs cj , j ∈ IN , it
allows for trivially applying the greedy pricing strategy (by simply choosing the variables with a
negative reduced cost with the largest absolute value).
The lth row, associated with the basic variable selected to leave the basis, is the pivot row. Again, the tableau representation facilitates the calculation of the ratios used in choosing the basic variable l to leave the basis, since it amounts to simply calculating the ratios between the elements in the rightmost column and those in the pivot column, disregarding the zeroth row and those rows whose pivot-column entries are less than or equal to zero. The row with the minimal ratio will be the row associated with the current basic variable leaving the basis.
Once a pivot column and a pivot row have been defined, it is a matter of performing elementary row operations utilising the pivot row to turn the pivot column into the unit vector el , turning to zero the respective zeroth element (recall that basic variables have zero reduced costs). This is the same as using elementary row operations with the pivot row to turn all elements in the pivot column to zero, except for the pivot element ul (the intersection of the pivot row and the pivot column), which must be turned into one. The above highlights the main purpose of the tableau representation, which is to facilitate calculation by hand.
Notice that, as we have seen before, performing elementary row operations to convert the pivot column u into el converts B−1[A | b] into B̄−1[A | b]. Analogously, turning the entry associated with the pivot column u in the zeroth row to zero converts [c⊤ − cB⊤B−1A | −cB⊤B−1b] into [c⊤ − cB̄⊤B̄−1A | −cB̄⊤B̄−1b].
Let us return to the paint factory example from Section 1.2.1, which in its standard form can be written as

    min. −5x1 − 4x2
    s.t.: 6x1 + 4x2 + x3 = 24
          x1 + 2x2 + x4 = 6
          −x1 + x2 + x5 = 1
          x2 + x6 = 2
          x1 , . . . , x6 ≥ 0.
x1 x2 x3 x4 x5 x6 RHS
z -5 -4 0 0 0 0 0
x3 6 4 1 0 0 0 24
x4 1 2 0 1 0 0 6
x5 -1 1 0 0 1 0 1
x6 0 1 0 0 0 1 2
x1 x2 x3 x4 x5 x6 RHS
z 0 -2/3 5/6 0 0 0 20
x1 1 2/3 1/6 0 0 0 4
x4 0 4/3 -1/6 1 0 0 2
x5 0 5/3 1/6 0 1 0 5
x6 0 1 0 0 0 1 2
x1 x2 x3 x4 x5 x6 RHS
z 0 0 3/4 1/2 0 0 21
x1 1 0 1/4 -1/2 0 0 3
x2 0 1 -1/8 3/4 0 0 3/2
x5 0 0 3/8 -5/4 1 0 5/2
x6 0 0 1/8 -3/4 0 1 1/2
The pivot elements at each iteration (i.e., the intersection of the pivot column and the pivot row) were the entry 6 (column x1 , row x3 ) in the first tableau and 4/3 (column x2 , row x4 ) in the second. From the last tableau, we see that the optimal solution is x∗ = (3, 3/2). Notice that we applied a change of signs in the objective function coefficients, turning the problem into a minimisation; also notice that this makes the value of the objective function in the RHS column appear positive, although it should be negative, as we are in fact minimising −5x1 − 4x2 , for which x∗ = (3, 3/2) evaluates as −21. As we have seen, the tableau shows −cB⊤xB , hence why the optimal tableau displays 21 in the first row of the RHS column.
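The three tableaus can be reproduced mechanically. Below is a minimal sketch of the tableau method with Dantzig pricing applied to the paint factory data; the helper names `pivot` and `tableau_simplex` are ours, for illustration only.

```python
from fractions import Fraction as F

def pivot(T, row, col):
    """Elementary row operations turning column `col` into a unit vector."""
    T[row] = [v / T[row][col] for v in T[row]]
    for r in range(len(T)):
        if r != row and T[r][col] != 0:
            f = T[r][col]
            T[r] = [a - f * b for a, b in zip(T[r], T[row])]

def tableau_simplex(T, basis):
    """Minimisation tableau: row 0 holds reduced costs and -c_B x_B."""
    while True:
        # Greedy (Dantzig) pricing: most negative reduced cost enters.
        col = min(range(len(T[0]) - 1), key=lambda j: T[0][j])
        if T[0][col] >= 0:
            return T, basis
        # Ratio test over the positive entries of the pivot column.
        rows = [r for r in range(1, len(T)) if T[r][col] > 0]
        if not rows:
            return None  # unbounded
        row = min(rows, key=lambda r: T[r][-1] / T[r][col])
        pivot(T, row, col)
        basis[row - 1] = col

# Paint factory in standard form: min -5x1 - 4x2, slacks x3..x6 (0-based: 2..5).
T = [[F(v) for v in row] for row in [
    [-5, -4, 0, 0, 0, 0,  0],   # z-row
    [ 6,  4, 1, 0, 0, 0, 24],
    [ 1,  2, 0, 1, 0, 0,  6],
    [-1,  1, 0, 0, 1, 0,  1],
    [ 0,  1, 0, 0, 0, 1,  2]]]
T, basis = tableau_simplex(T, [2, 3, 4, 5])
print(str(T[0][-1]), basis)   # 21 [0, 1, 4, 5]
```

After two pivots (on the entries 6 and 4/3, matching the tableaus above), it stops with basis {x1 , x2 , x5 , x6 }, x1 = 3, x2 = 3/2, and −cB⊤xB = 21.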
Ax ≤ b ⇒ Ax + s = b.
Notice that this is equivalent to assuming all original problem variables (i.e., those that are not
slack variables) to be initialised as zero (i.e., nonbasic) since this is a trivially available initial
feasible solution. However, this approach does not work for constraints of the form Ax ≥ b, as in
this case, the transformation would take the form

    Ax ≥ b ⇒ Ax − s = b, with s ≥ 0.

Notice that making the respective slack variable basic would yield an initial value of s = −b, making the basic solution infeasible (assuming b ≥ 0).
For more general problems, however, this might not be possible since simply setting the original
problem variables to zero might not yield a feasible solution that can be used as a BFS. To
circumvent that, we rely on artificial variables to obtain a BFS.
Let P : min. {c⊤x : Ax = b, x ≥ 0}, where the standard form can be achieved with appropriate transformations (i.e., adding nonnegative slacks to the inequality constraints) and where we assume (without loss of generality) that b ≥ 0. Then, finding a BFS for P amounts to finding a zero-valued optimal solution to the auxiliary problem
    (AUX) : min. Σ_{i=1}^{m} yi
    s.t.: Ax + y = b
          x, y ≥ 0.
The auxiliary problem AU X is formed by including one artificial variable for each constraint in P ,
represented by the vector y of so-called artificial variables. Notice that the problem is represented
in a somewhat compact notation, in which we assume that all slack variables used to convert
inequalities into equalities have already been incorporated in the vector x and matrix A, with
the artificial variables y playing the role of “slacks” in AU X that can be assumed to be basic
and trivially yield an initial BFS for AU X. In principle, one does not need artificial variables
for the rows in which there is a positive signed slack variable (i.e., an originally less-or-equal-than
constraint), but this representation allows for compactness.
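Setting up AUX is mechanical. The sketch below builds the auxiliary data for the small system used earlier in this chapter (the names `A_aux`, `c_aux` are ours) and checks that x = 0, y = b is a trivially feasible starting point.

```python
from fractions import Fraction as F

# A system Ax = b (with b >= 0) for which no obvious initial BFS is at hand.
A = [[F(1), F(1), F(1), F(1)],
     [F(2), F(0), F(3), F(4)]]
b = [F(2), F(2)]
m, n = 2, 4

# AUX data: min 1^T y  s.t.  [A | I] [x; y] = b,  x, y >= 0.
A_aux = [A[i] + [F(1) if k == i else F(0) for k in range(m)] for i in range(m)]
c_aux = [F(0)] * n + [F(1)] * m

# The artificial variables form a trivial starting basis: x = 0, y = b.
x0 = [F(0)] * n + b
assert all(sum(A_aux[i][j] * x0[j] for j in range(n + m)) == b[i]
           for i in range(m))
print([str(v) for v in x0])   # ['0', '0', '0', '0', '2', '2']
```

Driving the objective Σ yi to zero from this starting point then produces a BFS of the original system.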
Solving AUX to optimality consists of trying to find a BFS in which the value of the artificial variables is zero since, in practice, the value of the artificial variables measures a degree of infeasibility of the current basis in the context of the original problem P . Reaching zero means that a BFS in which the artificial variables play no role was found, which can then be used as an initial BFS for solving P . On the other hand, if the optimum of AUX is such that some of the artificial variables are nonzero, then there is no BFS for AUX in which the artificial variables are all zero or, more specifically, there is no BFS for P , indicating that the problem P is infeasible.
Assuming that P is feasible and y = 0, two scenarios can arise. The first is when the optimal basis
B for AU X is composed only of columns Aj of the original matrix A, with no columns associated
with the artificial variables. Then B can be used as an initial starting basis without any issues.
The second scenario is somewhat more complicated. Often, AU X is a degenerate problem and the
optimal basis B may contain some of the artificial variables y. This then requires an additional
preprocessing step, which consists of the following:
(1) Let AB(1) , . . . , AB(k) be the columns of A in B, which are linearly independent. We know from earlier (c.f. Theorem 2.4) that, since A is full rank, we can choose additional columns AB(k+1) , . . . , AB(m) so that the collection spans Rm .
(2) Select an artificial variable yl = 0 that is still in the basis, find an index j such that the lth entry of B−1Aj is nonzero, and use elementary row operations to include Aj in the basis. Repeat this m − k times.
The procedure is based on several ideas we have seen before. Since Σ_{i=1}^{m} yi is zero at the optimum, there must be a BFS in which the artificial variables are nonbasic (which is what (1) refers to). Thus, step (2) can be repeated until a basis B is formed that includes none of the artificial variables.
Some interesting points are worth highlighting. First, notice that B−1AB(i) = ei , i = 1, . . . , k. Since k < l, the lth component of each of these vectors is zero and will remain so after performing the elementary row operations. In turn, the lth entry of B−1Aj is not zero, and thus Aj is linearly independent of AB(1) , . . . , AB(k) .
However, it might be that we find only zero elements in the lth row. Let g ⊤ be the lth row of B−1A (i.e., the entries in the tableau associated with the original problem variables). If g is the null vector, then no original column can be brought into the basis and the procedure fails. Note, however, that denoting by h⊤ the lth row of B−1 , we have h⊤A = g ⊤ = 0 and, for any feasible x, h⊤Ax = h⊤b = 0, implying that the constraint represented by the lth row is redundant and can be removed altogether.
This process of generating initial BFS is often referred to as Phase I of the two-phase simplex
method. Phase II consists of employing the simplex method as we developed it, utilising the BFS
found in Phase I as a starting basis.
    P : min. z
    s.t.: Ax = b
          c⊤x = z
          Σ_{j=1}^{n} xj = 1
          x ≥ 0.
In this reformulation, we make the objective function an auxiliary variable, so it can be easily
represented on a real line at the expense of adding an additional constraint c⊤ x = z. Furthermore,
we normalise the decision variables so they add to one (notice that this implies a bounded feasible region). With this, P can be equivalently written as
    P : min. z
    s.t.: x1 [ A1 ; c1 ] + x2 [ A2 ; c2 ] + · · · + xn [ An ; cn ] = [ b ; z ]
          Σ_{j=1}^{n} xj = 1
          x ≥ 0,

where [ Aj ; cj ] denotes the column Aj stacked on top of its cost coefficient cj .
This second formulation exposes one interesting interpretation of the problem. Solving P is akin to
finding a set of weights x that makes a convex combination (c.f. Definition 2.7) of the columns of
A such that it constructs (or matches) b in a way that the resulting combination of the respective
components of the vector c is minimised. Now, let us define some nomenclature that will be useful
in what follows.
Definition 4.4 (k-dimensional simplex). A collection of vectors y1 , . . . , yk+1 are affinely indepen-
dent if k ≤ n and y1 − yk+1 , . . . , yk − yk+1 are linearly independent. The convex hull of k + 1
affinely independent vectors is a k-dimensional simplex.
Definition 4.4 is precisely the inspiration for the name of the simplex method. We know that only m + 1 components of x will be different from zero, since that is the number of constraints we have and, thus, the size of a basis in this case. Thus, a BFS is formed by m + 1 columns [ Aj ; 1 ], which in turn are associated with m + 1 points (Aj , cj ).

Figure 4.2 provides an illustration of the concept. In this, we have that m = 2, so each column Aj represents a point in a two-dimensional plane. Notice that a basis comprises three points (Ai , ci ) and thus forms a two-dimensional simplex (a triangle). A BFS is a selection of three points (Ai , ci ) such that b, also illustrated in the picture, can be formed by a convex combination of the columns Ai forming the basis. This will be possible if b happens to be inside the triangle formed by these points. For example, in Figure 4.2, the basis formed by columns {2, 3, 4} is a BFS, while the basis {1, 2, 3} is not.
Figure 4.2: The column geometry for m = 2: the points (A1 , c1 ), . . . , (A4 , c4 ) lie above their projections A1 , . . . , A4 on the plane containing b; the triangle formed by A2 , A3 , and A4 contains b.
We can now add a third dimension to the analysis, representing the value of z. For that, we will
use Figure 4.3. As can be seen, each selection of basis creates a tilt in the three-dimensional
simplex, such that the point b is met precisely at the height corresponding to its value in the z
axis. This allows us to compare bases according to their objective function value. And, since we
are minimising, we would like to find the basis that has its respective simplex crossing b at the
lowermost point.
Figure 4.3: The column geometry with the z axis added: each basis forms a simplex (with vertices among the points B, C, D, E, F ) crossing the vertical line through b at a height (e.g., G or H) equal to its objective value.
Notice that in Figure 4.3, although each facet is a basic simplex, only three are feasible (BCD, CDF , and DEF ). We can also see what one iteration of the simplex method does under this geometrical interpretation. Moving between adjacent bases means that we are replacing one vertex (say, C) with another (say, E), considering the potential for a decrease in value along the z axis (represented by the difference between the projections of the points H and G onto the z axis). You can also see the notion of pivoting: since we are moving between adjacent bases, two successive simplexes share an edge in common. Consequently, they pivot around that edge (think of the vertex C moving to the point E while the edge DF remains fixed).
Figure 4.4: Pivots from initial basis [A3 , A6 ] to [A3 , A5 ] and to the optimal basis [A8 , A5 ]
Now we are ready to provide an insight into why the simplex method is often so efficient. The main reason is the method’s ability to skip bases in favour of those with the most promising improvement. To see that, consider Figure 4.4, which is a 2-dimensional schematic projection of Figure 4.3. By using the reduced costs to guide the choice of the next basis, we tend to choose the steepest of the simplexes that can provide reductions in the objective function value, which has the side effect of skipping several bases that would otherwise have to be considered. This creates a “sweeping effect”, which allows the method to find optimal solutions in fewer pivots than there are vertices. Clearly, this behaviour can be prevented by construction, as there are examples purposely built to force the method to consider every single vertex, but the situation illustrated in Figure 4.4 is by far the most common in practice.
4.4 Exercises
Exercise 4.1: Properties of the simplex algorithms
Consider the simplex method applied to a standard form minimization problem, and assume that
the rows of the matrix A are linearly independent. For each of the statements that follow, give
either a proof or a counterexample.
(a) An iteration of the simplex method might change the feasible solution while leaving the cost
unchanged.
(b) A variable that has just entered the basis cannot leave in the very next iteration.
(c) If there is a non-degenerate optimal basis, then there exists a unique optimal basis.
A feasible point for this problem is (x1 , x2 ) = (0, 3). Formulate the problem as a minimisation
problem in standard form and verify whether or not this point is optimal. If not, solve the problem
by using the simplex method.
x1 x2 x3 x4 x5 x6 x7 RHS
z 0 0 0 δ 3 γ ξ 0
x2 0 1 0 α 1 0 3 β
x3 0 0 1 -2 2 η -1 2
x1 1 0 0 0 -1 2 1 3
The entries α, β, γ, δ, η and ξ in the tableau are unknown parameters. Furthermore, let B be the basis matrix corresponding to having x2 , x3 , and x1 (in that order) as the basic variables. For each one of the following statements, find the ranges of values of the various parameters that will make the statement true.
(a) Phase II of the Simplex method can be applied using this as an initial tableau.
(b) The corresponding basic solution is feasible, but we do not have an optimal basis.
(c) The corresponding basic solution is feasible and the first Simplex iteration indicates that the
optimal cost is −∞.
(d) The corresponding basic solution is feasible, x6 is a candidate for entering the basis, and
when x6 is the entering variable, x3 leaves the basis.
(e) The corresponding basic solution is feasible, x7 is a candidate for entering the basis, but if it
does, the objective value remains unchanged.
max. 5x1 + x2
s.t.: 2x1 + x2 ≥ 5
x2 ≥ 1
2x1 + 3x2 ≤ 12
x1 , x2 ≥ 0.
Chapter 5

Linear Programming Duality - Part I
5.1.1 Motivation
Let us return to the paint factory example. Recall its formulation, given by

    max. z = 5x1 + 4x2    (1)
    s.t.: 6x1 + 4x2 ≤ 24    (2)
          x1 + 2x2 ≤ 6    (3)
          −x1 + x2 ≤ 1
          x2 ≤ 2
          x1 , x2 ≥ 0.
To motivate our developments, suppose we do not have an LP solver at our disposal but have found
(perhaps by trial and error) a feasible solution (3, 1) with objective value 19 and want to know
how good of a solution this is. If we know nothing else about the problem, 19 might be a very
bad solution compared to the optimal solution. On the other hand, if we can somehow determine
an upper bound for the solution, showing that this problem simply cannot have a solution with
an objective value better than, say, 25, we might consider the solution “good enough” even if we
cannot show that it is optimal.
Notice that the objective function (1) and the first resource constraint (2) imply

    z = 5x1 + 4x2 ≤ 6x1 + 4x2 ≤ 24.
The first inequality comes from the fact that the coefficients of the nonnegative decision variables
x are greater or equal on the left-hand side. The first constraint (2) thus tells us that the optimal
solution value cannot be more than 24, because of the availability of resources. This upper bound
might be higher than the optimal value, but it allows us to conclude that our solution with value
19 cannot be improved by more than 26%, even if this theoretical upper bound could be reached.
The second constraint (3) does not fulfil the requirement of coefficients being at least equal to
those in the objective function, but multiplying the constraint by 5 on both sides achieves just
that. This gives us
z = 5x1 + 4x2 ≤ 5x1 + 10x2 ≤ 30,
which is a worse bound than 24.
Finally, we notice that two constraints can be combined to obtain a new constraint. Multiplying
the first constraint by 0.75 and the second by 0.5, we can add the results together to obtain:
(0.75 × 6 + 0.5 × 1)x1 + (0.75 × 4 + 0.5 × 2)x2 ≤ 0.75 × 24 + 0.5 × 6,
or
5x1 + 4x2 ≤ 21.
It turns out that this upper bound of 21 is also the optimal objective value.
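The bounding manipulations above are easy to check numerically. The following sketch (my own, not part of the notes) uses scipy to confirm that the weighted combination of constraints (2) and (3) dominates the objective coefficients and that the resulting bound of 21 is attained:

```python
import numpy as np
from scipy.optimize import linprog

# Paint factory data: max 5x1 + 4x2 with the two resource constraints
# used in the bounding argument; linprog minimises, so negate the objective.
res = linprog(c=[-5, -4], A_ub=[[6, 4], [1, 2]], b_ub=[24, 6],
              bounds=[(0, None)] * 2)
z_opt = -res.fun  # optimal objective value

# Weights 0.75 and 0.5 on the two constraints reproduce the objective
# coefficients (5, 4) on the left-hand side and give the bound on the right.
weights = np.array([0.75, 0.5])
lhs = weights @ np.array([[6, 4], [1, 2]])  # combined coefficients
bound = weights @ np.array([24, 6])         # combined right-hand side
```

Here `z_opt` and `bound` coincide at 21, exactly as derived in the text.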
Let us next formalise these ideas. What we want to obtain is a constraint giving us an upper
bound for the original problem, hereinafter referred to as the primal problem P . This constraint
is obtained as a weighted combination of the primal constraints. Denote the weights as pi for each
constraint i ∈ {1, . . . , m}. In the case of a primal problem of the form
(P ) : max. c⊤ x
s.t.: Ax ≤ b
x ≥ 0,
the weights p result in a constraint:
∑_{i=1}^m ∑_{j=1}^n p_i A_{ij} x_j ≤ ∑_{i=1}^m p_i b_i ,

where the coefficient of x_j is ∑_{i=1}^m p_i A_{ij}. For this to give an upper bound on the objective function ∑_{j=1}^n c_j x_j , the condition

∑_{i=1}^m p_i A_{ij} ≥ c_j
must be satisfied for each j ∈ {1, . . . , n}. In matrix form, this becomes:
A⊤ p ≥ c.
Note that in order for us to be able to combine the constraints into one, the weights p must be
nonnegative, as negative weights would flip the inequalities in the corresponding primal constraints.
Finally, we want the upper bound to be as tight as possible, so we minimise the right-hand side
b⊤ p of this combined constraint, resulting in the dual problem:
(D) : min. b⊤ p
s.t.: A⊤ p ≥ c
p ≥ 0.
Let us now consider an alternative derivation of duals. Consider the standard-form problem

(P ) : min. c⊤ x
s.t.: Ax = b
x ≥ 0,

which we will refer to as the primal problem. In mathematical programming, we say that a
constraint has been relaxed if it has been removed from the set of constraints. With that in mind,
let us consider a relaxed version of P , where Ax = b is replaced with a violation penalty term
p⊤ (b − Ax). This leads to the following problem:
g(p) = min. c⊤ x + p⊤ (b − Ax) ,
x≥0
which has the benefit of not having equality constraints explicitly represented, but only implicitly, by means of a penalty term. This term penalises the infeasibility of the constraints in an attempt to steer the solution of the relaxed problem towards the solution of P . Recalling that our
main objective is to solve P , we are interested in the values (or prices, as they are often called) for
p ∈ Rm that make P and g(p) equivalent.
Let x be the optimal solution to P . Notice that, for any p ∈ Rm , we have that

g(p) = min_{x′ ≥ 0} { c⊤ x′ + p⊤ (b − Ax′) } ≤ c⊤ x + p⊤ (b − Ax) = c⊤ x,

i.e., g(p) is a lower bound on the optimal value c⊤ x. The inequality holds because, although x is optimal for P , it might not be optimal for the minimisation defining g(p) for an arbitrary vector p. The rightmost equality is a consequence of the feasibility of x, which implies that Ax = b and thus p⊤ (b − Ax) = 0.
We can use an optimisation-based approach to try to find an optimal lower bound, i.e., the tightest
possible lower bound for P . This can be achieved by solving the dual problem D formulated as
(D) : max. g(p).
p
Notice that D is an unconstrained problem whose solution provides the tightest lower bound on P (say, at p). Also, notice how evaluating the function g : Rm → R embeds the solution of a linear programming problem with x ∈ Rn as decision variables for a fixed p, which is the argument given to the function g.
We will proceed in this chapter developing the analytical framework that allows us to pose the key
result in duality theory, which states that
g(p) = c⊤ x.
That is, we will next develop the results that guarantee the equivalence between primal and dual
representations. This will be useful for interpreting properties associated with the optimal primal
solution x from the associated optimal prices p. Furthermore, we will see in later chapters that
linear programming duality can be used as a framework for replacing constraints with equivalent
representations, which is a useful procedure in many settings, including developing alternative
solution strategies also based on linear programming.
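To make the definition of g concrete, here is a small sketch (with made-up data, not from the notes) that evaluates g(p) for a standard-form problem and checks the lower-bounding property:

```python
import numpy as np
from scipy.optimize import linprog

# Toy standard-form data (my own): min c^T x  s.t.  Ax = b, x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Primal optimum: x = (1, 0) with value 1.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2)
z_star = primal.fun

def g(p):
    # For fixed p, g(p) = p^T b + min_{x>=0} (c - A^T p)^T x, which equals
    # p^T b when c - A^T p >= 0 and is unbounded below (-inf) otherwise.
    reduced = c - A.T @ p
    return float(p @ b) if np.all(reduced >= -1e-12) else -np.inf

assert g(np.array([0.5])) <= z_star  # any p yields a lower bound
assert g(np.array([1.0])) == z_star  # the dual optimum attains the bound
```

The closed form used inside `g` is exactly the observation made next in the text: the inner problem is only bounded when the coefficients of x are nonnegative.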
g(p) = min_{x≥0} { c⊤ x + p⊤ (b − Ax) }
= p⊤ b + min_{x≥0} { (c⊤ − p⊤ A) x }.
As x ≥ 0, the rightmost problem can only be bounded if (c⊤ − p⊤ A) ≥ 0. This gives us a linear
constraint that can be used to enforce the existence of a solution for
(D) : max. p⊤ b
s.t.: p⊤ A ≤ c⊤ .
Notice that D is a linear programming problem with m variables (one per constraint of the primal
problem P ) and n constraints (one per variable of P ). As you might suspect, if you were to repeat
the analysis, looking at D as the “primal” problem, you would end up with a dual that is exactly P .
For this to become more apparent, let us first define more generally the rules that dictate what
kind of dual formulations are obtained for different types of primal problems in terms of their
original (i.e., not in standard) form.
In the more general case, let P be defined as
(P ) : min. c⊤ x
s.t.: Ax ≥ b.

Introducing a vector s of nonnegative slack variables, P can be equivalently rewritten as

(P ) : min. c⊤ x
s.t.: Ax − s = b
s ≥ 0.
Let us focus on the constraints in the reformulated version of P , which can be written as

[A | −I] [x⊤ | s⊤]⊤ = b.
We will apply the same procedure as before, with [A | −I] as our constraint matrix in place of A and [x⊤ | s⊤]⊤ as our vector of variables in place of x. Using analogous arguments, we now require that
c⊤ − p⊤ A = 0, so g(p) is finite. Notice that this is a slight deviation from before, but in this case,
we have that x ∈ Rn , so c⊤ − p⊤ A = 0 is the only condition that allows the inner problem in g(p)
to have a finite solution. Then, we obtain the following conditions to be imposed to our dual linear
programming formulation
p⊤ [A | −I] ≤ [c⊤ | 0⊤] and c⊤ − p⊤ A = 0.
Combining them all and redoing the previous steps for obtaining a dual formulation, we arrive at
(D) : max. p⊤ b
s.t.: p⊤ A = c⊤
p ≥ 0.
Notice how the change in the type of constraints in the primal problem P leads to additional nonnegativity constraints on the dual variables p. Similarly, the absence of explicit nonnegativity constraints on the primal variables x leads to equality constraints in the dual problem D, as opposed to inequalities.
Table 5.1 provides a summary which allows one to identify the resulting formulation of the dual problem based on the primal formulation, in particular regarding its type (minimisation or maximisation), constraint types and variable domains.

Primal (minimise)       Dual (maximise)
a_i⊤ x ≥ b_i            p_i ≥ 0
a_i⊤ x ≤ b_i            p_i ≤ 0
a_i⊤ x = b_i            p_i free
x_j ≥ 0                 p⊤ A_j ≤ c_j
x_j ≤ 0                 p⊤ A_j ≥ c_j
x_j free                p⊤ A_j = c_j

Table 5.1: Primal-dual conversion rules (the right-hand sides b become the objective coefficients of the dual, and the objective coefficients c become the right-hand sides of the dual)
For converting a minimisation primal problem into a (maximisation) dual, one must read the table
from left to right. That is, the independent terms (b) become the objective function coefficients,
greater or equal constraints become nonnegative variables, and so forth. However, if the primal
problem is a maximisation problem, the table must be read from right to left. For example, in
this case, less-or-equal-than constraints would become nonnegative variables instead, and so forth.
It takes a little practice to familiarise yourself with this table, but it is a really useful resource to
obtain dual formulations from primal problems.
One remark to be made at this point is that, as is hopefully clearer now, the conversion of primal
problems into duals is symmetric, meaning that reapplying the rules in Table 5.1 would take you
from the obtained dual back to the original primal. This is a property of linear programming
problems called being self dual. Another remark is that equivalent reformulations made in the
primal lead to equivalent duals. Specifically, transformations that replace variables x ∈ R with
x+ −x− , where x+ , x− ≥ 0, introduce nonnegative slack variables, or remove redundant constraints
all lead to equivalent duals.
For example, recall that the dual formulation for the primal problem
(P ) : min. c⊤ x
s.t.: Ax ≥ b
x ∈ Rn
is given by
(D) : max. p⊤ b
s.t.: p ≥ 0
p⊤ A = c⊤ .
5.2 Duality theory

Theorem 5.1 (Weak duality). Let x be a feasible solution to P and p a feasible solution to D. Then p⊤ b ≤ c⊤ x.

Proof. Let I = {1, . . . , m} and J = {1, . . . , n}, and define

u_i = p_i (a_i⊤ x − b_i) and v_j = (c_j − p⊤ A_j) x_j .
Notice that ui ≥ 0 for i ∈ I and vj ≥ 0 for j ∈ J, since each pair of terms will have the same
sign (you can see that from Table 5.1 and assuming xj to be the dual variable associated with
p⊤ A ≤ c⊤ ). Thus, we have that
0 ≤ ∑_{i∈I} u_i + ∑_{j∈J} v_j = [p⊤ Ax − p⊤ b] + [c⊤ x − p⊤ Ax] = c⊤ x − p⊤ b.
Let us also pose some results that are direct consequences of Theorem 5.1, which are summarised
in Corollary 5.2.
Corollary 5.2 (Consequences of weak duality). The following are immediate consequences of Theorem 5.1:

(1) if the optimal value of P is −∞ (i.e., P is unbounded), then D must be infeasible;

(2) if the optimal value of D is +∞ (i.e., D is unbounded), then P must be infeasible;

(3) let x and p be feasible to P and D, respectively. Suppose that p⊤ b = c⊤ x. Then x is optimal to P and p is optimal to D.
Proof. By contradiction, suppose that P has optimal value −∞ and that D has a feasible solution
p. By weak duality, p⊤ b ≤ c⊤ x = −∞, i.e., a contradiction. Part (2) follows a symmetric argument.
Part (3): let x′ be any other feasible solution to P . From weak duality, we have c⊤ x′ ≥ p⊤ b = c⊤ x, which proves the optimality of x. The optimality of p follows a symmetric argument.
Notice that Theorem 5.1 provides us with a bounding technique for any linear programming prob-
lem. That is, for a given pair of primal and dual feasible solutions, x and p, respectively, we have
that
p⊤ b ≤ c⊤ x∗ ≤ c⊤ x,
where c⊤ x∗ is the optimal objective function value.
Corollary 5.2 also provides an alternative way of identifying infeasibility by means of linear pro-
gramming duals. One can always use the unboundedness of a given element of a primal-dual pair to
state the infeasibility of the other element in the pair. That is, an unbounded dual (primal) implies
an infeasible primal (dual). However, the reverse statement is not as conclusive. Specifically, an
infeasible primal (dual) does not necessarily imply that the dual (primal) is unbounded, but does
imply it to be either infeasible or unbounded.
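These relationships can be verified with a solver. The sketch below (my own construction, with made-up data) solves a primal-dual pair with scipy and checks both the weak-duality bound and the equality of the optimal values:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data for (P): min c^T x  s.t.  Ax >= b, x >= 0.
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([2.0, 3.0])

# linprog uses A_ub x <= b_ub, so multiply Ax >= b through by -1.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)

# Dual: max p^T b  s.t.  A^T p <= c, p >= 0 (minimise -b^T p).
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

assert -dual.fun <= primal.fun + 1e-9        # weak duality: p^T b <= c^T x
assert abs(-dual.fun - primal.fun) < 1e-6    # equal optimal values
```

For this data both problems attain the value 5, at x = (1, 1) and p = (1, 1).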
Theorem 5.3 (Strong duality). If a linear programming problem P has an optimal solution, then so does its dual D, and their respective optimal objective values are equal.

Proof. Assume P is solved to optimality, with optimal solution x and basis B. Let xB = B−1 b. At the optimum, the reduced costs are nonnegative, that is, c⊤ − c_B⊤ B−1 A ≥ 0. Let p⊤ = c_B⊤ B−1. We then have p⊤ A ≤ c⊤, which shows that p is feasible to D. Moreover,

p⊤ b = c_B⊤ B−1 b = c_B⊤ xB = c⊤ x, (5.1)

which, by Corollary 5.2 (3), implies that x and p are optimal to P and D, respectively.
The proof of Theorem 5.3 reveals something remarkable relating the simplex method and the dual
variables p. As can be seen in (5.1), for any primal feasible solution x, an associated dual (not
necessarily) feasible solution p can be immediately recovered. If the associated dual solution is also
feasible, then Theorem 5.3 guarantees that optimality ensues.
This means that we can interchangeably solve either a primal or a dual form of a given problem,
considering aspects related to convenience and computational ease. This is particularly useful
in the context of the simplex method since the prices p are readily available as the algorithm
progresses. In the next section, we will discuss several practical uses of this relationship in more
detail. For now, let us halt this discussion for a moment and consider a geometrical interpretation
of duality in the context of linear programming.
5.3 Geometric interpretation of duality

Consider the primal problem

(P ) : min. c⊤ x
s.t.: a_i⊤ x ≥ b_i , ∀i ∈ I.
Imagine that there is a particle within the polyhedral set representing the feasible region of P and
that this particle is subjected to a force represented by the vector −c.
Notice that this is equivalent to minimising the function z = c⊤ x within the polyhedral set {x : a_i⊤ x ≥ b_i , i ∈ I} representing the feasible set of P . Assuming that the feasible set of P is bounded in the direction −c, this particle
will eventually come to a halt after hitting the “walls” of the feasible set, at a point where the
pulling force −c and the reaction of these walls reach an equilibrium. We can think of x as this
stopping point. This is illustrated in Figure 5.1.
[Figure 5.1: a particle inside the feasible region is pulled in the direction −c and comes to rest against the walls of the set, where −c is balanced by multiples p_i of the normal vectors a_i of the active constraints.]
We can then think of the dual variables p as the multipliers applied to the normal vectors asso-
ciated with the hyperplanes (i.e., the walls) that are in contact with the particle to achieve this
equilibrium. Hence, these multipliers p will be such that
c = ∑_{i∈I} p_i a_i , for some p_i ≥ 0, i ∈ I,
which is precisely the dual feasibility condition (i.e., constraint) associated with the dual of P ,
given by

(D) : max. { p⊤ b : p⊤ A = c⊤, p ≥ 0 }.
which again implies the optimality of p (cf. Corollary 5.2 (3)). This geometrical insight leads to
another key result for linear programming duality, which is the notion of complementary slackness.
Theorem 5.4 (Complementary slackness). The vectors x and p are optimal solutions to P and D, respectively, if and only if p_i (a_i⊤ x − b_i) = 0, ∀i ∈ I, and (c_j − p⊤ A_j) x_j = 0, ∀j ∈ J.
Proof. From the proof of Theorem 5.1 and with Theorem 5.3 holding, we have that

p_i (a_i⊤ x − b_i) = 0, ∀i ∈ I, and (c_j − p⊤ A_j) x_j = 0, ∀j ∈ J.

In turn, if these hold, then x and p are optimal (cf. Corollary 5.2 (3)).
Theorem 5.4 exposes an important feature related to optimal solutions. Notice that a_i⊤ x − b_i represents the slack value of constraint i ∈ I. Thus, under the assumption of nondegeneracy, p_i (a_i⊤ x − b_i) = 0, ∀i ∈ I, implies that, for each constraint i ∈ I, either the dual variable p_i associated with the constraint is zero, or the associated slack value a_i⊤ x − b_i is zero.
For nondegenerate basic feasible solutions (BFS) (i.e., xj > 0, ∀j ∈ IB , where IB is the set of basic
variable indices), complementary slackness determines a unique dual solution. That is
Figure 5.2: A is both primal and dual infeasible; B is primal feasible and dual infeasible; C is
primal and dual feasible; D is degenerate.
a_i⊤ x ≥ b_i , ∀i ∈ I (primal feasibility) (5.2)
p_i = 0, ∀i ∉ I⁰ (complementary conditions) (5.3)
∑_{i∈I} p_i a_i = c (dual feasibility I) (5.4)
p_i ≥ 0, ∀i ∈ I (dual feasibility II) (5.5)

where I⁰ = {i ∈ I : a_i⊤ x = b_i} is the index set of active constraints. From (5.2)–(5.5), we see that the optimality of the primal-dual pair has two main requirements. The first is that x must be (primal) feasible. The second, expressed as

∑_{i∈I⁰} p_i a_i = c, p_i ≥ 0,
5.4 The dual simplex method

Recall that the primal simplex method maintains primal feasibility (B−1 b ≥ 0) throughout and terminates once the condition

c̄⊤ = c⊤ − c_B⊤ B−1 A ≥ 0

is observed. Notice that this is precisely the dual feasibility condition p⊤ A ≤ c⊤, with p⊤ = c_B⊤ B−1.
Being a dual method, the dual version of the simplex method, or dual simplex method, considers these conditions in reverse order. That is, it starts from an initial dual feasible solution and iterates towards satisfying the primal feasibility condition B−1 b ≥ 0, while c̄ ≥ 0, or equivalently p⊤ A ≤ c⊤, is maintained.
To achieve that, one must revise the pivoting of the primal simplex method such that the variable to leave the basis is some i ∈ I_B with x_B(i) < 0, while the variable chosen to enter the basis is some j ∈ I_N such that c̄ ≥ 0 is maintained.
Consider the lth simplex tableau row for which x_B(i) < 0, of the form [v_1 , . . . , v_n , x_B(i)]; i.e., v_j is the lth component of B−1 A_j . Among the j ∈ I_N for which v_j < 0, we pick

j′ = arg min_{j∈I_N : v_j<0} c̄_j / |v_j| .
Pivoting is performed by employing elementary row operations to replace A_B(i) with A_j′ in the basis. This implies that c̄_j ≥ 0 is maintained, since

c̄_j / |v_j| ≥ c̄_j′ / |v_j′| ⇒ c̄_j − (c̄_j′ / |v_j′|) |v_j| ≥ 0 ⇒ c̄_j + (c̄_j′ / |v_j′|) v_j ≥ 0, ∀j ∈ J.
Notice that it also justifies why we must only consider for entering the basis those variables for
which v_j < 0. Analogously to the case in the primal simplex method, if we observe that v_j ≥ 0 for all j ∈ J, then no limiting condition is imposed in terms of the increase in the nonbasic variable (i.e., an unbounded dual, which, according to Corollary 5.2 (2), implies the original problem is infeasible).
Assuming that the dual is not unbounded, the termination of the dual simplex method is observed when B−1 b ≥ 0 is achieved, at which point primal and dual optimal solutions have been found, with x = (x_B , x_N) = (B−1 b, 0) (the primal solution) and p⊤ = c_B⊤ B−1 (the dual solution).
Algorithm 3 presents a pseudocode for the dual simplex method.
To clarify some of the previous points, let us consider a numerical example. Consider the problem
min. x1 + x2
s.t.: x1 + 2x2 ≥ 2
x1 ≥ 1
x1 , x2 ≥ 0.
The first thing we must do is convert the greater-or-equal-than inequalities into less-or-equal-than
inequalities and add the respective slack variables. This allows us to avoid the inclusion of artificial
variables, which are not required anymore since we can allow for primal infeasibility. This leads to
the equivalent standard form problem
min. x1 + x2
s.t.: − x1 − 2x2 + x3 = −2
− x1 + x4 = −1
x1 , x2 , x3 , x4 ≥ 0.
Below is the sequence of tableaus after applying the dual simplex method to solve the problem.
The terms in bold font represent the pivot element (i.e., the intersection between the pivot row
and pivot column).
x1 x2 x3 x4 RHS
z 1 1 0 0 0
x3 -1 -2 1 0 -2
x4 -1 0 0 1 -1
x1 x2 x3 x4 RHS
z 1/2 0 1/2 0 -1
x2 1/2 1 -1/2 0 1
x4 -1 0 0 1 -1
x1 x2 x3 x4 RHS
z 0 0 1/2 1/2 -3/2
x2 0 1 -1/2 1/2 1/2
x1 1 0 0 -1 1
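Since Algorithm 3 is not reproduced here, below is a compact sketch of the pivoting rule just described (my own implementation, not the notes' pseudocode). Applied to the example above, it reproduces the sequence of tableaus:

```python
import numpy as np

def dual_simplex(T, basis, max_iter=100):
    """Dual simplex on tableau T: row 0 = [reduced costs | -objective],
    rows 1..m = [B^{-1}A | B^{-1}b]; `basis` maps rows to variable indices."""
    for _ in range(max_iter):
        rhs = T[1:, -1]
        if np.all(rhs >= -1e-9):
            return T, basis                  # primal feasible: optimal
        l = 1 + int(np.argmin(rhs))          # leaving row: most negative value
        row = T[l, :-1]
        if np.all(row >= -1e-9):             # dual unbounded
            raise ValueError("primal problem is infeasible")
        # Ratio test c_j / |v_j| over v_j < 0 keeps reduced costs >= 0.
        cand = np.where(row < -1e-9)[0]
        j = cand[np.argmin(T[0, cand] / np.abs(row[cand]))]
        T[l] /= T[l, j]                      # pivot on element (l, j)
        for r in range(T.shape[0]):
            if r != l:
                T[r] -= T[r, j] * T[l]
        basis[l - 1] = j
    raise RuntimeError("iteration limit reached")

# The example above: min x1 + x2 s.t. -x1 - 2x2 + x3 = -2, -x1 + x4 = -1.
T0 = np.array([[1., 1., 0., 0., 0.],
               [-1., -2., 1., 0., -2.],
               [-1., 0., 0., 1., -1.]])
T, basis = dual_simplex(T0, [2, 3])
# Final z-row RHS is -3/2 (objective 3/2), with x1 = 1 and x2 = 1/2.
```

The two pivots performed by this routine, on the entries shown in bold in the tableaus above, match the intermediate and final tableaus exactly.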
Figure 5.3 illustrates the progress of the algorithm both in the primal (Figure 5.3a) and in the dual
(Figure 5.3b) variable space. Notice how in the primal space the solution remains primal infeasible
until a primal feasible solution is reached, that solution being the optimal one for the problem. Also, notice
that the coordinates of the dual variables can be extracted from the zeroth row of the simplex
tableau.
(a) The primal-variable space (b) The dual-variable space
Figure 5.3: The progress of the dual simplex method in the primal and dual space.
Some interesting features related to the progress of the dual simplex algorithm are worth highlighting. First, notice that the objective function is monotonically increasing in this case, since x_B(l) (c̄_j′ / |v_j′|) is added to −c_B⊤ B−1 b and x_B(l) < 0, meaning that the dual cost increases (recall the convention of having a minus sign so that the zeroth row correctly represents the objective function value, given by the negative of the value displayed in the rightmost column).
becomes gradually worse as it compromises optimality in the search for (primal) feasibility. For a
nondegenerate problem, this can also be used as an argument for eventual convergence since the
dual objective value can only increase and is bounded by the primal optimal value. However, in the
presence of dual degeneracy, that is, cj = 0 for some j ∈ IN in the optimal solution, the algorithm
can suffer from cycling. As we have seen before, that is an indication that the primal problem has
multiple optima.
The dual simplex method is often the best choice of algorithm, because it typically precludes
the need for a Phase I type of method as it is often trivial to find initial dual feasible solutions
(the origin, for example, is typically dual feasible in minimisation problems with nonnegative
coefficients; similar trivial cases are also well known).
Moreover, the dual simplex method is the algorithm of choice for re-solving a linear programming problem whose feasible region is modified after an optimal solution has been found. It turns out that this procedure is at the core of the methods used to solve integer programming problems, as well as of Benders decomposition, both topics we will explore later on. The dual simplex method is also more successful than its primal counterpart in combinatorial optimisation problems, which are typically plagued by degeneracy. As we have seen, primal degeneracy simply means multiple dual optima, which are far less problematic from an algorithmic standpoint.
Most professional implementations of the simplex method use by default the dual simplex version.
This has several computational reasons, in particular related to more effective Phase I and pricing
methods for the dual counterpart.
5.5 Exercises
Exercise 5.1: Duality in the transportation problem
Recall the transportation problem from Chapter 1. Answer the following questions based on the
interpretation of the dual price.
We would like to plan the production and distribution of a certain product, taking into account
that the transportation cost is known (e.g., proportional to the distance travelled), the factories
(or source nodes) have a supply capacity limit, and the clients (or demand nodes) have known
demands. Table 5.2 presents the data related to the problem.
Clients
Factory NY Chicago Miami Capacity
Seattle 2.5 1.7 1.8 350
San Diego 3.5 1.8 1.4 600
Demands 325 300 275 -
Table 5.2: Problem data: unit transportation costs, demands and capacities
Additionally, we consider that the arcs (routes) from factories to clients have a maximum trans-
portation capacity, assumed to be 250 units for each arc. The problem formulation is then
min. z = ∑_{i∈I} ∑_{j∈J} c_ij x_ij
s.t.: ∑_{j∈J} x_ij ≤ C_i , ∀i ∈ I
∑_{i∈I} x_ij ≥ D_j , ∀j ∈ J
x_ij ≤ A_ij , ∀i ∈ I, j ∈ J
x_ij ≥ 0, ∀i ∈ I, j ∈ J,
where Ci is the supply capacity of factory i, Dj is the demand of client j and Aij is the trans-
portation capacity of the arc between i and j.
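For intuition, the dual prices asked about below can be computed numerically. The sketch that follows is my own (not part of the exercise); it solves the problem with scipy's HiGHS interface and reads the marginals, flipping their sign (HiGHS marginals are nonpositive for a minimisation) so that they can be interpreted as willingness to pay:

```python
import numpy as np
from scipy.optimize import linprog

# Data from Table 5.2; every arc has capacity 250.
cost = np.array([[2.5, 1.7, 1.8],    # Seattle   -> NY, Chicago, Miami
                 [3.5, 1.8, 1.4]])   # San Diego -> NY, Chicago, Miami
cap, dem = [350, 600], [325, 300, 275]

A_ub, b_ub = [], []
for i in range(2):                   # supply: sum_j x_ij <= C_i
    row = np.zeros(6); row[3 * i:3 * i + 3] = 1
    A_ub.append(row); b_ub.append(cap[i])
for j in range(3):                   # demand: -sum_i x_ij <= -D_j
    row = np.zeros(6); row[[j, 3 + j]] = -1
    A_ub.append(row); b_ub.append(-dem[j])

res = linprog(cost.ravel(), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(0, 250)] * 6, method="highs")
supply_prices = -np.array(res.ineqlin.marginals[:2])  # question (a)
arc_prices = -np.array(res.upper.marginals)           # question (b)
```

For this data the optimal cost is 1815, only Seattle's capacity is binding (price 0.1 per unit), and only the saturated arcs Seattle–NY and San Diego–Miami carry a positive capacity price (0.9 and 0.5, respectively), which can be confirmed by complementary slackness.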
(a) What price would the company be willing to pay for increasing the supply capacity of a given
factory?
(b) What price would the company be willing to pay for increasing the transportation capacity
of a given arc?
(c) Write the dual formulation of the problem and use strong duality to verify that x and p are optimal.
min. 2x1 + x3
s.t.: − 1/4x1 − 1/2x2 ≤ −3/4
8x1 + 12x2 ≤ 20
x1 + 1/2x2 − x3 ≤ −1/2
9x1 + 3x2 ≥ −6
x1 , x2 , x3 ≥ 0.
(P ) : min. c⊤ x
s.t.: Ax = b
x ≥ 0,
where A ∈ Rm×n and b ∈ Rm . Show that if P has a finite optimal solution, then the new problem P ′ obtained from P by replacing the right-hand side vector b with another vector b̄ ∈ Rm cannot be unbounded, no matter what values the components of b̄ take.
where A1,...,9 are matrices, b1,...,3 , c1,...,3 are column vectors, and y 1,...,3 are the dual variables
associated to each constraint.
(a) Construct the dual of the problem and solve both the original problem and its dual.
(b) Use complementary slackness to verify that the primal and dual solutions are optimal.
Chapter 6

Linear Programming Duality - Part II
The most direct application of duality in linear programming problems is the interpretation of dual
variable values as marginal values associated with constraints, with important economic implica-
tions.
We will also consider the notion of solution stability and restarting the simplex method, once new
variables or constraints are added to the problem post optimality. This will have an important
consequence for the development of efficient solution methods for integer programming problems,
which we will discuss in detail in later chapters.
6.1 Sensitivity analysis

We are interested in analysing aspects associated with the stability of the optimal solution x in
terms of how it changes with the inclusion of new decision variables and constraints or with changes
in the input data. Both cases are somewhat motivated by the realisation that problems typically
emerge from dynamic settings. Thus, one must assess how stable a given plan (represented by x)
is or how it can be adapted in the face of changes in the original problem setting. This kind of
analysis is generally referred to as sensitivity analysis in the context of linear programming.
First, we will consider the inclusion of new variables or new constraints after the optimal solution
x is obtained. This setting represents, for example, the inclusion of a new product or a new
production plant (referring to the context of resource allocation and transportation problems, as
discussed in Chapter 1) or the consideration of additional constraints imposing new (or previously
disregarded) requirements or conditions. The techniques we consider here will also be relevant
in the following chapters. We will then discuss specialised methods for large-scale problems and
solution techniques for integer programming problems, both topics that heavily rely on the idea of
iteratively incrementing linear programming problems with additional constraints (or variables).
The second group of cases relates to changes in the input data. When utilising linear programming
models to optimise systems performance, one must bear in mind that there is inherent uncertainty
associated with the input data. Be it due to measurement errors or a lack of complete knowledge
about the future, one must accept that the input data of these models will, by definition, embed
some measure of error. One way of taking this into account is to try to understand the consequences
to the optimality of x in case of eventual changes in the input data, represented by the matrix A,
and the vectors c and b. We will achieve this by studying the ranges within which variations in
these terms do not compromise the optimality of x.
6.1.1 Adding a new variable

Consider the primal problem

P : min. c⊤ x
s.t.: Ax = b
x ≥ 0.
Let us consider that a new variable xn+1 with associated column (that is, respective objective
function and constraint coefficients) (cn+1 , An+1 ) is added to P . This leads to a new augmented
problem P ′ of the form

(P ′) : min. c⊤ x + c_{n+1} x_{n+1}
s.t.: Ax + A_{n+1} x_{n+1} = b
x ≥ 0, x_{n+1} ≥ 0.
We need to determine if, after the inclusion of this new variable, the current basis B is still optimal.
Making the newly added variable nonbasic yields the basic feasible solution (BFS) x = (x, 0).
Moreover, we know that the optimality condition c⊤ − c_B⊤ B−1 A ≥ 0 held before the inclusion of the variable, so we know that all the other reduced costs associated with the nonbasic variables j ∈ I_N were nonnegative.
Therefore, the only check that needs to be done is whether the reduced cost associated with xn+1
also satisfies the optimality condition, i.e., if
c̄_{n+1} = c_{n+1} − c_B⊤ B−1 A_{n+1} ≥ 0.
If the optimality condition is satisfied, the new variable does not change the optimal basis, and the
solution x = (x, 0) is optimal. Otherwise, one must perform a new simplex iteration, using B as a
starting BFS. Notice that in this case, primal feasibility is trivially satisfied, while dual feasibility
is not observed (that is, cn+1 < 0). Therefore, primal simplex can be employed, warm started by
B. This is often a far more efficient strategy than resolving P ′ from scratch.
Let us consider a numerical example. Consider the problem

min. − 5x1 − x2 + 12x3
s.t.: 3x1 + 2x2 + x3 = 10
5x1 + 3x2 + x4 = 16
x1 , . . . , x4 ≥ 0,

whose optimal tableau is
x1 x2 x3 x4 RHS
z 0 0 2 7 12
x1 1 0 -3 2 2
x2 0 1 5 -3 2
Suppose we include a variable x5 , for which c5 = −1 and A5 = (1, 1). The modified problem then
becomes
x1 x2 x3 x4 x5 RHS
z 0 0 2 7 -4 12
x1 1 0 -3 2 -1 2
x2 0 1 5 -3 2 2
Notice that this tableau now shows a primal feasible solution that is not optimal and can be further
iterated using primal simplex.
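The tableau entries for x5 can be checked directly. In the sketch below (my own), B−1 is read off the x3, x4 columns of the optimal tableau above, and c_B collects the costs of the basic variables x1 and x2:

```python
import numpy as np

# B^{-1} from the x3, x4 columns of the optimal tableau; c_B = (c1, c2).
B_inv = np.array([[-3.0, 2.0], [5.0, -3.0]])
c_B = np.array([-5.0, -1.0])
c5, A5 = -1.0, np.array([1.0, 1.0])

reduced_cost = c5 - c_B @ B_inv @ A5  # -> -4: optimality is violated
new_column = B_inv @ A5               # -> (-1, 2): the x5 tableau column
```

Both values match the modified tableau, confirming that a primal simplex iteration is needed.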
Suppose now that we include a new constraint a_{m+1}⊤ x ≥ b_{m+1} which, with the addition of a nonnegative slack variable x_{n+1}, becomes a_{m+1}⊤ x − x_{n+1} = b_{m+1}. We can reuse the optimal basis B to form a new basis B̄ for the augmented problem. This new basis has the form

B̄ = [ B    0 ]
    [ a⊤  −1 ],

where a collects the components of a_{m+1} associated with the columns from A that formed B. Now, since we must have B̄−1 B̄ = I, it follows that

B̄−1 = [ B−1      0 ]
      [ a⊤ B−1  −1 ].
Notice, however, that the basic solution (x, a_{m+1}⊤ x − b_{m+1}) associated with B̄ is not feasible, since we assumed that x does not satisfy the newly added constraint, i.e., a_{m+1}⊤ x < b_{m+1}.
Notice that the new slack variable has a null component as a reduced cost, meaning that it does
not violate dual feasibility conditions. Thus, after adding a constraint that makes x infeasible, we
still have a dual feasible solution that can be immediately used by the dual simplex method, again
allowing for warm starting the solution of the new problem.
To build an initial solution in terms of the tableau representation of the simplex method, we must simply add an extra row, which leads to a new tableau with the structure

[ B−1 A                  0 ]
[ a⊤ B−1 A − a_{m+1}⊤    1 ].
Let us consider a numerical example again. Consider the same problem as in the previous example, but suppose that we instead include the additional constraint x1 + x2 ≥ 5, which is violated by the optimal solution (2, 2, 0, 0). In this case, we have that a_{m+1} = (1, 1, 0, 0) and a⊤ B−1 A − a_{m+1}⊤ = [0, 0, 2, −1].
This modified problem then looks like
min. − 5x1 − x2 + 12x3
s.t.: 3x1 + 2x2 + x3 = 10
5x1 + 3x2 + x4 = 16
− x1 − x2 + x5 = −5
x1 , . . . , x 5 ≥ 0
with associated tableau
x1 x2 x3 x4 x5 RHS
z 0 0 2 7 0 12
x1 1 0 -3 2 0 2
x2 0 1 5 -3 0 2
x5 0 0 2 -1 1 -1
Notice that this tableau indicates that we have a dual feasible solution that is not primal feasible
and thus suitable to be solved using dual simplex.
A final point to note is that these operations are related to each other in terms of equivalent
primal-dual formulations. That is, consider dual of P , which is given by
D : max. p⊤ b
s.t.: p⊤ A ≤ c.
Then, adding a constraint of the form p⊤ An+1 ≤ cn+1 is equivalent to adding a variable to P ,
exactly as discussed in Section 6.1.1.
As before, assume that we have solved P to optimality. As we have seen in Chapter 4, the optimal solution x with associated basis B satisfies the following optimality conditions: it is a BFS and, therefore, (i) B−1 b ≥ 0; and (ii) all reduced costs are nonnegative, that is, c⊤ − c_B⊤ B−1 A ≥ 0.
Now, assume that we cause a marginal perturbation on the vector b, represented by a vector d.
That is, assume that we have B −1 (b + d) > 0, assuming that nondegeneracy is retained.
Recall that the optimality condition c̄⊤ = c⊤ − c_B⊤ B−1 A ≥ 0 is not influenced by such a marginal perturbation. That is, for a small change d, the optimal basis (i.e., the selection of basic variables) is not disturbed. On the other hand, the values of the basic variables become B−1 (b + d) and, consequently, the optimal value becomes

c_B⊤ B−1 (b + d) = p⊤ (b + d).
As an example, recall the optimal tableau of the paint factory problem:

x1 x2 x3 x4 x5 x6 RHS
z 0 0 3/4 1/2 0 0 21
x1 1 0 1/4 -1/2 0 0 3
x2 0 1 -1/8 3/4 0 0 3/2
x5 0 0 3/8 -5/4 1 0 5/2
x6 0 0 1/8 -3/4 0 1 1/2
where x3 and x4 were the slack variables associated with raw material M1 and M2, respectively.
In this case, we have that

B−1 = [ 1/4   −1/2   0  0 ]
      [ −1/8   3/4   0  0 ]
      [ 3/8   −5/4   1  0 ]
      [ 1/8   −3/4   0  1 ]

and

p⊤ = c_B⊤ B−1 = [−5  −4  0  0] B−1 = [−3/4  −1/2  0  0].
Notice that these are the values of the entries in the z-row under the slack variables x3 , . . . , x6 , except for the minus sign. This is because the z-row contains the reduced costs c_j − p⊤ A_j and, for the slack variables, we have that c_j = 0 and A_j = e_j , for j = 3, . . . , 6. Also, recall that the paint factory problem is a maximisation problem, so p represents the decrease in the objective function value. In this case, we see that removing one unit of M1 would decrease the objective function value by 3/4, and removing one unit of M2 would similarly decrease it by 1/2. Analogously, increasing the availability of M1 or M2 by one unit would increase the objective function value by 3/4 and 1/2, respectively.
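These marginal values can be cross-checked with a solver. The sketch below is my own; the constraint data is recovered from the tableau above (the four resource constraints whose slacks are x3, ..., x6), and the sign flips account for scipy minimising while we read the prices as willingness to pay:

```python
import numpy as np
from scipy.optimize import linprog

# Paint factory in minimisation form: min -5x1 - 4x2 subject to the four
# resource constraints whose slacks appear as x3, ..., x6 in the tableau.
A_ub = [[6, 4], [1, 2], [-1, 1], [0, 1]]
b_ub = [24, 6, 1, 2]
res = linprog([-5, -4], A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * 2, method="highs")

prices = -np.array(res.ineqlin.marginals)  # -> [3/4, 1/2, 0, 0]
```

The recovered prices 3/4 for M1 and 1/2 for M2 (and zero for the two non-binding constraints) agree with the z-row entries discussed above.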
Suppose that some component bi changes and becomes bi + δ, with δ ∈ R. We are interested in
the range for δ within which the basis B remains optimal.
First, we must notice that the optimality conditions c̄⊤ = c⊤ − c_B⊤ B−1 A ≥ 0 are not directly affected by variations in the vector b. This means that the choice of variable indices j ∈ I_B to form the basis B will, in principle, be stable unless the change in b_i is such that B is rendered infeasible. Thus, we need to study the conditions under which feasibility is retained, or, more specifically, under which (recall that e_i is the vector of zeros except for the ith component, which is 1)

B−1 (b + δe_i) ≥ 0.
In other words, changing bi will incur changes in the value of the basic variables, and thus, we
must determine the range within which all basic variables remain nonnegative (i.e., feasible).
Let us consider a numerical example. Once again, consider the problem from Section 6.1.1. The
optimal tableau was given by
x1 x2 x3 x4 RHS
z 0 0 2 7 12
x1 1 0 -3 2 2
x2 0 1 5 -3 2
Suppose that b1 will change by δ in the constraint 3x1 + 2x2 + x3 = 10. Notice that the first
column of B −1 can be directly extracted from the optimal tableau and is given by (−3, 5). The
optimal basis will remain feasible if 2 − 3δ ≥ 0 and 2 + 5δ ≥ 0, and thus −2/5 ≤ δ ≤ 2/3.
Notice that this means that we can calculate the change in the objective function value as a function
of δ ∈ [−2/5, 2/3]. Within this range, the optimal cost changes as
c_B⊤ B−1 (b + δe_i) = p⊤ b + δ p_i ,

where p⊤ = c_B⊤ B−1 is the optimal dual solution. In case the variation falls outside that range, this
means that some of the basic variables will become negative. However, since the dual feasibility
conditions are not affected by changes in bi , one can still reutilise the basis B using dual simplex
to find a new optimal solution.
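This range computation is easy to verify numerically. Below is a minimal NumPy sketch for the worked example above (the vectors are read directly off the optimal tableau; the variable names are illustrative):

```python
import numpy as np

# From the example: optimal basic solution x_B = (2, 2), and the column of
# B^{-1} associated with b1 is (-3, 5), read from the optimal tableau.
x_B = np.array([2.0, 2.0])
Binv_col = np.array([-3.0, 5.0])

# Feasibility after b1 -> b1 + delta requires x_B + delta * Binv_col >= 0.
# Each component yields a one-sided bound on delta, depending on the sign
# of the corresponding entry of Binv_col.
lower = max((-x_B[i] / Binv_col[i] for i in range(len(x_B)) if Binv_col[i] > 0),
            default=-np.inf)
upper = min((-x_B[i] / Binv_col[i] for i in range(len(x_B)) if Binv_col[i] < 0),
            default=np.inf)

print(lower, upper)  # -2/5 and 2/3, as derived in the text
```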
We now consider the case where variations are expected in the objective function coefficients.
Suppose that some component cj becomes cj + δ. In this case, optimality conditions become a
concern. Two scenarios can occur. First, it might be that the changing coefficient is associated
6.1. Sensitivity analysis 97
with a variable j ∈ J that happens to be nonbasic (j ∈ IN ) in the optimal solution. In this case,
we have that optimality will be retained as long as the nonbasic variable remains “not attractive”,
i.e., the reduced cost associated with j remains nonnegative. More precisely put, the basis B will
remain optimal if
(cj + δ) − c⊤B B⁻¹Aj ≥ 0 ⇒ δ ≥ −c̄j .
The second scenario concerns changes in variables that are basic in the optimal solution, i.e.,
j ∈ IB . In that case, the optimality conditions are directly affected, meaning that we have to
analyse the range of variation for δ within which the optimality conditions are maintained, i.e., the
reduced costs remain nonnegative.
Let cj be the coefficient of the lth basic variable, that is, j = B(l). In this case, cB becomes cB + δel ,
meaning that all optimality conditions are simultaneously affected. Thus, we have to define a range
for δ in which the condition
(cB + δel )⊤ B −1 Ai ≤ ci , ∀i ̸= j
holds. Notice that we do not need to consider j since xj is a basic variable, and thus, its reduced
costs are assumed to remain zero.
Considering the tableau representation, we can use the lth row and examine the conditions for
which δqli ≤ c̄i , ∀i ̸= j, where qli is the lth entry of B −1 Ai .
Let us once again consider the previous example, with optimal tableau
x1 x2 x3 x4 RHS
z 0 0 2 7 12
x1 1 0 -3 2 2
x2 0 1 5 -3 2
First, let us consider variations in the objective function coefficients of variables x3 and x4 . Since
both variables are nonbasic in the optimal basis, the allowed variations are δ3 ≥ −2 and δ4 ≥ −7,
respectively (the negatives of their reduced costs in the z-row of the tableau).
Two points to notice. First, notice that both intervals are one-sided. This means that one should
only be concerned with variations that decrease the reduced cost value since increases in their
value can never cause any changes in the optimality conditions. Second, notice that the allowed
variation is trivially the negative value of the reduced cost. For variations that turn the reduced
costs negative, the current basis can be utilised as a starting point for the primal simplex.
Now, let us consider a variation in the basic variable x1 . Notice that in this case we have to
analyse the impact in all reduced costs, with exception of x1 itself. Using the tableau, we have
that ql = [1, 0, −3, 2] and thus
δ1 q12 ≤ c̄2 ⇒ 0 ≤ 0
δ1 q13 ≤ c̄3 ⇒ δ1 ≥ −2/3
δ1 q14 ≤ c̄4 ⇒ δ1 ≤ 7/2,
implying that −2/3 ≤ δ1 ≤ 7/2. Like before, for a change outside this range, primal simplex can
be readily employed.
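The same one-sided logic can be automated. A NumPy sketch of the cost-coefficient ranging for the basic variable x1 in the example above (values read off the tableau; names illustrative):

```python
import numpy as np

# Tableau row of the basic variable x1 and the reduced costs (z-row).
q = np.array([1.0, 0.0, -3.0, 2.0])      # l-th row of B^{-1}A, here l = 1
cbar = np.array([0.0, 0.0, 2.0, 7.0])    # reduced costs of x1, ..., x4
basic_index = 0                          # x1 itself is skipped

# Each condition delta * q_i <= cbar_i bounds delta from one side.
lower, upper = -np.inf, np.inf
for i in range(len(q)):
    if i == basic_index:
        continue
    if q[i] > 0:
        upper = min(upper, cbar[i] / q[i])   # delta <= cbar_i / q_i
    elif q[i] < 0:
        lower = max(lower, cbar[i] / q[i])   # delta >= cbar_i / q_i

print(lower, upper)  # -2/3 and 7/2, matching the text
```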
A cone C can be understood as a set formed by all nonnegative scalings of a collection of vectors
x ∈ C. Notice that this implies that 0 ∈ C. Often, it will be the case that 0 ∈ C is an extreme point
of C, and in that case, we say that C is pointed. As one might suspect, in the context of linear
programming, we will be mostly interested in a specific type of cone, those known as polyhedral
cones. Polyhedral cones are sets of the form
P = {x ∈ Rn : Ax ≥ 0} .
Figure 6.1 illustrates a polyhedral cone in R3 formed by the intersection of three half-spaces.
Some interesting properties can be immediately concluded regarding polyhedral cones. First, they
are convex sets, since they are polyhedral sets (cf. Theorem 2.8). Also, the origin is an extreme
point, and thus, polyhedral cones are always pointed. Furthermore, just like general polyhedral
sets, a cone C ⊂ Rn will always be associated with a collection of n linearly independent vectors.
Corollary 6.2 summarises these points. Notice we pose it as a corollary because these are immediate
consequences of Theorem 3.5.
Corollary 6.2. Let C ⊂ Rn be a polyhedral cone defined by the constraints a⊤i x ≥ 0, i = 1, . . . , m.
Then the following are equivalent:
1. 0 is an extreme point of C;
2. C does not contain a line;
3. there exist n linearly independent vectors among a1 , . . . , am .
Notice that 0 ∈ C is the unique extreme point of the polyhedral cone C. To see that, let 0 ̸= x ∈ C,
x1 = (1/2)x and x2 = (3/2)x. Note that x1 , x2 ∈ C, and x ̸= x1 ̸= x2 . Setting λ1 = λ2 = 1/2, we
have that λ1 x1 + λ2 x2 = x and thus, x is not an extreme point (cf. Definition 2.11).
Definition 6.3 (Recession cone). Consider the polyhedral set P = {x ∈ Rn : Ax ≥ b}. The recession
cone at x ∈ P , denoted recc(P ), is defined as
recc(P ) = {d ∈ Rn : A(x + λd) ≥ b, ∀λ ≥ 0} = {d ∈ Rn : Ad ≥ 0} .
Notice that the definition states that a recession cone comprises all directions d along which one
can move from x ∈ P without ever leaving P . However, notice that the definition does not
depend on x, meaning that the recession cone is unique for the polyhedral set P , regardless of its
“origin”. Furthermore, notice that Definition 6.3 implies that recession cones of polyhedral sets
are polyhedral cones.
We say that any direction d ∈ recc(P ) is a ray. Thus, bounded polyhedra can be alternatively
defined as polyhedral sets that do not contain rays.
Figure 6.2 illustrates the concept of recession cones. Notice that the cone is purposely placed in
several places to illustrate the independence of the point x ∈ P .
Finally, the recession cone for a standard form polyhedral set P = {x ∈ Rn : Ax = b, x ≥ 0} is
given by
recc(P ) = {d ∈ Rn : Ad = 0, d ≥ 0} .
Notice that we are interested in extreme rays of the recession cone recc(P ) of the polyhedral set
P . However, it is typical to say that they are extreme rays of P . Figure 6.3 illustrates the concept
of extreme rays in polyhedral cones.
Figure 6.3: A polyhedral cone formed by the intersection of three half-spaces (the normal vector
a3 is perpendicular to the plane of the picture and cannot be seen). Directions d1 , d2 , and d3
represent extreme rays.
Notice that, just like extreme points, the number of extreme rays is finite by definition. In fact, we
say that two extreme rays are equivalent if one is a positive multiple of the other, corresponding to
the same n − 1 linearly independent active constraints.
The existence of extreme rays can be used to verify unboundedness in linear programming problems.
The mere existence of extreme rays does not suffice since unboundedness is a consequence of the
extreme ray being a direction of improvement for the objective function. To demonstrate this, let us
first describe unboundedness in polyhedral cones, which we can then use to show the unboundedness
in polyhedral sets.
Theorem 6.5 (Unboundedness in polyhedral cones). Let P : min. {c⊤ x : x ∈ C}, with C =
{x ∈ Rn : a⊤i x ≥ 0, i = 1, . . . , m}. The optimal value is equal to −∞ if and only if some extreme
ray d ∈ C satisfies c⊤ d < 0.
Proof. If some extreme ray d ∈ C satisfies c⊤ d < 0, then P is unbounded, since c⊤ x → −∞ along d.
Conversely, if the optimal value is −∞, then there exists some x ∈ C for which c⊤ x < 0, which can
be scaled so that c⊤ x = −1.
Thus, let P = {x ∈ Rn : a⊤i x ≥ 0, i = 1, . . . , m, c⊤ x = −1}, which is nonempty. Since 0 ∈ C is an
extreme point of C, the vectors {ai }i=1,...,m span Rn (cf. Theorem 3.5), and thus P has at least one
extreme point; let d be one of them. As we have n linearly independent active constraints at d,
n − 1 of the constraints {a⊤i x ≥ 0}i=1,...,m must be active (plus c⊤ x = −1), and thus d is an extreme
ray with c⊤ d = −1 < 0.
Theorem 6.6 (Unboundedness in polyhedral sets). Let P : min. {c⊤ x : x ∈ X}, with X =
{x ∈ Rn : Ax ≥ b}, and assume that the feasible set has at least one extreme point. Then, the
optimal value is −∞ if and only if some extreme ray d ∈ recc(X) satisfies c⊤ d < 0.
Proof. The existence of at least one extreme point for X implies that the rows {ai }i=1,...,m of A span
Rn and that recc(X) = {x ∈ Rn : Ax ≥ 0} is pointed. Thus, by Theorem 6.5, the optimal value is
−∞ if and only if there exists an extreme ray d ∈ recc(X) such that c⊤ d < 0.
We now focus on how this can be utilised in the context of the simplex method. It turns out that
once unboundedness is identified in the simplex method, one can extract the extreme ray causing
the said unboundedness. In fact, most professional-grade solvers are capable of returning extreme
(or unbounded) rays, which is helpful in the process of understanding the causes for unboundedness
in the model. We will also see in the next chapter that these extreme rays are also used in the
context of specialised solution methods.
To see that this is possible, let P : min. {c⊤ x : x ∈ X}, with X = {x ∈ Rn : Ax = b, x ≥ 0}, and assume
that, for a given basis B, we conclude that the optimal value is −∞, that is, the problem is
unbounded. In the context of the simplex method, this implies that we found a nonbasic variable
xj for which the reduced cost c̄j < 0 and the tableau column B −1 Aj has no positive component.
Nevertheless, we can still form the feasible direction d = [dB dN ] as before, with
dB = −B −1 Aj and dN : dj = 1, di = 0, ∀i ∈ IN \ {j} .
This direction d is precisely an extreme ray for P . To see that, first, notice that Ad = 0 and
d ≥ 0, thus d ∈ recc(X). Moreover, there are n − 1 active constraints at d: m in Ad = 0 and
n − m − 1 in di = 0 for i ∈ IN \ {j}. The last thing to notice is that c̄j = c⊤ d < 0, which shows
the unboundedness in the direction d.
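The construction above can be reproduced numerically. The sketch below builds the extreme ray for a small, deliberately unbounded standard-form LP (the instance, the basis, and all names are made up for illustration):

```python
import numpy as np

# Unbounded standard-form LP: max x2 subject to x1 - x2 + x3 = 1, x >= 0
# (x3 is a slack variable), written as min c^T x with c = (0, -1, 0).
A = np.array([[1.0, -1.0, 1.0]])
b = np.array([1.0])
c = np.array([0.0, -1.0, 0.0])

basis = [2]                       # start from the basis {x3}
Binv = np.linalg.inv(A[:, basis])

j = 1                             # candidate entering variable x2
cbar_j = c[j] - c[basis] @ Binv @ A[:, j]
u = Binv @ A[:, j]                # tableau column B^{-1} A_j

assert cbar_j < 0 and np.all(u <= 0)  # simplex detects unboundedness here

# Assemble the extreme ray d = [d_B, d_N]: d_B = -B^{-1}A_j, d_j = 1, rest 0.
d = np.zeros(A.shape[1])
d[basis] = -u
d[j] = 1.0

print(d, A @ d, c @ d)  # d = (0, 1, 1), Ad = 0, c^T d = -1 < 0
```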
If there exists any p ∈ Y , then there is no x ∈ X for which p⊤ Ax = p⊤ b (and, in turn, Ax = b)
holds. Thus, X must be empty. Notice that this can be used to infer the infeasibility of a problem
P with feasibility set X prior to solving P itself, by means of solving the feasibility problem of
finding a vector p ∈ Y .
We now pose this relationship more formally via a result generally known as the Farkas’ lemma.
Theorem 6.7 (Farkas’ lemma). Let A be an m × n matrix and b ∈ Rm . Then, exactly one of the
following statements holds:
1. there exists some x ≥ 0 such that Ax = b;
2. there exists some vector p such that p⊤ A ≥ 0 and p⊤ b < 0.
Since P is infeasible, D must be unbounded (instead of infeasible), since p = 0 is feasible for D.
Thus, p⊤ b < 0 for some p ̸= 0.
The Farkas’ lemma has a nice geometrical interpretation that represents the mutually exclusive
relationship between the two sets. For that, notice that we can think of b as being a conic
combination of the columns Aj of A, for some x ≥ 0. If that cannot be the case, then there exists a
hyperplane that separates b and the cone formed by the columns of A, C = {y ∈ Rm : y = Ax, x ≥ 0}.
This is illustrated in Figure 6.4. Notice that the separation caused by such a hyperplane with
normal vector p implies that p⊤ Ax ≥ 0 and p⊤ b < 0, i.e., Ax and b are on opposite sides of
the hyperplane.
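A Farkas certificate can also be computed numerically by solving an auxiliary LP. The sketch below assumes SciPy is available and uses scipy.optimize.linprog; the constraint b⊤p ≥ −1 is added only to keep the auxiliary problem bounded, and the small instance is made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# System Ax = b, x >= 0 with no solution: x1 + x2 = -1 cannot hold for x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([-1.0])

# Farkas alternative: find p with A^T p >= 0 and b^T p < 0 by minimising b^T p
# subject to A^T p >= 0 and the normalisation b^T p >= -1.
res = linprog(
    c=b,                                         # objective b^T p
    A_ub=np.vstack([-A.T, -b.reshape(1, -1)]),   # -A^T p <= 0 and -b^T p <= 1
    b_ub=np.concatenate([np.zeros(A.shape[1]), [1.0]]),
    bounds=[(None, None)] * A.shape[0],          # p is a free variable
)

p = res.x
print(p, b @ p)  # a certificate of infeasibility: A^T p >= 0 and b^T p < 0
```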
6.3 Exercises
min. − 2x1 − x2 + x3
s.t.: x1 + 2x2 + x3 ≤ 8
− x1 + x2 − 2x3 ≤ 4
3x1 + x2 ≤ 10
x1 , x2 , x3 ≥ 0.
x1 x2 x3 x4 x5 x6 RHS
z 0 0 1.2 0.2 0 0.6 -7.6
x1 1 0 -0.2 -0.2 0 0.4 2.4
x2 0 1 0.6 0.6 0 -0.2 2.8
x5 0 0 -2.8 -0.8 1 0.6 3.6
(a) If you were to choose between increasing in 1 unit the right-hand side of any constraints,
which one would you choose, and why? What is the effect of the increase on the optimal
cost?
(b) Perform a sensitivity analysis on the model to discover the range of alteration in the RHS
within which the same effect calculated in item (a) can be expected. HINT : JuMP (from
version 0.21.6) includes the function lp_sensitivity_report() that you can use to help
perform the analysis.
(a) Let P = {(x1 , x2 ) : x1 − x2 = 0, x1 + x2 = 0}. What are the extreme points and the extreme
rays of P ?
(b) Let P = {(x1 , x2 ) : 4x1 + 2x2 ≥ 0, 2x1 + x2 ≤ 1}. What are the extreme points and the
extreme rays of P ?
(c) For the polyhedron of part (b), is it possible to express each one of its elements as a convex
combination of its extreme points plus a nonnegative linear combination of its extreme rays?
Is this compatible with the Resolution Theorem?
max. 2x1 + x2
s.t.: 2x1 + 2x2 ≤ 9 (p1 )
2x1 − x2 ≤ 3 (p2 )
x1 ≤ 3 (p3 )
x2 ≤ 4 (p4 )
x1 , x2 ≥ 0.
(a) Find the primal and dual optimal solutions. HINT : You can use complementary slackness,
once having the primal optimum, to find the dual optimal solution.
(b) Suppose we add a new constraint 6x1 − x2 ≤ 6. Classify the former primal and dual optimal
points, stating whether they: (i) remain optimal; (ii) remain feasible but not optimal; or (iii)
become infeasible.
(c) Consider the new problem from item (b) and find the new dual optimal point through one
dual simplex iteration. After that, find the primal optimum.
Chapter 7
Barrier Method for Linear Programming
We wish to find a solution x∗ that is a root for f , i.e., f (x∗ ) = 0. For that, we must solve the
system of equations given by
f (x) = (f1 (x), . . . , fn (x))⊤ = (0, . . . , 0)⊤ .
The NR method starts from an initial guess x0 for x∗ and iterates by finding the root of a linear
(i.e., first-order Taylor) approximation of f at the current point xk . Under suitable conditions,
including having a starting point x0 that is within a neighbourhood of the root of f , the sequence
{xk } converges to x∗ as k → ∞.
Let us clearly state how the method iterates. At a given xk , the first-order approximation of f (x)
is given by
f (xk + d) ≈ f (xk ) + ∇f (xk )⊤ d,
where ∇f (xk ) is the Jacobian of f (x), which is defined as
∇f (xk ) = [∇f1 (xk ), . . . , ∇fn (xk )]⊤ ,
i.e., the matrix whose ith row is the gradient ∇fi (xk )⊤ .
The algorithm proceeds by finding the step to be taken from xk to reach the root of the first-order
approximation of f at xk . This means that we want to obtain a Newton direction d such that it
solves the first-order approximation of f (xk + d) = 0. Thus, we must solve
f (xk ) + ∇f (xk )⊤ d = 0 ⇒ d = −∇f (xk )−1 f (xk ).
Consider the following numerical example. Suppose we would like to find the root of f with
x0 = (1, 0, 1), where f is given by
f (x) = (f1 (x), f2 (x), f3 (x)) = (x1² + x2² + x3² − 3, x1² + x2² − x3 − 1, x1 + x2 + x3 − 3).
The Jacobian of f is given by
∇f (x) = [2x1 2x2 2x3 ; 2x1 2x2 −1 ; 1 1 1] (rows separated by semicolons).
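The iteration for this example can be sketched in a few lines of NumPy (the iteration limit and tolerance are arbitrary choices):

```python
import numpy as np

# Newton-Raphson on the example system, starting from x0 = (1, 0, 1).
def f(x):
    x1, x2, x3 = x
    return np.array([x1**2 + x2**2 + x3**2 - 3,
                     x1**2 + x2**2 - x3 - 1,
                     x1 + x2 + x3 - 3])

def jacobian(x):
    x1, x2, x3 = x
    return np.array([[2 * x1, 2 * x2, 2 * x3],
                     [2 * x1, 2 * x2, -1.0],
                     [1.0, 1.0, 1.0]])

x = np.array([1.0, 0.0, 1.0])
for _ in range(30):
    d = np.linalg.solve(jacobian(x), -f(x))  # Newton direction
    x = x + d
    if np.linalg.norm(f(x)) < 1e-10:
        break

print(x)  # converges to the root (1, 1, 1)
```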
Notice that the dual is also posed in a form where the inequalities (originally A⊤ p ≤ c) are
converted to equalities requiring the additional variable u. An alternative way to think about u is
as if it were the dual variable associated with the nonnegativity constraints x ≥ 0.
Recall that the optimality conditions for problem P can be expressed as
Ax = b, x ≥ 0, (7.1)
A⊤ p + u = c, u ≥ 0, (7.2)
u⊤ x = 0. (7.3)
The first two conditions are primal (7.1) and dual (7.2) feasibility conditions, while (7.3) is an
alternative way of stating complementarity conditions on the nonnegativity constraints x ≥ 0.
One initial idea could be simply, from a purely methodological standpoint, to employ NR to solve
the above system of equations. The caveat, however, is that one must observe and retain the non-
negativity conditions x > 0 and u > 0, which are an important complicating factor in this setting.
Indeed, departing from a feasible solution (x, p, u), one could employ NR to find a solution for
(7.1)-(7.3) while controlling the steps taken in each Newton direction such that the nonnegativity
conditions x > 0 and u > 0 are retained, in a similar fashion to how it is done in the simplex
method. However, it turns out that Newton directions obtained from successive iterations of (7.1)-
(7.3) typically require the steps to be considerably small so that the nonnegativity conditions are
retained, which renders the algorithm less useful from a practical standpoint.
This is precisely when the notion of an interior solution plays a role. An interior point is defined
as a point that satisfies primal and dual feasibility conditions strictly, that is Ax = b with x > 0,
and A⊤ p + u = c with u > 0, implying that the complementarity conditions (7.3) are violated. This
notion of interior points is useful in that it allows for the definition of less “aggressive” Newton
directions, which aim towards directions that reduce the amount of violation in the complemen-
tarity conditions. Thus, we can alternatively consider the system with relaxed complementarity
conditions in which we define the scalar µ > 0 and restate the optimality conditions as
Ax = b, x ≥ 0,
A⊤ p + u = c, u ≥ 0,
uj xj = µ, ∀j ∈ J.
Then, we use these relaxed optimality conditions to obtain Newton directions, while, simultane-
ously, gradually making µ → 0. By doing so, one can obtain not only directions that can be
explored with larger step sizes, thus making more progress per iteration, but also yield methods
with better numerical properties.
To see how that can be made operational, let us first define some helpful notation. Let X =
diag(x) ∈ Rn×n and U = diag(u) ∈ Rn×n be diagonal matrices with the components of x and u
on their diagonals, respectively,
and let e = [1, . . . , 1]⊤ be a vector of ones of suitable dimension. We can rewrite the optimality
conditions (7.1)–(7.3) in matrix form as
Ax = b, x ≥ 0,
A⊤ p + u = c, u ≥ 0,
XU e = 0,
and, analogously, state their relaxed version (i.e., with relaxed complementarity conditions) as
Ax = b, x > 0, (7.4)
⊤
A p + u = c, u > 0, (7.5)
XU e = µe. (7.6)
We start from a feasible solution (xk , pk , uk ). We can then employ NR to solve the system (7.4)-
(7.6) for a given value of µ. That amounts to finding the Newton direction d that solves f (xk ) +
∇f (xk )⊤ d = 0, in which
f (xk ) = [Axk − b ; A⊤ pk + uk − c ; X k U k e − µe], ∇f (xk ) = [A 0 0 ; 0 A⊤ I ; U k 0 X k ],
and d = (dkx , dkp , dku ) = (x − xk , p − pk , u − uk ) (rows separated by semicolons).
Once the direction dk = (dkx , dkp , dku ) is obtained, we must calculate step sizes that can be taken in
the direction dk that also retain feasibility conditions, meaning that they do not violate x > 0 and
u > 0 at the new point. One simple idea is to follow the same procedure as the simplex method.
That is, let θpk and θdk be the iteration k step sizes for x and (p, u), respectively. Then they can be
set as
θpk = min { −xki /dkxi : dkxi < 0 } and θdk = min { −uki /dkui : dkui < 0 },
where dkxi and dkui are the i-th component of the vectors dkx and dku , respectively, and i = 1, . . . , n.
In practice, there are alternative, and arguably more efficient, ways to set step sizes, but they all
are such that their maximum sizes are θp and θd as above and always strictly smaller than one
(notice that our constraints are such that x and u are strictly positive).
Once we calculated appropriate step sizes, we can then make
(xk+1 , pk+1 , uk+1 ) = (xk + θpk dkx , pk + θdk dkp , uk + θdk dku ).
7.4. Barrier methods for linear programming problems 109
Then, we proceed by updating µ as µ = βµ, with β ∈ (0, 1) and repeat the same procedure until
convergence, i.e., until µ is sufficiently small.
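The loop just described can be sketched as follows. This is a bare-bones illustration on a tiny standard-form LP, not a robust implementation: the instance, the starting point, and the parameter β = 0.2 are made-up choices, and the step sizes are damped by the common factor 0.995 to keep x and u strictly positive.

```python
import numpy as np

# Toy LP in standard form: min -x1 - 2x2  s.t.  x1 + x2 + x3 = 4, x >= 0,
# whose optimum is x = (0, 4, 0) with value -8 (x3 is a slack variable).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
c = np.array([-1.0, -2.0, 0.0])
m, n = A.shape

x = np.array([1.0, 1.0, 2.0])   # strictly feasible primal point (Ax = b, x > 0)
p = np.array([-3.0])            # dual point chosen so that u = c - A^T p > 0
u = c - A.T @ p
beta = 0.2                      # reduction factor for mu

for _ in range(60):
    gap = x @ u
    if gap < 1e-10:
        break
    mu = beta * gap / n         # relaxed complementarity target
    X, U = np.diag(x), np.diag(u)
    # Newton system for the relaxed optimality conditions (7.4)-(7.6)
    K = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                  [np.zeros((n, n)), A.T, np.eye(n)],
                  [U, np.zeros((n, m)), X]])
    rhs = -np.concatenate([A @ x - b,
                           A.T @ p + u - c,
                           X @ U @ np.ones(n) - mu * np.ones(n)])
    d = np.linalg.solve(K, rhs)
    dx, dp, du = d[:n], d[n:n + m], d[n + m:]
    # damped step sizes keeping x > 0 and u > 0
    tp = 0.995 * min([1.0] + [-x[i] / dx[i] for i in range(n) if dx[i] < 0])
    td = 0.995 * min([1.0] + [-u[i] / du[i] for i in range(n) if du[i] < 0])
    x, p, u = x + tp * dx, p + td * dp, u + td * du

print(c @ x)  # approaches the optimal value -8
```

Note that, because the starting point is feasible and the Newton step satisfies A dx = 0 and A⊤ dp + du = 0 exactly, primal and dual feasibility are preserved even with different primal and dual step sizes.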
Before discussing in more detail the practicalities of the method, let us take an alternative per-
spective on showing how the notion of interior plays a role in the design of the algorithm, which
requires us to consider the notion of barrier functions. This will also serve as a justification for the
name “barrier methods”.
With that definition for I at hand, we can then reformulate our problem as
min. f (x) + ∑i=1,...,m I(gi (x))
s.t.: Ax = b.
The issue however is that I is not numerically favourable, due to its nature of “shooting to infinity”
(including the discontinuity it creates) whenever a solution x is infeasible to the original problem.
To circumvent that, we can use barrier functions instead, which are chosen to mimic the behaviour
of I, whilst retaining more favourable numerical properties. The most widespread choice for barrier
functions is the logarithmic barrier, which is given by
Bµ (y) = −µ ln(−y)
where µ > 0 sets the accuracy of the barrier term Bµ (y). Figure 7.1 illustrates the influence of µ
in the shape of the barrier function for the unidimensional case. Notice how, as µ decreases, the
logarithmic barrier more closely resembles the indicator function I.
Using Bµ as the barrier function, the barrier problem Pµ can be formulated as
(Pµ ) : min. f (x) − µ ∑i=1,...,m ln(−gi (x))
s.t.: Ax = b.
2 The technical name of this function is the characteristic function, which is arguably a less informative name.
Figure 7.1: Alternative barrier functions for different values of µ. The dashed line represents the
indicator function I(x) going to infinity when x > 0.
This formulation has a number of important benefits. First, notice how this problem is such that,
if one were to apply NR to solve its optimality conditions, one would be faced with solving linear
systems of equations, regardless of the nature of the constraints gi (x) ≤ 0, i = 1, . . . , m (recall that
NR solves first-order approximations of the original system of equations). This creates a bridge
between linear algebra techniques and a wide range of nonlinear optimisation problems.
Moreover, as discussed earlier, one can also gradually decrease µ by making µk+1 = βµk with
β ∈ (0, 1). This idea is generally known as the Sequential Unconstrained Minimisation Technique
(SUMT). As one may suspect, as µ → 0, we have that x∗ (µ) → x∗ , where x∗ (µ) and
x∗ are the optimal values for problems Pµ and P , respectively. However, for small values of µ, the
barrier problem becomes challenging numerically, which does not come as a surprise as the barrier
resembles more and more the indicator function I and all its associated numerical issues.
Let us illustrate the above with an example. Consider the nonlinear problem
min. (x + 1)²
s.t.: x ≥ 0.
Notice that the unconstrained optimum would be x = −1, but the constrained optimum is x∗ = 0.
For this example, the barrier problem Pµ is
(Pµ ) : min. (x + 1)² − µ ln(x).
Because this is a univariate convex function, we know that the optimum is attained where the
derivative (f (x) + Bµ (x))′ = 0. Thus, we have that
2(x + 1) − µ/x = 0 ⇒ 2x² + 2x − µ = 0.
Figure 7.2: The optimal values of Pµ for different values of µ. Notice the trajectory formed by the
points x∗ (µ) as µ → 0.
The positive solution (since we must have that x > 0) of 2x² + 2x − µ = 0 is given by
x∗ (µ) = (−2 + √(4 + 8µ)) / 4.
Notice that limµ→0 x∗ (µ) = 0, which is indeed the optimal x∗ for P . Figure 7.2 plots the function
f (x) + Bµ (x) for alternative values of µ indicating their minima (found by substituting the value
of µ in the expression for x∗ (µ)).
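The trajectory x∗ (µ) is easy to tabulate from the closed-form expression above:

```python
import numpy as np

# Closed-form minimiser of the barrier problem (x + 1)^2 - mu*ln(x).
def x_star(mu):
    return (-2 + np.sqrt(4 + 8 * mu)) / 4

for mu in [1.0, 0.1, 0.01, 0.001]:
    print(f"mu = {mu:>6}: x*(mu) = {x_star(mu):.6f}")  # x*(mu) -> 0 as mu -> 0
```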
There is an interesting link between barrier methods and the notion of interior points, which is, in
a way, the reason why the two ideas are equivalent. Before we show they are indeed equivalent,
let us explore this notion of interiority.
For a large enough µ, the solution of the barrier problem Pµ is close to the analytic centre of the
polyhedral feasibility set. The analytic centre of a polyhedral set is the point at which the distance
to all of the hyperplanes forming the set is maximal. More precisely, consider the polyhedral
set S = {x ∈ Rn : Ax ≤ b}. The analytic centre of S is given by the optimal solution x∗ of the
following problem
max. ∏i=1,...,m (bi − a⊤i x)
s.t.: x ∈ S.
Notice that we obtain an equivalent problem, that is, a problem that returns the same optimal
solution x∗ , if we take the logarithm of the objective function; this leads to
min. − ∑i=1,...,m ln(bi − a⊤i x)
s.t.: x ∈ S.
This allows us to infer something about the behaviour of the barrier method. For larger values of
µ, the optimal solution of x∗ (µ) will lie close to the analytical centre of the feasible region. On the
other hand, as µ diminishes, the “pull” towards the centre slowly decays whilst the pull caused by
the objective function (that is, by its gradient −∇f (x)) slowly becomes more prevalent and steers
the solution towards x∗ .
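As a small illustration, the analytic centre of a box can be computed by minimising −∑ ln(bi − a⊤i x) directly. The sketch below uses plain gradient descent (a real implementation would use Newton's method); the box and step size are arbitrary choices, and the analytic centre of a box is simply its midpoint:

```python
import numpy as np

# Box S = {0 <= x1 <= 2, 0 <= x2 <= 4}, written as Ax <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([2.0, 0.0, 4.0, 0.0])

x = np.array([0.5, 0.5])          # any strictly interior starting point
for _ in range(2000):
    slack = b - A @ x             # stays > 0 along the trajectory
    grad = A.T @ (1.0 / slack)    # gradient of -sum(log(b - Ax))
    x = x - 0.05 * grad

print(x)  # approximately (1, 2), the midpoint of the box
```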
Let us now return to the linear programming problem
(P ) : min. {c⊤ x : Ax = b, x ≥ 0} .
By looking at this formulation, and relating to the previous discussion, a barrier method for a
linear programming problem operates such that in earlier iterations, or for larger µ values, the
optimal solution x∗ (µ) of Pµ tends to be pushed towards the centre of the feasible region, where
x > 0. As µ → 0, the influence of c⊤ x gradually takes over, steering the solution towards the
optimal x∗ of P .
The most remarkable result, which ties together interior point methods and barrier methods for
linear programming problems, is the following. If we derive the Karush-Kuhn-Tucker optimality
conditions (Section ?? in the nonlinear optimization section of these notes) for Pµ , we obtain the
following set of conditions
Ax = b, x > 0
A⊤ p = c − µ [1/x1 , . . . , 1/xn ]⊤ .
Using the same notation as before (i.e., defining X and U ), these can be equivalently rewritten as
Ax = b, x > 0
A⊤ p + u = c
u = µX −1 e ⇒ XU e = µe,
which are exactly (7.4)-(7.6). Thus, it becomes clear that relaxing the complementarity condi-
tions in the way that we have done before is equivalent to imposing logarithmic barriers to the
nonnegativity constraints of x.
7.5. Barrier methods for linear programming problems 113
There are many important consequences for the analysis of the method once this link is established.
One immediate consequence relates to the convergence of the method: since primal and dual
feasibility conditions are satisfied throughout the iterations of the method, as µ → 0, the
complementarity conditions are satisfied in the limit, and the method thus converges to the
solution of the original linear problem P .
Indeed, the trajectory formed by successive solutions {(x(µk ), p(µk ), u(µk ))}k=1,2,... is called the
central path, which is a consequence of the interiority forced by the barrier function. This property
is the reason why barrier methods are sometimes called “path-following” methods.
It is interesting to also notice the information encoded in (7.6). First, notice from (7.5) that
A⊤ p + u = c ⇔
p⊤ (Ax) + u⊤ x = c⊤ x ⇔
u⊤ x = c⊤ x − p⊤ b.
That is, u⊤ x provides us with an estimate of the current duality gap, which can be used to estimate
how far we are from the optimal solution. Moreover, notice that
u⊤ x = ∑i=1,...,n u(µ)i x(µ)i = nµ,
meaning that the term µ indicates the average amount of violation per x and u variable pairs at
(xk , pk , uk ).
The version of the barrier method most often used has a few additional ideas incorporated into the
algorithm. One of the main ideas is using a single Newton step for each value of µ. That effectively
means that the iterates do not delineate exactly the central path formed by successively smaller
values of µ, but rather follow it only approximately.
More precisely, assume we start with µk > 0 and a (xk , pk , uk ) close to (x(µk ), p(µk ), u(µk )).
Then, for a small β ∈ (0, 1), a Newton step with µk+1 = βµk leads to (xk+1 , pk+1 , uk+1 ) close to
(x(µk+1 ), p(µk+1 ), u(µk+1 )). Now, to be more precise in terms of what is meant by close, we must
refer to the notion of central path neighbourhoods.
A common neighbourhood often used in convergence analysis of the barrier method is
Nµ (α) = {(x, p, u) : Ax = b, A⊤ p + u = c, (x, u) > 0, ∥XU e − µe∥ ≤ αµ},
which essentially consists of a bound on the difference between the expected value of the com-
plementarity condition violation µ and that observed at the current solution (xk , pk , uk ). The
parameter α ∈ (0, 1] is an arbitrary scalar that dictates the amount of such difference tolerated
and effectively controls how wide the neighbourhood is. Then, by setting values for β and α that
satisfy β = 1 − σ/√n for some σ ∈ (0, 1) (e.g., α = σ = 0.1), one can guarantee that the comple-
mentarity violation of the next iterate is bounded such that (xk+1 , pk+1 , uk+1 ) ∈ Nµk+1 (α) [3].
Figure 7.3 illustrates this idea, depicting two successive iterates of the method remaining within
the shrinking neighbourhoods Nµk (α).
Another important aspect of the method relates to the feasibility requirements of each step. Current
implementations of the method can be shown to converge under particular conditions even if the
Figure 7.3: An illustrative representation of the central path and how the method follows it
approximately.
feasibility requirements are eased in the Newton system. In this case, it needs to be shown that
the Newton direction is also such that the amount of infeasibility in the system decreases as the
algorithm progresses.
The infeasible version of the algorithm is such that the so-called perturbed (or relaxed) system
[A 0 0 ; 0 A⊤ I ; U k 0 X k ] (dk+1x , dk+1p , dk+1u ) = (0, 0, µk+1 e − X k U k e)
becomes
[A 0 0 ; 0 A⊤ I ; U k 0 X k ] (dk+1x , dk+1p , dk+1u ) = −(Axk − b, A⊤ pk + uk − c, X k U k e − µk+1 e), (7.7)
where µk+1 = βµk (rows separated by semicolons).
The primal residual rp (x) = Ax − b and dual residual rd (p, u) = A⊤ p + u − c allow the method to
iterate through solutions (xk , pk , uk ) that do not satisfy primal and/or dual feasibility. A caveat,
though, is that the convergence analysis of this variant is more involved and requires additional
assumptions on the parameterisation of the algorithm for convergence.
To see how the residuals decay at each iteration, notice the following. Let r(w) = r(x, p, u) =
(rp (x), rd (p, u)). The optimality conditions (7.1)-(7.3) require the residuals to vanish, that is,
r(w) = 0. Now, let us consider the first-order approximation for r at wk for a Newton step dw .
This amounts to the following:
r(wk + dw ) ≈ r(wk ) + Dr(wk )dw ,
where Dr(wk ) is the Jacobian of r evaluated at wk . One can notice that, in this notation, the
solution dk+1w to (7.7) is such that
Dr(wk ) dk+1w = −r(wk ).
Now, let us consider the directional derivative of the norm of the updated residues in the direction
dk+1w . That leads to the following conclusion:
(d/dt) ∥r(wk + t dk+1w )∥22 |t=0 = 2 r(wk )⊤ Dr(wk ) dk+1w = −2∥r(wk )∥22 < 0.
This shows that the direction dk+1w is a descent direction for the norm of the residues at wk ,
which leads to their eventual vanishing.
We are finally ready to pose the pseudocode of a working version of the barrier method, which is
displayed in Algorithm 4.
Some final remarks are worth making. Barrier methods are perhaps the only class of methods that
are known for being strong contenders to the dominance of simplex method-based approaches for
linear programming problems. This stems from both a theoretical and an experimental perspective.
The convergence analysis available in [3] shows that the number of iterations of the barrier method,
for suitable parameterisation, is O(√n ln(1/ϵ)), whilst, for the simplex method, a similar analysis
would bound the number of iterations to be of the order O(2n ). A conclusion stemming from these
results would be that barrier methods, being polynomial complexity algorithms, should outperform
the simplex method.
But in practice, this is not necessarily the case. Despite its complexity analysis, the simplex
method in practice often requires O(m) iterations (i.e., a typically modest multiple of m), where
m is the number of columns in the basis. On the other hand, practical observation has shown
barrier methods to typically require √n iterations or less, and they appear to be somewhat
insensitive to growth in the number of variables beyond a certain point. In contrast, one iteration
of the simplex method is rather inexpensive from a computational standpoint, while one iteration of
a barrier method is a potentially computationally demanding operation (as it requires the solution
of a large linear system of equations).
In general, simplex-method-based solvers are faster on problems of small to medium dimensions,
while barrier methods are competitive, and often faster, on large problems. However, this is not
a rule, as the performance is dependent on the structure of the particular application. In the
end, it all boils down to how effectively the underlying linear system solver can exploit particular
structural features that allow using dedicated numerical algebra methods (e.g., sparsity and the
use of Cholesky decomposition combined with triangular solves).
Moreover, barrier methods are generally not able to take full advantage of any prior knowledge
about the solution, such as an estimate of the solution itself or some suboptimal basis. Hence,
barrier methods are less useful than simplex approaches in situations in which “warm-start” infor-
mation is available. One situation of this type involves branch-and-bound algorithms for solving
integer programs, where each node in the branch-and-bound tree requires the solution of a linear
program that differs only slightly from one already solved in the parent node. In other situations,
we may wish to solve a sequence of linear programs in which the data is perturbed slightly to
investigate the sensitivity of the solutions, or in which we approximate a non-linear optimisation
problem by a sequence of linear programs. In none of the aforementioned scenarios can barrier
methods be used as efficiently as the simplex method (dual or primal, depending on the context).
7.6 Exercises
Exercise 7.1: Step size rule for the barrier method
Solve the following optimisation problem using the barrier method.
(P) : min. x1 + x2 (7.8)
s.t.: 2x1 + x2 ≥ 8 (7.9)
x1 + 2x2 ≥ 10 (7.10)
xi ≥ 0, ∀i ∈ {1, 2} (7.11)
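The exercise is meant to be worked through by hand with a step size rule; as a cross-check, the following plain-Python sketch (our own construction, not part of the notes) applies a damped-Newton barrier method to (P), minimising t(x1 + x2) − Σ log(slack) for an increasing sequence of t. The starting point, tolerances, and update factor are illustrative choices.

```python
import math

def solve_barrier(t_final=1e6):
    """Damped-Newton barrier method for
    min x1 + x2  s.t.  2x1 + x2 >= 8,  x1 + 2x2 >= 10,  x >= 0."""
    def f(t, x1, x2):
        s1, s2 = 2*x1 + x2 - 8, x1 + 2*x2 - 10
        if min(s1, s2, x1, x2) <= 0:
            return math.inf  # outside the interior: barrier is +infinity
        return (t*(x1 + x2) - math.log(s1) - math.log(s2)
                - math.log(x1) - math.log(x2))

    x1, x2 = 5.0, 5.0  # strictly feasible starting point
    t = 1.0
    while t <= t_final:
        for _ in range(50):  # Newton steps on the barrier subproblem
            s1, s2 = 2*x1 + x2 - 8, x1 + 2*x2 - 10
            # gradient of the barrier function
            g1 = t - 2/s1 - 1/s2 - 1/x1
            g2 = t - 1/s1 - 2/s2 - 1/x2
            # Hessian entries (2x2, symmetric, positive definite)
            h11 = 4/s1**2 + 1/s2**2 + 1/x1**2
            h22 = 1/s1**2 + 4/s2**2 + 1/x2**2
            h12 = 2/s1**2 + 2/s2**2
            det = h11*h22 - h12*h12
            d1 = -(h22*g1 - h12*g2) / det  # Newton direction -H^{-1} g
            d2 = -(h11*g2 - h12*g1) / det
            if -(g1*d1 + g2*d2) < 1e-10:   # Newton decrement: converged
                break
            step, f0 = 1.0, f(t, x1, x2)
            # damping: halve the step until the barrier value decreases
            while f(t, x1 + step*d1, x2 + step*d2) > f0 and step > 1e-12:
                step *= 0.5
            x1, x2 = x1 + step*d1, x2 + step*d2
        t *= 10
    return x1, x2

x1, x2 = solve_barrier()
print(round(x1, 4), round(x2, 4))  # approaches the optimum (2, 4)
```

The iterates trace the central path towards the optimum x = (2, 4), where both structural constraints are active and the objective value is 6.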
where cj , j ∈ N , is a weight, and F is a family of feasible subsets. As the name suggests, in these
problems we are trying to form combinations of elements such that a measure (i.e., an objective
120 Chapter 8. Integer Programming Models
Incidence vectors will permeate many of the (mixed-)integer programming formulations we will
see. Notice that, once xS is defined, the objective function simply becomes Σ_{j∈N} cj xj. Integer
programming formulations are particularly suited for combinatorial optimisation problems when
F can be represented by a collection of linear constraints.
Figure 8.1: An illustration of all potential assignments as a graph (a) and an example of one
possible assignment (b), with total cost C12 + C31 + C24 + C43
To represent the problem, let xij = 1 if worker i is assigned to job j, and xij = 0 otherwise. Let
N = {1, . . . , n} be the set of indices of workers and jobs (we can use the same set, since they are
equal in number). The integer programming model that represents the assignment problem is given
by
(AP) : min. Σ_{i∈N} Σ_{j∈N} Cij xij
s.t.: Σ_{j∈N} xij = 1, ∀i ∈ N
Σ_{i∈N} xij = 1, ∀j ∈ N
xij ∈ {0, 1}, ∀i, j ∈ N.
Before we proceed, let us draw a parallel to combinatorial problems. The assignment problem is an
example of a combinatorial problem, which can be posed by making (i) N the set of all worker-job
pairs (i, j); (ii) F the family of subsets S of pairs in which each i and each j appears in exactly one
pair; and (iii) xS the incidence vector with components xij, i, j = 1, . . . , n. Thus, the assignment
problem is an example of a combinatorial optimisation problem that can be represented as an
integer programming formulation.
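To make the formulation concrete, a brute-force check over all n! feasible assignments illustrates that AP selects exactly one job per worker and one worker per job. The 4 × 4 cost matrix below is a hypothetical instance of our own, not data from the notes.

```python
from itertools import permutations

# A hypothetical 4x4 cost matrix C[i][j]: cost of assigning worker i to job j.
C = [[9, 2, 7, 8],
     [6, 4, 3, 7],
     [5, 8, 1, 8],
     [7, 6, 9, 4]]

# Each feasible assignment is a permutation: worker i is assigned job perm[i],
# which automatically satisfies both families of equality constraints in AP.
best_cost, best_perm = min(
    (sum(C[i][perm[i]] for i in range(4)), perm)
    for perm in permutations(range(4))
)
print(best_cost, best_perm)
```

Of course, enumeration over n! permutations is only viable for tiny n; this is precisely what motivates the integer programming formulation and the solution methods of the coming chapters.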
Notice that the knapsack problem has variants in which an item can be selected a certain number
of times, or that multiple knapsacks must be considered simultaneously, both being generalisations
of KP .
The knapsack problem is also a combinatorial optimisation problem, which can be stated by
defining (i) N as the set of all items {1, . . . , n}; (ii) F as the family of subsets S of items with total
cost not greater than B; and (iii) xS as the incidence vector with components xi, i ∈ N.
Figure 8.2: An example of a bin packing with total cost C11 + C12 + C23 + C44
bin, or a bin might have no item assigned. In some contexts, this problem is also known as the bin
packing problem.
In this case, we would like to assign all of the m items to n bins, observing that the capacity B of
each bin cannot be exceeded by the weights Ai of the items assigned to that bin. We know that
assigning the item i = 1, . . . , m to the bin j = 1, . . . , n costs Cij . Our objective is to obtain a
minimum-cost bin assignment (or packing) that does not exceed any of the bin capacities. Figure
8.2 illustrates a possible assignment of items to bins. Notice that the total number of bins does
not necessarily need to be the same as the number of items.
To formulate the generalised assignment problem as an integer programming problem, let us define
xij = 1, if item i is packed into bin j, and xij = 0 otherwise. Moreover, let M = {1, . . . , m} be
the set of items and N = {1, . . . , n} be the set of bins. Then, the problem can be formulated as
follows.
(GAP) : min_x Σ_{i∈M} Σ_{j∈N} Cij xij
s.t.: Σ_{j=1}^{n} xij = 1, ∀i ∈ M
Σ_{i=1}^{m} Ai xij ≤ Bj, ∀j ∈ N
xij ∈ {0, 1}, ∀i ∈ M, ∀j ∈ N.
Hopefully, the parallel to the combinatorial optimisation problem version is clear at this point and
is left to the reader as a thought exercise.
Our objective is to decide where to open the centres so that all regions are served and the total
opening cost is minimised.
Figure 8.3 illustrates an example of a set covering problem based on a fictitious map broken
into cells representing, e.g., regions. Each of the cells represents a region that must be covered,
i.e., M = {1, . . . , 20}. The blue cells represent regions that can have a centre opened, that is,
N = {3, 4, 7, 11, 12, 14, 19}. Notice that N ⊂ M . In this case, we assume that if a centre is
opened at a blue cell, then it can serve the respective cell and all adjacent cells. For example,
S3 = {1, 2, 3, 8}, S4 = {2, 4, 5, 6, 7}, and so forth.
Figure 8.3: The hive map illustrating the set covering problem. Our objective is to cover all of the
regions while minimising the total cost incurred by opening the centres at the blue cells
To model the set covering problem, and pretty much any other problem involving indexed subsets
such as Sj, ∀j ∈ N, we need an auxiliary parameter matrix A, often referred to as a 0-1 incidence
matrix, such that
Aij = 1, if i ∈ Sj; Aij = 0, otherwise.
For example, referring to Figure 8.3, the first column of A would refer to j = 3 and would have
nonzero values at rows 1, 2, 3, and 8.
We are now ready to pose the set covering problem as an integer programming problem. For that,
let xj = 1 if a facility is opened (or a service centre is located) at location j and xj = 0, otherwise.
In addition, let M = {1, . . . , m} be the set of regions to be served and N = {1, . . . , n} be the set
of candidate places to have a facility opened. Then, the set covering problem can be formulated
as follows.
(SCP) : min_x Σ_{j∈N} Cj xj
s.t.: Σ_{j∈N} Aij xj ≥ 1, ∀i ∈ M
xj ∈ {0, 1}, ∀j ∈ N.
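The construction of the incidence matrix A and the covering constraint Σ_{j∈N} Aij xj ≥ 1 can be illustrated with a small brute-force sketch. The instance data below is made up for illustration; it is not the hive-map instance, whose coverage sets are only partially listed.

```python
from itertools import combinations

# Hypothetical small instance: regions M = {1,...,6}, candidate locations N,
# coverage sets S_j, and opening costs C_j (illustrative data, not the map's).
M = range(1, 7)
S = {1: {1, 2, 3}, 2: {2, 4}, 3: {3, 5, 6}, 4: {4, 5}, 5: {1, 6}}
C = {1: 3, 2: 2, 3: 3, 4: 2, 5: 2}

# 0-1 incidence matrix: A[i][j] = 1 if region i is covered by location j.
A = {i: {j: int(i in S[j]) for j in S} for i in M}

def is_cover(T):
    # each region i must satisfy sum over the opened set T of A[i][j] >= 1
    return all(any(A[i][j] for j in T) for i in M)

# brute force over all subsets T of candidate locations
best = min((sum(C[j] for j in T), T)
           for r in range(1, len(S) + 1)
           for T in combinations(S, r) if is_cover(T))
print(best)
```

For this instance, no pair of locations covers all six regions, and the cheapest feasible cover has total cost 7.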
As a final note, notice that this problem can also be posed as a combinatorial optimisation problem
of the form
min_{T⊆N} { Σ_{j∈T} Cj : ∪_{j∈T} Sj = M },
To pose the problem as an integer programming model, let us define xij = 1 if city j is visited
directly after city i, and xij = 0 otherwise. Let N = {1, . . . , n} be the set of cities. We assume
that xii is not defined for i ∈ N . A naive model for the travelling salesperson problem would be
(TSP) : min_x Σ_{i∈N} Σ_{j∈N} Cij xij
s.t.: Σ_{j∈N\{i}} xij = 1, ∀i ∈ N
Σ_{i∈N\{j}} xij = 1, ∀j ∈ N
xij ∈ {0, 1}, ∀i, j ∈ N : i ≠ j.
First, notice that the formulation of TSP is exactly the same as that of the assignment problem.
However, this formulation has an issue: although it can guarantee that each city is visited exactly
once, it cannot enforce an important feature of the problem, which is that the tour cannot present
disconnections, i.e., contain sub-tours. In other words, the salesperson must physically travel from
city to city along the tour, and cannot “teleport” from one city to another. Figure 8.5 illustrates
the concept of sub-tours.
In order to prevent sub-tours, we must include constraints that can enforce the full connectivity
of the tour. There are mainly two types of such constraints. The first is called cut-set constraints
Figure 8.5: A feasible solution for the naive TSP model. Notice the two sub-tours formed
and is defined as
Σ_{i∈S} Σ_{j∈N\S} xij ≥ 1, ∀S ⊂ N, 2 ≤ |S| ≤ n − 1.
The cut-set constraints act by guaranteeing that, for any subset of nodes S ⊂ N, there is always
at least one arc (i, j) connecting a node in S to a node not in S.
An alternative type of constraint is called the sub-tour elimination constraint and is of the form
Σ_{i∈S} Σ_{j∈S} xij ≤ |S| − 1, ∀S ⊂ N, 2 ≤ |S| ≤ n − 1.
Differently from the cutset constraints, the sub-tour elimination constraints act by preventing the
number of arcs within each subset S from reaching the number of nodes in S, which is what would
close a sub-tour.
For example, consider the sub-tours illustrated in Figure 8.5 and assume that we would like to
prevent the sub-tour formed by S = {1, 2, 3}. Then the cutset constraint would be of the form
Σ_{i∈{1,2,3}} Σ_{j∈N\{1,2,3}} xij ≥ 1.
There are some differences between these two types of constraints and, typically, cutset constraints
are preferred for being stronger (we will discuss the notion of stronger constraints in the next
chapters). In any case, both suffer from the same problem: the number of such constraints quickly
becomes computationally prohibitive as the number of nodes increases. This is because one would
have to generate a constraint for each possible node subset of sizes 2 to n − 1.
A possible remedy to this consists of relying on delayed constraint generation. In this case, one
can start from the naive formulation T SP and from the solution, observe whether there are any
sub-tours formed. That being the case, only the constraints eliminating the observed sub-tours
need to be generated, and the problem can be warm-started. This procedure typically terminates
far earlier than having all of the possible cutset or sub-tour elimination constraints generated.
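The delayed constraint generation loop hinges on detecting the sub-tours present in a candidate solution. Since an assignment-type solution is a successor map, its sub-tours are simply the cycles of that map. A minimal Python sketch of our own (using, as an assumed example, the two sub-tours 1–2–3 and 4–5–6 suggested by Figure 8.5):

```python
def find_subtours(succ):
    """Decompose a successor map {i: j} (an assignment-type TSP solution,
    where x_ij = 1 means succ[i] = j) into its cycles, i.e., its sub-tours."""
    tours, unvisited = [], set(succ)
    while unvisited:
        start = min(unvisited)
        tour, i = [start], succ[start]
        unvisited.discard(start)
        while i != start:  # follow successors until the cycle closes
            tour.append(i)
            unvisited.discard(i)
            i = succ[i]
        tours.append(tour)
    return tours

# A solution with two sub-tours, 1-2-3-1 and 4-5-6-4 (assumed example).
succ = {1: 2, 2: 3, 3: 1, 4: 5, 5: 6, 6: 4}
print(find_subtours(succ))  # [[1, 2, 3], [4, 5, 6]]
```

Whenever more than one cycle is returned, one cutset or sub-tour elimination constraint is generated for each cycle S found, and the problem is re-solved; a single cycle means the solution is a valid tour.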
facilities located and the client-facility association depends on the trade-offs between locating and
service costs.
Figure 8.6: An illustration of the facility location problem and one possible solution with two
facilities located (right)
To formulate the problem as a mixed-integer programming problem, let us define xij as the fraction
of the demand in i ∈ M being served by a facility located at j ∈ N . In addition, we define the
binary variable yj such that yj = 1, if a facility is located at j ∈ N and 0, otherwise. With those,
the uncapacitated facility location (or UFL) problem can be formulated as
(UFL) : min_{x,y} Σ_{j∈N} Fj yj + Σ_{i∈M} Σ_{j∈N} Cij xij (8.1)
s.t.: Σ_{j∈N} xij = 1, ∀i ∈ M (8.2)
Σ_{i∈M} xij ≤ m yj, ∀j ∈ N (8.3)
xij ≥ 0, ∀i ∈ M, ∀j ∈ N (8.4)
yj ∈ {0, 1}, ∀j ∈ N. (8.5)
Some features of this model are worth highlighting. First, notice that the absolute values associated
with the demand at nodes i ∈ M are somewhat implicitly represented in the cost parameter Cij.
This is an important modelling feature that allows the formulation to be not only stronger but
also more numerically favourable (avoiding large coefficients). Therefore, the demand at each node
is thought of as being 1 (or 100%), and 0 ≤ xij ≤ 1 represents the fraction of the demand at i ∈ M
being served by a facility eventually located at j ∈ N. Second, notice how the variable xij is only
allowed to be greater than zero if the variable yj is set to 1, due to (8.3). Notice that m, the
number of clients, acts as an upper bound on the total demand served from facility j, which is at
most m when only one facility is located. That constraint is precisely the reason why the problem
is called uncapacitated since, in principle, there are no capacity limitations on how much demand
can be served from a facility.
Facility location problems are frequent in applications associated with supply chain planning prob-
lems and can be specialised to a multitude of settings, including capacitated versions (both nodes
and arcs), versions where the arcs must also be located (or built), having multiple echelons, and
so forth.
8.3. Good formulations 127
Notice that the formulation of ULS is very similar to that seen in Chapter 1, with the exception of
the variable yt, its associated fixed-cost term Σ_{t∈T} Ft yt, and the constraint (8.6). This constraint
is precisely what renders the “uncapacitated” nomenclature, and is commonly known in the context
of mixed-integer programming as a big-M constraint. Notice that the constant M is playing the
role of +∞: it only really makes the constraint relevant when yt = 0, so that pt ≤ 0 is enforced,
making thus pt = 0. However, this interpretation has to be taken carefully. Big-M constraints
are known for causing numerical issues and for worsening the performance of mixed-integer
programming solvers. Thus, M must be set to the smallest possible value that does not artificially
constrain the problem. Finding such values is often challenging and instance dependent. In the
capacitated case, M can trivially take the value of the production capacity.
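As an illustration of such tightening in the ULS context, production in period t never needs to exceed the total remaining demand, so a single constant M can be replaced by per-period values M_t = Σ_{i≥t} d_i, a standard tightening. A Python sketch of our own, with illustrative demand data:

```python
def tight_big_m(d):
    """Per-period big-M values for ULS: production in period t never needs
    to exceed the total remaining demand d_t + ... + d_n."""
    m, total = [], 0
    for dt in reversed(d):  # accumulate demand from the last period backwards
        total += dt
        m.append(total)
    return m[::-1]

d = [6, 7, 4, 6, 3, 8]
print(tight_big_m(d))  # [34, 28, 21, 17, 11, 8]
```

Instead of a single constraint x_t ≤ M y_t with M = 34 for every period, one would impose x_t ≤ M_t y_t, which is never binding for a reasonable production plan but cuts off a substantial part of the LP relaxation's feasible region.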
best formulation will be the one that requires the solution of the fewest such LP relaxations. It so
turns out that the number of LP relaxations that need to be solved (and, hence, the performance)
is strongly dependent on how closely the formulation resembles the convex hull of the feasible
solutions.
An LP relaxation simply consists of a version of the original MIP problem in which the integrality
requirements are dropped. Most of the methods used to solve MIP models are based on LP
relaxation. There are several reasons why the employment of LP relaxations is a good strategy.
First, we can solve (and resolve) LP problems efficiently. Secondly, the solution of the LP relaxation
can be used to reduce the search space of the original MIP. However, simply rounding the solution
of the LP relaxation will typically not lead to relevant solutions.
Let us illustrate the geometry of an integer programming model, so that the points we were
discussing become more evident. Consider the problem
(P) : max_x x1 + (16/25) x2
s.t.: 50x1 + 31x2 ≤ 250
3x1 − 2x2 ≥ −4
x1, x2 ∈ Z+.
The feasible region of problem P is represented in Figure 8.7. First, notice how, in this case, the
feasible region is not a polyhedral set anymore, but rather a collection of discrete points (represented
in blue) that happen to lie within the polyhedral set formed by the linear constraints. This is one
of the main complicating features of MIPs, because the premise of convexity does not hold anymore.
Another point can be noticed in Figure 8.7: rounding the solution obtained from the LP relaxation
would in most cases lead to infeasible solutions, except when x1 is rounded up and x2 rounded
down, which leads to the suboptimal solution (2, 4). However, one can still graphically find the
optimal solution using exactly the same procedure as that employed for linear programming
problems, which leads to the optimal integer solution (5, 0).
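These claims can be verified by enumeration. The following Python sketch (a check of our own) lists the integer-feasible points of P and confirms the optimum:

```python
from fractions import Fraction

# Enumerate the integer points of P: 50x1 + 31x2 <= 250, 3x1 - 2x2 >= -4.
# The ranges below safely contain the feasible region (x1 <= 5, x2 <= 8).
feasible = [(x1, x2) for x1 in range(6) for x2 in range(9)
            if 50*x1 + 31*x2 <= 250 and 3*x1 - 2*x2 >= -4]

obj = lambda p: p[0] + Fraction(16, 25) * p[1]  # exact rational arithmetic
best = max(feasible, key=obj)
print(best, obj(best))  # (5, 0) with value 5
```

The point (2, 4), obtained by rounding the LP relaxation solution up in x1 and down in x2, is confirmed feasible but suboptimal.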
Figure 8.7: The feasible region of P, formed by the integer points (in blue) within the polyhedron
of the linear constraints; the LP relaxation solution is (376/193, 950/193) and the optimal integer
solution is (5, 0)
One aspect that can be noticed from Definition 8.1 is that the feasible region of an integer pro-
gramming problem is a collection of points, represented by X. This is illustrated in Figure 8.8,
where one can see three alternative formulations, P1 , P2 , and P3 for the same set X.
Figure 8.8: An illustration of three alternative formulations for X. Notice that P3 is an ideal
formulation, representing the convex hull of X.
The formulation P3 has a special feature associated with it. Notice how all extreme points of P3
belong to X. This has an important consequence: the solution of the original integer programming
problem can be obtained by solving a single LP relaxation, since the solutions of both problems
coincide. This is precisely what characterises an ideal formulation, i.e., one leading to a minimal
number (namely, one) of LP relaxations that must be solved, as solving an LP relaxation over an
ideal P yields a solution x ∈ X for any cost vector c. This will only be the case if the formulation
P is the convex hull of X.
This is the case because of two important properties relating the set X and its convex hull conv(X).
The first is that conv(X) is a polyhedral set and the second is that the extreme points of conv(X)
belong to X. We summarise those two facts in Proposition 8.2, to which we will refer shortly. Notice
that the proof for the proposition can be derived from Definition 2.7 and Theorem 2.8.
Proposition 8.2. conv(X) is a polyhedral set and all its extreme points belong to X.
Unfortunately, this is often not the case. Typically, except for some specific cases, a description
of conv(X) is not known and deriving it algorithmically is an impractical computational task.
However, Proposition 8.2 allows us to define a structured way to compare formulations. This is
summarised in Definition 8.3.
Definition 8.3. Given a set X ⊆ Zn × Rn and two formulations P1 and P2 for X, P1 is a better
formulation than P2 if P1 ⊂ P2 .
Definition 8.3 gives us a framework to try to demonstrate that a given formulation is better than
another. If we can show that P1 ⊂ P2 , then, by definition (literally), P1 is a better formulation
than P2 . Clearly, this is not a perfect framework, since, for example, it would not be useful for
comparing P1 and P2 in Figure 8.8, and, in fact, there is nothing that can be said a priori about
the two in terms of which will render the better performance. Often, in the context of MIP, this
sort of analysis can only rely on careful computational experimentation.
A final point to make is that sometimes one must compare formulations of distinct dimensions,
that is, with a different number of variables. When that is the case, one can resort to projection,
as a means to compare both formulations onto the same space of variables.
8.4 Exercises
P^(x,s,y)_ULS-1 = { (x, s, y) :
s_{t−1} + x_t = d_t + s_t, ∀t ∈ N,
x_t ≤ M y_t, ∀t ∈ N,
s_0 = 0,
s_t ≥ 0, ∀t ∈ N,
x_t ≥ 0, ∀t ∈ N,
0 ≤ y_t ≤ 1, ∀t ∈ N },
where x_t is the production in period t, s_t the stock in period t, y_t the setup in period t, d_t the
demand in period t, and M = Σ_{t∈N} d_t the maximum production; and
P^(w,y)_ULS-2 = { (w, y) :
Σ_{i=1}^{t} w_it = d_t, ∀t ∈ N,
w_it ≤ d_t y_i, ∀i, t ∈ N : i ≤ t,
w_it ≥ 0, ∀i, t ∈ N : i ≤ t,
0 ≤ y_t ≤ 1, ∀t ∈ N },
where w_it is the production in period i to be used in period t and y_t the setup in period t.
proj_{x,s,y}(P^(x,s,y,w)_ULS-2) ⊆ P^(x,s,y)_ULS-1,
(b) The optimisation problems associated with the two ULS formulations are
(ULS-1) : min_{x,s,y} Σ_{t∈N} (f_t y_t + p_t x_t + q_t s_t)
s.t.: s_{t−1} + x_t = d_t + s_t, ∀t ∈ N
x_t ≤ M y_t, ∀t ∈ N
s_0 = 0
s_t ≥ 0, ∀t ∈ N
x_t ≥ 0, ∀t ∈ N
0 ≤ y_t ≤ 1, ∀t ∈ N,
where x_t is the production in period t, s_t the stock in period t, y_t the setup in period t, d_t the
demand in period t, and M = Σ_{t∈N} d_t the maximum production; and
(ULS-2) : min_{w,y} Σ_{t∈N} [ f_t y_t + p_t Σ_{i=t}^{n} w_ti + q_t ( Σ_{i=1}^{t} ( Σ_{j=i}^{n} w_ij − d_i ) ) ]
s.t.: Σ_{i=1}^{t} w_it = d_t, ∀t ∈ N
w_it ≤ d_t y_i, ∀i, t ∈ N : i ≤ t
w_it ≥ 0, ∀i, t ∈ N : i ≤ t
0 ≤ y_t ≤ 1, ∀t ∈ N,
where w_it is the production in period i to be used in period t.
Consider a ULS problem instance over N = {1, . . . , 6} periods with demands d = (6, 7, 4, 6, 3, 8),
set-up costs f = (12, 15, 30, 23, 19, 45), unit production costs p = (3, 4, 3, 4, 4, 5), unit storage costs
q = (1, 1, 1, 1, 1, 1), and maximum production capacity M = Σ_{t=1}^{6} d_t = 34. Solve the problems
ULS-1 and ULS-2 with Julia using JuMP to verify the result of part (a) computationally.
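Although the exercise asks for a JuMP implementation, the instance is small enough to be cross-checked by brute force. In the plain-Python sketch below (our own, not a substitute for the JuMP models), each setup pattern y is enumerated, and each period's demand is served from the cheapest open period, which is optimal in the uncapacitated case:

```python
from itertools import product

d = [6, 7, 4, 6, 3, 8]        # demand
f = [12, 15, 30, 23, 19, 45]  # set-up cost
p = [3, 4, 3, 4, 4, 5]        # unit production cost
q = [1, 1, 1, 1, 1, 1]        # unit storage cost
n = len(d)

def plan_cost(y):
    """Total cost of the best plan compatible with setup pattern y: demand
    d_t is produced in the open period i <= t of minimum unit cost, where a
    unit made in i and used in t pays storage q_i + ... + q_{t-1}."""
    cost = sum(f[t] for t in range(n) if y[t])
    for t in range(n):
        opens = [i for i in range(t + 1) if y[i]]
        if not opens:
            return float("inf")  # demand d_t cannot be met
        cost += d[t] * min(p[i] + sum(q[i:t]) for i in opens)
    return cost

best = min((plan_cost(y), y) for y in product([0, 1], repeat=n))
print(best)
```

The optimal cost found this way should coincide with the optimal objective values of both ULS-1 and ULS-2, since the three models describe the same problem.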
where xij = 1 if city j ∈ N is visited immediately after city i ∈ N , and xij = 0 otherwise.
Constraints (∗) with the variables ui ∈ R for all i ∈ N are called Miller-Tucker-Zemlin (MTZ)
subtour elimination constraints.
Hint: Formulation PMTZ is otherwise similar to the formulation presented, except for the con-
straints (∗), which replace the cutset constraints
Σ_{i∈S} Σ_{j∈N\S} xij ≥ 1, ∀S ⊂ N, S ≠ ∅,
which are used to prevent subtours in TSP solutions. Thus, you have to show that:
(1) Every solution x ∈ PMTZ contains no subtours.
(2) Every TSP solution x (on the same graph G) satisfies the constraints (∗).
You can prove (1) by contradiction. First, assume that a solution x ∈ PMTZ has a subtour with k
arcs (i1, i2), . . . , (ik−1, ik), (ik, i1) and k nodes {i1, . . . , ik} ⊆ N \ {1}. Then, write the constraints
(∗) for all arcs in this subtour and try to derive a contradiction.
You can prove (2) by finding suitable values for each u2 , . . . , un that will satisfy the constraints
(∗) for any TSP solution x. Recall that a TSP solution represents a tour that visits each of the
N = {1, . . . , n} cities exactly once and returns to the starting city.
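As a sanity check of the idea behind (2), the following Python sketch (our own; it assumes the standard MTZ form u_i − u_j + 1 ≤ (n − 1)(1 − x_ij) for i, j ∈ {2, . . . , n}, i ≠ j, since the constraints (∗) are not reproduced here) verifies that setting u_i to the position of city i satisfies the inequalities for a tour starting at city 1:

```python
def mtz_holds(tour):
    """Check the (assumed) MTZ inequalities u_i - u_j + 1 <= (n-1)(1 - x_ij)
    for i, j in {2,...,n}, i != j, with u_i = position of city i in the tour."""
    n = len(tour)
    pos = {city: t + 1 for t, city in enumerate(tour)}   # city 1 at position 1
    succ = {tour[t]: tour[(t + 1) % n] for t in range(n)}  # x_ij = 1 iff succ[i] = j
    for i in range(2, n + 1):
        for j in range(2, n + 1):
            if i == j:
                continue
            x_ij = 1 if succ[i] == j else 0
            if not pos[i] - pos[j] + 1 <= (n - 1) * (1 - x_ij):
                return False
    return True

print(mtz_holds([1, 3, 5, 2, 4]))  # True
```

When x_ij = 1 the inequality reduces to u_j ≥ u_i + 1, which holds since j directly follows i; when x_ij = 0, the left-hand side is at most (n) − (2) + 1 = n − 1, matching the right-hand side.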
Implement the model and solve the problem instance with Julia using JuMP.
Hint: You can assume that the variables u2, . . . , un take unique integer values from the set
{2, . . . , n}. That is, we have ui ∈ {2, . . . , n} for all i = 2, . . . , n with ui ≠ uj for all i, j ∈ {2, . . . , n}.
This holds for any TSP solution of problem MTZ, as we showed in Exercise 8.2. If we fix city 1 as
the starting city, then the value of each ui represents the position of city i in the TSP tour, i.e.,
ui = t for t = 2, . . . , n if city i ≠ 1 is the t-th city in the tour. You have to check that each of the
inequalities (8.7)–(8.10) holds (individually) for any arc (i, j) ∈ A and city i ∈ N that are part of
the inequality, by checking that the following two cases are satisfied: either xij = 0 or xij = 1.
(b) Add all four sets of inequalities (8.7) - (8.10) to the MTZ formulation and compare the com-
putational performance against the model with no extra inequalities.
where x1 represents the amount (in tons) of exterior paint produced, and x2 the amount of
interior paint produced. This is a simple linear problem that has been used as an illustrative
example throughout the lecture notes.
To demonstrate some typical structures in integer programming models, we now add two modifi-
cations to the problem, namely
• Piecewise linear functions: the profit for exterior paint depends on the amount produced.
The profit for the first ton produced is 7 ($1000/ton), while the next ton can only be sold at
a profit of 5, the third for a profit of 3 and the fourth ton only makes a profit of 1.
• Logical constraints: the factory has to produce at least 1.5 tons of at least one of the two
paint types, that is, either x1 ≥ 1.5 or x2 ≥ 1.5 must hold.
As seen in Fig. 8.9, these modifications result in a nonlinear objective function and a nonconvex
feasible region. Using additional integer/binary variables, formulate and solve the problem as a
mixed-integer linear problem.
Figure 8.9: The feasible region of the problem, and the contours of the objective function
Chapter 9
Branch-and-bound Method
9.2 Relaxations
Before we present the method itself, let us discuss the more general concept of relaxation. We have
visited the concept somewhat informally before, but now we will concentrate on a more concrete
definition.
Consider an integer programming problem of the form
z = min_x { c⊤x : x ∈ X ⊆ Zⁿ }.
To prove that a given solution x∗ is optimal, we must rely on the notion of bounding. That is,
we must provide a pair of upper and lower bounds that are as close (or tight) as possible. If it
happens that these bounds have the same value, and thus match the value of z = c⊤ x∗ , we have
available a certificate of optimality for x∗ . This concept must sound familiar to you. We already
used similar arguments in Chapter 5, when we introduced the notion of dual bounds.
Most methods that can prove optimality work by bounding the optimal solution. In this context,
bounding means to construct an increasing sequence of lower bounds
z^1 ≤ z^2 ≤ · · · ≤ z^s ≤ z
and a decreasing sequence of upper bounds
z̄^1 ≥ z̄^2 ≥ · · · ≥ z̄^t ≥ z,
so as to obtain lower (z^s ≤ z) and upper (z̄^t ≥ z) bounds that are as tight as possible. Notice that
the process can be arbitrarily stopped when z̄^t − z^s ≤ ϵ, where s and t are some positive integers
and ϵ > 0 is a predefined (suitably small) tolerance. The term ϵ represents an absolute optimality
gap, meaning that one can guarantee that the optimal value is at most ϵ units greater than z^s and
at most ϵ units smaller than z̄^t. In other words, the optimal value must be either z^s, z̄^t, or a
value in between.
This framework immediately poses the key challenge of deriving such bounds efficiently. It turns
out that this is a challenge that goes beyond the context of mixed-integer programming problems.
In fact, we will see this idea of bounding in Chapter 12, when we discuss decomposition methods,
which also generate lower and upper bounds during their execution.
Regardless of the context, bounds are typically of two types: primal bounds, which are bounds
obtained by evaluating a feasible solution (i.e., one that satisfies primal feasibility conditions); and
dual bounds, which are typically attained when primal feasibility is allowed to be violated so that a
dual feasible solution is obtained. In the context of minimisation, primal bounds are upper bounds
(to be minimised), while dual bounds are lower bounds (to be maximised). Clearly, in the case of
maximisation, the reverse holds.
Primal bounds can be obtained by means of a feasible solution. For example, one can heuristically
assemble a solution that is feasible by construction. On the other hand, dual bounds are typically
obtained by means of solving a relaxation of the original problem. We are ready now to provide
Definition 9.1, which formally states the notion of relaxation.
Definition 9.1 (Relaxation). A problem
(RP) : z_RP = min_x { c̄⊤x : x ∈ X̄ ⊆ Rⁿ }
is a relaxation of problem
(P) : z = min_x { c⊤x : x ∈ X ⊆ Rⁿ }
if X ⊆ X̄ and c̄⊤x ≤ c⊤x, ∀x ∈ X.
Definition 9.1 provides an interesting insight related to relaxations: they typically comprise an
expansion of the feasible region, possibly combined with a bounding of the objective function. Thus,
two main strategies to obtain relaxations are to enlarge the feasible set by dropping constraints
and to replace the objective function with another of equal or smaller value. One might notice at
this point that we have used a very similar argumentation to define linear programming duals in
Chapter 5. We will return to the relationship between relaxations and Lagrangian duality in a
more general setting in Part ??, when we discuss the notion of Lagrangian relaxation.
Clearly, for relaxations to be useful in the context of solving mixed-integer programming problems,
they have to be easier to solve than the original problem. That being the case, we can then rely
on two important properties that relaxations have, which are crucial for using them as a means to
generate dual bounds. These are summarised in Proposition 9.2 and 9.3.
Proposition 9.2. If RP is a relaxation of P, then zRP is a dual bound for z.
Proof. Let x*_RP be an optimal solution of RP. For any optimal solution x* of P, we have that
x* ∈ X ⊆ X̄, which implies that x* ∈ X̄. Thus, we have that
z = c⊤x* ≥ c̄⊤x* ≥ c̄⊤x*_RP = z_RP.
The first inequality is due to Definition 9.1 and the second is because x* is simply a feasible
solution, but not necessarily an optimal one, for RP.
Proposition 9.3. Let RP be a relaxation of P. Then: (1) if RP is infeasible, then P is infeasible;
and (2) if an optimal solution x*_RP of RP is such that x*_RP ∈ X and c̄⊤x*_RP = c⊤x*_RP,
then x*_RP is an optimal solution of P.
Proof. To prove (1), simply notice that if X̄ = ∅, then X = ∅, since X ⊆ X̄. Now, let x*_RP be an
optimal solution of RP satisfying the conditions in (2). To show (2), notice that, as x*_RP ∈ X,
z ≤ c⊤x*_RP = c̄⊤x*_RP = z_RP. From Proposition 9.2, we have z ≥ z_RP. Thus, z = z_RP.
problem min. c⊤ x : x ∈ P .
Notice that an LP relaxation is indeed a relaxation, since we are enlarging the feasible region by
dropping the integrality requirements while maintaining the same objective function (cf. Definition
9.1).
Let us consider a numerical example. Consider the integer programming problem
z = max_x 4x1 − x2
s.t.: 7x1 − 2x2 ≤ 14
x2 ≤ 3
2x1 − 2x2 ≤ 3
x ∈ Z²₊.
A dual (upper; notice the maximisation) bound for z can be obtained by solving its LP relaxation,
which yields the bound zLP = 8.42 ≥ z. A primal (lower) bound can be obtained by evaluating any
of the feasible solutions (e.g., (2, 1), for which 7 ≤ z). This is illustrated in Figure 9.1.
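These bounds can be verified by enumeration. A small Python check of our own confirms that (2, 1) is in fact the optimal integer solution, with z = 7, while the LP relaxation optimum sits at (20/7, 3), where the constraints 7x1 − 2x2 ≤ 14 and x2 ≤ 3 are active:

```python
from fractions import Fraction as F

# Integer points of: 7x1 - 2x2 <= 14, x2 <= 3, 2x1 - 2x2 <= 3, x >= 0.
# (x2 <= 3 is enforced by the range; the ranges safely contain the region.)
pts = [(x1, x2) for x1 in range(10) for x2 in range(4)
       if 7*x1 - 2*x2 <= 14 and 2*x1 - 2*x2 <= 3]
z = max(4*x1 - x2 for x1, x2 in pts)

# LP relaxation optimum: 7x1 - 2x2 = 14 and x2 = 3 give x1 = 20/7.
x1_lp, x2_lp = F(20, 7), F(3)
z_lp = 4*x1_lp - x2_lp
print(z, z_lp)  # z = 7 and z_lp = 59/7, so z_lp >= z as expected
```

The exact relaxation value is 59/7 ≈ 8.4, matching the zLP reported above, and the gap between the two bounds is what the branch-and-bound method will work to close.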
We can now briefly return to the discussion about better (or stronger) formulations for integer
programming problems. Stronger formulations are characterised by those that wield stronger
relaxations, or, specifically, that yield relaxations that are guaranteed to provide better (or tighter)
dual bounds. This is formalised in Proposition 9.5.
Proposition 9.5. Let P1 and P2 be formulations of the integer programming problem
min_x { c⊤x : x ∈ X }, with X = P1 ∩ Zⁿ = P2 ∩ Zⁿ.
Assume that P1 is a better formulation than P2 (i.e., P1 ⊂ P2), and let
z_LP^i = min_x { c⊤x : x ∈ Pi }, for i = 1, 2. Then z_LP^1 ≥ z_LP^2 for any cost vector c.
Figure 9.1: The feasible region of the example (represented by the blue dots) and the solution of
the LP relaxation, with objective function value zLP = 8.42
Still relating to the TSP, one can obtain even stronger combinatorial relaxations using 1-trees for
the symmetric TSP. Let us first define some elements. Consider an undirected graph G = (V, E)
with edge (or arc) weights ce for e ∈ E. The objective is to find a minimum-weight tour.
Now, notice the following: (i) a tour contains exactly two edges adjacent to the origin node (say,
node 1) plus a path through nodes {2, . . . , |V|}; and (ii) such a path is a special case of a (spanning)
tree, which is a cycle-free subset of edges that covers (i.e., touches at least once) all nodes v ∈ V.
We can now define what is a 1-tree. A 1-tree is a subgraph consisting of two edges adjacent to
node 1 plus the edges of a tree on nodes {2, . . . , |V |}. Clearly, every tour is a 1-tree with the
additional requirement (or constraint) that every node has exactly two incident edges. Thus, the
problem of finding minimal 1-trees is a relaxation for the problem of finding optimal tours. Figure
9.2 illustrates a 1-tree for an instance with eight nodes.
Figure 9.2: A 1-tree on eight nodes, formed by the edges of a tree on nodes {2, . . . , 8} plus two
edges adjacent to node 1
Once again, it so turns out that several efficient algorithms are known for forming minimal spanning
trees, which can be efficiently utilised as a relaxation for the symmetric TSP, that is
z_STSP = min_{T⊆E} { Σ_{e∈T} ce : T forms a tour }
≥ min_{T⊆E} { Σ_{e∈T} ce : T forms a 1-tree } = z_1-TREE.
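This inequality can be checked computationally on a small instance. The Python sketch below (our own, with a made-up symmetric distance matrix) computes the minimum 1-tree as a minimum spanning tree on nodes {2, . . . , n} (via Kruskal's algorithm) plus the two cheapest edges at node 1, and compares it against the brute-force optimal tour:

```python
from itertools import combinations, permutations

# Symmetric distance matrix for a hypothetical 5-node instance (0-based:
# index 0 plays the role of node 1 in the text).
D = [[0, 3, 4, 2, 7],
     [3, 0, 4, 6, 3],
     [4, 4, 0, 5, 8],
     [2, 6, 5, 0, 6],
     [7, 3, 8, 6, 0]]
n = len(D)

def mst_weight(nodes):
    """Kruskal's algorithm on the complete graph restricted to `nodes`."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    w = 0
    for cost, u, v in sorted((D[u][v], u, v) for u, v in combinations(nodes, 2)):
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two components: keep it
            parent[ru] = rv
            w += cost
    return w

# minimum 1-tree: tree on nodes {2,...,n} plus the two cheapest edges at node 1
one_tree = mst_weight(range(1, n)) + sum(sorted(D[0][1:])[:2])
# brute-force optimal tour (each tour counted several times, which is harmless)
tour = min(sum(D[p[i]][p[(i + 1) % n]] for i in range(n))
           for p in permutations(range(n)))
print(one_tree, tour)
```

For this instance the minimum 1-tree has weight 17 while the optimal tour has length 19, so the 1-tree value is indeed a valid (and here reasonably tight) dual bound.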
The working principle behind this strategy is based on the principle formalised in Proposition 9.6.
Proposition 9.6. Let S = ∪_{k∈K} Sk be a decomposition of the feasible set S of the problem
z = max_x { c⊤x : x ∈ S }, and let z^k = max_x { c⊤x : x ∈ Sk }, for k ∈ K. Then
z = max_{k∈K} z^k.
Notice the use of the word decomposition in Proposition 9.6. Indeed, the principle is philosophically
the same, and this connection will be exploited in later chapters when we discuss in more detail
the available technology for solving mixed-integer programming problems.
Now, one challenging aspect related to divide-and-conquer approaches is that, in order to be able
to find a solution, one might need to repeat several times the strategy based on Proposition 9.6,
which suggests breaking any given problem in multiple subproblems. This leads to a multi-layered
collection of subproblems that must have their relationships managed. To address this issue, such
methods typically rely on tree structures called enumerative trees, which are simply a representation
that allows for keeping track of the relationship (represented by branches) between subproblems
(represented by nodes).
Figure 9.3 represents an enumerative tree for a generic problem S ⊆ {0, 1}³ in which one must
define the value of a three-dimensional binary variable. The subproblems are formed by, at each
level, fixing one of the components to zero or one, forming then two subproblems. Any strategy to
form subproblems that generate two subproblems (or children) is called a binary branching.
Figure 9.3: An enumeration tree using binary branching for a problem with three binary variables
In Figure 9.3, the set S is first decomposed as
S = S0 ∪ S1 = {x ∈ S : x1 = 0} ∪ {x ∈ S : x1 = 1},
which renders two subproblems. Then, each subproblem is again decomposed each into two chil-
dren, such that
Si = Si0 ∪ Si1 = {x ∈ S : x1 = i, x2 = 0} ∪ {x ∈ S : x1 = i, x2 = 1} .
Finally, once all of the variables are fixed, we arrive at what is called the leaves of the tree. These
are such that they cannot be further divided, since they immediately yield a candidate solution
for the original problem.
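The recursive construction of such a tree is straightforward. A minimal Python sketch of our own generates the leaves for S ⊆ {0, 1}³ by binary branching:

```python
def branch(fixed, depth):
    """Recursively build the binary enumeration tree for x in {0,1}^depth,
    fixing one more component per level; returns the leaves, i.e., all
    candidate solutions of the original problem."""
    if len(fixed) == depth:
        return [tuple(fixed)]  # leaf: every component has been fixed
    leaves = []
    for value in (0, 1):       # binary branching: two children per node
        leaves += branch(fixed + [value], depth)
    return leaves

leaves = branch([], 3)
print(leaves)  # the 8 leaves of the tree in Figure 9.3
```

Full enumeration visits all 2^n leaves; the point of branch-and-bound, discussed next, is to prune most of this tree before it is ever built.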
Notice that applying Proposition 9.6, we can recover an optimal solution to the problem.
First, notice that P is a maximisation problem, for which an upper bound is a dual bound, obtained
from a relaxation, and a lower bound is a primal bound, obtained from a feasible solution.
Proposition 9.7 states that the best known primal (lower) bound can be applied globally to all of
the subproblems Sk , k ∈ K. On the other hand, dual (upper) bounds can only be considered valid
locally, since only the worst of the upper bounds can be guaranteed to hold globally.
Pruning branches is made possible by combining relaxations and global primal bounds. If, at any
moment of the search, the solution of a relaxation of Sk is observed to be worse than a known
global primal bound, then any further branching from that point onwards would be fruitless, since
no solution found from that subproblem could be better than the relaxation for Sk . Specifically,
we have that

z̲ ≥ z̄k ≥ z̄k′, ∀k′ that is a descendant of k,

where z̲ denotes the global primal bound and z̄k the dual bound of subproblem Sk.
Notice that this implies that each of the subproblems will be disjoint (i.e., with no intersection)
and have one additional constraint that eliminates the fractional part of the component around
the solution of the LP relaxation.
Bounding can occur in three distinct ways. The first case is when the solution of the LP relaxation
happens to be integer and, therefore, optimal for the subproblem itself. In this case, no further
exploration along that subproblem is necessary and we say that the node has been pruned by
optimality.
Figure 9.4 illustrates the process of pruning by optimality (in a maximisation problem). Each box
denotes a subproblem, with the interval denoting known lower (primal) and upper (dual) bounds
for the problem and x denoting the solution for the LP relaxation of the subproblem. In Figure 9.4,
we see a pruning that is caused because a solution to the original (integer) subproblem has been
identified by solving its (LP) relaxation, akin to the leaves in the enumerative tree represented in
Figure 9.3. This can be concluded because the solution of the LP relaxation of subproblem S1 is
integer.
Figure 9.4: An example of pruning by optimality. Since the solution of the LP relaxation of subproblem S1 is integer, x = (2, 2) must be optimal for S1

Another type of pruning takes place when known global (primal) bounds can be used to prevent further exploration of a branch in the enumeration tree. Continuing the example in Figure 9.4, notice that the global lower (primal) bound z̲ = 11 becomes available and can be transmitted to all subproblems. Now suppose we solve the LP relaxation of S2 and obtain the optimal value z̄2 = 9.7. Notice that we are precisely in the situation described in Section 9.3.1. That is, the nodes descending from S2 (i.e., its descendants) can only yield solutions with objective function value worse than the dual bound of S2, which, in turn, is worse than a known global primal (lower) bound. Thus, any further exploration among the descendants of S2 would be fruitless in terms of yielding better solutions and can be pruned. This is known as pruning by bound, and is illustrated in Figure 9.5.
Figure 9.5: An example of pruning by bound. Notice that the newly found global bound holds for all subproblems. After solving the LP relaxation of S2, we notice that z̄2 ≤ z̲, which renders the pruning.
The third type of pruning is called pruning by infeasibility, which takes place whenever the branching constraint added to the subproblem renders its relaxation infeasible, implying that the subproblem itself is infeasible (cf. Proposition 9.3).
Algorithm 5 presents a pseudocode for an LP-based branch-and-bound method. Notice that the algorithm keeps a list L of subproblems to be solved and requires a rule for selecting which subproblem is solved next. This subproblem selection (often referred to as the search strategy) can have a considerable impact on the performance of the method. Similarly, in case multiple components are found to be fractional, one must be chosen to branch on. Defining such branching priorities also has consequences for performance. We will discuss these in more depth in Chapter 11.
Also, recall that we have seen how to efficiently re-solve a linear programming problem from an optimal basis once we include an additional constraint (in Chapter 6). It turns out that an efficient dual simplex method is the linchpin of an efficient branch-and-bound method for (mixed-)integer programming problems.
Finally, although we developed the method in the context of integer programming problems, the
method can be readily applied to mixed-integer programming problems, with the only difference
being that the branch-and-bound steps are only applied to the integer variables while the continuous
variables are naturally taken care of in the solution of the LP relaxations.
Let us finalise by presenting a numerical example of the branch-and-bound method applied to the problem

max. z = 4x1 − x2
s.t.: 7x1 − 2x2 ≤ 14
      x2 ≤ 3
      2x1 − 2x2 ≤ 3
      x1, x2 ∈ Z+.
We start by solving its LP relaxation, as represented in Figure 9.6. We obtain the solution x0LP = (20/7, 3) with objective value z̄ = 59/7. As the first component of x0LP is fractional, we can generate subproblems by branching the node into subproblems S1 and S2, where
S1 = S ∩ {x : x1 ≤ 2}
S2 = S ∩ {x : x1 ≥ 3} .
The current enumerative (or branch-and-bound) tree representation is depicted in Figure 9.7.
[Figure 9.6: feasible region of the LP relaxation, with the solution x0LP = (20/7, 3) indicated.]

[Figure 9.7: the initial branch-and-bound tree: root S with bounds [−, 59/7] and solution (20/7, 3), branched via x1 ≤ 2 and x1 ≥ 3 into S1 and S2, both with bounds [−, −].]
Suppose we arbitrarily choose to solve the relaxation of S1 next. Notice that this subproblem consists of the problem S with the added constraint x1 ≤ 2. The feasible region and solution of the LP relaxation of S1 are depicted in Figure 9.8. Since we again obtain a fractional solution x1LP = (2, 1/2), we must branch on the second component, forming the subproblems
S11 = S1 ∩ {x : x2 = 0}
S12 = S1 ∩ {x : x2 ≥ 1} .
[Figure 9.8: feasible region of the LP relaxation of S1, with x0LP and the new solution x1LP = (2, 1/2) indicated.]
Notice that, at this point, our list of active subproblems is formed by L = {S2, S11, S12}. Our current branch-and-bound tree is represented in Figure 9.9.
Suppose we arbitrarily choose to first solve S2. One can see that this renders an infeasible subproblem, since the constraint x1 ≥ 3 does not intersect the original feasible region and, thus, S2 can be pruned by infeasibility.
Next, we choose to solve the LP relaxation of S12, which yields an integer solution x12LP = (2, 1). Therefore, an optimal solution for S12 was found, meaning that a global primal (lower) bound has been found and can be propagated to the whole branch-and-bound tree. Solving S11 next, we obtain the solution x11LP = (3/2, 0) with optimal value z̄ = 6. Since a better global primal (lower) bound is known, we can prune S11 by bound. As there are no further nodes to be explored, the solution for the original problem is the best (and, in this case, the only) integer solution found in the process (cf. Proposition 9.6), x∗ = (2, 1), z∗ = 7. Figure 9.10 illustrates the feasible regions of the subproblems and their respective optimal solutions, while Figure 9.11 presents the final branch-and-bound tree with all branches pruned.
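The example above can be reproduced with a compact Python sketch of LP-based branch-and-bound. This is not the book's Algorithm 5: the names and the naive vertex-enumeration LP "solver" (adequate only for bounded two-variable instances like this one) are choices made here for illustration.

```python
import math
from itertools import combinations

def lp_max(c, A, b):
    """Solve max c'x s.t. Ax <= b, x >= 0 in 2D by enumerating vertices.
    Returns (value, x), or None if the region is empty."""
    rows, rhs = A + [[-1, 0], [0, -1]], b + [0, 0]
    best = None
    for i, j in combinations(range(len(rows)), 2):
        (a11, a12), (a21, a22) = rows[i], rows[j]
        det = a11 * a22 - a12 * a21
        if abs(det) < 1e-9:
            continue                      # parallel lines, no vertex
        x = ((rhs[i] * a22 - a12 * rhs[j]) / det,
             (a11 * rhs[j] - rhs[i] * a21) / det)
        if all(r[0] * x[0] + r[1] * x[1] <= rr + 1e-7 for r, rr in zip(rows, rhs)):
            val = c[0] * x[0] + c[1] * x[1]
            if best is None or val > best[0]:
                best = (val, x)
    return best

def branch_and_bound(c, A, b):
    z_best, x_best = float("-inf"), None   # global primal (lower) bound
    stack = [(A, b)]                       # list L of active subproblems
    while stack:
        Ak, bk = stack.pop()
        res = lp_max(c, Ak, bk)
        if res is None:                    # prune by infeasibility
            continue
        z_bar, x = res                     # dual (upper) bound of the node
        if z_bar <= z_best + 1e-9:         # prune by bound
            continue
        frac = [i for i in (0, 1) if abs(x[i] - round(x[i])) > 1e-6]
        if not frac:                       # prune by optimality
            z_best, x_best = z_bar, tuple(int(round(v)) for v in x)
            continue
        i = frac[0]                        # branch on a fractional component
        lo, hi = math.floor(x[i]), math.ceil(x[i])
        e = [1, 0] if i == 0 else [0, 1]
        stack.append((Ak + [e], bk + [lo]))                 # x_i <= floor
        stack.append((Ak + [[-v for v in e]], bk + [-hi]))  # x_i >= ceil
    return z_best, x_best

# The example: max 4x1 - x2 s.t. 7x1 - 2x2 <= 14, x2 <= 3, 2x1 - 2x2 <= 3
print(branch_and_bound([4, -1], [[7, -2], [0, 1], [2, -2]], [14, 3, 3]))
# (7.0, (2, 1))
```

With this depth-first stack order, the run visits the same nodes as the narrative above: S2 is pruned by infeasibility, S12 yields the primal bound z̲ = 7, and S11 is pruned by bound.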
[Tree: root S with bounds [−, 59/7] and solution (20/7, 3); branch x1 ≤ 2 leads to S1 with bounds [−, 15/2] and solution (2, 1/2), further branched via x2 = 0 and x2 ≥ 1 into S11 and S12, both with bounds [−, −]; branch x1 ≥ 3 leads to S2 with bounds [−, −].]
Figure 9.9: The branch-and-bound tree after branching S1 onto S11 and S12
Figure 9.10: LP relaxations of all subproblems. Notice that S11 and S12 include the constraint x1 ≤ 2 from the parent node S1
[Figure 9.11: the final branch-and-bound tree. Root S: [7, 59/7], (20/7, 3). S1 (x1 ≤ 2): [7, 15/2], (2, 1/2). S2 (x1 ≥ 3): [7, −∞], pruned by infeasibility. S11 (x2 = 0): [7, 6], pruned by bound. S12 (x2 ≥ 1): [7, 7], (2, 1), pruned by optimality.]
Notice that, in this example, the order in which we solved the subproblems was crucial for pruning subproblem S11 by bound; this was only possible because we happened to solve the LP relaxation of S12 first, and it happened to yield a feasible solution and an associated primal bound. This illustrates an important aspect of the branch-and-bound method: having good feasible solutions available early on in the process increases the likelihood of performing more pruning by bound, which is highly desirable in terms of computational savings (and thus, performance). We will discuss in more detail the impacts of different search strategies later on, when we consider this and other aspects involved in the implementation of mixed-integer programming solvers.
9.4 Exercises
(a) Let N = {1, . . . , n} be a set of potential facilities and M = {1, . . . , m} a set of clients. Let
yj = 1 if facility j is opened, and yj = 0 otherwise. Moreover, let xij be the fraction of client i’s
demand satisfied from facility j. The UFL can be formulated as the mixed-integer problem:
(UFL-W) : min. Σ_{j∈N} fj yj + Σ_{i∈M} Σ_{j∈N} cij xij (9.1)
s.t.: Σ_{j∈N} xij = 1, ∀i ∈ M, (9.2)
Σ_{i∈M} xij ≤ m yj, ∀j ∈ N, (9.3)
xij ≥ 0, ∀i ∈ M, ∀j ∈ N, (9.4)
yj ∈ {0, 1}, ∀j ∈ N, (9.5)
where fj is the cost of opening facility j, and cij is the cost of satisfying client i’s demand from
facility j. Consider an instance of the UFL with opening costs f = (4, 3, 4, 4, 7) and client costs
(cij) =
[12 13  6  0  1
  8  4  9  1  2
  2  6  6  0  1
  3  5  2  1  8
  8  0  5 10  8
  2  0  3  4  1]
Implement the model and solve the problem in Julia using JuMP.
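As a sanity check for your implementation, this instance is small enough to solve by brute force over all 2⁵ − 1 nonempty sets of open facilities: once the open set is fixed, each client is optimally served by the cheapest open facility. A plain-Python sketch (names chosen here for illustration):

```python
from itertools import combinations

f = [4, 3, 4, 4, 7]             # opening costs f_j
c = [[12, 13, 6, 0, 1],         # c[i][j]: cost of serving client i
     [8, 4, 9, 1, 2],           # from facility j
     [2, 6, 6, 0, 1],
     [3, 5, 2, 1, 8],
     [8, 0, 5, 10, 8],
     [2, 0, 3, 4, 1]]

def ufl_brute_force(f, c):
    n = len(f)
    best = (float("inf"), None)
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            # opening costs plus cheapest open facility for each client
            cost = sum(f[j] for j in S) + sum(min(row[j] for j in S) for row in c)
            best = min(best, (cost, S))
    return best

print(ufl_brute_force(f, c))  # (9, (1, 3)): open facilities 2 and 4, cost 9
```

The JuMP model should reproduce this optimal cost of 9.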
(b) An alternative formulation of the UFL is of the form
(UFL-S) : min. Σ_{j∈N} fj yj + Σ_{i∈M} Σ_{j∈N} cij xij (9.6)
s.t.: Σ_{j∈N} xij = 1, ∀i ∈ M, (9.7)
xij ≤ yj, ∀i ∈ M, ∀j ∈ N, (9.8)
xij ≥ 0, ∀i ∈ M, ∀j ∈ N, (9.9)
yj ∈ {0, 1}, ∀j ∈ N. (9.10)
Linear programming (LP) relaxations of these problems can be obtained by relaxing the binary
constraints yj ∈ {0, 1} to 0 ≤ yj ≤ 1 for all j ∈ N . For the same instance as in part (a), solve the
LP relaxations of UFL-W and UFL-S and compare the optimal costs of the LP relaxations against
the optimal integer cost obtained in part (a).
Solve the IP problem by LP-based branch and bound, i.e., use LP relaxations to compute dual
(upper) bounds. Use dual simplex to efficiently solve the subproblem of each node starting from
the optimal basis of the previous node. Recall that the LP relaxation of IP is obtained by relaxing
the variables x1 , . . . , x5 ∈ Z+ to x1 , . . . , x5 ≥ 0.
Hint: The initial dual bound z̄ is obtained by solving the LP relaxation of IP at the root node S. Let [z̲, z̄] be the lower and upper bounds of each node. The optimal tableau, and the initial branch-and-bound tree containing only the root node S (with z̲ = −∞ and z̄ = 59/7), are shown below.

            -z    x1   x2   x3     x4    x5
           -59/7   0    0   -4/7   -1/7   0
x1 = 20/7          1    0    1/7    2/7   0
x2 = 3             0    1    0      1     0
x5 = 23/7          0    0   -2/7   10/7   1
You can proceed by branching on the fractional variable x1 and imposing either x1 ≤ 2 or x1 ≥ 3.
This creates two new subproblems S1 = S ∩ {x1 ≤ 2} and S2 = S ∩ {x1 ≥ 3} in the branch-and-
bound tree that can be solved efficiently using the dual simplex method, starting from the optimal
tableau of S shown above, by first adding the new constraint x1 ≤ 2 for S1 or x1 ≥ 3 for S2 to
the optimal tableau. The dual simplex method can be applied immediately if the new constraint
is always written in terms of non-basic variables before adding it to the tableau as a new row,
possibly multiplying the constraint by −1 if needed.
Plot (or draw) the feasible region of the linear programming (LP) relaxation of the problem IP, then solve the problems using the figure. Recall that the LP relaxation of IP is obtained by relaxing the integrality constraints.
(a) What is the optimal cost zLP of the LP relaxation of the problem IP ? What is the optimal
cost z of the problem IP ?
(b) Draw the border of the convex hull of the feasible solutions of the problem IP . Recall that
the convex hull represents the ideal formulation for the problem IP .
(c) Solve the problem IP by LP-relaxation based branch-and-bound. You can solve the LP
relaxations at each node of the branch-and-bound tree graphically. Start the branch-and-
bound procedure without any primal bound.
Chapter 10
Cutting-planes Method
Consider the integer programming problem

(IP) : max. {c⊤ x : x ∈ X}.
Furthermore, if we had available Ãx ≤ b̃, then we could solve IP by solving its linear programming
relaxation.
Cutting-plane methods are based on the idea of iteratively approximating the set of inequalities
Ãx ≤ b̃ by adding constraints to the formulation P of IP . These constraints are called valid
inequalities, a term we define more precisely in Definition 10.1.
Notice that the condition for an inequality to be valid is that it does not remove any of the points in the original integer set X. In light of the idea of gradually approximating conv(X), one can infer that good valid inequalities are those that can “cut off” some of the area defined by the polyhedral set P, but without removing any of the points in X. This is precisely where the name cut comes from. Figure 10.1 illustrates the process of adding a valid inequality to a formulation P. Notice how the inequality exposes one of the facets of the convex hull of X. Such cuts are called facet-defining and are the strongest type of cut one can generate. We will postpone the discussion of stronger cuts to later in this chapter.
Figure 10.1: Illustration of a valid inequality being added to a formulation P . Notice how the
inequality cuts off a portion of the polyhedral set P while not removing any of the feasible points
X (represented by the dots)
Multiplying Ax ≤ b by u ≥ 0 yields u⊤Ax ≤ u⊤b. Then, since π⊤ ≤ u⊤A and x ≥ 0, we obtain

π⊤x ≤ u⊤Ax ≤ u⊤b ≤ π0,

and, thus, it implies the validity of the cut, i.e., that π⊤x ≤ π0, ∀x ∈ P. Now, let us consider the
other direction, which we can use linear programming duality to show. First, consider the primal
problem
max. π⊤x
s.t.: Ax ≤ b
      x ≥ 0,

whose dual is

min. u⊤b
s.t.: u⊤A ≥ π
      u ≥ 0.
Notice that u⊤ A ≥ π can be seen as a consequence of dual feasibility, which is guaranteed to hold
for some u since π ⊤ x is bounded. Then, strong duality gives u⊤ b = π ⊤ x ≤ π0 , which completes
the proof.
One thing to notice is that valid cuts in the context of polyhedral sets are somewhat redundant,
since, by definition, they do not alter the polyhedral set by any means. However, the concept can
be combined with a simple yet powerful way of generating valid inequalities for integer sets by
using rounding. This is stated in Proposition 10.3.
Proposition 10.3 (Valid inequalities for integer sets). Let X = {y ∈ Z¹ : y ≤ b}. The inequality y ≤ ⌊b⌋ is valid for X.
The proof of Proposition 10.3 is somewhat straightforward and left as a thought exercise.
We can combine Propositions 10.2 and 10.3 into a single procedure to automatically generate valid
inequalities. Let us start with a numerical example. Consider the set X = P ∩ Zn where P is
defined by
P = {x ∈ R²₊ : 7x1 − 2x2 ≤ 14, x2 ≤ 3, 2x1 − 2x2 ≤ 3}.

First, let u = (2/7, 37/63, 0), which, for now, we can assume was arbitrarily chosen. We can then combine the constraints in P (the Ax ≤ b in Proposition 10.2), forming the constraint (equivalent to u⊤Ax ≤ u⊤b)

2x1 + (1/63)x2 ≤ 121/21.
Now, notice that the constraint would remain valid for P if we simply round down the coefficients on the left-hand side (as x ≥ 0 and all coefficients are positive). This leads to the new constraint (notice that this yields the vector π in Proposition 10.2)

2x1 + 0x2 ≤ 121/21.

Finally, we can invoke Proposition 10.3 to generate a cut valid for X. This can be achieved by simply rounding down the right-hand side (yielding π0), obtaining

2x1 + 0x2 ≤ 5,

which is valid for X, but not for P. Notice that, apart from the vector of weights u used to combine the constraints, everything else in the procedure of generating the valid inequality for X is automated. This procedure is known as the Chvátal-Gomory procedure and can be formalised as follows.
Definition 10.4 (Chvátal-Gomory procedure). Consider the integer set X = P ∩ Zn, where P = {x ∈ Rn₊ : Ax ≤ b}, A is an m × n matrix with columns {A1, . . . , An}, and u ∈ Rm₊. The Chvátal-Gomory procedure consists of the following steps to generate valid inequalities for X:

1. Σ_{j=1}^{n} u⊤Aj xj ≤ u⊤b is valid for P, as u ≥ 0;
2. Σ_{j=1}^{n} ⌊u⊤Aj⌋xj ≤ u⊤b is valid for P, as x ≥ 0;
3. Σ_{j=1}^{n} ⌊u⊤Aj⌋xj ≤ ⌊u⊤b⌋ is valid for X, as ⌊u⊤b⌋ is integer.
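The three steps, applied to the numerical example above, can be sketched in Python with exact rational arithmetic (variable names are illustrative):

```python
from fractions import Fraction as F
from math import floor

# Rows of Ax <= b for P: 7x1 - 2x2 <= 14, x2 <= 3, 2x1 - 2x2 <= 3.
A = [[F(7), F(-2)], [F(0), F(1)], [F(2), F(-2)]]
b = [F(14), F(3), F(3)]
u = [F(2, 7), F(37, 63), F(0)]       # the weights used in the example

# Step 1: u'A x <= u'b is valid for P, since u >= 0.
uA = [sum(ui * row[j] for ui, row in zip(u, A)) for j in range(2)]
ub = sum(ui * bi for ui, bi in zip(u, b))

# Step 2: round down the left-hand-side coefficients (valid since x >= 0).
pi = [floor(coef) for coef in uA]

# Step 3: round down the right-hand side (valid for X by integrality).
pi0 = floor(ub)

print(uA, ub)   # [Fraction(2, 1), Fraction(1, 63)] 121/21
print(pi, pi0)  # [2, 0] 5
```

This recovers the cut 2x1 ≤ 5 derived above.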
Perhaps the most striking result in the theory of integer programming is that every valid inequality for an integer set X can be obtained by employing the Chvátal-Gomory procedure a number of times. This is formalised in Theorem 10.5, which has its proof provided as an exercise (see Exercise 10.1).
Theorem 10.5. Every valid inequality for X can be obtained by applying the Chvátal-Gomory
procedure a finite number of times.
The motivation for employing cutting-plane algorithms lies in the belief that only a few of the |F| inequalities (assuming F is finite, which is not necessarily the case) are necessary, circumventing the computationally prohibitive need to generate all possible inequalities from F.
Some other complicating aspects must be observed when dealing with cutting-plane algorithms. First, it might be that a given family of valid inequalities F is not sufficient to expose the optimal solution x ∈ X, which might be the case, for example, if F cannot fully describe conv(X) or if the separation problem is unsolvable. In that case, the algorithm will terminate with a solution for the LP relaxation that is not integer, i.e., xkLP ∉ Zn.
1 the separation principle is a consequence of the separation theorem (Theorem ??) in Part ??.
However, failing to converge to an integer solution is not a complete failure since, in the process,
we have improved the formulation P (cf. Definition 8.3). In fact, this idea plays a major role in
professional-grade implementations of mixed-integer programming solvers, as we will see later.
xj ∈ Z+ , ∀j ∈ J,
where I = {1, . . . , m}, J = {1, . . . , n}, IB ⊂ J are the indices of basic variables and IN = J \ IB
the indices of nonbasic variables. Notice that, at this point, we are simply recasting the problem
IP by performing permutations of columns, since basic feasible solutions for the LP relaxation do
not necessarily translate into a feasible solution for X.
Now, assume that we solve the LP relaxation of the integer programming problem and obtain an optimal solution x = (xB, xN) with associated optimal basis B. If x is fractional, then ai0 is fractional for some i.
From any of the rows i with fractional ai0, we can derive a valid inequality using the Chvátal-Gomory procedure. These inequalities, commonly referred to as CG cuts, take the form

xB(i) + Σ_{j∈IN} ⌊aij⌋xj ≤ ⌊ai0⌋. (10.1)
As this is thought to be used in conjunction with the simplex method, we must be able to state (10.1) in terms of the nonbasic variables xj, ∀j ∈ IN. To do so, we can replace xB(i) = ai0 − Σ_{j∈IN} aij xj, obtaining

Σ_{j∈IN} (aij − ⌊aij⌋)xj ≥ ai0 − ⌊ai0⌋,
which, by defining fij = aij − ⌊aij⌋, can be written in the more conventional form

Σ_{j∈IN} fij xj ≥ fi0. (10.2)
In the form of (10.2), this inequality is referred to as the Gomory (fractional) cut.
Notice that the inequality (10.2) is not satisfied by the optimal solution of the LP relaxation, since
xj = 0, ∀j ∈ IN and fi0 > 0. Therefore, this indicates that a cutting-plane method using this idea
benefits from the employment of dual simplex, in line with the discussion in Section 6.1.2.
Let us present a numerical example illustrating the employment of Gomory's fractional cutting-plane algorithm for solving the following integer programming problem:

z = max. 4x1 − x2
s.t.: 7x1 − 2x2 ≤ 14
      x2 ≤ 3
      2x1 − 2x2 ≤ 3
      x1, x2 ∈ Z+.
Figure 10.2 illustrates the feasible region of the problem and indicates the solution of its LP
relaxation. Considering the tableau representation of the optimal basis for the LP relaxation, we
have
x1 x2 x3 x4 x5 RHS
0 0 -4/7 -1/7 0 59/7
1 0 1/7 2/7 0 20/7
0 1 0 1 0 3
0 0 -2/7 10/7 1 23/7
Notice that the tableau indicates that the component x1 in the optimal solution is fractional. Thus, we can choose that row to generate a Gomory cut, which leads to the new constraint (with the respective slack variable s ≥ 0 added)

(1/7)x3 + (2/7)x4 − s = 6/7.
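This computation can be sketched in Python with exact arithmetic (function and variable names are choices made here): the cut collects the fractional parts of the nonbasic coefficients and of the right-hand side of the chosen row.

```python
from fractions import Fraction as F
from math import floor

def gomory_cut(coeffs, rhs, nonbasic):
    """From tableau row x_B(i) + sum_j a_ij x_j = a_i0, return the fractional
    Gomory cut sum_j f_ij x_j >= f_i0, with f = value - floor(value)."""
    f = lambda v: v - floor(v)
    cut = {j: f(coeffs[j]) for j in nonbasic if f(coeffs[j]) != 0}
    return cut, f(rhs)

# Row of x1: x1 + (1/7)x3 + (2/7)x4 = 20/7, nonbasic variables x3, x4, x5.
row = {3: F(1, 7), 4: F(2, 7), 5: F(0)}
cut, f0 = gomory_cut(row, F(20, 7), [3, 4, 5])
print(cut, f0)  # {3: Fraction(1, 7), 4: Fraction(2, 7)} 6/7
```

This matches the cut (1/7)x3 + (2/7)x4 ≥ 6/7 derived above.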
We can proceed to add this new constraint to the problem, effectively adding an additional row
to the tableau. After multiplying it by -1 (so we have s as a basic variable complementing the
augmented basis), we obtain the new tableau
x1 x2 x3 x4 x5 s RHS
0 0 -4/7 -1/7 0 0 59/7
1 0 1/7 2/7 0 0 20/7
0 1 0 1 0 0 3
0 0 -2/7 10/7 1 0 23/7
0 0 -1/7 -2/7 0 1 -6/7
Notice that the solution remains dual feasible, which indicates the suitability of the dual simplex
method. Applying the dual simplex method leads to the optimal tableau
x1 x2 x3 x4 x5 s RHS
0 0 0 0 -1/2 -3 15/2
1 0 0 0 0 1 2
0 1 0 0 -1/2 1 1/2
0 0 1 0 -1 -5 1
0 0 0 1 1/2 -1 5/2
Notice that we still have a fractional component, this time associated with x2. We proceed in an analogous fashion, first generating the Gomory cut and adding the slack variable t ≥ 0, thus obtaining

(1/2)x5 − t = 1/2.
Then, adding it to the previous tableau and employing the dual simplex again, leads to the optimal
tableau
x1 x2 x3 x4 x5 s t RHS
0 0 0 0 0 -4 -1 7
1 0 0 0 0 1 0 2
0 1 0 0 0 0 -1 1
0 0 1 0 0 -7 -2 2
0 0 0 1 0 0 1 2
0 0 0 0 1 -2 -2 1
Notice that now all variables are integer, and thus, an optimal solution for the original integer
programming problem was found.
Some points are worth noticing. First, notice that, at the optimum, all variables, including the
slacks, are integer. This is a consequence of having the Gomory cuts active at the optimal solution
since (10.1), and consequently, (10.2), have both the left- and right-hand sides integer. Also, notice
that at each iteration the problem increases in size, due to the new constraint being added, which
implies that the basis also increases in size. Though this is an issue also in the branch-and-bound
method, it can be a more prominent computational issue in the context of cutting-plane methods.
We can also interpret the progress of the algorithm in graphical terms. First of all, notice that we can express the cuts in terms of the original variables (x1, x2) by noticing that the original formulation gives x3 = 14 − 7x1 + 2x2 and x4 = 3 − x2. Substituting x3 and x4 in the cut (1/7)x3 + (2/7)x4 − s = 6/7 gives x1 ≤ 2. More generally, cuts can be expressed using the original problem variables, as stated in Proposition 10.6.
Proposition 10.6. Let β be the row l of B⁻¹ selected to generate the cut, and let qi = βi − ⌊βi⌋, for i ∈ {1, . . . , m}. Then the cut Σ_{j∈IN} flj xj ≥ fl0, written in terms of the original variables, is the Chvátal-Gomory inequality

Σ_{j=1}^{n} ⌊qAj⌋xj ≤ ⌊qb⌋.
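The substitution back to the original variables can be checked mechanically. In the Python sketch below (representation chosen here for illustration), each expression is a triple (constant, coefficient of x1, coefficient of x2):

```python
from fractions import Fraction as F

# Slack definitions from the original formulation:
x3 = (F(14), F(-7), F(2))   # x3 = 14 - 7x1 + 2x2
x4 = (F(3), F(0), F(-1))    # x4 = 3 - x2

# Substitute into the cut (1/7)x3 + (2/7)x4 >= 6/7 and collect terms.
const, c1, c2 = (F(1, 7) * a + F(2, 7) * b for a, b in zip(x3, x4))
rhs = F(6, 7) - const       # move the constant to the right-hand side

print(c1, c2, rhs)          # -1 0 -2, i.e. -x1 >= -2, i.e. x1 <= 2
```

This confirms that the first Gomory cut is x1 ≤ 2 in the original variables.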
Figure 10.2: Feasible region of the LP relaxation (polyhedral set) and of the integer programming problem (blue dots) at each of the three iterations taken to solve the problem. The inequalities in orange represent the Gomory cut added at each iteration
Let us illustrate the concept of dominance with a numerical example. Consider the inequalities 2x1 + 4x2 ≤ 9 and x1 + 3x2 ≤ 4, which are valid for P = conv(X), where

X = {(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 1), (1, 1)}.

Notice that, if we consider u = 1/2, we have, for any x = (x1, x2) ≥ 0, that x1 + 3x2 ≥ u(2x1 + 4x2) = x1 + 2x2, and that 4 ≤ 9u = 9/2. Thus, we say that x1 + 3x2 ≤ 4 dominates 2x1 + 4x2 ≤ 9. Figure 10.3 illustrates the two inequalities. Notice that x1 + 3x2 ≤ 4 is a stronger inequality, since it is more efficient in representing the convex hull of X than 2x1 + 4x2 ≤ 9.
Another related concept is the notion of redundancy. Clearly, in the presence of two constraints
in which one dominates the other, the dominated constraint is also redundant and can be safely
Figure 10.3: Illustration of dominance between constraints. Notice that x1 + 3x2 ≤ 4 dominates
2x1 + 4x2 ≤ 9 and is thus stronger
removed from the formulation of a polyhedral set P . However, in some cases one might not be able
to identify redundant constraints simply because no constraint is clearly dominated by another.
Even then, there might be a way to identify weak (or redundant) constraints by combining two or
more constraints that then form a dominating constraint. This is formalised in Definition 10.8.
Definition 10.8 (Redundancy). The inequality πx ≤ π0 is redundant for P if there exist k ≥ 1 valid inequalities πi x ≤ πi0 and k ≥ 1 weights ui > 0, for i ∈ {1, . . . , k}, such that (Σ_{i=1}^{k} ui πi) x ≤ Σ_{i=1}^{k} ui πi0 dominates πx ≤ π0.
Once again, let us illustrate the concept with a numerical example. Suppose we generate the inequality 5x1 − 2x2 ≤ 6, which is valid for the polyhedral set

P = {x ∈ R²₊ : 6x1 − x2 ≤ 9, 9x1 − 5x2 ≤ 6}.

The inequality 5x1 − 2x2 ≤ 6 is not dominated by either of the inequalities forming P. However, if we set u = (1/3, 1/3), we obtain 5x1 − 2x2 ≤ 5, which in turn dominates 5x1 − 2x2 ≤ 6. Thus, we can conclude that the generated inequality is redundant and does not improve the formulation of P. This is illustrated in Figure 10.4.
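The combination above is quick to verify with exact arithmetic (Python sketch; names chosen here):

```python
from fractions import Fraction as F

# Constraints of P as (coefficients, rhs): 6x1 - x2 <= 9 and 9x1 - 5x2 <= 6.
rows = [([F(6), F(-1)], F(9)), ([F(9), F(-5)], F(6))]
u = [F(1, 3), F(1, 3)]

# Weighted combination (sum_i u_i pi^i) x <= sum_i u_i pi0^i.
lhs = [sum(ui * a[j] for ui, (a, _) in zip(u, rows)) for j in range(2)]
rhs = sum(ui * b for ui, (_, b) in zip(u, rows))

print(lhs, rhs)  # [Fraction(5, 1), Fraction(-2, 1)] 5, i.e. 5x1 - 2x2 <= 5
```

The combined inequality 5x1 − 2x2 ≤ 5 has the same left-hand side but a smaller right-hand side than 5x1 − 2x2 ≤ 6, so it dominates it.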
Figure 10.4: Illustration of a redundant inequality. Notice how the inequality 5x1 − 2x2 ≤ 6 (in
orange) does not dominate any of the other inequalities
As one might realise, checking whether a newly generated inequality improves the current formulation is a demanding task, as it requires finding the correct set of coefficients u for all constraints currently forming the polyhedral set P. Nonetheless, the notions of redundancy and dominance can be used to guide procedures that generate or improve existing inequalities. Let us discuss one such procedure, in the context of 0-1 knapsack inequalities.
We assume that aj ≥ 0, j ∈ N = {1, . . . , n}, and b > 0. Let us start by defining the notion of a
minimal cover.
Definition 10.9 (Minimal cover). A set C ⊆ N is a cover if Σ_{j∈C} aj > b. A cover C is minimal if C \ {j} is not a cover for all j ∈ C.
Notice that a cover C refers to any selection of items that exceeds the budget b of the constraint, and this selection is said to be a minimal cover if, upon removal of any item from the selection, the constraint becomes satisfied. This logic allows us to design a way to generate valid inequalities using covers. This is the main result in Proposition 10.10.
Proposition 10.10. If C ⊆ N is a cover for X, then a valid cover inequality for X is

Σ_{j∈C} xj ≤ |C| − 1.
Proof. Let R = {j ∈ N : xRj = 1}, for xR ∈ X. If Σ_{j∈C} xRj > |C| − 1, then |R ∩ C| = |C| and C ⊆ R. Thus, Σ_{j∈N} aj xRj = Σ_{j∈R} aj > b, which violates the inequality and implies that xR ∉ X.
The usefulness of Proposition 10.10 becomes evident if C is a minimal cover. Let us consider a numerical example to illustrate this. Consider the knapsack set

X = {x ∈ {0, 1}⁷ : 11x1 + 6x2 + 6x3 + 5x4 + 5x5 + 4x6 + x7 ≤ 19}.
Examples of minimal cover inequalities for X are

x1 + x2 + x3 ≤ 2
x1 + x2 + x6 ≤ 2
x1 + x5 + x6 ≤ 2
x3 + x4 + x5 + x6 ≤ 3.
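Minimal covers for a knapsack set like the one above can be enumerated directly (Python sketch; names illustrative). Each minimal cover C yields the inequality Σ_{j∈C} xj ≤ |C| − 1:

```python
from itertools import combinations

a = {1: 11, 2: 6, 3: 6, 4: 5, 5: 5, 6: 4, 7: 1}   # item weights
b = 19                                            # knapsack budget

def is_cover(C):
    return sum(a[j] for j in C) > b

def is_minimal_cover(C):
    # minimal: removing any single item makes the constraint satisfiable
    return is_cover(C) and all(not is_cover(C - {j}) for j in C)

minimal_covers = [set(C) for r in range(1, len(a) + 1)
                  for C in combinations(a, r) if is_minimal_cover(set(C))]

print({1, 2, 3} in minimal_covers, {3, 4, 5, 6} in minimal_covers)  # True True
```

Enumerating all minimal covers is, of course, only viable for tiny instances; in practice, violated cover inequalities are found by solving a separation problem.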
We leave the proof as a thought exercise. Let us, however, illustrate this using the previous numerical example. For C = {3, 4, 5, 6}, E(C) = {1, 2, 3, 4, 5, 6}, yielding the inequality

x1 + x2 + x3 + x4 + x5 + x6 ≤ 3,

which is stronger than x3 + x4 + x5 + x6 ≤ 3, as the former dominates the latter, cf. Definition 10.7.
The above serves as an example of how strong inequalities can be generated for problems with a known structure. However, this is just one among many well-known cut-generation methods. In the next chapter, we will mention a few other alternative techniques to generate valid inequalities that enrich professional-grade implementations of MIP solvers.
10.6 Exercises
Exercise 10.1: Chvátal-Gomory (C-G) procedure
Consider the set X = P ∩ Zn, where P = {x ∈ Rn : Ax ≤ b, x ≥ 0} and in which A is an m × n matrix with columns {A1, . . . , An}. Let u ∈ Rm with u ≥ 0. The Chvátal-Gomory (C-G) procedure to construct valid inequalities for X uses the following three steps:

1. Σ_{j=1}^{n} uAj xj ≤ ub is valid for P, as u ≥ 0 and Σ_{j=1}^{n} Aj xj ≤ b.

2. Σ_{j=1}^{n} ⌊uAj⌋xj ≤ ub is valid for P, as x ≥ 0.

3. Σ_{j=1}^{n} ⌊uAj⌋xj ≤ ⌊ub⌋ is valid for X, as any x ∈ X is integer and thus Σ_{j=1}^{n} ⌊uAj⌋xj is integer.
Show that every valid inequality for X can be obtained by applying the Chvátal-Gomory procedure
a finite number of times.
Hint: We show this for the 0-1 case. Thus, let P = {x ∈ Rn : Ax ≤ b, 0 ≤ x ≤ 1}, X = P ∩ Zn, and suppose that πx ≤ π0, with π ∈ Zn and π0 ∈ Z, is a valid inequality for X. We show that πx ≤ π0 can be obtained by applying the Chvátal-Gomory procedure a finite number of times. We do this in parts by proving the following claims C1, C2, C3, C4, and C5.
C4. If

πx ≤ π0 + τ + Σ_{j∈T0∪{p}} xj + Σ_{j∈T1} (1 − xj) (10.5)

and

πx ≤ π0 + τ + Σ_{j∈T0} xj + Σ_{j∈T1∪{p}} (1 − xj) (10.6)

are valid inequalities for X, where τ ∈ Z+ and (T0, T1) is any partition of {1, . . . , p − 1}, then

πx ≤ π0 + τ + Σ_{j∈T0} xj + Σ_{j∈T1} (1 − xj) (10.7)
C5. If
πx ≤ π0 + τ + 1 (10.8)
πx ≤ π0 + τ (10.9)
(a) Consider the set X = {x ∈ {0, 1}⁵ : 3x1 − 4x2 + 2x3 − 3x4 + x5 ≤ −2}. Derive the following inequalities as C-G inequalities:
(i) x2 + x4 ≥ 1
(ii) x1 ≤ x2
(b) Consider the set X = {x ∈ {0, 1}⁴ : xi + xj ≤ 1 for all i, j ∈ {1, . . . , 4}, i ̸= j}. Derive the clique inequalities x1 + x2 + x3 ≤ 1 and x1 + x2 + x3 + x4 ≤ 1 as C-G inequalities.
s.t.: 4x1 + x2 ≤ 28
x1 + 4x2 ≤ 27
x1 − x2 ≤ 1
x1 , x2 ∈ Z+ .
s.t.: 4x1 + x2 + x3 = 28
x1 + 4x2 + x4 = 27
x1 − x2 + x5 = 1
x1 , x2 , x3 , x4 , x5 ≥ 0
The optimal Simplex tableau after solving the problem LP with primal Simplex is
x1 x2 x3 x4 x5 RHS
0 0 -1/5 -6/5 0 -38
1 0 4/15 -1/15 0 17/3
0 1 -1/15 4/15 0 16/3
0 0 -1/3 1/3 1 2/3
(a) Derive two fractional Gomory cuts from the rows of x1 and x5 , and express them in terms
of the original variables x1 and x2 .
(b) Derive the same cuts as in part (a) as Chvátal-Gomory cuts. Hint: Use Proposition 10.6. Recall that the bottom-right part of the tableau corresponds to B −1 A, where B −1
is the inverse of the optimal basis matrix and A is the original constraint matrix. You can
thus obtain the matrix B −1 from the optimal Simplex tableau, since the last three columns
of A form an identity matrix.
Solve the problem by adding Gomory cuts to the LP relaxation until you find an integer solution.
and a solution x̄ = (0, 2/3, 0, 1, 1, 1, 1) to its LP relaxation. Find a cover inequality cutting out
(violated by) the fractional solution x̄.
Chapter 11
Mixed-integer Programming
Solvers
[Flowchart: Start → Presolve → LP Relaxation → Cuts / Heuristics → Branching]

Figure 11.1: The flowchart of a typical MIP solver. The nodes represent phases of the algorithm
presolve, these are likely the phases that most differ between implementations of MIP solvers. The
cut phase consists of the employment of a cutting-plane method onto the current LP relaxation
with the aim of either obtaining an integer solution (and thus pruning the branch by optimality)
or strengthening the formulation of the LP relaxation, as discussed in Chapter 10. Each solver
will have their own family of cuts that are used in this phase, and typically a collection of cut
families is used simultaneously. The heuristics phase is used in combination with these to try to obtain primal feasible solutions from the LP relaxations (possibly augmented by cuts), so that primal bounds (integer and feasible solution values) can be obtained and broadcast to the whole search tree, hopefully fostering pruning by bound.
In what follows, we will discuss some of the main techniques in each of these phases.
Many techniques used in the preprocessing phase rely on the notion of constraint activity. Consider the constraint a⊤x ≤ b, with x ∈ R^n a vector of decision variables, l ≤ x ≤ u, and b ∈ R, where (a, b, l, u) are given. The minimum and maximum activities of the constraint are given by

αmin = min {a⊤x : l ≤ x ≤ u} = Σ_{j : a_j > 0} a_j l_j + Σ_{j : a_j < 0} a_j u_j and
αmax = max {a⊤x : l ≤ x ≤ u} = Σ_{j : a_j > 0} a_j u_j + Σ_{j : a_j < 0} a_j l_j.
Notice that the constraint activities simply capture the minimum and maximum values (respectively) that the left-hand side of a⊤x ≤ b can assume. They can be used in a number of ways. For example, if there is a constraint for which αmin > b, then the problem is
trivially infeasible. On the other hand, if one observes that αmax ≤ b for a given constraint, then
the constraint can be safely removed since it is guaranteed to be redundant.
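These activity computations are straightforward to implement; the sketch below (in Python, with a function name and example constraint of our choosing) evaluates αmin and αmax over the variable box:

```python
def activities(a, l, u):
    """Minimum and maximum activity of a^T x over the box l <= x <= u."""
    a_min = sum(aj * (lj if aj > 0 else uj) for aj, lj, uj in zip(a, l, u))
    a_max = sum(aj * (uj if aj > 0 else lj) for aj, lj, uj in zip(a, l, u))
    return a_min, a_max

# Constraint 2*x1 - 3*x2 <= 4 over 0 <= x1, x2 <= 1:
a_min, a_max = activities([2, -3], [0, 0], [1, 1])
# a_min = -3 and a_max = 2; since a_max <= b = 4, the constraint is redundant.
```

Comparing a_min against b detects trivial infeasibility, and a_max against b detects redundancy, exactly as described above.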
Bound tightening
Another important presolving method is bound tightening, which, as the name suggests, tries to
tighten lower and upper bounds of variables, thus strengthening the LP relaxation formulation.
There are alternative ways that this can be done, and they typically trade off how much tightening
can be observed and how much computational effort they require.
One simple way of employing bound tightening is by noticing the following. Assume, for simplicity, that a_j > 0, ∀j ∈ J. Then, we have that

αmin = Σ_{j∈J} a_j l_j = a⊤l ≤ a⊤x ≤ b.

Isolating the term of a given variable x_j in a⊤x ≤ b yields

a_j x_j ≤ b − a⊤x + a_j x_j.    (11.1)

Moreover, dropping the term of x_j from the minimum activity gives

αmin − a_j l_j ≤ a⊤x − a_j x_j,

which, multiplied by −1 and with b added to both sides, becomes

b − a⊤x + a_j x_j ≤ b − αmin + a_j l_j.

Combining this result with what was obtained from (11.1), we have that

a_j x_j ≤ b − a⊤x + a_j x_j ≤ b − αmin + a_j l_j,

and thus x_j ≤ l_j + (b − αmin)/a_j, a potentially tighter upper bound on x_j.
An alternative is to obtain bounds from auxiliary optimisation problems. Consider the problem

IP : min. {c⊤x : x ∈ X = P ∩ Z^n},

where P is a polyhedral set. Then, the optimal solution value of the subproblem

LP_{x_j} : min. {x_j : x ∈ P}

provides a lower bound for x_j that considers all possible constraints at once. Analogously, solving LP_{x_j} as a maximisation problem yields an upper bound. Though this can be done somewhat efficiently, it clearly has steeper computational requirements.
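The activity-based rule above translates directly into code; a minimal sketch, assuming a_j > 0 for all j as in the derivation (names and data are ours):

```python
def tightened_upper_bounds(a, b, l, u):
    """Upper bounds implied by a^T x <= b over l <= x <= u, assuming a_j > 0
    for all j: x_j <= l_j + (b - alpha_min) / a_j."""
    alpha_min = sum(aj * lj for aj, lj in zip(a, l))
    return [min(uj, lj + (b - alpha_min) / aj)
            for aj, lj, uj in zip(a, l, u)]

# 2*x1 + x2 <= 6 with 0 <= x1, x2 <= 10: alpha_min = 0, so x1 <= 3 and x2 <= 6.
```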
Coefficient tightening
Different from bound tightening, coefficient tightening techniques aim at improving the strength
of existing constraints. The simplest form consists of the following. Let a_j > 0 and x_j ∈ {0, 1} be such that αmax − a_j < b. If such a coefficient is identified, then the constraint

a_j x_j + Σ_{j′ : j′ ≠ j} a_{j′} x_{j′} ≤ b

can be replaced by

(αmax − b) x_j + Σ_{j′ : j′ ≠ j} a_{j′} x_{j′} ≤ αmax − a_j.

Notice that the modified constraint is valid for the original integer set while dominating the original constraint (cf. Definition 10.7), since, on the left-hand side, we have that αmax − b < a_j and, on the right-hand side, we have αmax − a_j < b.
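The replacement rule can be sketched as follows, assuming binary variables and nonnegative coefficients (the function and instance are illustrative, not from the text):

```python
def tighten_coefficient(a, b, j):
    """For binary x with a >= 0: if a_max - a[j] < b < a_max, replace
    (a_j, b) by (a_max - b, a_max - a_j); otherwise return (a, b) unchanged."""
    a_max = sum(a)  # maximum activity for binary variables with a >= 0
    if a_max - a[j] < b < a_max:
        new_a = a.copy()
        new_a[j] = a_max - b
        return new_a, a_max - a[j]
    return a, b

# 3*x1 + 2*x2 <= 4 tightens (on j = 0) to 1*x1 + 2*x2 <= 2: both constraints
# admit the same binary points, but the latter dominates the former.
```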
Other methods
There is a wide range of methods employed as preprocessing, and they vary greatly among different
solvers and even different modelling languages. Thus, compiling an exhaustive list is no trivial feat.
Some other common methods that are employed include:
• Merge of parallel rows and columns: methods implemented to identify pairs of rows
(constraints) and columns (variables) with constant proportionality coefficient (i.e., are lin-
early dependent) and merge them into a single entity, thus reducing the size of the model.
• Domination tests between constraints: heuristics that test whether domination between
selected constraints can be asserted so that some constraints can be deemed redundant and
removed.
• Clique merging: a clique is a subset of vertices of a graph that are fully connected. Assume
that xj ∈ {0, 1} for j ∈ {1, 2, 3} and that the three constraints hold:
x1 + x2 ≤ 1
x1 + x3 ≤ 1
x2 + x3 ≤ 1.
Then, one can think of these constraints forming a clique between the imaginary nodes 1, 2,
and 3, which renders the clique cut x1 + x2 + x3 ≤ 1. Many other similar ideas using this
graph representation of implication constraints, known as conflict graphs, are implemented in
presolvers to generate valid inequalities for MIPs.
Some final remarks are worth making. Most solvers might, at some point in the process of solving the MIP, refer to something called a restart, which consists of reapplying some or all of the techniques
associated with the preprocessing phase after a few iterations of the branch-and-cut process. This
can be beneficial since in the solution process new constraints are generated (cuts) which might
lead to new opportunities for reducing the problem or further tightening bounds.
In addition, conflict graphs can contain information that can be exploited in a new round of preprocessing and transmitted across the whole search tree, a process known as propagation. Conflict graphs
and propagation are techniques originally devised for constraint programming and satisfiability
(SAT) problems, but have made considerable inroads in MIP solvers as well.
and Chvátal-Gomory cuts for pure integer constraints, which are dependent on the heuristic or
process used to define the values of the multipliers u in
Σ_{j=1}^{n} ⌊uA_j⌋ x_j ≤ ⌊ub⌋.

For example, zero-half cuts (with u ∈ {0, 1/2}) and mod-k cuts (u ∈ {0, 1/k, . . . , (k − 1)/k}) are
available in most solvers. Other cuts such as knapsack (or cover) inequalities, mixed-integer rounding, and clique cuts are also common and, although they can be shown to be related to the Chvátal-Gomory procedure, are generated by means of heuristics (normally referred to as cut generation procedures).
Maximum infeasibility

The branching strategy called maximum infeasibility consists of choosing the variable with the fractional part as close to 0.5 as possible or, more precisely, selecting the variable j ∈ {1, . . . , n} as

argmax_{j∈{1,...,n}} min {f_j, 1 − f_j},

where f_j = x_j − ⌊x_j⌋. This, in effect, tries to reduce as much as possible the infeasibility of the LP relaxation solution, which in turn would more quickly lead to a feasible (i.e., integer) solution. An analogous form called minimum infeasibility is often available and, as the name suggests, focuses on selecting the variables that are closer to being integer-valued.
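The rule can be sketched in a few lines (a hypothetical helper, not solver code):

```python
import math

def most_fractional(x_lp):
    """Index of the variable whose fractional part is closest to 0.5
    (maximum infeasibility branching)."""
    def score(j):
        f = x_lp[j] - math.floor(x_lp[j])
        return min(f, 1 - f)
    return max(range(len(x_lp)), key=score)

# x_lp = (1.9, 2.5, 3.2): fractional parts 0.9, 0.5, 0.2, scores 0.1, 0.5, 0.2,
# so we branch on the second variable (index 1).
```

Minimum infeasibility is the same computation with `min` in place of the outer `max`.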
Strong branching
Strong branching can be understood as an explicit look-ahead strategy. That is, to decide which
variable to branch on, the method performs branching on all possible variables, and chooses the
one which provides the best improvement on the dual (LP relaxation) bound. Specifically, for
each fractional variable xj , we can solve the LP relaxations corresponding to branching options
x_j ≤ ⌊x_j^LP⌋ and x_j ≥ ⌈x_j^LP⌉ to obtain LP relaxation objective values Z_j^D and Z_j^U, respectively. We then choose the fractional variable x_j that leads to subproblems with the best LP relaxation objective values, defined as

argmax_{j∈{1,...,n}} min {Z_j^D, Z_j^U},

assuming a minimisation problem. Strong branching thus compares the worst LP relaxation bound for each fractional x_j and picks the fractional variable for which this value is the best.
As one might suspect, there is a trade-off between the reduction in the number of nodes explored, given by the more prominent improvement of the dual bound, and how computationally intensive the method is. There are, however, ideas that can exploit this trade-off more efficiently.
First, the solution of the subproblems might yield information related to infeasibility and pruning
by limit, which can be used in favour of the method.
Another idea is to limit the number of simplex iterations performed when solving the subproblems
associated with each branching option. This allows for using an approximate solution of the
subproblems and potential savings in computational efforts. Some solvers offer a parameter that
allows the user to set this iteration limit value.
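A sketch of the selection rule, with the two child LP solves abstracted behind a user-supplied callback (all names and the toy bounds below are ours, not from the text):

```python
def strong_branching(fractional, child_bound):
    """Pick the fractional variable maximising the worse of its two child
    dual bounds (minimisation). `child_bound(j, d)` stands in for solving
    the child LP obtained by branching variable j in direction d."""
    return max(fractional,
               key=lambda j: min(child_bound(j, "down"), child_bound(j, "up")))

# Toy, precomputed child bounds instead of actual LP solves:
bounds = {(1, "down"): 10.0, (1, "up"): 4.0, (2, "down"): 8.0, (2, "up"): 7.0}
chosen = strong_branching([1, 2], lambda j, d: bounds[j, d])
# min child bound is 4.0 for variable 1 and 7.0 for variable 2, so chosen = 2
```

In a real solver, the callback would run a (possibly iteration-limited) dual simplex solve, as discussed above.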
172 Chapter 11. Mixed-integer Programming Solvers
Pseudo-cost branching
Pseudo-cost branching relies on the idea of using past information from the search process to
estimate gains from branching on a specific variable. Because of this reliance on past information,
the method tends to be more reliable later in the search tree, where more information has been
accumulated on the impact of choosing a variable for branching.
These improvement estimates are the so-called pseudo-costs, which compile an estimate of how much the dual (LP relaxation) bound has improved per fractional unit removed of the variable chosen for branching. More specifically, let

f_j^− = x_j^LP − ⌊x_j^LP⌋ and f_j^+ = ⌈x_j^LP⌉ − x_j^LP.    (11.2)
where α ∈ [0, 1]. Setting the value of α trades off two aspects. Assume a maximisation problem.
Then, setting α closer to zero will slow down degradation, which refers to the decrease of the
upper bound (notice that the dual bound is decreasing and thus, ∆+ and ∆− are negative). This
strategy improves the chances of finding a good feasible solution on the given branch, in turn potentially fostering pruning by bound. In contrast, setting α closer to one increases the rate of
decrease (improvement) of the dual bound, which can be helpful for fostering pruning once a good
global primal bound is available. Some solvers allow for considering alternative branching score
functions.
As one might suspect, it might take several iterations before reliable estimates Ψ+ and Ψ− are
available. The issue with unreliable pseudo-costs can be alleviated with the use of a hybrid strategy
known as reliability branching1. In this strategy, variables deemed unreliable for not having been selected for branching a minimum number of times (typically η ∈ [4, 8]) have strong branching employed instead.
GUB branching
are referred to as special ordered sets of type 1 (or SOS1) which, under the assumption that x_j ∈ {0, 1}, ∀j ∈ {1, . . . , k}, implies that only one variable can take a value different from zero. Notice that you may have SOS1 sets involving continuous variables, which, in turn, would require the use of binary variables to be modelled appropriately.
1 Tobias Achterberg, Thorsten Koch, and Alexander Martin (2005), Branching rules revisited, Operations Research Letters.
Branching on these variables might lead to unbalanced branch-and-bound trees. This is because
the branch in which xj is set to a value different than zero, immediately defines the other variables
to be zero, leading to an early pruning by optimality or infeasibility. In turn, unbalanced trees are undesirable since they preclude the possibility of parallelisation and might lead to issues related to searches that focus on finding leaf nodes quickly.
To remedy this, the idea of using a generalised upper bound is employed, leading to what is
referred to as GUB branching (with some authors referring to this as SOS1 branching). A gener-
alised upper bound is an upper bound imposed on the sum of several variables. In GUB branching,
branching for binary variables is imposed considering the following rule:
can also benefit from the use of GUB branching, with the term GUB being perhaps better suited
in this case.
You might notice that points 1 and 2 are conflicting: while the former would benefit from a breadth-focused search (that is, having wider trees earlier is preferable to deeper trees), the latter would benefit from searches that dive deeply into the tree looking for leaf nodes. Points 3 and 4 pose exactly the same dilemma: the former benefits from breadth-focused searches while the latter benefits from depth-focused searches.
The main strategies for node selection are, to a large extent, ideas to emphasise each or a combi-
nation of the above.
A depth-first search focuses on diving down into the search tree, prioritising nodes at lower levels. It has the effect of increasing the chances of finding leaves, and potentially primal feasible solutions, earlier. Furthermore, because the problems being successively solved are very similar, differing only by one additional branching constraint, the dual simplex method can be efficiently warm-started, and often fewer iterations are needed to find the optimum of each child's subproblem relaxation. On the other hand, as a consequence of the way the search is carried out, it is slower in populating the list of subproblems.
In contrast, breadth-first search gives priority to nodes at higher levels in the tree, ultimately
causing a horizontal spread of the search tree. This has as a consequence a faster improvement of
the dual bound, at the expense of potentially delaying the generation of primal feasible solutions.
This also generates more subproblems quickly, fostering diversification (more subproblems and po-
tentially more information to be re-utilised in repeated rounds of preprocessing) and opportunities
for parallelisation.
Best bound
Best bound consists of the strategy of choosing the next node to be solved by selecting that with
the best dual (LP relaxation) bound. It leads to a breadth-first search pattern, but with the
flexibility to allow potentially good nodes that are in deeper levels of the tree to be selected.
Ultimately, this strategy fosters a faster improvement of the dual bound, but with a higher overhead
on the set-up of the subproblems, since they can be quite different from each other in terms of their
constraints. One way to mitigate this overhead is to perform diving sporadically, which consists of,
after choosing a node by best bound, temporarily switching to a depth-first search for a few iterations.
Best estimate

This strategy uses estimates similar to the pseudo-costs employed for choosing which variable to branch on. However, instead of focusing on objective function values alone, it uses estimates of the node's progress towards feasibility relative to its bound degradation.
To see how this works, assume that the parent node has been solved, and a dual bound z_D is available. Now, using our estimates in (11.3), we can calculate an estimate of the potential primal feasible solution value E, given by

E = z_D + Σ_{j=1}^{n} min {Δ_j^−, Δ_j^+}.
The expression E is an estimate of what is the best possible value an integer solution could have
if it were to be generated by rounding the LP relaxation solution. These estimates can also take
into account the feasibility per se, trying to estimate feasibility probabilities considering known
feasible solutions and how fractional the subproblem LP relaxation solution is.
Another common idea consists of selecting the variables to be rounded by considering a reference
solution, which often is an incumbent primal feasible solution. This guided dive is then performed
by choosing the variable with the smallest fractional value when compared against this reference
solution.
A third idea consists of taking into account the number of locks associated with each variable. The locks refer to the number of constraints that are potentially made infeasible by rounding a variable up or down. This potential infeasibility stems from taking into account the coefficient of the variable, the type of constraint, and whether rounding it up or down can potentially cause infeasibility. This is referred to as coefficient diving.
The relaxation-induced neighbourhood search (or RINS) is possibly the most common improvement
heuristic available in modern implementations2. The heuristic tries to find solutions that balance proximity to the current LP solution, hoping this improves the solution quality, and proximity to an incumbent (i.e., best-known primal feasible) solution, emphasising feasibility.
In a nutshell, the method consists of the following. After solving the LP relaxation of a node,
suppose we obtain the solution xLP. Also, assume we have at hand an incumbent (integer feasible) solution x̄. Then, we form an auxiliary MIP problem in which we fix all variables coinciding between xLP and x̄. This can be achieved by using the constraints

x_j = x̄_j, ∀j ∈ {1, . . . , n} : x̄_j = x_j^LP,
which, in effect, fixes these variables to integer values and removes them from the problem, as they
can be converted to parameters (or input data). Notice that this constrains the feasible space to
be in the (potentially large) neighbourhood of the incumbent solution. In later iterations, when
more of the components of the relaxation solutions xLP are integer, this becomes a more localised
search, with fewer degrees of freedom. Finally, this additional MIP is solved and, in case an optimal
solution is found, a new incumbent solution might become available.
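The variable-fixing step at the core of RINS is simple to express; a minimal sketch with illustrative names:

```python
def rins_fixings(x_lp, x_inc, tol=1e-6):
    """Variables to fix in the RINS sub-MIP: those on which the LP relaxation
    solution and the incumbent coincide (returned as {index: fixed value})."""
    return {j: x_inc[j] for j in range(len(x_lp))
            if abs(x_lp[j] - x_inc[j]) <= tol}

# x_lp = (1.0, 0.4, 0.0) against incumbent (1, 1, 0): fix x1 = 1 and x3 = 0,
# leaving only the disagreeing variable free in the sub-MIP.
```

The resulting dictionary would then be imposed as fixing constraints on the auxiliary MIP before handing it to the solver.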
In contrast, relaxation-enforced neighbourhood search (or RENS) is a constructive heuristic, which
has not yet seen a wider introduction in commercial-grade solvers, though it is available in CBC
and SCIP3 .
The main differences between RINS and RENS are the fact that no incumbent solution is considered
(hence the dropping of the term “induced”) but rather the LP relaxation solution xLP fully defines
the neighbourhood (explaining the name “enforced”).
2 Emilie Danna, Edward Rothberg, and Claude Le Pape (2005), Exploring relaxation induced neighborhoods to improve MIP solutions, Mathematical Programming.
Once again, let us assume we obtain the solution xLP. And again, we fix all integer-valued variables in xLP, forming a large neighbourhood

x_j = x_j^LP, ∀j ∈ {1, . . . , n} : x_j^LP ∈ Z.

One key difference is how the remaining variables are treated. For those components that are fractional, the following bounds are imposed

⌊x_j^LP⌋ ≤ x_j ≤ ⌈x_j^LP⌉, ∀j ∈ {1, . . . , n} : x_j^LP ∉ Z.
Notice that this in effect makes the neighbourhood considerably smaller around the solution xLP .
Then, the MIP subproblem with all these additional constraints is solved and a new incumbent
solution may be found.
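The construction of the RENS neighbourhood can be sketched as follows (an illustrative helper, not solver code):

```python
import math

def rens_bounds(x_lp, tol=1e-6):
    """RENS sub-MIP variable bounds: integer-valued components of x_lp are
    fixed, fractional ones are restricted to their floor and ceiling."""
    lower, upper = [], []
    for xj in x_lp:
        if abs(xj - round(xj)) <= tol:        # integer-valued: fix it
            lower.append(round(xj)); upper.append(round(xj))
        else:                                  # fractional: floor/ceiling box
            lower.append(math.floor(xj)); upper.append(math.ceil(xj))
    return lower, upper

# x_lp = (2.0, 0.3): x1 is fixed to 2, while 0 <= x2 <= 1.
```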
Local branching
The idea of local branching is to allow the search to be performed in a neighbourhood of controlled
size, which is achieved by the use of an L1-norm4. The size of the neighbourhood is controlled by a divergence parameter Δ which, in the case of binary variables, amounts to the Hamming distance between the variable vectors.
In its most simple form, it can be seen as the following idea. From an incumbent solution x̄, one can generate and impose the following neighbourhood-inducing constraint

Σ_{j=1}^{n} |x_j − x̄_j| ≤ Δ.
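For binary variables, |x_j − x̄_j| equals x_j when x̄_j = 0 and 1 − x_j when x̄_j = 1, so the constraint linearises directly; a small sketch (names ours):

```python
def local_branching_lhs(x, x_bar):
    """Left-hand side of the local branching constraint for binary vectors:
    sum_j |x_j - xbar_j|, which linearises to
    sum_{j: xbar_j = 0} x_j + sum_{j: xbar_j = 1} (1 - x_j)."""
    return sum(xj if xbj == 0 else 1 - xj for xj, xbj in zip(x, x_bar))

# Around the incumbent (1, 0, 1), the point (1, 1, 1) has distance 1 and so
# remains in the neighbourhood for any Delta >= 1; (0, 1, 0) has distance 3.
```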
Feasibility pump
Feasibility pump is a constructive heuristic that, contrary to the previous heuristics, has made
inroads in most professional-grade solvers and is often employed by default. The focus is exclusively
on trying to find a first primal feasible solution. The idea consists of, from the LP relaxation
solution xLP , performing alternate steps of rounding and solving a projection step, which happens
to be an LP problem.
Starting from xLP, the method begins by simply rounding its components to obtain an integer solution x̄. If this rounded solution is feasible, the algorithm terminates. Otherwise, we perform a projection step by replacing the LP relaxation objective function with

f^aux(x) = Σ_{j=1}^{n} |x_j − x̄_j|
and re-solving it. This is called a projection because it effectively finds the point in the feasible region of the LP relaxation that is closest to the integer solution x̄. This new solution xLP is once again rounded, and the process repeats until a feasible integer solution is found.
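For binary variables, the projection objective linearises in the same way as the local branching constraint; the sketch below builds its coefficients (the function name and data are ours):

```python
def projection_objective(x_bar):
    """Linear coefficients (c, constant) such that, for binary x,
    sum_j |x_j - xbar_j| = c^T x + constant, using |x_j - 0| = x_j and
    |x_j - 1| = 1 - x_j for a rounded binary point x_bar."""
    c = [1 if xbj == 0 else -1 for xbj in x_bar]
    return c, sum(x_bar)

# x_bar = (1, 0): the distance is -x1 + x2 + 1, e.g. 0 at (1, 0), 2 at (0, 1).
```

Minimising this linear function over the LP relaxation (with an LP solver) yields the projection step; the constant is irrelevant for the argmin.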
4 Matteo Fischetti and Andrea Lodi (2003), Local branching, Mathematical Programming
It is known that the feasibility pump can suffer from cycling, that is, repeatedly finding the same xLP and x̄ solutions. This can be alleviated by performing random perturbations on some of the components of x̄.
Feasibility pump is an extremely powerful method and plays a central role in many professional-
grade solvers. It is also useful in the context of mixed-integer nonlinear programming models.
More recently, variants have been developed5 that take into account the quality of the projection (i.e., also considering the original objective function) and discuss theoretical properties of the method and its convergence guarantees.
5 Timo Berthold, Andrea Lodi, and Domenico Salvagnin (2018), Ten years of feasibility pump, and counting,
11.7 Exercises
Problem 11.1: Preprocessing and primal heuristics
(a) Tightening bounds and redundant constraints
where fj is the cost of opening facility j, and cij is the cost of satisfying client i’s demand
from facility j. Consider an instance of the UFL with opening costs f = (21, 16, 30, 24, 11)
and client costs
(c_ij) =
  6   9   3   4  12
  1   2   4   9   2
 15   2   6   3  18
  9  23   4   8   1
  7  11   2   5  14
  4   3  10  11   3
In the UFL problem, the facility production capacities are assumed to be large, and there is
no budget constraint on how many facilities can be built. The problem thus has a feasible so-
lution if at least one facility is opened. We choose an initial feasible solution ȳ = (0, 1, 0, 0, 0).
Try to improve the solution by using relaxation-induced neighbourhood search (RINS) and
construct a feasible solution using relaxation-enforced neighbourhood search (RENS).
• To obtain a good start for branching, use strong branching at the root node. All following
branching variables should be selected using maximum infeasibility.
• For node selection, use the best bound selection to minimise the number of evaluated nodes.
Chapter 12
Decomposition Methods
(P) : min. {c⊤x : x ∈ X},

where X = ∩_{k=1}^{K} X_k, for some K > 0, and

X_k = {x_k ∈ R_+^{n_k} : D_k x_k = d_k}, ∀k ∈ {1, . . . , K}.
That is, X is the intersection of K standard-form polyhedral sets. Our objective is to devise a way to break P into K separable parts that can be solved separately and recombined into a solution for P.
In this case, this can be straightforwardly achieved by noticing that P can be equivalently stated
as
(P) : min.  c_1⊤x_1 + · · · + c_K⊤x_K
      s.t.: D_1x_1 = d_1
                ⋱
            D_Kx_K = d_K
            x_1, . . . , x_K ≥ 0
has a structure that immediately allows for separation. That is, P could be solved as K independent
problems
(P_k) : min. {c_k⊤x_k : x_k ∈ X_k}

in parallel, and then combine their individual solutions into a solution for P simply by making x = [x_k]_{k=1}^{K} and c⊤x = Σ_{k=1}^{K} c_k⊤x_k. Notice that, if we were to assume that the solution time
scales linearly (it does not; it grows faster than linear) and K = 10, then solving P as K separated
problems would be ten times faster (that is not true; there are bottlenecks and other considerations
to take into account, but the point stands).
Unfortunately, complicating structures often compromise this natural separability, preventing one
from being able to directly exploit this idea. Specifically, two types of complicating structures
can be observed. The first is in the form of a complicating constraint. That is, we observe that a constraint is such that it connects variables from (some of) the subsets X_k. In this case, we would notice that P has an additional constraint of the form
A1 x1 + · · · + AK xK = b,
leading to the problem

(P′) : min.  c_1⊤x_1 + · · · + c_K⊤x_K
       s.t.: A_1x_1 + · · · + A_Kx_K = b
             D_1x_1 = d_1
                 ⋱
             D_Kx_K = d_K
             x_1, . . . , x_K ≥ 0.
The other type of complicating structure is the case in which the same set of decision variables is present in multiple constraints, or multiple subsets X_k. In this case, we observe that variables of a subproblem k ∈ {1, . . . , K} have nonzero coefficients in another subproblem k′ ≠ k, k′ ∈ {1, . . . , K}. Hence, problem P takes the form of
(P′′) : min.  c_0⊤x_0 + c_1⊤x_1 + · · · + c_K⊤x_K
        s.t.: A_1x_0 + D_1x_1 = d_1
               ⋮                ⋱
              A_Kx_0 + D_Kx_K = d_K
              x_0, x_1, . . . , x_K ≥ 0.
The challenging aspect is that a specific method becomes more suitable depending on the com-
plicating structure. Therefore, being able to identify these structures is one of the key success
factors in terms of the chosen method’s performance. As a general rule, problems with complicat-
ing constraints (as P ′ ) are suitable to be solved by a delayed variable generation method such as
column generation. Analogously, problems with complicating variables (P ′′ ) are better suited for
employing delayed constraint generation methods such as Benders decomposition.
The development of professional-grade code employing decomposition methods is a somewhat
recent occurrence. The commercial solver CPLEX offers a Benders decomposition implementation
that requires the user to specify the separable structure. On the other hand, although there are
some available frameworks for implementing column generation-based methods, these tend to be
more ad hoc occurrences yet often reap impressive results.
Theorem 12.1 has an important consequence: it states that a bounded polyhedron, i.e., a polyhedral set that has no extreme rays, can be represented by the convex hull of its extreme points. For now, let us look at an example that illustrates the concept.
Consider the polyhedral set P given by

P = {(x_1, x_2) : x_1 − x_2 ≥ −2, x_1 + x_2 ≥ 1, x_1, x_2 ≥ 0}.

Its recession cone is

C = {(d_1, d_2) ∈ R² : 0 ≤ d_2 ≤ d_1}.

We can then conclude that the two vectors w¹ = (1, 1) and w² = (1, 0) are extreme rays of P. Moreover, P has three extreme points: x¹ = (0, 2), x² = (0, 1), and x³ = (1, 0).
Figure 12.1 illustrates what is stated in Theorem 12.1. For example, a representation for the point y = (2, 2) ∈ P is given by

y = (2, 2) = (0, 1) + (1, 1) + (1, 0),

that is, y = x² + w¹ + w². Notice, however, that y could also be represented as

y = (2, 2) = (1/2)(0, 1) + (1/2)(1, 0) + (3/2)(1, 1),

with then y = (1/2)x² + (1/2)x³ + (3/2)w¹. Notice that this implies that the representation of each point is not unique.
Figure 12.1: Example showing that every point of P = {(x_1, x_2) : x_1 − x_2 ≥ −2, x_1 + x_2 ≥ 1, x_1, x_2 ≥ 0} can be represented as a convex combination of its extreme points plus a conic combination of its extreme rays.
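The two representations of y = (2, 2) can be verified numerically; a small sketch:

```python
x2, x3 = (0, 1), (1, 0)   # extreme points used in the two representations
w1, w2 = (1, 1), (1, 0)   # extreme rays of P

def combine(terms):
    """Componentwise sum of (coefficient, vector) pairs."""
    return tuple(sum(c * v[i] for c, v in terms) for i in range(2))

y1 = combine([(1, x2), (1, w1), (1, w2)])        # x2 + w1 + w2
y2 = combine([(0.5, x2), (0.5, x3), (1.5, w1)])  # x2/2 + x3/2 + 3*w1/2
# both recover the point y = (2, 2)
```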
Notice that P has a complicating constraint structure, due to the constraint Σ_{k=1}^{K} A_k x_k = b. In order to devise a decomposition method for this setting, let us first assume that we have available, for each of the sets P_k, k ∈ {1, . . . , K}: (i) all extreme points, represented by x_k^j, ∀j ∈ J_k; and (ii) all extreme rays w_k^r, ∀r ∈ R_k. As one might suspect, this is in principle a demanding assumption, but one that we will be able to drop later on.
Using the Resolution theorem (Theorem 12.1), we know that any element of P_k can be represented as

x_k = Σ_{j∈J_k} λ_k^j x_k^j + Σ_{r∈R_k} θ_k^r w_k^r,    (12.1)

where λ_k^j ≥ 0, ∀j ∈ J_k, are the coefficients of the convex combination of extreme points, meaning that we also observe Σ_{j∈J_k} λ_k^j = 1, and θ_k^r ≥ 0, ∀r ∈ R_k, are the coefficients of the conic combination of the extreme rays.
Using the identity represented in (12.1), we can reformulate P into the main problem P_M as follows:

(P_M) : min.  Σ_{k=1}^{K} ( Σ_{j∈J_k} λ_k^j c_k⊤x_k^j + Σ_{r∈R_k} θ_k^r c_k⊤w_k^r )
        s.t.: Σ_{k=1}^{K} ( Σ_{j∈J_k} λ_k^j A_k x_k^j + Σ_{r∈R_k} θ_k^r A_k w_k^r ) = b    (12.2)
              Σ_{j∈J_k} λ_k^j = 1, ∀k ∈ {1, . . . , K}    (12.3)
              λ_k^j ≥ 0, ∀j ∈ J_k, θ_k^r ≥ 0, ∀r ∈ R_k, ∀k ∈ {1, . . . , K}.

In this formulation, the column associated with λ_k^j is [A_k x_k^j ; e_k] and that associated with θ_k^r is [A_k w_k^r ; 0], where e_k is the unit vector (i.e., with 1 in the k-th component, and 0 otherwise). Notice that P_M has as many variables as the number of extreme points and extreme rays of P, which is likely to be prohibitively large.
However, we can still solve it if we use a slightly modified version of the revised simplex method.
To see that, let us consider that b is an m-dimensional vector. Then, a basis for P_M would be of size m + K, since we have the original m constraints plus one convex combination constraint arising from each subproblem k ∈ {1, . . . , K}. This means that we are effectively working with (m + K) × (m + K)
matrices, i.e., the basic matrix B and its inverse B −1 . Another element we need is the vector of
simplex multipliers p, which is a vector of dimension (m + K).
The challenge with the representation adopted in P_M arises when we are required to calculate the reduced costs of all the nonbasic variables, since this is the critical step for its tractability. That is where the method provides a clever solution. To see that, notice that the vector p is formed by the components p⊤ = (q, r_1, . . . , r_K)⊤, where q represents the m dual variables associated with (12.2), and r_k, ∀k ∈ {1, . . . , K}, are the dual variables associated with (12.3).
The reduced cost associated with an extreme-point variable λ_k^j, j ∈ J_k, is given by

c_k⊤x_k^j − [q⊤ r_1 · · · r_K] [A_k x_k^j ; e_k] = (c_k⊤ − q⊤A_k) x_k^j − r_k.    (12.4)
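Expression (12.4) is a plain inner-product computation; a minimal sketch with a toy instance (all names and data ours):

```python
def reduced_cost(ck, xkj, q, Ak, rk):
    """Reduced cost (c_k^T - q^T A_k) x_k^j - r_k of the extreme-point
    variable lambda_k^j, with Ak given as a row-major list of lists."""
    qAk = [sum(q[i] * Ak[i][j] for i in range(len(q)))
           for j in range(len(ck))]
    return sum((ck[j] - qAk[j]) * xkj[j] for j in range(len(ck))) - rk

# ck = (1, 2), x_k^j = (1, 1), q = (0.5,), Ak = [[1, 1]], rk = 1:
# (1 - 0.5) + (2 - 0.5) - 1 = 1.0, a nonnegative reduced cost.
```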
The main difference is how we assess the reduced costs of the non-basic variables. Instead of
explicitly calculating the reduced costs of all variables, we instead rely on an optimisation-based
approach to consider them only implicitly. For that, we can use the subproblem
(S_k) : c̄_k = min. { (c_k⊤ − q⊤A_k) x_k : x_k ∈ P_k },
which can be solved in parallel for each subproblem k ∈ {1, . . . , K}. The subproblem Sk is known
as the pricing problem. For each subproblem k = 1, . . . , K, we have the following cases.
We might observe that c̄_k = −∞. In this case, we have found an extreme ray w_k^r satisfying (c_k⊤ − q⊤A_k) w_k^r < 0. Thus, the reduced cost of the associated extreme-ray variable θ_k^r is negative. If that is the case, we must generate the column

[A_k w_k^r ; 0]
Solving P directly requires a memory space that is O((m + K × m_0)²), where m_0 is the number of rows of D_k, ∀k ∈ {1, . . . , K}. This is essentially the size of the inverse basic matrix B⁻¹. In contrast, the Dantzig-Wolfe reformulation requires O((m + K)²) + K × O(m_0²) of memory space, with the first term referring to the main problem's inverse basic matrix and the second to the pricing problems' basic matrices. For example, for a problem in which m = m_0, both much larger than, say, K = 10, this implies that the memory space required by the Dantzig-Wolfe reformulation is roughly K times smaller, which can substantially enlarge the range of large-scale problems that can be solved for the same amount of computational resources available.
5:  x_k^{*l} ← argmin { c_k⊤x_k − q^{*l⊤}(A_k x_k) − r_k^{*l} : x_k ∈ P_k }
6:  if c̄_k = c_k⊤x_k^{*l} − q^{*l⊤}(A_k x_k^{*l}) < r_k^{*l} then X̃_k ← X̃_k ∪ {x_k^{*l}}
7:  end if
8:  end for
9:  l ← l + 1.
10: until c̄_k ≥ r_k^{*l} for all k ∈ {1, . . . , K}
11: return λ^{*l}.

Notice that the unbounded case is not treated, to simplify the pseudocode, but the method could be trivially adapted to return extreme rays to be used in P_M, like the previous variant presented in Algorithm 7. Also, notice that the method is assumed to be initialised with a collection of columns (i.e., extreme points) X̃_k, which can normally be obtained from inspection or using a heuristic method.
We finalise by showing that the Dantzig-Wolfe and column generation methods can provide information related to their own convergence. This means that we have access to an optimality bound that can be used to monitor the convergence of the method and allow for a preemptive termination given an acceptable tolerance. This bounding property is stated in Theorem 12.2.
Theorem 12.2. Suppose P is feasible with finite optimal value z. Let z̄ be the optimal cost associated with P_M at a given iteration l of the Dantzig-Wolfe method. Also, let r_k be the dual variable associated with the convex combination constraint of the k-th subproblem and z_k its optimal cost. Then

z̄ + Σ_{k=1}^{K} (z_k − r_k) ≤ z ≤ z̄.
Proof. We know that z ≤ z̄, because a solution for P_M is primal feasible and thus feasible for P. Now, consider the dual of P_M:

(D_M) : max.  q⊤b + Σ_{k=1}^{K} r_k
        s.t.: q⊤A_k x_k^j + r_k ≤ c_k⊤x_k^j, ∀j ∈ J_k, ∀k ∈ {1, . . . , K}
              q⊤A_k w_k^r ≤ c_k⊤w_k^r, ∀r ∈ R_k, ∀k ∈ {1, . . . , K}.

We know that strong duality holds, and thus z̄ = q⊤b + Σ_{k=1}^{K} r_k for the optimal dual variables (q, r_1, . . . , r_K). Now, since the z_k are finite, we have min_{j∈J_k} (c_k⊤x_k^j − q⊤A_k x_k^j) = z_k and min_{r∈R_k} (c_k⊤w_k^r − q⊤A_k w_k^r) ≥ 0, meaning that (q, z_1, . . . , z_K) is feasible for D_M. By weak duality, we have that

z ≥ q⊤b + Σ_{k=1}^{K} z_k = q⊤b + Σ_{k=1}^{K} r_k + Σ_{k=1}^{K} (z_k − r_k) = z̄ + Σ_{k=1}^{K} (z_k − r_k).
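The bound of Theorem 12.2 is cheap to evaluate at each iteration; a minimal sketch (names and numbers ours):

```python
def dantzig_wolfe_bounds(z_bar, z_sub, r_dual):
    """Bounds on the optimum z of P from Theorem 12.2:
    z_bar + sum_k (z_k - r_k) <= z <= z_bar."""
    lower = z_bar + sum(zk - rk for zk, rk in zip(z_sub, r_dual))
    return lower, z_bar

# Current master cost z_bar = 10 with pricing optima (1, 2) against duals
# (2, 3): the optimum of P is sandwiched in [8, 10], a gap one can monitor
# to terminate early within an acceptable tolerance.
```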
12.3 Benders decomposition

Let the polyhedral set

P(b) = {x ∈ R^n : Ax = b, x ≥ 0}

be defined as a function of the input vector b ∈ R^m. That is, P(b) is the set of feasible solutions
when the right-hand side is set to b. Then, let S = {b ∈ Rm : P (b) ̸= ∅} be the set of all vectors b
for which P has at least one feasible solution. Being so, we can restate the set S as
S = {Ax : x ≥ 0} .
That is, the set S is formed by the conic combinations of the columns of A (or, in other words, by the right-hand sides b for which P(b) is feasible). Also, notice that this is a convex set, as discussed in
Section 6.2.1. For any b ∈ S, we can define the function

F(b) = min. {c⊤ x : x ∈ P(b)},

which takes as an argument b and returns the optimal value of the parametrised optimisation problem. Notice that evaluating F requires that an optimisation problem is solved.
Now, let us assume that the dual feasibility set
SD = {p ∈ Rm : p⊤ A ≤ c}
is not empty. That implies that F (b) is finite for every b ∈ S since different b’s simply imply that
different objective functions p⊤ b over SD are considered. Our main objective here is to understand
the structure of the function F : S → R.
Let b̄ ∈ S be a particular right-hand side vector. Suppose that there exists a nondegenerate optimal BFS xB with basis B. Then, we have that xB = B−1 b̄, F(b̄) = cB⊤ xB = cB⊤ B−1 b̄, and that all reduced costs are nonnegative.
Now, if we change b̄ to b such that the difference is sufficiently small, B−1 b remains positive and, consequently, xB remains a basic feasible solution. Furthermore, since b does not affect the reduced costs (recall that our optimality condition is given by c̄⊤ = c⊤ − cB⊤ B−1 A ≥ 0), they remain nonnegative. Notice that this is the same argument we used in Section 6.1.3 when we discussed changes in the input data in the context of sensitivity analysis.
The optimal value F(b) associated with this new b sufficiently close to b̄ satisfies

F(b) − F(b̄) = cB⊤ B−1 (b − b̄) = p⊤ (b − b̄),

where p⊤ = cB⊤ B−1 is an optimal solution to the dual problem

max. p⊤ b
s.t.: p⊤ A ≤ c.
This allows us to observe two important characteristics of F. First, in the vicinity of b̄, F(b) is a linear function of b. Also, p represents a gradient of F. This, again, is an alternative view of the conclusions we drew earlier in Section 6.1.3, and, in a way, further strengthens the idea of the optimal dual variables p representing marginal values regarding changes in the components of b.
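This local linearity can be checked numerically. The sketch below uses an invented 2×2 basis and cost vector, and assumes the perturbation of b is small enough for the basis B to remain optimal:

```python
# Numeric sketch (invented basis and costs) of the local linearity of F:
# around b, F(b') - F(b) = p^T (b' - b), where p^T = c_B^T B^{-1}.

def solve2(M, rhs):
    """Solve the 2x2 linear system M x = rhs by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]

B = [[2.0, 1.0], [1.0, 3.0]]    # basic columns of A (invented)
c_B = [4.0, 1.0]                # corresponding cost coefficients
b = [5.0, 6.0]
b_new = [5.2, 5.9]              # small change: B^{-1} b_new stays positive

def F(rhs):                     # F(b) = c_B^T B^{-1} b while B stays optimal
    return sum(cb * xb for cb, xb in zip(c_B, solve2(B, rhs)))

# p solves B^T p = c_B, i.e. p^T = c_B^T B^{-1}
p = solve2([[B[0][0], B[1][0]], [B[0][1], B[1][1]]], c_B)

delta_F = F(b_new) - F(b)
pred = sum(pi * (bn - bo) for pi, bn, bo in zip(p, b_new, b))
print(delta_F, pred)            # the two quantities coincide
```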
With the above, we can formalise an important result regarding the convexity of the function F (b).
Theorem 12.3 (Convexity of F (b)). The optimal value function F (b) is convex in b on the set S.
Proof. Let bi ∈ S and xi be their associated optimal solution, for i = 1, 2. Thus, F (bi ) = c⊤ xi , for
i = 1, 2. Let λ ∈ [0, 1]. The vector x = λx1 + (1 − λ)x2 is nonnegative, and Ax = λb1 + (1 − λ)b2 .
Thus, x is a feasible solution when b is set to λb1 + (1 − λ)b2. Therefore,

F(λb1 + (1 − λ)b2) ≤ c⊤ x = λ c⊤ x1 + (1 − λ) c⊤ x2 = λ F(b1) + (1 − λ) F(b2),

which is the definition of a convex function. The inequality is valid because x is a feasible, but not necessarily optimal, solution when b is set to λb1 + (1 − λ)b2.
Consider now the dual problem

(D) : max. p⊤ b
s.t.: p⊤ A ≤ c,
which is again assumed to be feasible. For any b in S, F(b) is finite and, by strong duality (cf. Theorem 5.3), we have that F(b) = p⊤ b for some optimal dual solution p. Because of our standing assumption that A has m linearly independent rows (and, therefore, m linearly independent columns), Theorem 3.5 guarantees that the feasible region of D has at least one extreme point.
Let us suppose that we know all the extreme points p1, . . . , pK of D. As the optimal solution for D must be an extreme point, we can redefine F as

F(b) = max_{i=1,...,K} (pi)⊤ b.
Specifically, F is the maximum of a finite collection of linear functions, and thus piecewise linear. Furthermore, within the region where F(b) is linear (that is, as we have seen before, where the change in b is such that the corresponding optimal basis B of the primal does not change), F(b) = (pi)⊤ b, where pi is the corresponding dual solution.
Finally, we must consider the lack of differentiability of F . Notice that specific values of b will
indicate the point at which the optimal basis B changes, which implies a change in p. These
“junctions” represent the points in which D has multiple (i.e., not unique) solutions, which, as we
have also seen, implies that the primal problem becomes degenerate.
Let us assume that we change b in a particular direction, i.e., b = b̄ + θd, where θ ∈ R and d ∈ Rm is a fixed direction. We can then redefine F by letting

f(θ) = F(b̄ + θd).
Figure 12.2: The optimal cost function F as a function of b in the direction d. The feasibility set SD has three extreme points p1, p2, and p3, each associated with a hyperplane (pi)⊤(b̄ + θd) which represents the optimal cost as a function of the scalar θ.

In fact, f(θ) represents a section of the function F in the direction given by the vector d, which is thus also piecewise linear and convex (and can be plotted). Figure 12.2 illustrates the function f projected onto the direction d as a function of θ.
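The setting of Figure 12.2 can be reproduced with a few lines of code. The extreme points below are invented for illustration; the sketch evaluates F as a maximum of linear functions and spot-checks the convexity of the section f(θ):

```python
# Sketch of Figure 12.2 with invented dual extreme points: when all extreme
# points of S_D are known, F(b) = max_i (p_i)^T b, and the section
# f(theta) = F(b_bar + theta d) is piecewise linear and convex in theta.

p_ext = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]   # hypothetical extreme points of S_D
b_bar = (1.0, 2.0)
d = (1.0, -1.0)

def f(theta):
    b = (b_bar[0] + theta * d[0], b_bar[1] + theta * d[1])
    return max(p[0] * b[0] + p[1] * b[1] for p in p_ext)

# A convexity spot-check along the section: f at a midpoint never
# exceeds the average of the endpoint values
mid_ok = f(0.5) <= 0.5 * (f(0.0) + f(1.0)) + 1e-12
print([f(t) for t in (-2.0, 0.0, 3.0)], mid_ok)
```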
To finalise, we return to one remaining issue associated with differentiability. Although p can be seen as a gradient of F(b) at b̄, we know that F is not differentiable everywhere. To circumvent that, we require a generalisation of the concept of gradients, which is given by the notion of subgradients.
Definition 12.4. Let F be a convex function on the convex set S, and let b̄ ∈ S. The vector p is a subgradient of F at b̄ if

F(b̄) + p⊤ (b − b̄) ≤ F(b), ∀b ∈ S. (12.8)
Figure 12.3 illustrates the notion of a subgradient. Notice that, in general, the set of subgradients is a singleton, composed only of the gradient (or normal vector) of the corresponding hyperplane, which is given by the vector p. At the nondifferentiable points (the junctions), the set of subgradients comprises the gradients of all hyperplanes lying between the two intersecting hyperplanes (or, equivalently, all convex combinations of the normal vectors of the intersecting hyperplanes). For these values of b, we notice that the dual problem has multiple solutions, implying that the corresponding BFS of the primal problem is degenerate.
The last result we need shows that an optimal solution of the dual problem p∗ is in fact a subgradient of F at b̄.
Theorem 12.5. Suppose that the linear programming problem P = min. {c⊤ x : Ax = b̄, x ≥ 0} is feasible and the optimal cost is finite. Then, a vector p ∈ Rm is an optimal solution to the dual problem if and only if it is a subgradient of the optimal value function F at b̄.
Proof. Recall that F is defined on the set S = {b ∈ Rm : P(b) ̸= ∅}, where P(b) = {x ∈ Rn : Ax = b, x ≥ 0}. Suppose p is an optimal solution to the dual problem D. Then, strong duality implies that p⊤ b̄ = F(b̄). Consider now an arbitrary b ∈ S. For any feasible solution x ∈ P(b), we have from weak duality that

p⊤ b ≤ c⊤ x ⇒ p⊤ b ≤ min_{x∈P(b)} c⊤ x = F(b).
Notice that this implies that p⊤ b − p⊤ b̄ ≤ F(b) − F(b̄), which in turn yields that p is a subgradient of F at b̄, since it rearranges to

F(b̄) + p⊤ (b − b̄) ≤ F(b).
Let us consider the converse. Assume that p is a subgradient of F at b̄, that is,

F(b̄) + p⊤ (b − b̄) ≤ F(b), ∀b ∈ S. (12.9)

Let us choose some x ≥ 0 and let b = Ax, meaning that x ∈ P(b) and that F(b) ≤ c⊤ x. Using (12.9), we obtain

p⊤ Ax = p⊤ b ≤ F(b) − F(b̄) + p⊤ b̄ ≤ c⊤ x − F(b̄) + p⊤ b̄.

Since this holds for all x ≥ 0, it implies that p⊤ A ≤ c, showing that p is a dual feasible solution. Also, for x = 0, we obtain F(b̄) ≤ p⊤ b̄. Now, using weak duality, we have that any dual feasible solution p′ satisfies (p′)⊤ b̄ ≤ F(b̄). Combining the two, we conclude that p is dual optimal, since

(p′)⊤ b̄ ≤ F(b̄) ≤ p⊤ b̄.
Figure 12.3: The subgradients of the function F at b̄. On the left, the unique subgradient of F at b̄ is the gradient of the affine function F(b̄) + p⊤(b − b̄). On the right, the gradient of the affine function F(b̄) + p⊤(b − b̄) at b̄ is one of multiple subgradients of F at b̄.
Therefore, more generally, we can say that at the breakpoints F has multiple subgradients, while everywhere else the subgradient is unique and corresponds to the gradient of F.
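A small numeric check of Definition 12.4 at a breakpoint. All data below are invented: F is a maximum of two linear functions, b̄ is the junction where both are active, and the subgradient inequality is tested on a grid of sample points:

```python
# Toy check of Definition 12.4 for F(b) = max_i (p_i)^T b: at a point b_bar
# where two hyperplanes are active, every convex combination of their normal
# vectors satisfies the subgradient inequality (12.8).

p_ext = [(1.0, 0.0), (0.0, 1.0)]
def F(b):
    return max(p[0] * b[0] + p[1] * b[1] for p in p_ext)

b_bar = (1.0, 1.0)              # a "junction": both hyperplanes are active

samples = [(x / 2.0, y / 2.0) for x in range(-4, 5) for y in range(-4, 5)]

def is_subgradient(p):
    return all(F(b_bar) + p[0] * (b[0] - b_bar[0]) + p[1] * (b[1] - b_bar[1])
               <= F(b) + 1e-12 for b in samples)

# All convex combinations lam * p1 + (1 - lam) * p2 are subgradients at b_bar...
ok_all = all(is_subgradient((lam, 1.0 - lam)) for lam in (0.0, 0.3, 1.0))
# ...whereas a vector outside that set violates (12.8) for some b
bad = is_subgradient((2.0, 0.0))
print(ok_all, bad)
```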
Consider the problem

(P) : min. c⊤ x + f1⊤ y1 + · · · + fK⊤ yK
s.t.: Ax = b
Ck x + Dk yk = ek, ∀k ∈ {1, . . . , K}
x ≥ 0, yk ≥ 0, ∀k ∈ {1, . . . , K}.

Notice that this is equivalent to the problem P′ presented in Section 12.1, but with a notation modified to make it easier to track how the terms are separated in the process. We can see that P has a set of complicating variables x, which becomes obvious when we recast the problem as

min. c⊤ x + f1⊤ y1 + · · · + fK⊤ yK
s.t.: A x = b
C1 x + D1 y1 = e1
⋮
CK x + DK yK = eK
x ≥ 0, yk ≥ 0, ∀k ∈ {1, . . . , K}.
This structure is sometimes referred to as block-angular, referring to the initial block of columns
on the left (as many as there are components in x) and the diagonal structure representing the
elements associated with the variables y. In this case, notice that if the variable x were to be removed, or fixed to a value x = x̄, the problem would become separable into K independent parts

(Sk) : min. fk⊤ yk
s.t.: Dk yk = ek − Ck x̄
yk ≥ 0.
Notice that these subproblems k ∈ {1, . . . , K} can be solved in parallel and, in certain contexts,
might even have analytical closed-form solutions. The part missing is the development of a coor-
dination mechanism that would allow for iteratively updating the solution x based on information
emerging from the solution of the subproblems k ∈ {1, . . . , K}.
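The separability idea can be sketched as follows. This is a toy illustration with invented data: each subproblem is replaced by a scalar inequality-form variant, min {fk y : y ≥ ek − Ck x̄, y ≥ 0}, chosen only because its optimum is available in closed form, and the subproblems are solved concurrently:

```python
# Sketch of separability: once x is fixed to x_bar, each subproblem depends
# only on its own y_k and can be solved independently - here in parallel
# threads. Each toy subproblem has the closed-form optimum
# y_k = max(0, e_k - C_k * x_bar).

from concurrent.futures import ThreadPoolExecutor

subproblems = [                 # (f_k, e_k, C_k), all values invented
    (2.0, 5.0, 1.0),
    (1.0, 3.0, 2.0),
    (4.0, 1.0, 1.0),
]
x_bar = 2.0

def solve_sub(data):
    f_k, e_k, C_k = data
    y_k = max(0.0, e_k - C_k * x_bar)   # closed-form optimum of the scalar LP
    return f_k * y_k                     # z_k(x_bar)

with ThreadPoolExecutor() as pool:
    z_vals = list(pool.map(solve_sub, subproblems))

print(z_vals, sum(z_vals))   # subproblem costs and their total
```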
To see how that can be achieved, let us reformulate P as
(PR) : min.x c⊤ x + ∑_{k=1}^{K} zk(x)
s.t.: Ax = b
x ≥ 0,

where

zk(x) = min.y { fk⊤ yk : Dk yk = ek − Ck x }.
Notice the resemblance between zk(x) and the optimal value function F(b) introduced in Section 12.3.1. This is because the subproblems become parametric optimisation problems, now as functions of x. That is, in this case, the subproblems are assumed to have x = x̄ set as a parameter. Analogously, evaluating zk(x) requires solving the subproblem Sk, which, in turn, depends on x.
The Benders decomposition works by iteratively constructing the optimal value function. Instead
of assuming that all dual extreme points are known, we collect them iteratively, forming an approx-
imation of the optimal value function zk (x) that becomes increasingly precise as we collect more
such extreme points. And, to find new dual extreme points, we can use our current approximation
of the optimal value function zk (x) in PR , which, once solved, returns us a new x to be used for
finding a new dual extreme point. Notice that this procedure is akin to iteratively finding the
linear segments that form the optimal value functions zk (x).
To formalise the discussion above, let us first consider the dual formulation of the subproblems k ∈ {1, . . . , K}, which is given by

(SkD) : max. pk⊤ (ek − Ck x̄)
s.t.: pk⊤ Dk ≤ fk.
Let the dual feasible sets be

Pk = {p : p⊤ Dk ≤ fk}, ∀k ∈ {1, . . . , K}, (12.10)
and assume that each Pk ̸= ∅ with at least one extreme point1 . Relying on the resolution theorem
(Theorem 12.1), we know that Pk can be represented by its extreme points pik , i ∈ Ik and extreme
rays wkr , r ∈ Rk .
As we assume that Pk ̸= ∅, two cases can occur when we solve SkD, k ∈ {1, . . . , K}. Either SkD is unbounded, meaning that the respective primal subproblem is infeasible, or SkD is bounded, meaning that zkD < ∞.
From the first case, we can use Theorem 6.6 to conclude that primal feasibility (or a bounded dual objective value zkD < ∞) is attained if and only if

(wkr)⊤ (ek − Ck x) ≤ 0, ∀r ∈ Rk. (12.11)

Furthermore, we know that if SkD has an optimal solution, it must lie on a vertex of Pk. So, having available the set of all extreme points pik, i ∈ Ik, we have that, whenever SkD can be solved, it can be equivalently represented as

zkD = max_{i∈Ik} (pik)⊤ (ek − Ck x), (12.12)

or, in epigraph form,

min. θk (12.13)
s.t.: θk ≥ (pik)⊤ (ek − Ck x), ∀i ∈ Ik. (12.14)
Again, notice that (12.12) is equivalent to (12.7). Combining (12.11)–(12.14), we can reformulate PR into the single-level equivalent form

(PR) : min.x c⊤ x + ∑_{k=1}^{K} θk
s.t.: Ax = b
(pik)⊤ (ek − Ck x) ≤ θk, ∀i ∈ Ik, ∀k ∈ {1, . . . , K} (12.15)
(wkr)⊤ (ek − Ck x) ≤ 0, ∀r ∈ Rk, ∀k ∈ {1, . . . , K} (12.16)
x ≥ 0.
Notice that, just like the reformulation used for the Dantzig-Wolfe method presented in Section 12.2, the formulation of PR is of little practical use, since it requires the complete enumeration of a (typically prohibitive) number of extreme points and rays and is likely to be computationally intractable due to the large number of associated constraints. To address this issue, we can employ delayed constraint generation and iteratively generate only the constraints we observe to be violated. Notice that this can be alternatively interpreted as the idea of iteratively generating the segments of the optimal value function zk(x).

1 We have discussed in Section 12.3.2 why we can assume that Pk has at least one extreme point.
Following this idea, at a given iteration l, we have at hand a relaxed main problem PMl, which comprises only some of the constraints associated with the dual extreme points and rays obtained until iteration l. The relaxed main problem can be stated as
(PMl) : zPMl = min.x c⊤ x + ∑_{k=1}^{K} θk
s.t.: Ax = b
(pik)⊤ (ek − Ck x) ≤ θk, ∀i ∈ Ikl, ∀k ∈ {1, . . . , K}
(wkr)⊤ (ek − Ck x) ≤ 0, ∀r ∈ Rkl, ∀k ∈ {1, . . . , K}
x ≥ 0,
where Ikl ⊆ Ik , ∀k ∈ {1, . . . , K} represent subsets of extreme points pik of Pk , and Rkl ⊆ Rk subsets
of extreme rays wkr of Pk .
We can iteratively obtain these extreme points and rays from the subproblems Sk , k ∈ {1, . . . , K}.
To see that, let us first define that, at iteration l, we solve the main problem PMl and obtain a solution

(xl, θ1l, . . . , θKl) = argmin_{x,θ} PMl.
We can then solve the subproblems Skl, k ∈ {1, . . . , K}, for that fixed solution xl and observe whether we can find additional constraints that would have been violated had they been included in the relaxed main problem in the first place. In other words, we can identify whether the solution xl allows for identifying additional extreme points pik or extreme rays wkr of Pk that are not yet included in PMl.
Another way to interpret this notion of violation is to again think of the optimal value function. When we solve Skl, for each k ∈ {1, . . . , K}, we are obtaining a "true" (as opposed to approximate) evaluation of the optimal value function zk(xl) which, when compared against the working approximation of zk (valued as θk) in the main problem, may provide a value that is greater than that of θk, that is,

θk < (pik)⊤ (ek − Ck xl). (12.17)

This implies that the approximation of the optimal value function has a segment missing at xl, which is precisely the one given by (pik)⊤ (ek − Ck x).
Figure 12.4 illustrates how the optimal value function approximation is iteratively constructed.
To identify those violated constraints, first recall that the subproblem (in its primal form) is given
by
(Skl) : min. fk⊤ yk
s.t.: Dk yk = ek − Ck xl
yk ≥ 0.
Then, two cases can lead to generating violated constraints that must be added to the relaxed main problem to form PMl+1. The first is when Skl is feasible. In that case, a dual optimal basic feasible solution pilk is obtained. If (pilk)⊤ (ek − Ck xl) > θkl, then we can conclude that we have just formed a violated constraint of the form of (12.15). The second case is when Skl is infeasible; then, an extreme ray wkrl of Pk is available such that (wkrl)⊤ (ek − Ck xl) > 0, violating (12.16).
Figure 12.4: zk(x) is described by 3 line segments. At iteration l = 3, two are available. The solution to PM3 returns θ3, which is lower than zk(x3), obtained by solving SkD for x3. The solution p∗k = p3k from zk(x3) defines the missing segment (p∗k)⊤(ek − Ck x). A new optimality cut is added and, in iteration l = 4, x4 is obtained from solving PM4. Notice that we would have θ4 = zk(x4), meaning that the algorithm terminates.
Notice that the above can also be accomplished by solving the dual subproblems SkD, k ∈ {1, . . . , K}, instead. In that case, the extreme point pilk is immediately available, and so are the extreme rays wkrl in case of unboundedness.
Algorithm 9 presents a pseudocode for the Benders decomposition. Notice that the method can benefit in terms of efficiency from the use of dual simplex, since we are iteratively adding violated constraints to the relaxed main problem PMl. Likewise, the dual subproblem SkDl has only its objective function coefficients modified at each iteration and, in light of the discussion in Section 6.1.3, can also benefit from the use of dual simplex. Furthermore, the loop represented by Line 4 can be parallelised to provide further computational performance improvements.
Notice that the algorithm terminates if no violated constraint is found. In practice, this implies that (pik)⊤(ek − Ck xl) ≤ θkl for all k ∈ {1, . . . , K}, and thus (xl, θ1l, . . . , θKl) is optimal for P. In a way, if one considers the dual version of the subproblem, SkD, one can notice that it acts as an implicit search for values of pik that can make (pik)⊤(ek − Ck xl) larger than θkl, meaning that the current solution xl violates (12.15) and is thus not feasible to PR.
Also, every time one solves PMl, a dual (lower, for minimisation) bound LBl = zPMl is obtained. This is simply because the relaxed main problem is a relaxation of the problem P, i.e., it contains fewer constraints than the original problem P. A primal (upper) bound can also be calculated at every iteration, which allows for keeping track of the progress of the algorithm in terms of convergence and for preemptively terminating it at any arbitrary optimality tolerance. That can be achieved by setting

UBl = min { UBl−1, c⊤ xl + ∑_{k=1}^{K} fk⊤ ykl }
    = min { UBl−1, zPMl − ∑_{k=1}^{K} θkl + ∑_{k=1}^{K} zkDl },

where (xl, θ1l, . . . , θKl) = argmin_{x,θ} PMl, ykl = argmin_y Skl, and zkDl is the objective function value of the dual subproblem SkD at iteration l. Notice that, differently from the lower bound LBl, there are no guarantees that the upper bound UBl will decrease monotonically. Therefore, one must compare the bound obtained at a given iteration l using the solution (xl, y1l, . . . , yKl) against the incumbent (or best-so-far) bound UBl−1.
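The whole loop, including the bound updates, can be sketched end-to-end on a toy instance. Everything below is invented for illustration: a single subproblem in inequality form, min {f y : y ≥ d − x, y ≥ 0}, whose dual max {p(d − x) : 0 ≤ p ≤ f} has extreme points {0, f}, so the subproblem is always feasible and only optimality cuts of the form (12.15) arise; the "main problem" is one-dimensional and solved by inspecting candidate points rather than by an LP solver:

```python
# Minimal Benders sketch on the invented toy instance
#   min c*x + f*y  s.t.  y >= d - x,  0 <= x <= U,  y >= 0.

c, f, d, U = 1.0, 3.0, 4.0, 10.0

def solve_sub(x_bar):
    """Closed-form primal/dual solution of the toy subproblem."""
    y = max(0.0, d - x_bar)
    p = f if d - x_bar > 0 else 0.0     # an optimal dual extreme point
    return f * y, p

def solve_master(cuts):
    """min c*x + theta s.t. theta >= p*(d - x) for p in cuts, 0 <= x <= U.
    All cut lines vanish at x = d, so checking {0, U, d} suffices here."""
    def val(x):
        return c * x + max(p * (d - x) for p in cuts)
    x = min([0.0, U, min(max(d, 0.0), U)], key=val)
    return x, max(p * (d - x) for p in cuts)

cuts, UB, log = [0.0], float("inf"), []  # start from the trivial cut theta >= 0
while True:
    x_l, theta_l = solve_master(cuts)
    LB = c * x_l + theta_l               # master value: a valid lower bound
    z_sub, p_l = solve_sub(x_l)
    UB = min(UB, c * x_l + z_sub)        # incumbent upper bound
    log.append((LB, UB))
    if z_sub <= theta_l + 1e-9:          # no violated cut: optimal
        break
    cuts.append(p_l)                     # add the optimality cut (12.15)

print(log)   # bounds per iteration; the method stops when LB meets UB
```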
12.4 Exercises
Exercise 12.1: Dantzig-Wolfe decomposition
Consider the following linear programming problem:
We wish to solve this problem using Dantzig-Wolfe decomposition, where the constraint x11 +x23 ≤
15 is the only “coupling” constraint and the remaining constraints define a single subproblem.
(a) Consider the following two extreme points for the subproblem:
and
x2 = (0, 10, 10, 20, 0, 0).
Construct a main problem in which x is constrained to be a convex combination of x1 and
x2 . Find the optimal primal and dual solutions for the main problem.
(b) Using the dual variables calculated in part a), formulate the subproblem and find its optimal
solution.
(c) What is the reduced cost of the variable λ3 associated with the extreme point x3 obtained
from solving the subproblem in part b)?
Exercise 12.2: Benders decomposition
Consider the paint factory problem

max. z = 5x + 4y (12.18)
s.t.: 6x + 4y ≤ 24 (12.19)
x + 2y ≤ 6 (12.20)
y−x≤1 (12.21)
y≤2 (12.22)
x, y ≥ 0. (12.23)
(a) Formulate a parametric optimization problem F (x), maximizing the profit from selling inte-
rior paint (y) given the amount of exterior paint (x) produced. The solution to this problem
must be feasible to the full paint factory problem.
(b) Solve the paint factory problem using Benders decomposition. Use x as the main problem
variable and y as the subproblem variable.
Exercise 12.3: Wholesaler's distribution problem
Consider a wholesaler company planning to structure its supply chain to the retailers of a given
product. The company needs to distribute the production from many suppliers to a collection of
distribution points from which the retailers can collect as much product as they need for a certain
period. By default, a pay-as-you-consume contract between wholesaler and retailers is signed and,
therefore, the demand at each point is unknown at the moment of shipping. Consider there is no
penalty for any unfulfilled demand and any excess must be discarded from one period to the other.
The following parameters are given:
• Ci : production capacity of supplier i
• Bi : unit production cost at supplier i
• Tij : unit cost of transporting from supplier i to distribution point j
• Ps : probability of demand scenario s
• Rj : unit revenue at distribution point j
• Wj : unit disposal cost at distribution point j
• Djs : demand at distribution point j in scenario s,
and the decision variables are pi (production at supplier i), tij (amount transported from supplier i to distribution point j), rj (amount received at distribution point j), and ljs and wjs (amounts sold and discarded, respectively, at distribution point j in scenario s).
The model for minimising the cost (considering revenue as a negative cost) is given below:

min. ∑_{i∈I} Bi pi + ∑_{i∈I,j∈J} Tij tij + ∑_{s∈S} Ps ∑_{j∈J} (−Rj ljs + Wj wjs)
s.t.: pi ≤ Ci, ∀i ∈ I
pi = ∑_{j∈J} tij, ∀i ∈ I
rj = ∑_{i∈I} tij, ∀j ∈ J
rj = ljs + wjs , ∀j ∈ J, ∀s ∈ S
ljs ≤ Djs , ∀j ∈ J, ∀s ∈ S
pi ≥ 0, ∀i ∈ I
rj ≥ 0, ∀j ∈ J
tij ≥ 0, ∀i ∈ I, ∀j ∈ J
ljs , wjs ≥ 0, ∀j ∈ J, ∀s ∈ S.
Solve an instance of the wholesaler’s distribution problem proposed using Benders decomposition.
Bibliography
[1] Dimitris Bertsimas and John N Tsitsiklis. Introduction to linear optimization, volume 6. Athena
Scientific Belmont, MA, 1997.
[2] John R Birge and Francois Louveaux. Introduction to stochastic programming. Springer Science
& Business Media, 2011.
[3] Jacek Gondzio. Interior point methods 25 years later. European Journal of Operational Re-
search, 218(3):587–601, 2012.
[4] Changhyun Kwon. Julia Programming for Operations Research. Changhyun Kwon, 2019.
[5] Hamdy A Taha. Operations Research: An Introduction (7th Edition). Pearson/Prentice Hall,
2003.
[6] H Paul Williams. Model building in mathematical programming. John Wiley & Sons, 2013.