Lecture 7: Large-Scale Optimization
Throughout this lecture, we consider the standard form problem
\[
\begin{aligned}
\min\ & c'x \\
\text{s.t. } & Ax = b, \\
& x \ge 0.
\end{aligned}
\]
In the full tableau implementation, in each iteration we update the simplex tableau, which requires us to compute the reduced cost $\bar{c}_j = c_j - c_B' B^{-1} A_j$ and the search direction $B^{-1} A_j$ for all nonbasic variables $x_j$.
However, in some large-scale optimization problems, we have a huge number of decision variables, i.e., $n$ is very large, and accessing every column of the matrix $A$ in each iteration can be time consuming. One can overcome this difficulty if the following two steps can be achieved:
1. Identify a nonbasic index $j$ with negative reduced cost $\bar{c}_j < 0$, or certify that none exists, without generating every column $A_j$.
2. Once such an index $j$ is found, carry out the change of basis using only the current basis matrix and the single entering column $A_j$.
The second step is the key idea behind the revised simplex method, whose typical iteration is summarized
below.
An iteration of the revised simplex method
1. In each iteration, we start with the basic columns $A_{B(1)}, \ldots, A_{B(m)}$, the associated BFS $x$, and $B^{-1}$.
2. Compute the reduced costs $\bar{c}_j = c_j - c_B' B^{-1} A_j$ sequentially. If one encounters $\bar{c}_j < 0$ for some $j$ for the first time, then stop and return the index $j$. If all reduced costs are nonnegative, the current basic feasible solution is optimal, and the algorithm terminates.
3. For the returned nonbasic index $j$, compute $u = B^{-1} A_j$. If $u \le 0$, then the optimal cost is $-\infty$ and the algorithm terminates.
4. Otherwise, let
\[
\theta^* = \min_{\{i \mid u_i > 0\}} \frac{x_{B(i)}}{u_i}
\]
and let $l$ be an index such that $\theta^* = x_{B(l)}/u_l$. Form a new basis by replacing $A_{B(l)}$ with $A_j$ and compute the new BFS $y$ via $y_j = \theta^*$ and $y_{B(i)} = x_{B(i)} - \theta^* u_i$ for $i \ne l$ (a code sketch of the iteration follows these steps).
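The iteration above can be written out directly in code. The following is a minimal NumPy sketch, assuming the data are given as dense arrays; the tolerance 1e-9 and the product-form update of $B^{-1}$ are implementation choices, not part of the lecture.
\begin{verbatim}
import numpy as np

def revised_simplex_iteration(A, c, basis, B_inv, x_B):
    """One iteration of the revised simplex method (minimal sketch).

    basis : list of indices B(1), ..., B(m) of the current basic columns
    B_inv : inverse of the current basis matrix B
    x_B   : values of the basic variables, x_B = B_inv @ b
    """
    m, n = A.shape
    p = c[basis] @ B_inv                     # simplex multipliers p' = c_B' B^{-1}

    # Step 2: scan reduced costs c_j - p'A_j and stop at the first negative one.
    j = next((k for k in range(n)
              if k not in basis and c[k] - p @ A[:, k] < -1e-9), None)
    if j is None:
        return "optimal", basis, B_inv, x_B  # all reduced costs nonnegative

    # Step 3: u = B^{-1} A_j; if u <= 0 the optimal cost is -infinity.
    u = B_inv @ A[:, j]
    if np.all(u <= 1e-9):
        return "unbounded", basis, B_inv, x_B

    # Step 4: ratio test theta* = min_{u_i > 0} x_B(i)/u_i, then change of basis.
    ratios = np.where(u > 1e-9, x_B / np.where(u > 1e-9, u, 1.0), np.inf)
    l = int(np.argmin(ratios))
    theta = ratios[l]
    x_B = x_B - theta * u                    # y_B(i) = x_B(i) - theta* u_i
    x_B[l] = theta                           # entering variable takes value theta*
    basis[l] = j

    # Product-form update of B^{-1}: row operations turning u into the l-th unit vector.
    E = np.eye(m)
    E[:, l] = -u / u[l]
    E[l, l] = 1.0 / u[l]
    return "pivoted", basis, E @ B_inv, x_B
\end{verbatim}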
Note that steps 3 and 4 are essentially the same as solving
\[
\begin{aligned}
\min\ & \sum_{i=1}^{m} c_{B(i)} x_{B(i)} + c_j x_j \\
\text{s.t. } & B x_B + A_j x_j = b, \\
& x_B \ge 0,\ x_j \ge 0.
\end{aligned}
\]
In some delayed column generation methods, instead of keeping just the basic columns and throwing away the exiting column in each iteration, one may keep some of the columns $\{A_i \mid i \in I\}$ with $I \subseteq \{1, \ldots, n\}$ and solve the following smaller problem (without explicitly going through the simplex iteration as in steps 3 and 4)
\[
\begin{aligned}
\min\ & \sum_{i=1}^{m} c_{B(i)} x_{B(i)} + c_j x_j + \sum_{i \in I} c_i x_i \\
\text{s.t. } & B x_B + A_j x_j + \sum_{i \in I} A_i x_i = b, \\
& x_B \ge 0,\ x_i,\ x_j \ge 0.
\end{aligned}
\]
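As a concrete sketch of solving such a restricted problem, one can simply hand the retained columns to an off-the-shelf LP solver; the use of scipy.optimize.linprog and the function name below are illustrative assumptions, not prescribed by the notes.
\begin{verbatim}
import numpy as np
from scipy.optimize import linprog

def solve_restricted(A, c, b, pool):
    """Solve min sum_{i in pool} c_i x_i  s.t. A[:, pool] x_pool = b, x_pool >= 0,
    i.e. the smaller problem over the retained column pool (basic columns plus I)."""
    pool = sorted(pool)
    res = linprog(c[pool], A_eq=A[:, pool], b_eq=b, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    x = np.zeros(len(c))
    x[pool] = res.x            # embed the restricted solution back into R^n
    return x, res.fun
\end{verbatim}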
In the revised simplex method, once a column $A_j$ with negative reduced cost is found, the rest of the nonbasic columns will not be accessed when performing steps 3 and 4. However, in step 2, in the worst case, one still needs to generate every column $A_j$; the generation of the columns is not delayed.
We demonstrate using the example below that when the problem has certain special structure, $\min_j \bar{c}_j$ can be computed without accessing every column $A_j$.
Example 7.1 (Cutting Stock Problem) Consider a paper company that has a supply of large rolls of
paper of width W , which is assumed to be a positive integer. There are demands for bi rolls of paper with
width wi , where wi ≤ W for i = 1, ..., m. A large roll can be sliced in a certain pattern to obtain smaller
rolls. Let $a_i$ be the number of rolls of width $w_i$ to be produced from a single large roll. A feasible pattern $(a_1, \ldots, a_m)$ then must satisfy
\[
\sum_{i=1}^{m} a_i w_i \le W.
\]
If there are in total n feasible patterns, we then collect all feasible patterns in a matrix A of dimension m×n.
For instance, when W = 7, w1 = 2, w2 = 4, the following matrix summarizes all feasible patterns:
\[
A = \begin{pmatrix} 0 & 0 & 1 & 1 & 2 & 3 \\ 0 & 1 & 0 & 1 & 0 & 0 \end{pmatrix},
\]
with the column Aj corresponding to a pattern j.
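The small instance above can be reproduced by brute-force enumeration (feasible only for toy data; in general the number of patterns $n$ grows exponentially). The variable names below are illustrative.
\begin{verbatim}
from itertools import product

W, w = 7, [2, 4]                      # roll width and demanded widths from the example

# A pattern (a_1, ..., a_m) is feasible if sum_i a_i * w_i <= W.
ranges = [range(W // wi + 1) for wi in w]
patterns = [a for a in product(*ranges)
            if sum(ai * wi for ai, wi in zip(a, w)) <= W]

# Each feasible pattern becomes one column of A.
A = [list(row) for row in zip(*patterns)]
print(patterns)   # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (3, 0)]
print(A)          # [[0, 0, 1, 1, 2, 3], [0, 1, 0, 1, 0, 0]]
\end{verbatim}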
Let xj be the number of large rolls cut according to pattern j. The company seeks to minimize the number
of large rolls used while satisfying customer demand:
\[
\begin{aligned}
\min\ & \sum_{j=1}^{n} x_j \\
\text{s.t. } & \sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, \ldots, m, \\
& x_j \ge 0, \quad j = 1, \ldots, n.
\end{aligned}
\]
To apply delayed column generation, let $p' = c_B' B^{-1}$ be the vector of simplex multipliers associated with the current basis $B$ (here every cost coefficient equals one), so the reduced cost of the column corresponding to a pattern $(a_1, \ldots, a_m)$ is $1 - \sum_{i=1}^{m} p_i a_i$. Hence $\min_j \bar{c}_j$ can be found by solving the integer knapsack problem
\[
\max\ \sum_{i=1}^{m} p_i a_i \quad \text{s.t.} \quad \sum_{i=1}^{m} a_i w_i \le W, \quad a_i \in \{0, 1, 2, \ldots\},
\]
without enumerating the patterns.
• If the knapsack problem returns an optimal value less than or equal to one, we then know $\bar{c}_j \ge \min_i \bar{c}_i \ge 0$ for all $j$. Hence, the current basis $B$ is optimal.
• If the knapsack problem returns a value greater than one with optimal solution $(a_1^*, \ldots, a_m^*)$, then we have identified a nonbasic column $A_j = (a_1^*, \ldots, a_m^*)'$ that enters the basis (a dynamic-programming sketch of this pricing step follows).
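The pricing step in the two bullets above can be sketched as a standard integer knapsack dynamic program. Here p stands for the simplex multipliers $c_B' B^{-1}$ of the current basis; the function name, the tolerance, and the backtracking bookkeeping are illustrative choices.
\begin{verbatim}
def price_pattern(p, w, W):
    """Integer knapsack: max sum_i p_i a_i  s.t.  sum_i w_i a_i <= W, a_i in {0,1,2,...}."""
    best = [0.0] * (W + 1)            # best[cap]: maximum value using total width <= cap
    take = [-1] * (W + 1)             # index of one piece added in an optimal solution at cap
    for cap in range(1, W + 1):
        best[cap] = best[cap - 1]
        for i, wi in enumerate(w):
            if wi <= cap and best[cap - wi] + p[i] > best[cap]:
                best[cap] = best[cap - wi] + p[i]
                take[cap] = i
    a, cap = [0] * len(w), W          # recover an optimal pattern a* by backtracking
    while cap > 0:
        if take[cap] == -1:
            cap -= 1
        else:
            a[take[cap]] += 1
            cap -= w[take[cap]]
    return best[W], a

# Delayed column generation for the cutting stock problem then alternates between
# solving the restricted problem (e.g. with solve_restricted above) and pricing:
#   value, a_star = price_pattern(p, w, W)
#   if value <= 1 + 1e-9:  the current basis is optimal;
#   else:                  append a_star as a new column and repeat.
\end{verbatim}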
The same idea can be stated in terms of the dual. Taking the dual of the original problem and keeping only the dual constraints indexed by a subset $I \subseteq \{1, \ldots, n\}$, we obtain a relaxed dual problem; this is known as delayed constraint generation. Let $p^*$ be an optimal solution of this relaxed problem.
• If $p^*$ satisfies all the constraints $p'A_i \le c_i$, $i = 1, \ldots, n$, then $p^*$ must also be optimal to the original dual problem, and the algorithm terminates.
• If $p^*$ violates constraint $i$ for some $i \notin I$, then we add $i$ into $I$.
The step of checking feasibility is the same as checking the nonnegativity of the reduced costs in the delayed column generation method, and we need an efficient method for identifying a violated constraint. Usually, this is achieved by finding an efficient way of solving
\[
\min_{i=1,\ldots,n}\ c_i - (p^*)' A_i.
\]
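Without any special structure, this pricing problem amounts to a brute-force scan over all $n$ columns, which is exactly what one hopes to avoid. A short sketch (illustrative names, NumPy assumed):
\begin{verbatim}
import numpy as np

def most_violated(A, c, p_star, tol=1e-9):
    """Scan all n dual constraints p'A_i <= c_i and return the most violated one."""
    slack = c - A.T @ p_star                 # slack_i = c_i - (p*)' A_i
    i_star = int(np.argmin(slack))
    return (i_star, slack[i_star]) if slack[i_star] < -tol else (None, 0.0)
\end{verbatim}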
Solving this problem without going through every term $c_i - (p^*)' A_i$ is possible when the problem has certain special structure, which we demonstrate next.
Let (Ω, F, P) be a probability space. Consider a decision maker who acts in two consecutive stages with
some random information being revealed in the second stage. In the first stage, the decision maker needs to
choose a vector x that satisfies the constraints
\[
Ax = b, \qquad x \ge 0.
\]
In the second stage, a random outcome $\omega \in \Omega$ is realized, and the decision maker must choose a vector $y(\omega) \ge 0$ that satisfies the constraints
\[
B(\omega)x + D\,y(\omega) = d(\omega).
\]
The decision $y(\omega)$ generates a second stage cost $f'y(\omega)$. Let $z(x, \omega)$ be the minimum second stage cost given a scenario $\omega$ and first stage decision $x$. It follows that
\[
\begin{aligned}
z(x, \omega) = \min_{y(\omega)}\ & f'y(\omega) \\
\text{s.t. } & D\,y(\omega) = d(\omega) - B(\omega)x, \\
& y(\omega) \ge 0.
\end{aligned} \tag{7.1}
\]
The decision maker wishes to minimize the first stage cost $c'x$ plus the expected second stage cost, that is, to solve
\[
\begin{aligned}
\min_{x}\ & c'x + \mathbb{E}_P[z(x, \omega)] \\
\text{s.t. } & Ax = b, \\
& x \ge 0.
\end{aligned} \tag{7.2}
\]
While $\mathbb{E}_P[z(x, \omega)]$ is in general a nonlinear function of $x$, the above problem can nevertheless be formulated as an LP when $\Omega$ consists of finitely many scenarios, say $\omega_1, \ldots, \omega_K$. Let $\alpha_i$ be the probability of scenario $\omega_i$. The above problem is then equivalent to
\[
\begin{aligned}
\min_{x,\, y_i,\, i=1,\ldots,K}\ & c'x + \sum_{i=1}^{K} \alpha_i f' y_i \\
\text{s.t. } & Ax = b, \\
& B_i x + D y_i = d_i, \quad i = 1, \ldots, K, \\
& x,\, y_1, \ldots, y_K \ge 0.
\end{aligned} \tag{7.3}
\]
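The deterministic equivalent (7.3) can be assembled explicitly, which also makes its size visible: the variable vector $(x, y_1, \ldots, y_K)$ has $O(mK)$ entries and the constraint matrix $O(tK)$ rows. The sketch below assumes dense data and a scenario-independent recourse matrix $D$ as in (7.3), and uses scipy.optimize.linprog; all names are illustrative.
\begin{verbatim}
import numpy as np
from scipy.optimize import linprog

def extensive_form(c, f, A, b, D, B_list, d_list, alpha):
    """Assemble and solve (7.3) with stacked variables (x, y_1, ..., y_K)."""
    K, n, m = len(alpha), len(c), len(f)
    obj = np.concatenate([c] + [alpha[k] * f for k in range(K)])
    # First-stage block:   [ A  0 ... 0 ] (x, y_1, ..., y_K) = b
    top = np.hstack([A, np.zeros((A.shape[0], K * m))])
    # Scenario blocks:     [ B_k 0 ... D ... 0 ] (x, y_1, ..., y_K) = d_k
    rows = []
    for k in range(K):
        blocks = [B_list[k]] + [D if j == k else np.zeros((D.shape[0], m))
                                for j in range(K)]
        rows.append(np.hstack(blocks))
    A_eq = np.vstack([top] + rows)
    b_eq = np.concatenate([b] + list(d_list))
    return linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
\end{verbatim}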
Example 7.2 (Joint Inventory and Transportation Problem) Suppose a retailer manages the inventory at $n$ warehouses, which are used to satisfy random demands at $m$ locations. In the first stage, the retailer needs to decide $x \in \mathbb{R}^n$, with $x_i$ being the inventory placed at warehouse $i$ for $i = 1, \ldots, n$. The procurement cost at warehouse $i$ is $c_i$, so the total procurement cost generated in the first stage is $c'x$.
In the second stage, the demand d(ω) at m locations is realized with dj (ω) being the demand at location j
in scenario ω. Given the inventory level x and the demand realization d(ω), the retailer needs to decide
yij (ω), the amount of inventory transported from warehouse i to satisfy demand at location j. The unit
transportation cost from i to j is tij and the unit revenue for satisfying demand at location j is rj . The
second stage problem is then
\[
\begin{aligned}
z(x, \omega) = \min_{y(\omega)}\ & \sum_{i=1}^{n} \sum_{j=1}^{m} (t_{ij} - r_j)\, y_{ij}(\omega) \\
\text{s.t. } & \sum_{j=1}^{m} y_{ij}(\omega) \le x_i, \quad i = 1, \ldots, n, \\
& \sum_{i=1}^{n} y_{ij}(\omega) \le d_j(\omega), \quad j = 1, \ldots, m, \\
& y(\omega) \ge 0.
\end{aligned}
\]
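For a given inventory vector $x$ and demand realization $d(\omega)$, this second stage problem is an ordinary transportation LP and can be solved scenario by scenario. A sketch with scipy.optimize.linprog; the function name and argument layout are assumptions.
\begin{verbatim}
import numpy as np
from scipy.optimize import linprog

def second_stage(x, d, t, r):
    """Solve the second stage transportation problem of Example 7.2.

    x : inventory at the n warehouses        d : realized demand at the m locations
    t : n-by-m unit transportation costs     r : unit revenues at the m locations
    Returns z(x, omega) and the optimal shipments y as an n-by-m array.
    """
    n, m = t.shape
    cost = (t - r).ravel()                          # objective coefficients t_ij - r_j
    supply = np.kron(np.eye(n), np.ones((1, m)))    # sum_j y_ij <= x_i
    demand = np.kron(np.ones((1, n)), np.eye(m))    # sum_i y_ij <= d_j
    res = linprog(cost, A_ub=np.vstack([supply, demand]),
                  b_ub=np.concatenate([x, d]),
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)
\end{verbatim}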
The first stage problem is simply
\[
\begin{aligned}
\min_{x}\ & c'x + \mathbb{E}_P[z(x, \omega)] \\
\text{s.t. } & x \ge 0.
\end{aligned}
\]
When $K$ is large and $y(\omega)$ and $d(\omega)$ have dimensions $m$ and $t$, respectively, the formulation in (7.3) is an LP with $O(mK)$ decision variables and $O(tK)$ equality constraints, and can be computationally demanding to solve.
Observe that given a fixed x, the problems of finding y(ω) are all decoupled and we can solve K much
smaller LPs, i.e., (7.1), with m decision variables and t equality constraints. The difficulty lies in the fact
that finding x is coupled with finding y(ω), ω ∈ Ω. The idea behind Benders decomposition is to decouple
the two tasks.
In the following we assume that (7.1) is feasible and has finite optimal value for any ω ∈ Ω. The dual of
(7.1) is
\[
\begin{aligned}
z(x, \omega) = \max_{p(\omega)}\ & p(\omega)'(d(\omega) - B(\omega)x) \\
\text{s.t. } & p(\omega)'D \le f'.
\end{aligned} \tag{7.4}
\]
Let $p^i$, $i = 1, \ldots, I$, be the extreme points of $\{p \mid p'D \le f'\}$. By our assumption on (7.1), problem (7.4) also has a finite optimal value and we must have
\[
z(x, \omega) = \max_{i=1,\ldots,I}\ (p^i)'(d(\omega) - B(\omega)x),
\]
which is equivalent to
\[
\begin{aligned}
z(x, \omega) = \min\ & z(\omega) \\
\text{s.t. } & (p^i)'(d(\omega) - B(\omega)x) \le z(\omega), \quad i = 1, \ldots, I.
\end{aligned}
\]
Substituting this representation into (7.2), the first stage problem becomes
\[
\begin{aligned}
\min_{x,\, z(\omega_1), \ldots, z(\omega_K)}\ & c'x + \sum_{k=1}^{K} \alpha_k z(\omega_k) \\
\text{s.t. } & Ax = b, \\
& (p^i)'(d(\omega_k) - B(\omega_k)x) \le z(\omega_k), \quad i = 1, \ldots, I,\ k = 1, \ldots, K, \\
& x \ge 0.
\end{aligned} \tag{7.5}
\]
We call formulation (7.5) the master problem, which only has O(K) decision variables (as opposed to O(mK)
in (7.3)). But (7.5) has an extremely large number of inequality constraints—O(IK). We can overcome this
via delayed constraint generation.
We start with (7.5) but include only a subset of its inequality constraints. Suppose the resulting optimal solution to this relaxed master problem is $x^*$ and $z^* = (z^*(\omega_1), \ldots, z^*(\omega_K))$. We then need to check the feasibility of $(x^*, z^*)$ with respect to the rest of the constraints in (7.5). The key idea here is to solve some auxiliary subproblems instead of checking the constraints $(p^i)'(d(\omega) - B(\omega)x^*) \le z^*(\omega)$ one by one. In particular, for each $\omega \in \Omega$, we solve
\[
\begin{aligned}
\min_{y(\omega)}\ & f'y(\omega) \\
\text{s.t. } & D\,y(\omega) = d(\omega) - B(\omega)x^*, \\
& y(\omega) \ge 0,
\end{aligned}
\]
that is, problem (7.1) with $x = x^*$. From solving the above problem, we can obtain an optimal dual BFS $p^{i(\omega)}$ for every $\omega \in \Omega$. There are two cases:
• If $(p^{i(\omega)})'(d(\omega) - B(\omega)x^*) \le z^*(\omega)$ for every $\omega \in \Omega$, then, since $p^{i(\omega)}$ attains the maximum in (7.4) at $x = x^*$, we have $(p^i)'(d(\omega) - B(\omega)x^*) \le z^*(\omega)$ for all $i = 1, \ldots, I$. As a result, $(x^*, z^*)$ is feasible to (7.5) and hence optimal.
• If $(p^{i(\bar\omega)})'(d(\bar\omega) - B(\bar\omega)x^*) > z^*(\bar\omega)$ for some $\bar\omega \in \Omega$, then we have identified a violated constraint: we add the constraint $(p^{i(\bar\omega)})'(d(\bar\omega) - B(\bar\omega)x) \le z(\bar\omega)$ to the relaxed master problem and re-solve. A sketch of the full loop is given below.
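The delayed constraint generation scheme above can be sketched end to end. The code assumes, as in the notes, a scenario-independent recourse matrix $D$ and that every scenario subproblem (7.1) is feasible with finite optimal value; it obtains the dual vectors by solving (7.4) directly. The function name, the iteration cap, the tolerance, and the crude lower bound on the $z$ variables (used only to keep the first relaxed master problem bounded) are illustrative choices.
\begin{verbatim}
import numpy as np
from scipy.optimize import linprog

def benders(c, f, A, b, D, B_list, d_list, alpha, max_iter=100, tol=1e-7):
    """Delayed constraint generation for the master problem (7.5) (a sketch)."""
    n, K = len(c), len(alpha)
    obj = np.concatenate([c, alpha])                   # master variables (x, z_1, ..., z_K)
    A_eq = np.hstack([A, np.zeros((A.shape[0], K))])
    bounds = [(0, None)] * n + [(-1e6, None)] * K      # crude lower bound on z_k
    cuts_A, cuts_b = [], []

    for _ in range(max_iter):
        res = linprog(obj,
                      A_ub=np.array(cuts_A) if cuts_A else None,
                      b_ub=np.array(cuts_b) if cuts_b else None,
                      A_eq=A_eq, b_eq=b, bounds=bounds, method="highs")
        x, z = res.x[:n], res.x[n:]
        new_cut = False
        for k in range(K):
            rhs = d_list[k] - B_list[k] @ x
            # Scenario dual (7.4): max p'(d_k - B_k x)  s.t.  p'D <= f', p free.
            dual = linprog(-rhs, A_ub=D.T, b_ub=f,
                           bounds=[(None, None)] * D.shape[0], method="highs")
            p, val = dual.x, -dual.fun
            if val > z[k] + tol:                       # violated constraint found
                # Add the cut p'(d_k - B_k x) <= z_k, i.e. -p'B_k x - z_k <= -p'd_k.
                cuts_A.append(np.concatenate([-p @ B_list[k], -np.eye(K)[k]]))
                cuts_b.append(-p @ d_list[k])
                new_cut = True
        if not new_cut:                                # (x*, z*) satisfies all constraints
            return x, res.fun
    return x, res.fun
\end{verbatim}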