ACCELERATING BENDERS DECOMPOSITION:
ALGORITHMIC ENHANCEMENTS AND
MODEL SELECTION CRITERIA
by
Thomas L. Magnanti
and
Richard T. Wong
Florian, Guerin, and Bushel [11] have used the algorithm to schedule the movement
routing, and Geoffrion and Graves [16] have had great success applying the
[9] have recently discussed the advantages of using the algorithm for ve-
programs.
all applications. Geoffrion and Graves [16], among others, have noted
that reformulating a mixed integer program can have a profound effect upon
formulation for a given problem? For those applications where the algo-
rithm does not work well, is there a mechanism for improving its conver-
pects for applying Benders decomposition and has prompted the study re-
this by choosing judiciously from the possible cuts that could be generated.
The algorithm belongs to a broader family of methods that also includes
Dantzig-Wolfe decomposition for linear and nonlinear programs, and related
"cutting plane" type algorithms that arise in
that exploits the underlying structure of these models. Since the linear
cuts.
proved formulations can also allow the generation of stronger cuts for
Minimax Problems
Two of the most widely-used strategies for solving large scale opti-
(see, for example, Geoffrion [13] and [14], and Magnanti [22]) point out
The techniques are not only applied directly; their use is, at times, com-
within the framework of branch and bound for solving integer programming
Since Benders algorithm, the focus of our analysis, is but one mani-
but somewhat more abstract, minimax setting that captures the essence of
in the outer minimizing variable y for each choice of the inner maximizing
variable u.
Minimize cx + dy
subject to: Ax = b − Dy    (2)
x ≥ 0, y ∈ Y.
¹These assumptions can be relaxed quite easily, but with added complications that cloud our main development. See Garfinkel and Nemhauser [12].
The minimax problem (1) also arises when dualizing the constraints
Maximize f(u)
subject to: u ∈ U.
the convex subset of the nonnegative orthant for which the maximization
For any given y ∈ Y, let v(y) denote the value of the maximization
v = Min { v(y) : y ∈ Y }
where
the set Y is convex, the minimax problem can be viewed as a convex program.
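The development above can be summarized symbolically (a reconstruction consistent with the surrounding text, not a verbatim restoration of the scanned displays):

```latex
v(y) \;=\; \max_{u \in U}\,\bigl[\, f(u) + y\,g(u) \,\bigr],
\qquad
v \;=\; \min_{y \in Y} v(y)
  \;=\; \min_{y \in Y}\,\max_{u \in U}\,\bigl[\, f(u) + y\,g(u) \,\bigr].
```

Since v(y) is the pointwise maximum of functions linear in y, it is convex, which is the observation behind viewing the minimax problem as a convex program when Y is convex.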
There has been a great flourish of activity recently in modifying and ex-
Wolfe [37], and the references that they cite). An alternative solution
Rewrite (4) as

Minimize z
subject to: z ≥ f(u) + yg(u) for all u ∈ U    (6)
y ∈ Y, z ∈ R¹,

and consider the relaxed version of (6) containing only a finite number of these cuts:

Minimize z
subject to: z ≥ f(u^j) + yg(u^j) (j = 1,2,...,K)    (7)
y ∈ Y, z ∈ R¹,
where each u^j is an element of U. The solution y^K, z^K of this "master problem" (7) solves the minimax problem if it satisfies every constraint of that problem; that is, if v(y^K) ≤ z^K. If, on the other hand, v(y^K) > z^K, then an optimal solution u^{K+1} to the subproblem (5) with y = y^K provides a new cut

z ≥ f(u^{K+1}) + yg(u^{K+1})

to append to (7). The algorithm continues in this way, alternately solving the master problem and the subproblem.
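To make the alternation concrete, here is a small illustrative sketch (not from the paper) in which Y and U are finite sets, so the relaxed master problem (7) and the subproblem (5) can both be solved by enumeration; the functions f and g and all data below are hypothetical:

```python
# Sketch of the cutting-plane loop for min over y of max over u of f(u) + y*g(u),
# with finite Y and U so the master problem and subproblem are solved by
# enumeration.  The data are illustrative, not from the paper.

Y = [0.0, 1.0, 2.0, 3.0]          # outer (minimizing) choices
U = [-1.0, 0.0, 1.0, 2.0]         # inner (maximizing) choices

def f(u): return -u * u           # part of the cut independent of y
def g(u): return u                # coefficient of y in the cut

def subproblem(y):
    """v(y) = max over u in U of f(u) + y*g(u); returns (value, maximizer)."""
    return max((f(u) + y * g(u), u) for u in U)

def benders(tol=1e-9):
    cuts = [U[0]]                 # start with one arbitrary cut point u^1
    while True:
        # Relaxed master (7): for each y, z(y) is the largest active cut value.
        zK, yK = min((max(f(u) + y * g(u) for u in cuts), y) for y in Y)
        vK, uK1 = subproblem(yK)  # subproblem at the master solution y^K
        if vK <= zK + tol:        # every constraint of (6) is satisfied: stop
            return yK, vK
        cuts.append(uK1)          # append the cut z >= f(u^{K+1}) + y*g(u^{K+1})

y_opt, v_opt = benders()
print(y_opt, v_opt)
```

At each pass the master value z^K can only rise and remains a lower bound on v, while v(y^K) is an upper bound, mirroring the convergence argument in the text.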
When applied to problems (3) and (4), this algorithm is known, res-
²As before, to simplify our discussion we assume that this problem always has an optimal solution.
linear program, then the points u^j in (7) can be chosen as extreme points
the set U is compact and the functions f and g are continuous, then any
program. Even when the master problem is a linear program as in the ap-
Hays [31], Wolfe [36]). There are several possibilities for improvement:
of yK at each step;
that the initial selection of cuts can have a profound effect upon the
There have been several proposals to alter the master problem for
and Widhelm [30]) show that scaling the constraints of the master problem
ficial. Marsten, Hogan, and Blankenship [25], (see also Marsten [24]),
have had success in restricting the solution to the master problem at each
step to lie within a box centered about the previous solution. Hollaway
[19] shows how to select among multiple optima of the master problem to
other ways. For example, they might have different linear programming or
Lagrangian relaxations, one being preferred to the other when used in con-
Ray [6], Beale and Tomlin [2], and Williams [35], in the context of
linear programming relaxation for branch and bound, and Geoffrion and
location models, show that proper model formulation can generally improve
ther the question of proper model formulation for mixed integer programs
becomes an issue. Recall from section 1 that any solution to the sub-
A of the mixed integer program (2) models network flow structure, multiple
section, and the following one, we introduce methods and algorithms for
First, we must formalize some definitions. We say that the cut z ≥ f(u^1) + yg(u^1) dominates (or is stronger than) the cut z ≥ f(u^2) + yg(u^2) if f(u^1) + yg(u^1) ≥ f(u^2) + yg(u^2) for all y ∈ Y, with strict inequality for at least one point of Y. We call a cut pareto optimal if no cut dominates it. Since a cut is determined by the point u ∈ U that generates it, we apply the same terminology to these points.
The following theorem provides a method for choosing, from among the alternate optimal solutions U(ȳ) to the subproblem (8), a point that generates a pareto optimal cut: select any core point y^0 ∈ ri(Y^c) and let u^0 solve

Maximize { f(u) + y^0 g(u) : u ∈ U(ȳ) }.    (9)

Then the cut z ≥ f(u^0) + yg(u^0) is pareto optimal.
Proof: Suppose to the contrary that u^0 is not pareto optimal; that is, suppose that some point ū ∈ U satisfies

f(ū) + yg(ū) ≥ f(u^0) + yg(u^0) for all y ∈ Y, (10)

with strict inequality for at least one point of Y. Setting y = ȳ in (10) shows that ū ∈ U(ȳ); consequently, since u^0 is an optimal solution to the optimization problem (9) over U(ȳ), it is true that

f(u^0) + y^0 g(u^0) ≥ f(ū) + y^0 g(ū). (11)

Also note that, since any point w ∈ Y^c can be expressed as a convex combination of points of Y and each side of (10) is linear in y, inequality (10) remains valid for every w ∈ Y^c.
Since ū dominates u^0,

f(ū) + ŷg(ū) > f(u^0) + ŷg(u^0)

for at least one point ŷ ∈ Y. Also, since y^0 ∈ ri(Y^c), there exists a point w ∈ Y^c and a scalar 0 < θ < 1 with y^0 = θŷ + (1 − θ)w. Multiplying the strict inequality at ŷ by θ, multiplying inequality (10) evaluated at w by (1 − θ), and adding gives:

f(ū) + y^0 g(ū) > f(u^0) + y^0 g(u^0).

But this inequality contradicts (11), showing that our supposition that u^0 is not pareto optimal is untenable. □
When f(u) = ub, g(u) = (d − uD), and U = {u : uA ≤ c}, as in the mixed integer programming application, problem (9) is a linear program. In this case, U(ȳ) is the set of points in U satisfying the linear equation u(b − Dȳ) = v(ȳ) − dȳ, where v(ȳ) is the optimal value of the subproblem (5). Therefore, to find a pareto optimal point among all the alternate optimal solutions to problem (8), we solve problem (9), which becomes the linear program:

Maximize u(b − Dy^0)
subject to: u(b − Dȳ) = v(ȳ) − dȳ
and uA ≤ c.
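As a toy illustration of the two-stage choice behind problems (8) and (9) — first find the subproblem optimum at the current ȳ, then break ties using a core point y^0 — the selection can be carried out by enumeration when U is a small finite set. This is an illustrative sketch with hypothetical data, not the paper's linear programming computation:

```python
# Toy sketch of pareto optimal cut selection when U is a small finite set:
# among the alternate optima U(ybar) of the subproblem (8), problem (9) picks
# the point maximizing the cut value at a core point y0.  Data are hypothetical.

def cut_value(u, y, f, g):
    """Right-hand side f(u) + y*g(u) of the Benders cut generated by u."""
    return f[u] + y * g[u]

def pareto_cut(U, ybar, y0, f, g, tol=1e-9):
    v_bar = max(cut_value(u, ybar, f, g) for u in U)                   # v(ybar)
    U_bar = [u for u in U if cut_value(u, ybar, f, g) >= v_bar - tol]  # U(ybar)
    return max(U_bar, key=lambda u: cut_value(u, y0, f, g))           # problem (9)

# Two cut generators that are both optimal at ybar = 1, but the cut from u = 0
# (z >= y) dominates the cut from u = 1 (z >= 0.5 + 0.5y) on Y = {1, 2, 3}.
f = {0: 0.0, 1: 0.5}
g = {0: 1.0, 1: 0.5}
ybar = 1.0
y0 = 2.0     # a core point: 2 lies in the relative interior of conv({1, 2, 3})
print(pareto_cut([0, 1], ybar, y0, f, g))
```

Evaluating the tied cuts at the core point y0 = 2 separates them (2 versus 1.5), so the dominating generator u = 0 is selected.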
We should note that varying the core point y^0 might conceivably generate different pareto optimal cuts, so the multiple cut version of Benders algorithm has the option of generating several pareto optimal cuts at each iteration.
Y = {y ∈ R^k : Σ_{j=1}^k y_j ≤ p, y ≥ 0 and integer},
and g are additively separable over the sets U^j; that is,

f(u) = Σ_{j=1}^J f_j(u^j)

and

g(u) = Σ_{j=1}^J g_j(u^j),

so that

v(y) = Σ_{j=1}^J v_j(y).
Moreover, the vector u belongs to U(ȳ), meaning that the sum over j of the lefthand
sides of these expressions equals the sum of the righthand sides if, and
only if,
for all j. That is, choosing u to be one of the alternate optimal solu-
let U^j(ȳ) denote the set of optimal solutions to the optimization problem
In this case, problem (14) decomposes into several subproblems, one for each j = 1,2,...,J, each of the form of the following program:

Maximize { f_j(u^j) + y^0 g_j(u^j) : u^j ∈ U^j(ȳ) }.
v = Min Σ_{i=1}^n Σ_{j=1}^m c_ij x_ij + Σ_{j=1}^m d_j y_j

subject to: Σ_{j=1}^m x_ij ≥ 1 (i = 1,2,...,n)    (16)
x_ij ≤ y_j (i = 1,2,...,n; j = 1,2,...,m)
x_ij ≥ 0, y_j = 0 or 1, y ∈ Y

n = number of customers, m = number of potential facility sites.
The first constraint requires that each customer be serviced by some facility; the second permits service from facility j only if it is open. Our discussion in section 4 suggests reasons for choosing this particular formulation rather than the weaker alternative with constraints Σ_{i=1}^n x_ij ≤ ny_j for all j in place of the constraints x_ij ≤ y_j.
For fixed y, the Benders subproblem is

v(y) = Min Σ_{i=1}^n Σ_{j=1}^m c_ij x_ij

subject to: Σ_{j=1}^m x_ij ≥ 1 (i = 1,2,...,n)    (17)
0 ≤ x_ij ≤ y_j (1 ≤ i ≤ n, 1 ≤ j ≤ m),
whose linear programming dual is

v(y) = Max Σ_{i=1}^n [λ_i − Σ_{j=1}^m y_j π_ij]    (18)

subject to: λ_i − π_ij ≤ c_ij (all i and j)
λ_i ≥ 0, π_ij ≥ 0.

Any optimal solution (λ, π) to (18) defines the Benders cut

v ≥ Σ_{i=1}^n (λ_i − Σ_{j=1}^m π_ij y_j) + Σ_{j=1}^m d_j y_j.    (19)
(Note that we have appended the term Σ_j d_j y_j to the right hand side of the cut. This term was omitted from the objective function of the subproblem (17) because it is determined entirely by the integer variables y_j.)
Careful inspection of the linear program (17) reveals that, for most cost data, it has alternate dual optimal solutions, so it usually will be possible to derive more than one Benders cut. We next describe how to select a pareto optimal cut.
Note that for any choice of y ∈ Y, the linear programs (17) and (18) decompose into separate subproblems, one for each index i = 1,2,...,n.
For any ȳ ∈ Y, let O = {j : ȳ_j = 1} denote the set of open facilities, and for each customer i let j(i) be a cheapest open facility. Then

λ̄_i = c_ij(i) = min {c_ij : j ∈ O},
π̄_ij = 0 if j ∈ O, π̄_ij = max(0, λ̄_i − c_ij) if j ∉ O    (20)

defines an optimal solution to the linear programming dual problem (18). (The optimal dual variables have a convenient interpretation in terms of the subproblem.) The pareto optimality theorem, applied with u(i) = (λ_i, π_i1, π_i2, ..., π_im), implies that solving for each i the subproblem
Max λ_i − Σ_{j=1}^m y^0_j π_ij

subject to: λ_i − Σ_{j=1}^m ȳ_j π_ij = λ̄_i    (21)
λ_i − π_ij ≤ c_ij (j = 1,2,...,m)
π_ij ≥ 0
λ_i ≥ 0
for i = 1,2,...,n and j = 1,2,...,m generates a pareto optimal cut. Here, as before, ȳ denotes the current value of the integer variables and y^0 a core point of Y.
Our first objective is to show that, for each i, the subproblem (21) reduces to the maximization of a piecewise linear concave function of the single variable λ_i. First observe that the solution (20) satisfies the equality constraint of (21), since π̄_ij = 0 for every j ∈ O, ȳ_j = 0 for every j ∉ O, and
π̄_ij(i) = λ̄_i − c_ij(i) = 0. Next, using the equality constraint of (21) to substitute λ_i = λ̄_i + Σ_{j=1}^m ȳ_j π_ij in the objective function transforms (21) into

Max λ̄_i + Σ_{j=1}^m (ȳ_j − y^0_j) π_ij

subject to the remaining constraints of (21).
Two constraints of (21) combine to imply the following bounds on λ_i:

λ̄_i ≤ λ_i ≤ L_i.

The lower bound holds because each ȳ_j ≥ 0 and each π_ij ≥ 0 in the equality λ_i = λ̄_i + Σ_j ȳ_j π_ij. The upper bound is a consequence of our previous observation that, for all j ≠ j(i) with j ∈ O, π̄_ij = 0.
To solve this one-dimensional problem, we scan the breakpoints of the interval λ̄_i ≤ λ_i ≤ L_i in order from left to right until the slope of the objective function becomes nonpositive; if no such breakpoint is found before the scan ends, we stop, and λ_i = L_i is optimal.

Once the optimal value of λ_i is found using this algorithm for each i, the remaining variables π_ij can be set using the rules given above. Then,
for any given point y in the core of Y. Also, the algorithm applies to
this section.
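For concreteness, the closed-form dual solution (20) and the resulting cut (19) can be computed directly; the sketch below uses made-up costs c, fixed charges d, and a current facility vector ȳ (none of these data are from the paper):

```python
# Closed-form dual solution (20) for the facility location subproblem (17) and
# the corresponding Benders cut (19).  c[i][j] = service cost, d[j] = fixed
# charge, ybar = current 0/1 facility vector.  Data are illustrative.

def natural_cut(c, d, ybar):
    n, m = len(c), len(c[0])
    open_ = [j for j in range(m) if ybar[j] == 1]          # O = {j : ybar_j = 1}
    lam = [min(c[i][j] for j in open_) for i in range(n)]  # lambda_i, rule (20)
    pi = [[0.0 if j in open_ else max(0.0, lam[i] - c[i][j])
           for j in range(m)] for i in range(n)]           # pi_ij, rule (20)
    # Cut (19): v >= sum_i lam_i + sum_j (d_j - sum_i pi_ij) * y_j
    constant = sum(lam)
    coeff = [d[j] - sum(pi[i][j] for i in range(n)) for j in range(m)]
    return constant, coeff

c = [[0.0, 4.0, 9.0],    # 2 customers, 3 candidate sites
     [6.0, 1.0, 2.0]]
d = [3.0, 3.0, 3.0]
ybar = [1, 0, 1]
const, coeff = natural_cut(c, d, ybar)
print(const, coeff)
```

Because every π̄_ij vanishes on the open sites, the dual objective at ȳ collapses to Σ_i λ̄_i, which equals the subproblem value v(ȳ) — a quick consistency check on the rules (20).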
This algorithm can also be extended to more complex models like the
a procedure are similar to the algorithm that we have just described and
pp. 997-998]. Both procedures essentially give a set of rules for in-
problem (16).
stronger than the natural cuts defined by setting λ_i = λ̄_i and π_ij = π̄_ij in
to network design and other mixed integer programming models. This section
(P) Minimize Σ_{i=1}^N Σ_{j=1}^N d_ij x_ij

subject to: Σ_{j=1}^N x_ij = 1 for all i    (23)
x_ij ≤ y_j for all i and j    (24)
Σ_{j=1}^N y_j = p    (25)
x_ij ≥ 0, y_j = 0 or 1
(Q) Minimize Σ_{i=1}^N Σ_{j=1}^N d_ij x_ij

subject to: Σ_{j=1}^N x_ij = 1 for all i
Σ_{i=1}^N x_ij ≤ Ny_j for all j    (27)
Σ_{j=1}^N y_j = p
x_ij ≥ 0, y_j = 0 or 1
if we relax the integrality constraint on the yi, the feasible region for
5.1:

z ≥ 200 − 400y_1 − 400y_2 + 0y_3 + 0y_4
z ≥ 200 + 0y_1 − 400y_2 + 0y_3 − 400y_4
This set of cuts has the property that every single one must be generated
and let d_ij = 100 for all i ≠ j and d_ij = 0 for all i = j. With this
every case, requires only one Benders cut for convergence! This example
results.
represented as two equivalent mixed integer programs: P, with continuous variables x ≥ 0 and constraints Ax + By = b, and Q, with continuous variables w ≥ 0 and constraints Dw + Gy = g. Here x, w, and y are column vectors of problem variables; b and g are column vectors; and the set Y captures the integer constraints of the problem. We assume that the set Y is finite.
That is, the two models have the same integer variables and may have dif-
ferent continuous variables and constraints, but always give the same ob-
jective function value for any feasible assignment of the integer variables.
We will say that the two formulations are identical if v^P(y) = v^Q(y) for all y ∈ Y^c. One interpretation of equivalence is that v^P(y) and v^Q(y) represent the linear pro-
and Q as

Minimize z
subject to: z ≥ u(b − By) + dy for all dual feasible u
y ∈ Y

and

Minimize z
subject to: z ≥ γ(g − Gy) + dy for all dual feasible γ
y ∈ Y.
the problems defining v^P(y) and v^Q(y) are feasible and have optimal solutions for all y ∈ Y.
These constraints can be relaxed, but with added complications that do not
our definition of Benders cuts, in which a cut can be generated from any point
in the subproblem dual feasible region, produces a larger set of possible cuts
than the usual definition which restricts the cuts to those corresponding to
the extreme points of the subproblem dual feasible region. With this limited
definition of Benders cuts, the results of this section need not always be valid.
constraint)

z ≥ u(b − By) + dy
for Q if
A cut z ≥ γ(g − Gy) + dy for Q will be called unmatched with respect to the formulation P if there is no cut for P that is equal to it (in the sense that two cuts are equal if their right-hand sides are equal for all y ∈ Y) or dominates it.
least one Benders cut that is unmatched with respect to P, but P does not have
lations and the set of Benders cuts for P is a proper subset of the Benders
cuts for Q.
with respect to P. Since we are assuming that the set Y is finite, the
Now observe that the above inequality still holds if we replace the set Y
by Y^c. Using linear programming duality theory, we can reverse the order
subject to: Ax + By = b
x ≥ 0, y ∈ Y^c
subject to: Ax = b - By
x ≥ 0.
to Q, gives us:
-28-
subject to: Dw = g − Gy
w ≥ 0
or

v^P(y^0) ≤ v^Q(y^0).
(<-) The reverse implication has essentially the same proof with all inequalities reversed. □
programming problem. Q is superior to P if, and only if, v^Q(y) ≥ v^P(y) for all y ∈ Y^c, with strict inequality for at least one point y^0 ∈ Y^c.
Proof: (<-) If v^Q(y) ≥ v^P(y) for all y ∈ Y^c, Lemma 1 says that P does not have any Benders cuts that are unmatched with respect to Q. Because there is a y^0 ∈ Y^c such that v^Q(y^0) > v^P(y^0), Lemma 1 also implies that Q has a cut that is unmatched with respect to P. Therefore Q is superior to P.

(->) If Q is superior to P, then P does not have any cuts that are unmatched with respect to Q. Lemma 1 then tells us that v^Q(y) ≥ v^P(y) for all y ∈ Y^c. The definition of superior also states that Q has a cut that is unmatched with respect to P, and using Lemma 1 we can say that there exists a y^0 ∈ Y^c such that

v^Q(y^0) > v^P(y^0). □
preted in another way. Let the relaxed primal problem for any formu-
the "tightest" possible constraint set) for its relaxed primal problem
a smaller feasible region for its relaxed primal problem will result in larger values of the function v(y), which Lemma 1 and Theorem 2 indicate is desirable.
of this section has two formulations P and Q. They differ only in that P has constraints of the form x_ij ≤ y_j for all (i,j), whereas Q has the aggregated constraints (27). The feasible region of the relaxed primal problem for P is no larger than that for Q, so v^P(y) ≥ v^Q(y) for all y ∈ Y^c. A straight-
forward computation shows that v^P(y^0) = 200 > v^Q(y^0) = 0 for y^0 = (1/2, 1/2, 1/2, 1/2).
since it has a relaxed primal problem whose feasible region is the smallest.
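The straightforward computation can be reproduced with a small sketch at one fractional point exhibiting the gap. The per-customer greedy used for P is valid because P's subproblem separates by customer; the value given for Q relies on the fact that, for these data, each customer's cheapest site has ample aggregated capacity — an instance-specific argument, not a general algorithm:

```python
# Compare the subproblem values v_P(y0) and v_Q(y0) of the two p-median
# formulations at the fractional point y0 = (1/2, 1/2, 1/2, 1/2), with
# d_ij = 100 for i != j and d_ii = 0 (N = 4).

N = 4
d = [[0.0 if i == j else 100.0 for j in range(N)] for i in range(N)]
y0 = [0.5] * N

def v_P(d, y):
    """P's subproblem separates by customer: each customer buys its one unit
    of service greedily from its cheapest sites, site j offering at most y_j."""
    total = 0.0
    for row in d:
        need = 1.0
        for j in sorted(range(len(y)), key=lambda j: row[j]):
            take = min(need, y[j])
            total += take * row[j]
            need -= take
            if need <= 0.0:
                break
    return total

def v_Q(d, y):
    """For these data each site has aggregated capacity N*y_j = 2 >= 1, so
    assigning every customer wholly to its cheapest (zero-cost) site is
    feasible; with nonnegative costs that assignment is optimal."""
    assert all(len(d) * yj >= 1.0 for yj in y)
    return sum(min(row) for row in d)

print(v_P(d, y0), v_Q(d, y0))
```

Under P each customer can draw only 1/2 a unit from its own free site and must buy the rest at cost 100, giving 4 x 50 = 200, while Q's aggregated capacities let every customer be served for free — exactly the 200 versus 0 gap cited in the text.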
integer program as in (2), let C(P) denote the mixed integer program whose
v^C(P)(y) ≥ v^Q(y) for all y ∈ Y^c and for all equivalent formulations Q
of this problem.
Proof: Let y* ∈ Y^c be arbitrary and let x* be an optimal solution to C(P) when y = y*; that is, v^C(P)(y*) = cx* + dy*. By definition of the
convex hull, (y*, x*) is a convex combination with weights λ_i of a finite number of mixed integer feasible solutions (y^i, x^i). Consequently v^C(P)(y*) ≥ Σ_i λ_i v^Q(y^i) and, by convexity of v^Q(y), the right-hand side of
or identical to Q.
problem is that when Benders algorithm is applied to it, only one cut is
constraints of the following problem define the convex hull of the mixed
v^C(P) = min cx + dy
subject to: Rx + Qy = q
x ≥ 0, y ∈ Y.
Then we have
hull formulation C(P) requires only one Benders cut for convergence.
Proof: v^C(P) = min_{y ∈ Y} min_{x ≥ 0} { cx + dy : Rx + Qy = q }.
We may replace Y by Y^c in this expression without affecting the optimal solution value. Then applying linear programming duality theory (and again assuming that v^C(P)(y) is feasible for all y ∈ Y^c) we have

v^C(P) = min_{y ∈ Y^c} max_{uR ≤ c} [u(q − Qy) + dy].

Another application of linear programming duality theory yields

v^C(P) = min_{y ∈ Y^c} [u*(q − Qy) + dy]

for some u* satisfying u*R ≤ c. Since the last objective function is a linear function of y, its minimum over Y^c is attained at a point of Y, and so

v^C(P) = min { z : z ≥ u*(q − Qy) + dy, y ∈ Y }.

That is, the single cut generated by u* solves the Benders master problem for the convex hull formulation C(P). □
Although we have shown that a reduced feasible region for the relaxed primal problem is desirable and that the convex hull formulation of a problem requires only a single Benders cut for con-
and Hong [32] have recently had success generating such constraints itera-
rich application area for the results of this section. Network problems
problem solved by Geoffrion and Graves [16], and the capacitated plant
location problem described by Guignard and Spielberg [18], are all net-
work examples that have several easily derived formulations. For these
problems, since the alternative formulations usually have the same problem
region for their respective relaxed primal problems. Due to the compara-
they reduce the size of the feasible region for the modified primal problem.
formulation strengthens the Benders cuts that can be derived, but also com-
between the quality of Benders cuts available and the time needed to solve
Benders cuts, but these stronger cuts may have to be distinguished from
in a future paper.
use with Benders decomposition. Suggestions were also made for modifying
formulations for mixed integer programs based upon our results and to per-
form computational tests evaluating our criteria for selecting among alter-
REFERENCES
11. Florian, M. G., Guerin, G., and Bushel, G., "The Engine Scheduling
Problem in a Railway Network," INFOR Journal, Vol. 14, pp. 121-138, 1976.
12. Garfinkel, R. and Nemhauser, G., Integer Programming, Wiley, New York,
1972.
20. Lasdon, L. S., Optimization Theory for Large Systems, The Macmillan
Company, New York, 1970.
24. Marsten, R. E., "The Use of the BOXSTEP Method in Discrete Optimization,"
Math. Prog. Study 3, pp. 127-144, 1975.
25. Marsten, R. E., Hogan, W. W., and Blankenship, J. W., "The BOXSTEP
Method for Large-Scale Optimization," Opns. Res., Vol. 23,
pp. 389-405, 1975.
26. Mevert, P., "Fixed Charge Network Flow Problems: Applications and
Methods of Solution." Presented at Large Scale and Hierarchical
Systems Workshop, Brussels, May, 1977.
27. Mifflin, R., "An Algorithm for Constrained Optimization with Semi-
smooth Functions," International Inst. for Appl. Sys. Analysis,
Laxenberg, Austria, 1977.
32. Padberg, M. and Hong, S., On the Symmetric Traveling Salesman Problem:
A Computational Study, Report #77-89, New York University, 1977.
37. Wolfe, P., "A Method of Conjugate Subgradients for Minimizing Non-
differentiable Functions," Math. Prog. Study 3, pp. 145-173, 1975.