Constrained Optimization: Wolfram Mathematica® Tutorial Collection
CONSTRAINED OPTIMIZATION
For use with Wolfram Mathematica® 7.0 and later.
Introduction
  Optimization Problems
  Global Optimization
  Local Optimization
  Solving Optimization Problems
Linear Programming
  Introduction
  The LinearProgramming Function
  Examples
  Importing Large Datasets and Solving Large-Scale Problems
  Application Examples of Linear Programming
  Algorithms for Linear Programming
Optimization Problems
Constrained optimization problems are problems for which a function f(x) is to be minimized or
maximized subject to constraints F(x). Here f : ℝ^n → ℝ is called the objective function and F(x) is
a Boolean-valued formula. In Mathematica the constraints F(x) can be an arbitrary Boolean
combination of equations g(x) == 0, weak inequalities g(x) ≥ 0, strict inequalities g(x) > 0, and
x ∈ Integers statements. The following notation will be used.
Min f(x)   s.t. F(x)        (1)

Max f(x)   s.t. F(x)        (2)
The following describes constrained optimization problems more precisely, restricting the discus-
sion to minimization problems for brevity.
Global Optimization
A point u ∈ ℝ^n is said to be a global minimum of f subject to constraints F if u satisfies the
constraints and for any point v that satisfies the constraints, f(u) ≤ f(v).

A value a ∈ ℝ ∪ {-∞, ∞} is said to be the global minimum value of f subject to constraints F if for
any point v that satisfies the constraints, a ≤ f(v).

The global minimum value a exists for any f and F. The global minimum value a is attained if
there exists a point u such that F(u) is true and f(u) == a. Such a point u is necessarily a global
minimum.
If f is a continuous function and the set of points satisfying the constraints F is compact (closed
and bounded) and nonempty, then a global minimum exists. Otherwise a global minimum may
or may not exist.
Here the minimum value is not attained. The set of points satisfying the constraints is not
closed.
In[1]:= Minimize[{x, x^2 + y^2 < 1}, {x, y}]

Minimize::wksol : Warning: There is no minimum in the region described by the constraints; returning a result on the boundary.

Out[1]= {-1, {x -> -1, y -> 0}}
Here the set of points satisfying the constraints is closed but unbounded. Again, the minimum
value is not attained.
In[3]:= Minimize[{x^2, x y == 1}, {x, y}]

Minimize::natt : The minimum is not attained at any point satisfying the given constraints.
The minimum value may be attained even if the set of points satisfying the constraints is
neither closed nor bounded.
In[4]:= Minimize[{x^2 + (y - 1)^2, y > x^2}, {x, y}]
Out[4]= {0, {x -> 0, y -> 1}}
Local Optimization
A point u ∈ ℝ^n is said to be a local minimum of f subject to constraints F if u satisfies the
constraints and, for some r > 0, if v satisfies ‖v - u‖ < r and F(v), then f(u) ≤ f(v).
A local minimum may not be a global minimum. A global minimum is always a local minimum.
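As a small illustration (not part of the original text), a one-variable objective with two local minima shows the difference: a global numerical method such as NMinimize searches for the deeper minimum, while FindMinimum started near the shallower minimum stays there.

f[x_] := (x^2 - 1)^2 + x/4;   (* illustrative objective; the deeper minimum is near x = -1 *)
NMinimize[f[x], x]            (* global numerical search *)
FindMinimum[f[x], {x, 1}]     (* local search started at x = 1 finds the shallower minimum *)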
Linear Programming
Introduction
Linear programming problems are optimization problems where the objective function and
constraints are all linear.
Mathematica has a collection of algorithms for solving linear optimization problems with real
variables, accessed via LinearProgramming, FindMinimum , FindMaximum , NMinimize,
NMaximize, Minimize, and Maximize. LinearProgramming gives direct access to linear program-
ming algorithms, provides the most flexibility for specifying the methods used, and is the most
efficient for large-scale problems. FindMinimum , FindMaximum , NMinimize, NMaximize,
Minimize, and Maximize are convenient for solving linear programming problems in equation
and inequality form.
This solves the linear programming problem

Min  x + 2 y
s.t. -5 x + y == 7
     x + y ≥ 26
     x ≥ 3, y ≥ 4

using Minimize.
In[1]:= Minimize[{x + 2 y, -5 x + y == 7 && x + y >= 26 && x >= 3 && y >= 4}, {x, y}]
Out[1]= {293/6, {x -> 19/6, y -> 137/6}}
This solves the same problem using NMinimize. NMinimize returns a machine-number
solution.
In[2]:= NMinimize[{x + 2 y, -5 x + y == 7 && x + y >= 26 && x >= 3 && y >= 4}, {x, y}]
Out[2]= {48.8333, {x -> 3.16667, y -> 22.8333}}
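For comparison, here is a minimal sketch of the same problem in the matrix form accepted by LinearProgramming, where {7, 0} marks an equality constraint, {26, 1} a ≥ constraint, and {3, 4} gives the lower bounds on the variables.

LinearProgramming[{1, 2}, {{-5, 1}, {1, 1}}, {{7, 0}, {26, 1}}, {3, 4}]

This should return {19/6, 137/6}, matching the Minimize result above.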
The Method option specifies the algorithm used to solve the linear programming problem.
Possible values are Automatic, "Simplex", "RevisedSimplex", and "InteriorPoint". The
default is Automatic, which automatically chooses from the other methods based on the problem size and precision.
Examples
Interior point algorithms for linear programming, loosely speaking, iterate from the interior of
the polytope defined by the constraints. They get closer to the solution very quickly, but unlike
the simplex/revised simplex algorithms, do not find the solution exactly. Mathematica's imple-
mentation of an interior point algorithm uses machine precision sparse linear algebra. Therefore
for large-scale machine-precision linear programming problems, the interior point method is
more efficient and should be used.
This solves a linear programming problem that has multiple solutions (any point that lies on the
line segment between {0, 1} and {1, 0} is a solution); the interior point algorithm gives a
solution that lies in the middle of the solution set.

In[6]:= LinearProgramming[{-1., -1}, {{1., 1.}}, {{1., -1}}, Method -> "InteriorPoint"]
Out[6]= {0.5, 0.5}
Using Simplex or RevisedSimplex, a solution at the boundary of the solution set is given.
In[7]:= LinearProgramming[{-1., -1}, {{1., 1.}}, {{1., -1}}, Method -> "RevisedSimplex"]
Out[7]= {1., 0.}
This shows that the interior point method is much faster for the following random sparse linear
programming problem of 200 variables and gives a similar optimal value.
In[43]:= m = SparseArray[RandomChoice[{0.1, 0.9} -> {1., 0.}, {50, 200}]];
In[44]:= Timing[xi = LinearProgramming[Range[200], m, Table[0, {50}],
            Method -> "InteriorPoint"];]
Out[44]= {0.012001, Null}
For a linear programming problem in the form

Min  c^T x                                (P)
s.t. A1 x = b1
     A2 x ≥ b2
     l ≤ x ≤ u,

its dual is

Max  b^T y + l^T z - u^T w                (D)
s.t. A^T y + z - w = c
     y2 ≥ 0,  z ≥ 0,  w ≥ 0.
The relationship between the solutions of the primal and dual problems is given by the following
table.
When both problems are feasible, the optimal values of (P) and (D) are the same, and the
following complementarity conditions hold for the primal solution x* and dual solution y*, z*, w*:

(A2 x* - b2)^T y2* = 0
(l - x*)^T z* = (u - x*)^T w* = 0
Consider the primal problem

Min  3 x1 + 4 x2
s.t. x1 + 2 x2 ≥ 5
     1 ≤ x1 ≤ 4,  1 ≤ x2 ≤ 4,

with dual

Max  5 y1 + z1 + z2 - 4 w1 - 4 w2
s.t. y1 + z1 - w1 = 3
     2 y1 + z2 - w2 = 4
     y1, z1, z2, w1, w2 ≥ 0.

This solves the primal problem and also returns the dual solution.

In[14]:= {x, y, z, w} = DualLinearProgramming[{3, 4}, {{1, 2}}, {5}, {{1, 4}, {1, 4}}]
Out[14]= {{1, 2}, {2}, {1, 0}, {0, 0}}
This confirms that the primal and dual give the same objective value.
In[15]:= {3, 4}.x
Out[15]= 11
The dual of the constraint is y = {2.}, which means that for one unit of increase in the right-
hand side of the constraint, there will be 2 units of increase in the objective. This can be
confirmed by perturbing the right-hand side of the constraint by 0.001.

In[17]:= {x, y, z, w} = DualLinearProgramming[{3, 4}, {{1, 2}}, {5 + 0.001}, {{1, 4}, {1, 4}}]
Out[17]= {{1., 2.0005}, {2.}, {1., 0.}, {0., 0.}}
Here the problem is infeasible (the constraints require x + y ≤ 1 and x + y ≥ 2 simultaneously), so LinearProgramming returns unevaluated.

Out[19]= LinearProgramming[{1., 1}, {{1, 1}, {1, 1}}, {{1, -1}, {2, 1}}, Method -> InteriorPoint]
Sometimes the heuristic cannot tell with certainty if a problem is infeasible or unbounded.
In[20]:= LinearProgramming[{-1., -1.}, {{1., 1.}}, {1.}, Method -> "InteriorPoint"]

LinearProgramming::lpdinf : The dual of this problem is infeasible, which implies that this problem is either unbounded or infeasible. Setting the option Method -> Simplex should give a more definite answer, though large problems may take longer computing time.

Out[20]= LinearProgramming[{-1., -1.}, {{1., 1.}}, {1.}, Method -> InteriorPoint]
Using the Simplex method as suggested by the message shows that the problem is unbounded.
In[21]:= LinearProgramming[{-1., -1.}, {{1., 1.}}, {1.}, Method -> "Simplex"]
Large problems that contain dense columns typically benefit from dense column treatment.
In[95]:= A = SparseArray[{{i_, i_} -> 1., {i_, 1} -> 1.}, {300, 300}];
c = Table[1, {300}];
b = A.Range[300];
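The solve that originally followed this setup is not shown in the surviving text; a minimal sketch of timing the interior point method on this system (any option specifically controlling dense-column handling is not given in the surviving text and is omitted here):

Timing[sol = LinearProgramming[N[c], A, N[b], Method -> "InteriorPoint"];]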
This imports the linear programming problem specified by the MPS file "afiro.mps".
In[25]:= p = Import["Optimization/Data/afiro.mps"]
Out[25]= {{-0.4 X02MPS - 0.32 X14MPS - 0.6 X23MPS - 0.48 X36MPS + 10. X39MPS,
  -1. X01MPS + 1. X02MPS + 1. X03MPS == 0. && -1.06 X01MPS + 1. X04MPS == 0. && 1. X01MPS <= 80. &&
  -1. X02MPS + 1.4 X14MPS <= 0. && -1. X06MPS - 1. X07MPS - 1. X08MPS - 1. X09MPS + 1. X14MPS + 1. X15MPS == 0. &&
  -1.06 X06MPS - 1.06 X07MPS - 0.96 X08MPS - 0.86 X09MPS + 1. X16MPS == 0. && 1. X06MPS - 1. X10MPS <= 80. &&
  1. X07MPS - 1. X11MPS <= 0. && 1. X08MPS - 1. X12MPS <= 0. && 1. X09MPS - 1. X13MPS <= 0. &&
  -1. X22MPS + 1. X23MPS + 1. X24MPS + 1. X25MPS == 0. && -0.43 X22MPS + 1. X26MPS == 0. && 1. X22MPS <= 500. &&
  -1. X23MPS + 1.4 X36MPS <= 0. && -0.43 X28MPS - 0.43 X29MPS - 0.39 X30MPS - 0.37 X31MPS + 1. X38MPS == 0. &&
  1. X28MPS + 1. X29MPS + 1. X30MPS + 1. X31MPS - 1. X36MPS + 1. X37MPS + 1. X39MPS == 44. &&
  1. X28MPS - 1. X32MPS <= 500. && 1. X29MPS - 1. X33MPS <= 0. && 1. X30MPS - 1. X34MPS <= 0. &&
  1. X31MPS - 1. X35MPS <= 0. && 2.364 X10MPS + 2.386 X11MPS + 2.408 X12MPS + 2.429 X13MPS - 1. X25MPS +
  2.191 X32MPS + 2.219 X33MPS + 2.249 X34MPS + 2.279 X35MPS <= 0. && -1. X03MPS + 0.109 X22MPS <= 0. &&
  -1. X15MPS + 0.109 X28MPS + 0.108 X29MPS + 0.108 X30MPS + 0.107 X31MPS <= 0. &&
  0.301 X01MPS - 1. X24MPS <= 0. && 0.301 X06MPS + 0.313 X07MPS + 0.313 X08MPS + 0.326 X09MPS - 1. X37MPS <= 0. &&
  1. X04MPS + 1. X26MPS <= 310. && 1. X16MPS + 1. X38MPS <= 300. && X01MPS >= 0 && X02MPS >= 0 && X03MPS >= 0 &&
  X04MPS >= 0 && X06MPS >= 0 && X07MPS >= 0 && X08MPS >= 0 && X09MPS >= 0 && X10MPS >= 0 && X11MPS >= 0 &&
  X12MPS >= 0 && X13MPS >= 0 && X14MPS >= 0 && X15MPS >= 0 && X16MPS >= 0 && X22MPS >= 0 && X23MPS >= 0 &&
  X24MPS >= 0 && X25MPS >= 0 && X26MPS >= 0 && X28MPS >= 0 && X29MPS >= 0 && X30MPS >= 0 && X31MPS >= 0 &&
  X32MPS >= 0 && X33MPS >= 0 && X34MPS >= 0 && X35MPS >= 0 && X36MPS >= 0 && X37MPS >= 0 && X38MPS >= 0 && X39MPS >= 0},
 {X01MPS, X02MPS, X03MPS, X04MPS, X06MPS, X07MPS, X08MPS, X09MPS, X10MPS, X11MPS, X12MPS,
  X13MPS, X14MPS, X15MPS, X16MPS, X22MPS, X23MPS, X24MPS, X25MPS, X26MPS, X28MPS, X29MPS,
  X30MPS, X31MPS, X32MPS, X33MPS, X34MPS, X35MPS, X36MPS, X37MPS, X38MPS, X39MPS}}
In[26]:= NMinimize @@ p
Out[26]= {-464.753, {X01MPS -> 80., X02MPS -> 25.5, X03MPS -> 54.5, X04MPS -> 84.8, X06MPS -> 18.2143, X07MPS -> 0.,
  X08MPS -> 0., X09MPS -> 0., X10MPS -> 0., X11MPS -> 0., X12MPS -> 0., X13MPS -> 0., X14MPS -> 18.2143,
  X15MPS -> 0., X16MPS -> 19.3071, X22MPS -> 500., X23MPS -> 475.92, X24MPS -> 24.08, X25MPS -> 0.,
  X26MPS -> 215., X28MPS -> 0., X29MPS -> 0., X30MPS -> 0., X31MPS -> 0., X32MPS -> 0., X33MPS -> 0.,
  X34MPS -> 0., X35MPS -> 0., X36MPS -> 339.943, X37MPS -> 383.943, X38MPS -> 0., X39MPS -> 0.}}
This shows that for MPS formatted data, the following three elements can be imported.
In[101]:= p = Import["Optimization/Data/ganges.mps", "Elements"]
Out[101]= {ConstraintMatrix, Equations, LinearProgrammingData}
This imports the problem "ganges", with 1309 constraints and 1681 variables, in a form suitable
for LinearProgramming.
In[102]:= {c, A, b, bounds} =
   Import["Optimization/Data/ganges.mps", "LinearProgrammingData"];
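The step that produces the solution vector x used in the next input is not shown in the surviving text; presumably it is a direct call with the imported data, for example:

x = LinearProgramming[c, A, b, bounds];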
In[104]:= c.x
Out[104]= -109586.
The "ConstraintMatrix" specification can be used to get the sparse constraint matrix only.
In[105]:= p = Import["Optimization/Data/ganges.mps", "ConstraintMatrix"]
Out[105]= SparseArray[<6912>, {1309, 1681}]
This gets a temporary file name, and exports the string to the file.
In[123]:= file = Close[OpenWrite[]];
Export[file, txt, "Text"];
This imports the file, using the "FreeFormat" -> True option.
In[126]:= Import[file, "MPS", "FreeFormat" -> True]
Out[126]= {{1. xMPS + 4. yMPS + 7. ZMPS, 2. xMPS + 5. yMPS <= 10. && 3. xMPS + 8. ZMPS >= 11. && 6. yMPS + 9. ZMPS == 12. &&
  xMPS >= 0 && xMPS <= 13. && yMPS >= 14. && yMPS <= 15. && ZMPS >= 0}, {xMPS, yMPS, ZMPS}}
This gets the "afiro" problem from ExampleData in a form suitable for LinearProgramming.
In[8]:= ExampleData[{"LinearProgramming", "afiro"}, "LinearProgrammingData"];
In[9]:= LinearProgramming @@ %
Out[9]= {80., 25.5, 54.5, 84.8, 18.2143, 0., 0., 0., 0., 0., 0., 0., 18.2143, 0., 19.3071, 500.,
  475.92, 24.08, 0., 215., 0., 0., 0., 0., 0., 0., 0., 0., 339.943, 383.943, 0., 0.}
This shows other properties that can be imported for the "afiro" problem.
In[10]:= ExampleData[{"LinearProgramming", "afiro"}, "Properties"]
Out[10]= {Collection, ConstraintMatrix, Dimensions, Equations, LinearProgrammingData, Name, Source}
L1-Norm Minimization
It is possible to solve an ℓ1-norm minimization problem

Min ‖A x - b‖₁

by converting it into the linear programming problem

Min  z^T e
s.t. z ≥ A x - b
     z ≥ -A x + b,

where e is the vector of ones.
Consider the overdetermined linear system A x = b with

A = {{1, 2, 3}, {4, 5, 5}, {7, 8, 9}, {10, 11, 12}},   b = {1, 2, 3, 4}.
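The code for the ℓ1 solve itself does not survive in this text; a minimal sketch of the reformulation above, using hypothetical variable names xx and zz (since the objective and constraints are linear, NMinimize dispatches such a problem to LinearProgramming):

A = {{1, 2, 3}, {4, 5, 5}, {7, 8, 9}, {10, 11, 12}};
b = {1, 2, 3, 4};
xv = Array[xx, 3]; zv = Array[zz, 4];   (* xv approximates x; zv bounds the residuals *)
NMinimize[{Total[zv],
   (And @@ Thread[zv >= A.xv - b]) && (And @@ Thread[zv >= -(A.xv - b)])},
  Join[xv, zv]]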
The least-squares solution can be found using PseudoInverse. This gives a larger ℓ1 norm, but
a smaller ℓ2 norm.
In[41]:= x2 = PseudoInverse[A].b
Out[41]= {4/513, 58/513, 112/513}
[Figure: an anchor structure attached to a wall, carrying a load.]
This problem can be modeled by discretizing and simulating it using nodes and links. The model-
ing process is illustrated using the following figure. Here a grid of 7×10 nodes is generated.
Each node is then connected by a link to all other nodes that are of Manhattan distance of less
than or equal to three. The three red nodes are assumed to be fixed to the wall, while on all
other nodes, compression and tension forces must balance.
Each link represents a rigid rod that has a thickness, with its weight proportional to the force on
it and its length. The aim is to minimize the total material used, which is proportional to the sum over all links of |force| × link length.
Hence mathematically this is a linearly constrained minimization problem, with an objective
function that is a sum of absolute values of linear functions.

The absolute values |force| × link_length in the objective function can be replaced by breaking
each force down into a combination of compression and tension forces, each non-negative.
Thus assume E is the set of links, V the set of nodes, l_ij the length of the link between nodes i and
j, and c_ij and t_ij the compression and tension forces on that link; then the above model can be
written as

Min  ∑_{{i,j}∈E} l_ij (c_ij + t_ij)

subject to  ∑_{{i,k}∈E} (t_ik - c_ik) = load_i,   t_ij, c_ij ≥ 0,  for all i ∈ V and {i, j} ∈ E.
The following sets up the model, solves it, and plots the result; it is based on an AMPL model
[2].
In[1]:= OptimalAnchorDesign[X_, Y_, ANCHORS_, forcepoints_, dist_: 3] :=
  Module[{a, c, ldist, p, NODES, UNANCHORED, setx, sety, length, xload, yload,
     nnodes, getarcs, ARCS, comp, comps, tensions, const1, const2, lengths, volume,
     inedges, outedges, nodebalance, const3, vars, totalf, maxf, res, tens,
     setInOutEdges, consts, sol, f, xii, yii, xjj, yjj, t, rhs, ma, obj, m, n},
    Clear[comp, tensions, tens, vars];
    (* a lattice of nodes *)
    NODES = Partition[Flatten[Outer[List, X, Y]], 2];
    (* these are the nodes near the wall that will be anchored *)
    UNANCHORED = Complement[NODES, ANCHORS];
    (* the many links that exist in the structure that we try to optimize away *)
    setx[{x_, y_}] := (xload[x, y] = 0);
    sety[{x_, y_}] := (yload[x, y] = 0);
    Map[setx, UNANCHORED];
    Map[sety, UNANCHORED];
    Map[(yload[#[[1]], #[[2]]] = -1) &, forcepoints];
    (* objective function *)
    volume = lengths.(comps + tensions);
    nodebalance[{x_, y_}] :=
     Module[{Inedges, Outedges, xforce, yforce}, Inedges = inedges[{x, y}];
      Outedges = outedges[{x, y}];
      xforce[{xi_, yi_, xj_, yj_}] := ((xj - xi)/length[{xi, yi, xj, yj}])*
        (comp[xi, yi, xj, yj] - tens[xi, yi, xj, yj]);
      yforce[{xi_, yi_, xj_, yj_}] := ((yj - yi)/length[{xi, yi, xj, yj}])*
        (comp[xi, yi, xj, yj] - tens[xi, yi, xj, yj]);
      (* constraints *)
      {Total[Map[xforce, Inedges]] - Total[Map[xforce, Outedges]] == xload[x, y],
       Total[Map[yforce, Inedges]] - Total[Map[yforce, Outedges]] == yload[x, y]}
      ];
    const3 = Flatten[Map[nodebalance[#] &, UNANCHORED]];
    (* assemble the variables and constraints, and solve *)
    (* solve *)
    t = Timing[sol = LinearProgramming[obj, ma,
         Transpose[{-rhs, Table[0, {m}]}], Table[{0, Infinity}, {n}]];];
    Print["CPU time = ", t[[1]], " Seconds"];
    Map[Set @@ # &, Transpose[{vars, sol}]];
    (* Now we plot the links that have a force of at least 0.001 and
      get the optimal design of the anchor. We color code the drawing
      so that red means a large force and blue a small one. Also,
      links with large forces are drawn thicker than those with small forces. *)
This solves the problem by placing 30 nodes in the horizontal and vertical directions.
In[2]:= m = 30; (* y direction *)
n = 30; (* x direction *)
X = Table[i, {i, 0, n}];
Y = Table[i, {i, 0, m}];
res = OptimalAnchorDesign[X, Y,
   Table[{1, i}, {i, Round[m/3], Round[m/3*2]}], {{n, m/2}}, 3]
Number of variables = 27496   number of constraints = 1900
CPU time = 4.8123 Seconds
Out[6]=
If, however, the anchor is fixed not on the wall, but on some points in space, notice how the
results resemble the shape of some leaves. Perhaps the structure of leaves is optimized in the
process of evolution.
In[7]:= m = 40; (* must be even *)
n = 40;
X = Table[i, {i, 0, n}];
Y = Table[i, {i, 0, m}];
res = OptimalAnchorDesign[X, Y,
   Table[{Round[n/3], i}, {i, Round[m/2] - 1, Round[m/2] + 1}], {{n, m/2}}, 3]
Number of variables = 49456   number of constraints = 3356
CPU time = 9.83262 Seconds
Out[11]=
Although the simplex and revised simplex algorithms are quite efficient in practice, and are
guaranteed to find the global optimum, they have poor worst-case behavior: it is possible to
construct a linear programming problem for which the simplex or revised simplex method takes
a number of steps exponential in the problem size.
Mathematica implements simplex and revised simplex algorithms using dense linear algebra.
The unique feature of this implementation is that it is possible to solve exact/extended precision
problems. Therefore these methods are more suitable for small-sized problems for which non-
machine number results are needed.
This sets up a random linear programming problem with 20 constraints and 200 variables.
In[12]:= SeedRandom[123];
{m, n} = {20, 200};
c = Table[RandomInteger[{1, 10}], {n}];
A = Table[RandomInteger[{-100, 100}], {m}, {n}];
b = A.Table[1, {n}];
bounds = Table[{-10, 10}, {n}];
This solves the problem. Typically, for a linear programming problem with many more variables
than constraints, the revised simplex algorithm is faster. On the other hand, if there are many
more constraints than variables, the simplex algorithm is faster.
In[25]:= t = Timing[x = LinearProgramming[c, A, b, bounds, Method -> "Simplex"];];
Print["time = ", t[[1]], " optimal value = ", c.x, " or ", N[c.x]]
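For comparison (not shown in the surviving text), the revised simplex method can be timed in the same way:

t = Timing[x = LinearProgramming[c, A, b, bounds, Method -> "RevisedSimplex"];];
Print["time = ", t[[1]], " optimal value = ", c.x, " or ", N[c.x]]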
If only machine-number results are desired, then the problem should be converted to machine
numbers, and the interior point algorithm should be used.
In[20]:= t = Timing[
    x = LinearProgramming[N[c], N[A], N[b], N[bounds], Method -> "InteriorPoint"];];
Print["time = ", t, " optimal value = ", c.x]
time = {0.036002, Null} optimal value = -10935.
Furthermore, the Mathematica simplex and revised simplex implementations use dense linear
algebra, while the interior point implementation uses machine-number sparse linear algebra.
Therefore, for large-scale, machine-number linear programming problems, the interior point
method is more efficient and should be used.
The interior point algorithm works with a linear programming problem in the standard form

Min  c^T x,   s.t.  A x = b,  x ≥ 0,

where c, x ∈ ℝ^n, A ∈ ℝ^(m×n), b ∈ ℝ^m. This problem can be solved using a barrier function
formulation to deal with the positivity constraints,

Min  c^T x - t ∑_{i=1}^n ln(x_i),   s.t.  A x = b,   x > 0,   with t > 0, t → 0.

The first-order optimality condition for the barrier problem is

c - t X^(-1) e = A^T y,   A x = b,   x ≥ 0,

where X = diag(x) and e is the vector of ones. Introducing z = t X^(-1) e, this can be written as the system

x z = t e
A^T y + z = c
A x = b
x, z ≥ 0.
This is a set of m + 2 n linear/nonlinear equations with constraints. It can be solved using
Newton's method, with each step solving

    ( X  Z  0   ) ( Δz )     ( t e - x z      )
    ( I  0  A^T ) ( Δx )  =  ( c - A^T y - z  )
    ( 0  A  0   ) ( Δy )     ( b - A x        ).
One way to solve this linear system is to use Gaussian elimination to bring the matrix into
block triangular form:

    ( X  Z          0   )      ( X   Z          0   )      ( X   Z          0              )
    ( I  0          A^T )  →   ( 0  -X^(-1) Z   A^T )  →   ( 0  -X^(-1) Z   A^T            )
    ( 0  A          0   )      ( 0   A          0   )      ( 0   0          A Z^(-1) X A^T ).
To solve this block triangular system, the so-called normal system can be solved, with the
matrix in this normal system

B = A Z^(-1) X A^T.
This matrix is positive definite, but becomes very ill-conditioned as the solution is approached.
Thus numerical techniques are used to stabilize the solution process, and typically the interior
point method can only be expected to solve the problem to a tolerance of roughly the square
root of machine epsilon (about 10^-8 at machine precision).
Convergence Tolerance
General linear programming problems are first converted to the standard form

Min  c^T x
s.t. A x = b,  x ≥ 0,

with the corresponding dual

Max  b^T y
s.t. A^T y + z = c,  z ≥ 0.

The convergence criterion for the interior point algorithm is

‖b - A x‖ / max(1, ‖b‖) + ‖c - A^T y - z‖ / max(1, ‖c‖) + ‖c^T x - b^T y‖ / max(1, ‖c^T x‖, ‖b^T y‖) ≤ tolerance.
References
[1] Vanderbei, R. Linear Programming: Foundations and Extensions. Springer-Verlag, 2001.
[2] Mehrotra, S. "On the Implementation of a Primal-Dual Interior Point Method." SIAM Journal
on Optimization 2 (1992): 575-601.
Introduction
Numerical algorithms for constrained nonlinear optimization can be broadly categorized into
gradient-based methods and direct search methods. Gradient-based methods use first-derivative
(gradient) or second-derivative (Hessian) information. Examples are the sequential
quadratic programming (SQP) method, the augmented Lagrangian method, and the (nonlinear)
interior point method. Direct search methods do not use derivative information. Examples are
Nelder-Mead, genetic algorithm and differential evolution, and simulated annealing. Direct
search methods tend to converge more slowly, but can be more tolerant to the presence of
noise in the function and constraints.
Typically, algorithms only build up a local model of the problems. Furthermore, to ensure conver -
gence of the iterative process, many such algorithms insist on a certain decrease of the objec-
tive function or of a merit function which is a combination of the objective and constraints. Such
algorithms will, if convergent, only find the local optimum, and are called local optimization
algorithms. In Mathematica local optimization problems can be solved using FindMinimum .
Global optimization algorithms, on the other hand, attempt to find the global optimum, typically
by allowing decrease as well as increase of the objective/merit function. Such algorithms are
usually computationally more expensive. Global optimization problems can be solved exactly
using Minimize or numerically using NMinimize.
Min  x - y
s.t. -3 x² + 2 x y - y² ≥ -1
This solves the same problem numerically. NMinimize returns a machine-number solution.
FindMinimum numerically finds a local minimum. In this example the local minimum found is
also a global minimum.
In[3]:= FindMinimum[{x - y, -3 x^2 + 2 x y - y^2 >= -1}, {x, y}]
Out[3]= {-1., {x -> 2.78301*10^-17, y -> 1.}}
This solves the problem

Min  -100/((x - 1)² + (y - 1)² + 1) - 200/((x + 1)² + (y + 2)² + 1)
s.t.  x² + y² > 3

using FindMinimum.

In[4]:= FindMinimum[{-100/((x - 1)^2 + (y - 1)^2 + 1) - 200/((x + 1)^2 + (y + 2)^2 + 1),
          x^2 + y^2 > 3}, {x, y}]
This provides FindMinimum with a starting value of 2 for x, but uses the default starting point
for y.
In[5]:= FindMinimum[{-100/((x - 1)^2 + (y - 1)^2 + 1) - 200/((x + 1)^2 + (y + 2)^2 + 1),
          x^2 + y^2 > 3}, {{x, 2}, y}]
The previous solution point is actually a local minimum. FindMinimum only attempts to find a
local minimum.
This contour plot of the feasible region illustrates the local and global minima.
[Contour plot of the objective over the feasible region, with the local minimum and the global minimum marked.]
The Method option specifies the method to use to solve the optimization problem. Currently, the
only method available for constrained optimization is the interior point algorithm.
In[8]:= FindMinimum[{x^2 + y^2, (x - 1)^2 + 2 (y - 1)^2 > 5}, {x, y}, Method -> "InteriorPoint"]
Out[8]= {0.149239, {x -> -0.150959, y -> -0.355599}}
MaxIterations specifies the maximum number of iterations that should be used. When
machine precision is used for constrained optimization, the default MaxIterations -> 500 is
used.
When StepMonitor is specified, it is evaluated once every iterative step in the interior point
algorithm. On the other hand, EvaluationMonitor, when specified, is evaluated every time a
function or an equality or inequality constraint is evaluated.
This demonstrates that 19 iterations are not sufficient to solve the following problem to the
default tolerance. It collects points visited through the use of StepMonitor .
In[9]:= pts = Reap[sol = FindMinimum[{-100/((x - 1)^2 + (y - 1)^2 + 1) -
         200/((x + 1)^2 + (y + 2)^2 + 1), x^2 + y^2 > 3},
        {{x, 1.5}, {y, -1}}, MaxIterations -> 19, StepMonitor :> Sow[{x, y}]];]; sol
The points visited are shown using ContourPlot . The starting point is blue, the rest yellow.
[Contour plot of the objective with the visited points overlaid; graphic not reproduced.]
WorkingPrecision -> prec specifies that all the calculations in FindMinimum are to be carried out
at precision prec. By default, prec = MachinePrecision. If prec > MachinePrecision, a fixed
precision of prec is used throughout the computation.

The AccuracyGoal and PrecisionGoal options are used in the following way. By default,
AccuracyGoal -> Automatic, which is set to prec/3, and PrecisionGoal -> Automatic, which is
set to -Infinity. AccuracyGoal -> ga is the same as AccuracyGoal -> {-Infinity, ga}.
Suppose AccuracyGoal -> {a, ga} and PrecisionGoal -> p; then FindMinimum attempts to
drive the residual, which is a combination of the feasibility and the satisfaction of the Karush-
Kuhn-Tucker (KKT) and complementarity conditions, to be less than or equal to tol = 10^-ga. In
addition, it requires the difference between the current and next iterative points, x and x+, to
satisfy ‖x+ - x‖ ≤ 10^-a + 10^-p ‖x‖ before terminating.
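As an illustration (option values chosen here, not taken from the original text), these options can be combined; with WorkingPrecision -> 20 the default AccuracyGoal would be about 20/3, and it can also be set explicitly:

FindMinimum[{x^2 + y^2, (x - 1)^2 + 2 (y - 1)^2 > 5}, {x, y},
 WorkingPrecision -> 20, AccuracyGoal -> 10, PrecisionGoal -> 10]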
The exact optimal value is computed using Minimize , and compared with the result of
FindMinimum .
In[12]:= solExact =
   Minimize[{-100/((x - 1)^2 + (y - 1)^2 + 1) - 200/((x + 1)^2 + (y + 2)^2 + 1),
     x^2 + y^2 > 3}, {x, y}];
Examples of FindMinimum
This shows a function with multiple minima within the feasible region -10 ≤ x ≤ 10.
In[14]:= Plot[Sin[x] + .5 x, {x, -10, 10}]
Out[14]= [plot of Sin[x] + 0.5 x on -10 ≤ x ≤ 10]
If the user has some knowledge of the problem, a better starting point can be given to
FindMinimum .
In[16]:= FindMinimum[{Sin[x] + .5 x, -10 <= x <= 10}, {{x, -5}}]
Out[16]= {-5.05482, {x -> -8.37758}}
Finally, multiple starting points can be used and the best resulting minimum selected.
In[18]:= SeedRandom[7919];
Table[
 FindMinimum[{Sin[x] + .5 x, -10 <= x <= 10}, {{x, RandomReal[{-10, 10}]}}], {10}]
Out[19]= {{-1.91322, {x -> -2.09439}}, {-5.05482, {x -> -8.37758}}, {-5.05482, {x -> -8.37758}},
  {-5.05482, {x -> -8.37758}}, {-1.91322, {x -> -2.0944}}, {-5.05482, {x -> -8.37758}},
  {1.22837, {x -> 4.18879}}, {1.22837, {x -> 4.18879}}, {1.22837, {x -> 4.18879}}, {4.45598, {x -> 10.}}}
Multiple starting points can also be done more systematically via NMinimize, using the
"RandomSearch" method with an interior point as the post-processor.
In[1]:= NMinimize[{Sin[x] + .5 x, -10 <= x <= 10}, {x},
   Method -> {"RandomSearch", "PostProcess" -> "InteriorPoint"}]
Out[1]= {-5.05482, {x -> -8.37758}}
Minimax problems, in which the largest of several objective functions is to be minimized, can
often be solved using general constrained optimization techniques, but they are more reliably
solved by reformulating them as problems with a smooth objective function. Specifically, the
minimax problem can be converted into

Min  z
s.t. z ≥ f_i(x),  i ∈ {1, 2, …, m},   g(x) ≥ 0,  h(x) = 0.
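A minimal sketch of this reformulation for the five functions appearing in the plotting commands below (their absolute values are minimized, so each function contributes both z ≥ f and z ≥ -f; the use of NMinimize here is a choice made for illustration, not taken from the original):

fs = {2 x^2 + y^2 - 48 x - 40 y + 304, -x^2 - 3 y^2, x + 3 y - 18, -x - y, x + y - 8};
NMinimize[{z, (And @@ Thread[z >= fs]) && (And @@ Thread[z >= -fs])}, {x, y, z}]

The result should lie close to the optimum marked at about {4.93, 2.08} in the first contour plot below.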
This shows the contour of the objective function, and the optimal solution.
In[14]:= ContourPlot[Max[{Abs[2 x^2 + y^2 - 48 x - 40 y + 304], Abs[-x^2 - 3 y^2],
     Abs[x + 3 y - 18], Abs[-x - y], Abs[x + y - 8]}], {x, 3, 7}, {y, 1, 3},
    Contours -> 40, Epilog -> {Red, PointSize[0.02], Point[{4.93, 2.08}]}]
Out[14]=
This shows the contour of the objective function within the feasible region, and the optimal
solution.
In[18]:= ContourPlot[Max[{Abs[2 x^2 + y^2 - 48 x - 40 y + 304],
     Abs[-x^2 - 3 y^2], Abs[x + 3 y - 18], Abs[-x - y], Abs[x + y - 8]}],
    {x, 3, 7}, {y, 0, 3}, RegionFunction -> (((#1 - 6)^2 + (#2 - 1)^2 <= 1) &),
    Contours -> 40, Epilog -> {Red, PointSize[0.02], Point[{5.34, 1.75}]}]
Out[18]=
Sometimes the decision maker has in mind a goal for each objective. In that case the so-called
goal programming technique can be applied.
There are a number of variants of how to model a goal-programming problem. One variant is to
order the objective functions based on priority, and seek to minimize the deviation of the most
important objective function from its goal first, before attempting to minimize the deviations of
the less important objective functions from their goals. This is called lexicographic or preemp-
tive goal programming.
In the second variant, the weighted sum of the deviations is minimized. Specifically, the
following constrained minimization problem is to be solved:

Min_x  w1 (f1(x) - goal1)^+ + w2 (f2(x) - goal2)^+ + … + wm (fm(x) - goalm)^+
s.t.   g(x) ≥ 0,  h(x) = 0

Here a^+ stands for the positive part of the real number a. The weights wi reflect the relative
importance, and normalize the deviation to take into account the relative scales and units.
Possible values for the weights are the inverse of the goals to be attained. The previous prob-
lem can be reformulated to one that is easier to solve.
Min  w1 z1 + w2 z2 + … + wm zm
s.t. z1 ≥ f1(x) - goal1,  z2 ≥ f2(x) - goal2,  …,  zm ≥ fm(x) - goalm,  z1, z2, …, zm ≥ 0,
     g(x) ≥ 0,  h(x) = 0
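The function GoalProgrammingWeightedAverage used in the examples below is not defined in the surviving text; a minimal sketch of the reformulation above, applied directly to that example (unit weights and goals of zero), might look like this:

FindMinimum[{z1 + z2,
  z1 >= x^2 + y^2 && z2 >= 4 (x - 2)^2 + 4 (y - 2)^2 &&
   z1 >= 0 && z2 >= 0 && y - x == -4}, {x, y, z1, z2}]

If it converges, this should reproduce the deviations 13.12 and 33.28 reported below.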
The third variant, Chebyshev goal programming, minimizes the maximum deviation, rather
than the sum of the deviations. This balances the deviation of different objective functions.
Specifically, the following constrained minimization problem is to be solved.
Min  z
s.t. z ≥ wi (fi(x) - goali),  i = 1, 2, …, m,
     g(x) ≥ 0,  h(x) = 0
This defines a function GoalProgrammingChebyshev that solves the goal programming model
by minimizing the maximum deviation.
(* syntax: GoalProgrammingChebyshev[{{{f1, goal1, weight1}, ...}, cons}, vars] *)
GoalProgrammingChebyshev[
   {fg : {{_, _} ..}, cons_}, vars_, opts___?OptionQ] := With[
   {res = Catch[iGoalProgrammingChebyshev[{Map[(Append @@ # &),
         Thread[{fg, ConstantArray[1, {Length[fg]}]}]], cons}, vars]]},
   res /; ListQ[res]
   ];
GoalProgrammingChebyshev[
   {fg : {{_, _, _} ..}, cons_}, vars_, opts___?OptionQ] := With[
   {res = Catch[iGoalProgrammingChebyshev[{fg, cons}, vars]]},
   res /; ListQ[res]
   ];
iGoalProgrammingChebyshev[
   {fg : {{_, _, _} ..}, cons_}, vars_, opts___?OptionQ] := Module[
   {fs, goals, y, res, ws},
   {fs, goals, ws} = Transpose[fg];
   If[! VectorQ[ws, (# >= 0 &)], Throw[$Failed]];
   If[! VectorQ[goals, ((NumericQ[#] && Head[#] =!= Complex) &)], Throw[$Failed]];
   res = FindMinimum[
     {y, (And @@ Flatten[{cons}, 1]) && (And @@ Thread[y >= ws*(fs - goals)])},
     Append[Flatten[{vars}, 1], y], opts];
   If[ListQ[res], {fs /. res[[2]], Thread[vars -> (vars /. res[[2]])]}]
   ];
This solves a goal programming problem with two objective functions and one constraint using
GoalProgrammingWeightedAverage with unit weighting, resulting in deviations from the
goals of 13.12 and 33.28, thus a total deviation of 46.4 and a maximal deviation of 33.28.
res1 = GoalProgrammingWeightedAverage[
   {{{x^2 + y^2, 0}, {4 (x - 2)^2 + 4 (y - 2)^2, 0}}, y - x == -4}, {x, y}]
{{13.12, 33.28}, {x -> 3.6, y -> -0.4}}
This solves a goal programming problem with two objective functions and one constraint using
GoalProgrammingChebyshev with unit weighting, resulting in deviations from the goals of 16
and 32, thus a maximal deviation of 32, but a total deviation of 48.
res2 = GoalProgrammingChebyshev[
   {{{x^2 + y^2, 0}, {4 (x - 2)^2 + 4 (y - 2)^2, 0}}, y - x == -4}, {x, y}]
{{16., 32.}, {x -> 4., y -> -4.55071*10^-9}}
This shows the contours for the first (blue) and second (red) objective functions, the feasible
region (the black line), and the optimal solution found by
GoalProgrammingWeightedAverage (yellow point) and by GoalProgrammingChebyshev
(green point).
g1 = ContourPlot[x^2 + y^2, {x, 2, 6}, {y, -1, 2},
   ContourShading -> False, ContourStyle -> Blue, ContourLabels -> Automatic];
g2 = ContourPlot[4 (x - 2)^2 + 4 (y - 2)^2, {x, 2, 6}, {y, -1, 2},
   ContourShading -> False, ContourStyle -> Red, ContourLabels -> Automatic];
Show[{g1, g2}, Epilog -> {Line[{{3, -1}, {6, 2}}], PointSize[0.02], Yellow,
   Point[{x, y} /. res1[[2]]], Green, Point[{x, y} /. res2[[2]]]}]
In this example, the aim is to find the optimal asset allocation so as to minimize the risk, and
achieve a preset level of return, by investing in a spread of stocks, bonds, and gold.
Here are the historical returns of various assets between 1973 and 1994. For example, in 1973,
S&P 500 lost 1 - 0.852 = 14.8 %, while gold appreciated by 67.7%.
"3m Tbill" "long Tbond" "SP500" "Wilt.5000" "Corp. Bond" "NASDQ" "EAFE" "Gold"
1973 1.075 0.942 0.852 0.815 0.698 1.023 0.851 1.677
1974 1.084 1.02 0.735 0.716 0.662 1.002 0.768 1.722
1975 1.061 1.056 1.371 1.385 1.318 0.123 1.354 0.76
1976 1.052 1.175 1.236 1.266 1.28 1.156 1.025 0.96
1977 1.055 1.002 0.926 0.974 1.093 1.03 1.181 1.2
1978 1.077 0.982 1.064 1.093 1.146 1.012 1.326 1.295
1979 1.109 0.978 1.184 1.256 1.307 1.023 1.048 2.212
1980 1.127 0.947 1.323 1.337 1.367 1.031 1.226 1.296
1981 1.156 1.003 0.949 0.963 0.99 1.073 0.977 0.688
1982 1.117 1.465 1.215 1.187 1.213 1.311 0.981 1.084
1983 1.092 0.985 1.224 1.235 1.217 1.08 1.237 0.872
1984 1.103 1.159 1.061 1.03 0.903 1.15 1.074 0.825
1985 1.08 1.366 1.316 1.326 1.333 1.213 1.562 1.006
1986 1.063 1.309 1.186 1.161 1.086 1.156 1.694 1.216
1987 1.061 0.925 1.052 1.023 0.959 1.023 1.246 1.244
1988 1.071 1.086 1.165 1.179 1.165 1.076 1.283 0.861
1989 1.087 1.212 1.316 1.292 1.204 1.142 1.105 0.977
1990 1.08 1.054 0.968 0.938 0.83 1.083 0.766 0.922
1991 1.057 1.193 1.304 1.342 1.594 1.161 1.121 0.958
1992 1.036 1.079 1.076 1.09 1.174 1.076 0.878 0.926
1993 1.031 1.217 1.1 1.113 1.162 1.11 1.326 1.146
1994 1.045 0.889 1.012 0.999 0.968 0.965 1.078 0.99
average 1.078 1.093 1.120 1.124 1.121 1.046 1.141 1.130
Here are the expected returns over this 22-year period for the eight assets (the returns are
stored in a matrix R, with one row per asset).
In[3]:= {n, nyear} = Dimensions[R];
In[5]:= ER = Mean[Transpose[R]]
Out[5]= {1.07814, 1.09291, 1.11977, 1.12364, 1.12132, 1.04632, 1.14123, 1.12895}
Here is the covariance matrix, which measures how the assets correlate with each other.
In[11]:= Covariants = Covariance[Transpose[R]];
This finds the optimal asset allocation by minimizing the variance of an allocation, subject to
the constraints that the total allocation is 100% (Total[vars] == 1), the expected return is over
12% (vars.ER >= 1.12), and the variables must be non-negative, so each asset is allocated a
non-negative percentage (thus no shorting). The resulting optimal asset allocation suggests
15.5% in 3-month treasury bills, 20.3% in gold, and the rest in stocks, with a resulting
variance of 0.0126.
In[18]:= vars = Map[Subscript[x, #] &, {"3m T-bill", "long T-bond", "SP500",
     "Wiltshire 5000", "Corporate Bond", "NASDQ", "EAFE", "Gold"}];
FindMinimum[{
  vars.Covariants.vars,
  Total[vars] == 1 && vars.ER >= 1.12 && Apply[And, Thread[Greater[vars, 0]]]}, vars]
Out[20]= {0.0126235, {x3m T-bill -> 0.154632, xlong T-bond -> 0.0195645, xSP500 -> 0.354434, xWiltshire 5000 -> 0.0238249,
  xCorporate Bond -> 0.000133775, xNASDQ -> 0.0000309191, xEAFE -> 0.24396, xGold -> 0.203419}}
This trades less return for smaller volatility by asking for an expected return of 10%. Now we
have 55.5% in 3-month treasury bills, 10.3% in gold, and the rest in stocks.
In[16]:= vars = Map[Subscript[x, #] &, {"3m T-bill", "long T-bond", "SP500",
     "Wiltshire 5000", "Corporate Bond", "NASDQ", "EAFE", "Gold"}];
FindMinimum[{
  vars.Covariants.vars,
  Total[vars] == 1 && vars.ER >= 1.10 && Apply[And, Thread[Greater[vars, 0]]]}, vars]
Out[17]= {0.00365995, {x3m T-bill -> 0.555172, xlong T-bond -> 0.0244205, xSP500 -> 0.156701, xWiltshire 5000 -> 0.0223812,
  xCorporate Bond -> 0.00017454, xNASDQ -> 0.0000293021, xEAFE -> 0.13859, xGold -> 0.102532}}
This shows that the interior point method has difficulty in minimizing this nonsmooth function.
In[29]:= FindMinimum[{Abs[x - 3], 0 <= x <= 5}, {x}]

FindMinimum::eit : The algorithm does not converge to the tolerance of 4.806217383937354`*^-6 in 500
iterations. The best estimated solution, with {feasibility residual, KKT residual, complementary
residual} of {4.54827*10^-6, 0.0402467, 2.27414*10^-6}, is returned.

Out[29]= {8.71759*10^-6, {x -> 2.99999}}
FindMinimum::lstol : The line search decreased the step size to within tolerance specified by
AccuracyGoal and PrecisionGoal but was unable to find a sufficient decrease in the function.
You may need more than MachinePrecision digits of working precision to meet these tolerances.

Out[30]= {8.06359*10^-6, {x -> 2.99999}}
The nonlinear interior point algorithm works with a constrained optimization problem of the form

Min  f(x)
s.t. h(x) = 0,  x ≥ 0.        (3)
The non-negativity constraints are then replaced by adding a barrier term to the objective
function, giving

Min  f(x) - μ ∑_{i=1}^n ln(x_i)
s.t. h(x) = 0,

where μ > 0 is the barrier parameter.
The necessary KKT condition (assuming, for example, that the gradients of h are linearly
independent) is

∇φ_μ(x) - y^T A(x) = 0
h(x) = 0,

where φ_μ denotes the barrier objective and A(x) the Jacobian of h. Writing g(x) for the gradient
of f and introducing z = μ X^(-1) e, this becomes

g(x) - z - y^T A(x) = 0
h(x) = 0                         (4)
Z X e = μ e.
This nonlinear system can be solved with Newton's method. Let L(x, y) = f(x) - h(x)^T y and
H(x, y) = ∇²L(x, y) = ∇²f(x) - ∑_{i=1}^m y_i ∇²h_i(x); the Jacobi matrix of the above system (4),
with respect to (x, y, z), is

        ( H(x, y)   -A(x)^T   -I )
    J = ( A(x)       0         0 )        (5)
        ( Z          0         X ).

Each Newton iteration updates

x := x + d x,   y := y + d y,   z := z + d z        (6)

with the search direction {d x, d y, d z} given by solving the previous Jacobi system (5).
To ensure convergence, you need to have some measure of success. One way of doing this is to
use a merit function, such as the augmented Lagrangian merit function φ(x, β) (7). Here μ > 0 is
the barrier parameter and β > 0 a penalty parameter. It can be proved that if the matrix
N(x, y) = H(x, y) + X^(-1) Z is positive definite, then either the search direction given by (6) is
a descent direction for the merit function (7), or {x, y, z, μ} satisfies the KKT condition (4).
A line search is performed along the search direction, with the initial step length chosen to be
as close to 1 as possible while maintaining the positivity constraints. A backtracking procedure is
then used until the Armijo condition is satisfied on the merit function:

φ(x + t d x, β) ≤ φ(x, β) + γ t ∇φ(x, β)^T d x,   with γ ∈ (0, 1/2].
Convergence Tolerance
The convergence criterion for the interior point algorithm is that the residual, a combination of
the feasibility, KKT, and complementarity conditions, be reduced below the tolerance determined
by the AccuracyGoal and PrecisionGoal options, as described earlier.
Introduction
Numerical algorithms for constrained nonlinear optimization can be broadly categorized into
gradient-based methods and direct search methods. Gradient-based methods use first deriva-
tives (gradients) or second derivatives (Hessians). Examples are the sequential quadratic pro-
gramming (SQP) method, the augmented Lagrangian method, and the (nonlinear) interior point
method. Direct search methods do not use derivative information. Examples are Nelder-Mead,
genetic algorithm and differential evolution, and simulated annealing. Direct search methods
tend to converge more slowly, but can be more tolerant to the presence of noise in the function
and constraints.
Typically, algorithms only build up a local model of the problems. Furthermore, many such
algorithms insist on certain decrease of the objective function, or decrease of a merit function
which is a combination of the objective and constraints, to ensure convergence of the iterative
process. Such algorithms will, if convergent, only find local optima, and are called local optimiza-
tion algorithms. In Mathematica local optimization problems can be solved using FindMinimum .
Global optimization algorithms, on the other hand, attempt to find the global optimum, typically
by allowing decrease as well as increase of the objective/merit function. Such algorithms are
usually computationally more expensive. Global optimization problems can be solved exactly
using Minimize or numerically using NMinimize.
Min  x - y
s.t. -3 x² + 2 x y - y² ≥ -1
This solves the same problem numerically. NMinimize returns a machine-number solution.
FindMinimum numerically finds a local minimum. In this example the local minimum found is
also a global minimum.
In[3]:= FindMinimum[{x - y, -3 x^2 + 2 x y - y^2 >= -1}, {x, y}]
Out[3]= {-1., {x -> 2.78301*10^-17, y -> 1.}}
Finding a global optimum can be arbitrarily difficult, even without constraints, and so the meth-
ods used may fail. It may frequently be useful to optimize the function several times with
different starting conditions and take the best of the results.
This finds the minimum of (y - 1/2)² + x² subject to the constraints y ≥ 0 and y ≥ x + 1.
The constraints to NMinimize and NMaximize may be either a list or a logical combination of
equalities, inequalities, and domain specifications. Equalities and inequalities may be nonlinear.
Any strict inequalities will be converted to weak inequalities due to the limits of working with
approximate numbers. Specify a domain for a variable using Element, for example,
Element[x, Integers] or x ∈ Integers. Variables must be either integers or real numbers, and
will be assumed to be real numbers unless specified otherwise. Constraints are generally
enforced by adding penalties when points leave the feasible region.
In order for NMinimize to work, it needs a rectangular initial region in which to start. This is
similar to giving other numerical methods a starting point or starting points. The initial region is
specified by giving each variable a finite upper and lower bound. This is done by including
a ≤ x ≤ b in the constraints, or {x, a, b} in the variables. If both are given, the bounds in the
variables are used for the initial region, and the constraints are just used as constraints. If no
initial region is specified for a variable x, the default initial region of -1 ≤ x ≤ 1 is used. Different
variables can have initial regions defined in different ways.
Here the initial region is taken from the variables. The problem is unconstrained.
Here the initial region for x is taken from the constraints, the initial region for y is taken from
the variables, and the initial region for z is taken to be the default. The problem is uncon-
strained in y and z, but not x.
In[7]:= NMinimize[{x^2 + y^2 + z^2, 3 <= x <= 4}, {x, {y, 2, 5}, z}]
Out[7]= {9., {x -> 3., y -> 0., z -> 0.}}
The polynomial 4 x⁴ - 4 x² + 1 has global minima at x = ±√2/2. NelderMead finds one of the
minima.

In[48]:= NMinimize[4 x^4 - 4 x^2 + 1, x, Method -> "NelderMead"]
Out[48]= {0., {x -> 0.707107}}
Use RandomSearch with more starting points to find the global minimum.
In[55]:= NMinimize[f, {x, y}, Method -> {"RandomSearch", "SearchPoints" -> 250}]
Out[55]= {-3.30687, {x -> -0.0244031, y -> 0.210612}}
With the default method, NMinimize picks which method to use based on the type of problem.
If the objective function and constraints are linear, LinearProgramming is used. If there are
integer variables, or if the head of the objective function is not a numeric function, differential
evolution is used. For everything else, it uses Nelder-Mead, but if Nelder-Mead does poorly, it
switches to differential evolution.
Because the methods used by NMinimize may not improve every iteration, convergence is only
checked after several iterations have occurred.
Nelder-Mead
The Nelder-Mead method is a direct search method. For a function of n variables, the algorithm
maintains a set of n + 1 points forming the vertices of a polytope in n-dimensional space. This
method is often termed the "simplex" method, which should not be confused with the well-
known simplex method for linear programming.
At each iteration, n + 1 points x1, x2, …, x_{n+1} form a polytope. The points are ordered so that
f(x1) ≤ f(x2) ≤ … ≤ f(x_{n+1}). A new point is then generated to replace the worst point x_{n+1}.

Let c be the centroid of the polytope consisting of the best n points, c = (1/n) ∑_{i=1}^n x_i. A trial
point x_t is generated by reflecting the worst point through the centroid, x_t = c + α (c - x_{n+1}),
where α > 0 is a parameter.
If the new point x_t is neither a new worst point nor a new best point, f(x1) ≤ f(x_t) ≤ f(x_n), then
x_t replaces x_{n+1}.

If the new point x_t is better than the best point, f(x_t) < f(x1), the reflection is very successful and
can be carried out further to x_e = c + β (x_t - c), where β > 1 is a parameter used to expand the
polytope. If the expansion is successful, f(x_e) < f(x_t), x_e replaces x_{n+1}; otherwise the expansion
failed, and x_t replaces x_{n+1}.
If the new point x_t is worse than the second worst point, f(x_t) ≥ f(x_n), the polytope is assumed to
be too large and needs to be contracted. A new trial point x_c is defined using a contraction
parameter 0 < γ < 1. If f(x_c) < Min(f(x_{n+1}), f(x_t)), the contraction is successful, and x_c
replaces x_{n+1}. Otherwise a further contraction is carried out.
The process is assumed to have converged if the difference between the best function values in
the new and old polytope, as well as the distance between the new best point and the old best
point, are less than the tolerances provided by AccuracyGoal and PrecisionGoal.
Strictly speaking, Nelder-Mead is not a true global optimization algorithm; however, in practice
it tends to work reasonably well for problems that do not have many local minima.
Here the function inside the unit disk is minimized using NelderMead.
In[82]:= NMinimize[{100 (y - x^2)^2 + (1 - x)^2, x^2 + y^2 <= 1}, {x, y}, Method -> "NelderMead"]
Out[82]= {0.0456748, {x -> 0.786415, y -> 0.617698}}
Here is a function with several local minima that are all different depths.
In[83]:= Clear[a, f];
a = Reverse /@ Distribute[{{-32, -16, 0, 16, 32}, {-32, -16, 0, 16, 32}}, List];
f = 1/(0.002 + Plus @@ MapIndexed[1/(#2[[1]] + Plus @@ (({x, y} - #1)^6)) &, a]);
Plot3D[f, {x, -50, 50}, {y, -50, 50}, Mesh -> None, PlotPoints -> 25]
Out[83]=
With the default parameters, NelderMead is too easily trapped in a local minimum.
In[116]:= Do[Print[NMinimize[f, {{x, -50, 50}, {y, -50, 50}},
     Method -> {"NelderMead", "RandomSeed" -> i}]], {i, 5}]
{3.96825, {x -> 15.9816, y -> -31.9608}}
By using settings that are more aggressive and less likely to make the simplex smaller, the
results are better.
In[117]:= Do[Print[NMinimize[f, {{x, -50, 50}, {y, -50, 50}},
     Method -> {"NelderMead", "ShrinkRatio" -> 0.95, "ContractRatio" -> 0.95,
       "ReflectRatio" -> 2, "RandomSeed" -> i}]], {i, 5}]
{3.96825, {x -> 15.9816, y -> -31.9608}}
{2.98211, {x -> -0.0132362, y -> -31.9651}}
{1.99203, {x -> -15.9864, y -> -31.9703}}
{16.4409, {x -> -15.9634, y -> 15.9634}}
{0.998004, {x -> -31.9783, y -> -31.9783}}
Differential Evolution
Differential evolution is a simple stochastic function minimizer.
During each iteration of the algorithm, a new population of m points is generated. The jth new
point is generated by picking three random points, x_u, x_v and x_w, from the old population and
forming x_s = x_w + s (x_u - x_v), where s is a real scaling factor. A new point x_new is then
constructed from x_j and x_s by taking the ith coordinate from x_s with probability ρ and otherwise
taking the coordinate from x_j. If f(x_new) < f(x_j), then x_new replaces x_j in the population. The
probability ρ is controlled by the "CrossProbability" option.
The process is assumed to have converged if the difference between the best function values in
the new and old populations, as well as the distance between the new best point and the old
best point, are less than the tolerances provided by AccuracyGoal and PrecisionGoal.
The differential evolution method is computationally expensive, but is relatively robust and
tends to work well for problems that have more local minima.
Here the function inside the unit disk is minimized using DifferentialEvolution.
In[125]:= NMinimize[{100 (y - x^2)^2 + (1 - x)^2, x^2 + y^2 <= 1},
    {x, y}, Method -> "DifferentialEvolution"]
Out[125]= {0.0456748, {x -> 0.786415, y -> 0.617698}}
Here is a problem with both real and integer variables (y1, y2, and y3 are restricted to integer
values).

In[127]:= f = 2 x1 + 3 x2 + 3 y1/2 + 2 y2 - y3/2;
c = {x1^2 + y1 == 5/4, x2^(3/2) + 3 y2/2 == 3,
   x1 + y1 <= 8/5, 4 x2/3 + y2 <= 3, y3 <= y1 + y2, 0 <= x1 <= 10, 0 <= x2 <= 10,
   0 <= y1 <= 1, 0 <= y2 <= 1, 0 <= y3 <= 1, {y1, y2, y3} ∈ Integers};
v = {x1, x2, y1, y2, y3};
By adjusting ScalingFactor, the results are much better. In this case, the increased
ScalingFactor gives DifferentialEvolution better mobility with respect to the integer
variables.
In[131]:= NMinimize[{f, c}, v, Method -> {"DifferentialEvolution", "ScalingFactor" -> 1}]
Out[131]= {7.66718, {x1 -> 1.11803, x2 -> 1.31037, y1 -> 0, y2 -> 1, y3 -> 1}}
Simulated Annealing
Simulated annealing is a simple stochastic function minimizer. It is motivated from the physical
process of annealing, where a metal object is heated to a high temperature and allowed to cool
slowly. The process allows the atomic structure of the metal to settle to a lower energy state,
thus becoming a tougher metal. Using optimization terminology, annealing allows the structure
to escape from a local minimum, and to explore and settle on a better, hopefully global,
minimum.
At each iteration, a new point, xnew , is generated in the neighborhood of the current point, x.
The radius of the neighborhood decreases with each iteration. The best point found so far, xbest ,
is also tracked.
If f(x_new) ≤ f(x_best), x_new replaces x_best and x. Otherwise, x_new replaces x with probability
e^(b(i, Δf, f0)). Here b is the function defined by BoltzmannExponent, i is the current iteration,
Δf is the change in the objective function value, and f0 is the value of the objective function from
the previous iteration. The default function for b is

b(i, Δf, f0) = -Δf log(i + 1) / 10.
Like the RandomSearch method, SimulatedAnnealing uses multiple starting points, and finds
an optimum starting from each of them.
The default number of starting points, given by the option SearchPoints, is min(2 d, 50), where d
is the number of variables.
For each starting point, this is repeated until the maximum number of iterations is reached, the
method converges to a point, or the method stays at the same point consecutively for the
number of iterations given by LevelIterations.
By default, the step size for SimulatedAnnealing is not large enough to escape from the
local minima.
In[68]:= NMinimize[f[x, y], {x, y}, Method -> "SimulatedAnnealing"]
Out[68]= {8.0375, {x -> 1.48098, y -> 1.48098}}
By increasing PerturbationScale, larger step sizes are taken to produce a much better
solution.
In[69]:= NMinimize[f[x, y], {x, y}, Method -> {"SimulatedAnnealing", "PerturbationScale" -> 3}]
Out[69]= {-38.0779, {x -> 5.32216, y -> 5.32216}}
Here BoltzmannExponent is set to use an exponential cooling function that gives faster
convergence. (Note that the modified PerturbationScale is still being used as well.)
In[70]:= NMinimize[f[x, y], {x, y}, Method -> {"SimulatedAnnealing", "PerturbationScale" -> 3,
    "BoltzmannExponent" -> Function[{i, df, f0}, -df/Exp[i/10]]}]
Out[70]= {-38.0779, {x -> 5.32216, y -> 5.32216}}
Random Search
The random search algorithm works by generating a population of random starting points and
uses a local optimization method from each of the starting points to converge to a local mini-
mum. The best local minimum is chosen to be the solution.
The possible local search methods are Automatic and "InteriorPoint". The default method is
Automatic, which uses FindMinimum with unconstrained methods applied to a system with
penalty terms added for the constraints. When Method is set to "InteriorPoint", a nonlinear
interior-point method is used.
The default number of starting points, given by the option SearchPoints, is min(10 d, 100), where
d is the number of variables.
Convergence for RandomSearch is determined by convergence of the local method for each
starting point.
RandomSearch is fast, but does not scale very well with the dimension of the search space. It
also suffers from many of the same limitations as FindMinimum . It is not well suited for discrete
problems and others where derivatives or secants give little useful information about the prob-
lem.
Here the function inside the unit disk is minimized using RandomSearch.
In[71]:= NMinimize[{100 (y - x^2)^2 + (1 - x)^2, x^2 + y^2 <= 1}, {x, y}, Method -> "RandomSearch"]
Out[71]= {0.0456748, {x -> 0.786415, y -> 0.617698}}
Here is a function with several local minima that are all different depths and are generally
difficult to optimize.
In[72]:= Clear[a, f];
a = Reverse /@ Distribute[{{-32, -16, 0, 16, 32}, {-32, -16, 0, 16, 32}}, List];
f = 1/(0.002 + Plus @@ MapIndexed[1/(#2[[1]] + Plus @@ (({x, y} - #1)^6)) &, a]);
Plot3D[f, {x, -50, 50}, {y, -50, 50}, Mesh -> None,
  NormalsFunction -> "Weighted", PlotPoints -> 50]
Out[72]= (3D plot of the function showing its many local minima)
With the default number of SearchPoints, sometimes the minimum is not found.
In[73]:= Do[Print[NMinimize[f, {{x, -50, 50}, {y, -50, 50}},
           Method -> {"RandomSearch", "RandomSeed" -> i}]], {i, 5}]
{1.99203, {x -> -15.9864, y -> -31.9703}}
{1.99203, {x -> -15.9864, y -> -31.9703}}
{0.998004, {x -> -31.9783, y -> -31.9783}}
{1.99203, {x -> -15.9864, y -> -31.9703}}
{0.998004, {x -> -31.9783, y -> -31.9783}}
This uses nonlinear interior point methods to find the minimum of a sum of squares.
In[76]:= n = 10;
         f = Sum[(x[i] - Sin[i])^2, {i, 1, n}];
         c = Table[-0.5 < x[i] < 0.5, {i, n}];
         v = Array[x, n];
         Timing[NMinimize[{f, c}, v, Method -> {"RandomSearch", Method -> "InteriorPoint"}]]
Out[80]= {8.25876, {0.82674, {x[1] -> 0.5, x[2] -> 0.5, x[3] -> 0.14112, x[4] -> -0.5,
           x[5] -> -0.5, x[6] -> -0.279415, x[7] -> 0.5, x[8] -> 0.5, x[9] -> 0.412118, x[10] -> -0.5}}}
For some classes of problems, limiting the number of SearchPoints can be much faster
without affecting the quality of the solution.
In[81]:= Timing[NMinimize[{f, c}, v,
           Method -> {"RandomSearch", Method -> "InteriorPoint", "SearchPoints" -> 1}]]
Out[81]= {0.320425, {0.82674, {x[1] -> 0.5, x[2] -> 0.5, x[3] -> 0.14112, x[4] -> -0.5,
           x[5] -> -0.5, x[6] -> -0.279415, x[7] -> 0.5, x[8] -> 0.5, x[9] -> 0.412118, x[10] -> -0.5}}}
Introduction
Global optimization problems can be solved exactly using Minimize and Maximize.
This computes the radius of the circle, centered at the origin, circumscribed about the set
x^4 + 3 y^4 <= 7.
This computes the radius of the circle, centered at the origin, circumscribed about the set
a x^2 + b y^2 <= 1 as a function of the parameters a and b.
Out[2]= (a piecewise algebraic expression, with conditions on the parameters a and b, giving the
         radius together with a maximizing point {x, y})
Algorithms
Depending on the type of problem, several different algorithms can be used.
The most general method is based on the cylindrical algebraic decomposition (CAD) algorithm.
It applies when the objective function and the constraints are real algebraic functions. The
method can always compute global extrema (or extremal values, if the extrema are not
attained). If parameters are present, the extrema can be computed as piecewise-algebraic
functions of the parameters. A downside of the method is its high, doubly exponential complex-
ity in the number of variables. In certain special cases not involving parameters, faster methods
can be used.
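For instance, the following small parametric problem (an added illustration, not one of the original examples) returns an answer that is piecewise in the parameter a; mathematically, the minimum is a^2 for a > 0 and 0 otherwise:

    (* sketch: a one-variable problem with a parameter; the result is piecewise in a *)
    Minimize[{x^2, x >= a}, x]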
When the objective function and all constraints are linear with rational number coefficients,
global extrema can be computed exactly using the simplex algorithm.
For univariate problems, equation and inequality solving methods are used to find the con-
straint solution set and discontinuity points and zeros of the derivative of the objective function
within the set.
Another approach to finding global extrema is to find all the local extrema, using the Lagrange
multiplier method or the Karush-Kuhn-Tucker (KKT) conditions, and pick the smallest (or the greatest).
However, for a fully automatic method, there are additional complications. In addition to solving
the necessary conditions for local extrema, it needs to check smoothness of the objective func-
tion and smoothness and nondegeneracy of the constraints. It also needs to check for extrema
at the boundary of the set defined by the constraints and at infinity, if the set is unbounded.
This in general requires exact solving of systems of equations and inequalities over the reals,
which may lead to CAD computations that are harder than in the optimization by CAD algo-
rithm. Currently Mathematica uses Lagrange multipliers only for equational constraints within a
bounded box. The method also requires that the number of stationary points and the number of
singular points of the constraints be finite. An advantage of this method over the CAD-based
algorithm is that it can solve some transcendental problems, as long as they lead to systems of
equations that Mathematica can solve.
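As a small added illustration of the equational-constraint case (not one of the original examples), the following problem has one equational constraint inside a bounded box and a unique optimum, so the returned point is determined; the minimum value is -Sqrt[2], attained at x = y = -1/Sqrt[2]:

    (* sketch: one equational constraint within a bounded box *)
    Minimize[{x + y, x^2 + y^2 == 1 && -2 <= x <= 2 && -2 <= y <= 2}, {x, y}]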
Examples
This finds the point on the cubic curve x^3 + y^2 - x == a which is closest to the origin, as a
function of the parameter a.
In[4]:= min = Minimize[{x^2 + y^2, x^3 + y^2 - x == a}, {x, y}]
Out[4]= (a piecewise expression in the parameter a, giving the squared distance from the origin
         and the closest point {x, y} in terms of Root objects)
This visualization shows the point on the cubic curve x^3 + y^2 - x == a which is closest to the
origin, and the squared distance m of the point from the origin. The value of the parameter a can
be modified using the slider. The visualization uses the minimum computed by Minimize.
In[5]:= plot[a_] := ContourPlot[x^3 + y^2 - x == a,
          {x, -3, 3}, {y, -3, 3}, PlotRange -> {{-3, 3}, {-3, 3}}];
        minval[a_] := Evaluate[min[[1]]]
        minpt[a_] := Evaluate[min[[2]]]
        mmark = Graphics[Text[Style["m=", 10], {1.25, 2.5}]];
        mvalue[a_] := Graphics[Text[Style[PaddedForm[minval[a], {5, 3}], 10], {2, 2.5}]];
        amark = Graphics[Text[Style["a=", 10], {1.25, 2.8}]];
        avalue[a_] := Graphics[Text[Style[PaddedForm[a, {5, 3}], 10], {2, 2.8}]];
        mpoint[a_] := Graphics[{PointSize[0.03], Red, Point[Re[{x, y} /. minpt[a]]]}];
        Manipulate[Show[{plot[a], amark, avalue[a], mmark, mvalue[a], mpoint[a]}],
          {{a, 4.5}, -5, 5}, SaveDefinitions -> True]
Out[13]= (Manipulate output: the contour plot of the curve with the closest point marked in red;
          at a = 4.500 the displayed value is m = 3.430)
If algebraic functions are present, they are replaced with new variables; equations and inequali-
ties satisfied by the new variables are added. The variables replacing algebraic functions are
projected first. They also require special handling in the lifting phase of the algorithm.
Projection operator improvements by Hong, McCallum, and Brown can be used here, including
the use of equational constraints. Note that if a new variable needs to be introduced, there is at
least one equational constraint, namely y == f.
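For example (an added illustration, not from the original text), an objective containing a radical is handled by introducing a new variable for the algebraic function; here the minimum value is 0, attained at x = 0, y = -1:

    (* sketch: algebraic (non-polynomial) objective handled by variable substitution *)
    Minimize[{Sqrt[x^2 + 1] + y, x^2 + y^2 <= 1}, {x, y}]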
When lifting the minimization variable y, you start with the smallest values of y, and proceed
(lifting the remaining variables in the depth-first manner) until you find the first cell for which
the constraints are satisfied. If this cell corresponds to a root of a projection polynomial in y,
the root is the minimum value of f , and the coordinates corresponding to x of any point in the
cell give a point at which the minimum is attained. If parameters are present, you get a parametric
description of a point in the cell in terms of roots of polynomials bounding the cell. If there
are no parameters, you can simply give the sample point used by the CAD algorithm. If the first
cell satisfying the constraints corresponds to an interval Hr, sL, where r is a root of a projection
polynomial in y, then r is the infimum of values of f , and the infimum value is not attained.
Finally, if the first cell satisfying the constraints corresponds to an interval H-¶, sL, f is
unbounded from below.
Experimental`Infimum[{f, cons}, {x, y, ...}]
    find the infimum of values of f on the set of points satisfying the constraints cons.
Experimental`Supremum[{f, cons}, {x, y, ...}]
    find the supremum of values of f on the set of points satisfying the constraints cons.
This finds the smallest ball centered at the origin which contains the set bounded by the surface
x^4 - x y z + 2 y^4 + 3 z^4 == 1. A full Maximize call with the same input did not finish in 10 minutes.
In[14]:= Experimental`Supremum[{x^2 + y^2 + z^2, x^4 + 2 y^4 + 3 z^4 - x y z < 1}, {x, y, z}] // Timing
Out[14]= {4.813, -Root[-1 341 154 819 099 - 114 665 074 208 #1 + 4 968 163 024 164 #1^2 +
           288 926 451 967 #1^3 - 7 172 215 018 940 #1^4 - 240 349 978 752 #1^5 + 5 066 800 071 680 #1^6 +
           69 844 008 960 #1^7 - 1 756 156 133 376 #1^8 - 2 717 908 992 #1^9 + 239 175 991 296 #1^10 &, 1]}
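A corresponding sketch for Experimental`Infimum (added here for symmetry with the table above, not an original example), on a set where the infimum is not attained; the expected result is 0:

    (* sketch: the infimum of 1/x on x > 0 is 0, but no feasible point attains it *)
    Experimental`Infimum[{1/x, x > 0}, {x}]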
Linear Optimization
When the objective function and all constraints are linear, global extrema can be computed
exactly using the simplex algorithm.
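A minimal added example (not from the original) of an exact linear program with rational data; the expected result is {3, {x -> 3, y -> 0}}:

    (* sketch: with rational coefficients the simplex algorithm returns an exact answer *)
    Minimize[{x + 2 y, x + y >= 3 && x >= 0 && y >= 0}, {x, y}]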
Optimization problems where the objective is a fraction of linear functions and the constraints
are linear (linear fractional programs) reduce to linear optimization problems. This solves a
random linear fractional minimization problem with ten variables.
In[23]:= SeedRandom[2]; n = 10;
         A = Table[RandomInteger[{-1000, 1000}], {n/2}, {n}];
         B = Table[RandomInteger[{-1000, 1000}], {n/2}, {n}];
         a = Table[RandomInteger[{-1000, 1000}], {n/2}];
         b = Table[RandomInteger[{-1000, 1000}], {n/2}];
         g = Table[RandomInteger[{-1000, 1000}], {n}];
         d = Table[RandomInteger[{-1000, 1000}], {n}];
         X = x /@ Range[n];
         Minimize[{g.X/d.X, And @@ Thread[A.X == a] && And @@ Thread[b <= B.X <= b + 100]}, X]
Out[31]= {-1 286 274 653 702 415 809 313 525 025 452 519/437 743 412 320 661 916 541 674 600 912 158,
          {x[1] -> 611 767 491 996 227 433 062 183 883 923/599 276 957 533 098 032 663 796 688 622,
           x[2] -> 2 665 078 586 976 600 235 350 409 286 849/1 198 553 915 066 196 065 327 593 377 244,
           x[3] -> -215 391 679 158 483 849 611 061 030 533/299 638 478 766 549 016 331 898 344 311,
           x[4] -> 1 477 394 491 589 036 027 204 142 993 013/599 276 957 533 098 032 663 796 688 622,
           x[5] -> 473 657 331 854 113 835 444 689 628 600/299 638 478 766 549 016 331 898 344 311,
           x[6] -> -955 420 726 065 204 315 229 251 112 109/599 276 957 533 098 032 663 796 688 622,
           x[7] -> 265 603 080 958 760 324 085 018 021 123/1 198 553 915 066 196 065 327 593 377 244,
           x[8] -> -447 840 634 450 080 124 431 365 644 067/599 276 957 533 098 032 663 796 688 622,
           x[9] -> -2 155 930 215 697 442 604 517 040 669 063/1 198 553 915 066 196 065 327 593 377 244,
           x[10] -> 18 201 652 848 869 287 002 844 477 177/299 638 478 766 549 016 331 898 344 311}}
Univariate Optimization
For univariate problems, equation and inequality solving methods are used to find the con-
straint solution set and discontinuity points and zeros of the derivative of the objective function
within the set.
In[32]:= m = Minimize[x^2 + 2^x, x]
Out[32]= {2^(-(ProductLog[Log[2]^2/2]/Log[2])) + ProductLog[Log[2]^2/2]^2/Log[2]^2,
          {x -> -(ProductLog[Log[2]^2/2]/Log[2])}}
Here Mathematica recognizes that the objective function and the constraints are periodic.
In[34]:= Minimize[{Tan[2 x - Pi/2]^2, -1/2 <= Sin[x] <= 1/2}, x]
Out[34]= {1/3, {x -> Pi/6}}
Here is an example where the minimum is attained at a singular point of the constraints.
Out[36]= (plot of the constraint set with the minimum point marked)
The maximum of the same objective function is attained on the boundary of the set defined by
the constraints.
In[37]:= m = Maximize[{y, y^3 == x^2 && -2 <= x <= 2 && -2 <= y <= 2}, {x, y}]
Out[37]= {Root[-4 + #1^3 &, 1], {x -> -2, y -> Root[-4 + #1^3 &, 1]}}
Out[38]= (plot of the constraint set with the maximum point marked)
Here is a set of 3-dimensional examples with the same constraints. Depending on the objective
function, the maximum is attained at a stationary point of the objective function on the solution
set of the constraints, at a stationary point of the restriction of the objective function to the
boundary of the solution set of the constraints, or at the boundary of the boundary of the
solution set of the constraints.
Here the maximum is attained at a stationary point of the objective function on the solution set
of the constraints.
In[40]:= m = Maximize[{x + y + z, x^2 + y^2 + z^2 == 9 && -2 <= x <= 2 && -2 <= y <= 2 && -2 <= z <= 2}, {x, y, z}]
Out[40]= {3 Sqrt[3], {x -> Sqrt[3], y -> Sqrt[3], z -> Sqrt[3]}}
Here the maximum is attained at a stationary point of the restriction of the objective function to
the boundary of the solution set of the constraints.
In[42]:= m = Maximize[{x + y + 2 z, x^2 + y^2 + z^2 == 9 && -2 <= x <= 2 && -2 <= y <= 2 && -2 <= z <= 2}, {x, y, z}]
Out[42]= {4 + Sqrt[10], {x -> Sqrt[5/2], y -> Sqrt[5/2], z -> 2}}
Here the maximum is attained at the boundary of the boundary of the solution set of the
constraints.
In[44]:= m = Maximize[{x + 2 y + 2 z, x^2 + y^2 + z^2 == 9 && -2 <= x <= 2 && -2 <= y <= 2 && -2 <= z <= 2}, {x, y, z}]
Out[44]= {9, {x -> 1, y -> 2, z -> 2}}
The Lagrange multiplier method works for some optimization problems involving transcendental
functions.
In[46]:= Minimize[{y + Sin[10 x], y^3 == Cos[5 x] && -5 <= x <= 5 && -5 <= y <= 5}, {x, y}]
Out[46]= (an exact expression for the minimum value and the minimizing point, given in terms of
          AlgebraicNumber and Root objects)
To solve an integer linear programming problem Mathematica first solves the equational con-
straints, reducing the problem to one containing inequality constraints only. Then it uses lattice
reduction techniques to put the inequality system in a simpler form. Finally, it solves the simpli-
fied optimization problem using a branch-and-bound method.
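Before the randomly generated example below, here is a minimal added illustration (not from the original); the optimal value is 2, attained at several integer points, for example x = 1, y = 1:

    (* sketch: a small integer linear program solved by branch-and-bound *)
    Minimize[{x + y, x + 2 y >= 3 && x >= 0 && y >= 0 && (x | y) ∈ Integers}, {x, y}]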
This solves a randomly generated integer linear programming problem with 7 variables.
In[48]:= SeedRandom[1];
         A = Table[RandomInteger[{-1000, 1000}], {3}, {7}];
         a = Table[RandomInteger[{-1000, 1000}], {3}];
         B = Table[RandomInteger[{-1000, 1000}], {3}, {7}];
         b = Table[RandomInteger[{-1000, 1000}], {3}];
         g = Table[RandomInteger[{-1000, 1000}], {7}];
         X = x /@ Range[7];
         eqns = And @@ Thread[A.X == a];
         ineqs = And @@ Thread[B.X <= b];
         bds = And @@ Thread[X >= 0] && Total[X] <= 10^100;
         Minimize[{g.X, eqns && ineqs && bds && X ∈ Integers}, X]
Out[58]= {448 932, {x[1] -> 990, x[2] -> 1205, x[3] -> 219, x[4] -> 60, x[5] -> 823, x[6] -> 137, x[7] -> 34}}
This finds a point with integer coordinates maximizing x + y among the points lying below the
cubic x^3 + y^3 == 1000.
In[59]:= m = Maximize[{x + y, x^3 + y^3 <= 1000 && (x | y) ∈ Integers}, {x, y}]
Out[59]= {15, {x -> 6, y -> 9}}
Minimize and Maximize can find exact global optima for a class of optimization problems
containing arbitrary polynomial problems. However, the algorithms used have a very high
asymptotic complexity and therefore are suitable only for problems with a small number of
variables.
Maximize always finds a global maximum, even in cases that are numerically unstable. The
left-hand side of the constraint here is (x^2 + y^2 - 10^10)^2 (x^2 + y^2).
In[1]:= Maximize[{x + y,
          100000000000000000000 x^2 - 20000000000 x^4 + x^6 + 100000000000000000000 y^2 -
          40000000000 x^2 y^2 + 3 x^4 y^2 - 20000000000 y^4 + 3 x^2 y^4 + y^6 <= 1}, {x, y}] // N[#, 20] &
Out[1]= {141 421.35623730957559, {x -> 70 710.678118654787795, y -> 70 710.678118654787795}}
This input differs from the previous one only in the twenty-first decimal digit, but the answer is
quite different, especially the location of the maximum point. For an algorithm using 16 digits of
precision both problems look the same, hence it cannot possibly solve them both correctly.
In[2]:= Maximize[{x + y,
          100000000000000000001 x^2 - 20000000000 x^4 + x^6 + 100000000000000000000 y^2 -
          40000000000 x^2 y^2 + 3 x^4 y^2 - 20000000000 y^4 + 3 x^2 y^4 + y^6 <= 1}, {x, y}] // N[#, 20] &
Out[2]= {100 000.99999500000000, {x -> 1.0000000000000000000, y -> 99 999.999995000000000}}
NMaximize, which by default uses machine-precision numbers, is not able to solve either of the
problems.
In[3]:= NMaximize[
          {x + y, 100000000000000000000 x^2 - 20000000000 x^4 + x^6 + 100000000000000000000 y^2 -
           40000000000 x^2 y^2 + 3 x^4 y^2 - 20000000000 y^4 + 3 x^2 y^4 + y^6 <= 1}, {x, y}]
NMaximize::incst :
  NMaximize was unable to generate any initial points satisfying the inequality constraints
  {-1 + 100000000000000000000 x^2 - 20000000000 x^4 + x^6 + 100000000000000000000 y^2 - <<1>> +
    3 x^4 y^2 - 20000000000 y^4 + 3 x^2 y^4 + y^6 <= 0}. The initial
  region specified may not contain any feasible points. Changing the initial
  region or specifying explicit initial points may provide a better solution. >>
Out[3]= {1.35248*10^-10, {x -> 4.69644*10^-11, y -> 8.82834*10^-11}}
In[4]:= NMaximize[
          {x + y, 100000000000000000001 x^2 - 20000000000 x^4 + x^6 + 100000000000000000000 y^2 -
           40000000000 x^2 y^2 + 3 x^4 y^2 - 20000000000 y^4 + 3 x^2 y^4 + y^6 <= 1}, {x, y}]
NMaximize::incst :
  NMaximize was unable to generate any initial points satisfying the inequality constraints
  {-1 + 100000000000000000001 x^2 - 20000000000 x^4 + x^6 + 100000000000000000000 y^2 - <<1>> +
    3 x^4 y^2 - 20000000000 y^4 + 3 x^2 y^4 + y^6 <= 0}. The initial
  region specified may not contain any feasible points. Changing the initial
  region or specifying explicit initial points may provide a better solution. >>
Out[4]= {1.35248*10^-10, {x -> 4.69644*10^-11, y -> 8.82834*10^-11}}
FindMinimum only attempts to find a local minimum, and is therefore suitable when a local optimum
is needed, or when it is known in advance that the problem has only one optimum or only a few
optima that can be discovered using different starting points.
Even for local optimization, it may still be worth using NMinimize for small problems.
NMinimize uses one of the four direct search algorithms (Nelder-Mead, differential evolution,
simulated annealing, and random search), then fine-tunes the solution using a combination of the
KKT conditions, interior-point, and penalty methods. So if efficiency is not an issue,
NMinimize should be more robust than FindMinimum, in addition to being a global optimizer.
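As an added sketch (not the original four-variable example discussed next), the two functions can be compared directly on a small constrained problem; NMinimize searches globally, while FindMinimum refines a single start, here {1, 1}:

    (* sketch: global search versus local refinement on the same constrained problem *)
    NMinimize[{(x - 2)^2 + (y - 3)^2, x^2 + y^2 <= 4}, {x, y}]
    FindMinimum[{(x - 2)^2 + (y - 3)^2, x^2 + y^2 <= 4}, {{x, 1}, {y, 1}}]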
This shows the default behavior of NMinimize on a problem with four variables.
This shows that two of the post-processors, KKT and FindMinimum, do not give the default
result. Note that, for historical reasons, the name FindMinimum, when used as an option value
of PostProcess, stands for the process in which a penalty method is used to convert the
constrained optimization problem into an unconstrained problem, which is then solved using
(unconstrained) FindMinimum.
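A sketch of how the post-processor might be selected explicitly through the method options (added here; the option value names follow the description above, so treat the exact spelling as an assumption):

    (* sketch: requesting the KKT post-processor explicitly *)
    NMinimize[{(x - 2)^2 + (y - 3)^2, x^2 + y^2 <= 4}, {x, y},
      Method -> {"DifferentialEvolution", "PostProcess" -> KKT}]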
However, if efficiency is important, FindMinimum can be used when you just need a local minimum,
when you can provide a good starting point, when you know the problem has only one minimum
(for example, when it is convex), or when the problem is large or expensive. This uses FindMinimum
and NMinimize to solve the same problem with seven variables. The constraints are relatively
expensive to compute. Clearly FindMinimum in this case is much faster than NMinimize.
NMinimize::incst : NMinimize was unable to generate any initial points satisfying the inequality constraints
  {-1 + (3.1 x[2]^0.5 x[6]^(1/3))/(x[1] x[4]^2 x[5]) + (1.3 x[2] x[6])/(x[1]^0.5 x[3] x[5]) +
    (0.8 x[3] x[6]^2)/(x[4] x[5]) <= 0, <<4>>, -1 + <<4>> <= 0}.
  The initial region specified may not contain any feasible points. Changing the
  initial region or specifying explicit initial points may provide a better solution. >>
NMinimize::incst : NMinimize was unable to generate any initial points satisfying the inequality constraints
  {-1 + (3.1 x[2]^0.5 x[6]^(1/3))/(x[1] x[4]^2 x[5]) + (1.3 x[2] x[6])/(x[1]^0.5 x[3] x[5]) +
    (0.8 x[3] x[6]^2)/(x[4] x[5]) <= 0, <<4>>, -1 + <<4>> <= 0}.
  The initial region specified may not contain any feasible points. Changing the
  initial region or specifying explicit initial points may provide a better solution. >>
Out[15]= {8.151, {911.881, {x[1] -> 3.89625, x[2] -> 0.809359, x[3] -> 2.66439,
          x[4] -> 4.30091, x[5] -> 0.853555, x[6] -> 1.09529, x[7] -> 0.0273105}}}
Constrained Optimization in Mathematica: References
[1] Mehrotra, S. "On the Implementation of a Primal-Dual Interior Point Method." SIAM Journal
on Optimization 2 (1992): 575-601.
[2] Nelder, J. A. and R. Mead. "A Simplex Method for Function Minimization." The Computer
Journal 7 (1965): 308-313.
[3] Ingber, L. "Simulated Annealing: Practice versus Theory." Mathematical and Computer Modelling
18, no. 11 (1993): 29-57.
[4] Price, K. and R. Storn. "Differential Evolution." Dr. Dobb's Journal 264 (1997): 18-24.