Mathematical Optimization: From Linear Programming to Metaheuristics

Xin-She Yang

ISBN 978-1-904602-82-8
Preface

I Fundamentals
1 Mathematical Optimization
1.1 Optimization
1.2 Optimality Criteria
1.3 Computational Complexity
1.4 NP-Complete Problems
2 Norms and Hessian Matrices
4.3 Gauss-Jordan Elimination
4.4 LU Factorization
4.5 Iteration Methods
4.5.1 Jacobi Iteration Method
4.5.2 Gauss-Seidel Iteration
4.5.3 Relaxation Method
4.6 Nonlinear Equation
4.6.1 Simple Iterations
4.6.2 Newton-Raphson Method
II Mathematical Optimization
5 Unconstrained Optimization
8 Tabu Search
8.1 Tabu Search
Index
Preface
balanced view of various algorithms and to provide the right coverage of useful yet efficient algorithms selected from a wide range of optimization techniques.
Therefore, this book strives to provide a balanced coverage of efficient algorithms commonly used in solving mathematical optimization problems. It covers both the conventional algorithms and modern heuristic and metaheuristic methods. Topics include gradient-based algorithms (such as the Newton-Raphson method and the steepest descent method), Hooke-Jeeves pattern search, Lagrange multipliers, linear programming, particle swarm optimization (PSO), simulated annealing (SA), and Tabu search. We also briefly introduce multiobjective optimization, including important concepts such as Pareto optimality and the utility method, and provide three Matlab and Octave programs to demonstrate how PSO and SA work. In addition, we use an example to demonstrate how to modify these programs to solve multiobjective optimization problems using the recursive method.
Xin-She Yang
Cambridge, 2008
Part I
Fundamentals
Chapter 1
Mathematical Optimization
1.1 Optimization
Whatever the real-world problem is, it is usually possible to formulate the optimization problem in a generic form. All optimization problems with explicit objectives can in general be expressed as nonlinearly constrained optimization problems in the following generic form:

maximize/minimize_{x ∈ R^n}  f(x),   x = (x_1, x_2, ..., x_n)^T ∈ R^n,
subject to  φ_j(x) = 0,  (j = 1, 2, ..., M),
            ψ_k(x) ≥ 0,  (k = 1, ..., N),        (1.1)
where f(x), φ_j(x) and ψ_k(x) are scalar functions of the real column vector x. Here the components x_i of x = (x_1, ..., x_n)^T are called design variables or, more often, decision variables, and they can be continuous, discrete, or a mix of the two. The vector x is often called a decision vector, which varies in the n-dimensional space R^n. The function f(x) is called the objective function or cost function. In addition, φ_j(x) are constraints in terms of M equalities, and ψ_k(x) are constraints written as N inequalities. So there are M + N constraints in total. The optimization problem formulated here is a nonlinear constrained problem.
The space spanned by the decision variables is called the search space R^n, while the space formed by the objective function values is called the solution space. The optimization problem essentially maps the R^n domain or space of decision variables into the solution space R (or the real axis in general).
The objective function f(x) can be either linear or nonlinear. If the constraints φ_j and ψ_k are all linear, it becomes a linearly constrained problem. Furthermore, if φ_j, ψ_k and the objective function f(x) are all linear, then it becomes a linear programming problem. If the objective is at most quadratic with linear constraints, then it is called quadratic programming. If all the decision variables can only take integer values, then this type of linear programming is called integer programming or integer linear programming.
Linear programming is very important in applications and has been well studied, while there is still no generic method for solving nonlinear programming problems in general, though some important progress has been made in the last few decades. It is worth pointing out that the term programming here means planning; it has nothing to do with computer programming, and the coincidence in wording is purely incidental. On the other hand, if no constraints are specified, so that the x_i can take any values on the real axis (or any integers), the optimization problem is referred to as an unconstrained optimization problem.
subject to
x_1 ≥ 1,  x_2 − 2 = 0.        (1.4)
Example 1.1: To find the minimum of f(x) = x^2 e^{−x^2}, we have the stationary condition f'(x) = 0, or

f'(x) = 2x e^{−x^2} + x^2 (−2x) e^{−x^2} = 2(x − x^3) e^{−x^2} = 0.

As e^{−x^2} > 0, we have
x(1 − x^2) = 0,
or
x = 0,  x = ±1.
The second derivative is
f''(x) = 2e^{−x^2}(1 − 5x^2 + 2x^4),
which is an even function with respect to x. So at x = ±1, f''(±1) = 2[1 − 5(±1)^2 + 2(±1)^4] e^{−(±1)^2} = −4e^{−1} < 0. Thus, the maximum f_max = e^{−1} occurs at x_* = ±1. At x = 0, we have f''(0) = 2 > 0, thus the minimum of f(x) occurs at x_* = 0 with f_min(0) = 0.
Figure 1.1: Strong and weak maxima and minima (showing a weak local maximum, a strong local maximum, a weak local minimum, a local minimum, and a local minimum with discontinuity).
for this point f'(x_*) = 0 is not valid. We will not deal with this type of minima or maxima in detail. In our present discussion, we will assume that both f(x) and f'(x) are always continuous, or that f(x) is everywhere twice-continuously differentiable.
Example 1.2: The minimum of f(x) = x^2 at x = 0 is a strong local minimum. The minimum of g(x, y) = (x − y)^2 + (x − y)^2 at x = y = 0 is a weak local minimum because g(x, y) = 0 along the line x = y, so that g(x, y = x) = 0 = g(0, 0).
f = O(g).        (1.6)
f ∼ g,        (1.7)
R_2 = (1/12) n √(2πn) (n/e)^n,
∥v∥_2 = √(v · v) = √(v_1^2 + v_2^2 + ... + v_n^2).        (2.2)

The infinity norm ∥v∥_∞ = v_max = max_i |v_i| can be obtained as the limit of the p-norm:

lim_{p→∞} ( Σ_{i=1}^n |v_i|^p )^{1/p} = lim_{p→∞} ( v_max^p Σ_{i=1}^n |v_i/v_max|^p )^{1/p}
    = v_max lim_{p→∞} ( Σ_{i=1}^n |v_i/v_max|^p )^{1/p} = v_max.        (2.3)
and its corresponding norms are ∥u + v∥_1 = 9, ∥u + v∥_2 = √29 and ∥u + v∥_∞ = 4. It is straightforward to check that

∥u + v∥_1 = 9 < 12 + 5 = ∥u∥_1 + ∥v∥_1,
∥u + v∥_2 = √29 < √42 + 3 = ∥u∥_2 + ∥v∥_2,
and
∥u + v∥_∞ = 4 < 5 + 4 = ∥u∥_∞ + ∥v∥_∞.
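These inequalities are instances of the triangle inequality ∥u + v∥ ≤ ∥u∥ + ∥v∥, which every vector norm must satisfy. A minimal Matlab/Octave check, with two illustrative vectors (not the u and v of the example above):

% Check the triangle inequality for the 1-, 2- and infinity-norms.
% The vectors u and v here are illustrative choices.
u = [5; -3; 2; -2];  v = [1; 2; -1; 1];
for p = [1 2 Inf]
    lhs = norm(u + v, p);
    rhs = norm(u, p) + norm(v, p);
    fprintf('p = %3g: %8.4f <= %8.4f\n', p, lhs, rhs);
end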
The Frobenius norm is defined by

∥A∥_F = ( Σ_{i=1}^m Σ_{j=1}^n a_{ij}^2 )^{1/2},        (2.8)

and the 1-norm is

∥A∥_1 = max_{1≤j≤n} ( Σ_{i=1}^m |a_{ij}| ),        (2.9)

which is the maximum of the absolute column sum, while the infinity norm is the maximum of the absolute row sum. For the matrix A = ( 2  3 ; 4  −5 ), we have

∥A∥_∞ = max( |2| + |3|, |4| + |−5| ) = 9,
and
∥A∥_max = 5.
(A − λI)u = 0,        (2.13)
where I is the identity matrix with the same size as A. Any non-trivial solution requires that
det(A − λI) = 0,        (2.14)
or

| a_{11} − λ   a_{12}      ...   a_{1n}     |
| a_{21}       a_{22} − λ  ...   a_{2n}     |
|   ...                    ...              |  = 0,        (2.15)
| a_{n1}       a_{n2}      ...   a_{nn} − λ |

which again can be written as a polynomial

λ^n + α_{n−1}λ^{n−1} + ... + α_0 = (λ − λ_1)...(λ − λ_n) = 0.        (2.16)
The eigenvalues of
A = ( 4   9 ; 2   −3 )
can be obtained by solving
| 4 − λ   9 ; 2   −3 − λ | = 0.
We have
(4 − λ)(−3 − λ) − 18 = (λ − 6)(λ + 5) = 0.
Thus, the eigenvalues are λ = 6 and λ = −5. Let v = (v_1  v_2)^T be the eigenvector; for λ = 6 we have
(A − λI)v = ( −2   9 ; 2   −9 )( v_1 ; v_2 ) = 0,
which means that 2v_1 = 9v_2, so the normalized eigenvector is v = (9/√85, 2/√85)^T.
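The eigenvalues and eigenvectors above can be cross-checked numerically with the built-in eig function; the matrix follows the example as reconstructed here:

% Numerical check of the eigenvalue example.
A = [4 9; 2 -3];
[V, D] = eig(A)   % diag(D) contains 6 and -5; columns of V are unit eigenvectors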
λ = u^T A u / (u^T u).        (2.18)
This means that
u^T A u > 0,  if λ > 0.        (2.19)
For a 2 × 2 symmetric matrix
A = ( α   β ; β   γ ),
it is positive definite if
α u_1^2 + 2β u_1 u_2 + γ u_2^2 > 0,  for all u = (u_1, u_2)^T ≠ 0.
The inverse of A is
A^{-1} = 1/(αγ − β^2) ( γ   −β ; −β   α ),
A = ( 2   1 ; 1   2 ),
B = ( 6   20 ; ... ),
By dividing both sides of the above equation by ∥u∥_p ≠ 0, we reach the following inequality
|λ| ≤ ∥A^p∥^{1/p},        (2.23)
which is valid for any eigenvalue, so the spectral radius satisfies
ρ(A) ≤ ∥A^p∥^{1/p}.        (2.24)
For a triangular matrix with diagonal entries a_{11}, a_{22}, ..., a_{nn}, the eigenvalues are simply these diagonal entries. In addition, the determinant of the triangular matrix A is simply the product of its diagonal entries. That is,
det(A) = ∏_{i=1}^n a_{ii}.
where x = (x_1, x_2, ..., x_n)^T is a vector. As the gradient ∇f(x) of a linear function f(x) is always a constant vector k, any linear function can be written as
f(x) = k^T x + b,        (2.30)
where b is a constant.
The second derivatives of a generic function f(x) form an n × n matrix, called the Hessian matrix, given by
G_2(x) ≡ ∇^2 f(x) ≡ ( ∂^2f/∂x_1^2   ...   ∂^2f/∂x_1∂x_n ; ...   ...   ... ; ∂^2f/∂x_1∂x_n   ...   ∂^2f/∂x_n^2 ),        (2.31)
When the Hessian matrix G_2(x) = A is a constant matrix (the values of its entries are independent of x), the function f(x) is called a quadratic function, and can subsequently be written as
f(x) = (1/2) x^T A x + k^T x + b.        (2.33)
∂^2f/∂y∂z = z cos(x),   ∂^2f/∂z^2 = y cos(x),
Figure 2.2: Convexity of a function f(x). Chord AB lies above the curve segment joining A and B. For any point P, we have L_α = αL, L_β = βL and L = |x_B − x_A|.
2.5 Convexity

where α = (x_B − x_P)/(x_B − x_A) and β = (x_P − x_A)/(x_B − x_A). The value of the function f(x_P) at P should be less than or equal to the weighted combination αf(x_A) + βf(x_B) (or the value at point Q). That is,
f(x_P) = f(αx_A + βx_B) ≤ αf(x_A) + βf(x_B).
Example 2.7: For example, the convexity of f(x) = x^2 − 1 requires
(αx + βy)^2 − 1 ≤ α(x^2 − 1) + β(y^2 − 1),  ∀x, y ∈ R,        (2.40)
where α, β ≥ 0 and α + β = 1. This is equivalent to requiring
αx^2 + βy^2 − (αx + βy)^2 ≥ 0,        (2.41)
which, using α + β = 1, becomes
αx^2 + βy^2 − α^2x^2 − 2αβxy − β^2y^2 = α(1 − α)(x − y)^2 = αβ(x − y)^2 ≥ 0,        (2.42)
which is always true because α, β ≥ 0 and (x − y)^2 ≥ 0. Therefore, f(x) = x^2 − 1 is convex for ∀x ∈ R.
A function f (x) on Ω is concave if and only if g(x) = −f (x)
is convex. An interesting property of a convex function f is
that the vanishing of the gradient df /dx|x∗ = 0 guarantees
that the point x∗ is a global minimum of f . If a function is not
convex or concave, then it is much more difficult to find
global minima or maxima.
Chapter 3

Root-Finding Algorithms
the equation
x = (1/2)(x + k/x),        (3.1)
starting from a random guess, say, x = 1. The reason is that the above equation can be rearranged to get x = √k. In order to carry out the iteration, we use the notation x_n for the value of x at the n-th iteration. Thus, equation (3.1) provides a way of calculating the estimate of x at n + 1 (denoted as x_{n+1}). We have
x_{n+1} = (1/2)(x_n + k/x_n).        (3.2)
If we start from an initial value, say, x_0 = 1 at n = 0, we can iterate to the accuracy we want.
Example 3.1: To find √5, we have k = 5 with an initial guess x_0 = 1, and the first five iterations are as follows:
x_1 = (1/2)(x_0 + 5/x_0) = 3,        (3.3)
x_2 = (1/2)(x_1 + 5/x_1) ≈ 2.333333333,        (3.4)
x_3 ≈ 2.238095238,  x_4 ≈ 2.236068895,        (3.5)
x_5 ≈ 2.236067977.        (3.6)
We can see that x_5 after 5 iterations is very close to the true value √5 ≈ 2.2360679887.
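This iteration takes only a few lines of Matlab/Octave; a minimal sketch for √5, with the iteration count chosen for illustration:

% Square-root iteration (3.2) for k = 5, starting from x_0 = 1.
k = 5; x = 1;
for n = 1:5
    x = 0.5*(x + k/x);     % equation (3.2)
    fprintf('x_%d = %.9f\n', n, x);
end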
(Figure: the root x_* of f(x), bracketed by the lower bound x_a and the upper bound x_b, with an intermediate estimate x_n.)
2
bound. If x 0 > k, then x0 is the upper bound and k/x0 is the
lower bound. For other iterations, the new bounds will be x n
and k/xn. In fact, the value xn+1 is always between these two
bounds xn and k/xn, and the new estimate xn+1 is thus the
mean or average of the two bounds. This guarantees that
√
the series converges into the true value of k. This method is
similar to the bisection method below.
x_n = (1/2)(x_a + x_b).        (3.7)
We then have to test the sign of f(x_n). If f(x_n) has the same sign as f(x_a), we update the lower bound x_a = x_n; if f(x_n) has the same sign as f(x_b), we update the upper bound x_b = x_n. In the special case f(x_n) = 0, we have found the true root. The iterations continue in the same manner until a given accuracy is achieved or the prescribed number of iterations is reached.
Example 3.2: If we want to find √π, we have
f(x) = x^2 − π = 0.
We can use x_a = 1 and x_b = 2, since π < 4 (thus √π < 2). The first bisection point is
x_1 = (1/2)(x_a + x_b) = (1/2)(1 + 2) = 1.5.
Since f(x_a) < 0, f(x_b) > 0 and f(x_1) = −0.8916 < 0, we update the new lower bound x_a = x_1 = 1.5. The second bisection point is
x_2 = (1/2)(1.5 + 2) = 1.75,
and f(x_2) = −0.0791 < 0, so we update the lower bound again: x_a = 1.75. The third bisection point is
x_3 = (1/2)(1.75 + 2) = 1.875.
Since f(x_3) = 0.374 > 0, we now update the new upper bound x_b = 1.875. The fourth bisection point is
x_4 = (1/2)(1.75 + 1.875) = 1.8125.
It is within 2.5% of the true value √π ≈ 1.7724538509.
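The bisection steps above are easy to automate; a minimal sketch for f(x) = x^2 − π on [1, 2], with the tolerance and iteration cap as illustrative choices:

% Bisection for f(x) = x^2 - pi with bracket [1, 2].
f = @(x) x.^2 - pi;
xa = 1; xb = 2;                   % f(xa) < 0 < f(xb)
for n = 1:40
    xn = 0.5*(xa + xb);
    if f(xn)*f(xa) > 0, xa = xn;  % same sign as f(xa): raise lower bound
    else, xb = xn;                % otherwise: lower the upper bound
    end
    if xb - xa < 1e-10, break; end
end
xn                                % approximates sqrt(pi) = 1.77245...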
In general, the convergence of the bisection method is
very slow, and Newton’s method is a much better choice in most
cases.
3.3 Newton's Method
which leads to
x_{n+1} − x_n ≈ [f(x_{n+1}) − f(x_n)] / f'(x_n),        (3.9)
or
x_{n+1} ≈ x_n + [f(x_{n+1}) − f(x_n)] / f'(x_n).        (3.10)
Since we try to find an approximation to f(x) = 0 with f(x_{n+1}), we can use the approximation f(x_{n+1}) ≈ 0 in the above expression. Thus we have the standard Newton iterative formula
x_{n+1} = x_n − f(x_n)/f'(x_n).        (3.11)
The iteration procedure starts from an initial guess x_0 and continues until a certain criterion is met. A good initial guess will need fewer steps; however, if there is no obvious good starting point, you can start at any point on the interval [a, b]. But if the initial value is too far from the true zero, the iteration process may fail. So it is a good idea to limit the number of iterations.
Example 3.3: To find the root of
f(x) = x − e^{−x} = 0,
we have f'(x) = 1 + e^{−x}, and the Newton iteration formula becomes
x_{n+1} = x_n − (x_n − e^{−x_n})/(1 + e^{−x_n}).
Using x_0 = 1, we have
x_1 = 1 − (1 − e^{−1})/(1 + e^{−1}) ≈ 0.5378828427,
and
x_2 ≈ 0.5669869914,  x_3 ≈ 0.5671432859.
We can see that x_3 (after only three iterations) is very close to the true root x_* ≈ 0.5671432904.
We have seen that Newton's method is very efficient and is thus widely used. This method can be modified for solving unconstrained optimization problems, because such a problem is equivalent to finding the root of the first derivative f'(x) = 0 once the objective function f(x) is given.
where x = (x, y, ..., z)^T = (x_1, x_2, ..., x_p)^T, an iteration method is usually needed to find the roots of
F(x) = 0.        (3.13)
Linearization about the current estimate x^n gives
R(x, x^n) = F(x^n) + J(x^n)(x − x^n),        (3.14)
where
J(x) = ∇F        (3.15)
is the Jacobian of F. That is,
J_{ij} = ∂F_i/∂x_j.        (3.16)
Here we have used the notation x^n for the vector x at the n-th iteration, which should not be confused with the power u^n of a vector u. This might be confusing, but such notations are widely used in the literature of numerical analysis. An alternative (and better) notation is to denote x^n as x^{(n)}, which shows the vector value at the n-th iteration using a bracket. However, we will use both notations if no confusion arises.
To find the next approximation x^{n+1} from the current estimate x^n, we have to try to satisfy R(x^{n+1}, x^n) = 0, which is equivalent to solving a linear system with J being the coefficient matrix:
x^{n+1} = x^n − J^{-1} F(x^n),        (3.17)
under a given termination criterion
∥x^{n+1} − x^n∥ ≤ ε.
Iterations require an initial starting vector x^0, which is often set to x^0 = 0.
Example 3.4: To find the roots of the system
x − e^{−y} = 0,   x^2 − y = 0,
we first write it as
F(x) = ( x_2 − e^{−x_1} ; x_2^2 − x_1 ),   x = ( x_1 ; x_2 ) = ( y ; x ).
The iteration formula is
x^{n+1} = x^n − J^{-1} F(x^n),
where the Jacobian is
J = ( ∂F_1/∂x_1   ∂F_1/∂x_2 ; ∂F_2/∂x_1   ∂F_2/∂x_2 ) = ( e^{−x_1}   1 ; −1   2x_2 ),
whose inverse is
A = J^{-1} = 1/(1 + 2x_2 e^{−x_1}) ( 2x_2   −1 ; 1   e^{−x_1} ).
Therefore, the iteration equation becomes
x^{n+1} = x^n − u^n,
where
u^n = J^{-1} F(x^n) = 1/(1 + 2x_2 e^{−x_1}) ( x_2^2 + x_1 − 2x_2 e^{−x_1} ; x_2 + (x_2^2 − x_1 − 1) e^{−x_1} ).
If we start with the initial guess x^0 = (0, 0)^T, we have the first estimate
x^1 = ( 0 ; 0 ) − ( 0 ; −1 ) = ( 0 ; 1 ),
and the second iteration gives
x^2 = ( 0 ; 1 ) − ( −0.33333 ; 0.33333 ) = ( 0.33333 ; 0.66667 ).
If we continue this way, the third iteration gives
x^3 = x^2 − ( −0.09082847 ; 0.01520796 ) = ( 0.42415551 ; 0.6514462 ).
Finally, the fourth iteration gives
x^4 = x^3 − ( −0.002145006 ; −0.001472389 ) = ( 0.42630051 ; 0.65291859 ).
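A minimal Matlab/Octave sketch of this iteration, using the same ordering x = (x_1; x_2) = (y; x) assumed in the reconstruction above, and solving the linear system J u = F instead of forming the inverse explicitly:

% Newton iteration (3.17) for the system of Example 3.4.
F = @(x) [x(2) - exp(-x(1)); x(2)^2 - x(1)];
J = @(x) [exp(-x(1)), 1; -1, 2*x(2)];
x = [0; 0];                % initial guess x^0
for n = 1:4
    x = x - J(x)\F(x);     % solve J*u = F(x) rather than inverting J
    disp(x');
end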
Chapter 4

System of Linear Equations

A linear system of m equations for n unknowns,
a_{11}u_1 + a_{12}u_2 + ... + a_{1n}u_n = b_1,
...
a_{m1}u_1 + a_{m2}u_2 + ... + a_{mn}u_n = b_m,        (4.1)
can be written in the compact form
( a_{11}   a_{12}   ...   a_{1n} ; a_{21}   a_{22}   ...   a_{2n} ; ... ; a_{m1}   a_{m2}   ...   a_{mn} ) ( u_1 ; u_2 ; ... ; u_n ) = ( b_1 ; b_2 ; ... ; b_m ),        (4.2)
or simply
Au = b.        (4.3)
u = A^{-1} b.        (4.5)
Av = λv,        (4.6)
or
(A − λI)v = 0.        (4.7)
For a 3 × 3 system, Cramer's rule gives, for example,
u_3 = (1/Δ) | a_{11}   a_{12}   b_1 ; a_{21}   a_{22}   b_2 ; a_{31}   a_{32}   b_3 |,
where the determinant is
Δ = | a_{11}   a_{12}   a_{13} ; a_{21}   a_{22}   a_{23} ; a_{31}   a_{32}   a_{33} |.        (4.12)
of finding the inverse A^{-1} if possible. There are many ways of solving linear equations, but they fall into two categories: direct algebraic methods and iteration methods. The former finds the solution by elimination, decomposition of the matrix, and substitutions, while the latter involves iterations to find approximate solutions. The choice of these methods depends on the characteristics of the matrix A, the size of the problem, computational time, the type of problem, and the required solution quality.
For a general n × n system
( a_{11}   a_{12}   a_{13}   ...   a_{1n} ; ... ; a_{n1}   a_{n2}   a_{n3}   ...   a_{nn} ) ( u_1 ; ... ; u_n ) = ( b_1 ; ... ; b_n ),        (4.13)
the aim in the first step is to make all the coefficients in the first column (a_{21}, ..., a_{n1}) become zero, except the first element, by elementary row operations. This is based on the principle that a linear system will remain the same if its rows are multiplied by some non-zero coefficients, or any two rows are interchanged, or any two (or more) rows are combined through addition and subtraction.
To do this, we first divide the first equation by a_{11} (we can always assume a_{11} ≠ 0; if not, we re-arrange the order of the equations to achieve this). We now have
( 1   a_{12}/a_{11}   a_{13}/a_{11}   ...   a_{1n}/a_{11} ; a_{21}   a_{22}   ...   a_{2n} ; ... ; a_{n1}   ...   a_{nn} ) ( u_1 ; u_2 ; ... ; u_n ) = ( b_1/a_{11} ; b_2 ; ... ; b_n ).
Multiplying the first row by a_{21} and subtracting it from the second row (and similarly for the other rows), we have
( 1   a_{12}/a_{11}   ...   a_{1n}/a_{11} ; 0   a_{22} − a_{21}a_{12}/a_{11}   ...   a_{2n} − a_{21}a_{1n}/a_{11} ; ... ; 0   a_{n2} − a_{n1}a_{12}/a_{11}   ...   a_{nn} − a_{n1}a_{1n}/a_{11} ) ( u_1 ; u_2 ; ... ; u_n ) = ( b_1/a_{11} ; b_2 − a_{21}b_1/a_{11} ; ... ; b_n − a_{n1}b_1/a_{11} ).
We then repeat the same procedure for the third row to the n-th row; the final form of the linear system should be in the following generic form:
( α_{11}   α_{12}   α_{13}   ...   α_{1n} ; 0   α_{22}   α_{23}   ...   α_{2n} ; ... ; 0   0   0   ...   α_{nn} ) ( u_1 ; u_2 ; ... ; u_n ) = ( β_1 ; β_2 ; ... ; β_n ),        (4.15)
where α_{1j} = a_{1j}/a_{11}, α_{2j} = a_{2j} − a_{1j}a_{21}/a_{11} (j = 1, 2, ..., n), ..., β_1 = b_1/a_{11}, β_2 = b_2 − a_{21}b_1/a_{11}, and so on. From the above form, we see that u_n = β_n/α_{nn}, because there is only one unknown u_n in the n-th row. We can then use back substitution to obtain u_{n−1}, and so on up to u_1. Therefore, we have
u_n = β_n/α_{nn},
u_i = (1/α_{ii}) ( β_i − Σ_{j=i+1}^n α_{ij} u_j ),        (4.16)
where i = n − 1, n − 2, ..., 1. Obviously, in our present case, α_{11} = ... = α_{nn} = 1. Let us look at an example.
For the linear system
( 2   1   3   4 ; 3   −2   −5   6 ; −2   1   0   5 ; 4   5   6   0 ) ( u_1 ; u_2 ; u_3 ; u_4 ) = ( 21 ; 9 ; 12 ; −3 ),
we first divide the first row by a_{11} = 2:
( 1   1/2   3/2   2 ; 3   −2   −5   6 ; −2   1   0   5 ; 4   5   6   0 ) ( u_1 ; u_2 ; u_3 ; u_4 ) = ( 21/2 ; 9 ; 12 ; −3 ).
For the second row, we repeat this procedure again; after eliminating the first column, normalizing the second row, and eliminating the second column, we have
( 1   1/2   3/2   2 ; 0   1   19/7   0 ; 0   0   −17/7   9 ; 0   0   −57/7   −8 ) ( u_1 ; u_2 ; u_3 ; u_4 ) = ( 21/2 ; 45/7 ; 141/7 ; −450/7 ).
After the same procedure for the third and fourth rows, the system becomes upper triangular, and the unknowns u_4, u_3, u_2, u_1 then follow successively by back substitution.
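The elimination and back-substitution steps generalize directly to code. A minimal sketch without pivoting (so it assumes non-zero pivots); A and b follow the example as reconstructed above:

% Gauss elimination with back substitution (no pivoting).
A = [2 1 3 4; 3 -2 -5 6; -2 1 0 5; 4 5 6 0];
b = [21; 9; 12; -3];
n = length(b);
for k = 1:n-1
    for i = k+1:n
        m = A(i,k)/A(k,k);              % elimination multiplier
        A(i,k:n) = A(i,k:n) - m*A(k,k:n);
        b(i) = b(i) - m*b(k);
    end
end
u = zeros(n,1);
for i = n:-1:1                          % back substitution, as in (4.16)
    u(i) = (b(i) - A(i,i+1:n)*u(i+1:n))/A(i,i);
end
u                                        % compare with A\b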
In essence, Gauss-Jordan elimination works on the augmented matrix
B = [A b I] = ( a_{11}   ...   a_{1n} | b_1 | 1   0   ...   0 ; ... ; a_{n1}   ...   a_{nn} | b_n | 0   0   ...   1 ),        (4.17)
and for the 3 × 3 example whose third row reads
( 4   0   −5 | 14 | 0   0   1 ),
elementary row operations change B into the form
B = [I | u | A^{-1}],
which gives the solution u = (1, 5, −2)^T, with the inverse A^{-1} appearing in the right-hand block.
We can see that both the solution u and the inverse A^{-1} are obtained in Gauss-Jordan elimination.
The Gauss-Jordan elimination is not quite stable numerically. In order to get better and more stable schemes, common practice is to use pivoting. Basically, pivoting is a scaling procedure obtained by dividing all the elements in a row by the element with the largest magnitude or norm. If necessary, rows can be exchanged so that the largest element is moved to become the leading coefficient, especially on the diagonal position. This makes all the scaled elements lie in the range [−1, 1]. Thus, exceptionally large numbers are removed, which makes the scheme more numerically stable.
An important issue in both Gauss elimination and Gauss-Jordan elimination is the non-zero requirement of leading coefficients such as a_{11} ≠ 0. For a_{11}, it is possible to re-arrange the equations to achieve this requirement. However, there is no guarantee that other coefficients such as a_{22} − a_{21}a_{12}/a_{11} will be nonzero. If such a coefficient is zero, there is a potential difficulty due to division by zero. In order to avoid this problem, we can use other methods such as the pivoting method and LU decomposition.
4.4 LU Factorization

Any square matrix A can be written as the product of two triangular matrices in the form
A = LU,        (4.19)
where L is a lower triangular matrix
L = ( β_{11}   0   ...   0 ; β_{21}   β_{22}   ...   0 ; ... ; β_{n1}   β_{n2}   ...   β_{nn} ),        (4.20)
and U is an upper triangular matrix
U = ( α_{11}   ...   α_{1,n−1}   α_{1,n} ; 0   ...   α_{n−1,n−1}   α_{n−1,n} ; 0   ...   0   α_{nn} ).        (4.21)
or
Uu = v,  Lv = b,        (4.23)
which are two linear systems with triangular matrices only, and these systems can be solved by forward and back substitutions. The solutions v_i are given by
v_1 = b_1/β_{11},   v_i = (1/β_{ii}) ( b_i − Σ_{j=1}^{i−1} β_{ij} v_j ),  i = 2, ..., n,
and then u follows by back substitution,
u_n = v_n/α_{nn},   u_i = (1/α_{ii}) ( v_i − Σ_{j=i+1}^n α_{ij} u_j ),
where i = n − 1, ..., 1.
For triangular matrices such as L, there are some interesting properties. The inverse of a lower (upper) triangular matrix is also a lower (upper) triangular matrix. The determinant of a triangular matrix is simply the product of its diagonal entries. That is,
det(A) = ∏_{i=1}^n α_{ii}.
The entries of L and U can be computed recursively; for example,
β_{ij} = (1/α_{jj}) ( a_{ij} − Σ_{k=1}^{j−1} β_{ik} α_{kj} ),        (4.31)
for i = j + 1, j + 2, ..., n.
The same issue appears again, that is, all the leading coefficients α_{ii} must be non-zero. For sparse matrices with many zero entries, this often causes significant numerical problems. Better methods such as iteration methods should be used in this case.
Au = (D + L + U)u = b,        (4.33)
which can be written as the iteration procedure
D u^{(n+1)} = b − (L + U) u^{(n)}.        (4.34)
This can be used to calculate the next approximate solution u^{(n+1)} from the current estimate u^{(n)}. As the inverse of any diagonal matrix D = diag[d_{ii}] is easy, we have
u^{(n+1)} = D^{-1} [ b − (L + U) u^{(n)} ].        (4.35)
Writing in terms of the elements, we have
u_i^{(n+1)} = (1/d_{ii}) [ b_i − Σ_{j≠i} a_{ij} u_j^{(n)} ],        (4.36)
For example, consider the linear system
5u_1 + u_2 − 2u_3 = 5,
u_1 + 4u_2 = −10,        (4.38)
2u_1 + 2u_2 − 7u_3 = −9.
We know its exact solution is
u = ( u_1 ; u_2 ; u_3 ) = ( 2 ; −3 ; 1 ).        (4.39)
Then let us decompose the matrix A as
( 5   1   −2 ; 1   4   0 ; 2   2   −7 )
  = ( 5   0   0 ; 0   4   0 ; 0   0   −7 ) + ( 0   0   0 ; 1   0   0 ; 2   2   0 ) + ( 0   1   −2 ; 0   0   0 ; 0   0   0 ).
If we start from the initial guess u^{(0)} = (0, 0, 0)^T, we have
u^{(1)} ≈ ( 1 ; −2.5 ; 1.2857 ),   u^{(2)} ≈ ( 2.0143 ; −2.7500 ; 0.8571 ),   u^{(3)} ≈ ( 1.8929 ; −3.0036 ; 1.0755 ),
u^{(4)} ≈ ( 2.0309 ; −2.9732 ; 0.9684 ),   u^{(5)} ≈ ( 1.9820 ; −3.0077 ; 1.0165 ).
If instead the equation without u_3 (that is, u_1 + 4u_2 = −10) were placed last, the diagonal part would become
D = ( 2   0   0 ; 0   1   0 ; 0   0   0 ),
which has no inverse as it is singular. This means the order of the equations is important to ensure that the matrix is diagonally dominant.
Furthermore, if we interchange the first equation (row) and second equation (row), we have the equivalent system
( 1   4   0 ; 5   1   −2 ; 2   2   −7 ) ( u_1 ; u_2 ; u_3 ) = ( −10 ; 5 ; −9 ),
whose matrix can be decomposed as
( 1   4   0 ; 5   1   −2 ; 2   2   −7 )
  = ( 1   0   0 ; 0   1   0 ; 0   0   −7 ) + ( 0   0   0 ; 5   0   0 ; 2   2   0 ) + ( 0   4   0 ; 0   0   −2 ; 0   0   0 ).
If we carry out the same iterations with this splitting, the estimates grow without bound; we can see that it diverges. So what is the problem? How can the order of the equations affect the results so significantly?
There are two important criteria for the iteration to converge correctly: the inverse of D must exist, and the spectral radius of the iteration matrix must be less than 1. The first condition is obvious: if D^{-1} does not exist (say, when any of the diagonal elements is zero), then we cannot carry out the iteration process at all. The second condition requires
ρ(D^{-1}) ≤ 1,   ρ[D^{-1}(L + U)] ≤ 1,        (4.40)
where ρ(A) is the spectral radius of the matrix A. For the diagonal matrix D of the interchanged system, the largest absolute eigenvalue of D^{-1} is 1. So ρ(D^{-1}) = max(|λ_i|) = 1 seems to be no problem. How about the following matrix?
N = D^{-1}(L + U) = ( 0   4   0 ; 5   0   −2 ; −2/7   −2/7   0 ).        (4.41)
The three eigenvalues of N are λ_i = 4.590, −4.479, −0.111. So its spectral radius is ρ(N) = max(|λ_i|) = 4.59 > 1. The iteration scheme will diverge.
If we revisit our earlier example (with the diagonally dominant ordering), we have
D^{-1} = ( 1/5   0   0 ; 0   1/4   0 ; 0   0   −1/7 ),   eig(D^{-1}) = 1/5, 1/4, −1/7,        (4.42)
and
N = D^{-1}(L + U) = ( 0   1/5   −2/5 ; 1/4   0   0 ; −2/7   −2/7   0 ),        (4.43)
whose eigenvalues are
λ ≈ 0.4739,  −0.2370 ± 0.0644i.        (4.44)
So we have
ρ(D^{-1}) = 1/4 < 1,   ρ(N) = 0.4739 < 1.        (4.45)
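The Jacobi scheme (4.35) for the diagonally dominant ordering takes only a few lines of code; the iteration count below is an illustrative choice:

% Jacobi iteration for the example system (diagonally dominant order).
A = [5 1 -2; 1 4 0; 2 2 -7];  b = [5; -10; -9];
D = diag(diag(A));  LU = A - D;   % split A = D + (L + U)
u = zeros(3,1);                   % initial guess u^(0)
for n = 1:25
    u = D \ (b - LU*u);           % equation (4.35)
end
u                                  % converges towards (2, -3, 1)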
If the size of the vector u is large (it usually is), then we can devise other iteration procedures that save memory by using a running update, so that only one vector of storage is needed.
The Gauss-Seidel iteration is such a procedure: it uses the running update and also provides an efficient way of solving the linear matrix equation Au = b. It uses the same decomposition as the Jacobi-type iteration by splitting A into
A = L + D + U,        (4.46)
and the iterations are continued until ∥u^{(n+1)} − u^{(n)}∥ is sufficiently small. Iterations require a starting vector u^{(0)}. This method is also referred to as successive substitution.
If this simple method does not work, the relaxation method can be used. The relaxation technique first computes a tentative new approximation u* from A(u^{(n)})u* = b(u^{(n)}), and then takes
u^{(n+1)} = ωu* + (1 − ω)u^{(n)},   ω ∈ (0, 1].        (4.54)
For a nonlinear system F(u) = 0 with Jacobian J_{ij} = ∂F_i/∂u_j, to find the next approximation u^{(n+1)} from R(u^{(n+1)}; u^{(n)}) = 0, one has to solve a linear system with J as the coefficient matrix:
u^{(n+1)} = u^{(n)} − J^{-1} F(u^{(n)}),        (4.57)
under a given termination criterion ∥u^{(n+1)} − u^{(n)}∥ ≤ ε.
Part II
Mathematical Optimization
Chapter 5
Unconstrained Optimization
f(x) = x e^{−x^2},   −∞ < x < ∞,        (5.1)
whose stationary condition is
df(x_*)/dx = e^{−x_*^2} − 2x_*^2 e^{−x_*^2} = 0.        (5.2)
Since exp(−x_*^2) ≠ 0, we have
x_* = ±√2/2.        (5.3)
From basic calculus we know that a maximum requires f''(x_*) ≤ 0 while a minimum requires f''(x_*) ≥ 0. At x_* = √2/2, we have
f''(x_*) = (4x_*^2 − 6) x_* e^{−x_*^2} = −2√2 e^{−1/2} < 0,        (5.4)
so this point corresponds to a maximum f(x_*) = (√2/2) e^{−1/2}. Similarly, at x_* = −√2/2, f''(x_*) = 2√2 e^{−1/2} > 0, so this point corresponds to a minimum.
2 λj , (5.12)
f (x∗ + vj ) = f (x∗) +
which means that the variations of f (x), when x moves away
from the stationary point x∗ along the direction vj , are char-
acterised by the eigenvalues. If λj > 0, | | > 0 will leads to
| f | = |f (x) − f (x∗)| > 0. In other words, f (x) will increase as
| | increases. Conversely, if λj < 0, f (x) will decrease as | | > 0
increases. Obviously, in the special case λj = 0, the
58 Chapter 5. Unconstrained Optimization
Example 5.2: We know that the function f(x, y) = xy has a saddle point at (0, 0). It increases along the x = y direction and decreases along the x = −y direction. From the above analysis, we know that x_* = (x_*, y_*)^T = (0, 0)^T and f(x_*, y_*) = 0. We now have
f(x_* + εu) ≈ f(x_*) + (1/2) ε^2 u^T A u,
where
A = ∇^2 f(x_*) = ( ∂^2f/∂x^2   ∂^2f/∂x∂y ; ∂^2f/∂x∂y   ∂^2f/∂y^2 ) = ( 0   1 ; 1   0 ).
The eigenvalue problem is simply
A v_j = λ_j v_j,   (j = 1, 2),
or
| −λ_j   1 ; 1   −λ_j | = 0,
whose solutions are λ_j = ±1. For λ_1 = 1, the corresponding eigenvector is
v_1 = ( √2/2 ; √2/2 ),
and for λ_2 = −1,
v_2 = ( √2/2 ; −√2/2 ).
Since A is symmetric, v_1 and v_2 are orthonormal. Indeed this is the case, because ∥v_1∥ = ∥v_2∥ = 1 and
v_1^T v_2 = (√2/2)(√2/2) + (√2/2)(−√2/2) = 0.
Thus, we have
f(εv_j) = (1/2) ε^2 λ_j,   (j = 1, 2).        (5.13)
As λ_1 = 1 is positive, f increases along the direction v_1 = (√2/2)(1  1)^T, which is indeed along the line x = y. Similarly, for λ_2 = −1, f will decrease along v_2 = (√2/2)(1  −1)^T, which is exactly along the line x = −y. As there is no zero eigenvalue, the function will not remain constant in the region around (0, 0).

5.3 Gradient-Based Methods
where α is the step size, which can vary during iterations, and g(∇f) is a function of the gradient ∇f. Different methods use different forms of g(∇f, x^{(n)}).
Newton's method is rooted in the Taylor expansion
f(x) = f(x_n) + (∇f(x_n))^T Δx + (1/2) Δx^T ∇^2 f(x_n) Δx + ...,        (5.15)
which is minimized near a critical point when Δx is the solution of ∇f(x_n) + ∇^2 f(x_n)Δx = 0. This leads to
x = x_n − G^{-1} ∇f(x_n),        (5.17)
where G = ∇^2 f(x_n) is the Hessian matrix. If the iteration procedure starts from the initial vector x^{(0)} (usually taken to be a guessed point in the domain), then Newton's iteration formula for the n-th iteration is
x^{(n+1)} = x^{(n)} − G^{-1}(x^{(n)}) ∇f(x^{(n)}).        (5.18)
It is worth pointing out that if f(x) is quadratic, then the solution can be found exactly in a single step. However, this method is not efficient for non-quadratic functions.
In order to speed up the convergence, we can use a smaller step size α ∈ (0, 1], so that we have the modified Newton's method
x^{(n+1)} = x^{(n)} − αG^{-1}(x^{(n)}) ∇f(x^{(n)}).        (5.19)
It can sometimes be time-consuming to calculate the Hessian matrix of second derivatives. A good alternative is to use the identity matrix in place of G^{-1}, that is, G^{-1} = I, and we have the quasi-Newton method
x^{(n+1)} = x^{(n)} − αI ∇f(x^{(n)}),        (5.20)
which is essentially the steepest descent method.
In each iteration, the gradient and the step size will be calculated. Again, a good initial guess of both the starting point and the step size is useful.
Consider minimizing f(x_1, x_2) = 10x_1^2 + 5x_1x_2 + 10(x_2 − 3)^2, starting from x^{(0)} = (10, 15)^T. The gradient is ∇f = (20x_1 + 5x_2, 5x_1 + 20x_2 − 60)^T; therefore
∇f(x^{(0)}) = (275, 290)^T.
In the first iteration, we have
x^{(1)} = x^{(0)} − α_0 ( 275 ; 290 ).
The step size α_0 should be chosen such that f(x^{(1)}) is at the minimum, which means that
f(α_0) = 10(10 − 275α_0)^2 + 5(10 − 275α_0)(15 − 290α_0) + 10(12 − 290α_0)^2
should be minimized. This becomes an optimization problem for a single independent variable α_0. All the techniques for univariate optimization problems, such as Newton's method, can be used to find α_0. We can also obtain the solution by setting
df/dα_0 = −159725 + 3992000α_0 = 0,
whose solution is α_0 ≈ 0.04001. The true minimum is at the stationary point of f, where
∂f/∂x_1 = 20x_1 + 5x_2 = 0,   ∂f/∂x_2 = 5x_1 + 20x_2 − 60 = 0,
which gives
x_* = (−4/5, 16/5)^T = (−0.8, 3.2)^T.
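For this quadratic objective, the exact line search can even be skipped; a minimal steepest descent sketch with a fixed step size (chosen close to the α_0 found above, an illustrative simplification):

% Steepest descent for f = 10*x1^2 + 5*x1*x2 + 10*(x2-3)^2.
g = @(x) [20*x(1) + 5*x(2); 5*x(1) + 20*x(2) - 60];   % gradient
x = [10; 15];             % starting point x^(0)
alpha = 0.04;             % fixed step size (illustrative)
for n = 1:100
    x = x - alpha*g(x);
end
x                         % approaches x* = (-0.8, 3.2)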
begin
  Objective function f(x), x = (x_1, ..., x_p)^T
  Initialize starting point x^{(0)} and increments Δ_i (i = 1, ..., p)
  Initialize step reduction factor γ > 1 and tolerance ε > 0
  while ( ∥x^{(n+1)} − x^{(n)}∥ ≥ ε )
    Perform exploratory search for all i = 1 to p, by x_i ± Δ_i
    Update until successful: f(x^{(n+1)}) ≤ f(x^{(n)})
    If the move fails, try again using Δ_i = Δ_i/γ
    end (for)
    Perform pattern move: x^{(n+1)} = x^{(n)} + (x^{(n)} − x^{(n−1)})
    Update new base point x^{(n+1)}
    n = n + 1
  end (while)
end
is discarded and a new search starts from x^{(n)}; the new search moves should use a smaller step size, obtained by reducing the increments to Δ_i/γ, where γ > 1 is the step reduction factor. Iterations continue until the prescribed tolerance is met. The algorithm is summarised in the pseudo code shown in Fig. 5.1.
the other coordinate along x_2, we know that x^{(1)} = (1, 2 − 1)^T = (1, 1)^T is a good move, as it gives f(x^{(1)}) = 6 < 24. Therefore, the base point is x^{(1)} = (1, 1)^T.
We then perform the pattern move by using x^{(2)} = x^{(1)} + (x^{(1)} − x^{(0)}), and we have
x^{(2)} = 2 ( 1 ; 1 ) − ( 2 ; 2 ) = ( 0 ; 0 ).
This pattern move produces f(x^{(2)}) = 0, and it is a successful move. This is indeed the optimal solution, as the minimum occurs exactly at x_* = (0, 0)^T.
Chapter 6

Linear Mathematical Programming
x_1 + x_2 ≤ n.        (6.3)
The problem now is to find the best x_1 and x_2 so that the profit P is a maximum. Mathematically, we have
maximize_{(x_1, x_2) ∈ N^2}  P(x_1, x_2) = αx_1 + βx_2,
subject to  x_1 + x_2 ≤ n,
            0 ≤ x_1 ≤ n_1,   0 ≤ x_2 ≤ n_2.        (6.5)
Figure 6.1: Schematic representation of linear programming. If α = 2, β = 3, n_1 = 16, n_2 = 10 and n = 20, then the optimal solution is at B(10, 10).
s_1 ≥ 0.        (6.8)
The above inequality constraints can be written, using three slack variables s_1, s_2, s_3, as the following equalities:
x_1 + x_2 + s_1 = n,        (6.10)
x_1 + s_2 = n_1,   x_2 + s_3 = n_2,        (6.11)
and
x_i ≥ 0,   (i = 1, 2, ..., 5),        (6.13)
which has two control variables (x_1, x_2) and three slack variables x_3 = s_1, x_4 = s_2, x_5 = s_3.
In general, a linear programming problem can be written in the standard form
maximize_{x ∈ R^p}  Z = Σ_{i=1}^p α_i x_i = α^T x,   subject to  Ax = b,   x_i ≥ 0,
where A is a q × p matrix, b = (b_1, ..., b_q)^T, and
x = [x_p  x_s]^T = (x_1, ..., x_m, s_1, ..., s_{p−m})^T.        (6.15)
In the matrix form with columns ordered as (Z, x_1, x_2, s_1, s_2, s_3), the constraint x_2 + s_3 = 10 corresponds to the row (0  0  1  0  0  1 | 10), and together with the other constraints the system can be arranged in a canonical form such as

( 1   −2   0   0   0    3 | 30 )
( 0    1   0   1   0   −1 | 10 )
( 0    1   0   0   1    0 | 16 )
( 0    0   1   0   0    1 | 10 )

where x_1, x_2, s_1, ..., s_3 ≥ 0, and where the third, fourth, and fifth columns (for x_2, s_1 and s_2, respectively) have only one non-zero coefficient. All the right-hand sides are non-negative. Now the first step of the simplex method is to identify a corner point or basic feasible solution: from this canonical form, we can find one by setting the non-basic variables equal to zero, that is, x_1 = 0 and s_3 = 0. We now have the basic feasible solution
x_2 = 10,   s_1 = 10,   s_2 = 16.        (6.24)
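For comparison, this small LP can also be solved directly with a library routine. A minimal sketch assuming Matlab's linprog (Optimization Toolbox) is available; since linprog minimizes, the profit coefficients are negated:

% Solve the example LP: maximize 2*x1 + 3*x2.
f = -[2; 3];                 % negate for maximization
A = [1 1]; b = 20;           % x1 + x2 <= 20
lb = [0; 0]; ub = [16; 10];  % 0 <= x1 <= 16, 0 <= x2 <= 10
x = linprog(f, A, b, [], [], lb, ub)   % optimal solution (10, 10)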
Chapter 7

Nonlinear Optimization
ψ_j(x) ≤ 0,   (j = 1, ..., N),        (7.1)
by defining a penalty function
Π(x, µ_i, ν_j) = f(x) + Σ_{i=1}^M µ_i φ_i^2(x) + Σ_{j=1}^N ν_j ψ_j^2(x),
where µ_i ≫ 1 and ν_j ≥ 0.
For example, in order to solve the following problem of the Gill-Murray-Wright type,
minimize_{x ∈ R}  f(x) = 100(x − b)^2 + 1,   subject to  x ≥ a,
we can define a penalty function Π(x) with a single penalty parameter µ, where the typical value for µ is 2000 ∼ 10000.
This method essentially transforms a constrained problem into an unconstrained one. From the stationary condition Π'(x) = 0, we have
200(x_* − b) + µ(x_* − a) = 0,        (7.5)
which gives
x_* = (200b + µa)/(200 + µ).        (7.6)
For µ → ∞, we have x_* → a. For µ = 2000, a = 2 and b = 1, we have x_* ≈ 1.9090. This means the solution depends on the value of µ, and it is very difficult to use extremely large values without causing extra computational difficulties.
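A minimal sketch of this penalty approach, minimizing Π(x) = f(x) + (µ/2)(x − a)^2 with the built-in fminsearch; the quadratic penalty term here is applied everywhere, consistent with the stationary condition (7.5):

% Penalty method for the Gill-Murray-Wright example.
a = 2; b = 1; mu = 2000;
Pi = @(x) 100*(x - b)^2 + 1 + (mu/2)*(x - a)^2;
x = fminsearch(Pi, 0)     % about 1.9090; x -> a as mu -> infinity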
7.2 Lagrange Multipliers

g(x) = 0,        (7.8)
∂Π/∂λ_j = g_j = 0,   (j = 1, ..., M).        (7.13)
These M + n equations will determine the n components of x and the M Lagrange multipliers. As ∂Π/∂g_j = λ_j, we can consider λ_j as the rate of change of Π as g_j varies.
and
∂Π/∂λ = x^2 + y^2 − 3 = 0.
The condition xy + λy = 0 implies that y = 0 or λ = −x. The case of y = 0 can be eliminated, as it leads to x = 0 from y^2 + 2λx = 0, which does not satisfy the last condition x^2 + y^2 = 3. Therefore, the only valid solution is
λ = −x.
From the first stationary condition, we have
y^2 − 2x^2 = 0,  or  y^2 = 2x^2.
Substituting this into the third stationary condition, we have
x^2 + 2x^2 − 3 = 0,
which gives
x = ±1.
So we have four stationary points
P_1(1, √2),  P_2(1, −√2),  P_3(−1, √2),  P_4(−1, −√2).
The values of the function f(x, y) at these four points are
f(P_1) = 2,  f(P_2) = 2,  f(P_3) = −2,  f(P_4) = −2.
Thus, the function reaches its maxima at (1, √2) and (1, −√2). The Lagrange multiplier for this case is λ = −1.
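These stationary points can be verified numerically. A minimal sketch assuming the Optimization Toolbox's fmincon is available (we minimize −f to maximize f; the anonymous deal trick supplies the equality constraint):

% Numerical check of the Lagrange multiplier example.
f      = @(v) -(v(1)*v(2)^2);                 % minimize -f(x,y) = -x*y^2
nonlin = @(v) deal([], v(1)^2 + v(2)^2 - 3);  % ceq: x^2 + y^2 - 3 = 0
v = fmincon(f, [0.5; 1], [], [], [], [], [], [], nonlin)
% converges to (1, sqrt(2)) (or (1, -sqrt(2))), where f = 2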
where s_m = {(s_m^x(1), s_m^y(1)), ..., (s_m^x(m), s_m^y(m))} is a time-ordered set of m distinct visited points with a sample of size m. The interesting thing is that the performance is independent of the algorithm A itself. That is to say, all algorithms for optimization will give the same performance when averaged over all possible functions. This means that the universally best method does not exist.
Well, you might say, there is no need to formulate new algorithms because all algorithms will perform equally well.
Part III

Metaheuristic Methods

Chapter 8

Tabu Search
Figure 8.1: Passing through each line once and only once.
At point B, there are four possible routes, but only three (BE, BD, BC) are permissible routes, because AB is now in the updated Tabu list. We see that the length of the Tabu list is increasing and flexible. Suppose we randomly choose the route BD to go to point D. The Tabu list is updated again at each move, and as the walk continues it eventually becomes
Tabu list = {BD, DE, EB, BC, CD}.
Figure 8.2: Travelling salesman problem for five cities and a possible route (not optimal).
d_{ij} = √( (x_i − x_j)^2 + (y_i − y_j)^2 ),   (i, j = 1, 2, ..., 5),        (8.1)
where (x_i, y_i) are the Cartesian coordinates of city i. This will give a symmetric distance matrix whose diagonal entries are set to ∞ (a city is not connected to itself), with off-diagonal entries such as 9.43, 14.14 and 10.00.
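Building the distance matrix (8.1) is straightforward; in the sketch below the city coordinates are illustrative, not the book's:

% Distance matrix for five cities; coordinates are illustrative.
xy = [0 0; 10 0; 14 4; 8 12; 0 10];   % (x_i, y_i), assumed values
n = size(xy,1);  d = inf(n);          % inf on the diagonal
for i = 1:n
    for j = [1:i-1, i+1:n]
        d(i,j) = sqrt(sum((xy(i,:) - xy(j,:)).^2));
    end
end
d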
As there are n = 5 cities, there are 5! = 120 ways of visiting these cities, such as 2-5-3-1-4-2, 5-4-1-2-3-5 (see Fig. 8.3), and so on. For example, the route 2-5-3-1-4-2 has a total distance
d = d_{25} + d_{53} + d_{31} + d_{14} + d_{42}.

Figure 8.3: Two different routes: 2-5-3-1-4-2 and 5-4-1-2-3-5.

For large n, even if all the calculations could be carried out in parallel, it would take much longer than the age of the universe to evaluate such a huge number of routes. So we have to use other methods.
Now let us use Tabu search for the travelling salesman problem. As there is no known efficient method for solving such problems, we use a method of systematic swapping. For example, when n = 3, any order such as 1-2-3 or 2-3-1 is optimal. For n = 4 cities, suppose we know that the optimal route is 1-2-3-4-1 (its ordered permutation 2-3-4-1-2 and others are the same route). However, we initially start randomly from, say, 2-4-3-1-2. A simple swap between cities 3 and 4 leads to 2-3-4-1-2, which is the optimal solution. Of course, we do not know which two cities should be swapped, so we need a systematic approach, swapping any two cities i and j. Furthermore, in order to swap any two cities, say, 2 and 4, we can either simply swap them, or we can use a more systematic approach by swapping 2 and 3, then 3 and 4. The latter version takes more steps, but it does provide a systematic way of implementing the algorithm. Thus, we will use this latter approach.
For simplicity, we still use the same example of the five cities. Suppose we start from a random route 2-5-3-1-4-2 (see Fig. 8.3). In order to swap any two cities among these five cities, we use the following swap indices, each row swapping two adjacent cities:

swap = ( 2  1  3  4  5 ; 1  3  2  4  5 ; 1  2  4  3  5 ; 1  2  3  5  4 ; 5  2  3  4  1 ),        (8.6)

where the first row swaps the first two cities and the fifth row swaps the first and last (fifth) cities. In order to avoid repetition of recent swaps, we use a Tabu list for the above index matrix.
Figure 8.4: Current best route after the first generation of swaps, and the final optimal route.
Chapter 9

Ant Colony Optimization

From the Tabu search, we know that we can improve the search efficiency by using memory. Another way of improving the efficiency is to use a combination of randomness and memory. The randomness will increase the diversity of the solutions so as to avoid being trapped in local optima. The memory here does not mean simple history records. In fact, there are other forms of memory, using chemical messengers such as the pheromone commonly used by ants, honeybees, and many other insects. In this chapter, we will discuss the nature-inspired ant colony optimization (ACO), which is a metaheuristic method.
mark the trails to and from it. From the initial random foraging route, the pheromone concentration varies, and the ants follow the route with the higher pheromone concentration, which in turn is enhanced by the increasing number of ants. As more and more ants follow the same route, it becomes the favoured path. Thus, some favourite route (often the shortest or most efficient) emerges. This is actually a positive feedback mechanism.
Emergent behaviour exists in an ant colony, and such emergence arises from simple interactions among individual ants. Individual ants act according to simple and local information (such as pheromone concentration) to carry out their activities. Although there is no master ant overseeing the entire colony and broadcasting instructions to the individual ants, organized behaviour still emerges automatically. Therefore, such emergent behaviour is similar to other self-organized phenomena which occur in many processes in nature, such as pattern formation in animal skins (tiger and zebra skins).
The foraging pattern of some ant species (such as army ants) can show extraordinary regularity. Army ants search for food along some regular routes with an angle of about 123° apart. We do not know how they manage to follow such regularity, but studies show that they move into an area, build a bivouac and start foraging. On the first day, they forage in a random direction, say, the north, travel a few hundred meters, and then branch to cover a large area. The next day, they will choose a different direction, about 123° from the direction of the previous day, and cover a large area. On the following day, they again choose a different direction, about 123° from the second day's direction. In this way, they cover the whole area over about two weeks, and then they move to a different location to build a bivouac and forage again.
The interesting thing is that they do not use the angle of 360°/3 = 120° (this would mean that on the fourth day they would search the empty area already foraged on the first day). The beauty of this 123° angle is that it leaves an angle of about 10° from the direction on the first day. This means they cover slightly different ground on each cycle of directions, and so gradually sweep the whole region.
9.2 Ant Colony Optimization

φ_{ij}^{t+1} = (1 − γ) φ_{ij}^t + δφ_{ij}^t,        (9.3)

where γ ∈ [0, 1] is the rate of pheromone evaporation. The increment δφ_{ij}^t is the amount of pheromone deposited at time t along route i to j.
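A minimal sketch of the update (9.3) on a two-route double bridge; the route lengths, deposit rule and parameters below are illustrative assumptions, not values from the text:

% Pheromone update (9.3) on two routes; route 1 is the shorter one.
phi = [1; 1];  gamma = 0.3;  n_ants = 100;
for t = 1:5
    p = phi/sum(phi);                    % route-choice probabilities
    ants = (rand(n_ants,1) > p(1)) + 1;  % route taken by each ant
    delta = [sum(ants==1)/50; sum(ants==2)/100];  % deposit ~ 1/length
    phi = (1 - gamma)*phi + delta;       % evaporation plus deposit
end
p                                         % mass shifts to the shorter route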
9.3 Double Bridge Problem
Figure 9.3: Route selection via ACO: (a) initially, ants choose each route with 50-50 probability, and (b) almost all ants move along the shorter route after 5 iterations.
Chapter 10

Particle Swarm Optimization
(Figure: a particle i with velocity v, its possible directions of movement, its best location x*_i, and the current global best g*.)
where ε_1 and ε_2 are two random vectors, with each entry taking a value between 0 and 1. The Hadamard product of two matrices u ⊙ v is defined as the entrywise product, that is, [u ⊙ v]_{ij} = u_{ij} v_{ij}.
The initial values of x(i, j, t = 0) can be taken from the boundary values [a = min(x_j), b = max(x_j)], with v_i^{t=0} = 0.
10.3 Accelerated PSO

  ...
  t = t + 1
  for loop over all n particles and all p dimensions
    Generate new velocity v_i^{t+1} using equation (10.1)
    Calculate new locations x_i^{t+1} = x_i^t + v_i^{t+1}
    Evaluate objective functions at new locations x_i^{t+1}
    Find the current minimum f_min^{t+1}
  end for
  Find the current best x*_i and current global best g*
end while
Output the results x*_i and g*
end
α = 0.7^t,        (10.7)
or
sin(2x^2) = 4x^2 cos(2x^2),        (10.14)
which has multiple solutions. The global maximum occurs at
x_* = y_* ≈ 0.7634.        (10.15)
It is worth pointing out that the solution x_* = y_* is independent of λ as long as λ > 0.
If we use 40 particles, the new locations of these particles after 10 iterations (generations) are shown in Figure 10.4. The final optimal solution at t = 10 is shown on the right, with the best location marked.
10.5 Implementation
The accelerated particle swarm optimization has been implemented using both Matlab and Octave. If you type the following program and save it as, say, pso_simpledemo.m, then launch Matlab or Octave and change to the directory where the file was saved. After typing in >pso_simpledemo, it will find the global optimal solution in less than a minute on most modern personal computers.
function [best]=pso_simpledemo(n,Num_iterations)
% n=number of particles
% Num_iterations=total number of iterations
if nargin<2, Num_iterations=10; end
if nargin<1, n=20; end
% Michaelewicz Function f*=-1.801 at [2.20319, 1.57049]
% Splitting into two parts to avoid a long line for printing
str1='-sin(x)*(sin(x^2/3.14159))^20';
str2='-sin(y)*(sin(2*y^2/3.14159))^20';
funstr=strcat(str1,str2);
% Converting to an inline function and vectorization
f=vectorize(inline(funstr));
% range=[xmin xmax ymin ymax];
range=[0 4 0 4];
% ----------------------------------------------------
% Setting the parameters: alpha, beta
% Random amplitude of roaming particles alpha=[0,1]
% alpha=gamma^t=0.7^t;
% Speed of convergence (0->1)=(slow->fast)
beta=0.5;
% ----------------------------------------------------
% Grid values of the objective function
% These values are used for visualization only
Ngrid=100;
dx=(range(2)-range(1))/Ngrid;
dy=(range(4)-range(3))/Ngrid;
xgrid=range(1):dx:range(2);
ygrid=range(3):dy:range(4);
[x,y]=meshgrid(xgrid,ygrid);
z=f(x,y);
% Display the shape of the function to be optimized
figure(1);
surfc(x,y,z);
% ---------------------------------------------------
best=zeros(Num_iterations,3); % initialize history
The solution at (0, 0) is trivial, and the minimum f_* ≈ −1.801 occurs at about (2.202, 1.571) (see Fig. 10.5).
If we run the program, we will get the global optimum after
about 200 evaluations of the objective function (for 20 particles
and 10 iterations). The results are shown in Fig. 10.6.
10.6 Constraints
The implementation we discussed in the previous section is for unconstrained problems. For constrained optimization, there are many ways to implement the constraint equalities and inequalities. However, we will only discuss two approaches: direct implementation, and transformation to an unconstrained problem.
The simplest direct implementation is to check all the new particle locations to see if they satisfy all the constraints. The new locations are discarded if the constraints are not met, and are replaced by newly generated locations until all the constraints are met. Then the new solutions are evaluated using the standard PSO procedure. In this way, all the new locations should be in the feasible region, and all infeasible solutions are not selected. For example, in order to maximize f(x) subject to a constraint g(x) ≤ 0, the standard PSO can also be applied after transforming the problem into an unconstrained one with a penalty term:
Π(x, ν) = f(x) + νg(x)^2.        (10.19)
Chapter 11

Simulated Annealing
δE = γδf,        (11.2)
and whenever the acceptance criterion is satisfied, the move is accepted.
11.3 SA Algorithm
The pseudo code of the simulated annealing algorithm is shown in Fig. 11.1. In order to find a suitable starting temperature T_0, we can use any available information about the objective function.
Figure 11.2: Contour of the Rosenbrock function with a global minimum f_* = 0 at (1, 1), and locations of the final 10 particles at the end of the simulated annealing.
11.4 Implementation
Based on the guidelines for choosing the important parameters, such as the cooling rate, the initial and final temperatures, and the balanced number of iterations, we can implement the simulated annealing algorithm. For the Rosenbrock function
f(x, y) = (1 − x)^2 + 100(y − x^2)^2,
we know that its global minimum f_* = 0 occurs at (1, 1) (see Fig. 11.2). This is a standard test function and quite tough for most algorithms. However, using the program given below, we can find this global minimum easily, and the 500 evaluations during the simulated annealing are shown in Fig. 11.3.
xgrid=range(1):0.1:range(2);
ygrid=range(3):0.1:range(4);
[x,y]=meshgrid(xgrid,ygrid);
surfc(x,y,f(x,y));
% Initializing parameters and settings
T_init = 1.0;     % initial temperature
T_min = 1e-10;    % final stopping temperature (eg., T_min=1e-10)
F_min = -1e+100;  % Min value of the function
max_rej=500;      % Maximum number of rejections
max_run=100;      % Maximum number of runs
max_accept = 15;  % Maximum number of accept
k = 1;            % Boltzmann constant
alpha=0.9;        % Cooling factor
Enorm=1e-5;       % Energy norm (eg, Enorm=1e-8)
guess=[2 2];      % Initial guess
% Initializing the counters i,j etc
i= 0; j = 0;
accept = 0; totaleval = 0;
% Initializing various values
T = T_init;
E_init = f(guess(1),guess(2));
E_old = E_init; E_new=E_old;
best=guess; % initially guessed values
% Starting the simulated annealing
while ((T > T_min) & (j <= max_rej) & E_new>F_min)
    i = i+1;
    % Check if max numbers of run/accept are met
    if (i >= max_run) | (accept >= max_accept)
        % Cooling according to a cooling schedule
        T = alpha*T;
        totaleval = totaleval + i;
        % reset the counters
        i = 1; accept = 1;
    end
    % Function evaluations at new locations
    ns=guess+rand(1,2)*randn;
    E_new = f(ns(1),ns(2));
    % Decide to accept the new solution
    DeltaE=E_new-E_old;
    % Accept if improved
    if (-DeltaE > Enorm)
in the domain (x, y) ∈ [−5, 5] × [−5, 5]. The landscape of the egg crate function is shown in Fig. 11.4, and the paths of particles during simulated annealing are shown in Fig. 11.5. It is worth pointing out that a random generator of the form rand(1,2)<1/2 leads to discrete motion along several major directions, which may improve the convergence for certain classes of functions. However, for the Rosenbrock test function, this discrete approach does not work well. For continuous movement in all directions, simply use the random function rand(1,2) that is given in this simple demo program. It would take about 2500 evaluations to reach an accuracy of three decimal places.
Chapter 12

Multiobjective Optimization

which is equivalent to
u ⪯ v  ⟺  u ≺ v ∨ u = v.        (12.6)

(Figure: the Pareto front consists of the non-dominated points on the boundary of the dominated set in the (f_1, f_2) plane.)
For the three objectives f_1 = x^2 + (y − 1)^2, f_2 = (x − 1)^2 + y^2 and f_3 = x^2 + (y + 1)^2 with weights α, β and γ (α + β + γ = 1), the stationary conditions of the weighted sum Π = αf_1 + βf_2 + γf_3 lead to
2αx + 2β(x − 1) + 2γx = 0,        (12.15)
and
2α(y − 1) + 2βy + 2γ(y + 1) = 0.        (12.16)
The solutions are
x_* = β,   y_* = α − γ.        (12.17)
In the special case α = β = γ = 1/3, we have
x_* = 1/3,   y_* = 0.        (12.18)
More generally, a combined objective of N objective functions can be written as
Π(x) = Σ_{i=1}^N α_i f_i^2(x) = α_1 f_1^2(x) + ... + α_N f_N^2(x).        (12.19)
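A minimal sketch of this weighted-sum approach, using the built-in fminsearch and the three objectives reconstructed above:

% Weighted-sum method for the three-objective example.
f1 = @(v) v(1)^2 + (v(2)-1)^2;
f2 = @(v) (v(1)-1)^2 + v(2)^2;
f3 = @(v) v(1)^2 + (v(2)+1)^2;
alpha = 1/3; beta = 1/3; gamma = 1/3;    % weights, alpha+beta+gamma = 1
Pi = @(v) alpha*f1(v) + beta*f2(v) + gamma*f3(v);
v = fminsearch(Pi, [0 0])                % gives x* = beta, y* = alpha - gamma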
(Figure 12.4: utility contours U representing preference, the feasible region, and the maximum-utility point A in the (f_1, f_2) plane.)

A common utility function (representing preference) is
u(x) = (1 − e^{−(x−x_a)/ρ}) / (1 − e^{−(x_b−x_a)/ρ}),        (12.20)
where x_a and x_b are the lowest and highest levels of x, and ρ is called the risk tolerance of the decision maker.
The utility function defines combinations of objective values f_1, ..., f_p which a decision maker finds equally acceptable, or to which the decision maker is indifferent. So the contours of constant utility are referred to as the indifference curves. The optimization now becomes the maximization of the utility. For a maximization problem with two objectives f_1 and f_2, the idea of the utility contours (indifference curves), the Pareto front, and the Pareto solution with maximum utility (point A) are shown in Fig. 12.4. When the utility function touches the Pareto front in the feasible region, it provides a maximum-utility Pareto solution (marked with A).
For two objectives f_1 and f_2, the utility function can be constructed in different ways. For example, the combined product takes the following form
U(f_1, f_2) = k f_1^α f_2^β,        (12.21)
where α and β are non-negative exponents and k is a scaling constant.
There are many other forms. The aim of the utility function constructed by the decision maker is to form a mapping U: R^p → R so that the total utility function has monotonic and/or convexity properties for easy analysis. It will also improve the quality of the Pareto solution(s) with maximum utility. Let us look at a simple example with the utility function
U = f_1 f_2,
which combines the two objectives. The line connecting the two corner points (5, 0) and (0, 5/α) forms the Pareto front (see Fig. 12.5). It is easy to check that the Pareto solution with maximum utility is U = 25 at A(5, 0), where the utility contours touch the Pareto front with the maximum possible utility.
The complexity of multiobjective optimization makes the construction of the utility function a difficult task, as it can be constructed in many ways. A general and yet widely used utility function is often written in an additive form, maximizing a weighted sum of the individual utilities u_i of the objectives.
(Figure 12.5: the Pareto front of the feasible set, with utility contours U = f_1 f_2 touching the front at the maximum-utility point.)
function [best]=pso_multi(Num_run)
% Num_run=number of recursive iterations (=10 default)
if nargin<1, Num_run=10; end
disp('Running PSO recursively!');
disp('Please wait for a few minutes ...');
n=20;          % number of particles;
Num_steps=10;  % number of pseudo time steps
% This function has two global optima f*=0.851
% at (0.7634,0.7634) and (-0.7634,-0.7634).
fstr='sin(x^2+y^2)/sqrt(x^2+y^2)*exp(-0.05*(x-y)^2)';
% Converting to an inline function
f=vectorize(inline(fstr));
% range=[xmin xmax ymin ymax];
range=[-5 5 -5 5];
% -------------------------------------------------
% Grid values of the objective function
% These values are used for visualization only
Ngrid=100;
dx=(range(2)-range(1))/Ngrid;
dy=(range(4)-range(3))/Ngrid;
xgrid=range(1):dx:range(2);
ygrid=range(3):dy:range(4);
[x,y]=meshgrid(xgrid,ygrid);
z=f(x,y);
% Display the shape of the function to be optimized
surfc(x,y,z); drawnow;
% Run the PSO recursively for, say, 10 times
for i=1:Num_run,
    best(i,:)=pso(f,range,n,Num_steps);
end
% ------------------------------------------------
% Standard Accelerated PSO (for finding maxima)
function [best]=pso(f,range,n,Num_steps)
% here best=[xbest ybest fbest]
% Speed of convergence (0->1)=(slow->fast)
beta=0.5;
% ----- Start Particle Swarm Optimization --------
% generating the initial locations of n particles
[xn,yn]=init_pso(n,range);
% Iterations as pseudo time
for i=1:Num_steps,
    % Find the current best location (xo,yo)
    zn=f(xn,yn);
    zn_max=max(zn);
    xo=max(xn(zn==zn_max));
    yo=max(yn(zn==zn_max));
    zo=max(zn(zn==zn_max));
    % Accelerated PSO with randomness: alpha=gamma^t
    gamma=0.7; alpha=gamma.^i;
    % Move all particles to new locations
    [xn,yn]=pso_move(xn,yn,xo,yo,alpha,beta,range);
end %%%%% end of iterations
% Return the finding as the current best
best(1)=xo; best(2)=yo; best(3)=zo;
% -----------------------------------------------
% All subfunctions are listed here
% Initial locations of particles
function [xn,yn]=init_pso(n,range)
xrange=range(2)-range(1);
yrange=range(4)-range(3);
xn=rand(1,n)*xrange+range(1);
yn=rand(1,n)*yrange+range(3);
% Move all the particles toward to (xo,yo)
function [xn,yn]=pso_move(xn,yn,xo,yo,a,b,range)
nn=size(yn,2); %%%%% a=alpha, b=beta
xn=xn.*(1-b)+xo.*b+a.*(rand(1,nn)-0.5);
yn=yn.*(1-b)+yo.*b+a.*(rand(1,nn)-0.5);
[xn,yn]=findrange(xn,yn,range);
% Make sure the particles are inside the range
function [xn,yn]=findrange(xn,yn,range)
nn=length(yn);
for i=1:nn,
if xn(i)<=range(1), xn(i)=range(1); end
if xn(i)>=range(2), xn(i)=range(2); end
if yn(i)<=range(3), yn(i)=range(3); end
if yn(i)>=range(4), yn(i)=range(4); end
end