Genetic Algorithm Applied On Multiobjective Optimization
The undersigned hereby certify that they have read and recommend to the Department of Mathematics for acceptance of this project entitled "Genetic Algorithm Applied On Multiobjective Optimization" by Beletew Mekasha, in partial fulfillment of the requirements for the degree of Master of Science in Mathematics.
Acknowledgment
Abstract
Multi-objective formulations are realistic models for many complex optimization problems. In this project we present multiobjective optimization problems and genetic algorithms developed specifically for problems with multiple objectives. Customized genetic algorithms have been demonstrated to be particularly effective at determining excellent solutions (pareto-optimal points) to such problems. Moreover, in solving multi-objective problems, designers may be interested in a set of pareto-optimal points instead of a single point. Since genetic algorithms (GAs) work with a population of points, it seems natural to use GAs in multi-objective optimization problems to capture a number of solutions simultaneously. We also describe the working principle of binary-coded and real-parameter genetic algorithms, which are ideally suited to handle problems with a continuous search space. Moreover, a non-dominated sorting-based multi-objective evolutionary algorithm (MOEA), called the non-dominated sorting genetic algorithm II (NSGA-II), is presented.
Introduction
of the weight vector used in the scalarization process. In this chapter, we also present the working principle of binary-coded and real-parameter genetic algorithm operators.
Chapter 4 presents the Non-dominated Sorting Genetic Algorithm (NSGA-II), which carries out a non-dominated sorting of a combined parent and offspring population. Thereafter, starting from the best non-dominated solutions, each front is accepted until all population slots are filled. This makes the algorithm an elitist type. For the solutions of the last allowed front, a crowding distance-based niching strategy is used to resolve which solutions are carried over to the new population.
The first multi-objective GA, called the Vector Evaluated Genetic Algorithm (VEGA), was proposed by Schaffer [?]. Afterward, several major multi-objective evolutionary algorithms were developed, such as the Multi-objective Genetic Algorithm (MOGA) [?], the Niched Pareto Genetic Algorithm, the Random Weighted Genetic Algorithm (RWGA), the Non-dominated Sorting Genetic Algorithm (NSGA), the Strength Pareto Evolutionary Algorithm (SPEA) [?], the Pareto-Archived Evolution Strategy (PAES), the Fast Non-dominated Sorting Genetic Algorithm (NSGA-II), the Multi-objective Evolutionary Algorithm (MEA), and the Rank-Density Based Genetic Algorithm (RDGA).
Several survey papers have been published on evolutionary multi-objective optimization.
This project takes a different course. It focuses on important issues in designing a multi-objective GA, describes common techniques used in multi-objective GAs to attain the goals of multi-objective optimization, and addresses the inclusion of an elite-preserving operator, via the fast and elitist Non-dominated Sorting Genetic Algorithm (NSGA-II), to make the algorithms converge better to the pareto-optimal solutions.
Chapter 1
Preliminary
Constraints may be inequalities, gi(x) ≤ 0, i = 1, · · · , m, or equalities,
hj(x) = 0, j = 1, · · · , p.
The set of all n-tuples of real numbers, denoted by Rn, is called Euclidean n-space. Two Euclidean spaces, the objective space and the decision variable space, are considered in multi-objective optimization problems.
Given an n-dimensional decision variable vector x = [x1, · · · , xn] in the solution space X, find a vector x* that minimizes a given set of K objective functions z(x*) = [z1(x*), · · · , zK(x*)]. The solution space X is generally restricted by a series of constraints and bounds on the decision variables.
In many real-life problems with more than one criterion, the objectives under consideration conflict with each other. Hence, optimizing x with respect to a single objective often results in unacceptable values with respect to the other objectives. Therefore, a perfect multi-objective solution that simultaneously optimizes each objective function is almost impossible. A reasonable solution to a multi-objective problem is to investigate a set of solutions, each of which satisfies the objectives at an acceptable level without being dominated by any other solution.
• Non-conflicting: If the objective functions are not conflicting with each other, the cardinality of the pareto-optimal set is one. This means that the minimum solution corresponding to any objective function is the same.
Observe that gi(x) ≤ 0 and hj(x) = 0 represent constraints that must be fulfilled while optimizing (minimizing or maximizing) f(x). Ω contains all possible x that can be used to satisfy an evaluation of f(x) and its constraints. Of course, x can be a vector of continuous or discrete variables, and f can likewise be continuous or discrete.
Multi-objective optimization: When an optimization problem involves more than
one objective function, the task of finding one or more optimum solutions is known as multi-
objective optimization.
For multiple-objective problems, the objectives are generally conflicting, preventing si-
multaneous optimization of each objective or finding a multi-dimensional Pareto-optimal
front. As in a single-objective optimization problem, the multi-objective optimization prob-
lem may contain a number of constraints which any feasible solution (including all optimal
solutions) must satisfy.
It is noted that gi (x) ≤ 0 and hj (x) = 0 represent constraints that must be fulfilled
while minimizing (or maximizing) F (x) and Ω contains all possible x that can be used to
satisfy an evaluation of F (x). The optimal solutions in multi-objective optimization can be
defined from a mathematical concept of partial ordering. In the parlance of multi-objective
optimization, the term domination is used for this purpose.
There are two general approaches to multiple-objective optimization. One is to combine
the individual objective functions into a single composite function. Determination of a single
objective is possible with methods such as utility theory, weighted sum method, etc., but
the problem lies in the correct selection of the weights or utility functions to characterize the decision-maker's preferences.
The second general approach is to determine an entire Pareto optimal solution set or
a representative subset. A Pareto optimal set is a set of solutions that are non-dominated
with respect to each other. While moving from one Pareto solution to another, there is
always a certain amount of sacrifice in one objective to achieve a certain amount of gain in
the other. Pareto optimal solution sets are often preferred to single solutions because they
can be practical when considering real-life problems, since the final solution of the decision
maker is always a trade-off between crucial parameters.
However, in multi-objective optimization, there are clearly two goals. Progressing towards the pareto-optimal front is certainly an important goal. However, maintaining a diverse set of solutions in the non-dominated front is also essential. The achievement of one goal does not necessarily achieve the other goal. Explicit or implicit mechanisms to emphasize convergence near the pareto-optimal front and the maintenance of a diverse set of solutions must be introduced in an algorithm. Because of these dual tasks, multi-objective optimization is more difficult than single-objective optimization.
• Dealing with two search spaces: Another difficulty is that multi-objective optimization involves two search spaces, instead of one. In single-objective optimization, there is only one search space, the decision variable space; an algorithm works in this space by accepting and rejecting solutions based on their objective function values. Here, in addition to the decision variable space, there also exists the objective or criterion space. Although these two spaces are related by a unique mapping between them, the mapping is often nonlinear and the properties of the two search spaces are not similar.
• The best-known Pareto front should be as close as possible to the true Pareto front. Ideally, the best-known Pareto set should be a subset of the Pareto optimal set.
• In addition, the best-known Pareto front should capture the whole spectrum of the
Pareto front. This requires investigating solutions at the extreme ends of the objective
function space.
1.4 Evolutionary Algorithms
The potential of evolutionary algorithms for solving multiobjective optimization problems was hinted at as early as the late 1960s by Rosenberg in his PhD thesis [?]. An evolutionary algorithm is characterized by a population of solution candidates, and its reproduction process enables the combination of existing solutions to generate new solutions. This enables finding several members of the Pareto-optimal set in a single run, instead of performing a series of separate runs, as is the case for some of the conventional stochastic processes.
Evolutionary algorithms are based on the principle of evolution, i.e. survival of the fittest.
Unlike classical methods, they do not use a single search point but a population of points
called individuals. Each individual represents a potential solution to the problem. In these
algorithms, the population evolves toward increasingly better regions of the search space by
undergoing statistical transformations called recombination, mutation and selection.
Evolutionary algorithms have a number of components, procedures, or operators that must be specified in order to define a particular evolutionary algorithm. The most important components are:
a. Representation (definition of individuals) Objects forming possible solutions within the original problem context are referred to as phenotypes, while their encodings, that is, the individuals within the evolutionary algorithm, are called genotypes. The first design step is commonly called representation, as it amounts to specifying a mapping from the phenotypes onto a set of genotypes that are said to represent these phenotypes. A solution (a good phenotype) is obtained by decoding the best genotype after termination. To this end, it should hold that the (optimal) solution to the problem at hand (a phenotype) can be represented in the given genotype space.
b. Evaluation function (or fitness function) Typically, this function is composed of a
quality measure in the phenotype space and the inverse representation. The evaluation
function is commonly called the fitness function in Evolutionary Algorithms.
c. Population The role of the population is to hold (the representation of) possible solutions. Given a representation, defining a population can be as simple as specifying how many individuals are in it, that is, its size. In almost all evolutionary algorithm applications the population size is constant and does not change during the evolutionary search. The diversity of a population is a measure of the number of different solutions present. No single measure for diversity exists.
d. Parent selection mechanism Parent selection chooses the individuals that undergo recombination and mutation to become parents of the next generation; it effectively gives an individual with a higher fitness value a higher probability of contributing one or more children to the succeeding generation. The role of parent selection or mating selection is thus to distinguish among individuals based on their quality and, in particular, to allow the better individuals to become parents of the next generation. In evolutionary algorithms, parent selection is typically probabilistic. Thus, high-quality individuals get a higher chance to become parents than those with low quality. Nevertheless, low-quality individuals are often given a small, but positive, chance; otherwise the whole search could become too greedy and get stuck in a local optimum.
ii Crossover: Randomly form pairs and apply crossover to each pair according to a given crossover rate (probability) to create two offspring.
iii Mutation: Mutate each individual according to a given mutation rate (probability).
Evolutionary algorithms use the three main principles of natural evolution: reproduction, natural selection, and diversity of the species, maintained by the differences of each generation from the previous one. Genetic algorithms work with a set of individuals representing possible solutions of the task. The selection principle is applied by using a criterion giving an evaluation of each individual with respect to the desired solution. The best-suited individuals create the next generation.
Figure 1.2: Flowchart of the working principle of a GA. Adapted from [?].
In its general form, a genetic algorithm (GA) works through the following steps (a minimal code sketch follows the list):
(1) Creation of a random initial population of Np potential solutions to the problem and evaluation of these individuals in terms of their fitness, i.e. of their corresponding objective function values;
(2) Check for termination of the algorithm. As in most optimization algorithms, it is possible to stop the genetic optimization by:
– Value of the function: the value of the function of the best individual is within a defined range around a set value. It is not recommended to use this criterion alone because, owing to the stochastic element in the search procedure, the optimization might not finish within a sensible time;
– Maximal number of iterations: this is the most widely used stopping criterion. It guarantees that the algorithm will give some result within some time, whether or not it has reached the extremum;
– Stall generations: if within an initially set number of iterations (generations) there is no improvement in the value of the fitness function of the best individual, the algorithm stops.
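To make the loop concrete, the following minimal Python sketch (an illustrative implementation, not the Matlab code used later in this project) follows the steps above for a binary-coded GA with tournament selection, one-point crossover, bit-flip mutation, and a maximal-number-of-iterations stopping rule; the one-max fitness function at the end is a toy example.

import random

def run_ga(fitness, n_bits=20, pop_size=30, pc=0.9, pm=0.05, max_gen=100):
    # (1) Create a random initial population of pop_size bit strings.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for gen in range(max_gen):  # (2) stop after a maximal number of iterations
        scores = [fitness(ind) for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Binary tournament selection of two parents.
            i1 = max(random.sample(range(pop_size), 2), key=lambda i: scores[i])
            i2 = max(random.sample(range(pop_size), 2), key=lambda i: scores[i])
            a, b = pop[i1][:], pop[i2][:]
            if random.random() < pc:  # one-point crossover with rate pc
                cut = random.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):  # bit-flip mutation with rate pm
                for k in range(n_bits):
                    if random.random() < pm:
                        child[k] = 1 - child[k]
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

best = run_ga(fitness=sum)  # maximize the number of ones ("one-max")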
Chapter 2
Multi-objective optimization
minimize fm(x), m = 1, 2, · · · , M;
subject to gj(x) ≥ 0, j = 1, 2, · · · , J;
hk(x) = 0, k = 1, 2, · · · , K;
xi^(L) ≤ xi ≤ xi^(U), i = 1, 2, · · · , n.
referred to as vector optimization, because a vector of objectives, instead of a single objective, is optimized [?].
Linear and Nonlinear Multi-objective Optimization Problems: If all objective functions and constraint functions are linear, the resulting multi-objective optimization problem is called a multi-objective linear program (MOLP). Like linear programming problems, MOLPs have many theoretical properties. However, if any of the objective or constraint functions is nonlinear, the resulting problem is called a nonlinear multi-objective problem [?].
2.1.1 Convex and Non-convex Multi-Objective Optimization Problems
Before we discuss a convex multi-objective optimization problem, let us first define a convex
function:
Definition 2.1.1. A subset S of Rn is said to be convex if for any pair of elements x1, x2 ∈ S, the following condition is true:
λx1 + (1 − λ)x2 ∈ S
for all 0 ≤ λ ≤ 1.
Definition 2.1.2. A function f : Rn −→ R is a convex function if for any pair of solutions x1, x2 ∈ Rn, the following condition is true:
f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2)
for all 0 ≤ λ ≤ 1.
The above definition gives rise to the following properties of a convex function:
• The linear approximation of f(x) at any point in the interval [x1, x2] always underestimates the actual function value.
A function satisfying the inequality in Definition 2.1.2 with a '≥' sign instead of a '≤' sign is called a non-convex function. To test if a function is convex within an interval, the Hessian matrix ∇²f is calculated and checked for its positive-definiteness at all points in the interval. One way to check the positive-definiteness of a matrix is to compute the eigenvalues of the matrix and check whether all eigenvalues are positive. To test if a function is non-convex in an interval, the Hessian matrix −∇²f is checked for its positive-definiteness [?].
Definition 2.1.3. A multi-objective optimization problem is convex if all its objective func-
tions are convex and the feasible region is convex[?].
Proposition 2.1. Let X be a convex set in Rn and let f(x) = (f1(x), · · · , fk(x)), f : Rn −→ Rk. Then f is Rk+-convex if and only if the functions fi are convex, for all i = 1, 2, · · · , k.
Proof. (⇒): Let f be Rk+-convex and let x, y ∈ X and λ ∈ [0, 1]. Then
λf(x) + (1 − λ)f(y) − f(λx + (1 − λ)y) ∈ Rk+,
i.e. λ(f1(x), · · · , fk(x)) + (1 − λ)(f1(y), · · · , fk(y)) − [f1(λx + (1 − λ)y), · · · , fk(λx + (1 − λ)y)] ∈ Rk+.
• irreflexive if (x, x) ∉ R for all x ∈ S,
Definition 2.2.2. A binary relation R on a set S is
• an equivalence relation if it is reflexive, symmetric, and transitive,
Instead of (x, y) ∈ R we shall also write xRy. In the case of R being a preorder, the pair (S, R) is called a preordered set. In the context of (pre)orders yet another notation for the relation R is convenient. We shall write x ≼ y as shorthand for (x, y) ∈ R and x ⋠ y for (x, y) ∉ R, and indiscriminately refer to the relation R or the relation ≼. This notation can be read as "preferred to". Accordingly, ≺ and ∼ can be seen as the strict preference and equivalence (or indifference) relations, respectively.
Definition 2.2.3. A binary relation is called a
• partial order if it is reflexive, transitive and antisymmetric.
A cone C ⊆ Rp is called
• convex if αd1 + (1 − α)d2 ∈ C for all d1, d2 ∈ C and for all 0 < α < 1,
• pointed if for d ∈ C, d ≠ 0, we have −d ∉ C, i.e., C ∩ (−C) ⊆ {0}.
y ∼ z if and only if not y ≻ z and not z ≻ y,
y ≽ z if and only if y ≻ z or y ∼ z.
For a utility function u, y ≽ z ⟺ y ≻ z or y ∼ z ⟺ u(y) ≥ u(z).
The set
D+(y) := {d ∈ RP : y + d ≻ y}
is called the preferred set for y, and
D−(y) := {d ∈ RP : y ≻ y + d}
is called the dominated set for y. In addition,
I(y) := {d ∈ RP : y ∼ y + d}
for any y ∈ RP .
For each y ∈ Y ⊂ RP , we define the set of domination factors
D(y) := {d ∈ RP : y ≻ y + d} ∪ {0}.
This means that deviation d ∈ D(y) from y is less preferred to the original y. Then the
point-to-set map D from Y to RP clearly represents the given preference order. We call D
the domination structure.
Definition 2.2.8. Given a set Y in RP and a domination structure D(·), the set of efficient elements is defined by
ξ(Y, D) = {y ∈ Y | ∄ y′ ∈ Y : y ∈ y′ + D(y′)}.
This set ξ(Y, D) is called the efficient set with respect to the domination structure D.
The most important and interesting special case of domination structures is when D(·) is a constant point-to-set map, particularly when D(y) is a constant cone for all y. When D(y) = D (a cone), the domination structure D(·) is said to have
• asymmetry ⟺ [d ∈ D, d ≠ 0 =⇒ −d ∉ D],
• transitivity ⟺ [d, d′ ∈ D =⇒ d + d′ ∈ D].
Pointed convex cones are often used for defining domination structures. We usually write y ≦D y′ for y, y′ ∈ Rp if and only if y′ − y ∈ D for a convex cone D in Rp. Also, y ≤D y′ means that y′ − y ∈ D but y − y′ ∉ D. When D is pointed, y ≤D y′ if and only if y′ − y ∈ D\{0}. When D = RP+, the subscript is omitted and we write ≦ or ≤. In other words,
y ≦ y′ if and only if yi ≦ y′i for all i = 1, · · · , p,
y ≤ y′ if and only if y ≦ y′ and y ≠ y′, i.e.,
yi ≦ y′i for all i = 1, · · · , p, and
yi < y′i for some i ∈ {1, · · · , p}.
2.3 Solution Concept
The concept of optimal solutions to multi-objective optimization problems is not trivial and
in itself debatable. It is closely related to the preference attitudes of the decision makers. The
most fundamental solution concept is that of efficient solutions (also called non-dominated
solutions or non-inferior solutions) with respect to the domination structure of the decision
maker[?].
We consider the multi-objective optimization problem (P): minimize f(x) over x ∈ X. For a constant domination structure D, one has the inclusion
ξ(Y, D) ⊃ ξ(Y + D, D).
Proposition 2.4. Let Y1 and Y2 be two sets in Rp , and let D be a constant domination
structure on Rp (a constant cone). Then
ξ(Y1 + Y2 , D) ⊂ ξ(Y1 , D) + ξ(Y2 , D).
Proof. Let y* ∈ ξ(Y1 + Y2, D). Then y* = y¹ + y² for some y¹ ∈ Y1 and y² ∈ Y2. We show that y¹ ∈ ξ(Y1, D). If we suppose the contrary, then there exist y ∈ Y1 and nonzero d ∈ D such that y¹ = y + d. Then y* = y¹ + y² = y + y² + d and y + y² ∈ Y1 + Y2, which contradicts the assumption y* ∈ ξ(Y1 + Y2, D). Similarly, we can prove that y² ∈ ξ(Y2, D). Therefore, y* ∈ ξ(Y1, D) + ξ(Y2, D).
Nadir objective vector: The nadir objective vector, z nad , represents the upper bound
of each objective in the entire pareto-optimal set, and not in the entire search space.
In order to normalize each objective over the entire range of the pareto-optimal region, knowledge of the nadir and ideal objective vectors can be used as follows:
fi^norm = (fi − zi*) / (zi^nad − zi*).
Definition 2.4.2. A solution x1 is said to dominate the other solution x2 if both conditions 1 and 2 are true:
1. The solution x1 is no worse than x2 in all objectives;
2. The solution x1 is strictly better than x2 in at least one objective.
If either of the above conditions is violated, the solution x1 does not dominate the solution x2.
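The two conditions translate directly into code. A small sketch (assuming all objectives are to be minimized; the function name is illustrative):

def dominates(z1, z2):
    # Condition 1: z1 is no worse than z2 in all objectives (minimization).
    no_worse = all(a <= b for a, b in zip(z1, z2))
    # Condition 2: z1 is strictly better in at least one objective.
    strictly_better = any(a < b for a, b in zip(z1, z2))
    return no_worse and strictly_better

# (1.0, 2.0) dominates (1.0, 3.0); neither of (1.0, 3.0) and (2.0, 2.0)
# dominates the other.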
2.4.1 Pareto Optimality
For a given finite set of solutions, we can perform all possible pair-wise comparisons and find which solutions dominate which, and which solutions are non-dominated with respect to each other. At the end, we expect to have a set of solutions, no two of which dominate each other.
Among a set of solutions P, the non-dominated set of solutions P′ contains those that are not dominated by any member of the set P. When the set P is the entire search space, the resulting non-dominated set P′ is called the pareto-optimal set [?].
Definition 2.4.3.
• The non-dominated set of the entire feasible search space S is the globally pareto-optimal set.
• A pareto-optimal solution x* is properly pareto-optimal if there exists a number M > 0 such that, for each objective i and each feasible x with fi(x) < fi(x*), there exists an objective j with fj(x) > fj(x*) and
(fi(x*) − fi(x)) / (fj(x) − fj(x*)) ≤ M.
Approach 1: In this approach, each solution i is compared with every other solution in the population for domination. The procedure is:
Step 1 Set solution counter i = 1 and create an empty non-dominated set P′.
Step 2 For a solution j ∈ P (j ≠ i), check whether solution j dominates solution i. If yes, go to step 4.
Step 3 If more solutions are left in P, increment j by one and go to step 2; otherwise, set P′ = P′ ∪ {xi}.
Step 4 Increment i by one. If i ≤ N, go to step 2; otherwise, stop and declare P′ as the non-dominated set.
Approach 2: Continuously updated [?]. In this approach, every solution from the population is checked against a partially filled population for domination. To start with, the first solution from the population is kept in an initially empty set P′. Thereafter, each solution xi (from the second solution onwards) is compared with all members of the set P′, one by one. If the solution xi dominates any member of P′, then that solution is removed from P′. In this way, non-members of the non-dominated set get deleted from P′. Otherwise, if solution xi is dominated by any member of P′, the solution xi is ignored. If solution xi is not dominated by any member of P′, it is entered in P′. This is how the set P′ grows with non-dominated solutions. When all solutions of the population are checked, the remaining members of P′ constitute the non-dominated set. The procedure is:
Step 1 Initialize P′ = {x1}. Set solution counter i = 2.
Step 2 Set j = 1.
Step 3 Compare solution xi with xj from P′ for domination.
Step 4 If xi dominates xj, delete the jth member from P′, i.e. update P′ = P′\{P′(j)}. If j < |P′|, increment j by one and then go to step 3; otherwise, go to step 5. Alternatively, if the jth member of P′ dominates xi, increment i by one and then go to step 2.
Step 5 Insert xi in P′, i.e. update P′ = P′ ∪ {xi}. If i < N, increment i by one and go to step 2; otherwise, stop and declare P′ as the non-dominated set.
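A direct Python transcription of this continuously updated procedure might look as follows (a sketch; dominates is the pair-wise check sketched earlier, and solutions are objective vectors):

def non_dominated_set(population, dominates):
    archive = [population[0]]                 # step 1: P' = {x1}
    for x in population[1:]:
        # Step 4, first branch: remove archive members dominated by x.
        archive = [y for y in archive if not dominates(x, y)]
        # Steps 4-5: insert x only if no remaining member dominates it.
        if not any(dominates(y, x) for y in archive):
            archive.append(x)
    return archive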
Approach 3: Kung et al.'s efficient method [?]. This approach first sorts the population in descending order of importance of the first objective function value. Thereafter, the population is recursively halved into top (T) and bottom (B) subpopulations. Knowing that the top half of the population is better in terms of the first objective function, the bottom half is then checked for domination against the top half. The solutions of B that are not dominated by any member of T are combined with the members of T to form a merged population M. The merging and the domination check start with the innermost case (when there is only one member left in either T or B in the recursive divisions of the population) and then proceed in a bottom-up fashion. Generally, the procedure of this approach is (a code sketch follows step 2):
Step 1 Sort the population in descending order of importance of the first objective function and rename the population as P of size N.
Step 2 Front(P): if |P| = 1, return P as the output of Front(P). Otherwise, compute T = Front(P^(1), · · · , P^(|P|/2)) and B = Front(P^(|P|/2+1), · · · , P^(|P|)). If the ith solution of B is not dominated by any solution of T, create a merged set M = T ∪ {xi}. Return M as the output of Front(P).
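A recursive sketch of Kung et al.'s method, assuming minimization of all objectives, an ascending sort on the first objective (so that the top half is better in f1), and distinct first-objective values:

def kung_front(population, dominates):
    def front(P):
        if len(P) == 1:
            return P
        half = len(P) // 2
        T = front(P[:half])   # top half: better in the first objective
        B = front(P[half:])   # bottom half
        # Merge: keep a bottom-half member only if no member of T dominates it.
        return T + [b for b in B if not any(dominates(t, b) for t in T)]
    return front(sorted(population, key=lambda z: z[0]))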
Example 2.1. Let us consider a two-objective optimization problem with five different
solutions shown in the objective space, as illustrated in Figure 2.1. Let us also assume
that the objective function 1 needs to be maximized while the objective function 2 needs
to be minimized. Five solutions with different objective function values are shown in this
figure. We illustrate the working principle of the above stated approaches on the same set
of five (N = 5) solutions, as shown in Figure 2.1. Ideally, the exact objective vector for
each solution will be used in executing the procedure, but here we use the figure to compare
different solutions. We follow the procedure step-by-step in the following.
step 2 We compare solution 1 with all other solutions for domination, starting with solution 2. We observe that solution 2 does not dominate solution 1: in fact, solution 1 is better than solution 2 in objective function 1 and also in objective function 2, so it is solution 1 that satisfies both conditions for dominating solution 2.
step 3 However, solution 3 dominates solution 1. Thus, we move to step 4.
step 4 Solution 1 does not belong to the non-dominated set. We increment i to 2 and move to step 2 to check the fate of solution 2.
step 4 Thus, solution 2 does not belong to the non-dominated set. Next, we check solution 3.
steps 2 and 3 Starting from solution 1, we observe that neither solution 1 nor 2 dominates solution 3. In fact, solutions 4 and 5 also do not dominate solution 3. Thus, we include solution 3 in the non-dominated set, P′ = {3}.
step 3 Now we compare solution 3 with solution 1. We observe that solution 3 (i = 3) dominates solution 1. Thus, we delete the jth (or the first) member from P′ and update P′ = ∅, so that |P′| = 0. This depicts how a non-member of the non-dominated set gets deleted from P′. We now move to step 5.
step 5 We insert i = 3 in P′, updating P′ = {3}. Since i < 5 here, we increment i to 4 and move to step 2.
step 2 We set j = 1, which refers to the lone element (solution 3) of P′.
step 3 By comparing solution 4 with solution 3, we observe that solution 3 dominates solution 4. Thus, we increment i to 5 and move to step 2.
step 2 We still have solution 3 in P′.
step 3 Now we compare solution 5 with solution 3. We observe that neither of them dominates the other. Thus, we move to step 5.
step 5 We insert solution 5 in P′ and update P′ = {3, 5} as the non-dominated set.
To sort a population into successive non-dominated fronts, the following procedure can be used (a code sketch follows the steps):
Step 1 Set all non-dominated sets Pj (j = 1, 2, · · · ) as empty sets. Set the non-domination level counter j = 1.
Step 2 Use any one of approaches 1 to 3 to find the non-dominated set P′ of population P.
Step 3 Update Pj = P′ and P = P\P′.
Step 4 If P ≠ ∅, increment j by one and go to step 2. Otherwise, stop and declare all non-dominated sets Pi, for i = 1, 2, · · · , j.
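These steps amount to repeatedly peeling off the current non-dominated set. A simple (quadratic-time) Python sketch:

def sort_into_fronts(population, dominates):
    remaining = list(population)
    fronts = []
    while remaining:  # steps 2-4: extract one front per pass
        front = [x for x in remaining
                 if not any(dominates(y, x) for y in remaining)]
        fronts.append(front)
        remaining = [x for x in remaining if x not in front]
    return fronts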
Chapter 3
we have a contradiction to the assumption that x* is a solution of the ε-constraint problem. Thus, x* has to be weakly pareto-optimal.
Proof. Necessity: Let x* ∈ S be pareto-optimal. Let us assume that it does not solve the ε-constraint problem for some j, where εi = fi(x*) for i = 1, 2, · · · , n, i ≠ j. Then there exists a solution x ∈ S such that fj(x) < fj(x*) and fi(x) ≤ fi(x*) for i ≠ j. This contradicts the pareto-optimality of x*.
Sufficiency: Since x* ∈ S is by assumption a solution of the ε-constraint problem for every j = 1, 2, · · · , n, there is no x ∈ S such that fj(x) < fj(x*) and fi(x) ≤ fi(x*) for i ≠ j. This is the definition of pareto-optimality for x*.
NOTE: The proof of this result relies on the generalized Gordan theorem [?]: let f be an m-dimensional convex vector function on the convex set X ⊂ Rn. Then either f(x) < 0 has a solution x ∈ X, or there exists p ∈ Rm with p ≥ 0, p ≠ 0, such that p·f(x) ≥ 0 for all x ∈ X, but never both.
Upon imposing the convexity assumption and applying the generalized Gordan theorem, there exists p ∈ Rn with p ≥ 0 such that ∑_{i=1}^{n} pi[fi(x) − fi(x*)] ≥ 0 for all x ∈ X. By choosing wi = pi / ∑_{i=1}^{n} pi (possible since ∑_{i=1}^{n} pi > 0), we have w ≥ 0 and ∑_{i=1}^{n} wi = 1. With this choice of w we have, for all x ∈ X,
∑_{i=1}^{n} wi[fi(x) − fi(x*)] ≥ 0 =⇒ ∑_{i=1}^{n} wi fi(x) ≥ ∑_{i=1}^{n} wi fi(x*).
3.1.2 Weighted Sum Method
A multi-objective problem is often solved by combining its multiple objectives into one
single-objective scalar function. This approach is in general known as the weighted-sum
or scalarization method. In more detail, the weighted-sum method minimizes a positively
weighted convex sum of the objectives, that is,
min ∑_{i=1}^{n} wi fi(x)    (3.2)
s.t. ∑_{i=1}^{n} wi = 1,
wi > 0, i = 1, · · · , n,
x ∈ S.
Proof. Let x* ∈ S be a solution of the weighting problem. Let us suppose that it is not weakly pareto-optimal. In this case, there exists a solution x ∈ S such that fi(x) < fi(x*) for all i = 1, 2, · · · , n. According to the assumptions set on the weighting coefficients, wj > 0 for at least one j. Thus, we have ∑_{i=1}^{n} wi fi(x) < ∑_{i=1}^{n} wi fi(x*). This is a contradiction to the assumption that x* is a solution of the weighting problem. Thus x* is weakly pareto-optimal.
Proof. Let x* ∈ S be a solution of the weighting problem with positive weighting coefficients. Let us suppose that it is not pareto-optimal. This means that there exists a solution x ∈ S such that fi(x) ≤ fi(x*) for all i = 1, 2, · · · , n and fj(x) < fj(x*) for at least one j. Since wi > 0 for all i = 1, · · · , n, we have ∑_{i=1}^{n} wi fi(x) < ∑_{i=1}^{n} wi fi(x*). This is a contradiction to the assumption that x* is a solution of the weighting problem. Thus x* must be pareto-optimal.
Theorem 3.6. Let the multi-objective optimization problem be convex. If x* ∈ S is pareto-optimal, then there exists a weighting vector w (wi ≥ 0, i = 1, 2, · · · , n, ∑_{i=1}^{n} wi = 1) such that x* is a solution of the weighting problem (3.2).
Advantages: This is probably the simplest way to solve a multi-objective optimization problem. The concept is intuitive and easy to use. For problems having a convex pareto-optimal front, this method guarantees finding solutions on the entire pareto-optimal set.
Disadvantages: In most non-linear multi-objective optimization problems, a uniformly distributed set of weight vectors need not produce a uniformly distributed set of pareto-optimal solutions. Since this mapping is not usually known, it becomes difficult to set the weight vectors so as to obtain a pareto-optimal solution in a desired region of the objective space. Moreover, different weight vectors need not necessarily lead to different pareto-optimal solutions. If the chosen single-objective optimization algorithm cannot find all minimum solutions for a weight vector, some pareto-optimal solutions cannot be found.
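As a sketch of how one point of the front is obtained per weight vector, the fragment below scalarizes two hypothetical convex objectives (f1, f2 and the bounds are illustrative choices, not taken from this project) and solves problem (3.2) with scipy for a sweep of weights:

import numpy as np
from scipy.optimize import minimize

f1 = lambda x: (x[0] - 0.0) ** 2   # illustrative objective 1
f2 = lambda x: (x[0] - 2.0) ** 2   # illustrative objective 2

def weighted_sum_point(w1, w2):
    # Solve min w1*f1 + w2*f2 over S = [0, 2] for one weight vector.
    res = minimize(lambda x: w1 * f1(x) + w2 * f2(x),
                   x0=np.array([1.0]), bounds=[(0.0, 2.0)])
    return res.x

# Sweeping the weights traces an approximation of the pareto-optimal set.
front = [weighted_sum_point(w, 1.0 - w) for w in np.linspace(0.01, 0.99, 9)]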
Decoding a binary string: each substring is mapped to a real value as
xi = xi^min + ((xi^max − xi^min) / (2^{li} − 1)) · DV(si),
where li is the string length used to code the ith variable and DV(si) is the decoded value of the string si (the complete string being s = ∪_{i=1}^{n} si). This mapping allows the decision variables to take both positive and negative values.
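A sketch of this decoding in Python (the bit string and bounds below are illustrative):

def decode(s, x_min, x_max):
    # DV(s_i): decoded integer value of the binary substring.
    dv = int("".join(map(str, s)), 2)
    li = len(s)  # string length l_i
    return x_min + (x_max - x_min) / (2 ** li - 1) * dv

# A 4-bit string decodes [-5, 5] into 16 equispaced values.
x = decode([1, 0, 1, 1], x_min=-5.0, x_max=5.0)  # DV = 11 -> x = 2.333...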
Assigning fitness to a solution: It is important to reiterate that binary GAs work with strings representing the decision variables, instead of the decision variables themselves. Once a string (or a solution) is created by genetic operators, it is necessary to evaluate the solution, particularly in the context of the underlying objective and constraint functions.
The evaluation of a solution means calculating the objective function value and constraint
violations. Thereafter, a metric must be defined by using the objective function value and
constraint violations to assign a relative merit to the solution (called the fitness).
– Tournament selection A tournament is played between two solutions, and the better solution is chosen and placed in the mating pool. Two other solutions are picked again and another slot in the mating pool is filled with the better solution. The best solution in a population will win both times, thereby making two copies of itself in the new population. By a similar argument, the worst solution will lose both tournaments and will be eliminated from the population. In this way, any solution in a population will have zero, one or two copies in the new population.
– Proportionate selection Solutions are assigned copies, the number of which is proportional to their fitness values. If the average fitness of all population members is favg, a solution with a fitness fi gets an expected fi/favg number of copies.
– Ranking selection First, the solutions are sorted according to their fitness, from the worst (rank 1) to the best (rank N). Each member in the sorted list is assigned a fitness equal to its rank; proportionate selection is then applied with the ranked fitness values, and N solutions are chosen for the mating pool.
• Crossover operator This operator randomly chooses a locus and exchanges the subsequences before and after that locus between two chromosomes to create two offspring. As with the selection operator, there exist a number of crossover operators, e.g. the single-point, two-point and uniform crossover operators. In a two-point crossover operator, two different crossover sites are chosen at random. This divides the string into three substrings, and the crossover operation is completed by exchanging the middle substring between the strings (see the sketch below).
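The following sketch shows binary tournament selection and the two-point crossover just described (strings are Python lists of bits; fitness is any evaluation function; all names are illustrative):

import random

def tournament_select(pop, fitness):
    # Pick two solutions at random; the better one enters the mating pool.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def two_point_crossover(p1, p2):
    # Choose two crossover sites; exchange the middle substring.
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return (p1[:i] + p2[i:j] + p1[j:],
            p2[:i] + p1[i:j] + p2[j:])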
Since real parameters are used directly (without any string coding), solving real-parameter optimization problems is a step easier compared with binary-coded GAs. Decision variables can be directly used to compute the fitness values. Since the selection operator works with the fitness value, any selection operator used with binary-coded GAs can also be used in real-parameter GAs [?].
Simulated Binary Crossover: The procedure of computing the children solutions xi^(1,t+1) and xi^(2,t+1) from the parent solutions xi^(1,t) and xi^(2,t) is described as follows. A spread factor β is defined as the ratio of the absolute difference in the children values to that of the parent values:
β = | (xi^(2,t+1) − xi^(1,t+1)) / (xi^(2,t) − xi^(1,t)) |.
First, a random number u between 0 and 1 is created. Thereafter, from a specified probability distribution function, the ordinate β is found so that the area under the probability curve from 0 to β is equal to the chosen random number u. The probability distribution used to create a child solution is derived from an analysis of search power and is given as follows [?]:
C(β) = 0.5(n + 1)β^n if β ≤ 1, and C(β) = 0.5(n + 1)/β^{n+2} otherwise.
Here n is any non-negative real number. Using this distribution, we can calculate β for a given u as follows:
β = (2u)^{1/(n+1)} if u ≤ 0.5, and β = (1/(2(1 − u)))^{1/(n+1)} otherwise.
After obtaining β from the above probability distribution, the children solutions are calculated as follows:
xi^(1,t+1) = 0.5[(1 + β)xi^(1,t) + (1 − β)xi^(2,t)],
xi^(2,t+1) = 0.5[(1 − β)xi^(1,t) + (1 + β)xi^(2,t)].
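A direct implementation of these three steps for a single variable (a sketch; n is the distribution index, and variable bounds are ignored here):

import random

def sbx(x1, x2, n=2):
    # Draw u ~ U(0,1) and invert the distribution above to obtain beta.
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (n + 1))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (n + 1))
    # Blend the two parent values symmetrically around their mean.
    c1 = 0.5 * ((1 + beta) * x1 + (1 - beta) * x2)
    c2 = 0.5 * ((1 - beta) * x1 + (1 + beta) * x2)
    return c1, c2

# Children straddle the parents: beta < 1 contracts, beta > 1 expands.
c1, c2 = sbx(2.0, 5.0, n=2)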
Chapter 4
rank(xi ; t) = 1 + nq(xi ; t)
where nq(xi ; t) is the number of solutions dominating solution xi at generation t. All non-
dominated individuals are assigned rank 1.
4.2 Diversity
Maintaining a diverse population is an important consideration in multi-objective GA to
obtain solutions uniformly distributed over the true Pareto font. Without taking any pre-
ventive measures, the population tends to form relatively few clusters in multi-objective GA.
This phenomenon is called genetic drift, and several approaches are used to prevent genetic
drift, as follows.
• Fitness sharing Fitness sharing aims to encourage the search in unexplored sections of a Pareto front by artificially reducing the fitness of solutions in densely populated areas. To achieve this goal, densely populated areas are identified and a penalty method is used to penalize the solutions located in such areas. A sharing function is used to obtain an estimate of the number of solutions belonging to each optimum. The idea of fitness sharing was first proposed by Goldberg and Richardson [?] in the investigation of multiple local optima for multi-modal functions. They used the following function in their simulation studies:
sh(dij) = 1 − (dij/σshare)^α if dij ≤ σshare, and sh(dij) = 0 otherwise.
The parameter dij is the distance between any two solutions i and j in the population. The exponent α does not have much effect on the performance of the sharing function method; in most applications α = 1 or 2 is used. The Euclidean distance between two decision variable vectors x(i) and x(j) can be calculated as
dij = √( ∑_{k=1}^{n} (xk^(i) − xk^(j))² ).
A value of σshare that introduces q equispaced (equally spread optima) niches in the search space is
σshare = √( ∑_{k=1}^{n} (xk^(U) − xk^(L))² ) / (2 q^{1/n}).
If d is zero (meaning that two solutions are identical, or that their distance is zero), sh(d) = 1. This means that a solution has full sharing effect on itself. On the other hand, if d ≥ σshare (meaning that two solutions are at least a distance of σshare away from each other), sh(d) = 0; two solutions that far apart have no sharing effect on each other.
The niche count nci is calculated for the ith solution as follows:
nci = ∑_{j=1}^{N} sh(dij),
where N is the population size.
The niche count provides an estimate of the extent of crowding near a solution. It is important to note that nci is always greater than or equal to one, because the sum includes the term sh(dii) = sh(0) = 1. The final task is to calculate the shared fitness value as
fi′ = fi / nci.
Two solutions might be very close in the objective function space while having very different structural features. Therefore, fitness sharing based on the objective function space may reduce diversity in the decision variable space. However, Deb and Goldberg [?] reported that fitness sharing in the objective function space usually performs better than sharing in the decision variable space.
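The sharing computation, sketched in Python (X holds decision variable vectors and fitnesses the raw fitness values; both names are illustrative):

import math

def shared_fitness(fitnesses, X, sigma_share, alpha=1.0):
    def sh(d):  # the sharing function of Goldberg and Richardson
        return 1.0 - (d / sigma_share) ** alpha if d <= sigma_share else 0.0
    def dist(a, b):  # Euclidean distance in the decision variable space
        return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)))
    shared = []
    for i, fi in enumerate(fitnesses):
        nc = sum(sh(dist(X[i], X[j])) for j in range(len(X)))  # nc_i >= 1
        shared.append(fi / nc)  # f'_i = f_i / nc_i
    return shared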
Diversity Preservation
Most multi-objective evolutionary algorithms (MOEAs) try to maintain diversity within the current Pareto set approximation by incorporating density information into the selection process: an individual's chance of being selected decreases the greater the density of individuals in its neighborhood.
The sharing function method involves a sharing parameter σshare, which sets the extent of sharing desired in a problem. This parameter is related to the distance metric chosen to calculate the proximity measure between two population members. The parameter σshare denotes the largest value of that distance metric within which any two solutions share each other's fitness. This parameter is usually set by the user, although there exist some guidelines [?]. In the proposed NSGA-II, the sharing function approach is replaced with a crowded-comparison approach that eliminates any user-defined parameter for maintaining diversity among population members.
Density estimation: To get an estimate of the density of solutions surrounding a particular solution in the population, we calculate the average distance of the two points on either side of this point along each of the objectives. This quantity idistance serves as an estimate of the perimeter of the cuboid formed by using the nearest neighbors as the vertices (we call this the crowding distance).
Crowded-comparison operator: The crowded-comparison operator (≺n) guides the selection process at the various stages of the algorithm toward a uniformly spread-out pareto-optimal front. Assume that every individual xi in the population has two attributes:
1) non-domination rank (irank);
2) crowding distance (idistance).
We now define a partial order ≺n as:
i ≺n j if (irank < jrank) or ((irank = jrank) and (idistance > jdistance)).
That is, between two solutions with differing non-domination ranks, we prefer the solution with the lower (better) rank. Otherwise, if both solutions belong to the same front, then we prefer the solution that is located in the less crowded region.
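The operator itself is a two-key comparison; a sketch where each solution carries its (rank, distance) pair:

def crowded_less(i, j):
    # i <_n j: the better (lower) rank wins; ties within a front are
    # broken in favor of the larger crowding distance (less crowded).
    i_rank, i_dist = i
    j_rank, j_dist = j
    return i_rank < j_rank or (i_rank == j_rank and i_dist > j_dist)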
Elitism
Elitism in the context of a single-objective GA means that the best solution found so far during the search has immunity against selection and always survives into the next generation. In this respect, all non-dominated solutions discovered by a multi-objective GA are considered elite solutions.
Elitism can be introduced globally in a generational sense. Once the offspring population is created, the parent and offspring populations can be combined together. Thereafter, the best N members may be chosen to form the population of the next generation without any extra parameter. In this way, parents get a chance to compete with the offspring population for survival into the next generation. This makes sure that the fitness of the best solution of the population does not deteriorate: a good solution found early on in the run will never be lost unless a better solution is discovered.
In fact, Rudolph (1996) has proved that GAs converge to the global optimal solution of some functions in the presence of elitism. Moreover, the presence of elites enhances the probability of creating better offspring [?]. Elitism can be implemented to different degrees. For example, one can simply keep track of the best solution in a population and update it if a better solution is discovered at any generation, but not use the elite solutions in any genetic operations. At the other extreme, all elites present in the current population can be carried over to the new population; in this case, not many new solutions get a chance to enter the new population and the search does not progress anywhere.
the entire population Rt . Although this requires more effort compared to performing a non-
dominated sorting on Qt alone, it allows a global non-domination check among the offspring
and parent solutions.
Once the non-dominated sorting is over, the new population is filled by solutions of
different non-dominated fronts, one at a time. The filling starts with the best non-dominated
front and continues with solutions of the second non-dominated front, followed by the third
non-dominated front, and so on.
In the following, we outline the algorithm in a step-by-step format (a code sketch follows the steps). Initially, a random population P0 is created. The population is sorted into different non-domination levels. Each solution is assigned a fitness equal to its non-domination level (1 is the best level); thus, minimization of the fitness is assumed. Binary tournament selection (with the crowded tournament operator described later), recombination and mutation operators are used to create an offspring population Q0 of size N. The NSGA-II procedure is outlined in the following.
step 1 Combine the parent and offspring populations and create Rt = Pt ∪ Qt. Perform a non-dominated sorting of Rt and identify the different fronts: fi, i = 1, 2, · · · .
step 2 Set the new population Pt+1 = ∅. Set a counter i = 1. Until |Pt+1| + |fi| < N, perform Pt+1 = Pt+1 ∪ fi and i = i + 1.
step 3 Perform the crowding-sort (fi, ≺c) procedure and include the most widely spread (N − |Pt+1|) solutions, found by using the crowding distance values in the sorted fi, in Pt+1.
step 4 Create the offspring population Qt+1 from Pt+1 by using the crowded tournament selection, simulated binary crossover and polynomial mutation operators.
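One generation of this procedure, sketched in Python; sort_into_fronts, crowding_distance and make_offspring stand for the helpers described in this project (the earlier sketches can serve), and the combined population is assumed to hold at least N solutions:

def nsga2_generation(P, Q, N, sort_into_fronts, crowding_distance, make_offspring):
    R = P + Q                                # step 1: combined population R_t
    fronts = sort_into_fronts(R)
    P_next, i = [], 0
    while len(P_next) + len(fronts[i]) < N:  # step 2: accept whole fronts
        P_next += fronts[i]
        i += 1
    # Step 3: crowding-sort the last allowed front; keep the most spread-out.
    dist = crowding_distance(fronts[i])
    order = sorted(range(len(fronts[i])), key=lambda k: dist[k], reverse=True)
    P_next += [fronts[i][k] for k in order[:N - len(P_next)]]
    # Step 4: selection, crossover and mutation produce Q_{t+1}.
    return P_next, make_offspring(P_next)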
Crowding Distance
To get an estimate of the density of solutions surrounding a particular solution xi in the population, we take the average distance of the two solutions on either side of solution xi along each of the objectives. This quantity di serves as an estimate of the perimeter of the cuboid formed by using the nearest neighbors as the vertices (we call this the crowding distance). The following algorithm is used to calculate the crowding distance of each point in the set f (a code sketch follows).
Crowding distance assignment procedure: crowding-sort(f, ≺c)
step 1 Call the number of solutions in f as l = |f|. For each i in the set, first assign di = 0.
step 2 For each objective function m = 1, 2, · · · , M, sort the set in worse order of fm, i.e. find the sorted indices vector I^m = sort(fm, >).
step 3 For m = 1, 2, · · · , M, assign a large distance to the boundary solutions, d_{I^m_1} = d_{I^m_l} = ∞, and for all other solutions j = 2 to (l − 1), assign
d_{I^m_j} = d_{I^m_j} + (f_m^{(I^m_{j+1})} − f_m^{(I^m_{j−1})}) / (f_m^max − f_m^min).
The index Ij denotes the solution index of the jth member in the sorted list; thus, for any objective, I1 and Il denote the solutions with the lowest and highest objective function values, respectively. The second term on the right side of the last equation is the difference in objective function values between the two neighboring solutions on either side of solution Ij.
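The assignment procedure in Python (a sketch; front is a list of objective vectors, and all objectives are treated uniformly):

def crowding_distance(front):
    l, M = len(front), len(front[0])
    d = [0.0] * l
    for m in range(M):
        order = sorted(range(l), key=lambda k: front[k][m])  # sort on f_m
        fmin, fmax = front[order[0]][m], front[order[-1]][m]
        d[order[0]] = d[order[-1]] = float("inf")  # boundary solutions
        if fmax == fmin:
            continue  # degenerate case: every solution has the same f_m
        for pos in range(1, l - 1):
            k = order[pos]
            # Normalized gap between the two neighbors of solution k in f_m.
            d[k] += (front[order[pos + 1]][m] - front[order[pos - 1]][m]) / (fmax - fmin)
    return d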
Example 4.1. We consider the following two-objective, two-variable minimization problem to illustrate how the algorithm presented in this project works:
minimize f1(x) = x1,
minimize f2(x) = (1 + x2)/x1,
subject to 0.1 ≤ x1 ≤ 1,
0 ≤ x2 ≤ 5.
We choose six random parent solutions and assume an offspring population of six solutions in the search space for illustrating the working principle of the algorithm described in this project. These solutions are tabulated in the following table.
Table 4.1: Parent and offspring with their objective function value.
step 1 We first combine the populations Pt and Qt to form a set Rt = {1, 2, 3, 4, 5, 6, a, b, c, d, e, f}. Next, we perform a non-dominated sorting on Rt and obtain the following non-dominated fronts:
f1 = {5, a, e},
f2 = {1, 3, b, d},
f3 = {2, 6, c, f},
f4 = {4}.
step 3 Next, we consider solutions of the second front only and observe that three (of the four) solutions 1, 3, b and d must be chosen to fill the three remaining slots of the new population by using the ≺c operator. We calculate the crowding distance values of these solutions in the front by using the step-by-step procedure.
step C1 We notice that l = 4 and set d1 = d3 = db = dd = 0. We also set f1^max = 1, f1^min = 0.1, f2^max = 60 and f2^min = 0.
step C2 For the first objective function, the sorting of these solutions is shown in Table 4.2 and is as follows: I¹ = {3, d, 1, b}.
step C3 Since solutions 3 and b are boundary solutions, we set d3 = db = ∞. For the other two solutions, we obtain:
dd = 0 + (f1^(1) − f1^(3)) / (f1^max − f1^min) = 0 + (0.31 − 0.22)/(1 − 0.1) = 0.10,
d1 = 0 + (f1^(b) − f1^(d)) / (f1^max − f1^min) = 0 + (0.79 − 0.27)/(1 − 0.1) = 0.58.
Now we turn to the second objective function and update the above distances. First, sorting on this objective yields I² = {b, 1, d, 3}. Thus, d3 = db = ∞ and the other two distances are as follows:
dd = dd + (f2^(3) − f2^(1)) / (f2^max − f2^min) = 0.10 + (7.09 − 6.10)/(60 − 0) = 0.12,
d1 = d1 + (f2^(d) − f2^(b)) / (f2^max − f2^min) = 0.58 + (6.93 − 3.97)/(60 − 0) = 0.63.
The overall crowding distances of the four solutions are:
d1 = 0.63, d3 = ∞, db = ∞, dd = 0.12.
Evidently, solution d has the smallest perimeter of the hypercube around it of any solution in the set f2. Now, we move back to the main algorithm.
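The arithmetic above can be checked with a few lines of Python (objective values as quoted in the text):

f1 = {"3": 0.22, "d": 0.27, "1": 0.31, "b": 0.79}  # sorted order I^1
f2 = {"b": 3.97, "1": 6.10, "d": 6.93, "3": 7.09}  # sorted order I^2
dd = (f1["1"] - f1["3"]) / (1 - 0.1) + (f2["3"] - f2["1"]) / (60 - 0)
d1 = (f1["b"] - f1["d"]) / (1 - 0.1) + (f2["d"] - f2["b"]) / (60 - 0)
print(round(dd, 2), round(d1, 2))  # 0.12 0.63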
step 4 The new population is Pt+1 = {5, a, e, 3, b, 1}. It is important to note that this population is formed by choosing solutions from the better non-dominated fronts.
The offspring population Qt+1 has to be created next by using this parent population. We realize that the exact offspring population will depend on the chosen pairs of solutions participating in each tournament and on the chosen crossover and mutation operators. Let us say that we pair solutions (5,e), (a,3), (1,b), (a,1), (e,b) and (3,5), so that each solution participates in exactly two tournaments. In the first tournament, we observe that solutions 5 and e belong to the same front (r5 = re = 1). Thus, we choose the one with the larger crowding distance value and find that solution 5 is the winner. In the next comparison, between solutions a and 3, solution a wins, since it belongs to a better front. Performing the other tournaments, we obtain the mating pool {5, 5, a, a, b, e}. Now, these solutions can be mated pair-wise and mutated to create Qt+1. This completes one generation of the NSGA-II.
Example 4.2. Consider the following two-objective optimization problem:
min f1(x) = 3x³ − 26x + 10,
min f2(x) = 9x² − 26,
subject to the constraint x ≥ −2.5.
We applied the non-dominated sorting genetic algorithm (NSGA-II) to this problem; the algorithm was coded in Matlab. We used the simulated binary crossover (SBX) and the polynomial mutation operator, with a crossover probability of pc = 0.9 and a mutation probability of pm = 1/n (where n is the number of decision variables). The distribution indices for the crossover and mutation operators were µc = 20 and µm = 20, respectively. A typical result with a population of 50 individuals is shown in figures 4.2 and 4.3, which give the classical representation in multi-objective optimization, f2 vs. f1, after 5 and 100 generations respectively.
Figure 4.2: The population after 5 generations with NSGA-II. Figure 4.3: The population after 100 generations with NSGA-II.
Example 4.3. The problem with functions f1(X) and f2(X) proposed by Zitzler [?] consists of solving the following multi-objective optimization problem:
minimize f1(X) = x1,
minimize f2(X) = g(X)·(1 − √(f1/g(X))),
s.t. g(X) = 1 + (9/(n − 1)) ∑_{i=2}^{n} xi,
0 ≤ xi ≤ 1, i = 1, · · · , n.
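The test function in Python (a sketch of the standard ZDT1 form, in which the sum in g runs over x2, · · · , xn):

import math

def zdt1(x):
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 / (n - 1) * sum(x[1:])
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2

f1, f2 = zdt1([0.5] + [0.0] * 29)  # a point on the pareto front: g = 1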
We applied NSGA-II to this problem as well, again coded in Matlab, using the simulated binary crossover (SBX) and the polynomial mutation operator, a crossover probability of pc = 0.9, a mutation probability of pm = 1/n (where n is the number of decision variables), and distribution indices µc = 20 and µm = 20 for the crossover and mutation operators, respectively. A typical result with a population of 100 individuals is shown in figures 4.4 and 4.5, which give the classical representation in multi-objective optimization, f2 vs. f1, after 50 and 500 generations respectively.
Figure 4.4: The population after 50 generations with NSGA-II. Figure 4.5: The population after 500 generations with NSGA-II.
Bibliography
[7] T. Back, Evolutionary Algorithms in Theory and Practice, Oxford University Press,
New York, (1996).
[11] S.S. Rao, Optimization theory and application, Wiley Eastern Limited, New Delhi,
(1991).
[13] Abdullah Konak, David W. Coit and Alice E. Smith, Multi-objective optimization using genetic algorithms: A tutorial, Elsevier Ltd, (2005).
[14] Zitzler, E. and Thiele, L., Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach, IEEE Transactions on Evolutionary Computation 3(4), 257-271, (1999).
[15] Schaffer, J.D., Multiple objective optimization with vector evaluated genetic algorithms, in J.J. Grefenstette, editor, Proceedings of an International Conference on Genetic Algorithms and Their Applications, pp. 93-100, Pittsburgh, (1985).
[16] Deb, K. and Goldberg, D.E., An investigation of niche and species formation in genetic function optimization, in The Third International Conference on Genetic Algorithms, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 42-50, (1989).
[17] K. Deb, S. Agrawal, A. Pratap and T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, in M. Schoenauer et al., editors, Parallel Problem Solving from Nature (PPSN VI), Springer, Berlin, pp. 849-858, (2000).
[18] K. Deb and R.B. Agrawal, Simulated binary crossover for continuous search space, Complex Systems 9(2), pp. 115-148, (1995).
[19] K. Deb and M. Goyal, A combined genetic adaptive search for engineering design, Computer Science and Informatics 26(4), pp. 30-45, (1996).