A Novel Pareto-Optimal Ranking Method For Comparing Multi-Objective Optimization Algorithms
Abstract—As the interest in multi- and many-objective optimization algorithms grows, the performance comparison of these algorithms becomes increasingly important. A large number of performance indicators for multi-objective optimization

the evaluation of the solution sets produced by the multi-objective algorithms, which present various trade-offs among the objectives, must be quantitatively appraised using a variety of measurement metrics [5]. While the quality evaluation of
aforementioned factors at the same time and assigns a score to the result of each algorithm. This indicates that no ideal individual metric can assess all characteristics of a Pareto front and, consequently, considering several metrics simultaneously is crucial. In such cases, the scores of algorithms based on each metric are computed and the overall final rank is determined. However, this becomes more complicated if the metrics conflict with each other, which makes the overall ranking more challenging. This occurs because each performance indicator may yield distinct rankings for competing algorithms. For instance, [18] investigated the contradictions of the IGD and HV indicator values when evaluating concave Pareto fronts. In the literature, a few publications have studied the application of multiple performance metrics [19], [20]. Yen et al. [19] introduced an ensemble method to rank algorithms by combining several evaluation metrics using double-elimination tournament selection. As an alternative approach, there have been some techniques to combine several metrics to present one ranking result based on a combined indicator [21]. However, finding a combination technique that avoids a negative impact on the individual metrics can be another challenging issue. Furthermore, even with a combination technique, only a limited number (two or three) of indicators can be combined. From these investigations, it is evident that a multi-metric evaluation of algorithms has more benefits than any stand-alone single performance metric, and the set of quality indicators has to be large enough to ensure the reliability of the comparison [22].

To address the aforementioned concerns and to ensure a fair assessment, this paper proposes a multi-metric method to rank multi-objective algorithms based on a set of performance indicators. The proposed approach utilizes the Pareto optimality concept to tackle the issue of possible conflicts among the measurements. Each performance metric is treated as an objective in the objective space and, consequently, algorithms are ranked based on the scores they achieve on the individual metrics. Since Pareto optimality leads to different Pareto levels, four techniques are proposed to rank algorithms based on their contribution at each Pareto level. This approach provides a reliable ranking at the end of the process, regardless of the objectives, metrics, and algorithms employed. However, it is crucial to choose appropriate performance metrics that align with the ultimate goal, because the metrics we opt for play a vital role in accurately ranking competing algorithms. Furthermore, any newly developed metric can be included as part of the assessment. A great benefit of this method is that it is parameter-free, and algorithms can be evaluated on various factors, with the overall ranking generated at the end of the process.

The remaining sections of this paper are organized as follows: a background review of multi-objective optimization is provided in Section II. Section III reviews some well-known multi-objective performance indicators. Section IV presents a detailed description of the proposed multi-metric method. Section V investigates the performance of the proposed method using the 2018 CEC competition's test problems and algorithms. Finally, the paper is concluded in Section VI.

II. BACKGROUND REVIEW ON MULTI- AND MANY-OBJECTIVE OPTIMIZATION

Multi-objective optimization targets problems with two or three conflicting objectives, while many-objective algorithms aim to tackle more than three conflicting objectives. In recent years, multi-objective optimization algorithms have been greatly expanded to tackle many-objective problems. The use of evolutionary algorithms has been very promising for solving such problems. As a population-based approach, it enables the generation of a set of solutions at each run, with each solution potentially interacting with the others to create even better solutions.

Definition 1. Multi-objective Optimization [2]

\min / \max \; F(\mathbf{x}) = [f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_M(\mathbf{x})], \quad \text{s.t.}\; L_i \le x_i \le U_i, \; i = 1, 2, \ldots, d        (1)

subject to the following equality and/or inequality constraints:

g_j(\mathbf{x}) \le 0, \quad j = 1, 2, \ldots, J
h_k(\mathbf{x}) = 0, \quad k = 1, 2, \ldots, K        (2)

where M is the number of objectives, d is the number of decision variables (i.e., the dimension), and the value of each variable x_i lies in the interval [L_i, U_i] (i.e., box constraints). f_i represents the i-th objective function, which should be minimized or maximized. The hard constraints that must be satisfied are g_j(x) ≤ 0, j = 1, 2, ..., J and h_k(x) = 0, k = 1, 2, ..., K.

In multi- or many-objective optimization problems, finding an optimal solution set is far more complex than in the single-objective case. As such, a trade-off must be made between the different objectives. One way to compare the various candidate solutions is to use the concept of dominance, which assesses how one solution is better than another with regard to the objectives.

Definition 2. Dominance Concept [23] If x = (x_1, x_2, ..., x_d) and x́ = (x́_1, x́_2, ..., x́_d) are two vectors in a minimization problem search space, x dominates x́ (x ≺ x́) if and only if

\forall i \in \{1, 2, \ldots, M\}: f_i(\mathbf{x}) \le f_i(\acute{\mathbf{x}}) \;\wedge\; \exists j \in \{1, 2, \ldots, M\}: f_j(\mathbf{x}) < f_j(\acute{\mathbf{x}})        (3)

This concept defines the optimality of a solution in a multi-objective space. Candidate solution x is better than x́ if it is not worse than x́ in any of the objectives and it has a better value in at least one of the objectives. Solutions that are not dominated by any other solution are called non-dominated solutions; they create the Pareto front set. Multi-objective algorithms attempt to find these solutions by utilizing generating strategies/operators and selection schemes. The non-dominated sorting (NDS) algorithm [23] is one of the popular selection strategies that work based on the dominance concept. It ranks the solutions of the population in different levels of optimality, called Pareto levels. The algorithm starts by determining all non-dominated solutions, which form the first rank. In order to identify the second rank of individuals, the non-dominated vectors are removed from the set and the remaining candidate solutions are processed in the same way. The non-dominated solutions of this step make up the second level of individuals (the second Pareto level). Thereafter, the second-ranked individuals are removed to identify the third Pareto level. This process continues until all of the individuals are grouped into different Pareto levels.
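As a concrete illustration of Definition 2 and the level-by-level peeling procedure described above, the following minimal Python sketch (illustrative code, not from the paper; all function names are ours) checks dominance for a minimization problem and groups a set of objective vectors into Pareto levels.

```python
import numpy as np

def dominates(x, y):
    """Return True if objective vector x Pareto-dominates y (minimization)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return bool(np.all(x <= y) and np.any(x < y))

def pareto_levels(points):
    """Group points (n x M array) into Pareto levels by repeatedly
    removing the current non-dominated set, as in non-dominated sorting."""
    pts = np.asarray(points, float)
    remaining = list(range(len(pts)))
    levels = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(pts[j], pts[i]) for j in remaining if j != i)]
        levels.append(front)
        remaining = [i for i in remaining if i not in front]
    return levels  # levels[0] is the first (best) Pareto level

if __name__ == "__main__":
    # Toy 2-objective example (both objectives minimized).
    P = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
    print(dominates(P[1], P[3]))   # True: (2, 2) dominates (3, 3)
    print(pareto_levels(P))        # [[0, 1, 2], [3], [4]]
```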
III. LITERATURE REVIEW ON PERFORMANCE INDICATORS

There are several performance metrics to assess the quality of multi- and many-objective algorithms. These metrics evaluate the performance of the algorithms using aspects such as convergence, distribution, and coverage. Each metric may have its advantages and drawbacks. In this section, we review some well-known metrics that we have utilized in our experiments to design the multi-metric ranking method. Table I provides an overview of the ten metrics employed in this study, detailing what they measure, the required number of parameters, and their respective advantages and disadvantages. The table primarily addresses three key aspects of a solution set: convergence (proximity to the theoretical Pareto optimal front), diversity (distribution and spread), and cardinality (number of solutions).

Hypervolume (HV) indicator [14]: HV is a very popular indicator that evaluates multi-objective algorithms in terms of the distribution of the Pareto front and the closeness to the true Pareto front; it thus evaluates both the diversity and the convergence of a multi-objective algorithm. It calculates the volume of the M-dimensional space that is enclosed between a set of solution points A and a reference point r = (r_1, r_2, ..., r_M), where M is the number of objectives of the problem. Therefore, HV measures the volume of the region of the objective space that is dominated by the non-dominated solutions, relative to a reference point r. A reference point is a point with worse values than the nadir point. The HV measure is defined in Eq. 4:

HV(A) = \mathrm{vol}\Big( \bigcup_{a \in A} [f_1(a), r_1] \times [f_2(a), r_2] \times \ldots \times [f_M(a), r_M] \Big)        (4)

where f_i represents the i-th objective function and a ∈ A is a point that weakly dominates all candidate solutions. Larger values of HV indicate that the Pareto front covers a wider space, corresponding to more diverse solutions that are closer to the optimal Pareto front.

Generational Distance (GD) [11]: GD measures the average minimum distance between each obtained objective vector from the set S and the closest objective vector on the representative Pareto front P, which is calculated as follows:

GD(S, P) = \frac{\sqrt{\sum_{i=1}^{|S|} \mathrm{dist}(i, P)^2}}{|S|}        (5)

where dist(i, P) is the Euclidean distance from the i-th approximate solution to the nearest solution on the true Pareto front (the exponent q = 2 is used). A smaller GD value indicates a lower distance to the true Pareto front and consequently better performance.

Inverted Generational Distance (IGD) [12]: The only difference between GD and IGD is that the average minimum distance is measured from each point in the true Pareto front to the points in the approximate Pareto front, so that:

IGD(S, P) = \frac{\sqrt{\sum_{i=1}^{|P|} \mathrm{dist}(i, S)^2}}{|P|}        (6)

where dist(i, S) is the Euclidean distance between a point in the true Pareto front and the nearest solution in the approximate solution set.

Two-set Coverage (C) [10]: This metric indicates the fraction of solutions on the Pareto front of one algorithm that are dominated by solutions of another algorithm:

C(A, B) = \frac{|\{ b \in B \mid \exists\, a \in A : a \preceq b \}|}{|B|}        (7)

For example, C(A, B) = 0.25 means that the approximate solutions from algorithm A dominate 25% of the solutions resulting from algorithm B. Obviously, both C(A, B) and C(B, A) should be calculated for comparison.

Coverage over the Pareto front (CPF) [24]: This is a measure of the diversity of the solutions projected through a mapping from an M-dimensional objective space to an (M − 1)-dimensional space. In this process, a large set of reference points R is sampled on the Pareto front, and then each solution on the resulting Pareto front is replaced by its closest reference point. Thus, a new point set P′ is generated as follows:

P' = \{ \arg\min_{r \in R} \| r - f(x) \| \mid x \in P \}        (8)

where R is the set of reference points and P denotes the set of approximate candidate solutions. After transforming P′ and R (i.e., projection, translation, and normalization) to project the points onto a unit simplex, the ratio of the volumes of P′ and R is calculated as CPF:

CPF = \frac{\mathrm{Vol}(P')}{\mathrm{Vol}(R)}        (9)

The details of this metric and the way the volume is calculated are given in [24].

Hausdorff Distance to the Pareto front (Δp) [25]: This indicator combines two metrics, GD and IGD. Δp is defined as follows:

\Delta_p(S, P) = \max\big( GD(S, P), IGD(S, P) \big)        (10)

This metric has stronger properties than the two individual indicators since it combines GD and IGD. For instance, Δp can efficiently handle outliers by considering the averaged result.

Pure Diversity (PD) [26]: Given an approximate solution set A, PD measures the sum of the dissimilarities of each solution in A to the rest of A. For this purpose, the solution with the maximal dissimilarity has the highest priority to accumulate its dissimilarities. The higher the PD metric, the greater the diversity among the solutions. PD is calculated using the recursive Eq. 11:

PD(A) = \max_{s_i \in A} \big( PD(A - s_i) + d(s_i, A - s_i) \big)        (11)

where

d(s, A) = \min_{s_i \in A} \big( \mathrm{dissimilarity}(s, s_i) \big)        (12)

and d(s_i, A − s_i) denotes the dissimilarity d from one solution s_i to the community A − s_i.
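Several of the distance-based indicators above reduce to a few lines of NumPy. The sketch below (illustrative only; S denotes the approximate front and P a sampled reference/true front) implements GD (Eq. 5), IGD (Eq. 6), Δp (Eq. 10), and the two-set coverage C(A, B) (Eq. 7) as defined above.

```python
import numpy as np

def _min_dists(from_pts, to_pts):
    """Euclidean distance from each row of from_pts to its nearest row in to_pts."""
    diff = from_pts[:, None, :] - to_pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def gd(S, P):
    """Generational Distance, Eq. (5): sqrt of summed squared nearest distances over |S|."""
    d = _min_dists(np.asarray(S, float), np.asarray(P, float))
    return np.sqrt((d ** 2).sum()) / len(S)

def igd(S, P):
    """Inverted Generational Distance, Eq. (6): distances measured from P to S."""
    d = _min_dists(np.asarray(P, float), np.asarray(S, float))
    return np.sqrt((d ** 2).sum()) / len(P)

def delta_p(S, P):
    """Hausdorff-style combination of GD and IGD, Eq. (10)."""
    return max(gd(S, P), igd(S, P))

def coverage(A, B):
    """Two-set coverage C(A, B), Eq. (7): fraction of B weakly dominated by some a in A."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    covered = sum(any(np.all(a <= b) for a in A) for b in B)
    return covered / len(B)
```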
Spacing (SP) [13]: This indicator measures the distribution of non-dominated points on the approximate Pareto front. SP can be computed as follows:

SP(S) = \sqrt{ \frac{1}{|S| - 1} \sum_{i=1}^{|S|} (\bar{d} - d_i)^2 }        (13)

where d_i = \min_{s_j \in S, s_j \ne s_i} \| F(s_i) - F(s_j) \|_1 is the l_1 distance of the i-th point on the approximate Pareto front to the closest point on the same Pareto front, and \bar{d} is the mean of the d_i's.

Overall Pareto Spread (OS) [27]: This metric measures the extent of the front covered by the approximate Pareto front. A higher value of this metric indicates better front coverage, and it is calculated as follows:

OS(S) = \prod_{i=1}^{M} \frac{ |\max_{x \in S} f_i(x) - \min_{x \in S} f_i(x)| }{ |f_i(P_B) - f_i(P_G)| }        (14)

where P_B is the nadir point and P_G is the ideal point.

Distribution Metric (DM) [28]: This metric also indicates the distribution of the approximate Pareto front obtained by an algorithm. DM is given by:

DM(S) = \frac{1}{|S|} \sum_{i=1}^{M} \frac{\sigma_i}{\mu_i} \, \frac{ |f_i(P_G) - f_i(P_B)| }{ R_i }        (15)

where σ_i and μ_i are the standard deviation and mean of the distances relative to the i-th objective, R_i = \max_{s \in S} f_i(s) - \min_{s \in S} f_i(s), |S| is the number of points on the approximate Pareto front, and f_i(P_G) and f_i(P_B) are the objective values of the ideal and nadir design points, respectively. A lower value of DM indicates well-distributed solutions.

IV. PROPOSED PARETO-OPTIMAL RANKING METHOD

In this section, we present the details of the multi-metric ranking method. The proposed method ranks competing multi- or many-objective algorithms based on a variety of performance metrics simultaneously. First, each algorithm is independently evaluated using a set of M performance indicators, hence forming M objectives. Then, the method combines these M-dimensional points and groups them into Pareto dominance levels using the non-dominated sorting (NDS) algorithm. Finally, the ranks of the algorithms are determined using one of the four proposed ranking methods described in this section. The steps of the proposed method to rank A algorithms (in our experiments, A = 10 algorithms) when solving many-objective optimization problems are described below:

1) Select M multi-objective performance indicators (e.g., HV, IGD, GD, etc.).
2) Run each algorithm R times.
3) For each run, calculate the performance scores based on the M metrics. As a result, for each algorithm, we have a matrix of R × M performance scores.
4) Concatenate all performance scores from the A algorithms to get a matrix of A × R points, each containing an M-dimensional (objectives/criteria) vector.
5) Apply the NDS algorithm to the A × R M-dimensional vectors to generate different levels of ranks. Note that, in order to apply the NDS algorithm, we reverse some of the scores (e.g., since the highest HV value indicates the better algorithm, we replace this value with its inverse, or change its sign in case of zeros) so that all metrics are treated as a minimization problem instead of a maximization one.
6) Evaluate the ranks of each algorithm based on their contribution at each Pareto level using the proposed ranking techniques discussed below (a code sketch of steps 4-6 follows the description of the score matrices).

Suppose that we want to compare the performance of A multi-/many-objective algorithms for solving a benchmark test problem. We run each algorithm R times and compute the algorithm's performance scores using M existing multi-objective metrics (e.g., HV, IGD, GD, etc.). After these steps, we obtain A × R M-dimensional points (performance scores). The matrix representing these scores can be illustrated as follows:

            m_1                 m_2                 m_3                 ...   m_M
a_{1,1}     ps_{a_{1,1},m_1}    ps_{a_{1,1},m_2}    ps_{a_{1,1},m_3}    ...   ps_{a_{1,1},m_M}
a_{1,2}     ps_{a_{1,2},m_1}    ps_{a_{1,2},m_2}    ps_{a_{1,2},m_3}    ...   ps_{a_{1,2},m_M}
...
a_{1,R}     ps_{a_{1,R},m_1}    ps_{a_{1,R},m_2}    ps_{a_{1,R},m_3}    ...   ps_{a_{1,R},m_M}
...
a_{A,1}     ps_{a_{A,1},m_1}    ps_{a_{A,1},m_2}    ps_{a_{A,1},m_3}    ...   ps_{a_{A,1},m_M}
a_{A,2}     ps_{a_{A,2},m_1}    ps_{a_{A,2},m_2}    ps_{a_{A,2},m_3}    ...   ps_{a_{A,2},m_M}
...
a_{A,R}     ps_{a_{A,R},m_1}    ps_{a_{A,R},m_2}    ps_{a_{A,R},m_3}    ...   ps_{a_{A,R},m_M}

where ps_{a_{i,j}, m_k} indicates the computed performance score for the j-th run of the i-th algorithm based on the k-th metric.

Next, the proposed method applies the NDS algorithm to the A × R M-dimensional vectors. This process yields a set of Pareto levels and the corresponding points in these levels. Suppose that the NDS algorithm resulted in L levels, say l_1, l_2, l_3, ..., l_L. Then, for each algorithm, we count the number of points associated with each Pareto level (this step allows us to quantify the quality of each algorithm when solving a given problem). The resulting matrix can be illustrated as follows:

        l_1            l_2            l_3            ...   l_L
a_1     n_{a_1 l_1}    n_{a_1 l_2}    n_{a_1 l_3}    ...   n_{a_1 l_L}
a_2     n_{a_2 l_1}    n_{a_2 l_2}    n_{a_2 l_3}    ...   n_{a_2 l_L}
a_3     n_{a_3 l_1}    n_{a_3 l_2}    n_{a_3 l_3}    ...   n_{a_3 l_L}
...
a_A     n_{a_A l_1}    n_{a_A l_2}    n_{a_A l_3}    ...   n_{a_A l_L}

where n_{a_i l_j} indicates the number of points (i.e., M-dimensional metric score vectors) of the i-th algorithm on the j-th Pareto level. Lastly, the ranks of the algorithms are determined using one (or more, in case of a tie) of the four proposed ranking techniques described below.
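Assuming the per-run scores have already been computed and stored in an array of shape (A, R, M) (step 3), the sketch below (illustrative code, not the authors' implementation) carries out steps 4-6 up to the per-level point counts; the four ranking rules described below are then applied to the resulting matrix.

```python
import numpy as np

def count_points_per_level(scores, maximize):
    """Illustrative sketch of steps 4-6 of the proposed method.

    scores   : array of shape (A, R, M) with one metric vector per algorithm and run.
    maximize : length-M sequence, True where a larger metric value is better (e.g., HV).
    Returns  : (A, L) matrix n[i, j] = number of points of algorithm i on Pareto level j.
    """
    A, R, M = scores.shape
    pts = scores.reshape(A * R, M).astype(float)
    # Step 5 (pre-processing): make every metric "smaller is better".  A sign flip
    # preserves the dominance relation; the paper mentions using the inverse, or a
    # sign change when zeros occur.
    pts = np.where(np.asarray(maximize, bool), -pts, pts)
    owner = np.repeat(np.arange(A), R)          # which algorithm produced each point

    # Non-dominated sorting: peel off the non-dominated set level by level.
    remaining = np.arange(len(pts))
    level_counts = []
    while remaining.size:
        sub = pts[remaining]
        dominated = np.array([
            np.any(np.all(sub <= sub[k], axis=1) & np.any(sub < sub[k], axis=1))
            for k in range(len(sub))
        ])
        front = remaining[~dominated]
        level_counts.append(np.bincount(owner[front], minlength=A))
        remaining = remaining[dominated]
    return np.column_stack(level_counts)        # columns correspond to levels l1, l2, ...
```

For A = 10 algorithms, R = 20 runs, and M = 10 metrics, this processes the 200 ten-dimensional points described in the text.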
TABLE I
ADVANTAGES AND DISADVANTAGES OF MULTI-OBJECTIVE PERFORMANCE METRICS
Olympic method: The best algorithm is determined by evaluating the number of points each algorithm has on the first level. If a tie occurs between two algorithms, the second level is considered and the algorithm with more points is selected; if there is still a tie, the third level is considered, and so on.

\mathrm{First\ rank} = \arg\max_{i} \big( n_{a_i l_1} \big)        (16)

Suppose that two algorithms, a_1 and a_2, have the following number of points on each level after the NDS step:

       l_1    l_2    l_3
a_1    20     10     1
a_2    15     14     2

where the number of points of algorithm a_1 on the first, second, and third levels is 20, 10, and 1, respectively, and 15, 14, and 2 for algorithm a_2. According to the Olympic ranking, algorithm a_1 outperforms a_2, as the number of points on the first level (i.e., l_1) for a_1 is higher than for a_2.
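In code, the Olympic rule is simply a lexicographic comparison of the per-level counts. The small sketch below (illustrative names) reproduces the example above.

```python
def olympic_order(level_counts):
    """Rank algorithms by comparing their per-level point counts lexicographically:
    more points on l1 wins; ties move to l2, then l3, and so on.
    level_counts: dict mapping algorithm name -> list of counts [n_l1, n_l2, ...]."""
    return sorted(level_counts, key=lambda name: tuple(level_counts[name]), reverse=True)

counts = {"a1": [20, 10, 1], "a2": [15, 14, 2]}
print(olympic_order(counts))   # ['a1', 'a2']: a1 wins because 20 > 15 on the first level
```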
Linear method: This technique takes all points into account when calculating an algorithm's ranking, rather than just the top Pareto level as in the Olympic method. The weighted score of each algorithm is calculated by multiplying the number of points it has at each level by linearly decreasing weights. For example, if the NDS algorithm produces L levels, the first level has a weight of L, the second level L − 1, and so on. Once the weighted sums are determined for all competing algorithms, their ranks are assigned based on these weights (the algorithm with the highest weighted sum is ranked first). In this way, every point on every level contributes to the ranking of an algorithm. Eq. 17 represents the computation of the linear score for algorithm a_i:

\mathrm{Linear\ Score}(a_i) = n_{a_i l_1} \times L + n_{a_i l_2} \times (L - 1) + \ldots + n_{a_i l_L} \times 1        (17)

Given the previous example, the linear scores of algorithms a_1 and a_2 can be calculated as follows:

Linear Score(a_1) = 20 × 3 + 10 × 2 + 1 × 1 = 81
Linear Score(a_2) = 15 × 3 + 14 × 2 + 2 × 1 = 75

Accordingly, algorithm a_1 has the higher rank.

Exponential Method: Similar to the linear ranking method, the exponential technique assigns a weight to each Pareto level; however, the designated weights decrease exponentially rather than linearly. Specifically, the weights are 2^0, 2^{-1}, 2^{-2}, ..., 2^{-(L-1)} for levels 1, 2, 3, ..., L, respectively. Then, the weighted sum indicates the score of each algorithm. Eq. 18 represents the computation of the exponential score for algorithm a_i:

\mathrm{Exponential\ Score}(a_i) = n_{a_i l_1} \times 2^{0} + n_{a_i l_2} \times 2^{-1} + \ldots + n_{a_i l_L} \times 2^{-(L-1)}        (18)

Given the previous example, the exponential scores of algorithms a_1 and a_2 can be calculated as follows:

Exponential Score(a_1) = 20 × 2^0 + 10 × 2^{-1} + 1 × 2^{-2} = 25.25
Exponential Score(a_2) = 15 × 2^0 + 14 × 2^{-1} + 2 × 2^{-2} = 22.5

Accordingly, algorithm a_1 outperforms a_2.

Adaptive Method: The score of each algorithm is calculated based on the cumulative number of points distributed over all levels. For each algorithm, the total number of points at levels 1 and 2 is considered as the cumulative weight of level 2; the total number of points at levels 1, 2, and 3 is considered as the cumulative weight of level 3; and, correspondingly, the total number of points distributed over all levels is the cumulative weight of level L of the corresponding algorithm. Eq. 19 indicates the computation of the cumulative weight for level l of algorithm a_i:

CW(a_i, l) = \sum_{j=1}^{l} n_{a_i l_j}        (19)

Similarly, the total cumulative weight of level l over all algorithms is defined as:

\mathrm{Total\ CW}(l) = \sum_{i=1}^{A} CW(a_i, l)        (20)

Using this definition, the sum of the cumulative weight ratios over all levels is the score of each algorithm. In this way, the ratio of the contribution of each algorithm at each level determines the rank of the algorithm. Eq. 21 represents the computation of the adaptive score for algorithm a_i:

\mathrm{Adaptive\ Score}(a_i) = \sum_{l=1}^{L} \frac{CW(a_i, l)}{\mathrm{Total\ CW}(l)}        (21)

For the previous example, we can compute the cumulative weights for algorithms a_1 and a_2 as follows:

       CW_1    CW_2    CW_3
a_1    20      30      31
a_2    15      29      31

The total cumulative weights for each Pareto level are calculated as:

Total CW(l_1) = 20 + 15 = 35
Total CW(l_2) = 30 + 29 = 59
Total CW(l_3) = 31 + 31 = 62

Consequently, the adaptive scores for algorithms a_1 and a_2 are:

Adaptive Score(a_1) = 20/35 + 30/59 + 31/62 ≈ 1.58
Adaptive Score(a_2) = 15/35 + 29/59 + 31/62 ≈ 1.42

Thus, based on the adaptive score, algorithm a_1 is better than a_2.

It is worth mentioning that in the event of a tie (i.e., identical ranks for two or more algorithms) when using one of these ranking techniques, we can then apply any of the other ranking methods to break the tie. Additionally, if the user has a preference for ranking in scenarios involving diverse complexities, such as varying numbers of objectives, it is feasible to assign weights to the scores based on their respective complexities following the non-dominated sorting (NDS) step.
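The three weighted variants differ only in how a level's contribution is weighted. The sketch below (illustrative; `counts` is the per-level matrix obtained after the NDS step, one row per algorithm) reproduces the worked example: linear scores of 81 and 75, exponential scores of 25.25 and 22.5, and adaptive scores of roughly 1.58 and 1.42.

```python
import numpy as np

def linear_scores(counts):
    """Eq. (17): level weights L, L-1, ..., 1."""
    L = counts.shape[1]
    return counts @ np.arange(L, 0, -1)

def exponential_scores(counts):
    """Eq. (18): level weights 2^0, 2^-1, ..., 2^-(L-1)."""
    L = counts.shape[1]
    return counts @ (2.0 ** -np.arange(L))

def adaptive_scores(counts):
    """Eq. (21): sum over levels of CW(a_i, l) / Total_CW(l)."""
    cw = np.cumsum(counts, axis=1)              # Eq. (19), cumulative weights per algorithm
    total_cw = cw.sum(axis=0)                   # Eq. (20), totals per level
    return (cw / total_cw).sum(axis=1)

counts = np.array([[20, 10, 1],                 # algorithm a1
                   [15, 14, 2]])                # algorithm a2
print(linear_scores(counts))        # [81 75]
print(exponential_scores(counts))   # [25.25 22.5]
print(adaptive_scores(counts))      # approximately [1.58 1.42]
```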
V. EXPERIMENTAL VALIDATION: CONDUCTING COMPREHENSIVE COMPARISONS

A. Experimental settings

The proposed method is applied in practice to rank ten well-known evolutionary multi-objective algorithms submitted to the 2018 IEEE CEC competition. In this competition, participants were asked to develop a novel many-objective optimization algorithm to solve the 15 MaF many-objective test problems listed in Table II. The competing algorithms include AGE-II [29], AMPDEA, BCE-IBEA [30], CVEA3 [31], fastCAR [32], HHcMOEA [33], KnEA [34], RPEA [35], RSEA [36], and RVEA [37]. All experiments were conducted on 5-, 10-, and 15-objective MaF test problems, and the number of decision variables was set according to the setting used in [38]. Each algorithm was run independently 20 times. The maximum number of fitness evaluations was set to max(100000, 10000 × D), and the maximum size of the population was set to 240.

In order to assess the efficacy of the proposed ranking method, we obtained the approximated Pareto fronts of these ten algorithms from [38]. The competition utilized the IGD and HV scores to rank these algorithms. However, we have utilized the ten many-objective metrics (including HV and IGD) listed in Section III to take advantage of the different aspects of these performance metrics.

TABLE II
PROPERTIES OF THE 15 MaF BENCHMARK PROBLEMS. M IS THE NUMBER OF OBJECTIVES.

Test function   Properties                                      Dimension
MaF1            Linear                                          M + 9
MaF2            Concave                                         M + 9
MaF3            Convex, multimodal                              M + 9
MaF4            Concave, multimodal                             M + 9
MaF5            Convex, biased                                  M + 9
MaF6            Concave, degenerate                             M + 9
MaF7            Mixed, disconnected, multimodal                 M + 19
MaF8            Linear, degenerate                              2
MaF9            Linear, degenerate                              2
MaF11           Convex, disconnected, nonseparable              M + 9
MaF12           Concave, nonseparable, biased deceptive         M + 9
MaF13           Concave, unimodal, nonseparable, degenerate     5
MaF14           Linear, partially separable, large scale        20 × M
MaF15           Convex, partially separable, large scale        20 × M

B. Ranking algorithms when solving one specific test problem with a particular number of objectives

Our first experiment ranks the ten algorithms on a specific test problem with a specific number of objectives. Since the number of independent runs is set to 20, the input to our proposed method is 10 × 20 M-dimensional performance metric score vectors, one per run.

Fig. 1 shows the results of the proposed scheme when solving the 5-objective MaF1 and 15-objective MaF10 test problems. The NDS algorithm resulted in 7 Pareto levels for MaF1 and 6 Pareto levels for MaF10. The top table in Fig. 1 shows the number of points contributed by each algorithm at each Pareto level. For instance, the scores of AGE-II for the 5-objective MaF1 resulted in 17 points at the first level, while BCE-IBEA and CVEA3 have 10 points at this level. Correspondingly, the sum of the elements in each row is 20, which is the overall number of runs. Fig. 1 also illustrates the distribution of these points over the different Pareto levels using 2-D and 3-D RadViz visualization [39]. From the figure, the density of points is higher at the lower levels and decreases at the higher Pareto levels. The middle table shows the ranks of each algorithm based on the four ranking methods discussed in Section IV. For instance, using the Olympic method, the CVEA3 and HHcMOEA algorithms have the highest rank when solving 15-objective MaF10, as all their points are on the first Pareto level. Although the four proposed ranking methods generally provide the same ranking, they may sometimes result in minor conflicts. For example, the AMPDEA algorithm is ranked 10th using the Olympic method, but 9th when the other three ranking methods are used, on the 5-objective MaF1 test problem.

Table III illustrates the Olympic ranking of each algorithm when addressing the 15-objective MaF1 to MaF15 benchmark test problems. The table reveals instances where the Olympic strategy led to ties among some algorithms. For instance, AMPDEA, CVEA3, and HHcMOEA share the top rank when solving MaF1. In such cases, we can resort to one or more of the other proposed ranking methods or employ the average ranking across all four methods to resolve these ties whenever feasible. However, if two algorithms contribute equally across all Pareto levels, their rankings will remain identical irrespective of the ranking method employed.

C. Ranking algorithms when solving a set of test problems with a particular number of objectives

In the previous experiment, we examined the ranking of algorithms on individual test problems with a specific number of objectives. In this experiment, we delve into the overall ranking of algorithms across a set of test problems with the same number of objectives. To establish the overall rankings of the algorithms, akin to the previous experiment, we initially compute the contribution of each algorithm to each specific test problem. Subsequently, for each algorithm, we aggregate the total number of points at each level across all test problems, adding counts for levels with the same index. For instance, if Algorithm 1 accumulates 20 points in level 1 when addressing test problem 1, and 12 and 8 points in levels 1 and 2, respectively, when addressing test problem 2, then the total number of points for Algorithm 1 would be 32 in level 1 and 8 in level 2. The table in Fig. 2 represents each algorithm's total number of points on each Pareto level when solving 10- and 15-objective MaF benchmark test problems. For instance, AGE-II contributed 171 points on the first level across all test problems when solving 10-objective MaF test problems. From the table, we can also observe that the sum of the points in each row is 300 (i.e., 300 = 20 runs × 15 test problems).

Fig. 2 also shows the overall distribution of points on the different Pareto levels using 2-D and 3-D RadViz visualization. Additionally, the rankings of each algorithm based on all test problems using the four proposed ranking schemes can be seen in the middle table of Fig. 2. For instance, when solving 15-objective MaF test problems, the fastCAR algorithm has the maximum number of points on the first level compared to the other algorithms. As a result, it is ranked first according to the Olympic method as well as the other ranking methods. On the other hand, the HHcMOEA algorithm is ranked second by the Olympic method when solving 15-objective test problems, as it has the second-highest number of NDS contributions at the first level. However, this algorithm is ranked 6th when the linear method is used to rank these algorithms. Hence, the ranking method should be selected carefully to address the needs of each user. For example, the Olympic method would be useful when we are only interested in algorithms that have the majority of their contribution in Pareto level 1.
Fig. 1. Outcome of the NDS algorithm and the ranks of the algorithms for the 5-objective MaF1 and 15-objective MaF10 benchmark test problems. The top table shows the number of points associated with each Pareto level. The bottom two diagrams show the distribution of these points using the RadViz visualization, along with the ranks of these algorithms according to the four ranking techniques proposed in this paper.
              M = 10                                M = 15
              L1   L2  L3  L4  L5  L6  L7  L8       L1   L2  L3  L4  L5  L6  L7  L8  L9  L10  L11  L12
AGE-II        171  63  40  14  10   2   0   0       165  32  17  21   4  32  17   2   0    0    9    1
AMPDEA        254  38   8   0   0   0   0   0       190  52  29  16  10   3   0   0   0    0    0    0
BCE-IBEA      272  23   5   0   0   0   0   0       194  46  31  16  10   2   1   0   0    0    0    0
CVEA3         274  26   0   0   0   0   0   0       207  36  30  14  10   1   1   1   0    0    0    0
fastCAR       250  28   9  11   1   1   0   0       238  43  17   1   0   1   0   0   0    0    0    0
HHcMOEA       264  20  13   3   0   0   0   0       232  16   8   4   0  10  12   5   0    0   10    3
KnEA          246  46   8   0   0   0   0   0       217  37  25  12   7   2   0   0   0    0    0    0
RPEA          202  61  23   4   6   1   2   1       151  41  28  20  10  18   8   8   7    6    2    1
RSEA          233  61   6   0   0   0   0   0       164  40  38  18  12   9   6   4   8    1    0    0
RVEA          235  22  23   9  11   0   0   0       144  48  45  19  15  10   4   8   4    2    1    0

Fig. 2. The overall outcome of the NDS algorithm and the ranks of the ten algorithms when solving 10- and 15-objective MaF benchmark test problems. The top table shows the total number of points associated with each Pareto level. The bottom two diagrams show the distribution of these points using the RadViz visualization, along with the ranks of these algorithms according to the four ranking techniques proposed in this paper.
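As a concrete check of the discussion in Section V-C, the sketch below (illustrative code, not the authors' implementation) applies the Olympic (lexicographic) and linear rules to the M = 15 per-level counts in the table above; consistent with the text, fastCAR comes first under both rules, while HHcMOEA is second under the Olympic rule but drops to sixth under the linear rule.

```python
import numpy as np

algs = ["AGE-II", "AMPDEA", "BCE-IBEA", "CVEA3", "fastCAR",
        "HHcMOEA", "KnEA", "RPEA", "RSEA", "RVEA"]
# Per-level point counts for the 15-objective MaF problems (12 Pareto levels).
counts = np.array([
    [165, 32, 17, 21,  4, 32, 17,  2,  0,  0,  9,  1],   # AGE-II
    [190, 52, 29, 16, 10,  3,  0,  0,  0,  0,  0,  0],   # AMPDEA
    [194, 46, 31, 16, 10,  2,  1,  0,  0,  0,  0,  0],   # BCE-IBEA
    [207, 36, 30, 14, 10,  1,  1,  1,  0,  0,  0,  0],   # CVEA3
    [238, 43, 17,  1,  0,  1,  0,  0,  0,  0,  0,  0],   # fastCAR
    [232, 16,  8,  4,  0, 10, 12,  5,  0,  0, 10,  3],   # HHcMOEA
    [217, 37, 25, 12,  7,  2,  0,  0,  0,  0,  0,  0],   # KnEA
    [151, 41, 28, 20, 10, 18,  8,  8,  7,  6,  2,  1],   # RPEA
    [164, 40, 38, 18, 12,  9,  6,  4,  8,  1,  0,  0],   # RSEA
    [144, 48, 45, 19, 15, 10,  4,  8,  4,  2,  1,  0],   # RVEA
])

olympic = sorted(algs, key=lambda a: tuple(counts[algs.index(a)]), reverse=True)
linear = sorted(algs, key=lambda a: counts[algs.index(a)] @ np.arange(12, 0, -1),
                reverse=True)
print(olympic[:2])                   # ['fastCAR', 'HHcMOEA']
print(linear.index("HHcMOEA") + 1)   # 6
```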
TABLE III
RANKS OF ALGORITHMS WHEN SOLVING 15-OBJECTIVE MaF TEST PROBLEMS USING THE OLYMPIC RANKING TECHNIQUE.

M = 15    AGE-II  AMPDEA  BCE-IBEA  CVEA3  fastCAR  HHcMOEA  KnEA  RPEA  RSEA  RVEA
MaF1        6       1        7        1      10        1       5     4     8     9
MaF2        6       4        8        2       1        7       5    10     3     9
MaF3        6       9        4        7       5        2       1    10     3     8
MaF4        2       5        7        2       2        1       8     6     9     9
MaF5        8       5        3        6       1       10       4     2     9     7
MaF6        5       7        4        7       3        1       1     7     6     7
MaF7       10       4        4        4       3        1       1     4     4     9
MaF8        8       7        1        1       5        9       6    10     3     3
MaF9       10       4        4        4       2        1       4     3     8     9
MaF10       4       6        8        1       3        1       7     8     5    10
MaF11      10       1        9        4       1        3       6     6     8     5
MaF12       6       8        7        2       1        9       2    10     2     2
MaF13       3       5        5        5       1        1       5     5     5     4
MaF14       5       6        4        8       3        1       1     9    10     7
MaF15       1       1        1        1       9        1       7     1     8    10
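A small check against the table above (illustrative code): listing the algorithms tied at rank 1 on MaF1 recovers the tie discussed in Section V-B.

```python
algs = ["AGE-II", "AMPDEA", "BCE-IBEA", "CVEA3", "fastCAR",
        "HHcMOEA", "KnEA", "RPEA", "RSEA", "RVEA"]
maf1 = [6, 1, 7, 1, 10, 1, 5, 4, 8, 9]      # Olympic ranks on 15-objective MaF1
print([a for a, r in zip(algs, maf1) if r == 1])
# ['AMPDEA', 'CVEA3', 'HHcMOEA'] -- three algorithms share the top rank
```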
D. Determining the overall rankings of algorithms

In order to determine the overall rankings of the algorithms, it is required to consider the results of all algorithms when solving all test problems with the different numbers of objectives. From the previous experiment, we have identified the contribution of each algorithm after the NDS step when solving 5-, 10-, and 15-objective MaF test problems separately. Now, we combine these contributions by adding the number of points at each level with the same index from the results for the three different numbers of objectives (M = 5, 10, and 15). Hence, we have a total of nine hundred 10-dimensional points (15 test problems × 3 numbers of objectives × 20 runs). For example, suppose we are interested in ranking algorithms based on their performance when M = 5 and 10, and suppose Algorithm 1 has 120 points in level 1 and 80 points in level 2 when solving the test problems with M = 5, and 100 points in level 1, 55 points in level 2, and 45 points in level 3 when solving the test problems with M = 10. Then, the total number of points of Algorithm 1 would be 220 in level 1, 135 in level 2, and 45 in level 3.

From Table IV we see that the combined results consist of 18 ranks (Pareto levels), and each algorithm has a total of 900 points. For some of the algorithms, such as AMPDEA, most of the points are located on the higher levels (i.e., the first 5 levels), while for some others, such as RVEA, the points are distributed over all 18 levels. Table V shows the overall ranking of all algorithms using the four different ranking techniques discussed in Section IV, along with their average ranking based on the four techniques. For instance, the AGE-II algorithm is ranked in 5th position using the Olympic method, while it takes the 7th position if the linear ranking method is applied. However, for some algorithms, such as RSEA, an equal rank is assigned by all techniques. Based on the results obtained from the proposed four ranking methods, the fastCAR algorithm is ranked 1st among the 10 state-of-the-art many-objective optimization algorithms based on the ten considered performance indicators, while RPEA, conversely, is ranked last.

It is important to highlight that all four proposed ranking methods produce satisfactory results. However, we suggest employing the average ranking across these methods to impartially evaluate the algorithms based on a comprehensive assessment. Additionally, using the average ranking helps mitigate discrepancies that may arise when different methods assign varying rankings to the same algorithm. Alternatively, if one prefers to use a single ranking method, we recommend utilizing the adaptive ranking method for the following reasons. Firstly, unlike the linear ranking method, it considers the cumulative points distributed across all levels, as opposed to only considering the top-level points. Secondly, although all the proposed ranking methods demonstrated comparable ranking performance, the adaptive ranking method exhibited a higher average pair-wise correlation value compared to the other methods. This indicates that the results obtained from the adaptive ranking method are more consistent with the rankings generated by the other methods. The average pair-wise correlation values for the Olympic, linear, exponential, and adaptive ranking methods are 0.947, 0.923, 0.956, and 0.960, respectively. Another consideration is that the Olympic method is particularly useful for identifying algorithms whose primary contributions lie within the top Pareto level(s).

E. Comparison of rankings from the competition and the proposed method

In this section, we compare the results from the proposed ranking method with the official rankings published by the CEC 2018 competition when comparing the ten evolutionary multi- and many-objective optimization algorithms on the 15 MaF benchmark problems with 5, 10, and 15 objectives. Each algorithm was run independently 20 times for each test problem with 5, 10, and 15 objectives, generating 900 results (15 test problems × 3 numbers of objectives × 20 runs). The competition committee ranked the 10 algorithms based on two multi-objective metrics, IGD and HV. They sorted the means of each performance indicator value on each problem with each number of objectives (i.e., 90 ranks). The score achieved by each algorithm was then computed as the sum of the reciprocal values of these ranks.

For a fair comparison with the official ranking provided by the CEC 2018 committee, in this experiment we have used only the HV and IGD metrics (as opposed to the ten metrics used in the previous experiments) to rank these algorithms with the proposed ranking methods. Table VI presents the ranking results based on the competition scores and the proposed method. From this table, we see that, since both ranking approaches use the HV and IGD metrics, they result in comparable rankings, with CVEA3 ranked 1st and RPEA ranked last. However, when comparing the results
TABLE IV
DISTRIBUTION OF 900 POINTS FROM EACH ALGORITHM OVER 18 PARETO LEVELS.
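The level-wise aggregation described in Section V-D is a simple element-wise addition after padding to a common number of levels. A small sketch follows (illustrative code; the numbers are the hypothetical example from the text).

```python
import numpy as np

def combine_level_counts(*per_setting_counts):
    """Add per-level point counts from several settings (e.g., M = 5, 10, 15),
    matching levels by index and padding shorter count vectors with zeros."""
    L = max(len(c) for c in per_setting_counts)
    total = np.zeros(L, dtype=int)
    for c in per_setting_counts:
        total[:len(c)] += np.asarray(c, dtype=int)
    return total

# Example from the text: Algorithm 1 with M = 5 and M = 10 results.
m5 = [120, 80]           # 120 points on level 1, 80 on level 2
m10 = [100, 55, 45]      # 100 on level 1, 55 on level 2, 45 on level 3
print(combine_level_counts(m5, m10))   # [220 135 45]
```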
TABLE VI
COMPARISON BETWEEN THE RANKING RESULTS PROVIDED BY THE COMPETITION AND THE PROPOSED TECHNIQUES BASED ON HV AND IGD MEASURES.

                                       Proposed method
Algorithm    Competition ranking    Olympic  Linear  Expo  Adaptive  Average Rank
AGE-II               4                 5       4       6      4           5
AMPDEA               2                 2       2       2      2           2
BCE-IBEA             3                 4       3       3      3           3
CVEA3                1                 1       1       1      1           1
fastCAR              7                 3       7       4      5           6
HHcMOEA             10                 7      10       8      9           8
KnEA                 6                 9       6       7      7           7
RPEA                 9                10       8      10     10          10
RSEA                 5                 6       5       5      5           4
RVEA                 8                 8       9       9      8           9
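For reference, the competition's scoring rule described in Section V-E (sum of reciprocal ranks over the problem/objective-count combinations for each indicator) can be sketched as follows; the ranks in the example are hypothetical.

```python
import numpy as np

def competition_score(ranks):
    """Sum of reciprocal ranks, as used by the CEC 2018 competition:
    ranks holds an algorithm's rank (1 = best) on each
    problem/objective-count/indicator combination."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.sum(1.0 / ranks))

# Hypothetical example: an algorithm ranked 1st, 2nd, and 4th on three combinations.
print(competition_score([1, 2, 4]))   # 1.75
```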
quality aspects of an algorithm (convergence, distribution, spread, cardinality, etc.), it is important to incorporate several performance metrics to properly assess the performance of multi-/many-objective algorithms in all aspects of many-objective quality measures. The proposed multi-metric approach gives users the ability to incorporate as many performance indicators as required to properly compare the quality of several competing algorithms. The proposed ranking method uses the NDS algorithm to categorize the level of contribution of each algorithm and applies four ranking techniques, namely Olympic, linear, exponential, and adaptive, to rank algorithms based on several performance metrics.

Our experimental results indicate that the proposed ranking method can effortlessly incorporate several performance metrics to adequately rank multi-/many-objective algorithms based on their overall achievements in several categories of performance measures. Moreover, it can also be used as a general ranking technique for any application in which the evaluation of multiple metrics is required. This includes domains such as machine learning (e.g., multiple losses), data mining (e.g., multiple quality metrics), business (e.g., revenue, profitability, customer satisfaction, employee engagement, market share), sport (e.g., scoring, assists, rebounds, blocks, tackles), healthcare (e.g., blood pressure, cholesterol levels, body mass index, heart rate variability, cognitive function), education (e.g., grades, standardized test scores, attendance), and the environment (e.g., air quality, water quality, biodiversity, and climate change).

REFERENCES

[1] Y. Hua, Q. Liu, K. Hao, and Y. Jin, "A survey of evolutionary algorithms for multi-objective optimization problems with irregular Pareto fronts," IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 2, pp. 303–318, 2021.
[2] A. Asilian Bidgoli, S. Rahnamayan, B. Erdem, Z. Erdem, A. Ibrahim, K. Deb, and A. Grami, "Machine learning-based framework to cover optimal Pareto-front in many-objective optimization," Complex & Intelligent Systems, vol. 8, no. 6, pp. 5287–5308, 2022.
[3] S. Sharma and V. Kumar, "A comprehensive review on multi-objective optimization techniques: Past, present and future," Archives of Computational Methods in Engineering, vol. 29, no. 7, pp. 5605–5633, 2022.
[4] K.-J. Du, J.-Y. Li, H. Wang, and J. Zhang, "Multi-objective multi-criteria evolutionary algorithm for multi-objective multi-task optimization," Complex & Intelligent Systems, pp. 1–18, 2022.
[5] T. Okabe, Y. Jin, and B. Sendhoff, "A critical survey of performance indices for multi-objective optimisation," in The 2003 Congress on Evolutionary Computation (CEC'03), vol. 2. IEEE, 2003, pp. 878–885.
[6] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.
[7] N. Riquelme, C. Von Lücken, and B. Baran, "Performance metrics in multi-objective optimization," in 2015 Latin American Computing Conference (CLEI), 2015, pp. 1–11.
[8] M. Li and X. Yao, "Quality evaluation of solution sets in multiobjective optimisation: A survey," ACM Computing Surveys (CSUR), vol. 52, no. 2, pp. 1–38, 2019.
[9] C. Audet, J. Bigeon, D. Cartier, S. Le Digabel, and L. Salomon, "Performance indicators in multiobjective optimization," European Journal of Operational Research, vol. 292, no. 2, pp. 397–422, 2021.
[10] E. Zitzler and L. Thiele, "Multiobjective optimization using evolutionary algorithms—A comparative case study," in International Conference on Parallel Problem Solving from Nature. Springer, 1998, pp. 292–301.
[11] D. A. Van Veldhuizen, Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations. Air Force Institute of Technology, 1999.
[12] C. A. C. Coello and N. C. Cortés, "Solving multiobjective optimization problems using an artificial immune system," Genetic Programming and Evolvable Machines, vol. 6, no. 2, pp. 163–190, 2005.
[13] J. R. Schott, "Fault tolerant design using single and multicriteria genetic algorithm optimization," Ph.D. dissertation, Massachusetts Institute of Technology, 1995.
[14] E. Zitzler and L. Thiele, "Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach," IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, pp. 257–271, 1999.
[15] N. Riquelme, C. Von Lücken, and B. Baran, "Performance metrics in multi-objective optimization," in 2015 Latin American Computing Conference (CLEI). IEEE, 2015, pp. 1–11.
[16] J. A. Nuh, T. W. Koh, S. Baharom, M. H. Osman, and S. N. Kew, "Performance evaluation metrics for multi-objective evolutionary algorithms in search-based software engineering: Systematic literature review," Applied Sciences, vol. 11, no. 7, p. 3117, 2021.
[17] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, "Performance assessment of multiobjective optimizers: An analysis and review," IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, 2003.
[18] S. Jiang, Y.-S. Ong, J. Zhang, and L. Feng, "Consistencies and contradictions of performance metrics in multiobjective optimization," IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2391–2404, 2014.
[19] G. G. Yen and Z. He, "Performance metric ensemble for multiobjective evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 18, no. 1, pp. 131–144, 2013.
[20] E. Zitzler, L. Thiele, and J. Bader, "On set-based multiobjective optimization," IEEE Transactions on Evolutionary Computation, vol. 14, no. 1, pp. 58–79, 2009.
[21] J. Yan, C. Li, Z. Wang, L. Deng, and D. Sun, "Diversity metrics in multi-objective optimization: Review and perspective," in 2007 IEEE International Conference on Integration Technology. IEEE, 2007, pp. 553–557.
[22] M. Ravber, M. Mernik, and M. Črepinšek, "The impact of quality indicators on the rating of multi-objective evolutionary algorithms," Applied Soft Computing, vol. 55, pp. 265–275, 2017.
[23] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.