WFG Toolkit
Abstract. This paper presents a new toolkit for creating scalable multi-
objective test problems. The WFG Toolkit is flexible, allowing char-
acteristics such as bias, multi-modality, and non-separability to be in-
corporated and combined as desired. A wide variety of Pareto optimal
geometries are also supported, including convex, concave, mixed con-
vex/concave, linear, degenerate, and disconnected geometries.
All problems created by the WFG Toolkit are well defined, are scal-
able with respect to both the number of objectives and the number of
parameters, and have known Pareto optimal sets. Nine benchmark multi-
objective problems are suggested, including one that is both multi-modal
and non-separable, an important combination of characteristics that is
lacking among existing (scalable) multi-objective problems.
1 Introduction
There have been several attempts to define test suites and toolkits for test-
ing multi-objective evolutionary algorithms (MOEAs) [1–4]. However, existing
multi-objective test problems do not test a wide range of characteristics, and
are often poorly designed. Typical defects include not being scalable and be-
ing susceptible to simple search strategies. Moreover, many problems are poorly
constructed, with unknown Pareto optimal sets, or featuring parameters with
poorly located optima.
As suggested for single-objective problems by Whitley et al. [5] and Bäck and
Michalewicz [6], test suites should include scalable problems that are resistant to
hill climbing strategies, are non-linear, non-separable¹, and multi-modal. Such
requirements are also a good start for multi-objective test suites, but unfortu-
nately are poorly represented in the literature.
Addressing this problem, this paper presents the Walking Fish Group (WFG)
Toolkit, which places an emphasis on allowing test problem designers to con-
struct scalable test problems with any number of objectives, where features such
¹ Separable problems can be optimised by considering each parameter in turn, independently of one another. A non-separable problem is thus characterised by parameter dependencies, is more difficult, and is more representative of real world problems.
corrected version: 22 June 2005
2 Terminology
Despite the variety of existing test problems, there is clear need for additional
work. At present there is no toolkit for creating problems with an arbitrary
number of objectives, where desirable features can easily be incorporated or
omitted as desired. We remedy this problem with our WFG Toolkit.
is always zero for convenience), where all z_{i,max} > 0. Note that all x_i ∈ x will have domain [0, 1].
Some observations can be made about the above formalism: substituting in
xM = 0 and disregarding all transition vectors provides a parametric equation
that covers and is covered by the Pareto optimal front of the actual problem,
working parameters can have dissimilar domains (which would encourage EAs
to normalise parameter domains), and employing dissimilar scaling constants
results in dissimilar Pareto optimal front tradeoff ranges (this is more represen-
tative of real world problems, and encourages EAs to normalise fitness values).
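The normalisation encouraged above is straightforward to sketch in code. The following minimal Python illustration uses hypothetical domain bounds z_{i,max} = 2i and scaling constants S_m = 2m (these particular values are ours, chosen to match the example problem later in the paper; the toolkit lets the designer pick them freely):

```python
# Sketch: normalising dissimilar parameter domains and fitness ranges.
# z_max and S are illustrative values (z_{i,max} = 2i, S_m = 2m).

def normalise_parameters(z, z_max):
    """Map each working parameter z_i in [0, z_max_i] onto [0, 1]."""
    return [zi / zmi for zi, zmi in zip(z, z_max)]

def normalise_fitness(f, S):
    """Map each objective value f_m in [0, S_m] back onto [0, 1]."""
    return [fm / sm for fm, sm in zip(f, S)]

n, M = 4, 3
z_max = [2.0 * i for i in range(1, n + 1)]   # dissimilar parameter domains
S = [2.0 * m for m in range(1, M + 1)]       # dissimilar tradeoff ranges

x = normalise_parameters([1.0, 4.0, 3.0, 8.0], z_max)
print(x)  # every x_i now lies in [0, 1]
```

An EA that performs this normalisation internally is unaffected by the dissimilar domains; one that does not will implicitly weight parameters and objectives unevenly.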
Shape functions determine the nature of the Pareto optimal front, and map parameters with domain [0, 1] onto the range [0, 1]. Each of h_{1:M} must be associated with a shape function. For example, letting h_1 = linear_1, h_{m=2:M−1} = convex_m, and h_M = mixed_M indicates that h_1 uses the linear shape function, h_M uses the mixed shape function, and all of h_{2:M−1} use convex shape functions.
Table 1 presents five different types of shape functions. Example Pareto op-
timal fronts constructed using these shape functions are given in Fig. 1.
Table 1. Shape functions. In all cases, x_1, ..., x_{M−1} ∈ [0, 1]. A, α, and β are constants.

Linear
  linear_1(x_1, ..., x_{M−1}) = ∏_{i=1}^{M−1} x_i
  linear_{m=2:M−1}(x_1, ..., x_{M−1}) = (∏_{i=1}^{M−m} x_i)(1 − x_{M−m+1})
  linear_M(x_1, ..., x_{M−1}) = 1 − x_1
When h_{m=1:M} = linear_m, the Pareto optimal front is a linear hyperplane, where ∑_{m=1}^{M} h_m = 1.

Convex
  convex_1(x_1, ..., x_{M−1}) = ∏_{i=1}^{M−1} (1 − cos(x_i π/2))
  convex_{m=2:M−1}(x_1, ..., x_{M−1}) = (∏_{i=1}^{M−m} (1 − cos(x_i π/2)))(1 − sin(x_{M−m+1} π/2))
  convex_M(x_1, ..., x_{M−1}) = 1 − sin(x_1 π/2)
When h_{m=1:M} = convex_m, the Pareto optimal front is purely convex.

Concave
  concave_1(x_1, ..., x_{M−1}) = ∏_{i=1}^{M−1} sin(x_i π/2)
  concave_{m=2:M−1}(x_1, ..., x_{M−1}) = (∏_{i=1}^{M−m} sin(x_i π/2)) cos(x_{M−m+1} π/2)
  concave_M(x_1, ..., x_{M−1}) = cos(x_1 π/2)
When h_{m=1:M} = concave_m, the Pareto optimal front is purely concave, and a region of the hyper-sphere of radius one centred at the origin, where ∑_{m=1}^{M} h_m² = 1.

Mixed convex/concave (α > 0, A ∈ {1, 2, ...})
  mixed_M(x_1, ..., x_{M−1}) = (1 − x_1 − cos(2Aπx_1 + π/2)/(2Aπ))^α
Causes the Pareto optimal front to contain both convex and concave segments, the number of which is controlled by A. The overall shape is controlled by α: when α > 1 or when α < 1, the overall shape is convex or concave respectively. When α = 1, the overall shape is linear.

Disconnected (α, β > 0, A ∈ {1, 2, ...})
  disc_M(x_1, ..., x_{M−1}) = 1 − (x_1)^α cos²(A(x_1)^β π)
Causes the Pareto optimal front to have disconnected regions, the number of which is controlled by A. The overall shape is controlled by α (when α > 1 or when α < 1, the overall shape is concave or convex respectively, and when α = 1, the overall shape is linear), and β influences the location of the disconnected regions (larger values push the location of disconnected regions towards larger values of x_1, and vice versa).
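The shape functions above translate directly into code. The following minimal Python transcription (function names and the chosen test point are ours, not part of the toolkit) also verifies the two geometric identities stated in the table, namely that the linear shapes sum to one and that the concave shapes lie on the unit hyper-sphere:

```python
import math

# Sketch of four shape functions from Table 1 (M objectives, inputs in [0, 1]).
# 1-based index m follows the paper; x = (x_1, ..., x_{M-1}).

def linear(x, m):
    M = len(x) + 1
    if m == 1:
        return math.prod(x)
    if m == M:
        return 1.0 - x[0]
    return math.prod(x[: M - m]) * (1.0 - x[M - m])  # x[M-m] is x_{M-m+1}

def concave(x, m):
    M = len(x) + 1
    if m == 1:
        return math.prod(math.sin(xi * math.pi / 2) for xi in x)
    if m == M:
        return math.cos(x[0] * math.pi / 2)
    return (math.prod(math.sin(xi * math.pi / 2) for xi in x[: M - m])
            * math.cos(x[M - m] * math.pi / 2))

def mixed(x, A=2, alpha=1.0):
    """h_M for the mixed convex/concave shape."""
    t = 1.0 - x[0] - math.cos(2 * A * math.pi * x[0] + math.pi / 2) / (2 * A * math.pi)
    return t ** alpha

def disc(x, A=2, alpha=1.0, beta=1.0):
    """h_M for the disconnected shape."""
    return 1.0 - x[0] ** alpha * math.cos(A * x[0] ** beta * math.pi) ** 2

M, x = 3, (0.3, 0.7)
print(sum(linear(x, m) for m in range(1, M + 1)))        # ~ 1.0 (hyperplane)
print(sum(concave(x, m) ** 2 for m in range(1, M + 1)))  # ~ 1.0 (unit hyper-sphere)
```

The two printed sums hold for any x in [0, 1]^{M−1}, which is what makes the linear and concave fronts easy to characterise analytically.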
[Fig. 1 omitted: plots of example Pareto optimal fronts constructed using the shape functions of Table 1.]
Transformation functions map input parameters with domain [0, 1] onto the
range [0, 1]. All transformation functions take a vector of parameters (called the
primary parameters) and map them to a single value. Transformation functions
may also employ constants and secondary parameters that further influence the
mapping. Primary parameters allow us to qualify working parameters as being
position- and distance-related.
There are three types of transformation functions: bias, shift, and reduction
functions. Bias and shift functions only ever employ one primary parameter,
whereas reduction functions can employ many.
Bias transformations have a natural impact on the search process by bias-
ing the fitness landscape. Shift transformations move the location of optima.
In the absence of any shift, all distance-related parameters would be extremal
parameters, with optimal value at zero. Shift transformations can be used to set
the location of parameter optima (subject to skewing by bias transformations),
which is useful if medial and extremal parameters are to be avoided. We rec-
ommend that all distance-related parameters be subjected to at least one shift
transformation.
The transformation functions are specified in Table 2 and plotted in Fig. 2.
To ensure problems are well designed, some restrictions apply as given in Table 3.
For brevity, we have omitted a weighted product reduction function (analogous
to the weighted sum reduction function).
By incorporating secondary parameters via a reduction function, b_param can create dependencies between distinct parameters, including position- and distance-related parameters. Moreover, when employed before any shift transformation, b_param can create objectives that are effectively non-separable — a separable optimisation approach would fail unless given multiple iterations, or a specific order of parameters to optimise.
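The effect of b_param can be sketched numerically. The closed form below is our reconstruction from the description of its bias exponent (u is mapped linearly from B at u = 0, through B + (C − B)A at u = 0.5, to C at u = 1), so it should be read as an illustration rather than a quotation of the toolkit; the constants are those used in the example problem later in the paper:

```python
import math

# Reconstructed sketch of the parameter dependent bias b_param(y, u, A, B, C):
# the primary parameter y is raised to an exponent that moves linearly from B
# (at u = 0) to C (at u = 1), where u is a reduction of the secondary
# parameters. This closed form is our own reconstruction, not a quotation.

def b_param(y, u, A, B, C):
    exponent = B + (C - B) * (A - (1.0 - 2.0 * u) * abs(math.floor(0.5 - u) + A))
    return y ** exponent

# Endpoint checks with illustrative constants (as in the later example):
A, B, C = 0.98 / 49.98, 0.02, 50.0
print(b_param(0.5, 0.0, A, B, C))  # y**B: only mild bias when u = 0
print(b_param(0.5, 1.0, A, B, C))  # y**C: severe bias when u = 1
```

Because the exponent depends on u, a secondary-parameter reduction, changing any parameter that feeds into u changes the fitness landscape seen by y, which is exactly the inter-parameter dependency described above.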
The deceptive and multi-modal shift transformations make the corresponding problem deceptive and multi-modal respectively⁵. When applied to position-related parameters, some regions of the Pareto optimal set can become difficult to find, and the mapping from the Pareto optimal set to the Pareto optimal front will be many-to-one (even when k = M − 1)⁶. When applied to distance-related parameters, finding any Pareto optimal solution becomes more difficult.
The flat region transformation can have a significant impact on the fitness landscape⁷, and can also be used to create a stark many-to-one mapping from the Pareto optimal front to the Pareto optimal set.
Creating problems with the WFG Toolkit involves three main steps: specifying
values for the underlying formalism (including scaling constants and parameter
domains), specifying the shape functions, and specifying transition vectors. To
aid in construction, a computer-aided design tool or meta-language could be used
to help select and connect together the different components making up the test
⁵ Multi-modal problems are difficult because an optimiser can become stuck in local optima. Deceptive problems (as defined by Deb [1]) exacerbate this difficulty by placing the global optimum in an unlikely place.
⁶ Many-to-one mappings from the Pareto optimal set to the Pareto optimal front present difficulties to the optimiser, as choices must be made between two otherwise equivalent parameter vectors.
⁷ Optimisers can have difficulty with flat regions due to a lack of gradient information.
A, B, C, and the secondary parameter vector y′ together determine the degree to which y is biased by being raised to an associated power: values of u(y′) ∈ [0, 0.5] are mapped linearly onto [B, B + (C − B)A], and values of u(y′) ∈ [0.5, 1] are mapped linearly onto [B + (C − B)A, C].
Shift: Linear (A ∈ (0, 1))
  s_linear(y, A) = |y − A| / |⌊A − y⌋ + A|
A is the value for which y is mapped to zero.
Shift: Deceptive (A ∈ (0, 1), 0 < B ≪ 1, 0 < C ≪ 1, A − B > 0, A + B < 1)
  s_decept(y, A, B, C) = 1 + (|y − A| − B) × ( ⌊y − A + B⌋(1 − C + (A − B)/B)/(A − B) + ⌊A + B − y⌋(1 − C + (1 − A − B)/B)/(1 − A − B) + 1/B )
A is the value at which y is mapped to zero, and the global minimum of the transformation. B is the “aperture” size of the well/basin leading to the global minimum at A, and C is the value of the deceptive minima (there are always two deceptive minima).
Shift: Multi-modal (A ∈ {1, 2, ...}, B ≥ 0, (4A + 2)π ≥ 4B, C ∈ (0, 1))
  s_multi(y, A, B, C) = ( 1 + cos((4A + 2)π(0.5 − |y − C|/(2(⌊C − y⌋ + C)))) + 4B(|y − C|/(2(⌊C − y⌋ + C)))² ) / (B + 2)
A controls the number of minima, B controls the magnitude of the “hill sizes” of the multi-modality, and C is the value for which y is mapped to zero. When B = 0, 2A + 1 values of y (one at C) are mapped to zero, and when B ≠ 0, there are 2A local minima, and one global minimum at C. Larger values of A and smaller values of B create more difficult problems.
Reduction: Weighted Sum (|w| = |y|, w_1, ..., w_{|y|} > 0)
  r_sum(y, w) = ( ∑_{i=1}^{|y|} w_i y_i ) / ( ∑_{i=1}^{|y|} w_i )
By varying the constants of the weight vector w, EAs can be forced to treat parameters differently.
Reduction: Non-separable (A ∈ {1, ..., |y|}, |y| mod A = 0)
  r_nonsep(y, A) = ( ∑_{j=1}^{|y|} ( y_j + ∑_{k=0}^{A−2} |y_j − y_{1+(j+k) mod |y|}| ) ) / ( (|y|/A) ⌈A/2⌉ (1 + 2A − 2⌈A/2⌉) )
A controls the degree of non-separability (noting that r_nonsep(y, 1) = r_sum(y, {1, ..., 1})).
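The shift and reduction definitions above translate directly into code. A minimal Python sketch of s_linear, r_sum, and r_nonsep (the names follow Table 2; the implementation and test values are our own) also checks the degeneracy noted in the table, that r_nonsep with A = 1 reduces to an unweighted r_sum:

```python
import math

# Sketch of three transformation functions from Table 2.

def s_linear(y, A):
    """Linear shift: y = A maps to zero; 0 < A < 1."""
    return abs(y - A) / abs(math.floor(A - y) + A)

def r_sum(y, w):
    """Weighted-sum reduction of the primary parameters y."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def r_nonsep(y, A):
    """Non-separable reduction; A must divide |y|, 1 <= A <= |y|."""
    n = len(y)
    num = sum(y[j] + sum(abs(y[j] - y[(j + 1 + k) % n]) for k in range(A - 1))
              for j in range(n))
    denom = (n / A) * math.ceil(A / 2) * (1 + 2 * A - 2 * math.ceil(A / 2))
    return num / denom

y = [0.2, 0.4, 0.6, 0.8]
print(s_linear(0.35, 0.35))   # 0.0: the optimum has been shifted to A
print(r_sum(y, [1.0] * 4))    # 0.5: plain mean
print(r_nonsep(y, 1))         # 0.5: degenerates to r_sum with equal weights
```

Larger A in r_nonsep couples each y_j to more of its neighbours, which is how the degree of non-separability is dialled up.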
[Fig. 2 panels (a)–(f) omitted. (g) r_sum for two parameters (w_1 = 1, w_2 = 5). (h) r_nonsep for two parameters (A = 2).]
Fig. 2. Example transformations. Each example plots the value of the input primary
parameter(s) versus the result of the transformation.
problem. With the use of sensible default values, the test problem designer then
need only specify which features of interest they desire in the test problem. An
example scalable test problem is specified in Table 4 and expanded in Fig. 3.
Type      Setting
Constants  S_{m=1:M} = 2m; A_{1:M−1} = 1
  The settings for S_{1:M} ensure the Pareto optimal front will have dissimilar tradeoff magnitudes, and the settings for A_{1:M−1} ensure the Pareto optimal front is not degenerate.
Domains    z_{i=1:n,max} = 2i
  The working parameters have domains of dissimilar magnitude.
Shape      h_{m=1:M} = concave_m
  The purely concave Pareto optimal front facilitates the use of some performance metrics, where the distance of a solution to the nearest point on the Pareto optimal front must be determined.
t1         t1_{i=1:n−1} = b_param(y_i, r_sum({y_{i+1}, ..., y_n}, {1, ..., 1}), 0.98/49.98, 0.02, 50)
           t1_n = y_n
  By employing the parameter dependent bias transformation, this transition vector ensures that distance- and position-related working parameters are inter-dependent and somewhat non-separable.
t2         t2_{i=1:k} = s_decept(y_i, 0.35, 0.001, 0.05)
           t2_{i=k+1:n} = s_multi(y_i, 30, 95, 0.35)
  This transition vector makes some parts of the Pareto optimal front more difficult to determine (due to the deceptive transformation), and also makes it more difficult to converge to the Pareto optimal front (due to the multi-modal transformation). The multi-modality is similar to Rastrigin’s function, with many local optima (61^l − 1), and one global optimum, where the “hill size” between adjacent local optima is relatively small.
t3         t3_{i=1:M−1} = r_nonsep({y_{(i−1)k/(M−1)+1}, ..., y_{ik/(M−1)}}, k/(M − 1))
           t3_M = r_nonsep({y_{k+1}, ..., y_n}, l)
  This transition vector ensures that all objectives are non-separable, and also reduces the number of parameters down to M, as required by the framework.
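The zero points of transition vector t2 can be verified numerically. The following Python sketch of s_decept and s_multi (our own transcription of the Table 2 definitions) uses the constants from the table above, and confirms that both shifts map y = 0.35, the intended optimum, to zero:

```python
import math

# Sketch of the deceptive and multi-modal shifts used in transition vector t2.

def s_decept(y, A, B, C):
    """Deceptive shift: global minimum (value 0) at y = A."""
    return 1.0 + (abs(y - A) - B) * (
        math.floor(y - A + B) * (1.0 - C + (A - B) / B) / (A - B)
        + math.floor(A + B - y) * (1.0 - C + (1.0 - A - B) / B) / (1.0 - A - B)
        + 1.0 / B)

def s_multi(y, A, B, C):
    """Multi-modal shift: global minimum (value 0) at y = C."""
    t = abs(y - C) / (2.0 * (math.floor(C - y) + C))
    return (1.0 + math.cos((4 * A + 2) * math.pi * (0.5 - t)) + 4.0 * B * t * t) / (B + 2.0)

# With the constants from the table, both shifts map y = 0.35 to zero:
print(s_decept(0.35, 0.35, 0.001, 0.05))  # ~ 0.0
print(s_multi(0.35, 30, 95, 0.35))        # ~ 0.0
```

Any y away from 0.35 is pushed towards the deceptive minima of s_decept or the 2A local hills of s_multi, which is what makes convergence to the optimum difficult.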
which can be found by first determining z_n, then z_{n−1}, and so on, until the required value for z_{k+1} is determined. Once the optimal values for z_{k+1:n} are determined, the position-related parameters can be varied arbitrarily to obtain different Pareto optimal solutions.
The example problem has a distinct many-to-one mapping from the Pareto
optimal set to the Pareto optimal front due to the deceptive transformation of
the position-related parameters. All objectives are non-separable, deceptive, and
multi-modal, the latter with respect to the distance component. The problem is
also biased in a parameter dependent manner.
This example constitutes a well designed scalable problem that is both non-
separable and multi-modal — we are not aware of any problem in the literature
with comparable characteristics.
6 Implications
In this section, we propose a test suite that consists of nine scalable, multi-
objective test problems (WFG1–WFG9) that focuses on some of the more perti-
nent problem characteristics. Table 6 specifies WFG1–WFG9, the properties of
which are summarised in Table 7.
We make the following additional observations. WFG1 skews the relative significance of different parameters by employing dissimilar weights in its weighted sum reduction. Only WFG1 and WFG7 are both separable and uni-modal. The non-separable reduction of WFG6 and WFG9 is more difficult than that of WFG2 and WFG3. The multi-modality of WFG4 has larger “hill sizes” (and is thus more difficult) than that of WFG9. The deceptiveness of WFG5 is more difficult than that of WFG9 (WFG9 is only deceptive on its position parameters). The position-related parameters of WFG7 are dependent on its distance-related parameters (and other position-related parameters); WFG9 employs a similar type of dependency, but its distance-related parameters also depend on other distance-related parameters. The distance-related parameters of WFG8 are dependent on its position-related parameters (and other distance-related parameters), and as a consequence the problem is non-separable. Finally, the predominance of concave Pareto optimal fronts facilitates the use of performance metrics that require knowledge of the distance to the Pareto optimal front.
For WFG1–WFG7, a solution is Pareto optimal iff all z_{i=k+1:n} = 2i × 0.35, noting WFG2 is disconnected. For WFG8, it is required that all of:
  z_{i=k+1:n} = 2i × 0.35^( (0.02 + 49.98(0.98/49.98 − (1 − 2u)|⌊0.5 − u⌋ + 0.98/49.98|))^(−1) )
Table 6. The WFG test suite. The number of position-related parameters, k, must be divisible by the number of underlying position parameters, M − 1 (this simplifies reductions). The number of distance-related parameters, l, can be set to any positive integer, except for WFG2 and WFG3, for which l must be a multiple of two (due to the nature of their non-separable reductions). To enhance readability, for any transition vector t^i, we let y = t^{i−1}. For t^1, let y = z_{[0,1]} = {z_1/2, ..., z_n/(2n)}.
[Figure omitted: plot of f_2 versus f_1.]
⁸ The DTLZ test suite uses a fixed (relative to the number of objectives) number of position parameters.
eters. The WFG test suite provides a fairer means of assessing the true perfor-
mance of optimisation algorithms on a wider range of different problems.
8 Conclusions
The WFG Toolkit offers a substantial range of features. Test problem designers
can construct problems with a diverse range of Pareto optimal geometries and
can incorporate a variety of important features in the manner of their choosing. A
suite of nine test problems is presented that exceeds the functionality of existing
test suites. Significantly, the WFG Toolkit allows for the construction of scalable
problems that are both non-separable and multi-modal. Given the relevance of
both characteristics to real world problems and the corresponding lack of such
problems in the literature, the WFG Toolkit offers an important contribution in
assessing the quality of optimisation algorithms on these types of problems.
Acknowledgments
This work was partly supported by an Australian Research Council linkage grant.