An Analysis of The Search Mechanisms of The Bees Algorithm
DOI: 10.1016/j.swevo.2020.100746
License: Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
Document Version: Early version, also known as pre-print
Abstract
The Bees Algorithm has been successfully applied for over a decade to a large
number of optimisation problems. However, a mathematical analysis of its
search capabilities, the effects of different parameters used, and various de-
sign choices has not been carried out. As a consequence, optimisation of the
Bees Algorithm has so far relied on trial-and-error experimentation. This paper
formalises the Bees Algorithm in a rigorous mathematical description, beyond
the qualitative biological metaphor. A review of the literature is presented,
highlighting the main variants of the Bees Algorithm, and its analogies and dif-
ferences compared with other optimisation methods. The local search procedure
of the Bees Algorithm is analysed, and the results experimentally checked. The
analysis shows that the progress of local search is mainly influenced by the size
of the neighbourhood and the stagnation limit in the site abandonment proce-
dure, rather than the number of recruited foragers. In particular, the analysis
underlines the trade-off between the step size of local search (a large neigh-
bourhood size favours quick progress) and the likelihood of stagnation (a small
neighbourhood size prevents premature site abandonment). For the first time,
the implications of the choice of neighbourhood shape on the character of the lo-
cal search are clarified. The paper reveals that, particularly in high-dimensional
spaces, hyperspherical neighbourhoods allow greater search intensification than
hypercubic neighbourhoods. The theoretical results obtained in this paper are
in good agreement with the findings of several experimental studies. It is hoped
that the new mathematical formalism here introduced will foster further under-
standing and analysis of the Bees Algorithm, and that the theoretical results
obtained will provide useful parameterisation guidelines for applied studies.
Keywords: Bees Algorithm, Optimisation, Statistical Analysis
∗ Corresponding author: Marco Castellani
Email addresses: [email protected] (Luca Baronti), [email protected]
(Marco Castellani), [email protected] (Duc Truong Pham)
2. The Bees Algorithm

The Bees Algorithm iteratively looks for better solutions to a specified opti-
misation problem. The algorithm is terminated when a given stopping criterion
is met (e.g. a pre-set number of optimisation cycles has elapsed, a solution of
satisfactory quality is found). Despite minor differences, the notation concern-
ing the main parameters and operators of the Bees Algorithm is consistent in
the literature. With some minor changes, it is also used in this paper:
• ns number of scout bees used only in the global search;
• nb number of sites where local search is performed;
• nr number of recruited forager bees for each of the nb sites;
• stlim number of cycles of local stagnation before a site is abandoned;
• ngh initial neighbourhood size of the nb sites;
• α neighbourhood shrinking parameter (0 < α < 1);
In the standard Bees Algorithm, the parameter ns describes the total number
of scouts used for random exploration (here ns) plus the number of scouts
(nb) marking the neighbourhoods (sites) selected for local search. That is,
$ns_{standard} = ns + nb$. Also, it is customary to allocate a larger number of
foragers (nre) to the very best ne < nb (elite) sites, and fewer (nrb < nre) to the
remaining nb − ne best sites. This distinction is not necessary for the analysis
proposed in this paper, and for the sake of compactness is dropped. Henceforth,
the parameter nr will refer likewise to nre or nrb.
In this study only continuous optimisation is considered, and each solution
is represented by an N-dimensional vector of real-valued decision variables
$s^g = \{s^g[1], \dots, s^g[N]\} \in \mathbb{R}^N$. The solutions are evaluated by a fitness function F
specific to the problem domain, which the algorithm aims to maximise. The
analysis of this paper is equally valid for a minimisation problem ($\min\{F(\cdot)\} \equiv \max\{-F(\cdot)\}$).
In this paper, each of the nb sites $s \in \{s^{(1)}, \dots, s^{(nb)}\}$ selected for local
search is described by a centre $s^g$ and two additional variables: the time-to-live
integer variable $s^{ttl}$, and the local search edge $s^e$. The time-to-live variable $s^{ttl}$
is a counter that indicates the number of remaining cycles of stagnation before
the site is abandoned. The edge $s^e$ defines the current spatial extent (henceforth
called search scope) of the local search.
For the sake of simplicity, unless otherwise stated, all the decision variables
will be henceforth defined in the same interval. Accordingly, $s^e$ and ngh are
scalars, and the search scope at a given site s is delimited by a hypercube C
of edge $s^e$ centred in the solution $s^g$. Hereafter, this region will be indicated
as $C(s^g, s^e)$. Local search is performed by uniformly sampling nr solutions inside
$C(s^g, s^e)$. In the general case that the interval of definition is not equal for
all parameters, $s^e$ and ngh will be defined as vectors of size N. In this case,
local search is performed inside a box (i.e. an N-orthotope) of edges
$s^e = \{s^e[1], \dots, s^e[N]\}$ centred in $s^g$. When relevant, the consequences of using box
sampling rather than cubic sampling will be discussed.
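To make the sampling step concrete, the following minimal sketch (an illustration of ours, with function names of our choosing) draws nr candidate solutions uniformly inside the box $C(s^g, s^e)$; the hypercubic case corresponds to all edge lengths being equal:

```python
import random

def sample_foragers(centre, edges, nr):
    """Draw nr solutions uniformly inside the box C(centre, edges).

    centre: the site centre s^g, a list of N floats.
    edges:  the edge lengths s^e, a list of N floats (all equal in the
            hypercubic case of a scalar edge).
    """
    return [[random.uniform(c - e / 2, c + e / 2)
             for c, e in zip(centre, edges)]
            for _ in range(nr)]

# Example: nr = 5 foragers in a 3-dimensional hypercube of edge 0.2.
foragers = sample_foragers([0.5, 0.5, 0.5], [0.2] * 3, 5)
```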
The algorithm steps are described in box 1. Except for minor changes (i.e.
no elite sites), the procedure described in box 1 can be regarded as the
Standard Bees Algorithm (SBA [5]).

Box 1: Bees Algorithm: Main Steps
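The pseudocode of box 1 is not reproduced in this version. The following Python sketch is our reconstruction of the main loop from the steps summarised in this section, with the simplifications adopted in this paper (no elite sites) and without memory of abandoned sites:

```python
import random

def bees_algorithm(f, bounds, ns, nb, nr, ngh, alpha, stlim, max_cycles):
    """Minimal illustrative sketch of the simplified Bees Algorithm
    analysed in this paper (maximisation, no elite sites).

    bounds: list of (low, high) pairs, one per decision variable.
    Each site stores the paper's variables: centre s^g, fitness F(s^g),
    edge s^e and time-to-live s^ttl.
    """
    rand_sol = lambda: [random.uniform(lo, hi) for lo, hi in bounds]
    new_site = lambda x: {'g': x, 'F': f(x), 'e': ngh, 'ttl': stlim}
    # Initialisation: rank ns + nb random scouts, keep the nb best as sites.
    sites = sorted((new_site(rand_sol()) for _ in range(ns + nb)),
                   key=lambda s: s['F'], reverse=True)[:nb]
    for _ in range(max_cycles):
        for s in sites:
            # Local search: nr foragers sampled uniformly in C(s^g, s^e).
            foragers = [[min(max(c + random.uniform(-s['e'] / 2, s['e'] / 2),
                                 lo), hi)
                         for c, (lo, hi) in zip(s['g'], bounds)]
                        for _ in range(nr)]
            best = max(foragers, key=f)
            if f(best) > s['F']:
                s['g'], s['F'], s['ttl'] = best, f(best), stlim  # progress
            else:
                s['e'] *= alpha   # step 3b: neighbourhood shrinking
                s['ttl'] -= 1     # one more cycle of stagnation
        survivors = [s for s in sites if s['ttl'] > 0]  # site abandonment
        # Global search: refill with random scouts, keep the nb fittest seeds.
        scouts = [new_site(rand_sol()) for _ in range(ns + nb - len(survivors))]
        sites = sorted(survivors + scouts, key=lambda s: s['F'],
                       reverse=True)[:nb]
    return max(sites, key=lambda s: s['F'])['g']

# Example: maximise -sum(x^2) over [-5, 5]^2.
best = bees_algorithm(lambda x: -sum(v * v for v in x), [(-5, 5)] * 2,
                      ns=10, nb=5, nr=10, ngh=1.0, alpha=0.8, stlim=10,
                      max_cycles=50)
```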
In the neighbourhood shrinking procedure (described in step 3b), if the interval
of definition of the parameters is not the same for all variables, the i-th
dimension of the box is reduced as $s^e[i] = \alpha\, s^e[i]$. The initialisation and site
abandonment procedures are designed to keep the sampling rate of the solution
space constant at each generation¹.
Local search aims to find the fitness optimum within a neighbourhood cen-
tred on a promising solution. Because the centre of the neighbourhood is up-
dated as better solutions are found (step 3), the scope of local search dynamically
changes, and eventually includes the local attractor point in the search space
(i.e. a local optimum). It should be noted that, like any stochastic optimisa-
tion procedure, local search is not guaranteed to stop at the local optimum. In
particular, local search may be prematurely abandoned when (i) it stagnates
for stlim iterations (e.g. stops on a flat surface) or (ii) global search finds more
promising regions (fitter solutions) elsewhere in the search space (step 2).
Global random search aims to find previously unexplored regions of high
fitness in the search space. Global search can also be used to increase adaptation
to changes in dynamic fitness landscapes. The solutions found via local (i.e. the
centres of the nb neighbourhoods) and global search are ranked at the end of
every optimisation cycle, and the fittest nb solutions are kept as seeds (centres)
for the next optimisation cycle. As the local exploitation of one given site
progresses, the probability that this site is abandoned because random search
found a fitter solution decreases. For this reason, some authors do not use global
search [5], or give randomly generated solutions (young bees) time to ’grow up’
[11].
¹ This is particularly useful when the performance of the BA is compared with the performance of other algorithms.
Many recruitment, neighbourhood alteration, and site abandonment heuristics
have been proposed in the literature.
Ghanbarzadeh [16] proposed two methods for setting the number of recruited
foragers proportionally to a) the fitness or b) the location of the sites. Other
authors proposed recruitment schemes where the number of foragers was pro-
portional to the fitness of the site, and decreased it progressively by a fixed
amount [13], or according to a fuzzy logic policy [17]. Pham et al. [18] used
Kalman filtering to allocate the number of bees to the sites selected for local search.
This strategy was used to train a Radial Basis Function neural network, and
improved the learning accuracy and speed of the neural network. Finally, Iman-
guliyev [19] proposed a recruitment scheme where the number of foragers for a
site was computed on the efficiency rate of the site, rather than its fitness score.
In its basic instance [16], the search scope of a site is changed (reduced)
when local search fails to improve. Ahmad [20] proposed two different methods
to change dynamically the neighbourhood of a site: a) BA-NE where the search
scope is increased if a better solution is found and kept invariant otherwise,
and b) BA-AN, where the neighbourhood is asymmetrically increased along the
direction that led to the last improvement and decreased otherwise.
When a site is abandoned, the best-so-far local solution is usually kept in
memory [5]. However, in some cases [13, 21] all the local solutions found before
abandoning a site are retained for later use. In Hierarchical Site Abandonment
[17], when a site s is abandoned, all the other sites with fitness less than or
equal to that of s are abandoned too.
4. Related Techniques
4.1. Variable Neighbourhood Search

Variable Neighbourhood Search: Main Steps

1. Generate an initial solution x and set i = 1;
2. Take neighbourhood $S_i$:
(a) sample a solution v uniformly inside the neighbourhood $S_i$;
(b) apply a local search procedure using v as seed to find a new
solution v′;
i. if v′ is fitter than x, set x = v′ and i = 1;
ii. else set i = i + 1;
3. If i > k, terminate the algorithm and return the best found solution,
otherwise iterate from step 2;
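A compact sketch of these steps (an illustration of ours, maximisation, with a generic stand-in for the local search procedure):

```python
import random

def hill_climb(f, x, step=0.01, iters=100):
    """A generic local refinement routine (illustrative stand-in)."""
    for _ in range(iters):
        y = [c + random.uniform(-step, step) for c in x]
        if f(y) > f(x):
            x = y
    return x

def vns(f, x, radii, local_search=hill_climb):
    """Sketch of the VNS steps in the box above (maximisation).
    radii: sizes of the nested neighbourhoods S_1 ... S_k."""
    i, k = 1, len(radii)
    while i <= k:
        r = radii[i - 1]
        v = [c + random.uniform(-r, r) for c in x]   # step 2a: shake in S_i
        v1 = local_search(f, v)                      # step 2b: refine v
        if f(v1) > f(x):
            x, i = v1, 1    # step 2(b)i: improvement, restart from S_1
        else:
            i += 1          # step 2(b)ii: try the next neighbourhood
    return x                # step 3: i > k, return best found solution

# Example: maximise -sum(x^2) starting from [1.0, 1.0].
print(vns(lambda v: -sum(c * c for c in v), [1.0, 1.0], [0.1, 0.5, 1.0]))
```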
4.2. LJ Search
The LJ Search Method was successfully used to optimise feedback control
in nonlinear systems [33], as well as time-optimal [34] and time-delay [35, 36]
systems. Given an N-dimensional minimisation problem, the LJ Search Method
pseudocode is:
LJ Search: Main Steps

$$s^e_{t+1} = \alpha\, s^e_t$$
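Only the contraction rule of the LJ box survives in this version. The following minimal sketch (ours) is consistent with the description given here and in the Discussion: one uniform sample per iteration in a box centred on the current best solution, with unconditional shrinking of the box edge:

```python
import random

def lj_search(f, x, edge, alpha, iterations):
    """Minimal LJ-style loop (minimisation): one uniform sample per
    iteration in a box centred on the current best, with unconditional
    contraction of the box edge, e_{t+1} = alpha * e_t."""
    for _ in range(iterations):
        y = [c + random.uniform(-edge / 2, edge / 2) for c in x]
        if f(y) < f(x):   # keep the candidate only if it improves
            x = y
        edge *= alpha     # shrink regardless of progress, unlike the BA
    return x

# Example: minimise a quadratic from a poor starting point.
print(lj_search(lambda v: sum(c * c for c in v), [3.0, -2.0],
                edge=4.0, alpha=0.95, iterations=200))
```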
estimated from an a posteriori analysis of the 'lifetime' of a generic site s, from
its discovery by a scout to abandonment when local search stalls (i.e. it fails
to progress for stlim iterations). The case where the site is replaced by a more
promising site found via global search is not included. If needed, the results of
the analysis below are applicable to describe the behaviour of local search from
any point in time, not necessarily the discovery of the site, until abandonment.
Importantly, the final solution may not be the local optimum, that is, local
search may only provide an approximation of the local optimum.
In the following, the solutions visited during the lifetime of a site are denoted as:
• $s^g_1$ is the centre of the site when local search begins;
• $s^g_t$ with 1 < t < n is the site centre after t local search cycles;
• $s^g_n$ is the final result of the local search, namely, the neighbourhood centre
at the last local search cycle, before the site is abandoned (i.e. $s^{ttl}_n = 0$);
Local search at a site generates a series of solutions:

$$S = \{s^g_1, \dots, s^g_n\} \tag{1}$$

The solution found in the t-th local search cycle can be formalised as the
result of the following endomorphic function (maximisation problem):

$$s^g_{t+1} = L_{nr}(s^g_t) = \arg\max \{F(v) \mid v \in \{s^g_t, x_1, \dots, x_{nr}\}\}, \quad x_i \sim U(C(s^g_t, s^e_t)) \tag{2}$$

A solution $L_{opt}$ is a local optimum of F if there is an $\epsilon > 0$ such that $F(L_{opt}) \geq F(v)$
for all $v \in C(L_{opt}, \epsilon)$. If $L_{opt}$ is the optimum of the subregion $C(s^g_t, s^e_t)$, the
operator $L_{nr}$ provides a stochastic approximation of the local optimum within
$C(s^g_t, s^e_t)$. The expected quality of this approximation increases monotonically
with the number nr of candidate solutions sampled:

$$L_{opt} = \lim_{nr \to \infty} L_{nr}(s) = \arg\max_v \{F(v) \mid v \in C(s^g_t, s^e_t)\} \tag{4}$$
The series of solutions S defined in eq. (1) shares the same convergence properties
as the LJ Search, proved in [37]. Namely, without site abandonment,
a number of steps n exists such that the series of solutions S will eventually
converge to a local optimum.
Due to the monotonically increasing nature of the series $F(s^g_1), \dots, F(s^g_n)$
(see eq. (2)), $s^g_n$ is the best solution found in the n iterations of local search at
site s.
The standard neighbourhood shrinking heuristic can be formally defined for
a hypercube as follows (0 < α < 1):

$$s^e_{t+1} = \begin{cases} s^e_t & L_{nr}(s^g_t) \neq s^g_t \\ \alpha\, s^e_t & L_{nr}(s^g_t) = s^g_t \end{cases} \tag{5}$$
Proposition 1. At any cycle t, the site centre remains within the search scope
of the following cycle: $s^g_t \in C(s^g_{t+1}, s^e_{t+1})$.

Proof. The proof is trivial: if local search stagnates at cycle t, $s^g_{t+1} = s^g_t$ and
$s^e_{t+1} < s^e_t$ (neighbourhood shrinking). Then $s^g_t \in C(s^g_{t+1}, s^e_{t+1}) = C(s^g_t, s^e_{t+1})$. If
local search progresses at cycle t, $s^g_{t+1} \neq s^g_t$ and $s^e_{t+1} = s^e_t$ (no neighbourhood
shrinking). Remembering that $s^g_{t+1} \in C(s^g_t, s^e_t)$, it follows that
$s^g_t \in C(s^g_{t+1}, s^e_t) = C(s^g_{t+1}, s^e_{t+1})$.
This property also holds when box or hyperspherical sampling is used.
5.2. Bounds on Reach
Hereafter, the distance in the solution space that local search is able to cover
at a given site in a given number of cycles will be indicated as the reach of local
search. That is, the reach is the distance between the starting point of local
search (s1 ) and the best approximation of the local optimum after n cycles (sn ),
namely d(s1 , sn ). The upper and lower boundaries of the reach are defined as
follows:
Proposition 2 (Reach). The reach of local search in n learning cycles at a
given site s centred on solution $s^g_1$ is bounded within the interval
$\left[0,\ n\,\frac{s^e_1\sqrt{N}}{2}\right]$, where $s^e_1$ is the site edge at the start of the search.
Proof. Minimum reach occurs when local search stalls from the very beginning,
namely $s^g_t = s^g_{t+1}\ \forall t \in \{1, \dots, n = stlim\}$, and thus $s^g_1 = s^g_n$.
At cycle t, local search is bounded within the hypercube C centred in $s^g_t$,
where the farthest solutions lie at the vertices of C. To attain maximum
reach, local search must progress at each cycle, so that the initial site edge $s^e_1$
is not reduced (i.e. no neighbourhood shrinking). The maximum step size per
cycle is bounded by the distance between the centre and a vertex of the
N-dimensional hypercube C, that is $d_v = s^e_1\sqrt{N}/2$. The upper bound of the
reach at a given site is therefore n times $d_v$.
Proposition 2 gives the boundaries of 'how far' local search can travel in n
learning cycles. The maximum step size is achievable only when the segment
that joins $s^g_1$ to $s^g_n$ is parallel to the diagonal of the hypercube C, and every pair
of subsequent solutions $s^g_t$ and $s^g_{t+1}$ (1 ≤ t < n) are distant $d(s^g_t, s^g_{t+1}) = s^e_t\sqrt{N}/2$.
For example, this would be the case of a fitness landscape consisting of a sloped
hyperplane aligned with the diagonal of the hypercube C, or a hypersphere of
centre c lying in the direction of one of the diagonals of the hypercube centred
in $s^g_1$.
Considering unitary time steps per iteration, the reach can be regarded
as a measure of the 'travelling speed' of the local search in the solution space.
Closely related to the reach is the convergence time of the local search (i.e.
the number of iterations taken to reach the local attractor). If the distance
between the centre of the site $s^g_1$ and the optimum $L_{opt}$ is $d(s^g_1, L_{opt})$, according
to Proposition 2 the minimum number of iterations $n_{min}$ required to reach $L_{opt}$
is:

$$n_{min} = \frac{2\, d(s^g_1, L_{opt})}{s^e_1 \sqrt{N}} \tag{9}$$

This is the lower bound on the convergence time, and can be used to evaluate
the efficiency of local search on different fitness landscapes.
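As a quick numerical illustration of eq. (9) (values of our choosing, not from the paper):

```python
import math

# A 4-dimensional problem whose local optimum lies at distance 2.0 from the
# initial site centre, with initial edge s^e_1 = 0.5.
N, d, edge1 = 4, 2.0, 0.5
step_max = edge1 * math.sqrt(N) / 2      # largest possible step per cycle: 0.5
n_min = 2 * d / (edge1 * math.sqrt(N))   # eq. (9): at least 4 cycles
print(step_max, n_min)
```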
In the more general case of asymmetric boundaries (i.e. box sampling), the
maximum reach can be computed as follows:

$$max_{reach} = \frac{n}{2}\sqrt{\sum_{i=0}^{N-1} s^e_1[i]^2} \tag{10}$$
where $s^e_1[i]$ is the i-th component of vector $s^e_1$. Finally, it is worth mentioning
that most, if not all, BA variants use a hypercube to define the scope of local
search. Consequently, the foragers are sampled inside an anisotropic region, and
the maximum reach depends on the orientation of the segment that joins $s^g_1$ to
$L_{opt}$ with respect to the diagonal of the hypercube.
Proposition 3 (Expected Step Size). Let local search be performed on a
one-dimensional, monotonically increasing fitness landscape, with search scope the
interval $[0, \ell]$ centred on $s^g_t = \ell/2$. The expected step size of one local search
cycle is:

$$d(s^g_t, s^g_{t+1}) = \ell\,\frac{0.5^{nr+1} + nr}{nr + 1} - \frac{\ell}{2} \tag{11}$$
Proof. See electronic appendix.
The result of proposition 3 is valid for any locally monotonic fitness slope,
as long as C is fully in the monotonic region. Its validity extends also to
multi-dimensional surfaces such as regions of hyperplanes or hyperspheres. For this
to happen, the search scope must be isotropic (i.e. a hypersphere), the fitness
landscape must be strictly monotonic along the straight line joining the centre
of C to the local fitness maximum, and the fitness landscape inside the search
scope must be symmetric with respect to said straight line. If these conditions
are verified, the expected step size will be given by eq. (11) for the direction
where the slope is monotonic, and zero (no bias) in the other directions. The
above conditions apply in the common case where local search is climbing one
side of a fairly regular hill or slope, but C does not yet include the fitness
maximum.
Far from the peak, where the curvature of the sphere is small, the spread of
the solutions on the fitness landscape is large, and indistinguishable from the
spread on the planar surface. Near the hill top, where the curvature is large,
the solutions are tightly clustered near the fitness maximum. This behaviour
suggests that local search becomes increasingly focused and exploitative as it
approaches the local fitness maximum.
Figure 1 also shows little difference between the spread of the solutions
obtained using 10 and 20 foragers. Indeed, the expected step size grows in
sublinear fashion with the number of foragers (eq. (11)). Figure 2 shows how the
average step size of $10^4$ independent local search trials varies with the number
of foragers (nr). Also in this case, the search trials were performed in a circle
of radius 0.5 centred in {0.5, 0.5}, and the plot shows the result of local search
($s_{t+1}$) along the direction of the slope of an inclined plane. The numerical
averages (blue dots) are in good agreement with the theoretical expectations of
eq. (11) (red line). The plot highlights how the result of local search quickly
reaches the borders of the neighbourhood, that is, the asymptotic value of 1. In
general, it can be said that the size of the neighbourhood, more than the number
of foragers, determines the ability of local search to quickly climb (descend) a
fitness slope.
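Equation (11) is easy to verify with a short Monte Carlo experiment in the one-dimensional setting of the proof (an illustrative sketch of ours; the circular-scope experiment of Figure 2 is analogous):

```python
import random

def expected_step(nr, l=1.0):
    """Predicted expected step size on a 1-D monotonic slope, eq. (11)."""
    return l * (0.5 ** (nr + 1) + nr) / (nr + 1) - l / 2

def simulated_step(nr, l=1.0, trials=10_000):
    """Monte Carlo estimate: scope [0, l], centre at l/2, fitness increasing
    with x, so the new centre is the best of nr uniform samples and the
    old centre."""
    total = 0.0
    for _ in range(trials):
        best = max([random.uniform(0, l) for _ in range(nr)] + [l / 2])
        total += best - l / 2
    return total / trials

for nr in (1, 10, 20):
    print(nr, round(expected_step(nr), 4), round(simulated_step(nr), 4))
```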
This section analyses the probability that a site may be abandoned due to
lack of progress of local search. At any cycle t, the search scope can be
partitioned into the subregion of solutions that do not improve on the site centre
and the subregion of solutions that do:

$$LR_{s_t} = \{v \in C(s^g_t, s^e_t) \mid F(v) \leq F(s^g_t)\} \qquad GR_{s_t} = \{v \in C(s^g_t, s^e_t) \mid F(v) > F(s^g_t)\} \tag{12}$$

where

$$LR_{s_t} \cup GR_{s_t} = C(s^g_t, s^e_t) \tag{13}$$

According to the above definitions, it can be said that local search progresses if
the output of the endomorphism $L_{nr}$ belongs to $GR_{s_t}$:

$$L_{nr}(s^g_t) \in GR_{s_t} \tag{14}$$
In general, $LR_{s_t}$ and $GR_{s_t}$ may include non-contiguous subregions, since the
region covered by C may contain several local optima.
Hereafter, the border (hypersurface) of an N-dimensional region A will be
indicated as $A^-$, and the volume of A will be indicated as V(A).
Figure 1: Search results ($s_{t+1}$) of $10^3$ independent local search trials in three 2D fitness
landscapes: a plane sloped in the horizontal direction (left column), a hypersphere with
centre in x = 10, y = 0.5 (middle column), and a hypersphere centred in x = 1, y = 0.5 (right
column). An isotropic circular search scope of centre $s^g_t$ = [0.5, 0.5] (green square) and radius
$s^r_t$ = 0.5 was used. The number of foragers nr was set to 1 (top row), 10 (middle row), and 20
(bottom row). The blue dots represent the solutions found in the local search trials, and their
arithmetic average is marked by the red triangle. The maximum is always on the border of
the search scope, at the right-end extreme of the horizontal diameter line. At the bottom of
each panel, the expected step size (eq. (11)) in the direction of the maximum is shown.
Figure 2: Average step size covered by a single $L(s^g_t)$, plotted against the number of
foragers nr. Local search used an isotropic search scope of radius $s^r_t$ = 0.5 centred in
$s^g_t$ = {0.5, 0.5} on a sloped planar fitness surface. The predicted value (red line) along the
direction of the slope was calculated from eq. (11), and closely matches the average values of
$10^4$ independent local search runs (blue dots).
To analyse the likelihood that local search stalls and the site is abandoned, it
is useful to define the following two ratios:

$$|LR_{s_t}| = \frac{V(LR_{s_t})}{V(C(s^g_t, s^e_t))} \qquad |GR_{s_t}| = \frac{V(GR_{s_t})}{V(C(s^g_t, s^e_t))} \tag{15}$$
Within the local search scope $C(s^g_t, s^e_t)$, $|LR_{s_t}|$ and $|GR_{s_t}|$ represent the
fractions of space where solutions of respectively lower and higher fitness lie. That
is, they represent the relative coverage of C of the two regions $LR_{s_t}$ and $GR_{s_t}$.
In particular, $|GR_{s_t}|$ represents the probability that one random sample of the
search scope yields a solution of higher fitness than $s^g_t$. From eq. (15), the
following properties hold:

$$0 < |LR_{s_t}| \leq 1 \qquad 0 \leq |GR_{s_t}| < 1 \qquad |LR_{s_t}| + |GR_{s_t}| = 1 \tag{16}$$
Also, from eq. (15) it follows that a solution $s^g_t$ is the optimum of the subregion
$C(s^g_t, s^e_t)$ iff:

$$F(s^g_t) \geq F(v) \tag{17}$$

for all $v \in C(s^g_t, s^e_t)$. The local exploitative search of the BA aims to locate
$L_{opt}$ inside the search scope $C(s^g_t, s^e_t)$. If the $GR_{s_t}$ region is significantly
smaller than the search scope, the probability of finding a better solution than $s^g$ is
small, and progress may be slow or stop. The neighbourhood shrinking procedure
may mitigate this problem, progressively reducing the search scope and increasing
the probability that a forager is generated inside $GR_{s_t}$. For this to happen,
neighbourhood shrinking needs to keep $GR_{s_t}$ inside the search scope. Unless it
is a local optimum, it can be shown that the site centre $s^g$ is at least contiguous
to $GR_{s_t}$. That is:
Proposition 4. A solution $s^g$ is either a local optimum of the fitness function
F, or lies on the border $GR^-_{s_t}$ of $GR_{s_t}$:

$$s^g \notin GR^-_{s_t} \Leftrightarrow \exists\, \epsilon > 0 \mid v \in LR_{s_t}\ \forall v \in B(s^g, \epsilon) \tag{18}$$
Proposition 5 (Stalling Probability). Let $s^g_t$ be the centre of site s at cycle t
in the N-dimensional solution space. If the search scope is not changed by
neighbourhood shrinking, the probability that local search stagnates for the
remaining $s^{ttl}_t$ cycles is:

$$P(s_t = s_{t+s^{ttl}_t}) = |LR_{s_t}|^{nr \cdot s^{ttl}_t}$$

Proof. See electronic appendix.

One important aspect of the stalling probability is that, since local search
is random, it is not affected by the slope of the fitness surface. Proposition 5
is valid regardless of whether $s^{ttl}_t = stlim$, that is, local search has not begun
to stagnate yet, or $s^{ttl}_t < stlim$ and local search has already begun to stagnate.
Variants that use a dynamic assignment of foragers, like [13, 18, 17], yield
a more complex behaviour that leads to a different stalling probability formu-
lation. Some ideas on how to deal with these variants will be discussed later
in this section. If neighbourhood shrinking is used, the progressive reduction of
the search scope needs to be taken into account. In this case, it is possible that
if local search is trapped in a secondary peak, the GRst region may be lost as
the search scope is reduced.
Lemma 1. Let $s^g_t$ be the centre of site s at cycle t. If k applications of
neighbourhood shrinking leave the $GR_{s_t}$ region unchanged, the relative coverages
become:

$$|LR_{s_{t+k}}| = \frac{1}{\alpha^{kN}}\left(|LR_{s_t}| - 1\right) + 1 = 1 - \frac{1}{\alpha^{kN}}|GR_{s_t}| \tag{21}$$

$$|GR_{s_{t+k}}| = \frac{1}{\alpha^{kN}}|GR_{s_t}| \tag{22}$$
Proof. See electronic appendix.
In the above analysis it is important to remember that $\frac{1}{\alpha^{kN}}|GR_{s_t}| < 1$,
otherwise $GR_{s_t}$ would be larger than $C(s^g_t, s^e_t)$, which is impossible by definition.
Lemma 1 is of quite general validity, as long as $GR_{s_t}$ is small with respect to
$C(s^g_t, s^e_t)$, and located relatively far from the borders of $C(s^g_t, s^e_t)$. Even when
a portion of $GR_{s_t}$ is close to the border of $C(s^g_t, s^e_t)$, neighbourhood shrinking
mostly reduces the largest area ($LR_{s_t}$), and $GR_{s_{t+1}} \simeq GR_{s_t}$. Equation (22)
shows that the relative coverage of $GR_{s_t}$, and hence the likelihood of sampling a
fitter solution than $s^g_t$, grows exponentially ($1/\alpha > 1$). Neighbourhood shrinking
is therefore a powerful heuristic to foster progress in the local search procedure.
Neighbourhood shrinking also introduces a trade-off between reducing
the reach of local search, and hence slowing down the convergence to the local
optimum (see eq. (9)), and making local search progress more likely, thus avoiding
several cycles of stalling. The probability of a complete stalling of local search
(i.e. site abandonment) can be calculated from eq. (22) as follows:
Proposition 6 (Stalling Probability With Neighbourhood Shrinking and
Constant $GR_{s_t}$). Let $s^g_t$ be the centre of site s at cycle t in the N-dimensional
solution space. The probability that local search stagnates for k cycles, if $GR_{s_t}$
is not changed by neighbourhood shrinking, is:

$$P(s^g_t = s^g_{t+k}) = \prod_{h=1}^{k}\left(1 - \frac{1}{\alpha^{hN}}|GR_{s_t}|\right)^{nr} \tag{23}$$
Proof. After h cycles of stalling, the probability $P_{nr=1}(s^g_h = s^g_{h+1})$ of not
sampling a single solution fitter than $s^g_t$ in $C(s^g_t, s^e_{t+h})$ is determined by the
relative coverage of $LR_{s_{t+h}}$, which is defined in eq. (21):

$$P_{nr=1}(s^g_h = s^g_{h+1}) = |LR_{s_{t+h}}| = 1 - \frac{1}{\alpha^{hN}}|GR_{s_t}| \tag{24}$$
The probability of stalling at any cycle h is equal to the probability of not
picking a fitter solution than $s^g_t$ in nr independent samples of $C(s^g_t, s^e_{t+h})$:

$$P(s^g_h = s^g_{h+1}) = P_{nr=1}(s^g_h = s^g_{h+1})^{nr} \tag{25}$$

The probability of k consecutive cycles of stalling is calculated from eqs. (24)
and (25):

$$P(s^g_t = s^g_{t+k}) = \prod_{h=1}^{k-1} P(s^g_h = s^g_{h+1}) = \prod_{h=1}^{k-1}\left(1 - \frac{1}{\alpha^{hN}}|GR_{s_t}|\right)^{nr} \tag{26}$$
This result is valid as long as the number of recruited bees is constant for
the k cycles monitored. If the number of bees changes at every iteration, for
example as in Packianather et al. [13], nr in eq. (23) should be replaced by a
variable number $nr_k$.
The stalling probability can never be 0, since $LR_{s_t} \neq \emptyset$ for any $s_t$. It
should also be noted that the results of propositions 5 and 6 are independent of
the neighbourhood shape. The implications of using hyperspherical instead of
hypercubic neighbourhoods will be discussed in Section 7.
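For illustration, eqs. (22) and (23) translate directly into a few lines of code (a minimal sketch, with parameter values of our choosing):

```python
def stall_prob(gr, nr, k, alpha=1.0, N=1):
    """Eq. (23): probability of k consecutive stalling cycles, assuming the
    GR region is unchanged by shrinking. With alpha = 1 (no neighbourhood
    shrinking) this reduces to the fixed-coverage case of proposition 5."""
    p = 1.0
    for h in range(1, k + 1):
        coverage = gr / alpha ** (h * N)     # eq. (22): |GR| after h shrinks
        # eq. (23) presumes coverage < 1; the clamp marks the point where GR
        # would fill the whole scope and stalling becomes impossible.
        p *= max(0.0, 1.0 - coverage) ** nr
    return p

# A small improving region, |GR| = 1e-3, nr = 10 foragers, k = 10 cycles.
print(stall_prob(1e-3, nr=10, k=10))                  # no shrinking: ~0.905
print(stall_prob(1e-3, nr=10, k=10, alpha=0.8, N=4))  # with shrinking: ~0.0
```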
6.4. A large stlim or nr?
Proposition 6 is important to understand the behaviour of the Bees Algorithm
when neighbourhood shrinking does not change $GR_{s_t}$, or at least does
not change it significantly. As discussed in Section 6.3, this occurrence is most
likely when neighbourhood search is near the local optimum, that is, $GR_{s_t}$ is
small and near the centre of C. In this case, the probability that local search
stagnates is large ($|GR_{s_t}|$ is small), and the site may be abandoned after stlim
cycles of stalling, before the local optimum is found (local search stalls).
The probability that local search stalls depends on the number nr of solutions
that are sampled in one local search cycle, the stalling limit stlim, and the size
of the search scope. The larger nr and stlim are, the more likely it is to pick at
least one solution within $GR_{s_t}$, and thus the smaller is the likelihood that local
search stalls. However, the effect of nr and stlim on the stalling probability is
not the same, due to the nonlinear reduction of the search scope by
neighbourhood shrinking. Given a fixed number of sampling opportunities (equal to
nr · stlim), the question is whether it is more beneficial to sample C thoroughly
for fewer iterations (large nr), or sample C less intensely for longer times (large
stlim).
In this section, it is assumed that nr and stlim can be increased by an
integer factor q > 1, and the local search stalling probability will be indicated
as $P_{nr}(s_t = s_{t+stlim})$, where the index nr accounts for the number of candidate
solutions sampled in C in one local search cycle.
Lemma 2. Let $s^g_t$ be the centre of site s at cycle t in the N-dimensional
solution space. Assuming that $GR_{s_t}$ is not changed by neighbourhood shrinking,
an increase in the stalling limit by an integer factor q > 1 modifies the stalling
probability of local search as follows:

$$P_{nr}(s_t = s_{t+q \cdot stlim}) = P_{nr}(s_t = s_{t+stlim}) \cdot T \tag{27}$$

where T is defined as:

$$T = \prod_{k=stlim+1}^{q \cdot stlim}\left(1 - \frac{1}{\alpha^{kN}}|GR_{s_t}|\right)^{nr} \tag{28}$$
Lemma 3. Let $s_t$ be the centre of site s at cycle t in the N-dimensional
solution space. Assuming that $GR_{s_t}$ is not changed by neighbourhood shrinking,
an increase in the number of foragers by an integer factor q > 1 modifies the
stalling probability of local search as follows:

$$P_{q \cdot nr}(s_t = s_{t+stlim}) = P_{nr}(s_t = s_{t+stlim})^q \tag{29}$$
The next proposition shows that if $GR_{s_t}$ is not changed by neighbourhood
shrinking, increasing the stalling limit by an integer factor q > 1 has a greater
effect on decreasing the stalling probability than increasing the number of
foragers by the same factor.
Proposition 7 (stlim vs. nr). Let $s_t$ be the centre of site s at cycle t in the
N-dimensional solution space. Assuming that $GR_{s_t}$ is not changed by
neighbourhood shrinking, an increase in the stalling limit of an integer factor q > 1
reduces the stalling probability more than an equal increase in the number of
foragers:

$$P_{nr}(s_t = s_{t+q \cdot stlim}) < P_{q \cdot nr}(s_t = s_{t+stlim}) \tag{32}$$

Proof. See electronic appendix.
Proposition 7 can also be proven considering a fixed number of available
sampling opportunities $T = (q \cdot nr) \cdot stlim = nr \cdot (q \cdot stlim)$ of the search scope.
If the choice is to increase the number of foragers, C will be sampled q · nr
times for at most stlim cycles of stalling before being abandoned. If $GR_{s_t}$ is
unchanged by neighbourhood shrinking, all candidate solutions will be sampled
with a stalling probability $\pi_{nr} \geq A = 1 - \frac{1}{\alpha^{stlim \cdot N}}|GR_{s_t}|$ (see eq. (A.32) in
the electronic appendix). If instead the choice is to increase the stalling limit,
C will be sampled nr times for q · stlim cycles, and $(q \cdot stlim - stlim) \cdot nr$ of
these samples will have a stalling probability $\pi_{stlim} \leq A = 1 - \frac{1}{\alpha^{stlim \cdot N}}|GR_{s_t}|$
(see eq. (A.33) in the electronic appendix).
As long as $GR_{s_t}$ does not consist of disjoint (multimodal) subregions,
proposition 7 gives the practitioner a useful guideline to parameterise the Bees
Algorithm. This case is common once the scope of the local search has narrowed
down on the attraction basin of one peak of performance. If $GR_{s_t}$ contains
several peaks, there is the risk that repeated applications of the neighbourhood
shrinking procedure may cut the main peak out of $GR_{s_t}$. In this latter case, a
high nr ensures that many sampling attempts are made before the main peak
is lost. Unfortunately, the actual fitness landscape is not known, and
trial-and-error is usually needed to address the nr vs. stlim trade-off. However, several
empirical studies [5, 6, 7] obtained the best performances over a large set of
varied benchmarks using large stlim values, suggesting a wide applicability of
proposition 7.
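Proposition 7 can be checked numerically with eq. (23). The sketch below uses illustrative parameter values of our choosing (picked so that $\alpha^{-kN}|GR_{s_t}| < 1$ throughout) and compares the two ways of spending the same budget of $nr \cdot stlim \cdot q$ samples:

```python
def stall_prob(gr, nr, k, alpha, N):
    """Eq. (23): probability of k consecutive stalling cycles."""
    p = 1.0
    for h in range(1, k + 1):
        p *= (1.0 - gr / alpha ** (h * N)) ** nr
    return p

# Illustrative values: same total sampling budget, spent two different ways.
gr, nr, stlim, q, alpha, N = 1e-4, 5, 10, 4, 0.95, 2
p_long = stall_prob(gr, nr, q * stlim, alpha, N)   # stlim increased q times
p_wide = stall_prob(gr, q * nr, stlim, alpha, N)   # nr increased q times
print(p_long, p_wide, p_long < p_wide)             # True, as per proposition 7
```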
7. The Shape of the Search Scope

Among the numerous variants of the BA, the shape of the search scope is one
of the least researched features in the literature. In the standard formulation of
the Bees Algorithm (Section 2), the search scope $C(s^g_t, s^e_t)$ of a site s at cycle
t is defined as a hypercube of edge $s^e_t$ centred in $s^g_t$. A new candidate solution
$v \in C(s^g_t, s^e_t)$ is generated by uniformly sampling the hypercube $C(s^g_t, s^e_t)$.
The main limitation of this hypercubic sampling is the anisotropic character
of the search, which has the shortest extent in the direction of the coordinate
axes, and the longest aligned with the diagonals of the $C(s^g_t, s^e_t)$ hypercube.
This anisotropy introduces a bias in the local search.
Moreover, as the dimensionality of the solution space increases, the volume
of the $C(s^g_t, s^e_t)$ hypercube increases exponentially, making the sampling more
sparse (curse of dimensionality, [38]):

$$V(C(s^g_t, s^e_t)) = (s^e_t)^N$$

whilst the volume of the hypersphere initially grows and then decreases with
the number of dimensions [41]:

$$V(B(s^g_t, s^r_t)) = \frac{\pi^{N/2}}{\Gamma(N/2 + 1)}\,(s^r_t)^N$$

where Γ is the gamma function. More precisely, keeping the radius $s^r_t$ fixed,
the volume increases for the first $N^*$ dimensions, i.e. for as long as $s^r_t > D_N$,
with

$$D_N = \frac{\Gamma(N/2 + 3/2)}{\sqrt{\pi}\,\Gamma(N/2 + 1)} \tag{36}$$

and sharply decreases afterwards, approaching zero for large N values. As
mentioned in Section 6, replacing cubic with spherical sampling does not alter
the validity of propositions 5 and 6.
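The different behaviour of the two volume laws is easy to visualise numerically (an illustrative sketch of ours):

```python
import math

def cube_volume(edge, N):
    """Volume of an N-dimensional hypercube: edge^N."""
    return edge ** N

def ball_volume(radius, N):
    """Volume of an N-dimensional hypersphere: pi^(N/2) r^N / Gamma(N/2 + 1)."""
    return math.pi ** (N / 2) * radius ** N / math.gamma(N / 2 + 1)

# A unit-edge cube against its inscribed ball (radius 0.5): the ball fills a
# vanishing fraction of the cube as N grows, so spherical sampling becomes
# far more concentrated than cubic sampling in high-dimensional spaces.
for N in (2, 4, 8, 12):
    print(N, cube_volume(1.0, N), round(ball_volume(0.5, N), 6))
```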
Proposition 8 (Scope Variation Invariance). Let $C(s^g_t, s^e_t)$ and $B(s^g_t, s^r_t)$ be
the local search scope using respectively cubic and spherical sampling, and
$|GR_{s_t}|_C$ and $|GR_{s_t}|_S$ be the relative coverage of the $GR_{s_t}$ region using
respectively cubic and spherical sampling. If neighbourhood shrinking does not change
the $GR_{s_t}$ region, shrinking the edge/radius of the search scope by a factor α
leads to the same change in the respective coverages:

$$C(s^g_t, \alpha s^e_t) \Rightarrow \frac{1}{\alpha^N}|GR_{s_t}|_C \qquad B(s^g_t, \alpha s^r_t) \Rightarrow \frac{1}{\alpha^N}|GR_{s_t}|_S \tag{37}$$

Proof. See supplementary material.
The consequence of proposition 8 is that the stagnation probability is com-
puted in the same way (proposition 6) regardless of the kind of sampling used.
However, the different behaviour of the search scope volume in the two cases
has important implications for high dimensional spaces.
A possible enhancement of the current algorithm would be to switch the
shape of the search scope opportunistically to foster the exploratory (cubic
sampling) or exploitative (spherical sampling) goal of local search.
In this case, the V (GRst ) V (C(sgt , set )) region is a hypersphere centred
in the origin with unitary radius3 . As per proposition 4, sgt lies on the (open)
surface of the hypersphere GRst . It can be shown that he following variables
take the values:
|GRst | ≈ 4.9348 · 10−4
V(C(sgt , set )) = 104 (38)
V(GRst ) ≈ 4.9348
and, by complementarity:

$$|LR_{s_t}| = 1 - |GR_{s_t}| \approx 0.99951$$

After $s^{ttl}_t = 8$ cycles of stagnation with neighbourhood shrinking (α = 0.9, as
implied by the reported final edge), the edge of the search scope is reduced to
$s^e_{t+s^{ttl}_t} = \alpha^{s^{ttl}_t}\, s^e_t \approx 4.3047$. The initial and final relative coverages of the
GR region are:

$$|GR_{s_t}| \approx 4.9348 \cdot 10^{-4} \qquad |GR_{s_{t+s^{ttl}_t}}| \approx 1.4372 \cdot 10^{-2} \tag{42}$$
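The figures in eqs. (38) and (42) can be reproduced in a few lines (an illustrative sketch; the value α = 0.9 is inferred from the reported final edge, 10 · 0.9⁸ ≈ 4.3047, and is not stated in Table 1):

```python
import math

# Worked example of Table 1: N = 4, initial edge 10, ttl = 8, GR a 4-D
# ball of radius 1; alpha = 0.9 is inferred, not stated in the table.
N, edge, ttl, alpha = 4, 10.0, 8, 0.9
v_gr = math.pi ** (N / 2) / math.gamma(N / 2 + 1)  # ~4.9348 (unit 4-ball)
print(v_gr / edge ** N)                 # ~4.9348e-4: initial |GR|, eq. (38)
final_edge = edge * alpha ** ttl        # ~4.3047: edge after 8 shrinks
print(v_gr / final_edge ** N)           # ~1.4372e-2: final |GR|, eq. (42)
```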
                        Cubic                        Spherical
              Predicted   Experimental     Predicted   Experimental
Without NS    0.9425      0.9429           0.8252      0.8249
With NS       0.6714      0.67137          0.2725      0.2716

Table 2: Predicted stalling probability and experimental frequency of stalling events for the
four cases described in section 7.2 (NS: neighbourhood shrinking).
Figure 3: Stalling probability as a function of the number of dimensions N, using cubic and
spherical sampling, each with and without neighbourhood shrinking (NS). All the parameters
are kept fixed except the number of dimensions of the problem.
8. Discussion
The main difference between the local search of the Bees Algorithm and
the two aforementioned methods is in the way the neighbourhood is varied:
the Bees Algorithm uses neighbourhood shrinking, whilst VNS tries a number
of randomly generated shapes, and standard LJ shrinks the neighbourhood
regardless of the progress of local search. Also, the Bees Algorithm terminates
the local search after stlim stagnation cycles, whilst LJ Search customarily
terminates the search after a fixed number of iterations regardless of the progress.
In terms of the overall metaheuristic, the Bees Algorithm performs several
local searches in parallel, adaptively shifting the sampling effort at each
generation according to the progress of the search. Neighbourhoods can be abandoned
due to lack of progress, or replaced with more promising ones found via global
search. For a comparison between the Bees Algorithm and similar swarm
optimisation techniques [46, 23], the reader is referred to [5].
The theoretical analysis of the properties of local search showed that the
expected step size quickly approaches its maximum value as the number of
forager bees is increased. If local search is required to climb (descend) the fitness
slope quickly, a large neighbourhood size is more beneficial than a large number
of foragers. Analysis of the stalling probability also found limited benefits in
increasing the number of local foragers. That is, neighbourhood shrinking and a
large stagnation limit are the most effective policies against premature stagnation
of local search. This latter result is in good agreement with the indications
of several experimental studies [5, 6, 7], where the best performances were
obtained using the largest allowed value for the stagnation limit stlim.
One of the main contributions of this theoretical analysis regards the shape
of the local neighbourhood. For ease of implementation, nearly all versions of
the Bees Algorithm have used hypercubic local neighbourhoods. As demonstrated,
hypercubic sampling biases the search along the directions of the diagonals, and
has poor exploitation capabilities in high-dimensional spaces due to the curse
of dimensionality. That is, the volume of hypercubic neighbourhoods is a power
function of the search scope edge $s^e$. As suggested in section 7, the
neighbourhood shape might be varied during the search to switch from explorative (cubic
sampling) to exploitative (spherical sampling) search strategies.
9. Conclusions
The shape of the neighbourhood function has been so far largely overlooked
in the Bees Algorithm literature. However, it was shown in section 7 that
the customary choice of hypercubic sampling creates large neighbourhoods in
high-dimensional spaces due to the curse of dimensionality. On the other hand,
hyperspherical sampling creates neighbourhoods of sizes that vary according to
the gamma function, and tend to become small in high-dimensional spaces (zero
for infinitely high-dimensional spaces). Thus, the exploitation capability of local
search is highly influenced by the choice of neighbourhood shape.
Overall, the Bees Algorithm can be seen as a parallel adaptive version of the
LJ Search and VNS algorithms (section 4), in which the modification of the
neighbourhood size and the allocation of sampling opportunities are dynamically
adjusted according to the fitness of the neighbourhood centres and the local
progress of the search. Unlike LJ Search and VNS, the Bees Algorithm
also keeps searching the fitness landscape for new promising neighbourhoods
via the global search procedure.
Throughout the paper, the Bees Algorithm was presented in a rigorously
mathematical and algorithmic format, beyond the customary qualitative de-
scription based on the biological metaphor. It is hoped that this new formalism
improves the understanding of the Bees Algorithm, and spurs new analytical
studies on its properties, and its similarities to and differences with other Swarm
Intelligence metaheuristics.
[8] A. Auger, B. Doerr, Theory of randomized search heuristics: Foundations
and recent developments, vol. 1, World Scientific, 2011.
[9] W. A. Hussein, S. Sahran, S. N. H. S. Abdullah, The variants of the Bees
Algorithm (BA): a survey, Artificial Intelligence Review 47 (1) (2017) 67–
121.
[21] D. T. Pham, E. Koç, Design of a two-dimensional recursive filter using the
bees algorithm, International Journal of Automation and Computing 7 (3)
(2010) 399–402.
[22] A. Rajasekhar, N. Lynn, S. Das, P. N. Suganthan, Computing with the col-
lective intelligence of honey bees–a survey, Swarm and Evolutionary Com-
putation 32 (2017) 25–48.
[23] D. Karaboga, B. Basturk, A powerful and efficient algorithm for numerical
function optimization: artificial bee colony (ABC) algorithm, Journal of
global optimization 39 (3) (2007) 459–471.
[34] R. Luus, A practical approach to time-optimal control of nonlinear sys-
tems, Industrial & Engineering Chemistry Process Design and Development
13 (4) (1974) 405–408.
[35] S. H. Oh, R. Luus, Optimal feedback control of time-delay systems, AIChE
Journal 22 (1) (1976) 140–147.
An Analysis of the Search Mechanisms
of the Bees Algorithm
(Electronic Appendix)
Luca Baronti, Marco Castellani, and Duc Truong Pham
Department of Mechanical Engineering, University of Birmingham, United
Kingdom
A. Proofs of the Theorems
A.1. Expected Step Size

Proposition 3 (Expected Step Size). Let local search be performed on a
one-dimensional, monotonically increasing fitness landscape, with search scope the
interval $[0, \ell]$ centred on $s^g_t = \ell/2$. The expected step size of one local search
cycle is:

$$d(s^g_t, s^g_{t+1}) = \ell\,\frac{0.5^{nr+1} + nr}{nr + 1} - \frac{\ell}{2} \tag{A.1}$$
Proof. The goal is to calculate the expected output of the stochastic local search
operator $s^g_{t+1} = L_{nr}(s^g_t)$ defined in eq. (2), with the search scope within $[0, \ell]$.
This output can be expressed as the following continuous random variable:

$$X = \max\{\phi(x_1), \dots, \phi(x_{nr})\} \tag{A.2}$$

where:

$$x_i \sim U(0, \ell) \quad\text{and}\quad \phi(x_i) = \max\{F(s^g_t), F(x_i)\} \tag{A.3}$$
The expected value E of a continuous random variable Y defined in the
interval [a, b] is computed as [47]:

$$E[Y] = \int_a^b y \cdot PDF_Y(y)\, dy \tag{A.4}$$

where $PDF_Y(y)$ is the probability density function of Y. The probability density
function of a random variable Y is equal to the derivative of the cumulative
distribution function $CDF_Y(y)$. In the case of the variable X defined in
eq. (A.2):

$$CDF_X(x) = \prod_{i=1}^{nr} P(\phi(x_i) \leq x) = CDF_{\phi(x)}(x)^{nr} \tag{A.5}$$
where $P(\phi(x_i) \leq x)$ is the probability that one random sample $x_i$ of the search
scope is less than or equal to x. Differentiating $CDF_X(x)$ and plugging the
derivative into eq. (A.4):

$$E[X] = \int_0^\ell \phi(x) \cdot nr \cdot CDF_{\phi(x)}(x)^{nr-1}\, PDF_{\phi(x)}(x)\, dx \tag{A.6}$$

Using the Law of the Unconscious Statistician [47], it is possible to make the
following substitution:

$$E[X] = \int_0^\ell \phi(x) \cdot nr \cdot CDF_{\phi(x)}(x)^{nr-1}\, PDF_{\phi(x)}(x)\, dx = \int_0^\ell \phi(x) \cdot nr \cdot CDF_x(x)^{nr-1}\, PDF_x(x)\, dx \tag{A.7}$$
where

$$CDF_U(x) = \frac{x}{\ell} \quad 0 \leq x \leq \ell \tag{A.8}$$

and

$$PDF_U(x) = \begin{cases} \frac{1}{\ell} & 0 \leq x \leq \ell \\ 0 & \text{elsewhere} \end{cases} \tag{A.9}$$
From eq. (A.2) it is known that $x \sim U(0, \ell)$; also, F is assumed to be
monotonic⁴, therefore:

$$\begin{aligned}
E[X] &= \int_0^\ell \max\{x, s^g_t\} \cdot nr \cdot CDF_U(x)^{nr-1}\, PDF_U(x)\, dx \\
&= \int_0^{\ell/2} \frac{\ell}{2} \cdot nr \cdot \left(\frac{x}{\ell}\right)^{nr-1} \frac{1}{\ell}\, dx + \int_{\ell/2}^{\ell} x \cdot nr \cdot \left(\frac{x}{\ell}\right)^{nr-1} \frac{1}{\ell}\, dx \\
&= \left[\frac{nr \cdot x^{nr}}{2\, nr\, \ell^{nr-1}}\right]_0^{\ell/2} + \left[\frac{nr \cdot x^{nr+1}}{\ell^{nr}(nr+1)}\right]_{\ell/2}^{\ell} \\
&= 0.5^{nr+1}\, \ell + \frac{nr \cdot \ell\, (1 - 0.5^{nr+1})}{nr + 1} \\
&= \frac{nr\, 0.5^{nr+1}\, \ell + 0.5^{nr+1}\, \ell + nr \cdot \ell - nr\, 0.5^{nr+1}\, \ell}{nr + 1} \\
&= \ell\, \frac{0.5^{nr+1} + nr}{nr + 1}
\end{aligned} \tag{A.10}$$

⁴ In the proof the case of monotonically increasing fitness is considered. This is not a loss of
generality, since only the expected step size is considered, not the direction.
Therefore the average step size is:

$$d(s^g_t, s^g_{t+1}) = |s^g_{t+1} - s^g_t| = \ell\,\frac{0.5^{nr+1} + nr}{nr + 1} - \frac{\ell}{2} \tag{A.11}$$
A.2. Relative Coverage Under Neighbourhood Shrinking (Lemma 1)

Proof. Since $GR_{s_t}$ is not reduced, the reduction of $LR_{s_t}$ will be equal to the
reduction of $C(s^g_t, s^e_t)$. That is:

$$V(LR_{s_{t+k}}) = V(LR_{s_t}) - \left(V(C(s^g_t, s^e_t)) - V(C(s^g_{t+k}, s^e_{t+k}))\right) \tag{A.16}$$

From the definition of the relative coverage (eqs. (15) and (A.15)):

$$|LR_{s_{t+k}}| = \frac{V(LR_{s_{t+k}})}{V(C(s^g_{t+k}, s^e_{t+k}))} = \frac{V(LR_{s_{t+k}})}{\alpha^{kN}\, V(C(s^g_t, s^e_t))} \tag{A.17}$$

And from eq. (A.16), $\frac{V(LR_{s_{t+k}})}{\alpha^{kN}\, V(C(s^g_t, s^e_t))}$ is equal to:

$$\frac{V(LR_{s_t}) - V(C(s^g_t, s^e_t))\left(1 - \alpha^{kN}\right)}{\alpha^{kN}\, V(C(s^g_t, s^e_t))} = \frac{1}{\alpha^{kN}}\left(|LR_{s_t}| - 1\right) + 1 \tag{A.18}$$
Equation (21) is obtained by rearranging the final line of eq. (A.18).
Remembering eq. (16), it is also straightforward to show that:

$$|LR_{s_{t+k}}| = \frac{1}{\alpha^{kN}}\left(|LR_{s_t}| - 1\right) + 1 = \frac{1}{\alpha^{kN}}\left(1 - |GR_{s_t}| - 1\right) + 1 = 1 - \frac{1}{\alpha^{kN}}|GR_{s_t}| \tag{A.19}$$

Finally, eq. (22) is obtained from eq. (21) and eq. (16):

$$|GR_{s_{t+k}}| = 1 - |LR_{s_{t+k}}| = 1 - \left(1 - \frac{1}{\alpha^{kN}}|GR_{s_t}|\right) = \frac{1}{\alpha^{kN}}|GR_{s_t}| \tag{A.20}$$
A.3. Stalling Probability (Proposition 5)

Proof. The site stalls if the search stagnates for the next $s^{ttl}_t$ cycles, that is,
if all the nr candidate solutions generated during $s^{ttl}_t$ local search cycles lie in
$LR_{s_k}$. When local search stagnates, the centre of the site is unchanged, and if
the search scope is not changed (no neighbourhood shrinking), $|LR_{s_t}|$ is
constant:

$$|LR_{s_k}| = |LR_{s_t}| \quad \forall k \in \{t, \dots, t + s^{ttl}_t - 1\} \tag{A.22}$$

The joint probability that all the solutions sampled during one given cycle k of
local search belong to $LR_{s_k}$ is indicated as $P(s_k = s_{k+1})$. Due to the uniform
sampling, the probability that one solution is picked from $LR_{s_k}$ corresponds to
$|LR_{s_k}|$. Remembering eq. (A.22), it follows that:

$$P(s_k = s_{k+1}) = |LR_{s_k}|^{nr} = |LR_{s_t}|^{nr} \tag{A.24}$$

The stalling probability can then be computed as the joint probability of $s^{ttl}_t$
consecutive stagnations of local search cycles:

$$P(s_t = s_{t+s^{ttl}_t}) = \prod_{k=t}^{t+s^{ttl}_t-1} |LR_{s_k}|^{nr} = |LR_{s_t}|^{nr \cdot s^{ttl}_t} \tag{A.25}$$
A.4. stlim vs. nr
Proposition 7 (stlim vs. nr). Let st be the centre of site s at cycle t in the
N-dimensional solution space. Assuming that GRst is not changed by neigh-
bourhood shrinking, an increase in the stalling limit of an integer factor q > 1
reduces the stalling probability more than an equal increase in the number of
foragers.
$$P_{nr}(s_t = s_{t+q \cdot stlim}) < P_{q \cdot nr}(s_t = s_{t+stlim}) \tag{A.26}$$
Proof. Remembering lemmas 2 and 3, eq. (32) can be re-written as:

$$A < P_{nr}(s_t = s_{t+stlim})^q \tag{A.27}$$

with

$$A = P_{nr}(s_t = s_{t+stlim}) \prod_{k=stlim+1}^{q \cdot stlim}\left(1 - \frac{1}{\alpha^{kN}}|GR_{s_t}|\right)^{nr} \tag{A.28}$$
with $P_{nr}(s_t = s_{t+stlim})$ a non-null probability, and hence a positive real number.
Equation (A.27) can thus be rewritten as:

$$\prod_{k=stlim+1}^{q \cdot stlim}\left(1 - \frac{1}{\alpha^{kN}}|GR_{s_t}|\right)^{nr} < P_{nr}(s_t = s_{t+stlim})^{q-1} \tag{A.29}$$

Remembering proposition 6:

$$P_{nr}(s_t = s_{t+stlim})^{q-1} = \left(\prod_{k=1}^{stlim}\left(1 - \frac{1}{\alpha^{kN}}|GR_{s_t}|\right)^{nr}\right)^{q-1} \tag{A.30}$$
The two terms inside the brackets on the right and left hand sides of (A.31)
express the relative coverage of $LR_{s_t}$ at time k. That is, they represent the
probability of picking a solution of lower fitness than $s_t$ inside C at time k.
They become smaller as k increases (α < 1), and thus:

$$1 - \frac{1}{\alpha^{stlim \cdot N}}|GR_{s_t}| \leq 1 - \frac{1}{\alpha^{kN}}|GR_{s_t}| \tag{A.32}$$

for all $k \in \{1, \dots, stlim\}$. Likewise:

$$1 - \frac{1}{\alpha^{kN}}|GR_{s_t}| \leq 1 - \frac{1}{\alpha^{stlim \cdot N}}|GR_{s_t}| \tag{A.33}$$

for all $k \in \{stlim + 1, \dots, q \cdot stlim\}$. Accordingly:
$$W \leq Z = X \leq Y \tag{A.34}$$

with

$$\begin{aligned}
X &= \prod_{k=1}^{stlim}\left(1 - \frac{1}{\alpha^{stlim \cdot N}}|GR_{s_t}|\right)^{(q-1) \cdot nr} &
Y &= \prod_{k=1}^{stlim}\left(1 - \frac{1}{\alpha^{kN}}|GR_{s_t}|\right)^{(q-1) \cdot nr} \\
W &= \prod_{k=stlim+1}^{q \cdot stlim}\left(1 - \frac{1}{\alpha^{kN}}|GR_{s_t}|\right)^{nr} &
Z &= \prod_{k=stlim+1}^{q \cdot stlim}\left(1 - \frac{1}{\alpha^{stlim \cdot N}}|GR_{s_t}|\right)^{nr}
\end{aligned} \tag{A.35}$$

Since all the factors of Z and X are equal to the same constant term
$A = 1 - \frac{1}{\alpha^{stlim \cdot N}}|GR_{s_t}|$, and the two products contain the same number
$(q-1) \cdot stlim \cdot nr$ of factors, Z = X, and therefore:

$$\prod_{k=stlim+1}^{q \cdot stlim} A^{nr} \leq \prod_{k=1}^{stlim} A^{(q-1) \cdot nr} \tag{A.36}$$
A.5. Scope Variation Invariance (Proposition 8)

Proof. Shrinking the cubic search scope by a factor α changes the relative
coverage of $GR_{s_t}$ as follows:

$$\frac{V(GR_{s_t})}{V(C(s^g_t, \alpha s^e_t))} = \frac{V(GR_{s_t})}{(\alpha s^e_t)^N} = \frac{1}{\alpha^N} \cdot \frac{V(GR_{s_t})}{(s^e_t)^N} = \frac{1}{\alpha^N} \cdot \frac{V(GR_{s_t})}{V(C(s^g_t, s^e_t))} \tag{A.39}$$

Likewise, for spherical sampling, where $V(B(s^g_t, s^r_t)) = V_N \cdot (s^r_t)^N$:

$$\frac{V(GR_{s_t})}{V(B(s^g_t, \alpha s^r_t))} = \frac{V(GR_{s_t})}{V_N \cdot (\alpha s^r_t)^N} = \frac{1}{\alpha^N} \cdot \frac{V(GR_{s_t})}{V_N \cdot (s^r_t)^N} = \frac{1}{\alpha^N} \cdot \frac{V(GR_{s_t})}{V(B(s^g_t, s^r_t))} \tag{A.40}$$