Published by:
IRIDIA, Institut de Recherches Interdisciplinaires
et de Développements en Intelligence Artificielle
Université Libre de Bruxelles
Av F. D. Roosevelt 50, CP 194/6
1050 Bruxelles, Belgium
Abstract— Ant colony optimization is an optimization technique that was inspired by the foraging behaviour of real ant colonies. Originally, the method was introduced for the application to discrete optimization problems. Recent research efforts led to the development of algorithms that are also applicable to continuous optimization problems. In this work we present one of the most successful variants for continuous optimization and apply it to the training of feed-forward neural networks for pattern classification. For evaluating our algorithm we apply it to classification problems from the medical field. The results show, first, that our algorithm is comparable to specialized algorithms for neural network training, and second, that our algorithm has advantages over other general purpose optimizers.

I. INTRODUCTION

Pattern classification is an important real-world problem. In the medical field, for example, pattern classification problems arise when physicians are interested in reliable classifiers for diseases based on a number of measurements. Feed-forward neural networks (NNs) are commonly used systems for the task of pattern classification [5], but require prior configuration. Generally, this configuration problem consists of two parts: First, the structure of the feed-forward NN has to be determined. Second, the numerical weights of the neuron connections have to be determined such that the resulting classifier is as accurate as possible. In this work we focus only on the second part, namely the optimization of the connection weights. We adopt the NN structures from earlier works on the same subject.

Ant colony optimization (ACO) is an optimization technique that was introduced for the application to discrete optimization problems in the early 90s by M. Dorigo and colleagues [9], [10], [11]. The origins of ant colony optimization are in a field called swarm intelligence (SI) [6], which studies the use of certain properties of social insects, flocks of birds, or fish schools for tasks such as optimization. The inspiring source of ACO is the foraging behaviour of real ant colonies. When searching for food, ants initially explore the area surrounding their nest in a random manner. While moving, ants leave a chemical pheromone trail on the ground. As soon as an ant finds a food source, it evaluates the quantity and the quality of the food and carries some of it back to the nest. During the return trip, the quantity of pheromone that an ant leaves on the ground may depend on the quantity and quality of the food. The pheromone trails guide other ants to the food source. It has been shown in [8] that the indirect communication between the ants via pheromone trails enables them to find shortest paths between their nest and food sources. The shortest path finding capabilities of real ant colonies are exploited in artificial ant colonies for solving optimization problems.

While ACO algorithms were originally introduced to solve discrete (i.e., combinatorial) optimization problems, their adaptation to continuous optimization problems has received increasing attention. Early applications of the ant metaphor to continuous optimization include algorithms such as Continuous ACO (CACO) [2], the API algorithm [17], and Continuous Interacting Ant Colony (CIAC) [12]. However, all these approaches follow the original ACO framework rather loosely. The latest approach, which is at the same time the approach that is closest to the spirit of ACO for combinatorial problems, was proposed in [20]. In this work we extend this approach and apply it to the problem of optimizing the weights of feed-forward NNs for the task of pattern classification.

The outline of our work is as follows. In Section 2 we briefly present the structure of feed-forward NNs for the purpose of pattern classification. Then, in Section 3 we present the ACO algorithm, while in Section 4 we compare our algorithm to methods specialized for feed-forward NN training, as well as to a genetic algorithm. Finally, in Section 5 we offer a conclusion and a glimpse of future work.

II. FEED-FORWARD NEURAL NETWORKS FOR PATTERN CLASSIFICATION

A dataset for pattern classification consists of a number of patterns together with their correct classification.
Each pattern consists of a number of measurements (i.e., numerical values). The goal consists in generating a classifier that takes the measurements of a pattern as input, and provides its correct classification as output. A popular type of classifier is the feed-forward neural network (NN).

A feed-forward NN consists of an input layer of neurons, an arbitrary number of hidden layers, and an output layer (for an example, see Figure 1). Feed-forward NNs for pattern classification purposes consist of as many input neurons as the patterns of the data set have measurements, i.e., for each measurement there exists exactly one input neuron. The output layer consists of as many neurons as the data set has classes, i.e., if the patterns of a medical data set belong to either the class normal or to the class pathological, the output layer consists of two neurons. Given the weights of all the neuron connections, in order to classify a pattern one provides its measurements as input to the input neurons and propagates the output signals from layer to layer until the output signals of the output neurons are obtained. Each output neuron is identified with one of the possible classes. The output neuron that produces the highest output signal classifies the respective pattern.

Fig. 1. (a) shows a feed-forward NN with one hidden layer of neurons. Note that each neuron of a certain layer is connected to each neuron of the next layer. (b) shows one single neuron (from either the hidden layer or the output layer). The neuron receives inputs (i.e., signals i_l, weighted by weights w_l) from each neuron of the previous layer. Additionally, it receives a so-called bias input i_bias with weight w_bias. The transfer function f(Σ) of a neuron transforms the sum of all the weighted inputs into an output signal, which serves as input for all the neurons of the following layer. Input signals, output signals, biases and weights are real values.
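As an illustration of the classification procedure just described, the following sketch (not part of the original paper) propagates a single pattern through a network of the kind shown in Figure 1 and returns the index of the output neuron with the highest signal. The sigmoid transfer function and the layer sizes are assumptions made purely for this example; the paper does not fix them at this point.

```python
import numpy as np

def sigmoid(s):
    # Assumed transfer function f; the paper does not specify a particular choice.
    return 1.0 / (1.0 + np.exp(-s))

def forward_pass(x, w_hidden, b_hidden, w_out, b_out, f=sigmoid):
    """Propagate one pattern x through a NN with one hidden layer.

    Each neuron computes f(weighted sum of its inputs plus bias input),
    as described in the caption of Fig. 1(b).
    """
    hidden = f(w_hidden @ x + b_hidden)   # output signals of the hidden layer
    return f(w_out @ hidden + b_out)      # output signals of the output neurons

def classify(x, w_hidden, b_hidden, w_out, b_out):
    """The output neuron with the highest output signal determines the class."""
    outputs = forward_pass(x, w_hidden, b_hidden, w_out, b_out)
    return int(np.argmax(outputs))

# Hypothetical example: 3 measurements, 4 hidden neurons, 2 classes.
rng = np.random.default_rng(0)
w_h, b_h = rng.normal(size=(4, 3)), rng.normal(size=4)
w_o, b_o = rng.normal(size=(2, 4)), rng.normal(size=2)
print(classify(np.array([0.2, -1.3, 0.7]), w_h, b_h, w_o, b_o))
```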
The process of generating a NN classifier consists of determining the weights of the connections between the neurons such that the NN classifier shows a high performance. Since the weights are real-valued, this is a continuous optimization problem of the following form: Given are n decision variables {X_1, ..., X_n} with continuous domains. These domains are not restricted, i.e., each real number is feasible. Furthermore, the problem is unconstrained, which means that the variable settings do not depend on each other. Sought is a solution that minimizes the square error percentage (SEP):

SEP = 100 \cdot \frac{o_{\max} - o_{\min}}{n_0 \cdot n_p} \sum_{p=1}^{n_p} \sum_{i=1}^{n_0} (t_{pi} - o_{pi})^2 , \qquad (1)

where o_max and o_min are respectively the maximum and minimum values of the output signals of the output neurons, n_p represents the number of patterns, n_0 is the number of output neurons, and t_pi and o_pi represent respectively the expected and actual values of output neuron i for pattern p.
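Equation (1) translates directly into code. In the sketch below, the interpretation of o_max and o_min as the extreme values that the output signals can take (0 and 1 for a sigmoid transfer function) is an assumption; the paper's experimental setup is not part of this excerpt.

```python
import numpy as np

def square_error_percentage(targets, outputs, o_min=0.0, o_max=1.0):
    """Square error percentage (SEP) of equation (1).

    targets[p, i] and outputs[p, i] hold the expected value t_pi and the
    actual value o_pi of output neuron i for pattern p, so both arrays
    have shape (n_p, n_0). o_min and o_max are the minimum and maximum
    values of the output signals (assumed 0 and 1 for a sigmoid).
    """
    n_p, n_0 = outputs.shape
    return 100.0 * (o_max - o_min) / (n_0 * n_p) * np.sum((targets - outputs) ** 2)
```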
III. ANT COLONY OPTIMIZATION FOR CONTINUOUS OPTIMIZATION

ACO algorithms are iterative methods that try to solve optimization problems as follows. At each iteration, candidate solutions are probabilistically constructed by sampling a probability distribution over the search space. Then, this probability distribution is modified using the better ones among the constructed solutions. The goal is to bias, over time, the sampling of solutions towards areas of the search space that contain high quality solutions.

In ACO algorithms for discrete optimization problems, the probability distribution is discrete and is derived from artificial pheromone information. In a way, the pheromone information represents the stored search experience of the algorithm. In contrast, our ACO algorithm for continuous optimization, henceforth denoted by ACO*, utilizes a continuous probability density function (PDF). This density function is, for each solution construction, produced from a population P of solutions that the algorithm keeps at all times. The management of this population works as follows. Before the start of the algorithm, the population, whose size k is a parameter of the algorithm, is filled with random solutions. Even though the domains of the decision variables are not restricted, we used the initial interval [−1, 1] for the sake of simplicity. Then, at each iteration a set of m solutions is generated and added to P. The same number of the worst solutions are removed from P. This biases the search process towards the best solutions found during the search.
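The population management just described can be summarized in a short loop. This is only a sketch: concrete values for the population size k, the number m of new solutions per iteration and the stopping criterion are not given in this part of the paper, and the function names are illustrative.

```python
import numpy as np

def aco_star_population_loop(objective, construct, n, k, m, iterations, rng=None):
    """Skeleton of the ACO* population management (minimization).

    P holds k solutions at all times. Each iteration adds m newly
    constructed solutions and removes the same number of worst ones,
    which biases the search towards the best solutions found so far.
    construct(P, rng) builds one new solution from the population,
    e.g. by Gaussian-kernel sampling (sketched further below).
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Initial population: k random solutions, each variable drawn from [-1, 1].
    P = [rng.uniform(-1.0, 1.0, size=n) for _ in range(k)]
    for _ in range(iterations):
        P.extend(construct(P, rng) for _ in range(m))  # add m new solutions
        P.sort(key=objective)                          # best (lowest SEP) first
        P = P[:k]                                      # drop the m worst solutions
    return P[0]                                        # best solution found
```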
For constructing a solution, an ant acts as follows. First, it transforms the original set of decision variables X = {X_1, ..., X_n} into a set of temporary variables Z = {Z_1, ..., Z_n}. The purpose of introducing temporary variables is to improve the algorithm's performance by limiting the correlation between decision variables. Note that this transformation also affects the population of solutions: all the solutions are transformed to the new coordinate system as well. The method of transforming the set of decision variables is presented towards the end of this section.

Then, at each construction step i = 1, ..., n, the ant chooses a value for decision variable Z_i. For performing this choice it uses a Gaussian kernel PDF, which is a weighted superposition of several Gaussian functions. For a decision variable Z_i the Gaussian kernel G_i is given as follows:

G_i(z) = \sum_{j=1}^{k} \omega_j \frac{1}{\sigma_j \sqrt{2\pi}} \, e^{-\frac{(z - \mu_j)^2}{2\sigma_j^2}}
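Drawing a value for one variable Z_i from the Gaussian kernel G_i can be done by first selecting one of its k Gaussian components according to the weights ω_j and then sampling from that single Gaussian. The means µ_j, standard deviations σ_j and weights ω_j are derived from the population P; how exactly is described in the part of the paper that is not reproduced in this excerpt, so the values used below are purely illustrative.

```python
import numpy as np

def sample_from_gaussian_kernel(mu, sigma, omega, rng):
    """Draw one value for a variable Z_i from the Gaussian kernel G_i.

    mu[j], sigma[j] and omega[j] are the mean, standard deviation and
    weight of the j-th of the k Gaussian functions (one per solution in P).
    The weighted superposition is sampled in two steps: pick component j
    with probability proportional to omega[j], then sample N(mu[j], sigma[j]).
    """
    p = np.asarray(omega, dtype=float)
    p = p / p.sum()                      # normalise the weights
    j = rng.choice(len(p), p=p)          # choose one Gaussian of the kernel
    return rng.normal(mu[j], sigma[j])   # sample from the chosen Gaussian

# Purely illustrative values for one construction step with k = 4 solutions:
rng = np.random.default_rng(1)
mu = [0.1, -0.4, 0.7, 0.2]      # e.g. the values of Z_i in the k solutions of P
sigma = [0.3, 0.3, 0.2, 0.5]    # spreads (their computation is not shown here)
omega = [0.4, 0.3, 0.2, 0.1]    # weights, e.g. favouring better-ranked solutions
print(sample_from_gaussian_kernel(mu, sigma, omega, rng))
```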
Fig. 3. Box-plots for Cancer1. The boxes are drawn between the first and the third quartile of the distribution, while the indentations in the box-plots (or notches) indicate the 95% confidence interval.

Fig. 4. Box-plots for Diabetes1. The boxes are drawn between the first and the third quartile of the distribution, while the indentations in the box-plots (or notches) indicate the 95% confidence interval.
The Heart1 problem (see Figure 5) is, with 230 weights, the largest problem that we tackled. It is also the one on which the performance of the algorithms differed the most. All tested algorithms clearly outperform RS, but there are also significant differences among the more complex algorithms. BP, which was performing quite well on the other two test problems, did not do so well on Heart1. ACO* achieves results similar to BP. In turn, LM, which was not performing so well on the first two problems, obtains quite good results. The performance of the hybridized versions of ACO*, namely ACO*-BP and ACO*-LM, is particularly interesting. The ACO*-BP hybrid clearly outperforms both ACO* and BP. ACO*-LM likewise outperforms both ACO* and LM. Additionally, ACO*-LM performs best overall.

Fig. 5. Box-plots for Heart1. The boxes are drawn between the first and the third quartile of the distribution, while the indentations in the box-plots (or notches) indicate the 95% confidence interval.

Finally, it is interesting to compare the performance of the ACO* based algorithms to some other general optimization algorithms. Alba and Chicano [1] have published the results of a genetic algorithm (GA) used for tackling exactly the same three problems as we did. They have tested not only a stand-alone GA, but also its hybridized versions: GA-BP and GA-LM. Table III summarizes the results obtained by the ACO* and GA based algorithms. Clearly, the stand-alone ACO* performs better than the stand-alone GA for all the test problems. ACO*-BP and ACO*-LM perform respectively better than GA-BP and GA-LM on both of the more difficult problems, Diabetes1 and Heart1, and worse on Cancer1. For the Heart1 problem the mean performance of any ACO* based algorithm is significantly better than that of the best GA based algorithm (which was reported as the state-of-the-art for this problem in 2004).
TABLE III. Pair-wise comparison of the results of the ACO* based algorithms with recent results obtained by a set of GA based algorithms (see [1]). The results can be compared due to the fact that 1000 evaluations as stopping criterion were used for all the algorithms. For each problem-algorithm pair we give the mean (over 50 independent runs), and the standard deviation (in brackets). The best result of each comparison is indicated in bold.
V. CONCLUSION

We have presented an ant colony optimization algorithm (i.e., ACO*) for the training of feed-forward neural networks for pattern classification. The performance of the algorithm was evaluated on real-world test problems and compared to specialized algorithms for feed-forward neural network training (back propagation and Levenberg-Marquardt), and also to algorithms based on a genetic algorithm.

The performance of the stand-alone ACO* was comparable to (or at least not much worse than) the performance of specialized algorithms for neural network training. This result is particularly interesting as ACO*, being a much more generic approach, also allows the training of networks in which the neuron transfer function is either not differentiable or unknown. The hybrid of ACO* and the Levenberg-Marquardt algorithm (i.e., ACO*-LM) was in some cases able to outperform the back propagation and the Levenberg-Marquardt algorithms. Finally, the results indicate that ACO* outperforms other general-purpose optimizers such as genetic algorithms.

ACKNOWLEDGMENT

This work was supported by the Spanish CICYT project TRACER (grant TIC-2002-04498-C05-03), and by the "Juan de la Cierva" program of the Spanish Ministry of Science and Technology, of which Christian Blum is a post-doctoral research fellow. This work was also partially supported by the ANTS project, an Action de Recherche Concertée funded by the Scientific Research Directorate of the French Community of Belgium.
REFERENCES

[1] E. Alba and J. F. Chicano, "Training neural networks with GA hybrid algorithms," in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2004), ser. Lecture Notes in Computer Science, K. Deb et al., Eds., vol. 3102. Springer Verlag, Berlin, Germany, 2004, pp. 852–863.
[2] B. Bilchev and I. C. Parmee, "The ant colony metaphor for searching continuous design spaces," in Proceedings of the AISB Workshop on Evolutionary Computation, ser. Lecture Notes in Computer Science, vol. 993, 1995, pp. 25–39.
[7] G. E. P. Box and M. E. Muller, "A note on the generation of random normal deviates," Annals of Mathematical Statistics, vol. 29, no. 2, pp. 610–611, 1958.
[8] J.-L. Deneubourg, S. Aron, S. Goss, and J.-M. Pasteels, "The self-organizing exploratory pattern of the Argentine ant," Journal of Insect Behaviour, vol. 3, pp. 159–168, 1990.
[9] M. Dorigo, "Optimization, learning and natural algorithms (in Italian)," Ph.D. dissertation, Dipartimento di Elettronica, Politecnico di Milano, Italy, 1992.
[10] M. Dorigo, V. Maniezzo, and A. Colorni, "Ant System: Optimization by a colony of cooperating agents," IEEE Transactions on Systems, Man, and Cybernetics – Part B, vol. 26, no. 1, pp. 29–41, 1996.
[11] M. Dorigo and T. Stützle, Ant Colony Optimization. MIT Press, Cambridge, MA, 2004, to appear.
[12] J. Dréo and P. Siarry, "A new ant colony algorithm using the heterarchical concept aimed at optimization of multiminima continuous functions," in Proceedings of ANTS 2002, ser. Lecture Notes in Computer Science, M. Dorigo, G. Di Caro, and M. Sampels, Eds., vol. 2463. Springer Verlag, Berlin, Germany, 2002, pp. 216–221.
[13] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. The Johns Hopkins University Press, Baltimore, USA, 1989.
[14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer-Verlag, Berlin, Germany, 2001.
[15] K. Levenberg, "A method for the solution of certain problems in least squares," Quarterly of Applied Mathematics, vol. 2, pp. 164–168, 1944.
[16] D. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, pp. 431–441, 1963.
[17] N. Monmarché, G. Venturini, and M. Slimane, "On how Pachycondyla apicalis ants suggest a new search algorithm," Future Generation Computer Systems, vol. 16, pp. 937–946, 2000.
[18] L. Prechelt, "Proben1 – a set of neural network benchmark problems and benchmarking rules," Fakultät für Informatik, Universität Karlsruhe, Karlsruhe, Germany, Tech. Rep. 21, 1994.
[19] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
[20] K. Socha, "Extended ACO for continuous and mixed-variable optimization," in Proceedings of ANTS 2004 – Fourth International Workshop on Ant Algorithms and Swarm Intelligence, ser. Lecture Notes in Computer Science, M. Dorigo, M. Birattari, C. Blum, L. M. Gambardella, F. Mondada, and T. Stützle, Eds. Springer Verlag, Berlin, Germany, 2004, to appear.