
Analytica Chimica Acta 509 (2004) 187–195

An ant colony approach for clustering


P.S. Shelokar, V.K. Jayaraman, B.D. Kulkarni∗
Chemical Engineering & Process Development Division, National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411 008, India

Received 2 September 2003; received in revised form 25 November 2003; accepted 11 December 2003

Abstract

This paper presents an ant colony optimization methodology for optimally clustering N objects into K clusters. The algorithm employs distributed agents that mimic the way real ants find the shortest path from their nest to a food source and back. The algorithm has been implemented and tested on several simulated and real datasets. Its performance is compared with other popular stochastic/heuristic methods, viz. the genetic algorithm, simulated annealing and tabu search. Our computational simulations reveal very encouraging results in terms of the quality of the solution found, the average number of function evaluations and the processing time required.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Ant colony metaheuristic; Clustering; Optimization; Euclidean distance

1. Introduction

Clustering aims to discover a sensible organization of the objects in a given dataset by identifying and quantifying the similarities (or dissimilarities) between the objects. Cluster analysis has found many applications, including qualitative interpretation and data compression [1], process monitoring [2], local model development [3], analysis of chemical compounds for combinatorial chemistry [4], toxicity testing [5], finding structure–activity relations [6], discovery of clusters in DNA dinucleotides [7], classification of coals [8], etc. A good introduction to contemporary data-clustering algorithms can be found elsewhere [9]. Different authors employ different criteria for clustering. Most clustering criterion functions are nonconvex and nonlinear, so the problem may have local minimum solutions that are not necessarily optimal [10]. Moreover, these functions possess exponential complexity in terms of the number of clusters, and the problem becomes NP-hard when the number of clusters exceeds 3 [11]. Owing to the strategic importance of clustering in many fields, several algorithms have been proposed in the literature to solve clustering problems [12–17]. Recently, evolutionary and metaheuristic methods such as tabu search [18], genetic algorithms [14,19] and simulated annealing [20,21] have been employed successfully for clustering.

In this work, we recast the recently proposed ant colony optimization algorithm to suit the needs of data clustering. The ant colony optimization (ACO) metaheuristic, a novel population-based approach, was recently proposed by Dorigo et al. to solve several discrete optimization problems [22,23]. ACO mimics the way real ants find the shortest route between a food source and their nest. The ants communicate with one another by means of pheromone trails and exchange information about which path should be followed. The more ants trace a given path, the more attractive this path (trail) becomes, as it is followed by further ants depositing their own pheromone. This autocatalytic and collective behavior results in the establishment of the shortest route. As shown in Fig. 1, two ants start from their nest at the same time, in different directions, in search of a food source. One of them chooses the path that turns out to be shorter, while the other takes the longer sojourn. The ant moving on the shorter path returns to the nest earlier, and the pheromone deposited on this path is consequently greater than that deposited on the longer path. Other ants in the nest thus have a higher probability of following the shorter route, and they too deposit their own pheromone on this path. More and more ants are soon attracted to it, and hence the optimal route from the nest to the food source and back is established very quickly. Such a pheromone-mediated cooperative search process leads to intelligent swarm behavior. This real-life search behavior was the key motivating factor in the formulation of artificial ant algorithms for solving several large-scale combinatorial and function optimization problems [24–26].

∗ Corresponding author. Tel.: +91-20-589-3095; fax: +91-20-589-3041. E-mail address: [email protected] (B.D. Kulkarni).

doi:10.1016/j.aca.2003.12.032

In all these algorithms, a set of ant-like agents or software ants solves the problem under consideration through a cooperative effort. This effort is mediated by exchanging information on the problem structure, which the agents concurrently collect while stochastically building solutions. Similarly, we propose an ACO algorithm for data clustering, in which a set of concurrent, distributed agents collectively discovers a sensible organization of the objects in a given dataset.

Recently, ant-like agents have also been applied to object clustering [27–30]. In those algorithms, a population of distributed agents moves randomly on a two-dimensional grid and moves objects around to form clusters. Initially, each agent selects a random direction among the eight possible ones. When moving next, the agent continues in the previously chosen direction with a threshold probability; otherwise it randomly generates a new direction. The number of moves an agent can perform is defined a priori. The agents try to pick up or drop objects on the two-dimensional board according to a local density measure of similar objects, without any global control over the agents. The approach introduced in this paper is quite different from these ant algorithms for data clustering. In our algorithm, each agent discovers a possible partition of the objects in a given dataset, and the quality of the partition is measured with respect to a (Euclidean distance) metric. The information an agent gathers about the clustering of objects is accumulated in a global information hub (the pheromone trail matrix) and is used by the other agents to construct possible clustering solutions and iteratively improve them. The algorithm runs for a given maximum number of iterations, and the best solution found with respect to the given metric represents an optimal or near-optimal partitioning of the objects of the dataset into subsets.

This paper is organized as follows. Section 2 describes the steps involved in the ACO algorithm for solving a clustering problem, while Section 3 reports computational results evaluating the performance of the ACO algorithm on several simulated and chemical datasets. Finally, the conclusions of the current work are presented in Section 4.

Fig. 1. Movement of ants from the nest to the food source and back: (a) two ants start exploring paths towards the food source; (b) pheromone is deposited more quickly on the shorter path, and eventually most of the ants choose the shorter path. [Figure: panels (a) and (b) show a nest and a food source connected by one short and one long path.]

2. ACO algorithm for clustering problems

This section describes the ant algorithm for solving a clustering problem in which the aim is to obtain an optimal assignment of N objects in $R^n$ to one of K clusters, such that the sum of squared Euclidean distances between each object and the center of its cluster is minimized. The algorithm uses R agents to build solutions. An agent starts with an empty solution string S of length N, where each element of the string corresponds to one of the test samples. The value assigned to an element of solution string S represents the cluster number to which the corresponding test sample is assigned in S. For example, the representative solution string S1 in Table 4, constructed for N = 8 and K = 3, is

2 1 3 2 2 3 2 1

We note that the first element of this string is assigned to cluster number 2, the second element is allocated to cluster number 1, and so on. To construct a solution, the agent uses the pheromone trail information to allocate each element of string S to an appropriate cluster label. At the start of the algorithm, the pheromone matrix τ is initialized to some small value τ0. The trail value τij at location (i, j) represents the pheromone concentration associating sample i with cluster j. For the problem of separating N samples into K clusters, the pheromone matrix is therefore of size N × K, and each sample is associated with K pheromone concentrations. The pheromone trail matrix evolves as the iterations progress. At any iteration level, each of the agents (software ants) develops such trial solutions through pheromone-mediated communication, with a view to obtaining a near-optimal partition of the given N test samples into K groups satisfying the defined objective. After a population of R trial solutions has been generated, a local search is performed to further improve the fitness of these solutions. The pheromone matrix is then updated according to the quality of the solutions produced by the agents. Guided by the modified pheromone matrix, the agents build improved solutions, and the above steps are repeated for a certain number of iterations.

2.1. Algorithm details

As explained earlier, the ants start with empty solution strings, and in the first iteration the elements of the pheromone matrix are initialized to the same value. As the iterations progress, the pheromone matrix is updated according to the quality of the solutions produced. For the purpose of illustration, consider a dataset containing N = 8 test samples defined by n = 4 attributes, as shown in Table 1. The test samples are to be clustered into K = 3 subsets using R = 10 agents.
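For concreteness, the two data structures the algorithm maintains can be written down in a few lines of C++ (the language the authors report using in Section 3). This is our illustrative sketch, not the authors' code; the names Solution, PheromoneMatrix and tau0 are ours:

#include <vector>

// A candidate solution: element i holds the cluster label (1..K) of sample i.
using Solution = std::vector<int>;

// Pheromone matrix: tau[i][j] is the trail value associating sample i
// with cluster j (N rows, K columns).
using PheromoneMatrix = std::vector<std::vector<double>>;

// In the first iteration every trail entry is initialized to the same
// small constant tau0 (the paper does not report its numerical value).
PheromoneMatrix initPheromone(int N, int K, double tau0) {
    return PheromoneMatrix(N, std::vector<double>(K, tau0));
}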

Table 1
Illustrative dataset to explain the ACO algorithm for clustering, with N = 8 and n = 4

N   v=1  v=2  v=3  v=4
1   5.1  3.5  1.4  0.2
2   4.9  3.0  1.4  0.2
3   4.7  3.2  1.3  0.2
4   4.6  3.1  1.5  0.2
5   5.0  3.6  1.4  0.2
6   5.4  3.9  1.7  0.4
7   4.6  3.4  1.4  0.3
8   5.0  3.4  1.5  0.2

Table 3
Normalized pheromone trail matrix

N   K=1     K=2     K=3
1   0.3695  0.3825  0.2479
2   0.3825  0.2479  0.3695
3   0.3825  0.3695  0.2479
4   0.2479  0.3825  0.3695
5   0.3695  0.3825  0.2479
6   0.2479  0.3695  0.3825
7   0.2479  0.5041  0.2479
8   0.3825  0.3695  0.2479

We now proceed to describe the progress of the current iteration, t, with a view to providing a clear picture of the algorithm's details. The agents build their solutions using the information in the pheromone matrix as updated at the end of iteration t − 1. The pheromone concentrations for the first sample, as shown in Table 2, are τ11 = 0.014756, τ12 = 0.015274 and τ13 = 0.009900. This indicates that, at the current iteration, sample number 1 has the highest probability of belonging to cluster number 2, because τ12 is the largest of the three. To generate a solution S, the agent selects a cluster number for each element of string S in one of two ways:

(i) with probability q0, it chooses the cluster having the maximum pheromone concentration (q0 is an a priori defined number, 0 < q0 < 1; q0 = 0.98 for the illustrative example and in our simulations); or
(ii) with probability (1 − q0), it chooses one of the K (here, three) clusters from a stochastic distribution with probabilities pij.

The first process is known as exploitation, whereas the latter is termed biased exploration [31]. To explain how these two procedures work together, consider developing the first solution string, S1, shown in Table 4. We generate as many random numbers from the uniform distribution on (0, 1) as there are elements in the solution string; say the generated numbers are (0.693241, 0.791452, 0.986142, 0.988432, 0.243672, 0.967721, 0.0914324, 0.348767). Elements 1, 2, 5, 6, 7 and 8 are then assigned to clusters by the first procedure (the cluster with the highest pheromone concentration in Table 2 is chosen), since the random numbers corresponding to these elements are less than q0. The remaining elements 3 and 4 of solution string S1 are assigned to one of the three clusters by the second procedure, since their corresponding random numbers exceed the threshold q0. The second process chooses one of the three clusters with the normalized pheromone probability

$p_{ij} = \tau_{ij} \Big/ \sum_{k=1}^{K} \tau_{ik}, \qquad j = 1, \ldots, K$   (1)

where pij is the normalized probability that element i belongs to cluster j. For illustration, the normalized pheromone matrix is shown in Table 3. For the third element of solution string S1, cluster number 1, 2 or 3 is selected with normalized probability 0.3695, 0.3825 or 0.2479, respectively. The choice is readily made by generating a number from the uniform distribution on (0, 1): if the random number lies between 0 and 0.3695, cluster number 1 receives the third element; if it lies between 0.3695 and 0.7520, cluster number 2 is chosen; and if it is greater than 0.7520, cluster number 3 is chosen. Suppose the random number drawn is, say, 0.784342; since this is greater than 0.7520, the third element of string S1 is assigned to cluster number 3. Cluster numbers for the other elements are assigned similarly, and the complete solution string S1 shown in Table 4 is built. In the same way, the remaining nine agents construct their solutions, as given in Table 4.
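The two selection rules can be condensed into a short routine. The following C++ sketch is our illustration (the paper publishes no code); constructSolution and its parameter names are ours:

#include <random>
#include <vector>

// Build one solution string from the pheromone matrix tau (N x K).
// With probability q0 the cluster with maximal pheromone is chosen
// (exploitation); otherwise a cluster is drawn with the normalized
// probabilities p_ij of Eq. (1) (biased exploration).
std::vector<int> constructSolution(const std::vector<std::vector<double>>& tau,
                                   double q0, std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const int N = static_cast<int>(tau.size());
    const int K = static_cast<int>(tau[0].size());
    std::vector<int> s(N);
    for (int i = 0; i < N; ++i) {
        if (uni(rng) <= q0) {
            // Exploitation: pick the cluster with the highest trail value.
            int best = 0;
            for (int j = 1; j < K; ++j)
                if (tau[i][j] > tau[i][best]) best = j;
            s[i] = best + 1;                   // cluster labels run 1..K
        } else {
            // Biased exploration: roulette wheel on p_ij = tau_ij / sum_k tau_ik.
            double sum = 0.0;
            for (int j = 0; j < K; ++j) sum += tau[i][j];
            double r = uni(rng) * sum, acc = 0.0;
            s[i] = K;                          // fallback against rounding
            for (int j = 0; j < K; ++j) {
                acc += tau[i][j];
                if (r <= acc) { s[i] = j + 1; break; }
            }
        }
    }
    return s;
}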
Table 2
Pheromone trail matrix generated during a run of the ACO algorithm for the dataset shown in Table 1

N   K=1       K=2       K=3
1   0.014756  0.015274  0.009900
2   0.015274  0.009900  0.014756
3   0.015274  0.014756  0.009900
4   0.009900  0.015274  0.014756
5   0.014756  0.015274  0.009900
6   0.009900  0.014756  0.015274
7   0.009900  0.020131  0.009900
8   0.015274  0.014756  0.009900

The quality of a constructed solution is measured in terms of the value of the objective function for the given data-clustering problem. This objective function is defined as the sum of squared Euclidean distances between each object and the center of the cluster it belongs to. Consider a given dataset of N objects $\{x_1, x_2, \ldots, x_N\}$ in the n-dimensional space $R^n$, to be partitioned into a number, say K, of clusters or groups. The mathematical formulation of the data-clustering problem can then be described as

$\min F(w, m) = \sum_{j=1}^{K} \sum_{i=1}^{N} \sum_{v=1}^{n} w_{ij}\, \| x_{iv} - m_{jv} \|^2$   (2)

such that

$\sum_{j=1}^{K} w_{ij} = 1, \qquad i = 1, \ldots, N$   (3)

$\sum_{i=1}^{N} w_{ij} \ge 1, \qquad j = 1, \ldots, K$   (4)

where $x_{iv}$ is the value of the vth attribute of the ith sample; m is a cluster center matrix of size K × n; $m_{jv}$ is the average of the vth attribute values of all samples in cluster j; w is a weight matrix of size N × K; and $w_{ij}$ is the weight associating object $x_i$ with cluster j, assigned as

$w_{ij} = \begin{cases} 1 & \text{if object } i \text{ is contained in cluster } j \\ 0 & \text{otherwise} \end{cases}, \qquad i = 1, \ldots, N, \; j = 1, \ldots, K$

Referring to the first solution string S1 in Table 4,

2 1 3 2 2 3 2 1

we note that the first element of the string is assigned to cluster number 2, and thus $w_{11} = 0$, $w_{12} = 1$, $w_{13} = 0$. Similarly, the fifth object is allocated to group number 2, so $w_{51} = 0$, $w_{52} = 1$, $w_{53} = 0$, and so on. After obtaining the $w_{ij}$, the center of each cluster, $m_j$, is obtained as

$m_{jv} = \sum_{i=1}^{N} w_{ij} x_{iv} \Big/ \sum_{i=1}^{N} w_{ij}, \qquad j = 1, \ldots, K, \; v = 1, \ldots, n$   (5)

For a given solution string S1, knowing the cluster center matrix m and the weight matrix w, its function value can be calculated using Eq. (2). For solution S1 (Table 4), the weight matrix is

N   K=1  K=2  K=3
1   0    1    0
2   1    0    0
3   0    0    1
4   0    1    0
5   0    1    0
6   0    0    1
7   0    1    0
8   1    0    0

Using these weights in Eq. (5), the centers of the clusters m1, m2 and m3 are obtained as

Cluster  v=1     v=2     v=3     v=4
m1       4.9500  3.2000  1.4500  0.2000
m2       4.8250  3.4000  1.4250  0.2250
m3       5.0500  3.5500  1.5000  0.3000

Substituting the weight matrix and cluster centers into Eq. (2), the fitness (objective function value) of solution S1 is calculated. The computed objective function values for the 10 strings are shown in Table 4.

Table 4
Solutions generated by R = 10 agents during a run of the ACO algorithm

S     1 2 3 4 5 6 7 8    Fitness, F
S1    2 1 3 2 2 3 2 1    2.695110
S2    1 1 2 2 2 3 3 1    2.474522
S3    1 1 1 2 2 3 2 1    1.816471
S4    2 1 1 2 3 3 2 1    2.140193
S5    2 2 1 2 2 3 2 1    1.982272
S6    2 1 1 2 2 3 3 1    2.534078
S7    2 1 1 2 2 3 2 1    1.842034
S8    2 3 1 2 2 3 2 3    2.408086
S9    2 1 1 2 1 3 2 1    1.900668
S10   1 1 2 2 2 3 1 1    1.877386
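Eqs. (2)–(5) translate directly into a fitness routine. The sketch below is our hedged illustration, not the authors' code; data[i][v] is assumed to hold attribute v of sample i, and the explicit weight matrix is replaced by per-cluster membership counts, which is equivalent for crisp assignments:

#include <vector>

// Objective of Eq. (2): sum of squared Euclidean distances between each
// sample and the center (Eq. (5)) of the cluster it is assigned to.
double fitness(const std::vector<std::vector<double>>& data,
               const std::vector<int>& s, int K) {
    const int N = static_cast<int>(data.size());
    const int n = static_cast<int>(data[0].size());
    std::vector<std::vector<double>> center(K, std::vector<double>(n, 0.0));
    std::vector<int> count(K, 0);
    // Eq. (5): centers are attribute-wise means of the member samples
    // (the counts play the role of the column sums of the weight matrix w).
    for (int i = 0; i < N; ++i) {
        const int j = s[i] - 1;                // labels run 1..K
        ++count[j];
        for (int v = 0; v < n; ++v) center[j][v] += data[i][v];
    }
    for (int j = 0; j < K; ++j)
        if (count[j] > 0)
            for (int v = 0; v < n; ++v) center[j][v] /= count[j];
    // Eq. (2): accumulate squared distances to the assigned centers.
    double F = 0.0;
    for (int i = 0; i < N; ++i) {
        const int j = s[i] - 1;
        for (int v = 0; v < n; ++v) {
            const double d = data[i][v] - center[j][v];
            F += d * d;
        }
    }
    return F;
}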
Many of the available ACO algorithms employ some form of local search procedure with a view to improving the solutions discovered by the software ants [31,32]. If heuristic information about a particular problem domain is not easily available, local search can help to find good results [33]. In these algorithms, the local search procedure is applied either to all R generated solutions or to a few percent of them. In this work, we performed local search on L solutions representing the best 20% of the total. Before conducting local search, the members of the population are sorted in ascending order of their function values, so that a simple local search procedure can be applied to the top L solutions with the highest fitness (i.e. the lowest objective function values). In our illustrative example, the sorted solution strings are shown in Table 5, and with L = 2 we conduct local search on the top two strings in that table.

Table 5
Solutions in Table 4 sorted as per the criterion of the clustering problem

S     1 2 3 4 5 6 7 8    Fitness, F
S1    1 1 1 2 2 3 2 1    1.816471
S2    2 1 1 2 2 3 2 1    1.842034
S3    1 1 2 2 2 3 1 1    1.877386
S4    2 1 1 2 1 3 2 1    1.900668
S5    2 2 1 2 2 3 2 1    1.982272
S6    2 1 1 2 3 3 2 1    2.140193
S7    2 3 1 2 2 3 2 3    2.408086
S8    1 1 2 2 2 3 3 1    2.474522
S9    2 1 1 2 2 3 3 1    2.534078
S10   2 1 3 2 2 3 2 1    2.695110

There are various ways of conducting local search. In our work, we alter the cluster number of each sample in the solution string with a certain threshold probability pls, an a priori defined number in the range 0 to 1 (pls = 0.01 for the illustrative example and for our simulations). Consider the topmost solution string in Table 5:

1 1 1 2 2 3 2 1

Let us first generate eight random numbers in the range between 0 and 1, say (0.231345, 0.742312, 0.655361, 0.198312, 0.001636, 0.1278345, 0.874452, 0.436587).

Only the random number corresponding to the fifth element is less than the threshold probability of 0.01, so only this element is assigned a different cluster number. Currently the fifth element is assigned to cluster number 2; it is therefore reassigned to either cluster number 1 or cluster number 3 with equal probability, by generating another random number. The solution string (LS1) obtained in the neighborhood of the topmost solution string S1 by local search is

LS1   1 1 1 2 1 3 2 1

It can be observed that the fifth element has been relocated to cluster number 1. Similarly, the second string from the top in Table 5, S2, undergoes the local search operation, and the solution string (LS2) generated in the neighborhood of S2 is

LS2   2 1 1 2 2 3 1 1

After conducting the local search, the objective function values of the newly generated solutions are computed using Eq. (2). These solutions are accepted only if they improve the fitness. The objective function values for the locally generated solutions LS1 and LS2 are 1.593560 and 1.835535, respectively. Here, the quality of both generated solutions is better than that of solutions S1 and S2 (1.816471 and 1.842034, respectively) given in Table 5; therefore, the newly generated solutions replace these two solutions in Table 5. The local search algorithm can be written as follows. With the local search probability threshold pls in [0, 1], a neighbor of Sk, k = 1, ..., L, is generated as:

(i) Set k = 1.
(ii) Let St be a temporary solution and assign St(i) = Sk(i), i = 1, ..., N.
(iii) For each element i of St, draw a random number r in (0, 1). If r ≤ pls, randomly select an integer j in the range (1, K) such that j ≠ Sk(i), and let St(i) = j.
(iv) Calculate the cluster centers and weights associated with solution string St, and find its objective function value, Ft, using Eq. (2). If Ft is less than Fk, then set Sk = St and Fk = Ft.
(v) Set k = k + 1; if k ≤ L, go to step (ii), else stop.
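Steps (i)–(v) admit an almost literal transcription. The following C++ sketch is ours, not the authors' code; it reuses the hypothetical fitness routine from the sketch after Eq. (5):

#include <random>
#include <vector>

// Local search of steps (i)-(v): for each of the top L solutions, move each
// element to a different, randomly chosen cluster with probability p_ls and
// keep the perturbed string only if it improves the objective value.
void localSearch(std::vector<std::vector<int>>& sols,   // sorted best-first
                 std::vector<double>& F,                // their fitness values
                 const std::vector<std::vector<double>>& data,
                 int K, int L, double pls, std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    for (int k = 0; k < L; ++k) {                       // steps (i) and (v)
        std::vector<int> st = sols[k];                  // step (ii)
        for (std::size_t i = 0; i < st.size(); ++i) {   // step (iii)
            if (uni(rng) <= pls) {
                std::uniform_int_distribution<int> shift(1, K - 1);
                // Shift by 1..K-1 (mod K) so the new label differs from the old.
                st[i] = (st[i] - 1 + shift(rng)) % K + 1;
            }
        }
        const double Ft = fitness(data, st, K);         // step (iv)
        if (Ft < F[k]) { sols[k] = st; F[k] = Ft; }
    }
}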
After the local search operation has been performed, the pheromone matrix is updated. This pheromone updating process reflects the usefulness of the dynamic information provided by the software ants. The pheromone matrix is thus a kind of adaptive memory, containing information provided by the previously found superior solutions, and is updated at the end of each iteration. The trail updating process applied in this algorithm considers the best L of the R solutions discovered by the agents, as per the given criterion (Eq. (2)), at iteration level t. These L agents mimic the pheromone deposition of real ants by adding real-valued amounts to the trail entries τij associated with their solution attributes. The trail information is updated by the rule

$\tau_{ij}(t+1) = (1 - \rho)\,\tau_{ij}(t) + \sum_{l=1}^{L} \Delta\tau_{ij}^{l}, \qquad i = 1, \ldots, N, \; j = 1, \ldots, K$   (6)

where ρ, which lies in [0, 1], is the evaporation rate (cf. Table 11) and (1 − ρ) the persistence of the trail; a higher value of ρ means that the information gathered in past iterations is forgotten faster. The amount $\Delta\tau_{ij}^{l}$ equals 1/Fl if cluster j is assigned to the ith element of the solution constructed by ant l, and zero otherwise. An optimal solution is one that minimizes the objective function value (2). The best solution held in memory is replaced by the current iteration's best solution whenever the latter has a lower objective function value. This completes one iteration of the algorithm.
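Eq. (6) can be sketched as follows (our illustration; sols and F are assumed to hold the best L solution strings and their objective values, and rho is the evaporation rate of Table 11):

#include <vector>

// Eq. (6): evaporate all trails by the factor (1 - rho), then let each of
// the best L solutions deposit 1/F_l on the (sample, cluster) pairs it uses.
void updatePheromone(std::vector<std::vector<double>>& tau,
                     const std::vector<std::vector<int>>& sols,
                     const std::vector<double>& F,
                     int L, double rho) {
    for (auto& row : tau)
        for (double& t : row) t *= (1.0 - rho);          // evaporation
    for (int l = 0; l < L; ++l) {
        const double deposit = 1.0 / F[l];               // better solutions deposit more
        for (std::size_t i = 0; i < sols[l].size(); ++i)
            tau[i][sols[l][i] - 1] += deposit;           // labels run 1..K
    }
}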
Thus, at any iteration level the algorithm essentially executes three steps: (1) generation of R new solutions by the software ants, using the pheromone trail information modified in the previous iteration; (2) a local search operation on the newly generated solutions; and (3) the update of the pheromone trail matrix. The algorithm carries out these three steps repeatedly for a given maximum number of iterations, and the solution having the lowest function value represents the optimal partitioning of the objects of the given dataset into groups. The ant algorithm for data clustering is summarized as a flowchart in Fig. 2.

Fig. 2. Flowchart of the ant algorithm for data clustering. [The flowchart reads: Start → send out R agents, each with an empty solution string S → for each agent i = 1, ..., R: construct solution Si using the pheromone trail; compute the weights of all test samples and the cluster centers; compute the clustering metric and assign it as the objective function value Fi of solution Si → select the best L of the R solutions by objective function value → for each l = 1, ..., L: set St = Sl and perform local search on St; recompute the weights, cluster centers and clustering metric Ft; if Ft < Fl, then set Fl = Ft and Sl = St → update the pheromone trail matrix using the best L solutions → if the termination criterion is not attained, begin the next iteration; otherwise print the best solution and stop.]
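Combining the earlier sketches, a schematic driver for these three steps might read as follows (all function names come from our sketches, not from the paper; parameter values follow Table 11, and the initial trail value tau0 is our placeholder, since the paper does not report it):

#include <algorithm>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Schematic main loop for one run of the ant algorithm for data clustering.
std::vector<int> acoCluster(const std::vector<std::vector<double>>& data, int K) {
    const int R = 50, itermax = 1000;
    const double q0 = 0.98, pls = 0.01, rho = 0.01, tau0 = 0.01;  // tau0: our placeholder
    std::mt19937 rng(2003);
    auto tau = initPheromone(static_cast<int>(data.size()), K, tau0);
    std::vector<int> best;
    double Fbest = std::numeric_limits<double>::infinity();
    for (int it = 0; it < itermax; ++it) {
        // Step 1: every agent constructs a solution from the trail matrix.
        std::vector<std::vector<int>> sols(R);
        std::vector<double> F(R);
        for (int a = 0; a < R; ++a) {
            sols[a] = constructSolution(tau, q0, rng);
            F[a] = fitness(data, sols[a], K);
        }
        // Sort the population best-first (ascending objective value).
        std::vector<int> order(R);
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(),
                  [&F](int a, int b) { return F[a] < F[b]; });
        std::vector<std::vector<int>> sorted(R);
        std::vector<double> Fs(R);
        for (int a = 0; a < R; ++a) { sorted[a] = sols[order[a]]; Fs[a] = F[order[a]]; }
        // Step 2: local search on the best 20% of the population.
        const int L = R / 5;
        localSearch(sorted, Fs, data, K, L, pls, rng);
        // Step 3: pheromone update with the best L solutions.
        updatePheromone(tau, sorted, Fs, L, rho);
        // Keep the best string found so far in memory.
        for (int a = 0; a < L; ++a)
            if (Fs[a] < Fbest) { Fbest = Fs[a]; best = sorted[a]; }
    }
    return best;  // best partition found over all iterations
}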
3. Results and discussion

We implemented the ACO clustering algorithm on five datasets. All algorithms were coded in C++, and all experiments were performed on a Pentium IV 400 MHz personal computer. The five datasets (two simulated and three chemical datasets) are described below.

Both simulated datasets were created using a random number generator that produced Gaussian-distributed sets of objects:

• Example 1. This dataset is composed of K = 3 clusters with 50 objects in each cluster. The data were generated using the means µ1 = [3, 0], µ2 = [0, 3], µ3 = [1.5, 2.5] and variances λ1 = [0.3, 1], λ2 = [1, 0.5], λ3 = [2, 1]. The dataset is shown in Fig. 3.
• Example 2. This dataset represents K = 6 clusters with 25 objects allocated to each cluster. The data were simulated using the means µ1 = [3, 0], µ2 = [0, 3], µ3 = [1.5, 2.5], µ4 = [0.2, 0.1], µ5 = [1.2, 0.8], µ6 = [0.1, 1.1] and variances λ1 = [0.3, 1], λ2 = [1, 0.5], λ3 = [2, 1], λ4 = [0.03, 1], λ5 = [2, 0.5], λ6 = [0.2, 0.4]. The dataset is given in Fig. 4.

Fig. 3. Example 1: scatter plot of the 150 simulated objects; distinct markers denote the objects of classes 1–3.
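Data of this kind are straightforward to regenerate. In the C++ sketch below (our illustration), we read each λ as a vector of per-dimension variances, so the standard deviation passed to the generator is its square root; the paper does not state this convention explicitly:

#include <cmath>
#include <random>
#include <vector>

// Draw 'count' two-dimensional Gaussian objects around mean mu with
// per-dimension variance lambda (our reading of Examples 1 and 2).
std::vector<std::vector<double>> gaussianCluster(
        const std::vector<double>& mu, const std::vector<double>& lambda,
        int count, std::mt19937& rng) {
    std::vector<std::vector<double>> pts;
    for (int c = 0; c < count; ++c) {
        std::vector<double> p(mu.size());
        for (std::size_t d = 0; d < mu.size(); ++d) {
            std::normal_distribution<double> g(mu[d], std::sqrt(lambda[d]));
            p[d] = g(rng);
        }
        pts.push_back(p);
    }
    return pts;
}

// Usage for the first cluster of Example 1:
//   std::mt19937 rng(1);
//   auto class1 = gaussianCluster({3.0, 0.0}, {0.3, 1.0}, 50, rng);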
Many authors have considered the iris, wine and human thyroid disease datasets as data-clustering problems for studying and evaluating the performance of their algorithms. These datasets are briefly described as follows:

• Example 3. The dataset consists of N = 150 samples of three species of iris flower (K = 3), viz. setosa, versicolor and virginica. Each object is defined by n = 4 attributes: sepal length, sepal width, petal length and petal width. The data were obtained from the UCI repository of machine learning databases [34].
• Example 4. This dataset contains the chemical analysis of N = 178 wines derived from three different cultivars (K = 3). The wine type is based on n = 13 continuous attributes derived from chemical analysis: alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline. It is also available in the public domain of the UCI repository of machine learning databases [34].
• Example 5. This dataset categorizes N = 215 samples of patients suffering from three human thyroid conditions (K = 3): euthyroid, hyperthyroidism and hypothyroidism; 150 individuals were tested as euthyroid, 30 patients experienced hyperthyroidism and 35 patients suffered from hypothyroidism. Each individual was characterized by the results of n = 5 laboratory tests: total serum thyroxine, total serum tri-iodothyronine, serum tri-iodothyronine resin uptake, serum thyroid-stimulating hormone (TSH), and the increase of TSH after injection of TSH-releasing hormone [35].

Fig. 4. Example 2: scatter plot of the 150 simulated objects; distinct markers denote the objects of classes 1–6.

To evaluate the performance of the ACO algorithm, we compared it with several typical stochastic algorithms: the simulated annealing (SA) approach [20], a genetic algorithm (GA) [19] and the tabu search (TS) approach [18]. The effectiveness of stochastic algorithms depends greatly on the generation of the initial solutions. Therefore, for every dataset each algorithm was run 10 times, each time starting from randomly generated initial solutions. Each experiment comprises at most 1000 iterations of the associated search procedure for the ACO, GA and TS algorithms; for each test, the SA procedure was called at most 30 000 times. The comparison of results for each dataset is based on the best solution found in the 10 distinct runs of each algorithm, the average number of function evaluations required, and the average processing time taken to attain the best solution. Solution quality is also reported in terms of the average and worst values of the clustering metric (Favg and Fworst, respectively) over the 10 runs of each of the four algorithms.

For clustering problem Example 1, the results given in Table 6 show that the ACO, GA and SA clustering algorithms all attain the optimum value of 203.595559. In fact, the ACO found this optimum nine times in 10 runs, compared with five times for the SA and once for the GA. The average number of function evaluations to obtain the best solution and the average time required to attain convergence are 12 396 and 31.49 s, respectively, for the ACO algorithm; both are better than for the other algorithms, as shown in Table 6.

Table 6
Results obtained by the four algorithms for 10 different runs on Example 1

Method  Fbest       Favg        Fworst      Function evaluations  CPU time (s)
ACO     203.595559  203.626619  203.906163  12396                 31.49
GA      203.595559  204.057260  204.689421  32757                 69.89
TS      204.053636  204.436562  205.381524  23401                 61.69
SA      203.595559  203.706785  203.897976  27505                 72.93

For clustering problem Example 2, the ACO approach found the best function value, 172.948099 (Table 7). Indeed, the Favg of 173.364862 obtained by the ACO algorithm is lower than even the best solutions obtained by the GA and TS approaches. In terms of the number of function evaluations and the processing time required, the ACO algorithm again fares better than its counterparts.

Table 7
Results obtained by the four algorithms for 10 different runs on Example 2

Method  Fbest       Favg        Fworst      Function evaluations  CPU time (s)
ACO     172.948099  173.364862  173.613300  25260                 66.21
GA      173.990484  177.266506  185.867600  40065                 93.79
TS      176.576398  178.870536  180.802676  28191                 83.91
SA      173.244913  174.572357  177.778584  30000                 79.48

Table 8
Results obtained by the four algorithms for 10 different runs on Example 3

Method  Fbest       Favg        Fworst      Function evaluations  CPU time (s)
ACO     97.100777   97.171546   97.808466   10998                 33.72
GA      113.986503  125.197025  139.778272  38128                 105.53
TS      97.365977   97.868008   98.569485   20201                 72.86
SA      97.100777   97.136425   97.263845   29103                 95.92

Table 9
Results obtained by the four algorithms for 10 different runs on Example 4

Method  Fbest         Favg          Fworst        Function evaluations  CPU time (s)
ACO     16530.533807  16530.533807  16530.533807  9306                  68.29
GA      16530.533807  16530.533807  16530.533807  33551                 226.68
TS      16666.226987  16785.459275  16837.535670  22716                 161.45
SA      16530.533807  16530.533807  16530.533807  7917                  57.28

Table 10
Results obtained by the four algorithms for 10 different runs on Example 5

Method  Fbest         Favg          Fworst        Function evaluations  CPU time (s)
ACO     10111.827759  10112.126903  10114.819200  25626                 102.15
GA      10116.294861  10128.823145  10148.389608  45003                 153.24
TS      10249.72917   10354.315021  10438.780449  29191                 114.01
SA      10111.827759  10114.045265  10118.934358  28675                 108.22

The iris dataset is Example 3. It contains 150 objects to be partitioned into three clusters. For this problem, the ACO and SA methods obtain the best solution of 97.100777; the ACO found this optimum nine times, compared with five times for the SA. Table 8 shows that the ACO also required the fewest function evaluations (10 998) and the least processing time (33.72 s). The results obtained for clustering problem Example 4 are given in Table 9. The ACO, SA and GA approaches all provide the optimum solution of 16530.533807, and all three found it in each of their 10 runs. The function evaluations and execution time taken by the ACO algorithm are higher than those of the SA approach but lower than those of the GA and TS approaches. The human thyroid disease dataset (Example 5) consists of 215 objects to be allocated to three clusters. Both the ACO and SA algorithms provide the optimum solution of 10111.827759 for this problem, with success rates of 90 and 30% over the 10 runs, respectively. In terms of function evaluations and processing time, the ACO performed better than the SA, GA and TS clustering algorithms, as can be observed from Table 10.

Table 11
Values of the parameters of each of the four algorithms

ACO: ants (R) = 50; probability threshold for maximum trail (q0) = 0.98; local search probability (pls) = 0.01; evaporation rate (ρ) = 0.01; maximum number of iterations (itermax) = 1000.
GA: population size = 50; crossover rate = 0.8; mutation rate = 0.001; maximum number of iterations = 1000.
TS: tabu list size = 25; number of trial solutions = 40; probability threshold = 0.98; maximum number of iterations = 1000.
SA: probability threshold = 0.98; initial temperature = 5; temperature multiplier = 0.98; final temperature = 0.01; number of iterations to detect steady state = 100; maximum number of iterations = 30000.

Several simulations were performed to find the algorithmic parameters that yield the best performance of all the algorithms in terms of the quality of the solution found, the function evaluations and the processing time required. The algorithmic parameters used in this study are given in Table 11.

In this study, several datasets were considered, with the number of clusters ranging from K = 3 to K = 6 and the number of attributes from n = 2 to n = 13. As seen, the results obtained by the ACO method are superior to those of the SA, GA and TS techniques. The results illustrate that the proposed ant colony optimization approach can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to clustering problems that allocate N objects to K clusters.

4. Conclusions

In summary, an ant colony optimization algorithm for solving clustering problems has been developed in this paper. The software ants use the pheromone matrix, a kind of adaptive memory, which guides the ants towards the optimal clustering solution. The pheromone (weight) deposited at location (i, j), i.e. for the allocation of sample i to cluster j in a constructed solution, depends on the objective function value of that solution (smaller function values deposit more pheromone) and on the evaporation rate. The evaporation rate is a kind of forgetting factor that allows the algorithm to explore other cluster allocations of object i, and thus helps it converge towards an optimal cluster representation of the problem as the iterations progress. The ACO algorithm for data clustering can be applied when the number of clusters is known a priori and the clusters are crisp in nature. To evaluate the performance of the ACO algorithm, it was compared with other stochastic algorithms, viz. a genetic algorithm, simulated annealing and tabu search. The algorithm has been implemented and tested on several simulated and real datasets; preliminary computational experience is very encouraging in terms of the quality of the solution found, the average number of function evaluations and the processing time required.

Acknowledgements

Financial support received from the Department of Science and Technology, New Delhi, India is gratefully acknowledged. The author PS thanks the Council of Scientific and Industrial Research (CSIR), Government of India, New Delhi, for a Senior Research Fellowship.

References

[1] K.J. Mo, S. Eo, D. Shin, E.S. Yoon, Comput. Chem. Eng. 22 (1998) 555–562.
[2] P. Teppola, S.-P. Mujunen, P. Minkkinen, Chemometr. Intell. Lab. Syst. 45 (1999) 23–38.
[3] M. Ronen, Y. Shabtai, H. Guterman, Biotech. Bioeng. 77 (2002) 420–429.
[4] A. Linusson, S. Wold, B. Nordén, Chemometr. Intell. Lab. Syst. 44 (1998) 213–227.
[5] R.G. Lawson, P.C. Jurs, J. Chem. Inf. Comput. Sci. 30 (1990) 137–144.
[6] W.J. Dunn, M.J. Greenberg, S.S. Callejas, J. Med. Chem. 19 (1976) 1299–1301.
[7] M.L.M. Beckers, W.J. Melssen, L.M.C. Buydens, Comput. Chem. 21 (1997) 377–390.
[8] L. Kaufman, A. Pierreux, P. Rousseuw, M.P. Derde, M.R. Detaevernier, D.L. Massart, G. Platbrood, Anal. Chim. Acta 153 (1983) 257–260.
[9] J.W. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001.
[10] S.Z. Selim, M.A. Ismail, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 81–87.
[11] J.W. Welch, J. Stat. Comput. Simulat. 15 (1983) 17–25.
[12] D. Fisher, Mach. Learn. 2 (1987) 139–172.
[13] J. Banfield, A. Raftery, Biometrics 49 (1993) 803–821.
[14] J.-H. Jiang, J.H. Wang, X. Chu, R.-Q. Yu, Anal. Chim. Acta 354 (1997) 263–274.
[15] K. Szczubialka, J. Verdú-Andrés, D.L. Massart, Chemometr. Intell. Lab. Syst. 41 (1998) 145–160.
[16] J.A. Fernández Pierna, D.L. Massart, Anal. Chim. Acta 408 (2000) 13–20.
[17] T.N. Tran, R. Wehrens, L.M.C. Buydens, Anal. Chim. Acta 490 (2003) 303–312.
[18] K.S. Al-Sultan, Pattern Recogn. 28 (1995) 1443–1451.
[19] C.A. Murthy, N. Chowdhury, Pattern Recogn. Lett. 17 (1996) 825–832.
[20] S.Z. Selim, K.S. Al-Sultan, Pattern Recogn. 24 (1991) 1003–1008.
[21] L.-X. Sun, Y.-L. Xie, X.-H. Song, J.-H. Wang, R.-Q. Yu, Comput. Chem. 18 (1994) 103–108.
[22] M. Dorigo, V. Maniezzo, A. Colorni, IEEE Trans. Syst. Man Cybern. 26 (1996) 29–41.
[23] M. Dorigo, G. Di Caro, L.M. Gambardella, Artif. Life 5 (1999) 137–172.
[24] D. Costa, A. Hertz, J. Operat. Res. Soc. 48 (1997) 295–303.
[25] G. Di Caro, M. Dorigo, J. Artif. Intell. Res. 9 (1998) 317–365.
[26] R. Schoonderwoerd, O. Halland, J. Bruten, L. Rothkrantz, Adapt. Behav. 5 (1996) 169–207.
[27] J.-L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, L. Chretien, in: J.A. Meyer, S.W. Wilson (Eds.), From Animals to Animats 1, MIT Press, Cambridge, MA, 1991, pp. 356–363.
[28] E.D. Lumer, B. Faieta, in: D. Cliff, P. Husbands, J.A. Meyer, W. Stewart (Eds.), From Animals to Animats 3, MIT Press, Cambridge, MA, 1994, pp. 501–508.
[29] P. Kuntz, P. Layzell, D. Snyers, J. Heuristics 5 (1998) 327–351.
[30] N. Monmarché, M. Slimane, G. Venturini, in: D. Floreano, J.D. Nicoud, F. Mondala (Eds.), Lecture Notes in Artificial Intelligence, Springer-Verlag, 1999, pp. 626–635.
[31] L.M. Gambardella, M. Dorigo, INFORMS J. Comput. 12 (2000) 237–255.
[32] L.M. Gambardella, É.D. Taillard, G. Agazzi, in: D. Corne, M. Dorigo, F. Glover (Eds.), New Ideas in Optimization, McGraw-Hill, London, UK, 1999, pp. 63–76.
[33] T. Stützle, H. Hoos, Proceedings of the Second International Conference on Metaheuristics, Sophia-Antipolis, France, July 21–24, 1997, pp. 309–314.
[34] UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[35] D. Coomans, M. Jonckheer, D.L. Massart, I. Broechaert, P. Blockx, Anal. Chim. Acta 103 (1978) 409–415.
