An ant colony approach for clustering
Received 2 September 2003; received in revised form 25 November 2003; accepted 11 December 2003
Abstract
This paper presents an ant colony optimization methodology for optimally clustering N objects into K clusters. The algorithm employs distributed agents which mimic the way real ants find the shortest path from their nest to a food source and back. The algorithm has been implemented and tested on several simulated and real datasets, and its performance is compared with that of other popular stochastic/heuristic methods, viz. the genetic algorithm, simulated annealing and tabu search. Our computational simulations reveal very encouraging results in terms of the quality of solution found, the average number of function evaluations and the processing time required.
© 2003 Elsevier B.V. All rights reserved.
Table 1
Illustrative dataset to explain the ACO algorithm for clustering, with N = 8 objects and n = 4 attributes

N    Attr. 1    Attr. 2    Attr. 3    Attr. 4
1    5.1        3.5        1.4        0.2
2    4.9        3.0        1.4        0.2
3    4.7        3.2        1.3        0.2
4    4.6        3.1        1.5        0.2
5    5.0        3.6        1.4        0.2
6    5.4        3.9        1.7        0.4
7    4.6        3.4        1.4        0.3
8    5.0        3.4        1.5        0.2

Table 3
Normalized pheromone trail matrix (rows: objects i = 1, ..., N; columns: clusters j = 1, ..., K)

N    K = 1     K = 2     K = 3
1    0.3695    0.3825    0.2479
2    0.3825    0.2479    0.3695
3    0.3825    0.3695    0.2479
4    0.2479    0.3825    0.3695
5    0.3695    0.3825    0.2479
6    0.2479    0.3695    0.3825
7    0.2479    0.5041    0.2479
8    0.3825    0.3695    0.2479
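To make the role of Table 3 concrete, the sketch below shows how an agent could turn one row of the normalized pheromone matrix into a cluster assignment. This is a minimal Python illustration, not the paper's code; the exploit/explore threshold q0 and its default value are assumptions.

```python
import numpy as np

def construct_solution(pheromone, rng, q0=0.9):
    """Build one agent's solution string S: one cluster label per object.

    pheromone: (N, K) trail matrix, as in Table 3. With probability q0 the
    agent exploits (takes the cluster with the largest trail value in the
    object's row); otherwise it samples a cluster with probability
    proportional to the row's normalized trail values.
    """
    n_objects, n_clusters = pheromone.shape
    probs = pheromone / pheromone.sum(axis=1, keepdims=True)  # row-normalize
    solution = np.empty(n_objects, dtype=int)
    for i in range(n_objects):
        if rng.random() <= q0:
            solution[i] = np.argmax(probs[i])                 # exploitation
        else:
            solution[i] = rng.choice(n_clusters, p=probs[i])  # exploration
    return solution
```

Applied to the Table 3 row for object 7 (0.2479, 0.5041, 0.2479), for example, the exploitation branch always selects the middle cluster (j = 2 in the table's 1-based numbering), and the exploration branch selects it with probability of about 0.50.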
[Flowchart figure (image not recovered). Recoverable step labels: Start; send R agents, each with an empty solution string S; i = 1; construct solution Si using the pheromone trail; compute weights of all test samples, and cluster centers.]
[Scatter-plot figures (images not recovered). Both panels plot attribute x2 on the vertical axis; only axis tick labels survived extraction. The second panel belongs to Fig. 4, captioned below.]
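The flowchart steps recovered above can be read as the following loop. This is a hedged sketch, not the authors' implementation: the number of agents R, the evaporation rate rho, the deposit amount 1/F and the deposit by all agents (the paper may restrict deposits to the best agents, and may add a local search step) are assumptions; only the 1000-iteration cap comes from the text below. It reuses construct_solution from the previous sketch and the clustering_metric sketched in the results section further on.

```python
import numpy as np

def aco_clustering(X, K, R=10, iterations=1000, rho=0.1, seed=0):
    """Skeleton of the loop in the flowchart: send R agents, each with an
    empty solution string; build each solution S from the pheromone trail;
    score it; evaporate the trail; and deposit pheromone in proportion
    to 1/F, so better solutions reinforce their assignments more.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    pheromone = np.full((N, K), 1.0 / K)        # uniform initial trail
    best_S, best_F = None, np.inf
    for _ in range(iterations):
        solutions = [construct_solution(pheromone, rng) for _ in range(R)]
        scores = [clustering_metric(X, S, K) for S in solutions]
        for S, F in zip(solutions, scores):     # track the best solution so far
            if F < best_F:
                best_S, best_F = S.copy(), F
        pheromone *= 1.0 - rho                  # evaporation: the forgetting factor
        for S, F in zip(solutions, scores):     # smaller F -> larger deposit
            for i, j in enumerate(S):
                pheromone[i, j] += 1.0 / F
    return best_S, best_F
```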
Fig. 4. Example 2: (◇) objects from class 1; (□) objects from class 2; ( ) objects from class 3; (×) objects from class 4; (+) objects from class 5; (○) objects from class 6.
Table 6
Results obtained by the four algorithms for 10 different runs on Example 1
Columns: Method; Function value; Function evaluations; CPU time (s). [table body not recovered]
…5 laboratory tests: total serum thyroxine, total serum tri-iodothyronine, serum tri-iodothyronine resin uptake, serum thyroid-stimulating hormone (TSH), and the increase in TSH after injection of TSH-releasing hormone [35].

To evaluate the performance of the ACO algorithm, we have compared it with several typical stochastic algorithms, including the simulated annealing (SA) approach [20], the genetic algorithm (GA) [19] and the tabu search (TS) approach [18]. The effectiveness of stochastic algorithms depends greatly on the generation of initial solutions. Therefore, for every dataset, each algorithm was run 10 times, each time with randomly generated initial solutions. Each experiment consists of at most 1000 iterations of the associated search procedure for the ACO algorithm, the GA approach and the TS algorithm; for each test, the SA procedure was called at most 30 000 times. The comparison of results for each dataset is based on the best solution found in 10 distinct runs of each algorithm, the average number of function evaluations required and the average processing time taken to attain the best solution. The solution quality is also given in terms of the average and worst values of the clustering metric (Favg and Fworst, respectively) over the 10 runs of each of the four algorithms.

For clustering problem Example 1, the results given in Table 6 show that the ACO, GA and SA clustering algorithms find the optimum value of 203.595559. In fact, the ACO found this optimum nine times in 10 runs, compared with five times and one time for the SA and GA approaches, respectively. The average number of function evaluations to obtain the best solution and the average time required to attain convergence are 12 396 and 31.49 s, respectively, for the ACO algorithm, which are better than for the other algorithms, as shown in Table 6.
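For reference, here is a sketch of the clustering metric F whose best, average and worst values (Favg, Fworst) the comparisons above report. It assumes the common formulation, the sum of squared Euclidean distances between each object and its cluster centre; whether the paper uses plain or squared distances is not stated in this excerpt.

```python
import numpy as np

def clustering_metric(X, S, K):
    """Clustering metric F for a candidate solution S: the sum over clusters
    of the squared Euclidean distances between member objects and their
    cluster centre (assumed formulation). Empty clusters are skipped here;
    a full implementation would repair or penalize them.
    """
    F = 0.0
    for j in range(K):
        members = X[S == j]                  # objects assigned to cluster j
        if len(members) == 0:
            continue
        centre = members.mean(axis=0)        # cluster centre
        F += ((members - centre) ** 2).sum()
    return float(F)
```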
Table 7
Results obtained by the four algorithms for 10 different runs on Example 2
Columns: Method; Function value; Function evaluations; CPU time (s). [table body not recovered]

Table 8
Results obtained by the four algorithms for 10 different runs on Example 3
Columns: Method; Function value; Function evaluations; CPU time (s). [table body not recovered]

Table 9
Results obtained by the four algorithms for 10 different runs on Example 4
Columns: Method; Function value; Function evaluations; CPU time (s). [table body not recovered]

Table 10
Results obtained by the four algorithms for 10 different runs on Example 5
Columns: Method; Function value; Function evaluations; CPU time (s). [table body not recovered]
For clustering problem Example 2, the ACO and SA approaches find the optimum value of 172.984099. From Table 7, the Favg of 173.364862 obtained by the ACO algorithm is lower than the best solution obtained by the GA and TS approaches. In terms of the number of function evaluations and the processing time required, the ACO algorithm fares better than its counterparts.

The iris dataset is Example 3. It contains 150 objects to be partitioned into three clusters. For this problem, the ACO and SA methods obtain the best solution of 97.100777. The ACO was able to find the optimum nine times, compared with five times for the SA. Table 8 shows that the ACO required the fewest function evaluations (10 998) and the least processing time (33.72 s).

The results obtained for clustering problem Example 4 are given in Table 9. The ACO, SA and GA approaches find the optimum solution of 16530.533807, and all three found it in every one of their 10 runs. The function evaluations and execution time taken by the ACO algorithm are higher than for the SA approach but lower than for the GA and TS approaches.

The human thyroid disease dataset (Example 5) consists of 215 objects to be allocated to three clusters. Both the ACO and SA algorithms find the optimum solution of 10111.827759, with success rates of 90 and 30% over 10 runs, respectively. In terms of the function evaluations and the processing time, the ACO performed better than the SA, GA and TS clustering algorithms, as can be observed from Table 10.
Table 11
Values of the parameters of each of the four algorithms (ACO, GA, TS, SA). [table body not recovered]
Several simulations were performed to find the algorithmic parameters that give the best performance of each algorithm in terms of the quality of solution found, the function evaluations and the processing time required. The algorithmic parameters used in this study are given in Table 11.

In this study, several datasets were considered, with the number of clusters ranging from K = 3 to K = 6 and the number of attributes from n = 2 to n = 13. As seen, the results obtained by the ACO method are superior to those of the SA, GA and TS techniques. The results illustrate that the proposed ant colony optimization approach can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to clustering problems of allocating N objects to K clusters.
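The reporting protocol described above (10 independent runs per dataset, recording the best, average and worst metric values) could be reproduced with a small harness such as the following. The function evaluate and its defaults are hypothetical names introduced here for illustration; aco_clustering is the skeleton sketched earlier.

```python
import numpy as np

def evaluate(X, K, n_runs=10):
    """Run the ACO clustering n_runs times from different seeds and report
    the best, average and worst clustering-metric values, mirroring the
    Fbest/Favg/Fworst reporting used in Tables 6-10."""
    scores = [aco_clustering(X, K, seed=run)[1] for run in range(n_runs)]
    return {"F_best": min(scores),
            "F_avg": float(np.mean(scores)),
            "F_worst": max(scores)}
```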
4. Conclusions
In summary, an ant colony optimization algorithm to solve clustering problems has been developed in this paper. The software ants use a pheromone matrix, a kind of adaptive memory, which guides the ants towards the optimal clustering solution. The pheromone (weight) deposited at location (i, j) (i.e. for the allocation of sample i to cluster j in a constructed solution) depends on the objective function value of the solution (a smaller function value deposits more pheromone) and on the evaporation rate. The evaporation rate acts as a forgetting factor that encourages the exploration of other cluster assignments for object i. The algorithm therefore tends towards an optimal cluster representation of the problem as the iterations progress.

The ACO algorithm for data clustering can be applied when the number of clusters is known a priori and the clusters are crisp in nature. To evaluate the performance of the ACO algorithm, it was compared with other stochastic algorithms, viz. the genetic algorithm, simulated annealing and tabu search. The algorithm has been implemented and tested on several simulated and real datasets; preliminary computational experience is very encouraging in terms of the quality of solution found, the average number of function evaluations and the processing time required.
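As a hedged restatement of the update just described (the text gives no explicit formula, so the deposit amount 1/F_a and the set A of depositing agents are assumptions), the trail update can be written as

\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \sum_{a \in A} \Delta\tau_{ij}^{a},
\qquad
\Delta\tau_{ij}^{a} =
\begin{cases}
1/F_a & \text{if solution $a$ assigns object $i$ to cluster $j$,} \\
0 & \text{otherwise,}
\end{cases}

where \rho is the evaporation (forgetting) rate and F_a is the clustering metric value of agent a's solution, so smaller function values deposit more pheromone.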
Acknowledgements

Financial support received from the Department of Science and Technology, New Delhi, India is gratefully acknowledged. The author PS thanks the Council of Scientific and Industrial Research (CSIR), Government of India, New Delhi, for a Senior Research Fellowship.
References

[1] K.J. Mo, S. Eo, D. Shin, E.S. Yoon, Comput. Chem. Eng. 22 (1998) 555–562.
[2] P. Teppola, S.-P. Mujunen, P. Minkkinen, Chemometr. Intell. Lab. Syst. 45 (1999) 23–38.
[3] M. Ronen, Y. Shabtai, H. Guterman, Biotech. Bioeng. 77 (2002) 420–429.
[4] A. Linusson, S. Wold, B. Nordén, Chemometr. Intell. Lab. Syst. 44 (1998) 213–227.
[5] R.G. Lawson, P.C. Jurs, J. Chem. Inf. Comput. Sci. 30 (1990) 137–144.
[6] W.J. Dunn, M.J. Greenberg, S.S. Callejas, J. Med. Chem. 19 (1976) 1299–1301.
[7] M.L.M. Beckers, W.J. Melssen, L.M.C. Buydens, Comput. Chem. 21 (1997) 377–390.
[8] L. Kaufman, A. Pierreux, P. Rousseuw, M.P. Derde, M.R. Detaevernier, D.L. Massart, G. Platbrood, Anal. Chim. Acta 153 (1983) 257–260.
[9] J.W. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001.
[10] S.Z. Selim, M.A. Ismail, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 81–87.
[11] J.W. Welch, J. Stat. Comput. Simulat. 15 (1983) 17–25.
[12] D. Fisher, Mach. Learn. 2 (1987) 139–172.
[13] J. Banfield, A. Raftery, Biometrics 49 (1993) 803–821.
[14] J.-H. Jiang, J.H. Wang, X. Chu, R.-Q. Yu, Anal. Chim. Acta 354 (1997) 263–274.
[15] K. Szczubialka, J. Verdú-Andrés, D.L. Massart, Chemometr. Intell. Lab. Syst. 41 (1998) 145–160.
[16] J.A. Fernández Pierna, D.L. Massart, Anal. Chim. Acta 408 (2000) 13–20.
[17] T.N. Tran, R. Wehrens, L.M.C. Buydens, Anal. Chim. Acta 490 (2003) 303–312.
[18] K.S. Al-Sultan, Pattern Recogn. 28 (1995) 1443–1451.
[19] C.A. Murthy, N. Chowdhury, Pattern Recogn. Lett. 17 (1996) 825–832.
[20] S.Z. Selim, K.S. Al-Sultan, Pattern Recogn. 24 (1991) 1003–1008.
[21] L.-X. Sun, Y.-L. Xie, X.-H. Song, J.-H. Wang, R.-Q. Yu, Comput. Chem. 18 (1994) 103–108.
[22] M. Dorigo, V. Maniezzo, A. Colorni, IEEE Trans. Syst. Man Cybern. 26 (1996) 29–41.
[23] M. Dorigo, G. Di Caro, L.M. Gambardella, Artif. Life 5 (1999) 137–172.
[24] D. Costa, A. Hertz, J. Operat. Res. Soc. 48 (1997) 295–303.
[25] G. Di Caro, M. Dorigo, J. Artif. Intell. Res. 9 (1998) 317–365.
[26] R. Schoonderwoerd, O. Holland, J. Bruten, L. Rothkrantz, Adapt. Behav. 5 (1996) 169–207.
[27] J.-L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, L. Chretien, in: J.A. Meyer, S.W. Wilson (Eds.), From Animals to Animats 1, MIT Press, Cambridge, MA, 1991, pp. 356–363.
[28] E.D. Lumer, B. Faieta, in: D. Cliff, P. Husbands, J.A. Meyer, W. Stewart (Eds.), From Animals to Animats 3, MIT Press, Cambridge, MA, 1994, pp. 501–508.
[29] P. Kuntz, P. Layzell, D. Snyers, J. Heuristics 5 (1998) 327–351.
[30] N. Monmarché, M. Slimane, G. Venturini, in: D. Floreano, J.D. Nicoud, F. Mondada (Eds.), Lecture Notes in Artificial Intelligence, Springer-Verlag, 1999, pp. 626–635.
[31] L.M. Gambardella, M. Dorigo, INFORMS J. Comput. 12 (2000) 237–255.
[32] L.M. Gambardella, É.D. Taillard, G. Agazzi, in: D. Corne, M. Dorigo, F. Glover (Eds.), New Ideas in Optimization, McGraw-Hill, London, UK, 1999, pp. 63–76.
[33] T. Stützle, H. Hoos, in: Proceedings of the Second International Conference on Metaheuristics, Sophia-Antipolis, France, July 21–24, 1997, pp. 309–314.
[34] UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[35] D. Coomans, M. Jonckheer, D.L. Massart, I. Broechaert, P. Blockx, Anal. Chim. Acta 103 (1978) 409–415.