Ant Colony Optimization for Feature Subset Selection
International Journal of Computer, Electrical, Automation, Control and Information Engineering Vol:1, No:4, 2007
problem and the potential of ACO, this paper presents a novel method that utilizes the ACO algorithm to implement a feature subset search procedure. Initial results obtained using the classification of speech segments are very promising.

Keywords—Ant Colony Optimization, ant systems, feature selection, pattern recognition.

I. INTRODUCTION

space of feature subsets. Some of the methods ask the user to predefine the number of selected features. Other methods are based on the evaluation function, e.g., whether the addition or deletion of any feature produces a better subset.

In this paper, we will mainly be concerned with the second component, which is the search procedure. In the next section, we give a brief description of some of the available search procedure algorithms and their limitations. An explanation of
International Scholarly and Scientific Research & Innovation 1(4) 2007 999 scholar.waset.org/1999.4/10371
World Academy of Science, Engineering and Technology
based on both random and probabilistic measures. Subsets of features are evaluated using a fitness function and then combined via crossover and mutation operators to produce the next generation of subsets [7]. The GA employs a population of competing solutions, evolved over time, to converge to an optimal solution. Effectively, the solution space is searched in parallel, which helps in avoiding local optima.

A GA-based feature selection solution would typically be a fixed-length binary string representing a feature subset, where the value of each position in the string represents the presence or absence of a particular feature. Promising results were achieved when comparing the performance of GA with other conventional methods [8].

We propose in this paper a subset search procedure that utilizes the ACO algorithm and aims at achieving similar or better results than GA-based feature selection.

III. ANT COLONY OPTIMIZATION

In real ant colonies, a pheromone, which is an odorous substance, is used as an indirect communication medium. When a source of food is found, ants lay some pheromone to mark the path. The quantity of the laid pheromone depends upon the distance, quantity and quality of the food source. When an isolated ant that moves at random detects a laid pheromone, it is very likely to decide to follow its path. This ant will itself lay a certain amount of pheromone, and hence reinforce the pheromone trail of that specific path. Accordingly, a path that has been used by more ants will be more attractive to follow. In other words, the probability with which an ant chooses a path increases with the number of ants that previously chose the same path. This process is hence characterized by a positive feedback loop [9].

Dorigo et al. [10] adopted this concept and proposed an artificial ant colony algorithm, called the Ant Colony Optimization (ACO) metaheuristic, to solve hard combinatorial optimization problems. ACO was originally applied to the classical traveling salesman problem [9], where it was shown to be an effective tool for finding good solutions. ACO has also been successfully applied to other optimization problems, including telecommunications networks, data mining and vehicle routing [11, 12, 13].

For the classical Traveling Salesman Problem (TSP) [9], each artificial ant represents a simple "agent". Each agent explores the surrounding space and builds a partial solution based on local heuristics, i.e., distances to neighboring cities, and on information from previous attempts of other agents, i.e., the pheromone trails laid on the paths used in previous attempts by the rest of the agents.

In the first iteration, the solutions of the various agents are based only on local heuristics. At the end of the iteration, "artificial pheromone" is laid, with an intensity on the various paths proportional to the optimality of the solutions. As the number of iterations increases, the pheromone trails have a greater effect on the agents' solutions.

It is worth mentioning that ACO makes probabilistic decisions in terms of the artificial pheromone trails and the local heuristic information. This allows ACO to explore a larger number of solutions than greedy heuristics. Another characteristic of the ACO algorithm is pheromone trail evaporation, a process that decreases the pheromone trail intensity over time. According to [10], pheromone evaporation helps in avoiding rapid convergence of the algorithm towards a sub-optimal region.

In the next section, we present our proposed ACO algorithm, and explain how it is used for searching the feature space and selecting an "appropriate" subset of features.

IV. THE PROPOSED SEARCH PROCEDURE

For a given classification task, the problem of feature selection can be stated as follows: given the original set, F, of n features, find subset S, which consists of m features (m < n, S ⊂ F), such that the classification accuracy is maximized. The feature selection problem representation exploited by the artificial ants includes the following:

- The n features that constitute the original set, F = {f1, …, fn}.
- A number of artificial ants to search through the feature space (na ants).
- τi, the intensity of the pheromone trail associated with feature fi.
- For each ant j, a list that contains the selected feature subset, Sj = {s1, …, sm}.

We propose to use a hybrid evaluation measure that is able to estimate the overall performance of subsets as well as the local importance of features. A classification algorithm is used to estimate the performance of subsets (i.e., a wrapper evaluation function), while the local importance of a given feature is measured using the Mutual Information Evaluation Function (MIEF) [14], which is a filter evaluation function.

In the first iteration, each ant will randomly choose a feature subset of m features. Only the best k subsets, k < na, will be used to update the pheromone trails and influence the feature subsets of the next iteration. In the second and following iterations, each ant will start with m − p features that are randomly chosen from the previously selected k best subsets, where p is an integer that ranges between 1 and m − 1. In this way, the features that constitute the best k subsets will have a greater chance of being present in the subsets of the next iteration. However, it will still be possible for each ant to consider other features as well. For a given ant j, those features are the ones that achieve the best compromise between previous knowledge, i.e., the pheromone trails, and their Local Importance with respect to subset Sj, which consists of the features that have already been selected by that specific ant. The Updated Selection Measure (USM) is used for this purpose and defined as:
    USM_i^{S_j} = \begin{cases} \dfrac{\tau_i^{\eta} \, (LI_i^{S_j})^{\kappa}}{\sum_{g \notin S_j} \tau_g^{\eta} \, (LI_g^{S_j})^{\kappa}}, & \text{if } f_i \notin S_j \\ 0, & \text{otherwise} \end{cases}    (1)

where LI_i^{S_j} is the local importance of feature f_i given the subset S_j. The parameters \eta and \kappa control the effect of trail intensity and local feature importance respectively. LI_i^{S_j} is defined as:

    LI_i^{S_j} = I(C; f_i) \times \left[ \frac{2}{1 + \exp(-\alpha D_i^{S_j})} - 1 \right]    (2)

where

    D_i^{S_j} = \min_{f_s \in S_j} \left[ \frac{H(f_i) - I(f_i; f_s)}{H(f_i)} \right] \times \cdots    (3)

corresponding subset of features.

5. Using the feature subsets of the best k ants:
   - For j = 1 to k, /* update the pheromone trails */

         \Delta\tau_i^{S_j} = \begin{cases} \dfrac{\max_{g=1:k}(MSE_g) - MSE_j}{\max_{h=1:k}\left( \max_{g=1:k}(MSE_g) - MSE_h \right)}, & \text{if } f_i \in S_j \\ 0, & \text{otherwise} \end{cases}    (4)

         \tau_i = \rho \cdot \tau_i + \Delta\tau_i    (5)

     where \rho is a constant such that (1 − \rho) represents the evaporation of pheromone trails.
   - For j = 1 to na,
     - Randomly produce an (m − p)-feature subset for ant j, to be used in the next iteration, and store it in S_j.
6. If the number of iterations is less than the maximum
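As an illustration, the subset completion driven by Eq. (1) and the trail update of Eqs. (4) and (5) might be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names, the uniform placeholder for the MIEF local importance, and the accumulation of Δτ over the k best ants are ours, not taken from the paper.

```python
import numpy as np

def usm_probabilities(tau, local_importance, subset, eta=1.0, kappa=1.0):
    """Selection probabilities per Eq. (1): features already in the
    ant's subset S_j get probability 0; the remaining features are
    weighted by trail intensity and local importance."""
    w = (np.asarray(tau, float) ** eta) * (np.asarray(local_importance, float) ** kappa)
    w[list(subset)] = 0.0          # f_i in S_j  ->  USM = 0
    return w / w.sum()

def update_pheromone(tau, best_subsets, best_mse, rho=0.75):
    """Trail update per Eqs. (4)-(5). Deposits from the k best ants are
    accumulated here (an assumption; the paper writes a single delta),
    and (1 - rho) plays the role of the evaporation rate."""
    worst = max(best_mse)                            # max_{g=1:k}(MSE_g)
    denom = max(worst - e for e in best_mse) or 1.0  # guard: all MSEs equal
    delta = np.zeros_like(np.asarray(tau, float))
    for s, e in zip(best_subsets, best_mse):
        delta[list(s)] += (worst - e) / denom        # Eq. (4)
    return rho * np.asarray(tau, float) + delta      # Eq. (5)

# Completing one ant's subset: keep m - p features from a previous best
# subset, then draw the rest according to the USM probabilities.
rng = np.random.default_rng(0)
kept = {0, 3}                                        # m - p retained features
probs = usm_probabilities(np.ones(8), np.ones(8), kept)
extra = rng.choice(8, size=2, replace=False, p=probs)
```

With \eta = \kappa = 1, trail intensity and local importance contribute equally to the selection probabilities.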
one feature at a time, such that the MIEF measure (Eq. 2) is maximized. The GA-based selection is performed using the following parameter settings: population size = 30, number of generations = 20, probability of crossover = 0.8, and probability of mutation = 0.05. The obtained strings are constrained to have the number of '1's matching a predefined number of desired features. The MSE of an ANN trained with 2000 randomly chosen segments is used as the fitness function. The parameters of the ACO algorithm described in the previous section are assigned the following values:

- \eta = \kappa = 1, which makes the trail intensity and the local measure equally important.
- \alpha = 0.3, \beta = 1.65 and \gamma = 3, which are found to be an appropriate choice for this and other classification tasks.
- The number of ants, na = 30, and the maximum number of iterations, 20, are chosen to justify the comparison with GA.
- k = 10. Thus, only the best na/3 ants are used to update the pheromone trails and affect the feature subsets of the next iteration.
- m − p = max(m − 5, round(0.65 × m)), where p is the number of the remaining features that need to be selected in each iteration. It can be seen that p will be equal to 5 if m ≥ 13. The rationale behind this is that evaluating the importance of features locally becomes less reliable as the number of selected features increases. In addition, this will reduce the computational cost, especially for large values of m.
- The initial value of the trail intensity is set to 1, and the trail evaporation is 0.25, i.e., \rho = 0.75.
- Similar to the GA selection, the MSE of an ANN trained with 2000 randomly chosen segments is used to evaluate the performance of the selected subsets in each iteration.

The selected features of each method are classified using ANNs, and the obtained classification accuracies of the testing segments are shown in Fig. 1. It can be seen that the three feature selection methods were able to achieve classification accuracy similar to that of LPR with far fewer features than the LPR baseline set. However, ACO was the only method that achieved similar performance to WVT with a smaller number of features. Both ACO and GA achieved comparable performance to MFB using a similar number of features, with GA being slightly better. Note that SFS achieved a good performance when selecting a small number of features, but its performance starts to worsen as the desired number of features increases. The figure also shows that the overall performance of ACO is better than that of both GA and SFS, where the average classification accuracies of ACO, GA and SFS over all the cases are 84.22%, 83.49% and 83.19% respectively.

[Fig. 1: Classification accuracy of the different feature selection methods]

VI. CONCLUSION

In this paper, we presented a novel feature selection search procedure based on the Ant Colony Optimization metaheuristic. The proposed algorithm utilizes both the local importance of features and the overall performance of subsets to search through the feature space for optimal solutions. When used to select features for a speech segment classification problem, the proposed algorithm outperformed both stepwise- and GA-based feature selection methods. Experiments on other classification problems will be carried out in the future to further test the algorithm.

REFERENCES
[1] A.L. Blum and P. Langley. "Selection of relevant features and examples in machine learning". Artificial Intelligence, 97:245–271, 1997.
[2] M.A. Hall. Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato, 1999.
[3] R. Kohavi. Wrappers for performance enhancement and oblivious decision graphs. PhD thesis, Stanford University, 1995.
[4] J. Kittler. "Feature set search algorithms". In C. H. Chen, editor, Pattern Recognition and Signal Processing. Sijthoff and Noordhoff, the Netherlands, 1978.
[5] P. Pudil, J. Novovicova, and J. Kittler. "Floating search methods in feature selection". Pattern Recognition Letters, 15:1119–1125, 1994.
[6] P.M. Narendra and K. Fukunaga. "A branch and bound algorithm for feature subset selection". IEEE Transactions on Computers, C-26:917–922, 1977.
[7] J. Yang and V. Honavar. "Feature subset selection using a genetic algorithm". IEEE Intelligent Systems, 13:44–49, 1998.
[8] M. Gletsos, S.G. Mougiakakou, G.K. Matsopoulos, K.S. Nikita, A.S. Nikita, and D. Kelekis. "A Computer-Aided Diagnostic System to Characterize CT Focal Liver Lesions: Design and Optimization of a Neural Network Classifier". IEEE Transactions on Information Technology in Biomedicine, 7:153–162, 2003.
[9] M. Dorigo, V. Maniezzo, and A. Colorni. "Ant System: Optimization by a colony of cooperating agents". IEEE Transactions on Systems, Man, and Cybernetics – Part B, 26:29–41, 1996.
[10] T. Stützle and M. Dorigo. "The Ant Colony Optimization Metaheuristic: Algorithms, Applications, and Advances". In F. Glover and G.