Feature Selection Using Forest Optimization Algorithm
Article history: Received 17 August 2015; Received in revised form 26 March 2016; Accepted 11 May 2016; Available online 24 May 2016.

Keywords: Feature selection; Forest Optimization Algorithm (FOA); KNN classifier; Dimension reduction; FSFOA

Abstract

Feature selection, as a combinatorial optimization problem, is an important preprocessing step in data mining; it improves the performance of learning algorithms by removing irrelevant and redundant features. As evolutionary algorithms are reported to be suitable for optimization tasks, the Forest Optimization Algorithm (FOA) - which was initially proposed for continuous search problems - is adapted here for feature selection as a discrete search space problem. As the result, Feature Selection using Forest Optimization Algorithm (FSFOA) is proposed in this article in order to select the more informative features from the datasets. The proposed FSFOA is validated on several real-world datasets and compared with other methods, including HGAFS, PSO and SVM-FuzCoc. The results of the experiments show that FSFOA can improve the classification accuracy of classifiers on some of the selected datasets. We also compare the dimensionality reduction of the proposed FSFOA with that of other available methods.
In Section 4, the application of FOA to feature selection (FSFOA) is presented, and Section 5 is devoted to the experiments and results on the proposed FSFOA. Finally, Section 6 summarizes the main conclusions.

2. An overview of feature selection methods

Many researchers have addressed the feature selection (FS) problem up to now, and more effort is still needed to further speed up the process of selecting informative and useful features in databases for data mining.

The earliest methods in the FS literature based on machine learning algorithms are filters [11,12]. In all filters, heuristic techniques based on general characteristics of the data, such as information gain and distance, are used instead of learning algorithms. Another approach in feature selection is wrapper methods [11,19]. In contrast to filters, wrappers use learning algorithms to investigate the worth of the selected features [41]. Generally, wrappers produce better results than filters, because the wrapper approach takes the relationship between the learning algorithm and the training data into account. The well-known drawback of wrappers is that they are slower than filters, because the learning algorithm must be executed repeatedly for every selected feature subset. Sometimes a hybrid of filter and wrapper methods is used; hybrid methods integrate feature selection within the learning algorithm in order to exploit the advantages of both wrappers and filters [11]. Regardless of the filter or wrapper approach, feature selection methods fall into one of the following groups: complete search, heuristic search and meta-heuristic methods.

Almuallim and Dietterich presented the FOCUS method, which completely searches the search space until it reaches the smallest set of features that divides the training data into pure classes [2,3]. But with n features to handle, there are 2^n - 1 possible subsets of features, so evaluating all of the subsets is practically impossible in datasets with many features. As a result, complete search methods are seldom used for feature selection in large datasets with many features.

Heuristic methods for the feature selection problem include the greedy hill climbing algorithm [25,26], the branch and bound method, beam search and the best first algorithm. The greedy hill climbing algorithm evaluates all local changes in order to select the relevant features [11,25]. SFS (Sequential Forward Selection) and SBS (Sequential Backward Selection) are two kinds of hill climbing methods. SFS starts with an empty set of selected features, and each step of the algorithm adds one of the informative features to the selected set; SBS, in contrast, starts with the full set of features, and in each step one of the redundant or irrelevant features is omitted. Bi-directional search is another method, which considers both adding and deleting features simultaneously [11]. The main drawback of both the SFS and SBS algorithms is the "nesting effect" problem: once a change is considered positive (either the addition or the deletion of a feature), there is no chance of re-evaluating that feature. Later, in order to overcome the "nesting effect" of the SFS and SBS algorithms, SFFS (Sequential Forward Floating Selection) and SBFS (Sequential Backward Floating Selection) were introduced [24]. Best first search is another method which, like hill climbing, considers local changes in the search space, but unlike hill climbing methods it allows backtracking in the search space [11].
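To make the wrapper idea and the SFS procedure concrete, the following minimal Python sketch performs sequential forward selection with a KNN wrapper. The scikit-learn classifier, the 3-NN setting and the 5-fold cross-validation protocol are illustrative assumptions only and are not settings taken from this article.

# Minimal sketch of wrapper-based Sequential Forward Selection (SFS).
# Assumes scikit-learn; the 3-NN classifier and 5-fold cross-validation
# are illustrative choices, not settings taken from this article.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, max_features=None):
    n_features = X.shape[1]
    if max_features is None:
        max_features = n_features
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        best_candidate, candidate_score = None, best_score
        for f in range(n_features):
            if f in selected:
                continue
            subset = selected + [f]
            # Wrapper evaluation: the learning algorithm itself scores the subset.
            score = cross_val_score(KNeighborsClassifier(n_neighbors=3),
                                    X[:, subset], y, cv=5).mean()
            if score > candidate_score:
                best_candidate, candidate_score = f, score
        if best_candidate is None:       # no single addition improves accuracy
            break
        # Once a feature is added it is never re-evaluated (the "nesting effect").
        selected.append(best_candidate)
        best_score = candidate_score
    return selected, best_score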
Heuristic algorithms perform better than complete search methods when time complexities are compared, but recently meta-heuristic algorithms such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) have shown more desirable results. The main advantage of the meta-heuristic methods is their acceptable time complexity. Owing to the random nature of meta-heuristic search methods, the application of genetic algorithms, particle swarm optimization and ant colony optimization in the feature selection domain has shown promising results [18]; some of these are summarized in the following.

Hamdani et al. proposed a new algorithm based on hierarchical genetic algorithms with a bi-coded chromosome representation and a new evaluation function [13]. In order to minimize the computational cost and also increase the convergence speed, they used a hierarchical algorithm with homogeneous and heterogeneous populations. In another attempt, Zhu et al. proposed a new algorithm which combines a genetic algorithm with a local search method [40]. At first, the GA population is generated randomly; then local search is applied to all individuals of the population in order to improve the classification accuracy and speed up the search process. Tan et al. used an SVM (Support Vector Machine) within a wrapper approach [31] in a GA; in their algorithm, the GA searches for the best feature subset and the classification accuracy of the SVM guides the search process. Gheyas et al. combined simulated annealing (SA) and GA to use the advantages of both [10]. In their proposed SAGA, the GA helps to escape from local optima of SA with the crossover operator. Nemati et al. proposed a new hybrid algorithm of GA and ACO in order to use the advantages of both algorithms [22]; in their algorithm, ACO performs a local search, while the GA performs a global search. Sivagaminathan et al. used ACO to search for a near-optimum solution, with an ANN used as the classifying function [28]. ElAlami et al. proposed an algorithm based on GA which optimizes the output nodes of an ANN [7]; in their method, the ANN is used to give a weight to each feature and the GA finds the optimal relevant features. Kabir et al. proposed a new hybrid algorithm that combines GA with a local search method (HGAFS) [17]. Their method selects a feature subset with a limited size, which is its important aspect; it is a wrapper-based method that uses both GA and ANN. In another attempt, Tabakhi et al. presented an unsupervised feature selection method based on ant colony optimization, called UFSACO [29]. UFSACO is a filter-based method, and the search space is represented as a fully connected undirected weighted graph. Xue et al. proposed a series of methods based on PSO with novel initialization and updating mechanisms [35]. In their algorithm, three new initialization strategies and three new personal best and global best updating mechanisms in PSO are presented to develop novel feature selection approaches, in which maximizing the classification performance, minimizing the number of features and reducing the computational time are the main goals.

Despite good progress in solving the feature selection problem, more study is welcome to further optimize the solutions. In all the proposed methods, one has to trade off computational feasibility against the optimality of the selected features. Further research is needed to develop more promising feature selection methods that provide very good results. In the present work, the FSFOA algorithm is proposed to further optimize the results of feature selection methods in terms of improving classification accuracy.

3. An overview of the Forest Optimization Algorithm (FOA)

The Forest Optimization Algorithm is an evolutionary algorithm which is inspired by the procedure of a few trees in the forests [9]. FOA was proposed to solve continuous search space problems, but in this article we have attempted to adapt it to discrete search space problems like feature selection. FOA involves three main stages: 1 - Local seeding of the trees, 2 - Population limiting, and 3 - Global seeding of the trees. In nature, some
seeds fall just beneath the parent tree and then they turn into
young trees [9]; which is simulated by local seeding in FOA. After
initialization of the trees, the local seeding stage will operate on
trees with “Age” ‘0’ to simulate the nearby seeds of the parent
trees. Then all the trees, except the newly generated ones, get older and
their “Age” increases by ‘1’. This stage simulates the local search of
the algorithm.
Next stage is population limiting in which the trees with “Age”
bigger than “life time” parameter will be omitted from the forest
and they will form the candidate population [9]. Also in population
limiting stage, the rest of the trees of the forest are sorted ac-
cording to their fitness value and if the number of whole trees of
the forest exceeds the pre-defined “area limit” parameter, the extra
trees will join the candidate population too. In the global
seeding stage, a percentage of the candidate population is chosen.
The selected trees from the candidate population will be used in
the global seeding stage. Global seeding stage simulates the global
search of FOA [9]. Next stage in FOA is updating the best tree in
which the best solution is selected according to its fitness value
and its "Age" is set to 0 in order to prevent the best tree from aging and later being removed from the forest. These stages continue iteratively until the termination criterion is met. The Forest Optimization Algorithm has five parameters which should be initialized at the start of the algorithm [9]: "life time", "LSC", "GSC", "transfer rate" and "area limit", as listed in the signature of Algorithm 1.
Algorithm 1. FSFOA (life time, LSC, GSC, transfer rate, area limit)
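For illustration, a minimal Python sketch of the FSFOA main loop is given below, with parameter names taken from the signature of Algorithm 1. The stage helpers follow the descriptions of Sections 4.1-4.4 and are sketched after the corresponding subsections below; the list-based tree representation, the initial forest size and the fixed iteration count used as the termination criterion are assumptions of this sketch, and fitness(tree) is assumed to return the classification accuracy obtained with the features whose bits are set to 1.

# Minimal sketch of the FSFOA main loop; parameter names follow Algorithm 1.
# Each tree is assumed to be a Python list of n 0/1 feature bits plus a
# trailing "Age" entry.  The stage helpers (initialize_forest, local_seeding,
# population_limiting, global_seeding) are sketched in Sections 4.1-4.4 below.
import random

def fsfoa(fitness, n_features, life_time, lsc, gsc, transfer_rate, area_limit,
          max_iterations=100):
    forest = initialize_forest(area_limit, n_features)          # Section 4.1
    best_tree = max(forest, key=fitness)[:]
    for _ in range(max_iterations):
        forest += local_seeding(forest, lsc)                    # Section 4.2
        forest, candidates = population_limiting(               # Section 4.3
            forest, life_time, area_limit, fitness)
        n_chosen = int(transfer_rate * len(candidates))         # transfer_rate in [0, 1]
        forest += global_seeding(random.sample(candidates, n_chosen), gsc)  # Section 4.4
        # Update the best tree and reset its "Age" so it is never discarded.
        current_best = max(forest, key=fitness)
        current_best[-1] = 0
        if fitness(current_best) > fitness(best_tree):
            best_tree = current_best[:]
    return best_tree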
4. The proposed feature selection using forest optimization algorithm (FSFOA)

The stages of FOA are adapted to the feature selection problem as follows.

4.1. Initialize trees

The forest is initialized by randomly generated trees [9]. At first, each variable of each tree in FSFOA is initialized randomly with either '0' or '1'. If a dataset has n features, the size of each tree will be 1 x (n + 1), where one of the variables shows the "Age" of that tree. Each '1' in a tree indicates that the corresponding feature is selected and therefore involved in the machine learning process, and each '0' shows the exclusion of the related feature from the learning process. At first, the "Age" of each tree is considered to be '0', but local seeding in each iteration of the algorithm will increase the "Age" of all trees except the ones newly generated in the local seeding stage.
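As an illustration of this encoding, the sketch below builds a random forest of such binary trees. The representation (n feature bits plus a trailing "Age" value) follows the description above, while the use of Python lists and the assumption that the initial number of trees equals the "area limit" parameter are choices made for this sketch.

# Sketch of tree initialization (Section 4.1): every tree is a random binary
# vector of length n plus one trailing "Age" entry that starts at 0.  The
# initial forest size is assumed here to equal the "area limit" parameter.
import random

def initialize_forest(area_limit, n_features):
    forest = []
    for _ in range(area_limit):
        bits = [random.randint(0, 1) for _ in range(n_features)]  # 1 = feature selected
        forest.append(bits + [0])                                  # trailing "Age" = 0
    return forest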
4.2. Local seeding

This stage adds some neighbors of each tree with "Age" 0 to the forest [9]. In order to simulate this stage in FSFOA, for each tree of the forest with "Age" 0, some variables are selected randomly (the "LSC" parameter determines the number of selected variables). Then the values of the selected variables are changed from 0 to 1 or vice versa. This procedure simulates local search in the search space, because each time the importance of one feature is evaluated by adding or removing that feature prior to running the learning algorithm. Fig. 2 shows an example of the local seeding operator on one tree, where the number of features of the dataset is 5 and the value of "LSC" is considered to be 2. After performing the local seeding stage, the "Age" of all trees except the newly generated ones is increased by '1'.

Fig. 2. An example of the local seeding operation on one tree with "LSC" = 2.
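A minimal sketch of this operator under the same list-based representation is given below; the reading that each selected variable yields one new neighbouring tree (so that every new tree adds or removes exactly one feature) is an interpretation of the description above.

# Sketch of the local seeding operator (Section 4.2).  For every tree with
# "Age" 0, LSC variables are picked at random and, for each of them, a copy
# of the tree with that single bit negated is added as a new tree of "Age" 0.
# Afterwards all previously existing trees get older by '1'.
import random

def local_seeding(forest, lsc):
    new_trees = []
    for tree in forest:
        if tree[-1] == 0:                                  # only "Age" 0 trees are seeded
            n_features = len(tree) - 1
            for pos in random.sample(range(n_features), lsc):
                neighbour = tree[:]
                neighbour[pos] = 1 - neighbour[pos]        # flip the chosen feature bit
                neighbour[-1] = 0                          # new trees start at "Age" 0
                new_trees.append(neighbour)
    for tree in forest:
        tree[-1] += 1                                      # existing trees age by '1'
    return new_trees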
4.3. Population limiting

In this stage, two series of trees will be omitted from the forest to form the candidate population: 1 - trees with "Age" bigger than the "life time" parameter, and 2 - the extra trees that exceed the "area limit" parameter after sorting the trees according to their fitness values. This stage forms the candidate population, and a pre-defined percentage of the candidate population is used later in the global seeding stage.
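A minimal sketch of this stage, assuming the same list-based trees and that the function returns both the trimmed forest and the candidate population:

# Sketch of population limiting (Section 4.3).  Trees older than "life time"
# are moved to the candidate population; the remaining trees are sorted by
# fitness and any trees beyond the "area limit" join the candidates as well.
def population_limiting(forest, life_time, area_limit, fitness):
    candidates = [tree for tree in forest if tree[-1] > life_time]
    survivors = [tree for tree in forest if tree[-1] <= life_time]
    survivors.sort(key=fitness, reverse=True)     # best trees first
    candidates += survivors[area_limit:]          # extras beyond "area limit"
    return survivors[:area_limit], candidates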
4.4. Global seeding

In order to perform this stage in FSFOA, at first, for each selected tree from the candidate population, some of the variables are selected randomly. The number of selected variables is determined by the "GSC" parameter. Then the value of each selected variable is negated (changed from 0 to 1 or vice versa). This time, however, the addition or deletion of several features is considered simultaneously, not just one feature at a time. This operator performs a global search in the search space. An example of performing this operator on one tree is shown in Fig. 3. In Fig. 3, the value of the "GSC" parameter is considered to be 3 (3 variables are selected and negated).
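A minimal sketch of this operator follows; the assumption that the newly produced trees enter the forest with "Age" 0 is made for this sketch and is not stated explicitly above.

# Sketch of the global seeding operator (Section 4.4): for each tree chosen
# from the candidate population, GSC variables are picked at random and all
# of them are negated at once, so several features are added or deleted
# simultaneously.
import random

def global_seeding(chosen_trees, gsc):
    new_trees = []
    for tree in chosen_trees:
        n_features = len(tree) - 1
        new_tree = tree[:]
        for pos in random.sample(range(n_features), gsc):
            new_tree[pos] = 1 - new_tree[pos]      # negate the selected variable
        new_tree[-1] = 0                           # assumed: re-enters with "Age" 0
        new_trees.append(new_tree)
    return new_trees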
Table 1
Summary of the selected datasets.

Dataset         #Features   #Instances   #Classes
Heart-statlog   13          270          2
Vehicle         18          846          4
Cleveland       13          303          5
Dermatology     34          366          6
Ionosphere      34          351          2
Sonar           60          208          2
Glass           9           214          7
Wine            13          178          3
Segmentation    19          2310         7
SRBCT           2308        63           4
Hepatitis       19          155          2

Table 2
The value of the "LSC" and "GSC" parameters for each dataset.

Dataset         #Features   "LSC"   "GSC"
Heart-statlog   13          3       6
Vehicle         18          4       9
Cleveland       13          3       6
Dermatology     34          7       15
Ionosphere      34          7       15
Sonar           60          12      30
Glass           9           2       4
Wine            13          3       6
Segmentation    19          4       9
SRBCT           2308        460     700
Hepatitis       19          4       10

Table 3
Summary of the methods used in our comparisons.

Method name      Dataset splitting   Description/year
SFS, SBS, SFFS   70-30               Greedy hill climbing methods(a) [21]/2010
NSM              10-fold             Neighborhood soft margin [14]/2010
SVM-FuzCoc       70-30%              A novel SVM-based FS [21]/2010
HGAFS            2-fold              Hybrid genetic algorithm for FS [16]/2007
FS-NEIR          10-fold             Neighborhood effective information ratio based FS [40]/2013
UFSACO           70-30               Unsupervised FS algorithm based on ACO [29]/2014
PSO(4-2)         10-fold             Particle swarm optimization for feature selection [35]/2013

(a) Sequential Forward Selection, Sequential Backward Selection and Sequential Floating Forward Selection, reported from [21].

Fig. 4. Comparison between the Classification Accuracy (Accuracy) and Dimension Reduction (DR) obtained by FSFOA and other available methods on the "Heart-statlog", "Cleveland", "Vehicle", "Dermatology", "Sonar", "Ionosphere", "Glass", "Segmentation", "Hepatitis", "SRBCT" and "Wine" datasets.

Fig. 5. Graphical comparisons according to Accuracy for each dataset. "Hepatitis" and "Vehicle" are compared according to J48, and the others according to KNN classification accuracy.
5.3. Results and comparisons

We have compared our proposed FSFOA method with some other methods. All the results of our experiments are reported with 95% confidence intervals. The feature selection algorithms selected for the comparisons are: the Neighborhood soft margin (NSM) method proposed by Hu et al. [14], SVM-FuzCoc by Moustakidis et al. [21], the hybrid genetic algorithm for FS (HGAFS) by Huang et al. [16], FS-NEIR, which uses a different feature evaluation criterion, by Zhu et al. [40], an unsupervised feature selection algorithm based on ant colony optimization (UFSACO) proposed by Tabakhi et al. [29], and PSO(4-2), a PSO-based method by Xue et al. [35]. Among the methods, HGAFS uses a support vector machine. SFS, SBS and SFFS are greedy methods and are taken from [21]. SVM-FuzCoc, PSO(4-2) and NSM use 1NN, 5NN and 3NN classifiers, respectively. UFSACO and FS-NEIR report the classification accuracy of the J48 classifier. A summary of these methods is given in Table 3, which also shows how each method used the datasets (10-fold cross-validation, 70% training and 30% testing, or 2-fold cross-validation for training and testing).

The classification accuracy and dimensionality reduction of FSFOA and of the other methods of Table 3 are reported in the tables of Fig. 4. The results reported for FSFOA in Fig. 4 are over 10 independent runs. For each dataset, the best classification accuracy and the best dimension reduction (DR) are highlighted in bold. Dimension Reduction (DR) in Fig. 4 is calculated by Eq. (2). In order to provide fair comparisons, for each dataset multiple results according to dataset splittings with different percentages are reported and considered in our comparisons. Also, for each method the classifier used (i.e. KNN, SVM or J48) is indicated in each table. A summary of the configuration parameters of the classifiers is presented in Table 4, which indicates that the KNN classifier is used with different values of K where needed for the comparisons (K in {1, 3, 5}) and that the J48 classifier of Weka is used as the decision tree based method. The kernel function for the SVM classifier is the radial basis function (RBF-SVM).

As is obvious from the tables of Fig. 4, the Classification Accuracy (CA) on the "Heart-statlog", "Ionosphere" and "Segmentation" datasets improved in comparison with all the selected methods, among which there are GA-based and ACO-based methods. This shows that FOA could improve the performance of the KNN, J48 and SVM classifiers by reducing the redundant features in these datasets. On the "Sonar", "Wine", "Hepatitis", "Cleveland", "Dermatology" and "Glass" datasets, FSFOA outperforms almost all of the selected methods; where it does not outperform, it holds the second rank. On the "Vehicle" dataset, FSFOA outperforms just one of the methods with the same partitioning and classifier. FSFOA did not show a good performance on the "SRBCT" dataset. "SRBCT" is the only dataset where the number of features is much larger than the number of samples, which makes it difficult to select the proper features for prediction: the number of samples is not sufficient for selecting the more informative features, and applying the traditional methods yields poor results. Also, this dataset is partitioned into 70%-30% training and testing sets, which makes the problem worse, because part of the dataset is ignored during the training phase. This shows that feature selection in large datasets with many features and a limited number of samples is a challenging problem that deserves more research.

Comparing the DR of the methods in Fig. 4, it is obvious that FSFOA could not outperform the selected methods because, as mentioned before, the number of selected features is not involved in the fitness evaluation of each potential solution and classification accuracy alone is considered as the fitness function. For a better illustration of the performance, we show the results graphically in the charts of Fig. 5. For the "Hepatitis" and "Vehicle" datasets the selected methods are compared according to the J48 classifier, and for the "Dermatology", "Sonar", "SRBCT", "Wine", "Heart-statlog", "Ionosphere", "Glass", "Cleveland" and "Segmentation" datasets the KNN classifier is chosen in the graphical comparisons of Fig. 5.

Comparing the results of Fig. 4 and the charts of Fig. 5, FSFOA performs clearly better than the other methods on 3 of the 11 datasets ("Heart-statlog", "Ionosphere" and "Segmentation") according to classification accuracy. On the "Dermatology", "Sonar", "Wine", "Glass", "Cleveland" and "Hepatitis" datasets, FSFOA outperforms many of the other methods and ranks second behind a single method. On the other two datasets FSFOA could not achieve the desirable performance. Among the methods selected for comparison there are methods which employ the well-known GA, PSO and ACO algorithms. These results show that FSFOA has acceptable performance in solving feature selection as a real optimization problem.

6. Conclusion

Feature selection is considered to be an important preprocessing step in machine learning and pattern recognition. Many heuristic and meta-heuristic methods have been proposed to address this problem.

In this article, we have attempted to use the Forest Optimization Algorithm (FOA) for solving the feature selection problem. As FOA was proposed for continuous search space problems, we have adjusted the stages of FOA to the discrete search space of the feature selection problem and proposed the FSFOA algorithm.

In order to investigate the performance of FSFOA, we selected some well-known datasets from the UCI repository and compared the results of FSFOA with other methods, among them GA-, ACO- and PSO-based algorithms. The results of the experiments showed the superiority of our method on most of the selected datasets. In this article, we have used the KNN, SVM and J48 classifiers of the WEKA software to evaluate the fitness of each potential solution, and classification accuracy is considered as our fitness function.

This study shows that FOA is an effective search technique for feature selection problems, but further research is also welcome. In future research, we will investigate the performance of FSFOA on very large datasets with a huge number of features (e.g. over 10,000), because the size of datasets, in both the number of features and the number of instances, keeps growing, and data mining in very large datasets is a big concern. Also, involving the number of selected features in the fitness function, with the aim of improving the dimension reduction rate (DR), will be our future attempt. This can be implemented by a multi-objective fitness function which takes into account the classification accuracy and the number of selected features simultaneously.

Conflict of interest

None declared.
References

[1] David W. Aha, Feature weighting for lazy learning algorithms, in: Huan Liu, Hiroshi Motoda (Eds.), Feature Extraction, Construction and Selection: A Data Mining Perspective, Kluwer Academic Publishers, Massachusetts, 1998, pp. 13-32.
[2] Hussein Almuallim, Thomas G. Dietterich, Learning Boolean concepts in the presence of many irrelevant features, Artif. Intell. 69 (1) (1994) 279-305.
[3] H. Almuallim, T.G. Dietterich, Learning with many irrelevant features, in: Proceedings of the AAAI, vol. 91, July 1991, pp. 547-552.
[4] C. Blake, E. Keogh, C.J. Merz, UCI Repository of Machine Learning Databases, University of California, Irvine, <http://www.ics.uci.edu/~mlearn/MLRepository.html>.
[5] Jose M. Cadenas, M. Carmen Carrido, Raquel Martinez, Feature subset selection filter-wrapper based on low quality data, Expert Syst. Appl. 40 (2013) 6241-6252.
[6] K.J. Cios, G. William Moore, Uniqueness of medical data mining, Artif. Intell. Med. 26 (1) (2002) 1-24.
[7] M.E. ElAlami, A filter model for feature subset selection based on genetic algorithm, Knowl.-Based Syst. 22 (5) (2009) 356-362.
[8] E. Gasca, J.S. Sanchez, R. Alonso, Eliminating redundancy and irrelevance using a new MLP-based feature selection method, Pattern Recognit. 39 (2006) 313-315.
[9] Manizheh Ghaemi, Mohammad-Reza Feizi-Derakhshi, Forest optimization algorithm, Expert Syst. Appl. 41 (15) (2014) 6676-6687.
[10] Iffat A. Gheyas, Leslie S. Smith, Feature subset selection in large dimensionality domains, Pattern Recognit. 43 (2010) 5-13.
[11] Mark A. Hall, Correlation-based Feature Selection for Machine Learning (Ph.D. thesis), Hamilton, New Zealand, 1999.
[12] Mark A. Hall, Correlation-based feature selection for discrete and numeric class machine learning, in: Proceedings of the 17th International Conference on Machine Learning, 2000, pp. 359-366.
[13] Tarek M. Hamdani, Jin-Myung Won, Adel M. Alimi, Fakhri Karray, Hierarchical genetic algorithm with new evaluation function and bi-coded representation for the selection of features considering their confidence rate, Appl. Soft Comput. 11 (2011) 2501-2509.
[14] Q. Hu, X. Che, L. Zhang, D. Yu, Feature evaluation and selection based on neighborhood soft margin, Neurocomputing 73 (10) (2010) 2114-2124.
[15] Q.H. Hu, D. Yu, J.F. Liu, C. Wu, Neighborhood rough set based heterogeneous feature subset selection, Inf. Sci. 178 (2008) 3577-3594.
[16] J. Huang, Y. Cai, X. Xu, A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recognit. Lett. 28 (2007) 1825-1844.
[17] Md. Monirul Kabir, Md. Shahjahan, Kazuyuki Murase, A new local search based hybrid genetic algorithm for feature selection, Neurocomputing 74 (2011) 2914-2928.
[18] Md. Monirul Kabir, Md. Shahjahan, Kazuyuki Murase, A new hybrid ant colony optimization algorithm for feature selection, Expert Syst. Appl. 39 (2012) 3747-3763.
[19] Ron Kohavi, George H. John, Wrappers for feature subset selection, Artif. Intell. 97 (1-2) (1997) 273-324.
[20] N. Lavrac, Selected techniques for data mining in medicine, Artif. Intell. Med. 16 (1) (1999) 3-23.
[21] S.P. Moustakidis, J.B. Theocharis, SVM-FuzCoC: a novel SVM-based feature selection method using a fuzzy complementary criterion, Pattern Recognit. 43 (2010) 3712-3729.
[22] Shahla Nemati, Mohammad Ehsan Basiri, Nasser Ghasem-Aghaee, Mehdi Hosseinzadeh Aghdam, A novel ACO-GA hybrid algorithm for feature selection in protein function prediction, Expert Syst. Appl. 36 (2009) 12086-12094.
[23] G.A. Papakostas, A.S. Polydoros, D.E. Koulouriotis, V.D. Tourassis, Evolutionary Feature Subset Selection for Pattern Recognition Applications, INTECH Open Access Publisher, 2011.
[24] P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature selection, Pattern Recognit. Lett. 15 (1994) 1119-1125.
[25] Bart Selman, Carla P. Gomes, Hill-climbing search, in: Encyclopedia of Cognitive Science, 2006.
[26] B. Selman, H.J. Levesque, D.G. Mitchell, A new method for solving hard satisfiability problems, in: Proceedings of the AAAI, vol. 92, July 1992, pp. 440-446.
[27] O. Seral, S. Gunes, Attribute weighting via genetic algorithms for attribute weighted artificial immune system (AWAIS) and its application to heart disease and liver disorders problems, Expert Syst. Appl. 36 (2009) 386-392.
[28] Rahul Karthik Sivagaminathan, Sreeram Ramakrishnan, A hybrid approach for feature subset selection using neural networks and ant colony optimization, Expert Syst. Appl. 33 (2007) 49-60.
[29] S. Tabakhi, P. Moradi, F. Akhlaghian, An unsupervised feature selection algorithm based on ant colony optimization, Eng. Appl. Artif. Intell. 32 (2014) 112-123.
[30] M.A. Tahir, A. Bouridane, F. Kurugollu, Simultaneous feature selection and feature weighting using hybrid tabu search/K-nearest neighbor classifier, Pattern Recognit. Lett. 28 (2007) 438-446.
[31] K.C. Tan, E.J. Teoh, Q. Yu, K.C. Goh, A hybrid evolutionary algorithm for attribute selection in data mining, Expert Syst. Appl. (2009) 8616-8630.
[32] A. Tosun, B. Turhan, A.B. Bener, Feature weighting heuristics for analogy-based effort estimation models, Expert Syst. Appl. 36 (2009) 10325-10333.
[33] I. Triguero, J. Derrac, S. Garcia, F. Herrera, Integrating a differential evolution feature weighting scheme into prototype generation, Neurocomputing 97 (2012) 332-343.
[34] Dietrich Wettschereck, David W. Aha, Takao Mohri, A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms, Artif. Intell. Rev. 11 (1997) 273-314.
[35] B. Xue, M. Zhang, W.N. Browne, Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms, Appl. Soft Comput. 18 (2013) 261-276.
[36] Zhi-Min Yang, Jun-Yun He, Yuan-Hai Shao, Feature selection based on linear twin support vector machine, Proc. Comput. Sci. 17 (2013) 1039-1046.
[37] J.Y. Yeh, T.H. Wu, C.W. Tsao, Using data mining techniques to predict hospitalization of hemodialysis patients, Decis. Support Syst. 50 (2) (2011) 439-448.
[38] Y. Zhang, A. Yang, C. Xiong, T. Wang, Z. Zhang, Feature selection using data envelopment analysis, Knowl.-Based Syst. 64 (2014) 70-80.
[39] Mingyuan Zhao, Chong Fu, Luping Ji, Ke Tang, Mingtian Zhou, Feature selection and parameter optimization for support vector machines: a new approach based on genetic algorithm with feature chromosomes, Expert Syst. Appl. 38 (5) (2011) 5197-5204.
[40] Wenzhi Zhu, Gangquan Si, Yanbin Zhang, Jingcheng Wang, Neighborhood effective information ratio for hybrid feature evaluation and selection, Neurocomputing 99 (2013) 25-37.
[41] Zexuan Zhu, Yew-Soon Ong, Manoranjan Dash, Wrapper-filter feature selection algorithm using a memetic framework, IEEE Trans. Syst. Man Cybern. 37 (2007) 70-76.
[42] Alexandros Kalousis, Julien Prados, Melanie Hilario, Stability of feature selection algorithms: a study on high-dimensional spaces, Knowl. Inf. Syst. 12 (1) (2007) 95-116.
Manizheh Ghaemi received her B.S. and M.S. degrees in Computer Science from the University of Tabriz, Iran. She is now a Ph.D. student in Artificial Intelligence at K.N. Toosi University of Technology, Iran. Her research interests include nature-based evolutionary algorithms, optimization and machine learning algorithms.
Mohammad-Reza Feizi-Derakhshi received his B.S. in Software Engineering from the University of Isfahan. He received his M.S. and Ph.D. in AI from the Iran University of
Science and Technology. He is currently a faculty member at the University of Tabriz. His research interests include: NLP, optimization algorithms and intelligent databases.