Literature Review On Feature Selection Methods For High-Dimensional Data
Feature selection methods are reviewed here along two dimensions: how the features are combined for evaluation, and how the supervised learning algorithm is employed in the feature selection process.

3.1 Feature Selection Based on Combining the Features for Evaluation
This section reviews various methods of feature selection based on how the features are combined for evaluation in order to select the significant features from a dataset. They are classified into feature subset-based and feature ranking-based methods.
3.1.1 Feature subset-based methods
Tahir et al formed the feature subsets using tabu search; these subsets are then evaluated using the K-nearest neighbor (kNN) classifier, with the classification error as the evaluation criterion, to obtain the significant feature subset [9].

A number of feature selection processes use ant colony optimization as the searching strategy for subset generation. Aghdam et al employed the ant colony optimization search to form the feature subsets, which are validated by the nearest neighbor classifier for text classification [10]. Kanan & Faez proposed a feature selection method using ant colony optimization for a face recognition system; in this approach, the nearest neighbor classifier is adopted for evaluating the subsets generated by the ant colony optimization-based learning [11]. Sivagaminathan & Ramakrishnan developed an ant colony optimization-based feature selection with artificial neural networks (ANN) for a medical diagnosis system; the generated feature subsets are validated using the ANN [12]. Sreeja & Sankar presented an ant colony optimization-based feature selection with instance-based pattern matching classification (PMC) [13].
In certain feature selection research works, a genetic algorithm is adopted to generate the feature subsets for evaluation, and a supervised machine learning algorithm is used to evaluate the generated subsets. Welikala et al presented feature selection using a genetic algorithm with a support vector machine (SVM) for mining medical datasets [14]. Erguzel et al used a genetic algorithm and an artificial neural network for electroencephalogram (EEG) signal classification [15]. Oreski & Oreski proposed a feature selection method based on a genetic algorithm with neural networks for credit risk assessment [16]. Li et al developed a genetic algorithm with a support vector machine for hyperspectral image classification [17]. Das et al formulated a genetic algorithm with support vector machine-based feature selection for handwritten digit recognition [18]. Wang et al applied the genetic algorithm for subset generation with a support vector machine in the feature selection process for data classification [19].

In the literature, some researchers employed particle swarm optimization to generate the feature subsets and validated them with a supervised machine learning algorithm to identify the significant feature subset. Xue et al designed a particle swarm optimization (PSO)-based feature selection for classification, in which the feature subsets generated by PSO are evaluated using a supervised learning algorithm [20]. Chen et al presented a feature selection method using particle swarm optimization search for a sleep disorder diagnosis system [21]. Yang et al developed a particle swarm optimization-based feature selection for land cover classification [22].

From the subset-based feature selection literature, it is observed that exhaustive or complete search leads to high computational complexity, as it generates 2^N subsets from N features for evaluation; this searching strategy is not a good choice for high-dimensional spaces. The heuristic search methods also incur considerable computational complexity, because they need prior knowledge and each generated subset requires building a classification model for its evaluation in an iterative manner, so these searching strategies are likewise not well suited to high-dimensional spaces. Moreover, these heuristic search methods follow a wrapper-based approach: they are computationally expensive and can only produce higher classification accuracy for the specific classification algorithm used to validate the subsets, so they cannot achieve high generality.
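To make the generate-and-evaluate loop these works share concrete, the following is a minimal sketch, assuming scikit-learn and its bundled wine dataset; a greedy forward search stands in here for the SA/TS/ACO/GA/PSO strategies surveyed above, with a kNN classifier evaluating each candidate subset in the manner of [9]-[13].

```python
# A minimal wrapper-style subset search: a greedy forward search stands
# in for the heuristic search strategies, and a kNN classifier scores
# each candidate subset by cross-validated accuracy.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    # Evaluate every one-feature extension of the current subset.
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:   # stop when no extension improves accuracy
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print(f"selected features: {selected}, CV accuracy: {best_score:.3f}")
```

Even this greedy variant trains a classifier for every candidate subset at every step, which illustrates why subset-based search scales poorly to high-dimensional data.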
3.1.2 Feature ranking-based methods
In the feature ranking-based approach, each feature of a dataset is weighted based on a statistical or information-theoretic measure, and the features are ranked by their weights. The higher-ranked features are then selected as the significant features using a predefined threshold that determines the number of features to be selected from a dataset. The best-known example of the feature ranking-based method is chi-square-based feature selection (CQFS); in this method, Liu & Setiono used the chi-square statistic to weight the features in order to rank them for selecting the significant features [24]. In a similar way, information-theoretic measures such as information gain, symmetric uncertainty, and gain ratio are employed to weight the individual features and rank them for selection.

Further, it is observed that the feature ranking-based methods use statistical or information-theoretic measures to weight each feature only by observing the relevancy between the individual feature and the target class. Hence, these methods take less runtime but fail to remove redundant features [2]. The feature ranking-based methods follow a filter-based approach, since they do not involve a supervised learning algorithm to evaluate the significance of the features. Consequently, they are independent of the supervised learning algorithm, and so achieve more generality and less computational complexity. Thus, with a suitable redundancy analysis mechanism, the feature ranking-based methods can be a good choice for selecting the significant features from a high-dimensional space.
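To make the ranking mechanism concrete, here is a minimal sketch, assuming scikit-learn and its bundled breast cancer dataset (the chi-square test requires non-negative feature values); the fixed k plays the role of the predefined threshold mentioned above.

```python
# A minimal ranking-based (filter) selector: each feature is weighted by
# the chi-square statistic, ranked, and the top k are kept.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=10)
X_top = selector.fit_transform(X, y)

ranking = selector.scores_.argsort()[::-1]   # feature indices, best first
print("top-5 features by chi-square weight:", ranking[:5])
print("reduced shape:", X_top.shape)
```

Note that the weights are computed feature by feature against the class label only, which is exactly why this family is fast but blind to redundancy between features.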
3.2 Feature Selection Based on the Supervised Learning Algorithm Used
This section reviews various methods of feature selection based on the machine learning algorithm used. They are categorized as wrapper, embedded, filter, and hybrid methods.

3.2.1 Wrapper-based methods
The wrapper-based approach generates feature subsets using one of the searching techniques and evaluates these subsets with a supervised learning algorithm in terms of classification error or accuracy [25]; in this sense, the wrapper method resembles a "brute force" method. This approach is illustrated in Figure 1. Kohavi & John developed a wrapper-based feature selection method for selecting the significant features from a dataset [26]. The method consists of a search engine for subset generation and a classification algorithm to evaluate each subset. They compared its performance, in terms of classification accuracy, with hill-climbing and best-first searching strategies using decision tree and naïve Bayes classifiers, and observed that the wrapper method suffers from problems such as searching overhead, overfitting, and increased runtime.

In the wrapper approach, the searching is an overhead, since the searching technique has no domain knowledge. To overcome the searching time overhead, Inza et al used the estimation of Bayesian networks algorithm for feature subset selection with naïve Bayes and ID3 (Iterative Dichotomiser 3) [27]. In general, the searching method may increase the computational complexity, since the training data are split for evaluation; to overcome this issue, Grimaldi et al used an aggregation principle with sequential search [28]. Dy & Brodley developed a wrapper-based approach for unsupervised learning using order identification (recognizing the number of clusters in the data) with the expectation maximization (EM) clustering algorithm under the maximum likelihood (ML) criterion [29].
Aha & Bankert presented a wrapper-based method with beam search and the IB1 classifier [30]. They also compared its performance with the well-known sequential search algorithms for feature selection, forward sequential selection (FSS) and backward sequential selection (BSS), and observed that beam search outperforms both.

Maldonado & Weber developed a wrapper approach-based feature selection by combining support vector machines (SVM) with kernel functions; this method uses sequential backward selection for feature subset generation, and the subsets are validated in terms of classification error to identify the best one [31]. To minimize the searching overhead, Gütlein et al used a search algorithm, ORDERED-FS, that orders the features in terms of resubstitution error to identify their irrelevancy [32]. Kabir et al developed a wrapper-based constructive approach for feature selection (CAFS) using a neural network (NN); in this method, a correlation measure is used to remove redundancy within the searching strategy, improving the performance of the NN [33]. Stein et al proposed an ant colony optimization-based feature selection with a wrapper model, in which ant colony optimization serves as the searching method in order to reduce the searching overhead of methods such as blind search, forward selection, or backward elimination [34]. Furthermore, to minimize the searching overhead, Zhuo et al presented a wrapper-based feature selection using a genetic algorithm with a support vector machine for classifying hyperspectral images [35].
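Sequential-search wrappers of this kind are also packaged as library routines; the following is a brief sketch, assuming scikit-learn's SequentialFeatureSelector, pairing backward elimination with a linear SVM in the spirit of [31]. The dataset and the target of 10 features are illustrative choices.

```python
# The wrapper pattern as a library routine: sequential backward
# elimination around a (standardized) linear SVM.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
svm = make_pipeline(StandardScaler(), LinearSVC(dual=False))

# Drop features one at a time, keeping the 10 that best preserve
# cross-validated accuracy.
sfs = SequentialFeatureSelector(svm, n_features_to_select=10,
                                direction="backward", cv=5)
sfs.fit(X, y)
print("kept feature indices:", sfs.get_support(indices=True))
```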
In the wrapper approach, overfitting can be mitigated by post-pruning, jitter, and early stopping. Post-pruning is carried out while developing the decision tree [36]. In the jitter method, the noisy data that make the learning process more difficult are eliminated in order to fit the training data, thereby reducing overfitting [37]. In the early stopping method, overfitting in neural network training is avoided by stopping the training process when performance on a validation set starts to deteriorate [38] [39]. Researchers have also tried to reduce overfitting through genetic algorithm-based searching with early stopping (GAWES) [40].

Further, it is observed that the wrapper-based methods suffer from searching overhead and overfitting [41], and have higher computational complexity with less generality, since they use a supervised learning algorithm to evaluate the subsets generated by the searching method. Therefore, these methods are not a suitable choice for high-dimensional spaces.
3.2.2 Embedded-based methods
The embedded-based methods use a part of the learning process of the supervised learning algorithm for feature selection, which reduces the computational cost compared with the wrapper method [42]. Embedded methods can be roughly categorized into three groups: pruning methods, built-in mechanisms, and regularization models. In the pruning-based method, all features are initially taken into the training process for building the classification model, and the features with small correlation coefficient values are removed recursively using the support vector machine (SVM) [43]. In the built-in mechanism-based feature selection, a part of the training phase of supervised learning algorithms such as C4.5 [36] and ID3 [44] is used to select the features. In the regularization method, fitting errors are minimized using objective functions, and the features with near-zero regression coefficients are eliminated [45] [46].

Neumann et al developed an embedded feature selection method for selecting the significant features from synthetic and real-world datasets; in their approach, linear and non-linear SVMs are employed in the selection process using the difference of convex functions algorithm (DCA) [47]. Xiao et al proposed an embedded method to select the significant features from audio signals for emotion classification, implemented on the principle of evidence theory with a mass function, where the identified most relevant features are added incrementally for classification [48]. Maldonado et al developed an embedded method to select the significant features from imbalanced data for classification with several objective functions [49].

Further, it is observed that the embedded methods are computationally more efficient than the wrapper methods but costlier than the filter methods; hence they are not a suitable choice for high-dimensional spaces, and they have poor generality, since they use the supervised learning algorithm.
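As one illustration of the regularization family, here is a minimal sketch, assuming scikit-learn: an L1-penalized linear model drives uninformative coefficients toward zero, and SelectFromModel keeps the surviving features. The C value and the dataset are illustrative choices, not taken from [43]-[46].

```python
# A regularization-style embedded selector: selection falls out of the
# model's own training, as the L1 penalty zeroes out coefficients.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # the L1 path is scale-sensitive

sparse_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(sparse_model).fit(X, y)
print("features with non-zero coefficients:",
      selector.get_support(indices=True))
```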
3.2.3 Filter-based methods
The filter-based approaches are independent of the supervised learning algorithm and therefore offer more generality; they are also computationally cheaper than the wrapper and embedded approaches. For processing high-dimensional data, the filter methods are more suitable than the wrapper and embedded methods.

Generally, the process of feature selection aims at choosing the relevant features. The best-known example is Relief [50], which was developed with a distance-based metric function that weights each feature based on its relevancy (correlation) with the target class. However, Relief is limited in that it can handle only two-class problems and does not deal with redundant features. The modified version of Relief, known as ReliefF [51], can handle multi-class problems and deal with incomplete and noisy datasets; however, it also fails to remove the redundant features. Holte developed a rule-based attribute selection known as OneR, which forms one rule for each feature and selects the rule with the smallest error [52]. Yang & Moody proposed a joint mutual information-based approach (JMI) for classification. It calculates the joint mutual information between the individual feature and the target class to identify the relevant features, and a heuristic search is adopted for optimization when the number of features is large; features containing similar information and lower relevancy to the target class are treated as redundant and eliminated [53].
Peng et al proposed the mutual information-based max-relevancy min-redundancy (MRMR) feature selection. To identify feature relevancy, the mutual information between the individual feature and the target class is computed, and to identify redundant features, a mutually exclusive condition is applied [54]. Battiti developed a mutual information-based feature selection method (MIFS), in which a mutual information measure determines the relevancy between the individual feature and the target class, and features carrying similar information are considered redundant and removed [55]. Fleuret presented a feature selection scheme, conditional mutual information maximization (CMIM), that recursively chooses the features having maximum mutual information with the target class for classification [56].
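A greedy relevance-minus-redundancy loop in the spirit of MRMR [54] can be sketched as follows, assuming scikit-learn; absolute Pearson correlation stands in here as a cheap redundancy proxy for pairwise mutual information.

```python
# MRMR-style greedy selection: maximize relevance to the class while
# penalizing redundancy with the features already chosen.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif

X, y = load_wine(return_X_y=True)
relevance = mutual_info_classif(X, y, random_state=0)  # MI(feature; class)
redundancy = np.abs(np.corrcoef(X, rowvar=False))      # |corr| between features

k, selected = 5, [int(np.argmax(relevance))]           # most relevant first
while len(selected) < k:
    candidates = [f for f in range(X.shape[1]) if f not in selected]
    # Score each candidate: relevance to the class minus mean
    # redundancy with the features already chosen.
    scores = [relevance[f] - redundancy[f, selected].mean()
              for f in candidates]
    selected.append(candidates[int(np.argmax(scores))])

print("MRMR-style selection:", selected)
```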
Meyer & Bontempi proposed a filter-based approach that uses the double input symmetrical relevance (DISR) metric for feature selection; this approach returns selected features that contain more information about the target class than about the other features [57]. Lin & Tang introduced an information theory-based conditional infomax feature extraction (CIFE) algorithm to measure the class-relevancy and redundancy for feature selection [58]. Brown et al used the conditional redundancy (CondRed) metric for selecting the significant features from a dataset [59].
In the recent past, clustering techniques have also been adopted for feature selection. Song et al developed a feature selection framework that adopts a graph-based clustering technique to identify the similarity among features and remove the redundant ones [60]. Dhillon et al developed a feature selection algorithm based on information theory for text classification, in which hierarchical clustering is used to cluster the features (terms of documents) for identifying their dependencies [61]. Li et al incorporated a clustering algorithm with the chi-square statistical measure to select features from statistical data [62]. Cai et al developed a spectral clustering-based feature selection (MCFS) for selecting the significant features from datasets [63]. Chow & Huang employed a supervised clustering technique and mutual information for identifying the salient features from synthetic and real-world datasets [64]. Mitra et al presented a feature selection approach that adopts graph-based clustering to identify the similarity among features for redundancy analysis [65]. Sotoca & Pla developed a feature selection method for classification based on feature similarity with hierarchical clustering [66].
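The clustering-based redundancy analysis described above can be sketched as follows, assuming SciPy and scikit-learn: features are grouped by correlation distance with hierarchical clustering, and one representative is retained per cluster; the 0.3 cut height is an illustrative choice.

```python
# Clustering-based redundancy analysis: near-duplicate features fall
# into the same cluster, and only one representative per cluster is kept.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True)
dist = np.clip(1.0 - np.abs(np.corrcoef(X, rowvar=False)), 0.0, None)
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=0.3, criterion="distance")   # cut the dendrogram

# Keep the first feature in each cluster as its representative.
reps = [int(np.flatnonzero(labels == c)[0]) for c in np.unique(labels)]
print(f"{X.shape[1]} features reduced to {len(reps)} representatives")
```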
Further, it is observed that the filter-based methods are computationally better than the wrapper [67] and embedded [68] methods; therefore, the filter-based methods can be a suitable choice for high-dimensional spaces. The filter-based methods also achieve high generality, since they do not use the supervised learning algorithm.
3.2.4 Hybrid methods
The hybrid methods are combinations of the filter and wrapper-based approaches [69]. In general, processing high-dimensional data is a difficult task for the wrapper method; therefore, Bermejo et al developed a hybrid feature selection method known as the filter-wrapper approach. They used a statistical measure to rank the features based on their relevancy, and the higher-ranked features are then given to the wrapper method, so that the number of evaluations required by the wrapper method is linear; thus, the computational complexity is reduced using the hybrid method for medical data classification [70]. Ruiz et al developed a gene (feature) selection algorithm for selecting the significant genes for a medical diagnosis system. They used a statistical ranking approach to filter the features from the high-dimensional space, and the filtered features are fed into the wrapper approach; this combination of filter and wrapper was used to distinguish the significant genes causing cancer in the diagnosis process [71].

Xie et al developed a hybrid approach for diagnosing erythemato-squamous diseases, in which the F-score measure is used to rank the features and identify the relevant ones (filter approach), and the significant features are then selected from the ranked features with sequential forward floating search (SFFS) and an SVM (wrapper method) [72]. Kannan & Faez presented a hybrid feature selection framework in which ant colony optimization (ACO)-based local search (LS) is used with the symmetric uncertainty measure to rank the features [73]. Xie et al designed a hybrid approach with the F-score to identify the relevant attributes from a disease dataset; for feature subset generation from the relevant features, searching strategies such as sequential backward floating search (SBFS), extended sequential forward search (ESFS), and sequential forward floating search (SFFS) are also employed [74]. Naseriparsa et al proposed a hybrid method using information gain and a genetic algorithm-based searching method combined with a supervised learning algorithm [75]. Huda et al developed a hybrid feature selection method by combining mutual information (MI) and an artificial neural network (ANN) [76]. Gunal presented a hybrid feature selection method combining the filter and wrapper methods for text classification; the information gain measure is used for ranking the significant features, and a genetic algorithm is used as the searching strategy with a support vector machine [77].

D'Alessandro et al proposed a hybrid approach for epileptic seizure prediction, in which ranking with a genetic algorithm-based wrapper approach was implemented [78]. Yang et al developed a hybrid method for classifying microarray datasets; information gain and a correlation metric are used for the filter stage, and an improved binary particle swarm optimization (BPSO) is used with a supervised learning algorithm as the wrapper stage to improve classification performance, evaluated using kNN and SVM classifiers [79]. To avoid the computational cost of the wrapper method, Bermejo presented a hybrid method combining the filter and wrapper methods, in which the GRASP meta-heuristic, based on a stochastic algorithm, is used as the filter stage for reducing the wrapper computation [80]. Foithong et al also designed a hybrid feature selection method combining the filter and wrapper methods; the mutual information criterion is used for filtering the relevant features, and a supervised learning algorithm is adopted as the wrapper method for evaluating the features obtained from the filter stage [81].

Further, it is observed that the hybrid methods are computationally more intensive than the filter methods, since they combine the wrapper and filter methods, and they have less generality than the filter methods, since they use a supervised learning algorithm in the feature selection process. Consequently, these hybrid methods take more computational time than the filter-based methods.
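A filter-then-wrapper hybrid of the kind described in this section might look like the following sketch, assuming scikit-learn: a mutual-information ranking trims the space first, so the costlier wrapper search runs only on the surviving 15 features. The stage sizes are illustrative choices, not taken from the cited works.

```python
# Hybrid feature selection: a cheap filter stage prunes the feature
# space before the expensive wrapper stage searches it.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (SelectKBest,
                                       SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

hybrid = make_pipeline(
    SelectKBest(mutual_info_classif, k=15),                       # filter
    SequentialFeatureSelector(knn, n_features_to_select=5, cv=3), # wrapper
    knn,                                                          # final model
)
print("CV accuracy:", cross_val_score(hybrid, X, y, cv=5).mean())
```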
4. SUMMARY
This section summarizes the feature selection methods categorized by how the features are combined in the selection process, namely feature subset-based and feature ranking-based, and by how the supervised learning algorithm is used, namely wrapper, embedded, hybrid, and filter.

The subset-based methods generate the feature subsets using one of the searching strategies for evaluation. The exhaustive or complete search leads to high computational complexity, since a maximum of 2^N possible subsets must be generated from N features and evaluated; this "brute force" approach is not suitable for high-dimensional spaces. Heuristic searches such as SA, TS, ACO, GA, and PSO are employed to reduce the number of feature subsets generated for evaluation by means of a heuristic function. Even so, the subset-based feature selection methods using heuristic search lead to considerable computational complexity, because they need prior knowledge and each generated subset requires building a classification model for its evaluation. Moreover, these heuristic search methods follow a wrapper-based approach: they are computationally expensive, and they can only produce higher classification accuracy for the specific classification algorithm used to obtain the fitness or heuristic function; therefore, these methods cannot achieve high generality. The ranking-based methods take less computation time and achieve high generality, since they do not use the supervised learning algorithm; however, they cannot remove the redundant features, since they only compute the correlation or similarity between the individual feature and the target class. Therefore, they can be a suitable choice for high-dimensional spaces when paired with a suitable redundancy analysis mechanism.

The wrapper, embedded, and hybrid methods are computationally less efficient than the filter approach. In addition, they do not have high generality, since they use the supervised learning algorithm in the feature selection process. Therefore, the filter methods are the best choice for high-dimensional data. Further, the filter methods are preferred because they can perform well with any classification algorithm, possess better generality, and require less computational complexity. The ranking-based approaches are better than the feature subset-based methods, since the subset-based methods require more space and computational complexity. Therefore, the ranking-based methods are the best choice for selecting the relevant features from a high-dimensional space.

In the feature selection literature, some researchers have succeeded in effectively removing the irrelevant features but failed to handle the redundant ones; others have dealt with removing both the irrelevant and the redundant features. Furthermore, the state-of-the-art feature selection methods reported in the literature use rule-based metrics and nearest neighbor principles; both eliminate the irrelevant features but fail to treat the redundant ones. Some methods use an information-theoretic metric to calculate the relevancy between each feature and the target class for relevancy analysis and to calculate the independency among features for redundancy analysis.

In most of the information-theoretic approaches, the same metric is used for both redundancy and irrelevancy analysis. Some of these approaches perform pairwise analysis to identify the independency among features for redundancy analysis, resulting in increased time complexity; they have no special mechanism for treating the redundant features, yet they perform moderate redundancy analysis. Most of the clustering-based approaches use hierarchical clustering for feature selection and deal with specific types of datasets. However, hierarchical clustering is expensive for high-dimensional datasets and less effective in high-dimensional spaces due to the dimensionality phenomenon. Hence, the simple, scalable, and faster K-means clustering algorithm can be used [82] for relevancy analysis in feature selection.
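A sketch of the suggested K-means-based analysis, assuming scikit-learn: the (standardized) features themselves are clustered, and the feature nearest each centroid is kept as that cluster's representative. The cluster count and scaling are illustrative choices, not prescriptions from [82].

```python
# K-means over features: transpose the data so each feature is a point,
# cluster the features, and keep one representative per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
F = StandardScaler().fit_transform(X).T          # one row per feature
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(F)

reps = []
for c in range(km.n_clusters):
    members = np.flatnonzero(km.labels_ == c)
    # Pick the member feature closest to its cluster centroid.
    d = np.linalg.norm(F[members] - km.cluster_centers_[c], axis=1)
    reps.append(int(members[np.argmin(d)]))

print("representative features:", sorted(reps))
```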
5. CONCLUSION
Feature selection for high-dimensional data can therefore be developed using the filter approach with a ranking method for selecting the significant features from the high-dimensional space. In addition, to overcome the limitations of the ranking method, a redundancy analysis mechanism can be adopted with a suitable clustering approach.

6. REFERENCES
[1] Saeys, Y, Inza, I & Larrañaga, P 2007, 'A review of feature selection techniques in bioinformatics', Bioinformatics, vol. 23, no. 19, pp. 2507-2517.
[2] Bolón-Canedo, V, Sánchez-Maroño, N & Alonso-Betanzos, A 2013, 'A review of feature selection methods on synthetic data', Knowledge and Information Systems, vol. 34, no. 3, pp. 483-519.
[3] Hall, MA 1999, 'Correlation-based feature selection for machine learning', Ph.D. thesis, The University of Waikato, New Zealand.
[4] Liu, H & Setiono, R 1996, 'A probabilistic approach to feature selection - a filter solution', Proceedings of the Thirteenth International Conference on Machine Learning, Italy, pp. 319-327.
[5] Lisnianski, A, Frenkel, I & Ding, Y 2010, Multi-state System Reliability Analysis and Optimization for Engineers and Industrial Managers, Springer, New York.
[6] Lin, SW, Tseng, TY, Chou, SY & Chen, SC 2008, 'A simulated-annealing-based approach for simultaneous parameter optimization and feature selection of back-propagation networks', Expert Systems with Applications, vol. 34, no. 2, pp. 1491-1499.
[7] Meiri, R & Zahavi, J 2006, 'Using simulated annealing to optimize the feature selection problem in marketing applications', European Journal of Operational Research, vol. 171, no. 3, pp. 842-858.
[8] Zhang, H & Sun, G 2002, 'Feature selection using tabu search method', Pattern Recognition, vol. 35, no. 3, pp. 701-711.
[9] Tahir, MA, Bouridane, A & Kurugollu, F 2007, 'Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier', Pattern Recognition Letters, vol. 28, no. 4, pp. 438-446.
[10] Aghdam, MH, Ghasem-Aghaee, N & Basiri, ME 2009, 'Text feature selection using ant colony optimization', Expert Systems with Applications, vol. 36, no. 3, pp. 6843-6853.
[11] Kanan, HR & Faez, K 2008, 'An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system', Applied Mathematics and Computation, vol. 205, no. 2, pp. 716-725.
[14] Welikala, RA, Fraz, MM, Dehmeshki, J, Hoppe, A, Tah, V, Mann, S, Williamson, TH & Barman, SA 2015, 'Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy', Computerized Medical Imaging and Graphics, vol. 43, pp. 64-77.
[15] Erguzel, TT, Ozekes, S, Tan, O & Gultekin, S 2015, 'Feature selection and classification of electroencephalographic signals: an artificial neural network and genetic algorithm based approach', Clinical EEG and Neuroscience, vol. 46, no. 4, pp. 321-326.
[16] Oreski, S & Oreski, G 2014, 'Genetic algorithm-based heuristic for feature selection in credit risk assessment', Expert Systems with Applications, vol. 41, no. 4, pp. 2052-2064.
[17] Li, S, Wu, H, Wan, D & Zhu, J 2011, 'An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine', Knowledge-Based Systems, vol. 24, no. 1, pp. 40-48.
[18] Das, N, Sarkar, R, Basu, S, Kundu, M, Nasipuri, M & Basu, DK 2012, 'A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application', Applied Soft Computing, vol. 12, no. 5, pp. 1592-1606.
[19] Wang, Y, Chen, X, Jiang, W, Li, L, Li, W, Yang, L, Liao, M, Lian, B, Lv, Y, Wang, S & Wang, S 2011, 'Predicting human microRNA precursors based on an optimized feature subset generated by GA-SVM', Genomics, vol. 98, no. 2, pp. 73-78.
[20] Xue, B, Zhang, M & Browne, WN 2013, 'Particle swarm optimization for feature selection in classification: a multi-objective approach', IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1656-1671.
[21] Chen, LF, Su, CT, Chen, KH & Wang, PC 2012, 'Particle swarm optimization for feature selection with application in obstructive sleep apnea diagnosis', Neural Computing and Applications, vol. 21, no. 8, pp. 2087-2096.
[22] Yang, H, Du, Q & Chen, G 2012, 'Particle swarm optimization-based hyperspectral dimensionality reduction for urban land cover classification', IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 544-554.
[23] Lin, SW, Ying, KC, Chen, SC & Lee, ZJ 2008, 'Particle swarm optimization for parameter determination and feature selection of support vector machines', Expert Systems with Applications, vol. 35, no. 4, pp. 1817-1824.
[24] Liu, H & Setiono, R 1995, 'Chi2: feature selection and discretization of numeric attributes', Proceedings of the IEEE Seventh International Conference on Tools with Artificial Intelligence, Washington DC, USA, pp. 388-391.
[25] Dash, M & Liu, H 1997, 'Feature selection for classification', Intelligent Data Analysis, vol. 1, no. 1, pp. 131-156.
[26] Kohavi, R & John, GH 1997, 'Wrappers for feature subset selection', Artificial Intelligence, vol. 97, no. 1, pp. 273-324.
[27] Inza, I, Larrañaga, P, Etxeberria, R & Sierra, B 2000, 'Feature subset selection by Bayesian network-based optimization', Artificial Intelligence, vol. 123, no. 1, pp. 157-184.
[28] Grimaldi, M, Cunningham, P & Kokaram, A 2003, 'An evaluation of alternative feature selection strategies and ensemble techniques for classifying music', Proceedings of the Fourteenth European Conference on Machine Learning and the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, Dubrovnik, Croatia.
[29] Dy, JG & Brodley, CE 2000, 'Feature subset selection and order identification for unsupervised learning', Proceedings of the Seventeenth International Conference on Machine Learning, pp. 247-254.
[30] Aha, DW & Bankert, RL 1996, 'A comparative evaluation of sequential feature selection algorithms', Springer, New York.
[31] Maldonado, S & Weber, R 2009, 'A wrapper method for feature selection using support vector machines', Information Sciences, vol. 179, no. 13, pp. 2208-2217.
[32] Gütlein, M, Frank, E, Hall, M & Karwath, A 2009, 'Large-scale attribute selection using wrappers', Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, pp. 332-339.
[33] Kabir, MM, Islam, MM & Murase, K 2010, 'A new wrapper feature selection approach using neural network', Neurocomputing, vol. 73, no. 16, pp. 3273-3283.
[34] Stein, G, Chen, B, Wu, AS & Hua, KA 2005, 'Decision tree classifier for network intrusion detection with GA-based feature selection', Proceedings of the Forty-third ACM Annual Southeast Regional Conference, Kennesaw, GA, USA, vol. 2, pp. 136-141.
[35] Zhuo, L, Zheng, J, Li, X, Wang, F, Ai, B & Qian, J 2008, 'A genetic algorithm based wrapper feature selection method for classification of hyperspectral images using support vector machine', Proceedings of Geoinformatics and Joint Conference on GIS and Built Environment: Classification of Remote Sensing Images, pp. 71471J-71471J.
[36] Quinlan, JR 2014, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, California.
[37] Koistinen, P & Holmström, L 1991, 'Kernel regression and backpropagation training with noise', Proceedings of the IEEE International Joint Conference on Neural Networks, pp. 367-372.
[38] Baluja, S 1994, 'Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning', Technical Report No. CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, PA.
[39] Buntine, W 1991, 'Theory refinement on Bayesian networks', Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, pp. 52-60.
[40] Loughrey, J & Cunningham, P 2005, 'Overfitting in wrapper-based feature subset selection: the harder you try the worse it gets', Research and Development in Intelligent Systems, Springer, London, pp. 33-43.
[41] Freeman, C, Kulić, D & Basir, O 2015, 'An evaluation of classifier-specific filter measure performance for feature selection', Pattern Recognition, vol. 48, no. 5, pp. 1812-1826.
[42] Chandrashekar, G & Sahin, F 2014, 'A survey on feature selection methods', Computers & Electrical Engineering, vol. 40, no. 1, pp. 16-28.
[43] Guyon, I, Weston, J, Barnhill, S & Vapnik, V 2002, 'Gene selection for cancer classification using support vector machines', Machine Learning, vol. 46, no. 1-3, pp. 389-422.
[44] Quinlan, JR 1986, 'Induction of decision trees', Machine Learning, vol. 1, no. 1, pp. 81-106.
[45] Tibshirani, R, Saunders, M, Rosset, S, Zhu, J & Knight, K 2005, 'Sparsity and smoothness via the fused lasso', Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 1, pp. 91-108.
[46] Ma, S & Huang, J 2008, 'Penalized feature selection and classification in bioinformatics', Briefings in Bioinformatics, vol. 9, no. 5, pp. 392-403.
[47] Neumann, J, Schnörr, C & Steidl, G 2004, 'SVM-based feature selection by direct objective minimisation', Proceedings of the Twenty-sixth DAGM Symposium on Pattern Recognition, Germany, pp. 212-219.
[48] Xiao, Z, Dellandrea, E, Dou, W & Chen, L 2008, 'ESFS: a new embedded feature selection method based on SFS', Rapports de recherche.
[49] Maldonado, S, Weber, R & Famili, F 2014, 'Feature selection for high-dimensional class-imbalanced data sets using support vector machines', Information Sciences, vol. 286, pp. 228-246.
[50] Kira, K & Rendell, LA 1992, 'A practical approach to feature selection', Proceedings of the Ninth International Workshop on Machine Learning, Aberdeen, Scotland, UK, pp. 249-256.
[51] Kononenko, I 1994, 'Estimating attributes: analysis and extensions of RELIEF', Proceedings of the European Conference on Machine Learning, Catania, Italy, pp. 171-182.
[52] Holte, RC 1993, 'Very simple classification rules perform well on most commonly used datasets', Machine Learning, vol. 11, no. 1, pp. 63-90.
[53] Yang, HH & Moody, JE 1999, 'Data visualization and feature selection: new algorithms for nongaussian data', Advances in Neural Information Processing Systems, vol. 99, pp. 687-693.
[54] Peng, H, Long, F & Ding, C 2005, 'Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238.
[55] Battiti, R 1994, 'Using mutual information for selecting features in supervised neural net learning', IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 537-550.
[56] Fleuret, F 2004, 'Fast binary feature selection with conditional mutual information', The Journal of Machine Learning Research, vol. 5, pp. 1531-1555.
[57] Meyer, PE & Bontempi, G 2006, 'On the use of variable complementarity for feature selection in cancer classification', Applications of Evolutionary Computing, pp. 91-102.
[58] Lin, D & Tang, X 2006, 'Conditional infomax learning: an integrated framework for feature extraction and fusion', Proceedings of the Ninth European Conference on Computer Vision, Graz, pp. 68-82.
[59] Brown, G, Pocock, A, Zhao, MJ & Luján, M 2012, 'Conditional likelihood maximisation: a unifying framework for information theoretic feature selection', The Journal of Machine Learning Research, vol. 13, no. 1, pp. 27-66.
[60] Song, Q, Ni, J & Wang, G 2013, 'A fast clustering-based feature subset selection algorithm for high-dimensional data', IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, pp. 1-14.
[61] Dhillon, IS, Mallela, S & Kumar, R 2003, 'A divisive information theoretic feature clustering algorithm for text classification', The Journal of Machine Learning Research, vol. 3, pp. 1265-1287.
[62] Li, Y, Luo, C & Chung, SM 2008, 'Text clustering with feature selection by using statistical data', IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 5, pp. 641-652.
[63] Cai, D, Zhang, C & He, X 2010, 'Unsupervised feature selection for multi-cluster data', Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, pp. 333-342.
[64] Chow, TW & Huang, D 2005, 'Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information', IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 213-224.
[65] Mitra, S & Acharya, T 2005, Data Mining: Multimedia, Soft Computing, and Bioinformatics, John Wiley & Sons, New Jersey.
[66] Sotoca, JM & Pla, F 2010, 'Supervised feature selection by clustering using conditional mutual information-based distances', Pattern Recognition, vol. 43, no. 6, pp. 2068-2081.
[67] Freeman, C, Kulić, D & Basir, O 2015, 'An evaluation of classifier-specific filter measure performance for feature selection', Pattern Recognition, vol. 48, no. 5, pp. 1812-1826.
[68] Frénay, B, Doquire, G & Verleysen, M 2014, 'Estimating mutual information for feature selection in the presence of label noise', Computational Statistics & Data Analysis, vol. 71, pp. 832-848.
[69] Tabakhi, S, Moradi, P & Akhlaghian, F 2014, 'An unsupervised feature selection algorithm based on ant colony optimization', Engineering Applications of Artificial Intelligence, vol. 32, pp. 112-123.