


International Journal of Computer Applications (0975 – 8887)
Volume 136 – No. 1, February 2016
DOI: 10.5120/ijca2016908317

Literature Review on Feature Selection Methods for High-Dimensional Data

D. Asir Antony Gnana Singh, Anna University, BIT Campus, Tiruchirappalli, India
S. Appavu alias Balamurugan, K.L.N College of Information Technology, Sivagangai, India
E. Jebamalar Leavline, Anna University, BIT Campus, Tiruchirappalli, India

ABSTRACT
Feature selection plays a significant role in improving the performance of machine learning algorithms, both by reducing the time required to build the learning model and by increasing the accuracy of the learning process. Researchers therefore pay considerable attention to feature selection as a means of enhancing the performance of machine learning algorithms. Identifying a suitable feature selection method is essential for any machine learning task on high-dimensional data. Hence, a study of the various feature selection methods is needed by the research community, especially by those developing feature selection methods that enhance the performance of machine learning tasks on high-dimensional data. To fulfill this objective, this paper presents a complete literature review of the various feature selection methods for high-dimensional data.

General Terms
Literature review on feature selection methods, study on feature selection, wrapper-based feature selection, embedded-based feature selection, hybrid feature selection, filter-based feature selection, feature subset-based feature selection, feature ranking-based feature selection, attribute selection, dimensionality reduction, variable selection, survey on feature selection, feature selection for high-dimensional data, introduction to variable and feature selection, feature selection for classification.

Keywords
Introduction to variable and feature selection, information gain-based feature selection, gain ratio-based feature selection, symmetric uncertainty-based feature selection, subset-based feature selection, ranking-based feature selection, wrapper-based feature selection, embedded-based feature selection, filter-based feature selection, hybrid feature selection, selecting features from high-dimensional data.

1. INTRODUCTION
In the digital era, handling massive data is a challenging task for researchers, since data are accumulated through a variety of data acquisition techniques, methods, and devices. These accumulated raw data degrade the performance of machine learning algorithms: they cause overfitting, increase the time needed to develop the machine learning models, and reduce accuracy, because the raw data are noisy in nature and carry a large number of features, a situation known as high-dimensional data. In general, high-dimensional data contain irrelevant and redundant features. The irrelevant features contribute nothing to the learning process, while the redundant features carry the same information and therefore mislead the learning process. These issues can be tackled by feature selection. Feature selection is the process of removing the redundant and irrelevant features from a dataset to improve the performance of machine learning algorithms. Feature selection is also known as variable selection or attribute selection, and the features themselves are also known as variables or attributes. Machine learning algorithms can be roughly classified into two categories: supervised learning algorithms and unsupervised learning algorithms. Supervised learning algorithms learn from labeled data and construct learning models known as classifiers; the classifiers are employed to identify or predict the class label of unlabeled data. Unsupervised learning algorithms learn from unlabeled data and construct learning models known as clustering models; the clustering models are employed to cluster or categorize the given data by predicting or identifying their group or cluster. Feature selection is mostly employed with supervised learning algorithms, since these suffer most from the high-dimensional space. Therefore, this paper presents a complete literature review of various feature selection methods for high-dimensional data.

The rest of this paper is organized as follows: Section 2 describes the feature selection process. In Section 3, a survey on feature selection is conducted. Section 4 summarizes the survey, and Section 5 concludes the paper.

2. FEATURE SELECTION
Feature selection is the process of removing the irrelevant and redundant features from a dataset in order to improve the performance of machine learning algorithms in terms of accuracy and the time taken to build the model. Based on how the features are combined for evaluation, the feature selection process is classified into two categories, namely feature subset selection methods and feature ranking methods. The feature subset selection approach generates possible combinations of feature subsets using a searching strategy such as greedy forward selection or greedy backward elimination, and evaluates each feature subset with a feature selection metric such as correlation or consistency. In this approach, the space and computational complexity involved are high due to the subset generation and evaluation [2].

In the feature ranking method, each feature is scored by a selection metric such as information gain, symmetric uncertainty, or gain ratio, and the top-ranked features are selected as relevant features using a pre-defined threshold. This approach is computationally cheaper and its space complexity is lower compared to the subset approach. However, it does not deal with redundant features.

Further, based on how the supervised learning algorithm is employed in the feature selection process, feature selection is classified into four categories, namely wrapper, embedded, filter, and hybrid methods.
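To make the ranking idea concrete, the following is a minimal sketch using scikit-learn's mutual information estimator (an information gain-style score); the dataset and the top-k threshold are illustrative assumptions, not part of any surveyed method.

# A minimal sketch of feature ranking with an information-theoretic score,
# assuming scikit-learn is available; the threshold (top_k) is illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Score each feature independently against the class label.
scores = mutual_info_classif(X, y, random_state=0)

# Rank features by score and keep the top-k (a pre-defined threshold).
top_k = 10
ranked = np.argsort(scores)[::-1]
selected = ranked[:top_k]
X_reduced = X[:, selected]
print("selected feature indices:", list(selected))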


[Figure 1: Feature selection with the wrapper approach]

The wrapper approach incorporates the supervised learning algorithm to validate the feature subsets generated by a searching strategy, as shown in Figure 1. It yields high classification accuracy only for the particular learning algorithm adopted. Hence, it does not possess high generality, and its computational complexity is higher than that of the embedded and filter methods.

The embedded approach uses part of a supervised learning algorithm in the feature selection process and produces better accuracy only for the learning algorithm used in the selection process. Hence, it does not have high generality, and it is computationally more expensive than the filter method but less expensive than the wrapper method.

[Figure 2: Feature selection with the filter approach]

The filter approach (Figure 2) selects the features without the influence of any supervised learning algorithm. Hence, it works with any classification algorithm and achieves more generality with less computational complexity than the wrapper and embedded methods. Therefore, it is suitable for high-dimensional spaces. The combination of the wrapper and filter approaches is known as the hybrid method [1].

3. SURVEY ON FEATURE SELECTION
As feature selection is employed in various machine learning applications, it has a remarkable literature record produced by the research community. Feature selection is a preprocessing technique that selects the significant features of a dataset by removing the irrelevant and redundant features, thereby improving the performance of machine learning algorithms. The feature selection process can be categorized into various methods based on how the features are combined for evaluation and on how the supervised learning algorithm is used to evaluate the features. This paper reviews the literature on the various feature selection methods and explores their merits and demerits.

3.1 Feature Selection Based on Combining the Features for Evaluation
This section reviews feature selection methods according to how the features are combined for evaluation in order to select the significant features from a dataset. They are classified into feature subset-based and feature ranking-based methods.

3.1.1 Feature subset-based methods
In the feature subset-based method, the features are combined into possible feature subsets using a searching strategy. The feature subsets are then evaluated using either a statistical measure or a supervised learning algorithm to observe the significance of each subset, and the most significant subset is selected for the given dataset. If the subsets are evaluated using a supervised learning algorithm, the method is known as a wrapper method.

The best-known example of the feature subset-based method is correlation-based feature subset selection (CRFS) developed by Hall [3]. In this approach, two correlation measures are considered: feature-class correlation and feature-feature correlation. Initially, the N features are combined into possible feature subsets using a heuristic best-first search; each subset is then evaluated with the two correlation measures. The subset with lower feature-feature correlation and higher feature-class correlation than the other subsets is selected as the significant feature subset for the classification task. Liu & Setiono [4] proposed a feature subset-based method, namely consistency-based feature subset selection (COFS), which uses class consistency as the evaluation metric for selecting the significant feature subset from a given dataset. Both of these are filter-based methods, since they use statistical measures rather than a supervised learning algorithm to evaluate the feature subsets.

In general, an exhaustive or complete search has to generate 2^N subsets to cover all possible combinations of feature subsets drawn from N features. This exhaustive searching strategy is computationally quite expensive; hence, heuristic searching strategies such as simulated annealing (SA), tabu search (TS), ant colony optimization (ACO), the genetic algorithm (GA), and particle swarm optimization (PSO) [5] are used by some researchers to reach a near-optimal solution while generating fewer feature subsets for evaluation. In heuristic searching, the heuristic function uses prior knowledge to guide the search process that generates the subsets, and these subsets are evaluated using a supervised machine learning algorithm. These factors make the feature subset-based methods computationally expensive, and such methods effectively follow the wrapper approach.

Some researchers used the simulated annealing search to generate the feature subsets for evaluation. For example, Lin et al used simulated annealing to generate the feature subsets and evaluated them with a supervised learning algorithm, namely the back-propagation network (BPN), to choose the better feature subset [6]. Meiri & Zahavi used simulated annealing-based feature selection for a marketing application [7]. In several feature selection methods, tabu search is used for subset generation; for example, Zhang & Sun developed a tabu search-based feature selection in which the subsets generated by tabu search are evaluated using a classification error criterion to find the better feature subset [8].
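The following is a compact sketch of simulated-annealing subset search with a classifier as the subset evaluator, in the spirit of the methods above; the cooling schedule, neighbour move, and choice of classifier are illustrative assumptions rather than any single surveyed method.

# A compact sketch of simulated-annealing subset search, assuming a
# scikit-learn classifier as the subset evaluator (wrapper-style); the
# cooling schedule and neighbour move are illustrative choices.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def evaluate(mask):
    # Cross-validated accuracy of the subset; empty subsets score zero.
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

mask = rng.random(n_features) < 0.5          # random initial subset
score = evaluate(mask)
temperature = 1.0
for step in range(200):
    candidate = mask.copy()
    flip = rng.integers(n_features)          # neighbour: flip one feature
    candidate[flip] = ~candidate[flip]
    cand_score = evaluate(candidate)
    # Accept improvements always, and worse subsets with a probability
    # that shrinks as the temperature cools.
    if cand_score > score or rng.random() < np.exp((cand_score - score) / temperature):
        mask, score = candidate, cand_score
    temperature *= 0.98                      # geometric cooling

print("final subset size:", int(mask.sum()), "accuracy:", round(score, 3))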


Tahir et al formed the feature subsets using tabu search, and these subsets were evaluated using the K-nearest neighbor (kNN) classifier with classification error as the evaluation criterion to obtain the significant feature subset [9].

A number of feature selection processes use ant colony optimization as the searching strategy for subset generation. Aghdam et al employed the ant colony optimization search to form the feature subsets, which were validated by the nearest neighbor classifier for a text classification application [10]. Kanan & Faez proposed a feature selection method using ant colony optimization for a face recognition system; in this approach, the nearest neighbor classifier is adopted to evaluate the subsets generated by ant colony optimization-based learning [11]. Sivagaminathan & Ramakrishnan developed an ant colony optimization-based feature selection with artificial neural networks (ANN) for a medical diagnosis system; in this method, the generated feature subsets are validated using the ANN [12]. Sreeja & Sankar presented an ant colony optimization-based feature selection with instance-based pattern matching-based classification (PMC) [13].

In certain feature selection research works, the genetic algorithm is adopted to generate the feature subsets and a supervised machine learning algorithm is used to evaluate them. Welikala et al presented feature selection using the genetic algorithm with the support vector machine (SVM) for mining a medical dataset [14]. Erguzel et al used the genetic algorithm and an artificial neural network for electroencephalogram (EEG) signal classification [15]. Oreski & Oreski proposed a feature selection method based on the genetic algorithm with neural networks for credit risk assessment [16]. Li et al developed a genetic algorithm with the support vector machine for hyperspectral image classification [17]. Das et al formulated a genetic algorithm with support vector machine-based feature selection for a handwritten digit recognition application [18]. Wang et al applied the genetic algorithm for subset generation with the support vector machine in the feature selection process for data classification applications [19].

In the literature, some researchers employed particle swarm optimization to generate the feature subsets and validated them with a supervised machine learning algorithm to identify the significant feature subset. Xue et al designed a particle swarm optimization (PSO)-based feature selection for classification, in which the feature subsets generated by PSO are evaluated using a supervised learning algorithm [20]. Chen et al presented a feature selection method using particle swarm optimization search for a sleep disorder diagnosis system [21]. Yang et al developed a particle swarm optimization-based feature selection for land cover classification [22].

From the subset-based feature selection literature, it is observed that the exhaustive or complete search leads to high computational complexity, since it generates 2^N subsets from N features for evaluation; this searching strategy is therefore not a good choice for high-dimensional spaces. The heuristic search methods also incur considerable computational complexity, because they need prior knowledge and every generated subset requires a classification model to be built for its evaluation while iterating toward the optimal feature subset; hence, these searching strategies are also not well suited to high-dimensional spaces. Moreover, these heuristic search methods follow a wrapper-based approach, so they are computationally expensive and produce higher classification accuracy only for the specific classification algorithm used to validate the subsets; consequently, they cannot achieve high generality.

3.1.2 Feature ranking-based methods
In the feature ranking-based approach, each feature of a dataset is weighted by a statistical or information-theoretic measure, and the features are ranked by their weights. The higher-ranked features are then selected as the significant features using a predefined threshold that determines the number of features to be selected from the dataset. A representative example of the feature ranking-based method is chi-square-based feature selection (CQFS), in which Liu & Setiono used the chi-square statistic to weight the features, rank them, and select the significant ones [24]. In a similar way, information-theoretic measures such as information gain, symmetric uncertainty, and gain ratio are employed to weight the individual features and rank them for selection.

Further, it is observed that the feature ranking-based methods use statistical or information-theoretic measures to weight each feature only by observing the relevancy between the individual feature and the target class. Hence, these methods take less runtime but fail to remove the redundant features [2]. The feature ranking-based methods follow the filter approach, since they do not involve a supervised learning algorithm in evaluating the significance of the features. Consequently, these methods are independent of the supervised learning algorithm, achieving more generality with less computational complexity. Thus, with a suitable redundancy analysis mechanism, the feature ranking-based methods can be a good choice for selecting the significant features from a high-dimensional space.
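As an illustration of the ranking approach, the following minimal sketch weights features with the chi-square statistic, in the spirit of CQFS [24]; note that scikit-learn's chi2 scorer requires non-negative feature values, and the dataset and k are illustrative assumptions.

# A minimal sketch of chi-square-based ranking (in the spirit of CQFS),
# assuming non-negative feature values as scikit-learn's chi2 requires;
# k is an illustrative threshold.
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)   # pixel intensities are non-negative

selector = SelectKBest(score_func=chi2, k=20)   # weight, rank, keep top 20
X_reduced = selector.fit_transform(X, y)
print("kept", X_reduced.shape[1], "of", X.shape[1], "features")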
3.2 Feature Selection Based on the Supervised Learning Algorithm Used
This section reviews feature selection methods according to how the machine learning algorithm is used. They are categorized as wrapper, embedded, filter, and hybrid methods.

3.2.1 Wrapper-based methods
The wrapper-based approach generates the feature subsets using a searching technique and evaluates these subsets with the supervised learning algorithm in terms of classification error or accuracy [25]; it can be seen as a "brute force" method. This approach is illustrated in Figure 1. Kohavi & John developed a wrapper-based feature selection method for selecting the significant features from a dataset [26]. Their method consists of a search engine for subset generation and a classification algorithm to evaluate the subsets. They compared the performance of this method, in terms of classification accuracy, with hill-climbing and best-first searching strategies using decision tree and naïve Bayes classifiers, and observed that the wrapper method suffers from searching overhead, overfitting, and increased runtime.

In the wrapper approach, searching is an overhead, since the searching technique has no domain knowledge. To overcome the searching time overhead, Inza et al used estimation of Bayesian network algorithms for feature subset selection with naïve Bayes and ID3 (Iterative Dichotomiser 3) [27]. In general, the searching method may also increase the computational complexity because the training data are split for evaluation; to address this issue, Grimaldi et al used an aggregation principle with sequential search [28]. Dy & Brodley developed a wrapper-based approach for unsupervised learning using order identification (recognizing the number of clusters in the data) with the expectation maximization (EM) clustering algorithm under the maximum likelihood (ML) criterion [29].


Aha & Bankert presented a wrapper-based method with beam search and the IB1 classifier [30]. They compared its performance with well-known sequential search algorithms for feature selection, namely forward sequential selection (FSS) and backward sequential selection (BSS), and observed that beam search outperforms FSS and BSS.

Maldonado & Weber developed a wrapper-based feature selection by combining the support vector machine (SVM) with kernel functions. This method uses sequential backward selection for feature subset generation, and the subsets are validated in terms of classification error to identify the best subset [31]. To minimize the searching overhead, Gütlein et al used a search algorithm, namely ORDERED-FS, that orders the features by resubstitution error to identify their irrelevancy [32]. Kabir et al developed a wrapper-based constructive approach for feature selection (CAFS) using a neural network (NN); in this method, a correlation measure is used to remove redundancy in the searching strategy and improve the performance of the NN [33]. Stein et al proposed an ant colony optimization-based feature selection with a wrapper model, in which ant colony optimization is used as the searching method to reduce the overhead of blind search, forward selection, or backward elimination [34]. Furthermore, to minimize the searching overhead, Zhuo et al presented a wrapper-based feature selection using the genetic algorithm with the support vector machine for classifying hyperspectral images [35].

In the wrapper approach, overfitting can be mitigated by post-pruning, jitter, and early stopping methods. Post-pruning is carried out while developing the decision tree [36]. In the jitter method, the noisy data that make the learning process more difficult are eliminated so that the training data can be fitted without overfitting [37]. In the early stopping method, overfitting in neural networks is avoided by stopping the training process when performance on a validation set starts to deteriorate [38] [39]. Researchers have also tried to reduce overfitting with early stopping in genetic algorithm-based searching, known as GAWES [40].

Overall, it is observed that the wrapper-based methods suffer from searching overhead and overfitting [41], and have higher computational complexity with less generality, since they use a supervised learning algorithm to evaluate the subsets generated by the searching method. Therefore, these methods are not a suitable choice for high-dimensional spaces.

3.2.2 Embedded-based methods
The embedded-based methods use part of the learning process of the supervised learning algorithm for feature selection, and they reduce the computational cost compared with the wrapper method [42]. Embedded methods can be roughly categorized into three groups: pruning methods, built-in mechanisms, and regularization models. In the pruning-based method, all features are initially taken into the training process to build the classification model, and the features with the smallest coefficient values are removed recursively using the support vector machine (SVM) [43]. In built-in mechanism-based feature selection, part of the training phase of the C4.5 [36] and ID3 [44] supervised learning algorithms is used to select the features. In the regularization method, fitting errors are minimized using objective functions, and the features with near-zero regression coefficients are eliminated [45] [46].

Neumann et al developed an embedded feature selection method for selecting the significant features from synthetic and real-world datasets; in their approach, linear and non-linear SVMs are employed in the selection process using the difference of convex functions algorithm (DCA) [47]. Xiao et al proposed an embedded method to select the significant features from audio signals for emotion classification; this method is based on the principle of evidence theory with a mass function, and the identified most relevant features are added incrementally for classification [48]. Maldonado et al developed an embedded method to select the significant features from imbalanced data for classification with several objective functions [49].

Further, it is observed that the embedded methods are computationally more efficient than the wrapper methods but costlier than the filter methods, so they are not a suitable choice for high-dimensional spaces; they also have poor generality, since they use the supervised learning algorithm.
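The pruning-style embedded idea can be sketched with recursive feature elimination around a linear SVM, in the spirit of SVM-RFE [43]; the step size and target feature count below are illustrative assumptions.

# A brief sketch of the pruning-style embedded idea (cf. SVM-RFE [43]):
# a linear SVM is trained on all features and the lowest-weight features
# are removed recursively. The step size and target count are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

svm = LinearSVC(C=1.0, dual=False, max_iter=5000)
rfe = RFE(estimator=svm, n_features_to_select=10, step=1)
rfe.fit(X, y)

print("kept features:", list(rfe.get_support(indices=True)))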
3.2.3 Filter-based methods
The filter-based approaches are independent of the supervised learning algorithm; they therefore offer more generality and are computationally cheaper than the wrapper and embedded approaches. For processing high-dimensional data, the filter methods are more suitable than the wrapper and embedded methods.

Generally, the feature selection process aims at choosing the relevant features. A classic example is Relief [50], which was developed with a distance-based metric function that weights each feature based on its relevancy (correlation) with the target class. However, Relief is limited in that it can handle only two-class problems and does not deal with redundant features. The modified version of Relief, known as ReliefF [51], can handle multi-class problems and deal with incomplete and noisy datasets; however, it still fails to remove the redundant features. Holte developed a rule-based attribute selection known as OneR, which forms one rule for each feature and selects the rule with the smallest error [52]. Yang & Moody proposed a joint mutual information-based approach (JMI) for classification. It calculates the joint mutual information between the individual features and the target class to identify the relevant features, and a heuristic search is adopted for optimization when the number of features is large; the features containing similar information and lower relevancy to the target class are treated as redundant features to be eliminated [53].

Peng et al proposed the mutual information-based max-relevancy min-redundancy (MRMR) feature selection. To identify feature relevancy, the mutual information between the individual feature and the target class is computed, and to identify redundant features, a mutual-exclusivity condition is applied [54]. Battiti developed a mutual information-based feature selection method (MIFS), in which a mutual information measure is used to determine the relevancy between the individual feature and the target class; the features carrying similar information are considered redundant and are removed [55]. Fleuret presented a feature selection scheme, namely conditional mutual information maximization (CMIM), that iteratively chooses features with maximum conditional mutual information with the target class given the features already selected [56]. Meyer & Bontempi proposed a filter approach that uses the double input symmetrical relevance (DISR) metric for feature selection; this approach returns selected features that carry more information about the target class than about the other features [57].
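The relevance/redundancy trade-off behind this family of filters can be illustrated with a simplified MRMR-style greedy scheme; this is a sketch of the general idea, not Peng et al's exact formulation, and the mutual information estimators and subset size are illustrative assumptions.

# A simplified sketch of the max-relevance min-redundancy (MRMR) idea:
# greedily add the feature whose relevance to the class, minus its average
# redundancy with the already-selected features, is largest. Mutual
# information is estimated with scikit-learn; the subset size is illustrative.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_wine(return_X_y=True)
n_features, k = X.shape[1], 5

relevance = mutual_info_classif(X, y, random_state=0)   # feature-class MI
selected = [int(np.argmax(relevance))]                  # start with most relevant

while len(selected) < k:
    best_f, best_val = None, -np.inf
    for f in range(n_features):
        if f in selected:
            continue
        # Average feature-feature MI against the chosen set (redundancy).
        redundancy = np.mean([
            mutual_info_regression(X[:, [s]], X[:, f], random_state=0)[0]
            for s in selected
        ])
        value = relevance[f] - redundancy
        if value > best_val:
            best_f, best_val = f, value
    selected.append(best_f)

print("MRMR-style selection:", selected)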


Lin & Tang introduced an information theory-based conditional infomax feature extraction (CIFE) algorithm to measure class relevancy and redundancy for feature selection [58]. Brown et al used the conditional redundancy (CondRed) metric for selecting the significant features from a dataset [59].

In the recent past, clustering techniques have also been adopted in feature selection. Song et al developed a feature selection framework that adopts a graph-based clustering technique to identify the similarity among features and remove the redundant ones [60]. Dhillon et al developed a feature selection algorithm based on information theory for text classification, in which hierarchical clustering is used to cluster the features or terms of documents and identify their dependencies [61]. Li et al incorporated a clustering algorithm with the chi-square statistical measure to select features from statistical data [62]. Cai et al developed a spectral clustering-based feature selection (MCFS) for selecting the significant features from datasets [63]. Chow & Huang employed a supervised clustering technique and mutual information to identify the salient features of synthetic and real-world datasets [64]. Mitra et al presented a feature selection approach that adopts graph-based clustering to identify the similarity among the features for redundancy analysis [65]. Sotoca & Pla developed a feature selection method for classification based on feature similarity with hierarchical clustering [66].

Further, it is observed that the filter-based methods are computationally better than the wrapper [67] and embedded [68] methods; therefore, the filter-based methods are a suitable choice for high-dimensional spaces. The filter-based methods also achieve high generality, since they do not use the supervised learning algorithm.

3.2.4 Hybrid methods
The hybrid methods combine the filter and wrapper-based approaches [69]. In general, processing high-dimensional data is difficult with the wrapper method alone; therefore, Bermejo et al developed a hybrid feature selection method known as the filter-wrapper approach. In this approach, a statistical measure is used to rank the features by their relevancy, and only the higher-ranked features are passed to the wrapper method, so that the number of evaluations required by the wrapper is linear; the computational complexity is thus reduced using the hybrid method for medical data classification [70]. Ruiz et al developed a gene (feature) selection algorithm for selecting the significant genes for a medical diagnosis system. They used a statistical ranking approach to filter the features of the high-dimensional space, and the filtered features were fed into the wrapper approach; this combination of filter and wrapper was used to distinguish the significant genes causing cancer in the diagnosis process [71].

Xie et al developed a hybrid approach for diagnosing erythemato-squamous diseases, in which the F-score measure is used to rank the features and identify the relevant ones (the filter stage), and the significant features are then selected from the ranked features with the sequential forward floating search (SFFS) and SVM (the wrapper stage) [72]. Kannan & Ramaraj presented a hybrid feature selection framework in which an ant colony optimization (ACO)-based local search (LS) is used with the symmetric uncertainty measure to rank the features [73]. Xie et al also designed a hybrid approach with the F-score to identify the relevant attributes of a disease dataset; for feature subset generation from the relevant features, searching strategies such as sequential backward floating search (SBFS), extended sequential forward search (ESFS), and sequential forward floating search (SFFS) are employed [74]. Naseriparsa et al proposed a hybrid method using information gain and a genetic algorithm-based searching method combined with a supervised learning algorithm [75]. Huda et al developed a hybrid feature selection method by combining mutual information (MI) and an artificial neural network (ANN) [76]. Gunal presented a hybrid feature selection method combining the filter and wrapper methods for text classification; in this method, the information gain measure is used to rank the significant features, and the genetic algorithm is used as the searching strategy with the support vector machine [77].

D'Alessandro et al proposed a hybrid approach for epileptic seizure prediction, in which ranking was combined with a genetic algorithm-based wrapper approach [78]. Yang et al developed a hybrid method for classifying microarray datasets, in which information gain and a correlation metric form the filter stage, and an improved binary particle swarm optimization (BPSO) method is used with the supervised learning algorithm as the wrapper stage to improve classification performance; the method was evaluated using kNN and SVM classifiers [79]. To avoid the computational cost of the wrapper method, Bermejo et al presented a hybrid method combining the filter and wrapper methods, in which the GRASP meta-heuristic, a stochastic algorithm, is used as the filter stage to reduce the wrapper computation [80]. Foithong et al also designed a hybrid feature selection method combining the filter and wrapper methods; the mutual information criterion is used to filter the relevant features, and a supervised learning algorithm is adopted as the wrapper to evaluate the features obtained from the filter stage [81].

Further, it is observed that the hybrid methods are computationally more intensive than the filter methods, since they combine the wrapper and filter methods, and they have less generality than the filter methods, since they use the supervised learning algorithm in the feature selection process. These hybrid methods therefore take more computational time than the filter-based methods.
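The filter-then-wrapper pattern described above can be sketched in a few lines: an information gain-style filter ranks the features, and only the top-ranked candidates enter an incremental wrapper search, keeping the number of wrapper evaluations small. The cut-off, classifier, and dataset are illustrative assumptions, not the configuration of any particular cited method.

# A sketch of the hybrid (filter-then-wrapper) pattern.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Filter stage: rank by mutual information, keep the top 10 candidates.
scores = mutual_info_classif(X, y, random_state=0)
candidates = list(np.argsort(scores)[::-1][:10])

# Wrapper stage: incremental selection over the ranked candidates only,
# accepting a feature when it improves cross-validated accuracy.
selected, best = [], 0.0
for f in candidates:
    trial = selected + [f]
    acc = cross_val_score(SVC(), X[:, trial], y, cv=5).mean()
    if acc > best:
        selected, best = trial, acc

print("hybrid selection:", selected, "cv accuracy:", round(best, 3))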
4. SUMMARY
This section summarizes the feature selection methods, which are categorized by how the features are combined in the selection process (feature subset-based and feature ranking-based) and by how the supervised learning algorithm is used (wrapper, embedded, hybrid, and filter).

The subset-based methods generate the feature subsets using a searching strategy for evaluation. The exhaustive or complete search leads to high computational complexity, since a maximum of 2^N possible combinations of subsets must be generated from the N features and evaluated; this is a "brute force" method and is not suitable for high-dimensional spaces. Heuristic searches such as SA, TS, ACO, GA, and PSO are employed with a heuristic function to reduce the number of feature subsets generated for evaluation. However, the subset-based feature selection methods using heuristic search still incur considerable computational complexity, because they need prior knowledge and each generated subset requires a classification model to be built for its evaluation.


Moreover, these heuristic search methods follow a wrapper-based approach; they are therefore computationally expensive and produce higher classification accuracy only for the specific classification algorithm used to obtain the fitness or heuristic function, so they cannot achieve high generality. The ranking-based methods take less computation time and achieve high generality, since they do not use the supervised learning algorithm; however, they cannot remove the redundant features, since they only compute the correlation or similarity between the individual feature and the target class. With a suitable redundancy analysis mechanism, they can be a good choice for high-dimensional spaces.

The wrapper, embedded, and hybrid methods are computationally less efficient than the filter approach. In addition, they do not have high generality, since they use the supervised learning algorithm in the feature selection process. Therefore, the filter methods are the best choice for high-dimensional data. The filter methods are further preferred because they can perform well with any classification algorithm, possess better generality, and require less computational complexity. The ranking-based approaches are better than the feature subset-based methods, since the subset-based methods require more space and computational complexity; therefore, the ranking-based methods are the best choice for selecting the relevant features from a high-dimensional space.

In the feature selection literature, some researchers have succeeded in effectively removing the irrelevant features but failed to handle the redundant features; others have dealt with removing both the irrelevant and the redundant features. Furthermore, some state-of-the-art feature selection methods reported in the literature use rule-based metrics and nearest neighbor principles; both eliminate the irrelevant features but fail to treat the redundant features. Some methods use an information-theoretic metric to calculate the relevancy between a feature and the target class for relevancy analysis and to calculate the independence among features for redundancy analysis.

In most of the information-theoretic approaches, the same metric is used for both redundancy and irrelevancy analysis. Some of these approaches perform pairwise analysis to identify the independence among the features for redundancy analysis, resulting in increased time complexity. They have no special mechanism for treating the redundant features, yet they perform moderate redundancy analysis. Most of the clustering-based approaches use hierarchical clustering for feature selection and deal with specific types of datasets; however, hierarchical clustering is expensive for high-dimensional datasets and less effective in high-dimensional spaces due to the dimensionality phenomenon. Hence, the simple, scalable, and faster K-means clustering algorithm can be used [82] for relevancy analysis in feature selection.
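One way such a clustering-based analysis could look is sketched below: features are clustered with K-means (treating each feature as a point described by its standardized sample values), and one representative, the most class-relevant feature per cluster, is retained. This is an illustrative interpretation of the suggestion above, not a method from the surveyed literature; the cluster count is an assumption.

# A sketch of clustering features with K-means and keeping one
# class-relevant representative per cluster; k=8 is illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

Xz = StandardScaler().fit_transform(X)       # put features on one scale
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(Xz.T)

relevance = mutual_info_classif(X, y, random_state=0)
selected = [
    int(max(np.where(labels == c)[0], key=lambda f: relevance[f]))
    for c in range(8)
]
print("one representative per cluster:", sorted(selected))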

5. CONCLUSION
This paper analyzed several feature selection methods proposed by various researchers. From the earlier research works, it is observed that the feature ranking-based methods are better than the subset-based methods in terms of memory space and computational complexity, although the ranking-based methods do not reduce redundancy. Further, the wrapper, embedded, and hybrid methods are computationally less efficient than the filter method and have poorer generality. Therefore, feature selection for high-dimensional data can be developed using the filter approach with a ranking method for selecting the significant features from the high-dimensional space. In addition, to overcome the limitations of the ranking method, a redundancy analysis mechanism can be adopted with a suitable clustering approach.

6. REFERENCES
[1] Saeys, Y, Inza, I & Larrañaga, P 2007, 'A review of feature selection techniques in bioinformatics', Bioinformatics, vol. 23, no. 19, pp. 2507-2517.
[2] Bolón-Canedo, V, Sánchez-Maroño, N & Alonso-Betanzos, A 2013, 'A review of feature selection methods on synthetic data', Knowledge and Information Systems, vol. 34, no. 3, pp. 483-519.
[3] Hall, MA 1999, Correlation-based feature selection for machine learning, Ph.D. thesis, The University of Waikato, New Zealand.
[4] Liu, H & Setiono, R 1996, 'A probabilistic approach to feature selection: a filter solution', Proceedings of the Thirteenth International Conference on Machine Learning, Italy, pp. 319-327.
[5] Lisnianski, A, Frenkel, I & Ding, Y 2010, Multi-state System Reliability Analysis and Optimization for Engineers and Industrial Managers, Springer, New York.
[6] Lin, SW, Tseng, TY, Chou, SY & Chen, SC 2008, 'A simulated-annealing-based approach for simultaneous parameter optimization and feature selection of back-propagation networks', Expert Systems with Applications, vol. 34, no. 2, pp. 1491-1499.
[7] Meiri, R & Zahavi, J 2006, 'Using simulated annealing to optimize the feature selection problem in marketing applications', European Journal of Operational Research, vol. 171, no. 3, pp. 842-858.
[8] Zhang, H & Sun, G 2002, 'Feature selection using tabu search method', Pattern Recognition, vol. 35, no. 3, pp. 701-711.
[9] Tahir, MA, Bouridane, A & Kurugollu, F 2007, 'Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier', Pattern Recognition Letters, vol. 28, no. 4, pp. 438-446.
[10] Aghdam, MH, Ghasem-Aghaee, N & Basiri, ME 2009, 'Text feature selection using ant colony optimization', Expert Systems with Applications, vol. 36, no. 3, pp. 6843-6853.
[11] Kanan, HR & Faez, K 2008, 'An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system', Applied Mathematics and Computation, vol. 205, no. 2, pp. 716-725.
[12] Sivagaminathan, RK & Ramakrishnan, S 2007, 'A hybrid approach for feature subset selection using neural networks and ant colony optimization', Expert Systems with Applications, vol. 33, no. 1, pp. 49-60.
[13] Sreeja, NK & Sankar, A 2015, 'Pattern matching based classification using ant colony optimization based feature selection', Applied Soft Computing, vol. 31, pp. 91-102.

[14] Welikala, RA, Fraz, MM, Dehmeshki, J, Hoppe, A, Tah, V, Mann, S, Williamson, TH & Barman, SA 2015, 'Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy', Computerized Medical Imaging and Graphics, vol. 43, pp. 64-77.
[15] Erguzel, TT, Ozekes, S, Tan, O & Gultekin, S 2015, 'Feature selection and classification of electroencephalographic signals: an artificial neural network and genetic algorithm based approach', Clinical EEG and Neuroscience, vol. 46, no. 4, pp. 321-326.
[16] Oreski, S & Oreski, G 2014, 'Genetic algorithm-based heuristic for feature selection in credit risk assessment', Expert Systems with Applications, vol. 41, no. 4, pp. 2052-2064.
[17] Li, S, Wu, H, Wan, D & Zhu, J 2011, 'An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine', Knowledge-Based Systems, vol. 24, no. 1, pp. 40-48.
[18] Das, N, Sarkar, R, Basu, S, Kundu, M, Nasipuri, M & Basu, DK 2012, 'A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application', Applied Soft Computing, vol. 12, no. 5, pp. 1592-1606.
[19] Wang, Y, Chen, X, Jiang, W, Li, L, Li, W, Yang, L, Liao, M, Lian, B, Lv, Y, Wang, S & Wang, S 2011, 'Predicting human microRNA precursors based on an optimized feature subset generated by GA-SVM', Genomics, vol. 98, no. 2, pp. 73-78.
[20] Xue, B, Zhang, M & Browne, WN 2013, 'Particle swarm optimization for feature selection in classification: a multi-objective approach', IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1656-1671.
[21] Chen, LF, Su, CT, Chen, KH & Wang, PC 2012, 'Particle swarm optimization for feature selection with application in obstructive sleep apnea diagnosis', Neural Computing and Applications, vol. 21, no. 8, pp. 2087-2096.
[22] Yang, H, Du, Q & Chen, G 2012, 'Particle swarm optimization-based hyperspectral dimensionality reduction for urban land cover classification', IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 544-554.
[23] Lin, SW, Ying, KC, Chen, SC & Lee, ZJ 2008, 'Particle swarm optimization for parameter determination and feature selection of support vector machines', Expert Systems with Applications, vol. 35, no. 4, pp. 1817-1824.
[24] Liu, H & Setiono, R 1995, 'Chi2: feature selection and discretization of numeric attributes', Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, Washington DC, USA, pp. 388-391.
[25] Dash, M & Liu, H 1997, 'Feature selection for classification', Intelligent Data Analysis, vol. 1, no. 1, pp. 131-156.
[26] Kohavi, R & John, GH 1997, 'Wrappers for feature subset selection', Artificial Intelligence, vol. 97, no. 1, pp. 273-324.
[27] Inza, I, Larrañaga, P, Etxeberria, R & Sierra, B 2000, 'Feature subset selection by Bayesian network-based optimization', Artificial Intelligence, vol. 123, no. 1, pp. 157-184.
[28] Grimaldi, M, Cunningham, P & Kokaram, A 2003, 'An evaluation of alternative feature selection strategies and ensemble techniques for classifying music', Proceedings of the Fourteenth European Conference on Machine Learning and the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, Dubrovnik, Croatia.
[29] Dy, JG & Brodley, CE 2000, 'Feature subset selection and order identification for unsupervised learning', Proceedings of the Seventeenth International Conference on Machine Learning, pp. 247-254.
[30] Aha, DW & Bankert, RL 1996, 'A comparative evaluation of sequential feature selection algorithms', Springer, New York.
[31] Maldonado, S & Weber, R 2009, 'A wrapper method for feature selection using support vector machines', Information Sciences, vol. 179, no. 13, pp. 2208-2217.
[32] Gütlein, M, Frank, E, Hall, M & Karwath, A 2009, 'Large-scale attribute selection using wrappers', Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, pp. 332-339.
[33] Kabir, MM, Islam, MM & Murase, K 2010, 'A new wrapper feature selection approach using neural network', Neurocomputing, vol. 73, no. 16, pp. 3273-3283.
[34] Stein, G, Chen, B, Wu, AS & Hua, KA 2005, 'Decision tree classifier for network intrusion detection with GA-based feature selection', Proceedings of the Forty-Third ACM Annual Southeast Regional Conference, Kennesaw, GA, USA, vol. 2, pp. 136-141.
[35] Zhuo, L, Zheng, J, Li, X, Wang, F, Ai, B & Qian, J 2008, 'A genetic algorithm based wrapper feature selection method for classification of hyperspectral images using support vector machine', Proceedings of Geoinformatics and Joint Conference on GIS and Built Environment: Classification of Remote Sensing Images, p. 71471J.
[36] Quinlan, JR 2014, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, California.
[37] Koistinen, P & Holmström, L 1991, 'Kernel regression and backpropagation training with noise', Proceedings of the IEEE International Joint Conference on Neural Networks, pp. 367-372.
[38] Baluja, S 1994, 'Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning', Technical Report No. CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, PA.
[39] Buntine, W 1991, 'Theory refinement on Bayesian networks', Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, pp. 52-60.

[40] Loughrey, J & Cunningham, P 2005, 'Overfitting in wrapper-based feature subset selection: the harder you try the worse it gets', Research and Development in Intelligent Systems, Springer, London, pp. 33-43.
[41] Freeman, C, Kulić, D & Basir, O 2015, 'An evaluation of classifier-specific filter measure performance for feature selection', Pattern Recognition, vol. 48, no. 5, pp. 1812-1826.
[42] Chandrashekar, G & Sahin, F 2014, 'A survey on feature selection methods', Computers & Electrical Engineering, vol. 40, no. 1, pp. 16-28.
[43] Guyon, I, Weston, J, Barnhill, S & Vapnik, V 2002, 'Gene selection for cancer classification using support vector machines', Machine Learning, vol. 46, no. 1-3, pp. 389-422.
[44] Quinlan, JR 1986, 'Induction of decision trees', Machine Learning, vol. 1, no. 1, pp. 81-106.
[45] Tibshirani, R, Saunders, M, Rosset, S, Zhu, J & Knight, K 2005, 'Sparsity and smoothness via the fused lasso', Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 1, pp. 91-108.
[46] Ma, S & Huang, J 2008, 'Penalized feature selection and classification in bioinformatics', Briefings in Bioinformatics, vol. 9, no. 5, pp. 392-403.
[47] Neumann, J, Schnörr, C & Steidl, G 2004, 'SVM-based feature selection by direct objective minimisation', Proceedings of the Twenty-Sixth DAGM Symposium on Pattern Recognition, Germany, pp. 212-219.
[48] Xiao, Z, Dellandrea, E, Dou, W & Chen, L 2008, 'ESFS: a new embedded feature selection method based on SFS', Rapports de recherche.
[49] Maldonado, S, Weber, R & Famili, F 2014, 'Feature selection for high-dimensional class-imbalanced data sets using support vector machines', Information Sciences, vol. 286, pp. 228-246.
[50] Kira, K & Rendell, LA 1992, 'A practical approach to feature selection', Proceedings of the Ninth International Workshop on Machine Learning, Aberdeen, Scotland, UK, pp. 249-256.
[51] Kononenko, I 1994, 'Estimating attributes: analysis and extensions of RELIEF', Proceedings of the European Conference on Machine Learning, Catania, Italy, pp. 171-182.
[52] Holte, RC 1993, 'Very simple classification rules perform well on most commonly used datasets', Machine Learning, vol. 11, no. 1, pp. 63-90.
[53] Yang, HH & Moody, JE 1999, 'Data visualization and feature selection: new algorithms for nongaussian data', Advances in Neural Information Processing Systems, vol. 99, pp. 687-693.
[54] Peng, H, Long, F & Ding, C 2005, 'Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238.
[55] Battiti, R 1994, 'Using mutual information for selecting features in supervised neural net learning', IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 537-550.
[56] Fleuret, F 2004, 'Fast binary feature selection with conditional mutual information', The Journal of Machine Learning Research, vol. 5, pp. 1531-1555.
[57] Meyer, PE & Bontempi, G 2006, 'On the use of variable complementarity for feature selection in cancer classification', Applications of Evolutionary Computing, pp. 91-102.
[58] Lin, D & Tang, X 2006, 'Conditional infomax learning: an integrated framework for feature extraction and fusion', Proceedings of the Ninth European Conference on Computer Vision, Graz, pp. 68-82.
[59] Brown, G, Pocock, A, Zhao, MJ & Luján, M 2012, 'Conditional likelihood maximisation: a unifying framework for information theoretic feature selection', The Journal of Machine Learning Research, vol. 13, no. 1, pp. 27-66.
[60] Song, Q, Ni, J & Wang, G 2013, 'A fast clustering-based feature subset selection algorithm for high-dimensional data', IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, pp. 1-14.
[61] Dhillon, IS, Mallela, S & Kumar, R 2003, 'A divisive information theoretic feature clustering algorithm for text classification', The Journal of Machine Learning Research, vol. 3, pp. 1265-1287.
[62] Li, Y, Luo, C & Chung, SM 2008, 'Text clustering with feature selection by using statistical data', IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 5, pp. 641-652.
[63] Cai, D, Zhang, C & He, X 2010, 'Unsupervised feature selection for multi-cluster data', Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, pp. 333-342.
[64] Chow, TW & Huang, D 2005, 'Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information', IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 213-224.
[65] Mitra, S & Acharya, T 2005, Data Mining: Multimedia, Soft Computing, and Bioinformatics, John Wiley & Sons, New Jersey.
[66] Sotoca, JM & Pla, F 2010, 'Supervised feature selection by clustering using conditional mutual information-based distances', Pattern Recognition, vol. 43, no. 6, pp. 2068-2081.
[67] Freeman, C, Kulić, D & Basir, O 2015, 'An evaluation of classifier-specific filter measure performance for feature selection', Pattern Recognition, vol. 48, no. 5, pp. 1812-1826.
[68] Frénay, B, Doquire, G & Verleysen, M 2014, 'Estimating mutual information for feature selection in the presence of label noise', Computational Statistics & Data Analysis, vol. 71, pp. 832-848.
[69] Tabakhi, S, Moradi, P & Akhlaghian, F 2014, 'An unsupervised feature selection algorithm based on ant colony optimization', Engineering Applications of Artificial Intelligence, vol. 32, pp. 112-123.

[70] Bermejo, P, Gámez, J & Puerta, J 2008, 'On incremental wrapper-based attribute selection: experimental analysis of the relevance criteria', Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, France, pp. 638-645.
[71] Ruiz, R, Riquelme, JC & Aguilar-Ruiz, JS 2006, 'Incremental wrapper-based gene selection from microarray data for cancer classification', Pattern Recognition, vol. 39, no. 12, pp. 2383-2392.
[72] Xie, J, Xie, W, Wang, C & Gao, X 2010, 'A novel hybrid feature selection method based on IFSFFS and SVM for the diagnosis of erythemato-squamous diseases', Proceedings of the Workshop on Applications of Pattern Analysis, Cumberland Lodge, Windsor, UK, pp. 142-151.
[73] Kannan, SS & Ramaraj, N 2010, 'A novel hybrid feature selection via symmetrical uncertainty ranking based local memetic search algorithm', Knowledge-Based Systems, vol. 23, no. 6, pp. 580-585.
[74] Xie, J, Lei, J, Xie, W, Shi, Y & Liu, X 2013, 'Two-stage hybrid feature selection algorithms for diagnosing erythemato-squamous diseases', Health Information Science and Systems, vol. 1, no. 10, pp. 2-14.
[75] Naseriparsa, M, Bidgoli, AM & Varaee, T 2013, 'A hybrid feature selection method to improve performance of a group of classification algorithms', International Journal of Computer Applications, vol. 69, no. 17.
[76] Huda, S, Yearwood, J & Stranieri, A 2011, 'Hybrid wrapper-filter approaches for input feature selection using maximum relevance-minimum redundancy and artificial neural network input gain measurement approximation (ANNIGMA)', Proceedings of the Thirty-Fourth Australasian Computer Science Conference, Australia, vol. 113, pp. 43-52.
[77] Gunal, S 2012, 'Hybrid feature selection for text classification', Turkish Journal of Electrical Engineering and Computer Sciences, vol. 20, no. 2, pp. 1296-1311.
[78] D'Alessandro, M, Esteller, R, Vachtsevanos, G, Hinson, A, Echauz, J & Litt, B 2003, 'Epileptic seizure prediction using hybrid feature selection over multiple intracranial EEG electrode contacts: a report of four patients', IEEE Transactions on Biomedical Engineering, vol. 50, no. 5, pp. 603-615.
[79] Yang, CS, Chuang, LY, Ke, CH & Yang, CH 2008, 'A hybrid feature selection method for microarray classification', IAENG International Journal of Computer Science, vol. 35, no. 3, pp. 1-3.
[80] Bermejo, P, Gámez, JA & Puerta, JM 2011, 'A GRASP algorithm for fast hybrid (filter-wrapper) feature subset selection in high-dimensional datasets', Pattern Recognition Letters, vol. 32, no. 5, pp. 701-711.
[81] Foithong, S, Pinngern, O & Attachoo, B 2012, 'Feature subset selection wrapper based on mutual information and rough sets', Expert Systems with Applications, vol. 39, no. 1, pp. 574-584.
[82] Coates, A & Ng, AY 2012, 'Learning feature representations with K-means', in Montavon, G, Orr, GB & Müller, KR (eds), Neural Networks: Tricks of the Trade, Springer, Berlin Heidelberg, pp. 561-580, doi: 10.1007/978-3-642-35289-8_30.
