
Computer Science Review 49 (2023) 100559


Review article

Advances in nature-inspired metaheuristic optimization for feature selection problem: A comprehensive survey

Maha Nssibi a,b,*, Ghaith Manita a,c, Ouajdi Korbaa a,d

a Laboratory MARS, LR17ES05, ISITCom, University of Sousse, Sousse, Tunisia
b ENSI, University of Manouba, Manouba, Tunisia
c ESEN, University of Manouba, Manouba, Tunisia
d ISITCom, University of Sousse, Sousse, Tunisia

Article info

Article history: Received 13 October 2022; Received in revised form 23 April 2023; Accepted 1 May 2023; Available online 22 May 2023.

Keywords: Feature selection; Metaheuristic; Optimization

Abstract

The main objective of feature selection is to improve learning performance by selecting concise and informative feature subsets, which presents a challenging task for machine learning or pattern recognition applications due to the large and complex search space involved. This paper provides an in-depth examination of nature-inspired metaheuristic methods for the feature selection problem, with a focus on representation and search algorithms, as they have drawn significant interest from the feature selection community due to their potential for global search and simplicity. An analysis of various advanced approach types, along with their advantages and disadvantages, is presented in this study, with the goal of highlighting important issues and unanswered questions in the literature. The article provides advice for conducting future research more effectively to benefit this field of study, including guidance on identifying appropriate approaches to use in different scenarios.

© 2023 Elsevier Inc. All rights reserved.

Contents

1. Introduction 2
2. Review methodology 2
2.1. Preliminary study 3
2.2. Screening of papers 3
2.3. Data extraction 4
2.4. Review reporting 4
3. Feature selection procedure 4
3.1. Search direction 5
3.2. Search strategy 5
3.3. Evaluation criteria 5
3.3.1. Filter approaches 5
3.3.2. Wrapper approaches 6
3.3.3. Embedded approaches 7
3.4. Stopping criteria 7
3.5. Results validation 7
4. Metaheuristic optimization for feature selection 8
4.1. Evaluation function 9
4.2. Problem representation 9
4.3. Binary metaheuristics for feature selection 10
4.3.1. Binary evolutionary based metaheuristics 11
4.3.2. Binary trajectory based metaheuristics 11
4.3.3. Binary swarm based metaheuristics 11
4.3.4. Binary nature based metaheuristics 11
4.3.5. Binary mathematical based metaheuristics 12
4.4. Hybrids and variants 12
4.4.1. Feature selection using chaos theory metaheuristic algorithms 13
4.4.2. Feature selection using fractional order metaheuristic algorithms 13
4.4.3. Feature selection using island based metaheuristic algorithms 15
4.4.4. Feature selection using hybrid metaheuristic algorithms 15
5. Issues and challenges 16
5.1. Representation scalability and stability 16
5.2. Selection of classifier 16
5.3. Wrapper objective function 17
5.4. Challenges in feature selection and metaheuristic optimization: Recommendations and guidelines 17
6. Conclusion 18
Declaration of competing interest 21
Data availability 21
References 21

* Corresponding author at: Laboratory MARS, LR17ES05, ISITCom, University of Sousse, Sousse, Tunisia.
E-mail addresses: [email protected] (M. Nssibi), [email protected] (G. Manita), [email protected] (O. Korbaa).

https://doi.org/10.1016/j.cosrev.2023.100559
1574-0137/© 2023 Elsevier Inc. All rights reserved.

1. Introduction

In recent years, data gathering and storage have significantly increased in almost every aspect of human life. However, as the dimensionality of datasets increases, it becomes more challenging to demonstrate statistical significance since there are fewer significant data points. Moreover, the processing cost of high-dimensional datasets often grows exponentially. To address this issue, a common solution is to identify which projection of the data onto reduced variables preserves the most information [1,2].

It is evident that not all features in a final dataset are necessary or sufficient to comprehend the concept of interest. Feature selection involves choosing a limited set of features that optimally characterize the goal concept, assuming that all necessary features are present. The objective of feature selection is to select the best subset of features that contains the information required for the classification or prediction process, while eliminating irrelevant and redundant features [3]. To improve the readability and interpretability of machine learning models, an optimal subset of features is often sought starting from a fixed number of features. A feature selection criterion is necessary to determine whether a feature should be retained for the output classes or labels. According to machine learning principles, irrelevant variables in a system negatively impact performance when the model is applied to subsequent data.

Comparing feature elimination to other dimension reduction methods such as principal component analysis (PCA) or feature extraction is not appropriate, since the relevant features may be independent of the rest of the data. Reduction techniques transform the original high-dimensional feature space into a low-dimensional one, where the resulting features are typically a linear or nonlinear combination of the original features. Unlike feature elimination, this process creates new features while reducing the number of input features.

After selecting a feature selection criterion, a method must be developed to determine the subset of valuable features. However, directly evaluating all 2^N feature subsets for a given dataset becomes an NP-hard problem as the number of features increases. Therefore, an optimization strategy must be employed to remove irrelevant data while maintaining reasonable computing costs [4]. As a result, the area of transdisciplinary optimization research is expanding.

Optimization problems for fundamental functions can be readily handled using derivative-based and deterministic approaches. However, highly nonlinear, multimodal, and complex optimization problems are challenging to solve using deterministic methods due to their high resource requirements. To address such problems, a variety of methods, including gradient-based, gradient-free, trajectory-based, population-based, deterministic, or stochastic algorithms, can be employed. Among these, metaheuristics, which are stochastic optimization methods, have been shown to enhance the performance of heuristic procedures and are particularly useful for addressing the local optima problem. Although no single algorithm can solve every optimization problem, metaheuristics have been successfully applied to a range of problems, including disease diagnosis, engineering optimization, action prediction, text mining, and feature selection [5].

Despite the significant amount of research on feature selection over the past four decades, a comprehensive analysis covering the various aspects of the feature selection process, metaheuristic techniques and their variants (binary, island, chaotic, fractional chaotic, etc.), as well as their applications, has yet to be reported. This review addresses these issues and offers a comprehensive examination of feature selection. The main contributions of this study can be summarized as follows:

• A comprehensive survey and analysis of the feature selection procedure, including its different approaches, methodologies, and strategies, is thoroughly explained.
• The article presents a comprehensive survey of nature-inspired optimization algorithms for feature selection, and also highlights the importance of fitness evaluation and representation techniques.
• The article highlights the main variants of metaheuristic representation in the treatment of the feature selection problem, including the continuous and binary variants. Furthermore, the literature on other variants, such as island-based, chaotic, and fractional-chaotic approaches used in feature selection with different metaheuristic algorithms, is elaborated upon.

The paper is organized as follows: Section 2 outlines the methodology of the study. Section 3 presents the feature selection process and its associated metrics. Section 4 provides a comprehensive analysis and discussion of the main feature selection approaches, metaheuristic methods, and their variants that have been employed to address the given problem. Section 5 discusses current issues and future directions. Lastly, the paper concludes with a summary in Section 6.

2. Review methodology

The primary objective of this study is to propose a taxonomic framework for synthesizing and organizing the literature on the ongoing research field of metaheuristics in feature selection, with reference to functionality. This approach also highlights potential directions for future research and development in the feature selection problem. The literature review method is a reliable tool for organizing and combining diverse knowledge and inevitably advances research by pointing out potential directions for further investigation. The review procedure for this article is shown in Fig. 1.

Fig. 1. Review methodology.

Fig. 2. Rate of feature selection in different applications.

2.1. Preliminary study

The existing literature has demonstrated that numerous nature-inspired metaheuristic techniques have been effectively utilized to address the challenge of feature selection. Furthermore, several authors have independently published survey/review papers on the use of feature selection and metaheuristics. However, a comprehensive analysis of the various aspects of the feature selection process, metaheuristic approaches, their applications, and the problem's representation and its variations has not been fully examined in a single manuscript. This study covers each of these topics in detail. To identify relevant publications, we searched multiple electronic databases, including Google Scholar, Scopus, Springer Publishing, and Science Direct, using the keywords "feature selection" and "metaheuristics" in the title, abstract, and keywords. The search results were filtered by relevancy. After reading and examining the titles and abstracts of the publications, we selected those that dealt with feature selection problems and the use of metaheuristic approaches, while excluding duplicate search results.

The important study fields and the frequency of feature selection article publishing are displayed in Fig. 2, with image processing and bioinformatics being the most researched fields (see Table 1).

2.2. Screening of papers

To identify and narrow down the list of relevant articles, a search strategy was utilized. Multiple keywords and their synonyms were used to extract pertinent articles for this study. This strategy enabled us to sift through the large number of relevant articles published over the last decade, totaling approximately 2700 nature-inspired metaheuristic methods applied to the feature selection problem. After carefully reviewing the title, abstract, and content of each paper, a total of 180 articles were ultimately selected for this study.

The selection criteria for the articles included the journal or publisher, indexing, number of citations, and impact factor of the journal. Additionally, articles from conferences or book chapters were also considered, with quality and relevance being of utmost importance. Various publishers have published articles on the development of metaheuristic algorithms and feature selection challenges, as shown in Fig. 3. From the figure, it can be observed that Springer publishes more papers in top-tier journals (Neural Computing and Applications (IF = 5.102), Applied Intelligence (IF = 5.019), Memetic Computing (IF = 3.577), Journal of Ambient Intelligence and Humanized Computing (IF = 3.662), Soft Computing (IF = 3.732)). For Elsevier, papers were published in top-tier journals (Expert Systems with Applications (IF = 8.665), Applied Soft Computing (IF = 8.263), Knowledge-Based Systems (IF = 8.139), Neurocomputing (IF = 5.779)).

Table 1
Related surveys contribution (survey paper | application | contribution).
Agrawal et al. 2021 [6] | Feature selection | Metaheuristic algorithms on feature selection: a survey of one decade of research (2009–2019).
Visalakshi et al. 2014 [7] | Feature selection | Review on feature selection techniques and classifiers.
Sharma et al. 2021 [8] | Feature selection | Binary and chaotic metaheuristic techniques.
Nguyen et al. 2020 [9] | Data mining | Swarm intelligence approaches for feature selection.
Jain et al. 2018 [10] | Chronic disease prediction | Review on the utilization of feature selection and classification techniques for the diagnosis and prediction of chronic disease.
Hira et al. 2015 [11] | Microarray data | A review of feature selection and feature extraction methods.
Deng et al. 2019 [12] | Text classification | Survey of popular representation schemes for documents and similarity measures used in text classification; review of the most popular text classifiers; review of feature selection methods.
Ang et al. 2015 [13] | Gene selection | Supervised, unsupervised, and semi-supervised feature selection.
Feizolla et al. 2015 [14] | Mobile malware detection | Review of 100 research works on mobile malware detection published between 2010 and 2014 from the perspective of feature selection.
George et al. 2011 [15] | Cancer classification | Review of feature selection techniques in microarray-data-based cancer classification and the predominant role of SVM for cancer classification.
Dai et al. 2015 [16] | Hyperspectral image processing in food industry applications | Review of the fundamentals of each algorithm, applications in hyperspectral data analysis in the food field, and advantages and disadvantages of these algorithms.
Alsolai et al. 2019 [17] | Software quality prediction | Evaluates 15 papers in 9 journals and 6 conference proceedings from 2007 to 2017 on feature selection techniques for software quality prediction.

Fig. 3. The number of published papers of metaheuristic algorithms for feature selection problem.

The other publishers consist of IOS Press, Massachusetts Institute of Technology (MIT) Press, Citeseer, the ACM digital library, the Hindawi publishing house, and the Multidisciplinary Digital Publishing Institute (MDPI).

2.3. Data extraction

During this stage of the study, data from the selected articles were extracted and organized into a spreadsheet to track various aspects of each publication in order to meet the research aim and objectives. The spreadsheet included information such as the paper number, title, authors, publication year, location, publication type (conference, journal, or book chapter), and research type (new method, modification, hybrid, comparison and analysis, or survey).

2.4. Review reporting

To aid readers in comprehending the gathered data, the study is structured into the following sections. Each section provides a detailed analysis of the feature selection problem, with a focus on a specific research area within the topic, as well as the metaheuristics used to address it.

3. Feature selection procedure

The feature selection process selects a smaller, more informative feature subset from the initial features. The four main steps of a feature selection algorithm are shown in Fig. 4. "Subset Generation" and "Subset Evaluation" are the two most important procedures. "Subset Generation" generates candidate feature subsets using a search strategy. An evaluation function, commonly referred to as a fitness function, evaluates the effectiveness of the candidate subsets in "Subset Evaluation". Based on the findings of "Subset Evaluation", "Subset Generation" is expected to generate more suitable feature subsets. The first step in the FS process is determining the direction of the search. The search strategy is then decided upon, and both the stopping criterion and the appropriate assessment criteria for the newly produced subsets are carefully chosen. The findings must be verified before they can be considered final. The stages of the feature selection process are described in detail below and are depicted in Fig. 5.

Fig. 4. Feature selection process.

Fig. 5. Detail of feature selection process.
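To make the interplay between "Subset Generation", "Subset Evaluation", the stopping criterion, and validation concrete, the following minimal Python sketch (illustrative only, not code from the surveyed works; the helper names random_subset and mutate and the toy scoring function are assumptions) shows how the steps typically fit together in a simple randomized search. Any of the evaluation criteria discussed in Section 3.3 could be plugged in as the evaluate argument.

import random

def random_subset(n_features, p=0.5):
    # Subset generation: a candidate subset encoded as a binary mask.
    return [1 if random.random() < p else 0 for _ in range(n_features)]

def mutate(mask):
    # Generate a neighbouring subset by flipping one random bit.
    child = mask[:]
    i = random.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def feature_selection(evaluate, n_features, max_iter=100):
    # evaluate(mask) -> fitness to maximize ("Subset Evaluation").
    best = random_subset(n_features)
    best_fit = evaluate(best)
    for _ in range(max_iter):          # stopping criterion: iteration budget
        cand = mutate(best)            # "Subset Generation"
        fit = evaluate(cand)           # "Subset Evaluation"
        if fit > best_fit:             # keep the better subset
            best, best_fit = cand, fit
    return best, best_fit

# Toy usage with a made-up evaluation function that rewards selecting the
# first three features and penalizes subset size.
if __name__ == "__main__":
    target = [1, 1, 1, 0, 0, 0, 0, 0]
    score = lambda m: sum(a == b for a, b in zip(m, target)) - 0.1 * sum(m)
    print(feature_selection(score, n_features=8))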

3.1. Search direction

Determining the search direction and the starting point is the main task of the feature selection process. The search directions can be broadly categorized into three groups, along with other techniques like recursive search and naive search: forward search, backward search, and random search [18]. Forward searching is the practice of iteratively adding new features to a set that is empty at the start of the search. The backward elimination search, on the other hand, starts with the full set of features and gradually eliminates features until it reaches the necessary subset. The third option builds the feature subset by continually adding and removing features using a random searching technique.

3.2. Search strategy

The three search strategies are sequential, exponential, and random. The drawback of exponential search is that 2^N unique feature subset combinations must be considered for N features; this NP-hard problem requires an exhaustive search strategy [19]. In a sequential search, features are sequentially added to an empty set or deleted from the full set. These procedures are known as SFS (Sequential Forward Selection) and SBS (Sequential Backward Selection). These methods have the flaw of not reconsidering dropped features in subsequent iterations. To address this problem, researchers have created randomized search algorithms. A sketch of the sequential forward strategy is given below.
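As a concrete illustration of the sequential strategy, the following hedged sketch of SFS greedily grows the subset one feature at a time; it is not a reference implementation, and the evaluate argument stands for any subset-scoring criterion (for example, cross-validated accuracy).

def sequential_forward_selection(n_features, evaluate, k):
    # Start from an empty set and greedily add the feature that improves
    # the evaluation criterion the most (SFS); stop after k features.
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        scores = [(evaluate(selected + [f]), f) for f in remaining]
        best_score, best_f = max(scores)
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Toy usage: the criterion rewards features 0 and 2 of a 5-feature set.
useful = {0, 2}
crit = lambda subset: len(useful & set(subset)) - 0.01 * len(subset)
print(sequential_forward_selection(5, crit, k=2))  # selects features 0 and 2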
3.3. Evaluation criteria

Wrapper approaches, embedded techniques, and filter approaches are the three groups into which feature selection approaches fall under the evaluation criteria [20]. A learning method is used in wrapper approaches to assess the value of the chosen feature subsets, most frequently through classification accuracy. Wrapper techniques in particular build several candidate feature subsets through repetitive generation in accordance with predetermined strategies, and then employ a classification algorithm to assess the corresponding classification accuracy. Although the features are chosen during classifier training, embedded approaches always use a classification algorithm [21]. In contrast, filtering approaches analyze potential subsets without using a classification algorithm; the evaluation is mainly based on the inherent properties of a dataset. The filter approach is the most traditional and is considered the simplest of the three since it requires no learning [22]. On the other hand, the wrapper and embedded techniques generally produce higher classification results because they consider the interaction between the selected features and the classification algorithm [23]. In embedded methods, the classifier is trained, the feature coefficients are set simultaneously by minimizing the fitting errors, and the chosen feature subset can be derived from the feature coefficients. Embedded approaches are generally less computationally demanding than wrapper approaches, but they only apply to specific classification algorithms [24].

3.3.1. Filter approaches

One of the earliest techniques for feature selection is the filter approach, which is based on the inherent characteristics of the data. The main criterion for variable selection by ordering in filter methods uses variable ranking approaches. Ranking techniques are utilized because they are straightforward and have a proven track record in practical applications. The variables are scored using an appropriate ranking criterion, and variables below the threshold are eliminated. The filter technique is depicted in Fig. 6. These methods have the drawback of ignoring the relationship with the classifier and the dependence of one feature on another, which may fail to choose the most relevant features. Filter methods analyze feature subsets by using metrics from several disciplines. Distance, correlation, consistency, and information measures are the four most popular filter measures in feature selection [25].

Fig. 6. The process of filter method.

The objective of the distance measurements is to choose a feature subset that can most effectively distinguish occurrences from the various classes; Relief is a well-known example of a distance-based approach [26]. Consistency measures reveal how well the chosen features distinguish the different classes compared to using all the original features [27]. If two instances belong to distinct classes but have the same feature values, they are inconsistent; the goal is to find the smallest subset of features with a tolerable inconsistency rate. Correlation measures are used to choose a subset of features closely related to the class label, maximizing relevance, and containing mutually uncorrelated features, minimizing redundancy, by determining the dependence of two random variables on each other [28]. Information measures can be used to determine the significance and consistency of a feature subset, much like correlation measures [29]. Because they can identify non-linear correlations between random variables, information measures generally draw the most attention among the four. ANOVA, Chi-square, the Wilcoxon-Mann-Whitney test, Pearson's correlation, Linear Discriminant Analysis, and Mutual Information are some of these techniques [30]. All these statistical techniques require the interaction and feature variables in the dataset. Since the data distribution is unknown, various strategies can be employed to assess different subsets with a selected classifier. Additional statistical tests identified in the literature can also be applied for feature ranking. For the text classification problem, the authors in [31] take into account twelve feature selection metrics [31-33]. Each metric ranks all the attributes, and a threshold is set to choose 100 words to input into the predictor. Several filter techniques are used in [34-36] for different applications. The authors of [37] create a ranking criterion for binary data based on class densities. Another filter-based method that uses a feature relevance criterion to rank the features is the RELIEF algorithm [36]; a subset of features is chosen using a threshold, and the limitation of the RELIEF algorithm lies in choosing that threshold. In [34], multitask learning (MTL) is performed via the discarded variables. The authors in [38] employ Gram-Schmidt orthogonalization to rank the features using a random variable dubbed a probe. Feature ranking has the benefits of being computationally light and avoiding overfitting. However, the chosen subset might not be the best option, which is one of the limitations: important features that provide more information when paired with others than they do on their own could be dropped. There is no perfect way to select the feature space dimension.
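As an illustration of ranking-based filtering, the sketch below scores each feature by its absolute Pearson correlation with the class label and keeps those above a threshold. This is only one possible choice of measure (any of the distance, consistency, or information measures above could be substituted), and the data in the usage example are synthetic.

import numpy as np

def filter_rank(X, y, threshold=0.1):
    # Score each feature by the absolute Pearson correlation with the class
    # label, then keep the features whose score passes the threshold.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    ranking = np.argsort(scores)[::-1]           # best features first
    selected = [int(j) for j in ranking if scores[j] >= threshold]
    return scores, selected

# Toy usage: feature 0 tracks the label, feature 1 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
X = np.column_stack([y + 0.1 * rng.normal(size=100), rng.normal(size=100)])
scores, selected = filter_rank(X, y)
print(scores.round(2), selected)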
3.3.2. Wrapper approaches

Wrapper approaches add a learning algorithm to assess the selection's effectiveness or, in most cases, the classification accuracy. To be more specific, wrapper approaches construct various subsets of candidate features iteratively, following some strategies, and use the classification algorithm to determine the corresponding classification accuracy, as shown in Fig. 7. A subset of features is typically preserved until a new subset with a greater accuracy replaces it. The classification performance of feature subsets is assessed in a wrapper feature selection process. The classification algorithms most often applied to feature selection in a wrapper method are K-nearest neighbor (KNN), Naive Bayes (NB), support vector machines (SVM), and artificial neural networks (ANN). Due to the difficulty of evaluating 2^N subsets, satisfactory subsets are discovered using search algorithms that heuristically locate a subset. To identify a subset of variables that optimizes the objective function or classification performance, a variety of search strategies can be applied. For larger datasets, exhaustive search techniques can become computationally expensive. Therefore, simpler algorithms that generate locally optimal outcomes, such as sequential search, evolutionary, and nature-inspired algorithms, are used. These algorithms can produce solid results and are computationally viable. The wrapper approaches are broadly divided into randomized/heuristic search algorithms and sequential/deterministic selection algorithms. The sequential selection methods begin with an empty set (or the complete set), then add or subtract features until the maximum of the objective function is reached. To expedite the selection, a criterion is chosen that gradually raises the objective function until the maximum is achieved with the fewest features. Heuristic search algorithms consider several subsets to optimize the objective function; the various subsets are produced by creating solutions to the optimization problem or by exploring a neighborhood in the search space. The feature subset selection in the wrapper technique is carried out as a "black box", meaning that the underlying process is opaque [39]. Inductive algorithms are used to choose feature subsets. The training model accuracy is computed for a given candidate feature subset, and the procedure decides whether to add or remove a feature from the selected subset depending on the accuracy determined in the previous phase. Wrapper approaches are, therefore, computationally more demanding. Evolutionary approaches typically lead the search in wrapper techniques. A subset of the features is generally encoded in the population of solutions at the outset; after that, a learner method is used to compute the fitness of each subset. In most cases, the feature subset selection is improved through an iterative procedure to produce the best results. Since wrapper strategies permit interactions between the solutions and the predictors, they typically perform better than filter techniques. The authors in [40] used the Cuckoo Search (CS) algorithm for the facial recognition problem. First, the discrete cosine transformation, which served as the host egg in the CS method, was used to extract the features. It demonstrated its effectiveness in facial recognition by locating the most similar image.

Fig. 7. The process of wrapper method.

To find the best feature subset, a new function that converts the continuous variables to their binary form was used in the binary CS method (BCS) introduced by [41]. For the classification task of selecting wrapper-based feature subsets, the authors in [42] developed Ant Lion Optimization (ALO) algorithms. The ALO, they stated, uses a single operator to strike a balance between the exploration and exploitation phases. The algorithms used transfer and binary functions. They contrasted the outcomes of the proposed ALO algorithms with those of three previous optimization algorithms (PSO, GA, and binary BA). All these approaches were tested on over 20 datasets taken from the UCI repository. Results demonstrated that, regardless of the initial population generation strategies or other algorithmic operators employed, the suggested ALO algorithms are efficient in the search for the optimal feature subset. A binary wrapper Bat Algorithm (BA) for feature selection was presented by [43]; a novel methodology was created to estimate the quality of smaller feature sets, and experiments on public datasets revealed that the method increases classification accuracy. A binary version of the Dragonfly Algorithm (DA) for feature selection was proposed by [44]. The proposed DA was tested over five hundred times on UCI datasets. Regarding classification accuracy and the number of chosen attributes, the outcomes were contrasted with those of PSO and GA. The results demonstrated how effective the binary DA is at selecting features.
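The wrapper evaluation described above can be sketched as follows, here using scikit-learn's KNN classifier and cross-validated accuracy as the learning algorithm. This is an illustrative choice, not the setup of the cited works, and the breast-cancer dataset and the arbitrary candidate mask are only there to make the example runnable.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_fitness(mask, X, y, k=5):
    # Wrapper evaluation: train/validate a KNN classifier on the selected
    # columns only and return the mean cross-validated accuracy.
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, cols], y, cv=k).mean()

X, y = load_breast_cancer(return_X_y=True)
mask = np.zeros(X.shape[1], dtype=int)
mask[:10] = 1                      # an arbitrary candidate subset
print(round(wrapper_fitness(mask, X, y), 3))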
3.3.3. Embedded approaches

Wrapper techniques and embedded methods both look for an optimal subset for a particular learning algorithm. Still, embedded methods are distinguished by a deeper interaction between feature selection and classifier design, as illustrated in Fig. 8. More particularly, the feature selection is included throughout the training phase. Although embedded approaches often require lower computing costs than wrapper approaches, they are only useful for a limited number of classification algorithms. Even though they typically outperform wrappers computationally, they make classifier-dependent choices that might not be compatible with other classifiers. Since constructing the classifier results in creating an optimal collection of features, the selection is influenced by the assumptions of the classifier. The embedded technique aims to reduce the computational time needed for reclassifying various feature subsets [45]. The main goal of embedded approaches is to incorporate feature selection into the learning process. An embedded feature selection method using a Weighted Gini Index (WGI) and a decision tree classifier was proposed in [46] to assess the feature subsets. One of the most well-known embedded techniques, Support Vector Machine based on Recursive Feature Elimination (SVM-RFE), was developed by the authors in [47]. It was explicitly created to choose genes for the classification of cancer. In this method, feature selection is carried out by repeatedly training the SVM classifier on the given set of features and removing the less important features as determined by the classifier. The comparative advantages and limitations of the FS methods are shown in Table 2.

3.4. Stopping criteria

The stopping criterion helps determine when the FS process should stop and evaluate the value of the chosen features. An adequate stopping criterion keeps the computation needed to find the best feature subset small and also mitigates the over-fitting issue. The decisions taken in the earlier steps impact the choice of the stopping criterion. Common stopping criteria include a predefined number of features or iterations, or can be based on the evaluation function.

3.5. Results validation

Feature set validation approaches are used to validate the results. Some validation strategies include cross-validation, the confusion matrix, the Jaccard similarity-based metric, and the Rand Index. The most used validation technique is cross-validation (CV) [48]. The CV's major benefit is that it provides an unbiased error estimate. The literature uses a variety of CV techniques. The 2-fold cross-validation approach, which randomly divides the data into training and test sets, is one of the more straightforward ones [49]. K-fold cross-validation extends 2-fold cross-validation and involves the random division of the data into k sections.

Fig. 8. The process of embedded method.

Table 2
Comparison of FS methods.

Filter method
Advantages: efficient and rapid in computation; independent of the learning algorithm; appropriate for low-dimensional data.
Limitations: the correlation with the classifier is not taken into account; the correlation between the attributes is not taken into account; may fail to notice patterns that only emerge during the learning phase.

Wrapper method
Advantages: considers the relationship between the class labels and the features; examines the dependencies between the features; more precise than the filter approach.
Limitations: more complex computation; evaluates the chosen feature subset iteratively; features discarded at the beginning might not be considered again for evaluation.

Embedded method
Advantages: efficient in computation; more accurate than the filter and wrapper methods.
Limitations: computationally more expensive than the filter approach; unsuitable for data with high dimensions; weak generality.

Then k - 1 subsets are chosen for training, and the final subset is used for testing; this process is repeated until all of the subsets have been tested. Leave-one-out cross-validation (LOOCV), a variation of k-fold cross-validation in which each sample is used once for testing and the remaining samples are used for training, sets k equal to the number of samples. This procedure is continued until each sample has been examined.
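A minimal sketch of k-fold partitioning, with LOOCV recovered as the special case where k equals the number of samples, is given below; the score_fn placeholder stands for training on the k-1 folds and scoring on the held-out fold, and the names used are illustrative only.

import numpy as np

def kfold_indices(n_samples, k, seed=0):
    # Randomly divide the sample indices into k roughly equal folds; each
    # fold is used once for testing while the other k-1 folds are for training.
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

def cross_validate(score_fn, n_samples, k):
    folds = kfold_indices(n_samples, k)
    accs = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        accs.append(score_fn(train, test))
    return float(np.mean(accs))

# LOOCV is the special case k = n_samples (every fold holds one sample).
dummy = lambda train, test: 1.0   # placeholder for "train on train, score on test"
print(cross_validate(dummy, n_samples=10, k=10))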
4. Metaheuristic optimization for feature selection

The feature selection problem can be characterized as a combinatorial optimization problem that selects a subset of features for which a feature-subset evaluation criterion is optimal. The issue of choosing the subset of features with the best classificatory performance can be expressed as follows: assume that N represents the initial set of features with cardinality m. Let y be the intended number of features in the chosen subset, and let S be the subset with S ⊆ N. Let F(S) denote the feature selection criterion function for the set S. Without loss of generality, let a larger value of F denote a better feature subset. The feature selection problem is then formally defined as finding a subset S ⊆ N such that |S| = y and F(S) = max_{X ⊆ N, |X| = y} F(X). In this instance, the following computation is the criterion function for evaluating the "quality" of a feature subset: let I be a collection of cases, and consider the class to which each case belongs (only two classes are taken into consideration). The following partition is made in I: I = I1 ∪ I2, where the number of cases in I1 and I2 is nearly equal, as are the percentages of cases in each class. After computing the Euclidean distance between each instance in I2 and every case in I1, each instance is assigned the class of its closest case. The value of F(S) is the percentage of matches in the designated classes, in other words, how frequently the allocated class matches the actual class. Examining each potential y-subset of the feature set would be necessary for an exhaustive solution to this problem. However, an exhaustive search is unfeasible for even low values of m due to the exponential growth of the possibilities. It has been demonstrated that the optimal feature selection problem is NP-hard, making the application of heuristic and metaheuristic techniques more adequate than sequential search methods. The publication trend of metaheuristic techniques is analyzed and presented in Fig. 9.
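The criterion F(S) described above can be sketched as follows, assuming numeric arrays X1, y1 (the cases of I1 and their classes) and X2, y2 (the cases of I2): each case in I2 receives the class of its Euclidean nearest neighbour in I1, restricted to the columns selected by S. The code and the synthetic data in the usage example are illustrative only.

import numpy as np

def criterion_F(S, X1, y1, X2, y2):
    # Restrict both partitions to the feature subset S, assign each case in I2
    # the class of its nearest case in I1 (Euclidean distance), and return the
    # fraction of correctly matched classes.
    A, B = X1[:, S], X2[:, S]
    d = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=2)  # |I2| x |I1| distances
    nearest = d.argmin(axis=1)
    return float(np.mean(y1[nearest] == y2))

# Toy usage with two informative features out of three.
rng = np.random.default_rng(1)
y1 = rng.integers(0, 2, 30); y2 = rng.integers(0, 2, 30)
X1 = np.column_stack([y1 + 0.1 * rng.normal(size=30), y1, rng.normal(size=30)])
X2 = np.column_stack([y2 + 0.1 * rng.normal(size=30), y2, rng.normal(size=30)])
print(criterion_F([0, 1], X1, y1, X2, y2))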

Fig. 9. Rate of metaheuristic based articles indexed in Google Scholar.

4.1. Evaluation function

Using optimization methods to address complex optimization problems is challenging. On the other side, no single algorithm can handle every optimization issue: an approach may perform well on a certain problem set but cannot identify the best answers in many other situations. The broad adoption of metaheuristic algorithms helps solve such issues. Metaheuristic algorithms produce good results due to their simplicity, low number of parameters, high degree of problem-type independence, and lack of complexity. Finding a value for the variables in the problem makes it possible to optimize the objective function, and the best subset is provided by the outcome determined by the objective function. The evaluation function therefore reflects the performance of the predictor. Predictive accuracy is frequently employed as the key metric by researchers and practitioners in wrapper approaches for the feature selection problem, since the main objective of classification is to maximize predictive accuracy. The fitness functions allow us to weigh the various goals. Some attempt to do so by directly counting the number of features or calculating the ratio of the cardinality of the subset to the cardinality of the entire set of features. Most fitness functions for wrapper models aim to reduce the utilized classifier's error rate. The error rate can be calculated globally on the testing set or via cross-validation accuracy. Thus the fitness function can be mathematically formulated in two forms, based on the maximum classification accuracy Eq. (1) or the minimum error rate Eq. (2), as follows:

↑ fitness = ACC + γ (1 − |R| / |N|),    (1)

where ACC is the classification accuracy, γ refers to the weight factor, |R| is the length of the selected feature subset, and |N| is the total number of features.

↓ fitness = α γ_R(D) + β |R| / |C|,    (2)

where γ_R(D) is the classification error rate of the classifier R relative to decision D, |R| is the length of the selected feature subset, |C| is the total number of features, and α ∈ [0, 1], β = (1 − α) are two parameters corresponding to the importance of classification quality and subset length. Researchers also employ domain-specific fitness functions for certain purposes, although maintaining the fewest features necessary to produce the most accurate classification model remains the primary objective.
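Eqs. (1) and (2) translate directly into code; the sketch below is illustrative, with the weight values gamma and alpha chosen arbitrarily and the accuracy or error rate assumed to be supplied by the wrapped classifier.

def fitness_max(acc, n_selected, n_total, gamma=0.01):
    # Eq. (1): reward accuracy and, through gamma, smaller subsets (maximize).
    return acc + gamma * (1 - n_selected / n_total)

def fitness_min(error_rate, n_selected, n_total, alpha=0.99):
    # Eq. (2): weighted sum of the error rate and the relative subset size,
    # with beta = 1 - alpha (minimize).
    beta = 1 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)

# Example: 95% accuracy (5% error) using 12 of 60 features.
print(fitness_max(0.95, 12, 60), fitness_min(0.05, 12, 60))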
4.2. Problem representation

The feature selection problem belongs to the binary-based class of combinatorial optimization problems (BOPs) [50]. Each solution to the feature selection problem is represented as a vector of ones and zeros, whose length is the number of features in the dataset. The feature has been selected if the vector value is 1, while a value of 0 indicates that it has not, as depicted in Fig. 10. The search space of potential (feasible or unfeasible) solutions can be explored using metaheuristics. Therefore, a representation of these solutions is essential to the design of a metaheuristic. This is a significant factor because the representation largely determines the efficiency of the approach. In fact, a variety of representations could be utilized for a particular problem, and the metaheuristic can handle each representation differently by using optimization methods such as operators or evaluation functions, which may be more or less effective for the situation at hand. The chosen representation should satisfy several attributes. First, any search space solution must be representable. One candidate solution should theoretically correspond to one representation, although this is not always practically possible. For a metaheuristic to employ the representation, it must adhere to the connexity property, which guarantees that a search path exists between every two solutions, especially toward the best solutions.

Numerous representations have been proposed for data mining challenges. The most common form of representation found in the literature is a string of N bits, where N is the total number of original attributes and each bit represents whether or not the attribute is chosen by taking the value 1 or 0. As binary representation is highly conventional for metaheuristics, this individual representation is straightforward, and operators may be applied with ease. Due to this apparent simplicity in operator implementation, the binary representation has been utilized frequently. However, since continuous spaces may be readily constructed in a real domain, many well-known metaheuristics were initially designed for them. These metaheuristics can now operate in binary spaces thanks to the binary versions that researchers have been designing. A binary version of a continuous metaheuristic algorithm can be created using a variety of techniques while still adhering to the leading concepts behind the search process. Transfer Functions (TF) are among the most widely used and simple binarization techniques [51]. One of the simplest ways to create a binary algorithm from a continuous one is to employ a metaheuristic algorithm with a transfer function. Over the past few decades, numerous studies on transfer functions have been conducted,

and a number of transfer functions, including S-shaped, V-shaped, U-shaped, Z-shaped, linear, X-shaped, and other types, have been proposed [52-56]. Some of these well-known transfer functions are presented in Table 3, and the curves of the S-shape, V-shape and U-shape families are given in Fig. 11. Transfer functions are responsible for mapping the continuous search space, with solutions in R^n, to a binary search space, with solutions in [0, 1]^n. A transfer function applies a set of criteria to the conversion of a continuous value to zero or one, explained as follows. Let X = (x_1, ..., x_n) be a feasible solution to the problem; the transfer function is applied to each dimension, I_j = TF(x_j), obtaining an intermediate solution I = (I_1, I_2, ..., I_n), where I ∈ [0, 1]^n, and the probability of changing positions is defined by the transfer function. The solution I is then transformed into a binary solution I_b = (I_b1, ..., I_bn) by applying a binarization rule. The most common rule is represented as follows:

x_j^new = 0 if rand < TF(x_j), and x_j^new = 1 if rand ≥ TF(x_j);

independent of the previous value, the operator returns 1 if the condition is met and 0 otherwise.

Fig. 10. The binary representation of the feature selection problem.

Table 3
Mathematical formulas of different transfer functions.
S-shape TF: S1 = 1/(1 + e^(−2x)); S2 = 1/(1 + e^(−x)); S3 = 1/(1 + e^(−x/2)); S4 = 1/(1 + e^(−x/3)).
V-shape TF: V1 = |erf((√π/2)·x)|; V2 = |tanh(x)|; V3 = |x/√(1 + x^2)|; V4 = |(2/π)·arctan((π/2)·x)|.
U-shape TF: U1 = |x|^1.5; U2 = |x|^2; U3 = |x|^3; U4 = |x|^4.
X-shape TF: X1 = 1/(1 + e^x); X2 = 1/(1 + e^(−x)).
Linear TF: L = (x − Rmin)/(Rmax − Rmin).
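As an illustration, the sketch below implements the S2 and V2 transfer functions from Table 3 and the binarization rule quoted above. Note that the threshold direction follows the rule exactly as written here, while some papers state it the other way around; the example values are arbitrary.

import math, random

def s2(x):
    # S2 transfer function from Table 3: 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + math.exp(-x))

def v2(x):
    # V2 transfer function from Table 3: |tanh(x)|.
    return abs(math.tanh(x))

def binarize(x_continuous, tf=s2):
    # Map each continuous component to {0, 1} with the rule quoted above:
    # 0 if rand < TF(x), 1 otherwise.
    return [0 if random.random() < tf(x) else 1 for x in x_continuous]

random.seed(0)
print(binarize([-2.0, -0.1, 0.1, 2.0]))
print(binarize([-2.0, -0.1, 0.1, 2.0], tf=v2))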
A class of new transfer functions, called the Taper-shaped transfer functions, is proposed by [57] using power functions on a symmetric interval. Taper-shaped TFs have an attractive property: unlike the existing transfer functions, the Taper-shaped TF has a unified calculation formula, T(x) = (|x|/A)^(1/n), x ∈ [−A, A], n ≥ 1. Thanks to this unified formula and its calculation simplicity, a high gain in the computational requirements and in the algorithm robustness is achieved. The calculation formulas of the four Taper-shaped TFs are given in Table 4, and their curves on [−6, 6] are shown in Fig. 12.

Table 4
Mathematical formulas of the Taper-shaped transfer functions.
T1: TF(x) = (|x|/|A|)^(1/2)
T2: TF(x) = |x|/|A|
T3: TF(x) = (|x|/|A|)^(1/3)
T4: TF(x) = (|x|/|A|)^(1/4)

Fig. 12. The curve of Taper shape transfer function.

Another recent transfer function is also proposed by [58], called the adaptive linear-based transfer function (UTF), which relies on two linear functions that change their form over time. UTF has a simple structure and only requires two linear functions and three parameters to modify its shape over time, as shown in Fig. 13. The two linear functions are formulated as follows:

L1(v_i^d(t + 1)) = −G1(t) · v_i^d(t) / (Vmax − G2(t)) + G3(t)
L2(v_i^d(t + 1)) = G1(t) · v_i^d(t) / (Vmax − G2(t)) + G3(t)    (3)

where Vmax is a constant, G1 and G3 define the rate of change of the linear functions and the angle between them, and G2 adjusts the intersection location between the two linear functions during the iterations.

The next positions x_1i^d(t + 1) and x_2i^d(t + 1) in the binary transformation are calculated as follows:

x_1i^d(t + 1) = ¬x_1i^d(t) if rand < L1(v_i^d(t + 1)), and x_1i^d(t + 1) = x_1i^d(t) if rand ≥ L1(v_i^d(t + 1))    (4)
x_2i^d(t + 1) = ¬x_2i^d(t) if rand < L2(v_i^d(t + 1)), and x_2i^d(t + 1) = x_2i^d(t) if rand ≥ L2(v_i^d(t + 1))    (5)

Eqs. (4) and (5) generate two different solutions, and the better one is selected as the next position based on a comparison with the current one. This mathematical formulation avoids trapping in local optima and expands the exploration and exploitation of the space. Also, during the transformation from the continuous search space to the binary one, the UTF method adapts itself while the algorithm runs and switches between the metaheuristic phases of exploration and exploitation.
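A rough sketch of the UTF update of Eqs. (3)-(5) is given below. The schedules G1, G2, and G3 are treated as given functions of the iteration counter, and the constants used here are purely illustrative rather than the settings of [58]; in a full algorithm the better of the two returned positions would be kept after a fitness comparison.

import random

def utf_update(x_bit, v, t, Vmax=6.0,
               G1=lambda t: 1.0, G2=lambda t: 0.0, G3=lambda t: 0.5):
    # Two linear transfer functions, Eq. (3).
    L1 = -G1(t) * v / (Vmax - G2(t)) + G3(t)
    L2 = G1(t) * v / (Vmax - G2(t)) + G3(t)
    # Eqs. (4) and (5): two tentative next positions obtained by flipping
    # (or keeping) the current bit according to L1 and L2.
    x1 = 1 - x_bit if random.random() < L1 else x_bit
    x2 = 1 - x_bit if random.random() < L2 else x_bit
    return x1, x2

random.seed(0)
print(utf_update(x_bit=1, v=2.5, t=3))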
The discretization method by transfer functions is broadly applicable, and some basic concepts should be considered when selecting a transfer function to map velocity values to probability values: the range of a transfer function should be bounded in the interval [0, 1], as its values represent the probability that an agent should change its position; a transfer function should provide a high probability of changing the position for a large absolute value of the velocity, since agents with large absolute velocities are probably far from the best solution and should switch positions in the next iteration; and a transfer function should present a small probability of changing the position for a small absolute value of the velocity.

4.3. Binary metaheuristics for feature selection

The metaheuristic optimization approaches use a population of candidate solutions. The solutions are usually represented as a vector of values. For metaheuristic feature selection algorithms, the representation of a solution is generally a binary encoding of a selected set of features.

Fig. 11. The curves of S-shaped,V-shaped and U-shaped transfer functions.

In this representation, 2^n feature subsets can be generated, where n is the number of features. This is theoretically the same as the formal definition of the number of feature subsets; therefore, it is possible to find every subset of features with this representation, which is why it is the most common way of representing the information of individuals for the feature selection problem. Next, an overview of different binary metaheuristics applied to the feature selection problem, categorized according to their source of inspiration, is presented; the rates are illustrated in Fig. 14.
4.3.1. Binary evolutionary based metaheuristics haviors covers particular actions, including how humans interact
Evolutionary Algorithms (EA) are included in population meta- with and adapt to their surroundings and how they act while they
heuristics, which are algorithms that learn by interactions be- are feeling distinct emotions. It is also possible to mimic social be-
tween several candidate solutions. When it comes to EAs, in haviors like the chaos that characterizes civilization, how human
addition to applying perturbations to existing solutions, they beings cooperate, or even imperialism. Imperialist Competitive
also combine previously existing solutions to produce new ones. Algorithm (ICA), Teaching-Learning-Based Optimization (TLBO),
Memetic Algorithm (MA), Genetic Algorithm (GA) and Differential Gaining–Sharing Knowledge-based optimization algorithm (GSK),
Evolution Algorithm (DEA) are the most often used algorithms in which is based on the concept of gaining and sharing knowledge
this area. Table 5 lists some of the main EAs recently used to solve of humans throughout their lifespan, and the recently released
the feature selection problem. Political Optimizer (PO) are the most often used algorithms in this
area. Plant growth, plant dissemination, root and plant expansion,
4.3.2. Binary trajectory based metaheuristics and other processes are the sources of inspiration for plant-based
Trajectory-based methods focus on a solution. In this way, a approaches. This category generally includes any algorithm that
trajectory is used to search the problem space. In most cases, the simulates a plant in some way. Flower Pollination Algorithm
nature of the problem dictates how these approaches should be (FPA), Invasive Weed Optimization (IWO), and Tree Growth Algo-
used. Metaheuristics that are trajectory-based improve a single rithm (TGA) are examples of algorithms in this category. Table 8
solution. They are performed using repetitive processes that allow lists some nature inspired algorithms for feature selection.

Fig. 13. The curve of Upgrade Transfer Function (UTF).

Fig. 14. Comparison of metaheuristic algorithms based on their sources of inspiration.

Table 5
Binary evolutionary based metaheuristics.
Abbreviation Metaheuristic Application
EBGA [59] Genetic Algorithm FS for Student performance prediction
BDE [60] Differential Evolution FS for Molecular signatures
MBEGA [61] Memetic Algorithm FS for Microarray data
BGSK [62] Gaining–Sharing Knowledge Algorithm Feature selection
β-FSTC [63] β-Hill Climbing Algorithm FS for Text clustering

Table 6
Binary trajectory based metaheuristics.
Abbreviation Metaheuristic Application
SA [64] Simulated Annealing Algorithm FS for Marketing
SVM-SA [65] Simulated Annealing Algorithm FS for Hepatitis disease diagnosis
BTS [66] Tabu Search Algorithm FS for Model selection
FSRT [67] Tabu Search Algorithm FS for Credit scoring
ILS [68] Iterated Local Search FS for Timetabling
APRO [69] Iterated Local Search FS for Rich portfolio
MA-SingleDocSum [70] Guided Local Search FS for Extractive single-document summary

4.3.5. Binary mathematical based metaheuristics
Mathematically based metaheuristics are inspired by algebra, geometry, analysis and modern mathematics. In these algorithms, the population of candidate solutions is created and is required to move outwards or towards the best solution using a mathematical model. The Sine–Cosine Algorithm (SCA) has been proposed based on the sine and cosine functions. Other examples in this category include Arithmetic Optimization (AOA) and Chaos Game Optimization (CGO). Some of the major nature-inspired algorithms and their application to the feature selection problem are in Table 9.
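To illustrate how a mathematics-inspired update rule is binarized in practice, the following sketch performs one Sine–Cosine Algorithm step followed by an S-shaped binarization; it is a generic sketch under assumed parameter choices, not the exact procedure of the binary SCA variants cited in Table 9.

import numpy as np

rng = np.random.default_rng(1)

def sca_step(X, best, t, t_max, a=2.0):
    # Sine-Cosine update: r1 shrinks over time to move from exploration
    # toward exploitation around the best solution found so far.
    r1 = a - a * t / t_max
    r2 = 2 * np.pi * rng.random(X.shape)
    r3 = 2 * rng.random(X.shape)
    r4 = rng.random(X.shape)
    step = np.where(r4 < 0.5,
                    r1 * np.sin(r2) * np.abs(r3 * best - X),
                    r1 * np.cos(r2) * np.abs(r3 * best - X))
    X = X + step
    prob = 1.0 / (1.0 + np.exp(-X))          # sigmoid transfer function
    mask = (rng.random(X.shape) < prob).astype(int)
    return mask, X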

Table 7
Binary swarm based metaheuristics.
Abbreviation Metaheuristic Application
PSO [71] Particle Swarm Optimization FS for Gene selection
MBPSO [72] Particle Swarm Optimization FS for Spam detection
ABC [73] Artificial Bee Colony Feature selection
BDA [74] Dragonfly Algorithm Feature selection
BACO [75] Ant Colony Optimization FS for Fault diagnosis of rotary kiln
ChOA [76] Chimp Optimization FS for Biomedical data classification
bSSA [77] Salp Swarm Optimization Feature selection
BWOA [78] Whale Optimization Feature selection
BEPO [79] Emperor Penguin Feature selection
TSA [80] Tunicate Swarm Algorithm FS for Partitional clustering
BOA [81] Butterfly Algorithm Feature selection
BBA [82] Bat Algorithm Feature selection
BFA [83] Firefly Algorithm FS for Cancer diagnosis
EBHHO [84] Harris Hawks Algorithm FS for Software fault prediction
BCS [41] Cuckoo Search Feature selection
B-MFO [85] Moth Flame Optimization FS for biomedical datasets

Table 8
Binary nature based metaheuristics.
Abbreviation Metaheuristic Application
Physics/Chemistry:
BEO [86] Equilibrium Optimizer Feature selection
EHHM [87] Electrical Harmony Algorithm Feature selection
BBHA [88] Black Hole Algorithm FS for biological data
GSA [89] Gravitational Search Algorithm FS for Biomedical data
BASO [90] Atom Search Optimization Feature selection
BCRO [91] Chemical Reaction Algorithm Feature selection
Human:
BPO [92] Political Optimizer FS for Gene selection
FS-BTLBO [93] Teaching Learning Based Optimization Feature selection
BICA [94] Imperialist Competitive Algorithm FS for Content-Based Image Retrieval
PbGSK [95] Gaining–Sharing Knowledge-based optimization algorithm Feature selection
Plant:
BFPA [96] Flower Pollination Algorithm Feature selection
GIWO [97] Invasive Weed Optimization FS for Cancer prediction
BCFA [98] Clonal Flower Pollination Algorithm Feature selection
MBTGA [99] Tree Growth Algorithm FS for Myoelectric signals

Table 9
Binary mathematical based metaheuristics.
Abbreviation Metaheuristic Application
bSCA [100] Sine Cosine Algorithm FS for Medical data
BPSCOA [101] Sine Cosine Algorithm FS for Knapsack problem
CGO [102] Chaos Game Optimization FS for Internet of medical things
BAOA [103] Arithmetic Optimization Algorithm FS for Osteosarcoma detection

4.4. Hybrids and variants

Several variants of different nature-inspired optimization algorithms have been designed and employed to solve feature selection problems. These variants include hybridization, chaos theory, fractional order, island models, etc. This section gives brief information about these approaches applied to the feature selection problem.

4.4.1. Feature selection using chaos theory metaheuristic algorithms
Chaos theory, defined as the simulation of the dynamic behavior of nonlinear systems, presents one of the most successful applied methods for enhancing the properties of optimization algorithms. The chaotic optimization algorithm (COA), one of the chaos theory applications, uses the nature of chaos sequences and replaces the random variables with chaotic variables denoted as chaotic maps [104]. It has been combined and integrated with various nature-inspired metaheuristic algorithms. The basic characteristic of metaheuristics, which all contain random attributes, is stochasticity, which is achieved utilizing values from statistical distributions. The limitations of various metaheuristics, such as premature convergence and stagnation in local optima, are resolved by replacing the random values with values generated by chaotic maps. The spread-spectrum sequence for a random number series can be created from a chaotic map. There is no need to save large sequences because chaotic sequences are quick and simple to construct and store [105]. For extremely lengthy sequences, just minimal parameters (the initial conditions) and some functions (chaotic maps) are required. By changing its initial state, an enormous variety of other sequences can also be formed. These sequences are also repeatable and deterministic. The use of chaotic sequences can be conceptually explained by the fact that they are unpredictable due to their spread-spectrum characteristics and ergodic qualities. As a result, whenever a random number is required, it can be produced by iterating one step of the selected chaotic map, starting from a random initial state at the start of the run. Chaotic time series sequences are common in the literature; some chaotic maps and their dynamical distributions are included in Table 10. Some of the studies are presented in Table 11.
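As a minimal illustration of the idea, the sketch below generates a chaotic sequence with the classical logistic map and uses it wherever a uniform random number would otherwise be drawn; the map and its parameter values are assumptions chosen for illustration.

import numpy as np

def logistic_map(x0=0.7, mu=4.0, n=1000):
    # Chaotic sequence x_{k+1} = mu * x_k * (1 - x_k). With mu = 4 and a
    # suitable x0 the values stay in (0, 1) and can stand in for uniform
    # random draws inside a metaheuristic.
    seq = np.empty(n)
    x = x0
    for k in range(n):
        x = mu * x * (1.0 - x)
        seq[k] = x
    return seq

chaotic_draws = logistic_map(n=5)  # use these wherever rand() would be called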

Table 10
Chaotic maps.

Table 11
Chaotic metaheuristic algorithms.
Abbreviation Metaheuristic Application
CBA [106] Chaotic Bat algorithm Feature selection
CDF [107] Chaotic Dragonfly algorithm Feature selection
CVSA [108] Chaotic Vortex Search algorithm Feature selection
CSSA [109] Chaotic Salp Swarm algorithm Feature selection
CISA [110] Chaotic Interior Search algorithm Feature selection
CCSA [111] Chaotic Crow Search algorithm Feature selection
CGSK [112] Chaotic Gaining Sharing Knowledge optimization algorithm Feature selection
CSHO [113] Chaotic Selfish Herd Optimizer Feature selection
CASO [114] Chaotic Atom Search Optimization Feature selection
CMFAO [115] Chaotic Moth Flame Optimization Algorithm Medical application
CBBHA [116] Chaotic binary Black Hole Algorithm Data classification
CEOA [117] Chaotic Equilibrium Optimizer Algorithm Feature selection
CMPA [118] Chaos Embed Marine Predator Algorithm Feature selection
CFSA [119] Chaotic fractal search algorithm Control design
CWOA [120] Chaotic Whale Optimization algorithm Feature selection
CAO [121] Chaotic antlion optimization Feature selection
CDTO [122] Chaotic Duck Traveler Optimization Feature Selection in Breast Cancer Dataset Problem
CHS [123] Chaotic Harmony Search algorithm Feature selection for classification of gene expression profiles
CBRSA [124] Chaotic binary Reptile Search algorithm Feature selection
CBOA [125] Chaotic Butterfly Optimization algorithm Feature selection
CEPO [126] Chaotic Emperor Penguin Optimization Microarray cancer classification
CGBC [127] Chaotic Genetic Bee Colony Feature selection in microarray cancer classification
CCSO [128] Chaotic Chicken Swarm Optimization algorithm Feature selection


Table 12
Fractional chaos maps with range 0 and 1.

Table 13
Fractional order chaotic metaheuristic algorithms.
Abbreviation Metaheuristic Application
FOSMA [133] Fractional Calculus-Based Slime Mould Algorithm Feature selection
FOS-CS [134] Fractional-order Cuckoo Search Optimizer COVID-19 X-ray images classification
FOCLMPA [135] Fractional-order marine predators algorithm Feature selection
FC-FPA [136] Fractional chaos flower pollination algorithm Feature selection
FO-MPA [137] fractional-order marine predators algorithm COVID-19 image classification
FO-COA [138] Fractional Order Coot Optimization Algorithm Skin cancer detection
FO-CRO [139] Fractional-order Chaotic Chemical Reaction Optimization Parameter identification
E 2 O [140] Fractional order chaotic equilibrium optimization Global optimization

4.4.2. Feature selection using fractional order metaheuristic algorithms
Fractional calculus is a functional and appealing method that mathematicians recently introduced to open new directions in various research areas [129]. New versions of the chaos maps have been created using concepts from fractional calculus [130]. Thus, fractional calculus is employed to enhance the distribution properties of chaos maps. As a matter of fact, fractional order provides an attractive property whereby the memory of discrete fractional-order dynamical systems depends on all previous states, unlike discrete integer-order systems that depend only on the previous state. Consequently, fractional chaos maps enhance the exploration phase of optimization algorithms and help avoid premature convergence. Fractional chaotic maps [131,132] are listed in Table 12. Hence, recent applications for the feature selection problem integrate fractional order chaos with optimization algorithms, and some of these studies are given in Table 13.

4.4.3. Feature selection using island based metaheuristic algorithms
The island model has been proposed mainly to address the lack of heterogeneity from which most population-based algorithms suffer. This technique is one of the most suitable population structuring techniques; it splits a single population into multiple sub-populations termed islands. Island-based metaheuristics have proven to be successful in reducing computational requirements and providing good results. However, choosing adequate values for the different parameters and adopting a suitable migration policy for the island-based technique has a great impact on the final results, as proved by many studies [141–145]. On each island, the original algorithm is executed iteratively, either synchronously or asynchronously. Hence, a migration process is applied to improve the efficiency of the algorithm by moving some individuals from one island to another. This process follows several strategies that ensure the exploration of new parts of the search space [144]. The wide use of this technique can be explained by the balance it ensures between exploration and exploitation. Moreover, dividing individuals (potential solutions) into different islands reduces the computational time and increases the probability of weak solutions reaching their optimum [146]. The island model first requires two parameters: the number of islands (I_n) and the island size (I_s). Then, the migration process is defined by four main factors: migration rate, frequency, policy, and topology. The migration rate (R_m) is the number of solutions exchanged between islands. The migration frequency (F_m) defines the periodic time for the exchange. The migration topology structures the path of exchanging solutions among islands. Several topologies were proposed in the literature, mainly categorized into two different sets, static and dynamic. Static topologies [147] include ring, mesh and star, which have their structured paths predefined; the topology also remains static during the migration process. However, in dynamic topologies [148], paths are randomly defined and change with every migration process. The migration policy determines which solutions are exchanged between islands. Researchers introduced different policies based on greed or random selection. The most used policies are the best–worst policy, where the worst solutions of one island are replaced by the best solutions from the other [149], and the random policy, where solutions are migrated randomly [150]. Finally, the migration process with all its factors can be carried out in two ways: synchronously or asynchronously. Island models have been widely combined with different metaheuristic algorithms to solve the feature selection problem; some of these studies are listed in Table 14.
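The migration step described above can be sketched in a few lines; the ring topology, best–worst policy, and synchronous exchange below are one illustrative combination of the factors just discussed, not a specific published configuration.

import numpy as np

def migrate_ring(islands, fitness, rate=1):
    # islands: list of (pop_size x n_features) 0/1 arrays, one per island.
    # fitness: matching list of fitness vectors (higher is better).
    # Ring topology with a best-worst policy: each island sends copies of
    # its `rate` best individuals to the next island, where they overwrite
    # the worst ones.
    n_islands = len(islands)
    for i in range(n_islands):
        j = (i + 1) % n_islands                   # neighbor on the ring
        best = np.argsort(fitness[i])[-rate:]     # emigrants: best of island i
        worst = np.argsort(fitness[j])[:rate]     # replaced: worst of island j
        islands[j][worst] = islands[i][best].copy()
        fitness[j][worst] = fitness[i][best]
    return islands, fitness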

Table 14
Island metaheuristic algorithms.
Abbreviation Metaheuristic Application
IMDE [151] Island Memetic Differential Evolution algorithm Feature selection
IP-HHO [152] Island parallel Harris Hawks Optimization algorithm Feature selection
IAGMFS [153] Island algorithm Based on Gaussian Mutation Feature selection
IBBO [154] Island Biogeography-based Optimization algorithm Histopathological Image Classification
IsBMFO [155] Island Moth Flame Optimization Software defect prediction
IMEA [156] Island model based Evolutionary Algorithms Brain–computer interfacing
PMSO [157] Island Particle Swarm Optimization Gene selection in DNA microarrays
I-GA [158] Island Model Genetic Algorithm Non-Traditional Credit Risk Evaluation

4.4.4. Feature selection using hybrid metaheuristic algorithms
Hybrid algorithms are developed by combining current metaheuristics or classical algorithms. The main purpose of hybrid algorithms is to combine the skills of various algorithms to obtain better results. Therefore, hybrid metaheuristic algorithms show significant improvements compared to single metaheuristic algorithms. Hence, more efficient and flexible algorithms can be developed using these hybrid methods. There is another scope of combining the filter and wrapper methods to form a hybrid algorithm for better accuracy and time complexity.

Authors in [159] proposed a hybrid filter-wrapper approach for medical data classification, where a four-step hybrid ensemble feature selection algorithm has been introduced. Firstly, the dataset is partitioned using the cross-validation procedure. Secondly, various filter methods based on weighted scores were ensembled in the filter step to generate a ranking of features. Third, a sequential forward selection algorithm is utilized as a wrapper technique to obtain an optimal subset of features. Finally, the resulting optimal subset is processed for subsequent classification tasks. The authors of [160] introduced a novel hybrid filter-wrapper feature selection approach using the whale optimization algorithm (WOA). The proposed method is a multi-objective algorithm in which filter and wrapper fitness functions are optimized simultaneously. Another hybridization was proposed by [161] for crack severity recognition; their approach comprises two main components, namely, feature extraction based on hand-crafted feature engineering and CNN-based deep feature learning, and feature selection using a hybrid filter-wrapper with a multi-objective improved salp swarm optimization. Combining the best operators from different metaheuristic algorithms promoted the development of new effective algorithms. In the work of [162], a new hybrid optimization algorithm that benefits from the strengths of both Grey Wolf Optimization and Particle Swarm Optimization was proposed to solve the feature selection problem. Another method was introduced by [163] for diabetes diagnosis applications. The proposed approach has three steps: preprocessing, feature selection and classification. Several combinations of the Harmony Search algorithm, genetic algorithm, and particle swarm optimization algorithm were examined with K-means for feature selection. Another study [164] introduced an enhanced hybrid metaheuristic approach using the Grey Wolf Optimizer and Whale Optimization Algorithm to develop a wrapper-based feature selection method. Another scope of hybridization is developed by combining a metaheuristic approach with machine learning algorithms. For example, authors in [165] aim to detect early disease on plant leaves with small disease blobs. They used a list of several measurement-based features representing the blobs, which were then selected using a wrapper-based feature selection algorithm built on a hybrid metaheuristic. The chosen features are used as inputs for an Artificial Neural Network (ANN). The results were promising compared to a popular Convolutional Neural Network model (CNN), and the conclusion that the proposed approach can be implemented on low-end devices such as smartphones was derived. In [166], a software defect prediction based on the Whale Optimization Algorithm and Simulated Annealing for the feature selection process is developed. Then a Convolutional Neural Network (CNN) and Kernel Extreme Learning Machine (KELM) are used to construct a unified defect prediction model. Another hybridization was proposed by [167] for an intrusion detection system with application in IoT-based healthcare. To reduce computation costs, metaheuristic algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Differential Evolution (DE) were used. Supervised learning algorithms such as k-Nearest Neighbor (kNN) and Decision Tree (DT) are used to accurately classify normal and attack classes based on the selected features. Several combinations of different approaches have been developed to solve different applications of feature selection problems. Although hybrid algorithms achieve promising results, they also have more parameters than a single algorithm, i.e. parameters from both algorithms and parameters to control the algorithm hybridization. The role of variants and hybrids used in feature selection problems is shown in Fig. 15 (see Table 15).

Fig. 15. Rate of using hybrid metaheuristic algorithms and different variants for feature selection.
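One recurring hybridization pattern in the studies above couples a population-based search with a local refinement stage, such as simulated annealing applied to the best solution found so far. A minimal, hypothetical sketch of such a refinement stage is given below; the move operator and cooling schedule are assumptions, not the procedure of any specific cited paper.

import numpy as np

rng = np.random.default_rng(3)

def sa_refine(mask, fitness_fn, iters=50, t0=1.0, cooling=0.95):
    # Simulated-annealing-style refinement of a single binary feature mask;
    # fitness_fn(mask) is assumed to return a value to maximize.
    best, best_fit = mask.copy(), fitness_fn(mask)
    cur, cur_fit, temp = best.copy(), best_fit, t0
    for _ in range(iters):
        cand = cur.copy()
        cand[rng.integers(len(cand))] ^= 1        # flip one random bit
        cand_fit = fitness_fn(cand)
        if cand_fit > cur_fit or rng.random() < np.exp((cand_fit - cur_fit) / temp):
            cur, cur_fit = cand, cand_fit
            if cur_fit > best_fit:
                best, best_fit = cur.copy(), cur_fit
        temp *= cooling
    return best, best_fit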
the feature selection problem, stability is essential. A approach is
said to be stable for feature selection if it consistently uncovers
5. Issues and challenges

This study presents a number of feature selection-related studies. The purpose of issue representation has been explicitly stated, and several metaheuristic approaches have been investigated. Applying metaheuristics has allowed us to find the optimal solution in an acceptable time. Studies on feature selection based on metaheuristic algorithms have been described in the wrapper method. Furthermore, it has been found that the areas of disease diagnosis other than cancer and stock value prediction have received the least attention in terms of feature selection. Thus, it is crucial that researchers address these gaps. Although metaheuristic algorithms have been quite successful in solving feature selection problems, certain issues remain, which will be covered in the following sections.

5.1. Representation scalability and stability

A dataset for a real-world issue could comprise thousands, millions, or even more features. Due to the large datasets involved in the feature selection challenge, the suggested technique must be scalable. The construction of the algorithm must include the use of a good scalable classifier capable of managing large datasets. Therefore, the process of developing an algorithm to address the feature selection problem must be scalable.
Stability is an equally crucial consideration when creating a feature selection algorithm. A method is considered stable if it consistently selects the same subset of features across different samples of the data. In practice, feature selection methods often become unstable when tuned purely for the best classification performance: when highly correlated features are dropped to improve accuracy, the selected subset can vary considerably from run to run. Classification accuracy and stability are therefore both crucial.

5.2. Selection of classifier

The classifier selected when creating a wrapper feature selection method significantly impacts the quality of the results. Different classifier types, including K-nearest neighbor (KNN), Support Vector Machine (SVM), Naive Bayesian (NB), Random Forest (RF), Artificial Neural Network (ANN), ID3, C4.5, Fuzzy rule based (FR), and Decision Tree (DT), have been used to solve feature selection problems using metaheuristic algorithms.
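As a concrete complement to the stability discussion in Section 5.1, selection stability can be quantified, for instance, as the average pairwise Jaccard similarity between the feature subsets selected on different resamples of the data; the index used in this sketch is one common choice, not the only option.

from itertools import combinations

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def selection_stability(subsets):
    # Average pairwise Jaccard similarity between feature subsets selected
    # on different folds/resamples (1.0 = perfectly stable selection).
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# e.g. subsets of feature indices selected on three folds of the same dataset
print(selection_stability([[0, 3, 7], [0, 3, 8], [0, 3, 7, 9]]))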

Table 15
Hybrid metaheuristic algorithms.
Abbreviation Metaheuristic Application
GTO-BSA [168] Gorilla Troops Optimizer (GTO) based on the algorithm for bird swarms (BSA) Feature selection for IoT Intrusion Detection
EMWS [169] Whale optimization algorithm (WOA) and simulated annealing (SA) Feature selection for Software defect prediction
QWOA [170] Quantum Whale Optimization algorithm Feature selection
HGWO [171] Grey Wolf and Particle Swarm Optimization Prediction of Rheumatoid Arthritis
BGWOPSO [172] Binary grey wolf optimization (GWO) and particle swarm optimization (PSO) Feature selection
PSO-GSA [173] Particle Swarm Optimization (PSO) and Gravitational Search algorithm (GSA) Feature selection
QBSO-FS [174] Reinforcement Learning based Bee Swarm Optimization Feature selection
MBA-SA [175] The Mine Blast algorithm (MBA) with Simulated Annealing (SA) Feature selection
ACO-CS [176] Ant Colony Optimization (ACO) and Cuckoo Search (CS) Feature selection in Digital mammogram
SHOSA [177] Spotted Hyena Optimization algorithm with Simulated Annealing Feature selection
DE-ABC [178] Differential Evolution and Artificial Bee Colony Feature selection
GWOCSA [179] Grey Wolf Optimization and Crow Search algorithm Feature selection
SM-GNCSOA [180] Grasshopper and Cat Swarm Optimization algorithm Feature selection
SCAGA [181] Sine Cosine algorithm and Genetic Algorithm Feature selection
TLBOGSA [182] Teaching Learning-based algorithm (TLBO) and Gravitational Search algorithm (GSA) Gene selection for cancer types classification
HHOBSA [183] Harris Hawks Optimization algorithm with Simulated Annealing Feature selection
WOA-FPA [184] Whale Optimization algorithm with Flower Pollination algorithm Feature selection for email spam detection
BOWOHHO [185] Binary Whale with Harris Hawks Feature selection
BBACE [186] Binary Bat algorithm with Cross-Entropy Feature selection
TOPSIS-Jaya [187] TOPSIS with Binary Jaya algorithm Microarray data classification
SSA-SA [188] Salp Swarm and Simulated Annealing Approach Feature selection
CHPSODE [189] Chaotic Particle Swarm Optimization with Differential Evolution Feature selection
BPSOGSA [190] Binary Particle Swarm Optimization and Gravitational Search algorithm Feature selection in industrial foam injection processes
CHIO-GC [191] Coronavirus Herd Immunity Optimizer with Greedy Crossover Feature selection in medical diagnosis

Fig. 16. Rate of used classifiers in metaheuristic algorithms for feature selection.

The role of classifiers used in feature selection problems is shown in Fig. 16.

5.3. Wrapper objective function

A wrapper feature selection method selects the best feature subset by optimizing a given objective function. Depending on the classification problem, different objective functions for feature selection are defined. An objective function can reduce the number of required features or increase classification accuracy. The multi-objective function was created in order to combine these two conflicting objectives for the feature selection problem. The multi-objective problem was reduced to a single objective by weighting both objectives and performing the learning process. Multi-objective functions have been used in several research works to determine the ideal feature subset [192–194].
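A typical single-objective wrapper fitness of the kind described above combines the cross-validated classification error with the relative size of the selected subset. The sketch below is an illustrative formulation with an assumed trade-off weight alpha and an arbitrary choice of KNN as the classifier, not the exact function used in any particular study.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_fitness(mask, X, y, alpha=0.99):
    # Weighted combination of classification error and subset size; higher
    # returned values are better.
    if mask.sum() == 0:
        return 0.0                                # empty subsets are invalid
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    error = 1.0 - acc
    size_ratio = mask.sum() / mask.size
    return 1.0 - (alpha * error + (1.0 - alpha) * size_ratio)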

5.4. Challenges in feature selection and metaheuristic optimization: Recommendations and guidelines

Feature selection and metaheuristic optimization are crucial techniques for achieving optimal performance in machine learning and data analysis. They facilitate the identification of the most relevant features in a dataset and optimize the parameters of a model. However, these techniques present significant challenges when dealing with high-dimensional data, imbalanced datasets, or complex models. To address these challenges, we provide a list of recommendations, guidelines, insights, and suggestions. These include exploring techniques that can handle high-dimensional data and imbalanced datasets, producing interpretable models, assessing the stability, scalability, and generalization of the techniques, incorporating domain knowledge and multiple objectives, and integrating metaheuristic optimization with deep learning or reinforcement learning. By following these suggestions, researchers can improve the effectiveness and performance of feature selection and metaheuristic optimization techniques in various applications.

1. Address the curse of dimensionality: High-dimensional data can be challenging for feature selection and metaheuristic optimization. Techniques that can manage high-dimensional data, such as dimensionality reduction and sparse feature selection, should be investigated by researchers.
2. Address the class imbalance problem: Imbalanced datasets can be challenging for both feature selection and metaheuristic optimization. The problem of class imbalance should be addressed through research into methods including oversampling, undersampling, and cost-sensitive learning.
3. Address the interpretability problem: The selected features should be interpretable and understandable to the end-user. However, some metaheuristic optimization techniques may produce complex models that are difficult to interpret. Researchers should investigate techniques that can produce interpretable models, such as decision trees and linear models.
4. Evaluate the stability of feature selection and metaheuristic techniques: Feature selection and metaheuristic optimization can produce unstable results, particularly when the dataset is small or noisy. Through the use of cross-validation and bootstrap procedures, researchers should assess the stability of their methodologies.
5. Evaluate the scalability of feature selection and metaheuristic techniques: Feature selection and metaheuristic optimization can be computationally expensive, particularly for large datasets. Researchers need to assess the scalability of their approaches and look at parallelization strategies that may improve effectiveness.
6. Evaluate the generalization of feature selection and metaheuristic techniques: Feature selection and metaheuristic optimization techniques can overfit to the training data. Researchers should evaluate the generalization performance of their techniques using methods such as cross-validation and out-of-sample testing.
7. Investigate the integration of domain knowledge: Domain knowledge can be valuable in guiding feature selection and metaheuristic optimization. Researchers should look towards methods like expert systems and knowledge-based systems that may incorporate domain knowledge into the optimization process.
8. Investigate the integration of multiple objectives: In some applications, feature selection and metaheuristic optimization may need to consider multiple objectives. Researchers should investigate multi-objective optimization techniques that can handle multiple conflicting objectives.
9. Investigate the integration of deep learning and metaheuristic optimization: Deep learning has shown promising results in feature extraction and representation learning. To improve performance, researchers need to look into methods that combine deep learning with metaheuristic optimization.
10. Investigate the integration of metaheuristic optimization and reinforcement learning: Reinforcement learning can be used to optimize complex systems. Researchers should investigate techniques that can integrate metaheuristic optimization and reinforcement learning to optimize complex systems that require online learning.

A comprehensive overview of various binary metaheuristic algorithms that have been used for feature selection in machine learning and data analysis is provided in Tables 16 and 17. The algorithms are compared based on their transfer function, classifier, evaluation metric, accuracy, and a summary of their pros and cons. By analyzing the performance and characteristics of these algorithms, researchers can gain insights into their strengths and weaknesses and select the most suitable algorithm for their specific application.
In summary, researchers in the feature selection and metaheuristics domain should consider the above recommendations, guidelines, insights, and suggestions to address the issues and challenges in their research. Addressing these challenges can help improve the efficiency and effectiveness of feature selection and metaheuristic optimization techniques.

6. Conclusion

This paper provides a thorough examination of nature-inspired metaheuristic methods and their application to the feature selection process, emphasizing the importance of redundancy and relevance metrics. A detailed description and mathematical model of the feature selection challenge is also provided, which may aid researchers in fully understanding the problem. Methods for addressing feature selection challenges are discussed, with particular attention given to the role of metaheuristic methods in solving the problem. The fundamental definition, significance, and classification of metaheuristic algorithms are presented, along with an updated list of nature-inspired metaheuristics. Various metaheuristic approaches have been applied to feature selection problems, employing evolutionary, swarm, natural, human, and mathematical principles, and their potential application domains are determined. Areas such as sentiment analysis, fraud detection, and weather forecasting have received less attention, while education, robotics, and disease diagnosis have been more actively studied.
In terms of article publishing frequency, diabetes and breast cancer are the most commonly diagnosed diseases, while psychological disorders are not given the same attention as other diseases in the diagnostic field. The paper focuses primarily on solving the feature selection problem using the binary representation of metaheuristic algorithms, with extensive literature presented on each class of metaheuristic algorithms. Transfer functions used for binary transformation are analyzed, and the use of chaotic, fractional order, island-based, and hybrid variants of nature-inspired metaheuristic techniques is summarized.
The top three feature selection application areas identified in the literature are diabetes, stress, and cancer, while few authors have used metaheuristic methods to address the feature selection issue for neurological and psychological disorders. Hence, researchers are encouraged to consider using metaheuristic algorithms in scrutinizing this field.
According to the literature, finding the best feature subset for a given classification problem presents several challenges, and the quality of the solution produced is greatly influenced by the classifier used. KNN and SVM classifiers are the most commonly used classifiers for extracting the best subset from well-known datasets in the UCI repository. However, other classifiers exist but are used less frequently for classification, emphasizing the need to compare different classifiers with the most popular ones while solving classification problems.
Finally, this paper will be helpful to researchers since it compiles all the essential elements for employing metaheuristic algorithms to solve the feature selection problem.

Table 16
Comparison of binary metaheuristic algorithms for feature selection.
Algorithm (abbreviation); transfer function; classifier; evaluation metric; reported accuracy; summary of pros (+) and cons (−):

Binary Bat Algorithm (BBA); S and V shape; SVM; Accuracy; 98.25%. + Good performance on large datasets. − Slower convergence time.
Binary Grasshopper Optimization Algorithm (BGOA); S and V shape; k-NN; Accuracy; 97.9%. + Fast convergence time, good feature selection. − Limited performance on high-dimensional datasets.
Binary Grey Wolf Optimizer (BGWO); S and V shape; k-NN; Accuracy; 84.20%. + Fast convergence time, good feature selection. − Can get stuck in local optima.
Binary Firefly Algorithm (BFA); Aggregation function; KNN, NB, LDA; Accuracy; 97.78%. + Fast convergence time, good feature selection. − Can get stuck in local optima.
Binary Particle Swarm Optimization (BPSO); Sigmoid; Decision Tree; Accuracy; 98.17%. + Good performance on small datasets. − Limited performance on large datasets, can get stuck in local optima.
S-shaped and V-shaped gaining–sharing knowledge-based algorithm for feature selection (pBGSKA); S and V shape; k-NN; Accuracy; 99.6%. + Good performance on high-dimensional datasets, adaptive parameter tuning. − May require tuning of additional parameters.
Binary Sine–Cosine Algorithm (BSCA); S and V shape; k-NN; Accuracy; 98.23%. + Efficient for high-dimensional problems. − May require fine-tuning.
Binary Giza Pyramids Construction Algorithm (BGPC); S and V shape; k-NN; Accuracy; 98.75%. + Fast convergence and good performance for large datasets. − May not work well with small datasets.
Binary Ant Lion Algorithm (BALO); S and V shape; k-NN; Accuracy; 96.37%. + Can effectively handle problems with high dimensionality or non-linearity. − May suffer from slow convergence and may get stuck in local optima; may also require a lot of computational resources for optimal performance.
Binary Salp Swarm Algorithm (BSSA); S and V shape; k-NN; Accuracy; 95.26%. + Can effectively handle problems with complex search spaces or multiple objectives. − Can be sensitive to the choice of its parameters; may also require a lot of computational resources for optimal performance.
Binary Cuckoo Search Algorithm (BCSA); Sigmoid; Optimum-Path Forest (OPF); Accuracy; 97.33%. + Can effectively handle problems with multimodal search spaces or noisy objective functions. − Can be sensitive to the choice of its parameters and may get stuck in local optima.
Binary Equilibrium Optimizer (BEO); S and V shape; k-NN; Accuracy; 97.01%. + Has been shown to be effective in finding global optima and can achieve good performance with a relatively small population size. − Can be sensitive to the choice of its parameters and may require a large population size for optimal performance.


Table 17
Comparison of hybrid metaheuristic algorithms for feature selection.
Algorithm (abbreviation); transfer function; classifier; evaluation metric; reported accuracy; summary of pros (+) and cons (−):

Binary Chaotic Bat Algorithm (BCBA); V-shape; Random forest, k-NN; Accuracy; 96.97%. + Can effectively handle problems with complex search spaces or multiple objectives; the chaotic dynamics can enhance its search capability and prevent it from getting stuck in local optima. − Can be sensitive to the choice of its chaotic function and may require a large population size for optimal performance.
Binary Chaotic Dragonfly Algorithm (BCDA); Chaotic; k-NN; Accuracy, F1-Score, Precision, Recall; 96.72%. + Can effectively handle problems with complex search spaces or multiple objectives. − Can be sensitive to the choice of its chaotic function.
Binary Chaotic Vortex Algorithm (BCVA); Chaotic; k-NN; Accuracy; 97.45%. + The performance of BCVA heavily relies on the parameter settings, such as the population size and maximum iteration number, which need to be carefully tuned to achieve good results. − May be computationally expensive, especially for large datasets, due to the need for multiple evaluations of the fitness function.
Binary Chaotic Black Hole Algorithm (BCBHA); Chaotic; k-NN; Accuracy; 98.33%. + A promising method for feature selection. − Its performance heavily depends on the parameter settings and the specific application at hand.
Binary Chaotic Moth–Flame Optimization Algorithm (BCMFOA); Chaotic; k-NN; Accuracy; 96.62%. + Can effectively handle problems with complex search spaces or multiple objectives. − Can be sensitive to the choice of its chaotic function; may also suffer from slow convergence and can get stuck in local optima.
Fractional Chaotic Order Marine Predator Algorithm (FCOMPA); –; k-NN; Accuracy; 97.13%. + A promising method for feature selection, with the added benefit of incorporating fractional calculus to enhance its exploration and exploitation abilities. − Can be computationally expensive and may require a large population size for optimal performance.
Island-Based Genetic Algorithm (IBGA); –; SVM, k-NN, DT, MLP; Accuracy; 93.51%. + Can effectively handle problems with complex search spaces or multiple objectives; uses a combination of global and local search techniques to enhance its search capability. − Can be computationally expensive.
Quantum Whale Optimization Algorithm (QWOA); –; kNN, LDC, SVM, C4.5; Accuracy; 98.75%. + Can effectively handle problems with complex search spaces or multiple objectives due to its quantum-inspired operators, which enhance its search capability. − Can be sensitive to the choice of its parameters and may require a large number of iterations for optimal performance.
TOPSIS with Binary Jaya Algorithm (TBJA); Time-varying; Gaussian Naïve Bayes; Accuracy; 98.08%. + A hybrid algorithm that combines two different optimization techniques to enhance its search capability; can effectively handle problems with multiple objectives. − May require a large number of iterations for optimal performance and can be computationally expensive for large-scale problems.

Declaration of competing interest [24] T.N. Lal, O. Chapelle, J. Weston, A. Elisseeff, Embedded methods, in: I.
Guyon, M. Nikravesh, S. Gunn, L.A. Zadeh (Eds.), Feature Extraction: Foun-
dations and Applications, in: Studies in Fuzziness and Soft Computing,
The authors declare that they have no known competing finan-
Springer, Berlin, Heidelberg, 2006, pp. 137–165.
cial interests or personal relationships that could have appeared [25] N. Sánchez-Maroño, A. Alonso-Betanzos, M. Tombilla-Sanromán, Filter
to influence the work reported in this paper. methods for feature selection–a comparative study, in: International
Conference on Intelligent Data Engineering and Automated Learning,
Springer, 2007, pp. 178–187.
Data availability [26] R.J. Urbanowicz, M. Meeker, W. La Cava, R.S. Olson, J.H. Moore, Relief-
based feature selection: Introduction and review, J. Biomed. Inform. 85
No data was used for the research described in the article. (2018) 189–203.
[27] A. Arauzo-Azofra, J.M. Benitez, J.L. Castro, Consistency measures for
feature selection, J. Intell. Inf. Syst. 30 (3) (2008) 273–292.
References [28] C. Freeman, D. Kulić, O. Basir, An evaluation of classifier-specific filter
measure performance for feature selection, Pattern Recognit. 48 (5)
[1] H.M. Abdulwahab, S. Ajitha, M.A.N. Saif, Feature selection techniques in (2015) 1812–1826.
the context of big data: taxonomy and analysis, Appl. Intell. (2022) 1–46. [29] J.R. Vergara, P.A. Estévez, A review of feature selection methods based on
[2] J. Li, H. Liu, Challenges of feature selection for big data analytics, IEEE mutual information, Neural Comput. Appl. 24 (1) (2014) 175–186.
Intell. Syst. 32 (2) (2017) 9–15. [30] S. Velliangiri, S. Alagumuthukrishnan, et al., A review of dimensionality
[3] M. Dash, H. Liu, Feature selection for classification, Intell. Data Anal. 1 reduction techniques for efficient computation, Procedia Comput. Sci. 165
(3) (1997) 131–156. (2019) 104–111.
[4] X. Wang, J. Yang, X. Teng, W. Xia, R. Jensen, Feature selection based on [31] G. Forman, et al., An extensive empirical study of feature selection metrics
rough sets and particle swarm optimization, Pattern Recognit. Lett. 28 (4) for text classification, J. Mach. Learn. Res. 3 (Mar) (2003) 1289–1305.
(2007) 459–471. [32] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J.
[5] F. Peres, M. Castelli, Combinatorial optimization problems and meta- Mach. Learn. Res. 3 (Mar) (2003) 1157–1182.
heuristics: Review, challenges, design, and development, Appl. Sci. 11 (14) [33] R. Bekkerman, R. El-Yaniv, N. Tishby, Y. Winter, Distributional word
(2021) 6449. clusters vs. words for text categorization, J. Mach. Learn. Res. 3 (Mar)
[6] P. Agrawal, H.F. Abutarboush, T. Ganesh, A.W. Mohamed, Metaheuristic (2003) 1183–1208.
algorithms on feature selection: A survey of one decade of research [34] R. Caruana, V.R.d. Sa, Benefitting from the variables that variable selection
(2009–2019), Ieee Access 9 (2021) 26766–26791. discards, J. Mach. Learn. Res. 3 (Mar) (2003) 1245–1264.
[7] S. Visalakshi, V. Radha, A literature review of feature selection tech- [35] C. Lazar, J. Taminau, S. Meganck, D. Steenhoff, A. Coletta, C. Molter, V.
niques and applications: Review of feature selection in data mining, in: de Schaetzen, R. Duque, H. Bersini, A. Nowe, A survey on filter techniques
2014 IEEE International Conference on Computational Intelligence and for feature selection in gene expression microarray analysis, IEEE/ACM
Computing Research, IEEE, 2014, pp. 1–6. Trans. Comput. Biol. Bioinform. 9 (4) (2012) 1106–1119.
[8] M. Sharma, P. Kaur, A comprehensive analysis of nature-inspired [36] R.J. Urbanowicz, M. Meeker, W. La Cava, R.S. Olson, J.H. Moore, Relief-
meta-heuristic techniques for feature selection problem, Arch. Comput. based feature selection: Introduction and review, J. Biomed. Inform. 85
Methods Eng. 28 (3) (2021) 1103–1127. (2018) 189–203.
[9] B.H. Nguyen, B. Xue, M. Zhang, A survey on swarm intelligence ap- [37] H. Peng, F. Long, C. Ding, Feature selection based on mutual information
proaches to feature selection in data mining, Swarm Evol. Comput. 54 criteria of max-dependency, max-relevance, and min-redundancy, IEEE
(2020) 100663. Trans. Pattern Anal. Mach. Intell. 27 (8) (2005) 1226–1238.
[10] D. Jain, V. Singh, Feature selection and classification systems for chronic [38] H. Stoppiglia, G. Dreyfus, R. Dubois, Y. Oussar, Ranking a random fea-
disease prediction: A review, Egypt. Inform. J. 19 (3) (2018) 179–189. ture for variable and feature selection, J. Mach. Learn. Res. 3 (2003)
[11] Z.M. Hira, D.F. Gillies, A review of feature selection and feature extraction 1399–1414.
[39] R. Kohavi, G.H. John, Wrappers for feature subset selection, Artificial
methods applied on microarray data, in: Advances in Bioinformatics, Vol.
Intelligence 97 (1–2) (1997) 273–324.
2015, 2015.
[40] V. Tiwari, Face recognition based on cuckoo search algorithm, Image 7
[12] X. Deng, Y. Li, J. Weng, J. Zhang, Feature selection for text classification:
(8) (2012) 9.
A review, Multimedia Tools Appl. 78 (3) (2019) 3797–3816.
[41] D. Rodrigues, L.A. Pereira, T. Almeida, J.P. Papa, A. Souza, C.C. Ramos,
[13] J.C. Ang, A. Mirzal, H. Haron, H.N.A. Hamed, Supervised, unsupervised, and
X.-S. Yang, BCS: A binary cuckoo search algorithm for feature selection,
semi-supervised feature selection: a review on gene selection, IEEE/ACM
in: 2013 IEEE International Symposium on Circuits and Systems, ISCAS,
Trans. Comput. Biol. Bioinform. 13 (5) (2015) 971–989.
IEEE, 2013, pp. 465–468.
[14] A. Feizollah, N.B. Anuar, R. Salleh, A.W.A. Wahab, A review on feature
[42] H.M. Zawbaa, E. Emary, B. Parv, Feature selection based on antlion
selection in mobile malware detection, Digit. Investig. 13 (2015) 22–37.
optimization algorithm, in: 2015 Third World Conference on Complex
[15] G. George, V.C. Raj, Review on feature selection techniques and the impact
Systems, WCCS, IEEE, 2015, pp. 1–7.
of SVM for cancer classification using gene expression profile, 2011, arXiv
[43] R.Y.M. Nakamura, L.A.M. Pereira, D. Rodrigues, K.A.P. Costa, J.P. Papa, X.-S.
preprint arXiv:1109.1062.
Yang, 9 - Binary bat algorithm for feature selection, in: X.-S. Yang, Z. Cui,
[16] Q. Dai, J.-H. Cheng, D.-W. Sun, X.-A. Zeng, Advances in feature se- R. Xiao, A.H. Gandomi, M. Karamanoglu (Eds.), Swarm Intelligence and
lection methods for hyperspectral image processing in food industry Bio-Inspired Computation, Elsevier, Oxford, 2013, pp. 225–237.
applications: A review, Crit. Rev. Food Sci. Nutr. 55 (10) (2015) [44] M.M. Mafarja, D. Eleyan, I. Jaber, A. Hammouri, S. Mirjalili, Binary drag-
1368–1382. onfly algorithm for feature selection, in: 2017 International Conference
[17] H. Alsolai, M. Roper, A systematic review of feature selection techniques on New Trends in Computing Sciences, ICTCS, IEEE, 2017, pp. 12–17.
in software quality prediction, in: 2019 International Conference on [45] R. Zhang, F. Nie, X. Li, X. Wei, Feature selection with multi-view data: A
Electrical and Computing Technologies and Applications, ICECTA, IEEE, survey, Inf. Fusion 50 (2019) 158–167.
2019, pp. 1–5. [46] H. Liu, M. Zhou, Q. Liu, An embedded feature selection method for
[18] J. Luo, D. Zhou, L. Jiang, H. Ma, A particle swarm optimization based imbalanced data classification, IEEE/CAA J. Autom. Sin. 6 (3) (2019)
multiobjective memetic algorithm for high-dimensional feature selection, 703–715.
Memet. Comput. 14 (1) (2022) 77–93. [47] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer
[19] T. Li, Z.-H. Zhan, J.-C. Xu, Q. Yang, Y.-Y. Ma, A binary individual search classification using support vector machines, Mach. Learn. 46 (1) (2002)
strategy-based bi-objective evolutionary algorithm for high-dimensional 389–422.
feature selection, Inform. Sci. 610 (2022) 651–673. [48] C. Catal, B. Diri, Investigating the effect of dataset size, metrics sets, and
[20] B. Venkatesh, J. Anuradha, A review of feature selection and its methods, feature selection techniques on software fault prediction problem, Inform.
Cybern. Inf. Technol. 19 (1) (2019) 3–26. Sci. 179 (8) (2009) 1040–1058.
[21] G. Chen, J. Chen, A novel wrapper method for feature selection and its [49] P. Refaeilzadeh, L. Tang, H. Liu, Cross-validation, in: Encyclopedia of
applications, Neurocomputing 159 (2015) 219–226. Database Systems, Vol. 5, Springer, 2009, pp. 532–538.
[22] N. Sánchez-Maroño, A. Alonso-Betanzos, M. Tombilla-Sanromán, Filter [50] B. Crawford, R. Soto, G. Astorga, J. García, C. Castro, F. Paredes, Putting
methods for feature selection – A comparative study, in: H. Yin, P. Tino, continuous metaheuristics to work in binary search spaces, Complexity
E. Corchado, W. Byrne, X. Yao (Eds.), Intelligent Data Engineering and 2017 (2017).
Automated Learning - IDEAL 2007, in: Lecture Notes in Computer Science, [51] S. Taghian, M.H. Nadimi-Shahraki, H. Zamani, Comparative analysis of
Springer, Berlin, Heidelberg, 2007, pp. 178–187. transfer function-based binary Metaheuristic algorithms for feature se-
[23] F. Lin, D. Liang, C.-C. Yeh, J.-C. Huang, Novel feature selection methods to lection, in: 2018 International Conference on Artificial Intelligence and
financial distress prediction, Expert Syst. Appl. 41 (5) (2014) 2472–2483. Data Processing, IDAP, IEEE, 2018, pp. 1–6.


[52] S. Mirjalili, A. Lewis, S-shaped versus V-shaped transfer functions for [80] M. Aslan, An approach based on tunicate swarm algorithm to solve
binary particle swarm optimization, Swarm Evol. Comput. 9 (2013) 1–14. partitional clustering problem, Balkan J. Electr. Comput. Eng. 9 (3) (2021)
[53] S. Ahmed, K.K. Ghosh, S. Mirjalili, R. Sarkar, AIEOU: Automata-based 242–248.
improved equilibrium optimizer with U-shaped transfer function for [81] S. Arora, P. Anand, Binary butterfly optimization approaches for feature
feature selection, Knowl.-Based Syst. 228 (2021) 107283. selection, Expert Syst. Appl. 116 (2019) 147–160.
[54] W.-Z. Sun, M. Zhang, J.-S. Wang, S.-S. Guo, M. Wang, W.-K. Hao, Binary [82] R.Y. Nakamura, L.A. Pereira, K.A. Costa, D. Rodrigues, J.P. Papa, X.-S. Yang,
particle swarm optimization algorithm based on Z-shaped probability BBA: a binary bat algorithm for feature selection, in: 2012 25th SIBGRAPI
transfer function to solve 0-1 knapsack problem, IAENG Int. J. Comput. Conference on Graphics, Patterns and Images, IEEE, 2012, pp. 291–297.
Sci. 48 (2) (2021) 294–303. [83] R. Sawhney, P. Mathur, R. Shankar, A firefly algorithm based wrapper-
[55] K.K. Ghosh, P.K. Singh, J. Hong, Z.W. Geem, R. Sarkar, Binary social penalty feature selection method for cancer diagnosis, in: International
mimic optimization algorithm with X-shaped transfer function for feature Conference on Computational Science and Its Applications, Springer, 2018,
selection, IEEE Access 8 (2020) 97890–97906. pp. 438–449.
[56] L. Wang, X. Wang, J. Fu, L. Zhen, A novel probability binary particle swarm [84] T. Thaher, N. Arman, Efficient multi-swarm binary harris hawks opti-
optimization algorithm and its application, J. Softw. 3 (9) (2008) 28–35. mization as a feature selection approach for software fault prediction, in:
[57] Y. He, F. Zhang, S. Mirjalili, T. Zhang, Novel binary differential evo- 2020 11th International Conference on Information and Communication
lution algorithm based on Taper-shaped transfer functions for binary Systems, ICICS, IEEE, 2020, pp. 249–254.
optimization problems, Swarm Evol. Comput. 69 (2022) 101022. [85] M.H. Nadimi-Shahraki, M. Banaie-Dezfouli, H. Zamani, S. Taghian, S.
[58] Z. Beheshti, UTF: Upgrade transfer function for binary meta-heuristic Mirjalili, B-MFO: a binary moth-flame optimization for feature selection
algorithms, Appl. Soft Comput. 106 (2021) 107346. from medical datasets, Computers 10 (11) (2021) 136.
[59] S.S. Shreem, H. Turabieh, S. Al Azwari, F. Baothman, Enhanced binary [86] Y. Gao, Y. Zhou, Q. Luo, An efficient binary equilibrium optimizer
genetic algorithm as a feature selection to predict student performance, algorithm for feature selection, IEEE Access 8 (2020) 140936–140963.
Soft Comput. 26 (4) (2022) 1811–1823. [87] K.H. Sheikh, S. Ahmed, K. Mukhopadhyay, P.K. Singh, J.H. Yoon, Z.W.
[60] X. Zhao, L. Bao, Q. Ning, J. Ji, X. Zhao, An improved binary differential Geem, R. Sarkar, EHHM: Electrical harmony based hybrid meta-heuristic
evolution algorithm for feature selection in molecular signatures, Mol. for feature selection, IEEE Access 8 (2020) 158125–158141.
Inform. 37 (4) (2018) 1700081. [88] E. Pashaei, N. Aydin, Binary black hole algorithm for feature selection and
[61] Z. Zhu, Y.-S. Ong, Memetic algorithms for feature selection on microarray classification on biological data, Appl. Soft Comput. 56 (2017) 94–106.
data, in: International Symposium on Neural Networks, Springer, 2007, [89] S. Nagpal, S. Arora, S. Dey, et al., Feature selection using gravitational
pp. 1327–1335. search algorithm for biomedical data, Procedia Comput. Sci. 115 (2017)
[62] P. Agrawal, T. Ganesh, A.W. Mohamed, A novel binary gaining–sharing 258–265.
knowledge-based optimization algorithm for feature selection, Neural [90] J. Too, A. Rahim Abdullah, Binary atom search optimisation approaches